A Course in
LINEAR ALGEBRA
with Applications
Derek J. S. Robinson
2nd Edition
University of Illinois in Urbana-Champaign, USA
World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
A COURSE IN LINEAR ALGEBRA WITH APPLICATIONS (2nd Edition)
Copyright © 2006 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means,
electronic or mechanical, including photocopying, recording or any information storage and retrieval
system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright
Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to
photocopy is not required from the publisher.
ISBN 981-270-023-4
ISBN 981-270-024-2 (pbk)
Printed in Singapore by B & JO Enterprise
For
JUDITH, EWAN and GAVIN
PREFACE TO THE SECOND EDITION
The principal change from the first edition is the addition of
a new chapter on linear programming. While linear program-
ming is one of the most widely used and successful applications
of linear algebra, it rarely appears in a text such as this. In
the new Chapter Ten the theoretical basis of the simplex algo-
rithm is carefully explained and its geometrical interpretation
is stressed.
Some further applications of linear algebra have been
added, for example the use of Jordan normal form to solve
systems of linear differential equations and a discussion of ex-
tremal values of quadratic forms.
On the theoretical side, the concepts of coset and quotient
space are thoroughly explained in Chapter 5. Cosets have
useful interpretations as solution sets of systems of linear
equations. In addition the Isomorphism Theorems for vector
spaces are developed in Chapter Six: these shed light on the
relationship between subspaces and quotient spaces.
The opportunity has also been taken to add further exer-
cises, revise the exposition in several places and correct a few
errors. Hopefully these improvements will increase the use-
fulness of the book to anyone who needs to have a thorough
knowledge of linear algebra and its applications.
I am grateful to Ms. Tan Rok Ting of World Scientific
for assistance with the production of this new edition and for
patience in the face of missed deadlines. I thank my family
for their support during the preparation of the manuscript.
Derek Robinson
Urbana, Illinois
May 2006
PREFACE TO THE FIRST EDITION
A rough and ready definition of linear algebra might be: that
part of algebra which is concerned with quantities of the first
degree. Thus, at the very simplest level, it involves the so-
lution of systems of linear equations, and in a real sense this
elementary problem underlies the whole subject. Of all the
branches of algebra, linear algebra is the one which has found
the widest range of applications. Indeed there are few areas
of the mathematical, physical and social sciences which have
not benefitted from its power and precision. For anyone work-
ing in these fields a thorough knowledge of linear algebra has
become an indispensable tool. A recent feature is the greater
mathematical sophistication of users of the subject, due in
part to the increasing use of algebra in the information sci-
ences. At any rate it is no longer enough simply to be able to
perform Gaussian elimination and deal with real vector spaces
of dimensions two and three.
The aim of this book is to give a comprehensive intro-
duction to the core areas of linear algebra, while at the same
time providing a selection of applications. We have taken the
point of view that it is better to consider a few quality applica-
tions in depth, rather than attempt the almost impossible task
of covering all conceivable applications that potential readers
might have in mind.
The reader is not assumed to have any previous knowl-
edge of linear algebra - though in practice many will - but
is expected to have at least the mathematical maturity of a
student who has completed the calculus sequence. In North
America such a student will probably be in the second or third
year of study.
The book begins with a thorough discussion of matrix
operations. It is perhaps unfashionable to precede systems
of linear equations by matrices, but I feel that the central
position of matrices in the entire theory makes this a logical
and reasonable course. However the motivation for the in-
troduction of matrices, by means of linear equations, is still
provided informally. The second chapter forms a basis for
the whole subject with a full account of the theory of linear
equations. This is followed by a chapter on determinants, a
topic that has been unfairly neglected recently. In practice it
is hard to give a satisfactory definition of the general n x n
determinant without using permutations, so a brief account
of these is given.
Chapters Five and Six introduce the student to vector
spaces. The concept of an abstract vector space is probably
the most challenging one in the entire subject for the non-
mathematician, but it is a concept which is well worth the
effort of mastering. Our approach proceeds in gentle stages,
through a series of examples that exhibit the essential fea-
tures of a vector space; only then are the details of the def-
inition written down. However I feel that nothing is gained
by ducking the issue and omitting the definition entirely, as is
sometimes done.
Linear transformations are the subject of Chapter Six.
After a brief introduction to functional notation, and numer-
ous examples of linear transformations, a thorough account
of the relation between linear transformations and matrices is
given. In addition both kernel and image are introduced and
are related to the null and column spaces of a matrix.
Orthogonality, perhaps the heart of the subject, receives
an extended treatment in Chapter Seven. After a gentle in-
troduction by way of scalar products in three dimensions —
which will be familiar to the student from calculus — inner
product spaces are defined and the Gram-Schmidt procedure
is described. The chapter concludes with a detailed account
of the Method of Least Squares, including the problem of
finding optimal solutions, which texts at this level often fail
to cover.
Chapter Eight introduces the reader to the theory of
eigenvectors and eigenvalues, still one of the most powerful
tools in linear algebra. Included is a detailed account of ap-
plications to systems of linear differential equations and linear
recurrences, and also to Markov processes. Here we have not
shied away from the more difficult case where the eigenvalues
of the coefficient matrix are not all different.
The final chapter contains a selection of more advanced
topics in linear algebra, including the crucial Spectral Theo-
rem on the diagonalizability of real symmetric matrices. The
usual applications of this result to quadratic forms, conics
and quadrics, and maxima and minima of functions of several
variables follow.
Also included in Chapter Nine are treatments of bilinear
forms and Jordan Normal Form, topics that are often not con-
sidered in texts at this level, but which should be more widely
known. In particular, canonical forms for both symmetric and
skew-symmetric bilinear forms are obtained. Finally, Jordan
Normal Form is presented by an accessible approach that re-
quires only an elementary knowledge of vector spaces.
Chapters One to Eight, together with Sections 9.1 and
9.2, correspond approximately to a one semester course taught
by the author over a period of many years. As time allows,
other topics from Chapter Nine may be included. In practice
some of the contents of Chapters One and Two will already be
familiar to many readers and can be treated as review. Full
proofs are almost always included: no doubt some instructors
may not wish to cover all of them, but it is stressed that for
maximum understanding of the material as many proofs as
possible should be read. A good supply of problems appears
at the end of each section. As always in mathematics, it is an
indispensable part of learning the subject to attempt as many
problems as possible.
This book was originally begun at the suggestion of
Harriet McQuarrie. I thank Ms. Ho Hwei Moon of World
Scientific Publishing Company for her advice and for help with
editorial work. I am grateful to my family for their patience,
and to my wife Judith for her encouragement, and for assis-
tance with the proof-reading.
Derek Robinson
Singapore
March 1991
CONTENTS
Preface to the Second Edition vii
Preface to the First Edition ix
Chapter One Matrix Algebra
1.1 Matrices 1
1.2 Operations with Matrices 6
1.3 Matrices over Rings and Fields 24
Chapter Two Systems of Linear Equations
2.1 Gaussian Elimination 30
2.2 Elementary Row Operations 41
2.3 Elementary Matrices 47
Chapter Three Determinants
3.1 Permutations and the Definition of a
Determinant 57
3.2 Basic Properties of Determinants 70
3.3 Determinants and Inverses of Matrices 78
Chapter Four Introduction to Vector Spaces
4.1 Examples of Vector Spaces 87
4.2 Vector Spaces and Subspaces 95
4.3 Linear Independence in Vector Spaces 104
Chapter Five Basis and Dimension
5.1 The Existence of a Basis 112
5.2 The Row and Column Spaces of a Matrix 126
5.3 Operations with Subspaces 133
Chapter Six Linear Transformations
6.1 Functions Defined on Sets 152
6.2 Linear Transformations and Matrices 158
6.3 Kernel, Image and Isomorphism 178
Chapter Seven Orthogonality in Vector Spaces
7.1 Scalar Products in Euclidean Space 193
7.2 Inner Product Spaces 209
7.3 Orthonormal Sets and the Gram-Schmidt
Process 226
7.4 The Method of Least Squares 241
Chapter Eight Eigenvectors and Eigenvalues
8.1 Basic Theory of Eigenvectors and Eigenvalues 257
8.2 Applications to Systems of Linear Recurrences 276
8.3 Applications to Systems of Linear Differential
Equations 288
Chapter Nine More Advanced Topics
9.1 Eigenvalues and Eigenvectors of Symmetric and
Hermitian Matrices 303
9.2 Quadratic Forms 313
9.3 Bilinear Forms 332
9.4 Minimum Polynomials and Jordan Normal
Form 347
Chapter Ten Linear Programming
10.1 Introduction to Linear Programming 370
10.2 The Geometry of Linear Programming 380
10.3 Basic Solutions and Extreme Points 391
10.4 The Simplex Algorithm 399
Appendix Mathematical Induction 415
Answers to the Exercises 418
Bibliography 430
Index 432
Chapter One
MATRIX ALGEBRA
In this first chapter we shall introduce one of the prin-
cipal objects of study in linear algebra, a matrix or rectan-
gular array of numbers, together with the standard matrix
operations. Matrices are encountered frequently in many ar-
eas of mathematics, engineering, and the physical and social
sciences, typically when data is given in tabular form. But
perhaps the most familiar situation in which matrices arise is
in the solution of systems of linear equations.
1.1 Matrices
An m x n matrix A is a rectangular array of numbers,
real or complex, with m rows and n columns. We shall write
aij for the number that appears in the ith row and the jth
column of A; this is called the (i,j) entry of A. We can either
write A in the extended form

    ( a11  a12  ...  a1n )
    ( a21  a22  ...  a2n )
    ( .................. )
    ( am1  am2  ...  amn )

or in the more compact form

    (aij)m,n.

Thus in the compact form a formula for the (i,j) entry of A
is given inside the round brackets, while the subscripts m and
n tell us the respective numbers of rows and columns of A.
Explicit examples of matrices are

    ( 4  3 )            ( 0    2.4    6 )
    ( 1  2 )    and     ( √2   3/5   -1 )
Example 1.1.1
Write down the extended form of the matrix ((-1)^i j + i)3,2.
The (i,j) entry of the matrix is (-1)^i j + i, where i = 1, 2, 3
and j = 1, 2. So the matrix is

    (  0  -1 )
    (  3   4 )
    (  2   1 )
It is necessary to decide when two matrices A and B are
to be regarded as equal; in symbols A = B. Let us agree this
will mean that the matrices A and B have the same numbers
of rows and columns, and that, for all i and j , the (i,j) entry
of A equals the (i,j) entry of B. In short, two matrices are
equal if they look exactly alike.
As has already been mentioned, matrices arise when one
has to deal with linear equations. We shall now explain how
this comes about. Suppose we have a set of m linear equations
in n unknowns x1, x2, ..., xn. These may be written in the
form

    a11x1 + a12x2 + ... + a1nxn = b1
    a21x1 + a22x2 + ... + a2nxn = b2
    .................................
    am1x1 + am2x2 + ... + amnxn = bm

Here the aij and bi are to be regarded as given numbers. The
problem is to solve the system, that is, to find all n-tuples
of numbers xi, x2, ..., xn that satisfy every equation of the
system, or to show that no such numbers exist. Solving a set
of linear equations is in many ways the most basic problem of
linear algebra.
The reader will probably have noticed that there is a ma-
trix involved in the above linear system, namely the coefficient
matrix

    A = (aij)m,n.

In fact there is a second matrix present; it is obtained by using
the numbers b1, b2, ..., bm to add a new column, the (n + 1)th,
to the coefficient matrix A. This results in an m x (n + 1)
matrix called the augmented matrix of the linear system. The
problem of solving linear systems will be taken up in earnest
in Chapter Two, where it will emerge that the coefficient and
augmented matrices play a critical role. At this point we
merely wish to point out that here is a natural problem in
which matrices are involved in an essential way.
Example 1.1.2
The coefficient and augmented matrices of the pair of linear
equations
    2x1 - 3x2 + 5x3 = 1
    -x1 +  x2 -  x3 = 4

are respectively

    (  2  -3   5 )            (  2  -3   5  1 )
    ( -1   1  -1 )    and     ( -1   1  -1  4 )
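A quick way to experiment with these objects is to build them in software. The short sketch below uses Python with numpy (my own choice of tool, not something prescribed by the text) to form the coefficient and augmented matrices of the system in Example 1.1.2.

    import numpy as np

    # Coefficient matrix of the system in Example 1.1.2
    A = np.array([[ 2, -3,  5],
                  [-1,  1, -1]])

    # Right hand sides b1 = 1, b2 = 4, stored as a column
    b = np.array([[1],
                  [4]])

    # The augmented matrix is A with b attached as an extra column
    M = np.hstack([A, b])

    print(A.shape)   # (2, 3): an m x n matrix with m = 2, n = 3
    print(M)         # [[ 2 -3  5  1]
                     #  [-1  1 -1  4]]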
Some special matrices
Certain special types of matrices that occur frequently
will now be recorded.
(i) A 1 x n matrix, or n-row vector, A has a single row

    A = (a11  a12  ...  a1n).
(ii) An m x 1 matrix, or m-column vector, B has just one
column

    B = ( b11 )
        ( b21 )
        ( ... )
        ( bm1 )
(iii) A matrix with the same number of rows and columns is
said to be square.
(iv) A zero matrix is a matrix all of whose entries are zero.
The zero m x n matrix is denoted by
0mn or simply 0.
Sometimes 0nn is written 0n. For example, 023 is the matrix

    ( 0  0  0 )
    ( 0  0  0 )
(v) The identity n x n matrix has 1's on the principal diagonal,
that is, from top left to bottom right, and zeros elsewhere; thus
it has the form

    ( 1  0  ...  0 )
    ( 0  1  ...  0 )
    ( ............. )
    ( 0  0  ...  1 )

This matrix is written

    In or simply I.
The identity matrix plays the role of the number 1 in matrix
multiplication.
(vi) A square matrix is called upper triangular if it has only
zero entries below the principal diagonal. Similarly a matrix
is lower triangular if all entries above the principal diagonal
are zero. For example, the matrices

    ( a  b  c )            ( a  0  0 )
    ( 0  d  e )    and     ( b  d  0 )
    ( 0  0  f )            ( c  e  f )

are upper triangular and lower triangular respectively.
(vii) A square matrix in which all the non-zero elements lie
on the principal diagonal is called a diagonal matrix. A scalar
matrix is a diagonal matrix in which the elements on the prin-
cipal diagonal are all equal. For example, the matrices
    ( a  0  0 )            ( a  0  0 )
    ( 0  b  0 )    and     ( 0  a  0 )
    ( 0  0  c )            ( 0  0  a )
are respectively diagonal and scalar. Diagonal matrices have
much simpler algebraic properties than general square matri-
ces.
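Most of these special matrices can be produced directly by standard array routines. The sketch below, again in Python with numpy (an illustrative choice, not part of the text), builds a zero matrix, an identity matrix, a diagonal matrix and a scalar matrix, and checks a matrix for upper triangularity.

    import numpy as np

    Z = np.zeros((2, 3))          # the zero 2 x 3 matrix
    I3 = np.eye(3)                # the identity matrix I3
    D = np.diag([1.0, 2.0, 3.0])  # a diagonal matrix with entries 1, 2, 3
    S = 5.0 * np.eye(3)           # a scalar matrix: 5 on the principal diagonal

    # A square matrix is upper triangular exactly when zeroing the part
    # below the principal diagonal changes nothing.
    A = np.array([[1.0, 2.0],
                  [0.0, 3.0]])
    print(np.array_equal(A, np.triu(A)))   # True: A is upper triangular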
Exercises 1.1
1. Write out in extended form the matrix ((-1)^(i+j)(i + j))2,4.
2. Find a formula for the (i,j) entry of each of the following
matrices:

    (a)  ( -1   1  -1 )        (b)  (  1   2   3   4 )
         (  1  -1   1 )             (  5   6   7   8 )
         ( -1   1  -1 )             (  9  10  11  12 )
                                    ( 13  14  15  16 )
3. Using the fact that matrices have a rectangular shape, say
how many different zero matrices can be formed using a total
of 12 zeros.
4. For every integer n > 1 there are always at least two zero
matrices that can be formed using a total of n zeros. For
which n are there exactly two such zero matrices?
5. Which matrices are both upper and lower triangular?
1.2 Operations with Matrices
We shall now introduce a number of standard operations
that can be performed on matrices, among them addition,
scalar multiplication and multiplication. We shall then de-
scribe the principal properties of these operations. Our object
in so doing is to develop a systematic means of performing cal-
culations with matrices.
(i) Addition and subtraction
Let A and B be two m x n matrices; as usual write aij and bij
for their respective (i,j) entries. Define the sum A + B to be
the m x n matrix whose (i,j) entry is aij + bij; thus to form
the matrix A + B we simply add corresponding entries of A
and B. Similarly, the difference A - B is the m x n matrix
whose (i,j) entry is aij - bij. However A + B and A - B
are not defined if A and B do not have the same numbers of
rows and columns.
(ii) Scalar multiplication
By a scalar we shall mean a number, as opposed to a matrix
or array of numbers. Let c be a scalar and A an m x n matrix.
The scalar multiple cA is the m x n matrix whose (i,j) entry
is caij. Thus to form cA we multiply every entry of A by the
scalar c. The matrix (-1)A is usually written -A; it is called
the negative of A since it has the property that A + (-A) = 0.
Example 1.2.1
If

    A = (  1  2  0 )    and    B = ( 1   1  1 )
        ( -1  0  1 )               ( 0  -3  1 )

then

    2A + 3B = (  5   7  3 )    and    2A - 3B = ( -1  1  -3 )
              ( -2  -9  5 )                     ( -2  9  -1 )
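These combinations are easy to check mechanically. Here is a minimal numpy sketch (an illustrative aid, not part of the text) that recomputes them for the two matrices of Example 1.2.1.

    import numpy as np

    A = np.array([[ 1, 2, 0],
                  [-1, 0, 1]])
    B = np.array([[ 1,  1, 1],
                  [ 0, -3, 1]])

    # Scalar multiples and sums are formed entry by entry.
    print(2*A + 3*B)   # [[ 5  7  3]
                       #  [-2 -9  5]]
    print(2*A - 3*B)   # [[-1  1 -3]
                       #  [-2  9 -1]]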
(iii) Matrix multiplication
It is less obvious what the "natural" definition of the
product of two matrices should be. Let us start with the
simplest interesting case, and consider a pair of 2 x 2 matrices
    A = ( a11  a12 )    and    B = ( b11  b12 )
        ( a21  a22 )               ( b21  b22 )
In order to motivate the definition of the matrix product AB
we consider two sets of linear equations
    a11y1 + a12y2 = x1            b11z1 + b12z2 = y1
    a21y1 + a22y2 = x2    and     b21z1 + b22z2 = y2
Observe that the coefficient matrices of these linear systems
are A and B respectively. We shall think of these equations
as representing changes of variables from y1, y2 to x1, x2, and
from z1, z2 to y1, y2 respectively.
Suppose that we replace y1 and y2 in the first set of equa-
tions by the values specified in the second set. After simplifi-
cation we obtain a new set of equations
    (a11b11 + a12b21)z1 + (a11b12 + a12b22)z2 = x1
    (a21b11 + a22b21)z1 + (a21b12 + a22b22)z2 = x2

This has coefficient matrix

    ( a11b11 + a12b21    a11b12 + a12b22 )
    ( a21b11 + a22b21    a21b12 + a22b22 )
and represents a change of variables from z1, z2 to x1, x2 which
may be thought of as the composite of the original changes of
variables.
At first sight this new matrix looks formidable. However
it is in fact obtained from A and B in quite a simple fashion,
namely by the row-times-column rule. For example, the (1,2)
entry arises from multiplying corresponding entries of row 1 of
A and column 2 of B, and then adding the resulting numbers;
thus
    (a11  a12) ( b12 )  =  a11b12 + a12b22.
               ( b22 )
Other entries arise in a similar fashion from a row of A and a
column of B.
Having made this observation, we are now ready to define
the product AB where A is an m x n matrix and B is an n x p
matrix. The rule is that the (i,j) entry of AB is obtained by
multiplying corresponding entries of row i of A and column j
of B, and then adding up the resulting products. This is the
row-times-column rule. Now row i of A and column j of B are

    ( ai1  ai2  ...  ain )    and    ( b1j )
                                     ( b2j )
                                     ( ... )
                                     ( bnj )

Hence the (i,j) entry of AB is

    ai1b1j + ai2b2j + ... + ainbnj,
which can be written more concisely using the summation
notation as

    Σ (k = 1 to n) aik bkj.

Notice that the rule only makes sense if the number of
columns of A equals the number of rows of B. Also the product
of an m x n matrix and an n x p matrix is an m x p matrix.
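The rule translates directly into a double loop over the rows of A and the columns of B, with an inner sum over k. The plain Python sketch below (my own illustration; matrices are stored as lists of rows, and the function name mat_mult is mine) implements exactly that definition.

    def mat_mult(A, B):
        """Multiply A (m x n) by B (n x p) with the row-times-column rule."""
        m, n = len(A), len(A[0])
        n2, p = len(B), len(B[0])
        if n != n2:
            raise ValueError("columns of A must equal rows of B")
        # The (i,j) entry of AB is the sum of A[i][k] * B[k][j] over k.
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
                for i in range(m)]

    # A 2 x 3 matrix times a 3 x 2 matrix gives a 2 x 2 matrix.
    A = [[ 1, 2, 0],
         [-1, 0, 1]]
    B = [[1,  1],
         [0, -3],
         [2,  1]]
    print(mat_mult(A, B))   # [[1, -5], [1, 0]]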
Example 1.2.2
Let
A = {
1 I 21 and B
Since A is 2 x 3 and B is 3 x 3, we see that AB is defined
and is a 2 x 3 matrix. However BA is not defined. Using the
row-times-column rule, we quickly find that

    AB = ( 0   0   2 )
         ( 2  16  -2 )
Example 1.2.3
Let

    A = ( 0  1 )    and    B = ( 1  1 )
        ( 0  0 )               ( 0  0 )

In this case both AB and BA are defined, but these matrices
are different:

    AB = ( 0  0 ) = 022    and    BA = ( 0  1 )
         ( 0  0 )                      ( 0  0 )
Thus already we recognise some interesting features of
matrix multiplication. The matrix product is not commuta-
tive, that is, AB and BA may be different when both are de-
fined; also the product of two non-zero matrices can be zero,
a phenomenon which indicates that any theory of division by
matrices will face considerable difficulties.
Next we show how matrix multiplication provides a way of
representing a set of linear equations by a single matrix equa-
tion. Let A = (aij)m,n and let X and B be the column vectors
with entries x1, x2, ..., xn and b1, b2, ..., bm respectively. Then
the matrix equation
AX = B
is equivalent to the linear system
    a11x1 + a12x2 + ... + a1nxn = b1
    a21x1 + a22x2 + ... + a2nxn = b2
    .................................
    am1x1 + am2x2 + ... + amnxn = bm
For if we form the product AX and equate its entries to the
corresponding entries of B, we recover the equations of the
linear system. Here is further evidence that we have got
the definition of the matrix product right.
Example 1.2.4
The matrix form of the pair of linear equations
    2x1 - 3x2 + 5x3 = 1
    -x1 +  x2 -  x3 = 4

is

    (  2  -3   5 ) ( x1 )     ( 1 )
    ( -1   1  -1 ) ( x2 )  =  ( 4 )
                   ( x3 )
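A candidate solution can be tested in software simply by forming AX and comparing it with B. The numpy sketch below (illustrative only) does this for the system above.

    import numpy as np

    A = np.array([[ 2, -3,  5],
                  [-1,  1, -1]])
    B = np.array([[1],
                  [4]])

    # Test a candidate column X by checking whether AX equals B.
    X = np.array([[1],
                  [1],
                  [0]])
    print(A @ X)                      # [[-1], [ 0]]
    print(np.array_equal(A @ X, B))   # False: this X is not a solution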
(iv) Powers of a matrix
Once matrix products have been defined, it is clear how to
define a non-negative power of a square matrix. Let A be an
n x n matrix; then the mth power of A, where m is a non-
negative integer, is defined by the equations
    A^0 = In    and    A^(m+1) = A^m A.

This is an example of a recursive definition: the first equation
specifies A^0, while the second shows how to define A^(m+1), un-
der the assumption that A^m has already been defined. Thus
A^1 = A, A^2 = AA, A^3 = A^2 A etc. We do not attempt to
define negative powers at this juncture.
Example 1.2.5
Let
Then
The reader can verify that higher powers of A do not lead
to new matrices in this example. Therefore A has just four
distinct powers, A^0 = I2, A^1 = A, A^2 and A^3.
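The recursive definition is easy to turn into code. The sketch below (numpy, used only for illustration) computes successive powers of a sample 2 x 2 matrix; the particular matrix is an assumption of mine, a quarter-turn rotation, chosen because it too has exactly four distinct powers, its fourth power being I2.

    import numpy as np

    def power(A, m):
        """Compute A^m from the recursion A^0 = I, A^(k+1) = A^k A."""
        result = np.eye(A.shape[0], dtype=A.dtype)   # A^0 = I
        for _ in range(m):
            result = result @ A                      # A^(k+1) = A^k A
        return result

    A = np.array([[0, -1],
                  [1,  0]])     # assumed example: rotation through 90 degrees
    for m in range(5):
        print(m, power(A, m).tolist())
    # A^4 equals I2, so no new matrices appear beyond A^3.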
(v) The transpose of a matrix
If A is an m x n matrix, the transpose of A, A^T, is the n x m
matrix whose (i,j) entry equals the (j,i) entry of A. Thus the
columns of A become the rows of A^T. For example, if

    A = ( a  b )
        ( c  d )
        ( e  f )

then the transpose of A is

    A^T = ( a  c  e )
          ( b  d  f )
A matrix which equals its transpose is called symmetric. On
the other hand, if A^T equals -A, then A is said to be skew-
symmetric. For example, the matrices

    ( a  b )    and    (  0  b )
    ( b  c )           ( -b  0 )

are symmetric and skew-symmetric respectively. Clearly sym-
metric matrices and skew-symmetric matrices must be square.
We shall see in Chapter Nine that symmetric matrices can in
a real sense be reduced to diagonal matrices.
The laws of matrix algebra
We shall now list a number of properties which are sat-
isfied by the various matrix operations defined above. These
properties will allow us to manipulate matrices in a system-
atic manner. Most of them are familiar from arithmetic; note
however the absence of the commutative law for multiplica-
tion.
In the following theorem A, B, C are matrices and c, d are
scalars; it is understood that the numbers of rows and columns
of the matrices are such that the various matrix products and
sums mentioned make sense.
Theorem 1.2.1
(a) A + B = B + A, (commutative law of addition);
(b) (A + B) + C = A + (B + C), (associative law of
addition);
(c) A + 0 = A;
(d) (AB)C = A(BC), (associative law of multiplication);
(e) AI = A = IA;
(f) A(B + C) = AB + AC, (distributive law);
(g) (A + B)C = AC + BC, (distributive law);
(h) A - B = A + (-1)B;
(i) (cd)A = c(dA);
(j) c(AB) = (cA)B = A(cB);
(k) c(A + B) = cA + cB;
(l) (c + d)A = cA + dA;
(m) (A + B)^T = A^T + B^T;
(n) (AB)^T = B^T A^T.
Each of these laws is a logical consequence of the defini-
tions of the various matrix operations. To give formal proofs
of them all is a lengthy, but routine, task; an example of such a
proof will be given shortly. It must be stressed that familiarity
with these laws is essential if matrices are to be manipulated
correctly.
We remark that it is unambiguous to use the expression
A + B + C for both (A + B) + C and A+(B + C). For by
the associative law of addition these matrices are equal. The
same comment applies to sums like A + B + C + D , and also
to matrix products such as (AB)C and A(BC), both of which
are written as ABC.
In order to illustrate the use of matrix operations, we
shall now work out three problems.
Example 1.2.6
Prove the associative law for matrix multiplication, (AB)C =
A(BC) where A, B, C are m x n, n x p, p x q matrices re-
spectively.
In the first place observe that all the products mentioned
exist, and that both (AB)C and A(BC) are m x q matrices.
To show that they are equal, we need to verify that their (i, j)
entries are the same for all i and j .
Let dik be the (i,k) entry of AB; then dik = Σ (l = 1 to n) ail blk.
Thus the (i,j) entry of (AB)C is Σ (k = 1 to p) dik ckj, that is

    Σ (k = 1 to p) ( Σ (l = 1 to n) ail blk ) ckj.

After a change in the order of summation, this becomes

    Σ (l = 1 to n) ail ( Σ (k = 1 to p) blk ckj ).

Here it is permissible to change the order of the two summa-
tions since this just corresponds to adding up the numbers
ail blk ckj in a different order. Finally, by the same procedure
we recognise the last sum as the (i,j) entry of the matrix
A(BC).
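A formal proof is one thing; it is also reassuring to spot-check the laws numerically. The numpy sketch below (illustrative only) verifies the associative law and law (n) of Theorem 1.2.1 on randomly chosen matrices of compatible sizes.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 5, size=(2, 3))
    B = rng.integers(-5, 5, size=(3, 4))
    C = rng.integers(-5, 5, size=(4, 2))

    # (AB)C = A(BC): the associative law of multiplication
    print(np.array_equal((A @ B) @ C, A @ (B @ C)))   # True

    # (AB)^T = B^T A^T: law (n)
    print(np.array_equal((A @ B).T, B.T @ A.T))       # True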
The next two examples illustrate the use of matrices in
real-life situations.
Example 1.2.7
A certain company manufactures three products P, Q, R in
four different plants W, X, Y, Z. The various costs (in whole
dollars) involved in producing a single item of a product are
given in the table
                 P   Q   R
    material     1   2   1
    labor        3   2   2
    overheads    2   1   2
The numbers of items produced in one month at the four
locations are as follows:
           W      X      Y      Z
    P    2000   3000   1500   4000
    Q    1000    500    500   1000
    R    2000   2000   2500   2500
The problem is to find the total monthly costs of material,
labor and overheads at each factory.
Let C be the "cost" matrix formed by the first set of
data and let N be the matrix formed by the second set of
data. Thus
    C = ( 1  2  1 )               ( 2000  3000  1500  4000 )
        ( 3  2  2 )    and    N = ( 1000   500   500  1000 )
        ( 2  1  2 )               ( 2000  2000  2500  2500 )
The total costs per month at factory W are clearly
material : 1 x 2000 + 2 x 1000 + 1 x 2000 = 6000
labor : 3 x 2000 + 2 x 1000 + 2 x 2000 = 12000
overheads : 2 x 2000 + 1 x 1000 + 2 x 2000 = 9000
Now these amounts arise by multiplying rows 1, 2 and 3 of
matrix C times column 1 of matrix N, that is, as the (1,1),
(2, 1), and (3, 1) entries of matrix product CN. Similarly the
costs at the other locations are given by entries in the other
columns of the matrix CN. Thus the complete answer can be
read off from the matrix product
    CN = (  6000   6000   5000   8500 )
         ( 12000  14000  10500  19000 )
         (  9000  10500   8500  14000 )
Here of course the rows of CN correspond to material, la-
bor and overheads, while the columns correspond to the four
plants W, X, Y, Z.
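This is exactly the kind of bookkeeping a computer does well. The numpy sketch below (illustrative, not part of the text) reproduces the cost matrix CN.

    import numpy as np

    # Cost per item (rows: material, labor, overheads; columns: P, Q, R)
    C = np.array([[1, 2, 1],
                  [3, 2, 2],
                  [2, 1, 2]])

    # Monthly production (rows: P, Q, R; columns: W, X, Y, Z)
    N = np.array([[2000, 3000, 1500, 4000],
                  [1000,  500,  500, 1000],
                  [2000, 2000, 2500, 2500]])

    print(C @ N)
    # [[ 6000  6000  5000  8500]
    #  [12000 14000 10500 19000]
    #  [ 9000 10500  8500 14000]]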
Example 1.2.8
In a certain city there are 10,000 people of employable age.
At present 7000 are employed and the rest are out of work.
Each year 10% of those employed become unemployed, while
60% of the unemployed find work. Assuming that the total
pool of people remains the same, what will the employment
picture be in three years time?
Let en and un denote the numbers of employed and un-
employed persons respectively after n years. The information
given translates into the equations
    e(n+1) = .9en + .6un
    u(n+1) = .1en + .4un
These linear equations are converted into a single matrix equa-
tion by introducing matrices
    Xn = ( en )    and    A = ( .9  .6 )
         ( un )               ( .1  .4 )

The equivalent matrix equation is

    X(n+1) = AXn.
Taking n to be 0, 1, 2 successively, we see that X1 = AX0,
X2 = AX1 = A^2 X0, X3 = AX2 = A^3 X0. In general

    Xn = A^n X0.

Now we were told that e0 = 7000 and u0 = 3000, so

    X0 = ( 7000 )
         ( 3000 )

Thus to find X3 all that we need to do is to compute the power
A^3. This turns out to be

    A^3 = ( .861  .834 )
          ( .139  .166 )

Hence

    X3 = A^3 X0 = ( 8529 )
                  ( 1471 )
so that 8529 of the 10,000 will be in work after three years.
At this point an interesting question arises: what will
the numbers of employed and unemployed be in the long run?
This problem is an example of a Markov process; these pro-
cesses will be studied in Chapter Eight as an application of
the theory of eigenvalues.
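The three-year forecast, and longer-range ones, can be computed directly. The numpy sketch below (illustrative only) forms A^3 X0 and then iterates further to hint at the long-run behavior asked about above.

    import numpy as np

    A = np.array([[0.9, 0.6],
                  [0.1, 0.4]])
    X0 = np.array([[7000.0],
                   [3000.0]])

    X3 = np.linalg.matrix_power(A, 3) @ X0
    print(X3)                 # approximately [[8529.], [1471.]]

    # Iterating many more years suggests the proportions settle down.
    X = X0
    for _ in range(50):
        X = A @ X
    print(X)                  # close to [[8571.4], [1428.6]]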
The inverse of a square matrix
An n x n matrix A is said to be invertible if there is an
n x n matrix B such that
AB = In = BA.
Then B is called an inverse of A. A matrix which is not invert-
ible is sometimes called singular, while an invertible matrix is
said to be non-singular.
Example 1.2.9
Show that the matrix
    ( 1  3 )
    ( 3  9 )
is not invertible.
If the matrix

    ( a  b )
    ( c  d )

were an inverse of the matrix, then we should have

    ( 1  3 ) ( a  b )     ( 1  0 )
    ( 3  9 ) ( c  d )  =  ( 0  1 )
which leads to a set of linear equations with no solutions,
a + 3c = 1
b + 3d = 0
3a + 9c = 0
3b + 9d = 1
Indeed the first and third equations clearly contradict each
other. Hence the matrix is not invertible.
Example 1.2.10
Show that the matrix

    A = ( 1  -2 )
        ( 0   1 )

is invertible and find an inverse for it.
Suppose that the matrix

    B = ( a  b )
        ( c  d )

is an inverse of A. Write out the product AB and set it equal
to I2, just as in the previous example. This time we get a set
of linear equations that has a solution,

    a - 2c = 1
    b - 2d = 0
         c = 0
         d = 1

Indeed there is a unique solution a = 1, b = 2, c = 0, d = 1.
Thus the matrix

    B = ( 1  2 )
        ( 0  1 )
is a candidate. To be sure that B is really an inverse of A, we
need to verify that BA is also equal to I2', this is in fact true,
as the reader should check.
At this point the natural question is: how can we tell if
a square matrix is invertible, and if it is, how can we find an
inverse? From the examples we have seen enough to realise
that the question is intimately connected with the problem of
solving systems of linear systems, so it is not surprising that
we must defer the answer until Chapter Two.
We now present some important facts about inverses of
matrices.
Theorem 1.2.2
A square matrix has at most one inverse.
Proof
Suppose that a square matrix A has two inverses B1 and B2.
Then

    AB1 = AB2 = I = B1A = B2A.

The idea of the proof is to consider the product (B1A)B2:
since B1A = I, this equals IB2 = B2. On the other hand,
by the associative law it also equals B1(AB2), which equals
B1I = B1. Therefore B1 = B2.
From now on we shall write A^(-1) for the unique inverse of an
invertible matrix A.
Theorem 1.2.3
(a) If A is an invertible matrix, then A^(-1) is invertible
and (A^(-1))^(-1) = A.
(b) If A and B are invertible matrices of the same size,
then AB is invertible and (AB)^(-1) = B^(-1) A^(-1).
Proof
(a) Certainly we have AA^(-1) = I = A^(-1) A, equations which
can be viewed as saying that A is an inverse of A^(-1). Therefore,
since A^(-1) cannot have more than one inverse, its inverse must
be A.
(b) To prove the assertions we have only to check that B^(-1) A^(-1)
is an inverse of AB. This is easily done: (AB)(B^(-1) A^(-1)) =
A(BB^(-1))A^(-1), by two applications of the associative law;
the latter matrix equals AIA^(-1) = AA^(-1) = I. Similarly
(B^(-1) A^(-1))(AB) = I. Since inverses are unique, (AB)^(-1) =
B^(-1) A^(-1).
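Both parts of the theorem are easy to illustrate numerically. The numpy sketch below (illustrative only; np.linalg.inv computes inverses by its own internal method) checks that B^(-1) A^(-1) really does invert AB for a sample pair of invertible matrices.

    import numpy as np

    A = np.array([[1.0, -2.0],
                  [0.0,  1.0]])
    B = np.array([[2.0, 1.0],
                  [1.0, 1.0]])

    Ainv = np.linalg.inv(A)
    Binv = np.linalg.inv(B)

    # (AB)^(-1) should agree with B^(-1) A^(-1), up to rounding error.
    print(np.allclose(np.linalg.inv(A @ B), Binv @ Ainv))   # True
    # And (A^(-1))^(-1) should recover A.
    print(np.allclose(np.linalg.inv(Ainv), A))              # True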
Partitioned matrices
A matrix is said to be partitioned if it is subdivided into
a rectangular array of submatrices by a series of horizontal or
vertical lines. For example, if A is the matrix (aij)3,3, then

    ( a11  a12 | a13 )
    ( a21  a22 | a23 )
    ( a31  a32 | a33 )
is a partitioning of A. Another example of a partitioned matrix
is the augmented matrix of the linear system whose matrix
form is AX = B; here the partitioning is [A | B].
There are occasions when it is helpful to think of a matrix
as being partitioned in some particular manner. A common
one is when an m x n matrix A is partitioned into its columns
A1, A2, ..., An,

    A = (A1  A2  ...  An).
Because of this it is important to observe the following fact.
Theorem 1.2.4
Partitioned matrices can be added and multiplied according to
the usual rules of matrix algebra.
Thus to add two partitioned matrices, we add correspond-
ing entries, although these are now matrices rather than
scalars. To multiply two partitioned matrices use the row-
times-column rule. Notice however that the partitions of the
matrices must be compatible if these operations are to make
sense.
Example 1.2.11
Let A = (aij)4,4 be partitioned into four 2 x 2 matrices

    A = ( A11  A12 )
        ( A21  A22 )

where

    A11 = ( a11  a12 ),    A12 = ( a13  a14 ),
          ( a21  a22 )           ( a23  a24 )

    A21 = ( a31  a32 ),    A22 = ( a33  a34 ).
          ( a41  a42 )           ( a43  a44 )

Let B = (bij)4,4 be similarly partitioned into submatrices B11,
B12, B21, B22:

    B = ( B11  B12 )
        ( B21  B22 )

Then

    A + B = ( A11 + B11    A12 + B12 )
            ( A21 + B21    A22 + B22 )
by the rule of addition for matrices.
Example 1.2.12
Let A be an m x n matrix and B an n x p matrix; write B1,
B2, ..., Bp for the columns of B. Then, using the partition of
B into columns B = (B1  B2  ...  Bp), we have

    AB = (AB1  AB2  ...  ABp).
This follows at once from the row-times-column rule of matrix
multiplication.
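The column partition of a product is easy to confirm in software. The numpy sketch below (illustrative only) checks, for randomly chosen matrices, that each column of AB equals A times the corresponding column of B.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.integers(-3, 4, size=(2, 3))
    B = rng.integers(-3, 4, size=(3, 4))

    AB = A @ B
    # Column j of AB should be A applied to column j of B.
    for j in range(B.shape[1]):
        assert np.array_equal(AB[:, j], A @ B[:, j])
    print("column partition rule verified")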
Exercises 1.2
1. Define matrices
    A = ( 1  2   3 ),    B = ( 2  1 ),    C = ( 3   0  4 ).
        ( 0  1  -1 )         ( 1  2 )         ( 0   1  0 )
        ( 2  1   0 )         ( 1  1 )         ( 2  -1  3 )
(a) Compute 3A - 2C.
(b) Verify that (A + C)B = AB + CB.
(c) Compute A^2 and A^3.
(d) Verify that (AB)^T = B^T A^T.
2. Establish the laws of exponents: A^m A^n = A^(m+n) and
(A^m)^n = A^(mn), where A is any square matrix and m and n are
non-negative integers. [Use induction on n: see Appendix.]
3. If the matrix products AB and BA both exist, what can
you conclude about the sizes of A and B?
4. If A = ( 1, what is the first positive power of A
that equals I-p.
5. Show that no positive power of the matrix

    ( 1  1 )
    ( 0  1 )

equals I2.
6. Prove the distributive law A(B + C) = AB + AC where A
is m x n, and B and C are n x p.
7. Prove that (AB)^T = B^T A^T where A is m x n and B is
n x p.
8. Establish the rules c(AB) = (cA)B = A(cB) and (cA)^T =
cA^T.
9. If A is an n x n matrix some power of which equals In,
then A is invertible. Prove or disprove.
10. Show that any two n x n diagonal matrices commute.
11. Prove that a scalar matrix commutes with every square
matrix of the same size.
12. A certain library owns 10,000 books. Each month 20%
of the books in the library are lent out and 80% of the books
lent out are returned, while 10% remain lent out and 10%
are reported lost. Finally, 25% of the books listed as lost the
previous month are found and returned to the library. At
present 9000 books are in the library, 1000 are lent out, and
none are lost. How many books will be in the library, lent
out, and lost after two months ?
13. Let A be any square matrix. Prove that A + A^T is
symmetric, while the matrix A - A^T is skew-symmetric.
14. Use the last exercise to show that every square matrix
can be written as the sum of a symmetric matrix and a skew-
symmetric matrix. Illustrate this fact by writing the matrix
(• J -i)
as the sum of a symmetric and a skew-symmetric matrix.
15. Prove that the sum referred to in Exercise 14 is always
unique.
16. Show that an n x n matrix A which commutes with every
other n x n matrix must be scalar. [Hint: A commutes with
the matrix whose (i,j) entry is 1 and whose other entries are
all 0.]
17. (Negative powers of matrices) Let A be an invertible ma-
trix. If n > 0, define the power A^(-n) to be (A^(-1))^n. Prove that
A^(-n) = (A^n)^(-1).
18. For each of the following matrices find the inverse or show
that the matrix is not invertible:
«G9= <21)-
19. Generalize the laws of exponents to negative powers of an
invertible matrix [see Exercise 2.]
20. Let A be an invertible matrix. Prove that A^T is invertible
and (A^T)^(-1) = (A^(-1))^T.
21. Give an example of a 3 x 3 matrix A such that A^3 = 0,
but A^2 ≠ 0.
1.3 Matrices over Rings and Fields
Up to this point we have assumed that all our matrices
have as their entries real or complex numbers. Now there are
circumstances under which this assumption is too restrictive;
for example, one might wish to deal only with matrices whose
entries are integers. So it is desirable to develop a theory
of matrices whose entries belong to certain abstract algebraic
systems. If we review all the definitions given so far, it be-
comes clear that what we really require of the entries of a
matrix is that they belong to a "system" in which we can add
and multiply, subject of course to reasonable rules. By this we
mean rules of such a nature that the laws of matrix algebra
listed in Theorem 1.2.1 will hold true.
The type of abstract algebraic system for which this can
be done is called a ring with identity. By this is meant a set
R, with a rule of addition and a rule of multiplication; thus
if r1 and r2 are elements of the set R, then there is a unique
sum r1 + r2 and a unique product r1r2 in R. In addition the
following laws are required to hold:
(a) r1 + r2 = r2 + r1, (commutative law of addition);
(b) (r1 + r2) + r3 = r1 + (r2 + r3), (associative law of
addition);
(c) R contains a zero element 0_R with the property
r + 0_R = r;
(d) each element r of R has a negative, that is, an
element -r of R with the property r + (-r) = 0_R;
(e) (r1r2)r3 = r1(r2r3), (associative law of
multiplication);
(f) R contains an identity element 1_R, different from 0_R,
such that r 1_R = r = 1_R r;
(g) (r1 + r2)r3 = r1r3 + r2r3, (distributive law);
(h) r1(r2 + r3) = r1r2 + r1r3, (distributive law).
These laws are to hold for all elements r1, r2, r3, r of the
ring R. The list of rules ought to seem reasonable since all of
them are familiar laws of arithmetic.
If two further rules hold, then the ring is called a field:
(i) r1r2 = r2r1, (commutative law of multiplication);
(j) each element r in R other than the zero element 0_R
has an inverse, that is, an element r^(-1) in R such that
r r^(-1) = 1_R = r^(-1) r.
So the additional rules require that multiplication be a
commutative operation, and that each non-zero element of R
have an inverse. Thus a field is essentially an abstract system
in which one can add, multiply and divide, subject to the usual
laws of arithmetic.
Of course the most familiar examples of fields are
C and R,
the fields of complex numbers and real numbers respectively,
where the addition and multiplication used are those of arith-
metic. These are the examples that motivated the definition
of a field in the first place. Another example is the field of
rational numbers
Q
(Recall that a rational number is a number of the form a/b
where a and b are integers). On the other hand, the set of all
integers Z, (with the usual sum and product), is a ring with
identity, but it is not a field since 2 has no inverse in this ring.
All the examples given so far are infinite fields. But there
are also finite fields, the most familiar being the field of two
elements. This field has the two elements 0 and 1, sums and
products being calculated according to the tables
    +  | 0  1              x  | 0  1
    ---+------             ---+------
    0  | 0  1              0  | 0  0
    1  | 1  0              1  | 0  1
respectively. For example, we read off from the tables that
1 + 1 = 0 and 1 x 1 = 1. In recent years finite fields have be-
come of importance in computer science and in coding theory.
Thus the significance of fields extends beyond the domain of
pure mathematics.
Suppose now that R is an arbitrary ring with identity.
An m x n matrix over R is a rectangular m x n array of
elements belonging to the ring R. It is possible to form sums
and products of matrices over R, and the scalar multiple of
a matrix over R by an element of R, by using exactly the
same definitions as in the case of matrices with numerical
entries. That the laws of matrix algebra listed in Theorem
1.2.1 are still valid is guaranteed by the ring axioms. Thus in
the general theory the only change is that the scalars which
appear as entries of a matrix are allowed to be elements of an
arbitrary ring with identity.
Some readers may feel uncomfortable with the notion of a
matrix over an abstract ring. However, if they wish, they may
safely assume in the sequel that the field of scalars is either
R or C. Indeed there are places where we will definitely want
to assume this. Nevertheless we wish to make the point that
much of linear algebra can be done in far greater generality
than over R and C.
Example 1.3.1
Let A = I 1 and B = I n J be matrices over the
field of two elements. Using the tables above and the rules of
matrix addition and multiplication, we find that
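Computations of this sort are easy to automate: arithmetic over the field of two elements is ordinary integer arithmetic reduced mod 2. The sketch below (numpy, illustrative; the two sample 2 x 2 matrices are my own choice, picked only to show the mechanics) adds and multiplies matrices over this field.

    import numpy as np

    # Two assumed sample matrices with entries in {0, 1}
    A = np.array([[1, 1],
                  [0, 1]])
    B = np.array([[1, 0],
                  [1, 1]])

    # Over the field of two elements, every sum and product is taken mod 2.
    print((A + B) % 2)    # [[0 1]
                          #  [1 0]]
    print((A @ B) % 2)    # [[0 1]
                          #  [1 1]]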
Algebraic structures in linear algebra
There is another reason for introducing the concept of a
ring at this stage. For rings, one of the fundamental structures
of algebra, occur naturally at various points in linear algebra.
To illustrate this, let us write
Mn(R)
for the set of all n x n matrices over a fixed ring with identity
R. If the standard matrix operations of addition and multipli-
cation are used, this set becomes a ring, the ring of all n x n
matrices over R. The validity of the ring axioms follows from
Theorem 1.2.1. An obviously important example of a ring is
Mn(R). Later we shall discover other places in linear algebra
where rings occur naturally.
Finally, we mention another important algebraic struc-
ture that appears naturally in linear algebra, a group. Con-
sider the set of all invertible n x n matrices over a ring with
identity R; denote this by
GLn(R).
This is a set equipped with a rule of multiplication; for if A
and B are two invertible n x n matrices over R, then AB
is also invertible and so belongs to GLn(R), as the proof of
Theorem 1.2.3 shows. In addition, each element of this set
has an inverse which is also in the set. Of course the identity
n x n matrix belongs to GLn(R), and multiplication obeys the
associative law.
All of this means that GLn(R) is a group. The formal
definition is as follows. A group is a set G with a rule of
multiplication; thus if g1 and g2 are elements of G, there is
a unique product g1g2 in G. The following axioms must be
satisfied:
(a) (g1g2)g3 = g1(g2g3), (associative law);
(b) there is an identity element 1_G with the property
1_G g = g = g 1_G;
(c) each element g of G has an inverse element g^(-1) in G
such that g g^(-1) = 1_G = g^(-1) g.
These statements must hold for all elements g, g1, g2, g3 of G.
Thus the set GLn (R) of all invertible matrices over R, a
ring with identity, is a group; this important group is known
as the general linear group of degree n over R. Groups oc-
cur in many areas of science, particularly in situations where
symmetry is important.
Exercises 1.3
1. Show that the following sets of numbers are fields if the
usual addition and multiplication of arithmetic are used:
(a) the set of all rational numbers;
(b) the set of all numbers of the form a + b√2 where a
and b are rational numbers;
(c) the set of all numbers of the form a + b√-1 where
a and b are rational numbers.
2. Explain why the ring Mn(C) is not a field if n > 1.
3. How many n x n matrices are there over the field of two
elements? How many of these are symmetric? [You will need
the formula 1 + 2 + 3 + ... + n = n(n + 1)/2; for this see
Example A.1 in the Appendix.]
4. Let
    A = ( 1  1  1 )    and    B = ( 0  1  1 )
        ( 0  1  1 )               ( 1  1  1 )
        ( 0  1  0 )               ( 1  1  0 )
be matrices over the field of two elements. Compute A + B,
A^2 and AB.
5. Show that the set of all n x n scalar matrices over R with
the usual matrix operations is a field.
6. Show that the set of all non-zero n x n scalar matrices over
R is a group with respect to matrix multiplication.
7. Explain why the set of all non-zero integers with the usual
multiplication is not a group.
Chapter Two
SYSTEMS OF LINEAR EQUATIONS
In this chapter we address what has already been de-
scribed as one of the fundamental problems of linear algebra:
to determine if a system of linear equations - or linear system
- has a solution, and, if so, to find all its solutions. Almost
all the ensuing chapters depend, directly or indirectly, on the
results that are described here.
2.1 Gaussian Elimination
We begin by considering in detail three examples of linear
systems which will serve to show what kind of phenomena are
to be expected; they will also give some idea of the techniques
that are available for solving linear systems.
Example 2.1.1
    x1 -  x2 + x3 +  x4 = 2
    x1 +  x2 + x3 -  x4 = 3
    x1 + 3x2 + x3 - 3x4 = 1
To determine if the system has a solution, we apply
certain operations to the equations of the system which are
designed to eliminate unknowns from as many equations as
possible. The important point about these operations is that,
although they change the linear system, they do not change
its solutions.
We begin by subtracting equation 1 from equations 2 and
3 in order to eliminate x1 from the last two equations. These
operations can be conveniently denoted by (2) — (1) and (3) —
(1) respectively. The effect is to produce a new linear system
    x1 - x2 + x3 +  x4 = 2
         2x2      - 2x4 = 1
         4x2      - 4x4 = -1
Next multiply equation 2 of this new system by 1/2, an opera-
tion which is denoted by (1/2)(2), to get

    x1 - x2 + x3 +  x4 = 2
          x2      -  x4 = 1/2
         4x2      - 4x4 = -1
Finally, eliminate x2 from equation 3 by performing the op-
eration (3) - 4(2), that is, subtract 4 times equation 2 from
equation 3; this yields the linear system

    x1 - x2 + x3 + x4 = 2
          x2     -  x4 = 1/2
                     0 = -3
Of course the third equation is false, so the original linear
system has no solutions, that is, it is inconsistent.
Example 2.1.2
     x1 + 4x2 + 2x3 = -2
    -2x1 - 8x2 + 3x3 = 32
            x2 +  x3 = 1
Add two times equation 1 to equation 2, that is, perform the
operation (2) + 2(1), to get
    x1 + 4x2 + 2x3 = -2
                7x3 = 28
           x2 +  x3 = 1
At this point we should have liked X2 to appear in the second
equation: however this is not the case. To remedy the situ-
ation we interchange equations 2 and 3, in symbols (2) <-> (3).
The linear system now takes the form

    x1 + 4x2 + 2x3 = -2
           x2 +  x3 = 1
                7x3 = 28

Finally, multiply equation 3 by 1/7, that is, apply (1/7)(3), to get

    x1 + 4x2 + 2x3 = -2
           x2 +  x3 = 1
                 x3 = 4

This system can be solved quickly by a process called back
substitution. By the last equation x3 = 4, so we can substi-
tute x3 = 4 in the second equation to get x2 = -3. Finally,
substitute x3 = 4 and x2 = -3 in the first equation to get
x1 = 2. Hence the linear system has a unique solution.
Example 2.1.3
     x1 + 3x2 + 3x3 + 2x4 = 1
    2x1 + 6x2 + 9x3 + 5x4 = 5
    -x1 - 3x2 + 3x3       = 5
Apply operations (2) - 2(1) and (3) + (1) successively to
the linear system to get
    x1 + 3x2 + 3x3 + 2x4 = 1
               3x3 +  x4 = 3
               6x3 + 2x4 = 6
Since X2 has disappeared completely from the second and third
equations, we move on to the next unknown x3; applying (1/3)(2),
we obtain

    x1 + 3x2 + 3x3 +     2x4 = 1
                x3 + (1/3)x4 = 1
               6x3 +     2x4 = 6

Finally, operation (3) - 6(2) gives

    x1 + 3x2 + 3x3 +     2x4 = 1
                x3 + (1/3)x4 = 1
                           0 = 0
Here the third equation tells us nothing and can be ignored.
Now observe that we can assign arbitrary values c and d to
the unknowns x4 and x2 respectively, and then use back sub-
stitution to find x3 and x1. Hence the most general solution
of the linear system is

    x1 = -2 - c - 3d,   x2 = d,   x3 = 1 - c/3,   x4 = c.
Since c and d can be given arbitrary values, the linear system
has infinitely many solutions.
What has been learned from these three examples? In
the first place, the number of solutions of a linear system can
be 0, 1 or infinity. More importantly, we have seen that there
is a systematic method of eliminating some of the unknowns
from all equations of the system beyond a certain point, with
the result that a linear system is reached which is of such a
simple form that it is possible either to conclude that no solu-
tions exist or else to find all solutions by the process of back
substitution. This systematic procedure is called Gaussian
elimination; it is now time to give a general account of the
way in which it works.
The general theory of linear systems
Consider a set of m linear equations in n unknowns x1, x2,
..., xn:

    a11x1 + a12x2 + ... + a1nxn = b1
    a21x1 + a22x2 + ... + a2nxn = b2
    .................................
    am1x1 + am2x2 + ... + amnxn = bm
By a solution of the linear system we shall mean an n-column
vector
    ( x1 )
    ( x2 )
    ( .. )
    ( xn )

such that the scalars x1, x2, ..., xn satisfy all the equations
of the system. The set of all solutions is called the general
solution of the linear system; this is normally given in the form
of a single column vector containing a number of arbitrary
quantities. A linear system with no solutions is said to be
inconsistent.
Two linear systems which have the same sets of solutions
are termed equivalent. Now in the examples discussed above
three types of operation were applied to the linear systems:
(a) interchange of two equations;
(b) addition of a multiple of one equation to another
equation;
(c) multiplication of one equation by a non-zero scalar.
Notice that each of these operations is invertible. The critical
property of such operations is that, when they are applied
to a linear system, the resulting system is equivalent to the
original one. This fact was exploited in the three examples
above. Indeed, by the very nature of these operations, any
solution of the original system is bound to be a solution of the
new system, and conversely, by invertibility of the operations,
any solution of the new system is also a solution of the original
system. Thus we can state the fundamental theorem:
Theorem 2.1.1
When an operation of one of the three types (a), (b), (c) is
applied to a linear system, the resulting linear system is equiv-
alent to the original one.
We shall now exploit this result and describe the proce-
dure known as Gaussian elimination. In this a sequence of
operations of types (a), (b), (c) is applied to a linear system
in such a way as to produce an equivalent linear system whose
form is so simple that we can quickly determine its solutions.
Suppose that a linear system of m equations in n un-
knowns xi, X2, ..., xn is given. In Gaussian elimination the
following steps are to be carried out.
(i) Find an equation in which x1 appears and, if necessary,
interchange this equation with the first equation. Thus we can
assume that x1 appears in equation 1.
(ii) Multiply equation 1 by a suitable non-zero scalar in
such a way as to make the coefficient of x1 equal to 1.
(iii) Subtract suitable multiples of equation 1 from equa-
tions 2 through m in order to eliminate x1 from these equa-
tions.
(iv) Inspect equations 2 through m and find the first equa-
tion which involves one of the unknowns x2, ..., xn, say xi2.
By interchanging equations once again, we can suppose that
xi2 occurs in equation 2.
(v) Multiply equation 2 by a suitable non-zero scalar to
make the coefficient of xi2 equal to 1.
(vi) Subtract multiples of equation 2 from equations 3
through m to eliminate xi2 from these equations.
(vii) Examine equations 3 through m and find the first
one that involves an unknown other than x1 and xi2, say xi3.
By interchanging equations we may assume that xi3 actually
occurs in equation 3.
The next step is to make the coefficient of xi3 equal to 1,
and then to eliminate xi3 from equations 4 through m, and so
on.
The elimination procedure continues in this manner, pro-
ducing the so-called pivotal unknowns x1 = xi1, xi2, ..., xir,
until we reach a linear system in which no further unknowns
occur in the equations beyond the rth. A linear system of this
sort is said to be in echelon form; it will have the following
shape.

    xi1 + * xi2 + ... + * xn = *
            xi2 + ... + * xn = *
                    ...
                  xir + ... + * xn = *
                                 0 = *
                                  ...
                                 0 = *

Here the asterisks represent certain scalars and the ij are in-
tegers which satisfy 1 = i1 < i2 < ... < ir ≤ n. The unknowns
xij for j = 1 to r are the pivots.
Once echelon form has been reached, the behavior of the
linear system can be completely described and the solutions
- if any - obtained by back substitution, as in the preceding
examples. Consequently we have the following fundamental
result which describes the possible behavior of a linear system.
Theorem 2.1.2
(i) A linear system is consistent if and only if all the entries on
the right hand sides of those equations in echelon form which
contain no unknowns are zero.
(ii) If the system is consistent, the non-pivotal unknowns can
be given arbitrary values; the general solution is then obtained
by using back substitution to solve for the pivotal unknowns.
(iii) The system has a unique solution if and only if all the
unknowns are pivotal.
An important feature of Gaussian elimination is that it
constitutes a practical algorithm for solving linear systems
which can easily be implemented in one of the standard pro-
gramming languages.
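Here is one short Python sketch of such an implementation (my own illustration, not the text's program; the function name echelon_form is mine, and exact fractions are used so that no rounding occurs). It reduces a matrix, stored as a list of rows, to row echelon form using the three types of operation.

    from fractions import Fraction

    def echelon_form(M):
        """Return a row echelon form of M, given as a list of rows."""
        M = [[Fraction(x) for x in row] for row in M]
        rows, cols = len(M), len(M[0])
        pivot_row = 0
        for col in range(cols):
            # type (a): find a row at or below pivot_row with a non-zero entry here
            pivot = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
            if pivot is None:
                continue
            M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
            # type (c): scale the pivot row so that the leading entry becomes 1
            lead = M[pivot_row][col]
            M[pivot_row] = [x / lead for x in M[pivot_row]]
            # type (b): subtract multiples of the pivot row from the rows below it
            for r in range(pivot_row + 1, rows):
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[pivot_row])]
            pivot_row += 1
            if pivot_row == rows:
                break
        return M

    # The augmented matrix of the system in Example 2.1.3
    M = [[1, 3, 3, 2, 1],
         [2, 6, 9, 5, 5],
         [-1, -3, 3, 0, 5]]
    for row in echelon_form(M):
        print([str(x) for x in row])
    # rows: [1, 3, 3, 2, 1], [0, 0, 1, 1/3, 1], [0, 0, 0, 0, 0]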
Gauss-Jordan elimination
Let us return to the echelon form of the linear system
described above. We can further simplify the system by sub-
tracting a multiple of equation 2 from equation 1 to eliminate
xi2 from that equation. Now xi2 occurs only in the second
equation. Similarly we can eliminate xi3 from equations 1
and 2 by subtracting multiples of equation 3 from these equa-
tions. And so on. Ultimately a linear system is reached which
is in reduced echelon form.
Here each pivotal unknown appears in precisely one equa-
tion; the non-pivotal unknowns may be given arbitrary values
and the pivotal unknowns are then determined directly from
the equations without back substitution.
The procedure for reaching reduced echelon form is called
Gauss-Jordan elimination: while it results in a simpler type of
linear system, this is accomplished at the cost of using more
operations.
Example 2.1.4
In Example 2.1.3 above we obtained a linear system in echelon
form
    x1 + 3x2 + 3x3 +     2x4 = 1
                x3 + (1/3)x4 = 1

Here the pivots are x1 and x3. One further operation must
be applied to put the system in reduced row echelon form,
namely (1) - 3(2); this gives

    x1 + 3x2 +            x4 = -2
                x3 + (1/3)x4 = 1

To obtain the general solution give the non-pivotal unknowns
x2 and x4 the arbitrary values d and c respectively, and then
read off directly the values x1 = -2 - c - 3d and x3 = 1 - c/3.
Homogeneous linear systems
A very important type of linear system occurs when all
the scalars on the right hand sides of the equations equal zero.
    a11x1 + a12x2 + ... + a1nxn = 0
    a21x1 + a22x2 + ... + a2nxn = 0
    .................................
    am1x1 + am2x2 + ... + amnxn = 0
Such a system is called homogeneous. It will always have the
trivial solution x1 = 0, x2 = 0, ..., xn = 0; thus a homogeneous
linear system is always consistent. The interesting question
about a homogeneous linear system is whether it has any non-
trivial solutions. The answer is easily read off from the echelon
form.
Theorem 2.1.3
A homogeneous linear system has a non-trivial solution if and
only if the number of pivots in echelon form is less than the
number of unknowns.
For if the number of unknowns is n and the number of
pivots is r, the n - r non-pivotal unknowns can be given arbi-
trary values, so there will be a non-trivial solution whenever
n - r > 0. On the other hand, if n = r, none of the unknowns
can be given arbitrary values, and there is only one solution,
namely the trivial one, as we see from reduced echelon form.
Corollary 2.1.4
A homogeneous linear system of m equations in n unknowns
always has a non-trivial solution if m < n.
For if r is the number of pivots, then r ≤ m < n.
Example 2.1.5
For which values of the parameter t does the following homo-
geneous linear system have non-trivial solutions?
    6x1 -  x2 +  x3 = 0
    tx1        +  x3 = 0
            x2 + tx3 = 0
It suffices to find the number of pivotal unknowns. We
proceed to put the linear system in echelon form by applying
to it successively the operations (1/6)(1), (2) - t(1), (2) <-> (3)
and (3) - (t/6)(2):

    x1 - (1/6)x2 + (1/6)x3 = 0
               x2 +    tx3 = 0
       (1 - t/6 - t^2/6)x3 = 0
The number of pivots will be less than 3, the number of un-
knowns, precisely when 1 - t/6 - t^2/6 equals zero, that is,
when t = 2 or t = -3. These are the only values of t for
which the linear system has non-trivial solutions.
The reader will have noticed that we deviated slightly
from the procedure of Gaussian elimination; this was to avoid
dividing by t/6, which would have necessitated a separate dis-
cussion of the case t = 0.
Exercises 2.1
In the first three problems find the general solution or else
show that the linear system is inconsistent.
1.    x1 + 2x2 - 3x3 + x4 = 7
     -x1 +  x2 -  x3 + x4 = 4

2.        + x2 -  x3 -  x4 = 0
               +  x3 -  x4 = -1
         + 2x2 +  x3 - 3x4 = 2

3.    x1 +  x2 + 2x3 = 4
      x1 -  x2 -  x3 = -1
     2x1 - 4x2 - 5x3 = 1

4. Solve the following homogeneous linear systems:

    (a)   x1 +  x2 + x3 + x4 = 0
         2x1 + 2x2 + x3 + x4 = 0
          x1 +  x2 - x3 + x4 = 0

    (b)   2x1 -  x2 + 3x3 = 0
          4x1 + 2x2 + 2x3 = 0
         -2x1 + 5x2 - 4x3 = 0

5. For which values of t does the following homogeneous linear
system have non-trivial solutions?

    12x1 -  x2 +  x3 = 0
     tx1        +  x3 = 0
             x2 + tx3 = 0
6. For which values of t is the following linear system consis-
tent?
7. How many operations of types (a), (b), (c) are needed in
general to put a system of n linear equations in n unknowns
in echelon form?
2.2 Elementary Row Operations
If we examine more closely the process of Gaussian elim-
ination described in 2.1, it is apparent that much time and
trouble could be saved by working directly with the augmented
matrix of the linear system and applying certain operations
to its rows. In this way we avoid having to write out the
unknowns repeatedly.
The row operations referred to correspond to the three
types of operation that may be applied to a linear system dur-
ing Gaussian elimination. These are the so-called elementary
row operations and they can be applied to any matrix. The
row operations together with their symbolic representations
are as follows:
(a) interchange rows i and j, (Ri <-> Rj);
(b) add c times row j to row i where c is any scalar,
(Ri + cRj);
(c) multiply row i by a non-zero scalar c, (cRi).
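To see the three types of operation in action, here is a minimal Python sketch (an illustration, not part of the text); note that numpy arrays are indexed from 0, whereas the text numbers rows from 1.

```python
import numpy as np

def swap(A, i, j):             # type (a): Ri <-> Rj
    A[[i, j]] = A[[j, i]]

def add_multiple(A, i, j, c):  # type (b): Ri + c*Rj
    A[i] = A[i] + c * A[j]

def scale(A, i, c):            # type (c): c*Ri, with c != 0
    A[i] = c * A[i]

A = np.array([[1., 3., 3., 2., 1.],
              [2., 6., 9., 5., 5.],
              [-1., -3., 3., 0., 5.]])
add_multiple(A, 1, 0, -2.0)    # R2 - 2*R1
add_multiple(A, 2, 0, 1.0)     # R3 + R1
print(A)
```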
From the matrix point of view the essential content of The-
orem 2.1.2 is that any matrix can be put in what is called
row echelon form by application of a suitable finite sequence
of elementary row operations. A matrix in row echelon form
has the typical "descending staircase" form

    ( 0 · · · 0  1  *  * · · · *  *  * · · · * )
    ( 0 · · · 0  0  0  1 · · · *  *  * · · · * )
    ( 0 · · · 0  0  0  0 · · · 0  1  * · · · * )
    ( ....................................... )
    ( 0 · · · 0  0  0  0 · · · 0  0  0 · · · 0 )
Here the asterisks denote certain scalars.
Example 2.2.1
Put the following matrix in row echelon form by applying
suitable elementary row operations:

    (  1  3  3  2  1 )
    (  2  6  9  5  5 )
    ( -1 -3  3  0  5 )

Applying the row operations R2 - 2R1 and R3 + R1, we
obtain

    ( 1  3  3  2  1 )
    ( 0  0  3  1  3 )
    ( 0  0  6  2  6 )

Then, after applying the operations (1/3)R2 and R3 - 6R2, we
get

    ( 1  3  3   2   1 )
    ( 0  0  1  1/3  1 )
    ( 0  0  0   0   0 )

which is in row echelon form.
Suppose now that we wish to solve the linear system with
matrix form AX = B, using elementary row operations. The
first step is to identify the augmented matrix M = [A | B].
Then we put M in row echelon form, using row operations.
From this we can determine if the original linear system is
consistent; for this to be true, in the row echelon form of M
the scalars in the last column which lie below the final pivot
must all be zero. To find the general solution of a consistent
system we convert the row echelon matrix back to a linear
system and use back substitution to solve it.
Example 2.2.2
Consider once again the linear system of Example 2.1.3:

     x1 + 3x2 + 3x3 + 2x4 = 1
    2x1 + 6x2 + 9x3 + 5x4 = 5
    -x1 - 3x2 + 3x3       = 5

The augmented matrix here is

    (  1  3  3  2 | 1 )
    (  2  6  9  5 | 5 )
    ( -1 -3  3  0 | 5 )

Now we have just seen in Example 2.2.1 that this matrix has
row echelon form

    ( 1  3  3   2  | 1 )
    ( 0  0  1  1/3 | 1 )
    ( 0  0  0   0  | 0 )

Because the lower right hand entry is 0, the linear system is
consistent. The linear system corresponding to the last matrix
is

    x1 + 3x2 + 3x3 + 2x4 = 1
              x3 + (1/3)x4 = 1
                         0 = 0

Hence the general solution given by back substitution is x1 =
-2 - c - 3d, x2 = d, x3 = 1 - c/3, x4 = c, where c and d are
arbitrary scalars.
The matrix formulation enables us to put our conclusions
about linear systems in a succinct form.
Theorem 2.2.1
Let AX = B be a linear system of equations in n unknowns
with augmented matrix M = [A | B].
(i) The linear system is consistent if and only if the matri-
ces A and M have the same numbers of pivots in row echelon
form.
(ii) If the linear system is consistent and r denotes the
number of pivots of A in row echelon form, then the n — r
unknowns that correspond to columns of A not containing a
pivot can be given arbitrary values. Thus the system has a
unique solution if and only if r = n.
Proof
For the linear system to be consistent, the row echelon form
of M must have only zero entries in the last column below the
final pivot; but this is just the condition for A and M to have
the same numbers of pivots.
Finally, if the linear system is consistent, the unknowns
corresponding to columns that do not contain pivots may be
given arbitrary values and the remaining unknowns found by
back substitution.
Reduced row echelon form
A matrix is said to be in reduced row echelon form if it is
in row echelon form and if in each column containing a pivot
all entries other than the pivot itself are zero.
Example 2.2.3
Put the matrix

    ( 1  1  2   2 )
    ( 4  4  9  10 )
    ( 3  3  6   7 )

in reduced row echelon form.
By applying suitable row operations we find the row ech-
elon form to be

    ( 1  1  2  2 )
    ( 0  0  1  2 )
    ( 0  0  0  1 )

Notice that columns 1, 3 and 4 contain pivots. To pass to
reduced row echelon form, apply the row operations R1 - 2R2,
R1 + 2R3 and R2 - 2R3: the answer is

    ( 1  1  0  0 )
    ( 0  0  1  0 )
    ( 0  0  0  1 )
As this example illustrates, one can pass from row echelon
form to reduced row echelon form by applying further row op-
erations; notice that this will not change the number of pivots.
Thus an arbitrary matrix can be put in reduced row echelon
form by applying a finite sequence of elementary row opera-
tions. The reader should observe that this is just the matrix
formulation of the Gauss-Jordan elimination procedure.
Exercises 2.2
1. Put each of the following matrices in row echelon form:

    (a)       ,   (b)       ,   (c)  ( 1  2  -3  1 )
                                     ( 3  1   2  2 )
                                     ( 8  1   9  1 )
2. Put each of the matrices in Exercise 2.2.1 in reduced row
echelon form.
3. Prove that the row operation of type (a) which interchanges
rows i and j can be obtained by a combination of row opera-
tions of the other two types, that is, types (b) and (c).
4. Do Exercises 2.1.1 to 2.1.4 by applying row operations to
the augmented matrices.
5. How many row operations are needed in general to put an
n x n matrix in row echelon form?
6. How many row operations are needed in general to put an
n x n matrix in reduced row echelon form?
7. Give an example to show that a matrix can have more than
one row echelon form.
8. If A is an invertible n x n matrix, prove that the linear
system AX = B has a unique solution. What does this tell
you about the number of pivots of A?
9. Show that each elementary row operation has an inverse
which is also an elementary row operation.
2.3 Elementary Matrices
An nxn matrix is called elementary if it is obtained from
the identity matrix In in one of three ways:
(a) interchange rows i and j where i ≠ j;
(b) insert a scalar c as the (i, j) entry where i ≠ j;
(c) put a non-zero scalar c in the (i, i) position.
Example 2.3.1
Write down all the possible types of elementary 2x2 matrices.
These are the elementary matrices that arise from the matrix
I2 = ( 1 0 ); they are
     ( 0 1 )

    E1 = ( 0 1 ),   E2 = ( 1 c ),   E3 = ( 1 0 ),
         ( 1 0 )         ( 0 1 )         ( c 1 )

    E4 = ( c 0 )   and   E5 = ( 1 0 ).
         ( 0 1 )              ( 0 c )

Here c is a scalar which must be non-zero in the case of E4
and E5.
The significance of elementary matrices from our point of
view lies in the fact that when we premultiply a matrix by an
elementary matrix, the effect is to perform an elementary row
operation on the matrix. For example, with the matrix
    A = ( a11  a12 )
        ( a21  a22 )

and elementary matrices listed in Example 2.3.1, we have

    E1 A = ( a21  a22 ),   E2 A = ( a11 + c a21   a12 + c a22 )
           ( a11  a12 )           (     a21            a22    )

and

    E5 A = (  a11    a12  )
           ( c a21  c a22 ).

Thus premultiplication by E1 interchanges rows 1 and 2; pre-
multiplication by E2 adds c times row 2 to row 1; premultipli-
cation by E5 multiplies row 2 by c. What then is the general
rule?
Theorem 2.3.1
Let A be an m x n matrix and let E be an elementary m xm
matrix.
(i) // E is of type (a), then EA is the matrix obtained
from A by interchanging rows i and j of A;
(ii) if E is type (b), then EA is the matrix obtained from
A by adding c times row j to row i;
(iii) if E is of type (c), then EA arises from A by multi-
plying row i by c.
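A quick way to convince oneself of Theorem 2.3.1 is to build the three types of elementary matrix from the identity and premultiply, as in the following illustrative Python sketch (the function names and the 0-based indices are choices made here).

```python
import numpy as np

def E_swap(n, i, j):          # type (a): interchange rows i and j of I_n
    E = np.eye(n); E[[i, j]] = E[[j, i]]; return E

def E_add(n, i, j, c):        # type (b): put the scalar c in the (i, j) position of I_n
    E = np.eye(n); E[i, j] = c; return E

def E_scale(n, i, c):         # type (c): non-zero scalar c in the (i, i) position
    E = np.eye(n); E[i, i] = c; return E

A = np.array([[1., 2.], [3., 4.]])
print(E_swap(2, 0, 1) @ A)      # rows 1 and 2 interchanged
print(E_add(2, 0, 1, 5.) @ A)   # 5 times row 2 added to row 1
print(E_scale(2, 1, 7.) @ A)    # row 2 multiplied by 7
```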
Now recall from 2.2 that every matrix can be put in re-
duced row echelon form by applying elementary row opera-
tions. Combining this observation with 2.3.1, we obtain
Theorem 2.3.2
Let A be any m x n matrix. Then there exist elementary m x m
matrices E1, E2, ..., Ek such that the matrix Ek Ek-1 · · · E1 A
is in reduced row echelon form.
Example 2.3.2
Consider the matrix

    A = ( 0 1 2 )
        ( 2 1 0 ).

We easily put this in reduced row echelon form B by applying
successively the row operations R1 <-> R2, (1/2)R1, R1 - (1/2)R2:

    A -> ( 2 1 0 ) -> ( 1 1/2 0 ) -> ( 1 0 -1 ) = B.
         ( 0 1 2 )    ( 0  1  2 )    ( 0 1  2 )

Hence E3 E2 E1 A = B where

    E1 = ( 0 1 ),   E2 = ( 1/2 0 ),   E3 = ( 1 -1/2 ).
         ( 1 0 )         (  0  1 )         ( 0   1  )
Column operations
Just as for rows, there are three types of elementary col-
umn operation, namely:
(a) interchange columns i and j, (Ci <-> Cj);
(b) add c times column j to column i where c is a scalar,
(Ci + cCj);
(c) multiply column i by a non-zero scalar c, (cCi).
(The reader is warned, however, that column operations can-
not in general be applied to the augmented matrix of a linear
system without changing the solutions of the system.)
The effect of applying an elementary column operation
to a matrix is simulated by right multiplication by a suitable
elementary matrix. But there is one important difference from
the row case. In order to perform the operation Ci + cCj to a
matrix A one multiplies on the right by the elementary matrix
whose (j, i) element is c. For example, let

    E = ( 1 0 )   and   A = ( a11  a12 )
        ( c 1 )             ( a21  a22 ).

Then

    AE = ( a11 + c a12   a12 )
         ( a21 + c a22   a22 ).

Thus E performs the column operation C1 + cC2 and not
C2 + cC1. By multiplying a matrix on the right by suitable
sequences of elementary matrices, a matrix can be put in col-
umn echelon form or in reduced column echelon form; these
are just the transposes of row echelon form and reduced row
echelon form respectively.
Example 2.3.3
Put the matrix A = ( 3 6 2 ) in reduced column echelon form.
                   ( 1 2 7 )

Apply the column operations (1/3)C1, C2 - 6C1, C3 - 2C1,
C2 <-> C3, (3/19)C2, and C1 - (1/3)C2:

    A -> ( 1   6 2 ) -> ( 1    0   0   ) -> ( 1     0   0 )
         ( 1/3 2 7 )    ( 1/3  0  19/3 )    ( 1/3 19/3  0 )

      -> ( 1   0 0 ) -> ( 1 0 0 )
         ( 1/3 1 0 )    ( 0 1 0 ).
We leave the reader to write down the elementary matrices
that produce these column operations.
Now suppose we are allowed to apply both row and column
operations to a matrix. Then we can obtain first row echelon
form; subsequently column operations may be applied to give
a matrix of the very simple type
    ( Ir  0 )
    ( 0   0 )
where r is the number of pivots. This is called the normal
form of the matrix; we shall see in 5.2 that every matrix has
a unique normal form. These conclusions are summed up in
the following result.
Theorem 2.3.3
Let A be an m x n matrix. Then there exist elementary m x m
matrices E1, ..., Ek and elementary n x n matrices F1, ..., Fl
such that

    Ek · · · E1 A F1 · · · Fl = N,

the normal form of A.
Proof
By applying suitable row operations to A we can find elemen-
tary matrices E1, ..., Ek such that B = Ek · · · E1 A is in row
echelon form. Then column operations are applied to reduce
B to normal form; this procedure yields elementary matrices
F1, ..., Fl such that N = B F1 · · · Fl = Ek · · · E1 A F1 · · · Fl is
the normal form of A.
Corollary 2.3.4
For any matrix A there are invertible matrices X and Y such
that N = XAY, or equivalently A = X^-1 N Y^-1, where N is
the normal form of A.
For it is easy to see that every elementary matrix is in-
vertible; indeed the inverse matrix represents the inverse of the
corresponding elementary row (or column) operation. Since
by 1.2.3 any product of invertible matrices is invertible, the
corollary follows from 2.3.3.
Example 2.3.4
Let A = ( 1 2 2 ). Find the normal form N of A and write
        ( 2 3 4 )
N as the product of A and elementary matrices as specified
in 2.3.3.
All we need do is to put A in normal form, while keeping
track of the elementary matrices that perform the necessary
row and column operations. Thus

    A -> ( 1  2 2 ) -> ( 1 2 2 ) -> ( 1 0 2 ) -> ( 1 0 0 )
         ( 0 -1 0 )    ( 0 1 0 )    ( 0 1 0 )    ( 0 1 0 )

which is the normal form of A. Here three row operations and
one column operation were used to reduce A to its normal
form. Therefore

    E3 E2 E1 A F1 = N

where

    E1 = (  1 0 ),   E2 = ( 1  0 ),   E3 = ( 1 -2 )
         ( -2 1 )         ( 0 -1 )         ( 0  1 )

and

    F1 = ( 1 0 -2 )
         ( 0 1  0 )
         ( 0 0  1 ).
Inverses of matrices
Inverses of matrices were defined in 1.2, but we deferred
the important problem of computing inverses until more was
known about linear systems. It is now time to address this
problem. Some initial information is given by
Theorem 2.3.5
Let A be an n x n matrix. Then the following statements about
A are equivalent, that is, each one implies all of the others.
(a) A is invertible;
(b) the linear system AX = 0 has only the trivial solution;
(c) the reduced row echelon form of A is In;
(d) A is a product of elementary matrices.
Proof
We shall establish the logical implications (a) -> (b), (b) -> (c),
(c) -> (d), and (d) -> (a). This will serve to establish the
equivalence of the four statements.
If (a) holds, then A^-1 exists; thus if we multiply both
sides of the equation AX = 0 on the left by A^-1, we get
A^-1 A X = A^-1 0, so that X = A^-1 0 = 0 and the only solution
of the linear system is the trivial one. Thus (b) holds.
If (b) holds, then we know from 2.1.3 that the number of
pivots of A in reduced row echelon form is n. Since A is n x n,
this must mean that In is the reduced row echelon form of A,
so that (c) holds.
If (c) holds, then 2.3.2 shows that there are elementary
matrices E1, ..., Ek such that Ek · · · E1 A = In. Since elemen-
tary matrices are invertible, Ek · · · E1 is invertible, and thus
A = (Ek · · · E1)^-1 = E1^-1 · · · Ek^-1, so that (d) is true.
Finally, (d) implies (a) since a product of elementary ma-
trices is always invertible.
A procedure for finding the inverse of a matrix
As an application of the ideas in this section, we shall
describe an efficient method of computing the inverse of an
invertible matrix.
Suppose that A is an invertible n x n matrix. Then
there exist elementary n x n matrices E1, E2, ..., Ek such that
Ek · · · E2 E1 A = In, by 2.3.2 and 2.3.5. Therefore

    A^-1 = In A^-1 = (Ek · · · E2 E1 A) A^-1 = (Ek · · · E2 E1) In.

This means that the row operations which reduce A to its
reduced row echelon form In will automatically transform In
to A^-1. It is this crucial observation which enables us to
compute A^-1.
The procedure for computing A^-1 starts with the parti-
tioned matrix

    [A | In]

and then puts it in reduced row echelon form. If A is invertible,
the reduced row echelon form will be

    [In | A^-1],
as the discussion just given shows. On the other hand, if the
procedure is applied to a matrix that is not invertible, it will
be impossible to reach a reduced row echelon form of the above
type, that is, one with In on the left. Thus the procedure will
also detect non-invertibility of a matrix.
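The following Python sketch carries out this procedure on the partitioned matrix [A | In]; it is only an illustration of the idea (the function name inverse and the use of exact fractions are choices made here), applied for concreteness to the same 3 x 3 matrix as in Example 2.3.5 below.

```python
from fractions import Fraction

def inverse(rows):
    """Invert a square matrix by row-reducing [A | I_n] to [I_n | A^-1]."""
    n = len(rows)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(rows)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]   # the right-hand block is A^-1

print(inverse([[2, -1, 0], [-1, 2, -1], [0, -1, 2]]))
```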
Example 2.3.5
Find the inverse of the matrix

    A = (  2 -1  0 )
        ( -1  2 -1 )
        (  0 -1  2 ).

Put the matrix [A | I3] in reduced row echelon form, using
elementary row operations as described above:

    (  2 -1  0 | 1 0 0 )      ( 1 -1/2  0 | 1/2 0 0 )
    ( -1  2 -1 | 0 1 0 )  ->  ( 0  3/2 -1 | 1/2 1 0 )
    (  0 -1  2 | 0 0 1 )      ( 0  -1   2 |  0  0 1 )

    ( 1  0 -1/3 | 2/3 1/3 0 )      ( 1 0 0 | 3/4 1/2 1/4 )
 -> ( 0  1 -2/3 | 1/3 2/3 0 )  ->  ( 0 1 0 | 1/2  1  1/2 )
    ( 0  0  4/3 | 1/3 2/3 1 )      ( 0 0 1 | 1/4 1/2 3/4 )

which is the reduced row echelon form. Therefore A is invert-
ible and

    A^-1 = ( 3/4  1/2  1/4 )
           ( 1/2   1   1/2 )
           ( 1/4  1/2  3/4 ).
This answer can be verified by checking that A A^-1 = I3 =
A^-1 A.
As this example illustrates, the procedure for finding the
inverse of an n x n matrix is an efficient one; in fact at most
n^2 row operations are required to complete it (see Exercise
2.3.10).
Exercises 2.3
1. Express each of the following matrices as a product of
elementary matrices and its reduced row echelon form:
2. Express the second matrix in Exercise 1 as a product of
elementary matrices and its reduced column echelon form.
3. Find the normal form of each matrix in Exercise 1.
4. Find the inverses of the three types of elementary matrix,
and observe that each is elementary and corresponds to the
inverse row operation.
5. What is the maximum number of column operations needed
in general to put an n x n matrix in column echelon form and
in reduced column echelon form?
6. Compute the inverses of the following matrices if they exist:

    (a)  ( 2 -3  1 ),   (b)        ,   (c)  (  2  1  7 )
         ( 1  0  2 )                        ( -1  4 10 )
         ( 0 -1 -3 )                        (  3  2 12 )

7. For which values of t does the matrix

    ( 6 -1  1 )
    ( t  0  1 )
    ( 0  1  t )
not have an inverse?
8. Give necessary and sufficient conditions for an upper tri-
angular matrix to be invertible.
9. Show by an example that if an elementary column opera-
tion is applied to the augmented matrix of a linear system, the
resulting linear system need not be equivalent to the original
one.
10. Prove that the number of elementary row operations
needed to find the inverse of an n x n matrix is at most n^2.
Chapter Three
DETERMINANTS
Associated with every square matrix is a scalar called the
determinant. Perhaps the most striking property of the de-
terminant of a matrix is the fact that it tells us if the matrix
is invertible. On the other hand, there is obviously a limit
to the amount of information about a matrix which can be
carried by a single scalar, and this is probably why determi-
nants are considered less important today than, say, a hundred
years ago. Nevertheless, associated with an arbitrary square
matrix is an important polynomial, the characteristic poly-
nomial, which is a determinant. As we shall see in Chapter
Eight, this polynomial carries a vast amount of information
about the matrix.
3.1 Permutations and the Definition of a Determinant
Let A = (aij) be an n x n matrix over some field of scalars
(which the reader should feel free to assume is either R or C).
Our first task is to show how to define the determinant of A,
which will be written either
det(A)
or else in the extended form
    | a11  a12  · · ·  a1n |
    | a21  a22  · · ·  a2n |
    |  ..   ..          .. |
    | an1  an2  · · ·  ann |

For n = 1 and 2 the definition is simple enough:

    |a11| = a11   and   | a11  a12 | = a11 a22 - a12 a21.
                        | a21  a22 |

For example, |6| = 6 and

    | 2  -3 |
    | 4   1 | = 14.
Where does the expression a11 a22 - a12 a21 come from?
The motivation is provided by linear systems. Suppose that
we want to solve the linear system
    a11 x1 + a12 x2 = b1
    a21 x1 + a22 x2 = b2

for unknowns x1 and x2. Eliminate x2 by subtracting a12
times equation 2 from a22 times equation 1; in this way we
obtain

    (a11 a22 - a12 a21) x1 = b1 a22 - a12 b2.

This equation expresses x1 as the quotient of a pair of 2 x 2
determinants:

          | b1   a12 |
          | b2   a22 |
    x1 = ------------- ,
          | a11  a12 |
          | a21  a22 |
provided, of course, that the denominator does not vanish.
There is a similar expression for x2.
The preceding calculation indicates that 2 x 2 determi-
nants are likely to be of significance for linear systems. And
this is confirmed if we try the same computation for a lin-
ear system of three equations in three unknowns. While the
resulting solutions are complicated, they do suggest the fol-
lowing definition for det(^4) where A = (0,^)3,3;
    a11 a22 a33 + a12 a23 a31 + a13 a21 a32
      - a12 a21 a33 - a13 a22 a31 - a11 a23 a32.
What are we to make of this expression? In the first place it
contains six terms, each of which is a product of three entries
of A. The second subscripts in each term correspond to the
six ways of ordering the integers 1, 2, 3, namely
1,2,3 2,3,1 3,1,2 2,1,3 3,2,1 1,3,2.
Also each term is a product of three entries of A, while three
of the terms have positive signs and three have negative signs.
There is something of a pattern here, but how can one
tell which terms are to get a plus sign and which are to get a
minus sign? The answer is given by permutations.
Permutations
Let n be a fixed positive integer. By a permutation of
the integers 1, 2,..., n we shall mean an arrangement of these
integers in some definite order. For example, as has been
observed, there are six permutations of the integers 1, 2, 3.
In general, a permutation of 1, 2,..., n can be written in
the form
    i1, i2, ..., in

where i1, i2, ..., in are the integers 1, 2, ..., n in some order.
Thus to construct a permutation we have only to choose dis-
tinct integers i1, i2, ..., in from the set {1, 2, ..., n}. Clearly
there are n choices for i1; once i1 has been chosen, it cannot
be chosen again, so there are just n - 1 choices for i2; since
i1 and i2 cannot be chosen again, there are n - 2 choices for
i3, and so on. There will be only one possible choice for in
since n - 1 integers have already been selected. The number
of ways of constructing a permutation is therefore equal to the
product of these numbers

    n(n - 1)(n - 2) · · · 2 · 1,
which is written
n!
and referred to as "n factorial". Thus we can state the follow-
ing basic result.
Theorem 3.1.1
The number of permutations of the integers 1,2,... ,n equals
n! = n(n - 1) · · · 2 · 1.
Even and odd permutations
A permutation of the integers 1,2, ...,n is called even
or odd according to whether the number of inversions of the
natural order 1,2,... ,n that are present in the permutation
is even or odd respectively. For example, the permutation 1,
3, 2 involves a single inversion, for 3 comes before 2; so this
is an odd permutation. For permutations of longer sequences
of integers it is advantageous to count inversions by means of
what is called a crossover diagram. This is best explained by
an example.
Example 3.1.1
Is the permutation 8, 3, 2, 6, 5, 1, 4, 7 even or odd?
The procedure is to write the integers 1 through 8 in the
natural order in a horizontal line, and then to write down the
entries of the permutation in the line below. Join each integer
i in the top line to the same integer i where it appears in the
bottom line, taking care to avoid multiple intersections. The
number of intersections or crossovers will be the number of
inversions present in the permutation:
1 2 3 4 5 6 7 8
8 3 2 6 5 1 4 7
Since there are 15 crossovers in the diagram, this permutation
is odd.
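Counting inversions is also easy to do mechanically; the short Python sketch below (an illustration only, with the permutation written as a tuple of the values 1 to n) returns the sign of a permutation.

```python
from itertools import combinations

def sign(perm):
    """Sign of a permutation of 1..n: +1 if even, -1 if odd."""
    inversions = sum(1 for i, j in combinations(range(len(perm)), 2)
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

print(sign((8, 3, 2, 6, 5, 1, 4, 7)))   # -1, since there are 15 inversions
```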
A transposition is a permutation that is obtained from
1, 2,..., n by interchanging just two integers. Thus
2,1, 3,4,..., n is an example of a transposition. An important
fact about transpositions is that they are always odd.
Theorem 3.1.2
Transpositions are odd permutations.
Proof
Consider the transposition which interchanges i and j , with
i < j say. The crossover diagram for this transposition is
    1  2  ...  i  i+1  ...  j-1  j  j+1  ...  n
    1  2  ...  j  i+1  ...  j-1  i  j+1  ...  n

Each of the j - i - 1 integers i + 1, i + 2, ..., j - 1 gives rise
to 2 crossovers, while i and j add one more. Hence the total
number of crossovers in the diagram equals 2(j - i - 1) + 1,
which is odd.
It is important to determine the numbers of even and odd
permutations.
Theorem 3.1.3
If n > 1, there are n!/2 even permutations of 1, 2, ..., n and
the same number of odd permutations.
Proof
If the first two integers are interchanged in a permutation,
it is clear from the crossover diagram that an inversion is
either added or removed. Thus the operation changes an even
permutation to an odd permutation and an odd permutation
to an even one. This makes it clear that the numbers of even
and odd permutations must be equal. Since the total number
of permutations is n!, the result follows.
Example 3.1.2
The even permutations of 1, 2, 3 are
1,2,3 2,3,1 3,1,2,
while the odd permutations are
2,1,3 3,2,1 1,3,2.
Next we define the sign of a permutation i1, i2, ..., in

    sign(i1, i2, ..., in)

to be +1 if the permutation is even and -1 if the permutation
is odd. For example, sign(3, 2, 1) = -1 since 3, 2, 1 is an odd
permutation.
Permutation matrices
Before proceeding to the formal definition of a determi-
nant, we pause to show how permutations can be represented
by matrices. An nxn matrix is called a permutation matrix if
it can be obtained from the identity matrix In by rearranging
the rows or columns. For example, the permutation matrix
    ( 0 1 0 )
    ( 0 0 1 )
    ( 1 0 0 )

is obtained from I3 by cyclically permuting the columns,
C1 -> C2 -> C3 -> C1. Permutation matrices are easy to recognize
since each row and each column contains a single 1, while all
other entries are zero.
Consider a permutation i1, i2, ..., in of 1, 2, ..., n, and let
P be the permutation matrix which has (j, ij) entry equal to
1 for j = 1, 2, ..., n, and all other entries zero. This means
that P is obtained from In by rearranging the columns in
the manner specified by the permutation i1, ..., in, that is,
Cj -> C_ij. Then, as matrix multiplication shows,

    P ( 1 )   ( i1 )
      ( 2 ) = ( i2 )
      ( : )   ( :  )
      ( n )   ( in )
Thus the effect of a permutation on the order 1,2,... ,n is
reproduced by left multiplication by the corresponding per-
mutation matrix.
Example 3.1.3
The permutation matrix which corresponds to the permuta-
tion 4, 2, 1, 3 is obtained from I4 by the column replacements
C1 -> C4, C2 -> C2, C3 -> C1, C4 -> C3. It is

    P = ( 0 0 0 1 )
        ( 0 1 0 0 )
        ( 1 0 0 0 )
        ( 0 0 1 0 )

and indeed

    P ( 1 )   ( 4 )
      ( 2 ) = ( 2 )
      ( 3 )   ( 1 )
      ( 4 )   ( 3 ).
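As an illustration (not from the text), the construction of a permutation matrix and a check of this fact can be sketched in Python as follows; indices inside the code are 0-based.

```python
import numpy as np

def permutation_matrix(perm):
    """Matrix with (j, i_j) entry 1, built from I_n by the column scheme C_j -> C_{i_j}."""
    n = len(perm)
    P = np.zeros((n, n), dtype=int)
    for j, ij in enumerate(perm):      # row j (0-based), column ij - 1
        P[j, ij - 1] = 1
    return P

P = permutation_matrix((4, 2, 1, 3))
print(P @ np.array([1, 2, 3, 4]))      # [4 2 1 3]
```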
Definition of a determinant in general
We are now in a position to define the general n x n
determinant. Let A = (aij)n,n be an n x n matrix over some
field of scalars. Then the determinant of A is the scalar defined
by the equation

    det(A) = Σ sign(i1, i2, ..., in) a1i1 a2i2 · · · anin

where the sum is taken over all permutations i1, i2, ..., in of
1, 2, ..., n.
Thus det(A) is a sum of n! terms each of which involves
a product of n elements of A, one from each row and one
from each column. A term has a positive or negative sign
according to whether the corresponding permutation is even or
odd respectively. One determinant which can be immediately
evaluated from the definition is that of In:
det(In) = 1.
This is because only the permutation 1,2,... ,n contributes a
non-zero term to the sum that defines det(In).
If we specialise the above definition to the cases n =
1, 2, 3, we obtain the expressions for det(A) given at the be-
ginning of the section. For example, let n = 3; the even and
odd permutations are listed above in Example 3.1.2. If we
write down the terms of the determinant in the same order,
we obtain
    a11 a22 a33 + a12 a23 a31 + a13 a21 a32
      - a12 a21 a33 - a13 a22 a31 - a11 a23 a32.
We could in a similar fashion write down the general 4 x 4 de-
terminant as a sum of 4! =24 terms, 12 with a positive sign
and 12 with a negative sign. Of course, it is clear that the
definition does not provide a convenient means of comput-
ing determinants with large numbers of rows and columns;
we shall shortly see that much more efficient procedures are
available.
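Although inefficient, the defining sum can be written out directly in a few lines; the following Python sketch (names and 0-based indexing are choices made here) evaluates a determinant from the definition and reproduces the value -22 obtained for the matrix of Example 3.1.5 below.

```python
from itertools import permutations
from math import prod

def sign(perm):
    inversions = sum(1 for a in range(len(perm)) for b in range(a + 1, len(perm))
                     if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def det_by_definition(A):
    """Sum of sign(i1,...,in) * a[1,i1] * ... * a[n,in] over all permutations."""
    n = len(A)
    return sum(sign(p) * prod(A[r][p[r]] for r in range(n))
               for p in permutations(range(n)))

print(det_by_definition([[1, 2, 0], [4, 2, -1], [6, 2, 2]]))   # -22
```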
Example 3.1.4
What term in the expansion of the 8 x 8 determinant det((aij))
corresponds to the permutation 8, 3, 2, 6, 5, 1, 4, 7?
We saw in Example 3.1.1 that this permutation is odd,
so its sign is -1; hence the term sought is

    - a18 a23 a32 a46 a55 a61 a74 a87.
Minors and cofactors
In the theory of determinants certain subdeterminants
called minors prove to be a useful tool. Let A = (aij) be an
n x n matrix. The (i, j) minor Mij of A is defined to be the
determinant of the submatrix of A that remains when row i
and column j of A are deleted.
The (i, j) cofactor Aij of A is simply the minor with an
appropriate sign:

    Aij = (-1)^(i+j) Mij.

For example, if

    A = ( a11  a12  a13 )
        ( a21  a22  a23 )
        ( a31  a32  a33 ),
then

    M23 = | a11  a12 | = a11 a32 - a12 a31
          | a31  a32 |

and

    A23 = (-1)^(2+3) M23 = a12 a31 - a11 a32.
One reason for the introduction of cofactors is that they
provide us with methods of calculating determinants called
row expansion and column expansion. These are a great im-
provement on the defining sum as a means of computing de-
terminants. The next result tells us how they operate.
Theorem 3.1.4
Let A = (aij) be an n x n matrix. Then

    (i) det(A) = Σ_{k=1}^{n} a_ik A_ik   (expansion by row i);
    (ii) det(A) = Σ_{k=1}^{n} a_kj A_kj   (expansion by column j).
Thus to expand by row i, we multiply each element in
row i by its cofactor and add up the resulting products.
Proof of Theorem 3.1.4
We shall give the proof of (i); the proof of (ii) is similar. It
is sufficient to show that the coefficient of a_ik in the defining
expansion of det(A) equals A_ik. Consider first the simplest
case, where i = 1 = k. The terms in the defining expansion of
det(A) that involve a11 are those that appear in the sum

    Σ sign(1, i2, ..., in) a11 a2i2 · · · anin.

Here the sum is taken over all permutations of 1, 2, ..., n which
have the form 1, i2, i3, ..., in. This sum is clearly the same as

    a11 ( Σ sign(i2, i3, ..., in) a2i2 a3i3 · · · anin ),

where the summation is now over all permutations i2, ..., in
of the integers 2, ..., n. But the coefficient of a11 in this last
expression is just M11 = A11. Hence the coefficient of a11 is
the same on both sides of the equation in (i).
We can deduce the corresponding statement for general i
and k by means of the following device. The idea is to move
a_ik to the (1, 1) position of the matrix in such a way that it
will still have the same minor M_ik. To do this we interchange
row i of A successively with rows i - 1, i - 2, ..., 1, after which
a_ik will be in the (1, k) position. Then we interchange column
k with the columns k - 1, k - 2, ..., 1 successively, until a_ik is in
the (1, 1) position. If we keep track of the determinants that
arise during this process, we find that in the final determinant
the minor of a_ik is still M_ik. So by the result of the first
paragraph, the coefficient of a_ik in the new determinant is
M_ik.
However each row and column interchange changes the
sign of the determinant. For the effect of such an interchange
is to switch two entries in every permutation, and, as was
pointed out during the proof of 3.1.3, this changes a permu-
tation from even to odd, or from odd to even. Thus the sign
of each permutation is changed by -1. The total number of
interchanges that have been applied is (i - 1) + (k - 1) =
i + k - 2. The sign of the determinant is therefore changed by
(-1)^(i+k-2) = (-1)^(i+k). It follows that the coefficient of a_ik in
det(A) is (-1)^(i+k) M_ik, which is just the definition of A_ik.
(It is a good idea for the reader to write out explicitly
the row and column interchanges in the case n = 3 and i = 2,
k = 3, and to verify the statement about the minor M23).
The theorem provides a practical method of computing
3 x 3 determinants; for determinants of larger size there are
more efficient methods, as we shall see.
Example 3.1.5
Compute the determinant

    | 1  2  0 |
    | 4  2 -1 |
    | 6  2  2 |

For example, we may expand by row 1, obtaining

    1(-1)^2 | 2 -1 |  +  2(-1)^3 | 4 -1 |  +  0(-1)^4 | 4  2 |
            | 2  2 |             | 6  2 |             | 6  2 |

    = 6 - 28 + 0 = -22.

Alternatively, we could expand by column 2:

    2(-1)^3 | 4 -1 |  +  2(-1)^4 | 1  0 |  +  2(-1)^5 | 1  0 |
            | 6  2 |             | 6  2 |             | 4 -1 |

    = -28 + 4 + 2 = -22.
However there is an obvious advantage in expanding by a row
or column which contains as many zeros as possible.
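Row expansion translates directly into a short recursive program; the sketch below (illustrative only, with 0-based indices) expands along row 1 as in Theorem 3.1.4.

```python
def det_cofactor(A):
    """Determinant by expansion along row 1, as in Theorem 3.1.4 with i = 1."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for k in range(n):
        minor = [row[:k] + row[k+1:] for row in A[1:]]   # delete row 1 and column k+1
        total += (-1) ** k * A[0][k] * det_cofactor(minor)
    return total

print(det_cofactor([[1, 2, 0], [4, 2, -1], [6, 2, 2]]))  # -22, agreeing with Example 3.1.5
```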
The determinant of a triangular matrix can be written
down at once, an observation which is used frequently in cal-
culating determinants.
Theorem 3.1.5
The determinant of an upper or lower triangular matrix equals
the product of the entries on the principal diagonal of the ma-
trix.
Proof
Suppose that A = (aij)n,n is, say, upper triangular, and ex-
pand det(A) by column 1. The result is the product of a11
and an (n - 1) x (n - 1) determinant which is also upper tri-
angular. Repeat the operation until a 1 x 1 determinant is
obtained (or use mathematical induction).
Exercises 3.1
1. Is the permutation 1, 3, 8, 5, 2, 6, 4, 7 even or odd? What
is the corresponding term in the expansion of det((aij)8,8)?
2. The same questions for the permutation 8, 5, 3, 2, 1, 7, 6,
9,4.
3. Use the definition of a determinant to compute

    |  1 -3  0 |
    |  2  1  4 |
    | -1  0  1 |
4. How many additions, subtractions and multiplications are
needed to compute an n x n determinant by using the defini-
tion?
5. For the matrix
2
4
-1
3
3
2
find the minors M13, M23 and M33, and the corresponding
cofactors A13, A23 and A33.
6. Use the cofactors found in Exercise 5 to compute the de-
terminant of the matrix in that problem.
7. Use row or column expansion to compute the following
determinants:
    (a) | -2  2  3 |     (b) | 1 -2  3 -2 |     (c) | 1  4  3  4 |
        |  2  2  1 |         | 2  4  2  0 |         | 0  1  1  3 |
        |  0  0  5 |         | 1  0 -3  0 |         | 0  1 -1  2 |
                             | 0  0  0  1 |         | 0  3  1  3 |
8. If A is the n x n matrix
    (  0   0  · · ·   0  a1 )
    (  0   0  · · ·  a2   0 )
    (  ..................... )
    ( an   0  · · ·   0   0 )

show that det(A) = (-1)^(n(n-1)/2) a1 a2 · · · an.
9. Write down the permutation matrix that represents the
permutation 3, 1, 4, 5, 2.
10. Let ii,..., in be a permutation of 1,..., n , and let P be
the corresponding permutation matrix. Show that for any n x
n matrix A the matrix AP is obtained from A by rearranging
the columns according to the scheme Cj —
> Cj..
11. Prove that the sign of a permutation equals the determi-
nant of the corresponding permutation matrix.
12. Prove that every permutation matrix is expressible as a
product of elementary matrices of the type that represent row
or column interchanges.
13. If P is any permutation matrix, show that P^-1 = P^T.
[Hint: apply Exercise 10].
3.2 Basic Properties of Determinants
We now proceed to develop the theory of determinants,
establishing a number of properties which will allow us to
compute determinants more efficiently.
Theorem 3.2.1
If A is an n x n matrix, then
    det(A^T) = det(A).
Proof
The proof is by mathematical induction. The statement is
certainly true if n = 1 since then A^T = A. Let n > 1 and
assume that the theorem is true for all matrices with n - 1
rows and columns. Expansion by row 1 gives

    det(A) = Σ_{j=1}^{n} a_1j A_1j.

Let B denote the matrix A^T. Then a_1j = b_j1. By induction on
n, the determinant A_1j equals its transpose. But this is just
the (j, 1) cofactor B_j1 of B. Hence A_1j = B_j1 and the above
equation becomes

    det(A) = Σ_{j=1}^{n} b_j1 B_j1.

However the right hand side of this equation is simply the
expansion of det(B) by column 1; thus det(A) = det(B).
A useful feature of this result is that it sometimes enables
us to deduce that a property known to hold for the rows of a
determinant also holds for the columns.
Theorem 3.2.2
A determinant with two equal rows (or two equal columns) is
zero.
Proof
Suppose that the n x n matrix A has its jth and kth rows
equal. We have to show that det(A) = 0. Let i1, i2, ..., in be
a permutation of 1, 2, ..., n; the corresponding term in the ex-
pansion of det(A) is sign(i1, i2, ..., in) a1i1 a2i2 · · · anin. Now
if we switch ij and ik in this product, the sign of the permuta-
tion is changed, but the product of the a's remains the same
since a_{j,ik} = a_{k,ik} and a_{k,ij} = a_{j,ij}. This means that the term
under consideration occurs a second time in the defining sum
for det(A), but with the opposite sign. Therefore all terms in
the sum cancel and det(A) equals zero.
Notice that we do not need to prove the statement for
columns because of the remark following 3.2.1.
The next three results describe the effect on a determi-
nant of applying a row or column operation to the associated
matrix.
Theorem 3.2.3
(i) If a single row (or column) of a matrix A is multiplied
by a scalar c, the resulting matrix has determinant equal
to c(det(A)).
(ii) If two rows (or columns) of a matrix A are
interchanged, the effect is to change the sign of the
determinant.
(iii) The determinant of a matrix A is not changed if a
multiple of one row (or column) is added to another row
(or column).
Proof
(i) The effect of the operation is to multiply every term in
the sum defining det(A) by c. Therefore the determinant is
multiplied by c.
(ii) Here the effect of the operation is to switch two entries
in each permutation of 1, 2,..., n; we have already seen that
this changes the sign of a permutation, so it multiplies the
determinant by —1.
(iii) Suppose that we add c times row j to row k of the matrix:
here we shall assume that j < k. If C is the resulting matrix,
then det(C) equals
    Σ sign(i1, ..., in) a1i1 · · · ajij · · · (akik + c ajik) · · · anin,

which in turn equals the sum of

    Σ sign(i1, ..., in) a1i1 · · · ajij · · · akik · · · anin

and

    c Σ sign(i1, ..., in) a1i1 · · · ajij · · · ajik · · · anin.

Now the first of these sums is simply det(A), while the second
sum is the determinant of a matrix in which rows j and k are
identical, so it is zero by 3.2.2. Hence det(C) = det(A).
Now let us see how use of these properties can lighten the
task of evaluating a determinant. Let A be an n x n matrix
whose determinant is to be computed. Then elementary row
operations can be used as in Gaussian elimination to reduce A
to row echelon form B. But B is an upper triangular matrix,
say
(bn bi2 •
•
• & l n 
0 b22 • • • b2n
B =
 0 0 "nn /
so by 3.1.5 we obtain det(B) = 611622 • • -bun- Thus all that
has to be done is to keep track, using 3.2.3, of the changes in
det(.A) produced by the row operations.
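The strategy just described can be sketched in a few lines of Python (an illustration only, using exact fractions): reduce to triangular form, flip the sign for each interchange, and multiply the diagonal entries.

```python
from fractions import Fraction

def det_by_row_reduction(rows):
    """Reduce to triangular form, tracking sign changes from row interchanges."""
    A = [[Fraction(x) for x in r] for r in rows]
    n = len(A)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)                   # no pivot in this column
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]  # type (a): interchange flips the sign
            det = -det
        for r in range(col + 1, n):              # type (b): leaves the determinant unchanged
            f = A[r][col] / A[col][col]
            A[r] = [x - f * y for x, y in zip(A[r], A[col])]
        det *= A[col][col]
    return det

# The determinant of Example 3.2.1:
print(det_by_row_reduction([[0, 1, 2, 3], [1, 1, 1, 1], [-2, -2, 3, 3], [1, -2, -2, -3]]))  # -10
```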
Example 3.2.1
Compute the determinant

        | 0  1  2  3 |
    D = | 1  1  1  1 |
        | -2 -2 3  3 |
        | 1 -2 -2 -3 |

Apply row operations R1 <-> R2 and then R3 + 2R1, R4 - R1
successively to D to get:

         | 1  1  1  1 |      | 1  1  1  1 |
    D = -| 0  1  2  3 | = - | 0  1  2  3 |
         | -2 -2 3  3 |      | 0  0  5  5 |
         | 1 -2 -2 -3 |      | 0 -3 -3 -4 |

Next apply successively R4 + 3R2 and (1/5)R3 to get

         | 1  1  1  1 |        | 1  1  1  1 |
    D = -| 0  1  2  3 | = -5 | 0  1  2  3 |
         | 0  0  5  5 |        | 0  0  1  1 |
         | 0  0  3  5 |        | 0  0  3  5 |

Finally, use of R4 - 3R3 yields

           | 1  1  1  1 |
    D = -5 | 0  1  2  3 | = -10.
           | 0  0  1  1 |
           | 0  0  0  2 |
Example 3.2.2
Use row operations to show that the following determinant is
identically equal to zero.
    | a + 2   b + 2   c + 2  |
    | x + 1   y + 1   z + 1  |
    | 2x - a  2y - b  2z - c |

Apply row operations R3 + R1 and 2R2. The resulting
determinant is zero since rows 2 and 3 are identical.
Example 3.2.3
Prove that the value of the n x n determinant
    | 2  1  0  · · ·  0  0 |
    | 1  2  1  · · ·  0  0 |
    | 0  1  2  · · ·  0  0 |
    | ..................... |
    | 0  0  0  · · ·  2  1 |
    | 0  0  0  · · ·  1  2 |

is n + 1.
First note the obvious equalities D1 = 2 and D2 = 3. Let
n ≥ 3; then, expanding by row 1, we obtain

                       | 1  1  0  · · ·  0  0 |
                       | 0  2  1  · · ·  0  0 |
    Dn = 2 D_{n-1}  -  | 0  1  2  · · ·  0  0 |
                       | ..................... |
                       | 0  0  0  · · ·  2  1 |
                       | 0  0  0  · · ·  1  2 |

Expanding the determinant on the right by column 1, we find
it to be D_{n-2}. Thus

    Dn = 2 D_{n-1} - D_{n-2}.

This is a recurrence relation which can be used to solve for
successive values of Dn. Thus D3 = 4, D4 = 5, D5 = 6, etc.
In general Dn = n + 1. (A systematic method for solving
recurrence relations of this sort will be given in 8.2.)
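The recurrence is easily checked by machine; a minimal sketch (not from the text):

```python
def D(n):
    """Value of the n x n tridiagonal determinant via D_n = 2*D_{n-1} - D_{n-2}."""
    if n == 1:
        return 2
    if n == 2:
        return 3
    prev2, prev1 = 2, 3
    for _ in range(3, n + 1):
        prev2, prev1 = prev1, 2 * prev1 - prev2
    return prev1

print([D(n) for n in range(1, 8)])   # [2, 3, 4, 5, 6, 7, 8], i.e. n + 1
```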
The next example is concerned with an important type
of determinant called a Vandermonde determinant; these de-
terminants occur frequently in applications.
Example 3.2.4
Establish the identity

    | 1         1         · · ·  1         |
    | x1        x2        · · ·  xn        |
    | x1^2      x2^2      · · ·  xn^2      |
    | ..................................... |
    | x1^(n-1)  x2^(n-1)  · · ·  xn^(n-1)  |   =   Π (xj - xi),

where the expression on the right is the product of all the
factors xj - xi with i < j and i, j = 1, 2, ..., n.
Let D be the value of the determinant. Clearly it is a
polynomial in x1, x2, ..., xn. If we apply the column operation
Ci - Cj, with i < j, to the determinant, its value is unchanged.
On the other hand, after this operation each entry in column i
will be divisible by xi - xj. Hence D is divisible by xi - xj for
all i, j = 1, 2, ..., n and i < j. Thus we have located a total of
n(n - 1)/2 distinct linear polynomials which are factors of D,
this being the number of pairs of distinct positive integers i, j
such that 1 ≤ i < j ≤ n. But the degree of the polynomial D
is equal to

    1 + 2 + · · · + (n - 1) = n(n - 1)/2,

for each term in the defining sum has this degree. Hence D
must be the product of these n(n - 1)/2 factors and a constant
c, there being no room for further factors. Thus

    D = c Π (xj - xi),

with i < j = 1, 2, ..., n. In fact c is equal to 1, as can be seen
by looking at the coefficient of the term 1 x2 x3^2 · · · xn^(n-1) in the
defining sum for the determinant D; this corresponds to the
permutation 1, 2, ..., n, and so its coefficient is +1. On the
other hand, in the product of the xj - xi the coefficient of the
term is 1. Hence c = 1.
The critical property of the Vandermonde determinant D
is that D = 0 if and only if at least two of x1, x2, ..., xn are
equal.
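A quick numerical check of the identity for one particular choice of x1, ..., xn (an illustration only; the values below are arbitrary):

```python
import numpy as np
from itertools import combinations
from math import prod

x = [2.0, 5.0, -1.0, 3.0]
n = len(x)
V = np.array([[xj ** i for xj in x] for i in range(n)])    # row i holds the i-th powers
product = prod(x[j] - x[i] for i, j in combinations(range(n), 2))
print(np.linalg.det(V), product)   # the two values agree, up to rounding
```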
Exercises 3.2
1. By using elementary row operations compute the following
determinants:
    (a) |  1  4  2 |     (b) | 3  1 -2 |     (c) | 1  0  3  2 |
        | -2  4  7 |         | 0  4  4 |         | 3  4 -1  2 |
        |  6  1  2 |         | 2 -3  6 |         | 0  3  1  2 |
                                                 | 1  5  2  3 |
2. If one row (or column) of a determinant is a scalar multiple
of another row (or column), show that the determinant is zero.
3. If A is an n x n matrix and c is a scalar, prove that
det(cA) = c^n det(A).
4. Use row operations to show that the determinant
,2 b2 „2
1 + b
2b2
- b -
a
l + a
2a2
-a-1
c
l + c
1 2cz
-c-1
is identically equal to zero.
5. Let A be an n x n matrix in row echelon form. Show that
det(A) equals zero if and only if the number of pivots is less
than n.
6. Use row and column operations to show that

    | a  b  c |
    | b  c  a | = (a + b + c)(-a^2 - b^2 - c^2 + ab + bc + ca).
    | c  a  b |
7. Without expanding the determinant, prove that

    | 1    1    1   |
    | x    y    z   | = (x - y)(y - z)(z - x)(x + y + z).
    | x^3  y^3  z^3 |

[Hint: show that the determinant has factors x - y, y - z,
z - x, and that the remaining factor must be of degree 1 and
symmetric in x, y, z].
8. Let Dn denote the "bordered" n x n determinant
    | 0  a  0  · · ·  0  0 |
    | b  0  a  · · ·  0  0 |
    | 0  b  0  · · ·  0  0 |
    | ..................... |
    | 0  0  0  · · ·  0  a |
    | 0  0  0  · · ·  b  0 |

Prove that D_{2n-1} = 0 and D_{2n} = (-ab)^n.
9. Let Dn be the nxn determinant whose (i,j) entry is i + j .
Show that Dn = 0 if n > 2. [Hint: use row operations].
10. Let un denote the number of additions, subtractions and
multiplications needed in general to evaluate an n x n deter-
minant by row expansion. Prove that u_n = n u_{n-1} + 2n - 1.
Use this formula to calculate u_n for n = 2, 3, 4.
3.3 Determinants and Inverses of Matrices
An important property of the determinant of a square
matrix is that it tells us whether the matrix is invertible.
Theorem 3.3.1
An n x n matrix A is invertible if and only if det(A) ≠ 0.
Proof
By 2.3.2 there are elementary matrices E1, E2, ..., Ek such
that the matrix R = Ek Ek-1 · · · E2 E1 A is in reduced row
echelon form. Now observe that if E is any elementary n x n
matrix, then det(EA) = c det(A) for some non-zero scalar c;
this is because left multiplication by E performs an elementary
row operation on A and we know from 3.2.3 that such an
operation will, at worst, multiply the value of the determinant
by a non-zero scalar. Applying this fact repeatedly, we obtain
det(R) = det(Ek · · · E2 E1 A) = d det(A) for some non-zero
scalar d. Consequently det(A) ≠ 0 if and only if det(R) ≠ 0.
Now we saw in 2.3.5 that A is invertible precisely when
R = In. But, remembering the form of the matrix R, we
recognise that the only way that det(R) can be non-zero is if
R = In. Hence the result follows.
Example 3.3.1
The Vandermonde matrix of Example 3.2.4 is invertible if and
only if x1, x2, ..., xn are all different.
Corollary 3.3.2
A linear system AX = 0 with n equations in n unknowns has
a non-trivial solution if and only if det(A) = 0.
This very useful result follows directly from 2.3.5 and
3.3.1. Theorem 3.3.1 can be used to establish a basic formula
for the determinant of the product of two matrices.
Theorem 3.3.3
If A and B are any n x n matrices, then
    det(AB) = det(A) det(B).
Proof
Consider first the case where B is not invertible, which by
3.3.1 means that det(B) = 0. According to 2.3.5 there is a
non-zero vector X such that BX = 0. This clearly implies
that (AB)X = 0, and so, by 2.3.5 and 3.3.1, det(AB) must
also be zero. Thus the formula certainly holds in this case.
Suppose now that B is invertible. Then B is a product
of elementary matrices, say B = E1 E2 · · · Ek; this is by 2.3.5.
Now the effect of right multiplication of A by an elementary
matrix E is to apply an elementary column operation to A.
What is more, we can tell from 3.2.3 just what the value of
det(AE) is; indeed

    det(AE) = -det(A),  det(A)  or  c det(A)

according to whether E represents a column operation of the
type

    Ci <-> Cj,   Ci + cCj   or   cCi.

Now we can see from the form of the elementary matrix E
that det(E) equals -1, 1 or c, respectively, in the three cases;
hence the formula det(AE) = det(A) det(E) is valid. In short
our formula is true when B is an elementary matrix. Applying
this fact repeatedly, we find that det(AB) equals

    det(A E1 E2 · · · Ek) = det(A) det(E1) det(E2) · · · det(Ek),

which shows that

    det(AB) = det(A) det(E1 · · · Ek) = det(A) det(B).
Corollary 3.3.4
Let A and B be n x n matrices. If AB = In, then BA = In,
and thus B = A^-1.
Proof
For 1 = det(AB) = det(A) det(B), so det(A) ≠ 0 and A is in-
vertible, by 3.3.1. Therefore BA = A^-1 (AB) A = A^-1 In A =
In.
Corollary 3.3.5
If A is an invertible matrix, then det(A^-1) = 1/det(A).
Proof
Clearly 1 = det(In) = det(A A^-1) = det(A) det(A^-1), from
which the statement follows.
The adjoint matrix
Let A = (aij) be an n x n matrix. Then the adjoint
matrix

    adj(A)

of A is defined to be the n x n matrix whose (i, j) element is the
(j, i) cofactor Aji of A. Thus adj(A) is the transposed matrix
of cofactors of A. For example, the adjoint of the matrix

    ( 1  2  1 )
    ( 6 -1  3 )
    ( 2 -3  4 )

is

    (  5  -11   7 )
    ( -18   2   3 )
    ( -16   7 -13 ).
The significance of the adjoint matrix is made clear by
the next two results.
Theorem 3.3.6
If A is any n x n matrix, then

    A adj(A) = (det(A)) In = adj(A) A.
Proof
The (i, j) entry of the matrix product A adj(A) is

    Σ_{k=1}^{n} a_ik (adj(A))_kj = Σ_{k=1}^{n} a_ik A_jk.

If i = j, this is just the expansion of det(A) by row i; on the
other hand, if i ≠ j, the sum is also a row expansion of a
determinant, but one in which rows i and j are identical. By
3.2.2 the sum will vanish in this case. This means that the
off-diagonal entries of the matrix product A adj(A) are zero,
while the entries on the diagonal all equal det(A). Therefore
A adj(A) is the scalar matrix (det(A)) In, as claimed. The
second statement can be proved in a similar fashion.
Theorem 3.3.6 leads to an attractive formula for the in-
verse of an invertible matrix.
Theorem 3.3.7
If A is an invertible matrix, then A^-1 = (1/det(A)) adj(A).
Proof
In the first place, remember that A^-1 exists if and only if
det(A) ≠ 0, by 3.3.1. From A adj(A) = (det(A)) In we obtain

    A ((1/det(A)) adj(A)) = (1/det(A)) (A adj(A)) = In,

by 3.3.6. The result follows in view of 3.3.4.
Example 3.3.2
Let A be the matrix

    (  2 -1  0 )
    ( -1  2 -1 )
    (  0 -1  2 ).

The adjoint of A is

    ( 3  2  1 )
    ( 2  4  2 )
    ( 1  2  3 ).

Expanding det(A) by row 1, we find that it equals 4. Thus

    A^-1 = ( 3/4  1/2  1/4 )
           ( 1/2   1   1/2 )
           ( 1/4  1/2  3/4 ).
Despite the neat formula provided by 3.3.7, for matrices with
four or more rows it is usually faster to use elementary row
operations to compute the inverse, as described in 2.3: for
to find the adjoint of an n x n matrix one must compute n^2
determinants each with n - 1 rows and columns.
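For completeness, here is an illustrative Python sketch of the adjoint construction and of the formula of 3.3.7, applied to the matrix of Example 3.3.2 above; the helper names det and adjoint are choices made here.

```python
from fractions import Fraction

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** k * A[0][k] * det([r[:k] + r[k+1:] for r in A[1:]])
               for k in range(len(A)))

def adjoint(A):
    """Transposed matrix of cofactors: the (i, j) entry is the (j, i) cofactor of A."""
    n = len(A)
    minor = lambda j, i: [r[:i] + r[i+1:] for k, r in enumerate(A) if k != j]
    return [[(-1) ** (i + j) * det(minor(j, i)) for j in range(n)] for i in range(n)]

A = [[Fraction(x) for x in row] for row in [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]]
adj, d = adjoint(A), det(A)
print([[a / d for a in row] for row in adj])   # A^-1 = (1/det A) adj A, as in 3.3.7
```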
Next we give an application of determinants to geometry.
Example 3.3.3
Let P1(x1, y1, z1), P2(x2, y2, z2) and P3(x3, y3, z3) be three
non-collinear points in three dimensional space. The points
therefore determine a unique plane. Find the equation of the
plane by using determinants.
We know from analytical geometry that the equation of
the plane must be of the form ax + by + cz + d = 0. Here
the constants a, b, c, d cannot all be zero. Let P(x,y,z) be
an arbitrary point in the plane. Then the coordinates of the
points P, Pi, P2, P3 must satisfy the equation of the plane.
Therefore the following equations hold:
    ax  + by  + cz  + d = 0
    ax1 + by1 + cz1 + d = 0
    ax2 + by2 + cz2 + d = 0
    ax3 + by3 + cz3 + d = 0

Now this is a homogeneous linear system in the unknowns
a, b, c, d; by 3.3.2 the condition for there to be a non-trivial
solution is that

    | x   y   z   1 |
    | x1  y1  z1  1 |
    | x2  y2  z2  1 |  = 0.
    | x3  y3  z3  1 |
This is the condition for the point P to lie in the plane, so it
is the equation of the plane. That it is of the form ax + by +
cz + d = 0 may be seen by expanding the determinant by row
1.
For example, the equation of the plane which is deter-
mined by the three points (0, 1, 1), (1, 0, 1) and (1, 1, 0)
is

    | x  y  z  1 |
    | 0  1  1  1 |
    | 1  0  1  1 |  = 0,
    | 1  1  0  1 |

which becomes on expansion x + y + z - 2 = 0.
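The determinantal equation of the plane can also be expanded mechanically; the following sketch (illustrative only, with a hypothetical helper plane_through) reproduces x + y + z - 2 = 0 for the three points above.

```python
def plane_through(P1, P2, P3):
    """Coefficients (a, b, c, d) of ax + by + cz + d = 0, obtained by expanding
    the 4 x 4 determinant above along its first row."""
    def det3(M):
        (a, b, c), (d, e, f), (g, h, i) = M
        return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    rows = [list(P1) + [1], list(P2) + [1], list(P3) + [1]]
    cols = lambda skip: [[r[j] for j in range(4) if j != skip] for r in rows]
    return tuple((-1) ** j * det3(cols(j)) for j in range(4))

print(plane_through((0, 1, 1), (1, 0, 1), (1, 1, 0)))   # (1, 1, 1, -2): x + y + z - 2 = 0
```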
Cramer's Rule
For a second illustration of the uses of determinants, we
return to the study of linear systems. Consider a linear system
of n equations in n unknowns x1, x2, ..., xn

    AX = B,

where the coefficient matrix A has non-zero determinant. The
system has a unique solution, namely X = A^-1 B. There is a
simple expression for this solution in terms of determinants.
Using 3.3.7 we obtain

    X = A^-1 B = (1/det(A)) (adj(A) B).

From the matrix product adj(A) B we can read off the ith
unknown as

    xi = ( Σ_{j=1}^{n} (adj(A))_ij b_j ) / det(A) = ( Σ_{j=1}^{n} b_j A_ji ) / det(A).

Now the second sum is a determinant; in fact it is det(Mi)
where Mi is the matrix obtained from A when column i is
replaced by B. Hence the solution of the linear system can be
expressed in the form xi = det(Mi)/det(A), i = 1, 2, ..., n.
Thus we have obtained the following result.
Theorem 3.3.8 (Cramer's Rule)
If AX = B is a linear system of n equations in n unknowns
and det(A) is not zero, then the unique solution of the linear
system can be written in the form

    xi = det(Mi)/det(A),   i = 1, ..., n,

where Mi is the matrix obtained from A when column i is
replaced by B.
The reader should note that Cramer's Rule can only be
used when the linear system has the special form indicated.
Example 3.3.4
Solve the following linear system using Cramer's Rule.

     x1 -  x2 - x3 = 4
     x1 + 2x2 - x3 = 2
    2x1       + x3 = 1

Here

    A = ( 1 -1 -1 )           ( 4 )
        ( 1  2 -1 )   and B = ( 2 ).
        ( 2  0  1 )           ( 1 )

Thus det(A) = 9, and Cramer's Rule gives the solution

    x1 = (1/9) | 4 -1 -1 | = 13/9,
               | 2  2 -1 |
               | 1  0  1 |

    x2 = (1/9) | 1  4 -1 | = -2/3,
               | 1  2 -1 |
               | 2  1  1 |

    x3 = (1/9) | 1 -1  4 | = -17/9.
               | 1  2  2 |
               | 2  0  1 |
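Cramer's Rule itself is only a few lines of code; the sketch below (an illustration, restricted to 3 x 3 systems for brevity) reproduces the solution of Example 3.3.4.

```python
def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cramer3(A, B):
    """Solve a 3 x 3 system AX = B with det(A) != 0 by Cramer's Rule."""
    d = det3(A)
    solution = []
    for i in range(3):
        Mi = [row[:i] + [B[j]] + row[i+1:] for j, row in enumerate(A)]  # column i replaced by B
        solution.append(det3(Mi) / d)
    return solution

A = [[1, -1, -1], [1, 2, -1], [2, 0, 1]]
B = [4, 2, 1]
print(cramer3(A, B))   # [13/9, -2/3, -17/9] as floating-point numbers
```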
Exercises 3.3
1. For the matrices
A = and B =
2 5
4 7
verify the identity det(AB) = det(A) det(B).
2. By finding the relevant adjoints, compute the inverses of
the following matrices:
(a)
4
-2 (b) (c)
3. If A is a square matrix and n is a positive integer, prove
that det(A^n) = (det(A))^n.
4. Use Cramer's Rule to solve the following linear systems:

    (a)  2x1 - 3x2 + x3 = -1          (b)   x1 +  x2 +  x3 = -1
          x1 + 3x2 + x3 =  6               2x1 -  x2 -  x3 =  4
         2x1 +  x2 + x3 = 11                x1 + 2x2 - 3x3 =  7
5. Let A be an n x n matrix. Prove that A is invertible if and
only if adj(A) is invertible.
6. Let A be any n x n matrix where n > 1. Prove that
det(adj(A)) = (det(A))^(n-1). [Hint: first deal with the case
where det(A) ≠ 0, by applying det to each side of the identity
of 3.3.6. Then argue that the result must still be true when
det(A) = 0].
7. Find the equation of the plane which contains the points
(1,1,-2), (1,-2, 7) and (0,1,-4).
8. Consider the four points in three dimensional space
Pi(xi, yi, zi), i = 1, 2, 3, 4. Prove that a necessary and suffi-
cient condition for the four points to lie in a plane is

    | x1  y1  z1  1 |
    | x2  y2  z2  1 |
    | x3  y3  z3  1 |  = 0.
    | x4  y4  z4  1 |
Chapter Four
INTRODUCTION TO VECTOR SPACES
The aim of this chapter is to introduce the reader to the
notion of an abstract vector space. Roughly speaking, a vec-
tor space is a set of objects called vectors which it is possible
to add and multiply by scalars, subject to reasonable rules.
Vector spaces occur in numerous branches of mathematics, as
well as in many applications; they are therefore of great im-
portance and utility. Rather than immediately confront the
reader with an abstract definition, we prefer first to discuss
some vector spaces which are familiar objects. Then we pro-
ceed to extract the common features of these examples, and
use them to frame the definition of a general vector space.
4.1 Examples of Vector Spaces
The first example of a vector space has a geometrical
background.
Euclidean space
Choose and fix a positive integer n, and define
    Rn

to be the set of all n-column vectors

    X = ( x1 )
        ( x2 )
        ( .. )
        ( xn )
where the entries xi are real numbers. Of course these are
special types of matrices, so rules of addition and scalar mul-
tiplication are at hand, namely
    ( x1 )   ( y1 )   ( x1 + y1 )
    ( x2 ) + ( y2 ) = ( x2 + y2 )
    ( .. )   ( .. )   ( ....... )
    ( xn )   ( yn )   ( xn + yn )

and

    c ( x1 )   ( c x1 )
      ( x2 ) = ( c x2 )
      ( .. )   ( .... )
      ( xn )   ( c xn )
Thus the set Rn
is "closed" with respect to the operation
of adding pairs of its elements, in the sense that one cannot
escape from Rn
by adding two of its elements; similarly Rn
is
closed with respect to multiplication of its elements by scalars.
Notice also that Rn
contains the zero column vector.
Another point to observe is that the rules of matrix alge-
bra listed in 1.2.1 which are relevant to column vectors apply
to the elements of Rn
. The set Rn
, together with the op-
erations of addition and scalar multiplication, forms a vector
space which is known as n-dimensional Euclidean space.
Line segments and R3
When n is 3 or less, the vector space Rn
has a good
geometrical interpretation. Consider the case of R3
. Atypical
element of R3
is a 3-column
Assume that a cartesian coordinate system has been chosen
with assigned x, y and z -axes. We plan to represent the col-
umn vector A by a directed line segment in three-dimensional
space. To achieve this, choose an arbitrary point I with co-
ordinates (u1, u2, u3) as the initial point of the line segment.
The end point of the segment is the point E with coordinates
(u1 + a1, u2 + a2, u3 + a3). The direction of the line segment
IE is indicated by an arrow drawn from I(u1, u2, u3) to
E(u1 + a1, u2 + a2, u3 + a3).
The length of IE equals

    l = √(a1^2 + a2^2 + a3^2)

and its direction is specified by the direction cosines

    a1/l,  a2/l,  a3/l.
Here the significant feature is that none of these quantities
depends on the initial point I. Thus A is represented by in-
finitely many line segments all of which have the same length
and the same direction. So all the line segments which repre-
sent A are parallel and have equal length. However the zero
vector is represented by a line segment of length 0 and it is
not assigned a direction.
Having connected elements of R3
with line segments, let
us see what the rule of addition in R3
implies about line seg-
ments. Consider two vectors in R3
A = a2 and B
90 Chapter Four: Introduction to Vector Spaces
and their sum
A + B = a2 + b2 .
 a3 + b3 J
Represent the vectors A, B and A + B by line segments IU,
IV, and I W in three dimensional space with a common initial
point I (ui, u2, u3), say. The line segments determine a figure
I U W V as shown:
where U, W and V are the points
(u1 + a1, u2 + a2, u3 + a3),  (u1 + a1 + b1, u2 + a2 + b2, u3 + a3 + b3)
and
(u1 + b1, u2 + b2, u3 + b3),
respectively.
In fact I U W V is a parallelogram. To prove this, we need
to find the lengths and directions of the four sides. Simple
analytic geometry shows that IU = VW = √(a1^2 + a2^2 + a3^2) = l,
and that IV = UW = √(b1^2 + b2^2 + b3^2) = m, say. Also the
direction cosines of IU and VW are a1/l, a2/l, a3/l, while
those of IV and UW are b1/m, b2/m, b3/m.
opposite sides of I U W V are parallel and of equal length, so
it is indeed a parallelogram.
These considerations show that the rule of addition for
vectors in R3
is equivalent to the parallelogram rule for addi-
tion of forces, which is familiar from mechanics. To add line
segments IU and IV representing the vectors A and B, com-
plete the parallelogram formed by the lines IU and IV; the
diagonal IW will represent the vector A + B.
An equivalent formulation of this is the triangle rule,
which is encapsulated in the triangle diagram formed by the
segments IU, UW and IW.
Note that this diagram is obtained from the parallelogram by
deleting the upper triangle. Since IV and U W are parallel
line segments of equal length, they represent the same vector
B.
There is also a geometrical interpretation of the rule of
scalar multiplication in R3
. As before let A in R3
be repre-
sented by the line segment joining I(tii, U2, u^) to U(tti + ai,
U2 + Q2) ^3 + 03). Let c be any scalar. Then cA is represented
by the line segment from (u, U2, U3) to (u + cax, U2 + ca2,
U3 + CGS3). This line segment has length equal to c times the
length of IU, while its direction is the same as that of IU if
c > 0, and opposite to that of IU if c < 0.
Of course, there are similar geometrical representations
of vectors in R2
by line segments drawn in the plane, and in
R1
by line segments drawn along a fixed line. So our first
examples of vector spaces are familiar objects if n < 3.
Further examples of vector spaces are obtained when the
field of real numbers is replaced by the field of complex num-
bers C: in this case we obtain
Cn
,
92 Chapter Four: Introduction to Vector Spaces
the vector space of all n-column vectors with entries in C.
More generally it is possible to carry out the same construction with
an arbitrary field of scalars F, in the sense of 1.3; this yields
the vector space Fn
of all n-column vectors with entries in F, with the usual rules
of matrix addition and scalar multiplication.
Vector spaces of matrices
One obvious way to extend the previous examples is by
allowing matrices of arbitrary size. Let
Mm,n(R)
denote the set of all m x n matrices with real entries. This
set is closed with respect to matrix addition and scalar mul-
tiplication, and it includes the zero matrix 0m,n. The rules of
matrix algebra guarantee that Mm,n(R) is a vector space. Of
course, if n = 1, we recover the Euclidean space Rm
, while if
m = 1, we obtain the vector space
of all real n-row vectors. It is consistent with notation estab-
lished in 1.3 if we write
Mn(R)
for the vector space of all real n x n matrices, instead of
Mn,n(R). Once again R can be replaced by any field of scalars
F in these examples, to produce the vector spaces
Mm ,n (F), Mn(F) and Fn.
Vector spaces of functions
Let a and b be fixed real numbers with a < b, and let
C[a, b] denote the set of all real-valued functions of x that are
continuous at each point of the closed interval [a, b]. If f and
g are two such functions, we define their sum f + g by the rule
(f + g)(x) = f(x) + g(x).
It is a well-known result from calculus that f + g is also con-
tinuous in [a, b], so that f + g belongs to C[a, b]. Next, if c is
any real number, the function cf defined by
cf(x) = c(f(x))
is continuous in [a, b] and thus belongs to C[a,b]. The zero
function, which is identically equal to zero in [a, b], is also
included in C[a,b].
Thus once again we have a set that is closed with respect
to natural operations of addition and scalar multiplication;
C[a, b] is the vector space of all continuous functions on the
interval [a, b]. In a similar way one can form the smaller vector
space D[a, b] consisting of all differentiable functions on [a, b],
with the same rules of addition and scalar multiplication. A
still smaller vector space is D∞[a, b], the vector space of all
functions that are infinitely differentiable in [a, b].
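As a small illustration of these closure properties (a hedged sketch; the particular functions and the interval [0, 1] are arbitrary choices), pointwise addition and scalar multiplication of functions can be coded directly:

    import math

    def add(f, g):
        # pointwise sum: (f + g)(x) = f(x) + g(x)
        return lambda x: f(x) + g(x)

    def scale(c, f):
        # pointwise scalar multiple: (cf)(x) = c * f(x)
        return lambda x: c * f(x)

    f = math.sin                  # continuous on [0, 1]
    g = lambda x: x ** 2          # continuous on [0, 1]
    h = add(f, scale(3.0, g))     # the linear combination f + 3g, again a function on [0, 1]

    print(h(0.5))                 # sin(0.5) + 3 * 0.25 = 1.2294...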
Vector spaces of polynomials
A (real) polynomial in an indeterminate x is an expression
of the form
f(x) = a0 + a1x + ... + anx^n
where the coefficients ai are real numbers. If an ≠ 0, the
polynomial is said to have degree n. Define
Pn(R)
to be the set of all real polynomials in x of degree less than n.
Here we mean to include the zero polynomial, which has all
its coefficients equal to zero. There are natural rules of addi-
tion and scalar multiplication in Pn (R), namely the familiar
ones of elementary algebra: to add two polynomials add cor-
responding coefficients; to multiply a polynomial by a scalar
c, multiply each coefficient by c. Using these operations, we
obtain the vector space of all real polynomials of degree less
than n.
This example could be varied by allowing polynomials of
arbitrary degree, thus yielding the vector space of all real poly-
nomials
P(R).
As usual R may be replaced by any field of scalars here.
Common features of vector spaces
The time has come to identify the common features in
the above examples: they are:
(i) a non-empty set of objects called vectors, including a
"zero" vector;
(ii) a way of adding two vectors to give another vector;
(iii) a way of multiplying a vector by a scalar to give a
vector;
(iv) a reasonable list of rules that the operations
mentioned in (ii) and (iii) are required to satisfy.
We are being deliberately vague in (iv), but the rules should
correspond to properties of matrices that are known to hold
in Rn
and Mm,n(R).
Exercises 4.1
1. Give details of the geometrical interpretations of R1
and
R2
.
2. Which of the following might qualify as vector spaces in
the sense of the examples of this section?
(a) the set of all real 3-column vectors that correspond to
line segments of length 1;
(b) the set of all real polynomials of degree at least 2;
(c) the set of all line segments in R3
that are parallel to
a given plane;
(d) the set of all continuous functions of x defined in the
interval [0, 1] that vanish at x = 1/2.
4.2 Vector Spaces and Subspaces
It is now time to give a precise formulation of the defini-
tion of a vector space.
Definition of a vector space
A vector space V over R consists of a set of objects called
vectors, a rule for combining vectors called addition, and a
rule for multiplying a vector by a real number to give another
vector called scalar multiplication. If u and v are vectors, the
result of adding these vectors is written u + v, the sum of u
and v; also, if c is a real number, the result of multiplying v
by c, is written cv, the scalar multiple of v by c.
It is understood that the following conditions must be
satisfied for all vectors u, v, w and all real scalars c, d :
(i) u + v = v + u, (commutative law);
(ii) (u + v) + w = u + (v + w), (associative law);
(iii) there is a vector 0, called the zero vector, such that
v + 0 = v;
(iv) each vector v has a negative, that is, a vector —v
such that v + (—v) = 0;
(v) (cd)v = c(dv);
(vi) c(u + v) = cu + cv, (distributive law);
(vii) (c + d)v = cv + dv, (distributive law);
(viii) lv = v.
For economy of notation it is customary to use V to denote
the set of vectors, as well as the vector space. Since the vector
space axioms just listed hold for matrices, they are valid in
Rn
; they also hold in the other examples of vector spaces
described in 4.1.
More generally, we can define a vector space over an ar-
bitrary field of scalars F by simply replacing R by F in the
above axioms.
Certain simple properties of vector spaces follow easily
from the axioms. Since these are used constantly, it is as well
to establish them at this early stage.
Lemma 4.2.1
If u and v are vectors in a vector space, the following state-
ments are true:
(a) 0v = 0 and c0 = 0 where c is a scalar;
(b) if u + v = 0, then u = -v;
(c) (-1)v = -v.
Proof
(a) In property (vii) above put c = 0 = d, to get 0v = 0v +
0v. Add -(0v) to both sides of this equation and use the
associative law (ii) to deduce that
0 = -(0v) + 0v = -(0v) + (0v + 0v) = (-(0v) + 0v) + 0v = 0v,
which leads to 0 = 0v. Proceed similarly in the second part.
(b) Add —v to both sides of u + v = 0 and use the associative
law.
(c) Using (vii) and (viii), and also (a), we obtain
v + (-1)v = 1v + (-1)v = (1 + (-1))v = 0v = 0.
Hence (-1)v = -v by (b).
Subspaces
Roughly speaking, a subspace is a vector space contained
within a larger vector space; for example, the vector space
P2(R) is a subspace of P3(R). More precisely, a subset S of
a vector space V is called a subspace of V if the following
statements are true:
(i) S contains the zero vector 0;
(ii) if v belongs to S, then so does cv for every scalar c,
that is, S is closed under scalar multiplication;
(iii) if u and v belong to S, then so does u + v that is, S
is closed under addition.
Thus a subspace of V is a subset S which is itself a vector
space with respect to the same rules of addition and scalar
multiplication as V. Of course, the vector space axioms hold
in S since they are already valid in V.
Examples of subspaces
If V is any vector space, then V itself is a subspace, for
trivial reasons. It is often called the improper subspace. At
the other extreme is the zero subspace, written 0 or Oy, which
contains only the zero vector 0. This is the smallest subspace
of V. (In general a vector space that contains only the zero
vector is called a zero space). The zero subspace and the
improper subspace are present in every vector space. We move
on now to some more interesting examples of subspaces.
Example 4.2.1
Let S be the subset of R2
consisting of all columns with entries 2t and -3t,
where t is an arbitrary real number. Since
the sum of two columns of this form, with parameters t1 and t2,
is the column with entries 2(t1 + t2), -3(t1 + t2), and the scalar
multiple by c of the column with entries 2t, -3t is the column
with entries 2ct, -3ct, S is closed under addition and scalar
multiplication; also S contains the zero vector, as may be seen
by taking t to be 0. Hence S is a subspace of R2
.
In fact this subspace has geometrical significance. For
an arbitrary vector of S, with entries 2t and -3t, may be represented by a line
segment in the plane with initial point the origin and end point
(2t,—3t). But the latter is a general point on the line with
equation 3x + 2y = 0. Therefore the subspace S corresponds
to the set of line segments drawn from the origin along the
line 3x + 2y = 0.
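A quick numerical check of the closure argument is easy to write down; in this minimal sketch (the values of t and c are arbitrary) every sum and scalar multiple produced is again of the form (2t, -3t) and lies on the line 3x + 2y = 0.

    import numpy as np

    def S(t):
        # the general element of S, as a vector (2t, -3t)
        return np.array([2 * t, -3 * t])

    u, v, c = S(1.5), S(-4.0), 2.5
    print(u + v)          # [-5.   7.5 ]  = S(-2.5)
    print(c * u)          # [ 7.5 -11.25] = S(3.75)

    x, y = S(0.7)
    print(3 * x + 2 * y)  # 0.0: the point lies on the line 3x + 2y = 0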
Example 4.2.2
This example is an important one. Consider the homogeneous
linear system
AX = 0
in n unknowns over some field of scalars F and let S denote
the set of all solutions of the linear system, that is, all the
n-column vectors X over F that satisfy AX = 0. Then S is a
subset of Fn
and it certainly contains the zero vector. Now if
X and Y are solutions of the linear system and c is any scalar,
then
A(X + Y) = AX + AY = 0 and A(cX) = c(AX) = 0.
Thus X + Y and cX belong to S and it follows that S is a
subspace of the vector space Fn
. This subspace is called the
solution space of the homogeneous linear system AX = 0; it is
also known as the null space of the matrix A. (Question: why
is it necessary to have a homogeneous linear system here?)
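Solution spaces of this kind can be computed mechanically. The sketch below is a hedged illustration using the SymPy library (the matrix is an arbitrary example, not one from the text): it finds a basis of the null space and confirms that sums and scalar multiples of solutions are again solutions.

    from sympy import Matrix, Rational

    A = Matrix([[1, 2, -1],
                [2, 4, -2]])     # an arbitrary 2 x 3 coefficient matrix

    basis = A.nullspace()        # basis of the solution space of AX = 0
    print(basis)                 # two basis vectors, (-2, 1, 0) and (1, 0, 1) as columns

    X, Y = basis
    c = Rational(5, 3)
    print(A * (X + Y))           # the zero column: X + Y is again a solution
    print(A * (c * X))           # the zero column: cX is again a solution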
Example 4.2.3
Let S denote the set of all real solutions y = y(x) of the
homogeneous linear differential equation
y" + by' + 6y = 0
defined in some interval [a, b]. Thus S is a subset of the vector
space C[a, &
] of continuous functions on [a, b]. It is easy to
verify that S contains the zero function and that S is closed
with respect to addition and scalar multiplication; in other
words S is a subspace of C[a, b].
The subspace S in this example is called the solution space
of the differential equation. More generally, one can define the
solution space of an arbitrary homogeneous linear differential
equation, or even of a system of such differential equations.
Systems of homogeneous linear differential equations are stud-
ied in Chapter Eight.
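As a hedged illustration of such a solution space (the equation is taken here to be y'' + 5y' + 6y = 0, the middle coefficient being an assumption made only for this sketch), SymPy can verify that e^(-2x), e^(-3x) and every linear combination of them are solutions:

    from sympy import symbols, exp, diff, simplify

    x, c1, c2 = symbols('x c1 c2')

    def L(y):
        # the differential operator y'' + 5y' + 6y applied to y (coefficients assumed)
        return diff(y, x, 2) + 5 * diff(y, x) + 6 * y

    y1, y2 = exp(-2 * x), exp(-3 * x)
    print(simplify(L(y1)))                  # 0
    print(simplify(L(y2)))                  # 0
    print(simplify(L(c1 * y1 + c2 * y2)))   # 0: linear combinations are again solutions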
Linear combinations of vectors
Let v1, v2, ..., vk be vectors in a vector space V. If c1,
c2, ..., ck are any scalars, the vector
c1v1 + c2v2 + ... + ckvk
is called a linear combination of v1, v2, ..., vk.
For example, consider two vectors X1 and X2 in R2. The
most general linear combination of X1 and X2 is c1X1 + c2X2,
where c1 and c2 are arbitrary real scalars.
In general let X be any non-empty subset of a vector
space V and denote by
<X >
the set of all linear combinations of vectors in X. Thus a typical
element of < X > is a vector of the form
c1x1 + c2x2 + ... + ckxk
where x1, x2, ..., xk are vectors belonging to X and c1, c2, ...,
ck are scalars. From this formula it is clear that the sum of
any two elements of < X > is still in < X > and that a scalar
multiple of an element of < X > is in < X >. Thus we have
the following important result.
Theorem 4.2.2
If X is a non-empty subset of a vector space V, then < X >,
the set of all linear combinations of elements of X, is a sub-
space of V.
We refer to < X > as the subspace of V generated (or
spanned) by X. A good way to think of < X > is as the small-
est subspace of V that contains X. For any subspace of V that
contains X will necessarily contain all linear combinations of
vectors in X and so must contain < X > as a subset. In par-
ticular, a subset X is a subspace if and only if X = < X >.
In the case of a finite set X = {x1, x2, ..., xk}, we shall write
< x1, x2, ..., xk >
for < X >.
Example 4.2.4
For the three vectors of R3
given below, determine whether
C belongs to the subspace generated by A and B:
A = (1, 1, 4),  B = (-1, 2, 1),  C = (-1, 5, 6), written as columns.
We have to decide if there are real numbers c and d such
that cA + dB = C. To see what this entails, equate cor-
responding vector entries on both sides of the equation to
obtain
c - d = - 1
c + 2d = 5
4c + d = 6
Thus C belongs to < A, B > if and only if this linear system
is consistent. It is quickly seen that the linear system has the
(unique) solution c = 1, d = 2. Hence C = A + 2B, so that C
does belong to the subspace < A, B >.
What is the geometrical meaning of this conclusion? Re-
call that A, B and C can be represented by line segments in
3-dimensional space with a common initial point I, say IP,
IQ and IR. A typical vector in < A, B > can be expressed in
the form sA + tB with real numbers s and t . Now sA and
tB are representable by line segments parallel to IP and IQ
respectively. We obtain a line segment that represents sA+tB
by applying the parallelogram law; clearly the resulting line
segment will lie in the plane determined by IP and IQ. Con-
versely, it is not difficult to see that any line segment lying in
this plane represents a vector of the form sA + tB. Therefore
the vectors in the subspace < A, B > are those that can be
represented by line segments drawn from I lying in the plane
determined by IP and IQ. What we have shown is that IR
lies in this plane.
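The membership test used in this example is just a question of consistency of a linear system, so it is easy to automate. A minimal SymPy sketch, using the vectors as reconstructed above:

    from sympy import Matrix, linsolve, symbols

    A = Matrix([1, 1, 4])
    B = Matrix([-1, 2, 1])
    C = Matrix([-1, 5, 6])

    c, d = symbols('c d')
    # solve cA + dB = C; a non-empty solution set means C belongs to < A, B >
    print(linsolve((Matrix.hstack(A, B), C), c, d))   # {(1, 2)}, so C = A + 2B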
Finitely generated vector spaces
A vector space V is said to be finitely generated if there
is a finite subset {v1, v2, ..., vk} of V such that
V = < v1, v2, ..., vk >,
that is to say, every vector in V is a linear combination of the
vectors v1, v2, ..., vk, and so has the form
c1v1 + c2v2 + ... + ckvk
for some scalars ci. If, on the other hand, no finite subset
generates V, then V is said to be infinitely generated.
Example 4.2.5
Show that the Euclidean space Rn
is finitely generated.
Let X1, X2, ..., Xn be the columns of the identity matrix
In. If A is any vector in Rn, with entries a1, a2, ..., an,
then A = a1X1 + a2X2 + ... + anXn;
therefore X1, X2, ..., Xn generate Rn
and consequently this
vector space is finitely generated.
On the other hand, one does not have to look far to find
infinitely generated vector spaces.
Example 4.2.6
Show that the vector space P(R) of all real polynomials in x
is infinitely generated.
To prove this we adopt the method of proof by contradic-
tion. Assume that P(R) is finitely generated, say by polyno-
mials p1, p2, ..., pk, and look for a contradiction. Clearly we
may assume that all of these polynomials are non-zero; let m
be the largest of their degrees. Then the degree of any linear
combination of p1, p2, ..., pk certainly cannot exceed m. But
this means that x^(m+1), for example, is not such a linear com-
bination. Consequently p1, p2, ..., pk do not generate P(R),
and we have reached a contradiction. This establishes the
truth of the claim.
Exercises 4.2
1. Which of the following are vector spaces? The operations
of addition and scalar multiplication are the natural ones:
(a) the set of all 2 x 2 real matrices with determinant
equal to zero;
(b) the set of all solutions X of a linear system AX = B
where B ^ 0;
(c) the set of all functions y = y(x) that are solutions of
the homogeneous linear differential equation
an(x)y^(n) + a_{n-1}(x)y^(n-1) + ... + a1(x)y' + a0(x)y = 0.
2. In the following examples say whether S is a subspace of
the vector space V :
(a) V = R2
and S is the subset of all matrices of the form
1 where a is an arbitrary real number;
(b) V = C[0,1] and S is the set of all infinitely
differentiable functions in V.
(c) V = -P(R) and S is the set of all polynomials p
such that p(l) = 0.
3. Does the polynomial 1 - 2x + x^2 belong to the subspace of
P3(R) generated by the polynomials 1 + x, x + x^2 and 3 - 2x?
4. Determine if the matrix I 1 is in the subspace of
M2(R) generated by the following matrices:
3 4  / 0 2  / 0 2
1 2J' 1-1/3 4J' I 6 1
5. Prove that the vector spaces Mm,n(F) and Pn(F) are
finitely generated where F is an arbitrary field.
6. Prove that the vector spaces C[0, 1] and P(F) are infinitely
generated, where F is any field.
7. Let A and B be vectors in R2
. Show that A and B generate
R2
if and only if neither is a scalar multiple of the other.
Interpret this result geometrically.
4.3 Linear Independence in Vector Spaces
We begin with the crucial definition. Let V be a vector
space and let X be a non-empty subset of V. Then X is said to
be linearly dependent if there are distinct vectors v1, v2, ..., vk
in X, and scalars c1, c2, ..., ck, not all of them zero, such that
c1v1 + c2v2 + ... + ckvk = 0.
This amounts to saying that at least one of the vectors vi can
be expressed as a linear combination of the others. Indeed, if
say ci ≠ 0, then we can solve the equation for vi, obtaining
vi = Σ_{j≠i} (-ci^{-1}cj)vj.
For example, a one-element set {v} is linearly dependent if and
only if v = 0. A set with two elements is linearly dependent
if and only if one of the elements is a scalar multiple of the
other.
A subset which is not linearly dependent is said to be
linearly independent. Thus a set of distinct vectors {v1, ...,
vk} is linearly independent if and only if an equation of the
form c1v1 + ... + ckvk = 0 always implies that c1 = c2 = ... =
ck = 0.
We shall often say that vectors v1, ..., vk are linearly de-
pendent or independent, meaning that the subset {v1, ..., vk}
has this property.
Linear dependence in R3
Consider three vectors A, B,C in Euclidean space R3
,
and represent them by line segments in 3-dimensional space
with a common initial point. If these vectors form a linearly
dependent set, then one of them, say A, can be expressed as
a linear combination of the other two, A = uB + vC; this
equation says that the line segment representing A lies in the
same plane as the line segments that represent B and C. Thus,
if the three vectors form a linearly dependent set, their line
segments must be coplanar.
Conversely, assume that A,B,C are vectors in R3
which
are represented by line segments drawn from the origin, all of
which lie in a plane. We claim that the vectors will then be
linearly dependent. To see this, let the equation of the plane
be ux + vy + wz = 0; keep in mind that the plane passes
through the origin. Let the entries of A be written a1, a2, a3,
with a similar notation for B and C. Then the respective
end points of the line segments have coordinates (a1, a2, a3),
(b1, b2, b3), (c1, c2, c3). Since these points lie on the plane, we
have the equations
ua1 + va2 + wa3 = 0
ub1 + vb2 + wb3 = 0
uc1 + vc2 + wc3 = 0
This homogeneous linear system has a non-trivial solution for
u, v, w, so the determinant of its coefficient matrix is zero by
3.3.2. Now the coefficient matrix of the linear system
ua1 + vb1 + wc1 = 0
ua2 + vb2 + wc2 = 0
ua3 + vb3 + wc3 = 0
is the transpose of the previous one, so by 3.2.1 it has the same
determinant. It follows that the second linear system also has
a non-trivial solution u, v, w. But then uA + vB + wC = 0,
which shows that the vectors A, B, C are linearly dependent.
Thus there is a natural geometrical interpretation of lin-
ear dependence in the Euclidean space R3
: three vectors are
linearly dependent if and only if they are represented by line
segments lying in the same plane. There is a corresponding in-
terpretation of linear dependence in R2
(see Exercise 4.3.11).
Example 4.3.1
Are the polynomials x + 1, x + 2, x^2 - 1 linearly dependent in
the vector space P3(R)?
To answer this, suppose that c1, c2, c3 are scalars satisfy-
ing
c1(x + 1) + c2(x + 2) + c3(x^2 - 1) = 0.
Equating to zero the coefficients of 1, x, x^2, we obtain the
homogeneous linear system
c1 + 2c2 - c3 = 0
c1 + c2 = 0
c3 = 0
This has only the trivial solution c1 = c2 = c3 = 0; hence the
polynomials are linearly independent.
Example 4.3.2
Show that the vectors
(-1, 2), (1, 2), (2, -4), written as columns,
are linearly dependent in R2
.
Proceeding as in the last example, we let c1, c2, c3 be
scalars such that
c1(-1, 2) + c2(1, 2) + c3(2, -4) = (0, 0).
This is equivalent to the homogeneous linear system
-c1 + c2 + 2c3 = 0
2c1 + 2c2 - 4c3 = 0
Since the number of unknowns is greater than the number
of equations, this system has a non-trivial solution by 2.1.4.
Hence the vectors are linearly dependent.
These examples suggest that the question of deciding
whether a set of vectors is linearly dependent is equivalent to
asking if a certain homogeneous linear system has non-trivial
solutions. Further evidence for this is provided by the proof
of the next result.
Theorem 4.3.1
Let A1, A2, ..., Am be vectors in the vector space Fn
where
F is some field. Put A = [A1 A2 ... Am], an n x m matrix.
Then A1, A2, ..., Am are linearly dependent if and only if the
number of pivots of A in row echelon form is less than m.
Proof
Consider the equation c1A1 + c2A2 + ... + cmAm = 0 where
c1, c2, ..., cm are scalars. Equating entries of the vector on
the left side of the equation to zero, we find that this equation
is equivalent to the homogeneous linear system
AC = 0,
where C is the column with entries c1, c2, ..., cm.
By 2.1.3 the condition for this linear system to have a non-
trivial solution c1, c2, ..., cm is that the number of pivots be
less than m . Hence this is the condition for the set of column
vectors to be linearly dependent.
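In computational terms the criterion of 4.3.1 is a pivot count. A short SymPy sketch, applied to the three column vectors of Example 4.3.2 as reconstructed above:

    from sympy import Matrix

    A1, A2, A3 = Matrix([-1, 2]), Matrix([1, 2]), Matrix([2, -4])
    A = Matrix.hstack(A1, A2, A3)

    rref_form, pivots = A.rref()
    print(len(pivots))              # 2 pivots
    print(len(pivots) < A.cols)     # True: fewer pivots than columns, so the set is linearly dependent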
In 5.1 we shall learn how to tell if a set of vectors in an
arbitrary finitely generated vector space is linearly dependent.
An application to differential equations
In the theory of linear differential equations it is an im-
portant problem to decide if a given set of functions in the
vector space C[a, b] is linearly dependent. These functions
will normally be solutions of a homogeneous linear differential
equation. There is a useful way to test such a set of func-
tions for linear independence using a determinant called the
Wronskian.
Suppose that f1, f2, ..., fn are functions whose first n - 1
derivatives exist at all points of the interval [a, b]. In particular
this means that the functions will be continuous throughout
the interval, so they belong to C[a, b]. Assume that c1, c2, ...,
cn are real numbers such that c1f1 + c2f2 + ... + cnfn = 0, the
zero function on [a, b]. Now differentiate this equation n - 1
times, keeping in mind that the ci are constants. This results
in a set of n equations for c1, c2, ..., cn:
c1f1 + c2f2 + ... + cnfn = 0
c1f1' + c2f2' + ... + cnfn' = 0
. . . . . . . . .
c1f1^(n-1) + c2f2^(n-1) + ... + cnfn^(n-1) = 0
This linear system can be written in matrix form: the coefficient
matrix has (i, j) entry fj^(i-1), the (i - 1)th derivative of fj, and it
multiplies the column with entries c1, c2, ..., cn to give the zero
column. By 3.3.2, if the determinant of the coefficient matrix of the
linear system is not identically equal to zero in [a, b], the
linear system has only the trivial solution and the functions
f1, f2, ..., fn will be linearly independent. Define
W(f1, f2, ..., fn)
to be the n x n determinant whose (i, j) entry is fj^(i-1), the
(i - 1)th derivative of fj; its first row is f1, f2, ..., fn, its second
row is f1', f2', ..., fn', and its last row is f1^(n-1), ..., fn^(n-1).
This determinant is called the Wronskian of the functions
f1, f2, ..., fn. Then our discussion shows that the following is
true.
Theorem 4.3.2
Suppose that f1, f2, ..., fn are functions
whose first n - 1 derivatives exist in the interval [a, b]. If
W(f1, f2, ..., fn) is not identically equal to zero in this inter-
val, then f1, f2, ..., fn are linearly independent in [a, b].
The converse of 4.3.2 is false. In general one cannot
conclude that if f1, f2, ..., fn are linearly independent, then
W(f1, f2, ..., fn) is not the zero function. However, it turns
out that if the functions f1, f2, ..., fn are
solutions of a ho-
mogeneous linear differential equation of order n, then the
Wronskian can never vanish. Hence a necessary and sufficient
condition for a set of solutions of a homogeneous linear differ-
ential equation to be linearly independent is that their Wron-
skian should not be the zero function. For a detailed account
of this topic the reader should consult a book on differential
equations such as [16].
Example 4.3.3
Show that the functions x, e^x, e^(-2x) are linearly indepen-
dent in the vector space C[0, 1].
The Wronskian is
W(x, e^x, e^(-2x)) = | x   e^x    e^(-2x)  |
                     | 1   e^x   -2e^(-2x) |
                     | 0   e^x    4e^(-2x) |
                   = 3(2x - 1)e^(-x),
which is not identically equal to zero in [0, 1].
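SymPy has a built-in wronskian function that reproduces this calculation; a minimal sketch:

    from sympy import symbols, exp, simplify, wronskian

    x = symbols('x')
    W = wronskian([x, exp(x), exp(-2 * x)], x)
    print(simplify(W))    # equal to 3(2x - 1)e^(-x), in whatever equivalent form SymPy chooses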
Exercises 4.3
1. In each of the following cases determine if the subset S of
the vector space V is linearly dependent or linearly indepen-
dent:
(a) V = C and S consists of the column vectors
U)'KH'U4r);
(b) V = P(R) and S = {x - 1, x^2 + 1, x^3 - x^2 - x + 3};
(c) V = M(2, R) and S consists of the matrices
[ 2 -3 ]   [  3    1 ]   [ 12 -7 ]
[ 6  4 ],  [ -1/2 -3 ],  [ 17  6 ].
2. A subset of a vector space that contains the zero vector is
linearly dependent: true or false?
3. If X is a linearly independent subset of a vector space, every
non-empty subset of X is also linearly independent: true or
false?
4. If X is a linearly dependent subset of a vector space, every
non-empty subset of X is also linearly dependent: true or
false?
5. Prove that any three vectors in R2
are linearly dependent.
Generalize this result to Rn
.
6. Find a set of n linearly independent vectors in Rn
.
7. Find a set of mn linearly independent vectors in the vector
space Mm,n(R).
8. Show that the functions x, e^x sin x, e^x cos x form a lin-
early independent subset of the vector space C[0, π].
9. The union of two linearly independent subsets of a vector
space is linearly independent: true or false?
10. If {u, v}, {v, w}and{w,u} are linearly independent sub-
sets of a vector space, is the subset {u, v, w} necessarily lin-
early independent?
11. Show that two non-zero vectors in R2
are linearly de-
pendent precisely when they are represented by parallel line
segments in the plane.
Chapter Five
BASIS AND DIMENSION
We now specialize our study of vector spaces to finitely
generated vector spaces, that is, to those that can be generated
by finite subsets. The essential fact to be established is that
in any non-zero vector space there is a basis, that is to say, a
set of vectors in terms of which every vector of the space can
be written in a unique manner. This allows the representation
of vectors in abstract vector spaces by column vectors.
5.1 The Existence of a Basis
The following theorem on linear dependence is fundamen-
tal for everything in this chapter.
Theorem 5.1.1
Let v1, v2, ..., vm be vectors in a vector space V and let S =
< v1, v2, ..., vm >, the subspace generated by these vectors.
Then any subset of S containing m + 1 or more elements is
linearly dependent.
Proof
To prove the theorem it suffices to show that if u1, u2, ...,
u_{m+1} are any m + 1 vectors of the subspace S, then these vec-
tors are linearly dependent. This amounts to finding scalars
c1, c2, ..., c_{m+1}, not all of them zero, such that
c1u1 + c2u2 + ... + c_{m+1}u_{m+1} = 0.
Now, because ui belongs to S, there is an expression
ui = d1iv1 + d2iv2 + ... + dmivm
where the dji are certain scalars. On substituting for the ui,
we obtain
c1u1 + c2u2 + ... + c_{m+1}u_{m+1} = Σ_{i=1}^{m+1} ci ( Σ_{j=1}^{m} dji vj )
                                    = Σ_{j=1}^{m} ( Σ_{i=1}^{m+1} dji ci ) vj.
Here we have interchanged the summations over i and j . This
is permissible since it corresponds to adding up the vectors
CidjiVj in a different order, which is possible in a vector space
because of the commutative law for addition.
We deduce from the last equation that the vector c1u1 +
c2u2 + ... + c_{m+1}u_{m+1} will equal 0 provided that all the ex-
pressions Σ_i dji ci equal zero, that is to say, c1, c2, ..., c_{m+1}
form a non-trivial solution of the homogeneous linear system
DC = 0 where D is the m x (m + 1) matrix whose (j, i) en-
try is dji and C is the column consisting of c1, c2, ..., c_{m+1}.
But this linear system has m + 1 unknowns and m equations;
therefore, by 2.1.4, there is a non-trivial solution C. In conse-
quence there are indeed scalars c1, c2, ..., c_{m+1}, not all zero,
which make the vector c1u1 + c2u2 + ... + c_{m+1}u_{m+1} zero.
Corollary 5.1.2
If V is a vector space which can be generated by m elements,
then every subset of V with m + l or more vectors is linearly
dependent.
Thus the number of elements in a linearly independent
subset of a finitely generated vector space cannot exceed the
number of generators. On the other hand, if a subset is to
generate a vector space, it surely cannot be too small. We
unite these two contrasting requirements in the definition of
a basis.
Bases
Let X be a non-empty subset of a vector space V. Then
X is called a basis of V if both of the following are true:
(i) X is linearly independent;
(ii) X generates V.
Example 5.1.1
As a first example of a basis, consider the columns of the
identity n x n matrix In:
E1 = (1, 0, ..., 0), E2 = (0, 1, 0, ..., 0), ..., En = (0, ..., 0, 1),
written as columns. From the equation
c1E1 + c2E2 + ... + cnEn = (c1, c2, ..., cn)
it follows that E1, E2, ..., En generate Rn. But these vectors
are also linearly independent; for the equation also shows that
c1E1 + c2E2 + ... + cnEn cannot equal zero unless all the ci
are zero. Therefore the vectors E1, E2, ..., En form a basis of
the Euclidean space Rn
. This is called the standard basis of
Rn
.
An important property of bases is uniqueness of express-
ibility of vectors.
Theorem 5.1.3
If {v1, v2, ..., vn} is a basis of a vector space V, then each
vector v in V has a unique expression of the form
v = c1v1 + c2v2 + ... + cnvn
for certain scalars ci.
Proof
If there are two such expressions for v, say c1v1 + ... + cnvn
and d1v1 + ... + dnvn, then, by equating these, we arrive at
the equation
(c1 - d1)v1 + ... + (cn - dn)vn = 0.
By linear independence of the vi this can only mean that ci =
di for all i, so the expression is unique as claimed.
Naturally the question arises: does every vector space
have a basis? The answer is negative in general. Since a zero
space has 0 as its only vector, it has no linearly independent
subsets at all; thus a zero space cannot have a basis. However,
apart from this uninteresting case, every finitely generated
vector space has a basis, a fundamental result that will now
be proved. Notice that such a basis must be finite by 5.1.2.
Theorem 5.1.4
Let V be a finitely generated vector space and suppose that X0
is a linearly independent subset of V. Then X0 is contained in
some basis X of V.
Proof
Suppose that V is generated by m elements. Then by 5.1.2
no linearly independent subset of V can contain more than m
elements. From this it follows that there exists a subset X
of V containing X0 which is as large as possible subject to
being linearly independent. For if this were false, it would be
possible to find arbitrarily large linearly independent subsets
of V.
We will prove the theorem by showing that the subset
X is a basis of V. Write X = {v1, v2, ..., vn}. Suppose that
u is a vector in V which does not belong to X. Then the
subset {v1, v2, ..., vn, u} must be linearly dependent since it
properly contains X. Hence there is a linear relation of the
form
c1v1 + c2v2 + ... + cnvn + du = 0
where not all of the scalars c1, c2, ..., cn, d are zero. Now if
the scalar d were zero, it would follow that c1v1 + c2v2 +
... + cnvn = 0, which, in view of the linear independence of
v1, v2, ..., vn, could only mean that c1 = c2 = ... = cn = 0.
But now all the scalars are zero, which is not true. Therefore
d ≠ 0. Consequently we can solve the above equation for u to
obtain
u = (-d^{-1}c1)v1 + (-d^{-1}c2)v2 + ... + (-d^{-1}cn)vn.
Hence u belongs to < v1, ..., vn >. From this it follows that
the vectors v1, ..., vn generate V; since these are also linearly
independent, they form a basis of V.
Corollary 5.1.5
Every non-zero finitely generated vector space V has a basis.
Indeed by hypothesis V contains a non-zero vector, say v.
Then {v} is linearly independent and by 5.1.4 it is contained
in a basis of V.
Usually a vector space will have many bases. For exam-
ple, the vector space R2 has bases other than the standard
basis, which consists of the columns with entries 1, 0 and 0, 1.
And one can easily think of other examples. It is therefore
a very significant fact that all bases of a finitely generated
vector space have the same number of elements.
Theorem 5.1.6
Let V be a non-zero finitely generated vector space. Then any
two bases of V have equal numbers of elements.
Proof
Let {u1, u2, ..., um} and {v1, v2, ..., vn} be two bases of V.
Then
V = < u1, u2, ..., um >
and it follows from 5.1.2 that no linearly independent subset
of V can have more than m elements; hence n ≤ m. In the
same fashion we argue that m ≤ n. Therefore m = n.
Dimension
Let V be a finitely generated vector space. If V is non-
zero, define the dimension of V to be the number of elements
in a basis of V; this definition makes sense because 5.1.6 guar-
antees that all bases of V have the same number of elements.
Of course, a zero space does not have a basis; however it is
convenient to define the dimension of a zero space to be 0,
so that every finitely generated vector space has a dimension.
The dimension of a finitely generated vector space V is de-
noted by
dim(V).
In fact infinitely generated vector spaces also have bases,
and it is even possible to assign a dimension to such a space,
namely a cardinal number, which is a sort of infinite analog
of a positive integer. However this goes well beyond our brief,
so we shall say no more about it.
Example 5.1.2
The dimension of Rn
is n; indeed it has already been shown
in Example 5.1.1 that the columns of the identity matrix In
form a basis of Rn
.
Example 5.1.3
The dimension of Pn(R) is n. In this case the polynomials
1, x, x^2, ..., x^(n-1) form a basis (called the standard basis) of
Pn(R).
Example 5.1.4
Find a basis for the null space of the matrix A.
Recall that the null space of A is the subspace of R4
consisting of all solutions X of the linear system AX = 0. To
solve this system, put A in reduced row echelon form using
row operations:
1  0  4/3   4/3
0  1  1/3  -2/3
0  0   0     0
From this we read off the general solution in the usual way:
X = (-4c/3 - 4d/3, -c/3 + 2d/3, c, d), written as a column.
Now X can be written in the form
X = c(-4/3, -1/3, 1, 0) + d(-4/3, 2/3, 0, 1)
where c and d are arbitrary scalars. Hence the null space of
A is generated by the vectors
X1 = (-4/3, -1/3, 1, 0)  and  X2 = (-4/3, 2/3, 0, 1),
written as columns.
Notice that these vectors are obtained from the general solu-
tion X by putting c = 1, d = 0, and then c = 0, d = 1. Now
X1 and X2 are linearly independent. Indeed, if we assume
that some linear combination of them is zero, then, because
of the configuration of 0's and 1's, the scalars are forced to
be zero. It follows that X1 and X2 form a basis of the null
space of A, which therefore has dimension equal to 2.
It should be clear to the reader that this example de-
scribes a general method for finding a basis, and hence the di-
mension, of the null space of an arbitrary mxn matrix A. The
procedure goes as follows. Using elementary row operations,
put A in reduced row echelon form, with say r pivots. Then
the general solution of the linear system AX = 0 will con-
tain n - r arbitrary scalars, say c1, c2, ..., c_{n-r}. The method
of solving linear systems by elementary row operations shows
that the general solution can be written in the form
X = c1X1 + c2X2 + ... + c_{n-r}X_{n-r}
where X1, ..., X_{n-r} are particular solutions. In fact the solu-
tion Xi arises from X when we put ci = 1 and all other cj's
equal to 0. The vectors X1, X2, ..., X_{n-r} are linearly inde-
pendent, just as in the example, because of the arrangement
of 0's and 1's among their entries. It follows that a basis of
the null space of A is {X1, X2, ..., X_{n-r}}. We can therefore
state:
Theorem 5.1.7
Let A be a matrix with n columns and suppose that the number
of pivots in the reduced row echelon form of A is r. Then the
null space of A has dimension n — r.
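Theorem 5.1.7 is easy to check by machine, since SymPy reports both the pivot columns and a null space basis. A hedged sketch with an arbitrary example matrix (not the one in the text):

    from sympy import Matrix

    A = Matrix([[1, 2, 0, 3],
                [2, 4, 1, 5],
                [1, 2, 1, 2]])      # an arbitrary matrix with n = 4 columns

    rref_form, pivots = A.rref()
    basis = A.nullspace()

    print(len(pivots), len(basis))  # 2 2: the null space has dimension n - r = 4 - 2
    for X in basis:
        print((A * X).T)            # each basis vector is indeed a solution of AX = 0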
Coordinate column vectors
Let V be a vector space with an ordered basis
{v1, ..., vn}; this means that the basis vectors are to be writ-
ten in the prescribed order. We have seen in 5.1.3 that each
vector v of V has a unique expression in terms of the basis,
v = c1v1 + ... + cnvn
say. Thus v is completely determined by the scalars c1, ..., cn.
We call the column with entries c1, ..., cn
the coordinate vector of v with respect to the ordered basis
{v1, ..., vn}. Thus each vector in the abstract vector space V
is represented by an n-column vector. This provides us with
a concrete way of representing abstract vectors.
Example 5.1.5
Find the coordinate vector of (2, 3) with respect to the ordered
basis of R2 consisting of the vectors (1, 1) and (3, 4),
all written as columns.
First notice that these two vectors are linearly indepen-
dent and generate R2
, so that they form a basis. We need to
find scalars c and d such that
c(1, 1) + d(3, 4) = (2, 3).
This amounts to solving the linear system
c + 3d = 2
c + 4d = 3
The unique solution is c = -1, d = 1, and hence the coordi-
nate vector is (-1, 1), written as a column.
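Finding a coordinate vector is again a matter of solving one linear system; a short SymPy check of this example:

    from sympy import Matrix, linsolve, symbols

    v1, v2 = Matrix([1, 1]), Matrix([3, 4])     # the ordered basis
    v = Matrix([2, 3])                          # the vector to be expressed

    c, d = symbols('c d')
    print(linsolve((Matrix.hstack(v1, v2), v), c, d))   # {(-1, 1)}: the coordinate vector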
Coordinate vectors provide us with a method of testing a
subset of an arbitrary finitely generated vector space for linear
dependence.
Theorem 5.1.8
Let {v1, ..., vn} be an ordered basis of a vector space V. Let
u1, ..., um be a set of vectors in V whose coordinate vectors
with respect to the given ordered basis are X1, ..., Xm respec-
tively. Then {u1, ..., um} is linearly dependent if and only
if the number of pivots of the matrix A = [X1 | X2 | ... | Xm] is
less than m.
Proof
Write ui = Σ_{j=1}^{n} aji vj; then the entries of Xi are a1i, ..., ani,
so the (j, i) entry of A is aji. If c1, ..., cm are any scalars,
then
c1u1 + ... + cmum = Σ_{i=1}^{m} ci ( Σ_{j=1}^{n} aji vj ) = Σ_{j=1}^{n} ( Σ_{i=1}^{m} aji ci ) vj.
Since v1, ..., vn are linearly independent, the only way that
c1u1 + ... + cmum can be zero is if the sums Σ_{i=1}^{m} aji ci vanish
for j = 1, ..., n. This amounts to requiring that AC = 0
where C is the column consisting of c1, ..., cm. We know
from 2.1.3 that there is such a C different from 0 precisely
when the number of pivots of A is less than m. So this is the
condition for u1, ..., um to be linearly dependent.
Example 5.1.6
Are the polynomials 1 - x + 2x^2 - x^3, x + x^3, 2 + x + 4x^2 + x^3
linearly independent in P4(R)?
Use the standard ordered basis {1, x, x^2, x^3
} of P4(R).
Then the coordinate columns of the given polynomials are the
columns of the matrix
 1  0  2
-1  1  1
 2  0  4
-1  1  1
Using row operations, we see that the number of pivots of
the matrix is 2, which is less than the number of vectors.
Therefore the given polynomials are linearly dependent.
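The same test can be run in SymPy on the coordinate columns with respect to the standard basis {1, x, x^2, x^3}:

    from sympy import Matrix

    # coordinate columns of 1 - x + 2x^2 - x^3,  x + x^3,  2 + x + 4x^2 + x^3
    A = Matrix([[ 1, 0, 2],
                [-1, 1, 1],
                [ 2, 0, 4],
                [-1, 1, 1]])

    rref_form, pivots = A.rref()
    print(len(pivots), A.cols)      # 2 3
    print(len(pivots) < A.cols)     # True: the polynomials are linearly dependent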
The next theorem lessens the work needed to show that
a particular set is a basis.
Theorem 5.1.9
Let V be a finitely generated vector space with positive dimen-
sion n. Then
(i) any set of n linearly independent vectors of V is a
basis;
(ii) any set of n vectors that generates V is a basis.
Proof
Assume first that the vectors v1, v2, ..., vn are linearly inde-
pendent. Then by 5.1.4 the set {v1, v2, ..., vn} is contained
in a basis of V. But the latter must have n elements by 5.1.6,
and so it coincides with the set of Vj's.
Now assume that the vectors v1, v2, ..., vn generate V.
If these vectors are linearly dependent, then one of them, say
vi, can be expressed as a linear combination of the others.
But this means that we can dispense with vi completely and
generate V using only the vj's for j ≠ i, of which there are
n - 1. Therefore dim(V) ≤ n - 1 by 5.1.2. By this contradiction
v1, v2, ..., vn are linearly independent, so they form a basis
of V.
Example 5.1.7
The vectors
( - : ) •( : ) •( ! )
are linearly independent since the matrix which they form has
three pivots; therefore these vectors constitute a basis of R3
.
We conclude with an application of the ideas of this sec-
tion to accounting systems.
Example 5.1.8 (Transactions on an accounting system)
Consider an accounting system with n accounts, say
α1, α2, ..., αn. At any instant each account has a balance
which can be a credit (positive), a debit (negative), or zero.
Since the accounting system must at all times be in balance,
the sum of the balances of all the accounts will always be zero.
Now suppose that a transaction is applied to the system. By
this we mean that there is a flow of funds between accounts of
the system. If as a result of the transaction the balance of ac-
count αi changes by an amount ti, then the transaction can be
represented by an n-column vector with entries t1, t2, ..., tn.
Since the accounting system must still be in balance after the
transaction has been applied, the sum of the ti will be zero.
Hence the transactions correspond to column vectors with entries
t1, t2, ..., tn
such that t1 + t2 + ... + tn = 0. Now vectors of this form are easily
seen to constitute a subspace T of the vector space Rn
; this
is called the transaction space. Evidently T is just the null
space of the matrix
A = [ 1 1 ... 1 ]
    [ 0 0 ... 0 ]
    [ ......... ]
    [ 0 0 ... 0 ],
whose first row consists entirely of 1's and whose other rows are zero.
Now A is already in reduced row echelon form, so we can read
off at once the general solution of the linear system AX = 0 :
X = (-c2 - c3 - ... - cn, c2, c3, ..., cn), written as a column,
with arbitrary real scalars c2, c3, ..., cn. Now we can find a
basis of the null space in the usual way. For i = 2,..., n define
Ti to be the n-column vector with first entry -1, ith entry 1,
and all other entries zero. Then
X = c2T2 + c3T3 + ... + cnTn
and {T2, T3,..., Tn} is a basis of the transaction space T. Thus
dim(T) = n - 1. Observe that Ti corresponds to a simple
transaction, in which there is a flow of funds amounting to
one unit from account α1 to account αi and which does not
affect other accounts.
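A small Python sketch of the transaction space (a hedged illustration with n = 4 accounts; the particular transaction is an arbitrary choice) builds the simple transactions T2, ..., Tn and expresses a given transaction in terms of them:

    import numpy as np

    n = 4

    def T(i):
        # simple transaction: one unit flows from account 1 to account i
        t = np.zeros(n)
        t[0], t[i - 1] = -1.0, 1.0
        return t

    basis = [T(i) for i in range(2, n + 1)]          # T2, T3, T4

    X = np.array([6.0, -4.0, 3.0, -5.0])             # a transaction: the entries sum to 0
    coeffs = X[1:]                                   # the coefficients are just the entries t2, ..., tn
    print(sum(c * t for c, t in zip(coeffs, basis))) # reproduces X, so X = t2*T2 + t3*T3 + t4*T4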
Exercises 5.1
1. Show that the following sets of vectors form bases of R3
,
and then express the vectors Ei, E2, E3 of the standard basis
in terms of these:
2 3 1
3 1 4
1 2 1
1
- 7
0
(b) Yt = 1 , Y2 = 1 , Y3 =
2. Find a basis for the null space of each of the following
matrices:
1 - 5
(a) | - 4 2 - 6 J ; (b)
3 1
3. What is the dimension of the vector space Mm,n(F) where
F is an arbitrary field of scalars?
4. Let V be a vector space containing vectors v1, v2, ..., vn
and suppose that each vector of V has a unique expression
as a linear combination of v1, v2, ..., vn. Prove that the vi's
form a basis of V.
5. If S is a subspace of a finitely generated vector space V,
establish the inequality dim(S) ≤ dim(V).
6. If in the last problem dim(S) = dim(V), show that S = V.
7. If V is a vector space of dimension n, show that for each
integer i satisfying 0 ≤ i ≤ n there is a subspace of V which
has dimension i.
( 6

8. Write the transaction I —4 as a linear combination of
v-v
simple transactions.
9. Prove that vectors A, B, C generate R3
if and only if
none of these vectors belongs to the subspace generated by
the other two. Interpret this result geometrically.
10. If V is a vector space with dimension n over the field of
two elements, prove that V contains exactly 2^n
vectors.
5.2 The Row and Column Spaces of a Matrix
Let A be an m x n matrix over some field of scalars F.
Then the columns of A are m-column vectors, so they belong
to the vector space Fm
, while the rows of A are n-row vectors
and belong to the vector space Fn. Thus there are two natural
subspaces associated with A, the row space, which is generated
by the rows of A and is a subspace of Fn, and the column space,
generated by the columns of A, which is a subspace of Fm
.
We begin the study of these important subspaces by in-
vestigating the effect upon them of applying row and column
operations to the matrix.
Theorem 5.2.1
Let A be any matrix.
(i) The row space is unchanged when an elementary row
operation is applied to A.
(ii) The column space is unchanged when an elementary
column operation is applied to A.
Proof
Let B arise from A when an elementary row operation is ap-
plied. Then by 2.3.1 there is an elementary matrix E such
that B = EA. The row-times-column rule of matrix multi-
plication shows that each row of B is a linear combination of
the rows of A. Hence the row space of B is contained in the
row space of A. But A = E~1
B, since elementary matrices are
invertible, so the same argument shows that the row space of
A is contained in the row space of B. Therefore the row spaces
of A and B are identical. Of course, the argument for column
spaces is analogous.
There are simple procedures available for finding bases
for the row and column spaces of a matrix.
(I) To find a basis of the row space of a matrix A, use
elementary row operations to put A in reduced row echelon
form. Discard any zero rows; then the remaining rows will
form a basis of the row space of A.
(II) To find a basis of the column space of a matrix A,
use elementary column operations to put A in reduced column
echelon form. Discard any zero columns; then the remaining
columns will form a basis of the column space of A.
Why do these procedures work? By 5.2.1 the row space
of A equals the row space of R, its reduced row echelon form,
and this is certainly generated by the non-zero rows of R. Also
the non-zero rows of R are linearly independent because of the
arrangement of O's and l's in R ; therefore these rows form a
basis of the row space of A. Again the argument for columns
is similar. This discussion makes the following result obvious.
Corollary 5.2.2
For any matrix the dimension of the row space equals the num-
ber of pivots in reduced row echelon form, with a like statement
for columns.
Example 5.2.1
Consider the matrix
A =
 2  1  1  3  2
-1  2  1  1  3
 0  0  1  0  1
 0  1  0  1  1
The reduced row echelon form of A is found to be
1 0 0 1 0
0 1 0 1 1
0 0 1 0 1
0 0 0 0 0
Hence the row vectors [1 0 0 1 0], [0 1 0 1 1], [0 0 1 0 1] form
a basis of the row space of A and the dimension of this space
is 3.
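Both procedures can be carried out mechanically. A SymPy sketch applied to the matrix of this example (as reconstructed above); note that for the column space SymPy returns the pivot columns of A itself rather than a reduced column echelon form, which is a different but equally valid basis:

    from sympy import Matrix

    A = Matrix([[ 2, 1, 1, 3, 2],
                [-1, 2, 1, 1, 3],
                [ 0, 0, 1, 0, 1],
                [ 0, 1, 0, 1, 1]])

    R, pivots = A.rref()
    row_basis = [R.row(i) for i in range(len(pivots))]   # the non-zero rows of the rref
    col_basis = A.columnspace()                          # the pivot columns of A

    print(len(row_basis), len(col_basis))                # 3 3: both dimensions equal the rank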
In general elementary row operations change the column
space of a matrix, and column operations change the row
space. However it is an important fact that such operations
do not change the dimension.
Theorem 5.2.3
For any matrix, elementary row operations do not change the
dimension of the column space and elementary column opera-
tions do not change the dimension of the row space.
Proof
Take the case of row operations first. Let A be a matrix
with n columns and suppose that B = EA where E is an
elementary matrix. We have to show that the column spaces
of A and B have the same dimension. Denote the columns
of A by A1, A2, ..., An. If some of these columns are linearly
dependent, then there are integers i1 < i2 < ... < ir and
non-zero scalars c_{i1}, c_{i2}, ..., c_{ir} such that
c_{i1}A_{i1} + c_{i2}A_{i2} + ... + c_{ir}A_{ir} = 0.
Consequently there is a non-trivial solution C of the linear
system AC = 0 such that cj ≠ 0 for j = i1, ..., ir. Using the
equation B = EA, we find that BC = EAC = E0 = 0. This
means that columns i1, ..., ir of B are also linearly dependent.
Therefore, if columns j1, ..., js of B are linearly independent,
then so are columns j1, ..., js of A. Hence the dimension of
the column space of B does not exceed the dimension of the
column space of A.
Since A = E^{-1}B, this argument can be applied equally
well to show that the dimension of the column space of A does
not exceed that of B. Therefore these dimensions are equal.
The truth of the corresponding statement for row spaces
can be quickly deduced from what has just been proved. Let
B = AE where E is an elementary matrix. Then B^T =
(AE)^T = E^T A^T. Now E^T is also an elementary matrix, so
by the last paragraph the column spaces of A^T and B^T have
the same dimension. But obviously the column space of A^T
and the row space of A have the same dimension, and there is
a similar statement for B: the required result follows at once.
We are now in a position to connect row and column
spaces with normal form and at the same time to clarify a
point left open in Chapter Two.
Theorem 5.2.4
If A is any matrix, then the following integers are equal:
(i) the dimension of the row space of A;
(ii) the dimension of the column space of A;
(iii) the number of 1's in a normal form of A.
Proof
By applying elementary row and column operations to A, we
can reduce it to normal form, say
N = [ Ir 0 ]
    [ 0  0 ].
Now by 5.2.1 and 5.2.3 the row spaces of A and N have the
same dimension, with a like statement for column spaces. But
it is clear from the form of N that the dimensions of its row
and column spaces are both equal to r, so the result follows.
It is a consequence of 5.2.4 that every matrix has a unique
normal form; for the normal form is completely determined
by the number of l's on the diagonal.
The rank of a matrix
The rank of a matrix is defined to be the dimension of the
row or column space. With this definition we can reformulate
the condition for a linear system to be consistent.
Theorem 5.2.5
A linear system is consistent if and only if the ranks of the
coefficient matrix and the augmented matrix are equal.
This is an immediate consequence of 2.2.1, and 5.2.2.
Finding a basis for a subspace
Suppose that X1, X2, ..., Xk are vectors in Fn where F
is a field. In effect we already know how to find a basis for
the subspace generated by these vectors; for this subspace is
simply the column space of the matrix [X1 | X2 | ... | Xk]. But
what about subspaces of vector spaces other than Fn
? It turns
out that use of coordinate vectors allows us to reduce the
problem to the case of Fn
.
Let V be a vector space over F with a given ordered
basis v1, v2, ..., vn, and suppose that S is the subspace of
V generated by some given set of vectors w1, w2, ..., wm.
The problem is to find a basis of S. Recall that each vec-
tor in V has a unique expression as a linear combination of
the basis vectors v1, ..., vn and hence has a unique coordi-
nate column vector, as described in 5.1. Let wi have co-
ordinate column vector Xi with respect to the given basis.
Then the coordinate column vector of the linear combination
c1w1 + c2w2 + ... + ckwk is surely c1X1 + c2X2 + ... + ckXk.
Hence the set of all coordinate column vectors of elements of S
equals the subspace T of Fn
which is generated by Xx,..., Xk •
Moreover wi, W2,. • •, w/. will be linearly independent if and
only if Xi, X2 ,..., Xk are. In short wi, w2 ,..., w*; form a
basis of S if and only if X, X2, •
. •, Xk form a basis of T; thus
our problem is solved.
Example 5.2.2
Find a basis for the subspace of P4(R) generated by the poly-
nomials 1 - x - 2x^3, 1 + x^3, 1 + x + 4x^3, x^2.
Of course we will use the standard ordered basis for P4(R)
consisting of 1, x, x^2, x^3. The first step is to write down the
coordinate vectors of the given polynomials with respect to the
standard basis and arrange them as the columns of a matrix
A; thus
 1  1  1  0
-1  0  1  0
 0  0  0  1
-2  1  4  0
To find a basis for the column space of A, use column opera-
tions to put it in reduced column echelon form:
1 0 0 0
0 1 0 0
0 0 1 0
1 3 0 0
The first three columns form a basis for the column space
of A. Therefore we get a basis for the subspace of P4(R) gen-
erated by the given polynomials by simply writing down the
polynomials that have these columns as their coordinate col-
umn vectors; in this way we arrive at the basis
1 + x^3, x + 3x^3, x^2.
Hence the subspace generated by the given polynomials has
dimension 3.
Exercises 5.2
1. Find bases for the row and column spaces of the following
matrices:
<-»C2 =S i)-o» ("J J| J)-
2. Find bases for the subspaces generated by the given vectors
in the vector spaces indicated:
(a) 1 - 2x - x^3, 3x - x^2, 1 + x + x^2 + x^3, 4 + 7x + x^2 + 2x^3
in P4(R);
3. Let A be a matrix and let N, R and C be the null space,
row space and column space of A respectively. Prove that
dim(R) + dim(N) = dim(C) + dim(N) = n
where n is the number of columns of A.
4. If A is any matrix, show that A and AT
have the same
rank.
5. Suppose that A is an m x n matrix with rank r. What is
the dimension of the null space of A^T?
6. Let A and B be m x n and n x p matrices respectively.
Prove that the row space of AB is contained in the row space
of B, and the column space of AB is contained in the
column space of A. What can one conclude about the ranks
of AB and BA ?
7. The rank of a matrix can be defined as the maximum num-
ber of rows in an invertible submatrix: justify this statement.
5.3 Operations with Subspaces
If U and W are subspaces of a vector space V, there
are two natural ways of combining U and W to form new
subspaces of V. The first of these subspaces is the intersection
U ∩ W,
which is the set of all vectors that belong to both U and W.
The second subspace that can be formed from U and W
is not, as one might perhaps expect, their union U ∪ W; for
this is not in general closed under addition, so it may not be
a subspace. The subspace we are looking for is the sum
U + W,
which is defined to be the set of all vectors of the form u + w
where u belongs to U and w to W.
The first point to note is that these are indeed subspaces.
Theorem 5.3.1
If U and W are subspaces of a vector space V, then U ∩ W
and U + W are subspaces of V.
Proof
Certainly U ∩ W contains the zero vector and it is closed with
respect to addition and scalar multiplication since both U and
W are; therefore U ∩ W is a subspace.
The same method applies to U + W. Clearly this contains
0 + 0 = 0. Also, if u1, u2 and w1, w2 are vectors in U and
W respectively, and c is a scalar, then
(u1 + w1) + (u2 + w2) = (u1 + u2) + (w1 + w2)
and
c(u1 + w1) = cu1 + cw1,
both of which belong to U + W. Thus U + W is closed with
respect to addition and scalar multiplication and so it is a
subspace.
Example 5.3.1
Consider the subspaces U and W of R4
consisting of all vectors
of the forms
(a, b, c, 0) and (0, d, e, f),
written as columns,
respectively, where a, b, c, d, e, f are arbitrary scalars. Then
U ∩ W consists of all vectors of the form
(0, b, c, 0), written as a column,
while U + W equals R4
since every vector in R4
can be ex-
pressed as the sum of a vector in U and a vector in W.
For subspaces of a finitely generated vector space there is
an important formula connecting the dimensions of their sum
and intersection.
Theorem 5.3.2
Let U and W be subspaces of a finitely generated vector space
V. Then
dim(U + W) + dim(U ∩ W) = dim(U) + dim(W).
Proof
If U = 0, then obviously U + W = W and U ∩ W = 0; in this
case the formula is certainly true, as it is when W = 0.
Assume therefore that U ≠ 0 and W ≠ 0, and put
m = dim(U) and n = dim(W). Consider first the case where
U ∩ W = 0. Let {u1, u2, ..., um} and {w1, w2, ..., wn} be
bases of U and W respectively. Then the vectors u1, ..., um
and w1, ..., wn surely generate U + W. In fact these vectors
are also linearly independent: for if there is a linear relation
between them, say
c1u1 + ... + cmum + d1w1 + ... + dnwn = 0,
then
c1u1 + ... + cmum = (-d1)w1 + ... + (-dn)wn,
a vector which belongs to both U and W, and so to U ∩ W,
which is the zero subspace. Consequently this vector must be
the zero vector. Therefore all the ci and dj must be zero since
the ui are linearly independent, as are the wj. Consequently
the vectors u1, ..., um, w1, ..., wn form a basis of U + W, so
that dim(U + W) = m + n = dim(U) + dim(W), the correct
formula since U ∩ W = 0 in the case under consideration.
Now we tackle the more difficult case where U ∩ W ≠ 0.
First choose a basis for U ∩ W, say {z1, ..., zr}. By 5.1.4 this
may be extended to bases of U and of W, say
{z1, ..., zr, u_{r+1}, ..., um}
and
{z1, ..., zr, w_{r+1}, ..., wn}
respectively. Now the vectors
z1, ..., zr, u_{r+1}, ..., um, w_{r+1}, ..., wn
generate U + W: for we can express any vector of U or W in
terms of them. What still needs to be proved is that they are
linearly independent. Suppose that in fact there is a linear
relation
Σ_{i=1}^{r} ei zi + Σ_{j=r+1}^{m} cj uj + Σ_{k=r+1}^{n} dk wk = 0
where the ei, cj, dk are scalars. Then
Σ_{k=r+1}^{n} dk wk = Σ_{i=1}^{r} (-ei) zi + Σ_{j=r+1}^{m} (-cj) uj,
which belongs to both U and W and so to U ∩ W. The
vector Σ dk wk is therefore expressible as a linear combination
of the zi since these vectors are known to form a basis of
the subspace U ∩ W. However z1, ..., zr, w_{r+1}, ..., wn are
definitely linearly independent. Therefore all the dk are zero
and our linear relation becomes
Σ_{i=1}^{r} ei zi + Σ_{j=r+1}^{m} cj uj = 0.
But z1, ..., zr, u_{r+1}, ..., um are linearly independent, so it fol-
lows that the cj and the ei are also zero, which establishes
linear independence.
We conclude that the vectors z1, ..., zr, u_{r+1}, ..., um,
w_{r+1}, ..., wn form a basis of U + W. A count of the basis
vectors reveals that dim(U + W) equals
r + (m - r) + (n - r) = m + n - r
= dim(U) + dim(W) - dim(U ∩ W).
Example 5.3.2
Suppose that U and W are subspaces of R¹⁰ with dimensions 6 and 8 respectively. Find the smallest possible dimension for U ∩ W.
Of course dim(R¹⁰) = 10 and, since U + W is a subspace of R¹⁰, its dimension cannot exceed 10. Therefore by 5.3.2

dim(U ∩ W) = dim(U) + dim(W) − dim(U + W) ≥ 6 + 8 − 10 = 4.

So the dimension of the intersection is at least 4. The reader is challenged to think of an example which shows that the intersection really can have dimension 4.
Direct sums of subspaces
Let U and W be two subspaces of a vector space V. Then
V is said to be the direct sum of U and W if
V = U + W and U ∩ W = 0.

The notation for the direct sum is

V = U ⊕ W.

Notice the consequence of the definition: each vector v of V has a unique expression of the form v = u + w where u belongs to U and w to W. Indeed, if there are two such expressions v = u1 + w1 = u2 + w2 with ui in U and wi in W, then u1 − u2 = w2 − w1, which belongs to U ∩ W = 0; hence u1 = u2 and w1 = w2.
Example 5.3.3
Let U denote the subset of R³ consisting of all vectors of the form (a, b, 0)ᵀ and let W be the subset of all vectors of the form (0, 0, c)ᵀ, where a, b, c are arbitrary scalars. Then U and W are subspaces of R³. In addition U + W = R³ and U ∩ W = 0. Hence

R³ = U ⊕ W.
Theorem 5.3.3
If V is a finitely generated vector space and U and W are
subspaces of V such that V = U ⊕ W, then
dim(V) = dim(U) + dim(W).
This follows at once from 5.3.2 since dim(U ∩ W) = 0.
Direct sums of more than two subspaces
The concept of a direct sum can be extended to any finite
set of subspaces. Let U1, U2, ..., Uk be subspaces of a vector space V. First of all define the sum of these subspaces

U1 + ··· + Uk

to be the set of all vectors of the form u1 + ··· + uk where ui belongs to Ui. This is clearly a subspace of V. The vector space V is said to be the direct sum of the subspaces U1, ..., Uk, in symbols

V = U1 ⊕ U2 ⊕ ··· ⊕ Uk,

if the following hold:
(i) V = U1 + ··· + Uk;
(ii) for each i = 1, 2, ..., k the intersection of Ui with the sum of all the other subspaces Uj, j ≠ i, equals zero.
In fact these are equivalent to requiring that every element of V be expressible in a unique fashion as a sum of the form u1 + ··· + uk where ui belongs to Ui.
The concept of a direct sum is a useful one since it often
allows us to express a vector space as a direct sum of subspaces
that are in some sense simpler.
Example 5.3.4
Let U1, U2, U3 be the subspaces of R⁵ which consist of all vectors of the forms

(0, 0, a, 0, 0)ᵀ,  (0, b, 0, c, 0)ᵀ,  (d, 0, 0, 0, e)ᵀ

respectively, where a, b, c, d, e are arbitrary scalars. Then

R⁵ = U1 ⊕ U2 ⊕ U3.
Bases for the sum and intersection of subspaces
Suppose that V is a vector space over a field F with posi-
tive dimension n and let there be given a specific ordered basis.
Assume that we have vectors u1, ..., ur and w1, ..., ws, generating subspaces U and W respectively. How can we find bases for the subspaces U + W and U ∩ W and hence compute their dimensions?
The first step in the solution is to translate the problem to the vector space Fⁿ. Associate with each ui and wj its coordinate column vector Xi and Yj with respect to the given ordered basis of V. Then X1, ..., Xr and Y1, ..., Ys generate respective subspaces U* and W* of Fⁿ. It is sufficient if we can find bases for U* + W* and U* ∩ W*, since from these, bases for U + W and U ∩ W can be read off. So assume from now on that V equals Fⁿ.
Take the case of U + W first - it is the easier one. Let A be the matrix whose columns are u1, ..., ur: remember that these are now n-column vectors. Also let B be the matrix whose columns are w1, ..., ws. Then U + W is just the column space of the matrix M = [A | B]. A basis for U + W can therefore be found by putting M in reduced column echelon form and deleting the zero columns.
Turning now to U ∩ W, we look for scalars ci and dj such that

c1u1 + ··· + crur = d1w1 + ··· + dsws,

for every element of U ∩ W is of this form. Equivalently

c1u1 + ··· + crur + (−d1)w1 + ··· + (−ds)ws = 0.

Now this equation asserts that the vector

(c1, ..., cr, −d1, ..., −ds)ᵀ

belongs to the null space of [A | B]. A method for finding a basis for the null space of a matrix was described in 5.1. To complete the process, read off the first r entries of each vector in the basis of the null space of [A | B], and take these entries to be c1, ..., cr. The resulting vectors c1u1 + ··· + crur form a basis of U ∩ W.
Example 5.3.5
Let
M =
  [ 1   0   2   1 ]
  [ 2   1   5   2 ]
  [ 2  -2  -1  -1 ]
  [ 1   1   5   3 ]

and denote by U and W the subspaces of R⁴ generated by columns 1 and 2, and by columns 3 and 4 of M respectively. Find a basis for U + W.
Apply the procedure for finding a basis of the column space of M. Putting M in reduced column echelon form, we obtain

  [ 1    0     0    0 ]
  [ 0    1     0    0 ]
  [ 0    0     1    0 ]
  [ 3  -1/3  -2/3   0 ]

The first three columns of this matrix form a basis of U + W; hence dim(U + W) = 3.
Example 5.3.6
Find a basis of U ∩ W where U and W are the subspaces of Example 5.3.5.
Following the procedure indicated above, we put the matrix M in reduced row echelon form:

  [ 1   0   0  -1 ]
  [ 0   1   0  -1 ]
  [ 0   0   1   1 ]
  [ 0   0   0   0 ]

From this a basis for the null space of M can be read off, as described in the paragraph preceding 5.1.7; in this case the basis has the single element

  (1, 1, -1, 1)ᵀ.

Therefore a basis for U ∩ W is obtained by taking the linear combination of the generating vectors of U corresponding to the scalars in the first two rows of this vector, that is to say

  1·(1, 2, 2, 1)ᵀ + 1·(0, 1, -2, 1)ᵀ = (1, 3, 0, 2)ᵀ.

Thus dim(U ∩ W) = 1.
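For readers who wish to experiment, the whole procedure can be carried out mechanically. The following short Python (SymPy) sketch has been added to this copy as an illustration (it is not part of the original text); it runs the method of Examples 5.3.5 and 5.3.6 on the matrix M above.

    from sympy import Matrix

    M = Matrix([[1, 0, 2, 1],
                [2, 1, 5, 2],
                [2, -2, -1, -1],
                [1, 1, 5, 3]])
    A = M[:, :2]          # the two columns generating U
    # Basis of U + W: the column space of M.  SymPy returns the pivot
    # columns of M; the text's reduced column echelon form gives an
    # equivalent (possibly different-looking) basis.
    sum_basis = M.columnspace()
    print("dim(U + W) =", len(sum_basis))            # 3

    # Basis of U ∩ W: take the first two entries of each null space
    # vector of M as coefficients for the columns generating U.
    intersection_basis = [A * N[:2, :] for N in M.nullspace()]
    print("dim(U ∩ W) =", len(intersection_basis))   # 1
    print(intersection_basis[0].T)                   # [1, 3, 0, 2]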
Example 5.3.7
Find bases for the sum and intersection of the subspaces U and
W of P₄(R) generated by the respective sets of polynomials

{1 + 2x + x³, 1 − x − x²}  and  {x + x² − 3x³, 2 + 2x − 2x³}.

The first step is to translate the problem to R⁴ by writing down the coordinate columns of the given polynomials with respect to the standard ordered basis 1, x, x², x³ of P₄(R). Arranged as the columns of a matrix, these are

A =
  [ 1   1   0   2 ]
  [ 2  -1   1   2 ]
  [ 0  -1   1   0 ]
  [ 1   0  -3  -2 ]

Let U* and W* be the subspaces of R⁴ generated by the coordinate columns of the polynomials that generate U and W, that is, by columns 1 and 2, and by columns 3 and 4 of A respectively. Now find bases for U* + W* and U* ∩ W*, just as in Examples 5.3.5 and 5.3.6. It emerges that U* + W*, which is just the column space of A, has a basis

  (1, 0, 0, -3)ᵀ,  (0, 1, 0, 2)ᵀ,  (0, 0, 1, -5)ᵀ.

On writing down the polynomials with these coordinate vectors, we obtain the basis 1 − 3x³, x + 2x³, x² − 5x³ for U + W.
In the case of U ∩ W the procedure is to find a basis for U* ∩ W*. This turns out to consist of the single vector

  (2, 1, -1, 1)ᵀ.

Finally, read off that the polynomial

1·(1 + 2x + x³) + 1·(1 − x − x²) = 2 + x − x² + x³

forms a basis of U ∩ W.
Quotient Spaces
We conclude the section by describing another subspace
operation, the formation of the quotient space of a vector
space with respect to a subspace. This new vector space is
formed by identifying the vectors in certain subsets of the
given vector space, which is a construction found throughout
algebra.
Proceeding now to the details, let us consider a vector
space V with a fixed subspace U. The first step is to define
certain subsets called cosets: the coset of U containing a given
vector v is the subset of V
v + U = {v + u | u ∈ U}.

Notice that the coset v + U really does contain the vector v since v = v + 0 ∈ v + U. Observe also that the coset v + U can be represented by any one of its elements in the sense that (v + u) + U = v + U for all u ∈ U.
An important feature of the cosets of a given subspace is that distinct cosets are disjoint, i.e., they do not overlap.
Lemma 5.3.4
If U is a subspace of a vector space V, then distinct cosets of
U are disjoint. Thus V is the disjoint union of all the distinct
cosets of U.
Proof
Suppose that cosets v + U and w + U both contain a vector
x: we will show that these cosets are the same. By hypothesis
there are vectors u1, u2 in U such that

x = v + u1 = w + u2.

Hence v = w + u where u = u2 − u1 ∈ U, and consequently v + U = (w + u) + U = w + U, since u + U = U, as claimed. Finally, V is the union of all the cosets of U since v ∈ v + U.
The set of all cosets of U in V is written
V/U.
A good way to think about V/U is that its elements arise by
identifying all the elements in a coset, so that each coset has
been "compressed" to a single vector.
The next step in the construction is to turn V/U into a
vector space by defining addition and scalar multiplication on
it. There are natural definitions for these operations, namely
(v + U) + (w + U) = (v + w) + U
c(v + U) = (cv) + U
where v, w € V and c is a scalar.
Although these definitions look natural, some care must
be exercised. For a coset can be represented by any of its
vectors, so we must make certain that the definitions just given
do not depend on the choice of v and w in the cosets v + U
and w + U.
To verify this, suppose we had chosen different representatives, say v′ for v + U and w′ for w + U. Then v′ = v + u1 and w′ = w + u2 where u1, u2 ∈ U. Therefore

v′ + w′ = (v + w) + (u1 + u2) ∈ (v + w) + U,

so that (v′ + w′) + U = (v + w) + U. Also cv′ = cv + cu1 ∈ (cv) + U and hence cv′ + U = cv + U. These arguments show that our definitions are free from dependency on the choice of coset representatives.
Theorem 5.3.5
If U is a subspace of a vector space V over a field F, then V/U
is a vector space over F where sum and scalar multiplication
are defined above: also the zero vector is 0 + U = U and the
negative of v + U is (—v) + U.
Proof
We have to check that the vector space axioms hold for V/U,
which is an entirely routine task. As an example, let us verify
one of the distributive laws. Let v, w G V and let c E F.
Then by definition
c((v + U) + (w + U)) = c((v + w) + U) = c(v + w) + U
= (cv + cw) + U,
which by definition equals (cv+U) + (cw+U). This establishes
the distributive law. Verification of the other axioms is left to
the reader as an exercise. It also is easy to check that 0 is the
zero vector and (—v) + U the negative ofv + U.
Example 5.3.8
Suppose we take U to be the zero subspace of the vector space
V: then V/0 consists of all v + 0 = {v}, i.e., the one-element
subsets of V. While V/0 is not the same vector space as V,
the two spaces are clearly very much alike: this can be made
precise by saying that they are isomorphic (see 6.3).
At the opposite extreme, we could take U = V. Now
V/V consists of the cosets v + V = V, i.e., there is just one
element. So V/V is a zero vector space.
We move on to more interesting examples of coset forma-
tion.
Example 5.3.9
Let S be the set of all solutions of a consistent linear system
AX = B of m equations in n unknowns over a field F. If
B = 0, then S is a subspace of Fⁿ, namely, the solution space U of the associated homogeneous linear system AX = 0. However, if B ≠ 0, then S is not a subspace: but we will see that it is a coset of the subspace U.
Since the system is consistent, there is at least one solution, say X1. Suppose X is another solution. Then we have AX1 = B and AX = B. Subtracting the first of these equations from the second, we find that

0 = AX − AX1 = A(X − X1),

so that X − X1 ∈ U and X ∈ X1 + U, where U is the solution space of the system AX = 0. Hence every solution of AX = B belongs to the coset X1 + U and thus S ⊆ X1 + U.
Conversely, consider any Y ∈ X1 + U, say Y = X1 + Z where Z ∈ U. Then AY = AX1 + AZ = B + 0 = B. Therefore Y ∈ S and S = X1 + U.
These considerations have established the following re-
sult.
Theorem 5.3.6
Let AX = B be a consistent linear system. Let X1 be any fixed solution of the system and let U be the solution space of the associated homogeneous linear system AX = 0. Then the set of all solutions of the linear system AX = B is the coset X1 + U.
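As a concrete illustration, the coset description of 5.3.6 is easy to verify with SymPy. The small system used below is an assumed example chosen for this copy, not one taken from the text.

    from sympy import Matrix, symbols, linsolve

    A = Matrix([[1, 1, 0],
                [0, 1, 1]])
    B = Matrix([3, 2])

    # One fixed solution X1 of AX = B ...
    X1 = Matrix([1, 2, 0])
    assert A * X1 == B

    # ... and the solution space U of the homogeneous system AX = 0.
    U_basis = A.nullspace()       # here: [Matrix([1, -1, 1])]

    # Every solution is X1 plus an element of U; compare with SymPy's answer.
    x1, x2, x3 = symbols('x1 x2 x3')
    print(linsolve((A, B), x1, x2, x3))   # {(x3 + 1, 2 - x3, x3)}
    print(X1.T, [u.T for u in U_basis])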
Our last example of coset formation is a geometric one.
Example 5.3.10
Let A and B be vectors in R³ representing non-parallel line segments in 3-dimensional space. Then the subspace

U = <A, B>

has dimension 2 and consists of all cA + dB, (c, d ∈ R). The vectors in U are represented by line segments, drawn from the origin, which lie in a plane P. Now choose X ∈ R³, with X = (x1, x2, x3)ᵀ. A typical vector in the coset X + U has the form X + cA + dB, with c, d ∈ R, i.e.,

(x1 + ca1 + db1, x2 + ca2 + db2, x3 + ca3 + db3)ᵀ.

Now the points (x1 + ca1 + db1, x2 + ca2 + db2, x3 + ca3 + db3) lie in the plane P1 passing through the point (x1, x2, x3), which is parallel to the plane P. This is seen by forming the line segment joining two such points. The elements of X + U correspond to the points in the plane P1: the latter is called a translate of the plane P.
Dimension of a Quotient Space
We conclude the discussion by noting a simple formula
for the dimension of a quotient space of a finite dimensional
vector space.
Theorem 5.3.7
Let U be a subspace of a finite dimensional vector space V.
Then
dim(V/U) = dim(V) − dim(U).
Proof
If U = 0, then dim(U) = 0 and V/0 = {{v} | v ∈ V}, which clearly has the same dimension as V. Thus the formula is valid in this case.
Now let U ≠ 0 and choose a basis {u1, ..., um} of U. By 5.1.4 we may extend this to a basis

{u1, ..., um, u_{m+1}, ..., un}

of V. Here of course m = dim(U) and n = dim(V). A typical element v of V has the form v = c1u1 + ··· + cnun, where the ci are scalars. Next

v + U = (c_{m+1}u_{m+1} + ··· + cnun) + U = c_{m+1}(u_{m+1} + U) + ··· + cn(un + U),

since c1u1 + ··· + cmum ∈ U. Hence u_{m+1} + U, ..., un + U generate the quotient space V/U.
On the other hand, if c_{m+1}(u_{m+1} + U) + ··· + cn(un + U) = 0_{V/U} = U, then c_{m+1}u_{m+1} + ··· + cnun ∈ U, so that this vector is a linear combination of u1, ..., um. Since the ui are linearly independent, it follows that c_{m+1} = ··· = cn = 0. Therefore u_{m+1} + U, ..., un + U form a basis of V/U and hence

dim(V/U) = n − m = dim(V) − dim(U).
Exercises 5.3
1. Find three distinct subspaces U, V, W of R² such that

R² = U ⊕ V = V ⊕ W = W ⊕ U.
2. Let U and W denote the sets of all n x n real symmetric
and skew-symmetric matrices respectively. Show that these
are subspaces of Mn (R), and that Mn (R) is the direct sum of
U and W. Find dim(U) and dim(W).
3. Let U and W be subspaces of a vector space V and suppose
that each vector v in V has a unique expression of the form
v = u + w where u belongs to U and w to W. Prove that
V = U ⊕ W.
4. Let U, V, W be subspaces of some vector space and suppose that U ⊆ W. Prove that

(U + V) ∩ W = U + (V ∩ W).

5. Prove or disprove the following statement: if U, V, W are subspaces of a vector space, then (U + V) ∩ W = (U ∩ W) + (V ∩ W).
6. Suppose that U and W are subspaces of P₁₄(R) with dim(U) = 7 and dim(W) = 11. Show that dim(U ∩ W) ≥ 4. Give an example to show that this minimum dimension can occur.
7. Let M be the matrix

  [  3   3   2   8 ]
  [  1   1  -1   1 ]
  [  1   1   3   5 ]
  [ -2   4   6   8 ]

and let U and W be the subspaces of R⁴ generated by rows 1 and 2 of M, and by rows 3 and 4 of M respectively. Find the dimensions of U + W and U ∩ W.
8. Define polynomials

f1 = 1 − 2x + x³,  f2 = x + x² − x³

and

g1 = 2 + 2x − 4x² + x³,  g2 = 1 − x + x²,  g3 = 2 + 3x − x².

Let U be the subspace of P₄(R) generated by {f1, f2} and let W be the subspace generated by {g1, g2, g3}. Find bases for the subspaces U + W and U ∩ W.
9. Let U1, ..., Uk be subspaces of a vector space V. Prove that V = U1 ⊕ ··· ⊕ Uk if and only if each element of V has a unique expression of the form u1 + ··· + uk where ui belongs to Ui.
10. Every vector space of dimension n is a direct sum of n subspaces each of which has dimension 1. Explain why this is true.
11. If U1, ..., Uk are subspaces of a finitely generated vector space whose sum is the direct sum, find the dimension of U1 ⊕ ··· ⊕ Uk.
12. Let U1, U2, U3 be subspaces of a vector space such that U1 ∩ U2 = U2 ∩ U3 = U3 ∩ U1 = 0. Does it follow that U1 + U2 + U3 = U1 ⊕ U2 ⊕ U3? Justify your answer.
13. Verify that all the vector space axioms hold for a quotient space V/U.
14. Consider the linear system of Exercise 2.1.1,

x1 + 2x2 − 3x3 + x4 = 7
−x1 + x2 − x3 + x4 = 4

(a) Write the general solution of the system in the form X0 + Y, where X0 is a particular solution and Y is the general solution of the associated homogeneous system.
(b) Identify the set of all solutions of the given linear
system as a coset of the solution space of the associated ho-
mogeneous linear system.
15. Find the dimension of the quotient space Pn(R)/U where
U is the subspace of all real constant polynomials.
16. Let V be an n-dimensional vector space over an arbitrary
field. Prove that there exists a quotient space of V of each
dimension i where 0 < i < n.
17. Let V be a finite-dimensional vector space and let U and
W be two subspaces of V. Prove that
dim((C7 + W)/W) = dim(U/(U n W)).
Chapter Six
LINEAR TRANSFORMATIONS
A linear transformation is a function between two vector
spaces which relates the structures of the spaces. Linear trans-
formations include operations as diverse as multiplication of
column vectors by matrices and differentiation of functions
of a real variable. Despite their diversity, linear transforma-
tions have many common properties which can be exploited
in different contexts. This is a good reason for studying linear
transformations and indeed much else in linear algebra.
In order to establish notation and basic ideas, we begin
with a brief discussion of functions defined on arbitrary sets.
Readers who are familiar with this elementary material may
wish to skip 6.1.
6.1 Functions Defined on Sets
If X and Y are two non-empty sets, a function or mapping
from X to Y,
F : X → Y,
is a rule that assigns to each element x of X a unique element
F(x) of Y, called the image of x under F. The sets X and
Y are called the domain and codomain of the function F re-
spectively. The set of all images of elements of X is called the
image of the function F; it is written
Im(F).
Examples of functions abound; the most familiar are quite
likely the functions that arise in calculus, namely functions
whose domain and codomain are subsets of the set of real
numbers R. An example of a function which has the flavor
of linear algebra is F : Mₙ(R) → R defined by F(A) = det(A), that is, the determinant function.
A very simple, but nonetheless important, example of a
function is the identity function on a set X; this is the function
1_X : X → X
which leaves every element of the set X fixed, that is, 1_X(x) = x for all elements x of X.
Next, three important special types of function will be
introduced. A function F : X → Y is said to be injective (or one-one) if distinct elements of X always have distinct images under F, that is, if the equation F(x1) = F(x2) implies that x1 = x2. On the other hand, F is said to be surjective (or onto) if every element y of Y is the image under F of at least one element of X, that is, if y = F(x) for some x in X. Finally, F is said to be bijective (or a one-one correspondence) if it is both injective and surjective.
We need to give some examples to illustrate these con-
cepts. For convenience these will be real-valued functions of
a real variable x.
Example 6.1.1
Define F1 : R → R by the rule F1(x) = 2ˣ. Then F1 is injective since 2ˣ = 2ʸ clearly implies that x = y. But F1 cannot be surjective since 2ˣ is always positive and so, for example, 0 is not the image of any element under F1.
Example 6.1.2
Define a function F2 : R → R by F2(x) = x²(x − 1). Here F2 is not injective; indeed F2(0) = 0 = F2(1). However F2 is surjective since the expression x²(x − 1) assumes all real values as x varies. The best way to see this is to draw the graph of the function y = x²(x − 1) and observe that it extends over the entire y-axis.
Example 6.1.3
Define F3 : R → R by F3(x) = 2x − 1. This function is both
injective and surjective, so it is bijective. (The reader should
supply the proof.)
Composition of functions
Consider two functions F : X → Y and G : U → V such that the image of G is a subset of X. Then it is possible to combine the functions to produce a new function called the composite of F and G,

F ∘ G : U → Y,

by applying first G and then F; thus the image of an element x of U is given by the formula

F ∘ G(x) = F(G(x)).
Here it is necessary to know that Im(G') is contained in X,
since otherwise the expression F(G(x)) might be meaningless.
Example 6.1.4
Consider the functions F : R² → R and G : C → R² defined by the rules

F((a, b)ᵀ) = √(a² + b²)  and  G(a + √−1·b) = (2a, 2b)ᵀ.

Here a and b are arbitrary real numbers. Then F ∘ G : C → R exists and its effect is described by

F ∘ G(a + √−1·b) = F((2a, 2b)ᵀ) = √(4a² + 4b²).
A basic fact about functional composition is that it sat-
isfies the associative law. First let us agree that two functions
F and G are to be considered equal - in symbols F = G - if
they have the same domain and codomain and if F(x) = G(x)
for all x.
Theorem 6.1.1
Let F : X → Y, G : U → V and H : R → S be functions such that Im(H) is contained in U and Im(G) is contained in X. Then F ∘ (G ∘ H) = (F ∘ G) ∘ H.
Proof
First observe that the various composites mentioned in the formula make sense: this is because of the assumptions about Im(H) and Im(G). Let x be an element of R. Then, by the definition of a composite,

F ∘ (G ∘ H)(x) = F((G ∘ H)(x)) = F(G(H(x))).

In a similar manner we find that (F ∘ G) ∘ H(x) is also equal to this element. Therefore F ∘ (G ∘ H) = (F ∘ G) ∘ H, as claimed.
Another basic result asserts that a function is unchanged
when it is composed with an identity function.
Theorem 6.1.2
If F : X → Y is any function, then F ∘ 1_X = F = 1_Y ∘ F.
The very easy proof is left to the reader as an exercise.
Inverses of functions
Suppose that F : X → Y is a function. An inverse of F is a function of the form G : Y → X such that F ∘ G and G ∘ F are the identity functions on Y and on X respectively, that is,

F(G(y)) = y and G(F(x)) = x

for all x in X and y in Y. A function which has an inverse is
said to be invertible.
Example 6.1.5
Consider the functions F and G with domain and codomain
R which are defined by F(x) = 2x — 1 and G(x) = (x + l)/2.
Then G is an inverse of F since F ∘ G and G ∘ F are both equal to 1_R. Indeed

F ∘ G(x) = F(G(x)) = F((x + 1)/2) = 2((x + 1)/2) − 1 = x,

with a similar computation for G ∘ F(x).
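A trivial computational check of this example can be written in a few lines of Python; the snippet below is an added illustration, not part of the text.

    # Check numerically that F(x) = 2x - 1 and G(x) = (x + 1)/2 are mutually inverse.
    def F(x):
        return 2 * x - 1

    def G(x):
        return (x + 1) / 2

    for x in [-3.0, 0.0, 2.5, 10.0]:
        assert F(G(x)) == x and G(F(x)) == x   # F o G and G o F act as identities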
Not every function has an inverse; in fact a basic theorem
asserts that only the bijective ones do.
Theorem 6.1.3
A function F : X —
> Y has an inverse if and only if it is
bijective.
Proof
Suppose first that F has an inverse function G : Y → X. If F(x1) = F(x2), then, on applying G to both sides, we obtain G ∘ F(x1) = G ∘ F(x2). But G ∘ F is the identity function on X, so x1 = x2. Hence F is injective. Next let y be any element of Y; then, since F ∘ G is the identity function, y = F ∘ G(y) = F(G(y)), which shows that y belongs to the image of F and F is surjective. Therefore F is bijective.
Conversely, assume that F is a bijective function. We need to find an inverse function G : Y → X for F. To this end let y belong to Y; then, since F is surjective, y = F(x) for some x in X; moreover x is uniquely determined by y since F is injective. This allows us to define G(y) to be x. Then G(F(x)) = G(y) = x and F(G(y)) = F(x) = y. Here it is necessary to observe that every element of X is of the form G(y) for some y in Y, so that G(F(x)) equals x for all elements x of X. Therefore G is an inverse function for F.
The next observation is that when inverse functions do
exist, they are unique.
Theorem 6.1.4
Every bijective function F : X —
> Y has a unique inverse
function.
Proof
Suppose that F has two inverse functions, say G1 and G2. Then (G1 ∘ F) ∘ G2 = 1_X ∘ G2 = G2 by 6.1.2. On the other hand, by 6.1.1 this function is also equal to G1 ∘ (F ∘ G2) = G1 ∘ 1_Y = G1. Thus G1 = G2.
Because of this result it is unambiguous to denote the inverse of a bijective function F : X → Y by

F⁻¹ : Y → X.
To conclude this brief account of the elementary theory of
functions, we record two frequently used results about inverse
functions.
Theorem 6.1.5
(a) If F : X → Y is an invertible function, then F⁻¹ is invertible with inverse F.
(b) If F : X → Y and G : U → X are invertible functions, then the function F ∘ G : U → Y is invertible and its inverse is G⁻¹ ∘ F⁻¹.
Proof
Since F ∘ F⁻¹ = 1_Y and F⁻¹ ∘ F = 1_X, it follows that F is the inverse of F⁻¹. For the second statement it is enough to check that when G⁻¹ ∘ F⁻¹ is composed with F ∘ G on both sides, identity functions result. To prove this simply apply the
associative law twice.
Exercises 6.1
1. Label each of the following functions F : R → R injective, surjective or bijective, as is most appropriate. (You may wish to draw the graph of the function in some cases):
(a) F(x) = x²;  (b) F(x) = x³/(x² + 1);
(c) F(x) = x(x − 1)(x − 2);  (d) F(x) = eˣ + 2.
2. Let functions F and G from R to R be defined by F(x) = 2x − 3 and G(x) = (x² − 1)/(x² + 1). Show that the composite functions F ∘ G and G ∘ F are different.
3. Verify that the following functions from R to R are mutually inverse: F(x) = 3x − 5 and G(x) = (x + 5)/3.
4. Find the inverse of the bijective function F : R → R defined by F(x) = 2x³ − 5.
5. Let G : Y → X be an injective function. Construct a function F : X → Y such that F ∘ G is the identity function on Y. Then use this result to show that there exist functions F, G : R → R such that F ∘ G = 1_R but G ∘ F ≠ 1_R.
6. Prove 6.1.2.
7. Complete the proof of part (b) of 6.1.5.
6.2 Linear Transformations and Matrices
After the preliminaries on functions, we proceed at once
to the fundamental definition of the chapter, that of a linear
transformation. Let V and W be two vector spaces over the
same field of scalars F. A linear transformation (or linear
mapping) from V to W is a function
T : V → W
with the properties

T(v1 + v2) = T(v1) + T(v2) and T(cv) = cT(v)

for all vectors v, v1, v2 in V and all scalars c in F. In short
the function T is required to act in a "linear" fashion on sums
and scalar multiples of vectors in V. In the case where T is a
linear transformation from V to V, we say that T is a linear
operator on V.
Of course we need some examples of linear transforma-
tions, but these are not hard to find.
Example 6.2.1
Let the function T : R³ → R² be defined by the rule

T((a, b, c)ᵀ) = (a, b)ᵀ.

Thus T simply "forgets" the third entry of a vector. From this definition it is obvious that T is a linear transformation.
Now recall from Chapter Four the geometrical interpretation of the column vector with entries a, b, c as the line segment joining the origin to the point with coordinates (a, b, c). Then the linear transformation T projects the line segment onto the xy-plane. Consequently projection of a line in 3-dimensional space which passes through the origin onto the xy-plane is a linear transformation from R³ to R².
The next example of a linear transformation is also of a
geometrical nature.
Example 6.2.2
Suppose that an anti-clockwise rotation through angle θ about the origin O is applied to the xy-plane. Since vectors in R² are represented by line segments in the plane drawn from the origin, such a rotation determines a function T : R² → R²; here the line segment representing T(X) is obtained by rotating the line segment that represents X.
To show that T is a linear operator on R², we suppose that Y is another vector in R².

[Diagram: the triangle formed by the line segments representing X, Y and X + Y, together with its image under the rotation, representing T(X), T(Y) and T(X) + T(Y).]
Referring to the diagram above, we know from the trian-
gle rule that X+Y is represented by the third side of the trian-
gle formed by the line segments representing X and Y. When
the rotation is applied to this triangle, the sides of the result-
ing triangle represent the vectors T(X), T(Y), T(X) + T(Y),
as shown in the diagram. The triangle rule then shows that
T(X + Y) = T(X)+T(Y).
In a similar way we can see from the geometrical inter-
pretation of scalar multiples in R2
that T(cX) = cT(X) for
any scalar c. It follows that T is a linear operator on R .
Example 6.2.3
Define T : D∞[a, b] → D∞[a, b] to be differentiation, that is, T(f(x)) = f′(x). Here D∞[a, b] denotes the vector space of all functions of x that are infinitely differentiable in the interval [a, b]. Then well-known facts from calculus guarantee that T is a linear operator on D∞[a, b].
This example can be generalized in a significant fashion as follows. Let a0, a1, ..., an be functions in D∞[a, b]. For any f in D∞[a, b], define T(f) to be

an f⁽ⁿ⁾ + a_{n−1} f⁽ⁿ⁻¹⁾ + ··· + a1 f′ + a0 f.

Then T is a linear operator on D∞[a, b], once again by elementary results from calculus. Here one can think of T as a sort of generalized differential operator that can be applied to functions in D∞[a, b].
Our next example of a linear transformation involves quo-
tient spaces, which were defined in 5.3.
Example 6.2.4
Let U be a subspace of a vector space V and define a function
T : V → V/U by the rule T(v) = v + U. It is simple to verify that T is a linear transformation: indeed,

T(v1 + v2) = (v1 + v2) + U = (v1 + U) + (v2 + U) = T(v1) + T(v2)

by definition of the sum of two vectors in a quotient space. In a similar way one can show that T(cv) = c(T(v)).
The function just defined is often called the canonical
linear transformation associated with the subspace U.
Finally, we record two very simple examples of linear
transformations.
Example 6.2.5
(a) Let V and W be two vector spaces over the same field. The
function which sends every vector in V to the zero vector of W
is a linear transformation called the zero linear transformation
from V to W; it is written
Ov,w or simply 0.
(b) The identity function 1_V : V → V is a linear operator on V.
After these examples it is time to present some elemen-
tary properties of linear transformations.
Theorem 6.2.1
Let T : V → W be a linear transformation. Then

T(0_V) = 0_W

and

T(c1v1 + c2v2 + ··· + ckvk) = c1T(v1) + c2T(v2) + ··· + ckT(vk)

for all vectors vi and scalars ci.
Thus a linear transformation always sends a zero vector
to a zero vector; it also sends a linear combination of vectors
to the corresponding linear combination of the images of the
vectors.
Proof
In the first place we have

T(0_V) = T(0_V + 0_V) = T(0_V) + T(0_V)

by the first defining property of linear transformations. Addition of −T(0_V) to both sides gives 0_W = T(0_V), as required.
Next, use of both parts of the definition shows that

T(c1v1 + ··· + c_{k−1}v_{k−1} + ckvk)

is equal to the vector

T(c1v1 + ··· + c_{k−1}v_{k−1}) + ckT(vk).

By repeated application of this procedure, or more properly induction on k, we obtain the second result.
Representing linear transformations by matrices
We now specialize the discussion to linear transformations of the type

T : Fⁿ → Fᵐ,

where F is some field of scalars. Let {E1, E2, ..., En} be the standard basis of Fⁿ written in the usual order, that of the columns of the identity matrix 1n. Also let {D1, D2, ..., Dm} be the corresponding ordered basis of Fᵐ. Since T(Ej) is a vector in Fᵐ, it can be written in the form

T(Ej) = a1j D1 + ··· + amj Dm = Σ_{i=1}^{m} aij Di.

Put A = [aij]m,n, so that the columns of the matrix A are the vectors T(E1), ..., T(En). We show that T is completely determined by the matrix A.
Take an arbitrary vector in Fⁿ, say

X = (x1, ..., xn)ᵀ = x1E1 + ··· + xnEn = Σ_{j=1}^{n} xj Ej.

Then, using 6.2.1 together with the expression for T(Ej), we obtain

T(X) = Σ_{j=1}^{n} xj T(Ej) = Σ_{j=1}^{n} xj (Σ_{i=1}^{m} aij Di) = Σ_{i=1}^{m} (Σ_{j=1}^{n} aij xj) Di.

Therefore the ith entry of T(X) equals the ith entry of the matrix product AX. Thus we have shown that

T(X) = AX,

which means that the effect of T on a vector in Fⁿ is to multiply it on the left by the matrix A. Thus A determines T completely.
Conversely, suppose that we start with an m × n matrix A over F; then we can define a function T : Fⁿ → Fᵐ by the rule T(X) = AX. The laws of matrix algebra guarantee that T is a linear transformation; for by 1.2.1

A(X1 + X2) = AX1 + AX2 and A(cX) = c(AX).
We have now established a fundamental connection between
matrices and linear transformations.
Theorem 6.2.2
(i) Let T : Fⁿ → Fᵐ be a linear transformation. Then T(X) = AX for all X in Fⁿ, where A is the m × n matrix whose columns are the images under T of the standard basis vectors of Fⁿ.
(ii) Conversely, if A is any m × n matrix over the field F, the function T : Fⁿ → Fᵐ defined by T(X) = AX is a linear transformation.
Example 6.2.6
Define T : R³ → R² by the rule
One quickly checks that T is a linear transformation. The
images under T of the standard basis vectors Ei, E2, E3 are
respectively. It follows that T is represented by the matrix
A =
[ 0 - 1 3 ) '
Consequently T(X) = AX for every X in R³, as can be verified directly by matrix multiplication.
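The recipe of 6.2.2(i) is easy to mechanise. The following Python (SymPy) sketch has been added to this copy as an illustration; the particular map used is an assumed example, not the T of Example 6.2.6. It assembles the representing matrix from the images of the standard basis vectors.

    from sympy import Matrix, eye

    def T(X):
        # an assumed example map: T((x1, x2, x3)^T) = (x1 + 2*x3, x2 - x3)^T
        return Matrix([X[0] + 2 * X[2], X[1] - X[2]])

    E = eye(3)
    # columns of A are T(E1), T(E2), T(E3)
    A = Matrix.hstack(*[T(E[:, j]) for j in range(3)])
    print(A)                       # Matrix([[1, 0, 2], [0, 1, -1]])

    X = Matrix([5, -1, 4])
    assert T(X) == A * X           # T acts as left multiplication by A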
Example 6.2.7
Consider the linear operator T : R2
—
> R2
which arises from
an anti-clockwise rotation in the xy-plane through an angle 9
(see Example 6.2.2.) The problem is to write down the matrix
which represents T.
All that need be done is to identify the vectors T(E{)
and T(E2) where E and E2 are the vectors of the standard
ordered basis.
(-sin 9, cos 9) -(0,1)
(cos 9, sin 9)
(1.0)
The line segment representing E is drawn from the origin O to
the point (1, 0), and after rotation it becomes the line segment
from O to the point (cos 8, sin 9); thus T(E) = [ . n 1.
sin
" J
Similarly T(E2) =
— sin 9
cos 9
which represents the rotation T is
It follows that the matrix
cos 9
sin 9
-sin 9
cos 9
166 Chapter Six: Linear Transformationns
Representing linear transformations by matrices:
The general case
We turn now to the problem of representing by matrices
linear transformations between arbitrary finite-dimensional
vector spaces.
Let V and W be two non-zero finite-dimensional vector
spaces over the same field of scalars F. Consider a linear
transformation T : V —
> W. The first thing to do is to choose
and fix ordered bases for V and W, say
B = {v
i> v 2 . . . , v n } andC = {wi, w 2 . . . , w m }
respectively. We saw in 5.1 how any vector v of V can be
represented by a unique coordinate vector with respect to the
ordered basis B. If v = ciVi + • • • + cn vn , this coordinate
vector is
Similarly each w in W may be represented by a coordinate
vector [w]c with respect to C .
To represent T by a matrix with respect to these chosen
ordered bases, we first express the image under T of each
vector in B as a linear combination of the vectors of C, say
m
T(VJ) = aij-wi H 1
- am i wm = ^ a
y'w
*
where the scalars. Thus [T(VJ)]C is the column vector
with entries aij,..., amj. Let A be the m x n matrix whose
(i,j) entry is a^. Thus the columns of A are just the coordi-
nate vectors of T(vi),..., T(vn) with respect to C.
6.2: Linear Transformations and Matrices 167
Now consider the effect of T on an arbitrary vector of V, say v = c1v1 + ··· + cnvn. This is computed by using the expression for T(vj) given above:

T(v) = Σ_{j=1}^{n} cj T(vj) = Σ_{j=1}^{n} cj (Σ_{i=1}^{m} aij wi).

On interchanging the order of summations, this becomes

T(v) = Σ_{i=1}^{m} (Σ_{j=1}^{n} aij cj) wi.

Hence the coordinate vector of T(v) with respect to the ordered basis C has entries Σ_{j=1}^{n} aij cj for i = 1, 2, ..., m. This means that

[T(v)]_C = A[v]_B.
The conclusions of this discussion can be summed up as
follows.
Theorem 6.2.3
Let T : V → W be a linear transformation between two non-zero finite-dimensional vector spaces V and W over the same field. Suppose that B and C are ordered bases for V and W respectively. If v is any vector of V, then

[T(v)]_C = A[v]_B,

where A is the m × n matrix whose jth column is the coordinate vector of the image under T of the jth vector of B, taken with respect to the basis C.
What this result means is that a linear transformation
between non-zero finite-dimensional vector spaces can always
be represented by left multiplication by a suitable matrix. At
this point the reader may wonder if it is worth the trouble
of introducing linear transformations, given that they can be
described by matrices. The answer is that there are situations
where the functional nature of a linear transformation is a
decided advantage. In addition there is the fact that a given
linear transformation can be represented by a host of different
matrices, depending on which ordered bases are used. The
real object of interest is the linear transformation, not the
representing matrix, which is dependent on the choice of bases.
Example 6.2.8
Define T : P_{n+1}(R) → Pn(R) by the rule T(f) = f′, the derivative. Let us use the standard bases B = {1, x, x², ..., xⁿ} and C = {1, x, x², ..., xⁿ⁻¹} for the two vector spaces. Here T(xⁱ) = i xⁱ⁻¹, so [T(xⁱ)]_C is the vector whose ith entry is i and whose other entries are zero. Therefore T is represented by the n × (n + 1) matrix

A =
  [ 0  1  0  ···  0 ]
  [ 0  0  2  ···  0 ]
  [ ·  ·  ·       · ]
  [ 0  0  0  ···  n ]

For example,

A (2, −1, 3, 0, ..., 0)ᵀ = (−1, 6, 0, ..., 0)ᵀ,

which corresponds to the differentiation

T(2 − x + 3x²) = (2 − x + 3x²)′ = 6x − 1.
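The differentiation matrix of this example can be generated and tested mechanically. The SymPy sketch below was added to this copy as an illustration.

    from sympy import Matrix, zeros

    def diff_matrix(n):
        # matrix of differentiation P_{n+1}(R) -> P_n(R) in the standard bases
        A = zeros(n, n + 1)
        for i in range(1, n + 1):
            A[i - 1, i] = i        # T(x^i) = i*x^(i-1)
        return A

    A = diff_matrix(3)             # the case P_4(R) -> P_3(R)
    print(A)                       # Matrix([[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]])

    f = Matrix([2, -1, 3, 0])      # coordinates of 2 - x + 3x^2
    print((A * f).T)               # [-1, 6, 0] : the coordinates of 6x - 1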
Change of basis
Being aware of a dependence on the choice of bases, we
wish to determine the effect on the matrix representing a linear
transformation when the ordered bases are changed. The first
step is to find a matrix that describes the change of basis.
Let B = {v1, ..., vn} and B′ = {v′1, ..., v′n} be two ordered bases of a finite-dimensional vector space V. Then each v′i can be expressed as a linear combination of v1, ..., vn, say

v′i = Σ_{j=1}^{n} sji vj,

for certain scalars sji. The change of basis B′ → B is determined by the n × n matrix S = [sij]. To see how this works we take an arbitrary vector v in V and write it in the form

v = Σ_{i=1}^{n} c′i v′i,

where, of course, c′1, ..., c′n are the entries of the coordinate vector [v]_B′. Replace each v′i by its expression in terms of the vj to get

v = Σ_{i=1}^{n} c′i (Σ_{j=1}^{n} sji vj) = Σ_{j=1}^{n} (Σ_{i=1}^{n} sji c′i) vj.

From this one sees that the entries of the coordinate vector [v]_B are just the scalars Σ_{i=1}^{n} sji c′i for j = 1, 2, ..., n. But the latter are the entries of the product S(c′1, ..., c′n)ᵀ = S[v]_B′.
Therefore we obtain the fundamental relation

[v]_B = S[v]_B′.
Thus left multiplication by the change of basis matrix S trans-
forms coordinate vectors with respect to B' into coordinate
vectors with respect to B. It is in this sense that the matrix
S describes the basis change B' —
> B. Here it is important
to observe how S is formed: its ith column is the coordinate
vector of v[, the ith vector of B', with respect to the basis B.
It is a crucial remark that the change of basis matrix S
is always invertible. Indeed, if this were false, there would
by 2.3.5 be a non-zero n-column vector X such that SX = 0. However, if u denotes the vector in V whose coordinate vector with respect to basis B′ is X, then [u]_B = SX = 0, which can only mean that u = 0 and X = 0, a contradiction.
As one would expect, the matrix S⁻¹ represents the inverse change of basis B → B′, for the equation [v]_B = S[v]_B′ implies that

[v]_B′ = S⁻¹[v]_B.

These conclusions can be summed up in the following form.
Theorem 6.2.4
Let B and B′ be two ordered bases of an n-dimensional vector space V. Define S to be the n × n matrix whose ith column is the coordinate vector of the ith vector of B′ with respect to the basis B. Then S is invertible and, if v is any vector of V,

[v]_B = S[v]_B′ and [v]_B′ = S⁻¹[v]_B.
Example 6.2.9
Consider two ordered bases of the vector space P₃(R):

B = {1, x, x²} and B′ = {1, 2x, 4x² − 2}.

In order to find the matrix S which describes the change of basis B′ → B, we must write down the coordinate vectors of the elements of B′ with respect to the standard basis B: these are

[1]_B = (1, 0, 0)ᵀ,  [2x]_B = (0, 2, 0)ᵀ,  [4x² − 2]_B = (−2, 0, 4)ᵀ.

Therefore

S =
  [ 1   0  -2 ]
  [ 0   2   0 ]
  [ 0   0   4 ]

The matrix which describes the change of basis B → B′ is

S⁻¹ =
  [ 1    0   1/2 ]
  [ 0   1/2   0  ]
  [ 0    0   1/4 ]

For example, to express f = a + bx + cx² in terms of the basis B′, we compute

[f]_B′ = S⁻¹[f]_B = (a + c/2, b/2, c/4)ᵀ.

Thus f = (a + c/2)·1 + (b/2)·2x + (c/4)·(4x² − 2), which is of course easy to verify.
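A short SymPy check of this example, added here and not part of the text:

    from sympy import Matrix, symbols, expand

    S = Matrix([[1, 0, -2],
                [0, 2,  0],
                [0, 0,  4]])
    S_inv = S.inv()
    print(S_inv)    # Matrix([[1, 0, 1/2], [0, 1/2, 0], [0, 0, 1/4]])

    a, b, c, x = symbols('a b c x')
    coords = S_inv * Matrix([a, b, c])     # coordinates of a + b*x + c*x^2 in B'
    f_from_Bprime = coords[0] * 1 + coords[1] * (2 * x) + coords[2] * (4 * x**2 - 2)
    assert expand(f_from_Bprime - (a + b * x + c * x**2)) == 0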
Example 6.2.10
Consider the change of basis in R² which arises when the x- and y-axes are rotated through angle θ in an anticlockwise direction. As was noted in Example 6.2.7, the effect of this rotation is to replace the standard ordered basis B = {E1, E2} by the basis B′ consisting of

(cos θ, sin θ)ᵀ and (−sin θ, cos θ)ᵀ.

The matrix which describes the change of basis B′ → B is

S =
  [ cos θ   -sin θ ]
  [ sin θ    cos θ ]

so the change of basis B → B′ is described by

S⁻¹ =
  [  cos θ   sin θ ]
  [ -sin θ   cos θ ]

Hence, if X = (a, b)ᵀ, the coordinate vector of X with respect to the basis B′ is

[X]_B′ = S⁻¹X = (a cos θ + b sin θ, −a sin θ + b cos θ)ᵀ.

This means that the coordinates of the point (a, b) with respect to the rotated axes are

a′ = a cos θ + b sin θ and b′ = −a sin θ + b cos θ,

respectively.
Change of basis and linear transformations
We are now in a position to calculate the effect of change
of bases on the matrix representing a linear transformation.
Let B and C be ordered bases of finite-dimensional vector spaces V and W over the same field, and let T : V → W be a linear transformation. Then T is represented by a matrix A with respect to these bases.
Suppose now that we select new bases B′ and C′ for V and W respectively. Then T will be represented with respect to these bases by another matrix, say A′. The question before us is: what is the relation between A and A′?
Let X and Y be the invertible matrices that represent the changes of bases B → B′ and C → C′ respectively. Then, for any vectors v of V and w of W, we have

[v]_B′ = X[v]_B and [w]_C′ = Y[w]_C.

Now by 6.2.3

[T(v)]_C = A[v]_B and [T(v)]_C′ = A′[v]_B′.

On combining these equations, we obtain

[T(v)]_C′ = Y[T(v)]_C = YA[v]_B = YAX⁻¹[v]_B′.

But this means that the matrix YAX⁻¹ describes the linear transformation T with respect to the bases B′ and C′ of V and W respectively. Hence A′ = YAX⁻¹.
We summarise these conclusions in
Theorem 6.2.5
Let V and W be non-zero finite-dimensional vector spaces over the same field. Let B and B′ be ordered bases of V, and C and C′ ordered bases of W. Suppose that matrices X and Y describe the respective changes of bases B → B′ and C → C′. If the linear transformation T : V → W is represented by a matrix A with respect to B and C, and by a matrix A′ with respect to B′ and C′, then

A′ = YAX⁻¹.
The most important case is that of a linear operator
T : V —• V, when the ordered basis B is used for both domain
and codomain.
Theorem 6.2.6
Let B and B′ be two ordered bases of a finite-dimensional vector space V and let T be a linear operator on V. If T is represented by matrices A and A′ with respect to B and B′ respectively, then

A′ = SAS⁻¹,

where S is the matrix representing the change of basis B → B′.
Example 6.2.11
Let T be the linear operator on P₃(R) defined by T(f) = f′. Consider the ordered bases of P₃(R)

B = {1, x, x²} and B′ = {1, 2x, 4x² − 2}.

We saw in Example 6.2.9 that the change of basis B → B′ is represented by the matrix

U =
  [ 1    0   1/2 ]
  [ 0   1/2   0  ]
  [ 0    0   1/4 ]

Now T is represented with respect to B by the matrix

A =
  [ 0  1  0 ]
  [ 0  0  2 ]
  [ 0  0  0 ]

Hence T is represented with respect to B′ by

UAU⁻¹ =
  [ 0  2  0 ]
  [ 0  0  4 ]
  [ 0  0  0 ]

This conclusion is easily checked. An arbitrary element of P₃(R) can be written in the form f = a(1) + b(2x) + c(4x² − 2). Then it is claimed that the coordinate vector of T(f) with respect to the basis B′ is

  [ 0  2  0 ] [ a ]   [ 2b ]
  [ 0  0  4 ] [ b ] = [ 4c ]
  [ 0  0  0 ] [ c ]   [ 0  ]

This is correct since

2b(1) + 4c(2x) + 0(4x² − 2) = 2b + 8cx = (a(1) + b(2x) + c(4x² − 2))′.
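The matrix identity A′ = UAU⁻¹ of this example can be confirmed in a couple of lines of SymPy (an added check, not from the text):

    from sympy import Matrix, Rational

    U = Matrix([[1, 0, Rational(1, 2)],
                [0, Rational(1, 2), 0],
                [0, 0, Rational(1, 4)]])
    A = Matrix([[0, 1, 0],
                [0, 0, 2],
                [0, 0, 0]])
    print(U * A * U.inv())   # Matrix([[0, 2, 0], [0, 0, 4], [0, 0, 0]])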
Similar matrices
Let A and B be two n x n matrices over a field F; then
B is said to be similar to A over F if there is an invertible
n × n matrix S with entries in F such that

B = SAS⁻¹.

Thus the essential content of 6.2.6 is that two matrices which represent the same linear operator on a finite-dimensional vector space are similar. Because of this fact it is to be expected that similar matrices will have many properties in common: for example, similar matrices have the same determinant. Indeed if B = SAS⁻¹, then by 3.3.3 and 3.3.5

det(B) = det(S) det(A) det(S)⁻¹ = det(A).
We shall encounter other common properties of similar matri-
ces in Chapter Eight.
Exercises 6.2
1. Which of the following functions are linear transforma-
tions?
(a) T1 : R³ → R where T1((x1, x2, x3)ᵀ) = √(x1² + x2² + x3²);
(b) T2 : M_{m,n}(F) → M_{n,m}(F) where T2(A) = Aᵀ;
(c) T3 : Mₙ(F) → F where T3(A) = det(A).
2. If T is a linear transformation, prove that T(−v) = −T(v) for all vectors v.
3. Let l be a fixed line in the xy-plane passing through the origin O. If P is any point in the plane, denote by P′ the mirror image of P in the line l. Prove that the assignment OP → OP′ determines a linear operator on R². (This is called reflection in the line l).
4. A linear transformation T : R⁴ → R³ is defined by

T((x1, x2, x3, x4)ᵀ) = (x1 − x2 − x3 − x4, 2x1 + x2 − x3, x2 − x3 + x4)ᵀ.

Find the matrix that represents T with respect to the standard bases of R⁴ and R³.
5. A function T : P₄(R) → P₄(R) is defined by the rule T(f) = xf″ − 2xf′ + f. Show that T is a linear operator and find the matrix that represents T with respect to the standard basis of P₄(R).
6. Find the matrix which represents the reflection in Exercise 3 with respect to the standard ordered basis of R², given that the angle between the positive x-direction and the line l is φ.
7. Let B denote the standard basis of R³ and let B′ be the basis consisting of

Find the matrices that represent the basis changes B → B′ and B′ → B.
8. A linear transformation from R³ to R² is defined by

T((x1, x2, x3)ᵀ) = (x1 − x2 − x3, −x1 + x3)ᵀ.

Let B and C be the ordered bases

of R³ and R² respectively. Find the matrix that represents T with respect to these bases.
9. Explain why the matrices

cannot be similar.
10. If B is similar to A, prove that A is similar to B.
11. If B is similar to A and C is similar to B, prove that C
is similar to A.
12. If B is similar to A, then Bᵀ is similar to Aᵀ; prove or disprove.
6.3 Kernel, Image and Isomorphism
If T : V —
> W is a linear transformation between two vec-
tor spaces, there are two important subspaces associated with
T, the image and the kernel. The first of these has already
been defined; the image of T,
Im(T),
is the set of all images T(v) of vectors v in V: thus Im(T) is
a subset of W.
On the other hand, the kernel of T,

Ker(T),

is defined to be the set of all vectors v in V such that T(v) = 0_W. Thus Ker(T) is a subset of V. Notice that by 6.2.1 the zero vector of V must belong to Ker(T), while the zero vector of W belongs to Im(T).
The first thing to observe is that we are actually dealing
with subspaces here, not just subsets.
Theorem 6.3.1
If T is a linear transformation from a vector space V to a
vector space W, then Ker(T) is a subspace of V and Im(T) is
a subspace of W.
Proof
We need to check that Ker(T) and Im(T) contain the relevant
zero vector, and that they are closed with respect to addition
and scalar multiplication. The first point is settled by the
equation T(Oy) = Ow, which was proved in 6.2.1. Also, by
definition of a linear transformation, we have T(vi + V2) =
T(vi) + T(v2) and T(cvi) = cT(vi) for all vectors v1 ; v2 of
V and scalars c. Therefore, if vi and v2 belong to Ker(T),
then T(vi + v2) = 0 ^ , and T(cvi) = Ow, so that vx + v2 and
6.3: Kernel, Image and Isomorphism 179
cvi belong to Ker(T); thus Ker(T) is a subspace. For similar
reasons Im(T) is a subspace.
Let us look next at some examples which relate these new
concepts to some more familiar ones.
Example 6.3.1
Consider the homogeneous linear differential equation for a function y of the real variable x:

y⁽ⁿ⁾ + a_{n−1}(x) y⁽ⁿ⁻¹⁾ + ··· + a1(x) y′ + a0(x) y = 0,

with x in the interval [a, b] and the ai(x) in D∞[a, b]. There is an associated linear operator T on the vector space D∞[a, b] defined by

T(f) = f⁽ⁿ⁾ + a_{n−1}(x) f⁽ⁿ⁻¹⁾ + ··· + a1(x) f′ + a0(x) f.

Then Ker(T) is the solution space of the differential equation.
Example 6.3.2
Let A be an m × n matrix over a field F. We have seen that the rule T(X) = AX defines a linear transformation T : Fⁿ → Fᵐ. Identify Ker(T) and Im(T).
In the first place, the definition shows that Ker(T) is the null space of the matrix A. Next, an arbitrary element of Im(T) is a linear combination of the images of the standard basis elements of Fⁿ; but the latter are simply the columns of the matrix A. Consequently, the image of T coincides with the column space of the matrix A.
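These identifications, together with the dimension formula of 6.3.3 below, are easy to test numerically. The SymPy sketch that follows uses a matrix chosen for this copy purely as an illustration.

    from sympy import Matrix

    A = Matrix([[1, 2, 0, -1],
                [2, 4, 1,  0],
                [3, 6, 1, -1]])
    kernel_basis = A.nullspace()      # a basis of Ker(T), the null space of A
    image_basis = A.columnspace()     # a basis of Im(T), the column space of A

    n = A.cols
    print(len(kernel_basis), len(image_basis))          # 2 and 2
    assert len(kernel_basis) + len(image_basis) == n    # dim Ker(T) + dim Im(T) = 4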
Example 6.3.3
After the last example it is natural to enquire if there is an
interpretation of the row space of a matrix A as an image
space. That this is the case may be seen from a related linear
transformation.
Given an m × n matrix A, define a linear transformation T1 from Fᵐ to Fⁿ by the rule T1(X) = XA. In this case Im(T1) is generated by the images of the elements of the standard basis of Fᵐ, that is, by the rows of A. Hence the image of T1 equals the row space of A.
It is now time to consider what the kernel and image tell
us about a linear transformation.
Theorem 6.3.2
Let T be a linear transformation from a vector space V to a
vector space W. Then
(i) T is injective if and only if Ker(T) is the zero subspace of V;
(ii) T is surjective if and only if Im(T) = W.
Proof
(i) Assume that T is an injective function. If v is a vector in the kernel of T, then T(v) = 0_W = T(0_V). Therefore v = 0_V by injectivity, and Ker(T) = 0_V. Conversely, suppose that Ker(T) = 0_V. If v1 and v2 are vectors in V with the property T(v1) = T(v2), then T(v1 − v2) = T(v1) − T(v2) = 0_W. Hence the vector v1 − v2 belongs to Ker(T) and v1 = v2.
(ii) This is true by definition of surjectivity.
For finite-dimensional vector spaces there is a simple for-
mula which links the dimensions of the kernel and image of a
linear transformation.
Theorem 6.3.3
Let T : V → W be a linear transformation where V and W are finite-dimensional vector spaces. Then

dim(Ker(T)) + dim(Im(T)) = dim(V).
Proof
Here we may assume that V is not the zero space; otherwise
the statement is true for obvious reasons. By 5.1.4 it is possible to choose a basis v1, ..., vn of V such that part of it is a basis of Ker(T), say v1, ..., vr; here of course

n = dim(V) ≥ r = dim(Ker(T)).

We claim that the vectors T(v_{r+1}), ..., T(vn) are linearly independent. For if c_{r+1}T(v_{r+1}) + ··· + cnT(vn) = 0_W for some scalars ci, then T(c_{r+1}v_{r+1} + ··· + cnvn) = 0_W, so that c_{r+1}v_{r+1} + ··· + cnvn belongs to Ker(T) and is therefore expressible as a linear combination of v1, ..., vr. But v1, ..., vr, ..., vn are certainly linearly independent. Hence c_{r+1}, ..., cn are all zero and our claim is established.
On the other hand, the vectors T(v_{r+1}), ..., T(vn) by themselves generate Im(T) since T(v1) = ··· = T(vr) = 0_W; hence T(v_{r+1}), ..., T(vn) form a basis of Im(T). It follows that

dim(Im(T)) = n − r = dim(V) − dim(Ker(T)),

from which the formula follows.
The dimension formula is in fact a generalization of some-
thing that we already know. For suppose we apply the for-
mula to the linear transformation T : Fn
—
• Fm
defined by
T(X) = AX, where A is an m x n matrix. Making the inter-
pretations of Ker(T) and Im(T) as the null space and column
space of A, we deduce that the sum of the dimensions of the
null space and column space of A equals n. This is essentially
the content of 5.1.7 and 5.2.4.
Isomorphism
Because of 6.3.2 we can tell whether a linear transforma-
tion T : V —•
> W is bijective. And in view of 6.1.3 this is the
same as asking whether T has an inverse. A bijective linear
transformation is called an isomorphism.
Theorem 6.3.4
A linear transformation T : V → W is an isomorphism if and only if Ker(T) is the zero subspace of V and Im(T) equals W. Moreover, if T is an isomorphism, then so is its inverse T⁻¹ : W → V.
Proof
The first statement follows from 6.3.2. As for the second state-
ment, all that need be shown is that T⁻¹ is actually a linear transformation: for by 6.1.5 it certainly has an inverse. This is achieved by a trick. Let v1 and v2 be any two vectors in W. Then certainly

T(T⁻¹(v1 + v2)) = v1 + v2,

while on the other hand,

T(T⁻¹(v1) + T⁻¹(v2)) = T(T⁻¹(v1)) + T(T⁻¹(v2)) = v1 + v2,

because T is known to be a linear transformation. Since T is an injective function, this can only mean that the vectors T⁻¹(v1 + v2) and T⁻¹(v1) + T⁻¹(v2) are equal; for they have the same image under T.
In a similar way it can be demonstrated that T⁻¹(cv1) equals cT⁻¹(v1) where c is any scalar: just check that both sides have the same image under T. Hence T⁻¹ is a linear transformation.
Two vector spaces V and W are said to be isomorphic if
there is an isomorphism from one to the other. Observe that
isomorphic vector spaces are necessarily over the same field of
scalars. The notation
V ~W
is often used to express the fact that vector spaces V and W
are isomorphic.
How can one tell if two finite-dimensional vector spaces
are isomorphic? The answer is that the dimensions tell us all.
Theorem 6.3.5
Let V and W be finite-dimensional vector spaces over a field
F. Then V and W are isomorphic if and only if dim(V) =
dim(W).
Proof
Suppose first that dim(V) = dim(W) = n. If n = 0, then V and W are both zero spaces and hence are surely isomorphic. Let n > 0. Then V and W have bases, say {v1, ..., vn} and {w1, ..., wn} respectively. There is a natural candidate for an isomorphism from V to W, namely the linear transformation T : V → W defined by

T(c1v1 + ··· + cnvn) = c1w1 + ··· + cnwn.

It is straightforward to check that T is a linear transformation, and indeed an isomorphism. Hence V and W are isomorphic.
Conversely, let V and W be isomorphic via an isomorphism T : V → W. Suppose that {v1, ..., vn} is a basis of V. In the first place, notice that the vectors T(v1), ..., T(vn) are linearly independent; for if c1T(v1) + ··· + cnT(vn) = 0_W, then T(c1v1 + ··· + cnvn) = 0_W. This implies that c1v1 + ··· + cnvn belongs to Ker(T) and so must be zero. This in turn implies that c1 = ··· = cn = 0 because v1, ..., vn are linearly independent. It follows by 5.1.1 that dim(W) ≥ n = dim(V). In the same way it may be shown that dim(W) ≤ dim(V); hence dim(V) = dim(W).
Corollary 6.3.6
Every n-dimensional vector space V over a field F is isomor-
phic with the vector space Fn
.
For both V and Fn
have dimension n. This result makes
it possible for some purposes to work just with vector spaces
of column vectors.
Isomorphism theorems
There are certain theorems, known as isomorphism theo-
rems, which provide a link between linear transformations and
quotient spaces (which were defined in 5.3). Such theorems
occur frequently in algebra. The first theorem of this type is:
Theorem 6.3.7
If T : V → W is a linear transformation between vector spaces V and W, then

V/Ker(T) ~ Im(T).
Proof
Write K = Ker(T). We define a function S : V/K → Im(T) by the rule S(v + K) = T(v). The first thing to notice is that S is well-defined: indeed if u ∈ K, then

T(v + u) = T(v) + T(u) = T(v) + 0 = T(v),

since T(u) = 0. Thus S(v + K) does not depend on the choice of representative v of the coset v + K.
Next it is simple to verify that S is a linear transformation: for example,

S((v1 + K) + (v2 + K)) = S((v1 + v2) + K) = T(v1 + v2) = T(v1) + T(v2),

which equals S(v1 + K) + S(v2 + K). In a similar way it can be shown that S(c(v + K)) = cS(v + K).
Clearly the function S is surjective, so all we need do to complete the proof is show it is injective. If S(v + K) = 0, then T(v) = 0; thus v ∈ K and v + K = 0_{V/K}. Hence, by 6.3.2, S is injective.
The last result provides an alternative proof of the dimension formula in 6.3.3. Let T : V → W be a linear transformation. Then dim(V/Ker(T)) = dim(Im(T)) by 6.3.7. From the formula for the dimension of a quotient space (see 5.3.7), we obtain

dim(V) − dim(Ker(T)) = dim(Im(T)),

so that dim(Ker(T)) + dim(Im(T)) = dim(V).
There is a second isomorphism theorem, which provides
valuable insight into the relation between the sum of two subspaces and certain associated quotient spaces.
Theorem 6.3.8
If U and W are subspaces of a vector space V, then
(U + W)/W ~ U/(U ∩ W).
Proof
We begin by defining a function T : U → (U + W)/W by
the rule T(u) = u + W, where u ∈ U. It is a simple matter
to check that T is a linear transformation. Since u + W is a
typical vector in (U + W)/W, we see that T is surjective.
Next we need to compute the kernel of T. Now T(u) =
u + W equals the zero vector of (U + W)/W, i.e., the coset W,
precisely when u ∈ W, which is just to say that u ∈ U ∩ W.
Therefore Ker(T) = U ∩ W. It now follows directly from 6.3.7
that U/(U ∩ W) ~ (U + W)/W.
We illustrate the usefulness of this last result by using it
to give another proof of the dimension formula of 5.3.2.
Corollary 6.3.9
If U and W are subspaces of a finite dimensional vector space
V, then
dim(U + W) + dim(U ∩ W) = dim(U) + dim(W).
Proof
Since isomorphic vector spaces have the same dimension, we
have dim((U + W)/W) = dim(U/(U ∩ W)). Now use the
formula for the dimension of a quotient space in 5.3.7 to obtain
dim(U + W) − dim(W) = dim(U) − dim(U ∩ W),
from which the result follows.
The algebra of linear operators on a vector space
We conclude the chapter by observing that the set of all
linear operators on a vector space has certain formal properties
which are very similar to properties that have already been
seen to hold for matrices. This similarity can be expressed by
saying that both systems form what is called an algebra.
Consider a vector space V with finite dimension n over a
field F. Let T_1 and T_2 be two linear operators on V. Then
we define their sum T_1 + T_2 by the rule
(T_1 + T_2)(v) = T_1(v) + T_2(v)
and also the scalar multiple cT_1, where c is an element of F,
by
(cT_1)(v) = c(T_1(v)).
It is quite routine to verify that T_1 + T_2 and cT_1 are also linear
operators on V. For example, to show that T_1 + T_2 is a linear
operator we compute
(T_1 + T_2)(v_1 + v_2) = T_1(v_1 + v_2) + T_2(v_1 + v_2) = T_1(v_1) + T_1(v_2) + T_2(v_1) + T_2(v_2),
from which it follows that
(T_1 + T_2)(v_1 + v_2) = (T_1 + T_2)(v_1) + (T_1 + T_2)(v_2).
It is equally easy to show that (T_1 + T_2)(cv) = c((T_1 + T_2)(v)).
Thus the set of all linear operators on V, which will henceforth
be written
L(V),
admits natural operations of addition and scalar multiplication.
Now there is a further natural operation that can be performed on elements of L(V), namely functional composition
as defined in 6.1. Thus, if T_1 and T_2 are linear operators on
V, then the composite T_1 ∘ T_2, which will in future be written
T_1T_2,
is defined by the rule
(T_1T_2)(v) = T_1(T_2(v)).
One has of course to check that T_1T_2 is actually a linear transformation, but again this is quite routine. So one can also form
products in the set L(V).
To illustrate these definitions, we consider an explicit ex-
ample where sums, scalar multiples and products can be com-
puted.
Example 6.3.4
Let Ti and T2 be the linear operators on -Doo[a, b] defined by
Ti(/) = f - f and T2(/) = xf" - 2/'. The linear opera-
tors Ti + T2, cT and TiT2 may be found directly from the
definitions as follows:
Ti + r2(/) = r1(/) + T2(/) = / ' - / + x/"-2/'
= -f-f' + xf".
Also
cT1(f) = cf'-cf
and
(T_1T_2)(f) = T_1(T_2(f)) = T_1(xf'' − 2f') = (xf'' − 2f')' − (xf'' − 2f'),
which reduces to (T_1T_2)(f) = 2f' − (x + 1)f'' + xf''' after
evaluation of the derivatives.
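As an aside, a computer algebra system can confirm calculations of this kind. The sketch below is not from the text; it assumes the SymPy library is available and uses a generic symbolic function f.

    # Illustrative SymPy check of the operator computation above.
    from sympy import symbols, Function, diff, simplify, expand

    x = symbols('x')
    f = Function('f')(x)

    T1 = lambda g: diff(g, x) - g                    # T1(f) = f' - f
    T2 = lambda g: x*diff(g, x, 2) - 2*diff(g, x)    # T2(f) = x f'' - 2 f'

    composite = T1(T2(f))                            # (T1 T2)(f)
    expected = 2*diff(f, x) - (x + 1)*diff(f, x, 2) + x*diff(f, x, 3)

    print(simplify(expand(composite - expected)))    # prints 0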
At this point one can sit down and check that those
properties of matrices listed in 1.2.1 which relate to sums,
scalar multiples and products are also valid for linear oper-
ators. Thus there is a similarity between the set of linear
operators L(V) and M_n(F), the set of n × n matrices over F,
where n = dim(V). This similarity should come as no sur-
prise since the action of a linear operator can be represented
by multiplication by a suitable matrix.
The relation between L(V) and Mn(F) can be formalized
by defining a new type of algebraic structure. This involves
the concept of a ring, which was described in 1.3, and
that of a vector space.
An algebra A over a field F is a set which is simultane-
ously a ring with identity and a vector space over F, with
the same rule of addition and zero element, which satisfies the
additional axiom
c(xy) = (cx)y = x(cy)
for all x and y in A and all c in the field F. Notice that this
axiom holds for the vector space Mn(F) because of property
(j) in 1.2.1. Hence M_n(F) is an algebra over F. Now the
additional axiom is also valid in L(V), that is,
c(T_1T_2) = (cT_1)T_2 = T_1(cT_2).
This is true because each of the three linear operators mentioned sends the vector v to c(T_1(T_2(v))). It follows that
L(V), the set of all linear operators on a vector space V over
a field F, is an algebra over F.
Suppose now that we pick and fix an ordered basis B for
the finite-dimensional vector space V. Then, with respect to
B, a linear operator T on V is represented by an n x n matrix,
which will be denoted by
M(T).
By 6.2.3 the matrix M(T) has the property
[T(v)]_B = M(T)[v]_B.
It follows from 6.2.3 that the assignment of the matrix
M(T) to a linear operator T determines a bijective function
from L(V) to Mn(F). The essential properties of this function
are summarized in the next result.
Theorem 6.3.10
Let Ti and T2 be linear operators on an n-dimensional vector
space V and let M(Ti) denote the matrix representing Ti with
respect to a fixed ordered basis B of V. Then the following
equations hold:
(i) M(T_1 + T_2) = M(T_1) + M(T_2);
(ii) M(cT_1) = cM(T_1) for all scalars c;
(iii) M(T_1T_2) = M(T_1)M(T_2).
It is as well to restate this technical result in words to
make sure that the reader grasps what is being asserted. According to part (i) of the theorem, if we add linear operators
T_1 and T_2, the resulting linear operator T_1 + T_2 is represented
by a matrix which is the sum of the matrices that represent
T_1 and T_2. Also (ii) asserts that the scalar multiple cT_1 is
represented by a matrix which is just c times the matrix representing T_1.
More unexpectedly, when we compose the linear operators T_1 and T_2, the resulting linear operator T_1T_2 is represented by the product of the matrices representing T_1 and T_2.
In technical language, the function which sends T to
M(T) is an algebra isomorphism from L(V) to Mn(F). The
main point here is that isomorphic algebras, like isomorphic
vector spaces, are to be regarded as similar objects, which
exhibit the same essential features, even although their un-
derlying sets may be quite different.
In conclusion, our vague feeling that the algebras L(V)
and Mn(F) are somehow quite closely related is made precise
by the assertion that the algebra of all linear operators on an
n- dimensional vector space over a field F is isomorphic with
the algebra of all n x n matrices over F.
Example 6.3.5
Prove part (iii) of Theorem 6.3.10.
Let v be any vector of the vector space; then, using the
fundamental equation [T(v)]_B = M(T)[v]_B, we obtain
[(T_1T_2)(v)]_B = M(T_1)[T_2(v)]_B = M(T_1)(M(T_2)[v]_B) = M(T_1)M(T_2)[v]_B,
which shows that M(T_1T_2) = M(T_1)M(T_2), as required.
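For readers who like to experiment, here is a small numerical sketch (not part of the text; the matrices are arbitrary) of the correspondence just proved: composing two linear operators on R^3, each given by its matrix with respect to the standard basis, is represented by the product of those matrices.

    import numpy as np

    # Two linear operators on R^3, given by their matrices w.r.t. the standard basis.
    M1 = np.array([[1., 2., 0.],
                   [0., 1., 3.],
                   [4., 0., 1.]])
    M2 = np.array([[2., 0., 1.],
                   [1., 1., 0.],
                   [0., 5., 2.]])

    v = np.array([1., -2., 3.])

    lhs = M1 @ (M2 @ v)        # apply T2, then T1
    rhs = (M1 @ M2) @ v        # apply the operator represented by M1 M2

    print(np.allclose(lhs, rhs))   # True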
Exercises 6.3
1. Find bases for the kernel and image of the following linear
transformations:
(a) T : R^4 → R, where T sends a column vector to the sum of its entries;
(b) T : P_3(R) → P_3(R), where T(f) = f';
(c) T : R^2 → R^2, where T((x, y)^T) = (…).
2. Show that every subspace U of a finite-dimensional vector
space V is the kernel and the image of suitable linear operators
on V. [Hint: assume that U is non-zero, choose a basis for U
and extend it to a basis of V].
3. Sort the following vector spaces into batches, so that those
within the same batch are isomorphic:
R^6, R^6, C^6, P_6(C), M_{2,3}(R), C[0, 1].
4. Show that a linear transformation T : V → W is injective if
and only if it has the property of mapping linearly independent
subsets of V to linearly independent subsets of W.
5. Show that a linear transformation T : V → W is surjective if and only if it has the property of mapping any set of
generators of V to a set of generators of W.
6. A linear operator on a finite-dimensional vector space is
an isomorphism if and only if some representing matrix is
invertible: prove or disprove.
7. Prove that the composite of two linear transformations is
a linear transformation.
8. Prove parts (i) and (ii) of Theorem 6.3.10.
9. Let T : V → W and S : W → U be isomorphisms of
vector spaces; show that the function ST : V → U is also an
isomorphism.
10. Let T be a linear operator on a finite-dimensional vector
space V. Prove that the following statements about T are
equivalent:
(a) T is injective;
(b) T is surjective;
(c) T is an isomorphism.
Are these statements still equivalent if V is infinitely gener-
ated?
11. Show that similar matrices have the same rank. [Use the
fact that similar matrices represent the same linear operator].
12. (The third isomorphism theorem). Let U and W be subspaces of a vector space V such that W ⊆ U. Prove that U/W
is a subspace of V/W and that (V/W)/(U/W) ~ V/U. [Hint:
define a function T : V/W → V/U by the rule T(v + W) =
v + U. Show that T is a well defined linear transformation and
apply 6.3.7].
13. Explain how to define a power T^m of a linear operator T
on a vector space V, where m > 0. Then show that powers of
T commute.
Chapter Seven
ORTHOGONALITY IN
VECTOR SPACES
The notion of two lines being perpendicular, or orthogo-
nal, is very familiar from analytical geometry. In this chapter
we show how to extend the elementary concept of orthogonal-
ity to abstract vector spaces over R or C. Orthogonality turns
out to be a tool of extraordinary utility with many applica-
tions, one of the most useful being the well-known Method
of Least Squares. We begin with R^n, showing how to define
orthogonality in this vector space in a way which naturally
generalizes our intuitive notion of perpendicularity in three-
dimensional space.
7.1 Scalar Products in Euclidean Space
Let X and Y be two vectors in R^n, with entries x_1, ..., x_n
and y_1, ..., y_n respectively. Then the scalar product of X and
Y is defined to be the matrix product
X^T Y = (x_1 x_2 ... x_n)(y_1, y_2, ..., y_n)^T = x_1y_1 + x_2y_2 + ··· + x_ny_n.
This is a real number. Notice that X^T Y = Y^T X, so the scalar
product is symmetric in X and Y. Of particular interest is the
scalar product of X with itself,
X^T X = x_1^2 + x_2^2 + ··· + x_n^2.
Since this expression cannot be negative, it has a real square
root, which is called the length of X. It is written
||X|| = √(X^T X) = √(x_1^2 + x_2^2 + ··· + x_n^2).
Notice that ||X|| ≥ 0, and ||X|| = 0 if and only if all the x_i
are zero, that is, X = 0. So the only vector of length 0 is the
zero vector. A vector whose length is 1 is called a unit vector.
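As a quick numerical illustration (not part of the text; NumPy is assumed and the vectors are arbitrary), the scalar product and the length can be computed as follows.

    import numpy as np

    X = np.array([1., 2., 2.])
    Y = np.array([3., 0., 4.])

    scalar_product = X @ Y            # X^T Y = 1*3 + 2*0 + 2*4 = 11
    length_X = np.sqrt(X @ X)         # ||X|| = sqrt(1 + 4 + 4) = 3
    print(scalar_product, length_X, np.linalg.norm(X))  # the built-in norm agrees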
At this point it is as well to specialize to R^3, where geometrical intuition can be used. Recall that a 3-column vector
X in R^3, with entries x_1, x_2, x_3, is represented by a line segment in three-dimensional space with arbitrary initial point
(a_1, a_2, a_3) and endpoint (a_1 + x_1, a_2 + x_2, a_3 + x_3). Thus the
length of the vector X is just the length of any representing
line segment.
This suggests that we look for a geometrical interpretation of the scalar product of two vectors in R^3.
Theorem 7.1.1
Let X and Y be vectors in R^3. Then
X^T Y = ||X|| ||Y|| cos θ,
where θ is the angle in the interval [0, π] between line segments
representing X and Y drawn from the same initial point.
Proof
Consider the triangle rule for adding the vectors X and Y — X
in the triangle IAB, as shown in the diagram below.
The idea is then to apply the cosine rule to this triangle,
whose sides represent X, Y and Y − X.
Thus we have
AB^2 = IA^2 + IB^2 − 2 IA · IB cos θ,
which becomes in vector form
||Y − X||^2 = ||X||^2 + ||Y||^2 − 2||X|| ||Y|| cos θ.
As usual let the entries of X and Y be x_1, x_2, x_3 and y_1, y_2,
y_3 respectively. Then
||X||^2 = x_1^2 + x_2^2 + x_3^2,  ||Y||^2 = y_1^2 + y_2^2 + y_3^2
and
||Y − X||^2 = (y_1 − x_1)^2 + (y_2 − x_2)^2 + (y_3 − x_3)^2.
Now substitute these expressions in the equation for ||Y − X||^2,
and solve for the expression ||X|| ||Y|| cos θ. We obtain after
some simplification the required result
||X|| ||Y|| cos θ = x_1y_1 + x_2y_2 + x_3y_3 = X^T Y.
The formula of 7.1.1 allows us to calculate quickly the
angle θ between two non-zero vectors X and Y; for it yields
the equation
cos θ = X^T Y / (||X|| ||Y||).
Hence the vectors X and Y are orthogonal if and only if
X^T Y = 0.
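These formulas are easy to check numerically; the following sketch (illustrative only, with arbitrary vectors) computes the angle and tests orthogonality.

    import numpy as np

    X = np.array([1., 0., 1.])
    Y = np.array([0., 2., 2.])

    cos_theta = (X @ Y) / (np.linalg.norm(X) * np.linalg.norm(Y))
    theta = np.degrees(np.arccos(cos_theta))
    print(theta)                                        # 60.0 for these vectors

    print(np.isclose(X @ np.array([1., 5., -1.]), 0.))  # True: orthogonal to X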
There is another more or less immediate use for the formula of 7.1.1. Since cos θ always lies between −1 and +1, we
can derive a famous inequality.
Theorem 7.1.2 (The Cauchy - Schwartz Inequality)
If X and Y are any vectors in R^3, then
|X^T Y| ≤ ||X|| ||Y||.
Projection of a vector on a line
Let X and Y be two vectors in R^3 with Y non-zero.
We wish to define the projection of X on Y. Now any vector
parallel to Y will have the form cY for some scalar c. The
idea is to try to choose c in such a way that the vector X − cY
is orthogonal to Y. For then cY will be the projection of X
on Y, as one sees from the diagram.
The condition for X − cY to be orthogonal to Y is
0 = (X − cY)^T Y = X^T Y − cY^T Y = X^T Y − c||Y||^2.
The correct value of c is therefore X^T Y/||Y||^2 and the vector
projection of X on Y is
P = (X^T Y/||Y||^2) Y.
The scalar projection of X on Y is the length of P, that is,
||P|| = |X^T Y| / ||Y||.
We will see in 7.2 how to extend this concept to the projection
of a vector on an arbitrary subspace.
Example 7.1.1
Consider vectors X and Y in R^3 with ||X|| = √6, ||Y|| = √14
and X^T Y = 2 − 3 + 2 = 1.
The angle θ between X and Y is therefore given by
cos θ = 1/√84,
and θ is approximately 83.74°. The vector projection of X on
Y is (1/14)Y and the scalar projection is 1/√14.
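A short NumPy sketch of the projection formulas follows; the vectors below are hypothetical choices consistent with the data of the example (||X|| = √6, ||Y|| = √14, X^T Y = 1), not necessarily those of the original.

    import numpy as np

    X = np.array([2., -1., 1.])   # ||X|| = sqrt(6)
    Y = np.array([1., 3., 2.])    # ||Y|| = sqrt(14), X^T Y = 1

    c = (X @ Y) / (Y @ Y)                          # 1/14
    P = c * Y                                      # vector projection of X on Y
    scalar_proj = abs(X @ Y) / np.linalg.norm(Y)   # 1/sqrt(14)

    theta = np.degrees(np.arccos((X @ Y) / (np.linalg.norm(X) * np.linalg.norm(Y))))
    print(P, scalar_proj, theta)                   # theta is about 83.74 degrees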
The distance of a point from a plane
As an illustration of the usefulness of these ideas, we will
find a formula for the shortest distance between the point
(x_0, y_0, z_0) and the plane whose equation is
ax + by + cz = d.
First we need to recall a few basic facts about planes.
Suppose that (x_1, y_1, z_1) and (x_2, y_2, z_2) are two points
on the given plane. Then ax_1 + by_1 + cz_1 = d = ax_2 + by_2 + cz_2,
so that
a(x_1 − x_2) + b(y_1 − y_2) + c(z_1 − z_2) = 0.
Now this equation asserts that the vector
N = (a, b, c)^T
is orthogonal to the vector with entries x_1 − x_2, y_1 − y_2, z_1 − z_2,
and hence to every vector in the plane.
Thus N is a normal vector to the plane ax + by + cz = d,
which is a familiar fact from the analytical geometry of three-dimensional space.
We are now in a position to calculate the shortest distance
l from the point (x_0, y_0, z_0) to the plane. Let (x, y, z) be a
point in the plane, and write
X = (x, y, z)^T and X_0 = (x_0, y_0, z_0)^T.
Then l is simply the scalar projection of X_0 − X on N, as may
be seen from the diagram below.
Therefore
l = |(X_0 − X)^T N| / ||N||.
Now
(X_0 − X)^T N = a(x_0 − x) + b(y_0 − y) + c(z_0 − z) = ax_0 + by_0 + cz_0 − d,
for ax + by + cz = d since the point (x, y, z) lies in the plane.
Thus we arrive at the formula
l = |ax_0 + by_0 + cz_0 − d| / √(a^2 + b^2 + c^2).
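As a numerical aside (illustrative; the plane and the point are arbitrary choices), the distance formula can be coded directly:

    import numpy as np

    # Plane a*x + b*y + c*z = d and a point P0 = (x0, y0, z0).
    a, b, c, d = 1., -3., 4., 12.
    P0 = np.array([2., 1., 5.])

    N = np.array([a, b, c])                        # normal vector to the plane
    distance = abs(N @ P0 - d) / np.linalg.norm(N)
    print(distance)                                # |2 - 3 + 20 - 12| / sqrt(26)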
Vector products in R^3
In addition to the scalar product, there is another well-known construction in R^3 called the vector product. This is
defined in the following manner.
Suppose that
X = (x_1, x_2, x_3)^T and Y = (y_1, y_2, y_3)^T
are two vectors in R^3. Then the vector product of X and Y,
X × Y,
is defined to be the vector
(x_2y_3 − x_3y_2, x_3y_1 − x_1y_3, x_1y_2 − x_2y_1)^T.
Notice that each entry of this vector is a 2 × 2 determinant.
Because of this, the vector product is best written as a
3 × 3 determinant. Following a commonly used notation, let
us write i, j, k for the vectors of the standard basis of R^3.
Thus
i = (1, 0, 0)^T, j = (0, 1, 0)^T and k = (0, 0, 1)^T.
Then the vector product X × Y can be expressed in the form
X × Y = (x_2y_3 − x_3y_2)i + (x_3y_1 − x_1y_3)j + (x_1y_2 − x_2y_1)k.
This expression is a row expansion of the 3 × 3 determinant
whose rows are (i, j, k), (x_1, x_2, x_3) and (y_1, y_2, y_3).
Here the determinant is evaluated by expanding along row 1
in the usual manner.
Example 7.1.2
For the vectors X and Y of this example the 3 × 3 determinant
above becomes on expansion
X × Y = 14i + 8j − 5k = (14, 8, −5)^T.
The importance of the vector product X xY arises from
the fact that it is orthogonal to each of the vectors X and Y;
thus it is represented by a line segment that is normal to the
plane containing line segments corresponding to X and Y, in
case these are not parallel. To see this we can simply form the
scalar product of X × Y in turn with X and Y. For example,
X^T(X × Y) is the determinant with rows (x_1, x_2, x_3), (x_1, x_2, x_3), (y_1, y_2, y_3).
Since rows 1 and 2 are identical, this is zero by a basic property
of determinants (3.2.2).
In fact the vectors X, Y, X x Y form a right-handed
system in the sense that their directions correspond to the
thumb and first two index fingers of the right hand when held
extended.
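The following NumPy sketch (illustrative; the vectors happen to be those of Exercise 5 below) computes a vector product, verifies the orthogonality just described, and uses the length of the result as the area of a parallelogram (see 7.1.4 below).

    import numpy as np

    X = np.array([2., -1., 3.])
    Y = np.array([0., 4., 2.])

    XxY = np.cross(X, Y)           # vector product
    print(XxY)                     # [-14.  -4.   8.]
    print(X @ XxY, Y @ XxY)        # both 0: X x Y is orthogonal to X and Y

    area = np.linalg.norm(XxY)     # area of the parallelogram spanned by X and Y
    print(area)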
Theorem 7.1.3
If X and Y are vectors in R^3, the vector X × Y is orthogonal
to both X and Y, and the three vectors X, Y, X × Y form a
right-handed system.
The length of the vector product, like the scalar product, is a number with geometrical significance.
Theorem 7.1.4
If X and Y are vectors in R^3 and θ is the angle in the interval
[0, π] between X and Y, then
||X × Y|| = ||X|| ||Y|| sin θ.
Proof
We compute the expression ||X||^2 ||Y||^2 − ||X × Y||^2, by substituting ||X||^2 = x_1^2 + x_2^2 + x_3^2, ||Y||^2 = y_1^2 + y_2^2 + y_3^2 and
||X × Y||^2 = (x_2y_3 − x_3y_2)^2 + (x_3y_1 − x_1y_3)^2 + (x_1y_2 − x_2y_1)^2.
After expansion and cancellation of some terms, we find that
||X||^2 ||Y||^2 − ||X × Y||^2 = (x_1y_1 + x_2y_2 + x_3y_3)^2 = (X^T Y)^2.
Therefore, by 7.1.1,
||X||^2 ||Y||^2 − ||X × Y||^2 = ||X||^2 ||Y||^2 cos^2 θ.
Consequently ||X × Y||^2 = ||X||^2 ||Y||^2 sin^2 θ. Finally, take
the square root of each side, noting that the positive sign is
correct since sin θ ≥ 0 in the interval [0, π].
Theorem 7.1.4 provides another geometrical interpretation of the vector product X × Y. For ||X × Y|| is simply
the area of the parallelogram IPRQ formed by line segments
representing the vectors X and Y. Indeed the area of this
parallelogram equals
(IQ sin θ) IP = ||X|| ||Y|| sin θ = ||X × Y||.
Orthogonality in R^n
Having gained some insight from R^3, we are now ready
to define orthogonality in n-dimensional Euclidean space.
Let X and Y be two vectors in R^n. Then X and Y are
said to be orthogonal if
X^T Y = 0.
This is a natural extension of orthogonality in R^3. It follows
from the definition that the zero vector is orthogonal to every
vector in R^n and that no non-zero vector can be orthogonal
to itself: indeed X^T X = x_1^2 + x_2^2 + ··· + x_n^2 > 0 if X ≠ 0.
It turns out that the inequality of 7.1.2 is valid for R^n.
Theorem 7.1.5 (Cauchy - Schwartz Inequality)
If X and Y are vectors in R^n, then
|X^T Y| ≤ ||X|| ||Y||.
We shall not prove 7.1.5 at this stage since a more general
fact will be established in 7.2: see however Exercise 7.1.10.
Because of 7.1.5 it is meaningful to define the angle between
two non-zero vectors X and Y in R^n to be the angle θ in the
interval [0, π] such that
cos θ = X^T Y / (||X|| ||Y||).
An important consequence of 7.1.5 is
Theorem 7.1.6 (The Triangle Inequality)
If X and Y are vectors in R^n, then
||X + Y|| ≤ ||X|| + ||Y||.
Proof
Let the entries of X and Y be x_1, ..., x_n and y_1, ..., y_n respectively. Then
||X + Y||^2 = (X + Y)^T(X + Y) = X^T X + X^T Y + Y^T X + Y^T Y
and, since X^T Y = Y^T X, this equals
||X||^2 + ||Y||^2 + 2X^T Y.
By the Cauchy-Schwartz Inequality X^T Y ≤ ||X|| ||Y||, so it
follows that
||X + Y||^2 ≤ ||X||^2 + ||Y||^2 + 2||X|| ||Y|| = (||X|| + ||Y||)^2,
which yields the desired inequality.
When n = 3, the assertion of 7.1.6 is just the well-known
fact that the sum of the lengths of two sides of a triangle is
never less than the length of the third side, as can be seen
from the triangle rule of addition for the vectors X and Y.
Complex matrices and orthogonality in C^n
It is possible to define a notion of orthogonality in the
complex vector space C^n, a fact that will be important in
Chapter Eight. However, a crucial change in the definition
must be made. To see why a change is necessary, consider the
complex vector X = (√−1, 1)^T. Then X^T X = −1 + 1 = 0.
Since it does not seem reasonable to allow a non-zero vector
to have length zero, we must alter the definition of a scalar
product in order to exclude this phenomenon.
First it is necessary to introduce a new operation on complex matrices. Let A be an m × n matrix over the complex
field C. Define the complex conjugate
Ā
of A to be the m × n matrix whose (i, j) entry is the complex
conjugate of the (i, j) entry of A. Then define the complex
transpose of A to be the transpose of the complex conjugate:
A* = (Ā)^T.
For example, if A is the 2 × 3 matrix with rows
(4, −√−1, 3) and (1 + √−1, −4, 1 − √−1),
then A* is the 3 × 2 matrix with rows
(4, 1 − √−1), (√−1, −4) and (3, 1 + √−1).
Usually it is more appropriate to use the complex transpose
when dealing with complex matrices. In many ways the com-
plex transpose behaves like the transpose; for example, there
is the following fact.
Theorem 7.1.7
If A and B are complex matrices, then (AB)* = B*A*.
This follows at once from the fact that the complex conjugate
of AB equals the product of the complex conjugates of A and
B, together with (AB)^T = B^T A^T.
Now let us use the complex transpose to define the complex scalar product of vectors X and Y in C^n; this is to be
X*Y = x̄_1y_1 + ··· + x̄_ny_n,
which is a complex number. Why is this definition any better
than the previous one? The reason is that, if we define the
length of the vector X in the natural way as
||X|| = √(X*X) = √(|x_1|^2 + ··· + |x_n|^2),
then ||X|| is always a non-negative real number, and it cannot equal 0 unless X is the zero vector. It is an important
consequence of the definition that Y*X equals the complex
conjugate of X*Y, so the complex scalar product is not symmetric in X and Y.
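A brief NumPy sketch (illustrative; the vectors are arbitrary) of the complex transpose and the complex scalar product:

    import numpy as np

    X = np.array([1j, 1.0])         # the vector with X^T X = -1 + 1 = 0
    Y = np.array([2.0, 1 - 1j])

    X_star = X.conj().T             # complex transpose X* = (conjugate of X)^T
    print(X_star @ X)               # X*X = |i|^2 + |1|^2 = 2, a positive real number
    print(X_star @ Y)               # complex scalar product X*Y
    print(np.conj(Y.conj().T @ X))  # equals X*Y: Y*X is the conjugate of X*Y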
It remains to define orthogonality in C^n. Two vectors X
and Y in C^n are said to be orthogonal if
X*Y = 0.
We now make the blanket assertion that all the results established for scalar products in R^n carry over to complex scalar
products in C^n. In particular the Cauchy-Schwartz and Triangle Inequalities are valid.
Exercises 7.1
1. Find the angle between the vectors (−2, 4, 3)^T and (…, −2, 3)^T.
2. Find the two unit vectors which are orthogonal to both of
the vectors (…, 3, −1)^T and (…, 1, 1)^T.
3. Compute the vector and scalar projections of (…) on (…).
4. Show that the planes x − 3y + 4z = 12 and 2x − 6y + 8z = 6
are parallel and then find the shortest distance between them.
5. If X = (2, −1, 3)^T and Y = (0, 4, 2)^T, find the vector product
X × Y. Hence compute the area of the parallelogram whose
vertices have the following coordinates: (1, 1, 1), (3, 0, 4),
(1, 5, 3), (3, 4, 6).
6. Establish the following properties of the vector product:
(a) X × X = 0; (b) X × (Y + Z) = X × Y + X × Z;
(c) X × Y = −Y × X; (d) X × (cY) = c(X × Y) = (cX) × Y.
7. If X, Y, Z are vectors in R^3, prove that
X^T(Y × Z) = Y^T(Z × X) = Z^T(X × Y).
(This is called the scalar triple product of X, Y, Z). Then
show that the absolute value of this number equals the
volume of the parallelepiped formed by line segments representing the vectors X, Y, Z drawn from the same initial point.
8. Use Exercise 7 to find the condition for the three vectors
X, Y, Z to be represented by coplanar line segments.
9. Show that the set of all vectors in R^n which are orthogonal to a given vector X is a subspace of R^n. What will its
dimension be?
10. Prove the Cauchy-Schwartz Inequality for R^n. [Hint:
compute the expression ||X||^2 ||Y||^2 − |X^T Y|^2 and show that
it is non-negative].
11. Find the most general vector in C^3 which is orthogonal
to both of the vectors
(…, 2 + √−1, 3)^T and (…, 1, √−2)^T.
12. Let A and B be complex matrices of appropriate sizes.
Prove the following statements:
(a) the transpose of Ā equals the complex conjugate of A^T;
(b) (A + B)* = A* + B*; (c) (A*)* = A.
13. How should the vector projection of X on Y be defined
in C^3?
14. Show that the vector equation of the plane through the
point (x_0, y_0, z_0) with normal vector N is
(X − X_0)^T N = 0,
where X and X_0 are the vectors with entries x, y, z and x_0,
y_0, z_0, respectively.
15. Prove the Cauchy-Schwartz Inequality for complex scalar
products in C^n.
16. Prove the Triangle Inequality for complex scalar products
in C^n.
17. Establish the following expression for the vector triple
product in R^3: X × (Y × Z) = (X · Z)Y − (X · Y)Z. [Hint:
note that the vector on the right hand side is orthogonal
to both X and Y × Z].
7.2 Inner Product Spaces
We have seen how to introduce the notion of orthogonality in the vector spaces R^n and C^n for arbitrary n. But what
about other vector spaces such as vector spaces of polynomials
or continuous functions? It turns out that there is a general
concept called an inner product which is a natural extension
of the scalar products in R^n and C^n. This allows the introduction of orthogonality in arbitrary real and complex vector
spaces.
Let V be a real vector space, that is, a vector space over
R. An inner product on V is a rule which assigns to each pair
of vectors u and v of V a real number <u, v>, their inner
product, such that the following properties hold:
(i) <v, v> ≥ 0 and <v, v> = 0 if and only if v = 0;
(ii) <u, v> = <v, u>;
(iii) <cu + dv, w> = c<u, w> + d<v, w>.
The understanding here is that these properties must hold for
all vectors u, v, w and all real scalars c, d.
We now give some examples of inner products, the first
one being the scalar product, which provided the original mo-
tivation.
Example 7.2.1
Define an inner product < , > on R^n by the rule
<X, Y> = X^T Y.
That this is an inner product follows from the laws of matrix
algebra, and the fact that X^T X is non-negative and equals 0
only if X = 0. This inner product will be referred to as the
standard inner product on R^n. It should be borne in mind
that there are other possible inner products for this vector
space; for example, an inner product on R^3 is defined by
<X, Y> = 2x_1y_1 + 3x_2y_2 + 4x_3y_3,
where X and Y are the vectors with entries x_1, x_2, x_3 and y_1,
y_2, y_3 respectively. The reader should verify that the axioms
for an inner product hold in this case.
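As a numerical aside (not part of the text; the vectors are arbitrary), the second inner product above can be written as <X, Y> = X^T D Y with D = diag(2, 3, 4), and its axioms can be spot-checked as follows.

    import numpy as np

    D = np.diag([2., 3., 4.])
    inner = lambda X, Y: X @ D @ Y    # <X, Y> = 2*x1*y1 + 3*x2*y2 + 4*x3*y3

    X = np.array([1., -2., 0.5])
    Y = np.array([3., 1., -1.])

    print(np.isclose(inner(X, Y), inner(Y, X)))   # symmetry
    print(inner(X, X) > 0)                        # positivity for X != 0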
Example 7.2.2
Define an inner product < , > on the vector space C[a, b] by
the rule
<f, g> = ∫_a^b f(x)g(x) dx.
This is a very different type of inner product, which is important in the theory of orthogonal functions. Well-known
properties of integrals show that the requirements for an inner product are satisfied. For example,
<f, f> = ∫_a^b f(x)^2 dx ≥ 0
since f(x)^2 ≥ 0; also, if we think of the integral as the area
under the curve y = f(x)^2, then it becomes clear that the
integral cannot vanish unless f(x) is identically equal to zero
in [a, b].
Example 7.2.3
Define an inner product on the vector space P_n(R) of all real
polynomials in x of degree less than n by the rule
<f, g> = Σ_{i=1}^n f(x_i)g(x_i),
where x_1, ..., x_n are distinct real numbers.
Here it is not so clear why the first requirement for an
inner product holds. Note that
<f, f> = Σ_{i=1}^n f(x_i)^2 ≥ 0;
also the only way that this sum can vanish is if f(x_1) = ···
= f(x_n) = 0. But f is a polynomial of degree at most n − 1, so
it cannot have n distinct roots unless it is the zero polynomial.
Orthogonality in inner product spaces
A real inner product space is a vector space V over R
together with an inner product < , > on V. It will be convenient to speak of "the inner product space V", suppressing
mention of the inner product where this is understood. Thus
"the inner product space R^n" refers to R^n with the scalar
product as inner product: this is called the Euclidean inner
product space.
Two vectors u and v of an inner product space V are said
to be orthogonal if
<u, v> = 0.
It follows from the definition of an inner product that the zero
vector is orthogonal to every vector and no non-zero vector can
be orthogonal to itself.
Example 7.2.4
Show that the functions sin mx, m = 1, 2, ..., are mutually
orthogonal in the inner product space C[0, π] where the inner
product is given by the formula <f, g> = ∫_0^π f(x)g(x)dx.
We have merely to compute the inner product of sin mx
and sin nx:
<sin mx, sin nx> = ∫_0^π sin mx sin nx dx.
Now, according to a well-known trigonometric identity,
sin mx sin nx = (1/2)(cos(m − n)x − cos(m + n)x).
Therefore, on evaluating the integrals, we obtain as the value
of <sin mx, sin nx>
[ sin(m − n)x / (2(m − n)) − sin(m + n)x / (2(m + n)) ]_0^π = 0,
provided m ≠ n. This is a very important set of orthogonal
functions which plays a basic role in the theory of Fourier
series.
If v is a vector in an inner product space V, then
<v, v> ≥ 0, so this number has a real square root. This
allows us to define the norm of v to be the real number
||v|| = √<v, v>.
Thus ||v|| ≥ 0 and ||v|| equals zero if and only if v = 0. A
vector with norm 1 is called a unit vector. It is clear that
norm is a generalization of length in Euclidean space.
Example 7.2.5
Find the norm of the function sin mx in the inner product
space C[0, π] of Example 7.2.4.
Once again we have to compute an integral:
||sin mx||^2 = ∫_0^π sin^2 mx dx = ∫_0^π (1/2)(1 − cos 2mx) dx = π/2.
Hence ||sin mx|| = √(π/2). It follows that the functions
√(2/π) sin mx, m = 1, 2, ...,
form a set of mutually orthogonal unit vectors. Such sets are
called orthonormal and will be studied in 7.3.
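These orthogonality relations and norms can be spot-checked numerically; the sketch below (illustrative only) approximates the integrals over [0, π] with a trapezoidal rule.

    import numpy as np

    x = np.linspace(0., np.pi, 10001)

    def ip(f, g):
        # approximate <f, g> = integral of f(x) g(x) over [0, pi]
        return np.trapz(f(x) * g(x), x)

    sin2 = lambda t: np.sin(2 * t)
    sin3 = lambda t: np.sin(3 * t)

    print(ip(sin2, sin3))              # approximately 0 (orthogonal)
    print(ip(sin2, sin2), np.pi / 2)   # both approximately pi/2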
There is an important inequality relating inner product
and norm which has already been encountered for Euclidean
spaces.
Theorem 7.2.1 (The Cauchy - Schwartz Inequality)
Let u and v be vectors in an inner product space. Then
|<u, v>| ≤ ||u|| ||v||.
Proof
We can assume that v ≠ 0 or else the result is obvious. Let
t denote an arbitrary real number. Then, using the defining
properties of the inner product, we find that <u − tv, u − tv>
equals
<u, u> − <u, v>t − <v, u>t + <v, v>t^2,
which reduces to
||u||^2 − 2<u, v>t + ||v||^2 t^2 ≥ 0.
For brevity write a = ||v||^2, b = <u, v> and c = ||u||^2. Thus
at^2 − 2bt + c = <u − tv, u − tv> ≥ 0.
To see what this implies, complete the square in the usual
manner:
at^2 − 2bt + c = a((t − b/a)^2 + (c/a − b^2/a^2)).
Since a > 0 and the expression on the left hand side of the
equation is non-negative for all values of t, it follows that
c/a ≥ b^2/a^2, that is, b^2 ≤ ac. On substituting the values of
a, b and c, and taking the square root, we obtain the desired
inequality.
Example 7.2.6
If 7.2.1 is applied to the vector space C[a, b] with the inner
product specified in Example 7.2.2, we obtain the inequality
|∫_a^b f(x)g(x)dx| ≤ (∫_a^b f(x)^2 dx)^{1/2} (∫_a^b g(x)^2 dx)^{1/2}.
Normed linear spaces
The next step in our series of generalizations is to extend
the notion of length of a vector in Euclidean space. Let V
denote a real vector space. By a norm on V is meant a rule
which assigns to each vector v a real number ||v||, its norm,
such that the following properties hold:
(i) ||v|| ≥ 0 and ||v|| = 0 if and only if v = 0;
(ii) ||cv|| = |c| ||v||;
(iii) ||u + v|| ≤ ||u|| + ||v||. (The Triangle Inequality).
These are to hold for all vectors u and v in V and all scalars
c. A vector space together with a norm is called a normed
linear space.
We already know an example of a normed linear space;
for the length function on Rn
is a norm. To see why this
is so, we need to remember that the Triangle Inequality was
established for the length function in 7.1.6.
The reader will have noticed that the term "norm" has
already been used in the context of an inner product space.
Let us show that these two usages are consistent.
Theorem 7.2.2
Let V be an inner product space and define
||v|| = √<v, v>.
Then || || is a norm on V and V is a normed linear space.
Proof
We need to check the three axioms for a norm. In the first
place, ||v|| = √<v, v> ≥ 0, and this cannot vanish unless
v = 0, by the definition of an inner product. Next, if c is a
scalar, then
||cv|| = √<cv, cv> = √(c^2 <v, v>) = |c| ||v||.
Finally, the Triangle Inequality must be established. By the
defining properties of the inner product:
||u + v||^2 = <u + v, u + v> = ||u||^2 + 2<u, v> + ||v||^2,
which, by 7.2.1, cannot exceed
||u||^2 + 2||u|| ||v|| + ||v||^2 = (||u|| + ||v||)^2.
On taking square roots, we derive the required inequality.
Theorem 7.2.2 enables us to give many examples of
normed linear spaces.
Example 7.2.7
The Euclidean space R^n is a normed linear space if length is
taken as the norm. Thus
||X|| = √(X^T X) = √(x_1^2 + x_2^2 + ··· + x_n^2).
Example 7.2.8
The vector space C[a, b] becomes a normed linear space if ||f||
is defined to be
(∫_a^b f(x)^2 dx)^{1/2}.
Example 7.2.9 (Matrix norms)
A different type of normed linear space arises if we consider
the vector space of all real m × n matrices and introduce a
norm on it as follows. If A = [a_ij]_{m,n}, define ||A|| to be
(Σ_{i=1}^m Σ_{j=1}^n a_ij^2)^{1/2}.
On the face of it this is a reasonable measure of the "size" of
the matrix. But of course one has to show that this is really a
norm. A neat way to do this is as follows: form the mn-column
vector whose entries are the elements of A listed by rows. The
key point to note is that ||A|| is just the length of this vector
in R^{mn}. It follows at once that || || is a norm,
since we know that length is a norm.
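This matrix norm is what NumPy calls the Frobenius norm; a quick illustrative check with an arbitrary matrix:

    import numpy as np

    A = np.array([[1., -2., 3.],
                  [0.,  4., 5.]])

    entrywise = np.sqrt(np.sum(A**2))          # (sum of squares of all entries)^(1/2)
    frobenius = np.linalg.norm(A, 'fro')       # the same quantity, as NumPy computes it
    as_vector = np.linalg.norm(A.reshape(-1))  # length of A written out as a vector in R^(mn)

    print(entrywise, frobenius, as_vector)     # all equal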
Inner products on complex vector spaces
So far inner products have only been defined on real vec-
tor spaces. Now it has already been seen that there is a rea-
sonable concept of orthogonality in the complex vector space
Cn
, although it differs from orthogonality in Rn
in that a dif-
ferent scalar product must be used. This suggests that if an
inner product is to be defined on an arbitrary complex vector
space, there will have to be a change in the definition of the
inner product.
Let V be a vector space over C. An inner product on V
is a rule that assigns to each pair of vectors u and v in V a
complex number <u, v> such that the following rules hold:
(i) <v, v> ≥ 0 and <v, v> = 0 if and only if v = 0;
(ii) <u, v> = the complex conjugate of <v, u>;
(iii) <cu + dv, w> = c̄<u, w> + d̄<v, w>, the bars denoting complex conjugation.
These are to hold for all vectors u, v, w and all complex scalars
c, d. Observe that property (ii) implies that <v, v> is
real: for this complex number equals its complex conjugate. A
complex vector space which is equipped with an inner product
is called a complex inner product space.
Our prime example of a complex inner product space is
C^n with the complex scalar product <X, Y> = X*Y. To
see that this is a complex inner product, we need to note that
X*Y is the complex conjugate of Y*X and
<cX + dY, Z> = (cX + dY)*Z = c̄X*Z + d̄Y*Z,
which is just c̄<X, Z> + d̄<Y, Z>.
Provided that the changes implied by the altered conditions (ii) and (iii) are made, the concepts and results already
established for real inner product spaces can be extended to
complex inner product spaces. In addition, results stated for
real inner product spaces in the remainder of this section hold
for complex inner product spaces, again with the appropriate
changes.
Orthogonal complements
We return to the study of orthogonality in real inner prod-
uct spaces. We wish to introduce the important notion of the
orthogonal complement of a subspace. Here what we have in
mind as our model is the simple situation in three-dimensional
space where the orthogonal complement of a plane is the set
of line segments perpendicular to it.
Let S be a subspace of a real inner product space V.
The orthogonal complement of S is defined to be the set of all
vectors in V that are orthogonal to every vector in S: it is
denoted by the symbol
S^⊥.
Example 7.2.10
Let S be the subspace of R^3 consisting of all vectors of the
form
(a, b, 0)^T,
where a and b are real numbers. Thus elements of S correspond to line segments in the xy-plane. Equally clearly S^⊥ is
the set of all vectors of the form
(0, 0, c)^T.
These correspond to line segments along the z-axis, hardly a
surprising conclusion.
The most fundamental property of an orthogonal com-
plement is that it is a subspace.
Theorem 7.2.3
Let S be a subspace of a real inner product space V. Then
(a) S^⊥ is a subspace of V;
(b) S ∩ S^⊥ = 0;
(c) if S is finitely generated, a vector v belongs to S^⊥ if
and only if it is orthogonal to every vector in some set of
generators of S.
Proof
To show that S^⊥ is a subspace we need to verify that it contains the zero vector and is closed with respect to addition
and scalar multiplication. The first statement is true since
the zero vector is orthogonal to every vector. As for the remaining ones, take two vectors v and w in S^⊥, let s be an
arbitrary vector in S and let c be a scalar. Then
<cv, s> = c<v, s> = 0,
and
<v + w, s> = <v, s> + <w, s> = 0.
Hence cv and v + w belong to S^⊥.
Now suppose that v belongs to the intersection S ∩ S^⊥.
Then v is orthogonal to itself, which can only mean that v = 0.
Finally, assume that v_1, ..., v_m are generators of S and
that v is orthogonal to each v_i. A general vector of S has the
form Σ_{i=1}^m c_i v_i for some scalars c_i. Then
<v, Σ_{i=1}^m c_i v_i> = Σ_{i=1}^m c_i <v, v_i> = 0.
Hence v is orthogonal to every vector in S and so it belongs
to S^⊥. The converse is obvious.
Example 7.2.11
In the inner product space P_3(R) with
<f, g> = ∫_0^1 f(x)g(x)dx,
find the orthogonal complement of the subspace S generated
by 1 and x.
Let f = a_0 + a_1x + a_2x^2 be an element of P_3(R). By 7.2.3,
a polynomial f belongs to S^⊥ if and only if it is orthogonal
to 1 and x; the conditions for this are
<f, 1> = ∫_0^1 f(x)dx = a_0 + (1/2)a_1 + (1/3)a_2 = 0
and
<f, x> = ∫_0^1 x f(x)dx = (1/2)a_0 + (1/3)a_1 + (1/4)a_2 = 0.
Solving these equations, we find that a_0 = t/6, a_1 = −t and
a_2 = t, where t is arbitrary. Hence f = t(x^2 − x + 1/6) is the most
general element of S^⊥. It follows that S^⊥ is the 1-dimensional
subspace generated by the polynomial x^2 − x + 1/6.
Notice in the last example that dim(S) + dim(S^⊥) = 3,
the dimension of P_3(R). This is no coincidence, as the following fundamental theorem shows.
Theorem 7.2.4
Let S be a subspace of a finite-dimensional real inner product
space V; then
V = S ⊕ S^⊥ and dim(V) = dim(S) + dim(S^⊥).
Proof
According to the definition in 5.3, we must prove that V =
S + S^⊥ and S ∩ S^⊥ = 0. The second statement is true by
7.2.3, but the first one requires proof.
Certainly, if S = 0, then S^⊥ = V and the result is clear.
Having disposed of this case, we may assume that S is non-zero and choose a basis v_1, ..., v_m for S. Extend this basis of
S to a basis of V, say v_1, ..., v_m, v_{m+1}, ..., v_n: this is possible
by 5.1.4. If v is an arbitrary vector of V, we can write
v = Σ_{j=1}^n c_j v_j.
By 7.2.3 the vector v belongs to S^⊥ if and only if it is orthogonal to each of the vectors v_1, ..., v_m; the conditions for this
are
<v_i, v> = Σ_{j=1}^n <v_i, v_j> c_j = 0, for i = 1, 2, ..., m.
Now the above equations constitute a linear system of m equations in the n unknowns c_1, c_2, ..., c_n. Therefore the dimension of S^⊥ equals the dimension of the solution space of the
linear system, which we know from 5.1.7 to be n − r where r
is the rank of the m × n coefficient matrix A = [<v_i, v_j>].
Obviously r ≤ m; we shall show that in fact r = m. If this is
false, then the m rows of A must be linearly dependent and
there exist scalars d_1, ..., d_m, not all of them zero, such that
0 = Σ_{i=1}^m d_i <v_i, v_j> = <Σ_{i=1}^m d_i v_i, v_j>
for j = 1, ..., n. But a vector which is orthogonal to every
vector in a basis of V must be zero. Hence Σ_{i=1}^m d_i v_i = 0,
which can only mean that d_1 = d_2 = ··· = d_m = 0, since
v_1, ..., v_m are linearly independent. By this contradiction
r = m.
We conclude that dim(S^⊥) = n − m = n − dim(S), which
implies that dim(S) + dim(S^⊥) = n = dim(V). It follows from
5.3.2 that
dim(S + S^⊥) = dim(S) + dim(S^⊥) = dim(V).
Hence V = S + S^⊥, as required.
An important consequence of the theorem is
Corollary 7.2.5
If S is a subspace of a finite-dimensional real inner product
space V, then
(S^⊥)^⊥ = S.
Proof
Every vector in S is certainly orthogonal to every vector in
S^⊥; thus S is a subspace of (S^⊥)^⊥. On the other hand, a
computation with dimensions using 7.2.4 yields
dim((S^⊥)^⊥) = dim(V) − dim(S^⊥) = dim(V) − (dim(V) − dim(S)) = dim(S).
Therefore S = (S^⊥)^⊥.
Projection on a subspace
The direct decomposition of an inner product space into
a subspace and its orthogonal complement afforded by 7.2.4
leads to a wide generalization of the elementary notion of projection of one vector on another, as described in 7.1. This
generalized projection will prove invaluable during the discussion of least squares in 7.4.
Let V be a finite-dimensional real inner product space, let
S be a subspace and let v be an element of V. Since V = S ⊕ S^⊥,
there is a unique expression for v of the form
v = s + s^⊥,
where s and s^⊥ belong to S and S^⊥ respectively. Call s the
projection of v on the subspace S. Of course, s^⊥ is the projection of v on the subspace S^⊥. For example, if V is R^3, and
S is the subspace generated by a given vector u, then s is the
projection of v on u in the sense of 7.1.
Example 7.2.12
Find the projection of the vector X on the column space of
the matrix A, where
X = (1, 1, 1)^T and A is the 3 × 2 matrix with rows (1, 3), (2, −1), (1, 4).
Let S denote the column space of A. Now the columns
of A are linearly independent, so they form a basis of S. We
have to find a vector Y in S such that X − Y is orthogonal to
both columns of A; for then X − Y will belong to S^⊥ and Y
will be the projection of X on S. Now Y must have the form
Y = xA_1 + yA_2
for some scalars x and y, where A_1 and A_2 are the columns
of A. The conditions for X − Y to belong to S^⊥ are
<X − Y, A_1> = (1 − x − 3y) + 2(1 − 2x + y) + (1 − x − 4y) = 0
and
<X − Y, A_2> = 3(1 − x − 3y) − (1 − 2x + y) + 4(1 − x − 4y) = 0.
These equations yield x = 74/131 and y = 16/131. The projection of X on the subspace S is therefore
Y = (1/131)(122, 132, 138)^T.
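The same projection can be obtained numerically by solving the normal equations A^T A c = A^T X for the coefficients of the columns; the sketch below (NumPy assumed) reproduces the numbers of this example.

    import numpy as np

    A = np.array([[1.,  3.],
                  [2., -1.],
                  [1.,  4.]])
    X = np.array([1., 1., 1.])

    coeffs = np.linalg.solve(A.T @ A, A.T @ X)   # x = 74/131, y = 16/131
    Y = A @ coeffs                               # projection of X on the column space of A

    print(coeffs * 131)     # [74. 16.]
    print(Y * 131)          # [122. 132. 138.]
    print(A.T @ (X - Y))    # approximately zero: X - Y is orthogonal to the columns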
Orthogonality and the fundamental subspaces of a
matrix
We saw in Chapter Four that there are three natural sub-
spaces associated with a matrix A, namely the null space, the
row space and the column space. There are of course corresponding subspaces associated with the transpose A^T, so in
all six subspaces may be formed. However there is very little
difference between the row space of A and the column space
of A^T; indeed, if we transpose the vectors in the row space of
A, we get the vectors of the column space of A^T. Similarly
the vectors in the row space of A^T arise by transposing vectors in the column space of A. Thus there are essentially four
interesting subspaces associated with A, namely, the null and
column spaces of A and of A^T. These subspaces are connected
by the orthogonality relations indicated in the next result.
Theorem 7.2.6
Let A be a real matrix. Then the following statements hold:
(i) null space of A = (column space of A^T)^⊥;
(ii) null space of A^T = (column space of A)^⊥;
(iii) column space of A = (null space of A^T)^⊥;
(iv) column space of A^T = (null space of A)^⊥.
Proof
To establish (i) observe that a column vector X belongs to the
null space of A if and only if it is orthogonal to every column
of A^T, that is, X is in (column space of A^T)^⊥. To deduce (ii)
simply replace A by A^T in (i). Equations (iii) and (iv) follow
on taking the orthogonal complement of each side of (ii) and
(i) respectively, if we remember that S = (S^⊥)^⊥ by 7.2.5.
Exercises 7.2
1. Which of the following are inner product spaces?
(a) R^n where <X, Y> = −X^T Y;
(b) R^n where <X, Y> = 2X^T Y;
(c) C[0, 1] where <f, g> = ∫_0^1 (f(x) + g(x)) dx.
2. Consider the inner product space C[0, π] where <f, g> =
∫_0^π f(x)g(x)dx; show that the functions 1/√π, √(2/π) cos mx,
m = 1, 2, ..., form a set of mutually orthogonal unit vectors.
3. Let w be a fixed, positive valued function in the vector
space C[a, b]. Show that if <f, g> is defined to be
∫_a^b f(x)w(x)g(x)dx,
then < , > is an inner product on C[a, b]. [Here w is called a
weight function].
4. Which of the following are normed linear spaces?
(a) R^3 where ||X|| = x_1^2 + x_2^2 + x_3^2;
(b) R^3 where ||X|| = √(x_1^2 + x_2^2 − x_3^2);
(c) R^3 where ||X|| = the maximum of |x_1|, |x_2|, |x_3|.
5. Let V be a finite-dimensional real inner product space
with an ordered basis v_1, ..., v_n. Define a_ij to be <v_i, v_j>.
If A = [a_ij] and u and w are any vectors of V, show that
<u, w> = [u]^T A[w], where [u] is the coordinate vector of u
with respect to the given ordered basis.
6. Prove that the matrix A in Exercise 5 has the following
properties:
(a) X^T A X ≥ 0 for all X;
(b) X^T A X = 0 only if X = 0;
(c) A is symmetric.
Deduce that A must be non-singular.
7. Let A be a real n × n matrix with properties (a), (b) and
(c) of Exercise 6. Prove that <X, Y> = X^T A Y defines an
inner product on R^n. Deduce that ||X|| = √(X^T A X) defines a
norm on R^n.
8. Let S be the subspace of the inner product space P_3(R)
generated by the polynomials 1 − x^2 and 2 − x + x^2, where
<f, g> = ∫_0^1 f(x)g(x)dx. Find a basis for the orthogonal
complement of S.
9. Find the projection of the vector with entries 1, −2, 3 on
the column space of the matrix with rows (1, 0), (2, −4), (3, 5).
10. Prove the following statements about subspaces S and T
of a finite dimensional real inner product space:
(a) (S + T)^⊥ = S^⊥ ∩ T^⊥;
(b) S^⊥ = T^⊥ always implies that S = T;
(c) (S ∩ T)^⊥ = S^⊥ + T^⊥.
11. If S is a subspace of a finite dimensional real inner product
space V, prove that S^⊥ ~ V/S.
7.3 Orthonormal Sets and the Gram-Schmidt Process
Let V be an inner product space. A set of vectors in V is
called orthogonal if every pair of distinct vectors in the set is
orthogonal. If in addition each vector in the set is a unit vector, that is, has norm 1, then the set is called orthonormal.
Example 7.3.1
In the Euclidean space R^3 the vectors (…)
form an orthogonal set since the scalar product of any two of
them vanishes. To obtain an orthonormal set, simply multiply
each vector by the reciprocal of its length.
Example 7.3.2
The standard basis of R^n, consisting of the columns of the
identity matrix I_n, is an orthonormal set.
Example 7.3.3
The functions
√(2/π) sin mx, m = 1, 2, ...,
form an orthonormal subset of the inner product space
C[0, π]. For we observed in Examples 7.2.4 and 7.2.5 that
these vectors are mutually orthogonal and have norm 1.
A basic property of orthogonal subsets is that they are
always linearly independent.
Theorem 7.3.1
Let V be a real inner product space; then any orthogonal subset
of V consisting of non-zero vectors is linearly independent.
Proof
Suppose that the subset {v_1, ..., v_n} is orthogonal, so that
<v_i, v_j> = 0 if i ≠ j. Assume that there is a linear relation
of the form c_1v_1 + ··· + c_nv_n = 0. Then, on taking the inner
product of both sides with v_j, we get
0 = Σ_{i=1}^n <c_iv_i, v_j> = Σ_{i=1}^n c_i <v_i, v_j> = c_j <v_j, v_j> = c_j ||v_j||^2,
since <v_i, v_j> = 0 if i ≠ j. Now ||v_j|| ≠ 0, since v_j is not
the zero vector; therefore c_j = 0 for all j. It follows that the
v_j are linearly independent.
This result raises the possibility of an orthonormal basis,
and indeed we have already seen in Example 7.3.2 that the
standard basis of R^n is orthonormal. While at present there
are no grounds for believing that such a basis always exists,
it is instructive to record at this stage some useful properties
of orthonormal bases.
Theorem 7.3.2
Suppose that {v_1, ..., v_n} is an orthonormal basis of a real
inner product space V. If v is an arbitrary vector of V, then
v = Σ_{i=1}^n <v, v_i> v_i and ||v||^2 = Σ_{i=1}^n <v, v_i>^2.
Proof
Let v = Σ_{i=1}^n c_i v_i be the expression for v in terms of the
given basis. Forming the inner product of both sides with v_j,
we obtain
<v, v_j> = <Σ_{i=1}^n c_i v_i, v_j> = Σ_{i=1}^n c_i <v_i, v_j> = c_j,
since <v_i, v_j> = 0 if i ≠ j and <v_j, v_j> = 1. Finally,
||v||^2 = <v, v> = <Σ_{i=1}^n c_i v_i, Σ_{j=1}^n c_j v_j> = Σ_{i=1}^n Σ_{j=1}^n c_i c_j <v_i, v_j>,
which reduces to Σ_{j=1}^n c_j^2.
Another useful feature of orthonormal bases is that they
greatly simplify the procedure for calculating projections.
Theorem 7.3.3
Let V be an inner product space and let S be a subspace and
v a vector of V. Assume that {s_1, ..., s_m} is an orthonormal
basis of S. Then the projection of v on S is
Σ_{i=1}^m <v, s_i> s_i.
Proof
Put p = Σ_{i=1}^m <v, s_i> s_i, a vector which quite clearly
belongs to S. Now <p, s_j> = <v, s_j>, so
<v − p, s_j> = <v, s_j> − <p, s_j> = <v, s_j> − <v, s_j> = 0.
Hence v − p is orthogonal to each basis element of S, which
shows that v − p belongs to S^⊥. Since v = p + (v − p), and
the expression for v as the sum of an element of S and an
element of S^⊥ is unique, it follows that p is the projection of
v on S.
Example 7.3.4
The vectors X_1 and X_2 given by (…) form an orthonormal
basis of a subspace S of R^3; find the projection on S of the
column vector X with entries 1, −1, 1.
Apply 7.3.3 with s_1 = X_1 and s_2 = X_2; we find that the
projection of X on S is
P = <X, X_1> X_1 + <X, X_2> X_2 = (…).
Having seen that orthonormal bases are potentially useful, let
us now address the problem of finding such bases.
Gram-Schmidt orthogonalization
Suppose that V is a finite-dimensional real inner product space with a given basis {u_1, ..., u_n}; we shall describe
a method of constructing an orthonormal basis of V which is
known as the Gram-Schmidt process.
The orthonormal basis of V is constructed one element
at a time. The first step is to get a unit vector:
v_1 = (1/||u_1||) u_1.
Notice that u_1 and v_1 generate the same subspace; let us call
it S_1. Then v_1 clearly forms an orthonormal basis of S_1. Next
let
p_1 = <u_2, v_1> v_1.
By 7.3.3 this is the projection of u_2 on S_1. Thus u_2 − p_1
belongs to S_1^⊥ and u_2 − p_1 is orthogonal to v_1. Notice that
u_2 − p_1 ≠ 0 since u_1 and u_2 are linearly independent. The
second vector in the orthonormal basis is taken to be
v_2 = (1/||u_2 − p_1||)(u_2 − p_1).
By definition of v_1 and v_2, these vectors generate the same
subspace as u_1, u_2, say S_2. Also v_1 and v_2 form an orthonormal basis of S_2.
The next step is to define
p_2 = <u_3, v_1> v_1 + <u_3, v_2> v_2,
which by 7.3.3 is the projection of u_3 on S_2. Then u_3 − p_2
belongs to S_2^⊥ and so it is orthogonal to v_1 and v_2. Again
one must observe that u_3 − p_2 ≠ 0, the reason being that u_1,
u_2, u_3 are linearly independent. Now define the third vector
of the orthonormal basis to be
v_3 = (1/||u_3 − p_2||)(u_3 − p_2).
Then v_1, v_2, v_3 form an orthonormal basis of the subspace
S_3 generated by u_1, u_2, u_3.
The procedure is repeated n times until we have constructed n vectors v_1, ..., v_n; these will form an orthonormal
basis of V.
Our conclusions are summarised in the following funda-
mental theorem.
Theorem 7.3.4 (The Gram - Schmidt Process)
Let {u_1, ..., u_n} be a basis of a finite-dimensional real inner
product space V. Define recursively vectors v_1, ..., v_n by the
rules
v_1 = (1/||u_1||) u_1 and v_{i+1} = (1/||u_{i+1} − p_i||)(u_{i+1} − p_i),
where
p_i = <u_{i+1}, v_1> v_1 + ··· + <u_{i+1}, v_i> v_i
is the projection of u_{i+1} on the subspace S_i = <v_1, ..., v_i>.
Then v_1, ..., v_n form an orthonormal basis of V.
The Gram-Schmidt process furnishes a practical method
for constructing orthonormal bases, although the calculations
can become tedious if done by hand.
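The recursion of 7.3.4 translates directly into code. The sketch below (illustrative, for the standard inner product on R^m, with basis vectors supplied as the columns of a NumPy array) carries it out and is applied to the matrix of Example 7.3.5 below.

    import numpy as np

    def gram_schmidt(U):
        # Return an array whose columns are an orthonormal basis of the column
        # space of U, assuming the columns of U are linearly independent.
        V = []
        for u in U.T:                        # process the basis vectors u_1, u_2, ...
            p = sum((u @ v) * v for v in V)  # projection of u on <v_1, ..., v_i>
            w = u - p
            V.append(w / np.linalg.norm(w))  # normalize u - p
        return np.column_stack(V)

    A = np.array([[1., 1., 2.],
                  [1., 2., 3.],
                  [1., 2., 1.],
                  [1., 1., 6.]])             # the matrix of Example 7.3.5 below

    Q = gram_schmidt(A)
    print(np.round(Q, 4))
    print(np.allclose(Q.T @ Q, np.eye(3)))   # the columns are orthonormal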
Example 7.3.5
Find an orthonormal basis for the column space S of the ma-
trix with rows (1, 1, 2), (1, 2, 3), (1, 2, 1), (1, 1, 6).
In the first place the columns X_1, X_2, X_3 of the matrix
are linearly independent and so constitute a basis of S. We
shall apply the Gram-Schmidt process to this basis to produce
an orthonormal basis {Y_1, Y_2, Y_3} of S, following the steps in
the procedure. The first vector is
Y_1 = (1/||X_1||) X_1 = (1/2)(1, 1, 1, 1)^T.
Now compute the projection of X_2 on S_1 = <Y_1>:
p_1 = <X_2, Y_1> Y_1 = 3Y_1 = (3/2)(1, 1, 1, 1)^T.
The next vector in the orthonormal basis is
Y_2 = (1/||X_2 − p_1||)(X_2 − p_1) = (1/2)(−1, 1, 1, −1)^T.
The projection of X_3 on S_2 = <Y_1, Y_2> is
p_2 = <X_3, Y_1> Y_1 + <X_3, Y_2> Y_2 = 6Y_1 − 2Y_2 = (4, 2, 2, 4)^T.
The final vector in the orthonormal basis of S is therefore
Y_3 = (1/||X_3 − p_2||)(X_3 − p_2) = (1/√10)(−2, 1, −1, 2)^T.
Example 7.3.6
Find an orthonormal basis of the inner product space P_3(R)
where <f, g> is defined to be ∫_{−1}^1 f(x)g(x)dx.
We begin with the standard basis {1, x, x^2} of P_3(R) and
then use the Gram-Schmidt process to construct an orthonormal basis {f_1, f_2, f_3}. Since ||1|| = √(∫_{−1}^1 1 dx) = √2, the first
member of the basis is
f_1 = 1/√2.
Next <x, f_1> = ∫_{−1}^1 (x/√2)dx = 0, so p_1 = <x, f_1> f_1 = 0.
Hence
f_2 = (1/||x||) x = √(3/2) x,
since ||x|| = √(∫_{−1}^1 x^2 dx) = √(2/3).
Continuing the procedure, we find that <x^2, f_1> = √2/3
and <x^2, f_2> = 0. Hence p_2 = <x^2, f_1> f_1 + <x^2, f_2> f_2 = 1/3, and so the final vector of the orthonormal
basis is
f_3 = (1/||x^2 − 1/3||)(x^2 − 1/3) = (√5/(2√2))(3x^2 − 1).
Consequently the polynomials
1/√2, (√3/√2) x and (√5/(2√2))(3x^2 − 1)
form an orthonormal basis of P_3(R).
QR-factorization
In addition to being a practical tool for computing or-
thonormal bases, the Gram-Schmidt procedure has important
theoretical implications. For example, it leads to a valuable
way of factorizing an arbitrary real matrix. This is generally
referred to as QR-factorization from the standard notation for
the factors Q and R.
Theorem 7.3.5
Let A be a real m × n matrix with rank n. Then A can
be written as a product QR, where Q is a real m × n matrix
whose columns form an orthonormal set and R is a real n × n
upper triangular matrix with positive entries on its principal
diagonal.
Proof
Let V denote the column space of the matrix A. Then V is
a subspace of the Euclidean inner product space R^m. Since
A has rank n, the n columns X_1, ..., X_n of A are linearly
independent, and thus form a basis of V. Hence the Gram-Schmidt process can be applied to this basis to produce an
orthonormal basis of V, say Y_1, ..., Y_n.
Now we see from the way that the Y_i in the Gram-Schmidt
procedure are defined that these vectors have the form
Y_1 = b_{11}X_1
Y_2 = b_{12}X_1 + b_{22}X_2
...
Y_n = b_{1n}X_1 + b_{2n}X_2 + ··· + b_{nn}X_n
for certain real numbers b_{ij} with b_{ii} positive. Solving the
equations for X_1, ..., X_n by back-substitution, we get a linear
system of the same general form:
X_1 = r_{11}Y_1
X_2 = r_{12}Y_1 + r_{22}Y_2
...
X_n = r_{1n}Y_1 + r_{2n}Y_2 + ··· + r_{nn}Y_n
for certain real numbers r_{ij}, with r_{ii} positive again. These
equations can be written in matrix form
A = [X_1 X_2 ... X_n] = [Y_1 Y_2 ... Y_n] R,
where R is the n × n matrix whose (i, j) entry is r_{ij} for i ≤ j
and 0 otherwise.
The columns of the m × n matrix Q = [Y_1 Y_2 ... Y_n] form an
orthonormal set, since they constitute an orthonormal basis of
V, while the matrix R = [r_{ij}]_{n,n} is plainly upper triangular.
The most important case of this theorem is when A is a
non-singular square matrix. Then the matrix Q is n × n, and
its columns form an orthonormal set; equivalently it has the
property
Q^T Q = I_n,
which, by 3.3.4, is just to say that Q^{-1} = Q^T.
A square matrix A such that
A^T = A^{-1}
is called an orthogonal matrix. We shall see in Chapter 9 that
orthogonal matrices play an important role in the study of
canonical forms of matrices.
It is instructive to investigate the possible
forms of an orthogonal 2 × 2 matrix.
Example 7.3.7
Find all real orthogonal 2 x 2 matrices.
Suppose that the real matrix
A = the matrix with rows (a, b) and (c, d)
is orthogonal; thus A^T A = I_2. Equating the entries of the
matrix A^T A to those of I_2, we obtain the equations
a^2 + c^2 = 1 = b^2 + d^2 and ab + cd = 0.
Now the first equation asserts that the point (a, c) lies on the
circle x^2 + y^2 = 1. Hence there is an angle θ in the interval
[0, 2π] such that a = cos θ and c = sin θ. Similarly there is
an angle φ in this interval such that b = cos φ and d = sin φ.
Now we still have to satisfy the third equation ab + cd = 0,
which requires that
cos θ cos φ + sin θ sin φ = 0,
that is, cos(φ − θ) = 0. Hence φ − θ = ±π/2 or ±3π/2. We
need to solve for b and d in each case. If φ = θ + π/2 or
φ = θ − 3π/2, we find that b = −sin θ and d = cos θ. If, on
the other hand, φ = θ − π/2 or φ = θ + 3π/2, it follows that
b = sin θ and d = −cos θ.
We conclude that A has one of the forms
( cos θ  −sin θ ; sin θ  cos θ )  or  ( cos θ  sin θ ; sin θ  −cos θ ),
with θ in the interval [0, 2π]. Conversely, it is easy to verify
that such matrices are orthogonal. Thus the real orthogonal
2 × 2 matrices are exactly the matrices of the above types.
We remark that these matrices have already appeared in other contexts. The first matrix represents an anticlockwise rotation in R^2 through angle θ: see Example 6.2.6. The second matrix corresponds to a reflection in R^2 in the line through the origin making angle θ/2 with the positive x-direction; see Exercises 6.2.3 and 6.2.6. Thus a connection has been established between 2 × 2 real orthogonal matrices on the one hand, and rotations and reflections in 2-dimensional Euclidean space on the other.
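A quick numerical check of this classification (an illustrative Python sketch, not part of the text; NumPy assumed):

    import numpy as np

    theta = 0.7    # any angle in [0, 2*pi]
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])
    reflection = np.array([[np.cos(theta),  np.sin(theta)],
                           [np.sin(theta), -np.cos(theta)]])
    for M in (rotation, reflection):
        # M is orthogonal exactly when M^T M = I_2
        print(np.allclose(M.T @ M, np.eye(2)), round(np.linalg.det(M)))
    # the determinant is +1 for the rotation and -1 for the reflection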
It is worthwhile restating the QR-factorization principle
in the important case where the matrix A is invertible.
Theorem 7.3.6
Every invertible real matrix A can be written as a product QR
where Q is a real orthogonal matrix and R is a real upper tri-
angular matrix with positive entries on its principal diagonal.
Example 7.3.8
Write the following matrix in the QR-factorized form:
    A = [ 1  1  2 ]
        [ 1  2  3 ]
        [ 1  1  1 ]

The method is to apply the Gram-Schmidt process to the columns X_1, X_2, X_3 of A, which are linearly independent and so form a basis for the column space of A. This yields an orthonormal basis {Y_1, Y_2, Y_3} where

    Y_1 = (1/√3) X_1,    Y_2 = -(4/√6) X_1 + (3/√6) X_2

and

    Y_3 = -(3/√2) X_2 + √2 X_3.

Solving back, we obtain the equations

    X_1 = √3 Y_1
    X_2 = (4/√3) Y_1 + (√6/3) Y_2
    X_3 = 2√3 Y_1 + (√6/2) Y_2 + (√2/2) Y_3

Therefore A = QR where

    Q = [ 1/√3  -1/√6   1/√2 ]
        [ 1/√3   2/√6    0   ]
        [ 1/√3  -1/√6  -1/√2 ]

and

    R = [ √3   4/√3   2√3  ]
        [ 0    √6/3   √6/2 ]
        [ 0    0      √2/2 ]
Unitary matrices
We point out, without going through the details, that
there is a version of the Gram-Schmidt procedure applicable
to complex inner product spaces. In this the formulas of 7.3.4
are carried over with minor changes, to reflect the properties
of complex inner products.
There is also a QR-factorization theorem. In this an im-
portant change must be made; the matrix Q which is pro-
duced by the Gram-Schmidt process has the property that
its columns are orthogonal with respect to the complex inner
product on C^m. In the case where Q is square this is equivalent to the equation

    Q*Q = I_n,    or    Q^{-1} = Q*.

Recall here that Q* = (Q̄)^T. A complex matrix Q with the above property is said to be unitary. Thus unitary matrices are the complex analogs of real orthogonal matrices. For example, the matrix

    [ cos θ    i sin θ ]
    [ i sin θ   cos θ  ]

is unitary for all real values of θ; here of course i = √-1.
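A brief check that this matrix is unitary for a sample value of θ (an illustrative Python sketch, not from the text):

    import numpy as np

    theta = 1.2
    Q = np.array([[np.cos(theta), 1j * np.sin(theta)],
                  [1j * np.sin(theta), np.cos(theta)]])
    # Q is unitary exactly when Q*Q = I, that is, Q^{-1} = Q*
    print(np.allclose(Q.conj().T @ Q, np.eye(2)))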
Exercises 7.3
1. Show that the following vectors constitute an orthogonal basis of R^3:
'i)-G)-(4
2. Modify the basis in Exercise 1 to obtain an orthonormal
basis.
3. Find an orthonormal basis for the column space of the matrix

    [ 0   1  1 ]
    [ 1  -2  1 ]
    [ 1   2  0 ]

4. Find an orthonormal basis for the subspace of P_3(R) generated by the polynomials 1 - 6x and 1 - 6x^2, where <f, g> = ∫_0^1 f(x)g(x) dx.
5. Find the projection of the vector [3  4  -2]^T on the subspace of R^3 which has the orthonormal basis consisting of
6. Express the matrix of Exercise 3 in QR-factorized form.
7. Show that a non-singular complex matrix can be expressed
as the product of a unitary matrix and an upper triangular
matrix whose diagonal elements are real and positive.
8. Find a factorization of the type described in the previous exercise for the matrix

    [ -i    i ]
    [ 1+i   2 ]

where i = √-1.
9. If A and B are orthogonal matrices, show that A^{-1} and AB are also orthogonal. Deduce that the set of all real orthogonal n × n matrices is a group with respect to matrix multiplication in the sense of 1.3.
10. If A = QR = Q'R' are two QR-factorizations of the real non-singular square matrix A, what can you say about the relationship between Q and Q', and between R and R'?
11. Let L be a linear operator on the Euclidean inner product space R^n. Call L orthogonal if it preserves lengths, that is, if ||L(X)|| = ||X|| for all vectors X in R^n.
(a) Give some natural examples of orthogonal linear
operators.
(b) Show that L is orthogonal if and only if it preserves
inner products, that is, < L(X),L(Y) > = < X,Y >
for all X and Y.
12. Let L be a linear operator on the Euclidean space R^n. Prove that L is orthogonal if and only if L(X) = AX where A is an orthogonal matrix.
13. Deduce from Exercise 12 and Example 7.3.7 that a linear operator on R^2 is orthogonal if and only if it is either a rotation or a reflection.
7.4 The Method of Least Squares
A well known application of linear algebra is a method
of fitting a function to experimental data called the Method
of Least Squares. In order to illustrate the practical problem
involved, let us consider an experiment involving two measur-
able variables x and y where it is suspected that y is, approx-
imately at least, a linear function of x.
Assume that we have some supporting data in the form of observed values of the variables x and y, which can be thought of as a set of points in the xy-plane

    (a_1, b_1), ..., (a_m, b_m).

This means that when x = a_i, it was observed that y = b_i. Now if there really were a linear relation, and if the data were free from errors, all of these points would lie on a straight line, whose equation could then be determined, and the linear relation would be known. But in practice it is highly unlikely that this will be the case. What is needed is a way of finding the straight line which "best fits" the given data. The equation of this best-fitting line will furnish a linear relation which is an approximation to y.
It remains to explain what is meant by the best-fitting
straight line. It is here that the "least squares" arise.
Consider the linear relation y = cx + d; this is the equation of a straight line in the xy-plane. The conditions for the line to pass through the m data points are

    c a_1 + d = b_1
    c a_2 + d = b_2
    .................
    c a_m + d = b_m

Now in all probability these equations will be inconsistent. However, we can ask for real numbers c and d which come as close to satisfying the equations of the linear system as possible, in the sense that they minimize the "total error". A good measure of this total error is the expression

    (c a_1 + d - b_1)^2 + ... + (c a_m + d - b_m)^2.

This is the sum of the squares of the vertical deviations of the line from the data points. Here the squares are inserted to take care of any negative signs that might appear.
It should be clear that the line-fitting problem is just a particular instance of a general problem about inconsistent linear systems. Suppose that we have a linear system of m equations in n unknowns x_1, ..., x_n, say

    AX = B.

Since the system may be inconsistent, the problem of interest is to find a vector X which minimizes the length of the vector AX - B, or, what is equivalent and also a good deal more convenient, its square

    E = ||AX - B||^2.

In our original example, where a straight line was to be fitted to the data, the matrix A has the two columns [a_1 a_2 ... a_m]^T and [1 1 ... 1]^T, while X = [c  d]^T and B is the column [b_1 b_2 ... b_m]^T. Then E is the sum of the squares of the quantities c a_i + d - b_i.
A vector X which minimizes E is called a least squares solution of the linear system AX = B. A least squares solution will be an actual solution of the system if and only if the system is consistent.
The normal system
Once again consider a linear system AX = B and write E = ||AX - B||^2. We will show how to minimize E. Put A = [a_ij]_{m,n} and let the entries of X and B be x_1, ..., x_n and b_1, ..., b_m respectively. The ith entry of AX - B is clearly (Σ_{j=1}^n a_ij x_j) - b_i. Hence

    E = ||AX - B||^2 = Σ_{i=1}^m ((Σ_{j=1}^n a_ij x_j) - b_i)^2,

which is a quadratic function of x_1, ..., x_n.
At this juncture it is necessary to recall from calculus the procedure for finding the absolute minima of a function of several variables. First one finds the critical points of the function E, by forming its partial derivatives and setting them equal to zero:

    ∂E/∂x_k = 2 Σ_{i=1}^m a_ik ((Σ_{j=1}^n a_ij x_j) - b_i) = 0.

Hence

    Σ_{i=1}^m Σ_{j=1}^n a_ik a_ij x_j = Σ_{i=1}^m a_ik b_i

for k = 1, 2, ..., n. This is a new linear system of equations in x_1, ..., x_n whose matrix form is

    (A^T A)X = A^T B.

It is called the normal system of the linear system AX = B. The solutions of the normal system are the critical points of E.
Now E surely has an absolute minimum - after all, it is a continuous function with non-negative values. Since the function E is unbounded when x_j is large, its absolute minima must occur at critical points. Therefore we can state:
Theorem 7.4.1
Every least squares solution of the linear system AX = B is a solution of the normal system (A^T A)X = A^T B.
At this point potential difficulties appear: what if the
normal system is inconsistent? If this were to happen, we
would have made no progress whatsoever. And even if the
normal system is consistent, need all its solutions be least
squares solutions?
To help answer these questions, we establish a simple
result about matrices.
Lemma 7.4.2
Let A be a real m × n matrix. Then A^T A is a symmetric n × n matrix whose null space equals the null space of A and whose column space equals the column space of A^T.
Proof
In the first place (A^T A)^T = A^T (A^T)^T = A^T A, so A^T A is certainly symmetric. Let S be the column space of A. Then by 7.2.6 the null space of A^T equals S^⊥.
Let X be any n-column vector. Then X belongs to the null space of A^T A if and only if A^T(AX) = 0; this amounts to saying that AX belongs to the null space of A^T or, what is the same thing, to S^⊥. But AX also belongs to S; for it is a linear combination of the columns of A. Now S ∩ S^⊥ is the zero space by 7.2.3. Hence AX = 0 and X belongs to the null space of A. On the other hand, it is obvious that if X belongs to the null space of A, then it must belong to the null space of A^T A. Hence the null space of A^T A equals the null space of A.
Finally, by 7.2.6 and the last paragraph we can assert that the column space of A^T A equals

    (null space of A^T A)^⊥ = (null space of A)^⊥.

This equals the column space of A^T, as claimed.
We come now to the fundamental theorem on the Method
of Least Squares.
Theorem 7.4.3
Let AX = B be a linear system of m equations in n unknowns.
(a) The normal system (A^T A)X = A^T B is always consistent and its solutions are exactly the least squares solutions of the linear system AX = B;
(b) if A has rank n, then A^T A is invertible and there is a unique least squares solution of the normal system, namely

    X = (A^T A)^{-1} A^T B.
Proof
By 7.4.2 the column space of A^T A equals the column space of A^T. Therefore the column space of the matrix

    [A^T A | A^T B]

equals the column space of A^T A; for the extra column A^T B is a linear combination of the columns of A^T and thus belongs to the column space of A^T A. It follows that the coefficient matrix and the augmented matrix of the normal system have the same rank. By 5.2.5 this is just the condition for the normal system to be consistent.
The next point to establish is that every solution of the normal system is a least squares solution of AX = B. Suppose that X_1 and X_2 are two solutions of the normal system. Then A^T A(X_1 - X_2) = A^T B - A^T B = 0, so that Y = X_1 - X_2 belongs to the null space of A^T A. By 7.4.2 the latter equals the null space of A. Thus AY = 0. Since X_1 = Y + X_2, we have

    AX_1 - B = A(Y + X_2) - B = AX_2 - B.

This means that E = ||AX - B||^2 has the same value for X = X_1 and X = X_2. Thus all solutions of the normal system give the same value of E. Since by 7.4.1 every least squares solution is a solution of the normal system, it follows that the solutions of the normal system constitute the set of all least squares solutions, as claimed.
Finally, suppose that A has rank n. Then the matrix A^T A also has rank n since by 7.4.2 the column space of A^T A equals the column space of A^T, which has dimension n. Since A^T A is n × n, it is invertible by 5.2.4 and 2.3.5. Hence the equation A^T AX = A^T B leads to the unique solution

    X = (A^T A)^{-1} A^T B,

which completes the proof. On the other hand, if the rank of A is less than n, there will be infinitely many least squares solutions. We shall see later how to select one that is in some sense optimal.
Example 7.4.1
Find the least squares solution of the following linear system:

    x_1 + x_2 + x_3 = 4
    -x_1 + x_2 + x_3 = 0
         - x_2 + x_3 = 1
    x_1        + x_3 = 2

Here

    A = [  1   1  1 ]
        [ -1   1  1 ]
        [  0  -1  1 ]
        [  1   0  1 ]

and B = [4  0  1  2]^T, so A has rank 3. Since the augmented matrix has rank 4, the linear system is inconsistent. We know from 7.4.3 that there is a unique least squares solution in this case. To find it, first compute

    A^T A = [ 3  0  1 ]
            [ 0  3  1 ]
            [ 1  1  4 ]

and

    (A^T A)^{-1} = (1/30) [ 11   1  -3 ]
                          [  1  11  -3 ]
                          [ -3  -3   9 ]

Hence the least squares solution is

    X = (A^T A)^{-1} A^T B = (1/5) [8  3  6]^T,

that is, x_1 = 8/5, x_2 = 3/5, x_3 = 6/5.
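The computation in this example is easy to reproduce numerically. The sketch below (illustrative Python, not part of the original text) solves the normal system directly and also calls numpy.linalg.lstsq, which returns a least squares solution.

    import numpy as np

    A = np.array([[1.0, 1, 1], [-1, 1, 1], [0, -1, 1], [1, 0, 1]])
    B = np.array([4.0, 0, 1, 2])

    X_normal = np.linalg.solve(A.T @ A, A.T @ B)      # solve (A^T A) X = A^T B
    X_lstsq, *_ = np.linalg.lstsq(A, B, rcond=None)   # library least squares routine
    print(X_normal, X_lstsq)                          # both give [1.6, 0.6, 1.2] = [8/5, 3/5, 6/5]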
Example 7.4.2
A certain experiment yields the following data:

    x : -1  0  1  2
    y :  0  1  3  9

It is suspected that y is a quadratic function of x. Use the Method of Least Squares to find the quadratic function that best fits the data.
Suppose that the function is y = a + bx + cx^2. We need to find a least squares solution of the linear system

    a -  b +  c = 0
    a           = 1
    a +  b +  c = 3
    a + 2b + 4c = 9

Again the linear system is inconsistent. Here

    A = [ 1  -1  1 ]
        [ 1   0  0 ]
        [ 1   1  1 ]
        [ 1   2  4 ]

and B = [0  1  3  9]^T, and A has rank 3. We find that

    A^T A = [ 4  2   6 ]
            [ 2  6   8 ]
            [ 6  8  18 ]

and

    (A^T A)^{-1} = (1/80) [ 44  12  -20 ]
                          [ 12  36  -20 ]
                          [-20 -20   20 ]

The unique least squares solution is therefore

    X = (A^T A)^{-1} A^T B = (1/20) [11  33  25]^T,

that is, a = 11/20, b = 33/20, c = 5/4. Hence the quadratic function that best fits the data is

    y = 11/20 + (33/20) x + (5/4) x^2.
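The same fit can be obtained with a short script in which the design matrix has columns 1, x, x^2 (an illustrative Python sketch, not from the text; numpy.polyfit would give the same coefficients).

    import numpy as np

    x = np.array([-1.0, 0, 1, 2])
    y = np.array([0.0, 1, 3, 9])
    A = np.vander(x, 3, increasing=True)          # columns: 1, x, x^2
    coeffs = np.linalg.solve(A.T @ A, A.T @ y)    # solve the normal system
    print(coeffs)                                 # [0.55, 1.65, 1.25] = [11/20, 33/20, 5/4]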
Least squares and QR-factorization
Consider once again the least squares problem for the linear system AX = B where A is m × n with rank n; we have seen that in this case there is a unique least squares solution X = (A^T A)^{-1} A^T B. This expression assumes a simpler form when A is replaced by its QR-factorization. Let this be A = QR, as in 7.3.5. Thus Q is an m × n matrix with orthonormal columns and R is an n × n upper triangular matrix with positive diagonal elements. Since the columns of Q form an orthonormal set, Q^T Q = I_n. Hence

    A^T A = R^T Q^T Q R = R^T R.

Thus X = (R^T R)^{-1} R^T Q^T B, which reduces to

    X = R^{-1} Q^T B,

a considerable simplification of the original formula. However Q and R must already be known before this formula can be used.
Example 7.4.3
Consider the least squares problem

    x_1 +  x_2 + 2x_3 = 1
    x_1 + 2x_2 + 3x_3 = 1
    x_1 +  x_2 +  x_3 = 1

Here

    A = [ 1  1  2 ]
        [ 1  2  3 ]
        [ 1  1  1 ]

and B = [1  1  1]^T. It was shown in Example 7.3.8 that A = QR where

    Q = [ 1/√3  -1/√6   1/√2 ]
        [ 1/√3   2/√6    0   ]
        [ 1/√3  -1/√6  -1/√2 ]

and

    R = [ √3   4/√3   2√3  ]
        [ 0    √6/3   √6/2 ]
        [ 0    0      √2/2 ]

Hence the least squares solution is

    X = R^{-1} Q^T B = [1  0  0]^T,

that is, x_1 = 1, x_2 = 0, x_3 = 0.
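A short numerical check of the formula X = R^{-1} Q^T B for this example (illustrative Python, not part of the text; the answer is unchanged if the library chooses different signs for the columns of Q):

    import numpy as np

    A = np.array([[1.0, 1, 2], [1, 2, 3], [1, 1, 1]])
    B = np.array([1.0, 1, 1])
    Q, R = np.linalg.qr(A)
    X = np.linalg.solve(R, Q.T @ B)   # back-substitution instead of forming R^{-1}
    print(np.round(X, 10))            # [1, 0, 0]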
Geometry of the least squares process
There is a suggestive geometric interpretation of the least
squares process in terms of projections. Consider the least
squares problem for the linear system AX = B where A has
m rows. Let S denote the column space of the coefficient
matrix A. The least squares solutions are the solutions of the normal system A^T AX = A^T B, or equivalently

    A^T(B - AX) = 0.

The last equation asserts that B - AX belongs to the null space of A^T, which by 7.2.6 is equal to S^⊥. Our condition can therefore be reformulated as follows: X is a least squares solution of AX = B if and only if B - AX belongs to S^⊥.
Now B = AX + (B - AX) and AX belongs to S. Recall from 7.2.4 that B is uniquely expressible as the sum of its projections on the subspaces S and S^⊥; we conclude that B - AX belongs to S^⊥ precisely when AX is the projection of B on S. In short, we have discovered a geometric description of the least squares solutions.
Theorem 7.4.4
Let AX = B be an arbitrary linear system and let S denote
the column space of A. Then a column vector X is a least
squares solution of the linear system if and only if AX is the
projection of B on S.
Notice that the projection AX is uniquely determined by the linear system AX = B. However there is a unique least squares solution X if and only if X is uniquely determined by AX, that is, if AX = AX' implies that X = X'. Hence X is unique if and only if the null space of A is zero, that is, if the rank of A is n. Therefore we can state:
Corollary 7.4.5
There is a unique least squares solution of the linear system
AX = B if and only if the rank of A equals the number of
columns of A.
Optimal least squares solutions
Returning to the general least squares problem for the
linear system AX = B with n unknowns, we would like to be
able to say something about the least squares solutions in the
case where the rank of A is less than n. In this case there will be many least squares solutions; what we have in mind is to find a sensible way of picking one of them. Now a natural way to do this would be to select a least squares solution with minimal length. Accordingly we define an optimal least squares solution of AX = B to be a least squares solution X whose length ||X|| is as small as possible.
There is a simple method of finding an optimal least squares solution. Let U denote the null space of A; then U equals (column space of A^T)^⊥, by 7.2.6. Suppose X is a least squares solution of the system AX = B. Now there is a unique expression X = X_0 + X_1 where X_0 belongs to U and X_1 belongs to U^⊥; this is by 7.2.4. Then AX = AX_0 + AX_1 = AX_1; for AX_0 = 0 since X_0 belongs to the null space of A. Thus AX - B = AX_1 - B, so that X_1 is also a least squares solution of AX = B. Now we compute

    ||X||^2 = ||X_0 + X_1||^2 = (X_0 + X_1)^T (X_0 + X_1) = X_0^T X_0 + X_1^T X_1.

For X_0^T X_1 = 0 = X_1^T X_0 since X_0 and X_1 belong to U and U^⊥ respectively. Therefore

    ||X||^2 = ||X_0||^2 + ||X_1||^2 ≥ ||X_1||^2.

Now, if X is an optimal solution, then ||X|| = ||X_1||, so that ||X_0|| = 0 and hence X_0 = 0. Thus X = X_1 belongs to U^⊥. It follows that each optimal least squares solution must belong to U^⊥, the column space of A^T.
Finally, we show that there is a unique least squares solution in U^⊥. Suppose that X and X' are two least squares solutions in U^⊥. Then from 7.4.4 we see that AX and AX' are both equal to the projection of B on the column space of A. Thus A(X - X') = 0 and X - X' belongs to U, the null space of A. But X and X' also belong to U^⊥, whence so does X - X'. Since U ∩ U^⊥ = 0, it follows that X - X' = 0 and X = X'. Hence X is the unique optimal least squares solution and it belongs to U^⊥. Combining these conclusions with 7.4.4, we obtain:
Theorem 7.4.6
A linear system AX = B has a unique optimal least squares solution, namely the unique vector X in the column space of A^T such that AX is the projection of B on the column space of A.
The proof of 7.4.6 has the useful feature that it tells us how to find the optimal least squares solution of a linear system AX = B. First find any least squares solution, and then compute its projection on the column space of A^T.
Example 7.4.4
Find the optimal least squares solution of the linear system

    x_1 -  x_2 +  x_3 = 1
    x_1 +  x_2 - 2x_3 = 2
    2x_1       -  x_3 = 4

The first step is to identify the normal system (A^T A)X = A^T B; it is

     6x_1        - 3x_3 = 11
            2x_2 - 3x_3 = 1
    -3x_1 - 3x_2 + 6x_3 = -7

Any solution of this will do; for example, setting x_3 = 1 we can take the solution vector X = [7/3  2  1]^T. To obtain an optimal least squares solution, find the projection of X on the column space of A^T; the first two columns of A^T form a basis of this space. Proceeding as in Example 7.2.12, we find the optimal solution to be X = [67/42  -3/14  -10/21]^T, so that x_1 = 67/42, x_2 = -3/14, x_3 = -10/21 is the optimal least squares solution of the linear system.
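The optimal (minimum length) least squares solution can also be computed with the pseudoinverse; numpy.linalg.pinv and numpy.linalg.lstsq both return the minimum-norm least squares solution. (Illustrative sketch, not from the text.)

    import numpy as np

    A = np.array([[1.0, -1, 1], [1, 1, -2], [2, 0, -1]])
    B = np.array([1.0, 2, 4])
    X_opt = np.linalg.pinv(A) @ B
    print(X_opt)                     # approximately [1.5952, -0.2143, -0.4762]
    print(67/42, -3/14, -10/21)      # the fractions found above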
Least squares in inner product spaces
In 7.4.4 we obtained a geometrical interpretation of the least squares process in R^n in terms of projections on subspaces. This raises the question of least squares processes in an arbitrary finite-dimensional real inner product space V.
First we must formulate the least squares problem in V. This consists in approximating a vector v in V by a vector in a subspace S of V. A natural way to do this is to choose x in S so that ||x - v||^2 is as small as possible. This is a direct generalization of the least squares problem in R^n. For, if we are given the linear system AX = B and we take S to be the column space of A, v to be B and x to be the vector AX of S, then the least squares problem is to minimize ||AX - B||^2.
It turns out that the solution of this general least squares problem is the projection of v on S, just as in the special case of R^n.
Theorem 7.4.7
Let V be a finite-dimensional, real inner product space, and let
v be an element and S a subspace of V. Denote the projection
of v on S by p. Then, if x is any vector in S other than p,
the inequality ||x — v|| > ||p — v|| holds.
Thus p is the vector in S which most closely approximates
v in the sense that it makes ||p — v|| as small as possible.
Proof
Since x and p both belong to S, so does x - p. Also p - v belongs to S^⊥ since p is the projection of v on S. Hence <p - v, x - p> = 0. It follows that

    ||x - v||^2 = <(x - p) + (p - v), (x - p) + (p - v)>
                = <x - p, x - p> + <p - v, p - v>
                = ||x - p||^2 + ||p - v||^2 > ||p - v||^2

since x - p ≠ 0. Hence ||x - v|| > ||p - v||.
In applying 7.4.7 it is advantageous to have at hand an orthonormal basis {s_1, ..., s_m} of S. For the task of computing p, the projection of v on S, is then much easier since the formula of 7.3.3 is available:

    p = Σ_{i=1}^m <v, s_i> s_i.
Example 7.4.5
Use least squares to find a quadratic polynomial that approximates the function e^x in the interval [-1, 1].
Here it is assumed that we are working in the inner product space C[-1, 1] where <f, g> = ∫_{-1}^{1} f(x)g(x) dx. Let S denote the subspace consisting of all quadratic polynomials in x. An orthonormal basis for S was found in Example 7.3.6:

    f_1 = 1/√2,    f_2 = √(3/2) x,    f_3 = (3√5)/(2√2) (x^2 - 1/3).

By 7.4.7 the least squares approximation to e^x in S is simply the projection of e^x on S; this is given by the formula

    p = <e^x, f_1> f_1 + <e^x, f_2> f_2 + <e^x, f_3> f_3.

Evaluating the integrals by integration by parts, we obtain

    <e^x, f_1> = (1/√2)(e - e^{-1}),    <e^x, f_2> = √6 e^{-1}    and    <e^x, f_3> = √(5/2)(e - 7e^{-1}).

The desired approximation to e^x is therefore

    p = (1/2)(e - e^{-1}) + 3e^{-1} x + (15/4)(e - 7e^{-1})(x^2 - 1/3).

Alternatively one can calculate the projection by using the standard basis 1, x, x^2.
Exercises 7.4
1. Find least squares solutions of the following linear systems:

    (a)  x_1 + x_2       = 0         (b)  x_1 +  x_2 - 2x_3 = 3
               x_2 + x_3 = 0              2x_1 -  x_2 + 3x_3 = 4
         x_1 - x_2 - x_3 = 3              x_1         +  x_3 = 1
         x_1       + x_3 = 0              x_1 +  x_2 +  x_3 = 1

2. The following data were collected for the mean annual temperature t and rainfall r in a certain region; use the Method of Least Squares to find a linear approximation for r in terms of t (a calculator is necessary):

    t : 24  27  22  24
    r : 47  30  35  38

3. In a tropical rain forest the following data was collected for the numbers x and y (per square kilometer) of a prey species and a predator species over a number of years. Use least squares to find a quadratic function of x that approximates y (a calculator is necessary):

    x : 2  3  4  5
    y : 1  2  2  1
4. Find the optimal least squares solution of the linear system

    x_1        + 2x_3 = 1
           x_2 + 3x_3 = 0
    -x_1 + x_2 +  x_3 = 0
         - x_2 - 3x_3 = 1

5. Find a least squares approximation to the function e^{-x} by a linear function in the interval [1, 2]. [Use the inner product <f, g> = ∫_1^2 f(x)g(x) dx.]
6. Find a least squares approximation for the function sin x as a quadratic function of x in the interval [0, π]. [Here the inner product <f, g> = ∫_0^π f(x)g(x) dx is to be used.]
Chapter Eight
EIGENVECTORS AND EIGENVALUES
An eigenvector of an n × n matrix A is a non-zero n-column vector X such that AX = cX for some scalar c, which is called an eigenvalue of A. Thus the effect of left multiplication of an eigenvector by A is merely to multiply it by a scalar, and when n ≤ 3, a parallel vector is obtained. Similarly, if T is a linear operator on a vector space V, an eigenvector of T is a non-zero vector v of V such that T(v) = cv for some scalar c called an eigenvalue. For example, if T is a rotation in R^3, the eigenvectors of T are the non-zero vectors parallel to the axis of rotation and the eigenvalues are all equal to 1.
A large amount of information about a matrix or linear
operator is carried by its eigenvectors and eigenvalues. In
addition, the theory of eigenvectors and eigenvalues has im-
portant applications to systems of linear recurrence relations,
Markov processes and systems of linear differential equations.
We shall describe the basic theory in the first section and
then we give applications in the following two sections of the
chapter.
8.1 Basic Theory of Eigenvectors and Eigenvalues
We begin with the fundamental definition. Let A be an
n x n matrix over a field of scalars F. An eigenvector of A is
a non-zero n-column vector X over F such that
AX = cX
for some scalar c in F; the scalar c is then referred to as the
eigenvalue of A associated with the eigenvector X.
In order to clarify the definition and illustrate the tech-
nique for finding eigenvectors and eigenvalues, an example will
be worked out in detail.
Example 8.1.1
Consider the real 2 × 2 matrix

    A = [ 2  -1 ]
        [ 2   4 ]

The condition for the vector

    X = [ x_1 ]
        [ x_2 ]

to be an eigenvector of A is that AX = cX for some scalar c. This is equivalent to (A - cI_2)X = 0, which simply asserts that X is a solution of the linear system

    (2 - c) x_1 -         x_2 = 0
        2 x_1   + (4 - c) x_2 = 0

Now by 3.3.2 this linear system will have a non-trivial solution x_1, x_2 if and only if the determinant of the coefficient matrix vanishes,

    | 2 - c    -1    |
    |   2     4 - c  | = 0,

that is, c^2 - 6c + 10 = 0. The roots of this quadratic equation are c_1 = 3 + √-1 and c_2 = 3 - √-1, so these are the eigenvalues of A.
The eigenvectors for each eigenvalue are found by solving the linear systems (A - c_1 I_2)X = 0 and (A - c_2 I_2)X = 0. For example, in the case of c_1 we have to solve

    (-1 - √-1) x_1 -            x_2 = 0
         2 x_1     + (1 - √-1) x_2 = 0

The general solution of this system is x_1 = (d/2)(-1 + √-1) and x_2 = d, where d is an arbitrary scalar. Thus the eigenvectors of A associated with the eigenvalue c_1 are the non-zero vectors of the form

    d [ (-1 + √-1)/2 ]
      [       1      ]

Notice that these, together with the zero vector, form a 1-dimensional subspace of C^2. In a similar manner the eigenvectors for the eigenvalue 3 - √-1 are found to be the vectors of the form

    d [ -(1 + √-1)/2 ]
      [       1      ]

where d ≠ 0. Again these form with the zero vector a subspace of C^2.
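The eigenvalues and eigenvectors found here are easy to confirm numerically (an illustrative Python sketch, not part of the text):

    import numpy as np

    A = np.array([[2.0, -1], [2, 4]])
    values, vectors = np.linalg.eig(A)
    print(values)                           # approximately 3+1j and 3-1j
    # rescale each eigenvector so that its last entry is 1, for comparison with the text
    print(vectors[:, 0] / vectors[1, 0])
    print(vectors[:, 1] / vectors[1, 1])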
It should be clear to the reader that the method used in
this example is in fact a general procedure for finding eigen-
vectors and eigenvalues. This will now be described in detail.
The characteristic equation of a matrix
Let A be an n x n matrix over a field of scalars F, and let
X be a non-zero n-column vector over F. The condition for X
to be an eigenvector of A is AX = cX, or
    (A - cI_n)X = 0,

where c is the corresponding eigenvalue. Hence the eigenvectors associated with c, together with the zero vector, form the null space of the matrix A - cI_n. This subspace is often referred to as the eigenspace of the eigenvalue c.
Now (A - cI_n)X = 0 is a linear system of n equations in n unknowns. By 3.3.2 the condition for there to be a non-trivial solution of the system is that the coefficient matrix have zero determinant,

    det(A - cI_n) = 0.

Conversely, if the scalar c satisfies this equation, there will be a non-zero solution of the system and c will be an eigenvalue. These considerations already make it clear that the determinant

    det(A - xI_n) = | a_11 - x   a_12      ...   a_1n     |
                    | a_21       a_22 - x  ...   a_2n     |
                    | ...                                 |
                    | a_n1       a_n2      ...   a_nn - x |

must play an important role. This is a polynomial of degree n in x which is called the characteristic polynomial of A. The equation obtained by setting the characteristic polynomial equal to zero is the characteristic equation. Thus the eigenvalues of A are the roots of the characteristic equation (or characteristic polynomial) which lie in the field F.
At this point it is necessary to point out that A may well have no eigenvalues in F. For example, the characteristic polynomial of the real matrix

    [ 0  -1 ]
    [ 1   0 ]

is x^2 + 1, which has no real roots, so the matrix has no eigenvalues in R.
However, if A is a complex n × n matrix, its characteristic equation will have n complex roots, some of which may be equal. The reason for this is a well-known result known as The Fundamental Theorem of Algebra; it asserts that every polynomial f of positive degree n with complex coefficients can be expressed as a product of n linear factors; thus the equation f(x) = 0 has exactly n roots in C. Because of this we can be sure that complex matrices always have all their eigenvalues and eigenvectors in C. It is this case that principally concerns us here.
Let us sum up our conclusions about the eigenvalues of complex matrices so far.
Theorem 8.1.1
Let A be an n × n complex matrix.
(i) The eigenvalues of A are precisely the n roots of the characteristic polynomial det(A - xI_n);
(ii) the eigenvectors of A associated with an eigenvalue c are the non-zero vectors in the null space of the matrix A - cI_n.
Thus in Example 8.1.1 the characteristic polynomial of the matrix is

    | 2 - x    -1    |
    |   2     4 - x  |  = x^2 - 6x + 10.

The eigenvalues are the roots of the characteristic equation x^2 - 6x + 10 = 0, that is, c_1 = 3 + √-1 and c_2 = 3 - √-1; the eigenspaces of c_1 and c_2 are generated by the vectors

    [ (-1 + √-1)/2 ]         [ -(1 + √-1)/2 ]
    [       1      ]   and   [       1      ]

respectively.
Example 8.1.2
Find the eigenvalues of the upper triangular matrix

    A = [ a_11  a_12  ...  a_1n ]
        [  0    a_22  ...  a_2n ]
        [  .     .          .   ]
        [  0     0    ...  a_nn ]

The characteristic polynomial of this matrix is

    | a_11 - x   a_12      ...   a_1n     |
    |   0        a_22 - x  ...   a_2n     |
    |   .                          .      |
    |   0          0       ...   a_nn - x |

which, by 3.1.5, equals (a_11 - x)(a_22 - x) ... (a_nn - x). The eigenvalues of the matrix are therefore just the diagonal entries a_11, a_22, ..., a_nn.
Example 8.1.3
Consider the 3 × 3 matrix

    A = [  2  -1  -1 ]
        [ -1   2  -1 ]
        [ -1  -1   0 ]

The characteristic polynomial of this matrix is

    | 2 - x   -1     -1  |
    |  -1    2 - x   -1  |  = -x^3 + 4x^2 - x - 6.
    |  -1     -1     -x  |

Fortunately one can guess a root of this cubic polynomial, namely x = -1. Dividing the polynomial by x + 1 using long division, we obtain the quotient -x^2 + 5x - 6 = -(x - 2)(x - 3). Hence the characteristic polynomial can be factorized completely as -(x + 1)(x - 2)(x - 3), and the eigenvalues of A are -1, 2 and 3.
To find the corresponding eigenvectors, we have to solve the three linear systems (A + I_3)X = 0, (A - 2I_3)X = 0 and (A - 3I_3)X = 0. On solving these, we find that the respective eigenvectors are the non-zero scalar multiples of the vectors

    [ 1 ]      [  1 ]      [  1 ]
    [ 1 ],     [  1 ],     [ -1 ]
    [ 2 ]      [ -1 ]      [  0 ]

The eigenspaces are generated by these three vectors and so each has dimension 1.
Properties of the characteristic polynomial
Now let us see what can be said in general about the characteristic polynomial of an n × n matrix A. Let p(x) denote this polynomial; thus

    p(x) = | a_11 - x   a_12      ...   a_1n     |
           | a_21       a_22 - x  ...   a_2n     |
           | ...                                 |
           | a_n1       a_n2      ...   a_nn - x |

At this point we need to recall the definition of a determinant as an alternating sum of terms, each term being a product of entries, one from each row and column. The term of p(x) with highest degree in x arises from the product

    (a_11 - x)(a_22 - x) ... (a_nn - x)

and is clearly (-x)^n. The terms of degree n - 1 are also easy to locate since they arise from the same product. Thus the coefficient of x^{n-1} is

    (-1)^{n-1} (a_11 + ... + a_nn)

and the sum of the diagonal entries of A is seen to have significance; it is given a special name, the trace of A,

    tr(A) = a_11 + a_22 + ... + a_nn.

The term in p(x) of degree n - 1 is therefore tr(A)(-x)^{n-1}.
The constant term in p(x) may be found by simply putting x = 0 in p(x) = det(A - xI_n), thereby leaving det(A). Our knowledge of p(x) so far is summarized in the formula

    p(x) = (-x)^n + tr(A)(-x)^{n-1} + ... + det(A).

The other coefficients in the characteristic polynomial are not so easy to describe, but they are in fact expressible as subdeterminants of det(A). For example, take the case of x^{n-2}. Now terms in x^{n-2} arise in two ways: from the product (a_11 - x) ... (a_nn - x) or from products like

    -a_12 a_21 (a_33 - x) ... (a_nn - x).

So a typical contribution to the coefficient of x^{n-2} is

    (-1)^{n-2} (a_11 a_22 - a_12 a_21) = (-1)^{n-2} | a_11  a_12 |
                                                    | a_21  a_22 |

From this it is clear that the term of degree n - 2 in p(x) is just (-x)^{n-2} times the sum of all the 2 × 2 determinants of the form

    | a_ii  a_ij |
    | a_ji  a_jj |

where i < j.
In general one can prove by similar considerations that
the following is true.
Theorem 8.1.2
The characteristic polynomial of the n × n matrix A equals

    Σ_{i=0}^n d_i (-x)^{n-i}

where d_i is the sum of all the i × i subdeterminants of det(A) whose principal diagonals are part of the principal diagonal of A.
Now assume that the matrix A has complex entries. Let c_1, c_2, ..., c_n be the eigenvalues of A. These are the n roots of the characteristic polynomial p(x). Therefore, allowing for the fact that the term of p(x) with highest degree has coefficient (-1)^n, one has

    p(x) = (c_1 - x)(c_2 - x) ... (c_n - x).

The constant term in this product is evidently just c_1 c_2 ... c_n, while the term in x^{n-1} has coefficient (-1)^{n-1}(c_1 + ... + c_n). On the other hand, we previously found these to be det(A) and (-1)^{n-1} tr(A) respectively. Thus we arrive at two important relations between the eigenvalues and the entries of A.
Corollary 8.1.3
If A is any complex square matrix, the product of the eigenvalues equals the determinant of A and the sum of the eigenvalues equals the trace of A.
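These two relations are easy to check numerically, for instance for the matrix of Example 8.1.3 (illustrative Python, not from the text):

    import numpy as np

    A = np.array([[2.0, -1, -1], [-1, 2, -1], [-1, -1, 0]])
    eigenvalues = np.linalg.eigvals(A)
    print(np.prod(eigenvalues), np.linalg.det(A))   # both are -6, up to rounding
    print(np.sum(eigenvalues), np.trace(A))         # both are 4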
Recall from Chapter Six that matrices A and B are said
to be similar if there is an invertible matrix S such that B = SAS^{-1}. The next result indicates that similar matrices have much in common, and really deserve their name.
Theorem 8.1.4
Similar matrices have the same characteristic polynomial and
hence they have the same eigenvalues, trace and determinant.
Proof
The characteristic polynomial of B = SAS^{-1} is

    det(SAS^{-1} - xI) = det(S(A - xI)S^{-1})
                       = det(S) det(A - xI) det(S)^{-1}
                       = det(A - xI).

Here we have used two fundamental properties of determinants established in Chapter Three, namely 3.3.3 and 3.3.5. The statements about trace and determinant now follow from 8.1.3.
On the other hand, one cannot expect similar matrices to have the same eigenvectors. Indeed the condition for X to be an eigenvector of SAS^{-1} with eigenvalue c is (SAS^{-1})X = cX, which is equivalent to A(S^{-1}X) = c(S^{-1}X). Thus X is an eigenvector of SAS^{-1} if and only if S^{-1}X is an eigenvector of A.
Eigenvectors and eigenvalues of linear transformations
Because of the close relationship between square matri-
ces and linear operators on finite-dimensional vector spaces
observed in Chapter Six, it is not surprising that one can also
define eigenvectors and eigenvalues for a linear operator.
Let T : V → V be a linear operator on a vector space V over a field of scalars F. An eigenvector of T is a non-zero vector v of V such that T(v) = cv for some scalar c in F: here c is the eigenvalue of T associated with the eigenvector v.
Suppose now that V is a finite-dimensional vector space over F with dimension n. Choose an ordered basis for V, say B. Then with respect to this ordered basis T is represented by an n × n matrix over F, say A; this means that

    [T(v)]_B = A[v]_B.

Here [u]_B is the coordinate column vector of a vector u in V with respect to the basis B. The condition T(v) = cv for v to be an eigenvector of T with associated eigenvalue c becomes A[v]_B = c[v]_B, which is just the condition for [v]_B to be an eigenvector of the representing matrix A; also the eigenvalues of T and A are the same.
If the ordered basis of V is changed, the effect is to replace A by a similar matrix. Of course any such matrix will have the same eigenvalues as T; thus we have another proof of the fact that similar matrices have the same eigenvalues.
These observations permit us to carry over to linear op-
erators concepts such as characteristic polynomial and trace,
which were introduced for matrices.
Example 8.1.4
Consider the linear transformation T : D_∞[a, b] → D_∞[a, b] where T(f) = f', the derivative of the function f. The condition for f to be an eigenvector of T is f' = cf for some constant c. The general solution of this simple differential equation is f = de^{cx} where d is a constant. Thus the eigenvalues of T are all real numbers c, while the eigenvectors are the exponential functions de^{cx} with d ≠ 0.
Diagonalizable matrices
We wish now to consider the question: when is a square
matrix similar to a diagonal matrix? In the first place, why
is this an interesting question? The essential reason is that
diagonal matrices behave so much more simply than arbitrary
matrices. For example, when a diagonal matrix is raised to
the nth power, the effect is merely to raise each element on
the diagonal to the nth power, whereas there is no simple
expression for the nth power of an arbitrary matrix. Suppose that we want to compute A^n where A is similar to a diagonal matrix D, with say A = SDS^{-1}. It is easily seen that A^n = SD^nS^{-1}. Thus it is possible to calculate A^n quite simply if we have explicit knowledge of S and D. It will emerge in 8.2 and 8.3 that this provides the basis for effective methods of solving systems of linear recurrences and linear differential equations.
Now for the important definition. Let A be a square matrix over a field F. Then A is said to be diagonalizable over F if it is similar to a diagonal matrix D over F, that is, there is an invertible matrix S over F such that A = SDS^{-1} or, equivalently, D = S^{-1}AS. One also says that S diagonalizes A. A diagonalizable matrix need not be diagonal: the reader should give an example to demonstrate this. It is an important observation that if A is diagonalizable and its eigenvalues are c_1, ..., c_n, then A must be similar to the diagonal matrix with c_1, ..., c_n on the principal diagonal. This is because similar matrices have the same eigenvalues and the eigenvalues of a diagonal matrix are just the entries on the principal diagonal - see Example 8.1.2.
What we are aiming for is a criterion which will tell us
exactly which matrices are diagonalizable. A key step in the
search for this criterion comes next.
Theorem 8.1.5
Let A be an n × n matrix over a field F and let c_1, ..., c_r be distinct eigenvalues of A with associated eigenvectors X_1, ..., X_r. Then {X_1, ..., X_r} is a linearly independent subset of F^n.
Proof
Assume the theorem is false; then there is a positive integer i such that {X_1, ..., X_i} is linearly independent, but the addition of the next vector X_{i+1} produces a linearly dependent set {X_1, ..., X_{i+1}}. So there are scalars d_1, ..., d_{i+1}, not all of them zero, such that

    d_1 X_1 + ... + d_{i+1} X_{i+1} = 0.

Premultiply both sides of this equation by A and use the equations AX_j = c_j X_j to get

    c_1 d_1 X_1 + ... + c_{i+1} d_{i+1} X_{i+1} = 0.

On subtracting c_{i+1} times the first equation from the second, we arrive at the relation

    (c_1 - c_{i+1}) d_1 X_1 + ... + (c_i - c_{i+1}) d_i X_i = 0.

Since X_1, ..., X_i are linearly independent, all the coefficients (c_j - c_{i+1}) d_j must vanish. But c_1, ..., c_{i+1} are all different, so we can conclude that d_j = 0 for j = 1, ..., i; hence d_{i+1} X_{i+1} = 0 and so d_{i+1} = 0, in contradiction to the original assumption. Therefore the statement of the theorem must be correct.
The criterion for diagonalizability can now be established.
Theorem 8.1.6
Let A be an n × n matrix over a field F. Then A is diagonalizable if and only if A has n linearly independent eigenvectors in F^n.
Proof
First of all suppose that A has n linearly independent eigenvectors in F^n, say X_1, ..., X_n, and that the associated eigenvalues are c_1, ..., c_n. Define S to be the n × n matrix whose columns are the eigenvectors; thus

    S = [X_1 ... X_n].

The first thing to notice is that S is invertible; for by 8.1.5 its columns are linearly independent. Forming the product of A and S in partitioned form, we find that

    AS = [AX_1 ... AX_n] = [c_1 X_1 ... c_n X_n],

which equals

    [X_1 ... X_n] [ c_1  0    ...  0   ]
                  [ 0    c_2  ...  0   ]  =  SD,
                  [ .    .         .   ]
                  [ 0    0    ...  c_n ]

where D is the diagonal matrix with entries c_1, ..., c_n. Therefore S^{-1}AS = D and A is diagonalizable.
Conversely, assume that A is diagonalizable and that S^{-1}AS = D is a diagonal matrix with entries c_1, ..., c_n. Then AS = SD. This implies that if X_i is the ith column of S, then AX_i equals the ith column of SD, which is c_i X_i. Hence X_1, ..., X_n are eigenvectors of A associated with eigenvalues c_1, ..., c_n. Since X_1, ..., X_n are columns of the invertible matrix S, they must be linearly independent. Consequently A has n linearly independent eigenvectors.
Corollary 8.1.7
An n × n complex matrix which has n distinct eigenvalues is diagonalizable.
This follows at once from 8.1.5 and 8.1.6. On the other hand, it is easy to think of matrices which are not diagonalizable: for example, there is the matrix

    A = [ 1  1 ]
        [ 0  1 ]

Indeed if A were diagonalizable, it would be similar to the identity matrix I_2 since both its eigenvalues equal 1, and S^{-1}AS = I_2 for some S; but the last equation implies that A = SI_2S^{-1} = I_2, which is not true.
An interesting feature of the proof of 8.1.6 is that it pro-
vides us with a method of finding a matrix S which diagonal-
izes A. One has simply to find a set of linearly independent
eigenvectors of A; if there are enough of them, they can be
taken to form the columns of the matrix S.
Example 8.1.5
Find a matrix which diagonalizes

    A = [ 2  -1 ]
        [ 2   4 ]

In Example 8.1.1 we found the eigenvalues of A to be 3 + √-1 and 3 - √-1; hence A is diagonalizable by 8.1.7. We also found eigenvectors for A; these form a matrix

    S = [ (-1 + √-1)/2   -(1 + √-1)/2 ]
        [       1               1     ]

Then by the preceding theory we may be sure that

    S^{-1}AS = [ 3 + √-1      0     ]
               [    0      3 - √-1  ]
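The diagonalization can be verified numerically; the sketch below (illustrative Python, not part of the text) builds S from the eigenvectors found in Example 8.1.1 and checks that S^{-1}AS is diagonal.

    import numpy as np

    A = np.array([[2.0, -1], [2, 4]])
    i = 1j    # playing the role of √-1
    S = np.array([[(-1 + i) / 2, -(1 + i) / 2],
                  [1, 1]])
    D = np.linalg.inv(S) @ A @ S
    print(np.round(D, 10))    # diag(3+1j, 3-1j)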
Triangularizable matrices
It has been seen that not every complex square matrix is diagonalizable. Compensating for this failure is the fact that such a matrix is always similar to an upper triangular matrix; this is a result with many applications.
Let A be a square matrix over a field F. Then A is said to be triangularizable over F if there is an invertible matrix S over F such that S^{-1}AS = T is upper triangular. It will also be convenient to say that S triangularizes A. Note that the diagonal entries of the triangular matrix T will necessarily be the eigenvalues of A. This is because of Example 8.1.2 and the fact that similar matrices have the same eigenvalues. Thus a necessary condition for A to be triangularizable is that it have n eigenvalues in the field F. When F = C, this condition is always satisfied, and this is the case in which we are interested.
Theorem 8.1.8
Every complex square matrix is triangularizable.
Proof
Let A denote an n × n complex matrix. We show by induction on n that A is triangularizable. Of course, if n = 1, then A is already upper triangular: let n > 1 and assume that the result is true for square matrices with n - 1 rows.
We know that A has at least one eigenvalue c in C, with associated eigenvector X_1 say. Since X_1 ≠ 0, it is possible to adjoin vectors to X_1 to produce a basis of C^n, say X_1, X_2, ..., X_n: here we have used 5.1.4. Next, recall that left multiplication of the vectors of C^n by A gives rise to a linear operator T on C^n. With respect to the basis {X_1, ..., X_n}, the linear operator T will be represented by a matrix with the special form

    B_1 = [ c  A_1 ]
          [ 0  A_2 ]

where A_1 and A_2 are certain complex matrices, A_2 having n - 1 rows and columns. The reason for the special form is that T(X_1) = AX_1 = cX_1 since X_1 is an eigenvector of A. Notice that the matrices A and B_1 are similar since they represent the same linear operator T; suppose that in fact B_1 = S_1^{-1}AS_1 where S_1 is an invertible n × n matrix.
Now by the induction hypothesis there is an invertible matrix S_2 with n - 1 rows and columns such that B_2 = S_2^{-1}A_2S_2 is upper triangular. Write

    S = S_1 [ 1  0   ]
            [ 0  S_2 ]

This is a product of invertible matrices, so it is invertible. An easy matrix computation shows that S^{-1}AS equals

    [ 1  0        ] B_1 [ 1  0   ]
    [ 0  S_2^{-1} ]     [ 0  S_2 ]

Replace B_1 by [ c  A_1 ; 0  A_2 ] and multiply the matrices together to get

    S^{-1}AS = [ c   A_1 S_2          ]  =  [ c   A_1 S_2 ]
               [ 0   S_2^{-1} A_2 S_2 ]     [ 0   B_2     ]

This matrix is clearly upper triangular, so the theorem is proved.
The proof of the theorem provides a method for triangu-
larizing a matrix.
Example 8.1.6
Triangularize the matrix

    A = [  1  1 ]
        [ -1  3 ]

The characteristic polynomial of A is x^2 - 4x + 4, so both eigenvalues equal 2. Solving (A - 2I_2)X = 0, we find that all the eigenvectors of A are scalar multiples of X_1 = [1  1]^T. Hence A is not diagonalizable by 8.1.6.
Let T be the linear operator on C^2 arising from left multiplication by A. Adjoin a vector X_2 to X_1 to get a basis B_2 = {X_1, X_2} of C^2, say X_2 = [0  1]^T. Denote by B_1 the standard basis of C^2. Then the change of basis B_1 → B_2 is described by the matrix

    S_1 = [  1  0 ]
          [ -1  1 ]

Therefore by 6.2.6 the matrix which represents T with respect to the basis B_2 is

    [ 2  1 ]
    [ 0  2 ]

Hence

    S = S_1^{-1} = [ 1  0 ]
                   [ 1  1 ]

triangularizes A.
Exercises 8.1
1. Find all the eigenvectors and eigenvalues of the following
matrices:
«»• (iJD' (I! i •!)•
2. Prove that tr(A + B) = tr(A) + tr(B) and tr(cA) = c tr(A), where A and B are n × n matrices and c is a scalar.
3. If A and B are n × n matrices, show that AB and BA have the same eigenvalues. [Hint: let c be an eigenvalue of AB and prove that it is an eigenvalue of BA.]
4. Suppose that A is a square matrix with real entries and
real eigenvalues. Prove that every eigenvalue of A has an
associated real eigenvector.
5. If A is a real matrix with distinct eigenvalues, then A is
diagonalizable over R: true or false?
6. Let p(x) be the polynomial

    (-1)^n (x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + ... + a_0).

Show that p(x) is the characteristic polynomial of the following matrix (which is called the companion matrix of p(x)):

    [ 0  0  ...  0  -a_0     ]
    [ 1  0  ...  0  -a_1     ]
    [ 0  1  ...  0  -a_2     ]
    [ .  .       .    .      ]
    [ 0  0  ...  1  -a_{n-1} ]
7. Find matrices which diagonalize the following:
w (a a)= 0.)(J j 1 )•
8. For which values of a and b is the matrix I , 1 diago-
nalizable over C?
9. Prove that a complex 2 × 2 matrix is not diagonalizable if and only if it is similar to a matrix of the form

    [ a  b ]
    [ 0  a ]

where b ≠ 0.
10. Let A be a diagonalizable matrix and assume that S is
a matrix which diagonalizes A. Prove that a matrix T diago-
nalizes A if and only if it is of the form T = CS where C is a
matrix such that AC = CA.
11. If A is an invertible matrix with eigenvalues c_1, ..., c_n, show that the eigenvalues of A^{-1} are c_1^{-1}, ..., c_n^{-1}.
12. Let T : V → V be a linear operator on a complex n-dimensional vector space V. Prove that there is a basis {v_1, ..., v_n} of V such that T(v_i) is a linear combination of v_1, ..., v_i for i = 1, ..., n.
13. Let T : P_n(R) → P_n(R) be the linear operator corresponding to differentiation. Show that all the eigenvalues of T are zero. What are the eigenvectors?
14. Let c_1, ..., c_n be the eigenvalues of a complex matrix A. Prove that the eigenvalues of A^m are c_1^m, ..., c_n^m where m is any positive integer. [Hint: A is triangularizable.]
15. Prove that a square matrix and its transpose have the
same eigenvalues.
8.2 Applications to Systems of Linear Recurrences
A recurrence relation is an equation involving a function y
of a non-negative integral variable n, the value of y at n being
written y_n. The equation relates the values of the function at certain consecutive integers, typically y_{n+1}, y_n, ..., y_{n-r}. In addition there may be some initial conditions to be satisfied, which specify certain values of y_i. If the equation is linear in y,
the recurrence relation is said to be linear. The problem is to
solve the recurrence, that is, to find the most general function
which satisfies the equation and the initial conditions. Linear
recurrence relations, and more generally systems of linear re-
currence relations, occur in many real-life problems. We shall
see that the theory of eigenvalues provides an effective means
for solving such problems.
To understand how systems of linear relations can arise
we consider a predator-prey problem.
Example 8.2.1
In a population of rabbits and weasels it is observed that each
year the number of rabbits is equal to four times the number
of rabbits less twice the number of weasels in the previous
year. The number of weasels in any year equals the sum of
the numbers of rabbits and weasels in the previous year. If
the initial numbers of rabbits and weasels were 100 and 10
respectively, find the numbers of each species after n years.
Let r_n and w_n denote the respective numbers of rabbits and weasels after n years. The information given in the statement of the problem translates into the equations

    r_{n+1} = 4r_n - 2w_n
    w_{n+1} = r_n + w_n

together with the initial conditions r_0 = 100, w_0 = 10. Thus we have to solve a system of two linear recurrence relations for r_n and w_n, subject to two initial conditions.
At first sight it may not seem clear how eigenvalues enter into this problem. However, let us put the system of linear recurrences in matrix form by writing

    X_n = [ r_n ]          A = [ 4  -2 ]
          [ w_n ]   and        [ 1   1 ]

Then the two recurrences are equivalent to the single matrix equation

    X_{n+1} = A X_n,

while the initial conditions assert that

    X_0 = [ 100 ]
          [  10 ]

These equations enable us to calculate successive vectors X_n; thus X_1 = AX_0, X_2 = A^2 X_0, and in general

    X_n = A^n X_0.

In principle this equation provides the solution of our problem. However the equation is difficult to use since it involves calculating powers of A; these soon become very complicated and there is no obvious formula for A^n.
The key observation is that powers of a diagonal matrix are easy to compute; one simply forms the appropriate power of each diagonal element. Fortunately the matrix A is diagonalizable since it has distinct eigenvalues 2 and 3. Corresponding eigenvectors are found to be

    [ 1 ]        [ 2 ]
    [ 1 ]  and   [ 1 ]

therefore the matrix S = [ 1  2 ; 1  1 ] diagonalizes A, and

    D = S^{-1}AS = [ 2  0 ]
                   [ 0  3 ]
It is now easy to find X_n; for A^n = (SDS^{-1})^n = SD^nS^{-1}. Therefore

    X_n = A^n X_0 = SD^n S^{-1} X_0
        = [ 1  2 ] [ 2^n  0   ] [ -1   2 ] [ 100 ]
          [ 1  1 ] [ 0    3^n ] [  1  -1 ] [  10 ]

which leads to

    X_n = [ 180·3^n - 80·2^n ]
          [  90·3^n - 80·2^n ]

The solution to the problem can now be read off:

    r_n = 180·3^n - 80·2^n    and    w_n = 90·3^n - 80·2^n.
Let us consider for a moment the implications of these equations. Notice that r_n and w_n both increase without limit as n → ∞ since 3^n is the dominant term; however

    lim_{n→∞} (r_n / w_n) = 2.

The conclusion is that, while both populations explode, in the long run there will be twice as many rabbits as weasels.
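The closed-form solution can be checked against direct iteration of the recurrence (an illustrative Python sketch, not part of the text):

    import numpy as np

    A = np.array([[4.0, -2], [1, 1]])
    X = np.array([100.0, 10])
    for n in range(1, 11):
        X = A @ X
        assert np.allclose(X, [180 * 3**n - 80 * 2**n, 90 * 3**n - 80 * 2**n])
    print("formula agrees with iteration for n = 1, ..., 10")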
Having seen that eigenvalues provide a satisfactory so-
lution to the rabbit-weasel problem, we proceed to consider
systems of linear recurrences in general.
Systems of first order linear recurrence relations
A system of first order (homogeneous) linear recurrence relations in functions y_n^{(1)}, ..., y_n^{(m)} of an integral variable n is a set of equations of the form

    y_{n+1}^{(1)} = a_11 y_n^{(1)} + ... + a_1m y_n^{(m)}
    y_{n+1}^{(2)} = a_21 y_n^{(1)} + ... + a_2m y_n^{(m)}
    ..................................................
    y_{n+1}^{(m)} = a_m1 y_n^{(1)} + ... + a_mm y_n^{(m)}

We shall only consider the case where the coefficients a_ij are constants. One objective might be to find all the functions y_n^{(1)}, ..., y_n^{(m)} which satisfy the equations of the system, i.e., the general solution. Alternatively, one might want to find a solution which satisfies certain given conditions,

    y_0^{(1)} = b_1,  y_0^{(2)} = b_2,  ...,  y_0^{(m)} = b_m,

where b_1, ..., b_m are constants. Clearly the rabbit and weasel problem is of this type.
The method adopted in Example 8.2.1 can be applied with advantage to the general case. First convert the given system of recurrences to matrix form by introducing the coefficient matrix A = [a_ij]_{m,m} and defining

    Y_n = [ y_n^{(1)}  y_n^{(2)}  ...  y_n^{(m)} ]^T    and    B = [ b_1  b_2  ...  b_m ]^T.

Then the system of recurrences becomes simply

    Y_{n+1} = A Y_n,

with the initial condition Y_0 = B. The general solution of this is

    Y_n = A^n B.

Now assume that A is diagonalizable: suppose that in fact D = S^{-1}AS is diagonal with diagonal entries d_1, ..., d_m. Then A = SDS^{-1} and A^n = SD^nS^{-1}, so that

    Y_n = SD^n S^{-1} B.

Here of course D^n is the diagonal matrix with entries d_1^n, d_2^n, ..., d_m^n. Since we know how to find S and D, all we need do is compute the product Y_n, and read off its entries to obtain the functions y_n^{(1)}, ..., y_n^{(m)}.
-*.
At this point the reader may ask: what if A is not di-
agonalizable? A complete discussion of this case would take
us too far afield. However one possible approach is to exploit
the fact that the coefficient matrix A is certainly triangular-
izable by 8.1.8. Thus we can find S such that S'1
AS = T is
upper triangular. Now write Un = S~1
Yn, so that Yn — SUn.
Then the recurrence Yn+i = AYn becomes SUn+i = ASUn,
or Un+i = (S~1
AS)Un = TUn. In principle this "triangular"
system of recurrence relations can be solved by a process of
back substitution: first solve the last recurrence for Un , then
substitute for Un in the second last recurrence and solve for
Un , and so on. What makes the procedure effective is the
fact that powers of a triangular matrix are easier to compute
than those of an arbitrary matrix.
Example 8.2.2
Consider the system of linear recurrences

    y_{n+1} = y_n + z_n
    z_{n+1} = -y_n + 3z_n

The coefficient matrix A = [ 1  1 ; -1  3 ] is not diagonalizable, but it was triangularized in Example 8.1.6; there it was found that

    T = S^{-1}AS = [ 2  1 ]          S = [ 1  0 ]
                   [ 0  2 ]   where      [ 1  1 ]

Put U_n = S^{-1}Y_n; here the entries of U_n and Y_n are written u_n, v_n and y_n, z_n respectively. The recurrence relation Y_{n+1} = AY_n becomes U_{n+1} = TU_n. This system of linear recurrences is in triangular form:

    u_{n+1} = 2u_n + v_n
    v_{n+1} = 2v_n

The second recurrence has the obvious solution v_n = d_2 2^n with d_2 constant. Substitute for v_n in the first equation to get u_{n+1} = 2u_n + d_2 2^n. This recurrence can be solved in a simple-minded fashion by calculating successively u_1, u_2, ... and looking for the pattern. It turns out that u_n = d_1 2^n + d_2 n 2^{n-1} where d_1 is another constant. Finally, y_n and z_n can be found from the equation Y_n = SU_n; the general solution is therefore

    y_n = d_1 2^n + d_2 n 2^{n-1}
    z_n = d_1 2^n + d_2 (n + 2) 2^{n-1}
Higher order recurrence relations
A system of recurrence relations for y_n^{(1)}, ..., y_n^{(m)} which expresses each y_{n+1}^{(i)} in terms of the y_j^{(k)} for j = n - r + 1, ..., n, is said to be of order r. When r ≥ 2, such a system can be converted into a first order system by introducing more unknowns. The method works well even for a single recurrence relation, as the next example shows.
Example 8.2.3 (The Fibonacci sequence)
The sequence of integers 0, 1, 1, 2, 3, 5, ... is generated by adding pairs of consecutive terms to get the next term. Thus, if the terms are written y_0, y_1, y_2, ..., then y_n satisfies

    y_{n+1} = y_n + y_{n-1},    n ≥ 1,

which is a second order recurrence relation.
To convert this into a first order system we introduce the new function z_n = y_{n-1}, (n ≥ 1). This results in an equivalent system of first order recurrences

    y_{n+1} = y_n + z_n
    z_{n+1} = y_n

with initial conditions y_0 = 0 and z_0 = 1. The coefficient matrix A = [ 1  1 ; 1  0 ] has eigenvalues (1 + √5)/2 and (1 - √5)/2, so it is diagonalizable. Diagonalizing A as in Example 8.1.5, we find that

    D = S^{-1}AS = [ (1 + √5)/2       0       ]
                   [      0      (1 - √5)/2   ]

where

    S = [ (1 + √5)/2   (1 - √5)/2 ]
        [      1            1     ]

Then Y_n = A^n Y_0 = (SDS^{-1})^n Y_0 = SD^n S^{-1} Y_0. This yields the rather unexpected formula

    y_n = (1/√5) ( ((1 + √5)/2)^n - ((1 - √5)/2)^n )

for the (n + 1)th Fibonacci number.
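A quick check of this formula against the directly computed sequence (illustrative Python; rounding removes the tiny floating point error):

    import numpy as np

    phi = (1 + np.sqrt(5)) / 2
    psi = (1 - np.sqrt(5)) / 2
    fib = [0, 1]
    for _ in range(18):
        fib.append(fib[-1] + fib[-2])
    formula = [round((phi**n - psi**n) / np.sqrt(5)) for n in range(20)]
    print(fib == formula)    # True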
Markov processes
In order to motivate the concept of a Markov process, we
consider a problem about population movement.
Example 8.2.4
Each year 10% of the population of California leave the state
for some other part of the United States, while 20% of the
U.S. population outside California enter the state. Assum-
ing a constant total population of the country, what will the
ultimate population distribution be?
8.2: Systems of Linear Recurrences 283
Let yn and zn be the numbers of people inside and outside
California after n years; then the information given translates
into the system of linear recurrences
Writing
Vn+l = .9j/n + -2Zn
zn+i = .lyn + .8zn
x
-=it)"**=(* :
«
we have Xn+i = AXn. The matrix A has eigenvalues 1 and
.7, so we could proceed to solve for yn and zn in the usual way.
However this is unnecessary in the present example since it is
only the ultimate behavior of yn and zn that is of interest.
Assuming that the limits exist, we see that the real object
of interest is the vector

    X_∞ = lim_{n→∞} X_n = ( lim_{n→∞} y_n )
                          ( lim_{n→∞} z_n ).

Taking the limit as n → ∞ of both sides of the equation
X_{n+1} = AX_n, we obtain X_∞ = AX_∞; hence X_∞ is an eigenvector
of A associated with the eigenvalue 1. An eigenvector is
quickly found to be (2, 1)^T. Thus X_∞ must be a scalar multiple
of this vector. Now the sum of the entries of X_∞ equals the
total U.S. population, p say, and it follows that

    X_∞ = (p/3) ( 2 )
                ( 1 ).
So the (alarming) conclusion is that ultimately two thirds of
the U.S. population will be in California and one third else-
where. This can be confirmed by explicitly calculating y_n and
z_n and taking the limit as n → ∞.
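The limiting behavior can also be observed by simply iterating the recurrence on a computer. The numpy snippet below (ours) starts from an arbitrary split of a population of 100 and applies X_{n+1} = AX_n repeatedly.

import numpy as np

A = np.array([[0.9, 0.2],     # fraction staying in / entering California
              [0.1, 0.8]])    # fraction leaving / staying outside

X = np.array([5.0, 95.0])     # an arbitrary initial split, total p = 100
for _ in range(100):
    X = A @ X                 # one year of migration
print(X)                      # approaches (2p/3, p/3) = (66.7, 33.3)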
The preceding problem is an example of what is known as
a Markov process. For an understanding of this concept some
knowledge of elementary probability is necessary. A Markov
process is a system which has a finite set of states Si,..., Sn.
At any instant the system is in a definite state and over a fixed
period of time it changes to another state. The probability
that the system changes from state Sj to state Si over one
time period is assumed to be a constant p_ij. The matrix

    P = [p_ij]_{n,n}
is called the transition matrix of the system. In Example 8.2.4
there are two states: a person is either in or not in California.
The transition matrix is the matrix A.
Clearly all the entries of P lie in the interval [0, 1]; more
importantly P has the property that the sum of the entries in
any column equals 1. Indeed Σ_{i=1}^n p_ij = 1 since it is certain
that the system will change from state S_j to some state S_i.
This property guarantees that 1 is an eigenvalue of P; indeed
det(P − I) = 0 because the sum of the entries in any column
of the matrix P − I is equal to zero; hence the rows of P − I
add up to the zero row and its determinant is zero.
Suppose that we are interested in the behavior of the
system over two time periods. For this we need to know the
probability of going from state Sj to state Si over two periods.
Now the probability of the system going from S_j to S_i via S_k
is p_ik p_kj, so the probability of going from state S_j to S_i over
two periods is

    Σ_{k=1}^n p_ik p_kj.

But this is immediately recognizable as the (i, j) entry of P^2;
therefore the transition matrix for the system over two time
periods is P^2. More generally the transition matrix for the
system over k time periods is seen to be P^k by similar
considerations.
The interesting problem for a Markov process is to deter-
mine the ultimate behavior of the system over a long period of
time, that is to say, lim_{k→∞} P^k. For the (i, j) entry of this
matrix is the probability that the system will go from state S_j
to state S_i in the long run.
The first question to be addressed is whether this limit
always exists. In general the answer is negative, as a very
simple example shows: if P = ( 1, then Pk
equals either
1 or I 1, according to whether k is even or odd;
so the limit does not exist in this case. Nevertheless it turns
out that under some mild assumptions about the matrix the
limit does exist. Let us call a transition matrix P regular
if some positive power of P has all its entries positive. For
example, the matrix I 1 is regular; indeed all powers
after the first have positive entries. But, as we have seen, the
matrix I I is not regular. A Markov system is said to
be regular if its transition matrix is regular.
The fundamental theorem about Markov processes can
now be stated. A proof may be found in [15], for example.
Theorem 8.2.1
Let P be the transition matrix of a regular Markov system.
Then lim_{k→∞} P^k exists and has the form (X X ... X),
where X is the unique eigenvector of P associated with the
eigenvalue 1 which has entry sum equal to 1.
Our second example of a Markov process is the library
book problem from Chapter One (see Exercise 1.2.12).
Example 8.2.5
A certain library owns 10,000 books. Each month 20% of the
books in the library are lent out and 80% of the books lent out
are returned, while 10% remain lent out and 10% are reported
lost. Finally, 25% of books listed as lost the previous month
are found and returned to the library. How many books will
be in the library, lent out, and lost in the long run?
Here there are three states that a book may be in: S_1 =
in the library; S_2 = lent out; S_3 = lost. The transition matrix
for this Markov process is

    P = ( .8   .8   .25 )
        ( .2   .1   0   )
        ( 0    .1   .75 ).

Clearly P^2 has positive entries, so P is regular. Of course P
has the eigenvalue 1; the corresponding eigenvector with entry
sum equal to 1 is found to be

    ( 45/59 )
    ( 10/59 )
    (  4/59 ).
So the probabilities that a book is in states Si, S2, S3 after a
long period of time are 45/59, 10/59, 4/59 respectively. There-
fore the expected numbers of books in the library, lent out,
and lost, in the long run, are obtained by multiplying these
probabilities by the total number of books, 10,000. These
numbers are therefore 7627, 1695, 678 respectively.
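These figures can be reproduced with a few lines of numpy (ours, not the book's): compute an eigenvector of P for the eigenvalue 1 and rescale it to have entry sum 1.

import numpy as np

P = np.array([[0.80, 0.80, 0.25],
              [0.20, 0.10, 0.00],
              [0.00, 0.10, 0.75]])

w, V = np.linalg.eig(P)
X = np.real(V[:, np.argmin(np.abs(w - 1))])   # eigenvector for eigenvalue 1
X = X / X.sum()                               # scale so the entries sum to 1
print(X)                        # approximately [45/59, 10/59, 4/59]
print(np.round(10000 * X))      # expected numbers of books: about [7627, 1695, 678]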
Exercises 8.2
1. Solve the following systems of linear recurrences with the
specified initial conditions:
(a)  Vn+1
Z v , l
lX
z
n
where y0 = 0,z0 = 1;
(b)  y
;+l
: %- X f where
y = <>•
*> =l
-
Zn+l — z
Vn -r 3Zn
2. In a certain nature reserve there are two competing animal
species A and B. It is observed that the number of species A
equals three times the number of A last year less twice the
number of species B last year. Also the number of species B
is twice the number of B last year less the number of species A
last year. Write down a system of linear recurrence relations
for an and bn, the numbers of each species after n years, and
solve the system. What are the long term prospects for each
species?
3. A pair of newborn rabbits begins to breed at age one
month, and each successive month produces one pair of off-
spring (one of each sex). Initially there were two pairs of rab-
bits. If rn is the total number of pairs of rabbits at the begin-
ning of the nth month, show that r_n satisfies r_{n+1} = r_n + r_{n-1}
and r_1 = 2 = r_2. Solve this second order recurrence relation
for rn.
4. A tower n feet high is to be built from red, white and blue
blocks. Each red block is 1 foot high, while the white and
blue blocks are 2 feet high. If un denotes the number of dif-
ferent designs for the tower, show that the recurrence relation
un+i = un + 2un_i must hold. By solving this recurrence,
find a formula for un.
5. Solve the system of recurrence relations y_{n+1} = 3y_n − 2z_n,
z_{n+1} = 2y_n − z_n, with the initial conditions y_0 = 1, z_0 = 0.
6. Solve the second order system y_{n+1} = y_{n−1}, z_{n+1} = y_n +
4z_n, with the initial conditions y_0 = 0, y_1 = 1 = z_1.
7. In a certain city 90% of employed persons retain their jobs
at the end of each year, while 60% of the unemployed find
a job during the year. Assuming that the total employable
population remains constant, find the unemployment rate in
the long run.
8. A certain species of bird nests in three locations A, B and
C. It is observed that each year half of the birds at A and half
of the birds at B move their nests to C, while the others stay
in the same nesting place. The birds nesting at C are evenly
split between A and B. Find the ultimate distribution of birds
among the three nesting sites, assuming that the total bird
population remains constant.
9. There are three political parties in a certain city, conserva-
tives, liberals and socialists. The probabilities that someone
who voted conservative last time will vote liberal or socialist
at the next election are .3 and .2 respectively. The proba-
bilities of a liberal voting conservative or socialist are .2 and
.1. Finally, the probabilities of a socialist voting conservative
or liberal are .1 and .2. What percentages of the electorate
will vote for the three parties in the long run, assuming that
everyone votes and the number of voters remains constant?
8.3 Applications to Systems of Linear Differential
Equations
In this section we show how the theory of eigenvalues
developed in 8.1 can be applied to solve systems of linear
differential equations. Since there is a close analogy between
linear recurrence relations and linear differential equations,
the reader will soon notice a similarity between the methods
used here and in 8.2.
For simplicity we consider initially a system of first or-
der linear (homogeneous) differential equations for functions
y_1, ..., y_n of x. This has the general form

    y'_1 = a_11 y_1 + ⋯ + a_1n y_n
       ⋮
    y'_n = a_n1 y_1 + ⋯ + a_nn y_n
Here the a_ij are assumed to be constants. The object is to
find the most general functions y_1, ..., y_n, differentiable in
some interval [a, b], which satisfy the equations of the system.
Alternatively one may wish to find functions which satisfy in
addition a set of initial conditions of the form

    y_1(x_0) = b_1,  y_2(x_0) = b_2,  ...,  y_n(x_0) = b_n.

Here the b_i are certain constants and x_0 is in the interval [a, b].
Let A = [a_ij], the coefficient matrix of the system, and
write

    Y = ( y_1 )
        (  ⋮  )
        ( y_n ).

Then we define the derivative of Y to be

    Y' = ( y'_1 )
         (  ⋮   )
         ( y'_n ).

With this notation the given system of differential equations
can be written in matrix form

    Y' = AY.
By a solution of this equation we shall mean any column
vector Y of n functions in D[a, b] which satisfies the equation.
The set of all solutions is a subspace of the vector space of all
n-column vectors of differentiable functions; this is called the
solution space. It can be shown that the dimension of the so-
lution space equals n, so that there are n linearly independent
solutions, and every solution is a linear combination of them.
If a set of n initial conditions is given, there is in fact a
unique solution of the system satisfying these conditions. For
an account of the theory of systems of differential equations
the reader may consult a book on differential equations such
as [15] or [16]. Here we are concerned with methods of finding
solutions, not with questions of existence and uniqueness of
solutions.
Suppose that the coefficient matrix A is diagonalizable,
so there is an invertible matrix S such that D = S^{-1}AS is
diagonal, with diagonal entries d_1, ..., d_n say. Here of course
the d_i are the eigenvalues of A. Define

    U = S^{-1}Y.

Then Y = SU and Y' = SU' since S has constant entries.
Substituting for Y and Y' in the equation Y' = AY, we obtain
SU' = ASU, or

    U' = (S^{-1}AS)U = DU.
This is a system of linear differential equations for u_1, ..., u_n,
the entries of U. It has the very simple form

    u'_1 = d_1 u_1,  u'_2 = d_2 u_2,  ...,  u'_n = d_n u_n.

The equation u'_i = d_i u_i is easy to solve since its differential
form is

    d(ln u_i) = d_i dx.

Thus its general solution is u_i = c_i e^{d_i x}, where c_i is a constant.
The general solution of the system of linear differential
equations for u_1, ..., u_n is therefore

    u_1 = c_1 e^{d_1 x},  ...,  u_n = c_n e^{d_n x}.

To find the original functions y_i, simply use the equation Y =
SU to get

    y_i = Σ_{j=1}^n s_ij u_j = Σ_{j=1}^n s_ij c_j e^{d_j x}.
Since we know how to find S, this procedure provides
an effective method of solving systems of first order linear
differential equations in the case where the coefficient matrix
is diagonalizable.
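The recipe just described is easy to mechanize. Below is a minimal Python sketch (ours); it assumes A is diagonalizable, uses numpy's eig to obtain S and the eigenvalues, and it anticipates the system y' = −2y + z, z' = y − 2z treated in the next example.

import numpy as np

def solve_linear_ode(A, Y0, x):
    """Value at x of the solution of Y' = AY with Y(0) = Y0,
    assuming A is diagonalizable: D = S^{-1} A S, u_i = c_i e^{d_i x}."""
    d, S = np.linalg.eig(A)          # eigenvalues d_i, eigenvectors as columns of S
    c = np.linalg.solve(S, Y0)       # constants c_i chosen so that Y(0) = Y0
    return S @ (c * np.exp(d * x))   # Y(x) = S U(x)

A = np.array([[-2.0, 1.0],
              [1.0, -2.0]])
print(solve_linear_ode(A, np.array([1.0, 0.0]), x=1.0))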
Example 8.3.1
Consider a long tube divided into four regions along which
heat can flow. The regions on the extreme left and right are
kept at 0°C, while the walls of the tube are insulated. It is
assumed that the temperature is uniform within each region.
Let y(t) and z(t) be the temperatures of the regions A and
B at time t. It is known that the rate at which each region
cools equals the sum of the temperature differences with the
surrounding media. Find a system of linear differential equa-
tions for y(t) and z(t) and solve it.

[Figure: a tube whose two end regions are kept at 0°, with the
insulated interior regions A and B at temperatures y(t)° and z(t)°.]

According to the law of cooling

    y' = (z − y) + (0 − y)
    z' = (y − z) + (0 − z)

Thus we are faced with the linear system of differential equations

    y' = −2y + z
    z' = y − 2z

Here
    A = ( −2   1 )          Y = ( y )
        (  1  −2 )   and        ( z ).
Now the matrix A is diagonalizable; indeed

    D = S^{-1}AS = ( −1   0 )          S = ( 1   1 )
                   (  0  −3 ),  where      ( 1  −1 ).

Setting U = S^{-1}Y, we obtain from Y' = AY the equation
U' = DU. This yields two very simple differential equations

    u'_1 = −u_1
    u'_2 = −3u_2

where u_1 and u_2 are the entries of U. Hence u_1 = c e^{−t} and
u_2 = d e^{−3t}, with arbitrary constants c and d. Finally

    Y = SU = ( c e^{−t} + d e^{−3t} )
             ( c e^{−t} − d e^{−3t} ).

The general solution of the original system of differential equations
is therefore

    y = c e^{−t} + d e^{−3t}
    z = c e^{−t} − d e^{−3t}

Thus the temperatures of both regions A and B tend to zero
as t → ∞.
In the next example complex eigenvalues arise, which
causes a change in the procedure.
Example 8.3.2
Solve the linear system of differential equations
    y'_1 = y_1 − y_2
    y'_2 = y_1 + y_2

The coefficient matrix here is

    A = ( 1  −1 )
        ( 1   1 ),

which has complex eigenvalues 1 + i and 1 − i; we are using
the familiar notation i = √−1 here. The corresponding
eigenvectors are

    ( i )          ( −i )
    ( 1 )   and    (  1 )

respectively. Let S be the 2 × 2 matrix which has these vectors
as its columns; then S^{-1}AS = D, the diagonal matrix with
diagonal entries 1 + i and 1 − i. If we write U = S^{-1}Y, the
system of equations becomes U' = DU, that is,

    u'_1 = (1 + i)u_1
    u'_2 = (1 − i)u_2

where u_1 and u_2 are the entries of U.
The first equation has the solution u_1 = e^{(1+i)x}, while the
second has the obvious solution u_2 = 0. Using these values
for u_1 and u_2, we obtain a complex solution of the system of
differential equations

    Y = SU = ( i e^{(1+i)x} )
             (   e^{(1+i)x} ).
Of course we are looking for real solutions, but these are in
fact at hand. For the real and imaginary parts of Y will also
be solutions of the system Y' = AY. Thus we obtain two real
solutions from the single complex solution Y, by taking the
real and imaginary parts of Y; these are respectively
    Y_1 = ( −e^x sin x )          Y_2 = ( e^x cos x )
          (  e^x cos x )   and           ( e^x sin x ).

Now Y_1 and Y_2 are easily seen to be linearly independent solutions;
therefore the general solution of the system is obtained
by taking an arbitrary linear combination of these:

    Y = c_1 Y_1 + c_2 Y_2 = e^x ( −c_1 sin x + c_2 cos x )
                                (  c_1 cos x + c_2 sin x ),

where c_1 and c_2 are arbitrary real constants. Hence

    y_1 = e^x(−c_1 sin x + c_2 cos x)
    y_2 = e^x(c_1 cos x + c_2 sin x)
Of course the success of the method employed in the last
two examples depended entirely upon the fact that A is diag-
onalizable. However, should this not be the case, one can still
treat the system of differential equations by triangularizing
the coefficient matrix and solving the resulting triangular sys-
tem using back substitution, rather as was done for systems
of linear recurrences in 8.2.
Example 8.3.3
Solve the linear system of differential equations
    y'_1 = y_1 + y_2
    y'_2 = −y_1 + 3y_2

In this case the coefficient matrix

    A = (  1  1 )
        ( −1  3 )
is not diagonalizable, but it can be triangularized. In fact it
was shown in Example 8.1.6 that

    T = S^{-1}AS = ( 2  1 )          S = ( 1  0 )
                   ( 0  2 ),  where      ( 1  1 ).

Put U = S^{-1}Y and write u_1, u_2 for
the entries of U. Then Y = SU and Y' = SU'. The equation
Y' = AY now becomes U' = TU. This yields the triangular
system

    u'_1 = 2u_1 + u_2
    u'_2 = 2u_2
Solving the second equation, we find that u_2 = c_2 e^{2x}, with
c_2 an arbitrary constant. Now substitute for u_2 in the first
equation to get

    u'_1 − 2u_1 = c_2 e^{2x}.

This is a first order linear equation which can be solved by a
standard method: multiply both sides of the equation by the
"integrating factor"

    e^{∫ −2 dx} = e^{−2x}.

The equation then becomes (u_1 e^{−2x})' = c_2, whence u_1 e^{−2x} =
c_2 x + c_1, with c_1 another arbitrary constant. Thus u_1 =
c_2 x e^{2x} + c_1 e^{2x}. To find the original functions y_1 and y_2, we
form the product

    Y = SU = e^{2x} (    c_1 + c_2 x    )
                    ( c_1 + c_2(x + 1) ).

Thus the general solution of the system is

    y_1 = (c_1 + c_2 x)e^{2x}
    y_2 = (c_1 + c_2(x + 1))e^{2x}
Finally, suppose that initial conditions y_1(0) = 1 and
y_2(0) = 0 are given. We can find the correct values of c_1 and
c_2 by substituting x = 0 in the expressions for y_1 and y_2, to get
c_1 = 1 and c_2 = −1. The required solution is y_1 = (1 − x)e^{2x}
and y_2 = −x e^{2x}.
.
The next application is one of a military nature.
Example 8.3.4
Two armored divisions A and B engage in combat. At time t
their respective numbers of tanks are a(t) and b(t). The rate
at which tanks in a division are destroyed is proportional to
the number of intact enemy tanks at that instant. Initially
A and B have ao and bo tanks where ao > &o- Predict the
outcome of the battle.
According to the information given, the functions a and
b satisfy the linear system
a' = -kb
b' = -ka
where k is some positive constant. Here the coefficient matrix
is

    A = (  0  −k )
        ( −k   0 ).

The characteristic equation is x^2 − k^2 = 0, so the eigenvalues
are k and −k and A is diagonalizable. It turns out that

    D = S^{-1}AS = ( k   0 )          S = (  1  1 )
                   ( 0  −k ),  where      ( −1  1 ).

If we set Y = (a, b)^T, the system of
differential equations becomes Y' = AY. On writing U =
S^{-1}Y, we get U' = DU. This is the system

    u' = ku
    v' = −kv
where U = (u, v)^T. Hence u = c e^{kt} and v = d e^{−kt}, with c and
d arbitrary constants. The general solution is Y = SU, which
yields

    a = c e^{kt} + d e^{−kt}
    b = −c e^{kt} + d e^{−kt}

Now the initial conditions are a(0) = a_0 and b(0) = b_0, so

    c + d = a_0
    −c + d = b_0

Solving we obtain c = (a_0 − b_0)/2, d = (a_0 + b_0)/2. Therefore
the numbers of tanks surviving at time t in Divisions A and
B are respectively

    a = ((a_0 − b_0)/2) e^{kt} + ((a_0 + b_0)/2) e^{−kt}
    b = −((a_0 − b_0)/2) e^{kt} + ((a_0 + b_0)/2) e^{−kt}
It is more convenient to write a(t) and b(t) in terms of the
hyperbolic functions cosh(x) = ½(e^x + e^{−x}) and sinh(x) =
½(e^x − e^{−x}). Then the solution becomes

    a = a_0 cosh(kt) − b_0 sinh(kt)
    b = b_0 cosh(kt) − a_0 sinh(kt)

Now Division B will have lost all its tanks when b = 0,
i.e., after time

    t = (1/k) tanh^{-1}(b_0/a_0).

Observe also that

    a^2 − b^2 = a_0^2 − b_0^2
because of the identity cosh^2(kt) − sinh^2(kt) = 1. Therefore
at the time when Division B has lost all of its tanks, Division
A still has a tanks where a^2 − 0 = a_0^2 − b_0^2. Hence the number
of tanks that Division A has left at the end of the battle is

    √(a_0^2 − b_0^2).
Not surprisingly, since it had more tanks to start with, Divi-
sion A wins the battle.
However, there is a way in which Division B could con-
ceivably win. Suppose that

    (1/√2) a_0 < b_0 < a_0.

Suppose further that Division A consists of two columns with
equal numbers of tanks, and that Division B manages to attack
one column of Division A before the other column can
come to its aid. Since b_0 > ½a_0, Division B defeats the first
column of Division A, and it still has √(b_0^2 − ¼a_0^2) tanks left.
Then Division B attacks the second column and wins with

    √(b_0^2 − ¼a_0^2 − ¼a_0^2) = √(b_0^2 − ½a_0^2)

tanks left.
Thus Division B wins the battle despite having fewer
tanks than Division A: but it must have more than a_0/√2,
or 71%, of the strength of the larger division for the plan to
work. This explains the frequent success of the "divide and
conquer" strategy.
Higher order equations
Systems of linear differential equations of order 2 or more
can be converted to first order systems by introducing addi-
tional functions. Once again the procedure is similar to that
adopted for systems of linear recurrences.
Example 8.3.5
Solve the second order system
    y''_1 = −2y_2 + y'_1 + 2y'_2
    y''_2 = 2y_1 + 2y'_1 − y'_2

The system may be converted to a first order system by
introducing two new functions

    y_3 = y'_1  and  y_4 = y'_2.

Thus y''_1 = y'_3 and y''_2 = y'_4. The given system is therefore
equivalent to the first order system

    y'_1 = y_3
    y'_2 = y_4
    y'_3 = −2y_2 + y_3 + 2y_4
    y'_4 = 2y_1 + 2y_3 − y_4
The coefficient matrix here is

    A = ( 0   0  1   0 )
        ( 0   0  0   1 )
        ( 0  −2  1   2 )
        ( 2   0  2  −1 ).

Its eigenvalues turn out to be 1, −1, 2, −2, with corresponding
eigenvectors

    ( 1 )    (  2 )    ( 1 )    (  1 )
    ( 2 )    ( −1 )    ( 1 )    ( −1 )
    ( 1 ) ,  ( −2 ) ,  ( 2 ) ,  ( −2 ) .
    ( 2 )    (  1 )    ( 2 )    (  2 )
Therefore, if S denotes the matrix with these vectors as its
columns, we have S^{-1}AS = D, the diagonal matrix with
diagonal entries 1, −1, 2, −2. Now write U = S^{-1}Y. Then the
equation Y' = AY becomes U' = (S^{-1}AS)U = DU, which is
equivalent to

    u'_1 = u_1,  u'_2 = −u_2,  u'_3 = 2u_3,  u'_4 = −2u_4.

Solving these simple equations, we obtain

    u_1 = c_1 e^x,  u_2 = c_2 e^{−x},  u_3 = c_3 e^{2x},  u_4 = c_4 e^{−2x}.
The functions y_1 and y_2 may now be read off from the equation
Y = SU to give the general solution

    y_1 = c_1 e^x + 2c_2 e^{−x} + c_3 e^{2x} + c_4 e^{−2x}
    y_2 = 2c_1 e^x − c_2 e^{−x} + c_3 e^{2x} − c_4 e^{−2x}
Exercises 8.3
1. Find the general solutions of the following systems of linear
differential equations:
(a) {ti=
-Vl+
* (b) ly
) = lyi
72
/2
{V2= 2yi~ 3y2
x
{y2 = - 2yi + 3y2
(c)  y'_1 = y_1 + y_2 + y_3
     y'_2 = y_2
     y'_3 = y_2 + y_3
2. Find the general solution (in real terms) of the system of
differential equations
    y'_1 = y_1 + y_2
    y'_2 = −2y_1 + 3y_2

Then find a solution satisfying the initial conditions y_1(0) = 1,
y_2(0) = 2.
3. By triangularizing the coefficient matrix solve the system
of differential equations
    y'_1 = 5y_1 + 3y_2
    y'_2 = −3y_1 − y_2

Then find a solution satisfying the initial conditions y_1(0) = 0,
y_2(0) = 2.
4. Solve the second order linear system
y'( = 2j/i + y2 + y[ + y'2
y'i = ~ %i + 2
2/2 + 5yi - y'2
5. Given a system of n (homogeneous) linear differential equa-
tions of order k, how would you convert this to a system of
first order equations? How many equations will there be in
the first order system?
6. Describe a general method for solving a system of second
order linear differential equations of the form Y" = AY, where
A is diagonalizable.
7. Solve the systems of differential equations
2
y'i = y - 2/2 (h) [y'i = - 4y-
y'{ = 33/1 + 5y2
{
M y'{ = Vl + 5y2
[Note that the general solution of the differential equation
u'' = a^2 u is u = c_1 cosh(ax) + c_2 sinh(ax).]
8. {The double pendulum) A string of length 21 is hung from
a rigid support. Two weights each of mass m are attached
to the midpoint and lower end of the string, which is then
allowed to execute small vibrations subject to gravity only.
Let y_1 and y_2 denote the horizontal displacements of the two
weights from the equilibrium position at time t.
(a) (optional) By using Newton's Second Law of Motion,
show that y_1 and y_2 satisfy the differential equations
y''_1 = a^2(−3y_1 + y_2),  y''_2 = a^2(y_1 − y_2),  where a = √(g/l) and g
is the acceleration due to gravity.
(b) Solve the linear system in (a) for y_1 and y_2. [Note:
the general solution of the differential equation y'' + a^2 y = 0
is y = c_1 cos ax + c_2 sin ax.]
9. In Example 8.3.4 assume that Division A consists of m
equal columns. Suppose that Division B is able to attack
each column of A in turn. Show that Division B will win the
battle provided that b_0 > a_0/√m.
Chapter Nine
MORE ADVANCED TOPICS
This chapter is intended to serve as an introduction to
some of the more advanced parts of linear algebra. The most
important result of the chapter is the Spectral Theorem, which
asserts that every real symmetric matrix can be diagonalized
by means of a suitable real orthogonal matrix. This result
has applications to quadratic forms, bilinear forms, conics and
quadrics, which are described in 9.2 and 9.3. The final section
gives an elementary account of the important topic of Jordan
normal form, a subject not always treated in a book such as
this.
9.1 Eigenvalues and Eigenvectors of Symmetric and
Hermitian Matrices
In this section we continue the discussion of diagonaliz-
ability of matrices, which was begun in 8.1, with special regard
to real symmetric matrices. More generally, a square complex
matrix A is called hermitian if
    A = A*,

that is, A = (Ā)^T. Thus hermitian matrices are the complex
analogs of real symmetric matrices. It will turn out that the
eigenvalues and eigenvectors of such matrices have remark-
able properties not possessed by complex matrices in general.
The first indication of special behavior is the fact that their
eigenvalues are always real, while the eigenvectors tend to be
orthogonal.
Theorem 9.1.1
Let A be a hermitian matrix. Then:
(a) the eigenvalues of A are all real;
(b) eigenvectors of A associated with distinct eigenvalues
are orthogonal.
Proof
Let c be an eigenvalue of A with associated eigenvector X, so
that AX = cX. Taking the complex transpose of both sides
of this equation and using 7.1.7, we obtain X*A = c̄X* since
A = A*. Now multiply both sides of this equation on the right
by X to get X*AX = c̄X*X = c̄||X||^2: remember here that
X*X equals the square of the length of X. But (X*AX)* =
X*A*X = X*AX; thus the scalar X*AX equals its complex
conjugate and so it is real. It follows that c̄||X||^2 is real. Since
the length of a non-zero vector is real and positive, we deduce
that c̄, and hence c, is real, which completes the proof of (a).
To prove (b) take two eigenvectors X and Y associated
with distinct eigenvalues c and d. Thus AX = cX and AY =
dY. Then Y*AX = Y*(cX) = cY*X, and in the same
way X*AY = dX*Y. However, by 7.1.7 again, (X*AY)* =
Y*A*X = Y*AX. Therefore (dX*Y)* = cY*X, that is, d̄Y*X =
cY*X, and so dY*X = cY*X because d is real by the first part of the proof. This
means that (c − d)Y*X = 0, from which it follows that Y*X =
0 since c ≠ d. Thus X and Y are orthogonal.
Suppose now that {Xi,..., Xr} is a set of linearly inde-
pendent eigenvectors of the n x n hermitian matrix A, and
that r is chosen as large as possible. We can multiply Xi by
l/||Xj|| to produce a unit vector; thus we may assume that
each Xi is a unit vector. By 9.1.1 {Xi,..., Xr} is an orthonor-
mal set. Now write U = (X_1 ... X_r), an n × r matrix. Then
U has the property

    AU = (AX_1 ... AX_r) = (c_1X_1 ... c_rX_r),

where c_1, ..., c_r are the eigenvalues corresponding to
X_1, ..., X_r respectively. Hence

    AU = (X_1 X_2 ... X_r) ( c_1  0  ...  0  )
                           ( 0   c_2 ...  0  )  = UD,
                           ( ⋮        ⋱      )
                           ( 0   0  ...  c_r )

where D is the diagonal matrix with diagonal entries c_1, ...,
c_r. Since the columns of U form an orthonormal set, U*U =
I_r.
In general r ≤ n, but should it be the case that r = n,
then U is n × n and we have U^{-1} = U*, so that U is unitary
(see 7.3). Therefore U*AU = D and A is diagonalized by the
matrix U. In other words, if there exist n mutually orthogonal
eigenvectors of A, then A can be diagonalized by a unitary
matrix. The outstanding question is, of course, whether there
are always that many linearly independent eigenvectors. We
shall shortly see that this is the case.
A key result must first be established.
Theorem 9.1.2 (Schur's Theorem)
Let A be an arbitrary square complex matrix. Then there is a
unitary matrix U such that U*AU is upper triangular. More-
over, if A is a real symmetric matrix, then U can be chosen
real and orthogonal.
Proof
Let A be an n x n matrix. The proof is by induction on n.
Of course, if n = 1, then A is already upper triangular, so
let n > 1. There is an eigenvector X_1 of A, with associated
eigenvalue c_1 say. Here we can choose X_1 to be a unit vector
in C^n. Using 5.1.4 we adjoin vectors to X_1 to form a basis of
C^n. Then the Gram-Schmidt procedure (in the complex case)
may be applied to produce an orthonormal basis X_1, ..., X_n
of C^n; note that X_1 is a member of this basis.
Let U_0 denote the matrix (X_1 ... X_n); then U_0 is unitary
since its columns form an orthonormal set. Now

    U_0*AX_1 = U_0*(c_1X_1) = c_1(U_0*X_1).

Also X_i*X_1 = 0 if i > 1, while X_1*X_1 = 1. Hence

    U_0*AX_1 = c_1 ( 1 )   ( c_1 )
                   ( 0 ) = (  0  )
                   ( ⋮ )   (  ⋮  )
                   ( 0 )   (  0  ).

Since

    U_0*AU_0 = U_0*A(X_1 ... X_n) = (U_0*AX_1  U_0*AX_2  ...  U_0*AX_n),

we deduce that

    U_0*AU_0 = ( c_1   B  )
               (  0   A_1 ),

where A_1 is a matrix with n − 1 rows and columns and B is
a row vector with n − 1 entries.
We now have the opportunity to apply the induction
hypothesis on n: there is a unitary matrix U_1 such that
U_1*A_1U_1 = T_1 is upper triangular. Put

    U_2 = ( 1   0  )
          ( 0   U_1 ),

which is surely a unitary matrix. Then let U = U_0U_2; this is
also unitary since U*U = U_2*(U_0*U_0)U_2 = U_2*U_2 = I. Finally

    U*AU = U_2*(U_0*AU_0)U_2 = U_2* ( c_1   B  ) U_2,
                                    (  0   A_1 )
which equals

    ( 1   0   ) ( c_1   B  ) ( 1   0  )   ( c_1    BU_1      )
    ( 0  U_1* ) (  0   A_1 ) ( 0  U_1 ) = (  0   U_1*A_1U_1  ).

This shows that

    U*AU = ( c_1  BU_1 )
           (  0   T_1  ),

an upper triangular matrix, as required.
If the matrix A is real symmetric, the argument shows
that there is a real orthogonal matrix S such that S^T AS is
diagonal. The point to keep in mind here is that the eigenval-
ues of A are real by 9.1.1, so that A has a real eigenvector.
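If SciPy is available, Schur's Theorem can be illustrated numerically; the following sketch (ours) triangularizes the non-diagonalizable matrix of Example 8.1.6 with scipy.linalg.schur.

import numpy as np
from scipy.linalg import schur

A = np.array([[1.0, 1.0],
              [-1.0, 3.0]])           # repeated eigenvalue 2, not diagonalizable
T, U = schur(A)                        # A = U T U^T with T upper triangular
print(np.round(T, 6))                  # triangular, with 2, 2 on the diagonal
print(np.allclose(U @ T @ U.T, A))     # True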
The crucial theorem on the diagonalization of hermitian
matrices can now be established.
Theorem 9.1.3 (The Spectral Theorem)
Let A be a hermitian matrix. Then there is a unitary matrix U
such that U*AU is diagonal. If A is a real symmetric matrix,
then U may be chosen to be real and orthogonal.
Proof
By 9.1.2 there is a unitary matrix U such that U*AU = T
is upper triangular. Then T* = U*A*U = U*AU = T, so
T is hermitian. But T is upper triangular and T* is lower
triangular, so the only way that T and T* can be equal is if
all the off-diagonal entries of T are zero, that is, T is diagonal.
The case where A is real symmetric is handled by the
same argument.
Corollary 9.1.4
If A is an n x n hermitian matrix, there is an orthonormal
basis of Cn
which consists entirely of eigenvectors of A. If
in addition A is real, there is an orthonormal basis of Rn
consisting of eigenvectors of A.
Proof
By 9.1.3 there is a unitary matrix U such that U* AU = D is
diagonal, with diagonal entries d,..., dn say. If Xi,..., Xn
are the columns of U, then the equation AU = UD implies
that AXi = diXi for i = 1,..., n. Therefore the Xi are eigen-
vectors of A, and since U is unitary, they form an orthonormal
basis of Cn
. The argument in the real case is similar.
This justifies our hope that an n x n hermitian matrix
always has enough eigenvectors to form an orthonormal basis
of Cn
. Notice that this will be the case even if the eigenvalues
of A are not all distinct.
The following constitutes a practical method of diago-
nalizing an n x n hermitian matrix A by means of a unitary
matrix. For each eigenvalue find a basis for the correspond-
ing eigenspace. Then apply the Gram-Schmidt procedure to
get an orthonormal basis of each eigenspace. These bases are
then combined to form an orthonormal set, say {Xi,..., Xn}.
By 9.1.4 this will be a basis of Cn
. If U is the matrix with
columns X_1, ..., X_n, then U is unitary and U*AU is diago-
nal, as was shown in the discussion preceding 9.1.2. The same
procedure is effective for real symmetric matrices.
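In practice this computation is usually delegated to a library routine. The numpy sketch below (ours) uses numpy.linalg.eigh, which returns the eigenvalues of a real symmetric (or hermitian) matrix together with an orthonormal set of eigenvectors, for the matrix of Example 9.1.1 that follows.

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

w, U = np.linalg.eigh(A)                 # eigenvalues and orthonormal eigenvectors
print(w)                                 # [-1.  3.]
print(np.round(U.T @ A @ U, 6))          # diagonal matrix diag(-1, 3)
print(np.allclose(U.T @ U, np.eye(2)))   # the columns form an orthonormal basis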
Example 9.1.1
Find a real orthogonal matrix which diagonalizes the matrix

    A = ( 1  2 )
        ( 2  1 ).

The eigenvalues of A are 3 and −1, (real of course), and
corresponding eigenvectors are

    ( 1 )          ( −1 )
    ( 1 )   and    (  1 ).

These are orthogonal; to get an orthonormal basis of R^2, replace
them by the unit eigenvectors

    (1/√2) ( 1 )   and   (1/√2) ( −1 )
           ( 1 )                (  1 ).

Finally let

    S = (1/√2) ( 1  −1 )
               ( 1   1 ),

which is an orthogonal matrix. The theory predicts that

    S^T AS = ( 3   0 )
             ( 0  −1 ),

as is easily verified by matrix multiplication.
Example 9.1.2
Find a unitary matrix which diagonalizes the hermitian matrix
    A = (  3/2   i/2   0 )
        ( −i/2   3/2   0 )
        (   0     0    1 ),

where i = √−1.
The eigenvalues are found to be 1, 2, 1, with associated
unit eigenvectors

    ( −i/√2 )    (  1/√2 )    ( 0 )
    (  1/√2 ) ,  ( −i/√2 ) ,  ( 0 ).
    (   0   )    (   0   )    ( 1 )

Therefore

    U*AU = ( 1  0  0 )
           ( 0  2  0 )
           ( 0  0  1 ),

where U is the unitary matrix

    U = ( −i/√2   1/√2   0 )
        (  1/√2  −i/√2   0 )
        (   0      0     1 ).
Normal matrices
We have seen that every nxn hermitian matrix A has the
property that there is an orthonormal basis of C n
consisting
of eigenvectors of A. It was also observed that this property
immediately leads to A being diagonalizable by a unitary ma-
trix, namely the matrix whose columns are the vectors of the
orthonormal basis. We shall consider what other matrices
have this useful property.
A complex matrix A is called normal if it commutes with
its complex transpose,
A* A = AA*.
Of course for a real matrix this says that A commutes with
its transpose AT
. Clearly hermitian matrices are normal; for
if A = A*, then certainly A commutes with A*. What is the
connection between normal matrices and the existence of an
orthonormal basis of eigenvectors? The somewhat surprising
answer is given by the next theorem.
Theorem 9.1.5
Let A be a complex nxn matrix. Then A is normal if and
only if there is an orthonormal basis of Cn
consisting of eigen-
vectors of A.
Proof
First of all suppose that Cn
has an orthonormal basis of
eigenvectors of A. Then, as has been noted, there is a uni-
tary matrix U such that U*AU = D is diagonal. This leads
to A = UDU* because U* = U~x
. Next we perform a di-
rect computation to show that A commutes with its complex
transpose:
AA* = UDU*UD*U* = UDD*U*,
and in the same way
A* A = UD*U*UDU* = UD*DU*.
But diagonal matrices always commute, so DD* = D*D. It
follows that AA* = A*A, so that A is normal.
It remains to show that if A is normal, then there is an
orthonormal basis of Cn
consisting entirely of eigenvectors of
A. From 9.1.2 we know that there is a unitary matrix U such
that U* AU = T is upper triangular. The next observation
is that T is also normal. This too is established by a direct
computation:
T*T = U*A*UU*AU = U*(A*A)U.
In the same way TT* = U*(AA*)U. Since A*A = AA*, it
follows that T*T = TT*.
Now equate the (1, 1) entries of T*T and TT*; this yields
the equation

    |t_11|^2 = |t_11|^2 + |t_12|^2 + ⋯ + |t_1n|^2,

which implies that t_12, ..., t_1n are all zero. By looking at the
(2, 2), (3, 3),..., (n, n) entries of T*T and TT*, we see that
all the other off-diagonal entries of T vanish too. Thus T is
actually a diagonal matrix.
Finally, since AU = UT, the columns of U are eigenvec-
tors of A, and they form an orthonormal basis of C n
because
U is unitary. This completes the proof of the theorem.
The last theorem provides us with many examples of di-
agonalizable matrices: for example, complex matrices which
are unitary or hermitian are automatically normal, as are real
symmetric and real orthogonal matrices. Any matrix of these
types can therefore be diagonalized by a unitary matrix.
Exercises 9.1
1. Find unitary or orthogonal matrices which diagonalize the
following matrices:
( a
(i a);
(»>(; j -»)!
2. Suppose that A is a complex matrix with real eigenvalues
which can be diagonalized by a unitary matrix. Prove that A
must be hermitian.
3. Show that an upper triangular matrix is normal if and only
if it is diagonal.
4. Let A be a normal matrix. Show that A is hermitian if and
only if all its eigenvalues are real.
5. A complex matrix A is called skew-hermitian if A* = —A.
Prove the following statements:
(a) a skew-hermitian matrix is normal;
(b) the eigenvalues of a skew-hermitian matrix are purely
imaginary, that is, of the form a√−1 where a is real;
(c) a normal matrix is skew-hermitian if all its eigenvalues
are purely imaginary.
6. Let A be a normal matrix. Prove that A is unitary if and
only if all its eigenvalues c satisfy |c| = 1.
7. Let X be any unit vector in Cn
and put A = In — 2XX*.
Prove that A is both hermitian and unitary. Deduce that
A = A^{-1}.
8. Give an example of a normal matrix which is not hermitian,
skew-hermitian or unitary. [Hint: use Exercises 4, 5, and 6].
9. Let A be a real orthogonal n x n matrix. Prove that A
is similar to a matrix with blocks down the diagonal, each of
which is I_l, −I_m, or else a matrix of the form

    ( cos θ  −sin θ )
    ( sin θ   cos θ )

where 0 < θ < 2π and θ ≠ π. [Hint: by Exercise 6 the
eigenvalues of A have modulus 1; also A is similar to a diagonal
matrix whose diagonal entries are the eigenvalues].
9.2 Quadratic Forms
A quadratic form in the real variables x_1, ..., x_n is a poly-
nomial in x_1, ..., x_n with real coefficients in which every term
has degree 2. For example, the expression ax^2 + 2bxy + cy^2
is a quadratic form in x and y. Quadratic forms occur in
many contexts; for example, the equations of a conic in the
plane and a quadric surface in three-dimensional space involve
quadratic forms.
We begin by observing that the quadratic form

    q = ax^2 + 2bxy + cy^2

in x and y can be written as a product of two vectors and a
symmetric matrix,

    q = ( x  y ) ( a  b ) ( x )
                 ( b  c ) ( y ).

In general any quadratic form q in x_1, ..., x_n can be written
in this form. For let q be given by the equation

    q = Σ_{i=1}^n Σ_{j=1}^n a_ij x_i x_j
where the a_ij are real numbers. Setting A = [a_ij]_{n,n} and
writing X for the column vector with entries x_1, ..., x_n, we see
from the definition of matrix products that q may be written
in the form

    q = X^T AX.

Thus the quadratic form q is determined by the real matrix
A.
At this point we make the crucial observation that noth-
ing is lost if we assume that A is symmetric. For, since X^T AX
is scalar, q may also be written as (X^T AX)^T = X^T A^T X;
therefore

    q = ½(X^T AX + X^T A^T X) = X^T ( ½(A + A^T) ) X.

It follows that A can be replaced by the symmetric matrix
½(A + A^T). For this reason it will in future be tacitly assumed
that the matrix associated with a quadratic form is symmetric.
The observation of the previous paragraph allows us to
apply the Spectral Theorem to an arbitrary quadratic form.
The conclusion is that a quadratic form can be written in
terms of squares only.
Theorem 9.2.1
Let q = X^T AX be an arbitrary quadratic form. Then there is
a real orthogonal matrix S such that q = c_1 x'_1^2 + ⋯ + c_n x'_n^2,
where x'_1, ..., x'_n are the entries of X' = S^T X and c_1, ..., c_n
are the eigenvalues of the matrix A.
Proof
By 9.1.3 there is a real orthogonal matrix S such that S^T AS =
D is diagonal, with diagonal entries c_1, ..., c_n say. Define X'
to be S^T X; then X = SX'. Substituting for X, we find that

    q = X^T AX = (SX')^T A(SX') = (X')^T (S^T AS) X' = (X')^T DX'.

Multiplying out the final matrix product, we find that q =
c_1 x'_1^2 + ⋯ + c_n x'_n^2.
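The change of variables in Theorem 9.2.1 is easy to carry out numerically. The short numpy sketch below (ours) uses the form x^2 + 4xy + y^2 that appears in Example 9.2.1 later in this section and checks that q agrees with the sum of squares in the new coordinates.

import numpy as np

A = np.array([[1.0, 2.0],        # symmetric matrix of q = x^2 + 4xy + y^2
              [2.0, 1.0]])

c, S = np.linalg.eigh(A)         # S is real orthogonal, S^T A S is diagonal
X = np.array([1.5, -0.7])        # an arbitrary point
Xp = S.T @ X                     # new coordinates X' = S^T X
print(X @ A @ X, np.sum(c * Xp**2))   # the two values agree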
Application to conics and quadrics
We recall from the analytical geometry of two dimensions
that a conic is a curve in the plane with equation of the second
degree, the general form being
    ax^2 + 2bxy + cy^2 + dx + ey + f = 0

where the coefficients are real numbers. This can be written
in the matrix form

    X^T AX + ( d  e ) X + f = 0

where

    X = ( x )          A = ( a  b )
        ( y )   and        ( b  c ).
So there is a quadratic form in x and y involved in this conic.
Let us examine the effect on the equation of the conic of ap-
plying the Spectral Theorem.
Let S be a real orthogonal matrix such that

    S^T AS = ( a'  0  )
             ( 0   c' ),

where a' and c' are the eigenvalues of A. Put X' =
S^T X and denote the entries of X' by x', y'; then X = SX'
and the equation of the conic takes the form

    (X')^T ( a'  0  ) X' + ( d  e ) SX' + f = 0,
           ( 0   c' )

or equivalently,

    a'x'^2 + c'y'^2 + d'x' + e'y' + f = 0
for certain real numbers d' and e'. Thus the advantage of
changing to the new variables x' and y' is that no "cross term"
in x'y' appears in the quadratic form.
There is a good geometrical interpretation of this change
of variables: it corresponds to a rotation of axes to a new set
of coordinates x' and y'. Indeed, by Examples 6.2.9 and 7.3.7,
any real 2 x 2 orthogonal matrix represents either a rotation
or a reflection in R2
; however a reflection will not arise in the
present instance: for if it did, the equation of the conic would
have had no cross term to begin with. By Example 7.3.7 the
orthogonal matrix S has the form

    ( cos θ  −sin θ )
    ( sin θ   cos θ )

where θ is the angle of rotation. Since X' = S^T X, we obtain
the equations

    x' = x cos θ + y sin θ
    y' = −x sin θ + y cos θ
The effect of changing the variables from x,y to x',y' is
to rotate the coordinate axes to axes that are parallel to the
axes of the conic, the so-called principal axes.
Finally, by completing the square in x' and y' as nec-
essary, we can obtain the standard form of the conic, and
identify it as an ellipse, parabola, hyperbola (or degenerate
form). This final move amounts to a translation of axes. So
our conclusion is that the equation of any conic can be put in
standard form by a rotation of axes followed by a translation
of axes.
Example 9.2.1
Identify the conic x^2 + 4xy + y^2 + 3x + y − 1 = 0.
The matrix of the quadratic form x^2 + 4xy + y^2 is

    A = ( 1  2 )
        ( 2  1 ).
It was shown in Example 9.1.1 that the eigenvalues of A are
3 and −1 and that A is diagonalized by the orthogonal matrix

    S = (1/√2) ( 1  −1 )
               ( 1   1 ).

Put X' = S^T X, where X' has entries x' and y'; then X = SX'
and we read off that

    x = (1/√2)(x' − y')
    y = (1/√2)(x' + y')
So here θ = π/4 and the correct rotation of axes for this conic
is through angle π/4 in an anticlockwise direction. Substituting
for x and y in the equation of the conic, we get

    3x'^2 − y'^2 + 2√2 x' − √2 y' − 1 = 0.
From this we can already see that the conic is a hyperbola.
To obtain the standard form, complete the square in x' and
y':

    3(x' + √2/3)^2 − (y' + 1/√2)^2 = 1 + 2/3 − 1/2 = 7/6.

Hence the equation of the hyperbola in standard form is

    3x''^2 − y''^2 = 7/6,

where x'' = x' + √2/3 and y'' = y' + 1/√2. This is a hyperbola
whose center is at the point where x' = −√2/3 and
y' = −1/√2; thus the xy-coordinates of the center of the
hyperbola are (1/6, −5/6). The axes of the hyperbola are the
lines x'' = 0 and y'' = 0, that is, x + y = −2/3 and x − y = 1.
Quadrics
A quadric is a surface in three-dimensional space whose
equation has degree 2 and therefore has the form
    ax^2 + by^2 + cz^2 + 2dxy + 2eyz + 2fzx + gx + hy + iz + j = 0.

Let A be the symmetric matrix

    A = ( a  d  f )
        ( d  b  e )
        ( f  e  c ).
Then the equation of the quadric may be written in the form

    X^T AX + ( g  h  i ) X + j = 0,

where X is the column with entries x, y, z.
Recall from analytical geometry that a quadric is one of the
following surfaces: an ellipsoid, a hyperboloid, a paraboloid, a
cone, a cylinder (or a degenerate form). The type of a quadric
can be determined by a rotation to principal axes, just as for
conics. Thus the procedure is to find a real orthogonal matrix
S such that S^T AS = D is diagonal, with entries a', b', c' say.
Put X' = S^T X. Then X = SX' and X^T AX = (X')^T DX':
the equation of the quadric becomes

    (X')^T DX' + ( g  h  i ) SX' + j = 0,

which is equivalent to

    a'x'^2 + b'y'^2 + c'z'^2 + g'x' + h'y' + i'z' + j = 0.

Here a', b', c' are the eigenvalues of A, while g', h', i' are
certain real numbers. By completing the square in x', y', z' as
necessary, we shall obtain the equation of the quadric in stan-
dard form; it will then be possible to recognise its type and
position. The last step represents a translation of axes.
Example 9.2.2
Identify the quadric surface
    x^2 + y^2 + z^2 + 2xy + 2yz + 2zx − x + 2y − z = 0.

The matrix of the relevant quadratic form is

    A = ( 1  1  1 )
        ( 1  1  1 )
        ( 1  1  1 )

and the equation of the quadric in matrix form is

    X^T AX + ( −1  2  −1 ) X = 0.

We diagonalize A by means of an orthogonal matrix. The
eigenvalues of A are found to be 0, 0, 3, with corresponding
unit eigenvectors

    (  1/√2 )    (   0   )    ( 1/√3 )
    ( −1/√2 ) ,  (  1/√2 ) ,  ( 1/√3 ).
    (   0   )    ( −1/√2 )    ( 1/√3 )

The first two vectors generate the eigenspace corresponding to
the eigenvalue 0. We need to find an orthonormal basis of this
subspace; this can be done either by using the Gram-Schmidt
procedure or by guessing. Such a basis turns out to be

    (  1/√2 )          (  1/√6 )
    ( −1/√2 )   and    (  1/√6 ).
    (   0   )          ( −2/√6 )
Therefore A is diagonalized by the orthogonal matrix

    S = (  1/√2   1/√6   1/√3 )
        ( −1/√2   1/√6   1/√3 ).
        (   0    −2/√6   1/√3 )

The matrix S represents a rotation of axes. Put X' = S^T X;
then X = SX' and

    X^T AX = (X')^T (S^T AS) X' = (X')^T DX',

where D is the diagonal matrix with diagonal entries 0, 0, 3.
The equation of the quadric becomes

    (X')^T DX' + ( −1  2  −1 ) SX' = 0,

or

    3z'^2 − (3/√2) x' + (3/√6) y' = 0.

This is a parabolic cylinder whose axis is the line with equations
y' = √3 x', z' = 0.
Definite quadratic forms
Consider once again a quadratic form q = X^T AX in real
variables x_1, ..., x_n, where A is a real symmetric matrix. In
some applications it is the sign of q that is significant.
The quadratic form q is said to be positive definite if q > 0
whenever X ≠ 0. Similarly, q is called negative definite if q < 0
whenever X ≠ 0. If, however, q can take both positive and
negative values, then q is said to be indefinite. The terms
positive definite, negative definite and indefinite can also be
applied to a real symmetric matrix A, according to the behav-
ior of the corresponding quadratic form q = X^T AX.
For example, the expression 2x^2 + 3y^2 is positive unless
x = 0 = y, so this is a positive definite quadratic form, while
−2x^2 − 3y^2 is clearly negative definite. On the other hand,
the form 2x^2 − 3y^2 can take both positive and negative values,
so it is indefinite.
In these examples it was easy to decide the nature of the
quadratic form since it contained only squared terms. How-
ever, in the case of a general quadratic form, it is not possible
to decide the nature of the form by simple inspection. The
diagonalization process for symmetric matrices allows us to
reduce the problem to a quadratic form whose matrix is diag-
onal, and which therefore involves only squared terms. From
this it is apparent that it is the signs of the eigenvalues of the
matrix A that are important. The definitive result is
Theorem 9.2.2
Let A be a real symmetric matrix and let q = X^T AX. Then:
(a) q is positive definite if and only if all the eigenvalues
of A are positive;
(b) q is negative definite if and only if all the eigenvalues
of A are negative;
(c) q is indefinite if and only if A has both positive and
negative eigenvalues.
Proof
There is a real orthogonal matrix S such that S^T AS = D
is diagonal, with diagonal entries c_1, ..., c_n, say. Put X' =
S^T X; then X = SX' and

    q = X^T AX = (X')^T (S^T AS) X' = (X')^T DX',

so that q takes the form

    q = c_1 x'_1^2 + c_2 x'_2^2 + ⋯ + c_n x'_n^2,

where x'_1, ..., x'_n are the entries of X'. Thus q, considered as
a quadratic form in x'_1, ..., x'_n, involves only squares. Now
observe that as X varies over the set of all non-zero vectors
in R^n, so does X' = S^T X. This is because S^T = S^{-1} is
invertible. Therefore q > 0 for all non-zero X if and only if
q > 0 for all non-zero X'. In this way we see that it is sufficient
to discuss the behavior of q as a quadratic form in x'_1, ..., x'_n.
Clearly q will be positive definite as such a form precisely when
c_1, ..., c_n are all positive, with a corresponding statement for
negative definite: but q is indefinite if there are positive and
negative c_i's. Finally c_1, ..., c_n are just the eigenvalues of A,
so the assertion of the theorem is proved.
Let us consider in greater detail the important case of a
quadratic form q in two variables x and y, say
    q = ax^2 + 2bxy + cy^2;

the associated symmetric matrix is

    A = ( a  b )
        ( b  c ).

Let the eigenvalues of A be d_1 and d_2. Then by 8.1.3 we have
the relations det(A) = d_1 d_2 and tr(A) = d_1 + d_2; hence

    d_1 d_2 = ac − b^2   and   d_1 + d_2 = a + c.

Now according to 9.2.2 the form q is positive definite if and
only if d_1 and d_2 are both positive. This happens precisely
when ac > b^2 and a > 0. For these conditions are certainly
necessary if d_1 and d_2 are to be positive, while if the conditions
hold, a and c must both be positive since the inequality ac > b^2
shows that a and c have the same sign.
In a similar way we argue that the conditions for A to be
negative definite are ac > b^2 and a < 0. Finally, q is indefinite
if and only if ac < b^2: for by 9.2.2 the condition for q to be
indefinite is that d_1 and d_2 have opposite signs, and this is
equivalent to the inequality d_1 d_2 < 0. Therefore we have the
following result.
Corollary 9.2.3
Let q = ax^2 + 2bxy + cy^2 be a quadratic form in x and y.
Then:
(a) q is positive definite if and only if ac > b^2 and a > 0;
(b) q is negative definite if and only if ac > b^2 and a < 0;
(c) q is indefinite if and only if ac < b^2.
Example 9.2.3
Let q = −2x^2 + xy − 3y^2. Here we have a = −2, b = 1/2,
c = −3. Since ac − b^2 > 0 and a < 0, the quadratic form is
negative definite, by 9.2.3.
The status of a quadratic form in three or more variables
can be determined by using 9.2.2.
Example 9.2.4
Let q = −2x^2 − y^2 − 2z^2 + 6xz be a quadratic form in x, y, z.
The matrix of the form is

    A = ( −2   0   3 )
        (  0  −1   0 )
        (  3   0  −2 ),

which has eigenvalues −5, −1, 1. Hence q is indefinite.
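The eigenvalue test of Theorem 9.2.2 is straightforward to automate. The Python sketch below (ours; the function name classify_form is an assumption) applies it to the form of Example 9.2.4.

import numpy as np

def classify_form(A, tol=1e-12):
    """Classify q = X^T A X by the signs of the eigenvalues (Theorem 9.2.2)."""
    w = np.linalg.eigvalsh(A)
    if np.all(w > tol):
        return "positive definite"
    if np.all(w < -tol):
        return "negative definite"
    if np.any(w > tol) and np.any(w < -tol):
        return "indefinite"
    return "semidefinite (some eigenvalues are zero)"

A = np.array([[-2.0, 0.0, 3.0],
              [0.0, -1.0, 0.0],
              [3.0, 0.0, -2.0]])
print(np.linalg.eigvalsh(A))   # eigenvalues -5, -1, 1
print(classify_form(A))        # indefinite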
Next we record a very different criterion for a matrix to
be positive definite. While it is not a practical test, it has a
very striking form.
Theorem 9.2.4
Let A be a real symmetric matrix. Then A is positive definite
if and only if A = B^T B for some invertible real matrix B.
Proof
Suppose first that A = B^T B with B an invertible matrix.
Then the quadratic form q = X^T AX can be rewritten as

    q = X^T B^T BX = (BX)^T BX = ||BX||^2.
If X ≠ 0, then BX ≠ 0 since B is invertible. Hence ||BX||^2 is
positive if X ≠ 0. It follows that q, and hence A, is positive
definite.
Conversely, suppose that A is positive definite, so that
all its eigenvalues are positive. Now there is a real orthogonal
matrix S such that S^T AS = D is diagonal, with diagonal
entries d_1, ..., d_n say. Here the d_i are the eigenvalues of A, so
all of them are positive. Define √D to be the real diagonal
matrix with diagonal entries √d_1, ..., √d_n. Then we have
A = (S^T)^{-1} D S^{-1} = SDS^T since S^T = S^{-1}, and hence

    A = S(√D √D)S^T = (√D S^T)^T (√D S^T).

Finally, put B = √D S^T and observe that B is invertible since
both S and √D are.
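A matrix B with A = B^T B can be produced numerically in two ways: by the Cholesky factorization, or by the construction √D S^T used in the proof. The following numpy sketch (ours; the sample matrix is an assumption) illustrates both.

import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])            # positive definite: both eigenvalues positive

L = np.linalg.cholesky(A)             # lower triangular L with A = L L^T
B = L.T                               # so B = L^T satisfies A = B^T B
print(np.allclose(B.T @ B, A))        # True

d, S = np.linalg.eigh(A)              # the proof's construction: B = sqrt(D) S^T
B2 = np.diag(np.sqrt(d)) @ S.T
print(np.allclose(B2.T @ B2, A))      # True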
Application to local maxima and minima
A well-known use of quadratic forms is to determine if
a critical point of a function of several variables is a local
maximum or a local minimum. We recall briefly the nature of
the problem; for a detailed account the reader is referred to a
textbook on calculus such as [18].
Let f be a function of independent real variables x_1, ...,
x_n whose first order partial derivatives exist in some region
R. A point P(a_1, ..., a_n) of R is called a local maximum (minimum)
of f if within some neighborhood of P the function f
assumes its largest (smallest) value at P. A basic result states
that if P is a local maximum or minimum of f lying inside
R, then all the first order partial derivatives of f vanish at P:

    f_{x_i}(a_1, ..., a_n) = 0   for i = 1, ..., n.

A point at which all these partial derivatives are zero is called
a critical point of f. Thus every local maximum or minimum
is a critical point of f. However there may be critical points
which are not local maxima or minima, but are saddle points
of f.
For example, the function f(x, y) = x^2 − y^2 has a saddle
point at the origin, as shown in the diagram.
The problem is to devise a test which can distinguish
local maxima and minima from saddle points. Such a test is
furnished by the criterion for a quadratic form to be positive
definite, negative definite or indefinite.
For simplicity we assume that f is a function of two variables
x and y. Assume further that f and its partial derivatives
of degree at most three are continuous inside a region R
of the plane, and that (x_0, y_0) is a critical point of f in R.
Apply Taylor's Theorem to the function f at the point
(x_0, y_0), keeping in mind that f_x(x_0, y_0) = 0 = f_y(x_0, y_0). If h
and k are sufficiently small, then f(x_0 + h, y_0 + k) − f(x_0, y_0)
equals

    ½ ( h^2 f_xx(x_0, y_0) + 2hk f_xy(x_0, y_0) + k^2 f_yy(x_0, y_0) ) + S;

here S is a remainder term which is a polynomial of degree 3
or higher in h and k. Write a = f_xx(x_0, y_0), b = f_xy(x_0, y_0)
and c = f_yy(x_0, y_0); then

    f(x_0 + h, y_0 + k) − f(x_0, y_0) = ½ (ah^2 + 2bhk + ck^2) + S.

Here S is small compared to the other terms of the sum if h
and k are small.
Let q = ax^2 + 2bxy + cy^2. If q is negative definite, then
f(x_0 + h, y_0 + k) < f(x_0, y_0) when h and k are small and P is
a local maximum. On the other hand, if q is positive definite,
then P is a local minimum since f(x_0 + h, y_0 + k) > f(x_0, y_0)
for sufficiently small h and k. Finally, should q be indefinite,
the expression f(x_0 + h, y_0 + k) − f(x_0, y_0) can be both positive
and negative, so P is neither a local maximum nor a local
minimum, but a saddle point.
Thus the crucial quadratic form which provides us with
a test for P to be a local maximum or minimum arises from
the matrix

    H = ( f_xx  f_xy )
        ( f_xy  f_yy ).

If the matrix H(x_0, y_0) is positive definite or negative definite,
then f will have a local minimum or local maximum respectively
at P. If, however, H(x_0, y_0) is indefinite, then P will
be a saddle point of f. Combining this result with 9.2.3, we
obtain
Theorem 9.2.5
Let f be a function of x and y and assume that f and its
partial derivatives of order ≤ 3 are continuous in some region
containing the critical point P(x_0, y_0). Let D = f_xx f_yy − f_xy^2.
(a) If D(x_0, y_0) > 0 and f_xx(x_0, y_0) < 0, then P is a
local maximum of f;
(b) If D(x_0, y_0) > 0 and f_xx(x_0, y_0) > 0, then P is a
local minimum of f;
(c) If D(x_0, y_0) < 0, then P is a saddle point of f.
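The test is simple enough to code directly. The Python fragment below (ours; the function name classify_critical_point is an assumption) takes the values of the second partial derivatives at a critical point and applies 9.2.5, here to the saddle point of f(x, y) = x^2 − y^2 at the origin.

def classify_critical_point(fxx, fyy, fxy):
    """Second-derivative test of 9.2.5, given the second partials at a critical point."""
    D = fxx * fyy - fxy**2
    if D > 0 and fxx < 0:
        return "local maximum"
    if D > 0 and fxx > 0:
        return "local minimum"
    if D < 0:
        return "saddle point"
    return "test is inconclusive"

# f(x, y) = x^2 - y^2 at the origin: fxx = 2, fyy = -2, fxy = 0
print(classify_critical_point(2.0, -2.0, 0.0))   # saddle point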
The argument just given for a function of two variables
can be applied to a function f of n variables x_1, ..., x_n. The
relevant quadratic form in this case is obtained from the
matrix

    H = ( f_{x_1 x_1}  f_{x_1 x_2}  ...  f_{x_1 x_n} )
        ( f_{x_2 x_1}  f_{x_2 x_2}  ...  f_{x_2 x_n} )
        (     ⋮            ⋮                ⋮        )
        ( f_{x_n x_1}  f_{x_n x_2}  ...  f_{x_n x_n} ),

which is called the hessian of the function f. Notice that the
hessian matrix is symmetric since f_{x_i x_j} = f_{x_j x_i}, provided that
f and all its derivatives of order ≤ 3 are continuous.
The fundamental theorem may now be stated.
Theorem 9.2.6
Let f be a function of independent variables x_1, ..., x_n. Assume
that f and its partial derivatives of order ≤ 3 are continuous
in a region containing a critical point P(a_1, a_2, ..., a_n).
Let H be the hessian of f.
(a) If H(a_1, ..., a_n) is positive definite, then P is a local
minimum of f;
(b) if H(a_1, ..., a_n) is negative definite, then P is a local
maximum of f;
(c) if H(a_1, ..., a_n) is indefinite, then P is a saddle point
of f.
Example 9.2.5
Consider the function f(x, y) = (x^2 − 2x) cos y. It has a
critical point at (1, π), since both first derivatives vanish there.
To decide the nature of this point we compute the hessian of f as

    H = (      2 cos y         −(2x − 2) sin y   )
        ( −(2x − 2) sin y    −(x^2 − 2x) cos y   ).

Hence

    H(1, π) = ( −2   0 )
              (  0  −1 ),

which is clearly negative definite. Thus the point in question is a local maximum of f.
Notice that the test given in 9.2.6 will fail to decide the
nature of the critical point P if at P the matrix H is not
positive definite, negative definite or indefinite: for example,
H might equal 0 at P.
Extremal values of a quadratic form
Consider a quadratic form in variables x_1, ..., x_n,

    q = X^T AX,

where as usual A is a real symmetric n × n matrix and X is the
column consisting of x_1, ..., x_n. Suppose that we want to find
the maximum and minimum values of q when X is subject to
a restriction. One possible restriction is that

    ||X|| = a

for some a > 0, that is, x_1^2 + ⋯ + x_n^2 = a^2. Thus we are
looking for the maximum and minimum values of q on the n-
sphere with radius a and center the origin in R^n. One could
use calculus to attack this problem, but it is simpler to employ
diagonalization.
There is a real orthogonal n × n matrix S such that
S^T AS = D, where D is the diagonal matrix with the eigenvalues
of A, say d_1, ..., d_n, on its diagonal. Put Y = S^{-1}X:
thus we have X = SY and

    q = X^T AX = Y^T S^T ASY = Y^T DY = d_1 y_1^2 + ⋯ + d_n y_n^2,

where y_1, ..., y_n are the entries of Y.
In addition we find that

    X^T X = Y^T (S^T S)Y = Y^T Y,

since S^T = S^{-1}. Therefore our problem may be reformulated
as follows: find the maximum and minimum values of the expression
d_1 y_1^2 + ⋯ + d_n y_n^2 subject to y_1^2 + ⋯ + y_n^2 = a^2. But this
is easily answered. For assume that m and M are respectively
the smallest and the largest eigenvalues of A. Then

    q = d_1 y_1^2 + ⋯ + d_n y_n^2 ≤ M(y_1^2 + ⋯ + y_n^2) = Ma^2,

and

    q = d_1 y_1^2 + ⋯ + d_n y_n^2 ≥ m(y_1^2 + ⋯ + y_n^2) = ma^2.

Suppose that the largest eigenvalue M occurs for k different
y_j's; then we can take each of the corresponding y_j's to be
equal to a/√k and all other y_j's to be 0. Then y_1^2 + ⋯ + y_n^2 = a^2
and the value of q at this point is exactly

    Mk(a/√k)^2 = Ma^2.

It follows that the largest value of q on the n-sphere really is
Ma^2. By a similar argument the smallest value of q on the
n-sphere is ma^2. We state this conclusion as:
Theorem 9.2.7
The minimum and maximum values of the quadratic form q =
X^T AX for ||X|| = a > 0 are respectively ma^2 and Ma^2, where
m and M are the smallest and largest eigenvalues of the real
symmetric matrix A.
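Theorem 9.2.7 is easy to verify experimentally. The numpy sketch below (ours) uses the form q = 2x^2 + 2y^2 + 3z^2 + 4yz of Exercise 9 at the end of this section, computing the extreme eigenvalues and then sampling random points on the unit sphere.

import numpy as np

A = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 2.0],
              [0.0, 2.0, 3.0]])       # q = 2x^2 + 2y^2 + 3z^2 + 4yz
w = np.linalg.eigvalsh(A)
a = 1.0
print(w.min() * a**2, w.max() * a**2)  # predicted smallest and largest values of q

X = np.random.randn(3, 100000)         # random directions ...
X /= np.linalg.norm(X, axis=0)         # ... scaled onto the unit sphere
q = np.einsum('ij,ik,kj->j', X, A, X)  # q at each sample point
print(q.min(), q.max())                # close to the predicted extremes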
We conclude with a geometrical example.
Example 9.2.6
The equation of an ellipsoid with center the origin is given as
X^T AX = c, where A is a real symmetric 3 × 3 matrix and c
is a positive constant. Find the radius of the largest sphere
with center the origin which lies entirely within the ellipsoid.
By a rotation to principal axes we can write the equation
of the ellipsoid in the form d x'^2 + e y'^2 + f z'^2 = c, where the
eigenvalues d, e, f of A are positive. Hence the equation of the
ellipsoid takes the standard form

    x'^2/(c/d) + y'^2/(c/e) + z'^2/(c/f) = 1.

Clearly the sphere will lie entirely inside the ellipsoid provided
that its radius a does not exceed the length of any of
the semi-axes: thus a cannot be larger than any of

    √(c/d),  √(c/e),  √(c/f).

Therefore the condition on a is that a ≤ √(c/M), where M is
the biggest of the eigenvalues d, e, f. Thus the largest sphere
which is contained entirely within the ellipsoid has radius √(c/M).
Exercises 9.2
1. Determine if the following quadratic forms are positive
definite, negative definite or indefinite:
(a) 2x^2 - 2xy + 3y^2;
(b) x^2 - 3xz - 2y^2 + z^2;
(c) x^2 + y^2 + xz + yz.
2. Determine if the following matrix is positive definite, neg-
ative definite or indefinite:
G i0-
3. A quadratic form q = X^T A X is called positive semidefinite
if q ≥ 0 for all X. The definition of negative semidefinite is
similar. Prove that q is positive semidefinite if and only if all
the eigenvalues of A are ≥ 0, and negative semidefinite if and
only if all the eigenvalues are ≤ 0.
4. Let A be a positive definite n x n matrix and let S be a
real invertible n x n matrix. Prove that S^T A S is also positive
definite.
5. Let A be a real symmetric matrix. Prove that A is nega-
tive definite if and only if it has the form -(B^T B) for some
invertible matrix B.
6. Identify the following conics:
(a) Ux^2 - 16xy + hy^2 = 6; (b) 2x^2 + 4xy + 2y^2 + x - 3y = 1.
7. Identify the following quadrics:
(a) 2x^2 + 2y^2 + 3z^2 + 4yz = 3; (b) 2x^2 + 2y^2 + z^2 + 4xz = 4.
8. Classify the critical points of the following functions as
local maxima, local minima or saddle points:
(a) x^2 + 2xy + 2y^2 + 4x;
(b) (x + y)^3 + (x - y)^3 - 12(3x + y);
(c) x^2 + y^2 + 3z^2 - xy + 2xz - z.
9. Find the smallest and largest values of the quadratic form
q = 2x^2 + 2y^2 + 3z^2 + 4yz when the point (x, y, z) is required
to lie on the sphere with radius 1 and center the origin.
10. Let X^T A X = c be the equation of an ellipsoid with center
the origin, where A is a real symmetric 3 x 3 matrix and c is a
positive constant. Show that the radius of the smallest sphere
with center the origin which contains the ellipsoid is √(c/m),
where m is the smallest eigenvalue of A.
11. Show that 5x^2 + 2xy + 2y^2 + 5z^2 = 1 is the equation of
an ellipsoid with center the origin. Then find the radius of
the smallest and largest sphere with center the origin which
contains, respectively is contained in, the ellipsoid.
9.3 Bilinear Forms
Roughly speaking, a bilinear form is a scalar-valued linear
function of two vector variables. One type of a bilinear form
which we have already met is an inner product on a real vector
space. It will be seen that there is a close connection between
bilinear forms and quadratic forms.
Let V be a vector space over a field of scalars F and write
V x V
for the set of all pairs (u, v) of vectors from V. Then a bilinear
form on V is a function
f : V x V → F,
that is, a rule assigning to each pair of vectors (u, v) a scalar
f(u, v), which satisfies the following requirements:
(i) f(u_1 + u_2, v) = f(u_1, v) + f(u_2, v);
(ii) f(u, v_1 + v_2) = f(u, v_1) + f(u, v_2);
(iii) f(cu, v) = c f(u, v);
(iv) f(u, cv) = c f(u, v).
These rules must hold for all vectors u, u_1, u_2, v, v_1, v_2 in V
and all scalars c in F. The effect of the four defining properties
is to make f(u, v) "linear" in both the variables u and v.
As has been mentioned, an inner product < , > on a real
vector space is a bilinear form f in which
f(u, v) = < u, v >.
Indeed the defining properties of the inner product guarantee
this.
A very important example of a bilinear form arises when-
ever a square matrix is given.
Example 9.3.1
Let A be an n x n matrix over a field F. A function
f : F^n x F^n → F is defined by the rule
f(X, Y) = X^T A Y.
That f is a bilinear form on F^n follows from the usual rules of
matrix algebra. The importance of this example stems from
the fact that it is typical of bilinear forms on finite-dimensional
vector spaces in a sense that will now be made precise.
Matrix representation of bilinear forms
Suppose that f : V x V → F is a bilinear form on a vector
space V of dimension n over a field F. Choose an ordered basis
B = {v_1, ..., v_n} of V and define a_ij to be the scalar f(v_i, v_j).
Thus we can associate with f the n x n matrix
A = [a_ij].
Now let u and v be arbitrary vectors of V and write them
in terms of the basis as u = Σ_{i=1}^n b_i v_i and v = Σ_{j=1}^n c_j v_j;
then the coordinate vectors of u and v with respect to the
given basis are
[u]_B = (b_1, ..., b_n)^T and [v]_B = (c_1, ..., c_n)^T.
The linearity properties of f can be used to compute f(u, v)
in terms of the matrix A:
f(u, v) = f(Σ_{i=1}^n b_i v_i, Σ_{j=1}^n c_j v_j) = Σ_{i=1}^n b_i f(v_i, Σ_{j=1}^n c_j v_j)
        = Σ_{i=1}^n Σ_{j=1}^n b_i c_j f(v_i, v_j).
Since f(v_i, v_j) = a_ij, this becomes
f(u, v) = Σ_{i=1}^n Σ_{j=1}^n b_i a_ij c_j,
from which we obtain the fundamental equation
f(u, v) = ([u]_B)^T A [v]_B.
Thus the bilinear form f is represented with respect to the
basis B by the n x n matrix A whose (i, j) entry is f(v_i, v_j).
The values of f can be computed using the above rule. In
particular, if f is a bilinear form on F^n and the standard
basis of F^n is used, then f(X, Y) = X^T A Y.
Conversely, if we start with a matrix A and define f by
means of the equation f(u, v) = ([u]_B)^T A [v]_B, then it is easy
to verify that f is a bilinear form on V and that the matrix
representing f with respect to the basis B is A.
Now suppose we decide to use another ordered basis B':
what will be the effect on the matrix A? Let S be the invertible
matrix which describes the change of basis B' → B. Thus
[u]_B = S[u]_B', according to 6.2.4. Therefore
f(u, v) = (S[u]_B')^T A (S[v]_B') = ([u]_B')^T (S^T A S) [v]_B',
which shows that the matrix S^T A S represents f with respect
to the basis B'.
At this point we recognize that a new relation between
matrices has arisen: a matrix B is said to be congruent to a
matrix A if there is an invertible matrix S such that
B = S^T A S.
While there is an analogy between congruence and similarity
of matrices, in general similar matrices need not be congruent,
nor congruent matrices similar.
The point that has emerged from the preceding discussion
is that matrices which represent the same bilinear form with
respect to different bases of the vector space are congruent.
This result is to be compared with the fact that the matrices
representing the same linear transformation are similar.
The conclusions of the last few paragraphs are sum-
marized in the following basic theorem.
Theorem 9.3.1
(i) Let f be a bilinear form on an n-dimensional vector space
V over a field F and let B = {v_1, ..., v_n} be an ordered basis
of V. Define A to be the n x n matrix whose (i, j) entry is
f(v_i, v_j); then
f(u, v) = ([u]_B)^T A [v]_B,
and A is the n x n matrix representing f with respect to B.
(ii) If B' is another ordered basis of V, then f is represented
with respect to B' by the matrix S^T A S, where S is the invertible
matrix describing the basis change B' → B.
(iii) Conversely, if A is any n x n matrix over F, a bilinear
form on V is defined by the rule f(u, v) = ([u]_B)^T A [v]_B. It
is represented by the matrix A with respect to the basis B.
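The theorem is easy to illustrate numerically. Below is a small Python/NumPy sketch (not from the text); the matrix A, the change of basis matrix S and the coordinate vectors are arbitrary illustrative choices.

    # Illustrate Theorem 9.3.1: a bilinear form represented by A with respect
    # to one basis is represented by S^T A S after the change of basis B' -> B.
    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 1.0]])        # matrix of f with respect to the basis B

    def f(X, Y):
        return X @ A @ Y              # f(u, v) = ([u]_B)^T A [v]_B

    S = np.array([[1.0, 1.0],
                  [0.0, 1.0]])        # invertible: describes the change B' -> B
    A_new = S.T @ A @ S               # matrix of f with respect to B'

    u_prime = np.array([1.0, -2.0])   # coordinates [u]_B'
    v_prime = np.array([3.0, 0.5])    # coordinates [v]_B'

    lhs = f(S @ u_prime, S @ v_prime) # evaluate via [u]_B = S [u]_B'
    rhs = u_prime @ A_new @ v_prime   # evaluate via the matrix S^T A S
    print(np.isclose(lhs, rhs))       # True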
Symmetric and skew-symmetric bilinear forms
A bilinear form f on a vector space V is called symmetric
if its values are unchanged by reversing the arguments, that
is, if
f(u, v) = f(v, u)
for all vectors u and v. Similarly, f is said to be skew-
symmetric if
f(u, v) = -f(v, u)
is always valid. Notice the consequence f(u, u) = 0 for all
vectors u. For example, any real inner product is a symmetric
bilinear form; on the other hand, the form defined by the rule
f(X, Y) = x_1 y_2 - x_2 y_1
is an example of a skew-symmetric bilinear form on R^2. As
the reader may suspect, there are connections with symmetric
and skew-symmetric matrices.
Theorem 9.3.2
Let f be a bilinear form on a finite-dimensional vector space
V and let A be a matrix representing f with respect to some
basis of V. Then f is symmetric if and only if A is symmetric
and f is skew-symmetric if and only if A is skew-symmetric.
Proof
Let A be symmetric. Then, remembering that [u]^T A [v] is a
scalar, we have
f(u, v) = [u]^T A [v] = ([u]^T A [v])^T = [v]^T A^T [u] = [v]^T A [u]
        = f(v, u).
Therefore f is symmetric. Conversely, suppose that f is sym-
metric, and let the ordered basis in question be {v_1, ..., v_n}.
Then a_ij = f(v_i, v_j) = f(v_j, v_i) = a_ji, so that A is symmet-
ric.
The proof of the skew-symmetric case is similar and is
left as an exercise.
Symmetric bilinear forms and quadratic forms
Let f be a bilinear form on R^n given by f(X, Y) =
X^T A Y. Then f determines a quadratic form q where
q = f(X, X) = X^T A X.
Conversely, if q is a quadratic form in x_1, ..., x_n, we can
define a corresponding symmetric bilinear form f on R^n by
means of the rule
f(X, Y) = (1/2){q(X + Y) - q(X) - q(Y)},
where X and Y are the column vectors consisting of x_1, ..., x_n
and y_1, ..., y_n. To see that f is bilinear, first write q(X) =
X^T A X with A symmetric; then we have
f(X, Y) = (1/2){(X + Y)^T A (X + Y) - X^T A X - Y^T A Y}
        = (1/2)(X^T A Y + Y^T A X)
        = X^T A Y,
since X^T A Y = (X^T A Y)^T = Y^T A X. This shows that f is
bilinear.
It is readily seen that the correspondence q → f just
described is a bijection from quadratic forms to symmetric
bilinear forms on R^n.
Theorem 9.3.3
There is a bijection from the set of quadratic forms in n vari-
ables to the set of symmetric bilinear forms on R^n.
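The passage from q back to f is just the polarization rule used above; a minimal Python/NumPy check (not from the text, with an arbitrarily chosen symmetric matrix A) is:

    # Recover the symmetric bilinear form from its quadratic form via
    # f(X, Y) = (q(X + Y) - q(X) - q(Y)) / 2 and compare with X^T A Y.
    import numpy as np

    A = np.array([[3.0, 1.0],
                  [1.0, 3.0]])        # an arbitrary symmetric matrix

    def q(X):
        return X @ A @ X              # quadratic form q(X) = X^T A X

    def f(X, Y):
        return 0.5 * (q(X + Y) - q(X) - q(Y))

    X = np.array([1.0, 2.0])
    Y = np.array([-1.0, 4.0])
    print(np.isclose(f(X, Y), X @ A @ Y))   # True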
From past experience we would expect to get significant
information about symmetric bilinear forms by using the Spec-
tral Theorem. In fact what is obtained is a canonical or stan-
dard form for such bilinear forms.
Theorem 9.3.4
Let f be a symmetric bilinear form on an n-dimensional real
vector space V. Then there is a basis B of V such that
f(u, v) = u_1 v_1 + ··· + u_k v_k - u_{k+1} v_{k+1} - ··· - u_l v_l,
where u_1, ..., u_n and v_1, ..., v_n are the entries of the coordi-
nate vectors [u]_B and [v]_B respectively and k and l are integers
satisfying 0 ≤ k ≤ l ≤ n.
Proof
Let f be represented by a matrix A with respect to some basis
B' of V. Then A is symmetric. Hence there is an orthogonal
matrix S such that S^T A S = D is diagonal, say with diagonal
entries d_1, ..., d_n; of course these are the eigenvalues of A.
Here we can assume that d_1, ..., d_k > 0, while d_{k+1}, ..., d_l < 0
and d_{l+1} = ··· = d_n = 0, by reordering the basis if necessary.
Let E be the n x n diagonal matrix whose diagonal entries are
the real numbers
1/√d_1, ..., 1/√d_k, 1/√(-d_{k+1}), ..., 1/√(-d_l), 1, ..., 1.
Then
(SE)^T A (SE) = E^T (S^T A S) E = EDE,
and the final product is the block diagonal matrix

    B = [ I_k    0         0 ]
        [ 0    -I_{l-k}    0 ]
        [ 0      0         0 ]

Now the matrix SE is invertible, so its inverse determines a
change of basis from B' to say B. Then f will be represented
by the matrix B with respect to the basis B. Finally, f(u, v) =
([u]_B)^T B [v]_B, so the result follows on multiplying the matrices
together.
Example 9.3.2
Find the canonical form of the symmetric bilinear form on R^2
defined by f(X, Y) = x_1 y_1 + 2 x_1 y_2 + 2 x_2 y_1 + x_2 y_2.
The matrix of the bilinear form with respect to the stan-
dard basis is

    A = [ 1  2 ]
        [ 2  1 ]

which, by Example 9.1.1, has eigenvalues 3 and -1, and is
diagonalized by the orthogonal matrix

    S = (1/√2) [ 1  -1 ]
               [ 1   1 ]

Writing X = SX' and Y = SY', we then have

    f(X, Y) = X^T A Y = (X')^T S^T A S Y' = (X')^T [ 3   0 ] Y',
                                                   [ 0  -1 ]

so that
f(X, Y) = 3 x'_1 y'_1 - x'_2 y'_2.
Here x'_1 = (1/√2)(x_1 + x_2) and x'_2 = (1/√2)(-x_1 + x_2), with corre-
sponding formulas in y.
To obtain the canonical form of f, put x''_1 = √3 x'_1, y''_1 =
√3 y'_1, and x''_2 = x'_2, y''_2 = y'_2. Then
f(X, Y) = x''_1 y''_1 - x''_2 y''_2,
which is the canonical form specified in 9.3.4.
Eigenvalues of congruent matrices
Since congruent matrices represent the same symmetric
bilinear form, it is natural to expect that such matrices should
have some common properties, as similar matrices do. How-
ever, whereas similar matrices have the same eigenvalues, this
is not true of congruent matrices. For example, the matrix
( o - a )
has eigenvalues 2 and —3, but the congruent matrix
(i!)G-°)(i 9-('-?)
has eigenvalues —2 and 3.
Notice that, although the eigenvalues of these congruent
matrices are different, the numbers of positive and negative
eigenvalues are the same for each matrix. This is an instance
of a general result.
Theorem 9.3.5 (Sylvester's Law of Inertia)
Let A be a real symmetric n x n matrix and S an invertible
n x n matrix. Then A and S^T A S have the same numbers of
positive, negative and zero eigenvalues.
Proof
Assume first of all that A is invertible; this is the essential
case. Recall that by 7.3.6 it is possible to write S in the form
QR where Q is real orthogonal and R is real upper triangular
with positive diagonal entries; this was a consequence of the
Gram-Schmidt process.
The idea of the proof is to obtain a continuous chain
of matrices leading from S to the orthogonal matrix Q; the
point of this is that Q^T A Q = Q^{-1} A Q certainly has the same
eigenvalues as A. Define
S(t) = tQ + (1 - t)S,
where 0 ≤ t ≤ 1. Thus S(0) = S while S(1) = Q. Now write
U = tI + (1 - t)R, so that S(t) = QU. Next, U is an upper
triangular matrix and its diagonal entries are t + (1 - t)r_ii;
these cannot be zero since r_ii > 0 and 0 ≤ t ≤ 1. Hence U is
invertible, while Q is certainly invertible since it is orthogonal.
It follows that S(t) = QU is invertible; thus det(S(t)) ≠ 0.
Now consider A(t) = S(t)^T A S(t); since
det(A(t)) = det(A) det(S(t))^2 ≠ 0,
it follows that A(t) cannot have zero eigenvalues. Now as t
goes from 0 to 1, the eigenvalues of A(0) = S^T A S gradually
change to those of A(1) = Q^T A Q, that is, to those of A.
But in the process no eigenvalue can change sign because the
eigenvalues that appear are continuous functions of t and they
are never zero. Consequently the numbers of positive and
negative eigenvalues of S^T A S are equal to those of A.
Finally, what if A is singular? In this situation the trick
is to consider the matrix A + εI, which may be thought of as
a "perturbation" of A. Now A + εI will be invertible provided
that ε is sufficiently small and positive: for det(A + xI) is a
polynomial of degree n in x, so it vanishes for at most n values
of x. The previous argument shows that the result is true for
A + εI if ε is small and positive; then by taking the limit as
ε → 0, we can deduce the result for A.
It follows from this theorem that the numbers of posi-
tive and negative signs that appear in the canonical form of
9.3.4 are uniquely determined by the bilinear form and do not
depend on the particular basis chosen.
Example 9.3.3
Show that the matrices

    [ 2  1 ]        [ 1  2 ]
    [ 1  2 ]   and  [ 2  1 ]

are not congruent.
All one need do here is note that the first matrix has
eigenvalues 1, 3, while the second has eigenvalues 3, -1. Hence
by 9.3.5 they cannot be congruent.
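Sylvester's Law is easy to test numerically: congruent matrices may have different eigenvalues, but the counts of positive, negative and zero eigenvalues agree. A small Python/NumPy sketch (not from the text; A and S are arbitrary illustrative choices):

    import numpy as np

    def inertia(M, tol=1e-10):
        # numbers of positive, negative and zero eigenvalues of a symmetric M
        eigenvalues = np.linalg.eigvalsh(M)
        return (int(np.sum(eigenvalues > tol)),
                int(np.sum(eigenvalues < -tol)),
                int(np.sum(np.abs(eigenvalues) <= tol)))

    A = np.array([[2.0,  0.0],
                  [0.0, -3.0]])
    S = np.array([[1.0, 1.0],
                  [0.0, 1.0]])              # any invertible matrix

    print(np.linalg.eigvalsh(S.T @ A @ S))  # new eigenvalues, e.g. -2 and 3
    print(inertia(A))                       # (1, 1, 0)
    print(inertia(S.T @ A @ S))             # (1, 1, 0): the inertia is unchanged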
Skew-symmetric bilinear forms
Having seen that there is a canonical form for symmetric
bilinear forms on real vector spaces, we are led to enquire if
something similar can be done for skew-symmetric bilinear
forms. By 9.3.2 this is equivalent to trying to describe all
skew-symmetric matrices up to congruence. The theorem that
follows provides a solution to this problem.
Theorem 9.3.6
Let f be a skew-symmetric bilinear form on an n-dimensional
vector space V over either R or C. Then there is an ordered
basis of V of the form {u_1, v_1, ..., u_k, v_k, w_1, ..., w_{n-2k}},
where 0 ≤ 2k ≤ n, such that
f(u_i, v_i) = 1 = -f(v_i, u_i), i = 1, ..., k,
and f vanishes on all other pairs of basis elements.
Let us examine the consequence of this theorem before
setting out to prove it. If we use the basis provided by the
theorem, the bilinear form f is represented by the block diag-
onal matrix

    M = diag( [ 0  1 ] , ..., [ 0  1 ] , 0 ),
              [-1  0 ]        [-1  0 ]

where the number of blocks of the type

    [ 0  1 ]
    [-1  0 ]

is k and the final zero block has n - 2k rows and columns. This
allows us to draw an important conclusion about skew-
symmetric matrices.
Corollary 9.3.7
A skew-symmetric n x n matrix A over R or C is congruent
to a matrix M of the above form.
This is because the bilinear form f given by f(X, Y) =
X^T A Y is skew-symmetric and hence is represented with re-
spect to a suitable basis by a matrix of type M; thus A must
be congruent to M.
Proof of 9.3.6
Let z_1, ..., z_n be any basis of V. If f(z_i, z_j) = 0 for all i and j,
then f(u, v) = 0 for all vectors u and v, so that f is the zero
bilinear form and it is represented by the zero matrix. This is
the case k = 0. So assume that f(z_i, z_j) is not zero for some i
and j. Since the basis can be reordered, we may suppose that
f(z_1, z_2) = a ≠ 0. Then f(a^{-1} z_1, z_2) = a^{-1} f(z_1, z_2) = 1.
Now replace z_1 by a^{-1} z_1; the effect is to make f(z_1, z_2) = 1,
and of course f(z_2, z_1) = -1 since f is skew-symmetric.
Next put b = f(z_1, z_i) where i > 2. Then
f(z_1, z_i - b z_2) = f(z_1, z_i) - b f(z_1, z_2) = b - b = 0.
This suggests that we modify the basis further by replacing z_i
by z_i - b z_2 for i > 2; notice that this does not disturb linear
independence, so we still have a basis of V. The effect of this
substitution is to make
f(z_1, z_i) = 0 for i = 3, ..., n.
Next we have to address the possibility that f(z_2, z_i) may
be non-zero when i > 2; let c = f(z_2, z_i). Then
f(z_2, z_i + c z_1) = f(z_2, z_i) + c f(z_2, z_1) = c + c(-1) = 0.
This suggests that the next step should be to replace z_i by
z_i + c z_1 where i > 2; again we need to observe that z_1, ..., z_n
will still be a basis of V. Also important is the remark that this
substitution will not nullify what has already been achieved;
the reason is that when i > 2
f(z_1, z_i + c z_1) = f(z_1, z_i) + c f(z_1, z_1) = 0.
We have now reached the point where
f(z_1, z_2) = 1 = -f(z_2, z_1) and f(z_1, z_i) = 0 = f(z_2, z_i)
for all i > 2. Now we rename our first two basis elements,
writing u_1 = z_1 and v_1 = z_2.
So far the matrix representing f has the form

    [  0  1  0 ]
    [ -1  0  0 ]
    [  0  0  B ]

where B is a skew-symmetric matrix with n - 2 rows and
columns. We can now repeat the argument just given for the
subspace with basis {z_3, ..., z_n}; it follows by induction on n
that there is a basis for this subspace with respect to which
f is represented by a matrix of the required form. Indeed
let u_2, ..., u_k, v_2, ..., v_k, w_1, ..., w_{n-2k} be this basis. By
adjoining u_1 and v_1, we obtain a basis of V with respect to
which f is represented by a matrix of the required form.
Example 9.3.4
Find the canonical form of the skew-symmetric matrix

    A = [  0  0  2 ]
        [  0  0 -1 ]
        [ -2  1  0 ]

We need to carry out the procedure indicated in the proof
of the theorem. Let {E_1, E_2, E_3} be the standard basis of R^3.
The matrix A determines a skew-symmetric bilinear form f
with the properties f(E_1, E_3) = 2 = -f(E_3, E_1), f(E_3, E_2) =
1 = -f(E_2, E_3), f(E_1, E_2) = 0 = f(E_2, E_1).
The first step is to reorder the basis as {E_1, E_3, E_2};
this is necessary since f(E_1, E_2) = 0 whereas f(E_1, E_3) ≠
0. Now replace {E_1, E_3, E_2} by {(1/2)E_1, E_3, E_2}, noting that
f((1/2)E_1, E_3) = 1 = -f(E_3, (1/2)E_1). Next f(E_3, E_2) = 1, so we
replace E_2 by
E_2 + f(E_3, E_2)·(1/2)E_1 = (1/2)E_1 + E_2.
Note that f((1/2)E_1, (1/2)E_1 + E_2) = 0 = f(E_3, (1/2)E_1 + E_2).
The procedure is now complete. The bilinear form is
represented with respect to the new ordered basis
{(1/2)E_1, E_3, (1/2)E_1 + E_2}
by the matrix

    M = [  0  1  0 ]
        [ -1  0  0 ]
        [  0  0  0 ]

which is in canonical form. The change of basis from
{(1/2)E_1, E_3, (1/2)E_1 + E_2} to the standard ordered basis is repre-
sented by the matrix

    S = [ 1/2  0  1/2 ]
        [  0   0   1  ]
        [  0   1   0  ]

The reader should now verify that S^T A S equals M, the canon-
ical form of A, as predicted by the proof of 9.3.6.
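The verification suggested at the end of the example can be done by machine; the following short Python/NumPy check (not part of the text) uses the matrices A, S and M found above.

    # Check Example 9.3.4: S^T A S equals the canonical form M.
    import numpy as np

    A = np.array([[ 0.0, 0.0,  2.0],
                  [ 0.0, 0.0, -1.0],
                  [-2.0, 1.0,  0.0]])
    S = np.array([[0.5, 0.0, 0.5],
                  [0.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
    M = np.array([[ 0.0, 1.0, 0.0],
                  [-1.0, 0.0, 0.0],
                  [ 0.0, 0.0, 0.0]])

    print(np.allclose(S.T @ A @ S, M))   # True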
Exercises 9.3
1. Which of the following functions f are bilinear forms?
(a) f(X, Y) = X~Y on R^n;
(b) f(X, Y) = X^T Y on R^n;
(c) f(g, h) = ∫_a^b g(x)h(x) dx on C[a, b].
2. Let f be the bilinear form on R^2 which is defined by the
equation f(X, Y) = 2x_1 y_2 - 3x_2 y_1. Write down the matrices
which represent f with respect to (a) the standard basis, and
(b) the basis {( J J ( j )}.
3. If f and g are two bilinear forms on a vector space V, define
their sum f + g by the rule (f + g)(u, v) = f(u, v) + g(u, v);
also define the scalar multiple cf by the equation (cf)(u, v) =
c(f(u, v)). Prove that with these operations the set of all
bilinear forms on V becomes a vector space. If V has
dimension n, what is the dimension of this vector space?
4. Prove that every bilinear form on a real or complex vector
space is the sum of a symmetric and a skew-symmetric bilinear
form.
5. Find the canonical form of the symmetric bilinear form on
R^2 given by f(X, Y) = 3x_1 y_1 + x_1 y_2 + x_2 y_1 + 3x_2 y_2.
6. Let f be a bilinear form on R^n. Prove that f is an inner
product on R^n if and only if f is symmetric and the corre-
sponding quadratic form is positive definite.
7. Test each of the following bilinear forms to see if it is an
inner product:
(a) f(X, Y) = 3x_1 y_1 + x_1 y_2 + x_2 y_1 + 5x_2 y_2;
(b) f(X, Y) = 2x_1 y_1 + x_1 y_2 + x_1 y_3 + x_2 y_1 + 3x_2 y_2 - 2x_2 y_3
    + x_3 y_1 - 2x_3 y_2 + 3x_3 y_3.
8. Find the canonical form of the skew-symmetric matrix
and also find an invertible matrix S such that S^T A S equals
the canonical form.
9. (a) If A is a square matrix and S is an invertible matrix,
prove that A and S^T A S have the same rank.
(b) Deduce that the rank of a skew-symmetric matrix
equals twice the number of 2 x 2 blocks in the canonical form
of the matrix. Conclude that the canonical form is unique.
10. Call a skew-symmetric bilinear form f on a vector space
V non-isotropic if for every non-zero vector v there is an-
other vector w in V such that f(v, w) ≠ 0. Prove that a
finite-dimensional real or complex vector space which has a
non-isotropic skew-symmetric bilinear form must have even
dimension.
9.4 Minimum Polynomials and Jordan Normal Form
The aim of this section is to introduce the reader to one
of the most famous results in linear algebra, the existence of
what is known as Jordan normal form of a matrix. This is a
canonical form which applies to any square complex matrix.
The existence of Jordan normal form is often presented as the
climax of a series of difficult theorems; however the simpli-
fied approach adopted here depends on only elementary facts
about vector spaces. We begin by introducing the important
concept of the minimum polynomial of a linear operator or
matrix.
The minimum polynomial
Let T be a linear operator on an n-dimensional vector
space V over some field of scalars F. We show that T must sat-
isfy some polynomial equation with coefficients in F. At this
point the reader needs to keep in mind the definitions of sum,
scalar multiple and product for linear operators introduced in
6.3. For any vector v of V, the set {v, T(v), ..., T^n(v)} con-
tains n + 1 vectors and so it must be linearly dependent by
5.1.1. Consequently there are scalars a_0, a_1, ..., a_n, not all of
them zero, such that
a_0 v + a_1 T(v) + ··· + a_n T^n(v) = 0.
Let us write f_v for the polynomial a_0 + a_1 x + ··· + a_n x^n. Then
f_v(T) = a_0 1 + a_1 T + ··· + a_n T^n,
where 1 denotes the identity linear operator. Therefore
f_v(T)(v) = a_0 v + a_1 T(v) + ··· + a_n T^n(v) = 0.
Now let {v_1, ..., v_n} be a basis of the vector space V and
define f to be the product of the polynomials f_{v_1}, f_{v_2}, ..., f_{v_n}.
Then
f(T)(v_i) = f_{v_1}(T) ··· f_{v_n}(T)(v_i) = 0
for each i = 1, ..., n. This is because f_{v_i}(T)(v_i) = 0 and
the f_{v_j}(T) commute, since powers of T commute by Exercise
6.3.13. Therefore f(T) is the zero linear transformation on V,
that is,
f(T) = 0.
Here of course f is a polynomial with coefficients in F.
Having seen that T satisfies a polynomial equation, we
can select a polynomial f in x over F of smallest degree such
that f(T) = 0. In addition, we may suppose that f is monic,
that is, the highest power of x in f has its coefficient equal to
1. This polynomial f is called a minimum polynomial of T.
Suppose next that g is an arbitrary polynomial with coef-
ficients in F. Using long division, just as in elementary algebra,
we can divide g by f to obtain a quotient q and a remainder r;
both of these will be polynomials in x over F. Thus g = fq + r,
and either r = 0 or the degree of r is less than that of f. Then
we have
g(T) = f(T)q(T) + r(T) = r(T)
since f(T) = 0. Therefore g(T) = 0 if and only if r(T) = 0.
But, remembering that f was chosen to be of smallest degree
subject to f(T) = 0, we can conclude that r(T) = 0 if and
only if r = 0, that is, g is divisible by f. Thus the polynomials
that vanish at T are precisely those that are divisible by the
polynomial f.
If g is another monic polynomial of the same degree as
f such that g(T) = 0, then in fact g must equal f. For g is
divisible by f and has the same degree as f, which can only
mean that g is a constant multiple of f. However g is monic,
so it actually equals f. Therefore the minimum polynomial of
T is the unique monic polynomial f of smallest degree such
that f(T) = 0.
These conclusions are summed up in the following result.
Theorem 9.4.1
Let T be a linear operator on a finite-dimensional vector space
over a field F with a minimum polynomial f. Then the only
polynomials g with coefficients in F such that g(T) = 0 are
the multiples of f. Hence f is the unique monic polynomial
of smallest degree such that f(T) = 0 and T has a unique
minimum polynomial.
So far we have introduced the minimum polynomial of
a linear operator, but it is to be expected that there will be
a corresponding concept for matrices. The minimum poly-
nomial of a square matrix A over a field F is defined to be
the monic polynomial f with coefficients in F of least degree
such that f(A) = 0. The existence of f is assured by 9.4.1
and the relationship between linear operators and matrices.
Clearly the minimum polynomial of a linear operator equals
the minimum polynomial of any representing matrix. There
is of course an exact analog of 9.4.1 for matrices.
Example 9.4.1
What is the minimum polynomial of the following matrix?

    A = [ 2  1  1 ]
        [ 0  2  0 ]
        [ 0  0  2 ]

In the first place we can see directly that (A - 2I_3)^2 = 0.
Therefore the minimum polynomial f must divide the poly-
nomial (x - 2)^2, and there are two possibilities, f = x - 2 and
f = (x - 2)^2. However f cannot equal x - 2 since A - 2I ≠ 0.
Hence the minimum polynomial of A is f = (x - 2)^2.
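This calculation is easily confirmed by machine; the short Python/NumPy check below (not part of the text) verifies that A - 2I is non-zero while its square vanishes, so the minimum polynomial is (x - 2)^2 rather than x - 2.

    import numpy as np

    A = np.array([[2.0, 1.0, 1.0],
                  [0.0, 2.0, 0.0],
                  [0.0, 0.0, 2.0]])
    B = A - 2 * np.eye(3)

    print(np.any(B != 0))                        # True:  A - 2I is not zero
    print(np.allclose(B @ B, np.zeros((3, 3))))  # True:  (A - 2I)^2 = 0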
Example 9.4.2
What is the minimum polynomial of a diagonal matrix D?
Let d_1, ..., d_r be the distinct diagonal entries of D. Again
there is a fairly obvious polynomial equation that is satisfied
by the matrix, namely
(D - d_1 I) ··· (D - d_r I) = 0.
So the minimum polynomial divides (x - d_1) ··· (x - d_r) and
hence is the product of certain of the factors x - d_i. However,
we cannot miss out even one of these factors; for the product
of all the D - d_j I with j ≠ i is not zero since d_j ≠ d_i. It follows
that the minimum polynomial of D is the product of all the
factors, that is, (x - d_1) ··· (x - d_r).
In the computation of minimum polynomials the next
result is very useful.
Lemma 9.4.2
Similar matrices have the same minimum polynomial.
The quickest way to see this is to recall that similar ma-
trices represent the same linear operator, and hence their min-
imum polynomials equal the minimum polynomial of the lin-
ear operator. Thus, by combining Lemma 9.4.2 and Example
9.4.2, we can find the minimum polynomial of any diagonal-
izable complex matrix.
Example 9.4.3
Find the minimum polynomial of the matrix

    [ 1  2 ]
    [ 2  1 ]

By Example 9.1.1 this matrix is similar to the diagonal matrix
with diagonal entries 3 and -1. Hence the minimum polynomial
of the given matrix is (x - 3)(x + 1).
In Chapter Eight we encountered another polynomial as-
sociated with a matrix or linear operator, namely the charac-
teristic polynomial. It is natural to ask if there is a connection
between these two polynomials. The answer is provided by a
famous theorem.
Theorem 9.4.3 (The Cayley-Hamilton Theorem)
Let A be an n x n matrix over C. If p is the characteristic poly-
nomial of A, then p(A) = 0. Hence the minimum polynomial
of A divides the characteristic polynomial of A.
Proof
According to 8.1.8, the matrix A is similar to an upper tri-
angular matrix T; thus we have S^{-1} A S = T with S invert-
ible. By 9.4.2 the matrices A and T have the same minimum
polynomial, and we know from 8.1.4 that they have the same
characteristic polynomial. Therefore it is sufficient to prove
the statement for the triangular matrix T. From Example 8.1.2
we know that the characteristic polynomial of T is
(t_11 - x) ··· (t_nn - x).
On the other hand, direct matrix multiplication shows that
(t_11 I - T) ··· (t_nn I - T) = 0: the reader may find it helpful to
check this statement for n = 2 and 3. The result now follows
from 9.4.1.
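The Cayley-Hamilton theorem can be checked numerically for any particular matrix. The sketch below (not from the text) uses Python/NumPy; the matrix A is an arbitrary illustrative choice, and np.poly returns the coefficients of the characteristic polynomial.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 1.0]])
    n = A.shape[0]

    coeffs = np.poly(A)   # coefficients of det(xI - A), highest power first

    # Evaluate p(A) = A^n + c_1 A^{n-1} + ... + c_n I using matrix powers.
    p_of_A = sum(c * np.linalg.matrix_power(A, n - k)
                 for k, c in enumerate(coeffs))

    print(np.allclose(p_of_A, np.zeros((n, n))))   # True: p(A) = 0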
At this juncture the reader may wonder if the minimum
polynomial is really of much interest, given that it is a divisor
of the more easily calculated characteristic polynomial. But in
fact there are features of a matrix that are easily recognized
from its minimum polynomial, but which are unobtainable
from the characteristic polynomial. One such feature is diag-
onalizability.
Example 9.4.4
Consider for example the matrices

    I_2   and   [ 1  1 ]
                [ 0  1 ]

Both of these have characteristic polynomial (x - 1)^2, but the
first matrix is diagonalizable while the second is not. Thus the
characteristic polynomial alone cannot tell us if a matrix is
diagonalizable. On the other hand, the two matrices just con-
sidered have different minimum polynomials, x - 1 and (x - 1)^2
respectively.
This example raises the possibility that it is the mini-
mum polynomial which determines if a matrix is diagonaliz-
able. The next theorem confirms this.
Theorem 9.4.4
Let A be an n x n matrix over C. Then A is diagonalizable if
and only if its minimum polynomial splits into a product of
distinct linear factors.
Proof
Assume first that A is diagonalizable, so that S^{-1} A S = D, a
diagonal matrix, for some invertible S. Then A and D have
the same minimum polynomial by 9.4.2. Let d_1, ..., d_r be
the distinct diagonal entries of D; then Example 9.4.2 shows
that the minimum polynomial of D is (x - d_1) ··· (x - d_r),
which is a product of distinct linear factors.
Conversely, suppose that A has minimum polynomial
f = (x - d_1) ··· (x - d_r)
where d_1, ..., d_r are distinct complex numbers. Define g_i to be
the polynomial obtained from f by deleting the factor x - d_i.
Thus
g_i = f / (x - d_i).
Next we recall the method of partial fractions, which is
useful in calculus for integrating rational functions. This tells
us that there are constants b_1, ..., b_r such that
1/f = Σ_{i=1}^r  b_i / (x - d_i).
Multiplying both sides of this equation by f, we obtain
1 = b_1 g_1 + ··· + b_r g_r
by definition of g_i.
At this point we prefer to work with linear operators, so
we introduce the linear operator T on C^n defined by T(X) =
AX. It follows from the above equation that b_1 g_1(T) + ··· +
b_r g_r(T) is the identity function. Hence
X = b_1 g_1(T)(X) + ··· + b_r g_r(T)(X)
for any vector X. Let V_i denote the set of all elements of the
form g_i(T)(X) with X a vector in C^n. Then V_i is a subspace
and the above equation for X tells us that
C^n = V_1 + ··· + V_r.
Now in fact C^n is the direct sum of the subspaces V_i, which
amounts to saying that the intersection of a V_i and the sum
of the remaining V_j, with j ≠ i, is zero. To see why this
is true, take a vector X in the intersection. Observe that
g_i(T) g_j(T) = 0 if i ≠ j, since every factor x - d_k is present in
the polynomial g_i g_j. Therefore g_k(T)(X) = 0 for all k. Since
X = Σ_{k=1}^r b_k g_k(T)(X), it follows that X = 0. Hence C^n is
the direct sum
C^n = V_1 ⊕ ··· ⊕ V_r.
Now the effect of T on vectors in V_i is merely to multiply
them by d_i, since (T - d_i) g_i(T) = f(T) = 0. Therefore, if we
choose bases for each subspace V_1, ..., V_r and combine them
to form a basis of C^n, then T will be represented by a diagonal
matrix. Consequently A is similar to a diagonal matrix.
Example 9.4.5
The matrix

    A = [ 2  1  1 ]
        [ 0  2  0 ]
        [ 0  0  2 ]

has minimum polynomial (x - 2)^2, as we saw in Example 9.4.1.
Since this is not a product of distinct linear factors, the matrix
cannot be diagonalized.
Example 9.4.6
The n x n upper triangular matrix

    A = [ c  1  0  ...  0 ]
        [ 0  c  1  ...  0 ]
        [ 0  0  c  ...  0 ]
        [ .  .  .   .   1 ]
        [ 0  0  0  ...  c ]

has minimum polynomial (x - c)^n; this is because (A - cI)^n =
0, but (A - cI)^{n-1} ≠ 0. Hence A is diagonalizable if and only
if n = 1. Notice that the characteristic polynomial of A equals
(c - x)^n.
Jordan normal form
We come now to the definition of the Jordan normal form
of a square complex matrix. The basic components of this are
certain complex matrices called Jordan blocks, of the type
considered in Example 9.4.6. In general an n x n Jordan block
is a matrix of the form

    J = [ c  1  0  ...  0  0 ]
        [ 0  c  1  ...  0  0 ]
        [ .  .  .   .   .  . ]
        [ 0  0  0  ...  c  1 ]
        [ 0  0  0  ...  0  c ]

for some scalar c. Thus J is an upper triangular n x n ma-
trix with constant diagonal entries, a superdiagonal of 1's, and
zeros elsewhere. By Example 9.4.6 the minimum and charac-
teristic polynomials of J are (x - c)^n and (c - x)^n respectively.
We must now take note of the essential property of the
matrix J. Let E_1, ..., E_n be the vectors of the standard basis
of C^n. Then matrix multiplication shows that JE_1 = cE_1,
and JE_i = cE_i + E_{i-1} where 1 < i ≤ n.
In general, if A is any complex n x n matrix, we call a
sequence of vectors X_1, ..., X_r in C^n a Jordan string for A if
it satisfies the equations
AX_1 = cX_1 and AX_i = cX_i + X_{i-1},
where c is a scalar and 1 < i ≤ r. Thus every n x n Jordan
block determines a Jordan string of length n.
Now suppose there is a basis of C^n which consists of
Jordan strings for the matrix A. Group together basis elements
in the same string. Then the linear operator on C^n given
by T(X) = AX is represented with respect to this basis of
Jordan strings by a matrix N which has Jordan blocks down
the diagonal:

    N = [ J_1   0   ...   0  ]
        [  0   J_2  ...   0  ]
        [  .    .    .    .  ]
        [  0    0   ...  J_k ]

Here J_i is a Jordan block, say with c_i on the diagonal. This
is because of the effect produced on the basis elements when
they are multiplied on the left by A.
Our conclusion is that A is similar to the matrix N, which
is called the Jordan normal form of A. Notice that the diagonal
elements c_i of N are just the eigenvalues of A. Of course we
still have to establish that a basis consisting of Jordan strings
always exists; only then can we conclude that every matrix
has a Jordan normal form.
Theorem 9.4.5 (Jordan Normal Form)
Every square complex matrix is similar to a matrix in Jordan
normal form.
Proof
Let A be an n x n complex matrix. We have to establish the
existence of a basis of C^n consisting of Jordan strings for A.
This is done by induction on n; if n = 1, any non-zero vector
qualifies as a Jordan string of length 1, so we can assume that
n > 1.
Since A is complex, it has an eigenvalue c. Thus the
matrix A' = A - cI is singular, and so its column space C
has dimension r < n. Recall from Example 6.3.2 that C
is the image of the linear operator on C^n which sends X to
A'X. Restriction of this linear operator to C produces a linear
operator which is represented by an r x r matrix. Since r < n,
we may assume by induction hypothesis on n that C has a
basis which is a union of Jordan strings for A. Let the ith
such string be written X_ij, j = 1, ..., l_i; thus AX_i1 = c_i X_i1
and in addition AX_ij = c_i X_ij + X_i,j-1 for 1 < j ≤ l_i. Then
A'X_i1 = 0 and A'X_ij = X_i,j-1 if j > 1.
Next let D denote the intersection of C with N, the null
space of A', and set p = dim(D). We need to identify the
elements of D. Now any element of C has the form
Y = Σ_i Σ_j a_ij X_ij
where a_ij is a complex number. Assume that Y is in D, and
thus in N, the null space of A'. Suppose that a_ij ≠ 0 and let
j be as large as possible with this property for the given i. If
j > 1, then the equations A'X_i1 = 0 and A'X_ik = X_i,k-1 will
prevent A'Y from being zero. Hence j = 1. It follows that
the X_i1 form a basis of D, so there are exactly p of these X_i1.
Every vector in C is of the form A'Y for some Y, since C
is the image space of the linear operator sending X to A'X.
For each i write the vector X_i,l_i in the form X_i,l_i = A'Y_i, for
some Y_i, i = 1, ..., p. There are p of these Y_i. Finally, N has
dimension n - r, so we can adjoin a further set of n - r - p
vectors to the X_i1 to get a basis for N, say Z_1, ..., Z_{n-r-p}.
Altogether we have a total of r + p + (n - r - p) = n
vectors.
We now assert that these vectors form a basis of C^n which
consists of Jordan strings of A. Certainly
AY_k = (A' + cI)Y_k = A'Y_k + cY_k = cY_k + X_k,l_k.
Thus the Jordan string X_k1, ..., X_k,l_k has been extended by
adjoining Y_k. Also AZ_m = cZ_m since Z_m belongs to the null
space of A'; thus Z_m is a Jordan string of A with length 1.
Hence the vectors in question constitute a set of Jordan strings
of A.
What remains to be done is to prove that the vectors
X_ij, Y_k, Z_m form a basis of C^n, and by 5.1.9 it is enough to
show that they are linearly independent. To accomplish this,
we assume that e_ij, f_k, g_m are scalars such that
Σ_i Σ_j e_ij X_ij + Σ_k f_k Y_k + Σ_m g_m Z_m = 0.
Multiplying both sides of this equation on the left by A', we
get
Σ_i Σ_j e_ij (0 or X_i,j-1) + Σ_k f_k X_k,l_k = 0.
Now X_k,l_k does not appear among the terms of the first sum
in the above equation since j - 1 < l_k. Hence f_k = 0 for all
k. Thus
Σ_m g_m Z_m = - Σ_i Σ_j e_ij X_ij,
which therefore belongs to D. Hence e_ij = 0 if j > 1, and
Σ_m g_m Z_m = - Σ_i e_i1 X_i1. This can only mean that g_m = 0
and e_i1 = 0, since the X_i1 and Z_m are linearly independent.
Hence the theorem is established.
Corollary 9.4.6
Every complex n x n matrix is similar to an upper triangular
matrix with zeros above the superdiagonal.
This follows at once from the theorem since every Jordan
block is an upper triangular matrix of the specified type.
Example 9.4.7
Put the matrix

    A = [  3  1  0 ]
        [ -1  1  0 ]
        [  0  0  2 ]

in Jordan normal form.
We follow the method of the proof of 9.4.5. The eigen-
values of A are 2, 2, 2, so define

    A' = A - 2I = [  1  1  0 ]
                  [ -1 -1  0 ]
                  [  0  0  0 ]

The column space C of A' is generated by the single vector
X = (1, -1, 0)^T. Note that AX = 2X, so X is a Jordan string
of length 1 for A. Also the null space N of A' is generated by
X and the vector (0, 0, 1)^T.
Thus D = C ∩ N = C is generated by X. The next step is to
write X in the form A'Y: in fact we can take
Y = (1, 0, 0)^T.
Thus the second basis element is Y. Finally, put Z = (0, 0, 1)^T,
so that {X, Z} is a basis for N. Then
A'X = 0, A'Y = X, A'Z = 0
and hence
AX = 2X, AY = 2Y + X and AZ = 2Z.
It is now evident that {X, Y, Z} is a basis of C^3 consisting
of the two Jordan strings X, Y and Z. Therefore the Jordan
form of A has two blocks and is

    N = [ 2  1  0 ]
        [ 0  2  0 ]
        [ 0  0  2 ]
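For comparison, a computer algebra system can produce the Jordan form directly. The following Python/SymPy sketch (not part of the text) applies jordan_form to the matrix of this example; SymPy returns P and J with A = P J P^{-1}, though it may order the blocks differently from the text.

    from sympy import Matrix

    A = Matrix([[ 3, 1, 0],
                [-1, 1, 0],
                [ 0, 0, 2]])

    P, J = A.jordan_form()
    print(J)                       # one 2 x 2 and one 1 x 1 block for eigenvalue 2
    print(A == P * J * P.inv())    # True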
As an application of Jordan form we establish an inter-
esting connection between a matrix and its transpose.
Theorem 9.4.7
Every square complex matrix is similar to its transpose.
Proof
Let A be a square matrix with complex entries, and write N
for the Jordan normal form of A. Thus S~1
AS = N for some
invertible matrix S by 9.4.5. Now
NT
= ST
AT
(S-1
)T
= ST
AT
(ST
)-
so NT
is similar to AT
. It will be sufficient if we can prove that
N and NT
are similar. The reason for this is the transitive
property of similarity: if P is similar to Q and Q is similar to
R, then P is similar to R.
Because of the block decomposition of N, it is enough
to prove that any Jordan block J is similar to its transpose.
But this can be seen directly. Indeed, if P is the permutation
matrix with a line of 1's from top right to bottom left, then
matrix multiplication shows that P^{-1} J P = J^T.
Another use of Jordan form is to determine which matri-
ces satisfy a given polynomial equation.
Example 9.4.8
Find up to similarity all complex n x n matrices A satisfying
the equation A^2 = I.
Let N be the Jordan normal form of A, and write N =
S^{-1} A S. Then N^2 = S^{-1} A^2 S. Hence A^2 = I if and only if
N^2 = I. Since N consists of a string of Jordan blocks down the
diagonal, we have only to decide which Jordan blocks J can
satisfy J^2 = I. This is easily done. Certainly the diagonal
entries of J will have to be 1 or -1. Furthermore, matrix
multiplication reveals that J^2 ≠ I if J has two or more rows.
Hence the block J must be 1 x 1. Thus N is a diagonal
matrix with all its diagonal entries equal to +1 or -1. After
reordering the rows and columns, we get a matrix of the form

    N = [ I_r    0   ]
        [  0   -I_s  ]

where r + s = n. Therefore A^2 = I if and only if A is similar
to a matrix with the form of N.
Next we consider the relationship between Jordan nor-
mal form and the minimum and characteristic polynomials.
It will emerge that knowledge of Jordan form permits us to
write down the minimum polynomial immediately. Since in
principle we know how to find the Jordan form - by using the
method of Example 9.4.7 - this leads to a systematic way of
computing minimum polynomials, something that was lacking
previously.
Let A be a complex n x n matrix whose distinct eigenval-
ues are c_1, ..., c_r. For each c_i there are corresponding Jordan
blocks in the Jordan normal form N of A which have c_i on
their principal diagonals, say J_i1, ..., J_i,l_i; let n_ij be the num-
ber of rows of J_ij. Of course A and N have the same minimum
and characteristic polynomials since they are similar matrices.
Now J_ij is an n_ij x n_ij upper triangular matrix with c_i on
the principal diagonal, so its characteristic polynomial is just
(c_i - x)^{n_ij}. The characteristic polynomial p of N is clearly the
product of all of these polynomials: thus
p = Π_{i=1}^r (c_i - x)^{m_i}   where   m_i = Σ_{j=1}^{l_i} n_ij.
The minimum polynomial is a little harder to find. If f
is any polynomial, it is readily seen that f(N) is the matrix
with the blocks f(J_ij) down the principal diagonal and zeros
elsewhere. Thus f(N) = 0 if and only if all the f(J_ij) = 0.
Hence the minimum polynomial of N is the least common
multiple of the minimum polynomials of the blocks J_ij. But
we saw in Example 9.4.6 that the minimum polynomial of the
Jordan block J_ij is (x - c_i)^{n_ij}. It follows that the minimum
polynomial of N is
f = Π_{i=1}^r (x - c_i)^{k_i}
where k_i is the largest of the n_ij for j = 1, ..., l_i.
These conclusions, which amount to a method of com-
puting minimum polynomials from Jordan normal form, are
summarized in the next result.
Theorem 9.4.8
Let A be an n x n complex matrix and let c_1, ..., c_r be the dis-
tinct eigenvalues of A. Then the characteristic and minimum
polynomials of A are
Π_{i=1}^r (c_i - x)^{m_i}   and   Π_{i=1}^r (x - c_i)^{k_i}
respectively, where m_i is the sum of the numbers of columns
in Jordan blocks with eigenvalue c_i and k_i is the number of
columns in the largest such Jordan block.
Example 9.4.9
Find the minimum polynomial of the matrix

    A = [  3  1  0 ]
        [ -1  1  0 ]
        [  0  0  2 ]

The Jordan form of A is

    N = [ 2  1  0 ]
        [ 0  2  0 ]
        [ 0  0  2 ]

by Example 9.4.7. Here 2 is the only eigenvalue and there are
two Jordan blocks, with 2 and 1 columns. The minimum poly-
nomial of A is therefore (x - 2)^2. Of course the characteristic
polynomial is (2 - x)^3.
Application of Jordan form to differential equations
In 8.3 we studied systems of first order linear differential
equations for functions y_1, y_2, ..., y_n of a variable x. Such a
system takes the matrix form
Y' = AY.
Here Y is the column of functions y_1, ..., y_n and A is an n x
n matrix with constant coefficients. Since any such matrix
A is similar to a triangular matrix (by 8.1.8), it is possible
to change to a system of linear differential equations for a
new set of functions which has a triangular coefficient matrix.
This new system can then be solved by back substitution,
as in Example 8.3.3. However this method can be laborious
for large n and Jordan form provides a simpler alternative
method.
Returning to the system Y' = AY, we know that there is
a non-singular matrix S such that N = S^{-1} A S is in Jordan
normal form: say

    N = [ J_1   0   ...   0  ]
        [  0   J_2  ...   0  ]
        [  .    .    .    .  ]
        [  0    0   ...  J_k ]

Here J_i is a Jordan block, say with d_i on the diagonal. Of
course the d_i are the eigenvalues of A. Now put U = S^{-1} Y,
so that the system Y' = AY becomes (SU)' = ASU, or
U' = NU,
since N = S^{-1} A S. To solve this system of differential equa-
tions it is plainly sufficient to solve the subsystems U_i' = J_i U_i
for i = 1, ..., k, where U_i is the column of entries of U corre-
sponding to the block J_i in N.
This observation effectively reduces the problem to one
in which the coefficient matrix is a Jordan block, let us say

    A = [ d  1  0  ...  0  0 ]
        [ 0  d  1  ...  0  0 ]
        [ .  .  .   .   .  . ]
        [ 0  0  0  ...  d  1 ]
        [ 0  0  0  ...  0  d ]
Now the equations in the corresponding system have a
much simpler form than in the general triangular case:

    u_1' = d u_1 + u_2
    u_2' = d u_2 + u_3
    ...........
    u_{n-1}' = d u_{n-1} + u_n
    u_n' = d u_n
The functions u_i can be found by solving a series of first
order linear equations, starting from the bottom of the list.
Thus u_n' = d u_n yields u_n = c_{n-1} e^{dx}, where c_{n-1} is a constant.
The second last equation becomes
u_{n-1}' - d u_{n-1} = c_{n-1} e^{dx},
which is first order linear with integrating factor e^{-dx}. Multi-
plying the equation by this factor, we get (u_{n-1} e^{-dx})' = c_{n-1}.
Hence
u_{n-1} = (c_{n-2} + c_{n-1} x) e^{dx},
where c_{n-2} is another constant. The next equation yields
u_{n-2}' - d u_{n-2} = (c_{n-2} + c_{n-1} x) e^{dx},
which is also first order linear with integrating factor e^{-dx}. It
can be solved to give
u_{n-2} = (c_{n-3} + (c_{n-2}/1!) x + (c_{n-1}/2!) x^2) e^{dx},
where c_{n-3} is constant. Continuing in this manner, we find
that the function u_{n-i} is given by
u_{n-i} = (c_{n-i-1} + (c_{n-i}/1!) x + ··· + (c_{n-1}/i!) x^i) e^{dx},
where the c_j are constants. The original functions y_j can then
be calculated by using the equation Y = SU.
Example 9.4.10
Solve the linear system of differential equations below using
Jordan normal form:
    y_1' = 3y_1 + y_2
    y_2' = -y_1 + y_2
    y_3' = 2y_3
Here the coefficient matrix is

    A = [  3  1  0 ]
        [ -1  1  0 ]
        [  0  0  2 ]

The Jordan form of A was found in Example 9.4.7: we recall
the results obtained there. There is a basis of R^3 consisting
of Jordan strings: this is {X, W, Z}, where
X = (1, -1, 0)^T,  W = (1, 0, 0)^T  and  Z = (0, 0, 1)^T.
Here AX = 2X, AW = 2W + X, AZ = 2Z.
The matrix which describes the change of basis from
{X, W, Z} to the standard basis is

    S = [  1  1  0 ]
        [ -1  0  0 ]
        [  0  0  1 ]

By 6.2.6 the matrix which represents the linear operator aris-
ing from left multiplication by A, with respect to the basis
{X, W, Z}, is

    S^{-1} A S = J = [ 2  1  0 ]
                     [ 0  2  0 ]
                     [ 0  0  2 ]

Now put U = S^{-1} Y, so that Y = SU and the system of
equations becomes U' = S^{-1} A S U = JU, that is,

    u_1' = 2u_1 + u_2
    u_2' = 2u_2
    u_3' = 2u_3
We solve this system, beginning with the last equation, and
obtain u_3 = c_2 e^{2x}. Next u_2' = 2u_2, so that u_2 = c_1 e^{2x}. Finally
we solve
u_1' - 2u_1 = c_1 e^{2x},
a first order linear equation, and find the solution to be
u_1 = (c_0 + c_1 x) e^{2x}.
Therefore

    U = [ c_0 + c_1 x ]
        [ c_1         ]  e^{2x}
        [ c_2         ]

and since Y = SU, we obtain

    Y = [ c_0 + c_1 + c_1 x ]
        [ -c_0 - c_1 x      ]  e^{2x},
        [ c_2               ]

from which the values of the functions y_1, y_2, y_3 can be read
off.
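The closed-form solution can be verified symbolically. The Python/SymPy sketch below (not part of the text) checks that Y' - AY vanishes identically for arbitrary constants c_0, c_1, c_2.

    from sympy import symbols, exp, Matrix, simplify

    x, c0, c1, c2 = symbols('x c0 c1 c2')

    A = Matrix([[ 3, 1, 0],
                [-1, 1, 0],
                [ 0, 0, 2]])

    Y = Matrix([(c0 + c1 + c1*x) * exp(2*x),
                (-c0 - c1*x)     * exp(2*x),
                c2               * exp(2*x)])

    print(simplify(Y.diff(x) - A * Y))   # the zero vector: the solution is correct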
Exercises 9.4
1. Find the minimum polynomials of the following matrices
by inspection:
(a)   (b)   (c)
(d) [ 3  0  0 ]
    [ 0  2  1 ]
    [ 0  0  2 ]
2. Let A be an n x n matrix and S an invertible n x n ma-
trix over a field F. If f is any polynomial over F, show that
f(S^{-1} A S) = S^{-1} f(A) S. Use this result to give another proof
of the fact that similar matrices have the same minimum poly-
nomial (see 9.4.2).
3. Use 9.4.4 to prove that if an n x n complex matrix has n
distinct eigenvalues, then the matrix is diagonalizable.
4. Show that the minimum polynomial of the companion ma-
trix

    A = [ 0  0  -c ]
        [ 1  0  -b ]
        [ 0  1  -a ]

is x^3 + ax^2 + bx + c. (See Exercise 8.1.6). [Hint: show that
uI + vA + wA^2 = 0 implies that u = v = w = 0].
5. Find the Jordan normal forms of the following matrices:
(a)   (b)   (c)
6. Read off the minimum polynomials from the Jordan forms
in Exercise 5.
7. Find up to similarity all n x n complex matrices A satisfying
A = A^2.
8. The same problem for matrices such that A^2 = A^3.
9. (Uniqueness of Jordan normal form) Let A be a complex
n x n matrix with Jordan blocks J_ij, where J_ij is a block
associated with the eigenvalue c_i. Prove that the number of
r x r Jordan blocks J_ij for a given i equals d_{r-1} - d_r, where
d_k is the dimension of the intersection of the column space of
(A - c_i I_n)^k and the null space of A - c_i I_n. Deduce that the
blocks that appear in the Jordan normal form of A are unique
up to order.
10. Using Exercise 9.4.4 as a model, suggest an n x n matrix
whose minimal polynomial is x^n + a_{n-1} x^{n-1} + ··· + a_1 x + a_0.
11. Use Jordan normal form to solve the following system of
differential equations:
    y_1' = y_1 + y_2 + y_3
    y_2' = y_2
    y_3' = y_2 + y_3
Chapter Ten
LINEAR PROGRAMMING
One of the great successes of linear algebra has been the
construction of algorithms to solve certain optimization prob-
lems in which a linear function has to be maximized or min-
imized subject to a set of linear constraints. Typically the
function is a profit or cost. Such problems are called linear
programming problems.
The need to solve such problems was recognized during
the Second World War, when supplies and labor were limited
by wartime conditions. The pioneering work of George Dantzig
led to the creation of the Simplex Algorithm, which for over
half a century has been the standard tool for solving linear
programming problems. Our purpose here is to describe the
linear algebra which underlies the simplex algorithm and then
to show how it can be applied to solve specific problems.
10.1 Introduction to Linear Programming
We begin by giving some examples of linear programming
problems.
Example 10.1.1 (A production problem)
A food company markets two products F_1 and F_2, which are
made from two ingredients I_1 and I_2. To produce one unit
of product F_j one requires a_ij units of ingredient I_i. The
maximum amounts of I_1 and I_2 available are m_1 and m_2,
respectively. The company makes a profit of p_i on each unit
of product F_i sold. How many units of F_1 and F_2 should
the company produce in order to maximize its profit without
running out of ingredients?
Suppose the company decides to produce x_j units of prod-
uct F_j. Then the profit on marketing the products will be
z = p_1 x_1 + p_2 x_2. On the other hand, the production process
will use a_11 x_1 + a_12 x_2 units of ingredient I_1 and a_21 x_1 + a_22 x_2
units of ingredient I_2. Therefore x_1 and x_2 must satisfy the
constraints
a_11 x_1 + a_12 x_2 ≤ m_1 and a_21 x_1 + a_22 x_2 ≤ m_2.
Also x_1 and x_2 cannot be negative.
We therefore have to solve the following linear program-
ming problem:

    maximize:   z = p_1 x_1 + p_2 x_2
    subject to: a_11 x_1 + a_12 x_2 ≤ m_1
                a_21 x_1 + a_22 x_2 ≤ m_2
                x_1, x_2 ≥ 0
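Once numerical data are supplied, problems of this kind can be solved directly. The following Python sketch (not from the text) uses scipy.optimize.linprog with made-up values for the profits p_i, the coefficients a_ij and the limits m_i; since linprog minimizes, the profit is negated.

    from scipy.optimize import linprog

    p = [3.0, 5.0]            # profit per unit of F_1 and F_2 (assumed values)
    A = [[1.0, 2.0],          # a_ij: units of ingredient I_i per unit of F_j
         [3.0, 1.0]]
    m = [14.0, 18.0]          # available amounts of I_1 and I_2

    result = linprog(c=[-p[0], -p[1]],   # maximize p^T x  =  minimize -p^T x
                     A_ub=A, b_ub=m,
                     bounds=[(0, None), (0, None)],
                     method='highs')

    print(result.x)           # optimal production levels x_1, x_2
    print(-result.fun)        # maximum profit z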
Example 10.1.2 (A transportation problem)
A company has m factories F_1, ..., F_m and n warehouses
W_1, ..., W_n. Factory F_i can produce at most r_i units of a
certain product per week and warehouse W_j must be able to
supply at least s_j units per week. The cost of shipping one
unit from factory F_i to warehouse W_j is c_ij. How many units
should be shipped from each factory to each warehouse per
week in order to minimize the total transportation cost and
yet still satisfy the requirements on the factories and ware-
houses?
Let x_ij be the number of units to be shipped from factory
F_i to warehouse W_j per week. Then the total transportation
cost for the week is
Σ_{i=1}^m Σ_{j=1}^n c_ij x_ij.
The condition on factory F_i is that Σ_{j=1}^n x_ij ≤ r_i, while that on
warehouse W_j is Σ_{i=1}^m x_ij ≥ s_j. We are therefore faced with the
following linear programming problem:

    minimize:   z = Σ_{i=1}^m Σ_{j=1}^n c_ij x_ij

    subject to: Σ_{j=1}^n x_ij ≤ r_i,  i = 1, ..., m
                Σ_{i=1}^m x_ij ≥ s_j,  j = 1, ..., n
The general linear programming problem
After these examples we are ready to describe the general
form of a linear programming problem.
Let x_1, x_2, ..., x_n be variables. There is given a linear
function of the variables
z = c_1 x_1 + c_2 x_2 + ··· + c_n x_n,
called the objective function, which has to be maximized or
minimized. The variables x_j are subject to a number of linear
conditions, called the constraints, which take the form
a_i1 x_1 + a_i2 x_2 + ··· + a_in x_n ≤ or = or ≥ b_i,
i = 1, 2, ..., m. In addition, certain of the variables may be
constrained, i.e., they must take non-negative values. The gen-
eral linear programming problem therefore takes the following
form:
    maximize or minimize: z = c_1 x_1 + ··· + c_n x_n

    subject to: a_i1 x_1 + ··· + a_in x_n ≤ or = or ≥ b_i,
                i = 1, 2, ..., m,
                certain x_j ≥ 0.

The understanding here is that a_ij, b_i, c_j are all known quan-
tities. The object is to find x_1, ..., x_n which optimize the ob-
jective function z, while satisfying the constraints. Evidently
Examples 10.1.1 and 10.1.2 are problems of this type.
Feasible and optimal solutions
It will be convenient to think of X = (x_1, x_2, ..., x_n)^T
in the above problem as a point in Euclidean space R^n. If
X satisfies all the constraints (including the conditions x_j ≥
0), then it is called a feasible solution of the problem. A
feasible solution for which the objective function is maximum
or minimum is said to be an optimal solution.
For a general linear programming problem there are three
possible outcomes.
(i) There are no feasible solutions and thus the problem
has no optimal solutions.
(ii) Feasible solutions exist, but the objective function has
arbitrarily large or small values at feasible solutions.
Again there are no optimal solutions.
(iii) The objective function has finite maximum or
minimum values at feasible points. Then optimal
solutions exist.
In a linear programming problem the object is to find an
optimal solution or show that none exists.
Standard and canonical form
Since the general linear programming problem has a com-
plex form, it is important to develop simpler types of prob-
lem which are equivalent to it. Here two linear programming
problems are said to be equivalent if they have the same sets
of feasible solutions and the same optimal solutions.
A linear programming problem is said to be in standard
form if it is a maximization problem with all constraints in-
equalities and all variables constrained. It therefore has the
general form
    maximize:   z = c_1 x_1 + ··· + c_n x_n

    subject to: a_11 x_1 + a_12 x_2 + ··· + a_1n x_n ≤ b_1
                a_21 x_1 + a_22 x_2 + ··· + a_2n x_n ≤ b_2
                ...................................
                a_m1 x_1 + a_m2 x_2 + ··· + a_mn x_n ≤ b_m
                x_j ≥ 0,  j = 1, 2, ..., n

This problem can be written in matrix form: let A = (a_ij)_{m,n},
B = (b_1 b_2 ... b_m)^T, C = (c_1 c_2 ... c_n)^T and X =
(x_1 x_2 ... x_n)^T. Then the problem takes the form:

    maximize:   z = C^T X
    subject to: AX ≤ B
                X ≥ 0

Here a matrix inequality U ≤ V means that U and V are
of the same size and u_ij ≤ v_ij for all i, j; there is a similar
definition of U ≥ V.
A second important type of linear programming problem
is a maximization problem with all constraints equalities and
all variables constrained. The general form is:

    maximize:   z = C^T X
    subject to: AX = B
                X ≥ 0

Such a linear programming problem is said to be in canonical
form.
Changes to a linear programming problem
Our aim is to show that any linear programming problem
is equivalent to one in standard form and to one in canoni-
cal form. To do this we need to consider what changes to a
program will produce an equivalent program. There are four
types of change that can be made.
Replace a minimization by a maximization
If the objective function in a linear program is z = C^T X,
the minimum value of z occurs for the same X as the maxi-
mum value of (-C)^T X. Thus we can replace "minimize" by
"maximize" and C^T X by the new objective function (-C)^T X.
Reverse an inequality
The inequality a_i1 x_1 + ··· + a_in x_n ≥ b_i is clearly equivalent
to (-a_i1) x_1 + ··· + (-a_in) x_n ≤ -b_i.
Replace an equality by two inequalities
The constraint a_i1 x_1 + ··· + a_in x_n = b_i is equivalent to the
two inequalities
a_i1 x_1 + ··· + a_in x_n ≤ b_i
(-a_i1) x_1 + ··· + (-a_in) x_n ≤ -b_i
Elimination of an unconstrained variable
Suppose that the variable x_j is unconstrained, i.e., it can
take negative values. The trick here is to replace x_j by two
new variables x_j^+, x_j^- which are constrained. Write x_j in the
form x_j = x_j^+ - x_j^-, where x_j^+, x_j^- ≥ 0. This is possible since
any real number can be written as the difference between two
non-negative numbers.
If we replace x_j by x_j^+ - x_j^- in each constraint and in
the objective function, and we add new constraints x_j^+ ≥ 0,
x_j^- ≥ 0, then the resulting equivalent program will have fewer
unconstrained variables.
By a sequence of operations of types I-IV a general linear
programming problem may be transformed to an equivalent
problem in standard form. Thus we have proved:
Theorem 10.1.1 Every linear programming problem is
equivalent to a program in standard form.
Example 10.1.3
Put the following linear programming problem in standard form.

    minimize:   z = 3x_1 + 2x_2 - x_3
    subject to: x_1 + x_2 + 2x_3 ≥ 6
                x_1 + x_2 + x_3 = 4
                x_1 - x_2 + 3x_3 ≤ 2
                x_1, x_3 ≥ 0
First of all change the minimization to a maximization
and replace the constraints involving = and ≥ by constraints
involving ≤ :
    maximize:   z = -3x_1 - 2x_2 + x_3
    subject to: -x_1 - x_2 - 2x_3 ≤ -6
                 x_1 + x_2 + x_3 ≤ 4
                -x_1 - x_2 - x_3 ≤ -4
                 x_1 - x_2 + 3x_3 ≤ 2
                 x_1, x_3 ≥ 0
Next write x_2, which is an unconstrained variable, in the form
x_2 = x_2^+ - x_2^-. This yields a problem in standard form:

    maximize:   z = -3x_1 - 2x_2^+ + 2x_2^- + x_3
    subject to: -x_1 - x_2^+ + x_2^- - 2x_3 ≤ -6
                 x_1 + x_2^+ - x_2^- + x_3 ≤ 4
                -x_1 - x_2^+ + x_2^- - x_3 ≤ -4
                 x_1 - x_2^+ + x_2^- + 3x_3 ≤ 2
                 x_1, x_2^+, x_2^-, x_3 ≥ 0
Slack variables
If we wish to transform a linear programming problem
to canonical form, a method for converting inequalities into
equalities is needed. This can be achieved by the introduction
of what are called slack variables.
Consider a linear programming problem in standard form:
maximize: z = C^T X
subject to: AX ≤ B, X ≥ 0,
where A is m × n and the variables are x1, x2, . . . , xn. We
introduce m new variables xn+1, . . . , xn+m, the so-called slack
variables, and replace the i-th constraint ai1x1 + · · · + ainxn ≤ bi
by the new constraint
ai1x1 + · · · + ainxn + xn+i = bi
for i = 1, 2, . . . , m, together with xn+i ≥ 0, i = 1, . . . , m.
The effect is to transform the problem to an equivalent linear
programming problem in canonical form:
maximize: z = c1x1 + · · · + cnxn
subject to:
a11x1 + · · · + a1nxn + xn+1 = b1
a21x1 + · · · + a2nxn + xn+2 = b2
. . .
am1x1 + · · · + amnxn + xn+m = bm
xi ≥ 0, i = 1, 2, . . . , n + m.
Combining this observation with 10.1.1, we obtain:
Theorem 10.1.2 Every linear programming problem is
equivalent to one in canonical form.
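The passage from standard to canonical form is equally mechanical: one simply borders A with an identity matrix and pads C with zeros. A small sketch, assuming numpy:

    import numpy as np

    def add_slacks(A, C):
        m, n = A.shape
        A_canon = np.hstack([A, np.eye(m)])        # one slack per constraint
        C_canon = np.concatenate([C, np.zeros(m)])
        return A_canon, C_canon                    # B is unchanged

    A = np.array([[2.0, -1.0], [2.0, 1.0]])        # sample data; these same
    C = np.array([3.0, 2.0])                       # constraints reappear in 10.3
    print(add_slacks(A, C))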
Exercises 10.1
1. A publishing house plans to issue three types of pamphlets
P1, P2, P3. Each pamphlet has to be printed and bound.
The times in hours required to print and to bind one copy
of pamphlet Pi are ui and vi respectively. The printing and
binding machines can run for maximum times s and t hours
per day respectively. The profit made on one pamphlet of
type Pi is pi. Let x1, x2, x3 be the numbers of pamphlets
of the three types to be produced per day. Set up a linear
program in x1, x2, x3 which maximizes the profit p per day
and takes into account the times for which the machines are
available.
2. A nutritionist is planning a lunch menu with two food types
A and B. One ounce of A provides ac units of carbohydrate,
af units of fat and ap units of protein; for B the figures are
bc, bf, bp respectively. The costs of one unit of A and one
unit of B are p and q respectively. The meal must provide at
least mc units of carbohydrate, mf units of fat and mp units
of protein. Set up a linear program to determine how many
ounces of A and B should be provided in the meal in order to
minimize the cost e, while satisfying the dietary requirements.
3. Write the following linear programming problem in stan-
dard form:
minimize: z = 2x1 - x2 - x3 + x4
subject to:
x1 + 2x2 + x3 - x4 ≥ 5
3x1 + x2 - x3 + x4 ≤ 4
x1, x2, x4 ≥ 0
4. Write the linear programming problem in Exercise 10.1.3
in canonical form.
5. Consider the following linear programming problem in
x1, x2, . . . , xn with n constraints:
maximize: z = C^T X
subject to: AX = B, X ≥ 0,
where A is an n × n matrix with rank n.
(a) Show that there is a feasible solution if and only if
A^(-1)B ≥ 0.
(b) Show that if a feasible solution exists, it must be
optimal.
(c) If an optimal solution exists, what is the maximum
value of z?
10.2 The Geometry of Linear Programming
Valuable insight into the nature of the linear program-
ming problem is gained by adopting a geometrical point of
view and regarding the problem as one about n-dimensional
space.
We will identify an n-column vector X with a point
(x1, x2, . . . , xn)
in n-dimensional space and denote the latter by R^n. The set
of points X such that
a1x1 + · · · + anxn = b,
where the real numbers a1, . . . , an are not all zero, is called a
hyperplane in R^n. Thus a hyperplane in R^2 is a line and a
hyperplane in R^3 is a plane.
Let A = (a1 a2 . . . an); thus the equation of the hyperplane
is AX = b: let us call it H. Then H divides R^n into
two half spaces
H1 = {X ∈ R^n | AX ≤ b}  and  H2 = {X ∈ R^n | AX ≥ b}.
Clearly
R^n = H1 ∪ H2 and H = H1 ∩ H2.
In a linear programming problem in x1, . . . , xn, each con-
straint requires the point X to lie in a half space or a hyper-
plane. Thus the set of feasible solutions corresponds to the
points lying in all of the half spaces or hyperplanes correspond-
ing to the constraints. In this way we obtain a geometrical
picture of the set of feasible solutions of the problem.
Example 10.2.1
Consider the simple linear programming problem in standard
form:
maximize: z = x + y
subject to:
2x + y ≤ 3
x + 2y ≤ 3
x, y ≥ 0
The set S of feasible solutions is the region of the plane which
is bounded by the lines 2x + y = 3, x + 2y = 3, x = 0, y = 0.
The objective function z = x + y corresponds to a plane
in 3-dimensional space. The problem is to find a point of S
at which the height of the plane above the xy-plane is largest.
Geometrically, it is clear that this point must be one of the
"corner points" (0, 0), (3/2, 0), (1, 1), (0, 3/2). The largest value of
z = x + y occurs at (1, 1). Therefore x = 1 = y is an optimal
solution of the problem.
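Since the optimum is attained at a corner point, the example can be confirmed by evaluating z at the four corners; a short check in Python, a sketch rather than part of the text:

    corners = [(0, 0), (3/2, 0), (1, 1), (0, 3/2)]
    z = lambda p: p[0] + p[1]              # objective z = x + y
    best = max(corners, key=z)
    print(best, z(best))                   # expected: (1, 1) with z = 2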
The next step is to investigate the geometrical properties
of the set of feasible solutions. This involves the concept of
convexity.
Convex subsets
Let X1 and X2 be two distinct points in R^n. The line
segment X1X2 joining X1 and X2 is defined to be the set of
points
{tX1 + (1 - t)X2 | 0 ≤ t ≤ 1}.
For example, if n ≤ 3, the point tX1 + (1 - t)X2, where
0 ≤ t ≤ 1, is a typical point lying between X1 and X2 on the
line which joins them. To see this one has to notice that
X2 - (tX1 + (1 - t)X2) = t(X2 - X1)
and
(tX1 + (1 - t)X2) - X1 = (1 - t)(X2 - X1)
are parallel vectors.
(Keep in mind that we are using X to denote both the point
(x1, x2, x3) and the column vector (x1 x2 x3)^T.)
A non-empty subset S of R^n is called convex if, whenever
X1 and X2 are points in S, every point on the line segment
X1X2 is also a point of S.
It is easy to visualize the situation in R^2: for example,
consider the shaded regions shown.
consider the shaded regions shown.
The interior of the left hand figure is clearly convex, but the
interior of the right hand one is not.
The following property of convex sets is almost obvious.
Lemma 10.2.1
The intersection of a collection of convex subsets of R^n is
either empty or convex.
Proof
Let {Si | i ∈ I} be a set of convex subsets of R^n and assume
that S = ∩i∈I Si is not empty. If S has only one element, then
it is obviously convex. So assume X1 and X2 are distinct
points of S and let 0 ≤ t ≤ 1. Now X1 and X2 belong to Si
for all i, as must tX1 + (1 - t)X2 since Si is convex. Hence
tX1 + (1 - t)X2 ∈ S and S is convex.
Our interest in convex sets is motivated by the following
fundamental result.
Theorem 10.2.2
The set of all feasible solutions of a linear programming prob-
lem is either empty or convex.
Proof
By 10.1.1 we may assume that the linear programming prob-
lem is in standard form. Hence the set of feasible solutions
is the intersection of a collection of half spaces. Because of
10.2.1 it is enough to prove that every half space H is convex.
For example, consider H = {X ∈ R^n | AX ≤ b}, where A
is an n-row vector. Suppose that X1, X2 ∈ H and 0 ≤ t ≤ 1.
Then
A(tX1 + (1 - t)X2) = t(AX1) + (1 - t)(AX2) ≤ tb + (1 - t)b = b.
Hence tX1 + (1 - t)X2 ∈ H and H is convex.
The convex hull
Let X1, X2, . . . , Xm be vectors in R^n. Then a vector of
the form
c1X1 + c2X2 + · · · + cmXm,
where
ci ≥ 0 and c1 + c2 + · · · + cm = 1,
is called a convex combination of X1, X2, . . . , Xm. For example,
when m = 2, every convex combination of X1, X2 has the
form tX1 + (1 - t)X2, where 0 ≤ t ≤ 1. Thus the line segment
X1X2 consists of all the convex combinations of X1 and X2.
The set of all convex combinations of elements of a non-empty
subset S of R^n is called the convex hull of S, written
C(S).
For example, the convex hull of {X1, X2}, where X1 ≠ X2, is
just the line segment X1X2.
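For a finite set S the convex hull can be computed numerically; the sketch below uses scipy.spatial.ConvexHull, an outside library not discussed in this text, whose vertices are exactly the extreme points of C(S).

    import numpy as np
    from scipy.spatial import ConvexHull

    S = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0], [1.0, 1.0]])
    hull = ConvexHull(S)
    print(S[hull.vertices])    # the interior point (1, 1) is not an extreme point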
The relation between the convex hull and convexity is
made clear by the next result.
Theorem 10.2.3
Let S be a non-empty subset of R^n. Then C(S) is the smallest
convex subset of R^n which contains S.
Proof
In the first place it is clear that S ⊆ C(S). We show next
that C(S) is a convex set. Let X, Y ∈ C(S); then we can
write X = c1X1 + · · · + cmXm and Y = d1X1 + · · · + dmXm,
where X1, . . . , Xm ∈ S, ci, di ≥ 0, and
c1 + · · · + cm = 1 = d1 + · · · + dm. Then for any t
satisfying 0 ≤ t ≤ 1, we have
tX + (1 - t)Y = (tc1 + (1 - t)d1)X1 + · · · + (tcm + (1 - t)dm)Xm.
Now
(tc1 + (1 - t)d1) + · · · + (tcm + (1 - t)dm)
= t(c1 + · · · + cm) + (1 - t)(d1 + · · · + dm) = t + (1 - t) = 1.
Consequently tX + (1 - t)Y ∈ C(S) and C(S) is convex.
Next suppose T is any convex subset of R^n containing
S. We must show that C(S) ⊆ T; for then C(S) will be the
smallest convex subset containing S.
Let X ∈ C(S) and write X = c1X1 + · · · + cmXm, where Xi ∈ S,
ci ≥ 0 and c1 + · · · + cm = 1. We will show that X ∈ T by induction
on m ≥ 1, the claim being clearly true if m = 1. Now we have
X = (1 - cm)[(c1/(1 - cm))X1 + · · · + (cm-1/(1 - cm))Xm-1] + cmXm.
Next, since c1 + · · · + cm-1 = 1 - cm, we have
c1/(1 - cm) + · · · + cm-1/(1 - cm) = (1 - cm)/(1 - cm) = 1.
Also 0 ≤ ci/(1 - cm) ≤ 1 since ci ≤ c1 + · · · + cm-1 = 1 - cm for
1 ≤ i ≤ m - 1. Hence
Y = (c1/(1 - cm))X1 + · · · + (cm-1/(1 - cm))Xm-1
belongs to T by the induction hypothesis on m. Finally,
X = (1 - cm)Y + cmXm ∈ T,
since T is convex.
Extreme points
Let S be a convex subset of R^n. A point of S is called an
extreme point if it is not an interior point of any line segment
extreme point if it is not an interior point of any line segment
joining two points of S. For example, the extreme points of
the set of points in the polygon below are just the six vertices
shown.
The extreme points of a convex set can be characterized
in terms of convex combinations.
Theorem 10.2.4
Let S be a convex subset of R^n and let X ∈ S. Then X is an
extreme point of S if and only if it is not a convex combination
of other points of S.
Proof
Suppose X is not an extreme point of S; then
X = tY + (1 - t)Z,
where 0 < t < 1 and Y, Z are distinct points of S; since X is an
interior point of the segment YZ, both Y and Z differ from X.
Then X is certainly a convex combination of other points of S,
namely Y and Z.
Conversely, suppose that X is a convex combination of
other points of S. We will show that X is not an extreme
point of S. By assumption it is possible to write
X = c1X1 + · · · + cmXm,
where Xi ∈ S, Xi ≠ X, 0 < ci ≤ 1 and c1 + · · · + cm = 1. Notice
that cj ≠ 1 for each j; for otherwise the sum of the ci with i ≠ j
would be 0, so that ci = 0 for all i ≠ j and X = Xj, contrary to
assumption.
Just as in the proof of Theorem 10.2.3, we can write
X = (1 - cm)[(c1/(1 - cm))X1 + · · · + (cm-1/(1 - cm))Xm-1] + cmXm.
Also
c1/(1 - cm) + · · · + cm-1/(1 - cm) = (1 - cm)/(1 - cm) = 1
and 0 ≤ ci/(1 - cm) < 1, since ci ≤ c1 + · · · + cm-1 = 1 - cm if
i < m. It follows that
Y = (c1/(1 - cm))X1 + · · · + (cm-1/(1 - cm))Xm-1
is a convex combination of points of S, and so Y belongs to S
since S is convex. Hence X = (1 - cm)Y + cmXm is an interior
point of the line segment joining Y and Xm, so X is not an
extreme point of S, which completes the proof.
It is now time to explain the connection between optimal
solutions of a linear programming problem and the extreme
points of the set of feasible solutions.
Theorem 10.2.5 (The Extreme Point Theorem)
Let S be the set of all feasible solutions of a linear program-
ming problem.
(i) If S is non-empty and bounded, then there is an
optimal solution.
(ii) If an optimal solution exists, then it occurs at an
extreme point of S.
Here a subset S of R^n is said to be bounded if there exists
a positive number d such that -d ≤ xi ≤ d for i = 1, 2, . . . , n
and all (x1, x2, . . . , xn) in S.
Proof of Theorem 10.2.5
Suppose that we have a maximization problem. For simplicity
we will assume throughout that S is bounded and n = 2:
thus S can be visualized as a region of the plane bounded by
straight lines corresponding to the constraints.
Let z = f(x, y) = cx + dy be the objective function; we
can assume c and d are not both 0. Since S is bounded and f
is continuous on S, a standard theorem from calculus can be
applied to show that f has an absolute maximum in S. This
establishes (i).
Next assume that there is an optimal solution. By another
standard theorem, if P(x, y) is a point of S which is an
absolute maximum of f, then either P is a critical point of
f or else it lies on the boundary of S. But f has no critical
points: for fx = c and fy = d, so fx and fy cannot both vanish.
Thus P lies on the boundary of S and so on a line. By the
same argument P cannot lie in the interior of the line. There-
fore P is a point of intersection of two lines and hence it is an
extreme point of S.
We can now summarize the possible situations for a linear
programming problem with set of feasible solutions S.
(a) S is empty: the problem has no solutions;
(b) S is non-empty and bounded: in this case the problem
has an optimal solution and it occurs at an extreme point
of S.
(c) S is unbounded: here optimal solutions need not exist,
but, if they do, they occur at extreme points of S.
We will see in 10.3 that if S is non-empty and bounded,
then it has a finite number of extreme points. By computing
the value of the objective function at each extreme point one
can find an optimal solution of the problem. We conclude
with two examples.
Example 10.2.2
maximize: z = 2x + 3y
subject to:
x + y ≥ 1
x - y ≥ -1
x, y ≥ 0
Here the set of feasible solutions S corresponds to the region
of the xy-plane bounded by the lines x + y = 1, x - y = -1,
x = 0, y = 0. Clearly it is unbounded and z can be arbitrarily
large at points in S. Thus no optimal solutions exist.
Example 10.2.3
maximize: z = 1 - 12x - 3y
subject to:
x + y ≥ 1
x - y ≥ -1
x, y ≥ 0
In this problem the set of feasible solutions is the same set S
as in the previous example. However the maximum value of
z in S occurs at x = 0, y = 1: this is an optimal solution of
the problem.
Exercises 10.2
In Exercises 10.2.1-10.2.3 sketch the convex subset of all
feasible solutions of a linear programming problem with the
given constraints.
1.
x - y ≤ -2
2x + y ≤ 3
x, y ≥ 0

2.
x - 2y ≤ 3
x + y ≤ 6
x, y ≥ 0

3.
x + y + z ≤ 5
x - y - z ≤ 0
x, y, z ≥ 0
4. Find all the extreme points in the programs of Exercises
10.2.1 and 10.2.2 .
5. Suppose the objective function in Exercise 10.2.2 is
z = 2x + 3y. Find the optimal solution when z is to be
maximized.
6. Let S be any subspace of R^n. Prove that S is convex.
Then give an example of a convex subset of R^2 containing
(0, 0) which is not a subspace.
7. Let S be a convex subset of R^n and let T be a linear
operator on R^n. Define T(S) to be {T(X) | X ∈ S}. Prove
that T(S) is convex.
8. Suppose that X1 and X2 are distinct feasible solutions
of a linear programming problem in standard form. If the
objective function has the same value at X1 and X2, prove
that this is the value of the objective function at any point on
the line segment joining X1 and X2.
10.3 Basic Solutions and Extreme Points
We have seen in 10.2 that the extreme points for a linear
programming problem are the key to obtaining an optimal
solution. In this section we describe a method for finding the
extreme points which is the basis of the Simplex Algorithm.
Consider a linear programming problem in canonical form
(remember that any problem can be put in this form):
maximize: z = C^T X
subject to: AX = B, X ≥ 0.
Suppose that the problem has n variables x1, . . . , xn and
m constraints, which means that A is an m × n matrix, while
X, C ∈ R^n and B ∈ R^m.
The linear system AX = B must be consistent if there is
to be any chance of a feasible solution, so we assume this to be
the case; thus the matrix A and the augmented matrix (A | B)
have the same rank r. Hence the linear system AX = B is
equivalent to a system whose augmented matrix has rank r,
with its final m - r rows zero. These rows correspond to
constraints of the form 0 = 0, which are negligible. Therefore
there is no loss in supposing that A is an m × n matrix with
rank m; of course now m ≤ n.
Since A has rank m, this matrix has m linearly independent
columns, say Aj1, Aj2, . . . , Ajm, where j1 < j2 < · · · < jm.
Define
A' = (Aj1 Aj2 . . . Ajm),
which is an m × m matrix of rank m, so that (A')^(-1) exists.
The linear system
A'(xj1 xj2 . . . xjm)^T = B
therefore has a unique solution for (xj1 xj2 . . . xjm)^T,
namely (A')^(-1)B.
This solution is in R^m, not R^n. To remedy this, define
X = (x1 x2 . . . xn)^T by putting xj = 0 if j ≠ j1, j2, . . . , jm.
Then
AX = A(x1 x2 . . . xn)^T = xj1 Aj1 + xj2 Aj2 + · · · + xjm Ajm = B.
Therefore X is a solution of AX = B with the property that
all entries of X, except perhaps those in positions j1, . . . , jm,
are zero. Such a solution is called a basic solution of the
linear programming problem; if in addition all the entries of X
are non-negative, it is a basic feasible solution. The variables
xj1, . . . , xjm are called the basic variables.
The next step is to relate the basic feasible solutions to
the extreme points of a linear programming problem in canon-
ical form.
Theorem 10.3.1
A basic feasible solution of a linear programming problem in
canonical form is an extreme point of the set of feasible solu-
tions.
Proof
Suppose that the linear programming problem is
maximize: z = C^T X
subject to: AX = B, X ≥ 0,
and it has variables x1, . . . , xn. Here A may be assumed to be
an m × n matrix with rank m: as has been pointed out, this
is no restriction. Then A has m linearly independent columns
and, by relabeling the variables if necessary, we can assume
these are the last m columns, say A'1, . . . , A'm. Let
X = (0 . . . 0 x'1 . . . x'm)^T
be the corresponding basic solution. Assume that X is feasible,
i.e., x'j ≥ 0 for j = 1, 2, . . . , m. Our task is to prove that
X is an extreme point of S, the set of all feasible solutions.
Suppose X is not an extreme point of S; then
X = tU + (1 - t)V,
where 0 < t < 1, U, V ∈ S and X ≠ U, V. Write
U = (u1 . . . u(n-m) u'1 . . . u'm)^T
and
V = (v1 . . . v(n-m) v'1 . . . v'm)^T.
Equating the j-th entries of X and tU + (1 - t)V, we obtain
tuj + (1 - t)vj = 0,  1 ≤ j ≤ n - m,
tu'j + (1 - t)v'j = x'j,  1 ≤ j ≤ m.
Since 0 < t < 1 and uj, vj ≥ 0, the first equation shows that
uj = 0 = vj for j = 1, . . . , n - m.
Since U ∈ S, we have AU = B, so that
u'1 A'1 + · · · + u'm A'm = B,
and also
x'1 A'1 + · · · + x'm A'm = B,
since AX = B. Therefore, on subtracting, we find that
(u'1 - x'1)A'1 + · · · + (u'm - x'm)A'm = 0.
However A'1, . . . , A'm are linearly independent, which means
that u'1 = x'1, . . . , u'm = x'm, i.e., U = X, which is a contradiction.
The converse of this result is true.
Theorem 10.3.2
An extreme point of the set of feasible solutions of a linear
programming problem in canonical form is a basic feasible so-
lution.
Proof
Let the linear programming problem be
maximize: z = C^T X
subject to: AX = B, X ≥ 0,
where A is an m × n matrix of rank m. Let X be an extreme
point of the set of feasible solutions.
Suppose that X has s non-zero entries and label the variables
so that the last s entries of X are non-zero, say
X = (0 . . . 0 x'1 . . . x's)^T.
Let A'j be the column of A which corresponds to the entry x'j.
We will prove that A'1, . . . , A's are linearly independent.
Assume that d1A'1 + · · · + dsA's = 0, where not all the dj
are 0. Let ε be any positive number. Then, since
x'1A'1 + · · · + x'sA's = B,
(x'1 + εd1)A'1 + · · · + (x's + εds)A's = B.
In a similar fashion we have
(x'1 - εd1)A'1 + · · · + (x's - εds)A's = B.
Now define
U = (0 . . . 0 x'1 + εd1 . . . x's + εds)^T,
V = (0 . . . 0 x'1 - εd1 . . . x's - εds)^T.
Then AU = B = AV.
Next choose ε so that
0 < ε < x'j / |dj|,  j = 1, 2, . . . , s,
if dj ≠ 0. This choice of ε ensures that x'j ± εdj > 0 for
j = 1, 2, . . . , s. Hence U ≥ 0 and V ≥ 0, so that U and V are
feasible solutions. However, X = (1/2)(U + V), which means that
X = U or X = V since X is an extreme point. But both of these
are impossible because ε > 0 and some dj ≠ 0. It follows that
A'1, . . . , A's are linearly independent and X is a basic feasible
solution.
We are now able to show that there are only finitely many
extreme points in the set of feasible solutions.
Theorem 10.3.3
In a linear programming problem there are finitely many ex-
treme points in the set of feasible solutions.
Proof
We assume that the linear programming problem is in canon-
ical form:
maximize: z = C^T X
subject to: AX = B, X ≥ 0.
We can further assume here that A is an m × n matrix
with rank m. Let X be an extreme point of S, the set of feasible
solutions. Then X is a basic feasible solution by 10.3.2. In
fact, if the non-zero entries of X are xj1, . . . , xjs, the proof of
that theorem shows that the corresponding columns of A, that
is, Aj1, . . . , Ajs, are linearly independent and thus s ≤ m. In
addition we have
xj1 Aj1 + · · · + xjs Ajs = B.
By 2.2.1 this equation has a unique solution for xj1, . . . , xjs.
Therefore X is uniquely determined by j1, . . . , js. Now there
are at most (n choose s) choices for j1, . . . , js, so the total number of
extreme points is at most the sum of the binomial coefficients
(n choose s) for s = 0, 1, . . . , m.
The last theorem shows that in order to find an optimal
solution of a linear programming problem in canonical form,
one can determine the finite set of basic feasible solutions and
test the value of the objective function at each one. The sim-
plex algorithm provides a practical method for doing this and
is discussed in the next section.
In conclusion, we present an example of small order which
illustrates how the basic feasible solutions can be determined.
Example 10.3.1
Consider the linear programming problem
maximize: z = 3x + 2y
subject to:
2x - y ≤ 6
2x + y ≤ 10
x, y ≥ 0
First transform the problem to canonical form by introducing
slack variables u and v:
maximize: z = 3x + 2y
subject to:
2x - y + u = 6
2x + y + v = 10
x, y, u, v ≥ 0
The matrix form of the constraints is

    ( 2  -1  1  0 )           ( 6  )
    ( 2   1  0  1 )  X   =    ( 10 ),   where X = (x y u v)^T.

The coefficient matrix has rank 2 and each pair of its columns
is linearly independent. Clearly there are (4 choose 2) = 6 basic
solutions, not all of them feasible. In each such solution the two
non-basic variables are zero. The basic solutions are listed in
the table below:
     x     y     u     v    type         z
     0     0     6    10    feasible     0
     0    10    16     0    feasible    20
     3     0     0     4    feasible     9
     5     0    -4     0    infeasible  15
     4     2     0     0    feasible    16
     0    -6     0    16    infeasible -12
There are four basic feasible solutions, i.e., extreme points.
The one that produces the largest value of z is x = 0, y = 10,
giving z = 20. Thus x = 0, y = 10 is an optimal solution.
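The enumeration in this example follows a simple recipe: choose two linearly independent columns, solve for the corresponding variables and set the others to zero. A short sketch, assuming numpy, reproduces the table above.

    import itertools
    import numpy as np

    A = np.array([[2.0, -1, 1, 0],
                  [2.0,  1, 0, 1]])        # columns correspond to x, y, u, v
    b = np.array([6.0, 10.0])
    c = np.array([3.0, 2, 0, 0])

    for cols in itertools.combinations(range(4), 2):
        sub = A[:, cols]
        if abs(np.linalg.det(sub)) < 1e-12:
            continue                        # dependent columns: no basic solution
        x = np.zeros(4)
        x[list(cols)] = np.linalg.solve(sub, b)
        tag = "feasible" if np.all(x >= -1e-12) else "infeasible"
        print(np.round(x, 3), tag, "z =", c @ x)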
Exercises 10.3
In each of the following linear programming problems,
transform the problem to canonical form and determine all the
basic solutions. Classify these as infeasible or basic feasible,
and then find the optimal solutions.
1.
maximize: z = 3x - y
subject to:
x + 3y ≤ 6
x - y ≤ 2
x, y ≥ 0

2.
maximize: z = 2x + 3y
subject to:
2x - y ≤ 6
2x + y ≤ 10
x, y ≥ 0

3.
maximize: z = x1 + x2 + x3
subject to:
2x1 - x2 + 4x3 ≤ 12
4x1 + 2x2 + 5x3 ≤ 4
x1, x2, x3 ≥ 0
4. A linear programming problem in standard form has m
constraints and n variables. Prove that the number of extreme
points is at most the sum of the binomial coefficients
(m + n choose s) for s = 0, 1, . . . , m.
10.4 The Simplex Algorithm
We are now in a position to describe the simplex algo-
rithm, which is a practical method for solving linear program-
ming problems, based on the theory developed in the preced-
ing sections. The method starts with a basic feasible solution
and, by changing one basic variable at a time, seeks to find
an optimal solution of the problem. It should be kept in mind
that there are finitely many basic feasible solutions.
Consider a linear programming problem in standard form
with variables x1, x2, . . . , xn and m constraints:
maximize: z = C^T X
subject to: AX ≤ B, X ≥ 0.
Thus A is an m × n matrix. For the present we will assume
that B ≥ 0, which is likely to be true in many applications:
just what to do if this condition does not hold will be discussed
later.
Convert the program to one in canonical form by introducing
slack variables xn+1, . . . , xn+m:
maximize: z = C^T X
subject to: A'X = B, X ≥ 0,
where
A' = (A | Im),
an m × (n + m) matrix. Also X = (x1 x2 . . . xn+m)^T and
C = (c1 c2 . . . cn 0 . . . 0)^T. Notice that A' has rank m since
columns n + 1, n + 2, . . . , n + m are linearly independent.
Recall from 10.3 that the extreme points of the set of
feasible solutions are exactly the basic feasible solutions. Also
keep in mind that in a basic solution the non-basic variables
all have the value 0.
The initial tableau
For the linear programming problem in canonical form
above we have the solution
x1 = x2 = · · · = xn = 0, xn+1 = b1, . . . , xn+m = bm.
Since B ≥ 0, this is a basic feasible solution in which the basic
variables are the slack variables xn+1, . . . , xn+m. The value of
z at this point is 0 since z = c1x1 + · · · + cnxn.
The data are displayed in an array called the initial
tableau.
              x1    x2   ...   xn    xn+1  ...  xn+m    z
    xn+1     a11   a12   ...  a1n      1   ...     0    0   |  b1
    xn+2     a21   a22   ...  a2n      0   ...     0    0   |  b2
     ...     ...   ...   ...  ...     ...  ...    ...  ...  |  ...
    xn+m     am1   am2   ...  amn      0   ...     1    0   |  bm
             -c1   -c2   ...  -cn      0   ...     0    1   |   0
Here the rows in the array correspond to the basic vari-
ables, which appear on the left, while the columns correspond
to all the variables, including z. The bottom row, which lies
outside the main array and is called the objective row, displays
the coefficients in the equation -c1x1 - · · · - cnxn + z = 0.
The z-column is often omitted since it never changes during
the algorithmic process. The rightmost column displays the
current values of the basic variables, with the value of z in the
lower right corner.
Entering and departing variables
Consider the initial tableau above. Suppose that all the
entries in the objective row are non-negative. Then cj ≤ 0
and, since z = c1x1 + · · · + cnxn = 0 at this point, if we change
the value of one of the non-basic variables x1, . . . , xn by making
it positive, the
value of z will decrease or remain the same. Therefore the
value of z cannot be increased from 0 and thus the solution is
optimal.
On the other hand, suppose that the objective row con-
tains a negative entry -cj, so cj > 0. Since
z = c1x1 + · · · + cnxn, it may be possible to increase z by in-
not violate any of the constraints.
Suppose that the most negative entry in the objective
row is -cj. The question of interest is: by how much can we
increase the value of xj? Since all other non-basic variables
equal 0, the i-th constraint requires that
aij xj + xn+i = bi,
so that xn+i = bi - aij xj ≥ 0. Hence
aij xj ≤ bi
for i = 1, 2, . . . , m. Now if aij ≤ 0, this imposes no restriction
on xj since bi ≥ 0. Thus if aij ≤ 0 for all i, then xj can be
increased without limit, so there are no optimal solutions.
If aij > 0 for some i, on the other hand, we must ensure
that
0 ≤ xj ≤ bi / aij.
The number
bi / aij
is called a θ-ratio for xj. Hence the value of xj cannot be
increased by more than the smallest non-negative θ-ratio for
xj; for otherwise one of the constraints will be violated.
Suppose that the smallest non-negative θ-ratio for xj occurs
in the i-th row: this is called the pivotal row. One then
applies row operations to the tableau, with the aim of making
the i-th entry of column j equal to 1 and all other entries of the
column equal to 0. (This is called pivoting about the (i, j) entry.)
The choice of i and j guarantees that no negative entries will
appear in the rightmost column. Replace xi (the departing
variable) by xj (the entering variable). Now xj becomes a
basic variable with value bi/aij, replacing xi. With this value of
xj, the value of z will increase by bicj/aij, at least if bi > 0.
After substituting xj for xi in the list of basic variables,
we obtain the second tableau. This is treated in the same way
as the first tableau, and if it is not optimal, one proceeds to a
third tableau. If at some point in the procedure all the entries
of the objective row become non-negative, an optimal solution
has been reached and the algorithm stops.
Summary of the simplex algorithm
Assume that a linear programming problem is given in
standard form
maximize: z = C^T X
subject to: AX ≤ B, X ≥ 0,
where B ≥ 0. Then the following procedure is to be applied.
1. Convert the program to canonical form by introducing
slack variables. With the slack variables as basic
variables, construct the initial tableau.
2. If no negative entries appear in the objective row, the
solution is optimal. Stop.
3. Choose the column with the most negative entry in
the objective row. The variable for this column, say Xj,
is the entering variable.
4. If all the entries in column j are negative, then there
are no optimal solutions. Stop.
5. Find the row with the smallest non-negative θ-value
for xj. If this corresponds to xi, then xi is the departing
variable.
6. Pivot about the (i, j) entry, i.e., apply row operations
to the tableau to obtain 1 as the (i, j) entry, with all other
entries in column j equal to 0.
7. Replace Xi by Xj in the tableau obtained in step 6.
This is the new tableau. Return to Step 2.
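The seven steps translate almost line for line into code. The following is a minimal sketch, assuming numpy and ignoring the degeneracy issue discussed later; it is not a robust implementation.

    import numpy as np

    def simplex(A, B, C, tol=1e-9):
        # maximize C^T X subject to AX <= B, X >= 0, assuming B >= 0
        m, n = A.shape
        T = np.zeros((m + 1, n + m + 1))
        T[:m, :n], T[:m, n:n + m], T[:m, -1] = A, np.eye(m), B
        T[m, :n] = -C                          # objective row
        basis = list(range(n, n + m))          # start with the slack variables
        while True:
            j = int(np.argmin(T[m, :-1]))      # most negative objective entry
            if T[m, j] >= -tol:                # step 2: optimal tableau
                break
            col = T[:m, j]
            if np.all(col <= tol):             # step 4: unbounded
                raise ValueError("no optimal solution")
            theta = np.full(m, np.inf)         # step 5: theta-ratios
            theta[col > tol] = T[:m, -1][col > tol] / col[col > tol]
            i = int(np.argmin(theta))
            T[i] /= T[i, j]                    # step 6: pivot about (i, j)
            for r in range(m + 1):
                if r != i:
                    T[r] -= T[r, j] * T[i]
            basis[i] = j                       # step 7: swap basic variables
        X = np.zeros(n + m)
        X[basis] = T[:m, -1]
        return X[:n], T[m, -1]

    # Data of Example 10.4.1 below: expect X = (1, 1/3, 0) and z = 11.
    A = np.array([[1.0, 1, 2], [2, 3, 4], [3, 3, 1]])
    print(simplex(A, np.array([2.0, 3, 4]), np.array([8.0, 9, 5])))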
Example 10.4.1
maximize: z = 8x1 + 9x2 + 5x3
subject to:
x1 + x2 + 2x3 ≤ 2
2x1 + 3x2 + 4x3 ≤ 3
3x1 + 3x2 + x3 ≤ 4
x1, x2, x3 ≥ 0
Convert the problem to canonical form by introducing
slack variables x4, x5, x6:
maximize: z = 8x1 + 9x2 + 5x3
subject to:
x1 + x2 + 2x3 + x4 = 2
2x1 + 3x2 + 4x3 + x5 = 3
3x1 + 3x2 + x3 + x6 = 4
xj ≥ 0, j = 1, 2, . . . , 6.
The initial basic feasible solution is x1 = x2 = x3 = 0,
x4 = 2, x5 = 3, x6 = 4, with basic variables x4, x5, x6. The
initial tableau is:
             x1   *x2    x3    x4    x5    x6
      x4      1     1     2     1     0     0   |   2
    **x5      2     3     4     0     1     0   |   3
      x6      3     3     1     0     0     1   |   4
             -8    -9    -5     0     0     0   |   0
Here the z-column has been suppressed. The initial basic
feasible solution x4 = 2, x5 = 3, x6 = 4 is not optimal since
there are negative entries in the objective row; the most negative
entry occurs in column 2, so x2 is the entering variable
(indicated in the tableau by *).
The θ-values for x2 are 2, 1, 4/3, corresponding to x4, x5, x6.
The smallest non-negative θ-value is 1, so x5 is the departing
variable (indicated in the tableau by **). Now pivot about the
(2, 2) entry to obtain the second tableau.
            *x1    x2    x3    x4    x5    x6
      x4    1/3     0   2/3     1  -1/3     0   |   1
      x2    2/3     1   4/3     0   1/3     0   |   1
    **x6      1     0    -3     0    -1     1   |   1
             -2     0     7     0     3     0   |   9
The objective row still has a negative entry, so this is not
optimal: the entering variable is x1. The smallest θ-value for
x1 is 1, occurring for x6, so this is the departing variable. Now
pivot about the (3, 1) entry to get the third tableau.
             x1    x2    x3    x4    x5    x6
      x4      0     0   5/3     1     0  -1/3   |  2/3
      x2      0     1  10/3     0     1  -2/3   |  1/3
      x1      1     0    -3     0    -1     1   |   1
              0     0     1     0     1     2   |  11
Since there are no negative entries in the objective row,
this tableau is optimal. The optimal solution is therefore
x1 = 1, x2 = 1/3, x3 = 0, giving z = 11.
The next example shows how the simplex method can
detect a case where there are no optimal solutions.
Example 10.4.2
maximize: z = 5x1 - 4x2
subject to:
x1 - x2 ≤ 2
-2x1 + x2 ≤ 2
x1, x2 ≥ 0
Introduce slack variables x3 and x4 and pass to canonical
form:
maximize: z = 5x1 - 4x2
subject to:
x1 - x2 + x3 = 2
-2x1 + x2 + x4 = 2
x1, x2, x3, x4 ≥ 0
The initial basic feasible solution is x1 = 0 = x2, x3 = 2,
x4 = 2, with basic variables x3, x4. The initial tableau is
therefore
            *x1    x2    x3    x4
    **x3      1    -1     1     0   |   2
      x4     -2     1     0     1   |   2
             -5     4     0     0   |   0
The entering variable is x1 and the departing variable x3.
The second tableau is:
             x1   *x2    x3    x4
      x1      1    -1     1     0   |   2
      x4      0    -1     2     1   |   6
              0    -1     5     0   |  10
The next entering variable is x2; however all the entries
in the X2-column are negative, which means that x2 can be in-
creased without limit. Therefore this problem has no optimal
solution.
Geometrically, what happened here is that the set of fea-
sible solutions is the infinite region of the plane lying between
the lines x1 - x2 = 2, -2x± + x2 = 2, x± = 0, x2 = 0. In this
region z = bxi — 4x2 can take arbitrarily large values.
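A library solver reaches the same conclusion; in scipy.optimize.linprog a status value of 3 reports an unbounded objective. A sketch using that outside library, not part of the text:

    from scipy.optimize import linprog

    res = linprog(c=[-5.0, 4.0],                 # minimize -(5x1 - 4x2)
                  A_ub=[[1.0, -1.0], [-2.0, 1.0]],
                  b_ub=[2.0, 2.0], method="highs")
    print(res.status, res.message)               # expect status 3: unbounded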
Degeneracy
Up to this point we have not taken into account the pos-
sibility that the simplex algorithm may fail to terminate: in
fact this could happen.
To see how it might occur, suppose that at some stage
in the simplex algorithm the entering variable has two equal
smallest non-negative θ-values. Then after pivoting one of the
basic variables will have the value zero, a phenomenon called
degeneracy. If in the next tableau the basic variable whose
value was 0 is the departing variable, the objective function
will not increase in value. This raises the possibility that at
some point we might return to this tableau, in which event
the simplex algorithm will run forever.
In practice the simplex algorithm very seldom fails to ter-
minate. In any case there is a simple adjustment to the algo-
rithm which avoids the possibility of non-termination. These
adjustments involve different choices of entering and departing
variables, as indicated below.
(i) To select the entering variable, choose the variable
with a negative entry in the objective row which has the
smallest subscript.
(ii) To select the departing variable, choose the basic
variable with smallest non-negative θ-value and smallest
subscript.
This procedure is known as Bland's Rule. It can be shown
that the simplex method, when combined with Bland's Rule,
will always terminate, even if degeneracy occurs.
The Two Phase Method
The reader may have noticed that our version of the sim-
plex algorithm does not work if some constraints have nega-
tive numbers on the right side. We consider briefly how this
situation can be remedied.
Consider a linear program in standard form:
maximize: z = C^T X
subject to: AX ≤ B, X ≥ 0,
where A is m × n. As usual we introduce slack variables
xn+1, . . . , xn+m to obtain a problem in canonical form:
maximize: z = C^T X
subject to: A'X = B, X ≥ 0,
where A' = [A | Im]. If some bi is negative, we can multiply
that constraint by -1 to get an entry -bi > 0 on the right
hand side. The problem now is that we do not have a basic
feasible solution, for the obvious solution xn+i = bi is not
feasible. What is called for at this point is a general method
for finding an initial basic feasible solution for any linear
programming problem in canonical form.
Suppose we have a linear programming problem in canonical
form:
maximize: z = C^T X
subject to: AX = B, X ≥ 0,                (I)
where A is m × n. The problem is to find an initial basic
feasible solution. Once this is found, the simplex algorithm
can be run. There is no loss in assuming that B ≥ 0 since we
can, if necessary, multiply a constraint by -1.
The method is to introduce m new variables y1, y2, . . . , ym
called artificial variables. These are used to form the auxiliary
program:
maximize: z = -y1 - y2 - · · · - ym
subject to: AX + Y = B, X ≥ 0, Y ≥ 0,                (II)
where Y = (y1 y2 . . . ym)^T.
If (II) has an optimal solution X, Y with z = 0, then
all the yi must equal 0 and thus AX = B. Hence X is a
basic feasible solution of (I). On the other hand, if the optimal
solution of (II) yields a negative value of z, there are no feasible
solutions of (II) with Y = 0, i.e., there are no feasible solutions
of (I). Thus if we can solve the problem (II), we will either find
a basic feasible solution of (I) or else conclude that (I) has no
feasible solutions.
But can we in fact solve the problem (II)? The answer
is affirmative: for X = 0, y1 = b1, . . . , ym = bm is clearly
a basic feasible solution of (II), so it can be used to form the
initial tableau for problem (II). After solving (II), either we
will have a basic feasible solution of (I) or we will know that
no feasible solutions exist. In the former event the simplex
algorithm can then be run for problem (I). This is known as
the Two Phase Method.
We summarize the two phases for solving the linear pro-
gramming problem (I).
Phase One
Apply the simplex method to the auxiliary program (II). If
there is no optimal solution or if the optimal solution yields
a negative value z, then there are no feasible solutions of (I).
Stop. Otherwise a basic feasible solution to problem I is found.
Phase Two
Starting with the basic feasible solution obtained in Phase
One, use the simplex algorithm to find an optimal solution of
(I) or show that none exists.
In conclusion, the Two Phase Method can be applied to
any linear programming problem in canonical form.
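Setting up the auxiliary program from the data of (I) is routine: border A with an identity block for the artificial variables and take minus their sum as the objective. A sketch, assuming numpy; the function name auxiliary_program is ours:

    import numpy as np

    def auxiliary_program(A, B):
        m, n = A.shape
        sign = np.where(B < 0, -1.0, 1.0)       # first arrange that B >= 0
        A1, B1 = sign[:, None] * A, sign * B
        A_aux = np.hstack([A1, np.eye(m)])      # columns for x1..xn, y1..ym
        C_aux = np.concatenate([np.zeros(n), -np.ones(m)])
        return A_aux, B1, C_aux                 # maximize C_aux^T (X, Y)

    A = np.array([[1.0, 2, 2, 0], [1, 2, 1, 1], [3, 6, 2, 0]])   # Example 10.4.3
    print(auxiliary_program(A, np.array([12.0, 18, 24])))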
Example 10.4.3
maximize: z = 2x1 - 2x2 - 3x3 + 2x4
subject to:
x1 + 2x2 + 2x3 = 12
x1 + 2x2 + x3 + x4 = 18
3x1 + 6x2 + 2x3 = 24
xi ≥ 0
This problem is given in canonical form. The Two Phase
Method will be applied, the first phase being to find a basic
feasible solution. To this end we set up the auxiliary problem:
maximize: z = -y1 - y2 - y3
subject to:
x1 + 2x2 + 2x3 + y1 = 12
x1 + 2x2 + x3 + x4 + y2 = 18
3x1 + 6x2 + 2x3 + y3 = 24
xi, yj ≥ 0
The initial tableau for this problem is:
             x1    x2    x3    x4    y1    y2    y3
      y1      1     2     2     0     1     0     0   |  12
      y2      1     2     1     1     0     1     0   |  18
      y3      3     6     2     0     0     0     1   |  24
              0     0     0     0    -1    -1    -1   | -54
Here the initial basic feasible solution is y1 = 12, y2 = 18, y3 =
24, with z = -54. But notice that the entries in the objective
row corresponding to the basic variables are not 0; this is
because z is expressed as -y1 - y2 - y3. We need to replace
y1, y2, y3 by expressions in x1, x2, x3, x4 and thereby eliminate
the offending entries. Note that -y1 = x1 + 2x2 + 2x3 - 12,
-y2 = x1 + 2x2 + x3 + x4 - 18 and -y3 = 3x1 + 6x2 + 2x3 - 24.
Adding these, we obtain
z = -y1 - y2 - y3 = 5x1 + 10x2 + 5x3 + x4 - 54.
The next step is to use this expression to form the new
objective row:
             x1   *x2    x3    x4    y1    y2    y3
      y1      1     2     2     0     1     0     0   |  12
      y2      1     2     1     1     0     1     0   |  18
    **y3      3     6     2     0     0     0     1   |  24
             -5   -10    -5    -1     0     0     0   | -54
This is the first tableau for the auxiliary problem. The enter-
ing variable is x2 and the departing variable y3. The second
tableau is:
             x1    x2   *x3    x4    y1    y2    y3
    **y1      0     0   4/3     0     1     0  -1/3   |   4
      y2      0     0   1/3     1     0     1  -1/3   |  10
      x2    1/2     1   1/3     0     0     0   1/6   |   4
              0     0  -5/3    -1     0     0   5/3   | -14
The entering variable is x3 and the departing variable is
y1. The third tableau is:
             x1    x2    x3   *x4    y1    y2    y3
      x3      0     0     1     0   3/4     0  -1/4   |   3
    **y2      0     0     0     1  -1/4     1  -1/4   |   9
      x2    1/2     1     0     0  -1/4     0   1/4   |   3
              0     0     0    -1   5/4     0   5/4   |  -9
The entering variable is x4 and the departing variable is
y2. The fourth tableau is
             x1    x2    x3    x4    y1    y2    y3
      x3      0     0     1     0   3/4     0  -1/4   |   3
      x4      0     0     0     1  -1/4     1  -1/4   |   9
      x2    1/2     1     0     0  -1/4     0   1/4   |   3
              0     0     0     0     1     1     1   |   0
This tableau is optimal with z = 0. Hence we have a
basic feasible solution of the original problem, x1 = 0, x2 = 3,
x3 = 3, x4 = 9.
Now Phase Two begins. To obtain an initial tableau, in
the final tableau of Phase 1 delete the columns corresponding
to the artificial variables y1, y2, y3. The new basic variables
are x3, x4, x2. Replace the objective row by the entries of the
original objective function, but retain 0 in the bottom right
hand corner:
             x1    x2    x3    x4
      x3      0     0     1     0   |   3
      x4      0     0     0     1   |   9
      x2    1/2     1     0     0   |   3
             -2     2     3    -2   |   0
Next eliminate the non-zero entries in the objective row
corresponding to the basic variables £3,£4,£2. This is done
by adding to the objective row (—2) x row 3, (—3) x row 1
and 2 x row 2. This yields the tableau:
            *x1    x2    x3    x4
      x3      0     0     1     0   |   3
      x4      0     0     0     1   |   9
    **x2    1/2     1     0     0   |   3
             -3     0     0     0   |   3
The entering variable is x1 and the departing variable is x2.
The next tableau is:
             x1    x2    x3    x4
      x3      0     0     1     0   |   3
      x4      0     0     0     1   |   9
      x1      1     2     0     0   |   6
              0     6     0     0   |  21
This tableau is optimal with solution x1 = 6, x2 = 0, x3 = 3,
x4 = 9 and z = 21.
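As a check, the original problem (I) of this example can be given to a library solver, which deals with the search for an initial feasible point internally. A sketch using scipy, an outside library:

    import numpy as np
    from scipy.optimize import linprog

    c = np.array([2.0, -2, -3, 2])              # maximize 2x1 - 2x2 - 3x3 + 2x4
    A_eq = np.array([[1.0, 2, 2, 0], [1, 2, 1, 1], [3, 6, 2, 0]])
    b_eq = np.array([12.0, 18, 24])
    res = linprog(-c, A_eq=A_eq, b_eq=b_eq, method="highs")
    print(res.x, -res.fun)                      # expect x = (6, 0, 3, 9), z = 21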
In conclusion we remark that there is one possible situa-
tion that the Two Phase Method cannot handle. It could be
that in the final tableau of Phase One at least one artificial
variable is basic. There is a modification of the Two Phase
Method to deal with this possibility. The reader is referred to
a text on linear programming such as [12] or [13] for details.
Needless to say, we have merely skimmed the surface of
linear programming. Recently an improvement on the simplex
method known as Karmarkar's algorithm has been discovered.
Again the interested reader may consult one of the above ref-
erences for details.
Exercises 10.4
In the following problems use the simplex method to solve
the linear programming problem or show that no optimal so-
lution exists.
1.
maximize: z = 3x - y
subject to:
x + 3y ≤ 6
x - y ≤ 2
x, y ≥ 0
2.
maximize: z = 2x + 3y
subject to:
2x - y ≤ 6
2x + y ≤ 10
x, y ≥ 0

3.
minimize: z = -2x + 3y
subject to:
x - y ≥ -2
x - 2y ≤ 4
x, y ≥ 0

4.
minimize: z = 3x1 - 2x2 + 2x3
subject to:
x1 + x2 ≤ 7
2x1 + x2 + x3 ≤ 4
xj ≥ 0

5.
maximize: z = x1 + 2x2 + x3 - x4
subject to:
3x1 + x2 + 2x3 - x4 ≤ 2
2x1 + 4x2 - 4x3 ≤ 4
xi ≥ 0

6.
maximize: z = x1 + x2 + 3x3 - x4
subject to:
2x1 - x2 + x3 ≤ 8
2x1 + 3x2 + x4 ≤ 6
3x1 + x2 + 2x3 + 4x4 ≤ 18
xj ≥ 0
7. Use the Two Phase Method to solve the following linear
programming problem, noting that only one artificial variable
is needed.
maximize: z = x1 + 2x2 - x3
subject to:
2x1 + x2 + x3 ≤ 4
x1 + x2 + 2x3 = 3
xj ≥ 0
Appendix
MATHEMATICAL INDUCTION
Mathematical induction is one of the most powerful meth-
ods of proof in mathematics and it is used in several places
in this book. Since some readers may be unfamiliar with in-
duction, and others may feel in need of a review, we present
a brief account of it here.
The method of proof by induction rests on the following
principle.
Principle of mathematical induction
Let m be an integer and let P(n) be a statement or propo-
sition defined for each integer n > m. Assume furthermore
that the following hold:
(i) P(m) is true;
(ii) if P(n — 1) is true, then P(n) is true.
Then the conclusion is that P{n) is true for all integers n > m.
While this may sound harmless enough, it is in fact an
axiom for the integers: it cannot be deduced from the usual
arithmetic properties of the integers and its validity must be
assumed.
We shall give some examples to illustrate the use of this
principle.
Example A.l
If n is any positive integer, prove by mathematical induction
that the sum of the first n positive integers equals n(n + 1)/2.
Let P(n) denote the statement:
1 + 2 + · · · + n = n(n + 1)/2.
We have to show that P(n) is true for all integers n ≥ 1. Now
clearly P(1) is true: it simply asserts that 1 = (1 · 2)/2. Suppose
that P(n - 1) is true; we must show that P(n) is also true.
In order to prove this, we begin with
1 + 2 + · · · + (n - 1) = (n - 1)n/2,
which is known to be true, and then add n to both sides. This
yields
1 + 2 + · · · + (n - 1) + n = (n - 1)n/2 + n = n(n + 1)/2.
Hence P(n) is true. Therefore by the Principle of Mathematical
Induction P(n) is true for all n ≥ 1.
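Induction proves the formula for every n; a quick numerical spot-check, which of course is not a proof, can still be reassuring:

    # Check the formula of Example A.1 for small n.
    for n in range(1, 11):
        assert sum(range(1, n + 1)) == n * (n + 1) // 2
    print("formula verified for n = 1, ..., 10")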
Example A.2
Let n be any positive integer. Prove by mathematical induction
that the integer 8^(n+1) + 9^(2n-1) is always divisible by 73.
Let P(n) be the statement: 73 divides 8^(n+1) + 9^(2n-1). Then
we easily verify that P(1) is true. Assume that P(n - 1) is
true; thus 8^n + 9^(2n-3) is divisible by 73. We need to show
that P(n) is true. The method in this example is to express
8^(n+1) + 9^(2n-1) in terms of 8^n + 9^(2n-3); thus
8^(n+1) + 9^(2n-1) = 8(8^n + 9^(2n-3)) + 9^(2n-1) - 8 · 9^(2n-3)
                   = 8(8^n + 9^(2n-3)) + 9^(2n-3)(9^2 - 8)
                   = 8(8^n + 9^(2n-3)) + 73 · 9^(2n-3).
Since P(n - 1) is true, the last integer is divisible by 73. Therefore
P(n) is true.
Occasionally, the following alternate form of mathemati-
cal induction is useful.
Principle of mathematical induction - alternate form
Let m be an integer and let P(n) be a statement or propo-
sition defined for each integer n > m. Assume furthermore
that the following hold:
(i) P(m) is true;
(ii) if P(k) is true for all k < n , then P(n) is true.
Then the conclusion is that P(n) is true for all integers n > m.
Example A.3
Prove that every integer n > 1 is a product of prime numbers.
Let P(n) be the statement that n is a product of primes;
here n ≥ 2. Then P(2) is certainly true since 2 is a prime.
Assume that P(k) is true for all k < n. We have to show
that P(n) is true. Now if n is a prime, P(n) is certainly true.
Assume that n is not a prime; then n = n1n2 where n1 and
n2 are integers greater than 1 and less than n. Hence P(n1)
and P(n2) are true, so both n1 and n2 are products of primes.
Therefore n = n1n2 is a product of primes and P(n) is true. It
now follows from the second form of the Principle of Mathematical
Induction that P(n) is true for all n ≥ 2.
Exercises
1. If n is a positive integer, prove by induction that the sum
of the squares of the first n positive integers equals
n(n + 1)(2n + 1)/6.
2. If n is a positive integer, prove by induction that the sum
of the cubes of the first n positive integers equals (n(n + 1)/2)^2.
3. Let u0, u1, u2, . . . be a sequence of integers which satisfies
the recurrence relation un+1 = 2un + 3 and also u0 = 1. Prove
by induction that un = 2^(n+2) - 3.
4. Prove by induction that the number of symmetric n × n
matrices over the field of two elements equals 2^(n(n+1)/2).
5. Use the second form of mathematical induction to prove
that each integer > 1 is uniquely expressible as a product of
primes.
ANSWERS TO THE EXERCISES
Exercises 1.1
1- (_3 ""4 J "g)-2. (a)(-l)^-1
;(b)4z + j - 4 .
3. Six: 0i2,i, 06,2, 04)3, 03i4, 02,6, Oi,^. 4. n should be prime.
5. Diagonal matrices.
Exercises 1.2
22
-5
14
14
- 6
1
/ 9
A3
=1-4
 1 2
3. A is m x n and B is n x m. 4. A6
= I2. 9. True.
12. Numbers of books in library, lent out, lost are 7945, 1790,
265 respectively.
14. The matrix equals
1 5/2 11/2  / 0 1/2 - 3 / 2 '
5/2 5 -7/2 + -1/2 0 5/2
11/2 -7/2 5 /  3/2 -5/2 0,
18. (a) The inverse is | ( „ " I; (b) not invertible.
21.
/ 0 1 0"
0 0 1
 0 0 0
Exercises 1.3
2. A non-zero matrix need not have an inverse.
o <yn2
o n
( n
+ l ) / 2
1 0 0
4. A + B=  1 0 0 | , A2
=
1 0 0
0 1 0'
AB= | 0 0 1
1 1 1
7. The integer 2 does not have an inverse.
Exercises 2.1
1. xi = c/3 + d/3 - 1/3, x2 = 4c/3 - 2d/3 + 11/3, x3 = c,
£4 = d.
2. xi = 2c/3 - 5/3, x2 = 2c/3 + 7/3, x3 = c/3 + 2/3, x4 = c
.
3. Inconsistent.
4. (a) xi = —c , X2 = c , X3 = 0, X4 = 0; (b) x = X2 = X3 =
0.
5. For t = - 4 or 3. 6. t ^ - 1 / 3 . 7. n(n + l ) / 2 .
Exercises 2.2
1. W [ 0 ^ 4/5);(b) (J g "J
2 - 2  ,.
l V 5 J ; ( b ) ( (
1 2 - 3 0
(c) ( 0 1 -11/5 1/5
0 0 0 1
/ l 0 7/5  / l 0 7/5 0'
2. (a) /3; (b) 0 1 -11/5 ; (c) 0 1 -11/5 0
 0 0 0 /  0 0 0 1
5. n(n + l ) / 2 . 6. n2
. 7. J2 and H j
8. The number of pivots equals n.
Exercises 2.3
M.)(i!)(J_3°)(J?)(i?" J
(b)
(These answers are not unique).
2.
5. n(n + l)/2 and n'
2
6. (a) i ^ 3^ . ( b ) _ i / 3 _ 6 - 3 ; (c) not
invertible.
7. t = —3 or 2. 8. Entries on the principal diagonal must be
non-zero.
Exercises 3.1
1. Odd; -aiia23a38a45a52066a.74087-
2. Even; ai8a25«33«4205ia67076a89a94-
3. 19. 4. n(n) - 1. 5. M13 = 11 = A13, M23 = 7 = -A2 3 ,
M33 = - 6 = A33. 6. 84. 7. (a) -40, (b) -30, (c) -36.
/ 0 0 1 0 0
1 0 0 0 0
9. 0 0 0 1 0
0 0 0 0 1
Vo 1 0 0 0 /
Exercises 3.2
1. (a) 133; (b) 132; (c) -26. 10. u2 = 3, u3 = 14, w4 = 63.
Exercises 3.3
-6 -14
2- (a) ^ ( 2 4 ) ; ( b ) - & ( - 1 5 -11 _ 8 j ;
/ I
0
0
 0
- 1
1
0
0
0
- 1
1
0
0
0
- 1
1
(c)
4. (a) xi = 1, x2 = 2, £3 = 3; (b) x = 1, x2 = 0, x3 = - 2 .
7. 2x-3y-z = l.
Exercises 4.1
2. (a) No; (b) no; (c) yes; (d) yes.
Exercises 4.2
1. (a) No; (b) no; (c) yes. 2. (a) No; (b) yes; (c) yes. 3.
Yes. 4. No.
Exercises 4.3
1. (a) Linearly independent; (b) linearly independent;
(c) linearly dependent. 2. True. 3. True. 4. False.
9. False. 10. No.
Exercises 5.1
1. (a) Ex = l/13(9Xi+3X2 -8X3 ), E2 = l/13(-3Xi - X 2 +
7X3), E3 = l/13(-17Xi - 10X2 + I8X3);
(b) Ex = -2YX + AY2 - Y3, E2 = AYX - 7Y2 + 2y3, E3 =
y i - 3 F 2 + y3.
2. (a) (-2 - 1 1)T
; (b) ( 1 - 1 1 0)T
and (-2 1 0 1)T
.
3. mn. 6.S = V. 8. - 4 ( - l 1 0)T
- 2(-l 0 l ) r
.
Exercises 5.2
1. (a) Basis of the row space is (1 0 63/2), (0 1 18), basis of
the column space is (10)T
,(0 1)T
; (b) basis of the row space is
(1 0 5/19 25/19), (0 1 4/19 20/19), basis of the column space
is(10 4)T
,(0 1 1)T
.
2. (a) 1 + 5z3
/3, x + x3
/3, x2
+ x3
;
(b)
( 1/76/7J' (l/7 -1/7
5. vn — r. 6. They are < rank of A and < rank of B.
Exercises 5.3
1. The subspaces generated by (1 0)T
, (1 1)T
, (0 l ) r
.
2. dim(t/) = n(n + l)/2 and dim(W) = n(n - l)/2. 5. False.
6. Let U =< fi I i = 1,..., 7 > and W =< fi | i = 4,..., 14 >
where fi = xl
~l
.
7. dim(U + W)=3, dim(UnW) = l.
8. Basis for U+W 3
, basis for UCW is l — x+x2
.
11. dim(E/i) + -.- + dim({7fc). 12. No.
/ - l / 3  / c / 3 + d / 3 
1 4 y _ [ 11/3 v [ 4c/3-d/3
0
~ 0 ' c
V 0 y V d /
15. n - 1 .
Exercises 6.1
1. (a) None of these; (b) bijective; (c) surjective; (d) injective.
4. F-1
(x) = {(x + 5)/2}1
/3
.
Exercises 6.2
1. (a) No; (b) yes; (c) no, unless n — 1.
i - i - i - i  Z1
°
4. I 2 1 - 1 0 | . 5 .
0 1 - 1 1
6.
7.
8.
cos 20 sin 20
sin 20 — cos 20
1/2 1/4 -3/4
0 1/2 -1/2
0 0 1
0
2
-3
0
0
6
- 5 /
6 - 7 - 2
x2 - 3 - 1
12. The statement is true.
9. They have different determinants.
Exercises 6.3
1. (a) Basis of kernel is (-1 1 0 0)T
, (-1 0 1 0)r
, (-1 0 0 1)T
,
basis of image is 1; (b) basis of kernel is 1, basis of image is
1, x; (c) basis of kernel is (—3 2)T
, basis of image is (1 2)T
.
3. R6
, R6 and M(2,3, R) are all isomorphic: C6
and P6(C)
are isomorphic. 6. True. 10. They are not equivalent for
infinitely generated vector spaces.
3. 1/14(1 2 3)T
and
Exercises 7.1
1. 92.84°. 2. ±l/v/42(4 1 - 5)T
1/VTi. 4. 9/^/26. 5. Vector product = (-14 - 4 8)'", area
= 2^69. 8. det(X Y Z) = 0. 9. Dimension = n - 1 or n ,
according as X ^ 0 or X — 0.
11. t(3-V2 + 3(V2 + l)i (/2-3)(l+i) 4) where i = V=l
and t is arbitrary. 13. (X*Y/Y2
)Y.
Exercises 7.2
1. (a) No; (b) yes; (c) no. 4. (a) No; (6) no; (c) yes.
8. 23-120:r+110:r2
. 9. 1/105(17 - 190 331)T
.
Exercises 7.3
2. l/>/2(l 0 - 1 ) T
, 1/3(2 1 2)T
, l / v l 8 ( l - 4 1)T
.
3. 1/^2(0 1 l ) r
, 1/3(1 - 2 2)T
, l/>/l8(4 1 - 1)T
.
4. 1/^7(1 -6x), 75/154(2 + 30x-42a;2
). 5. (1/2 4 1/2)T
.
/ 0 1/3
6. Q = l/y/2 - 2 / 3 lA/18 I and
 l/v/2 2/3
V 2 0 1/^2
i?= | 0 3 - 1 / 3
0 0 5/VT8
8. The product of ( " " v * V « + « ) / 3 
(f(1
-^3
).10.Q = 0'a,dfl = iJ'.
Exercises 7.4
1. (a) xi = 1, z2 = - 3 / 5 , x3 = -3/5; (b) m = 1631/665,
x2 = -88/95, x3 = -66/95. 2. r = -70£/51 + 3610/51.
3. y = - 4 + 7x/2 - x2
/2. 4. xi = 13/35, x2 = -17/70,
x3 = 1/70. 5. (-8e-x
+ 26e~2
) + (6e_1
- 18e"2
)x.
6. 12(TT2
- 10)/TT3
+ 60(-TT2
+ 12)x/ir4
+ 60(TT2
- 12)X2
/TT5
.
Exercises 8.1
1. (a) Eigenvalues —2, 6; eigenvectors t(—5 3)T
, t( 1)T
;
(b) eigenvalues 1, 2, 3, eigenvectors t ( - l 1 2)T
, t(-2 1 4)T
,
£(—1 1 4)T
; (c) eigenvalues 1, 2, 3, 4, eigenvectors
( 2 - 4 - 1 1)T
, ( 0 - 2 0 1)T
, (0 0 1 1)T
, (0 0 0 1)T
. 5. False.
7. (a) ( - J);(b )(l "j "jJ.
8. They should be both zero or both non-zero. 13. Non-zero
constants.
Exercises 8.2
1. (a) yn = 4- 3n + 1
- 3 • 4n + 1
, zn = - 3 n + 1
+ 4n+1
;
(b) yn = l/9(10-7" + 5(-2)"+1
), zn = 1/9(5- 7n
-2(-2)-+1
).
2. an = l/3(a0 + 2b0 + 2.4n
(a0 - &o)), bn = l/3(a0 + 260 +
4n
(—ao + bo)) '• if ^0 > bo> species A nourishes, and species B
dies out: if ao < bo, the reverse holds.
3. rn = 2/V5{((l + >/5)/2)" - ((1 - y/E)/2)n
}.
4. un = (2n+1
+ (-l)n
)/3.
5. yn = 1 + 2n, zn = 2n. 6. yn = ( ( - l ) n + 1
+ l)/2,
zn = (38.4"-1
+ 3 ( - l ) n
- 5)/30.
7. Employed 85.7%, unemployed 14.3%. 8. Equal numbers
at each site. 9. Conservatives 24%, liberals 45%, socialists
31%.
Exercises 8.3
1. (a) yi = -cie~5 x
+ 2c2ex
, y2 = cxe~5x
+ c2ex
;
(b) yi = aex
- c2e5x
, y2 = cxex
+ c2e5x
;
(c) 2/i = cxex
+ c3e3x
, 2/2 = -2c2ex
, y3 = c2ex
+ c3e3x
.
2. 2/i = e2x
(cos x + sin x), y2 = 2e2x
cos x.
3. 2/1 = {3c2x — ci)e2x
, 2/2 = (—3c2£ + ci + c2)e2x
: particular
solution 2/i = 6xe2x
, y2 — (2 — 6x)e2s
.
4. 2/i = cie^ +C2e~x
+ c3e3a::
-f-C4e~3x
, y2 = —cex
+ 5c2e~x
+
c3e3x
— 5c4e_3x
.
5. n/c. 7. (a) 2/1 = —u + u2, y2 = u — 3^2 where ui =
cicosh y/2x + disinh [2x and 1
*
2 = c2cosh 2x + d2sinh 2x;
(b) 2/1 = —4tii — u2, 2/2 = wi + u2 where u = cicosh x+
disinh x and u2 = C2COsh 2x + d2smh 2x.
8- Vi = (-1 - /2)wi + (-1 + V2)w2, 2/2 = wi + w2 where
wi = ci cos ux + di sin ux and w2 = C2 cos vx + d2 sin us
with tt = aJ2 + 1/2 and w = a  / 2 -  / 2 .
Exercises 9.1
-1/^/3 2/^6 0'
/ 1 1  / -1-/V0
*/Vv u
1. (a) 1/^2 _} ! ; (b) l/v/3 1/^6 - 1 / ^ 2
V J
 W3 W6 W2,
(c)W2Q j), » = >/=!.
8.r-u=^.
Exercises 9.2
1. (a) Positive definite; (b) indefinite; (c) indefinite.
2. Indefinite. 6. (a) Ellipse; (b) parabola.
7. (a) Ellipsoid; (b) hyperboloid (of one sheet).
8. (a) Local minimum at (—4, 2); (b) local maximum at
(-1 - y/2, 1 - y/2), local minimum at (1 + ^2, - 1 + ^2),
saddle points at (^/2 - 1 , ^ / 2 + 1) and (1 - y/2, - 1 - y/2);
(c) local minimum at (—2/5, —1/5,3/10).
9. The smallest and largest values are 5
2 17
and 5
^17
re-
spectively.
11. The spheres have radii 0.768 and 0.434 respectively.
Exercises 9.3
1. (a) No; (b) yes; (c) yes. 2. (a) ( _ ° J ) ! 0>) ( ° I ? ) •
3. dim(V') = n2
. 5. 2zi yi + 4x2 y'2. 7. (a) Yes; (b) no.
0 1 0  /1/ 2 0 1/2'
8. I - 1 0 0 ; S= 0 1 1 / 2
0 0 0 / I 0 0 1
Exercises 9.4
1. (a) x-2; (b) {x - 2)(x - 3); (c) x2
- 1; (d) {x - 2)2
{x - 3).
6. (a) (a - 4)(x + 1); (b) (x - 2)2
; (c) (x - l)3
.'
L 0
7. A must be similar to where r + s
L
r
0 os
8. A must be similar to a block matrix with a block Ir, t
1
and a block 0S where r + 2t + s
blocks
10.
0 0
n.
n.
fO 0
1 0
0 1
Vo 0
0
0
0 -o2
1 - a n _ i /
11.
y 3 :
Vi = {c2x2
+ (ci + c2):r + (c0 + cx))ex
, y2 = c2ex
,
(c2x + ci)ex
.
Exercises 10.1
1.
maximize: p = p1x1 + p2x2 + p3x3
subject to:
u1x1 + u2x2 + u3x3 ≤ s
v1x1 + v2x2 + v3x3 ≤ t
xj ≥ 0

2.
minimize: e = px + qy
subject to:
acx + bcy ≥ mc
afx + bfy ≥ mf
apx + bpy ≥ mp
x, y ≥ 0
3.
maximize: z = -2x1 + x2 + x3^+ - x3^- - x4
subject to:
-x1 - 2x2 - x3^+ + x3^- + x4 ≤ -5
3x1 + x2 - x3^+ + x3^- + x4 ≤ 4
x1, x2, x3^+, x3^-, x4 ≥ 0
4.
maximize: z = -2x1 + x2 + x3^+ - x3^- - x4
subject to:
-x1 - 2x2 - x3^+ + x3^- + x4 + x5 = -5
3x1 + x2 - x3^+ + x3^- + x4 + x6 = 4
x1, x2, x3^+, x3^-, x4, x5, x6 ≥ 0
5. (c) z = C^T A^(-1) B.
Exercises 10.2
4. (a) In Exercise 10.2.1 the extreme points are (0, 2),
(1/3, 7/3), (0, 3).
(b) In Exercise 10.2.2 the extreme points are (0, 0), (3, 0),
(5, 1), (0, 6).
5. The optimal solution is x = 0, y = 6.
Exercises 10.3
1. The optimal solution is x = 3, y = 1.
2. The optimal solution is x = 0, y = 10.
3. The optimal solution is x1 = 0, x2 = 2, x3 = 0.
Exercises 10.4
1. x = 3, y = 1.
2. x = 0, y = 10.
3. No optimal solution.
4. x1 = 0, x2 = 4, x3 = 0.
5. x1 = 0, x2 = 4/3, x3 = 1/3, x4 = 0.
6. x1 = 0, x2 = 2/3, x3 = 26/3, x4 = 0.
7. x1 = 0, x2 = 3, x3 = 0.
BIBLIOGRAPHY
Abstract Algebra
(1) I.N. Herstein, "Topics in Algebra", 2nd ed., Wiley, New
York, 1975.
(2) S. MacLane and G. Birkhoff, "Algebra", 3rd ed., Chelsea,
New York, 1988.
(3) D.J.S. Robinson, "An Introduction to Abstract Algebra",
De Gruyter, Berlin, 2003.
(4) J.J. Rotman, "A First Course in Abstract Algebra", 2nd
ed., Prentice Hall, Upper Saddle River, NJ, 2000.
Linear Algebra
(5) C.W. Curtis, "Linear Algebra, an Introductory
Approach", Springer, New York, 1984.
(6) F.R. Gantmacher, "The Theory of Matrices", 2 vols.,
Chelsea, New York, 1960.
(7) P.R. Halmos, "Finite-Dimensional Vector Spaces", Van
Nostrand-Reinhold, Princeton, N.J., 1958.
(8) B. Kolman, "Introductory Linear Algebra with Applica-
tions", 5th ed., Macmillan, New York, 1993.
(9) S.J. Leon, "Linear Algebra with Applications", 5th ed.,
Prentice Hall, Upper Saddle River, NJ, 1998.
(10) G. Strang, "Linear Algebra and its Applications", 3rd
ed., Harcourt Brace Jovanovich, San Diego, 1988.
Applied Linear Algebra
(11) R. Bellman, "Introduction to Matrix Analysis", 2nd ed.,
Society for Industrial and Applied Mathematics, Philadelphia,
1995.
(12) H. Karloff, "Linear Programming", Birkhauser, Boston
1991.
(13) B. Kolman and R.E. Beck, "Elementary Linear Program-
ming with Applications", Academic Press, San Diego, 1995.
(14) B. Noble and J.W. Daniel, "Applied Linear Algebra", 3rd
ed., Prentice-Hall, Englewood Cliffs, N.J., 1988.
Some Related Books of Interest
(15) W.R. Derrick and S.I. Grossman, "Elementary Differen-
tial Equations with Applications", 2nd ed., Addison-Wesley,
Reading, MA, 1982.
(16) C.H. Edwards and D.E. Penney, "Elementary Differential
Equations with Boundary Value Problems", 2nd ed., Prentice-
Hall, Englewood Cliffs, N.J., 1989
(17) J.G. Kemeny and J.L. Snell, "Finite Markov Chains",
Springer, New York, 1976.
(18) G.B. Thomas and R.L. Finney, "Calculus and Analytic
Geometry", 9th ed., Addison-Wesley, Reading, MA, 1996.
Index
Addition,
of linear operators, 186
of matrices, 6
Adjoint of a matrix, 80
Algebra, 188
of linear operators, 186
of matrices, 188
Angle between two vectors, 196
Artificial variable, 408
Associative law, 12, 25, 155
Augmented matrix, 3
Auxiliary program, 408
Back substitution, 32
Basic solution, 392
Basis, 114
change of, 169
ordered, 120
Bijective function, 153
Bilinear form, 332
matrix representation of, 333
skew-symmetric, 335
symmetric, 335
Bland's rule, 406
Block, Jordan, 355
Canonical form, linear program in
375
Cauchy-Schwartz inequality, 196, 203,
213
Cayley-Hamilton Theorem, 351
Change of basis, 169
and linear transformations, 173
Characteristic equation, 260
Characteristic polynomial, 260
Codomain, 152
Coefficient matrix, 3
Cofactor, 65
Column,
echelon form, 49
expansion, 66
operation, 49
space, 126
vector, 4
Commutative law, 12, 25
Companion matrix, 274
Complex,
inner product space, 217
scalar product, 206
transpose, 205
Composite of functions, 154
Congruent matrices, 334
eigenvalues of, 339
Conic, 315
Consistent linear system, 34
Constraint, 372
Convex
combination, 384
hull, 384
set, 382
Coordinate vector, 120
Coset, 143
Cost matrix, 15
Cramer's rule, 84
Critical point, 324
Crossover diagram, 60
Degeneracy, 406
Departing variable, 402
Determinant, 57
definition of, 64
of a product, 79
properties of, 70
Diagonal matrix, 5
Diagonalizable matrix, 267, 307
Differential equations, 108
system of, 288, 363
Dimension, 117
formulas, 134, 147, 180
Direct sum of subspaces, 137
Distance of a point from a plane,
198
Distributive law, 13, 25
Domain of a function, 152
Echelon form, 36
Eigenspace, 257
Eigenvalue, 257, 266
of hermitian matrix, 304
Eigenvector, 257, 266
Elementary,
column operation, 49
matrix, 47
row operation, 41
Entering variable, 402
Equations, linear, 3, 30
homogeneous, 38
Equivalent linear systems, 34
Euclidean space, 88
Even permutation, 60
Expansion,
column, 66
row, 66
Extreme point, 386
Theorem, 388
Factorization, QR —, 234
Feasible solution, 373
Fibonacci sequence, 281
Field,
axioms of a, 25
of two elements, 26
Finitely generated, 101
Function, 152
Fundamental subspaces, 224
Fundamental Theorem of Algebra,
260
Gaussian elimination, 35
Gauss-Jordan elimination, 37
General solution, 34
Geometry of linear programming, 380
Gram-Schmidt process, 230
Group, 28
general linear, 28
Hermitian matrix, 303
Hessian, 327
Homogeneous,
linear differential equation, 108
linear system, 38
Identity,
element, 25
function, 153
linear operator, 161
matrix, 4
Image,
of a function, 152
of a linear transformation, 178
Inconsistent linear system, 34
Indefinite quadratic form, 320
Infinitely generated, 102
Injective function, 153
Inner product, 209
complex, 217
real, 209
standard, 210
Inner product space, 209
Intersection of subspaces, 133
finding basis of, 139
Inverse,
of a function, 155
of a matrix, 17, 53
Inversion of natural order, 60
Invertible,
function, 155
matrix, 17
Isomorphic,
algebras, 190
vector spaces, 182
Isomorphism, 181, 190
Isomorphism theorems, 184, 192
Jordan,
block, 355
normal form, 356, 368
string, 356
Kernel, 178
Law of Inertia, 340
Laws of,
exponents, 22
matrix algebra, 12
Least Squares, Method of, 241
and QR-factorization, 248
geometric interpretation of, 250
in inner product spaces, 253
Least squares solution, 243
Length of a vector, 193
Line segment, 88
Linear,
combination, 99
dependence, 104
differential equation, 108
independence, 104
mapping, 158
operator, 159
recurrence, 276
Linear programming problems, 370
Linear system,
of differential equations, 288
of equations, 3, 30
of recurrences, 278
Linear transformation, 158
matrix representation of, 162,
166
Linearly,
dependent, 104
independent, 104
Lower triangular, 5
Markov process, 284
regular, 285
Mathematical induction, 415
Matrices,
addition of, 6
congruent, 334
equality of, 2
multiplication of, 7
scalar multiplication of, 6
similar, 175
Matrix,
definition of, 1
diagonal, 5
diagonalizable, 267, 307
elementary, 47
hermitian, 303
identity, 4
invertible, 17
non-singular, 17
normal, 310
orthogonal, 235
partitioned, 20
permutation, 62
powers of, 11
scalar, 5
skew-hermitian, 312
skew-symmetric, 12
square, 4
symmetric, 12
triangular, 4
triangularizable, 271
unitary, 238
Maximum, local, 324
Method of Least Squares, 241
Minimum, local, 324
Minimum polynomial, 349
Minor, 65
Monic polynomial, 349
Multiplication of matrices, 7
Negative,
of a matrix, 6
of a vector, 95
Negative definite quadratic form, 320
Negative semidefinite, 330
Non-singular, 17
Norm, 212
Normal,
form of a matrix, 50
matrix, 310
system, 244
Normed linear space, 214
Null space of a matrix, 99
Objective,
function, 372
row, 400
Odd permutation, 60
One-one, 153
correspondence, 153
Onto, 153
Operation,
column, 49
row, 41
Optimal least squares solution, 251
Optimal solution of linear program,
373
Ordered basis, 120
Orthogonal,
basis, 228
complement, 218
linear operator, 240
matrix, 235
set, 226
vectors, 196, 203, 211
Orthogonality,
in inner product spaces, 211
in Rn, 203
Orthonormal,
basis, 228
set, 226
Parallelogram rule, 90
Partitioned matrix, 20
Permutation, 59
matrix, 62
Pivot, 36
Pivotal row, 402
Polynomial,
characteristic, 260
minimum, 349
Positive definite quadratic form, 320
Positive semidefinite, 330
Powers of a matrix, 11
negative, 24
Principal axes, 316
Principal diagonal, 4
Product of,
determinants, 79
linear operators, 187
matrices, 7
Projection of a vector,
on a line, 196
on a subspace, 222
QR-factorization, 234
Quadratic form, 313
indefinite, 320
negative definite, 320
positive definite, 320
Quadric surface, 318
Quotient space, 143
dimension of, 147
Rank of a matrix, 130
Ratio, θ-, 402
Real inner product space, 209
Recurrences, linear, 276
system of, 278
Reduced,
column echelon form, 49
echelon form, 37
row echelon form, 37, 44
Reflection, 176
Regular Markov process, 285
Right-handed system, 201
Ring with identity, 12
matrix over, 26
of n x n matrices, 27
Rotation, 172
Row,
echelon form, 41
expansion, 66
operation, 41
space, 126
vector, 3
Row-times-column rule, 8
Saddle point, 324
Scalar, 6
matrix, 5
multiplication, 6, 95
product, 193
projection, 197
triple product, 208
Scalar multiple,
of a linear operator, 186
of a matrix, 6
Schur's Theorem, 305
Sign of a permutation, 62
Similar matrices, 175
Simplex algorithm, 399
Singular matrix, 17
Skew-hermitian matrix, 312
Skew-symmetric,
bilinear form, 335
matrix, 12
Slack variable, 377
Solution,
general, 34
non-trivial, 38
trivial, 38
Solution space, 99
Spectral Theorem, 307
Standard basis,
of Pn(R), 118
of Rn, 114
Standard form, linear program in,
374
String, Jordan, 356
Subspace, 97
fundamental, 224
generated by a subset, 100
improper, 97
spanned by a subset, 100
zero, 97
Sum of subspaces, 133
finding a basis for, 139
Surjective function, 153
Sylvester's Law of Inertia, 340
Symmetric,
bilinear form, 335
matrix, 12
System,
of differential equations, 288
of linear equations, 3, 30
of linear recurrences, 278
Tableau, 400
Trace, 263
Transaction, 123
Transition matrix, 284
Transpose, 11
complex, 205
Transposition, 61
Triangle inequality, 204, 215
Triangle rule, 90
Triangular matrix, 4
Triangularizable matrix, 271
Trivial solution, 38
Two Phase Method, 407
Unit vector, 194, 212
Unitary matrix, 238
Upper triangular matrix, 4
Vandermonde determinant, 75
Vector, 95
column, 4
product, 200
projection, 197
row, 3
triple product, 209
Vector space, 87
axioms for, 95
examples of, 87
Weight function, 225
Wronskian, 109
Zero,
linear transformation, 161
matrix, 4
subspace, 97
vector, 95
A Course in
LINEAR ALGEBRA
with Applications
2nd Edition
This book is a comprehensive introduction to linear algebra which
presupposes no knowledge on the part of the reader beyond the calculus.
It gives a thorough treatment of all the basic concepts, such as vector
space, linear transformation and inner product. The book proceeds at a
gentle pace, yet provides full proofs. The concept of a quotient space
is introduced and is related to solutions of systems of linear equations.
Also a simplified treatment of Jordan normal form is given.
Numerous applications of linear algebra are described: these include
systems of linear recurrence relations, systems of linear differential
equations, Markov processes and the Method of Least Squares. In
addition, an entirely new chapter on linear programming introduces the
reader to the Simplex Algorithm and stresses understanding the theory
on which the algorithm is based.
The book is addressed to students who wish to learn linear algebra, as
well as to professionals who need to use the methods of the subject in
their own fields.
Derek J.S. Robinson received his Ph.D. degree from Cambridge University. He has held
positions at the University of London, the National University of Singapore and the University
of Illinois at Urbana-Champaign, where he is currently Professor of Mathematics. He is
the author of five books and numerous research articles on the theory of groups and other
branches of algebra.
www.worldscientific.com

A Course In LINEAR ALGEBRA With Applications

  • 1. A Course in LINEAR ALGEBRA with Applications Derek J. S. Robinson
  • 2. A Course in LINEAR ALGEBRA with Applications 2nd Edition
  • 4. 2nd Edition ik. i l ^ f £ % M J 5% 9%tf"%I'll Stiffen University of Illinois in Urbana-Champaign, USA l | 0 World Scientific NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI
  • 5. Published by World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library. A COURSE IN LINEAR ALGEBRA WITH APPLICATIONS (2nd Edition) Copyright © 2006 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in anyform or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher. For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher. ISBN 981-270-023-4 ISBN 981-270-024-2 (pbk) Printed in Singapore by B & JO Enterprise
  • 8. PREFACE TO THE SECOND EDITION The principal change from the first edition is the addition of a new chapter on linear programming. While linear program- ming is one of the most widely used and successful applications of linear algebra, it rarely appears in a text such as this. In the new Chapter Ten the theoretical basis of the simplex algo- rithm is carefully explained and its geometrical interpretation is stressed. Some further applications of linear algebra have been added, for example the use of Jordan normal form to solve systems of linear differential equations and a discussion of ex- tremal values of quadratic forms. On the theoretical side, the concepts of coset and quotient space are thoroughly explained in Chapter 5. Cosets have useful interpretations as solutions sets of systems of linear equations. In addition the Isomorphisms Theorems for vector spaces are developed in Chapter Six: these shed light on the relationship between subspaces and quotient spaces. The opportunity has also been taken to add further exer- cises, revise the exposition in several places and correct a few errors. Hopefully these improvements will increase the use- fulness of the book to anyone who needs to have a thorough knowledge of linear algebra and its applications. I am grateful to Ms. Tan Rok Ting of World Scientific for assistance with the production of this new edition and for patience in the face of missed deadlines. I thank my family for their support during the preparation of the manuscript. Derek Robinson Urbana, Illinois May 2006 vii
  • 10. PREFACE TO THE FIRST EDITION A rough and ready definition of linear algebra might be: that part of algebra which is concerned with quantities of the first degree. Thus, at the very simplest level, it involves the so- lution of systems of linear equations, and in a real sense this elementary problem underlies the whole subject. Of all the branches of algebra, linear algebra is the one which has found the widest range of applications. Indeed there are few areas of the mathematical, physical and social sciences which have not benefitted from its power and precision. For anyone work- ing in these fields a thorough knowledge of linear algebra has become an indispensable tool. A recent feature is the greater mathematical sophistication of users of the subject, due in part to the increasing use of algebra in the information sci- ences. At any rate it is no longer enough simply to be able to perform Gaussian elimination and deal with real vector spaces of dimensions two and three. The aim of this book is to give a comprehensive intro- duction to the core areas of linear algebra, while at the same time providing a selection of applications. We have taken the point of view that it is better to consider a few quality applica- tions in depth, rather than attempt the almost impossible task of covering all conceivable applications that potential readers might have in mind. The reader is not assumed to have any previous knowl- edge of linear algebra - though in practice many will - but is expected to have at least the mathematical maturity of a student who has completed the calculus sequence. In North America such a student will probably be in the second or third year of study. The book begins with a thorough discussion of matrix operations. It is perhaps unfashionable to precede systems of linear equations by matrices, but I feel that the central ix
  • 11. X Preface position of matrices in the entire theory makes this a logical and reasonable course. However the motivation for the in- troduction of matrices, by means of linear equations, is still provided informally. The second chapter forms a basis for the whole subject with a full account of the theory of linear equations. This is followed by a chapter on determinants, a topic that has been unfairly neglected recently. In practice it is hard to give a satisfactory definition of the general n x n determinant without using permutations, so a brief account of these is given. Chapters Five and Six introduce the student to vector spaces. The concept of an abstract vector space is probably the most challenging one in the entire subject for the non- mathematician, but it is a concept which is well worth the effort of mastering. Our approach proceeds in gentle stages, through a series of examples that exhibit the essential fea- tures of a vector space; only then are the details of the def- inition written down. However I feel that nothing is gained by ducking the issue and omitting the definition entirely, as is sometimes done. Linear tranformations are the subject of Chapter Six. After a brief introduction to functional notation, and numer- ous examples of linear transformations, a thorough account of the relation between linear transformations and matrices is given. In addition both kernel and image are introduced and are related to the null and column spaces of a matrix. Orthogonality, perhaps the heart of the subject, receives an extended treatment in Chapter Seven. After a gentle in- troduction by way of scalar products in three dimensions — which will be familiar to the student from calculus — inner product spaces are denned and the Gram-Schmidt procedure is described. The chapter concludes with a detailed account of The Method of Least Squares, including the problem of
  • 12. Preface xi finding optimal solutions, which texts at this level often fail to cover. Chapter Eight introduces the reader to the theory of eigenvectors and eigenvalues, still one of the most powerful tools in linear algebra. Included is a detailed account of ap- plications to systems of linear differential equations and linear recurrences, and also to Markov processes. Here we have not shied away from the more difficult case where the eigenvalues of the coefficient matrix are not all different. The final chapter contains a selection of more advanced topics in linear algebra, including the crucial Spectral Theo- rem on the diagonalizability of real symmetric matrices. The usual applications of this result to quadratic forms, conies and quadrics, and maxima and minima of functions of several variables follow. Also included in Chapter Nine are treatments of bilinear forms and Jordan Normal Form, topics that are often not con- sidered in texts at this level, but which should be more widely known. In particular, canonical forms for both symmetric and skew-symmetric bilinear forms are obtained. Finally, Jordan Normal Form is presented by an accessible approach that re- quires only an elementary knowledge of vector spaces. Chapters One to Eight, together with Sections 9.1 and 9.2, correspond approximately to a one semester course taught by the author over a period of many years. As time allows, other topics from Chapter Nine may be included. In practice some of the contents of Chapters One and Two will already be familiar to many readers and can be treated as review. Full proofs are almost always included: no doubt some instructors may not wish to cover all of them, but it is stressed that for maximum understanding of the material as many proofs as possible should be read. A good supply of problems appears at the end of each section. As always in mathematics, it is an
  • 13. xii Preface indispensible part of learning the subject to attempt as many problems as possible. This book was originally begun at the suggestion of Harriet McQuarrie. I thank Ms. Ho Hwei Moon of World Scientific Publishing Company for her advice and for help with editorial work. I am grateful to my family for their patience, and to my wife Judith for her encouragement, and for assis- tance with the proof-reading. Derek Robinson Singapore March 1991
  • 14. CONTENTS Preface to the Second Edition vii Preface to the First Edition ix Chapter One Matrix Algebra 1.1 Matrices 1 1.2 Operations with Matrices 6 1.3 Matrices over Rings and Fields 24 Chapter Two Systems of Linear Equations 2.1 Gaussian Elimination 30 2.2 Elementary Row Operations 41 2.3 Elementary Matrices 47 Chapter Three Determinants 3.1 Permutations and the Definition of a Determinant 57 3.2 Basic Properties of Determinants 70 3.3 Determinants and Inverses of Matrices 78 xm
  • 15. xiv Contents Chapter Four Introduction to Vector Spaces 4.1 Examples of Vector Spaces 87 4.2 Vector Spaces and Subspaces 95 4.3 Linear Independence in Vector Spaces 104 Chapter Five Basis and Dimension 5.1 The Existence of a Basis 112 5.2 The Row and Column Spaces of a Matrix 126 5.3 Operations with Subspaces 133 Chapter Six Linear Transformations 6.1 Functions Defined on Sets 152 6.2 Linear Transformations and Matrices 158 6.3 Kernel, Image and Isomorphism 178 Chapter Seven Orthogonality in Vector Spaces 7.1 Scalar Products in Euclidean Space 193 7.2 Inner Product Spaces 209 7.3 Orthonormal Sets and the Gram-Schmidt Process 226 7.4 The Method of Least Squares 241 Chapter Eight Eigenvectors and Eigenvalues 8.1 Basic Theory of Eigenvectors and Eigenvalues 257 8.2 Applications to Systems of Linear Recurrences 276 8.3 Applications to Systems of Linear Differential Equations 288
  • 16. Contents XV Chapter Nine More Advanced Topics 9.1 Eigenvalues and Eigenvectors of Symmetric and Hermitian Matrices 303 9.2 Quadratic Forms 313 9.3 Bilinear Forms 332 9.4 Minimum Polynomials and Jordan Normal Form 347 Chapter Ten Linear Programming 10.1 Introduction to Linear Programming 370 10.2 The Geometry of Linear Programming 380 10.3 Basic Solutions and Extreme Points 391 10.4 The Simplex Algorithm 399 Appendix Mathematical Induction 415 Answers to the Exercises 418 Bibliography 430 Index 432
  • 17. Chapter One MATRIX ALGEBRA In this first chapter we shall introduce one of the prin- cipal objects of study in linear algebra, a matrix or rectan- gular array of numbers, together with the standard matrix operations. Matrices are encountered frequently in many ar- eas of mathematics, engineering, and the physical and social sciences, typically when data is given in tabular form. But perhaps the most familiar situation in which matrices arise is in the solution of systems of linear equations. 1.1 Matrices An m x n matrix A is a rectangular array of numbers, real or complex, with m rows and n columns. We shall write dij for the number that appears in the ith row and the jth column of A; this is called the (i,j) entry of A. We can either write A in the extended form / an «21 V&rol or in the more compact form Thus in the compact form a formula for the (i,j) entry of A is given inside the round brackets, while the subscripts m and n tell us the respective numbers of rows and columns of A. 1 &12 • • • CLn «22 - - ' &2n Q"m2 ' ' ' Q"mn '
  • 18. 2 Chapter One: Matrix Algebra Explicit examples of matrices are / 4 3 , / 0 2.4 6 [l 2) a n d {^=2 3/5 - l j - Example 1.1.1 Write down the extended form of the matrix ((-l)*j + 1)3,2 • The (i,j) entry of the matrix is (—l)l j + i where i — 1, 2, 3, and j — 1, 2. So the matrix is (1"0- It is necessary to decide when two matrices A and B are to be regarded as equal; in symbols A = B. Let us agree this will mean that the matrices A and B have the same numbers of rows and columns, and that, for all i and j , the (i,j) entry of A equals the (i,j) entry of B. In short, two matrices are equal if they look exactly alike. As has already been mentioned, matrices arise when one has to deal with linear equations. We shall now explain how this comes about. Suppose we have a set of m linear equations in n unknowns xi, X2, • • • , xn. These may be written in the form { anxi + CL12X2 + • • • + anxn = bi CL21X1 + a22X2 + • • • + a2nXn = £ > 2 omiXi + am2x2 + • • • + a Here the a^ and bi are to be regarded as given numbers. The problem is to solve the system, that is, to find all n-tuples of numbers xi, x2, ..., xn that satisfy every equation of the
  • 19. 1.1: Matrices 3 system, or to show that no such numbers exist. Solving a set of linear equations is in many ways the most basic problem of linear algebra. The reader will probably have noticed that there is a ma- trix involved in the above linear system, namely the coefficient matrix • "• = = y&ij )m,n- In fact there is a second matrix present; it is obtained by using the numbers bi, b2, • • ., bmto add a new column, the (n + l)th, to the coefficient matrix A. This results in an m x (n + 1) matrix called the augmented matrix of the linear system. The problem of solving linear systems will be taken up in earnest in Chapter Two, where it will emerge that the coefficient and augmented matrices play a critical role. At this point we merely wish to point out that here is a natural problem in which matrices are involved in an essential way. Example 1.1.2 The coefficient and augmented matrices of the pair of linear equations 2xi —3x2 +5a;3 = 1 ^ -xx + x2 - x3 = 4 are respectively 2 - 3 5 , f 2 -3 5 1 and - 1 1 - 1 7 V - 1 1 - 1 4 Some special matrices Certain special types of matrices that occur frequently will now be recorded. (i) A 1 x n matrix, or n — row vector, A has a single row A = (an a12 ... aln).
  • 20. 4 Chapter One: Matrix Algebra (ii) An m x 1 matrix, or m-column vector, B has just one column b2i B = bml/ (iii) A matrix with the same number of rows and columns is said to be square. (iv) A zero matrix is a matrix all of whose entries are zero. The zero m x n matrix is denoted by 0mn or simply 0. Sometimes 0nn is written 0n. For example, O23 is the matrix 0 0 0 0 0 0 (v) The identity nxn matrix has l's on the principal diagonal, that is, from top left to bottom right, and zeros elsewhere; thus it has the form ( 0 • • • 1 ^ 0 1 • • • 0 This matrix is written 0 0 • • • 1 / In or simply I. The identity matrix plays the role of the number 1 in matrix multiplication. (vi) A square matrix is called upper triangular if it has only zero entries below the principal diagonal. Similarly a matrix
  • 21. 1.1: Matrices 5 is lower triangular if all entries above the principal diagonal are zero. For example, the matrices are upper triangular and lower triangular respectively. (vii) A square matrix in which all the non-zero elements lie on the principal diagonal is called a diagonal matrix. A scalar matrix is a diagonal matrix in which the elements on the prin- cipal diagonal are all equal. For example, the matrices a 0 0 0 6 0 0 0 c / and fa 0 0 0 a 0 0 0 a are respectively diagonal and scalar. Diagonal matrices have much simpler algebraic properties than general square matri- ces. Exercises 1.1 1. Write out in extended form the matrix ((—)l ~^(i + j))2,4- 2. Find a formula for the (i,j) entry of each of the following matrices: - 1 1 - 1 1 - 1 1 - 1 1 - 1 (a) 1 - 1 1 , (b) / I 5 9 13 2 6 10 14 3 7 11 15 4 8 12 16
  • 22. 6 Chapter One: Matrix Algebra 3. Using the fact that matrices have a rectangular shape, say how many different zero matrices can be formed using a total of 12 zeros. 4. For every integer n > 1 there are always at least two zero matrices that can be formed using a total of n zeros. For which n are there exactly two such zero matrices? 5. Which matrices are both upper and lower triangular? 1.2 Operations with Matrices We shall now introduce a number of standard operations that can be performed on matrices, among them addition, scalar multiplication and multiplication. We shall then de- scribe the principal properties of these operations. Our object in so doing is to develop a systematic means of performing cal- culations with matrices. (i) Addition and subtraction Let A and B be two mxn matrices; as usual write a^ and bij for their respective (i,j) entries. Define the sum A + B to be the mxn matrix whose (i,j) entry is a^ + b^; thus to form the matrix A + B we simply add corresponding entries of A and B. Similarly, the difference A — B is the mxn matrix whose (i,j) entry is a^- — b^. However A + B and A — B are not defined if A and B do not have the same numbers of rows and columns. (ii) Scalar multiplication By a scalar we shall mean a number, as opposed to a matrix or array of numbers. Let c be a scalar and A an mxn matrix. The scalar multiple cA is the mxn matrix whose (i, j) entry is caij. Thus to form cA we multiply every entry of A by the scalar c. The matrix (-l)A is usually written -A; it is called the negative of A since it has the property that A + (-A) = 0.
  • 23. 1.2: Operations with Matrices 7 Example 1.2.1 If , / 1 2 0 , „ (I 1 1 A ={-1 0 l) ™dB ={0 -3 1 then 2A + 35 = [ X A " I and 2A - 35 (iii) Matrix multiplication It is less obvious what the "natural" definition of the product of two matrices should be. Let us start with the simplest interesting case, and consider a pair of 2 x 2 matrices an a12 , , D ( blx bX2 A= ( lL ^ and B - , , . G 2 1 0122/ 0 2 1 022 In order to motivate the definition of the matrix product AB we consider two sets of linear equations aiVi + a.i2V2 = xi a n d f &nzx + bX2z2 = y± o.2iV + a22y2 = x2 b21zi + b22z2 = y2 Observe that the coefficient matrices of these linear systems are A and B respectively. We shall think of these equations as representing changes of variables from j/i, y2 to xi, x2, and from z, z2 to y, y2 respectively. Suppose that we replace y and y2 in the first set of equa- tions by the values specified in the second set. After simplifi- cation we obtain a new set of equations (aii&n + ai2b2i)zi + (aU 01 2 + ai2b22)z2 = %i (a21bn + a22b2i)zi + (a2ib12 + a22b22)z2 = x2
  • 24. 8 Chapter One: Matrix Algebra This has coefficient matrix aii&n + ai2&2i 011612 + 012622^ «21&11 + C122&21 ^21^12 + ^22^22 / and represents a change of variables from zi, z2 to xi, x2 which may be thought of as the composite of the original changes of variables. At first sight this new matrix looks formidable. However it is in fact obtained from A and B in quite a simple fashion, namely by the row-times-column rule. For example, the (1,2) entry arises from multiplying corresponding entries of row 1 of A and column 2 of B, and then adding the resulting numbers; thus (an a12) &12 ^22 Qll^l2 + Oi2^22- Other entries arise in a similar fashion from a row of A and a column of B. Having made this observation, we are now ready to define the product AB where A is an m x n matrix and B i s a n n x p matrix. The rule is that the (i,j) entry of AB is obtained by multiplying corresponding entries of row i of A and column j of B, and then adding up the resulting products. This is the row-times-column rule. Now row i of A and column j of B are / bij an a%2 ain) and ->2j bnj / Hence the (i,j) entry of AB is Uilblj + CLi202j + •- • + O-inbnj,
  • 25. 1.2: Operations with Matrices 9 which can be written more concisely using the summation notation as n fc=l Notice that the rule only makes sense if the number of columns of A equals the number of rows of B. Also the product of an m x n matrix and a n n x p matrix is an m x p matrix. Example 1.2.2 Let A = { 1 I 21 and B Since A is 2 x 3 and B is 3 x 3, we see that AB is defined and is a 2 x 3 matrix. However BA is not defined. Using the row-times-column rule, we quickly find that AB = 0 0 2 2 16 - 2 Example 1.2.3 Let A = (O i)and B= (I i In this case both AB and BA are defined, but these matrices are different: AB=rQ ° ) = 0 2 2 M 1 d B A = ( ° l
  • 26. 10 Chapter One: Matrix Algebra Thus already we recognise some interesting features of matrix multiplication. The matrix product is not commuta- tive, that is, AB and BA may be different when both are de- fined; also the product of two non-zero matrices can be zero, a phenomenon which indicates that any theory of division by matrices will face considerable difficulties. Next we show how matrix mutiplication provides a way of representing a set of linear equations by a single matrix equa- tion. Let A = (aij)mtn and let X and B be the column vectors with entries x±, X2, ..., xn and 61, b2, ..., bm respectively. Then the matrix equation AX = B is equivalent to the linear system { aiixx + ai2x2 + • • • + anxn = bx CL21X1 + CI22X2 + • • • + CL2nXn = h o-mixi + am2X2 + • • • + a For if we form the product AX and equate its entries to the corresponding entries of B, we recover the equations of the linear system. Here is further evidence that we have got the definition of the matrix product right. Example 1.2.4 The matrix form of the pair of linear equations J 2xi — 3x2 + 5^3 = 1 -xi + x2 - X3 = 4 is
  • 27. 1.2: Operations with Matrices 11 (iv) Powers of a matrix Once matrix products have been defined, it is clear how to define a non-negative power of a square matrix. Let A be an n x n matrix; then the mth power of A, where m is a non- negative integer, is defined by the equations A0 = In and Am+1 = Am A. This is an example of a recursive definition: the first equation specifies A0 , while the second shows how to define Am+1 , un- der the assumption that Am has already been defined. Thus A1 = A, A2 = AA, A3 = A2 A etc. We do not attempt to define negative powers at this juncture. Example 1.2.5 Let Then The reader can verify that higher powers of A do not lead to new matrices in this example. Therefore A has just four distinct powers, A0 = I2, A1 = A, A2 and A3 . (v) The transpose of a matrix If A is an m x n matrix, the transpose of A, is the n x m matrix whose (i,j) entry equals the (j,i) entry of A. Thus the columns of A become the rows of AT . For example, if /a b A = c d , V fJ
  • 28. 12 Chapter One: Matrix Algebra then the transpose of A is A matrix which equals its transpose is called symmetric. On the other hand, if AT equals —A, then A is said to be skew- symmetric. For example, the matrices are symmetric and skew-symmetric respectively. Clearly sym- metric matrices and skew-symmetric matrices must be square. We shall see in Chapter Nine that symmetric matrices can in a real sense be reduced to diagonal matrices. The laws of matrix algebra We shall now list a number of properties which are sat- isfied by the various matrix operations defined above. These properties will allow us to manipulate matrices in a system- atic manner. Most of them are familiar from arithmetic; note however the absence of the commutative law for multiplica- tion. In the following theorem A, B, C are matrices and c, d are scalars; it is understood that the numbers of rows and columns of the matrices are such that the various matrix products and sums mentioned make sense. Theorem 1.2.1 (a) A + B = B + A, {commutative law of addition)] (b) (A + B) + C = A + (B + C), (associative law of addition); (c) A + 0 = A; (d) (AB)C = A(BC), ( associative law of multiplication)] (e) AI = A = I A;
  • 29. 1.2: Operations with Matrices 13 (f) A(B + C) = AB + AC, {distributive law); (g) (A + B)C = AC + BC, (distributive law); (h) A-B = A + (-l)B; (i) (cd)A = c(dA); (i)c(AB) = (cA)B = A(cB); (k) c(A + B) = cA + cB; (1) (c + d)A = cA + dA; (m) (A + B)T = AT + BT ; (n) (AB)T = BT AT . Each of these laws is a logical consequence of the defini- tions of the various matrix operations. To give formal proofs of them all is a lengthy, but routine, task; an example of such a proof will be given shortly. It must be stressed that familiarity with these laws is essential if matrices are to be manipulated correctly. We remark that it is unambiguous to use the expression A + B + C for both (A + B) + C and A+(B + C). For by the associative law of addition these matrices are equal. The same comment applies to sums like A + B + C + D , and also to matrix products such as (AB)C and A(BC), both of which are written as ABC. In order to illustrate the use of matrix operations, we shall now work out three problems. Example 1.2.6 Prove the associative law for matrix multiplication, (AB)C = A(BC) where A, B, C are mxn, nxp, pxq matrices re- spectively. In the first place observe that all the products mentioned exist, and that both (AB)C and A(BC) are m x q matrices. To show that they are equal, we need to verify that their (i, j) entries are the same for all i and j .
  • 30. 14 Chapter One: Matrix Algebra Let dik be the (i, k) entry of AB ; then dik = YH=I o-uhk- Thus the (i,j) entry of (AB)C is YX=i d ikCkj, that is p n J~](yiaubik)ckj. fc=i 1=1 After a change in the order of summation, this becomes n p /^ilj/^lkCkj). 1=1 fc = l Here it is permissible to change the order of the two summa- tions since this just corresponds to adding up the numbers aubikCkj in a different order. Finally, by the same procedure we recognise the last sum as the (i,j) entry of the matrix A(BC). The next two examples illustrate the use of matrices in real-life situations. Example 1.2.7 A certain company manufactures three products P, Q, R in four different plants W, X, Y, Z. The various costs (in whole dollars) involved in producing a single item of a product are given in the table material labor overheads P 1 3 2 Q 2 2 1 R 1 2 2 The numbers of items produced in one month at the four locations are as follows:
  • 31. 1.2: Operations with Matrices 15 p Q w 2000 1000 2000 X 3000 500 2000 Y 1500 500 2500 Z 4000 1000 2500 The problem is to find the total monthly costs of material, labor and overheads at each factory. Let C be the "cost" matrix formed by the first set of data and let N be the matrix formed by the second set of data. Thus / l 2 1 /2000 3000 1500 4000 C = 3 2 2 andJV= 1000 500 500 1000 . 2 1 2 / 2000 2000 2500 2500/ The total costs per month at factory W are clearly material : 1 x 2000 + 2 x 1000 + 1 x 2000 = 6000 labor : 3 x 2000 + 2 x 1000 + 2 x 2000 = 12000 overheads : 2 x 2000 + 1 x 1000 + 2 x 2000 = 9000 Now these amounts arise by multiplying rows 1, 2 and 3 of matrix C times column 1 of matrix JV, that is, as the (1, 1), (2, 1), and (3, 1) entries of matrix product CN. Similarly the costs at the other locations are given by entries in the other columns of the matrix CN. Thus the complete answer can be read off from the matrix product / 6000 6000 5000 8500 CN = I 12000 14000 10500 19000 I . 9000 10500 8500 14000/ Here of course the rows of CN correspond to material, la- bor and overheads, while the columns correspond to the four plants W, X, Y, Z.
  • 32. 16 Chapter One: Matrix Algebra Example 1.2.8 In a certain city there are 10,000 people of employable age. At present 7000 are employed and the rest are out of work. Each year 10% of those employed become unemployed, while 60% of the unemployed find work. Assuming that the total pool of people remains the same, what will the employment picture be in three years time? Let en and un denote the numbers of employed and un- employed persons respectively after n years. The information given translates into the equations en + i = .9en + .6un un+i = .len + Aun These linear equations are converted into a single matrix equa- tion by introducing matrices X„ = ( 6n and A ( , 9 -6 "n u„. I V .1 .4 The equivalent matrix equation is Xn+i = AXn. Taking n to be 0, 1, 2 successively, we see that X = AXo, X2 = AXi = A2 X0, X3 = AX2 = A3 XQ. In general Xn = AU XQ. Now we were told that e0 = 7000 and UQ = 3000, so Y - f700(A x °- ^3oooy • Thus to find X3 all that we need to do is to compute the power A3 . This turns out to be
  • 33. 1.2: Operations with Matrices 17 .861 .834 .139 .166y Hence *-**,-(•£) so that 8529 of the 10,000 will be in work after three years. At this point an interesting question arises: what will the numbers of employed and unemployed be in the long run? This problem is an example of a Markov process; these pro- cesses will be studied in Chapter Eight as an application of the theory of eigenvalues. The inverse of a square matrix An n x n matrix A is said to be invertible if there is an n x n matrix B such that AB = In = BA. Then B is called an inverse of A. A matrix which is not invert- ible is sometimes called singular, while an invertible matrix is said to be non-singular. Example 1.2.9 Show that the matrix 1 3 3 9 is not invertible. If f , ) were an inverse of the matrix, then we should have
  • 34. 18 Chapter One: Matrix Algebra 1 3 fa b _ (1 0 3 9 c d ~ [ 0 1 which leads to a set of linear equations with no solutions, a + 3c = 1 b + 3d = 0 3a + 9c = 0 3b + 9d = 1 Indeed the first and third equations clearly contradict each other. Hence the matrix is not invertible. Example 1.2.10 Show that the matrix A- r ~2 is invertible and find an inverse for it. Suppose that B = I , I is an inverse of A. Write out the product AB and set it equal to I2, just as in the previous example. This time we get a set of linear equations that has a solution, Indeed there is a unique solution a = 1, b = 2, c = 0, d = 1. Thus the matrix
  • 35. 1.2: Operations with Matrices 19 is a candidate. To be sure that B is really an inverse of A, we need to verify that BA is also equal to I2', this is in fact true, as the reader should check. At this point the natural question is: how can we tell if a square matrix is invertible, and if it is, how can we find an inverse? From the examples we have seen enough to realise that the question is intimately connected with the problem of solving systems of linear systems, so it is not surprising that we must defer the answer until Chapter Two. We now present some important facts about inverses of matrices. Theorem 1.2.2 A square matrix has at most one inverse. Proof Suppose that a square matrix A has two inverses B and B<i- Then ABX = AB2 = 1 = BXA = B2A. The idea of the proof is to consider the product (BiA)B2 since BA = I, this equals IB2 = B2. On the other hand, by the associative law it also equals Bi(AB2), which equals BJ = Bx. Therefore Bx = B2. From now on we shall write A-1 for the unique inverse of an invertible matrix A.
  • 36. 20 Chapter One: Matrix Algebra Theorem 1.2.3 (a) If A is an inveriible matrix, then A - 1 is invertible and {A'1 )-1 =A. (b) If A and B are invertible matrices of the same size, then AB is invertible and (AB)~l = B~1 A~1 . Proof (a) Certainly we have AA~1 = I — A~X A, equations which can be viewed as saying that A is an inverse of A~x . Therefore, since A~x cannot have more than one inverse, its inverse must be A. (b) To prove the assertions we have only to check that B~1 A~1 is an inverse of AB. This is easily done: (AB)(B~1 A~l ) = A(BB~1 )A~1 , by two applications of the associative law; the latter matrix equals AIA~l — AA~l — I. Similarity (B~1 A~1 )(AB) = I. Since inverses are unique, (AB)"1 = B~l A- Partitioned matrices A matrix is said to be partitioned if it is subdivided into a rectangular array of submatrices by a series of horizontal or vertical lines. For example, if A is the matrix (aij)^^, then / an ai2 | ai3 021 0.22 I CI23 a3i a32 | a33 / is a partitioning of A. Another example of a partitioned matrix is the augmented matrix of the linear system whose matrix form is AX — B ; here the partitioning is [-A|S]. There are occasions when it is helpful to think of a matrix as being partitioned in some particular manner. A common one is when an m x n matrix A is partitioned into its columns A±, A2, • • •, An,
  • 37. 1.2: Operations with Matrices 21 A=(A1A2 ... An). Because of this it is important to observe the following fact. Theorem 1.2.4 Partitioned matrices can be added and multiplied according to the usual rules of matrix algebra. Thus to add two partitioned matrices, we add correspond- ing entries, although these are now matrices rather than scalars. To multiply two partitioned matrices use the row- times-column rule. Notice however that the partitions of the matrices must be compatible if these operations are to make sense. Example 1.2.11 Let A = (0^)4,4 be partitioned into four 2 x 2 matrices A = An A12 A2 A22 where An = ( a n G l 2 ) , A12=l ° 1 3 a i 4 «21 &22 ) 023 «24 • 4 21 = ( a 3 1 a 3 2 ) , A 2 2 = V a 4i a42 J Let B = (fry)4,4 be similarly partitioned into submatrices Bn, B2, B21, B22 Bn B2 B2 B22 B T h e n A + B An + Bn A12 + B12 A21 + B21 A22 + B22
  • 38. 22 Chapter One: Matrix Algebra by the rule of addition for matrices. Example 1.2.12 Let A be anTOX n matrix and B an n x p matrix; write Bi, B2, ..., Bp for the columns of B. Then, using the partition of B into columns B = [.B^i^l • • • BP], we have AB = (AB1AB2 ... ABP). This follows at once from the row-times-column rule of matrix multiplication. Exercises 1.2 1. Define matrices /l 2 3 /2 1 /3 0 4 A= 0 1 -1 , B= 1 2 , C= 0 1 0 . 2 1 0/ 1 1/ 2 -1 3/ (a) Compute 3A - 2C. (b) Verify that (A + C)B = AB + CB. (c) Compute A2 and A3 . (d) Verify that (AB)T = BT AT . 2. Establish the laws of exponents: Am An = Am+n and (Am )n = Amn where A is any square matrix andTOand n are non-negative integers. [Use induction on n : see Appendix.] 3. If the matrix products AB and BA both exist, what can you conclude about the sizes of A and Bl 4. If A = ( 1, what is the first positive power of A that equals I-p. 5. Show that no positive power of the matrix I J equals h •
  • 39. 1.2: Operations with Matrices 23 6. Prove the distributive law A(B + C) = AB + AC where A is m x n, and B and C are n x p. 7. Prove that (Ai?)r = BT AT where A is m x n and £? is n x p . 8. Establish the rules c{AB) = (cA)B = A(cB) and (cA)T = cAT . 9. If A is an n x n matrix some power of which equals In, then A is invertible. Prove or disprove. 10. Show that any two n x n diagonal matrices commute. 11. Prove that a scalar matrix commutes with every square matrix of the same size. 12. A certain library owns 10,000 books. Each month 20% of the books in the library are lent out and 80% of the books lent out are returned, while 10% remain lent out and 10% are reported lost. Finally, 25% of the books listed as lost the previous month are found and returned to the library. At present 9000 books are in the library, 1000 are lent out, and none are lost. How many books will be in the library, lent out, and lost after two months ? 13. Let A be any square matrix. Prove that {A + AT ) is symmetric, while the matrix {A — AT ) is skew-symmetric. 14. Use the last exercise to show that every square matrix can be written as the sum of a symmetric matrix and a skew- symmetric matrix. Illustrate this fact by writing the matrix (• J -i) as the sum of a symmetric and a skew-symmetric matrix. 15. Prove that the sum referred to in Exercise 14 is always unique.
  • 40. 24 Chapter One: Matrix Algebra 16. Show that a n n x n matrix A which commutes with every other n x n matrix must be scalar. [Hint: A commutes with the matrix whose (i,j) entry is 1 and whose other entries are all 0.] 17. (Negative powers of matrices) Let A be an invertible ma- trix. If n > 0, define the power A~n to be (A~l )n . Prove that A-n = (A*)'1 . 18. For each of the following matrices find the inverse or show that the matrix is not invertible: «G9= <21)- 19. Generalize the laws of exponents to negative powers of an invertible matrix [see Exercise 2.] 20. Let A be an invertible matrix. Prove that AT is invertible and (AT )~l = {A-1 )T . 21. Give an example of a 3 x 3 matrix A such that A3 = 0, but A2 ^ 0. 1.3 Matrices over Rings and Fields Up to this point we have assumed that all our matrices have as their entries real or complex numbers. Now there are circumstances under which this assumption is too restrictive; for example, one might wish to deal only with matrices whose entries are integers. So it is desirable to develop a theory of matrices whose entries belong to certain abstract algebraic systems. If we review all the definitions given so far, it be- comes clear that what we really require of the entries of a matrix is that they belong to a "system" in which we can add and multiply, subject of course to reasonable rules. By this we mean rules of such a nature that the laws of matrix algebra listed in Theorem 1.2.1 will hold true.
  • 41. 1.3: Matrices over Rings and Fields 25 The type of abstract algebraic system for which this can be done is called a ring with identity. By this is meant a set R, with a rule of addition and a rule of multiplication; thus if ri and r2 are elements of the set R, then there is a unique sum r + r2 and a unique product rir2 in R- In addition the following laws are required to hold: (a) 7 * 1 + r2 = r2 + ri, (commutative law of addition): (b) (7*1 + r2) + r3 = ri + (r2 + r3), (associative law of addition): (c) R contains a zero element OR with the property r + OR = r : (d) each element r of R has a negative, that is, an element —r of R with the property r + (—r) = 0.R : (e) (rir2)r3 = ri(r2r^), (associative law of multiplication): (f) R contains an identity element 1R, different from 0^, such that rR — r = l#r : (g) (r i + ^2)^3 = f]T3 + ^2^3, (distributive law): (h) ri(r2 + 7-3) = rir2 + 7*17-3, (distributive law). These laws are to hold for all elements 7*1, r2, r3, r of the ring .R . The list of rules ought to seem reasonable since all of them are familiar laws of arithmetic. If two further rules hold, then the ring is called a field: (i) rxr2 = r 2 r i , (commutative law of multiplication): (j) each element r in R other than the zero element OK has an inverse, that is, an element r"1 in R such that rr x = If? = r 1 r. So the additional rules require that multiplication be a commutative operation, and that each non-zero element of R have an inverse. Thus a field is essentially an abstract system in which one can add, multiply and divide, subject to the usual laws of arithmetic. Of course the most familiar examples of fields are
  • 42. 26 Chapter One: Matrix Algebra C and R, the fields of complex numbers and real numbers respectively, where the addition and multiplication used are those of arith- metic. These are the examples that motivated the definition of a field in the first place. Another example is the field of rational numbers Q (Recall that a rational number is a number of the form a/b where a and b are integers). On the other hand, the set of all integers Z, (with the usual sum and product), is a ring with identity, but it is not a field since 2 has no inverse in this ring. All the examples given so far are infinite fields. But there are also finite fields, the most familiar being the field of two elements. This field has the two elements 0 and 1, sums and products being calculated according to the tables + 0 1 0 1 0 1 1 0 and X 0 1 0 1 0 0 0 1 respectively. For example, we read off from the tables that 1 + 1 = 0 and 1 x 1 = 1. In recent years finite fields have be- come of importance in computer science and in coding theory. Thus the significance of fields extends beyond the domain of pure mathematics. Suppose now that R is an arbitrary ring with identity. An m x n matrix over R is a rectangular m x n array of elements belonging to the ring R. It is possible to form sums
  • 43. 1.3: Matrices over Rings and Fields 27 and products of matrices over R, and the scalar multiple of a matrix over R by an element of R, by using exactly the same definitions as in the case of matrices with numerical entries. That the laws of matrix algebra listed in Theorem 1.2.1 are still valid is guaranteed by the ring axioms. Thus in the general theory the only change is that the scalars which appear as entries of a matrix are allowed to be elements of an arbitrary ring with identity. Some readers may feel uncomfortable with the notion of a matrix over an abstract ring. However, if they wish, they may safely assume in the sequel that the field of scalars is either R or C. Indeed there are places where we will definitely want to assume this. Nevertheless we wish to make the point that much of linear algebra can be done in far greater generality than over R and C. Example 1.3.1 Let A = I 1 and B = I n J be matrices over the field of two elements. Using the tables above and the rules of matrix addition and multiplication, we find that Algebraic structures in linear algebra There is another reason for introducing the concept of a ring at this stage. For rings, one of the fundamental structures of algebra, occur naturally at various points in linear algebra. To illustrate this, let us write Mn(R) for the set of all n x n matrices over a fixed ring with identity R. If the standard matrix operations of addition and multipli- cation are used, this set becomes a ring, the ring of all n x n
  • 44. 28 Chapter One: Matrix Algebra matrices over R. The validity of the ring axioms follows from Theorem 1.2.1. An obviously important example of a ring is M n ( R ) . Later we shall discover other places in linear algebra where rings occur naturally. Finally, we mention another important algebraic struc- ture that appears naturally in linear algebra, a group. Con- sider the set of all invertible n x n matrices over a ring with identity R; denote this by GLn(R). This is a set equipped with a rule of multiplication; for if A and B are two invertible n x n matrices over R, then AB is also invertible and so belongs to GLn(R), as the proof of Theorem 1.2.3 shows. In addition, each element of this set has an inverse which is also in the set. Of course the identity nxn matrix belongs to GLn{R), and multiplication obeys the associative law. All of this means that GLn(R) is a group. The formal definition is as follows. A group is a set G with a rule of multiplication; thus if g and gi are elements of G, there is a unique product gig2 in G. The following axioms must be satisfied: (a) (0102)03 = (0102)03, {associative law): (b) there is an identity element 1Q with the property 1 G 0 = 0 = 0 1 G : (c) each element g of G has an inverse element 0 _ 1 in G such that gg~l = 1Q = 9'1 g- These statements must hold for all elements g, gi, 02, 03 of G. Thus the set GLn (R) of all invertible matrices over R, a ring with identity, is a group; this important group is known as the general linear group of degree n over R. Groups oc- cur in many areas of science, particularly in situations where symmetry is important.
  • 45. 1.3: Matrices over Rings and Fields 29 Exercises 1.3 1. Show that the following sets of numbers are fields if the usual addition and multiplication of arithmetic are used: (a) the set of all rational numbers; (b) the set of all numbers of the form a + by/2 where a and b are rational numbers; (c) the set of all numbers of the form a + by/^l where where a and b are rational numbers. 2. Explain why the ring Mn(C) is not a field if n > 1. 3. How many n x n matrices are there over the field of two elements? How many of these are symmetric ? [You will need the formula l + 2 + 3 + '-- + n = n(n + l)/2; for this see Example A.l in the Appendix ]. 4. Let / l 1 1 / O i l A = 0 1 1 and B = 1 1 1 0 1 0 / 1 1 0 be matrices over the field of two elements. Compute A + B, A2 and AB. 5. Show that the set of all n x n scalar matrices over R with the usual matrix operations is a field. 6. Show that the set of all non-zero nxn scalar matrices over R is a group with respect to matrix multiplication. 7. Explain why the set of all non-zero integers with the usual multiplication is not a group.
  • 46. Chapter Two SYSTEMS OF LINEAR EQUATIONS In this chapter we address what has already been described as one of the fundamental problems of linear algebra: to determine if a system of linear equations - or linear system - has a solution, and, if so, to find all its solutions. Almost all the ensuing chapters depend, directly or indirectly, on the results that are described here. 2.1 Gaussian Elimination We begin by considering in detail three examples of linear systems which will serve to show what kind of phenomena are to be expected; they will also give some idea of the techniques that are available for solving linear systems. Example 2.1.1
x1 - x2 + x3 + x4 = 2
x1 + x2 + x3 - x4 = 3
x1 + 3x2 + x3 - 3x4 = 1
To determine if the system has a solution, we apply certain operations to the equations of the system which are designed to eliminate unknowns from as many equations as possible. The important point about these operations is that, although they change the linear system, they do not change its solutions. We begin by subtracting equation 1 from equations 2 and 3 in order to eliminate x1 from the last two equations. These operations can be conveniently denoted by (2) - (1) and (3) - (1) respectively. The effect is to produce a new linear system
  • 47. 2.1: Gaussian Elimination 31
x1 - x2 + x3 + x4 = 2
2x2 - 2x4 = 1
4x2 - 4x4 = -1
Next multiply equation 2 of this new system by 1/2, an operation which is denoted by (1/2)(2), to get
x1 - x2 + x3 + x4 = 2
x2 - x4 = 1/2
4x2 - 4x4 = -1
Finally, eliminate x2 from equation 3 by performing the operation (3) - 4(2), that is, subtract 4 times equation 2 from equation 3; this yields the linear system
x1 - x2 + x3 + x4 = 2
x2 - x4 = 1/2
0 = -3
Of course the third equation is false, so the original linear system has no solutions, that is, it is inconsistent. Example 2.1.2
x1 + 4x2 + 2x3 = -2
-2x1 - 8x2 + 3x3 = 32
x2 + x3 = 1
Add two times equation 1 to equation 2, that is, perform the operation (2) + 2(1), to get
x1 + 4x2 + 2x3 = -2
7x3 = 28
x2 + x3 = 1
  • 48. 32 Chapter Two: Systems of Linear Equations At this point we should have liked x2 to appear in the second equation; however this is not the case. To remedy the situation we interchange equations 2 and 3, in symbols (2) <-> (3). The linear system now takes the form
x1 + 4x2 + 2x3 = -2
x2 + x3 = 1
7x3 = 28
Finally, multiply equation 3 by 1/7, that is, apply (1/7)(3), to get
x1 + 4x2 + 2x3 = -2
x2 + x3 = 1
x3 = 4
This system can be solved quickly by a process called back substitution. By the last equation x3 = 4, so we can substitute x3 = 4 in the second equation to get x2 = -3. Finally, substitute x3 = 4 and x2 = -3 in the first equation to get x1 = 2. Hence the linear system has a unique solution. Example 2.1.3
x1 + 3x2 + 3x3 + 2x4 = 1
2x1 + 6x2 + 9x3 + 5x4 = 5
-x1 - 3x2 + 3x3 = 5
Apply operations (2) - 2(1) and (3) + (1) successively to the linear system to get
x1 + 3x2 + 3x3 + 2x4 = 1
3x3 + x4 = 3
6x3 + 2x4 = 6
Since x2 has disappeared completely from the second and third equations, we move on to the next unknown x3; applying (1/3)(2), we obtain
  • 49. 2.1: Gaussian Elimination 33
x1 + 3x2 + 3x3 + 2x4 = 1
x3 + (1/3)x4 = 1
6x3 + 2x4 = 6
Finally, operation (3) - 6(2) gives
x1 + 3x2 + 3x3 + 2x4 = 1
x3 + (1/3)x4 = 1
0 = 0
Here the third equation tells us nothing and can be ignored. Now observe that we can assign arbitrary values c and d to the unknowns x4 and x2 respectively, and then use back substitution to find x3 and x1. Hence the most general solution of the linear system is
x1 = -2 - c - 3d, x2 = d, x3 = 1 - c/3, x4 = c.
Since c and d can be given arbitrary values, the linear system has infinitely many solutions. What has been learned from these three examples? In the first place, the number of solutions of a linear system can be 0, 1 or infinity. More importantly, we have seen that there is a systematic method of eliminating some of the unknowns from all equations of the system beyond a certain point, with the result that a linear system is reached which is of such a simple form that it is possible either to conclude that no solutions exist or else to find all solutions by the process of back substitution. This systematic procedure is called Gaussian elimination; it is now time to give a general account of the way in which it works.
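Before the general account, here is a minimal sketch of the elimination-plus-back-substitution procedure just illustrated, written by us in Python with exact Fraction arithmetic; the function name solve and the code itself are not from the text, and for simplicity the sketch assumes the system is square with a unique solution, as in Example 2.1.2.

# A minimal sketch (illustrative only): Gaussian elimination followed by
# back substitution, assuming a unique solution exists.
from fractions import Fraction

def solve(A, b):
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(A, b)]
    for i in range(n):
        # find an equation in which the i-th unknown appears and move it up
        p = next(r for r in range(i, n) if M[r][i] != 0)
        M[i], M[p] = M[p], M[i]
        M[i] = [x / M[i][i] for x in M[i]]           # make the pivot equal to 1
        for r in range(i + 1, n):                    # eliminate below the pivot
            M[r] = [x - M[r][i] * y for x, y in zip(M[r], M[i])]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):                     # back substitution
        x[i] = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
    return x

# Example 2.1.2: x1 + 4x2 + 2x3 = -2, -2x1 - 8x2 + 3x3 = 32, x2 + x3 = 1
print(solve([[1, 4, 2], [-2, -8, 3], [0, 1, 1]], [-2, 32, 1]))
# prints [Fraction(2, 1), Fraction(-3, 1), Fraction(4, 1)], i.e. x1 = 2, x2 = -3, x3 = 4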
  • 50. 34 Chapter Two: Systems of Linear Equations The general theory of linear systems Consider a set of m linear equations in n unknowns x1, x2, ..., xn:
a11x1 + a12x2 + ... + a1nxn = b1
a21x1 + a22x2 + ... + a2nxn = b2
...
am1x1 + am2x2 + ... + amnxn = bm
By a solution of the linear system we shall mean an n-column vector
(x1; x2; ...; xn)
such that the scalars x1, x2, ..., xn satisfy all the equations of the system. The set of all solutions is called the general solution of the linear system; this is normally given in the form of a single column vector containing a number of arbitrary quantities. A linear system with no solutions is said to be inconsistent. Two linear systems which have the same sets of solutions are termed equivalent. Now in the examples discussed above three types of operation were applied to the linear systems: (a) interchange of two equations; (b) addition of a multiple of one equation to another equation; (c) multiplication of one equation by a non-zero scalar. Notice that each of these operations is invertible. The critical property of such operations is that, when they are applied to a linear system, the resulting system is equivalent to the original one. This fact was exploited in the three examples above. Indeed, by the very nature of these operations, any
  • 51. 2.1: Gaussian Elimination 35 solution of the original system is bound to be a solution of the new system, and conversely, by invertibility of the operations, any solution of the new system is also a solution of the original system. Thus we can state the fundamental theorem: Theorem 2.1.1 When an operation of one of the three types (a), (b), (c) is applied to a linear system, the resulting linear system is equiv- alent to the original one. We shall now exploit this result and describe the proce- dure known as Gaussian elimination. In this a sequence of operations of types (a), (b), (c) is applied to a linear system in such a way as to produce an equivalent linear system whose form is so simple that we can quickly determine its solutions. Suppose that a linear system of m equations in n un- knowns xi, X2, ..., xn is given. In Gaussian elimination the following steps are to be carried out. (i) Find an equation in which x appears and, if necessary, interchange this equation with the first equation. Thus we can assume that x appears in equation 1. (ii) Multiply equation 1 by a suitable non-zero scalar in such a way as to make the coefficient of x equal to 1. (iii) Subtract suitable multiples of equation 1 from equa- tions 2 through m in order to eliminate x from these equa- tions. (iv) Inspect equations 2 through m and find the first equa- tion which involves one of the the unknowns a?2, • • • , xn , say Xi2. By interchanging equations once again, we can suppose that Xi2 occurs in equation 2. (v) Multiply equation 2 by a suitable non-zero scalar to make the coefficient of Xi2 equal to 1. (vi) Subtract multiples of equation 2 from equations 3 through m to eliminate Xi2 from these equations.
  • 52. 36 Chapter Two: Systems of Linear Equations (vii) Examine equations 3 through m and find the first one that involves an unknown other than x and Xi2, say xi3. By interchanging equations we may assumethat Xi3 actually occurs in equation 3. The next step is to make the coefficient of xi3 equal to 1, and then to eliminate Xi3 from equations 4 through m, and so on. The elimination procedure continues in this manner, pro- ducing the so-called pivotal unknowns xi = xix, xi2, ..., Xir, until we reach a linear system in which no further unknowns occur in the equations beyond the rth. A linear system of this sort is said to be in echelon form; it will have the following shape. Xi1 -f- # Xi2 Xi2 < K 0 = * Here the asterisks represent certain scalars and the ij are in- tegers which satisfy 1 = i < %2 < • • • < ir < n - The unknowns Xi. for j = 1 to r are the pivots. Once echelon form has been reached, the behavior of the linear system can be completely described and the solutions - if any - obtained by back substitution, as in the preceding examples. Consequently we have the following fundamental result which describes the possible behavior of a linear system. Theorem 2.1.2 (i) A linear system is consistent if and only if all the entries on the right hand sides of those equations in echelon form which contain no unknowns are zero. + • • • + * xn = * + • • • + * xn = * • Ei r i ' ' ' "T * Xn — * 0 = *
  • 53. 2.1: Gaussian Elimination 37 (ii) If the system is consistent, the non-pivotal unknowns can be given arbitrary values; the general solution is then obtained by using back substitution to solve for the pivotal unknowns. (iii) The system has a unique solution if and only if all the unknowns are pivotal. An important feature of Gaussian elimination is that it constitutes a practical algorithm for solving linear systems which can easily be implemented in one of the standard programming languages. Gauss-Jordan elimination Let us return to the echelon form of the linear system described above. We can further simplify the system by subtracting a multiple of equation 2 from equation 1 to eliminate xi2 from that equation. Now xi2 occurs only in the second equation. Similarly we can eliminate xi3 from equations 1 and 2 by subtracting multiples of equation 3 from these equations. And so on. Ultimately a linear system is reached which is in reduced echelon form. Here each pivotal unknown appears in precisely one equation; the non-pivotal unknowns may be given arbitrary values and the pivotal unknowns are then determined directly from the equations without back substitution. The procedure for reaching reduced echelon form is called Gauss-Jordan elimination: while it results in a simpler type of linear system, this is accomplished at the cost of using more operations. Example 2.1.4 In Example 2.1.3 above we obtained a linear system in echelon form
x1 + 3x2 + 3x3 + 2x4 = 1
x3 + (1/3)x4 = 1
  • 54. 38 Chapter Two: Systems of Linear Equations Here the pivots are x1 and x3. One further operation must be applied to put the system in reduced row echelon form, namely (1) - 3(2); this gives
x1 + 3x2 + x4 = -2
x3 + (1/3)x4 = 1
To obtain the general solution give the non-pivotal unknowns x2 and x4 the arbitrary values d and c respectively, and then read off directly the values x1 = -2 - c - 3d and x3 = 1 - c/3. Homogeneous linear systems A very important type of linear system occurs when all the scalars on the right hand sides of the equations equal zero:
a11x1 + a12x2 + ... + a1nxn = 0
a21x1 + a22x2 + ... + a2nxn = 0
...
am1x1 + am2x2 + ... + amnxn = 0
Such a system is called homogeneous. It will always have the trivial solution x1 = 0, x2 = 0, ..., xn = 0; thus a homogeneous linear system is always consistent. The interesting question about a homogeneous linear system is whether it has any non-trivial solutions. The answer is easily read off from the echelon form. Theorem 2.1.3 A homogeneous linear system has a non-trivial solution if and only if the number of pivots in echelon form is less than the number of unknowns. For if the number of unknowns is n and the number of pivots is r, the n - r non-pivotal unknowns can be given arbitrary values, so there will be a non-trivial solution whenever
  • 55. 2.1: Gaussian Elimination 39 n - r > 0. On the other hand, if n = r, none of the unknowns can be given arbitrary values, and there is only one solution, namely the trivial one, as we see from reduced echelon form. Corollary 2.1.4 A homogeneous linear system of m equations in n unknowns always has a non-trivial solution if m < n. For if r is the number of pivots, then r <= m < n. Example 2.1.5 For which values of the parameter t does the following homogeneous linear system have non-trivial solutions?
6x1 - x2 + x3 = 0
tx1 + x3 = 0
x2 + tx3 = 0
It suffices to find the number of pivotal unknowns. We proceed to put the linear system in echelon form by applying to it successively the operations (1/6)(1), (2) - t(1), (2) <-> (3) and (3) - (t/6)(2):
x1 - (1/6)x2 + (1/6)x3 = 0
x2 + tx3 = 0
(1 - t/6 - t^2/6)x3 = 0
The number of pivots will be less than 3, the number of unknowns, precisely when 1 - t/6 - t^2/6 equals zero, that is, when t = 2 or t = -3. These are the only values of t for which the linear system has non-trivial solutions. The reader will have noticed that we deviated slightly from the procedure of Gaussian elimination; this was to avoid dividing by t/6, which would have necessitated a separate discussion of the case t = 0.
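The conclusion of Example 2.1.5 can be re-checked mechanically. The sketch below is ours, not the book's: it reduces the coefficient matrix to echelon form with exact arithmetic, counts the pivots for a few numerical values of t, and reports fewer than three pivots exactly at t = 2 and t = -3.

# Illustrative check of Example 2.1.5: count pivots of the coefficient matrix.
from fractions import Fraction

def pivot_count(rows):
    M = [[Fraction(x) for x in row] for row in rows]
    pivots, r = 0, 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue                                  # no pivot in this column
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(r + 1, len(M)):
            M[i] = [x - M[i][c] * y for x, y in zip(M[i], M[r])]
        pivots, r = pivots + 1, r + 1
    return pivots

for t in (2, -3, 0, 1):
    n = pivot_count([[6, -1, 1], [t, 0, 1], [0, 1, t]])
    print(t, n, "non-trivial solutions" if n < 3 else "only the trivial solution")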
  • 56. 40 Chapter Two: Systems of Linear Equations Exercises 2.1 In the first three problems find the general solution or else show that the linear system is inconsistent. x + 2x2 — 3x3 + x4 = 7 -xi + x2 - x3 + X4 = 4 2. 3. + x 2 — £3 — X4 = 0 + x3 - x4 = - 1 + 2x2 + %3 — 3^4 = 2 xx + x2 + 2x3 = 4 Xi - X2 — £3 = - 1 2xi — 4x2 — 5x3 = 1 Solve the following homogeneous linear systems xi + x2 + x3 + x4 = 0 (a) { 2xi + 2x2 + £3 + £4 = 0 xi + x2 - x3 + x4 = 0 2xi — X2 + 3x3 = 0 (b) { 4xi + 2x2 + 2 x 3 = 0 -2xi + 5x2 — 4x3 = 0 5. For which values of t does the following homogeneous linear system have non-trivial solutions? 12xi tXi - x2 X2 + X3 + X3 + tx3 = 0 = 0 = 0
  • 57. 2.2: Elementary Row Operations 41 6. For which values of t is the following linear system consis- tent? 7. How many operations of types (a), (b), (c) are needed in general to put a system of n linear equations in n unknowns in echelon form? 2.2 Elementary Row Operations If we examine more closely the process of Gaussian elim- ination described in 2.1, it is apparent that much time and trouble could be saved by working directly with the augmented matrix of the linear system and applying certain operations to its rows. In this way we avoid having to write out the unknowns repeatedly. The row operations referred to correspond to the three types of operation that may be applied to a linear system dur- ing Gaussian elimination. These are the so-called elementary row operations and they can be applied to any matrix. The row operations together with their symbolic representations are as follows: (a) interchange rows i and j , (i?j <-»• Rj); (b) add c times row j to row i where c is any scalar, (Ri + cRj); (c) multiply row i by a non-zero scalar c, (cRi). From the matrix point of view the essential content of The- orem 2.1.2 is that any matrix can be put in what is called row echelon form by application of a suitable finite sequence of elementary row operations. A matrix in row echelon form
  • 58. 42 Chapter Two: Systems of Linear Equations has the typical "descending staircase" form 0 0 0 0 0 ll 0 0 0 0 * • 0 • 0 • 0 • 0 • * • 0 • 0 • 0 • 0 * 1 0 0 0 * • • * • • 0 •• 0 •• 0 •• * * • * * • ll * • 0 0 • 0 0 • * * * * * * • 0 * • 0 */ 0 0 0 Vo Here the asterisks denote certain scalars. Example 2.2.1 Put the following matrix in row echelon form by applying suitable elementary row operations: 1 3 3 2 1 2 6 9 5 5 - 1 - 3 3 0 5 Applying the row operations R2 — 2R and -R3 + R, we obtain 1 3 3 2 l 0 0 3 1 3 . 0 0 6 2 6 / Then, after applying the operations |i?2 and R3 — 6R2, we get 1 3 3 2 1 0 0 1 1 / 3 1 0 0 0 0 0 which is in row echelon form. Suppose now that we wish to solve the linear system with matrix form AX = B, using elementary row operations. The first step is to identify the augmented matrix M = [A B].
  • 59. 2.2: Elementary Row Operations 43 Then we put M in row echelon form, using row operations. From this we can determine if the original linear system is consistent; for this to be true, in the row echelon form of M the scalars in the last column which lie below the final pivot must all be zero. To find the general solution of a consistent system we convert the row echelon matrix back to a linear system and use back substitution to solve it. Example 2.2.2 Consider once again the linear system of Example 2.1.3:
x1 + 3x2 + 3x3 + 2x4 = 1
2x1 + 6x2 + 9x3 + 5x4 = 5
-x1 - 3x2 + 3x3 = 5
The augmented matrix here is
[ 1  3  3  2 | 1 ]
[ 2  6  9  5 | 5 ]
[-1 -3  3  0 | 5 ]
Now we have just seen in Example 2.2.1 that this matrix has row echelon form
[ 1  3  3  2   | 1 ]
[ 0  0  1  1/3 | 1 ]
[ 0  0  0  0   | 0 ]
Because the lower right hand entry is 0, the linear system is consistent. The linear system corresponding to the last matrix is
x1 + 3x2 + 3x3 + 2x4 = 1
x3 + (1/3)x4 = 1
0 = 0
  • 60. 44 Chapter Two: Systems of Linear Equations Hence the general solution given by back substitution is x = —2 — c — 3d, X2 = d, £3 = 1 — c/3, £4 = c, where c and d are arbitrary scalars. The matrix formulation enables us to put our conclusions about linear systems in a succinct form. Theorem 2.2.1 Let AX = B be a linear system of equations in n unknowns with augmented matrix M = [A B]. (i) The linear system is consistent if and only if the matri- ces A and M have the same numbers of pivots in row echelon form. (ii) If the linear system is consistent and r denotes the number of pivots of A in row echelon form, then the n — r unknowns that correspond to columns of A not containing a pivot can be given arbitrary values. Thus the system has a unique solution if and only if r = n. Proof For the linear system to be consistent, the row echelon form of M must have only zero entries in the last column below the final pivot; but this is just the condition for A and M to have the same numbers of pivots. Finally, if the linear system is consistent, the unknowns corresponding to columns that do not contain pivots may be given arbitrary values and the remaining unknowns found by back substitution. Reduced row echelon form A matrix is said to be in reduced row echelon form if it is in row echelon form and if in each column containing a pivot all entries other than the pivot itself are zero.
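Theorem 2.2.1 is easy to test in software. The following sympy sketch (ours, not from the text) forms the augmented matrix of Example 2.2.2, compares the ranks of A and M = [A B], which equal their numbers of pivots in row echelon form, and displays the reduced row echelon form.

# Illustrative check of Theorem 2.2.1 on the system of Example 2.2.2.
from sympy import Matrix

A = Matrix([[1, 3, 3, 2], [2, 6, 9, 5], [-1, -3, 3, 0]])
B = Matrix([1, 5, 5])
M = A.row_join(B)                   # the augmented matrix [A B]
print(A.rank(), M.rank())           # 2 2 -> consistent, with 4 - 2 = 2 free unknowns
print(M.rref()[0])                  # reduced row echelon form of [A B]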
  • 61. 2.2: Elementary Row Operations 45 Example 2.2.3 Put the matrix 1 1 2 2 4 4 9 10 3 3 6 7 in reduced row echelon form. By applying suitable row operations we find the row ech- elon form to be ' 1 1 2 2' 0 0 1 2 0 0 0 1 Notice that columns 1, 3 and 4 contain pivots. To pass to reduced row echelon form, apply the row operations Ri — 2R2, R + 2i?3 and R2 — 2R3: the answer is 1 1 0 0' 0 0 1 0 0 0 0 1 As this example illustrates, one can pass from row echelon form to reduced row echelon form by applying further row op- erations; notice that this will not change the number of pivots. Thus an arbitrary matrix can be put in reduced row echelon form by applying a finite sequence of elementary row opera- tions. The reader should observe that this is just the matrix formulation of the Gauss-Jordan elimination procedure. Exercises 2.2 1. Put each of the following matrices in row echelon form: , (b)
  • 62. 46 Chapter Two: Systems of Linear Equations / l 2 - 3 1 (c) 3 1 2 2 . 8 1 9 1 / 2. Put each of the matrices in Exercise 2.2.1 in reduced row echelon form. 3. Prove that the row operation of type (a) which interchanges rows i and j can be obtained by a combination of row opera- tions of the other two types, that is, types (b) and (c). 4. Do Exercises 2.1.1 to 2.1.4 by applying row operations to the augmented matrices. 5. How many row operations are needed in general to put an n x n matrix in row echelon form? 6. How many row operations are needed in general to put an n x n matrix in reduced row echelon form? 7. Give an example to show that a matrix can have more than one row echelon form. 8. If A is an invertible n x n matrix, prove that the linear system AX = B has a unique solution. What does this tell you about the number of pivots of A? 9. Show that each elementary row operation has an inverse which is also an elementary row operation.
  • 63. 2.3: Elementary Matrices 47 2.3 Elementary Matrices An n x n matrix is called elementary if it is obtained from the identity matrix In in one of three ways: (a) interchange rows i and j where i != j; (b) insert a scalar c as the (i, j) entry where i != j; (c) put a non-zero scalar c in the (i, i) position. Example 2.3.1 Write down all the possible types of elementary 2 x 2 matrices. These are the elementary matrices that arise from the matrix I2 = [1 0; 0 1]; they are
E1 = [0 1; 1 0], E2 = [1 c; 0 1], E3 = [1 0; c 1], E4 = [c 0; 0 1] and E5 = [1 0; 0 c].
Here c is a scalar which must be non-zero in the case of E4 and E5. The significance of elementary matrices from our point of view lies in the fact that when we premultiply a matrix by an elementary matrix, the effect is to perform an elementary row operation on the matrix. For example, with the matrix
A = [a11 a12; a21 a22]
and the elementary matrices listed in Example 2.3.1, we have
E1A = [a21 a22; a11 a12]  and  E2A = [a11 + ca21  a12 + ca22; a21  a22]
  • 64. 48 Chapter Two: Systems of Linear Equations and E5A = ( au °1 2 ") . ca2i ca22J Thus premultiplication by E interchanges rows 1 and 2; pre- multiplication by E2 adds c times row 2 to row 1; premultipli- cation by £5 multiplies row 2 by c . What then is the general rule? Theorem 2.3.1 Let A be an m x n matrix and let E be an elementary m xm matrix. (i) // E is of type (a), then EA is the matrix obtained from A by interchanging rows i and j of A; (ii) if E is type (b), then EA is the matrix obtained from A by adding c times row j to row i; (iii) if E is of type (c), then EA arises from A by multi- plying row i by c. Now recall from 2.2 that every matrix can be put in re- duced row echelon form by applying elementary row opera- tions. Combining this observation with 2.3.1, we obtain Theorem 2.3.2 Let A be any mxn matrix. Then there exist elementary mxm matrices E, E2, • • • , Ek such that the matrix EkE^-i • • • EA is in reduced row echelon form. Example 2.3.2 Consider the matrix A= [2 1 oj- We easily put this in reduced row echelon form B by applying successively the row operations R <-> R2, ^R, R — ^R2 • . (2 1 0 (I 1/2 0 (I 0 - 1 _ ^ ^ 0 1 2)^Q 1 2 ; ~ ^ 0 1 2J~
  • 65. 2.3: Elementary Matrices 49 Hence E^E2EA = B where *-(!i).*-(Y:)'*-(i_ 1 / ?) Column operations Just as for rows, there are three types of elementary col- umn operation, namely: (a) interchange columns i and j , ( C{ «-> Cj); (b) add c times column j to column i where c is a scalar, (Ci + cCj); (c) multiply column i by a non-zero scalar c, ( cCi). (The reader is warned, however, that column operations can- not in general be applied to the augmented matrix of a linear system without changing the solutions of the system.) The effect of applying an elementary column operation to a matrix is simulated by right multiplication by a suitable elementary matrix. But there is one important difference from the row case. In order to perform the operation Ci + cCj to a matrix A one multiplies on the right by the elementary matrix whose (j, i) element is c. For example, let E=(l *) and A=(an °12 V c 1J a2i a22J Then AE _ / i n +cai 2 a12 a2i + ca22 a22 Thus E performs the column operation C + 2C2 and not C2 + 2C. By multiplying a matrix on the right by suitable sequences of elementary matrices, a matrix can be put in col- umn echelon form or in reduced column echelon form; these
  • 66. 50 Chapter Two: Systems of Linear Equations are just the transposes of row echelon form and reduced row echelon form respectively. Example 2.3.3 / 3 6 2 Put the matrix A = I J in reduced column echelon form. Apply the column operations C, C2 — 6Ci, C3 — 2Ci, C2 <-> C3, ^ C 2 , and Cx - C2 : A 1 6 2 / 1 0 0 1/3 2 7J ^ l/3 0 19/3 1 0 0 / 1 0 0 1/3 19/3 0y ~* ^ 1/3 1 0 1 0 0 0 1 0 We leave the reader to write down the elementary matrices that produce these column operations. Now suppose we are allowed to apply both row and column operations to a matrix. Then we can obtain first row echelon form; subsequently column operations may be applied to give a matrix of the very simple type 'Ir 0 0 0 where r is the number of pivots. This is called the normal form of the matrix; we shall see in 5.2 that every matrix has a unique normal form. These conclusions are summed up in the following result.
  • 67. 2.3: Elementary Matrices 51 Theorem 2.3.3 Let A be anmxn matrix. Then there exist elementary mxm matrices E,..., Ek and elementary nxn matrices F,..., Fi such that Ek---ElAF1---Fl=N, the normal form of A. Proof By applying suitable row operations to A we can find elemen- tary matrices Ei, ..., Ek such that B = Ek • • • EA is in row echelon form. Then column operations are applied to reduce B to normal form; this procedure yields elementary matrices Fi, ..., Fi such that N = BF1 • • • F = Ek • • • E1AF1 • • • Ft is the normal form of A. Corollary 2.3.4 For any matrix A there are invertible matrices X and Y such that N = XAY, or equivalently A = X~1 NY~1 , where N is the normal form of A. For it is easy to see that every elementary matrix is in- vertible; indeed the inverse matrix represents the inverse of the corresponding elementary row (or column) operation. Since by 1.2.3 any product of invertible matrices is invertible, the corollary follows from 2.3.3. Example 2.3.4 (1 2 2 Let A = „ „ , . Find the normal form N of A and write 2 3 4 J N as the product of A and elementary matrices as specified in 2.3.3. All we need do is to put A in normal form, while keeping track of the elementary matrices that perform the necessary row and column operations. Thus
  • 68. 52 Chapter Two: Systems of Linear Equations A^l1 2 2 A 2 2 / I 0 2~ 0 - 1 Oj 0 1 Oj 0 1 0 1 0 0 0 1 0 which is the normal form of A. Here three row operations and one column operation were used to reduce A to its normal form. Therefore E3E2E1AF1 = N where * = | J ? ) . * = f j ?),*3='1 ^ -2 ) ' ^ V0 - 1 ) ' • * I 0 1 and Inverses of matrices Inverses of matrices were defined in 1.2, but we deferred the important problem of computing inverses until more was known about linear systems. It is now time to address this problem. Some initial information is given by Theorem 2.3.5 Let A be annxn matrix. Then the following statements about A are equivalent, that is, each one implies all of the others. (a) A is invertible; (b) the linear system AX = 0 has only the trivial solution; (c) the reduced row echelon form of A is In;
  • 69. 2.3: Elementary Matrices 53 (d) A is a product of elementary matrices. Proof We shall establish the logical implications (a) —> (b), (b) — • (c), (c) — > (d), and (d) — > (a). This will serve to establish the equivalence of the four statements. If (a) holds, then A~l exists; thus if we multiply both sides of the equation AX = 0 on the left by A"1 , we get A'1 AX = A^1 0, so that X = A- 1 0 = 0 and the only solution of the linear system is the trivial one. Thus (b) holds. If (b) holds, then we know from 2.1.3 that the number of pivots of A in reduced row echelon form is n. Since A is n x n, this must mean that In is the reduced row echelon form of A, so that (c) holds. If (c) holds, then 2.3.2 shows that there are elementary matrices E±, ...,Ek such that Ek • • • E±A = In. Since elemen- tary matrices are invertible, Ek- • -E is invertible, and thus A=(Ek--- Ei)"1 = E^1 • • • E^1 , so that (d) is true. Finally, (d) implies (a) since a product of elementary ma- trices is always invertible. A procedure for finding the inverse of a matrix As an application of the ideas in this section, we shall describe an efficient method of computing the inverse of an invertible matrix. Suppose that A is an invertible n x n matrix. Then there exist elementary n x n matrices E, E^, • • • , Ek such that Ek--- E2EXA = In, by 2.3.2 and 2.3.5. Therefore A-1 = InA~l = (£*• • • E2ElA)A~1 = (Ek--- E2E1)In. This means that the row operations which reduce A to its reduced row echelon form In will automatically transform In to A- 1 . It is this crucial observation which enables us to compute A~x .
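In code this observation can be checked directly. The sympy sketch below is ours, not the book's: it row reduces the partitioned matrix [A | I3], using the same 3 x 3 matrix as the worked example that follows, and reads the inverse off the right-hand block.

# Illustrative sketch: compute an inverse by row reducing [A | I] to
# reduced row echelon form.
from sympy import Matrix, eye

A = Matrix([[2, -1, 0], [-1, 2, -1], [0, -1, 2]])
M = A.row_join(eye(3))
R, _ = M.rref()                     # reduced row echelon form of [A | I]
assert R[:, :3] == eye(3)           # the left block is I, so A is invertible
A_inv = R[:, 3:]
print(A_inv)                        # Matrix([[3/4, 1/2, 1/4], [1/2, 1, 1/2], [1/4, 1/2, 3/4]])
assert A * A_inv == eye(3)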
  • 70. 54 Chapter Two: Systems of Linear Equations The procedure for computing A^-1 starts with the partitioned matrix [A | In] and then puts it in reduced row echelon form. If A is invertible, the reduced row echelon form will be [In | A^-1], as the discussion just given shows. On the other hand, if the procedure is applied to a matrix that is not invertible, it will be impossible to reach a reduced row echelon form of the above type, that is, one with In on the left. Thus the procedure will also detect non-invertibility of a matrix. Example 2.3.5 Find the inverse of the matrix
A = [ 2 -1  0 ]
    [-1  2 -1 ]
    [ 0 -1  2 ]
Put the matrix [A | I3] in reduced row echelon form, using elementary row operations as described above:
[ 2 -1  0 | 1 0 0 ]      [ 1 -1/2  0 | 1/2 0 0 ]
[-1  2 -1 | 0 1 0 ]  ->  [ 0  3/2 -1 | 1/2 1 0 ]
[ 0 -1  2 | 0 0 1 ]      [ 0 -1    2 | 0   0 1 ]
  • 71. 2.3: Elementary Matrices 55
->  [ 1 0 -1/3 | 2/3 1/3 0 ]      [ 1 0 0 | 3/4 1/2 1/4 ]
    [ 0 1 -2/3 | 1/3 2/3 0 ]  ->  [ 0 1 0 | 1/2  1  1/2 ]
    [ 0 0  4/3 | 1/3 2/3 1 ]      [ 0 0 1 | 1/4 1/2 3/4 ]
which is the reduced row echelon form. Therefore A is invertible and
A^-1 = [ 3/4 1/2 1/4 ]
       [ 1/2  1  1/2 ]
       [ 1/4 1/2 3/4 ]
This answer can be verified by checking that AA^-1 = I3 = A^-1A. As this example illustrates, the procedure for finding the inverse of an n x n matrix is an efficient one; in fact at most n^2 row operations are required to complete it (see Exercise 2.3.10). Exercises 2.3 1. Express each of the following matrices as a product of elementary matrices and its reduced row echelon form:
  • 72. 56 Chapter Two: Systems of Linear Equations 2. Express the second matrix in Exercise 1 as a product of elementary matrices and its reduced column echelon form. 3. Find the normal form of each matrix in Exercise 1. 4. Find the inverses of the three types of elementary matrix, and observe that each is elementary and corresponds to the inverse row operation. 5. What is the maximum number of column operations needed in general to put an n x n matrix in column echelon form and in reduced column echelon form? 6. Compute the inverses of the following matrices if they exist: 2 1 0 - 3 0 - 1 1 2 - 3 / / ; (c) 2 1 7 - 1 4 10 3 2 12 7. For which values of t does the matrix 6 t 0 - 1 0 1 1 1 t not have an inverse? 8. Give necessary and sufficient conditions for an upper tri- angular matrix to be invertible. 9. Show by an example that if an elementary column opera- tion is applied to the augmented matrix of a linear system, the resulting linear system need not be equivalent to the original one. 10. Prove that the number of elementary row operations needed to find the inverse of an n x n matrix is at most n2 .
  • 73. Chapter Three DETERMINANTS Associated with every square matrix is a scalar called the determinant. Perhaps the most striking property of the de- terminant of a matrix is the fact that it tells us if the matrix is invertible. On the other hand, there is obviously a limit to the amount of information about a matrix which can be carried by a single scalar, and this is probably why determi- nants are considered less important today than, say, a hundred years ago. Nevertheless, associated with an arbitrary square matrix is an important polynomial, the characteristic poly- nomial, which is a determinant. As we shall see in Chapter Eight, this polynomial carries a vast amount of information about the matrix. 3.1 Permutations and the Definition of a Determinant Let A = (aij) be an n x n matrix over some field of scalars (which the reader should feel free to assume is either R or C). Our first task is to show how to define the determinant of A, which will be written either det(A) or else in the extended form an Q21 ai2 «22 &2n O-nl a n2 For n = l and 2 the definition is simple enough: kill = an and a u ai2 «2i a-ii ^ 1 1 ^ 2 2 — ^ 1 2 0 2 1 - 57
  • 74. 58 Chapter Three: Determinants For example, |6| = 6 and
| 2 -3 |
| 4  1 | = 14.
Where does the expression a11a22 - a12a21 come from? The motivation is provided by linear systems. Suppose that we want to solve the linear system
a11x1 + a12x2 = b1
a21x1 + a22x2 = b2
for unknowns x1 and x2. Eliminate x2 by subtracting a12 times equation 2 from a22 times equation 1; in this way we obtain
(a11a22 - a12a21)x1 = b1a22 - a12b2.
This equation expresses x1 as the quotient of a pair of 2 x 2 determinants:
x1 = |b1 a12; b2 a22| / |a11 a12; a21 a22|
provided, of course, that the denominator does not vanish. There is a similar expression for x2. The preceding calculation indicates that 2 x 2 determinants are likely to be of significance for linear systems. And this is confirmed if we try the same computation for a linear system of three equations in three unknowns. While the resulting solutions are complicated, they do suggest the following definition for det(A) where A = (aij)3,3:
a11a22a33 + a12a23a31 + a13a21a32 - a12a21a33 - a13a22a31 - a11a23a32.
  • 75. 3.1: The Definition of a Determinant 59 What are we to make of this expression? In the first place it contains six terms, each of which is a product of three entries of A. The second subscripts in each term correspond to the six ways of ordering the integers 1, 2, 3, namely 1,2,3 2,3,1 3,1,2 2,1,3 3,2,1 1,3,2. Also each term is a product of three entries of A, while three of the terms have positive signs and three have negative signs. There is something of a pattern here, but how can one tell which terms are to get a plus sign and which are get a minus sign? The answer is given by permutations. Permutations Let n be a fixed positive integer. By a permutation of the integers 1, 2,..., n we shall mean an arrangement of these integers in some definite order. For example, as has been observed, there are six permutations of the integers 1, 2, 3. In general, a permutation of 1, 2,..., n can be written in the form k , 12, • • • , in where ii, i2,. • •, in are the integers 1, 2,..., n in some order. Thus to construct a permutation we have only to choose dis- tinct integers ii, i2, • . ., in from the set {1, 2,..., n). Clearly there are n choices for i once i has been chosen, it cannot be chosen again, so there are just n — 1 choices for i2 since i and i2 cannot be chosen again, there are n — 2 choices for ^3, and so on. There will be only one possible choice for in since n — 1 integers have already been selected. The number of ways of constructing a permutation is therefore equal to the product of these numbers n(n - l)(n - 2) •• -2-1,
  • 76. 60 Chapter Three: Determinants which is written n! and referred to as "n factorial". Thus we can state the follow- ing basic result. Theorem 3.1.1 The number of permutations of the integers 1,2,... ,n equals n! = n ( n - l)---2- 1. Even and odd permutations A permutation of the integers 1,2, ...,n is called even or odd according to whether the number of inversions of the natural order 1,2,... ,n that are present in the permutation is even or odd respectively. For example, the permutation 1, 3, 2 involves a single inversion, for 3 comes before 2; so this is an odd permutation. For permutations of longer sequences of integers it is advantageous to count inversions by means of what is called a crossover diagram. This is best explained by an example. Example 3.1.1 Is the permutation 8, 3, 2, 6, 5, 1, 4, 7 even or odd? The procedure is to write the integers 1 through 8 in the natural order in a horizontal line, and then to write down the entries of the permutation in the line below. Join each integer i in the top line to the same integer i where it appears in the bottom line, taking care to avoid multiple intersections. The number of intersections or crossovers will be the number of inversions present in the permutation:
  • 77. 3.1: The Definition of a Determinant 61 1 2 3 4 5 6 7 8 8 3 2 6 5 1 4 7 Since there are 15 crossovers in the diagram, this permutation is odd. A transposition is a permutation that is obtained from 1, 2,..., n by interchanging just two integers. Thus 2,1, 3,4,..., n is an example of a transposition. An important fact about transpositions is that they are always odd. Theorem 3.1.2 Transpositions are odd permutations. Proof Consider the transposition which interchanges i and j , with i < j say. The crossover diagram for this transposition is 1 2 . . . / / + 1 . . . j - 1 j j + 1 . . . n 1 2 . . . j i + 1 . . . j - 1 / j + 1 . . . n Each of the j — i — 1 integers i + 1, i + 2,..., j — 1 gives rise to 2 crossovers, while i and j add one more. Hence the total number of crossovers in the diagram equals 2(j — i — 1) + 1, which is odd. It is important to determine the numbers of even and odd permutations.
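The crossover count of Example 3.1.1 can also be reproduced in a few lines of code. The sketch below is ours, not the book's: it counts inversions directly, that is, pairs that appear out of their natural order, and recovers the parity found above.

# Illustrative sketch: count inversions of a permutation and read off its parity.
def inversions(perm):
    return sum(1 for i in range(len(perm))
                 for j in range(i + 1, len(perm)) if perm[i] > perm[j])

p = (8, 3, 2, 6, 5, 1, 4, 7)
print(inversions(p))                                  # 15, so the permutation is odd
print("even" if inversions(p) % 2 == 0 else "odd")    # odd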
  • 78. 62 Chapter Three: Determinants Theorem 3.1.3 If n > 1, there are n!/2 even permutations of 1, 2, ..., n and the same number of odd permutations. Proof If the first two integers are interchanged in a permutation, it is clear from the crossover diagram that an inversion is either added or removed. Thus the operation changes an even permutation to an odd permutation and an odd permutation to an even one. This makes it clear that the numbers of even and odd permutations must be equal. Since the total number of permutations is n!, the result follows. Example 3.1.2 The even permutations of 1, 2, 3 are 1,2,3  2,3,1  3,1,2, while the odd permutations are 2,1,3  3,2,1  1,3,2. Next we define the sign of a permutation i1, i2, ..., in, written sign(i1, i2, ..., in), to be +1 if the permutation is even and -1 if the permutation is odd. For example, sign(3, 2, 1) = -1 since 3, 2, 1 is an odd permutation. Permutation matrices Before proceeding to the formal definition of a determinant, we pause to show how permutations can be represented by matrices. An n x n matrix is called a permutation matrix if it can be obtained from the identity matrix In by rearranging the rows or columns. For example, the permutation matrix
  • 79. 3.1: The Definition of a Determinant 63 is obtained from I3 by cyclically permuting the columns, C — > C2 —» C3 — > C. Permutation matrices are easy to recognize since each row and each column contains a single 1, while all other entries are zero. Consider a permutation i, 22,..., in of 1,2,..., n, and let P be the permutation matrix which has (j,ij) entry equal to 1 for j = 1,2,... , n, and all other entries zero. This means that P is obtained from In by rearranging the columns in the manner specified by the permutation i,. . . ,in , that is, Cj Ci,. Then, as matrix multiplication shows, (X (ix 2 i2 nj inJ Thus the effect of a permutation on the order 1,2,... ,n is reproduced by left multiplication by the corresponding per- mutation matrix. Example 3.1.3 The permutation matrix which corresponds to the permuta- tion 4, 2, 1, 3 is obtained from I4 by the column replacements Ci — • C4, C2 — > C2, C3 — » Ci, C4 — > C3. It is P = / 0 0 0 1 0 1 0 0 1 0 0 0 Vo 0 1 0/
  • 80. 64 Chapter Three: Determinants and indeed
P (1; 2; 3; 4) = (4; 2; 1; 3).
Definition of a determinant in general We are now in a position to define the general n x n determinant. Let A = (aij)n,n be an n x n matrix over some field of scalars. Then the determinant of A is the scalar defined by the equation
det(A) = sum of sign(i1, i2, ..., in) a1i1 a2i2 ... anin
where the sum is taken over all permutations i1, i2, ..., in of 1, 2, ..., n. Thus det(A) is a sum of n! terms, each of which involves a product of n elements of A, one from each row and one from each column. A term has a positive or negative sign according to whether the corresponding permutation is even or odd respectively. One determinant which can be immediately evaluated from the definition is that of In: det(In) = 1. This is because only the permutation 1, 2, ..., n contributes a non-zero term to the sum that defines det(In). If we specialise the above definition to the cases n = 1, 2, 3, we obtain the expressions for det(A) given at the beginning of the section. For example, let n = 3; the even and odd permutations are listed above in Example 3.1.2. If we write down the terms of the determinant in the same order, we obtain
  • 81. 3.1: The Definition of a Determinant 65
a11a22a33 + a12a23a31 + a13a21a32 - a12a21a33 - a13a22a31 - a11a23a32.
We could in a similar fashion write down the general 4 x 4 determinant as a sum of 4! = 24 terms, 12 with a positive sign and 12 with a negative sign. Of course, it is clear that the definition does not provide a convenient means of computing determinants with large numbers of rows and columns; we shall shortly see that much more efficient procedures are available. Example 3.1.4 What term in the expansion of the 8 x 8 determinant det((aij)) corresponds to the permutation 8, 3, 2, 6, 5, 1, 4, 7? We saw in Example 3.1.1 that this permutation is odd, so its sign is -1; hence the term sought is
- a18 a23 a32 a46 a55 a61 a74 a87.
Minors and cofactors In the theory of determinants certain subdeterminants called minors prove to be a useful tool. Let A = (aij) be an n x n matrix. The (i, j) minor Mij of A is defined to be the determinant of the submatrix of A that remains when row i and column j of A are deleted. The (i, j) cofactor Aij of A is simply the minor with an appropriate sign: Aij = (-1)^(i+j) Mij. For example, if
A = [ a11 a12 a13 ]
    [ a21 a22 a23 ]
    [ a31 a32 a33 ],
  • 82. 66 Chapter Three: Determinants then and M23 = an ai2 «31 ^32 a l l a 3 2 — ^12031 I23 = (-1)2 + 3 M2 3 = ai2a3i - ana3 2 . One reason for the introduction of cofactors is that they provide us with methods of calculating determinants called row expansion and column expansion. These are a great im- provement on the defining sum as a means of computing de- terminants. The next result tells us how they operate. Theorem 3.1.4 Let A = (dij) be an n x n matrix. Then (i) det(A) = X]fc=i a ikMk , (expansion by row i); (ii) det(A) = Efc= i a kjAkj, (expansion by column j). Thus to expand by row i, we multiply each element in row i by its cofactor and add up the resulting products. Proof of Theorem 3.1.4 We shall give the proof of (i); the proof of (ii) is similar. It is sufficient to show that the coefficient of a^ in the defining expansion of det(.A) equals A^. Consider first the simplest case, where i = 1 = k. The terms in the defining expansion of det(.A) that involve an are those that appear in the sum Y^ si gn (1 > *2, • • • , in)ana2i2 • • • a nin. Here the sum is taken over all permutations of 1,2,... ,n which have the form 1, z2 ,i3 ,..., zn. This sum is clearly the same as au(%2 sign(i2, i3, ... , in)a2i2 a 3i3
  • 83. 3.1: The Definition of a Determinant 67 where the summation is now over all permutations 12, • • • , in of the integers 2,..., n. But the coefficient of an in this last expression is just Mu = An. Hence the coefficient of an is the same on both sides of the equation in (i). We can deduce the corresponding statement for general i and k by means of the following device. The idea is to move dik to the (1, 1) position of the matrix in such a way that it will still have the same minor M^. To do this we interchange row % of A successively with rows i — l,i — 2,...,l, after which dik will be in the (1, k) position. Then we interchange column k with the columns k — l,k — 2,...,l successively, until a^ is in the (1,1) position. If we keep track of the determinants that arise during this process, we find that in the final determinant the minor of aik is still M^. So by the result of the first paragraph, the coefficient of a^ in the new determinant is Mik. However each row and column interchange changes the sign of the determinant. For the effect of such an interchange is to switch two entries in every permutation, and, as was pointed out during the proof of 3.1.3, this changes a permu- tation from even to odd, or from odd to even. Thus the sign of each permutation is changed by —1. The total number of interchanges that have been applied is (i — 1) + (k — 1) = i + k — 2. The sign of the determinant is therefore changed by (-l)i + f c ~2 = (-l)i + f c . It follows that the coefficient of a^ in det(A) is (—l)l+k Mik, which is just the definition of An-. (It is a good idea for the reader to write out explicitly the row and column interchanges in the case n — 3 and i = 2, k = 3, and to verify the statement about the minor M23). The theorem provides a practical method of computing 3 x 3 determinants; for determinants of larger size there are more efficient methods, as we shall see.
  • 84. 68 Chapter Three: Determinants Example 3.1.5 Compute the determinant
| 1 2  0 |
| 4 2 -1 |
| 6 2  2 |
For example, we may expand by row 1, obtaining
1(-1)^2 |2 -1; 2 2| + 2(-1)^3 |4 -1; 6 2| + 0(-1)^4 |4 2; 6 2| = 6 - 28 + 0 = -22.
Alternatively, we could expand by column 2:
2(-1)^3 |4 -1; 6 2| + 2(-1)^4 |1 0; 6 2| + 2(-1)^5 |1 0; 4 -1| = -28 + 4 + 2 = -22.
However there is an obvious advantage in expanding by a row or column which contains as many zeros as possible. The determinant of a triangular matrix can be written down at once, an observation which is used frequently in calculating determinants. Theorem 3.1.5 The determinant of an upper or lower triangular matrix equals the product of the entries on the principal diagonal of the matrix. Proof Suppose that A = (aij)n,n is, say, upper triangular, and expand det(A) by column 1. The result is the product of a11 and an (n - 1) x (n - 1) determinant which is also upper triangular. Repeat the operation until a 1 x 1 determinant is obtained (or use mathematical induction).
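As an illustrative aside (ours, not the book's), the sketch below computes a determinant both from the permutation-sum definition and by repeated expansion along the first row; the two methods agree on the matrix of Example 3.1.5. The permutation-sum version is only practical for small n, which is exactly the point made above about n! terms.

# Illustrative sketch: determinant by the defining sum and by row expansion.
from itertools import permutations
from math import prod

def sign(perm):
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return 1 if inv % 2 == 0 else -1

def det_by_definition(A):
    n = len(A)
    return sum(sign(p) * prod(A[r][p[r]] for r in range(n))
               for p in permutations(range(n)))

def det_by_expansion(A):
    if len(A) == 1:
        return A[0][0]
    # expand by row 1: each entry times its cofactor, with alternating signs
    return sum((-1) ** k * A[0][k] *
               det_by_expansion([row[:k] + row[k + 1:] for row in A[1:]])
               for k in range(len(A)))

A = [[1, 2, 0], [4, 2, -1], [6, 2, 2]]
print(det_by_definition(A), det_by_expansion(A))   # -22 -22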
  • 85. 3.1: The Definition of a Determinant 69 Exercises 3.1 1. Is the permutation 1, 3, 8, 5, 2, 6, 4, 7 even or odd? What is the corresponding term in the expansion of det^a^^s)? 2. The same questions for the permutation 8, 5, 3, 2, 1, 7, 6, 9,4. 3. Use the definition of a determinant to compute 1 - 3 0 2 1 4 -1 0 1 4. How many additions, subtractions and multiplications are needed to compute a n n x n determinant by using the defini- tion? 5. For the matrix 2 4 -1 3 3 2 find the minors Mi3, M23 and M33, and the corresponding cofactors Ai3, A23 and A33. 6. Use the cofactors found in Exercise 5 to compute the de- terminant of the matrix in that problem. 7. Use row or column expansion to compute the following determinants: (a) - 2 2 3 2 2 1 0 0 5 (c) 1 0 0 0 , (b) 1 2 1 0 - 2 3 4 2 0 - 3 0 0 - 2 0 0 1 4 1 1 3 3 4 1 3 - 1 2 1 3
  • 86. 70 Chapter Three: Determinants 8. If A is the n x n matrix / 0 0 • • • 0 ax 0 0 • • • a2 0 a n 0 • • • 0 0 / show that det(A) = (-l)n (n -1 )/2 aia2 • • • an. 9. Write down the permutation matrix that represents the permutation 3, 1, 4, 5, 2. 10. Let ii,..., in be a permutation of 1,..., n , and let P be the corresponding permutation matrix. Show that for any n x n matrix A the matrix AP is obtained from A by rearranging the columns according to the scheme Cj — > Cj.. 11. Prove that the sign of a permutation equals the determi- nant of the corresponding permutation matrix. 12. Prove that every permutation matrix is expressible as a product of elementary matrices of the type that represent row or column interchanges. 13. If P is any permutation matrix, show that P"1 = PT . [Hint: apply Exercise 10]. 3.2 Basic Properties of Determinants We now proceed to develop the theory of determinants, establishing a number of properties which will allow us to compute determinants more efficiently. Theorem 3.2.1 If A is an n x n matrix, then &et(AT ) =det(A).
  • 87. 3.2: Basic Properties of Determinants 71 Proof The proof is by mathematical induction. The statement is certainly true if n = 1 since then AT = A. Let n > 1 and assume that the theorem is true for all matrices with n — 1 rows and columns. Expansion by row 1 gives n 3 = 1 Let B denote the matrix AT . Then a^ = bji. By induction on n, the determinant A±j equals its transpose. But this is just the (j, 1) cofactor Bji of B. Hence Aj = Bji and the above equation becomes n det(A) = ^2bjiBji. However the right hand side of this equation is simply the expansion of det(B) by column 1; thus det(A) = det(B). A useful feature of this result is that it sometimes enables us to deduce that a property known to hold for the rows of a determinant also holds for the columns. Theorem 3.2.2 A determinant with two equal rows (or two equal columns) is zero. Proof Suppose that the n x n matrix A has its jth and kth rows equal. We have to show that det(A) = 0. Let ii,i^, • • . , in be a permutation of 1, 2,..., n; the corresponding term in the ex- pansion of det(i4) is sign(ii, i-2, ... , «n)aii102i2 • • -ctnin- Now if we switch ij and i^ in this product, the sign of the permuta- tion is changed, but the product of the a's remains the same
  • 88. 72 Chapter Three: Determinants since ajifc = a,kik and a^ = a^. This means that the term under consideration occurs a second time in the denning sum for det(.A), but with the opposite sign. Therefore all terms in the sum cancel and det(^4) equals zero. Notice that we do not need to prove the statement for columns because of the remark following 3.2.1. The next three results describe the effect on a determi- nant of applying a row or column operation to the associated matrix. Theorem 3.2.3 (i) If a single row (or column) of a matrix A is multiplied by a scalar c, the resulting matrix has determinant equal to c(det(A)). (ii) If two rows (or columns) of a matrix A are interchanged, the effect is to change the sign of the determinant. (iii) The determinant of a matrix A is not changed if a multiple of one row (or column) is added to another row (or column). Proof (i) The effect of the operation is to multiply every term in the sum defining det(A) by c. Therefore the determinant is multiplied by c. (ii) Here the effect of the operation is to switch two entries in each permutation of 1, 2,..., n; we have already seen that this changes the sign of a permutation, so it multiplies the determinant by —1. (iii) Suppose that we add c times row j to row k of the matrix: here we shall assume that j < k. If C is the resulting matrix, then det(C) equals ^2 sign(ii,..., in)aiii • • • %• *, - • • • (a kik + cajik ) • • • anin,
  • 88. 3.2: Basic Properties of Determinants 73 which in turn equals the sum of
sum of sign(i1, ..., in) a1i1 ... ajij ... akik ... anin
and
c times the sum of sign(i1, ..., in) a1i1 ... ajij ... ajik ... anin.
Now the first of these sums is simply det(A), while the second sum is the determinant of a matrix in which rows j and k are identical, so it is zero by 3.2.2. Hence det(C) = det(A). Now let us see how use of these properties can lighten the task of evaluating a determinant. Let A be an n x n matrix whose determinant is to be computed. Then elementary row operations can be used as in Gaussian elimination to reduce A to row echelon form B. But B is an upper triangular matrix, say
B = [ b11 b12 ... b1n ]
    [ 0   b22 ... b2n ]
    [ ...             ]
    [ 0   0   ... bnn ]
so by 3.1.5 we obtain det(B) = b11b22...bnn. Thus all that has to be done is to keep track, using 3.2.3, of the changes in det(A) produced by the row operations. Example 3.2.1 Compute the determinant
D = |  0  1  2  3 |
    |  1  1  1  1 |
    | -2 -2  3  3 |
    |  1 -2 -2 -3 |
Apply row operations R1 <-> R2 and then R3 + 2R1, R4 - R1 successively to D to get:
  • 89. 74 Chapter Three: Determinants
D = - | 1 1 1 1; 0 1 2 3; -2 -2 3 3; 1 -2 -2 -3 | = - | 1 1 1 1; 0 1 2 3; 0 0 5 5; 0 -3 -3 -4 |
Next apply successively R4 + 3R2 and (1/5)R3 to get
D = - | 1 1 1 1; 0 1 2 3; 0 0 5 5; 0 0 3 5 | = -5 | 1 1 1 1; 0 1 2 3; 0 0 1 1; 0 0 3 5 |
Finally, use of R4 - 3R3 yields
D = -5 | 1 1 1 1; 0 1 2 3; 0 0 1 1; 0 0 0 2 | = -10.
Example 3.2.2 Use row operations to show that the following determinant is identically equal to zero.
| a + 2   b + 2   c + 2  |
| x + 1   y + 1   z + 1  |
| 2x - a  2y - b  2z - c |
Apply row operations R3 + R1 and 2R2. The resulting determinant is zero since rows 2 and 3 are identical. Example 3.2.3 Prove that the value of the n x n determinant
| 2 1 0 ... 0 0 |
| 1 2 1 ... 0 0 |
| 0 1 2 ... 0 0 |
| . . . ... . . |
| 0 0 0 ... 2 1 |
| 0 0 0 ... 1 2 |
  • 91. 3.2: Basic Properties of Determinants 75 is n + 1. First note the obvious equalities D — 2 and D2 n > 3; then, expanding by row 1, we obtain 3. Let Dn — 2Dn- — 1 0 0 0 0 1 2 1 0 0 0 1 2 0 0 0 • 0 • 1 • 0 • 0 • • 0 • 0 • 0 • 1 • 0 0 0 0 2 1 0 0 0 1 2 Expanding the determinant on the right by column 1, we find it to be Dn_2- Thus Dn = 2Dn _! — Dn-2- This is a recurrence relation which can be used to solve for successive values of Dn. Thus D3 = 4 , D4 = 5, D5 = 6 , etc. In general Dn = n + 1. (A systematic method for solving recurrence relations of this sort will be given in 8.2.) The next example is concerned with an important type of determinant called a Vandermonde determinant; these de- terminants occur frequently in applications. Example 3.2.4 Establish the identity 1 Xi X n - 1 1 X2 Xn JUn X„ X n - 1 n 1,3 , Xj, Xj), where the expression on the right is the product of all the factors Xi — Xj with i < j and i,j = l,2,...,n.
  • 92. 76 Chapter Three: Determinants Let D be the value of the determinant. Clearly it is a polynomial in xi, x2,..., xn. If we apply the column operation Ci—Cj , with i < j , to the determinant, its value is unchanged. On the other hand, after this operation each entry in column i will be divisible by xj — Xj. Hence D is divisible by • X> £ Jb n for alH, j; = 1, 2,..., n and i < j . Thus we have located a total of n(n— l)/2 distinct linear polynomials which are factors of D, this being the number of pairs of distinct positive integers i,j such that 1 < i < j < n. But the degree of the polynomial D is equal to , „N n(n — 1) l + 2 + --- + ( n - l ) = 2 • for each term in the denning sum has this degree. Hence D must be the product of these n(n —1)/2 factors and a constant c, there being no room for further factors. Thus D = cY[{xi-Xj), with i < j = 1, 2,..., n. In fact c is equal to 1, as can be seen by looking at the coefficient of the term lx2x^ • • • x^~l in the defining sum for the determinant D; this corresponds to the permutation 1,2,... ,n, and so its coefficient is +1. On the other hand, in the product of the x^ — Xj the coefficient of the term is 1. Hence c = 1. The critical property of the Vandermonde determinant D is that D = 0 if and only if at least two of xi, x2,..., xn are equal. Exercises 3.2 1. By using elementary row operations compute the following determinants: 1 0 3 2 3 4 - 1 2 0 3 1 2 ' 1 5 2 3 1 4 2 - 2 4 7 6 1 2 , (b) 3 0 2 1 4 - 3 - 2 4 6 , (c)
  • 93. 3.2: Basic Properties of Determinants 77 2. If one row (or column) of a determinant is a scalar multiple of another row (or column), show that the determinant is zero. 3. If A is an n x n matrix and c is a scalar, prove that det(cA) = cn det(A). 4. Use row operations to show that the determinant ,2 b2 „2 1 + b 2b2 - b - a l + a 2a2 -a-1 c l + c 1 2cz -c-1 is identically equal to zero. 5. Let A be an n x n matrix in row echelon form. Show that det(A) equals zero if and only if the number of pivots is less than n. 6. Use row and column operations to show that = (a + b + c)(-a2 -b2 - c2 +ab + bc + ca). a b c b c a c a b Without expanding the determinant, prove that 1 1 1 x - y)(y - z)(z - x)(x + y + z). x 3 X y y3 z ^3 [Hint: show that the determinant has factors x — y , y — z , z — x , and that the remaining factor must be of degree 1 and symmetric in x,y,z ]. 8. Let Dn denote the "bordered" n x n determinant 0 b 0 0 0 a 0 b 0 0 0 • a • 0 • 0 • 0 • • 0 • 0 • 0 • 0 • b 0 0 0 a 0
  • 93. 78 Chapter Three: Determinants Prove that D2n-1 = 0 and D2n = (-ab)^n. 9. Let Dn be the n x n determinant whose (i, j) entry is i + j. Show that Dn = 0 if n > 2. [Hint: use row operations]. 10. Let un denote the number of additions, subtractions and multiplications needed in general to evaluate an n x n determinant by row expansion. Prove that un = nu(n-1) + 2n - 1. Use this formula to calculate un for n = 2, 3, 4. 3.3 Determinants and Inverses of Matrices An important property of the determinant of a square matrix is that it tells us whether the matrix is invertible. Theorem 3.3.1 An n x n matrix A is invertible if and only if det(A) != 0. Proof By 2.3.2 there are elementary matrices E1, E2, ..., Ek such that the matrix R = Ek Ek-1 ... E2 E1 A is in reduced row echelon form. Now observe that if E is any elementary n x n matrix, then det(EA) = c det(A) for some non-zero scalar c; this is because left multiplication by E performs an elementary row operation on A and we know from 3.2.3 that such an operation will, at worst, multiply the value of the determinant by a non-zero scalar. Applying this fact repeatedly, we obtain det(R) = det(Ek ... E2 E1 A) = d det(A) for some non-zero scalar d. Consequently det(A) != 0 if and only if det(R) != 0. Now we saw in 2.3.5 that A is invertible precisely when R = In. But, remembering the form of the matrix R, we recognise that the only way that det(R) can be non-zero is if R = In. Hence the result follows. Example 3.3.1 The Vandermonde matrix of Example 3.2.4 is invertible if and only if x1, x2, ..., xn are all different.
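A quick illustrative check of Example 3.3.1 (ours, not the book's): the sketch below builds a small Vandermonde matrix symbolically-exactly with sympy and confirms that its determinant vanishes precisely when two of the xi coincide, which by Theorem 3.3.1 is exactly when the matrix fails to be invertible.

# Illustrative check: a 3 x 3 Vandermonde matrix is invertible exactly when
# the x_i are distinct.
from sympy import Matrix

def vandermonde(xs):
    n = len(xs)
    return Matrix(n, n, lambda i, j: xs[i] ** j)

print(vandermonde([1, 2, 3]).det())    # 2, non-zero since 1, 2, 3 are distinct
print(vandermonde([1, 2, 2]).det())    # 0, two of the x_i coincide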
• 95. 3.3: Determinants and Inverses of Matrices 79

Corollary 3.3.2 A linear system AX = 0 with n equations in n unknowns has a non-trivial solution if and only if det(A) = 0.

This very useful result follows directly from 2.3.5 and 3.3.1. Theorem 3.3.1 can be used to establish a basic formula for the determinant of the product of two matrices.

Theorem 3.3.3 If A and B are any n x n matrices, then det(AB) = det(A) det(B).

Proof
Consider first the case where B is not invertible, which by 3.3.1 means that det(B) = 0. According to 2.3.5 there is a non-zero vector X such that BX = 0. This clearly implies that (AB)X = 0, and so, by 2.3.5 and 3.3.1, det(AB) must also be zero. Thus the formula certainly holds in this case.

Suppose now that B is invertible. Then B is a product of elementary matrices, say B = E_1 E_2 ... E_k; this is by 2.3.5. Now the effect of right multiplication of A by an elementary matrix E is to apply an elementary column operation to A. What is more, we can tell from 3.2.3 just what the value of det(AE) is; indeed

    det(AE) = -det(A),   det(A),   or   c det(A),

according to whether E represents a column operation of the types

    C_i <-> C_j,     C_i + c C_j,     c C_i.
• 96. 80 Chapter Three: Determinants

Now we can see from the form of the elementary matrix E that det(E) equals -1, 1 or c, respectively, in the three cases; hence the formula det(AE) = det(A) det(E) is valid. In short our formula is true when B is an elementary matrix. Applying this fact repeatedly, we find that det(AB) equals

    det(A E_1 E_2 ... E_k) = det(A) det(E_1) det(E_2) ... det(E_k),

which shows that det(AB) = det(A) det(E_1 E_2 ... E_k) = det(A) det(B).

Corollary 3.3.4 Let A and B be n x n matrices. If AB = I_n, then BA = I_n, and thus B = A^(-1).

Proof
Since 1 = det(I_n) = det(AB) = det(A) det(B), we see that det(A) ≠ 0 and A is invertible, by 3.3.1. Therefore BA = A^(-1)(AB)A = A^(-1) I_n A = I_n.

Corollary 3.3.5 If A is an invertible matrix, then det(A^(-1)) = 1/det(A).

Proof
Clearly 1 = det(I_n) = det(A A^(-1)) = det(A) det(A^(-1)), from which the statement follows.

The adjoint matrix

Let A = (a_ij) be an n x n matrix. Then the adjoint matrix adj(A) of A is defined to be the n x n matrix whose (i, j) element is the (j, i) cofactor A_ji of A. Thus adj(A) is the transposed matrix of cofactors of A. For example, the adjoint of the matrix

    | 1   2  1 |
    | 6  -1  3 |
    | 2  -3  4 |
• 97. 3.3: Determinants and Inverses of Matrices 81

is

    |   5  -11    7 |
    | -18    2    3 |
    | -16    7  -13 |

The significance of the adjoint matrix is made clear by the next two results.

Theorem 3.3.6 If A is any n x n matrix, then A adj(A) = (det(A)) I_n = adj(A) A.

Proof
The (i, j) entry of the matrix product A adj(A) is

    Σ_{k=1}^{n} a_ik (adj(A))_kj = Σ_{k=1}^{n} a_ik A_jk.

If i = j, this is just the expansion of det(A) by row i; on the other hand, if i ≠ j, the sum is also a row expansion of a determinant, but one in which rows i and j are identical. By 3.2.2 the sum will vanish in this case. This means that the off-diagonal entries of the matrix product A adj(A) are zero, while the entries on the diagonal all equal det(A). Therefore A adj(A) is the scalar matrix (det(A)) I_n, as claimed. The second statement can be proved in a similar fashion.

Theorem 3.3.6 leads to an attractive formula for the inverse of an invertible matrix.

Theorem 3.3.7 If A is an invertible matrix, then

    A^(-1) = (1/det(A)) adj(A).
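Before the proof of 3.3.7, which follows on the next page, here is a small computational sketch of the identity in 3.3.6 and of the inverse formula; it is not part of the original text and assumes Python with numpy, applied to the 3 x 3 matrix of the example above.

```python
import numpy as np

def adjoint(A):
    """Transposed matrix of cofactors of a square matrix A."""
    n = A.shape[0]
    cof = np.zeros_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T                                  # adj(A)_{ij} is the (j, i) cofactor

A = np.array([[1.0,  2.0, 1.0],
              [6.0, -1.0, 3.0],
              [2.0, -3.0, 4.0]])

adjA = adjoint(A)
print(np.round(adjA))                             # the adjoint displayed above

# Theorem 3.3.6: A adj(A) = det(A) I_n
assert np.allclose(A @ adjA, np.linalg.det(A) * np.eye(3))
# Theorem 3.3.7: the inverse is adj(A) divided by det(A)
assert np.allclose(np.linalg.inv(A), adjA / np.linalg.det(A))
```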
• 98. 82 Chapter Three: Determinants

Proof
In the first place, remember that A^(-1) exists if and only if det(A) ≠ 0, by 3.3.1. From A adj(A) = (det(A)) I_n we obtain

    A ((1/det(A)) adj(A)) = (1/det(A)) (A adj(A)) = I_n,

by 3.3.6. The result follows in view of 3.3.4.

Example 3.3.2
Let A be the matrix

    |  2  -1   0 |
    | -1   2  -1 |
    |  0  -1   2 |

The adjoint of A is

    | 3  2  1 |
    | 2  4  2 |
    | 1  2  3 |

Expanding det(A) by row 1, we find that it equals 4. Thus

    A^(-1) = | 3/4  1/2  1/4 |
             | 1/2   1   1/2 |
             | 1/4  1/2  3/4 |

Despite the neat formula provided by 3.3.7, for matrices with four or more rows it is usually faster to use elementary row operations to compute the inverse, as described in 2.3; for to find the adjoint of an n x n matrix one must compute n^2 determinants, each with n - 1 rows and columns.

Next we give an application of determinants to geometry.

Example 3.3.3
Let P_1(x_1, y_1, z_1), P_2(x_2, y_2, z_2) and P_3(x_3, y_3, z_3) be three non-collinear points in three dimensional space. The points
• 99. 3.3: Determinants and Inverses of Matrices 83

therefore determine a unique plane. Find the equation of the plane by using determinants.

We know from analytical geometry that the equation of the plane must be of the form ax + by + cz + d = 0. Here the constants a, b, c, d cannot all be zero. Let P(x, y, z) be an arbitrary point in the plane. Then the coordinates of the points P, P_1, P_2, P_3 must satisfy the equation of the plane. Therefore the following equations hold:

    a x   + b y   + c z   + d = 0
    a x_1 + b y_1 + c z_1 + d = 0
    a x_2 + b y_2 + c z_2 + d = 0
    a x_3 + b y_3 + c z_3 + d = 0

Now this is a homogeneous linear system in the unknowns a, b, c, d; by 3.3.2 the condition for there to be a non-trivial solution is that

    | x    y    z    1 |
    | x_1  y_1  z_1  1 |
    | x_2  y_2  z_2  1 |  =  0.
    | x_3  y_3  z_3  1 |

This is the condition for the point P to lie in the plane, so it is the equation of the plane. That it is of the form ax + by + cz + d = 0 may be seen by expanding the determinant by row 1.

For example, the equation of the plane which is determined by the three points (0, 1, 1), (1, 0, 1) and (1, 1, 0) is

    | x  y  z  1 |
    | 0  1  1  1 |
    | 1  0  1  1 |  =  0,
    | 1  1  0  1 |

which becomes on expansion x + y + z - 2 = 0.
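The same determinant condition can be expanded symbolically. The following sketch is not part of the original text and assumes Python with sympy; it reproduces the plane x + y + z - 2 = 0 for the three points of the example.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

# Rows: the general point (x, y, z) followed by the three given points.
M = sp.Matrix([[x, y, z, 1],
               [0, 1, 1, 1],
               [1, 0, 1, 1],
               [1, 1, 0, 1]])

plane = sp.expand(M.det())
print(plane)          # x + y + z - 2, so the plane is x + y + z - 2 = 0
```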
• 100. 84 Chapter Three: Determinants

Cramer's Rule

For a second illustration of the uses of determinants, we return to the study of linear systems. Consider a linear system of n equations in n unknowns x_1, x_2, ..., x_n,

    AX = B,

where the coefficient matrix A has non-zero determinant. The system has a unique solution, namely X = A^(-1) B. There is a simple expression for this solution in terms of determinants.

Using 3.3.7 we obtain X = A^(-1) B = (1/det(A)) (adj(A) B). From the matrix product adj(A) B we can read off the i-th unknown as

    x_i = ( Σ_{j=1}^{n} (adj(A))_ij b_j ) / det(A) = ( Σ_{j=1}^{n} b_j A_ji ) / det(A).

Now the second sum is a determinant; in fact it is det(M_i), where M_i is the matrix obtained from A when column i is replaced by B. Hence the solution of the linear system can be expressed in the form

    x_i = det(M_i)/det(A),   i = 1, 2, ..., n.

Thus we have obtained the following result.

Theorem 3.3.8 (Cramer's Rule) If AX = B is a linear system of n equations in n unknowns and det(A) is not zero, then the unique solution of the linear system can be written in the form

    x_i = det(M_i)/det(A),   i = 1, ..., n,

where M_i is the matrix obtained from A when column i is replaced by B.
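A direct implementation of 3.3.8 is straightforward; the sketch below is not part of the original text and assumes Python with numpy. The system passed to it is the one solved by hand in Example 3.3.4 on the next page.

```python
import numpy as np

def cramer(A, B):
    """Solve AX = B for square A with det(A) != 0 using Cramer's Rule."""
    d = np.linalg.det(A)
    if np.isclose(d, 0.0):
        raise ValueError("Cramer's Rule requires det(A) != 0")
    X = np.empty(len(B))
    for i in range(len(B)):
        Mi = A.copy()
        Mi[:, i] = B                      # replace column i of A by B
        X[i] = np.linalg.det(Mi) / d
    return X

A = np.array([[1.0, -1.0, -1.0],
              [1.0,  2.0, -1.0],
              [2.0,  0.0,  1.0]])
B = np.array([4.0, 2.0, 1.0])

print(cramer(A, B))                       # approximately [13/9, -2/3, -17/9]
assert np.allclose(cramer(A, B), np.linalg.solve(A, B))
```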
• 101. 3.3: Determinants and Inverses of Matrices 85

The reader should note that Cramer's Rule can only be used when the linear system has the special form indicated.

Example 3.3.4
Solve the following linear system using Cramer's Rule:

     x_1 -  x_2 - x_3 = 4
     x_1 + 2x_2 - x_3 = 2
    2x_1        + x_3 = 1

Here

    A = | 1  -1  -1 |        and        B = | 4 |
        | 1   2  -1 |                       | 2 |
        | 2   0   1 |                       | 1 |

Thus det(A) = 9, and Cramer's Rule gives the solution

    x_1 = (1/9) | 4  -1  -1 |  =  13/9,
                | 2   2  -1 |
                | 1   0   1 |

    x_2 = (1/9) | 1   4  -1 |  =  -2/3,
                | 1   2  -1 |
                | 2   1   1 |

    x_3 = (1/9) | 1  -1   4 |  =  -17/9.
                | 1   2   2 |
                | 2   0   1 |

Exercises 3.3

1. For the matrices

    A =                and        B = | 2  5 |
                                      | 4  7 |
  • 102. 86 Chapter Three: Determinants verify the identity det(AB) = det(A) det(B). 2. By finding the relevant adjoints, compute the inverses of the following matrices: (a) 4 -2 (b) (c) 3. If A is a square matrix and n is a positive integer, prove that det(An ) = (det(A))n . 4. Use Cramer's Rule to solve the following linear systems: (a) 2xx Xi 2xi Xi 2xi Xi - 3x2 + 3x2 + x2 + x2 - x2 + 2x2 + £3 + x3 + x3 + x3 - x3 - 3x3 = -1 = 6 = 11 = -1 = 4 = 7 (b) 5. Let A be an n x n matrix. Prove that A is invertible if and only if adj(A) is invertible. 6. Let A be any n x n matrix where n > 1. Prove that det(adj(yl)) = (det(A))n_1 . [Hint: first deal with the case where det(A) ^ 0, by applying det to each side of the identity of 3.3.6. Then argue that the result must still be true when det(A)=0]. 7. Find the equation of the plane which contains the points (1,1,-2), (1,-2, 7) and (0,1,-4). 8. Consider the four points in three dimensional space Pi(xi,yi,Zi), i = 1, 2, 3, 4. Prove that a necessary and suffi- cient condition for the four points to lie in a plane is xi y zi 1 X2 V2 Z2 1 X3 J/3 Z3 1 x4 y4 zA 1 = 0.
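Identities such as those in Exercises 1 and 3 above are easy to spot-check numerically. The sketch below is not part of the original text and assumes Python with numpy; an arbitrary 2 x 2 matrix is used as a stand-in for the matrix A of Exercise 1, while B is the matrix printed there.

```python
import numpy as np

A = np.array([[1.0, 3.0], [2.0, 4.0]])       # stand-in matrix, chosen arbitrarily
B = np.array([[2.0, 5.0], [4.0, 7.0]])       # the matrix B of Exercise 1

# Exercise 1: det(AB) = det(A) det(B)
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))

# Exercise 3: det(A^n) = (det(A))^n, checked here for n = 5
assert np.isclose(np.linalg.det(np.linalg.matrix_power(A, 5)),
                  np.linalg.det(A) ** 5)
```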
  • 103. Chapter Four INTRODUCTION TO VECTOR SPACES The aim of this chapter is to introduce the reader to the notion of an abstract vector space. Roughly speaking, a vec- tor space is a set of objects called vectors which it is possible to add and multiply by scalars, subject to reasonable rules. Vector spaces occur in numerous branches of mathematics, as well as in many applications; they are therefore of great im- portance and utility. Rather than immediately confront the reader with an abstract definition, we prefer first to discuss some vector spaces which are familiar objects. Then we pro- ceed to extract the common features of these examples, and use them to frame the definition of a general vector space. 4.1 Examples of Vector Spaces The first example of a vector space has a geometrical background. Euclidean space Choose and fix a positive integer n, and define to be the set of all n-column vectors f X l x = X J XnJ 87
  • 104. 88 Chapter Four: Introduction to Vector Spaces where the entries Xi are real numbers. Of course these are special types of matrices, so rules of addition and scalar mul- tiplication are at hand, namely fXl xnJ + /Vi V2 VVr> X2 + V2 and fxx X2 ( CXi _ cx2 Thus the set Rn is "closed" with respect to the operation of adding pairs of its elements, in the sense that one cannot escape from Rn by adding two of its elements; similarly Rn is closed with respect to multiplication of its elements by scalars. Notice also that Rn contains the zero column vector. Another point to observe is that the rules of matrix alge- bra listed in 1.2.1 which are relevant to column vectors apply to the elements of Rn . The set Rn , together with the op- erations of addition and scalar multiplication, forms a vector space which is known as n- dimensional Euclidean space. Line segments and R3 When n is 3 or less, the vector space Rn has a good geometrical interpretation. Consider the case of R3 . Atypical element of R3 is a 3-column Assume that a cartesian coordinate system has been chosen with assigned x, y and z -axes. We plan to represent the col- umn vector A by a directed line segment in three-dimensional
  • 105. 4.1: Examples of Vector Spaces 89 space. To achieve this, choose an arbitrary point I with co- ordinates (u, «2> U3) as the initial point of the line segment. The end point of the segment is the point E with coordinates (ui+ai, U2 + a2, U3 + 0,3). The direction of the line segment IE is indicated by an arrow: E(u1 + a1,u2 + a 2. w 3 + a 3) {U:,U2,U3) The length of IE equals I — J a + 02 + 03 and its direction is specified by the direction cosines cii/l, a2/l, a3/l. Here the significant feature is that none of these quantities depends on the initial point I. Thus A is represented by in- finitely many line segments all of which have the same length and the same direction. So all the line segments which repre- sent A are parallel and have equal length. However the zero vector is represented by a line segment of length 0 and it is not assigned a direction. Having connected elements of R3 with line segments, let us see what the rule of addition in R3 implies about line seg- ments. Consider two vectors in R3 A = a2 and B
  • 106. 90 Chapter Four: Introduction to Vector Spaces and their sum A + B = a2 + b2 . a3 + b3 J Represent the vectors A, B and A + B by line segments IU, IV, and I W in three dimensional space with a common initial point I (ui, u2, u3), say. The line segments determine a figure I U W V as shown: where U, W and V are the points (ui+ai, u2+a2, u3+a3), (ui+ai+bi, u2+a2+b2, u3+a3+b3) and (ui +h, u2 +b2, u3 + b3), respectively. In fact I U W V is a parallelogram. To prove this, we need to find the lengths and directions of the four sides. Simple analytic geometry shows that IU'= VW = /a + a| + a§ = /, and that IV = UW = ^Jb + b + bj = m, say. Also the direction cosines of IU and V W are ai/l, a2/l, a3/l, while those of IV and U W are bi/m, b2/m, b3/m. It follows that opposite sides of I U W V are parallel and of equal length, so it is indeed a parallelogram. These considerations show that the rule of addition for vectors in R3 is equivalent to the parallelogram rule for addi- tion of forces, which is familiar from mechanics. To add line
  • 107. 4.1: Examples of Vector Spaces 91 segments IU and IV representing the vectors A and B, com- plete the parallelogram formed by the lines IU and IV; the the diagonal I W will represent the vector A + B. An equivalent formulation of this is the triangle rule, which is encapsulated in the diagram which follows: I A Note that this diagram is obtained from the parallelogram by deleting the upper triangle. Since IV and U W are parallel line segments of equal length, they represent the same vector B. There is also a geometrical interpretation of the rule of scalar multiplication in R3 . As before let A in R3 be repre- sented by the line segment joining I(tii, U2, u^) to U(tti + ai, U2 + Q2) ^3 + 03). Let c be any scalar. Then cA is represented by the line segment from (u, U2, U3) to (u + cax, U2 + ca2, U3 + CGS3). This line segment has length equal to c times the length of IU, while its direction is the same as that of IU if c > 0, and opposite to that of IU if c < 0. Of course, there are similar geometrical representations of vectors in R2 by line segments drawn in the plane, and in R1 by line segments drawn along a fixed line. So our first examples of vector spaces are familiar objects if n < 3. Further examples of vector spaces are obtained when the field of real numbers is replaced by the field of complex num- bers C: in this case we obtain Cn ,
  • 108. 92 Chapter Four: Introduction to Vector Spaces the vector space of all n-column vectors with entries in C. More generally it is to carry out the same construction with an arbitrary field of scalars F, in the sense of 1.3; this yields the vector space of all n-column vectors with entries in F, with the usual rules of matrix addition and scalar multiplication. Vector spaces of matrices One obvious way to extend the previous examples is by allowing matrices of arbitrary size. Let Mm?n(R) denote the set of all m x n matrices with real entries. This set is closed with respect to matrix addition and scalar mul- tiplication, and it includes the zero matrix 0m>n. The rules of matrix algebra guarantee that MmiTl(R) is a vector space. Of course, if n = 1, we recover the Euclidean space Rm , while if m = 1, we obtain the vector space of all real n-row vectors. It is consistent with notation estab- lished in 1.3 if we write Mn(R) for the vector space of all real n x n matrices, instead of Mn n (R). Once again R can be replaced by any field of scalars F in these examples, to produce the vector spaces Mm ,n (F), Mn(F) and Fn.
  • 109. 4.1: Examples of Vector Spaces 93 Vector spaces of functions Let a and b be fixed real numbers with a < b, and let C[a, b] denote the set of all real-valued functions of x that are continuous at each point of the closed interval [o, b]. If / and g are two such functions, we define their sum f + g by the rule f + g(x) = f(x)+g(x). It is a well-known result from calculus that / + g is also con- tinuous in [a, b], so that f + g belongs to C[a, b]. Next, if c is any real number, the function cf defined by cf(x) = c(f(x)) is continuous in [a, b] and thus belongs to C[a,b]. The zero function, which is identically equal to zero in [a, b], is also included in C[a,b]. Thus once again we have a set that is closed with respect to natural operations of addition and scalar multiplication; C[a, b] is the vector space of all continuous functions on the interval [a, b]. In a similar way one can form the smaller vector space D[a, b] consisting of all differentiable functions on [a, b], with the same rules of addition and scalar multiplication. A still smaller vector space is £>oo[a,6], the vector space of all functions that are infinitely differentiable in [a, b] Vector spaces of polynomials A (real) polynomial in an indeterminate x is an expression of the form f(x) = aQ + aix H - anxn where the coefficients a^ are real numbers. If an ^ 0, the polynomial is said to have degree n. Define Pn(R)
  • 110. 94 Chapter Four: Introduction to Vector Spaces to be the set of all real polynomials in x of degree less than n. Here we mean to include the zero polynomial, which has all its coefficients equal to zero. There are natural rules of addi- tion and scalar multiplication in Pn (R), namely the familiar ones of elementary algebra: to add two polynomials add cor- responding coefficients; to multiply a polynomial by a scalar c, multiply each coefficient by c. Using these operations, we obtain the vector space of all real polynomials of degree less than n. This example could be varied by allowing polynomial of arbitrary degree, thus yielding the vector space of all real poly- nomials P(R). As usual R may be replaced by any field of scalars here. Common features of vector spaces The time has come to identify the common features in the above examples: they are: (i) a non-empty set of objects called vectors, including a "zero" vector; (ii) a way of adding two vectors to give another vector; (iii) a way of multiplying a vector by a scalar to give a vector; (iv) a reasonable list of rules that the operations mentioned(ii) and (iii) are required to satisfy. We are being deliberately vague in (iv), but the rules should correspond to properties of matrices that are known to hold in Rn andMm > n (R).
  • 111. 4.2: Vector Spaces and Subspaces 95 Exercises 4.1 1. Give details of the geometrical interpretations of R1 and R2 . 2. Which of the following might qualify as vector spaces in the sense of the examples of this section? (a) the set of all real 3-column vectors that correspond to line segments of length 1; (b) the set of all real polynomials of degree at least 2; (c) the set of all line segments in R3 that are parallel to a given plane; (d) the set of all continuous functions of x defined in the interval [0, 1] that vanish at x — 1/2. 4.2 Vector Spaces and Subspaces It is now time to give a precise formulation of the defini- tion of a vector space. Definition of a vector space A vector space V over R consists of a set of objects called vectors, a rule for combining vectors called addition, and a rule for multiplying a vector by a real number to give another vector called scalar multiplication. If u and v are vectors, the result of adding these vectors is written u + v, the sum of u and v; also, if c is a real number, the result of multiplying v by c, is written cv, the scalar multiple of v by c. It is understood that the following conditions must be satisfied for all vectors u, v, w and all real scalars c, d : (i) u + v = v + u, (commutative law); (ii) (u + v) + w = u + (v + w), (associative law); (iii) there is a vector 0, called the zero vector, such that v + 0 = v; (iv) each vector v has a negative, that is, a vector —v such that v + (—v) = 0;
  • 112. 96 Chapter Four: Introduction to Vector Spaces (v) cd(v) = c(dv); (vi) c(u + v) = cu + cv : (distributive law); (vii) (c + d)v — cv + dv; (distributive law); (viii) lv = v. For economy of notation it is customary to use V to denote the set of vectors, as well as the vector space. Since the vector space axioms just listed hold for matrices, they are valid in Rn ; they also hold in the other examples of vector spaces described in 4.1. More generally, we can define a vector space over an ar- bitrary field of scalars F by simply replacing R by F in the above axioms. Certain simple properties of vector spaces follow easily from the axioms. Since these are used constantly, it is as well to establish them at this early stage. Lemma 4.2.1 If u and v are vectors in a vector space, the following state- ments are true: (a) Ov = 0 and c 0 = 0 where c is a scalar; (b) if u + v = 0, then u = —v; (c) ( - l ) v = - v . Proof (a) In property (vii) above put c = 0 = d, to get Ov = Ov + Ov. Add — (Ov) to both sides of this equation and use the associative law (ii) to deduce that 0 = -(Ov) + Ov = (-(Ov) + Ov) + Ov, which leads to 0 = Ov. Proceed similarly in the second part. (b) Add —v to both sides of u + v = 0 and use the associative law. (c) Using (vii) and (viii), and also (a), we obtain v + (-l)v = lv + (-l)v = (1 + (-l))v = Ov = 0.
  • 113. 4.2: Vector Spaces and Subspaces 97 Hence (-l)v = —v by (b). Subspaces Roughly speaking, a subspace is a vector space contained within a larger vector space; for example, the vector space p2(R) is a subspace of Ps(R). More precisely, a subset S of a vector space V is called a subspace of V if the following statements are true: (i) S contains the zero vector 0; (ii) if v belongs to S, then so does cv for every scalar c, that is, S is closed under scalar multiplication; (iii) if u and v belong to S, then so does u + v that is, S is closed under addition. Thus a subspace of V is a subset S which is itself a vector space with respect to the same rules of addition and scalar multiplication as V. Of course, the vector space axioms hold in S since they are already valid in V. Examples of subspaces If V is any vector space, then V itself is a subspace, for trivial reasons. It is often called the improper subspace. At the other extreme is the zero subspace, written 0 or Oy, which contains only the zero vector 0. This is the smallest subspace of V. (In general a vector space that contains only the zero vector is called a zero space). The zero subspace and the improper subspace are present in every vector space. We move on now to some more interesting examples of subspaces. Example 4.2.1 Let S be the subset of R2 consisting of all columns of the form (-30
  • 114. 98 Chapter Pour: Introduction to Vector Spaces where t is an arbitrary real number. Since 2 M + ( 2t A = ( 2(*i+<2) and —3t J —3ct S is closed under addition and scalar multiplication; also S contains the zero vector I 1, as may be seen by taking t to be to 0. Hence S is a subspace of R2 . In fact this subspace has geometrical significance. For / 2 A an arbitrary vector I 1 of S may be represented by a line segment in the plane with initial point the origin and end point (2t,—3t). But the latter is a general point on the line with equation 3x + 2y = 0. Therefore the subspace S corresponds to the set of line segments drawn from the origin along the line 3x + 2y = 0. Example 4.2.2 This example is an important one. Consider the homogeneous linear system AX = 0 in n unknowns over some field of scalars F and let S denote the set of all solutions of the linear system, that is, all the n-column vectors X over F that satisfy AX — 0. Then S is a subset of Fn and it certainly contains the zero vector. Now if X and Y are solutions of the linear system and c is any scalar, then A(X + Y) = AX + AY = 0 and A(cX) = c(AX) = 0. Thus X + Y and cX belong to S and it follows that S is a subspace of the vector space Rn . This subspace is called the
  • 115. 4.2: Vector Spaces and Subspaces 99 solution space of the homogeneous linear system AX = 0; it is also known as the null space of the matrix A. (Question: why is it necessary to have a homogeneous linear system here?) Example 4.2.3 Let S denote the set of all real solutions y = y(x) of the homogeneous linear differential equation y" + by' + 6y = 0 defined in some interval [a, b]. Thus S is a subset of the vector space C[a, & ] of continuous functions on [a, b]. It is easy to verify that S contains the zero function and that S is closed with respect to addition and scalar multiplication; in other words S is a subspace of C[a, b}. The subspace S in this example is called the solution space of the differential equation. More generally, one can define the solution space of an arbitrary homogeneous linear differential equation, or even of a system of such differential equations. Systems of homogeneous linear differential equations are stud- ied in Chapter Eight. Linear combinations of vectors Let vi, V2,..., v/; be vectors in a vector space V. If c, C2, ... , C f c are any scalars, the vector civi + c2v2 H V cfcvfc is called a linear combination of vi, v2 ,..., v^. For example, consider two vectors in R2 The most general linear combination of X and X2 is
• 116. 100 Chapter Four: Introduction to Vector Spaces

In general let X be any non-empty subset of a vector space V and denote by <X> the set of all linear combinations of vectors in X. Thus a typical element of <X> is a vector of the form

    c_1 x_1 + c_2 x_2 + ... + c_k x_k

where x_1, x_2, ..., x_k are vectors belonging to X and c_1, c_2, ..., c_k are scalars. From this formula it is clear that the sum of any two elements of <X> is still in <X> and that a scalar multiple of an element of <X> is in <X>. Thus we have the following important result.

Theorem 4.2.2 If X is a non-empty subset of a vector space V, then <X>, the set of all linear combinations of elements of X, is a subspace of V.

We refer to <X> as the subspace of V generated (or spanned) by X. A good way to think of <X> is as the smallest subspace of V that contains X. For any subspace of V that contains X will necessarily contain all linear combinations of vectors in X and so must contain <X> as a subset. In particular, a subset X is a subspace if and only if X = <X>. In the case of a finite set X = {x_1, x_2, ..., x_k}, we shall write <x_1, x_2, ..., x_k> for <X>.

Example 4.2.4
For the three vectors of R^3 given below, determine whether C belongs to the subspace generated by A and B:

    A = | 1 |     B = | -1 |     C = | -1 |
        | 1 |         |  2 |         |  5 |
        | 4 |         |  1 |         |  6 |
  • 117. 4.2: Vector Spaces and Subspaces 101 We have to decide if there are real numbers c and d such that cA + dB — C. To see what this entails, equate cor- responding vector entries on both sides of the equation to obtain c - d = - 1 c + 2d = 5 4c + d = 6 Thus C belongs to < A, B > if and only if this linear system is consistent. It is quickly seen that the linear system has the (unique) solution c = 1, d = 2. Hence C = A + 2B, so that C does belong to the subspace < A, B >. What is the geometrical meaning of this conclusion? Re- call that A, B and C can be represented by line segments in 3-dimensional space with a common initial point I, say IP, IQ and IR. A typical vector in < A, B > can be expressed in the form sA + tB with real numbers s and t . Now sA and tB are representable by line segments parallel to IP and IQ respectively. We obtain a line segment that represents sA+tB by applying the parallelogram law; clearly the resulting line segment will lie in the plane determined by IP and IQ. Con- versely, it is not difficult to see that any line segment lying in this plane represents a vector of the form sA + tB. Therefore the vectors in the subspace < A, B > are those that can be represented by line segments drawn from I lying in the plane determined by IP and IQ. What we have shown is that IR lies in this plane. Finitely generated vector spaces A vector space V is said to be finitely generated if there is a finite subset {vi, V2,..., v&} of V such that V =< vi,v2 ,...,vf c >, that is to say, every vector in V is a linear combination of the vectors vi, V2,. • •, v^, and so has the form CiVi + C2V2 - h CkVk
  • 118. 102 Chapter Four: Introduction to Vector Spaces for some scalars c$. If, on the other hand, no finite subset generates V, then V is said to be infinitely generated. Example 4.2.5 Show that the Euclidean space Rn is finitely generated. Let X, X2, • •., Xn be the columns of the identity matrix In- If fa± A= ^ anJ is any vector in Rn , then A = aX + 0,2X2 + • • • + anXn; therefore X±,X2,.. • ,Xn generate Rn and consequently this vector space is finitely generated. On the other hand, one does not have to look far to find infinitely generated vector spaces. Example 4.2.6 Show that the vector space P(R) of all real polynomials in x is infinitely generated. To prove this we adopt the method of proof by contradic- tion. Assume that P(R) is finitely generated, say by polyno- mials Pi,P2, • • • iPki a n d look for a contradiction. Clearly we may assume that all of these polynomials are non-zero; let m be the largest of their degrees. Then the degree of any linear combination of Pi,p2 • • • ,Pk certainly cannot exceed m. But this means that xm+1 , for example, is not such a linear com- bination. Consequently Pi,P2:---iPk do not generate P(R), and we have reached a contradiction. This establishes the truth of the claim.
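Looking back at Example 4.2.4, the span-membership test used there amounts to checking that a linear system is consistent, which is easy to do by machine. The sketch below is not part of the original text and assumes Python with numpy.

```python
import numpy as np

# The vectors of Example 4.2.4.
A = np.array([ 1.0, 1.0, 4.0])
B = np.array([-1.0, 2.0, 1.0])
C = np.array([-1.0, 5.0, 6.0])

M = np.column_stack([A, B])                      # coefficient matrix with columns A and B
coeffs, residual, rank, _ = np.linalg.lstsq(M, C, rcond=None)

print(coeffs)      # [1. 2.]  ->  C = A + 2B
print(residual)    # essentially zero: the system cA + dB = C is consistent,
                   # so C does lie in the subspace <A, B>
```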
  • 119. 4.2: Vector Spaces and Subspaces 103 Exercises 4.2 1. Which of the following are vector spaces? The operations of addition and scalar multiplication are the natural ones: (a) the set of all 2 x 2 real matrices with determinant equal to zero; (b) the set of all solutions X of a linear system AX = B where B ^ 0; (c) the set of all functions y = y(x) that are solutions of the homogeneous linear differential equation an(x)y{n) + an^{x)y^-l) + • • • + ai{x)y' + a0(x)y = 0. 2. In the following examples say whether S is a subspace of the vector space V : (a) V = R2 and S is the subset of all matrices of the form 1 where a is an arbitrary real number; (b) V = C[0,1] and S is the set of all infinitely differentiable functions in V. (c) V = -P(R) and S is the set of all polynomials p such that p(l) = 0. 3. Does the polynomial 1 — 2x + x2 belong to the subspace of P3(R) generated by the polynomials 1 + X » X X and 3 — 2a:? 4. Determine if the matrix I 1 is in the subspace of M2(R) generated by the following matrices: 3 4 / 0 2 / 0 2 1 2J' 1-1/3 4J' I 6 1 5. Prove that the vector spaces Mm)Tl(F) and Pn(F) are finitely generated where F is an arbitrary field.
  • 120. 104 Chapter Four: Introduction to Vector Spaces 6. Prove that the vector spaces C[0, 1] and P(F) are infinitely generated, where F is any field. 7. Let A and B be vectors in R2 . Show that A and B generate R2 if and only if neither is a scalar multiple of the other. Interpret this result geometrically. 4.3 Linear Independence in Vector Spaces We begin with the crucial definition. Let V be a vector space and let X be a non-empty subset of V. Then X is said to be linearly dependent if there are distinct vectors Vi, v2 ,..., v^ in X, and scalars c±, c2 ,..., Ck, not all of them zero, such that civi + c2v2 H h c/jVfc = 0. This amounts to saying that at least one of the vectors v$ can be expressed as a linear combination of the others. Indeed, if say Ci / 0, then we can solve the equation for v$, obtaining n For example, a one-element set {v} is linearly dependent if and only if v = 0. A set with two elements is linearly dependent if and only if one of the elements is a scalar multiple of the other. A subset which is not linearly dependent is said to be linearly independent. Thus a set of distinct vectors {vi, ... , Vfc} is linearly independent if and only if an equation of the form civi -I hCfcVfc = 0 always implies that c = c2 = • • • = ck = 0 . We shall often say that vectors v i , . . . , v& are linearly de- pendent or independent, meaning that the subset {vi,..., v/J has this property.
  • 121. 4.3: Linear Independence in Vector Spaces 105 Linear dependence in R3 Consider three vectors A, B,C in Euclidean space R3 , and represent them by line segments in 3-dimensional space with a common initial point. If these vectors form a linearly dependent set, then one of them, say A, can be expressed as a linear combination of the other two, A = uB + vC; this equation says that the line segment representing A lies in the same plane as the line segments that represent B and C. Thus, if the three vectors form a linearly dependent set, their line segments must be coplanar. Conversely, assume that A,B,C are vectors in R3 which are represented by line segments drawn from the origin, all of which lie in a plane. We claim that the vectors will then be linearly dependent. To see this, let the equation of the plane be ux + vy + wz = 0; keep in mind that the plane passes through the origin. Let the entries of A be written ai, 02, 03, with a similar notation for B and C. Then the respective end points of the line segments have coordinates (01,02,03), (61,62,63), (ci, 02,03). Since these points lie on the plane, we have the equations ua + vci2 + waz = 0 ub + v62 + 1063 = 0 UO + VC2 + WC3 = 0 This homogeneous linear system has a non-trivial solution for u, v, w, so the determinant of its coefficient matrix is zero by 3.3.2. Now the coefficient matrix of the linear system ' ua + vb + wci = 0 < ua2 + vb2 + WC2 = 0 k ^03 + 1*63 + wc3 = 0 is the transpose of the previous one, so by 3.2.1 it has the same determinant. It follows that the second linear system also has
• 122. 106 Chapter Four: Introduction to Vector Spaces

a non-trivial solution u, v, w. But then uA + vB + wC = 0, which shows that the vectors A, B, C are linearly dependent.

Thus there is a natural geometrical interpretation of linear dependence in the Euclidean space R^3: three vectors are linearly dependent if and only if they are represented by line segments lying in the same plane. There is a corresponding interpretation of linear dependence in R^2 (see Exercise 4.3.11).

Example 4.3.1
Are the polynomials x + 1, x + 2, x^2 - 1 linearly dependent in the vector space P_3(R)?

To answer this, suppose that c_1, c_2, c_3 are scalars satisfying

    c_1(x + 1) + c_2(x + 2) + c_3(x^2 - 1) = 0.

Equating to zero the coefficients of 1, x, x^2, we obtain the homogeneous linear system

    c_1 + 2c_2 - c_3 = 0
    c_1 +  c_2       = 0
                 c_3 = 0

This has only the trivial solution c_1 = c_2 = c_3 = 0; hence the polynomials are linearly independent.

Example 4.3.2
Show that the vectors

    | -1 |     | 1 |     |  2 |
    |  2 |     | 2 |     | -4 |

are linearly dependent in R^2.

Proceeding as in the last example, we let c_1, c_2, c_3 be scalars such that

    c_1 | -1 | + c_2 | 1 | + c_3 |  2 | = | 0 |
        |  2 |       | 2 |       | -4 |   | 0 |
• 123. 4.3: Linear Independence in Vector Spaces 107

This is equivalent to the homogeneous linear system

    -c_1 +  c_2 + 2c_3 = 0
    2c_1 + 2c_2 - 4c_3 = 0

Since the number of unknowns is greater than the number of equations, this system has a non-trivial solution by 2.1.4. Hence the vectors are linearly dependent.

These examples suggest that the question of deciding whether a set of vectors is linearly dependent is equivalent to asking if a certain homogeneous linear system has non-trivial solutions. Further evidence for this is provided by the proof of the next result.

Theorem 4.3.1 Let A_1, A_2, ..., A_m be vectors in the vector space F^n where F is some field. Put A = [A_1 A_2 ... A_m], an n x m matrix. Then A_1, A_2, ..., A_m are linearly dependent if and only if the number of pivots of A in row echelon form is less than m.

Proof
Consider the equation c_1 A_1 + c_2 A_2 + ... + c_m A_m = 0 where c_1, c_2, ..., c_m are scalars. Equating entries of the vector on the left side of the equation to zero, we find that this equation is equivalent to the homogeneous linear system

    A | c_1 |
      | ... |  =  0.
      | c_m |

By 2.1.3 the condition for this linear system to have a non-trivial solution c_1, c_2, ..., c_m is that the number of pivots be less than m. Hence this is the condition for the set of column vectors to be linearly dependent.
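The pivot criterion of 4.3.1 can be applied mechanically. The sketch below is not part of the original text and assumes Python with sympy; it tests the three vectors of Example 4.3.2.

```python
import sympy as sp

# Columns are the three vectors of Example 4.3.2.
A = sp.Matrix([[-1, 1,  2],
               [ 2, 2, -4]])

_, pivots = A.rref()
print(len(pivots))               # 2 pivots for 3 columns
print(len(pivots) < A.cols)      # True: by 4.3.1 the vectors are linearly dependent
```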
  • 124. 108 Chapter Four: Introduction to Vector Spaces In 5.1 we shall learn how to tell if a set of vectors in an arbitrary finitely generated vector space is linearly dependent. An application to differential equations In the theory of linear differential equations it is an im- portant problem to decide if a given set of functions in the vector space C[a, b] is linearly dependent. These functions will normally be solutions of a homogeneous linear differential equation. There is a useful way to test such a set of func- tions for linear independence using a determinant called the Wronskian. Suppose that / i , /2, • • •, /«, are functions whose first n— derivatives exist at all points of the interval [a, b. In particular this means that the functions will be continuous throughout the interval, so they belong to C[a, b]. Assume that ci, C2,..., cn are real numbers such that Ci/i +C2/2 + • • • + cnfn = 0, the zero function on [a ,b]. Now differentiate this equation n — 1 times, keeping in mind that the Cj are constants. This results in a set of n equations for c, c^,..., cn { c i / i + C2/2 + • • • cif{ + c2ti +• • • ci/1 ( "-1) +c2 /2 ( n -1 } +• • • This linear system can be written in matrix form: / h h • • • fn l / l J 2 ' ' ' in y An-l) An-1) _ _ _ f^n_1) ) By 3.3.2, if the determinant of the coefficient matrix of the linear system is not identically equal to zero in [a, b], the + cnfn = 0 + cnfn = 0 + cnfn n -1) = 0 / c i C2 Vcn/ 0.
• 125. 4.3: Linear Independence in Vector Spaces 109

linear system has only the trivial solution and the functions f_1, f_2, ..., f_n will be linearly independent. Define

    W(f_1, f_2, ..., f_n) = | f_1        f_2        ...  f_n        |
                            | f_1'       f_2'       ...  f_n'       |
                            | ...                         ...       |
                            | f_1^(n-1)  f_2^(n-1)  ...  f_n^(n-1)  |

This determinant is called the Wronskian of the functions f_1, f_2, ..., f_n. Then our discussion shows that the following is true.

Theorem 4.3.2 Suppose that f_1, f_2, ..., f_n are functions whose first n - 1 derivatives exist in the interval [a, b]. If W(f_1, f_2, ..., f_n) is not identically equal to zero in this interval, then f_1, f_2, ..., f_n are linearly independent in [a, b].

The converse of 4.3.2 is false. In general one cannot conclude that if f_1, f_2, ..., f_n are linearly independent, then W(f_1, f_2, ..., f_n) is not the zero function. However, it turns out that if the functions f_1, f_2, ..., f_n are solutions of a homogeneous linear differential equation of order n, then the Wronskian either is the zero function or never vanishes. Hence a necessary and sufficient condition for a set of solutions of a homogeneous linear differential equation to be linearly independent is that their Wronskian should not be the zero function. For a detailed account of this topic the reader should consult a book on differential equations such as [16].

Example 4.3.3
Show that the functions x, e^x, e^(-2x) are linearly independent in the vector space C[0, 1].

The Wronskian is

    W(x, e^x, e^(-2x)) = | x   e^x    e^(-2x)   |
                         | 1   e^x   -2e^(-2x)  |  =  3(2x - 1) e^(-x),
                         | 0   e^x    4e^(-2x)  |
  • 126. 110 Chapter Four: Introduction to Vector Spaces which is not identically equal to zero in [0, 1]. Exercises 4.3 1. In each of the following cases determine if the subset S of the vector space V is linearly dependent or linearly indepen- dent: (a) V = C and S consists of the column vectors U)'KH'U4r); (b) V = P(R) and S = {x - 1, x2 + 1, x3 - x2 - x + 3}; (c) V = M(2, R) and S consists of the matrices (2 - 3 / 3 1 [12 -7 6 4J> 1,-1/2 - 3 / ' Vl7 6J' 2. A subset of a vector space that contains the zero vector is linearly dependent: true or false? 3. If X is a linearly independent subset of a vector space, every non-empty subset of X is also linearly independent: true or false? 4. If X is a linearly dependent subset of a vector space, every non-empty subset of X is also linearly dependent: true or false? 5. Prove that any three vectors in R2 are linearly dependent. Generalize this result to Rn . 6. Find a set of n linearly independent vectors in Rn . 7. Find a set of ran linearly independent vectors in the vector space Mm>n(R). 8. Show that the functions x, ex sin x, ex cos x form a lin- early independent subset of the vector space C[0, n].
  • 127. 4.3: Linear Independence in Vector Spaces 111 9. The union of two linearly independent subsets of a vector space is linearly independent: true or false? 10. If {u, v}, {v, w}and{w,u} are linearly independent sub- sets of a vector space, is the subset {u, v, w} necessarily lin- early independent? 11. Show that two non-zero vectors in R2 are linearly de- pendent precisely when they are represented by parallel line segments in the plane.
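Returning to the Wronskian test of 4.3.2 and Example 4.3.3 above, the determinant there can also be computed symbolically. The sketch below is not part of the original text and assumes Python with sympy.

```python
import sympy as sp

x = sp.symbols('x')
f = [x, sp.exp(x), sp.exp(-2 * x)]
n = len(f)

# Row i of the Wronskian matrix holds the i-th derivatives of the functions.
W = sp.Matrix(n, n, lambda i, j: sp.diff(f[j], x, i))
wronskian = sp.simplify(W.det())

print(wronskian)     # 3*(2*x - 1)*exp(-x), up to rearrangement; it is not
                     # identically zero, so the functions are linearly independent
```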
  • 128. Chapter Five BASIS AND DIMENSION We now specialize our study of vector spaces to finitely generated vector spaces, that is, to those that can be generated by finite subsets. The essential fact to be established is that in any non-zero vector space there is a basis, that is to say, a set of vectors in terms of which every vector of the space can be written in a unique manner. This allows the representation of vectors in abstract vector spaces by column vectors. 5.1 The Existence of a Basis The following theorem on linear dependence is fundamen- tal for everything in this chapter. Theorem 5.1.1 Let Vi, V2,..., vm be vectors in a vector space V and let S = < v i> v 2, • • •,v m >> the subspace generated by these vectors. Then any subset of S containing m + 1 or more elements is linearly dependent. Proof To prove the theorem it suffices to show that if ui, u2 ,..., u m + i are any m+1 vectors of the subspace S, then these vec- tors are linearly dependent. This amounts to finding scalars ci, c2 ,..., cm, not all of them zero, such that ciui + c2u2 H h cm + ium + 1 = 0. Now, because u; belongs to S, there is an expression Uj = dijVi + d2 ;v2 H h dmivm 112
  • 129. 5.1: Existence of a Basis 113 where the dji are certain scalars. On substituting for the u^, we obtain m+l m ciui + c2u2 H h cm + ium + i =Y^Ci (^2 djiVj) i=l j=l m m + l j=l i=l Here we have interchanged the summations over i and j . This is permissible since it corresponds to adding up the vectors CidjiVj in a different order, which is possible in a vector space because of the commutative law for addition. We deduce from the last equation that the vector ciUi + C2U2 + • • • + cm + ium + i will equal 0 provided that all the ex- pressions ^djiCi equal zero, that is to say, ci,C2,... ,Cm+i form a non-trivial solution of the homogeneous linear system DC — 0 where D is the m x (m + 1) matrix whose (j, i) en- try is dji and C is the column consisting of ci, C2, • • . , Cm+i- But this linear system has m + l unknowns and m equations; therefore, by 2.1.4, there is a non-trivial solution C. In conse- quence there are indeed scalars c, C2,..., Cm+i, not all zero, which make the vector ciUi + C2U2 + • • • + cm_|_ium+i zero. Corollary 5.1.2 If V is a vector space which can be generated by m elements, then every subset of V with m + l or more vectors is linearly dependent. Thus the number of elements in a linearly independent subset of a finitely generated vector space cannot exceed the number of generators. On the other hand, if a subset is to generate a vector space, it surely cannot be too small. We unite these two contrasting requirements in the definition of a basis.
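Before turning to the definition of a basis, here is a brief numerical illustration of 5.1.1 and 5.1.2; it is not part of the original text and assumes Python with numpy. Four vectors that all lie in the span of three vectors of R^5 must be linearly dependent, and this shows up as a matrix rank smaller than four.

```python
import numpy as np

rng = np.random.default_rng(0)

V = rng.standard_normal((5, 3))       # columns are three generating vectors v_1, v_2, v_3
U = V @ rng.standard_normal((3, 4))   # four vectors u_1, ..., u_4, each a combination of the v_j

print(np.linalg.matrix_rank(U))       # at most 3, so the four columns are linearly dependent
```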
  • 130. 114 Chapter Five: Basis and Dimension Bases Let X be a non-empty subset of a vector space V. Then X is called a basis of V if both of the following are true: (i) X is linearly independent; (ii) X generates V. Example 5.1.1 As a first example of a basis, consider the columns of the identity n x n matrix In: Ex = O w , E2 ( W ... , En = 0 W From the equation ciEi + c2E2 H h cnEn = f c x cn/ it follows that E, E2,.. • , En generate Rn . But these vectors are also linearly independent; for the equation also shows that cE + c2E2 + • • • + cnEn cannot equal zero unless all the Cj are zero. Therefore the vectors Ei, E2,..., En form a basis of the Euclidean space Rn . This is called the standard basis of Rn . An important property of bases is uniqueness of express- ibility of vectors.
  • 131. 5.1: Existence of a Basis 115 Theorem 5.1.3 If{ v l j v 2 > • • • j v n } is a basis of a vector space V, then each vector v in V has a unique expression of the form v = civi + c2v2 H h cnvn /or certain scalars Ci. Proof If there are two such expressions for v, say civi + • • • + cn vn and diVi + • • • + dnvn, then, by equating these, we arrive at the equation (ci - di)vi H h (cn - dn)vn — 0. By linear independence of the Vi this can only mean that c^ = di for all i, so the expression is unique as claimed. Naturally the question arises: does every vector space have a basis? The answer is negative in general. Since a zero space has 0 as its only vector, it has no linearly independent subsets at all; thus a zero space cannot have a basis. However, apart from this uninteresting case, every finitely generated vector space has a basis, a fundamental result that will now be proved. Notice that such a basis must be finite by 5.1.2. Theorem 5.1.4 Let V be a finitely generated vector space and suppose that XQ is a linearly independent subset of V. Then XQ is contained in some basis XofV. Proof Suppose that V is generated by m elements. Then by 5.1.2 no linearly independent subset of V can contain more than m elements. From this it follows that there exists a subset X of V containing X0 which is as large as possible subject to being linearly independent. For if this were false, it would be
  • 132. 116 Chapter Five: Basis and Dimension possible to find arbitrarily large linearly independent subsets of V. We will prove the theorem by showing that the subset X is a basis of V. Write X = {vi, V2, • .. , vn }. Suppose that u is a vector in V which does not belong to X. Then the subset {vi, V2,..., vn , u} must be linearly dependent since it properly contains X. Hence there is a linear relation of the form C1V1 + c2v2 H h cnvn + du = 0 where not all of the scalars c±, C2,..., cn, d are zero. Now if the scalar d were zero, it would follow that cV + C2V2 + • • • + cnvn = 0, which, in view of the linear independence of vi, V2,..., vn , could only mean that c = c2 = • • • = cn = 0. But now all the scalars are zero, which is not true. Therefore d 7^ 0. Consequently we can solve the above equation for u to obtain u = (-o?_1 c1)vi + (-d~1 c2)r 2 H 1 - (-rf_1 cn)vn. Hence u belongs to < v i , . . . , vn > . Prom this it follows that the vectors v i , . . . , vn generate V; since these are also linearly independent, they form a basis of V. Corollary 5.1.5 Every non-zero finitely generated vector space V has a basis. Indeed by hypothesis V contains a non-zero vector, say v. Then {v} is linearly independent and by 5.1.4 it is contained in a basis of V. Usually a vector space will have many bases. For exam- ple, the vector space R2 has the basis
  • 133. 5.1: Existence of a Basis 117 as well as the standard basis ©• (!) • And one can easily think of other examples. It is therefore a very significant fact that all bases of a finitely generated vector space have the same number of elements. Theorem 5.1.6 Let V be a non-zero finitely generated vector space. Then any two bases of V have equal numbers of elements. Proof Let {ui,U2,... ,um } and {vi,V2,..., vn } be two bases of V. Then V = < u 1 , u 2 , . , u m > and it follows from 5.1.2 that no linearly independent subset of V can have more than m elements; hence n < m . In the same fashion we argue that m < n. Therefore m = n. Dimension Let V be a finitely generated vector space. If V is non- zero, define the dimension of V to be the number of elements in a basis of V; this definition makes sense because 5.1.6 guar- antees that all bases of V have the same number of elements. Of course, a zero space does not have a basis; however it is convenient to define the dimension of a zero space to be 0, so that every finitely generated vector space has a dimension. The dimension of a finitely generated vector space V is de- noted by dim(V). In fact infinitely generated vector spaces also have bases, and it is even possible to assign a dimension to such a space,
  • 134. 118 Chapter Five: Basis and Dimension namely a cardinal number, which is a sort of infinite analog of a positive integer. However this goes well beyond our brief, so we shall say no more about it. Example 5.1.2 The dimension of Rn is n; indeed it has already been shown in Example 5.1.1 that the columns of the identity matrix In form a basis of Rn . Example 5.1.3 The dimension of Pn (R) is n. In this case the polynomials l,x,x2 ,... ,xn ~1 form a basis (called the standard basis) of Pn(R). Example 5.1.4 Find a basis for the null space of the matrix A = Recall that the null space of A is the subspace of R4 consisting of all solutions X of the linear system AX = 0. To solve this system, put A in reduced row echelon form using row operations: 1 0 4/3 4/3' 0 1 1/3 - 2 / 3 0 0 0 0 From this we read off the general solution in the usual way: ( -Ac/3 - 4d/3 • X =
  • 135. 5.1: Existence of a Basis 119 Now X can be written in the form X = c / - 4 / 3 - 1 / 3 J/ + d /-4/3N 2/3 0 V i/ where c and d are arbitrary scalars. Hence the null space of A is generated by the vectors X, = Xo ( -4/3 2/3 0 1 / Notice that these vectors are obtained from the general solu- tion X by putting c = 1, d = 0, and then c = 0, d = 1. Now Xi and Xi are linearly independent. Indeed, if we assume that some linear combination of them is zero, then, because of the configuration of 0's and l's, the scalars are forced to be be zero. It follows that X and X2 form a basis of the null space of A, which therefore has dimension equal to 2. It should be clear to the reader that this example de- scribes a general method for finding a basis, and hence the di- mension, of the null space of an arbitrary mxn matrix A. The procedure goes as follows. Using elementary row operations, put A in reduced row echelon form, with say r pivots. Then the general solution of the linear system AX = 0 will con- tain n — r arbitrary scalars, say ci, C2,..., cn _r . The method of solving linear systems by elementary row operations shows that the general solution can be written in the form X = cXx + C2X2 + • • • + cn-rXn—r where Xi,..., Xn-r are particular solutions. In fact the solu- tion Xi arises from X when we put c; = 1 and all other Cj's
• 136. 120 Chapter Five: Basis and Dimension

equal to 0. The vectors X_1, X_2, ..., X_{n-r} are linearly independent, just as in the example, because of the arrangement of 0's and 1's among their entries. It follows that a basis of the null space of A is {X_1, X_2, ..., X_{n-r}}. We can therefore state:

Theorem 5.1.7 Let A be a matrix with n columns and suppose that the number of pivots in the reduced row echelon form of A is r. Then the null space of A has dimension n - r.

Coordinate column vectors

Let V be a vector space with an ordered basis {v_1, ..., v_n}; this means that the basis vectors are to be written in the prescribed order. We have seen in 5.1.3 that each vector v of V has a unique expression in terms of the basis, say

    v = c_1 v_1 + ... + c_n v_n.

Thus v is completely determined by the scalars c_1, ..., c_n. We call the column

    | c_1 |
    | ... |
    | c_n |

the coordinate vector of v with respect to the ordered basis {v_1, ..., v_n}. Thus each vector in the abstract vector space V is represented by an n-column vector. This provides us with a concrete way of representing abstract vectors.

Example 5.1.5
Find the coordinate vector of

    | 2 |
    | 3 |

with respect to the ordered basis of R^2 consisting of the vectors

    | 1 |     | 3 |
    | 1 | ,   | 4 | .
• 137. 5.1: Existence of a Basis 121

First notice that these two vectors are linearly independent and generate R^2, so that they form a basis. We need to find scalars c and d such that

    c | 1 | + d | 3 | = | 2 |
      | 1 |     | 4 |   | 3 |

This amounts to solving the linear system

    c + 3d = 2
    c + 4d = 3

The unique solution is c = -1, d = 1, and hence the coordinate vector is

    | -1 |
    |  1 | .

Coordinate vectors provide us with a method of testing a subset of an arbitrary finitely generated vector space for linear dependence.

Theorem 5.1.8 Let {v_1, ..., v_n} be an ordered basis of a vector space V. Let u_1, ..., u_m be a set of vectors in V whose coordinate vectors with respect to the given ordered basis are X_1, ..., X_m respectively. Then {u_1, ..., u_m} is linearly dependent if and only if the number of pivots of the matrix A = [X_1 | X_2 | ... | X_m] is less than m.

Proof
Write u_i = Σ_{j=1}^{n} a_ji v_j; then the entries of X_i are a_1i, ..., a_ni, so the (j, i) entry of A is a_ji. If c_1, ..., c_m are any scalars, then

    c_1 u_1 + ... + c_m u_m = Σ_{i=1}^{m} c_i ( Σ_{j=1}^{n} a_ji v_j ) = Σ_{j=1}^{n} ( Σ_{i=1}^{m} a_ji c_i ) v_j.
  • 138. 122 Chapter Five: Basis and Dimension Since v i , . . . , vn are linearly independent, the only way that C1U1 + • • • + cmurn can be zero is if the sums Y^lLi a jic i vanish for j — 1,... ,n. This amounts to requiring that AC = 0 where C is the column consisting of ci,...,cm . We know from 2.1.3 that there is such a C different from 0 precisely when the number of pivots of A is less than m. So this is the condition for u i , . . . , um to be linearly dependent. Example 5.1.6 Are the polynomials l — x + 2x2 — x3 , x + xs , 2 + x + 4x2 +x3 linearly independent in Pt(R)? Use the standard ordered basis {1 > X • OC < X 3 } of P4(R). Then the coordinate columns of the given polynomials are the columns of the matrix / 1 0 2 - 1 1 1 2 0 4 V-i i i Using row operations, we see that the number of pivots of the matrix is 2, which is less than the number of vectors. Therefore the given polynomials are linearly dependent. The next theorem lessens the work needed to show that a particular set is a basis. Theorem 5.1.9 Let V be a finitely generated vector space with positive dimen- sion n. Then (i) any set of n linearly independent vectors of V is a basis; (ii) any set of n vectors that generates V is a basis. Proof Assume first that the vectors vi, v2 ) ..., vn are linearly inde- pendent. Then by 5.1.4 the set {vi, v2 ,..., vn } is contained
  • 139. 5.1: Existence of a Basis 123 in a basis of V. But the latter must have n elements by 5.1.6, and so it coincides with the set of Vj's. Now assume that the vectors vi, v2 ,..., vn generate V. If these vectors are linearly dependent, then one of them, say Vj, can be expressed as a linear combination of the others. But this means that we can dispense with Vj completely and generate V using only the v^'s for j ^ i, of which there are n—1. Therefore dim(V) < n—1 by 5.1.2. By this contradiction vi, V2,..., vn are linearly independent, so they form a basis of V. Example 5.1.7 The vectors ( - : ) •( : ) •( ! ) are linearly independent since the matrix which they form has three pivots; therefore these vectors constitute a basis of R3 . We conclude with an application of the ideas of this sec- tion to accounting systems. Example 5.1.8 (Transactions on an accounting system) Consider an accounting system with n accounts, say cti, CK2,..., oin- At any instant each account has a balance which can be a credit (positive), a debit (negative), or zero. Since the accounting system must at all times be in balance, the sum of the balances of all the accounts will always be zero. Now suppose that a transaction is applied to the system. By this we mean that there is a flow of funds between accounts of the system. If as a result of the transaction the balance of ac- count Q.i changes by an amount £$, then the transaction can be represented by an n-column vector with entries t,t2, • • •, tn. Since the accounting system must still be in balance after the transaction has been applied, the sum of the ti will be zero.
  • 140. 124 Chapter Five: Basis and Dimension Hence the transactions correspond to column vectors h such that ti- -tn = 0. Now vectors of this form are easily seen to constitute a subspace T of the vector space Rn ; this is called the transaction space. Evidently T is just the null space of the matrix A = ( 1 1 • • • 1 0 0 0 • • • 0 0 0 0 0, Now A is already in reduced row echelon form, so we can read off at once the general solution of the linear system AX = 0 : / - c 2 - c3 X = c2 C3 Cn J with arbitrary real scalars c2, C3,..., cn. Now we can find a basis of the null space in the usual way. For i = 2,..., n define Ti to be the n-column vector with first entry —1, zth entry 1, and all other entries zero. Then X = c2T2 + C3T3 + • • • + cnTn and {T2, T3,..., Tn} is a basis of the transaction space T. Thus dim(T) = n - 1. Observe that Ti corresponds to a simple transaction, in which there is a flow of funds amounting to
  • 141. 5.1: Existence of a Basis 125 one unit from account a. to account an and which does not affect other accounts. Exercises 5.1 1. Show that the following sets of vectors form bases of R3 , and then express the vectors Ei, E2, E3 of the standard basis in terms of these: 2 3 1 3 1 4 1 2 1 1 - 7 0 (b) Yt = 1 , Y2 = 1 , Y3 = 2. Find a basis for the null space of each of the following matrices: 1 - 5 (a) | - 4 2 - 6 J ; (b) 3 1 3. What is the dimension of the vector space MmjTl(F) where F is an arbitrary field of scalars? 4. Let V be a vector space containing vectors vi, V2,..., vn and suppose that each vector of V has a unique expression as a linear combination of vi, V2,..., vn . Prove that the Vj's form a basis of V. 5. If S is a subspace of a finitely generated vector space V, establish the inequality dim(S') < dim(V).
  • 142. 126 Chapter Five: Basis and Dimension 6. If in the last problem dim(5r ) = dim(V), show that S = V. 7. If V is a vector space of dimension n, show that for each integer i satisfying 0 < i < n there is a subspace of V which has dimension i. ( 6 8. Write the transaction I —4 as a linear combination of v-v simple transactions. 9. Prove that vectors A, B, C generate R3 if and only if none of these vectors belongs to the subspace generated by the other two. Interpret this result geometrically. 10. If V is a vector space with dimension n over the field of two elements, prove that V contains exactly 2n vectors. 5.2 The Row and Column Spaces of a Matrix Let A be an m x n matrix over some field of scalars F. Then the columns of A are m-column vectors, so they belong to the vector space Fm , while the rows of A are n-row vectors and belong to the vector space Fn. Thus there are two natural subspaces associated with A, the row space, which is generated by the rows of A and is a subspace of Fn, and the column space, generated by the columns of A, which is a subspace of Fm . We begin the study of these important subspaces by in- vestigating the effect upon them of applying row and column operations to the matrix. Theorem 5.2.1 Let A be any matrix. (i) The row space is unchanged when an elementary row operation is applied to A. (ii) The column space is unchanged when an elementary column operation is applied to A.
  • 143. 5.2: The Row and Column Spaces of a Matrix 127 Proof Let B arise from A when an elementary row operation is ap- plied. Then by 2.3.1 there is an elementary matrix E such that B = EA. The row-times-column rule of matrix multi- plication shows that each row of B is a linear combination of the rows of A. Hence the row space of B is contained in the row space of A. But A = E~1 B, since elementary matrices are invertible, so the same argument shows that the row space of A is contained in the row space of B. Therefore the row spaces of A and B are identical. Of course, the argument for column spaces is analogous. There are simple procedures available for finding bases for the row and column spaces of a matrix. (I) To find a basis of the row space of a matrix A, use elementary row operations to put A in reduced row echelon form. Discard any zero rows; then the remaining rows will form a basis of the row space of A. (II) To find a basis of the column space of a matrix A, use elementary column operations to put A in reduced column echelon form. Discard any zero columns; then the remaining columns will form a basis of the column space of A. Why do these procedures work? By 5.2.1 the row space of A equals the row space of R, its reduced row echelon form, and this is certainly generated by the non-zero rows of R. Also the non-zero rows of R are linearly independent because of the arrangement of O's and l's in R ; therefore these rows form a basis of the row space of A. Again the argument for columns is similar. This discussion makes the following result obvious. Corollary 5.2.2 For any matrix the dimension of the row space equals the num- ber of pivots in reduced row echelon form, with a like statement for columns.
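These procedures are also easy to carry out by machine. The Python/SymPy sketch below (the library is an illustrative choice, not part of the text) applies procedure (I) and Corollary 5.2.2 to the matrix A of Example 5.2.1, which follows.

    from sympy import Matrix

    # The matrix A of Example 5.2.1 below.
    A = Matrix([[ 2, 1, 1, 3, 2],
                [-1, 2, 1, 1, 3],
                [ 0, 0, 1, 0, 1],
                [ 0, 1, 0, 1, 1]])

    R, pivots = A.rref()         # reduced row echelon form and the pivot columns
    print(R)                     # its three non-zero rows form a basis of the row space
    print(len(pivots))           # 3 = number of pivots = dim of the row space (5.2.2)
    print(len(A.columnspace()))  # 3 = dim of the column space as well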
• 144. 128 Chapter Five: Basis and Dimension

Example 5.2.1
Consider the matrix

    A = [  2  1  1  3  2 ]
        [ -1  2  1  1  3 ]
        [  0  0  1  0  1 ]
        [  0  1  0  1  1 ]

The reduced row echelon form of A is found to be

    [ 1  0  0  1  0 ]
    [ 0  1  0  1  1 ]
    [ 0  0  1  0  1 ]
    [ 0  0  0  0  0 ]

Hence the row vectors [1 0 0 1 0], [0 1 0 1 1], [0 0 1 0 1] form a basis of the row space of A and the dimension of this space is 3.

In general elementary row operations change the column space of a matrix, and column operations change the row space. However it is an important fact that such operations do not change the dimension.

Theorem 5.2.3
For any matrix, elementary row operations do not change the dimension of the column space and elementary column operations do not change the dimension of the row space.

Proof
Take the case of row operations first. Let A be a matrix with n columns and suppose that B = EA where E is an elementary matrix. We have to show that the column spaces of A and B have the same dimension. Denote the columns of A by A1, A2, ..., An. If some of these columns are linearly dependent, then there are integers i_1 < i_2 < ... < i_r and non-zero scalars c_{i_1}, c_{i_2}, ..., c_{i_r} such that

    c_{i_1} A_{i_1} + c_{i_2} A_{i_2} + ... + c_{i_r} A_{i_r} = 0.
• 145. 5.2: The Row and Column Spaces of a Matrix 129

Consequently there is a non-trivial solution C of the linear system AC = 0 such that Cj ≠ 0 for j = i_1, ..., i_r. Using the equation B = EA, we find that BC = EAC = E0 = 0. This means that columns i_1, ..., i_r of B are also linearly dependent. Therefore, if columns j_1, ..., j_s of B are linearly independent, then so are columns j_1, ..., j_s of A. Hence the dimension of the column space of B does not exceed the dimension of the column space of A. Since A = E^(-1) B, this argument can be applied equally well to show that the dimension of the column space of A does not exceed that of B. Therefore these dimensions are equal.

The truth of the corresponding statement for row spaces can be quickly deduced from what has just been proved. Let B = AE where E is an elementary matrix. Then B^T = (AE)^T = E^T A^T. Now E^T is also an elementary matrix, so by the last paragraph the column spaces of A^T and B^T have the same dimension. But obviously the column space of A^T and the row space of A have the same dimension, and there is a similar statement for B: the required result follows at once.

We are now in a position to connect row and column spaces with normal form and at the same time to clarify a point left open in Chapter Two.

Theorem 5.2.4
If A is any matrix, then the following integers are equal:
(i) the dimension of the row space of A;
(ii) the dimension of the column space of A;
(iii) the number of 1's in a normal form of A.

Proof
By applying elementary row and column operations to A, we can reduce it to normal form, say

    N = [ I_r  0 ]
        [ 0    0 ]
  • 146. 130 Chapter Five: Basis and Dimension Now by 5.2.1 and 5.2.3 the row spaces of A and N have the same dimension, with a like statement for column spaces. But it is clear from the form of N that the dimensions of its row and column spaces are both equal to r, so the result follows. It is a consequence of 5.2.4 that every matrix has a unique normal form; for the normal form is completely determined by the number of l's on the diagonal. The rank of a matrix The rank of a matrix is defined to be the dimension of the row or column space. With this definition we can reformulate the condition for a linear system to be consistent. Theorem 5.2.5 A linear system is consistent if and only if the ranks of the coefficient matrix and the augmented matrix are equal. This is an immediate consequence of 2.2.1, and 5.2.2. Finding a basis for a subspace Suppose that X, X2, • • •, -Xfc are vectors in Fn where F is a field. In effect we already know how to find a basis for the subspace generated by these vectors; for this subspace is simply the column space of the matrix [Xi|X2| ... X^]. But what about subspaces of vector spaces other than Fn ? It turns out that use of coordinate vectors allows us to reduce the problem to the case of Fn . Let V be a vector space over F with a given ordered basis vi, v2 ,..., vn , and suppose that S is the subspace of V generated by some given set of vectors w1 ) W2,...,wm . The problem is to find a basis of S. Recall that each vec- tor in V has a unique expression as a linear combination of the basis vectors v i , . . . , vn and hence has a unique coordi- nate column vector, as described in 5.1. Let w, have co- ordinate column vector Xi with respect to the given basis. Then the coordinate column vector of the linear combination
• 147. 5.2: The Row and Column Spaces of a Matrix 131

c1 w1 + c2 w2 + ... + ck wk is surely c1 X1 + c2 X2 + ... + ck Xk. Hence the set of all coordinate column vectors of elements of S equals the subspace T of F^n which is generated by X1, ..., Xk. Moreover w1, w2, ..., wk will be linearly independent if and only if X1, X2, ..., Xk are. In short w1, w2, ..., wk form a basis of S if and only if X1, X2, ..., Xk form a basis of T; thus our problem is solved.

Example 5.2.2
Find a basis for the subspace of P4(R) generated by the polynomials 1 - x - 2x^3, 1 + x^3, 1 + x + 4x^3, x^2.

Of course we will use the standard ordered basis of P4(R) consisting of 1, x, x^2, x^3. The first step is to write down the coordinate vectors of the given polynomials with respect to the standard basis and arrange them as the columns of a matrix A; thus

    A = [  1  1  1  0 ]
        [ -1  0  1  0 ]
        [  0  0  0  1 ]
        [ -2  1  4  0 ]

To find a basis for the column space of A, use column operations to put it in reduced column echelon form:

    [ 1  0  0  0 ]
    [ 0  1  0  0 ]
    [ 0  0  1  0 ]
    [ 1  3  0  0 ]

The first three columns form a basis for the column space of A. Therefore we get a basis for the subspace of P4(R) generated by the given polynomials by simply writing down the polynomials that have these columns as their coordinate column vectors; in this way we arrive at the basis

    1 + x^3,  x + 3x^3,  x^2.
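The same computation can be checked with a computer algebra system. In the Python/SymPy sketch below (an illustrative choice of tool, not part of the text), column operations on A are performed as row operations on its transpose.

    from sympy import Matrix

    # Coordinate columns of 1 - x - 2x^3, 1 + x^3, 1 + x + 4x^3, x^2
    # relative to the ordered basis {1, x, x^2, x^3} of P4(R).
    A = Matrix([[ 1, 1, 1, 0],
                [-1, 0, 1, 0],
                [ 0, 0, 0, 1],
                [-2, 1, 4, 0]])

    # Column operations on A are row operations on A.T:
    R, pivots = A.T.rref()
    for i in range(len(pivots)):
        print(R.row(i))   # [1 0 0 1], [0 1 0 3], [0 0 1 0]: the non-zero columns of the
                          # reduced column echelon form, i.e. 1 + x^3, x + 3x^3, x^2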
  • 148. 132 Chapter Five: Basis and Dimension Hence the subspace generated by the given polynomials has dimension 3. Exercises 5.2 1. Find bases for the row and column spaces of the following matrices: <-»C2 =S i)-o» ("J J| J)- 2. Find bases for the subspaces generated by the given vectors in the vector spaces indicated: (a) l - 2 z - : r 3 , 3x-x2 , l + x + x2 +x3 , 4 + 7x + x2 + 2x3 in P4(R); 3. Let A be a matrix and let N, R and C be the null space, row space and column space of A respectively. Prove that dim(JR) + dim(iV) = dim(C) + dim(iV) = n where n is the number of columns of A. 4. If A is any matrix, show that A and AT have the same rank. 5. Suppose that A is an m x n matrix with rank r. What is the dimension of the null space of AT 7 6. Let A and B be m x n and n x p matrices respectively. Prove that the row space of AB is contained in the row space of B, and the column space of AB is contained in the the column space of A. What can one conclude about the ranks of AB and BA ? 7. The rank of a matrix can be defined as the maximum num- ber of rows in an invertible submatrix: justify this statement.
  • 149. 5.3: Operations with Subspaces 133 5.3 Operations with Subspaces If U and W are subspaces of a vector space V, there are two natural ways of combining U and W to form new subspaces of V. The first of these subspaces is the intersection unw, which is the set of all vectors that belong to both U and V. The second subspace that can be formed from U and W is not, as one might perhaps expect, their union U UW; for this is not in general closed under addition, so it may not be a subspace. The subspace we are looking for is the sum U + W, which is denned to be the set of all vectors of the form u + w where u belongs to U and w to W. The first point to note is that these are indeed subspaces. Theorem 5.3.1 If U and W are subspaces of a vector space V, then U CW and U + W are subspaces of V. Proof Certainly U D W contains the zero vector and it is closed with respect to addition and scalar multiplication since both U and W are; therefore U fl W is a subspace. The same method applies to U + W. Clearly this contains 0 + 0 = 0. Also, if Ui, U2 and wi, w2 are vectors in U and W respectively, and c is a scalar, then (ui + wi) + (u2 + w2) = (ui + u2) + (wi + w2) and c(ui + w i ) = cui + cwi,
  • 150. 134 Chapter Five: Basis and Dimension both of which belong to U + W. Thus U + W is closed with respect to addition and scalar multiplication and so it is a subspace. Example 5.3.1 Consider the subspaces U and W of R4 consisting of all vectors of the forms (a f° b d and c e Vo/ fJ respectively, where a, b, c, d, e are arbitrary scalars. Then U n W consists of all vectors of the form ft) c ' w while U + W equals R4 since every vector in R4 can be ex- pressed as the sum of a vector in U and a vector in W. For subspaces of a finitely generated vector space there is an important formula connecting the dimensions of their sum and intersection. Theorem 5.3.2 Let U and W be subspaces of a finitely generated vector space V. Then dim(U + W)+ dim(U n W) = dim(U) + dim(W). Proof If U = 0, then obviously U + W = W and U n W = 0; in this case the formula is certainly true, as it is when W = 0.
• 151. 5.3: Operations with Subspaces 135

Assume therefore that U ≠ 0 and W ≠ 0, and put m = dim(U) and n = dim(W). Consider first the case where U ∩ W = 0. Let {u1, u2, ..., um} and {w1, w2, ..., wn} be bases of U and W respectively. Then the vectors u1, ..., um and w1, ..., wn surely generate U + W. In fact these vectors are also linearly independent: for if there is a linear relation between them, say

    c1 u1 + ... + cm um + d1 w1 + ... + dn wn = 0,

then

    c1 u1 + ... + cm um = (-d1) w1 + ... + (-dn) wn,

a vector which belongs to both U and W, and so to U ∩ W, which is the zero subspace. Consequently this vector must be the zero vector. Therefore all the ci and dj must be zero since the ui are linearly independent, as are the wj. Consequently the vectors u1, ..., um, w1, ..., wn form a basis of U + W, so that dim(U + W) = m + n = dim(U) + dim(W), the correct formula since U ∩ W = 0 in the case under consideration.

Now we tackle the more difficult case where U ∩ W ≠ 0. First choose a basis for U ∩ W, say {z1, ..., zr}. By 5.1.4 this may be extended to bases of U and of W, say {z1, ..., zr, u_{r+1}, ..., um} and {z1, ..., zr, w_{r+1}, ..., wn} respectively. Now the vectors

    z1, ..., zr, u_{r+1}, ..., um, w_{r+1}, ..., wn

generate U + W: for we can express any vector of U or W in terms of them. What still needs to be proved is that they are
• 152. 136 Chapter Five: Basis and Dimension

linearly independent. Suppose that in fact there is a linear relation

    e1 z1 + ... + er zr + c_{r+1} u_{r+1} + ... + cm um + d_{r+1} w_{r+1} + ... + dn wn = 0

where the ei, cj, dk are scalars. Then

    d_{r+1} w_{r+1} + ... + dn wn = (-e1) z1 + ... + (-er) zr + (-c_{r+1}) u_{r+1} + ... + (-cm) um,

which belongs to both U and W and so to U ∩ W. The vector d_{r+1} w_{r+1} + ... + dn wn is therefore expressible as a linear combination of the zi since these vectors are known to form a basis of the subspace U ∩ W. However z1, ..., zr, w_{r+1}, ..., wn are definitely linearly independent. Therefore all the dk are zero and our linear relation becomes

    e1 z1 + ... + er zr + c_{r+1} u_{r+1} + ... + cm um = 0.

But z1, ..., zr, u_{r+1}, ..., um are linearly independent, so it follows that the cj and the ei are also zero, which establishes linear independence. We conclude that the vectors z1, ..., zr, u_{r+1}, ..., um, w_{r+1}, ..., wn form a basis of U + W. A count of the basis vectors reveals that dim(U + W) equals

    r + (m - r) + (n - r) = m + n - r = dim(U) + dim(W) - dim(U ∩ W).

Example 5.3.2
Suppose that U and W are subspaces of R^10 with dimensions 6 and 8 respectively. Find the smallest possible dimension for U ∩ W.
  • 153. 5.3: Operations with Subspaces 137 Of course dim(R10 ) = 10 and, since U + W is a subspace of R10 , its dimension cannot exceed 10. Therefore by 5.3.2 dim(C/ fl W) = dim(C7) + dim(W) - dim(U + W) > 6 + 8 - 1 0 = 4. So the dimension of the intersection is at least 4. The reader is challenged to think of an example which shows that the intersection really can have dimension 4. Direct sums of subspaces Let U and W be two subspaces of a vector space V. Then V is said to be the direct sum of U and W if V = U + W and UnW = 0. The notation for the direct sum is v = u®w. Notice the consequence of the definition: each vector v of V has a unique expression of the form v = u + w where u belongs to U and w to W. Indeed, if there are two such expressions v = ui + wi = U2 + W2 with Uj in U and Wj in W, then ui — u2 = w2 — wi, which belongs to U D W = 0; hence ui = u2 and wi = W2. Example 5.3.3 Let U denote the subset of R3 consisting of all vectors of the form (i)
  • 154. 138 Chapter Five: Basis and Dimension and let W be the subset of all vectors of the form ( ! ) where a, b, c are arbitrary scalars. Then U and W are sub- spaces of R3 . In addition U + W = R3 and UCW = 0. Hence R3 = u 8 W. Theorem 5.3.3 If V is a finitely generated vector space and U and W are subspaces of V such that V = U ® W, then dim(V) = dim(C7) + dim(W). This follows at once from 5.3.2 since dim(U DW) = 0 . Direct sums of more than two subspaces The concept of a direct sum can be extended to any finite set of subspaces. Let U, U2, • • •, £4 be subspaces of a vector space V. First of all define the sum of these subspaces t/i + • • • + Uk to be the set of all vectors of the form Ui + • • • + u& where Uj belongs to Ui. This is clearly a subspace of V. The vector space V is said to be the direct sum of the subspaces U,... Uk, in symbols v = u1@u2®---®uk, if the following hold: (i)V = U1 + --- + Uk; (ii) for each i = 1, 2,..., k the intersection of Ui with the sum of all the other subspaces Uj, j ^ i, equals zero.
  • 155. 5.3: Operations with Subspaces 139 In fact these are equivalent to requiring that every ele- ment of V be expressible in a unique fashion as a sum of the form ui + • • • + Ufc where u^ belongs to Ui. The concept of a direct sum is a useful one since it often allows us to express a vector space as a direct sum of subspaces that are in some sense simpler. Example 5.3.4 Let Ui,U2, U3 be the subspaces of R5 which consist of all vectors of the forms 0 o 0 > b 0 c > / d 0 0 0 respectively, where a, b, c, d, e are arbitrary scalars. R5 = Ui@U2®U3. Then Bases for the sum and intersection of subspaces Suppose that V is a vector space over a field F with posi- tive dimension n and let there be given a specific ordered basis. Assume that we have vectors u i , . . . , ur and w i , . . . , ws, gen- erating subspaces U and W respectively. How can we find bases for the subspaces U + W and UTiW and hence compute their dimensions? The first step in the solution is to translate the problem to the vector space Fn . Associate with each Ui and Wj its coordinate column vector Xi and Yj with respect to the given ordered basis of V. Then X,..., Xr and Y,..., Ys generate respective subspaces U* and W* of Fn . It is sufficient if we can find bases for U* + W* and U* D W* since from these bases for U + W and U CW can be read off. So assume from now on that V equals Fn .
  • 156. 140 Chapter Five: Basis and Dimension Take the case of U + W first - it is the easier one. Let A be the matrix whose columns are u i , . . . , u r : remember that these are now n-column vectors. Also let B be the matrix whose columns are w i , . . . , ws. Then U+W is just the column space of the matrix M = [A B]. A basis for U + W can therefore be found by putting M in reduced column echelon form and deleting the zero columns. Turning now to UDW, we look for scalars Ci and dj such that C1U1 + h crur — diwi H 1 - dswa : for every element of U D W is of this form. Equivalently C1U1 H h crur + (-di)wi H h (-d8)w8 = 0. Now this equation asserts that the vector / C l -di -daJ belongs to the null space of [A | B]. A method for finding a basis for the null space of a matrix was described in 5.1. To complete the process, read off the the first r entries of each vector in the basis of the null space of [A B], and take these entries to be c1 ; ..., cr. The resulting vectors form a basis of unw. Example 5.3.5 Let M = 2 2 1 0 1 -2 1 2 5 -1 5 1 2 - 1 3 /
  • 157. 5.3: Operations with Subspaces 141 and denote by U and W the subspaces of R4 generated by columns 1 and 2, and by columns 3 and 4 of M respectively. Find a basis for U + W. Apply the procedure for finding a basis of the column space of M. Putting M in reduced column echelon form, we obtain / l 0 0 0 0 1 0 0 0 0 1 0 ' 3 - 1 / 3 - 2 / 3 0 / The first three columns of this matrix form a basis of U + W; hence dim(C7 + W)=3. Example 5.3.6 Find a basis of U fl W where U and W are the subspaces of Example 5.3.5. Following the procedure indicated above, we put the ma- trix M in reduced row echelon form: / l 0 0 - 1 0 1 0 - 1 j 0 0 1 1 I ' 0 0 0 0 / From this a basis for the null space of M can be read off, as described in the paragraph preceding 5.1.7; in this case the basis has the single element 1 - 1 " V i/ Therefore a basis for U D W is obtained by taking the linear combination of the generating vectors of U corresponding to
  • 158. 142 Chapter Five: Basis and Dimension the scalars in the first two rows of this vector, that is to say + 1- 3 0 w 1. Thus dim(C7 n W) Example 5.3.7 Find bases for the sum and intersection of the subspaces U and W of P4CR) generated by the respective sets of polynomials {l + 2x + x3 , 1 x x2 } and {x + x2 - 3x3 , 2 + 2x - 2x3 }. The first step is to translate the problem to R4 by writing down the coordinate columns of the given polynomials with respect to the standard ordered basis 1, 3 of P4(R). Arranged as the columns of a matrix, these are A = Let U* and W* be the subspaces of R4 generated by the coordinate columns of the polynomials that generate U and W, that is, by columns 1 and 2, and by columns 3 and 4 of A respectively. Now find bases for U* + W* and U* D W*, just as in Examples 5.3.5 and 5.3.6. It emerges that U* + W*, which is just the column space of A, has a basis / ! 2 0 1 1 - 1 - 1 0 0 1 1 - 3 2 2 0 - 2
  • 159. 5.3: Operations with Subspaces 143 On writing down the polynomials with these coordinate vec- tors, we obtain the basis l-3x3 , x + 2x3 , x2 -5x3 for U* + W*. In the case of U D W the procedure is to find a basis for U* Pi W*. This turns out to consist of the single vector ( ; ) Finally, read off that the polynomial 1 • (1 + 2x + x3 ) + 1 • (1 - x - x2 ) = 2 + x - x2 + x3 forms a basis of U l~l W. Quotient Spaces We conclude the section by describing another subspace operation, the formation of the quotient space of a vector space with respect to a subspace. This new vector space is formed by identifying the vectors in certain subsets of the given vector space, which is a construction found throughout algebra. Proceeding now to the details, let us consider a vector space V with a fixed subspace U. The first step is to define certain subsets called cosets: the coset of U containing a given vector v is the subset of V v + [/ = {v + u|ue[/}. Notice that the coset v + U really does contain the vector v since v = v + OEv + U. Observe also that the coset v + U can be represented by any one of its elements in the sense that (v + u) + U = v + U for all u G U. An important feature of the cosets of a given subspace is that they are disjoint, i.e., they do not overlap.
  • 160. 144 Chapter Five: Basis and Dimension Lemma 5.3.4 If U is a subspace of a vector space V, then distinct cosets of U are disjoint. Thus V is the disjoint union of all the distinct cosets of U. Proof Suppose that cosets v + U and w + U both contain a vector x: we will show that these cosets are the same. By hypothesis there are vectors ui, U2 in U such that X = V + U i = W + U2- Hence v = w + u where u = 112 — Ui G U, and consequently v + U = (w + u) + U = 'w + U, since u + U = U, as claimed. Finally, V is the union of all the cosets of U since v € v + U. The set of all cosets of U in V is written V/U. A good way to think about V/U is that its elements arise by identifying all the elements in a coset, so that each coset has been "compressed" to a single vector. The next step in the construction is to turn V/U into a vector space by defining addition and scalar multiplication on it. There are natural definitions for these operations, namely (v + U) + (w + U) = (v + w) + U c(v + U) = (cv) + U where v, w € V and c is a scalar. Although these definitions look natural, some care must be exercised. For a coset can be represented by any of its vectors, so we must make certain that the definitions just given do not depend on the choice of v and w in the cosets v + U and w + U.
  • 161. 5.3: Operations with Subspaces 145 To verify this, suppose we had chosen different represen- tatives, say v' for v + 17 and w' for w + U. Then v' = v + Ui and w' = w + u2 where ui, u2 £ U. Therefore v' + w' = (v + w) + (ui + u2) e (v + w) + U, so that (v7 + w') + U = (v + w) + U. Also cv' = cv + cu± < E (cu) + U and hence cv' + U = cv + U. These arguments show that our definitions are free from dependency on the choice of coset representatives. Theorem 5.3.5 If U is a subspace of a vector space V over a field F, then V/U is a vector space over F where sum and scalar multiplication are defined above: also the zero vector is 0 + U = U and the negative of v + U is (—v) + U. Proof We have to check that the vector space axioms hold for V/U, which is an entirely routine task. As an example, let us verify one of the distributive laws. Let v, w G V and let c E F. Then by definition c((v + U) + (w + U)) = c((v + w) + U) = c(v + w) + U = (cv + cw) + U, which by definition equals (cv+U) + (cw+U). This establishes the distributive law. Verification of the other axioms is left to the reader as an exercise. It also is easy to check that 0 is the zero vector and (—v) + U the negative ofv + U. Example 5.3.8 Suppose we take U to be the zero subspace of the vector space V: then V/0 consists of all v + 0 = {v}, i.e., the one-element subsets of V. While V/0 is not the same vector space as V, the two spaces are clearly very much alike: this can be made precise by saying that they are isomorphic (see 6.3).
• 162. 146 Chapter Five: Basis and Dimension

At the opposite extreme, we could take U = V. Now V/V consists of the cosets v + V = V, i.e., there is just one element. So V/V is a zero vector space.

We move on to more interesting examples of coset formation.

Example 5.3.9
Let S be the set of all solutions of a consistent linear system AX = B of m equations in n unknowns over a field F. If B = 0, then S is a subspace of F^n, namely the solution space U of the associated homogeneous linear system AX = 0. However, if B ≠ 0, then S is not a subspace: but we will see that it is a coset of the subspace U.

Since the system is consistent, there is at least one solution, say X1. Suppose X is another solution. Then we have AX1 = B and AX = B. Subtracting the first of these equations from the second, we find that 0 = AX - AX1 = A(X - X1), so that X - X1 ∈ U and X ∈ X1 + U, where U is the solution space of the system AX = 0. Hence every solution of AX = B belongs to the coset X1 + U and thus S ⊆ X1 + U. Conversely, consider any Y ∈ X1 + U, say Y = X1 + Z where Z ∈ U. Then AY = AX1 + AZ = B + 0 = B. Therefore Y ∈ S and S = X1 + U.

These considerations have established the following result.

Theorem 5.3.6
Let AX = B be a consistent linear system. Let X1 be any fixed solution of the system and let U be the solution space of the associated homogeneous linear system AX = 0. Then the set of all solutions of the linear system AX = B is the coset X1 + U.
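The theorem is easy to illustrate by machine. The Python/SymPy sketch below uses a small hypothetical consistent system, chosen only for illustration and not taken from the text: a particular solution X1 plus an arbitrary element of the null space always satisfies AX = B.

    from sympy import Matrix, symbols

    # A hypothetical consistent system AX = B.
    A = Matrix([[1, 2, -1],
                [2, 4, -2]])
    B = Matrix([3, 6])

    X1 = Matrix([3, 0, 0])       # one particular solution, found by inspection
    U = A.nullspace()            # basis of the solution space of AX = 0

    c, d = symbols('c d')
    X = X1 + c*U[0] + d*U[1]     # a general element of the coset X1 + U
    print(A*X - B)               # the zero vector, so every such X solves AX = B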
  • 163. 5.3: Operations with Subspaces 147 Our last example of coset formation is a geometric one. Example 5.3.10 Let A and B be vectors in R3 representing non-parallel line segments in 3-dimensional space. Then the subspace U=<A, B> has dimension 2 and consists of all cA + dB, (c, d E R). The vectors in U are represented by line segments, drawn from the origin, which lie in a plane P. Now choose X G R , with X = (xi, x2, x3)T . A typical vector in the coset X + U has the form X + cA + dB, with c, d e R, i.e., (xi + cai + dbi x2 + ca2 + db2 X3 + ca3 + db3)T . Now the points (xi+cai+d&i, X2+ca2+db2, Xs + cas + db3) lie in the plane Pi passing through the point (xi, x2, £3), which is parallel to the plane P. This is seen by forming the line segment joining two such points. The elements of X + U correspond to the points in the plane Pi: the latter is called a translate of the plane P. Dimension of a Quotient Space We conclude the discussion by noting a simple formula for the dimension of a quotient space of a finite dimensional vector space. Theorem 5.3.7 Let U be a subspace of a finite dimensional vector space V. Then dim(V/C/) = dim(y) - dim(£/).
  • 164. 148 Chapter Five: Basis and Dimension Proof If U = 0, then dim([7) = 0 and V/0 = {{v} | v G V}, which clearly has the same dimension as V. Thus the formula is valid in this case. Now let U ^ 0 and choose a basis {ui,..., um } of U. By 5.1.4 we may extend this to a basis {ui,..., u m , u m + i , . . . ,un } of V. Here of course m — dim([/) and n = dim(V). A typical n element v of V has the form v = ^ Cjiij, where the c^ are scalars. Next n n v + C/=( Yl CiUi)+U= ^ Ci(ui + U), i=m+l i=m+l m since Yl c iu i ^ U. Hence u m + i + U, ..., un + U generate i=l the quotient space V/U. n On the other hand, if Yl Ciiyn + U) — 0v/u = U, then i=m+l n Y CjUj e U, so that this vector is a linear combination of i=m+l Ui,..., um . Since the Uj are linearly independent, it follows that cm + i — • • • = cn = 0. Therefore u m + i + U, ..., un + U form a basis of V/U and hence dim(V/J7) = n-m = dim(V) - dim(C7). Exercises 5.3 1. Find three distinct subspaces U, V, W of R2 such that n2 = u®v = v®w = weu.
  • 165. 5.3: Operations with Subspaces 149 2. Let U and W denote the sets of all n x n real symmetric and skew-symmetric matrices respectively. Show that these are subspaces of Mn (R), and that Mn (R) is the direct sum of U and W. Find dim(U) and dim(W). 3. Let U and W be subspaces of a vector space V and suppose that each vector v in V has a unique expression of the form v = u + w where u belongs to U and w to W. Prove that V = U e W. 4. Let U, V, W be subspaces of some vector space and suppose that U C W. Prove that (u + v) n w= u + (v n w). 5. Prove or disprove the following statement: if U, V, W are subspaces of a vector space, then (U + V) n W = (U D W) + (VHW). 6. Suppose that U and W are subspaces of Pi4(R) with dim(C/) = 7 and dim(W) = 11. Show that dim(U n l f ) > 4 . Give an example to show that this minimum dimension can occur. 7. Let M be the matrix / 3 3 2 8 1 1 - 1 1 1 1 3 5 - 2 4 6 8 / and let U and W be the subspaces of R4 generated by rows 1 and 2 of M, and by rows 3 and 4 of M respectively. Find the dimensions of U + W and U fl W. 8. Define polynomials /i = 1 - 2x + x3 , f2 = x + x2 - x3 .
  • 166. 150 Chapter Five: Basis and Dimension and 01 = 2 + 2x - Ax2 + x3 , g2 = 1 - x + x2 , g3 = 2 + 3x - x2 . Let U be the subspace of P^(JR) generated by {/i, f2} and let W be the subspace generated by {gx, g2, g3}. Find bases for the subspaces U + W and U CW. 9. Let Ui,... ,Uk be subspaces of a vector space V. Prove that V = U © • • • © Uk if and only if each element of V has a unique expression of the form Ui + • • • + u^ where Uj belongs to Ui. 10. Every vector space of dimension n is a direct sum of n subspaces each of which has dimension 1. Explain why this true. 11. If Ui,..., Uk are subspaces of a finitely generated vec- tor space whose sum is the direct sum, find the dimension of Ui®---®Uk. 12. Let U, U2, U3 be subspaces of a vector space such that Ui n U2 = U2nU3 = U2rUi = 0. Does it follow that U + U2 + U3 = Ux © U2 © U3? Justify your answer. 13. Verify that all the vector space axioms hold for a quotient space V/U. 14. Consider the linear system of Exercise 2.1.1, x + 2x2 — Sx3 + X4 = 7 -xi + x2 - x3 + X4 = 4 (a) Write the general solution of the system in the form XQ + Y, where XQ is a particular solution and Y is the general solution of the associated homogeneous system. (b) Identify the set of all solutions of the given linear system as a coset of the solution space of the associated ho- mogeneous linear system.
  • 167. 5.3: Operations with Subspaces 151 15. Find the dimension of the quotient space Pn(R)/U where U is the subspace of all real constant polynomials. 16. Let V be an n-dimensional vector space over an arbitrary field. Prove that there exists a quotient space of V of each dimension i where 0 < i < n. 17. Let V be a finite-dimensional vector space and let U and W be two subspaces of V. Prove that dim((C7 + W)/W) = dim(U/(U n W)).
  • 168. Chapter Six LINEAR TRANSFORMATIONS A linear transformation is a function between two vector spaces which relates the structures of the spaces. Linear trans- formations include operations as diverse as multiplication of column vectors by matrices and differentiation of functions of a real variable. Despite their diversity, linear transforma- tions have many common properties which can be exploited in different contexts. This is a good reason for studying linear transformations and indeed much else in linear algebra. In order to establish notation and basic ideas, we begin with a brief discussion of functions defined on arbitrary sets. Readers who are familiar with this elementary material may wish to skip 6.1. 6.1 Functions Denned on Sets If X and Y are two non-empty sets, a function or mapping from X to Y, F :X -> y, is a rule that assigns to each element x o f l a unique element F(x) of Y, called the image of x under F. The sets X and Y are called the domain and codomain of the function F re- spectively. The set of all images of elements of X is called the image of the function F; it is written Im(F). Examples of functions abound; the most familiar are quite likely the functions that arise in calculus, namely functions whose domain and codomain are subsets of the set of real numbers R. An example of a function which has the flavor 152
  • 169. 6.1: Functions Defined on Sets 153 of linear algebra is F : MTO)n(R) — > R defined by F(A) = det(A), that is, the determinant function. A very simple, but nonetheless important, example of a function is the identity function on a set X; this is the function lx : X -> X which leaves every element of the set X fixed, that is, lx(x) = x for all elements x of X. Next, three important special types of function will be introduced. A function F : X — > Y is said to be injective (or one-one) if distinct elements of X always have distinct images under F, that is, if the equation F(xi) = F{x2) implies that X = X2- On the other hand, F is said to be surjective (or onto) if every element y of Y is the image under F of at least one element of X, that is, if y = F(x) for some x in X Finally, F is said to be bijective (or a one-one correspondence) if it is both injective and surjective. We need to give some examples to illustrate these con- cepts. For convenience these will be real-valued functions of a real variable x. Example 6.1.1 Define Fx : R -• R by the rule F±(x) = 2X . Then Fx is injective since 2X = 2y clearly implies that x — y. But i* cannot be surjective since 2X is always positive and so, for example, 0 is not the image of any element under F. Example 6.1.2 Define a function F2 : R — > R by F2(x) = x2 (x — 1). Here F2 is not injective; indeed ^ ( 0 ) = 0 = ^ ( 1 ) . However F2 is surjective since the expression x2 (x — 1) assumes all real values as x varies. The best way to see this is to draw the graph of the function y = x2 (x — 1) and observe that it extends over the entire y-axis.
  • 170. 154 Chapter Six: Linear Transformations Example 6.1.3 Define F3 : R -»• R by F2(x) = 2x - 1. This function is both injective and surjective, so it is bijective. (The reader should supply the proof.) Composition of functions Consider two functions F : X — > Y and G : U — » V such that the image of G is a subset of X. Then it is possible to combine the functions to produce a new function called the composite of F and G FoG : U->Y, by applying first G and then F; thus the image of an element x of U is given by the formula FoG(x) = F(G(x)). Here it is necessary to know that Im(G') is contained in X, since otherwise the expression F(G(x)) might be meaningless. Example 6.1.4 Consider the functions F : R2 — > R and G : C — • R2 defined by the rules F((a b )) = v V + 62 and G(a + v ^ ) = Here a and 6 are arbitrary real numbers. Then F o G : C — * R exists and its effect is described by F o G(a + V^lb) = F((2 2 a b)) = ^/Aa? + 4b2 . A basic fact about functional composition is that it sat- isfies the associative law. First let us agree that two functions ft-
  • 171. 6.1: Functions Defined on Sets 155 F and G are to be considered equal - in symbols F = G - if they have the same domain and codomain and if F(x) = G(x) for all x. Theorem 6.1.1 Let F : X —>Y, G : U — > V and H : R — > S be functions such that m{H) is contained in U and Im(G) is contained in X. Then F o (G o H) = (F o G) o H. Proof First observe that the various composites mentioned in the formula make sense: this is because of the assumptions about Im(H) and lm(G). Let x be an element of X. Then, by the definition of a composite, F o (G o H){x) = F((G o H)(x)) = F(G{H(x))). In a similar manner we find that (FoG) oH(x) is also equal to this element. Therefore Fo(GoH) = (FoG)oH, as claimed. Another basic result asserts that a function is unchanged when it is composed with an identity function. Theorem 6.1.2 If F : X —>Y is any function, then F o lx = F — ly o F. The very easy proof is left to the reader as an exercise. Inverses of functions Suppose that F : X —> Y is a function. An inverse of F is a function of the form G : Y — > X such that FoG and G o F are the identity functions on Y and on X respectively, that is, F{G{y)) = y and G(F(x)) = x for all s i n l and y in Y. A function which has an inverse is said to be invertible.
  • 172. 156 Chapter Six: Linear Transformations Example 6.1.5 Consider the functions F and G with domain and codomain R which are defined by F(x) = 2x — 1 and G(x) = (x + l)/2. Then G is an inverse of F since F o G and G o F are both equal to lp^. Indeed F o G(x) = F(G(x)) = F((x + l)/2) = 2((x + l)/2) - 1 = 2;, with a similar computation for 6* o F(x). Not every function has an inverse; in fact a basic theorem asserts that only the bijective ones do. Theorem 6.1.3 A function F : X — > Y has an inverse if and only if it is bijective. Proof Suppose first that F has an inverse function G : Y — > X. If F(xi) = F(x2), then, on applying G to both sides, we obtain G o F(xi) = G o F(x2). But G o F is the identity function on X, so xi = x2. Hence F is injective. Next let y be any element of Y; then, since FoG is the identity function, y = F o G(y) = F(G(y)), which shows that j/ belongs to the image of F and F is surjective. Therefore F is bijective. Conversely, assume that F is a bijective function. We need to find an inverse function G : Y — > X for F. To this end let y belong to F; then, since F is surjective, y = F{x) for some a; in X; moreover x is uniquely determined by y since F is injective. This allows us to define G(y) to be x. Then G(F(a;)) = G(y) = x and F(G(y)) = F(ar) = j/. Here it is necessary to observe that every element of X is of the form G(y) for some y in Y, so that G(F(x)) equals x for all elements x of X. Therefore G is an inverse function for F. The next observation is that when inverse functions do exist, they are unique.
  • 173. 6.1: Functions Defined on Sets 157 Theorem 6.1.4 Every bijective function F : X — > Y has a unique inverse function. Proof Suppose that F has two inverse functions, say G and G2. Then {Gx o F) o G2 = lx ° G2 = G2 by 6.1.2. On the other hand, by 6.1.1 this function is also equal to G o (F o G2) = G i o l y = Gi. Thus Gi = G 2 . Because of this result it is unambiguous to denote the inverse of a bijective function F : X —• > Y by F~l : Y -> X. To conclude this brief account of the elementary theory of functions, we record two frequently used results about inverse functions. Theorem 6.1.5 (a) If F : X — > Y is an invertible function, then F~l is invertible with inverse F. (b) IfF:X^YandG:U^X are invertible functions, then the function F o G : U — > Y is invertible and its inverse is G~x o F"1 . Proof Since F o F~l = 1Y and F~x o F = lx, it follows that F is the inverse of F~x . For the second statement it is enough to check that when G - 1 o F _ 1 is composed with F o G on both sides, identity functions result. To prove this simply apply the associative law twice.
  • 174. 158 Chapter Six: Linear Transformationns Exercises 6.1 1. Label each of the following functions F : R — • R injective, surjective or bijective, as is most appropriate. (You may wish to draw the graph of the function in some cases): (a) F(x) = x2 ; (b) F(x) = x3 /(x2 + 1); (c) F(x) = x(x ~l)(x-2); (d) F(x) = ex + 2. 2. Let functions F and G from R to R be defined by F{x) = 2x — 3, and G{x) = (x2 — l)/(x2 + l). Show that the composite functions F o G and G o F are different. 3. Verify that the following functions from R to R are mutu- ally inverse: F(x) = 3x — 5 and G(x) = (x + 5)/3. 4. Find the inverse of the bijective function F : R — > R defined by F(x) = 2x3 — 5. 5. Let G : F -^ X be an injective function. Construct a function F : X — • V such that F o G is the identity function on Y. Then use this result to show that there exist functions F, G : R -> R such that F o G = 1 R but G o F ^ 1 R . 6. Prove 6.1.2. 7. Complete the proof of part (b) of 6.1.5. 6.2 Linear Transformations and Matrices After the preliminaries on functions, we proceed at once to the fundamental definition of the chapter, that of a linear transformation. Let V and W be two vector spaces over the same field of scalars F. A linear transformation (or linear mapping) from V to W is a function T: V ^ W with the properties T(vi + v2) = T(vi) + T(v2) and T(cv) = cT(v)
  • 175. 6.2: Linear Transformations and Matrices 159 for all vectors v, vi, V2 in V and all scalars c in F. In short the function T is required to act in a "linear" fashion on sums and scalar multiples of vectors in V. In the case where T is a linear transformation from V to V, we say that T is a linear operator on V. Of course we need some examples of linear transforma- tions, but these are not hard to find. Example 6.2.1 Let the function T : R3 — > R2 be defined by the rule Thus T simply "forgets" the third entry of a vector. From this definition it is obvious that T is a linear transformation. Now recall from Chapter Four the geometrical interpreta- tion of the column vector with entries a, b, c as the line segment joining the origin to the point with coordinates (a, b, c). Then the linear transformation T projects the line segment onto the xy-plane. Consequently projection of a line in 3-dimensional space which passes through the origin onto the xy-pane is a linear transformation from R3 to R2 . The next example of a linear transformation is also of a geometrical nature. Example 6.2.2 Suppose that an anti-clockwise rotation through angle 9 about the origin O is applied to the xy-plane. Since vectors in R2 are represented by line segments in the plane drawn from the origin, such a rotation determines a function T : R2 — > R2 ; here the line segment representing T(X) is obtained by rotat- ing the line segment that represents X.
  • 176. 160 Chapter Six: Linear Transformationns To show that T is a linear operator on R2 , we suppose that Y is another vector in R2 . T(X+Y) Referring to the diagram above, we know from the trian- gle rule that X+Y is represented by the third side of the trian- gle formed by the line segments representing X and Y. When the rotation is applied to this triangle, the sides of the result- ing triangle represent the vectors T(X), T(Y), T(X) + T(Y), as shown in the diagram. The triangle rule then shows that T(X + Y) = T(X)+T(Y). In a similar way we can see from the geometrical inter- pretation of scalar multiples in R2 that T(cX) = cT(X) for any scalar c. It follows that T is a linear operator on R . Example 6.2.3 Define T : D^a, b] — > Doo[a, b] to be differentiation, that is, T(f(x)) = f'(x). Here Doo[a,b} denotes the vector space of all functions of x that are infinitely differentiable in the interval [a ,b]. Then well-known facts from calculus guarantee that T is a linear operator on D^a, b. This example can be generalized in a significant fashion as follows. Let a±, a2 ,..., an be functions in D^a, b]. For any
  • 177. 6.2: Linear Transformations and Matrices 161 / in Doo[a,b], define T(f) to be anf^ + a n - i / ( n _ 1 ) + • • • + a i / ' + «o/. Then T is a linear operator on Doo[a,b], once again by ele- mentary results from calculus. Here one can think of T as a sort of generalized differential operator that can be applied to functions in -Doo[a, 6]. Our next example of a linear transformation involves quo- tient spaces, which were defined in 5.3. Example 6.2.4 Let U be a subspace of a vector space V and define a function T : V -» V/U by the rule T(v) = v + U. It is simple to verify that T is a linear transformation: indeed, T(vx + v2) = (vi + v2) + U = (vi + U) + (v2 + *7) = T(V l )+T(v2 ) by definition of the sum of two vectors in a quotient space. In a similar way one can show that T(cv) = c(T(v)). The function just defined is often called the canonical linear transformation associated with the subspace U. Finally, we record two very simple examples of linear transformations. Example 6.2.5 (a) Let V and W be two vector spaces over the same field. The function which sends every vector in V to the zero vector of W is a linear transformation called the zero linear transformation from V to W; it is written Ov,w or simply 0. (b) The identity function y : V — > V is a linear operator on V. After these examples it is time to present some elemen- tary properties of linear transformations.
  • 178. 162 Chapter Six: Linear Transformationns Theorem 6.2.1 Let T : V — > W be a linear transformation. Then r(Ov) = ow and T(c1 v1 +c2 v2 + - • • +c fcvA!) = c1T(v1)+c2T(v2) + - • -+ckT(vk) /or a// vectors v$ and scalars Ci. Thus a linear transformation always sends a zero vector to a zero vector; it also sends a linear combination of vectors to the corresponding linear combination of the images of the vectors. Proof In the first place we have T(0V) = T(0V + 0V) = T(0V) + T(0V) by the first defining property of linear transformations. Addi- tion of —T(Oy) to both sides gives Ow = T(Oy), as required. Next, use of both parts of the definition shows that T(civi H h cfc_ivfc_! + ckvk) is equal to the vector r(cxvi H h Cfc_iVfc_i) + cfcr(vfc). By repeated application of this procedure, or more properly induction on k, we obtain the second result. Representing linear transformations by matrices We now specialize the discussion to linear transformations of the type
  • 179. 6.2: Linear Transformations and Matrices 163 where F is some field of scalars. Let {Ei,E2, ...,En} be the standard basis of Fn written in the usual order, that of the columns of the identity matrix ln . Also let {Di,D2, ...,Dm} be the corresponding ordered basis of Fm . Since T(Ej) is a vector in Fm , it can be written in the form / T{E3) = O i j — aijDi + • • • + amjDm = 2_^ &ijDi Uj m3 Put A = [ajj]m)n, so that the columns of the matrix A are the vectors T(E{), ...,T(En). We show that T is completely determined by the matrix A. Take an arbitrary vector in Fn , say X = : J = xiE1 + ~xnEn = Y^x jE r xnl i=i Then, using 6.2.1 together with the expression for T(Ej), we obtain n n n m j=l j=l j-l i=l m n = 52(52a i3x j)D i- i=l j = l Therefore the ith entry of T(X) equals the ith entry of the matrix product AX. Thus we have shown that T{X) = AX, which means that the effect of T on a vector in Fn is to multiply it on the left by the matrix A. Thus A determines T completely.
  • 180. 164 Chapter Six: Linear Transformationns Conversely, suppose that we start with an m x n matrix A over F; then we can define a function T : Fn — > Fm by the rule T(X) = AX. The laws of matrix algebra guarantee that T is a linear transformation; for by 1.2.1 A(Xi + X2) = AXX + AX2 and A(cX) = c(AX). We have now established a fundamental connection between matrices and linear transformations. Theorem 6.2.2 (i) Let T : Fn — > Fm be a linear transformation. Then T{X) = AX for all X in Fn where A is the m x n matrix whose columns are the images under T of the standard basis vectors of Fn . (ii) Conversely, if A is any mx n matrix over the field F, the function T : Fn -> F m defined by T(X) = AX is a linear transformation. Example 6.2.6 Define T : R3 -»• R2 by the rule One quickly checks that T is a linear transformation. The images under T of the standard basis vectors Ei, E2, E3 are respectively. It follows that T is represented by the matrix A = [ 0 - 1 3 ) '
• 181. 6.2: Linear Transformations and Matrices 165

Consequently

    T( [ x1 ]       [ x1 ]
       [ x2 ]  = A  [ x2 ]
       [ x3 ] )     [ x3 ]

as can be verified directly by matrix multiplication.

Example 6.2.7
Consider the linear operator T : R^2 → R^2 which arises from an anti-clockwise rotation in the xy-plane through an angle θ (see Example 6.2.2). The problem is to write down the matrix which represents T. All that need be done is to identify the vectors T(E1) and T(E2) where E1 and E2 are the vectors of the standard ordered basis.

[Diagram: the points (1, 0) and (0, 1) are carried by the rotation to (cos θ, sin θ) and (-sin θ, cos θ) respectively.]

The line segment representing E1 is drawn from the origin O to the point (1, 0), and after rotation it becomes the line segment from O to the point (cos θ, sin θ); thus

    T(E1) = [ cos θ ]
            [ sin θ ]

Similarly

    T(E2) = [ -sin θ ]
            [  cos θ ]

It follows that the matrix which represents the rotation T is

    [ cos θ  -sin θ ]
    [ sin θ   cos θ ]
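A quick check of this matrix can be made with a computer algebra system. The Python/SymPy sketch below (an illustrative choice of tool, not part of the text) builds the rotation matrix from the images of E1 and E2 and verifies its effect on the standard basis together with the orthogonality relation A^T A = I.

    from sympy import Matrix, symbols, cos, sin, pi, simplify

    theta = symbols('theta')
    A = Matrix([[cos(theta), -sin(theta)],     # columns are T(E1) and T(E2)
                [sin(theta),  cos(theta)]])

    print(A * Matrix([1, 0]))                    # (cos(theta), sin(theta)): the image of E1
    print(A.subs(theta, pi/2) * Matrix([0, 1]))  # a quarter turn sends E2 to (-1, 0)
    print((A.T * A).applyfunc(simplify))         # the identity matrix: rotation preserves length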
  • 182. 166 Chapter Six: Linear Transformationns Representing linear transformations by matrices: The general case We turn now to the problem of representing by matrices linear transformations between arbitrary finite-dimensional vector spaces. Let V and W be two non-zero finite-dimensional vector spaces over the same field of scalars F. Consider a linear transformation T : V — > W. The first thing to do is to choose and fix ordered bases for V and W, say B = {v i> v 2 . . . , v n } andC = {wi, w 2 . . . , w m } respectively. We saw in 5.1 how any vector v of V can be represented by a unique coordinate vector with respect to the ordered basis B. If v = ciVi + • • • + cn vn , this coordinate vector is Similarly each w in W may be represented by a coordinate vector [w]c with respect to C . To represent T by a matrix with respect to these chosen ordered bases, we first express the image under T of each vector in B as a linear combination of the vectors of C, say m T(VJ) = aij-wi H 1 - am i wm = ^ a y'w * where the scalars. Thus [T(VJ)]C is the column vector with entries aij,..., amj. Let A be the m x n matrix whose (i,j) entry is a^. Thus the columns of A are just the coordi- nate vectors of T(vi),..., T(vn) with respect to C.
  • 183. 6.2: Linear Transformations and Matrices 167 Now consider the effect of T on an arbitrary vector of V, say v = C1V1 + • • • + cn vn . This is computed by using the expression for T(VJ) given above: n n n m 3 = 1 3 = 1 3 = 1 i=l On interchanging the order of summations, this becomes m n T (V ) = ^ E a V c j ) w i - i=l j=l Hence the coordinate vector of T(v) with respect to the or- dered basis C has entries J2^=i a ijc j for i = 1, 2,..., m. This means that [T(v)]c = A[v]B. The conclusions of this discussion can be summed up as follows. Theorem 6.2.3 Let T : V — > W be a linear transformation between two non- zero finite-dimensional vector spaces V and W over the same field. Suppose that B and C are ordered bases for V and W respectively. If v is any vector of V, then [T(v)]c = A[v]B where A is the mxn matrix whose jth column is the coordinate vector of the image under T of the jth vector of B, taken with respect to the basis C. What this result means is that a linear transformation between non-zero finite-dimensional vector spaces can always be represented by left multiplication by a suitable matrix. At this point the reader may wonder if it is worth the trouble
• 184. 168 Chapter Six: Linear Transformations

of introducing linear transformations, given that they can be described by matrices. The answer is that there are situations where the functional nature of a linear transformation is a decided advantage. In addition there is the fact that a given linear transformation can be represented by a host of different matrices, depending on which ordered bases are used. The real object of interest is the linear transformation, not the representing matrix, which is dependent on the choice of bases.

Example 6.2.8
Define T : P_{n+1}(R) → P_n(R) by the rule T(f) = f', the derivative. Let us use the standard bases B = {1, x, x^2, ..., x^n} and C = {1, x, x^2, ..., x^{n-1}} for the two vector spaces. Here T(x^i) = i x^{i-1}, so [T(x^i)]_C is the vector whose ith entry is i and whose other entries are zero. Therefore T is represented by the n × (n + 1) matrix

    A = [ 0  1  0  ...  0 ]
        [ 0  0  2  ...  0 ]
        [ .  .  .       . ]
        [ 0  0  0  ...  n ]

For example,

    A [  2 ]   [ -1 ]
      [ -1 ]   [  6 ]
      [  3 ] = [  0 ]
      [  0 ]   [  . ]
      [  . ]   [  0 ]
      [  0 ]

which corresponds to the differentiation T(2 - x + 3x^2) = (2 - x + 3x^2)' = 6x - 1.
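The matrix of Example 6.2.8 is easily generated and tested by machine. The Python/SymPy sketch below (an illustrative choice of tool, not part of the text) builds A for the case n = 3 and reproduces the differentiation of 2 - x + 3x^2.

    from sympy import Matrix, zeros

    n = 3                          # the case T : P4(R) -> P3(R)
    A = zeros(n, n + 1)
    for j in range(1, n + 1):
        A[j - 1, j] = j            # column j holds the coordinates of T(x^j) = j*x^(j-1)

    print(A)                          # Matrix([[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]])
    print(A * Matrix([2, -1, 3, 0]))  # Matrix([[-1], [6], [0]]): the coordinates of 6x - 1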
  • 185. 6.2: Linear Transformations and Matrices 169 Change of basis Being aware of a dependence on the choice of bases, we wish to determine the effect on the matrix representing a linear transformation when the ordered bases are changed. The first step is to find a matrix that describes the change of basis. Let B = {v1,..., vn } and B' = {v^,..., v'n} be two or- dered bases of a finite-dimensional vector space V. Then each v^ can be expressed as a linear combination of v i , . . . , vn , say n J'=l for certain scalars Sji. The change of basis B' —> B is deter- mined by the n x n matrix S = [sij]. To see how this works we take an arbitrary vector v in V and write it in the form n i=l where, of course, c ,..., cn' are the entries of the coordinate vector [v]g/. Replace each v / by its expression in terms of the Vj to get n n n n i = l j = X j = l i = l From this one sees that the entries of the coordinate vector [V]B are just the scalars Y17-1s jic 'n ^or 3 = 1, 2,..., n. But the latter are the entries of the product (c'A Vn)
  • 186. 170 Chapter Six: Linear Transformationns Therefore we obtain the fundamental relation M B = S[v]B>. Thus left multiplication by the change of basis matrix S trans- forms coordinate vectors with respect to B' into coordinate vectors with respect to B. It is in this sense that the matrix S describes the basis change B' — > B. Here it is important to observe how S is formed: its ith column is the coordinate vector of v[, the ith vector of B', with respect to the basis B. It is a crucial remark that the change of basis matrix S is always invertible. Indeed, if this were false, there would by 2.3.5 be a non-zero n-column vector X such that SX — 0. However, if u denotes the vector in V whose coordinate vector with respect to basis B' is X, then [u]g = SX = 0, which can only mean that u = 0 and X = 0, a contradiction. As one would expect, the matrix S~x represents the in- verse change of basis B — > B' for the equation M s = ^ M s ' implies that [vBI = S-vB. These conclusions can be summed up in the following form. Theorem 6.2.4 Let B and B' be two ordered bases of an n-dimensional vector space V. Define S to be the n x n matrix whose ith column is the coordinate vector of the ith vector of B' with respect to the basis B. Then S is invertible and, ifv is any vector ofV, M s = S[v]B> and [v]B/ = S~1 [v]B.
• 187. 6.2: Linear Transformations and Matrices 171

Example 6.2.9
Consider two ordered bases of the vector space P3(R):

    B = {1, x, x^2}  and  B' = {1, 2x, 4x^2 - 2}.

In order to find the matrix S which describes the change of basis B' → B, we must write down the coordinate vectors of the elements of B' with respect to the standard basis B: these are

    [1]_B = [ 1 ]    [2x]_B = [ 0 ]    [4x^2 - 2]_B = [ -2 ]
            [ 0 ]             [ 2 ]                   [  0 ]
            [ 0 ]             [ 0 ]                   [  4 ]

Therefore

    S = [ 1  0  -2 ]
        [ 0  2   0 ]
        [ 0  0   4 ]

The matrix which describes the change of basis B → B' is

    S^(-1) = [ 1  0    1/2 ]
             [ 0  1/2  0   ]
             [ 0  0    1/4 ]

For example, to express f = a + bx + cx^2 in terms of the basis B', we compute

    [f]_B' = S^(-1) [f]_B = [ a + c/2 ]
                            [ b/2     ]
                            [ c/4     ]

Thus f = (a + c/2)1 + (b/2)2x + (c/4)(4x^2 - 2), which is of course easy to verify.
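The arithmetic of Example 6.2.9 can be confirmed with a computer algebra system. In the Python/SymPy sketch below (an illustrative choice of tool, not part of the text), S is entered column by column and its inverse is computed exactly.

    from sympy import Matrix, symbols

    a, b, c = symbols('a b c')

    S = Matrix([[1, 0, -2],        # columns: [1]_B, [2x]_B, [4x^2 - 2]_B
                [0, 2,  0],
                [0, 0,  4]])

    print(S.inv())                      # the matrix describing the change of basis B -> B'
    print(S.inv() * Matrix([a, b, c]))  # (a + c/2, b/2, c/4): the coordinates of a + bx + cx^2 in B'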
• 188. 172 Chapter Six: Linear Transformations

Example 6.2.10
Consider the change of basis in R^2 which arises when the x- and y-axes are rotated through angle θ in an anticlockwise direction. As was noted in Example 6.2.7, the effect of this rotation is to replace the standard ordered basis B = {E1, E2} by the basis B' consisting of

    [ cos θ ]   and   [ -sin θ ]
    [ sin θ ]         [  cos θ ]

The matrix which describes the change of basis B' → B is

    S = [ cos θ  -sin θ ]
        [ sin θ   cos θ ]

so the change of basis B → B' is described by

    S^(-1) = [  cos θ  sin θ ]
             [ -sin θ  cos θ ]

Hence, if X = [ a ], the coordinate vector of X with respect to the basis B' is
              [ b ]

    [X]_B' = S^(-1) X = [  a cos θ + b sin θ ]
                        [ -a sin θ + b cos θ ]

This means that the coordinates of the point (a, b) with respect to the rotated axes are a' = a cos θ + b sin θ and b' = -a sin θ + b cos θ, respectively.
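The same coordinates can be obtained by machine. The short Python/SymPy sketch below (an illustrative choice of tool, not part of the text) inverts the change of basis matrix symbolically and simplifies using cos^2 θ + sin^2 θ = 1.

    from sympy import Matrix, symbols, cos, sin, simplify

    theta, a, b = symbols('theta a b')
    S = Matrix([[cos(theta), -sin(theta)],   # describes the change of basis B' -> B
                [sin(theta),  cos(theta)]])

    coords = (S.inv() * Matrix([a, b])).applyfunc(simplify)
    print(coords)   # (a*cos(theta) + b*sin(theta), -a*sin(theta) + b*cos(theta))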
  • 189. 6.2: Linear Transformations and Matrices 173 Change of basis and linear transformations We are now in a position to calculate the effect of change of bases on the matrix representing a linear transformation. Let B and C be ordered bases of finite-dimensional vector spaces V and W over the same field, and let T : V — > W be a linear transformation. Then T is represented by a matrix A with respect to these bases. Suppose now that we select new bases B' and C for V and W respectively. Then T will be represented with respect to these bases by another matrix, say A'. The question before us is: what is the relation between A and A'l Let X and Y be the invertible matrices that represent the changes of bases B — > B' and C — > C respectively. Then, for any vectors v of V and w of W, we have [v]B/ = X[V]B and [w]C/ = y[w]c . Now by 6.2.3 [T(v)]c = A[w)B and [T(v)]c, = A'[v]s,. On combining these equations, we obtain [T(v)]c, = Y[T(v)]c = 7A[v]B = YAX-^B'- But this means that the matrix YAX-1 describes the linear transformation T with respect to the bases B' and C of V and W respectively. Hence A' = YAX~l . We summarise these conclusions in Theorem 6.2.5 Let V and W be non-zero finite-dimensional vector spaces over the same field. Let B and B' be ordered bases of V, and C and C ordered bases of W. Suppose that matrices X and Y describe the respective changes of bases B — > B' and C — > C'. If the linear transformation T : V — > W is represented by a
• 190. 174 Chapter Six: Linear Transformations

matrix A with respect to B and C, and by a matrix A' with respect to B' and C', then A' = YAX^(-1).

The most important case is that of a linear operator T : V → V, when the ordered basis B is used for both domain and codomain.

Theorem 6.2.6
Let B and B' be two ordered bases of a finite-dimensional vector space V and let T be a linear operator on V. If T is represented by matrices A and A' with respect to B and B' respectively, then A' = SAS^(-1) where S is the matrix representing the change of basis B → B'.

Example 6.2.11
Let T be the linear transformation on P3(R) defined by T(f) = f'. Consider the ordered bases of P3(R)

    B = {1, x, x^2}  and  B' = {1, 2x, 4x^2 - 2}.

We saw in Example 6.2.9 that the change of basis B → B' is represented by the matrix

    U = [ 1  0    1/2 ]
        [ 0  1/2  0   ]
        [ 0  0    1/4 ]

Now T is represented with respect to B by the matrix

    A = [ 0  1  0 ]
        [ 0  0  2 ]
        [ 0  0  0 ]
• 191. 6.2: Linear Transformations and Matrices 175

Hence T is represented with respect to B' by

UAU^{-1} = \begin{pmatrix} 0 & 2 & 0 \\ 0 & 0 & 4 \\ 0 & 0 & 0 \end{pmatrix}.

This conclusion is easily checked. An arbitrary element of P_3(R) can be written in the form f = a(1) + b(2x) + c(4x^2 - 2). Then it is claimed that the coordinate vector of T(f) with respect to the basis B' is

\begin{pmatrix} 0 & 2 & 0 \\ 0 & 0 & 4 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 2b \\ 4c \\ 0 \end{pmatrix}.

This is correct since 2b(1) + 4c(2x) + 0(4x^2 - 2) = 2b + 8cx = (a(1) + b(2x) + c(4x^2 - 2))'.

Similar matrices
Let A and B be two n x n matrices over a field F; then B is said to be similar to A over F if there is an invertible n x n matrix S with entries in F such that B = SAS^{-1}. Thus the essential content of 6.2.6 is that two matrices which represent the same linear operator on a finite-dimensional vector space are similar. Because of this fact it is to be expected that similar matrices will have many properties in common: for example, similar matrices have the same determinant. Indeed if B = SAS^{-1}, then by 3.3.3 and 3.3.5

det(B) = det(S) det(A) det(S)^{-1} = det(A).
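A small NumPy sketch, using the matrices U and A from the example above, confirms the change of representation and the fact that similar matrices share the same determinant:

```python
import numpy as np

U = np.array([[1.0, 0.0, 0.5],
              [0.0, 0.5, 0.0],
              [0.0, 0.0, 0.25]])   # change of basis B -> B'
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])    # differentiation on P3(R) with respect to B

A_prime = U @ A @ np.linalg.inv(U)
print(A_prime)                      # expect [[0, 2, 0], [0, 0, 4], [0, 0, 0]]

# Similar matrices have the same determinant (both are 0 in this example)
assert np.isclose(np.linalg.det(A), np.linalg.det(A_prime))
```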
  • 192. 176 Chapter Six: Linear TYansformationns We shall encounter other common properties of similar matri- ces in Chapter Eight. Exercises 6.2 1. Which of the following functions are linear transforma- tions? (a) Ti : R3 — > R where Ti([2:1X2X3]) = Jx + x + x§; (b) T2 : Mm,n{F) - Mn ,m (F) where T2(A) = AT ; (c) T3 : Mn(F) ->• F where T3(^) = det(4). 2. If T is a linear transformation, prove that T(—v) = —T(y) for all vectors v. 3. Let I be a fixed line in the xy-plane passing through the origin O. If P is any point in the plane, denote by P' the mirror image of P in the line /. Prove that the assignment OP — > OP' determines a linear operator on R2 . (This is called reflection in the line I). 4. A linear transformation T : R — > R is defined by T{ (Xl X3 X4 / Xi — X2 — %3 ~ £4 2xi + x2 - X3 %2 - %3 + %4 Find the matrix that represents T with respect to the standard bases of R4 and R3 . 5. A function T : P^(R) — > • P^(R) is defined by the rule T(f) = xf" — 2xf + /. Show that T is a linear operator and find the matrix that represents T with respect to the standard basis of P4.(R). 6. Find the matrix which represents the reflection in Exercise 3 with respect to the standard ordered basis of R2 , given that the angle between the positive x-direction and the line I is 4>.
  • 193. 6.2: Linear Transformations and Matrices 177 7. Let B denote the standard basis of R3 and let B' be the basis consisting of Find the matrices that represent the basis changes B —* B' and B' -• B. 8. A linear transformation from R3 to R2 is defined by xx - x2 - x3 -Xi + X3 Let B and C be the ordered bases of R3 and R2 respectively. Find the matrix that represents T with respect to these bases. 9. Explain why the matrices I 1 and ( 1 cannot be similar. 10. If B is similar to A, prove that A is similar to B. 11. If B is similar to A and C is similar to B, prove that C is similar to A. 12. If B is similar to A, then BT is similar to AT ; prove or disprove.
  • 194. 178 Chapter Six: Linear Transformations 6.3 Kernel, Image and Isomorphism If T : V — > W is a linear transformation between two vec- tor spaces, there are two important subspaces associated with T, the image and the kernel. The first of these has already been defined; the image of T, Im(T), is the set of all images T(v) of vectors v in V: thus Im(T) is a subset of W. On the other hand, the kernel of T Ker(T) is defined to be the set of all vectors v i n 7 such that T(v) = 0W. Thus Ker(T) is a subset of V. Notice that by 6.2.1 the zero vector of V must belong to Ker(T), while the zero vector of W belongs to Im(T). The first thing to observe is that we are actually dealing with subspaces here, not just subsets. Theorem 6.3.1 If T is a linear transformation from a vector space V to a vector space W, then Ker(T) is a subspace of V and Im(T) is a subspace of W. Proof We need to check that Ker(T) and Im(T) contain the relevant zero vector, and that they are closed with respect to addition and scalar multiplication. The first point is settled by the equation T(Oy) = Ow, which was proved in 6.2.1. Also, by definition of a linear transformation, we have T(vi + V2) = T(vi) + T(v2) and T(cvi) = cT(vi) for all vectors v1 ; v2 of V and scalars c. Therefore, if vi and v2 belong to Ker(T), then T(vi + v2) = 0 ^ , and T(cvi) = Ow, so that vx + v2 and
• 195. 6.3: Kernel, Image and Isomorphism 179

cv_1 belong to Ker(T); thus Ker(T) is a subspace. For similar reasons Im(T) is a subspace.

Let us look next at some examples which relate these new concepts to some more familiar ones.

Example 6.3.1
Consider the homogeneous linear differential equation for a function y of the real variable x:

y^{(n)} + a_{n-1}(x)y^{(n-1)} + · · · + a_1(x)y' + a_0(x)y = 0,

with x in the interval [a, b] and the a_i(x) in D_∞[a, b]. There is an associated linear operator T on the vector space D_∞[a, b] defined by

T(f) = f^{(n)} + a_{n-1}(x)f^{(n-1)} + · · · + a_1(x)f' + a_0(x)f.

Then Ker(T) is the solution space of the differential equation.

Example 6.3.2
Let A be an m x n matrix over a field F. We have seen that the rule T(X) = AX defines a linear transformation T : F^n -> F^m. Identify Ker(T) and Im(T).

In the first place, the definition shows that Ker(T) is the null space of the matrix A. Next, an arbitrary element of Im(T) is a linear combination of the images of the standard basis elements of F^n; but the latter are simply the columns of the matrix A. Consequently, the image of T coincides with the column space of the matrix A.

Example 6.3.3
After the last example it is natural to enquire if there is an interpretation of the row space of a matrix A as an image
• 196. 180 Chapter Six: Linear Transformations

space. That this is the case may be seen from a related linear transformation. Given an m x n matrix A, define a linear transformation T_1 from F^m to F^n by the rule T_1(X) = XA. In this case Im(T_1) is generated by the images of the elements of the standard basis of F^m, that is, by the rows of A. Hence the image of T_1 equals the row space of A.

It is now time to consider what the kernel and image tell us about a linear transformation.

Theorem 6.3.2
Let T be a linear transformation from a vector space V to a vector space W. Then
(i) T is injective if and only if Ker(T) is the zero subspace of V;
(ii) T is surjective if and only if Im(T) = W.

Proof
(i) Assume that T is an injective function. If v is a vector in the kernel of T, then T(v) = 0_W = T(0_V). Therefore v = 0_V by injectivity, and Ker(T) = 0_V. Conversely, suppose that Ker(T) = 0_V. If v_1 and v_2 are vectors in V with the property T(v_1) = T(v_2), then T(v_1 - v_2) = T(v_1) - T(v_2) = 0_W. Hence the vector v_1 - v_2 belongs to Ker(T) and v_1 = v_2.
(ii) This is true by definition of surjectivity.

For finite-dimensional vector spaces there is a simple formula which links the dimensions of the kernel and image of a linear transformation.

Theorem 6.3.3
Let T : V -> W be a linear transformation where V and W are finite-dimensional vector spaces. Then

dim(Ker(T)) + dim(Im(T)) = dim(V).
• 197. 6.3: Kernel, Image and Isomorphism 181

Proof
Here we may assume that V is not the zero space; otherwise the statement is true for obvious reasons. By 5.1.4 it is possible to choose a basis v_1, ..., v_n of V such that part of it is a basis of Ker(T), say v_1, ..., v_r; here of course n = dim(V) ≥ r = dim(Ker(T)). We claim that the vectors T(v_{r+1}), ..., T(v_n) are linearly independent. For if

c_{r+1}T(v_{r+1}) + · · · + c_nT(v_n) = 0_W

for some scalars c_i, then T(c_{r+1}v_{r+1} + · · · + c_nv_n) = 0_W, so that c_{r+1}v_{r+1} + · · · + c_nv_n belongs to Ker(T) and is therefore expressible as a linear combination of v_1, ..., v_r. But v_1, ..., v_r, ..., v_n are certainly linearly independent. Hence c_{r+1}, ..., c_n are all zero and our claim is established.

On the other hand, the vectors T(v_{r+1}), ..., T(v_n) by themselves generate Im(T) since T(v_1) = · · · = T(v_r) = 0_W; hence T(v_{r+1}), ..., T(v_n) form a basis of Im(T). It follows that

dim(Im(T)) = n - r = dim(V) - dim(Ker(T)),

from which the formula follows.

The dimension formula is in fact a generalization of something that we already know. For suppose we apply the formula to the linear transformation T : F^n -> F^m defined by T(X) = AX, where A is an m x n matrix. Making the interpretations of Ker(T) and Im(T) as the null space and column space of A, we deduce that the sum of the dimensions of the null space and column space of A equals n. This is essentially the content of 5.1.7 and 5.2.4.

Isomorphism
Because of 6.3.2 we can tell whether a linear transformation T : V -> W is bijective. And in view of 6.1.3 this is the same as asking whether T has an inverse. A bijective linear transformation is called an isomorphism.
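The identifications of Example 6.3.2 together with the dimension formula just proved are easy to check in a computer algebra system. The following SymPy sketch, with an arbitrarily chosen matrix, computes bases for the kernel and image of T(X) = AX and verifies that their dimensions add up to the dimension n of the domain:

```python
from sympy import Matrix

A = Matrix([[1, 2, 1, 0],
            [2, 4, 0, 2],
            [3, 6, 1, 2]])              # arbitrary 3 x 4 example (third row = sum of first two)

kernel_basis = A.nullspace()             # Ker(T) = null space of A
image_basis = A.columnspace()            # Im(T) = column space of A

n = A.shape[1]                           # dimension of the domain F^n
assert len(kernel_basis) + len(image_basis) == n   # dim Ker(T) + dim Im(T) = dim(V)

print(len(kernel_basis), len(image_basis))          # 2 and 2 for this matrix
```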
  • 198. 182 Chapter Six: Linear Transformations Theorem 6.3.4 A linear transformation T : V — > W is an isomorphism if and only if Ker(T) is the zero subspace of V and Im(T) equals W. Moreover, if T is an isomorphism, then so is its inverse T~l : W -> V. Proof The first statement follows from 6.3.2. As for the second state- ment, all that need be shown is that T~l is actually a linear transformation: for by 6.1.5 it certainly has an inverse. This is achieved by a trick. Let vj and v2 be any two vectors in V. Then certainly T ( r - 1 ( v 1 + v 2 ) ) = v 1 + v 2 , while on the other hand, TOT-VI) +T-X M) = nr-^vi)) + rcr-Va)) = vi + v 2 , because T is known to be a linear transformation. Since T is an injective function, this can only mean that the vectors T- 1 (vi + v2) and T- 1 (vi) +T_ 1 (v2 ) are equal; for they have the same image under T. In a similar way it can be demonstrated that T_ 1 (cvi) equals cT_ 1 (vi) where c is any scalar: just check that both sides have the same image under T. Hence T~l is a linear transformation. Two vector spaces V and W are said to be isomorphic if there is an isomorphism from one to the other. Observe that isomorphic vector spaces are necessarily over the same field of scalars. The notation V ~W is often used to express the fact that vector spaces V and W are isomorphic.
  • 199. 6.3: Kernel, Image and Isomorphism 183 How can one tell if two finite-dimensional vector spaces are isomorphic? The answer is that the dimensions tell us all. Theorem 6.3.5 Let V and W be finite-dimensional vector spaces over a field F. Then V and W are isomorphic if and only if dim(V) = dim(W). Proof Suppose first that dim(V) = dim(VF) = n. If n = 0, then V and W are both zero spaces and hence are surely isomorphic. Let n > 0. Then V and W have bases, say {vi,..., vn } and {wi,..., wn } respectively. There is a natural candidate for an isomorphism from V to W, namely the linear transformation T : V -»• W defined by T(c1v1 H h cnvn) = ciwi H h cnwn. It is straightforward to check that T is a linear transformation. Hence V and W are isomorphic. Conversely, let V and W be isomorphic via an isomor- phism T : V — > W. Suppose that {vi,..., vn } is a basis of V. In the first place, notice that the vectors T(vi),..., T"(vn) are linearly independent; for if ciT(vi) + - • -+cnT(vn) = 0 ^ , then T(ciVi + - • - + cnvn) = 0w• This implies that ciVi + - • -+cn vn belongs to Ker(T) and so must be zero. This in turn implies that c = • • • = cn = 0 because v i , . . . , vn are linearly inde- pendent. It follows by 5.1.1 that dim(W) > n = dim(V). In the same way it may be shown that dim(W) < dim(V^); hence dim(V) = dim{W). Corollary 6.3.6 Every n-dimensional vector space V over a field F is isomor- phic with the vector space Fn . For both V and Fn have dimension n. This result makes it possible for some purposes to work just with vector spaces of column vectors.
  • 200. 184 Chapter Six: Linear Transformations Isomorphism theorems There are certain theorems, known as isomorphism theo- rems, which provide a link between linear transformations and quotient spaces (which were defined in 5.3). Such theorems occur frequently in algebra. The first theorem of this type is: Theorem 6.3.7 IfT : V —+ W is a linear transformation between vector spaces V and W, then V/Kev(T) ~ Im(T). Proof Write K = Ker (T). We define a function S : V/K -> Im(T) by the rule S{ + K) = T(v). The first thing to notice is that S is well-defined: indeed if u € K, then T(v + u) = T(v) + T(u) = T(v) + 0 = T(v), since T(u) = 0. Thus S(v + K) does not depend on the choice of representative v of the coset v + K. Next it is simple to verify that 5" is a linear transforma- tion: for example, ^((v! + K) + (v2 + K)) = S((vi + v2) + K) = T ( v i + v 2 ) = r(vi) + r(v2), which equals S(v + K) + 5, (v2 + K). In a similar way it can be shown that S(c(v + K)) = cS(v + K). Clearly the function S is surjective, so all we need do to complete the proof is show it is injective. If S(y + K) = 0, then T(v) = 0; thus v e K and v + K = 0V/K- Hence, by 6.3.2, S is injective.
  • 201. 6.3: Kernel, Image and Isomorphism 185 The last result provides an alternative proof of the dimen- sion formula in 6.3.3. Let T : V — > W be a linear transfor- mation. Then dim(V/Ker(:T)) = dim(Im(T) by 6.3.7. Prom the formula for the dimension of a quotent space (see 5.3.7), we obtain dim(F) - dim(Ker(T)) = dim(Im(T)), so that dim(Ker(T)) + dim(Im(T)) = dim(V). There is second isomorphism theorem which provides valuable insight into the relation between the sum of two sub- spaces and certain associated quotient spaces. Theorem 6.3.8 If U and W are subspaces of a vector space V, then (u + w)/w ~ u/(unw). Proof We begin by defining a function T : U — > (U + W)/W by the rule T(u) = u + W, where u G U. It is a simple matter to check that T is a linear transformation. Since u + W is a typical vector in (U + W)/W, we see that T is surjective. Next we need to compute the kernel of T. Now T(u) = u + W equals the zero vector of (U + W)/W, i.e., the coset W, precisely when u G W, which is just to say that u G U D W. Therefore Ker(T) = UnW. It now follows directly from 6.3.7 that U/(U n l f ) ~ ( f / + W)/W. We illustrate the usefulness of this last result by using it to give another proof of the dimension formula of 5.3.2. Corollary 6.3.9 // U and W are subspaces of a finite dimensional vector space V, then dim(C/ + W) + dim(U n W) = dim(C7) + dim(W).
  • 202. 186 Chapter Six: Linear Transformations Proof Since isomorphic vector spaces have the same dimension, we have dim((C7 + W)/W) = dxm(U/(U D W)). Now use the formula for the dimension of a quotient space in 5.3.7 to obtain dim(U + W) - dim(W) = dim(U) - dim(U D W), from which the result follows. The algebra of linear operators on a vector space We conclude the chapter by observing that the set of all linear operators on a vector space has certain formal properties which are very similar to properties that have already been seen to hold for matrices. This similarity can be expressed by saying that both systems form what is called an algebra. Consider a vector space V with finite dimension n over a field F. Let Ti and Ti be two linear operators on V. Then we define their sum Ti + T2 by the rule r 1 + T 2 ( v ) = Ti(v)+T2 (v) and also the scalar multiple cTi, where c is an element of F, by cTi(v)=c(Ti(v)). It is quite routine to verify that T + T2 and cT are also linear operators on V. For example, to show that T + T2 is a linear operator we compute Ti + T2(vx + v2) = Ti (vi + v2) + T2(vi + v2) = Ti(vi) + Ti(v2) + T2(Vl) + T2(v2), from which it follows that Ti +T2 (V l + v2) = (Ti +r2 (vi)) + (Ti +T2 (v2 )).
  • 203. 6.3: Kernel, Image and Isomorphism 187 It is equally easy to show that 7 + T2(cv) = c(2 + T2(v)). Thus the set of all linear operators on V, which will henceforth be written L(V), admits natural operations of addition and scalar multiplica- tion. Now there is a further natural operation that can be per- formed on elements of L(V), namely functional composition as defined in 6.1. Thus, if T and T2 are linear operators on V', then the composite T oT2, which will in future be written TiT2, is defined by the rule TiT2 (v)=Ti(T2 (v)). One has of course to check that TiT2 is actually a linear trans- formation, but again this is quite routine. So one can also form products in the set L(V). To illustrate these definitions, we consider an explicit ex- ample where sums, scalar multiples and products can be com- puted. Example 6.3.4 Let Ti and T2 be the linear operators on -Doo[a, b] defined by Ti(/) = f - f and T2(/) = xf" - 2/'. The linear opera- tors Ti + T2, cT and TiT2 may be found directly from the definitions as follows: Ti + r2(/) = r1(/) + T2(/) = / ' - / + x/"-2/' = -f-f' + xf". Also cT1(f) = cf'-cf
  • 204. 188 Chapter Six: Linear Transformations and TiT2(/) = T1(T2(f))=T1(xf" - 2/') = ( * / " - 2 / ' ) ' - ( * / " - 2 / ' ) , which reduces to TiT2(/) = 2/' - (x + 1)/" + xf^ after evaluation of the derivatives. At this point one can sit down and check that those properties of matrices listed in 1.2.1 which relate to sums, scalar multiples and products are also valid for linear oper- ators. Thus there is a similarity between the set of linear operators L(V) and Mn(F), the set o f n x n matrices over F where n = dim(V). This similarity should come as no sur- prise since the action of a linear operator can be represented by multiplication by a suitable matrix. The relation between L(V) and Mn(F) can be formalized by defining a new type of algebraic structure. This involves the concept of a ring, which was was described in 1.3, and that of a vector space. An algebra A over a field F is a set which is simultane- ously a ring with identity and a vector space over F, with the same rule of addition and zero element, which satisfies the additional axiom c(xy) = (cx)y = x(cy) for all x and y in A and all c in the field F. Notice that this axiom holds for the vector space Mn(F) because of property (j) in 1.2.1. Hence Mn{F) is an algebra over F. Now the additional axiom is also valid in L(V), that is, c(TlT2) = (cT1)T2 = Tl(cT2). This is true because each of the three linear operators men- tioned sends the vector v to c(Ti(T2(v))). It follows that
  • 205. 6.3: Kernel, Image and Isomorphism 189 L(V), the set of all linear operators on a vector space V over a field F, is an algebra over F. Suppose now that we pick and fix an ordered basis B for the finite-dimensional vector space V. Then, with respect to B, a linear operator T on V is represented by an n x n matrix, which will be denoted by M(T). By 6.2.3 the matrix M(T) has the property [T(v)]B = M(r)[v]B. It follows from 6.2.3 that the assignment of the matrix M(T) to a linear operator T determines a bijective function from L(V) to Mn(F). The essential properties of this function are summarized in the next result. Theorem 6.3.10 Let Ti and T2 be linear operators on an n-dimensional vector space V and let M(Ti) denote the matrix representing Ti with respect to a fixed ordered basis B of V. Then the following equations hold: (i) M(Ti + T2) = M(Ti) + M(T2); (ii) M{cT)=cM(T); (iii) M(TiT2) = M(Ti)M(T2) for all scalars c. It is as well to restate this technical result in words to make sure that the reader grasps what is being asserted. Ac- cording to part (i) of the theorem, if we add linear operators T and T2, the resulting linear operator T + T2 is represented by a matrix which is the sum of the matrices that represent T and T2. Also (ii) asserts that the scalar multiple cTi is represented by a matrix which is just c times the matrix rep- resenting T.
  • 206. 190 Chapter Six: Linear Transformations More unexpectedly, when we compose the linear opera- tions Ti and T2, the resulting linear operator T1T2 is repre- sented by the product of the matrices representing T and T2 • In technical language, the function which sends T to M(T) is an algebra isomorphism from L(V) to Mn(F). The main point here is that isomorphic algebras, like isomorphic vector spaces, are to be regarded as similar objects, which exhibit the same essential features, even although their un- derlying sets may be quite different. In conclusion, our vague feeling that the algebras L(V) and Mn(F) are somehow quite closely related is made precise by the assertion that the algebra of all linear operators on an n- dimensional vector space over a field F is isomorphic with the algebra of all n x n matrices over F. Example 6.3.5 Prove part (iii) of Theorem 6.3.10. Let v be any vector of the vector space; then, using the fundamental equation [T(v)]s = M(T)[v]s, we obtain [TiT2{-v)}B = M(T1)[r2(v)]s = M(T1)(M(T2)[v]s) = M(7i)M(r2 )[v]s , which shows that M{TXT2) = M(Ti)M(T2), as required. Exercises 6.3 1. Find bases for the kernel and image of the following linear transformations: (a) T : R4 — > • R where T sends a column to the sum of its entries; (b) T : P3(R) - P3(R) where T(f) = /'; < = ) T : R ^ R ° where r ( ( * ) ) = ( £ ; $ ) .
  • 207. 6.3: Kernel, Image and Isomorphism 191 2. Show that every subspace U of a finite-dimensional vector space V is the kernel and the image of suitable linear operators on V. [Hint: assume that U is non-zero, choose a basis for U and extend it to a basis of V]. 3. Sort the following vector spaces into batches, so that those within the same batch are isomorphic: R 6 , R 6 , C6 , P6(C),M2,3(R),C[0,1]. 4. Show that a linear transformation T : V — > W is injective if and only if it has the property of mapping linearly independent subsets of V to linearly independent subsets of W. 5. Show that a linear transformation T : V —• W is surjec- tive if and only if it has the property of mapping any set of generators of V to a set of generators of W. 6. A linear operator on a finite-dimensional vector space is an isomorphism if and only if some representing matrix is invertible: prove or disprove. 7. Prove that the composite of two linear transformations is a linear transformation. 8. Prove parts (i) and (ii) of Theorem 6.3.10. 9. Let T : V — > W and S : W — > U be isomorphisms of vector spaces; show that the function ST : V —> U is also an isomorphism. 10. Let T be a linear operator on a finite-dimensional vector space V. Prove that the following statements about T are equivalent: (a) T is injective; (b) T is surjective; (c) T is an isomorphism. Are these statements still equivalent if V is infinitely gener- ated? 11. Show that similar matrices have the same rank. [Use the fact that similar matrices represent the same linear operator].
  • 208. 192 Chapter Six: Linear Transformations 12. (The third isomorphism theorem). Let U and W be sub- spaces of a vector space V such that W C. U. Prove that U/W is a suhspace oiV/W and that (V/W)/(U/W) ~ V/U. [Hint: define a function T : V/W -> V/U by the rule T(v + W) = v + ?7. Show that T is a well defined linear transformation and apply 6.3.7]. 13. Explain how to define a power Tm of a linear operator T on a vector space V, where m > 0. Then show that powers of T commute.
• 209. Chapter Seven ORTHOGONALITY IN VECTOR SPACES

The notion of two lines being perpendicular, or orthogonal, is very familiar from analytical geometry. In this chapter we show how to extend the elementary concept of orthogonality to abstract vector spaces over R or C. Orthogonality turns out to be a tool of extraordinary utility with many applications, one of the most useful being the well-known Method of Least Squares. We begin with R^n, showing how to define orthogonality in this vector space in a way which naturally generalizes our intuitive notion of perpendicularity in three-dimensional space.

7.1 Scalar Products in Euclidean Space
Let X and Y be two vectors in R^n, with entries x_1, ..., x_n and y_1, ..., y_n respectively. Then the scalar product of X and Y is defined to be the matrix product

X^T Y = (x_1 x_2 ... x_n) \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = x_1y_1 + x_2y_2 + · · · + x_ny_n.

This is a real number. Notice that X^T Y = Y^T X, so the scalar product is symmetric in X and Y. Of particular interest is the scalar product of X with itself,

X^T X = x_1^2 + x_2^2 + · · · + x_n^2.

193
• 210. 194 Chapter Seven: Orthogonality in Vector Spaces

Since this expression cannot be negative, it has a real square root, which is called the length of X. It is written

||X|| = \sqrt{X^T X} = \sqrt{x_1^2 + x_2^2 + · · · + x_n^2}.

Notice that ||X|| ≥ 0, and ||X|| = 0 if and only if all the x_i are zero, that is, X = 0. So the only vector of length 0 is the zero vector. A vector whose length is 1 is called a unit vector.

At this point it is as well to specialize to R^3, where geometrical intuition can be used. Recall that a 3-column vector X in R^3, with entries x_1, x_2, x_3, is represented by a line segment in three-dimensional space with arbitrary initial point (a_1, a_2, a_3) and endpoint (a_1 + x_1, a_2 + x_2, a_3 + x_3). Thus the length of the vector X is just the length of any representing line segment.

This suggests that we look for a geometrical interpretation of the scalar product of two vectors in R^3.

Theorem 7.1.1
Let X and Y be vectors in R^3. Then X^T Y = ||X|| ||Y|| cos θ, where θ is the angle in the interval [0, π] between line segments representing X and Y drawn from the same initial point.

Proof
Consider the triangle rule for adding the vectors X and Y - X in the triangle IAB, as shown in the diagram below.
• 211. 7.1: Scalar Products in Euclidean Space 195

The idea is then to apply the cosine rule to this triangle.

[Diagram: triangle IAB with sides represented by X, Y and Y - X.]

Thus we have AB^2 = IA^2 + IB^2 - 2·IA·IB cos θ, which becomes in vector form

||Y - X||^2 = ||X||^2 + ||Y||^2 - 2||X|| ||Y|| cos θ.

As usual let the entries of X and Y be x_1, x_2, x_3 and y_1, y_2, y_3 respectively. Then

||X||^2 = x_1^2 + x_2^2 + x_3^2, \quad ||Y||^2 = y_1^2 + y_2^2 + y_3^2

and

||Y - X||^2 = (y_1 - x_1)^2 + (y_2 - x_2)^2 + (y_3 - x_3)^2.

Now substitute these expressions in the equation for ||Y - X||^2 and solve for the expression ||X|| ||Y|| cos θ. We obtain after some simplification the required result

||X|| ||Y|| cos θ = x_1y_1 + x_2y_2 + x_3y_3 = X^T Y.
  • 212. 196 Chapter Seven: Orthogonality in Vector Spaces The formula of 7.1.1 allows us to calculate quickly the angle 6 between two non-zero vectors X and Y; for it yields the equation XT Y Hence the vectors X and Y are orthogonal if and only if XT Y = 0. There is another more or less immediate use for the for- mula of 7.1.1. Since cos 6 always lies between —1 and +1, we can derive a famous inequality. Theorem 7.1.2 (The Cauchy - Schwartz Inequality) If X and Y are any vectors in R3 , then XT Y < IIXII ||Y||. Projection of a vector on a line Let X and Y be two vectors in R3 with Y non-zero. We wish to define the projection of X on Y. Now any vector parallel to Y will have the form c Y for some scalar c. The idea is to try to choose c in such a way that the vector X — cY
• 213. 7.1: Scalar Products in Euclidean Space 197

is orthogonal to Y. For then cY will be the projection of X on Y, as one sees from the diagram. The condition for X - cY to be orthogonal to Y is

0 = (X - cY)^T Y = X^T Y - cY^T Y = X^T Y - c||Y||^2.

The correct value of c is therefore X^T Y / ||Y||^2 and the vector projection of X on Y is

P = (X^T Y / ||Y||^2) Y.

The scalar projection of X on Y is the length of P, that is,

||P|| = |X^T Y| / ||Y||.

We will see in 7.2 how to extend this concept to the projection of a vector on an arbitrary subspace.

Example 7.1.1
Consider vectors X and Y in R^3 with ||X|| = √6, ||Y|| = √14 and X^T Y = 2 - 3 + 2 = 1. The angle θ between X and Y is therefore given by cos θ = 1/√84, and θ is approximately 83.74°. The vector projection of X on Y is (1/14)Y and the scalar projection is 1/√14.
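A short NumPy sketch computes the angle and both projections; the example vectors below are my own choice, picked so that they reproduce the norms and scalar product quoted above (any pair with X^T Y = 1, ||X|| = √6, ||Y|| = √14 would do):

```python
import numpy as np

# Example vectors chosen so that ||X|| = sqrt(6), ||Y|| = sqrt(14) and X.Y = 1
X = np.array([2.0, -1.0, 1.0])
Y = np.array([1.0, 3.0, 2.0])

cos_theta = X @ Y / (np.linalg.norm(X) * np.linalg.norm(Y))
theta_deg = np.degrees(np.arccos(cos_theta))
print(theta_deg)                               # approximately 83.74 degrees

vector_proj = (X @ Y / (Y @ Y)) * Y            # (X^T Y / ||Y||^2) Y = (1/14) Y
scalar_proj = abs(X @ Y) / np.linalg.norm(Y)   # 1 / sqrt(14)
print(vector_proj, scalar_proj)
```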
  • 214. 198 Chapter Seven: Orthogonality in Vector Spaces The distance of a point from a plane As an illustration of the usefulness of these ideas, we will find a formula for the shortest distance between the point ($0, J/0) ZQ) and the plane whose equation is ax + by + cz = d. First we need to recall a few basic facts about planes. Suppose that (xi, yi, z) and (x2, j/25 ^2) &re two points on the given plane. Then ax +byi +cz± = d = ax2 + by2 + cz2, so that a(xi - x2) + b(yi - y2) + c{zx - z2) = 0. Now this equation asserts that the vector N = is orthogonal to the vector with entries xi—X2,yi—y2, Z1—Z2, and hence to every vector in the plane. N (*2- 72. Z2) ( * i . y i . * i ) Thus iV is a normal vector to the plane ax + by + cz = d, which is a familiar fact from the analytical geometry of three- dimensional space.
• 215. 7.1: Scalar Products in Euclidean Space 199

We are now in a position to calculate the shortest distance l from the point (x_0, y_0, z_0) to the plane. Let (x, y, z) be a point in the plane, and write

X = \begin{pmatrix} x \\ y \\ z \end{pmatrix} and X_0 = \begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix}.

Then l is simply the scalar projection of X_0 - X on N, as may be seen from the diagram below.

[Diagram: the point (x_0, y_0, z_0), a point (x, y, z) in the plane and the normal vector N.]

Therefore

l = |(X_0 - X)^T N| / ||N||.

Now

(X_0 - X)^T N = a(x_0 - x) + b(y_0 - y) + c(z_0 - z) = ax_0 + by_0 + cz_0 - d,

for ax + by + cz = d since the point (x, y, z) lies in the plane. Thus we arrive at the formula

l = |ax_0 + by_0 + cz_0 - d| / \sqrt{a^2 + b^2 + c^2}.
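As a quick check of this distance formula, here is a minimal NumPy sketch; the plane and the point are arbitrary choices:

```python
import numpy as np

a, b, c, d = 1.0, -3.0, 4.0, 12.0        # the plane x - 3y + 4z = 12 (arbitrary choice)
x0, y0, z0 = 2.0, 1.0, 0.0               # an arbitrary point

N = np.array([a, b, c])                  # normal vector to the plane
distance = abs(a*x0 + b*y0 + c*z0 - d) / np.linalg.norm(N)
print(distance)                          # shortest distance from the point to the plane
```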
• 216. 200 Chapter Seven: Orthogonality in Vector Spaces

Vector products in R^3
In addition to the scalar product, there is another well-known construction in R^3 called the vector product. This is defined in the following manner. Suppose that

X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} and Y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}

are two vectors in R^3. Then the vector product of X and Y, written X × Y, is defined to be the vector

X × Y = \begin{pmatrix} x_2y_3 - x_3y_2 \\ x_3y_1 - x_1y_3 \\ x_1y_2 - x_2y_1 \end{pmatrix}.

Notice that each entry of this vector is a 2 x 2 determinant. Because of this, the vector product is best written as a 3 x 3 determinant. Following a commonly used notation, let us write i, j, k for the vectors of the standard basis of R^3. Thus

i = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad j = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad k = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.

Then the vector product X × Y can be expressed in the form

X × Y = (x_2y_3 - x_3y_2)i + (x_3y_1 - x_1y_3)j + (x_1y_2 - x_2y_1)k.

This expression is a row expansion of the 3 x 3 determinant

X × Y = \begin{vmatrix} i & j & k \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{vmatrix}.
  • 217. 7.1: Scalar Products in Euclidean Space 201 Here the determinant is evaluated by expanding along row 1 in the usual manner. Example 7.1.2 The vector product of X = 1 x 7 = 1 - 1 which becomes on expansion 14' X x Y = 14i + 8j - 5k = | 8 The importance of the vector product X xY arises from the fact that it is orthogonal to each of the vectors X and Y; thus it is represented by a line segment that is normal to the plane containing line segments corresponding to X and Y, in case these are not parallel. To see this we can simply form the scalar product of X x Y in turn with X and Y. For example, XT (X xY) = Since rows 1 and 2 are identical, this is zero by a basic property of determinants (3.2.2). In fact the vectors X, Y, X x Y form a right-handed system in the sense that their directions correspond to the thumb and first two index fingers of the right hand when held extended.
  • 218. 202 Chapter Seven: Orthogonality in Vector Spaces Theorem 7.1.3 If X and Y are vectors in R3 ; the vector X xY is orthogonal to both X and Y, and the three vectors X, Y, X x Y form a right-handed system. The length of the vector product, like the the scalar prod- uct, is a number with geometrical significance. Theorem 7.1.4 If X and Y are vectors in R3 and 9 is the angle in the interval [0,7r] between X and Y, then X xY = X Ysm 9. Proof We compute the expression ||X||2 ||y||2 — X x Y2 , by sub- stituting ||X||2 = x + x + x2 3, Y2 = y2 + yj + y2 and X x Y2 = (x2y3 - x3y2)2 + (x3yx - xxy^)2 + (xxy2 - x2yi)2 • After expansion and cancellation of some terms, we find that ||X||2 ||F||2 - X x Y2 = (xlVl + x2y2 + x3y3)2 = (XT Y)2 . Therefore, by 7.1.1, ||X||2 ||F||2 -X x Y2 = ||X||2 ||y||2 cos2 ^. Consequently X x Y2 = X2 ||F||2 sin2 ^. Finally, take the square root of each side, noting that the positive sign is correct since sin 9 > 0 in the interval [0, n]. Theorem 7.1.4 provides another geometrical interpreta- tion of the vector product X x Y. For ||X x Y is simply the area of the parallelogram IPRQ formed by line segments representing the vectors X and Y. Indeed the area of this
  • 219. 7.1: Scalar Products in Euclidean Space 2Uo parallelogram equals (IQ sin 9)IP = X Y sin 6 = X x Y. Q Orthogonality in Rn Having gained some insight from R3 , we are now ready to define orthogonality in n-dimensional Euclidean space. Let X and Y be two vectors in Rn . Then X and Y are said to be orthogonal if XT Y = 0. This a natural extension of orthogonality in R3 . It follows from the definition that the zero vector is orthogonal to every vector in Rn and that no non-zero vector can be orthogonal to itself: indeed XT X = x + x + • • • + x2 n > 0 if X ^ 0. It turns out that the inequality of 7.1.2 is valid for Rn . Theorem 7.1.5 (Cauchy - Schwartz Inequality) If X and Y are vectors in Rn , then XT Y < 11X11 iiyii. We shall not prove 7.1.5 at this stage since a more general fact will be established in 7.2: see however Exercise 7.1.10.
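The orthogonality and area statements of Theorems 7.1.3 and 7.1.4 can be verified numerically; a minimal NumPy sketch with arbitrarily chosen vectors follows:

```python
import numpy as np

X = np.array([2.0, -1.0, 3.0])
Y = np.array([0.0, 4.0, 1.0])            # arbitrary example vectors

XxY = np.cross(X, Y)

# X x Y is orthogonal to both X and Y (Theorem 7.1.3)
assert np.isclose(X @ XxY, 0.0) and np.isclose(Y @ XxY, 0.0)

# ||X x Y|| = ||X|| ||Y|| sin(theta) is the area of the parallelogram (Theorem 7.1.4)
cos_t = X @ Y / (np.linalg.norm(X) * np.linalg.norm(Y))
area = np.linalg.norm(X) * np.linalg.norm(Y) * np.sqrt(1 - cos_t**2)
assert np.isclose(area, np.linalg.norm(XxY))
print(XxY, area)
```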
  • 220. 204 Chapter Seven: Orthogonality in Vector Spaces Because of 7.1.5 it is meaningful to define the angle between two non-zero vectors X and Y in Rn to be the angle 9 in the interval [0, ir] such that XT Y An important consequence of 7.1.5 is Theorem 7.1.6 (The Triangle Inequality) If X and Y are vectors in Rn , then X + Y < ||X|| + ||y||. Proof Let the entries of X and Y be x,... ,xn and y±,... ,yn re- spectively. Then ||X + r||2 = (X + Yf{X + Y) = XT X +XT Y + YT X + YYT and, since XT Y = YT X, this equals ||X||2 + ||y||2 + 2XT F. By the Cauchy-Schwartz Inequality XT Y < X Y, so it follows that X + Y2 < X2 + Y2 + 2X Y = (X + ||F||)2 , which yields the desired inequality.
  • 221. 7.1: Scalar Products in Euclidean Space 205 When n = 3, the assertion of 7.1.6 is just the well-known fact that the sum of the lengths of two sides of a triangle is never less than the length of the third side, as can be seen from the triangle rule of addition for the vectors X and Y. ""£> Complex matrices and orthogonality in Cn It is possible to define a notion of orthogonality in the complex vector space C™, a fact that will be important in Chapter Eight. However, a crucial change in the definition must be made. To see why a change is necessary, consider the complex vector X = ( 7* )• Then XT X = - 1 + 1 = 0. Since it does not seem reasonable to allow a non-zero vector to have length zero, we must alter the definition of a scalar product in order to exclude this phenomenon. First it is necessary to introduce a new operation on com- plex matrices. Let A be an m x n matrix over the complex field C. Define the complex conjugate A of A to be the m xn matrix whose (i,j) entry is the complex conjugate of the (i,j) entry of A. Then define the complex transpose of A to be the transpose of the complex conjugate A* = (Af.
  • 222. 206 Chapter Seven: Orthogonality in Vector Spaces For example, if A = then 4 —v/-l 3 1 + ^/^T - 4 1 - J=l J ' Usually it is more appropriate to use the complex transpose when dealing with complex matrices. In many ways the com- plex transpose behaves like the transpose; for example, there is the following fact. Theorem 7.1.7 If A and B are complex matrices, then (AB)* = B*A*. This follows at once from the equations (AB) — (A)(B) and (AB)T = BT AT . Now let us use the complex transpose to define the com- plex scalar product of vectors X and Y in Cn ; this is to be X*Y = xtyi + --- + xnyn, which is a complex number. Why is this definition any better than the previous one? The reason is that, if we define the length of the vector X in the natural way as X = VX*X = x /|x1 |2 + --- + |xn|2 , then ||X|| is always a non-negative real number, and it can- not equal 0 unless X is the zero vector. It is an important consequence of the definition that Y*X equals the complex conjugate of X*Y, so the complex scalar product is not sym- metric in X and Y.
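A brief NumPy sketch with arbitrary complex vectors illustrates the complex transpose and the complex scalar product, including the fact that Y*X is the complex conjugate of X*Y and that the length defined from X*X is a non-negative real number:

```python
import numpy as np

X = np.array([1 + 2j, 3j, 2 - 1j])
Y = np.array([2 - 1j, 1 + 1j, 4j])        # arbitrary vectors in C^3

def star(A):
    """Complex transpose: take the complex conjugate, then transpose."""
    return np.conjugate(A).T

inner_XY = star(X) @ Y                     # X*Y
inner_YX = star(Y) @ X                     # Y*X
assert np.isclose(inner_YX, np.conjugate(inner_XY))

length_X = np.sqrt((star(X) @ X).real)     # ||X|| = sqrt(X*X), a non-negative real number
print(inner_XY, length_X)
```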
  • 223. 7.1: Scalar Products in Euclidean Space 207 It remains to define orthogonality in Cn . Two vectors X and Y in Cn are said to be orthogonal if X*Y = 0. We now make the blanket assertion that all the results estab- lished for scalar products in Rn carry over to complex scalar products in Cn . In particular the Cauchy-Schwartz and Tri- angle Inequalities are valid. Exercises 7.1 (~2 ( x 1. Find the angle between the vectors I 4 I and I —2 V 3 / V 3. 2. Find the two unit vectors which are orthogonal to both of the vectors I 3 J and I 1 - i / V1 . 3. Compute the vector and scalar projections of "!)on (i. 4. Show that the planes x — 3y + 4z = 12 and 2x — 6y + 8z = 6 are parallel and then find the shortest distance between them. ( 2 f° 5. If X = I —1 J and Y = 4 , find the vector product V 3/ W X x Y. Hence compute the area of the parallelogram whose vertices have the following coordinates: (1, 1, 1), (3, 0, 4), (1, 5, 3), (3, 4, 6). 6. Establish the following properties of the vector product: (a) X x X = 0; (b) X x (Y + Z) = X x Y + X x Z; (c)XxY = -YxX; (d) Xx(cY) = c{XxY) = (cX)xY.
  • 224. 2L)o Chapter Seven: Orthogonality in Vector Spaces 7. If X, Y, Z are vectors in R3 , prove that XT (Y x Z) = YT (Z xX) = ZT (X x Y). (This is called the scalar triple product of X, Y, Z). Then show that that the absolute value of this number equals the volume of the parallelopiped formed by line segments repre- senting the vectors X, Y, Z drawn from the same initial point. 8. Use Exercise 7 to find the condition for the three vectors X, Y, Z to be represented by coplanar line segments. 9. Show that the set of all vectors in Rn which are orthog- onal to a given vector X is a subpace of Rn . What will its dimension be? 10. Prove the Cauchy-Schwartz Inequality for Rn . [Hint: compute the expression ||X||2 ||y||2 — |XT F|2 and show that it is is non-negative]. 11. Find the most general vector in C3 which is orthogonal to both of the vectors ( -s* ( x 2 + 7=T and 1 . V 3 ) J=2) 12. Let A and B be complex matrices of appropriate sizes. Prove the following statements: (a)(i)T = (W); (b)(A + B)* = A*+B*; (c)(A*)* = A. 13. How should the vector projection of X on Y be defined inC3 ?
  • 225. 7.2: Inner Product Spaces 209 14. Show that the vector equation of the plane through the point (xo, yo, ZQ) with normal vector N is {X - X0)T N = 0 where X and XQ are the vectors with entries x, y, z and xo, y0, z0, respectively. 15. Prove the Cauchy-Schwartz Inequality for complex scalar products in Cn . 16. Prove the Triangle Inequality for complex scalar products i n C n . 17. Establish the following expression for the vector triple product in R3 : X x (Y x Z) = (X • Z)Y - (X • Y)Z. [Hint: note that the vector on the right hand side is orthogonal to to both X and Y x Z. 7.2 Inner Product Spaces We have seen how to introduce the notion of orthogonal- ity in the vector spaces Rn and Cn for arbitrary n. But what about other vector spaces such as vector spaces of polynomials or continuous functions? It turns out that there is a general concept called an inner product which is a natural extension of the scalar products in Rn and Cn . This allows the intro- duction of orthogonality in arbitrary real and complex vector spaces. Let V be a real vector space, that is, a vector space over R. An inner product on V is a rule which assigns to each pair of vectors u and v of V a real number < u, v >, their inner product, such that the following properties hold: (i) < v, v > > 0 and < v, v > = 0 if and only if v = 0; (ii) < u, v > = < v, u >; (iii)< cu + dv, w >= c < u, w > + <i < v, w > .
  • 226. 2 1 0 Chapter Seven: Orthogonality in Vector Spaces The understanding here is that these properties must hold for all vectors u, v, w and all real scalars c, d. We now give some examples of inner products, the first one being the scalar product, which provided the original mo- tivation. Example 7.2.1 Define an inner product < > on Rn by the rule < X, Y > = XT Y. That this is an inner product follows from the laws of matrix algebra, and the fact that XT X is non-negative and equals 0 only if X = 0. This inner product will be referred to as the standard inner product on Rn . It should be borne in mind that there are other possible inner products for this vector space; for example, an inner product on R3 is defined by < X, Y >= 2xxv + 3x2y2 + 4z3?/3 where X and Y are the vectors with entries x, X2, x$ and y±, J/2; 2/3 respectively. The reader should verify that the axioms for an inner product hold in this case. Example 7.2.2 Define an inner product < > on the vector space C[a,b] by the rule <f,9>= / f(x)g(x)dx. J a This is very different type of inner product, which is im- portant in the theory of orthogonal functions. Well-known properties of integrals show that the requirements for an in- ner product are satisfied. For example, rb < / , / > = / f(x)2 dx>0 Ja
• 227. 7.2: Inner Product Spaces 211

since f(x)^2 ≥ 0; also, if we think of the integral as the area under the curve y = f(x)^2, then it becomes clear that the integral cannot vanish unless f(x) is identically equal to zero in [a, b].

Example 7.2.3
Define an inner product on the vector space P_n(R) of all real polynomials in x of degree less than n by the rule

< f, g > = \sum_{i=1}^{n} f(x_i)g(x_i),

where x_1, ..., x_n are distinct real numbers. Here it is not so clear why the first requirement for an inner product holds. Note that

< f, f > = \sum_{i=1}^{n} f(x_i)^2 ≥ 0;

also the only way that this sum can vanish is if f(x_1) = ... = f(x_n) = 0. But f is a polynomial of degree at most n - 1, so it cannot have n distinct roots unless it is the zero polynomial.

Orthogonality in inner product spaces
A real inner product space is a vector space V over R together with an inner product < > on V. It will be convenient to speak of "the inner product space V", suppressing mention of the inner product where this is understood. Thus "the inner product space R^n" refers to R^n with the scalar product as inner product: this is called the Euclidean inner product space.

Two vectors u and v of an inner product space V are said to be orthogonal if < u, v > = 0.
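Both kinds of inner product are easy to experiment with. The sketch below is a minimal illustration with arbitrarily chosen polynomials and sample points, using SymPy for the integral inner product of Example 7.2.2 and NumPy for the evaluation inner product of Example 7.2.3:

```python
import numpy as np
from sympy import symbols, integrate

x = symbols('x')
f = 1 + x          # two arbitrary polynomials, viewed as functions in C[0, 1]
g = x**2

# Example 7.2.2: <f, g> = integral of f(x) g(x) over [a, b], here [0, 1]
inner_integral = integrate(f * g, (x, 0, 1))
print(inner_integral)                 # 7/12

# Example 7.2.3: <f, g> = sum over distinct sample points x_i of f(x_i) g(x_i)
points = np.array([-1.0, 0.0, 2.0])   # n = 3 distinct points, for polynomials of degree < 3
f_vals = 1 + points
g_vals = points**2
inner_eval = np.sum(f_vals * g_vals)
print(inner_eval)                     # 0*1 + 1*0 + 3*4 = 12.0
```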
  • 228. 212 Chapter Seven: Orthogonality in Vector Spaces It follows from the definition of an inner product that the zero vector is orthogonal to every vector and no non-zero vector can be orthogonal to itself. Example 7.2.4 Show that the functions sin x, m = 1,2,..., are mutually orthogonal in the inner product space C[0, n] where the inner product is given by the formula < f,g > = JQ f(x)g(x)dx. We have merely to compute the inner product of sin mx and sin nx : r < sin mx, sin nx > = sin mx sin nx dx. Jo Now, according to a well-known trigonometric identity, sinmx sin nx = -(cos(m — n)x — cos(m + n)x). Therefore, on evaluating the integrals, we obtain as the value of < sin mx, sin nx > [ — r sin(m — n)x — — sin(m -f n)x)7. — 0, L 2(m-n) v ; 2(m + n) v ; J0 provided m ^ n. This is a very important set of orthogonal functions which plays a basic role in the theory of Fourier series. If v is a vector in an inner product space V, then < v,v > > 0, so this number has a real square root. This allows us to define the norm of v to be the real number ||v|| = V< v,v>. Thus ||v|| > 0 and ||v|| equals zero if and only if v = 0. A vector with norm 1 is called a unit vector. It is clear that norm is a generalization of length in Euclidean space.
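The orthogonality of the sine functions in Example 7.2.4 can also be checked by numerical integration; a minimal sketch using SciPy (the choice m = 2, n = 5 is arbitrary):

```python
import numpy as np
from scipy.integrate import quad

def inner(f, g):
    """Inner product <f, g> = integral of f(x) g(x) over [0, pi]."""
    value, _ = quad(lambda x: f(x) * g(x), 0.0, np.pi)
    return value

m, n = 2, 5                                   # any two distinct positive integers
sin_m = lambda x: np.sin(m * x)
sin_n = lambda x: np.sin(n * x)

print(inner(sin_m, sin_n))                    # approximately 0: the functions are orthogonal
print(inner(sin_m, sin_m))                    # approximately pi/2
```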
  • 229. 7.2: Inner Product Spaces 213 Example 7.2.5 Find the norm of the function sin mx in the inner product space C[0, IT] of Example 7.2.4. Once again we have to compute an integral: || sin rax||2 = / sin2 mx dx = / -(1 — cos 2mx)dx = ir/2. Jo Jo 2 Hence || sin mx = ^/(TT/2). It follows that the functions 2~ sin mx, m = 1, 2,..., n form a set of mutually orthogonal unit vectors. Such sets are called orthonormal and will be studied in 7.3. There is an important inequality relating inner product and norm which has already been encountered for Euclidean spaces. Theorem 7.2.1 (The Cauchy - Schwartz Inequality) Let u and v be vectors in an inner product space. Then | < u, v > | < ||u|| ||v||. Proof We can assume that v ^ 0 or else the result is obvious. Let t denote an arbitrary real number. Then, using the defining properties of the inner product, we find that < u—tv, u—tv > equals < u, u > - < u, v > t- < v, u > t+ < v, v > t2 , which reduces to ||u||2 - 2 < u,v > t+ v2 t2 > 0.
  • 230. 214 Chapter Seven: Orthogonality in Vector Spaces For brevity write a = ||v||2 , b = < u, v > and c = ||u||2 . Thus at2 - 2bt + c =< u - tv, u - tv > > 0. To see what this implies, complete the square in the usual manner; at2_2bt + c = a((t--)2 + (--^)). a a a2 ' Since a > 0 and the expression on the left hand side of the equation is non-negative for all values of t, it follows that c/a > b2 /a2 , that is, b2 < ac. On substituting the values of a, b and c, and taking the square root, we obtain the desired inequality. Example 7.2.6 If 7.2.1 is applied to the vector space C[a, b] with the inner product specified in Example 7.2.2, we obtain the inequality f f(x)g(x)dx < ( f f(x)2 dx)1/2 ( / g(x)2 dx)1/2 . a J a J a Normed linear spaces The next step in our series of generalizations is to extend the notion of length of a vector in Euclidean space. Let V denote a real vector space. By a norm on V is meant a rule which assigns to each vector v a real number ||v||, its norm, such that the following properties hold: (i) ||v|| > 0 and ||v|| = 0 if and only if v = 0; (ii) ||cv|| = c ||v||; (iii) ||u +v|| < ||TU.|| + ||v||. (The Triangle Inequality). These are to hold for all vectors u and v in V and all scalars c. A vector space together with a norm is called a normed linear space.
  • 231. 7.2: Inner Product Spaces 215 We already know an example of a normed linear space; for the length function on Rn is a norm. To see why this is so, we need to remember that the Triangle Inequality was established for the length function in 7.1.6. The reader will have noticed that the term "norm" has already been used in the context of an inner product space. Let us show that these two usages are consistent. Theorem 7.2.2 Let V be an inner product space and define ||v|| = yf< V, V >. Then || || is a norm on V and V is a normed linear space. Proof We need to check the three axioms for a norm. In the first place, ||v|| = y < v, v > > 0, and this cannot vanish unless v = 0, by the definition of an inner product. Next, if c is a scalar, then ||cv|| = y/< cv, cv > = /{c2 < v, v >) = c |v||. Finally, the Triangle Inequality must be established. By the defining properties of the inner product: ||u + v||2 = < u + v, u + v > = ||u||2 + 2 < u, v > +||v||2 , which, by 7.2.1, cannot exceed ||u||2 + 2||u||||v|| + ||vf = (||u|| + ||v||)2 . On taking square roots, we derive the required inequality. Theorem 7.2.2 enables us to give many examples of normed linear spaces.
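Theorem 7.2.2 can be checked numerically for any particular inner product. The sketch below uses the non-standard inner product on R^3 mentioned in Example 7.2.1, < X, Y > = 2x_1y_1 + 3x_2y_2 + 4x_3y_3, with arbitrary test vectors, and verifies the Triangle and Cauchy-Schwartz Inequalities for the induced norm:

```python
import numpy as np

W = np.diag([2.0, 3.0, 4.0])                 # weights of the inner product from Example 7.2.1

def inner(X, Y):
    return X @ W @ Y                          # <X, Y> = 2*x1*y1 + 3*x2*y2 + 4*x3*y3

def norm(X):
    return np.sqrt(inner(X, X))               # the norm induced by the inner product

U = np.array([1.0, -2.0, 0.5])                # arbitrary test vectors
V = np.array([3.0, 1.0, -1.0])

assert norm(U + V) <= norm(U) + norm(V) + 1e-12        # the Triangle Inequality
assert abs(inner(U, V)) <= norm(U) * norm(V) + 1e-12   # the Cauchy-Schwartz Inequality
print(norm(U + V), norm(U) + norm(V))
```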
  • 232. 216 Chapter Seven: Orthogonality in Vector Spaces Example 7.2.7 The Euclidean space Rn is a normed linear space if length is taken as the norm. Thus ||X|| = VX^X = /xl+x% + .-- + xl Example 7.2.8 The vector space C[a,b] becomes a normed linear space if ||/|| is defined to be {f' f(xfdx)1 /2 . J a Example 7.2.9 (Matrix norms) A different type of normed linear space arises if we consider the vector space of all real m x n matrices and introduce a norm on it as follows. If A = [ciijm,ni define A to be m n <E£4)1/2 - On the face of it this is a reasonable measure of the "size" of the matrix. But of course one has to show that this is really a norm. A neat way to do this is as follows: put A equal to the ran-column vector whose entries are the elements of A listed by rows. The key point to note is that A is just the length of the vector A in Rm n . It follows at once that || || is a norm since we know that length is a norm. Inner products on complex vector spaces So far inner products have only been defined on real vec- tor spaces. Now it has already been seen that there is a rea- sonable concept of orthogonality in the complex vector space Cn , although it differs from orthogonality in Rn in that a dif- ferent scalar product must be used. This suggests that if an
  • 233. 7.2: Inner Product Spaces 217 inner product is to be defined on an arbitrary complex vector space, there will have to be a change in the definition of the inner product. Let V be a vector space over C. An inner product on V is a rule that assigns to each pair of vectors u and v i n F a complex number < u, v > such that the following rules hold: (i) < v, v > > 0 and < v, v > = 0 if and only if v = 0; (ii) < u,v > = < v,u >; (iii) < cu + dv, w > = c < u, w > +d < v, w > . These are to hold for all vectors u, v, w and all complex scalars c, d. Observe that property (ii) implies that < v,v > is real: for this complex number equals its complex conjugate. A complex vector space which is equipped with an inner product is called a complex inner product space. Our prime example of a complex inner product space is Cn with the complex scalar product < X, Y > = X*Y. To see that this is a complex inner product, we need to note that X*Y = F*X and < cX + dY, Z > = (cX + dY)*Z = cX*Z + dY*Z, which is just c < X, Z > + d < Y, Z > . Provided that the changes implied by the altered condi- tions (ii) and (iii) are made, the concepts and results already established for real inner product spaces can be extended to complex inner product spaces. In addition, results stated for real inner product spaces in the remainder of this section hold for complex inner product spaces, again with the appropriate changes. Orthogonal complements We return to the study of orthogonality in real inner prod- uct spaces. We wish to introduce the important notion of the orthogonal complement of a subspace. Here what we have in
  • 234. 218 Chapter Seven: Orthogonality in Vector Spaces mind as our model is the simple situation in three-dimensional space where the orthogonal complement of a plane is the set of line segments perpendicular to it. Let 5 be a subspace of a real inner product space V. The orthogonal complement of S is defined to be the set of all vectors in V that are orthogonal to every vector in S: it is denoted by the symbol S± . Example 7.2.10 Let S be the subspace of R3 consisting of all vectors of the form (S) where a and b are real numbers. Thus elements of S corre- spond to line segments in the xy-plane. Equally clearly S1 - is the set of all vectors of the form ( • ) • These correspond to line segments along the 2-axis, hardly a surprising conclusion. The most fundamental property of an orthogonal com- plement is that it is a subspace. Theorem 7.2.3 Let S be a subspace of a real inner product space V. Then (a) S1 - is a subspace of V; (b) SnS± = 0; (c) if S is finitely generated, a vector v belongs to S1 if and only if it is orthogonal to every vector in some set of generators of S.
  • 235. 7.2: Inner Product Spaces 219 Proof To show that S1 is a subspace we need to verify that it con- tains the zero vector and is closed with respect to addition and scalar multiplication. The first statement is true since the zero vector is orthogonal to every vector. As for the re- maining ones, take two vectors v and w in S1 -1 , let s be an arbitrary vector in S and let c be a scalar. Then < cv, s > = c < v, s > = 0, and < v + w, s > = < v , s > + < w , s > = 0 . Hence cv and v + w belong to S1 -. Now suppose that v belongs to the intersection S P S1 -. Then v is orthogonal to itself, which can only mean that v = 0. Finally, assume that v i , . . . , vm are generators of S and that v is orthogonal to each v^. A general vector of S has the form YlT=i c iv i f°r s o m e scalars Q . Then m m < V, ^ °iV i > = ^ Q < V, Vj > = 0. i=l i=l Hence v is orthogonal to every vector in S and so it belongs to S-1 . The converse is obvious. Example 7.2.11 In the inner product space ^ ( R ) with < f,9> = / f{x)g{x)dx, Jo find the orthogonal complement of the subspace S generated by 1 and x.
• 236. 220 Chapter Seven: Orthogonality in Vector Spaces

Let f = a_0 + a_1x + a_2x^2 be an element of P_3(R). By 7.2.3, a polynomial f belongs to S^⊥ if and only if it is orthogonal to 1 and x; the conditions for this are

< f, 1 > = \int_0^1 f(x)dx = a_0 + (1/2)a_1 + (1/3)a_2 = 0

and

< f, x > = \int_0^1 xf(x)dx = (1/2)a_0 + (1/3)a_1 + (1/4)a_2 = 0.

Solving these equations, we find that a_0 = t/6, a_1 = -t and a_2 = t, where t is arbitrary. Hence f = t(x^2 - x + 1/6) is the most general element of S^⊥. It follows that S^⊥ is the 1-dimensional subspace generated by the polynomial x^2 - x + 1/6.

Notice in the last example that dim(S) + dim(S^⊥) = 3, the dimension of P_3(R). This is no coincidence, as the following fundamental theorem shows.

Theorem 7.2.4
Let S be a subspace of a finite-dimensional real inner product space V; then V = S ⊕ S^⊥ and dim(V) = dim(S) + dim(S^⊥).

Proof
According to the definition in 5.3, we must prove that V = S + S^⊥ and S ∩ S^⊥ = 0. The second statement is true by 7.2.3, but the first one requires proof. Certainly, if S = 0, then S^⊥ = V and the result is clear. Having disposed of this case, we may assume that S is non-zero and choose a basis v_1, ..., v_m for S. Extend this basis of
  • 237. 7.2: Inner Product Spaces 221 S to a basis of V, say v i , . . . , vTO, v m + i , . . . , vn : this possible by 5.1.4. If v is an arbitrary vector of V, we can write n By 7.2.3 the vector v belongs to S1 - if and only if it is orthog- onal to each of the vectors v i , . . . , vm ; the conditions for this are n < v^ v > = 2_. < Vj, Vj > Cj = 0, for i = 1, 2,..., m. Now the above equations constitute a linear system of m equa- tions in the n unknowns ci, C2,..., cn. Therefore the dimen- sion of S1 - equals the dimension of the solution space of the linear system, which we know from 5.1.7 to be n — r where r is the rank of the m x n coefficient matrix A = [< Vj, Vj >]. Obviously r < m; we shall show that in fact r = m. If this is false, then the m rows of A must be linearly dependent and there exist scalars dfa, • • •, dm, not all of them zero, such that m m 0 = J^d» < Vi, Vj > = < J^djVi, Vj- > for j = 1,... ,n. But a vector which is orthogonal to every vector in a basis of V must be zero. Hence Yl'iLi ^iv i = 0' which can only mean that d = d2 = • • • = dm = 0 since v i,---)v m are linearly independent. By this contradiction r = m. We conclude that dim(5± ) = n — m = n — dim(S'), which implies that dim(5, )+dim(5-L ) = n — dim(V). It follows from 5.3.2 that dim(5 + S-1 -) = dim(5) + dim(5'± ) = dim(V).
  • 238. 222 Chapter Seven: Orthogonality in Vector Spaces Hence V = S + S1 -, as required. An important consequence of the theorem is Corollary 7.2.5 If S is a subspace of a finite-dimensional real inner product space V, then (S^ = S. Proof Every vector in S is certainly orthogonal to every vector in S± ; thus S is a subspace of (S-1 )-1 . On the other hand, a computation with dimensions using 7.2.4 yields dim((5± )± ) =dim(V) - d i m ^ 1 ) = dim(V) - (dim(F) - dim(S)) = dim(S) Therefore S={S± )± . Projection on a subspace The direct decomposition of an inner product space into a subspace and its orthogonal complement afforded by 7.2.4 leads to wide generalization of the elementary notion of pro- jection of one vector on another, as described in 7.1. This generalized projection will prove invaluable during the discus- sion of least squares in 7.4. Let V be a finite-dimensional real inner product space, let S be a subspace and let v an element of V. Since V = StSS-1 , there is a unique expression for v of the form v = s + s1 - where s and s-1 belong to S and S1 - respectively. Call s the projection ofv on the subspace S. Of course, s-1 is the projec- tion of v on the subspace S1 . For example, if V is R3 , and
• 239. 7.2: Inner Product Spaces 223

S is the subspace generated by a given vector u, then s is the projection of v on u in the sense of 7.1.

Example 7.2.12
Find the projection of the vector X on the column space of the matrix A, where

X = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} and A = \begin{pmatrix} 1 & 3 \\ 2 & -1 \\ 1 & 4 \end{pmatrix}.

Let S denote the column space of A. Now the columns of A are linearly independent, so they form a basis of S. We have to find a vector Y in S such that X - Y is orthogonal to both columns of A; for then X - Y will belong to S^⊥ and Y will be the projection of X on S. Now Y must have the form

Y = xA_1 + yA_2

for some scalars x and y, where A_1 and A_2 are the columns of A. The conditions for X - Y to belong to S^⊥ are

< X - Y, A_1 > = (1 - x - 3y) + 2(1 - 2x + y) + (1 - x - 4y) = 0

and

< X - Y, A_2 > = 3(1 - x - 3y) - (1 - 2x + y) + 4(1 - x - 4y) = 0.

These equations yield x = 74/131 and y = 16/131. The projection of X on the subspace S is therefore

Y = \frac{1}{131}\begin{pmatrix} 122 \\ 132 \\ 138 \end{pmatrix}.
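The same projection can be computed all at once from the normal equations A^T A c = A^T X, which are simply a matrix restatement of the two orthogonality conditions used above. A minimal NumPy sketch:

```python
import numpy as np

A = np.array([[1.0,  3.0],
              [2.0, -1.0],
              [1.0,  4.0]])
X = np.array([1.0, 1.0, 1.0])

# Solve A^T A c = A^T X for the coefficients of the projection in terms of the columns of A
c = np.linalg.solve(A.T @ A, A.T @ X)
Y = A @ c                                  # the projection of X on the column space of A

print(c)                                   # [74/131, 16/131]
print(Y * 131)                             # [122, 132, 138]

# X - Y is orthogonal to every column of A, so it lies in the orthogonal complement
assert np.allclose(A.T @ (X - Y), 0.0)
```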
  • 240. 2 2 4 Chapter Seven: Orthogonality in Vector Spaces Orthogonality and the fundamental subspaces of a matrix We saw in Chapter Four that there are three natural sub- spaces associated with a matrix A, namely the null space, the row space and the column space. There are of course corre- sponding subspaces associated with the transpose AT , so in all six subspaces may be formed. However there is very little difference between the row space of A and the column space of AT ; indeed, if we transpose the vectors in the row space of A, we get the vectors of the column space of AT . Similarly the vectors in the row space of AT arise by transposing vec- tors in the column space of A. Thus there are essentially four interesting subspaces associated with A, namely, the null and column spaces of A and of AT . These subspaces are connected by the orthogonality relations indicated in the next result. Theorem 7.2.6 Let A be a real matrix. Then the following statements hold: (i) null space of A = (column space of AT )± ; (ii) null space of AT = (column space of A)1 -; (iii) column space of A = (null space of A7 ^)-1 ; (iv) column space of AT = (null space of A)1 -. Proof To establish (i) observe that a column vector X belongs to the null space of A if and only if it is orthogonal to every column of AT , that is, X is in (column space of AT )± . To deduce (ii) simply replace A by AT in (i). Equations (iii) and (iv) follow on taking the orthogonal complement of each side of (ii) and (i) respectively, if we remember that S = (S-1 )1 - by 7.2.5.
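These orthogonality relations are easy to verify for a particular matrix. The sketch below, with an arbitrary singular matrix, checks statement (i) of Theorem 7.2.6: every vector in the null space of A is orthogonal to every column of A^T, that is, to every row of A:

```python
from sympy import Matrix

A = Matrix([[1, 2, 1],
            [2, 4, 0],
            [3, 6, 1]])                 # arbitrary singular example (third row = sum of first two)

for N in A.nullspace():                 # vectors in the null space of A
    for i in range(A.rows):
        row = A.row(i)                  # the rows of A are the columns of A^T, transposed
        assert row.dot(N) == 0          # each null space vector is orthogonal to each row
print("null space of A is orthogonal to the column space of A^T")
```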
  • 241. 7.2: Inner Product Spaces 225 Exercises 7.2 1. Which of the following are inner product spaces? (a) Rn where < X, Y > = -XT Y; (b) Rn where < X, Y > = 2XT Y; (c) C[0,1] where <f,g>= J^(f(x)+g{x))dx. 2. Consider the inner product space C[0, TT] where < /, g > = C f(x)g(x)dx; show that the functions 1/y/n, J2pn cos mx, m — 1,2,..., form a set of mutually orthogonal unit vectors. 3. Let to be a fixed, positive valued function in the vector space C[a, b}. Show that if < /, g > is defined to be b f(x)w(x)g(x)dx, then < > is an inner product on C[a, b]. [Here w is called a weight function]. 4. Which of the following are normed linear spaces? (a) R3 where ||X|| =xl + x%+ xj; (b) R3 where ||X|| = y/x + x - x (c) R where ||X|| = the maximum of |xi|, x2-, %?- 5. Let V be a finite-dimensional real inner product space with an ordered basis v i , . . . , vn . Define a^ to be < v^, Vj >. If A = [dij] and u and w are any vectors of V, show that < u, w > = [u]T A[w] where [ u] is the coordinate vector of u with respect to the given ordered basis. 6. Prove that the matrix A in Exercise 5 has the following properties: (a) XT AX > 0 for all X; (b) XT AX = 0 only if X = 0; (c) A is symmetric. Deduce that A must be non-singular. I
  • 242. 226 Chapter Seven: Orthogonality in Vector Spaces 7. Let A be a real n x n matrix with properties (a), (b) and (c) of Exercise 6. Prove that < X, Y > = XT AY defines an inner product on Rn . Deduce that ||X|| = y/XT AX defines a norm on Rn . 8. Let S be the subspace of the inner product space -Ps(R) generated by the polynomials 1 — x2 and 2 — x + x2 , where < /, g > = fQ f(x)g(x)dx. Find a basis for the orthogonal complement of S. 9. Find the projection of the vector with entries 1, —2, 3 on / l 0 the column space of the matrix 2 —4 3 5 10. Prove the following statements about subspaces S and T of a finite dimensional real inner product space: (a) (S + T)1 =S± DT± ; (b) S1 - = T1 - always implies that S = T; (c) (SDT)1 - = S± +T± . 11. If S is a subspace of a finite dimensional real inner product space V, prove that S1 - ~ V/S. 7.3 Orthonormal Sets and the Gram-Schmidt Process Let V be an inner product space. A set of vectors in V is called orthogonal if every pair of distinct vectors in the set is orthogonal. If in addition each vector in the set is a unit vec- tor, that is, has norm is 1, then the set is called orthonormal. Example 7.3.1 In the Euclidean space R3 the vectors
  • 243. 7.3: Orthonormal Sets and the Gram-Schmidt Process 227 form an orthogonal set since the scalar product of any two of them vanishes. To obtain an orthonormal set, simply multiply each vector by the reciprocal of its length: Example 7.3.2 The standard basis of R n consisting of the columns of the identity matrix l n is an orthonormal set. Example 7.3.3 The functions j2/irsn. mx, m = 1, 2,... form an orthonormal subset of the inner product space C[0, 7r]. For we observed in Examples 7.2.4 and 7.2.5 that these vectors are mutually orthogonal and have norm 1. A basic property of orthogonal subsets is that they are always linearly independent. Theorem 7.3.1 Let V be a real inner product space; then any orthogonal subset of V consisting of non-zero vectors is linearly independent. Proof Suppose that the subset {vi,..., vn } is orthogonal, so that < vi, Vj > = 0 if i / j . Assume that there is a linear relation of the form ciVi + • • • + cn vn = 0. Then, on taking the inner product of both sides with Vj, we get n n 0 = ^ < QVi, Vj > = '^TjCi <Vi,Vj > = Cj < Vj,Vj > i=l i=l II 112 — c • v •
since < vi, vj > = 0 if i ≠ j. Now ||vj|| ≠ 0 since vj is not the zero vector; therefore cj = 0 for all j. It follows that the vj are linearly independent.

This result raises the possibility of an orthonormal basis, and indeed we have already seen in Example 7.3.2 that the standard basis of Rⁿ is orthonormal. While at present there are no grounds for believing that such a basis always exists, it is instructive to record at this stage some useful properties of orthonormal bases.

Theorem 7.3.2
Suppose that {v1, ..., vn} is an orthonormal basis of a real inner product space V. If v is an arbitrary vector of V, then

    v = Σ_{i=1}^{n} < v, vi > vi   and   ||v||² = Σ_{i=1}^{n} < v, vi >².

Proof
Let v = Σ_{i=1}^{n} ci vi be the expression for v in terms of the given basis. Forming the inner product of both sides with vj, we obtain

    < v, vj > = < Σ_{i=1}^{n} ci vi, vj > = Σ_{i=1}^{n} ci < vi, vj > = cj

since < vi, vj > = 0 if i ≠ j and < vj, vj > = 1. Finally,

    ||v||² = < v, v > = < Σ_{i=1}^{n} ci vi, Σ_{j=1}^{n} cj vj > = Σ_{i=1}^{n} Σ_{j=1}^{n} ci cj < vi, vj >,

which reduces to Σ_{j=1}^{n} cj².
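The theorem says that, relative to an orthonormal basis, coordinates are simply inner products and the norm can be read off from them. A minimal numerical illustration, assuming Python with NumPy and an orthonormal basis of R³ chosen for the example:

    import numpy as np

    # An orthonormal basis of R^3 (chosen for illustration)
    v1 = np.array([1.0,  1.0, 0.0]) / np.sqrt(2)
    v2 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)
    v3 = np.array([0.0,  0.0, 1.0])
    basis = [v1, v2, v3]

    v = np.array([3.0, -2.0, 5.0])     # an arbitrary vector

    # Coordinates with respect to the basis are the inner products <v, vi>
    coeffs = [np.dot(v, b) for b in basis]

    # v is recovered as the sum of <v, vi> vi ...
    reconstructed = sum(c * b for c, b in zip(coeffs, basis))
    print(np.allclose(reconstructed, v))                          # True

    # ... and ||v||^2 equals the sum of the squared coefficients
    print(np.isclose(sum(c**2 for c in coeffs), np.dot(v, v)))    # True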
  • 245. 7.3: Orthonormal Sets and the Gram-Schmidt Process 229 Another useful feature of orthonormal bases is that they greatly simplify the procedure for calculating projections. Theorem 7.3.3 Let V be an inner product space and let S be a subspace and v a vector of V. Assume that {si,... , sm} is an orthonormal basis of S. Then the projection of v on S is m ^2<V, Si> Si. 1 = 1 Proof Put p = Y^Li < v )s i > s i> a vector which quite clearly belongs to S. Now < p, s^- > = < v, Sj >, so < V - p , Sj > = < V, Sj > - < p , Sj > = < V, Sj > — < V, Sj > = 0. Hence v — p is orthogonal to each basis element of S, which shows that v — p belongs to 5rJ -. Since v = p + (v — p), and the expression for v as the sum of an element of S and an element of S1 - is unique, it follows that p is the projection of v on S. Example 7.3.4 The vectors form an orthonormal basis of a subspace S of R3 ; find the projection on S of the column vector X with entries 1,-1,1.
Applying 7.3.3 with s1 = X1 and s2 = X2, we find that the projection of X on S is

    P = < X, X1 > X1 + < X, X2 > X2.

Having seen that orthonormal bases are potentially useful, let us now address the problem of finding such bases.

Gram-Schmidt orthogonalization

Suppose that V is a finite-dimensional real inner product space with a given basis {u1, ..., un}; we shall describe a method of constructing an orthonormal basis of V which is known as the Gram-Schmidt process. The orthonormal basis of V is constructed one element at a time.

The first step is to get a unit vector:

    v1 = (1/||u1||) u1.

Notice that u1 and v1 generate the same subspace; let us call it S1. Then v1 clearly forms an orthonormal basis of S1. Next let

    p1 = < u2, v1 > v1.

By 7.3.3 this is the projection of u2 on S1. Thus u2 − p1 belongs to S1⊥ and u2 − p1 is orthogonal to v1. Notice that u2 − p1 ≠ 0 since u1 and u2 are linearly independent. The second vector in the orthonormal basis is taken to be

    v2 = (1/||u2 − p1||) (u2 − p1).

By definition of v1 and v2, these vectors generate the same subspace as u1, u2, say S2. Also v1 and v2 form an orthonormal basis of S2.
The next step is to define

    p2 = < u3, v1 > v1 + < u3, v2 > v2,

which by 7.3.3 is the projection of u3 on S2. Then u3 − p2 belongs to S2⊥ and so it is orthogonal to v1 and v2. Again one must observe that u3 − p2 ≠ 0, the reason being that u1, u2, u3 are linearly independent. Now define the third vector of the orthonormal basis to be

    v3 = (1/||u3 − p2||) (u3 − p2).

Then v1, v2, v3 form an orthonormal basis of the subspace S3 generated by u1, u2, u3.

The procedure is repeated until we have constructed n vectors v1, ..., vn; these will form an orthonormal basis of V. Our conclusions are summarised in the following fundamental theorem.

Theorem 7.3.4 (The Gram-Schmidt Process)
Let {u1, ..., un} be a basis of a finite-dimensional real inner product space V. Define recursively vectors v1, ..., vn by the rules

    v1 = (1/||u1||) u1   and   v_{i+1} = (1/||u_{i+1} − p_i||) (u_{i+1} − p_i),

where

    p_i = < u_{i+1}, v1 > v1 + ··· + < u_{i+1}, v_i > v_i

is the projection of u_{i+1} on the subspace S_i = < v1, ..., v_i >. Then v1, ..., vn form an orthonormal basis of V.

The Gram-Schmidt process furnishes a practical method for constructing orthonormal bases, although the calculations can become tedious if done by hand.
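The recursion of 7.3.4 translates directly into a short program. A minimal sketch, assuming Python with NumPy and the standard inner product on Rᵐ (the function name gram_schmidt is ours); it is applied to the columns of the matrix in Example 7.3.5 below:

    import numpy as np

    def gram_schmidt(vectors):
        """Return an orthonormal basis of the span of the given
        linearly independent vectors, following Theorem 7.3.4."""
        basis = []
        for u in vectors:
            # p = projection of u on the span of the vectors found so far
            p = sum(np.dot(u, v) * v for v in basis)
            w = u - p                      # orthogonal to every earlier v
            basis.append(w / np.linalg.norm(w))
        return basis

    # The columns of the matrix in Example 7.3.5
    X1 = np.array([1.0, 1.0, 1.0, 1.0])
    X2 = np.array([1.0, 2.0, 2.0, 1.0])
    X3 = np.array([2.0, 3.0, 1.0, 6.0])

    Y = gram_schmidt([X1, X2, X3])
    # The resulting vectors are mutually orthogonal unit vectors
    G = np.array([[np.dot(a, b) for b in Y] for a in Y])
    print(np.allclose(G, np.eye(3)))       # True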
  • 248. 232 Chapter Seven: Orthogonality in Vector Spaces Example 7.3.5 Find an orthonormal basis for the column space S of the ma- trix 1 1 2' 1 2 3 1 2 1 .1 1 6. In the first place the columns X, X2, X3 of the matrix are linearly independent and so constitute a basis of S. We shall apply the Gram-Schmidt process to this basis to produce an orthonormal basis {Yi, Y2, Y3} of S, following the steps in the procedure. Now compute the projection of X2 on S =< Y >; Px = <X2, Y1>Yl=3Y1 1 1 lJ The next vector in the orthonormal basis is y 2 = | | X 2 - P 1 l | ( X 2 " P l ) - 2 1 1 -lJ The projection of X3 on S2 =< Yi, Y2 > is P2 = < X3, Yi > Yx+ < X3, Y2>Y2 = 6Y1 - 2Y2 = / 4 2 2 V4/
  • 249. 7.3: Orthonormal Sets and the Gram-Schmidt Process 233 The final vector in the orthonormal basis of S is therefore Y* Example 7.3.6 Find an orthonormal basis of the inner product space P3 (R) where < f,g > is defined to be J_1 f(x)g(x)dx. We begin with the standard basis {1, x, x2 } of Pa(R) and then use the Gram-Schmidt process to construct an orthonor- mal basis {/1, fa, fa}. Since ||1|| = y{J_l x) = /2, the first member of the basis is 1 - 1 1 - 1 Next <x,fx> = f^(x/V2)dx = 0, so px =< x, f1>f1 = 0. Hence since ||x|| = y/(f_1 x2 dx) = w | . Continuing the procedure, we find that < x2 , fi > = x/2/3 and < x2 , f2 > = 0. Hence p2 = < x2 , /1 > / i + < x2 , H > H — 1/3, and so the final vector of the orthonormal basis is U = - (x2 --) = ^(x2 - -) Consequently the polynomials 1 ^ . a n d 3 * 2 - 1 " ! V2' V 2 2V2
  • 250. 234 Chapter Seven: Orthogonality in Vector Spaces form an orthonormal basis of Pa(R). QR-factorization In addition to being a practical tool for computing or- thonormal bases, the Gram-Schmidt procedure has important theoretical implications. For example, it leads to a valuable way of factorizing an arbitrary real matrix. This is generally referred to as QR-factorization from the standard notation for the factors Q and R. Theorem 7.3.5 Let A be a real m x n real matrix with rank n. Then A can be written as a product QR where Q is a real m x n matrix whose columns form an orthonormal set and R is a real nxn upper triangular matrix with positive entries on its principal diagonal. Proof Let V denote the column space of the matrix A. Then V is a subspace of the Euclidean inner product space R m . Since A has rank n, the n columns X,... ,Xn of A are linearly independent, and thus form a basis of V. Hence the Gram- Schmidt process can be applied to this basis to produce an orthonormal basis of V, say Y±,..., Yn. Now we see from the way that the Yi in the Gram-Schmidt procedure are defined that these vectors have the form f Yl=b11X1 < Y2 = b12X1 + b22X2 Yn = binXi + binX2 + • • • + bnnXn for certain real numbers b^ with ba positive. Solving the equations for Xi,.. ., Xn by back-substitution, we get a linear
system of the same general form:

    X1 = r11 Y1
    X2 = r12 Y1 + r22 Y2
    ...
    Xn = r1n Y1 + r2n Y2 + ··· + rnn Yn

for certain real numbers rij, with rii positive again. These equations can be written in matrix form:

    A = [X1 X2 ... Xn] = [Y1 Y2 ... Yn] | r11  r12  ···  r1n |
                                        | 0    r22  ···  r2n |
                                        | ...                |
                                        | 0    0    ···  rnn |

The columns of the m × n matrix Q = [Y1 Y2 ... Yn] form an orthonormal set, since they constitute an orthonormal basis of the column space of A, while the matrix R = [rij]n,n is plainly upper triangular.

The most important case of this theorem is when A is a non-singular square matrix. Then the matrix Q is n × n, and its columns form an orthonormal set; equivalently it has the property

    QᵀQ = In,

which, by 3.3.4, is just to say that Q⁻¹ = Qᵀ. A square matrix A such that Aᵀ = A⁻¹ is called an orthogonal matrix. We shall see in Chapter 9 that orthogonal matrices play an important role in the study of canonical forms of matrices. It is instructive to investigate the possible forms of an orthogonal 2 × 2 matrix.
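Before turning to that question, it is worth noting that in practice QR-factorizations are computed with library routines rather than by hand. A minimal sketch, assuming Python with NumPy; numpy.linalg.qr may return negative diagonal entries in R, so the signs are adjusted below to match the convention of 7.3.5 that the diagonal of R be positive. The matrix is the one from Example 7.3.5:

    import numpy as np

    A = np.array([[1.0, 1.0, 2.0],
                  [1.0, 2.0, 3.0],
                  [1.0, 2.0, 1.0],
                  [1.0, 1.0, 6.0]])       # the matrix of Example 7.3.5

    Q, R = np.linalg.qr(A)                # "reduced" QR: Q is 4x3, R is 3x3

    # Flip signs so that the diagonal of R is positive, as in Theorem 7.3.5
    signs = np.sign(np.diag(R))
    Q, R = Q * signs, signs[:, None] * R

    print(np.allclose(Q.T @ Q, np.eye(3)))   # columns of Q are orthonormal
    print(np.allclose(Q @ R, A))             # A = QR
    print(np.all(np.diag(R) > 0))            # positive diagonal entries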
Example 7.3.7
Find all real orthogonal 2 × 2 matrices.

Suppose that the real matrix

    A = | a  b |
        | c  d |

is orthogonal; thus AᵀA = I2. Equating the entries of the matrix AᵀA to those of I2, we obtain the equations

    a² + c² = 1 = b² + d²   and   ab + cd = 0.

Now the first equation asserts that the point (a, c) lies on the circle x² + y² = 1. Hence there is an angle θ in the interval [0, 2π] such that a = cos θ and c = sin θ. Similarly there is an angle φ in this interval such that b = cos φ and d = sin φ.

Now we still have to satisfy the third equation ab + cd = 0, which requires that

    cos θ cos φ + sin θ sin φ = 0,

that is, cos(φ − θ) = 0. Hence φ − θ = ±π/2 or ±3π/2. We need to solve for b and d in each case. If φ = θ + π/2 or φ = θ − 3π/2, we find that b = −sin θ and d = cos θ. If, on the other hand, φ = θ − π/2 or φ = θ + 3π/2, it follows that b = sin θ and d = −cos θ.

We conclude that A has one of the forms

    | cos θ  −sin θ |      or      | cos θ   sin θ |
    | sin θ   cos θ |              | sin θ  −cos θ |

with θ in the interval [0, 2π]. Conversely, it is easy to verify that such matrices are orthogonal. Thus the real orthogonal 2 × 2 matrices are exactly the matrices of the above types.
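Both families are easily verified numerically: each satisfies QᵀQ = I2, and their determinants are +1 and −1 respectively. A minimal sketch, assuming Python with NumPy:

    import numpy as np

    def rotation(theta):
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    def reflection(theta):
        return np.array([[np.cos(theta),  np.sin(theta)],
                         [np.sin(theta), -np.cos(theta)]])

    for theta in np.linspace(0.0, 2 * np.pi, 7):
        for Q in (rotation(theta), reflection(theta)):
            assert np.allclose(Q.T @ Q, np.eye(2))      # orthogonal
        # determinants distinguish the two families
        assert np.isclose(np.linalg.det(rotation(theta)), 1.0)
        assert np.isclose(np.linalg.det(reflection(theta)), -1.0)
    print("both families are orthogonal")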
  • 253. 7.3: Orthonormal Sets and the Gram-Schmidt Process 237 We remark that these matrices have already appeared in other contexts. The first matrix represents an anticlock- wise rotation in R2 through angle 0: see Example 6.2.6. The second matrix corresponds to a reflection in R2 in the line through the origin making angle 0/2 with the positive IE- direction; see Exercises 6.2.3 and 6.2.6. Thus a connection has been established between 2x2 real orthogonal matrices on the one hand, and rotations and reflections in 2-dimensional Euclidean space on the other. It is worthwhile restating the QR-factorization principle in the important case where the matrix A is invertible. Theorem 7.3.6 Every invertible real matrix A can be written as a product QR where Q is a real orthogonal matrix and R is a real upper tri- angular matrix with positive entries on its principal diagonal. Example 7.3.8 Write the following matrix in the QR-factorized form: A The method is to apply the Gram-Schmidt process to the columns X, X2, X3 of A, which are linearly independent and so form a basis for the column space of A. This yields an orthonormal basis {Yi, Y2, Y3} where ^ = -1 n=^, y -;*UJ-2 T* + T*
  • 254. 238 Chapter Seven: Orthogonality in Vector Spaces and = - 3 ^ X 2 + /2X3. Solving back, we obtain the equations Xx = v^Fi X2 = 4 ^ / 3 Yx + V6/3 Y2 X3 = 2VSYX + x/6/2 Y2 + V2/2 Y3 Therefore A = QR where fl/y/3 -1/V6 l/v7 ^ Q = l/>/3 2//6 0 lA/3 -1/V6 - l / A and (y/3 4/V3 2^3 R = 0 /6/3 /6/2 . 0 0 V2/2J Unitary matrices We point out, without going through the details, that there is a version of the Gram-Schmidt procedure applicable to complex inner product spaces. In this the formulas of 7.3.4 are carried over with minor changes, to reflect the properties of complex inner products. There is also a QR-factorization theorem. In this an im- portant change must be made; the matrix Q which is pro- duced by the Gram-Schmidt process has the property that its columns are orthogonal with respect to the complex inner product on Cm . In the case where Q is square this is equiva- lent to the equation Q*Q = In
  • 255. 7.3: Orthonormal Sets and the Gram-Schmidt Process 239 or Q-X =Q*. Recall here that Q* = {Q)T • A complex matrix Q with the above property is said to be unitary. Thus unitary matri- ces are the complex analogs of real orthogonal matrices. For example, the matrix ( cos 9 isin9 isin9 cos9 J ' is unitary for all real values of 9; here of course i = f—l. Exercises 7.3 1. Show that the following vectors constitute an orthogonal basis of R3 : 'i)-G)-(4 2. Modify the basis in Exercise 1 to obtain an orthonormal basis. 3. Find an orthonormal basis for the column space of the matrix 0 1 1' 1 - 2 1 1 2 0 4. Find an orthonormal basis for the subspace of ^ ( R ) gen- erated by the polynomials 1 — 6x and 1 — 6x2 where < f,g > = Jo f(x)g(x)dx.
  • 256. 2 4 0 Chapter Seven: Orthogonality in Vector Spaces 3 5. Find the projection of the vector ( 4 | on the subspace -2_ of R3 which has the orthonormal basis consisting of 6. Express the matrix of Exercise 3 in QR-factorized form. 7. Show that a non-singular complex matrix can be expressed as the product of a unitary matrix and an upper triangular matrix whose diagonal elements are real and positive. 8. Find a factorization of the type described in the previous exercise for the matrix ( —i i l + i 2 where % = ->/—l. 9. If A and B are orthogonal matrices, show that A-1 and AB are also orthogonal. Deduce that the set of all real orthogonal nxn matrices is a group with respect to matrix multiplication in the sense of 1.3. 10. If A = QR — Q'R' are two QR-factorizations of the real non-singular square matrix A, what can you say about the relationship between the Q and Q', and R and R'l 11. Let L be a linear operator on the Euclidean inner product space Rn . Call L orthogonal if it preserves lengths, that is, if ||LpO|| = X for all vectors X in Rn . (a) Give some natural examples of orthogonal linear operators. (b) Show that L is orthogonal if and only if it preserves inner products, that is, < L(X),L(Y) > = < X,Y > for all X and Y.
  • 257. 7.4: The Method of Least Squares 241 12. Let L be a linear operator on the Euclidean space Rn . Prove that L is orthogonal if and only if L(X) — AX where A is an orthogonal matrix. 13. Deduce from Exercise 12 and Example 7.3.7 that a lin- ear operator on R2 is orthogonal if and only if it is either a rotation or a reflection. 7.4 The Method of Least Squares A well known application of linear algebra is a method of fitting a function to experimental data called the Method of Least Squares. In order to illustrate the practical problem involved, let us consider an experiment involving two measur- able variables x and y where it is suspected that y is, approx- imately at least, a linear function of x. Assume that we have some supporting data in the form of observed values of the variables and x and y, which can be thought of as a set of points in the xy-plane ( a i , 6 i ) , . . . , ( a m , 6 m ) . This means that when x = a*, it was observed that y = b{. Now if there really were a linear relation, and if the data were free from errors, all of these points would lie on a straight line, whose equation could then be determined, and the linear rela- tion would be known. But in practice it is highly unlikely that this will be the case. What is needed is a way of finding the straight line which "bests fits" the given data. The equation of this best-fitting line will furnish a linear relation which is an approximation to y.
  • 258. 242 Chapter Seven: Orthogonality in Vector Spaces It remains to explain what is meant by the best-fitting straight line. It is here that the "least squares" arise. Consider the linear relation y = cx+d; this is the equation of a straight line in the xy-plane. The conditions for the line to pass through the m data points are { mi + d = b ca2+ d = b2 cam + d — bm Now in all probability these equations will be inconsistent. However, we can ask for real numbers c and d which come as close to satisfying the equations of the linear system as possible, in the sense that they minimize the "total error". A good measure of this total error is the expression (cax + d- bi)2 H h (cam + d- bm)2 • This is the sum of the squares of the vertical deviations of the line from the data points in the diagram above. Here the squares are inserted to take care of any negative signs that might appear. It should be clear the line-fitting problem is just a par- ticular instance of a general problem about inconsistent linear systems. Suppose that we have a linear system of m equations in n unknowns x,..., xn AX = B. Since the system may be inconsistent, the problem of interest is to find a vector X which minimizes the length of the vector AX — B, or what is equivalent and also a good deal more convenient, its square, E= WAX -Bf.
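It may help to see this minimization carried out numerically before the theory is developed. A minimal sketch, assuming Python with NumPy and invented data points; NumPy's built-in least-squares routine is used here only as a black box, anticipating the matrix formulation described in the next paragraph:

    import numpy as np

    # Invented data points (a_i, b_i) to which a line y = cx + d is to be fitted
    a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    b = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

    # The coefficient matrix has the a_i in one column and 1's in the other,
    # so that AX approximates B, where X = (c, d)^T and B = (b_1, ..., b_m)^T.
    A = np.column_stack([a, np.ones_like(a)])

    # NumPy's least-squares routine minimizes ||AX - B||^2 directly
    X, residual, rank, sing = np.linalg.lstsq(A, b, rcond=None)
    c, d = X
    print(c, d)                      # slope and intercept of the best-fitting line

    # The minimized total error E
    print(np.sum((A @ X - b) ** 2))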
In our original example, where a straight line was to be fitted to the data, the matrix A has two columns [a1 a2 ... am]ᵀ and [1 1 ... 1]ᵀ, while X = (c d)ᵀ and B is the column [b1 b2 ... bm]ᵀ. Then E is the sum of the squares of the quantities c·ai + d − bi.

A vector X which minimizes E is called a least squares solution of the linear system AX = B. A least squares solution will be an actual solution of the system if and only if the system is consistent.

The normal system

Once again consider a linear system AX = B and write E = ||AX − B||². We will show how to minimize E. Put A = [aij]m,n and let the entries of X and B be x1, ..., xn and b1, ..., bm respectively. The ith entry of AX − B is clearly (Σ_{j=1}^{n} aij xj) − bi. Hence

    E = ||AX − B||² = Σ_{i=1}^{m} ((Σ_{j=1}^{n} aij xj) − bi)²,

which is a quadratic function of x1, ..., xn.

At this juncture it is necessary to recall from calculus the procedure for finding the absolute minima of a function of several variables. First one finds the critical points of the function E, by forming its partial derivatives and setting them equal to zero:

    ∂E/∂xk = Σ_{i=1}^{m} 2 aik ((Σ_{j=1}^{n} aij xj) − bi) = 0.

Hence

    Σ_{i=1}^{m} Σ_{j=1}^{n} aik aij xj = Σ_{i=1}^{m} aik bi
  • 260. 244 Chapter Seven: Orthogonality in Vector Spaces for k = 1,2,... ,n. This is a new linear system of equations in x,..., xn whose matrix form is (AT A)X = AT B. It is called the normal system of the linear system AX = B. The solutions of the normal system are the critical points of E. Now E surely has an absolute minimum - after all it is a continuous function with non-negative values. Since the func- tion E is unbounded when XJ is large, its absolute minima must occur at critical points. Therefore we can state: Theorem 7.4.1 Every least squares solution of the linear system AX — B is a solution of the normal system (AT A)X = AT B. At this point potential difficulties appear: what if the normal system is inconsistent? If this were to happen, we would have made no progress whatsoever. And even if the normal system is consistent, need all its solutions be least squares solutions? To help answer these questions, we establish a simple result about matrices. Lemma 7.4.2 Let A be a real mxn matrix. Then A7 A is a symmetric nxn matrix whose null space equals the null space of A and whose column space equals the column space of AT . Proof In the first place {AT A)T = AT {AT )T = AT A, so AT A is certainly symmetric. Let S be the column space of A. Then by 7.2.6 the null space of AT equals SL . Let X be any n-column vector. Then X belongs to the null space of AT A if and only if AT (AX) = 0; this amounts to saying that AX belongs to the null space of AT or, what
  • 261. 7.4: The Method of Least Squares 245 is the same thing, to S-1 . But AX also belongs to S; for it is a linear combination of the columns of A. Now S fl S1 - is the zero space by 7.2.3. Hence AX = 0 and X belongs to the null space of A. On the other hand, it is obvious that if X belongs to the null space of A, then it must belong to the null space of AT A. Hence the null space of AT A equals the null space of A. Finally, by 7.2.6 and the last paragraph we can assert that the column space of AT A equals (null space of AT A)± = (null space ofA)± . This equals the column space of AT , as claimed. We come now to the fundamental theorem on the Method of Least Squares. Theorem 7.4.3 Let AX = B be a linear system ofm equations in n unknowns. (a) The normal system (AT A)X = AT B is always con- sistent and its solutions are exactly the least squares solutions of the linear system AX = B; (b) if A has rank n, then AT A is invertible and there is a unique least squares solution of the normal system, namely X = (AT A)-l AT B. Proof By 7.4.2 the column space of AT A equals the column space of AT . Therefore the column space of the matrix [AT A | AT B equals the column space of AT A; for the extra column AT B is a linear combination of the columns of AT and thus belongs to the column space of AT A. It follows that the coefficient matrix and the augmented matrix of the normal system have
  • 262. 246 Chapter Seven: Orthogonality in Vector Spaces the same rank. By 5.2.5 this is just the condition for the normal system to be consistent. The next point to establish is that every solution of the normal system is a least squares solution of AX = B. Suppose that X and X2 are two solutions of the normal system. Then AT A{XX - X2) = AT B - AT B = 0, so that Y = Xx - X2 belongs to the null space of AT A. By 7.4.2 the latter equals the null space of A. Thus AY = 0. Since Xx - Y + X2, we have AXi -B = A(Y + X2)-B = AX2 - B. This means that E = AX — B2 has the same value for X = Xi and X = X2. Thus all solutions of the normal system give the same value of E. Since by 7.4.1 every least squares solution is a solution of the normal system, it follows that the solutions of the normal system constitute the set of all least squares solutions, as claimed. Finally, suppose that A has rank n. Then the matrix AT A also has rank n since by 7.4.2 the column space of AT A equals the column space of AT , which has dimension n. Since AT A is n x n, it is invertible by 5.2.4 and 2.3.5. Hence the equation AT AX = AT B leads to the unique solution X = (AT A)-1 AT B, which completes the proof On the other hand, if the rank of A is less than n, there will be infinitely many least squares solutions. We shall see later how to select one that is in some sense optimal. Example 7.4.1 Find the least squares solution of the following linear system: xi + x2 + x3 = 4 -x + x2 + x3 — 0 - x2 + x3 =1 xi + x3 = 2
  • 263. 7.4: The Method of Least Squares 247 A = and B = Here 1 so A has has rank 3. Since the augmented matrix has rank 4, the linear system is inconsistent. We know from 7.4.3 that there is a unique least squares solution in this case. To find it, first compute A1 A = 3 0 1 x / 11 1—3 0 3 1 I and (AT A)~l = — | 1 1 1 - 3 1 1 4 - 3 - 3 9 Hence the least squares solution is T A- AT; X = {Ai AyL A1 B = 1 that is, xi = 8/5, x2 = 3/5, £3 = 6/5. Example 7.4.2 A certain experiment yields the following data: X y - l 0 0 l l 3 2 9 It is suspected that y is a quadratic function of x. Use the Method of Least Squares to find the quadratic function that best fits the data. Suppose that the function is y = a + bx + ex2 . We need to find a least squares solution of the linear system a a a a ~ b + c + b + c + 26 + Ac = 0 = 1 = 3 = 9
  • 264. 248 Chapter Seven: Orthogonality in Vector Spaces Again the linear system is inconsistent. Here A = / I 1 1 1 V 4 X 0 0 1 1 2 4 / and B = 1 3 W and A has rank 3. We find that AT A = and T A-l (AM) 12 -20 36 -20 -20 20 The unique least squares solution is therefore 11 X = (AT A)-l AT B = — I 33 20 25 that is, a = 11/20, b = 33/20, c = 5/4. Hence the quadratic function that best fits the data is 11 33 5 o y 20 20 4 Least squares and QR-factorization Consider once again the least squares problem for the linear system AX = B where A is m x n with rank n; we have seen that in this case there is a unique least squares so- lution X — (AT A)~1 AT B. This expression assumes a simpler form when A is replaced by its QR-factorization. Let this be
  • 265. 7.4: The Method of Least Squares 249 A = QR, as in 7.3.5. Thus Q is an m x n matrix with or- thonormal columns and R is an n x n upper triangular matrix with positive diagonal elements. Since the columns of Q form an orthonormal set, QT Q = In. Hence AT A = RT QT QR = RT R. Thus X = {RT R)-1 RT QT B, which reduces to X = R-X QT B, a considerable simplification of the original formula. However Q and R must already be known before this formula can be used. Example 7.4.3 Consider the least squares problem Here A = Xi Xi Xi ' + x2 + 2x2 + 2x2 1 2 2 3 2 l) + 2x3 = + 3x3 = + x3 = and B = 1 2 1 (I 1 V It was shown in Example 7.3.8 that A = QR where 1/ Q=' ' and R l/>/3 -1/V6 1/V2' 1/V3 2/v^ 0 1/V3 - l / / 6 - l / / 2 / V^ 4//3 2 ^ 0 >/6/3 /6/2 0 0 V ^ / 2 /
  • 266. 250 Chapter Seven: Orthogonality in Vector Spaces Hence the least squares solution is X = R~1 QT B = 1 ) , t h a t is, X = 1, X2 = 0, X3 = 0. Geometry of the least squares process There is a suggestive geometric interpretation of the least squares process in terms of projections. Consider the least squares problem for the linear system AX = B where A has m rows. Let S denote the column space of the coefficient matrix A. The least squares solutions are the solutions of the normal system AT AX = AT B, or equivalently AT {B - AX) = 0. The last equation asserts that B — AX belongs to the null space of AT , which by 7.2.6 is equal to S1 . Our condition can therefore be reformulated as follows: X is a least squares solution of AX = B if and only if B — AX belongs to S1 . Now B = AX + (B - AX) and AX belongs to S. Recall from 7.2.4 that B is uniquely expressible as the sum of its projections on the subspaces S and S-1 ; we conclude that B — AX belongs to S1 precisely when AX is the projection of B on S. In short we have a discovered a geometric description of the least squares solutions. Theorem 7.4.4 Let AX = B be an arbitrary linear system and let S denote the column space of A. Then a column vector X is a least squares solution of the linear system if and only if AX is the projection of B on S. Notice that the projection AX is uniquely determined by the linear system AX = B. However there is a unique least
  • 267. 7.4: The Method of Least Squares 251 squares solution X if and only if X is uniquely determined by AX, that is, if AX = AX implies that X = X. Hence X is unique if and only if the null space of A is zero, that is, if the rank of A is n. Therefore we can state Corollary 7.4.5 There is a unique least squares solution of the linear system AX = B if and only if the rank of A equals the number of columns of A. Optimal least squares solutions Returning to the general least squares problem for the linear system AX = B with n unknowns, we would like to be able to say something about the least squares solutions in the case where the rank of A is less than n. In this case there will be many least square solutions; what we have in mind is to find a sensible way of picking one of them. Now a natural way to do this would be to select a least squares solution with minimal length. Accordingly we define an optimal least squares solution of AX = B to be a least squares solution X whose length X is as small as possible. There is a simple method of finding an optimal least squares solution. Let U denote the null space of A; then U equals (column space of AT )± , by 7.2.6. Suppose X is a least squares solution of the system AX = B. Now there is a unique expression X = XQ + X where XQ belongs to U and X be- longs to U1 ; this is by 7.2.4. Then AX = AX0 + AXX = AX±; for AXQ = 0 since XQ belongs to the null space of A. Thus AX — B = AXi — B, so that X is also a least squares solution of AX = B. Now we compute ||X||2 = XQ+XX2 = (X0+X1)T (X0+X1) = XZXQ+XTXL For XQXI = 0 = XJ'XQ since X0 and Xx belong to U and U1 - respectively. Therefore ||x||2 HI*ol|2 + ||Xi||2 >||Xi||2 .
  • 268. 252 Chapter Seven: Orthogonality in Vector Spaces Now, if X is an optimal solution, then ||X|| = X, so that X0 = 0 and hence XQ = 0. Thus X = Xi belongs to U1 . It follows that each optimal least squares solution must belong to t/-1 , the column space of AT . Finally, we show that there is a unique least squares so- lution in U± . Suppose that X and X are two least squares solutions in U^. Then from 7.4.4 we see that AX and AX are both equal to the projection of B on the column space of A. Thus A(X — X) = 0 and X — X belongs to U, the null space of A. But X and X also belong to U^~, whence so does X — X. Since U D U1 = 0, it follows that X - X = 0 and X = X. Hence X is the unique optimal least squares solution and it belongs to Vs -. Combining these conclusions with 7.4.4, we obtain: Theorem 7.4.6 A linear system AX = B has a unique optimal least squares solution, namely the unique vector X in the column space of AT such that AX is the projection of B on the column space ofAT . The proof of 7.4.6 has the useful feature that it tells us how to find the optimal least squares solution of a linear sys- tem AX = B. First find any least squares solution, and then compute its projection on the column space of AT . Example 7.4.4 Find the optimal least squares solution of the linear system Xl - X2 + X3 = 1 xi + x2 - 2x3 = 2 2xi - x3 = 4
  • 269. 7.4: The Method of Least Squares 253 The first step is to identify the normal system (AT A)X = AT B; 6Xx — 3x3 = 11 2x2 - 3x3 = 1 —3xi — 3x2 + 6x3 = —7 Any solution of this will do; for example, we can take the solution vector '-(?)• To obtain an optimal least squares solution, find the projec- tion of X on the column space of AT ; the first two columns of AT form a basis of this space. Proceeding as in Example 7.2.12, we find the optimal solution to be so that Xi = 67/42, x2 = —3/14, X3 = —10/21 is the optimal least squares solution of the linear system. Least squares in inner product spaces In 7.4.4 we obtained a geometrical interpretation of the least squares process in Rn in terms of projections on sub- spaces. This raises the question of least squares processes in an arbitrary finite-dimensional real inner product space V. First we must formulate the least squares problem in V. This consists in approximating a vector v in V by a vector in a subspace S of V. A natural way to do this is to choose x in S so that ||x — v||2 is as small as possible. This is a direct generalization of the least squares problem in Rn . For, if we are given the linear system AX = B and we take S to be the column space of A, v to be B and x to be the vector AX of S, then the least squares problem is to minimize || AX" — -E?||2 .
  • 270. 2 5 4 Chapter Seven: Orthogonality in Vector Spaces It turns out that the solution of this general least squares problem is the projection of v on S, just as in the special case ofRn . Theorem 7.4.7 Let V be a finite-dimensional, real inner product space, and let v be an element and S a subspace of V. Denote the projection of v on S by p. Then, if x is any vector in S other than p, the inequality ||x — v|| > ||p — v|| holds. Thus p is the vector in S which most closely approximates v in the sense that it makes ||p — v|| as small as possible. Proof Since x and p both belong to S, so does x — p. Also p — v belongs to S1 - since p is the projection of v on S. Hence < p — v, x — p > = 0. It follows that ||x - v||2 = < (x - p) + (p - v), (x - p) + (p - v) > = < x - p , x - p > + < p - v , p - v > = ||x - P||2 + ||P - v||2 > ||p - v f since x — p ^ 0. Hence ||x — v|| > ||p — v||. In applying 7.4.7 it is advantageous to have at hand an orthonormal basis {vi,..., vm } of S. For the task of comput- ing p, the projection of v on S, is then much easier since the formula of 7.3.3 is available: m P = ^2 < V, Si > Sj . i = l Example 7.4.5 Use least squares to find a quadratic polynomial that approx- imates the function ex in the interval [—1, 1].
  • 271. 7.4: The Method of Least Squares 255 Here it is assumed that we are working in the inner prod- uct space C[—1, 1] where < /, g > = J_x f(x)g(x)dx. Let S denote the subspace consisting of all quadratic polynomials in x. An orthonormal basis for S was found in Example 7.3.6: 1 , [3 3>/5, 2 1- By 7.4.7 the least squares approximation to ex in S is simply the projection of ex on S; this is given by the formula p = < ex , h > h + < ex , h > h + < ex , h > h- Evaluating the integrals by integration by parts, we obtain and <e',/i> = ^ J. <ex J2>=V6e-1 <ex ,h> = J{e-le-1 ). The desired approximation to ex is therefore P=(e-e-')+3e-1 x + ^(e-7e-')(x2 -±). Alternatively one can calculate the projection by using the standard basis 1, x,x2 . Exercises 7.4 1. Find least squares solutions of the following linear systems: xi + x2 = 0 (a) I °°2 + X3 = ° * xi - x2 - x3 = 3 £i + ^3 = 0
  • 272. 256 Chapter Seven: Orthogonality in Vector Spaces ' xi + x2 - 2x3 — 3 ^ I 2xi - x2 + 3x3 = 4 1 | xi + x3 = 1 . X + X2 + X3 = 1 2. The following data were collected for the mean annual tem- perature t and rainfall r in a certain region; use the Method of Least Squares to find a linear approximation for r in terms of t (a calculator is necessary): t r 24 47 27 30 22 35 24 38 3. In a tropical rain forest the following data was collected for the numbers x and y (per square kilometer) of a prey species and a predator species over a number of years. Use least squares to find a quadratic function of x that approximates y (a calculator is necessary): X y 2 l 3 2 4 2 5 1 4. Find the optimal least squares solution of the linear system Xi + 2x3 = 1 x2 + 3x3 = 0 —xi + x2 + x3 = 0 — x2 — 3x3 — 1 5. Find a least squares approximation to the function e~x by a linear function in the interval [1, 2]. [Use the inner product < f,9> = fi f(x)g(x)dx}. 6. Find a least squares approximation for the function sin x as a quadratic function of x in the interval [0, n]. [Here the inner product < f,g > = JQ f(x)g(x)dx is to be used].
  • 273. Chapter Eight EIGENVECTORS AND EIGENVALUES An eigenvector of an n x n matrix A is a non-zero n- column vector X such that AX = cX for some scalar c, which is called an eigenvalue of A. Thus the effect of left multiplica- tion of an eigenvector by A is merely to multiply it by a scalar, and when n < 3, a parallel vector is obtained. Similarly, if T is a linear operator on a vector space V, an eigenvector of T is a non-zero vector v of V such that T(v) = cv for some scalar c called an eigenvalue. For example, if T is a rotation in R3 , the eigenvectors of T are the non-zero vectors parallel to the axis of rotation and the eigenvalues are all equal to 1. A large amount of information about a matrix or linear operator is carried by its eigenvectors and eigenvalues. In addition, the theory of eigenvectors and eigenvalues has im- portant applications to systems of linear recurrence relations, Markov processes and systems of linear differential equations. We shall describe the basic theory in the first section and then we give applications in the following two sections of the chapter. 8.1 Basic Theory of Eigenvectors and Eigenvalues We begin with the fundamental definition. Let A be an n x n matrix over a field of scalars F. An eigenvector of A is a non-zero n-column vector X over F such that AX = cX for some scalar c in F; the scalar c is then referred to as the eigenvalue of A associated with the eigenvector X. 257
In order to clarify the definition and illustrate the technique for finding eigenvectors and eigenvalues, an example will be worked out in detail.

Example 8.1.1
Consider the real 2 × 2 matrix

    A = | 2  −1 |
        | 2   4 |

The condition for the vector

    X = | x1 |
        | x2 |

to be an eigenvector of A is that AX = cX for some scalar c. This is equivalent to (A − cI2)X = 0, which simply asserts that X is a solution of the linear system

    | 2 − c    −1   | | x1 |   =   | 0 |
    | 2       4 − c | | x2 |       | 0 |

Now by 3.3.2 this linear system will have a non-trivial solution x1, x2 if and only if the determinant of the coefficient matrix vanishes,

    | 2 − c    −1   |
    | 2       4 − c |   =   0,

that is, c² − 6c + 10 = 0. The roots of this quadratic equation are c1 = 3 + √−1 and c2 = 3 − √−1, so these are the eigenvalues of A. The eigenvectors for each eigenvalue are found by solving the linear systems (A − c1I2)X = 0 and (A − c2I2)X = 0. For example, in the case of c1 we have to solve

    (−1 − √−1)x1 − x2 = 0
    2x1 + (1 − √−1)x2 = 0
  • 275. 8.1: Basic Theory of Eigenvectors 259 The general solution of this system is £1 = |(—1 + y/—l) and x2 — d , where d is an arbitrary scalar. Thus the eigenvectors of A associated with the eigenvalue C are the non-zero vectors of the form Notice that these, together with the zero vector, form a 1- dimensional subspace of C2 . In a similar manner the eigen- vectors for the eigenvalue 3 — /—T are found to be the vectors of the form where d ^ 0. Again these form with the zero vector a subspace ofC2 . It should be clear to the reader that the method used in this example is in fact a general procedure for finding eigen- vectors and eigenvalues. This will now be described in detail. The characteristic equation of a matrix Let A be an n x n matrix over a field of scalars F, and let X be a non-zero n-column vector over F. The condition for X to be an eigenvector of A is AX = cX, or {A - dn)X = 0, where c is the corresponding eigenvalue. Hence the eigenvec- tors associated with c, together with the zero vector, form the null space of the matrix A — cln. This subspace is often referred to as the eigenspace of the eigenvalue c. Now (A — dn)X = 0 is a linear system of n equations in n unknowns. By 3.3.2 the condition for there to be a non-trivial solution of the system is that the coefficient matrix have zero determinant, det(A - cln) = 0.
  • 276. 260 Chapter Eight: Eigenvectors and Eigenvalues Conversely, if the scalar c satisfies this equation, there will be a non-zero solution of the system and c will be an eigen- value. These considerations already make it clear that the determinant an -x a12 • • • aln «2i a22~ x • • • a2n 0"nl Q"n2 ' ' ' &nn X must play an important role. This is a polynomial of de- gree n in x which is called the characteristic polynomial of A. The equation obtained by setting the characteristic poly- nomial equal to zero is the characteristic equation. Thus the eigenvalues of A are the roots of the characteristic equation (or characteristic polynomial) which lie in the field F. At this point it is necessary to point out that A may well have no eigenvalues in F. For example, the characteristic polynomial of the real matrix is x2 + 1, which has no real roots, so the matrix has no eigen- values in R. However, if A is a complex nxn matrix, its characteristic equation will have n complex roots, some of which may be equal. The reason for this is a well-known result known as The Fundamental Theorem of Algebra; it asserts that every polynomial / of positive degree n with complex coefficients can be expressed as a product of n linear factors; thus the equation f(x) = 0 has exactly n roots in C. Because of this we can be sure that complex matrices always have all their eigenvalues and eigenvectors in C. It is this case that principally concerns us here. Let us sum up our conclusions about the eigenvalues of complex matrices so far. det(A - xln) =
  • 277. 8.1: Basic Theory of Eigenvectors 261 Theorem 8.1.1 Let A be an n x n complex matrix. (i) The eigenvalues of A are precisely the n roots of the characteristic polynomial &et(A — xln); (ii) the eigenvectors of A associated with an eigenvalue c are the non-zero vectors in the null space of the matrix A-cIn. Thus in Example 8.1.1 the characteristic polynomial of the matrix is 2-x - 1 2 4-x = x2 - 6x + 10. The eigenvalues are the roots of the characteristic equation x2 — 6x + 10 = 0, that is, c = 3 + f—T and c^ — 3 — J—1; the eigenspaces of c and c^ are generated by the vectors and ( _ l + v /3T) / 2 -(l + V=l)/2 1 respectively. Example 8.1.2 Find the eigenvalues of the upper triangular matrix (a-x ai2 ai3 0 a22 - x a23 « l n 0-2n 0 0 0 ann — x / The characteristic polynomial of this matrix is an - x a12 a13 0 a22 - x a23 0 0 0 a in 0>2n Ojn.n. ^
which, by 3.1.5, equals (a11 − x)(a22 − x) ... (ann − x). The eigenvalues of the matrix are therefore just the diagonal entries a11, a22, ..., ann.

Example 8.1.3
Consider the 3 × 3 matrix

    A = |  2  −1  −1 |
        | −1   2  −1 |
        | −1  −1   0 |

The characteristic polynomial of this matrix is

    | 2 − x   −1     −1  |
    |  −1    2 − x   −1  |   =   −x³ + 4x² − x − 6.
    |  −1     −1     −x  |

Fortunately one can guess a root of this cubic polynomial, namely x = −1. Dividing the polynomial by x + 1 using long division, we obtain the quotient −x² + 5x − 6 = −(x − 2)(x − 3). Hence the characteristic polynomial can be factorized completely as −(x + 1)(x − 2)(x − 3), and the eigenvalues of A are −1, 2 and 3.

To find the corresponding eigenvectors, we have to solve the three linear systems (A + I3)X = 0, (A − 2I3)X = 0 and (A − 3I3)X = 0. On solving these, we find that the respective eigenvectors are the non-zero scalar multiples of the vectors

    | 1 |       |  1 |       |  1 |
    | 1 |       |  1 |       | −1 |
    | 2 |       | −1 |       |  0 |

The eigenspaces are generated by these three vectors and so each has dimension 1.
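These computations are easily checked with a numerical eigenvalue routine. A minimal sketch, assuming Python with NumPy; numpy.linalg.eig returns the eigenvalues in no particular order and scales each eigenvector to unit length, so its output agrees with the hand computation only up to ordering and scalar multiples:

    import numpy as np

    A = np.array([[ 2.0, -1.0, -1.0],
                  [-1.0,  2.0, -1.0],
                  [-1.0, -1.0,  0.0]])

    values, vectors = np.linalg.eig(A)
    print(np.sort(values.real))            # approximately [-1, 2, 3]

    # Each column of `vectors` is an eigenvector for the corresponding value;
    # check A X = c X for every pair.
    for c, X in zip(values, vectors.T):
        assert np.allclose(A @ X, c * X)

    # Coefficients of the characteristic polynomial, up to the sign convention
    # (numpy expands det(xI - A) = x^3 - 4x^2 + x + 6).
    print(np.poly(A))                      # approximately [1, -4, 1, 6]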
  • 279. 8.1: Basic Theory of Eigenvectors 263 Properties of the characteristic polynomial Now let us see what can be said in general about the characteristic polynomial ofannxn matrix A. Let p(x) denote this polynomial; thus Q>nn 2- At this point we need to recall the definition of a determinant as an alternating sum of terms, each term being a product of entries, one from each row and column. The term of p(x) with highest degree in x arises from the product (an - x)--- (ann - x) and is clearly (—x)n . The terms of degree n — 1 are also easy to locate since they arise from the same product. Thus the coefficient of xn ~x is ( - l ) n - 1 ( a n + --- + ann) and the sum of the diagonal entries of A is seen to have sig- nificance; it is given a special name, the trace of A, tr(A) = an + a22 H h ann. The term in p(x) of degree n — 1 is therefore tr(^4) (—a;)"-1 . The constant term in p(x) may be found by simply putting x = 0 in p(x) = det(A — xln), thereby leaving det(A). Our knowledge of p(x) so far is summarized in the formula p{x) = (-x)n + t r ^ X - a : ) " - 1 + • • • + det(A). p(x) = an — x a2 0-21 0-22 - X 0"nl 0-n2
  • 280. 2 6 4 Chapter Eight: Eigenvectors and Eigenvalues The other coefficients in the characteristic polynomial are not so easy to describe, but they are in fact expressible as subdeterminants of det(i4). For example, take the case of xn ~2 . Now terms in xn ~2 arise in two ways: from the product (an — x) • • • (ann — x) or from products like -ai2 a2 i(a3 3 - x) • • • (ann - x). So a typical contribution to the coefficient of xn ~2 is (-l)n -2 (ana2 2 - a12a2i) = (-1) From this it is clear that the term of degree n — 2 in p(x) is just (—x)n ~2 times the sum of all the 2 x 2 determinants of the form an O'ij a ji a jj where i < j . In general one can prove by similar considerations that the following is true. Theorem 8.1.2 The characteristic polynomial of the n x n matrix A equals n J2di(-x)n -* i=0 where di is the sum of all the i x i subdeterminants of det(A) whose principal diagonals are part of the principal diagonal of A. Now assume that the matrix A has complex entries. Let ci, c2 ,..., cn be the eigenvalues of A. These are the n roots of the characteristic polynomial p(x). Therefore, allowing for the an ai 2 0-21 0-22
  • 281. 8.1: Basic Theory of Eigenvectors 265 fact that the term of p(x) with highest degree has coefficient (—l)n , one has p(x) = (ci - x)(c2 -x)---(cn- x). The constant term in this product is evidently just cCi... cn, while the term in xn ~l has coefficient (—l)n-1 (ci + • • • + cn). On the other hand, we previously found these to be det(A) and (—l)n ~1 tx{A) respectively. Thus we arrive at two important relations between the eigenvalues and the entries of A. Corollary 8.1.3 // A is any complex square matrix, the product of the eigenval- ues equals the determinant of A and the sum of the eigenvalues equals the trace of A Recall from Chapter Six that matrices A and B are said to be similar if there is an invertible matrix S such that B = SAS~X . The next result indicates that similar matrices have much in common, and really deserve their name. Theorem 8.1.4 Similar matrices have the same characteristic polynomial and hence they have the same eigenvalues, trace and determinant. Proof The characteristic polynomial of B — SAS^1 is det^SAS'1 - xl) =det(S(A - x^S'1 ) = det(S) det(A - xl) det(5)"1 = det{A-xI). Here we have used two fundamental properties of determi- nants established in Chapter Three, namely 3.3.3 and 3.3.5. The statements about trace and determinant now follow from 8.1.3.
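Theorem 8.1.4 is easy to test numerically: conjugating by any invertible matrix leaves the characteristic polynomial, and hence the trace and determinant, unchanged. A minimal sketch, assuming Python with NumPy and a randomly chosen matrix S (invertible with probability 1):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    S = rng.standard_normal((4, 4))           # almost surely invertible
    B = S @ A @ np.linalg.inv(S)              # B is similar to A

    # Same characteristic polynomial (np.poly returns its coefficients) ...
    print(np.allclose(np.poly(A), np.poly(B)))               # True
    # ... hence the same trace and determinant
    print(np.isclose(np.trace(A), np.trace(B)))              # True
    print(np.isclose(np.linalg.det(A), np.linalg.det(B)))    # True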
  • 282. 266 Chapter Eight: Eigenvectors and Eigenvalues On the other hand, one cannot expect similar matrices to have the same eigenvectors. Indeed the condition for X to be an eigenvector of SAS~X with eigenvalue c is (SAS'^X — cX, which is equivalent to Atf^X) = c(S~1 X). Thus X is an eigenvector of SAS~~l if and only if S~X X is an eigenvector of A Eigenvectors and eigenvalues of linear transformations Because of the close relationship between square matri- ces and linear operators on finite-dimensional vector spaces observed in Chapter Six, it is not surprising that one can also define eigenvectors and eigenvalues for a linear operator. Let T : V — » V be a linear operator on a vector space V over a field of scalars F. An eigenvector of T is a non-zero vector v of V such that T(v) = cv for some scalar c in F: here c is the eigenvalue of T associated with the eigenvector v. Suppose now that V is a finite-dimensional vector space over F with dimension n. Choose an ordered basis for V, say B. Then with respect to this ordered basis T is represented by an n x n matrix over F, say A; this means that [T(y)]B = A[v]B. Here [U]B is the coordinate column vector of a vector u in V with respect to basis B . The condition T(v) = cv for v to be an eigenvector of T with associated eigenvalue c, becomes -AMB = c[v]g, which is just the condition for M s to be an eigenvector of the representing matrix A; also the eigenvalues of T and A are the same. If the ordered basis of V is changed, the effect is to replace A by a similar matrix. Of course any such matrix will have the same eigenvalues as T; thus we have another proof of the fact that similar matrices have the same eigenvalues.
  • 283. 8.1: Basic Theory of Eigenvectors 267 These observations permit us to carry over to linear op- erators concepts such as characteristic polynomial and trace, which were introduced for matrices. Example 8.1.4 Consider the linear transformation T : Doo[a,b] — > Doo[a,6] where T(f) = /', the derivative of the function /. The con- dition for / to be an eigenvector of T is / ' = cf for some constant c. The general solution of this simple differential equation is / = decx where d is a constant. Thus the eigen- values of T are all real numbers c, while the eigenvectors are the exponential functions decx with d ^ 0. Diagonalizable matrices We wish now to consider the question: when is a square matrix similar to a diagonal matrix? In the first place, why is this an interesting question? The essential reason is that diagonal matrices behave so much more simply than arbitrary matrices. For example, when a diagonal matrix is raised to the nth power, the effect is merely to raise each element on the diagonal to the nth power, whereas there is no simple expression for the nth power of an arbitrary matrix. Suppose that we want to compute An where A is similar to a diagonal matrix D, with say A = SDS~X . It is easily seen that An = SDn S~1 . Thus it is possible to calculate An quite simply if we have explicit knowledge of S and D. It will emerge in 8.2 and 8.3 that this provides the basis for effective methods of solving systems of linear recurrences and linear differential equations. Now for the important definition. Let A be a square matrix over a field F. Then A is said to be diagonalizable over F if it is similar to a diagonal matrix D over F, that is, there is an invertible matrix S over F such that A = SDS-1 or equivalently, D = S~1 AS. One also says that S diagonalizes A. A diagonalizable matrix need not be diagonal: the reader
  • 284. 268 Chapter Eight: Eigenvectors and Eigenvalues should give an example to demonstrate this. It is an important observation that if A is diagonalizable and its eigenvalues are c,..., cn, then A must be similar to the diagonal matrix with ci,..., cn on the principal diagonal. This is because similar matrices have the same eigenvalues and the eigenvalues of a diagonal matrix are just the entries on the principal diagonal - see Example 8.1.2. What we are aiming for is a criterion which will tell us exactly which matrices are diagonalizable. A key step in the search for this criterion comes next. Theorem 8.1.5 Let A be an n x n matrix over a field F and let C,..., cr be distinct eigenvalues of A with associated eigenvectors Xi,..., Xr. Then {Xi,..., Xr} is a linearly independent sub- set of Fn . Proof Assume the theorem is false; then there is a positive integer i such that {X±,... ,Xi} is linearly independent, but the ad- dition of the next vector Xi+i produces a linearly dependent set {Xi,..., Xi+x}. So there are scalars d,..., rfj+i, not all of them zero, such that hXi + • • • + di+1Xi+1 = 0 . Premultiply both sides of this equation by A and use the equa- tions AXj = CjXj to get CidiXi -I 1 - ci+1di+1Xi+i — 0. On subtracting Q+I times the first equation from the second, we arrive at the relation (ci - ci+1)diXi H h (CJ - ci+i)diXi = 0.
  • 285. 8.1: Basic Theory of Eigenvectors 269 Since Xi,..., Xi are linearly independent, all the coefficients (CJ —Ci+i)dj must vanish. But ci,..., q+ 1 are all different, so we can conclude that dj = 0 for j = 1,..., i; hence di+iXi+i = 0 and so di+i = 0, in contradiction to the original assumption. Therefore the statement of the theorem must be correct. The criterion for diagonalizability can now be established. Theorem 8.1.6 Let A be an n x n matrix over a field F. Then A is diagonal- izable if and only if A has n linearly independent eigenvectors in Fn . Proof First of all suppose that A has n linearly independent eigen- vectors in Fn , say Xi,..., Xn, and that the associated eigen- values are ci,..., cn. Define S to be the n x n matrix whose columns are the eigenvectors; thus S=(X1...Xn). The first thing to notice is that S is invertible; for by 8.1.5 its columns are linearly independent. Forming the product of A and S in partitioned form, we find that AS = {AXX... AXn) = (c1X1 • • • cnXn), which equals (Xi • • • Xn) 'ci 0 0 0 c2 0 0 0 • 0 = SD, where D is the diagonal matrix with entries C,..., cn. There- fore S~1 AS = D and A is diagonalizable.
  • 286. 270 Chapter Eight: Eigenvectors and Eigenvalues Conversely, assume that A is diagonalizable and that S~1 AS = D is a diagonal matrix with entries ci,..., cn. Then AS = SD. This implies that if X{ is the zth column of S, then AXi equals the ith column of SD, which is CjXj. Hence Xi,..., Xn are eigenvectors of A associated with eigenvalues c,..., cn. Since X,..., Xn are columns of the invertible ma- trix S, they must be linearly independent. Consequently A has n linearly independent eigenvectors. Corollary 8.1.7 An n x n complex matrix which has n distinct eigenvalues is diagonalizable. This follows at once from 8.1.5 and 8.1.6. On the other hand, it is easy to think of matrices which are not diagonaliz- able: for example, there is the matrix -(; o- Indeed if A were diagonalizable, it would be similar to the identity matrix I2 since both its eigenvalues equal 1, and S~1 AS = I2 for some S; but the last equation implies that A = SI2S~l = I2, which is not true. An interesting feature of the proof of 8.1.6 is that it pro- vides us with a method of finding a matrix S which diagonal- izes A. One has simply to find a set of linearly independent eigenvectors of A; if there are enough of them, they can be taken to form the columns of the matrix S. Example 8.1.5 Find a matrix which diagonalizes A = In Example 8.1.1 we found the eigenvalues of A to be 3 + A/^T and 3 — y/^T; hence A is diagonalizable by 8.1.7. We
  • 287. 8.1: Basic Theory of Eigenvectors 271 also found eigenvectors for A; these form a matrix 5 /(-i + v=i)/2 -(i + v=T)/2y Then by the preceding theory we may be sure that Triangularizable matrices It has been seen that not every complex square matrix is diagonalizable. Compensating for this failure is the fact such a matrix is always similar to an upper triangular matrix; this is a result with many applications. Let A be a square matrix over a field F. Then A is said to be triangularizable over F if there is an invertible matrix S over F such that S~l AS = T is upper triangular. It will also be convenient to say that S triangularizes A. Note that the diagonal entries of the triangular matrix T will necessarily be the eigenvalues of A. This is because of Example 8.1.2 and the fact that similar matrices have the same eigenvalues. Thus a necessary condition for A to be triangularizable is that it have n eigenvalues in the field F. When F = C, this condition is always satisfied, and this is the case in which we are interested. Theorem 8.1.8 Every complex square matrix is triangularizable. Proof Let A denote a n n x n complex matrix. We show by induction on n that A is triangularizable. Of course, if n = 1, then A is already upper triangular: let n > 1. We shall use induction on n and assume that the result is true for square matrices with n — 1 rows.
  • 288. 272 Chapter Eight: Eigenvectors and Eigenvalues We know that A has at least one eigenvalue c in C, with associated eigenvector X say. Since X ^ 0, it is possible to adjoin vectors to X to produce a basis of Cn , say X = X,X2,..., Xn here we have used 5.1.4. Next, recall that left multiplication of the vectors of Cn by A gives rise to linear operator T on Cn . With respect to the basis {Xi,... , X n } , the linear operator T will be represented by a matrix with the special form where A and A2 are certain complex matrices, A having n — 1 rows and columns. The reason for the special form is that T{X) = AX = cX since X is an eigenvalue of A. Notice that the matrices A and B are similar since they represent the same linear operator T; suppose that in fact Bi = S^ASi where Si is an invertible n x n matrix. Now by induction hypothesis there is an invertible matrix 62 with n — 1 rows and columns such that B^ = S^1 ASi is upper triangular. Write s =Sl {o 1)- This is a product of invertible matrices, so it is invertible. An easy matrix computation shows that S^^-AS equals which equals Replace Bi by I .2 ) and multiply the matrices together to get
• 289. 8.1: Basic Theory of Eigenvectors 273

    S^{-1} A S = ( c      A_2 S_2       ) = ( c   A_2 S_2 )
                 ( 0  S_2^{-1} A_1 S_2  )   ( 0     B_2   ).

This matrix is clearly upper triangular, so the theorem is proved.

The proof of the theorem provides a method for triangularizing a matrix.

Example 8.1.6
Triangularize the matrix

    A = (  1  1 )
        ( -1  3 ).

The characteristic polynomial of A is x^2 - 4x + 4, so both eigenvalues equal 2. Solving (A - 2I_2)X = 0, we find that all the eigenvectors of A are scalar multiples of X_1 = (1, 1)^T. Hence A is not diagonalizable, by 8.1.6.

Let T be the linear operator on C^2 arising from left multiplication by A. Adjoin a vector X_2 to X_1 to get a basis B_2 = {X_1, X_2} of C^2, say X_2 = (0, 1)^T, and denote by B_1 the standard basis of C^2. Then by 6.2.6 the matrix which represents T with respect to the basis B_2 is

    ( 2  1 )
    ( 0  2 );

indeed T(X_1) = 2X_1 and T(X_2) = X_1 + 2X_2. Hence the matrix

    S = ( 1  0 )
        ( 1  1 ),

whose columns are X_1 and X_2, triangularizes A: S^{-1}AS = ( 2 1 ; 0 2 ).
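The computation in Example 8.1.6 is easy to check numerically. The sketch below is an illustration only, not part of the text's method; it assumes the numpy and scipy libraries are available. It verifies that S = (1 0; 1 1) triangularizes A, and it also shows how a triangularizing unitary matrix can be produced in floating point arithmetic with the Schur decomposition, the numerical analogue of Theorem 8.1.8.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[1.0, 1.0],
              [-1.0, 3.0]])

# The triangularizing matrix found in Example 8.1.6.
S = np.array([[1.0, 0.0],
              [1.0, 1.0]])
T = np.linalg.inv(S) @ A @ S
print(T)                                    # [[2. 1.], [0. 2.]], upper triangular

# Numerical analogue of Theorem 8.1.8: A = Q T2 Q^* with Q unitary
# and T2 upper triangular (the Schur decomposition).
T2, Q = schur(A)
print(np.allclose(Q @ T2 @ Q.conj().T, A))  # True
print(np.diag(T2))                          # both diagonal entries are (approximately) 2
```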
  • 290. 274 Chapter Eight: Eigenvectors and Eigenvalues Exercises 8.1 1. Find all the eigenvectors and eigenvalues of the following matrices: «»• (iJD' (I! i •!)• 2. Prove that tr(j4 + £) = tr(A) + tv(B) and tr(cA) = c tr(A) where A and 5 are nxn matrices and c is a scalar. 3. If yl and B are nxn matrices, show that AB and BA have the same eigenvalues. [Hint: let c be an eigenvalue of AB and prove that it is an eigenvalue of BA ]. 4. Suppose that A is a square matrix with real entries and real eigenvalues. Prove that every eigenvalue of A has an associated real eigenvector. 5. If A is a real matrix with distinct eigenvalues, then A is diagonalizable over R: true or false? 6. Let p(x) be the polynomial (-l)n (xn + an.xxn -1 + an_2xn -2 + • • • + ao). Show that p(x) is the characteristic polynomial of the follow- ing matrix (which is called the companion matrix of p(x)): /0 0 • • • 0 - a 0 1 0 • • • 0 - a i 0 1 • • • 0 -a2 0 0 • • • 1 - a n _ i / 7. Find matrices which diagonalize the following:
  • 291. 8.1: Basic Theory of Eigenvectors 275 w (a a)= 0.)(J j 1 )• 8. For which values of a and b is the matrix I , 1 diago- nalizable over C? 9. Prove that a complex 2 x 2 matrix is not diagonalizable if and only if it is similar to a matrix of the form where 6 ^ 0 . 10. Let A be a diagonalizable matrix and assume that S is a matrix which diagonalizes A. Prove that a matrix T diago- nalizes A if and only if it is of the form T = CS where C is a matrix such that AC = CA. 11. If A is an invertible matrix with eigenvalues ci,..., cn, show that the eigenvalues oi A~l are c^~ ,..., c^1 . 12. Let T : V — > • V be a linear operator on a complex n- dimensional vector space V. Prove that there is a basis {vi, ..., vn } of V such that T(VJ) is a linear combination of vi 5 ... ,vn for i = 1,... ,n. 13. Let T : Pn (R) — • Pn(R-) be the linear operator corre- sponding to differentiation. Show that all the eigenvalues of T are zero. What are the eigenvectors? 14. Let ci,...,cn be the eigenvalues of a complex matrix A. Prove that the eigenvalues of Am are Cj",..., c™ where m is any positive integer. [Hint: A is triangularizable]. 15. Prove that a square matrix and its transpose have the same eigenvalues. a b 0 a
• 292. 276 Chapter Eight: Eigenvalues and Eigenvectors
8.2 Applications to Systems of Linear Recurrences

A recurrence relation is an equation involving a function y of a non-negative integral variable n, the value of y at n being written y_n. The equation relates the values of the function at certain consecutive integers, typically y_{n+1}, y_n, ..., y_{n-r}. In addition there may be some initial conditions to be satisfied, which specify certain values of y_i. If the equation is linear in y, the recurrence relation is said to be linear. The problem is to solve the recurrence, that is, to find the most general function which satisfies the equation and the initial conditions. Linear recurrence relations, and more generally systems of linear recurrence relations, occur in many real-life problems. We shall see that the theory of eigenvalues provides an effective means for solving such problems.

To understand how systems of linear relations can arise we consider a predator-prey problem.

Example 8.2.1
In a population of rabbits and weasels it is observed that each year the number of rabbits is equal to four times the number of rabbits less twice the number of weasels in the previous year. The number of weasels in any year equals the sum of the numbers of rabbits and weasels in the previous year. If the initial numbers of rabbits and weasels were 100 and 10 respectively, find the numbers of each species after n years.

Let r_n and w_n denote the respective numbers of rabbits and weasels after n years. The information given in the statement of the problem translates into the equations

    r_{n+1} = 4r_n - 2w_n
    w_{n+1} =  r_n +  w_n,

together with the initial conditions r_0 = 100, w_0 = 10. Thus we have to solve a system of two linear recurrence relations for r_n and w_n, subject to two initial conditions.
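Before solving the system by eigenvalues, note that the recurrences can simply be iterated on a computer. The short sketch below (assuming numpy; it is not part of the text's solution) does this for a few years. Such a computation shows the populations growing rapidly, but it gives no closed formula and no explanation of the long-term ratio; that is exactly what the eigenvalue method developed next supplies.

```python
import numpy as np

A = np.array([[4.0, -2.0],
              [1.0, 1.0]])
X = np.array([100.0, 10.0])      # [r_0, w_0]

# Iterate X_{n+1} = A X_n directly.
for n in range(1, 11):
    X = A @ X
    r, w = X
    print(n, int(r), int(w), round(r / w, 3))
# The printed ratio r_n / w_n approaches 2, as the eigenvalue analysis
# on the following pages explains.
```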
• 293. 8.2: Systems of Linear Recurrences 277
At first sight it may not seem clear how eigenvalues enter into this problem. However, let us put the system of linear recurrences in matrix form by writing

    X_n = ( r_n )    and    A = ( 4  -2 )
          ( w_n )               ( 1   1 ).

Then the two recurrences are equivalent to the single matrix equation X_{n+1} = A X_n, while the initial conditions assert that

    X_0 = ( 100 )
          (  10 ).

These equations enable us to calculate successive vectors X_n; thus X_1 = A X_0, X_2 = A^2 X_0, and in general X_n = A^n X_0. In principle this equation provides the solution of our problem. However the equation is difficult to use since it involves calculating powers of A; these soon become very complicated and there is no obvious formula for A^n.

The key observation is that powers of a diagonal matrix are easy to compute; one simply forms the appropriate power of each diagonal element. Fortunately the matrix A is diagonalizable since it has distinct eigenvalues 2 and 3. Corresponding eigenvectors are found to be (1, 1)^T and (2, 1)^T; therefore the matrix

    S = ( 1  2 )
        ( 1  1 )

diagonalizes A, and

    D = S^{-1} A S = ( 2  0 )
                     ( 0  3 ).
• 294. 278 Chapter Eight: Eigenvalues and Eigenvectors
It is now easy to find X_n; for A^n = (SDS^{-1})^n = S D^n S^{-1}. Therefore

    X_n = A^n X_0 = S D^n S^{-1} X_0
        = ( 1  2 ) ( 2^n   0  ) ( -1   2 ) ( 100 )
          ( 1  1 ) (  0   3^n ) (  1  -1 ) (  10 ),

which leads to

    X_n = ( 180·3^n - 80·2^n )
          (  90·3^n - 80·2^n ).

The solution to the problem can now be read off:

    r_n = 180·3^n - 80·2^n    and    w_n = 90·3^n - 80·2^n.

Let us consider for a moment the implications of these equations. Notice that r_n and w_n both increase without limit as n → ∞ since 3^n is the dominant term; however

    lim_{n→∞} (r_n / w_n) = 2.

The conclusion is that, while both populations explode, in the long run there will be twice as many rabbits as weasels.

Having seen that eigenvalues provide a satisfactory solution to the rabbit-weasel problem, we proceed to consider systems of linear recurrences in general.

Systems of first order linear recurrence relations

A system of first order (homogeneous) linear recurrence relations in functions y_n^{(1)}, ..., y_n^{(m)} of an integral variable n is a set of equations of the form

    y_{n+1}^{(1)} = a_{11} y_n^{(1)} + ... + a_{1m} y_n^{(m)}
    y_{n+1}^{(2)} = a_{21} y_n^{(1)} + ... + a_{2m} y_n^{(m)}
    .....................................................
    y_{n+1}^{(m)} = a_{m1} y_n^{(1)} + ... + a_{mm} y_n^{(m)}
• 295. 8.2: Systems of Linear Recurrences 279
We shall only consider the case where the coefficients a_{ij} are constants. One objective might be to find all the functions y_n^{(1)}, ..., y_n^{(m)} which satisfy the equations of the system, i.e., the general solution. Alternatively, one might want to find a solution which satisfies certain given initial conditions,

    y_0^{(1)} = b_1,  y_0^{(2)} = b_2,  ...,  y_0^{(m)} = b_m,

where b_1, ..., b_m are constants. Clearly the rabbit and weasel problem is of this type.

The method adopted in Example 8.2.1 can be applied with advantage to the general case. First convert the given system of recurrences to matrix form by introducing the coefficient matrix A = [a_{ij}]_{m,m} and defining

    Y_n = ( y_n^{(1)} )                ( b_1 )
          (    ...    )    and    B =  ( ... )
          ( y_n^{(m)} )                ( b_m ).

Then the system of recurrences becomes simply

    Y_{n+1} = A Y_n,

with the initial condition Y_0 = B. The general solution of this is Y_n = A^n B.

Now assume that A is diagonalizable: suppose that in fact D = S^{-1} A S is diagonal with diagonal entries d_1, ..., d_m. Then A = S D S^{-1} and A^n = S D^n S^{-1}, so that

    Y_n = S D^n S^{-1} B.

Here of course D^n is the diagonal matrix with diagonal entries d_1^n, d_2^n, ..., d_m^n.
• 296. 280 Chapter Eight: Eigenvalues and Eigenvectors
Since we know how to find S and D, all we need do is compute the product Y_n = S D^n S^{-1} B and read off its entries to obtain the functions y_n^{(1)}, ..., y_n^{(m)}.

At this point the reader may ask: what if A is not diagonalizable? A complete discussion of this case would take us too far afield. However one possible approach is to exploit the fact that the coefficient matrix A is certainly triangularizable, by 8.1.8. Thus we can find S such that S^{-1} A S = T is upper triangular. Now write U_n = S^{-1} Y_n, so that Y_n = S U_n. Then the recurrence Y_{n+1} = A Y_n becomes S U_{n+1} = A S U_n, or

    U_{n+1} = (S^{-1} A S) U_n = T U_n.

In principle this "triangular" system of recurrence relations can be solved by a process of back substitution: first solve the last recurrence for u_n^{(m)}, then substitute for u_n^{(m)} in the second last recurrence and solve for u_n^{(m-1)}, and so on. What makes the procedure effective is the fact that powers of a triangular matrix are easier to compute than those of an arbitrary matrix.

Example 8.2.2
Consider the system of linear recurrences

    y_{n+1} =  y_n +  z_n
    z_{n+1} = -y_n + 3z_n.

The coefficient matrix A = ( 1 1 ; -1 3 ) is not diagonalizable, but it was triangularized in Example 8.1.6; there it was found that

    T = S^{-1} A S = ( 2  1 )    where    S = ( 1  0 )
                     ( 0  2 )                 ( 1  1 ).

Put U_n = S^{-1} Y_n; here the entries of U_n and Y_n are written u_n, v_n and y_n, z_n respectively. The recurrence relation Y_{n+1} = A Y_n becomes U_{n+1} = T U_n. This system of linear recurrences
• 297. 8.2: Systems of Linear Recurrences 281
is in triangular form:

    u_{n+1} = 2u_n + v_n
    v_{n+1} = 2v_n.

The second recurrence has the obvious solution v_n = d_2 2^n, with d_2 constant. Substitute for v_n in the first equation to get u_{n+1} = 2u_n + d_2 2^n. This recurrence can be solved in a simple-minded fashion by calculating successively u_1, u_2, ... and looking for the pattern. It turns out that u_n = d_1 2^n + d_2 n 2^{n-1}, where d_1 is another constant. Finally, y_n and z_n can be found from the equation Y_n = S U_n; the general solution is therefore

    y_n = d_1 2^n + d_2 n 2^{n-1}
    z_n = d_1 2^n + d_2 (n+2) 2^{n-1}.

Higher order recurrence relations

A system of recurrence relations for y_n^{(1)}, ..., y_n^{(m)} which expresses each y_{n+1}^{(i)} in terms of the y_j^{(1)}, ..., y_j^{(m)} for j = n - r + 1, ..., n is said to be of order r. When r ≥ 2, such a system can be converted into a first order system by introducing more unknowns. The method works well even for a single recurrence relation, as the next example shows.

Example 8.2.3 (The Fibonacci sequence)
The sequence of integers 0, 1, 1, 2, 3, 5, ... is generated by adding pairs of consecutive terms to get the next term. Thus, if the terms are written y_0, y_1, y_2, ..., then y_n satisfies

    y_{n+1} = y_n + y_{n-1},   n ≥ 1,

which is a second order recurrence relation.

To convert this into a first order system we introduce the new function z_n = y_{n-1}, (n ≥ 1). This results in an equivalent
• 298. 282 Chapter Eight: Eigenvalues and Eigenvectors
system of first order recurrences

    y_{n+1} = y_n + z_n
    z_{n+1} = y_n,

with initial conditions y_0 = 0 and z_0 = 1. The coefficient matrix

    A = ( 1  1 )
        ( 1  0 )

has eigenvalues (1 + √5)/2 and (1 - √5)/2, so it is diagonalizable. Diagonalizing A as in Example 8.1.5, we find that

    D = S^{-1} A S = ( (1+√5)/2       0      )
                     (     0      (1-√5)/2  ),

where

    S = ( (1+√5)/2   (1-√5)/2 )
        (     1          1    ).

Then Y_n = A^n Y_0 = (S D S^{-1})^n Y_0 = S D^n S^{-1} Y_0. This yields the rather unexpected formula

    y_n = (1/√5) [ ((1+√5)/2)^n - ((1-√5)/2)^n ]

for the (n+1)th Fibonacci number.

Markov processes

In order to motivate the concept of a Markov process, we consider a problem about population movement.

Example 8.2.4
Each year 10% of the population of California leave the state for some other part of the United States, while 20% of the U.S. population outside California enter the state. Assuming a constant total population of the country, what will the ultimate population distribution be?
• 299. 8.2: Systems of Linear Recurrences 283
Let y_n and z_n be the numbers of people inside and outside California after n years; then the information given translates into the system of linear recurrences

    y_{n+1} = .9 y_n + .2 z_n
    z_{n+1} = .1 y_n + .8 z_n.

Writing

    X_n = ( y_n )    and    A = ( .9  .2 )
          ( z_n )               ( .1  .8 ),

we have X_{n+1} = A X_n. The matrix A has eigenvalues 1 and .7, so we could proceed to solve for y_n and z_n in the usual way. However this is unnecessary in the present example since it is only the ultimate behavior of y_n and z_n that is of interest. Assuming that the limits exist, we see that the real object of interest is the vector

    X_∞ = lim_{n→∞} X_n = ( lim_{n→∞} y_n )
                          ( lim_{n→∞} z_n ).

Taking the limit as n → ∞ of both sides of the equation X_{n+1} = A X_n, we obtain X_∞ = A X_∞; hence X_∞ is an eigenvector of A associated with the eigenvalue 1. An eigenvector is quickly found to be (2, 1)^T. Thus X_∞ must be a scalar multiple of this vector. Now the sum of the entries of X_∞ equals the total U.S. population, p say, and it follows that

    X_∞ = (p/3) ( 2 )
                ( 1 ).

So the (alarming) conclusion is that ultimately two thirds of the U.S. population will be in California and one third elsewhere. This can be confirmed by explicitly calculating y_n and z_n and taking the limit as n → ∞.
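The limiting behavior found in Example 8.2.4 can also be confirmed numerically. The sketch below is an illustration only (it assumes numpy, and the initial population split is arbitrary): it computes A^n X_0 for a large n and compares the result with the predicted vector (2p/3, p/3).

```python
import numpy as np

A = np.array([[0.9, 0.2],
              [0.1, 0.8]])

p = 300.0                        # total population (any value will do)
X0 = np.array([50.0, 250.0])     # an arbitrary initial split with entry sum p

Xn = np.linalg.matrix_power(A, 100) @ X0
print(Xn)                        # approximately [200. 100.] = [2p/3, p/3]

# X_infinity is the eigenvector of A for the eigenvalue 1, scaled to entry sum p.
w, V = np.linalg.eig(A)
v = V[:, np.argmin(abs(w - 1.0))]
print(p * v / v.sum())           # [200. 100.] again
```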
• 300. 284 Chapter Eight: Eigenvalues and Eigenvectors
The preceding problem is an example of what is known as a Markov process. For an understanding of this concept some knowledge of elementary probability is necessary.

A Markov process is a system which has a finite set of states S_1, ..., S_n. At any instant the system is in a definite state and over a fixed period of time it changes to another state. The probability that the system changes from state S_j to state S_i over one time period is assumed to be a constant p_ij. The matrix

    P = [p_ij]_{n,n}

is called the transition matrix of the system. In Example 8.2.4 there are two states: a person is either in or not in California. The transition matrix is the matrix A.

Clearly all the entries of P lie in the interval [0, 1]; more importantly P has the property that the sum of the entries in any column equals 1. Indeed Σ_{i=1}^n p_ij = 1, since it is certain that the system will change from state S_j to some state S_i. This property guarantees that 1 is an eigenvalue of P; indeed det(P - I) = 0 because the sum of the entries in any column of the matrix P - I is equal to zero, so its determinant is zero.

Suppose that we are interested in the behavior of the system over two time periods. For this we need to know the probability of going from state S_j to state S_i over two periods. Now the probability of the system going from S_j to S_i via S_k is p_ik p_kj, so the probability of going from state S_j to S_i over two periods is

    Σ_{k=1}^n p_ik p_kj.

But this is immediately recognizable as the (i,j) entry of P^2; therefore the transition matrix for the system over two time periods is P^2. More generally the transition matrix for the system over k time periods is seen to be P^k by similar considerations.
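The observation that P^k records k-step transition probabilities is easy to see in a small computation. The sketch below (assuming numpy, and reusing the transition matrix of Example 8.2.4 purely as an illustration) prints P^2 and a high power of P; the nearly identical columns of the high power anticipate the fundamental theorem stated on the next page.

```python
import numpy as np

P = np.array([[0.9, 0.2],        # transition matrix of Example 8.2.4
              [0.1, 0.8]])

P2 = P @ P                       # two-step transition probabilities
print(P2)                        # entry (i, j) = probability of S_j -> S_i in two steps

P100 = np.linalg.matrix_power(P, 100)
print(P100)                      # both columns are close to (2/3, 1/3)^T
```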
  • 301. 8.2: Systems of Linear Recurrences 285 The interesting problem for a Markov process is to deter- mine the ultimate behavior of the system over a long period of time, that is to say, limfc_+00(Pfc ). For the (i,j) entry of this matrix is the probability that the system will go from state Si to state Sj in the long run. The first question to be addressed is whether this limit always exists. In general the answer is negative, as a very simple example shows: if P = ( 1, then Pk equals either 1 or I 1, according to whether k is even or odd; so the limit does not exist in this case. Nevertheless it turns out that under some mild assumptions about the matrix the limit does exist. Let us call a transition matrix P regular if some positive power of P has all its entries positive. For example, the matrix I 1 is regular; indeed all powers after the first have positive entries. But, as we have seen, the matrix I I is not regular. A Markov system is said to be regular if its transition matrix is regular. The fundamental theorem about Markov processes can now be stated. A proof may be found in [15], for example. Theorem 8.2.1 Let P be the transition matrix of a regular Markov system. Then limfc_^00(PA: ) exists and has the form (XX ... X) where X is the unique eigenvector of P associated with the eigenvalue 1 which has entry sum equal to 1. Our second example of a Markov process is the library book problem from Chapter One (see Exercise 1.2.12). Example 8.2.5 A certain library owns 10,000 books. Each month 20% of the books in the library are lent out and 80% of the books lent out
  • 302. 286 Chapter Eight: Eigenvalues and Eigenvectors are returned, while 10% remain lent out and 10% are reported lost. Finally, 25% of books listed as lost the previous month are found and returned to the library. How many books will be in the library, lent out, and lost in the long run? Here there are three states that a book may be in: Si = in the library: S2 = lent out: S3 = lost. The transition matrix for this Markov process is .8 .8 .25' P= I .2 .1 0 0 .1 .75 Clearly P2 has positive entries, so P is regular. Of course P has the eigenvalue 1; the corresponding eigenvector with entry sum equal to 1 is found to be So the probabilities that a book is in states Si, S2, S3 after a long period of time are 45/59, 10/59, 4/59 respectively. There- fore the expected numbers of books in the library, lent out, and lost, in the long run, are obtained by multiplying these probabilities by the total number of books, 10,000. These numbers are therefore 7627, 1695, 678 respectively. Exercises 8.2 1. Solve the following systems of linear recurrences with the specified initial conditions: (a) Vn+1 Z v , l lX z n where y0 = 0,z0 = 1; (b) y ;+l : %- X f where y = <>• *> =l - Zn+l — z Vn -r 3Zn
  • 303. 8.2: Systems of Linear Recurrences 287 2. In a certain nature reserve there are two competing animal species A and B. It is observed that the number of species A equals three times the number of A last year less twice the number of species B last year. Also the number of species B is twice the number of B last year less the number of species A last year. Write down a system of linear recurrence relations for an and bn, the numbers of each species after n years, and solve the system. What are the long term prospects for each species? 3. A pair of newborn rabbits begins to breed at age one month, and each successive month produces one pair of off- spring (one of each sex). Initially there were two pairs of rab- bits. If rn is the total number of pairs of rabbits at the begin- ning of the nth month, show that rn satisfies rn+i = r n + r n _ i and ri = 2 = r^- Solve this second order recurrence relation for rn. 4. A tower n feet high is to be built from red, white and blue blocks. Each red block is 1 foot high, while the white and blue blocks are 2 feet high. If un denotes the number of dif- ferent designs for the tower, show that the recurrence relation un+i = un + 2un_i must hold. By solving this recurrence, find a formula for un. 5. Solve the system of recurrence relations yn+i = 3yn — 2zn, zn+i — 2yn — zn, with the initial conditions yo = 1, zo = 0. 6. Solve the second order system yn+i = yn-i, zn+i = yn + 4zn, with the initial conditions yo = 0, y = 1 = z. 7. In a certain city 90% of employed persons retain their jobs at the end of each year, while 60% of the unemployed find a job during the year. Assuming that the total employable population remains constant, find the unemployment rate in the long run.
  • 304. 288 Chapter Eight: Eigenvectors and Eigenvalues 8. A certain species of bird nests in three locations A, B and C. It is observed that each year half of the birds at A and half of the birds at B move their nests to C, while the others stay in the same nesting place. The birds nesting at C are evenly split between A and B. Find the ultimate distribution of birds among the three nesting sites, assuming that the total bird population remains constant. 9. There are three political parties in a certain city, conserva- tives, liberals and socialists. The probabilities that someone who voted conservative last time will vote liberal or socialist at the next election are .3 and .2 respectively. The proba- bilities of a liberal voting conservative or socialist are .2 and .1. Finally, the probabilities of a socialist voting conservative or liberal are .1 and .2. What percentages of the electorate will vote for the three parties in the long run, assuming that everyone votes and the number of voters remains constant? 8.3 Applications to Systems of Linear Differential Equations In this section we show how the theory of eigenvalues developed in 8.1 can be applied to solve systems of linear differential equations. Since there is a close analogy between linear recurrence relations and linear differential equations, the reader will soon notice a similarity between the methods used here and in 8.2. For simplicity we consider initially a system of first or- der linear {homogeneous) differential equations for functions yi,..., yn of x. This has the general form { y'l = aiiVi + • • • + ainVn y'n = anlyi + • • • + annyn
  • 305. 8.3: Applications to Systems of Linear Differential Equations 289 Here the Qj<ij £1X6 assumed to be constants. The object is to find the most general functions 2/i,...,j/n , differentiable in some interval [a ,b], which satisfy the equations of the system. Alternatively one may wish to find functions which satisfy in addition a set of initial conditions of the form yi(xQ) = h, y2(x0) = b2, ..., yn(xn) = K- Here the bi are certain constants and x$ is in the interval [a, b]. Let A = [a,ij], the coefficient matrix of the system and write fyi Y = yn/ Then we define the derivative of Y to be Y' = /y[ y'2 With this notation the given system of differential equations can be written in matrix form Y' = AY. By a solution of this equation we shall mean any column vector Y of n functions in D[a, b] which satisfies the equation. The set of all solutions is a subspace of the vector space of all n-column vectors of differentiable functions; this is called the solution space. It can be shown that the dimension of the so- lution space equals n, so that there are n linearly independent solutions, and every solution is a linear combination of them.
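The statement that the solution space has dimension n can be made concrete numerically: the columns of the matrix exponential e^{Ax} are n linearly independent solutions of Y' = AY, and e^{A(x - x_0)}B is the solution with Y(x_0) = B. The matrix exponential is not part of the discussion here, so the sketch below (assuming numpy and scipy, with an illustrative coefficient matrix that happens to reappear in Example 8.3.1) is only an aside.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-2.0, 1.0],
              [1.0, -2.0]])

# Y(x) = expm(A*(x - x0)) @ B is the unique solution of Y' = AY with Y(x0) = B.
B = np.array([1.0, 0.0])
x0, x = 0.0, 1.5
Y = expm(A * (x - x0)) @ B
print(Y)

# Check the differential equation with a central finite difference.
h = 1e-6
Yp = (expm(A * (x + h)) @ B - expm(A * (x - h)) @ B) / (2 * h)
print(np.allclose(Yp, A @ Y, atol=1e-4))     # True
```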
  • 306. 2 9 0 Chapter Eight: Eigenvectors and Eigenvalues If a set of n initial conditions is given, there is in fact a unique solution of the system satisfying these conditions. For an account of the theory of systems of differential equations the reader may consult a book on differential equations such as [15] or [16]. Here we are concerned with methods of finding solutions, not with questions of existence and uniqueness of solutions. Suppose that the coefficient matrix A is diagonalizable, so there is an invertible matrix S such that D = S~1 AS is diagonal, with diagonal entries d,..., dn say. Here of course the di are the eigenvalues of A. Define U = S-X Y. Then Y — SU and Y' = SU' since S has constant entries. Substituting for Y and Y' in the equation Y' = AY, we obtain SU' = ASU, or U' = {S~1 AS)U = DU. This is a system of linear differential equations for u,..., un, the entries of U. It has the very simple form dUi d2u2 The equation u = diUi is easy to solve since its differential form is d(ln Ui) = di. Thus its general solution is u^ = CiediX where ci is a con- stant. The general solution of the system of linear differential equations for u,..., un is therefore ui =cied i a ; , ... ,un = cnednX . u'2
• 307. 8.3: Applications to Systems of Linear Differential Equations 291
To find the original functions y_i, simply use the equation Y = SU to get

    y_i = Σ_{j=1}^n s_ij u_j = Σ_{j=1}^n s_ij c_j e^{d_j x}.

Since we know how to find S, this procedure provides an effective method of solving systems of first order linear differential equations in the case where the coefficient matrix is diagonalizable.

Example 8.3.1
Consider a long tube divided into four regions along which heat can flow. The regions on the extreme left and right are kept at 0°C, while the walls of the tube are insulated. It is assumed that the temperature is uniform within each region. Let y(t) and z(t) be the temperatures of the regions A and B at time t. It is known that the rate at which each region cools equals the sum of the temperature differences with the surrounding media. Find a system of linear differential equations for y(t) and z(t) and solve it.

[Figure: the tube, with the four regions at temperatures 0°, y(t)°, z(t)°, 0° from left to right; A and B are the two middle regions.]

According to the law of cooling

    y' = (z - y) + (0 - y)
    z' = (y - z) + (0 - z).
• 308. 292 Chapter Eight: Eigenvectors and Eigenvalues
Thus we are faced with the linear system of differential equations

    y' = -2y +  z
    z' =   y - 2z.

Here

    A = ( -2   1 )    and    Y = ( y )
        (  1  -2 )               ( z ).

Now the matrix A is diagonalizable; indeed

    D = S^{-1} A S = ( -1   0 )    where    S = ( 1   1 )
                     (  0  -3 )                 ( 1  -1 ).

Setting U = S^{-1} Y, we obtain from Y' = AY the equation U' = DU. This yields two very simple differential equations

    u_1' = -u_1
    u_2' = -3u_2,

where u_1 and u_2 are the entries of U. Hence u_1 = c e^{-t} and u_2 = d e^{-3t}, with arbitrary constants c and d. Finally

    Y = SU = ( c e^{-t} + d e^{-3t} )
             ( c e^{-t} - d e^{-3t} ).

The general solution of the original system of differential equations is therefore

    y = c e^{-t} + d e^{-3t}
    z = c e^{-t} - d e^{-3t}.

Thus the temperatures of both regions A and B tend to zero as t → ∞.

In the next example complex eigenvalues arise, which causes a change in the procedure.
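A numerical check of Example 8.3.1 can be made with standard software. The sketch below is an illustration only (it assumes numpy and scipy, and the initial temperatures are arbitrary): it integrates Y' = AY and compares the result with the closed form y = c e^{-t} + d e^{-3t}, z = c e^{-t} - d e^{-3t} found above.

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[-2.0, 1.0],
              [1.0, -2.0]])

Y0 = np.array([60.0, 20.0])            # illustrative initial temperatures y(0), z(0)
sol = solve_ivp(lambda t, Y: A @ Y, (0.0, 2.0), Y0, dense_output=True)

# From y(0) = c + d and z(0) = c - d we get the constants of the closed form.
c, d = (Y0[0] + Y0[1]) / 2, (Y0[0] - Y0[1]) / 2
t = 2.0
exact = np.array([c * np.exp(-t) + d * np.exp(-3 * t),
                  c * np.exp(-t) - d * np.exp(-3 * t)])
print(sol.sol(t), exact)               # the two vectors agree to solver accuracy
```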
  • 309. 8.3: Applications to Systems of Linear Differential Equations 29o Example 8.3.2 Solve the linear system of differential equations ( y[ = Vi ~ Vi y'2 = yi+y2 The coefficient matrix here is A = which has complex eigenvalues 1 + % and 1 — i; we are us- ing the familiar notation % = /—l here. The corresponding eigenvectors are 0and (~i* respectively. Let S be the 2x2 matrix which has these vectors as its columns; then S~1 AS = D, the diagonal matrix with diagonal entries 1 + i and 1 — i. If we write U = S~X Y, the system of equations becomes U' = DU, that is, tti = (1 +i)u u'2 = (1 - i)u2 where u and U2 are the entries of U. The first equation has the solution u = e(1+ ^x , while the second has the obvious solution u2 = 0. Using these values for u and U2, we obtain a complex solution of the system of differential equations Y _ s u _ ( i e W y _ su - I e(1+i)x I Of course we are looking for real solutions, but these are in fact at hand. For the real and imaginary parts of Y will also
  • 310. 2 9 4 Chapter Eight: Eigenvectors and Eigenvalues be solutions of the system Y' = AY. Thus we obtain two real solutions from the single complex solution Y, by taking the real and imaginary parts of Y; these are respectively /—e^sin x , , . /e^cos x e cos x J ex sin x Now Y and Y2 are easily seen to be linearly independent solu- tions; therefore the general solution of the system is obtained by taking an arbitrary linear combination of these: Y = c1Y1 + c2Y2 = e*( ~Cl S i n x + C2 C ° S X y c cos x + c2 sin x where c and c2 are arbitrary real constants. Hence J/i =ex (—ci sin x + c2 cos x) y2 = ex (c cos a: + C2 sin a;) Of course the success of the method employed in the last two examples depended entirely upon the fact that A is diag- onalizable. However, should this not be the case, one can still treat the system of differential equations by triangularizing the coefficient matrix and solving the resulting triangular sys- tem using back substitution, rather as was done for systems of linear recurrences in 8.2. Example 8.3.3 Solve the linear system of differential equations 2/i = 3 / i + 2/2 2/2 = -2/1 + 3 2/2 In this case the coefficient matrix
  • 311. 8.3: Applications to Systems of Linear Differential Equations 295 is not diagonalizable, but it can be triangularized. In fact it was shown in Example 8.1.6 that T = S->AS=(l where S = I J. Put U = S Y and write ui, u2 for the entries of U. Then Y = SU and Y' = SU'. The equation Y' = AY now becomes U' = TU. This yields the triangular system u[ = 2ui + u<i u'2 = 2u2 Solving the second equation, we find that u2 = c2e2x with C2 an arbitrary constant. Now substitute for u^ in the first equation to get u[ - 2ui = c2e2x . This is a first order linear equation which can be solved by a standard method: multiply both sides of the equation by the "integrating factor" f -2dx -2x The equation then becomes (uie~2x )' = c2, whence ue~2x = c2x + ci, with c another arbitrary constant. Thus u — c2xe2x + cxe2x . To find the original functions j/i and y2, we form the product Y = SU = e2x ( C l + C 2 X ^ c i +c2{x + 1) Thus the general solution of the system is J/i = (ci +c2x)e2x , y2 = (ci + c2(x + l))e2x
  • 312. 296 Chapter Eight: Eigenvectors and Eigenvalues Finally, suppose that initial conditions j/i (0) = 1 and V2 (0) = 0 are given. We can find the correct values of c and c2 by substituting t = 0 in the expressions for y and j/2, to get ci = 1 and C2 = — 1. The required solution is y = (1 — x)e2x and ?/2 = —x e2x . The next application is one of a military nature. Example 8.3.4 Two armored divisions A and B engage in combat. At time t their respective numbers of tanks are a(t) and b(t). The rate at which tanks in a division are destroyed is proportional to the number of intact enemy tanks at that instant. Initially A and B have ao and bo tanks where ao > &o- Predict the outcome of the battle. According to the information given, the functions a and b satisfy the linear system a' = -kb b' = -ka where k is some positive constant. Here the coefficient matrix is 0 -k' A ~ ' -k 0 The characteristic equation is x2 — k2 = 0, so the eigenvalues are k and —k and A is diagonalizable. It turns out that where S = ( . If we set F = , the system of differential equations becomes Y' = AY. On writing U = S-X Y, we get U' = DU. This is the system u' = ku v' = —kv
• 313. 8.3: Applications to Systems of Linear Differential Equations 297
where u and v are the entries of U. Hence u = c e^{kt} and v = d e^{-kt}, with c and d arbitrary constants. The general solution is Y = SU, which yields

    a =  c e^{kt} + d e^{-kt}
    b = -c e^{kt} + d e^{-kt}.

Now the initial conditions are a(0) = a_0 and b(0) = b_0, so

     c + d = a_0
    -c + d = b_0.

Solving, we obtain c = (a_0 - b_0)/2 and d = (a_0 + b_0)/2. Therefore the numbers of tanks surviving at time t in Divisions A and B are respectively

    a = ((a_0 - b_0)/2) e^{kt} + ((a_0 + b_0)/2) e^{-kt}
    b = -((a_0 - b_0)/2) e^{kt} + ((a_0 + b_0)/2) e^{-kt}.

It is more convenient to write a(t) and b(t) in terms of the hyperbolic functions cosh(x) = (1/2)(e^x + e^{-x}) and sinh(x) = (1/2)(e^x - e^{-x}). Then the solution becomes

    a = a_0 cosh(kt) - b_0 sinh(kt)
    b = b_0 cosh(kt) - a_0 sinh(kt).

Now Division B will have lost all its tanks when b = 0, i.e., after time

    t = (1/k) tanh^{-1}(b_0 / a_0).

Observe also that

    a^2 - b^2 = a_0^2 - b_0^2,
• 314. 298 Chapter Eight: Eigenvectors and Eigenvalues
because of the identity cosh^2(kt) - sinh^2(kt) = 1. Therefore at the time when Division B has lost all of its tanks, Division A still has a tanks where a^2 - 0 = a_0^2 - b_0^2. Hence the number of tanks that Division A has left at the end of the battle is

    √(a_0^2 - b_0^2).

Not surprisingly, since it had more tanks to start with, Division A wins the battle.

However, there is a way in which Division B could conceivably win. Suppose that

    a_0/√2 < b_0 < a_0.

Suppose further that Division A consists of two columns with equal numbers of tanks, and that Division B manages to attack one column of Division A before the other column can come to its aid. Since b_0 > (1/2)a_0, Division B defeats the first column of Division A, and it still has

    √(b_0^2 - (1/4)a_0^2)

tanks left. Then Division B attacks the second column and wins with

    √(b_0^2 - (1/4)a_0^2 - (1/4)a_0^2) = √(b_0^2 - (1/2)a_0^2)

tanks left. Thus Division B wins the battle despite having fewer tanks than Division A: but it must have more than a_0/√2, or about 71%, of the strength of the larger division for the plan to work. This explains the frequent success of the "divide and conquer" strategy.

Higher order equations

Systems of linear differential equations of order 2 or more can be converted to first order systems by introducing additional functions. Once again the procedure is similar to that adopted for systems of linear recurrences.
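The predictions of the combat model are easy to reproduce numerically. The sketch below is an illustration only (it assumes scipy, and the tank counts a_0 = 1000, b_0 = 800 are chosen merely so that a_0/√2 < b_0 < a_0): it integrates a' = -kb, b' = -ka until one side is wiped out, first for a single battle and then for the divide-and-conquer plan in which Division B meets the two half-columns of Division A in succession.

```python
from scipy.integrate import solve_ivp

def battle(a0, b0, k=1.0):
    """Integrate a' = -k*b, b' = -k*a until one side runs out of tanks."""
    def out_of_tanks(t, y):
        return min(y)                  # event: some side reaches zero
    out_of_tanks.terminal = True
    sol = solve_ivp(lambda t, y: [-k * y[1], -k * y[0]],
                    (0.0, 100.0), [a0, b0], events=out_of_tanks)
    return sol.y[0, -1], sol.y[1, -1]  # survivors (a, b) at the stopping time

a0, b0 = 1000.0, 800.0

# Single battle: A wins with about sqrt(a0^2 - b0^2) = 600 tanks left.
print(battle(a0, b0))

# Divide and conquer: B fights each half of A separately and wins,
# with about sqrt(b0^2 - a0^2/2), roughly 374, tanks left.
_, b1 = battle(a0 / 2, b0)
_, b2 = battle(a0 / 2, b1)
print(b2)
```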
  • 315. 8.3: Applications to Systems of Linear Differential Equations 299 Example 8.3.5 Solve the second order system v'i = -2y2 + y[ + 2y'2 = 2yx +2y[ 2/2 The system may be converted to a first order system by introducing two new functions 2/3 = 2/i and y4 = y'2. Thus y'{ = y'3 and y2 = y4. The given system is therefore equivalent to the first order system ' 2/i = 2/3 2/2=2/4 y'3 = -2y2 + y3 + 2y4 , 2/4 = 2 2/i + 2y3 - y4 The coefficient matrix here is A = 0 1 0 0 -2 1 0 2 1 2 - 1 / Its eigenvalues turn out to be 1, —1, 2, —2, with corresponding eigenvectors / 2 -1 -2 1/ 1 2 W / -1 -2 V 2 / Therefore, if S denotes the matrix with these vectors as its columns, we have S~1 AS = D, the diagonal matrix with di- agonal entries 1, -1,2, - 2 . Now write U = S~1 Y. Then the
  • 316. 3 0 0 Chapter Eight: Eigenvectors and Eigenvalues equation Y' = AY becomes U' = {S~X AS)U = DU, which is equivalent to u[ = wi, ^2 — ~ u 2, u'3 = 2v,3, u'A = —2M4. Solving these simple equations, we obtain ui=ciex , u2 = c2e~x , u3 = c3e2x , • u 4 = c4e~2:r . The functions 2/1 and 2/2 ma Y n o w De read off from the equation Y = SU to give the general solution 2/1 = cie* + 2c2e~x + c3e2a; + c4e-2:E y2 = 2ciea: — C2e_x + c^e2x — c$e~2x Exercises 8.3 1. Find the general solutions of the following systems of linear differential equations: (a) {ti= -Vl+ * (b) ly ) = lyi 72 /2 {V2= 2yi~ 3y2 x {y2 = - 2yi + 3y2 2/1 = 2/1+2/2 + 2/3 (c) { y'2= 2/2 y's = 2/2 + ys 2. Find the general solution (in real terms) of the system of differential equations y'i= 2/i+ 2/2 V2 = —22/1 + 3y2 Then find a solution satisfying the initial conditions yi(0) = 1, 2/2(0) = 2.
  • 317. 8.3: Applications to Systems of Linear Differential Equations 301 3. By triangularizing the coefficient matrix solve the system of differential equations (y'i = 5yi + 3y2 2/2 = -3yi - V2 Then find a solution satisfying the initial conditions yi(0) = 0, y2(0) = 2. 4. Solve the second order linear system y'( = 2j/i + y2 + y[ + y'2 y'i = ~ %i + 2 2/2 + 5yi - y'2 5. Given a system of n (homogeneous) linear differential equa- tions of order k, how would you convert this to a system of first order equations? How many equations will there be in the first order system? 6. Describe a general method for solving a system of second order linear differential equations of the form Y" = AY, where A is diagonalizable. 7. Solve the systems of differential equations 2 y'i = y - 2/2 (h) [y'i = - 4y- y'{ = 33/1 + 5y2 { M y'{ = Vl + 5y2 [Note that the general solution of the differential equation u" = o?u is u = cicosh(ax) + c2sinh(ax)]. 8. {The double pendulum) A string of length 21 is hung from a rigid support. Two weights each of mass m are attached to the midpoint and lower end of the string, which is then allowed to execute small vibrations subject to gravity only. Let y and y2 denote the horizontal displacements of the two weights from the equilibrium position at time t. (a) (optional) By using Newton's Second Law of Motion, show that yi and y2 satisfy the differential equations
  • 318. 302 Chapter Eight: Eigenvectors and Eigenvalues Vi = a2 (-3yi + y2), y2 = a2 (yx - y2) where a = y/gjl and g is the acceleration due to gravity. (b) Solve the linear system in (a) for y and y2. [Note: the general solution of the differential equation y" + a2 y = 0 is y = c cos ax + c2 sin ax]. 9. In Example 8.3.4 assume that Division A consists of m equal columns. Suppose that Division B is able to attack each column of A in turn. Show that Division B will win the battle provided that bo > -s ^-.
  • 319. Chapter Nine MORE ADVANCED TOPICS This chapter is intended to serve as an introduction to some of the more advanced parts of linear algebra. The most important result of the chapter is the Spectral Theorem, which asserts that every real symmetric matrix can be diagonalized by means of a suitable real orthogonal matrix. This result has applications to quadratic forms, bilinear forms, conies and quadrics, which are described in 9.2 and 9.3. The final section gives an elementary account of the important topic of Jordan normal form, a subject not always treated in a book such as this. 9.1 Eigenvalues and Eigenvectors of Symmetric and Hermitian Matrices In this section we continue the discussion of diagonaliz- ability of matrices, which was begun in 8.1, with special regard to real symmetric matrices. More generally, a square complex matrix A is called hermitian if A = A*, that is, A = (A)T . Thus hermitian matrices are the complex analogs of real symmetric matrices. It will turn out that the eigenvalues and eigenvectors of such matrices have remark- able properties not possessed by complex matrices in general. The first indication of special behavior is the fact that their eigenvalues are always real, while the eigenvectors tend to be orthogonal. 303
  • 320. 304 Chapter Nine: Advanced Topics Theorem 9.1.1 Let A be a hermitian matrix. Then: (a) the eigenvalues of A are all real; (b) eigenvectors of A associated with distinct eigenvalues are orthogonal. Proof Let c be an eigenvalue of A with associated eigenvector X, so that AX — cX. Taking the complex transpose of both sides of this equation and using 7.1.7, we obtain X*A = cX* since A = A*. Now multiply both sides of this equation on the right by X to get X*AX = cX*X = c||X||2 : remember here that X*X equals the square of the length of X. But (X*AX)* = X*A*X** = X*AX; thus the scalar X* AX equals its complex conjugate and so it is real. It follows that c||X||2 is real. Since lengths of vectors are always real, we deduce that c, and hence c, is real, which completes the proof of (a). To prove (b) take two eigenvectors X and Y associated with distinct eigenvalues c and d. Thus AX = cX and AY = dY. Then Y*AX = Y*(cX) = cY*X, and in the same way X*AY = dX*Y. However, by 7.1.7 again, (X*AY)* = Y*A*X = Y*AX. Therefore (dX*Y)* = cY*X, or dY*X = cY*X because d is real by the first part of the proof. This means that (c—d)Y*X = 0, from which it follows that Y*X = 0 since c ^ d. Thus X and Y are orthogonal. Suppose now that {Xi,..., Xr} is a set of linearly inde- pendent eigenvectors of the n x n hermitian matrix A, and that r is chosen as large as possible. We can multiply Xi by l/||Xj|| to produce a unit vector; thus we may assume that each Xi is a unit vector. By 9.1.1 {Xi,..., Xr} is an orthonor- mal set. Now write U = (X .. -Xr), an n x r matrix. Then U has the property AU = (AYi ... AXr) = {c1X1 ... crXr),
  • 321. 9.1: Symmetric and Hermitian Matrices 305 where c,..., cr are the eigenvectors corresponding to X,..., Xr respectively. Hence AU = (X1X2...Xr) /ci 0 0 0 c2 0 0 0 0 0 Cr/ = UD, where D is the diagonal matrix with diagonal entries c,..., cr. Since the columns of U form an orthonormal set, U*U = Ir. In general r < n, but should it be the case that r = n, then U is n x n and we have U^1 = U*, so that U is unitary (see 7.3). Therefore U*AU = D and A is diagonalized by the matrix U. In other words, if there exist n mutually orthogonal eigenvectors of A, then A can be diagonalized by a unitary matrix. The outstanding question is, of course, whether there are always that many linearly independent eigenvectors. We shall shortly see that this is the case. A key result must first be established. Theorem 9.1.2 (Schur '5 Theorem) Let A be an arbitrary square complex matrix. Then there is a unitary matrix U such that U*AU is upper triangular. More- over, if A is a real symmetric matrix, then U can be chosen real and orthogonal. Proof Let A be an n x n matrix. The proof is by induction on n. Of course, if n = 1, then A is already upper triangular, so let n > 1. There is an eigenvector X of A, with associated eigenvalue c say. Here we can choose X to be a unit vector in Cn . Using 5.1.4 we adjoin vectors to X to form a basis of Cn . Then the Gram-Schmidt procedure (in the complex case) may be applied to produce an orthonormal basis X,..., Xn of Cn ; note that X is a member of this basis.
  • 322. 306 Chapter Nine: Advanced Topics Let UQ denote the matrix (X... Xn); then UQ is unitary since its columns form an orthonormal set. Now U*QAXX = U*0{ClXx) = ci(C/0*Xi). Also X*Xl=0iii>l, while X?XX = 1. Hence U^AXX = Cl / c i 0 . X nX l J W Since UZAUQ = U*QA{X1 ...Xn) = (U£AX1 U*0AX2 ... U£AXn), we deduce that ci B U^AUo = 0 Ai where A is a matrix with n — 1 rows and columns and 5 is an (n — l)-row vector. We now have the opportunity to apply the induction hypothesis on n; there is a unitary matrix U such that C/*i4it/i = Ti is upper triangular. Put C/2 = 1 0 0 C/i which is surely a unitary matrix. Then let U — VQU^'I this also unitary since U*U = U^(U^U0)U2 = U;U2 = I. Finally ^M£/ = C^(^0M^o)y'2 = ^ ( C 0 1 f j ^ ,
  • 323. 9.1: Symmetric and Hermitian Matrices 307 which equals 1 0 / c i B ( l 0 (a BUX 0 UZJ0 A1J0 Uj V° UtA&iJ- This shows that U*AU=(C ' B ^y an upper triangular matrix, as required. If the matrix A is real symmetric, the argument shows that there is a real orthogonal matrix S such that ST AS is diagonal. The point to keep in mind here is that the eigenval- ues of A are real by 9.1.1, so that A has a real eigenvector. The crucial theorem on the diagonalization of hermitian matrices can now be established. Theorem 9.1.3 (The Spectral Theorem) Let A be a hermitian matrix. Then there is a unitary matrix U such that U*AU is diagonal. If A is a real symmetric matrix, then U may be chosen to be real and orthogonal. Proof By 9.1.2 there is a unitary matrix U such that U*AU = T is upper triangular. Then T* = U*A*U = U*AU = T, so T is hermitian. But T is upper triangular and T* is lower triangular, so the only way that T and T* can be equal is if all the off-diagonal entries of T are zero, that is, T is diagonal. The case where A is real symmetric is handled by the same argument. Corollary 9.1.4 If A is an n x n hermitian matrix, there is an orthonormal basis of Cn which consists entirely of eigenvectors of A. If in addition A is real, there is an orthonormal basis of Rn consisting of eigenvectors of A.
  • 324. 308 Chapter Nine: Advanced Topics Proof By 9.1.3 there is a unitary matrix U such that U* AU = D is diagonal, with diagonal entries d,..., dn say. If Xi,..., Xn are the columns of U, then the equation AU = UD implies that AXi = diXi for i = 1,..., n. Therefore the Xi are eigen- vectors of A, and since U is unitary, they form an orthonormal basis of Cn . The argument in the real case is similar. This justifies our hope that an n x n hermitian matrix always has enough eigenvectors to form an orthonormal basis of Cn . Notice that this will be the case even if the eigenvalues of A are not all distinct. The following constitutes a practical method of diago- nalizing an n x n hermitian matrix A by means of a unitary matrix. For each eigenvalue find a basis for the correspond- ing eigenspace. Then apply the Gram-Schmidt procedure to get an orthonormal basis of each eigenspace. These bases are then combined to form an orthonormal set, say {Xi,..., Xn}. By 9.1.4 this will be a basis of Cn . If U is the matrix with columns X,..., Xn, then U is hermitian and U*AU is diago- nal, as was shown in the discussion preceding 9.1.2. The same procedure is effective for real symmetric matrices. Example 9.1.1 Find a real orthogonal matrix which diagonalizes the matrix The eigenvalues of A are 3 and —1, (real of course), and corresponding eigenvectors are ( l ) a n d ( ~ l ) '
  • 325. 9.1: Symmetric and Hermitian Matrices 309 These are orthogonal; to get an orthonormal basis of R2 , re- place them by the unit eigenvectors Finally let ^OO^T^i which is an orthogonal matrix. The theory predicts that as is easily verified by matrix multiplication. Example 9.1.2 Find a unitary matrix which diagonalizes the hermitian matrix / 3/2 i/2 0' A= -i/2 3/2 0 V 0 0 1 where i = ^/—T. The eigenvalues are found to be 1, 2, 1, with associated unit eigenvectors -if y/2 ( 1/V2 / 0 ' l/x/2 , -i/y/2 , 0 Therefore U*AU = / l 0 0' 0 2 0
  • 326. 3 1 0 Chapter Nine: Advanced Topics where U is the unitary matrix ^2 V 0 0 V2/ Normal matrices We have seen that every nxn hermitian matrix A has the property that there is an orthonormal basis of C n consisting of eigenvectors of A. It was also observed that this property immediately leads to A being diagonalizable by a unitary ma- trix, namely the matrix whose columns are the vectors of the orthonormal basis. We shall consider what other matrices have this useful property. A complex matrix A is called normal if it commutes with its complex transpose, A* A = AA*. Of course for a real matrix this says that A commutes with its transpose AT . Clearly hermitian matrices are normal; for if A = A*, then certainly A commutes with A*. What is the connection between normal matrices and the existence of an orthonormal basis of eigenvectors? The somewhat surprising answer is given by the next theorem. Theorem 9.1.5 Let A be a complex nxn matrix. Then A is normal if and only if there is an orthonormal basis of Cn consisting of eigen- vectors of A. Proof First of all suppose that Cn has an orthonormal basis of eigenvectors of A. Then, as has been noted, there is a uni- tary matrix U such that U*AU = D is diagonal. This leads
  • 327. 9.1: Symmetric and Hermitian Matrices 311 to A = UDU* because U* = U~x . Next we perform a di- rect computation to show that A commutes with its complex transpose: AA* = UDU*UD*U* = UDD*U*, and in the same way A* A = UD*U*UDU* = UD*DU*. But diagonal matrices always commute, so DD* = D*D. It follows that AA* = A*A, so that A is normal. It remains to show that if A is normal, then there is an orthonormal basis of Cn consisting entirely of eigenvectors of A. From 9.1.2 we know that there is a unitary matrix U such that U* AU = T is upper triangular. The next observation is that T is also normal. This too is established by a direct computation: T*T = U*A*UU*AU = U*(A*A)U. In the same way TT* = U*(AA*)U. Since A*A = AA*, it follows that T*T = TT*. Now equate the (1, 1) entries of T*T and TT*; this yields the equation |tii|2 = |iii|2 + |£i2|2 + --' + |iin|2 , which implies that £12,..., tn are all zero. By looking at the (2, 2), (3, 3),..., (n, n) entries of T*T and TT*, we see that all the other off-diagonal entries of T vanish too. Thus T is actually a diagonal matrix. Finally, since AU = UT, the columns of U are eigenvec- tors of A, and they form an orthonormal basis of C n because U is unitary. This completes the proof of the theorem. The last theorem provides us with many examples of di- agonalizable matrices: for example, complex matrices which are unitary or hermitian are automatically normal, as are real symmetric and real orthogonal matrices. Any matrix of these types can therefore be diagonalized by a unitary matrix.
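Numerically, the orthonormal eigenvector basis promised by 9.1.3 and 9.1.4 is exactly what standard library routines produce. The sketch below (assuming numpy; it is an illustration, not part of the text's development) diagonalizes the hermitian matrix of Example 9.1.2 by a unitary matrix.

```python
import numpy as np

# The hermitian matrix of Example 9.1.2.
A = np.array([[1.5, 0.5j, 0.0],
              [-0.5j, 1.5, 0.0],
              [0.0, 0.0, 1.0]])

# eigh is intended for hermitian matrices: it returns real eigenvalues and
# an orthonormal set of eigenvectors (the columns of U).
w, U = np.linalg.eigh(A)
print(w)                                       # [1. 1. 2.]
print(np.allclose(U.conj().T @ U, np.eye(3)))  # True: U is unitary
print(np.round(U.conj().T @ A @ U, 10))        # the diagonal matrix diag(1, 1, 2)
```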
  • 328. 312 Chapter Nine: Advanced Topics Exercises 9.1 1. Find unitary or orthogonal matrices which diagonalize the following matrices: ( a (i a); (»>(; j -»)! 2. Suppose that A is a complex matrix with real eigenvalues which can be diagonalized by a unitary matrix. Prove that A must be hermitian. 3. Show that an upper triangular matrix is normal if and only if it is diagonal. 4. Let A be a normal matrix. Show that A is hermitian if and only if all its eigenvalues are real. 5. A complex matrix A is called skew-hermitian if A* = —A. Prove the following statements: (a) a skew-hermitian matrix is normal; (b) the eigenvalues of a skew-hermitian matrix are purely imaginary, that is, of the form af^l where a is real; (c) a normal matrix is skew-hermitian if all its eigenvalues are purely imaginary. 6. Let A be a normal matrix. Prove that A is unitary if and only if all its eigenvalues c satisfy c = 1. 7. Let X be any unit vector in Cn and put A = In — 2XX*. Prove that A is both hermitian and unitary. Deduce that A = A~ 8. Give an example of a normal matrix which is not hermitian, skew-hermitian or unitary. [Hint: use Exercises 4, 5, and 6].
  • 329. 9.2: Quadratic Forms 313 9. Let A be a real orthogonal n x n matrix. Prove that A is similar to a matrix with blocks down the diagonal each of which is Ii, —Im, or else a matrix of the form cos 9 — sin 9 sin 9 cos 9 J where 0 < 9 < 2n, and 9 ^ TT. [Hint: by Exercise 6 the eigenvalues of A have modulus 1; also A is similar to a diagonal matrix whose diagonal entries are the eigenvalues]. 9.2 Quadratic Forms A quadratic form in the real variables x,..., xn is a poly- nomial in x,..., xn with real coefficients in which every term has degree 2. For example, the expression ax2 + 2bxy + cy2 is a quadratic form in x and y. Quadratic forms occur in many contexts; for example, the equations of a conic in the plane and a quadric surface in three-dimensional space involve quadratic forms. We begin by observing that the quadratic form q — ax2 + 2bxy + cy2 in x and y can be written as a product of two vectors and a symmetric matrix, In general any quadratic form q in x i , . . . , xn can be written in this form. For let q be given by the equation n n q = y j / j aijXiXj i=l j = l
  • 330. 314 Chapter Nine: Advanced Topics where the a^- are real numbers. Setting A = [ar,]n)Tl and writing X for the column vector with entries x,..., xn, we see from the definition of matrix products that q may be written in the form q = XT AX. Thus the quadratic form q is determined by the real matrix A. At this point we make the crucial observation that noth- ing is lost if we assume that A is symmetric. For, since XT AX is scalar, q may also be written as (XT AX)T = XT AT X; therefore q= {XT AX + XT AT X) = XT { {A + AT ) )X. ZJ Zi It follows that A can be replaced by the symmetric matrix ^(A + AT ). For this reason it will in future be tacitly assumed that the matrix associated with a quadratic form is symmetric. The observation of the previous paragraph allows us to apply the Spectral Theorem to an arbitrary quadratic form. The conclusion is that a quadratic form can be written in terms of squares only. Theorem 9.2.1 Let q = XT AX be an arbitrary quadratic form. Then there is a real orthogonal matrix S such that q = cix[ + • • • + cnx'n where x[,..., x'n are the entries of X — ST X and C,..., cn are the eigenvalues of the matrix A. Proof By 9.1.3 there is a real orthogonal matrix S such that ST AS = D is diagonal, with diagonal entries c±,..., cn say. Define X to be ST X; then X = SX . Substituting for X, we find that q = XT AX = (SX')T A(SX') = {X')T {ST AS)X' = (X')T DX'.
  • 331. 9.2: Quadratic Forms 315 Multiplying out the final matrix product, we find that q = / 2 , - / 2 CXi T • • • ~r cnxn . Application to conies and quadrics We recall from the analytical geometry of two dimensions that a conic is a curve in the plane with equation of the second degree, the general form being ax2 + 2bxy + cy2 + dx + ey + f = 0 where the coefficients are real numbers. This can be written in the matrix form XT AX + (d e)X + f = 0 where x - ( ; ) - d A - ( ; J). So there is a quadratic form in x and y involved in this conic. Let us examine the effect on the equation of the conic of ap- plying the Spectral Theorem. Let S be a real orthogonal matrix such that ST AS = , J where a' and d are the eigenvalues of A. Put X' = ST X and denote the entries of X by x',y'; then X = SX' and the equation of the conic takes the form ( X ' ) T ( o °)x' + {de)SX' + f = 0, or equivalently, a'x'2 + c'y'2 + d'x' + e'y' + / = 0 for certain real numbers d' and e'. Thus the advantage of changing to the new variables x' and y' is that no "cross term" in x'y' appears in the quadratic form.
  • 332. 316 Chapter Nine: Advanced Topics There is a good geometrical interpretation of this change of variables: it corresponds to a rotation of axes to a new set of coordinates x' and y'. Indeed, by Examples 6.2.9 and 7.3.7, any real 2 x 2 orthogonal matrix represents either a rotation or a reflection in R2 ; however a reflection will not arise in the present instance: for if it did, the equation of the conic would have had no cross term to begin with. By Example 7.3.7 the orthogonal matrix S has the form cos 9 — sin 9 sin 9 cos 9 J where 9 is the angle of rotation. Since X = ST X, we obtain the equations x' = x cos 9 + y sin 9 y' = —x sin 9 + y cos 9 The effect of changing the variables from x,y to x',y' is to rotate the coordinate axes to axes that are parallel to the axes of the conic, the so-called principal axes. Finally, by completing the square in x' and y' as nec- essary, we can obtain the standard form of the conic, and identify it as an ellipse, parabola, hyperbola (or degenerate form). This final move amounts to a translation of axes. So our conclusion is that the equation of any conic can be put in standard form by a rotation of axes followed by a translation of axes. Example 9.2.1 Identify the conic x2 + Axy + y2 + 3x + y — 1 = 0. The matrix of the quadratic form x2 + 4xy + y2 is
• 333. 9.2: Quadratic Forms 317

    A = ( 1  2 )
        ( 2  1 ).

It was shown in Example 9.1.1 that the eigenvalues of A are 3 and -1 and that A is diagonalized by the orthogonal matrix

    S = (1/√2) ( 1  -1 )
               ( 1   1 ).

Put X' = S^T X, where X' has entries x' and y'; then X = SX' and we read off that

    x' = (1/√2)( x + y)
    y' = (1/√2)(-x + y).

So here θ = π/4 and the correct rotation of axes for this conic is through angle π/4 in an anticlockwise direction. Substituting for x and y in the equation of the conic, we get

    3x'^2 - y'^2 + 2√2 x' - √2 y' - 1 = 0.

From this we can already see that the conic is a hyperbola. To obtain the standard form, complete the square in x' and y':

    3(x' + √2/3)^2 - (y' + 1/√2)^2 = 7/6.

Hence the equation of the hyperbola in standard form is

    3x''^2 - y''^2 = 7/6,

where x'' = x' + √2/3 and y'' = y' + 1/√2. This is a hyperbola whose center is at the point where x' = -√2/3 and y' = -1/√2; thus the xy-coordinates of the center of the hyperbola are (1/6, -5/6). The axes of the hyperbola are the lines x'' = 0 and y'' = 0, that is, x + y = -2/3 and x - y = 1.
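The arithmetic in Example 9.2.1 can be confirmed with a few lines of code. The sketch below is only an illustrative check (it assumes numpy, and it uses the standard fact, not proved in the text, that the center of a central conic X^T A X + (d e)X + f = 0 is the solution of 2AX + (d, e)^T = 0).

```python
import numpy as np

# Matrix of the quadratic part of x^2 + 4xy + y^2 + 3x + y - 1 = 0.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
print(np.linalg.eigvalsh(A))               # [-1.  3.]: one of each sign, a hyperbola

# Center of the conic: solve 2A X = -(d, e)^T with d = 3, e = 1.
d, e = 3.0, 1.0
center = np.linalg.solve(2 * A, -np.array([d, e]))
print(center)                              # [ 0.1666... -0.8333...] = (1/6, -5/6)
```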
  • 334. 318 Chapter Nine: Advanced Topics Quadrics A quadric is a surface in three-dimensional space whose equation has degree 2 and therefore has the form ax2 + by2 + cz2 + 2dxy + 2eyz + 2fzx + gx + hy + iz + j = 0. Let A be the symmetric matrix a d f d b e . f e c) Then the equation of the quadric may be written in the form XT AX + (gh i)X + .7=0. where X is the column with entries x, y, z. Recall from analytical geometry that a quadric is one the following surfaces: an ellipsoid, a hyperboloid, a paraboloid, a cone, a cylinder (or a degenerate form). The type of a quadric can be determined by a rotation to principal axes, just as for conies. Thus the procedure is to find a real orthogonal matrix S such that ST AX = D is diagonal, with entries a', b', c' say. Put X' = ST X. Then X = SX' and XT AX = (X')T DX': the equation of the quadric becomes (X'fDX' + (g h i)SX' +.7=0, which is equivalent to a'x'2 + b'y'2 + cV2 + g'x' + tiy' + i'z' + j = 0. Here a',b',c' are the eigenvalues of A, while ghi' are cer- tain real numbers. By completing the square in x', y', z' as
  • 335. 9.2: Quadratic Forms 319 necessary, we shall obtain the equation of the quadric in stan- dard form; it will then be possible to recognise its type and position. The last step represents a translation of axes. Example 9.2.2 Identify the quadric surface x2 +y2 + z2 + 2xy + 2yz + 2zx - x + 2y - z = 0. The matrix of the relevant quadratic form is A = and the equation of the quadric in matrix form is XT AX + (-l 2 - l ) X = 0. We diagonalize A by means of an orthogonal matrix. The eigenvalues of A are found to be 0, 0, 3, with corresponding unit eigenvectors W 2 / 0 / l A / 3 -1/V2 , 1A/2 , W 3 . o J V-WV W3/ The first two vectors generate the eigenspace corresponding to the eigenvalue 0. We need to find an orthonormal basis of this subspace; this can be done either by using the Gram-Schmidt procedure or by guessing. Such a basis turns out to be
  • 336. 320 Chapter Nine: Advanced Topics Therefore A is diagonalized by the orthogonal matrix / W2 W6 W3 S = -1/^2 1A/6 1/^3 . V 0 -2/V6 1/V3/ The matrix 5 represents a rotation of axes. P u t X = 5 T X ; then X = SX' and X r A X = (X')T (5T A5)X = (X'fDX, where D is the diagonal matrix with diagonal entries 0, 0, 3. The equation of the quadric becomes X'T DX' + (-l 2 -1)SX' = 0 or V2 V® This is a parabolic cylinder whose axis is the line with equa- tions y' = y/Zx', z' = 0. Definite quadratic forms Consider once again a quadratic form q = XT AX in real variables x±,..., xn, where A is a real symmetric matrix. In some applications it is the sign of q that is significant. The quadratic form q is said to be positive definite if q > 0 whenever 1 ^ 0 . Similarly, q is called negative definite if q < 0 whenever X ^ 0. If, however, q can take both positive and negative values, then q is said to be indefinite. The terms positive definite, negative definite and indefinite can also be applied to a real symmetric matrix A, according to the behav- ior of the corresponding quadratic form q = XT AX. For example, the expression 2x2 + 3y2 is positive unless x = 0 = y, so this is a positive definite quadratic form, while
  • 337. 9.2: Quadratic Forms 321 —2x2 — 3y2 is clearly negative definite. On the other hand, the form 2x2 — 3y2 can take both positive and negative values, so it is indefinite. In these examples it was easy to decide the nature of the quadratic form since it contained only squared terms. How- ever, in the case of a general quadratic form, it is not possible to decide the nature of the form by simple inspection. The diagonalization process for symmetric matrices allows us to reduce the problem to a quadratic form whose matrix is diag- onal, and which therefore involves only squared terms. From this it is apparent that it is the signs of the eigenvalues of the matrix A that are important. The definitive result is Theorem 9.2.2 Let A be a real symmetric matrix and let q = XT AX: then (a) q is positive definite if and only if all the eigenvalues of A are positive; (b) q is negative definite if and only if all the eigenvalues of A are negative; (c) q is indefinite if and only if A has both positive and negative eigenvalues. Proof There is a real orthogonal matrix S such that ST AS = D is diagonal, with diagonal entries c,..., cn, say. Put X1 = ST X; then X = SX' and q = XT AX = (X')T (ST AS)X' = (x'fDX, so that q takes the form q = cx'2 + c2x'2 2 H h cnx'n 2 where the entries of X . Thus q, considered as a quadratic form in x'-y,...,x'n, involves only squares. Now observe that as X varies over the set of all non-zero vectors
  • 338. 322 Chapter Nine: Advanced Topics in Rn , so does X' = ST X. This is because ST = S'1 is invertible. Therefore q > 0 for all non-zero X if and only if q > 0 for all non-zero X . In this way we see that it is sufficient to discuss the behavior of q as a quadratic form in x[,..., x'n. Clearly q will be positive definite as such a form precisely when c,..., cn are all positive, with a corresponding statement for negative definite: but q is indefinite if there are positive and negative Q'S. Finally ci,..., cn are just the eigenvalues of A, so the assertion of the theorem is proved. Let us consider in greater detail the important case of a quadratic form q in two variables x and y, say q = ax2 + 2bxy + cy2 ; the associated symmetric matrix is Let the eigenvalues of A be d±and d2. Then by 8.1.3 we have the relations det(A) = dd2 and tr(A) = d + d2; hence dd2 = ac — b2 and d + d2 = a + c. Now according to 9.2.2 the form q is positive definite if and only if d and d2 are both positive. This happens precisely when ac > b2 and a > 0. For these conditions are certainly necessary if d and d2 are to be positive, while if the conditions hold, a and c must both be positive since the inequality ac > b2 shows that a and c have the same sign. In a similar way we argue that the conditions for A to be negative definite are ac > b2 and a < 0. Finally, q is indefinite if and only if ac < b2 : for by 9.2.2 the condition for q to be indefinite is that d and d2 have opposite signs, and this is equivalent to the inequality dd2 < 0. Therefore we have the following result.
  • 339. 9.2: Quadratic Forms 323 Corollary 9.2.3 Let q = ax2 + 2bxy + cy2 be a quadratic form in x and y. Then: (a) q is positive definite if and only if ac > b2 and a > 0; (b) q is negative definite if and only if ac > b2 and a < 0; (c) q is indefinite if and only if ac < b2 . Example 9.2.3 Let q = -2x2 + xy - 3y2 . Here we have a = - 2 , b = 1/2, c = —3. Since ac — b2 > 0 and a < 0, the quadratic form is negative definite, by 9.2.3. The status of a quadratic form in three or more variables can be determined by using 9.2.2. Example 9.2.4 Let q = —2x2 — y2 — 2z2 + 6xz be a quadratic form in x, y, z. The matrix of the form is 1-2 0 3 A= 0 - 1 0 3 0 - 2 which has eigenvalues —5, —1,1. Hence q is indefinite. Next we record a very different criterion for a matrix to be positive definite. While it is not a practical test, it has a very striking form. Theorem 9.2.4 Let A be a real symmetric matrix. Then A is positive definite if and only if A — BT B for some invertible real matrix B. Proof Suppose first that A = BT B with B an invertible matrix. Then the quadratic form q = XT AX can be rewritten as q = XT BT BX = (BX)T BX = BX2 .
  • 340. 324 Chapter Nine: Advanced Topics If X ^ 0, then BX ^ 0 since B is invertible. Hence BX is positive if X ^ 0. It follows that q, and hence A, is positive definite. Conversely, suppose that A is positive definite, so that all its eigenvalues are positive. Now there is a real orthogonal matrix S such that ST AS = D is diagonal, with diagonal entries d1 ( ..., dn say. Here the di are the eigenvalues of A, so all of them are positive. Define [D to be the real diagonal matrix with diagonal entries y/d~[,..., y/d^,. Then we have A = {ST )~l DST = SDST since ST = S"1 , and hence A = S(VDVD)ST = {y/DST )T {VDST ). Finally, put B = /DST and observe that B is invertible since both S and y/~D are. Application to local maxima and minima A well-known use of quadratic forms is to determine if a critical point of a function of several variables is a local maximum or a local minimum. We recall briefly the nature of the problem; for a detailed account the reader is referred to a textbook on calculus such as [18]. Let / be a function of independent real variables xi,..., xn whose first order partial derivatives exist in some region R. A point P(ai, ...,an) of R is called a local maximum (min- imum) of / if within some neighborhood of P the function / assumes its largest (smallest) value at P. A basic result states that if P is a local maximum or minimum of f lying inside R, then all the first order partial derivatives of f vanish at P: fXi(ai,...,an) =0 for i = l,...,n. A point at which all these partial derivatives are zero is called a critical point of / . Thus every local maximum or minimum is a critical point of / . However there may be critical points which are not local maxima or minima, but are saddle points of/.
For example, the function f(x, y) = x^2 - y^2 has a saddle point at the origin, as shown in the diagram. The problem is to devise a test which can distinguish local maxima and minima from saddle points. Such a test is furnished by the criterion for a quadratic form to be positive definite, negative definite or indefinite.

For simplicity we assume that f is a function of two variables x and y. Assume further that f and its partial derivatives of degree at most three are continuous inside a region R of the plane, and that (x0, y0) is a critical point of f in R. Apply Taylor's Theorem to the function f at the point (x0, y0), keeping in mind that fx(x0, y0) = 0 = fy(x0, y0). If h and k are sufficiently small, then f(x0 + h, y0 + k) - f(x0, y0) equals

(1/2)(h^2 fxx(x0, y0) + 2hk fxy(x0, y0) + k^2 fyy(x0, y0)) + S:

here S is a remainder term which is a polynomial of degree 3 or higher in h and k. Write a = fxx(x0, y0), b = fxy(x0, y0) and c = fyy(x0, y0); then

f(x0 + h, y0 + k) - f(x0, y0) = (1/2)(ah^2 + 2bhk + ck^2) + S.

Here S is small compared to the other terms of the sum if h and k are small.
Let q = ax^2 + 2bxy + cy^2. If q is negative definite, then f(x0 + h, y0 + k) < f(x0, y0) when h and k are small and P is a local maximum. On the other hand, if q is positive definite, then P is a local minimum since f(x0 + h, y0 + k) > f(x0, y0) for sufficiently small h and k. Finally, should q be indefinite, the expression f(x0 + h, y0 + k) - f(x0, y0) can be both positive and negative, so P is neither a local maximum nor a local minimum, but a saddle point. Thus the crucial quadratic form which provides us with a test for P to be a local maximum or minimum arises from the matrix

H = [ fxx  fxy ]
    [ fxy  fyy ]

If the matrix H(x0, y0) is positive definite or negative definite, then f will have a local minimum or local maximum respectively at P. If, however, H(x0, y0) is indefinite, then P will be a saddle point of f. Combining this result with 9.2.3, we obtain

Theorem 9.2.5
Let f be a function of x and y and assume that f and its partial derivatives of order ≤ 3 are continuous in some region containing the critical point P(x0, y0). Let D = fxx fyy - fxy^2:
(a) If D(x0, y0) > 0 and fxx(x0, y0) < 0, then P is a local maximum of f;
(b) If D(x0, y0) > 0 and fxx(x0, y0) > 0, then P is a local minimum of f;
(c) If D < 0, then P is a saddle point of f.

The argument just given for a function of two variables can be applied to a function f of n variables x1, ..., xn. The relevant quadratic form in this case is obtained from the
matrix

H = [ fx1x1  fx1x2  ...  fx1xn ]
    [ fx2x1  fx2x2  ...  fx2xn ]
    [  ...    ...   ...   ...  ]
    [ fxnx1  fxnx2  ...  fxnxn ]

which is called the hessian of the function f. Notice that the hessian matrix is symmetric since fxixj = fxjxi, provided that f and all its derivatives of order ≤ 3 are continuous. The fundamental theorem may now be stated.

Theorem 9.2.6
Let f be a function of independent variables x1, ..., xn. Assume that f and its partial derivatives of order ≤ 3 are continuous in a region containing a critical point P(a1, a2, ..., an). Let H be the hessian of f.
(a) If H(a1, ..., an) is positive definite, then P is a local minimum of f;
(b) if H(a1, ..., an) is negative definite, then P is a local maximum of f;
(c) if H(a1, ..., an) is indefinite, then P is a saddle point of f.

Example 9.2.5
Consider the function f(x, y) = (x^2 - 2x) cos y. The point (1, π) is a critical point of f since both first derivatives vanish there. To decide the nature of this point we compute the hessian of f as

H = [ 2 cos y            -(2x - 2) sin y    ]
    [ -(2x - 2) sin y    -(x^2 - 2x) cos y  ]

Hence

H(1, π) = [ -2   0 ]
          [  0  -1 ]

which is clearly negative definite. Thus the point in question is a local maximum of f.

Notice that the test given in 9.2.6 will fail to decide the nature of the critical point P if at P the matrix H is not
  • 344. 328 Chapter Nine: Advanced Topics positive definite, negative definite or indefinite: for example, H might equal 0 at P. Extremal values of a quadratic form Consider a quadratic form in variables x±,..., xn q = XT AX, where as usual A is a real symmetric nxn matrix and X is the column consisting of x±,..., xn. Suppose that we want to find the maximum and minimum values of q when X is subject to a restriction. One possible restriction is that ||X|| = a for some a > 0, that is, x + • • • + x — a2 . Thus we are looking for the maximum and minimum values of q on the n- sphere with radius a and center the origin in Rn . One could use calculus to attack this problem, but it is simpler to employ diagonalization. There is a real orthogonal nxn matrix S such that ST AS = D, where D is the diagonal matrix with the eigen- values of A, say di,..., dn, on its diagonal. Put Y = S~1 X: thus we have X = SY and q = XT AX = YT ST ASY = YT DY = dlV + • • • + dny2 n, where yi,..., yn are the entries of Y. In addition we find that XT X = YT (ST S)Y = YT Y, since ST = 5 _ 1 . Therefore our problem may be reformulated as follows: find the maximum and minimum values of the ex- pression diyf- -dnVn subject to y2 + - • -+yn = a2 . But this
is easily answered. For assume that m and M are respectively the smallest and the largest eigenvalues of A. Then

q = d1 y1^2 + ... + dn yn^2 ≤ M(y1^2 + ... + yn^2) = Ma^2,

and

q = d1 y1^2 + ... + dn yn^2 ≥ m(y1^2 + ... + yn^2) = ma^2.

Suppose that the largest eigenvalue M occurs for k different yj's; then we can take each of the corresponding yj's to be equal to a/√k and all other yj's to be 0. Then y1^2 + ... + yn^2 = a^2 and the value of q at this point is exactly Mk(a/√k)^2 = Ma^2. It follows that the largest value of q on the n-sphere really is Ma^2. By a similar argument the smallest value of q on the n-sphere is ma^2. We state this conclusion as:

Theorem 9.2.7
The minimum and maximum values of the quadratic form q = X^T AX for ||X|| = a > 0 are respectively ma^2 and Ma^2, where m and M are the smallest and largest eigenvalues of the real symmetric matrix A.

We conclude with a geometrical example.

Example 9.2.6
The equation of an ellipsoid with center the origin is given as X^T AX = c, where A is a real symmetric 3 x 3 matrix and c is a positive constant. Find the radius of the largest sphere with center the origin which lies entirely within the ellipsoid.

By a rotation to principal axes we can write the equation of the ellipsoid in the form dx'^2 + ey'^2 + fz'^2 = c, where the
  • 346. 330 Chapter Nine: Advanced Topics eigenvalues d, e, f of A are positive. Hence the equation of the ellipsoid takes the standard form /2 /2 /2 x y z h - — I = 1. c/d c/e c/f Clearly the sphere will lie entirely inside the ellipsoid pro- vided that its radius a does not exceed the length of any of the semi-axes: thus a cannot be larger than any of Therefore the condition on a is that a < y/jfr, where M is the biggest of the eigenvalues d, e, /. Thus the largest sphere which is contained entirely within the ellipsoid has radius Exercises 9.2 1. Determine if the following quadratic forms are positive definite, negative definite or indefinite: (a) 2x2 -2xy + 3y2 ; (b) x2 -3xz-2y2 + z2 ; (c) x2 + y2 + xz + yz. 2. Determine if the following matrix is positive definite, neg- ative definite or indefinite: G i0- 3. A quadratic form q — X7 AX is called positive semidefinite if q > 0 for all X. The definition of negative semidefinite is
  • 347. 9.2: Quadratic Forms 331 similar. Prove that q is positive semidefinite if and only if all the eigenvalues of A are > 0, and negative semidefinite if and only if all the eigenvalues are < 0. 4. Let A be a positive definite nxn matrix and let S be a real invertible nxn matrix. Prove that ST AS is also positive definite. 5. Let A be a real symmetric matrix. Prove that A is nega- tive definite if and only if it has the form —(BT B) for some invertible matrix B. 6. Identify the following conies: (a) Ux2 -16xy+hy2 = 6; (b) 2x2 +4xy+2y2 +x-3y = 1. 7. Identify the following quadrics: (a) 2x2 + 2y2 +3z2 +4yz = 3; (b) 2x2 + 2y2 + z2 +4xz = 4. 8. Classify the critical points of the following functions as local maxima, local minima or saddle points: (a) x2 + 2xy + 2y2 + Ax (b) (x + yf + {x- yf - 12(3x + y); (c) x2 + y2 + 3z2 — xy + 2xz — z. 9. Find the smallest and largest values of the quadratic form q = 2a:2 + 2y2 + 3z2 + 4yz when the point (x,y,z) is required to lie on the sphere with radius 1 and center the origin. 10. Let XT AX = c be the equation of an ellipsoid with center the origin, where A is a real symmetric 3 x 3 matrix and c is a positive constant. Show that the radius of the smallest sphere with center the origin which contains the ellipsoid is y ^ , where m is the smallest eigenvalue of A. 11. Show that bx2 + 2xy + 2y2 + 5z2 = 1 is the equation of an ellipsoid with center the origin. Then find the radius of the smallest and largest sphere with center the origin which contains, respectively is contained in, the ellipsoid.
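The Hessian test of 9.2.6 is easy to carry out by machine. The following sketch (assuming Python with the sympy library) reproduces the calculation of Example 9.2.5 and can be adapted to check the answers to Exercise 8.

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = (x**2 - 2*x) * sp.cos(y)                   # the function of Example 9.2.5

# (1, pi) is a critical point: both first-order partial derivatives vanish there.
print(sp.diff(f, x).subs({x: 1, y: sp.pi}),
      sp.diff(f, y).subs({x: 1, y: sp.pi}))    # 0 0

# Hessian test of 9.2.6: evaluate the matrix of second derivatives at (1, pi).
H = sp.hessian(f, (x, y)).subs({x: 1, y: sp.pi})
print(H)                                       # Matrix([[-2, 0], [0, -1]])
print(H.eigenvals())                           # {-2: 1, -1: 1}
# Both eigenvalues are negative, so H is negative definite and (1, pi)
# is a local maximum, as found in Example 9.2.5.
```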
  • 348. 332 Chapter Nine: Advanced Topics 9.3 Bilinear Forms Roughly speaking, a bilinear form is a scalar-valued linear function of two vector variables. One type of a bilinear form which we have already met is an inner product on a real vector space. It will be seen that there is a close connection between bilinear forms and quadratic forms. Let V be a vector space over a field of scalars F and write VxV for the set of all pairs (u, v) of vectors from V. Then a bilinear form on V is a function f:VxV^F, that is, a rule assigning to each pair of vectors (u, v) a scalar /(u,v), which satisfies the following requirements: (i) /(ui + u2 ,v) = /(ui,v) + /(u2 ,v); (ii) /(u, vi + v2) = /(u, vi) + /(u, v2); (iii) /(cu,v) = c/(u,v); (iv) /(u,cv) = c/(u,v). These rules must hold for all vectors u, ui, 112, v, vi, v2 in V and all scalars c in F. The effect of the four defining properties is to make /(u, v) "linear" in both the variables u and v. As has been mentioned, an inner product < > on a real vector space is a bilinear form / in which /(u, v) = < u, v > . Indeed the defining properties of the inner product guarantee this. A very important example of a bilinear form arises when- ever a square matrix is given.
  • 349. 9.3: Bilinear Forms 333 Example 9.3.1 Let A be an n x n matrix over a field F. A function / : Fn x Fn — > F is defined by the rule f(X, Y) = XT AY. That / is a bilinear form on Fn follows from the usual rules of matrix algebra. The importance of this example stems from the fact that it is typical of bilinear forms on finite-dimensional vector spaces in a sense that will now be made precise. Matrix representation of bilinear forms Suppose that f : V xV —>• F is a bilinear form on a vector space V of dimension n over a field F. Choose an ordered basis B = {vi,..., vn } of V and define a^- to be the scalar /(VJ, Vj). Thus we can associate with / the n x n matrix A= [aij]. Now let u and v be arbitrary vectors of V and write them in terms of the basis as u = YM= ^V * an< ^ v = S?=i c j v i ' then the coordinate vectors of u and v with respect to the given basis are [u]B = I : and [v]B = bn/ The linearity properties of / can be used to compute /(u, v) in terms of the matrix A. n n n n /(u, v) = f(^2 biVl, J2 c iv i) = X) bi f(Vi > Ylc iw i"> i = l j = l i = l j'=l n n i=l i = l Cl
  • 350. 334 Chapter Nine: Advanced Topics Since /(VJ,VJ) = a^-, this becomes n n /(u,v) = ] P ^ 6 i a i i c i , i=i j=i from which we obtain the fundamental equation /(u,v) = ([u]s)r A[v]e. Thus the bilinear form / is represented with respect to the basis B by the n x n matrix A whose (i,j) entry is / ( V J , V J ) . The values of / can be computed using the above rule. In particular, if / is a bilinear form on Fn and the standard basis of Fn is used, then f(X, Y) = XT AY. Conversely, if we start with a matrix A and define / by means of the equation /(u, v) = ([u]e)T A[v]B, then it is easy to verify that / is a bilinear form on V and that the matrix representing / with respect to the basis B is A. Now suppose we decide to use another ordered basis B': what will be the effect on the matrix A? Let S be the invertible matrix which describes the change of basis B' — • B. Thus [u]B = 5[U]B', according to 6.2.4. Therefore /(u,v) = (S[u}BI)T A(S[v]B/) = ([u}B,)T (ST AS)[v]B,, which shows that the matrix ST AS represents / with respect to the basis B '. At this point we recognize that a new relation between matrices has arisen: a matrix B is said to be congruent to a matrix A if there is an invertible matrix S such that B = ST AS. While there is an analogy between congruence and similarity of matrices, in general similar matrices need not be congruent, nor congruent matrices similar.
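A quick numerical illustration of the congruence formula above (a sketch assuming Python with numpy; the matrices A and S below are arbitrary illustrative choices, not taken from the text): the value f(u, v) computed from A with coordinates relative to B agrees with the value computed from the congruent matrix S^T AS with coordinates relative to B'.

```python
import numpy as np

# A bilinear form on R^3 given by f(X, Y) = X^T A Y; A and S are arbitrary choices.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [4.0, 0.0, 1.0]])

def f(X, Y):
    return X @ A @ Y

S = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])      # invertible: describes the basis change B' -> B
B = S.T @ A @ S                      # matrix representing f with respect to B'

rng = np.random.default_rng(0)
u_prime, v_prime = rng.standard_normal(3), rng.standard_normal(3)   # coords w.r.t. B'
u, v = S @ u_prime, S @ v_prime                                     # coords w.r.t. B

print(np.isclose(f(u, v), u_prime @ B @ v_prime))                   # True
```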
  • 351. 9.3: Bilinear Forms 335 The point that has emerged from the preceding discussion is that matrices which represent the same bilinear form with respect to different bases of the vector space are congruent. This result is to be compared with the fact that the matrices representing the same linear transformation are similar. The conclusions of the the last few paragraphs are sum- marized in the following basic theorem. Theorem 9.3.1 (i) Let f be a bilinear form on an n-dimensional vector space V over a field F and let B = {vi,..., vn } be an ordered basis of V. Define A to be the n x n matrix whose (i,j) entry is /(v», Vj); then /(u,v) = ([u]B)T A[v]B, and A is the n x n matrix representing f with respect to B. (ii) If B' is another ordered basis of V, then f is represented with respect to B' by the matrix ST AS where S is the invertible matrix describing the basis change B' — > B. (iii) Conversely, if A is any n x n matrix over F, a bilinear form on V is defined by the rule /(u, v) = ([U]B)T A[V]B- It is represented by the matrix A with respect to the basis B. Symmetric and skew-symmetric bilinear forms A bilinear form / on a vector space V is called symmetric if its values are unchanged by reversing the arguments, that is, if /(u,v) = /(v,u) for all vectors u and v. Similarly, / is said to be skew- symmetric if /(u,v) = - / ( v , u ) is always valid. Notice the consequence, /(u, u) = 0 for all vectors u. For example, any real inner product is a symmetric
  • 352. 336 Chapter Nine: Advanced Topics bilinear form; on the other hand, the form defined by the rule zi Vi is an example of a skew-symmetric bilinear form on R2 . As the reader may suspect, there are connections with symmetric and skew-symmetric matrices. Theorem 9.3.2 Let f be a bilinear form on a finite-dimensional vector space V and let A be a matrix representing f with respect to some basis of V. Then f is symmetric if and only if A is symmetric and f is skew-symmetric if and only if A is skew-symmetric. Proof Let A be symmetric. Then, remembering that [u]T A[v] is scalar, we have /(u, v) = [u]T A[v] = ([u]T A[v])T = [v}T AT [u} = {v]T A[u} = /(v,u). Therefore / is symmetric. Conversely, suppose that / is sym- metric, and let the ordered basis in question be {vi,..., v n } . Then a^- = f(vi,Vj) = /(VJ, Vj) = a^, so that A is symmet- ric. The proof of the skew-symmetric case is similar and is left as an exercise. Symmetric bilinear forms and quadratic forms Let / be a bilinear form on Rn given by f(X, Y) = XT AY. Then / determines a quadratic form q where q = f(X,X) = XT AX.
Conversely, if q is a quadratic form in x1, ..., xn, we can define a corresponding symmetric bilinear form f on R^n by means of the rule

f(X, Y) = (1/2){q(X + Y) - q(X) - q(Y)},

where X and Y are the column vectors consisting of x1, ..., xn and y1, ..., yn. To see that f is bilinear, first write q(X) = X^T AX with A symmetric; then we have

f(X, Y) = (1/2){(X + Y)^T A(X + Y) - X^T AX - Y^T AY} = (1/2)(X^T AY + Y^T AX) = X^T AY,

since X^T AY = (X^T AY)^T = Y^T AX. This shows that f is bilinear. It is readily seen that the correspondence q -> f just described is a bijection from quadratic forms to symmetric bilinear forms on R^n.

Theorem 9.3.3
There is a bijection from the set of quadratic forms in n variables to the set of symmetric bilinear forms on R^n.

From past experience we would expect to get significant information about symmetric bilinear forms by using the Spectral Theorem. In fact what is obtained is a canonical or standard form for such bilinear forms.
Theorem 9.3.4
Let f be a symmetric bilinear form on an n-dimensional real vector space V. Then there is a basis B of V such that

f(u, v) = u1 v1 + ... + uk vk - u(k+1) v(k+1) - ... - ul vl,

where u1, ..., un and v1, ..., vn are the entries of the coordinate vectors [u]_B and [v]_B respectively and k and l are integers satisfying 0 ≤ k ≤ l ≤ n.

Proof
Let f be represented by a matrix A with respect to some basis B' of V. Then A is symmetric. Hence there is an orthogonal matrix S such that S^T AS = D is diagonal, say with diagonal entries d1, ..., dn; of course these are the eigenvalues of A. Here we can assume that d1, ..., dk > 0, while d(k+1), ..., dl < 0 and d(l+1) = ... = dn = 0, by reordering the basis if necessary. Let E be the n x n diagonal matrix whose diagonal entries are the real numbers

1/√d1, ..., 1/√dk, 1/√(-d(k+1)), ..., 1/√(-dl), 1, ..., 1.

Then (SE)^T A(SE) = E^T(S^T AS)E = EDE, and the final product is the matrix

B = [ Ik    0        0 ]
    [ 0   -I(l-k)    0 ]
    [ 0     0        0 ]

Now the matrix SE is invertible, so its inverse determines a change of basis from B' to say B. Then f will be represented by the matrix B with respect to the basis B. Finally, f(u, v) =
([u]_B)^T B [v]_B, so the result follows on multiplying the matrices together.

Example 9.3.2
Find the canonical form of the symmetric bilinear form on R^2 defined by f(X, Y) = x1 y1 + 2 x1 y2 + 2 x2 y1 + x2 y2.

The matrix of the bilinear form with respect to the standard basis is

A = [ 1  2 ]
    [ 2  1 ]

which, by Example 9.1.1, has eigenvalues 3 and -1, and is diagonalized by the matrix

S = (1/√2) [ 1  -1 ]
           [ 1   1 ]

Put X = SX' and Y = SY'; then

f(X, Y) = X^T AY = (X')^T S^T AS Y' = (X')^T diag(3, -1) Y',

so that f(X, Y) = 3 x1'y1' - x2'y2'. Here x1' = (1/√2)(x1 + x2) and x2' = (1/√2)(-x1 + x2), with corresponding formulas in y. To obtain the canonical form of f, put x1'' = √3 x1', y1'' = √3 y1', and x2'' = x2', y2'' = y2'. Then

f(X, Y) = x1''y1'' - x2''y2'',

which is the canonical form specified in 9.3.4.

Eigenvalues of congruent matrices

Since congruent matrices represent the same symmetric bilinear form, it is natural to expect that such matrices should
have some common properties, as similar matrices do. However, whereas similar matrices have the same eigenvalues, this is not true of congruent matrices. For example, the matrix

[ 2   0 ]
[ 0  -3 ]

has eigenvalues 2 and -3, but the congruent matrix

[ 1  0 ] [ 2   0 ] [ 1  1 ]   =   [ 2   2 ]
[ 1  1 ] [ 0  -3 ] [ 0  1 ]       [ 2  -1 ]

has eigenvalues -2 and 3. Notice that, although the eigenvalues of these congruent matrices are different, the numbers of positive and negative eigenvalues are the same for each matrix. This is an instance of a general result.

Theorem 9.3.5 (Sylvester's Law of Inertia)
Let A be a real symmetric n x n matrix and S an invertible n x n matrix. Then A and S^T AS have the same numbers of positive, negative and zero eigenvalues.

Proof
Assume first of all that A is invertible; this is the essential case. Recall that by 7.3.6 it is possible to write S in the form QR where Q is real orthogonal and R is real upper triangular with positive diagonal entries; this was a consequence of the Gram-Schmidt process. The idea of the proof is to obtain a continuous chain of matrices leading from S to the orthogonal matrix Q; the point of this is that Q^T AQ = Q^{-1}AQ certainly has the same eigenvalues as A. Define S(t) = tQ + (1 - t)S, where 0 ≤ t ≤ 1. Thus S(0) = S while S(1) = Q. Now write U = tI + (1 - t)R, so that S(t) = QU. Next U is an upper
  • 357. 9.3: Bilinear Forms 341 triangular matrix and its diagonal entries are t + (1 — t)ru these cannot be zero since ra > 0 and 0 ^ t ^ 1. Hence U is invertible, while Q is certainly invertible since it is orthogonal. It follows that S(t) = QU is invertible; thus det(S(t)) ^ 0. Now consider A(t) = S(t)T AS{t); since det(A{t)) = det(A) det(S(t))2 ^ 0, it follows that A(t) cannot have zero eigenvalues. Now as t goes from 0 to 1, the eigenvalues of A(0) = ST AS gradually change to those of A(l) = QT AQ, that is, to those of A. But in the process no eigenvalue can change sign because the eigenvalues that appear are continuous functions of t and they are never zero. Consequently the numbers of positive and negative eigenvalues of ST AS are equal to those of A. Finally, what if A is singular? In this situation the trick is to consider the matrix A + el, which may be thought of as a "perturbation" of A. Now A + el will be invertible provided that € is sufficiently small and positive: for det(A + xl) is a polynomial of degree n in x, so it vanishes for at most n values of x. The previous argument shows that the result is true for A + el if e is small and positive; then by taking the limit as e — • > 0, we can deduce the result for A. It follows from this theorem that the numbers of posi- tive and negative signs that appear in the canonical form of 9.3.4 are uniquely determined by the bilinear form and do not depend on the particular basis chosen. Example 9.3.3 Show that the matrices ( j and I J are not con- gruent. All one need do here is note that the first matrix has eigenvalues 1,3, while the second has eigenvalues 3 , - 1 . Hence by 9.3.5 they cannot be congruent.
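Sylvester's Law of Inertia is easy to observe numerically. The sketch below (assuming Python with numpy) uses the diagonal matrix with entries 2 and -3 discussed before 9.3.5: congruent matrices can have quite different eigenvalues, but the numbers of positive, negative and zero eigenvalues agree.

```python
import numpy as np

def inertia(M, tol=1e-9):
    """Numbers of positive, negative and (numerically) zero eigenvalues of a
    real symmetric matrix M."""
    e = np.linalg.eigvalsh(M)
    return int((e > tol).sum()), int((e < -tol).sum()), int((abs(e) <= tol).sum())

A = np.diag([2.0, -3.0])             # eigenvalues 2 and -3
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])           # any invertible matrix
B = S.T @ A @ S                      # a matrix congruent to A

print(np.linalg.eigvalsh(A))         # [-3.  2.]
print(np.linalg.eigvalsh(B))         # different eigenvalues...
print(inertia(A), inertia(B))        # ...but the same inertia: (1, 1, 0) (1, 1, 0)
```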
Skew-symmetric bilinear forms

Having seen that there is a canonical form for symmetric bilinear forms on real vector spaces, we are led to enquire if something similar can be done for skew-symmetric bilinear forms. By 9.3.2 this is equivalent to trying to describe all skew-symmetric matrices up to congruence. The theorem that follows provides a solution to this problem.

Theorem 9.3.6
Let f be a skew-symmetric bilinear form on an n-dimensional vector space V over either R or C. Then there is an ordered basis of V with the form {u1, v1, ..., uk, vk, w1, ..., w(n-2k)}, where 0 ≤ 2k ≤ n, such that

f(ui, vi) = 1 = -f(vi, ui),  i = 1, ..., k,

and f vanishes on all other pairs of basis elements.

Let us examine the consequence of this theorem before setting out to prove it. If we use the basis provided by the theorem, the bilinear form f is represented by the block diagonal matrix

M = diag( [ 0  1 ] , ... , [ 0  1 ] , 0, ..., 0 )
          [-1  0 ]         [-1  0 ]

where the number of blocks of the type [ 0 1; -1 0 ] is k and the remaining n - 2k diagonal entries are zero. This allows us to draw an important conclusion about skew-symmetric matrices.
  • 359. 9.3: Bilinear Forms 343 Corollary 9.3.7 A skew-symmetric n x n matrix A over R or C is congruent to a matrix M of the above form. This is because the bilinear form / given by f(X, Y) = XT AY is skew-symmetric and hence is represented with re- spect to a suitable basis by a matrix of type M; thus A must be congruent to M. Proof of 9.3.6 Let z i , . . . , zn be any basis of V. If /(ZJ, Zj) = 0 for all i and j , then /(u, v) = 0 for all vectors u and v, so that / is the zero bilinear form and it is represented by the zero matrix. This is the case k = 0. So assume that /(ZJ, Zj) is not zero for some i and j . Since the basis can be reordered, we may suppose that /(zi, z2) = a ^ 0. Then /(a_ 1 zi, z2) = a_ 1 /(zi5 z2) = 1. Now replace zi by a_ 1 zi; the effect is to make /(zi, z2) = 1, and of course /(z2 , zi) = —1 since / is skew-symmetric. Next put bi = /(zi,Zj) where i > 2. Then /(zi.zi -6z2 ) = /(zi,Zi) -6/(zi,z2 ) = 6 - 6 = 0. This suggests that we modify the basis further by replacing Zj by Zj — 6z2 for i > 2; notice that this does not disturb linear independence, so we still have a basis of V. The effect of this substitution is make /(zi,Zj) = 0 for i = 3,.. .,n. Next we have to address the possibility that /(z2 , z;) may be non-zero when i > 2; let c = /(z2 , Zj). Then / ( z 2 , Z j + CZi) = / ( z 2 , Z j ) + c / ( z 2 , Z i ) = C + C ( - 1 ) = 0 . This suggests that the next step should be to replace Zj by Zj + czi where i > 2; again we need to observe that Zi,..., zn
will still be a basis of V. Also important is the remark that this substitution will not nullify what has already been achieved; the reason is that when i > 2

f(z1, zi + c z1) = f(z1, zi) + c f(z1, z1) = 0.

We have now reached the point where

f(z1, z2) = 1 = -f(z2, z1)  and  f(z1, zi) = 0 = f(z2, zi) for all i > 2.

Now we rename our first two basis elements, writing u1 = z1 and v1 = z2. So far the matrix representing f has the form

[  0  1 | 0 ]
[ -1  0 | 0 ]
[ ------+---]
[  0  0 | B ]

where B is a skew-symmetric matrix with n - 2 rows and columns. We can now repeat the argument just given for the subspace with basis {z3, ..., zn}; it follows by induction on n that there is a basis for this subspace with respect to which f is represented by a matrix of the required form. Indeed let u2, ..., uk, v2, ..., vk, w1, ..., w(n-2k) be this basis. By adjoining u1 and v1, we obtain a basis of V with respect to which f is represented by a matrix of the required form.

Example 9.3.4
Find the canonical form of the skew-symmetric matrix

A = [  0  0  2 ]
    [  0  0 -1 ]
    [ -2  1  0 ]

We need to carry out the procedure indicated in the proof of the theorem. Let {E1, E2, E3} be the standard basis of R^3.
The matrix A determines a skew-symmetric bilinear form f with the properties

f(E1, E3) = 2 = -f(E3, E1),  f(E3, E2) = 1 = -f(E2, E3),  f(E1, E2) = 0 = f(E2, E1).

The first step is to reorder the basis as {E1, E3, E2}; this is necessary since f(E1, E2) = 0 whereas f(E1, E3) ≠ 0. Now replace {E1, E3, E2} by {(1/2)E1, E3, E2}, noting that f((1/2)E1, E3) = 1 = -f(E3, (1/2)E1). Next f(E3, E2) = 1, so we replace E2 by E2 + f(E3, E2)·(1/2)E1 = (1/2)E1 + E2. Note that f((1/2)E1, (1/2)E1 + E2) = 0 = f(E3, (1/2)E1 + E2). The procedure is now complete. The bilinear form is represented with respect to the new ordered basis by the matrix

M = [  0  1  0 ]
    [ -1  0  0 ]
    [  0  0  0 ]

which is in canonical form. The change of basis from {(1/2)E1, E3, (1/2)E1 + E2} to the standard ordered basis is represented by the matrix

S = [ 1/2  0  1/2 ]
    [  0   0   1  ]
    [  0   1   0  ]

The reader should now verify that S^T AS equals M, the canonical form of A, as predicted by the proof of 9.3.6.
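The verification suggested at the end of Example 9.3.4 can also be done by machine; a minimal sketch, assuming Python with numpy:

```python
import numpy as np

A = np.array([[ 0.0, 0.0,  2.0],
              [ 0.0, 0.0, -1.0],
              [-2.0, 1.0,  0.0]])        # the matrix of Example 9.3.4
S = np.array([[0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])          # change of basis matrix found above

print(S.T @ A @ S)
# [[ 0.  1.  0.]
#  [-1.  0.  0.]
#  [ 0.  0.  0.]]   -- the canonical form M
```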
  • 362. 346 Chapter Nine: Advanced Topics Exercises 9.3 1. Which of the following functions / are bilinear forms? (a) f(X,Y) = X~Y on Rn - (b) f(X,Y) =XT Y onRn ; (c) f(g,h) = fag{x)h(x) dx on C{a,b]. 2. Let / be the bilinear form on R2 which is defined by the equation f(X,Y) = 2xy2 — 3x2yi- Write down the matrices which represent / with respect to (a) the standard basis, and (b) the basis {( J J ( j )}. 3. If / and g are two bilinear forms on a vector space V, define their sum / + g by the rule / -f- g(u,v) = f(u,v) + g(u,v); also define the scalar multiple cf by the equation cf(u, v) = c(f(u,v)). Prove that with these operations the set of all bilinear forms on V becomes a vector space V . If V has dimension n, what is the dimension of V ? 4. Prove that every bilinear form on a real or complex vector space is the sum of a symmetric and a skew-symmetric bilinear form. 5. Find the canonical form of the symmetric bilinear form on R2 given by f(X, Y) = 3ziyi + xxy2 + x2yi + 3x2y2- 6. Let / be a bilinear form on Rn . Prove that / is an inner product on Rn if and only if / is symmetric and the corre- sponding quadratic form is positive definite. 7. Test each of the following bilinear forms to see if it is an inner product: (a) f(X, Y) = 3zi2/i + xxy2 + x2yi + 5x2y2] (b) f(X,Y) = 2x1y1+xiy2+x1y3 + x2yi + 3x2y2-2x2y3 +x3yt - 2x3y2 + 3x3y3.
  • 363. 9.4: Jordan Normal Form 347 8. Find the canonical form of the skew-symmetric matrix and also find an invertible matrix S such that ST AS equals the canonical form. 9. (a) If A is a square matrix and S is an invertible matrix, prove that A and ST AS have the same rank. (b) Deduce that the rank of a skew-symmetric matrix equals twice the number of 2 x 2 blocks in the canonical form of the matrix. Conclude that the canonical form is unique. 10. Call a skew-symmetric bilinear form / on a vector space V non-isotropic if for every non-zero vector v there is an- other vector w in V such that /(v, w) ^ 0. Prove that a finite-dimensional real or complex vector space which has a non-isotropic skew-symmetric bilinear form must have even dimension. 9.4 Minimum Polynomials and Jordan Normal Form The aim of this section is to introduce the reader to one of the most famous results in linear algebra, the existence of what is known as Jordan normal form of a matrix. This is a canonical form which applies to any square complex matrix. The existence of Jordan normal form is often presented as the climax of a series of difficult theorems; however the simpli- fied approach adopted here depends on only elementary facts about vector spaces. We begin by introducing the important concept of the minimum polynomial of a linear operator or matrix.
The minimum polynomial

Let T be a linear operator on an n-dimensional vector space V over some field of scalars F. We show that T must satisfy some polynomial equation with coefficients in F. At this point the reader needs to keep in mind the definitions of sum, scalar multiple and product for linear operators introduced in 6.3.

For any vector v of V, the set {v, T(v), ..., T^n(v)} contains n + 1 vectors and so it must be linearly dependent by 5.1.1. Consequently there are scalars a0, a1, ..., an, not all of them zero, such that

a0 v + a1 T(v) + ... + an T^n(v) = 0.

Let us write fv for the polynomial a0 + a1 x + ... + an x^n. Then

fv(T) = a0 1 + a1 T + ... + an T^n,

where 1 denotes the identity linear operator. Therefore

fv(T)(v) = a0 v + a1 T(v) + ... + an T^n(v) = 0.

Now let {v1, ..., vn} be a basis of the vector space V and define f to be the product of the polynomials fv1, fv2, ..., fvn. Then

f(T)(vi) = fv1(T) ··· fvn(T)(vi) = 0

for each i = 1, ..., n. This is because fvi(T)(vi) = 0 and the fvj(T) commute, since powers of T commute by Exercise 6.3.13. Therefore f(T) is the zero linear transformation on V, that is, f(T) = 0. Here of course f is a polynomial with coefficients in F.

Having seen that T satisfies a polynomial equation, we can select a polynomial f in x over F of smallest degree such that f(T) = 0. In addition, we may suppose that f is monic,
  • 365. 9.4: Jordan Normal Form 349 that is, the highest power of x in / has its coefficient equal to 1. This polynomial / is called a minimum polynomial of T. Suppose next that g is an arbitrary polynomial with coef- ficients in F. Using long division, just as in elementary algebra, we can divide g by / to obtain a quotient q and a remainder r; both of these will be polynomials in x over F. Thus g = fq+r, and either r = 0 or the degree of r is less than that of /. Then we have g(T) = f(T)q(T) + r(T) = r(T) since f(T) = 0. Therefore g(T) = 0 if and only if r(T) = 0. But, remembering that / was chosen to be of smallest degree subject to f(T) = 0, we can conclude that r(T) = 0 if and only if r = 0, that is, g is divisible by /. Thus the polynomials that vanish at T are precisely those that are divisible by the polynomial /. If g is another monic polynomial of the same degree as / such that g(T) = 0, then in fact g must equal /. For g is divisible by / and has the same degree as /, which can only mean that g is a constant multiple of /. However g is monic, so it actually equals /. Therefore the minimum polynomial of T is the unique monic polynomial / of smallest degree such that f(T) = 0. These conclusions are summed up in the following result. Theorem 9.4.1 Let T be a linear operator on a finite-dimensional vector space over a field F with a minimum polynomial f. Then the only polynomials g with coefficients in F such that g(T) = 0 are the multiples of f. Hence f is the unique monic polynomial of smallest degree such that f(T) = 0 and T has a unique minimum polynomial. So far we have introduced the minimum polynomial of a linear operator, but it is to be expected that there will be
a corresponding concept for matrices. The minimum polynomial of a square matrix A over a field F is defined to be the monic polynomial f with coefficients in F of least degree such that f(A) = 0. The existence of f is assured by 9.4.1 and the relationship between linear operators and matrices. Clearly the minimum polynomial of a linear operator equals the minimum polynomial of any representing matrix. There is of course an exact analog of 9.4.1 for matrices.

Example 9.4.1
What is the minimum polynomial of the following matrix?

A = [ 2  1  1 ]
    [ 0  2  0 ]
    [ 0  0  2 ]

In the first place we can see directly that (A - 2I3)^2 = 0. Therefore the minimum polynomial f must divide the polynomial (x - 2)^2, and there are two possibilities, f = x - 2 and f = (x - 2)^2. However f cannot equal x - 2 since A - 2I ≠ 0. Hence the minimum polynomial of A is f = (x - 2)^2.

Example 9.4.2
What is the minimum polynomial of a diagonal matrix D?

Let d1, ..., dr be the distinct diagonal entries of D. Again there is a fairly obvious polynomial equation that is satisfied by the matrix, namely

(D - d1 I) ··· (D - dr I) = 0.

So the minimum polynomial divides (x - d1) ··· (x - dr) and hence is the product of certain of the factors x - di. However, we cannot miss out even one of these factors; for the product of all the D - dj I for j ≠ i is not zero since dj ≠ di. It follows that the minimum polynomial of D is the product of all the factors, that is, (x - d1) ··· (x - dr).
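Examples 9.4.1 and 9.4.2 suggest a mechanical way to find minimum polynomials: look for the first linear dependence among the powers I, A, A^2, .... The following sketch (assuming Python with the sympy library; minimum_polynomial is a helper written here, not a library routine) applies this idea to the matrix of Example 9.4.1.

```python
import sympy as sp

def minimum_polynomial(A, x):
    """Monic polynomial f of least degree with f(A) = 0, found naively by
    looking for the first linear dependence among I, A, A^2, ..."""
    n = A.shape[0]
    powers = [sp.eye(n)]
    for k in range(1, n + 1):
        powers.append(powers[-1] * A)
        # Write each power as a column of length n^2 and look for a dependence.
        M = sp.Matrix.hstack(*[P.reshape(n * n, 1) for P in powers])
        null = M.nullspace()
        if null:
            coeffs = null[0] / null[0][k]          # make the top coefficient 1
            return sp.expand(sum(coeffs[i] * x**i for i in range(k + 1)))

x = sp.symbols('x')
A = sp.Matrix([[2, 1, 1],
               [0, 2, 0],
               [0, 0, 2]])                # the matrix of Example 9.4.1
print(minimum_polynomial(A, x))           # x**2 - 4*x + 4, i.e. (x - 2)**2
```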
  • 367. 9.4: Jordan Normal Form 351 In the computation of minimum polynomials the next result is very useful. Lemma 9.4.2 Similar matrices have the same minimum polynomial. The quickest way to see this is to recall that similar ma- trices represent the same linear operator, and hence their min- imum polynomials equal the minimum polynomial of the lin- ear operator. Thus, by combining Lemma 9.4.2 and Example 9.4.2, we can find the minimum polynomial of any diagonal- izable complex matrix. Example 9.4.3 Find the minimum polynomial of the the matrix By Example 9.1.1 the matrix is similar to Hence the minimum polynomial of the given matrix is (x-3)(x + l). In Chapter Eight we encountered another polynomial as- sociated with a matrix or linear operator, namely the charac- teristic polynomial. It is natural to ask if there is a connection between these two polynomials. The answer is provided by a famous theorem. Theorem 9.4.3 (The Cayley-Hamilton Theorem) Let A be annxn matrix over C. Ifp is the characteristic poly- nomial of A, then p(A) = 0. Hence the minimum polynomial of A divides the characteristic polynomial of A. Proof According to 8.1.8, the matrix A is similar to an upper tri- angular matrix T; thus we have S~X AS = T with S invert- ible. By 9.4.2 the matrices A and T have the same minimum polynomial, and we know from 8.1.4 that they have the same 1 2 2 1
  • 368. 352 Chapter Nine: Advanced Topics characteristic polynomial. Therefore it is sufficient to prove the statement for the triangular matrix T. From Example 8.1.2 we know that the characteristic polynomial of T is (in -x)---(tnn -x). On the other hand, direct matrix multiplication shows that (tnl — T) • • • (tnnI — T)=0: the reader may find it helpful to check this statement for n — 2 and 3. The result now follows from 9.4.1. At this juncture the reader may wonder if the minimum polynomial is really of much interest, given that it is a divisor of the more easily calculated characteristic polynomial. But in fact there are features of a matrix that are easily recognized from its minimum polynomial, but which are unobtainable from the characteristic polynomial. One such feature is diag- onalizability. Example 9.4.4 ( Consider for example the matrices I2 and I I: both of these have characteristic polynomial (x — l)2 , but the first matrix is diagonalizable while the second is not. Thus the characteristic polynomial alone cannot tell us if a matrix is diagonalizable. On the other hand, the two matrices just con- sidered have different minimum polynomials, x — 1 and (x — l)2 respectively. This example raises the possibility that it is the mini- mum polynomial which determines if a matrix is diagonaliz- able. The next theorem confirms this. Theorem 9.4.4 Let A be an n x n matrix over C. Then A is diagonalizable if and only if its minimum polynomial splits into a product of n distinct linear factors.
  • 369. 9.4: Jordan Normal Form 353 Proof Assume first that A is diagonalizable, so that S~1 AS — D, a diagonal matrix, for some invertible S. Then A and D have the same minimum polynomials by 9.4.2. Let di,...,dr be the distinct diagonal entries of D; then Example 9.4.2 shows that the minimum polynomial of D is (x — d) • • • (x — dr), which is a product of distinct linear factors. Conversely, suppose that A has minimum polynomial / = (x-di)---(x-dr) where di,... ,dr are distinct complex numbers. Define #; to be the polynomial obtained from / by deleting the factor x — di. Thus / 9i = -}-• x — di Next we recall the method of partial fractions, which is useful in calculus for integrating rational functions. This tells us that there are constants b,..., br such that I = V^ b i f ~ ^ x- d{ Multiplying both sides of this equation by /, we obtain 1 = M i - r- Kgr by definition of gi. At this point we prefer to work with linear operators, so we introduce the linear operator T on Cn defined by T(X) — AX. It follows from the above equation that bg{T) + • • • + brgr(T) is the identity function. Hence X = bl9l(T)(X) + --- + brgr(T)(X)
  • 370. 354 Chapter Nine: Advanced Topics for any vector X. Let Vi denote the set of all elements of the form gi(T)X with X a vector in Cn . Then Vj is a subspace and the above equation for X tells us that C n = Vi + • • • + Vr. Now in fact Cn is the direct sum of the subspaces Vj, which amounts to saying that the intersection of a Vi and the sum of the remaining Vj, with j 7^ i, is zero. To see why this is true, take a vector X in the intersection. Observe that 9i{T)gj{T) = 0 if % ^ j since every factor x — dk is present in the polynomial giQj. Therefore gk(T)(X) = 0 for all k. Since X = £ L i bkgk(T)(X), it follows that X = 0. Hence C n is the direct sum Cn = Vx®---®Vr. Now the effect of T on vectors in Vi is merely to multiply them by d; since (T - d^g^T) = /(T) = 0. Therefore, if we choose bases for each subspace V,..., Vr and combine them to form a basis of Cn , then T will be represented by a diagonal matrix. Consequently A is similar to a diagonal matrix. Example 9.4.5 The matrix has minimum polynomial (x — 2)2 , as we saw in Example 9.4.1. Since this is not a product of distinct linear factors, the matrix cannot be diagonalized.
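Theorem 9.4.4 gives a practical diagonalizability test. A short check by machine (assuming Python with sympy; is_diagonalizable is sympy's own test, used here only to confirm the conclusions drawn from the minimum polynomials):

```python
import sympy as sp

A = sp.Matrix([[2, 1, 1],
               [0, 2, 0],
               [0, 0, 2]])      # minimum polynomial (x - 2)**2: a repeated factor
B = sp.Matrix([[1, 2],
               [2, 1]])         # minimum polynomial (x - 3)(x + 1): distinct factors

print(A.is_diagonalizable())    # False, as 9.4.4 predicts
print(B.is_diagonalizable())    # True
```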
Example 9.4.6
The n x n upper triangular matrix

A = [ c  1  0  ...  0 ]
    [ 0  c  1  ...  0 ]
    [ .  .  .  ...  . ]
    [ 0  0  0  ...  1 ]
    [ 0  0  0  ...  c ]

has minimum polynomial (x - c)^n; this is because (A - cI)^n = 0, but (A - cI)^(n-1) ≠ 0. Hence A is diagonalizable if and only if n = 1. Notice that the characteristic polynomial of A equals (c - x)^n.

Jordan normal form

We come now to the definition of the Jordan normal form of a square complex matrix. The basic components of this are certain complex matrices called Jordan blocks, of the type considered in Example 9.4.6. In general an n x n Jordan block is a matrix of the form

J = [ c  1  0  ...  0  0 ]
    [ 0  c  1  ...  0  0 ]
    [ .  .  .  ...  .  . ]
    [ 0  0  0  ...  c  1 ]
    [ 0  0  0  ...  0  c ]

for some scalar c. Thus J is an upper triangular n x n matrix with constant diagonal entries, a superdiagonal of 1's, and zeros elsewhere. By Example 9.4.6 the minimum and characteristic polynomials of J are (x - c)^n and (c - x)^n respectively.

We must now take note of the essential property of the matrix J. Let E1, ..., En be the vectors of the standard basis of C^n. Then matrix multiplication shows that JE1 = cE1, and JEi = cEi + E(i-1) where 1 < i ≤ n.
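These relations are easy to verify numerically; a small sketch, assuming Python with numpy:

```python
import numpy as np

def jordan_block(c, n):
    """n x n Jordan block: c on the diagonal, 1's on the superdiagonal."""
    return c * np.eye(n) + np.diag(np.ones(n - 1), k=1)

c, n = 2.0, 4
J = jordan_block(c, n)
E = np.eye(n)                                   # columns are E1, ..., En

print(np.allclose(J @ E[:, 0], c * E[:, 0]))    # J E1 = c E1
print(all(np.allclose(J @ E[:, i], c * E[:, i] + E[:, i - 1])
          for i in range(1, n)))                # J Ei = c Ei + E(i-1) for i > 1
```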
  • 372. 356 Chapter Nine: Advanced Topics In general, if A is any complex n x n matrix, we call a sequence of vectors X,..., XT in Cn a Jordan string for A if it satisfies the equations AX i = cX1 and AXi = cXi + X;_i where c is a scalar and 1 < i < r. Thus every n x n Jordan block determines a Jordan string of length n. Now suppose there is a basis of C n which consists of Jordan strings for the matrix A. Group together basis elements in the same string. Then the linear operator on C n given by T(X) = AX is represented with respect to this basis of Jordan strings by a matrix which has Jordan blocks down the ° 0 JkJ Here Jj is a Jordan block, say with Cj on the diagonal. This is because of the effect produced on the basis elements when they are multiplied on the left by A. Our conclusion is that A is similar to the matrix N, which is called the Jordan normal form of A. Notice that the diagonal elements Q of N are just the eigenvalues of A. Of course we still have to establish that a basis consisting of Jordan strings always exists; only then can we conclude that every matrix has a Jordan normal form. Theorem 9.4.5 (Jordan Normal Form) Every square complex matrix is similar to a matrix in Jordan normal form. Proof Let A be an n x n complex matrix. We have to establish the existence of a basis of C n consisting of Jordan strings for A. This is done by induction on n; if n = 1, any non-zero vector v -i "'fc>^""'1 - N /Jx 0 0 J2 V 0 0
  • 373. 9.4: Jordan Normal Form 357 qualifies as a Jordan string of length 1, so we can assume that n > 1. Since A is complex, it has an eigenvalue c. Thus the matrix A' = A — cl is singular, and so its column space C has dimension r < n. Recall from Example 6.3.2 that C is the image of the linear operator on C" which sends X to A1 X. Restriction of this linear operator to C produces a linear operator which is represented by an r x r matrix. Since r < n, we may assume by induction hypothesis on n that C has a basis which is a union of Jordan strings for A. Let the ith such string be written Xij, j = 1,... ,U thus AXn = CjXji and in addition AXij = QXJJ + Xij-i for 1 < j < h- Then A'Xn = 0 and A'Xij = Xy-_i if j > 1. Next let D denote the intersection of C with N, the null space of A', and set p = dim(D). We need to identify the elements of D. Now any element of C has the form Y = > J y]aijXij i 3 where a^ is a complex number. Assume that Y is in D, and thus in N, the null space of A'. Suppose that a^- ^ 0 and let j be as large as possible with this property for the given i. If j > 1, then the equations A'Xn = 0 and A'Xik = Xik-i will prevent A'Y from being zero. Hence j = 1. It follows that the Xii form a basis of .D, so there are exactly p of these Xn. Every vector in C is of the form A'Y for some Y, since C is the image space of the linear operator sending X to A'X. For each i write the vector Xut in the form X ^ = A'Yi, for some Yi , i = 1,..., p. There are p of these Yi. Finally, N has dimension n — r, so we can adjoin a further set of n — r — p vectors to the Xn to get a basis for N, say Z,..., Zn _r _p . Altogether we have a total of r + p + (n — r — p) = n vectors
  • 374. 358 Chapter Nine: Advanced Topics We now assert that these vectors form a basis of Cn which consists of Jordan strings of A. Certainly AYk = (A' + d)Yk = A'Yk + cYk = cYk + Xklk. Thus the Jordan string Xki,..., Xkik has been extended by adjoining Yk. Also AZm = cZm since Zm belongs to the null space of A'; thus Zm is a Jordan string of A with length 1. Hence the vectors in question constitute a set of Jordan strings of A What remains to be done is to prove that the vectors Xij,Yk, Zm form a basis of Cn , and by 5.1.9 it is enough to show that they are linearly independent. To accomplish this, we assume that e^, fk,gm are scalars such that ^ ^ eijXij + ^ fkYk + ^ 9mZm = 0. Multiplying both sides of this equation on the left by A', we get 53 53 ey(0 or X^) + J2 fk*kik = 0. Now Xkik does not appear among the terms of the first sum in the above equation since j — 1 < lk. Hence fk = 0 for all k. Thus / J QrnZm = ~ / v / _, e ijX{j, which therefore belongs to D. Hence e^ = 0 if j > 1, and J2 9mZm = — Yl e aXn. This can only mean that gm = 0 and eji = 0 since the Xn and Z m are linearly independent. Hence the theorem is established. Corollary 9.4.6 Every complex n x n matrix is similar to an upper triangular matrix with zeros above the superdiagonal. This follows at once from the theorem since every Jordan block is an upper triangular matrix of the specified type.
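In practice a computer algebra system can produce the Jordan normal form directly. The sketch below assumes Python with sympy, whose Matrix.jordan_form method returns an invertible P and the Jordan form J with A = PJP^{-1}; it is applied to the matrix of Example 9.4.7, which is worked by hand next.

```python
import sympy as sp

A = sp.Matrix([[3, 1, 0],
               [-1, 1, 0],
               [0, 0, 2]])                # the matrix of Example 9.4.7

P, J = A.jordan_form()                    # A = P * J * P**(-1)
print(J)                                  # one 2x2 block and one 1x1 block for the
                                          # eigenvalue 2 (the block order may vary)
print(sp.simplify(P * J * P.inv() - A))   # the zero matrix
```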
Example 9.4.7
Put the matrix

A = [  3  1  0 ]
    [ -1  1  0 ]
    [  0  0  2 ]

in Jordan normal form.

We follow the method of the proof of 9.4.5. The eigenvalues of A are 2, 2, 2, so define

A' = A - 2I = [  1  1  0 ]
              [ -1 -1  0 ]
              [  0  0  0 ]

The column space C of A' is generated by the single vector X = (1, -1, 0)^T. Note that AX = 2X, so X is a Jordan string of length 1 for A. Also the null space N of A' is generated by X and the vector (0, 0, 1)^T. Thus D = C ∩ N = C is generated by X. The next step is to write X in the form A'Y: in fact we can take Y = (1, 0, 0)^T. Thus the second basis element is Y. Finally, put Z = (0, 0, 1)^T, so that {X, Z} is a basis for N. Then

A'X = 0,  A'Y = X,  A'Z = 0
and hence AX = 2X, AY = 2Y + X and AZ = 2Z. It is now evident that {X, Y, Z} is a basis of C^3 consisting of the two Jordan strings X, Y and Z. Therefore the Jordan form of A has two blocks and is

N = [ 2  1  0 ]
    [ 0  2  0 ]
    [ 0  0  2 ]

As an application of Jordan form we establish an interesting connection between a matrix and its transpose.

Theorem 9.4.7
Every square complex matrix is similar to its transpose.

Proof
Let A be a square matrix with complex entries, and write N for the Jordan normal form of A. Thus S^{-1}AS = N for some invertible matrix S by 9.4.5. Now N^T = S^T A^T (S^{-1})^T = S^T A^T (S^T)^{-1}, so N^T is similar to A^T. It will be sufficient if we can prove that N and N^T are similar. The reason for this is the transitive property of similarity: if P is similar to Q and Q is similar to R, then P is similar to R. Because of the block decomposition of N, it is enough to prove that any Jordan block J is similar to its transpose. But this can be seen directly. Indeed, if P is the permutation matrix with a line of 1's from top right to bottom left, then matrix multiplication shows that P^{-1}JP = J^T.
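The final step of the proof is easily checked numerically; a sketch assuming Python with numpy, where P is the permutation matrix with 1's from top right to bottom left:

```python
import numpy as np

n, c = 4, 5.0
J = c * np.eye(n) + np.diag(np.ones(n - 1), k=1)   # an n x n Jordan block
P = np.eye(n)[:, ::-1]                             # 1's from top right to bottom left

# P is its own inverse; conjugation by P reverses the order of rows and
# columns, turning the superdiagonal of 1's into a subdiagonal.
print(np.allclose(np.linalg.inv(P) @ J @ P, J.T))  # True
```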
  • 377. 9.4: Jordan Normal Form 361 Another use of Jordan form is to determine which matri- ces satisfy a given polynomial equation. Example 9.4.8 Find up to similarity all complex n x n matrices A satisfying the equation A2 = I. Let N be the Jordan normal form of A, and write N = S^AS. Then N2 = S~1 A2 S. Hence A2 = I if and only if N2 = /. Since N consists of a string of Jordan blocks down the diagonal, we have only to decide which Jordan blocks J can satisfy J2 = i". This is easily done. Certainly the diagonal entries of J will have to be 1 or — 1. Furthermore, matrix multiplication reveals that J2 ^ I if J has two or more rows. Hence the block J must be 1 x 1. Thus N is a diagonal matrix with all its diagonal entries equal to +1 or — 1. After reordering the rows and columns, we get a matrix of the form where r + s = n. Therefore A2 = 1 if and only if A is similar to a matrix with the form of N. Next we consider the relationship between Jordan nor- mal form and the minimum and characteristic polynomials. It will emerge that knowledge of Jordan form permits us to write down the minimum polynomial immediately. Since in principle we know how to find the Jordan form - by using the method of Example 9.4.7 - this leads to a systematic way of computing minimum polynomials, something that was lacking previously. Let A be a complex nxn matrix whose distinct eigenval- ues are ci,..., cr. For each Q there are corresponding Jordan blocks in the Jordan normal form N of A which have Q on their principal diagonals, say Jn,..., Jut let n^- be the num- ber of rows of Ja. Of course A and N have the same minimum
  • 378. 362 Chapter Nine: Advanced Topics and characteristic polynomials since they are similar matrices. Now J^ is an n^ x n^ upper triangular matrix with ci on the principal diagonal, so its characteristic polynomial is just (ci — x)ni i. The characteristic polynomial p of N is clearly the product of all of these polynomials: thus r h p — TT(CJ — x)mi where rrii = J n i j - i = i j=i The minimum polynomial is a little harder to find. If / is any polynomial, it is readily seen that f(N) is the matrix with the blocks f(Jij) down the principal diagonal and zeros elsewhere. Thus f(N) = 0 if and only if all the / ( J ^ ) = 0. Hence the minimum polynomial of N is the least common multiple of the minimum polynomials of the blocks J^. But we saw in Example 9.4.6 that the minimum polynomial of the Jordan block J^ is (x — Ci)nij . It follows that the minimum polynomial of iV is n /=no* -c *)fci where ki is the largest of the n^ for j = 1,..., U. These conclusions, which amount to a method of com- puting minimum polynomials from Jordan normal form, are summarized in the next result. Theorem 9.4.8 Let A be an nxn complex matrix and let c±,... ,cr be the dis- tinct eigenvalues of A. Then the characteristic and minimum polynomials of A are n n Y[(ci-x)mi and Y[(x-Ci)ki i = l i=l
  • 379. 9.4: Jordan Normal Form 363 respectively, where m; is the sum of the numbers of columns in Jordan blocks with eigenvalue ci and ki is the number of columns in the largest such Jordan block. Example 9.4.9 Find the minimum polynomial of the matrix A = The Jordan form of A is N / 2 1 I 0 0 2 1 0 — I V 0 0 1 2 / by Example 9.4.7. Here 2 is the only eigenvalue and there are two Jordan blocks, with 2 and 1 columns. The minimum poly- nomial of A is therefore (x — 2)2 . Of course the characteristic polynomial is (2 — x)3 , Application of Jordan form to differential equations In 8.3 we studied systems of first order linear differential equations for functions yi, y.2,..., yn of a variable x. Such a system takes the matrix form Y' = AY. Here Y is the column of functions y,. .., yn and A is an n x n matrix with constant coefficients. Since any such matrix A is similar to a triangular matrix (by 8.1.8), it is possible to change to a system of linear differential equations for a new set of functions which has a triangular coefficient matrix. This new system can then be solved by back substitution,
as in Example 8.3.3. However this method can be laborious for large n, and Jordan form provides a simpler alternative method.

Returning to the system Y' = AY, we know that there is a non-singular matrix S such that N = S^{-1}AS is in Jordan normal form: say

N = diag(J1, J2, ..., Jk).

Here Ji is a Jordan block, say with di on the diagonal. Of course the di are the eigenvalues of A. Now put U = S^{-1}Y, so that the system Y' = AY becomes (SU)' = ASU, or U' = NU, since N = S^{-1}AS. To solve this system of differential equations it is plainly sufficient to solve the subsystems Ui' = Ji Ui for i = 1, ..., k, where Ui is the column of entries of U corresponding to the block Ji in N.

This observation effectively reduces the problem to one in which the coefficient matrix is a Jordan block, let us say

A = [ d  1  0  ...  0  0 ]
    [ 0  d  1  ...  0  0 ]
    [ .  .  .  ...  .  . ]
    [ 0  0  0  ...  d  1 ]
    [ 0  0  0  ...  0  d ]

Now the equations in the corresponding system have a much simpler form than in the general triangular case:

u1' = d u1 + u2
u2' = d u2 + u3
      ...
u(n-1)' = d u(n-1) + un
un' = d un
• 381. 9.4: Jordan Normal Form 365
The functions u_i can be found by solving a series of first order linear equations, starting from the bottom of the list. Thus u_n' = d u_n yields u_n = c_{n-1} e^{dx}, where c_{n-1} is a constant. The second last equation becomes

    u_{n-1}' - d u_{n-1} = c_{n-1} e^{dx},

which is first order linear with integrating factor e^{-dx}. Multiplying the equation by this factor, we get (u_{n-1} e^{-dx})' = c_{n-1}. Hence

    u_{n-1} = (c_{n-2} + c_{n-1} x) e^{dx},

where c_{n-2} is another constant. The next equation yields

    u_{n-2}' - d u_{n-2} = (c_{n-2} + c_{n-1} x) e^{dx},

which is also first order linear with integrating factor e^{-dx}. It can be solved to give

    u_{n-2} = (c_{n-3} + (c_{n-2}/1!) x + (c_{n-1}/2!) x^2) e^{dx},

where c_{n-3} is constant. Continuing in this manner, we find that the function u_{n-i} is given by

    u_{n-i} = (c_{n-i-1} + (c_{n-i}/1!) x + ... + (c_{n-1}/i!) x^i) e^{dx},

where the c_j are constants. The original functions y_j can then be calculated by using the equation Y = SU.

Example 9.4.10
Solve the linear system of differential equations below using Jordan normal form:

    y_1' = 3y_1 + y_2
    y_2' = -y_1 + y_2
    y_3' = 2y_3
• 382. 366 Chapter Nine: Advanced Topics
Here the coefficient matrix is

    A = [  3  1  0 ]
        [ -1  1  0 ]
        [  0  0  2 ]

The Jordan form of A was found in Example 9.4.7: we recall the results obtained there. There is a basis of R^3 consisting of Jordan strings: this is {X, W, Z}, where

    X = ( 1, -1, 0 )^T,   W = ( 1, 0, 0 )^T,   Z = ( 0, 0, 1 )^T.

Here AX = 2X, AW = 2W + X, AZ = 2Z. The matrix which describes the change of basis from {X, W, Z} to the standard basis is

    S = [  1  1  0 ]
        [ -1  0  0 ]
        [  0  0  1 ]

By 6.2.6 the matrix which represents the linear operator arising from left multiplication by A, with respect to the basis {X, W, Z}, is

    S^{-1}AS = J = [ 2  1  0 ]
                   [ 0  2  0 ]
                   [ 0  0  2 ]

Now put U = S^{-1}Y, so that Y = SU and the system of equations becomes U' = S^{-1}ASU = JU, that is,

    u_1' = 2u_1 + u_2
    u_2' = 2u_2
    u_3' = 2u_3
• 383. 9.4: Jordan Normal Form 367
We solve this system, beginning with the last equation, and obtain u_3 = c_2 e^{2x}. Next u_2' = 2u_2, so that u_2 = c_1 e^{2x}. Finally we solve

    u_1' - 2u_1 = c_1 e^{2x},

a first order linear equation, and find the solution to be u_1 = (c_0 + c_1 x) e^{2x}. Therefore

    U = ( (c_0 + c_1 x) e^{2x},  c_1 e^{2x},  c_2 e^{2x} )^T

and, since Y = SU, we obtain

    Y = ( (c_0 + c_1 + c_1 x) e^{2x},  (-c_0 - c_1 x) e^{2x},  c_2 e^{2x} )^T,

from which the values of the functions y_1, y_2, y_3 can be read off.

Exercises 9.4

1. Find the minimum polynomials of the following matrices by inspection:
   (a) ...   (b) ...   (c) ...   (d) [ 3  0  0 ]
                                     [ 0  2  1 ]
                                     [ 0  0  2 ]

2. Let A be an n x n matrix and S an invertible n x n matrix over a field F. If f is any polynomial over F, show that f(S^{-1}AS) = S^{-1} f(A) S. Use this result to give another proof
• 384. 368 Chapter Nine: Advanced Topics
of the fact that similar matrices have the same minimum polynomial (see 9.4.2).

3. Use 9.4.4 to prove that if an n x n complex matrix has n distinct eigenvalues, then the matrix is diagonalizable.

4. Show that the minimum polynomial of the companion matrix

    A = [ 0  0  -c ]
        [ 1  0  -b ]
        [ 0  1  -a ]

is x^3 + ax^2 + bx + c. (See Exercise 8.1.6.) [Hint: show that uI + vA + wA^2 = 0 implies that u = v = w = 0.]

5. Find the Jordan normal forms of the following matrices:
   (a) ...   (b) ...   (c) ...

6. Read off the minimum polynomials from the Jordan forms in Exercise 5.

7. Find up to similarity all n x n complex matrices A satisfying A = A^2.

8. The same problem for matrices such that A^2 = A^3.

9. (Uniqueness of Jordan normal form) Let A be a complex n x n matrix with Jordan blocks J_{ij}, where J_{ij} is a block associated with the eigenvalue c_i. Prove that the number of r x r Jordan blocks J_{ij} for a given i equals d_{r-1} - d_r, where d_k is the dimension of the intersection of the column space of (A - c_i I_n)^k and the null space of A - c_i I_n. Deduce that the blocks that appear in the Jordan normal form of A are unique up to order.

10. Using Exercise 9.4.4 as a model, suggest an n x n matrix whose minimum polynomial is x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0.
• 385. 9.4: Jordan Normal Form 369
11. Use Jordan normal form to solve the following system of differential equations:

    y_1' = y_1 + y_2 + y_3
    y_2' = y_2
    y_3' = y_2 + y_3
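The calculations of Examples 9.4.9 and 9.4.10 can be checked with a computer algebra system. The following sketch is not part of the text: it assumes the Python library sympy is installed, and the helper min_poly_from_jordan is an ad hoc name introduced here, simply implementing the rule of Theorem 9.4.8 (for each eigenvalue take the size of its largest Jordan block).

from sympy import Matrix, symbols

x, t = symbols('x t')

# Coefficient matrix of the system in Example 9.4.10.
A = Matrix([[3, 1, 0],
            [-1, 1, 0],
            [0, 0, 2]])

# Jordan normal form: S, N with A = S*N*S**(-1).
S, N = A.jordan_form()
print(N)

# Characteristic polynomial; sympy returns the monic version (x - 2)**3.
print(A.charpoly(x).as_expr())

def min_poly_from_jordan(N, x):
    """Minimum polynomial read off from a Jordan form (Theorem 9.4.8)."""
    n = N.shape[0]
    largest, size = {}, 1
    for i in range(n):
        # a block ends where the superdiagonal entry is 0 (or at the last row)
        if i == n - 1 or N[i, i + 1] == 0:
            c = N[i, i]
            largest[c] = max(largest.get(c, 0), size)
            size = 1
        else:
            size += 1
    poly = 1
    for c, k in largest.items():
        poly *= (x - c)**k
    return poly.expand()

print(min_poly_from_jordan(N, x))

# The general solution of Y' = AY is Y = exp(At)*Y(0); the symbolic matrix
# exponential below is what the Jordan-form method produces by hand.
print((A*t).exp())

Under these assumptions the printed Jordan form should consist of blocks of sizes 2 and 1 for the eigenvalue 2, and the minimum polynomial should expand to x^2 - 4x + 4 = (x - 2)^2, in agreement with Example 9.4.9.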
• 386. Chapter Ten
LINEAR PROGRAMMING

One of the great successes of linear algebra has been the construction of algorithms to solve certain optimization problems in which a linear function has to be maximized or minimized subject to a set of linear constraints. Typically the function is a profit or a cost. Such problems are called linear programming problems. The need to solve such problems was recognized during the Second World War, when supplies and labor were limited by wartime conditions. The pioneering work of George Dantzig led to the creation of the Simplex Algorithm, which for over half a century has been the standard tool for solving linear programming problems. Our purpose here is to describe the linear algebra which underlies the simplex algorithm and then to show how it can be applied to solve specific problems.

10.1 Introduction to Linear Programming

We begin by giving some examples of linear programming problems.

Example 10.1.1 (A production problem)
A food company markets two products F_1 and F_2, which are made from two ingredients I_1 and I_2. To produce one unit of product F_j one requires a_{ij} units of ingredient I_i. The maximum amounts of I_1 and I_2 available are m_1 and m_2, respectively. The company makes a profit of p_i on each unit of product F_i sold. How many units of F_1 and F_2 should the company produce in order to maximize its profit without running out of ingredients? 370
• 387. 10.1: Introduction to Linear Programming 371
Suppose the company decides to produce x_j units of product F_j. Then the profit on marketing the products will be z = p_1 x_1 + p_2 x_2. On the other hand, the production process will use a_{11} x_1 + a_{12} x_2 units of ingredient I_1 and a_{21} x_1 + a_{22} x_2 units of ingredient I_2. Therefore x_1 and x_2 must satisfy the constraints a_{11} x_1 + a_{12} x_2 ≤ m_1 and a_{21} x_1 + a_{22} x_2 ≤ m_2. Also x_1 and x_2 cannot be negative.

We therefore have to solve the following linear programming problem:

maximize: z = p_1 x_1 + p_2 x_2
subject to:
    a_{11} x_1 + a_{12} x_2 ≤ m_1
    a_{21} x_1 + a_{22} x_2 ≤ m_2
    x_1, x_2 ≥ 0

Example 10.1.2 (A transportation problem)
A company has m factories F_1, ..., F_m and n warehouses W_1, ..., W_n. Factory F_i can produce at most r_i units of a certain product per week and warehouse W_j must be able to supply at least s_j units per week. The cost of shipping one unit from factory F_i to warehouse W_j is c_{ij}. How many units should be shipped from each factory to each warehouse per week in order to minimize the total transportation cost and yet still satisfy the requirements on the factories and warehouses?

Let x_{ij} be the number of units to be shipped from factory F_i to warehouse W_j per week. Then the total transportation cost for the week is

    z = Σ_{i=1}^{m} Σ_{j=1}^{n} c_{ij} x_{ij}.
• 388. 372 Chapter Ten: Linear Programming
The condition on factory F_i is that Σ_{j=1}^{n} x_{ij} ≤ r_i, while that on warehouse W_j is Σ_{i=1}^{m} x_{ij} ≥ s_j. We are therefore faced with the following linear programming problem:

minimize: z = Σ_{i=1}^{m} Σ_{j=1}^{n} c_{ij} x_{ij}
subject to:
    Σ_{j=1}^{n} x_{ij} ≤ r_i,   i = 1, ..., m,
    Σ_{i=1}^{m} x_{ij} ≥ s_j,   j = 1, ..., n.

The general linear programming problem
After these examples we are ready to describe the general form of a linear programming problem. Let x_1, x_2, ..., x_n be variables. There is given a linear function of the variables

    z = c_1 x_1 + c_2 x_2 + ... + c_n x_n,

called the objective function, which has to be maximized or minimized. The variables x_j are subject to a number of linear conditions, called the constraints, which take the form

    a_{i1} x_1 + a_{i2} x_2 + ... + a_{in} x_n  ≤ or = or ≥  b_i,   i = 1, 2, ..., m.

In addition, certain of the variables may be constrained, i.e., they must take non-negative values. The general linear programming problem therefore takes the following form:
• 389. 10.1: Introduction to Linear Programming 373
maximize or minimize: z = c_1 x_1 + ... + c_n x_n
subject to:
    a_{i1} x_1 + ... + a_{in} x_n  ≤ or = or ≥  b_i,   i = 1, 2, ..., m,
    certain x_j ≥ 0.

The understanding here is that the a_{ij}, b_i, c_j are all known quantities. The object is to find x_1, ..., x_n which optimize the objective function z, while satisfying the constraints. Evidently Examples 10.1.1 and 10.1.2 are problems of this type.

Feasible and optimal solutions
It will be convenient to think of X = (x_1, x_2, ..., x_n)^T in the above problem as a point in Euclidean space R^n. If X satisfies all the constraints (including the conditions x_j ≥ 0), then it is called a feasible solution of the problem. A feasible solution for which the objective function is maximum or minimum is said to be an optimal solution.

For a general linear programming problem there are three possible outcomes.
(i) There are no feasible solutions and thus the problem has no optimal solutions.
(ii) Feasible solutions exist, but the objective function has arbitrarily large or small values at feasible solutions. Again there are no optimal solutions.
(iii) The objective function has finite maximum or minimum values at feasible points. Then optimal solutions exist.

In a linear programming problem the object is to find an optimal solution or show that none exists.
  • 390. 374 Chapter Ten: Linear Programming Standard and canonical form Since the general linear programming problem has a com- plex form, it is important to develop simpler types of prob- lem which are equivalent to it. Here two linear programming problems are said to be equivalent if they have the same sets of feasible solutions and the same optimal solutions. A linear programming problem is said to be in standard form if it is a maximization problem with all constraints in- equalities and all variables constrained. It therefore has the general form maximize: z — cX + • • • + cnxn subject to: ' anXi + a2x2 + h ainxn < bi a2Xi + a22x2 + h a2nxn < b2 Iamixi + am2x2 H h amnxn < bm Xj>0, j = l,2,...,n This problem can be written in matrix form: let A = (a,ij)mn, B = (bi b2 ... bm)T , C = (ci c2 ... cn)T and X = (x x2 ... xn)T . Then the problem takes the form: maximize : z = C X , . . , (AX<B subject to- S x > Q Here a matrix inequality U < V means that U and V are of the same size and Uy < v^j for all i,j: there is a similar definition of U > V.
  • 391. 10.1: Introduction to Linear Programming 375 A second important type of linear programming problem is a maximization problem with all constraints equalities and allvariables constrained. The general form is: maximize : z = C X subject to: Such a linear programming problem is said to be in canonical form. Changes to a linear programming problem Our aim is to show that any linear programming problem is equivalent to one in standard form and to one in canoni- cal form. To do this we need to consider what changes to a program will produce an equivalent program. There are four types of change that can be made. Replace a minimization by a maximization If the objective function in a linear program is z = CT X, the minimum value of z occurs for the same X as the maxi- mum value of (—C)T X. Thus we can replace "minimize" by "maximize" and CT X by the new objective function (—C)T X. Reverse an inequality The inequality anX + - • -+ainxn > bi is clearly equivalent to (-Oji)a;i -I h {-ain)xn < -bi. Replace an equality by two inequalities The constraint anXi + - • -+ainxn = bn is equivalent tothe two inequalities a-nxi H h ainxn < bi (-aa)xi H h (-ain)xn < -bi AX = B X > 0.
  • 392. 376 Chapter Ten: Linear Programming Elimination of an unconstrained variable Suppose that the variable Xj is unconstrained, i.e., it can take negative values. The trick here is to replace Xj by two new variables xt,xj which are constrained. Write Xj in the form x-j = xf — x~ where xf.xj > 0. This is possible since any real number can be written as the difference between two positive numbers. If we replace Xj by x~j~ — xj in each constraint and in the objective function, and we add new constraints x^ > 0, xj > 0, then the resulting equivalent program will have fewer unconstrained variables. By a sequence of operations of types I-IV a general linear programming problem may be transformed to an equivalence problem in standard form. Thus we have proved: Theorem 10.1.1 Every linear programming problem is equivalent to a program in standard form. Example 10.1.3 Put the following programming problem in standard form. minimize: z = 3x% + 2x2 — ^3 {Xi + X2 + 2x3 > 6 %l + ^2 + 3^3 < 2 £1,2:3 > 0 First of all change the minimization to a maximization and replace the constraints involving = and > by constraints involving < :
  • 393. 10.1: Introduction to Linear Programming Oil maximize: z = —3xi — 2x2 + x3 subject to: —xi — x2 — 2xz < —6 xi + x2 + x3 < 4 —x — x2 - x3 < —4 x - x2 + 3x3 < 2 x,x3 > 0 Next write x2, which is an unconstrained variable, in the form J/O Xn This yields a problem in standard form: maximize: z = —3xi — 2xt + 2x2 + x3 subject to: -Xi xx Xi + Xn -x2 + xt- + xt + Xn x 2 - 2x3 + x3 x3 — xt + x2 + 3^3 Xi,X~2,X2 ,X3 > 0 < - 6 < 4 < - 4 < 2 Slack variables If we wish to transform a linear programming problem to canonical form, a method for converting inequalities into equalities is needed. This can be achieved by the introduction of what are called slack variables. Consider a linear programming problem in standard form: maximize: z CT X subject to: AX < B X > 0 where A is m x n and the variables are x±,x2,... ,xn. We introduce m new variables, xn+i,..., xn+m, the so-called slack
  • 394. 378 Chapter Ten: Linear Programming variables, and replace the ith constraint anXi+- • -+ainxn < bi by the new constraint for i = 1, 2,..., m, together with X{+n > 0, i — 1,..., m. The effect is totransform the problem to an equivalent linear programming problem incanonical form: maximize: z = CX + • • • + cnxn subject to: { alxxi + • • • + alnxn + xn+i a2Xi + • • • + a2nXn + Xn+2 Q"ml%l r " ' ' T arnn£n ~r Xn--m Xi > 0, i = 1,2,..., n + m. Combining this observation with 10.1.1, we obtain: Theorem 10.1.2 Every linear programmingproblem is equivalent to one in canonical form. Exercises 10.1 1. A publishing house plans to issue three types of pamphlets Pi) ?2) ?3- Each pamphlet has to be printed and bound. The times in hours required to print and to bind one copy of pamphlet Pj are Ui and Vi respectively. The printing and binding machines can run for maximum times s and t hours per day respectively. The profit made on one pamphlet of type Pi is pi. Let xi,x2,X3 be thenumbers of pamphlets of the three types to be produced per day. Set up a linear program in xi,x2,x3 which maximizes the profitp per day = h = b2
• 395. 10.1: Introduction to Linear Programming 379
and takes into account the times for which the machines are available.

2. A nutritionist is planning a lunch menu with two food types A and B. One ounce of A provides a_c units of carbohydrate, a_f units of fat and a_p units of protein; for B the figures are b_c, b_f, b_p, respectively. The costs of one unit of A and one unit of B are p and q respectively. The meal must provide at least m_c units of carbohydrate, m_f units of fat and m_p units of protein. Set up a linear program to determine how many ounces of A and B should be provided in the meal in order to minimize the cost, while satisfying the dietary requirements.

3. Write the following linear programming problem in standard form:

minimize: z = 2x_1 - x_2 - x_3 + x_4
subject to:
    x_1 + 2x_2 + x_3 - x_4 ≥ 5
    3x_1 + x_2 - x_3 + x_4 ≤ 4
    x_1, x_2 ≥ 0

4. Write the linear programming problem in Exercise 10.1.3 in canonical form.

5. Consider the following linear programming problem in x_1, x_2, ..., x_n with n constraints:

maximize: z = C^T X
subject to: AX = B, X ≥ 0,

where A is an n x n matrix with rank n.
(a) Show that there is a feasible solution if and only if A^{-1} B ≥ 0.
(b) Show that if a feasible solution exists, it must be optimal.
(c) If an optimal solution exists, what is the maximum value of z?
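Problems of this kind can also be solved numerically. The sketch below is not part of the text: it solves one instance of the production problem of Example 10.1.1 with scipy, assuming that library is installed; the numbers chosen for p, a and m are purely illustrative, and since scipy.optimize.linprog minimizes, the profit vector is negated.

import numpy as np
from scipy.optimize import linprog

p = np.array([3.0, 2.0])        # illustrative profits p_1, p_2
a = np.array([[1.0, 2.0],       # a_{ij}: ingredient I_i used per unit of F_j
              [3.0, 1.0]])
m = np.array([14.0, 18.0])      # available amounts m_1, m_2

res = linprog(c=-p, A_ub=a, b_ub=m, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)          # optimal x_1, x_2 and the maximum profit z

For these particular numbers the optimum is x_1 = 4.4, x_2 = 4.8 with z = 22.8, attained where both ingredient constraints are tight.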
  • 396. 380 Chapter Ten: Linear Programming 10.2 The Geometry of Linear Programming Valuable insight into the nature of the linear program- ming problem is gained by adopting a geometrical point of view and regarding the problem as one about n-dimensional space. We will identify an n-column vector X with a point (x1,x2,...,xn) in n-dimensional space and denote the latter by Rn . The set of points X such that aXi + • • • + anxn — b, where the real numbers <2j,6 are not all zero, is called a hy- perplane in Rn . Thus a hyperplane in R2 is a line and a hyperplane in R3 is a plane. Let A = (ai a2 ... an); thus the equation of the hyper- plane is AX = b: let us call it H. Then H divides Rn into two halfspaces H1 = {X eKn AX < b} and H2 = {XeRn AX> b}. Clearly Rn = ff!U H2 and H = H1nH2. In a linear programming program in x,..., xn, each con- straint requires the point X to lie in a half space or a hyper- plane. Thus the set of feasible solutions corresponds to the points lying in all of the half spaces or hyperplanes correspond- ing to the constraints. In this way we obtain a geometrical picture of the set of feasible solutions of the problem.
• 397. 10.2: The Geometry of Linear Programming 381
Example 10.2.1
Consider the simple linear programming problem in standard form:

maximize: z = x + y
subject to:
    2x + y ≤ 3
    x + 2y ≤ 3
    x, y ≥ 0

The set S of feasible solutions is the region of the plane which is bounded by the lines 2x + y = 3, x + 2y = 3, x = 0, y = 0. The objective function z = x + y corresponds to a plane in 3-dimensional space. The problem is to find a point of S at which the height of the plane above the xy-plane is largest. Geometrically, it is clear that this point must be one of the "corner points" (0, 0), (3/2, 0), (1, 1), (0, 3/2). The largest value of z = x + y occurs at (1, 1). Therefore x = 1 = y is an optimal solution of the problem.

The next step is to investigate the geometrical properties of the set of feasible solutions. This involves the concept of convexity.
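Before taking up convexity, here is a quick computational check of Example 10.2.1 (not from the text): plain Python is enough to confirm that the four corner points are feasible and that z = x + y is largest at (1, 1).

corners = [(0.0, 0.0), (1.5, 0.0), (1.0, 1.0), (0.0, 1.5)]

def feasible(x, y):
    # the constraints of Example 10.2.1
    return 2*x + y <= 3 and x + 2*y <= 3 and x >= 0 and y >= 0

for x, y in corners:
    print((x, y), feasible(x, y), "z =", x + y)
# largest value: z = 2 at (1, 1), as found geometrically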
  • 398. 382 Chapter Ten: Linear Programming Convex subsets Let Xi and X2 be two distinct points in Rn . The line segment XX2 joining X and X2 is defined to be the set of points {tXx + (1 - t)X2 I 0 < t < 1}. For example, if n < 3, the point tX + (1 — t)X2, where 0 < t < 1, is a typical point lying between Xi and X2 on the line which joins them. To see this one has to notice that X2 - (tX1 + (1 - t)X2) = t{X2 - Xx) and {tXx + (1 - t)X2) - X, = (1 - i)(X2 - Xi) are parallel vectors. (Keep in mind that we are using X to denote both the point (xi,X2,xs) and the column vector {x X2 X3)7 '.) A non-empty subset S of Rn is called convex if, whenever X and X2 are points in S, every point on the line segment XX2 is also a point of S.
  • 399. 10.2: The Geometry of Linear Programming 3 8 3 It is easy to visualize the situation in R2 : for example, consider the shaded regions shown. The interior of the left hand figure is clearly convex, but the interior of the right hand one is not. The following property of convex sets is almost obvious. Lemma 10.2.1 The intersection of a collection of convex subsets of Rn is either empty or convex. Proof Let {Si | i G /} be a set of convex subsets of Rn and assume that S — f] Si is not empty. If S has only one element, then iei it is obviously convex. So assume X and X2 are distinct points of S and let 0 < t < 1. Now Xi and X2 belong to Si for all i, as must tX + (1 — t)X2 since Si is convex. Hence tX + (1 — t)X2 G S and S is convex. Our interest in convex sets is motivated by the following fundamental result. Theorem 10.2.2 The set of all feasible solutions of a linear programming prob- lem is either empty or convex.
  • 400. 384 Chapter Ten: Linear Programming Proof By 10.1.1 we may assume that the linear programming prob- lem is in standard form. Hence the set of feasible solutions is the intersection of a collection of half spaces. Because of 10.2.1 it is enough to prove that every half space H is convex. For example, consider H = {X e Rn | AX < b} where A is an n-row vector. Suppose that Xi,X2 < G H and 0 < t < 1. Then A(tXi + (l-t)X2) = t(AXx) + (l-t)AX2 < tb+(l-t)b = b. Hence tX + (1 — t)X2 G H and H is convex. The convex hull Let Xi, Xii • • • , Xm be vectors in Rn . Then a vector of the form m ^CiXi, where m Ci > 0 and VJ Q = 1, i = i is called a convex combination of Xi,X2,..., Xm. For exam- ple, whenTO= 2, every convex combination of Xi, X2 has the form tXi + { — t)X2, where 0 < t < 1. Thus the line segment X±X2 consists of all the convex combinations of X and X2. The set of all convex combinations of elements of a non- empty subset S of Rn is called the convex hull of S: C{S). For example, the convex hull of {Xi, X2}, where X ^ X2, is just the line segment XiX2- The relation between the convex hull and convexity is made clear by the next result.
  • 401. 10.2: The Geometry of Linear Programming 385 Theorem 10.2.3 Let S be a non-empty subset o/Rn . Then C(S) is the smallest convex subset o/Rn which contains S. Proof In the first place it is clear that 5 C C(S). We show next that C(S) is a convex set. Let X,Y G C(S); then we can m m write X = Yl c i^i a n d Y = Yl diXi, where Xi,..., Xm G S, i = l i = l m m 0 < Cj, di < 1, and Y Ci = 1 = Y2 d%. Then for any t i=i i=i satisfying 0 < t < 1, we have m m iX + (1 - t)Y= Y^teiXi + 5 ] ( 1 - t)diXi i=l i = l m = ^2(ta + {l-t)di)Xi. i = l Now m / m / m ^ ( t c i + ( l - t ) d i ) = t l ^ c i ] + ( l - t ) ]Tdj i = l i = l / i = l = t + (l-t) = 1. Consequently £X + (1 - t)F G C(S) and (7(5) is convex. Next suppose T is any convex subset of R n containing S. We must show that C(S) C T; for then C(S) will be the smallest convex subset containing S. m Let X G C(S) and write X = £ CjXj, where X; G 5, i = i 771 Cj > 0 and ]T Q = 1. We will show that X G T by induction i = l
  • 402. 386 Chapter Ten: Linear Programming on m > 1, the claim being clearly true if m = 1. Now we have m — 1 X = (1 - Cm) J2 iTZ—)X i + C mXm- X Cm. m—1 Next, since ^ c» = 1 — cm, we have i-X m—X E l C-i *• ^ m 1 M _ r ' _ 1 _ r ~~ i = l - ^ Also 0 < , c i < 1 since Q < ci + • • • + cm_i = 1 — cm for 1 < i < m — 1. Hence m—X -1 - *^T77. 1 = 1 by the induction hypothesis on m. Finally, X = (1 - cm )F + cmXm e T, since T is convex. Extreme points Let S be a convex subset of Rn . A point of S is called an extreme point if it is not an interior point of any line segment joining two points of S. For example, the extreme points of the set of points in the polygon below are just the six vertices shown.
  • 403. 10.2: The Geometry of Linear Programming 3 8 7 The extreme points of a convex set can be characterized in terms of convex combinations. Theorem 10.2.4 Let S be a convex subset o/Rn and let X e S. Then X is an extreme point of S if and only if it is not a convex combination of other points of S. Proof Suppose X is not an extreme point of S; then X = tY + (l-t)Z, where 0 < t < 1 and Y, Z G S. Then X is certainly a convex combination of points of S, namely Y and Z. Conversely, suppose that X is a convex combination of other points of S. We will show that X is not an extreme m point of S. By assumption it is possible to write X = ^ qXi 2 = 1 m where Xi G S, Xi ^ X, 0 < c; < 1 and ]T Q = 1. Notice m that Cj ^ 1; for otherwise ^ Q = 0 and Cj = 0 for all i=l, i^j i T^ j , so that X = Xj. Just as in the proof of Theorem 10.2.3, we can write m—1 X = (1 - Cm) J2 (TZ^)X i + Cm*™- i=l X C m Also m — 1 1 — C 1 — r ~ ~ and 0 < yf*— < 1, since a < c + • • • + Cm-i = 1 — cm if i < m. It follows that ro— 1
  • 404. 388 Chapter Ten: Linear Programming and X = (1 — cm)Y + cmXm is an interior point of the line segment joining Y and Xm. Hence X is not an extreme point of S, which completes the proof. It is now time to explain the connection between optimal solutions of a linear programming problem and the extreme points of the set of feasible solutions. Theorem 10.2.5 (The Extreme Point Theorem) Let S be the set of all feasible solutions of a linear program- ming problem. (i) If S is non-empty and bounded, then there is an optimal solution. (ii) If an optimal solution exists, then it occurs at an extreme point of S. Here a subset S of Rn is said to be bounded if there exists a positive number d such that —d < Xi < d, for i — 1, 2,..., n and all (xi,X2,... ,xn) in S. Proof of Theorem 10.2.5 Suppose that we have a maximization problem. For simplicity we will assume throughout that S is bounded and n = 2: thus S can be visualized as a region of the plane bounded by straight lines corresponding to the constraints. Let z = f(x, y) — ex + dy be the objective function: we can assume c and d are not both 0. Since S is bounded and / is continuous in 5*, a standard theorem from calculus can be applied to show that / has an absolute maximum in S. This establishes (i). Next assume that there is an optimal solution. By an- other standard theorem, if P(x, y) is a point of P which is an absolute maximum of /, then either P is a critical point of S or else it lies on the boundary of S. But / has no critical points: for fx = c, fy = d, so fx and fy cannot both vanish. Thus P lies on the boundary of S and so on a line. By the
• 405. 10.2: The Geometry of Linear Programming 389
same argument P cannot lie in the interior of the line. Therefore P is a point of intersection of two lines and hence it is an extreme point of S.

We can now summarize the possible situations for a linear programming problem with set of feasible solutions S.
(a) S is empty: the problem has no solutions.
(b) S is non-empty and bounded: in this case the problem has an optimal solution and it occurs at an extreme point of S.
(c) S is unbounded: here optimal solutions need not exist, but, if they do, they occur at extreme points of S.

We will see in 10.3 that if S is non-empty and bounded, then it has a finite number of extreme points. By computing the value of the objective function at each extreme point one can find an optimal solution of the problem. We conclude with two examples.

Example 10.2.2
maximize: z = 2x + 3y
subject to:
    x + y ≥ 1
    x - y ≥ -1
    x, y ≥ 0

Here the set of feasible solutions S corresponds to the region of the xy-plane bounded by the lines x + y = 1, x - y = -1, x = 0, y = 0. Clearly it is unbounded and z can be arbitrarily large at points in S. Thus no optimal solutions exist.
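A numerical solver reports the same conclusion. The sketch below is not from the text: it feeds Example 10.2.2 to scipy.optimize.linprog, assuming scipy is installed. Since linprog minimizes, maximizing z = 2x + 3y means minimizing -2x - 3y, and the constraints x + y ≥ 1 and x - y ≥ -1 are rewritten as -x - y ≤ -1 and -x + y ≤ 1.

from scipy.optimize import linprog

res = linprog(c=[-2, -3],
              A_ub=[[-1, -1], [-1, 1]],
              b_ub=[-1, 1],
              bounds=[(0, None), (0, None)])
print(res.status, res.message)   # a status of 3 signals an unbounded objective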
  • 406. 390 Chapter Ten: Linear Programming Example 10.2.3 maximize: z = 1 — 12x — 3y ( x + y > 1 x — y > — 1 x,y>0 In this problem the set of feasible solutions is the same set S as in the previous example. However the maximum value of z in S occurs at x = 0, y = 1: this is an optimal solution of the problem. Exercises 10.2 In Exercises 10.2.1-10.2.3 sketch the convex subset of all feasible solutions of a linear programming problem with the given constraints. ( x- y < - 2 1. < 2x+ y < 3 (x,y>0 { x - 2y < 3 x+ y < 6 x,y>0 { x + y + z < 5 x - y - z < 0 x,y,z > 0 4. Find all the extreme points in the programs of Exercises 10.2.1 and 10.2.2 . 5. Suppose the objective function in Exercise 10.2.2 is z = 2x + 3y. Find the optimal solution when z is to be maximized. 6. Let S be any subspace of Rn . Prove that S is convex. Then give an example of a convex subset of R2 containing (0,0) which is not a subspace.
  • 407. 10.3: Basic Solutions and Extreme Points 391 7. Let S be a convex subset of Rn and let T be a linear operator on Rn . Define T(S) to be {T(X) X e S}. Prove that T(S) is convex. 8. Suppose that X and Xi are distinct feasible solutions of a linear programming problem in standard form. If the objective function has the same values at X and X2, prove that this is the value of the objective function at any point on the line segment joining X and X<i- 10.3 Basic Solutions and Extreme Points We have seen in 10.2 that the extreme points for a linear programming problem are the key to obtaining an optimal solution. In this section we describe a method for finding the extreme points which is the basis of the Simplex Algorithm. Consider a linear programming problem in canonical form - remember that any problem can be put in this form: maximize: z = C X subject to: Suppose that the problem has n variables xi,..., xn and rn constraints, which means that A is an m x n matrix, while X,C ERn a n d B 6 R m The linear system AX = B must be consistent if there is to be any chance of a feasible solution, so we assume this to be the case; thus the matrix A and the augmented matrix (A | B) have the same rank r. Hence the linear system AX = B is equivalent to a system whose augmented matrix has rank r, with its final m — r rows zero. These rows correspond to constraints of the form 0 = 0, which are negligible. Therefore AX = B X>0
• 408. 392 Chapter Ten: Linear Programming
there is no loss in supposing that A is an m x n matrix with rank m; of course now m ≤ n.

Since A has rank m, this matrix has m linearly independent columns, say A_{j_1}, A_{j_2}, ..., A_{j_m}, where j_1 < j_2 < ... < j_m. Define A' = (A_{j_1} A_{j_2} ... A_{j_m}), which is an m x m matrix of rank m, so that (A')^{-1} exists. The linear system

    A' (x_{j_1} x_{j_2} ... x_{j_m})^T = B

therefore has a unique solution for (x_{j_1} x_{j_2} ... x_{j_m})^T, namely (A')^{-1} B. This solution is in R^m, not R^n. To remedy this, define X = (x_1 x_2 ... x_n)^T by putting x_i = 0 if i ≠ j_1, j_2, ..., j_m. Then

    AX = A (x_1 x_2 ... x_n)^T = x_{j_1} A_{j_1} + x_{j_2} A_{j_2} + ... + x_{j_m} A_{j_m} = B.

Therefore X is a solution of AX = B with the property that all entries of X, except perhaps those in positions j_1, ..., j_m, are zero. Such a solution is called a basic solution of the linear programming problem; if in addition all the entries of X are non-negative, it is a basic feasible solution. The variables x_{j_1}, ..., x_{j_m} are called the basic variables.

The next step is to relate the basic feasible solutions to the extreme points of a linear programming problem in canonical form.

Theorem 10.3.1
A basic feasible solution of a linear programming problem in canonical form is an extreme point of the set of feasible solutions.
  • 409. 10.3: Basic Solutions and Extreme Points 393 Proof Suppose that the linear programming problem is maximize: z = C X subject to: and it has variables x,..., xn. Here A may be assumed to be an m x n matrix with rank m: as has been pointed out, this is no restriction. Then A has m linearly independent columns and, by relabeling the variables if necessary, we can assume these are the last m columns, say A[,..., A'm. Let X = (0 ... 0 x[ ... x'jr be the corresponding basic solution. Assume that X is feasi- ble, i.e., x'j > 0 for j — 1,2,... ,m. Our task is to prove that X is an extreme point of S, the set of all feasible solutions. Suppose X is not an extreme point of S; then X = tU+(l- t)V, where 0 < t < 1, U, V 6 S and X ^ U, V. Write U = (Ui ... Un-m u[ ... u'm)T and V = (Vi ... Vn-m v[ ... v'm)T . Equating the jth entries of X and tU + (1 — t)V, we obtain tuj + (1 — t)vj = 0 , 1 < j < n — m tu'j + (1 - t)v'j =x'j, l<j<m Since 0 < t < 1 and Uj,Vj > 0, the first equation shows that Uj = 0 = Vj for j = 1,..., n — m. AX = B X>Q
  • 410. 394 Chapter Ten: Linear Programming Since U G S, we have AU = B, so that u'1A'l + -.. + u'mA!m = B and x1A1 + • • • + xmArn — B, since AX = B. Therefore, on subtracting, we find that K - x'M'i + • • • + K - 4 ) 4 = o. However A^...,Nm are linearly independent, which means that u[ = x[, ..., u'm = x'm, i.e., U = X, which is a contra- diction. The converse of this result is true. Theorem 10.3.2 An extreme point of the set of feasible solutions of a linear programming problem in canonical form is a basic feasible so- lution. Proof Let the linear programming problem be maximize: z = CT X , . . . (AX = B subject to: < x >Q where A is an m x n matrix of rank m. Let X be an extreme point of the set of feasible solutions. Suppose that X has s non-zero entries and label the vari- ables so that the last s entries of X are non-zero, say X = (0 ... Oii ... x's)T . Let A'j be the column of A which corresponds to the entry x'j. We will prove that A[,..., A' are linearly independent.
  • 411. 10.3: Basic Solutions and Extreme Points 395 Assume that diA[ - h dsA's = 0 where not all the dj are 0. Let e be any positive number. Then In a similar fashion we have a Now define £/ = (0 ... 0 xi + edi ... x's + eds)T 1/ = (0 ... 0 x[ - edx ... x's- eds)T Then AU = B = AV. Next choose e so that x', 0 < e < -rj- j = l,2,...,s, Mil if dj ^ 0. This choice of e ensures that x' ± edj > 0 for j = 1, 2,..., s. Hence U > 0 and V > 0, so that U and V are feasible solutions. However, X = U + V, which means that X = U or V since X is an extreme point. But both of these are impossible because e > 0. It follows that A[,..., A'a are linearly independent and X is a basic feasible solution. We are now able to show that there are only finitely many extreme points in the set of feasible solutions.
  • 412. 396 Chapter Ten: Linear Programming Theorem 10.3.3 In a linear programming problem there are finitely many ex- treme points in the set of feasible solutions. Proof We assume that the linear programming problem is in canon- ical form: maximize: z = CT X subject to: We can further assume here that A is an m x n matrix with rank m. Let X be an extreme point of S, the set of feasi- ble solutions. Then X is a basic feasible solution by 10.3.2. In fact, if the non-zero entries of X are Xjl,..., Xjs, the proof of the theorem shows that the corresponding columns of A, that is, Aj1,..., Aj3, are linearly independent and thus s < m. In addition we have x ji An + ^x jsAjs = B. By 2.2.1 this equation has a unique solution for Xjx,... ,Xjs. Therefore X is uniquely determined by j±,... ,js. Now there are at most (n ) choices for ji,. • • ,js, so the total number of extreme points is at most ^2™=0 (") • The last theorem shows that in order to find an optimal solution of a linear programming problem in canonical form, one can determine the finite set of basic feasible solutions and test the value of the objective function at each one. The sim- plex algorithm provides a practical method for doing this and is discussed in the next section. In conclusion, we present an example of small order which illustrates how the basic feasible solutions can be determined. ( AX = B 1 A>0
• 413. 10.3: Basic Solutions and Extreme Points 397
Example 10.3.1
Consider the linear programming problem

maximize: z = 3x + 2y
subject to:
    2x - y ≤ 6
    2x + y ≤ 10
    x, y ≥ 0

First transform the problem to canonical form by introducing slack variables u and v:

maximize: z = 3x + 2y
subject to:
    2x - y + u = 6
    2x + y + v = 10
    x, y, u, v ≥ 0

The matrix form of the constraints is

    [ 2  -1  1  0 ] ( x  y  u  v )^T  =  ( 6  10 )^T.
    [ 2   1  0  1 ]

The coefficient matrix has rank 2 and each pair of columns is linearly independent. Clearly there are (4 choose 2) = 6 basic solutions, not all of them feasible. In each such solution the two non-basic variables are zero. The basic solutions are listed in the table below:

    x     y     u     v     type          z
    0     0     6    10     feasible       0
    0    10    16     0     feasible      20
    3     0     0     4     feasible       9
    5     0    -4     0     infeasible    15
    4     2     0     0     feasible      16
    0    -6     0    16     infeasible   -12
• 414. 398 Chapter Ten: Linear Programming
There are four basic feasible solutions, i.e., extreme points. The one that produces the largest value of z is x = 0, y = 10, giving z = 20. Thus x = 0, y = 10 is an optimal solution.

Exercises 10.3
In each of the following linear programming problems, transform the problem to canonical form and determine all the basic solutions. Classify these as infeasible or basic feasible, and then find the optimal solutions.

1. maximize: z = 3x - y
   subject to: x + 3y ≤ 6, x - y ≤ 2, x, y ≥ 0

2. maximize: z = 2x + 3y
   subject to: 2x - y ≤ 6, 2x + y ≤ 10, x, y ≥ 0

3. maximize: z = x_1 + x_2 + x_3
   subject to: 2x_1 - x_2 + 4x_3 ≤ 12, 4x_1 + 2x_2 + 5x_3 ≤ 4, x_1, x_2, x_3 ≥ 0

4. A linear programming problem in standard form has m constraints and n variables. Prove that the number of extreme points is at most the sum of the binomial coefficients C(m + n, i) for i = 0, 1, ..., m.
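The enumeration carried out in Example 10.3.1 can be automated. The brute-force sketch below is not part of the text and assumes numpy is installed: for every pair of linearly independent columns of the canonical-form coefficient matrix it solves for the two basic variables, sets the others to zero and classifies the result, reproducing the table of the example.

from itertools import combinations
import numpy as np

# Canonical form of Example 10.3.1, variables in the order (x, y, u, v).
A = np.array([[2.0, -1.0, 1.0, 0.0],
              [2.0,  1.0, 0.0, 1.0]])
B = np.array([6.0, 10.0])
C = np.array([3.0, 2.0, 0.0, 0.0])          # z = 3x + 2y
names = ["x", "y", "u", "v"]

for cols in combinations(range(4), 2):
    idx = list(cols)
    sub = A[:, idx]
    if abs(np.linalg.det(sub)) < 1e-12:      # skip dependent column pairs
        continue
    X = np.zeros(4)
    X[idx] = np.linalg.solve(sub, B)         # basic variables; the rest are 0
    kind = "feasible" if np.all(X >= -1e-12) else "infeasible"
    print([names[j] for j in idx], X, kind, "z =", C @ X)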
  • 415. 10.4: The Simplex Algorithm 3 9 9 10.4 The Simplex Algorithm We are now in a position to describe the simplex algo- rithm, which is a practical method for solving linear program- ming problems, based on the theory developed in the preced- ing sections. The method starts with a basic feasible solution and, by changing one basic variable at a time, seeks to find an optimal solution of the problem. It should be kept in mind that there are finitely many basic feasible solutions. Consider a linear programming problem in standard form with variables x,X2,... ,xn and m constraints: maximize: z = CT X subject to: < ^ x >0 Thus A is an mxn matrix. For the present we will assume that B > 0, which is likely to be true in many applications: just what to do if this condition does not hold will be discussed later. Convert the program to one in canonical form by intro- ducing slack variables xn+i,..., xn+rn: maximize: z = CT X subject to: where A' = (A l m ) , an m x (n + m) matrix. Also I = (ii X2 • • • xn+rn)T and C = (ci C2 ... cn 0 ... 0)T . Notice that A' has rank m since columns n + l,n + 2,...,n + m are linearly independent. J A'X = B 1 *>0
• 416. 400 Chapter Ten: Linear Programming
Recall from 10.3 that the extreme points of the set of feasible solutions are exactly the basic feasible solutions. Also keep in mind that in a basic solution the non-basic variables all have the value 0.

The initial tableau
For the linear programming problem in canonical form above we have the solution

    x_1 = x_2 = ... = x_n = 0,   x_{n+1} = b_1, ..., x_{n+m} = b_m.

Since B ≥ 0, this is a basic feasible solution in which the basic variables are the slack variables x_{n+1}, ..., x_{n+m}. The value of z at this point is 0 since z = c_1 x_1 + ... + c_n x_n. The data are displayed in an array called the initial tableau.

               x_1     x_2    ...   x_n     x_{n+1}  x_{n+2}  ...  x_{n+m}   z
    x_{n+1}   a_{11}  a_{12}  ...  a_{1n}     1        0      ...    0       0     b_1
    x_{n+2}   a_{21}  a_{22}  ...  a_{2n}     0        1      ...    0       0     b_2
      ...
    x_{n+m}   a_{m1}  a_{m2}  ...  a_{mn}     0        0      ...    1       0     b_m
              -c_1    -c_2    ...  -c_n       0        0      ...    0       1      0

Here the rows in the array correspond to the basic variables, which appear on the left, while the columns correspond to all the variables, including z. The bottom row, which lies outside the main array and is called the objective row, displays the coefficients in the equation -c_1 x_1 - ... - c_n x_n + z = 0. The z-column is often omitted since it never changes during the algorithmic process. The right most column displays the current values of the basic variables, with the value of z in the lower right corner.
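As a small illustration (not from the text), the initial tableau just described is easy to assemble with numpy, assumed installed; the z-column is omitted, as it will be in the worked examples that follow, and the data used are those of Example 10.4.1 later in this section.

import numpy as np

def initial_tableau(A, B, C):
    """Initial simplex tableau for: maximize C.x subject to Ax <= B, x >= 0."""
    m, n = A.shape
    T = np.zeros((m + 1, n + m + 1))
    T[:m, :n] = A                 # coefficients of x_1, ..., x_n
    T[:m, n:n + m] = np.eye(m)    # slack-variable columns
    T[:m, -1] = B                 # current values of the basic variables
    T[m, :n] = -C                 # objective row: -c_1, ..., -c_n, then zeros
    return T

A = np.array([[1.0, 1.0, 2.0], [2.0, 3.0, 4.0], [3.0, 3.0, 1.0]])
B = np.array([2.0, 3.0, 4.0])
C = np.array([8.0, 9.0, 5.0])
print(initial_tableau(A, B, C))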
• 417. 10.4: The Simplex Algorithm 401
Entering and departing variables
Consider the initial tableau above. Suppose that all the entries in the objective row are non-negative. Then c_j ≤ 0 and, since z = Σ_{j=1}^{n} c_j x_j = 0, if we change the value of one of the non-basic variables x_1, ..., x_n by making it positive, the value of z will decrease or remain the same. Therefore the value of z cannot be increased from 0 and thus the solution is optimal.

On the other hand, suppose that the objective row contains a negative entry -c_j, so that c_j > 0. Since z = c_1 x_1 + ... + c_n x_n, it may be possible to increase z by increasing x_j; however this must be done in a manner that does not violate any of the constraints. Suppose that the most negative entry in the objective row is -c_j. The question of interest is: by how much can we increase the value of x_j? Since all other non-basic variables equal 0, the ith constraint requires that

    a_{ij} x_j + x_{n+i} = b_i,

so that x_{n+i} = b_i - a_{ij} x_j ≥ 0. Hence

    x_j ≤ b_i / a_{ij}   whenever a_{ij} > 0,

for i = 1, 2, ..., m. Now if a_{ij} ≤ 0, this imposes no restriction on x_j since b_i ≥ 0. Thus if a_{ij} ≤ 0 for all i, then x_j can be increased without limit, so there are no optimal solutions. If a_{ij} > 0 for some i, on the other hand, we must ensure that

    0 ≤ x_j ≤ b_i / a_{ij}.

The number b_i / a_{ij}
• 418. 402 Chapter Ten: Linear Programming
is called a θ-ratio for x_j. Hence the value of x_j cannot be increased by more than the smallest non-negative θ-ratio of x_j; for otherwise one of the constraints will be violated.

Suppose that the smallest non-negative θ-ratio for x_j occurs in the ith row: this is called the pivotal row. One then applies row operations to the tableau, with the aim of making the ith entry of column j equal to 1 and all other entries of the column equal to 0. (This is called pivoting about the (i, j) entry.) The choice of i and j guarantees that no negative entries will appear in the right most column. Replace x_i (the departing variable) by x_j (the entering variable). Now x_j becomes a basic variable with value b_i / a_{ij}, replacing x_i. With this value of x_j, the value of z will increase by b_i c_j / a_{ij}, at least if b_i > 0.

After substituting x_j for x_i in the list of basic variables, we obtain the second tableau. This is treated in the same way as the first tableau, and if it is not optimal, one proceeds to a third tableau. If at some point in the procedure all the entries of the objective row become non-negative, an optimal solution has been reached and the algorithm stops.

Summary of the simplex algorithm
Assume that a linear programming problem is given in standard form

    maximize: z = C^T X
    subject to: AX ≤ B, X ≥ 0,

where B ≥ 0. Then the following procedure is to be applied.
1. Convert the program to canonical form by introducing slack variables. With the slack variables as basic variables, construct the initial tableau.
2. If no negative entries appear in the objective row, the solution is optimal. Stop.
3. Choose the column with the most negative entry in
• 419. 10.4: The Simplex Algorithm 403
the objective row. The variable for this column, say x_j, is the entering variable.
4. If all the entries in column j are negative, then there are no optimal solutions. Stop.
5. Find the row with the smallest non-negative θ-value of x_j. If this corresponds to x_i, then x_i is the departing variable.
6. Pivot about the (i, j) entry, i.e., apply row operations to the tableau to obtain 1 as the (i, j) entry, with all other entries in column j equal to 0.
7. Replace x_i by x_j in the tableau obtained in step 6. This is the new tableau. Return to Step 2.

Example 10.4.1
maximize: z = 8x_1 + 9x_2 + 5x_3
subject to:
    x_1 + x_2 + 2x_3 ≤ 2
    2x_1 + 3x_2 + 4x_3 ≤ 3
    3x_1 + 3x_2 + x_3 ≤ 4
    x_1, x_2, x_3 ≥ 0

Convert the problem to canonical form by introducing slack variables x_4, x_5, x_6:

maximize: z = 8x_1 + 9x_2 + 5x_3
subject to:
    x_1 + x_2 + 2x_3 + x_4 = 2
    2x_1 + 3x_2 + 4x_3 + x_5 = 3
    3x_1 + 3x_2 + x_3 + x_6 = 4
    x_j ≥ 0, j = 1, ..., 6

The initial basic feasible solution is x_1 = x_2 = x_3 = 0, x_4 = 2, x_5 = 3, x_6 = 4, with basic variables x_4, x_5, x_6. The initial tableau is:
• 420. 404 Chapter Ten: Linear Programming

              x_1   *x_2   x_3    x_4    x_5    x_6
    x_4        1      1     2      1      0      0      2
    **x_5      2      3     4      0      1      0      3
    x_6        3      3     1      0      0      1      4
              -8     -9    -5      0      0      0      0

Here the z-column has been suppressed. The initial basic feasible solution x_4 = 2, x_5 = 3, x_6 = 4 is not optimal since there are negative entries in the objective row; the most negative entry occurs in column 2, so x_2 is the entering variable (indicated in the tableau by *). The θ-values for x_2 are 2, 1, 4/3, corresponding to x_4, x_5, x_6. The smallest (non-negative) θ-value is 1, so x_5 is the departing variable (indicated in the tableau by **). Now pivot about the (2, 2) entry to obtain the second tableau.

             *x_1   x_2    x_3    x_4    x_5    x_6
    x_4      1/3     0     2/3     1    -1/3     0      1
    x_2      2/3     1     4/3     0     1/3     0      1
    **x_6     1      0     -3      0     -1      1      1
             -2      0      7      0      3      0      9

The objective row still has a negative entry, so this is not optimal: the entering variable is x_1. The smallest θ-value for x_1 is 1, occurring for x_6, so this is the departing variable. Now pivot about the (3, 1) entry to get the third tableau.

              x_1   x_2    x_3    x_4    x_5    x_6
    x_4        0     0     5/3     1      0    -1/3    2/3
    x_2        0     1    10/3     0      1    -2/3    1/3
    x_1        1     0     -3      0     -1      1      1
               0     0      1      0      1      2     11
  • 421. 10.4: The Simplex Algorithm 405 Since there are no negative entries in the objective row, this tableau is optimal. The optimal solution is therefore x = 1, x2 = §, x3 = 0, giving z = 11. The next example shows how the simplex method can detect a case where there are no optimal solutions. Example 10.4.2 maximize: z = 5xi — 4x2 subject to: £1 — £2 < 2 2xi +x2<2 xi,x2 > 0 Introduce slack variables X3 and X4 and pass to canonical form: maximize: z = 5xi — 4x2 xi - x2 + x3 = 2 subject to: < —2xi + X2 + X4 = 2 Xi,X2,X3,X4 > 0 The initial basic feasible solution is xi = 0 = X2, X3 = 2, X4 = 2, with basic variables X3,X4. The initial tableau is therefore * * X3 X4 *Xi 1 -2 -5 %2 -1 1 4 x3 1 0 0 X4 0 1 0 2 2 0 The entering variable is x and the departing variable X3. The second tableau is: Xi X4 Xi 1 0 0 *x2 -1 -1 -1 X3 1 2 5 X4 0 1 0 2 6 10
  • 422. 406 Chapter Ten: Linear Programming The next entering variable is x2; however all the entries in the X2-column are negative, which means that x2 can be in- creased without limit. Therefore this problem has no optimal solution. Geometrically, what happened here is that the set of fea- sible solutions is the infinite region of the plane lying between the lines x1 - x2 = 2, -2x± + x2 = 2, x± = 0, x2 = 0. In this region z = bxi — 4x2 can take arbitrarily large values. Degeneracy Up to this point we have not taken into account the pos- sibility that the simplex algorithm may fail to terminate: in fact this could happen. To see how it might occur, suppose that at some stage in the simplex algorithm the entering variable has two equal smallest non-negative 9-values. Then after pivoting one of the basic variables will have the value zero, a phenomenon called degeneracy. If in the next tableau the basic variable whose value was 0 is the departing variable, the objective function will not increase in value. This raises the possibility that at some point we might return to this tableau, in which event the simplex algorithm will run forever. In practice the simplex algorithm very seldom fails to ter- minate. In any case there is a simple adjustment to the algo- rithm which avoids the possibility of non-termination. These adjustments involve different choices of entering and departing variables, as indicated below. (i) To select the entering variable, choose the variable with a negative entry in the objective row which has the smallest subscript. (ii) To select the departing variable choose the basic variable with smallest non-negative 8-value and smallest subscript. This procedure is known as Bland's Rule. It can be shown
  • 423. 10.4: The Simplex Algorithm 407 that the simplex method, when combined with Bland's Rule, will always terminate, even if degeneracy occurs. The Two Phase Method The reader may have noticed that our version of the sim- plex algorithm does not work if some constraints have nega- tive numbers on the right side. We consider briefly how this situation can be remedied. Consider a linear program in standard form: rp maximize: z = C X subject to: < ^ x >0 where A is m x n. As usual we introduce slack variables xn +i,..., xn+m to obtain a problem in canonical form: rp maximize: z = C X subject to: where A' = [A | lm]. If some bi is negative, we can multiply that constraint by —1 to get an entry —bi > 0 on the right hand side. The problem now is that we do not have a basic feasible solution — for the obvious solution xn+i = bi is not feasible. What is called for at this point is a general method for finding an initial basic feasible solution for any linear pro- gramming problem in canonical form. Suppose we have a linear programming problem in canon- ical form: rp maximize: z = C X subject to: I x >0 ® where A is m x n. The problem is to find an initial basic feasible solution. Once this is found, the simplex algorithm A'X = B X > 0
  • 424. 408 Chapter Ten: Linear Programming can be run. There is no loss in assuming that B > 0 since we can, if necessary, multiply a constraint by —1. The method is to introduce new variables yi,V2,- • • ,ym called artificial variables. These are used to form the auxiliary program: maximize: z -J>* subject to: < ^ X Y >0 ^ ' If (II) has an optimal solution X, Y with z = 0, then all the yi must equal 0 and thus AX = B. Hence X is a basic feasible solution of (I). On the other hand, if the optimal solution of (II) yields a negative value of 2, there are no feasible solutions of (II) with Y = 0, i.e., there are no feasible solutions of (I). Thus if we can solve the problem (II), we will either find a basic feasible solution of (I) or else conclude that (I) has no feasible solutions. But can we in fact solve the problem (II)? The answer is affirmative: for X = 0, y = b, ..., ym = bm is clearly a basic feasible solution of (II), so it can be used to form the initial tableau for problem (II). After solving (II), either we will have a basic feasible solution of (I) or we will know that no feasible solutions exist. In the former event the simplex algorithm can then be run for problem (I). This is known as the Two Phase Method. We summarize the two phases for solving the linear pro- gramming problem (I). Phase One Apply the simplex method to the auxiliary program (II). If there is no optimal solution or if the optimal solution yields a negative value z, then there are no feasible solutions of (I). Stop. Otherwise a basic feasible solution to problem I is found.
  • 425. 10.4: The Simplex Algorithm 409 Phase Two Starting with the basic feasible solution obtained in Phase One, use the simplex algorithm to find an optimal solution of (I) or show that none exists. In conclusion, the Two Phase Method can be applied to any linear programming problem in canonical form. Example 10.4.3 maximize: z — 2x — ( xi + 2x2 subject to: < 3xi + 6x2 1 Xi>0 2x2 — 3x3 + 2x4 + x3 + x4 + 2x3 = 18 = 24 This problem is given in canonical form. The Two Phase Method will be applied, the first phase being to find a basic feasible solution. To this end we set up the auxiliary problem: maximize: z = -y — 2/2 — J/3 ( Xi + 2x2 + 2X3 + u . , , 1 xi + 2x2 + x3 + x4 + a b j G C t t 0 : 3x, + 6x2 + 2x3 + I Xi,yj > 0 The initial tableau for this problem is: Vi V2 V3 = 12 = 18 = 24 yi V2 2/3 Xi 1 1 3 0 X2 2 2 6 0 X3 X4 2 0 1 1 2 0 0 0 Vi 1 0 0 - 1 2/2 0 1 0 - 1 2/3 0 0 1 - 1 12 18 24 - 5 4 Here the initial basic feasible solution is y = 12, yi = 18, yz = 24, with z = —54. But notice that the entries in the objective
  • 426. 410 Chapter Ten: Linear Programming row corresponding to the basic variables are not 0; this is because z is expressed as —y — y2 — y3. We need to replace 2/1,2/2,2/3 by expressions in x±, x2, x3, x4 and thereby eliminate the offending entries. Note that —y = x + 2^2 + 2x3 — 12, -2/2 = x1 + 2x2 + x3 + x4 - 18 and - y 3 = 3xi + 6x2 + 2x3 - 24. Adding these, we obtain z = -2/i - 2/2 - 2/3 = 5xi + 10£2 + 5x3 + X4 - 54. The next step is to use this expression to form the new objective row: 2/1 2/2 * *2/3 Xi 1 1 3 - 5 *x2 2 2 6 -10 Z 3 2 1 2 - 5 X4 0 1 0 - 1 2/i 1 0 0 0 2/2 0 1 0 0 2/3 0 0 1 0 12 18 24 -54 This is the first tableau for the auxiliary problem. The enter- ing variable is x2 and the departing variable y3. The second tableau is: * * 2 / i 2/2 %2 X i 0 0 1/2 0 X2 0 0 1 0 *x3 4/3 1/3 1/3 - 5 / 3 x4 0 1 0 - 1 2/1 1 0 0 0 2/2 0 1 0 0 2/3 - 1 / 3 - 1 / 3 1/6 5/3 4 10 4 -14 The entering variable is x3 and the departing variable is y. The third tableau is: X3 * * 2/2 Xi xi x2 x3 *x4 j/i y2 2/3 0 0 1 0 3/4 0 -1/4 0 0 0 1 -1/4 0 1/4 1/2 1 0 0 -1/4 0 1/4 0 0 0 - 1 5/4 0 5/4 3 9 3 - 9
  • 427. 10.4: The Simplex Algorithm 411 The entering variable is £4 and the departing variable is y2. The fourth tableau is x3 £ 4 X2 X £ 2 0 0 0 0 1/2 1 0 0 X3 1 0 0 0 £ 4 0 1 0 0 Vi 3/4 -1/4 -1/4 1 V2 0 0 0 0 2/3 -1/4 -1/4 1/4 1 3 9 3 0 This tableau is optimal with z = 0. Hence we have a basic solution of the original problem, x = 0, £2 = 3, £3 = 3, £4 = 9. Now Phase Two begins. To obtain an initial tableau, in the final tableau of Phase 1 delete the columns corresponding to the artificial variables yi,y2,V3- The new basic variables are £3, £4, £2. Replace the objective row by the entries of the original objective function, but retain 0 in the bottom right hand corner: X3 £ 4 X2 XX £ 2 £ 3 X4 0 0 1 0 0 0 0 1 1/2 1 0 0 - 2 2 3 - 2 3 9 3 0 Next eliminate the non-zero entries in the objective row corresponding to the basic variables £3,£4,£2. This is done by adding to the objective row (—2) x row 3, (—3) x row 1 and 2 x row 2. This yields the tableau: X3 £ 4 * * £2 * £ l £2 £3 £4 0 0 1 0 0 0 0 1 1/2 1 0 0 - 3 0 0 0 3 9 3 3
  • 428. 412 Chapter Ten: Linear Programming The entering variable is x and the departing variable is x<i- The next tableau is: %3 X4 Xi Xi 0 0 1 0 x2 0 0 2 6 £ 3 1 0 0 0 £ 4 0 1 0 0 3 9 6 21 This tableau is optimal with solution x± = 6, x2 = 0, x3 = 3, £4 = 9 and 2 = 21. In conclusion we remark that there is one possible situa- tion that the Two Phase Method cannot handle. It could be that in the final tableau of Phase One at least one artificial variable is basic. There is a modification of the Two Phase Method to deal with this possibility. The reader is referred to a text on linear programming such as [12] or [13] for details. Needless to say, we have merely skimmed the surface of linear programming. Recently an improvement on the simplex method known as Kamarkar's algorithm has been discovered. Again the interested reader may consult one of the above ref- erences for details. Exercises 10.4 In the following problems use the simplex method to solve the linear programming problem or show that no optimal so- lution exists. 1. maximize: z = 3x — y (x+ 3y < 6 subject to: < x — y < 2 [x,y> 0
  • 429. 10.4: The Simplex Algorithm 413 6. maximize: z = f subject to: < minimize: z — subject to: < minimize: z = f subject to: < I maximize: z - ( subject to: < I maximize: z = subject to: ( 2xi - x2 1 2xi + 3x2 | 3xi + x2 I Xj > 0 = 2x + 3y 2x- y 2x+ y x,y > 0 -2x + 3y x- y x- 2y x,y > 0 3xi — 2x2 Xi + X 2 < 6 < 10 > - 2 < 4 + 2x3 2xx + x2 + x3 Xj > -- xi + 2x2 3xx + x2 2xx + 4x2 X- 0 + Z 3 - + 2x < 7 < 4 X4 3 -XA - 4x3 i> 0 = xi + x2 + 3x3 - + xz + 2x3 + XA + 4x4 XA < 8 < 6 < 18 < 2 < 4
  • 430. 4 1 4 Chapter Ten: Linear Programming 7. Use the Two Phase Method to solve the following linear programming problem, noting that only one artificial variable is needed. maximize: z = x + 2^2 — Xs subject to: {2a:i + x2+ x3 < 4 xi + x-i + 2x3 — 3 Xj> 0
  • 431. Appendix MATHEMATICAL INDUCTION Mathematical induction is one of the most powerful meth- ods of proof in mathematics and it is used in several places in this book. Since some readers may be unfamiliar with in- duction, and others may feel in need of a review, we present a brief account of it here. The method of proof by induction rests on the following principle. Principle of mathematical induction Let m be an integer and let P(n) be a statement or propo- sition defined for each integer n > m. Assume furthermore that the following hold: (i) P(m) is true; (ii) if P(n — 1) is true, then P(n) is true. Then the conclusion is that P{n) is true for all integers n > m. While this may sound harmless enough, it is in fact an axiom for the integers: it cannot be deduced from the usual arithmetic properties of the integers and its validity must be assumed. We shall give some examples to illustrate the use of this principle. Example A.l If n is any positive integer, prove by mathematical induction that the sum of the first n positive integers equals n{n + 1). Let P(n) denote the statement: 1 + 2 +• • • + n = - n ( n + l ) . 415
• 431. Appendix
MATHEMATICAL INDUCTION

Mathematical induction is one of the most powerful methods of proof in mathematics and it is used in several places in this book. Since some readers may be unfamiliar with induction, and others may feel in need of a review, we present a brief account of it here. The method of proof by induction rests on the following principle.

Principle of mathematical induction
Let m be an integer and let P(n) be a statement or proposition defined for each integer n ≥ m. Assume furthermore that the following hold:
(i) P(m) is true;
(ii) if P(n - 1) is true, then P(n) is true.
Then the conclusion is that P(n) is true for all integers n ≥ m.

While this may sound harmless enough, it is in fact an axiom for the integers: it cannot be deduced from the usual arithmetic properties of the integers and its validity must be assumed. We shall give some examples to illustrate the use of this principle.

Example A.1
If n is any positive integer, prove by mathematical induction that the sum of the first n positive integers equals (1/2) n(n + 1).

Let P(n) denote the statement:

    1 + 2 + ... + n = (1/2) n(n + 1).

415
• 432. 416 Appendix
We have to show that P(n) is true for all integers n ≥ 1. Now clearly P(1) is true: it simply asserts that 1 = (1/2)(1)(2). Suppose that P(n - 1) is true; we must show that P(n) is also true. In order to prove this, we begin with 1 + 2 + ... + (n - 1) = (1/2)(n - 1)n, which is known to be true, and then add n to both sides. This yields

    1 + 2 + ... + (n - 1) + n = (1/2)(n - 1)n + n = (1/2) n(n + 1).

Hence P(n) is true. Therefore by the Principle of Mathematical Induction P(n) is true for all n ≥ 1.

Example A.2
Let n be any positive integer. Prove by mathematical induction that the integer 8^{n+1} + 9^{2n-1} is always divisible by 73.

Let P(n) be the statement: 73 divides 8^{n+1} + 9^{2n-1}. Then we easily verify that P(1) is true. Assume that P(n - 1) is true; thus 8^n + 9^{2n-3} is divisible by 73. We need to show that P(n) is true. The method in this example is to express 8^{n+1} + 9^{2n-1} in terms of 8^n + 9^{2n-3}: thus

    8^{n+1} + 9^{2n-1} = 8(8^n + 9^{2n-3}) + 9^{2n-1} - 8(9^{2n-3})
                       = 8(8^n + 9^{2n-3}) + 9^{2n-3}(9^2 - 8)
                       = 8(8^n + 9^{2n-3}) + 73(9^{2n-3}).

Since P(n - 1) is true, the last integer is divisible by 73. Therefore P(n) is true.

Occasionally, the following alternate form of mathematical induction is useful.

Principle of mathematical induction - alternate form
Let m be an integer and let P(n) be a statement or proposition defined for each integer n ≥ m. Assume furthermore that the following hold:
  • 434. ANSWERS TO THE EXERCISES Exercises 1.1 1- (_3 ""4 J "g)-2. (a)(-l)^-1 ;(b)4z + j - 4 . 3. Six: 0i2,i, 06,2, 04)3, 03i4, 02,6, Oi,^. 4. n should be prime. 5. Diagonal matrices. Exercises 1.2 22 -5 14 14 - 6 1 / 9 A3 =1-4 1 2 3. A is m x n and B is n x m. 4. A6 = I2. 9. True. 12. Numbers of books in library, lent out, lost are 7945, 1790, 265 respectively. 14. The matrix equals 1 5/2 11/2 / 0 1/2 - 3 / 2 ' 5/2 5 -7/2 + -1/2 0 5/2 11/2 -7/2 5 / 3/2 -5/2 0, 18. (a) The inverse is | ( „ " I; (b) not invertible. 21. / 0 1 0" 0 0 1 0 0 0 Exercises 1.3 2. A non-zero matrix need not have an inverse. o <yn2 o n ( n + l ) / 2 418
  • 435. Answers 419 1 0 0 4. A + B= 1 0 0 | , A2 = 1 0 0 0 1 0' AB= | 0 0 1 1 1 1 7. The integer 2 does not have an inverse. Exercises 2.1 1. xi = c/3 + d/3 - 1/3, x2 = 4c/3 - 2d/3 + 11/3, x3 = c, £4 = d. 2. xi = 2c/3 - 5/3, x2 = 2c/3 + 7/3, x3 = c/3 + 2/3, x4 = c . 3. Inconsistent. 4. (a) xi = —c , X2 = c , X3 = 0, X4 = 0; (b) x = X2 = X3 = 0. 5. For t = - 4 or 3. 6. t ^ - 1 / 3 . 7. n(n + l ) / 2 . Exercises 2.2 1. W [ 0 ^ 4/5);(b) (J g "J 2 - 2 ,. l V 5 J ; ( b ) ( ( 1 2 - 3 0 (c) ( 0 1 -11/5 1/5 0 0 0 1 / l 0 7/5 / l 0 7/5 0' 2. (a) /3; (b) 0 1 -11/5 ; (c) 0 1 -11/5 0 0 0 0 / 0 0 0 1 5. n(n + l ) / 2 . 6. n2 . 7. J2 and H j 8. The number of pivots equals n.
  • 436. 420 Answers Exercises 2.3 M.)(i!)(J_3°)(J?)(i?" J (b) (These answers are not unique). 2. 5. n(n + l)/2 and n' 2 6. (a) i ^ 3^ . ( b ) _ i / 3 _ 6 - 3 ; (c) not invertible. 7. t = —3 or 2. 8. Entries on the principal diagonal must be non-zero. Exercises 3.1 1. Odd; -aiia23a38a45a52066a.74087- 2. Even; ai8a25«33«4205ia67076a89a94-
  • 437. Answers 421 3. 19. 4. n(n) - 1. 5. M13 = 11 = A13, M23 = 7 = -A2 3 , M33 = - 6 = A33. 6. 84. 7. (a) -40, (b) -30, (c) -36. / 0 0 1 0 0 1 0 0 0 0 9. 0 0 0 1 0 0 0 0 0 1 Vo 1 0 0 0 / Exercises 3.2 1. (a) 133; (b) 132; (c) -26. 10. u2 = 3, u3 = 14, w4 = 63. Exercises 3.3 -6 -14 2- (a) ^ ( 2 4 ) ; ( b ) - & ( - 1 5 -11 _ 8 j ; / I 0 0 0 - 1 1 0 0 0 - 1 1 0 0 0 - 1 1 (c) 4. (a) xi = 1, x2 = 2, £3 = 3; (b) x = 1, x2 = 0, x3 = - 2 . 7. 2x-3y-z = l. Exercises 4.1 2. (a) No; (b) no; (c) yes; (d) yes. Exercises 4.2 1. (a) No; (b) no; (c) yes. 2. (a) No; (b) yes; (c) yes. 3. Yes. 4. No. Exercises 4.3 1. (a) Linearly independent; (b) linearly independent; (c) linearly dependent. 2. True. 3. True. 4. False. 9. False. 10. No.
  • 438. 422 Answers Exercises 5.1 1. (a) Ex = l/13(9Xi+3X2 -8X3 ), E2 = l/13(-3Xi - X 2 + 7X3), E3 = l/13(-17Xi - 10X2 + I8X3); (b) Ex = -2YX + AY2 - Y3, E2 = AYX - 7Y2 + 2y3, E3 = y i - 3 F 2 + y3. 2. (a) (-2 - 1 1)T ; (b) ( 1 - 1 1 0)T and (-2 1 0 1)T . 3. mn. 6.S = V. 8. - 4 ( - l 1 0)T - 2(-l 0 l ) r . Exercises 5.2 1. (a) Basis of the row space is (1 0 63/2), (0 1 18), basis of the column space is (10)T ,(0 1)T ; (b) basis of the row space is (1 0 5/19 25/19), (0 1 4/19 20/19), basis of the column space is(10 4)T ,(0 1 1)T . 2. (a) 1 + 5z3 /3, x + x3 /3, x2 + x3 ; (b) ( 1/76/7J' (l/7 -1/7 5. vn — r. 6. They are < rank of A and < rank of B. Exercises 5.3 1. The subspaces generated by (1 0)T , (1 1)T , (0 l ) r . 2. dim(t/) = n(n + l)/2 and dim(W) = n(n - l)/2. 5. False. 6. Let U =< fi I i = 1,..., 7 > and W =< fi | i = 4,..., 14 > where fi = xl ~l . 7. dim(U + W)=3, dim(UnW) = l. 8. Basis for U+W 3 , basis for UCW is l — x+x2 . 11. dim(E/i) + -.- + dim({7fc). 12. No. / - l / 3 / c / 3 + d / 3 1 4 y _ [ 11/3 v [ 4c/3-d/3 0 ~ 0 ' c V 0 y V d / 15. n - 1 .
  • 439. Answers 423 Exercises 6.1 1. (a) None of these; (b) bijective; (c) surjective; (d) injective. 4. F-1 (x) = {(x + 5)/2}1 /3 . Exercises 6.2 1. (a) No; (b) yes; (c) no, unless n — 1. i - i - i - i Z1 ° 4. I 2 1 - 1 0 | . 5 . 0 1 - 1 1 6. 7. 8. cos 20 sin 20 sin 20 — cos 20 1/2 1/4 -3/4 0 1/2 -1/2 0 0 1 0 2 -3 0 0 6 - 5 / 6 - 7 - 2 x2 - 3 - 1 12. The statement is true. 9. They have different determinants. Exercises 6.3 1. (a) Basis of kernel is (-1 1 0 0)T , (-1 0 1 0)r , (-1 0 0 1)T , basis of image is 1; (b) basis of kernel is 1, basis of image is 1, x; (c) basis of kernel is (—3 2)T , basis of image is (1 2)T . 3. R6 , R6 and M(2,3, R) are all isomorphic: C6 and P6(C) are isomorphic. 6. True. 10. They are not equivalent for infinitely generated vector spaces. 3. 1/14(1 2 3)T and Exercises 7.1 1. 92.84°. 2. ±l/v/42(4 1 - 5)T 1/VTi. 4. 9/^/26. 5. Vector product = (-14 - 4 8)'", area = 2^69. 8. det(X Y Z) = 0. 9. Dimension = n - 1 or n , according as X ^ 0 or X — 0.
  • 440. 424 Answers 11. t(3-V2 + 3(V2 + l)i (/2-3)(l+i) 4) where i = V=l and t is arbitrary. 13. (X*Y/Y2 )Y. Exercises 7.2 1. (a) No; (b) yes; (c) no. 4. (a) No; (6) no; (c) yes. 8. 23-120:r+110:r2 . 9. 1/105(17 - 190 331)T . Exercises 7.3 2. l/>/2(l 0 - 1 ) T , 1/3(2 1 2)T , l / v l 8 ( l - 4 1)T . 3. 1/^2(0 1 l ) r , 1/3(1 - 2 2)T , l/>/l8(4 1 - 1)T . 4. 1/^7(1 -6x), 75/154(2 + 30x-42a;2 ). 5. (1/2 4 1/2)T . / 0 1/3 6. Q = l/y/2 - 2 / 3 lA/18 I and l/v/2 2/3 V 2 0 1/^2 i?= | 0 3 - 1 / 3 0 0 5/VT8 8. The product of ( " " v * V « + « ) / 3 (f(1 -^3 ).10.Q = 0'a,dfl = iJ'. Exercises 7.4 1. (a) xi = 1, z2 = - 3 / 5 , x3 = -3/5; (b) m = 1631/665, x2 = -88/95, x3 = -66/95. 2. r = -70£/51 + 3610/51. 3. y = - 4 + 7x/2 - x2 /2. 4. xi = 13/35, x2 = -17/70, x3 = 1/70. 5. (-8e-x + 26e~2 ) + (6e_1 - 18e"2 )x. 6. 12(TT2 - 10)/TT3 + 60(-TT2 + 12)x/ir4 + 60(TT2 - 12)X2 /TT5 . Exercises 8.1 1. (a) Eigenvalues —2, 6; eigenvectors t(—5 3)T , t( 1)T ; (b) eigenvalues 1, 2, 3, eigenvectors t ( - l 1 2)T , t(-2 1 4)T , £(—1 1 4)T ; (c) eigenvalues 1, 2, 3, 4, eigenvectors
  • 441. Answers 425 ( 2 - 4 - 1 1)T , ( 0 - 2 0 1)T , (0 0 1 1)T , (0 0 0 1)T . 5. False. 7. (a) ( - J);(b )(l "j "jJ. 8. They should be both zero or both non-zero. 13. Non-zero constants. Exercises 8.2 1. (a) yn = 4- 3n + 1 - 3 • 4n + 1 , zn = - 3 n + 1 + 4n+1 ; (b) yn = l/9(10-7" + 5(-2)"+1 ), zn = 1/9(5- 7n -2(-2)-+1 ). 2. an = l/3(a0 + 2b0 + 2.4n (a0 - &o)), bn = l/3(a0 + 260 + 4n (—ao + bo)) '• if ^0 > bo> species A nourishes, and species B dies out: if ao < bo, the reverse holds. 3. rn = 2/V5{((l + >/5)/2)" - ((1 - y/E)/2)n }. 4. un = (2n+1 + (-l)n )/3. 5. yn = 1 + 2n, zn = 2n. 6. yn = ( ( - l ) n + 1 + l)/2, zn = (38.4"-1 + 3 ( - l ) n - 5)/30. 7. Employed 85.7%, unemployed 14.3%. 8. Equal numbers at each site. 9. Conservatives 24%, liberals 45%, socialists 31%. Exercises 8.3 1. (a) yi = -cie~5 x + 2c2ex , y2 = cxe~5x + c2ex ; (b) yi = aex - c2e5x , y2 = cxex + c2e5x ; (c) 2/i = cxex + c3e3x , 2/2 = -2c2ex , y3 = c2ex + c3e3x . 2. 2/i = e2x (cos x + sin x), y2 = 2e2x cos x. 3. 2/1 = {3c2x — ci)e2x , 2/2 = (—3c2£ + ci + c2)e2x : particular solution 2/i = 6xe2x , y2 — (2 — 6x)e2s . 4. 2/i = cie^ +C2e~x + c3e3a:: -f-C4e~3x , y2 = —cex + 5c2e~x + c3e3x — 5c4e_3x . 5. n/c. 7. (a) 2/1 = —u + u2, y2 = u — 3^2 where ui = cicosh y/2x + disinh [2x and 1 * 2 = c2cosh 2x + d2sinh 2x; (b) 2/1 = —4tii — u2, 2/2 = wi + u2 where u = cicosh x+ disinh x and u2 = C2COsh 2x + d2smh 2x.
  • 442. 426 Answers 8- Vi = (-1 - /2)wi + (-1 + V2)w2, 2/2 = wi + w2 where wi = ci cos ux + di sin ux and w2 = C2 cos vx + d2 sin us with tt = aJ2 + 1/2 and w = a / 2 - / 2 . Exercises 9.1 -1/^/3 2/^6 0' / 1 1 / -1-/V0 */Vv u 1. (a) 1/^2 _} ! ; (b) l/v/3 1/^6 - 1 / ^ 2 V J W3 W6 W2, (c)W2Q j), » = >/=!. 8.r-u=^. Exercises 9.2 1. (a) Positive definite; (b) indefinite; (c) indefinite. 2. Indefinite. 6. (a) Ellipse; (b) parabola. 7. (a) Ellipsoid; (b) hyperboloid (of one sheet). 8. (a) Local minimum at (—4, 2); (b) local maximum at (-1 - y/2, 1 - y/2), local minimum at (1 + ^2, - 1 + ^2), saddle points at (^/2 - 1 , ^ / 2 + 1) and (1 - y/2, - 1 - y/2); (c) local minimum at (—2/5, —1/5,3/10). 9. The smallest and largest values are 5 2 17 and 5 ^17 re- spectively. 11. The spheres have radii 0.768 and 0.434 respectively. Exercises 9.3 1. (a) No; (b) yes; (c) yes. 2. (a) ( _ ° J ) ! 0>) ( ° I ? ) • 3. dim(V') = n2 . 5. 2zi yi + 4x2 y'2. 7. (a) Yes; (b) no. 0 1 0 /1/ 2 0 1/2' 8. I - 1 0 0 ; S= 0 1 1 / 2 0 0 0 / I 0 0 1
Exercises 9.4

1. (a) x - 2; (b) (x - 2)(x - 3); (c) x^2 - 1; (d) (x - 2)^2 (x - 3).
6. (a) (x - 4)(x + 1); (b) (x - 2)^2; (c) (x - 1)^3.
7. A must be similar to the block matrix with diagonal blocks I_r and 0_s, where r + s = n.
8. A must be similar to a block matrix with a block I_r, t blocks of the form (1 1; 0 1) and a block 0_s, where r + 2t + s = n.
10. The matrix with rows (0 0 ··· 0 -a_0), (1 0 ··· 0 -a_1), (0 1 ··· 0 -a_2), ..., (0 0 ··· 1 -a_{n-1}).
11. y_1 = (c_2 x^2 + (c_1 + c_2)x + (c_0 + c_1))e^x, y_2 = c_2 e^x, y_3 = (c_2 x + c_1)e^x.

Exercises 10.1

1. maximize: p = p_1 x_1 + p_2 x_2 + p_3 x_3, subject to: u_1 x_1 + u_2 x_2 + u_3 x_3 ≤ s, v_1 x_1 + v_2 x_2 + v_3 x_3 ≤ t, x_j ≥ 0.
2. minimize: e = px + qy, subject to: a_c x + b_c y ≥ m_c, a_f x + b_f y ≥ m_f, a_p x + b_p y ≥ m_p, x, y ≥ 0.
3. maximize: z = -2x_1 + x_2 + x_3^+ - x_3^- - x_4, subject to:
   -x_1 - 2x_2 - x_3^+ + x_3^- + x_4 ≤ -5,
   3x_1 + x_2 - x_3^+ + x_3^- + x_4 ≤ 4,
   x_1, x_2, x_3^+, x_3^-, x_4 ≥ 0.

4. maximize: z = -2x_1 + x_2 + x_3^+ - x_3^- - x_4, subject to:
   -x_1 - 2x_2 - x_3^+ + x_3^- + x_4 + x_5 = -5,
   3x_1 + x_2 - x_3^+ + x_3^- + x_4 + x_6 = 4,
   x_1, x_2, x_3^+, x_3^-, x_4, x_5, x_6 ≥ 0.

5. (c) z = C^T A^{-1} B.

Exercises 10.2

4. (a) In Exercise 10.2.1 the extreme points are (0, 2), (1/3, 7/3), (0, 3). (b) In Exercise 10.2.2 the extreme points are (0, 0), (3, 0), (5, 1), (6, 0).
5. The optimal solution is x = 0, y = 6.

Exercises 10.3

1. The optimal solution is x = 3, y = 1.
3. The optimal solution is x = 0, y = 10.
4. The optimal solution is x_1 = 0, x_2 = 2, x_3 = 0.

Exercises 10.4

1. x = 3, y = 1.
2. x = 0, y = 10.
3. No optimal solution.
4. x_1 = 0, x_2 = 4, x_3 = 0.
5. x_1 = 0, x_2 = 4/3, x_3 = 1/3, x_4 = 0.
6. x_1 = 0, x_2 = 2/3, x_3 = 26/3, x_4 = 0.
7. x_1 = 0, x_2 = 3, x_3 = 0.
BIBLIOGRAPHY

Abstract Algebra
(1) I.N. Herstein, "Topics in Algebra", 2nd ed., Wiley, New York, 1975.
(2) S. MacLane and G. Birkhoff, "Algebra", 3rd ed., Chelsea, New York, 1988.
(3) D.J.S. Robinson, "An Introduction to Abstract Algebra", De Gruyter, Berlin, 2003.
(4) J.J. Rotman, "A First Course in Abstract Algebra", 2nd ed., Prentice Hall, Upper Saddle River, NJ, 2000.

Linear Algebra
(5) C.W. Curtis, "Linear Algebra, an Introductory Approach", Springer, New York, 1984.
(6) F.R. Gantmacher, "The Theory of Matrices", 2 vols., Chelsea, New York, 1960.
(7) P.R. Halmos, "Finite-Dimensional Vector Spaces", Van Nostrand-Reinhold, Princeton, NJ, 1958.
(8) B. Kolman, "Introductory Linear Algebra with Applications", 5th ed., Macmillan, New York, 1993.
(9) S.J. Leon, "Linear Algebra with Applications", 5th ed., Prentice Hall, Upper Saddle River, NJ, 1998.
(10) G. Strang, "Linear Algebra and its Applications", 3rd ed., Harcourt Brace Jovanovich, San Diego, 1988.

Applied Linear Algebra
(11) R. Bellman, "Introduction to Matrix Analysis", 2nd ed., Society for Industrial and Applied Mathematics, Philadelphia, 1995.
(12) H. Karloff, "Linear Programming", Birkhauser, Boston, 1991.
(13) B. Kolman and R.E. Beck, "Elementary Linear Programming with Applications", Academic Press, San Diego, 1995.
(14) B. Noble and J.W. Daniel, "Applied Linear Algebra", 3rd ed., Prentice-Hall, Englewood Cliffs, NJ, 1988.

Some Related Books of Interest
(15) W.R. Derrick and S.I. Grossman, "Elementary Differential Equations with Applications", 2nd ed., Addison-Wesley, Reading, MA, 1982.
(16) C.H. Edwards and D.E. Penney, "Elementary Differential Equations with Boundary Value Problems", 2nd ed., Prentice-Hall, Englewood Cliffs, NJ, 1989.
(17) J.G. Kemeny and J.L. Snell, "Finite Markov Chains", Springer, New York, 1976.
(18) G.B. Thomas and R.L. Finney, "Calculus and Analytic Geometry", 9th ed., Addison-Wesley, Reading, MA, 1996.
  • 448. Index Addition, of linear operators, 186 of matrices, 6 Adjoint of a matrix, 80 Algebra, 188 of linear operators, 186 of matrices, 188 Angle between two vectors, 196 Artificial variable, 408 Associative law, 12, 25, 155 Augmented matrix, 3 Auxiliary program, 408 Back substitution, 32 Basic solution, 392 Basis, 114 change of, 169 ordered, 120 Bijective function, 153 Bilinear form, 332 matrix representation of, 333 skew-symmetric, 335 symmetric, 335 Bland's rule, 406 Block, Jordan, 355 Canonical form, linear program in 375 Cauchy-Schwartz inequality, 196, 203, 213 Cayley-Hamilton Theorem, 351 Change of basis, 169 and linear transformations, 173 Characteristic equation, 260 Characteristic polynomial, 260 Codomain, 152 Coefficient matrix, 3 Cofactor, 65 Column, echelon form, 49 expansion, 66 operation, 49 space, 126 vector, 4 Commutative law, 12, 25 Companion matrix, 274 Complex, inner product space, 217 scalar product, 206 transpose, 205 Composite of functions, 154 Congruent matrices, 334 eigenvalues of, 339 Conic, 315 Consistent linear system, 34 Constraint, 372 Convex combination, 384 hull, 384 set, 382 Coordinate vector, 120 Coset, 143 Cost matrix, 15 Cramer's rule, 84 Critical point, 324 Crossover diagram, 60 Degeneracy, 406 Departing variable, 402 Determinant, 57 definition of, 64 of a product, 79 properties of, 70 Diagonal matrix, 5 Diagonalizable matrix, 267, 307 Differential equations, 108 system of, 288, 363 Dimension, 117 formulas, 134, 147, 180 Direct sum of subspaces, 137 432
  • 449. Index 433 Distance of a point from a plane, 198 Distributive law, 13, 25 Domain of a function, 152 Echelon form, 36 Eigenspace, 257 Eigenvalue, 257, 266 of hermitian matrix, 304 Eigenvector, 257, 266 Elementary, column operation, 49 matrix, 47 row operation, 41 Entering variable, 402 Equations, linear, 3, 30 homogeneous, 38 Equivalent linear systems, 34 Euclidean space, 88 Even permutation, 60 Expansion, column, 66 row, 66 Extreme point, 386 Theorem, 388 Factorization, QR —, 234 Feasible solution, 373 Fibonacci sequence, 281 Field, axioms of a, 25 of two elements, 26 Finitely generated, 101 Function, 152 Fundamental subspaces, 224 Fundamental Theorem of Algebra, 260 Gaussian elimination, 35 Gauss-Jordan elimination, 37 General solution, 34 Geometry of linear programming, 380 Gram-Schmidt process, 230 Group, 28 general linear, 28 Hermitian matrix, 303 Hessian, 327 Homogeneous, linear differential equation, 108 linear system, 38 Identity, element, 25 function, 153 linear operator, 161 matrix, 4 Image, of a function, 152 of a linear transformation, 178 Inconsistent linear system, 34 Indefinite quadratic form, 320 Infinitely generated, 102 Injective function, 153 Inner product, 209 complex, 217 real, 209 standard, 210 Inner product space, 209 Intersection of subspaces, 133 finding basis of, 139 Inverse, of a function, 155 of a matrix, 17, 53 Inversion of natural order, 60 Invertible, function, 155 matrix, 17 Isomorphic, algebras, 190 vector spaces, 182 Isomorphism, 181, 190 Isomorphism theorems, 184, 192 Jordan, block, 355 normal form, 356, 368 string, 356 Kernel, 178
  • 450. 434 Index Law of Inertia, 340 Laws of, exponents, 22 matrix algebra, 12 Least Squares, Method of, 241 and Q.R-factorization, 248 geometric interpretation of, 250 in inner product spaces, 253 Least squares solution, 243 Length of a vector, 193 Line segment, 88 Linear, combination, 99 dependence, 104 differential equation, 108 independence, 104 mapping, 158 operator, 159 recurrence, 276 Linear programming problems, 370 Linear system, of differential equations, 288 of equations, 3, 30 of recurrences, 278 Linear transformation, 158 matrix representation of, 162, 166 Linearly, dependent, 104 independent, 104 Lower triangular, 5 Markov process, 284 regular, 285 Mathematical induction, 415 Matrices, addition of, 6 congruent, 334 equality of, 2 multiplication of, 7 scalar multiplication of, 60 similar, 175 Matrix, definition of, 1 diagonal, 5 diagonalizable, 267, 307 elementary, 47 hermitian, 303 identity, 4 invertible, 17 non-singular, 17 normal, 310 orthogonal, 235 partitioned, 20 permutation, 62 powers of, 11 scalar, 5 skew-hermitian, 312 skew-symmetric, 12 square, 4 symmetric, 12 triangular, 4 triangularizable, 271 unitary, 238 Maximum, local, 324 Method of Least Squares, 241 Minimum, local, 324 Minimum polynomial, 349 Minor, 65 Monic polynomial, 349 Multiplication of matrices, 7 Negative, of a matrix, 6 of a vector, 95 Negative definite quadratic form, 320 Negative semidefinite, 330 Non-singular, 17 Norm, 212 Normal, form of a matrix, 50 matrix, 310 system, 244 Normed linear space, 214 Null space of a matrix, 99 Objective, function, 372 row, 400 Odd permutation, 60
  • 451. Index 435 One-one, 153 correspondence, 153 Onto, 153 Operation, column, 49 row, 41 Optimal least squares solution, 251 Optimal solution of linear program, 373 Ordered basis, 120 Orthogonal, basis, 228 complement, 218 linear operator, 240 matrix, 235 set, 226 vectors, 196, 203, 211 Orthogonality, in inner product spaces, 211 i n R n , 203 Orthonormal, basis, 228 set, 226 Parallelogram rule, 90 Partitioned matrix, 20 Permutation, 59 matrix, 62 Pivot, 36 Pivotal row, 402 Polynomial, characteristic, 260 minimum, 349 Positive definite quadratic form, 320 Positive semidefinite, 330 Powers of a matrix, 11 negative, 24 Principal axes, 316 Principal diagonal, 4 Product of, determinants, 79 linear operators, 187 matrices, 7 Projection of a vector, on a line, 196 on a subspace, 222 QR-factorization, 234 Quadratic form, 313 indefinite, 320 negative definite, 320 positive definite, 320 Quadric surface, 318 Quotient space, 143 dimension of, 147 Rank of a matrix, 130 Ratio, 6-, 402 Real inner product space, 209 Recurrences, linear, 276 system of, 278 Reduced, column echelon form, 49 echelon form, 37 row echelon form, 37, 44 Reflection, 176 Regular Markov process, 285 Right-handed system, 201 Ring with identity, 12 matrix over, 26 o f n x n matrices, 27 Rotation, 172 Row, echelon form, 41 expansion, 66 operation, 41 space, 126 vector, 3 Row-times-column rule, 8 Saddle point, 324 Scalar, 6 matrix, 5 multiplication, 6, 95 product, 193 projection, 197 triple product 208 Scalar multiple, of a linear operator, 186 of a matrix, 6
  • 452. 436 Index Schur's Theorem, 305 Sign of a permutation, 62 Similar matrices, 175 Simplex algorithm, 399 Singular matrix, 17 Skew-hermitian matrix, 312 Skew-symmetric, bilinear form, 335 matrix, 12 Slack variable, 377 Solution, general, 34 non-trivial, 38 trivial, 38 Solution space, 99 Spectral Theorem, 307 Standard basis, of Pn (R), 118 of Rn , 114 Standard form, linear program in, 374 String, Jordan, 356 Subspace, 97 fundamental, 224 generated by a subset, 100 improper, 97 spanned by a subset, 100 zero, 97 Sum of subspaces, 133 finding a basis for, 139 Surjective function, 153 Sylvester's Law of Inertia, 340 symmetric, bilinear form, 335 matrix, 12 System, of differential equations, 288 of linear equations, 3, 30 of linear recurrences, 278 Tableau, 400 Trace, 263 Transaction, 123 Transition matrix, 284 Transpose, 11 complex, 205 Transposition, 61 Triangle inequality, 204, 215 Triangle rule, 90 Triangular matrix, 4 Triangularizable matrix, 271 Trivial solution, 38 Two Phase Method, 407 Unit vector, 194, 212 Unitary matrix, 238 Upper triangular matrix, 4 Vandermonde determinant, 75 Vector, 95 column, 4 product, 200 projection, 197 row, 3 triple product, 209 Vector space, 87 axioms for, 95 examples of, 87 Weight function, 225 Wronskian, 109 Zero, linear transformation, 161 matrix, 4 subspace, 97 vector, 95
A Course in LINEAR ALGEBRA with Applications
2nd Edition

This book is a comprehensive introduction to linear algebra which presupposes no knowledge on the part of the reader beyond the calculus. It gives a thorough treatment of all the basic concepts, such as vector space, linear transformation and inner product. The book proceeds at a gentle pace, yet provides full proofs. The concept of a quotient space is introduced and is related to solutions of systems of linear equations. Also a simplified treatment of Jordan normal form is given.

Numerous applications of linear algebra are described: these include systems of linear recurrence relations, systems of linear differential equations, Markov processes and the Method of Least Squares. In addition, an entirely new chapter on linear programming introduces the reader to the Simplex Algorithm and stresses understanding the theory on which the algorithm is based.

The book is addressed to students who wish to learn linear algebra, as well as to professionals who need to use the methods of the subject in their own fields.

Derek J.S. Robinson received his Ph.D. degree from Cambridge University. He has held positions at the University of London, the National University of Singapore and the University of Illinois at Urbana-Champaign, where he is currently Professor of Mathematics. He is the author of five books and numerous research articles on the theory of groups and other branches of algebra.

www.worldscientific.com