LYNN H. LOOMIS and SHLOMO STERNBERG
Department of Mathematics, Harvard University
ADVANCED CALCULUS
REVISED EDITION
JONES AND BARTLETT PUBLISHERS
Boston London
Editorial, Sales, and Customer Service Offices:
Jones and Bartlett Publishers, Inc.
One Exeter Plaza
Boston, MA 02116
Jones and Bartlett Publishers International
PO Box 1498
London W6 7RS
England
Copyright © 1990 by Jones and Bartlett Publishers, Inc.
Copyright © 1968 by Addison-Wesley Publishing Company, Inc.
All rights reserved. No part of the material protected by this copyright notice
may be reproduced or utilized in any form, electronic or mechanical, including
photocopying, recording, or by any information storage and retrieval system,
without written permission from the copyright owner.
Printed in the United States of America.
10 9 8 7 6 5 4 3 2
Library of Congress Cataloging-in-Publication Data
Loomis, Lynn H.
Advanced calculus / Lynn H. Loomis and Shlomo Sternberg. -Rev. ed.
p. cm.
Originally published: Reading, Mass. : Addison-Wesley Pub. Co., 1968.
ISBN 0-86720-122-3
1. Calculus. I. Sternberg, Shlomo. II. Title.
QA303.L87 1990
515--dc20
89-15620
CIP
PREFACE
This book is based on an honors course in advanced calculus that we gave in the
1960's. The foundational material, presented in the unstarred sections of Chap-
ters 1 through 11, was normally covered, but different applications of this basic
material were stressed from year to year, and the book therefore contains more
material than was covered in any one year. It can accordingly be used (with
omissions) as a text for a year's course in advanced calculus, or as a text for a
three-semester introduction to analysis.
The prerequisites are a good grounding in the calculus of one variable
from a mathematically rigorous point of view, together with some acquaintance
with linear algebra. The reader should be familiar with limit and continuity type
arguments and have a certain amount of mathematical sophistication. As possi-
ble introductory texts, we mention Differential and Integral Calculus by R. Cou-
rant, Calculus by T. Apostol, Calculus by M. Spivak, and Pure Mathematics by
G. Hardy. The reader should also have some experience with partial derivatives.
In overall plan the book divides roughly into a first half which develops the
calculus (principally the differential calculus) in the setting of normed vector
spaces, and a second half which deals with the calculus of differentiable manifolds.
Vector space calculus is treated in two chapters, the differential calculus in
Chapter 3, and the basic theory of ordinary differential equations in Chapter 6.
The other early chapters are auxiliary. The first two chapters develop the neces-
sary purely algebraic theory of vector spaces, Chapter 4 presents the material
on compactness and completeness needed for the more substantive results of
the calculus, and Chapter 5 contains a brief account of the extra structure en-
countered in scalar product spaces. Chapter 7 is devoted to multilinear (tensor)
algebra and is, in the main, a reference chapter for later use. Chapter 8 deals
with the theory of (Riemann) integration on Euclidean spaces and includes (in
exercise form) the fundamental facts about the Fourier transform. Chapters 9
and 10 develop the differential and integral calculus on manifolds, while Chapter
11 treats the exterior calculus of E. Cartan.
The first eleven chapters form a logical unit, each chapter depending on the
results of the preceding chapters. (Of course, many chapters contain material
that can be omitted on first reading; this is generally found in starred sections.)
On the other hand, Chapters 12, 13, and the latter parts of Chapters 6 and 11
are independent of each other, and are to be regarded as illustrative applications
of the methods developed in the earlier chapters. Presented here are elementary
Sturm-Liouville theory and Fourier series, elementary differential geometry,
potential theory, and classical mechanics. We usually covered only one or two
of these topics in our one-year course.
We have not hesitated to present the same material more than once from
different points of view. For example, although we have selected the contraction
mapping fixed-point theorem as our basic approach to the implicit-function
theorem, we have also outlined a "Newton's method" proof in the text and have
sketched still a third proof in the exercises. Similarly, the calculus of variations
is encountered twice: once in the context of the differential calculus of an
infinite-dimensional vector space and later in the context of classical mechanics.
The notion of a submanifold of a vector space is introduced in the early chapters,
while the invariant definition of a manifold is given later on.
In the introductory treatment of vector space theory, we are more careful
and precise than is customary. In fact, this level of precision of language is not
maintained in the later chapters. Our feeling is that in linear algebra, where the
concepts are so clear and the axioms so familiar, it is pedagogically sound to
illustrate various subtle points, such as distinguishing between spaces that are
normally identified, discussing the naturality of various maps, and so on. Later
on, when overly precise language would be more cumbersome, the reader should
be able to produce for himself a more precise version of any assertions that he
finds to be formulated too loosely. Similarly, the proofs in the first few chapters
are presented in more formal detail. Again, the philosophy is that once the
student has mastered the notion of what constitutes a formal mathematical
proof, it is safe and more convenient to present arguments in the usual mathe-
matical colloquialisms.
While the level of formality decreases, the level of mathematical sophisti-
cation does not. Thus increasingly abstract and sophisticated mathematical
objects are introduced. It has been our experience that Chapter 9 contains the
concepts most difficult for students to absorb, especially the notions of the
tangent space to a manifold and the Lie derivative of various objects with
respect to a vector field.
There are exercises of many different kinds spread throughout the book.
Some are in the nature of routine applications. Others ask the reader to fill in
or extend various proofs of results presented in the text. Sometimes whole
topics, such as the Fourier transform or the residue calculus, are presented in
exercise form. Due to the rather abstract nature of the textual material, the stu-
dent is strongly advised to work out as many of the exercises as he possibly can.
Any enterprise of this nature owes much to many people besides the authors,
but we particularly wish to acknowledge the help of L. Ahlfors, A. Gleason,
R. Kulkarni, R. Rasala, and G. Mackey and the general influence of the book by
Dieudonné. We also wish to thank the staff of Jones and Bartlett for their invaluable
help in preparing this revised edition.
Cambridge, Massachusetts
1968, 1989
L.H.L.
S.S.
CONTENTS
Chapter 0 Introduction
1 Logic: quantifiers 1
2 The logical connectives 3
3 Negations of quantifiers 6
4 Sets 6
5 Restricted variables 8
6 Ordered pairs and relations 9
7 Functions and mappings 10
8 Product sets; index notation 12
9 Composition 14
10 Duality 15
11 The Boolean operations 17
12 Partitions and equivalence relations 19
Chapter 1 Vector Spaces
1 Fundamental notions 21
2 Vector spaces and geometry 36
3 Product spaces and Hom(V, W) 43
4 Affine subspaces and quotient spaces 52
5 Direct sums 56
6 Bilinearity 67
Chapter 2 Finite-Dimensional Vector Spaces
1 Bases 71
2 Dimension 77
3 The dual space 81
4 Matrices 88
5 Trace and determinant 99
6 Matrix computations 102
*7 The diagonalization of a quadratic form 111
Chapter 3 The Differential Calculus
1 Review in ℝ 117
2 Norms 121
3 Continuity 126
4 Equivalent norms 132
5 Infinitesimals 136
6 The differential 140
7 Directional derivatives; the mean-value theorem 146
8 The differential and product spaces 152
9 The differential and ℝⁿ 156
10 Elementary applications 161
11 The implicit-function theorem 164
12 Submanifolds and Lagrange multipliers 172
*13 Functional dependence 175
*14 Uniform continuity and function-valued mappings 179
*15 The calculus of variations 182
*16 The second differential and the classification of critical points 186
*17 The Taylor formula 191
Chapter 4 Compactness and Completeness
1 Metric spaces; open and closed sets 195
*2 Topology 201
3 Sequential convergence 202
4 Sequential compactness 205
5 Compactness and uniformity 210
6 Equicontinuity 215
7 Completeness 216
8 A first look at Banach algebras 223
9 The contraction mapping fixed-point theorem 228
10 The integral of a parametrized arc 236
11 The complex number system 240
*12 Weak methods 245
Chapter 5 Scalar Product Spaces
1 Scalar products 248
2 Orthogonal projection 252
3 Self-adjoint transformations 257
4 Orthogonal transformations 262
5 Compact transformations 264
Chapter 6 Differential Equations
1 The fundamental theorem 266
2 Differentiable dependence on parameters 274
3 The linear equation 276
4 The nth-order linear equation 281
5 Solving the inhomogeneous equation 288
6 The boundary-value problem 294
7 Fourier series 301
Chapter 7 Multilinear Functionals
1 Bilinear functionals 305
2 Multilinear functionals 306
3 Permutations 308
4 The sign of a permutation 309
5 The subspace aₙ of alternating tensors 310
6 The determinant 312
7 The exterior algebra. 316
8 Exterior powers of scalar product spaces 319
9 The star operator 320
Chapter 8 Integration
1 Introduction 321
2 Axioms 322
3 Rectangles and paved sets 324
4 The minimal theory 327
5 The minimal theory (continued) 328
6 Contented sets 331
7 When is a set contented? 333
8 Behavior under linear distortions 335
9 Axioms for integration 336
10 Integration of contented functions 338
11 The change of variables formula 342
12 Successive integration 346
13 Absolutely integrable functions 351
14 Problem set: The Fourier transform 355
Chapter 9 Differentiable Manifolds
1 Atlases 364
2 Functions, convergence 367
3 Differentiable manifolds 369
4 The tangent space 373
5 Flows and vector fields 376
6 Lie derivatives 383
7 Linear differential forms 390
8 Computations with coordinates 393
9 Riemann metrics 397
Chapter 10 The Integral Calculus on Manifolds
1 Compactness 403
2 Partitions of unity 405
3 Densities 408
4 Volume density of a Riemann metric 411
5 Pullback and Lie derivatives of densities 416
6 The divergence theorem 419
7 More complicated domains 424
Chapter 11 Exterior Calculus
1 Exterior differential forms 429
2 Oriented manifolds and the integration of exterior differential forms 433
3 The operator d 438
4 Stokes' theorem 442
5 Some illustrations of Stokes' theorem 449
6 The Lie derivative of a differential form 452
Appendix I. "Vector analysis" 457
Appendix II. Elementary differential geometry of surfaces in E³ 459
Chapter 12 Potential Theory in Eⁿ
1 Solid angle 474
2 Green's formulas . 476
3 The maximum principle 477
4 Green's functions 479
5 The Poisson integral formula 482
6 Consequences of the Poisson integral formula 485
7 Harnack's theorem 487
8 Subharmonic functions 489
9 Dirichlet's problem 491
10 Behavior near the boundary 495
11 Dirichlet's principle 499
12 Physical applications 501
13 Problem set: The calculus of residues 503
Chapter 13 Classical Mechanics
1 The tangent and cotangent bundles 511
2 Equations of variation 513
3 The fundamental linear differential form on T*(M) 515
4 The fundamental exterior two-form on T*(M) 517
5 Hamiltonian mechanics 520
6 The central-force problem 521
7 The two-body problem 528
8 Lagrange's equations 530
9 Variational principles 532
10 Geodesic coordinates 537
11 Euler's equations 541
12 Rigid-body motion 544
13 Small oscillations 551
14 Small oscillations (continued) 553
15 Canonical transformations 558
Selected References 569
Notation Index 572
Index 575
CHAPTER 0
INTRODUCTION
This preliminary chapter contains a short exposition of the set theory that
forms the substratum of mathematical thinking today. It begins with a brief
discussion of logic, so that set theory can be discussed with some precision, and
continues with a review of the way in which mathematical objects can be defined
as sets. The chapter ends with four sections which treat specific set-theoretic
topics.
It is intended that this material be used mainly for reference. Some of it
will be familiar to the reader and some of it will probably be new. We suggest
that he read the chapter through "lightly" at first, and then refer back to it
for details as needed.
1. LOGIC: QUANTIFIERS
A statement is a sentence which is true or false as it stands. Thus '1 < 2' and
'4 + 3 = 5' are, respectively, true and false mathematical statements. Many
sentences occurring in mathematics contain variables and are therefore not true
or false as they stand, but become statements when the variables are given
values. Simple examples are 'x < 4', 'x < y', 'x is an integer', '3x² + y² = 10'.
Such sentences will be called statement frames. If P(x) is a frame containing the
one variable 'x', then P(5) is the statement obtained by replacing 'x' in P(x) by
the numeral '5'. For example, if P(x) is 'x < 4', then P(5) is '5 < 4', P(0)
is '0 < 4', and so on.
Another way to obtain a statement from the frame P(x) is to assert that P(x)
is always true. We do this by prefixing the phrase 'for every x'. Thus, 'for every
x, x < 4' is a false statement, and 'for every x, x² − 1 = (x − 1)(x + 1)' is a
true statement. This prefixing phrase is called a universal quantifier. Syn-
onymous phrases are 'for each x' and 'for all x', and the symbol customarily
used is '(∀x)', which can be read in any of these ways. One frequently presents
sentences containing variables as being always true without explicitly writing
the universal quantifiers. For instance, the associative law for the addition of
numbers is often written
x + (y + z) = (x + y) + z,
where it is understood that the equation is true for all x, y and z. Thus the
actual statement being made is
(∀x)(∀y)(∀z)[x + (y + z) = (x + y) + z].
Finally, we can convert the frame P(x) into a statement by asserting that
it is sometimes true, which we do by writing 'there exists an x such that P(x)'.
This process is called existential quantification. Synonymous prefixing phrases
here are 'there is an x such that', 'for some x', and, symbolically, '(∃x)'.
The statement '(∀x)(x < 4)' still contains the variable 'x', of course, but
'x' is no longer free to be given values, and is now called a bound variable.
Roughly speaking, quantified variables are bound and unquantified variables
are free. The notation 'P(x)' is used only when 'x' is free in the sentence being
discussed.
Now suppose that we have a sentence P(x, y) containing two free variables.
Clearly, we need two quantifiers to obtain a statement from this sentence.
This brings us to a very important observation. If quantifiers of both types are
used, then the order in which they are written affects the meaning of the statement;
(∃y)(∀x)P(x, y) and (∀x)(∃y)P(x, y) say different things. The first says that one y
can be found that works for all x: "there exists a y such that for all x ... ".
The second says that for each x a y can be found that works: "for each x there
exists a y such that ... ". But in the second case, it may very well happen that
when x is changed, the y that can be found will also have to be changed. The
existence of a single y that serves for all x is thus the stronger statement. For
example, it is true that (∀x)(∃y)(x < y) and false that (∃y)(∀x)(x < y). The
reader must be absolutely clear on this point; his whole mathematical future is
at stake. The second statement says that there exists a y, call it y₀, such that
(∀x)(x < y₀), that is, such that every number is less than y₀. This is false;
y₀ + 1, in particular, is not less than y₀. The first statement says that for each x
we can find a corresponding y. And we can: take y = x + 1.
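As an editorial aside (not part of the text), the order-of-quantifiers point can be checked mechanically over a finite domain. Since every finite ordered domain has a largest element, the frame 'x < y' would make both statements false there, so the sketch below uses the frame 'y = x', where the two orderings already come apart.

```python
# Editorial illustration: quantifier order over a small finite domain.
D = [0, 1, 2, 3]

# (for every x)(there exists y)(y = x): each x has its own witness, namely x itself.
forall_exists = all(any(y == x for y in D) for x in D)

# (there exists y)(for every x)(y = x): one fixed y equal to every x at once.
exists_forall = any(all(y == x for x in D) for y in D)

print(forall_exists, exists_forall)  # True False
```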
On the other hand, among a group of quantifiers of the same type the order
does not affect the meaning. Thus '(∀x)(∀y)' and '(∀y)(∀x)' have the same mean-
ing. We often abbreviate such clumps of similar quantifiers by using the quan-
tification symbol only once, as in '(∀x, y)', which can be read 'for every x and y'.
Thus the strictly correct '(∀x)(∀y)(∀z)[x + (y + z) = (x + y) + z]' receives the
slightly more idiomatic rendition '(∀x, y, z)[x + (y + z) = (x + y) + z]'. The
situation is clearly the same for a group of existential quantifiers.
The beginning student generally feels that the prefixing phrases 'for every x
there exists a y such that' and 'there exists a y such that for every x' sound
artificial and are unidiomatic. This is indeed the case, but this awkwardness is the
price that has to be paid for the order of the quantifiers to be fixed, so that the
meaning of the quantified statement is clear and unambiguous. Quantifiers do
occur in ordinary idiomatic discourse, but their idiomatic occurrences often
house ambiguity. The following two sentences are good examples of such
ambiguous idiomatic usage: "Every x is less than some y" and "Some y is greater
than every x". If a poll were taken, it would be found that most men on the
street feel that these two sentences say the same thing, but half will feel that the
common assertion is false and half will think it true! The trouble here is that
the matrix is preceded by one quantifier and followed by another, and the poor
reader doesn't know which to take as the inside, or first applied, quantifier. The
two possible symbolic renditions of our first sentence, '[(∀x)(x < y)](∃y)' and
'(∀x)[(x < y)(∃y)]', are respectively false and true. Mathematicians do use
hanging quantifiers in the interests of more idiomatic writing, but only if they
are sure the reader will understand their order of application, either from the
context or by comparison with standard usage. In general, a hanging quantifier
would probably be read as the inside, or first applied, quantifier, and with this
understanding our two ambiguous sentences become true and false in that order.
After this apology the reader should be able to tolerate the definition of
sequential convergence. It involves three quantifiers and runs as follows: The
sequence {xₙ} converges to x if (∀ε)(∃N)(∀n)(if n > N then |xₙ − x| < ε).
In exactly the same format, we define a function f to be continuous at a if
(∀ε)(∃δ)(∀x)(if |x − a| < δ then |f(x) − f(a)| < ε). We often omit an inside
universal quantifier by displaying the final frame, so that the universal quanti-
fication is understood. Thus we define f to be continuous at a if for every ε
there is a δ such that
if |x − a| < δ, then |f(x) − f(a)| < ε.
We shall study these definitions later. We remark only that it is perfectly
possible to build up an intuitive understanding of what these and similar
quantified statements actually say.
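As a computational aside (ours, not the authors'), the quantifier structure of the convergence definition can be probed empirically: given ε, search for an N that works. A finite loop can check only finitely many n, so the 'horizon' cutoff below is an assumption of the sketch and the result is evidence rather than proof, but it does show how N depends on ε.

```python
# Editorial sketch: hunting for N in the definition of convergence,
# for the sequence x_n = 1/n with limit x = 0.
def find_N(x_seq, x, eps, horizon=10_000):
    for N in range(1, horizon):
        if all(abs(x_seq(n) - x) < eps for n in range(N + 1, horizon + 1)):
            return N
    return None

for eps in (0.1, 0.01, 0.001):
    print(eps, find_N(lambda n: 1 / n, 0, eps))  # N grows as eps shrinks: 10, 100, 1000
```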
2. THE LOGICAL CONNECTIVES
When the word 'and' is inserted between two sentences, the resulting sentence
is true if both constituent sentences are true and is false otherwise. That is, the
"truth value", T or F, of the compound sentence depends only on the truth
values of the constituent sentences. We can thus describe the way 'and' acts in
compounding sentences in the simple "truth table"
P Q P and Q
T T T
T F F
F T F
F F F
where 'P' and 'Q' stand for arbitrary statement frames. Words like 'and' are
called logical connectives. It is often convenient to use symbols for connectives,
and a standard symbol for 'and' is the ampersand '&'. Thus 'P & Q' is read
'P and Q'.
Another logical connective is the word 'or'. Unfortunately, this word is used
ambiguously in ordinary discourse. Sometimes it is used in the exclusive sense,
where 'P or Q' means that one of P and Q is true, but not both, and sometimes
it is used in the inclusive sense that at least one is true, and possibly both are
true. Mathematics cannot tolerate any fundamental ambiguity, and in mathe-
matics 'or' is always used in the latter way. We thus have the truth table
P Q P or Q
T T T
T F T
F T T
F F F
The above two connectives are binary, in the sense that they combine two
sentences to form one new sentence. The word 'not' applies to one sentence and
really shouldn't be considered a connective at all; nevertheless, it is called a
unary connective. A standard symbol for 'not' is '~'. Its truth table is obviously
P ~P
T F
F T
In idiomatic usage the word 'not' is generally buried in the interior of a
sentence. We write 'x is not equal to y' rather than 'not (x is equal to y)'.
However, for the purpose of logical manipulation, the negation sign (the word
'not' or a symbol like '~') precedes the sentence being negated. We shall, of
course, continue to write 'x ~ y', but keep in mind that this is idiomatic for
'not (x = y)' or '~(x = y)'.
We come now to the troublesome 'if ..., then ...' connective, which we
write as either 'if P, then Q' or 'P ⇒ Q'. This is almost always applied in the
universally quantified context (∀x)(P(x) ⇒ Q(x)), and its meaning is best
unraveled by a study of this usage. We consider 'if x < 3, then x < 5' to be a
true sentence. More exactly, it is true for all x, so that the universal quantifi-
cation (∀x)(x < 3 ⇒ x < 5) is a true statement. This conclusion forces us to
agree that, in particular, '2 < 3 ⇒ 2 < 5', '4 < 3 ⇒ 4 < 5', and '6 < 3 ⇒
6 < 5' are all true statements. The truth table for '⇒' thus contains the
values entered below.
P Q P ⇒ Q
T T T
T F
F T T
F F T
On the other hand, we consider 'x < 7 ⇒ x < 5' to be a false sentence, and
therefore have to agree that '6 < 7 ⇒ 6 < 5' is false. Thus the remaining row
in the table above gives the value 'F' for P ⇒ Q.
Combinations of frame variables and logical connectives such as we have
been considering are called truth-functional forms. We can further combine the
elementary forms such as 'P ⇒ Q' and '~P' by connectives to construct com-
posite forms such as '~(P ⇒ Q)' and '(P ⇒ Q) & (Q ⇒ P)'. A sentence has a
given (truth-functional) form if it can be obtained from that form by substitution.
Thus 'x < y or ~(x < y)' has the form 'P or ~P', since it is obtained from this
form by substituting the sentence 'x < y' for the sentence variable 'P'. Com-
posite truth-functional forms have truth tables that can be worked out by
combining the elementary tables. For example, '~(P ⇒ Q)' has the table below,
the truth value for the whole form being in the column under the connective
which is applied last ('~' in this example).
P Q ~(P ⇒ Q)
T T F T
T F T F
F T F T
F F F T
Thus ~(P ⇒ Q) is true only when P is true and Q is false.
A truth-functional form such as 'P or (~P)', which is always true (i.e., has
only 'T' in the final column of its truth table) is called a tautology or a tautologous
form. The reader can check that
(P & (P ⇒ Q)) ⇒ Q and ((P ⇒ Q) & (Q ⇒ R)) ⇒ (P ⇒ R)
are also tautologous. Indeed, any valid principle of reasoning that does not
involve quantifiers must be expressed by a tautologous form.
The 'if and only if' form 'P ⇔ Q', or 'P if and only if Q', or 'P iff Q', is an
abbreviation for '(P ⇒ Q) & (Q ⇒ P)'. Its truth table works out to be
P Q P ⇔ Q
T T T
T F F
F T F
F F T
That is, P ⇔ Q is true if P and Q have the same truth values, and is false
otherwise.
Two truth-functional forms A and B are said to be equivalent if (the final
columns of) their truth tables are the same, and, in view of the table for '⇔',
we see that A and B are equivalent if A ⇔ B is tautologous, and conversely.
Replacing a sentence obtained by substitution in a form A by the equivalent
sentence obtained by the same substitutions in an equivalent form B is a device
much used in logical reasoning. Thus to prove a statement P true, it suffices to
prove the statement ~P false, since 'P' and '~(~P)' are equivalent forms.
Other important equivalences are
~(P or Q) ⇔ (~P) & (~Q),
(P ⇒ Q) ⇔ Q or (~P),
~(P ⇒ Q) ⇔ P & (~Q).
A bit of conventional sloppiness which we shall indulge in for smoother
idiom is the use of 'if' instead of the correct 'if and only if' in definitions. We
define f to be continuous at x if so-and-so, meaning, of course, that f is continuous
at x if and only if so-and-so. This causes no difficulty, since it is clear that 'if
and only if' is meant when a definition is being given.
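Since a truth-functional form has only finitely many rows in its table, facts like the ones in this section can be verified by brute enumeration. The sketch below is an editorial illustration in Python, not part of the text; it checks one of the tautologies and one of the equivalences listed above.

```python
from itertools import product

def implies(p, q):
    # The table for 'P => Q': false only in the row where P is true and Q is false.
    return (not p) or q

# ((P => Q) & (Q => R)) => (P => R) comes out true on all 8 assignments.
assert all(implies(implies(p, q) and implies(q, r), implies(p, r))
           for p, q, r in product([True, False], repeat=3))

# ~(P => Q) has the same final column as P & (~Q).
assert all((not implies(p, q)) == (p and not q)
           for p, q in product([True, False], repeat=2))

print("tautology and equivalence confirmed")
```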
3. NEGATIONS OF QUANTIFIERS
The combinations '~(∀x)' and '(∃x)~' have the same meanings: something is
not always true if and only if it is sometimes false. Similarly, '~(∃y)' and '(∀y)~'
have the same meanings. These equivalences can be applied to move a negation
sign past each quantifier in a string of quantifiers, giving the following important
practical rule:
In taking the negation of a statement beginning with a string of quantifiers,
we simply change each quantifier to the opposite kind and move the negation
sign to the end of the string.
Thus
~(∀x)(∃y)(∀z)P(x, y, z) ⇔ (∃x)(∀y)(∃z)~P(x, y, z).
There are other principles of quantificational reasoning that can be isolated
and which we shall occasionally mention, but none seem worth formalizing here.
4. SETS
It is present-day practice to define every mathematical object as a set of some
kind or other, and we must examine this fundamental notion, however briefly.
A set is a collection of objects that is itself considered an entity. The objects
in the collection are called the elements or members of the set. The symbol for
'is a member of' is '∈' (a sort of capital epsilon), so that 'x ∈ A' is read "x is a
member of A", "x is an element of A", "x belongs to A", or "x is in A".
We use the equals sign '=' in mathematics to mean logical identity; A = B
means that A is B. Now a set A is considered to be the same object as a set B
if and only if A and B have exactly the same members. That is, 'A = B' means
that (∀x)(x ∈ A ⇔ x ∈ B).
We say that a set A is a subset of a set B, or that A is included in B (or that
B is a superset of A) if every element of A is an element of B. The symbol for
inclusion is '⊂'. Thus 'A ⊂ B' means that (∀x)(x ∈ A ⇒ x ∈ B). Clearly,
(A = B) ⇔ (A ⊂ B) and (B ⊂ A).
This is a frequently used way of establishing set identity: we prove that A = B
by proving that A ⊂ B and that B ⊂ A. If the reader thinks about the above
equivalence, he will see that it depends first on the equivalence of the truth-func-
tional forms 'P ⇔ Q' and '(P ⇒ Q) & (Q ⇒ P)', and then on the obvious
quantificational equivalence between '(∀x)(R & S)' and '(∀x)R & (∀x)S'.
We define a set by specifying its members. If the set is finite, the members
can actually be listed, and the notation used is braces surrounding a member-
ship list. For example {1, 4, 7} is the set containing the three numbers 1, 4, 7,
{x} is the unit set of x (the set having only the one object x as a member),
and {x, y} is the pair set of x and y. We can abuse this notation to name some
infinite sets. Thus {2, 4, 6, 8, ...} would certainly be considered the set of all
even positive integers. But infinite sets are generally defined by statement
frames. If P(x) is a frame containing the free variable 'x', then {x : P(x)} is the
set of all x such that P(x) is true. In other words, {x : P(x)} is that set A such
that
y ∈ A ⇔ P(y).
For example, {x : x² < 9} is the set of all real numbers x such that x² < 9,
that is, the open interval (−3, 3), and y ∈ {x : x² < 9} ⇔ y² < 9. A statement
frame P(x) can be thought of as stating a property that an object x may or may
not have, and {x : P(x)} is the set of all objects having that property.
We need the empty set ∅, in much the same way that we need zero in
arithmetic. If P(x) is never true, then {x : P(x)} = ∅. For example,
{x : x ≠ x} = ∅.
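Restricted set formation has a direct computational analogue: a comprehension filters an explicitly given domain by a property. A brief editorial sketch follows (note that the unrestricted {x : P(x)} has no such analogue, since no program can range over all objects):

```python
A = set(range(-5, 6))

# {x in A : x^2 < 9} -- restricted set formation over the explicit domain A.
B = {x for x in A if x ** 2 < 9}
print(B)  # {-2, -1, 0, 1, 2}

# A property that is never true yields the empty set.
print({x for x in A if x != x} == set())  # True
```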
When we said earlier that all mathematical objects are customarily con-
sidered sets, it was taken for granted that the reader understands the distinction
between an object and a name of that object. To be on the safe side, we add a
few words. A chair is not the same thing as the word 'chair', and the number 4
is a mathematical object that is not the same thing as the numeral '4'. The
numeral '4' is a name of the number 4, as also are 'four', '2 + 2', and 'IV'.
According to our present viewpoint, 4 itself is taken to be some specific set.
There is no need in this course to carry logical analysis this far, but some readers
may be interested to know that we usually define 4 as {0, 1, 2, 3}. Similarly,
2 = {0, 1}, 1 = {0}, and 0 is the empty set ∅.
It should be clear from the above discussion and our exposition thus far
that we are using a symbol surrounded by single quotation marks as a name of
that symbol (the symbol itself being a name of something else). Thus ' '4' ' is a
name of '4' (which is itself a name of the number 4). This is strictly correct
usage, but mathematicians almost universally mishandle it. It is accurate to
write: let x be the number; call this number 'x'. However, the latter is almost
always written: call this number x. This imprecision causes no difficulty to the
reading mathematician, and it often saves the printed page from a shower of
quotation marks. There is, however, a potential victim of such ambiguous
treatment of symbols. This is the person who has never realized that mathe-
matics is not about symbols but about objects to which the symbols refer. Since
by now the present reader has safely avoided this pitfall, we can relax and
occasionally omit the strictly necessary quotation marks.
In order to avoid overworking the word 'set', we use many synonyms,
such as 'class', 'collection', 'family' and 'aggregate'. Thus we might say, "Let a
be a family of classes of sets". If a shoe store is a collection of pairs of shoes, then
a chain of shoe stores is such a three-level object.
5. RESTRICTED VARIABLES
A variable used in mathematics is not allowed to take all objects as values; it
can only take as values the members of a certain set, called the domain of the
variable. The domain is sometimes explicitly indicated, but is often only im-
plied. For example, the letter 'n' is customarily used to specify an integer, so
that '(∀n)P(n)' would automatically be read "for every integer n, P(n)". How-
ever, sometimes n is taken to be a positive integer. In case of possible ambiguity
or doubt, we would indicate the restriction explicitly and write '(∀n ∈ ℤ)P(n)',
where 'ℤ' is the standard symbol for the set of all integers. The quantifier is
read, literally, "for all n in ℤ", and more freely, "for every integer n". Similarly,
'(∃n ∈ ℤ)P(n)' is read "there exists an n in ℤ such that P(n)" or "there exists
an integer n such that P(n)". Note that the symbol '∈' is here read as the
preposition 'in'. The above quantifiers are called restricted quantifiers.
In the same way, we have restricted set formation, both implicit and explicit,
as in '{n : P(n)}' and '{n ∈ ℤ : P(n)}', both of which are read "the set of all
integers n such that P(n)".
Restricted variables can be defined as abbreviations of unrestricted variables
by
(∀x ∈ A)P(x) ⇔ (∀x)(x ∈ A ⇒ P(x)),
(∃x ∈ A)P(x) ⇔ (∃x)(x ∈ A & P(x)),
{x ∈ A : P(x)} = {x : x ∈ A & P(x)}.
Although there is never any ambiguity in sentences containing explicitly
restricted variables, it sometimes helps the eye to see the structure of the
sentence if the restricting phrases are written in superscript position, as in
(∀^{ε>0})(∃^{n∈ℤ}). Some restriction was implicit on page 1. If the reader agreed that
(∀x)(x² − 1 = (x − 1)(x + 1)) was true, he probably took x to be a real
number.
6. ORDERED PAIRS AND RELATIONS
Ordered pairs are basic tools, as the reader knows from analytic geometry.
According to our general principle, the ordered pair ⟨a, b⟩ is taken to be a
certain set, but here again we don't care which particular set it is so long as it
guarantees the crucial characterizing property:
⟨x, y⟩ = ⟨a, b⟩ ⇔ x = a and y = b.
Thus ⟨1, 3⟩ ≠ ⟨3, 1⟩.
The notion of a correspondence, or relation, and the special case of a map-
ping, or function, is fundamental to mathematics. A correspondence is a pairing
of objects such that given any two objects x and y, the pair ⟨x, y⟩ either does
or does not correspond. A particular correspondence (relation) is generally
presented by a statement frame P(x, y) having two free variables, with x and y
corresponding if and only if P(x, y) is true. Given any relation (correspondence),
the set of all ordered pairs ⟨x, y⟩ of corresponding elements is called its graph.
Now a relation is a mathematical object, and, as we have said several times,
it is current practice to regard every mathematical object as a set of some sort
or other. Since the graph of a relation is a set (of ordered pairs), it is efficient and
customary to take the graph to be the relation. Thus a relation (correspondence)
is simply a set of ordered pairs. If R is a relation, then we say that x has the
relation R to y, and we write 'xRy', if and only if ⟨x, y⟩ ∈ R. We also say
that x corresponds to y under R. The set of all first elements occurring in the
ordered pairs of a relation R is called the domain of R and is designated dom R
or 𝔇(R). Thus
dom R = {x : (∃y)⟨x, y⟩ ∈ R}.
The set of second elements is called the range of R:
range R = {y : (∃x)⟨x, y⟩ ∈ R}.
The inverse, R⁻¹, of a relation R is the set of ordered pairs obtained by reversing
those of R:
R⁻¹ = {⟨x, y⟩ : ⟨y, x⟩ ∈ R}.
A statement frame P(x, y) having two free variables actually determines a pair
of mutually inverse relations R and S, called the graphs of P, as follows:
R = {⟨x, y⟩ : P(x, y)},  S = {⟨y, x⟩ : P(x, y)}.
A two-variable frame together with a choice of which variable is considered to
be first might be called a directed frame. Then a directed frame would have a
uniquely determined relation for its graph. The relation of strict inequality
on the real number system ℝ would be considered the set {⟨x, y⟩ : x < y},
since the variables in 'x < y' have a natural order.
The set A × B = {⟨x, y⟩ : x ∈ A & y ∈ B} of all ordered pairs with
first element in A and second element in B is called the Cartesian product of the
sets A and B. A relation R is always a subset of dom R × range R. If the two
"factor spaces" are the same, we can use exponential notation: A² = A × A.
The Cartesian product ℝ² = ℝ × ℝ is the "analytic plane". Analytic
geometry rests upon the one-to-one coordinate correspondence between ℝ² and
the Euclidean plane E² (determined by an axis system in the latter), which
enables us to treat geometric questions algebraically and algebraic questions
geometrically. In particular, since a relation between sets of real numbers is a
subset of ℝ², we can "picture" it by the corresponding subset of the Euclidean
plane, or of any model of the Euclidean plane, such as this page. A simple
Cartesian product is shown in Fig. 0.1 (A ∪ B is the union of the sets A and B).
[Fig. 0.1: A × B when A = [1, 2] ∪ [2½, 3] and B = [1, 1½] ∪ {2}]
[Fig. 0.2: The image R[A] of a set A under a relation R]
If R is a relation and A is any set, then the restriction of R to A, R ↾ A,
is the subset of R consisting of those pairs with first element in A:
R ↾ A = {⟨x, y⟩ : ⟨x, y⟩ ∈ R and x ∈ A}.
Thus R ↾ A = R ∩ (A × range R), where C ∩ D is the intersection of the sets
C and D.
If R is a relation and A is any set, then the image of A under R, R[A], is
the set of second elements of ordered pairs in R whose first elements are in A:
R[A] = {y : (∃x)(x ∈ A & ⟨x, y⟩ ∈ R)}.
Thus R[A] = range (R ↾ A), as shown in Fig. 0.2.
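Because a relation is nothing more than a set of ordered pairs, every operation in this section can be computed directly for finite relations. An editorial sketch (not part of the text), with pairs written as Python tuples:

```python
# A finite relation as a set of ordered pairs.
R = {(1, 'a'), (1, 'b'), (2, 'a'), (3, 'c')}

dom_R = {x for (x, y) in R}        # dom R = {1, 2, 3}
range_R = {y for (x, y) in R}      # range R = {'a', 'b', 'c'}
R_inv = {(y, x) for (x, y) in R}   # the inverse relation

def restrict(R, A):
    # The restriction of R to A: pairs of R whose first element lies in A.
    return {(x, y) for (x, y) in R if x in A}

def image(R, A):
    # R[A]: second elements of pairs of R whose first element lies in A.
    return {y for (x, y) in R if x in A}

A = {1, 2}
# R[A] = range (R restricted to A), as in the text:
print(image(R, A) == {y for (x, y) in restrict(R, A)})  # True
```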
7. FUNCTIONS AND MAPPINGS
A function is a relation f such that each domain element x is paired with exactly
one range element y. This property can be expressed as follows:
⟨x, y⟩ ∈ f and ⟨x, z⟩ ∈ f ⇒ y = z.
The y which is thus uniquely determined by f and x is designated f(x):
y = f(x) ⇔ ⟨x, y⟩ ∈ f.
One tends to think of a function as being active and a relation which is not
a function as being passive. A function f acts on an element x in its domain to
give f(x). We take x and apply f to it; indeed we often call a function an operator.
On the other hand, if R is a relation but not a function, then there is in general
no particular y related to an element x in its domain, and the pairing of x and y
is viewed more passively.
We often define a function f by specifying its value f(x) for each x in its
domain, and in this connection a stopped arrow notation is used to indicate the
pairing. Thus x ↦ x² is the function assigning to each number x its square x².
Fig. 0.3
If we want it to be understood that f is this function, we can write "Consider
the function f : x ↦ x²". The domain of f must be understood for this notation
to be meaningful.
If f is a function, then f⁻¹ is of course a relation, but in general it is not a
function. For example, if f is the function x ↦ x², then f⁻¹ contains the pairs
⟨4, 2⟩ and ⟨4, −2⟩ and so is not a function (see Fig. 0.3). If f⁻¹ is a func-
tion, we say that f is one-to-one and that f is a one-to-one correspondence between
its domain and its range. Each x ∈ dom f corresponds to only one y ∈ range f
(f is a function), and each y ∈ range f corresponds to only one x ∈ dom f (f⁻¹ is
a function).
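For finite relations, the definitions of 'function' and 'one-to-one' are directly testable, mirroring the text's x ↦ x² example (editorial sketch, not part of the text):

```python
def is_function(R):
    # Each domain element is paired with exactly one range element.
    return all(y == z for (x, y) in R for (u, z) in R if x == u)

def is_one_to_one(f):
    # f is one-to-one exactly when its relational inverse is a function.
    return is_function({(y, x) for (x, y) in f})

square = {(x, x * x) for x in range(-3, 4)}
print(is_function(square))    # True: it is a function
print(is_one_to_one(square))  # False: its inverse contains both (4, 2) and (4, -2)
```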
The notation
f : A → B
is read "a (the) function f on A into B" or "the function f from A to B". The
notation implies that f is a function, that dom f = A, and that range f ⊂ B.
Many people feel that the very notion of function should include all these
ingredients; that is, a function should be considered an ordered triple ⟨f, A, B⟩,
where f is a function according to our more limited definition, A is the domain
of f, and B is a superset of the range of f, which we shall call the codomain of f in
this context. We shall use the terms 'map', 'mapping', and 'transformation'
for such a triple, so that the notation f : A → B in its totality presents a mapping.
Moreover, when there is no question about which set is the codomain, we shall
often call the function f itself a mapping, since the triple ⟨f, A, B⟩ is then
determined by f. The two arrow notations can be combined, as in: "Define
f : ℝ → ℝ by x ↦ x²".
A mapping f : A → B is said to be injective if f is one-to-one, surjective if
range f = B, and bijective if it is both injective and surjective. A bijective
mapping f : A → B is thus a one-to-one correspondence between its domain A
and its codomain B. Of course, a function is always surjective onto its range R,
and the statement that f is surjective means that R = B, where B is the under-
stood codomain.
8. PRODUCT SETS; INDEX NOTATION
One of the characteristic habits of the modern mathematician is that as soon as
a new kind of object has been defined and discussed a little, he immediately
looks at the set of all such objects. With the notion of a function from A to S
well in hand, we naturally consider the set of all functions from A to S, which we
designate S^A. Thus ℝ^ℝ is the set of all real-valued functions of one real variable,
and S^ℤ⁺ is the set of all infinite sequences in S. (It is understood that an infinite
sequence is nothing but a function whose domain is the set ℤ⁺ of all positive
integers.) Similarly, if we set n̄ = {1, ..., n}, then S^n̄ is the set of all finite
sequences of length n in S.
If B is a subset of S, then its characteristic function (relative to S) is the func-
tion on S, usually designated χ_B, which has the constant value 1 on B and the
constant value 0 off B. The set of all characteristic functions of subsets of S is
thus 2^S (since 2 = {0, 1}). But because this collection of functions is in a
natural one-to-one correspondence with the collection of all subsets of S, χ_B
corresponding to B, we tend to identify the two collections. Thus 2^S is also
interpreted as the set of all subsets of S. We shall spend most of the remainder
of this section discussing further similar definitional ambiguities which mathe-
maticians tolerate.
The ordered triple ⟨x, y, z⟩ is usually defined to be the ordered pair
⟨⟨x, y⟩, z⟩. The reason for this definition is probably that a function of
two variables x and y is ordinarily considered a function of the single ordered
pair variable ⟨x, y⟩, so that, for example, a real-valued function of two real
variables is a subset of (ℝ × ℝ) × ℝ. But we also consider such a function a
subset of Cartesian 3-space ℝ³. Therefore, we define ℝ³ as (ℝ × ℝ) × ℝ;
that is, we define the ordered triple ⟨x, y, z⟩ as ⟨⟨x, y⟩, z⟩.
On the other hand, the ordered triple ⟨x, y, z⟩ could also be regarded as
the finite sequence {⟨1, x⟩, ⟨2, y⟩, ⟨3, z⟩}, which, of course, is a different
object. These two models for an ordered triple serve equally well, and, again,
mathematicians tend to slur over the distinction. We shall have more to say
on this point later when we discuss natural isomorphisms (Section 1.6). For
the moment we shall simply regard ℝ³ and ℝ^3̄ as being the same; an ordered
triple is something which can be "viewed" as being either an ordered pair of
which the first element is an ordered pair or as a sequence of length 3 (or, for that
matter, as an ordered pair of which the second element is an ordered pair).
Similarly, we pretend that Cartesian 4-space ℝ⁴ is ℝ^4̄, ℝ² × ℝ², or
ℝ¹ × ℝ³ = ℝ × ((ℝ × ℝ) × ℝ), etc. Clearly, we are in effect assuming an
associative law for ordered pair formation that we don't really have.
This kind of ambiguity, where we tend to identify two objects that really are
distinct, is a necessary corollary of deciding exactly what things are. It is one
of the prices we pay for the precision of set theory; in days when mathematics
was vaguer, there would have been a single fuzzy notion.
The device of indices, which is used frequently in mathematics, also has am-
biguous implications which we should examine. An indexed collection, as a set,
is nothing but the range set of a function, the indexing function, and a particular
indexed object, say xᵢ, is simply the value of that function at the domain element i.
If the set of indices is I, the indexed set is designated {xᵢ : i ∈ I} or {xᵢ}_{i∈I}
(or {xᵢ}_{i=1}^∞ in case I = ℤ⁺). However, this notation suggests that we view the
indexed set as being obtained by letting the index run through the index set I
and collecting the indexed objects. That is, an indexed set is viewed as being
the set together with the indexing function. This ambivalence is reflected in the
fact that the same notation frequently designates the mapping. Thus we refer
to the sequence {xₙ}_{n=1}^∞, where, of course, the sequence is the mapping n ↦ xₙ.
We believe that if the reader examines his idea of a sequence he will find this
ambiguity present. He means neither just the set nor just the mapping, but the
mapping with emphasis on its range, or the range "together with" the mapping.
But since set theory cannot reflect these nuances in any simple and graceful way,
we shall take an indexed set to be the indexing function. Of course, the same
range object may be repeated with different indices; there is no implication that
an indexing is one-to-one. Note also that indexing imposes no restriction on the
set being indexed; any set can at least be self-indexed (by the identity function).
Except for the ambiguous '{xᵢ : i ∈ I}', there is no universally used notation
for the indexing function. Since xᵢ is the value of the function at i, we might
think of 'xᵢ' as another way of writing 'x(i)', in which case we designate the
function 'x'. We certainly do this in the case of ordered n-tuplets when
we say, "Consider the n-tuplet x = ⟨x₁, ..., xₙ⟩". On the other hand, there
is no compelling reason to use this notation. We can call the indexing function
anything we want; if it is f, then of course f(i) = xᵢ for all i.
We come now to the general definition of Cartesian product. Earlier we
argued (in a special case) that the Cartesian product A × B × C is the set of
all ordered triples x = ⟨x₁, x₂, x₃⟩ such that x₁ ∈ A, x₂ ∈ B, and x₃ ∈ C.
More generally, A₁ × A₂ × ⋯ × Aₙ, or ∏_{i=1}^n Aᵢ, is the set of ordered n-
tuples x = ⟨x₁, ..., xₙ⟩ such that xᵢ ∈ Aᵢ for i = 1, ..., n. If we interpret
an ordered n-tuplet as a function on n̄ = {1, ..., n}, we have
∏_{i=1}^n Aᵢ is the set of all functions x with domain n̄ such that xᵢ ∈ Aᵢ
for all i ∈ n̄.
This rephrasal generalizes almost verbatim to give us the notion of the
Cartesian product of an arbitrary indexed collection of sets.
Definition. The Cartesian product ∏_{i∈I} Sᵢ of the indexed collection of
sets {Sᵢ : i ∈ I} is the set of all functions f with domain the index set I
such that f(i) ∈ Sᵢ for all i ∈ I.
We can also use the notation ∏{Sᵢ : i ∈ I} for the product and fᵢ for the
value f(i).
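For a finite index set this definition can be realized verbatim: an element of the product is a function on I, represented in the editorial sketch below as a dict f with f[i] ∈ Sᵢ.

```python
from itertools import product

def indexed_product(S):
    """All functions f on the index set (the keys of S) with f[i] in S[i]."""
    indices = list(S)
    return [dict(zip(indices, values))
            for values in product(*(S[i] for i in indices))]

S = {'i': {0, 1}, 'j': {'a', 'b'}}
for f in indexed_product(S):
    print(f)  # the four functions {'i': 0, 'j': 'a'}, {'i': 0, 'j': 'b'}, ...
```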
9. COMPOSITION
If we are given maps f : A → B and g : B → C, then the composition of g with f,
g ∘ f, is the map of A into C defined by
(g ∘ f)(x) = g(f(x)) for all x ∈ A.
This is the function of a function operation of elementary calculus. If f and g are
the maps from ℝ to ℝ defined by f(x) = x^(1/3) + 1 and g(x) = x², then f ∘ g(x) =
(x²)^(1/3) + 1 = x^(2/3) + 1, and g ∘ f(x) = (x^(1/3) + 1)² = x^(2/3) + 2x^(1/3) + 1. Note
that the codomain of f must be the domain of g in order for g ∘ f to be defined.
This operation is perhaps the basic binary operation of mathematics.
Lemma. Composition satisfies the associative law:
f ∘ (g ∘ h) = (f ∘ g) ∘ h.
Proof. (f ∘ (g ∘ h))(x) = f((g ∘ h)(x)) = f(g(h(x))) = (f ∘ g)(h(x)) =
((f ∘ g) ∘ h)(x) for all x ∈ dom h. □
If A is a set, the identity map I_A : A → A is the mapping taking every
x ∈ A to itself. Thus I_A = {⟨x, x⟩ : x ∈ A}. If f maps A into B, then clearly
f ∘ I_A = f = I_B ∘ f.
If g : B → A is such that g ∘ f = I_A, then we say that g is a left inverse of f and
that f is a right inverse of g.
Lemma. If the mapping f : A → B has both a right inverse h and a left
inverse g, they must necessarily be equal.
Proof. This is just algebraic juggling and works for any associative operation.
We have
h = I_A ∘ h = (g ∘ f) ∘ h = g ∘ (f ∘ h) = g ∘ I_B = g. □
In this case we call the uniquely determined map g : B → A such that
f ∘ g = I_B and g ∘ f = I_A the inverse of f. We then have:
Theorem. A mapping f : A → B has an inverse if and only if it is bijective,
in which case its inverse is its relational inverse f⁻¹.
Proof. If f is bijective, then the relational inverse f⁻¹ is a function from B to A,
and the equations f ∘ f⁻¹ = I_B and f⁻¹ ∘ f = I_A are obvious. On the other
hand, if f ∘ g = I_B, then f is surjective, since then every y in B can be written
y = f(g(y)). And if g ∘ f = I_A, then f is injective, for then the equation
f(x) = f(y) implies that x = g(f(x)) = g(f(y)) = y. Thus f is bijective if it
has an inverse. □
Now let 𝔅(A) be the set of all bijections f : A → A. Then 𝔅(A) is closed
under the binary operation of composition and
1) f ∘ (g ∘ h) = (f ∘ g) ∘ h for all f, g, h ∈ 𝔅(A);
2) there exists a unique I ∈ 𝔅(A) such that f ∘ I = I ∘ f = f for all f ∈ 𝔅(A);
3) for each f ∈ 𝔅(A) there exists a unique g ∈ 𝔅(A) such that f ∘ g = g ∘ f = I.
Any set G closed under a binary operation having these properties is called
a group with respect to that operation. Thus 𝔅(A) is a group with respect to
composition.
Composition can also be defined for relations as follows. If R ⊂ A × B and
S ⊂ B × C, then S ∘ R ⊂ A × C is defined by
⟨x, z⟩ ∈ S ∘ R ⇔ (∃y ∈ B)(⟨x, y⟩ ∈ R & ⟨y, z⟩ ∈ S).
If R and S are mappings, this definition agrees with our earlier one.
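Both versions of composition are immediately computable on finite sets (editorial sketch, not part of the text); for mappings the relational definition collapses to (g ∘ f)(x) = g(f(x)), as the text notes.

```python
def compose_relations(S, R):
    # S o R: the pair <x, z> belongs whenever some y has <x, y> in R and <y, z> in S.
    return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}

R = {(1, 'a'), (2, 'b')}
S = {('a', 'X'), ('b', 'Y')}
print(compose_relations(S, R))  # {(1, 'X'), (2, 'Y')}

# The text's example with f(x) = x^(1/3) + 1 and g(x) = x^2 (taking x >= 0 here):
f = lambda x: x ** (1 / 3) + 1
g = lambda x: x ** 2
print(g(f(8)))  # (2 + 1)^2 = 9, up to floating-point rounding
```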
10. DUALITY
There is another elementary but important phenomenon called duality which
occurs in practically all branches of mathematics. Let F : A × B → C be any
function of two variables. It is obvious that if x is held fixed, then F(x, y) is a
function of the one variable y. That is, for each fixed x there is a function
hˣ : B → C defined by hˣ(y) = F(x, y). Then x ↦ hˣ is a mapping φ of A into
C^B. Similarly, each y ∈ B yields a function g_y ∈ C^A, where g_y(x) = F(x, y),
and y ↦ g_y is a mapping θ from B to C^A.
Now suppose conversely that we are given a mapping φ : A → C^B. For each
x ∈ A we designate the corresponding value of φ in index notation as hˣ, so
that hˣ is a function from B to C, and we define F : A × B → C by F(x, y) =
hˣ(y). We are now back where we started. Thus the mappings φ : A → C^B,
F : A × B → C, and θ : B → C^A are equivalent, and can be thought of as three
different ways of viewing the same phenomenon. The extreme mappings φ and
θ will be said to be dual to each other.
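This passage describes what is nowadays called currying. A short editorial sketch (not part of the text) of the three equivalent mappings:

```python
def F(x, y):          # F : A x B -> C
    return x + 2 * y

def phi(x):           # phi(x) = h^x, the function y |-> F(x, y)
    return lambda y: F(x, y)

def theta(y):         # theta(y) = g_y, the function x |-> F(x, y)
    return lambda x: F(x, y)

# All three views carry the same information:
print(F(3, 4), phi(3)(4), theta(4)(3))  # 11 11 11
```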
The mapping φ is the indexed family of functions {hˣ : x ∈ A} ⊂ C^B. Now
suppose that ℱ ⊂ C^B is an unindexed collection of functions on B into C, and
define F : ℱ × B → C by F(f, y) = f(y). Then θ : B → C^ℱ is defined by g_y(f) =
f(y). What is happening here is simply that in the expression f(y) we regard both
symbols as variables, so that f(y) is a function on ℱ × B. Then when we hold y
fixed, we have a function on ℱ mapping ℱ into C.
We shall see some important applications of this duality principle as our
subject develops. For example, an m × n matrix is a function t = {tᵢⱼ} in
ℝ^(m̄×n̄). We picture the matrix as a rectangular array of numbers, where 'i' is
the row index and 'j' is the column index, so that tᵢⱼ is the number at the inter-
section of the ith row and the jth column. If we hold i fixed, we get the n-tuple
forming the ith row, and the matrix can therefore be interpreted as an m-tuple
of row n-tuples. Similarly (dually), it can be viewed as an n-tuple of column
m-tuples.
In the same vein, an n-tuple ⟨f₁, ..., fₙ⟩ of functions from A to B can
be regarded as a single n-tuple-valued function from A to Bⁿ.
In a somewhat different application, duality will allow us to regard a finite-
dimensional vector space V as being its own second conjugate space (V*)*.
It is instructive to look at elementary Euclidean geometry from this point
of view. Today we regard a straight line as being a set of geometric points.
An older and more neutral view is to take points and lines as being two different
kinds of primitive objects. Accordingly, let A be the set of all points (so that A
is the Euclidean plane as we now view it), and let B be the set of all straight lines.
Let F be the incidence function: F(p, l) = 1 if p and l are incident (p is "on" l,
l is "on" p) and F(p, l) = 0 otherwise. Thus F maps A × B into {0, 1}. Then
for each l ∈ B the function g_l(p) = F(p, l) is the characteristic function of the
set of points that we think of as being the line l (g_l(p) has the value 1 if p is on l
and 0 if p is not on l). Thus each line determines the set of points that are on it.
But, dually, each point p determines the set of lines l "on" it, through its char-
acteristic function hᵖ(l). Thus, in complete duality we can regard a line as being
a set of points and a point as being a set of lines. This duality aspect of geometry
is basic in projective geometry.
It is sometimes awkward to invent new notation for the "partial" function
obtained by holding a variable fixed in a function of several variables, as we did
above when we set g_y(x) = F(x, y), and there is another device that is frequently
useful in this situation. This is to put a dot in the position of the "varying
variable". Thus F(a, ·) is the function of one variable obtained from F(x, y)
by holding x fixed at the value a, so that in our beginning discussion of duality
we have
hˣ = F(x, ·),  g_y = F(·, y).
If f is a function of one variable, we can then write f = f(·), and so express the
above equations also as hˣ(·) = F(x, ·), g_y(·) = F(·, y). The flaw in this notation
is that we can't indicate substitution without losing meaning. Thus the value
of the function F(x, ·) at b is F(x, b), but from this evaluation we cannot read
backward and tell what function was evaluated. We are therefore forced to
some such cumbersome notation as F(x, ·)|b, which can get out of hand. Never-
theless, the dot device is often helpful when it can be used without evaluation
difficulties. In addition to eliminating the need for temporary notation, as
mentioned above, it can also be used, in situations where it is strictly speaking
superfluous, to direct the eye at once to the position of the variable.
For example, later on D_ξF will designate the directional derivative of the
function F in the (fixed) direction ξ. This is a function whose value at a is
D_ξF(a), and the notation D_ξF(·) makes this implicitly understood fact explicit.
11. THE BOOLEAN OPERATIONS
Let S be a fixed domain, and let ℱ be a family of subsets of S. The union of ℱ,
or the union of all the sets in ℱ, is the set of all elements belonging to at least one
set in ℱ. We designate the union ∪ℱ or ∪_{A∈ℱ} A, and thus we have
∪ℱ = {x : (∃A ∈ ℱ)(x ∈ A)},  y ∈ ∪ℱ ⇔ (∃A ∈ ℱ)(y ∈ A).
We often consider the family ℱ to be indexed. That is, we assume given a set I
(the set of indices) and a surjective mapping i ↦ Aᵢ from I to ℱ, so that ℱ =
{Aᵢ : i ∈ I}. Then the union of the indexed collection is designated ∪_{i∈I} Aᵢ or
∪{Aᵢ : i ∈ I}. The device of indices has both technical and psychological
advantages, and we shall generally use it.
If ℱ is finite, and either it or the index set is listed, then a different notation
is used for its union. If ℱ = {A, B}, we designate the union A ∪ B, a notation
that displays the listed names. Note that here we have x ∈ A ∪ B ⇔ x ∈ A or
x ∈ B. If ℱ = {Aᵢ : i = 1, ..., n}, we generally write 'A₁ ∪ A₂ ∪ ⋯ ∪ Aₙ'
or '∪_{i=1}^n Aᵢ' for ∪ℱ.
The intersection of the indexed family {Aᵢ}_{i∈I}, designated ∩_{i∈I} Aᵢ, is the
set of all points that lie in every Aᵢ. Thus
x ∈ ∩_{i∈I} Aᵢ ⇔ (∀i ∈ I)(x ∈ Aᵢ).
For an unindexed family ℱ we use the notation ∩ℱ or ∩_{A∈ℱ} A, and if ℱ =
{A, B}, then ∩ℱ = A ∩ B.
The complement, A′, of a subset A of S is the set of elements x ∈ S not in
A: A′ = {x ∈ S : x ∉ A}. The law of De Morgan states that the complement of
an intersection is the union of the complements:
(∩_{i∈I} Aᵢ)′ = ∪_{i∈I} (Aᵢ′).
This is an immediate consequence of the rule for negating quantifiers. It is the
equivalence between 'not always in' and 'sometimes not in': [~(∀i)(x ∈ Aᵢ) ⇔
(∃i)(x ∉ Aᵢ)] says exactly that
x ∈ (∩_{i∈I} Aᵢ)′ ⇔ x ∈ ∪_{i∈I} (Aᵢ′).
If we set Bᵢ = Aᵢ′ and take complements again, we obtain the dual form:
(∪_{i∈I} Bᵢ)′ = ∩_{i∈I} (Bᵢ′).
Other principles of quantification yield the laws
B ∩ (∪_{i∈I} Aᵢ) = ∪_{i∈I} (B ∩ Aᵢ)  from  P & (∃x)Q(x) ⇔ (∃x)(P & Q(x)),
and likewise
B ∪ (∩_{i∈I} Aᵢ) = ∩_{i∈I} (B ∪ Aᵢ),
B ∩ (∩_{i∈I} Aᵢ) = ∩_{i∈I} (B ∩ Aᵢ),
B ∪ (∪_{i∈I} Aᵢ) = ∪_{i∈I} (B ∪ Aᵢ).
In the case of two sets, these laws imply the following familiar laws of set algebra:
(A ∪ B)′ = A′ ∩ B′,  (A ∩ B)′ = A′ ∪ B′  (De Morgan),
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C),
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
Even here, thinking in terms of indices makes the laws more intuitive. Thus
(A₁ ∩ A₂)′ = A₁′ ∪ A₂′
is obvious when thought of as the equivalence between 'not always in' and
'sometimes not in'.
The family ℱ is disjoint if distinct sets in ℱ have no elements in common, i.e.,
if (∀X, Y ∈ ℱ)(X ≠ Y ⇒ X ∩ Y = ∅). For an indexed family {Aᵢ}_{i∈I} the
condition becomes i ≠ j ⇒ Aᵢ ∩ Aⱼ = ∅. If ℱ = {A, B}, we simply say that
A and B are disjoint.
Given f : U → V and an indexed family {Bᵢ} of subsets of V, we have the
following important identities:
f⁻¹[∪_{i} Bᵢ] = ∪_{i} f⁻¹[Bᵢ],
f⁻¹[∩_{i} Bᵢ] = ∩_{i} f⁻¹[Bᵢ],
and, for a single set B ⊂ V,
f⁻¹[B′] = (f⁻¹[B])′.
For example,
x ∈ f⁻¹[∩_{i} Bᵢ] ⇔ f(x) ∈ ∩_{i} Bᵢ ⇔ (∀i)(f(x) ∈ Bᵢ)
⇔ (∀i)(x ∈ f⁻¹[Bᵢ]) ⇔ x ∈ ∩_{i} f⁻¹[Bᵢ].
The first, but not the other two, of the three identities above remains valid
when f is replaced by any relation R. It follows from the commutative law
(∃x)(∃y)A ⇔ (∃y)(∃x)A. The second identity fails for a general R because
'(∃x)(∀y)' and '(∀y)(∃x)' have different meanings.
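All three identities can be confirmed on a finite example (editorial sketch, not part of the text):

```python
U = set(range(10))
V = {0, 1, 2, 3}
f = lambda x: x % 4          # a surjection f : U -> V

def preimage(B):
    return {x for x in U if f(x) in B}   # f^(-1)[B]

B1, B2 = {0, 1}, {1, 2}
assert preimage(B1 | B2) == preimage(B1) | preimage(B2)  # unions
assert preimage(B1 & B2) == preimage(B1) & preimage(B2)  # intersections
assert preimage(V - B1) == U - preimage(B1)              # complements
print("all three identities hold for this f")
```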
12. PARTITIONS AND EQUIVALENCE RELATIONS
A partition of a set A is a disjoint family ℱ of sets whose union is A. We call the
elements of ℱ 'fibers', and we say that ℱ fibers A or is a fibering of A. For example,
the set of straight lines parallel to a given line in the Euclidean plane is a fibering
of the plane. If 'x̄' designates the unique fiber containing the point x, then
x ↦ x̄ is a surjective mapping π : A → ℱ which we call the projection of A on ℱ.
Passing from a set A to a fibering ℱ of A is one of the principal ways of forming
new mathematical objects.
Any function f automatically fibers its domain into sets on which f is con-
stant. If A is the Euclidean plane and f(p) is the x-coordinate of the point p in
some coordinate system, then f is constant on each vertical line; more exactly,
f⁻¹(x) is a vertical line for every x in ℝ. Moreover, x ↦ f⁻¹(x) is a bijection
from ℝ to the set of all fibers (vertical lines). In general, if f : A → B is any sur-
jective mapping, and if for each value y in B we set
A_y = f⁻¹(y) = {x ∈ A : f(x) = y},
then ℱ = {A_y : y ∈ B} is a fibering of A and φ : y ↦ A_y is a bijection from
B to ℱ. Also φ ∘ f is the projection π : A → ℱ, since φ ∘ f(x) = φ(f(x)) is the
set x̄ of all z in A such that f(z) = f(x).
The above process of generating a fibering of A from a function on A is
relatively trivial. A more important way of obtaining a fibering of A is from
an equality-like relation on A called an equivalence relation. An equivalence
relation ~ on A is a binary relation which is reflexive (x ~ x for every x ∈ A),
symmetric (x ~ y ⇒ y ~ x), and transitive (x ~ y and y ~ z ⇒ x ~ z). Every
fibering ℱ of A generates a relation ~ by the stipulation that x ~ y if and only if
x and y are in the same fiber, and obviously ~ is an equivalence relation. The
most important fact to be established in this section is the converse.
Theorem. Every equivalence relation ~ on A is the equivalence relation
of a fibering.
Proof. We obviously have to define x̄ as the set of elements y equivalent to x,
x̄ = {y : y ~ x}, and our problem is to show that the family ℱ of all subsets of A
obtained this way is a fibering.
The reflexive, symmetric, and transitive laws become

x ∈ x̄,   x ∈ ȳ ⇒ y ∈ x̄,   and   x ∈ ȳ and y ∈ z̄ ⇒ x ∈ z̄.

Reflexivity thus implies that ℱ covers A. Transitivity says that if y ∈ z̄, then
x ∈ ȳ ⇒ x ∈ z̄; that is, if y ∈ z̄, then ȳ ⊂ z̄. But also, if y ∈ z̄, then z ∈ ȳ by
symmetry, and so z̄ ⊂ ȳ. Thus y ∈ z̄ implies ȳ = z̄. Therefore, if two of our
sets ā and b̄ have a point x in common, then ā = x̄ = b̄. In other words, if ā is
not the set b̄, then ā and b̄ are disjoint, and we have a fibering. □
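The construction in this proof can be carried out mechanically for a finite set. The Python sketch below is an editorial illustration (the relation x ~ y iff x ≡ y (mod 4) is an arbitrary sample): it builds the fibers x̄ and checks that they cover A and are pairwise identical or disjoint.

    # From an equivalence relation on a finite set A, form the fibers
    # x̄ = {y : y ~ x} and check that they constitute a fibering of A.
    A = range(12)
    def equiv(x, y):                 # sample relation: x ~ y iff x ≡ y (mod 4)
        return (x - y) % 4 == 0

    fibers = {frozenset(y for y in A if equiv(y, x)) for x in A}

    assert set().union(*fibers) == set(A)        # the fibers cover A
    for F in fibers:
        for G in fibers:
            assert F == G or not (F & G)         # identical or disjoint
    print(sorted(sorted(F) for F in fibers))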
The fundamental role this argument plays in mathematics is due to the fact
that in many important situations equivalence relations occur as the primary
object, and then are used to define partitions and functions. We give two
examples.
Let ℤ be the integers (positive, negative, and zero). A fraction 'm/n' can
be considered an ordered pair ⟨m, n⟩ of integers with n ≠ 0. The set of all
fractions is thus ℤ × (ℤ − {0}). Two fractions ⟨m, n⟩ and ⟨p, q⟩ are
"equal" if and only if mq = np, and this equality is checked to be an equivalence
relation. The equivalence class of ⟨m, n⟩ is the object taken to be the rational
number m/n. Thus the rational number system ℚ is the set of fibers in a par-
tition of ℤ × (ℤ − {0}).
Next, we choose a fixed integer p ∈ ℤ and define a relation E on ℤ by
m E n ⇔ p divides m − n. Then E is an equivalence relation, and the set ℤ_p of
its equivalence classes is called the integers modulo p. It is easy to see that m E n
if and only if m and n have the same remainder when divided by p, so that in
this case there is an easily calculated function f, where f(m) is the remainder
after dividing m by p, which defines the fibering. The set of possible remainders
is {0, 1, ..., p − 1}, so that ℤ_p contains p elements.
A function on a set A can be "factored" through a fibering of A by the
following theorem.
Theorem. Let g be a function on A, and let ℱ be a fibering of A. Then g
is constant on each fiber of ℱ if and only if there exists a function ḡ on ℱ
such that g = ḡ ∘ π.
Proof. If g is constant on each fiber of ℱ, then the association of this unique
value with the fiber defines the function ḡ, and clearly g = ḡ ∘ π. The converse
is obvious. □
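As a small editorial illustration of the last two points at once, the sketch below uses the remainder function as the projection π: ℤ → ℤ_p and factors a sample function g that is constant on each residue class as g = ḡ ∘ π.

    # π(m) = m mod p projects the integers onto Z_p, and any g constant
    # on each residue class factors through it as g = ḡ ∘ π.
    p = 5
    def proj(m):                       # the projection π onto Z_p
        return m % p

    def g(m):                          # constant on fibers: depends on m mod p
        return (m % p) ** 2

    gbar = {r: g(r) for r in range(p)}      # ḡ: one value per fiber
    assert all(g(m) == gbar[proj(m)] for m in range(-50, 50))
    print(gbar)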
CHAPTER 1
VECTOR SPACES
The calculus of functions of more than one variable unites the calculus of one
variable, which the reader presumably knows, with the theory of vector spaces,
and the adequacy of its treatment depends directly on the extent to which vector
space theory really is used. The theories of differential equations and differential
geometry are similarly based on a mixture of calculus and vector space theory.
Such "vector calculus" and its applications constitute the subject matter of this
book, and in order for our treatment to be completely satisfactory, we shall
have to spend considerable time at the beginning studying vector spaces them-
selves. This we do principally in the first two chapters. The present chapter is
devoted to general vector spaces and the next chapter to finite-dimensional
spaces.
We begin this chapter by introducing the basic concepts of the subject-
vector spaces, vector subspaces, linear combinations, and linear transforma-
tions-and then relate these notions to the lines and planes of geometry. Next
we establish the most elementary formal properties of linear transformations and
Cartesian product vector spaces, and take a brief look at quotient vector spaces.
This brings us to our first major objective, the study of direct sum decomposi-
tions, which we undertake in the fifth section. The chapter concludes with a
preliminary examination of bilinearity.
1. FUNDAMENTAL NOTIONS
Vector spaces and subspaces. The reader probably has already had some
contact with the notion of a vector space. Most beginning calculus texts discuss
geometric vectors, which are represented by "arrows" drawn from a chosen
origin O. These vectors are added geometrically by the parallelogram rule:
the sum of the vector OA (represented by the arrow from O to A) and the
vector OB is the vector OP, where P is the vertex opposite O in the parallelogram
having OA and OB as two sides (Fig. 1.1). Vectors can also be multiplied by
numbers: x(OA) is that vector OB such that B is on the line through O and
A, the distance from O to B is |x| times the distance from O to A, and B and A
are on the same side of O if x is positive, and on opposite sides if x is negative
(Fig. 1.2).
[Figs. 1.1, 1.2, and 1.3: the parallelogram rule for addition, multiplication of a vector by a number (OB = xOA), and the parallelepiped figure for the associative law.]

These two vector operations satisfy certain laws of algebra,
which we shall soon state in the definition. The geometric proofs of these laws
are generally sketchy, consisting more of plausibility arguments than of airtight
logic. For example, the geometric figure in Fig. 1.3 is the essence of the usual
proof that vector addition is associative. In each case the final vector OX is
represented by the diagonal starting from O in the parallelepiped constructed
from the three edges OA, OB, and OC. The set of all geometric vectors, together
with these two operations and the laws of algebra that they satisfy, constitutes
one example of a vector space. We shall return to this situation in Section 2.
The reader may also have seen coordinate triples treated as vectors. In this
system a three-dimensional vector is an ordered triple of numbers ⟨x₁, x₂, x₃⟩
which we think of geometrically as the coordinates of a point in space. Addition
is now algebraically defined,

⟨x₁, x₂, x₃⟩ + ⟨y₁, y₂, y₃⟩ = ⟨x₁ + y₁, x₂ + y₂, x₃ + y₃⟩,

as is multiplication by numbers, t⟨x₁, x₂, x₃⟩ = ⟨tx₁, tx₂, tx₃⟩. The
vector laws are much easier to prove for these objects, since they are almost
algebraic formalities. The set ℝ³ of all ordered triples of numbers, together with
these two operations, is a second example of a vector space.
If we think of an ordered triple ⟨x₁, x₂, x₃⟩ as a function x with domain
the set of integers from 1 to 3, where xᵢ is the value of the function x at i (see
Section 0.8), then this vector space suggests a general type called a function
space, which we shall examine after the definition. For the moment we remark
only that we defined the sum of the triple x and the triple y as that triple z
such that zᵢ = xᵢ + yᵢ for every i.
A vector space, then, is a collection of objects that can be added to each
other and multiplied by numbers, subject to certain laws of algebra. In this
context a number is often called a scalar.
Definition. Let V be a set, and let there be given a mapping ⟨α, β⟩ ↦
α + β from V × V to V, called addition, and a mapping ⟨x, α⟩ ↦ xα
from ℝ × V to V, called multiplication by scalars. Then V is a vector space
with respect to these two operations if:

A1. α + (β + γ) = (α + β) + γ for all α, β, γ ∈ V.
A2. α + β = β + α for all α, β ∈ V.
A3. There exists an element 0 ∈ V such that α + 0 = α for all α ∈ V.
A4. For every α ∈ V there exists a β ∈ V such that α + β = 0.
S1. (xy)α = x(yα) for all x, y ∈ ℝ, α ∈ V.
S2. (x + y)α = xα + yα for all x, y ∈ ℝ, α ∈ V.
S3. x(α + β) = xα + xβ for all x ∈ ℝ, α, β ∈ V.
S4. 1α = α for all α ∈ V.
In contexts where it is clear (as it generally is) which operations are intended,
we refer simply to the vector space V.
Certain further properties of a vector space follow directly from the axioms.
Thus the zero element postulated in A3 is unique, and for each α the β of A4
is unique, and is called −α. Also 0α = 0, x0 = 0, and (−1)α = −α. These
elementary consequences are considered in the exercises.
Our standard example of a vector space will be the set V = ℝ^A of all real-
valued functions on a set A under the natural operations of addition of two
functions and multiplication of a function by a number. This generalizes the
example ℝ^{1,2,3} = ℝ³ that we looked at above. Remember that a function f
in ℝ^A is simply a mathematical object of a certain kind. We are saying that two
of these objects can be added together in a natural way to form a third such
object, and that the set of all such objects then satisfies the above laws for
addition. Of course, f + g is defined as the function whose value at a is f(a) +
g(a), so that (f + g)(a) = f(a) + g(a) for all a in A. For example, in ℝ³ we
defined the sum x + y as that triple whose value at i is xᵢ + yᵢ for all i. Similarly,
cf is the function defined by (cf)(a) = c(f(a)) for all a. Laws A1 through S4
follow at once from these definitions and the corresponding laws of algebra for
the real number system. For example, the equation (s + t)f = sf + tf means
that ((s + t)f)(a) = (sf + tf)(a) for all a ∈ A. But

((s + t)f)(a) = (s + t)(f(a)) = s(f(a)) + t(f(a))
= (sf)(a) + (tf)(a) = (sf + tf)(a),

where we have used the definition of scalar multiplication in ℝ^A, the distributive
law in ℝ, the definition of scalar multiplication in ℝ^A, and the definition of
addition in ℝ^A, in that order. Thus we have S2, and the other laws follow
similarly.
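These pointwise definitions can be spot-checked numerically. The following Python sketch is an editorial aside (A is a small arbitrary set): it encodes elements of ℝ^A as dictionaries and verifies an instance of law S2.

    # Functions on a set A, with pointwise operations, form a vector space.
    # Here A is a small finite set and we spot-check S2: (s + t)f = sf + tf.
    A = ["p", "q", "r"]

    def add(f, g):
        return {a: f[a] + g[a] for a in A}    # (f + g)(a) = f(a) + g(a)

    def scale(c, f):
        return {a: c * f[a] for a in A}       # (cf)(a) = c(f(a))

    f = {"p": 1.0, "q": -2.0, "r": 0.5}
    s, t = 3.0, -1.5
    assert scale(s + t, f) == add(scale(s, f), scale(t, f))   # law S2
    print(scale(s + t, f))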
The set A can be anything at all. If A = ℝ, then V = ℝ^ℝ is the vector
space of all real-valued functions of one real variable. If A = ℝ × ℝ, then
V = ℝ^{ℝ×ℝ} is the space of all real-valued functions of two real variables. If
A = {1, 2} = 2̄, then V = ℝ^2̄ = ℝ² is the Cartesian plane, and if A =
{1, ..., n} = n̄, then V = ℝ^n̄ = ℝⁿ is Cartesian n-space. If A contains a single
point, then ℝ^A is a natural bijective image of ℝ itself, and of course ℝ is trivially
a vector space with respect to its own operations.
Now let V be any vector space, and suppose that W is a nonempty subset of
V that is closed under the operations of V. That is, if α and β are in W, then so
is α + β, and if α is in W, then so is xα for every scalar x. For example, let V be
the vector space ℝ^{[a,b]} of all real-valued functions on the closed interval [a, b] ⊂ ℝ,
and let W be the set C([a, b]) of all continuous real-valued functions on [a, b].
Then W is a subset of V that is closed under the operations of V, since f + g
and cf are continuous whenever f and g are. Or let V be Cartesian 2-space ℝ²,
and let W be the set of ordered pairs x = ⟨x₁, x₂⟩ such that x₁ + x₂ = 0.
Clearly, W is closed under the operations of V.
Such a subset W is always a vector space in its own right. The universally
quantified laws A1, A2, and S1 through S4 hold in W because they hold in the
larger set V. And since there is some β in W, it follows that 0 = 0β is in W
because W is closed under multiplication by scalars. For the same reason, if α
is in W, then so is −α = (−1)α. Therefore, A3 and A4 also hold, and we see
that W is a vector space. We have proved the following lemma.
Lemma. If W is a nonempty subset of a vector space V which is closed
under the operations of V, then W is itself a vector space.
We call W a subspace of V. Thus C([a, b]) is a subspace of ℝ^{[a,b]}, and the
pairs ⟨x₁, x₂⟩ such that x₁ + x₂ = 0 form a subspace of ℝ². Subspaces will
be with us from now to the end.
A subspace of a vector space IRA is called a function space. In other words, a
function space is a collection of real-valued functions on a common domain
which is closed under addition and multiplication by scalars.
What we have defined so far ought to be called the notion of a real vector
space or a vector space over ℝ. There is an analogous notion of a complex vector
space, for which the scalars are the complex numbers. Then laws S1 through S4
refer to multiplication by complex numbers, and the space ℂ^A of all complex-
valued functions on A is the standard example. In fact, if the reader knew what
is meant by a field F, we could give a single general definition of a vector space
over F, where scalar multiplication is by the elements of F, and the standard
example is the space V = FA of all functions from A to F. Throughout this
book it will be understood that a vector space is a real vector space unless explic-
itly stated otherwise. However, much of the analysis holds as well for complex
vector spaces, and most of the pure algebra is valid for any scalar field F.
EXERCISES
1.1 Sketch the geometric figure representing law S3,

x(OA + OB) = x(OA) + x(OB),

for geometric vectors. Assume that x > 1.
1.2 Prove S3 for ℝ³ using the explicit displayed form ⟨x₁, x₂, x₃⟩ for ordered triples.
1.3 The vector 0 postulated in A3 is unique, as elementary algebraic fiddling will
show. For suppose that 0′ also satisfies A3. Then

0′ = 0′ + 0   (A3 for 0)
   = 0 + 0′   (A2)
   = 0        (A3 for 0′).

Show by similar algebraic juggling that, given α, the β postulated in A4 is unique.
This unique β is designated −α.
1.4 Prove similarly that 0α = 0, x0 = 0, and (−1)α = −α.
1.5 Prove that if xα = 0, then either x = 0 or α = 0.
1.6 Prove S1 for a function space ℝ^A. Prove S3.
1.7 Given that α is any vector in a vector space V, show that the set {xα : x ∈ ℝ}
of all scalar multiples of α is a subspace of V.
1.8 Given that α and β are any two vectors in V, show that the set of all vectors
xα + yβ, where x and y are any real numbers, is a subspace of V.
1.9 Show that the set of triples x in ℝ³ such that x₁ − x₂ + 2x₃ = 0 is a subspace
M. If N is the similar subspace {x : x₁ + x₂ + x₃ = 0}, find a nonzero vector α in
M ∩ N. Show that M ∩ N is the set {xα : x ∈ ℝ} of all scalar multiples of α.
1.10 Let A be the open interval (0, 1), and let V be ℝ^A. Given a point x in (0, 1),
let V_x be the set of functions in V that have a derivative at x. Show that V_x is a sub-
space of V.
1.11 For any subsets A and B of a vector space V we define the set sum A + B by
A + B = {α + β : α ∈ A and β ∈ B}. Show that (A + B) + C = A + (B + C).
1.12 If A ⊂ V and X ⊂ ℝ, we similarly define XA = {xα : x ∈ X and α ∈ A}.
Show that a nonvoid set A is a subspace if and only if A + A = A and ℝA = A.
1.13 Let V be ℝ², and let M be the line through the origin with slope k. Let x be
any nonzero vector in M. Show that M is the subspace ℝx = {tx : t ∈ ℝ}.
1.14 Show that any other line L with the same slope k is of the form M + a for some a.
1.15 Let M be a subspace of a vector space V, and let α and β be any two vectors in V.
Given A = α + M and B = β + M, show that either A = B or A ∩ B = ∅.
Show also that A + B = (α + β) + M.
1.16 State more carefully and prove what is meant by "a subspace of a subspace is
a subspace".
1.17 Prove that the intersection of two subspaces of a vector space is always itself
a subspace.
1.18 Prove more generally that the intersection W = ∩_{i∈I} Wᵢ of any family
{Wᵢ : i ∈ I} of subspaces of V is a subspace of V.
1.19 Let V again be ℝ^{(0,1)}, and let W be the set of all functions f in V such that f′(x)
exists for every x in (0, 1). Show that W is the intersection of the collection of subspaces
of the form V_x that were considered in Exercise 1.10.
1.20 Let V be a function space ℝ^A, and for a point a in A let W_a be the set of functions
such that f(a) = 0. W_a is clearly a subspace. For a subset B ⊂ A let W_B be the set
of functions f in V such that f = 0 on B. Show that W_B is the intersection ∩_{a∈B} W_a.
1.21 Supposing again that X and Y are subspaces of V, show that if X + Y = V and
X ∩ Y = {0}, then for every vector ζ in V there is a unique pair of vectors ξ ∈ X
and η ∈ Y such that ζ = ξ + η.
1.22 Show that if X and Y are subspaces of a vector space V, then the union X ∪ Y
can only be a subspace if either X ⊂ Y or Y ⊂ X.
Linear combinations and linear span. Because of the commutative and associ-
ative laws for vector addition, the sum of a finite set of vectors is the same for all
possible ways of adding them. For example, the sum of the three vectors
α_a, α_b, α_c can be calculated in 12 ways, all of which give the same result.
Therefore, if I = {a, b, c} is the set of indices used, the notation Σ_{i∈I} αᵢ,
which indicates the sum without telling us how we got it, is unambiguous. In
general, for any finite indexed set of vectors {αᵢ : i ∈ I} there is a uniquely
determined sum vector Σ_{i∈I} αᵢ which we can compute by ordering and group-
ing the αᵢ's in any way.
The index set I is often a block of integers n̄ = {1, ..., n}. In this case
the vectors αᵢ form an n-tuple {αᵢ}₁ⁿ, and unless directed to do otherwise we
would add them in their natural order and write the sum as Σ_{i=1}^n αᵢ. Note
that the way they are grouped is still left arbitrary.
Frequently, however, we have to use indexed sets that are not ordered.
For example, the general polynomial of degree at most 5 in the two variables
's' and 't' is

Σ_{i+j≤5} a_{ij} sⁱtʲ,

and the finite set of monomials {sⁱtʲ}_{i+j≤5} has no natural order.
*The formal proof that the sum of a finite collection of vectors is indepen-
dent of how we add them is by induction. We give it only for the interested
reader.
In order to avoid looking silly, we begin the induction with two vectors,
in which case the commutative law α_a + α_b = α_b + α_a displays the identity of
all possible sums. Suppose then that the assertion is true for index sets having
fewer than n elements, and consider a collection {αᵢ : i ∈ I} having n members.
Let β and γ be the sum of these vectors computed in two ways. In the com-
putation of β there was a last addition performed, so that β = (Σ_{i∈J₁} αᵢ) +
(Σ_{i∈J₂} αᵢ), where {J₁, J₂} partitions I and where we can write these two
partial sums without showing how they were formed, since by our inductive
hypothesis all possible ways of adding them give the same result.
Similarly, γ = (Σ_{i∈K₁} αᵢ) + (Σ_{i∈K₂} αᵢ). Now set

L_{jk} = Jⱼ ∩ Kₖ   and   ξ_{jk} = Σ_{i∈L_{jk}} αᵢ,

where it is understood that ξ_{jk} = 0 if L_{jk} is empty (see Exercise 1.37). Then
Σ_{i∈J₁} αᵢ = ξ₁₁ + ξ₁₂ by the inductive hypothesis, and similarly for the other
three sums. Thus

β = (ξ₁₁ + ξ₁₂) + (ξ₂₁ + ξ₂₂) = (ξ₁₁ + ξ₂₁) + (ξ₁₂ + ξ₂₂) = γ,

which completes our proof.*
A vector β is called a linear combination of a subset A of the vector space V
if β is a finite sum Σ xᵢαᵢ, where the vectors αᵢ are all in A and the scalars xᵢ
are arbitrary. Thus, if A is the subset {tⁿ}₀^∞ ⊂ ℝ^ℝ of all "monomials", then a
function f is a linear combination of the functions in A if and only if f is a
polynomial function f(t) = Σ₀ⁿ cᵢtⁱ. If A is finite, it is often useful to take the
indexed set {αᵢ} to be the whole of A, and to simply use a 0-coefficient for any
vector missing from the sum. Thus, if A is the subset {sin t, cos t, eᵗ} of ℝ^ℝ,
then we can consider A an ordered triple in the listed ordering, and the function
3 sin t − eᵗ = 3·sin t + 0·cos t + (−1)eᵗ is the linear combination of the
triple A having the coefficient triple ⟨3, 0, −1⟩.
Consider now the set L of all linear combinations of the two vectors
⟨1, 1, 1⟩ and ⟨0, 1, −1⟩ in ℝ³. It is the set of all vectors s⟨1, 1, 1⟩ +
t⟨0, 1, −1⟩ = ⟨s, s + t, s − t⟩, where s and t are any real numbers. Thus
L = {⟨s, s + t, s − t⟩ : ⟨s, t⟩ ∈ ℝ²}. It will be clear on inspection that
L is closed under addition and scalar multiplication, and therefore is a subspace
of ℝ³. Also, L contains each of the two given vectors, with coefficient pairs
⟨1, 0⟩ and ⟨0, 1⟩, respectively. Finally, any subspace M of ℝ³ which
contains each of the two given vectors will also contain all of their linear combi-
nations, and so will include L. That is, L is the smallest subspace of ℝ³ containing
⟨1, 1, 1⟩ and ⟨0, 1, −1⟩. It is called the linear span of the two vectors, or the
subspace generated by the two vectors. In general, we have the following theorem.
Theorem 1.1. If A is a nonempty subset of a vector space V, then the set
L(A) of all linear combinations of the vectors in A is a subspace, and it is
the smallest subspace of V which includes the set A.
Proof. Suppose first that A is finite. We can assume that we have indexed A
in some way, so that A = {αᵢ : i ∈ I} for some finite index set I, and every
element of L(A) is of the form Σ_{i∈I} xᵢαᵢ. Then we have

(Σ xᵢαᵢ) + (Σ yᵢαᵢ) = Σ (xᵢ + yᵢ)αᵢ

because the left-hand side becomes Σᵢ (xᵢαᵢ + yᵢαᵢ) when it is regrouped by
pairs, and then S2 gives the right-hand side. We also have

c(Σ xᵢαᵢ) = Σ (cxᵢ)αᵢ

by S3 and mathematical induction. Thus L(A) is closed under addition and
multiplication by scalars and hence is a subspace. Moreover, L(A) contains
each αᵢ (why?) and so includes A. Finally, if a subspace W includes A, then it
contains each linear combination Σ xᵢαᵢ, so it includes L(A). Therefore, L(A)
can be directly characterized as the uniquely determined smallest subspace
which includes the set A.
If A is infinite, we obviously can't use a single finite listing. However, the
sum (Σ₁ⁿ xᵢαᵢ) + (Σ₁ᵐ yⱼβⱼ) of two linear combinations of elements of A is
clearly a finite sum of scalars times elements of A. If we wish, we can rewrite it
as Σ₁ⁿ⁺ᵐ xᵢαᵢ, where we have set βⱼ = α_{n+j} and yⱼ = x_{n+j} for j = 1, ..., m.
In any case, L(A) is again closed under addition and multiplication by scalars
and so is a subspace. □
We call L(A) the linear span of A. If L(A) = V, we say that A spans V;
V is finite-dimensional if it has a finite spanning set.
If V = ℝ³, and if δ¹, δ², and δ³ are the "unit points on the axes", δ¹ =
⟨1, 0, 0⟩, δ² = ⟨0, 1, 0⟩, and δ³ = ⟨0, 0, 1⟩, then {δⁱ}₁³ spans V, since
x = ⟨x₁, x₂, x₃⟩ = ⟨x₁, 0, 0⟩ + ⟨0, x₂, 0⟩ + ⟨0, 0, x₃⟩ = x₁δ¹ +
x₂δ² + x₃δ³ = Σ₁³ xᵢδⁱ for every x in ℝ³. More generally, if V = ℝⁿ and δʲ is
the n-tuple having 1 in the jth place and 0 elsewhere, then we have similarly that
x = ⟨x₁, ..., xₙ⟩ = Σ_{i=1}^n xᵢδⁱ, so that {δⁱ}₁ⁿ spans ℝⁿ. Thus ℝⁿ is finite-
dimensional. In general, a function space on an infinite set A will not be finite-
dimensional. For example, it is true but not obvious that C([a, b]) has no finite
spanning set.
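As an editorial illustration of linear span, the sketch below uses NumPy least squares to test whether a given vector of ℝ³ lies in the span L of ⟨1, 1, 1⟩ and ⟨0, 1, −1⟩ considered above; the tolerance and the sample vectors are arbitrary choices.

    import numpy as np

    # x lies in L = span{a, b} exactly when s·a + t·b = x has a solution,
    # i.e. when the least-squares residual vanishes.
    a = np.array([1.0, 1.0, 1.0])
    b = np.array([0.0, 1.0, -1.0])
    A = np.column_stack([a, b])

    def in_span(x, tol=1e-10):
        coeffs, *_ = np.linalg.lstsq(A, x, rcond=None)
        return np.allclose(A @ coeffs, x, atol=tol), coeffs

    print(in_span(np.array([2.0, 5.0, -1.0])))   # = 2a + 3b, so in L
    print(in_span(np.array([1.0, 0.0, 0.0])))    # not in L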
EXERCISES
1.23 Given α = ⟨1, 1, 1⟩, β = ⟨0, 1, −1⟩, γ = ⟨2, 0, 1⟩, compute the linear
combinations α + β + γ, 3α − 2β + γ, xα + yβ + zγ. Find x, y, and z such that
xα + yβ + zγ = ⟨0, 0, 1⟩ = δ³. Do the same for δ¹ and δ².
1.24 Given α = ⟨1, 1, 1⟩, β = ⟨0, 1, −1⟩, γ = ⟨1, 0, 2⟩, show that each of
α, β, γ is a linear combination of the other two. Show that it is impossible to find
coefficients x, y, and z such that xα + yβ + zγ = δ¹.
1.25 a) Find the linear combination of the set A = ⟨t, t² − 1, t² + 1⟩ with coeffi-
cient triple ⟨2, −1, 1⟩. Do the same for ⟨0, 1, 1⟩.
b) Find the coefficient triple for which the linear combination of the triple A
is (t + 1)². Do the same for 1.
c) Show in fact that any polynomial of degree ≤ 2 is a linear combination of A.
1.26 Find the linear combination f of {eᵗ, e⁻ᵗ} ⊂ ℝ^ℝ such that f(0) = 1 and f′(0) = 2.
1.27 Find a linear combination f of sin x, cos x, and eˣ such that f(0) = 0, f′(0) = 1,
and f″(0) = 1.
1.28 Suppose that a sin x + b cos x + ceˣ is the zero function. Prove that a = b =
c = 0.
1.29 Prove that ⟨1, 1⟩ and ⟨1, 2⟩ span ℝ².
1.30 Show that the subspace M = {x : x₁ + x₂ = 0} ⊂ ℝ² is spanned by one vector.
1.31 Let M be the subspace {x : x₁ − x₂ + 2x₃ = 0} in ℝ³. Find two vectors α
and β in M neither of which is a scalar multiple of the other. Then show that M is
the linear span of α and β.
1.32 Find the intersection of the linear span of ⟨1, 1, 1⟩ and ⟨0, 1, −1⟩ in ℝ³
with the coordinate subspace x₂ = 0. Exhibit this intersection as a linear span.
1.33 Do the above exercise with the coordinate space replaced by
M = {x : x₁ + x₂ = 0}.
1.34 By Theorem 1.1 the linear span L(A) of an arbitrary subset A of a vector space
V has the following two properties:
i) L(A) is a subspace of V which includes A;
ii) if M is any subspace which includes A, then L(A) ⊂ M.
Using only (i) and (ii), show that
a) A ⊂ B ⇒ L(A) ⊂ L(B);
b) L(L(A)) = L(A).
1.35 Show that
a) if M and N are subspaces of V, then so is M + N;
b) for any subsets A, B ⊂ V, L(A ∪ B) = L(A) + L(B).
1.36 Remembering (Exercise 1.18) that the intersection of any family of subspaces
is a subspace, show that the linear span L(A) of a subset A of a vector space V is the
intersection of all the subspaces of V that include A. This alternative characterization
is sometimes taken as the definition of linear span.
1.37 By convention, the sum of an empty set of vectors is taken to be the zero vector.
This is necessary if Theorem 1.1 is to be strictly correct. Why? What about the
preceding problem?
Linear transformations. The general function space ℝ^A and the subspace
C([a, b]) of ℝ^{[a,b]} both have the property that in addition to being closed under
the vector operations, they are also closed under the operation of multiplication
of two functions. That is, the pointwise product of two functions is again a
function [(fg)(a) = f(a)g(a)], and the product of two continuous functions is
continuous. With respect to these three operations, addition, multiplication,
and scalar multiplication, ℝ^A and C([a, b]) are examples of algebras. If the reader
noticed this extra operation, he may have wondered why, at least in the context
of function spaces, we bother with the notion of vector space. Why not study
all three operations? The answer is that the vector operations are exactly the
operations that are "preserved" by many of the most important mappings of
sets of functions. For example, define T: C([a, b]) → ℝ by T(f) = ∫ₐᵇ f(t) dt.
Then the laws of the integral calculus say that T(f + g) = T(f) + T(g) and
T(cf) = cT(f). Thus T "preserves" the vector operations. Or we can say that T
"commutes" with the vector operations, since plus followed by T equals T
followed by plus. However, T does not preserve multiplication: it is not true in
general that T(fg) = T(f)T(g).
Another example is the mapping T: x ↦ y from ℝ³ to ℝ² defined by
y₁ = 2x₁ − x₂ + x₃, y₂ = x₁ + 3x₂ − 5x₃, for which we can again verify
that T(x + y) = T(x) + T(y) and T(cx) = cT(x). The theory of the solvability
of systems of linear equations is essentially the theory of such mappings T; thus
we have another important type of mapping that preserves the vector operations
(but not products).
These remarks suggest that we study vector spaces in part so that we can
study mappings which preserve the vector operations. Such mappings are
called linear transformations.
Definition. If V and W are vector spaces, then a mapping T: V → W is a
linear transformation or a linear map if T(α + β) = T(α) + T(β) for all
α, β ∈ V, and T(xα) = xT(α) for all α ∈ V, x ∈ ℝ.
These two conditions on T can be combined into the single equation
T(xα + yβ) = xT(α) + yT(β) for all α, β ∈ V and all x, y ∈ ℝ.
Moreover, this equation can be extended to any finite sum by induction, so
that if T is linear, then

T(Σ xᵢαᵢ) = Σ xᵢT(αᵢ)

for any linear combination Σ xᵢαᵢ. For example, ∫ₐᵇ (Σ₁ⁿ cᵢfᵢ) = Σ₁ⁿ cᵢ ∫ₐᵇ fᵢ.
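As an editorial aside, the sketch below encodes the map T: ℝ³ → ℝ² displayed above and spot-checks the combined linearity equation on randomly chosen vectors and scalars.

    import numpy as np

    # The map of the text: y1 = 2x1 − x2 + x3, y2 = x1 + 3x2 − 5x3,
    # with a numerical spot check of T(xα + yβ) = xT(α) + yT(β).
    def T(v):
        x1, x2, x3 = v
        return np.array([2*x1 - x2 + x3, x1 + 3*x2 - 5*x3])

    rng = np.random.default_rng(0)
    a, b = rng.normal(size=3), rng.normal(size=3)
    x, y = 2.5, -1.25
    assert np.allclose(T(x*a + y*b), x*T(a) + y*T(b))
    print(T(np.array([1.0, 1.0, 1.0])))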
EXERCISES
1.38 Show that the most general linear map from ℝ to ℝ is multiplication by a con-
stant.
1.39 For a fixed α in V the mapping x ↦ xα from ℝ to V is linear. Why?
1.40 Why is this true for α ↦ xα when x is fixed?
1.41 Show that every linear mapping from ℝ to V is of the form x ↦ xα for a fixed
vector α in V.
1.42 Show that every linear mapping from ℝ² to V is of the form ⟨x₁, x₂⟩ ↦
x₁α₁ + x₂α₂ for a fixed pair of vectors α₁ and α₂ in V. What is the range of this mapping?
1.43 Show that the map f ↦ ∫ₐᵇ f(t) dt from C([a, b]) to ℝ does not preserve products.
1.44 Let g be any fixed function in ℝ^A. Prove that the mapping T: ℝ^A → ℝ^A
defined by T(f) = gf is linear.
1.45 Let φ be any mapping from a set A to a set B. Show that composition by φ is
a linear mapping from ℝ^B to ℝ^A. That is, show that T: ℝ^B → ℝ^A defined by T(f) =
f ∘ φ is linear.
In order to acquire a supply of examples, we shall find all linear transforma-
tions having ℝⁿ as domain space. It may be well to start by looking at one such
transformation. Suppose we choose some fixed triple of functions {fᵢ}₁³ in the
space ℝ^ℝ of all real-valued functions on ℝ, say f₁(t) = sin t, f₂(t) = cos t, and
f₃(t) = eᵗ = exp(t). Then for each triple of numbers x = {xᵢ}₁³ in ℝ³ we have
the linear combination Σ_{i=1}^3 xᵢfᵢ with {xᵢ} as coefficients. This is the element of
ℝ^ℝ whose value at t is Σ₁³ xᵢfᵢ(t) = x₁ sin t + x₂ cos t + x₃eᵗ. Different coefficient
triples give different functions, and the mapping x ↦ Σ_{i=1}^3 xᵢfᵢ = x₁ sin +
x₂ cos + x₃ exp is thus a mapping from ℝ³ to ℝ^ℝ. It is clearly linear. If we call
this mapping T, then we can recover the determining triple of functions from T
as the images of the "unit points" δⁱ in ℝ³: T(δʲ) = Σᵢ δᵢʲfᵢ = fⱼ, and so
T(δ¹) = sin, T(δ²) = cos, and T(δ³) = exp. We are going to see that every
linear mapping from ℝ³ to ℝ^ℝ is of this form.
In the following theorem {δⁱ}₁ⁿ is the spanning set for ℝⁿ that we defined
earlier, so that x = Σᵢ xᵢδⁱ for every n-tuple x = ⟨x₁, ..., xₙ⟩ in ℝⁿ.
Theorem 1.2. If {βⱼ}₁ⁿ is any fixed n-tuple of vectors in a vector space W,
then the "linear combination mapping" x ↦ Σᵢ xᵢβᵢ is a linear trans-
formation T from ℝⁿ to W, and T(δʲ) = βⱼ for j = 1, ..., n. Conversely,
if T is any linear mapping from ℝⁿ to W, and if we set βⱼ = T(δʲ) for j =
1, ..., n, then T is the linear combination mapping x ↦ Σᵢ xᵢβᵢ.
Proof. The linearity of the linear combination map T follows by exactly the
same argument that we used in Theorem 1.1 to show that L(A) is a subspace.
Thus

T(x + y) = Σ₁ⁿ (xᵢ + yᵢ)βᵢ = Σ₁ⁿ (xᵢβᵢ + yᵢβᵢ) = Σ₁ⁿ xᵢβᵢ + Σ₁ⁿ yᵢβᵢ = T(x) + T(y),

and

T(sx) = Σ₁ⁿ (sxᵢ)βᵢ = Σ₁ⁿ s(xᵢβᵢ) = s Σ₁ⁿ xᵢβᵢ = sT(x).
Conversely, if T: ℝⁿ → W is linear, and if we set βⱼ = T(δʲ) for all j, then for
any x = ⟨x₁, ..., xₙ⟩ in ℝⁿ we have T(x) = T(Σᵢ xᵢδⁱ) = Σᵢ xᵢT(δⁱ) =
Σᵢ xᵢβᵢ. Thus T is the mapping x ↦ Σᵢ xᵢβᵢ. □
This is a tremendously important theorem, simple though it may seem, and
the reader is urged to fix it in his mind. To this end we shall invent some termi-
nology that we shall stay with for the first three chapters. If α = {α₁, ..., αₙ}
is an n-tuple of vectors in a vector space W, let L_α be the corresponding linear
combination mapping x ↦ Σᵢ xᵢαᵢ from ℝⁿ to W. Note that the n-tuple α
itself is an element of Wⁿ. If T is any linear mapping from ℝⁿ to W, we shall call
the n-tuple {T(δⁱ)}₁ⁿ the skeleton of T. In these terms the theorem can be restated
as follows.
Theorem 1.2′. For each n-tuple α in Wⁿ, the map L_α: ℝⁿ → W is linear
and its skeleton is α. Conversely, if T is any linear map from ℝⁿ to W, then
T = L_β, where β is the skeleton of T.
Or again:
Theorem 1.2″. The map α ↦ L_α is a bijection from Wⁿ to the set of all
linear maps T from ℝⁿ to W, and T ↦ skeleton(T) is its inverse.
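The bijection of Theorem 1.2″ is easy to exhibit concretely. In the editorial sketch below, W = ℝ³ and the pair α is an arbitrary sample; L_α is the linear combination mapping, and the skeleton is recovered as the images of the unit points δⁱ.

    import numpy as np

    # An n-tuple α of vectors in W = R³ determines L_α(x) = Σ x_i α_i,
    # and the skeleton {L_α(δ^i)} recovers α.
    alpha = [np.array([2.0, -1.0, 1.0]), np.array([1.0, 0.0, 3.0])]  # α ∈ W²

    def L(x):
        return sum(xi * ai for xi, ai in zip(x, alpha))   # x ↦ Σ x_i α_i

    def skeleton(T, n):
        return [T(np.eye(n)[i]) for i in range(n)]        # images of the δ^i

    assert all(np.allclose(s, a) for s, a in zip(skeleton(L, 2), alpha))
    print(L(np.array([1.0, 2.0])))                        # α₁ + 2α₂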
A linear transformation from a vector space V to the scalar field ℝ is called
a linear functional on V. Thus f ↦ ∫ₐᵇ f(t) dt is a linear functional on V =
C([a, b]). The above theorem is particularly simple for a linear functional F:
since W = ℝ, each vector βᵢ = F(δⁱ) in the skeleton of F is simply a number bᵢ,
and the skeleton {bᵢ}₁ⁿ is thus an element of ℝⁿ. In this case we would write
F(x) = Σᵢ bᵢxᵢ, putting the numerical coefficient 'bᵢ' before the variable
'xᵢ'. Thus F(x) = 3x₁ − x₂ + 4x₃ is the linear functional on ℝ³ with skeleton
⟨3, −1, 4⟩. The set of all linear functionals on ℝⁿ is in a natural one-to-one
correspondence with ℝⁿ itself; we get b from F by bᵢ = F(δⁱ) for all i, and we
get F from b by F(x) = Σ bᵢxᵢ for all x in ℝⁿ.
We next consider the case where the codomain space of T is a Cartesian
space ℝᵐ, and in order to keep the two spaces clear in our minds, we shall, for
the moment, take the domain space to be ℝ³. Each vector βⱼ = T(δʲ) in the
skeleton of T is now an m-tuple of numbers. If we picture this m-tuple as a
column of numbers, then the three m-tuples βⱼ can be pictured as a rectangular
array of numbers, consisting of three columns each of m numbers. Let tᵢⱼ be the
ith number in the jth column. Then the doubly indexed set of numbers {tᵢⱼ} is
called the matrix of the transformation T. We call it an m-by-3 (an m × 3)
matrix because the pictured rectangular array has m rows and three columns.
The matrix determines T uniquely, since its columns form the skeleton of T.
The identity T(x) = Σ₁³ xⱼT(δʲ) = Σ₁³ xⱼβⱼ allows the m-tuple T(x) to be
calculated explicitly from x and the matrix {tᵢⱼ}. Picture multiplying the
column m-tuple βⱼ by the scalar xⱼ and then adding across the three columns at
the ith row. Since tᵢⱼ is the ith number in the m-tuple βⱼ, the ith number in the m-tuple
Σ_{j=1}^3 xⱼβⱼ is Σ_{j=1}^3 xⱼtᵢⱼ. That is, if we let y be the m-tuple T(x), then

yᵢ = Σ_{j=1}^3 tᵢⱼxⱼ   for i = 1, ..., m,

and this set of m scalar equations is equivalent to the one-vector equation
y = T(x).
We can now replace three by n in the above discussion without changing
anything except the diagram, and thus obtain the following specialization of
Theorem 1.2.
Theorem 1.3. Every linear mapping T from ℝⁿ to ℝᵐ determines the
m × n matrix t = {tᵢⱼ} having the skeleton of T as its columns, and the
expression of the equation y = T(x) in linear combination form is equivalent
to the m scalar equations

yᵢ = Σ_{j=1}^n tᵢⱼxⱼ   for i = 1, ..., m.

Conversely, each m × n matrix t determines the linear combination mapping
having the columns of t as its skeleton, and the mapping t ↦ T is therefore
a bijection from the set of all m × n matrices to the set of all linear maps
from ℝⁿ to ℝᵐ.
A linear functional F on ℝⁿ is a linear mapping from ℝⁿ to ℝ¹, so it must
be expressed by a 1 × n matrix. That is, the n-tuple b in ℝⁿ which is the skeleton
of F is viewed as a matrix of one row and n columns.
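As an editorial illustration, the sketch below builds the matrix of a linear map from its skeleton (a convenient sample, the same skeleton as in Exercise 1.50 below) and checks that y = T(x) is the product yᵢ = Σⱼ tᵢⱼxⱼ.

    import numpy as np

    # The matrix of T: R^n -> R^m has the skeleton vectors as its columns,
    # and y = T(x) is the familiar product y_i = Σ_j t_ij x_j.
    beta = [np.array([2.0, -1.0, 1.0]), np.array([1.0, 0.0, 3.0])]
    t = np.column_stack(beta)                 # a 3 × 2 matrix

    x = np.array([2.0, -1.0])
    y = t @ x                                 # y_i = Σ_j t_ij x_j
    assert np.allclose(y, x[0]*beta[0] + x[1]*beta[1])
    print(t, y, sep="\n")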
As a final example of linear maps, we look at an important class of special
linear functionals defined on any function space, the so-called coordinate func-
tionals. If V = ℝ^I and i ∈ I, then the ith coordinate functional πᵢ is simply
evaluation at i, so that πᵢ(f) = f(i). These functionals are obviously linear. In
fact, the vector operations on functions were defined to make them linear; since
sf + tg is defined to be that function whose value at i is sf(i) + tg(i) for all i,
we see that sf + tg is by definition that function such that πᵢ(sf + tg) =
sπᵢ(f) + tπᵢ(g) for all i!
If V is ℝⁿ, then πⱼ is the mapping x = ⟨x₁, ..., xₙ⟩ ↦ xⱼ. In this case
we know from the theorem that πⱼ must be of the form πⱼ(x) = Σ₁ⁿ bᵢxᵢ for
some n-tuple b. What is b?
The general form of the linearity property, T(Σ xᵢαᵢ) = Σ xᵢT(αᵢ), shows
that T and T⁻¹ both carry subspaces into subspaces.
Theorem 1.4. If T: V → W is linear, then the T-image of the linear span
of any subset A ⊂ V is the linear span of the T-image of A: T[L(A)] =
L(T[A]). In particular, if A is a subspace, then so is T[A]. Furthermore, if Y
is a subspace of W, then T⁻¹[Y] is a subspace of V.
Proof. According to the formula T(Σ xᵢαᵢ) = Σ xᵢT(αᵢ), a vector in W is
the T-image of a linear combination on A if and only if it is a linear combination
on T[A]. That is, T[L(A)] = L(T[A]). If A is a subspace, then A = L(A) and
T[A] = L(T[A]), a subspace of W. Finally, if Y is a subspace of W and {αᵢ} ⊂
T⁻¹[Y], then T(Σ xᵢαᵢ) = Σ xᵢT(αᵢ) ∈ L(Y) = Y. Thus Σ xᵢαᵢ ∈ T⁻¹[Y]
and T⁻¹[Y] is its own linear span. □
The subspace T⁻¹(0) = {α ∈ V : T(α) = 0} is called the null space, or
kernel, of T, and is designated N(T) or 𝔑(T). The range of T is the subspace
T[V] of W. It is designated R(T) or ℛ(T).
Lemma 1.1. A linear mapping T is injective if and only if its null space
is {0}.
Proof. If T is injective, and if α ≠ 0, then T(α) ≠ T(0) = 0 and the null space
accordingly contains only 0. On the other hand, if N(T) = {0}, then whenever
α ≠ β, we have α − β ≠ 0, T(α) − T(β) = T(α − β) ≠ 0, and T(α) ≠ T(β);
this shows that T is injective. □
A linear map T: V → W which is bijective is called an isomorphism.
Two vector spaces V and W are isomorphic if and only if there exists an iso-
morphism between them.
For example, the map ⟨c₁, ..., cₙ⟩ ↦ Σ_{i=0}^{n−1} cᵢ₊₁xⁱ is an isomorphism of
ℝⁿ with the vector space of all polynomials of degree < n.
Isomorphic spaces "have the same form", and are identical as abstract
vector spaces. That is, they cannot be distinguished from each other solely on
the basis of vector properties which they do or do not have.
When a linear transformation is from V to itself, special things can happen.
One possibility is that T can map a vector α essentially to itself, T(α) = xα
for some x in ℝ. In this case α is called an eigenvector (proper vector, character-
istic vector), and x is the corresponding eigenvalue.
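An editorial aside: the eigenvalue problem of Exercise 1.64 below is meant to be done by hand, but a numerical sketch can confirm the answer; np.linalg.eig returns eigenvalue and eigenvector pairs satisfying T(α) = xα.

    import numpy as np

    # Eigenvalues and eigendirections for the matrix of Exercise 1.64.
    t = np.array([[1.0, -1.0],
                  [-2.0, 0.0]])
    vals, vecs = np.linalg.eig(t)
    for lam, v in zip(vals, vecs.T):
        assert np.allclose(t @ v, lam * v)    # T(α) = xα
        print(lam, v)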
EXERCISES
1.46 In the situation of Exercise 1.45, show that T is an isomorphism if φ is bijective
by showing that
a) φ injective ⇒ T surjective,
b) φ surjective ⇒ T injective.
1.47 Find the linear functional l on ℝ² such that l(⟨1, 1⟩) = 0 and l(⟨1, 2⟩) = 1.
That is, find b = ⟨b₁, b₂⟩ in ℝ² such that l is the linear combination map
x ↦ b₁x₁ + b₂x₂.
1.48 Do the same for l(⟨2, 1⟩) = −3 and l(⟨1, 2⟩) = 4.
1.49 Find the linear T: ℝ² → ℝ^ℝ such that T(⟨1, 1⟩) = t² and T(⟨1, 2⟩) = t³.
That is, find the functions f₁(t) and f₂(t) such that T is the linear combination map
x ↦ x₁f₁ + x₂f₂.
1.50 Let T be the linear map from ℝ² to ℝ³ such that T(δ¹) = ⟨2, −1, 1⟩ and T(δ²) =
⟨1, 0, 3⟩. Write down the matrix of T in standard rectangular form. Determine
whether or not δ¹ is in the range of T.
1.51 Let T be the linear map from ℝ³ to ℝ³ whose matrix is

[ 1   2   3 ]
[ 2   0  −1 ]
[ 3  −1   1 ]

Find T(x) when x = ⟨1, 1, 0⟩; do the same for x = ⟨3, −2, 1⟩.
1.52 Let M be the linear span of ⟨1, −1, 0⟩ and ⟨0, 1, 1⟩. Find the subspace
T[M] by finding two vectors spanning it, where T is as in the above exercise.
1.53 Let T be the map ⟨x, y⟩ ↦ ⟨x + 2y, y⟩ from ℝ² to itself. Show that T is a
linear combination mapping, and write down its matrix in standard form.
1.54 Do the same for T: ⟨x, y, z⟩ ↦ ⟨x − z, x + z, y⟩ from ℝ³ to itself.
1.55 Find a linear transformation T from ℝ³ to itself whose range space is the span
of ⟨1, −1, 0⟩ and ⟨−1, 0, 2⟩.
1.56 Find two linear functionals on ℝ⁴ the intersection of whose null spaces is the
linear span of ⟨1, 1, 1, 1⟩ and ⟨1, 0, −1, 0⟩. You now have in hand a linear
transformation whose null space is the above span. What is it?
1.57 Let V = C([a, b]) be the space of continuous real-valued functions on [a, b],
also designated C⁰([a, b]), and let W = C¹([a, b]) be those having continuous first
derivatives. Let D: W → V be differentiation (Df = f′), and define T on V by
T(f) = F, where F(x) = ∫ₐˣ f(t) dt. By stating appropriate theorems of the calculus,
show that D and T are linear, T maps into W, and D is a left inverse of T (D ∘ T is
the identity on V).
1.58 In the above exercise, identify the range of T and the null space of D. We
know that D is surjective and that T is injective. Why?
1.59 Let V be the linear span of the functions sin x and cos x. Then the operation
of differentiation D is a linear transformation from V to V. Prove that D is an isomor-
phism from V to V. Show that D² = −I on V.
1.60 a) As the reader would guess, C³(ℝ) is the set of real-valued functions on ℝ
having continuous derivatives up to and including the third. Show that f ↦ f‴ is a
surjective linear map T from C³(ℝ) to C(ℝ).
b) For any fixed a in ℝ show that f ↦ ⟨f(a), f′(a), f″(a)⟩ is an isomorphism
from the null space N(T) to ℝ³. [Hint: Apply Taylor's formula with remainder.]
1.61 An integral analogue of the matrix equations yᵢ = Σⱼ tᵢⱼxⱼ, i = 1, ..., m, is
the equation

g(s) = ∫₀¹ K(s, t)f(t) dt,   s ∈ [0, 1].

Assuming that K(s, t) is defined on the square [0, 1] × [0, 1] and is continuous as a
function of t for each s, check that f ↦ g is a linear mapping from C([0, 1]) to ℝ^{[0,1]}.
1.62 For a finite set A = {αᵢ}, Theorem 1.1 is a corollary of Theorem 1.4. Why?
1.63 Show that the inverse of an isomorphism is linear (and hence is an isomorphism).
1.64 Find the eigenvectors and eigenvalues of T: ℝ² → ℝ² if the matrix of T is

[  1  −1 ]
[ −2   0 ]

Since every scalar multiple xα of an eigenvector α is clearly also an eigenvector, it will
suffice to find one vector in each "eigendirection". This is a problem in elementary
algebra.
1.65 Find the eigenvectors and eigenvalues of the transformations T whose matrices
are

[ −1  −1 ]      [  1  −1 ]
[ −1  −1 ],     [ −2   2 ].
1.66 The five transformations in the above two exercises exhibit four different kinds
of behavior according to the number of distinct eigendirections they have. What are
the possibilities?
1.67 Let V be the vector space of polynomials of degree ≤ 3 and define T: V → V
by f ↦ tf′(t). Find the eigenvectors and eigenvalues of T.
2. VECTOR SPACES AND GEOMETRY
The familiar coordinate systems of analytic geometry allow us to consider
geometric entities such as lines and planes in vector settings, and these geometric
notions give us valuable intuitions about vector spaces. Before looking at the
vector forms of these geometric ideas, we shall briefly review the construction of
the coordinate correspondence for three-dimensional Euclidean space. As usual,
the confident reader can skip it.
We start with the line. A coordinate correspondence between a line L and
the real number system ℝ is determined by choosing arbitrarily on L a zero
point O and a unit point Q distinct from O. Then to each point X on L is assigned
the number x such that |x| is the distance from O to X, measured in terms of
the segment OQ as unit, and x is positive or negative according as X and Q are
on the same side of O or on opposite sides. The mapping X ↦ x is the coordinate
correspondence. Now consider three-dimensional Euclidean space 𝔼³. We want
to set up a coordinate correspondence between 𝔼³ and the Cartesian vector
space ℝ³. We first choose arbitrarily a zero point O and three unit points
Q₁, Q₂, and Q₃ in such a way that the four points do not lie in a plane. Each of
the unit points Qᵢ determines a line Lᵢ through O and a coordinate correspon-
dence on this line, as defined above. The three lines L₁, L₂, and L₃ are called
the coordinate axes. Consider now any point X in 𝔼³. The plane through X
parallel to L₂ and L₃ intersects L₁ at a point X₁, and therefore determines a
number x₁, the coordinate of X₁ on L₁. In a similar way, X determines points
X₂ on L₂ and X₃ on L₃ which have coordinates x₂ and x₃, respectively. Alto-
gether X determines a triple

x = ⟨x₁, x₂, x₃⟩

in ℝ³, and we have thus defined a mapping θ: X ↦ x from 𝔼³ to ℝ³ (see Fig. 1.4).
We call θ the coordinate correspondence defined by the axis system. The conven-
tion implicit in our notation above is that θ(Y) is y, θ(A) is a, etc. Note that the
unit point Q₁ on L₁ has the coordinate triple δ¹ = ⟨1, 0, 0⟩, and similarly, that
θ(Q₂) = δ² = ⟨0, 1, 0⟩ and θ(Q₃) = δ³ = ⟨0, 0, 1⟩.
[Fig. 1.4: the coordinate correspondence θ: X ↦ x determined by the axes L₁, L₂, L₃.]
There are certain basic facts about the coordinate correspondence that have
to be proved as theorems of geometry before the correspondence can be used to
treat geometric questions algebraically. These geometric theorems are quite
tricky, and are almost impossible to discuss adequately on the basis of the usual
secondary school treatment of geometry. We shall therefore simply assume
them. They are:
1) θ is a bijection from 𝔼³ to ℝ³.
2) Two line segments AB and XY are equal in length and parallel, and the
direction from A to B is the same as that from X to Y, if and only if b − a =
y − x (in the vector space ℝ³). This relationship between line segments is
important enough to formalize. A directed line segment is a geometric line seg-
ment, together with a choice of one of the two directions along it. If we interpret
AB as the directed line segment from A to B, and if we define the directed line
segments AB and XY to be equivalent (and write AB ≈ XY) if they are equal
in length, parallel, and similarly directed, then (2) can be restated:

AB ≈ XY ⇔ b − a = y − x.
3) If X ≠ O, then Y is on the line through O and X in 𝔼³ if and only if
y = tx for some t in ℝ. Moreover, this t is the coordinate of Y with respect to X
as unit point on the line through O and X.
[Fig. 1.5: s² = x₁² + x₂², |OX|² = r² = s² + x₃².]
4) If the axis system in 𝔼³ is Cartesian, that is, if the axes are mutually
perpendicular and a common unit of distance is used, then the length |OX| of
the segment OX is given by the so-called Euclidean norm on ℝ³, |OX| =
(Σ₁³ xᵢ²)^{1/2}. This follows directly from the Pythagorean theorem. Then this
formula and a second application of the Pythagorean theorem to the triangle
OXY imply that the segments OX and OY are perpendicular if and only if the
scalar product (x, y) = Σ_{i=1}^3 xᵢyᵢ has the value 0 (see Fig. 1.5).
In applying this result, it is useful to note that the scalar product (x, y) is
linear as a function of either vector variable when the other is held fixed. Thus

(cx + dy, z) = Σ₁³ (cxᵢ + dyᵢ)zᵢ = c Σ₁³ xᵢzᵢ + d Σ₁³ yᵢzᵢ = c(x, z) + d(y, z).
Exactly the same theorems hold for the coordinate correspondence between
the Euclidean plane 𝔼² and the Cartesian 2-space ℝ², except that now, of course,
(x, y) = Σ₁² xᵢyᵢ = x₁y₁ + x₂y₂.
We can easily obtain the equations for lines and
planes in 𝔼³ from these basic theorems. First, we see
from (2) and (3) that if fixed points A and B are given,
with A ≠ O, then the line through B parallel to the
segment OA contains the point X if and only if there
exists a scalar t such that x − b = ta (see Fig. 1.6).
Therefore, the equation of this line is

x = ta + b.

This vector equation is equivalent to the three numerical equations xᵢ =
aᵢt + bᵢ, i = 1, 2, 3. These are customarily called the parametric equations of the
line, since they present the coordinate triple x of the varying point X on the line
as functions of the "parameter" t.
Next, we know that the plane through B perpendicular to the direction of
the segment OA contains the point X if and only if BX ⊥ OA, and it therefore
follows from (2) and (4) that the plane contains X if and only if (x − b, a) = 0
(see Fig. 1.7). But (x − b, a) = (x, a) − (b, a) by the linearity of the scalar
product in its first variable, and if we set l = (b, a), we see that the equation of
the plane is

(x, a) = l   or   Σ₁³ aᵢxᵢ = l.

That is, a point X is on the plane through B perpendicular to the direction of
OA if and only if this equation holds for its coordinate triple x. Conversely,
if a ≠ 0, then we can retrace the steps taken above to show that the set of points
X in 𝔼³ whose coordinate triples x satisfy (x, a) = l is a plane.
[Fig. 1.7: the plane through B perpendicular to the direction of OA.]
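As an editorial illustration, the sketch below encodes the two loci just derived, the line x = ta + b and the plane (x, a) = l with l = (b, a), for an arbitrary sample choice of a and b.

    import numpy as np

    # The line through b parallel to OA, x = ta + b, and the plane through b
    # perpendicular to OA, (x, a) = l with l = (b, a).
    a = np.array([3.0, -1.0, 1.0])
    b = np.array([1.0, 2.0, 1.0])
    l = a @ b                                 # l = (b, a)

    def on_line(x):
        # x is on the line iff x − b is a scalar multiple of a
        return np.allclose(np.cross(x - b, a), 0)

    def on_plane(x):
        return np.isclose(a @ x, l)           # (x, a) = l

    assert on_line(b + 2.0 * a)
    assert on_plane(b) and on_plane(b + np.array([1.0, 2.0, -1.0]))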
The fact that ℝ³ has the natural scalar product (x, y) is of course extremely
important, both algebraically and geometrically. However, most vector spaces
do not have natural scalar products, and we shall deliberately neglect scalar
products in our early vector theory (but shall return to them in Chapter 5).
This leads us to seek a different interpretation of the equation Σ₁³ aᵢxᵢ = l.
We saw in Section 1 that x ↦ Σ₁³ aᵢxᵢ is the most general linear functional f on
ℝ³. Therefore, given any plane M in 𝔼³, there is a nonzero linear functional f
on ℝ³ and a number l such that the equation of M is f(x) = l. And conversely,
given any nonzero linear functional f: ℝ³ → ℝ and any l ∈ ℝ, the locus of
f(x) = l is a plane M in 𝔼³. The reader will remember that we obtain the
coefficient triple a from f by aᵢ = f(δⁱ), since then f(x) = f(Σ₁³ xᵢδⁱ) =
Σ₁³ xᵢf(δⁱ) = Σ₁³ xᵢaᵢ.
Finally, we seek the vector form of the notion of parallel translation. In
plane geometry when we are considering two congruent figures that are parallel
and similarly oriented, we often think of obtaining one from the other by "sliding
the plane along itself" in such a way that all lines remain parallel to their original
positions. This description of a parallel translation of the plane can be more
elegantly stated as the condition that every directed line segment slides to an
equivalent one. If X slides to Y and O slides to B, then OX slides to BY, so
that OX ≈ BY and x = y − b by (2). Therefore, the coordinate form of such
a parallel sliding is the mapping x ↦ y = x + b.
Conversely, for any b in ℝ² the plane mapping defined by x ↦ y = x + b
is easily seen to be a parallel translation. These considerations hold equally well
for parallel translations of the Euclidean space 𝔼³.
It is geometrically clear that under a parallel translation planes map to
parallel planes and lines map to parallel lines, and now we can expect an easy
algebraic proof. Consider, for example, the plane M with equation f(x) = l;
let us ask what happens to M under the translation x ↦ y = x + b. Since
x = y − b, we see that a point x is on M if and only if its translate y satisfies
the equation f(y − b) = l or, since f is linear, the equation f(y) = l′, where
l′ = l + f(b). But this is the equation of a plane N. Thus the translate of M
is the plane N.
It is natural to transfer all this geometric terminology from sets in 𝔼³
to the corresponding sets in ℝ³, and therefore to speak of the set of ordered
triples x satisfying f(x) = l as a set of points in ℝ³ forming a plane in ℝ³, and
to call the mapping x ↦ x + b the (parallel) translation of ℝ³ through b, etc.
Moreover, since ℝ³ is a vector space, we would expect these geometric ideas to
interplay with vector notions. For instance, translation through b is simply the
operation of adding the constant vector b: x ↦ x + b. Thus if M is a plane, then
the plane N obtained by translating M through b is just the vector set sum
M + b. If the equation of M is f(x) = l, then the plane M goes through 0 if
and only if l = 0, in which case M is a vector subspace of ℝ³ (the null space of f).
It is easy to see that any plane M is a translate of a plane through 0. Similarly,
the line {ta + b : t ∈ ℝ} is the translate through b of the line {ta : t ∈ ℝ}, and
this second line is a subspace, the linear span of the one vector a. Thus planes
and lines in ℝ³ are translates of subspaces.
These notions all carry over to an arbitrary real vector space in a perfectly
satisfactory way and with additional dimensional variety. A plane in ℝ³
through 0 is a vector space which is two-dimensional in a strictly algebraic sense
which we shall discuss in the next chapter, and a line is similarly one-dimensional.
In ℝ³ there are no proper subspaces other than planes and lines through 0,
but in a vector space V with dimension n > 3 proper subspaces occur with all
dimensions from 1 to n − 1. We shall therefore use the term "plane" loosely to
refer to any translate of a subspace, whatever its dimension. More properly,
translates of vector subspaces are called affine subspaces.
We shall see that if V is a finite-dimensional space with dimension n, then
the null space of a nonzero linear functional f is always (n − 1)-dimensional, and
therefore it cannot be a Euclidean-like two-dimensional plane except when
n = 3. We use the term hyperplane for such a null space or one of its translates.
Thus, in general, a hyperplane is a set with the equation f(x) = l, where f is
a nonzero linear functional. It is a proper affine subspace (plane) which is maxi-
mal in the sense that the only affine subspace properly including it is the whole
of V. In ℝ³ hyperplanes are ordinary geometric planes, and in ℝ² hyperplanes
are lines!
EXERCISES
2.1 Assuming the theorem AB ≈ XY ⇔ b − a = y − x, show that OC is the sum
of OA and OB, as defined in the preliminary discussion of Section 1, if and only if
c = b + a. Considering also our assumed geometric theorem (3), show that the
mapping x ↦ OX from ℝ³ to the vector space of geometric vectors is linear and
hence an isomorphism.
2.2 Let L be the line in the Cartesian plane ℝ² with equation x₂ = 3x₁. Express L
in parametric form as x = ta for a suitable ordered pair a.
2.3 Let V be any vector space, and let α and β be distinct vectors. Show that the
line through α and β has the parametric equation

ξ = tβ + (1 − t)α,   t ∈ ℝ.

Show also that the segment from α to β is the image of [0, 1] in the above mapping.
2.4 According to the Pythagorean theorem, a triangle with side lengths a, b, and c
has a right angle at the vertex "opposite c" if and only if c² = a² + b².
Prove from this that in a Cartesian coordinate system in 𝔼³ the length |OX| of a
segment OX is given by

|OX|² = Σ₁³ xᵢ²,

where x = ⟨x₁, x₂, x₃⟩ is the coordinate triple of the point X. Next use our geometric
theorem (2) to conclude that

OX ⊥ OY if and only if (x, y) = 0,   where (x, y) = Σ₁³ xᵢyᵢ.

(Use the bilinearity of (x, y) to expand |x − y|².)
2.5 More generally, the law of cosines says that in any triangle with sides a, b, and c
and with angle θ opposite the side c,

c² = a² + b² − 2ab cos θ.

Apply this law to the triangle OXY to prove that

(x, y) = |x| |y| cos θ,

where (x, y) is the scalar product Σ₁³ xᵢyᵢ, |x| = (x, x)^{1/2} = |OX|, etc.
2.6 Given a nonzero linear functional f: ℝ³ → ℝ, and given k ∈ ℝ, show that the
set of points X in 𝔼³ such that f(x) = k is a plane. [Hint: Find a b in ℝ³ such that
f(b) = k, and throw the equation f(x) = k into the form (x − b, a) = 0, etc.]
2.7 Show that for any b in ℝ³ the mapping X ↦ Y from 𝔼³ to itself defined by
y = x + b is a parallel translation. That is, show that if X ↦ Y and Z ↦ W, then
XZ ≈ YW.
2.8 Let M be the set in ℝ³ with equation 3x₁ − x₂ + x₃ = 2. Find triples a and b
such that M is the plane through b perpendicular to the direction of a. What is the
equation of the plane P = M + ⟨1, 2, 1⟩?
2.9 Continuing the above exercise, what is the condition on the triple b in order for
N = M + b to pass through the origin? What is the equation of N?
2.10 Show that if the plane M in ℝ³ has the equation f(x) = l, then M is a translate
of the null space N of the linear functional f. Show that any two translates M and P
of N are either identical or disjoint. What is the condition on the ordered triple b
in order that M + b = M?
2.11 Generalize the above exercise to hyperplanes in ℝⁿ.
2.12 Let N be the subspace (plane through the origin) in ℝ³ with equation f(x) = 0.
Let M and P be any two planes obtained from N by parallel translation. Show that
Q = M + P is a third such plane. If M and P have the equations f(x) = l₁ and
f(x) = l₂, find the equation for Q.
2.13 If M is the plane in ℝ³ with equation f(x) = l, and if r is any nonzero number,
show that the set product rM is a plane parallel to M.
2.14 In view of the above two exercises, discuss how we might consider the set of all
parallel translates of the plane N with equation f(x) = 0 as forming a new vector
space.
2.15 Let L be the subspace (line through the origin) in ℝ³ with parametric equation
x = ta. Discuss the set of all parallel translates of L in the spirit of the above three
exercises.
2.16 The best object to take as "being" the geometric vector AB is the equivalence
class of all directed line segments XY such that XY ≈ AB. Assuming whatever you
need from properties (1) through (4), show that this is an equivalence relation on the
set of all directed line segments (Section 0.12).
2.17 Assuming that the geometric vector AB is defined as in the above exercise, show
that, strictly speaking, it is actually the mapping of the plane (or space) into itself that
we have called the parallel translation through AB. Show also that AB + CD is the
composition of the two translations.
3. PRODUCT SPACES AND HOM(V, W)
Product spaces. If W is a vector space and A is an arbitrary set, then the set
V = W^A of all W-valued functions on A is a vector space in exactly the same
way that ℝ^A is. Addition is the natural addition of functions, (f + g)(a) =
f(a) + g(a), and, similarly, (xf)(a) = x(f(a)) for every function f and scalar x.
Laws A1 through S4 follow just as before and for exactly the same reasons. For
variety, let us check the associative law for addition. The equation f + (g + h) =
(f + g) + h means that (f + (g + h))(a) = ((f + g) + h)(a) for all a ∈ A.
But
(f + (g + h))(a) = f(a) + (g + h)(a)
= f(a) + (g(a) + h(a)) = (f(a) + g(a)) + h(a)
= (f + g)(a) + h(a) = ((f + g) + h)(a),
where the middle equality in this chain of five holds by the associative law for W
and the other four are applications of the definition of addition. Thus the
associative law for addition holds in W^A because it holds in W, and the other
laws follow in exactly the same way. As before, we let πᵢ be evaluation at i,
so that πᵢ(f) = f(i). Now, however, πᵢ is vector valued rather than scalar valued,
because it is a mapping from V to W, and we call it the ith coordinate projection
rather than the ith coordinate functional. Again these maps are all linear.
In fact, as before, the natural vector operations on W^A are uniquely defined by
the requirement that the projections πᵢ all be linear. We call the value f(j) =
πⱼ(f) the jth coordinate of the vector f. Here the analogue of Cartesian n-space
is the set of all n-tuples a = ⟨a₁, ..., aₙ⟩ of vectors in W, designated Wⁿ.
Clearly, aⱼ is the jth coordinate of the n-tuple a.
There is no reason why we must use the same space W at each index, as we
did above. In fact, if W₁, ..., Wₙ are any n vector spaces, then the set of all
n-tuples a = ⟨a₁, ..., aₙ⟩ such that aⱼ ∈ Wⱼ for j = 1, ..., n is a vector
space under the same definitions of the operations and for the same reasons.
That is, the Cartesian product W = W₁ × W₂ × ⋯ × Wₙ is also a vector
space of vector-valued functions. Such finite products will be very important
to us. Of course, ℝⁿ is the product Πᵢ Wᵢ with each Wᵢ = ℝ; but ℝⁿ can also
be considered ℝᵐ × ℝⁿ⁻ᵐ, or more generally, Π₁ᵏ Wᵢ, where Wᵢ = ℝ^mᵢ and
Σ₁ᵏ mᵢ = n. However, the most important use of finite product spaces arises
from the fact that the study of certain phenomena on a vector space V may
lead in a natural way to a collection {Vᵢ}₁ⁿ of subspaces of V such that V is
isomorphic to the product Πᵢ Vᵢ. Then the extra structure that V acquires
when we regard it as the product space Πᵢ Vᵢ is used to study the phenomena
in question. This is the theory of direct sums, and we shall investigate it in
Section 5.
Later in the course we shall need to consider a general Cartesian product of
vector spaces. We remind the reader that if {Wᵢ : i ∈ I} is any indexed collection
of vector spaces, then the Cartesian product Πᵢ∈I Wᵢ of these vector spaces is
defined as the set of all functions f with domain I such that f(i) ∈ Wᵢ for all
i ∈ I (see Section 0.8).
The following is a simple concrete example to keep in mind. Let S be the
ordinary unit sphere in ℝ³, S = {x : Σ₁³ xᵢ² = 1}, and for each point x on S
let Wₓ be the subspace of ℝ³ tangent to S at x. By this we mean the subspace
(plane through 0) parallel to the tangent plane to S at x, so that the translate
Wₓ + x is the tangent plane (see Fig. 1.8). A function f in the product space
W = Πₓ∈S Wₓ is a function which assigns to each point x on S a vector in Wₓ,
that is, a vector parallel to the tangent plane to S at x. Such a function is called
a vector field on S. Thus the product set W is the set of all vector fields on S,
and W itself is a vector space, as the next theorem states.
[Fig. 1.8]
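For readers who like to compute, here is a minimal Python sketch (ours, not from the text) of one such vector field: the value at x is a fixed vector v minus its component along x, which lies in Wₓ because its scalar product with x is zero.

```python
import numpy as np

def tangent_field(x, v=np.array([0.0, 0.0, 1.0])):
    """A vector field on the unit sphere: the value at x lies in W_x = {w : (w, x) = 0}."""
    return v - np.dot(v, x) * x   # (result, x) = (v, x) - (v, x)(x, x) = 0 when |x| = 1

x = np.array([1.0, 0.0, 0.0])     # a point on S
print(tangent_field(x))           # [0. 0. 1.], orthogonal to x
```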
Of course, the jth coordinate projection on W = Πᵢ∈I Wᵢ is evaluation
at j, πⱼ(f) = f(j), and the natural vector operations on W are uniquely defined
by the requirement that the coordinate projections all be linear. Thus f + g
must be that element of W whose value at j, πⱼ(f + g), is πⱼ(f) + πⱼ(g) =
f(j) + g(j) for all j ∈ I, and similarly for multiplication by scalars.
Theorem 3.1. The Cartesian product of a collection of vector spaces can
be made into a vector space in exactly one way so that the coordinate
projections are all linear.
Proof. With the vector operations determined uniquely as above, the proofs of
A1 through S4 that we sampled earlier hold verbatim. They did not require that
the functions being added have all their values in the same space, but only that
the values at a given domain element i all lie in the same space. □
Hom(V, W). Linear transformations have the simple but important properties
that the sum of two linear transformations is linear and the composition of two
linear transformations is linear. These imprecise statements are in essence the
theme of this section, although they need bolstering by conditions on domains
and codomains. Their proofs are simple formal algebraic arguments, but the
objects being discussed will increase in conceptual complexity.
If W is a vector space and A is any set, we know that the space W^A of all
mappings f: A → W is a vector space of functions (now vector valued) in the
same way that ℝ^A is. If A is itself a vector space V, we naturally single out for
special study the subset of W^V consisting of all linear mappings. We designate
this subset Hom(V, W). The following elementary theorems summarize its
basic algebraic properties.
Theorem 3.2. Hom(V, W) is a vector subspace of W^V.
Proof. The theorem is an easy formality. If S and T are in Hom(V, W), then
(S + T)(xα + yβ) = S(xα + yβ) + T(xα + yβ)
= xS(α) + yS(β) + xT(α) + yT(β) = x(S + T)(α) + y(S + T)(β),
so S + T is linear and Hom(V, W) is closed under addition. The reader should
be sure he knows the justification for each step in the above continued equality.
The closure of Hom(V, W) under multiplication by scalars follows similarly,
and since Hom(V, W) contains the zero transformation, and so is nonempty,
it is a subspace. □
Theorem 3.3. The composition of linear maps is linear: if T ∈ Hom(V, W)
and S ∈ Hom(W, X), then S ∘ T ∈ Hom(V, X). Moreover, composition
is distributive over addition, under the obvious hypotheses on domains and
codomains:
(S₁ + S₂) ∘ T = S₁ ∘ T + S₂ ∘ T   and   S ∘ (T₁ + T₂) = S ∘ T₁ + S ∘ T₂.
Finally, composition commutes with scalar multiplication:
c(S ∘ T) = (cS) ∘ T = S ∘ (cT).
Proof. We have
S ∘ T(xα + yβ) = S(T(xα + yβ)) = S(xT(α) + yT(β))
= xS(T(α)) + yS(T(β)) = x(S ∘ T)(α) + y(S ∘ T)(β),
so S ∘ T is linear. The two distributive laws will be left to the reader. □
Corollary. If T ∈ Hom(V, W) is fixed, then composition on the right by T
is a linear transformation from the vector space Hom(W, X) to the vector
space Hom(V, X). It is an isomorphism if T is an isomorphism.
Proof. The algebraic properties of composition stated in the theorem can be
combined as follows:
(c₁S₁ + c₂S₂) ∘ T = c₁(S₁ ∘ T) + c₂(S₂ ∘ T),
S ∘ (c₁T₁ + c₂T₂) = c₁(S ∘ T₁) + c₂(S ∘ T₂).
The first equation says exactly that composition on the right by a fixed T is a
linear transformation. (Write S ∘ T as Φ(S) if the equations still don't look
right.) If T is an isomorphism, then composition by T⁻¹ "undoes" composition
by T, and so is its inverse. □
The second equation implies a similar corollary about composition on the
left by a fixed S.
Theorem 3.4. If W is a product vector space, W = Πᵢ Wᵢ, then a mapping
T from a vector space V to W is linear if and only if πᵢ ∘ T is linear for
each coordinate projection πᵢ.
Proof. If T is linear, then πᵢ ∘ T is linear by the above theorem. Now suppose,
conversely, that all the maps πᵢ ∘ T are linear. Then
πᵢ(T(xα + yβ)) = πᵢ ∘ T(xα + yβ) = x(πᵢ ∘ T)(α) + y(πᵢ ∘ T)(β)
= xπᵢ(T(α)) + yπᵢ(T(β)) = πᵢ(xT(α) + yT(β)).
But if πᵢ(f) = πᵢ(g) for all i, then f = g. Therefore, T(xα + yβ) = xT(α) +
yT(β), and T is linear. □
If T is a linear mapping from ℝⁿ to W whose skeleton is {βⱼ}₁ⁿ, then πᵢ ∘ T
has skeleton {πᵢ(βⱼ)}ⱼ₌₁ⁿ. If W is ℝᵐ, then πᵢ is the ith coordinate functional
y ↦ yᵢ, and βⱼ is the jth column in the matrix t = {tᵢⱼ} of T. Thus πᵢ(βⱼ) = tᵢⱼ,
and πᵢ ∘ T is the linear functional whose skeleton is the ith row of the matrix of T.
In the discussion centering around Theorem 1.3, we replaced the vector
equation y = T(x) by the equivalent set of m scalar equations yᵢ = Σⱼ₌₁ⁿ tᵢⱼxⱼ,
which we obtained by reading off the ith coordinate in the vector equation. But
in "reading off" the ith coordinate we were applying the coordinate mapping
πᵢ, or in more algebraic terms, we were replacing the linear map T by the set of
linear maps {πᵢ ∘ T}, which is equivalent to it by the above theorem.
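As a concrete illustration (ours, with made-up numbers): in matrix terms, composing T with the ith coordinate functional picks out the ith row, which is exactly the "reading off" just described.

```python
import numpy as np

t = np.array([[2.0, -1.0, 1.0],
              [1.0,  1.0, 4.0]])   # matrix of a T: R^3 -> R^2

def T(x):
    return t @ x

def pi(i):
    return lambda y: y[i]          # ith coordinate functional on R^2

x = np.array([1.0, 2.0, 3.0])
print(pi(0)(T(x)), t[0] @ x)       # both give y_1 = 2x_1 - x_2 + x_3 = 3.0
```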
Now consider in particular the space Hom(V, V), which we may as well
designate 'Hom(V)'. In addition to being a vector space, it is also closed under
composition, which we consider a multiplication operation. Since composition
of functions is always associative (see Section 0.9), we thus have for multiplica-
tion the laws
A ∘ (B ∘ C) = (A ∘ B) ∘ C,
A ∘ (B + C) = (A ∘ B) + (A ∘ C),
(A + B) ∘ C = (A ∘ C) + (B ∘ C),
k(A ∘ B) = (kA) ∘ B = A ∘ (kB).
Any vector space which has in addition to the vector operations an operation
of multiplication related to the vector operations in the above ways is called
an algebra. Thus,
Theorem 3.5. Hom(V) is an algebra.
We noticed earlier that certain real-valued function spaces are also algebras.
Examples were ℝ^A and C([0, 1]). In these cases multiplication is commutative,
but in the case of Hom(V) multiplication is not commutative unless V is a
trivial space (V = {0}) or V is isomorphic to ℝ. We shall check this later when
we examine the finite-dimensional theory in greater detail.
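A quick numerical check of this noncommutativity (our own illustration): representing Hom(ℝ²) by 2 × 2 matrices, with composition as matrix multiplication, even two simple shears fail to commute.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])   # shear along the first coordinate
B = np.array([[1, 0],
              [1, 1]])   # shear along the second coordinate

print(A @ B)   # [[2 1], [1 1]]
print(B @ A)   # [[1 1], [1 2]]  -- so A o B != B o A
```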
Product projections and injections. In addition to the coordinate projections,
there is a second class of simple linear mappings that is of basic importance in
the handling of a Cartesian product space W = Πₖ∈K Wₖ. These are, for each
j, the mapping θⱼ taking a vector α ∈ Wⱼ to the function in the product space
having the value α at the index j and 0 elsewhere. For example, θ₂ for W₁ ×
W₂ × W₃ is the mapping α ↦ ⟨0, α, 0⟩ from W₂ to W. Or if we view ℝ³ as
ℝ × ℝ², then θ₂ is the mapping ⟨x₂, x₃⟩ ↦ ⟨0, ⟨x₂, x₃⟩⟩ = ⟨0, x₂, x₃⟩.
We call θⱼ the injection of Wⱼ into Πₖ Wₖ. The linearity of θⱼ is probably obvious.
The mappings πⱼ and θⱼ are clearly connected, and the following projection-
injection identities state their exact relationship. If Iⱼ is the identity trans-
formation on Wⱼ, then
πⱼ ∘ θⱼ = Iⱼ   and   πᵢ ∘ θⱼ = 0   if i ≠ j.
If K is finite and I is the identity on the product space W, then
Σₖ∈K θₖ ∘ πₖ = I.
In the case Π₁³ Wᵢ, we have θ₂ ∘ π₂(⟨a₁, a₂, a₃⟩) = ⟨0, a₂, 0⟩, and
the identity simply says that ⟨a₁, 0, 0⟩ + ⟨0, a₂, 0⟩ + ⟨0, 0, a₃⟩ =
⟨a₁, a₂, a₃⟩ for all a₁, a₂, a₃. These identities will probably be clear to the
reader, and we leave the formal proofs as an exercise.
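Here is a minimal sketch (ours, not from the text) of these maps for the finite product ℝ × ℝ², with tuples modeled as flat NumPy arrays; the last line checks the identity Σ θₖ ∘ πₖ = I pointwise.

```python
import numpy as np

blocks = [(0, 1), (1, 3)]          # R^3 viewed as R x R^2: index ranges of the factors

def pi(k, f):
    lo, hi = blocks[k]
    return f[lo:hi]                # kth coordinate projection

def theta(k, a):
    lo, hi = blocks[k]
    out = np.zeros(3)
    out[lo:hi] = a                 # value a at index k, 0 elsewhere
    return out

f = np.array([1.0, 2.0, 3.0])
total = sum(theta(k, pi(k, f)) for k in range(2))
print(np.allclose(total, f))       # True: the sum of the theta_k o pi_k is I
```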
The coordinate projections πⱼ are useful in the study of any product space,
but because of the limitation in the above identity, the injections θⱼ are of
interest principally in the case of finite products. Together they enable us to
decompose and reassemble linear maps whose domains or codomains are finite
product spaces.
For a simple example, consider the T in Hom(ℝ³, ℝ²) whose matrix is

    [2  -1  1]
    [1   1  4].

Then π₁ ∘ T is the linear functional whose skeleton ⟨2, −1, 1⟩ is the first row
in the matrix of T, and we know that we can visualize its expression in equation
form, y₁ = 2x₁ − x₂ + x₃, as being obtained from the vector equation y =
T(x) by "reading off the first row". Thus we "decompose" T into the two linear
functionals lᵢ = πᵢ ∘ T. Then, speaking loosely, we have the reassembly
T = ⟨l₁, l₂⟩; more exactly, T(x) = ⟨2x₁ − x₂ + x₃, x₁ + x₂ + 4x₃⟩ =
⟨l₁(x), l₂(x)⟩ for all x. However, we want to present this reassembly as the
action of the linear maps θ₁ and θ₂. We have
T = θ₁ ∘ l₁ + θ₂ ∘ l₂ = (θ₁ ∘ π₁ + θ₂ ∘ π₂) ∘ T,
which shows that the decomposition and reassembly of T is an expression of the
identity Σ θᵢ ∘ πᵢ = I. In general, if T ∈ Hom(V, W) and W = Πᵢ Wᵢ, then
Tᵢ = πᵢ ∘ T is in Hom(V, Wᵢ) for each i, and Tᵢ can be considered "the part
of T going into Wᵢ", since Tᵢ(α) is the ith coordinate of T(α) for each α. Then we
can reassemble the Tᵢ's to form T again by T = Σ θᵢ ∘ Tᵢ, for Σ θᵢ ∘ Tᵢ =
(Σ θᵢ ∘ πᵢ) ∘ T = I ∘ T = T. Moreover, any finite collection of Tᵢ's on a
common domain can be put together in this way to make a T. For example,
we can assemble an m-tuple {Tᵢ}₁ᵐ of linear maps on a common domain V to
form a single m-tuple-valued linear map T. Given α in V, we simply define
T(α) as that m-tuple whose ith coordinate is Tᵢ(α) for i = 1, ..., m, and then
check that T is linear. Thus without having to calculate, we see from this
assembly principle that T: x ↦ ⟨2x₁ − x₂ + x₃, x₁ + x₂ + 4x₃⟩ is a linear
mapping from ℝ³ to ℝ², since we have formed T by assembling the two linear
functionals l₁(x) = 2x₁ − x₂ + x₃ and l₂(x) = x₁ + x₂ + 4x₃ to form a
single ordered-pair-valued map. This very intuitive process has an equally
simple formal justification. We rigorize our discussion in the following theorem.
Theorem 3.6. If Tᵢ is in Hom(V, Wᵢ) for each i in a finite index set I,
and if W is the product space Πᵢ∈I Wᵢ, then there is a uniquely determined
T in Hom(V, W) such that Tᵢ = πᵢ ∘ T for all i in I.
Proof. If T exists such that Tᵢ = πᵢ ∘ T for each i, then T = I_W ∘ T =
(Σ θᵢ ∘ πᵢ) ∘ T = Σ θᵢ ∘ (πᵢ ∘ T) = Σ θᵢ ∘ Tᵢ. Thus T is uniquely determined
as Σ θᵢ ∘ Tᵢ. Moreover, this T does have the required property, since then
πⱼ ∘ T = πⱼ ∘ (Σᵢ θᵢ ∘ Tᵢ) = Σᵢ (πⱼ ∘ θᵢ) ∘ Tᵢ = Iⱼ ∘ Tⱼ = Tⱼ. □
In the same way, we can decompose a linear T whose domain is a product
space V = Πⱼ₌₁ⁿ Vⱼ into the maps Tⱼ = T ∘ θⱼ with domains Vⱼ, and then
reassemble these maps to form T by the identity T = Σⱼ₌₁ⁿ Tⱼ ∘ πⱼ (check it
mentally!). Moreover, a finite collection of maps into a common codomain
space can be put together to form a single map on the product of the domain
spaces. Thus an n-tuple of maps {Tᵢ}₁ⁿ into W defines a single map T into W,
where the domain of T is the product of the domains of the Tᵢ's, by the equation
T(⟨a₁, ..., aₙ⟩) = Σ₁ⁿ Tᵢ(aᵢ), or T = Σ₁ⁿ Tᵢ ∘ πᵢ. For example, if T₁: ℝ → ℝ²
is the map t ↦ t⟨2, 1⟩ = ⟨2t, t⟩, and T₂ and T₃ are similarly the maps
t ↦ t⟨−1, 1⟩ and t ↦ t⟨1, 4⟩, then T = Σ₁³ Tᵢ ∘ πᵢ is the mapping from
ℝ³ to ℝ² whose matrix is

    [2  -1  1]
    [1   1  4].
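In matrix language (a sketch of ours): each Tᵢ contributes a column, and the assembled T = Σ Tᵢ ∘ πᵢ is the matrix whose ith column is the vector Tᵢ(1).

```python
import numpy as np

cols = [np.array([2.0, 1.0]),      # T_1: t -> t<2, 1>
        np.array([-1.0, 1.0]),     # T_2: t -> t<-1, 1>
        np.array([1.0, 4.0])]      # T_3: t -> t<1, 4>

def T(x):                          # T = sum_i T_i o pi_i : R^3 -> R^2
    return sum(x[i] * cols[i] for i in range(3))

t = np.column_stack(cols)          # the matrix of T
x = np.array([1.0, 1.0, 1.0])
print(T(x), t @ x)                 # both give [2. 6.]
```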
Again there is a simple formal argument, and we shall ask the reader to write
out the proof of the following theorem.
Theorem 3.7. If Tⱼ is in Hom(Vⱼ, W) for each j in a finite index set J,
and if V = Πⱼ∈J Vⱼ, then there exists a unique T in Hom(V, W) such
that T ∘ θⱼ = Tⱼ for each j in J.
Finally we should mention that Theorem 3.6 holds for all product spaces,
finite or not, and states a property that characterizes product spaces. We shall
investigate this situation in the exercises. The proof of the general case of
Theorem 3.6 has to get along without the injections θⱼ; instead, it is an application
of Theorem 3.4.
The reader may feel that we are being overly formal in using the projections
πᵢ and the injections θᵢ to give algebraic formulations of processes that are
easily visualized directly, such as reading off the scalar "components" of a
vector equation. However, the mappings
f ↦ f(i)   and   xᵢ ↦ ⟨0, ..., 0, xᵢ, 0, ..., 0⟩
are clearly fundamental devices, and making their relationships explicit now
will be helpful to us later on when we have to handle their occurrences in more
complicated situations.
EXERCISES
3.1 Show that ℝᵐ × ℝⁿ is isomorphic to ℝⁿ⁺ᵐ.
3.2 Show more generally that if Σ₁ᵏ nᵢ = n, then Πᵢ₌₁ᵏ ℝ^nᵢ is isomorphic to ℝⁿ.
3.3 Show that if {B, C} is a partitioning of A, then ℝ^A and ℝ^B × ℝ^C are isomorphic.
3.4 Generalize the above to the case where {Aᵢ}₁ⁿ partitions A.
3.5 Show that a mapping T from a vector space V to a vector space W is linear if
and only if (the graph of) T is a subspace of V × W.
3.6 Let S and T be nonzero linear maps from V to W. The definition of the map
S + T is not the same as the set sum of (the graphs of) S and T as subspaces of V × W.
Show that the set sum of (the graphs of) S and T cannot be a graph unless S = T.
3.7 Give the justification for each step of the calculation in Theorem 3.2.
3.8 Prove the distributive laws given in Theorem 3.3.
3.9 Let D: C¹([a, b]) → C([a, b]) be differentiation, and let S: C([a, b]) → ℝ be the
definite integral map f ↦ ∫ₐᵇ f. Compute the composition S ∘ D.
3.10 We know that the general linear functional F on ℝ² is the map x ↦ a₁x₁ + a₂x₂
determined by the pair a in ℝ², and that the general linear map T in Hom(ℝ²) is
determined by a matrix

    t = [t₁₁  t₁₂]
        [t₂₁  t₂₂].

Then F ∘ T is another linear functional, and hence is of the form x ↦ b₁x₁ + b₂x₂ for
some b in ℝ². Compute b from t and a. Your computation should show you that
a ↦ b is linear. What is its matrix?
3.11 Given S and T in Hom(ℝ²) whose matrices are
[two 2 × 2 numerical matrices, not legible in this copy],
respectively, find the matrix of S ∘ T in Hom(ℝ²).
3.12 Given S and T in Hom(ℝ²) whose matrices are

    s = [s₁₁  s₁₂]        t = [t₁₁  t₁₂]
        [s₂₁  s₂₂]   and      [t₂₁  t₂₂],

find the matrix of S ∘ T.
3.13 With the above answer in mind, what would you guess the matrix of S ∘ T is
if S and T are in Hom(ℝ³)? Verify your guess.
3.14 We know that if T ∈ Hom(V, W) is an isomorphism, then T⁻¹ is an isomorphism
in Hom(W, V). Prove that
S ∘ T surjective ⇒ S surjective,   S ∘ T injective ⇒ T injective,
and, therefore, that if T ∈ Hom(V, W), S ∈ Hom(W, V), and
S ∘ T = I_V,   T ∘ S = I_W,
then T is an isomorphism.
3.15 Show that if S⁻¹ and T⁻¹ exist, then (S ∘ T)⁻¹ exists and equals T⁻¹ ∘ S⁻¹.
Give a more careful statement of this result.
3.16 Show that if S and T in Hom V commute with each other, then the null space of
T, N = N(T), and its range R = R(T) are invariant under S (S[N] ⊂ N and S[R] ⊂ R).
3.17 Show that if α is an eigenvector of T and S commutes with T, then S(α) is
an eigenvector of T with the same eigenvalue, provided S(α) ≠ 0.
3.18 Show that if S commutes with T and T⁻¹ exists, then S commutes with T⁻¹.
3.19 Given that α is an eigenvector of T with eigenvalue x, show that α is also an
eigenvector of T² = T ∘ T, of Tⁿ, and of T⁻¹ (if T is invertible) and that the corre-
sponding eigenvalues are x², xⁿ, and 1/x.
Given that p(t) is a polynomial in t, define the operator p(T), and under the above
hypotheses, show that α is an eigenvector of p(T) with eigenvalue p(x).
3.20 If S and T are in Hom V, we say that S doubly commutes with T (and write
S cc T) if S commutes with every A in Hom V which commutes with T. Fix T, and
set {T}″ = {S : S cc T}. Show that {T}″ is a commutative subalgebra of Hom V.
3.21 Given T in Hom V and α in V, let N be the linear span of the "trajectory of α
under T" (the set {Tⁿα : n ∈ ℤ⁺}). Show that N is invariant under T.
3.22 A transformation T in Hom V such that Tⁿ = 0 for some n is said to be nilpotent.
Show that if T is nilpotent, then I − T is invertible. [Hint: The power series
1/(1 − x) = Σ₀^∞ xⁿ
is a finite sum if x is replaced by T.]
3.23 Suppose that T is nilpotent, that S commutes with T, and that S⁻¹ exists, where
S, T ∈ Hom V. Show that (S − T)⁻¹ exists.
3.24 Let φ be an isomorphism from a vector space V to a vector space W. Show that
T ↦ φ ∘ T ∘ φ⁻¹ is an algebra isomorphism from the algebra Hom V to the algebra
Hom W.
3.25 Show the πⱼ's and θⱼ's explicitly for ℝ³ = ℝ × ℝ × ℝ using the stopped arrow
notation. Also write out the identity Σ θⱼ ∘ πⱼ = I in explicit form.
3.26 Do the same for ℝ⁵ = ℝ² × ℝ³.
3.27 Show that the first two projection-injection identities (πᵢ ∘ θᵢ = Iᵢ and πⱼ ∘ θᵢ = 0
if j ≠ i) are simply a restatement of the definition of θᵢ. Show that the linearity of θᵢ
follows formally from these identities and Theorem 3.4.
3.28 Prove the identity Σ θᵢ ∘ πᵢ = I by applying πⱼ to the equation and remembering
that f = g if πⱼ(f) = πⱼ(g) for all j (this being just the equation f(j) = g(j) for all j).
3.29 Prove the general case of Theorem 3.6. We are given an indexed collection of
linear maps {Tᵢ : i ∈ I} with common domain V and codomains {Wᵢ : i ∈ I}. The
first question is how to define T: V → W = Πᵢ Wᵢ. Do this by defining T(ξ) suitably
for each ξ ∈ V and then applying Theorem 3.4 to conclude that T is linear.
3.30 Prove Theorem 3.7.
3.31 We know without calculation that the map
[an explicit map, not legible in this copy]
from ℝ³ to ℝ⁴ is linear. Why? (Cite relevant theorems from the text.)
3.32 Write down the matrix for the transformation T in the above example, and then
write down the mappings T ∘ θᵢ from ℝ to ℝ⁴ (for i = 1, 2, 3) in explicit ordered
quadruplet form.
3.33 Let W = Π₁ⁿ Wᵢ be a finite product vector space and set Pᵢ = θᵢ ∘ πᵢ, so that
Pᵢ is in Hom W for all i. Prove from the projection-injection identities that Σ₁ⁿ Pᵢ = I
(the identity map on W), Pᵢ ∘ Pⱼ = 0 if i ≠ j, and Pᵢ ∘ Pᵢ = Pᵢ. Identify the range
Rᵢ = R(Pᵢ).
3.34 In the context of the above exercise, define T in Hom W as T = Σᵢ₌₁ⁿ i·Pᵢ.
Show that α is an eigenvector of T if and only if α is in one of the subspaces Rᵢ and that
then the eigenvalue of α is i.
3.35 In the same situation show that the polynomial
Πⱼ₌₁ⁿ (T − jI) = (T − I) ∘ ⋯ ∘ (T − nI)
is the zero transformation.
3.36 Theorems 3.6 and 3.7 can be combined if T ∈ Hom(V, W), where both V and W
are product spaces:
V = Π₁ⁿ Vⱼ   and   W = Π₁ᵐ Wᵢ.
State and prove a theorem which says that such a T can be decomposed into a doubly
indexed family {Tᵢⱼ} with Tᵢⱼ ∈ Hom(Vⱼ, Wᵢ), and conversely that any such doubly
indexed family can be assembled to form a single T from V to W.
3.37 Apply your theorem to the special case where V = ℝⁿ and W = ℝᵐ (that is,
Vⱼ = Wᵢ = ℝ for all i and j). Now Tᵢⱼ is from ℝ to ℝ and hence is simply multipli-
cation by a number tᵢⱼ. Show that the indexed collection {tᵢⱼ} of these numbers is the
matrix of T.
3.38 Given an m-tuple of vector spaces {Wᵢ}₁ᵐ, suppose that there are a vector space
X and maps Pᵢ in Hom(X, Wᵢ), i = 1, ..., m, with the following property:
P. For any m-tuple of linear maps {Tᵢ} from a common domain space V to the
above spaces Wᵢ (so that Tᵢ ∈ Hom(V, Wᵢ), i = 1, ..., m), there is a unique T
in Hom(V, X) such that Tᵢ = Pᵢ ∘ T, i = 1, ..., m.
Prove that there is a "canonical" isomorphism from W = Π₁ᵐ Wᵢ to X
under which the given maps Pᵢ become the projections πᵢ. [Remark: The product space
W itself has property P by Theorem 3.6, and this exercise therefore shows that P is an
abstract characterization of the product space.]
4. AFFINE SUBSPACES AND QUOTIENT SPACES
In this section we shall look at the "planes" in a vector space V and see what
happens to them when we translate them, intersect them with each other,
take their images under linear maps, and so on. Then we shall confine ourselves
to the set of all planes that are translates of a fixed subspace and discover that
this set itself is a vector space in the most obvious way. Some of this material
has been anticipated in Section 2.
Affine subspaces. If N is a subspace of a vector space V and α is any vector
of V, then the set N + α = {ξ + α : ξ ∈ N} is called either the coset of N
containing α or the affine subspace of V through α and parallel to N. The set N + α
is also called the translate of N through α. We saw in Section 2 that affine sub-
spaces are the general objects that we want to call planes. If N is given and fixed
in a discussion, we shall use the notation ᾱ = N + α (see Section 0.12).
We begin with a list of some simple properties of affine subspaces. Some of
these will generalize observations already made in Section 2, and the proofs of
some will be left as exercises.
1) With a fixed subspace N assumed, if γ ∈ ᾱ, then γ̄ = ᾱ. For if γ =
α + η₀, then γ + η = α + (η₀ + η) ∈ ᾱ, so γ̄ ⊂ ᾱ. Also α + η = γ + (η − η₀) ∈ γ̄,
so ᾱ ⊂ γ̄. Thus ᾱ = γ̄.
2) With N fixed, for any α and β, either ᾱ = β̄ or ᾱ and β̄ are disjoint.
For if ᾱ and β̄ are not disjoint, then there exists a γ in each, and ᾱ = γ̄ = β̄
by (1). The reader may find it illuminating to compare these calculations with
the more general ones of Section 0.12. Here α ∼ β if and only if α − β ∈ N.
3) Now let 𝒜 be the collection of all affine subspaces of V; 𝒜 is thus the set
of all cosets of all vector subspaces of V. Then the intersection of any sub-
family of 𝒜 is either empty or itself an affine subspace. In fact, if {Aᵢ}ᵢ∈I is
an indexed collection of affine subspaces and Aᵢ is a coset of the vector subspace
Wᵢ for each i ∈ I, then ⋂ᵢ∈I Aᵢ is either empty or a coset of the vector subspace
⋂ᵢ∈I Wᵢ. For if β ∈ ⋂ᵢ∈I Aᵢ, then (1) implies that Aᵢ = β + Wᵢ for all i, and
then ⋂Aᵢ = β + ⋂Wᵢ.
4) If A, B ∈ 𝒜, then A + B ∈ 𝒜. That is, the set sum of any two affine
subspaces is itself an affine subspace.
5) If A ∈ 𝒜 and T ∈ Hom(V, W), then T[A] is an affine subspace of W.
In particular, if t ∈ ℝ, then tA ∈ 𝒜.
6) If B is an affine subspace of W and T ∈ Hom(V, W), then T⁻¹[B] is
either empty or an affine subspace of V.
7) For a fixed α ∈ V the translation of V through α is the mapping
S_α: V → V defined by S_α(ξ) = ξ + α for all ξ ∈ V. Translation is not linear;
for example, S_α(0) = α. It is clear, however, that translation carries affine
subspaces into affine subspaces. Thus S_α(A) = A + α and S_α(β + W) =
(α + β) + W.
8) An affine transformation from a vector space V to a vector space W is a
linear mapping from V to W followed by a translation in W. Thus an affine
transformation is of the form ξ ↦ T(ξ) + β, where T ∈ Hom(V, W) and β ∈ W.
Note that ξ ↦ T(ξ + α) is affine, since
T(ξ + α) = T(ξ) + β, where β = T(α).
It follows from (5) and (7) that an affine transformation carries affine
subspaces of V into affine subspaces of W.
Quotient space. Now fix a subspace N of V, and consider the set W of all
translates (cosets) of N. We are going to see that W itself is a vector space in
the most natural way possible. Addition will be set addition, and scalar multipli-
cation will be set multiplication (except in one special case). For example, if N
is a line through the origin in ℝ³, then W consists of all lines in ℝ³ parallel to N.
We are saying that this set of parallel lines will automatically turn out to be a
vector space: the set sum of any two of the lines in W turns out to be a line in W!
And if L ∈ W and t ≠ 0, then the set product tL is a line in W. The translates
of L fiber ℝ³, and the set of fibers is a natural vector space.
During this discussion it will be helpful temporarily to indicate set sums by
'+ₛ' and set products by '·ₛ'. With N fixed, it follows from (2) above that two
cosets are disjoint or identical, so that the set W of all cosets is a fibering of V
in the general case, just as it was in our example of the parallel lines. From (4)
or by a direct calculation we know that ᾱ +ₛ β̄ = (α + β)‾. Thus W is closed
under set addition, and, naturally, we take this to be our operation of addition
on W. That is, we define + on W by ᾱ + β̄ = ᾱ +ₛ β̄. Then the natural map
π: α ↦ ᾱ from V to W preserves addition, π(α + β) = π(α) + π(β), since
this is just our equation ᾱ +ₛ β̄ = (α + β)‾ above. Similarly, if t ∈ ℝ, then the
set product t ·ₛ ᾱ is either (tα)‾ or {0}. Hence if we define tᾱ as the set product when
t ≠ 0 and as 0̄ = N when t = 0, then π also preserves scalar multiplication,
π(tα) = tπ(α).
We thus have two vectorlike operations on the set W of all cosets of N,
and we naturally expect W to turn out to be a vector space. We could prove this
by verifying all the laws, but it is more elegant to notice the general setting for
such a verification proof.
Theorem 4.1. Let V be a vector space, and let W be a set having two
vectorlike operations, which we designate in the usual way. Suppose that
there exists a surjective mapping T: V → W which preserves the operations:
T(sα + tβ) = sT(α) + tT(β). Then W is a vector space.
Proof. We have to check laws A1 through S4. However, one example should
make it clear to the reader how to proceed. We show that T(0) satisfies A3 and
hence is the zero vector of W. Since every β ∈ W is of the form T(α), we have
T(0) + β = T(0) + T(α) = T(0 + α) = T(α) = β,
which is A3. We shall ask the reader to check more of the laws in the exercises. □
Theorem 4.2. The cosets of a fixed subspace N of a vector space V
themselves form a vector space, called the quotient space V/N, under the
above natural operations, and the projection π is a surjective linear map
from V to V/N.
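As a computational illustration of Theorem 4.2 (ours, not from the text): when V = ℝ³ carries a scalar product, each coset of the line N = ℝa has a unique representative orthogonal to a, and computing with these representatives realizes the quotient operations.

```python
import numpy as np

a = np.array([1.0, 1.0, 1.0])      # N = R a, a line through the origin

def rep(x):
    """Canonical representative of the coset x + N: the component of x orthogonal to a."""
    return x - (np.dot(x, a) / np.dot(a, a)) * a

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 0.0])
# pi preserves addition: the coset of x + y is the sum of the cosets of x and y
print(np.allclose(rep(x + y), rep(x) + rep(y)))   # True
```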
Theorem 4.3. If T is in Hom(V, W), and if the null space of T includes the
subspace M ⊂ V, then T has a unique factorization through V/M. That is,
there exists a unique transformation S in Hom(V/M, W) such that T =
S ∘ π.
Proof. Since T is zero on M, it follows that T is constant on each coset A of M,
so that T[A] contains only one vector. If we define S(A) to be the unique
vector in T[A], then S(ᾱ) = T(α), so S ∘ π = T by definition. Conversely, if
T = R ∘ π, then R(ᾱ) = R ∘ π(α) = T(α), and R is our above S. The linearity
of S is practically obvious. Thus
S(ᾱ + β̄) = S((α + β)‾) = T(α + β) = T(α) + T(β) = S(ᾱ) + S(β̄),
and homogeneity follows similarly. This completes the proof. □
One more remark is of interest here. If N is invariant under a linear map
T in Hom V (that is, T[N] ⊂ N), then for each α in V, T[ᾱ] is a subset of the
coset T(α)‾, for
T[ᾱ] = T[α + N] = T(α) +ₛ T[N] ⊂ T(α) +ₛ N = T(α)‾.
There is therefore a map S: V/N → V/N defined by the requirement that
S(ᾱ) = T(α)‾ (or S ∘ π = π ∘ T), and it is easy to check that S is linear. There-
fore,
Theorem 4.4. If a subspace N of a vector space V is carried into itself by a
transformation T in Hom V, then there is a unique transformation S in
Hom(V/N) such that S ∘ π = π ∘ T.
EXERCISES
4.1 Prove properties (4), (5), and (6) of affine subspaces.
4.2 Choose an origin O in the Euclidean plane P (your sheet of paper), and let
L₁ and L₂ be two parallel lines not containing O. Let X and Y be distinct points on
L₁ and Z any point on L₂. Draw the figure giving the geometric sums
X + Z   and   Y + Z
(parallelogram rule), and state the theorem from plane geometry that says that these
two sum points are on a third line L₃ parallel to L₁ and L₂.
4.3 a) Prove the associative law for addition for Theorem 4.1.
b) Prove also laws A4 and S2.
4.4 Return now to Exercise 2.1 and reexamine the situation in the light of Theorem
4.1. Show, finally, how we really know that the geometric vectors form a vector space.
4.5 Prove that the mapping S of Theorem 4.3 is injective if and only if M is the
null space of T.
4.6 We know from Exercise 4.5 that if T is a surjective element of Hom(V, W) and
N is the null space of T, then the S of Theorem 4.3 is an isomorphism from V/N to W.
Its inverse S⁻¹ assigns a coset of N to each η in W. Show that the process of "indefinite
integration" is an example of such a map S⁻¹. This is the process of calculating an
integral and adding an arbitrary constant, as in
∫ sin x dx = −cos x + C.
4.7 Suppose that N and M are subspaces of a vector space V and that N ⊂ M.
Show that then M/N is a subspace of V/N and that V/M is naturally isomorphic to the
quotient space (V/N)/(M/N). [Hint: Every coset of N is a subset of some coset of M.]
4.8 Suppose that N and M are any subspaces of a vector space V. Prove that
(M + N)/N is naturally isomorphic to M/(M ∩ N). (Start with the fact that each
coset of M ∩ N is included in a unique coset of N.)
4.9 Prove that the map S of Theorem 4.4 is linear.
4.10 Given T ∈ Hom V, show that T² = 0 (T² = T ∘ T) if and only if R(T) ⊂ N(T).
4.11 Suppose that T ∈ Hom V and the subspace N are such that T is the identity
on N and also on V/N. The latter assumption is that the S of Theorem 4.4 is the
identity on V/N. Set R = T − I, and use the above exercise to show that R² = 0.
Show that if T = I + R and R² = 0, then there is a subspace N such that T is the
identity on N and also on V/N.
4.12 We now view the above situation a little differently. Supposing that T is the
identity on N and on V/N, and setting R = T − I, show that there exists a
K ∈ Hom(V/N, V) such that R = K ∘ π. Show that for any coset A of N the action
of T on A can be viewed as translation through K(A). That is, if ξ ∈ A and η = K(A),
then T(ξ) = ξ + η.
4.13 Consider the map T: ⟨x₁, x₂⟩ ↦ ⟨x₁ + 2x₂, x₂⟩ in Hom ℝ², and let N be
the null space of R = T − I. Identify N and show that T is the identity on N and
on ℝ²/N. Find the map K of the above exercise. Such a mapping T is called a shear
transformation of ℝ² parallel to N. Draw the unit square and its image under T.
4.14 If we remember that the linear span L(A) of a subset A of a vector space V can
be defined as the intersection of all the subspaces of V that include A, then the fact
that the intersection of any collection of affine subspaces of a vector space V is either
an affine subspace or empty suggests that we define the affine span M(A) of a nonempty
subset A ⊂ V as the intersection of all affine subspaces including A. Then we know
from (3) in our list of affine properties that M(A) is an affine subspace, and by its
definition above that it is the smallest affine subspace including A. We now naturally
wonder whether M(A) can be directly described in terms of linear combinations.
Show first that if α ∈ A, then M(A) = L(A − α) + α; then prove that M(A) is the
set of all linear combinations Σ xᵢαᵢ on A such that Σ xᵢ = 1.
4.15 Show that the linear span of a set B is the affine span of B ∪ {0}.
4.16 Show that M(A + γ) = M(A) + γ for any γ in V and that M(xA) = xM(A)
for any x in ℝ.
5. DIRECT SUMS
We come now to the heart of the chapter. It frequently happens that the study
of some phenomenon on a vector space V leads to a finite collection of subspaces
{Vᵢ} such that V is naturally isomorphic to the product space Πᵢ Vᵢ. Under
this isomorphism the maps θᵢ ∘ πᵢ on the product space become certain maps
Pᵢ in Hom V, and the projection-injection identities are reflected in the identities
Σ Pᵢ = I, Pⱼ ∘ Pⱼ = Pⱼ for all j, and Pᵢ ∘ Pⱼ = 0 if i ≠ j. Also, Vᵢ = range
Pᵢ. The product structure that V thus acquires is then used to study the phe-
nomenon that gave rise to it. For example, this is the way that we unravel the
structure of a linear transformation in Hom V, the study of which is one of the
central problems in linear algebra.
Direct sums. If V₁, ..., Vₙ are subspaces of the vector space V, then the
mapping π: ⟨α₁, ..., αₙ⟩ ↦ Σ₁ⁿ αᵢ is a linear transformation from Π₁ⁿ Vᵢ
to V, since it is the sum π = Σ₁ⁿ πᵢ of the coordinate projections.
Definition. We shall say that the Vᵢ's are independent if π is injective and
that V is the direct sum of the Vᵢ's if π is an isomorphism. We express the
latter relationship by writing V = V₁ ⊕ ⋯ ⊕ Vₙ = ⊕₁ⁿ Vᵢ.
Thus V = ⊕ᵢ₌₁ⁿ Vᵢ if and only if π is injective and surjective, i.e., if and
only if the subspaces {Vᵢ}₁ⁿ are both independent and span V. A useful restate-
ment of the direct sum condition is that each α ∈ V is uniquely expressible as
a sum Σ₁ⁿ αᵢ, with αᵢ ∈ Vᵢ for all i; α has some such expression because the Vᵢ's
span V, and the expression is unique by their independence.
For example, let V = C(ℝ) be the space of real-valued continuous functions
on ℝ, let Vₑ be the subset of even functions (functions f such that f(−x) = f(x)
for all x), and let Vₒ be the subset of odd functions (functions such that f(−x) =
−f(x) for all x). It is clear that Vₑ and Vₒ are subspaces of V, and we claim that
V = Vₑ ⊕ Vₒ. To see this, note that for any f in V, g(x) = (f(x) + f(−x))/2
is even, h(x) = (f(x) − f(−x))/2 is odd, and f = g + h. Thus V = Vₑ + Vₒ.
Moreover, this decomposition of f is unique, for if f = g₁ + h₁ also, where g₁
is even and h₁ is odd, then g − g₁ = h₁ − h, and therefore g − g₁ = 0 =
h₁ − h, since the only function that is both even and odd is zero. The even-odd
components of eˣ are the hyperbolic cosine and sine functions:
eˣ = (eˣ + e⁻ˣ)/2 + (eˣ − e⁻ˣ)/2 = cosh x + sinh x.
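A small sketch (ours) of this decomposition, tested on f(x) = eˣ, whose parts are cosh and sinh:

```python
import numpy as np

def even_part(f):
    return lambda x: (f(x) + f(-x)) / 2

def odd_part(f):
    return lambda x: (f(x) - f(-x)) / 2

g, h = even_part(np.exp), odd_part(np.exp)
x = 1.5
print(np.isclose(g(x), np.cosh(x)))         # True: the even part of e^x is cosh
print(np.isclose(h(x), np.sinh(x)))         # True: the odd part of e^x is sinh
print(np.isclose(g(x) + h(x), np.exp(x)))   # True: f = g + h
```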
Since π is injective if and only if its null space is {0} (Lemma 1.1), we have:
Lemma 5.1. The independence of the subspaces {Vᵢ}₁ⁿ is equivalent to the
property that if αᵢ ∈ Vᵢ for all i and Σ₁ⁿ αᵢ = 0, then αᵢ = 0 for all i.
Corollary. If the subspaces {Vᵢ}₁ⁿ are independent, αᵢ ∈ Vᵢ for all i, and
Σ₁ⁿ αᵢ is an element of Vⱼ, then αᵢ = 0 for i ≠ j.
We leave the proof to the reader.
The case of two subspaces is particularly simple.
Lemma 5.2. The subspaces M and N of V are independent if and only if
M ∩ N = {0}.
Proof. If α ∈ M, β ∈ N, and α + β = 0, then α = −β ∈ M ∩ N. If M ∩ N =
{0}, this will further imply that α = β = 0, so M and N are independent.
On the other hand, if 0 ≠ β ∈ M ∩ N, and if we set α = −β, then α ∈ M,
β ∈ N, and α + β = 0, so M and N are not independent. □
Note that the first argument above is simply the general form of the unique-
ness argument we gave earlier for the even-odd decomposition of a function
on ℝ.
Corollary. V = M ⊕ N if and only if V = M + N and M ∩ N = {0}.
Definition. If V = M ⊕ N, then M and N are called complementary sub-
spaces, and each is a complement of the other.
Warning: A subspace M of V does not have a unique complementary subspace
unless M is trivial (that is, M = {0} or M = V). If we view ℝ³ as coordinatized
Euclidean 3-space, then M is a proper subspace if and only if M is a plane con-
taining the origin or M is a line through the origin (see Fig. 1.9). If M and N are
[Fig. 1.9: ℝ³ = N ⊕ L; ξ = η + λ.]
proper subspaces one of which is a plane and the other a line not lying in that
plane, then M and N are complementary subspaces. Moreover, these are the
only nontrivial complementary pairs in ℝ³. The reader will be asked to prove
some of these facts in the exercises, and they all will be clear by the middle of
the next chapter.
The following lemma is technically useful.
Lemma 5.3. If V₁ and V₀ are independent subspaces of V and {Vᵢ}₂ⁿ are
independent subspaces of V₀, then {Vᵢ}₁ⁿ are independent subspaces of V.
Proof. If αᵢ ∈ Vᵢ for all i and Σ₁ⁿ αᵢ = 0, then, setting α₀ = Σ₂ⁿ αᵢ, we have
α₁ + α₀ = 0, with α₀ ∈ V₀. Therefore, α₁ = α₀ = 0 by the independence of
V₁ and V₀. But then α₂ = α₃ = ⋯ = αₙ = 0 by the independence of
{Vᵢ}₂ⁿ, and we are done (Lemma 5.1). □
Corollary. V = V₁ ⊕ V₀ and V₀ = ⊕ᵢ₌₂ⁿ Vᵢ together imply that
V = ⊕ᵢ₌₁ⁿ Vᵢ.
Projections. If V = ⊕ᵢ₌₁ⁿ Vᵢ, if π is the isomorphism ⟨α₁, ..., αₙ⟩ ↦
α = Σ₁ⁿ αᵢ, and if πⱼ is the jth projection map ⟨α₁, ..., αₙ⟩ ↦ αⱼ from
Πᵢ₌₁ⁿ Vᵢ to Vⱼ, then (πⱼ ∘ π⁻¹)(α) = αⱼ.
Definition. We call αⱼ the jth component of α, and we call the linear map
Pⱼ = πⱼ ∘ π⁻¹ the projection of V onto Vⱼ (with respect to the given direct
sum decomposition of V). Since each α in V is uniquely expressible as a sum
α = Σ₁ⁿ αᵢ, with αᵢ in Vᵢ for all i, we can view Pⱼ(α) = αⱼ as "the part of
α in Vⱼ".
This use of the word "projection" is different from its use in the Cartesian
product situation, and each is different from its use in the quotient space con-
text (Section 0.12). It is apparent that these three uses are related, and the
ambiguity causes little confusion since the proper meaning is always clear from
the context.
Theorem 5.1. If the maps Pᵢ are the above projections, then range Pᵢ = Vᵢ,
Pᵢ ∘ Pⱼ = 0 for i ≠ j, and Σ₁ⁿ Pᵢ = I.
Proof. Since π is an isomorphism and Pⱼ = πⱼ ∘ π⁻¹, we have range Pⱼ =
range πⱼ = Vⱼ. Next, it follows directly from the corollary to Lemma 5.1 that
if α ∈ Vⱼ, then Pᵢ(α) = 0 for i ≠ j, and so Pᵢ ∘ Pⱼ = 0 for i ≠ j. Finally,
Σᵢ Pᵢ = Σᵢ πᵢ ∘ π⁻¹ = (Σᵢ πᵢ) ∘ π⁻¹ = π ∘ π⁻¹ = I, and we are done. □
The above projection properties are clearly the reflection in V of the pro-
jection-injection identities for the isomorphic space Πᵢ Vᵢ.
A converse theorem is also true.
Theorem 5.2. If {Pᵢ}₁ⁿ ⊂ Hom V satisfy Σᵢ Pᵢ = I and Pᵢ ∘ Pⱼ = 0 for
i ≠ j, and if we set Vᵢ = range Pᵢ, then V = ⊕ᵢ₌₁ⁿ Vᵢ, and Pᵢ is the
corresponding projection on Vᵢ.
Proof. The equation α = I(α) = Σᵢ Pᵢ(α) shows that the subspaces {Vᵢ}₁ⁿ
span V. Next, if β ∈ Vⱼ, then Pᵢ(β) = 0 for i ≠ j, since β ∈ range Pⱼ and
Pᵢ ∘ Pⱼ = 0 if i ≠ j. Then also Pⱼ(β) = (I − Σᵢ≠ⱼ Pᵢ)(β) = β.
Now consider α = Σᵢ αᵢ for any choice of αᵢ ∈ Vᵢ. Using the above two facts,
we have Pⱼ(α) = Pⱼ(Σᵢ₌₁ⁿ αᵢ) = Σᵢ₌₁ⁿ Pⱼ(αᵢ) = αⱼ. Therefore, α = 0 implies
that αⱼ = Pⱼ(0) = 0 for all j, and the subspaces Vᵢ are independent.
Consequently, V = ⊕ᵢ Vᵢ. Finally, the fact that α = Σ Pᵢ(α) and Pᵢ(α) ∈ Vᵢ
for all i shows that Pⱼ(α) is the jth component of α for every α and therefore that
Pⱼ is the projection of V onto Vⱼ. □
There is an intrinsic characterization of the kind of map that is a projection.
Lemma 5.4. The projections Pᵢ are idempotent (Pᵢ² = Pᵢ), or, equivalently,
each is the identity on its range. The null space of Pᵢ is the sum of the spaces
Vⱼ for j ≠ i.
Proof. Pⱼ² = Pⱼ ∘ (I − Σᵢ≠ⱼ Pᵢ) = Pⱼ ∘ I = Pⱼ. Since this can be rewritten
as Pⱼ(Pⱼ(α)) = Pⱼ(α) for every α in V, it says exactly that Pⱼ is the identity
on its range.
Now set Wᵢ = Σⱼ≠ᵢ Vⱼ, and note that if β ∈ Wᵢ, then Pᵢ(β) = 0 since
Pᵢ[Vⱼ] = 0 for j ≠ i. Thus Wᵢ ⊂ N(Pᵢ). Conversely, if Pᵢ(α) = 0, then α =
I(α) = Σⱼ Pⱼ(α) = Σⱼ≠ᵢ Pⱼ(α) ∈ Wᵢ. Thus N(Pᵢ) ⊂ Wᵢ, and the two spaces
are equal. □
Conversely:
Lemma 5.5. If P ∈ Hom(V) is idempotent, then V is the direct sum of its
range and null space, and P is the corresponding projection on its range.
Proof. Setting Q = I − P, we have PQ = P − P² = 0. Therefore, V is the
direct sum of the ranges of P and Q, and P is the corresponding projection on its
range, by the above theorem. Moreover, the range of Q is the null space of P,
by the corollary. □
If V = M ⊕ N and P is the corresponding projection on M, we call P the
projection on M along N. The projection P is not determined by M alone, since
M does not determine N. A pair P and Q in Hom V such that P + Q = I and
PQ = QP = 0 is called a pair of complementary projections.
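A tiny numerical illustration (ours): the idempotent matrix below is the projection of ℝ² on M = span{⟨1, 0⟩} along N = span{⟨1, −1⟩}; choosing a different complement of M would give a different P with the same range.

```python
import numpy as np

P = np.array([[1.0, 1.0],
              [0.0, 0.0]])      # projection on M = span{(1,0)} along N = span{(1,-1)}
Q = np.eye(2) - P               # the complementary projection

print(np.allclose(P @ P, P))                          # True: P is idempotent
print(np.allclose(P @ Q, 0), np.allclose(Q @ P, 0))   # True True
print(P @ np.array([1.0, -1.0]))                      # [0. 0.]: N is the null space of P
```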
In the above discussion we have neglected another fine point. Strictly
speaking, when we form the sum π = Σ₁ⁿ πᵢ, we are treating each πⱼ as though
it were from Π₁ⁿ Vᵢ to V, whereas actually the codomain of πⱼ is Vⱼ. And we
want Pⱼ to be from V to V, whereas πⱼ ∘ π⁻¹ has codomain Vⱼ, so the equation
Pⱼ = πⱼ ∘ π⁻¹ can't quite be true either. To repair these flaws we have to
introduce the injection ιⱼ: Vⱼ → V, which is the identity map on Vⱼ, but which
views Vⱼ as a subspace of V and so takes V as its codomain. If our concept of a
mapping includes a codomain possibly larger than the range, then we have to
admit such identity injections. Then, setting π̃ⱼ = ιⱼ ∘ πⱼ, we have the correct
equations π = Σ₁ⁿ π̃ᵢ and Pⱼ = π̃ⱼ ∘ π⁻¹.
EXERCISES
5.1 Prove the corollary to Lemma 5.1.
5.2 Let α be the vector ⟨1, 1, 1⟩ in ℝ³, and let M = ℝα be its one-dimensional
span. Show that each of the three coordinate planes is a complement of M.
5.3 Show that a finite product space V = Π₁ⁿ Vᵢ has subspaces {Wᵢ}₁ⁿ such that
Wᵢ is isomorphic to Vᵢ and V = ⊕₁ⁿ Wᵢ. Show how the corresponding projections
{Pᵢ} are related to the πᵢ's and θᵢ's.
5.4 If T ∈ Hom(V, W), show that (the graph of) T is a complement of W′ =
{0} × W in V × W.
5.5 If l is a linear functional on V (l ∈ Hom(V, ℝ) = V*), and if α is a vector in V
such that l(α) ≠ 0, show that V = N ⊕ M, where N is the null space of l and M = ℝα
is the linear span of α. What does this result say about complements in ℝ³?
5.6 Show that any complement M of a subspace N of a vector space V is isomorphic
to the quotient space V/N.
5.7 We suppose again that every subspace has a complement. Show that if
T ∈ Hom V is not injective, then there is a nonzero S in Hom V such that T ∘ S = 0.
Show that if T ∈ Hom V is not surjective, then there is a nonzero S in Hom V such
that S ∘ T = 0.
5.8 Using the above exercise for half the arguments, show that T ∈ Hom V is
injective if and only if T ∘ S = 0 ⇒ S = 0, and that T is surjective if and only if
S ∘ T = 0 ⇒ S = 0. We thus have characterizations of injectivity and surjectivity that
are formal, in the sense that they do not refer to the fact that S and T are transformations,
but refer only to the algebraic properties of S and T as elements of an algebra.
5.9 Let M and N be complementary subspaces of a vector space V, and let X be a
subspace such that X ∩ N = {0}. Show that there is a linear injection from X to M.
[Hint: Consider the projection P of V onto M along N.] Show that any two comple-
ments of a subspace N are isomorphic by showing that the above injection is surjective
if and only if X is a complement of N.
5.10 Going back to the first point of the preceding exercise, let Y be a complement of
P[X] in M. Show that X ∩ Y = {0} and that X ⊕ Y is a complement of N.
5.11 Let M be a proper subspace of V, and let {αᵢ : i ∈ I} be a finite set in V. Set
L = L({αᵢ}), and suppose that M + L = V. Show that there is a subset J ⊂ I such
that {αᵢ : i ∈ J} spans a complement of M. [Hint: Consider a largest possible subset J
such that M ∩ L({αᵢ}_J) = {0}.]
5.12 Given T ∈ Hom(V, W) and S ∈ Hom(W, X), show that
a) S ∘ T is surjective ⇔ S is surjective and R(T) + N(S) = W;
b) S ∘ T is injective ⇔ T is injective and R(T) ∩ N(S) = {0};
c) S ∘ T is an isomorphism ⇔ S is surjective, T is injective, and W = R(T) ⊕ N(S).
5.13 Assuming that every subspace of V has a complement, show that T ∈ Hom V
satisfies T² = 0 if and only if V has a direct sum decomposition V = M ⊕ N such that
T = 0 on N and T[M] ⊂ N.
5.14 Suppose next that T³ = 0 but T² ≠ 0. Show that V can be written as V =
V₁ ⊕ V₂ ⊕ V₃, where T[V₁] ⊂ V₂, T[V₂] ⊂ V₃, and T = 0 on V₃. (Assume again
that any subspace of a vector space has a complement.)
5.15 We now suppose that Tⁿ = 0 but Tⁿ⁻¹ ≠ 0. Set Nᵢ = null space (Tⁱ) for
i = 1, ..., n − 1, and let V₁ be a complement of Nₙ₋₁ in V. Show first that
T[V₁] ∩ Nₙ₋₂ = {0}
and that T[V₁] ⊂ Nₙ₋₁. Extend T[V₁] to a complement V₂ of Nₙ₋₂ in Nₙ₋₁, and
show that in this way we can construct subspaces V₁, ..., Vₙ such that
V = ⊕₁ⁿ Vᵢ,   T[Vᵢ] ⊂ Vᵢ₊₁ for i < n,   and   T[Vₙ] = {0}.
On solving a linear equation. Many important problems in mathematics are
in the following general form. A linear operator T: V → W is given, and for a
given η ∈ W the equation T(ξ) = η is to be solved for ξ ∈ V. In our terms, the
condition that there exist a solution is exactly the condition that η be in the
range space of T. In special circumstances this condition can be given more or
less useful equivalent alternative formulations. Let us suppose that we know
how to recognize R(T), in which case we may as well make it the new codomain,
and so assume that T is surjective. There still remains the problem of determin-
ing what we mean by solving the equation. The universal principle running
through all the important instances of the problem is that a solution process
calculates a right inverse to T, that is, a linear operator S: W → V such that
T ∘ S = I_W, the identity on W. Thus a solution process picks one solution
vector ξ ∈ V for each η ∈ W in such a way that the solving ξ varies linearly with
η. Taking this as our meaning of solving, we have the following fundamental
reformulation.
Theorem 5.3. Let T be a surjective linear map from the vector space V
to the vector space W, and let N be its null space. Then a subspace M is a
complement of N if and only if the restriction of T to M is an isomorphism
from M to W. The mapping M ↦ (T ↾ M)⁻¹ is a bijection from the set
of all such complementary subspaces M to the set of all linear right inverses
of T.
Proof. It should be clear that a subspace M is the range of a linear right inverse
of T (a map S such that T ∘ S = I_W) if and only if T ↾ M is an isomorphism to W,
in which case S = (T ↾ M)⁻¹. Strictly speaking, the right inverse must be from
W to V and therefore must be R = ι_M ∘ S, where ι_M is the identity injection
from M to V. Then (R ∘ T)² = R ∘ (T ∘ R) ∘ T = R ∘ I_W ∘ T = R ∘ T, and
R ∘ T is a projection whose range is M and whose null space is N (since R is
injective). Thus V = M ⊕ N. Conversely, if V = M ⊕ N, then T ↾ M is
injective because M ∩ N = {0} and surjective because M + N = V implies
that W = T[V] = T[M + N] = T[M] + T[N] = T[M] + {0} = T[M]. □
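As a numerical illustration (ours, using the Moore–Penrose formula, which corresponds to choosing the complement M orthogonal to N): for a surjective T: ℝ³ → ℝ², the map S = Tᵀ(TTᵀ)⁻¹ is a linear right inverse of T.

```python
import numpy as np

t = np.array([[2.0, -1.0, 1.0],
              [1.0,  1.0, 4.0]])      # a surjective T: R^3 -> R^2

s = t.T @ np.linalg.inv(t @ t.T)      # a right inverse: T o S = I on R^2
print(np.allclose(t @ s, np.eye(2)))  # True

eta = np.array([1.0, 2.0])
xi = s @ eta                          # the solution that this S picks out
print(np.allclose(t @ xi, eta))       # True: T(xi) = eta, and xi varies linearly with eta
```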
Polynomials in T. The material in this subsection will be used in our study of
differential equations with constant coefficients and in the proof of the diagonal-
izability of a symmetric matrix. In linear algebra it is basic in almost any
approach to the canonical forms of matrices.
If p₁(t) = Σ₀ᵐ aᵢtⁱ and p₂(t) = Σ₀ⁿ bⱼtʲ are any two polynomials, then their
product is the polynomial
p(t) = p₁(t)p₂(t) = Σ₀ᵐ⁺ⁿ cₖtᵏ,
where cₖ = Σᵢ₊ⱼ₌ₖ aᵢbⱼ = Σᵢ₌₀ᵏ aᵢbₖ₋ᵢ. Now let T be any fixed element of
Hom(V), and for any polynomial q(t) let q(T) be the transformation obtained
by replacing t by T. That is, if q(t) = Σ₀ⁿ cₖtᵏ, then q(T) = Σ₀ⁿ cₖTᵏ, where, of
course, Tˡ is the composition product T ∘ T ∘ ⋯ ∘ T with l factors. Then the
bilinearity of composition (Theorem 3.3) shows that if p(t) = p₁(t)p₂(t),
then p(T) = p₁(T) ∘ p₂(T). In particular, any two polynomials in T commute
with each other under composition. More simply, the commutative law for
addition implies that
if p(t) = p₁(t) + p₂(t), then p(T) = p₁(T) + p₂(T).
The mapping p(t) ↦ p(T) from the algebra of polynomials to the algebra
Hom(V) thus preserves addition, multiplication, and (obviously) scalar multipli-
cation. That is, it preserves all the operations of an algebra and is therefore
what is called an (algebra) homomorphism.
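A quick check (our own sketch) that the substitution t ↦ T respects products, with matrices standing in for Hom(V) and Horner's rule for evaluation:

```python
import numpy as np

def poly_of_T(coeffs, T):
    """Evaluate p(T) for p(t) = coeffs[0] + coeffs[1] t + ... by Horner's rule."""
    n = T.shape[0]
    result = np.zeros((n, n))
    for c in reversed(coeffs):
        result = result @ T + c * np.eye(n)
    return result

T = np.array([[1.0, 2.0],
              [0.0, 3.0]])
p1, p2 = [1.0, 1.0], [-2.0, 0.0, 1.0]           # p1(t) = 1 + t, p2(t) = t^2 - 2
p = np.polynomial.polynomial.polymul(p1, p2)    # coefficients of the product p1 p2
print(np.allclose(poly_of_T(p, T),
                  poly_of_T(p1, T) @ poly_of_T(p2, T)))   # True: p(T) = p1(T) o p2(T)
```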
The word "homomorphism" is a general term describing a mapping θ
between two algebraic systems of the same kind such that θ preserves the
operations of the system. Thus a homomorphism between vector spaces is
simply a linear transformation, and a homomorphism between groups is a
mapping preserving the one group operation. An accessible, but not really
typical, example of the latter is the logarithm function, which is a homomorphism
from the multiplicative group of positive real numbers to the additive group of ℝ.
The logarithm function is actually a bijective homomorphism and is therefore
a group isomorphism.
If this were a course in algebra, we would show that the division algorithm
and the properties of the degree of a polynomial imply the following theorem.
(However, see Exercises 5.16 through 5.20.)
Theorem 5.4. If p₁(t) and p₂(t) are relatively prime polynomials, then there
exist polynomials a₁(t) and a₂(t) such that
a₁(t)p₁(t) + a₂(t)p₂(t) = 1.
By relatively prime we mean having no common factors except constants.
We shall assume this theorem and the results of the discussion preceding it in
proving our next theorem.
We say that a subspace M ⊂ V is invariant under T ∈ Hom(V) if T[M] ⊂ M
[that is, T ↾ M ∈ Hom(M)].
Theorem 5.5. Let T be any transformation in Hom V, and let q be any
polynomial. Then the null space N of q(T) is invariant under T, and if
q = q₁q₂ is any factorization of q into relatively prime factors and N₁ and
N₂ are the null spaces of q₁(T) and q₂(T), respectively, then N = N₁ ⊕ N₂.
Proof. Since T ∘ q(T) = q(T) ∘ T, we see that if q(T)(α) = 0, then q(T)(Tα) =
T(q(T)(α)) = 0, so T[N] ⊂ N. Note also that since q(T) = q₁(T) ∘ q₂(T),
it follows that any α in N₂ is also in N, so N₂ ⊂ N. Similarly, N₁ ⊂ N. We can
therefore replace V by N and T by T ↾ N; hence we can assume that T ∈ Hom N
and q(T) = q₁(T) ∘ q₂(T) = 0.
Now choose polynomials a₁ and a₂ so that a₁q₁ + a₂q₂ = 1. Since p ↦ p(T)
is an algebra homomorphism, we then have
a₁(T) ∘ q₁(T) + a₂(T) ∘ q₂(T) = I.
Set A₁ = a₁(T), etc., so that A₁ ∘ Q₁ + A₂ ∘ Q₂ = I, Q₁ ∘ Q₂ = 0, and all the
operators Aᵢ, Qᵢ commute with each other. Finally, set Pᵢ = Aᵢ ∘ Qᵢ = Qᵢ ∘ Aᵢ
for i = 1, 2. Then P₁ + P₂ = I and P₁P₂ = P₂P₁ = 0. Thus P₁ and P₂ are
projections, and N is the direct sum of their ranges: N = V₁ ⊕ V₂. Since each
range is the null space of the other projection, we can rewrite this as N =
N₁ ⊕ N₂, where Nᵢ = N(Pᵢ). It remains for us to show that N(Pᵢ) = N(Qᵢ).
Note first that since Q₁ ∘ P₂ = Q₁ ∘ Q₂ ∘ A₂ = 0, we have Q₁ = Q₁ ∘ I =
Q₁ ∘ (P₁ + P₂) = Q₁ ∘ P₁. Then the two identities Pᵢ = Aᵢ ∘ Qᵢ and Qᵢ =
Qᵢ ∘ Pᵢ show that the null space of each of Pᵢ and Qᵢ is included in the other, and
so they are equal. This completes the proof of the theorem. □
Corollary. Let p(t) = Πᵢ₌₁ᵐ pᵢ(t) be a factorization of the polynomial
p(t) into relatively prime factors, let T be an element of Hom(V), and set
Nᵢ = N(pᵢ(T)) for i = 1, ..., m and N = N(p(T)). Then N and all the
Nᵢ are invariant under T, and N = ⊕ᵢ₌₁ᵐ Nᵢ.
Proof. The proof is by induction on m. The theorem is the case m = 2, and if
we set q = Π₂ᵐ pᵢ(t) and M = N(q(T)), then the theorem implies that
N = N₁ ⊕ M and that N₁ and M are invariant under T. Restricting T to M,
we see that the inductive hypothesis implies that M = ⊕ᵢ₌₂ᵐ Nᵢ and that Nᵢ is
invariant under T for i = 2, ..., m. The corollary to Lemma 5.3 then yields
our result. □
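A concrete instance (ours): for the symmetric matrix below, q(t) = (t − 3)(t + 1) kills T, and the null spaces of T − 3I and T + I are the two eigenspaces, which decompose ℝ² exactly as the theorem predicts.

```python
import numpy as np

T = np.array([[1.0, 2.0],
              [2.0, 1.0]])     # eigenvalues 3 and -1
I = np.eye(2)

print(np.allclose((T - 3*I) @ (T + I), 0))   # True: q(T) = 0, so here N = R^2
# N_1 = N(T - 3I) is spanned by (1, 1); N_2 = N(T + I) by (1, -1); R^2 = N_1 (+) N_2.
v1, v2 = np.array([1.0, 1.0]), np.array([1.0, -1.0])
print(np.allclose(T @ v1, 3*v1), np.allclose(T @ v2, -v2))   # True True
```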
EXERCISES
5.16 Presumably the reader knows (or can see) that the degree d(P) of a polynomial
P satisfies the laws
d(P + Q) ≤ max {d(P), d(Q)},
d(P · Q) = d(P) + d(Q) if both P and Q are nonzero.
The degree of the zero polynomial is undefined. (It would have to be −∞!) By induc-
tion on the degree of P, prove that for any two polynomials P and D, with D ≠ 0,
there are polynomials Q and R such that P = DQ + R and d(R) < d(D) or R = 0.
[Hint: If d(P) < d(D), we can take Q and R as what? If d(P) ≥ d(D), and if the lead-
ing terms of P and D are axⁿ and bxᵐ, respectively, with n ≥ m, then the polynomial
P′ = P − (a/b)xⁿ⁻ᵐ D
has degree less than d(P), so P′ = DQ′ + R′ by the inductive hypothesis. Now finish
the proof.]
5.17 Assuming the above result, prove that R and Q are uniquely determined by
P and D. (Assume also that P = DQ′ + R′, and prove from the properties of degree
that R′ = R and Q′ = Q.) These two results together constitute the division algorithm
for polynomials.
5.18 If P is any polynomial
P(x) = Σ₀ⁿ aₖxᵏ,
and if t is any number, then of course P(t) is the number
Σ₀ⁿ aₖtᵏ.
Prove from the division algorithm that for any polynomial P and any number t there
is a polynomial Q such that
P(x) = (x − t)Q(x) + P(t),
and therefore that P(x) is divisible by (x − t) if and only if P(t) = 0.
5.19 Let P and Q be nonzero polynomials, and choose polynomials A₀ and B₀ such
that among all the polynomials of the form AP + BQ the polynomial
D = A₀P + B₀Q
is nonzero and has minimum degree. Prove that D is a factor of both P and Q. (Sup-
pose that D does not divide P, and apply the division algorithm to get a contradiction
with the choice of A₀ and B₀.)
5.20 Let P and Q be nonzero relatively prime polynomials. This means that if E is a
common factor of P and Q (P = EP′, Q = EQ′), then E is a constant. Prove that
there are polynomials A and B such that A(x)P(x) + B(x)Q(x) = 1. (Apply the
above exercise.)
5.21 In the context of Theorem 5.5, show that the restriction of q₂(T) = Q₂ to N₁
is an isomorphism (from N₁ to N₁).
5.22 An involution on V is a mapping T ∈ Hom V such that T² = I. Show that if
T is an involution, then V is a direct sum V = V₁ ⊕ V₂, where T(ξ) = ξ for every
ξ ∈ V₁ (T = I on V₁) and T(ξ) = −ξ for every ξ ∈ V₂ (T = −I on V₂). (Apply
Theorem 5.5.)
5.23 We noticed earlier (in an exercise) that if φ is any mapping from a set A to a
set B, then f ↦ f ∘ φ is a linear map T_φ from ℝ^B to ℝ^A. Show now that if ψ: B → C,
then T_{ψ∘φ} = T_φ ∘ T_ψ.
(This should turn out to be a direct consequence of the associativity of composition.)
5.24 Let A be any set, and let φ: A → A be such that φ ∘ φ(a) = a for every a.
Then T_φ: f ↦ f ∘ φ is an involution on V = ℝ^A (since T_φ ∘ T_φ = T_{φ∘φ} = I). Show that
the decomposition of ℝ^ℝ as the direct sum of the subspace of even functions and the
subspace of odd functions arises from an involution on ℝ^ℝ defined by such a map
φ: ℝ → ℝ.
5.25 Let V be a subspace of ℝ^ℝ consisting of differentiable functions, and suppose
that V is invariant under differentiation (f ∈ V ⇒ Df ∈ V). Suppose also that on V
the linear operator D ∈ Hom V satisfies D² − 2D − 3I = 0. Prove that V is the
direct sum of two subspaces M and N such that D = 3I on M and D = −I on N.
Actually, it follows that M is the linear span of a single vector, and similarly for N.
Find these two functions, if you can. (f′ = 3f ⇒ f = ?)
*Block decompositions of linear maps. Given T in Hom V and a direct sum decomposition V = ⊕₁ⁿ Vᵢ, with corresponding projections {Pᵢ}₁ⁿ, we can consider the maps Tᵢⱼ = Pᵢ ∘ T ∘ Pⱼ. Although Tᵢⱼ is from V to V, we may also want to consider it as being from Vⱼ to Vᵢ (in which case, strictly speaking, what is it?). We picture the Tᵢⱼ's arranged schematically in a rectangular array similar to a matrix, as indicated below for n = 2.

[T₁₁  T₁₂]
[T₂₁  T₂₂]

Furthermore, since T = Σᵢ,ⱼ Tᵢⱼ, we call the doubly indexed family the block decomposition of T associated with the given direct sum decomposition of V.
More generally, if T ∈ Hom(V, W) and W also has a direct sum decomposition W = ⊕ᵢ₌₁ᵐ Wᵢ, with corresponding projections {Qᵢ}₁ᵐ, then the family {Tᵢⱼ} defined by Tᵢⱼ = Qᵢ ∘ T ∘ Pⱼ and pictured as an m × n rectangular array is the block decomposition of T with respect to the two direct sum decompositions.
Whenever T in Hom V has a special relationship to a particular direct sum
decomposition of V, the corresponding block diagram may have features that
display these special properties in a vivid way; this then helps us to understand
the nature of T better and to calculate with it more easily.
For example, if V = V₁ ⊕ V₂, then V₁ is invariant under T (i.e., T[V₁] ⊂ V₁) if and only if the block diagram is upper triangular, as shown in the following diagram.

[T₁₁  T₁₂]
[ 0   T₂₂]
Suppose, next, that T² = 0. Letting V₁ be the range of T, and supposing that V₁ has a complement V₂, the reader should clearly see that the corresponding block diagram is

[0  T₁₂]
[0   0 ]

This form is called strictly upper triangular; it is upper triangular and also zero on the main diagonal. Conversely, if T has some strictly upper-triangular 2 × 2 block diagram, then T² = 0.
If R is a composition product, R = ST, then its block components can be computed in terms of those of S and T. Thus
Rᵢₖ = PᵢRPₖ = PᵢSTPₖ = PᵢS(Σ_{j=1}^n Pⱼ)TPₖ = Σ_{j=1}^n SᵢⱼTⱼₖ.
We have used the identities I = Σ_{j=1}^n Pⱼ and Pⱼ = Pⱼ². The 2 × 2 case is pictured below.

[S₁₁T₁₁ + S₁₂T₂₁   S₁₁T₁₂ + S₁₂T₂₂]
[S₂₁T₁₁ + S₂₂T₂₁   S₂₁T₁₂ + S₂₂T₂₂]
From this we can read off a fact that will be useful to us later: If T is 2 × 2 upper triangular (T₂₁ = 0), and if Tᵢᵢ is invertible as a map from Vᵢ to Vᵢ (i = 1, 2), then T is invertible and its inverse is

[T₁₁⁻¹   −T₁₁⁻¹T₁₂T₂₂⁻¹]
[ 0          T₂₂⁻¹     ]

We find this solution by simply setting the product diagram equal to

[I  0]
[0  I]

and solving; but of course with the diagram in hand it can simply be checked to be correct.
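The checking can also be done numerically in the concrete case V = ℝ² ⊕ ℝ², with each block an ordinary 2 × 2 matrix; the particular blocks below are our own sample data:

    import numpy as np

    T11 = np.array([[2.0, 1.0], [0.0, 1.0]])
    T12 = np.array([[1.0, 3.0], [4.0, 1.0]])
    T22 = np.array([[1.0, 1.0], [1.0, 2.0]])
    Z = np.zeros((2, 2))

    T = np.block([[T11, T12], [Z, T22]])        # upper-triangular block diagram
    S = np.block([[np.linalg.inv(T11), -np.linalg.inv(T11) @ T12 @ np.linalg.inv(T22)],
                  [Z, np.linalg.inv(T22)]])     # the claimed inverse diagram

    print(np.allclose(T @ S, np.eye(4)), np.allclose(S @ T, np.eye(4)))  # True True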
EXERCISES
5.26 Show that if T ∈ Hom V, if V = ⊕₁ⁿ Vᵢ, and if {Pᵢ}₁ⁿ are the corresponding projections, then the sum of the transformations Tᵢⱼ = Pᵢ ∘ T ∘ Pⱼ is T.
5.27 If S and T are in Hom V and {Sᵢⱼ}, {Tᵢⱼ} are their block components with respect to some direct sum decomposition of V, show that Sᵢⱼ ∘ Tₗₖ = 0 if j ≠ l.
5.28 Verify that if T has an upper-triangular block diagram with respect to the direct sum decomposition V = V₁ ⊕ V₂, then V₁ is invariant under T.
5.29 Verify that if the diagram is strictly upper triangular, then T² = 0.
5.30 Show that if V = V₁ ⊕ V₂ ⊕ V₃ and T ∈ Hom V, then the subspaces Vᵢ are all invariant under T if and only if the block diagram for T is

[T₁₁   0    0 ]
[ 0   T₂₂   0 ]
[ 0    0   T₃₃]

Show that T is invertible if and only if Tᵢᵢ is invertible (as an element of Hom Vᵢ) for each i.
5.31 Supposing that T has an upper-triangular 2 × 2 block diagram and that Tᵢᵢ is invertible as an element of Hom Vᵢ for i = 1, 2, verify that T is invertible by forming the 2 × 2 block diagram that is the product of the diagram for T and the diagram given in the text as the inverse of T.
5.32 Supposing that T is as in the preceding exercise, show that S = T⁻¹ must have the given block diagram by considering the two equations T ∘ S = I and S ∘ T = I in their block form.
5.33 What would strictly upper triangular mean for a 3 × 3 block diagram? What is the corresponding property of T? Show that T has this property if and only if it has a strictly upper-triangular block diagram. (See Exercise 5.14.)
5.34 Suppose that T in Hom V satisfies Tⁿ = 0 (but Tⁿ⁻¹ ≠ 0). Show that T has a strictly upper-triangular n × n block decomposition. (Apply Exercise 5.15.)
6. BILINEARITY
Bilinear mappings. The notion of a bilinear mapping is important to the un-
derstanding of linear algebra because it is the vector setting for the duality
principle (Section 0.10).
Definition. If U, V, and W are vector spaces, then a mapping
ω: ⟨ξ, η⟩ ↦ ω(ξ, η)
from U × V to W is bilinear if it is linear in each variable when the other variable is held fixed.
That is, if we hold ξ fixed, then η ↦ ω(ξ, η) is linear [and so belongs to Hom(V, W)]; if we hold η fixed, then similarly ω(ξ, η) is in Hom(U, W) as a function of ξ. This is not the same notion as linearity on the product vector space U × V. For example, ⟨x, y⟩ ↦ x + y is a linear mapping from ℝ² to ℝ, but it is not bilinear. If y is held fixed, then the mapping x ↦ x + y is affine (translation through y), but it is not linear unless y is 0. On the other hand, ⟨x, y⟩ ↦ xy is a bilinear mapping from ℝ² to ℝ, but it is not linear. If y
is held fixed, then the mapping x ↦ yx is linear. But the sum of two ordered couples does not map to the sum of their images:
⟨x, y⟩ + ⟨u, v⟩ = ⟨x + u, y + v⟩ ↦ (x + u)(y + v),
which is not the sum of the images, xy + uv. Similarly, the scalar product (x, y) = Σ₁ⁿ xᵢyᵢ is bilinear from ℝⁿ × ℝⁿ to ℝ, as we observed in Section 2.
The linear meaning of bilinearity is partially explained in the following
theorem.
Theorem 6.1. If ω: U × V → W is bilinear, then, by duality, ω is equivalent to a linear mapping from U to Hom(V, W) and also to a linear mapping from V to Hom(U, W).
Proof. For each fixed η ∈ V let ω_η be the mapping ξ ↦ ω(ξ, η). That is, ω_η(ξ) = ω(ξ, η). Then ω_η ∈ Hom(U, W) by the bilinear hypothesis. The mapping η ↦ ω_η is thus from V to Hom(U, W), and its linearity is due to the linearity of ω in η when ξ is held fixed:
ω_{cη+c′ζ}(ξ) = ω(ξ, cη + c′ζ) = cω(ξ, η) + c′ω(ξ, ζ) = cω_η(ξ) + c′ω_ζ(ξ),
so that
ω_{cη+c′ζ} = cω_η + c′ω_ζ.
Similarly, if we define ω_ξ by ω_ξ(η) = ω(ξ, η), then ξ ↦ ω_ξ is a linear mapping from U to Hom(V, W). Conversely, if φ: U → Hom(V, W) is linear, then the function ω defined by ω(ξ, η) = φ(ξ)(η) is bilinear. Moreover, ω_ξ = φ(ξ), so that φ is the mapping ξ ↦ ω_ξ. □
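In programming terms, Theorem 6.1 is the familiar "currying" of a function of two variables; a sketch for the scalar product on ℝⁿ (illustration and names ours):

    import numpy as np

    def omega(xi, eta):              # a bilinear map from R^n x R^n to R
        return float(np.dot(xi, eta))

    def curry(w):
        # eta |-> omega_eta, where omega_eta(xi) = omega(xi, eta)
        return lambda eta: (lambda xi: w(xi, eta))

    w = curry(omega)
    xi = np.array([1.0, 2.0])
    eta1, eta2 = np.array([3.0, 0.0]), np.array([0.0, 5.0])
    # linearity of eta |-> omega_eta:
    print(w(eta1 + eta2)(xi), w(eta1)(xi) + w(eta2)(xi))   # 13.0 13.0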
We shall see that bilinearity occurs frequently. Sometimes the reinterpreta-
tion provided by the above theorem provides new insights; at other times it
seems less helpful.
For example, the composition map ⟨S, T⟩ ↦ S ∘ T is bilinear, and the corollary of Theorem 3.3, which in effect states that composition on the right by a fixed T is a linear map, is simply part of an explicit statement of the bilinearity. But the linear map T ↦ composition by T is a complicated object that we have no need for except in the case W = ℝ.
On the other hand, the linear combination formula Σ₁ⁿ xᵢαᵢ and Theorem 1.2 do receive new illumination.
Theorem 6.2. The mapping ω(x, α) = Σ₁ⁿ xᵢαᵢ is bilinear from ℝⁿ × Vⁿ to V. The mapping α ↦ ω_α is therefore a linear mapping from Vⁿ to Hom(ℝⁿ, V), and, in fact, is an isomorphism.
Proof. The linearity of ω in x for a fixed α was proved in Theorem 1.2, and its linearity in α for a fixed x is seen in the same way. Then α ↦ ω_α is linear by Theorem 6.1. Its bijectivity was implicit in Theorem 1.2. □
It should be remarked that we can use any finite index set I just as well as the special set n = {1, ..., n} and conclude that ω(x, α) = Σ_{i∈I} xᵢαᵢ is bilinear from ℝᴵ × Vᴵ to V and that α ↦ ω_α is an isomorphism from Vᴵ to Hom(ℝᴵ, V). Also note that ω_α = L_α in the terminology of Section 1.
Corollary. The scalar product (x, a) = Σ₁ⁿ xᵢaᵢ is bilinear from ℝⁿ × ℝⁿ to ℝ; therefore, a ↦ ω_a = L_a is an isomorphism from ℝⁿ to Hom(ℝⁿ, ℝ).
Natural isomorphisms. We often find two vector spaces related to each other
in such a way that a particular isomorphism between them is singled out. This
phenomenon is hard to pin down in general terms but easy to describe by
examples.
Duality is one source of such "natural" isomorphisms. For example, an m × n matrix {tᵢⱼ} is a real-valued function of the two variables ⟨i, j⟩, and as such it is an element of the Cartesian space ℝ^{m×n}. We can also view {tᵢⱼ} as a sequence of n column vectors in ℝᵐ. This is the dual point of view where we hold j fixed and obtain a function of i for each j. From this point of view {tᵢⱼ} is an element of (ℝᵐ)ⁿ. This correspondence between ℝ^{m×n} and (ℝᵐ)ⁿ is clearly an isomorphism, and is an example of a natural isomorphism.
We review next the various ways of looking at Cartesian n-space itself. One standard way of defining an ordered n-tuplet is by induction. The ordered triplet ⟨x, y, z⟩ is defined as the ordered pair ⟨⟨x, y⟩, z⟩, and the ordered n-tuplet ⟨x₁, ..., xₙ⟩ is defined as ⟨⟨x₁, ..., x_{n−1}⟩, xₙ⟩. Thus we define ℝⁿ inductively by setting ℝ¹ = ℝ and ℝⁿ = ℝ^{n−1} × ℝ.
The ordered n-tuplet can also be defined as the function on n = {1, ..., n} which assigns xᵢ to i. Then
⟨x₁, ..., xₙ⟩ = {⟨1, x₁⟩, ..., ⟨n, xₙ⟩},
and Cartesian n-space is ℝⁿ = ℝ^{1,...,n}.
Finally, we often wish to view Cartesian (n + m)-space as the Cartesian product of Cartesian n-space with Cartesian m-space, so we now take
⟨x₁, ..., x_{n+m}⟩ as ⟨⟨x₁, ..., xₙ⟩, ⟨x_{n+1}, ..., x_{n+m}⟩⟩,
and ℝ^{n+m} as ℝⁿ × ℝᵐ.
Here again if we pair two different models for the same n-tuplet, we have an
obvious natural isomorphism between the corresponding models for Cartesian
n-space.
Finally, the characteristic properties of Cartesian product spaces given in Theorems 3.6 and 3.7 yield natural isomorphisms. Theorem 3.6 says that an n-tuple of linear maps {Tᵢ}₁ⁿ on a common domain V is equivalent to a single n-tuple-valued map T, where T(ξ) = ⟨T₁(ξ), ..., Tₙ(ξ)⟩ for all ξ ∈ V. (This is duality again! Tᵢ(ξ) is a function of the two variables i and ξ.) And it is not hard to see that this identification of T with {Tᵢ}₁ⁿ is an isomorphism from Πᵢ Hom(V, Wᵢ) to Hom(V, Πᵢ Wᵢ).
Similarly, Theorem 3.7 identifies an n-tuple of linear maps {Tᵢ}₁ⁿ into a common codomain V with a single linear map T of an n-tuple variable, and this identification is a natural isomorphism from Π₁ⁿ Hom(Wᵢ, V) to Hom(Π₁ⁿ Wᵢ, V).
An arbitrary isomorphism between two vector spaces identifies them in a
transient way. For the moment we think of the vector spaces as representing the
same abstract space, but only so long as the isomorphism is before us. If we
shift to a different isomorphism between them, we obtain a new temporary
identification. Natural isomorphisms, on the other hand, effect permanent
identifications, and we think of paired objects as being two aspects of the same
object in a deeper sense. Thus we think of a matrix as "being" either a sequence
of row vectors, a sequence of column vectors, or a single function of two integer
indices. We shall take a final look at this question at the end of Section 3 in the
next chapter.
*We can now make the ultimate dissection of the theorems centering around the linear combination formula. Laws S1 through S3 state exactly that the scalar product xα is bilinear. More precisely, they state that the mapping s: ⟨x, α⟩ ↦ xα from ℝ × W to W is bilinear. In the language of Theorem 6.1, xα = ω_α(x), and from that theorem we conclude that the mapping α ↦ ω_α is an isomorphism from W to Hom(ℝ, W).
This isomorphism between W and Hom(ℝ, W) extends to an isomorphism from Wⁿ to (Hom(ℝ, W))ⁿ, which in turn is naturally isomorphic to Hom(ℝⁿ, W) by the second Cartesian product isomorphism. Thus Wⁿ is naturally isomorphic to Hom(ℝⁿ, W); the mapping is α ↦ L_α, where L_α(x) = Σ₁ⁿ xᵢαᵢ.
In particular, ℝⁿ is naturally isomorphic to the space Hom(ℝⁿ, ℝ) of all linear functionals on ℝⁿ, the n-tuple a corresponding to the functional ω_a defined by ω_a(x) = Σ₁ⁿ aᵢxᵢ.
Also, (ℝᵐ)ⁿ is naturally isomorphic to Hom(ℝⁿ, ℝᵐ). And since ℝ^{m×n} is naturally isomorphic to (ℝᵐ)ⁿ, it follows that the spaces ℝ^{m×n} and Hom(ℝⁿ, ℝᵐ) are naturally isomorphic. This is simply our natural association of a transformation T in Hom(ℝⁿ, ℝᵐ) to an m × n matrix {tᵢⱼ}.
CHAPTER 2
FINITE-DIMENSIONAL VECTOR SPACES
We have defined a vector space to be finite-dimensional if it has a finite spanning
set. In this chapter we shall focus our attention on such spaces, although this
restriction is unnecessary for some of our discussion. We shall see that we can
assign to each finite-dimensional space V a unique integer, called the dimension
of V, which satisfies our intuitive requirements about dimensionality and which
becomes a principal tool in the deeper explorations into the nature of such
spaces. A number of "dimensional identities" are crucial in these further
investigations. We shall find that the dual space of all linear functionals on V,
V* = Hom(V, IR), plays a more satisfactory role in finite-dimensional theory
than in the context of general vector spaces. (However, we shall see later in
the book that when we add limit theory to our algebra, there are certain special
infinite-dimensional vector spaces for which the dual space plays an equally
important role.) A finite-dimensional space can be characterized as a vector
space isomorphic to some Cartesian space IRn , and such an isomorphism allows a
transformation T in Hom V to be "transferred" to IRn , whereupon it acquires a
matrix. The theory of linear transformations on such spaces is therefore mirrored
completely by the theory of matrices. In this chapter we shall push much
deeper into the nature of this relationship than we did in Chapter 1. We also
include a section on matrix computations, a brief section describing the trace
and determinant functions, and a short discussion of the diagonalization of a
quadratic form.
1. BASES
Consider again a fixed finite indexed set of vectors α = {αᵢ : i ∈ I} in V and the corresponding linear combination map L_α: x ↦ Σ xᵢαᵢ from ℝᴵ to V having α as skeleton.
Definition. The finite indexed set {αᵢ : i ∈ I} is independent if the above mapping L_α is injective, and {αᵢ} is a basis for V if L_α is an isomorphism (onto V). In this situation we call {αᵢ : i ∈ I} an ordered basis or frame if I = n = {1, ..., n} for some positive integer n.
Thus {αᵢ : i ∈ I} is a basis if and only if for each ξ ∈ V there exists a unique indexed "coefficient" set x = {xᵢ : i ∈ I} ∈ ℝᴵ such that ξ = Σ xᵢαᵢ. The
numbers xᵢ always exist because {αᵢ : i ∈ I} spans V, and x is unique because L_α is injective.
For example, we can check directly that h₁ = ⟨2, 1⟩ and h₂ = ⟨1, −3⟩ form a basis for ℝ². The problem is to show that for each y ∈ ℝ² there is a unique x such that
y = Σ₁² xᵢhᵢ = x₁⟨2, 1⟩ + x₂⟨1, −3⟩ = ⟨2x₁ + x₂, x₁ − 3x₂⟩.
Since this vector equation is equivalent to the two scalar equations y₁ = 2x₁ + x₂ and y₂ = x₁ − 3x₂, we can find the unique solution x₁ = (3y₁ + y₂)/7, x₂ = (y₁ − 2y₂)/7 by the usual elimination method of secondary school algebra.
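The elimination is of course just the inversion of the 2 × 2 matrix whose columns are h₁ and h₂, and can be checked numerically (a sketch with a sample y of our own):

    import numpy as np

    H = np.array([[2.0, 1.0],
                  [1.0, -3.0]])      # columns are h1 and h2
    y = np.array([1.0, 2.0])
    x = np.linalg.solve(H, y)        # coordinates: y = x1*h1 + x2*h2
    print(x)                         # [5/7, -3/7]
    print((3*y[0] + y[1]) / 7, (y[0] - 2*y[1]) / 7)   # the same, from the formulas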
The form of these definitions is dictated by our interpretation of the linear
combination formula as a linear mapping. The more usual definition of indepen-
dence is a corollary.
Lemma 1.1. The independence of the finite indexed set {αᵢ : i ∈ I} is equivalent to the property that Σ_I xᵢαᵢ = 0 only if all the coefficients xᵢ are 0.
Proof. This is the property that the null space of L_α consist only of 0, and is thus equivalent to the injectivity of L_α, that is, to the independence of {αᵢ}, by Lemma 1.1 of Chapter 1. □
If {αᵢ}₁ⁿ is an ordered basis (frame) for V, the unique n-tuple x such that ξ = Σ₁ⁿ xᵢαᵢ is called the coordinate n-tuple of ξ (with respect to the basis {αᵢ}), and xᵢ is the ith coordinate of ξ. We call xᵢαᵢ (and sometimes xᵢ) the ith component of ξ. The mapping L_α will be called a basis isomorphism, and its inverse L_α⁻¹, which assigns to each vector ξ ∈ V its unique coordinate n-tuple x, is a coordinate isomorphism. The linear functional ξ ↦ xⱼ is the jth coordinate functional; it is the composition of the coordinate isomorphism ξ ↦ x with the jth coordinate projection x ↦ xⱼ on ℝⁿ. We shall see in Section 3 that the n coordinate functionals form a basis for V* = Hom(V, ℝ).
In the above paragraph we took the index set I to be n = {1, ..., n} and used the language of n-tuples. The only difference for an arbitrary finite index set is that we speak of a coordinate function x = {xᵢ : i ∈ I} instead of a coordinate n-tuple.
Our first concern will be to show that every finite-dimensional (finitely
spanned) vector space has a basis. We start with some remarks about indices.
We note first that a finite indexed set {αᵢ : i ∈ I} can be independent only if the indexing is injective as a mapping into V, for if αₖ = αₗ, then Σ xᵢαᵢ = 0, where xₖ = 1, xₗ = −1, and xᵢ = 0 for the remaining indices. Also, if {αᵢ : i ∈ I} is independent and J ⊂ I, then {αᵢ : i ∈ J} is independent, since if Σ_J xᵢαᵢ = 0, and if we set xᵢ = 0 for i ∈ I − J, then Σ_I xᵢαᵢ = 0, and so each xᵢ is 0.
A finite unindexed set is said to be independent if it is independent in some
(necessarily bijective) indexing. It will of course then be independent with
respect to any bijective indexing. An arbitrary set is independent if every finite
subset is independent. It follows that a set A is dependent (not independent) if
and only if there exist distinct elements α₁, ..., αₙ in A and scalars x₁, ..., xₙ not all zero such that Σ₁ⁿ xᵢαᵢ = 0. An unindexed basis would be defined in the obvious way. However, a set can always be regarded as being indexed, by itself if necessary!
Lemma 1.2. If B is an independent subset of a vector space V and β is any vector not in the linear span L(B), then B ∪ {β} is independent.
Proof. Otherwise there is a zero linear combination, xβ + Σ₁ⁿ xᵢβᵢ = 0, where β₁, ..., βₙ are distinct elements of B and the coefficients are not all 0. But then x cannot be zero: if it were, the equation would contradict the independence of B. We can therefore divide by x and solve for β, so that β ∈ L(B), a contradiction. □
The reader will remember that we call a vector space V finite-dimensional if it has a finite spanning set {αᵢ}₁ⁿ. We can use the above lemma to construct a basis for such a V by choosing some of the αᵢ's. We simply run through the sequence {αᵢ}₁ⁿ and choose those members that increase the linear span of the preceding choices. We end up with a spanning set since {αᵢ}₁ⁿ spans, and our subsequence is independent at each step, by the lemma. In the same way we can extend an independent set {βᵢ} to a basis by choosing some members of a spanning set {αᵢ}₁ⁿ. This procedure is intuitive, but it is messy to set up rigorously. We shall therefore proceed differently.
Theorem 1.1. Any minimal finite spanning set is a basis, and therefore any finite-dimensional vector space V has a basis. More generally, if {βⱼ : j ∈ J} is a finite independent set and {αᵢ : i ∈ I} is a finite spanning set, and if K is a smallest subset of I such that {βⱼ}_J ∪ {αᵢ}_K spans, then this collection is independent and a basis. Therefore, any finite independent subset of a finite-dimensional space can be extended to a basis.
Proof. It is sufficient to prove the second assertion, since it includes the first as a special case. If {βⱼ}_J ∪ {αᵢ}_K is not independent, then there is a nontrivial zero linear combination Σ_J yⱼβⱼ + Σ_K xᵢαᵢ = 0. If every xᵢ were zero, this equation would contradict the independence of {βⱼ}_J. Therefore, some xₖ is not zero, and we can solve the equation for αₖ. That is, if we set L = K − {k}, then the linear span of {βⱼ}_J ∪ {αᵢ}_L contains αₖ. It therefore includes the whole original spanning set and hence is V. But this contradicts the minimal nature of K, since L is a proper subset of K. Consequently, {βⱼ}_J ∪ {αᵢ}_K is independent. □
We next note that ℝⁿ itself has a very special basis. In the indexing map i ↦ αᵢ the vector αⱼ corresponds to the index j, but under the linear combination map x ↦ Σ xᵢαᵢ the vector αⱼ corresponds to the function δʲ which has the value 1 at j and the value 0 elsewhere, so that Σᵢ δʲ(i)αᵢ = αⱼ. This function
δʲ is called a Kronecker delta function. It is clearly the characteristic function χ_B of the one-point set B = {j}, and the symbol 'δʲ' is ambiguous, just as 'χ_B' is ambiguous; in each case the meaning depends on what domain is implicit from the context. We have already used the delta functions on ℝⁿ in proving Theorem 1.2 of Chapter 1.
Theorem 1.2. The Kronecker functions {δʲ}ⱼ₌₁ⁿ form a basis for ℝⁿ.
Proof. Since Σᵢ xᵢδⁱ(j) = xⱼ by the definition of δⁱ, we see that Σ₁ⁿ xᵢδⁱ is the n-tuple x itself, so the linear combination mapping L_δ: x ↦ Σ₁ⁿ xᵢδⁱ is the identity mapping x ↦ x, a trivial isomorphism. □
Among all possible indexed bases for ℝⁿ, the Kronecker basis is thus singled out by the fact that its basis isomorphism is the identity; for this reason it is called the standard basis or the natural basis for ℝⁿ. The same holds for ℝᴵ for any finite set I.
Finally, we shall draw some elementary conclusions from the existence of
a basis.
Theorem 1.3. If T ∈ Hom(V, W) is an isomorphism and α = {αᵢ : i ∈ I} is a basis for V, then {T(αᵢ) : i ∈ I} is a basis for W.
Proof. By hypothesis L_α is an isomorphism in Hom(ℝᴵ, V), and so T ∘ L_α is an isomorphism in Hom(ℝᴵ, W). Its skeleton {T(αᵢ)} is therefore a basis for W. □
We can view any basis {αᵢ} as the image of the standard basis {δⁱ} under the basis isomorphism. Conversely, any isomorphism θ: ℝᴵ → V becomes a basis isomorphism for the basis αⱼ = θ(δʲ).
Theorem 1.4. If X and Y are complementary subspaces of a vector space V, then the union of a basis for X and a basis for Y is a basis for V. Conversely, if a basis for V is partitioned into two sets, with linear spans X and Y, respectively, then X and Y are complementary subspaces of V.
Proof. We prove only the first statement. If {αᵢ : i ∈ J} is a basis for X and {αᵢ : i ∈ K} is a basis for Y, then it is clear that {αᵢ : i ∈ J ∪ K} spans V, since its span includes both X and Y, and so X + Y = V. Suppose then that Σ_{J∪K} xᵢαᵢ = 0. Setting ξ = Σ_J xᵢαᵢ and η = Σ_K xᵢαᵢ, we see that ξ ∈ X, η ∈ Y, and ξ + η = 0. But then ξ = η = 0, since X and Y are complementary. And then xᵢ = 0 for i ∈ J because {αᵢ}_J is independent, and xᵢ = 0 for i ∈ K because {αᵢ}_K is independent. Therefore, {αᵢ}_{J∪K} is a basis for V. We leave the converse argument as an exercise. □
Corollary. If V = ⊕₁ⁿ Vᵢ and Bᵢ is a basis for Vᵢ, then B = ∪₁ⁿ Bᵢ is a basis for V.
Proof. We see from the theorem that B₁ ∪ B₂ is a basis for V₁ ⊕ V₂. Proceeding inductively we see that ∪ᵢ₌₁ʲ Bᵢ is a basis for ⊕ᵢ₌₁ʲ Vᵢ for j = 2, ..., n, and the corollary is the case j = n. □
If we follow a coordinate isomorphism by a linear combination map, we get
the mapping of the following existence theorem, which we state only in n-tuple
form.
Theorem 1.5. If β = {βᵢ}₁ⁿ is an ordered basis for the vector space V, and if {αᵢ}₁ⁿ is any n-tuple of vectors in a vector space W, then there exists a unique S ∈ Hom(V, W) such that S(βᵢ) = αᵢ for i = 1, ..., n.
Proof. By hypothesis L_β is an isomorphism in Hom(ℝⁿ, V), and so S = L_α ∘ (L_β)⁻¹ is an element of Hom(V, W) such that S(βᵢ) = L_α(δⁱ) = αᵢ. Conversely, if S ∈ Hom(V, W) is such that S(βᵢ) = αᵢ for all i, then S ∘ L_β(δⁱ) = αᵢ for all i, so that S ∘ L_β = L_α. Thus S is uniquely determined as L_α ∘ (L_β)⁻¹. □
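In coordinates this construction is plain matrix arithmetic: if the βᵢ are the columns of an invertible matrix B and the prescribed images αᵢ are the columns of A, then S = L_α ∘ (L_β)⁻¹ has matrix AB⁻¹. A numerical sketch (the bases here are our own sample data):

    import numpy as np

    B = np.array([[1.0, 1.0],
                  [0.0, 1.0]])       # columns: an ordered basis beta_1, beta_2 of R^2
    A = np.array([[2.0, 0.0],
                  [1.0, 3.0]])       # columns: the prescribed images alpha_1, alpha_2

    S = A @ np.linalg.inv(B)         # the matrix of S = L_alpha o (L_beta)^(-1)
    print(np.allclose(S @ B, A))     # True: S(beta_i) = alpha_i for each i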
It is natural to ask how the unique S above varies with the n-tuple {ai}.
The answer is: linearly and "isomorphically".
Theorem 1.6. Let {βᵢ}₁ⁿ be a fixed ordered basis for the vector space V, and for each n-tuple α = {αᵢ}₁ⁿ chosen from the vector space W let S_α ∈ Hom(V, W) be the unique transformation defined above. Then the map α ↦ S_α is an isomorphism from Wⁿ to Hom(V, W).
Proof. As above, S_α = L_α ∘ θ⁻¹, where θ is the basis isomorphism L_β. Now we know from Theorem 6.2 of Chapter 1 that α ↦ L_α is an isomorphism from Wⁿ to Hom(ℝⁿ, W), and composition on the right by the fixed coordinate isomorphism θ⁻¹ is an isomorphism from Hom(ℝⁿ, W) to Hom(V, W) by the corollary to Theorem 3.3 of Chapter 1. Composing these two isomorphisms gives us the theorem. □
*Infinite bases. Most vector spaces do not have finite bases, and it is natural to try to extend the above discussion to index sets I that may be infinite. The Kronecker functions {δⁱ : i ∈ I} have the same definitions, but they no longer span ℝᴵ. By definition f is a linear combination of the functions δⁱ if and only if f is of the form Σ_{i∈I₁} cᵢδⁱ, where I₁ is a finite subset of I. But then f = 0 outside of I₁. Conversely, if f ∈ ℝᴵ is 0 except on a finite set I₁, then f = Σ_{i∈I₁} f(i)δⁱ. The linear span of {δⁱ : i ∈ I} is thus exactly the set of all functions of ℝᴵ that are zero except on a finite set. We shall designate this subspace ℝᴵ.
If {αᵢ : i ∈ I} is an indexed set of vectors in V and f ∈ ℝᴵ, then the sum Σ_{i∈I} f(i)αᵢ becomes meaningful if we adopt the reasonable convention that the sum of an arbitrary number of 0's is 0. Then Σ_{i∈I} = Σ_{i∈I₀}, where I₀ is any finite subset of I outside of which f is zero.
With this convention, L_α: f ↦ Σ f(i)αᵢ is a linear map from ℝᴵ to V, as in Theorem 1.2 of Chapter 1. And with the same convention, Σ_{i∈I} f(i)αᵢ is an elegant expression for the general linear combination of the vectors αᵢ. Instead of choosing a finite subset I₁ and numbers cᵢ for just those indices i in I₁, we define cᵢ for all i ∈ I, but with the stipulation that cᵢ = 0 for all but a finite number of indices. That is, we take c = {cᵢ : i ∈ I} as a function in ℝᴵ.
We make the same definitions of independence and basis as before. Then {αᵢ : i ∈ I} is a basis for V if and only if L_α: ℝᴵ → V is an isomorphism, i.e., if and only if for each ξ ∈ V there exists a unique x ∈ ℝᴵ such that ξ = Σᵢ xᵢαᵢ.
By using an axiom of set theory called the axiom of choice, it can be shown that every vector space has a basis in this sense and that any independent set can be extended to a basis. Then Theorems 1.4 and 1.5 hold with only minor changes in notation. In particular, if a basis for a subspace M of V is extended to a basis for V, then the linear span of the added part is a subspace N complementary to M. Thus, in a purely algebraic sense, every subspace has complementary subspaces. We assume this fact in some of our exercises.
The above sums are always finite (despite appearances), and the above
notion of basis is purely algebraic. However, infinite bases in this sense are not
very useful in analysis, and we shall therefore concentrate for the present on
spaces that have finite bases (i.e., are finite-dimensional). Then in one impor-
tant context later on we shall discuss infinite bases where the sums are genuinely
infinite by virtue of limit theory.
EXERCISES
1.1 Show by a direct computation that {⟨1, −1⟩, ⟨0, 1⟩} is a basis for ℝ².
1.2 The student must realize that the ith coordinate of a vector depends on the whole basis and not just on the ith basis vector. Prove this for the second coordinate of vectors in ℝ² using the standard basis and the basis of the above exercise.
1.3 Show that {⟨1, 1⟩, ⟨1, 2⟩} is a basis for V = ℝ². The basis isomorphism from ℝ² to V is now from ℝ² to ℝ². Find its matrix. Find the matrix of the coordinate isomorphism. Compute the coordinates, with respect to this basis, of ⟨−1, 1⟩, ⟨0, 1⟩, ⟨2, 3⟩.
1.4 Show that {bᵢ}₁³, where b₁ = ⟨1, 0, 0⟩, b₂ = ⟨1, 1, 0⟩, and b₃ = ⟨1, 1, 1⟩, is a basis for ℝ³.
1.5 In the above exercise find the three linear functionals lᵢ that are the coordinate functionals with respect to the given basis. Since
x = Σ₁³ lᵢ(x)bᵢ,
finding the lᵢ is equivalent to solving x = Σ₁³ yᵢbᵢ for the yᵢ's in terms of x = ⟨x₁, x₂, x₃⟩.
1.6 Show that any set of polynomials no two of which have the same degree is
independent.
1.7 Show that if {αᵢ}₁ⁿ is an independent subset of V and T in Hom(V, W) is injective, then {T(αᵢ)}₁ⁿ is an independent subset of W.
1.8 Show that if T is any element of Hom(V, W) and {T(αᵢ)}₁ⁿ is independent in W, then {αᵢ}₁ⁿ is independent in V.
1.9 Later on we are going to call a vector space V n-dimensional if every basis for V contains exactly n elements. If V is the span of a single vector α, so that V = ℝα, then V is clearly one-dimensional.
Let {Vᵢ}₁ⁿ be a collection of one-dimensional subspaces of a vector space V, and choose a nonzero vector αᵢ in Vᵢ for each i. Prove that {αᵢ}₁ⁿ is independent if and only if the subspaces {Vᵢ}₁ⁿ are independent and that {αᵢ}₁ⁿ is a basis if and only if V = ⊕₁ⁿ Vᵢ.
1.10 Finish the proof of Theorem 1.4.
1.11 Give a proof of Theorem 1.4 based on the existence of isomorphisms.
1.12 The reader would guess, and we shall prove in the next section, that every subspace of a finite-dimensional space is finite-dimensional. Prove now that a subspace N of a finite-dimensional vector space V is finite-dimensional if and only if it has a complement M. (Work from a combination of Theorems 1.1 and 1.4 and direct sum projections.)
1.13 Since {hᵢ}₁³ = {⟨1, 0, 0⟩, ⟨1, 1, 0⟩, ⟨1, 1, 1⟩} is a basis for ℝ³, there is a unique T in Hom(ℝ³, ℝ²) such that T(h₁) = ⟨1, 0⟩, T(h₂) = ⟨0, 1⟩, and T(h₃) = ⟨1, 1⟩. Find the matrix of T. (Find T(δⁱ) for i = 1, 2, 3.)
1.14 Find, similarly, the S in Hom ℝ³ such that S(hᵢ) = δⁱ for i = 1, 2, 3.
1.15 Show that the infinite sequence {tⁿ}₀^∞ is a basis for the vector space of all polynomials.
2. DIMENSION
The concept of dimension rests on the fact that two different bases for the same
space always contain the same number of elements. This number, which is
then the number of elements in every basis for V, is called the dimension of V.
It tells all there is to know about V to within isomorphism: There exists an
isomorphism between two spaces if and only if they have the same dimension.
We shall consider only finite dimensions. If V is not finite-dimensional, its
dimension is an infinite cardinal number, a concept with which the reader is
probably unfamiliar.
Lemma 2.1. If V is finite-dimensional and T in Hom V is surjective, then T
is an isomorphism.
Proof. Let n be the smallest number of elements that can span V. That is, there is some spanning set {αᵢ}₁ⁿ and none with fewer than n elements. Then {αᵢ}₁ⁿ is a basis, by Theorem 1.1, and the linear combination map θ: x ↦ Σ₁ⁿ xᵢαᵢ is accordingly a basis isomorphism. But {βᵢ}₁ⁿ = {T(αᵢ)}₁ⁿ also spans, since T is surjective, and so T ∘ θ is also a basis isomorphism, for the same reason. Then T = (T ∘ θ) ∘ θ⁻¹ is an isomorphism. □
Theorem 2.1. If V is finite-dimensional, then all bases for V contain the
same number of elements.
Proof. Two bases with n and m elements determine basis isomorphisms θ: ℝⁿ → V and φ: ℝᵐ → V. Suppose that m < n and, viewing ℝⁿ as ℝᵐ × ℝ^{n−m}, let π be the projection of ℝⁿ onto ℝᵐ. Since T = θ⁻¹ ∘ φ is an isomorphism from ℝᵐ to ℝⁿ and T ∘ π: ℝⁿ → ℝⁿ is therefore surjective, it follows from the lemma that T ∘ π is an isomorphism. Then π = T⁻¹ ∘ (T ∘ π) is an isomorphism. But it isn't, because π(δⁿ) = 0, and we have a contradiction. Therefore no basis can be smaller than any other basis. □
The integer that is the number of elements in every basis for V is of course called the dimension of V, and we designate it d(V). Since the standard basis {δⁱ}₁ⁿ for ℝⁿ has n elements, we see that ℝⁿ is n-dimensional in this precise sense.
Corollary. Two finite-dimensional vector spaces are isomorphic if and only if they have the same dimension.
Proof. If T is an isomorphism from V to W and B is a basis for V, then T[B] is a basis for W by Theorem 1.3. Therefore d(V) = #B = #T[B] = d(W), where #A is the number of elements in A. Conversely, if d(V) = d(W) = n, then V and W are each isomorphic to ℝⁿ and so to each other. □
Theorem 2.2. Every subspace M of a finite-dimensional vector space V is
finite-dimensional.
Proof. Let 𝒜 be the family of finite independent subsets of M. By Theorem 1.1, if A ∈ 𝒜, then A can be extended to a basis for V, and so #A ≤ d(V). Thus {#A : A ∈ 𝒜} is a finite set of integers, and we can choose B ∈ 𝒜 such that n = #B is the maximum of this finite set. But then L(B) = M, because otherwise for any α ∈ M − L(B) we have B ∪ {α} ∈ 𝒜, by Lemma 1.2, and
#(B ∪ {α}) = n + 1,
contradicting the maximal nature of n. Thus M is finitely spanned. □
Corollary. Every subspace M of a finite-dimensional space V has a complement.
Proof. Use Theorem 1.1 to extend a basis for M to a basis for V, and let N be the linear span of the added vectors. Then apply Theorem 1.4. □
Dimensional identities. We now prove two basic dimensional identities. We will always assume V finite-dimensional.
Lemma 2.2. If V₁ and V₂ are complementary subspaces of V, then d(V) = d(V₁) + d(V₂). More generally, if V = ⊕₁ⁿ Vᵢ, then d(V) = Σ₁ⁿ d(Vᵢ).
Proof. This follows at once from Theorem 1.4 and its corollary. □
Theorem 2.3. If U and W are subspaces of a finite-dimensional vector space, then d(U + W) + d(U ∩ W) = d(U) + d(W).
Proof. Let V be a complement of U ∩ W in U. We start by showing that then V is also a complement of W in U + W. First,
V + W = V + ((U ∩ W) + W) = (V + (U ∩ W)) + W = U + W.
We have used the obvious fact that the sum of a vector space and a subspace is the vector space. Next,
V ∩ W = (V ∩ U) ∩ W = V ∩ (U ∩ W) = {0},
because V is a complement of U ∩ W in U. We thus have both V + W = U + W and V ∩ W = {0}, and so V is a complement of W in U + W by the corollary of Lemma 5.2 of Chapter 1.
The theorem is now a corollary of the above lemma. We have
d(U) + d(W) = (d(U ∩ W) + d(V)) + d(W) = d(U ∩ W) + (d(V) + d(W)) = d(U ∩ W) + d(U + W). □
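For subspaces of ℝⁿ presented by spanning sets, Theorem 2.3 can be checked numerically: the dimensions are ranks, and d(U ∩ W) falls out of the identity. A sketch (spanning sets ours):

    import numpy as np

    U = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # rows span U
    W = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])   # rows span W

    dU = np.linalg.matrix_rank(U)
    dW = np.linalg.matrix_rank(W)
    d_sum = np.linalg.matrix_rank(np.vstack([U, W]))    # d(U + W)
    print(dU + dW - d_sum)     # d(U ∩ W) = 1 here, by the theorem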
Theorem 2.4. Let V be finite-dimensional, and let W be any vector space. Let T ∈ Hom(V, W) have null space N (in V) and range R (in W). Then R is finite-dimensional and d(V) = d(N) + d(R).
Proof. Let U be a complement of N in V. Then we know that T ↾ U is an isomorphism onto R. (See Theorem 5.3 of Chapter 1.) Therefore, R is finite-dimensional and d(R) + d(N) = d(U) + d(N) = d(V) by our first identity. □
Corollary. If W is finite-dimensional and d(W) = d(V), then T is injective if and only if it is surjective, so that in this case injectivity, surjectivity, and bijectivity are all equivalent.
Proof. T is surjective if and only if R = W. But this is equivalent to d(R) = d(W), and if d(W) = d(V), then the theorem shows this in turn to be equivalent to d(N) = 0, that is, to N = {0}. □
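For a matrix T in Hom(ℝⁿ, ℝᵐ), Theorem 2.4 is the familiar rank–nullity count d(N) = n − rank(T), easily checked (matrix ours):

    import numpy as np

    T = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])       # a rank 1 map from R^3 to R^2
    n = T.shape[1]
    r = np.linalg.matrix_rank(T)          # d(R)
    print(r, n - r, r + (n - r) == n)     # 1 2 True: d(R) + d(N) = d(V)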
Theorem 2.5. If d(V) = n and d(W) = m, then Hom(V, W) is finite-dimensional and its dimension is mn.
Proof. By Theorem 1.6, Hom(V, W) is isomorphic to Wⁿ, which is the direct sum of the n subspaces isomorphic to W under the injections θᵢ for i = 1, ..., n. The dimension of Wⁿ is therefore Σ₁ⁿ m = mn by Lemma 2.2. □
Another proof of Theorem 2.5 will be available in Section 4.
EXERCISES
2.1 Prove that if d(V) = n, then any spanning subset of n elements is a basis.
2.2 Prove that if d(V) = n, then any independent subset of n elements is a basis.
2.3 Show that if d(V) = n and W is a subspace of the same dimension, then W = V.
2.4 Prove by using dimensional identities that if f is a nonzero linear functional on an n-dimensional space V, then its null space has dimension n − 1.
2.5 Prove by using dimensional identities that if f is a linear functional on a finite-dimensional space V, and if α is a vector not in its null space N, then V = N ⊕ ℝα.
2.6 Given that N is an (n − 1)-dimensional subspace of an n-dimensional vector space V, show that N is the null space of a linear functional.
2.7 Let X and Y be subspaces of a finite-dimensional vector space V, and suppose that T in Hom(V, W) has null space N = X ∩ Y. Show that T[X + Y] = T[X] ⊕ T[Y], and then deduce Theorem 2.3 from Lemma 2.2 and Theorem 2.4. This proof still depends on the existence of a T having N = X ∩ Y as its null space. Do we know of any such T?
2.8 Show that if V is finite-dimensional and S, T ∈ Hom V, then
S ∘ T = I ⟹ T is invertible.
Show also that T ∘ S = I ⟹ T is invertible.
2.9 A subspace N of a vector space V has finite codimension n if the quotient space V/N is finite-dimensional, with dimension n. Show that a subspace N has finite codimension n if and only if N has a complementary subspace M of dimension n. (Move a basis for V/N back into V.) Do not assume V to be finite-dimensional.
2.10 Show that if N₁ and N₂ are subspaces of a vector space V with finite codimensions, then N = N₁ ∩ N₂ has finite codimension and
cod(N) ≤ cod(N₁) + cod(N₂).
(Consider the mapping ξ ↦ ⟨ξ₁, ξ₂⟩, where ξᵢ is the coset of Nᵢ containing ξ.)
2.11 In the above exercise, suppose that cod(N₁) = cod(N₂), that is, d(V/N₁) = d(V/N₂). Prove that d(N₁/N) = d(N₂/N).
2.12 Given nonzero vectors β in V and f in V* such that f(β) ≠ 0, show that some scalar multiple of the mapping ξ ↦ f(ξ)β is a projection. Prove that any projection having a one-dimensional range arises in this way.
2.13 We know that the choice of an origin O in Euclidean 3-space E³ induces a vector space structure in E³ (under the correspondence X ↦ OX) and that this vector space is three-dimensional. Show that a geometric plane through O becomes a two-dimensional subspace.
2.14 An m-dimensional plane M is a translate N + α₀ of an m-dimensional subspace N. Let {βᵢ}₁ᵐ be any basis of N, and set αᵢ = βᵢ + α₀. Show that M is exactly the set of linear combinations Σ₀ᵐ xᵢαᵢ such that
Σ₀ᵐ xᵢ = 1.
2.15 Show that Exercise 2.14 is a corollary of Exercise 4.14 of Chapter 1.
2.16 Show, conversely, that if a plane M is the affine span of m + 1 elements, then its dimension is ≤ m.
2.17 From the above two exercises concoct a direct definition of the dimension of an affine subspace.
2.18 Write a small essay suggested by the following definition. An (m + 1)-tuple {αᵢ}₀ᵐ is affinely independent if the conditions
Σ₀ᵐ xᵢαᵢ = 0 and Σ₀ᵐ xᵢ = 0
together imply that xᵢ = 0 for all i.
2.19 A polynomial on a vector space V is a real-valued function on V which can be represented as a finite sum of finite products of linear functionals. Define the degree of a polynomial; define a homogeneous polynomial of degree k. Show that the set of homogeneous polynomials of degree k is a vector space X_k.
2.20 Continuing the above exercise, show that if k₁ < k₂ < ... < k_N, then the vector spaces {X_{kᵢ}}₁^N are independent subspaces of the vector space of all polynomials. [Assume that a polynomial p(t) of a real variable can be the zero function only if all its coefficients are 0. For any polynomial P on V consider the polynomials p_α(t) = P(tα).]
2.21 Let ⟨α, β⟩ be a basis for the two-dimensional space V, and let ⟨λ, μ⟩ be the corresponding coordinate projections (dual basis in V*). Show that every polynomial on V "is a polynomial in the two variables λ and μ".
2.22 Let ⟨α, β⟩ be a basis for a two-dimensional vector space V, and let ⟨λ, μ⟩ be the corresponding coordinate projections (dual basis for V*). Show that
⟨λ², λμ, μ²⟩
is a basis for the vector space of homogeneous polynomials on V of degree 2. Similarly, compute the dimension of the space of homogeneous polynomials of degree 3 on a two-dimensional vector space.
2.23 Let V and W be two-dimensional vector spaces, and let F be a mapping from
V to W. Using coordinate systems, define the notion of F being quadratic and then
show that it is independent of coordinate systems. Generalize the above exercise to
higher dimensions and also to higher degrees.
2.24 Now let F: V → W be a mapping between two-dimensional spaces such that for any u, v ∈ V and any l ∈ W*, l(F(tu + v)) is a quadratic function of t, that is, of the form at² + bt + c. Show that F is quadratic according to your definition in the above exercises.
3. THE DUAL SPACE
Although throughout this section all spaces will be assumed finite-dimensional,
many of the definitions and properties are valid for infinite-dimensional spaces
as well. But for such spaces there is a difference between purely algebraic
situations and situations in which algebra is mixed with hypotheses of continuity.
One of the blessings of finite dimensionality is the absence of this complication.
As the reader has probably surmised from the number of special linear functionals
we have met, particularly the coordinate functionals, the space Hom(V, IR)
of all linear functionals on V plays a special role.
Definition. The dual space (or conjugate space) V* of the vector space V is
the vector space Hom(V, IR) of all linear mappings from V to IR. Its elements
are called linear functionals.
We are going to see that in a certain sense V is in turn the dual space of
V* (V and (V*)* are naturally isomorphic), so that the two spaces are sym-
metrically related. We shall briefly study the notion of annihilation (orthogonal-
ity) which has its origins in this setting, and then see that there is a natural
isomorphism between Hom(V, W) and Hom(W*, V*). This gives the mathema-
tician a new tool to use in studying a linear transformation Tin Hom(V, W);
the relationship between T and its image T* exposes new properties of T itself.
Dual bases. At the outset one naturally wonders how big a space V* is, and we
settle the question immediately.
Theorem 3.1. Let {βᵢ}₁ⁿ be an ordered basis for V, and let eⱼ be the corresponding jth coordinate functional on V: eⱼ(ξ) = xⱼ, where ξ = Σ₁ⁿ xᵢβᵢ. Then {eⱼ}₁ⁿ is an ordered basis for V*.
Proof. Let us first make the proof by a direct elementary calculation.
a) Independence. Suppose that Σ₁ⁿ cⱼeⱼ = 0, that is, Σ₁ⁿ cⱼeⱼ(ξ) = 0 for all ξ ∈ V. Taking ξ = βᵢ and remembering that the coordinate n-tuple of βᵢ is δⁱ, we see that the above equation reduces to cᵢ = 0, and this for all i. Therefore, {eⱼ}₁ⁿ is independent.
b) Spanning. First note that the basis expansion ξ = Σ xᵢβᵢ can be rewritten ξ = Σ eᵢ(ξ)βᵢ. Then for any λ ∈ V* we have λ(ξ) = Σ₁ⁿ lᵢeᵢ(ξ), where we have set lᵢ = λ(βᵢ). That is, λ = Σ lᵢeᵢ. This shows that {eⱼ}₁ⁿ spans V*, and, together with (a), that it is a basis. □
Definition. The basis {eⱼ} for V* is called the dual of the basis {βᵢ} for V.
As usual, one of our fundamental isomorphisms is lurking behind all this,
but we shall leave its exposure to an exercise.
Corollary. d(V*) = d(V).
The three equations
ξ = Σ₁ⁿ eᵢ(ξ)βᵢ,    λ = Σ₁ⁿ λ(βᵢ)eᵢ,    λ(ξ) = Σ₁ⁿ λ(βᵢ)eᵢ(ξ)
are worth looking at. The first two are symmetrically related, each presenting the basis expansion of a vector with its coefficients computed by applying the corresponding element of the dual basis to the vector. The third is symmetric itself between ξ and λ.
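Concretely, in ℝⁿ: if the βᵢ are the columns of an invertible matrix B, then the dual basis functionals eⱼ, written as row vectors, are the rows of B⁻¹, since B⁻¹B = I says precisely that eᵢ(βⱼ) = δᵢⱼ. A numerical sketch (basis ours):

    import numpy as np

    B = np.array([[1.0, 1.0],
                  [1.0, 2.0]])            # columns: a basis beta_1, beta_2 of R^2
    E = np.linalg.inv(B)                  # rows: the dual basis functionals e_1, e_2
    print(np.allclose(E @ B, np.eye(2)))  # e_i(beta_j) = delta_ij
    xi = np.array([3.0, 5.0])
    print(B @ (E @ xi))                   # the expansion sum e_i(xi) beta_i returns xi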
Since a finite-dimensional space V and its dual space V* have the same
dimension, they are of course isomorphic. In fact, each basis for V defines an
isomorphism, for we have the associated coordinate isomorphism from V to IRn,
the dual basis isomorphism from IRn to V*, and therefore the composite isomor-
phism from V to V*. This isomorphism varies with the basis, however, and there is in general no natural isomorphism between V and V*.
It is another matter with Cartesian space ℝⁿ because it has a standard basis, and therefore a standard isomorphism with its dual space (ℝⁿ)*. It is not hard to see that this is the isomorphism a ↦ L_a, where L_a(x) = Σ₁ⁿ aᵢxᵢ, that we discussed in Section 1.6. We can therefore feel free to identify ℝⁿ with (ℝⁿ)*, only keeping in mind that when we think of an n-tuple a as a linear functional, we mean the functional L_a(x) = Σ₁ⁿ aᵢxᵢ.
The second conjugate space. Despite the fact that V and V* are not naturally
isomorphic in general, we shall now see that V is naturally isomorphic to V** =
(V*)*.
Theorem 3.2. The function ω: V × V* → ℝ defined by ω(ξ, f) = f(ξ) is bilinear, and the mapping ξ ↦ ω_ξ from V to V** is a natural isomorphism.
Proof. In this context we generally set ξ** = ω_ξ, so that ξ** is defined by ξ**(f) = f(ξ) for all f ∈ V*. The bilinearity of ω should be clear, and Theorem 6.1 of Chapter 1 therefore applies. The reader might like to run through a direct check of the linearity of ξ ↦ ξ** starting with (c₁ξ₁ + c₂ξ₂)**(f).
There still is the question of the injectivity of this mapping. If α ≠ 0, we can find f ∈ V* so that f(α) ≠ 0. One way is to make α the first vector of an ordered basis and to take f as the first functional in the dual basis; then f(α) = 1. Since α**(f) = f(α) ≠ 0, we see in particular that α** ≠ 0. The mapping ξ ↦ ξ** is thus injective, and it is then bijective by the corollary of Theorem 2.4. □
If we think of V** as being naturally identified with V in this way, the two spaces V and V* are symmetrically related to each other. Each is the dual of the other. In the expression 'f(ξ)' we think of both symbols as variables and then hold one or the other fixed for the two interpretations. In such a situation we often use a more symmetric symbolism, such as (ξ, f), to indicate our intention to treat both symbols as variables.
Lemma 3.1. If {λᵢ} is the basis in V* dual to the basis {αᵢ} in V, then {αᵢ**} is the basis in V** dual to the basis {λᵢ} in V*.
Proof. We have αᵢ**(λⱼ) = λⱼ(αᵢ) = δᵢⱼ, which shows that αᵢ** is the ith coordinate projection. In case the reader has forgotten, the basis expansion f = Σ cⱼλⱼ implies that αᵢ**(f) = f(αᵢ) = (Σ cⱼλⱼ)(αᵢ) = cᵢ, so that αᵢ** is the mapping f ↦ cᵢ. □
Annihilator subspaces. It is in this dual situation that orthogonality first
naturally appears. However, we shall save the term 'orthogonal' for the later context in which V and V* have been identified through a scalar product, and shall speak here of the annihilator of a set rather than its orthogonal complement.
Definition. If A ⊂ V, the annihilator of A, A°, is the set of all f in V* such that f(α) = 0 for all α in A. Similarly, if A ⊂ V*, then
A° = {α ∈ V : f(α) = 0 for all f ∈ A}.
If we view V as (V*)*, the second definition is included in the first.
The following properties are easily established and will be left as exercises:
1) A° is always a subspace.
2) A ⊂ B ⟹ B° ⊂ A°.
3) (L(A))° = A°.
4) (A ∪ B)° = A° ∩ B°.
5) A ⊂ A°°.
We now add one more crucial dimensional identity to those of the last section.
Theorem 3.3. If W is a subspace of V, then d(V) = d(W) + d(W°).
Proof. Let {βᵢ}₁ᵐ be a basis for W, and extend it to a basis {βᵢ}₁ⁿ for V. Let {λᵢ}₁ⁿ be the dual basis in V*. We claim that then {λᵢ}_{m+1}^n is a basis for W°.
First, if j > m, then λⱼ(βᵢ) = 0 for i = 1, ..., m, and so λⱼ is in W° by (3) above. Thus {λ_{m+1}, ..., λₙ} ⊂ W°. Now suppose that f ∈ W°, and let f = Σ_{j=1}^n cⱼλⱼ be its (dual) basis expansion. Then for each i ≤ m we have cᵢ = f(βᵢ) = 0, since βᵢ ∈ W and f ∈ W°; therefore, f = Σ_{m+1}^n cⱼλⱼ. Thus every f in W° is in the span of {λᵢ}_{m+1}^n. Altogether, we have shown that W° is the span of {λᵢ}_{m+1}^n, as claimed. Then d(W°) + d(W) = (n − m) + m = n = d(V), and we are done. □
Corollary. A°° = L(A) for every subset A ⊂ V.
Proof. Since (L(A))° = A°, we have d(L(A)) + d(A°) = d(V), by the theorem. Also d(A°) + d(A°°) = d(V*) = d(V). Thus d(A°°) = d(L(A)), and since L(A) ⊂ A°°, by (5) above, we have L(A) = A°°. □
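Identifying ℝⁿ with (ℝⁿ)* as in the last section, the annihilator of a set A of vectors is just the null space of the matrix with those vectors as rows, computable from the singular value decomposition. A sketch with a one-element A of our own (the same method handles Exercises 3.5–3.7 below):

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0]])          # rows: a subset A of R^3
    _, s, Vh = np.linalg.svd(A, full_matrices=True)
    rank = int(np.sum(s > 1e-12))
    basis = Vh[rank:]                        # rows: an (orthonormal) basis of A°
    print(np.allclose(A @ basis.T, 0.0))     # each basis functional annihilates A
    print(len(basis))                        # d(A°) = 3 - d(L(A)) = 2, as in Theorem 3.3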
The adjoint of T. We shall now see that with every T in Hom(V, W) there is
naturally associated an element of Hom(W*, V*) which we call the adjoint of
T and designate T*. One consequence of the intimate relationship between T
and T* is that the range of T* is exactly the annihilator of the null space of
T. Combined with our dimensional identities, this implies that the ranges of T
and T* have the same dimension. And later on, after we have established the
connection between matrix representations of T and T*, this turns into the very
mysterious fact that the dimension of the linear span of the row vectors of an m-
by-n matrix is the same as the dimension of the linear span of its column vectors,
which gives us our notion of the rank of a matrix. In Chapter 5 we shall study a
situation (Hilbert space) in which we are given a fixed fundamental isomorphism
between V and V*. If T is in Hom V, then of course T* is in Hom V*, and we
can use this isomorphism to "transfer" T* into Hom V. But now T can be com-
pared with its (transferred) adjoint T*, and they may be equal. That is, T may
be self-adjoint. It turns out that the self-adjoint transformations are "nice" ones,
as we shall see for ourselves in simple cases, and also, fortunately, that many
important linear maps arising from theoretical physics are self-adjoint.
If T ∈ Hom(V, W) and l ∈ W*, then of course l ∘ T ∈ V*. Moreover, the mapping l ↦ l ∘ T (T fixed) is a linear mapping from W* to V* by the corollary to Theorem 3.3 of Chapter 1. This mapping is called the adjoint of T and is designated T*. Thus T* ∈ Hom(W*, V*) and T*(l) = l ∘ T for all l ∈ W*.
Theorem 3.4. The mapping T ↦ T* is an isomorphism from the vector space Hom(V, W) to the vector space Hom(W*, V*). Also (T ∘ S)* = S* ∘ T* under the relevant hypotheses on domains and codomains.
Proof. Everything we have said above through the linearity of T ↦ T* is a consequence of the bilinearity of ω(l, T) = l ∘ T. The map we have called T* is simply ω_T, and the linearity of T ↦ T* thus follows from Theorem 6.1 of Chapter 1. Again the reader might benefit from a direct linearity check, beginning with (c₁T₁ + c₂T₂)*(l).
To see that T ↦ T* is injective, we take any T ≠ 0 and choose α ∈ V so that T(α) ≠ 0. We then choose l ∈ W* so that l(T(α)) ≠ 0. Since l(T(α)) = (T*(l))(α), we have verified that T* ≠ 0.
Next, if d(V) = m and d(W) = n, then also d(V*) = m and d(W*) = n by the corollary of Theorem 3.1, and d(Hom(V, W)) = mn = d(Hom(W*, V*)) by Theorem 2.5. The injective map T ↦ T* is thus an isomorphism (by the corollary of Theorem 2.4).
Finally, (T ∘ S)*(l) = l ∘ (T ∘ S) = (l ∘ T) ∘ S = S*(l ∘ T) = S*(T*(l)) = (S* ∘ T*)(l), so that (T ∘ S)* = S* ∘ T*. □
The reader would probably guess that T** becomes identified with T under the identification of V with V**. This is so, and it is actually the reason for calling the isomorphism ξ ↦ ξ** natural. We shall return to this question at the end of the section. Meanwhile, we record an important elementary identity.
Theorem 3.5. (R(T*))° = N(T) and N(T*) = (R(T))°.
Proof. The following statements are definitionally equivalent in pairs as they occur: l ∈ N(T*), T*(l) = 0, l ∘ T = 0, l(T(ξ)) = 0 for all ξ ∈ V, l ∈ (R(T))°. Therefore, N(T*) = (R(T))°. The other proof is similar and will be left to the reader. [Start with α ∈ N(T) and end with α ∈ (R(T*))°.] □
The rank of a linear transformation is the dimension of its range space.
Corollary. The rank of T* is equal to the rank of T.
Proof. The dimensions of R(T) and (N(T))° are each d(V) − d(N(T)) by Theorems 2.4 and 3.3, and the second is d(R(T*)) by the above theorem. Therefore, d(R(T)) = d(R(T*)). □
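Once T is written as a matrix (Section 4), T* corresponds to the transpose, and this corollary becomes the classical "row rank equals column rank". A one-line numerical check (random matrix ours):

    import numpy as np

    T = np.random.default_rng(0).normal(size=(4, 6))
    print(np.linalg.matrix_rank(T), np.linalg.matrix_rank(T.T))   # equal ranks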
Dyads. Consider any T in Hom(V, W) whose range M is one-dimensional. If β is a nonzero vector in M, then x ↦ xβ is a basis isomorphism θ: ℝ → M and θ⁻¹ ∘ T: V → ℝ is a linear functional λ ∈ V*. Then T = θ ∘ λ and T(ξ) = λ(ξ)β for all ξ. We write this as T = λ(·)β, and call any such T a dyad.
Lemma 3.2. If T is the dyad λ(·)β, then T* is the dyad β**(·)λ.
Proof. (T*(l))(ξ) = (l ∘ T)(ξ) = l(T(ξ)) = l(λ(ξ)β) = l(β)λ(ξ), so that T*(l) = l(β)λ = β**(l)λ, and T* = β**(·)λ. □
*Natural isomorphisms again. We are now in a position to illustrate more precisely the notion of a natural isomorphism. We saw above that among all the isomorphisms from a finite-dimensional vector space V to its second dual, we could single one out naturally, namely, the map ξ ↦ ξ**, where ξ**(f) = f(ξ) for all f in V*. Let us call this isomorphism φ_V. The technical meaning of the word 'natural' pertains to the collection {φ_V} of all these isomorphisms; we found a way to choose one isomorphism φ_V for each space V, and the proof that this is a "natural" choice lies in the smooth way the various φ_V's relate to each other. To see what we mean by this, consider two finite-dimensional spaces V and W and a map T in Hom(V, W). Then T* is in Hom(W*, V*) and T** = (T*)* is in Hom(V**, W**). The setting for the four maps T, T**, φ_V, and φ_W can be displayed in a diagram as follows:

          T
    V --------> W
    |           |
 φ_V|           |φ_W
    v    T**    v
   V** -------> W**

The diagram indicates two maps, φ_W ∘ T and T** ∘ φ_V, from V to W**, and we define the collection of isomorphisms {φ_V} to be natural if these two maps are always equal for any V, W, and T. This is the condition that the two ways of going around the diagram give the same result, i.e., that the diagram be commutative.
Put another way, it is the condition that T "become" T** when V is identified with V** (by φ_V) and W is identified with W** (by φ_W). We leave its proof as an exercise.
EXERCISES
3.1 Let θ be an isomorphism from a vector space V to ℝⁿ. Show that the functionals {πᵢ ∘ θ}₁ⁿ form a basis for V*.
3.2 Show that the standard isomorphism from ℝⁿ to (ℝⁿ)* that we get by composing the coordinate isomorphism for the standard basis for ℝⁿ (the identity) with the dual basis isomorphism for (ℝⁿ)* is just our friend a ↦ L_a, where L_a(x) = Σ₁ⁿ aᵢxᵢ. (Show that the dual basis isomorphism is a ↦ Σ₁ⁿ aᵢπᵢ.)
3.3 We know from Theorem 1.6 that a choice of a basis {βᵢ} for V defines an isomorphism from Wⁿ to Hom(V, W) for any vector space W. Apply this fact and Theorem 1.3 to obtain a basis in V*, and show that this basis is the dual basis of {βᵢ}.
3.4 Prove the properties of A° that are listed in the text.
3.5 Find (a basis for) the annihilator of ⟨1, 1, 1⟩ in ℝ³. (Use the isomorphism of (ℝ³)* with ℝ³ to express the basis vectors as triples.)
3.6 Find (a basis for) the annihilator of {⟨1, 1, 1⟩, ⟨1, 2, 3⟩} in ℝ³.
3.7 Find (a basis for) the annihilator of {⟨1, 1, 1, 1⟩, ⟨1, 2, 3, 4⟩} in ℝ⁴.
3.8 Show that if V = M ⊕ N, then V* = M° ⊕ N°.
3.9 Show that if M is any subspace of an n-dimensional vector space V and d(M) = m, then M can be viewed as being the linear span of an independent subset of m elements of V or as being the annihilator of (the intersection of the null spaces of) an independent subset of n − m elements of V*.
3.10 If B = {fᵢ}₁ᵐ is a finite collection of linear functionals on V (B ⊂ V*), then its annihilator B° is simply the intersection N = ∩₁ᵐ Nᵢ of the null spaces Nᵢ = N(fᵢ) of the functionals fᵢ. State the dual of Theorem 3.3 in this context. That is, take W as the linear span of the functionals fᵢ, so that W ⊂ V* and W° ⊂ V. State the dual of the corollary.
3.11 Show that the following theorem is a consequence of the corollary of Theorem 3.3.
Theorem. Let N be the intersection ∩₁ᵐ Nᵢ of the null spaces of a set {fᵢ}₁ᵐ of linear functionals on V, and suppose that g in V* is zero on N. Then g is a linear combination of the set {fᵢ}₁ᵐ.
3.12 A corollary of Theorem 3.3 is that if W is a proper subspace of V, then there is at least one nonzero linear functional f in V* such that f = 0 on W. Prove this fact directly by elementary means. (You are allowed to construct a suitable basis.)
3.13 An m-tuple of linear functionals {fᵢ}₁ᵐ on a vector space V defines a linear mapping α ↦ ⟨f₁(α), ..., f_m(α)⟩ from V to ℝᵐ. What theorem is being applied here? Prove that the range of this linear mapping is the whole of ℝᵐ if and only if {fᵢ}₁ᵐ is an independent set of functionals. [Hint: If the range is a proper subspace W, there is a nonzero m-tuple a such that Σ₁ᵐ aᵢxᵢ = 0 for all x ∈ W.]
3.14 Continuing the above exercise, what is the null space N of the linear mapping α ↦ ⟨f₁(α), ..., f_m(α)⟩? If g is a linear functional which is zero on N, show that g is a linear combination of the fᵢ, now as a corollary of the above exercise and Theorem 4.3 of Chapter 1. (Assume the set {fᵢ}₁ᵐ independent.)
3.15 Write out from scratch the proof that T* is linear [for a given T in Hom(V, W)]. Also prove directly that T ↦ T* is linear.
3.16 Prove the other half of Theorem 3.5.
3.17 Let θᵢ be the isomorphism α ↦ α** from Vᵢ to Vᵢ** for i = 1, 2, and suppose given T in Hom(V₁, V₂). The loose statement T = T** means exactly that
T = θ₂⁻¹ ∘ T** ∘ θ₁, or T** ∘ θ₁ = θ₂ ∘ T.
Prove this identity. As usual, do this by proving that it holds for each α in V₁.
3.18 Let θ: ℝⁿ → V be a basis isomorphism. Prove that the adjoint θ* is the coordinate isomorphism for the dual basis if (ℝⁿ)* is identified with ℝⁿ in the natural way.
3.19 Let ω be any bilinear functional on V × W. Then the two associated linear transformations are T: V → W* defined by (T(ξ))(η) = ω(ξ, η) and S: W → V* defined by (S(η))(ξ) = ω(ξ, η). Prove that S = T* if W is identified with W**.
3.20 Suppose that f in (ℝᵐ)* has coordinate m-tuple a [f(y) = Σ₁ᵐ aᵢyᵢ] and that T in Hom(ℝⁿ, ℝᵐ) has matrix t = {tᵢⱼ}. Write out the explicit expression of the number f(T(x)) in terms of all these coordinates. Rearrange the sum so that it appears in the form
g(x) = Σ₁ⁿ bᵢxᵢ,
and then read off the formula for b in terms of a.
4. MATRICES
Matrices and linear transformations. The reader has already learned something
about matrices and their relationship to linear transformations from Chapter 1;
we shall begin our more systematic discussion by reviewing this earlier material.
By popular conception a matrix is a rectangular array of numbers such as

    [t_11  t_12  ...  t_1n]
    [  .     .          . ]
    [t_m1  t_m2  ...  t_mn].

Note that the first index numbers the rows and the second index numbers the columns. If there are m rows and n columns in the array, it is called an m-by-n (m × n) matrix. This notion is inexact. A rectangular array is a way of picturing a matrix, but a matrix is really a function, just as a sequence is a function. With the notation m̄ = {1, ..., m}, the above matrix is a function assigning a number to every pair of integers ⟨i, j⟩ in m̄ × n̄. It is thus an element of the set ℝ^{m×n}. The addition of two m × n matrices is performed in the obvious place-by-place way, and is merely the addition of two functions in ℝ^{m×n}; the same is true for scalar multiplication. The set of all m × n matrices is thus the vector space ℝ^{m×n}, a Cartesian space with a rather fancy finite index set. We shall use the customary index notation t_ij for the value t(i, j) of the function t at ⟨i, j⟩, and we shall also write {t_ij} for t, just as we do for sequences and other indexed collections.
The additional properties of matrices stem from the correspondence between m × n matrices {t_ij} and transformations T ∈ Hom(ℝⁿ, ℝᵐ).
The following theorem restates results from the first chapter. See Theorems
1.2, 1.3, and 6.2 of Chapter 1 and the discussion of the linear combination map
at the end of Section 1.6.
Theorem 4.1. Let {t_ij} be an m-by-n matrix, and let t^j be the m-tuple that is its jth column for j = 1, ..., n. Then there is a unique T in Hom(ℝⁿ, ℝᵐ) such that skeleton T = {t^j}, i.e., such that T(δ^j) = t^j for all j. T is defined as the linear combination mapping x ↦ y = Σ_{j=1}^n x_j t^j, and an equivalent presentation of T is the collection of scalar equations

    y_i = Σ_{j=1}^n t_ij x_j    for i = 1, ..., m.

Each T in Hom(ℝⁿ, ℝᵐ) arises this way, and the bijection {t_ij} ↦ T from ℝ^{m×n} to Hom(ℝⁿ, ℝᵐ) is a natural isomorphism.
The only additional remark called for here is that in identifying an m × n matrix with an n-tuple of m-tuples, we are making use of one of the standard identifications of duality (Section 0.10). We are treating the natural isomorphism between the really distinct spaces ℝ^{m×n} and (ℝᵐ)ⁿ as though it were the identity.

We can also relate T to {t_ij} by way of the rows of {t_ij}. As above, taking ith coordinates in the m-tuple equation y = Σ_{j=1}^n x_j t^j, we get the equivalent and familiar system of numerical (scalar) equations y_i = Σ_{j=1}^n t_ij x_j for i = 1, ..., m. Now the mapping x ↦ Σ_{j=1}^n c_j x_j from ℝⁿ to ℝ is the most general linear functional on ℝⁿ. In the above numerical equations, therefore, we have simply used the m rows of the matrix {t_ij} to present the m-tuple of linear functionals on ℝⁿ which is equivalent to the single m-tuple-valued linear mapping T in Hom(ℝⁿ, ℝᵐ) by Theorem 3.6 of Chapter 1.
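As a concrete check, here is a minimal Python sketch (the arrays and names are ours, not the text's) that applies a 2 × 3 matrix t to an x in ℝ³ both ways: as the linear combination Σ_j x_j t^j of the columns, and row by row, with each row acting as a linear functional.

    import numpy as np

    t = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 3.0]])   # a 2 x 3 matrix, so T : R^3 -> R^2
    x = np.array([2.0, -1.0, 1.0])

    # Column view: y = sum_j x_j t^j, a linear combination of the columns of t.
    y_columns = sum(x[j] * t[:, j] for j in range(t.shape[1]))

    # Row view: y_i = sum_j t_ij x_j, the ith row of t acting as a functional.
    y_rows = np.array([t[i, :] @ x for i in range(t.shape[0])])

    assert np.allclose(y_columns, y_rows) and np.allclose(y_columns, t @ x)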
The choice of ordered bases for arbitrary finite-dimensional spaces V and W allows us to transfer the above theorem to Hom(V, W). Since we are now going to correlate a matrix t in ℝ^{m×n} with a transformation T in Hom(V, W), we shall designate the transformation in Hom(ℝⁿ, ℝᵐ) discussed above by T̄.
Theorem 4.2. Let {α_j}_1^n and {β_i}_1^m be ordered bases for the vector spaces V and W, respectively. For each matrix {t_ij} in ℝ^{m×n} let T be the unique element of Hom(V, W) such that T(α_j) = Σ_{i=1}^m t_ij β_i for j = 1, ..., n. Then the mapping {t_ij} ↦ T is an isomorphism from ℝ^{m×n} to Hom(V, W).

Proof. We simply combine the isomorphism {t_ij} ↦ T̄ of the above theorem with the isomorphism T̄ ↦ T = ψ ∘ T̄ ∘ φ⁻¹ from Hom(ℝⁿ, ℝᵐ) to Hom(V, W), where φ and ψ are the two given basis isomorphisms. Then T is the transformation described in the theorem, for T(α_j) = ψ(T̄(φ⁻¹(α_j))) = ψ(T̄(δ^j)) = ψ(t^j) = Σ_{i=1}^m t_ij β_i. The map {t_ij} ↦ T is the composition of two isomorphisms and so is an isomorphism. □
It is instructive to look at what we have just done in a slightly different way. Given the matrix {t_ij}, let τ_j be the vector in W whose coordinate m-tuple is the jth column t^j of the matrix, so that τ_j = Σ_{i=1}^m t_ij β_i. Then let T be the unique element of Hom(V, W) such that T(α_j) = τ_j for j = 1, ..., n. Now we have obtained T from {t_ij} in the following two steps: T corresponds to the n-tuple {τ_j}_1^n under the isomorphism from Hom(V, W) to Wⁿ given by Theorem 1.6, and {τ_j}_1^n corresponds to the matrix {t_ij} by extension of the coordinate isomorphism between W and ℝᵐ to its product isomorphism from Wⁿ to (ℝᵐ)ⁿ.
Corollary. If y is the coordinate m-tuple of the vector η in W and x is the coordinate n-tuple of ξ in V (with respect to the given bases), then η = T(ξ) if and only if y_i = Σ_{j=1}^n t_ij x_j for i = 1, ..., m.

Proof. We know that the scalar equations are equivalent to y = T̄(x), which is the equation y = ψ⁻¹ ∘ T ∘ φ(x). The isomorphism ψ converts this to the equation η = T(ξ). □
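Computationally, the theorem and its corollary say that t is the matrix of ψ⁻¹ ∘ T ∘ φ once basis isomorphisms are fixed. A small numerical sketch (Python with numpy; the particular bases and map are invented for illustration):

    import numpy as np

    # V = W = R^2 with nonstandard ordered bases; the columns of phi and psi
    # are the basis vectors, so phi and psi are the basis isomorphisms.
    phi = np.array([[1.0, 1.0], [0.0, 1.0]])   # basis {alpha_1, alpha_2} for V
    psi = np.array([[2.0, 0.0], [1.0, 1.0]])   # basis {beta_1, beta_2} for W
    T   = np.array([[0.0, 1.0], [1.0, 0.0]])   # T in standard coordinates

    # The matrix of T with respect to the two bases: t = psi^{-1} o T o phi.
    t = np.linalg.inv(psi) @ T @ phi

    # Column j of t is the coordinate pair of T(alpha_j) with respect to
    # {beta_i}, exactly as in Theorem 4.2.
    for j in range(2):
        assert np.allclose(psi @ t[:, j], T @ phi[:, j])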
Our problem now is to discover the matrix analogues of relationships between linear transformations. For transformations between the Cartesian spaces ℝⁿ this is a fairly direct, uncomplicated business, because, as we know, the matrix here is a natural alter ego for the transformation. But when we leave the Cartesian spaces, a transformation T no longer has a matrix in any natural way, and only acquires one when bases are chosen and a corresponding T̄ on Cartesian spaces is thereby obtained. All matrices now are determined with respect to
chosen bases, and all calculations are complicated by the necessary presence of
the basis and coordinate isomorphisms. There are two ways of handling this
situation. The first, which we shall follow in general, is to describe things
directly for the general space V and simply to accept the necessarily more compli-
cated statements involving bases and dual bases and the corresponding loss in
transparency. The other possibility is first to read off the answers for the
Cartesian spaces and then to transcribe them via coordinate isomorphisms.
Lemma 4.1. The matrix element t_kj can be obtained from T by the formula

    t_kj = μ_k(T(α_j)),

where μ_k is the kth element of the dual basis in W*.

Proof. μ_k(T(α_j)) = μ_k(Σ_{i=1}^m t_ij β_i) = Σ_i t_ij μ_k(β_i) = Σ_i t_ij δ_{ki} = t_kj. □
In terms of Cartesian spaces, T̄(δ^j) is the jth column m-tuple t^j in the matrix {t_ij} of T̄, and t_kj is the kth coordinate of t^j. From the point of view of linear maps, the kth coordinate is obtained by applying the kth coordinate projection π_k, so that t_kj = π_k(T̄(δ^j)). Under the basis isomorphisms, π_k becomes μ_k, T̄ becomes T, δ^j becomes α_j, and the Cartesian identity becomes the identity of the lemma.
The transpose. The transpose of the m × n matrix {t_ij} is the n × m matrix {t*_ij} defined by t*_ij = t_ji for all i, j. The rows of t* are of course the columns of t, and conversely.
Theorem 4.3. The matrix of T* with respect to the dual bases in W* and
V* is the transpose of the matrix of T.
Proof. If s is the matrix of T*, then Lemmas 3.1 and 4.1 imply that

    s_ji = α_j**(T*(μ_i)) = α_j**(μ_i ∘ T) = (μ_i ∘ T)(α_j) = μ_i(T(α_j)) = t_ij. □
Definition. The row space of the matrix {t_ij} ∈ ℝ^{m×n} is the subspace of ℝⁿ spanned by the m row vectors. The column space is similarly the span of the n column vectors in ℝᵐ.
Corollary. The row and column spaces of a matrix have the same dimension.
Proof. If T is the element of Hom(ℝⁿ, ℝᵐ) defined by T(δ^j) = t^j, then the set {t^j}_1^n of column vectors in the matrix {t_ij} is the image under T of the standard basis of ℝⁿ, and so its span, which we have called the column space of the matrix, is exactly the range of T. In particular, the dimension of the column space is d(R(T)) = rank T.

Since the matrix of T* is the transpose t* of the matrix t, we have, similarly, that rank T* is the dimension of the column space of t*. But the column space of t* is the row space of t, and the assertion of the corollary is thus reduced to the identity rank T* = rank T, which is the corollary of Theorem 3.5. □
This common dimension is called the rank of the matrix.
Matrix products. If T ∈ Hom(ℝⁿ, ℝᵐ) and S ∈ Hom(ℝᵐ, ℝ^l), then of course R = S ∘ T ∈ Hom(ℝⁿ, ℝ^l), and it certainly should be possible to calculate the matrix r of R from the matrices s and t of S and T, respectively. To make this computation, we set y = T(x) and z = S(y), so that z = (S ∘ T)(x) = R(x). The equivalent scalar equations in terms of the matrices t and s are

    y_i = Σ_{h=1}^n t_ih x_h    and    z_k = Σ_{i=1}^m s_ki y_i,

so that

    z_k = Σ_{i=1}^m s_ki (Σ_{h=1}^n t_ih x_h) = Σ_{h=1}^n (Σ_{i=1}^m s_ki t_ih) x_h.

But z_k = Σ_{h=1}^n r_kh x_h for k = 1, ..., l. Taking x as δ^j, we have

    r_kj = Σ_{i=1}^m s_ki t_ij

for all k and j.
We thus have found the formula for the matrix r of the map R = S ∘ T: x ↦ z. Of course, r is defined to be the product of the matrices s and t, and we write r = s · t or r = st.

Note that in order for the product st to be defined, the number of columns in the left factor must equal the number of rows in the right factor. We get the element r_kj by going across the kth row of s and simultaneously down the jth
column of t, multiplying corresponding elements as we go, and adding the resulting products. This process is illustrated in Fig. 2.1. In terms of the scalar product (x, y) = Σ_1^n x_i y_i on ℝⁿ, we see that the element r_kj in r = st is the scalar product of the kth row of s and the jth column of t.
[Fig. 2.1. Schematic of the product (l by m) × (m by n) = (l by n): the kth row of s and the jth column of t meet at the entry r_kj of r = s·t.]
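The formula r_kj = Σ_i s_ki t_ij is exactly a triple loop. A short Python sketch (our own code, checked against numpy's product):

    import numpy as np

    def mat_mult(s, t):
        """Product of an l x m matrix s and an m x n matrix t."""
        l, m = len(s), len(s[0])
        assert len(t) == m, "columns of left factor must equal rows of right"
        n = len(t[0])
        # r_kj is the scalar product of the kth row of s and jth column of t.
        return [[sum(s[k][i] * t[i][j] for i in range(m)) for j in range(n)]
                for k in range(l)]

    s = [[1, 0, 2], [0, 1, 1]]          # 2 x 3
    t = [[1, 2], [0, 1], [3, 0]]        # 3 x 2
    assert np.allclose(mat_mult(s, t), np.array(s) @ np.array(t))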
Since we have defined the product of two matrices as the matrix of the product of the corresponding transformations, i.e., so that the mapping T ↦ {t_ij} preserves products (S ∘ T ↦ st), it follows from the general principle of Theorem 4.1 of Chapter 1 that the algebraic laws satisfied by composition of transformations will automatically hold for the product of matrices. For example, we know without making an explicit computation that matrix multiplication is associative. Then for square matrices we have the following theorem.

Theorem 4.4. The set M_n of square n × n matrices is an algebra naturally isomorphic to the algebra Hom(ℝⁿ).

Proof. We already know that T ↦ {t_ij} is a natural linear isomorphism from Hom(ℝⁿ) to M_n (Theorem 4.1), and we have defined the product of matrices so that the mapping also preserves multiplication. The laws of algebra (for an algebra) therefore follow for M_n from our observation in Theorem 3.5 of Chapter 1 that they hold for Hom(ℝⁿ). □
The identity I in Hom(ℝⁿ) takes the basis vector δ^j into itself, and therefore its matrix e has δ^j for its jth column: e^j = δ^j. Thus e_ij = δ_i^j = 1 if i = j and e_ij = δ_i^j = 0 if i ≠ j. That is, the matrix e is 1 along the main diagonal (from upper left to lower right) and 0 elsewhere. Since I ↦ e under the algebra isomorphism T ↦ t, we know that e is the identity for matrix multiplication. Of course, we can check this directly: Σ_{j=1}^n t_ij e_jk = t_ik, and similarly for multiplying by e on the left. The symbol 'e' is ambiguous in that we have used it to denote the identity in the space ℝ^{n×n} of square n × n matrices for any n.
Corollary. A square n X n matrix t has a multiplicative inverse if and only
if its rank is n.
Proof. By the theorem there exists an s ∈ M_n such that st = ts = e if and only if there exists an S ∈ Hom(ℝⁿ) such that S ∘ T = T ∘ S = I. But such an S exists if and only if T is an isomorphism, and by the corollary to Theorem 2.4 this is equivalent to the dimension of the range of T being n. But this dimension is the rank of t, and the argument is complete. □
A square matrix (or a transformation in Hom V) is said to be nonsingular
if it is invertible.
Theorem 4.5. If {α_i}_1^n, {β_j}_1^m, and {γ_k}_1^l are ordered bases for the vector spaces U, V, and W, respectively, and if T ∈ Hom(U, V) and S ∈ Hom(V, W), then the matrix of S ∘ T is the product of the matrices of S and T (with respect to the given bases).

Proof. By definition the matrix of S ∘ T is the matrix of χ⁻¹ ∘ (S ∘ T) ∘ φ in Hom(ℝⁿ, ℝ^l), where φ and χ are the given basis isomorphisms for U and W. But if ψ is the basis isomorphism for V, we have

    χ⁻¹ ∘ (S ∘ T) ∘ φ = (χ⁻¹ ∘ S ∘ ψ) ∘ (ψ⁻¹ ∘ T ∘ φ) = S̄ ∘ T̄,

and therefore its matrix is the product of the matrices of S̄ and T̄ by the definition of matrix multiplication. The latter are the matrices of S and T with respect to the given bases. Putting these observations together, we have the theorem. □
There is a simple relationship between matrix products and transposition.
Theorem 4.6. If the matrix product st is defined, then so is t*s*, and t*s* = (st)*.

Proof. A direct calculation is easy. We have

    (st)*_jk = (st)_kj = Σ_{i=1}^m s_ki t_ij = Σ_{i=1}^m t*_ji s*_ik = (t*s*)_jk.

Thus (st)* = t*s*, as asserted. □
This identity is clearly the matrix form of the transformation identity (S ∘ T)* = T* ∘ S*, and it can be deduced from the latter identity if desired.
Cartesian vectors as matrices. We can view an n-tuple x = ⟨x₁, ..., x_n⟩ as being alternatively either an n × 1 matrix, in which case we call it a column vector, or a 1 × n matrix, in which case we call it a row vector. Of course, these identifications are natural isomorphisms. The point of doing this is, in part, that then the equations y_i = Σ_{j=1}^n t_ij x_j say exactly that the column vector y is the matrix product of t and the column vector x, that is, y = t·x. The linear map T: ℝⁿ → ℝᵐ becomes left multiplication by the fixed matrix t when ℝⁿ is viewed as the space of n × 1 column vectors. For this reason we shall take the column vector as the standard matrix interpretation of an n-tuple x; then x* is the corresponding row vector.
In particular, a linear functional F ∈ (ℝⁿ)* becomes left multiplication by its matrix, which is of course 1 × n (F being from ℝⁿ to ℝ¹), and therefore is simply the row matrix interpretation of an n-tuple in ℝⁿ. That is, in the natural isomorphism a ↦ L_a from ℝⁿ to (ℝⁿ)*, where L_a(x) = Σ_1^n a_i x_i, the functional L_a can now be interpreted as left matrix multiplication by the n-tuple a viewed as the row vector a*. The matrix product of the row vector (1 × n matrix) a* and the column vector (n × 1 matrix) x is a 1 × 1 matrix a*·x, that is, a number.

Let us now see what these observations say about T*. The number L_a(T(x)) is the 1 × 1 matrix a*·t·x. Since L_a(T(x)) = (T*(L_a))(x) by the definition of T*, we see that the functional T*(L_a) is left multiplication by the row vector a*·t. Since the row vector form of L_a is a* and the row vector form of T*(L_a) is a*·t, this shows that when the functionals on ℝⁿ are interpreted as row vectors, T* becomes right multiplication by t. This only repeats something we already know. If we take transposes to throw the row vectors into the standard column vector form for n-tuples, it shows that T* is left multiplication by t*, and so gives another proof that the matrix of T* is t*.
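These identifications are easy to test numerically; in the following sketch (names ours) the functional L_a is the row vector a*, and T*(L_a) appears as the row vector a*·t, equivalently the column vector t*·a:

    import numpy as np

    t = np.array([[1.0, 2.0, 0.0],
                  [3.0, 1.0, 1.0]])      # T : R^3 -> R^2
    a = np.array([2.0, -1.0])            # functional L_a on R^2, row vector a*
    x = np.array([1.0, 4.0, -2.0])

    # L_a(T(x)) computed two ways: a*(t x) and (a* t) x.
    assert np.isclose(a @ (t @ x), (a @ t) @ x)

    # As a column vector, the coordinate triple of T*(L_a) is t* a.
    assert np.allclose(a @ t, t.T @ a)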
Change of basis. If φ: x ↦ ξ = Σ_1^n x_i β_i and φ̄: y ↦ ξ = Σ_1^n y_i β̄_i are two basis isomorphisms for V, then A = φ̄⁻¹ ∘ φ is the isomorphism in Hom(ℝⁿ) which takes the coordinate n-tuple x of a vector ξ with respect to the basis {β_i} into the coordinate n-tuple y of the same vector with respect to the basis {β̄_i}. The isomorphism A is called the "change of coordinates" isomorphism. In terms of the matrix a of A, we have y = ax, as above.

The change of coordinate map A = φ̄⁻¹ ∘ φ should not be confused with the similar-looking T = φ̄ ∘ φ⁻¹. The latter is a mapping on V, and is the element of Hom(V) which takes each β_i to β̄_i.
[Fig. 2.2. The commutative diagram relating T: V → W, the Cartesian maps T′ and T″ in Hom(ℝⁿ, ℝᵐ), the basis isomorphisms φ₁, φ₂, ψ₁, ψ₂, and the change of coordinates maps A and B.]
We now want to see what happens to the matrix of a transformation T ∈ Hom(V, W) when we change bases in its domain and codomain spaces. Suppose then that φ₁ and φ₂ are basis isomorphisms from ℝⁿ to V, that ψ₁ and ψ₂ are basis isomorphisms from ℝᵐ to W, and that t′ and t″ are the matrices of T with respect to the first and second bases, respectively. That is, t′ is the matrix of T′ = ψ₁⁻¹ ∘ T ∘ φ₁ ∈ Hom(ℝⁿ, ℝᵐ), and similarly for t″. The mapping A = φ₂⁻¹ ∘ φ₁ ∈ Hom(ℝⁿ) is the change of coordinates transformation for V: if x is the coordinate n-tuple of a vector ξ with respect to the first basis [that is, ξ = φ₁(x)], then A(x) is its coordinate n-tuple with respect to the second basis. Similarly, let B be the change of coordinates map ψ₂⁻¹ ∘ ψ₁ for W. The diagram in Fig. 2.2 will help keep the various relationships of these spaces and
mappings straight. We say that the diagram is commutative, which means that any two paths between two points represent the same map. By selecting various pairs of paths, we can read off all the identities which hold for the nine maps T, T′, T″, φ₁, φ₂, A, ψ₁, ψ₂, B. For example, T″ can be obtained by going backward along A, forward along T′, and then forward along B. That is, T″ = B ∘ T′ ∘ A⁻¹. Since these "outside maps" are all maps of Cartesian spaces, we can then read off the corresponding matrix identity

    t″ = b·t′·a⁻¹,

showing how the matrix of T with respect to the second pair of bases is obtained from its matrix with respect to the first pair.

What we have actually done in reading off the above identity from the diagram is to eliminate certain retraced steps in the longer path which the definitions would give us. Thus from the definitions we get

    B ∘ T′ ∘ A⁻¹ = (ψ₂⁻¹ ∘ ψ₁) ∘ (ψ₁⁻¹ ∘ T ∘ φ₁) ∘ (φ₁⁻¹ ∘ φ₂) = ψ₂⁻¹ ∘ T ∘ φ₂ = T″.
In the above situation the domain and codomain spaces were different, and
the two basis changes were independent of each other. If W = V, so that
T ∈ Hom(V), then of course we consider only one basis change and the formula becomes

    t″ = a·t′·a⁻¹.
Now consider a linear functional F ∈ V*. If f′ and f″ are its coordinate n-tuples considered as column vectors (n × 1 matrices), then the matrices of F with respect to the two bases are the row vectors (f′)* and (f″)*, as we saw earlier. Also, there is no change of basis in the range space since here W = ℝ, with its permanent natural basis vector 1. Therefore, b = e in the formula t″ = b·t′·a⁻¹, and we have (f″)* = (f′)*·a⁻¹ or

    f″ = (a⁻¹)*·f′.
We want to compare this with the change of coordinates of a vector ξ ∈ V, which, as we saw earlier, is given by

    x″ = a·x′.

These changes go in opposite directions (with a transposition thrown in).
For reasons largely historical, functionals F in V* are called covariant vectors,
and since the matrix for a change of coordinates in V is the transpose of the
inverse of the matrix for the corresponding change of coordinates in V*, the
vectors ξ in V are called contravariant vectors. These terms are used in classical
tensor analysis and differential geometry.
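The bookkeeping here is easy to get wrong, so a numerical spot check is worth having. The sketch below (all matrices randomly invented for illustration) builds t′ and t″ from two pairs of bases and confirms t″ = b·t′·a⁻¹, together with the contravariant and covariant transformation rules:

    import numpy as np

    rng = np.random.default_rng(0)
    T = rng.standard_normal((2, 2))           # a map in standard coordinates
    phi1, phi2 = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))  # bases for V
    psi1, psi2 = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))  # bases for W

    t1 = np.linalg.inv(psi1) @ T @ phi1       # t', matrix in the first pair of bases
    t2 = np.linalg.inv(psi2) @ T @ phi2       # t'', matrix in the second pair
    a = np.linalg.inv(phi2) @ phi1            # change of coordinates A for V
    b = np.linalg.inv(psi2) @ psi1            # change of coordinates B for W

    assert np.allclose(t2, b @ t1 @ np.linalg.inv(a))   # t'' = b t' a^{-1}

    xi = rng.standard_normal(2)               # a vector, in standard coordinates
    x1, x2 = np.linalg.inv(phi1) @ xi, np.linalg.inv(phi2) @ xi
    assert np.allclose(x2, a @ x1)            # x'' = a x'   (contravariant)

    F = rng.standard_normal(2)                # a functional, in standard coordinates
    f1, f2 = phi1.T @ F, phi2.T @ F           # its coordinate n-tuples f', f''
    assert np.allclose(f2, np.linalg.inv(a).T @ f1)     # f'' = (a^{-1})* f' (covariant)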
The isomorphism {t_ij} ↦ T, being from a Cartesian space ℝ^{m×n}, is automatically a basis isomorphism. Its basis in Hom(V, W) is the image under the isomorphism of the standard basis in ℝ^{m×n}, where the latter is the set of Kronecker functions δ^{kl} defined by δ^{kl}(i, j) = 0 if ⟨k, l⟩ ≠ ⟨i, j⟩ and δ^{kl}(k, l) = 1. (Remember that in ℝ^A, δ^a is that function such that δ^a(b) = 0 if b ≠ a and δ^a(a) = 1. Here A = m̄ × n̄ and the elements a of A are ordered pairs a = ⟨k, l⟩.) The function δ^{kl} is that matrix whose columns are all 0 except for the lth, and the lth column is the m-tuple δ^k. The corresponding transformation D^{kl} thus takes every basis vector α_j to 0 except α_l and takes α_l to β_k. That is, D^{kl}(α_j) = 0 if j ≠ l, and D^{kl}(α_l) = β_k. Again, D^{kl} takes the lth basis vector in V to the kth basis vector in W and takes the other basis vectors in V to 0. If ξ = Σ x_i α_i, it follows that D^{kl}(ξ) = x_l β_k.
Since {D^{kl}} is the basis defined by the isomorphism {t_ij} ↦ T, it follows that {t_ij} is the coordinate set of T with respect to this basis; it is the image of T under the coordinate isomorphism. It is interesting to see how this basis expansion of T automatically appears. We have

    T(ξ) = Σ_i (Σ_j t_ij x_j) β_i = Σ_{i,j} t_ij x_j β_i = Σ_{i,j} t_ij D^{ij}(ξ),

so that

    T = Σ_{i,j} t_ij D^{ij}.
Our original discussion of the dual basis in V* was a special case of the present situation. There we had Hom(V, ℝ) = V*, with the permanent standard basis 1 for ℝ. The basis for V* corresponding to the basis {α_i} for V therefore consists of those maps D^l taking α_l to 1 and α_j to 0 for j ≠ l. Then D^l(ξ) = D^l(Σ x_j α_j) = x_l, and D^l is the lth coordinate functional.

Finally, we note that the matrix expression of T ∈ Hom(ℝⁿ, ℝᵐ) is very suggestive of the block decompositions of T that we discussed earlier in Section 1.5. In the exercises we shall ask the reader to show that in fact T_kl = t_kl D^{kl}.
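In coordinates the basis {D^{kl}} is just the set of matrix units, and the expansion T = Σ t_ij D^{ij} can be verified mechanically; a minimal sketch (function names ours):

    import numpy as np

    m, n = 2, 3
    t = np.arange(1.0, 7.0).reshape(m, n)      # an arbitrary 2 x 3 matrix

    def delta(k, l):
        """The Kronecker matrix unit: 1 in position (k, l), 0 elsewhere."""
        d = np.zeros((m, n))
        d[k, l] = 1.0
        return d

    # t is the coordinate set of T with respect to the basis {D^kl}.
    expansion = sum(t[i, j] * delta(i, j) for i in range(m) for j in range(n))
    assert np.allclose(expansion, t)

    # D^kl takes the lth basis vector of V to the kth basis vector of W:
    x = np.array([5.0, -2.0, 7.0])
    assert np.allclose(delta(1, 2) @ x, x[2] * np.eye(m)[:, 1])   # D^kl(xi) = x_l beta_k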
EXERCISES
4.1 Prove that if w: V × V → ℝ is a bilinear functional on V and T: V → V* is the corresponding linear transformation defined by (T(η))(ξ) = w(ξ, η), then for any basis {α_i} for V the matrix t_ij = w(α_i, α_j) is the matrix of T.
4.2 Verify that the row and column rank of the following matrix are both 1:

    [ -5   2]
    [-10   4].
4.3 Show by a direct calculation that if the row rank of a 2 × 3 matrix is 1, then so is its column rank.
4.4 Let {f_i}_1^3 be a linearly dependent set of C²-functions (twice continuously differentiable real-valued functions) on ℝ. Show that the three triples ⟨f_i(x), f_i′(x), f_i″(x)⟩ are dependent for any x. Prove therefore that sin t, cos t, and e^t are linearly independent. (Compute the derivative triples for a well-chosen x.)
4.5 Compute

    [the displayed matrix product is garbled in this copy].
4.6 Compute

    [a  b]   [ d  -b]
    [c  d] × [-c   a].

From your answer give a necessary and sufficient condition for

    [a  b]⁻¹
    [c  d]

to exist.
4.7 A matrix a is idempotent if a² = a. Find a basis for the vector space ℝ^{2×2} of all 2 × 2 matrices consisting entirely of idempotents.
4.8 By a direct calculation show that
is invertible and find its inverse.
4.9 Show by explicitly solving the equation

    [a  b]   [x  y]   [1  0]
    [c  d] · [z  w] = [0  1]

that the matrix on the left is invertible if and only if (the determinant) ad - bc is not zero.
4.10 Find a nonzero 2 × 2 matrix

    [a  b]
    [c  d]

whose square is zero.

4.11 Find all 2 × 2 matrices whose squares are zero.

4.12 Prove by computing matrix products that matrix multiplication is associative.

4.13 Similarly, prove directly the distributive law, (r + s)·t = r·t + s·t.
4.14 Show that left matrix multiplication by a fixed r in ℝ^{m×n} is a linear transformation from ℝ^{n×p} to ℝ^{m×p}. What theorem in Chapter 1 does this mirror?
4.15 Show that the rank of a product of two matrices is at most the minimum of their ranks. (Remember that the rank of a matrix is the dimension of the range space of its associated T.)

4.16 Let a be an m × n matrix, and let b be n × m. If m > n, show that a·b cannot be the identity e (m × m).
4.17 Let Z be the subset of 2 × 2 matrices of the form

    [a  -b]
    [b   a].

Prove that Z is a subalgebra of ℝ^{2×2} (that is, Z is closed under addition, scalar multiplication, and matrix multiplication). Show that in fact Z is isomorphic to the complex number system.
4.18 A matrix (necessarily square) which is equal to its transpose is said to be symmetric. As a square array it is symmetric about the main diagonal. Show that for any m × n matrix t the product t·t* is meaningful and symmetric.

4.19 Show that if s and t are symmetric n × n matrices, and if they commute, then s·t is symmetric. (Do not try to answer this by writing out matrix products.) Show conversely that if s, t, and s·t are all symmetric, then s and t commute.
4.20 Suppose that T in Hom ℝ² has a symmetric matrix and that T is not of the form cI. Show that T has exactly two eigenvectors (up to scalar multiples). What does the matrix of T become with respect to the "eigenbasis" for ℝ² consisting of these two eigenvectors?

4.21 Show that the symmetric 2 × 2 matrix t has a symmetric square root s (s² = t) if and only if its eigenvalues are nonnegative. (Assume the above exercise.)
4.22 Suppose that t is a 2 × 2 matrix such that t* = t⁻¹. Show that t has one of the forms

    [ a  b]        [a   b]
    [-b  a]   or   [b  -a],

where a² + b² = 1.
4.23 Prove that multiplication by the above t is a Euclidean isometry. That is, show that if y = t·x, where x and y ∈ ℝ², then ‖x‖ = ‖y‖, where ‖x‖ = (x₁² + x₂²)^{1/2}.
4.24 Let {D^{kl}} be the basis for Hom(V, W) defined in the text. Taking W = V, show that these operators satisfy the very important multiplication rules

    D^{ij} ∘ D^{kl} = 0    if j ≠ k,
    D^{ik} ∘ D^{kl} = D^{il}.
4.25 Keeping the above identities in mind, show that if l ≠ m, then there are transformations S and T in Hom V such that

    S ∘ T - T ∘ S = D^{lm}.

Also find S and T such that

    S ∘ T - T ∘ S = D^{ll} - D^{mm}.
4.26 Given T in Hom ℝⁿ, we know from Chapter 1 that T = Σ_{i,j} T_ij, where T_ij = P_i T P_j and P_i = θ_i π_i. Now we also have T = Σ_{i,j} t_ij D^{ij}. Show from the definition of D^{ij} in the text that P_i D^{ij} P_j = D^{ij} and that P_i D^{kl} P_j = 0 if either i ≠ k or j ≠ l. Conclude that T_ij = t_ij D^{ij}.
5. TRACE AND DETERMINANT
Our aim in this short section is to acquaint the reader with two very special
real-valued functions on Hom V and to describe some of their properties.
Theorem 5.1. If V is an n-dimensional vector space, there is exactly one linear functional λ on the vector space Hom(V) with the property that λ(S ∘ T) = λ(T ∘ S) for all S, T in Hom(V) and normalized so that λ(I) = n. If a basis is chosen for V and the corresponding matrix of T is {t_ij}, then λ(T) = Σ_{i=1}^n t_ii, the sum of the elements on the main diagonal.
Proof. If we choose a basis and define λ(T) as Σ_1^n t_ii, then it is clear that λ is a linear functional on Hom(V) and that λ(I) = n. Moreover,

    λ(S ∘ T) = Σ_{i=1}^n (Σ_{j=1}^n s_ij t_ji) = Σ_{i,j=1}^n s_ij t_ji = Σ_{i,j} t_ji s_ij = λ(T ∘ S).

That is, each basis for V gives us a functional λ in (Hom V)* such that λ(S ∘ T) = λ(T ∘ S), λ(I) = n, and λ(T) = Σ t_ii for the matrix representation of that basis.
Now suppose that μ is any element of (Hom(V))* such that μ(S ∘ T) = μ(T ∘ S) and μ(I) = n. If we choose a basis for V and use the isomorphism θ: {t_ij} ↦ T from ℝ^{n×n} to Hom V, we have a functional ν = μ ∘ θ on ℝ^{n×n} (ν = θ*(μ)) such that ν(st) = ν(ts) and ν(e) = n. By Theorem 4.1 (or 3.1) ν is given by a matrix c, ν(t) = Σ_{i,j=1}^n c_ij t_ij, and the equation ν(st - ts) = 0 becomes Σ_{i,j,k=1}^n c_ij (s_ik t_kj - s_jk t_ki) = 0.

We are going to leave it as an exercise for the reader to show that if l ≠ m, then very simple special matrices s and t can be chosen so that this sum reduces to c_lm = 0, and, by a different choice, to c_ll - c_mm = 0.

Together with the requirement that ν(e) = n, this implies that c_lm = 0 for l ≠ m and c_mm = 1 for m = 1, ..., n. That is, ν(t) = Σ_1^n t_mm, and ν is the λ of the basis being used. Altogether this shows that there is a unique λ in (Hom V)* such that λ(S ∘ T) = λ(T ∘ S) for all S and T and λ(I) = n, and that λ(T) has the diagonal evaluation Σ t_ii in every basis. □
This unique λ is called the trace functional, and λ(T) is the trace of T. It is usually designated tr(T).
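A numerical spot check of the two properties characterizing λ (a sketch; any square matrices will do):

    import numpy as np

    rng = np.random.default_rng(1)
    s, t = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

    tr = lambda m: sum(m[i, i] for i in range(m.shape[0]))   # diagonal evaluation

    assert np.isclose(tr(s @ t), tr(t @ s))    # lambda(S o T) = lambda(T o S)
    assert np.isclose(tr(np.eye(3)), 3)        # lambda(I) = n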
The determinant function Δ(T) on Hom V is much more complicated, and we shall not prove that it exists until Chapter 7. Its geometric meaning is as follows. First, |Δ(T)| is the factor by which T multiplies volumes. More precisely, if we define a "volume" v for subsets of V by choosing a basis and using the coordinate correspondence to transfer to V the "natural" volume on ℝⁿ, then, for any figure A ⊂ V, v(T[A]) = |Δ(T)|·v(A). This will be spelled out in Chapter 8. Second, Δ(T) is positive or negative according as T preserves or reverses orientation, which again is a sophisticated notion to be explained later. For the moment we shall list properties of Δ(T) that are related to this geometric interpretation, and we give a sufficient number to show the uniqueness of Δ.
[Fig. 2.3. A shearing of the plane V along the lines parallel to the subspace N.]
We assume that for each finite-dimensional vector space V there is a function Δ (or Δ_V when there is any question about domain) from Hom(V) to ℝ such that the following are true:

a) Δ(S ∘ T) = Δ(S) Δ(T) for any S, T in Hom(V).

b) If a subspace N of V is invariant under T and T is the identity on N and on V/N (that is, T[α] = α for each coset α = a + N of N), then Δ(T) = 1. Such a T is a shearing of V along the planes parallel to N. In two dimensions it can be pictured as in Fig. 2.3.

c) If V is a direct sum V = M ⊕ N of T-invariant subspaces M and N, and if R = T↾M and S = T↾N, then Δ(T) = Δ(R) Δ(S). More exactly, Δ_V(T) = Δ_M(R) Δ_N(S).

d) If V is one-dimensional, so that any T in Hom(V) is simply multiplication by a constant c_T, then Δ(T) is that constant c_T.

e) If V is two-dimensional and T interchanges a pair of independent vectors, then Δ(T) = -1. This is clearly a pure orientation-changing property.
The fact that Δ is uniquely determined by these properties will follow from our discussion in the next section, which will also give us a process for calculating Δ. This process is efficient for dimensions greater than two, but for T in Hom(ℝ²) there is a simple formula for Δ(T) which every student should know by heart.

Theorem 5.2. If T is in Hom(ℝ²) and {t_ij} is its 2 × 2 matrix, then

    Δ(T) = t₁₁t₂₂ - t₁₂t₂₁.

This is a special case of a general formula, which we shall derive in Chapter 7, that expresses Δ(T) as a sum of n! terms, each term being a product of n numbers from the matrix of T. This formula is too complicated to be useful in computations for large n, but for n = 3 it is about as easy to use as our row-reduction calculation in the next section, and for n = 2 it becomes the above simple expression. There are a few more properties of Δ with which every student should be familiar. They will all be proved in Chapter 7.
Theorem 5.3. If T is in Hom V, then Δ(T*) = Δ(T). If θ is an isomorphism from V to W and S = θ ∘ T ∘ θ⁻¹, then Δ(S) = Δ(T).
Theorem 5.4. The transformation T is nonsingular (invertible) if and only if Δ(T) ≠ 0.

In the next theorem we consider T in Hom ℝⁿ, and we want to think of Δ(T) as a function of the matrix t of T. To emphasize this we shall use the notation D(t) = Δ(T).
Theorem 5.5 (Cramer's rule). Given an n × n matrix t and an n-tuple y, let t |_j y be the matrix obtained by replacing the jth column of t by y. Then

    y = t·x  ⇒  D(t)·x_j = D(t |_j y)

for all j.

If t is nonsingular [D(t) ≠ 0], this becomes an explicit formula for the solution x of the equation y = t·x; it is theoretically important even in those cases when it is not useful in practice (large n).
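For the 2 × 2 case both Theorem 5.2 and Cramer's rule fit in a few lines of Python (a sketch with our own names, not a general implementation):

    import numpy as np

    def det2(t):
        return t[0][0] * t[1][1] - t[0][1] * t[1][0]     # t11 t22 - t12 t21

    def cramer_solve(t, y):
        """Solve y = t x for a nonsingular 2 x 2 matrix t by Cramer's rule."""
        d = det2(t)
        assert d != 0
        # t |_j y : replace the jth column of t by y.
        x0 = det2([[y[0], t[0][1]], [y[1], t[1][1]]]) / d
        x1 = det2([[t[0][0], y[0]], [t[1][0], y[1]]]) / d
        return np.array([x0, x1])

    t = [[1.0, 2.0], [3.0, 4.0]]
    y = np.array([5.0, 6.0])
    assert np.allclose(np.array(t) @ cramer_solve(t, y), y)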
EXERCISES
5.1 Finish Theorem 5.1 by applying Exercise 4.25.
5.2 It follows from our discussion of trace that tr(T) = Σ t_ii is independent of the basis. Show that this fact follows directly from

    tr(t·s) = tr(s·t)

and the change of basis formula in the preceding section.
5.3 Show by direct computation that the function d(t) = t₁₁t₂₂ - t₁₂t₂₁ satisfies d(s·t) = d(s) d(t) (where s and t are 2 × 2 matrices). Conclude that if V is two-dimensional and d(T) is defined for T in Hom V by choosing a basis and setting d(T) = d(t), then d(T) is actually independent of the basis.
5.4 Continuing the above exercise, show that d(T) = Δ(T) in any of the following cases:

1) T interchanges two independent vectors.
2) T has two eigenvectors.
3) T has a matrix of the form

    [1  b]
    [0  1].

Show next that if T has none of the above forms, then T = R ∘ S, where S is of type (1) and R is of type (2) or (3). [Hint: Suppose T(α) = β, with α and β independent. Let S interchange α and β, and consider R = T ∘ S.] Show finally that d(T) = Δ(T) for all T in Hom V. (V is two-dimensional.)
5.5 If t is symmetric and 2 × 2, show that there is a 2 × 2 matrix s such that s* = s⁻¹, Δ(s) = 1, and s·t·s⁻¹ is diagonal.
5.6 Assuming Theorem 5.2, verify Theorem 5.4 for the 2 × 2 case.

5.7 Assuming Theorem 5.2, verify Theorem 5.5 for the 2 × 2 case.
5.8 In this exercise we suppose that the reader remembers what a continuous function of a real variable is. Suppose that the 2 × 2 matrix function

    a(t) = [a₁₁(t)  a₁₂(t)]
           [a₂₁(t)  a₂₂(t)]

has continuous components a_ij(t) for t ∈ (0, 1), and suppose that a(t) is nonsingular for every t. Show that the solution y(t) to the linear equation a(t)·y(t) = x(t) has continuous components y₁(t) and y₂(t) if the functions x₁(t) and x₂(t) are continuous.
5.9 A homogeneous second-order linear differential equation is an equation of the form

    y″ + a₁y′ + a₀y = 0,

where a₁ = a₁(t) and a₀ = a₀(t) are continuous functions. A solution is a C²-function f (i.e., a twice continuously differentiable function) such that f″(t) + a₁(t)f′(t) + a₀(t)f(t) = 0. Suppose that f and g are C²-functions [on (0, 1), say] such that the 2 × 2 matrix

    [f(t)   g(t) ]
    [f′(t)  g′(t)]

is always nonsingular. Show that there is a homogeneous second-order differential equation of which they are both solutions.
5.10 In the above exercise show that the space of all solutions is a two-dimensional vector space. That is, show that if h(t) is any third solution, then h is a linear combination of f and g.
5.11 By a "linear motion" of the Cartesian plane ℝ² into itself we shall mean a continuous map x ↦ t(x) from [0, 1] to the set of 2 × 2 nonsingular matrices such that t(0) = e. Show that Δ(t(1)) > 0.

5.12 Show that if Δ(s) = 1, then there is a linear motion whose final matrix t(1) is s.
6. MATRIX COMPUTATIONS
The computational process by which the reader learned to solve systems of
linear equations in secondary school algebra was undoubtedly "elimination by
successive substitutions". The first equation is solved for the first unknown, and
the solution expression is substituted for the first unknown in the remaining
equations, thereby eliminating the first unknown from the remaining equations.
Next, the second equation is solved for the second unknown, and this unknown is
then eliminated from the remaining equations. In this way the unknowns are
eliminated one at a time, and a solution is obtained.
This same procedure also solves the following additional problems:
1) to obtain an explicit basis for the linear span of a set of m vectors in ℝⁿ;
therefore, in particular,
2) to find the dimension of such a subspace;
3) to compute the determinant of an m X m matrix;
4) to compute the inverse of an invertible m X m matrix.
In this section we shall briefly study this process and the solutions to these
problems.
We start by noting that the kinds of changes we are going to make on a
finite sequence of vectors do not alter its span.
Lemma 6.1. Let {α_i}_1^m be any m-tuple of vectors in a vector space, and let {β_i}_1^m be obtained from {α_i}_1^m by any one of the following elementary operations:

1) interchanging two vectors;
2) multiplying some α_i by a nonzero scalar;
3) replacing α_i by α_i - xα_j for some j ≠ i and some x ∈ ℝ.

Then

    L({β_i}_1^m) = L({α_i}_1^m).

Proof. If β_i = α_i - xα_j, then α_i = β_i + xβ_j. Thus if {β_i}_1^m is obtained from {α_i}_1^m by one operation of type (3), then {α_i}_1^m can be obtained from {β_i}_1^m by one operation of type (3). In particular, each sequence is in the linear span of the other, and the two linear spans are therefore the same.

Similarly, each of the other operations can be undone by one of the same type, and the linear spans are unchanged. □
When we perform these operations on the sequence of row vectors in a matrix, we call them elementary row operations.

We define the order of an n-tuple x = ⟨x₁, ..., x_n⟩ as the index of the first nonzero entry. Thus if x_i = 0 for i < j and x_j ≠ 0, then the order of x is j. The order of ⟨0, 0, 0, 2, -1, 0⟩ is 4.
Let {a_ij} be an m × n matrix, let V be its row space, and let n₁ < n₂ < ... < n_k be the integers that occur as orders of nonzero vectors in V. We are going to construct a basis for V consisting of k elements having exactly the above set of orders.

If every nonzero row in {a_ij} has order > p, then every nonzero vector x in V has order > p, since x is a linear combination of these row vectors. Since some vector in V has the minimal order n₁, it follows that some row in {a_ij} has order n₁. We move such a row to the top by interchanging two rows. We then multiply this row x by a constant, so that its first nonzero entry x_{n₁} is 1. Let a¹, ..., aᵐ be the row vectors that we now have, so that a¹ has order n₁ and a¹_{n₁} = 1. We next subtract multiples of a¹ from each of the other rows in such a way that the new ith row has 0 as its n₁-coordinate. Specifically, we replace aⁱ by aⁱ - aⁱ_{n₁}·a¹ for i > 1. The matrix that we thus obtain has the property that its jth column is the zero m-tuple for each j < n₁ and its n₁th column is δ¹ in ℝᵐ. Its first row has order n₁, and every other row has order > n₁. Its row space is still V. We again call it a.
Now let x = Σ_1^m c_i aⁱ be a vector in V with order n₂. Then c₁ = 0, for if c₁ ≠ 0, then the order of x is n₁. Thus x is a linear combination of the second
to the mth rows, and, just as in the first case, one of these rows must therefore have order n₂.

We now repeat the above process all over again, keying now on this vector. We bring it to the second row, make its n₂-coordinate 1, and subtract multiples of it from all the other rows (including the first), so that the resulting matrix has δ² for its n₂th column. Next we find a row with order n₃, bring it to the third row, and make the n₃th column δ³, etc.
We exhibit this process below for one 3 X 4 matrix. This example is dis-
honest in that it has been chosen so that fractions will not occur through the
application of (2). The reader will not be that lucky when he tries his hand.
Our defense is that by keeping the matrices simple we make the process itself
more apparent.
[The worked example shows a 3 × 4 matrix carried by successive elementary row operations of types (3) and (2) to its row-reduced echelon form; the displayed matrices are garbled in this copy. The final matrix has leading 1's in columns 1, 2, and 4.]
Note that from the final matrix we can tell that the orders in the row space
are 1, 2, and 4, whereas the original matrix only displays the orders 1 and 2.
We end up with an m × n matrix having the same row space V and the following special structure:

1) For 1 ≤ j ≤ k the jth row has order n_j.
2) If k < m, the remaining m - k rows are zero (since a nonzero row would have order > n_k, a contradiction).
3) The n_jth column is δ^j.

It follows that any linear combination of the first k rows with coefficients c₁, ..., c_k has c_j in the n_jth place, and hence cannot be zero unless all the c_j's are zero. These k rows thus form a basis for V, solving problems (1) and (2).
Our final matrix is said to be in row-reduced echelon form. It can be shown to
be uniquely determined by the space V and the above requirements relating its
rows to the orders of the elements of V. Its rows form the canonical basis of V.
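The whole procedure is short enough to write out as code. The following Python sketch (our own rendering of the text's steps: find a row of minimal order, normalize its leading entry by operation (2), and clear its column by operation (3)) produces the row-reduced echelon form; exact rational arithmetic is used so that no roundoff intrudes:

    from fractions import Fraction

    def row_reduce(a):
        """Row-reduced echelon form, via the three elementary operations."""
        a = [[Fraction(x) for x in row] for row in a]
        m, n = len(a), len(a[0])
        r = 0                                   # next row to fill
        for col in range(n):
            # find a row at or below r whose order is col
            pivot = next((i for i in range(r, m) if a[i][col] != 0), None)
            if pivot is None:
                continue
            a[r], a[pivot] = a[pivot], a[r]                     # operation (1)
            a[r] = [x / a[r][col] for x in a[r]]                # operation (2)
            for i in range(m):                                  # operation (3)
                if i != r and a[i][col] != 0:
                    a[i] = [x - a[i][col] * y for x, y in zip(a[i], a[r])]
            r += 1
        return a

    rref = row_reduce([[1, 1, 1, 1], [1, 2, 3, 4], [2, 3, 4, 5]])
    # the nonzero rows of rref are the canonical basis of the row space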
A typical row-reduced echelon matrix is shown in Fig. 2.4. This matrix is 8 × 11, its orders are 1, 4, 5, 7, 10, and its row space has dimension 5. It is entirely 0 below the broken line. The dashes in the first five lines represent arbitrary numbers, but any change in these remaining entries changes the spanned space V.
We shall now look for the significance of the row-reduction operations from the point of view of general linear theory. In this discussion it will be convenient to use the fact from Section 4 that if an n-tuple in ℝⁿ is viewed as an n × 1 matrix (i.e., as a column vector), then the system of linear equations y_i = Σ_{j=1}^n a_ij x_j, i = 1, ..., m, expresses exactly the single matrix equation y = a·x. Thus the associated linear transformation A ∈ Hom(ℝⁿ, ℝᵐ) is now viewed as being simply multiplication by the matrix a; y = A(x) if and only if y = a·x.
[Fig. 2.4. An 8 × 11 row-reduced echelon matrix with leading 1's in columns 1, 4, 5, 7, and 10; dashes mark arbitrary entries, and every entry below the broken line is 0.]
We first note that each of our elementary row operations on an m × n matrix a is equivalent to premultiplication by a corresponding m × m elementary matrix u. Supposing for the moment that this is so, we can find out what u is by using the m × m identity matrix e. Since u·a = (u·e)·a, we see that the result of performing the operation on the matrix a can also be obtained by premultiplying a by the matrix u·e. That is, if the elementary operation can be obtained as matrix multiplication by u, then the multiplier is u·e. This argument suggests that we should perform the operation on e and then see if premultiplying a by the resulting matrix performs the operation on a.
If the elementary operation is interchanging the i₀th and j₀th rows, then performing it on e gives the matrix u with u_kk = 1 for k ≠ i₀ and k ≠ j₀, u_{i₀j₀} = u_{j₀i₀} = 1, and u_kl = 0 for all other indices. Moreover, examination of the sums defining the elements of the product matrix u·a will show that premultiplying by this u does just interchange the i₀th and j₀th rows of any m × n matrix a.

In the same way, multiplying the i₀th row of a by c is equivalent to premultiplying by the matrix u which is the same as e except that u_{i₀i₀} = c. Finally, multiplying the j₀th row by x and adding it to the i₀th row is equivalent to premultiplying by the matrix u which is the identity e except that u_{i₀j₀} is x instead of 0.
These three elementary matrices are indicated schematically in Fig. 2.5.
Each has the value 1 on the main diagonal and 0 off the main diagonal except as
indicated.
[Fig. 2.5. Schematic pictures of the three elementary matrices: the i₀, j₀ row interchange, the matrix with c in position ⟨i₀, i₀⟩, and the matrix with x in position ⟨i₀, j₀⟩.]
These elementary matrices u are all nonsingular (invertible). The row interchange matrix is its own inverse. The inverse of multiplying the jth row by c is multiplying the same row by 1/c. And the inverse of adding c times the jth row to the ith row is adding -c times the jth row to the ith row.
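The three elementary matrices can be constructed and tested mechanically; a sketch (numpy assumed, names ours):

    import numpy as np

    def swap(m, i0, j0):
        u = np.eye(m); u[[i0, j0]] = u[[j0, i0]]; return u   # interchange rows i0, j0

    def scale(m, i0, c):
        u = np.eye(m); u[i0, i0] = c; return u               # multiply row i0 by c

    def shear(m, i0, j0, x):
        u = np.eye(m); u[i0, j0] = x; return u               # add x times row j0 to row i0

    a = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [4.0, 0.0, 1.0]])
    assert np.allclose(swap(3, 0, 2) @ a, a[[2, 1, 0]])
    assert np.allclose((shear(3, 2, 0, -4.0) @ a)[2], a[2] - 4.0 * a[0])
    # each u is invertible; e.g. the inverse of adding x times a row is adding -x times it
    assert np.allclose(np.linalg.inv(shear(3, 2, 0, -4.0)), shear(3, 2, 0, 4.0))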
If u¹, u², ..., u^p is a sequence of elementary matrices, and if

    b = u^p · u^{p-1} · ... · u¹,

then b·a is the matrix obtained from a by performing the corresponding sequence of elementary row operations on a. If u¹, ..., u^p is a sequence which row reduces a, then r = b·a is the resulting row-reduced echelon matrix.
Now suppose that a is a square m × m matrix and is nonsingular (invertible). Thus the dimension of the row space is m, and hence there are m different orders n₁, ..., n_m. That is, k = m, and since 1 ≤ n₁ < n₂ < ... < n_m = m, we must also have n_i = i for i = 1, ..., m. Remembering that the n_ith column in r is δⁱ, we see that now the ith column in r is δⁱ and therefore that r is simply the identity matrix e. Thus b·a = e and b is the inverse of a.
Let us find the inverse of

    [1  2]
    [3  4]

by this procedure. The row-reducing sequence is

    [1  2]      [1   2]      [1  2]      [1  0]
    [3  4] (3)→ [0  -2] (2)→ [0  1] (3)→ [0  1].

The corresponding elementary matrices are

    u¹ = [ 1  0],   u² = [1     0],   u³ = [1  -2].
         [-3  1]         [0  -1/2]         [0   1]

The inverse is therefore the product

    [1  -2]   [1     0]   [ 1  0]   [ -2     1  ]
    [0   1] · [0  -1/2] · [-3  1] = [3/2   -1/2].
Check it if you are in doubt.
Finally, since b·e = b, we see that we get b from e by applying the same row operations (gathered together as premultiplication by b) that we used to reduce a to echelon form. This is probably the best way of computing the inverse of a matrix. To keep track of the operations, we can place e to the right of a to form a single m × 2m matrix a|e, and then row reduce it. In echelon form it will then be the m × 2m matrix e|b, and we can read off the inverse b of the original matrix a.
Let us recompute the inverse of

    [1  2]
    [3  4]

by this method. We row reduce

    [1  2 | 1  0]
    [3  4 | 0  1],

getting

    [1  2 | 1  0]      [1   2 |  1  0]      [1  2 |  1     0 ]      [1  0 |  -2    1  ]
    [3  4 | 0  1] (3)→ [0  -2 | -3  1] (2)→ [0  1 | 3/2  -1/2] (3)→ [0  1 | 3/2  -1/2],

from which we read off the inverse to be

    [ -2     1  ]
    [3/2   -1/2].
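With a row-reduction routine such as the row_reduce sketch given earlier, the a|e device takes only a line or two of bookkeeping (again a sketch, not the text's notation):

    import numpy as np

    def inverse(a):
        """Invert a nonsingular m x m matrix by row reducing a|e to e|b."""
        m = len(a)
        augmented = [list(row) + list(e_row) for row, e_row in zip(a, np.eye(m))]
        reduced = row_reduce(augmented)          # row_reduce as sketched earlier
        return [row[m:] for row in reduced]      # the right half is the inverse

    b = inverse([[1, 2], [3, 4]])
    assert np.allclose(np.array(b, dtype=float), [[-2.0, 1.0], [1.5, -0.5]])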
Finally we consider the problem of computing the determinant of a square m × m matrix. We use two elementary operations (one modified) as follows:

1′) interchanging two rows and simultaneously changing the sign of one of them;
3) as before, replacing some row aⁱ by aⁱ - xaʲ for some j ≠ i.

When applied to the rows of a square matrix, these operations leave the determinant unchanged. This follows from the properties of determinants listed in Section 5, and its proof will be left as an exercise. Moreover, these properties will be trivial consequences of our definition of a determinant in Chapter 7.

Consider, then, a square m × m matrix {a_ij}. We interchange the first and pth rows to bring a row of minimal order n₁ to the top, and change the sign of the row being moved down (the first row here). We do not make the leading
coefficient of the new first row 1; this elementary operation is not being used now. We do subtract multiples of the first row from the remaining rows, in order to make all the remaining entries in the n₁th column 0. The n₁th column is now c₁δ¹, where c₁ is the leading coefficient in the first row. And the new matrix has the same determinant as the original matrix.

We continue as before, subject to the above modifications. We change the sign of a row moved downward in an interchange, we do not make leading coefficients 1, and we do clear out the n_jth column so that it becomes c_jδ^j, where c_j is the leading coefficient of the jth row (1 ≤ j ≤ k). As before, the remaining m - k rows are 0 (if k < m). Let us call this resulting matrix semireduced. Note that we can find the corresponding reduced echelon matrix from it by k applications of (2); we simply multiply the jth row by 1/c_j for j = 1, ..., k. If s is the semireduced matrix which we obtained from a using (1′) and (3), then we shall show below that its determinant, and therefore the determinant of a also, is the product of the entries on the main diagonal: Π_{i=1}^m s_ii. Recapitulating, we can compute the determinant of a square matrix a by using the operations (1′) and (3) to change a to a semireduced matrix s, and then taking the product of the numbers on the main diagonal of s.
If we apply this process to

    [1  2]
    [3  4],

we get

    [1  2]      [1   2]      [1   0]
    [3  4] (3)→ [0  -2] (3)→ [0  -2],

and the determinant is 1·(-2) = -2. Our 2 × 2 determinant formula, applied to

    [1  2]
    [3  4],

gives 1·4 - 2·3 = 4 - 6 = -2.
If the original matrix {a_ij} is nonsingular, so that k = m and n_i = i for i = 1, ..., m, then the jth column in the semireduced matrix is c_jδ^j, so that s_jj = c_j, and we are claiming that the determinant is the product Π_{j=1}^m c_j of the leading coefficients.

To see this, note that if T is the transformation in Hom(ℝᵐ) corresponding to our semireduced matrix, then T(δ^j) = c_jδ^j, so that ℝᵐ is the direct sum of m T-invariant, one-dimensional subspaces, on the jth of which T is multiplication by c_j. It follows from (c) and (d) of our list of determinant properties that Δ(T) = Π_1^m c_j = Π_1^m s_jj. This is nonzero.
On the other hand, if {a_ij} is singular, so that k = d(V) < m, then the mth row in the semireduced matrix is 0 and, in particular, s_mm = 0. The product Π_i s_ii is thus zero. Now, without altering the main diagonal, we can subtract multiples of the columns containing the leading row entries (the columns with indices n_j) to make the mth column a zero column. This process is equivalent to postmultiplying by elementary matrices of type (3) and, therefore, again leaves the determinant unchanged. But now the transformation S of this matrix leaves ℝ^{m-1} invariant (as the span of δ¹, ..., δ^{m-1} in ℝᵐ) and takes δᵐ to 0, so that Δ(S) = 0 by (c) in the list of determinant properties. So again the determinant is the product of the entries on the main diagonal of the semireduced matrix, zero in this case.
We have also found that a matrix is nonsingular (invertible) if and only if its
determinant is nonzero.
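The determinant computation is the same elimination loop using the modified operations (1′) and (3); a sketch in the same vein (our own code):

    from fractions import Fraction

    def determinant(a):
        """Determinant of a square matrix via operations (1') and (3)."""
        a = [[Fraction(x) for x in row] for row in a]
        m = len(a)
        for col in range(m):
            pivot = next((i for i in range(col, m) if a[i][col] != 0), None)
            if pivot is None:
                return 0                          # singular: a zero diagonal entry
            if pivot != col:
                # operation (1'): interchange, negating the row moved down
                a[col], a[pivot] = a[pivot], [-x for x in a[col]]
            for i in range(m):                    # operation (3): clear the column
                if i != col and a[i][col] != 0:
                    f = a[i][col] / a[col][col]
                    a[i] = [x - f * y for x, y in zip(a[i], a[col])]
            # leading coefficients are deliberately left un-normalized
        prod = 1
        for i in range(m):
            prod *= a[i][i]                       # product of the main diagonal
        return prod

    assert determinant([[1, 2], [3, 4]]) == -2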
EXERCISES
6.1 Compute the canonical basis of the row space of
[-1
2 1
j2 3
-3 0
4 -1
6.2 Do the same for

    [the displayed matrix is garbled in this copy].
6.3 Do the same for the above matrix but with a different first choice.
6.4 Calculate the inverse of

    [the displayed 3 × 3 matrix is garbled in this copy]

by row reduction. Check your answer by multiplication.
6.5 Row reduce

    [the displayed matrix, the 3 × 3 matrix of Exercise 6.4 with the column ⟨y₁, y₂, y₃⟩ adjoined, is garbled in this copy].

How does the fourth column in the row-reduced matrix compare with the inverse of the matrix of Exercise 6.4 computed in the above exercise? Explain.
6.6 Check whether or not ⟨1, 1, 1, 1⟩, ⟨1, 2, 3, 4⟩, ⟨0, 1, 0, 1⟩, and ⟨4, 3, 2, 1⟩ are linearly independent by row reducing. Part of one of the row-reducing operations is unnecessary for this check. What is it?
6.7 Let us call a k-tuple of vectors {aⁱ}_1^k in ℝⁿ canonical if the k × n matrix a with aⁱ as its ith row for all i is in row-reduced echelon form. Supposing that an n-tuple ξ is in the row space of a, we can read off what its coordinates are with respect to the above canonical basis. What are they? How then can we check whether or not an arbitrary n-tuple ξ is in the row space?
6.8 Use the device of row reducing, as suggested in the above exercise, to determine whether or not δ¹ = ⟨1, 0, 0, 0⟩ is in the span of ⟨1, 1, 1, 1⟩, ⟨1, 2, 3, 4⟩, and ⟨2, 0, 1, -1⟩. Do the same for ⟨1, 2, 1, 2⟩, and also for ⟨1, 1, 0, 4⟩.
6.9 Supposing that a ≠ 0, show that

    [a  b]
    [c  d]

is invertible if and only if ad - bc ≠ 0 by reducing the matrix to echelon form.
6.10 Let a be an m × n matrix, and let u be the nonsingular matrix that row reduces a, so that r = u·a is the row-reduced echelon matrix obtained from a. Suppose that r has m - k > 0 zero rows at the bottom (the kth row being nonzero). Show that the bottom m - k rows of u span the annihilator (range A)⁰ of the range of A. That is, y = a·x for some x if and only if

    Σ_{i=1}^m c_i y_i = 0

for each m-tuple c in the bottom m - k rows of u. [Hint: The bottom row of r is obtained by applying the bottom row of u to the columns of a.]
6.11 Remember that we find the row-reducing matrix u by applying to the m × m identity matrix e the row operations that reduce a to r. That is, we row reduce the m × (n + m) juxtaposition matrix a|e to r|u. Assuming the result stated in the above exercise, find the range of A ∈ Hom(ℝ³) as the null space of a functional if the matrix of A is

    [the displayed matrix is garbled in this copy].

6.12 Similarly, find the range of A if the matrix of A is

    [the displayed matrix is garbled in this copy].

6.13 Let a be an m × n matrix, and let a be row reduced to r. Let A and R be the corresponding operators in Hom(ℝⁿ, ℝᵐ) [so that A(x) = a·x]. Show that A and R have the same null space and that A* and R* have the same range space.
6.14 Show that solving a system of m linear equations in n unknowns is equivalent
to solving a matrix equation
k = tx
for the n-tuple x, given the m × n matrix t and the m-tuple k. Let T ∈ Hom(ℝⁿ, ℝᵐ) be multiplication by t. Review the possibilities for a solution from our general linear theory for T (range, null space, affine subspace).
6.15 Let b = c|d be the m × (n + p) matrix obtained by juxtaposing the m × n matrix c and the m × p matrix d. If a is an l × m matrix, show that

    a·b = a·c | a·d.

State the similar result concerning the expression of b as the juxtaposition of n + p column m-tuples. State the corresponding theorem for the "distributivity" of right multiplication over juxtaposition.
6.16 Let a be an m × n matrix and k a column m-tuple. Let b|l be the m × (n + 1) matrix obtained from the m × (n + 1) juxtaposition matrix a|k by row reduction. Show that a·x = k if and only if b·x = l. Show that there is a solution x if and only if every row that is zero in b is zero in l. Restate this condition in terms of the notion of row rank.
6.17 Let b be the row-reduced echelon matrix obtained from an m × n matrix a. Thus b = u·a, where u is nonsingular, and B and A have the same null space (where B ∈ Hom(ℝⁿ, ℝᵐ) is multiplication by b). We can read off from b a basis for a subspace W ⊂ ℝⁿ such that B↾W is an isomorphism onto range B. What is this basis? We then know that the null space N of B is a complement of W. One complement of W, call it M, can be read off from b. What is M?
6.18 Continuing the above exercise, show that for each standard basis vector δⁱ in M we can read off from the matrix b a vector α_i in W such that δⁱ - α_i ∈ N. Show that these vectors {δⁱ - α_i} form a basis for N.
6.19 We still have to show that the modified elementary row operations leave the determinant of a square matrix unchanged, assuming the properties (a) through (e) from Section 5. First, show from (a), (c), (d), and (e) that if T in Hom ℝ² is defined by T(δ¹) = δ² and T(δ²) = -δ¹, then Δ(T) = 1. Do this by a very simple factorization, T = R ∘ S, where (e) can be applied to S. Conclude that a type (1′) elementary matrix has determinant 1.
6.20 Show from the determinant property (b) that an elementary matrix of type (3) has determinant 1. Show, therefore, that the modified elementary row operations on a square matrix leave its determinant unchanged.
*7. THE DIAGONALIZATION OF A QUADRATIC FORM
As we mentioned earlier, one of the crucial problems of linear algebra is the
analysis of the "structure" of a linear transformation T in Hom V. From the
point of view of bases, every theorem in this area asserts that with the choice
of a special basis for V the matrix of T can be given the such-and-such simple
form. This is a very difficult part of the subject, and we are only making con-
tact with it in this book, although Theorem 5.5 of Chapter 1 and its corollary
form a cornerstone of the structural results.
In this section we are going to solve a simpler problem. In the above lan-
guage it is the problem of choosing a basis for V making simple the matrix of a
transformation T in Hom(V, V*). Such a transformation is equivalent to a
bilinear functional on V (by Theorem 6.1 of Chapter 1 and Theorem 3.2 of this
chapter); we shall tackle the problem in this setting.
Let V be a finite-dimensional real vector space, and let w: V × V → ℝ be a bilinear functional. If {α_i}_1^n is a basis for V, then w determines a matrix t_ij = w(α_i, α_j). We know that if w_η(ξ) = w(ξ, η), then w_η ∈ V* and η ↦ w_η is a linear mapping T from V to V*. We leave it as an exercise for the reader to show that {t_ij} is the matrix of T with respect to the basis {α_i} for V and its dual basis for V* (Exercise 4.1).

If ξ = Σ_1^n x_i α_i and η = Σ_1^n y_j α_j, then

    w(ξ, η) = Σ_{i,j} x_i y_j w(α_i, α_j) = Σ_{i,j} t_ij x_i y_j.

In particular, if we set q(ξ) = w(ξ, ξ), then q(ξ) = Σ_{i,j} t_ij x_i x_j is a homogeneous quadratic polynomial in the coordinates x_i.
For the rest of this section we assume that w is symmetric: w(ξ, η) = w(η, ξ). Then we can recover w from the quadratic form q by

    w(ξ, η) = [q(ξ + η) - q(ξ - η)] / 4,

as the reader can easily check. In particular, if the bilinear form w is not identically zero, then there are vectors ξ such that q(ξ) = w(ξ, ξ) ≠ 0.
What we want to do is to show that we can find a basis {α_i}_1^n for V such that w(α_i, α_j) = 0 if i ≠ j and w(α_i, α_i) has one of the three values 0, ±1. Borrowing from the standard usage of scalar product theory (see Chapter 5), we say that such a basis is orthonormal. Our proof that an orthonormal basis exists will be an induction on n = dim V. If n = 1, then any nonzero vector β is a basis, and if w(β, β) ≠ 0, then we can choose α = xβ so that x²w(β, β) = w(α, α) = ±1, the required value of x obviously being x = |w(β, β)|^{-1/2}. In the general case, if w is the zero functional, then any basis will trivially be orthonormal, and we can therefore suppose that w is not identically 0. Then there exists a β such that w(β, β) ≠ 0, as we noted earlier. We set α_n = xβ, where x is chosen to make q(α_n) = w(α_n, α_n) = ±1. The nonzero linear functional f(ξ) = w(ξ, α_n) has an (n - 1)-dimensional null space N, and if we let w′ be the restriction of w to N × N, then w′ has an orthonormal basis {α_i}_1^{n-1} by the inductive hypothesis. Also w(α_i, α_n) = w(α_n, α_i) = 0 if i < n, because α_i is in the null space of f. Therefore, {α_i}_1^n is an orthonormal basis for w, and we have reached our goal:
Theorem 7.1. If w is a symmetric bilinear functional on a finite-dimensional real vector space V, then V has a w-orthonormal basis.
For a w-orthonormal basis the expansion w(ξ, η) = Σ x_i y_j w(α_i, α_j) reduces to

    w(ξ, η) = Σ_{i=1}^n x_i y_i q(α_i),

where q(α_i) = ±1 or 0. If we let V₁ be the span of those basis vectors α_i for which q(α_i) = 1, and similarly for V₋₁ and V₀, then we see that q(ξ) > 0 for every nonzero ξ in V₁, q(ξ) < 0 for every nonzero vector ξ in V₋₁, and q = 0 on V₀. Furthermore, V = V₁ ⊕ V₋₁ ⊕ V₀, and the three subspaces are w-orthogonal to each other (which means that w(ξ, η) = 0 if ξ ∈ V₁ and η ∈ V₋₁, etc.). Finally, q(ξ) ≤ 0 for every ξ in V₋₁ ⊕ V₀.
If we choose another orthonormal basis {β_i} and let W₁, W₋₁, and W₀ be its corresponding subspaces, then W₁ may be different from V₁, but their dimensions must be the same. For W₁ ∩ (V₋₁ ⊕ V₀) = {0}, since any nonzero ξ in this intersection would yield the contradictory inequalities q(ξ) > 0 and q(ξ) ≤ 0. Thus W₁ can be extended to a complement of V₋₁ ⊕ V₀, and since V₁ is a complement, we have d(W₁) ≤ d(V₁). Similarly, d(V₁) ≤ d(W₁), and the dimensions therefore are equal. Incidentally, this shows that W₁ is a complement of V₋₁ ⊕ V₀. In exactly the same way, we find that d(W₋₁) = d(V₋₁) and finally, by subtraction, that d(W₀) = d(V₀). It is conventional to reorder a w-orthonormal basis {α_i}_1^m so that all the α_i's with q(α_i) = 1 come first, then those with q(α_i) = -1, and finally those with q(α_i) = 0. Our results above can then be stated as follows:
Theorem 7.2. If ω is a symmetric bilinear functional on a finite-dimensional
space V, then there are integers n and p such that if {αᵢ} is any ω-ortho-
normal basis in conventional order, and if ξ = Σ xᵢαᵢ, then

    q(ξ) = x₁² + ⋯ + xₚ² − xₚ₊₁² − ⋯ − xₚ₊ₙ² = Σ₁ᵖ xᵢ² − Σₚ₊₁ᵖ⁺ⁿ xᵢ².

The integer p − n is called the signature of the form q (or its associated
symmetric bilinear functional ω), and p + n is its rank. Note that p + n is the
dimension of the column space of the above matrix of q, and hence equals the
dimension of the range of the related linear map T. Therefore, p + n is the
rank of every matrix of q.
An inductive proof that an orthonormal basis exists doesn't show us how to
find one in practice. Let us suppose that we have the matrix {tᵢⱼ} of ω with
respect to some basis {αᵢ}₁ⁿ before us, so that ω(ξ, η) = Σ xᵢyⱼtᵢⱼ, where
ξ = Σ₁ⁿ xᵢαᵢ, η = Σ₁ⁿ yᵢαᵢ, and tᵢⱼ = ω(αᵢ, αⱼ), and we want to know how to go
about actually finding an orthonormal basis {βᵢ}₁ⁿ. The main problem is to find
an orthogonal basis; normalization is then trivial. The first objective is to find
a vector ξ such that ω(ξ, ξ) ≠ 0. If some tᵢᵢ = ω(αᵢ, αᵢ) is not zero, we can take
ξ = αᵢ. If all tᵢᵢ = 0 and the form ω is not the zero form, there must be some
tᵢⱼ ≠ 0, say t₁₂ ≠ 0. If we set γ₁ = α₁ + α₂ and γᵢ = αᵢ for i > 1, then {γᵢ}₁ⁿ
is a basis, and the matrix s = {sᵢⱼ} of ω with respect to the basis {γᵢ} has

    s₁₁ = ω(γ₁, γ₁) = ω(α₁ + α₂, α₁ + α₂) = t₁₁ + 2t₁₂ + t₂₂ = 2t₁₂ ≠ 0.

Moreover, sᵢⱼ = tᵢⱼ if i and j are both greater than 1.
For example, if ω is the bilinear form on ℝ² defined by ω(x, y) = x₁y₂ +
x₂y₁, then its matrix tᵢⱼ = ω(δᵢ, δⱼ) is

    [0  1]
    [1  0],
and we must change the basis to get t₁₁ ≠ 0. According to the above scheme,
we set γ₁ = δ₁ + δ₂ and γ₂ = δ₂ and get the new matrix sᵢⱼ = ω(γᵢ, γⱼ),
which works out to

    [2  1]
    [1  0].
The next step is to find a basis for the null space of the functional ω(ξ, γ₁) =
Σ xᵢs₁ᵢ. We do this by modifying γ₂, ..., γₙ; we replace γⱼ by γⱼ + cγ₁ and
calculate c so that this vector is in the null space. Therefore, we want 0 =
ω(γⱼ + cγ₁, γ₁) = s₁ⱼ + cs₁₁, and so c = −s₁ⱼ/s₁₁. Note that we cannot take
this orthogonalizing step until we have made s₁₁ ≠ 0. The new set still spans
and thus is a basis, and the new matrix {rᵢⱼ} has r₁₁ ≠ 0 and r₁ⱼ = rⱼ₁ = 0 for
j > 1. We now simply repeat the whole procedure for the restriction of ω to this
(n − 1)-dimensional null space, with matrix {rᵢⱼ : 2 ≤ i, j ≤ n}, and so on.
This is a long process, but until we normalize, it consists only of rational oper-
ations on the original matrix. We add, subtract, multiply, and divide, but we
do not have to find roots of polynomial equations.
Continuing our above example, we set β₁ = γ₁, but we have to replace γ₂
by β₂ = γ₂ − (s₁₂/s₁₁)γ₁ = γ₂ − ½γ₁. The final matrix rᵢⱼ = ω(βᵢ, βⱼ)
has r₁₁ = s₁₁ = 2 and

    {rᵢⱼ} = [2    0 ]
            [0  −1/2].

The final basis is β₁ = γ₁ = δ₁ + δ₂ and β₂ = γ₂ − ½γ₁ = δ₂ − ½(δ₁ + δ₂) =
(δ₂ − δ₁)/2.
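The rational procedure just described is easy to mechanize. The sketch below is our illustration, not the book's: it gets a nonzero entry into the pivot position (reordering the basis if a nonzero diagonal entry exists, and using the γ₁ = α₁ + α₂ trick when the whole diagonal vanishes), clears the pivot row and column, and moves down the diagonal. It returns the values q(βᵢ) before normalization, so the signature p − n can be read off from the signs. Run on the example above, it reproduces 2 and −1/2.

    from fractions import Fraction

    def diagonalize(t):
        # Congruence-diagonalize a symmetric matrix t (a list of rows) over the
        # rationals; returns the diagonal values q(beta_i) before normalization.
        n = len(t)
        t = [[Fraction(x) for x in row] for row in t]   # rational operations only
        diag = []
        for k in range(n):
            # Objective 1: get a nonzero entry into the pivot position (k, k).
            j = next((j for j in range(k, n) if t[j][j] != 0), None)
            if j is not None and j != k:                # some t_jj != 0: reorder basis
                t[j], t[k] = t[k], t[j]
                for row in t:
                    row[j], row[k] = row[k], row[j]
            elif j is None:                             # the whole diagonal vanishes
                j = next((j for j in range(k + 1, n) if t[k][j] != 0), None)
                if j is None:                           # row k pairs to zero with all
                    diag.append(Fraction(0))            # remaining vectors: q = 0
                    continue
                for i in range(n):                      # gamma_k = alpha_k + alpha_j:
                    t[k][i] += t[j][i]                  # add row j to row k, then
                for i in range(n):                      # column j to column k, making
                    t[i][k] += t[i][j]                  # t_kk = 2 t_kj != 0
            # Objective 2: replace gamma_j by gamma_j - (s_kj/s_kk) gamma_k, j > k.
            for j in range(k + 1, n):
                c = t[k][j] / t[k][k]
                for i in range(n):
                    t[j][i] -= c * t[k][i]              # the row operation and its
                for i in range(n):                      # matching column operation
                    t[i][j] -= c * t[i][k]
            diag.append(t[k][k])
        return diag

    print(diagonalize([[0, 1], [1, 0]]))   # [Fraction(2, 1), Fraction(-1, 2)]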
The steps we had to take above are reminiscent of row reduction, but since
we are changing bases simultaneously in the domain and range spaces of the
transformation T: V → V* associated with ω, each step involves simultaneously
premultiplying and postmultiplying by an elementary matrix. That is, we are
simultaneously row and column reducing. It should be intuitively clear that this
has to be the case if we are to operate on a symmetric matrix in such a way as
to keep it symmetric.
For additional information about quadratic forms, we go back to the change
of basis formula for the matrix of a transformation: t′ = b·t·a⁻¹. Here the
transformation T associated with the form ω is from V to V*, and so b = (a*)⁻¹,
according to our calculations in Section 4. Now one of the properties of the
determinant function is that d(T*) = d(T), and so d(a*) = d(a). Therefore,
if t and s are the matrices of a quadratic form with respect to a first and second
basis in V, and if a is the change of basis matrix, then s = (a*)⁻¹·t·a⁻¹ and
d(s) = (d(a⁻¹))²·d(t). The sign of the determinant therefore does not depend
on the basis: a quadratic form has parity. If it is non-
singular, then its determinant is either always positive or always negative, and
we can call it even or odd. In our continuing example, the beginning and final
matrices

    [0  1]        [2    0 ]
    [1  0]   and  [0  −1/2]

both have determinant −1.
In the two-dimensional case, the determinant of a form with respect to an
orthonormalized basis is +1 if the diagonal elements are both +1 or both −1,
and −1 if they are of opposite sign. We can therefore read off the signature of a
nonsingular form over a two-dimensional space without orthonormalizing. If the
determinant t₁₁t₂₂ − (t₁₂)² is positive, the signature is ±2, and we can deter-
mine which by looking at t₁₁ (since t₁₁ is then unchanged by our orthogonalizing
procedure). Thus the signature is +2 or −2 depending on whether t₁₁ > 0 or
t₁₁ < 0. If the determinant is negative, then the signature is 0. Thus the
signature of the form ω(x, y) = x₁y₂ + x₂y₁, with matrix

    [0  1]
    [1  0],

is known to be 0, without any calculation.
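The two-dimensional rule reads directly as code; this small sketch is our illustration of it (the names are ours):

    def signature_2x2(t11, t12, t22):
        # Signature of a nonsingular symmetric 2x2 form, read off without
        # orthonormalizing: positive determinant gives +2 or -2 (sign of t11),
        # negative determinant gives 0.
        det = t11 * t22 - t12 ** 2
        assert det != 0, "the rule applies to nonsingular forms"
        if det > 0:
            return 2 if t11 > 0 else -2
        return 0

    print(signature_2x2(0, 1, 0))   # the form x1*y2 + x2*y1: signature 0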
Theorems 7.1 and 7.2 are important for the classification of critical points
of real-valued functions on vector spaces. We shall see in Section 3.16 that the
second differential of such a function F is a symmetric bilinear functional, and
that the signature of its form has the same significance in determining the be-
havior of F near a point at which its first differential is zero that the sign of
the second derivative has in the elementary calculus.
A quadratic form q is said to be definite if q(ξ) is never zero except for ξ = 0.
Then q(ξ) must always have the same sign, and q is accordingly called positive
definite or negative definite. Looking back to Theorem 7.2, it should be obvious
that q is positive definite if and only if p = d(V) and n = 0, and negative
definite if and only if n = d(V) and p = 0. A symmetric bilinear functional
whose associated quadratic form is positive definite is called a scalar product.
This is a very important notion on general vector spaces, and the whole of
Chapter 5 is devoted to developing some of its implications.
CHAPTER 3
THE DIFFERENTIAL CALCULUS
Our algebraic background is now adequate for the differential calculus, but we
still need some multidimensional limit theory. Roughly speaking, the differ-
ential calculus is the theory of linear approximations to nonlinear mappings,
and we have to know what we mean by approximation in general vector settings.
We shall therefore start this chapter by studying the notion of a measure of
length, called a norm, for the vectors in a vector space V. We can then study
the phenomenon suggested by the way in which a tangent plane to a surface
approximates the surface near the point of tangency. This is the general theory
of unique local linear approximations of mappings, called differentials. The
collection of rules for computing differentials includes all the familiar laws of
the differential calculus, and achieves the same goal of allowing complicated
calculations to be performed in a routine way. However, the theory is richer
in the multidimensional setting, and one new aspect which we must master is
the interplay between the linear transformations which are differentials and their
evaluations at given vectors, which are directional derivatives in general and
partial derivatives when the vectors belong to a basis. In particular, when the
spaces in question are finite-dimensional and are replaced by Cartesian spaces
through a choice of bases, then the differential is entirely equivalent to its matrix,
which is a certain matrix of partial derivatives called the Jacobian matrix of the
mapping. Then the rules of the differential calculus are expressed in terms of
matrix operations.
Maximum and minimum points of real-valued functions are found exactly
as before, by computing the differential and setting it equal to zero. However,
we shall neglect this subject, except in starred sections. It also is much richer
than its one-variable counterpart, and in certain infinite-dimensional situations
it becomes the subject called the calculus of variations.
Finally, we shall begin our study of the inverse-mapping theorem and the
implicit-function theorem. The inverse-mapping theorem states that if a mapping
between vector spaces is continuously differentiable, and if its differential at a
point a is invertible (as a linear transformation), then the mapping itself is
invertible in the neighborhood of a. The implicit-function theorem states that if
a continuously differentiable vector-valued function G of two vector variables
is set equal to zero, and if the second partial differential of G is invertible (as a
linear mapping) at a point ⟨α, β⟩ where G(α, β) = 0, then the equation
G(ξ, η) = 0 can be solved for η in terms of ξ near this point. That is, there is a
uniquely determined mapping η = F(ξ) defined near α such that β = F(α) and
such that G(ξ, F(ξ)) = 0 in the neighborhood of α. These two theorems are
fundamental to the further development of analysis. They are deeper results
than our work up to this point in that they depend on a special property of
vector spaces called completeness; we shall have to put off part of their proofs to
the next chapter, where we shall study completeness in a fairly systematic way.
In a number of starred sections at the end of the chapter we present some
harder material that we do not expect the reader to master. However, he should
try to get a rough idea of what is going on.
1. REVIEW IN ℝ
Every student of the calculus is presumed to be familiar with the properties of
the real number system and the theory of limits. But we shall need more than
familiarity at this point. It will be absolutely essential that the student under-
stand the ε-definitions and be able to work with them.
To be on the safe side, we shall review some of this material in the setting of
limits of functions; the confident reader can skip it. We suppose that all the
functions we consider are defined at least on an open interval containing a,
except possibly at a itself. The need for this exception is shown by the difference
quotients of the calculus, which are not defined at the point near which their
behavior is crucial.
Definition. f(x) approaches l as x approaches a (in symbols, f(x) → l as
x → a) if for every positive ε there exists a positive δ such that

    0 < |x − a| < δ  ⇒  |f(x) − l| < ε.
We also say that l is the limit of f(x) as x approaches a and write
lim_{x→a} f(x) = l. The displayed statement in the definition is understood to be
universally quantified in x, so that the definition really begins with the three
quantifiers (∀ε > 0)(∃δ > 0)(∀x). These prefixing quantifiers make the definition
sound artificial and unidiomatic when read as ordinary prose, but the reader
will remember from our introductory discussion of quantification that this
artificiality is absolutely necessary in order for the meaning of the sentence
to be clear and unambiguous. Any change in the order of the quantifiers
(∀ε)(∃δ)(∀x) changes the meaning of the statement.
The meaning of the inner universal quantification

    (∀x)(0 < |x − a| < δ ⇒ |f(x) − l| < ε)

is intuitive and easily pictured (see Fig. 3.1).
Fig. 3.1
For all x closer to a than δ the value of f at x is closer to l than ε. The
definition begins by stating that such a positive δ can be found for each positive ε.
Of course, δ will vary with ε; if ε is made smaller, we will generally have to
go closer to a, that is, we will have to take δ smaller, before all the values of f
on (a − δ, a + δ) − {a} become ε-close to l.
The variables 'ε' and 'δ' are almost always restricted to positive real num-
bers, and from now on we shall let this restriction be implicit unless there seems
to be some special call for explicitness. Thus we shall write simply (∀ε)(∃δ) ...
The definition of convergence is used in various ways. In the simplest
situations we are given one or more functions having limits at a, say, f(x) → u
and g(x) → v, and we want to prove that some other function h has a limit w
at a. In such cases we always try to find an inequality expressing the quantity we
wish to make small, |h(x) − w|, in terms of the quantities which we know can be
made small, |f(x) − u| and |g(x) − v|.
For example, suppose that h = f + g. Since f(x) is close to u and g(x) is
close to v, clearly h(x) is close to w = u + v. But how close? Since h(x) − w =
(f(x) − u) + (g(x) − v), we have

    |h(x) − w| ≤ |f(x) − u| + |g(x) − v|.

From this it is clear that in order to make |h(x) − w| less than ε it is sufficient
to make each of |f(x) − u| and |g(x) − v| less than ε/2. Therefore, given any ε,
we can take δ₁ so that 0 < |x − a| < δ₁ ⇒ |f(x) − u| < ε/2, and δ₂ so that
0 < |x − a| < δ₂ ⇒ |g(x) − v| < ε/2, and we can then take δ as the smaller
of these two numbers, so that if 0 < |x − a| < δ, then both inequalities hold.
Thus

    0 < |x − a| < δ ⇒ |h(x) − w| ≤ |f(x) − u| + |g(x) − v| < ε/2 + ε/2 = ε,

and we have found the desired δ for the function h.
Suppose next that u ≠ 0 and that h = 1/f. Clearly, h(x) is close to w = 1/u
when f(x) is close to u, and so we try to express h(x) − w in terms of f(x) − u.
Thus

    h(x) − w = 1/f(x) − 1/u = (u − f(x))/(f(x)u),

and so |h(x) − w| ≤ |f(x) − u|/|f(x)u|. The trouble here is that the denomi-
nator is variable, and if it should happen to be very small, it might cancel the
smallness of |f(x) − u| and not force a small quotient. But the answer to this
problem is easy. Since f(x) is close to u and u is not zero, f(x) cannot be close to
zero. For instance, if f(x) is closer to u than |u|/2, then f(x) must be farther
from 0 than |u|/2. We therefore choose δ₁ so that 0 < |x − a| < δ₁ ⇒
|f(x) − u| < |u|/2, from which it follows that |f(x)| > |u|/2. Then

    |h(x) − w| < 2|f(x) − u|/|u|²,
and now, given any ε, we take δ₂ so that

    0 < |x − a| < δ₂ ⇒ |f(x) − u| < ε|u|²/2.

Again taking δ as the smaller of δ₁ and δ₂, so that both inequalities will hold
simultaneously when 0 < |x − a| < δ, we have

    0 < |x − a| < δ ⇒ |h(x) − w| < 2|f(x) − u|/|u|² < 2(ε|u|²/2)/|u|² = ε,

and again we have found our δ for the function h.
We have tried to show how one would think about these situations. The
actual proof that would be written down would only show the choice of δ. Thus,

Lemma 1.1. If f(x) → u and g(x) → v as x → a, then f(x) + g(x) → u + v
as x → a.

Proof. Given ε, choose δ₁ so that 0 < |x − a| < δ₁ ⇒ |f(x) − u| < ε/2
(by the assumed convergence of f to u at a), and, similarly, choose δ₂ so that
0 < |x − a| < δ₂ ⇒ |g(x) − v| < ε/2. Take δ as the smaller of δ₁ and δ₂.
Then

    0 < |x − a| < δ ⇒ |(f(x) + g(x)) − (u + v)|
                      ≤ |f(x) − u| + |g(x) − v| < ε/2 + ε/2 = ε.

Thus we have proved that for every ε there is a δ such that

    0 < |x − a| < δ ⇒ |(f(x) + g(x)) − (u + v)| < ε,

and we are done. □
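The recipe in this proof is mechanical: get a δ for f at tolerance ε/2, a δ for g at ε/2, and take the smaller. A hypothetical sketch (all names ours) that packages the recipe:

    def sum_delta_supplier(delta_f, delta_g):
        # Given functions that, for each eps > 0, return a delta that works for
        # f and for g respectively, return one that works for f + g.
        def delta_h(eps):
            return min(delta_f(eps / 2), delta_g(eps / 2))   # the eps/2 trick
        return delta_h

    # Example: near a = 1, f(x) = 3x admits delta_f(eps) = eps/3 and g(x) = 5x
    # admits delta_g(eps) = eps/5, so for f + g = 8x:
    delta_h = sum_delta_supplier(lambda e: e / 3, lambda e: e / 5)
    print(delta_h(0.8))   # 0.08, and |x - 1| < 0.08 does force |8x - 8| < 0.8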
In addition to understanding ε-techniques in limit theory, it is necessary to
understand and to be able to use the fundamental property of the real number
system called the least upper bound property. In the following statement of the
property the semi-infinite interval (−∞, a] is of course the subset {x ∈ ℝ : x ≤ a}.

If A is a nonempty subset of ℝ such that A ⊂ (−∞, a] for some a, then
there exists a uniquely determined smallest number b such that A ⊂ (−∞, b].
A number a such that A ⊂ (−∞, a] is called an upper bound of A; clearly, a
is an upper bound of A if and only if every x in A is less than or equal to a.
A set having an upper bound is said to be bounded above. The property says that
a nonempty set A which is bounded above has a least upper bound (lub). If
we reverse the order relation by multiplying everything by −1, then we have the
alternative formulation which asserts that a nonempty subset of ℝ that is
bounded below has a greatest lower bound (glb). The least upper bound of the
interval (0, 1) is 1. The least upper bound of [0, 1] is also 1. The greatest lower
bound of {1/n : n a positive integer} is 0. Furthermore, lub {x : x is a positive
rational number and x² < 2} = √2, glb {eˣ : x ∈ ℝ} = 0, and lub {eˣ : x is
rational and x < √2} = e^√2.
EXERCISES
1.1 Prove that if f(x) → l and f(x) → m as x → a, then l = m. We can therefore
talk about the limit of f as x → a.
1.2 Prove that if f(x) → l and g(x) → m (as x → a), then f(x)g(x) → lm as x → a.
1.3 Prove that |x − a| ≤ |a|/2 ⇒ |x| ≥ |a|/2.
1.4 Prove (in detail) the greatest lower bound property from the least upper bound
property.
1.5 Show that lub A = x if and only if x is an upper bound of A and, for every
positive ε, x − ε is not an upper bound of A.
1.6 Let A and B be subsets of ℝ that are nonempty and bounded above. Show that
A + B is nonempty and bounded above and that lub (A + B) = lub A + lub B.
1.7 Formulate and prove a correct theorem about the least upper bound of the
product of two sets.
1.8 Define the notion of a one-sided limit for a function whose domain is a subset of ℝ.
For example, we want to be able to discuss the limit of f(x) as x approaches a from
below, which we might designate

    lim f(x).
    x↑a
1.9 If the domain of a real-valued function f is an interval, say [a, b], we say that f is
an increasing function if

    x < y ⇒ f(x) ≤ f(y).

Prove that an increasing function has one-sided limits everywhere.
1.10 Let [a, b] be a closed interval in ℝ, and let f: [a, b] → ℝ be increasing. Show that
lim_{x→y} f(x) = f(y) for all y in [a, b] (f is continuous on [a, b]) if and only if the range
of f does not omit any subinterval (c, d) ⊂ [f(a), f(b)]. [Hint: Suppose the range omits
(c, d), and set y = lub {x : f(x) ≤ c}. Then f(x) ↛ f(y) as x → y.]
1.11 A set that intersects every open subinterval of an interval [s, t] is said to be
dense in [s, t]. Show that if f: [a, b] → ℝ is increasing and range f is dense in [f(a), f(b)],
then range f = [f(a), f(b)]. (For any z between f(a) and f(b) set y = lub {x : f(x) ≤ z},
etc.)
1.12 Assuming the results of the above two exercises, show that if f is a continuous
strictly increasing function from [a, b] to ℝ, and if r = f(a) and s = f(b), then f⁻¹ is a
continuous strictly increasing function from [r, s] to ℝ. [A function f is continuous if
f(x) → f(y) as x → y for every y in its domain; it is strictly increasing if x < y ⇒
f(x) < f(y).]
1.13 Argue somewhat as in Exercise 1.11 above to prove that if f: [a, b] → ℝ is con-
tinuous on [a, b], then the range of f includes [f(a), f(b)]. This is the intermediate-
value theorem.
1.14 Suppose the function q: ℝ → ℝ satisfies q(xy) = q(x)q(y) for all x, y ∈ ℝ.
Note that q(x) = xⁿ (n a positive integer) and q(x) = |x|ʳ (r any real number) satisfy
this "functional equation". So does q(x) ≡ 0 (r = −∞?). Show that if q satisfies the
functional equation and q(x) > 1 for x > 1, then there is a real number r > 1 such
that q(x) = xʳ for all positive x.
1.15 Show that if q is continuous and satisfies the functional equation q(xy) =
q(x)q(y) for all x, y ∈ ℝ, and if there is at least one point a where q(a) ≠ 0, 1, then
q(x) = xʳ for all positive x. Conclude that if also q is nonnegative, then q(x) = |x|ʳ on ℝ.
1.16 Show that if q(x) = |x|ʳ, and if q(x + y) ≤ q(x) + q(y), then r ≤ 1. (Try y = 1
and x large; what is q′(x) like if r > 1?)
2. NORMS
In the limit theory of ℝ, as reviewed briefly above, the absolute-value function
is used prominently in expressions like '|x − y|' to designate the distance
between two numbers, here between x and y. The definition of the convergence
of f(x) to u is simply a careful statement of what it means to say that the distance
|f(x) − u| tends to zero as the distance |x − a| tends to zero. The properties of
|x| which we have used in our proofs are

1) |x| > 0 if x ≠ 0, and |0| = 0;
2) |xy| = |x||y|;
3) |x + y| ≤ |x| + |y|.
The limit theory of vector spaces is studied in terms of functions called
norms, which serve as multidimensional analogues of the absolute-value function
on ℝ. Thus, if p: V → ℝ is a norm, then we want to interpret p(α) as the "size"
of α and p(α − β) as the "distance" between α and β. However, if V is not
one-dimensional, there is no one notion of size that is most natural. For example,
if f is a positive continuous function on [a, b], and if we ask the reader for a
number which could be used as a measure of how "large" f is, there are two
possibilities that will probably occur to him: the maximum value of f and the area
under the graph of f. Certainly, f must be considered small if max f is small.
But also, we would have to agree that f is small in a different sense if its area is
small. These are two examples of norms on the vector space V = C([a, b]) of all
continuous functions on [a, b]:

    p(f) = max {|f(t)| : t ∈ [a, b]}   and   q(f) = ∫ₐᵇ |f(t)| dt.

Note that f can be small in the second sense and not in the first.
In order to be useful, a notion of size for a vector must have properties
analogous to those of the absolute-value function on ℝ.

Definition. A norm is a real-valued function p on a vector space V such that

n1. p(α) > 0 if α ≠ 0 (positivity);
n2. p(xα) = |x|p(α) for all α ∈ V, x ∈ ℝ (homogeneity);
n3. p(α + β) ≤ p(α) + p(β) for all α, β ∈ V (triangle inequality).
A normed linear space (nls), or normed vector space, is a vector space V
together with a norm p on V. A normed linear space is thus really a pair
⟨V, p⟩, but generally we speak simply of the normed linear space V, a definite
norm on V then being understood.
It has been customary to designate the norm of α by ‖α‖, presumably to
suggest the analogy with absolute value. The triangle inequality n3 then
becomes ‖α + β‖ ≤ ‖α‖ + ‖β‖, which is almost identical in form with the basic
absolute-value inequality |x + y| ≤ |x| + |y|. Similarly, n2 becomes ‖xα‖ =
|x|‖α‖, analogous to |xy| = |x||y| in ℝ. Furthermore, ‖α − β‖ is similarly
interpreted as the distance between α and β. This is reasonable since if we set
α = ξ − η and β = η − ζ, then n3 becomes the usual triangle inequality of
geometry:

    ‖ξ − ζ‖ ≤ ‖ξ − η‖ + ‖η − ζ‖.

We shall use both the double bar notation and the "p"-notation for norms; each
is on occasion superior to the other.
The most commonly used norms on ℝⁿ are ‖x‖₁ = Σ₁ⁿ |xᵢ|, the Euclidean
norm ‖x‖₂ = (Σ₁ⁿ xᵢ²)^(1/2), and ‖x‖∞ = max {|xᵢ|}. Similar norms on the
infinite-dimensional vector space C([a, b]) of all continuous real-valued functions
on [a, b] are

    ‖f‖₁ = ∫ₐᵇ |f(t)| dt,
    ‖f‖₂ = (∫ₐᵇ |f(t)|² dt)^(1/2),
    ‖f‖∞ = max {|f(t)| : a ≤ t ≤ b}.

It should be easy for the reader to check that ‖ ‖₁ is a norm in both cases
above, and we shall take up the so-called uniform norms ‖ ‖∞ in the next
paragraph. The Euclidean norms ‖ ‖₂ are trickier; their properties depend on
scalar product considerations. These will be discussed in Chapter 5. Meanwhile,
so that the reader can use the Euclidean norm ‖ ‖₂ on ℝⁿ, we shall ask him to
prove the triangle inequality for it (the other axioms being obvious) by brute
force in an exercise. On ℝ itself the absolute value is a norm, and it is the only
norm to within a constant multiple.
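The three norms on ℝⁿ in one short sketch (ours, not the book's); the closing comparison chain is the standard one and anticipates the equivalence theorem of Section 4.

    import numpy as np

    x = np.array([3.0, -4.0, 1.0])
    one_norm = np.sum(np.abs(x))          # ||x||_1  -> 8.0
    two_norm = np.sqrt(np.sum(x ** 2))    # ||x||_2  -> 5.099...
    sup_norm = np.max(np.abs(x))          # ||x||_oo -> 4.0
    # ||x||_oo <= ||x||_2 <= ||x||_1 <= n ||x||_oo on R^n
    assert sup_norm <= two_norm <= one_norm <= len(x) * sup_norm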
We can transfer the above norms on ℝⁿ to arbitrary finite-dimensional
spaces by the following general remark.
Lemma 2.1. If p is a norm on a vector space W and T is an injective linear
map from a vector space V to W, then p ∘ T is a norm on V.
Proof. The proof is left to the reader.
Uniform norms. The two norms ‖ ‖∞ considered above are special cases of a
very general situation. Let A be an arbitrary nonempty set, and let ℬ(A, ℝ)
be the set of all bounded functions f: A → ℝ. That is, f ∈ ℬ(A, ℝ) if and only if
f ∈ ℝᴬ and range f ⊂ [−b, b] for some b ∈ ℝ. This is the same as saying that
range |f| ⊂ [0, b], and we call any such b a bound of |f|. The set ℬ(A, ℝ) is a
vector space V, since if |f| and |g| are bounded by b and c, respectively, then
|xf + yg| is bounded by |x|b + |y|c. The uniform norm ‖f‖∞ is defined as the
smallest bound of |f|. That is,

    ‖f‖∞ = lub {|f(p)| : p ∈ A}.

Of course, it has to be checked that ‖ ‖∞ is a norm. For any p in A,

    |f(p) + g(p)| ≤ |f(p)| + |g(p)| ≤ ‖f‖∞ + ‖g‖∞.

Thus ‖f‖∞ + ‖g‖∞ is a bound of |f + g| and is therefore greater than or equal to
the smallest such bound, which is ‖f + g‖∞. This gives the triangle inequality.
Next we note that if x ≠ 0, then b bounds |f| if and only if |x|b bounds |xf|, and
it follows that ‖xf‖∞ = |x|‖f‖∞. Finally, ‖f‖∞ ≥ 0, and ‖f‖∞ = 0 only if f is
the zero function.
We can replace ℝ by any normed linear space W in the above discussion.
A function f: A → W is bounded by b if and only if ‖f(p)‖ ≤ b for all p in A,
and we define the corresponding uniform norm on ℬ(A, W) by

    ‖f‖∞ = lub {‖f(p)‖ : p ∈ A}.

If f ∈ C([0, 1]), then we know that the continuous function f assumes the
least upper bound of its range as a value (that is, f "assumes its maximum value"),
so that then ‖f‖∞ is the maximum value of |f|. In general, however, the definition
must be given in terms of lub.
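Returning to the earlier max-versus-area contrast on C([a, b]), a numerical sketch (ours, not the book's): a tall thin spike has uniform norm 1 but tiny one-norm.

    import numpy as np

    ts = np.linspace(0.0, 1.0, 100001)

    def p(f):                          # p(f) = ||f||_oo = max |f(t)|
        return np.max(np.abs(f(ts)))

    def q(f):                          # q(f) = ||f||_1 = integral of |f|
        return np.trapz(np.abs(f(ts)), ts)

    spike = lambda t: np.maximum(0.0, 1.0 - 1000.0 * np.abs(t - 0.5))
    print(p(spike), q(spike))          # about 1.0 and 0.001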
Balls. Remembering that ‖α − ξ‖ is interpreted as the distance from α to ξ, it is
natural to define the open ball of radius r about the center α as {ξ : ‖α − ξ‖ < r}.
We designate this ball Bᵣ(α). Translation through β preserves distance,
and therefore ξ ∈ Bᵣ(α) if and only if ξ + β ∈ Bᵣ(α + β). That is, translation
through β carries Bᵣ(α) into Bᵣ(α + β): T_β[Bᵣ(α)] = Bᵣ(α + β). Also, scalar
multiplication by c multiplies all distances by c, and it follows in a similar way
that cBᵣ(α) = B_{cr}(cα).
Although Bᵣ(α) behaves like a ball, the actual set being defined is different
for different norms, and some of them "look unspherelike". The unit balls about
the origin in ℝ² for the three norms ‖ ‖₁, ‖ ‖₂, and ‖ ‖∞ are shown in Fig. 3.2.
A subset A of a nls V is bounded if it lies in some ball, say Bᵣ(α). Then it
also lies in a ball about the origin, namely B_{r+‖α‖}(0). This is simply the fact that
if ‖ξ − α‖ < r, then ‖ξ‖ < r + ‖α‖, which we get from the triangle inequality
upon rewriting ‖ξ‖ as ‖(ξ − α) + α‖.
The radius of the largest ball about a vector β which does not touch a set A
is naturally called the distance from β to A. It is clearly glb {‖ξ − β‖ : ξ ∈ A}
(see Fig. 3.3).
Fig. 3.2    Fig. 3.3 (ρ(β, A) = r)    Fig. 3.4
A point α is an interior point of a set A if some ball about α is included in A.
This is equivalent to saying that the distance from α to the complement of A is
positive (supposing that A is not the whole of V), and should coincide with
the reader's intuitive notion of what an "inside" point should be. A subset A
of a normed linear space is said to be open if every point of A is an interior
point.
If our language is to be consistent, an open ball should be an open set. It is:
if α ∈ Bᵣ(β), then ‖α − β‖ < r, and then B_δ(α) ⊂ Bᵣ(β), provided that δ ≤ r −
‖α − β‖, by virtue of the triangle inequality (see Fig. 3.4). The reader should
write down the detailed proof. He has to show that if ξ ∈ B_δ(α), then ξ ∈ Bᵣ(β).
Our intuitions about distances are quite trustworthy, but they should always be
checked by a computation. The reader probably can see by a mental argument
that the union of any collection of open sets is open. In particular, the union of
any collection of open balls is open (Fig. 3.5), and this is probably the most
intuitive way of visualizing an open set. (See Exercise 2.9.)
Fig. 3.5 Fig. 3.6
A subset C is said to be closed if its complement C′ is open.
Our discussion above shows that a nonempty set C is closed if and only if
every point not in it is at a positive distance from it: α ∉ C ⇒ ρ(α, C) > 0.
The so-called closed ball of radius r about β, B = {ξ : ‖ξ − β‖ ≤ r}, is a closed
set. As Fig. 3.6 suggests, the proof is another application of the triangle in-
equality.
EXERCISES
2.1 Show that if ‖ξ − α‖ ≤ ‖α‖/2, then ‖ξ‖ ≥ ‖α‖/2.
2.2 Prove in detail that

    ‖x‖₁ = Σ₁ⁿ |xᵢ|

is a norm on ℝⁿ. Also prove that

    ‖f‖₁ = ∫ₐᵇ |f(t)| dt

is a norm on C([a, b]).
2.3 For x in ℝⁿ let |x| be the Euclidean length

    |x| = [Σ₁ⁿ xᵢ²]^(1/2),

and let (x, y) be the scalar product

    (x, y) = Σ₁ⁿ xᵢyᵢ.

The Schwarz inequality says that

    |(x, y)| ≤ |x||y|

and that the inequality is strict if x and y are independent.
a) Prove the Schwarz inequality for the case n = 2 by squaring and canceling.
b) Now prove it for the general n in the same way.
2.4 Continuing the above exercise, prove that the Euclidean length |x| is a norm.
The crucial step is the triangle inequality, |x + y| ≤ |x| + |y|. Reduce it to the
Schwarz inequality by squaring and canceling. This is of course our two-norm ‖x‖₂.
2.5 Prove that the unit balls for the norms ‖ ‖₁ and ‖ ‖∞ on ℝ² are as shown in
Fig. 3.2.
2.6 Prove that an open ball is an open set.
2.7 Prove that a closed ball is a closed set.
2.8 Give an example of a subset of ℝ² that is neither open nor closed.
2.9 Show from the definition of an open set that any open set is the union of a
family (perhaps very large!) of open balls. Show that any union of open sets is open.
Conclude, therefore, that a set is open if and only if it is a union of open balls.
2.10 A subset A of a normed linear space V is said to be convex if A includes the line
segment joining any two of its points. We know that the line segment from α to β is
the image of [0, 1] under the mapping t ↦ tβ + (1 − t)α. Thus A is convex if and
only if α, β ∈ A and t ∈ [0, 1] ⇒ tβ + (1 − t)α ∈ A. Prove that every ball Bᵣ(γ) in
a normed linear space V is convex.
2.11 A seminorm is the same as a norm except that the positivity condition n1 is
relaxed to nonnegativity:

n1′. p(α) ≥ 0 for all α.
Thus p(α) may be 0 for some nonzero α. Every norm is in particular a seminorm.
Prove:
a) If p is a seminorm on a vector space W and T is a linear mapping from V to W,
then p ∘ T is a seminorm on V.
b) p ∘ T is a norm if and only if T is injective and p is a norm on range T.
2.12 Show that the sum of two seminorms is a seminorm.
2.13 Prove from the above two exercises (and not by a direct calculation) that

    q(f) = ‖f′‖∞ + |f(t₀)|

is a seminorm on the space C¹([a, b]) of all continuously differentiable real-valued
functions on [a, b], where t₀ is a fixed point in [a, b]. Prove that q is a norm.
2.14 Show that the sum of two bounded sets is bounded.
2.15 Prove that the sum Bᵣ(α) + B_s(β) is exactly the ball B_{r+s}(α + β).
3. CONTINUITY
Let V and W be any two normed linear spaces. We shall designate both norms
by ‖ ‖. This ambiguous usage does not cause confusion. It is like the ambiguous
use of "0" for the zero elements of all the vector spaces under consideration. If we
replace the absolute value sign | | by the general norm symbol ‖ ‖ in the
definition we gave earlier for the limit of a real-valued function of a real variable,
it becomes verbatim the corresponding definition of convergence in the general
setting. However, we shall repeat the definition and take the occasion to relax
the hypothesis on the domain of f. Accordingly, let A be any subset of V, and let
f be any mapping from A to W.

Definition. We say that f(ξ) approaches β as ξ approaches α, and write
f(ξ) → β as ξ → α, if for every ε there is a δ such that

    ξ ∈ A and 0 < ‖ξ − α‖ < δ ⇒ ‖f(ξ) − β‖ < ε.

If α ∈ A and f(ξ) → f(α) as ξ → α, then we say that f is continuous at α.

We can then drop the requirement that ξ ≠ α and have the direct ε,δ-
characterization of continuity: f is continuous at α if for every ε there exists a δ
such that ‖ξ − α‖ < δ ⇒ ‖f(ξ) − f(α)‖ < ε. It is understood here that ξ is
universally quantified over the domain A of f. We say that f is continuous if f is
continuous at every point α in its domain. If the absolute value of a number is
replaced by the norm of a vector, the limit theorems that we sampled in Section 1
hold verbatim for normed linear spaces. We shall ask the reader to write out a
few of these transcriptions in the exercises.
There is a property stronger than continuity at α which is much simpler to
use when it is available. We say that f is Lipschitz continuous at α if there is a
constant c such that ‖f(ξ) − f(α)‖ ≤ c‖ξ − α‖ for all ξ sufficiently close to α.
That is, there are constants c and r such that

    ‖ξ − α‖ < r ⇒ ‖f(ξ) − f(α)‖ ≤ c‖ξ − α‖.

The point is that now we can take δ simply as ε/c (provided ε is small enough so
that this makes δ ≤ r; otherwise we have to set δ = min {ε/c, r}). We say
that f is a Lipschitz function (on its domain A) if there is a constant c such that
‖f(ξ) − f(η)‖ ≤ c‖ξ − η‖ for all ξ, η in A. For a linear map T: V → W the
Lipschitz inequality is more simply written as

    ‖T(ξ)‖ ≤ c‖ξ‖

for all ξ ∈ V; we just use the fact that now T(ξ) − T(η) = T(ξ − η) and set
ζ = ξ − η. In this context it is conventional to call T a bounded linear mapping
rather than a Lipschitz linear mapping, and any such c is called a bound of T.
We know from the beginning calculus that if f is a continuous real-valued
function on [a, b] (that is, if f ∈ C([a, b])), then |∫ₐᵇ f(x) dx| ≤ m(b − a), where
m is the maximum value of |f(x)|. But this is just the uniform norm of f, so that
the inequality can be rewritten as |∫ₐᵇ f| ≤ (b − a)‖f‖∞. This shows that if the
uniform norm is used on C([a, b]), then f ↦ ∫ₐᵇ f is a bounded linear functional,
with bound b − a.
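A quick numerical check of this bound (our sketch), with the integral approximated by the trapezoidal rule:

    import numpy as np

    a, b = 0.0, 2.0
    xs = np.linspace(a, b, 10001)
    f = np.sin(5.0 * xs) + 0.3 * xs              # any continuous f on [a, b]

    integral = np.trapz(f, xs)                   # T(f), the Riemann integral
    sup_norm = np.max(np.abs(f))                 # ||f||_oo
    assert abs(integral) <= (b - a) * sup_norm   # |T(f)| <= (b - a) ||f||_oo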
It should immediately be pointed out that this is not the same notion of
boundedness we discussed earlier. There we called a real-valued function
bounded if its range was a bounded subset of ℝ. The analogue here would be to
call a vector-valued function bounded if its range is norm bounded. But a
nonzero linear transformation cannot be bounded in this sense, because

    ‖T(xα)‖ = |x|‖T(α)‖.

The present definition amounts to the boundedness in the earlier sense of the
quotient T(α)/‖α‖ (on V − {0}). It turns out that for a linear map T, being
continuous and being Lipschitz are the same thing.
Theorem 3.1. Let T be a linear mapping from a normed linear space V to a
normed linear space W. Then the following conditions are equivalent:

1) T is continuous at one point;
2) T is continuous;
3) T is bounded.

Proof. (1) ⇒ (3). Suppose T is continuous at α₀. Then, taking ε = 1, there
exists δ such that ‖α − α₀‖ < δ ⇒ ‖T(α) − T(α₀)‖ < 1. Setting ξ = α − α₀
and using the additivity of T, we have ‖ξ‖ < δ ⇒ ‖T(ξ)‖ < 1. Now for any
nonzero η, ξ = δη/2‖η‖ has norm δ/2. Therefore, ‖T(ξ)‖ < 1. But
‖T(ξ)‖ = δ‖T(η)‖/2‖η‖, giving ‖T(η)‖ < 2‖η‖/δ. Thus T is bounded by
c = 2/δ.
(3) ⇒ (2). Suppose ‖T(ξ)‖ ≤ c‖ξ‖ for all ξ. Then for any α₀ and any ε
we can take δ = ε/c and have

    ‖α − α₀‖ < δ ⇒ ‖T(α) − T(α₀)‖ = ‖T(α − α₀)‖ ≤ c‖α − α₀‖ < cδ = ε.

(2) ⇒ (1). Trivial. □
In the lemma below we prove that the norm function is a Lipschitz function
from V to ℝ.

Lemma 3.1. For all α, β ∈ V, |‖α‖ − ‖β‖| ≤ ‖α − β‖.

Proof. We have ‖α‖ = ‖(α − β) + β‖ ≤ ‖α − β‖ + ‖β‖, so that ‖α‖ − ‖β‖ ≤
‖α − β‖. Similarly, ‖β‖ − ‖α‖ ≤ ‖β − α‖ = ‖α − β‖. This pair of inequal-
ities is equivalent to the lemma. □
Other Lipschitz mappings will appear when we study mappings with con-
tinuous differentials. Roughly speaking, the Lipschitz property lies between
continuity and continuous differentiability, and it is frequently the condition
that we actually apply under the hypothesis of continuous differentiability.
The smallest bound of a bounded linear transformation T is called its norm.
That is,

    ‖T‖ = lub {‖T(α)‖/‖α‖ : α ≠ 0}.

For example, let T: C([a, b]) → ℝ be the Riemann integral, T(f) = ∫ₐᵇ f(x) dx.
We saw earlier that if we use the uniform norm ‖f‖∞ on C([a, b]), then T is
bounded by b − a: |T(f)| ≤ (b − a)‖f‖∞. On the other hand, there is no smaller
bound, because ∫ₐᵇ 1 = b − a = (b − a)‖1‖∞. Thus ‖T‖ = b − a. Other
formulations of the above definition are useful. Since

    ‖T(α)‖/‖α‖ = ‖T(α/‖α‖)‖

by homogeneity, and since β = α/‖α‖ has norm 1, we have

    ‖T‖ = lub {‖T(β)‖ : ‖β‖ = 1}.

Finally, if ‖γ‖ ≤ 1, then γ = xβ, where ‖β‖ = 1 and |x| ≤ 1, and

    ‖T(γ)‖ = |x|‖T(β)‖ ≤ ‖T(β)‖.

We therefore have an inefficient but still useful characterization:

    ‖T‖ = lub {‖T(γ)‖ : ‖γ‖ ≤ 1}.

These last two formulations are uniform norms. Thus, if B₁ is the closed unit
ball {ξ : ‖ξ‖ ≤ 1}, we see that a linear T is bounded if and only if T↾B₁ is
bounded in the old sense, and then

    ‖T‖ = ‖T↾B₁‖∞.
A linear map T: V → W is bounded below by b if ‖T(ξ)‖ ≥ b‖ξ‖ for all ξ in V.
If T has a bounded inverse and m = ‖T⁻¹‖, then T is bounded below by 1/m,
for ‖T⁻¹(η)‖ ≤ m‖η‖ for all η ∈ W if and only if ‖ξ‖ ≤ m‖T(ξ)‖ for all ξ ∈ V.
If V is finite-dimensional, then it is true, conversely, that if T is bounded below,
then it is invertible (why?), but in general this does not follow.
If V and W are normed linear spaces, then Hom(V, W) is defined to be the
set of all bounded linear maps T: V → W. The results of Section 2.3 all remain
true, but require some additional arguments.

Theorem 3.2. Hom(V, W) is itself a normed linear space if ‖T‖ is defined
as above, as the smallest bound for T.

Proof. This follows from the uniform norm discussion of Section 2 by virtue
of the identity ‖T‖ = ‖T↾B₁‖∞. □

Theorem 3.3. If U, V, and W are normed linear spaces, and if
T ∈ Hom(U, V) and S ∈ Hom(V, W), then S ∘ T ∈ Hom(U, W) and
‖S ∘ T‖ ≤ ‖S‖‖T‖. It follows that composition on the right by a fixed T
is a bounded linear transformation from Hom(V, W) to Hom(U, W), and
similarly for composition on the left by a fixed S.

Proof.

    ‖(S ∘ T)(α)‖ = ‖S(T(α))‖ ≤ ‖S‖‖T(α)‖ ≤ ‖S‖(‖T‖‖α‖) = (‖S‖·‖T‖)‖α‖.

Thus S ∘ T is bounded by ‖S‖·‖T‖ and everything else follows at once. □

As before, the conjugate space V* is Hom(V, ℝ), now the space of all bounded
linear functionals.
EXERCISES
3.1 Write out the ε,δ-proofs of the following limit theorems.
1) Let V and W be normed linear spaces, and let F and G be mappings from V to W.
If lim_{ξ→α} F(ξ) = μ and lim_{ξ→α} G(ξ) = ν, then lim_{ξ→α} (F + G)(ξ) = μ + ν.
2) Given F: V → W and g: V → ℝ, if F(ξ) → μ and g(ξ) → b as ξ → α, then
(gF)(ξ) → bμ.
3.2 Prove that if F(ξ) → μ as ξ → α and G(η) → λ as η → μ, then G ∘ F(ξ) → λ as
ξ → α. Give a careful, complete statement of the theorem you have proved.
3.3 Suppose that A is an open subset of a nls V and that α₀ ∈ A. Suppose that
F: A → ℝ is such that lim_{α→α₀} F(α) = b ≠ 0. Prove that 1/F(α) → 1/b as α → α₀
(ε,δ-proof).
3.4 The function f(x) = |x|ʳ is continuous at x = 0 for any positive r. Prove that f is
not Lipschitz continuous at x = 0 if r < 1. Prove, however, that f is Lipschitz con-
tinuous at x = a if a > 0. (Use the mean-value theorem.)
3.5 Use the mean-value theorem of the calculus and the definition of the derivative
to show that if f is a real-valued function on an interval I, and if f′ exists everywhere,
then f is a Lipschitz mapping if and only if f′ is a bounded function. Show also that
then ‖f′‖∞ is the smallest Lipschitz constant c.
3.6 The "working rules" for ‖T‖ are

1) ‖T(ξ)‖ ≤ ‖T‖‖ξ‖ for all ξ;
2) ‖T(ξ)‖ ≤ b‖ξ‖ for all ξ ⇒ ‖T‖ ≤ b.

Prove these rules.
3.7 Prove that if we use the one-norm ‖x‖₁ = Σ₁ⁿ |xᵢ| on ℝⁿ, then the norm of the
linear functional

    lₐ(x) = Σ₁ⁿ aᵢxᵢ

is ‖a‖∞.
3.8 Prove similarly that if ‖x‖ = ‖x‖∞, then ‖lₐ‖ = ‖a‖₁.
3.9 Use the above exercises to show that if ‖x‖ on ℝⁿ is the one-norm, then

    ‖x‖ = lub {|f(x)| : f ∈ (ℝⁿ)* and ‖f‖ ≤ 1}.

3.10 Show that if T in Hom(ℝⁿ, ℝᵐ) has matrix t = {tᵢⱼ}, and if we use the one-
norm ‖x‖₁ on ℝⁿ and the uniform norm ‖y‖∞ on ℝᵐ, then ‖T‖ = ‖t‖∞.
3.11 Show that the meaning of 'Hom(V, W)' has changed by giving an example of a
linear mapping that fails to be bounded. There is one in the text.
3.12 For a fixed ξ in V define the mapping ev_ξ: Hom(V, W) → W by ev_ξ(T) = T(ξ).
Prove that ev_ξ is a bounded linear mapping.
3.13 In the above exercise it is in fact true that ‖ev_ξ‖ = ‖ξ‖, but to prove this we
need a new theorem.

Theorem. Given ξ in the normed linear space V, there exists a functional f in V*
such that ‖f‖ = 1 and |f(ξ)| = ‖ξ‖.

Assuming this theorem, prove that ‖ev_ξ‖ = ‖ξ‖. [Hint: Presumably you have already
shown that ‖ev_ξ‖ ≤ ‖ξ‖. You now need a T in Hom(V, W) such that ‖T‖ = 1 and
‖T(ξ)‖ = ‖ξ‖. Consider a suitable dyad.]
3.14 Let t = {tᵢⱼ} be a square matrix, and define ‖t‖ as maxᵢ (Σⱼ |tᵢⱼ|). Prove that
this is a norm on the space ℝⁿˣⁿ of all n × n matrices. Prove that ‖st‖ ≤ ‖s‖·‖t‖.
Compute the norm of the identity matrix.
3.15 Let V be the normed linear space ℝⁿ under the uniform norm ‖x‖∞ = max {|xᵢ|}.
If T ∈ Hom V, prove that ‖T‖ is the norm of its matrix ‖t‖ as defined in the above
exercise. That is, show that

    ‖T‖ = maxᵢ (Σⱼ |tᵢⱼ|).

(Show first that ‖t‖ is an upper bound of T, and then show that ‖T(x)‖ = ‖t‖‖x‖ for
a specially chosen x.) Does part of the previous exercise now become superfluous?
3.16 Assume the following fact: If f ∈ C([0, 1]) and ‖f‖₁ = a, then given ε, there is a
function u ∈ C([0, 1]) such that

    ‖u‖∞ = 1   and   ∫₀¹ u(t)f(t) dt > a − ε.
Let K(s, t) be continuous on [0, 1] × [0, 1] and bounded by b. Define T: C([0, 1]) →
ℬ([0, 1]) by Th = k, where

    k(s) = ∫₀¹ K(s, t)h(t) dt.

If V and W are the normed linear spaces C and ℬ under the uniform norms, prove that

    ‖T‖ = lub_s ∫₀¹ |K(s, t)| dt.

[Hint: Proceed as in the above exercise.]
3.17 Let V and W be normed linear spaces, and let A be any subset of V containing
more than one point. Let ℒ(A, W) be the set of all Lipschitz mappings from A to W.
For f in ℒ(A, W), let ρ(f) be the smallest Lipschitz constant for f. That is,

    ρ(f) = lub_{ξ≠η} ‖f(ξ) − f(η)‖/‖ξ − η‖.

Prove that ℒ(A, W) is a vector space V and that ρ is a seminorm on V.
3.18 Continuing the above exercise, show that if α is any fixed point of A, then
ρ(f) + ‖f(α)‖ is a norm on V.
3.19 Let K be a mapping from a subset A of a normed linear space V to V which
differs from the identity by a Lipschitz mapping with constant c less than 1. We may
as well take c = ½, and then our hypothesis is that

    ‖(K(ξ) − ξ) − (K(η) − η)‖ ≤ ½‖ξ − η‖.

Prove that K is injective and that its inverse is a Lipschitz mapping with constant 2.
3.20 Continuing the above exercise, suppose in addition that the domain A of K is
an open subset of V and that K[C] is a closed set whenever C is a closed ball lying in A.
Prove that if C = Cᵣ(α), the closed ball of radius r about α, is a subset of A, then
K[C] includes the ball B = B_{r/7}(γ), where γ = K(α). This proof is elementary but
tricky. If there is a point ν of B not in K[C], then since K[C] is closed, there is a largest
ball B′ about ν disjoint from K[C] and a point η = K(ξ) in K[C] as close to B′ as we
wish. Now if we change ξ by adding ν − η, the change in the value of K will approxi-
mate ν − η closely enough to force the new value of K to be in B′. If we can also show
that the new value ξ + (ν − η) is in C, then this new value of K is in K[C], and we
have our contradiction.
Draw a picture. Obviously, the radius ρ of B′ is at most r/7. Show that if
η = K(ξ) is chosen so that ‖ν − η‖ ≤ (3/2)ρ, then the above assertions follow from the
triangle inequality and the Lipschitz inequality displayed in Exercise 3.19. You have
to prove that

    ‖K(ξ + (ν − η)) − ν‖ < ρ   and   ‖(ξ + (ν − η)) − α‖ ≤ r.

3.21 Assume the result of the above exercise and show that K[Bᵣ(α)] includes
B_{r/7}(K(α)) whenever the open ball Bᵣ(α) lies in A.
Show, therefore, that K[A] is an open subset of V. State a theorem about the Lipschitz
invertibility of K, including all the hypotheses on K that were used in the above
exercises.
3.22 We shall see in the next chapter that if V and W are finite-dimensional spaces,
then any continuous map from V to W takes bounded closed sets into bounded closed
sets. Assuming this and the results of the above exercises, prove the following theorem.

Theorem. Let F be a mapping from an open subset A of a finite-dimensional
normed linear space V to a finite-dimensional normed linear space W. Suppose
that there is a T in Hom(V, W) such that T⁻¹ exists and such that F − T is
Lipschitz on A, with constant 1/2m, where m = ‖T⁻¹‖. Then F is injective, its
range R = F[A] is an open subset of W, and its inverse F⁻¹ is Lipschitz contin-
uous, with constant 2m.
4. EQUIVALENT NORMS
Two normed linear spaces V and W are norm isomorphic if there is a bijection T
from V to W such that T ∈ Hom(V, W) and T⁻¹ ∈ Hom(W, V). That is, an
isomorphism is a linear isomorphism T such that both T and T⁻¹ are continuous
(bounded). As usual, we regard isomorphic spaces as being essentially the same.
For two different norms on the same space we are led to the following definition.

Definition. Two norms p and q on the same vector space V are equivalent
if there exist constants a and b such that p ≤ aq and q ≤ bp.

Then (1/b)q ≤ p ≤ aq and (1/a)p ≤ q ≤ bp, so that two norms are
equivalent if and only if either can be bracketed by two multiples of the other.
The above definition simply says that the identity map ξ ↦ ξ from V to V,
considered as a map from the normed linear space ⟨V, p⟩ to the normed
linear space ⟨V, q⟩, is bounded in both directions, and hence that these two
normed linear spaces are isomorphic.
If V is infinite-dimensional, two norms will in general not be equivalent.
For example, if V = C([0, 1]) and fₙ(t) = tⁿ, then ‖fₙ‖₁ = 1/(n + 1) and
‖fₙ‖∞ = 1. Therefore, there is no constant a such that ‖f‖∞ ≤ a‖f‖₁ for all
f ∈ C([0, 1]), and the norms ‖ ‖∞ and ‖ ‖₁ are not equivalent on V = C([0, 1]).
This is why the very notion of a normed linear space depends on the assumption
of a given norm.
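A numerical view of this example (our sketch): the ratio ‖fₙ‖∞/‖fₙ‖₁ is about n + 1, so no single constant a can work for every n.

    import numpy as np

    ts = np.linspace(0.0, 1.0, 100001)
    for n in (1, 5, 50):
        fn = ts ** n
        one = np.trapz(fn, ts)           # ||f_n||_1, approximately 1/(n + 1)
        sup = np.max(fn)                 # ||f_n||_oo = 1
        print(n, one, sup, sup / one)    # the ratio grows without bound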
However, we have the following theorem, which we shall prove in the next
chapter by more sophisticated methods than we are using at present.
Theorem 4.1. On a finite-dimensional vector space V all norms are equiva-
lent.
We shall need this theorem and also the following consequence of it occasion-
ally in the present chapter.
Theorem 4.2. If V and W are finite-dimensional normed linear spaces, then
every linear mapping T from V to W is necessarily bounded.
Proof. Because of the above theorem, it is sufficient to prove T bounded with
respect to some pair of norms. Let θ: ℝⁿ → V and φ: ℝᵐ → W be any basis
isomorphisms, and let {tᵢⱼ} be the matrix of T′ = φ⁻¹ ∘ T ∘ θ in Hom(ℝⁿ, ℝᵐ).
Then

    ‖T′(x)‖∞ = maxᵢ |Σⱼ tᵢⱼxⱼ| ≤ b‖x‖₁,

where b = max |tᵢⱼ|. Now q(η) = ‖φ⁻¹(η)‖∞ and p(ξ) = ‖θ⁻¹(ξ)‖₁ are norms
on W and V respectively, by Lemma 2.1, and since

    q(T(ξ)) = ‖T′(θ⁻¹ξ)‖∞ ≤ b‖θ⁻¹ξ‖₁ = bp(ξ),

we see that T is bounded by b with respect to the norms p and q on V and W. □
If we change to an equivalent norm, we are merely passing through an
isomorphism, and all continuous linear properties remain unchanged. For
example:
Theorem 4.3. The vector space Hom(V, W) remains the same if either the
domain norm or the range norm is replaced by an equivalent norm, and the
two induced norms on Hom(V, W) are equivalent.
Proof. The proof is left to the reader.
We now ask what kind of a norm we might want on the Cartesian product
V × W of two normed linear spaces. It is natural to try to choose the product
norm so that the fundamental mappings relating the product space to the two
factor spaces, the two projections πᵢ and the two injections θᵢ, should be con-
tinuous. It turns out that these requirements determine the product norm
uniquely to within equivalence. For if ‖⟨α, ξ⟩‖ has these properties, then

    ‖⟨α, ξ⟩‖ = ‖⟨α, 0⟩ + ⟨0, ξ⟩‖ ≤ ‖⟨α, 0⟩‖ + ‖⟨0, ξ⟩‖
             ≤ k₁‖α‖ + k₂‖ξ‖ ≤ k(‖α‖ + ‖ξ‖),

where kᵢ is a bound of the injection θᵢ and k is the larger of k₁ and k₂. Also,
‖α‖ ≤ c₁‖⟨α, ξ⟩‖ and ‖ξ‖ ≤ c₂‖⟨α, ξ⟩‖, by the boundedness of the projec-
tions πᵢ, and so ‖α‖ + ‖ξ‖ ≤ c‖⟨α, ξ⟩‖, where c = c₁ + c₂. Now ‖α‖ +
‖ξ‖ is clearly a norm ‖ ‖₁ on V × W, and our argument above shows that
‖⟨α, ξ⟩‖ will satisfy our requirements if and only if it is equivalent to ‖ ‖₁.
Any such norm will be called a product norm for V × W. The product norms
most frequently used are the uniform (product) norm

    ‖⟨α, ξ⟩‖∞ = max {‖α‖, ‖ξ‖},

the Euclidean (product) norm ‖⟨α, ξ⟩‖₂ = (‖α‖² + ‖ξ‖²)^(1/2), and the above
sum (product) norm ‖⟨α, ξ⟩‖₁. We shall leave the verification that the uni-
form and Euclidean norms actually are norms as exercises.
Each of these three product norms can be defined as well for n factor spaces
as for two, and we gather the facts for this general case into a theorem.
Theorem 4.4. If {⟨Vᵢ, pᵢ⟩}₁ⁿ is a finite set of normed linear spaces, then
‖ ‖₁, ‖ ‖₂, and ‖ ‖∞, defined on V = ∏ᵢ₌₁ⁿ Vᵢ by ‖α‖₁ = Σ₁ⁿ pᵢ(αᵢ),
‖α‖₂ = (Σ₁ⁿ pᵢ(αᵢ)²)^(1/2), and ‖α‖∞ = max {pᵢ(αᵢ) : i = 1, ..., n}, are
equivalent norms on V, and each is a product norm in the sense that the
projections πᵢ and the injections θᵢ are all continuous.

*It looks above as though all we are doing is taking any norm ‖ ‖ on ℝⁿ and
then defining a norm ||| ||| on the product space V by

    |||α||| = ‖⟨p₁(α₁), ..., pₙ(αₙ)⟩‖.

This is almost correct. The interested reader will discover, however, that
‖ ‖ on ℝⁿ must have the property that if |xᵢ| ≤ |yᵢ| for i = 1, ..., n, then
‖x‖ ≤ ‖y‖ for the triangle inequality to follow for ||| ||| in V. If we call such a
norm on ℝⁿ an increasing norm, then the following is true.

If ‖ ‖ is any increasing norm on ℝⁿ, then |||α||| = ‖⟨p₁(α₁), ..., pₙ(αₙ)⟩‖
is a product norm on V = ∏₁ⁿ Vᵢ.

However, we shall use only the 1-, 2-, ∞-product norms in this book.*
The triangle inequality, the continuity of addition, and our requirements on
a product norm form a set of nearly equivalent conditions. In particular, we
make the following observation.
Lemma 4.1. If V is a normed linear space, then the operation of addition
is a bounded linear map from V × V to V.

Proof. The triangle inequality for the norm on V says exactly that addition is
bounded by 1 when the sum norm is used on V × V. □

A normed linear space V is a (norm) direct sum ⊕₁ⁿ Vᵢ if the mapping
⟨ξ₁, ..., ξₙ⟩ ↦ Σ₁ⁿ ξᵢ is a norm isomorphism from ∏₁ⁿ Vᵢ to V. That is, the
given norm on V must be equivalent to the product norm it acquires when it is
viewed as ∏₁ⁿ Vᵢ. If V is algebraically the direct sum ⊕₁ⁿ Vᵢ, we always have

    ‖Σ₁ⁿ ξᵢ‖ ≤ Σ₁ⁿ ‖ξᵢ‖

by the triangle inequality for the norm on V, and the sum on the right is the one-
norm for ∏₁ⁿ Vᵢ. Therefore, V will be the norm direct sum ⊕₁ⁿ Vᵢ if, conversely,
there is an n-tuple of constants {kᵢ} such that ‖ξᵢ‖ ≤ kᵢ‖ξ‖ for all ξ. This is
the same as saying that the projections Pᵢ: ξ ↦ ξᵢ are all bounded. Thus,

Theorem 4.5. If V is a normed linear space and V is algebraically the direct
sum V = ⊕₁ⁿ Vᵢ, then V = ⊕₁ⁿ Vᵢ as normed linear spaces if and only if
the associated projections {Pᵢ} are all bounded.
EXERCISES
4.1 The fact that Hom(V, W) is unchanged when norms are replaced by equivalent
norms can be viewed as a corollary of Theorem 3.3. Show that this is so.
4.2 Write down a string of quite obvious inequalities showing that the norms
‖ ‖₁, ‖ ‖₂, and ‖ ‖∞ on ℝⁿ are equivalent. Discuss what happens as n → ∞.
4.3 Let V be an n-dimensional vector space, and consider the collection of all norms
on V of the form p ∘ θ, where θ: V → ℝⁿ is a coordinate isomorphism and p is one of
the norms ‖ ‖₁, ‖ ‖₂, ‖ ‖∞ on ℝⁿ. Show that all of these norms are equivalent. (Use
the above exercise and the reasoning in Theorem 4.2.)
4.4 Prove that ‖⟨α, ξ⟩‖ = max {‖α‖, ‖ξ‖} is a norm on V × W.
4.5 Prove that ‖⟨α, ξ⟩‖ = ‖α‖ + ‖ξ‖ is a norm on V × W.
4.6 Prove that ‖⟨α, ξ⟩‖ = (‖α‖² + ‖ξ‖²)^(1/2) is a norm on V × W.
4.7 Assuming Exercises 4.4 through 4.6, prove by induction the corresponding part
of Theorem 4.4.
4.8 Prove that if A is an open subset of V × W, then π₁[A] is an open subset of V.
4.9 Prove (ε, δ) that ⟨T, S⟩ ↦ S ∘ T is a continuous map from

    Hom(V₁, V₂) × Hom(V₂, V₃) to Hom(V₁, V₃),

where the Vᵢ are all normed linear spaces.
4.10 Let ‖ ‖ be any increasing norm on ℝⁿ; that is, ‖x‖ ≤ ‖y‖ if |xᵢ| ≤ |yᵢ| for all i.
Let pᵢ be a norm on the vector space Vᵢ for i = 1, ..., n. Show that

    |||α||| = ‖⟨p₁(α₁), ..., pₙ(αₙ)⟩‖

is a norm on V = ∏₁ⁿ Vᵢ.
4.11 Suppose that p: V → ℝ is a nonnegative function such that p(xα) = |x|p(α)
for all x, α. This is surely a minimum requirement for any function purporting to be a
measure of length of a vector.
a) Define continuity with respect to p and show that Theorem 3.1 is valid.
b) Our next requirement is that addition be continuous as a map from V × V to V,
and we decide that continuity at 0 means that for every ε there is a δ such that

    p(α) < δ and p(β) < δ ⇒ p(α + β) < ε.

Argue again as in Theorem 3.1 to show that there is a constant c such that

    p(α + β) ≤ c(p(α) + p(β)) for all α, β ∈ V.

4.12 Let V and W be normed linear spaces, and let f: V × W → ℝ be bounded and
bilinear. Let T be the corresponding linear map from V to W*. Prove that T is bounded
and that ‖T‖ is the smallest bound of f, that is, the smallest b such that

    |f(α, β)| ≤ b‖α‖‖β‖ for all α, β.

4.13 Let the normed linear space V be a norm direct sum M ⊕ N. Prove that the
subspaces M and N are closed sets in V. (The converse theorem is false.)
4.14 Let N be a closed subspace of the normed linear space V. If A is a coset N + α,
define |||A||| as glb {‖ξ‖ : ξ ∈ A}. Prove that |||A||| is a norm on the quotient space V/N.
Prove also that if ξ̄ is the coset containing ξ, then the mapping ξ ↦ ξ̄ (the natural
projection π of V onto V/N) is bounded by 1.
4.15 Let V and W be normed linear spaces, and let T in Hom(V, W) have a null space
which includes the closed subspace N. Prove that the unique linear S from V/N to W
defined by T = S ∘ π (Theorem 4.3 of Chapter 1) is bounded and that ‖S‖ = ‖T‖.
4.16 Let N be a closed subspace of a normed linear space, and suppose that N has a
finite-dimensional complement M in the purely algebraic sense. Prove that then V is
the norm direct sum M ⊕ N. (Use the above exercise and Theorem 4.2 to prove that
if P is the projection of V onto N along M, then P is bounded.)
4.17 Let N₁ and N₂ be closed subspaces of the normed linear space V, and suppose
that they have the same finite codimension. Prove that N₁ and N₂ are norm isomor-
phic. (Assume the results of the above exercise and Exercise 2.11 of Chapter 2.)
4.18 Prove that if p is a seminorm on a vector space V, then its null set is a subspace
N, p is constant on the cosets of N, and p factors: p = q ∘ π, where q is a norm on V/N
and π is the natural projection ξ ↦ ξ̄ of V onto V/N. Note that ξ ↦ ξ̄ is thus an
isometric surjection from the seminormed space V to the normed space V/N. An
isometry is a distance-preserving map.
5. INFINITESIMALS
The notion of an infinitesimal was abused in the early literature of the calculus,
its treatment generally amounting to logical nonsense, and the term fell into
such disrepute that many modern books avoid it completely. Nevertheless, it
is a very useful idea, and we shall base our development of the differential upon
the properties of two special classes of infinitesimals which we shall call "big oh"
and "little oh" (and designate 'O' and 'o', respectively).
Originally an infinitesimal was considered to be a number that "is infinitely
small but not zero". Of course, there is no such number. Later, an infinitesimal
was considered to be a variable that approaches zero as its limit. However, we
know that it is functions that have limits, and a variable can be considered to
have a limit only if it is somehow considered to be a function. We end up looking
at functions φ such that φ(t) → 0 as t → 0. The definition of derivative involves
several such infinitesimals. If f′(x) exists and has the value a, then the funda-
mental difference quotient (f(x + t) − f(x))/t is the quotient of two infinites-
imals, and, furthermore, ((f(x + t) − f(x))/t) − a also approaches 0 as t → 0.
This last function is not defined at 0, but we can get around this if we wish by
multiplying through by t, obtaining
(f(x + t) − f(x)) − at = φ(t),
where f(x + t) − f(x) is the "change in f" infinitesimal, at is a linear infinitesimal,
and φ(t) is an infinitesimal that approaches 0 faster than t (i.e., φ(t)/t → 0 as t → 0).
If we divide the last equation by t again, we see that this property of the infin-
itesimal φ, that it converges to 0 faster than t as t → 0, is exactly equivalent to
the fact that the difference quotient of f converges to a. This makes it clear that
the study of derivatives is included in the study of the rate at which infinites-
imals get small, and the usefulness of this paraphrase will shortly become clear.
Definition. A subset A of a normed linear space V is a neighborhood of a
point a if A includes some open ball about a. A deleted neighborhood of a is a
neighborhood of a minus the point a itself.
We define special sets of functions ℑ, O, and o as follows. It will be assumed
in these definitions that each function is from a neighborhood of 0 in a normed
linear space V to a normed linear space W.
f ∈ ℑ if f(0) = 0 and f is continuous at 0. These functions are the infi-
nitesimals.
f ∈ O if f(0) = 0 and f is Lipschitz continuous at 0. That is, there exist
positive constants r and c such that ‖f(ξ)‖ ≤ c‖ξ‖ on B_r(0).
f ∈ o if f(0) = 0 and ‖f(ξ)‖/‖ξ‖ → 0 as ξ → 0.
When the spaces V and W are not understood, we specify them by writing
O(V, W), etc.
A simple set of functions from ℝ to ℝ makes the qualitative difference
between these classes apparent. The function f(x) = |x|^{1/2} is in ℑ(ℝ, ℝ) but not
in O, g(x) = x is in O and therefore in ℑ but not in o, and h(x) = x² is in all
three classes (Fig. 3.7).
[Fig. 3.7: the graphs of f, g, and h near 0.]
It is clear that ℑ, O, and o are unchanged when the norms on V and W are
replaced by equivalent norms.
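A quick numerical sketch of these membership tests (our own illustration, in Python; it is not part of the text): the ratio ‖f(ξ)‖/‖ξ‖ blows up for f, stays constant for g, and tends to 0 for h.

```python
# Ratio test ||f(xi)|| / ||xi|| for the three sample functions above.
fns = {"f": lambda x: abs(x) ** 0.5,   # infinitesimal only
       "g": lambda x: x,               # in O but not in o
       "h": lambda x: x ** 2}          # in o (hence in O and infinitesimal)

for name, fn in fns.items():
    ratios = [abs(fn(x)) / abs(x) for x in (1e-2, 1e-4, 1e-6)]
    print(name, ["%.1e" % r for r in ratios])
# f: 1.0e+01, 1.0e+02, 1.0e+03  (unbounded: no Lipschitz constant at 0)
# g: 1.0e+00 three times        (bounded but not vanishing)
# h: 1.0e-02, 1.0e-04, 1.0e-06  (vanishing: little oh)
```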
Our previous notion of the sum of two functions does not apply to a pair
of functions f, g ∈ ℑ(V, W) because their domains may be different. However,
f + g is defined on the intersection dom f ∩ dom g, which is still a neighborhood
of 0. Moreover, addition remains commutative and associative when extended
in this way. It is clear that then ℑ(V, W) is almost a vector space. The only
trouble occurs in connection with the equation f + (−f) = 0; the domain of
the function on the left is dom f, whereas we naturally take 0 to be the zero
function on the whole of V.
*The way out of this difficulty is to identify two functions f and g in ℑ if
they are the same on some ball about 0. We define f and g to be equivalent
(f ∼ g) if and only if there exists a neighborhood of 0 on which f = g. We then
check (in our minds) that this is an equivalence relation and that we now do
have a vector space. Its elements are called germs of functions at 0. Strictly
speaking, a germ is thus an equivalence class of functions, but in practice one
tends to think of germs in terms of their representing functions, only keeping in
mind that two functions are the same as germs when they agree on a neighbor-
hood of 0.*
As one might guess from our introductory discussion, the algebraic prop-
erties of the three classes ℑ, O, and o are crucial for the differential calculus.
We gather them together in the following theorem.
Theorem 5.1
1) o(V, W) ⊂ O(V, W) ⊂ ℑ(V, W), and each of the three classes is closed
under addition and multiplication by scalars.
2) If f ∈ O(V, W), and if g ∈ O(W, X), then g ∘ f ∈ O(V, X), where
dom g ∘ f = f⁻¹[dom g].
3) If either f or g above is in o, then so is g ∘ f.
4) If f ∈ O(V, W) and g ∈ ℑ(V, ℝ), then fg ∈ o(V, W), and similarly if
f ∈ ℑ and g ∈ O.
5) In (4) if either f or g is in o and the other is merely bounded on a neigh-
borhood of 0, then fg ∈ o(V, W).
6) Hom(V, W) ⊂ O(V, W).
7) Hom(V, W) ∩ o(V, W) = {0}.
Proof. Let O_ε(V, W) be the set of infinitesimals f such that ‖f(ξ)‖ ≤ ε‖ξ‖ on
some ball about 0. Then f ∈ O if and only if f is in some O_ε, and f ∈ o if and only
if f is in every O_ε. Obviously, o ⊂ O ⊂ ℑ.
1) If ‖f(ξ)‖ ≤ a‖ξ‖ on B_t(0) and ‖g(ξ)‖ ≤ b‖ξ‖ on B_u(0), then
‖f(ξ) + g(ξ)‖ ≤ (a + b)‖ξ‖
on B_r(0), where r = min {t, u}. Thus O is closed under addition. The
closure of o under addition follows similarly, or simply from the limit of a
sum being the sum of the limits.
2) If ‖f(ξ)‖ ≤ a‖ξ‖ when ‖ξ‖ ≤ t and ‖g(η)‖ ≤ b‖η‖ when ‖η‖ ≤ u, then
‖g(f(ξ))‖ ≤ b‖f(ξ)‖ ≤ ab‖ξ‖
when ‖ξ‖ ≤ t and ‖f(ξ)‖ ≤ u, and so when ‖ξ‖ ≤ r = min {t, u/a}.
3) Now suppose that f ∈ o in (2). Then, given ε, we can take a = ε/b and have
‖g(f(ξ))‖ ≤ ε‖ξ‖
when ‖ξ‖ ≤ r. Thus g ∘ f ∈ o. The argument when g ∈ o and f ∈ O is
essentially the same.
4) Given ‖f(ξ)‖ ≤ c‖ξ‖ on B_r(0) and given ε, we choose δ such that |g(ξ)| ≤
ε/c on B_δ(0) and have
‖f(ξ)g(ξ)‖ ≤ ε‖ξ‖
when ‖ξ‖ ≤ min (δ, r). The other result follows similarly, as also does (5).
6) A bounded linear transformation is in O by definition.
7) Suppose that f ∈ Hom(V, W) ∩ o(V, W). Take any α ≠ 0. Given ε,
choose r so that ‖f(ξ)‖ ≤ ε‖ξ‖ on B_r(0). Then write α as α = xξ, where
‖ξ‖ < r. (Find ξ and x.) Then
‖f(α)‖ = ‖f(xξ)‖ = |x| · ‖f(ξ)‖ ≤ |x| · ε · ‖ξ‖ = ε‖α‖.
Thus ‖f(α)‖ ≤ ε‖α‖ for every positive ε, and so f(α) = 0. Thus f = 0,
proving (7). □
Remark. The additivity of f was not used in this argument, only its homogeneity.
It follows therefore that there is no homogeneous function (of degree 1) in o
except 0.
Sometimes when more than one variable is present it is necessary to indicate
with respect to which variable a function is in O or o. We then write "f(ξ) = o(ξ)"
for "f ∈ o", where "o(ξ)" is used to designate an arbitrary element of o.
The following rather curious lemma will be useful later in our proof of the
differentiability of an implicitly defined function. It is understood that η = f(ξ),
where f is the function we are studying.
Lemma 5.1. If η = O(ξ) + o(⟨ξ, η⟩) and also η = ℑ(ξ), then η = O(ξ).
Proof. The hypotheses imply that there are numbers b, r₁, and p such that
‖η‖ ≤ b‖ξ‖ + ½(‖ξ‖ + ‖η‖) if ‖ξ‖ ≤ r₁ and ‖ξ‖ + ‖η‖ ≤ p, and then that
‖η‖ ≤ p/2 if ‖ξ‖ is smaller than some r₂. If ‖ξ‖ ≤ r = min {r₁, r₂, p/2}, then
all the conditions are met and ‖η‖ ≤ b‖ξ‖ + ½(‖ξ‖ + ‖η‖). But this rearranges
to the inequality ‖η‖ ≤ (2b + 1)‖ξ‖, and so η = O(ξ). □
We shall also need the following straightforward result.
Lemma 5.2. If f ∈ O(V, X) and g ∈ O(V, Y), then ⟨f, g⟩ ∈ O(V, X × Y).
That is, ⟨O(ξ), O(ξ)⟩ = O(ξ).
Proof. The proof is left to the reader.
EXERCISES
5.1 Prove in detail that the class ℑ(V, W) is unchanged if the norms on V and W
are replaced by equivalent norms.
5.2 Do the same for O and o.
5.3 Prove (5) of the Oo-theorem (Theorem 5.1).
5.4 Prove also that if in (4) either f or g is in O and the other is merely bounded on a
neighborhood of 0, then fg ∈ O(V, W).
5.5 Prove Lemma 5.2. (Remember that F = ⟨F₁, F₂⟩ is loose language for
F = θ₁ ∘ F₁ + θ₂ ∘ F₂.) State the generalization to n functions. State the o-form of
the theorem.
5.6 Given F₁ ∈ O(V₁, W) and F₂ ∈ O(V₂, W), define F from (a subset of) V =
V₁ × V₂ to W by F(α₁, α₂) = F₁(α₁) + F₂(α₂). Prove that F ∈ O(V, W). (First
state the defining equation as an identity involving the projections π₁ and π₂ and not
involving explicit mention of the domain vectors α₁ and α₂.)
5.7 Given F₁ ∈ O(V₁, W) and F₂ ∈ O(V₂, ℝ), define precisely what you mean by
F₁F₂ and show that it is in o(V₁ × V₂, W).
5.8 Define the class Oⁿ as follows: f ∈ Oⁿ if f ∈ ℑ and ‖f(ξ)‖/‖ξ‖ⁿ is bounded in some
deleted ball about 0. (A deleted neighborhood of α is a neighborhood minus α.) State
and prove a theorem about f + g when f ∈ Oⁿ and g ∈ Oᵐ.
5.9 State and prove a theorem about f ∘ g when f ∈ Oⁿ and g ∈ Oᵐ.
5.10 State and prove a theorem about fg when f ∈ Oⁿ and g ∈ Oᵐ.
5.11 Define a similar class oⁿ. State and prove a theorem about f ∘ g when f ∈ Oⁿ
and g ∈ oᵐ.
6. THE DIFFERENTIAL
Before considering the notion of the differential, we shall review some geometric
material from the elementary calculus. We do this for motivation only; our sub-
sequent theory is independent of the preliminary discussion.
In the elementary one-variable calculus the derivative f'(a) of a function f
at the point a has geometric meaning as the slope of the tangent line to the graph
of f at the point a. (Of course, according to our notion of a function, the graph
of f is f.) The tangent line thus has the (point-slope) equation y - f(a) =
f′(a)(x − a), and is the graph of the affine map x ↦ f′(a)(x − a) + f(a).
We ordinarily examine the nature of the curve f near the point ⟨a, f(a)⟩
by using new variables which are zero at this point. That is, we express every-
thing in terms of s = y − f(a) and t = x − a. This change of variables is
simply the translation ⟨x, y⟩ ↦ ⟨t, s⟩ = ⟨x − a, y − f(a)⟩ in the
Cartesian plane ℝ² which brings the point of interest ⟨a, f(a)⟩ to the origin.
If we picture the situation in a Euclidean plane, of which the next page is a satis-
factory local model, then this translation in ℝ² is represented by a choice of new
axes, the t- and s-axes, with origin at the point of tangency. Since y = f(x)
if and only if s = f(a + t) − f(a), we see that the image of f under this trans-
lation is the function Δf_a defined by Δf_a(t) = f(a + t) − f(a). (See Fig. 3.8.)
Of course, Δf_a is simply our old friend the change in f brought about by changing
x from a to a + t.
[Fig. 3.8: the graph of Δf_a and its linear approximation df_a over the t-axis; the gap df_a(t) − Δf_a(t) is o(t).]
Similarly, the equation y − f(a) = f′(a)(x − a) becomes s = f′(a)t, and
the tangent line accordingly translates to the line that is (the graph of) the
linear functional l: t ↦ f′(a)t having the number f′(a) as its skeleton (matrix).
Remember that from the point of view of the geometric configuration (curve and
tangent line) in the Euclidean plane, all that we are doing is choosing the natural
axis system, with origin at the point of tangency. Then the curve is (the graph
of) the function Δf_a, and the tangent line is (the graph of) the linear map l.
Now it follows from the definition of f′(a) that l can also be characterized as
the linear function that approximates Δf_a most closely. For, by definition,
Δf_a(t)/t → f′(a)  as  t → 0,
and this is exactly the same as saying that
(Δf_a(t) − l(t))/t → 0  as  t → 0,
or
Δf_a − l ∈ o.
But we know from the Oo-theorem that the expression of the function Δf_a as the
sum l + o is unique. This unique linear approximation l is called the differential
of f at a and is designated df_a. Again, the differential of f at a is the linear function
l: ℝ → ℝ that approximates the actual change in f, Δf_a, in the sense that
Δf_a − l ∈ o; we saw above that if the derivative f′(a) exists, then the differential
of f at a exists and has f′(a) as its skeleton (1 × 1 matrix).
Similarly, if f is a function of two variables, then (the graph of) f is a surface
in Cartesian 3-space ℝ³ = ℝ² × ℝ, and the tangent plane to this surface at
⟨a, b, f(a, b)⟩ has the equation z − f(a, b) = f₁(a, b)(x − a) + f₂(a, b)(y − b),
where f₁ = ∂f/∂x and f₂ = ∂f/∂y. If, as above, we set
Δf_{⟨a,b⟩}(s, t) = f(a + s, b + t) − f(a, b)
and l(s, t) = sf₁(a, b) + tf₂(a, b), then Δf_{⟨a,b⟩} is the change in f around ⟨a, b⟩
and l is the linear functional on ℝ² with matrix (skeleton) ⟨f₁(a, b), f₂(a, b)⟩.
Moreover, it is a theorem of the standard calculus that if the partial derivatives
of f are continuous, then again l approximates Δf_{⟨a,b⟩}, with error in o. Here
also l is called the differential of f at ⟨a, b⟩ and is designated df_{⟨a,b⟩} (Fig. 3.9).
The notation in the figure has been changed to show the value at t = ⟨t₁, t₂⟩
of the differential df_a of f at a = ⟨a₁, a₂⟩.
Fig. 3.9
The following definition should now be clear. As above, the local function
ΔF_α is defined by ΔF_α(ξ) = F(α + ξ) − F(α).
Definition. Let V and W be normed linear spaces, and let A be a neighbor-
hood of α in V. A mapping F: A → W is differentiable at α if there is a T
in Hom(V, W) such that ΔF_α(ξ) = T(ξ) + o(ξ).
The Oo-theorem implies then that T is uniquely determined, for if also ΔF_α =
S + o, then T − S ∈ o, and so T − S = 0 by (7) of the theorem. This uniquely
determined T is called the differential of F at α and is designated dF_α. Thus
ΔF_α = dF_α + o,
where dF_α is the unique (bounded) linear approximation to ΔF_α.
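To make the definition concrete, here is a small numerical sketch (our own illustration in Python/NumPy, not the book's): for a map F: ℝ² → ℝ² of our choosing and a candidate bounded linear T, the ratio ‖ΔF_α(ξ) − T(ξ)‖/‖ξ‖ tends to 0, exhibiting ΔF_α − T ∈ o and hence T = dF_α.

```python
import numpy as np

F = lambda x, y: np.array([np.sin(x) + y**2, x * y])   # an illustrative F

def T(a, b):
    # Candidate differential at (a, b): the matrix of partial derivatives.
    return np.array([[np.cos(a), 2 * b],
                     [b,         a    ]])

a = np.array([1.0, 2.0])
M = T(*a)
for t in (1e-1, 1e-3, 1e-5):
    xi = t * np.array([0.6, -0.8])
    delta = F(*(a + xi)) - F(*a)                       # ΔF_a(xi)
    print(np.linalg.norm(delta - M @ xi) / np.linalg.norm(xi))
# The printed ratios shrink roughly like ||xi||, so ΔF_a - T is in o.
```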
*Our preliminary discussion should make it clear that this definition of the
differential agrees with standard usage when the domain space is ℝⁿ. However,
in certain cases when the domain space is an infinite-dimensional function space,
dF_α is called the first variation of F at α. This is due to the fact that although the
early writers on the calculus of variations saw its analogy with the differential
calculus, they did not realize that it was the same subject.*
We gather together in the next two theorems the familiar rules for differ-
entiation. They follow immediately from the definition and the Oo-theorem.
It will be convenient to use the notation 𝔇_α(V, W) for the set of all mappings
from neighborhoods of α in V to W that are differentiable at α.
Theorem 6.1
1) If F ∈ 𝔇_α(V, W), then ΔF_α ∈ O(V, W).
2) If F, G ∈ 𝔇_α(V, W), then F + G ∈ 𝔇_α(V, W) and d(F + G)_α =
dF_α + dG_α.
3) If F ∈ 𝔇_α(V, ℝ) and G ∈ 𝔇_α(V, W), then FG ∈ 𝔇_α(V, W) and d(FG)_α =
F(α) dG_α + dF_α G(α), the second term being a dyad.
4) If F is a constant function on V, then F is differentiable and dF_α = 0.
5) If F ∈ Hom(V, W), then F is differentiable at every α ∈ V and dF_α = F.
Proof
1) ΔF_α = dF_α + o = O + o = O by (1) and (6) of the Oo-theorem.
2) It is clear that Δ(F + G)_α = ΔF_α + ΔG_α. Therefore, Δ(F + G)_α =
(dF_α + o) + (dG_α + o) = (dF_α + dG_α) + o by (1) of the Oo-theorem. Since
dF_α + dG_α ∈ Hom(V, W), we have (2).
3) Δ(FG)_α(ξ) = F(α + ξ)G(α + ξ) − F(α)G(α)
= ΔF_α(ξ)G(α) + F(α) ΔG_α(ξ) + ΔF_α(ξ) ΔG_α(ξ),
as the reader will see upon expanding and canceling. This is just the usual
device of adding and subtracting middle terms in order to arrive at the form
involving the Δ's. Thus
Δ(FG)_α = (dF_α + o)G(α) + F(α)(dG_α + o) + O·O = dF_α G(α) + F(α) dG_α + o
by the Oo-theorem.
4) If ΔF_α = 0, then dF_α = 0 by (7) of the Oo-theorem.
5) ΔF_α(ξ) = F(α + ξ) − F(α) = F(ξ). Thus ΔF_α = F ∈ Hom(V, W). □
The composite-function rule is somewhat more complicated.
Theorem 6.2. If F ∈ 𝔇_α(V, W) and G ∈ 𝔇_{F(α)}(W, X), then G ∘ F ∈ 𝔇_α(V, X)
and
d(G ∘ F)_α = dG_{F(α)} ∘ dF_α.
Proof. We have
Δ(G ∘ F)_α(ξ) = G(F(α + ξ)) − G(F(α))
= G(F(α) + ΔF_α(ξ)) − G(F(α))
= ΔG_{F(α)}(ΔF_α(ξ))
= dG_{F(α)}(ΔF_α(ξ)) + o(ΔF_α(ξ))
= dG_{F(α)}(dF_α(ξ)) + dG_{F(α)}(o(ξ)) + o ∘ O
= (dG_{F(α)} ∘ dF_α)(ξ) + O ∘ o + o ∘ O.
Thus Δ(G ∘ F)_α = dG_{F(α)} ∘ dF_α + o, and since dG_{F(α)} ∘ dF_α ∈ Hom(V, X),
this proves the theorem. The reader should be able to justify each step taken in
this chain of equalities. □
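A numerical sanity check of Theorem 6.2 (our own sketch; the maps and the finite-difference matrices are illustrative assumptions, not the book's): the matrix of d(G ∘ F)_α agrees with the product of the matrices of dG_{F(α)} and dF_α.

```python
import numpy as np

F = lambda v: np.array([v[0]**2 - v[1], np.sin(v[1])])   # F: R^2 -> R^2
G = lambda w: np.array([w[0] * np.exp(w[1])])            # G: R^2 -> R^1

def jac(H, a, h=1e-6):
    # Central-difference matrix of dH_a, one column per coordinate.
    a = np.asarray(a, dtype=float)
    return np.column_stack([(H(a + h*e) - H(a - h*e)) / (2*h)
                            for e in np.eye(a.size)])

a = np.array([1.0, 0.5])
lhs = jac(lambda v: G(F(v)), a)        # matrix of d(G ∘ F)_a
rhs = jac(G, F(a)) @ jac(F, a)         # matrix of dG_{F(a)} ∘ dF_a
print(np.allclose(lhs, rhs, atol=1e-5))   # True
```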
EXERCISES
6.1 The coordinate mapping ⟨x, y⟩ ↦ x from ℝ² to ℝ is differentiable. Why?
What is its differential?
6.2 Prove that differentiation commutes with the application of bounded linear
maps. That is, show that if F: V → W is differentiable at α and if T ∈ Hom(W, X),
then T ∘ F is differentiable at α and d(T ∘ F)_α = T ∘ dF_α.
6.3 Prove that F ∈ 𝔇_α(V, ℝ) and F(α) ≠ 0 ⇒ G = 1/F ∈ 𝔇_α(V, ℝ) and
dG_α = −dF_α/(F(α))².
6.4 Let F: V → ℝ be differentiable at α, and let f: ℝ → ℝ be a function whose
derivative exists at a = F(α). Prove that f ∘ F is differentiable at α and that
d(f ∘ F)_α = f′(a) dF_α.
[Remember that the differential of f at a is simply multiplication by its derivative:
df_a(h) = hf′(a).] Show that the preceding problem is a special case.
6.5 Let V and W be normed linear spaces, and let F: V → W and G: W → V be
continuous maps such that G ∘ F = I_V and F ∘ G = I_W. Suppose that F is differ-
entiable at α and that G is differentiable at β = F(α). Prove that dG_β = (dF_α)⁻¹.
6.6 Let f: V → ℝ be differentiable at α. Show that g = fⁿ is differentiable at α and
that dg_α = n(f(α))ⁿ⁻¹ df_α.
(Prove this both by an induction on the product rule and by the composite-function
rule, assuming in the second case that D_x xⁿ = nxⁿ⁻¹.)
6.7 Prove from the product rule by induction that if the n functions fᵢ: V → ℝ,
i = 1, …, n, are all differentiable at α, then so is f = ∏₁ⁿ fᵢ, and that
df_α = Σᵢ₌₁ⁿ (∏_{j≠i} fⱼ(α)) d(fᵢ)_α.
6.8 A monomial of degree n on the normed linear space V is a product ∏₁ⁿ lᵢ of
linear functionals (lᵢ ∈ V*). A homogeneous polynomial of degree n is a finite sum of
monomials of degree n. A polynomial of degree n is a sum of homogeneous polynomials
pᵢ, i = 0, …, n, where p₀ is a constant. Show from the above exercise and other
known facts that a polynomial is differentiable everywhere.
6.9 Show that if F₁: V → W₁ and F₂: V → W₂ are both differentiable at α, then
so is F = ⟨F₁, F₂⟩ from V to W = W₁ × W₂ (use the injections θ₁ and θ₂).
6.10 Show without using explicit computations, but using the results of earlier
exercises instead, that the mapping F: ℝ² → ℝ² defined by
⟨x, y⟩ ↦ ⟨(x − y)², (x + y)³⟩
is everywhere differentiable. Now compute its differential at ⟨a, b⟩.
6.11 Let F: V → X and G: W → X be differentiable at α and β respectively, and
define K: V × W → X by
K(ξ, η) = F(ξ) + G(η).
Show that K is differentiable at ⟨α, β⟩
a) by a direct Δ-calculation;
b) by using the projections π₁ and π₂ to express K in terms of F and G without
explicit reference to the variable, and then applying the differentiation rules.
6.12 Now suppose given F: V → ℝ and G: W → X, and define K by
K(ξ, η) = F(ξ)G(η).
Show that if F and G are differentiable at α and β respectively, then K is differentiable
at ⟨α, β⟩ in the manner of (b) in the above exercise.
6.13 Let V and W be normed linear spaces. Prove that the map ⟨α, β⟩ ↦ ‖α‖ ‖β‖
from V × W to ℝ is in o(V × W, ℝ). Use the maximum norm on the product space.
Let f: V × W → ℝ be bounded and bilinear. Here boundedness means that there
is some b such that |f(α, β)| ≤ b‖α‖ ‖β‖ for all α, β. Prove that f is differentiable
everywhere and find its differential.
6.14 Let f and g be differentiable functions from ℝ to ℝ. We know from the composite-
function rule of the ordinary calculus that
(f ∘ g)′(a) = f′(g(a))g′(a).
Our composite-function rule says that
d(f ∘ g)_a = df_{g(a)} ∘ dg_a,
where df_x is the linear mapping t ↦ f′(x)t. Show that these two statements are equiv-
alent.
6.15 Prove that f(x, y) = ‖⟨x, y⟩‖₁ = |x| + |y| is differentiable except on the
coordinate axes (that is, df_{⟨a,b⟩} exists if a and b are both nonzero).
6.16 Comparing the shapes of the unit balls for ‖ ‖₁ and ‖ ‖∞ on ℝ², guess from the
above the theorem about the differentiability of ‖ ‖∞. Prove it.
6.17 Let V and W be fixed normed linear spaces, let X_d be the set of all maps from
V to W that are differentiable at 0, let X_o be the set of all maps from V to W that
belong to o(V, W), and let X_l be Hom(V, W). Prove that X_d and X_o are vector spaces
and that X_d = X_o ⊕ X_l.
6.18 Let F be a Lipschitz function with constant C which is differentiable at a point α.
Prove that ‖dF_α‖ ≤ C.
7. DIRECTIONAL DERIVATIVES; THE MEAN-VALUE THEOREM
Directional derivatives form the connecting link between differentials and the
derivatives of the elementary calculus, and, although they add one more concept
that has to be fitted into the scheme of things, the reader should find them
intuitively satisfying and technically useful.
A continuous function f from an interval I ⊂ ℝ to a normed linear space W
can have a derivative f′(x) at a point x ∈ I in exactly the sense of the elementary
calculus:
f′(x) = lim_{t→0} (f(x + t) − f(x))/t.
The range of such a function f is a curve or arc in W, and it is conventional to
call f itself a parametrized arc when we want to keep this geometric notion in
mind. We shall also call f'(x), if it exists, the tangent vector to the arc f at x.
This terminology fits our geometric intuition, as Fig. 3.10 suggests. For sim-
plicity we have set x = 0 and f(x) = O. If f'(x) exists, we say that the param-
etrized arc f is smooth at x. We also say that f is smooth at a = f(x), but this
terminology is ambiguous if f is not injective (i.e., if the arc crosses itself). An
arc is smooth if it is smooth at every value of the parameter.
We naturally wonder about the relationship between the existence of the
tangent vector f′(x) and the differentiability of f at x. If df_x exists, then, being a
linear map on ℝ, it is simply multiplication "by" the fixed vector α that is its
skeleton, df_x(h) = h df_x(1) = hα, and we expect α to be the tangent vector f′(x).
[Fig. 3.10: the difference quotient (f(0 + t) − f(0))/t tending to the tangent vector f′(0).]
We showed this and also the converse result for the ordinary calculus in
our preliminary discussion in Section 6. Actually, our argument was valid for
vector-valued functions, but we shall repeat it anyway.
When we think of a vector-valued function of a real variable as being an
arc, we often use Greek letters like 'λ' and 'γ' for the function, as we do below.
This of course does not in any way change what is being proved, but is slightly
suggestive of a geometric interpretation.
Theorem 7.1. A parametrized arc γ: [a, b] → V is differentiable at x ∈ (a, b)
if and only if the tangent vector (derivative) α = γ′(x) exists, in which case
the tangent vector is the skeleton of the differential, dγ_x(h) = hγ′(x) = hα.
Proof. If the parametrized arc γ: [a, b] → V is differentiable at x ∈ (a, b), then
dγ_x(h) = h dγ_x(1) = hα, where α = dγ_x(1). Since Δγ_x − dγ_x ∈ o, this gives
‖Δγ_x(h) − hα‖/|h| → 0, and so Δγ_x(h)/h → α as h → 0. Thus α is the derivative
γ′(x) in the ordinary sense. By reversing the above steps we see that the exis-
tence of γ′(x) implies the differentiability of γ at x. □
Now let F be a function from an open set A in a normed linear space V to a
normed linear space W. One way to study the behavior of F in the neighborhood
of a point a in A is to consider how it behaves on each straight line through a.
That is, we study F by temporarily restricting it to a one-dimensional domain.
The advantage gained in doing this is that the restricted F is then simply a
parametrized arc, and its differential is simply multiplication by its ordinary
derivative.
For any nonzero ξ ∈ V the straight line through α in the direction ξ has the
parametric representation t ↦ α + tξ. The restriction of F to this line is the
parametrized arc γ: γ(t) = F(α + tξ). Its tangent vector (derivative) at the
origin t = 0, if it exists, is called the derivative of F in the direction ξ at α, or the
derivative of F with respect to ξ at α, and is designated D_ξF(α). Clearly,
D_ξF(α) = lim_{t→0} (F(α + tξ) − F(α))/t.
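In coordinates this limit is easy to approximate by machine. The following sketch (our own illustration, not the book's) evaluates the difference quotient and exhibits the homogeneity in ξ noted in the next paragraph.

```python
import numpy as np

F = lambda v: np.array([v[0] * v[1], v[0]**2])    # an illustrative F: R^2 -> R^2

def dir_deriv(F, alpha, xi, t=1e-7):
    # Difference quotient (F(alpha + t*xi) - F(alpha)) / t for small t.
    return (F(alpha + t * xi) - F(alpha)) / t

alpha, xi = np.array([2.0, -1.0]), np.array([1.0, 3.0])
print(dir_deriv(F, alpha, xi))          # ≈ [5. 4.]
print(dir_deriv(F, alpha, 2 * xi))      # ≈ [10. 8.]: doubling xi doubles the
                                        # derivative, so it depends on more
                                        # than the "direction" of xi
```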
Comparing this with our original definition of γ, we see that the tangent vector
γ′(x) to a parametrized arc γ is the directional derivative D₁γ(x) with respect
to the standard basis vector 1 in ℝ.
Strictly speaking, we are misusing the word "direction", because different
vectors can have the same direction. Thus, if η = cξ with c > 0, then η and ξ
point in the same direction, but, because D_ξF(α) is linear in ξ (as we shall see in a
moment), their associated derivatives are different: D_ηF(α) = cD_ξF(α).
We now want to establish the relationship between directional derivatives,
which are vectors, and differentials, which are linear maps. We saw above that
for an arc γ differentiability is equivalent to the existence of γ′(x) = D₁γ(x).
In the general case the relationship is not as simple as it is for arcs, but in one
direction everything goes smoothly.
Theorem 7.2. If F is differentiable at α, and if λ is any smooth arc through α,
with α = λ(x), then γ = F ∘ λ is smooth at x, and γ′(x) = dF_α(λ′(x)).
In particular, if F is differentiable at α, then every directional derivative
D_ξF(α) exists, and D_ξF(α) = dF_α(ξ).
Proof. The smoothness of γ is equivalent to its differentiability at x and there-
fore follows from the composite-function theorem. Moreover, γ′(x) = dγ_x(1) =
d(F ∘ λ)_x(1) = dF_α(dλ_x(1)) = dF_α(λ′(x)). If λ is the parametrized line
λ(t) = α + tξ, then it has the constant derivative ξ, and since α = λ(0) here,
the above formula becomes γ′(0) = dF_α(ξ). That is, D_ξF(α) = γ′(0) =
dF_α(ξ). □
It is not true, conversely, that the existence of all the directional derivatives
D_ξF(α) of a function F at a point α implies the differentiability of F at α. The
easiest counterexample involves the notion of a homogeneous function. We say
that a function F: V → W is homogeneous if F(xξ) = xF(ξ) for all x and ξ.
For such a function the directional derivative D_ξF(0) exists because the arc
γ(t) = F(0 + tξ) = tF(ξ) is linear, and γ′(0) = F(ξ). Thus, all of the directional
derivatives of a homogeneous function F exist at 0 and D_ξF(0) = F(ξ). If F is
also differentiable at 0, then dF₀(ξ) = D_ξF(0) = F(ξ) and F = dF₀. Thus a
differentiable homogeneous function must be linear. Therefore, any nonlinear
homogeneous function F will be a function such that D_ξF(0) exists for all ξ but
dF₀ does not exist. Taking the simplest possible situation, define F: ℝ² → ℝ by
F(x, y) = x³/(x² + y²) if ⟨x, y⟩ ≠ ⟨0, 0⟩ and F(0, 0) = 0. Then
F(tx, ty) = tF(x, y),
so that F is homogeneous, but F is not linear.
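One can watch this failure numerically (our own sketch, not from the text): every directional derivative of this F at 0 exists and equals F(ξ), but ξ ↦ D_ξF(0) is not additive, so no linear dF₀ can produce it.

```python
import numpy as np

def F(x, y):
    return 0.0 if x == y == 0.0 else x**3 / (x**2 + y**2)

def D0(xi, t=1e-8):
    # Directional derivative at the origin; since F is homogeneous,
    # (F(t*xi) - F(0))/t is exactly F(xi) for every t != 0.
    return (F(t * xi[0], t * xi[1]) - F(0.0, 0.0)) / t

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(D0(e1), D0(e2))      # 1.0 0.0
print(D0(e1 + e2))         # 0.5, but additivity would require 1.0
```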
However, if V is finite-dimensional, and if for each ξ in a spanning set of
vectors the directional derivative D_ξF(α) exists and is a continuous function of α
on an open set A, then F is continuously differentiable on A. The proof of this
fact depends on the mean-value theorem, which we take up next, but we shall
not complete it until Section 9 (Theorem 9.3).
The reader will remember the mean-value theorem as a cornerstone of the
calculus, and this is just as true in our general theory. We shall apply it in the
next section to give the proof of the general form of the above-mentioned
theorem, and practically all of our more advanced work will depend on it. The
ordinary mean-value theorem does not have an exact analogue here. Instead we
shall prove a theorem that in the one-variable calculus is an easy consequence of
the mean-value theorem.
Theorem 7.3. Let f be a continuous function (parametrized arc) from a
closed interval [a, b] to a normed linear space, and suppose that f′(t) exists
and that ‖f′(t)‖ ≤ m for all t ∈ (a, b). Then ‖f(b) − f(a)‖ ≤ m(b − a).
Proof. Fix ε > 0, and let A be the set of points x ∈ [a, b] such that
‖f(x) − f(a)‖ ≤ (m + ε)(x − a) + ε.
A includes at least a small interval [a, c], because f is continuous at a. Set
l = lub A. Then ‖f(l) − f(a)‖ ≤ (m + ε)(l − a) + ε by the continuity of f at l.
Thus l ∈ A, and a < l ≤ b. We claim that l = b. For if l < b, then f′(l)
exists and ‖f′(l)‖ ≤ m. Therefore, there is a δ such that
‖[f(x) − f(l)]/(x − l)‖ < m + ε
when |x − l| ≤ δ. It follows that
‖f(l + δ) − f(a)‖ ≤ ‖f(l + δ) − f(l)‖ + ‖f(l) − f(a)‖
≤ (m + ε)δ + (m + ε)(l − a) + ε
= (m + ε)(l + δ − a) + ε,
so that l + δ ∈ A, a contradiction. Therefore, l = b. We thus have
‖f(b) − f(a)‖ ≤ (m + ε)(b − a) + ε,
and, since ε is arbitrary, ‖f(b) − f(a)‖ ≤ m(b − a). □
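For instance (our own check, not the book's), the arc f(t) = ⟨cos t, sin t⟩ has ‖f′(t)‖ = 1, so the theorem with m = 1 bounds the chord by b − a; note that the bound is generally strict, which is why the result is stated as an inequality rather than a mean-value equality.

```python
import numpy as np

f = lambda t: np.array([np.cos(t), np.sin(t)])   # ||f'(t)|| = 1, so m = 1
a, b = 0.0, 2.0
chord = np.linalg.norm(f(b) - f(a))              # 2*sin((b-a)/2) ≈ 1.683
print(chord, "<=", 1.0 * (b - a))                # bound m*(b - a) = 2.0 holds
```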
The following more general version of the mean-value theorem is the form in
which it is ordinarily applied. As usual, F and G are from a subset of V to W.
Theorem 7.4. If F is differentiable in the ball B_r(α), and if ‖dF_β‖ ≤ c for
every β in this ball, then ‖ΔF_β(ξ)‖ ≤ c‖ξ‖ whenever β and β + ξ are in the
ball. More generally, the same result holds if the ball B_r(α) is replaced by
any convex set C.
Proof. The segment from β to β + ξ is the range of the parametrized arc
λ(t) = β + tξ from [0, 1] to V. If β and β + ξ are in the ball B_r(α), then this
segment is a subset of the ball. Setting γ(t) = F(β + tξ), we then have γ′(x) =
dF_{β+xξ}(λ′(x)) = dF_{β+xξ}(ξ), from Theorem 7.2. Therefore, ‖γ′(x)‖ ≤ c‖ξ‖ on
[0, 1], and the mean-value theorem then implies that
‖ΔF_β(ξ)‖ = ‖F(β + ξ) − F(β)‖ = ‖γ(1) − γ(0)‖ ≤ c‖ξ‖(1 − 0) = c‖ξ‖,
which is the desired inequality. The only property of B_r(α) that we have used is
that it includes the line segment joining any two of its points. This is the
definition of convexity, and the theorem is therefore true for any convex set. □
Corollary. If G is differentiable on the convex set C, if T ∈ Hom(V, W),
and if ‖dG_β − T‖ ≤ c for all β in C, then ‖ΔG_β(ξ) − T(ξ)‖ ≤ c‖ξ‖ when-
ever β and β + ξ are in C.
Proof. Set F = G − T, and note that dF_β = dG_β − T and ΔF_β = ΔG_β − T. □
We end this section with a few words about notation. Notice the reversal
of the positions of the variables in the identity (D_ξF)(α) = dF_α(ξ). This differ-
ence has practical importance. We have a function of the two variables 'α'
and 'ξ' which we can convert to a function of one variable by holding the other
variable fixed; it is convenient technically to put the fixed variable in subscript
position. Thus we think of dF_α(ξ) with α held fixed and have the function dF_α
in Hom(V, W), whereas in (D_ξF)(α) we hold ξ fixed and have the directional
derivative D_ξF: A → W in the fixed direction ξ as a function of α, generalizing
the notation for any ordinary partial derivative ∂F/∂xᵢ(α) as a function of α.
We can also express this implication of the subscript position of a variable in the
dot notation (Section 0.10): when we write D_ξF(α), we are thinking of the value
at α of the function D_ξF(·).
Still a third notation that we shall use in later chapters puts the function
symbol in subscript position. We write
J_F(α) = dF_α.
This notation implies that the mapping F is going to be fixed through a discussion
and gets it "out of the way" by putting it in subscript position.
If F is differentiable at each point of the open set A, then we naturally con-
sider dF to be the map α ↦ dF_α from A to Hom(V, W). In the "J"-notation,
dF = J_F. Later in this chapter we are going to consider the differentiability
of this map at α. This notion of the second differential d²F_α = d(dF)_α is probably
confusing at first sight, and a preliminary look at it now may ease the later
discussion. We simply have a new map G = dF from an open set A in a normed
linear space V to a normed linear space X = Hom(V, W), and we consider its
differentiability at α. If dG_α = d(dF)_α exists, it is a linear map from V to
Hom(V, W), and there is something special now. Referring back to Theorem 6.1
of Chapter 1, we know that dG_α = d²F_α is equivalent by duality to a bilinear
mapping w from V × V to W: since dG_α(ξ) is itself a transformation in
Hom(V, W), we can evaluate it at η, and we define w by
w(ξ, η) = (dG_α(ξ))(η).
The dot notation may be helpful here. The mapping α ↦ dF_α is simply
dF(·), and we have defined G by G(·) = dF(·). Later, the fact that dG_α(ξ) is a
mapping can be emphasized by writing it as dG_α(ξ)(·). In each case here we
have a function of one variable, and the dot only reminds us of that fact and
shows us where we shall put the variable when indicating an evaluation. In the
case of w we have the original use of the dot, as in w(ξ, ·) = dG_α(ξ).
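A numerical preview of this duality (our own sketch, for the scalar function F(x, y) = x²y of our choosing): differencing the map α ↦ dF_α once more produces a bilinear w, which here is also symmetric.

```python
import numpy as np

# dF_a for F(x, y) = x**2 * y, written as the vector of partial derivatives.
dF = lambda a: np.array([2 * a[0] * a[1], a[0]**2])

def w(a, xi, eta, h=1e-6):
    # w(xi, eta) = (d(dF)_a(xi))(eta): difference dF along xi, evaluate at eta.
    return ((dF(a + h * xi) - dF(a - h * xi)) / (2 * h)) @ eta

a = np.array([1.0, 2.0])
xi, eta = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(w(a, xi, eta), w(a, eta, xi))    # both ≈ 2.0; w is bilinear and symmetric
```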
EXERCISES
7.1 Given f: ℝ → ℝ such that f′(a) exists, show that the "directional derivative"
D_bf(a) has the value bf′(a), by a direct evaluation of the limit of the difference quotient.
7.2 Let f be a real-valued function on an n-dimensional space V, and suppose that f is
differentiable at α ∈ V. Show that the directions ξ in which the derivative D_ξf(α) is
zero make up an (n − 1)-dimensional subspace of V (or the whole of V). What similar
conclusions can be drawn if f maps V to a two-dimensional space W?
7.3 a) Show by a direct argument on limits that if f and g are two functions from an
interval I ⊂ ℝ to a normed linear space V, and if f′(x) and g′(x) both exist, then
(f + g)′(x) exists and (f + g)′(x) = f′(x) + g′(x).
b) Prove the same result as a corollary of Theorems 7.1 and 7.2 and the differen-
tiation rules of Section 6.
7.4 a) Given f: I → V and g: I → W, show by a direct limit argument that if
f′(x) and g′(x) both exist, and if F = ⟨f, g⟩: I → V × W, then F′(x) exists and
F′(x) = ⟨f′(x), g′(x)⟩.
b) Prove the same result from Theorems 7.1 and 7.2 and the differentiation rules of
Section 6, using the exact relation F = θ₁ ∘ f + θ₂ ∘ g.
7.5 In the spirit of the above two exercises, state a product law for derivatives of
arcs and prove it as in the (b) proofs above.
7.6 Find the tangent vector to the arc ⟨e^t, sin t⟩ at t = 0; at t = π/2. [Apply
Exercise 7.4(a).] What is the differential of the above parametrized arc at these two
points? That is, if f(t) = ⟨e^t, sin t⟩, what are df₀ and df_{π/2}?
7.7 Let F: ℝ² → ℝ² be the mapping ⟨x, y⟩ ↦ ⟨3x²y, x²y³⟩. Compute the
directional derivative D_{⟨1,2⟩}F(3, −1)
a) as the tangent vector at ⟨3, −1⟩ to the arc F ∘ λ, where λ is the straight line
through ⟨3, −1⟩ in the direction ⟨1, 2⟩;
b) by first computing dF_{⟨3,−1⟩} and then evaluating at ⟨1, 2⟩.
7.8 Let λ and μ be any two linear functionals on a vector space V. Evaluate the
product f(ξ) = λ(ξ)μ(ξ) along the line ξ = tα, and hence compute D_αf(α). Now
evaluate f along the general line ξ = tα + β, and from it compute D_αf(β).
7.9 Work the above exercise by computing differentials.
7.10 If f: ℝⁿ → ℝ is differentiable at a, we know that its differential df_a, being a
linear functional on ℝⁿ, is given by its skeleton n-tuple l according to the formula
df_a(x) = (l, x) = Σᵢ₌₁ⁿ lᵢxᵢ.
In this context we call the n-tuple l the gradient of f at a. Show from the Schwarz
inequality (Exercise 2.3) that if we use vectors y of Euclidean length 1, then the
directional derivative D_yf(a) is maximum when y points in the direction of the gradient
of f.
7.11 Let W be a normed linear space, and let V be the set of parametrized arcs
λ: [−1, 1] → W such that λ(0) = 0 and λ′(0) exists. Show that V is a vector space and
that λ ↦ λ′(0) is a surjective linear mapping from V to W. Describe in words the
elements of the quotient space V/N, where N is the null space of the above map.
7.12 Find another homogeneous nonlinear function. Evaluate its directional deriva-
tives D_ξF(0), and show again that they do not make up a linear map.
7.13 Prove that if F is a differentiable mapping from an open ball B of a normed
linear space V to a normed linear space W such that dF_α = 0 for every α in B, then F
is a constant function.
7.14 Generalize the above exercise to the case where the domain of F is an open set A
with the property that any two points of A can be joined by a smooth arc lying in A.
Show by a counterexample that the result does not generalize to arbitrary open sets A
as the domain of F.
7.15 Prove the following generalization of the mean-value theorem. Let f be a con-
tinuous mapping from the closed interval [a, b] to a normed linear space V, and let g
be a continuous real-valued function on [a, b]. Suppose that f′(t) and g′(t) both exist
at all points of the open interval (a, b) and that ‖f′(t)‖ ≤ g′(t) on (a, b). Then
‖f(b) − f(a)‖ ≤ g(b) − g(a).
[Consider the points x such that ‖f(x) − f(a)‖ ≤ g(x) − g(a) + ε(x − a) + ε.]
8. THE DIFFERENTIAL AND PRODUCT SPACES
In this section we shall relate the differentiation rules to the special configurations
resulting from the expression of a vector space as a finite Cartesian product.
When dealing with the range, this is a trivial consideration, but when the domain
is a product space, we become involved with a deeper theorem. These general
product considerations will be specialized to the ℝⁿ-spaces in the next section,
but they also have a more general usefulness, as we shall see in the later sections
of this chapter and in later chapters.
We know that an m-tuple of functions on a common domain, Fᵢ: A → Wᵢ,
i = 1, …, m, is equivalent to a single m-tuple-valued function
F: A → W = ∏₁ᵐ Wᵢ,
F(α) being the m-tuple {Fᵢ(α)}₁ᵐ for each α ∈ A. We now check the obviously
necessary fact that F is differentiable at α if and only if each Fᵢ is differentiable
at α.
Theorem 8.1. Given Fᵢ: A → Wᵢ, i = 1, …, m, and F = ⟨F₁, …, Fₘ⟩,
then F is differentiable at α if and only if all the functions Fᵢ are, in which
case dF_α = ⟨dF_α¹, …, dF_αᵐ⟩.
Proof. Strictly speaking, F = Σ₁ᵐ θᵢ ∘ Fᵢ, where θⱼ is the injection of Wⱼ into
the product space W = ∏₁ᵐ Wᵢ (see Section 1.3). Since each θᵢ is linear and
hence differentiable, with d(θᵢ)_α = θᵢ, we see that if each Fᵢ is differentiable at α,
then so is F, and dF_α = Σ₁ᵐ θᵢ ∘ dF_α^i. Less exactly, this is the statement
dF_α = ⟨dF_α¹, …, dF_αᵐ⟩. The converse follows similarly from Fᵢ = πᵢ ∘ F,
where πⱼ is the projection of ∏₁ᵐ Wᵢ onto Wⱼ. □
Theorems 7.1 and 8.1 have the following obvious corollary (which can also
be proved as easily by a direct inspection of the limits involved).
Lemma 8.1. If fᵢ is an arc from [a, b] to Wᵢ, for i = 1, …, n, and if f is
the n-tuple-valued arc f = ⟨f₁, …, fₙ⟩, then f′(x) exists if and only if
fᵢ′(x) exists for each i, in which case f′(x) = ⟨f₁′(x), …, fₙ′(x)⟩.
When the domain space V is a product space ∏₁ⁿ Vⱼ the situation is more
complicated. A function F(ξ₁, …, ξₙ) of n vector variables does not decompose
into an equivalent n-tuple of functions. Moreover, although its differential
dF_α does decompose into an equivalent n-tuple of partial differentials {dF_α^i},
we do not have the simple theorem that dF_α exists if and only if the partial
differentials dF_α^i all exist.
Of course, we regard a function F(ξ₁, …, ξₙ) of n vector variables as being
a function of the single n-tuple variable ξ = ⟨ξ₁, …, ξₙ⟩, so that in principle
there is nothing new when we consider the differentiability of F. However, when
we consider a composition F ∘ G, the inner function G must now be an n-tuple-
valued function G = ⟨g¹, …, gⁿ⟩, where gⁱ is from an open subset A of some
normed linear space X to Vᵢ, and we naturally try to express the differential
of F ∘ G in terms of the differentials dgⁱ. To accomplish this we need the partial
differentials dF_α^j of F. For the moment we shall define the jth partial differential
of F at α = ⟨α₁, …, αₙ⟩ as the restriction of the differential dF_α to Vⱼ,
considered as a subspace of V = ∏₁ⁿ Vᵢ. As usual, this really involves the
injection θⱼ of Vⱼ into ∏₁ⁿ Vᵢ, and our formal (temporary) definition, accordingly,
is
dF_α^j = dF_α ∘ θⱼ.
Then, since ξ = ⟨ξ₁, …, ξₙ⟩ = Σ₁ⁿ θᵢ(ξᵢ), we have
dF_α(ξ) = Σ₁ⁿ dF_α^i(ξᵢ).
Similarly, since G = ⟨g¹, …, gⁿ⟩ = Σ₁ⁿ θᵢ ∘ gⁱ, we have
d(F ∘ G)_γ = Σ₁ⁿ dF_{G(γ)}^i ∘ dg_γ^i,
which we shall call the general chain rule. There is ambiguity in the "i"-super-
scripts in this formula: to be more proper we should write (dF)_{G(γ)}^i and d(gⁱ)_γ.
We shall now work around to the real definition of a partial differential.
Since
ΔF_α ∘ θⱼ = (dF_α + o) ∘ θⱼ = dF_α ∘ θⱼ + o = dF_α^j + o,
we see that dF_α^j can be directly characterized, independently of dF_α, as follows:
dF_α^j is the unique element Tⱼ of Hom(Vⱼ, W) such that ΔF_α ∘ θⱼ = Tⱼ + o.
That is, dF_α^j is the differential at αⱼ of the function of the one variable ξⱼ
obtained by holding the other variables in F(ξ₁, …, ξₙ) fixed at the values
ξᵢ = αᵢ, i ≠ j. This is important because in practice it is often such partial differen-
tiability that we come upon as the primary phenomenon. We shall therefore
take this direct characterization as our definition of dF_α^j, after which our moti-
vating calculation above is the proof of the following lemma.
Lemma 8.2. If A is an open subset of a product space V = ∏₁ⁿ Vᵢ, and if
F: A → W is differentiable at α, then all the partial differentials dF_α^i exist
and dF_α^i = dF_α ∘ θᵢ.
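Concretely (our own numerical sketch, with an F of our choosing on ℝ² × ℝ): each partial differential is obtained by freezing the other variable, and their sum recovers the full differential, as in the displayed formula above.

```python
import numpy as np

F = lambda xi, eta: (xi @ xi) * eta        # F: R^2 x R -> R, illustrative

a, b, h = np.array([1.0, 2.0]), 3.0, 1e-6

# dF^1 at (a, b): differential of xi -> F(xi, b), found by freezing eta = b.
dF1 = np.array([(F(a + h*e, b) - F(a - h*e, b)) / (2*h) for e in np.eye(2)])
# dF^2 at (a, b): differential of eta -> F(a, eta), found by freezing xi = a.
dF2 = (F(a, b + h) - F(a, b - h)) / (2*h)

xi, eta = np.array([0.01, -0.02]), 0.005
print(dF1 @ xi + dF2 * eta)                # dF_(a,b)(xi, eta) ≈ -0.155
print(F(a + xi, b + eta) - F(a, b))        # ΔF ≈ -0.154: equal up to o-terms
```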
The question then occurs as to whether the existence of all the partial
differentials dF_α^i implies the existence of dF_α. The answer in general is negative,
as we shall see in the next section, but if all the partial differentials dF_α^i exist for
each α in an open set A and are continuous functions of α, then F is continuously
differentiable on A. Note that Lemma 8.2 and the projection-injection identities
show us what dF_α must be if it exists: dF_α^i = dF_α ∘ θᵢ and Σ θᵢ ∘ πᵢ = I together
imply that dF_α = Σ dF_α^i ∘ πᵢ.
Theorem 8.2. Let A be an open subset of the normed linear space
V = V₁ × V₂, and suppose that F: A → W has continuous partial differ-
entials dF¹_{⟨α,β⟩} and dF²_{⟨α,β⟩} on A. Then dF_{⟨α,β⟩} exists and is continuous
on A, and dF_{⟨α,β⟩}(ξ, η) = dF¹_{⟨α,β⟩}(ξ) + dF²_{⟨α,β⟩}(η).
Proof. We shall use the sum norm on V = V₁ × V₂. Given ε, we choose δ so
that ‖dFⁱ_{⟨μ,ν⟩} − dFⁱ_{⟨α,β⟩}‖ < ε for every ⟨μ, ν⟩ in the δ-ball about ⟨α, β⟩
and for i = 1, 2. Setting
G(ζ) = F(α + ζ, β + η) − dF¹_{⟨α,β⟩}(ζ),
we have
dG_ζ = dF¹_{⟨α+ζ, β+η⟩} − dF¹_{⟨α,β⟩},
and the corollary of Theorem 7.4 implies that
‖F(α + ξ, β + η) − F(α, β + η) − dF¹_{⟨α,β⟩}(ξ)‖ ≤ ε‖ξ‖
when ‖⟨ξ, η⟩‖ < δ. Arguing similarly with
H(η) = F(α, β + η) − dF²_{⟨α,β⟩}(η),
we find that
‖F(α, β + η) − F(α, β) − dF²_{⟨α,β⟩}(η)‖ ≤ ε‖η‖
when ‖⟨0, η⟩‖ < δ. Combining the two inequalities, we have
‖ΔF_{⟨α,β⟩}(ξ, η) − T(ξ, η)‖ ≤ ε(‖ξ‖ + ‖η‖) = ε‖⟨ξ, η⟩‖
when ‖⟨ξ, η⟩‖ < δ, where T = dF¹_{⟨α,β⟩} ∘ π₁ + dF²_{⟨α,β⟩} ∘ π₂. That is,
ΔF_{⟨α,β⟩} − T = o, and so dF_{⟨α,β⟩} exists and equals T. □
The theorem for more than two factor spaces is a corollary.
Theorem 8.3. If A is an open subset of ∏₁ⁿ Vᵢ and F: A → W is such that for
each i = 1, …, n the partial differential dF_α^i exists for all α ∈ A and is
continuous as a function of α = ⟨α₁, …, αₙ⟩, then dF_α exists and is
continuous on A. If ξ = ⟨ξ₁, …, ξₙ⟩, then dF_α(ξ) = Σ₁ⁿ dF_α^i(ξᵢ).
Proof. The existence and continuity of dF_α¹ and dF_α² imply by the theorem that
dF_α¹ ∘ π₁ + dF_α² ∘ π₂ is the differential of F considered as a function of the first
two variables when the others are held fixed. Since it is the sum of continuous
functions, it is itself continuous in α, and we can now apply the theorem again
to add dF_α³ to this sum partial differential, concluding that Σ₁³ dF_α^i ∘ πᵢ is the
partial differential of F on the factor space V₁ × V₂ × V₃, and so on (which is
colloquial for induction). □
As an illustration of the use of these two theorems, we shall deduce the
general product rule (although a direct proof based on Δ-estimates is perfectly
feasible). A general product is simply a bounded bilinear mapping w: X × Y → W,
where X, Y, and W are all normed linear spaces. The boundedness inequality
here is ‖w(ξ, η)‖ ≤ b‖ξ‖ ‖η‖.
We first show that w is differentiable.
Lemma 8.3. A bounded bilinear mapping w: X × Y → W is everywhere
differentiable and dw_{⟨α,β⟩}(ξ, η) = w(α, η) + w(ξ, β).
Proof. With β held fixed, g_β(ξ) = w(ξ, β) is in Hom(X, W) and therefore is
everywhere differentiable and equal to its own differential. That is, dw¹ exists
and dw¹_{⟨α,β⟩}(ξ) = w(ξ, β). Since β ↦ g_β is a bounded linear mapping,
dw¹_{⟨α,β⟩} = g_β is a continuous function of ⟨α, β⟩. Similarly, dw²_{⟨α,β⟩}(η) =
w(α, η), and dw² is continuous. The lemma is now a direct corollary of Theorem
8.2. □
If w(ξ, η) is thought of as a product of ξ and η, then the product of two
functions g(ζ) and h(ζ) is w(g(ζ), h(ζ)), where g is from an open subset A of a
normed linear space V to X and h is from A to Y. The product rule is now just
what would be expected: the differential of the product is the first times the
differential of the second plus the second times the differential of the first.
Theorem 8.4. If g: A → X and h: A → Y are differentiable at β, then so
is the product F(ζ) = w(g(ζ), h(ζ)) and
dF_β(ζ) = w(g(β), dh_β(ζ)) + w(dg_β(ζ), h(β)).
Proof. This is a direct corollary of Theorem 8.1, Lemma 8.3, and the chain
rule. □
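A numerical instance of Theorem 8.4 (our own sketch, not the book's), with w the dot product on ℝ² and g, h two curves of our choosing:

```python
import numpy as np

g = lambda t: np.array([np.cos(t), t**2])      # g: R -> R^2
h = lambda t: np.array([t, np.exp(t)])         # h: R -> R^2
w = lambda x, y: x @ y                         # bounded bilinear: dot product
F = lambda t: w(g(t), h(t))                    # the "product" of g and h

def d(fn, t, s=1e-6):
    return (fn(t + s) - fn(t - s)) / (2 * s)   # derivative by central difference

t0 = 0.7
print(d(F, t0))                                # dF directly
print(w(g(t0), d(h, t0)) + w(d(g, t0), h(t0))) # w(g, dh) + w(dg, h): same value
```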
EXERCISES
8.1 Find the tangent vector to the arc ⟨sin t, cos t, t²⟩ at t = 0; at t = π/2.
What is the differential of the above parametrized arc at the two given points? That is,
if f(t) = ⟨sin t, cos t, t²⟩, what are df₀ and df_{π/2}?
8.2 Give the detailed proof of Lemma 8.1.
8.3 The formula
dF_α(ξ) = Σ₁ⁿ dF_α^i(ξᵢ)
is probably obvious in view of the identity
ξ = Σ₁ⁿ θᵢ(ξᵢ)
and the definition of partial differentials, but write out an explicit, detailed proof
anyway.
8.4 Let F be a differentiable mapping from an n-dimensional vector space V to a
finite-dimensional vector space W, and define G: V × W → W by G(ξ, η) = η − F(ξ).
Thus the graph of F in V × W is the null set of G. Show that the null space of
dG_{⟨α,β⟩} has dimension n for every ⟨α, β⟩ ∈ V × W.
8.5 Let F(ξ, η) be a continuously differentiable function defined on a product
A × B, where B is a ball and A is an open set. Suppose that dF²_{⟨α,β⟩} = 0 for all
⟨α, β⟩ in A × B. Prove that F is independent of η. That is, show that there is a
continuously differentiable function G(ξ) defined on A such that F(ξ, η) = G(ξ) on
A × B.
8.6 By considering a domain in ℝ² as indicated at the right, show
that there exists a function f(x, y) on an open set A in ℝ² such that ∂f/∂y = 0
everywhere and such that f(x, y) is not a function of x alone.
8.7 Let F(ξ, η, ζ) be any function of three vector variables, and for fixed γ set
G(ξ, η) = F(ξ, η, γ). Prove that the partial differential dF¹_{⟨α,β,γ⟩} exists if and only
if dG¹_{⟨α,β⟩} exists, in which case they are equal.
8.8 Give a more careful proof of Theorem 8.3. That is, state the inductive hypothesis
and show that the theorem follows from it and Theorem 8.2. If you are meticulous in
your argument, you will need a form of the above exercise.
8.9 Let f be a differentiable mapping from ℝ² to ℝ. Regarding ℝ² as ℝ × ℝ, show
that the two partial differentials of f are simply multiplication by its partial derivatives.
Generalize to n dimensions. Show that the above is still true for a map F from ℝ² to a
general vector space V, the partial derivatives now being vectors.
8.10 Give the details of the proof of Theorem 8.4.
9. THE DIFFERENTIAL AND ℝⁿ
We shall now apply the results of the last two sections to mappings involving the
Cartesian spaces ℝⁿ, the bread and butter spaces of finite-dimensional theory.
We start with the domain.
Theorem 9.1. If F is a mapping from (an open subset of) ℝⁿ to a normed
linear space W, then the directional derivative of F in the direction of the jth
standard basis vector δʲ is just the partial derivative ∂F/∂xⱼ, and the jth
partial differential is multiplication by ∂F/∂xⱼ: dF_a^j(h) = h(∂F/∂xⱼ)(a).
More exactly, if any one of the above three objects exists at a, then they
all do, with the above relationships.
Proof. We have
(∂F/∂xⱼ)(a) = lim_{t→0} [F(a₁, …, aⱼ + t, …, aₙ) − F(a₁, …, aⱼ, …, aₙ)]/t
= lim_{t→0} [F(a + tδʲ) − F(a)]/t = D_{δʲ}F(a).
Moreover, since the restriction of F to a + ℝδʲ is a parametrized arc whose
differential at 0 is by definition the jth partial differential of F at a and whose
tangent vector at 0 we have just computed to be (∂F/∂xⱼ)(a), the remainder of
the theorem follows from Theorem 7.1. □
Combining this theorem and Theorem 7.2, we obtain the following result.
Theorem 9.2. If V = ℝⁿ and F is differentiable at a, then the partial
derivatives (∂F/∂xⱼ)(a) all exist and the n-tuple of partial derivatives at
a, {(∂F/∂xⱼ)(a)}₁ⁿ, is the skeleton of dF_a. In particular,
D_yF(a) = Σⱼ₌₁ⁿ yⱼ (∂F/∂xⱼ)(a).
Proof. Since dF_a(δⁱ) = D_{δⁱ}F(a) = (∂F/∂xᵢ)(a), as we noted above, we have
D_yF(a) = dF_a(y) = dF_a(Σ₁ⁿ yᵢδⁱ) = Σ₁ⁿ yᵢ dF_a(δⁱ) = Σ₁ⁿ yᵢ (∂F/∂xᵢ)(a).
All that we have done here is to display dF_a as the linear combination mapping
defined by its skeleton {dF_a(δⁱ)} (see Theorem 1.2 of Chapter 1), where T(δⁱ) =
dF_a(δⁱ) is now recognized as the partial derivative (∂F/∂xᵢ)(a). □
The above formula shows the barbarism of the classical notation for partial
derivatives: note how it comes out if we try to evaluate dF_a(x). The notation
D_{δⁱ}F is precise but cumbersome. Other notations are F_j and D_jF. Each has its
problems, but the second probably minimizes the difficulties. Using it, our
formula reads dF_a(y) = Σⱼ₌₁ⁿ yⱼ D_jF(a).
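With the D_j notation the formula is easy to test numerically (our own sketch, for a scalar F of our choosing): the weighted sum of partial derivatives reproduces the difference quotient along y.

```python
import numpy as np

F = lambda x: x[0]**2 * x[1] + np.sin(x[2])    # F: R^3 -> R, illustrative
a = np.array([1.0, 2.0, 0.5])
y = np.array([3.0, -1.0, 2.0])
h = 1e-6

DjF = np.array([(F(a + h*e) - F(a - h*e)) / (2*h) for e in np.eye(3)])
along_y = (F(a + h*y) - F(a - h*y)) / (2*h)    # difference quotient along y
print(DjF @ y, along_y)                         # the two values agree
```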
In the opposite direction we have the corresponding specialization of
Theorem 8.3.
Theorem 9.3. If A is an open subset of ℝⁿ, and if F is a mapping from A
to a normed linear space W such that all of the partial derivatives
(∂F/∂xⱼ)(a) exist and are continuous on A, then F is continuously differ-
entiable on A.
Proof. Since the jth partial differential of F is simply multiplication by ∂F/∂xⱼ,
we are (by Theorem 9.1) assuming the existence and continuity of all the partial
differentials dF_a^j on A. Theorem 9.3 thus becomes a special case of Theorem 8.3. □
Now suppose that the range space of F is also a Cartesian space, so that F
is a mapping from an open subset A of ℝⁿ to ℝᵐ. Then dF_a is in Hom(ℝⁿ, ℝᵐ).
For computational purposes we want to represent linear maps from ℝⁿ to ℝᵐ
by their matrices, and it is therefore of the utmost importance to find the matrix
t of the differential T = dF_a. This matrix is called the Jacobian matrix of F
at a.
The columns of t form the skeleton of dF_a, and we saw above that this
skeleton is the n-tuple of partial derivatives (∂F/∂xⱼ)(a). If we write the m-
tuple-valued F loosely as an m-tuple of functions, F = ⟨f₁, …, fₘ⟩, then
according to Lemma 8.1, the jth column of t is the m-tuple
(∂F/∂xⱼ)(a) = ⟨(∂f₁/∂xⱼ)(a), …, (∂fₘ/∂xⱼ)(a)⟩.
Thus,
Theorem 9.4. Let F be a mapping from an open subset of ℝⁿ to ℝᵐ, and
suppose that F is differentiable at a. Then the matrix of dF_a (the Jacobian
matrix of F at a) is given by
tᵢⱼ = (∂fᵢ/∂xⱼ)(a).
If we use the notation yᵢ = fᵢ(x), we have
tᵢⱼ = (∂yᵢ/∂xⱼ)(a).
If we also have a differentiable map z = G(y) = ⟨g₁(y), …, gₗ(y)⟩ from
an open set B ⊂ ℝᵐ into ℝˡ, then dG_b has, similarly, the matrix
(∂gₖ/∂yᵢ)(b) = (∂zₖ/∂yᵢ)(b).
Also, if B contains b = F(a), then the composite-function rule
d(G ∘ F)_a = dG_{F(a)} ∘ dF_a
has the matrix form
(∂zₖ/∂xⱼ)(a) = Σᵢ₌₁ᵐ (∂zₖ/∂yᵢ)(b)(∂yᵢ/∂xⱼ)(a),
or simply
∂zₖ/∂xⱼ = Σᵢ₌₁ᵐ (∂zₖ/∂yᵢ)(∂yᵢ/∂xⱼ).
This is the usual form of the chain rule in the calculus. We see that it is merely
the expression of the composition of linear maps as matrix multiplication.
We saw in Section 8 that the ordinary derivative f′(a) of a function f of one
real variable is the skeleton of the differential df_a, and it is perfectly reasonable to
generalize this relationship and define the derivative F′(a) of a function F of
n real variables to be the skeleton of dF_a, so that F′(a) is the n-tuple of partial
derivatives {(∂F/∂xᵢ)(a)}₁ⁿ, as we saw above. In particular, if F is from an open
subset of ℝⁿ to ℝᵐ, then F′(a) is the Jacobian matrix of F at a. This gives the
matrix chain rule the standard form
(G ∘ F)′(a) = G′(F(a))F′(a).
Some authors use the word 'derivative' for what we have called the differ-
ential, but this is a change from the traditional meaning in the one-variable case,
and we prefer to maintain the distinction as discussed above: the differential dF_a
is the linear map approximating ΔF_a, and the derivative F′(a) must be the
matrix of this linear map when the domain and range spaces are Cartesian.
However, we shall stay with the language of Jacobians.
Suppose now that A is an open subset of a finite-dimensional vector space V
and that H: A → W is differentiable at α ∈ A. Suppose that W is also finite-
dimensional and that φ: V → ℝⁿ and ψ: W → ℝᵐ are any coordinate isomor-
phisms. If Ā = φ[A], then Ā is an open subset of ℝⁿ and H̄ = ψ ∘ H ∘ φ⁻¹ is a
mapping from Ā to ℝᵐ which is differentiable at ā = φ(α), with dH̄_ā =
ψ ∘ dH_α ∘ φ⁻¹. Then dH̄_ā is given by its Jacobian matrix {(∂h̄ᵢ/∂xⱼ)(ā)}, which
we now call the Jacobian matrix of H with respect to the chosen bases in V and
W. Change of bases in V and W changes the Jacobian matrix according to the
rule given in Section 2.4.
If F is a mapping from ℝⁿ to itself, then the determinant of the Jacobian
matrix (∂fᵢ/∂xⱼ)(a) is called the Jacobian of F at a. It is designated
∂(f₁, …, fₙ)/∂(x₁, …, xₙ)  or  ∂(y₁, …, yₙ)/∂(x₁, …, xₙ)
if it is understood that yᵢ = fᵢ(x). Another notation is J_F(a) (or simply J(a) if
F is understood). However, this is sometimes used to indicate the differential
dF_a, and we shall write det J_F(a) instead.
If F(x) = ⟨x₁² − x₂², 2x₁x₂⟩, then its Jacobian matrix is
[ 2x₁  −2x₂ ]
[ 2x₂   2x₁ ].
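Checking this matrix numerically (our own sketch), together with the Jacobian determinant det J_F(a) = 4(a₁² + a₂²):

```python
import numpy as np

F = lambda x: np.array([x[0]**2 - x[1]**2, 2 * x[0] * x[1]])

def jacobian(F, a, h=1e-6):
    # Finite-difference Jacobian matrix, one column per coordinate direction.
    return np.column_stack([(F(a + h*e) - F(a - h*e)) / (2*h)
                            for e in np.eye(a.size)])

a = np.array([3.0, 1.0])
J = jacobian(F, a)
print(J)                  # ≈ [[ 6. -2.], [ 2.  6.]] = [[2a1, -2a2], [2a2, 2a1]]
print(np.linalg.det(J))   # ≈ 40.0 = 4*(a1**2 + a2**2)
```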
EXERCISES
9.1 By analogy with the notion of a parametrized arc, we define a smooth param-
etrized two-dimensional surface in a normed linear space W to be a continuously
differentiable map Γ from a rectangle I × J in ℝ² to W. Suppose that I × J =
[−1, 1] × [−1, 1], and invent a definition of the tangent space to the range of Γ in W
at the point Γ(0, 0). Show that the two vectors
(∂Γ/∂x)(0, 0)  and  (∂Γ/∂y)(0, 0)
are a basis for this tangent space. (This should not have been your definition.)
9.2 Generalize the above exercise to a smooth parametrized n-dimensional surface
in a normed linear space W.
9.3 Compute the Jacobian matrix of the mapping ⟨x, y⟩ ↦ ⟨x², y², (x + y)²⟩.
Show that its rank is two except at the origin.
9.4 Let F = ⟨f₁, f₂, f₃⟩ from ℝ³ to ℝ³ be defined by
f₁(x, y, z) = x + y + z,  f₂(x, y, z) = x² + y² + z²,  f₃(x, y, z) = x³ + y³ + z³.
Compute the Jacobian of F at ⟨a, b, c⟩. Show that it is nonsingular unless two of the
three coordinates are equal. Describe the locus of its singularities.
9.5 Compute the Jacobian of the mapping F: ⟨x, y⟩ ↦ ⟨(x + y)², y³⟩ from
ℝ² to ℝ² at ⟨1, −1⟩; at ⟨1, 0⟩; at ⟨a, b⟩. Compute the Jacobian of G: ⟨s, t⟩ ↦
⟨s − t, s + t⟩ at ⟨u, v⟩.
9.6 In the above exercise compute the compositions F ∘ G and G ∘ F. Compute the
Jacobian of F ∘ G at ⟨u, v⟩. Compute the corresponding product of the Jacobians
of F and G.
9.7 Compute the Jacobian matrix and determinant of the mapping T defined by
x = r cos θ, y = r sin θ, z = z. Composing a function f(x, y, z) with this mapping
gives a new function:
g(r, θ, z) = f(r cos θ, r sin θ, z).
That is, g = f ∘ T. This composition (substitution) is called the change to cylindrical
coordinates in ℝ³.
9.8 Compute the Jacobian determinant of the polar coordinate transformation
⟨r, θ⟩ ↦ ⟨x, y⟩, where x = r cos θ, y = r sin θ.
9.9 The transformation to spherical coordinates is given by x = r sin φ cos θ,
y = r sin φ sin θ, z = r cos φ. Compute the Jacobian
∂(x, y, z)/∂(r, φ, θ).
9.10 Write out the chain rule for the following special cases:
dw/dt = ?, where w = F(x, y), x = g(t), y = h(t).
Find dw/dt when w = F(x₁, …, xₙ) and xᵢ = gᵢ(t), i = 1, …, n. Find ∂w/∂u when
w = F(x, y), x = g(u, v), y = h(u, v). The special case where g(u, v) = u can be
rewritten
(∂/∂x) F(x, h(x, v)).
Compute it.
9.11 If w = f(x, y), x = r cos θ, and y = r sin θ, show that
[∂w/∂r]² + [(1/r) ∂w/∂θ]² = [∂w/∂x]² + [∂w/∂y]².
10. ELEMENTARY APPLICATIONS
The elementary max-min theory from the standard calculus generalizes with
little change, and we include a brief discussion of it at this point.
Theorem 10.1. Let F be a real-valued function defined on an open subset A
of a normed linear space V, and suppose that F assumes a relative maximum
value at a point α in A where dF_α exists. Then dF_α = 0.
Proof. By definition D_ξF(α) is the derivative γ′(0) of the function γ(t) =
F(α + tξ), and the domain of γ is a neighborhood of 0 in ℝ. Since γ has a relative
maximum value at 0, we have γ′(0) = 0 by the elementary calculus. Thus
dF_α(ξ) = D_ξF(α) = 0 for all ξ, and so dF_α = 0. □
A point α such that dF_α = 0 is called a critical point. The theorem states
that a differentiable real-valued function can have an interior extremal value
only at a critical point.
If V is ℝⁿ, then the above argument shows that a real-valued function F
can have a relative maximum (or minimum) at a only if the partial derivatives
(∂F/∂xᵢ)(a) are all zero, and, as in the elementary calculus, this often provides
a way of calculating maximum (or minimum) values. Suppose, for example,
that we want to show that the cube is the most efficient rectangular parallelepiped
from the point of view of minimizing surface area for a given volume V. If
the edges are x, y, and z, we have V = xyz and A = 2(xy + xz + yz) =
2(xy + V/y + V/x). Then from 0 = ∂A/∂x = 2(y − V/x²), we see that
V = yx², and, similarly, ∂A/∂y = 0 implies that V = xy². Therefore, yx² =
xy², and since neither x nor y can be 0, it follows that x = y. Then V = yx² =
x³, and x = V^{1/3} = y. Finally, substituting in V = xyz shows that z = V^{1/3}.
Our critical configuration is thus a cube, with minimum area A = 6V^{2/3}.
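A numerical check of this computation (our own sketch, taking V = 1, so the critical edge is x = y = 1): the finite-difference partials vanish at (1, 1), A(1, 1) = 6 = 6V^{2/3}, and nearby configurations have larger area.

```python
V = 1.0
A = lambda x, y: 2 * (x*y + V/y + V/x)   # area with z = V/(x*y) eliminated
h = 1e-6

dAdx = (A(1 + h, 1) - A(1 - h, 1)) / (2*h)
dAdy = (A(1, 1 + h) - A(1, 1 - h)) / (2*h)
print(dAdx, dAdy)                        # both ≈ 0: (1, 1) is critical
print(A(1.0, 1.0))                       # 6.0, the minimum area 6*V**(2/3)
print(A(1.1, 0.9), A(0.8, 1.2))          # 6.02..., 6.08...: larger nearby
```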
It was assumed above that A has an absolute minimum at some point
⟨x, y, z⟩. The reader might enjoy showing that A → ∞ if any of x, y, z tends
to 0 or ∞, which implies that the minimum does indeed exist.
We shall return to the problem of determining critical points in Sections
12, 15, and 16.
The condition dF_α = 0 is necessary but not sufficient for an interior maxi-
mum or minimum. The reader will remember a sufficient condition from
beginning calculus: If f′(x) = 0 and f″(x) < 0 (> 0), then x is a relative maxi-
mum (minimum) point for f. We shall prove the corresponding general theorem
in Section 16. There are more possibilities now; among them we have the
analogous sufficient condition that if dF_α = 0 and d²F_α is negative (positive)
definite as a quadratic form on V, then α is a relative maximum (minimum)
point of F.
We consider next the notion of a tangent plane to a graph. The calculation
of tangent lines to curves and tangent planes to surfaces is ordinarily considered
a geometric application of the derivative, and we take this as sufficient justifica-
tion for considering the general question here.
Let F be a mapping from an open subset A of a normed linear space V to
a normed linear space W. When we view F as a graph in V × W, we think of it
as a "surface" S lying "over" the domain A, generalizing the geometric interpre-
tation of the graph of a real-valued function of two real variables in ℝ³ = ℝ² × ℝ.
The projection π₁: V × W → V projects S "down" onto A,
⟨ξ, F(ξ)⟩ ↦ ξ,
and the mapping ξ ↦ ⟨ξ, F(ξ)⟩ gives the point of S lying "over" ξ. Our
geometric imagery views V as the plane (subspace) V × {0} in V × W, just as
we customarily visualize ℝ as the real axis ℝ × {0} in ℝ².
We now assume that F is differentiable at a. Our preliminary discussion in
Section 6 suggested that (the graph of) the linear function dF_a is the tangent
plane to (the graph of) the function ΔF_a in V × W, and that its translate M
through ⟨a, F(a)⟩ is the tangent plane at ⟨a, F(a)⟩ to the surface S that is
(the graph of) F. The equation of this plane is η − F(a) = dF_a(ξ − a), and
it is accordingly (the graph of) the affine function G(ξ) = dF_a(ξ − a) + F(a).
Now we know that dF_a is the unique T in Hom(V, W) such that ΔF_a(ζ) =
T(ζ) + o(ζ), and if we set ζ = ξ − a, it is easy to see that this is the same as
saying that G is the unique affine map from V to W such that
F(ξ) − G(ξ) = o(ξ − a).
That is, M is the unique plane over V that "fits" the surface S around ⟨a, F(a)⟩
in the sense of o-approximation.
However, there is one further geometric fact that greatly strengthens our
feeling that this really is the tangent plane.
Theorem 10.2. The plane with equation η − F(a) = dF_a(ξ − a) is
exactly the union of all the straight lines through ⟨a, F(a)⟩ in V × W
that are tangent to smooth curves on the surface S = graph F passing
through this point. In other words, the vectors in the subspace dF_a of
V × W are exactly the tangent vectors to curves lying in S and passing
through ⟨a, F(a)⟩.
Proof. This is nearly trivial. If ⟨ξ, η⟩ ∈ dF_a, then the arc
γ(t) = ⟨a + tξ, F(a + tξ)⟩
in S lying over the line t ↦ a + tξ in V has ⟨ξ, dF_a(ξ)⟩ = ⟨ξ, η⟩ as its
tangent vector at ⟨a, F(a)⟩, by Lemma 8.1 and Theorem 8.2.
Conversely, if t ↦ ⟨λ(t), F(λ(t))⟩ is any smooth arc in S passing through a,
with a = λ(t₀), then its tangent vector at ⟨a, F(a)⟩ is
⟨λ′(t₀), dF_a(λ′(t₀))⟩,
a vector in (the graph of) dF_a. □
As an example of the general tangent plane discussed above, let F =
⟨f₁, f₂⟩ be the map from ℝ² to ℝ² defined by f₁(x) = (x₁² − x₂²)/2, f₂(x) =
x₁x₂. The graph of F is a surface over ℝ² in ℝ⁴ = ℝ² × ℝ². According to our
above discussion, the tangent plane at ⟨a, F(a)⟩ has the equation y =
dF_a(x − a) + F(a). At a = ⟨1, 2⟩ the Jacobian matrix of dF_a is

[ 1  −2 ]
[ 2   1 ],

and F(a) = ⟨−3/2, 2⟩. The equation of the tangent plane M at ⟨1, 2⟩ is thus

⟨y₁, y₂⟩ = [ 1  −2 ] ⟨x₁ − 1, x₂ − 2⟩ + ⟨−3/2, 2⟩.
           [ 2   1 ]

Computing the matrix product, we have the scalar equations

y₁ = x₁ − 2x₂ + (−1 + 4 − 3/2) = x₁ − 2x₂ + 3/2,
y₂ = 2x₁ + x₂ + (−2 − 2 + 2) = 2x₁ + x₂ − 2.

Note that these two equations present the affine space M as the intersection
of the hyperplane in ℝ⁴ consisting of all ⟨x₁, x₂, y₁, y₂⟩ such that

x₁ − 2x₂ − y₁ = −3/2,

with the hyperplane having the equation

2x₁ + x₂ − y₂ = 2.
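
The following numerical check is not part of the original text; it is a sketch,
assuming the Python library numpy, that compares F near a with the affine
function dF_a(x − a) + F(a) and exhibits the o(x − a) discrepancy.

import numpy as np

def F(x):
    # F = <f1, f2> with f1 = (x1^2 - x2^2)/2 and f2 = x1*x2
    return np.array([(x[0]**2 - x[1]**2) / 2, x[0] * x[1]])

a = np.array([1.0, 2.0])
J = np.array([[1.0, -2.0],          # Jacobian matrix of F at a = <1, 2>
              [2.0,  1.0]])

for eps in (1e-1, 1e-2, 1e-3):
    x = a + np.array([eps, 2 * eps])
    err = np.linalg.norm(F(x) - (F(a) + J @ (x - a)))
    print(eps, err)                 # err shrinks like eps**2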
EXERCISES
10.1 Find the maximum value of f(x, y, z) = x + y + z on the ellipsoid
x² + 2y² + 3z² = 1.
10.2 Find the maximum value of the linear functional f(x) = Σ₁ⁿ cᵢxᵢ on the unit
sphere, where Σ₁ⁿ xᵢ² = 1.
10.3 Find the (minimum) distance between the two lines
and y = s⟨1, 1, 1⟩ + ⟨1, 0, −1⟩
in ℝ³.
10.4 Show that there is a uniquely determined pair of closest points on the two lines
x = ta + l and y = sb + m in ℝⁿ unless b = ka for some k. We assume that
a ≠ 0 ≠ b. Remember that if b is not of the form ka, then |(a, b)| < ‖a‖ ‖b‖,
according to the Schwarz inequality.
10.5 Show that the origin is the only critical point of f(x, y, z) = xy + yz + zx.
Find a line through the origin along which 0 is a maximum point for f, and find another
line along which 0 is a minimum point.
10.6 In the problem of minimizing the area of a rectangular parallelepiped of given
volume V worked out in the text, it was assumed that
A = 2(xy + V/y + V/x)
has an absolute minimum at an interior point of the first quadrant. Prove this. Show
first that A → ∞ if ⟨x, y⟩ approaches the boundary in any way:
x → 0, x → ∞, y → 0, or y → ∞.
10.7 Let F: ℝ² → ℝ² be the mapping defined by
Find the equation of the tangent plane in ℝ⁴ to the graph of F over the point a =
⟨π/4, π/4⟩.
10.8 Define F: ℝ³ → ℝ² by
y₁ = Σ₁³ xᵢ², y₂ = Σ₁³ xᵢ³.
Find the equation of the tangent plane to the graph of F in ℝ⁵ over a = ⟨1, 2, −1⟩.
10.9 Let ω(ξ, η) be a bounded bilinear mapping from a product normed linear space
V × W to a normed linear space X. Show that the equation of the tangent plane to the
graph S of ω in V × W × X at the point ⟨α, β, γ⟩ ∈ S is
ζ = ω(ξ, β) + ω(α, η) − ω(α, β).
10.10 Let F be a bounded linear functional on the normed linear space V. Show that
the equation of the tangent plane to the graph of F³ in V × ℝ over the point α can be
written in the form y = F²(α)(3F(ξ) − 2F(α)).
10.11 Show that if the general equation for a tangent plane given in the text is applied
to a mapping F in Hom(V, W), then it reduces to the equation for F itself [η = F(ξ)],
no matter what the point of tangency is. (Naturally!)
10.12 Continuing Exercise 9.1, show that the tangent space to the range of γ in W
at γ(0) is the projection on W of the tangent space to the graph of γ in ℝ × W at the
point ⟨0, γ(0)⟩. Now define the tangent plane to range γ in W at γ(0), and show
that it is similarly the projection of the tangent plane to the graph of γ.
10.13 Let F: V → W be differentiable at a. Show that the range of dF_a is the pro-
jection on W of the tangent space to the graph of F in V × W at the point ⟨a, F(a)⟩.
11. THE IMPLICIT-FUNCTION THEOREM
The formula for the Jacobian of a composite map that we obtained in Section 9
is reminiscent of the chain rule for the differential of a composite map that we
derived earlier (Section 8). The Jacobian formula involves numbers (partial
derivatives) that we multiply and add; the differential chain rule involves linear
maps (partial differentials) that we compose and add. (The similarity becomes
a full formal analogy if we use block decompositions.) Roughly speaking, the
whole differential calculus goes this way. In the one-variable calculus a differ-
ential is a linear map from the one-dimensional space ℝ to itself, and is therefore
multiplication by a number, the derivative. In the many-variable calculus when
we decompose with respect to one-dimensional subspaces, we get blocks of such
numbers, i.e., Jacobian matrices. When we generalize the whole theory to vector
spaces that are not one-dimensional, we get essentially the same formulas but
with numbers replaced by linear maps (differentials) and multiplication by
composition.
Thus the derivative of an inverse function is the reciprocal of the derivative
of the function: if g = f⁻¹ and b = f(a), then g′(b) = 1/f′(a). The differential
of an inverse map is the composition inverse of the differential of the map: if
G = F⁻¹ and F(α) = β, then dG_β = (dF_α)⁻¹.
If the equation g(x, y) = 0 defines y implicitly as a function of x, y = f(x),
we learn to compute f′(a) in the elementary calculus by differentiating
g(x, f(x)) ≡ 0,
and we get
(∂g/∂x)(a, b) + (∂g/∂y)(a, b) f′(a) = 0,
where b = f(a). Hence
f′(a) = − (∂g/∂x) / (∂g/∂y).
We shall see below that if G(ξ, η) = 0 defines η as a function of ξ, η = F(ξ),
and if β = F(α), then we calculate the differential dF_α by differentiating the
identity G(ξ, F(ξ)) = 0, and we get a formula formally identical to the above.
Finally, in exactly the same way, the so-called auxiliary variable method of
solving max-min problems in the elementary calculus has the same formal
structure as our later solution of a "constrained" maximum problem by Lagrange
multipliers.
In this section we shall consider the existence and differentiability of func-
tions implicitly defined. Suppose that we are given a (vector-valued) function
G(ξ, η) of two vector variables, and we want to know whether setting G equal
to 0 defines η as a function of ξ, that is, whether there exists a unique function F
such that G(ξ, F(ξ)) is identically zero. Supposing that such an "implicitly
defined" function F exists and that everything is differentiable, we can try
to compute the differential of F at α by differentiating the equation G(ξ, F(ξ)) =
0, or G ∘ ⟨I, F⟩ = 0. We get dG¹_⟨α,β⟩ ∘ dI_α + dG²_⟨α,β⟩ ∘ dF_α = 0, where we
have set β = F(α). If dG²_⟨α,β⟩ is invertible, we can solve for dF_α, getting
dF_α = −(dG²_⟨α,β⟩)⁻¹ ∘ dG¹_⟨α,β⟩.
Note that this has the same form as the corresponding expression from the
elementary calculus that we reviewed above. If F is uniquely determined, then
so is dF_α, and the above calculation therefore strongly suggests that we are
going to need the existence of (dG²_⟨α,β⟩)⁻¹ as a necessary condition for the
existence of a uniquely defined implicit function around the point ⟨α, β⟩.
Since β is F(α), we also need G(α, β) = 0. These considerations will lead us to
the right theorem, but we shall have to postpone part of its proof to the next
chapter. What we can prove here is that if there is an implicitly defined function,
then it must be differentiable.
Theorem 11.1. Let V, W, and X be normed linear spaces, and let G be a
mapping from an open subset A × B of V × W to X. Suppose that F is a con-
tinuous mapping from A to B implicitly defined by the equation G(ξ, η) = 0,
that is, satisfying G(ξ, F(ξ)) = 0 on A. Finally, suppose that G is differ-
entiable at ⟨α, β⟩, where β = F(α), and that dG²_⟨α,β⟩ is invertible. Then
F is differentiable at α and dF_α = −(dG²_⟨α,β⟩)⁻¹ ∘ dG¹_⟨α,β⟩.
Proof. Set η = ΔF_α(ξ), so that G(α + ξ, β + η) = G(α + ξ, F(α + ξ)) = 0.
Then
0 = G(α + ξ, β + η) − G(α, β) = ΔG_⟨α,β⟩(ξ, η) = dG_⟨α,β⟩(ξ, η) + o(ξ, η)
  = dG¹_⟨α,β⟩(ξ) + dG²_⟨α,β⟩(η) + o(ξ, η).
Applying T⁻¹ to this equation, where T = dG²_⟨α,β⟩, and solving for η, we get
η = −T⁻¹(dG¹_⟨α,β⟩(ξ)) + O(o(⟨ξ, η⟩)).
This equation is of the form η = O(ξ) + o(⟨ξ, η⟩), and since η = ΔF_α(ξ) is
an infinitesimal in ξ, by the continuity of F at α, Lemmas 5.1 and 5.2 imply
first that η = O(ξ) and then that ⟨ξ, η⟩ = O(ξ). Thus O(o(⟨ξ, η⟩)) =
o(O(ξ)) = o(ξ), and we have
ΔF_α(ξ) = η = S(ξ) + o(ξ),
where S = −(dG²_⟨α,β⟩)⁻¹ ∘ dG¹_⟨α,β⟩, an element of Hom(V, W). Therefore, F
is differentiable at α and dF_α has the asserted value. □
We shall show in the next chapter, as an application of the fixed-point
theorem, that if V, W, and X are finite-dimensional, and if G is a continuously
differentiable mapping from an open subset A × B of V × W to X such that
at the point ⟨α, β⟩ we have both G(α, β) = 0 and dG²_⟨α,β⟩ invertible, then
there is a uniquely determined continuous mapping F from a neighborhood M of
α to B such that F(α) = β and G(ξ, F(ξ)) = 0 on M. The same theorem is true
for the more general class of complete normed linear spaces which we shall study
in the next chapter. For these spaces it is also true that if T⁻¹ exists, then so
does S⁻¹ for all S sufficiently close to T, and the mapping S ↦ S⁻¹ is contin-
uous. Therefore dG²_⟨μ,ν⟩ is invertible for all ⟨μ, ν⟩ sufficiently close to
⟨α, β⟩, and the above theorem then implies that F is differentiable on a
neighborhood of α. Moreover, only continuous mappings are involved in the
formula given by the theorem for dF: μ ↦ dF_μ, and it follows that F is in fact
continuously differentiable near α. These conclusions constitute the implicit-
function theorem, which we now restate.
Theorem 11.2. Let V, W, and X be finite-dimensional (or, more generally,
complete) normed linear spaces, let A × B be an open subset of V × W,
and let G: A × B → X be continuously differentiable. Suppose that at the
point ⟨α, β⟩ in A × B we have both G(α, β) = 0 and dG²_⟨α,β⟩ invertible.
Then there is a ball M about α and a uniquely defined continuously differen-
tiable mapping F from M to B such that F(α) = β and G(ξ, F(ξ)) = 0 on M.
The so-called inverse-mapping theorem is a special case of the implicit-
function theorem.
Theorem 11.3. Let H be a continuously differentiable mapping from an
open subset B of a finite-dimensional (or complete) normed linear space W
to a normed linear space V, and suppose that its differential is invertible at
a point β. Then H itself is invertible near β. That is, there is a ball M about
α = H(β) and a uniquely determined continuously differentiable function F
from M to B such that F(α) = β and H(F(ξ)) = ξ on M.
Proof. Set G(ξ, η) = ξ − H(η). Then G is continuously differentiable from
V × B to V and dG²_⟨α,β⟩ = −dH_β is invertible. The implicit-function theorem
then gives us a ball M about α and a uniquely determined continuously differ-
entiable mapping F from M to B such that F(α) = β and 0 = G(ξ, F(ξ)) =
ξ − H(F(ξ)) on M. □
The inverse-mapping theorem is often given a slightly different formulation,
which we state as a corollary.
Corollary. Under the hypotheses of the above theorem there exists an open
neighborhood U of β such that H is injective on U, N = H[U] is open in
V, and H⁻¹ is continuously differentiable on N. (See Fig. 3.11.)
Fig. 3.11
Proof. The proof of the corollary is left as an exercise.
In practice we often have to apply the Cartesian formulations of these
theorems. The student should certainly be able to write these down, but we
shall state them anyway, starting with the simpler inverse-mapping theorem.
Theorem 11.4. Suppose that we are given n continuously differentiable
real-valued functions Gᵢ(y₁, ..., yₙ), i = 1, ..., n, of n real variables
defined on a neighborhood B of a point b in ℝⁿ, and suppose that the Jacobian
determinant
∂(G₁, ..., Gₙ)/∂(y₁, ..., yₙ)
is not zero at b. Then there is a ball M about a = G(b) in ℝⁿ and a uniquely
determined n-tuple F = ⟨F₁, ..., Fₙ⟩ of continuously differentiable real-
valued functions defined on M such that F(a) = b and G(F(x)) = x on M.
That is, Gᵢ(F₁(x₁, ..., xₙ), ..., Fₙ(x₁, ..., xₙ)) = xᵢ
for all x in M and for i = 1, ..., n.
For example, if x = ⟨y₁³ + y₂³, y₁² + y₂²⟩, then at the point b = ⟨1, 2⟩
we have

∂(x₁, x₂)/∂(y₁, y₂) = det [ 3y₁²  3y₂² ]        = det [ 3  12 ] = −12 ≠ 0,
                          [ 2y₁   2y₂  ] ⟨1,2⟩        [ 2   4 ]

and we therefore know without trying to solve explicitly that there is a unique
solution for y in terms of x near x = ⟨1³ + 2³, 1² + 2²⟩ = ⟨9, 5⟩. The
reader would find it virtually impossible to solve for y, since he would quickly
discover that he had to solve a polynomial equation of degree 6. This clearly
shows the power of the theorem: we are guaranteed the existence of a mapping
which may be very difficult if not impossible to find explicitly. (However, in
the next chapter we shall discover an iterative procedure for approximating the
inverse mapping as closely as we want.)
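
The following sketch is not from the text: it approximates the guaranteed inverse
numerically by Newton's method (a standard technique, not the iterative procedure
of the next chapter), assuming the Python library numpy.

import numpy as np

def G(y):    # the map x = G(y) of the example above
    return np.array([y[0]**3 + y[1]**3, y[0]**2 + y[1]**2])

def dG(y):   # its Jacobian matrix
    return np.array([[3*y[0]**2, 3*y[1]**2],
                     [2*y[0],    2*y[1]]])

x = np.array([9.1, 5.05])        # a point near <9, 5>
y = np.array([1.0, 2.0])         # initial guess: the known preimage of <9, 5>
for _ in range(10):
    y = y - np.linalg.solve(dG(y), G(y) - x)
print(y, G(y) - x)               # G(y) - x is ~ 0 to machine precision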
Everything we have said here applies all the more to the implicit-function
theorem, which we now state in Cartesian form.
Theorem 11.5. Suppose that we are given m continuously differentiable
real-valued functions Gᵢ(x, y) = Gᵢ(x₁, ..., xₙ, y₁, ..., yₘ) of n + m real
variables defined on an open subset A × B of ℝⁿ⁺ᵐ and an (n + m)-tuple
⟨a, b⟩ = ⟨a₁, ..., aₙ, b₁, ..., bₘ⟩ such that Gᵢ(a, b) = 0 for i =
1, ..., m, and such that the Jacobian determinant
∂(G₁, ..., Gₘ)/∂(y₁, ..., yₘ)
is not zero there. Then there is a ball M about a in ℝⁿ and a uniquely determined
m-tuple F = ⟨F₁, ..., Fₘ⟩ of continuously differentiable real-valued
functions Fⱼ(x) = Fⱼ(x₁, ..., xₙ) defined on M such that b = F(a) and
Gᵢ(x, F(x)) = 0 on M for i = 1, ..., m. That is, bᵢ = Fᵢ(a₁, ..., aₙ) for
i = 1, ..., m, and Gᵢ(x₁, ..., xₙ; F₁(x₁, ..., xₙ), ..., Fₘ(x₁, ..., xₙ)) =
0 for all x in M and for i = 1, ..., m.
For example, the equations
x₁² + x₂² − y₁² − y₂² = 0,
x₁³ − x₂³ − y₁³ − y₂³ = 0
can be solved uniquely for y in terms of x near ⟨x, y⟩ = ⟨1, 1, 1, −1⟩,
because they hold at that point and because

∂(G₁, G₂)/∂(y₁, y₂) = det [ −2y₁   −2y₂  ] = 6(y₁y₂² − y₁²y₂)
                          [ −3y₁²  −3y₂² ]

has the value 12 there. Of course, we mean only that the solution functions
exist, not that we can explicitly produce them.
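
As a check not contained in the original, the Jacobian determinant of this example
and the implicit differential −(dG²)⁻¹ ∘ dG¹ of Theorem 11.1 can be computed
symbolically; the Python library sympy is assumed.

import sympy as sp

x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')
G = sp.Matrix([x1**2 + x2**2 - y1**2 - y2**2,
               x1**3 - x2**3 - y1**3 - y2**3])
dG1 = G.jacobian([x1, x2])           # partial differential in x
dG2 = G.jacobian([y1, y2])           # partial differential in y
p = {x1: 1, x2: 1, y1: 1, y2: -1}
print(dG2.det().subs(p))             # 12, as in the text
print((-dG2.inv() * dG1).subs(p))    # Jacobian of the implicit F: [[1, 0], [0, -1]]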
EXERCISES
11.1 Show that ⟨x, y⟩ ↦ ⟨eˣ + eʸ, eˣ + e⁻ʸ⟩ is locally invertible about any
point ⟨a, b⟩, and compute the Jacobian matrix of the inverse map.
11.2 Show that ⟨u, v⟩ ↦ ⟨eᵘ + eᵛ, eᵘ − eᵛ⟩ is locally invertible about any
point ⟨a, b⟩ in ℝ², by computing the Jacobian matrix. In this case the whole map-
ping is invertible, with an easily computed inverse. Make this calculation, compute
the Jacobian matrix of the inverse map, and verify that the two matrices are inverses
at the appropriate points.
11.3 Show that the mapping ⟨x, y, z⟩ ↦ ⟨sin x, cos y, eᶻ⟩ from ℝ³ to ℝ³ is
locally invertible about ⟨0, π/2, 0⟩. Show that
⟨x, y, z⟩ ↦ ⟨sin (x + y + z), cos (x − y + z), e^{x+y−z}⟩
is locally invertible about ⟨π/4, −π/4, 0⟩.
11.4 Express the second map of the above exercise as the composition of two maps,
and obtain your answer a second way.
11.5 Let F: ⟨x, y⟩ ↦ ⟨u, v⟩ be the mapping from ℝ² to ℝ² defined by u =
x² + y², v = 2xy. Compute an inverse G of F, being careful to give the domain and
range of G. How many inverse mappings are there? Compute the Jacobian matrices
of F at ⟨1, 2⟩ and of G at ⟨5, 4⟩, and show by multiplying them that they are
inverse.
11.6 Consider now the mapping F: ⟨x, y⟩ ↦ ⟨x³, y³⟩. Show that dF_⟨0,0⟩ is
singular and yet that the mapping has an inverse G. What conclusion do we draw
about the differentiability of G at the origin?
11.7 Define F: ℝ² → ℝ² by ⟨x, y⟩ ↦ ⟨eˣ cos y, eˣ sin y⟩. Prove that F is
locally invertible about every point.
11.8 Define F: ℝ³ → ℝ³ by x ↦ y where
y₁ = x₁ + x₂² + (x₃ − 1),
y₂ = x₁² + x₂ + (x₃³ − 3x₃),
y₃ = x₁³ + x₂² + x₃.
Prove that x ↦ y = F(x) is locally invertible about x = ⟨0, 0, 1⟩.
11.9 For a function f: ℝ → ℝ the proof of local invertibility around a point a where
df_a is nonsingular is much simpler than the general case. Show first that the Jacobian
matrix of f at a is the number f′(a). We are therefore assuming that f′(x) is continuous
in a neighborhood of a and that f′(a) ≠ 0. Prove that then f is strictly increasing (or
decreasing) in an interval about a. Now finish the theorem. (See Exercise 1.12.)
11.10 Show that the equations
t² + x³ + y³ + z³ = 0,  t + x² + y² + z² = 4
have differentiable solutions x(t), y(t), z(t) around ⟨t, x, y, z⟩ = ⟨0, −1, 1, 0⟩.
11.11 Show that the equations
eˣ + e²ʸ + e³ᵘ + e⁴ᵛ = 4,
can be uniquely solved for u and v in terms of x and y around the point ⟨0, 0, 0, 0⟩.
11.12 Let S be the graph of the equation
xz + sin (xy) + cos (xz) = 1
in ℝ³. Determine whether in the neighborhood of ⟨0, 1, 1⟩ S is the graph of a differ-
entiable function in any of the following forms:
z = f(x, y), x = g(y, z), y = h(x, z).
11.13 Given functions f and g from ℝ³ to ℝ such that f(a, b, c) = 0 and g(a, b, c) = 0,
write down the condition on the partial derivatives of f and g that guarantees the
existence of a unique pair of differentiable functions y = h(x) and z = k(x) satisfying
h(a) = b, k(a) = c,
and
f(x, y, z) = f(x, h(x), k(x)) = 0,
g(x, y, z) = g(x, h(x), k(x)) = 0 around ⟨a, b, c⟩.
11.14 Let G(ξ, η, ζ) be a continuously differentiable mapping from V = Π₁³ Vᵢ to W
such that dG³_α: V₃ → W is invertible and G(α) = G(α₁, α₂, α₃) = 0. Prove that there
exists a uniquely determined function ζ = F(ξ, η) defined around ⟨α₁, α₂⟩ in
V₁ × V₂ such that G(ξ, η, F(ξ, η)) = 0 and F(α₁, α₂) = α₃. Also show that
dF_⟨ξ,η⟩ = [−dG³_⟨ξ,η,ζ⟩]⁻¹ ∘ [dG^{(1,2)}_⟨ξ,η,ζ⟩],
where ζ = F(ξ, η).
11.15 Let F(ξ, η) be a continuously differentiable function from V × W to X, and
suppose that dF²_⟨α,β⟩ is invertible. Setting γ = F(α, β), show that there is a product
neighborhood L × M × N of ⟨γ, α, β⟩ in X × V × W and a unique continuously
differentiable mapping G: L × M → N such that on L × M, F(ξ, G(ζ, ξ)) = ζ.
11.16 Suppose that the equation g(x, y, z) = 0 can be solved for z in terms of x and y.
This means that there is a function f(x, y) such that g(x, y, f(x, y)) = 0. Suppose also
that everything is differentiable, and compute ∂z/∂x.
11.17 Suppose that the equations
g(x, y, z) = 0 and h(x, y, z) = 0
can be solved for y and z as functions of x. Compute dy/dx.
11.18 Suppose that g(x, y, u, v) = 0 and h(x, y, u, v) = 0 can be solved for u and v
as functions of x and y. Compute ∂u/∂x.
11.19 Compute dz/dx where x³ + y³ + z³ = 0 and x² + y² + z² = 1.
11.20 If t³ + x³ + y³ + z³ = 0 and t² + x² + y² + z² = 1, then ∂z/∂x is ambiguous.
We are obviously going to think of two of the variables as functions of the other two.
Also z is going to be dependent and x independent. But is t or y going to be the other
independent variable? Compute ∂z/∂x under each of these assumptions.
11.21 We are given four "physical variables" p, v, t, and φ such that each of them is a
function of any two of the other three. Show that ∂t/∂p has two quite different mean-
ings, and make explicit what the relationship between them is by labeling the various
functions that are relevant and applying the implicit differentiation process.
11.22 Again the "one-dimensional" case is substantially simpler. Let G be a con-
tinuously differentiable mapping from ℝ² to ℝ such that G(a, b) = 0 and
(∂G/∂y)(a, b) = G₂(a, b) > 0.
Show that there are positive numbers ε and δ such that for each c in (a − δ, a + δ)
the function g(y) = G(c, y) is strictly increasing on [b − ε, b + ε] and G(c, b − ε) <
0 < G(c, b + ε). Conclude from the intermediate-value theorem (Exercise 1.13) that
there exists a unique function F: (a − δ, a + δ) → (b − ε, b + ε) such that
G(x, F(x)) = 0.
11.23 By applying the same argument used in the above exercise a second time, prove
that F is continuous.
11.24 In the inverse-function theorem show that dF_α = (dH_β)⁻¹. That is, the differ-
ential of the inverse of H is the inverse of the differential of H. Show this
a) by applying the implicit-function theorem;
b) by a direct calculation from the identity H(F(ξ)) = ξ.
11.25 Again in the context of the inverse-mapping theorem, show that there is a
neighborhood M of β in B such that F(H(η)) = η on M. (Don't work at this. Just
apply the theorem again.)
11.26 We continue in the context of the inverse-mapping theorem. Assume the result
(from the next chapter) that if dH_β⁻¹ exists, then so does dH_ξ⁻¹ for ξ sufficiently close
to β. Show that there is an open neighborhood U of β in B such that H is injective on U,
H[U] is an open set N in V, and H⁻¹ is continuously differentiable on N.
11.27 Use Exercise 3.21 to give a direct proof of the existence of a Lipschitz-con-
tinuous local inverse in the context of the inverse-mapping theorem. [Hint: Apply
Theorem 7.4.]
11.28 A direct proof of the differentiability of an inverse function is simpler than the
implicit-function theorem proof. Work out such a proof, modeling your arguments in a
general way upon those in Theorem 11.1.
11.29 Prove that the implicit-function theorem can be deduced from the inverse-
function theorem as follows. Set
H(ξ, η) = ⟨ξ, G(ξ, η)⟩,
and show that dH_⟨α,β⟩ has the block diagram
[ I    0   ]
[ dG¹  dG² ].
Then show that dH_⟨α,β⟩⁻¹ exists, from the block diagram results of Chapter 1. Apply
the inverse-mapping theorem.
12. SUBMANIFOLDS AND LAGRANGE MULTIPLIERS
If V and W are finite-dimensional spaces, with dimensions n and m, respectively,
and if F is a continuous mapping from an open subset A of V to W, then (the
graph of) F is a subset of V × W which we visualize as a kind of "n-dimensional
surface" S spread out over A. (See Section 10.) We shall call F an n-dimensional
patch in V × W. More generally, if X is any (n + m)-dimensional vector space,
we shall call a subset S an n-dimensional patch if there is an isomorphism φ
from X to a product space V × W such that V is n-dimensional and φ[S] is a
patch in V × W. That is, S becomes a patch in the above sense when X is
considered to be V × W. This means that if π₁ is the projection of X = V × W
onto V, then π₁[S] is an open subset A of V, and the restriction π₁ ↾ S is one-to-
one and has a continuous inverse. If π₂ is the projection on W, then F =
π₂ ∘ (π₁ ↾ S)⁻¹ is the map from A to W whose graph in V × W is S (when
V × W is identified with X).
Now there are important surfaces that aren't such "patch" surfaces. Con-
sider, for instance, the surface of the unit ball in ℝ³, S = {x : Σ₁³ xᵢ² = 1}. S is
obviously a two-dimensional surface in ℝ³ which cannot be expressed as a graph,
no matter how we try to express ℝ³ as a direct sum. However, it should be
equally clear that S is the union of overlapping surface patches. If a is any point
on S, then any sufficiently small neighborhood N of a in ℝ³ will intersect S in a
patch; we take V as the subspace parallel to the tangent plane at a and W as
the perpendicular line through 0. Moreover, this property of S is a completely
adequate definition of what we mean by a submanifold.
A subset S of an (n + m)-dimensional vector space X is an n-dimensional
submanifold of X if each a on S has a neighborhood N in X whose intersection
with S is an n-dimensional patch.
We say that S is smooth if all these patches S_a are smooth, that is, if the
function F: A → W whose graph in V × W is the patch S_a (when X is viewed
as V × W) is continuously differentiable for every such patch S_a.
The sphere we considered above is a two-dimensional smooth submanifold
of ℝ³.
Submanifolds are frequently presented as zero sets of mappings. For
example, our sphere above is the zero set of the mapping G from ℝ³ to ℝ defined
by G(x) = Σ₁³ xᵢ² − 1. It is obviously important to have a condition guar-
anteeing that such a null set is a submanifold.
Theorem 12.1. Let G be a continuously differentiable mapping from an open
subset U of an (n + m)-dimensional vector space X to an m-dimensional
vector space Y such that dG_a is surjective for every a in the zero set S of G.
Then S is an n-dimensional submanifold of X.
Proof. Choose any point γ of S. Since dG_γ is surjective from the (n + m)-
dimensional vector space X to the m-dimensional vector space Y, we know that
the null space V of dG_γ has dimension n (Theorem 2.4, Chapter 2). Let W be any
complement of V, and think of X as V × W, so that G now becomes a function of
two vector variables and γ is a point ⟨α, β⟩ such that G(α, β) = 0. The
restriction of dG_⟨α,β⟩ to W is an isomorphism from W to Y; that is, (dG²_⟨α,β⟩)⁻¹
exists. Therefore, by the implicit-function theorem, there is a product neigh-
borhood S_δ(α) × S_ε(β) of ⟨α, β⟩ in X whose intersection with S is the graph
of a function on S_δ(α). This proves our theorem. □
If S is a smooth submanifold, then the function F whose graph is the patch
of S around γ (when X is viewed suitably as V × W) is continuously differentia-
ble, and therefore S has a uniquely determined n-dimensional tangent plane M
at γ that fits S most closely around γ in the sense of our o-approximations.
If γ = 0, this tangent plane is an n-dimensional subspace, and in general it is
the translate through γ of a subspace N. We call N the tangent space of S at γ;
its elements are exactly the vectors in X tangent to parametrized arcs drawn
in S through γ. What we are going to do later is to describe an n-dimensional
manifold S independently of any imbedding of S in a vector space. The tangent
space to S at a point γ will still be an invaluable notion, but we are not going to
be able to visualize it by an actual tangent plane in a space X carrying S.
Instead, we will have to construct the vector space tangent to S at γ some-
how.
The clue is provided by Theorem 10.2, which tells us that if S is imbedded
as a submanifold in a vector space X, then each vector tangent to S at γ can be
presented as the unique tangent vector at γ to some smooth curve lying in S.
This mapping from the set of smooth curves in S through γ to the tangent space
at γ is not injective; clearly, different curves can be tangent to each other at γ
and so have the same tangent vector there. Therefore, the object in S that
corresponds to a tangent vector at γ is an equivalence class of smooth curves
through γ, and this will in fact be our definition of a tangent vector for a general
manifold.
The notion of a submanifold allows us to consider in an elegant way a
classical "constrained" maximum problem. We are given an open subset U
of a finite-dimensional vector space X, a differentiable real-valued function F
defined on U, and a submanifold S lying in U. We shall suppose that the
submanifold S is the zero set of a continuously differentiable mapping G from U
to a vector space Y such that dG_γ is surjective for each γ on S. We wish to
consider the problem of maximizing (or minimizing) F(γ) when γ is "con-
strained" to lie on S. We cannot expect to find such a maximum point γ₀ by
setting dF_γ = 0 and solving for γ, because γ₀ will not be a critical point for F.
Consider, for example, the function g(x) = Σ₁³ xᵢ² − 1 from ℝ³ to ℝ and F(x) =
x₂. Here the "surface" defined by g = 0 is the unit sphere Σ₁³ xᵢ² = 1, and on
this sphere F has its maximum value 1 at ⟨0, 1, 0⟩. But F is linear, and so
dF_γ = F can never be the zero transformation. The device known as Lagrange
multipliers shows that we can nevertheless find such constrained critical points
by solving dL_γ = 0 for a suitable function L.
Theorem 12.2. Suppose that F has a maximum value on S at the point γ.
Then there is a functional l in Y* such that γ is a critical point of the func-
tion F − (l ∘ G).
Proof. By the implicit-function theorem we can express X as V × W in such a
way that the neighborhood of S around γ is the graph of a mapping H from an
open set A in V to W. Thus, expressing F and G as functions on V × W, we
have G(ξ, η) = 0 near γ = ⟨α, β⟩ if and only if η = H(ξ), and the restriction
of F(ξ, η) to this zero surface is thus the function K: A → ℝ defined by K(ξ) =
F(ξ, H(ξ)). By assumption α is a critical point for this function. Thus
0 = dK_α = dF¹_⟨α,β⟩ + dF²_⟨α,β⟩ ∘ dH_α.
Also from the identity G(ξ, H(ξ)) = 0, we get
0 = dG¹_⟨α,β⟩ + dG²_⟨α,β⟩ ∘ dH_α.
Since dG²_⟨α,β⟩ is invertible, we can solve the second equation for dH_α and
substitute in the first, thus getting, dropping the subscripts for simplicity,
dF¹ − dF² ∘ (dG²)⁻¹ ∘ dG¹ = 0.
Let l ∈ Y* be the functional dF² ∘ (dG²)⁻¹. Then we have dF¹ = l ∘ dG¹ and,
by definition, dF² = l ∘ dG². Composing the first equation (on the right) with
π₁: V × W → V and the second with π₂, and adding, we get dF_⟨α,β⟩ =
l ∘ dG_⟨α,β⟩. That is, d(F − l ∘ G)_γ = 0. □
Nothing we have said so far explains the phrase "Lagrange multipliers".
This comes out of the Cartesian expression of the theorem, where we have U
an open subset of a Cartesian space ℝⁿ, Y = ℝᵐ, G = ⟨g₁, ..., gₘ⟩, and l in
Y* of the form l_c: l(y) = Σ₁ᵐ cᵢyᵢ. Then F − l ∘ G = F − Σ₁ᵐ cᵢgᵢ, and
d(F − l ∘ G)_a = 0 becomes
∂F/∂xⱼ − Σ₁ᵐ cᵢ ∂gᵢ/∂xⱼ = 0,  j = 1, ..., n.
These n equations together with the m equations G = ⟨g₁, ..., gₘ⟩ = 0 give
m + n equations in the m + n unknowns x₁, ..., xₙ, c₁, ..., cₘ.
Our original trivial example will show how this works out in practice. We
want to maximize F(x) = x₂ from ℝ³ to ℝ subject to the constraint Σ₁³ xᵢ² = 1.
Here g(x) = Σ₁³ xᵢ² − 1 is also from ℝ³ to ℝ, and our method tells us to look
for a critical point of F − cg subject to g = 0. Our system of equations is
0 − 2cx₁ = 0,
1 − 2cx₂ = 0,
0 − 2cx₃ = 0,
Σ₁³ xᵢ² = 1.
The first says that c = 0 or x₁ = 0, and the second implies that c cannot be 0.
Therefore, x₁ = x₃ = 0, and the fourth equation then shows that x₂ = ±1.
Another example is our problem of minimizing the surface area A =
2(xy + yz + zx) of a rectangular parallelepiped, subject to the constraint of a
constant volume, xyz = V. The theorem says that the minimum point will be a
critical point of A − λV for some λ, and, setting the differential of this function
equal to zero, we get the equations
2(y + z) − λyz = 0,
2(x + z) − λxz = 0,
2(x + y) − λxy = 0,
together with the constraint
xyz = V.
The first three equations imply that x = y = z; the last then gives V^{1/3} at the
common value.
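
As a computational aside not in the original text, the system just written down can
be handed to a symbolic solver; this sketch assumes the Python library sympy, with
lam standing for the multiplier λ.

import sympy as sp

x, y, z, lam, V = sp.symbols('x y z lam V', positive=True)
A = 2*(x*y + y*z + z*x)
L = A - lam*x*y*z                       # the function A - lambda*(xyz)
eqs = [L.diff(v) for v in (x, y, z)] + [x*y*z - V]
sol = sp.solve(eqs, [x, y, z, lam], dict=True)
print(sol)   # expected: x = y = z = V**(1/3), lam = 4*V**(-1/3)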
*13. FUNCTIONAL DEPENDENCE
The question, roughly, is this: If we are given a collection of continuous functions,
all defined on some open set A, how can we tell whether or not some of them are
functions of the rest? For example, if we are given three real-valued continuous
functions f₁, f₂, and f₃, how can we tell whether or not some one of them is a
function of the other two, say f₃ is a function of f₁ and f₂, which means that there
is a function of two variables g(x, y) such that f₃(t) = g(f₁(t), f₂(t)) for all t in
the common domain A? If this happens, we say that f₃ is functionally dependent
on f₁ and f₂. This is very nearly the same as asking when it will be the case that
the range S of the mapping F: t ↦ ⟨f₁(t), f₂(t), f₃(t)⟩ is a two-dimensional
submanifold of ℝ³. However, there are differences in these questions that are
worth noting. If f₃ is functionally dependent on f₁ and f₂, then the range of F
certainly lies on a two-dimensional submanifold of ℝ³, namely, the graph of g.
But this is no guarantee that it itself forms a two-dimensional submanifold.
For example, both f₂ and f₃ might be functionally dependent on f₁, f₂ = g ∘ f₁
and f₃ = h ∘ f₁, in which case the range of F lies on the curve ⟨s, g(s), h(s)⟩ in
ℝ³, which is a one-dimensional submanifold. In the opposite direction, the range
of F can be a two-dimensional submanifold M without f₃ being functionally
dependent on f₂ and f₁. All we can conclude in this case is that locally one of the
functions {fᵢ}₁³ is a function of the other two, since locally M is a surface patch,
in the language of the last section. But if we move a little bit away on the
curving surface M to the neighborhood of another point, we may have to solve
for a different one of the functions. Nevertheless, if M = range F is a subset of
a two-dimensional manifold, it is reasonable to say that the functions {fᵢ}₁³ are
functionally dependent, and we are led to examine this more natural notion.
If we assume that F = ⟨f₁, f₂, f₃⟩ is continuously differentiable and that
the rank of dF_a is 3 at some point a in A, then the implicit-function theorem
implies that F[A] includes a whole ball in ℝ³ about the point F(a). Thus a
necessary condition for M = range F to lie on a two-dimensional submanifold in
ℝ³ is that the rank of dF_a be everywhere less than 3. We shall see, in fact, that
if the rank of dF_a is 2 for all a, then M = range F is essentially a two-dimensional
manifold. (There is still a tiny difficulty that we shall explain later.) Our tools
are going to be the implicit-function theorem and the following theorem, which
could well have come much earlier, that the rank of T is a "lower semicon-
tinuous" function of T.
Theorem 13.1. Let V and W be finite-dimensional vector spaces, normed
in some way. Then for any T in Hom(V, W) there is an ε such that
‖S − T‖ < ε ⟹ rank S ≥ rank T.
Proof. Let T have null space N and range R, and let X be any complement of
N in V. Then the restriction of T to X is an isomorphism to R, and hence is
bounded below by some positive m. (Its inverse from R to X is bounded by
some b, by Theorem 4.2, and we set m = 1/b.) Then if ‖S − T‖ < m/2, it
follows that S is bounded below on X by m/2, for the inequalities
‖T(α)‖ ≥ m‖α‖ and ‖(S − T)(α)‖ ≤ (m/2)‖α‖
together imply that ‖S(α)‖ ≥ (m/2)‖α‖. In particular, S is injective on X, and
so rank S = d(range S) ≥ d(X) = d(R) = rank T. □
We can now prove the general local theorem.
Theorem 13.2. Let V and W be finite-dimensional spaces, let r be an integer
less than the dimension of W, and let F be a continuously differentiable map
from an open subset A ⊂ V to W such that the rank of dF_γ is r for all γ
in A. Then each point γ in A has a neighborhood U such that F[U] is an
r-dimensional patch submanifold of W.
Proof. For a fixed γ in A let V₁ and Y be the null space and range of dF_γ, let
V₂ be a complement of V₁ in V, and view V as V₁ × V₂. Then F becomes
a function F(ξ, η) of two variables, and if γ = ⟨α, β⟩, then dF²_⟨α,β⟩ is an
isomorphism from V₂ to Y. At this point we can already choose the decom-
position W = W₁ ⊕ W₂ with respect to which F[A] is going to be a graph
(locally). We simply choose any direct sum decomposition W = W₁ ⊕ W₂
such that W₂ is a complement of Y = range dF_⟨α,β⟩. Thus W₁ might be Y,
but it doesn't have to be. Let P be the projection of W onto W₁ along W₂.
Since Y is a complement of the null space of P, we know that P ↾ Y is an
isomorphism from Y to W₁. In particular, W₁ is r-dimensional, and
rank P ∘ dF_⟨α,β⟩ = r.
Moreover, and this is crucial, P is an isomorphism from the range of
dF_⟨ξ,η⟩ to W₁ for all ⟨ξ, η⟩ sufficiently close to ⟨α, β⟩. For the above rank
theorem implies that rank P ∘ dF_⟨ξ,η⟩ ≥ rank P ∘ dF_⟨α,β⟩ = r on some
neighborhood of ⟨α, β⟩. On the other hand, the range of P ∘ dF_⟨ξ,η⟩ is
included in the range of P, which is W₁, and so rank P ∘ dF_⟨ξ,η⟩ ≤ r. Thus
rank P ∘ dF_⟨ξ,η⟩ = r for ⟨ξ, η⟩ near ⟨α, β⟩, and since rank dF_⟨ξ,η⟩ = r
by hypothesis, we see that P is an isomorphism on the range of any such dF_⟨ξ,η⟩.
Now define H: W₁ × A → W₁ as the mapping
⟨ζ, ξ, η⟩ ↦ P ∘ F(ξ, η) − ζ.
If μ = P ∘ F(α, β), then dH³_⟨μ,α,β⟩ = P ∘ dF²_⟨α,β⟩, which is an isomor-
phism from V₂ to W₁. Therefore, by the implicit-function theorem there exists
a neighborhood L × M × N of ⟨μ, α, β⟩ and a uniquely determined con-
tinuously differentiable mapping G from L × M to N such that
H(ζ, ξ, G(ζ, ξ)) = 0
on L × M. That is,
ζ = P ∘ F(ξ, G(ζ, ξ))
on L × M.
The remainder of our argument consists in showing that F(ξ, G(ζ, ξ)) is a
function of ζ alone. We start by differentiating the above equation with respect
to ξ, getting
0 = P ∘ (dF¹ + dF² ∘ dG²) = P ∘ dF ∘ ⟨I, dG²⟩.
As noted above, P is an isomorphism on the range of dF_⟨ξ,η⟩ for all ⟨ξ, η⟩
sufficiently close to ⟨α, β⟩, and if we suppose that L × M is also taken small
enough so that this holds, then the above equation implies that
dF_⟨ξ,η⟩ ∘ ⟨I, dG²⟩ = 0
for all ⟨ζ, ξ⟩ ∈ L × M. But this is just the statement that the partial differ-
ential with respect to ξ of F(ξ, G(ζ, ξ)) is identically 0, and hence that
F(ξ, G(ζ, ξ))
is a continuously differentiable function K of ζ alone:
F(ξ, G(ζ, ξ)) = K(ζ).
Since η = G(ζ, ξ) and ζ = P ∘ F(ξ, η), we thus have F(ξ, η) = K(P ∘ F(ξ, η)),
or
F = K ∘ P ∘ F,
and this holds on the open set U consisting of those points ⟨ξ, η⟩ in M × N
such that P ∘ F(ξ, η) ∈ L. If we think of W as W₁ × W₂, then F and K
are ordered pairs of functions, F = ⟨F₁, F₂⟩ and K = ⟨l, k⟩, P is the
mapping ⟨ζ, ν⟩ ↦ ζ, and the second component of the above equation is
F₂ = k ∘ F₁.
Since F₁[U] = P ∘ F[U] = L, the above equation says that F[U] is the graph of
the mapping k from L to W₂. Moreover, L is an open subset of the r-dimensional
vector space W₁, and therefore F[U] is an r-dimensional patch manifold in
W = W₁ × W₂. □
The above theorem includes the answer to our original question about
functional dependence.
Corollary. Let F = {fᵢ}₁ᵐ be an m-tuple of continuously differentiable
real-valued functions defined on an open subset A of a normed linear space
V, and suppose that the rank of dF_a has the constant value r on A, where r
is less than m. Then any point γ in A has a neighborhood U over which
m − r of the functions are functionally dependent on the remaining r.
Proof. By hypothesis the range Y of dF_γ = ⟨df¹_γ, ..., dfᵐ_γ⟩ is an r-dimen-
sional subspace of ℝᵐ. We can therefore find a basis for a complementary sub-
space W₂ by choosing m − r of the standard basis elements {δⁱ}, and we may
as well renumber the functions fⁱ so that these are δʳ⁺¹, ..., δᵐ. Then the
projection P of ℝᵐ onto W₁ = L(δ¹, ..., δʳ) is an isomorphism from Y to W₁
(since Y is a complement of its null space), and by the theorem there is a neigh-
borhood U of γ over which (I − P) ∘ F is a function k of P ∘ F. But this says
exactly that ⟨fʳ⁺¹, ..., fᵐ⟩ = k ∘ ⟨f¹, ..., fʳ⟩. That is, k is an (m − r)-
tuple-valued function, k = ⟨kᵣ₊₁, ..., kₘ⟩, and fʲ = kⱼ ∘ ⟨f¹, ..., fʳ⟩ for
j = r + 1, ..., m. □
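
A concrete instance, not part of the original text, may help: for f¹ = s + t,
f² = s − t, f³ = s² − t² we have f³ = f¹f², and the differential has constant
rank 2; the sketch below assumes the Python library sympy.

import sympy as sp

s, t = sp.symbols('s t')
F = sp.Matrix([s + t, s - t, s**2 - t**2])   # f3 = f1 * f2
J = F.jacobian([s, t])
print(J.rank())   # 2: the first two rows are independent at every point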
Fig. 3.12
We mentioned earlier in the section that there was a difficulty in concluding
that if F is a continuously differentiable map from an open subset A of V to W
whose differential has constant rank r less than d(W), then S = range F is an
r-dimensional submanifold of W. The flaw can be described as follows. The
definition of a submanifold S of X required that each point of S have a neighbor-
hood in X whose intersection with S is a patch. In the case before us, what we
can conclude is that if β is a point of S, then β = F(α) for some α in A, and α
has a neighborhood U whose image under F is a patch. But this image may not
be a full neighborhood of β in S, because S may curve back on itself in such a
way as to intrude into every neighborhood of β. Consider, for example, the one-
dimensional Γ imbedded in ℝ³ suggested by Fig. 3.12. The curve
begins in the xz-plane along the z-axis, curves over, and when it comes to the
xy-plane it starts spiraling in to the origin in the xy-plane (the point of change-
over from the xz-plane to the xy-plane is a singularity that we could smooth out).
The origin is not a point having a neighborhood in ℝ³ whose intersection with Γ
is a one-patch, but the full curve is the image of (−1, 1) under a continuously
differentiable injection.
We would consider Γ to be a one-dimensional manifold without any difficulty,
but something has gone wrong with its imbedding in ℝ³, so it is not a one-dimen-
sional submanifold of ℝ³.
*14. UNIFORM CONTINUITY AND FUNCTION-VALUED MAPPINGS
In the next chapter we shall see that a continuous function F whose domain is a
bounded closed subset of a finite-dimensional vector space V is necessarily
uniformly continuous. This means that given ε, there is a δ such that
‖ξ − η‖ < δ ⟹ ‖F(ξ) − F(η)‖ < ε
for all vectors ξ and η in the domain of F.
The point is that δ depends only on ε and not, as in ordinary continuity,
on the "anchor" point at which continuity is being asserted. This is a very
important property. In this section we shall see that it underlies a class of
theorems in which a point map is escalated to a function-valued map, and prop-
erties of the point map imply corresponding properties of the function-valued
map. Such theorems have powerful applications, as we shall see in Section 15
and in Section 1 of Chapter 6. An application that we shall get immediately here
is the theorem on differentiation under the integral sign. However, it is only
Theorem 14.3 that will be used later in the book.
Suppose first that F(ξ, η) is a bounded continuous function from a product
open set M × N to a normed linear space X. Holding η fixed, we have a function
f_η(ξ) = F(ξ, η) which is a bounded continuous function on M, that is, an
element of the normed linear space Y = BC(M, X) of all bounded continuous
maps from M to X. This function is also indicated F(·, η), so that f_η = F(·, η).
We are supposing that the uniform norm is being used on Y:
‖f_η‖ = lub {‖f_η(ξ)‖ : ξ ∈ M} = lub {‖F(ξ, η)‖ : ξ ∈ M}.
Theorem 14.1. In the above context, if F is uniformly continuous, then the
mapping η ↦ f_η (or η ↦ F(·, η)) is continuous, in fact, uniformly continuous,
from N to Y.
Proof. Given ε, choose δ so that
‖⟨ξ, η⟩ − ⟨μ, ν⟩‖ < δ ⟹ ‖F(ξ, η) − F(μ, ν)‖ < ε.
Taking μ = ξ and rewriting the right-hand side, we have
‖η − ν‖ < δ ⟹ ‖f_η(ξ) − f_ν(ξ)‖ < ε
for all ξ. Thus
‖η − ν‖ < δ ⟹ ‖f_η − f_ν‖∞ ≤ ε. □
We have proved that if a function of two variables is uniformly continuous,
then the mappings obtained from it by the general duality principle are con-
tinuous. This phenomenon lies behind many well-known facts. For example:
Corollary. If F(x, y) is a uniformly continuous real-valued function on the
unit square [0, 1] × [0, 1] in ℝ², then ∫₀¹ F(x, y) dx is a continuous function
of y.
Proof. The mapping y ↦ ∫₀¹ F(x, y) dx is the composition of the bounded
linear mapping f ↦ ∫₀¹ f from C([0, 1]) to ℝ with the continuous mapping
y ↦ F(·, y) from [0, 1] to C([0, 1]), and is continuous as the composition of con-
tinuous mappings. □
We consider next the differentiability of the above duality-induced mapping.
Theorem 14.2. If F is a bounded continuous mapping from an open product
set M × N of a normed linear space V × W to a normed linear space X,
and if dF²_⟨α,β⟩ exists and is a bounded uniformly continuous function of
⟨α, β⟩ on M × N, then φ: η ↦ F(·, η) is a differentiable mapping from
N to Y = BC(M, X), and [dφ_β(η)](ξ) = dF²_⟨ξ,β⟩(η).
Proof. Given ε, we choose δ by the uniform continuity of dF², so that
‖μ − ν‖ < δ ⟹ ‖dF²_⟨ξ,μ⟩ − dF²_⟨ξ,ν⟩‖ < ε
for all ξ ∈ M. The corollary to Theorem 7.4 then implies that
‖η‖ < δ ⟹ ‖ΔF²_⟨ξ,β⟩(η) − dF²_⟨ξ,β⟩(η)‖ ≤ ε‖η‖
for all ξ ∈ M, all β ∈ N, and all η such that the line segment from β to β + η
is in N. We fix β and rewrite the right-hand side of the above inequality. This
is the heart of the proof. First
ΔF²_⟨ξ,β⟩(η) = F(ξ, β + η) − F(ξ, β)
= [f_{β+η} − f_β](ξ) = [φ(β + η) − φ(β)](ξ) = [Δφ_β(η)](ξ).
Next we can check that if ‖dF²_⟨μ,ν⟩‖ ≤ b for ⟨μ, ν⟩ ∈ M × N, then the map-
ping T defined by the formula [T(η)](ξ) = dF²_⟨ξ,β⟩(η) is an element of
Hom(W, Y) of norm at most b. We leave the detailed verification of this as an
exercise for the reader. The last displayed inequality now takes the form
‖η‖ < δ ⟹ ‖[Δφ_β(η) − T(η)](ξ)‖ ≤ ε‖η‖,
and hence
‖η‖ < δ ⟹ ‖Δφ_β(η) − T(η)‖∞ ≤ ε‖η‖.
This says exactly that the mapping φ is differentiable at β and dφ_β = T. □
The mapping φ is in fact continuously differentiable, as can be seen by
arguing a little further in the above manner. The situation is very close to being
an application of Theorem 14.1.
The classical theorem on differentiability under the integral sign is a corollary
of the above theorem. We give a simple case. Note that if η is a real variable y,
then the above formula for dφ can be rewritten in terms of arc derivatives:
[φ′(b)](ξ) = (∂F/∂y)(ξ, b).
Corollary. If F(x, y) is a continuous real-valued function on the unit
square [0, 1] × [0, 1], and if ∂F/∂y exists and is a uniformly continuous
function on the square, then ∫₀¹ F(x, y) dx is a differentiable function of y
and its derivative is ∫₀¹ (∂F/∂y)(x, y) dx.
Proof. The mapping T: y ↦ ∫₀¹ F(x, y) dx is the composition of the bounded
linear mapping f ↦ ∫₀¹ f(x) dx from C([0, 1]) to ℝ with the differentiable mapping
φ: y ↦ F(·, y) from [0, 1] to C([0, 1]), and is therefore differentiable by the
composite-function rule. Then Theorem 7.2 and the fact that the differential of
a bounded linear map is itself give
T′(y) = ∫₀¹ [φ′(y)](x) dx = ∫₀¹ (∂F/∂y)(x, y) dx. □
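
A quick symbolic check of the corollary, not in the original text and assuming the
Python library sympy, with F(x, y) = e^{x+y²} on the unit square:

import sympy as sp

x, y = sp.symbols('x y')
F = sp.exp(x + y**2)
lhs = sp.diff(sp.integrate(F, (x, 0, 1)), y)   # d/dy of the integral
rhs = sp.integrate(sp.diff(F, y), (x, 0, 1))   # integral of dF/dy
print(sp.simplify(lhs - rhs))                  # 0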
We come now to the situation of most importance to us, where a point-to-
point map generates a function-to-function map by composition. Let A be an
open set in a normed linear space V, let S be an arbitrary set, and let 𝒜 be the
set of bounded maps f from S to A. Then 𝒜 is a subset of the normed linear space
B(S, V) of all bounded functions from S to V under the uniform norm. A func-
tion f ∈ 𝒜 will be an interior point of 𝒜 if and only if the distance from the range
of f to the boundary of A is a positive number δ, for this is clearly equivalent to
saying that 𝒜 includes a ball in B(S, V) about the point f. Now let g be any
bounded mapping from A to a normed linear space W, and let G: 𝒜 → B(S, W)
be composition by g. That is, h = G(f) if and only if f ∈ 𝒜 and h = g ∘ f. We
can consider both the continuity and differentiability of G, but we shall only
work out the differentiability theorem.
Theorem 14.3. Let the function g: A → W be differentiable at each point
α in A, and let dg_α be a bounded uniformly continuous function of α. Then
the mapping G: 𝒜 → B(S, W) defined by G(f) = g ∘ f is differentiable at
any interior point f in 𝒜, and dG_f: B(S, V) → B(S, W) is defined by
[dG_f(h)](s) = dg_{f(s)}(h(s))
for all s ∈ S.
Proof. Given ε, choose δ by the uniform continuity of dg so that
‖μ − ν‖ < δ ⟹ ‖dg_μ − dg_ν‖ < ε,
and then apply the corollary to Theorem 7.4 once more to conclude that
‖ξ‖ < δ ⟹ ‖Δg_α(ξ) − dg_α(ξ)‖ ≤ ε‖ξ‖,
provided the line segment from α to α + ξ is in A. Now choose any fixed interior
point f in 𝒜, and choose δ′ ≤ δ so that B_{δ′}(f) ⊂ 𝒜. Then for any h in B(S, V),
‖h‖∞ < δ′ ⟹ ‖Δg_{f(s)}(h(s)) − dg_{f(s)}(h(s))‖ ≤ ε‖h(s)‖
for all s ∈ S. Define a map T: B(S, V) → B(S, W) by [T(h)](s) = dg_{f(s)}(h(s)).
Then the above displayed inequality can be rewritten as
‖h‖∞ < δ′ ⟹ ‖ΔG_f(h) − T(h)‖∞ ≤ ε‖h‖∞.
That is, ΔG_f = T + o. We will therefore be done when we have shown that
T ∈ Hom(B(S, V), B(S, W)).
First, we have
(T(h₁ + h₂))(s) = dg_{f(s)}((h₁ + h₂)(s)) = dg_{f(s)}(h₁(s) + h₂(s))
= dg_{f(s)}(h₁(s)) + dg_{f(s)}(h₂(s)) = (T(h₁))(s) + (T(h₂))(s).
Thus T(h₁ + h₂) = T(h₁) + T(h₂), and homogeneity follows similarly. Second,
if b is a bound to ‖dg_α‖ on A, then ‖T(h)‖∞ = lub {‖(T(h))(s)‖ : s ∈ S} ≤
lub {‖dg_{f(s)}‖ · ‖h(s)‖ : s ∈ S} ≤ b‖h‖∞. Therefore, ‖T‖ ≤ b, and we are
finished. □
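
The following numerical illustration of Theorem 14.3 is not part of the original
text; it assumes the Python library numpy, takes g = sin, and represents bounded
functions on a finite set S as arrays.

import numpy as np

s = np.linspace(0.0, 1.0, 100)        # the set S, here 100 sample points
f = s**2                              # a bounded map f from S into A = R
h = 1e-4 * np.cos(3.0 * s)            # a small increment h in B(S, R)

delta = np.sin(f + h) - np.sin(f)     # Delta G_f(h), where G(f) = sin o f
dG = np.cos(f) * h                    # [dG_f(h)](s) = dg_{f(s)}(h(s))
print(np.max(np.abs(delta - dG)))     # about 5e-9, i.e. o(||h||)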
In the above situation, if g is from A × S to W, so that G(f) is the function
h given by h(t) = g(f(t), t), then nothing is changed except that the theorem is
about dg¹ instead of dg. If, in addition, V is a product space V₁ × V₂, so that f
is of the form ⟨f₁, f₂⟩ and [G(f)](t) = g(f₁(t), f₂(t), t), then our rules about
partial differentials give us the formula
[dG_f(h)](t) = dg¹_{⟨f(t),t⟩}(h₁(t)) + dg²_{⟨f(t),t⟩}(h₂(t)).
*15. THE CALCULUS OF VARIATIONS
The problems of the calculus of variations are simply critical-point problems of a
certain type with a characteristic twist in the way the condition dF_a = 0 is used.
We shall illustrate the subject by proving one of its standard theorems.
Since we want to solve a constrained maximum problem in which the domain
is an infinite-dimensional vector space, a systematic discussion would start off
with a more general form of the Lagrange multiplier theorem. However, for our
purpose it is sufficient to note that if S is a closed plane M + α, then the restric-
tion of F to S is equivalent to a new function on the vector space M, and its
differential at β = η + α in S is clearly just the restriction of dF_β to M. The
requirement that β be a critical point for the constrained function is therefore
simply the requirement that dF_β vanish on M.
Let F be a uniformly continuous differentiable real-valued function of three
variables defined on (an open subset of) W × W × ℝ, where W is a normed
linear space. Given a closed interval [a, b] ⊂ ℝ, let V be the normed linear space
C¹([a, b], W) of smooth arcs f: [a, b] → W, with ‖f‖ taken as ‖f‖∞ + ‖f′‖∞.
The problem is to maximize the (nonlinear) functional G(f) = ∫ₐᵇ F(f(t), f′(t), t) dt,
subject to the restraints f(a) = α and f(b) = β. That is, we consider only
smooth arcs in W with fixed endpoints α and β, and we want to find that arc
from α to β which maximizes (or minimizes) the integral. Now we can show
that G is a continuously differentiable function from (an open subset of) V to ℝ.
The easiest way to do this is to let X be the space C([a, b], W) of continuous arcs
under the uniform norm, and to consider first the more general functional K
from X × X to ℝ defined by K(f, g) = ∫ₐᵇ F(f(t), g(t), t) dt. By Theorem 14.3
the integrand map ⟨f, g⟩ ↦ F(f(·), g(·), ·) is differentiable from X × X to
C([a, b]) and its differential at ⟨f, g⟩ evaluated at ⟨h, k⟩ is the function
dF¹_⟨f(t),g(t),t⟩(h(t)) + dF²_⟨f(t),g(t),t⟩(k(t)).
Since f ↦ ∫ₐᵇ f(t) dt is a bounded linear functional on C, it is differentiable and equal
to its differential. The composite-function rule therefore implies that K is
differentiable and that
dK_⟨f,g⟩(h, k) = ∫ₐᵇ [dF¹(h(t)) + dF²(k(t))] dt,
where the partial differentials in the integrand are at the point ⟨f(t), g(t), t⟩.
Now the pairs ⟨f, g⟩ such that f′ exists and equals g form a closed subspace of
X × X which is isomorphic to V. It is obvious that they form a subspace, but
to see that it is closed requires the theory of the integral for parametrized arcs
from Chapter 4, for it depends on the representation f(t) = f(a) + ∫ₐᵗ f′(s) ds
and the consequent norm inequality ‖f(t) − f(a)‖ ≤ (t − a)‖f′‖∞. Assuming
this, we see that our original functional G is just the restriction of K to this sub-
space (isomorphic to) V, and hence is differentiable with
dG_f(h) = ∫ₐᵇ [dF¹(h(t)) + dF²(h′(t))] dt.
This differential dG_f is called the first variation of G about f.
The fixed endpoints α and β for the arc f determine in turn a closed plane P
in V, for the evaluation maps (coordinate projections) π_x: f ↦ f(x) are bounded
and P is the intersection of the hyperplanes π_a = α and π_b = β. Since P is a
translate of the subspace M = {f ∈ V : f(a) = f(b) = 0}, our constrained
maximum equation is
dG_f(h) = ∫ₐᵇ [dF¹(h(t)) + dF²(h′(t))] dt = 0
for all h in M.
We come now to the special trick of the calculus of variations, called the
lemma of Du Bois-Reymond.
Suppose for simplicity that W = ℝ. Then F is a function F(x, y, t) of three
real variables, the partial differentials are equivalent to ordinary partial deriva-
tives, and our critical-point equation is
dG_f(h) = ∫ₐᵇ (∂F/∂x · h + ∂F/∂y · h′) = 0.
If we integrate the first term in the integral by parts and remember that h(a) =
h(b) = 0, we see that the equation becomes
∫ₐᵇ [∂F/∂y − ∫ₐᵗ ∂F/∂x] g = 0,
where g = h′. Since h is an arbitrary continuously differentiable function except
for the constraints h(a) = h(b) = 0, we see that g is an arbitrary continuous
function except for the constraint ∫ₐᵇ g(t) dt = 0. That is, ∂F/∂y − ∫ ∂F/∂x is
orthogonal to the null space N of the linear functional g ↦ ∫ₐᵇ g(t) dt. Since the
one-dimensional space N⊥ is clearly the set of constant functions,
our condition becomes
(∂F/∂y)(f(t), f′(t), t) = ∫ₐᵗ (∂F/∂x)(f(s), f′(s), s) ds + C.
This equation implies, in particular, that the left member is differentiable. This
is not immediately apparent, since f′ is only assumed to be continuous. Differ-
entiating, we conclude finally that f is a critical point of the mapping G if and
only if it is a solution of the differential equation
(d/dt)(∂F/∂y)(f(t), f′(t), t) = (∂F/∂x)(f(t), f′(t), t),
which is called the Euler equation of the variational problem. It is an ordinary
differential equation for the unknown function f; when the indicated derivative
is computed, it takes the form
(∂²F/∂y²) f″ + (∂²F/∂y∂x) f′ + ∂²F/∂y∂t − ∂F/∂x = 0.
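
As an aside not in the original text, the Euler equation can be generated
mechanically; the sketch below assumes the Python library sympy and applies its
euler_equations to the arc-length integrand, recovering f″ = 0 (straight lines)
up to a nonzero factor.

import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols('t')
f = sp.Function('f')
F = sp.sqrt(1 + f(t).diff(t)**2)      # arc-length integrand
print(euler_equations(F, f(t), t))    # one equation equivalent to f''(t) = 0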
If W is not ℝ, we get exactly the same result from the general form of the
integration by parts formula (using Theorem 6.3) and a more sophisticated
version of the above argument. (See Exercises 10.14 and 10.15 of Chapter 4.)
That is, the smooth arc f with fixed endpoints α and β is a critical point of the
mapping g ↦ ∫ₐᵇ F(g(t), g′(t), t) dt if and only if it satisfies the Euler differential
equation
(d/dt) dF²_⟨f(t),f′(t),t⟩ = dF¹_⟨f(t),f′(t),t⟩.
This is now a vector-valued equation, with values in W*. If W is finite-dimen-
sional, with dimension n, then a choice of basis makes W* into ℝⁿ, and this
vector equation is equivalent to n scalar equations
(d/dt)(∂F/∂yᵢ)(f(t), f′(t), t) = (∂F/∂xᵢ)(f(t), f′(t), t), i = 1, ..., n,
where F is now a function of 2n + 1 real variables,
F(x, y, t) = F(x₁, ..., xₙ, y₁, ..., yₙ, t).
Finally, let us see what happens to the simpler variational problem (W = ℝ)
when the endpoints of f are not fixed. Now the critical-point equation is dG_f(h) =
0 for all h in V, and when we integrate by parts it becomes
[(∂F/∂y) h]ₐᵇ + ∫ₐᵇ (∂F/∂x − (d/dt)(∂F/∂y)) h = 0
for all h in V. We can reason essentially as above, but a little more closely, to
conclude that a function f is a critical point if and only if it satisfies the Euler
equation
(d/dt)(∂F/∂y) − ∂F/∂x = 0
and also the endpoint conditions
(∂F/∂y)|_{t=a} = (∂F/∂y)|_{t=b} = 0.
This has been only a quick look at the variational calculus, and the interested
reader can pursue it further in treatises devoted to the subject. There are many
more questions of the general type we have considered. For example, we may
want neither fixed nor completely free endpoints but freedom subject to con-
straints. We shall take this up in Chapter 13 in the special case of the varia-
tional equations of mechanics. Or again, f may be a function of two or more
variables and the integral may be a multiple integral. In this case the Euler
equation may become a system of partial differential equations in the unknown f.
Finally, there is the question of sufficient conditions for the critical function to
give a maximum or minimum value to the integral. This will naturally involve a
study of the second differential of the functional G, or its second variation, as it is
known in this subject.
*16. THE SECOND DIFFERENTIAL AND
THE CLASSIFICATION OF CRITICAL POINTS
Suppose that V and W are normed linear spaces, that A is an open subset of V,
and that $F\colon A \to W$ is a continuously differentiable mapping. The first differ-
ential of F is the continuous mapping $dF\colon \gamma \mapsto dF_\gamma$ from A to $\mathrm{Hom}(V, W)$. We
now want to study the differentiability of this mapping at the point $\alpha$. Pre-
sumably, we know what it means to say that dF is differentiable at $\alpha$. By
definition $d(dF)_\alpha$ is a bounded linear transformation T from V to $\mathrm{Hom}(V, W)$
such that $\Delta(dF)_\alpha(\eta) - T(\eta) = o(\eta)$. That is, $dF_{\alpha+\eta} - dF_\alpha - T(\eta)$ is an
element of $\mathrm{Hom}(V, W)$ of norm less than $\epsilon\|\eta\|$ for $\eta$ sufficiently small. We set
$d^2F_\alpha = d(dF)_\alpha$ and repeat: $d^2F_\alpha = d^2F_\alpha(\cdot)$ is a linear map from V to $\mathrm{Hom}(V, W)$,
$d^2F_\alpha(\eta) = d^2F_\alpha(\eta)(\cdot)$ is an element of $\mathrm{Hom}(V, W)$, and $d^2F_\alpha(\eta)(\xi)$ is a vector
in W. Also, we know that $d^2F_\alpha$ is equivalent to a bounded bilinear map
$$\omega\colon V \times V \to W,$$
where $\omega(\eta, \xi) = d^2F_\alpha(\eta)(\xi)$.
The vector $d^2F_\alpha(\eta)(\xi)$ clearly ought to be some kind of second derivative of
F at $\alpha$, and the reader might even conjecture that it is the mixed derivative in
the directions $\xi$ and $\eta$.
Theorem 16.1. If $F\colon A \to W$ is continuously differentiable, and if the
second differential $d^2F_\alpha$ exists, then for each fixed $\mu \in V$ the function
$D_\mu F\colon \gamma \mapsto D_\mu F(\gamma)$ from A to W is differentiable at $\alpha$, and $D_\nu(D_\mu F)(\alpha) =
\big(d^2F_\alpha(\nu)\big)(\mu)$.
Proof. We use the evaluation-at-$\mu$ map $\mathrm{ev}_\mu\colon \mathrm{Hom}(V, W) \to W$ defined for a
fixed $\mu$ in V by $\mathrm{ev}_\mu(T) = T(\mu)$. It is a bounded linear mapping. Then
$$(D_\mu F)(\alpha) = dF_\alpha(\mu) = \mathrm{ev}_\mu(dF_\alpha) = (\mathrm{ev}_\mu \circ dF)(\alpha),$$
so that the function $D_\mu F$ is the composition $\mathrm{ev}_\mu \circ dF$. It is differentiable at $\alpha$
because $d(dF)_\alpha$ exists and $\mathrm{ev}_\mu$ is linear. Thus $\big(D_\nu(D_\mu F)\big)(\alpha) = d(D_\mu F)_\alpha(\nu) =
d(\mathrm{ev}_\mu \circ dF)_\alpha(\nu) = \big(\mathrm{ev}_\mu \circ d(dF)_\alpha\big)(\nu) = \mathrm{ev}_\mu\big[(d^2F_\alpha)(\nu)\big] = \big(d^2F_\alpha(\nu)\big)(\mu)$. $\square$
The reader must remember in going through the above argument that $D_\mu F$
is the function $(D_\mu F)(\cdot)$, and he might prefer to use this notation, as follows:
$$D_\nu\big((D_\mu F)(\cdot)\big)\big|_\alpha = d\big((D_\mu F)(\cdot)\big)_\alpha(\nu) = d(\mathrm{ev}_\mu \circ dF(\cdot))_\alpha(\nu)
= [\mathrm{ev}_\mu \circ d(dF(\cdot))_\alpha](\nu) = \mathrm{ev}_\mu\big(d^2F_\alpha(\nu)\big).$$
If the domain space V is the Cartesian space $\mathbb{R}^n$, then the differentiability of
$(D_{\delta_j}F)(\cdot) = (\partial F/\partial x_j)(\cdot)$ at a implies the existence of the second partial deriva-
tives $(\partial^2 F/\partial x_i\,\partial x_j)(a)$ by Theorem 9.2, and with b and c fixed, we then have
$$D_c(D_b F) = D_c\left(\sum_i b_i\,\frac{\partial F}{\partial x_i}\right) = \sum_i b_i\,D_c\,\frac{\partial F}{\partial x_i}
= \sum_i b_i \left(\sum_j c_j\,\frac{\partial}{\partial x_j}\left(\frac{\partial F}{\partial x_i}\right)\right) = \sum_{i,j} b_i c_j\,\frac{\partial^2 F}{\partial x_j\,\partial x_i}.$$
Thus,
Corollary 1. If $V = \mathbb{R}^n$ in the above theorem, then the existence of $d^2F_a$
implies the existence of all the second partial derivatives $(\partial^2 F/\partial x_i\,\partial x_j)(a)$,
and
$$d^2F_a(b, c) = D_b D_c F(a) = \sum_{i,j} b_i c_j\,\frac{\partial^2 F}{\partial x_j\,\partial x_i}\,(a).$$
Moreover, from the above considerations and Theorem 9.3 we can also
conclude that:
Theorem 16.2. If $V = \mathbb{R}^n$, and if all the second partial derivatives
$(\partial^2 F/\partial x_i\,\partial x_j)(a)$ exist and are continuous on the open set A, then the
second differential $d^2F_a$ exists on A and is continuous.
Proof. We have directly from Theorem 9.3 that each first partial derivative
$(\partial F/\partial x_j)(\cdot)$ is differentiable. But $\partial F/\partial x_j = \mathrm{ev}_{\delta_j} \circ dF$, and the theorem is then a
consequence of the following general principle. $\square$
Lemma. If $\{S_i\}_1^k$ is a finite collection of linear maps on a vector space W
such that $S = \langle S_1, \ldots, S_k \rangle$ is invertible, then a mapping $F\colon A \to W$ is
differentiable at a if and only if $S_i \circ F$ is differentiable at a for all i.
Proof. For then $S \circ F$ and $F = S^{-1} \circ S \circ F$ are differentiable, by Theorems 8.1
and 6.2. $\square$
These considerations clearly extend to any number of differentiations. Thus,
if $d^2F(\cdot)\colon \gamma \mapsto d^2F_\gamma$ is differentiable at a, then for fixed b and c the evaluation
$d^2F(\cdot)(b, c)$ is differentiable at a, and the formula
$$d^2F(\cdot)(b, c) = \mathrm{ev}_{\langle b,\,c \rangle} \circ d^2F(\cdot)$$
shows (for special choices of b and c) that all the second partials $(\partial^2 F/\partial x_j\,\partial x_i)(\cdot)$
are differentiable at a, with
$$D_{\delta_k}\!\left(\frac{\partial^2 F}{\partial x_j\,\partial x_i}\right)(a) = d^3F_a(\delta_k)(\delta_j, \delta_i) = \frac{\partial^3 F}{\partial x_k\,\partial x_j\,\partial x_i}\,(a).$$
Conversely, if all the third partials exist and are continuous on A, then the
second partials are differentiable on A by Theorem 9.3, and then $d^2F(\cdot)$ is
differentiable by the lemma, since $(\partial^2 F/\partial x_i\,\partial x_j)(\cdot) = \mathrm{ev}_{\langle \delta_i, \delta_j \rangle} \circ d^2F(\cdot)$.
As the reader will remember, it is crucially important in working with
higher-order derivatives that $\partial^2 F/\partial x_i\,\partial x_j = \partial^2 F/\partial x_j\,\partial x_i$, and we very much
need the same theorem here.
Theorem 16.3. The second differential is a symmetric function of its two
arguments: $\big(d^2F_\alpha(\eta)\big)(\xi) = \big(d^2F_\alpha(\xi)\big)(\eta)$.
Proof. By the definition of $d(dF)_\alpha$, given $\epsilon$, there is a $\delta$ such that
$$\|\Delta(dF)_\alpha(\eta) - d^2F_\alpha(\eta)\| \le \epsilon\|\eta\|$$
whenever $\|\eta\| \le \delta$. Of course, $\Delta(dF)_\alpha(\eta) = dF_{\alpha+\eta} - dF_\alpha$. If we write down
the same inequality with $\eta$ replaced by $\eta + \zeta$, then the difference of the trans-
formations in the left members of the two inequalities is
$$dF_{\alpha+\eta+\zeta} - dF_{\alpha+\eta} - d^2F_\alpha(\zeta),$$
and the triangle inequality therefore implies that
$$\|dF_{\alpha+\eta+\zeta} - dF_{\alpha+\eta} - d^2F_\alpha(\zeta)\| \le 2\epsilon(\|\eta\| + \|\zeta\|),$$
provided that both $\eta$ and $\eta + \zeta$ have norms at most $\delta$. We shall take $\|\zeta\| \le \delta/3$
and $\|\eta\| \le 2\delta/3$. If we hold $\zeta$ fixed, and if we set $T = d^2F_\alpha(-\zeta)$ and $G(\xi) =
F(\xi) - F(\xi + \zeta)$, then this inequality becomes $\|dG_{\alpha+\eta} - T\| \le 2\epsilon(\|\eta\| + \|\zeta\|)$,
and since it holds whenever $\|\eta\| \le 2\delta/3$, we can apply the corollary to Theorem
7.4 and conclude that $\|\Delta G_{\alpha+\eta}(\xi) - T(\xi)\| \le 2\epsilon(\|\eta\| + \|\zeta\|)\,\|\xi\|$, provided that $\eta$
and $\eta + \xi$ have norms at most $2\delta/3$. This inequality therefore holds if $\eta$, $\zeta$, and $\xi$
all have norms at most $\delta/3$. If we now set $\zeta = -\eta$, we have
$$\|\Delta G_{\alpha+\eta}(\xi) - d^2F_\alpha(\eta)(\xi)\| \le 4\epsilon\|\eta\|\,\|\xi\|,$$
and $\Delta G_{\alpha+\eta}(\xi) = F(\alpha + \eta + \xi) - F(\alpha + \eta) - F(\alpha + \xi) + F(\alpha)$. This function
of $\eta$ and $\xi$ is called the second difference of F at $\alpha$, and is designated $\Delta^2F_\alpha(\eta, \xi)$.
Note that it is symmetric in $\xi$ and $\eta$. Our final inequality can now be rewritten as
$$\|\Delta^2F_\alpha(\eta, \xi) - d^2F_\alpha(\eta)(\xi)\| \le 4\epsilon\|\eta\|\,\|\xi\|.$$
Reversing $\eta$ and $\xi$, and using the symmetry of $\Delta^2F_\alpha$, we see that
$$\|d^2F_\alpha(\eta)(\xi) - d^2F_\alpha(\xi)(\eta)\| \le 8\epsilon\|\eta\|\,\|\xi\|,$$
provided $\eta$ and $\xi$ have norms at most $\delta/3$. But now it follows by the usual
homogeneity argument that this inequality holds for all $\eta$ and $\xi$. Finally, since
$\epsilon$ is arbitrary, the left-hand side is zero. $\square$
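The mechanism of the proof can also be watched numerically. The following sketch (ours, in plain Python, with a sample F chosen only for illustration) forms the second difference $\Delta^2F_a(\eta, \xi)$, which is symmetric in its arguments by inspection; scaling $\eta$ and $\xi$ by a small t and dividing by $t^2$ approximates $d^2F_a(\eta)(\xi)$, and that is exactly how the symmetry is transferred in the proof:

```python
# Second difference D2(h, k) = F(a+h+k) - F(a+h) - F(a+k) + F(a).
def second_difference(F, a, h, k):
    add = lambda u, v: tuple(ui + vi for ui, vi in zip(u, v))
    return F(add(a, add(h, k))) - F(add(a, h)) - F(add(a, k)) + F(a)

F = lambda p: p[0]**3 * p[1] + p[1]**2            # a sample smooth F on R^2
a, t = (1.0, 2.0), 1e-4
h, k = (t, 0.0), (0.0, t)                         # t * (basis directions)

print(second_difference(F, a, h, k) / t**2)       # ~ d2F_a(e1)(e2) = 3
print(second_difference(F, a, k, h) / t**2)       # identical, by symmetry
```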
The reader will remember from the elementary calculus that a critical
point a for a function f [f'(a) = 0] is a relative extremum point if the second
derivative f''(a) exists and is not zero. In fact, if f''(a) < 0, then f has a relative
maximum at a, because f''(a) < 0 implies that f' is decreasing in a neighborhood
of a and the graph of f is therefore concave down in a neighborhood of a. Simi-
larly, f has a relative minimum at a if f'(a) = 0 and f''(a) > 0. If f''(a) = 0,
nothing can be concluded.
If f is a real-valued function defined on an open set A in a finite-dimensional
vector space V, if $\alpha \in A$ is a critical point of f, and if $d^2f_\alpha$ exists and is a non-
singular element of $\mathrm{Hom}(V, V^*)$, then we can draw similar conclusions about the
behavior of f near $\alpha$, only now there is a richer variety of possibilities. The
reader is probably already familiar with what happens for a function f from $\mathbb{R}^2$
to $\mathbb{R}$. Then $\alpha$ may be a relative maximum point (a "cap" point on the graph
of f), a relative minimum point, or a saddle point, as shown in Fig. 3.13 for the
graph of the translated function $\Delta f_\alpha$. However, it must be realized that new
axes may have to be chosen for the orientation of the saddle relative to the axes
to look as shown. Replacing f by $\Delta f_\alpha$ amounts to supposing that 0 is the critical
point and that f(0) = 0. Note that if 0 is a saddle point, then there are two
complementary subspaces, the coordinate axes in Fig. 3.13, such that 0 is a
relative maximum point for f when f is restricted to one of them, and a relative
minimum point for the restriction of f to the other.
Fig. 3.13
We shall now investigate the general case and find that it is just like the
two-dimensional case, except that when there is a saddle point the subspace on
which the critical point is a maximum point may have any dimension from 1
to n - 1 [where $\dim V = n$]. Moreover, this dimension is exactly the number of
$-1$'s in the standard orthonormal basis representation of the quadratic form
$q(\xi) = \omega(\xi, \xi) = d^2f_\alpha(\xi, \xi)$.
Our hypotheses, then, are that f is a continuously differentiable real-valued
function on an open subset A of a finite-dimensional normed linear space V, that
$\alpha \in A$ is a critical point for f ($df_\alpha = 0$), and that the mapping $d^2f_\alpha\colon V \to V^*$
exists and is nonsingular. This last hypothesis is equivalent to assuming that the
bilinear form $\omega(\xi, \eta) = d^2f_\alpha(\xi, \eta)$ has a nonsingular matrix with respect to any
basis for V. We now use Theorem 7.1 of Chapter 2 to choose an $\omega$-orthonormal
basis $\{\alpha_i\}_1^n$. Remember that this means that $\omega(\alpha_i, \alpha_j) = 0$ if $i \ne j$, $\omega(\alpha_i, \alpha_i) = 1$
for $i = 1, \ldots, p$, and $\omega(\alpha_i, \alpha_i) = -1$ for $i = p + 1, \ldots, n$. There cannot be
any 0 values for $\omega(\alpha_i, \alpha_i)$, because the matrix $t_{ij} = \omega(\alpha_i, \alpha_j)$ is nonsingular: if
$\omega(\alpha_i, \alpha_i) = 0$, then the whole ith column is zero, the column space has dimen-
sion $\le n - 1$, and the matrix is singular.
We can use the basis isomorphism $\varphi$ to replace V by $\mathbb{R}^n$ (i.e., replace f by
$f \circ \varphi$), and we can therefore suppose that $V = \mathbb{R}^n$ and that the standard basis is
$\omega$-orthonormal, with $\omega(x, y) = \sum_1^p x_iy_i - \sum_{p+1}^n x_iy_i$. Since
$$\omega(\delta_i, \delta_j) = d^2f_a(\delta_i, \delta_j) = D_{\delta_i}D_{\delta_j}f(a) = \frac{\partial^2 f}{\partial x_i\,\partial x_j}\,(a),$$
our hypothesis of $\omega$-orthonormality is that $(\partial^2 f/\partial x_i\,\partial x_j)(a) = 0$ for $i \ne j$,
$\partial^2 f/\partial x_i^2 = 1$ for $i = 1, \ldots, p$, and $\partial^2 f/\partial x_i^2 = -1$ for $i = p + 1, \ldots, n$.
Since p can have any value from 0 to n, there are n + 1 possibilities. We show
first that if $p = n$, then a is a relative minimum of f. In this case the quadratic
form q is said to be positive definite, since $q(x) = \omega(x, x)$ is positive for every
nonzero x. We also say that the bilinear form $\omega(x, y) = d^2f_a(x, y)$ is positive
definite, and, in the language of Chapter 5, that $\omega$ is a scalar product.
Theorem 16.4. Let f be a continuously differentiable real-valued function
defined on an open subset A of $\mathbb{R}^n$, and let $a \in A$ be a critical point of f at
which $d^2f_a$ exists and is positive definite. Then f has a relative minimum at a.
Proof. We suppose, as above, that the standard basis $\{\delta_i\}_1^n$ is $\omega$-orthonormal.
By the definition of $d^2f_a$, given $\epsilon$, there is a $\delta$ such that
$$\|df_{a+y} - df_a - d^2f_a(y)\| \le \epsilon\|y\|$$
whenever $\|y\| \le \delta$. Now $df_a = 0$, since a is a critical point of f, and $d^2f_a(x, y) =
\sum_1^n x_iy_i$, by the assumption that $\{\delta_i\}_1^n$ is $\omega$-orthonormal. Therefore, if we use
the two-norm on $\mathbb{R}^n$ and set $y = tx$, we have
$$(1 - \epsilon)\,t\|x\|^2 \le df_{a+tx}(x) \le (1 + \epsilon)\,t\|x\|^2.$$
Also, if $h(t) = f(a + tx)$, then $h'(s) = df_{a+sx}(x)$, and this inequality therefore
says that $(1 - \epsilon)t\|x\|^2 \le h'(t) \le (1 + \epsilon)t\|x\|^2$. Integrating, and remembering
that $h(1) - h(0) = f(a + x) - f(a) = \Delta f_a(x)$, we have
$$\left(\frac{1 - \epsilon}{2}\right)\|x\|^2 \le \Delta f_a(x) \le \left(\frac{1 + \epsilon}{2}\right)\|x\|^2$$
whenever $\|x\| \le \delta$. This shows not only that a is a relative minimum point
but also that $\Delta f_a$ lies between two very close paraboloids when x is sufficiently
small. $\square$
The above argument will work just as well in general. If
$$q(x) = \sum_1^p x_i^2 - \sum_{p+1}^n x_i^2$$
is the quadratic form of the second differential and $\|x\|_2^2 = \sum_1^n x_i^2$, then replac-
ing $\|x\|^2$ inside the absolute values in the above inequalities by $q(x)$, we conclude
that
$$\frac{q(x) - \epsilon\|x\|^2}{2} \le \Delta f_a(x) \le \frac{q(x) + \epsilon\|x\|^2}{2},$$
or
$$\frac{1}{2}\left(\sum_1^p (1 - \epsilon)x_i^2 - \sum_{p+1}^n (1 + \epsilon)x_i^2\right) \le \Delta f_a(x)
\le \frac{1}{2}\left(\sum_1^p (1 + \epsilon)x_i^2 - \sum_{p+1}^n (1 - \epsilon)x_i^2\right).$$
This shows that $\Delta f_a$ lies between two very close quadratic surfaces of the
same type when $\|x\| \le \delta$. If $1 \le p \le n - 1$ and $a = 0$, then f has a relative
minimum on the subspace $V_1 = L(\{\delta_i\}_1^p)$ and a relative maximum on the
complementary space $V_2 = L(\{\delta_i\}_{p+1}^n)$.
According to our remarks at the end of Section 2.7, we can read off the type
of a critical point for a function of two variables without orthonormalizing by
looking at the determinant of the matrix of the (assumed nonsingular) form $d^2f_a$.
This determinant is
$$t_{11}t_{22} - (t_{12})^2 = \frac{\partial^2 f}{\partial x_1^2}\,\frac{\partial^2 f}{\partial x_2^2} - \left(\frac{\partial^2 f}{\partial x_1\,\partial x_2}\right)^2.$$
If it is positive, then a is either a relative minimum or a relative maximum. We
can tell which by following f along a single line, say the $x_1$-axis. Thus, if
$\partial^2 f/\partial x_1^2 < 0$, then a is a relative maximum point. On the other hand, if the
above expression is negative, then a is a saddle point.
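The test is mechanical enough to state as a few lines of code. The sketch below is our own illustration (the sample function and values are hypothetical):

```python
# Classify a critical point of a function of two variables from the second
# partials t11 = d2f/dx1^2, t22 = d2f/dx2^2, t12 = d2f/dx1 dx2 at the point.
def classify(t11, t22, t12):
    det = t11 * t22 - t12**2
    if det < 0:
        return "saddle point"
    if det > 0:
        return "relative maximum" if t11 < 0 else "relative minimum"
    return "form is singular; the test does not apply"

# f(x1, x2) = x1^2 - x2^2 at the origin: t11 = 2, t22 = -2, t12 = 0.
print(classify(2.0, -2.0, 0.0))    # saddle point
```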
It is important for the calculus of variations that Theorem 16.4 remains
true when the domain space is replaced by a space of the general type that we
shall study in the next chapter, called a Banach space. The hypotheses now are
that a is a critical point of f, that $q(\xi) = d^2f_a(\xi, \xi)$ is positive definite, and that
the scalar product norm $q^{1/2}$ (see Chapter 5) is equivalent to the given norm on
V. The proof remains virtually unchanged.
*17. HIGHER ORDER DIFFERENTIALS. THE TAYLOR FORMULA
We have seen that if V and W are normed linear spaces, and if F is a differ-
entiable mapping from an open set A in V to W, then its differential $dF = dF_{(\cdot)}$
is a mapping from A to $\mathrm{Hom}(V, W)$. If this mapping is differentiable on A, then
its differential $d(dF) = d(dF_{(\cdot)})$ is a mapping from A to $\mathrm{Hom}(V, \mathrm{Hom}(V, W))$.
We remember that an element of $\mathrm{Hom}(V, \mathrm{Hom}(V, W))$ is equivalent by duality
to a bilinear mapping from $V \times V$ to W, and if we designate the space of all such
bilinear mappings by $\mathrm{Hom}^2(V, W)$, then $d(dF)$ can be considered to be from A to
$\mathrm{Hom}^2(V, W)$. We write $d(dF) = d^2F$, and call this mapping the second differ-
ential of F. In Section 16 we saw that $d^2F_\alpha(\xi, \eta) = D_\xi(D_\eta F)(\alpha)$, and that if
$V = \mathbb{R}^n$, then
$$d^2F_a(b, c) = D_bD_cF(a) = \sum_{i,j} b_ic_j\,\frac{\partial^2 F}{\partial x_j\,\partial x_i}\,(a).$$
The differentials of higher order are defined in the same way. If
$$d^2F\colon A \to \mathrm{Hom}^2(V, W)$$
is differentiable on A, then its differential, $d(d^2F) = d^3F$, is from A to
$\mathrm{Hom}(V, \mathrm{Hom}^2(V, W)) = \mathrm{Hom}^3(V, W)$, the space of all trilinear mappings from
$V^3 = V \times V \times V$ to W. Continuing inductively, we arrive at the notion of the nth
differential of F on A as a mapping from A to $\mathrm{Hom}(V, \mathrm{Hom}^{n-1}(V, W)) =
\mathrm{Hom}^n(V, W)$, the space of all n-linear mappings from $V^n$ to W. The theorem
that $d^2F_\alpha$ is a symmetric element of $\mathrm{Hom}^2(V, W)$ extends inductively to show
that $d^nF_\alpha$ is a symmetric element of $\mathrm{Hom}^n(V, W)$. We shall omit this proof.
Our theorem on the evaluation of the second differential by mixed directional
derivatives also generalizes by induction to give
$$D_{\xi_1} \cdots D_{\xi_n}F(\alpha) = d^nF_\alpha(\xi_1, \ldots, \xi_n),$$
for, starting from the left-hand term, we have
$$\begin{aligned}
D_{\xi_1}\big(D_{\xi_2} \cdots D_{\xi_n}F\big)(\cdot)\big|_\alpha &= d\big(D_{\xi_2} \cdots D_{\xi_n}F(\cdot)\big)_\alpha(\xi_1) \\
&= d\big(d^{n-1}F_{(\cdot)}(\xi_2, \ldots, \xi_n)\big)_\alpha(\xi_1) \\
&= d\big(\mathrm{ev}_{\langle \xi_2, \ldots, \xi_n \rangle} \circ d^{n-1}F(\cdot)\big)_\alpha(\xi_1) \\
&= \big[\mathrm{ev}_{\langle \xi_2, \ldots, \xi_n \rangle} \circ d\big(d^{n-1}F(\cdot)\big)_\alpha\big](\xi_1) \\
&= \mathrm{ev}_{\langle \xi_2, \ldots, \xi_n \rangle}\big(d^nF_\alpha(\xi_1)\big) \\
&= \big(d^nF_\alpha(\xi_1)\big)(\xi_2, \ldots, \xi_n) = d^nF_\alpha(\xi_1, \ldots, \xi_n).
\end{aligned}$$
If $V = \mathbb{R}^n$, then our conclusions about partial derivatives extend inductively
in the same way to show that F has continuous differentials on A up through
order m if and only if all the mth-order partial derivatives $\partial^m F/\partial x_{i_1} \cdots \partial x_{i_m}$
exist and are continuous on A, with
$$d^mF_a(c^1, \ldots, c^m) = \sum_{i_1, \ldots, i_m = 1}^n c^1_{i_1} \cdots c^m_{i_m}\,\frac{\partial^m F}{\partial x_{i_1} \cdots \partial x_{i_m}}\,(a).$$
We now consider the behavior of F along the line $t \mapsto \alpha + t\eta$, where, of
course, $\alpha$ and $\eta$ are fixed. If $\lambda(t) = F(\alpha + t\eta)$, then we can prove by induction
that
$$\frac{d^m\lambda}{dt^m}\,(t) = D_\eta^m F(\alpha + t\eta).$$
We know this to be true for $j = 1$ by Theorem 7.2, and assuming it for $j = m$,
we have, by the same theorem,
$$\frac{d^{m+1}\lambda}{dt^{m+1}} = \left(\frac{d^m\lambda}{dt^m}\right)'(t) = d\big(D_\eta^m F\big)_{\alpha+t\eta}(\eta) = D_\eta\big(D_\eta^m F\big)(\alpha + t\eta) = D_\eta^{m+1}F(\alpha + t\eta).$$
Now suppose that F is real-valued ($W = \mathbb{R}$). We then have Taylor's formula:
$$\lambda(t) = \lambda(0) + t\lambda'(0) + \cdots + \frac{t^m}{m!}\,\lambda^{(m)}(0) + \frac{t^{m+1}}{(m+1)!}\,\lambda^{(m+1)}(kt)$$
for some k between 0 and 1. Taking $t = 1$ and substituting from above, we have
$$F(\alpha + \eta) = F(\alpha) + D_\eta F(\alpha) + \cdots + \frac{1}{m!}\,D_\eta^m F(\alpha) + \frac{1}{(m+1)!}\,D_\eta^{m+1}F(\alpha + k\eta),$$
which is the general Taylor formula in the normed linear space context. In
terms of differentials, it is
$$F(\alpha + \eta) = F(\alpha) + dF_\alpha(\eta) + \cdots + \frac{1}{m!}\,d^mF_\alpha(\eta, \ldots, \eta)
+ \frac{1}{(m+1)!}\,d^{m+1}F_{\alpha+k\eta}(\eta, \ldots, \eta).$$
If $V = \mathbb{R}^n$, then $D_yG = \sum_1^n y_i\,\partial G/\partial x_i$, and so the general term in the
Taylor expansion is
$$\frac{1}{m!}\left(\sum_1^n y_i\,\frac{\partial}{\partial x_i}\right)^m F(a) = \frac{1}{m!} \sum_{i_1, \ldots, i_m = 1}^n y_{i_1} \cdots y_{i_m}\,\frac{\partial^m F}{\partial x_{i_1} \cdots \partial x_{i_m}}\,(a).$$
If $m = n = 2$, and if we use the notation $\mathbf{x} = \langle x, y \rangle$, $\mathbf{s} = \langle s, t \rangle$, then
$$\frac{1}{2!}\,D_{\mathbf{s}}^2 F(a) = \frac{1}{2}\left[s^2\,\frac{\partial^2 F}{\partial x^2}\,(a) + 2st\,\frac{\partial^2 F}{\partial x\,\partial y}\,(a) + t^2\,\frac{\partial^2 F}{\partial y^2}\,(a)\right].$$
The above description is logically simple, but it is inefficient in that it repeats
identical terms such as $y_1y_2(\partial^2 F/\partial x_1\,\partial x_2)$ and $y_2y_1(\partial^2 F/\partial x_2\,\partial x_1)$. We conclude
by describing for the interested reader the modern "multi-index" notation for this
very complicated situation.
Remember that we are looking at the mth term of the Taylor formula for F,
and that F has n variables.
For any n-tuple $k = \langle k_1, \ldots, k_n \rangle$ of nonnegative integers, we define $|k|$
as $\sum_1^n k_i$, and for $x \in \mathbb{R}^n$ we set $x^k = x_1^{k_1}x_2^{k_2} \cdots x_n^{k_n}$. Also we set $F_k =
F_{k_1k_2\cdots k_n}$, or better, if $D_jF = \partial F/\partial x_j$, we set
$$D^kF = D_1^{k_1}D_2^{k_2} \cdots D_n^{k_n}F = F_k.$$
Finally, we set $k! = k_1!\,k_2! \cdots k_n!$, and if $p \ge |k|$, we set $\binom{p}{k} = p!/\big(k!\,(p - |k|)!\big)$.
Then the mth term of the Taylor expansion of F is
$$\frac{1}{m!} \sum_{|k| = m} \binom{m}{k}\,D^kF(a)\,x^k,$$
which is surely a notational triumph.
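Since $|k| = m$ forces $\binom{m}{k} = m!/k!$, the mth term is simply $\sum_{|k|=m} D^kF(a)\,x^k/k!$. The following sketch (ours, assuming the SymPy library) computes it by enumerating multi-indices:

```python
import sympy as sp
from itertools import product
from math import factorial

def taylor_term(F, xs, a, m):
    """The mth multi-index Taylor term: sum over |k| = m of D^k F(a) x^k / k!."""
    term = sp.Integer(0)
    for k in product(range(m + 1), repeat=len(xs)):
        if sum(k) != m:
            continue
        DkF = F
        for xi, ki in zip(xs, k):
            DkF = sp.diff(DkF, xi, ki)        # apply D_i^{k_i}
        kfact = 1
        for ki in k:
            kfact *= factorial(ki)            # k! = k_1! k_2! ... k_n!
        xk = sp.Mul(*[xi**ki for xi, ki in zip(xs, k)])
        term += DkF.subs(dict(zip(xs, a))) * xk / kfact
    return sp.expand(term)

x, y = sp.symbols('x y')
print(taylor_term(sp.sin(x + y**2), (x, y), (0, 0), 3))   # -x**3/6
```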
The general Taylor formula is too cumbersome to be of much use in practice;
it is principally of theoretical value. The Taylor expansions that we actually
compute are generally found by other means, such as substitution of a poly-
nomial (or power series) in a power series. For example,
$$\sin(x + y^2) = (x + y^2) - \frac{(x + y^2)^3}{3!} + \frac{(x + y^2)^5}{5!} - \cdots
= x + y^2 - \frac{x^3}{3!} - \frac{x^2y^2}{2} + \left(\frac{x^5}{5!} - \frac{xy^4}{2}\right) + \left(\frac{x^4y^2}{4!} - \frac{y^6}{3!}\right) + \cdots$$
A mapping from A to W which has continuous differentials of all orders
through k is said to be of class $C^k$ on A, and the collection of all such mappings
is designated $C^k(A, W)$ or $\mathfrak{C}^k(A, W)$. It is clear that $C^k(A, W)$ is a vector space
(induction). Moreover, it can also be shown by induction that a composition
of $C^k$-maps is itself of class $C^k$. This depends on recognizing the general form of
the mth differential of a composition $F \circ G$ as being a finite sum, each term of
which is a composition of functions chosen from $F, dF, \ldots, d^mF, G, dG, \ldots, d^mG$.
Functions of many variables are involved in these calculations, and it is
simplest to treat each as a function of a single n-tuple variable and to apply the
obvious corollary of Theorem 8.1 that if $G_1, \ldots, G_n$ are of class $C^k$, then so is
$G = \langle G_1, \ldots, G_n \rangle$, with $d^kG = \langle d^kG_1, \ldots, d^kG_n \rangle$. As a special case of
composition, we can conclude that a product of $C^k$-maps is of class $C^k$.
We shall see in the next chapter that $\varphi\colon T \mapsto T^{-1}$ is a differentiable map on
the open set of invertible elements in $\mathrm{Hom}\,V$ (if V is a Banach space) and that
$d\varphi_T(H) = -T^{-1}HT^{-1}$. Since $\langle S, H, T \rangle \mapsto S^{-1}HT^{-1}$ then has continuous par-
tial differentials, we can continue, and another induction shows that $\varphi$ is of class
$C^k$ for every k and that $d^m\varphi_T(H_1, \ldots, H_m)$ is a finite sum of finite products of
$T^{-1}, H_1, \ldots, H_m$. It then follows that a function F defined implicitly by a $C^k$-
function G is also of class $C^k$, for its differential, as computed in the implicit-
function theorem, is then a composition of maps of class $C^{k-1}$.
A mapping F which is of class $C^k$ for all k is said to be of class $C^\infty$, and it
follows from our remarks above that the family of $C^\infty$-maps is closed under all
the operations that we have met in the calculus. If the domain of F is an open
set in $\mathbb{R}^n$, then $F \in C^\infty(A, W)$ if and only if all the partial derivatives of F exist
and are continuous on A.
CHAPTER 4
COMPACTNESS AND COMPLETENESS
In this chapter we shall investigate two properties of subsets of a normed linear
space V which are concerned with the fact that in a certain sense all the points
which ought to be there really are there. These notions are largely independent
of the algebraic structure of V, and we shall therefore study them in their own
most natural setting, that of metric spaces. The stronger of these two properties,
compactness, helps to explain why the theory of finite-dimensional spaces is so
simple and satisfactory. The weaker property, completeness, is shared by
important infinite-dimensional normed linear spaces, and allows us to treat
these spaces in almost as satisfactory a way.
It is these properties that save the calculus from being largely a formal
theory. They allow us to define crucial elements by limiting processes, and are
responsible, for example, for an infinite series having a sum, a continuous real-
valued function assuming a maximum value, and a definite integral existing.
For the real number system itself, the compactness property is equivalent to the
least upper bound property, which has already been an absolutely essential tool
in our construction of the differential calculus in Chapter 3.
In Sections 8 through 10 we shall apply completeness to the calculus. The
first of these sections is devoted to the existence and differentiability of functions
defined by power series, and since we want to include power series in an operator
T, we shall take the occasion to introduce and exploit the notion of a Banach
algebra. Next we shall prove the contraction mapping fixed-point theorem, which
is the missing ingredient in our unfinished proof of the implicit-function theorem
in Chapter 3 and which will be the basis for the fundamental existence and
uniqueness theorem for ordinary differential equations in Chapter 6. In Section
10 we shall prove a simple extension theorem for linear mappings into a complete
normed linear space and apply it to construct the Riemann integral of a param-
etrized arc.
1. METRIC SPACES; OPEN AND CLOSED SETS
In the preceding chapter we occasionally treated questions of convergence and
continuity in situations where the domain was an arbitrary subset A of a normed
linear space V. In such discussions the algebraic structure of V fades into the
background, and the vector operations of V are used only to produce the combi-
nation $\|\alpha - \beta\|$, which is interpreted as the distance from $\alpha$ to $\beta$. If we distill
out of these contexts what is essential to the convergence and continuity argu-
ments, we find that we need a space A and a function $\rho\colon A \times A \to \mathbb{R}$, $\rho(x, y)$
being called the distance from x to y, such that
1) $\rho(x, y) > 0$ if $x \ne y$, and $\rho(x, x) = 0$;
2) $\rho(x, y) = \rho(y, x)$ for all $x, y \in A$;
3) $\rho(x, z) \le \rho(x, y) + \rho(y, z)$ for all $x, y, z \in A$.
Any set A together with such a function $\rho$ from $A \times A$ to $\mathbb{R}$ is called a metric
space; the function $\rho$ is the metric. It is obvious that a normed linear space is a
metric space under the norm metric $\rho(\alpha, \beta) = \|\alpha - \beta\|$ and that any subset B
of a metric space A is itself a metric space under the restriction of $\rho$ to $B \times B$.
If we start with a nice intuitive space, like $\mathbb{R}^n$ under one of its standard norms,
and choose a weird subset B, it will be clear that a metric space can be a very
odd object, and may fail to have almost any property one can think of.
Metric spaces very often arise in practice as subsets of normed linear
spaces with the norm metric, but they come from other sources too. Even in the
normed linear space context, metrics other than the norm metric are used.
For example, S might be a two-dimensional spherical surface in $\mathbb{R}^3$, say $S =
\{x : \sum_1^3 x_i^2 = 1\}$, and $\rho(x, y)$ might be the great-circle distance from x to y. Or,
more generally, S might be any smooth two-dimensional surface in $\mathbb{R}^3$, and
$\rho(x, y)$ might be the length of the shortest curve connecting x to y in S.
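The first of these metrics is concrete enough to compute. A small sketch (our own illustration): on the unit sphere the great-circle distance is $\rho(x, y) = \arccos\langle x, y \rangle$, the angle subtended at the center:

```python
import math

def great_circle(x, y):
    # rho(x, y) = arccos(<x, y>) for unit vectors x, y in R^3.
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot)))   # clamp guards rounding error

north, east = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
print(great_circle(north, east))                  # pi/2: a quarter circle
```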
In this chapter we shall adopt the metric space context for our arguments
wherever it is appropriate, so that the student may become familiar with this
more general but very intuitive notion. We begin by reproducing the basic
definitions in the language of metrics. Because the scalar-vector dichotomy is
not a factor in this context, we shall drop our convention that points be repre-
sented by Greek or boldface roman letters and shall use whatever letters we wish.
Definition. If X and Y are metric spaces, then $f\colon X \to Y$ is continuous at
$a \in X$ if for every $\epsilon$ there is a $\delta$ such that
$$\rho(x, a) < \delta \implies \rho\big(f(x), f(a)\big) < \epsilon.$$
Here we have used the same symbol '$\rho$' for metrics on different spaces, just
as earlier we made ambiguous use of the norm symbol.
Definition. The (open) ball of radius r about p, $B_r(p)$, is simply the set of
points whose distance from p is less than r:
$$B_r(p) = \{x : \rho(x, p) < r\}.$$
Definition. A subset $A \subset X$ is open if every point p in A is the center of
some ball included in A, that is, if
$$(\forall p \in A)(\exists r > 0)\big(B_r(p) \subset A\big).$$
Lemma 1.1. Every ball is open; in fact, if $q \in B_r(p)$ and $s = r - \rho(p, q)$,
then $B_s(q) \subset B_r(p)$.
Proof. This amounts to the triangle inequality. For, if $x \in B_s(q)$, then $\rho(x, q) <
s$ and $\rho(x, p) \le \rho(x, q) + \rho(q, p) < s + \rho(p, q) = r$, so that $x \in B_r(p)$.
Thus $B_s(q) \subset B_r(p)$. $\square$
Lemma 1.2. If p is held fixed, then $\rho(p, x)$ is a continuous function of x.
Proof. A symbol-by-symbol paraphrase of Lemma 3.1 of Chapter 3 shows that
$|\rho(p, x) - \rho(p, y)| \le \rho(x, y)$, so that $\rho(p, x)$ is actually a Lipschitz function
with constant 1. $\square$
Theorem 1.1. The family $\mathcal{T}$ of all open subsets of a metric space S has the
following properties:
1) The union of any collection of open sets is open; that is, if $\{A_i : i \in I\} \subset
\mathcal{T}$, then $\bigcup_{i \in I} A_i \in \mathcal{T}$.
2) The intersection of two open sets is open; that is, if $A, B \in \mathcal{T}$, then
$A \cap B \in \mathcal{T}$.
3) $\emptyset, S \in \mathcal{T}$.
Proof. These properties follow immediately from the definition. Thus any
point p in $\bigcup_i A_i$ lies in some $A_j$, and therefore, since $A_j$ is open, some ball about p
is a subset of $A_j$ and hence of the larger set $\bigcup_i A_i$. $\square$
Corollary. A set is open if and only if it is a union of open balls.
Proof. This follows from the definition of open set, the lemma above, and
property (1) of the theorem. $\square$
The union of all the open subsets of an arbitrary set A is an open subset of
A, by (1), and therefore is the largest open subset of A. It is called the interior
of A and is designated $A^{\mathrm{int}}$. Clearly, p is in $A^{\mathrm{int}}$ if and only if some ball about p
is a subset of A, and it is helpful to visualize $A^{\mathrm{int}}$ as the union of all the balls
that are in A.
Definition. A set A is closed if $A'$ is open.
The theorem above and De Morgan's law (Section 0.11) then yield the
following complementary set of properties for closed sets.
Theorem 1.2
1) The intersection of any family of closed sets is closed.
2) The union of two closed sets is closed.
3) $\emptyset$ and S are closed.
Proof. Suppose, for example, that $\{B_i : i \in I\}$ is a family of closed sets. Then
the complement $B_i'$ is open for each i, so that $\bigcup_i B_i'$ is open by the above theorem.
Also, $\bigcup_i B_i' = \big(\bigcap_i B_i\big)'$, by De Morgan's law (see Section 0.11). Thus $\bigcap_i B_i$ is
the complement of an open set and is closed. $\square$
Continuing our "complementary" development, we define the closure, $\bar{A}$, of
an arbitrary set A as the intersection of all closed sets including A, and we have
from (1) above that $\bar{A}$ is the smallest closed set including A. De Morgan's law
implies the important identity
$$(\bar{A})' = (A')^{\mathrm{int}}.$$
For F is a closed superset of A if and only if its complement $U = F'$ is an open
subset of $A'$. By De Morgan's law the complement of the intersection of all such
sets F is the union of all such sets U. That is, the complement of $\bar{A}$ is $(A')^{\mathrm{int}}$.
This identity yields a direct characterization of closure:
Lemma 1.3. A point p is in $\bar{A}$ if and only if every ball about p intersects A.
Proof. A point p is not in $\bar{A}$ if and only if p is in the interior of $A'$, that is, if and
only if some ball about p does not intersect A. Negating the extreme members
of this equivalence gives the lemma. $\square$
Definition. The boundary, $\partial A$, of an arbitrary set A is the difference
between its closure and its interior. Thus
$$\partial A = \bar{A} - A^{\mathrm{int}}.$$
Since $A - B = A \cap B'$, we have the symmetric characterization $\partial A =
\bar{A} \cap \overline{(A')}$. Therefore, $\partial A = \partial(A')$; also,
$p \in \partial A$ if and only if every ball about p intersects both A and $A'$.
Example. A ball $B_r(\alpha)$ is an open set. In a normed linear space the closure of
$B_r(\alpha)$ is the closed ball about $\alpha$ of radius r, $\{\xi : \rho(\xi, \alpha) \le r\}$. This is easily seen
from Lemma 1.3. The boundary $\partial B_r(\alpha)$ is then the spherical surface of radius r
about $\alpha$, $\{\xi : \rho(\xi, \alpha) = r\}$. If some but not all of the points of this surface are
added to the open ball, we obtain a set that is neither open nor closed. The
student should expect that a random set he may encounter will be neither open
nor closed.
Continuous functions furnish an important source of open and closed sets
by the following lemma.
Lemma 1.4. If X and Y are metric spaces, and if f is a continuous mapping
from X to Y, then $f^{-1}[A]$ is open in X whenever A is open in Y.
Proof. If $p \in f^{-1}[A]$, then $f(p) \in A$, and, since A is open, some ball $B_\epsilon\big(f(p)\big)$
is a subset of A. But the continuity of f at p says exactly that there is a $\delta$ such
that $f[B_\delta(p)] \subset B_\epsilon\big(f(p)\big)$. In particular, $f[B_\delta(p)] \subset A$ and $B_\delta(p) \subset f^{-1}[A]$.
Thus for each p in $f^{-1}[A]$ there is a ball about p included in $f^{-1}[A]$, and this set
is therefore open. $\square$
Since $f^{-1}[A'] = \big(f^{-1}[A]\big)'$, we also have the following corollary.
Corollary. If $f\colon X \to Y$ is continuous, then $f^{-1}[C]$ is closed in X whenever C
is closed in Y.
The converses of both of these results hold as well. As an example of the use
of this lemma, consider for a fixed $a \in X$ the continuous function $f\colon X \to \mathbb{R}$
defined by $f(\xi) = \rho(\xi, a)$. The sets $(-r, r)$, $[0, r]$, and $\{r\}$ are respectively open,
closed, and closed subsets of $\mathbb{R}$. Therefore, their inverse images under f (the ball
$B_r(a)$, the closed ball $\{\xi : \rho(\xi, a) \le r\}$, and the spherical surface $\{\xi : \rho(\xi, a) = r\}$)
are respectively open, closed, and closed in X. In particular, the triangle
inequality argument demonstrating directly that $B_r(a)$ is open is now seen to be
unnecessary by virtue of the triangle inequality argument that demonstrates the
continuity of the distance function (Lemma 1.2).
It is not true that continuous functions take closed sets into closed sets in the
forward direction. For example, if $f\colon \mathbb{R} \to \mathbb{R}$ is the arc tangent function, then
$f[\mathbb{R}] = \mathrm{range}\,f = (-\pi/2, \pi/2)$, which is not a closed subset of $\mathbb{R}$. The reader
may feel that this example cheats and that we should only expect the f-image of a
closed set to be a closed subset of the metric space that is the range of f. He
might then consider $f(x) = 2x/(1 + x^2)$ from $\mathbb{R}$ to its range $[-1, 1]$. The set
of positive integers $\mathbb{Z}^+$ is a closed subset of $\mathbb{R}$, but $f[\mathbb{Z}^+] = \{2n/(1 + n^2)\}_1^\infty$ is not
closed in $[-1, 1]$, since 0 is clearly in its closure.
The distance between two nonempty sets A and B, $\rho(A, B)$, is defined as
$\mathrm{glb}\,\{\rho(a, b) : a \in A \text{ and } b \in B\}$. If A and B intersect, the distance is zero.
If A and B are disjoint, the distance may still be zero. For example, the interior
and exterior of a circle in the plane are disjoint open sets whose distance apart is
zero. The x-axis and (the graph of) the function $f(x) = 1/x$ are disjoint closed
sets whose distance apart is zero. As we have remarked earlier, a set A is closed
if and only if every point not in A is a positive distance from A. More generally,
for any set A a point p is in $\bar{A}$ if and only if $\rho(p, A) = 0$.
We list below some simple properties of the distance between subsets of a
normed linear space.
1) Distance is unchanged by a translation: $\rho(A, B) = \rho(A + \gamma, B + \gamma)$
(because $\|(\alpha + \gamma) - (\beta + \gamma)\| = \|\alpha - \beta\|$).
2) $\rho(kA, kB) = |k|\,\rho(A, B)$ (because $\|k\alpha - k\beta\| = |k|\,\|\alpha - \beta\|$).
3) If N is a subspace, then the distance from B to N is unchanged when we
translate B through a vector in N: $\rho(N, B) = \rho(N, B + \eta)$ if $\eta \in N$
(because $N - \eta = N$).
4) If $T \in \mathrm{Hom}(V, W)$, then $\rho(T[A], T[B]) \le \|T\|\,\rho(A, B)$ (because
$\|T(\alpha) - T(\beta)\| \le \|T\| \cdot \|\alpha - \beta\|$).
Lemma 1.5. If N is a proper closed subspace and $0 < \epsilon < 1$, there exists an
$\alpha$ such that $\|\alpha\| = 1$ and $\rho(\alpha, N) > 1 - \epsilon$.
Proof. Choose any $\beta \notin N$. Then $\rho(\beta, N) > 0$ (because N is closed), and there
exists an $\eta \in N$ such that
$$\|\beta - \eta\| < \rho(\beta, N)/(1 - \epsilon)$$
[by the definition of $\rho(\beta, N)$]. Set $\alpha = (\beta - \eta)/\|\beta - \eta\|$. Then $\|\alpha\| = 1$ and
$$\rho(\alpha, N) = \rho(\beta - \eta, N)/\|\beta - \eta\|
= \rho(\beta, N)/\|\beta - \eta\| > \rho(\beta, N)(1 - \epsilon)/\rho(\beta, N) = 1 - \epsilon,$$
by (2), (3), and the definition of $\eta$. $\square$
The reader may feel that we ought to be able to improve this lemma. Surely,
all we have to do is choose the point in N which is closest to $\beta$, and so obtain
$\|\beta - \eta\| = \rho(\beta, N)$, giving finally a vector $\alpha$ such that $\|\alpha\| = 1$ and $\rho(\alpha, N) = 1$.
However, this is a matter on which our intuition lets us down: if N is infinite-
dimensional, there may not be a closest point $\eta$! For example, as we shall see
later in the exercises of Chapter 5, if V is the space $C([-1, 1])$ under the two-
norm $\|f\| = \big(\int_{-1}^1 f^2\big)^{1/2}$, and if N is the set of functions g in V such that $\int_0^1 g = 0$,
then N is a closed subspace for which we cannot find such a "best" $\eta$. But if N is
finite-dimensional, we can always find such a point, and if V is a Hilbert space
(see Chapter 5), we can also.
EXERCISES
1.1 Write out the proof of Lemma 1.2.
1.2 Prove (2) and (3) of Theorem 1.1.
1.3 Prove (2) of Theorem 1.2.
1.4 It is not true that the intersection of a sequence of open sets is necessarily open.
Find a counterexample in $\mathbb{R}$.
1.5 Prove the corollary of Lemma 1.4.
1.6 Prove that $p \in \bar{A}$ if and only if $\rho(p, A) = 0$.
1.7 Let X and Y be metric spaces, and let $f\colon X \to Y$ have the property that $f^{-1}[B]$
is open in X whenever B is open in Y. Prove that f is continuous.
1.8 Show that $\rho(x, A) = \rho(x, \bar{A})$.
1.9 Show that $\rho(x, A)$ is a continuous function of x. (In fact, it is Lipschitz con-
tinuous.)
1.10 Invent metric spaces S (by choosing subsets of $\mathbb{R}^2$) having the following prop-
erties:
1) S has n points.
2) S is infinite and $\rho(x, y) \ge 1$ if $x \ne y$.
3) S has a ball $B_1(a)$ such that the closed ball $\{x : \rho(x, a) \le 1\}$ is not the same as
the closure of $B_1(a)$.
1.11 Prove that in a normed linear space a closed ball is the closure of the corre-
sponding open ball.
1.12 Show that if $f\colon X \to Y$ and $g\colon Y \to Z$ are continuous (where X, Y, and Z are
metric spaces), then so is $g \circ f$.
1.13 Let X and Y be metric spaces. Define the notion of a product metric on $Z =
X \times Y$. Define a 1-metric $\rho_1$ and a uniform metric $\rho_\infty$ on Z (showing that they are
metrics) in analogy with the one-norm and uniform norm on a product of normed linear
spaces, and show that each is a product metric according to your definition above.
1.14 Do the same for a 2-metric $\rho_2$ on $Z = X \times Y$.
1.15 Let X and Y be metric spaces, and let V be a normed linear space. Let $f\colon X \to \mathbb{R}$
and $g\colon Y \to V$ be continuous maps. Prove that
$$\langle x, y \rangle \mapsto f(x)\,g(y)$$
is a continuous map from $X \times Y$ to V.
*2. TOPOLOGY
If X is an arbitrary set and $\mathcal{T}$ is any family of subsets of X satisfying properties
(1) through (3) in Theorem 1.1, then $\mathcal{T}$ is called a topology on X. Theorem 1.1
thus asserts that the open subsets of a metric space X form a topology on X.
The subsequent definitions of interior, closed set, and closure were purely
topological in the sense that they depended only on the topology $\mathcal{T}$, as were
Theorem 1.2 and the identity $(\bar{A})' = (A')^{\mathrm{int}}$. The study of the consequences of
the existence of a topology is called general topology.
On the other hand, the definitions of balls and continuity given earlier were
metric definitions, and therefore part of metric space theory. In metric spaces,
then, we have not only the topology, but also our $\epsilon$-definitions of continuity and
balls and the spherical characterizations of closure and interior.
The reader may be surprised to be told now that although continuity and
convergence were defined metrically, they also have purely topological char-
acterizations and are therefore topological ideas. This is easy to see if one keeps
in mind that in a metric space an open set is nothing but a union of balls. We
have:
f is continuous at p if and only if for every open set A containing f(p) there
exists an open set B containing p such that $f[B] \subset A$.
This local condition involving behavior around a single point p is more
fluently rendered in terms of the notion of neighborhood. A set A is a neighbor-
hood of a point p if $p \in A^{\mathrm{int}}$. Then we have:
f is continuous at p if and only if for every neighborhood N of f(p), $f^{-1}[N]$
is a neighborhood of p.
Finally, there is an elegant topological characterization of global continuity.
Suppose that $S_1$ and $S_2$ are topological spaces. Then $f\colon S_1 \to S_2$ is continuous
(everywhere) if and only if $f^{-1}[A]$ is open whenever A is open. Also, f is con-
tinuous if and only if $f^{-1}[B]$ is closed whenever B is closed. These conditions
are not surprising in view of Lemma 1.4.
3. SEQUENTIAL CONVERGENCE
In addition to shifting to the more general point of view of metric space theory,
we also want to add to our kit of tools the notion of sequential convergence,
which the reader will probably remember from his previous encounter with the
calculus. One of the principal reasons why metric space theory is simpler and
more intuitive than general topology is that nearly all metric arguments can be
presented in terms of sequential convergence, and in this chapter we shall
partially make up for our previous neglect of this tool by using it constantly and
in preference to other alternatives.
Definition. We say that the infinite sequence $\{x_n\}$ converges to the point a
if for every $\epsilon$ there is an N such that
$$n > N \implies \rho(x_n, a) < \epsilon.$$
We also say that $x_n$ approaches a as n approaches (or tends to) infinity, and
we call a the limit of the sequence. In symbols we write $x_n \to a$ as $n \to \infty$, or
$\lim_{n\to\infty} x_n = a$. Formally, this definition is practically identical with our earlier
definition of function convergence, and where there are parallel theorems the
arguments that we use in one situation will generally hold almost verbatim in
the other. Thus the proof of Lemma 1.1 of Chapter 3 can be altered slightly
to give the following result.
Lemma 3.1. If $\{\xi_i\}$ and $\{\eta_i\}$ are two sequences in a normed linear space V,
then
$$\xi_i \to \alpha \text{ and } \eta_i \to \beta \implies \xi_i + \eta_i \to \alpha + \beta.$$
The main difference is that we now choose N as $\max\{N_1, N_2\}$ instead of
choosing $\delta$ as $\min\{\delta_1, \delta_2\}$. Similarly:
Lemma 3.2. If $\xi_i \to \alpha$ in V and $x_i \to a$ in $\mathbb{R}$, then $x_i\xi_i \to a\alpha$.
As before, the definition begins with three quantifiers, $(\forall\epsilon)(\exists N)(\forall n)$. A
somewhat more idiomatic form can be obtained by rephrasing the definition in
terms of balls and the notion of "almost all n". We say that P(n) is true for
almost all n if P(n) is true for all but a finite number of integers n, or equivalently,
if $(\exists N)(\forall n > N)\,P(n)$. Then we see that
$\lim x_n = a$ if and only if every ball about a contains almost all the $x_n$.
The following sequential characterization provides probably the most
intuitive way of viewing the notion of closure and closed sets.
Theorem 3.1. A point x is in the closure $\bar{A}$ of a set A if and only if there is a
sequence $\{x_n\}$ in A converging to x.
Therefore, a set A is closed if and only if every convergent sequence lying
in A has its limit in A.
Proof. If $\{x_n\} \subset A$ and $x_n \to x$, then every ball about x contains almost every
$x_n$, and so, in particular, intersects A. Thus $x \in \bar{A}$ by Lemma 1.3. Conversely,
if $x \in \bar{A}$, then every ball about x intersects A, and we can construct a sequence in
A that converges to x by choosing $x_n$ as any point in $B_{1/n}(x) \cap A$. Since A is
closed if and only if $A = \bar{A}$, the second statement of the theorem follows from
the first. $\square$
There is also a sequential characterization of continuity which helps greatly
in using the notion of continuity in a flexible way. Let X and Y be metric spaces,
and let f be any function from X to Y.
Theorem 3.2. The function f is continuous at a if and only if, for any
sequence $\{x_n\}$ in X, if $x_n \to a$, then $f(x_n) \to f(a)$.
Proof. Suppose first that f is continuous at a, and let $\{x_n\}$ be any sequence
converging to a. Then, given any $\epsilon$, there is a $\delta$ such that
$$\rho(x, a) < \delta \implies \rho\big(f(x), f(a)\big) < \epsilon,$$
by the continuity of f at a, and for this $\delta$ there is an N such that
$$n > N \implies \rho(x_n, a) < \delta,$$
because $x_n \to a$. Combining these implications, we see that given $\epsilon$ we have
found N so that $n > N \implies \rho\big(f(x_n), f(a)\big) < \epsilon$. That is, $f(x_n) \to f(a)$.
Now suppose that f is not continuous at a. In considering such a negation
it is important that implicit universal quantifiers be made explicit. Thus, for-
mally, we are assuming that $\neg(\forall\epsilon)(\exists\delta)(\forall x)\big(\rho(x, a) < \delta \implies \rho(f(x), f(a)) < \epsilon\big)$,
that is, that $(\exists\epsilon)(\forall\delta)(\exists x)\big(\rho(x, a) < \delta \;\&\; \rho(f(x), f(a)) \ge \epsilon\big)$. Such symbolization
will not be necessary after the reader has had some practice in computing logical
negations; the experienced thinker will intuit the correct negation without a
formal calculation. In any event, we now have a fixed $\epsilon$, and for each $\delta$ of the
form $\delta = 1/n$ we can let $x_n$ be a corresponding x. We then have $\rho(x_n, a) < 1/n$
and $\rho\big(f(x_n), f(a)\big) \ge \epsilon$ for all n. The first inequality shows that $x_n \to a$; the
second shows that $f(x_n) \not\to f(a)$. Thus, if f is not continuous at a, then the
sequential condition is not satisfied. $\square$
The above type of argument is used very frequently and almost amounts to
an automatic proof procedure in the relevant situations. We want to prove, say,
that $(\forall x)(\exists y)(\forall z)\,P(x, y, z)$. Arguing by contradiction, we suppose this false, so
that $(\exists x)(\forall y)(\exists z)\,\neg P(x, y, z)$. Then, instead of trying to use all numbers y, we
let y run through some sequence converging to zero, such as $\{1/n\}$, and we choose
one corresponding z, $z_n$, for each such y. We end up with $\neg P(x, 1/n, z_n)$ for the
given x and all n, and we finish by arguing sequentially.
The reader will remember that two norms p and q on a vector space V are
equivalent if and only if the identity map $\xi \mapsto \xi$ is continuous from $\langle V, p \rangle$ to
$\langle V, q \rangle$ and also from $\langle V, q \rangle$ to $\langle V, p \rangle$. By virtue of the above theorem
we now see that:
Theorem 3.3. The norms p and q are equivalent if and only if they yield
exactly the same collection of convergent sequences.
Earlier we argued that a norm on a product $V \times W$ of two normed linear
spaces should be equivalent to $\|\langle \alpha, \beta \rangle\|_1 = \|\alpha\| + \|\beta\|$. Now with respect to
this sum norm it is clear that a sequence $\langle \alpha_n, \beta_n \rangle$ in $V \times W$ converges to
$\langle \alpha, \beta \rangle$ if and only if $\alpha_n \to \alpha$ in V and $\beta_n \to \beta$ in W. We now see (again by
Theorem 3.2) that:
Theorem 3.4. A product norm on $V \times W$ is any norm with the property
that $\langle \alpha_n, \beta_n \rangle \to \langle \alpha, \beta \rangle$ in $V \times W$ if and only if $\alpha_n \to \alpha$ in V and
$\beta_n \to \beta$ in W.
EXERCISES
3.1 Prove that a convergent sequence in a metric space has a unique limit. That is,
show that if $x_n \to a$ and $x_n \to b$, then $a = b$.
3.2 Show that $x_n \to x$ in the metric space X if and only if $\rho(x_n, x) \to 0$ in $\mathbb{R}$.
3.3 Prove that if $x_n \to a$ in $\mathbb{R}$ and $x_n \ge 0$ for all n, then $a \ge 0$.
3.4 Prove that if $x_n \to 0$ in $\mathbb{R}$ and $|y_n| \le x_n$ for all n, then $y_n \to 0$.
3.5 Give detailed $\epsilon$, N-proofs of Lemmas 3.1 and 3.2.
3.6 By applying Theorem 3.2, prove that if X is a metric space, V is a normed linear
space, and F and G are continuous maps from X to V, then $F + G$ is continuous.
State and prove the similar theorem for a product FG.
3.7 Prove that continuity is preserved under composition by applying Theorem 3.2.
3.8 Show that (the range of) a sequence of points in a metric space is in general not
a closed set. Show that it may be a closed set.
3.9 The fact that in a normed linear space the closure of an open ball includes the
corresponding closed ball is practically trivial on the basis of Lemma 3.2 and Theorem
3.1. Show that this is so.
3.10 Show directly that if the maximum norm $\|\langle \alpha, \beta \rangle\|_\infty = \max\{\|\alpha\|, \|\beta\|\}$ is used
on $V = V_1 \times V_2$, then it is true that
$$\langle \alpha_n, \beta_n \rangle \to \langle \alpha, \beta \rangle \text{ in } V$$
if and only if
$$\alpha_n \to \alpha \text{ in } V_1 \quad \text{and} \quad \beta_n \to \beta \text{ in } V_2.$$
3.11 Show that if $\|\;\|$ is any increasing norm on $\mathbb{R}^2$ (see the remark after Theorem 4.3
of Chapter 3), then
$$\rho\big(\langle x_1, y_1 \rangle, \langle x_2, y_2 \rangle\big) = \big\|\langle \rho(x_1, x_2), \rho(y_1, y_2) \rangle\big\|$$
is a metric on the product $X \times Y$ of two metric spaces X and Y.
3.12 In the above exercise show that $\langle x_n, y_n \rangle \to \langle x, y \rangle$ in $X \times Y$ if and only if
$x_n \to x$ in X and $y_n \to y$ in Y. This property would be our minimal requirement for a
product metric.
3.13 Defining a product metric as above, use Theorem 3.2 to show that
$$\langle f, g \rangle\colon S \to X \times Y$$
is continuous if and only if $f\colon S \to X$ and $g\colon S \to Y$ are both continuous.
3.14 Let X, Y, and Z be metric spaces, and let $f\colon X \times Y \to Z$ be a mapping such
that f(x, y) is continuous in the variables separately. Suppose also that the continuity
in x is uniform over y. That is, suppose that given $\epsilon$ and $x_0$, there is a $\delta$ such that
$$\rho(x, x_0) < \delta \implies \rho\big(f(x, y), f(x_0, y)\big) < \epsilon$$
for every value of y. Show that then f is continuous on $X \times Y$.
3.15 Define the function f on the closed unit square $[0, 1] \times [0, 1]$ by
$$f(0, 0) = 0, \qquad f(x, y) = \frac{xy}{(x + y)^2} \quad \text{if } \langle x, y \rangle \ne \langle 0, 0 \rangle.$$
Then f is continuous as a function of x for each fixed value of y, and conversely. Show,
however, that f is not continuous at the origin. That is, find a sequence $\langle x_n, y_n \rangle$
converging to $\langle 0, 0 \rangle$ in the plane such that $f(x_n, y_n)$ does not converge to 0. This
example shows that continuity of a function of two variables is a stronger property
than continuity in each variable separately.
4. SEQUENTIAL COMPACTNESS
The reader is probably familiar with the idea of a subsequence. A subsequence of
a sequence $\{x_n\}$ is a new sequence $\{y_m\}$ that is formed by selecting an infinite
number, but generally not all, of the terms $x_n$, and counting them off in the
order of the selected indices. Thus, if $n_1$ is the first selected n, $n_2$ the next, and
so on, and if we set $y_m = x_{n_m}$, then we obtain the subsequence
$$y_1, y_2, y_3, \ldots \quad \text{or} \quad x_{n_1}, x_{n_2}, x_{n_3}, \ldots$$
Strictly speaking, this counting off of the selected set of indices n is a sequence
$m \mapsto n_m$ from $\mathbb{Z}^+$ to $\mathbb{Z}^+$ which preserves order: $n_{m+1} > n_m$ for all m. And the
subsequence $m \mapsto x_{n_m}$ is the composition of the sequence $n \mapsto x_n$ and the
selector sequence.
In order to avoid subscripts on subscripts, we may use the notation n(m)
instead of $n_m$. In either case we are being conventionally sloppy: we are using
the same symbol 'n' as an integer-valued variable, when we write $x_n$, and as the
selector function, when we write n(m) or $n_m$. This is one of the standard nota-
tional ambiguities which we tolerate in elementary calculus, because the cure is
considered worse than the disease. We could say: let f be a sequence, i.e., a
function from $\mathbb{Z}^+$ to $\mathbb{R}$. Then a subsequence of f is a composition $f \circ g$, where g
is a mapping from $\mathbb{Z}^+$ to $\mathbb{Z}^+$ such that $g(m + 1) > g(m)$ for all m.
If you have grasped the idea of subsequence, you should be able to see that
any infinite sequence of 0's and 1's, say $\{0, 1, 0, 0, 1, 0, 0, 0, 1, \ldots\}$, can be
obtained as a subsequence of $\{0, 1, 0, 1, 0, 1, \ldots, [1 + (-1)^n]/2, \ldots\}$.
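That last formulation renders directly as code; the sketch below (ours) realizes a subsequence as the composition $f \circ g$:

```python
# A subsequence is the composition of a sequence with a strictly
# increasing selector g: Z+ -> Z+.
def subsequence(x, g):
    return lambda m: x(g(m))

x = lambda n: (1 + (-1)**n) / 2        # the sequence 0, 1, 0, 1, ...
g = lambda m: 2 * m                    # selector: the even indices
y = subsequence(x, g)                  # y(m) = x(2m) = 1 for every m

print([y(m) for m in range(1, 6)])     # [1.0, 1.0, 1.0, 1.0, 1.0]
```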
If $x_n \to a$, then it should be clear that every subsequence also converges to a.
We leave the details as an exercise. On the other hand, if the sequence $\{x_n\}$
does not converge to a, then there is an $\epsilon$ such that for every N there is some
larger n at which $\rho(x_n, a) \ge \epsilon$. Now we can choose such an n for every N,
taking care that $n_{N+1} > n_N$, and thus choose a subsequence all of whose terms
are at a distance at least $\epsilon$ from a. Then this sequence has no subsequence
converging to a. Thus, if $\{x_n\}$ does not converge to a, then it has a subsequence
no (sub)subsequence of which converges to a. Therefore,
Lemma 4.1. If the sequence $\{x_n\}$ and the point a are such that every
subsequence of $\{x_n\}$ has itself a subsequence that converges to a, then
$x_n \to a$.
This is a wild and unlikely sounding lemma, but we shall use it to prove a
most important theorem (Theorem 4.2).
Definition. A subset A of a metric space is sequentially compact if every
sequence in A has a subsequence that converges to a point of A.
Here, so to speak, we create convergence out of nothing. One would expect
a compact set to have very powerful properties, and perhaps suspect that there
aren't many such sets. We shall soon see, however, that every bounded closed
subset of $\mathbb{R}^n$ is compact, and it is in the theory of finite-dimensional spaces that
we most frequently use this notion. Sequential compactness in infinite-dimen-
sional spaces is a much rarer phenomenon, but when it does occur it is very
important, as we shall see in our brief look at Sturm-Liouville theory in Chapter 6.
We begin with a few simple but important general results.
Lemma 4.2. If A is a sequentially compact subset of a metric space S,
then A is closed and bounded.
Proof. Suppose that $\{x_n\} \subset A$ and that $x_n \to b$. By the compactness of A
there exists a subsequence $\{x_{n(i)}\}_i$ that converges to a point $a \in A$. But a sub-
sequence of a convergent sequence converges to the same limit. Therefore,
$a = b$ and $b \in A$. Thus A is closed.
Boundedness here will mean lying in some ball about a given point b. If A
is not bounded, for each n there exists a point $x_n \in A$ such that $\rho(x_n, b) > n$.
By compactness a subsequence $\{x_{n(i)}\}_i$ converges to a point $a \in A$, and
$$\rho(x_{n(i)}, b) \to \rho(a, b).$$
This clearly contradicts $\rho(x_{n(i)}, b) > n(i) \ge i$. $\square$
Continuous functions carry compact sets into compact sets. The proof of
the following result is left as an exercise.
Theorem 4.1. If f is continuous and A is a sequentially compact subset of its
domain, then f[A] is sequentially compact.
A nonempty compact set $A \subset \mathbb{R}$ contains maximum and minimum elements.
This is because lub A is the limit of a sequence in A, and hence belongs to A
itself, since A is closed. Combining this fact with the above theorem, we obtain
the following well-known corollary.
Corollary. If f is a continuous real-valued function and dom(f) is nonempty
and sequentially compact, then f is bounded and assumes maximum and
minimum values.
The following very useful result is related to the above theorem.
Theorem 4.2. If f is continuous and bijective and dom(f) is sequentially
compact, then $f^{-1}$ is continuous.
Proof. We have to show that if $y_n \to y$ in the range of f, and if $x_n = f^{-1}(y_n)$
and $x = f^{-1}(y)$, then $x_n \to x$. It is sufficient to show that every subsequence
$\{x_{n(i)}\}_i$ has itself a subsequence converging to x (by Lemma 4.1). But, since
dom(f) is compact, there is a subsequence $\{x_{n(i(j))}\}_j$ converging to some z, and
the continuity of f implies that $f(z) = \lim_{j\to\infty} f(x_{n(i(j))}) = \lim_{j\to\infty} y_{n(i(j))} = y$.
Therefore, $z = f^{-1}(y) = x$, which is what we had to prove. Thus $f^{-1}$ is con-
tinuous. $\square$
We now take up the problem of showing that bounded closed sets in $\mathbb{R}^n$ are
compact. We first prove it for $\mathbb{R}$ itself and then give an inductive argument
for $\mathbb{R}^n$.
A sequence $\{x_n\} \subset \mathbb{R}$ is said to be increasing if $x_n \le x_{n+1}$ for all n. It is
strictly increasing if $x_n < x_{n+1}$ for all n. The notions of a decreasing sequence
and a strictly decreasing sequence are obvious. A sequence which is either increas-
ing or decreasing is said to be monotone. The relevance of these notions here lies
in the following two lemmas.
Lemma 4.3. A bounded monotone sequence in $\mathbb{R}$ is convergent.
Proof. Suppose that $\{x_n\}$ is increasing and bounded above. Let l be the least
upper bound of its range. That is, $x_n \le l$ for all n, but for every $\epsilon$, $l - \epsilon$ is not an
upper bound, and so $l - \epsilon < x_N$ for some N. Then
$$n > N \implies l - \epsilon < x_N \le x_n \le l,$$
and so $|x_n - l| < \epsilon$. That is, $x_n \to l$ as $n \to \infty$. $\square$
Lemma 4.4. Any sequence in $\mathbb{R}$ has a monotone subsequence.
Proof. Call $x_n$ a peak term if it is greater than or equal to all later terms. If
there are infinitely many peak terms, then they obviously form a decreasing
subsequence. On the other hand, if there are only finitely many peak terms, then
there is a last one $x_{n_0}$ (or none at all), and then every later term is strictly less
than some other still later term. We choose any $n_1$ greater than $n_0$, and then we
can choose $n_2 > n_1$ so that $x_{n_1} < x_{n_2}$, etc. Therefore, in this case we can choose
a strictly increasing subsequence. We have thus shown that any sequence $\{x_n\}$
in $\mathbb{R}$ has either a decreasing subsequence or a strictly increasing subsequence. $\square$
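The peak-term argument is essentially an algorithm. The sketch below (ours) runs it on a finite initial segment; this is only a finite-data caricature of the infinite argument, since the last term of a finite list always counts as a peak:

```python
def monotone_subsequence(xs):
    # Peak terms: xs[i] greater than or equal to every later term.
    n = len(xs)
    peaks = [i for i in range(n) if all(xs[i] >= xs[j] for j in range(i + 1, n))]
    if len(peaks) >= 2:
        return peaks                   # the peak terms form a decreasing run
    out = [0]                          # few peaks: climb to an increasing run
    for j in range(1, n):
        if xs[j] > xs[out[-1]]:
            out.append(j)
    return out

xs = [3, 1, 4, 1, 5, 9, 2, 6]
idx = monotone_subsequence(xs)
print(idx, [xs[i] for i in idx])       # [5, 7] [9, 6]: a decreasing choice
```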
Putting these two lemmas together, we have:
Theorem 4.3. Every bounded sequence in $\mathbb{R}$ has a convergent subsequence.
Now we can generalize to $\mathbb{R}^n$ by induction.
Theorem 4.4. Every bounded sequence in $\mathbb{R}^n$ has a convergent subsequence
(using any product norm, say $\|\;\|_1$).
Proof. The above theorem is the case $n = 1$. Suppose then that the theorem is
true for $n - 1$, and let $\{x^m\}_m$ be a bounded sequence in $\mathbb{R}^n$. Thinking of $\mathbb{R}^n$ as
$\mathbb{R}^{n-1} \times \mathbb{R}$, we have $x^m = \langle y^m, z^m \rangle$, and $\{y^m\}_m$ is bounded in $\mathbb{R}^{n-1}$, because if
$x = \langle y, z \rangle$, then $\|x\|_1 = \|y\|_1 + |z| \ge \|y\|_1$. Therefore, there is a subsequence
$\{y^{n(i)}\}_i$ converging to some y in $\mathbb{R}^{n-1}$, by the inductive hypothesis. Since
$\{z^{n(i)}\}$ is bounded in $\mathbb{R}$, it has a subsequence $\{z^{n(i(p))}\}_p$ converging to some z in
$\mathbb{R}$. Of course, the corresponding subsubsequence $\{y^{n(i(p))}\}_p$ still converges to y
in $\mathbb{R}^{n-1}$, and then $\{x^{n(i(p))}\}_p$ converges to $x = \langle y, z \rangle$ in $\mathbb{R}^n = \mathbb{R}^{n-1} \times \mathbb{R}$,
since its two component sequences now converge to y and z, respectively. We
have thus found a convergent subsequence of $\{x^m\}$. $\square$
Theorem 4.5. If A is a bounded closed subset of $\mathbb{R}^n$, then A is sequentially
compact (in any product norm).
Proof. If $\{x_n\} \subset A$, then there is a subsequence $\{x_{n(i)}\}_i$ converging to some x
in $\mathbb{R}^n$, by Theorem 4.4, and x is in A, since A is closed. Thus A is compact. $\square$
We can now fill in one of the minor gaps in the last chapter.
Theorem 4.6. All norms on $\mathbb{R}^n$ are equivalent.
Proof. It is sufficient to prove that an arbitrary norm $\|\;\|$ is equivalent to $\|\;\|_1$.
Setting $a = \max\{\|\delta_i\|\}_1^n$, we have
$$\|x\| = \Big\|\sum x_i\delta_i\Big\| \le \sum |x_i|\,\|\delta_i\| \le a\|x\|_1,$$
so one of our inequalities is trivial. We also have $\big|\,\|x\| - \|y\|\,\big| \le \|x - y\| \le
a\|x - y\|_1$, so $\|x\|$ is a continuous function on $\mathbb{R}^n$ with respect to the one-norm.
Now the unit one-sphere $S = \{x : \|x\|_1 = 1\}$ is closed and bounded and so
compact (in the one-norm). The restriction of the continuous function $\|x\|$ to
this compact set S has a minimum value m, and m cannot be zero because S
does not contain the zero vector. We thus have $\|x\| \ge m\|x\|_1$ on S, and so
$\|x\| \ge m\|x\|_1$ on $\mathbb{R}^n$, by homogeneity. Altogether we have found positive
constants a and m such that $m\|\;\|_1 \le \|\;\| \le a\|\;\|_1$. $\square$
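For a specific norm the constants are easy to see numerically. Taking $\|\;\|$ to be the Euclidean norm on $\mathbb{R}^2$, we get $a = \max \|\delta_i\|_2 = 1$, and the minimum of $\|x\|_2$ over the one-sphere is $m = 1/\sqrt{2}$, attained at $\langle \pm\tfrac{1}{2}, \pm\tfrac{1}{2} \rangle$. A sketch of the minimization (ours):

```python
import math

def euclidean(x):
    return math.sqrt(sum(xi * xi for xi in x))

# Sample the one-sphere {x : ||x||_1 = 1} in R^2: points (t, 1 - |t|) and
# their reflections (t, |t| - 1) for t in [-1, 1].
points = []
for i in range(-1000, 1001):
    t = i / 1000.0
    points += [(t, 1 - abs(t)), (t, abs(t) - 1)]

m = min(euclidean(p) for p in points)
print(m)    # about 0.7071 = 1/sqrt(2), so ||x||_2 >= ||x||_1 / sqrt(2)
```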
Composing with a coordinate isomorphism, we see that all norms on any
finite-dimensional vector space are equivalent.
Corollary. If M is a finite-dimensional subspace of the normed linear space
V, then M is a closed subspace of V.
Proof. Suppose that $\{\xi_n\} \subset M$ and $\xi_n \to \alpha \in V$. We have to show that $\alpha$ is in
M. Now $\{\xi_n\}$ is a bounded subset of M, and its closure in M is therefore se-
quentially compact, by the theorem. Therefore, some subsequence converges to
a point $\beta$ in M as well as to $\alpha$, and so $\alpha = \beta \in M$. $\square$
EXERCISES
4.1 Prove by induction that if $f\colon \mathbb{Z}^+ \to \mathbb{Z}^+$ is such that $f(n + 1) > f(n)$ for all n,
then $f(n) \ge n$ for all n.
4.2 Prove carefully that if $x_n \to a$ as $n \to \infty$, then $x_{n(m)} \to a$ as $m \to \infty$ for any
subsequence. The above exercise is useful in this proof.
4.3 Prove that if $\{x_n\}$ is an increasing sequence in $\mathbb{R}$ ($x_{n+1} \ge x_n$ for all n), and if
$\{x_n\}$ has a convergent subsequence, then $\{x_n\}$ converges.
4.4 Give a more detailed version of the argument that if the sequence $\{x_n\}$ does not
converge to a, then there is an $\epsilon$ and a subsequence $\{x_{n(m)}\}_m$ such that $\rho(x_{n(m)}, a) \ge \epsilon$
for all m.
4.5 Find a sequence in $\mathbb{R}$ having no convergent subsequence.
4.6 Find a nonconvergent sequence in $\mathbb{R}$ such that the set of limit points of its con-
vergent subsequences consists exactly of the number 1.
4.7 Show that there is a sequence $\{x_n\}$ in $[0, 1]$ such that for any $y \in [0, 1]$ there is a
subsequence $x_{n_m}$ converging to y.
4.8 Show that the set of limits of convergent subsequences of a sequence $\{x_n\}$ in a
metric space X is a closed subset of X.
4.9 Prove Theorem 4.1.
4.10 Prove that the Cartesian product of two sequentially compact metric spaces is
sequentially compact. (The proof is essentially in the text.)
4.11 A metric space is boundedly compact if every closed bounded set is sequentially
compact. Prove that the Cartesian product of two boundedly compact metric spaces is
boundedly compact (using, say, the maximum metric on the product space).
4.12 Prove that the sum $A + B$ of two sequentially compact subsets of a normed
linear space is sequentially compact.
4.13 Prove that the sum $A + B$ of a closed set and a compact set is closed.
4.14 Show by an example in $\mathbb{R}$ that the sum of two closed sets need not be closed.
4.15 Let $\{C_n\}$ be a decreasing sequence ($C_{n+1} \subset C_n$ for all n) of nonempty closed
subsets of a sequentially compact metric space S. Prove that $\bigcap_n C_n$ is nonempty.
4.16 Give an example of a decreasing sequence $\{C_n\}$ of nonempty closed subsets of
a metric space such that $\bigcap_n C_n = \emptyset$.
4.17 Suppose the metric space S has the property that every decreasing sequence
$\{C_n\}$ of nonempty closed subsets of S has nonempty intersection. Prove that then S
must be sequentially compact. [Hint: Given any sequence $\{x_i\} \subset S$, let $C_n$ be the
closure of $\{x_i : i \ge n\}$.]
4.18 Let A be a sequentially compact subset of a normed linear space V, and let B be
obtained from A by drawing all line segments from points of A to the origin (that is,
$B = \{t\alpha : \alpha \in A \text{ and } t \in [0, 1]\}$).
Prove that B is compact.
4.19 Show by applying a compactness argument to Lemma 1.5 that if N is a proper
closed subspace of a finite-dimensional vector space V, then there exists $\alpha$ in V such
that $\|\alpha\| = \rho(\alpha, N) = 1$.
5. COMPACTNESS AND UNIFORMITY
The word 'uniform' is frequently used as a qualifying adjective in mathematics.
Roughly speaking, it concerns a "point" property P(y) which may or may not
hold at each point y in a domain A and whose definition involves an existential
quantifier. A typical form for P(y) is (∀c)(∃d)Q(y, c, d). Thus, if P(y) is 'f is
continuous at y', then P(y) has the form (∀ε)(∃δ)Q(y, ε, δ). The property holds
on A if it holds for all y in A, that is, if
(∀y∈A)[(∀c)(∃d)Q(y, c, d)].
Here d will, in general, depend both on y and c; if either y or c is changed, the
corresponding d may have to be changed. Thus δ in the definition of continuity
depends both on ε and on the point y at which continuity is being asserted. The
property is said to hold uniformly on A, or uniformly in y, if a value d can be
found that is independent of y (but still dependent on c). Thus the property holds
uniformly in y if
(∀c)(∃d)(∀y∈A)Q(y, c, d);
the uniformity of the property is expressed in the reversal of the order of the
quantifiers (∀y∈A) and (∃d). Thus f is uniformly continuous on A if
(∀ε)(∃δ)(∀y, z∈A)[ρ(y, z) < δ ⟹ ρ(f(y), f(z)) < ε].
Now δ is independent of the point at which continuity is being asserted, but still
dependent on ε, of course.
We saw in Section 14 of the last chapter how much more powerful the point
condition of continuity becomes when it holds uniformly. In the remainder of
this section we shall discuss some other uniform notions, and shall see that the
uniform property is often implied by the point property if the domain over which
it holds is sequentially compact.
The formal statement forms we have examined above show clearly the
distinction between uniformity and nonuniformity. However, in writing an
argument, we would generally follow our more idiomatic practice of dropping out
the inside universal quantifier. For example, a sequence of functions {fₙ} ⊂ W^A
converges pointwise to f: A → W if it converges to f at every point p in A, that
is, if for every point p in A and for every ε there is an N such that
n > N ⟹ ρ(fₙ(p), f(p)) ≤ ε.
The sequence converges uniformly on A if an N exists that is independent of p,
that is, if for every ε there is an N such that
n > N ⟹ ρ(fₙ(p), f(p)) ≤ ε for every p in A.
When ρ(ξ, η) = ‖ξ − η‖, saying that ρ(fₙ(p), f(p)) ≤ ε for all p is the same as
saying that ‖fₙ − f‖∞ ≤ ε. Thus fₙ → f uniformly if and only if ‖fₙ − f‖∞ → 0;
this is why the norm ‖f‖∞ is called the uniform norm.
Pointwise convergence does not imply uniform convergence. Thus fₙ(x) =
xⁿ on A = (0, 1) converges pointwise to the zero function but does not converge
uniformly.
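The failure is easy to see numerically. The sketch below (a hypothetical illustration, not part of the text) evaluates fₙ at a fixed point, where the values decay, and at the moving point 1 − 1/n, where they stay near 1/e, so the supremum of |fₙ| does not tend to 0.

```python
# fn(x) = x^n on (0,1): at each fixed x the values decay to 0, but the
# moving point x_n = 1 - 1/n satisfies fn(x_n) -> 1/e, so sup |fn| stays
# bounded away from 0 and fn does not converge to 0 uniformly.
for n in (10, 100, 1000, 10_000):
    print(f"n = {n:6d}   fn(1/2) = {0.5**n:.3e}   "
          f"fn(1 - 1/n) = {(1 - 1/n)**n:.4f}")
```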
Nor does continuity on A imply uniform continuity. The function f(x) =
1/x is continuous on (0, 1) but is not uniformly continuous. The function
sin (1/x) is continuous and bounded on (0, 1) but is not uniformly continuous.
Compactness changes the latter situation, however.
Theorem 5.1. If f is continuous on A and A is compact, then f is uniformly
continuous on A.
Proof. This is one of our "automatic" negation proofs. Uniform continuity
(UC) is the property
(∀ε>0)(∃δ>0)(∀x, y∈A)[ρ(x, y) < δ ⟹ ρ(f(x), f(y)) < ε].
Therefore, ¬UC ⟺ (∃ε)(∀δ)(∃x, y)[ρ(x, y) < δ and ρ(f(x), f(y)) ≥ ε]. Take
δ = 1/n, with corresponding xₙ and yₙ. Thus, for all n, ρ(xₙ, yₙ) < 1/n and
ρ(f(xₙ), f(yₙ)) ≥ ε, where ε is a fixed positive number. Now {xₙ} has a con-
vergent subsequence, say x_{n(i)} → x, by the compactness of A. Since
ρ(y_{n(i)}, x_{n(i)}) < 1/i,
we also have y_{n(i)} → x. By the continuity of f at x,
ρ(f(x_{n(i)}), f(y_{n(i)})) ≤ ρ(f(x_{n(i)}), f(x)) + ρ(f(x), f(y_{n(i)})) → 0,
which contradicts ρ(f(x_{n(i)}), f(y_{n(i)})) ≥ ε. This completes the proof by nega-
tion. □
The compactness of A does not, however, automatically convert the point-
wise convergence of a sequence of functions on A into uniform convergence. The
"piecewise linear" functions fn: [0, 1] -+ [0, 1] defined by the graph shown in
Fig. 4.1 converge pointwise to zero on the compact domain [0, 1], but the con-
vergence is not uniform. (However, see Exercise 5.4.)
Fig. 4.1 (the "peak" function fₙ, with base points 1/n and 2/n)   Fig. 4.2
We pointed out earlier that the distance between a pair of disjoint closed
sets may be zero. However, if one of the closed sets is compact, then the distance
must be positive.
Theorem 5.2. If A and C are disjoint nonempty closed sets, one of which is
compact, then ρ(A, C) > 0.
Proof. The proof is by automatic contradiction, and is left to the reader.
This result is again a uniformity condition. Saying that a set A is disjoint
from a closed set C is saying that (∀x∈A)(∃r>0)(B_r(x) ∩ C = ∅). Saying that
ρ(A, C) > 0 is saying that (∃r>0)(∀x∈A) ⋯
As a last consequence of sequential compactness, we shall establish a very
powerful property which is taken as the definition of compactness in general
topology. First, however, we need some preparatory work. If A is a subset of a
metric space S, the r-neighborhood of A, Br[A], is simply the union of all the balls
of radius r about points of A:
B_r[A] = ∪{B_r(α) : α ∈ A} = {x : (∃α∈A)(ρ(x, α) < r)}.
A subset A ⊂ S is r-dense in S if S ⊂ B_r[A], that is, if each point of S is closer
than r to some point of A.
A subset A of a metric space S is dense in S if Ā = S. This is the same as
saying that for every point p in S there are points of A arbitrarily close to p.
The set ℚ of all rational numbers is a dense subset of the real number system ℝ,
because any irrational real number x can be arbitrarily closely approximated by
rational numbers. Since we do arithmetic in decimal notation, it is customary to
use decimal approximations, and if 0 < x < 1 and the decimal expansion of
x is x = Σ₁^∞ aₙ/10ⁿ, where each aₙ is an integer and 0 ≤ aₙ < 10, then
Σ₁^N aₙ/10ⁿ is a rational number differing from x by less than 10⁻ᴺ. Note that A
is a dense subset of B if and only if A is r-dense in B for every positive r.
A set B is said to be totally bounded if for every positive r there is a finite set
which is r-dense in B. Thus for every positive r the set B can be covered by a
finite number of balls of radius r. For example, the n − 1 numbers {i/n}₁^{n−1} are
(1/n)-dense in the open interval (0, 1) for each n, and so (0, 1) is totally bounded.
Total boundedness is a much stronger property than boundedness, as the
following lemma shows.
Lemma 5.1. If the normed linear space V is infinite-dimensional, then its
closed unit ball B₁ = {ξ : ‖ξ‖ ≤ 1} cannot be covered by a finite number
of balls of radius 1/4.
Proof. Since V is not finite-dimensional, we can choose a sequence {αₙ} such
that αₙ₊₁ is not in the linear span Mₙ of {α₁, …, αₙ}, for each n. Since Mₙ is
closed in V, by the corollary of Theorem 4.6, we can apply Lemma 1.5 to find
a vector ξₙ in Mₙ such that ‖ξₙ‖ = 1 and ρ(ξₙ, M_{n−1}) > 1/2 for all n > 1.
We take ξ₁ = α₁/‖α₁‖, and we have a sequence {ξₙ} ⊂ B₁ such that
‖ξₘ − ξₙ‖ > 1/2
if m ≠ n. Then no ball of radius 1/4 can contain more than one ξₙ, proving the
lemma. □
For a concrete example, let V be 𝒞([0, 1]), and let fₙ be the "peak" function
sketched in Fig. 4.2, where the three points on the base are 1/(2n + 2), 1/(2n + 1),
and 1/2n. Then fₙ₊₁ is "disjoint" from fₙ (that is, fₙ₊₁fₙ = 0), and we have
‖fₙ‖∞ = 1 for all n and ‖fₙ − fₘ‖∞ = 1 if n ≠ m. Thus no ball of radius 1/2 can
contain more than one of the functions fₙ, and accordingly the closed unit ball in
V cannot be covered by a finite number of balls of radius 1/2.
Lemma 5.2. Every sequentially compact set A is totally bounded.
Proof. If A is not totally bounded, then there exists an r such that no finite
subset F is r-dense in A. We can then define a sequence {pₙ} inductively by
taking p₁ as any point of A, p₂ as any point of A not in B_r(p₁), and pₙ as any
point of A not in B_r[{p₁, …, p_{n−1}}] = ∪₁^{n−1} B_r(pᵢ). Then {pₙ} is a sequence in A
such that ρ(pᵢ, pⱼ) ≥ r for all i ≠ j. But this sequence can have no convergent
subsequence. Thus, if A is not totally bounded, then A is not sequentially com-
pact, proving the lemma. □
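The construction in this proof can also be run in the forward direction as an algorithm: greedily admit points pairwise at least r apart, and whatever finite set results is automatically r-dense among the points examined. A minimal sketch (hypothetical; the function name greedy_r_net is ours, not the text's):

```python
# The proof's construction run forward: admit a point only if it is at
# distance >= r from every point already chosen.  In a sequentially
# compact space the process must stop, and the chosen points are then
# r-dense.  (Hypothetical sketch on a sample of the unit square.)
import numpy as np

def greedy_r_net(points, r):
    net = []
    for p in points:
        if all(np.linalg.norm(p - q) >= r for q in net):
            net.append(p)
    return net   # every examined point is within r of some net point

rng = np.random.default_rng(1)
sample = rng.random((2000, 2))                 # points of [0,1]^2
net = greedy_r_net(sample, r=0.2)
worst = max(min(np.linalg.norm(p - q) for q in net) for p in sample)
print(f"{len(net)} net points; max distance from sample to net = {worst:.3f}")
```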
Corollary. A normed linear space V is finite-dimensional if and only if its
closed unit ball is sequentially compact.
Proof. This follows from Theorem 4.4 in one direction and from the above two
lemmas in the other direction. □
Lemma 5.3. Suppose that A is sequentially compact and that {Eᵢ : i ∈ I}
is an open covering of A (that is, {Eᵢ} is a family of open sets and A ⊂ ∪ᵢEᵢ).
Then there exists an r > 0 with the property that for every point p in A the
ball B_r(p) is included in some Eⱼ.
Proof. Otherwise, for every r there is a point p in A such that B_r(p) is not a sub-
set of any Eⱼ. Take r = 1/n, with corresponding sequence {pₙ}. Thus B_{1/n}(pₙ)
is not a subset of any Eⱼ. Since A is sequentially compact, {pₙ} has a convergent
subsequence, p_{n(m)} → p as m → ∞. Since {Eᵢ} covers A, some Eⱼ contains p,
and then B_ε(p) ⊂ Eⱼ for some ε > 0, since Eⱼ is open. Taking m large enough so
that 1/m < ε/2 and also ρ(p_{n(m)}, p) < ε/2, we have
B_{1/n(m)}(p_{n(m)}) ⊂ B_ε(p) ⊂ Eⱼ,
contradicting the fact that B_{1/n(m)}(p_{n(m)}) is not a subset of any Eⱼ. The lemma has
thus been proved. □
Theorem 5.3. If ℱ is an open covering of a sequentially compact set A,
then some finite subfamily of ℱ covers A.
Proof. By the lemma immediately above there exists an r > 0 such that for
every p in A the ball B_r(p) lies entirely in some set of ℱ, and by the first lemma
there exist p₁, …, pₙ in A such that A ⊂ ∪₁ⁿ B_r(pᵢ). Taking corresponding sets
Eᵢ in ℱ such that B_r(pᵢ) ⊂ Eᵢ for i = 1, …, n, we clearly have A ⊂ ∪₁ⁿ Eᵢ. □
In general topology, a set A such that every open covering of A includes a
finite covering is said to be compact or to have the Heine-Borel property. The
above theorem says that in a metric space every sequentially compact set is
compact. We shall see below that the reverse implication also holds, so that the
two notions are in fact equivalent on a metric space.
Theorem 5.4. If A is a compact metric space, then A is sequentially
compact.
Proof. Let {xₙ} be any sequence in A, and let ℱ be the collection of open balls B
such that B contains only finitely many xᵢ. If ℱ were to cover A, then by com-
pactness A would be the union of finitely many balls in ℱ, and this would clearly
imply that the whole of A contains only finitely many xᵢ, contradicting the fact
that {xᵢ} is an infinite sequence. Therefore, ℱ does not cover A, and so there is a
point x in A such that every ball about x contains infinitely many of the xᵢ.
More precisely, every ball about x contains xᵢ for infinitely many indices i. It can
now be safely left to the reader to see that a subsequence of {xₙ} converges to x. □
EXERCISES
5.1 Show that fₙ(x) = xⁿ does not converge uniformly on (0, 1).
5.2 Show that f(x) = 1/x is not uniformly continuous on (0, 1).
5.3 Define the notion of a function K: X × Y → Y being uniformly Lipschitz in its
second variable over its first variable.
5.4 Let S be a sequentially compact metric space, and let {fₙ} be a sequence of
continuous real-valued functions on S that decreases pointwise to zero (that is, {fₙ(p)}
is a decreasing sequence in ℝ and fₙ(p) → 0 as n → ∞ for each p in S). Prove that the
convergence is uniform. (Try to apply Exercise 4.15.)
5.5 Restate the corollaries of Theorems 15.1 and 15.2 of Chapter 3, employing the
weaker hypotheses that suffice by virtue of Theorem 5.1 of the present section.
5.6 Prove Theorem 5.2.
5.7 Prove that if A is an r-dense subset of a set X in a normed linear space V, and
if B is an s-dense subset of a set Y ⊂ V, then A + B is (r + s)-dense in X + Y. Con-
clude that the sum of two totally bounded subsets of V is totally bounded.
5.8 Suppose that the n points {pᵢ}₁ⁿ are r-dense in a metric space X. Let A be any
subset of X. Show that A has a subset of at most n points that is 2r-dense in A.
Conclude that any subset of a totally bounded metric space is itself totally bounded.
5.9 Prove that the Cartesian product of two totally bounded metric spaces is totally
bounded.
5.10 Show that if a metric space X has a dense subset A that is totally bounded, then
X is totally bounded.
5.11 Show that if two continuous mappings f and g from a metric space X to a metric
space Y are equal on a dense subset of X, then they are equal everywhere.
5.12 Write out in explicit quantified form involving the existence of balls the state-
ment that the interiors of the sets {Ai} cover the metric space A. Then show that the
conclusion of Lemma 5.3 is another uniformity assertion.
5.13 Reprove the theorem that a continuous function on a compact domain is
bounded on the basis of Theorem 5.3.
5.14 Reprove the theorem that a continuous function on a compact domain is
uniformly continuous from Theorem 5.3.
6. EQUICONTINUITY
The application of sequential compactness that we shall make in an infinite-
dimensional context revolves around the notion of an equicontinuous family of
functions. If A and B are metric spaces, then a subset ℱ ⊂ B^A is said to be
equicontinuous at p₀ in A if all the functions of ℱ are continuous at p₀ and if,
given ε, there is a δ which works for them all, i.e., such that
ρ(p, p₀) < δ ⟹ ρ(f(p), f(p₀)) < ε for every f in ℱ.
The family ℱ is uniformly equicontinuous if δ is also independent of p₀, and so is
dependent only on ε. Our quantifier string is thus (∀ε)(∃δ)(∀p, q∈A)(∀f∈ℱ).
For example, given m > 0, let ℱ be a collection of functions f from (0, 1) to
(0, 1) such that f′ exists and |f′| ≤ m on (0, 1). Then |f(x) − f(y)| ≤ m|x − y|,
by the ordinary mean-value theorem. Therefore, given any ε, we can take
δ = ε/m and have
|x − y| < δ ⟹ |f(x) − f(y)| < ε
for all x, y ∈ (0, 1) and all f ∈ ℱ. The collection ℱ is thus uniformly equicon-
tinuous.
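This kind of uniform estimate is easy to test numerically. The sketch below (a hypothetical illustration, not from the text) uses the family fₖ(x) = sin(kx)/k, for which |fₖ′| ≤ 1, so the single choice δ = ε works for every member of the family at once.

```python
# The family f_k(x) = sin(kx)/k has |f_k'| <= 1 for every k, so the one
# choice delta = eps/m with m = 1 works simultaneously for all members
# (a hypothetical numerical check of the mean-value estimate).
import numpy as np

eps = 0.01
delta = eps                       # eps / m with m = 1
rng = np.random.default_rng(2)
x = rng.random(5000)
y = x + rng.uniform(-delta, delta, 5000)
worst = max(np.abs(np.sin(k * x) / k - np.sin(k * y) / k).max()
            for k in (1, 5, 50, 500))
print(f"sup over the family of |f(x) - f(y)| with |x - y| < delta: {worst:.5f}")
```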
Theorem 6.1. If A and B are totally bounded metric spaces, and if ℱ is a
uniformly equicontinuous subfamily of B^A, then ℱ is totally bounded in the
uniform metric.
Proof. Given ε > 0, choose δ so that for all f in ℱ and all p₁, p₂ in A, ρ(p₁, p₂) <
δ ⟹ ρ(f(p₁), f(p₂)) < ε/4. Let D be a finite subset of A which is δ-dense in A,
and let E be a finite subset of B which is (ε/4)-dense in B. Let G be the set E^D of
all functions on D into E. G is of course finite; in fact, #G = nᵐ, where m = #D
and n = #E. Finally, for each g ∈ G let ℱ_g be the set of all functions f ∈ ℱ such
that
ρ(f(p), g(p)) < ε/4 for every p ∈ D.
We claim that the collections ℱ_g cover ℱ and that each ℱ_g has diameter at most ε.
We will then obtain a finite ε-dense subset of ℱ by choosing one function from
each nonempty ℱ_g, and the theorem will be proved.
To show that every f ∈ ℱ is in some ℱ_g, we simply construct a suitable g.
For each p in D there exists a q in E whose distance from f(p) is less than ε/4.
If we choose one such q in E for each p in D, we have a function g in G such that
f ∈ ℱ_g.
The final thing we have to show is that if f, h ∈ ℱ_g, then ρ(f, h) ≤ ε. Since
ρ(h, g) < ε/4 on D and ρ(f, g) < ε/4 on D, it follows that
ρ(f(p), h(p)) < ε/2 for every p ∈ D.
Then for any p′ ∈ A we have only to choose p ∈ D such that ρ(p′, p) < δ, and
we have
ρ(f(p′), h(p′)) ≤ ρ(f(p′), f(p)) + ρ(f(p), h(p)) + ρ(h(p), h(p′))
≤ ε/4 + ε/2 + ε/4 = ε. □
The above proof is a good example of a mathematical argument that is
completely elementary but hard. When referring to mathematical reasoning, the
words 'sophisticated' and 'difficult' are by no means equivalent.
7. COMPLETENESS
If xₙ → a as n → ∞, then the terms xₙ obviously get close to each other as n
gets large. On the other hand, if {xₙ} is a sequence whose terms get arbitrarily
close to each other as n → ∞, then {xₙ} clearly ought to converge to a limit.
It may not, however; the desired limit point may be missing from the space.
If a metric space S is such that every sequence which ought to converge actually
does converge, then we say that S is complete. We now make this notion precise.
Definition. {xₙ} is a Cauchy sequence if for every ε there is an N such that
m > N and n > N ⟹ ρ(xₘ, xₙ) < ε.
Lemma 7.1. If {xₙ} is convergent, then {xₙ} is Cauchy.
Proof. Given ε, we choose N such that n > N ⟹ ρ(xₙ, a) < ε/2, where a is
the limit of the sequence. Then if m and n are both greater than N, we have
ρ(xₘ, xₙ) ≤ ρ(xₘ, a) + ρ(a, xₙ) < ε/2 + ε/2 = ε. □
Lemma 7.2. If {xₙ} is Cauchy, and if a subsequence is convergent, then
{xₙ} itself converges.
Proof. Suppose that x_{n(i)} → a as i → ∞. Given ε, we take N so that m, n > N ⟹
ρ(xₙ, xₘ) < ε. Because x_{n(i)} → a as i → ∞, we can choose an i such that
n(i) > N and ρ(x_{n(i)}, a) < ε. Thus if m > N, we have
ρ(xₘ, a) ≤ ρ(xₘ, x_{n(i)}) + ρ(x_{n(i)}, a) < 2ε,
and so xₘ → a. □
Actually, of course, if m, n > N ⟹ ρ(xₘ, xₙ) < ε, and if xₙ → a, then for
any m > N it is true that ρ(xₘ, a) ≤ ε. Why?
Lemma 7.3. If A and B are metric spaces, and if T is a Lipschitz mapping
from A to B, then T carries Cauchy sequences in A into Cauchy sequences in
B. This is true in particular if A and B are normed linear spaces and T is an
element of Hom(A, B).
Proof. Let {xₙ} be a Cauchy sequence in A, and set yₙ = T(xₙ). Given ε,
choose N so that m, n > N ⟹ ρ(xₘ, xₙ) < ε/C, where C is a Lipschitz constant
for T. Then
m, n > N ⟹ ρ(yₘ, yₙ) = ρ(T(xₘ), T(xₙ)) ≤ Cρ(xₘ, xₙ) < C(ε/C) = ε. □
This lemma has a substantial generalization, as follows.
Theorem 7.1. If A and B are metric spaces, {xₙ} is Cauchy in A, and
F: A → B is uniformly continuous, then {F(xₙ)} is Cauchy in B.
Proof. The proof is left as an exercise.
The student should try to acquire a good intuitive feel for the truth of these
lemmas, after which the technical proofs become more or less obvious.
Definition. A metric space A is complete if every Cauchy sequence in A
converges to a limit in A. A complete normed linear space is called a Banach
space.
We are now going to list some important examples of Banach spaces. In
each case a proof is necessary, so the list becomes a collection of theorems.
Theorem 7.2. ℝ is complete.
Proof. Let {xₙ} be Cauchy in ℝ. Then {xₙ} is bounded (why?) and so, by
Theorem 4.3, has a convergent subsequence. Lemma 7.2 then implies that
{xₙ} is convergent. □
Theorem 7.3. If A is a complete metric space, and if f is a continuous
bijective mapping from A to a metric space B such that f⁻¹ is Lipschitz
continuous, then B is complete. In particular, if V is a Banach space, and if
T in Hom(V, W) is invertible, then W is a Banach space.
Proof. Suppose that {yₙ} is a Cauchy sequence in B, and set xᵢ = f⁻¹(yᵢ) for
all i. Then {xᵢ} is Cauchy in A, by Lemma 7.3, and so converges to some x in A,
since A is complete. But then yₙ = f(xₙ) → f(x), because f is continuous.
Thus every Cauchy sequence in B is convergent and B is complete. □
The Banach space assertion is a special case, because the invertibility of T
means that T⁻¹ exists in Hom(W, V) and hence is a Lipschitz mapping.
Corollary. If p and q are equivalent norms on V and ⟨V, p⟩ is complete,
then so is ⟨V, q⟩.
Theorem 7.4. If V₁ and V₂ are Banach spaces, then so is V₁ × V₂.
Proof. If {⟨ξₙ, ηₙ⟩} is Cauchy, then so are each of {ξₙ} and {ηₙ} (by
Lemma 7.3, since the projections πᵢ are bounded). Then ξₙ → α and ηₙ → β
for some α ∈ V₁ and β ∈ V₂. Thus ⟨ξₙ, ηₙ⟩ → ⟨α, β⟩ in V₁ × V₂. (See
Theorem 3.4.) □
Corollary 1. If {Vᵢ}₁ⁿ are Banach spaces, then so is Π₁ⁿ Vᵢ.
Corollary 2. Every finite-dimensional vector space is a Banach space
(in any norm).
Proof. ℝⁿ is complete (in the one-norm, say) by Theorem 7.2 and Corollary 1
above. We then impose a one-norm on V by choosing a basis, and apply the
corollary of Theorem 7.3 to pass to any other norm. □
Theorem 7.5. Let W be a Banach space, let A be any set, and let ℬ(A, W)
be the vector space of all bounded functions from A to W with the uniform
norm ‖f‖∞ = lub {‖f(a)‖ : a ∈ A}. Then ℬ(A, W) is a Banach space.
Proof. Let {fₙ} be Cauchy, and choose any a ∈ A. Since ‖fₙ(a) − fₘ(a)‖ ≤
‖fₙ − fₘ‖∞, it follows that {fₙ(a)} is Cauchy in W and so convergent. Define
g: A → W by g(a) = lim fₙ(a) for each a ∈ A. We have to show that g is
bounded and that fₙ → g.
Given ε, we choose N so that m, n > N ⟹ ‖fₘ − fₙ‖∞ < ε. Then
‖fₘ(a) − g(a)‖ = limₙ→∞ ‖fₘ(a) − fₙ(a)‖ ≤ ε.
Thus, if m > N, then ‖fₘ(a) − g(a)‖ ≤ ε for all a ∈ A, and hence ‖fₘ − g‖∞ ≤
ε. This implies both that fₘ − g ∈ ℬ(A, W), and so
g = fₘ − (fₘ − g) ∈ ℬ(A, W),
and that fₘ → g in the uniform norm. □
Theorem 7.6. If V is a normed linear space and W is a Banach space, then
Hom(V, W) is a Banach space.
The method of proof is identical to that of the preceding theorem, and we
leave it as an exercise. Boundedness here has a different meaning, but it is used
in essentially the same way. One additional fact has to be established, namely,
that the limit map (corresponding to g in the above theorem) is linear.
Theorem 7.7. A closed subset of a complete metric space is complete. A
complete subset of any metric space is closed.
Proof. The proof is left to the reader.
It follows from Theorem 7.7 that a complete metric space A is absolutely
closed, in the sense that no matter how we extend A to a larger metric space
B, A is always a closed subset of B. Actually, this property is equivalent to
completeness, for if A is not complete, then a very important construction of
metric space theory shows that A can be completed. That is, we can construct
a complete metric space B which includes A. Now, if A is not complete, then
the closure of A in B, being complete, is different from A, and A is not absolutely
closed.
See Exercises 7.16 through 7.18 for a construction of the completion of a
metric space. The completion of a normed linear space is of course a Banach
space.
Theorem 7.8. In the context of Theorem 7.5, let A be a metric space, let
𝒞(A, W) be the space of continuous functions from A to W, and set
ℬ𝒞(A, W) = ℬ(A, W) ∩ 𝒞(A, W).
Then ℬ𝒞 is a closed subspace of ℬ.
Fig. 4.3 (the "up, over, and down" estimate at the points a and x; each of the three gaps is < ε/3)
Proof. We suppose that {fₙ} ⊂ ℬ𝒞 and that ‖fₙ − g‖∞ → 0, where g ∈ ℬ.
We have to show that g is continuous. This is an application of a much used
"up, over, and down" argument, which can be schematically indicated as in
Fig. 4.3.
Given ε, we first choose any n such that ‖fₙ − g‖∞ < ε/3. Consider now
any a ∈ A. Since fₙ is continuous at a, there exists a δ such that
ρ(x, a) < δ ⟹ ‖fₙ(x) − fₙ(a)‖ < ε/3.
Then
ρ(x, a) < δ ⟹ ‖g(x) − g(a)‖ ≤ ‖g(x) − fₙ(x)‖ + ‖fₙ(x) − fₙ(a)‖
+ ‖fₙ(a) − g(a)‖ < ε/3 + ε/3 + ε/3 = ε.
Thus g is continuous at a for every a ∈ A, and so g ∈ ℬ𝒞. □
This important classical result is traditionally stated as follows: The limit of
a uniformly convergent sequence of continuous functions is continuous.
Remark. The proof was slightly more general. We actually showed that if
fₙ → f uniformly, and if each fₙ is continuous at a, then f is continuous at a.
Corollary. ℬ𝒞(A, W) is a Banach space.
Theorem 7.9. If A is a sequentially compact metric space, then A is com-
plete.
Proof. A Cauchy sequence in A has a subsequence converging to a limit in A,
and therefore, by Lemma 7.2, itself converges to that limit. Thus A is complete. □
In Section 5 we proved that a compact set is also totally bounded. It can be
shown, conversely, that a complete, totally bounded set A is sequentially com-
pact, so that these two properties together are equivalent to compactness.
The crucial fact is that if A is totally bounded, then every sequence in A
has a Cauchy subsequence. If A is also complete, this Cauchy subsequence will
converge to a point of A. Thus the fact that total boundedness and complete-
ness together are equivalent to compactness follows directly from the next
lemma.
Lemma 7.4. If A is totally bounded, then every sequence in A has a Cauchy
subsequence.
Proof. Let {pₘ} be any sequence in A. Since A can be covered by a finite
number of balls of radius 1, at least one ball in such a covering contains infinitely
many of the points {pₘ}. More precisely, there exists an infinite set M₁ ⊂ ℤ⁺
such that the set {pₘ : m ∈ M₁} lies in a single ball of radius 1. Suppose that
M₁, …, Mₙ ⊂ ℤ⁺ have been defined so that M_{i+1} ⊂ Mᵢ for i = 1, …, n − 1,
Mₙ is infinite, and {pₘ : m ∈ Mᵢ} is a subset of a ball of radius 1/i for i = 1, …, n.
Since A can be covered by a finite family of balls of radius 1/(n + 1), at least
one covering ball contains infinitely many points of the set {pₘ : m ∈ Mₙ}. More
precisely, there exists an infinite set M_{n+1} ⊂ Mₙ such that {pₘ : m ∈ M_{n+1}}
is a subset of a ball of radius 1/(n + 1). We thus define an infinite sequence
{Mₙ} of subsets of ℤ⁺ having the above properties.
Now choose m₁ ∈ M₁, m₂ ∈ M₂ so that m₂ > m₁, and, in general, m_{n+1} ∈
M_{n+1} so that m_{n+1} > mₙ. Then the subsequence {p_{mₙ}}ₙ is Cauchy. For
given ε, we can choose n so that 1/n < ε/2. Then i, j > n ⟹ mᵢ, mⱼ ∈ Mₙ ⟹
ρ(p_{mᵢ}, p_{mⱼ}) < 2(1/n) < ε. This proves the lemma, and our theorem is a
corollary. □
Theorem 7.10. A metric space S is sequentially compact if and only if S is
totally bounded and complete.
The next three sections will be devoted to applications of completeness to
the calculus, but before embarking on these vital matters we should say a few
words about infinite series. As in the ordinary calculus, if {ξₙ} is a sequence in a
normed linear space V, we say that the series Σξᵢ converges and has the sum σ,
and write Σ₁^∞ ξᵢ = σ, if the sequence of partial sums converges to σ. This means
that σₙ → σ as n → ∞, where σₙ is the finite sum Σ₁ⁿ ξᵢ for each n. We say that
Σξᵢ converges absolutely if the series of norms Σ‖ξᵢ‖ converges in ℝ. This is
abuse of language unless it is true that every absolutely convergent series con-
verges, and the importance of the notion stems from the following theorem.
Theorem 7.11. If V is a Banach space, then every absolutely convergent
series in V is convergent.
Proof. Let Σξᵢ be absolutely convergent. This means that Σ‖ξᵢ‖ converges in
ℝ, i.e., that the sequence {sₙ} converges in ℝ, where sₙ = Σ₁ⁿ ‖ξᵢ‖. If m < n,
then
‖σₙ − σₘ‖ = ‖Σ_{m+1}ⁿ ξᵢ‖ ≤ Σ_{m+1}ⁿ ‖ξᵢ‖ = sₙ − sₘ.
Since {sᵢ} is Cauchy in ℝ, this inequality shows that {σᵢ} is Cauchy in V and
therefore, because V is complete, that {σₙ} is convergent in V. That is, Σξᵢ is
convergent in V. □
The reader will be asked to show in an exercise that, conversely, if a normed
linear space V is such that every absolutely convergent series converges, then V
is complete. This property therefore characterizes Banach spaces.
We shall make frequent use of the above theorem. For the moment we
note just one corollary, the classical Weierstrass comparison test.
Corollary. If {fₙ} is a sequence of bounded real-valued (or W-valued, for
some Banach space W) functions on a common domain A, and if there is a
sequence {Mₙ} of positive constants such that ΣMₙ is convergent and
‖fₙ‖∞ ≤ Mₙ for each n, then Σfₙ is uniformly convergent.
Proof. The hypotheses imply that Σ‖fₙ‖∞ converges, and so Σfₙ converges in
the Banach space ℬ(A, W) by the theorem. But convergence in ℬ(A, W) is
uniform convergence. □
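As a concrete instance of the comparison test, the series Σ sin(nx)/n² has ‖fₙ‖∞ ≤ Mₙ = 1/n² with ΣMₙ < ∞, so it converges uniformly on ℝ; the tail past any N is bounded by Σ_{n>N} 1/n² independently of x. A minimal numerical sketch (hypothetical, not from the text):

```python
# Weierstrass comparison test for sum sin(nx)/n^2 with M_n = 1/n^2:
# the tail past N is bounded by sum_{n>N} 1/n^2 independently of x,
# so the convergence is uniform (hypothetical numerical sketch).
import numpy as np

x = np.linspace(0.0, 2 * np.pi, 1000)
N = 50
partial = sum(np.sin(n * x) / n**2 for n in range(1, N + 1))
tail_bound = sum(1.0 / n**2 for n in range(N + 1, 100_000))
print(f"partial sum ranges over [{partial.min():.3f}, {partial.max():.3f}]")
print(f"uniform bound on the tail after N = {N} terms: {tail_bound:.5f}")
```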
EXERCISES
7.1 Prove that a Cauchy sequence in a metric space is a bounded set.
7.2 Let V be a normed linear space. Prove that the sum of two Cauchy sequences
in V is Cauchy.
7.3 Show also that if {ξₙ} is Cauchy in V and {aₙ} is Cauchy in ℝ, then {aₙξₙ} is
Cauchy in V.
7.4 Prove that if {ξₙ} is a Cauchy sequence in a normed linear space V, then
{‖ξₙ‖} is a Cauchy sequence in ℝ.
7.5 Prove that if {xₙ} and {yₙ} are two Cauchy sequences in a metric space S,
then {ρ(xₙ, yₙ)} is a Cauchy sequence in ℝ.
7.6 Prove the statement made after the proof of Lemma 7.2.
7.7 The rational number system is an incomplete metric space. Prove this by
exhibiting a Cauchy sequence of rational numbers that does not converge to a rational
number.
7.8 Prove Theorem 7.1.
7.9 Deduce a strengthened form of Theorem 7.3 from Theorem 7.1.
7.10 Write out a careful proof of Theorem 7.6, modeled on the proof of Theorem 7.5.
7.11 Prove Theorem 7.7.
7.12 Let the metric space X have a dense subset Y such that every Cauchy sequence
in Y is convergent in X. Prove that X is complete.
7.13 Show that the set W of all Cauchy sequences in a normed linear space V is
itself a vector space and that a seminorm p can be defined on W by p({ξₙ}) = lim ‖ξₙ‖.
(Put this together from the material in the text and the preceding problems.)
7.14 Continuing the above exercise, for each ξ ∈ V, let ξᶜ be the constant sequence
all of whose terms are ξ. Show that θ: ξ ↦ ξᶜ is an isometric linear injection of V into W
and that θ[V] is dense in W in terms of the seminorm from the above exercise.
7.15 Prove next that every Cauchy sequence in θ[V] is convergent in W. Put Exer-
cises 4.18 of Chapter 3 and 7.12 through 7.14 of this chapter together to conclude that
if N is the set of null Cauchy sequences in W, then W/N is a Banach space, and that
ξ ↦ ξᶜ is an isometric linear injection from V to a dense subspace of W/N. This con-
stitutes the standard completion of the normed linear space V.
7.16 We shall now sketch a nonstandard way of forming the completion of a metric
space S. Choose some point p₀ in S, and let V be the set of real-valued functions f on S
such that f(p₀) = 0 and f is a Lipschitz function. For f in V define ‖f‖ as the smallest
Lipschitz constant for f. That is,
‖f‖ = lub_{p≠q} {|f(p) − f(q)|/ρ(p, q)}.
Prove that V is a normed linear space under this norm. (V actually is complete, but
we do not need this fact.)
7.17 Continuing the above exercise, we know that the dual space V* of all bounded
linear functionals on V is complete, by Theorem 7.6. We now want to show that S can
be isometrically imbedded in V*; then the closure of S as a subset of V* will be the
desired completion of S. For each p ∈ S, let θ_p: V → ℝ be "evaluation at p". That is,
θ_p(f) = f(p). Show that θ_p ∈ V* and that ‖θ_p − θ_q‖ ≤ ρ(p, q).
7.18 In order to conclude that the mapping θ: p ↦ θ_p is an isometry (i.e., is distance-
preserving), we have to prove the opposite inequality ‖θ_p − θ_q‖ ≥ ρ(p, q). To do this,
choose p and consider the special function f(x) = ρ(p, x) − ρ(p, p₀). Show that f is
in V and that ‖f‖ = 1 (from an early lemma in the chapter). Now apply the definition
of ‖θ_p − θ_q‖ and conclude that θ is an isometric injection of S into V*. Then the
closure of θ(S) is our constructed completion.
7.19 Prove that if a normed linear space V has the property that every absolutely
convergent series converges, then V is complete. (Let {αₙ} be a Cauchy sequence.
Show that there is a subsequence {α_{nᵢ}} such that if ξᵢ = α_{n_{i+1}} − α_{nᵢ}, then ‖ξᵢ‖ < 2⁻ⁱ.
Conclude that the subsequence converges and finish up.)
7.20 The above exercise gives a very useful criterion for V to be complete. Use it to
prove that if V is a Banach space and N is a closed subspace, then V/N is a Banach
space (see Exercise 4.14 of Chapter 3 for the norm on V/N).
7.21 Prove that the sum of a uniformly convergent series of infinitesimals (all on the
same domain) is an infinitesimal.
8. A FIRST LOOK AT BANACH ALGEBRAS
When we were considering the implicit-function theorem and the inverse-function
theorem in the last chapter, we saw how useful it is to know that if a transfor-
mation T has an inverse T⁻¹, then so does S whenever ‖S − T‖ is small enough,
and that the mapping T ↦ T⁻¹ is continuous on the open set of all invertible
elements. When the spaces in question are finite-dimensional, these facts can
be made to follow from the continuity of the determinant function T ↦ Δ(T)
from Hom V to ℝ. It is also possible to produce them by arguing directly in
terms of upper and lower bounds for T and its close approximations S. But the
most natural, most elegant, and, in the case of Banach spaces, easiest way to
prove these things is to show that if V is a Banach space and T in Hom V has
norm less than one, then the sum of the geometric series Σ₀^∞ Tⁿ is the inverse of
I − T, just as in the elementary calculus. But in making this argument, the
fact that T is a linear transformation has little importance, and we shall digress
for a moment to explore this situation.
Let us summarize the norm and algebraic properties of Hom V when V is a
Banach space. First of all, we know that Hom V is also a Banach space. Second,
it is an algebra. That is, it possesses an associative multiplication operation
(composition) that relates to the linear operations according to the following
laws:
S(T₁ + T₂) = ST₁ + ST₂,
(S₁ + S₂)T = S₁T + S₂T,
c(ST) = (cS)T = S(cT).
Finally, multiplication is related to the norm by
‖ST‖ ≤ ‖S‖ ‖T‖ and ‖I‖ = 1.
This list of properties constitutes exactly the axioms for a Banach algebra.
Just as we can see certain properties of functions most clearly by forgetting
that they are functions and considering them only as elements of a vector space,
so it turns out that we can treat certain properties of transformations in
Hom(V) most simply by forgetting the complicated nature of a linear transfor-
mation and considering it merely as an element of an abstract Banach algebra A.
The most important simple thing we can do in a Banach algebra that we
couldn't do in a Banach space is to consider power series. The following theorem
shows that the geometric series, in particular, plays the same central role here
that it plays in elementary calculus. Since we are not thinking of the elements of
A as transformations, we shall designate them by lower-case letters; e is the
identity of A.
Theorem 8.1. If A is a Banach algebra, and if x in A has norm less than one,
then (e − x) is invertible and its inverse is the sum of the geometric series
Σ₀^∞ xⁿ:
(e − x)⁻¹ = Σ₀^∞ xⁿ.
Also, ‖e − (e − x)⁻¹‖ ≤ r/(1 − r), where r = ‖x‖.
Proof. Since ‖xⁿ‖ ≤ ‖x‖ⁿ = rⁿ, the series Σxⁿ is absolutely convergent when
‖x‖ < 1 by comparison with the ordinary geometric series Σrⁿ. It is therefore
convergent, and if y = Σ₀^∞ xⁿ, then
(e − x)y = limₙ→∞ (e − x) Σ₀ⁿ xⁱ = limₙ→∞ (e − xⁿ⁺¹) = e,
since ‖xⁿ⁺¹‖ ≤ rⁿ⁺¹ → 0. That is, y = (e − x)⁻¹. Finally,
‖e − (e − x)⁻¹‖ = ‖Σ₁^∞ xⁿ‖ ≤ Σ₁^∞ rⁿ = r/(1 − r). □
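In the Banach algebra of n × n matrices under the operator norm, Theorem 8.1 is the classical Neumann series. The sketch below (Python with numpy; a hypothetical illustration, not from the text) sums the geometric series for a matrix of norm 1/2 and checks both the inverse and the bound r/(1 − r).

```python
# Theorem 8.1 in the Banach algebra of n x n matrices with the operator
# norm: if r = ||x|| < 1 then (e - x)^(-1) = sum x^k and
# ||e - (e - x)^(-1)|| <= r / (1 - r).  (Hypothetical illustration.)
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal((4, 4))
x *= 0.5 / np.linalg.norm(x, 2)            # scale so that r = 0.5
r = np.linalg.norm(x, 2)

e = np.eye(4)
partial, term = e.copy(), e.copy()
for _ in range(60):                        # partial sums of sum x^k
    term = term @ x
    partial += term

inv = np.linalg.inv(e - x)
print("series vs direct inverse:", np.linalg.norm(partial - inv, 2))
print("||e - (e-x)^-1|| =", round(np.linalg.norm(e - inv, 2), 4),
      " bound r/(1-r) =", round(r / (1 - r), 4))
```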
Theorem 8.2. The set of invertible elements in a Banach algebra A is
open, and the mapping x ↦ x⁻¹ is continuous on this set. In fact, if y⁻¹
exists and m = ‖y⁻¹‖, then (y − h)⁻¹ exists whenever ‖h‖ < 1/m and
‖(y − h)⁻¹ − y⁻¹‖ ≤ m²‖h‖/(1 − m‖h‖).
Proof. Set x = y⁻¹h. Then y − h = y(e − x), where ‖x‖ = ‖y⁻¹h‖ ≤
m‖h‖, and so by the above theorem y − h will be invertible, with (y − h)⁻¹ =
(e − x)⁻¹y⁻¹, provided ‖h‖ < 1/m. Then also
‖y⁻¹ − (y − h)⁻¹‖ ≤ ‖e − (e − x)⁻¹‖ · m,
and this is bounded above by
m·r/(1 − r) ≤ m²‖h‖/(1 − m‖h‖),
by the last inequality in the above theorem. □
Corollary. If V and W are Banach spaces, then the invertible elements in
Hom(V, W) form an open set, and the map T ↦ T⁻¹ is continuous on this
domain.
Proof. Suppose that T⁻¹ exists, and set m = ‖T⁻¹‖. Then if ‖T − S‖ < 1/m,
we have ‖I − T⁻¹S‖ ≤ ‖T⁻¹‖ ‖T − S‖ < 1, and so T⁻¹S = I − (I − T⁻¹S)
is an invertible element of Hom V. Therefore, S = T(T⁻¹S) is invertible and
S⁻¹ = (T⁻¹S)⁻¹T⁻¹. The continuity of S ↦ S⁻¹ is left to the reader. □
We saw above that the map x ↦ (e − x)⁻¹ from the open unit ball B₁(0) in
a Banach algebra A to A is the sum of the geometric power series. We can define
many other mappings by convergent power series, at hardly any greater effort.
Theorem 8.3. Let A be a Banach algebra. Let the sequence {aₙ} ⊂ A and
the positive number δ be such that the sequence {‖aₙ‖δⁿ} is bounded. Then
Σaₙxⁿ converges for x in the ball B_δ(0) in A, and if 0 < s < δ, then the
series converges uniformly on B_s(0).
Proof. Set r = s/δ, and let b be a bound for the sequence {‖aₙ‖δⁿ}. On the
ball B_s(0) we then have ‖aₙxⁿ‖ ≤ ‖aₙ‖sⁿ = ‖aₙ‖δⁿrⁿ ≤ brⁿ, and the series
therefore converges uniformly on this ball by comparison with the geometric
series bΣrⁿ, since r < 1. □
The series of most interest to us will have real coefficients. They are included
in the above argument because the product of the vector x and the scalar t is
the algebra product (te)x. In addition to dealing with the above geometric series,
we shall be particularly interested in the exponential function eˣ = Σ₀^∞ xⁿ/n!.
The usual comparison arguments of the elementary calculus show just as easily
here that this series converges for every x in A and uniformly on any ball.
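For instance, in the matrix algebra the partial sums of Σ xⁿ/n! already compute the matrix exponential. The sketch below (hypothetical illustration; the scipy comparison is our assumption, not part of the text) sums the series directly; for x = [[0, 1], [−1, 0]] the sum is rotation by one radian.

```python
# Summing e^x = sum x^n/n! directly in the matrix algebra; for
# x = [[0, 1], [-1, 0]] the sum is rotation by one radian.  The scipy
# comparison is an assumption of this sketch, not part of the text.
import numpy as np
from scipy.linalg import expm

x = np.array([[0.0, 1.0], [-1.0, 0.0]])
term, total = np.eye(2), np.eye(2)
for n in range(1, 30):
    term = term @ x / n        # term is now x^n / n!
    total += term

print(np.allclose(total, expm(x)))      # True
print(np.round(total, 6))               # [[cos 1, sin 1], [-sin 1, cos 1]]
```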
It is natural to consider the differentiability of the maps from A to A defined
by such convergent series, and we state the basic facts below, starting with a
fundamental theorem on the differentiability of a limit of a sequence.
Theorem 8.4. Let {Fₙ} be a sequence of maps from a ball B in a normed
linear space V to a normed linear space W such that Fₙ converges pointwise
to a map F on B and such that {dFⁿ_α} converges for each α and uniformly over
α. Then F is differentiable on B and dF_β = lim dFⁿ_β for each β in B.
Proof. Fix β and set T = lim dFⁿ_β. By the uniform convergence of {dFⁿ},
given ε, there is an N such that ‖dFⁿ_α − dF^N_α‖ ≤ ε for all n ≥ N and for all α
in B. It then follows from the mean-value theorem for differentials, applied to
Fₙ − F_N, that
‖(ΔFⁿ_β(ξ) − ΔF^N_β(ξ)) − (dFⁿ_β(ξ) − dF^N_β(ξ))‖ ≤ 2ε‖ξ‖
for all n ≥ N and all ξ such that β + ξ ∈ B. Letting n → ∞ and regrouping,
we have
‖(ΔF_β(ξ) − T(ξ)) − (ΔF^N_β(ξ) − dF^N_β(ξ))‖ ≤ 2ε‖ξ‖
for all such ξ. But, by the definition of dF^N_β there is a δ such that
‖ΔF^N_β(ξ) − dF^N_β(ξ)‖ ≤ ε‖ξ‖
when ‖ξ‖ < δ. Putting these last two inequalities together, we see that
‖ξ‖ < δ ⟹ ‖ΔF_β(ξ) − T(ξ)‖ ≤ 3ε‖ξ‖.
Thus F is differentiable at β and dF_β = T. □
The remaining proofs are left as a set of exercises.
Lemma 8.1. Multiplication on a Banach algebra A is differentiable (from
A × A to A). If we let p be the product function, so that p(x, y) = xy,
then dp_⟨a,b⟩(x, y) = ay + xb.
Lemma 8.2. Let A be a commutative Banach algebra, and let p be the
monomial function p(x) = axⁿ. Then p is everywhere differentiable and
dp_y(x) = nay^{n−1}x.
Lemma 8.3. If {‖aₙ‖rⁿ} is a bounded sequence in ℝ, then {n‖aₙ‖sⁿ} is
bounded for any 0 < s < r, and therefore Σnaₙxⁿ⁻¹ converges uniformly
on any ball in A smaller than B_r(0).
Theorem 8.5. If A is a commutative Banach algebra and {aₙ} ⊂ A is such
that {‖aₙ‖rⁿ} is bounded in ℝ, then F(x) = Σ₀^∞ aₙxⁿ is defined and differ-
entiable on the ball B_r(0) in A, and
dF_y(x) = (Σ₁^∞ naₙy^{n−1}) · x.
It is natural to call the element Σ₁^∞ naₙy^{n−1} the derivative of F at y and to
designate it F′(y), although this departs from our rule that derivatives are
vectors obtained as limits of difference quotients. The remarkable aspect of the
above theorem is that for this kind of differentiable mapping from an open
subset of A to A the linear transformation dF_y is multiplication by an element
of A: dF_y(x) = F′(y) · x.
In particular, the exponential function exp(x) = eˣ = Σ₀^∞ xⁿ/n! is its own
derivative, since Σ₁^∞ nxⁿ⁻¹/n! = Σ₀^∞ xᵐ/m!, and from this fact (see the exer-
cises) or from direct algebraic manipulation of the series in question, we can
deduce the law of exponents e^{x+y} = eˣeʸ. Remember, though, that this is on a
commutative Banach algebra. The function x ↦ eˣ = Σ₀^∞ xⁿ/n! can be defined
just as easily on any Banach algebra A, but it is not nearly as pleasant when A
is noncommutative. However, one thing that we can always do, and often
thereby save the day, is to restrict the exponential mapping to a commutative
subalgebra of A, say that generated by a single element x. For example, we can
consider the parametrized arc γ(t) = e^{tx} (x fixed) into any Banach algebra A,
and, because its range lies in the commutative subalgebra X generated by x, we
can apply Theorem 7.2 of Chapter 3 to conclude that γ is differentiable and that
γ′(t) = d exp_{tx}(x) = xe^{tx}.
This can also easily be proved directly from the law of exponents:
Δγ_t(h) = e^{(t+h)x} − e^{tx} = e^{tx}(e^{hx} − 1),
and since it is clear from the series that (e^{hx} − 1)/h → x as h → 0, we have that
γ′(t) = lim_{h→0} Δγ_t(h)/h = xe^{tx}.
EXERCISES
8.1 Finish the proof of the corollary of Theorem 8.2.
8.2 Let A be a Banach algebra, and let {aₙ} ⊂ ℝ and x ∈ A be such that Σaᵢxⁱ
converges. Suppose also that x satisfies a polynomial identity p(x) = Σ₀ⁿ bᵢxⁱ = 0,
where {bᵢ} ⊂ ℝ and bₙ ≠ 0. Prove that the element Σ₀^∞ aᵢxⁱ is a polynomial in x of
degree ≤ n − 1. (Let M be the linear span of {xⁱ}₀^{n−1}, and show first that xⁱ ∈ M
for all i.)
8.3 Let A be any Banach algebra, let x be a fixed element in A, and let X be the
smallest closed subalgebra of A containing x. Prove that X is a commutative Banach
algebra. (The set of polynomials p(x) = Σ₀ aᵢxⁱ is the smallest algebra containing x.
Consider its closure in X.)
8.4 Prove Lemma 8.1. [Hint: ⟨x, y⟩ ↦ xy is a bounded bilinear map.]
8.5 Prove Lemma 8.2 by making a direct estimate from the binomial expansion,
as in the elementary calculus.
8.6 Prove Lemma 8.2 by induction from Lemma 8.1.
8.7 Let A be any Banach algebra. Prove that p: x ↦ x³ is differentiable and that
dp_a(x) = xa² + axa + a²x.
8.8 Prove by induction that if q(x) = xⁿ, then q is differentiable and
dq_a(x) = Σᵢ₌₀^{n−1} aⁱxa^{n−1−i}.
Deduce Lemma 8.2 as a corollary.
8.9 Let A be any Banach algebra. Prove that r: x ↦ x⁻¹ is everywhere differentiable
on the open set U of invertible elements and that
dr_a(x) = −a⁻¹xa⁻¹.
[Hint: Examine the proofs of Theorems 8.1 and 8.2.]
8.10 Let A be an open subset of a normed linear space V, and let F and G be mappings
from A to a Banach algebra X that are differentiable at a. Prove that the product
mapping FG is differentiable at a and that d(FG)_a = F(a) dG_a + dF_a G(a). Does it
follow that d(F²)_a = 2F(a) dF_a?
8.11 Continuing the above exercise, show that if X is a commutative Banach algebra,
then d(Fⁿ)_a = nF^{n−1}(a) dF_a.
8.12 Let F: A → X be a differentiable map from an open set A of a normed linear
space to a Banach algebra X, and suppose that the element F(ξ) is invertible in X
for every ξ in A. Prove that the map G: ξ ↦ [F(ξ)]⁻¹ is differentiable and that
dG_a(ξ) = −F(a)⁻¹ dF_a(ξ) F(a)⁻¹. Show also that if F is a parametrized arc (A = I ⊂ ℝ),
then G′(a) = −F(a)⁻¹ · F′(a) · F(a)⁻¹.
8.13 Prove Lemma 8.3.
8.14 Prove Theorem 8.5 by showing that Lemma 8.3 makes Theorem 8.4 applicable.
8.15 Show that in Theorem 8.4 the convergence of Fₙ to F needs only to be assumed
at one point, provided we know that the codomain space W is a Banach space.
8.16 We want to prove the law of exponents for the exponential function on a com-
mutative Banach algebra. Show first that (exp(−x))(exp x) ≡ e by applying Exercise
7.13 of Chapter 3, the above Exercise 8.10, and the fact that d exp_a(x) = (exp a)x.
8.17 Show that if X is a commutative Banach algebra and F: X → X is a differ-
entiable map such that dF_a(ξ) = ξF(a), then F(ξ) = β exp ξ for some constant β.
[Consider the differential of F(ξ) exp(−ξ).]
8.18 Now set F(ξ) = exp(ξ + η) and prove from the above exercise that
exp(ξ + η) = exp(ξ) exp(η).
You will also need the fact that exp 0 = 1.
8.19 Let z be a nilpotent element in a commutative Banach algebra X. That is,
z^p = 0 for some positive integer p. Show by an elementary estimate based on the
binomial expansion that if ‖x‖ < 1, then ‖(x + z)ⁿ‖ ≤ knᵖ‖x‖^{n−p} for n > p. The
series of positive terms Σnᵃrⁿ converges for r < 1 (by the ratio test). Show, therefore,
that the series for log(1 − (x + z)) and for (1 − (x + z))⁻¹ converge when ‖x‖ < 1.
8.20 Continuing the above exercise, show that F(y) = log(1 − y) is defined and
differentiable on the ball ‖y − z‖ < 1 and that dF_a(x) = −(1 − a)⁻¹ · x. Show,
therefore, that exp(log(1 − y)) = 1 − y on this ball, either by applying the inverse
mapping theorem or by applying the composite-function rule for differentiating.
Conclude that for every nilpotent element z in X there exists a u in X such that
exp u = 1 − z.
8.21 Let X₁, …, Xₙ be Banach algebras. Show that the product Banach space
X = Π₁ⁿ Xᵢ becomes a Banach algebra if the product xy of x = ⟨x₁, …, xₙ⟩ and
y = ⟨y₁, …, yₙ⟩ is defined as ⟨x₁y₁, …, xₙyₙ⟩ and if the maximum norm is used on X.
8.22 In the above situation the projections πᵢ have now become bounded algebra
homomorphisms. In fact, just as in our original vector definitions on a product space,
our definition of multiplication on X was determined by the requirement that πᵢ(xy) =
πᵢ(x)πᵢ(y) for all i. State and prove an algebra theorem analogous to Theorem 3.4 of
Chapter 1.
8.23 Continuing the above discussion, suppose that the series Σaₙxⁿ converges in X,
with sum y. Show that then Σ(aₙ)ᵢ(xᵢ)ⁿ converges in Xᵢ to yᵢ for each i, where, of
course, y = ⟨y₁, …, yₙ⟩. Conclude that eˣ = ⟨e^{x₁}, …, e^{xₙ}⟩ for any x =
⟨x₁, …, xₙ⟩ in X.
8.24 Define the sine and cosine functions on a commutative Banach algebra, and
show that sin′ = cos, cos′ = −sin, sin² + cos² = e.
9. THE CONTRACTION MAPPING FIXED-POINT THEOREM
In this section we shall prove the very simple and elegant fixed-point theorem for
contraction mappings, and then shall use it to complete the proof of the implicit-
function theorem. Later, in Chapter 6, it will be the basis of our proof of the
fundamental existence and uniqueness theorem for ordinary differential equa-
tions. The section concludes with a comparison of the iterative procedure of the
fixed-point theorem and that of Newton's method.
A mapping K from a metric space X to itself is a contraction if it is a Lipschitz
mapping with constant less than 1; that is, if there is a constant C with 0 < C < 1
such that ρ(K(x), K(y)) ≤ Cρ(x, y) for all x, y ∈ X. A fixed point of K is, of
course, a point x such that K(x) = x.
A contraction K can have at most one fixed point, since if K(x) = x and
K(y) = y, then ρ(x, y) = ρ(K(x), K(y)) ≤ Cρ(x, y), and so (1 − C)ρ(x, y) ≤ 0.
Since C < 1, this implies that ρ(x, y) = 0 and x = y.
Theorem 9.1. Let X be a nonempty complete metric space, and let
K: X → X be a contraction. Then K has a (unique) fixed point.
Proof. Choose any x₀ in X, and define the sequence {xₙ}₀^∞ inductively by setting
x₁ = K(x₀), x₂ = K(x₁) = K²(x₀), and xₙ = K(x_{n−1}) = Kⁿ(x₀). Set δ =
ρ(x₁, x₀). Then ρ(x₂, x₁) = ρ(K(x₁), K(x₀)) ≤ Cρ(x₁, x₀) = Cδ, and, by induc-
tion,
ρ(xₙ₊₁, xₙ) = ρ(K(xₙ), K(x_{n−1})) ≤ Cρ(xₙ, x_{n−1}) ≤ C · C^{n−1}δ = Cⁿδ.
It follows that {xₙ} is Cauchy, for if m > n, then
ρ(xₘ, xₙ) ≤ Σᵢ₌ₙ^{m−1} ρ(xᵢ₊₁, xᵢ) ≤ Σᵢ₌ₙ^{m−1} Cⁱδ < Cⁿδ/(1 − C),
and Cⁿ → 0 as n → ∞, because C < 1. Since X is complete, {xₙ} converges to
some a in X, and it then follows that K(a) = lim K(xₙ) = lim xₙ₊₁ = a, so
that a is a fixed point. □
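The proof is itself an algorithm. The sketch below (a hypothetical illustration, not from the text) runs the iteration for K(x) = cos x, which maps [0, 1] into itself and is a contraction there with constant C = sin 1 < 1, and compares the computed residual with the a priori bound Cⁿδ/(1 − C).

```python
# The proof's iteration for K(x) = cos x, a contraction of [0, 1] into
# itself with constant C = sin 1 (since |cos'| = |sin| <= sin 1 there).
# The a priori estimate rho(x_n, fixed point) <= C^n * delta / (1 - C)
# is printed alongside the computed residual.  (Hypothetical sketch.)
import math

C = math.sin(1.0)
x0 = 0.0
delta = abs(math.cos(x0) - x0)     # rho(x1, x0)
x = x0
for n in range(1, 41):
    x = math.cos(x)
bound = C**40 * delta / (1 - C)
print(f"x_40 = {x:.12f}   residual = {abs(math.cos(x) - x):.2e}   "
      f"bound = {bound:.2e}")
```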
In practice, we meet mappings K that are contractions only near some
particular point p, and we have to establish that a suitable neighborhood of p
is carried into itself by K. We show below that if K is a contraction on a ball
about p, and if K doesn't move the center p very far, then the theorem can
be applied.
Corollary 1. Let D be a closed ball in a complete metric space X, and let
K: D → X be a contraction which moves the center of D a distance at
most (1 − C)r, where r is the radius of D and C is the contraction constant.
Then K has a unique fixed point and it is in D.
Proof. We simply check that the range of K is actually in D. If p is the center
of D and x is any point in D, then
ρ(K(x), p) ≤ ρ(K(x), K(p)) + ρ(K(p), p)
≤ Cρ(x, p) + (1 − C)r ≤ Cr + (1 − C)r = r. □
Corollary 2. Let B be an open ball in a complete metric space X, and let
K: B → X be a contraction which moves the center of B a distance less than
(1 − C)r, where r is the radius of B and C is the contraction constant.
Then K has a unique fixed point.
Proof. Restrict K to any slightly smaller closed ball D concentric with B, and
apply the above corollary. □
Corollary 3. Let K be a contraction on the complete metric space X, and
suppose that K moves the point x a distance d. Then the distance from x to
the fixed point is at most d/(1 − C), where C is the contraction constant.
Proof. Let D be the closed ball about x of radius r = d/(1 − C), and apply
Corollary 1 to the restriction of K to D. It implies that the fixed point is in D. □
We now suppose that the contraction K contains a parameter s, so that K
is now a function of two variables K(s, x). We shall assume that K is a con-
traction in x uniformly over s, which means that ρ(K(s, x), K(s, y)) ≤ Cρ(x, y)
for all x, y, and s, where 0 < C < 1. We shall also assume that K is a con-
tinuous function of s for each fixed x.
Corollary 4. Let K be a mapping from S × X to X, where X is a complete
metric space and S is any metric space, and suppose that K(s, x) is a con-
traction in x uniformly over s and is continuous in s for each x. Then the
fixed point p_s is a continuous function of s.
Proof. Given ε, we use the continuity of K in its first variable around the point
⟨t, p_t⟩ to choose δ, so that if ρ(s, t) < δ, then the distance from K(s, p_t) to
K(t, p_t) is at most ε. Since K(t, p_t) = p_t, this simply says that the contraction
with parameter value s moves p_t a distance at most ε, and so the distance from
p_t to the fixed point p_s is at most ε/(1 − C) by Corollary 3. That is, ρ(s, t) <
δ ⟹ ρ(p_s, p_t) ≤ ε/(1 − C), where C is the uniform contraction constant, and
the mapping s ↦ p_s is accordingly continuous at t. □
Combining Corollaries 2 and 4, we have the following theorem.
Theorem 9.2. Let B be a ball in a complete metric space X, let S be any
metric space, and let K be a mapping from S × B to X which is a contraction
in its second variable uniformly over its first variable and is continuous in its
first variable for each value of its second variable. Suppose also that K
moves the center of B a distance less than (1 − C)r for every s in S, where r
is the radius of B and C is the uniform contraction constant. Then for each s
in S there is a unique p in B such that K(s, p) = p, and the mapping
s ↦ p is continuous from S to B.
We can now complete the proof of the implicit-function theorem.
Theorem 9.3. Let V, W, and X be Banach spaces, let A × B be an open
subset of V × W, and let G: A × B → X be continuous and have a con-
tinuous second partial differential. Suppose that the point ⟨α, β⟩ in
A × B is such that G(α, β) = 0 and dG²_⟨α,β⟩ is invertible. Then there are
open balls M and N about α and β, respectively, such that for each ξ in M
there is a unique η in N satisfying G(ξ, η) = 0. The function F thus
uniquely defined near ⟨α, β⟩ by the condition G(ξ, F(ξ)) = 0 is continuous.
Proof. Set T = dG²_⟨α,β⟩ and K(ξ, η) = η − T⁻¹(G(ξ, η)). Then K is a con-
tinuous mapping from A × B to W such that K(α, β) = β, and K has a con-
tinuous second partial differential such that dK²_⟨α,β⟩ = 0. Because dK²_⟨μ,ν⟩ is
a continuous function of ⟨μ, ν⟩, we can choose a product ball M × N about
⟨α, β⟩ on which dK²_⟨μ,ν⟩ is bounded by 1/2, and we can then decrease the ball M
if necessary so that for μ in M we also have ‖K(μ, β) − β‖ < r/2, where r is the
radius of the ball N. The mean-value theorem for differentials implies that K is
a contraction in its second variable with constant 1/2. The preceding theorem
therefore shows that for each ξ in M there is a unique η in N such that K(ξ, η) =
η and the mapping F: ξ ↦ η is continuous. Since K(ξ, η) = η if and only if
G(ξ, η) = 0, we are done. □
Theorems 8.2 and 9.3 complete the list of ingredients of the implicit-function
theorem. (However, see Exercise 9.8.)
We next show, in the other direction, that if a contraction depending on a
parameter is continuously differentiable, then the fixed point is a continuously
differentiable function of the parameter.
Theorem 9.4. Let V and W be Banach spaces, and let K be a differentiable
mapping from an open subset A × B of V × W to W which satisfies the
hypotheses of Theorem 9.2. Then the function F from A to B uniquely
defined by the equation K(ξ, F(ξ)) = F(ξ) is differentiable.
Proof. The inequality ‖K(ξ, η′) − K(ξ, η″)‖ ≤ C‖η′ − η″‖ is equivalent to
‖dK²_⟨α,β⟩‖ ≤ C for all ⟨α, β⟩ in A × B. We now define G by G(ξ, η) =
η − K(ξ, η), and observe that dG² = I − dK² and that dG² is therefore
invertible by Theorem 8.1. Since G(ξ, F(ξ)) = 0, it follows from Theorem 11.1
of Chapter 3 that F is differentiable and that its differential is obtained by
differentiating the above equation. □
Corollary. If K is continuously differentiable, then so is F.
*We should emphasize that the fixed-point theorem not only has the implicit-
function theorem as a consequence, but the proof of the fixed-point theorem
gives an iterative procedure for actually finding the value of F(ξ), once we
know how to compute T⁻¹ (where T = dG²_⟨α,β⟩). In fact, for a given value of
ξ in a small enough ball about ⟨α, β⟩ consider the function G(ξ, ·). If we
set K(ξ, η) = η − T⁻¹G(ξ, η), then the inductive procedure
ηᵢ₊₁ = K(ξ, ηᵢ)
becomes
ηᵢ₊₁ = ηᵢ − T⁻¹G(ξ, ηᵢ).   (9.1)
The meaning of this iterative procedure is easily seen by studying the graph of
the situation where V = W = ℝ¹. (See Fig. 4.4.) As was proved above, under
suitable hypotheses, the series Σ‖ηᵢ₊₁ − ηᵢ‖ converges geometrically.
Fig. 4.4 (the fixed-point iteration (9.1) against the graph of z = G(x, ·))
It is instructive to compare this procedure with Newton's method of elemen-
tary calculus. There the iterative scheme (9.1) is replaced by
ηᵢ₊₁ = ηᵢ − Sᵢ⁻¹G(ξ, ηᵢ),   (9.2)
where Sᵢ = dG²_⟨ξ,ηᵢ⟩. (See Fig. 4.5.) As we shall see, this procedure (when it
works) converges much more rapidly than (9.1), but it suffers from the dis-
advantage that we must be able to compute the inverses of an infinite number of
linear transformations Sᵢ.
Fig. 4.5
Let us suppress the ξ, which will be fixed in the argument, and consider a map
G defined in some neighborhood of the origin in a Banach space. Suppose that G
has two continuous differentials. For definiteness, we assume that G is defined
in the unit ball B, and we suppose that for each x ∈ B the map dG_x is invertible
and, in fact,
‖dG_x⁻¹‖ ≤ K, ‖d²G_x‖ ≤ 2K,
so that K also bounds the quadratic remainder in Taylor's theorem.
Let x₀ = 0 and, assuming that xₙ has been defined, we set
xₙ₊₁ = xₙ − Sₙ⁻¹G(xₙ),
where Sₙ = dG_{xₙ}. We shall show that if ‖G(0)‖ is sufficiently small (in terms
of K), then the procedure is well defined (that is, ‖xₙ₊₁‖ < 1) and converges
rapidly. In fact, if τ is any real number between one and two (for instance
τ = 3/2), we shall show that for some c (which can be made large if ‖G(0)‖ is
small)
‖xₙ − x_{n−1}‖ ≤ e^{−cτⁿ}.   (*)
Note that if we can establish (*) for large enough c, then ‖xₙ‖ ≤ 1 follows.
In fact, since τⁿ ≥ n(τ − 1),
‖xᵢ‖ ≤ Σₙ₌₁ⁱ e^{−cτⁿ} ≤ Σₙ₌₁^∞ e^{−cτⁿ} ≤ Σₙ₌₁^∞ e^{−cn(τ−1)} = e^{−c(τ−1)}/(1 − e^{−c(τ−1)}),
which is ≤ 1 if c is large. Let us try to prove (*) by induction. Assuming it true
for n, we have
‖xₙ₊₁ − xₙ‖ = ‖Sₙ⁻¹G(xₙ)‖
≤ K‖G(x_{n−1} − S_{n−1}⁻¹G(x_{n−1}))‖
≤ K{‖G(x_{n−1}) − dG_{x_{n−1}}S_{n−1}⁻¹G(x_{n−1})‖ + K‖xₙ − x_{n−1}‖²}
by Taylor's theorem. Now the first term on the right of the inequality vanishes,
and we have
‖xₙ₊₁ − xₙ‖ ≤ K²‖xₙ − x_{n−1}‖² ≤ K²e^{−2cτⁿ}.
For the induction to work we must have
K²e^{−2cτⁿ} ≤ e^{−cτⁿ⁺¹}, that is, K² ≤ e^{(2−τ)cτⁿ}; since τⁿ ≥ τ, it is enough that
K² ≤ e^{(2−τ)cτ}.   (**)
Since τ < 2, this last inequality can be arranged by choosing c sufficiently
large. We must still verify (*) for n = 1. This says that ‖x₁ − x₀‖ =
‖S₀⁻¹G(0)‖ ≤ e^{−cτ}, and for this it suffices that
‖G(0)‖ ≤ e^{−cτ}/K.   (***)
In summary, for 1 < τ < 2 choose c so that K² ≤ e^{(2−τ)cτ} and
e^{−c(τ−1)}/(1 − e^{−c(τ−1)}) ≤ 1.
Then if (***) holds, the sequence xₙ converges exponentially, that is, (*) holds.
If x = lim xᵢ, then G(x) = lim G(xₙ) = lim Sₙ(xₙ − xₙ₊₁) = 0. This is
Newton's method.
As a possible choice of c and τ, let τ = 3/2, and let c be given by K² = e^{3c/4},
so that (**) just holds. We may also assume that K ≥ 2^{3/4}, so that e^{3c/4} ≥ 4^{3/4},
or e^c ≥ 4, which guarantees that e^{−c/2} ≤ 1/2, implying that e^{−c/2}/(1 − e^{−c/2}) ≤
1. Then (***) becomes the requirement ‖G(0)‖ ≤ K⁻⁵.
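The difference in speed between (9.1) and (9.2) is easy to observe in one dimension. The sketch below (a hypothetical illustration, not from the text) solves a scalar equation both ways: scheme (9.1) with the derivative frozen at the center, and Newton's scheme (9.2) with the derivative recomputed at each step.

```python
# Schemes (9.1) and (9.2) for a scalar equation G(y) = y + y^3 - 0.2 = 0
# (hypothetical illustration).  (9.1) freezes the derivative at the
# center; (9.2) is Newton's method and squares the error at each step.
G = lambda y: y + y**3 - 0.2
dG = lambda y: 1 + 3 * y**2
T = dG(0.0)                       # frozen derivative, scheme (9.1)

y1 = y2 = 0.0
for i in range(1, 7):
    y1 = y1 - G(y1) / T           # (9.1): geometric convergence
    y2 = y2 - G(y2) / dG(y2)      # (9.2): quadratic convergence
    print(f"step {i}:  (9.1) residual {abs(G(y1)):.1e}   "
          f"(9.2) residual {abs(G(y2)):.1e}")
```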
We end this section with an example of the fixed-point iterative procedure in its simplest context, that of the inverse-mapping theorem. We suppose that $H(0) = 0$ and that $dH_0^{-1}$ exists, and we want to invert $H$ near zero, i.e., solve the equation $H(\eta) - \xi = 0$ for $\eta$ in terms of $\xi$. Our theory above tells us that the $\eta$ corresponding to $\xi$ will be the fixed point of the contraction $K(\xi, \eta) = \eta - T^{-1}H(\eta) + T^{-1}(\xi)$, where $T = dH_0$. In order to make our example as simple as possible, we shall take $H$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ and choose it so that $dH_0 = I$. Also, in order to avoid indices, we shall use the mongrel notation $x = \langle x, y\rangle$, $u = \langle u, v\rangle$.

Consider the mapping $x = H(u)$ defined by $x = u + v^2$, $y = u^3 + v$.
The Jacobian matrix

$$\begin{bmatrix} 1 & 2v \\ 3u^2 & 1 \end{bmatrix}$$

is clearly the identity at the origin. Moreover, in the expression $K(x, u) = x + u - H(u)$, the difference $H(u) - u$ is just the function $J(u) = \langle v^2, u^3\rangle$. This cancellation of the first-order terms is the practical expression of the fact that in forming $K(\xi, \eta) = \eta - T^{-1}G(\xi, \eta)$, we have acted to make $dK^2 = 0$ at the "center point" (the origin here). We naturally start the iteration with $u_0 = 0$, and then our fixed-point sequence proceeds

$$u_1 = K(x, u_0) = K(x, 0), \quad \ldots, \quad u_n = K(x, u_{n-1}).$$

Thus $u_0 = 0$ and $u_n = K(x, u_{n-1}) = x - J(u_{n-1})$, giving

$$\begin{aligned}
u_1 &= x, & v_1 &= y,\\
u_2 &= x - y^2, & v_2 &= y - x^3,\\
u_3 &= x - (y - x^3)^2, & v_3 &= y - (x - y^2)^3,\\
u_4 &= x - [y - (x - y^2)^3]^2, & v_4 &= y - [x - (y - x^3)^2]^3.
\end{aligned}$$
We are guaranteed that this sequence $u_n$ will converge geometrically provided the starting point $x$ is close enough to $0$, and it seems clear that these two sequences of polynomials are computing the Taylor series expansions of the inverse functions $u(x, y)$ and $v(x, y)$. We shall ask the reader to prove this in an exercise. The two Taylor series start out

$$u(x, y) = x - y^2 + 2yx^3 + \cdots, \qquad v(x, y) = y - x^3 + 3x^2y^2 + \cdots.$$
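The iteration can also be carried out symbolically. The following sketch (ours, using the sympy library, not part of the text) truncates each iterate at total degree 4 and reproduces the expansions just displayed.

```python
# A symbolic sketch (not from the text) of the iteration u_n = x - J(u_{n-1})
# with J(u) = <v**2, u**3>, truncated at total degree 4.
from sympy import symbols, expand, Poly

x, y = symbols('x y')

def truncate(p, deg):
    """Drop all monomials of total degree > deg."""
    poly = Poly(expand(p), x, y)
    return sum(c * x**i * y**j
               for (i, j), c in zip(poly.monoms(), poly.coeffs())
               if i + j <= deg)

u, v = 0, 0                        # u_0 = <0, 0>
for _ in range(4):
    u, v = truncate(x - v**2, 4), truncate(y - u**3, 4)

print(expand(u))    # x - y**2 + 2*x**3*y   (terms of degree <= 4)
print(expand(v))    # y - x**3 + 3*x**2*y**2
```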
EXERCISES
9.1 Let $B$ be a compact subset of a normed linear space such that $rB \subset B$ for all $r \in [0, 1]$. Suppose that $F: B \to B$ is a Lipschitz mapping with constant 1 (i.e., $\|F(\xi) - F(\eta)\| \le \|\xi - \eta\|$ for all $\xi, \eta \in B$). Prove that $F$ has a fixed point. [Hint: Consider first $G = rF$ for $0 < r < 1$.]
9.2 Give an example to show that the fixed point in the above exercise may not be unique.
9.3 Let $X$ be a compact metric space, and let $K: X \to X$ "reduce each nonzero distance"; that is, $\rho(K(x), K(y)) < \rho(x, y)$ if $x \ne y$. Prove that $K$ has a unique fixed point. (Show that otherwise $\operatorname{glb}\{\rho(K(x), x)\}$ is positive and achieved as a minimum. Then get a contradiction.)
9.4 Let $K$ be a mapping from $S \times X$ to $X$, where $X$ is a complete metric space and $S$ is any metric space, and suppose that $K(s, x)$ is a contraction in $x$ uniformly over $s$ and is Lipschitz continuous in $s$ uniformly over $x$. Show that the fixed point $p_s$ is a Lipschitz continuous function of $s$. [Hint: Modify the $\epsilon$, $\delta$-beginning of the proof of Corollary 4 of Theorem 9.1.]
9.5 Let $D$ be an open subset of a Banach space $V$, and let $K: D \to V$ be such that $I - K$ is Lipschitz with constant $\frac{1}{2}$.
a) Show that if $B_r(\alpha) \subset D$ and $\beta = K(\alpha)$, then $B_{r/2}(\beta) \subset K[D]$. (Apply a corollary of the fixed-point theorem to a certain simple contraction mapping.)
b) Conclude that $K$ is injective and has an open range, and that $K^{-1}$ is Lipschitz with constant 2.
9.6 Deduce an improved version of the result in Exercise 3.20, Chapter 3, from the
result in the above exercise.
9.7 In the context of Theorem 9.3, show that $dG^2_{\langle\alpha,\beta\rangle}$ is invertible if $\|dK^2_{\langle\alpha,\beta\rangle}\| < 1$. (Do not be confused by the notation. We merely want to know that $S$ is invertible if $\|I - T^{-1} \circ S\| < 1$.)
9.8 There is a slight discrepancy between the statements of Theorem 11.2 in Chapter
3 and Theorem 9.3. In the one case we assert the existence of a unique continuous
mapping from a ball M, and in the other case, from the ball M to the ball N. Show
that the requirement that the range be in N can be dropped by showing that two
continuous solutions must agree on M. (Use the point-by-point uniqueness of
Theorem 9.3.)
9.9 Compute the expression for $dF_\alpha$ from the identity $G(\xi, F(\xi)) = 0$ in Theorem 9.4, and show that if $K$ is continuously differentiable, then all the maps involved in the solution expression are continuous and that $\alpha \mapsto dF_\alpha$ is therefore continuous.
9.10 Going back to the example worked out at the end of Section 9, show by induction that the polynomials $u_n - u_{n-1}$ and $v_n - v_{n-1}$ contain no terms of degree less than $n$.
9.11 Continuing the above exercise, show therefore that the power series defined by taking the terms of degree at most $n$ from $u_n$ is convergent in a ball about $0$ and that its sum is the first component $u(x, y)$ of the mapping inverse to $H$.
9.12 The above conclusions hold generally. Let $J = \langle K, L\rangle$ be any mapping from a ball about $0$ in $\mathbb{R}^2$ to $\mathbb{R}^2$ defined by the convergent power series

$$K(x, y) = \sum a_{ij}x^iy^j, \qquad L(x, y) = \sum b_{ij}x^iy^j,$$

in which there are no terms of degree 0 or 1. With the conventions $x = \langle x, y\rangle$ and $u = \langle u, v\rangle$, consider the iterative sequence

$$u_0 = 0, \qquad u_n = x - J(u_{n-1}).$$

Make any necessary assumptions about what happens when one power series is substituted in another, and show by induction that $u_n - u_{n-1}$ contains no terms of degree less than $n$, and therefore that the $u_n$ define a convergent power series whose sum is the function $u(x, y) = \langle u(x, y), v(x, y)\rangle$ inverse to $H$ in a neighborhood of $0$. [Remember that $J(\eta) = H(\eta) - \eta$.]
9.13 Let $A$ be a Banach algebra, and let $x$ be an element of $A$ of norm less than 1. Show that

$$(e - x)^{-1} = \prod_{i=1}^{\infty}\left(e + x^{2^{i-1}}\right).$$

This means that if $\pi_n$ is the partial product $\prod_{i=1}^{n}(e + x^{2^{i-1}})$, then $\pi_n \to (e - x)^{-1}$. [Hint: Prove by induction that $(e - x)\pi_n = e - x^{2^n}$.]
This is another example of convergence at an exponential rate, like Newton's method in the text.
10. THE INTEGRAL OF A PARAMETRIZED ARC
In this section we shall make our final application of completeness. We first prove a very general extension theorem, and then apply it to the construction of the Riemann integral as an extension of an elementary integral defined for step functions.
Theorem 10.1. Let $U$ be a subspace of a normed linear space $V$, and let $T$ be a bounded linear mapping from $U$ to a Banach space $W$. Then $T$ has a uniquely determined extension to a bounded linear transformation $S$ from the closure $\bar{U}$ to $W$. Moreover, $\|S\| = \|T\|$.
Proof. Fix $\alpha \in \bar{U}$ and choose $\{\xi_n\} \subset U$ so that $\xi_n \to \alpha$. Then $\{\xi_n\}$ is Cauchy, and $\{T(\xi_n)\}$ is Cauchy (by the lemmas of Section 7), so that $\{T(\xi_n)\}$ converges to some $\beta \in W$. If $\{\eta_n\}$ is any other sequence in $U$ converging to $\alpha$, then $\xi_n - \eta_n \to 0$, $T(\xi_n) - T(\eta_n) = T(\xi_n - \eta_n) \to 0$, and so $T(\eta_n) \to \beta$ also. Thus $\beta$ is independent of the sequence chosen, and, clearly, $\beta$ must be the value $S(\alpha)$ at $\alpha$ of any continuous extension $S$ of $T$. If $\alpha \in U$, then $\beta = \lim T(\xi_n) = T(\alpha)$ by the continuity of $T$. We thus have $S$ uniquely defined on $\bar{U}$ by the requirement that it be a continuous extension of $T$.

It remains to be shown that $S$ is linear and bounded by $\|T\|$. For any $\alpha, \beta \in \bar{U}$ we choose $\{\xi_n\}, \{\eta_n\} \subset U$ so that $\xi_n \to \alpha$ and $\eta_n \to \beta$. Then $x\xi_n + y\eta_n \to x\alpha + y\beta$, so that

$$S(x\alpha + y\beta) = \lim T(x\xi_n + y\eta_n) = x\lim T(\xi_n) + y\lim T(\eta_n) = xS(\alpha) + yS(\beta).$$

Thus $S$ is linear. Finally,

$$\|S(\alpha)\| = \lim \|T(\xi_n)\| \le \|T\|\lim\|\xi_n\| = \|T\|\cdot\|\alpha\|.$$

Thus $\|T\|$ is a bound for $S$, and, since $S$ includes $T$, $\|S\| = \|T\|$. $\square$
The above theorem has many applications, but we shall use it only once, to obtain the Riemann integral $\int_a^b f(t)\,dt$ of a continuous function $f$ mapping a closed interval $[a, b]$ into a Banach space $W$ as an extension of the trivial integral for step functions. If $W$ is a normed linear space and $f: [a, b] \to W$ is a continuous function defined on a closed interval $[a, b] \subset \mathbb{R}$, we might expect to be able to define $\int_a^b f(t)\,dt$ as a suitable vector in $W$ and to proceed with the integral calculus of vector-valued functions of one real variable. We haven't done this until now because we need the completeness of $W$ to prove that the integral exists!
At first we shall integrate only certain elementary functions called step functions. A finite subset $A$ of $[a, b]$ which contains the two endpoints $a$ and $b$ will be called a partition of $[a, b]$. Thus $A$ is (the range of) some finite sequence $\{t_i\}_0^n$, where $a = t_0 < t_1 < \cdots < t_n = b$, and $A$ subdivides $[a, b]$ into a sequence of smaller intervals. To be definite, we shall take the open intervals $(t_{i-1}, t_i)$, $i = 1, \ldots, n$, as the intervals of the subdivision. If $A$ and $B$ are partitions and $A \subset B$, we shall say that $B$ is a refinement of $A$. Then each interval $(s_{j-1}, s_j)$ of the $B$-subdivision is included in an interval $(t_{i-1}, t_i)$ of the $A$-subdivision; $t_{i-1}$ is the largest element of $A$ which is less than or equal to $s_{j-1}$, and $t_i$ is the smallest greater than or equal to $s_j$. A step function is simply a map $f: [a, b] \to W$ which is constant on the intervals of some subdivision $A = \{t_i\}_0^n$. That is, there exists a sequence of vectors $\{\alpha_i\}_1^n$ such that $f(\xi) = \alpha_i$ when $\xi \in (t_{i-1}, t_i)$. The values of $f$ at the subdividing points may be among these values or they may be different.

For each step function $f$ we define $\int_a^b f(t)\,dt$ as $\sum_{i=1}^n \alpha_i\,\Delta t_i$, where $f = \alpha_i$ on $(t_{i-1}, t_i)$ and $\Delta t_i = t_i - t_{i-1}$. If $f$ were real-valued, this would be simply the sum of the areas of the rectangles making up the region between the graph of $f$ and the $t$-axis. Now $f$ may be described as a step function in terms of many different subdivisions. For example, if $f$ is constant on the intervals of $A$, and if we obtain $B$ from $A$ by adding one new point $s$, then $f$ is constant on the (smaller) intervals of $B$. We have to be sure that the value of the integral of $f$ doesn't change when we change the describing subdivision. In the case just mentioned this is easy to see. The one new point $s$ lies in some interval $(t_{i-1}, t_i)$ defined by the partition $A$. The contribution of this interval to the $A$-sum is $\alpha_i(t_i - t_{i-1})$, while in the $B$-sum it splits into $\alpha_i(t_i - s) + \alpha_i(s - t_{i-1})$. But this is the same vector. The remaining summands are the same in the two sums, and the integral is therefore unchanged. In general, suppose that $f$ is a step function with respect to $A$ and also with respect to $C$. Set $B = A \cup C$, the "common refinement" of $A$ and $C$. We can pass from $A$ to $B$ in a sequence of steps at each of which we add one new point. As we have seen, the integral remains unchanged at each of these steps, and so it is the same for $A$ as for $B$. It is similarly the same for $C$ and $B$, and so for $A$ and $C$. We have thus shown that $\int_a^b f$ is independent of the subdivision used to define it.
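A short sketch (ours, not from the text) of this elementary integral, with $W = \mathbb{R}^2$ modeled by numpy vectors:

```python
# A minimal sketch (not from the text) of the integral of a step function,
# namely the finite sum  sum_i alpha_i * (t_i - t_{i-1}).
import numpy as np

def step_integral(t, alpha):
    """t is the partition t_0 < ... < t_n; alpha[i] is the value on (t_i, t_{i+1})."""
    return sum(a * (t[i + 1] - t[i]) for i, a in enumerate(alpha))

t = [0.0, 0.5, 1.0]                              # a partition of [0, 1]
alpha = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
print(step_integral(t, alpha))                   # [0.5  1.0]

# refining the partition does not change the value
t2 = [0.0, 0.25, 0.5, 1.0]
alpha2 = [alpha[0], alpha[0], alpha[1]]
print(step_integral(t2, alpha2))                 # [0.5  1.0] again
```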
Now fix $[a, b]$ and $W$, and let $\mathcal{S}$ be the set of all step functions from $[a, b]$ to $W$. Then $\mathcal{S}$ is a vector space. For, if $f$ and $g$ in $\mathcal{S}$ are step functions relative to partitions $A$ and $B$, then both functions are constant on the intervals of $C = A \cup B$, and therefore $xf + yg$ is also. Moreover, if $C = \{t_i\}_0^n$, and if on $(t_{i-1}, t_i)$ we have $f = \alpha_i$ and $g = \beta_i$, so that $xf + yg = x\alpha_i + y\beta_i$ there, then the equation $\sum_1^n(x\alpha_i + y\beta_i)\,\Delta t_i = x\sum_1^n \alpha_i\,\Delta t_i + y\sum_1^n \beta_i\,\Delta t_i$ is just $\int_a^b(xf + yg) = x\int_a^b f + y\int_a^b g$. The map $f \mapsto \int_a^b f$ is thus linear from $\mathcal{S}$ to $W$. Finally,

$$\left\|\int_a^b f\right\| \le \sum_1^n \|\alpha_i\|\,\Delta t_i \le (b - a)\|f\|_\infty,$$

where $\|f\|_\infty = \operatorname{lub}\{\|f(t)\| : t \in [a, b]\} = \max\{\|\alpha_i\| : 1 \le i \le n\}$. That is, if we use on $\mathcal{S}$ the uniform norm defined from the norm of $W$, then the linear mapping $f \mapsto \int_a^b f$ is bounded by $(b - a)$. If $W$ is complete, this transformation therefore has a unique bounded linear extension to the closure $\bar{\mathcal{S}}$ of $\mathcal{S}$ in $\mathcal{B}([a, b], W)$, by Theorem 10.1. But we can show that $\bar{\mathcal{S}}$ includes the space $\mathcal{C}([a, b], W)$ of all continuous functions from $[a, b]$ to $W$, and the integral of a continuous function is thus uniquely defined.
Lemma 10.1. $\mathcal{C}([a, b], W) \subset \bar{\mathcal{S}}$.
Proof. A continuous function $f$ on $[a, b]$ is uniformly continuous (Theorem 5.1). That is, given $\epsilon > 0$, there exists $\delta > 0$ such that $|s - t| < \delta \Rightarrow \|f(s) - f(t)\| < \epsilon$. Now take any partition $A = \{t_i\}_0^n$ of $[a, b]$ such that $\Delta t_i = t_i - t_{i-1} < \delta$ for all $i$, and take $\alpha_i$ as any value of $f$ on $(t_{i-1}, t_i)$. Then $\|f(t) - \alpha_i\| < \epsilon$ on $[t_{i-1}, t_i]$. Thus, if $g$ is the step function with value $\alpha_i$ on $(t_{i-1}, t_i]$ and $g(a) = \alpha_1$, then $\|f - g\|_\infty \le \epsilon$. Thus $f$ is in $\bar{\mathcal{S}}$, as desired. $\square$
Our main theorem is a recapitulation.
Theorem 10.2. If $W$ is a Banach space and $V = \mathcal{C}([a, b], W)$ under the uniform norm, then there exists a $J \in \operatorname{Hom}(V, W)$ uniquely determined by setting $J(f) = \lim \int_a^b f_n$, where $\{f_n\}$ is any sequence in $\mathcal{S}$ converging to $f$ and $\int_a^b f_n$ is the integral on $\mathcal{S}$ defined above. Moreover, $\|J\| \le (b - a)$.
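A numerical sketch (ours, not from the text) of this limit process for the arc $f(t) = \langle\sin t, \cos t\rangle$ on $[0, \pi]$, approximated by step functions that take the value $f(t_{i-1})$ on each subdivision interval:

```python
# A sketch (not from the text): the elementary integrals of step-function
# approximants converge to the integral of a continuous arc f: [0, pi] -> R^2.
import numpy as np

def approx_integral(f, a, b, n):
    t = np.linspace(a, b, n + 1)
    return sum(f(t[i]) * (t[i + 1] - t[i]) for i in range(n))

f = lambda t: np.array([np.sin(t), np.cos(t)])
for n in (10, 100, 1000):
    print(n, approx_integral(f, 0.0, np.pi, n))   # tends to [2, 0]
```

Note that the limit $\langle 2, 0\rangle$ has norm 2, comfortably within the bound $(b - a)\|f\|_\infty = \pi$.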
If $f$ is elementary from $[a, b]$ to $W$ and $c \in [a, b]$, then of course $f$ is elementary on each of $[a, c]$ and $[c, b]$. If $c$ is added to a subdivision $A$ used in defining $f$, and if the sum defining $\int_a^b f$ with respect to $B = A \cup \{c\}$ is broken into two sums at $c$, we clearly have $\int_a^b f = \int_a^c f + \int_c^b f$. This same identity then follows for any continuous function $f$ on $[a, b]$, since $\int_a^b f = \lim \int_a^b f_n = \lim(\int_a^c f_n + \int_c^b f_n) = \lim \int_a^c f_n + \lim \int_c^b f_n = \int_a^c f + \int_c^b f$.
The fundamental theorem of the calculus is still with us.
Theorem 10.3. If $f \in \mathcal{C}([a, b], W)$ and $F: [a, b] \to W$ is defined by $F(x) = \int_a^x f(t)\,dt$, then $F'$ exists on $(a, b)$ and $F'(x) = f(x)$.
Proof. By the continuity of $f$ at $x_0$, for every $\epsilon$ there exists a $\delta$ such that $\|f(x_0) - f(x)\| < \epsilon$ whenever $|x - x_0| < \delta$. But then

$$\left\|\int_{x_0}^x (f(x_0) - f(t))\,dt\right\| \le \epsilon|x - x_0|,$$

and since $\int_{x_0}^x f(x_0)\,dt = f(x_0)(x - x_0)$ by the definition of the integral for an elementary function, we see that

$$\left\|f(x_0) - \left(\int_{x_0}^x f(t)\,dt\Big/(x - x_0)\right)\right\| \le \epsilon.$$

Since $\int_{x_0}^x f(t)\,dt = F(x) - F(x_0)$, this is exactly the statement that the difference quotient for $F$ converges to $f(x_0)$, as was to be proved. $\square$
EXERCISES
10.1 Prove the following analogue of Theorem 10.1. Let $A$ be a subset of a metric space $B$, let $C$ be a complete metric space, and let $F: A \to C$ be uniformly continuous. Then $F$ extends uniquely to a continuous map from $\bar{A}$ to $C$.
10.2 In Exercises 7.16 through 7.18 we have constructed a completion of $S$, namely, $\overline{\Theta[S]}$ in $V^*$. Prove that this completion is unique to within isometry. That is, supposing that $\varphi$ is some other isometric imbedding of $S$ in a complete space $X$, show that the identification of the two images of $S$ by $\varphi \circ \Theta^{-1}$ (from $\Theta[S]$ to $\varphi[S]$) extends to an isometric bijection from $\overline{\Theta[S]}$ to $\overline{\varphi[S]}$. [Hint: Apply the above exercise.]
10.3 Suppose that $X$ is a normed linear space and that $X$ is a dense subset of a complete metric space $Y$. This means, remember, that every point of $Y$ is the limit of a sequence lying in the subset $X$. Prove that the vector space structure of $X$ extends in a unique way to make $Y$ a Banach space. Since we know from Exercise 7.18 that a metric space can be completed, this shows again that a normed linear space can always be completed to a Banach space.
10.4 In the elementary calculus, if $f$ is continuous, then

$$\int_a^b f(t)\,dt = f(x)(b - a)$$

for some $x$ in $(a, b)$. Show that this is not true for vector-valued continuous functions $f$ by considering the arc $f: [0, \pi] \to \mathbb{R}^2$ defined by $f(t) = \langle \sin t, \cos t\rangle$.
10.5 Show that integration commutes with the application of linear transformations. That is, show that if $f$ is a continuous function from $[a, b]$ to a Banach space $W$, and if $T \in \operatorname{Hom}(W, X)$, where $X$ is a Banach space, then

$$\int_a^b T(f(t))\,dt = T\left[\int_a^b f(t)\,dt\right].$$

[Hint: Make the computation directly for step functions.]
10.6 State and prove the theorem suggested by the following identity:

$$\int_a^b \langle f(t), g(t)\rangle\,dt = \left\langle \int_a^b f(t)\,dt,\; \int_a^b g(t)\,dt\right\rangle.$$

(Apply the above exercise.)
10.7 Let $W$ be any normed linear space, $\{\alpha_i\}_1^n$ a finite set of vectors in $W$, and $\{f_i\}_1^n$ a corresponding set of real-valued continuous functions on $[a, b]$. Define the arc $\gamma$ by

$$\gamma(t) = \sum_1^n f_i(t)\alpha_i.$$

Prove that $\int_a^b \gamma(t)\,dt$ exists and equals

$$\sum_1^n \left[\int_a^b f_i(t)\,dt\right]\alpha_i.$$
10.8 Let $f$ be a continuous function from $\mathbb{R}^2$ to a Banach space $W$. Describe how one might set up a theory of a double integral

$$\iint_{I\times J} f(s, t)\,ds\,dt,$$

where $I \times J$ is a closed rectangle.
10.9 Prove that if $f_n$ converges uniformly to $f$, then

$$\int_a^b f_n(t)\,dt \to \int_a^b f(t)\,dt.$$

This is trivial if you have understood the definition and properties of the integral.
10.10 Suppose that $\{f_n\}$ is a sequence of smooth arcs from $[a, b]$ to a Banach space $W$ such that $\sum_1^\infty f_n'(t)$ is uniformly convergent. Suppose also that $\sum_1^\infty f_n(a)$ is convergent. Prove that then $\sum_1^\infty f_n(t)$ is uniformly convergent, that $f = \sum_1^\infty f_n$ is smooth, and that $f' = \sum_1^\infty f_n'$. (Use the above exercise and the fundamental theorem of the calculus.)
10.11 Prove that even if $W$ is not a Banach space, if the arc $f: [a, b] \to W$ has a continuous derivative, then $\int_a^b f'$ exists and equals $f(b) - f(a)$.
10.12 Let $X$ be a normed linear space, and set $(l, \xi) = l(\xi)$ for $\xi \in X$ and $l \in X^*$. Now let $f$ and $g$ be continuously differentiable functions (arcs) from the closed interval $[a, b]$ to $X$ and $X^*$, respectively. Prove the integration by parts formula:

$$(g(b), f(b)) - (g(a), f(a)) = \int_a^b (g'(t), f(t))\,dt + \int_a^b (g(t), f'(t))\,dt.$$

[Hint: Apply Theorem 8.4 from Chapter 3.]
10.13 State the generalization of the above integration by parts formula that holds for any bounded bilinear mapping $\omega: V \times W \to X$, where $X$ is a Banach space.
10.14 Let $t \mapsto l_t$ be a fixed continuous map from a closed interval $[a, b]$ to the dual $W^*$ of a Banach space $W$. Suppose that for any continuous map $g$ from $[a, b]$ to $W$

$$\int_a^b g(t)\,dt = 0 \;\Rightarrow\; \int_a^b l_t(g(t))\,dt = 0.$$

Show that there exists a fixed $L \in W^*$ such that

$$\int_a^b l_t(g(t))\,dt = L\left(\int_a^b g(t)\,dt\right)$$

for all continuous arcs $g: [a, b] \to W$. Show that it then follows that $l_t = L$ for all $t$.
10.15 Use the above exercise to deduce the general Euler equation of Section 3.15.
11. THE COMPLEX NUMBER SYSTEM
The complex number system C is the third basic number field that must be
studied, after the rational numbers and the real numbers, and the reader surely
has had some contact with it in the past.
Almost everybody views a complex number $\zeta$ as being equivalent to a pair of real numbers, the "real and imaginary parts" of $\zeta$, and the complex number system $\mathbb{C}$ is thus viewed as being Cartesian 2-space $\mathbb{R}^2$ with some further structure. In particular, a complex-valued function is simply a certain kind of vector-valued function, and is equivalent to an ordered pair of real-valued functions, again its real and imaginary parts.
What distinguishes the complex number system $\mathbb{C}$ from its vector substratum $\mathbb{R}^2$ is the presence of an additional operation, complex multiplication. The vector operations of $\mathbb{R}^2$ together with this complex multiplication operation make $\mathbb{C}$ into a commutative algebra. Moreover, it turns out that $\langle 1, 0\rangle$ is the unique multiplicative identity in $\mathbb{C}$ and that every nonzero complex number $\zeta$ has a multiplicative inverse. These additional facts are summarized by saying that $\mathbb{C}$ is a field, and they allow us to use $\mathbb{C}$ as a new scalar field in vector space theory. In fact, the whole development of Chapters 1 and 2 remains valid when $\mathbb{R}$ is replaced everywhere by $\mathbb{C}$. Scalar multiplication is now multiplication by complex numbers. Thus $\mathbb{C}^n$ is the vector space of ordered $n$-tuples of complex numbers $\langle \zeta_1, \ldots, \zeta_n\rangle$, and the product of an $n$-tuple by a complex scalar $\alpha$ is defined by $\alpha\langle \zeta_1, \ldots, \zeta_n\rangle = \langle \alpha\zeta_1, \ldots, \alpha\zeta_n\rangle$, where $\alpha\zeta_i$ is complex multiplication.
It is time to come to grips with complex multiplication. As the reader probably knows, it is given by an odd-looking formula that is motivated by thinking of an element $\xi = \langle x_1, x_2\rangle$ as being in the form $x_1 + ix_2$, where $i^2 = -1$, and then using the ordinary laws of algebra. Then we have

$$\xi\eta = (x_1 + ix_2)(y_1 + iy_2) = x_1y_1 + ix_1y_2 + ix_2y_1 + i^2x_2y_2 = (x_1y_1 - x_2y_2) + i(x_1y_2 + x_2y_1),$$

and thus our definition is

$$\langle x_1, x_2\rangle\langle y_1, y_2\rangle = \langle x_1y_1 - x_2y_2,\; x_1y_2 + x_2y_1\rangle.$$
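A quick sketch (ours, not from the text) of this definition, checked against Python's built-in complex type:

```python
# A minimal sketch (not from the text) of complex multiplication defined
# directly on ordered pairs <x1, x2>, compared with the built-in complex type.

def cmul(xi, eta):
    x1, x2 = xi
    y1, y2 = eta
    return (x1 * y1 - x2 * y2, x1 * y2 + x2 * y1)

xi, eta = (1.0, 2.0), (3.0, -1.0)
print(cmul(xi, eta))                     # (5.0, 5.0)
print(complex(*xi) * complex(*eta))      # (5+5j), the same number
```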
Of course, it has to be verified that this operation is commutative and satisfies the laws for an algebra. A straightforward check is possible but dull, and we shall indicate a neater way in the exercises.

The mapping $x \mapsto \langle x, 0\rangle$ is an isomorphic injection of the field $\mathbb{R}$ into the field $\mathbb{C}$. It clearly preserves sums, and the reader can check in his mind that it also preserves products. It is conventional to identify $x$ with its image $\langle x, 0\rangle$, and so to view $\mathbb{R}$ as a subfield of $\mathbb{C}$.

The mysterious $i$ can be identified in $\mathbb{C}$ as the pair $\langle 0, 1\rangle$, since then $i^2 = \langle 0, 1\rangle\langle 0, 1\rangle = \langle -1, 0\rangle$, which we have identified with $-1$. With these identifications we have $\langle x, y\rangle = \langle x, 0\rangle + \langle 0, y\rangle = \langle x, 0\rangle + \langle 0, 1\rangle\langle y, 0\rangle = x + iy$, and this is the way we shall write complex numbers from now on.
The mapping $x + iy \mapsto x - iy$ is a field isomorphism of $\mathbb{C}$ with itself. That is, it preserves both sums and products, as the reader can easily check. Such a self-isomorphism is called an automorphism. The above automorphism is called complex conjugation, and the image $x - iy$ of $\zeta = x + iy$ is called the conjugate of $\zeta$, and is designated $\bar{\zeta}$. We shall ask the reader to show in an exercise that conjugation is the only automorphism of $\mathbb{C}$ (except the identity automorphism) which leaves the elements of the subfield $\mathbb{R}$ fixed.
The Euclidean norm of $\zeta = x + iy = \langle x, y\rangle$ is called the absolute value of $\zeta$, and is designated $|\zeta|$, so that $|\zeta| = |x + iy| = (x^2 + y^2)^{1/2}$. This is reasonable because it then turns out that $|\zeta\gamma| = |\zeta|\,|\gamma|$. This can be verified by squaring and multiplying, but it is much more elegant first to notice the relationship between absolute value and the conjugation automorphism, namely,

$$\zeta\bar{\zeta} = |\zeta|^2$$

$[(x + iy)(x - iy) = x^2 - (iy)^2 = x^2 + y^2]$. Then $|\zeta\gamma|^2 = (\zeta\gamma)(\bar{\zeta}\bar{\gamma}) = (\zeta\bar{\zeta})(\gamma\bar{\gamma}) = |\zeta|^2|\gamma|^2$, and taking square roots gives us our identity. The identity $\zeta\bar{\zeta} = |\zeta|^2$ also shows us that if $\zeta \ne 0$, then $\bar{\zeta}/|\zeta|^2$ is its multiplicative inverse.
Because the real number system $\mathbb{R}$ is a subfield of the complex number system $\mathbb{C}$, any vector space over $\mathbb{C}$ is automatically also a vector space over $\mathbb{R}$: multiplication by complex scalars includes multiplication by real scalars. And any complex linear transformation between complex vector spaces is automatically real linear. The converse, of course, does not hold. For example, a real linear mapping $T$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ is not in general complex linear from $\mathbb{C}$ to $\mathbb{C}$, nor does a real linear $S$ in $\operatorname{Hom}\mathbb{R}^4$ become a complex linear mapping in $\operatorname{Hom}\mathbb{C}^2$ when $\mathbb{R}^4$ is viewed as $\mathbb{C}^2$. We shall study this question in the exercises.
The complex differentiability of a mapping $F$ between complex vector spaces has the obvious definition $\Delta F_\alpha = T + \theta$, where $T$ is complex linear (and $\theta$ is an infinitesimal of higher order), and then $F$ is also real differentiable, in view of the above remarks. But $F$ may be real differentiable without being complex differentiable. It follows from the discussion at the end of Section 8 that if $\{a_n\} \subset \mathbb{C}$ and $\{|a_n|\delta^n\}$ is bounded, then the series $\sum a_n\zeta^n$ converges on the ball $B_\delta(0)$ in the (real) Banach algebra $\mathbb{C}$, and $F(\zeta) = \sum_0^\infty a_n\zeta^n$ is real differentiable on this ball, with $dF_\beta(\zeta) = (\sum_1^\infty na_n\beta^{n-1})\cdot\zeta = F'(\beta)\cdot\zeta$. But multiplication by $F'(\beta)$ is obviously a complex linear operation on the one-dimensional complex vector space $\mathbb{C}$. Therefore, complex-valued functions defined by convergent complex power series are automatically complex differentiable. But we can go even further. In this case, if $\zeta \ne 0$, we can divide by $\zeta$ in the defining equation

$$\Delta F_\beta(\zeta) = F'(\beta)\cdot\zeta + \theta(\zeta)$$

to get the result that

$$\frac{\Delta F_\beta(\zeta)}{\zeta} \to F'(\beta) \qquad \text{as } \zeta \to 0.$$

That is, $F'(\beta)$ is now an honest derivative again, with the complex infinitesimal $\zeta$ in the denominator of the difference quotient.
The consequences of complex differentiability are incalculable, and we shall mostly leave them as future pleasures to be experienced in a course on functions of a complex variable. See, however, the problems on the residue calculus at the end of Chapter 12 and the proof in Chapter 11, Exercise 4.3, of the following fundamental theorem of algebra.
Theorem. Every polynomial with complex coefficients is a product of
linear factors.
A weaker but equivalent statement is that every polynomial has at least one (complex) root. The crux of the matter is that $x^2 + 1$ cannot be factored over $\mathbb{R}$ (i.e., it has no real root), but over $\mathbb{C}$ we have $x^2 + 1 = (x + i)(x - i)$, with the two roots $\pm i$.
For later use we add a few more words about the complex exponential function $\exp\zeta = e^\zeta = \sum_0^\infty \zeta^n/n!$. If $\zeta = x + iy$, we have $e^\zeta = e^{x+iy} = e^xe^{iy}$, and $e^{iy} = \sum_0^\infty (iy)^n/n! = (1 - y^2/2! + y^4/4! - \cdots) + i(y - y^3/3! + y^5/5! - \cdots) = \cos y + i\sin y$. Thus $e^{x+iy} = e^x(\cos y + i\sin y)$. That is, the real and imaginary parts of the complex-valued function $\exp(x + iy)$ are $e^x\cos y$ and $e^x\sin y$, respectively.
EXERCISES
11.1 Prove the associativity of complex multiplication directly from its definition.
11.2 Prove the distributive law,

$$\alpha(\xi + \eta) = \alpha\xi + \alpha\eta,$$

for complex numbers.
11.3 Show that scalar multiplication by a real number $a$, $a\langle x, y\rangle = \langle ax, ay\rangle$, in $\mathbb{C} = \mathbb{R}^2$ is consistent with the interpretation of $a$ as the complex number $\langle a, 0\rangle$ and the definition of complex multiplication.
11.4 Let $\theta$ be an automorphism of the complex number field leaving the real numbers fixed. Prove that $\theta$ is either the identity or complex conjugation. [Hint: $(\theta(i))^2 = \theta(i^2) = \theta(-1) = -1$. Show that the only complex numbers $x + iy$ whose squares are $-1$ are $\pm i$, and then finish up.]
11.5 If we remember that $\mathbb{C}$ is in particular the two-dimensional real vector space $\mathbb{R}^2$, we see that multiplying the elements of $\mathbb{C}$ by the complex number $a + ib$ must define a linear transformation on $\mathbb{R}^2$. Show that its matrix is

$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix}.$$

11.6 The above exercise suggests that the complex number system may be like the set $A$ of all $2 \times 2$ real matrices of the form

$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix}.$$

Prove that $A$ is a subalgebra of the matrix algebra $\mathbb{R}^{2\times 2}$ (that is, $A$ is closed under multiplication, addition, and scalar multiplication) and that the mapping

$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix} \mapsto a + ib$$

is a bijection from $A$ to $\mathbb{C}$ that preserves all algebra operations. We therefore can conclude that the laws of an algebra automatically hold for $\mathbb{C}$. Why?
11.7 In the above matrix model of the complex number system, show that the absolute value identity $|\zeta\gamma| = |\zeta|\,|\gamma|$ is a determinant property.
11.8 Let $W$ be a real vector space, and let $V$ be the real vector space $W \times W$. Show that there is a $\theta$ in $\operatorname{Hom} V$ such that $\theta^2 = -I$. (Think of $\mathbb{C}$ as being the real vector space $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$ under multiplication by $i$.)
11.9 Let $V$ be a real vector space, and let $\theta$ in $\operatorname{Hom} V$ satisfy $\theta^2 = -I$. Show that $V$ becomes a complex vector space if $i\alpha$ is defined as $\theta(\alpha)$. If the complex vector space $V$ is made from the real vector space $W$ as in this and the above exercise, we shall call $V$ the complexification of $W$. We shall regard $W$ itself as being a real subspace of $V$ (actually $W \times \{0\}$), and then $V = W \oplus iW$.
11.10 Show that the complex vector space $\mathbb{C}^n$ is the complexification of $\mathbb{R}^n$. Show more generally that for any set $A$ the complex vector space $\mathbb{C}^A$ is the complexification of the real vector space $\mathbb{R}^A$.
11.11 Let $V$ be the complexification of the real vector space $W$. Define the operation of complex conjugation on $V$. That is, show that there is a real linear mapping $\varphi$ such that $\varphi^2 = I$ and $\varphi(i\alpha) = -i\varphi(\alpha)$. Show, conversely, that if $V$ is a complex vector space and $\varphi$ is a conjugation on $V$ [a real linear mapping $\varphi$ such that $\varphi^2 = I$ and $\varphi(i\alpha) = -i\varphi(\alpha)$], then $V$ is (isomorphic to) the complexification of a real linear space $W$. (Apply Theorem 5.5 of Chapter 1 to the identity $\varphi^2 - I = 0$.)
11.12 Let $W$ be a real vector space, and let $V$ be its complexification. Show that every $T$ in $\operatorname{Hom} W$ "extends" to a complex linear $S$ in $\operatorname{Hom} V$ which commutes with the conjugation $\varphi$. By $S$ extending $T$ we mean, of course, that $S \mid (W \times \{0\}) = T$. Show, conversely, that if $S$ in $\operatorname{Hom} V$ commutes with conjugation, then $S$ is the extension of a $T$ in $\operatorname{Hom} W$.
11.13 In this situation we naturally call $S$ the complexification of $T$. Show finally that if $S$ is the complexification of $T$, then its null space $X$ in $V$ is the direct sum $X = N \oplus iN$, where $N$ is the null space of $T$ in $W$. Remember that we are viewing $V$ as $W \oplus iW$.
11.14 On a complex normed linear space $V$ the norm is required to be complex homogeneous:

$$\|\lambda\alpha\| = |\lambda|\cdot\|\alpha\|$$

for all complex numbers $\lambda$. Show that the natural definitions of $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$ on $\mathbb{C}^n$ have this property.
11.15 If a real normed linear space $W$ is complexified to $V = W \oplus iW$, there is no trivial formula which converts the real norm for $W$ into a complex norm for $V$. Show that, nevertheless, any product norm on $V$ (which really is $W \times W$) can be used to generate an equivalent complex norm. [Hint: Given $\langle\xi, \eta\rangle \in V$, consider the set of numbers $\{\|(x + iy)\langle\xi, \eta\rangle\| : |x + iy| = 1\}$, and try to obtain from this set a single number that works.]
11.16 Show that every nonzero complex number has a logarithm. That is, show that if $u + iv \ne 0$, then there exists an $x + iy$ such that $e^{x+iy} = u + iv$. (Write the equation $e^x(\cos y + i\sin y) = u + iv$, and solve by being slightly clever.)
11.17 The fundamental theorem of algebra and Theorem 5.5 of Chapter 1 imply that if $V$ is a complex vector space and $T$ in $\operatorname{Hom} V$ satisfies $p(T) = 0$ for a polynomial $p$, then there are subspaces $\{V_i\}_1^n$ of $V$, complex numbers $\{\lambda_i\}_1^n$, and integers $\{m_i\}_1^n$ such that $V = \bigoplus_1^n V_i$, $V_i$ is $T$-invariant for each $i$, and $(T - \lambda_iI)^{m_i} = 0$ on $V_i$ for each $i$. Show that this is so. Show also that if $V$ is finite-dimensional, then every $T$ in $\operatorname{Hom} V$ must satisfy some polynomial equation $p(t) = 0$. (Consider the linear independence or dependence of the vectors $I, T, T^2, \ldots, T^{n^2}, \ldots$ in the vector space $\operatorname{Hom} V$.)
11.18 Suppose that the polynomial $p$ in the above exercise has real coefficients. Use the fact that complex conjugation is an automorphism of $\mathbb{C}$ to prove that if $\lambda$ is a root of $p$, then so is $\bar{\lambda}$.
Show that if $V$ is the complexification of a real space $W$ and $T$ is the complexification of $R \in \operatorname{Hom} W$, then there exists a real polynomial $p$ such that $p(T) = 0$.
11.19 Show that if $W$ is a finite-dimensional real vector space and $R \in \operatorname{Hom} W$ is an isomorphism, then there exists an $A \in \operatorname{Hom} W$ such that $R = e^A$ (that is, $\log R$ exists). This is a hard exercise, but it can be proved from Exercises 8.19 through 8.23, 11.12, 11.17, and 11.18.
*12. WEAK METHODS
Our theorem that all norms are equivalent on a finite-dimensional space suggests that the limit theory of such spaces should be accessible independently of norms, and our earlier theorem that every linear transformation with a finite-dimensional domain is automatically bounded reinforces this impression. We shall look into this question in this section. In a sense this effort is irrelevant, since we can't do without norms completely, and since they are so handy that we use them even when we don't have to.

Roughly speaking, what we are going to do is to study a vector-valued map $F$ by studying the whole collection of real-valued maps $\{l \circ F : l \in V^*\}$.
Theorem 12.1. If $V$ is finite-dimensional, then $\xi_n \to \xi$ in $V$ (with respect to any, and so every, norm) if and only if $l(\xi_n) \to l(\xi)$ in $\mathbb{R}$ for each $l$ in $V^*$.
Proof. If $\xi_n \to \xi$ and $l \in V^*$, then $l(\xi_n) \to l(\xi)$, since $l$ is automatically continuous. Conversely, if $l(\xi_n) \to l(\xi)$ for every $l$ in $V^*$, then, choosing a basis $\{\beta_i\}_1^n$ for $V$, we have $l_i(\xi_n) \to l_i(\xi)$ for each functional $l_i$ in the dual basis, and this implies that $\xi_n \to \xi$ in the associated one-norm, since $\|\xi_n - \xi\|_1 = \sum_1^n |l_i(\xi_n) - l_i(\xi)| \to 0$. $\square$

Remark. If $V$ is an arbitrary normed linear space, so that $V^* = \operatorname{Hom}(V, \mathbb{R})$ is the set of bounded linear functionals, then we say that $\xi_n \to \xi$ weakly if $l(\xi_n) \to l(\xi)$ for each $l \in V^*$. The above theorem can therefore be rephrased to say that in a finite-dimensional space, weak convergence and norm convergence are equivalent notions.
We shall now see that in a similar way the integration and differentiation of parametrized arcs can all be thrown back to the standard calculus of real-valued functions of a real variable by applying functionals from $V^*$ and using the natural isomorphism of $V^{**}$ with $V$. Thus, if $f \in \mathcal{C}([a, b], V)$ and $\lambda \in V^*$, then $\lambda \circ f \in \mathcal{C}([a, b], \mathbb{R})$, and so the integral $\int_a^b \lambda \circ f$ exists from standard calculus. If we vary $\lambda$, we can check that the map $\lambda \mapsto \int_a^b \lambda \circ f$ is linear, hence is in $V^{**}$, and therefore is given by a uniquely determined vector $\alpha \in V$ (by duality; see Chapter 2, Theorem 3.2). That is, there exists a unique $\alpha \in V$ such that $\lambda(\alpha) = \int_a^b \lambda \circ f$ for every $\lambda \in V^*$, and we define this $\alpha$ to be $\int_a^b f$. Thus integration is defined so as to commute with the application of linear functionals: $\int_a^b f$ is that vector such that

$$\lambda\left(\int_a^b f\right) = \int_a^b \lambda(f(t))\,dt \qquad \text{for all } \lambda \in V^*.$$
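A sketch (ours, not from the text) of this weak definition in $V = \mathbb{R}^3$, where the dual-basis functionals are just the coordinate functions; integrating each $\lambda_i \circ f$ by an ordinary one-variable rule recovers the coordinates of $\int_a^b f$:

```python
# A sketch (not from the text) of the weak definition of the integral in R^3:
# apply each dual-basis functional to f and integrate the resulting
# real-valued functions of one variable.
import numpy as np

f = lambda s: np.array([s, s**2, np.cos(s)])
t = np.linspace(0.0, 1.0, 2001)
values = np.array([f(s) for s in t])      # rows are f(t)

# lambda_i(f(t)) is the i-th coordinate function; integrate each one
integral = np.array([np.trapz(values[:, i], t) for i in range(3)])
print(integral)    # approximately [1/2, 1/3, sin(1)]
```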
Similarly, if all the real-valued functions $\{\lambda \circ f : \lambda \in V^*\}$ are differentiable at $x_0$, then the mapping $\lambda \mapsto (\lambda \circ f)'(x_0)$ is linear by the linearity of the derivative in the standard calculus:

$$((x\lambda + y\mu) \circ f)'(x_0) = x(\lambda \circ f)'(x_0) + y(\mu \circ f)'(x_0).$$

Therefore, there is again a unique $\alpha \in V$ such that

$$(\lambda \circ f)'(x_0) = \lambda(\alpha) \qquad \text{for all } \lambda \in V^*,$$

and if we define this $\alpha$ to be the derivative $f'(x_0)$, we have again defined an operation of the calculus by commutativity with linear functionals:

$$(\lambda \circ f')(x_0) = (\lambda \circ f)'(x_0).$$

Now the fundamental theorem of the calculus appears as follows. If $F(x) = \int_a^x f$, then $(\lambda \circ F)(x) = \int_a^x \lambda \circ f$ by the weak definition of the integral. The fundamental theorem of the standard calculus then says that $(\lambda \circ F)'$ exists and $(\lambda \circ F)'(x) = (\lambda \circ f)(x) = \lambda(f(x))$. By the weak definition of the derivative we then have that $F'$ exists and $F'(x) = f(x)$.
The one conclusion that we don't get so easily by weak methods is the norm inequality $\|\int_a^b f\| \le (b - a)\|f\|_\infty$. This requires a theorem about norms on finite-dimensional spaces that we shall not prove in this course.
Theorem 12.2. $\|\alpha^{**}\| = \|\alpha\|$ for each $\alpha \in V$.
What is being asserted is that $\operatorname{lub}|\alpha^{**}(\lambda)|/\|\lambda\| = \|\alpha\|$. Since $\alpha^{**}(\lambda) = \lambda(\alpha)$, and since $|\lambda(\alpha)| \le \|\lambda\|\cdot\|\alpha\|$ by the definition of $\|\lambda\|$, we see that

$$\operatorname{lub}|\alpha^{**}(\lambda)|/\|\lambda\| \le \|\alpha\|.$$

Our problem is therefore to find $\lambda \in V^*$ with $\|\lambda\| = 1$ and $|\lambda(\alpha)| = \|\alpha\|$. If we multiply through by a suitable constant (replacing $\alpha$ by $c\alpha$, where $c = 1/\|\alpha\|$), we can suppose that $\|\alpha\| = 1$. Then $\alpha$ is on the unit spherical surface, and the problem is to find a functional $\lambda \in V^*$ such that the affine subspace (hyperplane) where $\lambda = 1$ touches the unit sphere at $\alpha$ (so that $\lambda(\alpha) = 1$) and otherwise lies outside the unit sphere (so that $|\lambda(\xi)| \le 1$ when $\|\xi\| = 1$, and hence $\|\lambda\| \le 1$). It is clear geometrically that such "tangent planes" must exist, but we shall drop the matter here.
If we assume this theorem, then, since

$$\left|\lambda\left(\int_a^b f\right)\right| = \left|\int_a^b \lambda(f(t))\,dt\right| \le (b - a)\max\{|\lambda(f(t))| : t \in [a, b]\} \le (b - a)\|\lambda\|\max\{\|f(t)\|\} = (b - a)\|\lambda\|\cdot\|f\|_\infty$$

(from $|\lambda(\alpha)| \le \|\lambda\|\cdot\|\alpha\|$), we get

$$\left\|\int_a^b f\right\| = \operatorname{lub}_\lambda \left|\lambda\left(\int_a^b f\right)\right|\Big/\|\lambda\| \le (b - a)\|f\|_\infty,$$

the extreme members of which form the desired inequality.
CHAPTER 5
SCALAR PRODUCT SPACES
In this short chapter we shall look into what is going on behind two-norms, and we shall find that a wholly new branch of linear analysis is opened up. These norms can be characterized abstractly as those arising from scalar products. They are the finite- and infinite-dimensional analogues of ordinary geometric length, and they carry with them practically all the concepts of Euclidean geometry, such as the notion of the angle between two vectors, perpendicularity (orthogonality) and the Pythagorean theorem, and the existence of many rigid motions.

The impact of this extra structure is particularly dramatic for infinite-dimensional spaces. Infinite orthogonal bases exist in great profusion and can be handled about as easily as bases in finite-dimensional spaces, although the basis expansion of a vector is now a convergent infinite series, $\xi = \sum_1^\infty x_n\alpha_n$. Many of the most important series expansions in mathematics are examples of such orthogonal basis expansions. For example, we shall see in the next chapter that the Fourier series expansion of a continuous function $f$ on $[0, \pi]$ is the basis expansion of $f$ under the two-norm $\|f\|_2 = (\int_0^\pi f^2)^{1/2}$ for the particular orthogonal basis $\{\alpha_n\}_1^\infty = \{\sin nt\}_1^\infty$. If a vector space is complete under a scalar product norm, it is called a Hilbert space. The more advanced theory of such spaces is one of the most beautiful parts of mathematics.
1. SCALAR PRODUCTS
A scalar product on a real vector space $V$ is a real-valued function on $V \times V$, its value at the pair $\langle\xi, \eta\rangle$ ordinarily being designated $(\xi, \eta)$, such that
a) $(\xi, \eta)$ is linear in $\xi$ when $\eta$ is held fixed;
b) $(\xi, \eta) = (\eta, \xi)$ (symmetry);
c) $(\xi, \xi) > 0$ if $\xi \ne 0$ (positive definiteness).
If (c) is replaced by the weaker condition
c′) $(\xi, \xi) \ge 0$ for all $\xi \in V$,
then $(\xi, \eta)$ is called a semiscalar product.
Two important examples of scalar products are

$$(x, y) = \sum_1^n x_iy_i \qquad \text{when } V = \mathbb{R}^n$$

and

$$(f, g) = \int_a^b f(t)g(t)\,dt \qquad \text{when } V = \mathcal{C}([a, b]).$$
On a complex vector space (b) must be replaced by
b′) $(\xi, \eta) = \overline{(\eta, \xi)}$ (Hermitian symmetry),
where the bar denotes complex conjugation. The corresponding examples are $(z, w) = \sum_1^n z_i\bar{w}_i$ when $V = \mathbb{C}^n$ and $(f, g) = \int_a^b f\bar{g}$ when $V$ is the space of continuous complex-valued functions on $[a, b]$. We shall study only the real case.
It follows from (a) and (b) that a semiscalar product is also linear in the second variable when the first variable is held fixed, and therefore is a symmetric bilinear functional whose associated quadratic form $q(\xi) = (\xi, \xi)$ is positive definite or positive semidefinite [(c) or (c′); see the last section in Chapter 2]. The definiteness of the form $q$ has far-reaching consequences, as we shall begin to see at once.
Theorem 1.1. The Schwarz inequality

$$|(\xi, \eta)| \le (\xi, \xi)^{1/2}(\eta, \eta)^{1/2}$$

is valid for any semiscalar product.
Proof. We have $0 \le (\xi - t\eta, \xi - t\eta) = (\xi, \xi) - 2t(\xi, \eta) + t^2(\eta, \eta)$ for every $t \in \mathbb{R}$. Since this quadratic in $t$ is never negative, it cannot have distinct roots, and the usual $(b^2 - 4ac)$-formula implies that $4(\xi, \eta)^2 - 4(\xi, \xi)(\eta, \eta) \le 0$, which is equivalent to the Schwarz inequality. $\square$

We can also proceed directly. If $(\eta, \eta) > 0$, and if we set $t = (\xi, \eta)/(\eta, \eta)$ in the quadratic inequality in the first line of the proof, then the resulting expression simplifies to the Schwarz inequality. If $(\eta, \eta) = 0$, then $(\xi, \eta)$ must also be $0$ (or else the beginning inequality is clearly false for some $t$), and now the Schwarz inequality holds trivially.
Corollary. If $(\xi, \eta)$ is a scalar product, then $\|\xi\| = (\xi, \xi)^{1/2}$ is a norm.
Proof.

$$\|\xi + \eta\|^2 = (\xi + \eta, \xi + \eta) = \|\xi\|^2 + 2(\xi, \eta) + \|\eta\|^2 \le \|\xi\|^2 + 2\|\xi\|\,\|\eta\| + \|\eta\|^2 = (\|\xi\| + \|\eta\|)^2$$

(by Schwarz), proving the triangle inequality. Also, $\|c\xi\| = (c\xi, c\xi)^{1/2} = (c^2(\xi, \xi))^{1/2} = |c|\,\|\xi\|$. $\square$

Note that the Schwarz inequality $|(\xi, \eta)| \le \|\xi\|\,\|\eta\|$ is now just the statement that the bilinear functional $(\xi, \eta)$ is bounded by one with respect to the scalar product norm.
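A numerical illustration (ours, not from the text) of the Schwarz inequality for the integral scalar product on $\mathcal{C}([0, 1])$:

```python
# A numerical check (not from the text) of |(f, g)| <= ||f|| ||g|| for the
# scalar product (f, g) = integral of f*g over [0, 1].
import numpy as np

t = np.linspace(0.0, 1.0, 10001)
ip = lambda f, g: np.trapz(f(t) * g(t), t)

f = lambda s: np.sin(3 * s)
g = lambda s: s**2 + 1
lhs = abs(ip(f, g))
rhs = np.sqrt(ip(f, f)) * np.sqrt(ip(g, g))
print(lhs, rhs, lhs <= rhs)      # Schwarz holds
```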
A normed linear space $V$ in which the norm is a scalar product norm is called a pre-Hilbert space. If $V$ is complete in this norm, it is a Hilbert space. The two examples of scalar products mentioned earlier give us the real explanation of our two-norms for the first time:

$$\|x\|_2 = \left(\sum_1^n x_i^2\right)^{1/2} \quad \text{for } x \in \mathbb{R}^n \qquad \text{and} \qquad \|f\|_2 = \left(\int_a^b f^2\right)^{1/2} \quad \text{for } f \in \mathcal{C}([a, b])$$

are scalar product norms.

Since the scalar product norm on $\mathbb{R}^n$ becomes Euclidean length under a Cartesian coordinate correspondence with Euclidean $n$-space, it is conventional to call $\mathbb{R}^n$ itself Euclidean $n$-space $E^n$ when we want it understood that the scalar product norm is being used.

Any finite-dimensional space $V$ is a Hilbert space with respect to any scalar product norm, because its finite dimensionality guarantees its completeness. On the other hand, we shall see in Exercise 1.10 that $\mathcal{C}([a, b])$ is incomplete in the two-norm, and is therefore a pre-Hilbert space but not a Hilbert space in this norm. (Remember, however, that $\mathcal{C}([a, b])$ is complete in the uniform norm $\|\cdot\|_\infty$.) It is important to the real uses of Hilbert spaces in mathematics that any pre-Hilbert space can be completed to a Hilbert space, but the theory of infinite-dimensional Hilbert spaces is for the most part beyond the scope of this book.
Scalar product norms have in some sense the smoothest possible unit
spheres, because these spheres are quadratic surfaces.
It is orthogonality that gives the theory of pre-Hilbert spaces its special flavor. Two vectors $\alpha$ and $\beta$ are said to be orthogonal, written $\alpha \perp \beta$, if $(\alpha, \beta) = 0$. This definition gets its inspiration from geometry; we noted in Chapter 1 that two geometric vectors are perpendicular if and only if their coordinate triples $x$ and $y$ satisfy $(x, y) = 0$. It is an interesting problem to go further and to show from the law of cosines ($c^2 = a^2 + b^2 - 2ab\cos\theta$) that the angle $\theta$ between two geometric vectors is given by $(x, y) = \|x\|\,\|y\|\cos\theta$. This would motivate us to define the angle $\theta$ between two vectors $\xi$ and $\eta$ in a pre-Hilbert space by $(\xi, \eta) = \|\xi\|\,\|\eta\|\cos\theta$, but we shall have no use for this more general formulation.
We say that two subsets $A$ and $B$ are orthogonal, and we write $A \perp B$, if $\alpha \perp \beta$ for every $\alpha$ in $A$ and $\beta$ in $B$; for any subset $A$ we set $A^\perp = \{\beta \in V : \beta \perp A\}$.
Lemma 1.1. If $\beta$ is orthogonal to the set $A$, then $\beta$ is orthogonal to $\overline{L(A)}$, the closure of the linear span of $A$. It follows that $B^\perp$ is a closed subspace for every subset $B$.
Proof. The first assertion depends on the linearity and continuity of the scalar product in one of its variables; it will be left to the reader. As for $A = B^\perp$, it includes the closure of its own linear span, by the first part, and so is a closed subspace. $\square$
Lemma 1.2. In any pre-Hilbert space we have the parallelogram law,

$$\|\alpha + \beta\|^2 + \|\alpha - \beta\|^2 = 2(\|\alpha\|^2 + \|\beta\|^2),$$

and the Pythagorean theorem,

$$\alpha \perp \beta \quad \text{if and only if} \quad \|\alpha + \beta\|^2 = \|\alpha\|^2 + \|\beta\|^2.$$

If $\{\alpha_i\}_1^n$ is a (pairwise) orthogonal collection of vectors, then

$$\left\|\sum_1^n \alpha_i\right\|^2 = \sum_1^n \|\alpha_i\|^2.$$

Proof. Since $\|\alpha + \beta\|^2 = \|\alpha\|^2 + 2(\alpha, \beta) + \|\beta\|^2$, by the bilinearity of the scalar product, we see that $\|\alpha + \beta\|^2 = \|\alpha\|^2 + \|\beta\|^2$ if and only if $(\alpha, \beta) = 0$, which is the Pythagorean theorem. Writing down the similar expansion of $\|\alpha - \beta\|^2$ and adding, we have the parallelogram law. The last statement follows from the Pythagorean theorem and Lemma 1.1 by induction. Or we can obtain this statement directly by expanding the scalar product on the left and noticing that all "mixed terms" drop out by orthogonality. $\square$
The reader will notice that the Schwarz inequality has not been used in this lemma, but it would have been silly to state the lemma before proving that $\|\xi\| = (\xi, \xi)^{1/2}$ is a norm.

If $\{\alpha_i\}_1^n$ are orthogonal and nonzero, then the identity $\|\sum_1^n x_i\alpha_i\|^2 = \sum_1^n x_i^2\|\alpha_i\|^2$ shows that $\sum_1^n x_i\alpha_i$ can be zero only if all the coefficients $x_i$ are zero. Thus,
Corollary. A finite collection of (pairwise) orthogonal nonzero vectors is independent. Similarly, a finite collection of orthogonal subspaces is independent.
EXERCISES
1.1 Complete the second proof of Theorem 1.1.
1.2 Reexamine the proof of Theorem 1.1 and show that if $\xi$ and $\eta$ are independent, then the Schwarz inequality is strict.
1.3 Continuing the above exercise, now show that the triangle inequality is strict if $\xi$ and $\eta$ are independent.
1.4 a) Show that the sum of two semiscalar products is a semiscalar product.
b) Show that if $(\mu, \nu)$ is a semiscalar product on a vector space $W$ and if $T$ is a linear transformation from a vector space $V$ to $W$, then $[\xi, \eta] = (T\xi, T\eta)$ is a semiscalar product on $V$.
c) Deduce from (a) and (b) that

$$(f, g) = f(a)g(a) + \int_a^b f'(t)g'(t)\,dt$$

is a semiscalar product on $V = \mathcal{C}^1([a, b])$. Prove that it is a scalar product.
1.5 If $\alpha$ is held fixed, we know that $f(\xi) = (\xi, \alpha)$ is continuous. Why? Prove more generally that $(\xi, \eta)$ is continuous as a map from $V \times V$ to $\mathbb{R}$.
1.6 Let $V$ be a two-dimensional Hilbert space, and let $\{\alpha_1, \alpha_2\}$ be any basis for $V$. Show that a scalar product $(\xi, \eta)$ has the form

$$(\xi, \eta) = ax_1y_1 + b(x_1y_2 + x_2y_1) + cx_2y_2,$$

where $b^2 < ac$. Here, of course, $\xi = x_1\alpha_1 + x_2\alpha_2$, $\eta = y_1\alpha_1 + y_2\alpha_2$.
1.7 Prove that if $\omega(x, y) = ax_1y_1 + b(x_1y_2 + x_2y_1) + cx_2y_2$, where $b^2 < ac$ and $a > 0$, then $\omega$ is a scalar product on $\mathbb{R}^2$.
1.8 Let $\omega(\xi, \eta)$ be any symmetric bilinear functional on a finite-dimensional vector space $V$, and let $q(\xi) = \omega(\xi, \xi)$ be its associated quadratic form. Show that for any choice of a basis for $V$ the equation $q(\xi) = 1$ becomes a quadratic equation in the coordinates $\{x_i\}$ of $\xi$.
1.9 Prove in detail that if a vector $\beta$ is orthogonal to a set $A$ in a pre-Hilbert space, then $\beta$ is orthogonal to $\overline{L(A)}$.
1.10 We know from the last chapter that the Riemann integral is defined for the set $\bar{\mathcal{S}}$ of uniform limits of real-valued step functions on $[0, 1]$ and that $\bar{\mathcal{S}}$ includes all the continuous functions. Given that $k$ is the step function whose value is 1 on $[0, \frac{1}{2}]$ and 0 on $(\frac{1}{2}, 1]$, show that $\|f - k\|_2 > 0$ for any continuous function $f$. Show, however, that there is a sequence of continuous functions $\{f_n\}$ such that $\|f_n - k\|_2 \to 0$. Show, therefore, that $\mathcal{C}([0, 1])$ is incomplete in the two-norm, by showing that the above sequence $\{f_n\}$ is Cauchy but not convergent in $\mathcal{C}([0, 1])$.
2. ORTHOGONAL PROJECTION
One of the most important devices in geometric reasoning is "dropping a perpendicular" from a point to a line or a plane and then using right-triangle arguments. This device is equally important in pre-Hilbert space theory. If $M$ is a subspace and $\alpha$ is any element in $V$, then by "the foot of the perpendicular dropped from $\alpha$ to $M$" we mean that vector $\mu$ in $M$ such that $(\alpha - \mu) \perp M$, if such a $\mu$ exists. (See Fig. 5.1.) Writing $\alpha$ as $\mu + (\alpha - \mu)$, we see that the existence of the "foot" $\mu$ in $M$ for each $\alpha$ in $V$ is equivalent to the direct sum decomposition $V = M \oplus M^\perp$. Now it is precisely this direct sum decomposition that the completeness of a Hilbert space guarantees, as we shall shortly see. We start by proving the geometrically intuitive fact that $\mu$ is the foot of the perpendicular dropped from $\alpha$ to $M$ if and only if $\mu$ is the point in $M$ closest to $\alpha$.
[Fig. 5.1]
Lemma 2.1. If $\mu$ is in the subspace $M$, then $(\alpha - \mu) \perp M$ if and only if $\mu$ is the unique point in $M$ closest to $\alpha$, that is, $\mu$ is the "best approximation" to $\alpha$ in $M$.
Proof. If $(\alpha - \mu) \perp M$ and $\xi$ is any other point in $M$, then

$$\|\alpha - \xi\|^2 = \|(\alpha - \mu) + (\mu - \xi)\|^2 = \|\alpha - \mu\|^2 + \|\mu - \xi\|^2 > \|\alpha - \mu\|^2.$$

Thus $\mu$ is the unique point in $M$ closest to $\alpha$. Conversely, suppose that $\mu$ is a point in $M$ closest to $\alpha$, and let $\xi$ be any nonzero vector in $M$. Then $\|\alpha - \mu\|^2 \le \|(\alpha - \mu) + t\xi\|^2$, which becomes $0 \le 2t(\alpha - \mu, \xi) + t^2\|\xi\|^2$ when the right-hand scalar product is expanded. This can hold for all $t$ only if $(\alpha - \mu, \xi) = 0$ (otherwise let $t = ?$). Therefore, $(\alpha - \mu) \perp M$. $\square$
On the basis of this lemma it is clear that a way to look for $\mu$ is to take a sequence $\mu_n$ in $M$ such that $\|\alpha - \mu_n\| \to \rho(\alpha, M)$ and to hope to define $\mu$ as its limit. Here is the crux of the matter: we can prove that such a sequence $\{\mu_n\}$ is always Cauchy, but its limit may not exist if $M$ is not complete!
Lemma 2.2. If $\{\mu_n\}$ is a sequence in the subspace $M$ whose distance from some vector $\alpha$ converges to the distance $\rho$ from $\alpha$ to $M$, then $\{\mu_n\}$ is Cauchy.
Proof. By the parallelogram law,

$$\|\mu_n - \mu_m\|^2 = \|(\alpha - \mu_m) - (\alpha - \mu_n)\|^2 = 2(\|\alpha - \mu_n\|^2 + \|\alpha - \mu_m\|^2) - \|2\alpha - (\mu_n + \mu_m)\|^2.$$

Since the first term on the right converges to $4\rho^2$ as $n, m \to \infty$, and since the second term is always $\le -4\rho^2$ (factor out the 2), we see that $\|\mu_n - \mu_m\|^2 \to 0$ as $n, m \to \infty$. $\square$
Theorem 2.1. If $M$ is a complete subspace of a pre-Hilbert space $V$, then $V = M \oplus M^\perp$. In particular, this is true for any finite-dimensional subspace of a pre-Hilbert space and for any closed subspace of a Hilbert space.
Proof. This follows at once from the last two lemmas, since now $\mu = \lim \mu_n$ exists, $\|\alpha - \mu\| = \rho(\alpha, M)$, and so $(\alpha - \mu) \perp M$. $\square$

If $V = M \oplus M^\perp$, then the projection on $M$ along $M^\perp$ is called the orthogonal projection on $M$, or simply the projection on $M$, since among all the projections on $M$ associated with the various complements of $M$, the orthogonal projection is distinguished. Thus, if $M$ is a complete subspace of $V$, and if $P$ is the projection on $M$, then $P(\xi)$ is at once the foot of the perpendicular dropped from $\xi$ to $M$ (which is where the word "projection" comes from) and also the best approximation to $\xi$ in $M$ (Lemma 2.1).
Lemma 2.3. If $\{M_i\}_1^n$ is a finite collection of complete, pairwise orthogonal subspaces, and if for a vector $\alpha$ in $V$, $\alpha_i$ is the projection of $\alpha$ on $M_i$ for $i = 1, \ldots, n$, then $\sum_1^n \alpha_i$ is the projection of $\alpha$ on $\bigoplus_1^n M_i$.
Proof. We have to show that $\alpha - \sum_1^n \alpha_i$ is orthogonal to $\bigoplus_1^n M_j$, and it is sufficient to show it orthogonal to each $M_j$ separately. But if $\xi \in M_j$, then $(\alpha - \sum_1^n \alpha_i, \xi) = (\alpha - \alpha_j, \xi)$, since $(\alpha_i, \xi) = 0$ for $i \ne j$, and $(\alpha - \alpha_j, \xi) = 0$ because $\alpha_j$ is the projection of $\alpha$ on $M_j$. Thus $(\alpha - \sum_1^n \alpha_i, \xi) = 0$. $\square$
Lemma 2.4. The projection of $\xi$ on the one-dimensional span of a single nonzero vector $\eta$ is $((\xi, \eta)/\|\eta\|^2)\eta$.
Proof. Here $\mu$ must be of the form $x\eta$. But $(\xi - x\eta) \perp \eta$ if and only if

$$(\xi, \eta) - x\|\eta\|^2 = 0, \qquad \text{or} \qquad x = \frac{(\xi, \eta)}{\|\eta\|^2}. \;\square$$
We call the number $(\xi, \eta)/\|\eta\|^2$ the $\eta$-Fourier coefficient of $\xi$. If $\eta$ is a unit (normalized) vector, then this Fourier coefficient is just $(\xi, \eta)$. It follows from Lemma 2.3 that if $\{\varphi_i\}_1^n$ is an orthogonal collection of nonzero vectors, and if $\{x_i\}_1^n$ are the corresponding Fourier coefficients of a vector $\xi$, then $\sum_1^n x_i\varphi_i$ is the projection of $\xi$ on the subspace $M$ spanned by $\{\varphi_i\}_1^n$. Therefore, $\xi - \sum_1^n x_i\varphi_i \perp M$, and (Lemma 2.1) $\sum_1^n x_i\varphi_i$ is the best approximation to $\xi$ in $M$. If $\xi$ is in $M$, then both of these statements say that $\xi = \sum_1^n x_i\varphi_i$. (This can of course be verified directly, by letting $\xi = \sum_1^n a_i\varphi_i$ be the basis expansion of $\xi$ and computing $(\xi, \varphi_j) = \sum_1^n a_i(\varphi_i, \varphi_j) = a_j\|\varphi_j\|^2$.)

If an orthogonal set of vectors $\{\varphi_i\}$ is also normalized ($\|\varphi_i\| = 1$), then we call the set orthonormal.
Theorem 2.2. If $\{\varphi_i\}_1^\infty$ is an infinite orthonormal sequence, and if $\{x_i\}_1^\infty$ are the corresponding Fourier coefficients of a vector $\xi$, then

$$\sum_1^\infty x_i^2 \le \|\xi\|^2 \qquad \text{(Bessel's inequality)},$$

and $\xi = \sum_1^\infty x_i\varphi_i$ if and only if $\sum_1^\infty x_i^2 = \|\xi\|^2$ (Parseval's equation).
Proof. Setting $\sigma_n = \sum_1^n x_i\varphi_i$ and $\xi = (\xi - \sigma_n) + \sigma_n$, and remembering that $\xi - \sigma_n \perp \sigma_n$, we have

$$\|\xi\|^2 = \|\xi - \sigma_n\|^2 + \sum_1^n x_i^2.$$

Therefore, $\sum_1^n x_i^2 \le \|\xi\|^2$ for all $n$, proving Bessel's inequality, and $\sigma_n \to \xi$ (that is, $\|\xi - \sigma_n\| \to 0$) if and only if $\sum_1^n x_i^2 \to \|\xi\|^2$, proving Parseval's identity. $\square$

We call the formal series $\sum x_i\varphi_i$ the Fourier series of $\xi$ (with respect to the orthonormal set $\{\varphi_i\}$). The Parseval condition says that the Fourier series of $\xi$ converges to $\xi$ if and only if $\|\xi\|^2 = \sum_1^\infty x_i^2$.
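A numerical illustration (ours, not from the text) of Bessel and Parseval for $f(t) = t$ on $[0, \pi]$ with the orthonormal sequence $\varphi_n(t) = \sqrt{2/\pi}\sin nt$:

```python
# A sketch (not from the text): the Fourier coefficients of f(t) = t with
# respect to phi_n(t) = sqrt(2/pi) sin(nt); the partial sums of x_n**2 stay
# below ||f||^2 (Bessel) and climb toward it (Parseval).
import numpy as np

t = np.linspace(0.0, np.pi, 20001)
f = t
norm_sq = np.trapz(f * f, t)                  # ||f||^2 = pi**3 / 3

total = 0.0
for n in range(1, 51):
    phi = np.sqrt(2 / np.pi) * np.sin(n * t)
    total += np.trapz(f * phi, t) ** 2        # x_n squared
print(total, norm_sq)                         # total is just below pi**3/3
```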
An infinite orthonormal sequence $\{\varphi_i\}_1^\infty$ is called a basis for a pre-Hilbert space $V$ if every element in $V$ is the sum of its Fourier series.
Theorem 2.3. An infinite orthonormal sequence $\{\varphi_i\}_1^\infty$ is a basis for a pre-Hilbert space $V$ if (and only if) its linear span is dense in $V$.
Proof. Let $\xi$ be any element of $V$, and let $\{x_i\}$ be its sequence of Fourier coefficients. Since the linear span of $\{\varphi_i\}$ is dense in $V$, given any $\epsilon$, there is a finite linear combination $\sum_1^n y_i\varphi_i$ which approximates $\xi$ to within $\epsilon$. But $\sum_1^m x_i\varphi_i$ is the best approximation to $\xi$ in the span of $\{\varphi_i\}_1^m$, by Lemmas 2.3 and 2.1, and so

$$\left\|\xi - \sum_1^m x_i\varphi_i\right\| \le \epsilon \qquad \text{for any } m \ge n.$$

That is, $\xi = \sum_1^\infty x_i\varphi_i$. $\square$
Corollary. If $V$ is a Hilbert space, then the orthonormal sequence $\{\varphi_i\}_1^\infty$ is a basis if and only if $\{\varphi_i\}^\perp = \{0\}$.
Proof. Let $M$ be the closure of the linear span of $\{\varphi_i\}_1^\infty$. Since $V = M \oplus M^\perp$, and since $M^\perp = \{\varphi_i\}^\perp$ by Lemma 1.1, we see that $\{\varphi_i\}^\perp = \{0\}$ if and only if $V = M$, and, by the theorem, this holds if and only if $\{\varphi_i\}$ is a basis. $\square$
Note that when orthogonal bases only are being used, the coefficient of a vector $\xi$ at a basis element $\beta$ is always the Fourier coefficient $(\xi, \beta)/\|\beta\|^2$. Thus the $\beta$-coefficient of $\xi$ depends only on $\beta$ and is independent of the choice of the rest of the basis. However, we know from Chapter 2 that when an arbitrary basis containing $\beta$ is being used, then the $\beta$-coefficient of $\xi$ varies with the basis. This partly explains the favored position of orthogonal bases.
We often obtain an orthonormal sequence by "orthogonalizing" some given
sequence.
Lemma 2.5. If $\{\alpha_i\}$ is a finite or infinite sequence of independent vectors, then there is an orthonormal sequence $\{\varphi_i\}$ such that $\{\alpha_i\}_1^n$ and $\{\varphi_i\}_1^n$ have the same linear span for all $n$.
Proof. Since normalizing is trivial, we shall only orthogonalize. Suppose, to be definite, that the sequence is infinite, and let $M_n$ be the linear span of $\{\alpha_1, \ldots, \alpha_n\}$. Let $\mu_n$ be the orthogonal projection of $\alpha_n$ on $M_{n-1}$, and set $\varphi_n = \alpha_n - \mu_n$ (and $\varphi_1 = \alpha_1$). This is our sequence. We have $\varphi_i \in M_i \subset M_{n-1}$ if $i < n$, and $\varphi_n \perp M_{n-1}$, so that the vectors $\varphi_i$ are mutually orthogonal. Also, $\varphi_n \ne 0$, since $\alpha_n$ is not in $M_{n-1}$. Thus $\{\varphi_i\}_1^n$ is an independent subset of the $n$-dimensional vector space $M_n$, by the corollary of Lemma 1.2, and so $\{\varphi_i\}_1^n$ spans $M_n$. $\square$
The actual calculation of the orthogonalized sequence $\{\varphi_n\}$ can be carried out recursively, starting with $\varphi_1 = \alpha_1$, by noticing that since $\mu_n$ is the projection of $\alpha_n$ on the span of $\varphi_1, \ldots, \varphi_{n-1}$, it must be the vector $\sum_1^{n-1} c_i\varphi_i$, where $c_i$ is the Fourier coefficient of $\alpha_n$ with respect to $\varphi_i$.

Consider, for example, the sequence $\{x^n\}_0^\infty$ in $\mathcal{C}([0, 1])$. We have $\varphi_1 = \alpha_1 = 1$. Next, $\varphi_2 = \alpha_2 - \mu_2 = x - c\cdot 1$, where

$$c = (\alpha_2, \varphi_1)/\|\varphi_1\|^2 = \int_0^1 x\cdot 1\Big/\int_0^1 (1)^2 = \tfrac{1}{2}.$$

Then $\varphi_3 = \alpha_3 - (c_2\varphi_2 + c_1\varphi_1)$, where

$$c_1 = \int_0^1 x^2\cdot 1\Big/\int_0^1 (1)^2 = \tfrac{1}{3}$$

and

$$c_2 = \int_0^1 x^2\left(x - \tfrac{1}{2}\right)\Big/\int_0^1 \left(x - \tfrac{1}{2}\right)^2 = \left(\tfrac{1}{4} - \tfrac{1}{6}\right)\Big/\tfrac{1}{12} = 1.$$

Thus the first three terms in the orthogonalization of $\{x^n\}_0^\infty$ in $\mathcal{C}([0, 1])$ are $1$, $x - \tfrac{1}{2}$, and $x^2 - (x - \tfrac{1}{2}) - \tfrac{1}{3} = x^2 - x + \tfrac{1}{6}$. This process is completely elementary, but the calculations obviously become burdensome after only a few terms.
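The recursion is easy to automate. The following sketch (ours, using the sympy library, not part of the text) reproduces the three polynomials just computed and produces the next one as well:

```python
# A sketch (not from the text) of the recursive orthogonalization of
# 1, x, x**2, ... in C([0, 1]) under (f, g) = integral of f*g over [0, 1].
from sympy import symbols, integrate, expand

x = symbols('x')
ip = lambda f, g: integrate(f * g, (x, 0, 1))

phis = []
for n in range(4):
    alpha = x**n
    mu = sum((ip(alpha, p) / ip(p, p)) * p for p in phis)   # projection on span
    phis.append(expand(alpha - mu))
print(phis)   # [1, x - 1/2, x**2 - x + 1/6, x**3 - 3*x**2/2 + 3*x/5 - 1/20]
```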
We remember from general bilinear theory that if for $\beta$ in $V$ we define $\theta_\beta: V \to \mathbb{R}$ by $\theta_\beta(\xi) = (\xi, \beta)$, then $\theta_\beta \in V^*$ and $\theta: \beta \mapsto \theta_\beta$ is a linear mapping from $V$ to $V^*$. If $(\xi, \eta)$ is a scalar product, then $\theta_\beta(\beta) = \|\beta\|^2 > 0$ if $\beta \ne 0$, and so $\theta$ is injective. Actually, $\theta$ is an isometry, as we shall ask the reader to show in an exercise. If $V$ is finite-dimensional, the injectivity of $\theta$ implies that $\theta$ is an isomorphism. But we have a much more startling result:
Theorem 2.4. $\theta$ is an isomorphism if and only if $V$ is a Hilbert space.
Proof. Suppose first that $V$ is a Hilbert space. We have to show that $\theta$ is surjective, i.e., that every nonzero $F$ in $V^*$ is of the form $\theta_\beta$. Given such an $F$, let $N$ be its null space, let $\alpha$ be a vector orthogonal to $N$ (Theorem 2.1), and consider $\beta = c\alpha$, where $c$ is to be determined later. Every vector $\xi$ in $V$ is uniquely a sum $\xi = x\beta + \eta$, where $\eta$ is in $N$. [This only says that $V/N$ is one-dimensional, which presumably we know, but we can check it directly by applying $F$ and seeing that $F(\xi - x\beta) = 0$ if and only if $x = F(\xi)/F(\beta)$.] But now the equations

$$F(\xi) = F(x\beta + \eta) = xF(\beta) = xcF(\alpha)$$

and

$$\theta_\beta(\xi) = (\xi, \beta) = (x\beta + \eta, \beta) = x\|\beta\|^2 = xc^2\|\alpha\|^2$$

show that $\theta_\beta = F$ if we take $c = F(\alpha)/\|\alpha\|^2$.

Conversely, if $\theta$ is surjective (and assuming that it is an isometry), then it is an isomorphism in $\operatorname{Hom}(V, V^*)$, and since $V^*$ is complete by Theorem 7.6, Chapter 4, it follows that $V$ is complete by Theorem 7.3 of the same chapter. We are finished. $\square$
EXERCISES
2.1 In the proof of Lemma 2.1, if $(\alpha - \mu, \xi) \ne 0$, what value of $t$ will contradict the inequality $0 \le 2t(\alpha - \mu, \xi) + t^2\|\xi\|^2$?
2.2 Prove the "only if" part of Theorem 2.3.
2.3 Let $\{M_i\}$ be an orthogonal sequence of complete subspaces of a pre-Hilbert space $V$, and let $P_i$ be the (orthogonal) projection on $M_i$. Prove that $\{P_i\xi\}$ is Cauchy for any $\xi$ in $V$.
2.4 Show that the functions $\{\sin nt\}_{n=1}^\infty$ form an orthogonal collection of elements in the pre-Hilbert space $\mathcal{C}([0, \pi])$ with respect to the standard scalar product $(f, g) = \int_0^\pi f(t)g(t)\,dt$. Show also that $\|\sin nt\|_2 = \sqrt{\pi/2}$.
2.5 Compute the Fourier coefficients of the function $f(t) = t$ in $\mathcal{C}([0, \pi])$ with respect to the above orthogonal set. What then is the best two-norm approximation to $t$ in the two-dimensional space spanned by $\sin t$ and $\sin 2t$? Sketch the graph of this approximating function, indicating its salient features in the usual manner of calculus curve sketching.
2.6 The "step" function $f$ defined by $f(t) = \pi/2$ on $[0, \pi/2]$ and $f(t) = 0$ on $(\pi/2, \pi]$ is of course discontinuous at $\pi/2$. Nevertheless, calculate its Fourier coefficients with respect to $\{\sin nt\}_{n=1}^\infty$ in $\mathcal{C}([0, \pi])$ and graph its best approximation in the span of $\{\sin nt\}_1^2$.
2.7 Show that the functions $\{\sin nt\}_{n=1}^\infty \cup \{\cos nt\}_{n=0}^\infty$ form an orthogonal collection of elements in the pre-Hilbert space $\mathcal{C}([-\pi, \pi])$ with respect to the standard scalar product $(f, g) = \int_{-\pi}^\pi f(t)g(t)\,dt$.
2.8 Calculate the first three terms in the orthogonalization of $\{x^n\}_0^\infty$ in $\mathcal{C}([-1, 1])$.
2.9 Use the definition of the norm of a bounded linear transformation and the Schwarz inequality to show that $\|\theta_\beta\| \le \|\beta\|$ [where $\theta_\beta(\xi) = (\xi, \beta)$]. In order to conclude that $\beta \mapsto \theta_\beta$ is an isometry, we also need the opposite inequality, $\|\theta_\beta\| \ge \|\beta\|$. Prove this by using a special value of $\xi$.
2.10 Show that if $V$ is an incomplete pre-Hilbert space, then $V$ has a proper closed subspace $M$ such that $M^\perp = \{0\}$. [Hint: There must exist $F \in V^*$ not of the form $F(\xi) = (\xi, \alpha)$.] Together with Theorem 2.1, this shows that a pre-Hilbert space $V$ is a Hilbert space if and only if $V = M \oplus M^\perp$ for every closed subspace $M$.
2.11 The isometry $\theta: \alpha \mapsto \theta_\alpha$ [where $\theta_\alpha(\xi) = (\xi, \alpha)$] imbeds the pre-Hilbert space $V$ in its conjugate space $V^*$. We know that $V^*$ is complete. Why? The closure of $V$ as a subspace of $V^*$ is therefore complete, and we can hence complete $V$ as a Banach space. Let $H$ be its completion. It is a Banach space including (the isometric image of) $V$ as a dense subspace. Show that the scalar product on $V$ extends uniquely to $H$ and that the norm on $H$ is the extended scalar product norm, so that $H$ is a Hilbert space.
2.12 Show that under the isometric imbedding $\alpha \mapsto \theta_\alpha$ of a pre-Hilbert space $V$ into $V^*$, orthogonality is equivalent to annihilation as discussed in Section 2.3. Discuss the connection between the properties of the annihilator $A^0$ and Lemma 1.1 of this chapter.
2.13 Prove that if $C$ is a nonempty complete convex subset of a pre-Hilbert space $V$, and if $\alpha$ is any vector not in $C$, then there is a unique $\mu \in C$ closest to $\alpha$. (Examine the proof of Lemma 2.2.)
3. SELF-ADJOINT TRANSFORMATIONS

Definition. If V is a pre-Hilbert space, then T in Hom V is self-adjoint if (Tα, β) = (α, Tβ) for every α, β ∈ V. The set of all self-adjoint transformations will be designated SA.

Self-adjointness suggests that T ought to become its own adjoint under the injection θ of V into V*. We check this now. Since (α, β) = θ_β(α), we can rewrite the equation (Tα, β) = (α, Tβ) as θ_β(Tα) = θ_{Tβ}(α), and again as (T*(θ_β))(α) = θ_{Tβ}(α), by the definition of T*. This holds for all α and β if and only if T*(θ_β) = θ_{Tβ} for all β ∈ V, or T* ∘ θ = θ ∘ T, which is the asserted identification.

Lemma 3.1. If V is a finite-dimensional Hilbert space and {φ_i}₁ⁿ is an orthonormal basis for V, then T ∈ Hom(V) is self-adjoint if and only if the matrix {t_ij} of T with respect to {φ_i} is symmetric (t = t*).

Proof. If we substitute the basis expansions of α and β and expand, we see that (α, Tβ) = (Tα, β) for all α and β if and only if (φ_i, Tφ_j) = (Tφ_i, φ_j) for all i and j. But Tφ_j = Σ_{k=1}ⁿ t_kj φ_k, and when this is substituted in these last scalar products, the equation becomes t_ij = t_ji. That is, T is self-adjoint if and only if t = t*. □

A self-adjoint T is said to be nonnegative if (Tξ, ξ) ≥ 0 for all ξ. Then [ξ, η] = (Tξ, η) is a semiscalar product!
Lemma 3.2. If T is a nonnegative self-adjoint transformation, then ‖T(ξ)‖ ≤ ‖T‖^{1/2}(Tξ, ξ)^{1/2} for all ξ. Therefore, if (Tξ, ξ) = 0, then Tξ = 0, and, more generally, if (Tξ_n, ξ_n) → 0, then T(ξ_n) → 0.

Proof. If T is nonnegative as well as self-adjoint, then [ξ, η] = (Tξ, η) is a semiscalar product, and so, by Schwarz's inequality,

|(Tξ, η)| = |[ξ, η]| ≤ [ξ, ξ]^{1/2}[η, η]^{1/2} = (Tξ, ξ)^{1/2}(Tη, η)^{1/2}.

Taking η = Tξ, the factor on the right becomes (T(Tξ), Tξ)^{1/2}, which is less than or equal to ‖T‖^{1/2}‖Tξ‖, by Schwarz and the definition of ‖T‖. Dividing by ‖Tξ‖, we get the inequality of the lemma. □

If α ≠ 0 and T(α) = cα for some c, then α is called an eigenvector (proper vector, characteristic vector) of T, and c is the associated eigenvalue (proper value, characteristic value).
Theorem 3.1. If V is a finite-dimensional Hilbert space and T is a self-adjoint element of Hom V, then V has an orthonormal basis consisting entirely of eigenvectors of T.

Proof. Consider the function (Tξ, ξ). It is a continuous real-valued function of ξ, and on the unit sphere S = {ξ : ‖ξ‖ = 1} it is bounded above by ‖T‖ (by Schwarz). Set m = lub {(Tξ, ξ) : ‖ξ‖ = 1}. Since S is compact (being bounded and closed), (Tξ, ξ) assumes the value m at some point α on S. Now m − T is a nonnegative self-adjoint transformation (check this!), and (Tα, α) = m is equivalent to ((m − T)α, α) = 0. Therefore, (m − T)α = 0 by Lemma 3.2, and Tα = mα. We have thus found one eigenvector for T. Now set V₁ = V, α₁ = α, and m₁ = m, and let V₂ be {α₁}⊥. Then T[V₂] ⊂ V₂, for if ξ ⊥ α₁, then (Tξ, α₁) = (ξ, Tα₁) = m(ξ, α₁) = 0.

We can therefore repeat the above argument for the restriction of T to the Hilbert space V₂ and find α₂ in V₂ such that ‖α₂‖ = 1 and T(α₂) = m₂α₂, where m₂ = lub {(Tξ, ξ) : ‖ξ‖ = 1 and ξ ∈ V₂}. Clearly, m₂ ≤ m₁. We then set V₃ = {α₁, α₂}⊥ and continue, arriving finally at an orthonormal basis {α_i}₁ⁿ of eigenvectors of T. □

Now let λ₁, …, λ_r be the distinct values in the list m₁, …, m_n, and let M_j be the linear span of those basis vectors α_i for which m_i = λ_j. Then the subspaces M_j are orthogonal to each other, V = ⊕₁^r M_j, each M_j is T-invariant, and the restriction of T to M_j is λ_j times the identity. Since all the nonzero vectors in M_j are eigenvectors with eigenvalue λ_j, if the α_i's spanning M_j are replaced by any other orthonormal basis for M_j, then we still have an orthonormal basis of eigenvectors. The α_i's are therefore not in general uniquely determined. But the subspaces M_j and the eigenvalues λ_j are unique. This will follow if we show that every eigenvector is in an M_j.

Lemma 3.3. In the context of the above discussion, if ξ ≠ 0 and T(ξ) = xξ for some x in ℝ, then ξ ∈ M_j (and so x = λ_j) for some j.
Proof. Since V = ⊕₁^r M_j, we have ξ = Σ₁^r ξ_i with ξ_i ∈ M_i. Then

Σ₁^r xξ_i = xξ = T(ξ) = Σ₁^r T(ξ_i) = Σ₁^r λ_iξ_i, and so Σ₁^r (x − λ_i)ξ_i = 0.

Since the subspaces M_i are independent, every component (x − λ_i)ξ_i is 0. But some ξ_j ≠ 0, since ξ ≠ 0. Therefore, x = λ_j, ξ_i = 0 for i ≠ j, and ξ = ξ_j ∈ M_j. □
We have thus proved the following theorem.

Theorem 3.2. If V is a finite-dimensional Hilbert space and T is a self-adjoint element of Hom V, then there are uniquely determined subspaces {V_i}₁^r of V, and distinct scalars {λ_i}₁^r, such that {V_i} is an orthogonal family whose sum is V and the restriction of T to V_i is λ_i times the identity.

If V is a finite-dimensional vector space and we are given T ∈ Hom V, then we know how to compute related mappings such as T² and T⁻¹ (if it exists) and vectors Tα, T⁻¹α, etc., by choosing a basis for V and then computing matrix products, inverses (when they exist), and so on. Some of these computations, particularly those related to inverses, can be quite arduous. One enormous advantage of a basis consisting of eigenvectors for T is that it trivializes all of these calculations.

To see this, let {β_n} be a basis of V consisting entirely of eigenvectors for T, and let {r_n} be the corresponding eigenvalues. To compute Tξ, we write down the basis expansion for ξ, ξ = Σ₁ⁿ x_iβ_i, and then Tξ = Σ₁ⁿ r_ix_iβ_i. T² has the same eigenvectors, but with eigenvalues {r_i²}. Thus T²ξ = Σ₁ⁿ r_i²x_iβ_i. T⁻¹ exists if and only if no r_i = 0, in which case it has the same eigenvectors with eigenvalues {1/r_i}. Thus T⁻¹ξ = Σ₁ⁿ (x_i/r_i)β_i. If P(t) = Σ₀^m a_ntⁿ is any polynomial, then P(T) takes β_i into P(r_i)β_i. Thus P(T)ξ = Σ₁ⁿ P(r_i)x_iβ_i. By now the point should be amply clear.

The additional value of orthonormality in a basis is already clear from the last section. Basically, it enables us to compute the coefficients {x_i} of ξ by scalar products: x_i = (ξ, β_i).
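The following short numerical sketch illustrates these eigenbasis computations in Python (numpy assumed; the matrix, the vector, and the polynomial are arbitrary choices, not taken from the text):

    import numpy as np

    # A symmetric matrix, playing the role of a self-adjoint T on R^3.
    t = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

    r, b = np.linalg.eigh(t)        # eigenvalues r_i; orthonormal eigenvectors are the columns of b

    xi = np.array([1.0, 2.0, 3.0])  # an arbitrary vector
    x = b.T @ xi                    # coefficients x_i = (xi, beta_i), by orthonormality

    # T xi, T^{-1} xi, and P(T) xi computed through the eigenvalues alone:
    T_xi    = b @ (r * x)
    Tinv_xi = b @ (x / r)           # valid here since no r_i is 0
    P       = lambda lam: lam**2 - 3*lam + 1
    PT_xi   = b @ (P(r) * x)

    assert np.allclose(T_xi, t @ xi)
    assert np.allclose(Tinv_xi, np.linalg.solve(t, xi))
    assert np.allclose(PT_xi, (t @ t - 3*t + np.eye(3)) @ xi)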
This is a good place to say a few words about the general eigenvalue problem in finite-dimensional theory. Our complete analysis above was made possible by the self-adjointness of T (or the symmetry of the matrix t). What we can say about an arbitrary T in Hom V is much less satisfactory.

We first note that the eigenvalues of T can be determined algebraically, for λ is an eigenvalue if and only if T − λI is not injective, or, equivalently, is singular, and we know that T − λI is singular if and only if its determinant Δ(T − λI) is 0. If we choose any basis for V, the determinant of T − λI is the determinant of its matrix t − λe, and our later formula in Chapter 7 shows that this is a polynomial of degree n in λ. It is easy to see that this polynomial is independent of the basis; it is called the characteristic polynomial of T. Thus the eigenvalues of T are exactly the roots of the characteristic polynomial of T.
However, T need not have any eigenvectors! Consider, for example, a 90° rotation in the Cartesian plane. This is the map T: ⟨x, y⟩ ↦ ⟨−y, x⟩. Thus T(δ₁) = δ₂ and T(δ₂) = −δ₁, so the matrix of T is

[0  −1]
[1   0]

Then the matrix of T − λ is

[−λ  −1]
[ 1  −λ]

and the characteristic polynomial of T is the determinant of this matrix: λ² + 1. Since this polynomial is irreducible over ℝ, there are no eigenvalues.

Note how different the outcome is if we consider the transformation with the same matrix on complex 2-space ℂ². Here the scalar field is the complex number system, and T is the map ⟨z₁, z₂⟩ ↦ ⟨−z₂, z₁⟩ from ℂ² to ℂ². But now λ² + 1 = (λ + i)(λ − i), and T has eigenvalues ±i! To find the eigenvectors for i, we solve T(z) = iz, which is the equation ⟨−z₂, z₁⟩ = ⟨iz₁, iz₂⟩, or z₂ = −iz₁. Thus ⟨1, −i⟩ (or i⟨1, −i⟩ = ⟨i, 1⟩) is the unique eigenvector for i to within a scalar multiple.
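A quick numerical check of this example, with numpy assumed; eigvals works over ℂ, so it finds the roots ±i that the real theory misses:

    import numpy as np

    t = np.array([[0.0, -1.0],
                  [1.0,  0.0]])   # the 90-degree rotation

    # Over R the characteristic polynomial lambda^2 + 1 has no real roots;
    # numpy reports the complex eigenvalues +-i.
    print(np.linalg.eigvals(t))   # [0.+1.j, 0.-1.j]

    # The eigenvector for i, up to a scalar multiple, is <1, -i>:
    z = np.array([1.0, -1.0j])
    assert np.allclose(t @ z, 1j * z)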
We return to our real theory. If T ∈ Hom V and n = d(V), so that d(Hom V) = n², then the set of n² + 1 vectors {Tⁱ}₀^{n²} in Hom V is dependent. But this is exactly the same as saying that p(T) = 0 for some polynomial p of degree ≤ n². That is, any T in Hom V satisfies a polynomial identity p(T) = 0. Now suppose that r is an eigenvalue of T and that T(ξ) = rξ. Then p(T)(ξ) = p(r)ξ = 0, and so p(r) = 0. That is, every eigenvalue of T is a root of the polynomial p. Conversely, if p(r) = 0, then we know from the remainder theorem of algebra that t − r is a factor of the polynomial p(t), and therefore (t − r)^m will be one of the relatively prime factors of p. Now suppose that p is the minimal polynomial of T (see Exercise 3.5). Theorem 5.5, Chapter 1, tells us that (T − rI)^m is zero on a corresponding subspace N of V and therefore, in particular, that T − rI is not injective when restricted to N. That is, r is an eigenvalue. We have proved:

Theorem 3.3. The eigenvalues of T are zeros (roots) of every polynomial p(t) such that p(T) = 0, and are exactly the roots of the minimal polynomial.
EXERCISES

3.1 Use the defining identity (Tξ, η) = (ξ, Tη) to show that the set SA of all self-adjoint elements of Hom V is a subspace. Prove similarly that if S and T are self-adjoint, then ST is self-adjoint if and only if ST = TS. Conclude that if T is self-adjoint, then so is p(T) for any polynomial p.

3.2 Show that if T is self-adjoint, then S = T² is nonnegative. Show, therefore, that if T is self-adjoint and a > 0, then T² + aI cannot be the zero transformation.

3.3 Let p(t) = t² + bt + c be an irreducible quadratic polynomial (b² < 4c), and let T be a self-adjoint transformation. Show that p(T) ≠ 0. (Complete the square and apply earlier exercises.)
3.4 Let T be self-adjoint and nilpotent (Tⁿ = 0 for some n). Prove that T = 0. This can be done in various ways. One method is to show it first for n = 2 and then for n = 2^m by induction. Finally, any n can be bracketed by powers of 2, 2^m ≤ n < 2^{m+1}.

3.5 Let V be any vector space, and let T be an element of Hom V. Suppose that there is a polynomial q such that q(T) = 0, and let p be such a polynomial of minimum degree. Show that p is unique (to within a constant multiple). It is called the minimal polynomial of T. Show that if we apply Theorem 5.5 of Chapter 1 to the minimal polynomial p of T, then the subspaces N_i must both be nontrivial.

3.6 It is a corollary of the fundamental theorem of algebra that a polynomial with real coefficients can be factored into a product of linear factors (t − r) and irreducible quadratic factors (t² + bt + c). Let T be a self-adjoint transformation on a finite-dimensional Hilbert space, and let p(t) be its minimal polynomial. Deduce a new proof of Theorem 3.1 by applying to p(t) the above remark, Theorem 5.5 of Chapter 1, and Exercises 3.1 through 3.4.
3.7 Prove that if T is a self-adjoint transformation on a pre-Hilbert space V, then its null space is the orthogonal complement of its range: N(T) = (R(T))⊥. Conclude that if V is a Hilbert space, then a self-adjoint T is injective if and only if its range is dense (in V).

3.8 Assuming the above exercise, show that if V is a Hilbert space and T is a self-adjoint element of Hom V that is bounded below (as well as bounded), then T is surjective.

3.9 Let T be self-adjoint and nonnegative, and set m = lub {(Tξ, ξ) : ‖ξ‖ = 1}. Use the Schwarz inequality and the inequality of Lemma 3.2 to show that m = ‖T‖.

3.10 Let V be a Hilbert space, let T be a self-adjoint element of Hom V, and set m = lub {(Tξ, ξ) : ‖ξ‖ = 1}. Show that if a > m, then a − T (= aI − T) is invertible and ‖(a − T)⁻¹‖ ≤ 1/(a − m). (Apply the Schwarz inequality, the definition of m, and Exercise 3.8.)

3.11 Let P be a bounded linear transformation on a pre-Hilbert space V that is a projection in the sense of Chapter 1. Prove that if P is self-adjoint, then P is an orthogonal projection. Now prove the converse.

3.12 Let V be a finite-dimensional Hilbert space, let T in Hom V be self-adjoint, and suppose that S in Hom V commutes with T. Show that the subspaces M_i of Theorem 3.1 and Lemma 3.3 are invariant under S.

3.13 A self-adjoint transformation T on a finite-dimensional Hilbert space V is said to have a simple spectrum if all its eigenvalues are distinct. By this we mean that all the subspaces M_i are one-dimensional. Suppose that T is a self-adjoint transformation with a simple spectrum, and suppose that S commutes with T. Show that S is also self-adjoint. (Apply the above exercise.)

3.14 Let H be a Hilbert space, and let ω[ξ, η] be a bounded bilinear form on H × H. That is, there is a constant b such that

|ω[ξ, η]| ≤ b‖ξ‖‖η‖ for all ξ, η ∈ H.

Show that there is a unique T in Hom H such that ω[ξ, η] = (ξ, Tη). Show that T is self-adjoint if and only if ω is symmetric.
4. ORTHOGONAL TRANSFORMATIONS

Assuming that V is a Hilbert space and that therefore θ: V → V* is an isomorphism, we can of course replace the adjoint T* ∈ Hom V* of any T ∈ Hom V by the corresponding transformation θ⁻¹ ∘ T* ∘ θ ∈ Hom V. In Hilbert space theory it is this mapping that is called the adjoint of T and is designated T*. Then, exactly as in our discussion of a self-adjoint T, we see that

(Tα, β) = (α, T*β) for all α, β ∈ V,

and that T* is uniquely defined by this identity. Finally, T is self-adjoint if and only if T = T*.

Although it really amounts to the above way of introducing T* into Hom V, we can make a direct definition as follows. For each η the mapping ξ ↦ (Tξ, η) is linear and bounded, and so is an element of V*, which, by Theorem 2.4, is given by a unique element β_η in V according to the formula (Tξ, η) = (ξ, β_η). Now we check that η ↦ β_η is linear and bounded and is therefore an element of Hom V which we call T*, etc.

The matrix calculations of Lemma 3.1 generalize verbatim to show that the matrix of T* in Hom V is the transpose t* of the matrix t of T.

Another very important type of transformation on a Hilbert space is one that preserves the scalar product.

Definition. A transformation T ∈ Hom V is orthogonal if (Tα, Tβ) = (α, β) for all α, β ∈ V.

By the basic adjoint identity above this is entirely equivalent to (α, T*Tβ) = (α, β) for all α, β, and hence to T*T = I. An orthogonal T is injective, since ‖Tα‖² = ‖α‖², and is therefore invertible if V is finite-dimensional. Whether V is finite-dimensional or not, if T is invertible, then the above condition becomes T* = T⁻¹.

If T ∈ Hom ℝⁿ, the matrix form of the equation T*T = I is of course t*t = e, and if this is written out, it becomes

Σ_{k=1}ⁿ t_ki t_kj = e_ij for all i, j,

which simply says that the columns of t form an orthonormal set (and hence a basis) in ℝⁿ. We thus have:

Theorem 4.1. A transformation T ∈ Hom ℝⁿ is orthogonal if and only if the image of the standard basis {δ_i}₁ⁿ under T is another orthonormal basis (with respect to the standard scalar product).

The necessity of this condition is, of course, obvious from the scalar-product-preserving definition of orthogonality, and the sufficiency can also be checked directly using the basis expansions of α and β.

We can now state the eigenbasis theorem in different terms. By a diagonal matrix we mean a matrix which is zero everywhere except on the main diagonal.
Theorem 4.2. Let t = {t_ij} be a symmetric n × n matrix. Then there exists an orthogonal n × n matrix b such that b⁻¹tb is a diagonal matrix.

Proof. Since the transformation T ∈ Hom ℝⁿ defined by t is self-adjoint, there exists an orthonormal basis {b_j}₁ⁿ of eigenvectors of T, with corresponding eigenvalues {r_j}₁ⁿ. Let B be the orthogonal transformation defined by B(δ_j) = b_j, j = 1, …, n. (The n-tuples b_j are the columns of the matrix b = {b_ij} of B.) Then (B⁻¹ ∘ T ∘ B)(δ_j) = r_jδ_j. Since (B⁻¹ ∘ T ∘ B)(δ_j) is the jth column of b⁻¹tb, we see that s = b⁻¹tb is diagonal, with s_jj = r_j. □
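In matrix terms the theorem is easy to observe numerically; the following Python sketch (numpy assumed, matrix chosen arbitrarily) computes b and checks that b⁻¹tb = bᵀtb is diagonal:

    import numpy as np

    t = np.array([[2.0, 1.0],
                  [1.0, 2.0]])    # symmetric, hence self-adjoint on R^2

    r, b = np.linalg.eigh(t)      # columns of b: an orthonormal eigenbasis

    assert np.allclose(b.T @ b, np.eye(2))       # b is orthogonal, so b^{-1} = b^T
    assert np.allclose(b.T @ t @ b, np.diag(r))  # b^{-1} t b is diagonal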
For later applications we are also going to want the following result.

Theorem 4.3. Any invertible T ∈ Hom V on a finite-dimensional Hilbert space V can be expressed in the form T = RS, where R is orthogonal and S is self-adjoint and positive.

Proof. For any T, T*T is self-adjoint, since (T*T)* = T*T** = T*T. Let {φ_i}₁ⁿ be an orthonormal eigenbasis, and let {r_i}₁ⁿ be the corresponding eigenvalues of T*T. Then 0 < ‖Tφ_i‖² = (T*Tφ_i, φ_i) = (r_iφ_i, φ_i) = r_i for each i. Since all the eigenvalues of T*T are thus positive, we can define a positive square root S = (T*T)^{1/2} by Sφ_i = (r_i)^{1/2}φ_i, i = 1, 2, …, n. It is clear that S² = T*T and that S is self-adjoint.

Then A = ST⁻¹ is orthogonal, for (ST⁻¹α, ST⁻¹β) = (T⁻¹α, S²T⁻¹β) = (T⁻¹α, T*TT⁻¹β) = (T⁻¹α, T*β) = (TT⁻¹α, β) = (α, β). Since T = A⁻¹S, we set R = A⁻¹ and have the theorem. □

It is not hard to see that the above factorization of T is unique. Also, by starting with TT*, we can express T in the form T = SR, where S is self-adjoint and positive and R is orthogonal.

We call these factorizations the polar decompositions of T, since they function somewhat like the polar coordinate factorization z = re^{iθ} of a complex number.
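A numerical sketch of the construction in the proof (Python with numpy and scipy assumed; the matrix is an arbitrary invertible example): S is the positive square root of T*T, and R = A⁻¹ = TS⁻¹.

    import numpy as np
    from scipy.linalg import sqrtm, polar

    t = np.array([[3.0, 1.0],
                  [0.0, 2.0]])               # invertible

    s = sqrtm(t.T @ t)                       # S = (T*T)^{1/2}, self-adjoint and positive
    r = t @ np.linalg.inv(s)                 # R = T S^{-1}

    assert np.allclose(r.T @ r, np.eye(2))   # R is orthogonal
    assert np.allclose(r @ s, t)             # T = RS

    # scipy's polar() produces the same factors, consistent with uniqueness:
    u, p = polar(t)                          # t = u @ p, u orthogonal, p positive
    assert np.allclose(u, r) and np.allclose(p, s)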
Corollary. Any nonsingular n × n matrix t can be expressed as t = udv, where u and v are orthogonal and d is diagonal.

Proof. From the theorem we have t = rs, where r is orthogonal and s is symmetric. By Theorem 4.2, s = bdb⁻¹, where d is diagonal and b is orthogonal. Thus t = rs = (rb)db⁻¹ = udv, where u = rb and v = b⁻¹ are both orthogonal. □
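This corollary is the (real, nonsingular) singular value decomposition, and numpy computes the factors directly, as the following sketch shows (the matrix is an arbitrary example); numpy additionally normalizes d to have nonnegative, decreasing diagonal entries, which the corollary does not require:

    import numpy as np

    t = np.array([[2.0, 1.0],
                  [0.0, 1.0]])           # nonsingular

    u, dvals, v = np.linalg.svd(t)       # t = u @ diag(dvals) @ v

    assert np.allclose(u @ np.diag(dvals) @ v, t)
    assert np.allclose(u.T @ u, np.eye(2))   # u orthogonal
    assert np.allclose(v.T @ v, np.eye(2))   # v orthogonal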
EXERCISES

4.1 Let V be a Hilbert space, and suppose that S and T in Hom V satisfy

(Tξ, η) = (ξ, Sη) for all ξ, η.

Write out the proof of the identity S = θ⁻¹ ∘ T* ∘ θ.

4.2 Write out the analogue of the proof of Lemma 3.1 which shows that the matrix of T* is the transpose of the matrix of T.

4.3 Once again show that if (ξ, η) = (ξ, ζ) for all ξ, then η = ζ. Conclude that if S, T in Hom V are such that (ξ, Tη) = (ξ, Sη) for all ξ and η, then T = S.

4.4 Let {a, b} be an orthonormal basis for ℝ², and let t be the 2 × 2 matrix whose columns are a and b. Show by direct calculation that the rows of t are also orthonormal.

4.5 State again why it is that if V is finite-dimensional, and if S and T in Hom V satisfy S ∘ T = I, then T is invertible and S = T⁻¹. Now let V be a finite-dimensional Hilbert space, and let T be an orthogonal transformation in Hom V. Show that T* is also orthogonal.

4.6 Let t be an n × n matrix whose columns form an orthonormal basis for ℝⁿ. Prove that the rows of t also form an orthonormal basis. (Apply the above exercise.)

4.7 Show that a nonnegative self-adjoint transformation S on a finite-dimensional Hilbert space has a uniquely determined nonnegative self-adjoint square root.

4.8 Prove that if V is a finite-dimensional Hilbert space and T ∈ Hom V, then the "polar decomposition" of T, T = RS, of Theorem 4.3 is unique. (Apply the above exercise.)
5. COMPACT TRANSFORMATIONS

Theorem 3.1 breaks down when V is an infinite-dimensional Hilbert space. A self-adjoint transformation T does not in general have enough eigenvectors to form a basis for V, and a more sophisticated analysis, allowing for a "continuous spectrum" as well as a "discrete spectrum", is necessary. This enriched situation is the reason for the need for further study of Hilbert space theory at the graduate level, and is one of the sources of complexity in the mathematical structure of quantum mechanics.

However, there is one very important special case in which the eigenbasis theorem is available, and which will have a startling application in the next chapter.

Definition. Let V and W be any normed linear spaces, and let S be the unit ball in V. A transformation T in Hom(V, W) is compact if the closure of T[S] in W is sequentially compact.

Theorem 5.1. Let V be any pre-Hilbert space, and let T ∈ Hom V be self-adjoint and compact. Then the pre-Hilbert space R = range (T) has an orthonormal basis {φ_i} consisting entirely of eigenvectors of T, and the corresponding sequence of eigenvalues {r_n} converges to 0 (or is finite).

Proof. The proof is just like that of Theorem 3.1 except that we have to start a little differently. Set m = ‖T‖ = lub {‖T(ξ)‖ : ‖ξ‖ = 1}, and choose a sequence {ξ_n} such that ‖ξ_n‖ = 1 for all n and ‖T(ξ_n)‖ → m. Then

((m² − T²)ξ_n, ξ_n) = m² − ‖T(ξ_n)‖² → 0,
and since m² − T² is a nonnegative self-adjoint transformation, Lemma 3.2 tells us that (m² − T²)(ξ_n) → 0. But since T is compact, we can suppose (passing to a subsequence if necessary) that {Tξ_n} converges, say to β. Then T²ξ_n → Tβ, and so m²ξ_n → Tβ also. Thus ξ_n → Tβ/m² and β = lim Tξ_n = T²(β)/m². Since ‖β‖ = lim ‖T(ξ_n)‖ = m, we have a nonzero vector β such that T²(β) = m²β. Set α = β/‖β‖.

We have thus found a vector α such that ‖α‖ = 1 and 0 = (m² − T²)(α) = (m − T)(m + T)(α). Then either (m + T)(α) = 0, in which case T(α) = −mα, or (m + T)(α) = γ ≠ 0 and (m − T)γ = 0, in which case Tγ = mγ. Thus there exists a vector φ₁ (α or γ/‖γ‖) such that ‖φ₁‖ = 1 and T(φ₁) = r₁φ₁, where |r₁| = m. We now proceed just as in Theorem 3.1.

For notational consistency we set m₁ = m, V₁ = V, and now set V₂ = {φ₁}⊥. Then T[V₂] ⊂ V₂, since if α ⊥ φ₁, then (Tα, φ₁) = (α, Tφ₁) = r₁(α, φ₁) = 0. Thus T↾V₂ is compact and self-adjoint, and if m₂ = ‖T↾V₂‖, there exists φ₂ with ‖φ₂‖ = 1 and T(φ₂) = r₂φ₂, where |r₂| = m₂. We continue inductively, obtaining an orthonormal sequence {φ_n} ⊂ V and a sequence {r_n} ⊂ ℝ such that Tφ_n = r_nφ_n and |r_n| = ‖T↾V_n‖, where

V_n = {φ₁, …, φ_{n−1}}⊥.

We suppose for the moment, since this is the most interesting case, that r_n ≠ 0 for all n. Then we claim that |r_n| → 0. For |r_n| is decreasing in any case, and if it does not converge to 0, then there exists a b > 0 such that |r_n| ≥ b for all n. Then ‖T(φ_i) − T(φ_j)‖² = ‖r_iφ_i − r_jφ_j‖² = ‖r_iφ_i‖² + ‖r_jφ_j‖² = r_i² + r_j² ≥ 2b² for all i ≠ j, and the sequence {T(φ_i)} can have no convergent subsequence, contradicting the compactness of T. Therefore |r_n| → 0.

Finally, we have to show that the orthonormal sequence {φ_i} is a basis for R. If β = T(α), and if {b_n} and {a_n} are the Fourier coefficients of β and α, then we expect that b_n = r_na_n, and this is easy to check: b_n = (β, φ_n) = (T(α), φ_n) = (α, T(φ_n)) = (α, r_nφ_n) = r_n(α, φ_n) = r_na_n. This is just saying that T(a_nφ_n) = b_nφ_n, and therefore β − Σ₁ⁿ b_iφ_i = T(α − Σ₁ⁿ a_iφ_i). Now α − Σ₁ⁿ a_iφ_i is orthogonal to {φ_i}₁ⁿ and therefore is an element of V_{n+1}, and the norm of T on V_{n+1} is |r_{n+1}|. Moreover, ‖α − Σ₁ⁿ a_iφ_i‖ ≤ ‖α‖, by the Pythagorean theorem. Altogether we can conclude that

‖β − Σ₁ⁿ b_iφ_i‖ ≤ |r_{n+1}| ‖α‖,

and since r_{n+1} → 0, this implies that β = Σ₁^∞ b_iφ_i. Thus {φ_i} is a basis for R(T). Also, since T is self-adjoint, N(T) = R(T)⊥ = {φ_i}⊥ = ∩_i V_i.

If some r_i = 0, then there is a first n such that r_n = 0. In this case ‖T↾V_n‖ = |r_n| = 0, so that V_n ⊂ N(T). But φ_i ∈ R(T) if i < n, since then φ_i = T(φ_i)/r_i, and so N(T) = R(T)⊥ ⊂ {φ₁, …, φ_{n−1}}⊥ = V_n. Therefore, N(T) = V_n and R(T) is the span of {φ_i}₁^{n−1}. □
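The eigenvalue decay r_n → 0 can be seen numerically by discretizing a classical compact self-adjoint operator. In the following Python sketch (numpy assumed; the operator is my own illustrative choice, not from the text), the integral operator with kernel min(s, t) on C([0, 1]) is approximated by a symmetric matrix, and its top eigenvalues approach the known values 1/((k − ½)π)², which tend to 0:

    import numpy as np

    # Discretize (Tf)(s) = integral_0^1 min(s,t) f(t) dt by a midpoint rule.
    n = 200
    grid = (np.arange(n) + 0.5) / n
    K = np.minimum.outer(grid, grid) / n   # symmetric matrix approximating T

    r = np.linalg.eigvalsh(K)[::-1]        # eigenvalues in decreasing order
    print(r[:5])
    exact = 1.0 / (np.pi * (np.arange(1, 6) - 0.5))**2
    print(exact)                           # the eigenvalues tend to 0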
CHAPTER 6
DIFFERENTIAL EQUATIONS
This chapter is not a small differential equations textbook; we leave out far too
much. We are principally concerned with some of the theory of the subject,
although we shall say one or two practical things. Our first goal is the funda-
mental existence and uniqueness theorem of ordinary differential equations,
which we prove as an elegant application of the fixed-point theorem. Next we
look at the linear theory, where we make vital use of material from the first two
chapters and get quite specific about the process of actually finding solutions.
So far our development is linked to the initial-value problem, concerning the
existence of, and in some cases the ways of finding, a unique solution passing
through some initially prescribed point in the space containing the solution
curves. However, some of the most important aspects of the subject relate to
what are called boundary-value problems, and our last and most sophisticated
effort will be directed toward making a first step into this large area. This will
involve us in the theory of Chapter 5, for we shall find ourselves studying self-
adjoint operators. In fact, the basic theorem about Fourier series expansions
will come out of recognizing a certain right inverse of a differential operator to be
a compact self-adjoint operator.
1. THE FUNDAMENTAL THEOREM
Let A be an open subset of a Banach space W, let I be an open interval in ℝ, and let F: I × A → W be continuous. We want to study the differential equation

dα/dt = F(t, α).

A solution of this equation is a function f: J → A, where J is an open subinterval of I, such that f′(t) exists and

f′(t) = F(t, f(t))

for every t in J. Note that a solution f has to be continuously differentiable, for the existence of f′ implies the continuity of f, and then f′(t) = F(t, f(t)) is continuous by the continuity of F.

We are going to see that if F has a continuous second partial differential, then there exists a uniquely determined "local" solution through any point ⟨t₀, α₀⟩ ∈ I × A.
In saying that the solution f goes through ⟨t₀, α₀⟩, we mean, of course, that α₀ = f(t₀). The requirement that the solution f have the value α₀ when t = t₀ is called an initial condition.

The existence and continuity of dF²_{⟨t,α⟩} implies, via the mean-value theorem, that F(t, α) is locally uniformly Lipschitz in α. By this we mean that for any point ⟨t₀, α₀⟩ in I × A there is a neighborhood M × N and a constant b such that ‖F(t, ξ) − F(t, η)‖ ≤ b‖ξ − η‖ for all t in M and all ξ, η in N. To see this we simply choose balls M and N about t₀ and α₀ such that dF²_{⟨t,α⟩} is bounded, say by b, on M × N, and apply Theorem 7.4 of Chapter 3. This is the condition that we actually use below.
Theorem 1.1. Let A be an open subset of a Banach space W, let I be an open interval in ℝ, and let F be a continuous mapping from I × A to W which is locally uniformly Lipschitz in its second variable. Then for any point ⟨t₀, α₀⟩ in I × A, for some neighborhood U of α₀ and for any sufficiently small interval J containing t₀, there is a unique function f from J to U which is a solution of the differential equation passing through the point ⟨t₀, α₀⟩.

Proof. If f is a solution on J through ⟨t₀, α₀⟩, then an integration gives

f(t) − f(t₀) = ∫_{t₀}^t F(s, f(s)) ds,

so that

f(t) = α₀ + ∫_{t₀}^t F(s, f(s)) ds

for t ∈ J. Conversely, if f satisfies this "integral equation", then the fundamental theorem of the calculus implies that f′(t) exists and equals F(t, f(t)) on J, so that f is a solution of the differential equation which clearly goes through ⟨t₀, α₀⟩. Now for any continuous f: J → A we can define g: J → W by

g(t) = α₀ + ∫_{t₀}^t F(s, f(s)) ds,

and our argument above shows that f is a solution of the differential equation if and only if f is a fixed point of the mapping K: f ↦ g. This suggests that we try to show that K is a contraction, so that we can apply the fixed-point theorem.

We start by choosing a neighborhood L × U of ⟨t₀, α₀⟩ on which F(t, α) is bounded and Lipschitz in α uniformly over t. Let J be some open subinterval of L containing t₀, and let V be the Banach space BC(J, W) of bounded continuous functions from J to W. Our later calculation will show how small we have to take J. We assume that the neighborhood U is a ball about α₀ of radius r, and we consider the ball of functions 𝒰 = B_r(ᾱ₀) in V, where ᾱ₀ is the constant function with value α₀. Then any f in 𝒰 has its range in U, so that F(t, f(t)) is defined, bounded, and continuous. That is, K as defined earlier maps the ball 𝒰 into V.
We now calculate. Let F be bounded by m on L × U and let δ be the length of J. Then

‖K(ᾱ₀) − ᾱ₀‖_∞ = lub {‖∫_{t₀}^t F(s, α₀) ds‖ : t ∈ J} ≤ δm    (1)

by the norm inequality for integrals (see Section 10 of Chapter 4). Also, if f₁ and f₂ are in 𝒰, and if c is a Lipschitz constant for F on L × U, then

‖K(f₁) − K(f₂)‖_∞ = lub {‖∫_{t₀}^t [F(s, f₁(s)) − F(s, f₂(s))] ds‖}
≤ δ lub {‖F(s, f₁(s)) − F(s, f₂(s))‖}
≤ δc lub {‖f₁(s) − f₂(s)‖}
= δc‖f₁ − f₂‖_∞.    (2)

From (2) we see that K is a contraction with constant C = δc if δc < 1, and from (1) we see that K moves the center ᾱ₀ of the ball 𝒰 a distance less than (1 − C)r if δm < (1 − δc)r. This double requirement on δ is equivalent to

δ < r/(m + cr),

and with any such δ the theorem follows from a corollary of the fixed-point theorem (Corollary 2 of Theorem 9.1, Chapter 4). □
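The fixed-point iteration in this proof can be carried out numerically. The following Python sketch (numpy assumed; the trapezoid integrator is an ad hoc stand-in for exact integration) iterates K for the equation dx/dt = t + x of Exercise 1.5 below and watches the iterates converge to the solution eᵗ − t − 1:

    import numpy as np

    # Picard iteration f_{n+1}(t) = a0 + integral_{t0}^t F(s, f_n(s)) ds,
    # for dx/dt = t + x with x(0) = 0, on the interval [0, 1].
    F = lambda t, x: t + x
    t = np.linspace(0.0, 1.0, 1001)
    a0 = 0.0

    def cumtrap(y, t):
        out = np.zeros_like(y)
        out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(t))
        return out

    f = np.full_like(t, a0)              # f_0: the constant function a0
    for _ in range(8):
        f = a0 + cumtrap(F(t, f), t)     # apply the contraction K

    exact = np.exp(t) - t - 1            # the solution through <0, 0>
    print(np.max(np.abs(f - exact)))     # small: the iterates approach the fixed point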
Corollary. The theorem holds if F: I × A → W is continuous and has a continuous second partial differential.

We next show that any two solutions through ⟨t₀, α₀⟩ must agree on the intersection of their domains (under the hypotheses of Theorem 1.1).

Lemma 1.1. Let g₁ and g₂ be any two solutions of dα/dt = F(t, α) through ⟨t₀, α₀⟩. Then g₁(t) = g₂(t) for all t in the intersection J = J₁ ∩ J₂ of their domains.

Proof. Otherwise there is a point s in J such that g₁(s) ≠ g₂(s). Suppose that s > t₀, and set C = {t : t > t₀ and g₁(t) ≠ g₂(t)} and x = glb C. The set C is open, since g₁ and g₂ are continuous, and therefore x is not in C. That is, g₁(x) = g₂(x). Call this common value α and apply the theorem to ⟨x, α⟩. With r such that B_r(α) ⊂ A, we choose δ small enough so that the differential equation has a unique solution g from (x − δ, x + δ) to B_r(α) passing through ⟨x, α⟩, and we also take δ small enough so that the restrictions of g₁ and g₂ to (x − δ, x + δ) have ranges in B_r(α). But then g₁ = g₂ = g on this interval by the uniqueness of g, and this contradicts the definition of x. Therefore, g₁ = g₂ on the intersection of their domains. □

This lemma allows us to remove the restriction on the range of f in the theorem.

Theorem 1.2. Let A, I, and F be as in Theorem 1.1. Then for any point ⟨t₀, α₀⟩ in I × A and any sufficiently small interval neighborhood J of t₀, there is a unique solution from J to A passing through ⟨t₀, α₀⟩.
Fig. 6.1

Global solutions. The solutions we have found for the differential equation dα/dt = F(t, α) are defined only in sufficiently small neighborhoods of the initial point t₀ and are accordingly called local solutions. Now if we run along to a point ⟨t₁, α₁⟩ near the end of such a local solution and then consider the local solution about ⟨t₁, α₁⟩, first of all it will have to agree with our first solution on the intersection of the two domains, and secondly it will in general extend farther beyond t₁ than the first solution, so the two local solutions will fit together to make a solution on a larger t-interval than either gives separately. We can continue in this way to extend our original solution to what might be called a global solution, made up of a patchwork of matching local solutions. These notions are somewhat vague as described above, and we now turn to a more precise construction of a global solution.

Given ⟨t₀, α₀⟩ ∈ I × A, let ℱ be the family of all solutions through ⟨t₀, α₀⟩. Thus g ∈ ℱ if and only if g is a solution on an interval J ⊂ I, t₀ ∈ J, and g(t₀) = α₀. Lemma 1.1 shows exactly that the union† f of all the functions g in ℱ is itself a function, for if ⟨t₁, α₁⟩ ∈ g₁ and ⟨t₁, α₂⟩ ∈ g₂, then α₁ = g₁(t₁) = g₂(t₁) = α₂.

Moreover, f is a solution, because around any x in its domain f agrees with some g ∈ ℱ. By the way f was defined we see that f is the unique maximal solution through ⟨t₀, α₀⟩. We have thus proved the following theorem.

Theorem 1.3. Let F: I × A → W be a function satisfying the hypotheses of Theorem 1.1. Then through each ⟨t₀, α₀⟩ in I × A there is a uniquely determined maximal solution to the differential equation dα/dt = F(t, α).

In general, we would have to expect a maximal solution to "run into the boundary of A" and therefore to have a domain interval J properly included in I, as Fig. 6.1 suggests.

† Remember that we are taking a function to be a set of ordered pairs, so that the union of a family of functions makes precise sense.
However, if A is the whole space W, and if F(t, α) is Lipschitz in α for each t, with a Lipschitz bound c(t) that is continuous in t, then we can show that each maximal solution is over the whole of I. We shall shortly see that this condition is a natural one for the linear equation.

Theorem 1.4. Let W be a Banach space, and let I be an open interval in ℝ. Let F: I × W → W be continuous, and suppose that there is a continuous function c: I → ℝ such that

‖F(t, α₁) − F(t, α₂)‖ ≤ c(t)‖α₁ − α₂‖

for all t in I and all α₁, α₂ in W. Then each maximal solution to the differential equation dα/dt = F(t, α) has the whole of I for its domain.

Proof. Suppose, on the contrary, that g is a maximal solution whose domain interval J has right-hand endpoint b less than that of I. We choose a finite open interval L containing b and such that L̄ ⊂ I (see Fig. 6.2). Since L̄ is compact, the continuous function c(t) has a maximum value c on L̄. We choose any t₁ in L ∩ J close enough to b so that b − t₁ < 1/c, and we set α₁ = g(t₁) and m = max ‖F(t, α₁)‖ on L̄. With these values of c and m, and with any r, the proof of Theorem 1.1 gives us a local solution f through ⟨t₁, α₁⟩ with domain (t₁ − δ, t₁ + δ) for any δ less than r/(m + rc) = 1/(c + (m/r)). Since we now have no restriction on r (because A = W), this bound on δ becomes 1/c, and since we chose t₁ so that t₁ + (1/c) > b, we can now choose δ so that t₁ + δ > b. But this gives us a contradiction; the maximal solution g through ⟨t₁, α₁⟩ includes the local solution f, so that, in particular, t₁ + δ ≤ b. We have thus proved the theorem. □
Fig. 6.2
Going back to our original situation, we can conclude that if the Lipschitz control of F is of the stronger type assumed above, and if the domain J of some maximal solution g is less than I, then the open set A cannot be the whole of W. It is in fact true that the distance from g(t) to the boundary of A approaches zero as t approaches an endpoint b of J which is interior to I. That is, it is now a theorem that ρ(g(t), A′) → 0 as t → b. The proof is more complicated than our argument above, and we leave it as a set of exercises for the interested reader.
The nth-order equation. Let A₁, A₂, …, A_n be open subsets of a Banach space W, let I be an open interval in ℝ, and let G: I × A₁ × A₂ × ⋯ × A_n → W be continuous. We consider the differential equation

dⁿα/dtⁿ = G(t, α, dα/dt, …, d^{n−1}α/dt^{n−1}).

A function f: J → W is a solution to this equation if J is an open subinterval of I, f has continuous derivatives on J up to the nth order, f^{(i−1)}[J] ⊂ A_i, i = 1, …, n, and

f^{(n)}(t) = G(t, f(t), f′(t), …, f^{(n−1)}(t))

for t ∈ J. An initial condition is now given by a point

⟨t₀, β₁, β₂, …, β_n⟩ ∈ I × A₁ × ⋯ × A_n.

The basic theorem is almost the same as before. To simplify our notation, let α be the n-tuple ⟨α₁, α₂, …, α_n⟩ in Wⁿ = V, and set A = ∏₁ⁿ A_i. Also let ψ be the mapping f ↦ ⟨f, f′, …, f^{(n−1)}⟩. Then the solution equation becomes f^{(n)}(t) = G(t, ψf(t)).

Theorem 1.5. Let G: I × A → W be as above and suppose, in addition, that G(t, α) is locally uniformly Lipschitz in α. Then for any ⟨t₀, β⟩ in I × A and for any sufficiently small open interval J containing t₀, there is a unique function f from J to W such that f is a solution to the above nth-order equation satisfying the initial condition ψf(t₀) = β.
Proof. There is an ancient and standard device for reducing a single nth-order equation to a system of first-order equations. The idea is to replace the single equation

dⁿα/dtⁿ = G(t, α, dα/dt, …, d^{n−1}α/dt^{n−1})

by the system of equations

dα₁/dt = α₂,
dα₂/dt = α₃,
⋮
dα_{n−1}/dt = α_n,
dα_n/dt = G(t, α₁, α₂, …, α_n),

and then to recognize this system as equivalent to a single first-order equation on a different space. In fact, if we define the mapping F = ⟨F₁, …, F_n⟩ from I × A to V = Wⁿ by setting F_i(t, α) = α_{i+1} for i = 1, …, n − 1, and F_n(t, α) = G(t, α), then the above system becomes the single equation

dα/dt = F(t, α),

where F is clearly locally uniformly Lipschitz in α. Now a function f = ⟨f₁, …, f_n⟩ from J to V is a solution of this equation if and only if

f₁′ = f₂,
f₂′ = f₃,
⋮
f_{n−1}′ = f_n,
f_n′ = G(t, f₁, …, f_n),
that is, if and only if f₁ has derivatives up to order n, ψ(f₁) = f, and f₁^{(n)}(t) = G(t, ψf₁(t)). The n-tuplet initial condition ψf₁(t₀) = β is now just f(t₀) = β. Thus the nth-order theorem for G has turned into the first-order theorem for F, and so follows from Theorems 1.1 and 1.2. □

The local solution through ⟨t₀, β⟩ extends to a unique maximal solution by Theorem 1.3 applied to our first-order problem dα/dt = F(t, α), and the domain of the maximal solution is the whole of I if G(t, α) is Lipschitz in α with a bound c(t) that is continuous and if A = Wⁿ, as in Theorem 1.4.
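The reduction device is also how such equations are handled in practice. The following Python sketch (numpy assumed; the equation x″ = −x and the crude Euler march are my own illustrative choices) converts a second-order equation into the system dα/dt = F(t, α) and recovers x = cos t in the first component:

    import numpy as np

    # Second-order equation x'' = G(t, x, x') = -x, with x(0) = 1, x'(0) = 0.
    # Set a = <a1, a2> = <x, x'> and F(t, a) = <a2, G(t, a1, a2)>.
    G = lambda t, a1, a2: -a1
    F = lambda t, a: np.array([a[1], G(t, a[0], a[1])])

    # Crude Euler marching, just to exhibit the reduction:
    h, a, t = 1e-4, np.array([1.0, 0.0]), 0.0
    while t < 1.0:
        a, t = a + h * F(t, a), t + h
    print(a[0], np.cos(1.0))   # the first component approximates the solution x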
EXERCISES

1.1 Consider the equation dα/dt = F(t, α) in the special case where W = ℝ². Write out the equation as a pair of equations involving real-valued functions and real variables.

1.2 Consider the system of differential equations

dy/dt = cos xy.

Define the function F: ℝ³ → ℝ² so that the above system becomes

dα/dt = F(t, α),

where α = ⟨x, y⟩.

1.3 In the above exercise show that F is uniformly Lipschitz in α on ℝ × A, where A is any bounded open set in ℝ². Is F uniformly Lipschitz on ℝ × ℝ²?

1.4 Write out the above system in terms of a solution function f = ⟨f₁, f₂⟩. Write out for this system the integrated form used in proving Theorem 1.1.

1.5 The fixed-point theorem iteration sequence that we used in proving Theorem 1.1 starts off with f₀ as the constant function α₀ and then proceeds by

f_n(t) = α₀ + ∫_{t₀}^t F(s, f_{n−1}(s)) ds.

Compute this sequence as far as f₄ for the differential equation

dx/dt = t + x    [f′(t) = t + f(t)]

with the initial condition f(0) = 0. That is, take f₀ = 0 and compute f₁, f₂, f₃, and f₄ from the formula. Now guess the solution f and verify it.

1.6 Compute the iterates f₀, f₁, f₂, and f₃ for the initial-value problem

dy/dx = x + y²,    y(0) = 0.

Supposing that the solution f has a power series expansion about 0, what are its first three nonzero terms?

1.7 Make the computation in the above exercise for the initial condition f(0) = −1.

1.8 Do the same for f(0) = +1.
1.9 Suppose that W is a Banach space and that F and G are functions from ℝ × W⁴ to W satisfying suitable Lipschitz conditions. Show how the second-order system

η″ = F(t, ξ, η, ξ′, η′),    ξ″ = G(t, ξ, η, ξ′, η′)

would be brought under our standard theory by making it into a single second-order equation.

1.10 Answer the above exercise by converting it to a first-order system and then to a single first-order equation.

1.11 Let σ be a nonnegative, continuous, real-valued function defined on an interval [0, a) ⊂ ℝ, and suppose that there are constants b and c > 0 such that

σ(x) ≤ c∫₀ˣ σ(t) dt + bx for all x ∈ [0, a).

a) Prove by induction that if m = ‖σ‖_∞, then

σ(x) ≤ m(cx)ⁿ/n! + (b/c) Σ_{j=1}ⁿ (cx)ʲ/j!

for every n.

b) Then prove that

σ(x) ≤ (b/c)(e^{cx} − 1)

for all x.

1.12 Let W be a Banach space, let I be an interval in ℝ, and let F be a continuous mapping from I × W to W. Suppose that ‖F(t, α₀)‖ ≤ b for all t ∈ I and that

‖F(t, α) − F(t, β)‖ ≤ c‖α − β‖

for all t in I and all α, β in W. Let f be the global solution through ⟨t₀, α₀⟩, and set σ(x) = ‖f(t₀ + x) − α₀‖. Prove that

σ(x) ≤ c∫₀ˣ σ(t) dt + bx

for x > 0 and t₀ + x in I. Then use the result in the above exercise to derive a much stronger bound than we have in the text on the growth of the solution f(t) as t goes away from t₀.

1.13 With the hypotheses on F as in the above exercise, show that the iteration sequence for the solution through ⟨t₀, α₀⟩ converges on the whole of I by showing inductively that if f₀ = ᾱ₀ and

f_n(t) = α₀ + ∫_{t₀}^t F(s, f_{n−1}(s)) ds,

then

‖f_n(t) − f_{n−1}(t)‖ ≤ (b/c) (c|t − t₀|)ⁿ/n!.

From these inequalities prove directly that the solution f through ⟨t₀, α₀⟩ satisfies

‖f(t) − α₀‖ ≤ (b/c)(e^{c|t−t₀|} − 1).
2. DIFFERENTIABLE DEPENDENCE ON PARAMETERS

It is exceedingly important in some applications to know how the solution to the system

f′(t) = G(t, f(t))

varies with the initial point ⟨t₁, α₁⟩. In order to state the problem precisely, we fix an open interval J, set 𝒰 = B_r(ᾱ₀) ⊂ V = BC(J, W) as in the previous section, and require a solution in 𝒰 passing through ⟨t₁, α₁⟩, where ⟨t₁, α₁⟩ is near ⟨t₀, α₀⟩. Supposing that a unique solution f exists, we then have a mapping ⟨t₁, α₁⟩ ↦ f, and it is the continuity and differentiability of this map that we wish to study.

Theorem 2.1. Let L × U be a neighborhood of ⟨t₀, α₀⟩ in the Banach space ℝ × W, and let F(t, α) be a bounded continuous mapping from L × U to W which is Lipschitz in α uniformly over t. Then there is a neighborhood J × N of ⟨t₀, α₀⟩ with the following property. For any ⟨t₁, α₁⟩ in J × N there is a unique function f from J to U which is a solution of the differential equation dα/dt = F(t, α) passing through ⟨t₁, α₁⟩, and the mapping ⟨t₁, α₁⟩ ↦ f from J × N to V is continuous.
Proof. We simply reexamine the calculation of Theorem 1.1 and take δ a little smaller. Let K(t₁, α₁, f) be the mapping of that theorem but with initial point ⟨t₁, α₁⟩, so that g = K(t₁, α₁, f) if and only if g(t) = α₁ + ∫_{t₁}^t F(s, f(s)) ds for all t in J. Clearly K is continuous in ⟨t₁, α₁⟩ for each fixed f.

If N is the ball B_{r/2}(α₀), then the inequality (1) in the proof shows that ‖K(t₁, α₁, ᾱ₀) − ᾱ₀‖ ≤ ‖α₁ − α₀‖ + δm ≤ r/2 + δm. The second inequality remains unchanged. Therefore, f ↦ K(t₁, α₁, f) is a map from 𝒰 to V which is a contraction with constant C = δc if δc < 1, and which moves the center ᾱ₀ of 𝒰 a distance less than (1 − C)r if r/2 + δm < (1 − δc)r. This new double requirement on δ is equivalent to

δ < r/(2(m + cr)),

which is just half the old value. With J of length δ, we can now apply Theorem 9.2 of Chapter 4 to the map K(t₁, α₁, f) from (J × N) × 𝒰 to V, and so have our theorem. □

If we want the map ⟨t₁, α₁⟩ ↦ f to be differentiable, it is sufficient, by Theorem 9.4 of Chapter 4, to know in addition to the above that

K: (J × N) × 𝒰 → V

is continuously differentiable. And to deduce this, it is sufficient to suppose that dF exists and is uniformly continuous on L × U.

Theorem 2.2. Let L × U be a neighborhood of ⟨t₀, α₀⟩ in the Banach space ℝ × W, and let F(t, α) be a bounded mapping from L × U to W such that dF exists, is bounded, and is uniformly continuous on L × U. Then, in
the context of the above theorem, the solution f is a continuously differentiable function of the initial value ⟨t₁, α₁⟩.

Proof. We have to show that the map K(t₁, α₁, f) from (J × N) × 𝒰 to V is continuously differentiable, after which we can apply Theorem 9.4 of Chapter 4, as we remarked above. Now the mapping h ↦ k defined by k(t) = ∫_{t₁}^t h(s) ds is a bounded linear mapping from V to V which clearly depends continuously on t₁, and by Theorem 14.3 of Chapter 3 the integrand map f ↦ h defined by h(s) = F(s, f(s)) is continuously differentiable on 𝒰. Composing these two maps we see that dK₃_{⟨t₁,α₁,f⟩} exists and is continuous on J × N × 𝒰. Now

ΔK₂_{⟨t₁,α₁,f⟩}(ξ) = ξ,

so that dK₂ = I, and ΔK₁_{⟨t₁,α₁,f⟩}(h) = −∫_{t₁}^{t₁+h} F(s, f(s)) ds, from which it follows easily that dK₁_{⟨t₁,α₁,f⟩}(h) = −hF(t₁, f(t₁)). The three partial differentials dK₁, dK₂, and dK₃ thus exist and are continuous on J × N × 𝒰, and it follows from Theorem 8.3 of Chapter 3 that K(t₁, α₁, f) is continuously differentiable there. □

Corollary. If s is any point in J, then the value f(s) of a solution at s is a differentiable function of its value at t₀.

Proof. Let f_α be the solution through ⟨t₀, α⟩. By the theorem, α ↦ f_α is a continuously differentiable map from N to the function space V = BC(J, W). But π_s: f ↦ f(s) is a bounded linear mapping and thus trivially continuously differentiable. Composing these two maps, we see that α ↦ f_α(s) is continuously differentiable on N. □
It is also possible to make the continuous and differentiable dependence of the solution on its initial value ⟨t₀, α₀⟩ into a global affair. The following is the theorem. We shall not go into its proof here.

Theorem 2.3. Let f be the maximal solution through ⟨t₀, α₀⟩ with domain J, and let [a, b] be any finite closed subinterval of J containing t₀. Then there exists an ε > 0 such that for every ⟨t₁, α₁⟩ ∈ B_ε(⟨t₀, α₀⟩) the domain of the global solution through ⟨t₁, α₁⟩ includes [a, b], and the restriction of this solution to [a, b] is a continuous function of ⟨t₁, α₁⟩. If F satisfies the hypotheses of Theorem 2.2, then this dependence is continuously differentiable.

Finally, suppose that F depends continuously (or continuously differentiably) on a parameter λ, so that we have F(λ, t, α) on M × I × A. Now the solution f to the initial-value problem

f′(t) = F(t, f(t)),

depends on the parameter λ as well as on the initial condition f(t₁) = α₁, and if the reader has fully understood our arguments above, he will see that we can show in the same way that the dependence of f on λ is also continuous (continuously differentiable). We shall not go into these details here.
3. THE LINEAR EQUATION

We now suppose that the function F of Section 1 is from I × W to W and continuous, and that F(t, α) is linear in α for each fixed t. It is not hard to see that we then automatically have the strong Lipschitz hypothesis of Theorem 1.4, which we shall in any case now assume. Here this is a boundedness condition on a linear map: we are assuming that F(t, α) = T_t(α), where T_t ∈ Hom W, and that ‖T_t‖ ≤ c(t) for all t, where c(t) is continuous on I.

As one might expect, in this situation the existence and uniqueness theory of Section 1 makes contact with general linear theory. Let X⁰ be the vector space C(I, W) of all continuous functions from I to W, and let X¹ be its subspace C¹(I, W) of all functions having continuous first derivatives. Norms will play no role in our theorem.

Theorem 3.1. The mapping S: X¹ → X⁰ defined by setting g = Sf if g(t) = f′(t) − F(t, f(t)) is a surjective linear mapping. The set N of global solutions of the differential equation dα/dt = F(t, α) is the null space of S, and is therefore, in particular, a vector space. For each t₀ ∈ I the restriction to N of the coordinate (evaluation) mapping π_{t₀}: f ↦ f(t₀) is an isomorphism from N to W. The null space M of π_{t₀} is therefore a complement of N in X¹, and so determines a right inverse R of S. The mapping f ↦ ⟨Sf, f(t₀)⟩ is an isomorphism from X¹ to X⁰ × W, and this fact is equivalent to all the above assertions.

Proof. For any fixed g in X⁰ we set G(t, α) = F(t, α) + g(t) and consider the (nonlinear) equation dα/dt = G(t, α). By Theorems 1.3 and 1.4 it has a unique maximal solution f through any initial point ⟨t₀, α₀⟩, and the domain of f is the whole of I. That is, for each pair ⟨g, α⟩ in X⁰ × W there is a unique f in X¹ such that ⟨Sf, f(t₀)⟩ = ⟨g, α⟩. The mapping

⟨S, π_{t₀}⟩: f ↦ ⟨Sf, f(t₀)⟩

is thus bijective, and since it is clearly linear, it is an isomorphism. In particular, S is surjective. The null space N of S is the inverse image of {0} × W under the above isomorphism; that is, π_{t₀}↾N is an isomorphism from N to W.

Finally, the null space M of π_{t₀} is the inverse image of X⁰ × {0} under ⟨S, π_{t₀}⟩, and the direct sum decomposition X¹ = M ⊕ N simply reflects the decomposition X⁰ × W = (X⁰ × {0}) ⊕ ({0} × W) under the inverse isomorphism. This finishes the proof of the theorem. □
The problem of finding, for a given g in X⁰ and a given α₀ in W, the unique f in X¹ such that S(f) = g and f(t₀) = α₀ is called the initial-value problem. At the theoretical level, the problem is solved by the above theorem, which states that the uniquely determined f exists. At the practical level of computation, the problem remains important.

The fact that M = M_{t₀} is a complement of N breaks down the initial-value problem into two independent subproblems. The right inverse R associated with M_{t₀} finds h in X¹ such that S(h) = g and h(t₀) = 0. The inverse of the isomorphism f ↦ f(t₀) from N to W selects that k in X¹ such that S(k) = 0 and k(t₀) = α₀. Then f = h + k. The first subproblem is the problem of "solving the inhomogeneous equation with homogeneous initial data", and the second is the problem of "solving the homogeneous equation with inhomogeneous initial data". In a certain sense the initial-value problem is the "direct sum" of these two independent problems.
We shall now study the homogeneous equation dα/dt = T_t(α) more closely. As we saw above, its solution space N is isomorphic to W under each projection map π_t: f ↦ f(t). Let φ_t be this isomorphism (so that φ_t = π_t↾N). We now choose some fixed t₀ in I (we may as well suppose that I contains 0 and take t₀ = 0) and set K_t = φ_t ∘ φ₀⁻¹. Then {K_t} is a one-parameter family of linear isomorphisms of W with itself, and if we set f_β(t) = K_t(β), then f_β is the solution of dα/dt = T_t(α) passing through ⟨0, β⟩. We call K_t a fundamental solution of the homogeneous equation dα/dt = T_t(α).

Since f_β′(t) = T_t(f_β(t)), we see that d(K_t)/dt = T_t ∘ K_t in the sense that the equation is true at each β in W. However, the derivative d(K_t)/dt does not necessarily exist as a norm limit in Hom W. This is because our hypotheses on T_t do not imply that the mapping t ↦ T_t is continuous from I to Hom W. If this mapping is continuous, then the mapping ⟨t, A⟩ ↦ T_t ∘ A is continuous from I × Hom W to Hom W, and the initial-value problem

dA/dt = T_t ∘ A,    A₀ = I,

has a unique solution A_t in C¹(I, Hom W). Because evaluation at β is a bounded linear mapping from Hom W to W, A_t(β) is a differentiable function of t and

d(A_t(β))/dt = (dA_t/dt)(β) = T_t(A_t(β)),

so that A_t(β) is a solution through ⟨0, β⟩. This implies that A_t(β) = K_t(β) for all β, so K_t = A_t. In particular, the fundamental solution t ↦ K_t is now a differentiable map into Hom W, and dK_t/dt = T_t ∘ K_t. We have proved the following theorem.

Theorem 3.2. Let t ↦ T_t be a continuous map from an interval neighborhood I of 0 to Hom W. Then the fundamental solution t ↦ K_t of the differential equation dα/dt = T_t(α) is the parametrized arc from I to Hom W that is the solution of the initial-value problem dA/dt = T_t ∘ A, A₀ = I.
In terms of the isomorphisms K_t = K(t), we can now obtain an explicit solution for the inhomogeneous equation dα/dt = T_t(α) + g(t). We want f such that

f′(t) − T_t(f(t)) = g(t).

Now K′(t) = T_t ∘ K(t), so that T_t = K′(t) ∘ K(t)⁻¹, and it follows from Exercise 8.12 of Chapter 4 and the general product rule for differentiation (Theorem 8.4 of Chapter 3) that the left side of the equation above is exactly

K(t) (d/dt [K(t)⁻¹(f(t))]).

The equation we have to solve can thus be rewritten as

d/dt [K(t)⁻¹(f(t))] = K(t)⁻¹(g(t)).

We therefore have an obvious solution, and even if the reader has found our motivating argument too technical, he should be able to check the solution by differentiating.

Theorem 3.3. In the context of Theorem 3.2, the function

f(t) = K_t [∫₀ᵗ K_s⁻¹(g(s)) ds]

is the solution of the inhomogeneous initial-value problem

dα/dt = T_t(α) + g(t),    f(0) = 0.

This therefore is a formula for the right inverse R of S determined by the complement M₀ of the null space N of S.
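For constant T_t = T the fundamental solution is K_t = e^{tT} (see below, and Exercise 3.8), and the formula can be checked numerically. The following Python sketch (numpy and scipy assumed; T and g are arbitrary choices of my own) verifies by a central difference that f′(t) = Tf(t) + g(t):

    import numpy as np
    from scipy.linalg import expm

    # Constant coefficients: the formula of Theorem 3.3 becomes
    #   f(t) = e^{tT} integral_0^t e^{-sT} g(s) ds.
    T = np.array([[0.0, 1.0],
                  [-2.0, -3.0]])
    g = lambda s: np.array([np.sin(s), 1.0])

    def f(t, n=4001):
        s = np.linspace(0.0, t, n)
        vals = np.array([expm(-si * T) @ g(si) for si in s])
        integral = (0.5 * (vals[1:] + vals[:-1]) * np.diff(s)[:, None]).sum(axis=0)
        return expm(t * T) @ integral

    t0, h = 0.7, 1e-4
    lhs = (f(t0 + h) - f(t0 - h)) / (2 * h)   # f'(t0) by a central difference
    rhs = T @ f(t0) + g(t0)
    print(np.max(np.abs(lhs - rhs)))          # small: the formula solves the equation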
The special case of the constant coefficient equation, where the "coefficient" operator T_t is a fixed T in Hom W, is extremely important. The first new fact to be observed is that if f is a solution of dα/dt = T(α), then so is f′. For the equation f′(t) = T(f(t)) has a differentiable right-hand side, and differentiating, we get f″(t) = T(f′(t)). That is:

Lemma 3.1. The solution space N of the constant coefficient equation dα/dt = T(α) is invariant under the derivative operator D.

Moreover, we see from the differential equation that the operator D on N is just composition with T. More precisely, the equation f′(t) = T(f(t)) can be rewritten π_t ∘ D = T ∘ π_t, and since the restriction of π_t to N is the isomorphism φ_t from N to W, this equation can be solved for T. We thus have the following lemma.

Lemma 3.2. For each fixed t the isomorphism φ_t from N to W takes the derivative operator D on N to the operator T on W. That is,

T = φ_t ∘ D ∘ φ_t⁻¹.

The equation for the fundamental solution K_t is now dS/dt = TS. In the elementary calculus this is the equation for the exponential function, which leads us to expect, and immediately check, that K_t = e^{tT}. (See the end of Section 8 of Chapter 4.) The solution of dα/dt = T(α) through ⟨0, β⟩ is thus the function

e^{tT}β = Σ_{j=0}^∞ (tʲ/j!) Tʲ(β).
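The series can be summed numerically and compared against a library matrix exponential; the following Python sketch (numpy and scipy assumed) does this for the rotation generator met in Chapter 5:

    import numpy as np
    from scipy.linalg import expm

    T = np.array([[0.0, -1.0],
                  [1.0,  0.0]])
    beta = np.array([1.0, 0.0])
    t = 0.5

    # Partial sums of sum_j (t^j / j!) T^j (beta):
    term, total = beta.copy(), beta.copy()
    for j in range(1, 30):
        term = (t / j) * (T @ term)     # term_j = (t^j/j!) T^j beta
        total += term

    assert np.allclose(total, expm(t * T) @ beta)
    # For this T, e^{tT} is rotation by the angle t:
    assert np.allclose(expm(t * T) @ beta, [np.cos(t), np.sin(t)])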
If T satisfies a polynomial equation p(T) = 0, as we know it must if W is finite-dimensional, then our analysis can be carried significantly further. Suppose for now that p has only real roots, so that its relatively prime factorization is p(x) = ∏₁ᵏ (x − λ_i)^{m_i}. Then we know from Theorem 5.5 of Chapter 1 that W is the direct sum W = ⊕₁ᵏ W_i of the null spaces W_i of the transformations (T − λ_i)^{m_i}, and that each W_i is invariant under T. This gives us a much simpler form for the solution curve e^{tT}α if the point α is in one of the null spaces W_i. Taking such a subspace W_i itself as W for the moment, we have (T − λI)^m = 0, so that T = λI + R, where R^m = 0, and the factorization e^{tT} = e^{tλ}e^{tR}, together with the now finite series expansion of e^{tR}, gives us

e^{tT}α = e^{tλ} [α + tR(α) + ⋯ + (t^{m−1}/(m − 1)!) R^{m−1}(α)].

Note that the number of terms on the right is the degree of the factor (t − λ)^m in the polynomial p(t).

In the general situation where W = ⊕₁ᵏ W_i, we have α = Σ₁ᵏ α_i, e^{tT}(α) = Σ₁ᵏ e^{tT}(α_i), and each e^{tT}(α_i) is of the above form. The solution of f′(t) = T(f(t)) through the general point ⟨0, α⟩ is thus a finite sum of terms of the form tʲe^{tλ_i}β_{ij}, the number of terms being the degree of the polynomial p.

If W is a complex Banach space, then the restriction that p have only real roots is superfluous. We get exactly the same formula but with complex values of λ. This introduces more variety into the behavior of the solution curves, since an outside exponential factor e^{tλ} = e^{tμ}e^{itν} now has a periodic factor if ν ≠ 0.

Altogether we have proved the following theorem.

Theorem 3.4. If W is a real or complex Banach space and T ∈ Hom W, then the solution curve in W of the initial-value problem f′(t) = T(f(t)), f(0) = β, is

f(t) = e^{tT}β = Σ_{j=0}^∞ (tʲ/j!) Tʲ(β).

If T satisfies a polynomial equation (T − λI)^m = 0, then

f(t) = e^{tλ} [β + tR(β) + ⋯ + (t^{m−1}/(m − 1)!) R^{m−1}(β)],

where R = T − λI. If T satisfies a polynomial equation p(T) = 0 and p has the relatively prime factorization p(x) = ∏₁ᵏ (x − λ_i)^{m_i}, then f(t) is a sum of k terms of the above type, and so has the form

f(t) = Σ_{i,j} tʲe^{tλ_i}β_{ij},

where the number of terms on the right is the degree of the polynomial p, and each β_{ij} is a fixed (constant) vector.
It is important to notice how the asymptotic behavior of f(t) as t → +∞ is controlled by the polynomial roots λ_i. We first restrict ourselves to the solution through a vector α in one of the subspaces W_i, which amounts to supposing that (T − λI)^m = 0. Then if λ has a positive real part, so that e^{tλ} = e^{tμ}e^{itν} with μ > 0, then ‖f(t)‖ → ∞ in exponential fashion. If λ has a negative real part, then f(t) approaches zero as t → ∞ (but its norm becomes infinite exponentially fast as t → −∞). If the real part of λ is zero, then ‖f(t)‖ → ∞ like t^{m−1} if m > 1. Thus the only way for f to be bounded on the whole of ℝ is for the real part of λ to be zero and m = 1, in which case f is periodic. Similarly, in the general case where p(T) = ∏ₙ (T − λ_n)^{m_n} = 0, it will be true that all the solution curves are bounded on the whole of ℝ if and only if the roots λ_n are all pure imaginary and all the multiplicities m_n are 1.
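The dichotomy is easy to observe numerically. In the following Python sketch (numpy and scipy assumed; the matrices are illustrative choices of my own), T₁ has pure imaginary roots with m = 1 and its solutions stay bounded, while T₂ has the root 0 with m = 2 and its solutions grow like t^{m−1} = t:

    import numpy as np
    from scipy.linalg import expm

    T1 = np.array([[0.0, -1.0], [1.0, 0.0]])   # roots +-i, multiplicity 1
    T2 = np.array([[0.0, 1.0], [0.0, 0.0]])    # root 0, multiplicity 2

    beta = np.array([1.0, 1.0])
    for t in (10.0, 100.0, 1000.0):
        print(np.linalg.norm(expm(t * T1) @ beta),   # stays near sqrt(2)
              np.linalg.norm(expm(t * T2) @ beta))   # grows roughly like t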
EXERCISES
3.1 Let I be an open interval in ℝ, and let W be a normed linear space. Let F(t, α)
be a continuous function from I × W to W which is linear in α for each fixed t. Prove
that there is a function c(t) which is bounded on every closed interval [a, b] included
in I and such that ‖F(t, α)‖ ≤ c(t)‖α‖ for all α and t. Then show that c can be made
continuous. (You may want to use the Heine-Borel property: If [a, b] is covered by a
collection of open intervals, then some finite subcollection already covers [a, b].)
3.2 In the text we omitted checking that f ↦ f^{(n)} − G(t, f, f′, …, f^{(n−1)}) is sur-
jective from X_n to X_0. Prove that this is so by tracking down the surjectivity through
the reduction to a first-order system.
3.3 Suppose that the coefficients a_i(t) in the operator
$$Tf = \sum_{i=0}^{n} a_i f^{(i)}$$
are all themselves in C^1. Show that the null space N of T is a subspace of C^{n+1}. State
a generalization of this theorem and indicate roughly why it is true.
3.4 Suppose that W is a Banach space, T ∈ Hom W, and β is an eigenvector of T
with eigenvalue r. Show that the solution of the constant coefficient equation dα/dt =
T(α) through ⟨0, β⟩ is f(t) = e^{rt}β.
3.5 Suppose next that W is finite-dimensional and has a basis {β_i}_1^n consisting of
eigenvectors of T, with corresponding eigenvalues r_i. Find a formula for the solution
through ⟨0, α⟩ in terms of the basis expansion of α.
3.6 A very important special case of the linear equation dα/dt = T_t(α) is when the
operator function T_t is periodic. Suppose, for example, that T_{t+1} = T_t for all t.
Show that then K_{t+n} = K_t(K_1)^n for all t and n.
Assume next that K_1 has a logarithm, and so can be written K_1 = e^A for some A
in Hom W. (We know from Exercise 11.19 of Chapter 4 that this is always possible
if W is finite-dimensional.) Show that now K_t can be written in the form
$$K_t = B(t)e^{tA},$$
where B(t) is periodic with period 1.
3.7 Continuing the above exercise, suppose now that W is a finite-dimensional
complex vector space. Using the analysis of e^{tA}β given in the text, show that the
differential equation dα/dt = T_t(α) has a periodic solution (with any period) only if
K_1 has an eigenvalue of absolute value 1. Show also that if K_1 has an nth root of
unity as an eigenvalue, then the differential equation has a periodic solution with
period n.
3.8 Write out the special form that the formula of Theorem 3.3 takes in the constant
coefficient situation.
3.9 It is interesting to look at the facts of Theorem 3.1 from the point of view of
Theorem 5.3 of Chapter 1. Assume that S: X_1 → X_0 is surjective and that its null
space N is isomorphic to W under the coordinate (evaluation) map π_{t_0}. Prove that if M
is the null space of π_{t_0} in X_1, then S↾M is an isomorphism onto X_0 by applying this
theorem.
4. THE nTH-ORDER LINEAR EQUATION
The nth-order linear differential equation is the equation
$$d^n\alpha/dt^n = G(t, \alpha, d\alpha/dt, \ldots, d^{n-1}\alpha/dt^{n-1}),$$
where G(t, α) = G(t, α_1, …, α_n) is now linear from V = W^n to W for each t in I.
We convert this to a first-order equation dα/dt = F(t, α) just as before, where
now F is a map from I × V to V that is linear in its second variable α, F(t, α) =
T_t(α).
Our proof of Theorem 1.5 showed that a function f in C^n(I, W) is a solution
of the nth-order equation d^nα/dt^n = G(t, α, …, d^{n−1}α/dt^{n−1}) if and only if
the n-tuple ψf = ⟨f, f′, …, f^{(n−1)}⟩ is a solution of the first-order equation
dα/dt = F(t, α) = T_t(α). We know that the latter solutions form a vector
subspace N of C^1(I, W^n), and since the map ψ: f ↦ ⟨f, f′, …, f^{(n−1)}⟩ is
linear from C^n(I, W) to C^1(I, W^n), it follows that the set Ñ of solutions of the
nth-order equation is a subspace of C^n(I, W) and ψ↾Ñ is an isomorphism from
Ñ to N. Since the coordinate evaluation φ_t = π_t↾N is an isomorphism from N
to W^n for each t (Theorem 3.1), it follows that the map
$$\pi_t \circ \psi : f \mapsto \langle f(t), f'(t), \ldots, f^{(n-1)}(t)\rangle$$
takes Ñ isomorphically to W^n. Its null space M_t is a complement of Ñ in C^n, as
before. Here M_t is the set of functions f in C^n(I, W) such that f(t) = ⋯ =
f^{(n−1)}(t) = 0.
We now consider the special case W = ℝ. For each fixed t, G is now a linear
map from ℝ^n to ℝ, that is, an element of (ℝ^n)*, and its coordinate set with
respect to the standard basis is an n-tuple k = ⟨k_1, …, k_n⟩. Since the
linear map varies continuously with t, the n-tuple k varies continuously with t.
Thus, when we take t into account, we have an n-tuple k(t) = ⟨k_1(t), …, k_n(t)⟩
of continuous real-valued functions on I such that
$$G(t, x_1, \ldots, x_n) = \sum_{i=1}^{n} k_i(t)\,x_i.$$
The solution space N of the nth-order differential equation
$$d^n\alpha/dt^n = G(t, \alpha, \ldots, d^{n-1}\alpha/dt^{n-1})$$
is just the null space of the linear transformation L: C^n(I, ℝ) → C^0(I, ℝ)
defined by
$$(Lf)(t) = f^{(n)}(t) - \sum_{i=1}^{n} k_i(t)\,f^{(i-1)}(t).$$
If we shift indices to coincide with the order of the derivative, and if we let f^{(n)}
also have a coefficient function, then our nth-order linear differential operator L
appears as
$$(Lf)(t) = a_n(t)f^{(n)}(t) + \cdots + a_0(t)f(t).$$
Giving f^{(n)} a coefficient function a_n changes nothing provided a_n(t) is never
zero, since then it can be divided out to give the form we have studied. This is
called the regular case. The singular case, where a_n(t) is zero for some t, requires
further study, and we shall not go into it here.
We recapitulate what our general linear theory tells us about this situation.
Theorem 4.1. L is a surjective linear transformation from the space C^n(I)
of all real-valued functions on I having continuous derivatives through
order n to the space C^0(I) = C(I) of continuous functions on I. Its null
space N is the solution space of our original differential equation. For
each t_0 in I the restriction to N of the mapping φ_{t_0} ∘ ψ: f ↦ ⟨f(t_0), …,
f^{(n−1)}(t_0)⟩ is an isomorphism from N to ℝ^n, and the set M_{t_0} of functions
f in C^n such that f(t_0) = ⋯ = f^{(n−1)}(t_0) = 0 is therefore a complement
of N in C^n(I), and determines a linear right inverse of L.
The practical problem of "solving" the differential equation L(f) = g for f
when g is given falls into two parts. First we have to find the null space N of L,
that is, we have to solve the homogeneous equation L(f) = 0. Since N is an
n-dimensional vector space, the problem of delineating it is equivalent to finding
a basis, and this is clearly the efficient way to proceed. Our first problem there-
fore is to find n linearly independent solutions {u_i}_1^n of L(f) = 0. Our second
problem is to find a right inverse of L, that is, a linear way of picking one f such
that L(f) = g for each g. Here the obvious thing to do is to try to make the
formula of Theorem 3.3 into a practical computation. If v is one solution of
L(f) = g, then of course the set of all solutions is the affine subspace N + v.
We shall start with the first problem, that of finding a basis {u_i}_1^n of solutions
to L(f) = 0. Unfortunately, there is no general method available, and we have
to be content with partial success. We shall see that we can easily solve the
first-order equation directly, and that if we can find one solution of the nth-
order equation, then we can reduce the problem to solving an equation of order
n − 1. Moreover, in the very important special case of an operator L with
constant coefficients, Theorem 3.4 gives a complete explicit solution.
The first-order homogeneous linear equation can be written in the form
y′ + a(t)y = 0, where the coefficient of y′ has been divided out. Dividing by y
and remembering that y′/y = (log y)′, we see that, formally at least, a solution
is given by log y = −∫a(t) dt or $y = e^{-\int a(t)\,dt}$, and we can check it by inspec-
tion. Thus the equation y′ + y/t = 0 has a solution y = e^{−log t} = 1/t, as the
reader might have noticed directly.
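As a quick symbolic cross-check of this formula (an illustrative sketch, assuming the sympy library), one can let a computer algebra system solve the same equation:

```python
# Symbolic check (illustrative, not from the text) of y = exp(-integral of a):
# for a(t) = 1/t this gives y = 1/t.
import sympy as sp

t = sp.symbols('t', positive=True)
y = sp.Function('y')
sol = sp.dsolve(sp.Eq(y(t).diff(t) + y(t)/t, 0), y(t))
print(sol)   # y(t) = C1/t, matching y = e^{-log t} = 1/t up to a constant
```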
Suppose now that L is an nth-order operator and that we know one solution
u of Lf = 0. Our problem then is to find n − 1 solutions v_1, …, v_{n−1} inde-
pendent of each other and of u. It might even be reasonable to guess that these
could be determined as solutions of an equation of order n − 1. We try to find
a second solution v(t) in the form c(t)u(t), where c(t) is an unknown function.
Our motivation, in part, is that such a solution would automatically be inde-
pendent of u unless c(t) turns out to be a constant.
Now if v(t) = c(t)u(t), then v′ = cu′ + c′u, and generally
$$v^{(j)} = \sum_{i=0}^{j} \binom{j}{i} c^{(i)} u^{(j-i)}.$$
If we write down $L(v) = \sum_0^n a_j(t)v^{(j)}(t)$ and collect those terms involving c(t),
we get
$$L(v) = c(t)\sum_{0}^{n} a_j u^{(j)} + \text{terms involving } c', \ldots, c^{(n)} = cL(u) + S(c') = S(c'),$$
where S is a certain linear differential operator of order n − 1 which can be
explicitly computed from the above formulas. We claim that solving S(f) = 0
solves our original problem. For suppose that {g_i}_1^{n−1} is a basis for the null
space of S, and set $c_i(t) = \int^t g_i$. Then L(c_iu) = S(c_i′) = S(g_i) = 0 for i =
1, …, n − 1. Moreover, u, c_1u, …, c_{n−1}u are independent, for if $u = \sum_1^{n-1} k_i c_i u$,
then $1 = \sum_1^{n-1} k_i c_i(t)$ and $0 = \sum_1^{n-1} k_i c_i'(t) = \sum_1^{n-1} k_i g_i(t)$, con-
tradicting the independence of the set {g_i}.
We have thus shown that if we can find one solution u of the nth-order
equation Lf = 0, then its complete solution is reduced to solving an equation
Sf = 0 of order n − 1 (although our independence argument was a little
sketchy).
This reduction procedure does not combine with the solution of the first-
order equation to build up a sequence of independent solutions of the nth-order
equation because, roughly speaking, it "works off the top instead of off the
bottom". For the combination to be successful, we would have to be able to
find from a given nth-order operator a first-order operator S such that N(S) ⊂
N(L), and we can't do this in general. However, we can do it when the coefficient
functions in L are all constants, although we shall in fact proceed differently.
Meanwhile it is valuable to note that a second-order equation Lf = 0 can
be solved completely if we can find one solution u, since the above argument
reduces the remaining problem to a first-order equation which can then be solved
by an integration, as we saw earlier. Consider, for instance, the equation
y″ − 2y/t² = 0 over any interval I not containing 0, so that the coefficient
a_0(t) = −2/t² is continuous on I. We see by inspection that u(t) = t² is one solution. Then we
know that we can find a solution v(t) independent of u(t) in the form v(t) = t²c(t)
and that the problem will become a first-order problem for c′. We have, in fact,
v′ = t²c′ + 2tc and v″ = t²c″ + 4tc′ + 2c, so that L(v) = v″ − 2v/t² =
t²c″ + 4tc′, and L(v) = 0 if and only if (c′)′ + (4/t)c′ = 0. Thus
$$c' = e^{-\int 4\,dt/t} = e^{-4\log t} = 1/t^4, \qquad c = 1/t^3$$
(to within a scalar multiple; we only want a basis!), and v = t²c(t) = 1/t.
(The reader may wish to check that this is the promised solution.) The null
space of the operator L(f) = f″ − 2f/t² is thus the linear span of {t², 1/t}.
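One can confirm this null space symbolically; the following sketch (our own check, assuming sympy) verifies that every combination k₁t² + k₂/t is annihilated by L:

```python
# Symbolic verification (illustrative) that t^2 and 1/t span the null space of
# L(f) = f'' - 2f/t^2, as found by the reduction-of-order computation above.
import sympy as sp

t, k1, k2 = sp.symbols('t k1 k2', positive=True)
f = k1 * t**2 + k2 / t
print(sp.simplify(f.diff(t, 2) - 2*f/t**2))   # 0
```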
We now turn to an important tractable case, the differential operator
$$Lf = a_n f^{(n)} + a_{n-1} f^{(n-1)} + \cdots + a_0 f,$$
where the coefficients a_i are constants and a_n might as well be taken to be 1. What
makes this case accessible is that now L is a polynomial in the derivative operator
D. That is, if Df = f′, so that D^j f = f^{(j)}, then L = p(D), where $p(x) = \sum_0^n a_i x^i$.
The most elegant, but not the most elementary, way to handle this equation
is to go over to the equivalent first-order system dx/dt = T(x) on ℝ^n and to
apply the relevant theory from the last section.
Theorem 4.2. If p(t) = (t − b)^n, then the solution space Ñ of the con-
stant coefficient nth-order equation p(D)f = 0 has the basis
$$\{e^{bt}, te^{bt}, \ldots, t^{n-1}e^{bt}\}.$$
If p(t) is a polynomial which has a relatively prime factorization $p(t) = \prod_1^k p_i(t)$
with each p_i(t) of the above form, then the solution space of the
constant coefficient equation p(D)f = 0 has the basis $\bigcup_1^k B_i$, where B_i is the
above basis for the solution space Ñ_i of p_i(D)f = 0.
Proof. We know that the mapping ψ: f ↦ ⟨f, f′, …, f^{(n−1)}⟩ is an isomor-
phism from the null space Ñ of p(D) to the null space N of dx/dt − T(x). It is
clear that ψ commutes with differentiation, ψ(Df) = ⟨f′, …, f^{(n)}⟩ = Dψ(f),
and since we know that N is invariant under D by Lemma 3.1, it follows (and can
easily be checked directly) that Ñ is invariant under D. By Lemma 3.2 we have
$T = \varphi_t \circ D \circ \varphi_t^{-1}$, which simply says that the isomorphism φ_t: N → ℝ^n takes
the operator D on N into the operator T on ℝ^n. Altogether φ_t ∘ ψ takes D
on Ñ into T on ℝ^n, and since p(D) = 0 on Ñ, it follows that p(T) = 0 on ℝ^n.
We saw in Theorem 3.4 that if p(T) = 0 and p = (t − b)^n, then the
solution space N of dx/dt = T(x) is spanned by vectors of the form $t^je^{bt}\beta_j$.
The first coordinates of the n-tuple-valued functions g in N form the space Ñ
(under the isomorphism f = ψ^{−1}g), and we therefore see that Ñ is spanned by
the functions e^{bt}, …, t^{n−1}e^{bt}. Since Ñ is n-dimensional, and since there are n
of these functions, the spanning set forms a basis.
The remainder of the theorem can be viewed as the combination of the
above and the direct application of Theorem 5.5 of Chapter 1 to the equation
p(D) = 0 on Ñ, or as the carry-over to Ñ under the isomorphism ψ^{−1} of the facts
already established for N in the last section. □
If the roots of the polynomial p are not all real, then we have to resort to
the complexification theory that we developed in the exercises of Section 11,
Chapter 4. Except for one final step, the results are the same. The one extra
fact that has to be applied is that the null space of a real operator T acting on a
real vector space Y is exactly the intersection with Y of the null space of the
complexification S of T acting on the complexification Z = Y ⊕ iY of Y.
This implies that if p(t) is a polynomial with real coefficients, then we get the
real solutions of p(D)f = 0 as the real parts of the complex solutions. In order
to see exactly what this means, suppose that q(x) = (x² − 2bx + c)^m is one of
the relatively prime factors of p(x) over ℝ, with x² − 2bx + c irreducible over ℝ.
Over ℂ, q(x) factors into (x − λ)^m(x − λ̄)^m, where λ = b + iω and ω² = c − b².
It follows from our general theory above that the complex 2m-dimensional
null space of q(D) is the complex span of
$$\{e^{\lambda t},\ te^{\lambda t},\ \ldots,\ t^{m-1}e^{\lambda t},\ e^{\bar\lambda t},\ te^{\bar\lambda t},\ \ldots,\ t^{m-1}e^{\bar\lambda t}\}.$$
The real parts of the complex linear combinations of these 2m functions form a
2m-dimensional real vector space spanned by the real parts of the above functions
and the real parts of i times the above functions. That is, the null space of the
real operator q(D) is a 2m-dimensional real space spanned by
$$\{e^{bt}\cos\omega t,\ te^{bt}\cos\omega t,\ \ldots,\ t^{m-1}e^{bt}\cos\omega t;\ e^{bt}\sin\omega t,\ \ldots,\ t^{m-1}e^{bt}\sin\omega t\}.$$
Since there are 2m of these functions, they must be independent and must form
a basis for the real solution space of q(D)f = 0. Thus,
Theorem 4.3. If p(t) = (t² − 2bt + c)^m and b² < c, then the solution space
of the constant coefficient 2mth-order equation p(D)f = 0 has the basis
$$\{t^i e^{bt}\cos\omega t\}_{i=0}^{m-1} \cup \{t^i e^{bt}\sin\omega t\}_{i=0}^{m-1},$$
where ω² = c − b². For any polynomial p(t) with real coefficients, if $p(t) = \prod_1^k p_i(t)$
is its relatively prime factorization into powers of linear factors and
powers of irreducible quadratic factors, then the solution space Ñ of
p(D)f = 0 has the basis $\bigcup_1^k B_i$, where B_i is the basis for the null space of
p_i(D) that we displayed above if p_i(t) is a power of an irreducible quadratic,
and B_i is the basis of Theorem 4.2 if p_i(t) is a power of a linear factor.
Suppose, for example, that we want to find a basis for the null space of
D⁴ − 1 = 0. Here p(x) = x⁴ − 1 = (x − 1)(x + 1)(x − i)(x + i). The
basis for the complex solution space is therefore {e^t, e^{−t}, e^{it}, e^{−it}}. Since e^{it} =
cos t + i sin t, the basis for the real solution space is {e^t, e^{−t}, cos t, sin t}.
The same problem for D³ − 1 = 0 gives us
$$p(x) = x^3 - 1 = (x-1)(x^2 + x + 1) = (x-1)\Big(x + \frac{1 + i\sqrt{3}}{2}\Big)\Big(x + \frac{1 - i\sqrt{3}}{2}\Big),$$
so that the basis for the complex solution space is
$$\{e^t,\ e^{-[(1+i\sqrt{3})/2]t},\ e^{-[(1-i\sqrt{3})/2]t}\}$$
and the basis for the real solution space is
$$\{e^t,\ e^{-t/2}\cos(\sqrt{3}t/2),\ e^{-t/2}\sin(\sqrt{3}t/2)\}.$$
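The passage from characteristic roots to a real solution basis is mechanical, and the sketch below (illustrative, not from the text; it assumes the roots are simple, as they are here, so no t^i factors appear) automates it for p(x) = x³ − 1:

```python
# Illustrative sketch: building the real solution basis of p(D)f = 0 from the
# roots of the characteristic polynomial, here p(x) = x^3 - 1 (simple roots only).
import numpy as np

roots = np.roots([1, 0, 0, -1])          # roots of x^3 - 1
for r in roots:
    if abs(r.imag) < 1e-12:
        print(f"e^({r.real:.3f} t)")
    elif r.imag > 0:                      # each conjugate pair gives cos and sin
        b, w = r.real, r.imag
        print(f"e^({b:.3f} t) cos({w:.3f} t), e^({b:.3f} t) sin({w:.3f} t)")
# Output corresponds to {e^t, e^{-t/2} cos(sqrt(3)t/2), e^{-t/2} sin(sqrt(3)t/2)}.
```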
*Our results above suggest that the collection 𝒜 of all real-valued
solutions of constant coefficient homogeneous linear differential equations con-
tains the functions t^i, e^{rt}, cos ωt, sin ωt for all i, r, and ω, is closed under
addition and multiplication, and is in fact the algebra generated by these functions.
We can easily prove this conjecture. We first consider sums. Suppose that
T(f) = 0 and that S(g) = 0, where S and T are two such constant coefficient
operators. Then f + g is in the null space of S∘T because S and T commute:
(S∘T)(f + g) = (S∘T)(f) + (S∘T)(g) = S(Tf) + T(Sg) = 0 + 0 = 0. We
know that S and T commute because they are both polynomials in D.
In order to treat products, we first have to recognize that the linear span of
all the trigonometric functions sin at, cos bt is an algebra. In other words, any
finite product of such functions is a linear combination of such functions. This
is the role of a certain class of trigonometric identities, such as 2 sin x cos y =
sin(x + y) + sin(x − y), which the reader has undoubtedly had to struggle
with. (And again the mystery disappears when we are allowed to treat them
as complex exponentials.) Then we observe that any function in the algebra 𝒜 is
a finite sum of terms each of which is of the form t^ie^{rt} sin ωt or t^ie^{rt} cos ωt for
some i, r, and ω. We can exhibit an operator T having such a function in its
null space, and our finite sum of such terms will then be in the null space of the
composition of these operators T by our first argument.
We are tempted to say one more thing. The functions t^i, e^{rt}, sin ωt, cos ωt,
and sums of their products can be shown to be exactly the continuous functions
f: ℝ → ℝ such that the set of translates of f has a finite-dimensional span. That
is, if we define translation through x, K_x, by (K_xf)(t) = f(t − x), then for
exactly the above functions f the linear span of {K_xf : x ∈ ℝ} is finite-dimensional.
This second characterization of exactly the same class of functions cannot be
accidental. Part of the secret lies in the fact that the constant coefficient oper-
ators T are exactly those linear differential operators that commute with trans-
lation. That is, if T is a linear differential operator, then T∘K_x = K_x∘T for
all x if and only if T has constant coefficients. Now we have noted in an early
chapter that if T∘S = S∘T, then the null space of T is invariant under S.
Therefore, the null space N of a constant coefficient operator T is invariant under
all translations: K_x[N] ⊂ N for all x. Now we know that N is finite-dimensional
from our differential equation theory. Therefore, the functions in N are such
that their translates have a finite-dimensional span!
This device of gaining additional information about the null space N of a
linear operator T by finding operators S that commute with T, so that N is
S-invariant, is much used in advanced mathematics. It is especially important
when we have a group of commuting operators S, as we do in the above case with
the operators S = K_x.
What we have not shown is that if a continuous function f is such that its
translates generate a finite-dimensional vector space, then f is in the null space
of some constant coefficient operator p(D). This is delicate, and it depends on
showing that if {K_t} is a one-parameter family of linear transformations on a
finite-dimensional space such that K_{s+t} = K_s∘K_t and K_t → I as t → 0, then
there is an S in Hom V such that K_t = e^{tS}.*
EXERCISES
Find solutions for the following equations.
4.1 x″ − 3x′ + 2x = 0
4.2 x″ + 2x′ − 3x = 0
4.3 x″ + 2x′ + 3x = 0
4.4 x″ + 2x′ + x = 0
4.5 x‴ − 3x″ + 3x′ − x = 0
4.6 x‴ − x = 0
4.7 x⁽⁶⁾ − x″ = 0
4.8 x‴ = 0
4.9 x‴ − x″ = 0
4.10 Solve the initial-value problem x″ + 4x′ − 5x = 0, x(0) = 1, x′(0) = 2.
4.11 Solve the initial-value problem x‴ + x′ = 0, x(0) = 0, x′(0) = −1, x″(0) = 1.
4.12 Find one solution u of the equation 4t²x″ + x = 0 by trying u(t) = t^n, and then
find a second solution as in the text by setting v(t) = c(t)u(t).
4.13 Solve t³x‴ − 3tx′ + 3x = 0 by trying u(t) = t^n.
4.14 Solve tx″ + x′ = 0.
4.15 Solve t(x‴ + x′) + 2(x″ + x) = 0.
4.16 Knowing that e^{−bt} cos ωt and e^{−bt} sin ωt are solutions of a second-order linear
differential equation, and observing that their values at 0 are 1 and 0, we know that
they are independent. Why?
4.17 Find constant coefficient differential equations of which the following functions
are solutions: t², sin t, t² sin t.
4.18 If f and g are independent solutions of a second-order linear differential equation
u″ + a₁u′ + a₂u = 0 with continuous coefficient functions, then we know that the
vectors ⟨f(x), f′(x)⟩ and ⟨g(x), g′(x)⟩ are independent at every point x. Show
conversely that if two functions have this latter property, then they are solutions
of a second-order differential equation.
4.19 Solve the equation (D − a)³f = 0 by applying the order-reducing procedure
discussed in the text starting with the obvious solution e^{at}.
5. SOLVING THE INHOMOGENEOUS EQUATION
We come now to the problem of solving the inhomogeneous equation L(f) = g.
We shall briefly describe a practical method which works easily some of the time
and a theoretical method which works all the time, but which may be hard to
apply. The latter is just the translation of Theorem 3.3 into matrix language.
We first consider the constant coefficient equation L(f) = g in the special
case where g itself is in the null space of a constant coefficient operator S. A
simple example is y′ − ay = e^{bt} (or y′ − ay = sin bt), where g(t) = e^{bt} is in the
null space of S = (D − b). In such a situation a solution f must be in the
null space of S∘L, for S∘L(f) = S(g) = 0. We know what all these functions
are, and our problem is to select f among them such that L(f) is the given g.
For the moment suppose that the polynomials L and S (polynomials in D)
have no factors in common. Then we know that L is an isomorphism on the
null space N_S of S and therefore that there exists an f in N_S such that Lf = g.
Since we have a basis for N_S, we could construct the matrix for the action of L on
N_S and find f by solving a matrix equation, but the simplest thing to do is take
a general linear combination of the basis, with unknown coefficients, let L act
on it, and see what the coefficients must be to give g.
For example, to solve y′ − ay = e^{bt}, we try f(t) = ce^{bt} and apply L:
$$(D - a)(ce^{bt}) = (b - a)ce^{bt},$$
which must equal e^{bt}, and we see that c = 1/(b − a).
Again, to solve y′ − ay = cos bt, we observe that cos bt is in the null space
of S = D² + b² and that this null space has the basis {sin bt, cos bt}. We
therefore set f(t) = c₁ sin bt + c₂ cos bt and solve (D − a)f = cos bt, getting
$$(-ac_1 - bc_2)\sin bt + (bc_1 - ac_2)\cos bt = \cos bt,$$
$$-ac_1 - bc_2 = 0, \qquad bc_1 - ac_2 = 1,$$
and
$$f(t) = \frac{b}{a^2 + b^2}\sin bt - \frac{a}{a^2 + b^2}\cos bt.$$
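A symbolic spot-check of this computation (our own illustration, assuming sympy) confirms that the displayed f satisfies y′ − ay = cos bt:

```python
# Symbolic check (illustrative) of the undetermined-coefficients result above:
# f = b/(a^2+b^2) sin(bt) - a/(a^2+b^2) cos(bt) solves y' - a y = cos(bt).
import sympy as sp

t, a, b = sp.symbols('t a b')
f = (b*sp.sin(b*t) - a*sp.cos(b*t)) / (a**2 + b**2)
print(sp.simplify(f.diff(t) - a*f - sp.cos(b*t)))   # 0
```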
When L and S do have factors in common, the situation is more complicated,
but a similar procedure can be proved to work. Now an extra factor t^i must be
introduced, where i is the number of occurrences of the common factor in L.
For example, in solving (D − r)²f = e^{rt}, we have S∘L = (D − r)³, and so
we must set f(t) = ct²e^{rt}. Our equation then becomes
$$(D - r)^2(ct^2e^{rt}) = 2ce^{rt},$$
which must equal e^{rt}, and so c = ½.
For (D² + 1)f = sin t we have to set f(t) = t(c₁ sin t + c₂ cos t), and after
we work it out we find that c₁ = 0 and c₂ = −½, so that f = −½t cos t.
This procedure, called, naturally, the method of undetermined coefficients, vio-
lates our philosophy about a solution process being a linear right inverse. Indeed,
it is not a single process, applicable to any g occurring on the right, but varies
with the operator S. However, when it is available, it is the easiest way to com-
pute explicit solutions.
We describe next a general theoretical method, called variation of parameters,
that is a right inverse to L and does therefore apply to every g. Moreover, it
inverts the general (variable coefficient) linear nth-order operator L:
$$(Lf)(t) = \sum_{i=0}^{n} a_i(t)f^{(i)}(t).$$
We are assuming that we know the null space N of L; that is, we assume
known n linearly independent solutions {u_i}_1^n of the homogeneous equation
Lf = 0. What we are going to do is to translate into this context our formula
$K_t\int_0^t K_s^{-1}(g(s))\,ds$ for the solution to dα/dt = T_t(α) + g(t). Since
$$\psi : f \mapsto \langle f, f', \ldots, f^{(n-1)}\rangle$$
is an isomorphism from the solution space N of the nth-order equation L(f) = 0
to the solution space of the equivalent first-order system dx/dt = T_t(x), it
follows that if we have a basis {u_i}_1^n for N, then the columns of the matrix
W_{ij} = u_j^{(i−1)} form a basis for the solution space of the system.
Let W(t) be the matrix W_{ij}(t) = u_j^{(i−1)}(t). Since evaluation at t is the isomor-
phism φ_t from the solution space of the system to ℝ^n, the columns of W(t) form a
basis for ℝ^n, for each t. But K_t(α) is the value at t of the solution of dx/dt = T_t(x)
through the initial point ⟨0, α⟩, and it follows that the linear transformation K_t
takes the columns of the matrix W(0) to the corresponding columns of W(t). The
matrix for K_t is therefore W(t)·W(0)^{−1}, and the matrix form of our formula
$$f(t) = K_t\int_0^t K_s^{-1}(g(s))\,ds$$
is therefore
$$f(t) = W(t)\,W(0)^{-1}\int_0^t W(0)\,W(s)^{-1}g(s)\,ds.$$
Moreover, since integration commutes with the application of a constant linear
transformation (here multiplication by a constant matrix), the middle W(0)
factors cancel, and we have the result that
$$f(t) = W(t)\int_0^t W(s)^{-1}g(s)\,ds$$
is the solution of dx/dt = T_t(x) + g(t) which passes through ⟨0, 0⟩. Finally,
set k(s) = W(s)^{−1}g(s), so that this solution formula splits into the pair
$$f(t) = W(t)\int_0^t k(s)\,ds, \qquad W(s)\,k(s) = g(s).$$
Now we want to solve the inhomogeneous nth-order equation L(f) = g, and
this means solving the first-order system with g = ⟨0, …, 0, g⟩. Therefore,
the second equation above is equivalent to
$$\sum_j W_{ij}(s)\,k_j(s) = 0, \quad i < n, \qquad \sum_j W_{nj}(s)\,k_j(s) = g(s).$$
Moreover, the solution f of the nth-order equation is the first component of the
n-tuple solution of the system, and so we end up with
$$f(t) = \sum_{j=1}^{n} W_{1j}(t)\int_0^t k_j(s)\,ds = \sum_{j=1}^{n} u_j(t)\,c_j(t),$$
where c_j(t) is the antiderivative $\int_0^t k_j(s)\,ds$. Any other antiderivative would do
as well, since the difference between the two resulting formulas is of the form
$\sum_j a_ju_j(t)$, a solution of the homogeneous equation L(f) = 0. We have proved
the following theorem.
Theorem 5.1. If {u_i(t)}_1^n is a basis for the solution space of the homogeneous
equation L(h) = 0, and if $f(t) = \sum_1^n c_i(t)u_i(t)$, where the derivatives c_i′(t)
are determined as the solutions of the equations
$$\sum_i c_i'(t)\,u_i^{(j)}(t) = 0, \qquad j = 0, \ldots, n-2,$$
$$\sum_i c_i'(t)\,u_i^{(n-1)}(t) = g(t),$$
then L(f) = g.
We now consider a simple example of this method. The equation y″ + y =
sec x has constant coefficients, and we can therefore easily find the null space of
the homogeneous equation y″ + y = 0. A basis for it is {sin x, cos x}. But we can't
use the method of undetermined coefficients, because sec x is not a solution of a
constant coefficient equation. We therefore try for a solution
$$v(x) = c_1(x)\sin x + c_2(x)\cos x.$$
Our system of equations to be solved is
$$c_1'\sin x + c_2'\cos x = 0,$$
$$c_1'\cos x - c_2'\sin x = \sec x.$$
Thus c₂′ = −c₁′ tan x and c₁′(cos x + sin x tan x) = sec x, giving
$$c_1' = 1, \qquad c_2' = -\tan x,$$
$$c_1 = x, \qquad c_2 = \log\cos x,$$
and
$$v(x) = x\sin x + (\log\cos x)\cos x.$$
(Check it!)
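The check is easily mechanized; the following sketch (illustrative, assuming sympy) verifies that v″ + v − sec x simplifies to zero:

```python
# Symbolic check (illustrative) that v(x) = x sin x + (log cos x) cos x
# solves v'' + v = sec x, as the variation-of-parameters computation asserts.
import sympy as sp

x = sp.symbols('x')
v = x*sp.sin(x) + sp.log(sp.cos(x))*sp.cos(x)
print(sp.simplify(v.diff(x, 2) + v - sp.sec(x)))   # 0
```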
This is all we shall say about the process of finding solutions. In cases where
everything works we have complete control of the solutions of L(f) = g, and
we can then solve the initial-value problem. If L has order n, then we know that
the null space N is n-dimensional, and if for a given g the function v is one
solution of the inhomogeneous equation L(f) = g, then the set of all solutions is
the n-dimensional plane (affine subspace) M = N + v. If we have found a
basis {u_i}_1^n for N, then every solution of L(f) = g is of the form $f = \sum_1^n c_iu_i + v$.
The initial-value problem is the problem of finding f such that L(f) = g and
f(t_0) = α_1^0, f′(t_0) = α_2^0, …, f^{(n−1)}(t_0) = α_n^0, where ⟨α_1^0, …, α_n^0⟩ = α^0 is the
given initial value. We can now find this unique f by using these n conditions
to determine the n coefficients c_i in f = Σc_iu_i + v. We get n equations in the n
unknowns c_i. Our ability to solve this problem uniquely again comes back to the
fact that the matrix W_{ij}(t_0) = u_j^{(i−1)}(t_0) is nonsingular, as did our success in
carrying out the variation of parameters process.
We conclude this section by discussing a very simple and important example.
When a perfectly elastic spring is stretched or compressed, it resists with a
"restoring" force proportional to its deformation. If we picture a coiled spring
lying along the x-axis, with one end fixed and the free end at the origin when
undisturbed (Fig. 6.3), then when the coil is stretched a distance x (compression
being negative stretching), the force it exerts is −cx, where c is a constant rep-
resenting the stiffness, or elasticity, of the spring, and the minus sign shows
that the force is in the direction opposite to the displacement. This is Hooke's
law.
Fig. 6.3
Suppose that we attach a point mass m to the free end of the spring, pull the
spring out to an initial position x₀ = a, and let go. The reader knows perfectly
well that the system will then oscillate, and we want to describe its vibration
explicitly. We disregard the mass of the spring itself (which amounts to ad-
justing m), and for the moment we suppose that friction is zero, so that the
system will oscillate forever. Newton's law says that if the force F is applied to
the mass m, then the particle will accelerate according to the equation
$$m\,\frac{d^2x}{dt^2} = F.$$
Here F = −cx, so the equation combining the laws of Newton and Hooke is
$$m\,\frac{d^2x}{dt^2} + cx = 0.$$
This is almost the simplest constant coefficient equation, and we know that the
general solution is
$$x = c_1\sin\Omega t + c_2\cos\Omega t,$$
where Ω = √(c/m). Our initial condition was that x = a and x′ = 0 when t = 0.
Thus c₂ = a and c₁ = 0, so x = a cos Ωt. The particle oscillates forever
between x = −a and x = a. The maximum displacement a is called the
amplitude A of the oscillation. The number of complete oscillations per unit time
is called the frequency f, so f = Ω/2π = √c/(2π√m). This is the quantitative
expression of the intuitively clear fact that the frequency will increase with the
stiffness c and decrease as the mass m increases. Other initial conditions are
equally reasonable. We might consider the system originally at rest and strike
it, so that we start with an initial velocity v and an initial displacement 0 at
time t = 0. Now c₂ = 0 and x = c₁ sin Ωt. In order to evaluate c₁, we remem-
ber that dx/dt = v at t = 0, and since dx/dt = c₁Ω cos Ωt, we have v = c₁Ω
and c₁ = v/Ω, the amplitude for this motion. In general, the initial condition
would be x = a and x′ = v when t = 0, and the unique solution thus determined
would involve both terms of the general solution, with amplitude to be calculated.
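The explicit solution x = a cos Ωt can be compared against a direct numerical integration of mx″ + cx = 0; in the sketch below (not from the text) the values of m, c, and a are arbitrary sample choices.

```python
# Numerical illustration: integrate m x'' + c x = 0 and compare with the
# closed form x = a cos(Omega t), Omega = sqrt(c/m).  m, c, a are sample values.
import numpy as np
from scipy.integrate import solve_ivp

m, c, a = 2.0, 8.0, 1.5
Omega = np.sqrt(c / m)

sol = solve_ivp(lambda t, y: [y[1], -(c/m)*y[0]],   # y = (x, x')
                (0.0, 10.0), [a, 0.0], dense_output=True, rtol=1e-9, atol=1e-9)
ts = np.linspace(0.0, 10.0, 5)
print(np.allclose(sol.sol(ts)[0], a*np.cos(Omega*ts), atol=1e-6))   # True
```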
The situation is both more realistic and more interesting when friction is
taken into account. Frictional resistance is ideally a force proportional to the
velocity dx/dt but again with a negative sign, since its direction is opposite to
that of the motion. Our new equation is thus
$$m\,\frac{d^2x}{dt^2} + k\,\frac{dx}{dt} + cx = 0,$$
and we know that the system will act in quite different ways depending on the
relationship among the constants m, k, and c. The reader will be asked to explore
these equations further in the exercises.
It is extraordinary that exactly the same equation governs a freely oscillating
electric circuit. It is now written
$$L\,\frac{d^2x}{dt^2} + R\,\frac{dx}{dt} + \frac{x}{C} = 0,$$
where L, R, and C are the inductance, resistance, and capacitance of the circuit,
respectively, and dx/dt is the current. However, the ordinary operation of such
a circuit involves forced rather than free oscillation. An alternating (sinusoidal)
voltage is applied as an extra, external, "force" term, and the equation is now
$$L\,\frac{d^2x}{dt^2} + R\,\frac{dx}{dt} + \frac{x}{C} = a\sin\omega t.$$
This shows the most interesting behavior of all. Using the method of un-
determined coefficients, we find that the solution contains transient terms that
die away, contributed by the homogeneous equation, and a permanent part of
frequency ω/2π, arising from the inhomogeneous term a sin ωt. New phenomena
called phase and resonance now appear, as the reader will discover in the exercises.
EXERCISES
Find particular solutions of the following equations.
5.1 x″ − x = t⁴
5.2 x″ − x = sin t
5.3 x″ − x = sin t + t⁴
5.4 x″ + x = sin t
5.5 y″ − y′ = x² (Here y′ = dy/dx.)
5.6 y″ − y′ = eˣ
5.7 Consider the equation y″ + y = sec x that was solved in the text. To what
interval I must we limit our discussion? Check that the particular solution found in
the text is correct. Solve the initial-value problem for
$$f''(x) + f(x) = \sec x, \qquad f(0) = 1, \qquad f'(0) = -1.$$
Solve the following equations by variation of parameters.
5.8 x″ + x = tan t
5.9 x‴ + x′ = t
5.10 y″ + y = 1
5.11 y⁽⁴⁾ − y = cos x
5.12 y″ + 4y = sec 2x
5.13 y″ + 4y = sec x
5.14 Show that the general solution of the frictionless elastic equation
m(d²x/dt²) + cx = 0 can be rewritten in the form A sin(Ωt − α).
(Remember that sin(x − y) = sin x cos y − cos x sin y.) This type of motion along
a line is called simple harmonic motion.
5.15 In the above exercise express A and α in terms of the initial values dx/dt = v
and x = a when t = 0.
5.16 Consider now the freely vibrating system with friction taken into account, and
therefore having the equation
$$m(d^2x/dt^2) + k(dx/dt) + cx = 0,$$
all coefficients being positive. Show that if k² < 4mc, then the system oscillates forever,
but with amplitude decreasing exponentially. Determine the frequency of oscillation.
Use Exercise 5.14 to simplify the solution, and sketch its graph.
5.17 Show that if the frictional force is sufficiently large (k² ≥ 4mc), then a freely
vibrating system does not in fact vibrate. Taking the simplest case k² = 4mc, sketch
the behavior of the system for the initial condition dx/dt = 0 and x = a when t = 0.
Do the same for the initial condition dx/dt = v and x = 0 when t = 0.
5.18 Use the method of undetermined coefficients to find a particular solution of the
equation of the driven electric circuit
$$L\,\frac{d^2x}{dt^2} + R\,\frac{dx}{dt} + \frac{x}{C} = a\sin\omega t.$$
Assuming that R > 0, show by a general argument that your particular solution is in
fact the steady-state part (the part without exponential decay) of the general solution.
5.19 In the above exercise show that the "current" dx/dt for your solution can be
written in the form
$$\frac{dx}{dt} = \frac{a}{\sqrt{R^2 + X^2}}\,\sin(\omega t - \alpha),$$
where X = Lω − 1/ωC. Here α is called the phase angle.
5.20 Continuing our discussion, show that the current flowing in the circuit will have
a maximum amplitude when the frequency of the "impressed voltage" a sin ωt is
1/2π√(LC). This is the phenomenon of resonance. Show also that the current is in
phase with the impressed voltage (i.e., that α = 0) if and only if X = 0.
5.21 What is the condition that the phase α be approximately 90°? −90°?
5.22 In the theory of a stable equilibrium point in a dynamical system we end up with
two scalar products (ξ, η) and ((ξ, η)) on a finite-dimensional vector space V, the qua-
dratic form q(ξ) = ½((ξ, ξ)) being the potential energy and p(ξ) = ½(ξ, ξ) being the
kinetic energy. Now we know that dq_α(ξ) = ((α, ξ)) and similarly for p, and because of
this fact it can be shown that the Lagrangian equations can be written
$$\Big(\frac{d^2\xi}{dt^2}, \eta\Big) = -((\xi, \eta)).$$
Prove that a basis {β_i}_1^n can be found for V such that this vector equation becomes the
system of second-order equations
$$\frac{d^2x_i}{dt^2} = -\lambda_i x_i, \qquad i = 1, \ldots, n,$$
where the constants λ_i are positive. Show therefore that the motion of the system is the
sum of n linearly independent simple harmonic motions.
6. THE BOUNDARY-VALUE PROBLEM
We now turn to a problem which seems to be like the initial-value problem but
which turns out to be of a wholly different character. Suppose that T is a second-
order operator, which we consider over a closed interval [a, b]. Some of the most
important problems in physics require us to find solutions to T(f) = g such that
f has given values at a and b, instead of f and f′ having given values at a single
point t₀. This new problem is called a boundary-value problem, because {a, b} is
the boundary of the domain I = [a, b]. The boundary-value problem, like the
initial-value problem, breaks neatly into two subproblems if the set
$$M = \{f \in C^2([a, b]) : f(a) = f(b) = 0\}$$
turns out to be a complement of the null space N of T. However, if the reader
will consider this general question for a moment, he will realize that he doesn't
have a clue to it from our initial-value development, and, in fact, wholly new
tools have to be devised.
Our procedure will be to forget that we are trying to solve the boundary-
value problem and instead to speculate on the nature of a linear differential
operator T from the point of view of scalar products and the theory of self-
adjoint operators. That is, our present study of T will be by means of the scalar
product $(f, g) = \int_a^b f(t)g(t)\,dt$, the general problem being the usual one of
solving Tf = g by finding a right inverse S of T. Also, as usual, S may be deter-
mined by finding a complement M of N(T). Now, however, it turns out that if T
is "formally self-adjoint", then suitable choices of M will make the associated
right inverses S self-adjoint and compact, and the eigenvectors of S, computed as
those solutions of the homogeneous equation Tf − λf = 0 which lie in M, then
allow (relatively) the same easy handling of S, by virtue of Theorem 5.1 of
Chapter 5, that they gave us earlier in the finite-dimensional situation.
We first consider the notion of "formal adjoint" for an nth-order linear
differential operator T. The ordinary formula for integration by parts,
$$\int_a^b f'g = fg\Big|_a^b - \int_a^b fg',$$
allows the derivatives of f occurring in the scalar product (Tf, g) to be shifted
one at a time to g. At the end, f is undifferentiated and g is acted on by a certain
nth-order linear differential operator R. The endpoint evaluations, like the
above $fg\big|_a^b$, that accumulate step by step can be described as
$$B(f, g)\Big|_a^b = \sum_{0 \le i+j < n} k_{ij}(x)\,f^{(i)}(x)\,g^{(j)}(x)\Big|_a^b,$$
where the coefficient functions k_{ij}(x) are linear combinations of the coefficient
functions a_i(x) and their derivatives. Thus
$$(Tf, g) = (f, Rg) + B(f, g)\Big|_a^b.$$
The operator R is called the formal adjoint of T, and if R = T, we say that T is
formally self-adjoint.
Every application of the integration by parts formula introduces a sign
change, and the reader may be able to see that the leading coefficient of R is
(−1)^n times the leading coefficient of T. Assuming this, we see that a necessary
condition for formal self-adjointness is that n be even, so that R and T have the
same first terms.
Supposing that T is formally self-adjoint, we seek a complement M of the
null space N of T in C^n([a, b]) with the further property that S, the associated
right inverse of T, is self-adjoint as a mapping from the pre-Hilbert space
C^0([a, b]) to itself. Let us see what this further requirement amounts to. For
any u, v ∈ C^0, set f = Su and g = Sv, so that f and g are in M and u = Tf,
v = Tg. Then (u, Sv) = (Tf, g) = (f, Tg) + B(f, g)|ₐᵇ = (Su, v) + B(f, g)|ₐᵇ.
We thus have:
Lemma 6.1. If T is a formally self-adjoint differential operator and M is a
complement of the null space of T, then the right inverse of T determined
by M is self-adjoint if and only if
$$f, g \in M \;\Rightarrow\; B(f, g)\Big|_a^b = 0.$$
From now on we shall consider only the second-order case. However, almost
everything that we are going to do works perfectly well for the general case, the
price of generality being only additional notational complexity.
We start by computing the formal adjoint of the second-order operator
Tf = c₂f″ + c₁f′ + c₀f. We have
$$(Tf, g) = \int c_2f''g + \int c_1f'g + \int c_0fg,$$
$$\int c_1f'g = c_1fg\Big|_a^b - \int f(c_1g)',$$
$$\int c_2f''g = c_2f'g\Big|_a^b - \int f'(c_2g)' = \big(c_2f'g - f(c_2g)'\big)\Big|_a^b + \int f(c_2g)'',$$
giving
$$(Tf, g) = (f, Rg) + B(f, g)\Big|_a^b, \quad \text{where} \quad Rg = (c_2g)'' - (c_1g)' + c_0g,$$
and
$$B(f, g) = c_2(f'g - g'f) + (c_1 - c_2')fg.$$
Thus Rg = c₂g″ + (2c₂′ − c₁)g′ + (c₂″ − c₁′ + c₀)g, and R = T if and only if
2c₂′ − c₁ = c₁ (and c₂″ − c₁′ = 0), that is, c₂′ = c₁. We have proved:
Lemma 6.2. The second-order differential operator T is formally self-
adjoint if and only if
$$Tf = c_2f'' + c_2'f' + c_0f = (c_2f')' + c_0f,$$
in which case
$$B(f, g) = c_2(f'g - g'f).$$
A constant coefficient operator is thus formally self-adjoint if and only
if c₁ = 0.
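The identity (Tf, g) = (f, Tg) + B(f, g)|ₐᵇ for a formally self-adjoint T can be spot-checked symbolically. In the sketch below (our own illustration, assuming sympy; the choices of c₂, c₀, f, g, and the interval are arbitrary) the difference of the two sides vanishes:

```python
# Symbolic spot-check (illustrative) that for Tf = (c2 f')' + c0 f one has
# (Tf, g) - (f, Tg) = c2 (f'g - g'f) evaluated from a to b.
import sympy as sp

t = sp.symbols('t')
a, b = 1, 2                        # arbitrary interval
c2, c0 = 1 + t**2, sp.sin(t)       # arbitrary smooth coefficients
f, g = t**3, sp.exp(t)             # arbitrary test functions

T = lambda u: sp.diff(c2*sp.diff(u, t), t) + c0*u
lhs = sp.integrate(sp.expand(T(f)*g - f*T(g)), (t, a, b))
B = c2*(sp.diff(f, t)*g - sp.diff(g, t)*f)
rhs = B.subs(t, b) - B.subs(t, a)
print(sp.simplify(lhs - rhs))      # 0
```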
Supposing that T is formally self-adjoint, we now try to find a complement M
of its null space N such that f, g ∈ M ⇒ B(f, g)|ₐᵇ = 0. Since N is two-dimen-
sional, any complement M can be described as the intersection of the null spaces
of two linear functionals l₁ and l₂ on X₁ = C²([a, b]). For example, the "one-
point" complement M_{t₀} that we had earlier in connection with the initial-value
problem is the intersection of the null spaces of the two functionals l₁(f) = f(t₀)
and l₂(f) = f′(t₀). Here, however, the vanishing of l₁ and l₂ for two functions
f and g must imply that B(f, g)|ₐᵇ = c₂(f′g − g′f)|ₐᵇ = 0, and the functionals
l_i(f) must therefore involve the values of f and f′ at a and at b. We would natu-
rally guess, and it can be proved, that each of l₁ and l₂ must be of the form
$$l(f) = k_1f(a) + k_2f'(a) + k_3f(b) + k_4f'(b).$$
Our problem can therefore be restated as follows. We must find two linear
functionals l₁ and l₂ of the above general form such that if M is the intersection
of their null spaces, then
a) M is a complement of N, and
b) f, g ∈ M ⇒ c₂(f′g − g′f)|ₐᵇ = 0,
in which case we call the boundary condition l₁(f) = l₂(f) = 0 self-adjoint.
Lemma 6.3. We can replace (a) by
a′) T is injective on M.
Proof. If T is injective on M, then M ∩ N = {0}, so that the map
$$f \mapsto \langle l_1(f), l_2(f)\rangle$$
is injective on N, and therefore, because N is two-dimensional, is an isomorphism
from N to ℝ² (by the corollary of Theorem 2.4, Chapter 2). Then M is a comple-
ment of N by Theorem 5.3 of Chapter 1. □
Now we can easily write down various pairs l₁ and l₂ that form a self-adjoint
boundary condition. We list some below.
1) f ∈ M ⇔ f(a) = f(b) = 0 [that is, l₁(f) = f(a) and l₂(f) = f(b)].
2) f ∈ M ⇔ f′(a) = f′(b) = 0.
3) More generally, f′(a) = cf(a), f′(b) = c′f(b). (In fact, l₁ can be any l that
depends only on the values at a, and l₂ can be any l that depends only on b.
Thus l₁(f) = k₁f(a) + k₂f′(a), and if l₁(f) = l₁(g) = 0, then the pairs
⟨f(a), f′(a)⟩ and ⟨g(a), g′(a)⟩ are dependent, since both lie in the one-
dimensional null space of l₁, and so f′g − g′f = 0 at a. The same holds for l₂
and b, so that this split pair of endpoint conditions makes B(f, g)|ₐᵇ = 0 by
making the values of B at a and at b separately 0.)
4) If c₂(a) = c₂(b), then take f ∈ M ⇔ f(a) = f(b) and f′(a) = f′(b). That
is, l₁(f) = f(a) − f(b) and l₂(f) = f′(a) − f′(b).
We now show that in every case but (3) the condition (a′) also holds if we
replace T by T − λ for a suitable λ. This is true also for case (3), but the proof is
harder, and we shall omit it.
Lemma 6.4. Suppose that M is defined by one of the self-adjoint boundary
conditions (1), (2), or (4) above, that c₂(t) ≥ m > 0 on [a, b], and that
λ ≥ c₀(t) + 1 on [a, b]. Then
$$|((T - \lambda)f, f)| \ge m\|f'\|_2^2 + \|f\|_2^2$$
for all f ∈ M. In particular, M is a complement of the null space of T − λ
and hence defines a self-adjoint right inverse of T − λ.
Proof. We have
$$((\lambda - T)f, f) = -\int_a^b (c_2f')'f + \int_a^b (\lambda - c_0)f^2 = -c_2f'f\Big|_a^b + \int_a^b c_2(f')^2 + \int_a^b (\lambda - c_0)f^2.$$
Under any of conditions (1), (2), or (4), c₂f′f|ₐᵇ = 0, and the two integral terms
are clearly bounded below by m‖f′‖₂² and ‖f‖₂², respectively. Lemma 6.3 then
implies that M is a complement of the null space of T − λ. □
We come now to our main theorem. It says that the right inverse S of T − λ
determined by the subspace M above is a compact self-adjoint mapping of the
pre-Hilbert space C⁰([a, b]) into itself, and is therefore endowed with all the rich
eigenvalue structures of Theorem 5.1 of the last chapter. First we present some
classical terminology. A Sturm-Liouville system on [a, b] is a formally self-adjoint
second-order differential operator Tf = (c₂f′)′ + c₀f defined over the closed
interval [a, b], together with a self-adjoint boundary condition l₁(f) = l₂(f) = 0
for that interval. If c₂(t) is never zero on [a, b], the system is called regular. If
c₂(a) or c₂(b) is zero, or if the interval [a, b] is replaced by an infinite interval
such as [a, ∞), then the system is called singular.
Theorem 6.1. If T; l₁, l₂ is a regular Sturm-Liouville system on [a, b], with
c₂ positive, then the subspace M defined by the homogeneous boundary
condition is a complement of N(T − λ) if λ is taken sufficiently large, and
the right inverse of T − λ thus determined by M is a compact self-adjoint
mapping of the pre-Hilbert space C⁰([a, b]) into itself.
Proof. The proof depends on the inequality of the above lemma. Since we have
proved this inequality only for boundary conditions (1), (2), and (4), our proof
will be complete only for those cases.
Set g = (T − λ)f. Since ‖g‖₂‖f‖₂ ≥ |((T − λ)f, f)| by the Schwarz
inequality, we see from the lemma first that ‖f‖₂² ≤ ‖g‖₂‖f‖₂, so that
$$\|f\|_2 \le \|g\|_2,$$
and then that m‖f′‖₂² ≤ ‖g‖₂‖f‖₂ ≤ ‖g‖₂², so that
$$\|f'\|_2 \le \|g\|_2/\sqrt{m}.$$
We have already checked that the right inverse S of the formally self-
adjoint T − λ defined by M is self-adjoint, and it remains for us to show that the
set S[U] = {f : ‖g‖₂ ≤ 1} has compact closure. For any such f the Schwarz
inequality and the above inequality imply that
$$|f(y) - f(x)| \le \int_x^y |f'| = \int_x^y |f'|\cdot 1 \le \|f'\|_2\,|y - x|^{1/2} \le \frac{|y - x|^{1/2}}{\sqrt{m}}.$$
Thus S[U] is uniformly equicontinuous. Since the common domain of the
functions in S[U] is the compact set [a, b], we will be able to conclude from
Theorem 6.1 of Chapter 4 that the set S[U] is totally bounded if we can show
that there is a constant C such that all the functions in S[U] have their ranges in
[−C, C]. Taking y and x in the last inequality where |f| assumes its maximum
and minimum values, we have ‖f‖∞ − min |f| ≤ (b − a)^{1/2}/√m. But
(min |f|)(b − a)^{1/2} ≤ ‖f‖₂ ≤ ‖g‖₂ ≤ 1, and therefore
$$\|f\|_\infty \le C = 1/(b - a)^{1/2} + (b - a)^{1/2}/\sqrt{m}.$$
Thus S[U] is a uniformly equicontinuous set of functions mapping the com-
pact set [a, b] into the compact set [−C, C], and is therefore totally bounded in
the uniform norm. Since C([a, b]) is complete in the uniform norm, every sequence
in S[U] has a subsequence uniformly converging to some f ∈ C, and since
‖f‖₂ ≤ (b − a)^{1/2}‖f‖∞, this subsequence also converges to f in the two-norm.
We have thus shown that if H is the pre-Hilbert space C([a, b]) under the
standard scalar product, then the image S[U] of the unit ball U ⊂ H under S has
the property that every sequence in S[U] has a subsequence converging in H.
This is the property we actually used in proving Theorem 5.1 of Chapter 5, but
it is not quite the definition of the compactness of S, which requires us to show
that the closure of S[U] is compact in H. However, if {ξ_n} is any sequence in this
closure, then we can choose {s_n} in S[U] so that ‖ξ_n − s_n‖ < 1/n. The se-
quence {s_n} has a convergent subsequence {s_{n(m)}}_m as above, and then {ξ_{n(m)}}_m
converges to the same limit. Thus S is a compact operator. □
Theorem 6.2. There exists an orthonormal sequence {φ_n} consisting entirely
of eigenvectors of T and forming a basis for M. Moreover, the Fourier
expansion of any f ∈ M with respect to the basis {φ_n} converges uniformly
to f (as well as in the two-norm).
Proof. By Theorem 5.1 of Chapter 5 there exists an eigenbasis for the range of S,
which is M. Since Sφ_n = r_nφ_n for some nonzero r_n, we have (T − λ)(r_nφ_n) =
φ_n and Tφ_n = ((1 + λr_n)/r_n)φ_n. The uniformity of the series convergence
comes out of the following general consideration.
Lemma 6.5. Suppose that T is a self-adjoint operator on a pre-Hilbert
space V and that T is compact as a mapping from V to ⟨V, q⟩, where q is a
second norm on V that dominates the scalar product norm p (q ≥ cp).
Then T is compact (from p to p), and the eigenbasis expansion Σb_nφ_n of an
element β in the range of T converges to β in both norms.
Proof. Let U be the unit ball of V in the scalar product norm. By the hypothesis
of the lemma, the q-closure B of T[U] is compact. B is then also p-compact, for
any sequence in it has a q-convergent subsequence which also p-converges to the
same limit, because p ≤ q/c. We can therefore apply the eigenbasis theorem.
Now let α and β = T(α) have the Fourier series Σa_iφ_i and Σb_iφ_i, and let
T(φ_i) = r_iφ_i. Then b_i = r_ia_i, because b_i = (T(α), φ_i) = (α, T(φ_i)) =
(α, r_iφ_i) = r_i(α, φ_i) = r_ia_i. Since the sequence of partial sums $\sum_1^n a_i\varphi_i$ is
p-bounded (Bessel's inequality), the sequence $\{\sum_1^n b_i\varphi_i\} = \{T(\sum_1^n a_i\varphi_i)\}$ is
totally q-bounded. Any subsequence of it therefore has a subsubsequence
q-converging to some element γ in V. Since it then p-converges to γ, γ must be β.
Thus every subsequence has a subsubsequence q-converging to β, and so
$\{\sum_1^n b_i\varphi_i\}$ itself q-converges to β by Lemma 4.1 of Chapter 4. □
EXERCISES
6.1 Given that Tf(x) = xf″(x) + f(x) and Sf(x) = f′(x), compute T∘S and S∘T.
6.2 Show that the differential operators T = aD and S = bD commute if and
only if the functions a(x) and b(x) are proportional.
6.3 Show that the differential operators T = aD² and S = bD commute if and
only if b(x) is a first-degree polynomial b(x) = cx + d and a(x) = k(b(x))².
6.4 Compute the formal adjoint S of T if
a) Tf = f′, b) Tf = f″, c) Tf = f‴, d) (Tf)(x) = xf′(x),
e) (Tf)(x) = x³f″(x).
6.5 Let S and T be linear differential operators of orders m and n, respectively.
What are the coefficient conditions for S∘T to be a linear differential operator of
order m + n?
6.6 Let T be the second-order linear differential operator
$$(Tf)(t) = a_2(t)f''(t) + a_1(t)f'(t) + a_0(t)f(t).$$
What are the conditions on its coefficient functions for its formal adjoint to exist?
What are these conditions for T of order n?
6.7 Let S and T be linear differential operators of order m and n, respectively, and
suppose that all coefficients are C^∞-functions (infinitely differentiable). Prove that
S∘T − T∘S is of order ≤ m + n − 1.
6.8 A δ-blip is a continuous nonnegative function φ such
that φ = 0 outside of [−δ, δ] and $\int_{-\delta}^{\delta}\varphi = 1$ (Fig. 6.4). We
assume that there exists an infinitely differentiable 1-blip φ.
Show that there exists an infinitely differentiable δ-blip for
every δ > 0. Define what you would mean by a δ-blip centered
at x, and show that one exists.
Fig. 6.4
6.9 Let f be a continuous function on [a, b] such that $(f, g) = \int_a^b fg = 0$ whenever
g is an infinitely differentiable function which vanishes near a and b. Show that f = 0.
(Use the above exercise.)
6.10 Let C^∞([a, b]) be the vector space of infinitely differentiable functions on [a, b],
and let T be a second-order linear differential operator with coefficients in C^∞:
$$(Tf)(t) = a_2(t)f''(t) + a_1(t)f'(t) + a_0(t)f(t).$$
Let S be a linear operator on C^∞([a, b]) such that
$$(Tf, g) - (f, Sg) = K(f, g)$$
is a bilinear functional depending only on the values of f, g, f′, and g′ at a and b.
Prove that S is the formal adjoint of T. [Hint: Take f to be a δ-blip centered at x.
Then K(f, g) = 0. Now try to work the assertion to be proved into a form to which
the above exercise can be applied.]
6.11 Prove an nth-order generalization of the above exercise.
6.12 Let X be the space of linear differential operators with C^∞-coefficients, and let A_T
be the formal adjoint of T. Prove that T ↦ A_T is an isomorphism from X to X.
Prove that A_{T∘S} = A_S ∘ A_T.
7. FOURIER SERIES
There are not many regular Sturm-Liouville systems whose associated ortho-
normal eigenbases have proved to be important in actual calculations. Most
orthonormal bases that are used, such as those due to Bessel, Legendre, Hermite,
and Laguerre, arise from singular Sturm-Liouville systems and are therefore
beyond the limitations we have set for this discussion. However, the most well-
known example, Fourier series, is available to us.
We shall consider the constant coefficient operator Tf = D²f, which is clearly
both formally self-adjoint and regular, and either the boundary condition
f(0) = f(π) = 0 on [0, π] (type 1) or the periodic boundary condition f(−π) =
f(π), f′(−π) = f′(π) on [−π, π] (type 4).
To solve the first problem, we have to find the solutions of f″ − λf = 0
which satisfy f(0) = f(π) = 0. If λ > 0, then we know that the two-dimen-
sional solution space is spanned by {e^{rx}, e^{−rx}}, where r = λ^{1/2}. But if c₁e^{rx} +
c₂e^{−rx} is 0 at both 0 and π, then c₁ = c₂ = 0 (because the pairs ⟨1, 1⟩ and
⟨e^{rπ}, e^{−rπ}⟩ are independent). Therefore, there are no solutions satisfying the
boundary conditions when λ > 0. If λ = 0, then f(x) = c₁x + c₀ and again
c₁ = c₀ = 0.
If λ < 0, then the solution space is spanned by {sin rx, cos rx}, where
r = (−λ)^{1/2}. Now if c₁ sin rx + c₂ cos rx is 0 at x = 0 and x = π, we get,
first, that c₂ = 0 and, second, that rπ = nπ for some integer n. Thus the
eigenfunctions for the first system form the set {sin nx}₁^∞, and the corresponding
eigenvalues of D² are {−n²}₁^∞.
At the end of this section we shall prove that the functions in C²([a, b])
that are zero near a and b are dense in C([a, b]) in the two-norm. Assuming this,
it follows from Theorem 2.3 of Chapter 5 that a basis for M is a basis for C⁰, and
we now have the following corollary of the Sturm-Liouville theorem.
Theorem 7.1. The sequence {sin nx}₁^∞ is an orthogonal basis for the pre-
Hilbert space C⁰([0, π]). If f ∈ C²([0, π]) and f(0) = f(π) = 0, then the
Fourier series for f converges uniformly to f.
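Numerically, both the coefficients and the uniform convergence asserted by Theorem 7.1 are easy to observe; the sketch below (illustrative, not from the text) uses the sample function f(x) = x(π − x), which is C² and vanishes at 0 and π.

```python
# Numerical illustration of Theorem 7.1: Fourier sine coefficients of
# f(x) = x(pi - x) and the uniform error of the partial sums.
# The function and truncation orders are arbitrary sample choices.
import numpy as np
from scipy.integrate import quad

f = lambda x: x*(np.pi - x)
b = [(2/np.pi)*quad(lambda x, n=n: f(x)*np.sin(n*x), 0, np.pi)[0]
     for n in range(1, 40)]   # b_n = (2/pi) integral of f sin(nx)

xs = np.linspace(0, np.pi, 1000)
for N in (3, 10, 39):
    partial = sum(b[n-1]*np.sin(n*xs) for n in range(1, N+1))
    print(N, np.max(np.abs(partial - f(xs))))   # uniform error shrinks with N
```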
We now consider the second boundary problem. The computations are a
little more complicated, but again if f(x) = c₁e^{rx} + c₂e^{−rx}, and if f(−π) = f(π)
and f′(−π) = f′(π), then f = 0. For now we have
$$c_1e^{-r\pi} + c_2e^{r\pi} = c_1e^{r\pi} + c_2e^{-r\pi}, \qquad c_1e^{-r\pi} - c_2e^{r\pi} = c_1e^{r\pi} - c_2e^{-r\pi},$$
giving c₁(e^{rπ} − e^{−rπ}) = 0, and so c₁ = 0; similarly c₂ = 0. Again f(x) = c₁x + c₀ is ruled out.
Finally, if f(x) = c₁ sin rx + c₂ cos rx, our boundary conditions become
$$2c_1\sin r\pi = 0 \quad \text{and} \quad 2rc_2\sin r\pi = 0,$$
so that again r = n, but this time the full solution space of (D² + n²)f = 0
satisfies the boundary condition.
Theorem 7.2. The set {sin nx}₁^∞ ∪ {cos nx}₀^∞ forms an orthogonal basis for
the pre-Hilbert space C⁰([−π, π]). If f ∈ C²([−π, π]) and f(−π) = f(π),
f′(−π) = f′(π), then the Fourier series for f converges to f uniformly on
[−π, π].
Remaining proof. This theorem follows from our general Sturm-Liouville dis-
cussion except for the orthogonality of sin nx and cos nx. We have
$$(\sin nx, \cos nx) = \int_{-\pi}^{\pi} \sin nt\,\cos nt\,dt = \tfrac12\int_{-\pi}^{\pi} \sin 2nt\,dt = -(1/4n)\cos 2nt\Big|_{-\pi}^{\pi} = 0.$$
Or we can simply remark that the first integrand is an odd function and therefore
its integral over any symmetric interval [−a, a] is necessarily zero.
The orthogonality of eigenvectors having different eigenvalues follows of
course, as in the proof of Theorem 3.1 of Chapter 5. □
Finally, we prove the density theorem we needed above. There are very
slick ways of doing this, but they require more machinery than we have avail-
able, and rather than taking the time to make the machines, we shall prove the
theorem with our bare hands.
It is standard notation to let a subscript zero on a symbol denoting a class
of functions pick out those functions in the class that are zero "on the boundary"
in some suitable sense. Here C₀([a, b]) will denote the functions in C([a, b]) that
are zero in neighborhoods of a and b, and similarly for C₀²([a, b]).
Theorem 7.3. C²([a, b]) is dense in C([a, b]) in the uniform norm, and
C₀²([a, b]) is dense in C([a, b]) in the two-norm.
Proof. We first approximate f ∈ C([a, b]) to within ε by a piecewise linear
function g by drawing straight line segments between the adjacent points on the
graph of f lying over a subdivision a = x₀ < x₁ < ⋯ < xₙ = b of [a, b].
If f varies by less than ε on each interval (x_{i−1}, x_i), then ‖f − g‖∞ ≤ ε. Now
g′(t) is a step function which is constant on the intervals of the above sub-
division. We now alter g′(t) slightly near each jump in such a way that the new
function h(t) is continuous there. If we do it as sketched in Fig. 6.5, the total
Fig. 6.5    Fig. 6.6
integral error at the jump is zero, ∫(h − g′) = 0 across the altered interval, and the
maximum error |∫ₐˣ(h − g′)| is δη/4, where η is the height of the jump. This will
be less than ε if we take δ = ε/‖g′‖∞, since η ≤ 2‖g′‖∞.
We now have a continuous function h such that |∫ₐˣ h(t) dt − (f(x) − f(a))| < 2ε.
In other words, we have approximated f uniformly by a continuously differentiable function.
Now choose g and h in C¹([a, b]) so that first ‖f − g‖∞ < ε/2 and then
‖g′ − h‖∞ < ε/2(b − a). Then

|g(x) − g(a) − ∫ₐˣ h| < ε/2,

and so H(x) = ∫ₐˣ h + g(a) is a twice continuously differentiable function such
that ‖f − H‖∞ < ε. In other words, C²([a, b]) is dense in C([a, b]) in the
uniform norm. It is then also dense in the two-norm, since

‖f‖₂ = (∫ₐᵇ f²)^{1/2} ≤ ‖f‖∞ (∫ₐᵇ 1)^{1/2} = (b − a)^{1/2} ‖f‖∞.
But now we can do something which we couldn't do for the uniform norm:
we can alter the approximating function to one that is zero on neighborhoods of a
and b, and keep the two-norm approximation good. Given δ, let e(t) be a nonnegative
function on [a, b] such that e(t) = 1 on [a + 2δ, b − 2δ], e(t) = 0 on
[a, a + δ] and on [b − δ, b], e″ is continuous, and ‖e‖∞ = 1. Such an e(t) clearly
exists, since we can draw it. We leave it as an interesting exercise to actually
define e(t). Here is a hint: Show somehow that there is a fifth-degree polynomial
p(t) having a graph between 0 and 1 as shown in Fig. 6.6, with a zero second
derivative at 0 and at 1, and then use a piece of the graph, suitably translated,
compressed, rotated, etc., to help patch together e(t).

Anyway, then ‖g − eg‖₂ ≤ ‖g‖∞(4δ)^{1/2} for any g in C([a, b]), and if g has
continuous derivatives up to order 2, then so does eg. Thus, if we start with f in C
and approximate it by g in C², and then approximate g by eg, we have altogether
the second approximation of the theorem. □
EXERCISES

7.1 Convert the orthogonal basis {sin nx}₁^∞ for the pre-Hilbert space C([0, π]) to an orthonormal basis.

7.2 Do the same for the orthogonal basis {sin nx}₁^∞ ∪ {cos nx}₀^∞ for C([−π, π]).

7.3 Show that {sin nx}₁^∞ is an orthogonal basis for the vector space V of all odd continuous functions on [−π, π]. (Be clever. Do not calculate from scratch.) Normalize the above basis.

7.4 State and prove the corresponding theorem for the even functions on [−π, π].

7.5 Prove that the derivative of an odd function is even, and conversely.
7.6 We now want to prove the following stronger theorem about the uniform convergence of Fourier series.

Theorem. Let f have a continuous derivative on [−π, π], and suppose that f(−π) = f(π). Then the Fourier series for f converges to f uniformly.

Assume for convenience that f is even. (This only cuts down the number of calculations.) Show first that the Fourier series for f′ is the series obtained from the Fourier series for f by term-by-term differentiation. Apply the above exercises here. Next show from the two-norm convergence of its Fourier series to f′ and the Schwarz inequality that the Fourier series for f converges uniformly.
7.7 Prove that {cos nx}₀^∞ is an orthogonal basis for the space M of C²-functions on [0, π] such that f′(0) = f′(π) = 0.
7.8 Find a fifth-degree polynomial p(x) such that

p(0) = p′(0) = p″(0) = 0, p′(1) = p″(1) = 0, p(1) = 1.

(Forget the last condition until the end.) Sketch the graph of p.
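As a sketch of how such a p can be computed (an added illustration, not in the text): writing p(x) = ax³ + bx⁴ + cx⁵ makes the three conditions at 0 automatic, and the three conditions at 1 become a linear system for a, b, c.

    import numpy as np

    # p(x) = a x^3 + b x^4 + c x^5 already has p(0) = p'(0) = p''(0) = 0.
    # Impose p(1) = 1, p'(1) = 0, p''(1) = 0 and solve for a, b, c.
    A = np.array([[1.0,  1.0,  1.0],    # p(1)   = a +  b +  c
                  [3.0,  4.0,  5.0],    # p'(1)  = 3a + 4b + 5c
                  [6.0, 12.0, 20.0]])   # p''(1) = 6a + 12b + 20c
    a, b, c = np.linalg.solve(A, np.array([1.0, 0.0, 0.0]))
    print(a, b, c)   # 10.0 -15.0 6.0, i.e., p(x) = 10x^3 - 15x^4 + 6x^5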
7.9 Use a "piece" of the above polynomial p to construct a function e(x) such that e′ and e″ exist and are continuous, e(x) = 0 when x < a + δ and x > b − δ, e(x) = 1 on [a + 2δ, b − 2δ], and ‖e‖∞ = 1.
7.10 Prove the Weierstrass theorem given below on [0, π] in the following steps. We know that f can be uniformly approximated by a C²-function g.

1) Show that constants c and d can be found such that g(t) − ct − d is 0 at 0 and π.

2) Use the Fourier series expansion of this function and the Maclaurin series for the functions sin nx to show that the polynomial p(x) can be found.

Theorem (The Weierstrass approximation theorem). The polynomials are dense in C([a, b]) in the uniform norm. That is, given any continuous function f on [a, b] and any ε, there is a polynomial p such that |f(x) − p(x)| < ε for all x in [a, b].
CHAPTER 7
MULTILINEAR FUNCTIONALS
This chapter is principally for reference. Although most of the proofs will be
included, the reader is not expected to study them. Our goal is a collection of
basic theorems about alternating multilinear functionals, or exterior forms, and
the determinant function is one of our rewards.
1. BILINEAR FUNCTIONALS
We have already studied various aspects of bilinear functionals. We looked at
their duality implications in Section 6, Chapter 1, we considered the "canonical
forms" of symmetric bilinear functionals and their equivalent quadratic forms
in Section 7, Chapter 2, and, of course, the whole scalar product theory of
Chapter 5 is the theory of a still more special kind of bilinear functional. In this
chapter we shall restrict ourselves to bilinear and multilinear functionals over
finite-dimensional spaces, and our concerns are purely algebraic.
We begin with some material related to our earlier algebra. If V and W are
finite-dimensional vector spaces, then the set of all bilinear functionals on
V × W is pretty clearly a vector space. We designate it V* ⊗ W* and call it
the tensor product of V* and W*. Our first theorem simply states something
that was implicit in Theorem 6.1 of Chapter 1.
Theorem 1.1. The vector spaces V* ⊗ W*, Hom(V, W*), and Hom(W, V*) are naturally isomorphic.

Proof. We saw in Theorem 6.1 of Chapter 1 that each f in V* ⊗ W* determines
a linear mapping α ↦ f_α from W to V*, where f_α(ξ) = f(ξ, α), and we also noted
that this correspondence from V* ⊗ W* to Hom(W, V*) is bijective. All that
the present theorem adds is that this bijective correspondence is linear and so
constitutes a natural isomorphism, as does the similar one from V* ⊗ W* to
Hom(V, W*). To see this, let f_T be the bilinear functional corresponding to T in
Hom(V, W*). Then f_{T+S} = f_T + f_S, for f_{T+S}(α, β) = ((T + S)(α))(β) =
(T(α) + S(α))(β) = (T(α))(β) + (S(α))(β) = f_T(α, β) + f_S(α, β). We can
do the same for homogeneity.

The isomorphism of V* ⊗ W* with Hom(W, V*) follows in exactly the
same way by reversing the roles of the variables. We are thus finished with the
proof. □
Before looking for bases in V* ⊗ W*, we define a bilinear functional γ ⊗ λ
for any two functionals γ ∈ V* and λ ∈ W* by (γ ⊗ λ)(ξ, η) = γ(ξ)λ(η).
We call γ ⊗ λ the tensor product of the functionals γ and λ and call any bilinear
functional having this form elementary. It is not too hard to see that f ∈ V* ⊗ W*
is elementary if and only if the corresponding T ∈ Hom(V, W*) is a dyad.

If V and W are finite-dimensional, with dimensions m and n, respectively,
then the above isomorphism of V* ⊗ W* with Hom(V, W*) shows that the
dimension of V* ⊗ W* is mn. We now describe the basis determined by given
bases in V and W.
Theorem 1.2. Let {α_i}₁^m and {β_j}₁^n be any bases for V and W, and let their dual bases in V* and W* be {μ_i}₁^m and {ν_j}₁^n. Then the mn elementary bilinear functionals {μ_i ⊗ ν_j} form the corresponding basis for V* ⊗ W*.

Proof. Since (μ_i ⊗ ν_j)(ξ, η) = μ_i(ξ)ν_j(η) = x_i y_j, the matrix expansion f(ξ, η) =
Σ_{i,j} t_{ij} x_i y_j becomes f(ξ, η) = Σ_{i,j} t_{ij}(μ_i ⊗ ν_j)(ξ, η) or

f = Σ_{i,j} t_{ij}(μ_i ⊗ ν_j).

The set {μ_i ⊗ ν_j} thus spans V* ⊗ W*. Since it contains the same number of
elements (mn) as the dimension of V* ⊗ W*, it forms a basis. □
Of course, independence can also be checked directly. If Σ_{i,j} t_{ij}(μ_i ⊗ ν_j) =
0, then for every pair ⟨k, l⟩, t_{kl} = Σ_{i,j} t_{ij}(μ_i ⊗ ν_j)(α_k, β_l) = 0.

We should also remark that this theorem is entirely equivalent to our
discussion of the basis for Hom(V, W) at the end of Section 4, Chapter 2.
2. MULTILINEAR FUNCTIONALS
All the above considerations generalize to multilinear functionals

f : V₁ × ⋯ × Vₙ → ℝ.

We change notation, just as we do in replacing the traditional ⟨x, y⟩ ∈ ℝ² by
x = ⟨x₁, …, xₙ⟩ ∈ ℝⁿ. Thus we write f(α₁, …, αₙ) = f(α), where
α = ⟨α₁, …, αₙ⟩ ∈ V₁ × ⋯ × Vₙ. Our requirement now is that f(α₁, …, αₙ)
be a linear functional of α_i when α_j is held fixed for all j ≠ i. The set of all such
functionals is a vector space, called the tensor product of the dual spaces
V₁*, …, Vₙ*, and is designated V₁* ⊗ ⋯ ⊗ Vₙ*.
As before, there are natural isomorphisms between these tensor product
spaces and various Hom spaces. For example, V₁* ⊗ ⋯ ⊗ Vₙ* and
Hom(V₁, V₂* ⊗ ⋯ ⊗ Vₙ*) are naturally isomorphic. Also, there are additional
isomorphisms of a variety not encountered in the bilinear case. However, it will
not be necessary for us to look into these questions.
We define elementary multilinear functionals as before. If λ_i ∈ V_i*, i =
1, …, n, and ξ = ⟨ξ₁, …, ξₙ⟩, then

(λ₁ ⊗ ⋯ ⊗ λₙ)(ξ) = λ₁(ξ₁) ⋯ λₙ(ξₙ).

To keep our notation as simple as possible, and also because it is the case of
most interest to us, we shall consider the question of bases only when V₁ =
V₂ = ⋯ = Vₙ = V. In this case (V*)^{⊗n} = V* ⊗ ⋯ ⊗ V* (n factors) is
called the space of covariant tensors of order n (over V).
If {α_j}₁^m is a basis for V and f ∈ (V*)^{⊗n}, then we can expand the value
f(ξ) = f(ξ₁, …, ξₙ) with respect to the basis expansions of the vectors ξ_i just
as we did when f was bilinear, but now the result is notationally more complex.
If we set ξ_i = Σ_{j=1}^m x_j^i α_j for i = 1, …, n (so that the coordinate set of ξ_i is
x^i = {x_j^i}_j) and use the linearity of f(ξ₁, …, ξₙ) in its separate variables one
variable at a time, we get

f(ξ₁, …, ξₙ) = Σ x_{p₁}^1 x_{p₂}^2 ⋯ x_{pₙ}^n f(α_{p₁}, α_{p₂}, …, α_{pₙ}),

where the sum is taken over all n-tuples p = ⟨p₁, …, pₙ⟩ such that 1 ≤ p_i ≤
m for each i from 1 to n. The set of all these n-tuples is just the set of all functions
from {1, …, n} to {1, …, m}. We have designated this set m^n, using the
notation n = {1, …, n}, and the scope of the above sum can thus be indicated
in the formula as follows:

f(ξ₁, …, ξₙ) = Σ_{p∈m^n} x_{p₁}^1 ⋯ x_{pₙ}^n f(α_{p₁}, …, α_{pₙ}).
A strict proof of this formula would require an induction on n, and is left to the
interested reader. At the inductive step he will have to rewrite a double sum
Σ_{p∈m^n} Σ_{j∈m} as the single sum Σ_{q∈m^{n+1}}, using the fact that an ordered pair
⟨p, j⟩ in m^n × m is equivalent to an (n + 1)-tuple q ∈ m^{n+1}, where q_i = p_i
for i = 1, …, n and q_{n+1} = j.
If {μ_j}₁^m is the dual basis for V* and q ∈ m^n, let μ_q be the elementary
functional μ_{q₁} ⊗ ⋯ ⊗ μ_{qₙ}. Thus μ_q(α_{p₁}, …, α_{pₙ}) = ∏₁ⁿ μ_{q_i}(α_{p_i}) = 0 unless p = q,
in which case its value is 1. More generally,

μ_q(ξ₁, …, ξₙ) = μ_{q₁}(ξ₁) ⋯ μ_{qₙ}(ξₙ) = x_{q₁}^1 ⋯ x_{qₙ}^n.

Therefore, if we set c_q = f(α_{q₁}, …, α_{qₙ}), the general expansion now appears as

f(ξ₁, …, ξₙ) = Σ_{p∈m^n} c_p μ_p(ξ₁, …, ξₙ),

or f = Σ c_p μ_p, which is the same formula we obtained in the bilinear case, but
with more sophisticated notation. The functionals {μ_p : p ∈ m^n} thus span
(V*)^{⊗n}. They are also independent. For, if Σ c_p μ_p = 0, then for each q,
c_q = Σ c_p μ_p(α_{q₁}, …, α_{qₙ}) = 0. We have proved the following theorem.
Theorem 2.1. The set {μ_p : p ∈ m^n} is a basis for (V*)^{⊗n}. For any f in (V*)^{⊗n} its coordinate function {c_p} is defined by c_p = f(α_{p₁}, …, α_{pₙ}). Thus f = Σ c_p μ_p and f(ξ₁, …, ξₙ) = Σ c_p μ_p(ξ₁, …, ξₙ) = Σ c_p x_{p₁}^1 ⋯ x_{pₙ}^n for any f ∈ (V*)^{⊗n} and any ⟨ξ₁, …, ξₙ⟩ ∈ Vⁿ.

Corollary. The dimension of (V*)^{⊗n} is mⁿ.

Proof. There are mⁿ functions in m^n, so the basis {μ_p : p ∈ m^n} has mⁿ elements. □
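A small numerical sketch of Theorem 2.1 (an added illustration), identifying V with ℝ^m so that the dual-basis functional μ_j picks out the jth coordinate:

    import numpy as np
    from itertools import product

    m, n = 3, 2
    f = np.random.rand(*([m] * n))     # coordinates c_p = f(alpha_p1, ..., alpha_pn)

    def mu(q, xs):
        # elementary functional mu_q1 (x) ... (x) mu_qn applied to the vectors xs
        out = 1.0
        for xi, qi in zip(xs, q):
            out *= xi[qi]
        return out

    xs = [np.random.rand(m) for _ in range(n)]
    expansion = sum(f[p] * mu(p, xs) for p in product(range(m), repeat=n))
    print(np.isclose(expansion, xs[0] @ f @ xs[1]))   # True: f = sum_p c_p mu_p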
3. PERMUTATIONS
A permutation on a set S is a bijection f : S → S. If 𝒮 = 𝒮(S) is the set of all
permutations on S, then 𝒮 is closed under composition (σ, ρ ∈ 𝒮 ⇒ σ∘ρ ∈ 𝒮)
and inversion (σ ∈ 𝒮 ⇒ σ⁻¹ ∈ 𝒮). Also, the identity map I is in 𝒮, and, of course,
the composition operation is associative. Together these statements say exactly
that 𝒮 is a group under composition. The simplest kind of permutation other
than I is one which interchanges a pair of elements of S and leaves every other
element fixed. Such a permutation is called a transposition.

We now take S to be the finite set n = {1, …, n} and set Sₙ = 𝒮(n).
It is not hard to see that then any permutation can be expressed as a product of
transpositions, and in more than one way.

A more elementary fact that we shall need is that if ρ is a fixed element of Sₙ,
then the mapping σ ↦ σ∘ρ is a bijection Sₙ → Sₙ. It is surjective because
any σ′ can be written σ′ = (σ′∘ρ⁻¹)∘ρ, and it is injective because σ₁∘ρ =
σ₂∘ρ ⇒ (σ₁∘ρ)∘ρ⁻¹ = (σ₂∘ρ)∘ρ⁻¹ ⇒ σ₁ = σ₂. Similarly, the mapping
σ ↦ ρ∘σ (ρ fixed) is bijective.
We also need the fact that there are n! elements in Sₙ. This is the elementary
count from secondary school algebra. In defining an element σ ∈ Sₙ,
σ(1) can be chosen in n ways. For each of these choices σ(2) can be chosen in
n − 1 ways, so that ⟨σ(1), σ(2)⟩ can be chosen in n(n − 1) ways. For each of
these choices σ(3) can be chosen in n − 2 ways, etc. Altogether σ can be chosen
in n(n − 1)(n − 2) ⋯ 1 = n! ways.

In the sequel we shall often write 'ρσ' instead of 'ρ∘σ', just as we occasionally
wrote 'ST' instead of 'S∘T' for the composition of linear maps.

If ξ = ⟨ξ₁, …, ξₙ⟩ ∈ Vⁿ and σ ∈ Sₙ, then we can "apply σ to ξ", or "permute
the elements of ⟨ξ₁, …, ξₙ⟩ through σ". We mean, of course, that
we can replace ⟨ξ₁, …, ξₙ⟩ by ⟨ξ_{σ(1)}, …, ξ_{σ(n)}⟩, that is, we can replace ξ
by ξ∘σ.
Permuting the variables changes a functional f ∈ (V*)^{⊗n} into a new such
functional. Specifically, given f ∈ (V*)^{⊗n} and σ ∈ Sₙ, we define f^σ by

f^σ(ξ) = f(ξ∘σ⁻¹) = f(ξ_{σ⁻¹(1)}, …, ξ_{σ⁻¹(n)}).

The reason for using σ⁻¹ instead of σ is, in part, that it gives us the following
formula.
Lemma 3.1. f^{σ₁σ₂} = (f^{σ₁})^{σ₂}.

Proof. f^{σ₁σ₂}(ξ) = f(ξ∘(σ₁σ₂)⁻¹) = f(ξ∘(σ₂⁻¹∘σ₁⁻¹)) = f((ξ∘σ₂⁻¹)∘σ₁⁻¹) =
f^{σ₁}(ξ∘σ₂⁻¹) = (f^{σ₁})^{σ₂}(ξ). □
Theorem 3.1. For each σ in Sₙ the mapping T_σ defined by f ↦ f^σ is a linear isomorphism of (V*)^{⊗n} onto itself. The mapping σ ↦ T_σ is an antihomomorphism from the group Sₙ to the group of nonsingular elements of Hom((V*)^{⊗n}).

Proof. Permuting the variables does not alter the property of multilinearity, so
T_σ maps (V*)^{⊗n} into itself. It is linear, since (af + bg)^σ = af^σ + bg^σ. And
T_{ρσ} = T_σ ∘ T_ρ, because f^{ρσ} = (f^ρ)^σ. Thus σ ↦ T_σ preserves products, but in the
reverse order. This is why it is called an antihomomorphism. Finally,

T_σ ∘ T_{σ⁻¹} = T_{σ⁻¹σ} = T_I = I = T_{σσ⁻¹} = T_{σ⁻¹} ∘ T_σ,

so that T_σ is invertible (nonsingular, an isomorphism). □
The mapping σ ↦ T_σ is a representation (really an antirepresentation) of the
group Sₙ by linear transformations on (V*)^{⊗n}.
Lemma 3.2. Each T_σ carries the basis {μ_p} into itself, and so is a permutation on the basis.

Proof. We have (μ_p)^σ(ξ) = μ_p(ξ∘σ⁻¹) = ∏_{i=1}^n μ_{p_i}(ξ_{σ⁻¹(i)}). Setting j = σ⁻¹(i),
and so having i = σ(j), this product can be rewritten ∏_{j=1}^n μ_{p_{σ(j)}}(ξ_j) = μ_{p∘σ}(ξ).
Thus

(μ_p)^σ = μ_{p∘σ},

and since p ↦ p∘σ is a permutation on m^n, we are done. □
4. THE SIGN OF A PERMUTATION
We consider now the special polynomial E on ℝⁿ defined by

E(x) = E(x₁, …, xₙ) = ∏_{1≤i<j≤n} (x_i − x_j).

This is the product over all pairs ⟨i, j⟩ ∈ n × n such that i < j. This set of
ordered pairs is in one-to-one correspondence with the collection P of all pair
sets {i, j} ⊂ n such that i ≠ j, the ordered pair being obtained from the unordered
pair by putting it in its natural order. Now it is clear that for any
permutation σ ∈ Sₙ, the mapping {i, j} ↦ {σ(i), σ(j)} is a permutation of P.
This means that the factors in the polynomial E^σ(x) = E(x∘σ) are exactly the
same as in the polynomial E(x) except for the changes of sign that occur when σ
reverses the order of a pair. Therefore, if k is the number of these reversals, we
have E^σ = (−1)^k E. The mapping σ ↦ (−1)^k is designated 'sgn' (and called
"sign"). Thus sgn is a function from Sₙ to {1, −1} such that E^σ = (sgn σ)E,
for all σ ∈ Sₙ. It follows that

sgn ρσ = (sgn ρ)(sgn σ),

for (sgn ρσ)E = E^{ρσ} = (E^ρ)^σ = (sgn σ)E^ρ = (sgn ρ)(sgn σ)E, and we can
evaluate E at any n-tuple x such that E(x) ≠ 0 and cancel the factor E(x).
Also

sgn σ = −1 if σ is a transposition.

This is clear if σ interchanges adjacent numbers, because it then changes the
sign of just one factor in E(x); we leave the general case as an exercise for the
interested reader.
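For computation, sgn σ can be found by counting the order reversals directly, without forming E; a small sketch (added here, not part of the text):

    from itertools import combinations

    def sgn(sigma):
        # sigma lists sigma(1), ..., sigma(n) (0-indexed); count reversed pairs i < j
        r = sum(1 for i, j in combinations(range(len(sigma)), 2) if sigma[i] > sigma[j])
        return -1 if r % 2 else 1

    # sgn is multiplicative: sgn(rho o sigma) = sgn(rho) sgn(sigma)
    rho, sigma = (1, 2, 0), (0, 2, 1)
    compose = tuple(rho[s] for s in sigma)      # (rho o sigma)(i) = rho(sigma(i))
    assert sgn(compose) == sgn(rho) * sgn(sigma)
    # and sgn of a transposition is -1
    assert sgn((1, 0, 2)) == -1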
5. THE SUBSPACE a^n OF ALTERNATING TENSORS

Definition. A covariant tensor f ∈ (V*)^{⊗n} is symmetric if f^σ = f for all σ ∈ Sₙ.

If f is bilinear [f ∈ (V*)^{⊗2}], this is just the condition f(ξ, η) = f(η, ξ) for
all ξ, η ∈ V.

Definition. A covariant tensor f ∈ (V*)^{⊗n} is antisymmetric or alternating if f^σ = (sgn σ)f for all σ ∈ Sₙ.

Since each σ is a product of transpositions, this can also be expressed as the
fact that f just changes sign if two of its arguments are interchanged. In the
case of a bilinear functional it is the condition f(ξ, η) = −f(η, ξ) for all ξ, η ∈ V.
It is important to note that if f is alternating, then f(ξ) = 0 whenever the n-tuple
ξ = ⟨ξ₁, …, ξₙ⟩ is not injective (ξ_i = ξ_j for some i ≠ j). The set of all
symmetric elements of (V*)^{⊗n} is clearly a subspace, as is also the (for us) more
important set a^n of all alternating elements. There is an important linear projection
from (V*)^{⊗n} to a^n, which we now describe.
Theorem 5.1. The mapping f ↦ (1/n!) Σ_{σ∈Sₙ} (sgn σ) f^σ is a projection Q from (V*)^{⊗n} to a^n.

Proof. We first check that Qf ∈ a^n for every f in (V*)^{⊗n}. We have (Qf)^ρ =
(1/n!) Σ_σ (sgn σ) f^{σρ}. Now sgn σ = (sgn σρ)(sgn ρ). Setting σ′ = σ∘ρ and
remembering that σ ↦ σ′ is a bijection, we thus have

(Qf)^ρ = ((sgn ρ)/n!) Σ_{σ′} (sgn σ′) f^{σ′} = (sgn ρ)(Qf).

Hence Qf ∈ a^n.

If f is already in a^n, then f^σ = (sgn σ)f and Qf = (1/n!) Σ_{σ∈Sₙ} f. Since Sₙ
has n! elements, Qf = f. Thus Q is a projection from (V*)^{⊗n} to a^n. □
Lemma 5.1. Q(f^ρ) = (sgn ρ)Qf.

Proof. The formula for Q(f^ρ) is the same as that for (Qf)^ρ except that ρσ replaces
σρ. The proof is thus the same as the one for the theorem above. □
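In coordinates, Q is just the antisymmetrization of the coordinate array c_p; a sketch of Theorem 5.1 in numpy (an added illustration, with f stored as an n-dimensional array of its coordinates):

    import numpy as np
    from itertools import permutations
    from math import factorial

    def sgn(p):
        return (-1) ** sum(1 for i in range(len(p))
                           for j in range(i + 1, len(p)) if p[i] > p[j])

    def Q(f):
        # Qf = (1/n!) sum_sigma (sgn sigma) f^sigma; permuting the variables of f
        # permutes the axes of its coordinate array.
        n = f.ndim
        out = np.zeros_like(f, dtype=float)
        for sigma in permutations(range(n)):
            out += sgn(sigma) * np.transpose(f, sigma)
        return out / factorial(n)

    f = np.random.rand(4, 4, 4)        # an order-3 tensor over a 4-dimensional V
    g = Q(f)
    assert np.allclose(Q(g), g)                          # Q is a projection
    assert np.allclose(np.transpose(g, (1, 0, 2)), -g)   # Qf is alternating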
Theorem 5.2. The vector space a^n of alternating n-linear functionals over the m-dimensional vector space V has dimension the binomial coefficient C(m, n) = m!/n!(m − n)!.

Proof. If f ∈ a^n and f = Σ_p c_p μ_p, then since f^σ = (sgn σ)f, we have
Σ_p c_p μ_{p∘σ} = Σ_p (sgn σ)c_p μ_p for any σ in Sₙ. Setting p∘σ = q, the left sum
becomes Σ_q c_{q∘σ⁻¹} μ_q, and since the basis expansion is unique, we must have
c_{q∘σ⁻¹} = (sgn σ)c_q, or c_p = (sgn σ)c_{p∘σ}, for all p ∈ m^n. Working backward, we see,
conversely, that this condition implies that f^σ = (sgn σ)f. Thus f ∈ a^n if and
only if its coordinate function c_p satisfies the identity

c_p = (sgn σ)c_{p∘σ}

for all p ∈ m^n and all σ ∈ Sₙ.

This has many consequences. For one thing, c_p = 0 unless p is one-to-one
(injective). For if p_i = p_j and σ is the transposition interchanging i and j, then
p∘σ = p, c_p = (sgn σ)c_{p∘σ} = −c_p, and so c_p = 0. Since no p can be injective
if n > m, we see that in this case the only element of a^n is the zero functional.
Thus n > m ⇒ dim a^n = 0.

Now suppose that n ≤ m. For any injective p, the set {p∘σ : σ ∈ Sₙ}
consists of all the (injective) n-tuples with the same range set as p. There are
clearly n! of them. Exactly one q = p∘σ counts off the range set in its natural
order, i.e., satisfies q₁ < q₂ < ⋯ < qₙ. We select this unique q as the representative
of all the elements p∘σ having this range. The collection C of these
canonical (representative) q's is thus in one-to-one correspondence with the
collection of all (range) subsets of m = {1, …, m} of size n.

Each injective p ∈ m^n is uniquely expressible as p = q∘σ for some q ∈ C,
σ ∈ Sₙ. Thus each f in a^n is the sum Σ_{q∈C} Σ_{σ∈Sₙ} c_{q∘σ} μ_{q∘σ}. Since c_{q∘σ} = (sgn σ)c_q,
this sum can be rewritten Σ_{q∈C} c_q Σ_σ (sgn σ)μ_{q∘σ} = Σ_{q∈C} c_q ν_q, where we have
set

ν_q = Σ_σ (sgn σ)μ_{q∘σ} = n!Q(μ_q).

We are just about done. Each ν_q is alternating, since it is in the range of Q,
and the expansion

f = Σ_{q∈C} c_q ν_q,

which we have just found to be valid for every f ∈ a^n, shows that the set
{ν_q : q ∈ C} spans a^n. It is also independent, since Σ_{q∈C} c_q ν_q = Σ_{p∈m^n} c_p μ_p
and the set {μ_p} is independent. It is therefore a basis for a^n.

Now the total number of injective mappings p from n = {1, …, n} to
m = {1, …, m} is m(m − 1) ⋯ (m − n + 1), for the first element can be
chosen in m ways, the second in m − 1 ways, and so on down through n choices,
the last element having m − (n − 1) = m − n + 1 possibilities. We have seen
above that the number of these p's with a given range is n!. Therefore, the
number of different range sets is

m(m − 1) ⋯ (m − n + 1)/n! = m!/n!(m − n)! = C(m, n).

And this is the number of elements q ∈ C. □
The case n = m is very important. Now C contains only one element, the
identity I in Sₘ, so that

f = c_I ν_I = c_I Σ_σ (sgn σ)μ_σ

and

f(ξ₁, …, ξ_m) = c_I Σ_σ (sgn σ) x_{σ(1)}^1 ⋯ x_{σ(m)}^m.

This is essentially the formula for the determinant, as we shall see.
6. THE DETERMINANT
We saw in Section 5 that the dimension of the space a^m of alternating m-forms
over an m-dimensional V is C(m, m) = 1. Thus, to within scalar multiples there is only
one alternating m-linear functional D over V = ℝ^m, and we can adjust the
constant so that D(δ¹, …, δ^m) = 1. This uniquely determined m-form is the
determinant functional, and its value D(x¹, …, x^m) at the m-tuple ⟨x¹, …, x^m⟩
is the determinant of the matrix x = {x_{ij}} whose jth column is x^j for j = 1, …, m.
Lemma 6.1. D(t¹, …, t^m) = Σ_{σ∈Sₘ} (sgn σ) t_{σ(1),1} ⋯ t_{σ(m),m}.

Proof. This is just the last remark of the last section, with the constant c_I = 1,
since D(δ¹, …, δ^m) = 1, and with the notation changed to the usual matrix
form t_{ij}. □
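Lemma 6.1 can be transcribed directly into code; a brute-force sketch with n! terms (an added illustration):

    from itertools import permutations

    def sgn(p):
        return (-1) ** sum(1 for i in range(len(p))
                           for j in range(i + 1, len(p)) if p[i] > p[j])

    def D(t):
        # D(t) = sum_sigma sgn(sigma) t[sigma(1),1] ... t[sigma(m),m]
        m = len(t)
        total = 0
        for sigma in permutations(range(m)):
            prod = 1
            for j in range(m):
                prod *= t[sigma[j]][j]      # row sigma(j), column j
            total += sgn(sigma) * prod
        return total

    print(D([[2, 1, 0], [0, 3, 1], [1, 0, 1]]))   # prints 7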
Corollary 1. D(t*) = D(t).

Proof. If we reorder the factors of the product t_{σ(1),1} ⋯ t_{σ(m),m} in the order of the
values σ(i), the product becomes t_{1,ρ(1)} ⋯ t_{m,ρ(m)}, where ρ = σ⁻¹. Since σ ↦ σ⁻¹
is a bijection from Sₘ to Sₘ, and since sgn(σ⁻¹) = sgn σ, the sum in the lemma
can be rewritten as Σ_{ρ∈Sₘ} (sgn ρ) t_{1,ρ(1)} ⋯ t_{m,ρ(m)}. But this is D(t*). □

Corollary 2. D(t) is an alternating m-linear functional of the rows of t.
Now let dim V = m, and let f be any nonzero alternating m-form on V. For
any T in Hom V the functional f_T defined by f_T(ξ₁, …, ξ_m) = f(Tξ₁, …, Tξ_m)
also belongs to a^m. Since a^m is one-dimensional, f_T = k_T f for some constant k_T.
Moreover, k_T is independent of f, since if g_T = k_T′ g and g = cf, we must have
c f_T = k_T′ c f and k_T′ = k_T. This unique constant is called the determinant of T;
we shall designate it Δ(T). Note that Δ(T) is defined independently of any basis
for V.

Theorem 6.1. Δ(S∘T) = Δ(S)Δ(T).
Proof.

Δ(S∘T) f(ξ₁, …, ξ_m) = f((S∘T)(ξ₁), …, (S∘T)(ξ_m))
= f(S(T(ξ₁)), …, S(T(ξ_m)))
= Δ(S) f(T(ξ₁), …, T(ξ_m)) = Δ(T)Δ(S) f(ξ₁, …, ξ_m).

Now divide out f. □
Theorem 6.2. If θ is an isomorphism from V to W, and if T ∈ Hom V and S = θ∘T∘θ⁻¹, then Δ(S) = Δ(T).

Proof. Let f be any nonzero alternating m-form on W, and define g by
g(ξ₁, …, ξ_m) = f(θξ₁, …, θξ_m). Then g is a nonzero alternating m-form on V.
Now f(S∘θξ₁, …, S∘θξ_m) = Δ(S) f(θξ₁, …, θξ_m) = Δ(S) g(ξ₁, …, ξ_m),
and also f(S∘θξ₁, …, S∘θξ_m) = f(θ∘Tξ₁, …, θ∘Tξ_m) = g(Tξ₁, …, Tξ_m) =
Δ(T) g(ξ₁, …, ξ_m). Thus Δ(S)g = Δ(T)g and Δ(S) = Δ(T). □
The reader will expect the two notions of determinant we have introduced
to agree; we prove this now.

Corollary 1. If t is the matrix of T with respect to some basis in V, then D(t) = Δ(T).

Proof. If θ is the coordinate isomorphism, then T′ = θ∘T∘θ⁻¹ is in Hom ℝ^m
and Δ(T′) = Δ(T) by the theorem. Also, the columns of t are the m-tuples
T′(δ¹), …, T′(δ^m). Thus D(t) = D(t¹, …, t^m) = D(T′(δ¹), …, T′(δ^m)) =
Δ(T′) D(δ¹, …, δ^m) = Δ(T′). Altogether we have D(t) = Δ(T). □
Corollary 2. If s and t are m × m matrices, then D(s·t) = D(s)D(t).

Proof. D(s·t) = Δ(S∘T) = Δ(S)Δ(T) = D(s)D(t). □
Corollary 3. D(t) = 0 if and only if t is singular.

Proof. If t is nonsingular, then t⁻¹ exists and D(t)D(t⁻¹) = D(t·t⁻¹) =
D(I) = 1. In particular, D(t) ≠ 0. If t is singular, some column, say t¹, is a
linear combination of the others, t¹ = Σ₂^m c_i t^i, and D(t¹, …, t^m) =
Σ₂^m c_i D(t^i, t², …, t^m) = 0, since each term in the sum evaluates D at an
m-tuple having two identical elements, and so is 0 by the alternating property. □
We still have to show that Δ has all the properties we ascribed to it in
Chapter 2. Some of them are in hand. We know that Δ(S∘T) = Δ(S)Δ(T),
and the one- and two-dimensional properties are trivial. Thus, if T interchanges
independent vectors α₁ and α₂ in a two-dimensional space, then its matrix with
respect to them as a basis is t = [0 1; 1 0], and so Δ(T) = D(t) = −1.
The following lemma will complete the job.

Lemma 6.2. Consider D(t) = D(t¹, …, t^m) under the special assumption that t^m = δ^m. If s is the (m − 1) × (m − 1) matrix obtained from the m × m matrix t by deleting its last row and last column, then D(s) = D(t).
Proof. This can be made to follow from an inspection of the formula of Lemma
6.1, but we shall argue directly.

If t has δ^m also as its jth column for some j ≠ m, then of course D(t) = 0
by the alternating property. This means that D(t) is unchanged if the jth column
is altered in the mth place, and therefore D(t) depends only on the values t_{ij} in
the rows i ≠ m. That is, D(t) depends only on s. Now t ↦ s is clearly a surjective
mapping to ℝ^{(m−1)×(m−1)}, and, as a function of s, D(t) is alternating
(m − 1)-linear. It therefore is a constant multiple of the determinant D
on ℝ^{(m−1)×(m−1)}. To see what the constant is, we evaluate at the identity matrix
t = I. Then D(s) = 1 = D(t) for this special choice, and so D(s) = D(t) in general. □
In order to get a hold on the remaining two properties, we consider an
m × m matrix t whose last m − n columns are δ^{n+1}, …, δ^m, and we apply
the above lemma repeatedly. We have, first, D(t) = D((t)_{mm}), where (t)_{mm}
is the (m − 1) × (m − 1) matrix obtained from t by deleting the last row and
the last column. Since this matrix has δ^{m−1} as its last column (δ^{m−1} being now
an (m − 1)-tuple), the same argument shows that its determinant is the same
as that of the (m − 2) × (m − 2) matrix obtained from it in the same way.
We can keep on going as long as the δ-columns last, and thus see that D(t) is the
determinant of the n × n matrix that is the upper left corner of t. If we interpret
this in terms of transformations, we have the following lemma.
Lemma 6.3. Suppose that V is m-dimensional and that T in Hom V is the identity on an (m − n)-dimensional subspace X. Let Y be a complement of X, and let p be the projection on Y along X. Then p∘(T↾Y) can be considered an element of Hom Y and Δ(T) = Δ_Y(p∘(T↾Y)).

Proof. Let α₁, …, αₙ be a basis for Y, and let α_{n+1}, …, α_m be a basis for X.
Then {α_i}₁^m is a basis for V, and since T(α_i) = α_i for i = n + 1, …, m, the
matrix for T has δ^i as its ith column for i = n + 1, …, m. The lemma will
therefore follow from our above discussion if we can show that the matrix of
p∘(T↾Y) in Hom Y is the n × n upper left corner of t. The student should
be able to see this if he visualizes what vector (p∘T)(α_i) is for i ≤ n. □
Corollary. In the above situation, if Y is also invariant under T, then Δ(T) = Δ_Y(T↾Y).

Proof. The proof follows immediately, since now p∘(T↾Y) = T↾Y. □

If the roles of X and Y are interchanged, both being invariant under T and
T being the identity on Y, then this same lemma tells us that Δ(T) = Δ_X(T↾X).
If we only know that X and Y are T-invariant, then we can factor T into a
commuting product T = T₁∘T₂ = T₂∘T₁, where T₁ and T₂ are of the two
more special types discussed above, and so have the rule Δ(T) = Δ(T₁)Δ(T₂) =
Δ_X(T↾X)Δ_Y(T↾Y), another of our properties listed in Chapter 2.
The final rule is also a consequence of the above lemma. If T is the identity
on X and also on V/X, then it isn't too hard to see that p∘(T↾Y) is the
identity, as an element of Hom Y, and so Δ(T) = 1 by the lemma.
We now prove the theorem concerning "expansion by minors (or cofactors)".
Let t be an m × m matrix, and let (t)_{pr} be the (m − 1) × (m − 1) submatrix
obtained from t by deleting the pth row and rth column. Then:

Theorem 6.3. D(t) = Σ_{i=1}^m (−1)^{i+r} t_{ir} D((t)_{ir}). That is, we can evaluate D(t) by going down the rth column, multiplying each element by the determinant of the (m − 1) × (m − 1) matrix associated with it, and adding. The two occurrences of 'D' in the theorem are of course over dimensions m and m − 1, respectively.
Proof. Consider D(t) = D(t¹, …, t^m) under the special assumption that
t^r = δ^p. Since D(t) is an alternating linear functional both of the columns of t
and of the rows of t, we can move the rth column and pth row to the right-bottom
border, and apply Lemma 6.2. Thus

D(t) = (−1)^{m−r}(−1)^{m−p} D((t)_{pr}) = (−1)^{p+r} D((t)_{pr}),

assuming that the rth column of t is δ^p. In general, t^r = Σ_{i=1}^m t_{ir} δ^i, and if we
expand D(t¹, …, t^m) with respect to this sum in the rth place, and if we use the
above evaluation of the separate terms of the resulting sum, we get D(t) =
Σ_{i=1}^m (−1)^{i+r} t_{ir} D((t)_{ir}). □
Corollary 1. If s ≠ r, then Σ_{i=1}^m (−1)^{i+r} t_{is} D((t)_{ir}) = 0.

Proof. We now have the expansion of the theorem for a matrix with identical sth
and rth columns, and the determinant of this matrix is zero by the alternating
property. □
For simpler notation, set c_{ij} = (−1)^{i+j} D((t)_{ij}). This is called the cofactor
of the element t_{ij} in t. Our two results together say that

Σ_{i=1}^m c_{ir} t_{is} = δ_s^r D(t).
In particular, if D(t) ≠ 0, then the matrix s whose entries are s_{ri} = c_{ir}/D(t) is
the inverse of t. This observation gives us a neat way to express the solution of
a system of linear equations. We want to solve t·x = y for x in terms of y,
supposing that D(t) ≠ 0. Since s is the inverse of t, we have x = s·y. That is,
x_j = Σ_{i=1}^m s_{ji} y_i = (Σ_{i=1}^m y_i c_{ij})/D(t) for j = 1, …, m. According to our
expansion theorem, the numerator in this expression is exactly the determinant
d_j of the matrix obtained from t by replacing its jth column by the m-tuple y.
Hence, with d_j defined this way, the solution to t·x = y is the m-tuple

x = ⟨d₁/D(t), …, d_m/D(t)⟩.

This is Cramer's rule. It was stated in slightly different notation in Section 2.5.
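A direct transcription of Cramer's rule (an added sketch; np.linalg.det stands in for D, and for serious numerical work np.linalg.solve is the right tool):

    import numpy as np

    def cramer(t, y):
        # x_j = d_j / D(t), where d_j is t with its jth column replaced by y
        Dt = np.linalg.det(t)
        x = np.empty(len(y))
        for j in range(len(y)):
            tj = t.copy()
            tj[:, j] = y
            x[j] = np.linalg.det(tj) / Dt
        return x

    t = np.array([[2.0, 1.0], [1.0, 3.0]])
    y = np.array([5.0, 10.0])
    print(cramer(t, y))             # [1. 3.]
    print(np.linalg.solve(t, y))    # the same solution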
7. THE EXTERIOR ALGEBRA
Our final job is to introduce a multiplication operation between alternating
n-linear functionals (also now called exterior n-forms). We first extend the tensor
product operation that we have used to fashion elementary covariant tensors out
of functionals.

Definition. If f ∈ (V*)^{⊗n} and g ∈ (V*)^{⊗l}, then f ⊗ g is that element of (V*)^{⊗(n+l)} defined as follows:

(f ⊗ g)(ξ₁, …, ξ_{n+l}) = f(ξ₁, …, ξₙ) g(ξ_{n+1}, …, ξ_{n+l}).
We naturally ask how this operation combines with the projection Q of
(V*)^{⊗(n+l)} onto a^{n+l}.

Theorem 7.1. Q(f ⊗ g) = Q(f ⊗ Qg) = Q(Qf ⊗ g).

Proof. We have

Q(f ⊗ Qg) = (1/(n + l)!) Σ_σ (sgn σ)(f ⊗ Qg)^σ
= (1/(n + l)!) Σ_σ (sgn σ)(f ⊗ (1/l!) Σ_ρ (sgn ρ)g^ρ)^σ
= (1/((n + l)! l!)) Σ_{σ,ρ} (sgn σ)(sgn ρ)(f ⊗ g^ρ)^σ.

We can regard ρ as acting on the full n + l places of f ⊗ g by taking it as the
identity on the first n places. Then (f ⊗ g^ρ)^σ = (f ⊗ g)^{ρσ}. Set ρσ = σ′. For
each σ′ there are exactly l! pairs ⟨ρ, σ⟩ with ρσ = σ′, namely, the pairs
{⟨ρ, ρ⁻¹σ′⟩ : ρ ∈ S_l}. Thus the above sum is

(1/(n + l)!) Σ_{σ′} (sgn σ′)(f ⊗ g)^{σ′} = Q(f ⊗ g).

The proof for Q(Qf ⊗ g) is essentially the same. □
Definition. If f ∈ a^n and g ∈ a^l, then f ∧ g = ((n + l)!/(n! l!)) Q(f ⊗ g).

Lemma 7.1. f₁ ∧ f₂ ∧ ⋯ ∧ f_k = (n!/(n₁! n₂! ⋯ n_k!)) Q(f₁ ⊗ ⋯ ⊗ f_k), where n_i is the order of f_i, i = 1, …, k, and n = Σ₁^k n_i.

Proof. This is simply an induction, using the definition of the wedge operation ∧
and the above theorem. □

Corollary. If λ_i ∈ V*, i = 1, …, n, then

λ₁ ∧ ⋯ ∧ λₙ = n! Q(λ₁ ⊗ ⋯ ⊗ λₙ).

In particular, if q₁ < ⋯ < qₙ and {μ_i}₁^m is a basis for V*, then

μ_{q₁} ∧ ⋯ ∧ μ_{qₙ} = n! Q(μ_q) = the basis element ν_q of a^n.
Theorem 7.2. If f ∈ a^n and g ∈ a^l, then g ∧ f = (−1)^{ln} f ∧ g. In particular, λ ∧ λ = 0 for λ ∈ V*.

Proof. We have g ⊗ f = (f ⊗ g)^σ, where σ is the permutation moving each of
the last l places over each of the first n places. Thus σ is the product of ln
transpositions, sgn σ = (−1)^{ln}, and

Q(g ⊗ f) = Q((f ⊗ g)^σ) = (sgn σ)Q(f ⊗ g) = (−1)^{ln} Q(f ⊗ g).

We multiply by (n + l)!/(n! l!) and have the theorem. □
Corollary. If {λ_i}₁ⁿ ⊂ V*, then λ₁ ∧ ⋯ ∧ λₙ = 0 if and only if the sequence {λ_i}₁ⁿ is dependent.

Proof. If {λ_i} is independent, it can be extended to a basis for V*, and then
λ₁ ∧ ⋯ ∧ λₙ is some basis vector ν_q of a^n by the above corollary. In particular,
λ₁ ∧ ⋯ ∧ λₙ ≠ 0.

If {λ_i} is dependent, then one of its elements, say λ₁, is a linear combination of
the rest, λ₁ = Σ₂ⁿ c_i λ_i and λ₁ ∧ λ₂ ∧ ⋯ ∧ λₙ = Σ_{i=2}ⁿ c_i λ_i ∧ (λ₂ ∧ ⋯ ∧ λₙ).
The ith of these terms repeats λ_i, and so is 0 by the lemma and the above
corollary. □
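Expanding λ₁ ∧ ⋯ ∧ λₙ in the basis {ν_q} gives the n × n minors of the coefficient matrix of the λ's, so the corollary can be tested numerically; a sketch (added here, not in the text):

    import numpy as np
    from itertools import combinations

    def wedge_coords(lams):
        # lams: an n x m array whose rows are the coordinates of the lambda_i in a
        # basis {mu_j} of V*.  The coordinate of lambda_1 ^ ... ^ lambda_n on the
        # basis vector nu_q is the minor det(lams[:, q]), q increasing.
        n, m = lams.shape
        return np.array([np.linalg.det(lams[:, list(q)])
                         for q in combinations(range(m), n)])

    print(wedge_coords(np.array([[1.0, 0.0, 2.0],
                                 [0.0, 1.0, 1.0]])))    # independent: some coordinate != 0
    print(wedge_coords(np.array([[1.0, 2.0, 3.0],
                                 [2.0, 4.0, 6.0]])))    # dependent: all coordinates 0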
Lemma 7.2. The mapping ⟨f, g⟩ ↦ f ∧ g is a bilinear mapping from a^n × a^l to a^{n+l}.

Proof. This follows at once from the obvious bilinearity of f ⊗ g. □
We conclude with an important extension theorem.

Theorem 7.3. Let θ be the alternating n-linear map

⟨λ₁, …, λₙ⟩ ↦ λ₁ ∧ ⋯ ∧ λₙ

from (V*)ⁿ to a^n. Then for any alternating n-linear functional F(λ₁, …, λₙ) on (V*)ⁿ, there is a uniquely determined linear functional G on a^n such that F = G∘θ. The mapping G ↦ F is thus a canonical isomorphism from (a^n)* to a^n(V*).

Proof. The straightforward way to prove this is to define G by establishing its
necessary values on a basis, using the equation F = G∘θ, and then to show
from the linearity of G, the alternating multilinearity of θ, and the alternating
multilinearity of F that the identity F = G∘θ holds everywhere. This computation
becomes notationally complex. Instead, we shall be devious. We shall see
that by proving more than the theorem asserts we get a shorter proof of the theorem.
We consider the space a^n(V*) of all alternating n-linear functions on (V*)ⁿ.
We know from Theorem 5.2 that d(a^n(V*)) = C(m, n), since d(V*) = d(V) = m.
Now for each functional G in (a^n)*, the functional G∘θ is alternating and n-linear,
and so G ↦ F = G∘θ is a mapping from (a^n)* to a^n(V*) which is
clearly linear. Moreover, it is injective, for if G ≠ 0, then F(μ_{q₁}, …, μ_{qₙ}) =
G(ν_q) ≠ 0 for some basis vector ν_q = μ_{q₁} ∧ ⋯ ∧ μ_{qₙ} of a^n. Since
d(a^n(V*)) = C(m, n) = d(a^n) = d((a^n)*), the mapping is an isomorphism (by
the corollary of Theorem 2.4, Chapter 2). In particular, every F in a^n(V*) is
of the form G∘θ. □
It can be shown further that the property asserted in the above theorem is
an abstract characterization of a^n. By this we mean the following. Suppose that
a vector space X and an alternating mapping φ from (V*)ⁿ to X are given, and
suppose that every alternating functional F on (V*)ⁿ extends uniquely to a
linear functional G on X (that is, F = G∘φ). Then X is isomorphic to a^n, and
in such a way that φ becomes θ.

To see this we simply note that the hypothesis of unique extensibility is
exactly the hypothesis that Φ : G ↦ F = G∘φ is an isomorphism from X* to
a^n(V*). The theorem gave an isomorphism Θ from (a^n)* to a^n(V*), and the
adjoint (Φ⁻¹∘Θ)* is thus an isomorphism from X** to (a^n)**, that is, from X
to a^n. We won't check that φ "becomes" θ.
By virtue of Corollary 1 of Theorem 6.2, the identity D(t) = D(t*) is the
matrix form of the more general identity Δ(T) = Δ(T*), and it is interesting to
note the "coordinate free" proof of this equation. Here, of course, T ∈ Hom V.

We first note that the identity (T*λ)(ξ) = λ(Tξ) carries through the definitions
of ⊗ and ∧ to give

(T*λ₁ ∧ ⋯ ∧ T*λₙ)(ξ₁, …, ξₙ) = (λ₁ ∧ ⋯ ∧ λₙ)(Tξ₁, …, Tξₙ). (*)

Also, ev_ξ : ⟨λ₁, …, λₙ⟩ ↦ (λ₁ ∧ ⋯ ∧ λₙ)(ξ₁, …, ξₙ) is an alternating
n-linear functional in a^n(V*) for each ξ ∈ Vⁿ. The left member of (*) is thus
ev_ξ(T*λ₁, …, T*λₙ), and, if n = dim V, this is Δ(T*) ev_ξ(λ₁, …, λₙ) by the
definition of Δ. By the same definition the right side of (*) becomes

Δ(T)[(λ₁ ∧ ⋯ ∧ λₙ)(ξ₁, …, ξₙ)] = Δ(T) ev_ξ(λ₁, …, λₙ).

Thus (*) implies the identity Δ(T*) ev_ξ = Δ(T) ev_ξ. Since ev_ξ ≠ 0 if ξ =
{ξ_i}₁ⁿ is independent, we have proved that Δ(T*) = Δ(T).
We call a wedge product λ₁ ∧ ⋯ ∧ λₙ of functionals λ_i ∈ V* a multivector.
We saw above that λ₁ ∧ ⋯ ∧ λₙ ≠ 0 if and only if {λ_i}₁ⁿ is independent,
in which case {λ_i}₁ⁿ spans an n-dimensional subspace of V*. The
following lemma shows that this geometric connection is not accidental.

Lemma 7.3. Two independent n-tuples {λ_i}₁ⁿ and {μ_i}₁ⁿ in V* have the same linear span if and only if μ₁ ∧ ⋯ ∧ μₙ = k(λ₁ ∧ ⋯ ∧ λₙ) for some k.

Proof. If {μ_j}₁ⁿ ⊂ L({λ_i}₁ⁿ), then each μ_j is a linear combination of the λ's, and
if we expand μ₁ ∧ ⋯ ∧ μₙ according to these basis expansions, we get
k(λ₁ ∧ ⋯ ∧ λₙ). If, furthermore, {μ_i}₁ⁿ is independent, then k cannot be zero.
Now suppose, conversely, that μ₁ ∧ ⋯ ∧ μₙ = k(λ₁ ∧ ⋯ ∧ λₙ), where
k ≠ 0. This implies first that {μ_i}₁ⁿ is independent, and then that

μ_j ∧ (λ₁ ∧ ⋯ ∧ λₙ) = 0

for each j, so that each μ_j is dependent on {λ_i}₁ⁿ. Together, these two consequences
imply that the set {μ_i}₁ⁿ has the same linear span as {λ_i}₁ⁿ. □

This lemma shows that a multivector has a relationship to the subspace it
determines like that of a single vector to its span.
8. EXTERIOR POWERS OF SCALAR PRODUCT SPACES
Let V be a finite-dimensional vector space, and let ( , ) be a nondegenerate
(nonsingular) symmetric bilinear form on V. In this and the next section we shall
call any such bilinear form a scalar product, even though it may not be
positive definite. We know that the bilinear form ( , ) induces an isomorphism
of V with V* sending y ∈ V into ȳ ∈ V*, where ȳ(x) = (x, y) for all
x ∈ V. We then get a nondegenerate form (scalar product), which we shall continue
to denote by ( , ), on V* by setting (ū, v̄) = (u, v). We also obtain a
nondegenerate scalar product on a^q by setting

(ū₁ ∧ ⋯ ∧ ū_q, v̄₁ ∧ ⋯ ∧ v̄_q) = det((ū_i, v̄_j)). (8.1)

To check that (8.1) makes sense, we first remark that for fixed v̄₁, …, v̄_q ∈ V*,
the right-hand side of (8.1) is an antisymmetric multilinear function of the
vectors ū₁, …, ū_q, and therefore extends to a linear function on a^q by
Theorem 7.3. Similarly, holding the ū's fixed determines a linear function on
a^q, and (8.1) is well defined and extends to a bilinear function on a^q.
The right-hand side of (8.1) is clearly symmetric in ū and v̄, so that the bilinear
form we get is indeed symmetric. To see that it is nondegenerate, let us choose a
basis u₁, …, uₙ so that

(u_i, u_j) = ±δ_{ij}. (8.2)

(We can always find such a basis by Theorem 7.1 of Chapter 2.) We know that
{ū_i} = {ū_{i₁} ∧ ⋯ ∧ ū_{i_q}} forms a basis for a^q, where i = ⟨i₁, …, i_q⟩ ranges
over all q-tuples of integers such that 1 ≤ i₁ < ⋯ < i_q ≤ n, and we claim
that

(ū_i, ū_j) = ±δ_{ij}. (8.3)

In fact, if i ≠ j, then i_r ≠ j_s for some value of r between 1 and q and for all s.
In this case one whole row of the matrix ((ū_{i_r}, ū_{j_m})) vanishes, namely, the rth row.
Thus (8.1) gives zero in this case. If i = j, then (8.2) says that the matrix has
±1 down the diagonal and zeros elsewhere, establishing (8.3), and thus the fact
that ( , ) is nondegenerate on a^q. In particular, we have

(ū₁ ∧ ⋯ ∧ ūₙ, ū₁ ∧ ⋯ ∧ ūₙ) = (−1)^#, (8.4)

where # is the number of minus signs occurring in (8.3).
9. THE STAR OPERATOR
Let V be a finite-dimensional vector space endowed with a nondegenerate scalar
product as in Section 8. The space a^n is one-dimensional if n is the dimension of
V. The induced scalar product on a^n is nondegenerate, so that (u, u) is either
always positive or always negative for all nonzero u ∈ a^n. In particular, there
are exactly two u's in a^n with (u, u) = ±1. Let us choose one of them and hold
it fixed for the remainder of this section. Geometrically, this amounts to choosing
an orientation on V. We thus have picked a

u ∈ a^n with (u, u) = ±1. (9.1)

Let v̄ be some fixed element of a^q. Then for any y ∈ a^{n−q} we have v̄ ∧ y ∈ a^n, and so
we can write v̄ ∧ y = f_v̄(y)u, where f_v̄(y) depends linearly on y. Since the
induced scalar product ( , ) on a^{n−q} is nondegenerate, there is a unique
element *v̄ ∈ a^{n−q} such that (y, *v̄) = f_v̄(y). To repeat, we have assigned a
*v̄ ∈ a^{n−q} to each v̄ ∈ a^q by setting

(y, *v̄)u = v̄ ∧ y. (9.2)

We have thus defined a map, *, from a^q to a^{n−q}. It is clear from (9.2) that this
map is linear. Let u₁, …, uₙ be a basis for V satisfying (8.2) and also u =
ū₁ ∧ ⋯ ∧ ūₙ, and construct the corresponding bases for the spaces a^q and
a^{n−q}. Then ū_i ∧ ū_j = 0 if any i_l occurring in the q-tuple i also occurs in j.
If no i_l occurs in j, then

ū_i ∧ ū_j = ε_k u, where ε_k = sgn k

and k is the permutation ⟨i₁, …, i_q, j₁, …, j_{n−q}⟩ of ⟨1, …, n⟩.
If we compare this with (9.2) and (8.3), we see that

*ū_i = ±ε_k ū_j, (9.3)

where the sign is the same as that occurring in (8.3), i.e., the sign is positive or
negative according as the number of j_l with (u_{j_l}, u_{j_l}) = −1 which appear in j
is even or odd. Applying * to (9.3), we see that

**v̄ = (−1)^{q(n−q)+#} v̄. (9.4)

Let v̄ and w̄ be elements of a^q. Then

(*v̄, *w̄)u = v̄ ∧ *w̄ = (−1)^{q(n−q)} *w̄ ∧ v̄ = (−1)^{q(n−q)} (**w̄, v̄)u.

If we apply (9.4), we see that

(*v̄, *w̄) = (−1)^# (v̄, w̄). (9.5)
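In the Euclidean (positive-definite) case all signs in (8.3) are +, # = 0, and (9.3) reads *ū_i = ε_k ū_j; a sketch for ℝⁿ with the standard orientation (an added illustration):

    def sgn(p):
        return (-1) ** sum(1 for i in range(len(p))
                           for j in range(i + 1, len(p)) if p[i] > p[j])

    def star(i, n):
        # *u_i = eps_k u_j: j is the increasing complement of the q-tuple i, and
        # k = (i, j) read as a permutation of (0, ..., n-1)
        j = tuple(x for x in range(n) if x not in i)
        return sgn(i + j), j

    print(star((0,), 3))     # (1, (1, 2)):   *u1 = u2 ^ u3
    print(star((1,), 3))     # (-1, (0, 2)):  *u2 = -(u1 ^ u3)
    print(star((0, 1), 3))   # (1, (2,)):     *(u1 ^ u2) = u3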
CHAPTER 8
INTEGRATION
1. INTRODUCTION
In this chapter we shall present a theory of integration in n-dimensional Euclidean
space 𝔼ⁿ, which the reader will remember is simply Cartesian n-space
ℝⁿ together with the standard scalar product. Our main item of business is to
introduce a notion of size for subsets of 𝔼ⁿ (area in two dimensions, volume in
three, …). Before proceeding to the formal definitions, let us see what properties
we would like our notion of size to have. We are looking for a function μ which
assigns a number μ(A) to bounded subsets A ⊂ 𝔼ⁿ.

i) We would like μ(A) to be a nonnegative real number.

ii) If A ⊂ B, we would expect to have μ(A) ≤ μ(B).

iii) If A and B are disjoint (that is, A ∩ B = ∅), then we would expect to have μ(A ∪ B) = μ(A) + μ(B).

iv) Let T be any Euclidean motion.* For any set A let TA be the set of all points of the form Tx, where x ∈ A. We then would expect to have μ(TA) = μ(A). (Thus we want "congruent" sets to have the same size.)

v) We would expect a "lower-dimensional set" (where this is suitably defined) to have zero size. Thus points in the line, curves in the plane, surfaces in three-space, etc., should all have zero size.

vi) By the same token, we would expect open sets to have positive size.
vi) By the same token, we would expect open sets to have positive size.
In the above discussion we did not specify what kind of sets we were talking
about. One might be ambitious and try to assign a size to every subset of lEn.
This proves to be impossible, however, for the following reason: Let U and V
be any two bounded open subsets of IE 3. It can be shownt that we can find
* Recall that a Euclidean motion is an isometry of lEn and can thus be represented as
the composition of the translation and an orthogonal transformation.
t S. Banach and A. Tarski, Sur la decomposition des ensembles de pointes en partie
respectivement congruentes, Fund. Math. 6, 244-277 (1924). R. 11. Robinson, On the
decomposition of spheres, Fund. Math. 34, 246-260 (1947).
321
decompositions

U = U₁ ∪ ⋯ ∪ U_k and V = V₁ ∪ ⋯ ∪ V_k

with U_i ∩ U_j = ∅ = V_i ∩ V_j for i ≠ j, and Euclidean motions T_i with
T_iU_i = V_i. In other words, we can break up U into finitely many pieces, move
these pieces around, and then recombine them to get V. Needless to say, the
sets U_i will have to look very bad. A moment's reflection shows that if we wish
to assign a size to all subsets (including those like U_i), we cannot satisfy (ii),
(iii), (iv), and (vi). In fact, (iii) [repeated (k − 1) times] implies that

μ(U) = Σ_{i=1}^k μ(U_i),

and (iv) implies that μ(U_i) = μ(V_i). Thus μ(U) = μ(V), or the size of any two
open sets would coincide. Since any open set contains two disjoint open sets,
this implies, by (ii), that μ(U) ≥ 2μ(U), so μ(U) = 0.
We are thus faced with a choice. Either we dispense with some of requirements
(i) through (vi) above, or we do not assign a size to every subset of 𝔼ⁿ.
Since our requirements are reasonable, we prefer the second alternative. This
means, of course, that now, in addition to introducing a notion of size, we must
describe the class of "good" sets we wish to admit.

We shall proceed axiomatically, listing some "reasonable" axioms for a class
of subsets and a function μ.
2. AXIOMS
Our axioms will concern a class :D of subsets of lEn and a function p, defined on :D.
(That is, p,(A) is defined if A is a subset of lEn belonging to our collection :D.)
I. :D is a collection of subsets of P such that:
:Dl. If A E :D and B E :D, then A u B E :D, A n B E :D, and A - B E :D.
:D2. If A E :D and T is a translation, then TA E :D.
:D3. The set D~ = {x: 0 ~ Xi < I} belongs to :D.
II. The real-valued function p, has the following properties:
p.!. p,(A) ? 0 for all A E :D.
p,2. If A E :D, B E :D, and A n B = 0, then p,(A U B) = p,(A) +p,(B).
p,3. For any A E :D and any translation T, we have p,(TA) = p,(A).
p,4. p,(D~) = 1.
Before proceeding, some remarks about our axioms are in order. Axiom 𝔇1
will allow us to perform elementary set-theoretical operations with the elements
of 𝔇. Note that in Axioms 𝔇2 and μ3 we are only allowing translations, but in
our list of desired properties we wanted proper behavior with respect to all
Euclidean motions in (iv). The reason for this is that we shall show that for
"good" choices of 𝔇, the axioms, as they stand, uniquely determine μ. It will
then turn out that μ actually satisfies the stronger condition (iv), while we
assume the weaker condition μ3 as an axiom.
Fig. 8.1
Axiom 𝔇3 guarantees that our theory is not completely trivial, i.e., the
collection 𝔇 is not empty. Axiom μ4 has the effect of normalizing μ. Without it,
any μ satisfying μ1, μ2, and μ3 could be multiplied by any nonnegative real
number, and the new function μ′ so obtained would still satisfy our axioms.
In particular, μ4 guarantees that we do not choose μ to be the trivial function
assigning to each A the value zero.
Fig. 8.2
Our program for the next few sections is to make some reasonable choices for
𝔇 and to show that for the given 𝔇 there exists a unique μ satisfying μ1 through μ4.

An important elementary consequence of the 𝔇, μ-axioms that we shall frequently
use without comment is:

μ5. If A ⊂ ⋃₁^k A_i and all the sets are in 𝔇, then μ(A) ≤ Σ₁^k μ(A_i).

Our beginning work will be largely combinatorial. We will first consider
(generalized) rectangles, which are just Cartesian products of intervals, and the
way in which a point inside a rectangle determines a splitting of the rectangle
into a collection of smaller rectangles, as indicated in Fig. 8.1. This is associated
with the fact that the intersection of any two rectangles is a rectangle and the
difference of two rectangles is a finite disjoint union of rectangles (see Fig. 8.2).
Fig. 8.3
We call a set A paved if it can be expressed as the union of a finite disjoint
collection p of rectangles (a paving of A). It will follow from our combinatorial
considerations that the collection 𝔇_min of all the paved sets satisfies Axioms 𝔇1
through 𝔇3 and is the smallest family that does: any other collection 𝔇 satisfying
the axioms includes 𝔇_min. It will then follow that if μ satisfies μ1 through μ4 on
𝔇_min, then it must have the natural value (the product of the lengths of the sides)
for a rectangle. This implies that μ is uniquely defined on 𝔇_min by requirements
μ1 through μ4, since the value μ(A) for any paved set A must be the sum of the
natural values for the rectangles in a paving of A. The existence of μ on 𝔇_min
thus depends on the crucial lemma that two different pavings of the set A give
the same sum. (See Fig. 8.3.)

This comes down to the fact that the "intersection" of two pavings of A is
a third paving "finer" than either, and the fact that when a single rectangle is
broken up, the natural values of μ for the pieces add up to μ for the fragmented
rectangle.

All these considerations are elementary but exceedingly messy in detail. We
give the proofs below for the reader to refer to in case of doubt, but he may
prefer to study only the definitions and statements of results and then to proceed
to Section 6.
3. RECTANGLES AND PAVED SETS
We first introduce some notation and terminology. Let a = ⟨a¹, …, aⁿ⟩
and b = ⟨b¹, …, bⁿ⟩ be elements of 𝔼ⁿ. By the rectangle □_a^b we shall mean
the set of all x = ⟨x¹, …, xⁿ⟩ in 𝔼ⁿ with aⁱ ≤ xⁱ < bⁱ. Thus

□_a^b = {x : aⁱ ≤ xⁱ < bⁱ, i = 1, …, n}. (3.1)

Note that in order for □_a^b to be nonempty, we must have aⁱ < bⁱ for all i.
In other words,

□_a^b = ∅ if aⁱ ≥ bⁱ for some i. (3.2)

In the plane (n = 2), for instance, our rectangles □_a^b correspond to ordinary
Euclidean rectangles whose sides are parallel to the axes. (We should perhaps use
an additional adjective and call our sets level rectangles, braced rectangles, or
something else, but for simplicity we shall just call them rectangles.) Note that
in the plane our rectangles include the left-hand and lower edges but not the
right-hand and upper ones (see Fig. 8.4).
Fig. 8.4
For general n, if we set 1 = ⟨1, …, 1⟩, then our notation coincides with
that of 𝔇3.

We now collect some elementary facts about rectangles. It follows immediately
from the definition (3.1) that if a = ⟨a¹, …, aⁿ⟩, b = ⟨b¹, …, bⁿ⟩,
etc., then

□_a^b ∩ □_c^d = □_e^f, (3.3)

where

eⁱ = max(aⁱ, cⁱ) and fⁱ = min(bⁱ, dⁱ), i = 1, …, n.

(The reader should draw various different instances of this equation in the plane
to get the correct geometrical feeling.) Note that the case where □_a^b ∩ □_c^d = ∅
is included in (3.3) by (3.2). Another immediate consequence of the definition
(3.1) is

T_v(□_a^b) = □_{a+v}^{b+v} for any translation T_v. (3.4)
We will now establish some elementary results which will imply that any 𝔇
satisfying Axioms 𝔇1 through 𝔇3 must contain all rectangles.

Lemma 3.1. Any rectangle □_a^b can be written as the disjoint union

□_a^b = ⋃_{r=1}^k □_{a_r}^{b_r}, where b_r − a_r ∈ □_0^1.

(What this says is that any "big" rectangle can be written as a finite union
of "small" ones.)
Proof. We may assume that □_a^b ≠ ∅ (otherwise take k = 0 in the union).
Thus bⁱ > aⁱ. In particular, if we choose the integer m sufficiently large,
(1/2^m)(b − a) will lie in □_0^1.

By induction, it therefore suffices to prove that we can decompose □_a^b into
the disjoint union

□_a^b = ⋃_{s=1}^{2ⁿ} □_{c_s}^{d_s} with d_s − c_s = ½(b − a). (3.5)

(For then we can continue to subdivide until the rectangles we get are small
enough.)
We get this subdivision in the obvious way by choosing the vertex "in the
middle" of the rectangle and considering all rectangles obtained by cutting
□_a^b through this point by coordinate hyperplanes.

Fig. 8.5

To write down an explicit
formula, it will be convenient to use the set of all subsets of {1, …, n} as an
indexing set, rather than the integers 1, …, 2ⁿ. Let J denote an arbitrary subset
of {1, 2, …, n}. Let a_J = ⟨a_J¹, …, a_Jⁿ⟩ and b_J = ⟨b_J¹, …, b_Jⁿ⟩ be
given by

a_Jⁱ = aⁱ + ½(bⁱ − aⁱ) if i ∈ J, a_Jⁱ = aⁱ if i ∉ J,

and

b_Jⁱ = bⁱ if i ∈ J, b_Jⁱ = aⁱ + ½(bⁱ − aⁱ) if i ∉ J.

Then any x ∈ □_a^b lies in one and only one □_{a_J}^{b_J}. In other words, □_{a_J}^{b_J} ∩ □_{a_K}^{b_K} =
∅ if J ≠ K, and ⋃_{all J} □_{a_J}^{b_J} = □_a^b. (The case where n = 2 is shown in Fig. 8.5.)
Since b_J − a_J = ½(b − a) for all J, we have proved the lemma. □
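The subset indexing in the proof translates directly into code; a sketch of the halving step (3.5) (an added illustration):

    from itertools import product

    def split(a, b):
        # Split [a, b) into 2^n congruent subrectangles through the midpoint;
        # the subset J of {1,...,n} marks the coordinates taking the upper half.
        n = len(a)
        mid = [ai + 0.5 * (bi - ai) for ai, bi in zip(a, b)]
        pieces = []
        for J in product([False, True], repeat=n):
            aJ = [mid[i] if J[i] else a[i] for i in range(n)]
            bJ = [b[i] if J[i] else mid[i] for i in range(n)]
            pieces.append((aJ, bJ))
        return pieces

    for aJ, bJ in split([0.0, 0.0], [2.0, 4.0]):
        print(aJ, bJ)   # four disjoint rectangles whose union is [0,2) x [0,4)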
We now observe that for any c ∈ □_0^1 we have, by (3.3),

□_0^c = □_0^1 ∩ □_{c−1}^c. (3.6)

Let T_v denote translation through the vector v. Then □_{c−1}^c = T_{c−1}□_0^1 by
(3.4). Thus by Axioms 𝔇2 and 𝔇3 the rectangle □_{c−1}^c must belong to 𝔇.
By (3.6) and Axiom 𝔇1 we conclude that □_0^c ∈ 𝔇 for any c ∈ □_0^1.

Observe that T_a□_0^{b−a} = □_a^b by (3.4). Thus

□_a^b ∈ 𝔇 whenever b − a ∈ □_0^1.

If we now apply Lemma 3.1, we conclude that

□_a^b ∈ 𝔇 for all a and b. (3.7)
We make the following definition.

Definition 3.1. A subset S ⊂ 𝔼ⁿ will be called a paved set if S is the disjoint union of finitely many rectangles.

We can then assert:

Proposition 3.1. Any 𝔇 satisfying Axioms 𝔇1 through 𝔇3 must contain all paved sets. Let 𝔇_min denote the collection of all finite unions of rectangles; then 𝔇_min satisfies Axioms 𝔇1 through 𝔇3.

Proof. We have already proved the first part of this proposition. We leave the
second part as an exercise for the reader. □
4. THE MINIMAL THEORY
We are now going to see how far μ is determined by Axioms μ1 through μ3.
In fact, we are going to show that μ(□_a^b) is what it should be; i.e., if
a = ⟨a¹, …, aⁿ⟩ and b = ⟨b¹, …, bⁿ⟩, then we must have

μ(□_a^b) = 0 if □_a^b = ∅, μ(□_a^b) = (b¹ − a¹) ⋯ (bⁿ − aⁿ) if □_a^b ≠ ∅. (4.1)

Axiom μ4 says that (4.1) holds for the special case a = 0, b = 1. Examining
the proof of Lemma 3.1 shows that □_0^1 can be written as the disjoint union of 2ⁿ
rectangles, all congruent (via translation) to □_0^{1/2}, where ½ = ⟨½, …, ½⟩.
Axioms μ2 and μ3 then imply that

μ(□_0^{1/2}) = 1/2ⁿ.

Repeating this argument inductively shows that

μ(□_0^{1/2^r}) = 1/2^{rn}. (4.2)
We shall now use (4.2) to verify (4.1). The idea is to approximate any
rectangle by unions of translates of the cubes □_0^{1/2^r}.

Fig. 8.6
Observe that in proving (4.1) we need to consider only rectangles of the form
□_0^c. In fact, we take c = b − a and observe that

T_{−a}(□_a^b) = □_0^c,

so Axiom μ3 implies that μ(□_a^b) = μ(□_0^c), and by definition c¹ ⋯ cⁿ =
(b¹ − a¹) ⋯ (bⁿ − aⁿ). If □_0^c = ∅, then (4.1) is trivially true (from Axiom
μ2). Suppose that □_0^c ≠ ∅. Then c = ⟨c¹, …, cⁿ⟩ with cⁱ > 0 for all i.
For each r there are n integers N₁, …, Nₙ such that (Fig. 8.6)

Nᵢ/2^r ≤ cⁱ < (Nᵢ + 1)/2^r. (4.3)

In what follows, let k = ⟨k₁, …, kₙ⟩, l = ⟨l₁, …, lₙ⟩, etc., denote
vectors with integral coordinates (i.e., the kᵢ's are integers). Let us write
k < l if kᵢ < lᵢ for all i. If N = ⟨N₁, …, Nₙ⟩, then it follows from (4.3)
and the definitions that

□_{(1/2^r)k}^{(1/2^r)(k+1)} ⊂ □_0^c whenever 0 ≤ k < N.

For any k and l,

□_{(1/2^r)k}^{(1/2^r)(k+1)} ∩ □_{(1/2^r)l}^{(1/2^r)(l+1)} = ∅ if k ≠ l.

Since each of these cubes is a translate of □_0^{1/2^r}, we conclude from (4.2) (and
Axiom μ2) and from

⋃_{0≤k<N} □_{(1/2^r)k}^{(1/2^r)(k+1)} ⊂ □_0^c

that

μ(□_0^c) ≥ (1/2^{rn}) × (the number of k satisfying 0 ≤ k < N).

It is easy to see that there are N₁·N₂ ⋯ Nₙ such k, so that

μ(□_0^c) ≥ (1/2^{rn})(N₁ ⋯ Nₙ) = (N₁/2^r) ⋯ (Nₙ/2^r).

According to (4.3), Nᵢ/2^r > cⁱ − 1/2^r, so we have

μ(□_0^c) ≥ (c¹ − 1/2^r) ⋯ (cⁿ − 1/2^r). (4.4)

Similarly,

□_0^c ⊂ ⋃_{0≤k≤N} □_{(1/2^r)k}^{(1/2^r)(k+1)},

and we conclude that

μ(□_0^c) ≤ (c¹ + 1/2^r) ⋯ (cⁿ + 1/2^r). (4.5)

Letting r → ∞ in (4.4) and (4.5) proves (4.1).
In deriving (4.1) we made use of Axiom μ4. Examining our argument shows
that if μ′ satisfied μ2 and μ3 but not μ4, we could argue in the same manner,
except that we would have to multiply everything by the fixed constant μ′(□_0^1).
To sum up, we have proved:

Proposition 4.1. If μ satisfies Axioms μ1 through μ4, then the value of μ on any rectangle is uniquely determined and is given by (4.1). If μ′ satisfies μ1 through μ3, then for any rectangle □_a^b,

μ′(□_a^b) = Kμ(□_a^b), where K = μ′(□_0^1).
5. THE MINIMAL THEORY (Continued)
We will now show that formula (4.1) extends to give a unique μ defined on 𝔇_min
so as to satisfy Axioms μ1 through μ4. We must establish essentially two facts.

1) Every union of rectangles can be written as a disjoint union of rectangles.

This will then allow us to use Axiom μ2 to determine μ(A) for every A ∈ 𝔇_min
by setting

μ(A) = Σᵢ μ(□_{aᵢ}^{bᵢ})

if A is the disjoint union of the □_{aᵢ}^{bᵢ}. Since A might be written in another way
as a disjoint union of rectangles, this formula is not well defined until we establish
that:

2) If A = ⋃ᵢ □_{aᵢ}^{bᵢ} = ⋃ⱼ □_{cⱼ}^{dⱼ} are two representations of A as a disjoint union
of rectangles, then

Σᵢ μ(□_{aᵢ}^{bᵢ}) = Σⱼ μ(□_{cⱼ}^{dⱼ}).
Fig. 8.7 Fig. 8.8
We first introduce some notation.

Definition 5.1. A paving p of 𝔼ⁿ is a finite collection of mutually disjoint rectangles. The floor of this paving, denoted by |p|, is the union of all rectangles belonging to p.

If p = {□_{aᵢ}^{bᵢ}} and T is a translation, we set Tp = {T□_{aᵢ}^{bᵢ}}.

If p and s are two pavings, we say that s is finer than p (and write s ≺ p)
if every rectangle of p is a union of rectangles of s. It is clear that if p ≺ r
and s ≺ p, then s ≺ r. Note also that s ≺ p implies |p| ⊂ |s|.
Proposition 5.1. Let p and s be any two pavings. There exists a third paving r such that r ≺ p and r ≺ s.

Proof. The idea of the proof is very simple. Each rectangle in p or in s determines
2n hyperplanes (each hyperplane containing a face of the rectangle).
If we collect all these hyperplanes, they will "enclose" a number of rectangles.
We let r consist of those rectangles in this collection which do not contain any
smaller rectangle. Figure 8.7 shows the case (for n = 2) where p and s each
contain one rectangle. Here r contains nine rectangles.
We now fill in the details of this argument. Let Cl = -< cL ... ,c~ >- ,... ,
Ck = -< ck, ... , Ck>- be all the vectors that occur in the description of the
rectangles of p and g. (In other words, if D~ E: P or E: g, then a and b are among
the c's.) Let d 1, ..• , dkn be the vectors of the form -< cL ... ,cin >-, where the i/s
range independently from 1 to k, (so that there are kn of them). (See Fig. 8.8 for
the case where n = 2 and p and g consist of one rectangle each.) For each di
there is at most one smallest dj(i) such that d i < dj(i)' In fact, if
di = -<cll,···,cin >-,
then set dj(i) = -< cll' ... , cfn>-' where
I I
Cjz = mIn cm.
el >el
m 'I
Let u = {□_{dᵢ}^{d_{j(i)}}}. Then u is finer than p and s. In fact, if □_a^b ∈ p, say, then □_a^b = □_{d_α}^{d_β} for suitable α and β, and

$$\square_{d_\alpha}^{d_\beta} = \bigcup_{\substack{d_\alpha \leq d_i \\ d_{j(i)} \leq d_\beta}} \square_{d_i}^{d_{j(i)}}. \tag{5.1}$$

To see this, observe that if x ∈ □_{d_α}^{d_β}, then d_α ≤ x < d_β. Choose a largest dᵢ ≤ x. Then dᵢ ≤ x < d_{j(i)}, so x ∈ □_{dᵢ}^{d_{j(i)}}. This proves the proposition. We will later want to use the particular form of the u we constructed to find additional information. □
We can now prove (1) and (2).

Lemma 5.1. Let p₁, ..., p_l be pavings. Then there exists a paving s such that |s| = |p₁| ∪ ··· ∪ |p_l|.

Proof. By repeated applications of Proposition 5.1 we can choose a paving u which is finer than all the pᵢ's. Then each |pᵢ| is the union of suitable rectangles of u. Let s be the collection of all these rectangles occurring in all the pᵢ's. Then |s| = |p₁| ∪ ··· ∪ |p_l|. □

In particular, we have proved (1). More generally, we have shown that every A ∈ 𝒟_min is of the form A = |p| for a suitable paving p. We now wish to turn our attention to (2).
Lemma 5.2. Let c₁¹ < ··· < c_{r₁}¹, c₁² < ··· < c_{r₂}², ..., c₁ⁿ < ··· < c_{r_n}ⁿ be n sequences of numbers. Then

$$\mu\left(\square_{\langle c_1^1,\ldots,c_1^n\rangle}^{\langle c_{r_1}^1,\ldots,c_{r_n}^n\rangle}\right) = \sum_{\substack{1 \leq i_1 < r_1 \\ \cdots \\ 1 \leq i_n < r_n}} \mu\left(\square_{\langle c_{i_1}^1,\ldots,c_{i_n}^n\rangle}^{\langle c_{i_1+1}^1,\ldots,c_{i_n+1}^n\rangle}\right).$$

Proof. In fact, c_{rᵢ}ⁱ − c₁ⁱ = (c₂ⁱ − c₁ⁱ) + (c₃ⁱ − c₂ⁱ) + ··· + (c_{rᵢ}ⁱ − c_{rᵢ−1}ⁱ), so that the lemma follows from (4.1) when we multiply out all the factors. □
We now prove (2). Let p = {□_{aᵢ}^{bᵢ}} and s = {□_{e_j}^{f_j}}, where A = |p| = |s|. Let u be the paving we constructed in the proof of Proposition 5.1. Let w = {□_{dᵢ}^{d_{j(i)}}} be the collection of those rectangles □_{dᵢ}^{d_{j(i)}} of u such that □_{dᵢ}^{d_{j(i)}} ⊂ |p| = |s|. Then to prove (2) it suffices to show that

$$\sum_w \mu\left(\square_{d_i}^{d_{j(i)}}\right) = \sum_i \mu\left(\square_{a_i}^{b_i}\right) = \sum_j \mu\left(\square_{e_j}^{f_j}\right). \tag{5.2}$$

Now each rectangle □_{aᵢ}^{bᵢ} is decomposed into rectangles □_{d_l}^{d_{j(l)}} according to (5.1); that is, aᵢ = d_α, bᵢ = d_β, etc. By construction of the d's, this is exactly a decomposition of the type described in Lemma 5.2. Thus (5.1) implies that

$$\mu\left(\square_{d_\alpha}^{d_\beta}\right) = \sum_{\substack{d_\alpha \leq d_i \\ d_{j(i)} \leq d_\beta}} \mu\left(\square_{d_i}^{d_{j(i)}}\right).$$
Summing over all □_{aᵢ}^{bᵢ} (and doing the same for the □_{e_j}^{f_j}) proves (5.2). We can thus state:

Theorem 5.1. Every A ∈ 𝒟_min can be written as A = |p|. The number μ(A) = Σ_{□∈p} μ(□) does not depend on the choice of p. We thus get a well-defined function μ on 𝒟_min. It satisfies Axioms μ1 through μ4. If μ′ is any other function on 𝒟_min satisfying μ2 and μ3, then μ′(A) = Kμ(A), where K = μ′(□₀¹).

Proof. The proof of the last two assertions of the theorem is easy and is left as an exercise for the reader.
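The grid construction in the proof of Proposition 5.1 translates directly into a computation. The sketch below is ours, not the text's; it works in two dimensions with half-open rectangles assumed. It collects all corner coordinates, forms the common refinement, and sums the contents of the grid cells lying in a union of rectangles, exactly as Theorem 5.1 prescribes.

```python
# Editorial sketch of Proposition 5.1 / Theorem 5.1 in the plane, with
# half-open rectangles [a1,b1) x [a2,b2) assumed.

def content_of_union(rects):
    """rects: list of ((a1,a2),(b1,b2)). Returns mu of the union."""
    xs = sorted({r[0][0] for r in rects} | {r[1][0] for r in rects})
    ys = sorted({r[0][1] for r in rects} | {r[1][1] for r in rects})
    total = 0.0
    for i in range(len(xs) - 1):
        for j in range(len(ys) - 1):
            # a refinement cell lies in the union iff its midpoint does
            mx, my = (xs[i] + xs[i+1]) / 2, (ys[j] + ys[j+1]) / 2
            if any(a[0] <= mx < b[0] and a[1] <= my < b[1] for a, b in rects):
                total += (xs[i+1] - xs[i]) * (ys[j+1] - ys[j])
    return total

# two overlapping unit squares; the union has content 2 - 0.25 = 1.75
print(content_of_union([((0, 0), (1, 1)), ((0.5, 0.5), (1.5, 1.5))]))
```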
6. CONTENTED SETS

Theorem 5.1 shows that our axioms are not vacuous. It does not provide us with a satisfactory theory, however, because 𝒟_min contains far too few sets. In particular, it does not fulfill requirement (iii), since 𝒟_min is not invariant under rotations, except under very special ones. We are now going to remedy this by repeating the arguments of Section 4; we are going to try to approximate more general sets by sets whose μ's we know, i.e., by sets contained in 𝒟_min. This idea goes back to Archimedes, who used it to find the areas of figures in the plane.

Definition 6.1. Let A be any subset of 𝔼ⁿ. We say that p is an inner paving of A if |p| ⊂ A. We say that s is an outer paving of A if A ⊂ |s|.

We list several obvious facts:

If |p| ⊂ A ⊂ |s|, then μ(|p|) ≤ μ(|s|). (6.1)

If |p| ⊂ A ⊂ |s|, then |Tp| ⊂ TA ⊂ |Ts|. (6.2)

If A₁ ∩ A₂ = ∅ and |p₁| ⊂ A₁, |p₂| ⊂ A₂, then p₁ ∪ p₂ is an inner paving of A₁ ∪ A₂. (6.3)
Definition 6.2. For any bounded subset A of 𝔼ⁿ let

$$\mu_*(A) = \operatorname*{lub}_{|p| \subset A} \mu(|p|)$$

be called the inner content of A, and let

$$\bar\mu(A) = \operatorname*{glb}_{A \subset |s|} \mu(|s|)$$

be called the outer content of A.

Note that since A is bounded, there exists an s with A ⊂ |s|. This shows that μ̄(A) is defined. This together with (6.1) shows that μ*(A) is defined and that

$$\mu_*(A) \leq \bar\mu(A). \tag{6.4}$$
Definition 6.3. A set A will be called contented if μ*(A) = μ̄(A). We call μ*(A) = μ̄(A) the content of A and denote it by μ(A).

Observe that every A ∈ 𝒟_min is contented. In fact, if A = |v|, then v is both an inner and an outer paving of A. Thus μ*(A) = μ̄(A) = μ(|v|), and the new definition of μ(A) coincides with the old one.

Our next immediate objective is to show that the collection of all contented sets fulfills Axioms 𝒟1 through 𝒟3.
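Before verifying the axioms, it may help to see Definition 6.2 in action numerically. The Python sketch below is an added illustration; the disk and the grid sizes are our choices. It computes inner and outer pavings of the unit disk by grid squares; the two contents squeeze π between them, which is why the disk will turn out to be contented.

```python
# Editorial illustration of inner/outer content: pave the unit disk by
# grid squares of side 1/m.  Since the grid is aligned with the axes
# through the origin, the nearest and farthest points of each square
# from the origin are corners, so corner tests suffice.

def disk_contents(m):
    side = 1.0 / m
    inner = outer = 0
    for i in range(-m, m):        # squares [i/m,(i+1)/m) x [j/m,(j+1)/m)
        for j in range(-m, m):
            corners = [(i, j), (i + 1, j), (i, j + 1), (i + 1, j + 1)]
            d2 = [(x * side) ** 2 + (y * side) ** 2 for x, y in corners]
            if max(d2) <= 1.0:    # square lies inside the disk
                inner += 1
            if min(d2) <= 1.0:    # square meets the disk
                outer += 1
    return inner * side ** 2, outer * side ** 2

for m in (4, 16, 64):
    print(m, disk_contents(m))    # inner < pi < outer, and the gap shrinks
```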
Proposition 6.1. A set A is contented if and only if its boundary is contented and has content zero.

Proof. Suppose A is contented. For any δ > 0 we can find an inner paving p and an outer paving s such that μ(|s|) − μ(|p|) < δ/2. We want to replace p by a close paving p′ with |p′| ⊂ int A. To do this, we choose a small number η and replace each rectangle □_a^b of p by □_{a+η(b−a)}^{b−η(b−a)}. We let p_η be the collection of all these rectangles. Then |p_η| ⊂ int |p|, so |p_η| ⊂ int A. Furthermore, μ(|p_η|) = (1 − 2η)ⁿμ(|p|), since the factor (1 − 2η) is the decrease of each side of each rectangle of p. Similarly, we replace s by a slightly larger s_η, with A ⊂ int |s_η| and μ(|s_η|) ≤ (1 + 2η)ⁿμ(|s|). By choosing η sufficiently small, we can thus arrange that μ(|s_η|) − μ(|p_η|) < δ. Let v be a paving which is finer than s_η and p_η, with |v| = |s_η|. Let w ⊂ v consist of those rectangles of v lying in int A. Then |v| = |s_η| ⊃ |w| ⊃ |p_η|, so μ(|v|) − μ(|w|) < δ. But ∂A ⊂ |v − w|, so that μ̄(∂A) ≤ μ(|v − w|) = μ(|v|) − μ(|w|) < δ. In other words, μ̄(∂A) = 0.

Conversely, suppose that ∂A has content zero. Let w be an outer paving of ∂A with μ(|w|) < ε. Let v be a paving finer than w and such that A ⊂ |v|. Let p ⊂ v consist of those rectangles contained in A. Let s ⊂ v consist of those rectangles lying in |p| ∪ |w|. Then μ(|s|) ≤ μ(|p|) + μ(|w|) < μ(|p|) + ε. Furthermore, A ⊂ |s|. In fact, let x ∈ A. Then x ∈ □ for some □ ∈ v. If □ ∩ ∂A ≠ ∅, then □ ∩ |w| ≠ ∅, so □ ⊂ |w|, since v is a refinement of w. If □ ∩ ∂A = ∅, then every point of □ must lie in A, so that □ ⊂ |p|. We have thus constructed p and s with |p| ⊂ A ⊂ |s| and μ(|s|) − μ(|p|) < ε. Since we can do this for any ε, this implies that A is contented. □
Proposition 6.2. The union of any finite number of sets with content zero has content zero. If A ⊂ B and B has content zero, then so does A.

Proof. The proof is obvious.
Theorem 6.1. Let 𝒟_con denote the collection of all contented sets. Then 𝒟_con satisfies Axioms 𝒟1 through 𝒟3, and the μ given in Definition 6.3 satisfies μ1 through μ4. If μ′ is any other function on 𝒟_con satisfying μ1 through μ3, then μ′ = Kμ, where K = μ′(□₀¹).

Proof. Let us verify the axioms.

𝒟1. For any A and B,

$$\partial(A \cup B) \subset \partial A \cup \partial B \quad \text{and} \quad \partial(A \cap B) \subset \partial A \cup \partial B.$$

By Proposition 6.1, if A and B are contented, then ∂A and ∂B have content zero. Thus so do ∂A ∪ ∂B, ∂(A ∪ B), and ∂(A ∩ B), by Proposition 6.2. Hence A ∪ B and A ∩ B are contented.

𝒟2. Follows immediately from (6.1).

𝒟3. Is obvious.

μ2. If A₁ and A₂ are contented, we can find inner pavings p₁ and p₂ such that μ(A₁) − μ(|p₁|) < ε/2 and μ(A₂) − μ(|p₂|) < ε/2. If A₁ ∩ A₂ = ∅, then p₁ ∪ p₂ is an inner paving of A₁ ∪ A₂, and so

$$\mu(A_1 \cup A_2) \geq \mu(A_1) + \mu(A_2).$$

On the other hand, let s₁ and s₂ be outer pavings of A₁ and A₂, respectively, with μ(|s₁|) < μ(A₁) + ε/2 and μ(|s₂|) < μ(A₂) + ε/2. Let v be a paving with |v| = |s₁| ∪ |s₂|. Then v is an outer paving of A₁ ∪ A₂ and μ(|v|) ≤ μ(|s₁|) + μ(|s₂|). Thus μ(A₁ ∪ A₂) ≤ μ(|v|) ≤ μ(A₁) + μ(A₂) + ε, and since ε is arbitrary, μ(A₁ ∪ A₂) ≤ μ(A₁) + μ(A₂). These two inequalities together give μ2.

μ1. Is obvious.

μ3. Follows from (6.2) and Definition 6.3.

μ4. We already know.

The second part of the theorem follows from Theorem 5.1 and Definition 6.3. In fact, we know that μ′(|p|) = Kμ(|p|), and (6.1) together with Axiom μ2 implies that μ′(|p|) ≤ μ′(A) ≤ μ′(|s|). Since we can choose p and s to be arbitrarily close approximations to A (relative to μ), we are done. □
Remark. It is useful to note that we have actually proved a little more than what is stated in Theorem 6.1. We have proved, namely, that if 𝒟 is any collection of sets satisfying 𝒟1 through 𝒟3, such that 𝒟_min ⊂ 𝒟 ⊂ 𝒟_con, and if μ′: 𝒟 → ℝ satisfies μ1 through μ3, then μ′(A) = Kμ(A) for all A in 𝒟, where K = μ′(□₀¹).
7. WHEN IS A SET CONTENTED?
We will now establish some useful criteria for deciding whether a given set is
contented.
Recall that a closed ball B_x^r with center x and radius r is given by

$$B_x^r = \{\,y : \|y - x\| \leq r\,\}. \tag{7.1}$$

Note that

$$B_x^r \subset \square_{x-(r+\epsilon)\mathbf{1}}^{x+(r+\epsilon)\mathbf{1}} \quad \text{for any } \epsilon > 0, \tag{7.2}$$

and

$$\square_{x-r\mathbf{1}}^{x+r\mathbf{1}} \subset B_x^{r\sqrt{n}}, \tag{7.3}$$

where 1 denotes the vector ⟨1, ..., 1⟩. (See Fig. 8.9.)

If we combine (7.2) and (7.3), we see that any cube C lies in a ball B such that μ̄(B) ≤ 2ⁿ(√n)ⁿμ(C) and that any ball B lies in a cube C such that μ(C) ≤ 3ⁿ(√n)ⁿμ̄(B).
Lemma 7.1. Let A be a subset of 𝔼ⁿ. Then A has content zero if and only if for every ε > 0 there exist a finite number of balls {Bᵢ} covering A with Σμ̄(Bᵢ) < ε.

Proof. If we have such a collection of covering balls, then by the above remark we can enlarge each ball to a rectangle to get a paving p such that A ⊂ |p| and μ(|p|) < 3ⁿ(√n)ⁿε. Therefore, μ̄(A) = 0 if we can always find the {Bᵢ}.

Conversely, suppose A has content 0. Then for any δ we can find an outer paving p with μ(|p|) < δ. For each rectangle □ in the paving we can, by the arguments of Section 4, find a finite number of cubes which cover □ and whose total content is as close as we like to μ(□), say < 2μ(□). By doing this for each □ ∈ p, we have a finite number of cubes {Cᵢ} covering A with total content less than 2δ. Then by our remark before the lemma each cube Cᵢ lies in a ball Bᵢ such that μ̄(Bᵢ) ≤ 2ⁿ(√n)ⁿμ(Cᵢ), and so we have a covering of A by balls Bᵢ such that Σμ̄(Bᵢ) < 2^{n+1}(√n)ⁿδ. If we take δ = ε/2^{n+1}(√n)ⁿ, we have the desired collection of balls, proving the lemma. □
Recall that a map φ of U ⊂ 𝔼ⁿ into 𝔼ⁿ is said to satisfy a Lipschitz condition if there is a constant K (called the Lipschitz constant) such that

$$\|\varphi(y) - \varphi(x)\| \leq K\|y - x\|. \tag{7.4}$$

Proposition 7.1. Let A be a set of content zero with Ā ⊂ U, and let φ: U → 𝔼ⁿ satisfy a Lipschitz condition. Then φ(A) has content zero.

Proof. The proof consists of applying both parts of Lemma 7.1. Since A has content zero, for any ε > 0 we can find a finite number of balls covering A whose total outer content is less than ε/Kⁿ. By (7.4), φ(B_x^r) ⊂ B_{φ(x)}^{Kr}, so that the images of the balls covering A cover φ(A) and have a total outer content less than ε. □
Recall that if φ is a (continuously) differentiable map of an open set U into 𝔼ⁿ, then φ satisfies a Lipschitz condition on any compact subset of U. As a consequence of Proposition 7.1, we can thus state:

Proposition 7.2. Let φ be a continuously differentiable map defined on an open set U, and let A be a bounded set of content zero with Ā ⊂ U. Then φ(A) has content zero.
Let A be any compact subset of 𝔼ⁿ lying entirely in the subspace given by xⁿ = 0. Then A has content zero. In fact, for some sufficiently large fixed r, the set A is contained in the rectangle

$$\square_{\langle -r,\ldots,-r,\,0\rangle}^{\langle r,\ldots,r,\,\epsilon\rangle} \quad \text{for any } \epsilon > 0,$$

which has arbitrarily small volume.

Now let ψ: V ⊂ 𝔼^{n−1} → 𝔼ⁿ be a continuously differentiable map given by

$$\langle y^1, \ldots, y^{n-1}\rangle \mapsto \langle \psi^1(y^1, \ldots, y^{n-1}), \ldots, \psi^n(y^1, \ldots, y^{n-1})\rangle.$$

Let B be any bounded subset of 𝔼^{n−1} with B̄ ⊂ V. We can then write ψ(B) = ψ̃(A), where A is the set of points in 𝔼ⁿ of the form ⟨y, 0⟩, where y ∈ B, and where ψ̃ is a differentiable map such that

$$\tilde\psi(x^1, \ldots, x^n) = \langle \psi^1(x^1, \ldots, x^{n-1}), \ldots, \psi^n(x^1, \ldots, x^{n-1})\rangle.$$

By Proposition 7.2 we see that μ(ψ(B)) = 0. Thus:

Proposition 7.3. Let ψ be a differentiable map of V ⊂ 𝔼^{n−1} into 𝔼ⁿ, and let B be a bounded set such that B̄ ⊂ V. Then ψ(B) has content zero.

We have thus recovered requirement (v) of Section 1.
An immediate consequence of Propositions 7.3 and 6.1 is:

Proposition 7.4. Let A ⊂ 𝔼ⁿ be such that ∂A ⊂ ⋃ψᵢ(Bᵢ), where each ψᵢ and Bᵢ is as in Proposition 7.3. Then A is contented.

This shows that every set "we can draw" is contented.
Exercise. Show that every ball is contented.
8. BEHAVIOR UNDER LINEAR DISTORTIONS

We shall continue to derive consequences of Proposition 7.1.

Proposition 8.1. Let φ be a one-to-one map of U into 𝔼ⁿ which satisfies a Lipschitz condition and is such that φ⁻¹ is continuous. If A ⊂ U is contented, then so is φ(A).

Proof. Since A is contented, ∂A has content zero. By the conditions on φ, we know that ∂φ(A) = φ(∂A). Thus ∂φ(A) has content zero, and so φ(A) is contented. □

An immediate consequence of Proposition 8.1 is:

Proposition 8.2. Let L be a linear transformation of 𝔼ⁿ. Then LA is contented whenever A is contented.

Proof. If L is nonsingular, Proposition 8.1 applies. If L is singular, it maps all of 𝔼ⁿ onto a proper subspace. Any such subspace is contained in the image of {x : xⁿ = 0} under a suitable linear transformation, and so μ(LA) = 0 for any contented A. □
Theorem 8.1. Let L be a linear transformation of 𝔼ⁿ. Then for any contented A we have

$$\mu(LA) = |\det L|\,\mu(A). \tag{8.1}$$

Proof. We can restrict our attention to nonsingular L, since we have already checked Eq. (8.1) for det L = 0. If L is nonsingular, then L carries the class of contented sets into itself. Let us define μ′ by μ′(A) = μ(LA) for each A ∈ 𝒟_con. We claim that μ′ satisfies Axioms μ1 through μ3 on 𝒟_con.

In fact, μ1 and μ2 are obviously true; μ3 follows from the fact that for any translation T_v we have T_{Lv}L = LT_v, so that

$$\mu'(T_v A) = \mu(LT_v A) = \mu(T_{Lv} LA) = \mu(LA) = \mu'(A).$$

By Theorem 6.1 we thus conclude that

$$\mu' = k_L\,\mu,$$

where k_L is some constant depending on L. We must show that k_L = |det L|.

We first observe that if O is an orthogonal transformation, then

$$\mu(OA) = \mu(A).$$

In fact, we know that μ(OA) = k_O μ(A). If we take A to be the unit ball B₀¹, then OB₀¹ = B₀¹, so k_O = 1.

Next we observe that μ(L₁L₂A) = k_{L₁}μ(L₂A) = k_{L₁}k_{L₂}μ(A), so that

$$k_{L_1 L_2} = k_{L_1} k_{L_2}.$$

Now we recall that any nonsingular L can be written as L = PO, where P is a positive self-adjoint operator and O is orthogonal. Thus k_L = k_P and |det L| = |det P||det O| = |det P|, so we need only verify (8.1) for positive self-adjoint linear transformations. Any such P can be written as P = O₁DO₁⁻¹, where O₁ is orthogonal and D is diagonal. Since P is positive, all the eigenvalues of D are positive. Since det P = det D and k_P = k_D, we need only verify (8.1) for the case where L is given by a diagonal matrix with positive eigenvalues λ₁, ..., λ_n. But then L□₀¹ = □₀^{⟨λ₁,...,λ_n⟩}, so that

$$\mu'(\square_0^1) = \mu\left(\square_0^{\langle\lambda_1,\ldots,\lambda_n\rangle}\right) = \lambda_1 \cdots \lambda_n = |\det L|,$$

verifying (8.1). □
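Theorem 8.1 can be checked numerically for a particular L. The sketch below is ours; the matrix, the sample size, and the Monte Carlo method are assumptions of the illustration, not the text's method. It estimates μ(L□₀¹) by sampling a bounding box and testing membership via L⁻¹, then compares the estimate with |det L|.

```python
# Editorial check of mu(LA) = |det L| mu(A) for A = the unit square.
import numpy as np

rng = np.random.default_rng(0)
L = np.array([[2.0, 1.0],
              [0.5, 1.5]])            # an arbitrary nonsingular matrix
Linv = np.linalg.inv(L)

# bounding box of L([0,1]^2): images of the four corners
corners = L @ np.array([[0, 1, 0, 1], [0, 0, 1, 1]], dtype=float)
lo, hi = corners.min(axis=1), corners.max(axis=1)

# Monte Carlo estimate: x lies in L(A) iff L^{-1} x lies in [0,1)^2
N = 200_000
x = rng.uniform(lo, hi, size=(N, 2))
pre = x @ Linv.T                      # apply L^{-1} to each sample point
inside = np.all((pre >= 0) & (pre < 1), axis=1)
est = inside.mean() * np.prod(hi - lo)

print(est, abs(np.linalg.det(L)))     # both close to |det L| = 2.5
```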
Exercise. Let v₁, ..., v_n be vectors of 𝔼ⁿ. By the parallelepiped spanned by v₁, ..., v_n we mean the set of all vectors of the form Σᵢ₌₁ⁿ xᵢvᵢ, where 0 ≤ xᵢ ≤ 1. Show that its content is |det((vᵢ, v_j))|^{1/2}.
9. AXIOMS FOR INTEGRATION

So far we have shown that there is a unique μ defined for a large collection of sets in 𝔼ⁿ. However, we do not have an effective way to compute μ, except in very special cases. To remedy this we must introduce a theory of integration. We first introduce some notation.

Definition 9.1. Let f be any real-valued function on 𝔼ⁿ. By the support of f, denoted by supp f, we shall mean the closure of the set where f is not zero; that is,

$$\operatorname{supp} f = \overline{\{\,x : f(x) \neq 0\,\}}.$$
Observe that

$$\operatorname{supp}(f + g) \subset \operatorname{supp} f \cup \operatorname{supp} g \tag{9.1}$$

and

$$\operatorname{supp} fg \subset \operatorname{supp} f \cap \operatorname{supp} g. \tag{9.2}$$

We shall say that f has compact support if supp f is compact. Equation (9.1) [and Eq. (9.2) applied to constant g] shows that the set of all functions with compact support forms a vector space.

Let T be any one-to-one transformation of 𝔼ⁿ onto itself. For any function f we denote by Tf the function given by

$$(Tf)(x) = f(T^{-1}x). \tag{9.3}$$

Observe that if T and T⁻¹ are continuous, then

$$\operatorname{supp} Tf = T \operatorname{supp} f. \tag{9.4}$$

Definition 9.2. Let A be a subset of 𝔼ⁿ. By the characteristic function of A, denoted by e_A, we shall mean the function given by

$$e_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases} \tag{9.5}$$

Note that

$$e_{A_1 \cap A_2} = e_{A_1} \cdot e_{A_2}, \tag{9.6}$$

$$e_{A_1 \cup A_2} = e_{A_1} + e_{A_2} - e_{A_1 \cap A_2}, \tag{9.7}$$

$$\operatorname{supp} e_A = \bar A, \tag{9.8}$$

and

$$Te_A = e_{TA} \tag{9.9}$$

for any one-to-one map T of 𝔼ⁿ onto itself.
By a theory of integration on 𝔼ⁿ we shall mean a collection ℱ of functions and a rule ∫ which assigns a real number ∫f to each f ∈ ℱ, subject to the following axioms:

ℱ1. ℱ is a vector subspace of the space of all bounded functions of compact support.

ℱ2. If f ∈ ℱ and T is a translation, then Tf ∈ ℱ.

ℱ3. e_□ belongs to ℱ for any rectangle □.

∫1. ∫ is a linear function on ℱ.

∫2. ∫Tf = ∫f for any translation T.

∫3. If f ≥ 0, then ∫f ≥ 0.

∫4. ∫e_{□₀¹} = 1.

Note that the axioms imply that ℱ contains all functions of the form e_{□₁} + e_{□₂} + ··· + e_{□_k} for any rectangles □₁, ..., □_k. In particular, for any paving p, the function e_{|p|} must belong to ℱ.

Also note that from ∫3 we have at once the stronger version:

∫3′. f ≤ g ⟹ ∫f ≤ ∫g, since then g − f ≥ 0.
Proposition 9.1. Let ℱ, ∫ be a system satisfying Axioms ℱ1 through ℱ3 and ∫1 through ∫4. Then

$$\int e_A = \mu(A) \tag{9.10}$$

for every contented set A such that e_A ∈ ℱ, and

∫5. |∫f| ≤ ‖f‖_∞ μ̄(supp f) for every f ∈ ℱ.

Proof. The axioms guarantee that e_A ∈ ℱ for every A ∈ 𝒟_min and that ν(A) = ∫e_A satisfies μ1 through μ4. Therefore, ∫e_A = μ(A) for every A ∈ 𝒟_min by the uniqueness of μ (Proposition 4.1). It follows that if A is a contented set such that e_A ∈ ℱ, and if p and s are inner and outer pavings of A, then

$$\mu(|p|) = \int e_{|p|} \leq \int e_A \leq \int e_{|s|} = \mu(|s|).$$

Therefore, ∫e_A lies between μ*(A) and μ̄(A), and so equals μ(A). For any f ∈ ℱ and any A ∈ 𝒟_min such that supp f ⊂ A, we have −‖f‖_∞ e_A ≤ f ≤ ‖f‖_∞ e_A, and therefore |∫f| ≤ ‖f‖_∞ μ(A) by ∫3′ and (9.10). Taking the greatest lower bound of the right side over all such sets A, we have ∫5. □
10. INTEGRATION OF CONTENTED FUNCTIONS

We will now proceed to deal with Axioms ℱ and ∫ in the same way we dealt with Axioms 𝒟 and μ. We will construct a "minimal" theory and then get a "big" one by approximating. According to Proposition 9.1, the class ℱ must contain the function e_{|p|} for any paving p. By ℱ1 it must therefore contain all linear combinations of such.

Definition 10.1. By a paved function we shall mean a function f = f_p given by

$$f = \sum_i a_i e_{\square_i}, \qquad \square_i \in p, \tag{10.1}$$

for some paving p = {□ᵢ} and real constants aᵢ.

It is easy to see that the collection of all paved functions satisfies Axioms ℱ1 through ℱ3. Furthermore, by Proposition 9.1 and Axiom ∫1 the integral, ∫, is uniquely determined on the class of all paved functions by

$$\int f = \sum_i a_i\,\mu(\square_i) \tag{10.2}$$

if f is given by (10.1).

The reader should verify that if we let ℱ_p be the class of all paved functions and let ∫ be given by (10.2), then all our axioms are satisfied. Don't forget to show that ∫ is well defined: if f is expressed as in (10.1) in two ways, then the sums given by (10.2) are equal.
The paved functions obviously form too small a collection of functions. We would like to have an ℱ including all continuous functions with compact support and all characteristic functions of the form e_A with A contented, for example.

Definition 10.2. A bounded function f with compact support is said to be contented if for any ε > 0 and δ > 0 there exists a paved function g = g_{ε,δ} and a contented set A = A_{ε,δ} such that

$$|f(x) - g(x)| < \epsilon \quad \text{for all } x \notin A \tag{10.3}$$

and

$$\mu(A) < \delta. \tag{10.4}$$

The pair ⟨g, A⟩ will be called a paved ε, δ-approximation to f.
Let us verify that the collection of all contented functions, ℱ_con, satisfies Axioms ℱ1 through ℱ3. It is clear that if f is contented, so is af for any constant a. If f₁ and f₂ are contented, let ⟨g₁, A₁⟩ and ⟨g₂, A₂⟩ be paved ε, δ-approximations to f₁ and f₂, respectively. Then

$$|f_1(x) + f_2(x) - (g_1(x) + g_2(x))| < 2\epsilon \quad \text{for all } x \notin A_1 \cup A_2,$$

and

$$\mu(A_1 \cup A_2) < 2\delta.$$

Thus ⟨g₁ + g₂, A₁ ∪ A₂⟩ gives a paved 2ε, 2δ-approximation to f₁ + f₂.

To verify ℱ2 we simply observe that if ⟨g, A⟩ is a paved ε, δ-approximation to f, then ⟨Tg, TA⟩ is one to Tf.

A similar argument establishes the analogous result for multiplication:

Proposition 10.1. Let f₁ and f₂ be two contented functions. Then f₁f₂ is contented.

Proof. Let M be such that |f₁(x)| < M and |f₂(x)| < M for all x. Recall that the product of two paved functions is a paved function. Using the same notation as before, we have

$$|f_1 f_2(x) - g_1(x)g_2(x)| \leq |f_1(x)||f_2(x) - g_2(x)| + |g_2(x)||f_1(x) - g_1(x)| < M\epsilon + (M + \epsilon)\epsilon$$

for all x ∉ A₁ ∪ A₂. Thus ⟨g₁g₂, A₁ ∪ A₂⟩ is a paved (2M + ε)ε, 2δ-approximation to f₁f₂. □

As for ℱ3, it is immediate that a stronger statement is true:

Proposition 10.2. If B is a contented set, then e_B is a contented function.

Proof. In fact, let p be an inner paving of B with μ(B) − μ(|p|) < δ. Then

$$e_B(x) - e_{|p|}(x) = 0 \quad \text{if } x \notin B - |p|,$$

and

$$\mu(B - |p|) < \delta,$$

so ⟨e_{|p|}, B − |p|⟩ is a paved ε, δ-approximation to e_B for any ε > 0. □
We now establish a useful alternative characterization of a contented function.

Proposition 10.3. A function f is contented if and only if for every ε there are paved functions h and k such that h ≤ f ≤ k and ∫(k − h) < ε.

Proof. If f is contented, let R be a rectangle including supp f. Let ⟨g, A⟩ be an ε, δ-approximation to f. Let P be a paved set including A = A_{ε,δ} such that μ(P) < δ, and let m be a bound of |f|. Then g − εe_R − me_P ≤ f ≤ g + εe_R + me_P, where the outside functions are clearly paved and the difference of their integrals is less than 2εμ(R) + 2mδ. Since ε and δ are arbitrary, we have our h and k.

Conversely, if h and k are paved functions such that h ≤ f ≤ k and ∫(k − h) < α, then the set where k − h ≥ α^{1/2} is a paved set A. Furthermore, α^{1/2}μ(A) ≤ ∫e_A(k − h) ≤ ∫(k − h) ≤ α, so that μ(A) ≤ α^{1/2}. Given ε and δ, we only have to choose α ≤ min(ε², δ²) and take g as either k or h to see that f is contented. □

Corollary. A function f is contented if for every ε there are contented functions f₁ and f₂ such that f₁ ≤ f ≤ f₂ and ∫(f₂ − f₁) < ε.

Proof. For then we can find paved functions h ≤ f₁ and k ≥ f₂ such that ∫(f₁ − h) < ε and ∫(k − f₂) < ε, and end up with h ≤ f ≤ k and ∫(k − h) < 3ε. □
Theorem 10.1. Let ℱ be a class of functions satisfying Axioms ℱ1 through ℱ3 and such that ℱ_p ⊂ ℱ ⊂ ℱ_con. Then there exists a unique ∫ satisfying Axioms ∫1 through ∫4 on ℱ.

Proof. If ∫ is any integral on ℱ satisfying Axioms ∫1 through ∫4, then we must have ∫f simultaneously equal to lub ∫h for h paved and ≤ f, and equal to glb ∫k for k paved and ≥ f, by Proposition 10.3. The integral is thus uniquely determined on ℱ. Moreover, it is easy to see that if the integral on ℱ is defined by ∫f = lub ∫h = glb ∫k, then Axioms ∫1 through ∫4 follow from the fact that they hold for the uniquely determined integral on the paved functions. □
Exercise 10.1. Let f and g be contented functions such that f(x) = g(x) for x ∉ A, where μ(A) = 0. Then ∫f = ∫g. (This shows that for the purpose of integration we need to know a function only up to a set of content zero.)
Definition 10.3. Let f be a contented function and A a contented set. We call ∫e_A f the integral of f over A and denote it by ∫_A f. Thus

$$\int_A f = \int e_A f. \tag{10.5}$$

An immediate consequence of Axiom ∫1 and (9.7) is

$$\int_{A_1 \cup A_2} f = \int_{A_1} f + \int_{A_2} f - \int_{A_1 \cap A_2} f. \tag{10.6}$$

An immediate consequence of Exercise 10.1 is

$$\left|\int_A f\right| \leq \sup_{x \in A} |f(x)|\,\mu(A).$$
We close this section by giving another useful characterization of contented
functions.
Proposition 10.4. Let f be a bounded function with compact support. Then f is contented if and only if to every ε > 0 and δ > 0 we can find an η > 0 and a contented set A_δ such that μ(A_δ) < δ and

$$|f(x) - f(y)| < \epsilon \quad \text{whenever } \|x - y\| < \eta \text{ and } x, y \notin A_\delta. \tag{10.7}$$

Proof. Suppose that for every ε, δ we can find η and A_δ. Let p = {□ᵢ} be a paving such that

i) supp f ⊂ |p|;

ii) if x, y ∈ □ᵢ, then ‖x − y‖ < η;

iii) if s = {□ᵢ ∈ p : □ᵢ ∩ A_δ ≠ ∅}, then μ(|s|) < 2δ.

Then let f_{ε,2δ}(x) = f(xᵢ) when x ∈ □ᵢ, where xᵢ is some point of □ᵢ. By (ii) and (iii), we see that ⟨f_{ε,2δ}, |s|⟩ is a paved ε, 2δ-approximation to f. Thus f is contented.

Conversely, suppose that f is contented, and let f_{ε/2,δ/2}, A_{ε/2,δ/2} be a paved approximation to f. Let p = {□ᵢ} be the paving associated with f_{ε/2,δ/2}. Replace each □ᵢ by the rectangle □ᵢ′ obtained by contracting □ᵢ about its center by a factor (1 − ζ). (See Fig. 8.10.) Thus μ(□ᵢ′) = (1 − ζ)ⁿμ(□ᵢ). For any x, y ∈ ⋃□ᵢ′, if ‖x − y‖ < η, where η is sufficiently small, then x and y belong to the same □ᵢ′. If x, y ∈ ⋃□ᵢ′ and ‖x − y‖ < η, then

$$|f(x) - f(y)| \leq |f(x) - f_{\epsilon/2,\delta/2}(x)| + |f(y) - f_{\epsilon/2,\delta/2}(y)| + |f_{\epsilon/2,\delta/2}(x) - f_{\epsilon/2,\delta/2}(y)|.$$

But the third term vanishes, so that |f(x) − f(y)| < ε. Now by first choosing ζ sufficiently small, we can arrange that μ(|p| − ⋃□ᵢ′) < δ/2. Then we can choose η so small that ‖x − y‖ < η implies that x, y belong to the same □ᵢ′ if x, y ∈ ⋃□ᵢ′. For this η and for A_δ = A_{ε/2,δ/2} ∪ (|p| − ⋃□ᵢ′), Eq. (10.7) holds, and μ(A_δ) < δ. □
In particular, a bounded function which is continuous except at a set of
content zero and has compact support is contented.
EXERCISES

10.2 Show that for any bounded set A, e_A is a contented function if and only if A is a contented set.

10.3 Let f be a contented function whose support is contained in a cube □. For each δ let p_δ = {□_{i,δ}}_{i∈I_δ} be a paving with |p_δ| = □ and whose cubes have diameter less than δ. Let x_{i,δ} be some point of □_{i,δ}. The expression

$$S_\delta(f) = \sum_{i \in I_\delta} f(x_{i,\delta})\,\mu(\square_{i,\delta})$$

is called a Riemann δ-approximating sum for f. Show that for any ε > 0 there exists a δ₀ [= δ₀(f)] > 0 such that

$$\left|S_\delta(f) - \int f\right| < \epsilon$$

whenever δ < δ₀.
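For a concrete instance of Exercise 10.3, the following Python sketch (ours; the integrand f(x, y) = xy on the unit square is an assumed example) evaluates Riemann δ-approximating sums on finer and finer pavings; they converge to ∫f = 1/4.

```python
# Editorial example of Riemann delta-approximating sums on [0,1)^2,
# sampling each cell at its center.

def riemann_sum(f, m):
    """Sum f(x_i) * mu(square_i) over an m x m paving of [0,1)^2."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        for j in range(m):
            total += f((i + 0.5) * h, (j + 0.5) * h) * h * h
    return total

f = lambda x, y: x * y
for m in (2, 8, 32, 128):
    print(m, riemann_sum(f, m))   # tends to 0.25 as the mesh delta -> 0
```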
11. THE CHANGE OF VARIABLES FORMULA
This section will be devoted to the proof of the following theorem, which is of
fundamental importance.
Theorem 11.1. Let U and V be bounded open sets in ℝⁿ, and let φ be a continuously differentiable one-to-one map of U onto V with φ⁻¹ differentiable. Let f be a contented function with supp f ⊂ V. Then f ∘ φ is a contented function, and

$$\int_V f = \int_U (f \circ \varphi)\,|\det J_\varphi|. \tag{11.1}$$

Recall that if the map φ is given by yⁱ = φⁱ(x¹, ..., xⁿ), then J_φ is the linear transformation whose matrix is [∂φⁱ/∂x^j].

Note that if φ is a nonsingular linear transformation (so that J_φ is just φ), then Theorem 11.1 is an easy consequence of Theorem 8.1. In fact, for functions of the form e_A we observe that e_A ∘ φ = e_{φ⁻¹A}, and Eq. (11.1) reduces, in this case, to (8.1). By linearity, (11.1) is valid for all paved functions.

Furthermore, f ∘ φ is contented. Suppose |f(x) − f(y)| < ε when ‖x − y‖ < η and x, y ∉ A, with μ(A) < δ. Then |f ∘ φ(u) − f ∘ φ(v)| < ε when

$$\|u - v\| < \eta/\|\varphi\| \quad \text{and} \quad u, v \notin \varphi^{-1}(A),$$

with μ(φ⁻¹A) < δ/|det φ|.

Now let g_{ε,δ}, A_{ε,δ} be an approximating family of paved functions for f. Then |f ∘ φ(x) − g_{ε,δ} ∘ φ(x)| < ε for x ∉ φ⁻¹(A_{ε,δ}), and μ(φ⁻¹A_{ε,δ}) < δ/|det φ|. Thus ∫(g_{ε,δ} ∘ φ)|det φ| → ∫(f ∘ φ)|det φ|, and Eq. (11.1) is valid for all contented f.

The proof of Theorem 11.1 for nonlinear maps is a bit more tricky. It consists essentially of approximating φ locally by linear maps, and we shall do it in several steps. We shall use the uniform norm ‖x‖_∞ = max|xⁱ| on ℝⁿ. This is convenient because a ball in this norm is actually a cube, although this nicety isn't really necessary.
Let ψ be a (continuously) differentiable map defined on a convex open set U. If the cube □ = □_{p−r𝟏}^{p+r𝟏} lies in U, then the mean-value theorem (Section 7, Chapter 3) implies that for any y ∈ □,

$$\|\psi(y) - \psi(p)\|_\infty \leq \|y - p\|_\infty \sup_{z \in \square} \|J_\psi(z)\|.$$

Thus

$$\psi(\square) \subset \square_{\psi(p) - Kr\mathbf{1}}^{\psi(p) + Kr\mathbf{1}}, \quad \text{where } K = \sup_{z \in \square} \|J_\psi(z)\|. \tag{11.2}$$
Lemma 11.1. Let φ be as in Theorem 11.1. Then for any contented set A with Ā ⊂ U we have

$$\bar\mu(\varphi(A)) \leq \int_A |\det J_\varphi|. \tag{11.3}$$

Proof. Let us apply Eq. (11.2) to the map ψ = L⁻¹φ, where L is a linear transformation. Then

$$\bar\mu(\varphi(\square)) \leq |\det L| \left(\sup_{z \in \square} \|L^{-1}J_\varphi(z)\|\right)^n \mu(\square) \tag{11.4}$$

for any □ contained in the domain of definition of φ and for any nonsingular linear transformation L.

For any ε > 0, let δ be so small that ‖J_φ(x)⁻¹J_φ(y)‖ < 1 + ε for ‖x − y‖_∞ < δ for all x, y in a compact neighborhood of A. (It is possible to choose such a δ, since J_φ(x) is a uniformly continuous function of x, so that J_φ(x)⁻¹J_φ(y) is close to the identity matrix when x is close to y; see Section 8, Chapter 4.)

Choose an outer paving w = {□ᵢ} of A, where the □ᵢ are cubes all having edges of length less than δ. Let xᵢ be a point of □ᵢ. Then applying (11.4) to each □ᵢ, taking L = J_φ(xᵢ), we get

$$\bar\mu(\varphi(A)) \leq \bar\mu(\varphi(|w|)) = \sum \bar\mu(\varphi(\square_i)) \leq \sum |\det J_\varphi(x_i)|(1 + \epsilon)^n \mu(\square_i).$$

We can also suppose δ to have been taken small enough so that

$$|\det J_\varphi(z)| > (1 - \epsilon)|\det J_\varphi(x_i)| \quad \text{for all } z \in \square_i \text{ and all } i.$$

Then we have

$$\int_{\square_i} |\det J_\varphi| > (1 - \epsilon)|\det J_\varphi(x_i)|\,\mu(\square_i),$$

and so

$$\bar\mu(\varphi(A)) \leq \frac{(1 + \epsilon)^n}{1 - \epsilon} \int_{|w|} |\det J_\varphi|.$$

Since ε is arbitrary and w is an arbitrary outer paving of A, we get (11.3). □
We can now conclude that f ∘ φ is contented for any contented f with supp f ⊂ V. In fact, let K be chosen so large that it is a Lipschitz constant for φ on φ⁻¹(supp f), and so large that K > |det J_{φ⁻¹}(u)| for u ∈ supp f. Now given ε and δ, we can find an η such that

$$|f(u) - f(v)| < \epsilon \quad \text{if } \|u - v\| < \eta \text{ and } u, v \notin A_\delta, \text{ with } \mu(A_\delta) < \delta.$$

But this implies that

$$|f \circ \varphi(x) - f \circ \varphi(y)| < \epsilon \quad \text{if } \|x - y\| < \eta/K \text{ and } x, y \notin \varphi^{-1}(A_\delta),$$

where μ(φ⁻¹(A_δ)) < Kδ, by (11.3). Since K was chosen independently of ε and δ, this shows that f ∘ φ is contented.
Lemma 11.2. Let φ, U, and V be as in Theorem 11.1. Let f be a nonnegative contented function with supp f ⊂ V. Then

$$\int f \leq \int (f \circ \varphi)|\det J_\varphi|. \tag{11.5}$$

Proof. Let ⟨g, A⟩ be a paved ε, δ-approximation to f with g(u) ≤ f(u) for all u. If p = {□ᵢ} is the paving associated with g, we may assume that supp f ⊂ |p|. Then

$$\int g = \sum_i g(u_i)\,\mu(\square_i) \leq \sum_i g(u_i) \int_{\varphi^{-1}(\square_i)} |\det J_\varphi| \leq \sum_i \int_{\varphi^{-1}(\square_i)} (f \circ \varphi)|\det J_\varphi|,$$

where uᵢ ∈ □ᵢ, by Lemma 11.1 applied to the set φ⁻¹(□ᵢ). Since we can choose g so that ∫g → ∫f, we obtain (11.5). □

Lemma 11.3. Let φ, U, V, and f be as in Theorem 11.1. Let f be a nonnegative function. Then Eq. (11.1) holds.

Proof. Let us apply (11.5) to the map φ⁻¹ and the function (f ∘ φ)|det J_φ|. Since J_φ(x) ∘ J_{φ⁻¹}(φ(x)) = id, we obtain

$$\int (f \circ \varphi)|\det J_\varphi| \leq \int \left[(f \circ \varphi) \circ \varphi^{-1}\right]\left(|\det J_\varphi| \circ \varphi^{-1}\right)|\det J_{\varphi^{-1}}| = \int f.$$

Combining this with (11.5) proves the lemma. □
Completion of the proof of Theorem 11.1. Any real-valued contented function can be written as the difference of two nonnegative contented functions: if f(x) > −M for all x, for some large M, we write f = (f + Me_□) − Me_□, where supp f ⊂ □. Since we have verified Eq. (11.1) for nonnegative functions, and since both sides of (11.1) are linear in f, we are done. Similarly, any bounded complex-valued contented function f can be written as f = f₁ + if₂, where f₁ and f₂ are bounded real-valued contented functions. □
In practice, we sometimes may apply Eq. (11.1) to a situation where the hypotheses of Theorem 11.1 are not, strictly speaking, verified. For instance, in ℝ² we may want to introduce "polar coordinates". That is, we let r, θ be coordinates on ℝ²; if S is the set 0 ≤ θ < 2π, 0 ≤ r, we consider the map φ: S → ℝ² given by x = r cos θ, y = r sin θ, where x, y are coordinates on a second copy of ℝ². Now this map is one-to-one and has positive Jacobian for r > 0. If we consider the open sets U ⊂ S given by 0 < r, 0 < θ < 2π, and V ⊂ ℝ² given by V = ℝ² − {⟨x, y⟩ : y = 0, x ≥ 0}, the hypotheses of Theorem 11.1 are fulfilled, and we can write (since det J_φ = r)

$$\int f = \int (f \circ \varphi)\,r \tag{11.6}$$

if supp f ⊂ V. However, Eq. (11.6) is valid without the restriction supp f ⊂ V. In fact, if D_ε is a strip of width ε about the ray y = 0, x ≥ 0, then f = fe_{D_ε} + fe_{ℝ²−D_ε} and ∫fe_{D_ε} → 0 as ε → 0 (Fig. 8.11). Similarly, ∫(f ∘ φ) r (e_{D_ε} ∘ φ) → 0, so that (11.6) is valid for all contented f by this simple limit argument.
We will not state a general theorem covering all such useful extensions of
Theorem 11.1. In each case the limit argument is usually quite straightforward
and will be left to the reader.
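The polar-coordinates extension just described is easy to test numerically. In the sketch below (an editorial example; the integrand e^{−(x²+y²)} and the disk r ≤ 2 are our choices), the direct planar sum and the polar sum with the Jacobian factor r both approximate π(1 − e^{−4}).

```python
# Editorial check of Eq. (11.6): integrate exp(-(x^2+y^2)) over the disk
# r <= 2 on a Cartesian grid, and again in polar coordinates.
import math

def direct(m):
    h = 4.0 / m                               # grid on [-2, 2]^2
    s = 0.0
    for i in range(m):
        for j in range(m):
            x, y = -2 + (i + 0.5) * h, -2 + (j + 0.5) * h
            if x * x + y * y <= 4.0:
                s += math.exp(-(x * x + y * y)) * h * h
    return s

def polar(m):
    hr, ht = 2.0 / m, 2 * math.pi / m
    s = 0.0
    for i in range(m):
        for j in range(m):
            r = (i + 0.5) * hr
            s += math.exp(-r * r) * r * hr * ht   # integrand times det J = r
    return s

exact = math.pi * (1 - math.exp(-4.0))
print(direct(400), polar(400), exact)             # all three agree closely
```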
EXERCISES

11.1 By the parallelepiped spanned by v₁, ..., v_n we mean the set of all x = ξ¹v₁ + ··· + ξⁿv_n, where 0 ≤ ξⁱ < 1. Show that the content of this parallelepiped is given by |det((vᵢ, v_j))|^{1/2}.

11.2 Express the content of the ellipsoid

$$\left\{\,x : \frac{(x^1)^2}{(a^1)^2} + \cdots + \frac{(x^n)^2}{(a^n)^2} \leq 1\,\right\}$$

in terms of the content of the unit ball.

11.3 Compute the Jacobian determinant of the map ⟨r, θ⟩ ↦ ⟨x, y⟩, where x = r cos θ, y = r sin θ.

11.4 Compute the Jacobian determinant of the map ⟨r, θ, φ⟩ ↦ ⟨x, y, z⟩, where x = r cos φ sin θ, y = r sin φ sin θ, z = r cos θ.

11.5 Compute the Jacobian determinant of the map ⟨r, θ, z⟩ ↦ ⟨x, y, z⟩, where x = r cos θ, y = r sin θ, z = z.
12. SUCCESSIVE INTEGRATION

In the case of one variable, i.e., the theory of integration on ℝ¹, the fundamental theorem of the calculus reduces the computation of the integral of a function to the computation of its antiderivative. The generalization of this theorem to n dimensions will be presented in a later chapter. In this section we will show how, in many cases, the computation of an n-dimensional integral can be reduced to n successive one-dimensional integrations.

Suppose we regard ℝⁿ, in some fixed way, as the direct product ℝⁿ = ℝᵏ × ℝˡ. We shall write every z ∈ ℝⁿ as z = ⟨x, y⟩, where x ∈ ℝᵏ and y ∈ ℝˡ.

Definition 12.1. We say that a contented function f is contented relative to the decomposition ℝⁿ = ℝᵏ × ℝˡ if there exists a set A_f ⊂ ℝᵏ of content zero (in ℝᵏ) such that

i) for each fixed x ∈ ℝᵏ, x ∉ A_f, the function f(x, ·) is a contented function on ℝˡ;

ii) the function ∫_{ℝˡ} f which assigns to x the number ∫_{ℝˡ} f(x, ·) is a contented function on ℝᵏ.

It is easy to see that the set of all such functions satisfies Axioms ℱ1 through ℱ3. (The only axiom that is not immediate is ℱ2. But this is an easy consequence of the fact that any translation T can be rewritten as T₁T₂, where T₁ is a translation in ℝᵏ and T₂ is a translation in ℝˡ.)

It is equally easy to verify that the rule which assigns to any such f the number

$$\int_{\mathbb{R}^k}\left(\int_{\mathbb{R}^l} f\right)$$

satisfies Axioms ∫1 through ∫4. The only one which isn't immediately obvious is ∫3. However, if p is any paving with supp f ⊂ |p|, then

$$f \leq \|f\|\,e_{|p|}$$

and

$$\int_{\mathbb{R}^k}\left(\int_{\mathbb{R}^l} \|f\|\,e_{|p|}\right) = \|f\| \int_{\mathbb{R}^k}\int_{\mathbb{R}^l} e_{|p|} = \|f\|\,\mu(|p|),$$

since

$$\int_{\mathbb{R}^k}\int_{\mathbb{R}^l} e_\square = \mu(\square)$$

for any rectangle (direct verification). Thus, by the uniqueness part of Theorem 10.1, we have

$$\int f = \int_{\mathbb{R}^k}\int_{\mathbb{R}^l} f(x, y)\,dy\,dx. \tag{12.1}$$

Note, in particular, that if f is also contented relative to the decomposition ℝⁿ = ℝˡ × ℝᵏ, then

$$\int_{\mathbb{R}^k}\int_{\mathbb{R}^l} f(x, y)\,dy\,dx = \int_{\mathbb{R}^l}\int_{\mathbb{R}^k} f(x, y)\,dx\,dy.$$

In particular, for such f the double integration is independent of the order.
In practice, all the functions that we shall come across will be contented relative to any decomposition of ℝⁿ. In particular, writing ℝⁿ = ℝ¹ × ··· × ℝ¹, we have

$$\int_{\mathbb{R}^n} f = \int_{\mathbb{R}^1} \cdots \int_{\mathbb{R}^1} f. \tag{12.2}$$

In terms of the rectangular coordinates x¹, ..., xⁿ, this last expression is usually written as

$$\int \cdots \int f(x^1, \ldots, x^n)\,dx^1 \cdots dx^n.$$

For this reason, the expression on the left-hand side of (12.2) is frequently written as

$$\int_{\mathbb{R}^n} f = \int \cdots \int f(x^1, \ldots, x^n)\,dx^1 \cdots dx^n.$$

Let us work out some simple examples illustrating the methods of integration given in the previous sections.

Example 1. Compute the volume of the intersection of the solid cone with vertex angle α (vertex at 0) with the spherical shell 1 ≤ r ≤ 2 (Fig. 8.12). By a Euclidean motion we may assume that the axis of the cone is the z-axis. If we introduce polar coordinates, we see that the set in question is the image of the set

$$\square_{\langle 1,\,0,\,0\rangle}^{\langle 2,\,2\pi,\,\alpha/2\rangle}, \qquad 1 \leq r \leq 2, \quad 0 \leq \varphi < 2\pi, \quad 0 \leq \theta \leq \alpha/2,$$

in the ⟨r, φ, θ⟩-space (Fig. 8.13).

By the change of variables formula and Exercise 11.4 we see that the volume in question is given by

$$\int r^2 \sin\theta = \int_1^2 \int_0^{2\pi} \int_0^{\alpha/2} r^2 \sin\theta \,d\theta\,d\varphi\,dr = 2\pi \int_1^2 \int_0^{\alpha/2} r^2 \sin\theta\,d\theta\,dr = 2\pi \int_1^2 [1 - \cos(\alpha/2)]\,r^2\,dr = 2\pi[1 - \cos(\alpha/2)]\left(\tfrac{8}{3} - \tfrac{1}{3}\right).$$
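As a check on Example 1, the following sketch (ours; α = π/3 is an arbitrary choice) evaluates the iterated integral numerically and compares it with the closed form 2π[1 − cos(α/2)](8/3 − 1/3).

```python
# Editorial check of Example 1 by a midpoint sum; the phi integration
# contributes the constant factor 2*pi, so only r and theta are summed.
import math

alpha = math.pi / 3
m = 200
hr, ht = 1.0 / m, (alpha / 2) / m

s = 0.0
for i in range(m):
    r = 1 + (i + 0.5) * hr
    for k in range(m):
        th = (k + 0.5) * ht
        s += r * r * math.sin(th) * hr * ht
s *= 2 * math.pi

exact = 2 * math.pi * (1 - math.cos(alpha / 2)) * (8/3 - 1/3)
print(s, exact)
```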
Example 2. Let B be a contented set in the plane, and let f₁ and f₂ be two contented functions defined on B. Let A be the set of all ⟨x, y, z⟩ ∈ 𝔼³ such that ⟨x, y⟩ ∈ B and f₁(x, y) ≤ z ≤ f₂(x, y). If G is any contented function on A, we can express the integral ∫_A G as

$$\int_A G = \int_B \left\{\int_{f_1(x,y)}^{f_2(x,y)} G(x, y, z)\,dz\right\} dx\,dy.$$

For example, compute the integral ∫_A z, where A is the set of all points in the unit ball lying above the surface z = x² + y² (Fig. 8.14). Thus

$$A = \{\,\langle x, y, z\rangle : x^2 + y^2 + z^2 \leq 1,\ z \geq x^2 + y^2\,\}.$$

We must have x² + y² ≤ a, where a² + a = 1 [so that a = (√5 − 1)/2], in order for ⟨x, y, z⟩ to belong to A. Then f₁(x, y) = x² + y², f₂(x, y) = √(1 − (x² + y²)), and

$$\int_{f_1(x,y)}^{f_2(x,y)} z\,dz = \tfrac{1}{2}\left[1 - (x^2 + y^2) - (x^2 + y^2)^2\right],$$

so that, using polar coordinates in the plane (and Exercise 11.3),

$$\int_A z = \tfrac{1}{2} \int_{x^2+y^2 \leq a} \left[1 - (x^2 + y^2) - (x^2 + y^2)^2\right] = \pi \int_0^{\sqrt{a}} r(1 - r^2 - r^4)\,dr.$$

As we saw in the last example, part of the problem of computing an integral as an iterated integral is to determine a good description of the domain of integration in terms of the decomposition of the vector space. It is usually a great help in visualizing the situation to draw a figure.
Example 3. Compute the volume enclosed by a surface of revolution. Here we are given a function f of one variable, and we consider the surface obtained by rotating the curve x = f(z), z₁ ≤ z ≤ z₂, around the z-axis (Fig. 8.15). We thus wish to compute μ(A), where

$$A = \{\,\langle x, y, z\rangle : x^2 + y^2 \leq f(z)^2,\ z_1 \leq z \leq z_2\,\}.$$

Here it is obviously convenient to use cylindrical coordinates, and we see that A is the image of the set

$$B = \{\,\langle r, \theta, z\rangle : r \leq f(z),\ 0 \leq \theta < 2\pi,\ z_1 \leq z \leq z_2\,\}$$

in the ⟨r, θ, z⟩-space. By Exercise 11.5, we wish to compute

$$\int_B r = \int_0^{2\pi}\int_{z_1}^{z_2}\int_0^{f(z)} r\,dr\,dz\,d\theta = 2\pi\int_{z_1}^{z_2}\left(\int_0^{f(z)} r\,dr\right)dz = 2\pi\int_{z_1}^{z_2} \frac{f(z)^2}{2}\,dz.$$

Thus

$$\mu(A) = \pi\int_{z_1}^{z_2} f(z)^2\,dz.$$
[Fig. 8.15: the curve x = f(z), z₁ ≤ z ≤ z₂, rotated about the z-axis.]
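The formula μ(A) = π∫f(z)² dz is easily tested. The sketch below is an added example; taking f(z) = z on [0, 1] gives a cone, and the midpoint sum recovers the familiar volume π/3.

```python
# Editorial check of the volume-of-revolution formula by a midpoint sum.
import math

def volume_of_revolution(f, z1, z2, m=100_000):
    h = (z2 - z1) / m
    return math.pi * sum(f(z1 + (i + 0.5) * h) ** 2 for i in range(m)) * h

print(volume_of_revolution(lambda z: z, 0.0, 1.0))   # approximately pi/3
print(math.pi / 3)
```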
EXERCISES
12.1 Compute the volume of the region between the surfaces z = x² + y² and z = x + y.

12.2 Find the volume of the region in 𝔼³ bounded by the plane z = 0, the cylinder x² + y² = 2x, and the cone z = +√(x² + y²).

12.3 Compute ∫_A (x² + y²)² dx dy dz, where A is the region bounded by the plane z = 2 and the surface x² + y² = 2z.

12.4 Compute ∫_A x, where

$$A = \{\,\langle x, y, z\rangle : x^2 + y^2 + z^2 \leq a^2,\ x \geq 0,\ y \geq 0,\ z \geq 0\,\}.$$

12.5 Compute

$$\int_A \left(\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2}\right)^{1/2},$$

where A is the region bounded by the ellipsoid

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1.$$

Let ρ be a nonnegative function (to be called the density of mass in the following discussion) defined on a domain V in 𝔼³. The total mass of ⟨V, ρ⟩ is defined as

$$M = \int_V \rho(x)\,dx.$$

If M ≠ 0, the center of gravity of ⟨V, ρ⟩ is the point C = ⟨C₁, C₂, C₃⟩, where

$$C_1 = \frac{1}{M}\int_V x_1\rho(x)\,dx, \quad C_2 = \frac{1}{M}\int_V x_2\rho(x)\,dx, \quad C_3 = \frac{1}{M}\int_V x_3\rho(x)\,dx.$$
12.6 A homogeneous solid (where ρ is constant) is given by x₁ ≥ 0, x₂ ≥ 0, x₃ ≥ 0, and

$$\frac{x_1^2}{a^2} + \frac{x_2^2}{b^2} + \frac{x_3^2}{c^2} \leq 1.$$

Find its center of gravity.

12.7 The unit cube has density ρ(x) = x₁x₃. Find its total mass and its center of gravity.

12.8 Find the center of mass of the homogeneous body bounded by the surfaces x² + y² + z² = a² and x² + y² = ax.

The notion of center of mass can, of course, be defined for a region in a Euclidean space of any dimension. Thus, for a region D in the plane with density ρ, the center of mass will be the point ⟨x₀, y₀⟩, where

$$x_0 = \frac{\int_D x\rho}{\int_D \rho} \quad \text{and} \quad y_0 = \frac{\int_D y\rho}{\int_D \rho}.$$

12.9 Let D be a region in the xz-plane which lies entirely in the half-plane x > 0. Let A be the solid in 𝔼³ obtained by rotating D about the z-axis. Show that μ(A) = 2πd μ(D), where d is the distance of the center of mass of the region D (with uniform density) from the z-axis. (Use cylindrical coordinates.) This is known as Guldin's rule.

Observe that in the definition of center of gravity we obtain a vector (i.e., a point in 𝔼³) as the answer by integrating each of its coordinates. This suggests the following definition: Let V be a finite-dimensional vector space, and let e₁, ..., e_k be a basis for V. Call a map f from 𝔼ⁿ to V (f is a vector-valued function on 𝔼ⁿ with values in V) contented if when we write f(x) = Σ fⁱ(x)eᵢ, each of the (real-valued) functions fⁱ is contented. Define the integral of f over D by

$$\int_D f = \sum_i \left(\int_D f^i\right) e_i.$$

12.10 Show that the condition that a function be contented and the value of its integral are independent of the choice of basis e₁, ..., e_k.
Let ξ be a point not in the closed domain D, which has a mass distribution ρ. The gravitational force on a particle of unit mass situated at ξ is defined to be the vector

$$\int_D \frac{\rho(x)(x - \xi)}{\|x - \xi\|^3}\,dx$$

(here x − ξ is an 𝔼³-valued function on 𝔼³). (See Fig. 8.16.)

12.11 Let D be the spherical shell bounded by two concentric spheres S₁ and S₂ (Fig. 8.16), with center at the origin. Let ρ be a mass distribution on D which depends only on the distance from the center, that is, ρ(x) = f(‖x‖). Show that the gravitational force vanishes at any ξ inside S₁.

12.12 ⟨D, ρ⟩ is as in Exercise 12.11. Show that the gravitational force on a point outside S₂ is the same as that due to a particle situated at the origin and whose mass is the total mass of D.
13. ABSOLUTELY INTEGRABLE FUNCTIONS

Thus far we have been dealing with bounded functions of compact support. In practice, we would like to be able to integrate functions which neither are bounded nor have compact support. Let f be a function defined on 𝔼ⁿ, let M be a nonnegative real number, and let A be a (bounded) contented subset of 𝔼ⁿ. Let f_A^M be the function

$$f_A^M(x) = \begin{cases} 0 & \text{if } x \notin A, \\ M \operatorname{sgn} f(x) & \text{if } x \in A \text{ and } |f(x)| > M, \\ f(x) & \text{if } x \in A \text{ and } |f(x)| \leq M. \end{cases}$$

Thus f_A^M is a bounded function of compact support. It is obtained from f by cutting f back to zero outside A and cutting f back to ±M when |f(x)| > M.

We say that a function f is absolutely integrable if

i) f_A^M is a contented function for all M > 0 and all bounded contented sets A; and

ii) for any ε > 0 there is a bounded contented set A_ε such that e_{A_ε}·f is bounded and, for all M > 0 and all bounded contented B with B ∩ A_ε = ∅,

$$\int \left|f_B^M\right| < \epsilon.$$

It is easy to check that the sum of two absolutely integrable functions is again absolutely integrable. Thus the set of absolutely integrable functions forms a vector space. Note that if f satisfies condition (i) and |f(x)| ≤ |g(x)| for all x, where g is absolutely integrable, then f is absolutely integrable.

Let f be an absolutely integrable function. Given any ε, choose a corresponding A_ε. Then for any numbers M₁ and M₂ ≥ max_{x∈A_ε}|f(x)| and for any contented sets A₁ ⊃ A_ε and A₂ ⊃ A_ε,

$$\left|\int f_{A_1}^{M_1} - \int f_{A_2}^{M_2}\right| < 2\epsilon.$$

If we let ε → 0 and choose a corresponding family of A_ε, then the above inequality implies that the limit of the ∫f_{A_ε}^{M} is independent of the choice of the A_ε. We define this limit to be ∫f.
We now list some very crude sufficient criteria for a function to be absolutely integrable. We will consider the two different causes of trouble: nonboundedness and lack of compact support.

Let f be a bounded function with f_A contented for any contented set A. Suppose |f(x)| ≤ C‖x‖^{−k} for large values of ‖x‖. Let B_r be the ball of radius r centered at the origin. If r₁ is large enough so that the inequality holds for ‖x‖ ≥ r₁, then for r₂ ≥ r₁ we have

$$\int \left|f_{B_{r_2} - B_{r_1}}\right| \leq C \int_{B_{r_2} - B_{r_1}} \|x\|^{-k} = C\,\Omega_n \int_{r_1}^{r_2} r^{n-1-k}\,dr,$$

where Ω_n is some constant depending on n (in fact, it is the "surface area" of the unit sphere in 𝔼ⁿ). If k > n, this last integral becomes

$$\frac{C\,\Omega_n}{n-k}\left(r_2^{\,n-k} - r_1^{\,n-k}\right),$$

which is ≤ [CΩ_n/(k − n)] r₁^{n−k}, and this tends to zero as r₁ → ∞ if k > n. Thus we can assert:

Let f be a bounded function such that f_A is contented for any contented set A. Suppose that |f(x)| → 0 as ‖x‖ → ∞ in such a way that ‖x‖ᵏ|f(x)| is bounded for some k > n. Then f is absolutely integrable.
Now let us examine the situation when f is of compact support but unbounded. Suppose first that there is a point x₀ such that f is bounded in the complement of any neighborhood of x₀. Suppose, furthermore, that

$$|f(x)| \leq C\,\|x - x_0\|^{-k}$$

for some constants C and k. Then if |f(x)| > M, we have ‖x − x₀‖^{−k} > M/C, or ‖x − x₀‖ < (C/M)^{1/k}.

Let B₁ be the ball of radius (C/M₁)^{1/k} centered at x₀. Then |f(x)| > M₁ implies that x ∈ B₁. Furthermore, for M₂ > M₁ we have

$$\left|\int f^{M_2} - \int f^{M_1}\right| \leq \int_{B_1 - B_2} |f| + \int_{B_2} \left|f^{M_2} - f^{M_1}\right|,$$

where B₂ is the ball of radius (C/M₂)^{1/k} centered at x₀. Thus

$$\left|\int f^{M_2} - \int f^{M_1}\right| \leq C\,\Omega_n \int_{\rho_2}^{\rho_1} r^{n-1-k}\,dr + M_2 V_n \rho_2^{\,n}, \qquad \rho_i = (C/M_i)^{1/k},$$

where Ω_n and V_n depend only on n. If k < n, the integral on the right becomes

$$\frac{C\,\Omega_n}{n-k}\left(\rho_1^{\,n-k} - \rho_2^{\,n-k}\right) = \frac{\Omega_n C^{n/k}}{n-k}\left(M_1^{(k-n)/k} - M_2^{(k-n)/k}\right) < \frac{\Omega_n C^{n/k}}{n-k}\,M_1^{(k-n)/k},$$

while M₂V_nρ₂ⁿ = V_nC^{n/k}M₂^{(k−n)/k} ≤ V_nC^{n/k}M₁^{(k−n)/k}. Thus

$$\left|\int f^{M_2} - \int f^{M_1}\right| \leq \mathrm{const} \cdot M_1^{(k-n)/k},$$

which can be made arbitrarily small by choosing M₁ large.

Thus if f has compact support and is such that f^M is contented for all M and |f(x)| ≤ C‖x − x₀‖^{−k} with k < n, then f is absolutely integrable.

More generally, let S be a bounded subset of an l-dimensional subspace of 𝔼ⁿ. Let d(x) denote the distance from x to S. Let f be a function of compact support with f^M contented for all M. If |f(x)| ≤ C d(x)^{−k} with k < n − l, then f is absolutely integrable. The proof is similar to that given above and is left to the reader.
Let {f_k} be a sequence of absolutely integrable functions. Under what conditions will ∫f_k → ∫f if f_k(x) → f(x)? Even if the sequence converges uniformly, there is no guarantee that the integrals converge. For instance, if f_k = (1/kⁿ) e_{□₀^{⟨k,...,k⟩}}, then |f_k(x)| ≤ 1/kⁿ, so that f_k approaches zero uniformly. On the other hand, ∫f_k = 1 for all k.
We say that a set of functions {f_k} is uniformly absolutely integrable if for any ε > 0 there is an A_ε which can be chosen independently of k such that

$$\int \left|(f_k)_B^M\right| < \epsilon \quad \text{for all } M,$$

whenever B ∩ A_ε = ∅.

We frequently verify that {f_k} is uniformly absolutely integrable by showing that there is an absolutely integrable function g such that |f_k(x)| ≤ |g(x)| for all k and x.

Let {f_k} be a uniformly absolutely integrable sequence of functions. Suppose that f_k → f uniformly. Suppose in addition that f is absolutely integrable. Then ∫f_k → ∫f. In fact, for any δ > 0 we can find a k₀ such that |f_k(x) − f(x)| < δ for all k > k₀ and all x. We can also find A_ε and M_ε such that

$$\left|\int f_k - \int f\right| \leq \left|\int f_k - \int (f_k)_{A_\epsilon}^{M_\epsilon}\right| + \left|\int (f_k)_{A_\epsilon}^{M_\epsilon} - \int f_{A_\epsilon}^{M_\epsilon}\right| + \left|\int f_{A_\epsilon}^{M_\epsilon} - \int f\right| < \epsilon + \epsilon + \delta\mu(A_\epsilon),$$

which can be made arbitrarily small by first choosing ε small (which then gives an A_ε) and then choosing δ small (which means choosing k₀ large).
The main applications that we shall make of the preceding ideas will be to
the problems of computing iterated integrals and of differentiating under the
integral sign.
Proposition 13.1. Let f be a function on ℝᵏ × ℝˡ. Suppose that the set of functions {f(x, ·)} is uniformly absolutely integrable, where x is restricted to lie in a bounded contented set K ⊂ ℝᵏ. Then the function e_{K×ℝˡ}·f is absolutely integrable, and

$$\int_{K\times\mathbb{R}^l} f = \int_K \int_{\mathbb{R}^l} f(x, y)\,dy\,dx = \int_{\mathbb{R}^l} \int_K f(x, y)\,dx\,dy.$$

Proof. By assumption, for any ε > 0 we can find M and A_ε ⊂ ℝˡ such that

$$\int \left|f(x, \cdot)_A^M\right| < \epsilon \quad \text{for all } x \in K, \quad \text{if } A \cap A_\epsilon = \emptyset. \tag{13.1}$$

Now for any bounded contented set B in ℝⁿ with B ∩ (K × A_ε) = ∅,

$$\int \left|(e_{K\times\mathbb{R}^l} f)_B^M\right| = \int e_K(x) \int \left|f(x, \cdot)_{B_x}^M\right| \leq \mu(K)\,\epsilon,$$

where B_x denotes the slice {y : ⟨x, y⟩ ∈ B}, which is disjoint from A_ε. This shows that e_{K×ℝˡ}f is absolutely integrable on ℝⁿ. Now choose a sufficiently large □ = □₁ × □₂ and an M such that

$$\left|\int_{K\times\mathbb{R}^l} f - \int_{K\times\mathbb{R}^l} f_\square^M\right| < \epsilon,$$

and also such that

$$\left|\int_{\mathbb{R}^l} f(x, \cdot) - \int_{\mathbb{R}^l} f_\square^M(x, \cdot)\right| < \epsilon \quad \text{for all } x \in K.$$

Then we have

$$\left|\int_K \int_{\mathbb{R}^l} f(x, y)\,dy\,dx - \int_K \int_{\mathbb{R}^l} f_\square^M(x, y)\,dy\,dx\right| \leq \mu(K)\,\epsilon,$$

and

$$\int_{K\times\mathbb{R}^l} f_\square^M = \int_K \int_{\mathbb{R}^l} f_\square^M(x, y)\,dy\,dx$$

by (12.1). Thus

$$\left|\int_{K\times\mathbb{R}^l} f - \int_K \int_{\mathbb{R}^l} f(x, y)\,dy\,dx\right| \leq (1 + \mu(K))\,\epsilon,$$

so that

$$\int_{K\times\mathbb{R}^l} f = \int_K \int_{\mathbb{R}^l} f(x, y)\,dy\,dx.$$

Finally, Eq. (13.1) shows that the function F(y) = ∫_K f(·, y) is absolutely integrable. In fact, using the same A and M as in (13.1), we get ∫|F_A^M| ≤ μ(K)ε whenever A ∩ A_ε = ∅. The argument already given, with the roles of x and y interchanged, then yields

$$\int_{K\times\mathbb{R}^l} f = \int_K \int_{\mathbb{R}^l} f(x, y)\,dy\,dx = \int_{\mathbb{R}^l} \int_K f(x, y)\,dx\,dy. \quad \square$$
An extension of the same argument shows the following.

Proposition 13.2. Let f be absolutely integrable on ℝⁿ and such that the functions f(x, ·) are uniformly absolutely integrable for x ∈ ℝᵏ. Then

$$\int f = \int\int f(x, y)\,dy\,dx.$$
We now turn our attention to the problem of differentiating under the integral sign.

Proposition 13.3. Let ⟨t, x⟩ ↦ F(t, x) be a function on I × ℝⁿ, where I = [a, b] ⊂ ℝ. Suppose that

i) F and ∂F/∂t are continuous functions on I × ℝⁿ;

ii) {(∂F/∂t)(t, ·)} is a uniformly absolutely integrable family of functions;

iii) F(t, ·) is absolutely integrable for all t ∈ I.

Let f(t) = ∫F(t, ·). Then f is a differentiable function of t and

$$f'(t) = \int_{\mathbb{R}^n} (\partial F/\partial t)(t, \cdot).$$

Proof. Let G(t) = ∫_{ℝⁿ}(∂F/∂t)(t, ·). Then G(t) is continuous, since we can pass to the limit under the integral sign in a uniformly absolutely integrable family of functions. Furthermore,

$$\int_a^t G(s)\,ds = \int_{\mathbb{R}^n} \int_a^t (\partial F/\partial t)(s, \cdot)\,ds$$

by Proposition 13.1. Thus

$$\int_a^t G(s)\,ds = \int_{\mathbb{R}^n} \left(F(t, \cdot) - F(a, \cdot)\right) = \int_{\mathbb{R}^n} F(t, \cdot) - \int_{\mathbb{R}^n} F(a, \cdot) = f(t) - f(a).$$

Differentiating this equation with respect to t gives the desired result. □
Finally, let us state the change of variables formula for absolutely integrable functions.

Let φ: U → V be a differentiable one-to-one map with differentiable inverse, where U and V are two open sets in ℝⁿ. Let f be an absolutely integrable function defined on V. Then (f ∘ φ)|det J_φ| is an absolutely integrable function on U and

$$\int_V f = \int_U (f \circ \varphi)|\det J_\varphi|.$$

Proof. To show that (f ∘ φ)|det J_φ| is absolutely integrable, let ε > 0 and choose an A_ε ⊂ V such that (ii) holds. Then Ā_ε is compact, and therefore so is φ⁻¹(Ā_ε). In particular, φ⁻¹(A_ε) is a bounded contented set and |det J_φ| is bounded on it. If B ∩ φ⁻¹(A_ε) = ∅, where B ⊂ U is bounded and contented, then

$$\int_B |\det J_\varphi|\,\left|(f \circ \varphi)^M\right| \leq \int_{\varphi(B)} \left|f^M\right| < \epsilon.$$

This shows that (f ∘ φ)|det J_φ| is absolutely integrable. The rest of the proposition then follows from

$$\int_{\varphi^{-1}(A_\epsilon)} (f^M \circ \varphi)\,|\det J_\varphi| = \int_{A_\epsilon} f^M$$

by letting ε → 0. □
EXERCISES

13.1 Evaluate the integral ∫_{−∞}^{∞} e^{−x²} dx. [Hint: Compute its square.]

13.2 Evaluate the integral ∫₀^∞ e^{−x²} x^{2k} dx.

13.3 Evaluate the volume of the unit ball in an odd-dimensional space. [Hint: Observe that the Jacobian determinant for "polar" coordinates is of the form r^{n−1} × j, where j is a function of the "angular variables". Thus the volume of the unit ball is of the form C∫₀¹ r^{n−1} dr, where C is determined by integrating j over the "angular variables". Evaluate C by computing (∫_{−∞}^{∞} e^{−x²} dx)ⁿ.]
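The squaring trick in the hint to Exercise 13.1 can be previewed numerically. The sketch below is ours; the truncation radius R is an assumption that exploits the rapid decay of e^{−x²}. It computes I = ∫e^{−x²} dx by a midpoint sum and confirms that I² ≈ π.

```python
# Editorial preview of Exercise 13.1: I^2 = pi, so I = sqrt(pi).
import math

m, R = 200_000, 10.0          # truncate at |x| <= R; the tail is negligible
h = 2 * R / m
I = sum(math.exp(-(-R + (i + 0.5) * h) ** 2) for i in range(m)) * h

print(I * I, math.pi)          # I^2 is close to pi
print(I, math.sqrt(math.pi))
```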
14. PROBLEM SET: THE FOURIER TRANSFORM

Let α = ⟨α₁, ..., α_n⟩ be an n-tuple whose entries are nonnegative integers. By D^α we shall mean the differential operator

$$D^\alpha = \frac{\partial^{\alpha_1 + \cdots + \alpha_n}}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}.$$

Let |α| = α₁ + ··· + α_n. Let Q(x, D) = Σ_{|α|≤k} a_α(x)D^α be the differential operator where each a_α is a polynomial in x. Thus if f is a Cᵏ-function on ℝⁿ, we have

$$Q(x, D)f = \sum_{|\alpha| \leq k} a_\alpha(x)\,D^\alpha f.$$

For any f which is C^∞ on ℝⁿ we set

$$\|f\|_Q = \sup_{x \in \mathbb{R}^n} |Qf(x)|.$$

We denote by 𝒮 the space of all f ∈ C^∞ such that

$$\|f\|_Q < \infty \tag{14.1}$$

for all Q. To see what this means, let us consider those Q with k = 0. Then (14.1) says that for any polynomial a(·) the function a·f is bounded. In other words, f vanishes at infinity faster than the inverse of any polynomial; that is,

$$\lim_{\|x\| \to \infty} \|x\|^p f(x) = 0$$

for all p. To say that (14.1) holds means that the same is true for any derivative of f as well.

If f is a C^∞-function of compact support, then (14.1) obviously holds, so f ∈ 𝒮. A more instructive example is provided by the function n given by

$$n(x) = e^{-\|x\|^2}.$$

Since lim_{r→∞} r^p e^{−r²} = 0 for any p, it follows that lim_{‖x‖→∞} a(x)n(x) = 0. On the other hand, it is easy to see (by induction) that D^α n(x) = P_α(x)n(x) for some polynomial P_α. Thus Qn(x) = P_Q(x)n(x), where P_Q is a polynomial. Thus n ∈ 𝒮.

It is easy to see that the space 𝒮 is a vector space. We shall introduce a notion of convergence on this space by saying that f_n → f if for every fixed Q,

$$\|f_n - f\|_Q \to 0.$$

(Note that the space 𝒮 is not a Banach space in that convergence depends on an infinity of different norms.)
EXERCISES

14.1 Let φ be a C^∞-function which grows slowly at infinity. That is, suppose that for every α there is a polynomial P_α such that

$$|D^\alpha \varphi(x)| \leq |P_\alpha(x)| \quad \text{for all } x.$$

Show that if f ∈ 𝒮, then φf ∈ 𝒮. Furthermore, the map of 𝒮 into itself sending f ↦ φf is continuous; that is, if f_n → f, then φf_n → φf.
For x = ⟨x¹, ..., xⁿ⟩ ∈ ℝⁿ and ξ = ⟨ξ₁, ..., ξ_n⟩ ∈ ℝⁿ* we denote the value of ξ at x by

$$\langle x, \xi\rangle = x^1\xi_1 + \cdots + x^n\xi_n.$$

Also for any α = ⟨α₁, ..., α_n⟩ and any x ∈ ℝⁿ we let

$$x^\alpha = (x^1)^{\alpha_1} \cdots (x^n)^{\alpha_n},$$

and similarly ξ^α = (ξ₁)^{α₁} ··· (ξ_n)^{α_n}, etc.

For any f ∈ 𝒮 we define its Fourier transform f̂, which is a function on ℝⁿ*, by

$$\hat f(\xi) = \int e^{-i\langle x, \xi\rangle} f(x)\,dx.$$

We note that

$$\hat f(0) = \int f \quad \text{and} \quad \|\hat f\|_\infty \leq \int |f|.$$

14.2 Show that f̂ possesses derivatives of all orders with respect to ξ and that

$$D_\xi^\alpha \hat f(\xi) = (-i)^{|\alpha|} \int e^{-i\langle x, \xi\rangle} x^\alpha f(x)\,dx;$$

in other words,

$$D_\xi^\alpha \hat f = \hat g, \quad \text{where } g(x) = (-i)^{|\alpha|} x^\alpha f(x).$$

14.3 Show that

$$\widehat{\frac{\partial f}{\partial x^j}}(\xi) = i\xi_j \hat f(\xi).$$

[Hint: Write the integral as an iterated integral and use integration by parts with respect to the jth variable.]

14.4 Conclude that the map f ↦ f̂ sends 𝒮(ℝⁿ) into 𝒮(ℝⁿ*) and that if f_n → 0 in 𝒮, then f̂_n → 0 in 𝒮(ℝⁿ*).

14.5 Show that

$$\widehat{T_w f}(\xi) = e^{-i\langle w, \xi\rangle}\hat f(\xi) \quad \text{for any } w \in \mathbb{R}^n.$$

Recall that T_w f(x) = f(x − w).

14.6 For any f ∈ 𝒮 define f̃ by

$$\tilde f(x) = \overline{f(-x)},$$

where the bar denotes complex conjugation. Show that

$$\hat{\tilde f}(\xi) = \overline{\hat f(\xi)}.$$

14.7 Let n = 1, and let f be an even real-valued function of x. Show that

$$\hat f(\xi) = \int \cos(x\xi)\,f(x)\,dx.$$

14.8 Let n(x) = e^{−(1/2)x²}, where x ∈ ℝ¹. Show that

$$\frac{d\hat n}{d\xi}(\xi) = -\xi\,\hat n(\xi),$$

and conclude that

$$\log \hat n(\xi) = -\tfrac{1}{2}\xi^2 + \text{const},$$

so that

$$\hat n(\xi) = \text{const} \times e^{-(1/2)\xi^2}.$$

Evaluate this constant as √(2π) by setting ξ = 0 and using Exercise 13.1. Thus

$$\hat n(\xi) = \sqrt{2\pi}\,e^{-(1/2)\xi^2}.$$
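The closed form just obtained can be verified by direct quadrature. In the following sketch (an editorial check; the truncation at |x| ≤ 20 and the use of the cosine form from Exercise 14.7 are our assumptions), the numerically computed transform of n(x) = e^{−x²/2} matches √(2π) e^{−ξ²/2}.

```python
# Editorial check of Exercise 14.8 by a midpoint quadrature.
import math

def fourier_transform(f, xi, R=20.0, m=100_000):
    """Approximate integral of exp(-i x xi) f(x) dx for even real f."""
    h = 2 * R / m
    s = 0.0
    for i in range(m):
        x = -R + (i + 0.5) * h
        s += math.cos(x * xi) * f(x) * h   # f even, so the sine part cancels
    return s

n = lambda x: math.exp(-0.5 * x * x)
for xi in (0.0, 0.5, 1.0, 2.0):
    print(xi, fourier_transform(n, xi),
          math.sqrt(2 * math.pi) * math.exp(-0.5 * xi * xi))
```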
14.9 Show that the limit lim_{ε→0} ∫_ε^∞ (sin x)/x dx exists. Let us call this limit d. Show that for any R > 0, lim_{ε→0} ∫_ε^∞ (sin Rx)/x dx = d.

If f ∈ 𝒮, we have seen that f̂ ∈ 𝒮(ℝⁿ*). We can therefore consider the function

$$\int e^{i\langle y, \xi\rangle}\hat f(\xi)\,d\xi.$$

The purpose of the next few exercises is to show that

$$f(y) = \frac{1}{(2\pi)^n}\int e^{i\langle y, \xi\rangle}\hat f(\xi)\,d\xi. \tag{14.2}$$

We first remark that since all integrals involved are absolutely convergent, it suffices to show that

$$f(y) = \lim_{R_1\to\infty}\cdots\lim_{R_n\to\infty}\frac{1}{(2\pi)^n}\int_{-R_n}^{R_n}\!\!\cdots\int_{-R_1}^{R_1}\hat f(\xi_1, \ldots, \xi_n)\,e^{i(y^1\xi_1 + \cdots + y^n\xi_n)}\,d\xi_1\cdots d\xi_n.$$

Substituting the definition of f̂ into this formula and interchanging the order of integration with respect to x and ξ, we get

$$\lim_{R_1\to\infty}\cdots\lim_{R_n\to\infty}\left(\frac{1}{2\pi}\right)^n\int\int_{-R_n}^{R_n}\!\!\cdots\int_{-R_1}^{R_1} f(x^1, \ldots, x^n)\,e^{i[(y^1 - x^1)\xi_1 + \cdots + (y^n - x^n)\xi_n]}\,d\xi_1\cdots d\xi_n\,dx.$$

It therefore suffices to evaluate this limit one variable at a time (provided the convergence is uniform, which will be clear from the proof). We have thus reduced the problem to functions of one variable. We must show that if f ∈ 𝒮(ℝ¹), then

$$f(y) = \lim_{R\to\infty}\frac{1}{2\pi}\int\int_{-R}^{R} f(x)\,e^{i(y-x)\xi}\,d\xi\,dx.$$

We shall first show that

$$f(y) = \lim_{R\to\infty}\frac{1}{4d}\int\int_{-R}^{R} f(x)\,e^{i(y-x)\xi}\,d\xi\,dx,$$

where d is given in Exercise 14.9.

14.10 Show that this last integral can be written as

$$\frac{1}{2d}\int_{-\infty}^{\infty} f(x)\,\frac{\sin R(y - x)}{y - x}\,dx = \frac{1}{d}\int_0^{\infty}\frac{f(y - u) + f(y + u)}{2}\,\frac{\sin Ru}{u}\,du.$$

14.11 Let

$$g(u) = \frac{f(y - u) + f(y + u)}{2} - f(y).$$

Show that g(0) = 0 and conclude that g(x) = xh(x) for 0 ≤ x ≤ 1, where h ∈ C¹. By integrating by parts, show that

$$\left|\int_\epsilon^1 g(u)\,\frac{\sin Ru}{u}\,du + \int_1^\infty g(u)\,\frac{\sin Ru}{u}\,du\right| \leq \text{const}\,\frac{1}{R}.$$

Conclude that

$$\lim_{R\to\infty}\frac{1}{d}\int_0^\infty \frac{f(y - u) + f(y + u)}{2}\,\frac{\sin Ru}{u}\,du = f(y).$$

This proves that

$$f(y) = \frac{1}{4d}\int e^{iy\xi}\hat f(\xi)\,d\xi.$$

14.12 Using Exercise 14.8, conclude that d = π/2.
Let f₁ ∈ 𝒮 and f₂ ∈ 𝒮. Define the function f₁ * f₂ by setting

$$f_1 * f_2(x) = \int f_1(x - y)f_2(y)\,dy.$$

Note that this makes good sense, since the integrand on the right clearly converges for each fixed value of x. We can be more precise. Since fᵢ ∈ 𝒮, we can, for any integer p, find a K_p such that

$$|f_i(y)| \leq \frac{K_p}{1 + \|y\|^p},$$

so that

$$\int_{\|y\| > R} |f_i(y)| \leq \frac{L_p R^n}{1 + R^p}.$$

Then

$$\int (1 + \|x\|^q)f_1(x - y)f_2(y)\,dy = \int_{\|y\| < \frac{1}{2}\|x\|} (1 + \|x\|^q)f_1(x - y)f_2(y)\,dy + \int_{\|y\| \geq \frac{1}{2}\|x\|} (1 + \|x\|^q)f_1(x - y)f_2(y)\,dy.$$

The first integral is at most

$$C_n\left(\tfrac{1}{2}\|x\|\right)^n (1 + \|x\|^q)\,\max_z |f_2(z)|\,\frac{K_p}{1 + (\tfrac{1}{2}\|x\|)^p},$$

while the second is at most

$$(1 + \|x\|^q)\,\max_u |f_1(u)|\,\frac{L_p(\tfrac{1}{2}\|x\|)^n}{1 + (\tfrac{1}{2}\|x\|)^p}.$$

By choosing p > q + n, we see that both terms go to zero. Thus

$$\lim_{\|x\|\to\infty} (1 + \|x\|^q)\,f_1 * f_2(x) = 0.$$

14.13 Show that

$$\frac{\partial}{\partial x^i}(f_1 * f_2) = \left(\frac{\partial f_1}{\partial x^i}\right) * f_2 = f_1 * \left(\frac{\partial f_2}{\partial x^i}\right).$$

Conclude that f₁ * f₂ ∈ 𝒮.
14.14 Show that if φ is any bounded continuous function on ℝⁿ, then

$$\int\int \varphi(x + y)f_1(x)f_2(y)\,dx\,dy = \int \varphi(u)\,(f_1 * f_2)(u)\,du.$$

14.15 Conclude that

$$\widehat{f_1 * f_2}(\xi) = \hat f_1(\xi)\,\hat f_2(\xi).$$

14.16 Show that

$$f * \tilde f(y) = \left(\frac{1}{2\pi}\right)^n \int |\hat f(\xi)|^2\,e^{i\langle y, \xi\rangle}\,d\xi.$$

14.17 Conclude that for any f ∈ 𝒮,

$$\int |f(x)|^2\,dx = \left(\frac{1}{2\pi}\right)^n \int |\hat f(\xi)|^2\,d\xi. \tag{14.3}$$

[Hint: Set y = 0 in Exercise 14.16.]
The following exercises use the Fourier transform to develop facts which are useful
in the study of partial differential equations. We will make use of these facts at the
end of the last chapter. The reader may prefer to postpone his study of these problems
until then.
On the space S, define the norm II II. by setting
IIfll~ = (21frnl(1 + 1I~1I2)'ljWI2 d~,
and the scalar product (f, g). by
(f' g). = 1(1+ 1I~1I2)'JWgW d~.
14.18 Let s = R be a nonnegative integer. Show that
IIfll~ = L '(R ~ I /)' f1Daf(x) 12 dx,
lal~Ra. a .
where a! = a1! ... an!. [Use the multinomial theorem, a repeated application of
Exercise 14.3, and Eq. (15.3).]
We thus see that IIfliR measures the size of f and its derivatives to order R in
the square integral norm. It is helpful to think of II II. as a generalization of this notion
of size, where now s can be an arbitrary real number.
Note that
IIfll.::::;; IIfllt if s::::;; t.
For any real s define the operator K^s by setting

\widehat{K^s f}(\xi) = (1+\|\xi\|^2)^s\,\hat f(\xi).
14.19 Show that the operator K = K¹ is given by

Kf = f - \sum_i \frac{\partial^2 f}{(\partial x^i)^2}.
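In one variable the content of Exercise 14.19 is that f − f″ has Fourier transform (1 + ξ²)\hat f(ξ). A hedged numerical sketch (ours, for n = 1):

    # Sketch: compare K f computed via the multiplier (1 + xi^2) with
    # f - f'' computed by finite differences; they agree up to O(dx^2).
    import numpy as np

    N, L = 1024, 40.0
    x = np.linspace(-L/2, L/2, N, endpoint=False)
    dx = x[1] - x[0]
    xi = 2 * np.pi * np.fft.fftfreq(N, d=dx)

    f = np.exp(-x**2)
    Kf = np.fft.ifft((1 + xi**2) * np.fft.fft(f)).real  # multiplier form
    f2 = np.gradient(np.gradient(f, dx), dx)            # crude f''
    print(np.max(np.abs(Kf - (f - f2))))                # small (~1e-2)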
14.20 Show that for any real numbers s and t,

\|K^s f\|_t = \|f\|_{t+2s}

and

(K^s f, g)_t = (f, K^s g)_t.
14.21 Show that K^{s+t} = K^s ∘ K^t, so that, in particular, K^s is invertible for all s.
We now define the space H^s to be the completion of S under the norm ‖ ‖_s. The
space H^s is a Hilbert space with a scalar product ( , )_s. We can think of the elements
of H^s as "generalized functions with generalized derivatives up to order s". By con-
struction, the space S is a dense subspace of H^s in the norm ‖ ‖_s. We note that Exer-
cise 14.20 implies that the operator K^s can be extended to an isometric map of H^t into
H^{t-2s}. We shall also denote this extended map by K^s. By Exercise 14.21,

K^{-s}

is the inverse of K^s, so that K^s is a norm-preserving isomorphism of H^t onto H^{t-2s}.
14.22 Let u ∈ H^s and v ∈ H^{-s}. Show that

|(u, v)_0| \le \|u\|_s\,\|v\|_{-s}.

Thus we can extend ⟨u, v⟩ ↦ (u, v)₀ to a function on H^s × H^{-s} which is linear in
u and antilinear in v [that is, (u, av_1 + bv_2)_0 = \bar a(u, v_1)_0 + \bar b(u, v_2)_0] and satisfies the
above inequality. Thus any v ∈ H^{-s} defines a bounded linear function, l, on H^s by
l(u) = (u, v)_0.
14.23 Conversely, let l be a bounded linear function on H^s. Show that there is a
v ∈ H^{-s} with l(u) = (u, v)_0 for all u ∈ H^s. [Hint: Consider v = K^s w,
where w is a suitable element of H^s, using Theorem 2.4 of Chapter 5.]
14.24 Show that

\|v\|_{-s} = \sup_{u\in H^s,\ u\neq 0}\frac{|(u, v)_0|}{\|u\|_s}.

(Exercise 14.22 gives one inequality. If v ≠ 0, take u = K^{-s}v to get

(u, v)_0 = \|v\|_{-s}^2 = \|u\|_s\,\|v\|_{-s}

in order to get an equality.)
14.25 Let 2s > n (where our functions are defined on ℝⁿ). Show that for any f ∈ S
we have

\sup_y |f(y)| \le C\,\|f\|_s   (Sobolev's inequality),

where C depends only on s and n. (Use Eq. (14.2), Schwarz's inequality, and the fact that the integral on the right of
the inequality is absolutely convergent.)
Sobolev's inequality shows that the injection of S into C(ℝⁿ) extends to a continuous
injection of H^s into C(ℝⁿ), where C(ℝⁿ) is given the uniform norm. We can thus
regard the elements of H^s as actual functions on ℝⁿ if s > n/2.
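For n = 1 and s = 1, the proof outlined above gives the explicit constant C² = (2π)^{-1}\int(1+ξ²)^{-s}\,dξ. The following sketch (our illustration, not the text's) checks the bound numerically for a Gaussian.

    # Sketch: Sobolev's inequality sup|f| <= C ||f||_s in one variable,
    # with C^2 = (2 pi)^{-1} \int (1 + xi^2)^{-s} d xi from Schwarz.
    import numpy as np

    N, L = 4096, 80.0
    x = np.linspace(-L/2, L/2, N, endpoint=False)
    dx = x[1] - x[0]
    xi = 2 * np.pi * np.fft.fftfreq(N, d=dx)
    dxi = 2 * np.pi / (N * dx)

    f = np.exp(-x**2)
    fhat2 = np.abs(dx * np.fft.fft(f))**2

    s = 1.0                                   # any s with 2s > 1 works
    norm_s = np.sqrt(np.sum((1 + xi**2)**s * fhat2) * dxi / (2*np.pi))
    C = np.sqrt(np.sum((1 + xi**2)**(-s)) * dxi / (2*np.pi))
    print(f.max(), C * norm_s)                # 1.0 <= ~1.12: bound holds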
By induction on |α| we can assert that for s > n/2, any f ∈ H^{|α|+s} has |α| con-
tinuous derivatives and

\sup_x |D^\alpha f(x)| \le C\,\|f\|_{|\alpha|+s}.   (14.4)
14.26 Let Ω be a bounded open subset of ℝⁿ. Let φ ∈ S satisfy supp φ ⊂ Ω. Show
that

|\hat\varphi(\xi)| \le \mu(\Omega)^{1/2}\,\|\varphi\|_0 \quad\text{for all } \xi.

14.28 Show that

|\xi^\alpha\hat\varphi(\xi)| \le \mu(\Omega)^{1/2}\,\|D^\alpha\varphi\|_0,

and conclude that

(1+\|\xi\|^2)^k\,|\hat\varphi(\xi)|^2 \le \mu(\Omega)\,\|\varphi\|_k^2 \quad\text{for all } \xi.
14.29 More generally, let ψ be a function in S which satisfies ψ(x) = 1 for all x ∈ Ω,
and let φ ∈ S satisfy supp φ ⊂ Ω. Show that

|\hat\varphi(\xi)| = |(\varphi, \psi_\xi)_0| \le \|\varphi\|_s\,\|\psi_\xi\|_{-s},

where ψ_ξ(x) = ψ(x)e^{-i⟨x,ξ⟩}, and that

|D^\alpha\hat\varphi(\xi)| \le \|\varphi\|_s\,\|\psi_\xi^\alpha\|_{-s},

where ψ_ξ^α(x) = x^α ψ(x)e^{-i⟨x,ξ⟩}.
Let us denote by H_Ω^s the completion under ‖ ‖_s of the space of those functions in S
whose supports lie in Ω. According to Exercise 14.29, any φ ∈ H_Ω^s defines an actual
function \hat\varphi of ξ which is differentiable and satisfies

|D^\alpha\hat\varphi(\xi)| \le \|\varphi\|_s\,\|\psi_\xi^\alpha\|_{-s},

where ‖ψ_ξ^α‖_{-s} depends only on Ω, α, ξ, and −s, and is independent of φ. Further-
more,

\|\varphi\|_s^2 = (2\pi)^{-n}\int (1+\|\xi\|^2)^s\,|\hat\varphi(\xi)|^2\,d\xi.
14.30 Let s < t. Then the injection H_Ω^t → H_Ω^s is a compact mapping. That is, if
{φ_i} is a sequence of elements of H_Ω^t such that ‖φ_i‖_t ≤ 1 for all i, then we can select a
subsequence {φ_{i_j}} which converges in ‖ ‖_s. [Hint: By Exercise 14.29, the sequence of
functions \hat\varphi_i(\xi) is bounded and equicontinuous on {ξ: ‖ξ‖ ≤ r} for any fixed r. We
can thus choose a subsequence which converges uniformly, and therefore a sub-subse-
quence which converges on {ξ: ‖ξ‖ ≤ r} for all r (the uniformity possibly depending
on r). Then if {φ_{i_j}} is this sub-subsequence,

\|\varphi_{i_j}-\varphi_{i_k}\|_s^2 = (2\pi)^{-n}\int(1+\|\xi\|^2)^s\,|\hat\varphi_{i_j}(\xi)-\hat\varphi_{i_k}(\xi)|^2\,d\xi
 = (2\pi)^{-n}\int_{\|\xi\|\le r}(1+\|\xi\|^2)^s\,|\hat\varphi_{i_j}(\xi)-\hat\varphi_{i_k}(\xi)|^2\,d\xi + (2\pi)^{-n}\int_{\|\xi\|>r}(1+\|\xi\|^2)^s\,|\hat\varphi_{i_j}(\xi)-\hat\varphi_{i_k}(\xi)|^2\,d\xi
 \le (2\pi)^{-n}\int_{\|\xi\|\le r}(1+\|\xi\|^2)^s\,|\hat\varphi_{i_j}(\xi)-\hat\varphi_{i_k}(\xi)|^2\,d\xi + 2(1+r^2)^{s-t}\{\|\varphi_{i_j}\|_t^2+\|\varphi_{i_k}\|_t^2\}.]
CHAPTER 9
DIFFERENTIABLE MANIFOLDS
Thus far our study of the calculus has been devoted to the study of properties
of and operations on functions defined on (subsets of) a vector space. One of
the ideas used was the approximation of possibly nonlinear functions at each
point by linear functions. In this chapter we shall generalize our notion of space
to include spaces which cannot, in any natural way, be regarded as open subsets
of a vector space. One of the tools we shall use is the "approximation" of such a
space at each point by a linear space.
Suppose we are interested in studying functions on (the surface of) the unit
sphere in E³. The sphere is a two-dimensional object in the sense that we can
describe a neighborhood of every point of the sphere in a bicontinuous way by
two coordinates. On the other hand, we cannot map the sphere in a bicontinuous
one-to-one way onto an open subset of the plane (since the sphere is compact and
an open subset of E² is not). Thus pieces of the sphere can be described by open
subsets of E², but the whole sphere cannot. Therefore, if we want to do calculus
on the whole sphere at once, we must introduce a more general class of spaces
and study functions on them.
Even if a space can be regarded as a subset of a vector space, it is conceivable
that it cannot be so regarded in any canonical way. Thus the state of a (homo-
geneous ideal) gas in equilibrium is specified when one gives any two of the three
parameters: temperature, pressure, or volume. There is no reason to prefer any
two to the third. The transition from one set of parameters to the other is given
by a one-to-one bidifferentiable map. Thus any function of the states of the
gas which is a differentiable function in terms of one choice of parameters is
differentiable in terms of any other. Thus it makes sense to talk of differentiable
functions on the states of the gas. However, a function which is linear in terms
of one choice of parameters need not be linear in terms of the other. Thus it
doesn't really make sense to talk of linear functions on the states of the gas.
In such a situation we would like to know what properties of functions and what
operations make sense in the space and are not artifacts of the description we
give of the space.
Finally, even in a vector space it is sometimes convenient to introduce
"nonlinear coordinates" for the solution of specific problems: for example, polar
coordinates in Exercises 11.3 and 11.4, Chapter 8. We would therefore like to
know how various objects change when we change coordinates and, if possible,
to introduce notation which is independent of the coordinate system.
We will begin our formal discussion with the definition of differentiable
manifolds. The basic idea is similar to the one that is used in everyday life to
describe the surface of the earth. One gives a collection of charts describing small
overlapping portions of the globe. We can piece the whole picture together by
seeing how the charts match up.
Fig. 9.1
1. ATLASES
Let M be a set. Let V be a Banach space. (For almost all our applications we
shall take V to be ℝⁿ for some integer n.) A V-atlas of class C^k on M is a collec-
tion 𝒜 of pairs (Uᵢ, φᵢ), called charts, where Uᵢ is a subset of M and φᵢ is a bijec-
tive map of Uᵢ onto an open subset of V, subject to the following conditions
(Fig. 9.1):

A1. For any (Uᵢ, φᵢ) ∈ 𝒜 and (Uⱼ, φⱼ) ∈ 𝒜 the sets φᵢ(Uᵢ ∩ Uⱼ) and
φⱼ(Uᵢ ∩ Uⱼ) are open subsets of V, and the maps

φᵢ ∘ φⱼ⁻¹: φⱼ(Uᵢ ∩ Uⱼ) → φᵢ(Uᵢ ∩ Uⱼ)

are differentiable of class C^k.

A2. ⋃ Uᵢ = M.

The functions φᵢ ∘ φⱼ⁻¹ are called the transition functions of the atlas 𝒜.
The following are examples of sets with atlases.

Example 1. The trivial example. Let M be an open subset of V. If we take 𝒜
to consist of the single element (U, φ), where U = M and φ: U → V is the
identity map, then Axioms A1 and A2 are trivially fulfilled.
Example 2. The sphere. Let M = Sⁿ denote the subset of ℝ^{n+1} given by
(x¹)² + ⋯ + (x^{n+1})² = 1. Let the set U₁ consist of those points for which
x^{n+1} > −1, and let U₂ consist of those points for which x^{n+1} < 1. Let

φ₁: U₁ → ℝⁿ

be given by

y^i \circ \varphi_1(x^1,\ldots,x^{n+1}) = \frac{x^i}{1+x^{n+1}},\qquad i = 1,\ldots,n,

where y¹, …, yⁿ are coordinates on ℝⁿ. Thus the map φ₁ is given by the
projection from the "south pole", ⟨0, …, 0, −1⟩, to ℝⁿ regarded as the
equatorial plane (see Fig. 9.2). Similarly, define φ₂ by

y^i \circ \varphi_2(x^1,\ldots,x^{n+1}) = \frac{x^i}{1-x^{n+1}}.

Then φ₁(U₁ ∩ U₂) = φ₂(U₁ ∩ U₂) = {y ∈ ℝⁿ: y ≠ 0}. Now

\sum_i \bigl(y^i\circ\varphi_1(x^1,\ldots,x^{n+1})\bigr)^2 = \frac{(x^1)^2+\cdots+(x^n)^2}{(1+x^{n+1})^2} = \frac{1-(x^{n+1})^2}{(1+x^{n+1})^2} = \frac{1-x^{n+1}}{1+x^{n+1}}.

Thus

\varphi_2(x) = \frac{\varphi_1(x)}{\|\varphi_1(x)\|^2}.

In other words, the map φ₂ ∘ φ₁⁻¹, defined for all y ≠ 0, is given by

\varphi_2\circ\varphi_1^{-1}(y) = \frac{y}{\|y\|^2}.

Thus conditions A1 and A2 are fulfilled.
Fig. 9.2
Note that the atlas we gave for the sphere contains only two charts (each
given by polar projection). An atlas of the earth usually contains many more
charts. In other words, many different atlases can be used to describe the same
set. We shall return to this point later.
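It is instructive to verify the transition function numerically. The sketch below (ours, for n = 2) inverts φ₁ explicitly and confirms φ₂ ∘ φ₁⁻¹(y) = y/‖y‖²; the helper names are of course not from the text.

    # Sketch: the two stereographic charts on S^2 and their transition map.
    import numpy as np

    def phi1(p):                      # projection from the south pole
        return p[:2] / (1 + p[2])

    def phi2(p):                      # projection from the north pole
        return p[:2] / (1 - p[2])

    def phi1_inv(y):                  # x^i = 2 y^i/(1+r^2), x^3 = (1-r^2)/(1+r^2)
        r2 = np.dot(y, y)
        return np.array([2*y[0], 2*y[1], 1 - r2]) / (1 + r2)

    y = np.array([0.7, -1.3])
    print(phi2(phi1_inv(y)), y / np.dot(y, y))   # equal: y / ||y||^2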
Fig. 9.3
Example 3. The circle. The circle S¹ is a "one-dimensional sphere" and therefore
has an atlas as described in Example 2. We wish to describe a different atlas
on S¹. Regard S¹ as the unit circle x₁² + x₂² = 1, and consider the function θ₁,
defined in a neighborhood of ⟨1, 0⟩ on the upper semicircle of S¹, which gives
the angle from the point on S¹ to ⟨1, 0⟩ (see Fig. 9.3). As we move counter-
clockwise around the circle, this function is well defined until we hit ⟨1, 0⟩
again. We will take, as the first chart in our atlas, (U₁, θ₁), where U₁ =
S¹ − {⟨1, 0⟩} and θ₁ is the function defined above. Let U₂ = S¹ − {⟨0, 1⟩},
and define θ₂ to be π/2 plus the angle (measured counterclockwise) from
⟨0, 1⟩ (see Fig. 9.4). Now U₁ ∩ U₂ = S¹ − {⟨1, 0⟩, ⟨0, 1⟩}, and

θ₁(U₁ ∩ U₂) = (0, 2π) − {π/2}.

Fig. 9.4

The map θ₂ ∘ θ₁⁻¹ is given by

θ₂ ∘ θ₁⁻¹(x) = x + 2π   if 0 < x < π/2,
θ₂ ∘ θ₁⁻¹(x) = x        if π/2 < x < 2π.
Example 4. The product of two atlases. Let 𝒜 = {(Uᵢ, φᵢ)} be a V₁-atlas on a set
M, and let ℬ = {(Wⱼ, ψⱼ)} be a V₂-atlas on a set N, where V₁ and V₂ are
Banach spaces. Then the collection 𝒞 = {(Uᵢ × Wⱼ, φᵢ × ψⱼ)} is a (V₁ × V₂)-
atlas on M × N. Here φᵢ × ψⱼ(p, q) = ⟨φᵢ(p), ψⱼ(q)⟩ if ⟨p, q⟩ ∈ Uᵢ × Wⱼ.
It is easy to check that 𝒞 satisfies conditions A1 and A2. We shall call 𝒞 the
product of 𝒜 and ℬ and write 𝒞 = 𝒜 × ℬ.
For instance, let M = (0, 1) ⊂ ℝ¹ and N = S¹. Then we can regard
M × N as a cylinder or an annulus. If M = N = S¹, then M × N is a torus.
Cylinder Annulus Torus
It is an instructive exercise to write down the atlases and transition functions
explicitly in these cases.
Example 5. As a generalization of our first example, let S be a submanifold of
an (n+m)-dimensional vector space X, as defined in Section 12 of Chapter 3.
For each neighborhood N defined there, the set S ∩ N, together with the map φ
which is defined as the projection π₁ restricted to S, provides a chart with
values in V (where X is viewed as V × W). In such a neighborhood N the set S
is presented as the graph of a function F. In other words,

S ∩ N = {⟨x, F(x)⟩ ∈ V × W: x ∈ π₁(S)},

where F is a smooth map of A = π₁(S ∩ N) into W. Let N′ be another such
neighborhood with corresponding projection π₁′ (where now X is identified with
V × W in some other way). Then φ′ ∘ φ⁻¹(x) = π₁′(x, F(x)), which shows
that φ′ ∘ φ⁻¹ is a smooth map. Thus every submanifold in the sense of Chapter 3
possesses an atlas.
Exercise. Let!P" (projective n-space) denote the space of all lines through the origin
in 1R"+1. Any such line is determined by a nonzero vector lying on the line. Two such
vectors, -<Xl, ... , x,,+l ~ and -< y1, ... , y,,+l ~ , determine the same line if and only
if they differ by a factor, that is, yi = AX' for all i, where A is some (nonzero) real
number. We can thus regard an element of !P" as an equivalence class of nonzero
vectors. For each i between 1 and n +1, let U. C !P" be the set of those elements
coming from vectors with Xi ~ O. Map
by sending
~
1 i-I i+1 ,,+1 ~
1 ,,+1 X X X X
-<x, ... ,x ~~ -", ... ,-.-,-.-, ... ,-.- .x' x· x' x'
Show that the map ai is well defined and that {(Ui, ai)} is an atlas on P".
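The point of the exercise is that each ratio x^j/x^i is unchanged when x is replaced by λx, so αᵢ depends only on the line. A small sketch (ours; the helper alpha is hypothetical and uses 0-based indices):

    # Sketch: alpha_i is well defined on P^n.
    import numpy as np

    def alpha(i, x):                  # drop the i-th entry of x / x^i
        return np.delete(x / x[i], i)

    x = np.array([1.0, -2.0, 0.5])
    print(alpha(0, x))
    print(alpha(0, 3.7 * x))          # the same point of R^2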
2. FUNCTIONS, CONVERGENCE
Let 𝒜 be a V-atlas of class C^k on a set M. Let f be a real-valued function defined
on M. For a chart (Uᵢ, φᵢ) we obtain a function fᵢ defined on φᵢ(Uᵢ) by setting

f_i = f \circ \varphi_i^{-1}.   (2.1)

The function fᵢ can be regarded as the "local expression of f" in terms of the
chart (Uᵢ, φᵢ). In general, the functions fᵢ will look quite different from one
another. For example, let M = Sⁿ, let 𝒜 be the atlas described, and let f be the
function on the sphere assigning to the point ⟨x¹, …, x^{n+1}⟩ the value x^{n+1}.
Then

f_1(y) = f\circ\varphi_1^{-1}(y) = -1 + \frac{2}{1+\|y\|^2},

while

f_2(y) = f\circ\varphi_2^{-1}(y) = 1 - \frac{2}{1+\|y\|^2},

as one can check by solving the equations.
Returning to the general discussion, we observe that the functions fᵢ are
not completely independent of one another. In fact, it follows from the defini-
tion (2.1) that we have

f_i \circ \varphi_i \circ \varphi_j^{-1} = f_j \quad\text{on } \varphi_j(U_i \cap U_j).   (2.2)

[Thus in the example cited above we indeed have f₂(y) = f₁(y/‖y‖²), as is
required by (2.2).]
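This instance of the transition law can be checked directly; the following sketch (ours) does so numerically on S², using the local expressions just computed.

    # Sketch: f2(y) = f1(y / ||y||^2), the instance of (2.2) noted above.
    import numpy as np

    def f1(y): return -1 + 2 / (1 + np.dot(y, y))
    def f2(y): return  1 - 2 / (1 + np.dot(y, y))

    y = np.array([0.4, 1.9])
    print(f2(y), f1(y / np.dot(y, y)))   # equal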
We now come to a simple but important observation. Suppose we start
with a collection of functions {fᵢ}, each fᵢ defined on φᵢ(Uᵢ), and such that (2.2)
holds. Then there exists a unique function f on M such that fᵢ = f ∘ φᵢ⁻¹. In
fact, define f by setting f(p) = fᵢ(φᵢ(p)) if p ∈ Uᵢ. For f to be well defined, we
must be sure that this definition is consistent, i.e., that if p is also in Uⱼ, then
fᵢ(φᵢ(p)) = fⱼ(φⱼ(p)), but this is exactly what (2.2) says.
We can thus think of a real-valued function in two ways: as either
i) an object defined invariantly on M, i.e., a map from M to ℝ, or
ii) a collection of objects (in this case functions), one defined for each chart
and satisfying certain "transition laws", namely (2.2).
This dual way of looking at objects on M will recur quite frequently in what
follows.
Let M be a set with an atlas of class C^k. We will say that a function f is of
class C^l (l ≤ k) if each of the functions fᵢ defined by (2.1) is of class C^l. Note
that since l ≤ k, this can happen without any interference from (2.2). If fᵢ ∈ C^l
and φᵢ ∘ φⱼ⁻¹ ∈ C^k (k ≥ l), then fⱼ = fᵢ ∘ (φᵢ ∘ φⱼ⁻¹) ∈ C^l. If l were larger than k,
then in general fⱼ would not be of class C^l if fᵢ were, and there would be very
few functions of class C^l.
Since we will not wish to constantly specify degrees of differentiability of
our atlas, from now on when we speak of an atlas we shall mean an atlas of class C^∞.
Let M be a set with an atlas 𝒜. We shall say that a sequence of points
{x_k ∈ M} converges to x ∈ M if
i) there exists a chart (Uᵢ, φᵢ) ∈ 𝒜 and an integer Nᵢ such that x ∈ Uᵢ and
for all k > Nᵢ, x_k ∈ Uᵢ;
ii) φᵢ(x_k), k > Nᵢ, converges to φᵢ(x).
Note that if (Uⱼ, φⱼ) is any other chart with x ∈ Uⱼ, then there exists an Nⱼ
such that x_k ∈ Uⱼ for k > Nⱼ and φⱼ(x_k) → φⱼ(x). In fact, choose Nⱼ so
that φᵢ(x_k) ∈ φᵢ(Uᵢ ∩ Uⱼ) for all k ≥ Nⱼ. (This is possible since φᵢ(Uᵢ ∩ Uⱼ)
is open by A1.) The fact that the φⱼ(x_k) converge to φⱼ(x) follows from the
continuity of φⱼ ∘ φᵢ⁻¹. It thus makes good sense to say that {x_k} converges to x.
Warning. It does not make sense to say that a sequence {x_k} is a Cauchy
sequence. Thus, for example, let M = Sⁿ with the atlas described above. If {x_k}
is a sequence of points converging to the north pole in Sⁿ, then φ₁(x_k) → 0,
while φ₂(x_k) → ∞. This example becomes even more sticky if we remove the
north pole, i.e., let M = Sⁿ − {⟨0, …, 0, 1⟩} and define the charts as before.
Then {x_k} has no limit (in M). Clearly, {φ₁(x_k)} is a Cauchy sequence, while
{φ₂(x_k)} is not.
Once we have a notion of convergence, we can talk about such things as
open sets and closed sets. We could also define them directly. For instance, a set
U is open if φᵢ(U ∩ Uᵢ) is an open subset of φᵢ(Uᵢ) for all charts (Uᵢ, φᵢ), and so on.
EXERCISES
2.1 Show that the above definition of a set's being open is consistent, i.e., that there
exist nonempty open sets. (In fact, each of the Uᵢ's is open.)
2.2 Show that a sequence {x_k} converges to x if and only if for every open set U
containing x there is an N_U with x_k ∈ U for k > N_U.
Let 𝒜 = {(Uᵢ, φᵢ)} be an atlas on M, and let U be an open subset of M relative
to this atlas. Let 𝒜 ↾ U be the collection of all pairs (Uᵢ ∩ U, φᵢ ↾ (Uᵢ ∩ U)). It is
easy to check that 𝒜 ↾ U is an atlas on U. We shall call it the restriction of
𝒜 to U.
Let f be a function defined on the open set U. We say that f is of class C^l
on U if it is of class C^l relative to the atlas 𝒜 ↾ U on U. For later convenience
we shall say that a function f defined on a subset of M is of class C^l if
i) the domain of f is some open set U of M, and
ii) f is of class C^l on U.
3. DIFFERENTIABLE MANIFOLDS
In our discussion of the examples in Section 1, the particular choice of atlas that
we made in each case was rather arbitrary. We could equally well have intro-
duced a different atlas in each case without changing the class of differentiable
functions, or the class of open sets, or convergent sequences, and so on. We
therefore introduce an equivalence relation between atlases on M:
Let 𝒜₁ and 𝒜₂ be atlases on M. We say that they are equivalent if their
union 𝒜₁ ∪ 𝒜₂ is again an atlas on M.
The crucial condition is that A1 still hold for the union. This means that
for any charts (Uᵢ, φᵢ) ∈ 𝒜₁ and (Wⱼ, ψⱼ) ∈ 𝒜₂ the sets φᵢ(Uᵢ ∩ Wⱼ) and
ψⱼ(Uᵢ ∩ Wⱼ) are open and φᵢ ∘ ψⱼ⁻¹ is a differentiable map of ψⱼ(Uᵢ ∩ Wⱼ) onto
φᵢ(Uᵢ ∩ Wⱼ) with a differentiable inverse.
It is clear that the relation introduced is an equivalence relation. Further-
more, it is an easy exercise to check that if f is a function of class C^l with respect
to a given atlas, it is of class C^l with respect to any equivalent one. The same is
true for the notions of open set and convergence.
Definition 3.1. A set M together with an equivalence class of atlases on M is
called a differentiable manifold if it satisfies the "Hausdorff property": For any
two points x₁ ≠ x₂ of M there are open sets U₁ and U₂ with x₁ ∈ U₁ and x₂ ∈ U₂ and
U₁ ∩ U₂ = ∅.
In what follows we shall (by abuse of the language) denote a differentiable
manifold by M, where the equivalence class of atlases is understood. By an
atlas of M we shall then mean an atlas belonging to the given equivalence class,
and by a chart of M we shall mean a chart belonging to some atlas of M.
We shall also adopt the notational convention that V is the Banach space
where the charts on M take their values (and shall say that M is a V-manifold).
If there are several manifolds, M₁, M₂, etc., under discussion, we shall denote
the corresponding vector spaces by V₁, V₂, etc. If V = ℝⁿ, we say that M is
an n-dimensional manifold.
Let M₁ and M₂ be differentiable manifolds. A map φ: M₁ → M₂ is called
continuous if for any open set U₂ ⊂ M₂ the set φ⁻¹(U₂) is an open subset of M₁.
Let x₂ ∈ M₂, and let U₂ be any open set containing x₂. If φ(x₁) = x₂, then
φ⁻¹(U₂) is an open set containing x₁. If (W, α) is a chart about x₁, then
W ∩ φ⁻¹(U₂) is an open subset of W, and α(W ∩ φ⁻¹(U₂)) is an open set in V₁
containing α(x₁). Therefore, there exists an ε > 0 such that φ(x) ∈ U₂ for all
x ∈ W with ‖α(x) − α(x₁)‖ < ε. In this sense, all points "close to x₁" are
mapped "close to x₂". Note that the choice of ε will depend on the chart (W, α)
as well as on x₁, x₂, U₂, and φ.
If M₁, M₂, and M₃ are differentiable manifolds, and if φ: M₁ → M₂ and
ψ: M₂ → M₃ are continuous maps, it is easy to see that their composition
ψ ∘ φ is a continuous map from M₁ to M₃.
Let φ be a continuous map from M₁ to M₂. Let (W₁, α₁) be a chart on M₁
and (W₂, α₂) a chart on M₂. We say that these charts are compatible (under φ)
if φ(W₁) ⊂ W₂. If 𝒜₂ is an atlas on M₂ and 𝒜₁ is an atlas on M₁, we say that 𝒜₁
and 𝒜₂ are compatible under φ if for every (W₁, α₁) ∈ 𝒜₁ there exists a
(W₂, α₂) ∈ 𝒜₂ compatible with it, i.e., such that φ(W₁) ⊂ W₂. (Note that the
map α₂ ∘ (φ ↾ W₁) ∘ α₁⁻¹ is then a continuous map of an open subset of V₁ into
V₂.) Given 𝒜₂ and φ, we can always find an 𝒜₁ compatible with 𝒜₂ under φ.
In fact, let 𝒜₁′ be any atlas on M₁, and set

𝒜₁ = {(W₁ ∩ φ⁻¹(W₂), α ↾ (W₁ ∩ φ⁻¹(W₂)))},

where (W₁, α) ranges over all charts of 𝒜₁′ and (W₂, α₂) ranges over all charts
of 𝒜₂.
Definition 3.2. Let M₁ and M₂ be differentiable manifolds, and let φ be a
map φ: M₁ → M₂. We say that φ is differentiable if the following hold:
i) φ is continuous.
ii) Let 𝒜₁ and 𝒜₂ be compatible atlases under φ. Then for any compatible
(W₁, α₁) ∈ 𝒜₁ and (W₂, α₂) ∈ 𝒜₂, the map

α₂ ∘ φ ∘ α₁⁻¹: α₁(W₁) → α₂(W₂)

is differentiable (as a map of an open subset of a Banach space into a
Banach space). (See Fig. 9.5.)
Fig. 9.5
In order to check that a continuous map φ is differentiable, it suffices to
check much less than (ii). Condition (ii) relates to any pair of compatible atlases
and any pair of compatible charts. In fact, we can assert:

Proposition 3.1. Let φ: M₁ → M₂ be continuous, and let 𝒜₁ and 𝒜₂ be
compatible atlases under φ. Suppose that for every (W₁, α₁) ∈ 𝒜₁ there
exists a (W₂, α₂) ∈ 𝒜₂ with φ(W₁) ⊂ W₂ and α₂ ∘ φ ∘ α₁⁻¹ differentiable.
Then φ is differentiable.

Proof. Let (U₁, β₁) and (U₂, β₂) be any charts on M₁ and M₂ with φ(U₁) ⊂ U₂.
We must show that β₂ ∘ φ ∘ β₁⁻¹ is differentiable. It suffices to show that it is
differentiable in a neighborhood of every point β₁(x₁), where x₁ ∈ U₁. Choose
(W₁, α₁) ∈ 𝒜₁ with x₁ ∈ W₁, and choose (W₂, α₂) ∈ 𝒜₂ with φ(W₁) ⊂ W₂.
Then on β₁(W₁ ∩ U₁), we have

β₂ ∘ φ ∘ β₁⁻¹ = (β₂ ∘ α₂⁻¹) ∘ (α₂ ∘ φ ∘ α₁⁻¹) ∘ (α₁ ∘ β₁⁻¹),

so that the left-hand side is differentiable. ∎
In other words, it suffices to verify differentiability with one pair of atlases.
We have as a consequence:

Proposition 3.2. Let φ: M₁ → M₂ and ψ: M₂ → M₃ be differentiable.
Then ψ ∘ φ is differentiable.

Proof. Let 𝒜₃ be an atlas on M₃. Choose 𝒜₂ compatible with 𝒜₃ under ψ,
and then choose an atlas 𝒜₁ on M₁ compatible with 𝒜₂ under φ. For any
(W₁, α₁) ∈ 𝒜₁ choose (W₂, α₂) ∈ 𝒜₂ and (W₃, α₃) ∈ 𝒜₃ with φ(W₁) ⊂ W₂ and
ψ(W₂) ⊂ W₃. Then α₃ ∘ ψ ∘ φ ∘ α₁⁻¹ = (α₃ ∘ ψ ∘ α₂⁻¹) ∘ (α₂ ∘ φ ∘ α₁⁻¹) is dif-
ferentiable. ∎
Exercise 3.1. Let M₁ = Sⁿ, let M₂ = Pⁿ, and let φ: M₁ → M₂ be the map sending
each point of the unit sphere into the line it determines. (Note that two antipodal
points of Sⁿ go into the same point of Pⁿ.) Construct compatible atlases for φ and
show that φ is differentiable.
Note that if f is any function on M with values in a Banach space, then f is
differentiable as a function (in the sense of Section 2) if and only if it is differ-
entiable as a map of manifolds. In particular, let φ: M₁ → M₂ be a differentiable
map, and let f be a differentiable function on M₂ (defined on some open subset,
say U₂). Then f ∘ φ is a differentiable function on M₁ [defined on the open set
φ⁻¹(U₂)]. Thus φ "pulls back" a differentiable function on M₂ to M₁. From
this point of view we can say that φ induces a map from the collection of differ-
entiable functions on M₂ to the collection of differentiable functions on M₁. We
shall denote this induced map by φ*. Thus

φ*: differentiable functions on M₂ → differentiable functions on M₁

is given by

φ*[f] = f ∘ φ.

If ψ: M₂ → M₃ is a second differentiable map, then (ψ ∘ φ)* goes from functions
on M₃ to functions on M₁, and we have

(ψ ∘ φ)* = φ* ∘ ψ*   (3.1)

(note the change of order). In fact, for g on M₃,

(ψ ∘ φ)*g = g ∘ (ψ ∘ φ) = (g ∘ ψ) ∘ φ = φ*[ψ*[g]].
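The contravariance (3.1) is transparent enough to check mechanically. A toy sketch (ours), with one-dimensional "manifolds" and pullback implemented as composition:

    # Sketch: (psi o phi)^* = phi^* o psi^* for pullbacks of functions.
    def pullback(phi):
        return lambda f: (lambda x: f(phi(x)))

    phi = lambda x: x + 1.0          # "M1 -> M2"
    psi = lambda x: x * x            # "M2 -> M3"
    g   = lambda x: 3.0 * x          # a function on "M3"

    x = 2.0
    print(pullback(lambda t: psi(phi(t)))(g)(x),   # (psi o phi)^* g
          pullback(phi)(pullback(psi)(g))(x))      # phi^*(psi^* g); both 27.0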
Observe that if φ is any map from M₁ → M₂ and f is any function defined
on a subset S₂ of M₂, then the "pullback" φ*[f] = f ∘ φ is a function defined on
φ⁻¹(S₂) ⊂ M₁. The fact that φ is continuous allows us to conclude that if S₂ is
open, then so is φ⁻¹(S₂). The fact that φ is differentiable implies that φ*[f] is
differentiable whenever f is.
The map φ* commutes with all algebraic operations whenever they are
defined. More precisely, suppose f and g take values in the same vector space
and have domains of definition U₁ and U₂. Then f + g is defined on U₁ ∩ U₂,
and φ*[f] + φ*[g] is defined on φ⁻¹(U₁ ∩ U₂), and we clearly have

φ*[f + g] = φ*[f] + φ*[g].
EXERCISES
3.2 Let M₂ be a finite-dimensional manifold, and let φ: M₁ → M₂ be continuous.
Suppose that φ*[f] is differentiable for any (locally defined) differentiable real-valued
function f. Conclude that φ is differentiable.
3.3 Show that if φ is a bounded linear map between Banach spaces, then φ* as
defined above is an extension of φ* as defined in Section 3, Chapter 2.
4. THE TANGENT SPACE
In this section we are going to construct an "approximating vector space" to a
differentiable manifold at each point of the manifold. This will allow us to
formulate most of the notions of the differential calculus on manifolds.
Let M be a differentiable manifold, and let x be a point of M (Fig. 9.6).
Let I ⊂ ℝ be an interval containing the origin. Let φ be a differentiable map
of I into M such that φ(0) = x. We will call φ a (differentiable) curve through x.
Let f be any differentiable real-valued function on M defined in a neigh-
borhood of x. Then φ*[f] is differentiable near the origin and we can consider its derivative
at the origin. Define the operator D_φ by

D_\varphi(f) = \left.\frac{d\varphi^*[f]}{dt}\right|_{t=0}.
Fig. 9.6

In view of the linearity of φ*, the map f ↦ D_φ(f) is linear:

D_φ(af + bg) = a D_φ(f) + b D_φ(g).

Similarly, we have Leibnitz's rule:

D_φ(fg) = f(x) D_φ(g) + g(x) D_φ(f),
which can easily be checked.
The functional D_φ depends on the curve φ. If ψ
is a second curve, then, in general, D_ψ ≠ D_φ. If, however, D_φ = D_ψ, then we
say that the curves φ and ψ are tangent at x, and we write φ ∼ ψ. Thus

φ ∼ ψ if and only if D_φ(f) = D_ψ(f) for all differentiable functions f.
It is easy to check that ∼ is an equivalence relation. An equivalence class
of curves through x will be called a tangent vector at x. If ξ is a tangent vector
at x and φ ∈ ξ, we say that ξ is tangent to φ at x.
For any differentiable function f defined about x and any tangent vector ξ,
we set

ξ(f) = D_φ(f),

where φ ∈ ξ. Thus ξ gives us a functional on differentiable functions defined
about x. We have

ξ(af + bg) = a ξ(f) + b ξ(g),   (4.1)
ξ(fg) = f(x) ξ(g) + g(x) ξ(f).   (4.2)
Let us examine what the equivalence relation ∼ says in terms of a chart
(W, α) about x. The functional D_φ(f) can be written as

\left.\frac{d(f\circ\varphi)}{dt}\right|_{t=0} = \left.\frac{d\,(f\circ\alpha^{-1})\circ(\alpha\circ\varphi)}{dt}\right|_{t=0}.

If we set Φ = α ∘ φ and F = f ∘ α⁻¹, then Φ is a parametrized curve in a Banach
space and F is a differentiable function there. We can thus write

D_\varphi(f) = dF(\Phi'(0)) = D_{\Phi'(0)}F.

From this expression we see (setting Ψ = α ∘ ψ) that ψ ∼ φ if and only if
Φ′(0) = Ψ′(0). We thus see that in terms of a chart (W, α), every tangent
vector ξ at x corresponds to a unique vector ξ_α ∈ V given by

ξ_α = (α ∘ φ)′(0),

where φ ∈ ξ.
Conversely, given any v ∈ V, there is a tangent vector ξ with ξ_α = v.
In fact, define φ by setting φ(t) = α⁻¹(α(x) + tv). Then φ is defined in a small
enough interval about 0, and (α ∘ φ)′(0) = v.
In short, a choice of chart allows us to identify the set of all tangent vectors
at x with V. Let (U, β) be a second chart about x. Then

ξ_β = (β ∘ φ)′(0) = ((β ∘ α⁻¹) ∘ (α ∘ φ))′(0).

By the chain rule we thus have

ξ_β = J_{β∘α⁻¹}(α(x)) ξ_α,   (4.3)

where J_γ(p) is the differential dγ_p of γ at p.
Since J_{β∘α⁻¹}(α(x)) is a linear map of V into itself, Eq. (4.3) says that the
set of all tangent vectors at x can be identified with V, the identification being
determined up to an automorphism of V. In particular, we can make the set
of all tangent vectors at x into a vector space by defining

aξ + bη = ζ,

where ζ is determined by

ζ_α = a ξ_α + b η_α

for some chart α. Equation (4.3) shows that this definition is independent of α.
We shall denote the space of tangent vectors at x by T_x(M) and shall call it
the tangent space (to M) at x.
Let ψ be a differentiable map of M₁ to M₂, and let φ be a curve passing
through x ∈ M₁ (see Fig. 9.7). Then ψ ∘ φ is a curve passing through ψ(x) ∈ M₂.
It is easy to check that if φ ∼ φ̃, then ψ ∘ φ ∼ ψ ∘ φ̃. Thus the map ψ induces a
mapping of T_x(M₁) into T_{ψ(x)}(M₂), which we shall denote by ψ_{*x}. To repeat,

Fig. 9.8 Fig. 9.9

if ξ ∈ T_x(M₁), then ψ_{*x}(ξ) = η is determined by requiring that

ψ ∘ φ ∈ η

for all φ ∈ ξ.
Let (U, α) be a chart about x, and let (W, β) be a chart about ψ(x). Then

ξ_α = (α ∘ φ)′(0)

and

η_β = (β ∘ ψ ∘ φ)′(0) = ((β ∘ ψ ∘ α⁻¹) ∘ (α ∘ φ))′(0).
By the chain rule we can thus write

η_β = J_{β∘ψ∘α⁻¹}(α(x)) ξ_α.

This says that if we identify T_x(M₁) with V₁ via α and identify T_{ψ(x)}(M₂) with
V₂ via β, then ψ_{*x} becomes identified with the linear map J_{β∘ψ∘α⁻¹}(α(x)). In
particular, the map ψ_{*x} is a continuous linear mapping from T_x(M₁) to T_{ψ(x)}(M₂).
If φ: M₁ → M₂ and ψ: M₂ → M₃ are two differentiable mappings, then it
follows immediately from the definitions that

(ψ ∘ φ)_{*x} = ψ_{*φ(x)} ∘ φ_{*x}.   (4.4)
We have seen that the choice of chart (U, α) identifies T_x(M) with V.
Now suppose that M is actually V itself (or an open subset of V) regarded as a
differentiable manifold. Then M has a distinguished chart, namely (M, id).
Thus on an open subset of V the identity chart gives us a distinguished way of
identifying T_x(M) with V. It is sometimes convenient to picture T_x(M) as a
copy of V whose origin has been translated to x. We would then draw a tangent
vector at x as an arrow originating at x. (See Fig. 9.8.)
Now suppose that M is a general manifold and that ψ is a differentiable map
of M into a vector space V₁. Then ψ_*(T_x(M)) is a subspace of T_{ψ(x)}(V₁).
If we regard ψ_*(T_x(M)) as a subspace of V₁ and consider the corresponding
hyperplane through ψ(x), we get the "plane tangent to ψ(M) at ψ(x)" in the intuitive
sense (Fig. 9.9).
It is very convenient to think of tangent vectors in this way, that is, to
regard them as vectors tangent to M if M were mapped into a vector space.
If f is a real-valued differentiable function defined in a neighborhood U
of x ∈ M, then we can regard it as a map of the manifold U to the manifold ℝ¹.
We therefore get a map f_{*x}: T_x(M) → T_{f(x)}(ℝ¹). Recall that we identify T_y(ℝ¹)
with ℝ¹ for any y ∈ ℝ¹. Therefore, f_{*x} can be viewed as a map from T_x(M)
to ℝ¹. The reader should check that this map is indeed given by

f_{*x}(ξ) = ξ(f) for ξ ∈ T_x(M).   (4.5)

In particular, if we take M₃ = ℝ and ψ = f in (4.4), we can assert:
Let ψ be a differentiable map of M₁ to M₂, and let f be a differentiable func-
tion on M₂ defined in a neighborhood of ψ(x). Then for any ξ ∈ T_x(M₁),

ξ(ψ*(f)) = ψ_{*x}(ξ)(f).   (4.6)

From now on, we shall frequently drop the subscript x in ψ_{*x} when it can be
understood from the context. Thus we would write (4.4) as (ψ ∘ φ)_* = ψ_* ∘ φ_*.
Some authors call the mapping ψ_{*x} the differential of ψ at x and designate it dψ_x.
If M₁ and M₂ are open subsets of Banach spaces V₁ and V₂ (and hence are
differentiable manifolds under their identity charts), then ψ_{*x} as defined above
does reduce to the differential dψ_x when T_x(M_i) is identified with V_i. This
reduction does depend on the identification, however.
5. FLOWS AND VECTOR FIELDS
Let M₁ and M₂ be differentiable manifolds. A map g from M₁ → M₂ is called a
diffeomorphism if g is a differentiable one-to-one map of M₁ onto M₂ such that
g⁻¹ is also differentiable.
Let M be a differentiable manifold. A map φ: M × ℝ → M is called a one-
parameter group if
i) φ is differentiable;
ii) φ(x, 0) = x for all x ∈ M;
iii) φ(φ(x, s), t) = φ(x, s + t) for all x ∈ M and s, t ∈ ℝ.
We can express conditions (ii) and (iii) a little differently. Let φ_t: M → M
be given by

φ_t(x) = φ(x, t).

For each t ∈ ℝ the map φ_t is differentiable. In fact,

φ_t = φ ∘ ι_t,

where ι_t is the differentiable map of M → M × ℝ given by ι_t(x) = (x, t).
Then condition (ii) says that φ₀ = id. Condition (iii) says that

φ_t ∘ φ_s = φ_{t+s}.

If we take t = −s in this equation, we get φ_t ∘ φ_{−t} = id. Thus for each t
the map φ_t is a diffeomorphism and (φ_t)⁻¹ = φ_{−t}.
Fig. 9.10
We now give some examples of one-parameter groups.

Example 1. Let M = V be a vector space, and let w ∈ M. Let φ: V × ℝ → V
be given by

φ(v, t) = v + tw.

It is easy to check that (i), (ii), and (iii) are satisfied. (See Fig. 9.10.)

Example 2. Let M = V be a finite-dimensional vector space, and let A be a
linear transformation A: V → V. Recall that the linear transformation e^{tA} is
defined by

e^{tA} = I + tA + \frac{t^2A^2}{2!} + \frac{t^3A^3}{3!} + \cdots,

i.e., for any v ∈ V,

e^{tA}v = \sum_{i=0}^{\infty} \frac{t^i}{i!}\,A^i v.
(See Figs. 9.11 and 9.12.) Since the convergence of the series is uniform on any
compact set of ⟨v, t⟩, the map φ: M × ℝ → M given by

φ(v, t) = e^{tA}v

is easily seen to be differentiable and to satisfy (ii) and (iii) as well.

Fig. 9.11 Fig. 9.12 (the flows of e^{tA} on V = ℝ² for two choices of A)
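Property (iii) for Example 2 is the matrix identity e^{sA}e^{tA} = e^{(s+t)A}. A quick numerical sketch (ours), using SciPy's matrix exponential:

    # Sketch: the one-parameter group phi(v, t) = e^{tA} v satisfies (iii).
    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # generator of rotations
    v = np.array([1.0, 2.0])
    s, t = 0.3, 1.1
    print(expm(s*A) @ (expm(t*A) @ v))        # phi_s(phi_t(v))
    print(expm((s + t)*A) @ v)                # phi_{s+t}(v): the same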
Example 3. Let M be the circle S¹, and let a be any real number. Let φ_t^a be the
diffeomorphism consisting of rotation through angle ta. In terms of the atlas
𝒜 = {(U₁, θ₁), (U₂, θ₂)}, the map φ is given by

θ₁(φ(x, t)) = θ₁(x) + ta        if x ∈ U₁, θ₁(x) < 2π − ta,
θ₁(φ(x, t)) = θ₁(x) + ta − 2π   if x ∈ U₁, θ₁(x) > 2π − ta,
θ₂(φ(x, t)) = θ₂(x) + ta        if x ∈ U₂, θ₂(x) < 2π + π/2 − ta,
θ₂(φ(x, t)) = θ₂(x) + ta − 2π   if x ∈ U₂, θ₂(x) > 2π + π/2 − ta.

(Strictly speaking, this doesn't quite define φ for all values of ⟨x, t⟩. If
x = ⟨1, 0⟩ and ta = π/2, then x ∉ U₁ and φ(x, π/2) ∉ U₂. This is easily
remedied by the introduction of a third chart.) It is easy to see that φ is a one-
parameter group.
Example 4. Let M = S¹ × S¹ be the torus, and let a and b be real numbers.
Write x ∈ M as x = ⟨x₁, x₂⟩, where xᵢ ∈ S¹. Define φ^{⟨a,b⟩} by

φ^{⟨a,b⟩}(x₁, x₂, t) = ⟨φ_t^a(x₁), φ_t^b(x₂)⟩,

where φ^a and φ^b are given in Example 3. Then φ^{⟨a,b⟩} is a one-parameter group
and indeed a rather instructive one. The reader should check to see that essen-
tially different behavior arises according to whether b/a is rational or irrational.
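A numerical sketch (ours) of this dichotomy: we measure how close the orbit t ↦ ⟨ta, tb⟩ on the torus comes back to its starting point.

    # Sketch: for b/a rational the orbit of Example 4 closes up;
    # for b/a irrational it only ever returns approximately.
    import numpy as np

    def return_distance(a, b, tmax=200.0, n=200000):
        t = np.linspace(1e-3, tmax, n)
        d1 = np.abs((a*t + np.pi) % (2*np.pi) - np.pi)  # distance to 0 mod 2pi
        d2 = np.abs((b*t + np.pi) % (2*np.pi) - np.pi)
        d = np.hypot(d1, d2)
        i = np.argmin(d)
        return t[i], d[i]

    print(return_distance(1.0, 2.0))            # closes at t = 2*pi (d ~ 0)
    print(return_distance(1.0, np.sqrt(2)))     # never closes (d stays > 0)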
[The construction of Example 4 from Example 3 can be generalized as
follows. If φ: M × ℝ → M and ψ: N × ℝ → N are one-parameter groups,
then we can construct a one-parameter group on M × N given by φ_t × ψ_t.
The map of M × N × ℝ → M × N sending ⟨x, y, t⟩ ↦ ⟨φ_t(x), ψ_t(y)⟩ is
differentiable because it can be written as the composite (φ × ψ) ∘ Δ, where

φ × ψ: M × ℝ × N × ℝ → M × N,

and

Δ: M × N × ℝ → M × ℝ × N × ℝ

is given by Δ(x, y, t) = ⟨x, t, y, t⟩.]
In each of the four preceding examples we started out with an "infinitesimal
generator" to construct the one-parameter group, namely, the vector w in
Example 1, the linear transformation A in Example 2, the real number a in
Example 3, and the pair ⟨a, b⟩ in Example 4. We will now show that associated
with any one-parameter group on a manifold, there is a nice object which we
can regard as the infinitesimal generator of the one-parameter group.
Let φ: M × ℝ → M be a one-parameter group. For each x ∈ M consider
the map φ_x of ℝ → M given by

φ_x(t) = φ(x, t).
In view of condition (ii), we know that φ_x(0) = x. Thus φ_x is a curve passing
through x (see Fig. 9.13). Let us denote the tangent to this curve by X(x).
We thus get a mapping X which assigns to each x ∈ M a vector X(x) ∈ T_x(M).
Any such map, i.e., any rule assigning to each x ∈ M a vector in T_x(M), will be
called a vector field. We have seen that every one-parameter group gives rise to a
vector field, which we shall call the infinitesimal generator of the one-parameter
group.
Fig. 9.13
Let Y be a vector field on M, and let (U, α) be a chart on M. For each
x ∈ U we get a vector Y(x)_α ∈ V. We can regard this as defining a V-valued
function Y_α on α(U):

Y_α(v) = Y(α⁻¹(v))_α for v ∈ α(U).   (5.1)

Let (W, β) be a second chart, and let Y_β be the corresponding V-valued function
on β(W). If we compare (5.1) with (4.3), we see that

Y_β(β ∘ α⁻¹(v)) = J_{β∘α⁻¹}(v) Y_α(v) if v ∈ α(U ∩ W).   (5.2)

Equation (5.1) gives the "local expression" of a vector field with respect to a
chart, and Eq. (5.2) describes the "transition law" from one chart to another.
Conversely, let 𝒜 be an atlas of M, and let Y_α be a V-valued function defined
on α(U) for each chart (U, α) ∈ 𝒜. Suppose that the Y_α satisfy (5.2). Then for
each x ∈ M we can let Y(x) ∈ T_x(M) be defined by setting

Y(x)_α = Y_α(α(x))

for some chart (U, α) about x. It follows from the transition law given by (4.3)
and (5.2) that this definition does not depend on the choice of (U, α).
Observe that J_{β∘α⁻¹} is a C^∞-function (linear-transformation-valued function)
on α(U ∩ W). Therefore, if Y is a vector field and Y_α is a V-valued C^∞-function
on α(U), the function Y_β will be C^∞ on β(U ∩ W). In other words, it is consistent
to require that the functions Y_α be of class C^∞. We shall therefore say that Y
is a C^∞-vector field if the function Y_α is C^∞ for every chart (U, α). As in the case
of functions and mappings, in order to verify that Y is C^∞, it suffices to check that
the Y_α are C^∞ for all charts (U, α) belonging to some atlas of M.
Let us check that the infinitesimal generator X of a one-parameter group φ
is a C^∞-vector field. In fact, if (U, α) is a chart, then

X_α(v) = (α ∘ φ_x)′(0), where v = α(x)

and φ_x(t) = φ(x, t). We can write α ∘ φ_x(t) = Φ_α(v, t), where

Φ_α(v, t) = α ∘ φ(α⁻¹(v), t).

Let U′ ⊂ U be a neighborhood of x such that φ(y, t) ∈ U for y ∈ U′ and |t| < ε.
Then Φ_α is a differentiable map of α(U′) × I → α(U), where I = {t: |t| < ε}.
In terms of this representation, we can write

X_α(v) = \frac{\partial\Phi_\alpha}{\partial t}(v, 0).   (5.3)

This shows that X is a C^∞-vector field.
If we evaluate (5.3) in the case of Example 1, we get Φ_id(v, t) = v + tw,
so that X_id = w. In the case of Example 2 we get X_id(v) = Av.
There are various algebraic operations that can be performed with vector
fields. The set of all vector fields on M forms a vector space in the obvious way.
If X and Y are C^∞-vector fields, then so is aX + bY (a and b constants),
where

(aX + bY)(x) = aX(x) + bY(x), x ∈ M.

Similarly, we can multiply a vector field by a function. If f is a function and X
is a vector field, we define fX by

(fX)(x) = f(x)X(x), x ∈ M.

It is easy to see that if f and X are differentiable, then so is fX. It is also easy
to check the various associative laws for this multiplication.
We have seen that any one-parameter group defines a smooth vector field.
Let us examine the converse. Does any C^∞-vector field define a one-parameter
group? The answer to the question as stated is "no".
In fact, let X = ∂/∂x¹ be the vector field corresponding to translation in the
x¹-direction in ℝⁿ. Let M = ℝ² − C, where C is some nonempty closed subset
of ℝ². Then if p is any point of M that lies on a line parallel to the x¹-axis which
intersects C (Fig. 9.14), then φ_t(p) will not be defined (will not lie in M) for
every t.
The reader may object that M "has some
points missing" and that is why X does not
generate a one-parameter group. But we can
construct a counterexample on ℝ² itself. In
fact, if we consider the vector field X on ℝ²
given by

X_id(x¹, x²) = (1, −(x²)²),

Fig. 9.14
then (5.3) shows that φ, if defined, satisfies

\frac{d\Phi}{dt}(x, t) = \frac{\partial\Phi}{\partial t}(\Phi(x, t), 0) = X(\Phi(x, t)),

where Φ = Φ_id. If we let yⁱ(t, x) = xⁱ ∘ Φ(x, t), then

\frac{dy^1}{dt} = 1,\quad y^1(0) = x^1,\qquad \frac{dy^2}{dt} = -(y^2)^2,\quad y^2(0) = x^2.

If x² ≠ 0, then the unique solution of the second equation is given by

y^2(t) = \frac{1}{t + 1/x^2},

which is not defined for all values of t. Of course, the trouble is that we only
have a local existence theorem for differential equations.
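One can watch this blow-up with any ODE integrator. The sketch below (ours) integrates the field backwards toward t = −1/x² and sees y² escape to infinity:

    # Sketch: the integral curve through <0, 1> has y2(t) = 1/(t + 1),
    # which escapes to infinity as t decreases to -1.
    import numpy as np
    from scipy.integrate import solve_ivp

    def X(t, y):                       # X_id(x1, x2) = (1, -(x2)^2)
        return [1.0, -y[1]**2]

    sol = solve_ivp(X, [0.0, -0.999], [0.0, 1.0], rtol=1e-9)
    print(sol.t[-1], sol.y[1, -1])     # y2 ~ 1000 already at t = -0.999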
We must therefore give up on the requirement that φ be defined on all of
M × ℝ.

Definition 5.1. A flow on M is a map φ of an open set U ⊂ M × ℝ → M
such that
i) M × {0} ⊂ U;
ii) φ is differentiable;
iii) φ(x, 0) = x;
iv) φ(φ(x, s), t) = φ(x, s + t) whenever both sides of this equation are
defined.
For x fixed, φ_x(t) = φ(x, t) is defined for sufficiently small t, so that φ gives
rise to a vector field X as before. We shall call X the infinitesimal generator
of the flow φ.
As the previous examples show, there may be no t ≠ 0 such that φ(x, t) is
defined for all x, and there may be no x such that φ(x, t) is defined for all t.

Proposition 5.1. Let X be a smooth vector field on M. Then there exists a
neighborhood U of M × {0} in M × ℝ and a flow φ: U → M having X as
its infinitesimal generator.
Proof. We shall first construct the curve φ_x(t) for any x ∈ M, and shall then
verify that ⟨x, t⟩ ↦ φ(x, t) is indeed a flow.
Let x be a point of M, and let (U, α) be a chart about x. Then X_α gives us
an ordinary differential equation in α(U), namely,

\frac{dv}{dt} = X_\alpha(v),\quad v \in \alpha(U).

By the fundamental existence theorem for ordinary differential equations, there
exists an ε > 0, an open set O containing α(x), and a map

Φ_α: O × {t: |t| < ε} → α(U)
such that

Φ_α is C^∞, Φ_α(v, 0) = v,

and

\frac{d\Phi_\alpha(v, t)}{dt} = X_\alpha(\Phi_\alpha(v, t)).

Here the choice of the open set O and of ε depends on α(x) and α(U). The
uniqueness part of the theorem asserts that Φ_α is uniquely determined up to the
domain of definition; i.e., if Φ_v is any curve defined for |t| < ε′ with Φ_v(0) = v
and

\frac{d\Phi_v(t)}{dt} = X_\alpha(\Phi_v(t)),

then Φ_v(t) = Φ_α(v, t).
This implies that

\Phi_\alpha(v, t + s) = \Phi_\alpha(\Phi_\alpha(v, s), t)   (5.4)

whenever both sides are defined. (Just hold s fixed in the equation.)
Consider the curve ψ(·) defined by

\psi(t) = \alpha^{-1}(\Phi_\alpha(\alpha(x), t)).   (5.5)

It is defined for |t| < ε, and is a continuous, in fact differentiable, map of this
interval into M. Furthermore, (5.4) asserts that the tangent vector to the curve
ψ(t + ·) at 0 is X(ψ(t)), the value of the vector field at the point ψ(t). We will
write this condition as

\psi'(t) = X(\psi(t)).   (5.6)
Equation (5.6) is the way we would write the "first order differential equation"
on M corresponding to the vector field X. A differentiable curve ψ satisfying
(5.6) is called an integral curve of X. We now can formulate a manifold version
of the uniqueness theorem of differential equations:

Lemma 5.1. Let ψ₁: I → M and ψ₂: I → M be two integral curves of X defined on
the same interval I. If ψ₁(s) = ψ₂(s) at some point s ∈ I, then ψ₁ = ψ₂, i.e., ψ₁(t) =
ψ₂(t) for all t ∈ I.
Proof. We wish to show that the set where ψ₁(t) ≠ ψ₂(t) is empty. Let

A = {t: t ≥ s and ψ₁(t) ≠ ψ₂(t)}.

We wish to show that A is empty, and similarly that the set B = {t: t ≤ s and
ψ₁(t) ≠ ψ₂(t)} is empty. Suppose that A is not empty, and let

t₊ = glb A = glb {t: t ≥ s and ψ₁(t) ≠ ψ₂(t)}.

We will derive a contradiction by
i) using the uniqueness theorem for differential equations to show that ψ₁(t₊)
≠ ψ₂(t₊), and
ii) using the Hausdorff property of manifolds to show that ψ₁(t₊) = ψ₂(t₊).

Details: i) Suppose that ψ₁(t₊) = ψ₂(t₊) = y ∈ M. We can find a coordinate
chart (W, β) about y, and then β ∘ ψ₁ and β ∘ ψ₂ are solutions of the same system
of first order ordinary differential equations, and they both take on the value
β(y) at t = t₊. Hence, by uniqueness for differential equations, β ∘ ψ₁ and β ∘ ψ₂
must be equal in some interval about t₊, and hence ψ₁(t) = ψ₂(t) for all t in this
interval. This is impossible, since there must be points arbitrarily close to t₊
where ψ₁(t) ≠ ψ₂(t) by the glb property of t₊. This proves i). Now suppose that
ψ₁(t₊) ≠ ψ₂(t₊). We can find neighborhoods U₁ of ψ₁(t₊) and U₂ of ψ₂(t₊) such
that U₁ ∩ U₂ = ∅. But then the continuity of the ψᵢ implies that ψ₁(t) ∈ U₁ and
ψ₂(t) ∈ U₂ for t close enough to t₊, and hence that ψ₁(t) ≠ ψ₂(t) for t in some
interval about t₊. This once again contradicts the glb property of t₊, proving
ii). The same argument with glb replaced by lub shows that B is empty, proving
Lemma 5.1. The above argument is typical of a "connectedness argument":
we showed that the set where ψ₁(t) = ψ₂(t) is both open and closed, and hence
must be the whole interval I. ∎
Lemma 5.1 shows that (5.5) defines a solution curve of X passing through
x at time t = 0, and is independent of the choice of chart in any common
interval of definition about 0. In other words, it is legitimate to define the curve
φ_x(·) by

\varphi_x(t) = \alpha^{-1}(\Phi_\alpha(\alpha(x), t)),

which defines φ_x(t) for |t| < ε. Unfortunately the ε depends not only on x but
also on the choice of chart. We use Lemma 5.1 and extend the definition of φ_x(·)
as far as possible, much as we did for ordinary differential equations on a vector
space in Chapter 6: For any s with |s| < ε we let y = φ_x(s) and obtain a curve
φ_y(·) defined for |t| < ε′. By Lemma 5.1,

φ_y(t) = φ_x(s + t) if |s + t| < ε.   (5.7)

It may happen that |s| + ε′ > ε. Then there will exist a t with |t| < ε′ and
|s + t| > ε. Then the right-hand side of (5.7) is not defined, but the left is. We
then take (5.7) as the definition of φ_x(s + t), extending the domain of definition
of φ_x(·). We then continue: Let I_x⁺ denote the set of all s > 0 for which there
exists a finite sequence of real numbers s₀ = 0 < s₁ < ⋯ < s_k = s and points
x₀, …, x_{k−1} ∈ M with x₀ = x, s₁ in the domain of definition of φ_{x₀}(·), x₁ = φ_{x₀}(s₁),
and, inductively,

s_{i+1} − s_i in the domain of definition of φ_{x_i}(·) and x_{i+1} = φ_{x_i}(s_{i+1} − s_i).

If s ∈ I_x⁺, so is s′ for 0 < s′ < s, and so is s + η for sufficiently small
positive η. Thus I_x⁺ is an interval, half open on the right. By repeated use of
(5.7) we define φ_x(s) for s ∈ I_x⁺. We construct I_x⁻ in a similar fashion and
set I_x = I_x⁺ ∪ I_x⁻. Then φ_x(s) is defined for s ∈ I_x, and I_x is the maximal
interval for which our construction defines φ_x. For each x ∈ M we obtain an
open interval I_x in which the curve φ_x(·) is defined.
Let U = ⋃_{x∈M} {x} × I_x. Then U is an open subset of M × ℝ. To verify this,
let (x̄, s̄) ∈ U. We must show that there is a neighborhood W of x̄ and an ε > 0
such that s ∈ I_x for all |s − s̄| < ε and x ∈ W. By definition, there is a finite
sequence of points x̄ = x₀, x₁, …, x_k and charts (U₁, α₁), …, (U_k, α_k) with
x_{i−1} ∈ Uᵢ and xᵢ ∈ Uᵢ and such that

αᵢ(xᵢ) = Φ_{αᵢ}(αᵢ(x_{i−1}), tᵢ),

where t₁ + ⋯ + t_k = s̄. It is now clear from the continuity properties of the
Φ_α that if we choose x₀′ such that α₁(x₀′) is close enough to α₁(x₀), then the
points xᵢ′ defined inductively by

αᵢ(xᵢ′) = Φ_{αᵢ}(αᵢ(x′_{i−1}), tᵢ)

will be well defined. [That is, αᵢ(x′_{i−1}) will be in the domain of definition of
Φ_{αᵢ}(·, tᵢ).] This means that s̄ ∈ I_{x₀′} for all such points x₀′. The same argument
shows that s̄ + η ∈ I_{x₀′} for η sufficiently small and x₀′ sufficiently close to x̄.
This shows that U is open.
Now define φ by setting

φ(x, t) = φ_x(t) for (x, t) ∈ U.

That φ is differentiable near M × {0} follows from the fact that φ is given (in
terms of a chart) as the solution of an ordinary differential equation. The
fundamental existence theorem then guarantees the differentiability. Near the
point (x̄, s̄) we can write φ(y, s) as a composite of maps of the form
y ↦ αᵢ⁻¹(Φ_{αᵢ}(αᵢ(y), tᵢ)), and so φ is differentiable because it is the composite
of differentiable maps. ∎
6. LIE DERIVATIVES
Let φ be a one-parameter group on a manifold M, and let f be a differentiable
function on M. Then for each t the function φ_t^*[f] is differentiable, and for t ≠ 0
we can form the function

\frac{\varphi_t^*[f] - f}{t}.   (6.1)

We claim that the limit of this expression as t → 0 exists. In fact, for any
x ∈ M, φ_t^*[f](x) = f ∘ φ_t(x) = f ∘ φ_x(t) and, therefore,

\lim_{t\to 0}\frac{\varphi_t^*[f] - f}{t}(x) = \lim_{t\to 0}\frac{f\circ\varphi_x(t) - f\circ\varphi_x(0)}{t} = D_{\varphi_x}f = X(x)f.   (6.2)
Here X(x) is a tangent vector at x and we are using the notation introduced in
Section 4. We shall call the limit of (6.1) the derivative of f with respect to the
one-parameter group φ, and shall denote it by D_X f. More generally, for any
smooth vector field X and differentiable function f we define D_X f by

D_X f(x) = X(x)f for all x ∈ M,   (6.3)

and call it the Lie derivative of f with respect to X. In terms of the flow
generated by X, we can, near any x ∈ M, represent D_X f as the limit of (6.1),
where, in general, (6.1) will only be defined for a sufficiently small neighborhood
of x and for sufficiently small |t|.
Our notation generalizes the notation in Chapter 3 for the directional derivative.
In fact, if M is an open subset of V and X is the "constant vector field" of
Example 1,

X_id = w ∈ V,

then

(D_X f)_id = D_w f_id,

where D_w is the directional derivative with respect to w.
Note that D_X f is linear in X; that is,

D_{aX+bY} f = a D_X f + b D_Y f

if X and Y are vector fields and a and b are constants.
Let ψ be a diffeomorphism of M₁ onto M₂, and let X be a vector field on M₂.
We define the "pullback" vector field ψ*[X] on M₁ by setting

ψ*[X](x) = ψ_*⁻¹X(ψ(x)) for all x ∈ M₁.   (6.4)

Note that ψ must be a diffeomorphism for (6.4) to make sense, since ψ⁻¹ enters
into the definition. This is in contrast to the "pullback" for functions, which
made sense for any differentiable map. Equation (6.4) does indeed define a
vector field, since

X(ψ(x)) ∈ T_{ψ(x)}(M₂)

and

(ψ_{*x})⁻¹: T_{ψ(x)}(M₂) → T_x(M₁).

Let us check that ψ*[X] is a smooth vector field if X is. To this effect, let 𝒜₁
and 𝒜₂ be compatible atlases on M₁ and M₂, and let (U, α) ∈ 𝒜₁ and (W, β) ∈ 𝒜₂
be compatible charts. Then (6.4) says that

ψ*[X]_α(v) = J_{α∘ψ⁻¹∘β⁻¹}(β ∘ ψ ∘ α⁻¹(v)) · X_β(β ∘ ψ ∘ α⁻¹(v)) for v ∈ α(U),

which is a differentiable function of v. Since, by the chain rule,

J_{α∘ψ⁻¹∘β⁻¹}(β ∘ ψ ∘ α⁻¹(v)) · J_{β∘ψ∘α⁻¹}(v) = I,

we can rewrite the last expression more simply as

ψ*[X]_α(v) = (J_{β∘ψ∘α⁻¹}(v))⁻¹ X_β(β ∘ ψ ∘ α⁻¹(v)) for v ∈ α(U).   (6.5)

Thus ψ*[X]_α is the product of a smooth Hom(V₂, V₁)-valued function and a
smooth V₂-valued function, which shows that ψ*[X] is a smooth vector field.

Exercise. Let φ be the flow generated by X on M₂. Show that the flow generated by
ψ*[X] is given by

⟨x, t⟩ ↦ ψ⁻¹(φ(ψ(x), t)).   (6.6)

If φ is a one-parameter group, then we can write (6.6) as

ψ⁻¹ ∘ φ_t ∘ ψ.   (6.6′)
Fig. 9.15. The vector fields X (Xf = ∂f/∂x) and Y (Yf = x ∂f/∂y) on the plane;
panels (a)–(j) picture X, Y, φ_t^*(Y), ψ_t^*(X), and the difference quotients
(φ_t^*Y − Y)/t = D_X Y and (ψ_t^*X − X)/t = D_Y X (each independent of t).
It is easy to check that if ψ₁: M₁ → M₂ and ψ₂: M₂ → M₃ are diffeo-
morphisms and Y is a vector field on M₃, then

(ψ₂ ∘ ψ₁)*Y = ψ₁*ψ₂*Y.
If f is a differentiable function on M₂, then

D_{ψ*[X]}(ψ*[f]) = ψ*(D_X f).   (6.7)

In fact, by (6.3) and (4.6) we have, for x ∈ M₁,

D_{ψ*[X]}ψ*[f](x) = ψ*[X](x)ψ*[f]           by (6.3)
 = ψ_*⁻¹X(ψ(x))ψ*[f]                        by (6.4)
 = (ψ_*ψ_*⁻¹X(ψ(x)))f                       by (4.6)
 = X(ψ(x))f
 = (D_X f)(ψ(x))
 = ψ*(D_X f)(x).
Let φ be a one-parameter group on M with infinitesimal generator X, and
let Y be another smooth vector field on M. For t ≠ 0 we can form the vector
field

\frac{\varphi_t^*[Y] - Y}{t}   (6.8)

and investigate its limit as t → 0, which we shall call D_X Y. In Fig. 9.15 we
have shown the calculation of D_Y X and D_X Y for two very simple fields on the
Cartesian plane ℝ². The field X is the constant field X_id = 𝟙₁, so that Xf =
∂f/∂x in terms of Cartesian coordinates x, y. The corresponding flow is given
by φ_t(x, y) = ⟨x + t, y⟩. Thus φ_{t*} = id if we identify the tangent space at
each point of the plane with the plane itself. Then Y ↦ φ_t^*Y consists of "moving"
the vector field Y to the left by t units. Here we have taken Y = x𝟙₂, so that
Yf = x(∂f/∂y). In Fig. 9.15(c) we have pictured φ_t^*Y, and have superimposed Y
and φ_t^*Y in Fig. 9.15(d). Figure 9.15(e) represents φ_t^*Y − Y and Fig. 9.15(f)
is (1/t){φ_t^*Y − Y}, which coincides with its limit, D_X Y, since the expression is
independent of t. The one-parameter group generated by Y is ψ_t where ψ_t(x, y) =
⟨x, y + tx⟩. Here at any p ∈ ℝ² we have ψ_{t*}𝟙₁ = 𝟙₁ + t𝟙₂, so that ψ_t^*X =
ψ_{−t*}X(ψ(x)) = 𝟙₁ − t𝟙₂. In Fig. 9.15(g) we have drawn ψ_t^*X and in Fig. 9.15(h)
we have superimposed it on X. Note that D_X Y = −D_Y X. However, these
two derivatives are nonzero for quite different reasons. The field φ_t^*Y varies
with t because the field Y is not constant. The field ψ_t^*X varies with t because
of "distortion" in the flow ψ_t. See Fig. 9.15(g) and (h). In the general case,
D_X Y will result from a superposition of these two effects. We now make the
general calculation.
Let (U, α) be a chart on M, and for v ∈ α(U) let O be a sufficiently small open
set containing v, and let ε > 0 be sufficiently small, so that Φ_α, given by

Φ_α(w, t) = α ∘ φ(α⁻¹(w), t),

is defined for w ∈ O and |t| < ε. Then, for |t| < ε, Eq. (6.5) implies that

\varphi_t^*[Y]_\alpha(v) = (J_{\Phi_\alpha(\cdot,t)}(v))^{-1}\,Y_\alpha(\Phi_\alpha(v, t)).   (6.9)
The right-hand side of this equation is of the form A_t⁻¹Z_t, where A_t and Z_t are
differentiable functions of t with A₀ = I. Therefore, its derivative with respect
to t exists and

\left.\frac{d(A_t^{-1}Z_t)}{dt}\right|_{t=0} = \lim_{t\to0}\frac{A_t^{-1}Z_t - Z_0}{t}
 = \lim_{t\to0}\frac{A_t^{-1}(Z_t - A_tZ_0)}{t}
 = \lim_{t\to0}\frac{Z_t - A_tZ_0}{t}
 = \lim_{t\to0}\Bigl(\frac{Z_t - Z_0}{t} - \frac{A_tZ_0 - Z_0}{t}\Bigr)
 = \dot Z_0 - \dot A_0Z_0.

Now in (6.9) Z_t = Y_α(Φ_α(v, t)), so

\dot Z_0 = dY_\alpha\Bigl(\frac{\partial\Phi_\alpha}{\partial t}(v, 0)\Bigr) = dY_\alpha(X_\alpha(v)).

Here Y_α is a V-valued function, so dY_α is its differential at the point Φ_α(v, 0) = v.
Hence dY_α(X_α(v)) is the value of this differential at X_α(v). The transformation
A_t = J_{Φ_α(·,t)}(v) = d(Φ_α(·, t))_v, so

\left.\frac{dA_t}{dt}\right|_{t=0} = \left.\frac{\partial\,d(\Phi_\alpha)_v}{\partial t}\right|_{t=0} = d\Bigl(\frac{\partial\Phi_\alpha}{\partial t}\Bigr)_v = d(X_\alpha)_v.

Thus the derivative of (6.9) at t = 0 can be written as

d(Y_\alpha)_v(X_\alpha(v)) - d(X_\alpha)_v(Y_\alpha(v)) = D_{X_\alpha(v)}Y_\alpha - D_{Y_\alpha(v)}X_\alpha.
We have thus shown that the limit in (6.8) exists. If we denote it by D_X Y,
we can write

(D_X Y)_\alpha(v) = D_{X_\alpha(v)}Y_\alpha - D_{Y_\alpha(v)}X_\alpha.   (6.10)

As before, we can use (6.10) as the definition of D_X Y for arbitrary vector
fields X and Y. Again, this represents the derivative of Y with respect to the
flow generated by X, that is, the limit of (6.8), where now (6.8) is only locally
defined.
From (6.10) we derive the surprising result that D_X Y = −D_Y X. For this
reason it is convenient to introduce a notation which expresses the antisym-
metry more clearly, and we shall write

D_X Y = [X, Y].
The expression on the right-hand side is called the Lie bracket of X and Y.
We have

[X, Y] = −[Y, X].   (6.11)

Let us evaluate the Lie bracket for some of the examples listed in the
beginning of Section 5. Let M = ℝⁿ.

Example 1. If X_id = w₁ and Y_id = w₂ are "constant" vector fields, then (6.10)
shows that [X, Y] = 0.

Example 2. Let X_id(v) = Av, where A is a linear transformation, and let
Y_id = w. Then (6.10) says that

[X, Y]_id(v) = −Aw,

since the directional derivative of the linear function Av with respect to w is Aw.

Example 3. Let X_id(v) = Av and Y_id(v) = Bv, where A and B are linear
transformations. Then by (6.10),

[X, Y]_id(v) = BAv − ABv = (BA − AB)v.   (6.12)

Thus in this case [X, Y] again comes from a linear transformation, namely,
BA − AB. In this case the antisymmetry in A and B is quite apparent.
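Example 3 can be confirmed against the flow definition (6.8): for linear fields, φ_t^*Y(v) = e^{−tA}Be^{tA}v, whose t-derivative at 0 is (BA − AB)v. A numerical sketch (ours):

    # Sketch: difference quotient of phi_t^* Y versus the bracket (6.12).
    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(0)
    A, B = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
    v = rng.normal(size=2)

    t = 1e-5
    pullback = expm(-t*A) @ B @ expm(t*A) @ v     # phi_t^* Y at v
    print((pullback - B @ v) / t)                 # difference quotient
    print((B @ A - A @ B) @ v)                    # (BA - AB) v: agrees to O(t)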
We now return to the general case. Let φ be a one-parameter group on M,
let Y be a smooth vector field on M, and let f be a differentiable function on M.
According to (6.7),

\varphi_t^*(D_Y f) = D_{\varphi_t^*[Y]}(\varphi_t^*[f]).

Then

\frac{\varphi_t^*(D_Y f) - D_Y f}{t} = \frac{D_{\varphi_t^*[Y]}(\varphi_t^*[f]) - D_Y(\varphi_t^*[f])}{t} + \frac{D_Y(\varphi_t^*[f]) - D_Y f}{t}
 = D_{\{(\varphi_t^*[Y]-Y)/t\}}\varphi_t^*[f] + D_Y\Bigl(\frac{\varphi_t^*[f] - f}{t}\Bigr).

Since the functions φ_t^*[f] are uniformly differentiable, we may take the limit as
t → 0 to obtain

D_X(D_Y f) = D_{D_XY}f + D_Y(D_X f) = D_{[X,Y]}f + D_Y(D_X f).

In other words,

D_{[X,Y]}f = D_X(D_Y f) - D_Y(D_X f).   (6.13)
In view of its definition as a derivative, it is clear that D_X Y is linear in Y:

D_X(aY_1 + bY_2) = a D_XY_1 + b D_XY_2

if a and b are constants and X and Y are vector fields. By the antisymmetry,
it must therefore also be linear in X. That is,

D_{aX_1+bX_2}Y = [aX_1 + bX_2, Y] = a[X_1, Y] + b[X_2, Y] = a D_{X_1}Y + b D_{X_2}Y.
Let X and Y be vector fields on a manifold M₂, and let ψ be a diffeomorphism
of M₁ onto M₂. Then

ψ*[X, Y] = [ψ*X, ψ*Y].   (6.14)

In fact, suppose X generates the flow φ. Then

\psi^*[X, Y] = \psi^*D_XY = \psi^*\lim_{t\to0}\frac{\varphi_t^*Y - Y}{t}
 = \lim_{t\to0}\frac{\psi^*\varphi_t^*Y - \psi^*Y}{t}
 = \lim_{t\to0}\frac{\psi^*\varphi_t^*(\psi^{-1})^*\psi^*Y - \psi^*Y}{t}
 = \lim_{t\to0}\frac{(\psi^{-1}\circ\varphi_t\circ\psi)^*\psi^*Y - \psi^*Y}{t}.

Since ψ⁻¹ ∘ φ_t ∘ ψ is the flow generated by ψ*X, we conclude that the last limit
is D_{ψ*X}ψ*Y, which proves (6.14).
Now let Y and Z be smooth vector fields on M, and let X be the infinitesimal
generator of φ. Then

D_X[Y, Z] = \lim_{t\to0}\frac{\varphi_t^*[Y, Z] - [Y, Z]}{t}
 = \lim_{t\to0}\frac{[\varphi_t^*Y, \varphi_t^*Z] - [Y, Z]}{t}
 = \lim_{t\to0}\Bigl\{\Bigl[\frac{\varphi_t^*Y - Y}{t}, \varphi_t^*Z\Bigr] + \Bigl[Y, \frac{\varphi_t^*Z - Z}{t}\Bigr]\Bigr\}
 = [D_XY, Z] + [Y, D_XZ].

Thus

[X, [Y, Z]] = [[X, Y], Z] + [Y, [X, Z]].   (6.15)

In view of the antisymmetry of the Lie bracket, Eq. (6.15) can be rewritten as

[X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y]] = 0.   (6.16)

Equation (6.15), or (6.16), is known as Jacobi's identity.
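For the linear fields of Example 3, Jacobi's identity reduces to an algebraic identity for BA − AB, which a short sketch (ours) confirms:

    # Sketch: Jacobi's identity (6.16) for brackets of linear vector fields.
    import numpy as np

    rng = np.random.default_rng(1)
    X, Y, Z = (rng.normal(size=(3, 3)) for _ in range(3))
    br = lambda P, Q: Q @ P - P @ Q               # bracket (6.12)
    J = br(X, br(Y, Z)) + br(Y, br(Z, X)) + br(Z, br(X, Y))
    print(np.max(np.abs(J)))                      # ~ 1e-15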
7. LINEAR DIFFERENTIAL FORMS
Let M be a differentiable manifold. We have attached to each x ∈ M a vector space
T_x(M). Its dual space, (T_x(M))*, is called the cotangent space to M at x, and will be
denoted by T_x*(M). Thus an element of T_x*(M) is a continuous linear function on T_x(M);
it is called a covector.
Some explanation of the word "continuous" is in order. In the case where
M [and hence T_x(M)] is finite-dimensional, all linear functions on T_x(M) are
continuous, so no further comment is necessary. We shall be concerned primarily
with this case. More generally, let l be a linear function on T_x(M). For any
chart (U, α) about x we have identified T_x(M) with V, thus identifying ξ ∈ T_x(M) with ξ_α ∈ V. Then l determines a linear function l_α on V by
    ⟨ξ_α, l_α⟩ = ⟨ξ, l⟩.    (7.1)
If (W, β) is a second chart, then
    ⟨ξ_β, l_β⟩ = ⟨J_{α∘β⁻¹}(β(x))ξ_β, l_α⟩.
Since J_{α∘β⁻¹}(β(x)) is a continuous map of V into V, we see that l_α is continuous if and only if l_β is. We shall therefore say that l is continuous if l_α is continuous for some (and hence any) α. In this case we see that (7.1) gives us an identification of T_x^*(M) with V^* sending l into l_α. The last equation says that the rule for change of charts is given by
    l_β = (J_{α∘β⁻¹}(β(x)))^* l_α.    (7.2)
Let f be a differentiable function on M, and let x ∈ M. Then the function on T_x(M) sending each ξ ∈ T_x(M) into ξ(f) will be denoted by df(x). Thus
    ⟨ξ, df(x)⟩ = ξ(f).
It is easy to see that df(x) ∈ T_x^*(M). In fact, in terms of a chart (U, α) about x,
    ⟨ξ, df(x)⟩ = D_{ξ_α}(f_α)(α(x)).
Note that f assigns an element df(x) of T_x^*(M) to each x ∈ M. A map which assigns to each x ∈ M an element of T_x^*(M) will be called a covector field or a linear differential form. The linear differential form determined by the function f will be denoted simply by df.
Let ω be a linear differential form. Thus ω(x) ∈ T_x^*(M) for each x ∈ M. Let 𝒜 be an atlas of M. For each (U, α) ∈ 𝒜 we obtain the V^*-valued function ω_α on α(U) defined by
    ω_α(v) = (ω(α⁻¹(v)))_α for v ∈ α(U).    (7.3)
If (W, β) ∈ 𝒜, then (7.2) says that
    ω_β(β∘α⁻¹(v)) = (J_{α∘β⁻¹}(β∘α⁻¹(v)))^* ω_α(v)
                  = ((J_{β∘α⁻¹}(v))⁻¹)^* ω_α(v) for v ∈ α(U ∩ W).    (7.4)
As before, Eq. (7.4) shows that it makes sense to require that ω be smooth. We say that ω is a C^k-differential form if ω_α is a V^*-valued C^k-function for every chart (U, α). By (7.4) it suffices to check this for all charts in an atlas. Also, if we are given V^*-valued functions ω_α, each defined on α(U), (U, α) ∈ 𝒜, and satisfying (7.4), then they define a linear differential form ω on M via (7.3).
If ω is a differential form and f is a function, we define the form fω by fω(x) = f(x)ω(x). Similarly, we define ω₁ + ω₂ by
    (ω₁ + ω₂)(x) = ω₁(x) + ω₂(x).
Let M₁ and M₂ be differentiable manifolds, and let ψ: M₁ → M₂ be a differentiable map. For any x ∈ M₁ we have the map ψ_{*x}: T_x(M₁) → T_{ψ(x)}(M₂). It therefore defines a dual map
    (ψ_{*x})^*: T_{ψ(x)}^*(M₂) → T_x^*(M₁).
(The reader can check that if l ∈ T_{ψ(x)}^*(M₂), then ξ → ⟨ψ_*(ξ), l⟩ is a continuous linear function of ξ, by verification in terms of a chart.)
Now let ω be a differential form on M₂. It assigns ω(ψ(x)) ∈ T_{ψ(x)}^*(M₂) to ψ(x), and thus assigns an element (ψ_{*x})^*(ω(ψ(x))) ∈ T_x^*(M₁) to x ∈ M₁. We thus "pull back" the form ω to obtain a form on M₁ which we shall call ψ^*ω. Thus
    (ψ^*ω)(x) = (ψ_{*x})^*(ω(ψ(x))).    (7.5)
Note that ",* is defined for any differentiable map as in the case of func-
tions, not only for diffeomorphisms (and in contrast to the situation for vector
fields).
It is easy to give the expression for ",* in terms of compatible charts (U, 0')
of M 1 and (W, (3) of M 2. In fact, from the local expression for ",* we see that,
v E O'(U). (7.6)
From (7.6) we see that ",*w is smooth if w is. It is clear that ",* preserves algebraie
operations:
(7.7)
and
",*(fw) = ",*(fJ",*(w). (7.8)
If ",: M1 -? M2 and 1/;: M2 -? M3 are differentiable maps, then (4.4) and
(7.5) show that
(I/; 0 ",)*w = ",*I/;*w. (7.U)
Let ψ: M₁ → M₂ be a differentiable map, and let f be a differentiable function on M₂. Then (4.6) and the definition of df show that
    d(ψ^*(f)) = ψ^* df.    (7.10)
Let φ be a flow on M with infinitesimal generator X, and let ω be a smooth linear differential form on M. Then the form φ_t^*ω is locally defined and, as in the case of functions and vector fields, the limit as t → 0 of
    (φ_t^*ω − ω)/t
exists. We can verify this by using (7.6) and proceeding as we did in the case of vector fields. The limit will be a smooth covector field which we shall call D_Xω. We could give an expression for D_Xω in terms of a chart, just as we did for vector fields.
If f is a differentiable function, ω a smooth differential form, and X the infinitesimal generator of φ, then
    D_X(fω) = (D_X f)ω + f D_Xω.    (7.11)
In fact,
    D_X(fω) = lim_{t→0} (φ_t^*(fω) − fω)/t
        = lim_{t→0} ( ((φ_t^*f − f)/t) φ_t^*(ω) + f (φ_t^*ω − ω)/t )
        = (D_X f)ω + f D_Xω.
If g is a differentiable function on M, then
    (φ_t^* dg − dg)/t = (d(φ_t^*[g]) − dg)/t = d((φ_t^*[g] − g)/t).
An easy verification in terms of a chart shows that the limit of this last expression exists and is indeed d(D_X g). Thus
    D_X(df) = d(D_X f).    (7.12)
Equations (7.11) and (7.12) show that if
    ω = f₁ dg₁ + ⋯ + f_k dg_k,
then
    D_Xω = (D_X f₁) dg₁ + ⋯ + (D_X f_k) dg_k + f₁ d(D_X g₁) + ⋯ + f_k d(D_X g_k).    (7.12′)
Let ω be a smooth linear differential form, and let X be a smooth vector field. Then ⟨X, ω⟩ is a smooth function given by
    ⟨X, ω⟩(x) = ⟨X(x), ω(x)⟩.
Note that ⟨X, ω⟩ is linear in both X and ω. Also observe that for any smooth function f we have
    ⟨X, df⟩ = D_X f.    (7.13)
8. COMPUTATIONS WITH COORDINATES
For the remainder of this chapter we shall assume that our manifolds are finite-dimensional. Let M be a differentiable manifold with V = ℝⁿ. If (U, α) is a chart of M, then we define the function x_α^i on U by setting
    x_α^i(x) = ith coordinate of α(x).    (8.1)
If f is any differentiable function on U, then we can write Eq. (2.1) as
    f(x) = f_α(x_α^1(x), ..., x_α^n(x)),
which we shall write as
    f = f_α(x_α^1, ..., x_α^n).    (8.2)
We define the vector field ∂/∂x_α^i on U by
    (∂/∂x_α^i)_α(v) = δ_i (= ⟨0, ..., 1, ..., 0⟩, with the 1 in the ith position).    (8.3)
If X is any vector field on U, then we have
    X = X_α^1 ∂/∂x_α^1 + ⋯ + X_α^n ∂/∂x_α^n,    (8.4)
where the functions X_α^i are defined by
    (X)_α(α(x)) = ⟨X_α^1(x), ..., X_α^n(x)⟩.    (8.5)
Equation (8.4) allows us to regard the vector field X as a "differential operator". In fact, it follows from the definitions that
    D_X f = X_α^1 ∂f_α/∂x_α^1 + ⋯ + X_α^n ∂f_α/∂x_α^n.    (8.6)
Since x_α^i is a differentiable function on U, dx_α^i is a differential form on U, and
    (dx_α^i)_α(v) = δ^i (the ith dual basis vector) for all v ∈ α(U).    (8.7)
In particular,
    ⟨∂/∂x_α^j, dx_α^i⟩ = δ_j^i.    (8.8)
If ω is a differential form on U, then
    ω = a_{1α} dx_α^1 + ⋯ + a_{nα} dx_α^n,    (8.9)
where the functions a_{iα} are defined by
    ω_α(α(x)) = ⟨a_{1α}(x), ..., a_{nα}(x)⟩ ∈ ℝⁿ*.    (8.10)
It then follows from the definitions that
    df = (∂f_α/∂x_α^1) dx_α^1 + ⋯ + (∂f_α/∂x_α^n) dx_α^n.    (8.11)
Equation (8.11) has built into it the transition law for differential forms under a change of charts. In fact, if (W, β) is a second chart, then on U ∩ W we have, by (8.11),
    dx_β^i = (∂x_β^i/∂x_α^1) dx_α^1 + ⋯ + (∂x_β^i/∂x_α^n) dx_α^n.    (8.12)
If we write ω = a_{1β} dx_β^1 + ⋯ + a_{nβ} dx_β^n and substitute (8.12), we get
    a_{iα} = Σ_j (∂x_β^j/∂x_α^i) a_{jβ}.
Now
    [∂x_β^j/∂x_α^i]
is the matrix J_{β∘α⁻¹}. If we compare with (8.10), we see that we have recovered (7.4).
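(Editorial sketch, not part of the original text: the change-of-chart law just recovered can be checked with sympy for the charts β = the identity and α = polar coordinates on ℝ² − {0}, where β ∘ α⁻¹(r, θ) = (r cos θ, r sin θ). The test function f is an arbitrary choice.)

    import sympy as sp

    r, th = sp.symbols('r theta', positive=True)
    xb = sp.Matrix([r*sp.cos(th), r*sp.sin(th)])   # x_beta as functions of x_alpha = (r, theta)
    J = xb.jacobian([r, th])                       # the matrix [dx_beta^j / dx_alpha^i]

    f = xb[0]**2 + xb[1]**2                        # f = x^2 + y^2, written in the alpha chart
    a_beta = sp.Matrix([[2*xb[0], 2*xb[1]]])       # df-coefficients (2x, 2y) in the beta chart
    a_alpha = (a_beta*J).applyfunc(sp.simplify)    # a_ialpha = sum_j (dx_beta^j/dx_alpha^i) a_jbeta
    direct = sp.Matrix([[sp.diff(f, r), sp.diff(f, th)]]).applyfunc(sp.simplify)
    assert a_alpha == direct                       # both give (2r, 0)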
Since the subscripts α, β, etc., clutter up the formulas, we shall frequently use the following notational conventions: instead of writing x_α^i we shall write x^i, and instead of writing x_β^i we shall write y^i. Thus
    x^i = x_α^i,  y^i = x_β^i,  z^k = x_γ^k,  etc.
Similarly, we shall write X^i for X_α^i, Y^i for X_β^i, a_i for a_{iα}, b_i for a_{iβ}, and so on. Then Eqs. (8.1) through (8.12) can be written as
    x^i(x) = ith coordinate of α(x),    (8.1′)
    f = f_α(x¹, ..., xⁿ),    (8.2′)
    (∂/∂x^i)_α(v) = δ_i,    (8.3′)
    X = X¹ ∂/∂x¹ + ⋯ + Xⁿ ∂/∂xⁿ,    (8.4′)
    (X)_α(α(x)) = ⟨X¹(x), ..., Xⁿ(x)⟩,    (8.5′)
    D_X f = X¹ ∂f_α/∂x¹ + ⋯ + Xⁿ ∂f_α/∂xⁿ,    (8.6′)
    (dx^i)_α(v) = δ^i,    (8.7′)
    ⟨∂/∂x^j, dx^i⟩ = δ_j^i,    (8.8′)
    ω = a₁ dx¹ + ⋯ + a_n dxⁿ,    (8.9′)
    ω_α(α(x)) = ⟨a₁(x), ..., a_n(x)⟩,    (8.10′)
    df = (∂f_α/∂x¹) dx¹ + ⋯ + (∂f_α/∂xⁿ) dxⁿ,    (8.11′)
    dy^i = (∂y^i/∂x¹) dx¹ + ⋯ + (∂y^i/∂xⁿ) dxⁿ.    (8.12′)
The formulas for "pullback" also take a simple form. Let ψ: M₁ → M₂ be a differentiable map, and suppose that M₁ is m-dimensional and M₂ is n-dimensional. Let (U, α) and (W, β) be compatible charts. Then the map
    β ∘ ψ ∘ α⁻¹: α(U) → β(W)
is given by
    y^i = y^i(x¹, ..., x^m), i = 1, ..., n,    (8.13)
that is, by n functions of m real variables. If f is a function on M₂ with
    f = f_β(y¹, ..., yⁿ) on W,
then
    ψ^*[f] = f_α(x¹, ..., x^m) on U,
where
    f_α(x¹, ..., x^m) = f_β(y¹(x¹, ..., x^m), ..., yⁿ(x¹, ..., x^m)).    (8.14)
The rule for "pulling back" a differential form is also very easy. In fact, if
    ω = a₁ dy¹ + ⋯ + a_n dyⁿ on W,
then ψ^*ω has the same form on U, where we now regard the a's and y's as functions of the x's and expand by using (8.12). Thus
    ψ^*ω = Σ_{i,j} a_i (∂y^i/∂x^j) dx^j,
where a_i = a_i(y¹(x¹, ..., x^m), ..., yⁿ(x¹, ..., x^m)).
Let x ∈ U. Then
    ψ_*(∂/∂x^i)(x) = Σ_j (∂y^j/∂x^i)(x) (∂/∂y^j)(ψ(x))    (8.15)
gives the formula for ψ_{*x}.
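(Editorial sketch, not part of the original text: the pullback rule above is mechanical enough to automate; the map ψ and the coefficients a_i below are arbitrary choices.)

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    xs = [x1, x2]
    y = [x1*x2, x1 + x2]                              # component functions y^i of psi
    a = [lambda y1, y2: y2, lambda y1, y2: y1*y2]     # coefficients a_i(y^1, y^2) of omega

    # coefficient of dx^j in psi^* omega is sum_i a_i(y(x)) dy^i/dx^j
    pullback = [sp.expand(sum(a[i](*y)*sp.diff(y[i], xs[j]) for i in range(2)))
                for j in range(2)]
    print(pullback)      # psi^* omega = pullback[0] dx^1 + pullback[1] dx^2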
EXERCISES
8.1 Let x and y be rectangular coordinates on ℝ², and let (r, θ) be polar "coordinates" on ℝ² − {0}. Express the vector fields ∂/∂r and ∂/∂θ in terms of rectangular coordinates. Express ∂/∂x and ∂/∂y in terms of polar coordinates.
8.2 Let x, y, z be rectangular coordinates on ℝ³. Let
    X = y ∂/∂z − z ∂/∂y,  Y = z ∂/∂x − x ∂/∂z,  and  Z = x ∂/∂y − y ∂/∂x.
Compute [X, Y], [X, Z], and [Y, Z]. Note that X represents the infinitesimal generator of the one-parameter group of rotations about the x-axis. We sometimes call X the "infinitesimal rotation about the x-axis". We can do the same for Y and Z.
8.3 Let
    A = y ∂/∂z + z ∂/∂y,  B = x ∂/∂z + z ∂/∂x,  and  C = x ∂/∂y − y ∂/∂x.
Compute [A, B], [A, C], and [B, C]. Show that Af = Bf = Cf = 0 if f(x, y, z) = x² + y² − z². Sketch the integral curves of each of the vector fields A, B, and C.
8.4 Let
    D = u ∂/∂v + v ∂/∂u,  E = u ∂/∂v − v ∂/∂u,  and  F = u ∂/∂u − v ∂/∂v.
Compute [D, E], [D, F], and [E, F].
8.5 Let P¹, ..., Pⁿ be polynomials in x¹, ..., xⁿ with no constant term, that is, P^i(0, ..., 0) = 0. Let
    I = x¹ ∂/∂x¹ + ⋯ + xⁿ ∂/∂xⁿ
and
    X = P¹ ∂/∂x¹ + ⋯ + Pⁿ ∂/∂xⁿ.
Show that
    [I, X] = 0
if and only if the P^i's are linear. [Hint: Consider the expansion of the P^i's into homogeneous terms.]
8.6 Let X and the P^i's be as in Exercise 8.5, and suppose that the P^i's are linear. Let
    A = λ¹x¹ ∂/∂x¹ + ⋯ + λⁿxⁿ ∂/∂xⁿ,
and suppose that λ^i ≠ λ^j for i ≠ j. Show that [A, X] = 0 if and only if P^i = μ^i x^i, that is,
    X = μ¹x¹ ∂/∂x¹ + ⋯ + μⁿxⁿ ∂/∂xⁿ
for some μ¹, ..., μⁿ.
8.7 Let A be as in Exercise 8.6, and suppose, in addition, that λ^i + λ^j ≠ λ^r for any i, j, r. Show that if the P^i's are at most quadratic, then
    [A, X] = 0
if and only if P^i = μ^i x^i. Generalize this result to the case where P^i can be a polynomial of degree at most m.
9. RIEMANN METRICS
Let M be a finite-dimensional differentiable manifold. A Riemann metric, m, on M is a rule which assigns a positive definite scalar product ( , )_{m,x} to the vector space T_x(M) for each x ∈ M. We shall usually drop the subscripts m and x when they are understood from the context. Thus if m is a Riemann metric on M, x ∈ M, and ξ, η ∈ T_x(M), we shall write the scalar product of ξ and η as
    (ξ, η) = (ξ, η)_{m,x}.
Let (U, α) be a chart of M. Define the functions g_ij on U by setting
    g_ij(x) = ((∂/∂x^i)(x), (∂/∂x^j)(x)),    (9.1)
so that g_ij = g_ji. If ξ, η ∈ T_x(M) with
    ξ = Σ ξ^i (∂/∂x^i)(x) and η = Σ η^j (∂/∂x^j)(x),
then
    (ξ, η) = Σ_{i,j} g_ij(x) ξ^i η^j.
Since dx¹(x), ..., dxⁿ(x) is the basis of T_x^*(M) dual to the basis
    (∂/∂x¹)(x), ..., (∂/∂xⁿ)(x),
we have
    ξ^i = ⟨ξ, dx^i⟩,
so that the last equation can be written as
    (ξ, η)_{m,x} = Σ g_ij(x) ⟨ξ, dx^i⟩⟨η, dx^j⟩.    (9.2)
Equation (9.2) is usually written more succinctly as
    m | U = Σ g_ij dx^i dx^j.    (9.3)
[Here (9.3) is to be interpreted as a short way of writing (9.2).]
Let (W, β) be a second chart with
    h_kl(x) = ((∂/∂y^k)(x), (∂/∂y^l)(x)), x ∈ W,
that is,
    m | W = Σ h_kl dy^k dy^l.    (9.4)
Then for x ∈ U ∩ W, we have
    (∂/∂x^i)(x) = Σ_k (∂y^k/∂x^i)(x) (∂/∂y^k)(x) and (∂/∂x^j)(x) = Σ_l (∂y^l/∂x^j)(x) (∂/∂y^l)(x),
so that
    g_ij = Σ_{k,l} h_kl (∂y^k/∂x^i)(∂y^l/∂x^j).    (9.5)
Note that (9.5) is the answer we would get if we formally substituted (8.12) for the dy's in (9.4) and collected the coefficients of dx^i dx^j.
In any event, it is clear from (9.5) that if the h_kl are all smooth functions on W, then the g_ij are smooth on U ∩ W. In view of this we shall say that a Riemann metric is smooth if the functions g_ij given by (9.3) are smooth for any chart (U, α) belonging to an atlas 𝒜 of M. Also, if we are given functions g_ij = g_ji defined for each (U, α) ∈ 𝒜 such that
 i) Σ g_ij(x) ξ^i ξ^j > 0 unless ξ = 0, for all x ∈ U,
 ii) the transition law (9.5) holds,
then the g_ij define a Riemann metric on M. In the following discussion we shall
assume that our Riemann metrics are smooth.
Let ψ: M₁ → M₂ be a differentiable map, and let m be a Riemann metric on M₂. For any x ∈ M₁ define ( , )_{ψ^*(m),x} on T_x(M₁) by
    (ξ, η)_{ψ^*(m),x} = (ψ_*ξ, ψ_*η)_{m,ψ(x)}.    (9.6)
Note that this defines a symmetric bilinear function of ξ and η. It is not necessarily positive definite, however, since it is conceivable that ψ_*(ξ) = 0 with ξ ≠ 0. Thus, in general, (9.6) does not define a Riemann metric on M₁. For certain ψ it does.
A differentiable map ψ: M₁ → M₂ is called an immersion if ψ_{*x} is an injection (i.e., is one-to-one) for all x ∈ M₁.
If ψ: M₁ → M₂ is an immersion and m is a Riemann metric on M₂, then we define the Riemann metric ψ^*(m) on M₁ by (9.6).
Let (U, α) and (W, β) be compatible charts of M₁ and M₂, and let
    m | W = Σ h_kl dy^k dy^l.
Then
    ψ^*(m) | U = Σ g_ij dx^i dx^j,
where
    g_ij = Σ_{k,l} ψ^*(h_kl) (∂y^k/∂x^i)(∂y^l/∂x^j),
which is just (9.5) again (with a different interpretation). Or, more succinctly,
    ψ^*(m) | U = Σ ψ^*(h_kl) ψ^*(dy^k) ψ^*(dy^l).
Let us give some examples of these formulas. If M = ℝⁿ, then the identity chart induces a Riemann metric on ℝⁿ given by
    (dx¹)² + ⋯ + (dxⁿ)².
Let us see what this looks like in terms of polar coordinates in ℝ² and ℝ³.
In ℝ², if we write
    x¹ = r cos θ, x² = r sin θ,
then
    dx¹ = cos θ dr − r sin θ dθ,
    dx² = sin θ dr + r cos θ dθ,
so
    (dx¹)² + (dx²)² = dr² + r² dθ².    (9.7)
Note that (9.7) holds wherever the forms dr and dθ are defined, i.e., on all of ℝ² − {0}. (Even though the function θ is not well defined on all of ℝ² − {0}, the form dθ is. In fact, we can write
    dθ = (x¹ dx² − x² dx¹) / ((x¹)² + (x²)²).)
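(Editorial sketch, not part of the original text: (9.7) can be reproduced with sympy by treating dr and dθ as formal symbols and expanding.)

    import sympy as sp

    r, th, dr, dth = sp.symbols('r theta dr dtheta')
    x1, x2 = r*sp.cos(th), r*sp.sin(th)
    dx1 = sp.diff(x1, r)*dr + sp.diff(x1, th)*dth
    dx2 = sp.diff(x2, r)*dr + sp.diff(x2, th)*dth
    print(sp.simplify(sp.expand(dx1**2 + dx2**2)))   # -> dr**2 + dtheta**2*r**2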
In ℝ³ we introduce
    x¹ = r cos φ sin θ, x² = r sin φ sin θ, x³ = r cos θ.
Then
    dx¹ = cos φ sin θ dr − r sin φ sin θ dφ + r cos φ cos θ dθ,
    dx² = sin φ sin θ dr + r cos φ sin θ dφ + r sin φ cos θ dθ,    (9.8)
    dx³ = cos θ dr − r sin θ dθ.
Thus
    (dx¹)² + (dx²)² + (dx³)² = dr² + r² sin²θ dφ² + r² dθ².    (9.9)
Again, (9.9) is valid wherever the forms on the right are defined, which this time means where (x¹)² + (x²)² ≠ 0.
Let us consider the map ι of the unit sphere S² → ℝ³, which consists of regarding a point of S² as a point of ℝ³. We then get an induced Riemann metric on S².
Let us set
    dθ = ι^* dθ and dφ = ι^* dφ,
so the forms dθ and dφ are defined on U = S² − {⟨0, 0, 1⟩, ⟨0, 0, −1⟩}. Then on U we can write (since r = 1 on S²)
    ι^*(m) | U = sin²θ dφ² + dθ².    (9.10)
We now return to general considerations. Let M be a differentiable manifold and let C: I → M be a differentiable map, where I is an interval in ℝ¹. Let t denote the coordinate of the identity chart on I. We shall set
    C′(s) = C_*(∂/∂t)(s), s ∈ I,
so that C′(s) ∈ T_{C(s)}(M) is the tangent vector to the curve C at s. If (U, α) is a chart on M and x¹, ..., xⁿ are the coordinate functions of (U, α), then if C(I′) ⊂ U for some I′ ⊂ I,
    α ∘ C = ⟨x¹ ∘ C, ..., xⁿ ∘ C⟩,
so that
    C′(t)_α = ⟨ d(x¹ ∘ C)/dt, ..., d(xⁿ ∘ C)/dt ⟩(t).
When there is no possibility of confusion, we shall omit the ∘ C and simply write x^i(t) and x^{i′}(t).
Now let m be a Riemann metric on M. Then ‖C′(t)‖ = (C′(t), C′(t))^{1/2} is a continuous function. In fact, in terms of a chart, we can write
    ‖C′(t)‖ = √( Σ g_ij(C(t)) x^{i′}(t) x^{j′}(t) ).
The integral
    ∫_I ‖C′(t)‖ dt    (9.11)
is called the length of the curve C. It will be defined if ‖C′(t)‖ is integrable over I. This will certainly be the case if I and ‖C′(t)‖ are both bounded, for instance.
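(Editorial sketch, not part of the original text: for the Euclidean metric g_ij = δ_ij on ℝ², the length (9.11) of the half circle C(t) = (cos t, sin t), 0 ≤ t ≤ π, comes out to π, as it should.)

    import sympy as sp

    t = sp.symbols('t')
    x = [sp.cos(t), sp.sin(t)]                    # the curve C
    speed = sp.simplify(sp.sqrt(sum(sp.diff(xi, t)**2 for xi in x)))
    print(sp.integrate(speed, (t, 0, sp.pi)))     # -> pi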
Note that the length is independent of the parametrization. More precisely, let φ: J → I be a one-to-one differentiable map, and let C₁ = C ∘ φ. Then at any τ ∈ J we have
    C₁′(τ) = (dt/dτ) C′(t),
that is,
    ‖C₁′(τ)‖ = ‖C′(φ(τ))‖ |dt/dτ|.    (9.12)
On the other hand, by the law for change of variable in an integral we have
    ∫_I ‖C′‖ = ∫_J ‖C′(φ(·))‖ |dt/dτ| = ∫_J ‖C₁′(·)‖ by (9.12).
More generally, we say that a curve C defined on an interval I is piecewise differentiable if
 i) C is continuous;
 ii) I = I₁ ∪ ⋯ ∪ I_r and C, on each I_j, is the restriction of a differentiable curve defined on some interval Ĩ_j strictly containing I_j.
(Thus a piecewise differentiable curve is a curve with a finite number of "cor-
ners".) If C is piecewise differentiable, then IIC'(t) II is defined and continuous
except at a finite number of t's, where it may have a jump discontinuity. In
particular, the integral (9.11) will exist and the curve will have a length.
Exercise. Let C be a curve mapping I onto a straight line segment in IRn in a one-to-
one manner. Show that the length of C is the same as the length of the segment.
Let C: [0, 1] → ℝ² be a curve with C(0) = 0 and C(1) = v ∈ ℝ². If we use the expression (9.7), we see that
    ∫₀¹ ‖C′(t)‖ dt = ∫₀¹ √((r′(t))² + (r(t)θ′(t))²) dt
        ≥ ∫₀¹ |r′(t)| dt
        ≥ ∫₀¹ r′(t) dt
        = ‖v‖,
with equality holding if and only if θ′ ≡ 0 and r′ ≥ 0. Thus among all curves joining 0 to v, the straight line segment has shortest length.
Similarly, on the sphere, let C be any curve C: [0, 1] → S² with C(0) = (0, 0, 1) and C(1) = p ≠ (0, 0, −1), and let θ₁ = θ(C(1)). Then
    ∫₀¹ ‖C′(t)‖ dt = ∫₀¹ √((θ′(t))² + sin²θ (φ′(t))²) dt ≥ ∫₀¹ |θ′(t)| dt.
If we let t₁ denote the first point in [0, 1] where θ = θ₁, then
    ∫₀¹ ‖C′(t)‖ dt ≥ ∫₀¹ |θ′(t)| dt ≥ ∫₀^{t₁} |θ′(t)| dt ≥ ∫₀^{t₁} θ′(t) dt = θ₁,
with equality only if φ′ ≡ 0 and t₁ = 1. Thus the shortest curve joining any two points on S² is the great circle joining them.
In both examples above we were aided by a very fortuitous choice of coordi-
nates (polar coordinates in the plane and a kind of polar coordinates on the
sphere). We shall see in Section 11, Chapter 13, that this is not accidental. We
shall see that on any Riemann manifold one can introduce local coordinates in
terms of which it is easy to describe the curves that locally minimize length.
CHAPTER 10
THE INTEGRAL CALCULUS ON MANIFOLDS
In this chapter we shall study integration on manifolds. In order to develop
the integral calculus, we shall have to restrict the class of manifolds under
consideration. In this chapter we shall assume that all manifolds M that arise
satisfy the following two conditions:
1) M is finite-dimensional.
2) M possesses an atlas 𝒜 containing (at most) a countable number of charts; that is, 𝒜 = {(U_i, α_i)}_{i=1,2,...}.
Before getting down to the business of integration, there are several technical
facts to be established. The first two sections will be devoted to this task.
1. COMPACTNESS
A subset A of a manifold M is said to be compact if it has the following property:
 i) If {U_ι} is any collection of open sets with
    A ⊂ ⋃ U_ι,
there exist finitely many of the U_ι, say U_{ι₁}, ..., U_{ι_r}, such that
    A ⊂ U_{ι₁} ∪ ⋯ ∪ U_{ι_r}.
Alternatively, we can say:
 ii) A set A is compact if and only if for any family {F_ι} of closed sets such that
    A ∩ ⋂_ι F_ι = ∅,
there exist finitely many of the F_ι such that
    A ∩ F_{ι₁} ∩ ⋯ ∩ F_{ι_r} = ∅.
The equivalence of (i) and (ii) can be seen by taking U_ι equal to the complement of F_ι.
In Section 5 of Chapter 4 we established that if M = U is an open subset of
IRn, then A C U is compact if and only if A is a closed bounded subset of IRn.
We make some further trivial remarks about compactness:
iii) If A₁, ..., A_r are compact, so is A₁ ∪ ⋯ ∪ A_r.
In fact, if {U_ι} covers A₁ ∪ ⋯ ∪ A_r, it certainly covers each A_j. We can thus choose for each j a finite subcollection which covers A_j. The union of these subcollections forms a finite subcollection covering A₁ ∪ ⋯ ∪ A_r.
iv) If ψ: M₁ → M₂ is continuous and A ⊂ M₁ is compact, then ψ[A] is compact.
In fact, if {U_ι} covers ψ[A], then {ψ⁻¹(U_ι)} covers A. If the U_ι are open, so are the ψ⁻¹(U_ι), since ψ is continuous. We can thus choose ι₁, ..., ι_r so that
    A ⊂ ψ⁻¹(U_{ι₁}) ∪ ⋯ ∪ ψ⁻¹(U_{ι_r}),
which implies that ψ[A] ⊂ U_{ι₁} ∪ ⋯ ∪ U_{ι_r}.
We see from this that if A = A₁ ∪ ⋯ ∪ A_n, where each A_j is contained in some W_j, where (W_j, β_j) is a chart, and β_j(A_j) is a compact subset of ℝⁿ, then A is compact. In particular, the manifold M itself may be compact. For instance, we can write Sⁿ as the union of the upper and lower hemispheres: Sⁿ = {x : x^{n+1} ≥ 0} ∪ {x : x^{n+1} ≤ 0}. Each hemisphere is compact. In fact, the upper hemisphere is mapped onto {y : ‖y‖ ≤ 1} by the map φ₁ of Section 8.1, and the lower hemisphere is mapped onto the same set by φ₂. Thus the sphere is compact.
On the other hand, an open subset of ℝⁿ is not compact. However, it can be written as a countable union of compact sets. In fact, if U ⊂ ℝⁿ is an open set, let
    A_n = {x ∈ U : ‖x‖ ≤ n and ρ(x, ∂U) ≥ 1/n}.
It is easy to check that A_n is compact and that
    ⋃ A_n = U.
In view of condition (2), we can say the same for any manifold M under
consideration:
Proposition 1.1. Any manifold M satisfying (1) and (2) can be written as
    M = A₁ ∪ A₂ ∪ ⋯,
where each A_i ⊂ M is compact.
Proof. In fact, by (2),
    M = U₁ ∪ U₂ ∪ ⋯,
and by the preceding discussion each U_j can be written as the countable union of compact sets. Since the countable union of a countable union is still countable, we obtain Proposition 1.1. □
An immediate corollary is:
Proposition 1.2. Let M be a manifold [satisfying (1) and (2)], and let {U_ι} be an open covering of M. Then we can select a countable subcollection {U_j} such that
    M = ⋃ U_j.
Proof. Write M = ⋃ A_r, where each A_r is compact. For each r we can choose finitely many U_{r,1}, U_{r,2}, ..., U_{r,k_r} so that
    A_r ⊂ U_{r,1} ∪ ⋯ ∪ U_{r,k_r}.
The collection {U_{r,j}} is a countable subcollection covering M. □
2. PARTITIONS OF UNITY
In the following discussion it will be convenient for us to have a method of
"breaking up" functions, vector fields, etc., into "little pieces". For this purpose
we introduce the following notation:
Definition 2.1. A collection {g_i} of C^∞-functions is said to be a partition of unity if
 i) g_i ≥ 0 for all i;
 ii) supp g_i† is compact for all i;
 iii) each x ∈ M has a neighborhood V_x such that V_x ∩ supp g_i = ∅ for all but a finite number of i; and
 iv) Σ g_i(x) = 1 for all x ∈ M.
Note that in view of (iii) the sum occurring in (iv) is actually finite, since
for any x all but a finite number of the gi(X) vanish. Note also that:
Proposition 2.1. If A is a compact set and {g_i} is a partition of unity, then
    A ∩ supp g_i = ∅
for all but a finite number of i.
Proof. In fact, each x ∈ A has a neighborhood V_x given by (iii). The sets {V_x}_{x∈A} form an open covering of A. Since A is compact, we can select a finite subcollection {V₁, ..., V_r} with A ⊂ V₁ ∪ ⋯ ∪ V_r. Since each V_k has a nonempty intersection with only finitely many of the supp g_i, so does their union, and so a fortiori does A. □
† Recall that supp g is the closure of the set {x : g(x) ≠ 0}.
Definition 2.2. Let {U_ι} be an open covering of M, and let {g_j} be a partition of unity. We say that {g_j} is subordinate to {U_ι} if for every j there exists an ι(j) such that
    supp g_j ⊂ U_{ι(j)}.    (2.1)
Theorem 2.1. Let {U_ι} be any open covering of M. There exists a partition of unity {g_j} subordinate to {U_ι}.
The proof that we shall present below is due to Bonic and Frampton.†
First we introduce some preliminary notions.
The function f on ℝ defined by
    f(u) = e^{−1/u} if u > 0,
    f(u) = 0 if u ≤ 0
is C^∞. For u ≠ 0 it is clear that f has derivatives of all orders. To check that f is C^∞ at 0, it suffices to show that f^{(k)}(u) → 0 as u → 0 from the right. But f^{(k)}(u) = P_k(1/u)e^{−1/u}, where P_k is a polynomial of degree 2k. So
    lim_{u→0+} f^{(k)}(u) = lim_{s→∞} P_k(s)e^{−s} = 0,
since e^s goes to infinity faster than any polynomial.
Note that f(u) > 0 if and only if u > 0. Now consider the function g_a^b on ℝ defined by
    g_a^b(x) = f(x − a)f(b − x).
Then g_a^b is C^∞ and nonnegative, and
    g_a^b(x) > 0 if and only if a < x < b.
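(Editorial sketch, not part of the original text: the bump function g_a^b is easy to tabulate numerically, and doing so makes the support property visible.)

    import numpy as np

    def f(u):
        u = np.asarray(u, dtype=float)
        out = np.zeros_like(u)
        pos = u > 0
        out[pos] = np.exp(-1.0/u[pos])    # f(u) = e^{-1/u} for u > 0, else 0
        return out

    def g(x, a, b):
        return f(x - a)*f(b - x)          # g_a^b(x) = f(x - a) f(b - x)

    x = np.linspace(-1.0, 2.0, 7)
    print(g(x, 0.0, 1.0))                 # nonzero exactly where 0 < x < 1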
More generally, if a = ⟨a¹, ..., a^k⟩ and b = ⟨b¹, ..., b^k⟩, define the function g_a^b on ℝ^k by setting
    g_a^b(x) = g_{a¹}^{b¹}(x¹) g_{a²}^{b²}(x²) ⋯ g_{a^k}^{b^k}(x^k),
where x = ⟨x¹, ..., x^k⟩. Then g_a^b ≥ 0, g_a^b ∈ C^∞, and
    g_a^b(x) > 0 if and only if a¹ < x¹ < b¹, ..., a^k < x^k < b^k.    (2.2)
Lemma. Let f₁, ..., f_k be C^∞-functions on a manifold M, and let
    W = {x : a¹ < f₁(x) < b¹, ..., a^k < f_k(x) < b^k}.
There exists a nonnegative C^∞-function g such that W = {x : g(x) > 0}.
In fact, if we define g by
    g(x) = g_a^b(f₁(x), ..., f_k(x)),
then it is clear that g has the desired properties.
† Smooth functions on Banach manifolds, J. Math. Mech. 15, 877–898 (1966).
We now turn to the proof of Theorem 2.1.
Proof. For each x ∈ M choose a U_ι containing x and a chart (U, α) about x. Then α(U ∩ U_ι) is an open set containing α(x) in ℝⁿ. Choose a and b such that
    α(x) ∈ int □_a^b and □_a^b ⊂ α(U ∩ U_ι).
Let W_x = α⁻¹(int □_a^b). Then
    W̄_x ⊂ U_ι and W̄_x is compact.
Also, if x¹, ..., xⁿ are the coordinates given by α,
    W_x = {y : a¹ < x¹(y) < b¹, ..., aⁿ < xⁿ(y) < bⁿ}.
By our lemma we can find a nonnegative C^∞-function f_x such that
    W_x = {y : f_x(y) > 0}.    (2.3)
Since x ∈ W_x, the {W_x} cover M. By Proposition 1.2 we can select a countable subcovering {W_j}. Let us denote the corresponding functions by f_j; that is, if W_j = W_x, we set f_j = f_x.
Let
    V₁ = W₁ = {x : f₁(x) > 0},
    V₂ = {x : f₂(x) > 0, f₁(x) < ½},
    ⋯
    V_r = {x : f_r(x) > 0, f₁(x) < 1/r, ..., f_{r−1}(x) < 1/r}.
It is clear that V_r is open and that V_r ⊂ W_r, so that, by (2.3),
    V̄_r is compact and V̄_r ⊂ U_ι for some ι = ι(r).    (2.4)
For each x ∈ M let q(x) denote the first integer q for which f_q(x) > 0. Thus f_p(x) = 0 if p < q(x), and f_{q(x)}(x) > 0.
Let V_x = {y : f_{q(x)}(y) > ½ f_{q(x)}(x)}. Since f_{q(x)}(x) > 0, it follows that x ∈ V_x and V_x is open. Furthermore,
    V_x ∩ V_r = ∅ if r > q(x) and 1/r < ½ f_{q(x)}(x).    (2.5)
According to the lemma, each set V_j can be given as V_j = {x : h_j(x) > 0}, where h_j is a suitable C^∞-function. Let g = Σ h_j. In view of (2.5) this is really a finite sum in the neighborhood of any x. Thus g is C^∞. Now h_{q(x)}(x) > 0, since x ∈ V_{q(x)}. Thus g > 0. Set
    g_j = h_j / g.
We claim that {g_j} is the desired partition of unity. In fact, (i) holds by our construction, (ii) and (2.1) follow from (2.4), (iii) follows from (2.5), and (iv) holds by construction. □
3. DENSITIES
If we regard IRn as a differentiable manifold, then the law for change of variables
for an integral shows that the integrand does not have the same transition law
as that of a function under change of chart. For this reason we cannot expect
to integrate functions on a manifold. We now introduce the type of object that
we can integrate.
Definition 3.1. A density ρ is a rule which assigns to each chart (U, α) of M a function ρ_α defined on α(U) subject to the following transition law: If (W, β) is a second chart of M, then
    ρ_α(v) = ρ_β(β∘α⁻¹(v)) |det J_{β∘α⁻¹}(v)| for v ∈ α(U ∩ W).    (3.1)
If 𝒜 is an atlas of M and functions ρ_{α_i} are given for all (U_i, α_i) ∈ 𝒜 satisfying (3.1), then the ρ_{α_i} define a density ρ on M. In fact, if (U, α) is any chart of M (not necessarily belonging to 𝒜), define ρ_α by
    ρ_α(v) = ρ_{α_i}(α_i∘α⁻¹(v)) |det J_{α_i∘α⁻¹}(v)| for v ∈ α(U ∩ U_i).
This definition is consistent: if v ∈ α(U ∩ U_i) ∩ α(U ∩ U_j), then by (3.1),
    ρ_{α_j}(α_j∘α⁻¹(v)) |det J_{α_j∘α⁻¹}(v)|
      = ρ_{α_i}(α_i∘α_j⁻¹(α_j∘α⁻¹(v))) |det J_{α_i∘α_j⁻¹}(α_j∘α⁻¹(v))| |det J_{α_j∘α⁻¹}(v)|
      = ρ_{α_i}(α_i∘α⁻¹(v)) |det J_{α_i∘α⁻¹}(v)|
by the chain rule and the multiplicative property of determinants.
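(Editorial sketch, not part of the original text: the transition law (3.1) for the charts α = polar coordinates and β = the identity chart on ℝ² − {0}; here β ∘ α⁻¹(r, θ) = (r cos θ, r sin θ), so |det J| = r. The density chosen is arbitrary.)

    import sympy as sp

    r, th = sp.symbols('r theta', positive=True)
    beta_of_alpha = sp.Matrix([r*sp.cos(th), r*sp.sin(th)])    # beta o alpha^{-1}
    detJ = sp.simplify(beta_of_alpha.jacobian([r, th]).det())  # -> r

    rho_beta = lambda x, y: sp.exp(-(x**2 + y**2))             # density in the identity chart
    rho_alpha = sp.simplify(rho_beta(*beta_of_alpha)*sp.Abs(detJ))
    print(rho_alpha)                                           # -> r*exp(-r**2)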
In view of (3.1) it makes sense to talk about local smoothness properties of densities. We will say that a density ρ is C^k if for any chart (U, α) the function ρ_α is C^k. As usual, it suffices to verify this for all charts (U, α) belonging to some atlas. Similarly, we say that a density ρ is locally absolutely integrable if for any chart (U, α) the function ρ_α is absolutely integrable. By the last proposition of Chapter 8 this is again independent of the choice of atlas.
Let ρ be a density on M, and let x be a point of M. It does not make sense to talk about the value of ρ at x. However, (3.1) shows that it does make sense to talk about the sign of ρ at x. More precisely, we say that
    ρ > 0 at x if ρ_α(α(x)) > 0    (3.2)
for a chart (U, α) about x. Equation (3.1) shows that if ρ_α(α(x)) > 0, then ρ_β(β(x)) > 0 for any other chart (W, β) about x. Similarly, it makes sense to say that ρ < 0 at x, ρ ≥ 0 at x, or ρ ≤ 0 at x.
Definition 3.2. Let ρ be a density on M. By the support of ρ, denoted by supp ρ, we shall mean the closure of the set of points of M at which ρ does not vanish. That is,
    supp ρ = closure of {x : ρ ≠ 0 at x}.
Let ρ₁ and ρ₂ be densities. We define their sum by setting
    (ρ₁ + ρ₂)_α = ρ₁α + ρ₂α    (3.3)
for any chart (U, α). It is immediate that the right-hand side of (3.3) satisfies the transition law (3.1), and so defines a density on M.
Let ρ be a density, and let f be a function. We define the density fρ by
    (fρ)_α = f_α ρ_α.    (3.4)
Again, the verification of (3.1) is immediate in view of the transition laws for functions.
It is clear that
    supp (ρ₁ + ρ₂) ⊂ supp ρ₁ ∪ supp ρ₂    (3.5)
and
    supp (fρ) = supp f ∩ supp ρ.    (3.6)
We shall write
    ρ₁ ≤ ρ₂ at x if ρ₂ − ρ₁ ≥ 0 at x
and
    ρ₁ ≤ ρ₂ if ρ₁ ≤ ρ₂ at all x ∈ M.
Let 𝒫 denote the space of locally absolutely integrable densities of compact support. We observe that 𝒫 is a vector space and that the product fρ belongs to 𝒫 if f is a (bounded) locally contented function and ρ ∈ 𝒫.
Theorem 3.1. There exists a unique linear function ∫ on 𝒫 satisfying the following condition: If ρ ∈ 𝒫 is such that supp ρ ⊂ U, where (U, α) is a chart of M, then
    ∫ ρ = ∫_{α(U)} ρ_α.    (3.7)
Proof. We first show that there is at most one linear function satisfying (3.7). Let 𝒜 be an atlas of M, and let {g_j} be a partition of unity subordinate to 𝒜. For each j choose an i(j) so that
    supp g_j ⊂ U_{i(j)}.
Write ρ = 1·ρ = Σ g_j ρ. Since supp ρ is compact, only finitely many of the terms g_jρ are not identically zero. Thus the sum is finite. Since ∫ is linear,
    ∫ ρ = Σ_j ∫ g_jρ.
By (3.7),
    ∫ g_jρ = ∫_{α_{i(j)}(U_{i(j)})} (g_jρ)_{α_{i(j)}}.
Thus
    ∫ ρ = Σ_j ∫_{α_{i(j)}(U_{i(j)})} (g_jρ)_{α_{i(j)}}.    (3.8)
Thus ∫, if it exists, must be given by (3.8). To establish the existence of ∫,
we must show that (3.8) defines a linear function on 𝒫 satisfying (3.7). The linearity is obvious; we must verify (3.7).
Suppose supp ρ ⊂ U for some chart (U, α). We must show that
    ∫_{α(U)} ρ_α = Σ_j ∫_{α_{i(j)}(U_{i(j)})} (g_jρ)_{α_{i(j)}}.
Since ρ = Σ g_jρ and therefore ρ_α = Σ (g_jρ)_α, it suffices to show that
    ∫_{α(U)} (g_jρ)_α = ∫_{α_i(U_i)} (g_jρ)_{α_i},    (3.9)
where supp g_jρ ⊂ U ∩ U_i. By (3.1),
    (g_jρ)_α = (g_jρ)_{α_i} ∘ (α_i∘α⁻¹) · |det J_{α_i∘α⁻¹}|,
so that (3.9) holds by the transformation law for integrals in ℝⁿ. □
We can derive a number of useful properties of the integral from the formula (3.8). For instance,
    if ρ₁ ≤ ρ₂, then ∫ ρ₁ ≤ ∫ ρ₂.    (3.10)
In fact, since g_j ≥ 0, we have (g_jρ₁)_α ≤ (g_jρ₂)_α for any chart (U, α). Thus (3.10) follows from the corresponding fact on ℝⁿ if we use (3.8).
Let us say that a set A has content zero if A ⊂ A₁ ∪ ⋯ ∪ A_p, where each A_i is compact, A_i ⊂ U_i for some chart (U_i, α_i), and α_i(A_i) has content zero in ℝⁿ. It is easy to see that the union of any finite number of sets of content zero has content zero. It is also clear that the function e_A is contented.
Let us call a set B ⊂ M contented if the function e_B is contented. For any ρ ∈ 𝒫 we define ∫_B ρ by
    ∫_B ρ = ∫ e_B ρ.    (3.11)
It follows from (3.8) that
    ∫_A ρ = 0
for any ρ ∈ 𝒫 if A has content zero. We can thus ignore sets of content zero for the purpose of integration. In practice, one usually takes advantage of this when computing integrals, rather than using (3.8). For instance, in computing an integral over Sⁿ, we can "ignore" any meridian: for example, if
    A = {x ∈ Sⁿ : x = ⟨t, 0, ..., 0, ±√(1 − t²)⟩ ∈ ℝ^{n+1}},
then ∫_A ρ = 0 for any ρ.
This means that we can compute ∫_{Sⁿ} ρ by introducing polar coordinates (Fig. 10.1) and expressing ρ in terms of them. Thus on S², if U = S² − A and α is the polar coordinate chart on U, then
    ∫_{S²} ρ = ∫₀^{2π} ∫₀^{π} ρ_α dθ dφ.
Fig. 10.1 [α(S² − A) is an open rectangle of width 2π in the (θ, φ)-plane.]
It is worth observing that if N is a differentiable manifold of dimension less than dim M and ψ is a differentiable map of N → M, then Proposition 7.3 of Chapter 8 implies that if A is any compact subset of N, then ψ(A) has content zero in M. In this sense, one can ignore "lower-dimensional sets" when integrating on M.
4. VOLUME DENSITY OF A RIEMANN METRIC
Let M be a differentiable manifold with a Riemann metric m. We define the density σ [= σ(m)] as follows. For each chart (U, α) with coordinates x¹, ..., xⁿ let
    σ_α(α(x)) = |det [((∂/∂x^i)(x), (∂/∂x^j)(x))]|^{1/2} = |det (g_ij(x))|^{1/2}.    (4.1)
Here
    (g_ij(x))
is the matrix whose ijth entry is the scalar product of the vectors
    (∂/∂x^i)(x) and (∂/∂x^j)(x),
so that (in view of Exercise 8.1 of Chapter 8)
    σ_α(α(x)) = volume of the parallelepiped spanned by the (∂/∂x^i)(x) with respect to the Euclidean metric ( , )_{m,x} on T_x(M).
It is easy to see that (4.1) actually defines a density. Let (W, β) be a second chart about x with coordinates y¹, ..., yⁿ. Then
    (∂/∂y^k)(x) = Σ_i (∂x^i/∂y^k)(x) (∂/∂x^i)(x),
so that
    σ_β(β(x)) = |det [((∂/∂y^k)(x), (∂/∂y^l)(x))]|^{1/2}.
Now
    ((∂/∂y^k)(x), (∂/∂y^l)(x)) = Σ_{i,j} (∂x^i/∂y^k)(x)(∂x^j/∂y^l)(x) ((∂/∂x^i)(x), (∂/∂x^j)(x))
for all k and l. We can write this as the matrix equation
    [((∂/∂y^k), (∂/∂y^l))] = [∂x^i/∂y^k]^T [((∂/∂x^i), (∂/∂x^j))] [∂x^j/∂y^l],
so that
    σ_β(β(x)) = |det [((∂/∂x^i)(x), (∂/∂x^j)(x))] det [∂x^i/∂y^k] det [∂x^j/∂y^l]|^{1/2}
              = |det [((∂/∂x^i)(x), (∂/∂x^j)(x))]|^{1/2} |det [∂x^i/∂y^k]|
              = σ_α(α(x)) |det [∂x^i/∂y^k]|(x),
which is the transition law (3.1).
If M is an open subset of Euclidean space with the Euclidean metric, then the volume density, when integrated over any contented set, yields the ordinary Euclidean volume of that set. In fact, if x¹, ..., xⁿ are orthonormal coordinates corresponding to the identity chart, then g_ij(x) = 0 if i ≠ j and g_ii = 1, so that σ_id ≡ 1 and thus
    ∫_A σ = ∫_A 1 = μ(A).
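(Editorial sketch, not part of the original text: for the induced metric (9.10) on S² in the chart (θ, φ), the matrix (g_ij) is diag(1, sin²θ), so σ = |det(g_ij)|^{1/2} = sin θ on 0 < θ < π; integrating σ over the chart recovers the familiar area 4π.)

    import sympy as sp

    theta, phi = sp.symbols('theta phi')
    g = sp.Matrix([[1, 0], [0, sp.sin(theta)**2]])   # the metric (9.10)
    sigma = sp.sin(theta)                            # |det g|^{1/2} on 0 < theta < pi
    assert sp.simplify(g.det() - sigma**2) == 0
    print(sp.integrate(sigma, (theta, 0, sp.pi), (phi, 0, 2*sp.pi)))   # -> 4*pi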
More generally, let φ be an immersion of a k-dimensional manifold M into ℝⁿ such that φ(M) is an open subset of a k-dimensional hyperplane in ℝⁿ, and let m be the Riemann metric induced on M by φ. Then, if σ denotes the corresponding volume density, ∫_A σ is the k-dimensional Euclidean volume of φ(A). In fact, by a Euclidean motion, we may assume that φ maps M into ℝ^k ⊂ ℝⁿ. Then, since φ is an immersion and M is k-dimensional, we can use x¹, ..., x^k as coordinates on M and conclude, as before, that σ is given by the function 1 in terms of these coordinates, and hence that ∫_A σ = μ(φ(A)).
Now let φ₁ and φ₂ be two immersions of M → ℝⁿ. Let (U, α) be a coordinate chart on M with coordinates y¹, ..., y^k. If m_i is the Riemann metric induced by φ_i, then
    (m₁)_kl = (∂φ₁/∂y^k, ∂φ₁/∂y^l) and (m₂)_kl = (∂φ₂/∂y^k, ∂φ₂/∂y^l),
where the scalar product on the right is the Euclidean scalar product.
Fig. 10.2
Let σ₁ and σ₂ be the volume densities corresponding to m₁ and m₂. Then
    σ₁α = |det [(∂φ₁/∂y^i, ∂φ₁/∂y^j)]|^{1/2} and σ₂α = |det [(∂φ₂/∂y^i, ∂φ₂/∂y^j)]|^{1/2}.
In particular, given an L > 0, there is a K = K(k, n, L) such that if
    ‖∂φ₁/∂y^i‖ < L and ‖∂φ₂/∂y^i‖ < L for all i = 1, ..., k,
then, by the mean-value theorem,
    |σ₁α − σ₂α| ≤ K ( ‖∂φ₂/∂y¹ − ∂φ₁/∂y¹‖ + ⋯ + ‖∂φ₂/∂y^k − ∂φ₁/∂y^k‖ ).
Roughly speaking, this means that if φ₁ and φ₂ are close, in the sense that their derivatives are close, then the densities they induce are close.
We apply this remark to the following situation. We let φ₁ be an immersion of M into ℝⁿ and let (W, α) be some chart of M with coordinates y¹, ..., y^k. We let U = W − C = ⋃ U_l, where C is some closed set of content zero and U_l ∩ U_{l′} = ∅ if l ≠ l′. For each l let z_l be a point of U_l whose coordinates are ⟨y_l¹, ..., y_l^k⟩, and for z = ⟨y¹, ..., y^k⟩ ∈ U_l define φ₂ by setting
    φ₂(y¹, ..., y^k) = φ₁(z_l) + Σ_i (y^i − y_l^i)(∂φ₁/∂y^i)(z_l).
(See Fig. 10.2.)
If the U_l's are sufficiently small, then
    ‖∂φ₂/∂y^i − ∂φ₁/∂y^i‖
will be small. More generally, we could choose φ₂ to be any affine linear map approximating φ₁ on each U_l. We thus see that the volume of W in terms of the Riemann metric induced by φ₁ is the limit of the (surface) volume of polyhedra approximating φ₁(W). Here the approximation must be in the sense of slope (i.e., the derivatives must be close) and not merely in the sense of position.
The construction of the volume density can be generalized and suggests an alternative definition of the notion of density. In fact, let ρ be a rule which assigns to each x in M a function, ρ_x, on n tangent vectors in T_x(M) subject to the rule
    ρ_x(Aξ₁, ..., Aξ_n) = |det A| ρ_x(ξ₁, ..., ξ_n),    (4.2)
where ξ_i ∈ T_x(M) and A: T_x(M) → T_x(M) is a linear transformation. Then we see that ρ determines a density by setting
    ρ_α(α(x)) = ρ_x((∂/∂u¹)(x), ..., (∂/∂uⁿ)(x))    (4.3)
if (U, α) is a chart with coordinates u¹, ..., uⁿ. The fact that (4.3) defines a density follows immediately from (4.2) and the transformation law for the ∂/∂u^i under change of coordinates.
Conversely, given a density ρ in terms of the ρ_α, define ρ_x(∂/∂u¹, ..., ∂/∂uⁿ) by (4.3). Since the vectors {∂/∂u^i}_{i=1,...,n} form a basis at each x in U, any ξ₁, ..., ξ_n in T_x(M) can be written as
    ξ_i = B (∂/∂u^i)(x),
where B is a linear transformation of T_x(M) into itself. Then (4.2) determines ρ_x(ξ₁, ..., ξ_n) as
    ρ_x(ξ₁, ..., ξ_n) = |det B| ρ_α(α(x)).    (4.4)
That this definition is consistent (i.e., doesn't depend on α) follows from (4.2) and the transformation law (3.1) for densities.
EXERCISES
4.1 Let M = S¹ × S¹ be the torus, and let φ: M → ℝ⁴ be given by
    x¹ ∘ φ(θ₁, θ₂) = cos θ₁,
    x² ∘ φ(θ₁, θ₂) = sin θ₁,
    x³ ∘ φ(θ₁, θ₂) = 2 cos θ₂,
    x⁴ ∘ φ(θ₁, θ₂) = 2 sin θ₂,
where x¹, ..., x⁴ are the rectangular coordinates on ℝ⁴ and θ₁, θ₂ are angular coordinates on M.
a) Express the Riemann metric induced on M by φ (from the Euclidean metric on ℝ⁴) in terms of the coordinates θ₁, θ₂. [That is, compute the g_ij(θ₁, θ₂).]
b) What is the volume of M relative to this Riemann metric?
4.2 Consider the Riemann metric induced on S¹ × S¹ by the immersion φ into E³ given by
    x ∘ φ(u, v) = (a − cos u) cos v,
    y ∘ φ(u, v) = (a − cos u) sin v,
    z ∘ φ(u, v) = sin u,
where u and v are angular coordinates and a > 2. What is the total surface area of S¹ × S¹ under this metric?
4.3 Let φ map a region U of the xy-plane into E³ by the formula
    φ(x, y) = (x, y, F(x, y)),
so that φ(U) is the surface z = F(x, y). (See Fig. 10.3.) Show that the area of this surface is given by
    ∫∫_U √(1 + (∂F/∂x)² + (∂F/∂y)²).
4.4 Find the area of the paraboloid z = x² + y² for x² + y² ≤ 1.
4.5 Let U ⊂ ℝ², and let φ: U → E³ be given by
    φ(u, v) = (x(u, v), y(u, v), z(u, v)),
where x, y, z are rectangular coordinates on E³. Show that the area of the surface φ(U) is given by
    ∫∫_U √( (∂x/∂u ∂y/∂v − ∂x/∂v ∂y/∂u)² + (∂y/∂u ∂z/∂v − ∂y/∂v ∂z/∂u)² + (∂x/∂u ∂z/∂v − ∂x/∂v ∂z/∂u)² ).
4.6 Compute the surface area of the unit sphere in E³.
4.7 Let M₁ and M₂ be differentiable manifolds, and let σ be a density on M₂ which is nowhere zero. For each density ρ on M₁ × M₂, each product chart (U₁ × U₂, α₁ × α₂), and each x₂ ∈ U₂, define the function ρ₁α₁(·, x₂) by
    ρ₁α₁(v₁, x₂) σ_{α₂}(α₂(x₂)) = ρ_{α₁×α₂}(v₁, α₂(x₂))
for all v₁ ∈ α₁(U₁).
a) Show that ρ₁α₁(v₁, x₂) is independent of the chart (U₂, α₂).
b) Show that for each fixed x₂ ∈ M₂ the functions ρ₁α₁(·, x₂) define a density on M₁. We shall call this density ρ₁(x₂).
c) Show that if ρ is a smooth density of compact support on M₁ × M₂ and σ is smooth, then ρ₁(x₂) is a smooth density of compact support on M₁.
d) Let ρ be as in (c). Define the function F_ρ on M₂ by
    F_ρ(x₂) = ∫_{M₁} ρ₁(x₂).
Sketch how you would prove the fact that F_ρ is a smooth function of compact support on M₂ and that
    ∫_{M₁×M₂} ρ = ∫_{M₂} F_ρ σ.
5. PULLBACK AND LIE DERIVATIVES OF DENSITIES
Let φ: M₁ → M₂ be a diffeomorphism, and let ρ be a density on M₂. Define the density φ^*ρ on M₁ by
    (φ^*ρ)(ξ₁, ..., ξ_n) = ρ(φ_*ξ₁, ..., φ_*ξ_n)    (5.1)
for ξ_i ∈ T_x(M₁) and φ_* = φ_{*x}. To show that φ^*ρ is actually a density, we must check that (4.2) holds for any linear transformation A of T_x(M₁). But
    φ^*ρ(Aξ₁, ..., Aξ_n) = ρ(φ_*Aξ₁, ..., φ_*Aξ_n)
        = ρ(φ_*Aφ_*⁻¹φ_*ξ₁, ..., φ_*Aφ_*⁻¹φ_*ξ_n)
        = |det φ_*Aφ_*⁻¹| ρ(φ_*ξ₁, ..., φ_*ξ_n)
        = |det A| φ^*ρ(ξ₁, ..., ξ_n),
which is the desired identity.
Let (U, α) and (W, β) be compatible charts on M₁ and M₂ with coordinates u¹, ..., uⁿ and w¹, ..., wⁿ, respectively. Then for all points of U we have, by (4.3),
    (φ^*ρ)_α(α(·)) = ρ(φ_* ∂/∂u¹, ..., φ_* ∂/∂uⁿ)
                   = |det (∂w^i/∂u^j)| ρ(∂/∂w¹, ..., ∂/∂wⁿ)
                   = |det (∂w^i/∂u^j)| ρ_β(β ∘ φ(·)).
In other words, we have
    (φ^*ρ)_α = |det J_{β∘φ∘α⁻¹}| ρ_β(β ∘ φ ∘ α⁻¹(·)).    (5.2)
The density φ^*ρ is called the pullback of ρ by φ. It is clear that
    φ^*(ρ₁ + ρ₂) = φ^*(ρ₁) + φ^*(ρ₂)
and that
    φ^*(fρ) = φ^*(f) φ^*(ρ)
for any function f. It follows directly from the definition that
    supp φ^*ρ = φ⁻¹[supp ρ].
Proposition 5.1. Let φ: M₁ → M₂ be a diffeomorphism, and let ρ be a locally absolutely integrable density with compact support on M₂. Then
    ∫ φ^*ρ = ∫ ρ.    (5.3)
Proof. It suffices to prove (5.3) for the case
    supp ρ ⊂ φ(U)
for some chart (U, α) of M₁ with φ(U) ⊂ W, where (W, β) is a chart of M₂. In fact, the set of all such φ(U) is an open covering of M₂, and we can therefore choose a partition of unity {g_j} subordinate to it. If we write ρ = Σ g_jρ, then the sum is finite and each g_jρ has the desired property. Since both sides of (5.3) are linear, we conclude that it suffices to prove (5.3) for each term.
Now if supp ρ ⊂ φ(U), then
    ∫ ρ = ∫_{β(W)} ρ_β = ∫_{β∘φ(U)} ρ_β
and, by (5.2) and the change of variables formula on ℝⁿ,
    ∫ φ^*ρ = ∫_{α(U)} (φ^*ρ)_α = ∫_{α(U)} |det J_{β∘φ∘α⁻¹}| ρ_β(β∘φ∘α⁻¹(·)) = ∫_{β∘φ(U)} ρ_β,
thus establishing (5.3). □
Now let φ_t be a one-parameter group on M with infinitesimal generator X. Let ρ be a density on M, let (U, α) be a chart, and let W be an open subset of U such that φ_t(W) ⊂ U for all |t| < ε. Then
    (φ_t^*ρ)_α(v) = ρ_α(Φ_α(v, t)) |det (∂Φ_α/∂v)_{(v,t)}| for v ∈ α(W),
where Φ_α(v, t) = α ∘ φ_t ∘ α⁻¹(v) and (∂Φ_α/∂v)_{(v,t)} is the Jacobian of v → Φ_α(v, t). We would like to compute the derivative of this expression with respect to t at t = 0. Now Φ_α(v, 0) = v, and so
    det (∂Φ_α/∂v)_{(v,0)} = 1.
Consequently, we can conclude that
    det (∂Φ_α/∂v)_{(v,t)} > 0
for t close to zero. We can therefore omit the absolute-value sign and write
    d(φ_t^*ρ)_α/dt |_{t=0} = dρ_α(Φ_α)/dt |_{t=0} + ρ_α(v) d(det ∂Φ_α/∂v)/dt |_{t=0}.
We simply evaluate the first derivative on the right by the chain rule, and get
    dρ_α(Φ_α)/dt |_{t=0} = dρ_α(X_α(v)).
In terms of coordinates x¹, ..., xⁿ, we can write
    dρ_α(X_α(v)) = Σ_i X_α^i(v) ∂ρ_α/∂x^i
if X_α = ⟨X_α^1, ..., X_α^n⟩.
To evaluate the second term on the right, we need to make a preliminary observation. Let A(t) = (a_ij(t)) be a differentiable matrix-valued function of t with A(0) = id = (δ_i^j). Then
    d(det A(t))/dt |_{t=0} = lim_{t→0} (1/t)(det A(t) − 1).
Now a_ii(0) = 1 and a_ij(0) = 0 (i ≠ j). To say that A is differentiable means that each of the functions a_ij(t) is differentiable. We can therefore find a constant K such that |a_ij(t)| ≤ K|t| (i ≠ j) and |a_ii(t) − 1| ≤ K|t|. In the expansion of det A(t), the only term which will not vanish at least as t² is the diagonal product a₁₁(t) ⋯ a_nn(t). In fact, any other term in Σ ± a_{1i₁}(t) ⋯ a_{ni_n}(t) involves at least two off-diagonal factors and thus vanishes at least as t². Thus
    lim_{t→0} (1/t)(det A(t) − 1) = lim_{t→0} (1/t)(a₁₁(t) ⋯ a_nn(t) − 1) = a′₁₁(0) + ⋯ + a′_nn(0) = tr A′(0).
If we take A = ∂Φ_α/∂v, we conclude that
    d/dt (det ∂Φ_α/∂v) |_{t=0} = tr (∂X_α/∂v) = Σ_i ∂X_α^i/∂x^i.
Thus
    d(φ_t^*ρ)_α/dt |_{t=0} = Σ_i X_α^i ∂ρ_α/∂x^i + ρ_α Σ_i ∂X_α^i/∂x^i = Σ_i ∂(X_α^i ρ_α)/∂x^i.
We repeat:
Proposition 5.2. Let φ_t be a one-parameter group of diffeomorphisms of M with infinitesimal generator X, and let ρ be a differentiable density on M. Then
    D_Xρ = lim_{t→0} (φ_t^*ρ − ρ)/t
exists and is given locally by
    (D_Xρ)_α = Σ_i ∂(X_α^i ρ_α)/∂x^i
if X_α = ⟨X_α^1, ..., X_α^n⟩ on the chart (U, α).
The density D_Xρ is sometimes called the divergence of ⟨X, ρ⟩ and is denoted by div ⟨X, ρ⟩. Thus div ⟨X, ρ⟩ = D_Xρ is the density given by
    (div ⟨X, ρ⟩)_α = Σ_i ∂(X_α^i ρ_α)/∂x^i on (U, α).
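(Editorial sketch, not part of the original text: the local formula for div ⟨X, ρ⟩ is immediate to evaluate; the field and density below are arbitrary choices.)

    import sympy as sp

    x, y = sp.symbols('x y')
    X = [x*y, -y**2]                 # a vector field on R^2, chosen arbitrarily
    rho = sp.exp(x)                  # a density in this chart, chosen arbitrarily
    div = sp.simplify(sum(sp.diff(Xi*rho, xi) for Xi, xi in zip(X, [x, y])))
    print(div)                       # -> y*(x - 1)*exp(x)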
Now let ρ be a differentiable density, and let A be a compact contented set. Then
    ∫_{φ_t(A)} ρ = ∫_M e_{φ_t(A)} ρ
               = ∫_M φ_t^*(e_{φ_t(A)} ρ)
               = ∫ (φ_t^* e_{φ_t(A)})(φ_t^* ρ)
               = ∫ e_A φ_t^*(ρ)
               = ∫_A φ_t^* ρ.
Thus
    (1/t)(∫_{φ_t(A)} ρ − ∫_A ρ) = ∫_A (1/t)(φ_t^* ρ − ρ).
Fig. 10.4
Using a partition of unity, we can easily see that the limit under the integral sign is uniform, and we thus have the formula
    d/dt (∫_{φ_t(A)} ρ)|_{t=0} = ∫_A D_Xρ = ∫_A div ⟨X, ρ⟩.
6. THE DIVERGENCE THEOREM
Let φ be a flow on a differentiable manifold M with infinitesimal generator X. Let ρ be a density belonging to 𝒫, and let A be a contented subset of M. Then for small values of t, we would expect the difference ∫_{φ_t(A)} ρ − ∫_A ρ to depend only on what is happening near the boundary of A (Fig. 10.4). In the limit, we would expect the derivative of ∫_{φ_t(A)} ρ at t = 0 (which is given by ∫_A div ⟨X, ρ⟩) to be given by some integral over ∂A. In order to formulate such a result, we must first single out a class of sets whose boundaries are sufficiently nice to allow us to integrate over them. We therefore make the following definition:
Definition. Let M be a differentiable manifold, and let D be a subset of M. We say that D is a domain with regular boundary if for every x ∈ M there is a chart (U, α) about x, with coordinates x_α^1, ..., x_α^n, such that one of the following three possibilities holds:
 i) U ∩ D̄ = ∅;
 ii) U ⊂ D;
 iii) α(U ∩ D) = α(U) ∩ {v = ⟨v¹, ..., vⁿ⟩ ∈ ℝⁿ : vⁿ ≥ 0}.
Note that if x ∉ D̄, we can always find a (U, α) about x such that (i) holds. If x ∈ int D, we can always find a chart (U, α) about x such that (ii) holds. This imposes no restrictions on D. The crucial condition is imposed when x ∈ ∂D. Then we cannot find charts about x satisfying (i) or (ii). In this case, (iii) implies that α(U ∩ ∂D) is an open subset of ℝ^{n−1} (Fig. 10.5). In fact, α(U ∩ ∂D) = {v ∈ α(U) : vⁿ = 0} = α(U) ∩ ℝ^{n−1}, where we regard ℝ^{n−1} as the subspace of ℝⁿ consisting of those vectors with last component zero.
Fig. 10.5 [α(U ∩ ∂D) lies in the hyperplane vⁿ = 0.]
Let 𝒜 be an atlas of M such that each chart of 𝒜 satisfies either (i), (ii), or (iii). For each (U, α) ∈ 𝒜 consider the map α↾∂D: U ∩ ∂D → ℝ^{n−1} ⊂ ℝⁿ. [Of course, the maps α↾∂D will have a nonempty domain of definition only for charts of type (iii).] We claim that {(U ∩ ∂D, α↾∂D)} is an atlas on ∂D. In fact, let (U, α) and (W, β) be two charts in 𝒜 such that U ∩ W ∩ ∂D ≠ ∅. Let x¹, ..., xⁿ be the coordinates of (U, α), and let y¹, ..., yⁿ be those of (W, β). The map β ∘ α⁻¹ is given by
    y^i = y^i(x¹, ..., xⁿ), i = 1, ..., n.
On α(U ∩ W ∩ ∂D), we have xⁿ = 0 and yⁿ = 0. In particular,
    yⁿ(x¹, ..., x^{n−1}, 0) ≡ 0,
and the functions y¹(x¹, ..., x^{n−1}, 0), ..., y^{n−1}(x¹, ..., x^{n−1}, 0) are differentiable. This shows that (β↾∂D) ∘ (α↾∂D)⁻¹ is differentiable on α(U ∩ ∂D). We thus get a manifold structure on ∂D.
It is easy to see that this manifold structure is independent of the particular atlas of M that was chosen. We shall denote by ι the map of ∂D → M which sends each x ∈ ∂D, regarded as an element of M, into itself. It is clear that ι is a differentiable map. (In fact, (U ∩ ∂D, α↾∂D) and (U, α) are compatible charts in terms of which α ∘ ι ∘ (α↾∂D)⁻¹ is just the injection of ℝ^{n−1} → ℝⁿ.)
Fig. 10.6
Let x be a point of ∂D regarded as a point of M, and let ξ be an element of T_x(M). We say that ξ points into D if for every curve C with C′(0) = ξ we have C(t) ∈ D for sufficiently small positive t (Fig. 10.6). In terms of a chart (U, α) of type (iii), let ξ_α = ⟨ξ¹, ..., ξⁿ⟩. Then it is clear that ξ points into D if and only if ξⁿ > 0. Similarly, a tangent vector ξ points out of D (obvious definition) if and only if ξⁿ < 0. If ξⁿ = 0, then ξ is tangent to the boundary—it lies in ι_*T_x(∂D).
Let ρ be a density on M and X a vector field on M. Define the density ρ_X on ∂D by
    ρ_X(ξ₁, ..., ξ_{n−1}) = ρ(ι_*ξ₁, ..., ι_*ξ_{n−1}, X(x)) for ξ_i ∈ T_x(∂D).    (6.1)
It is easy to check that (6.1) defines a density. (This is left as an exercise for the reader.) If (U, α) is a chart of type (iii) about x and X_α = ⟨X¹, ..., Xⁿ⟩, then applying (4.3) to the chart (U ∩ ∂D, α↾∂D) and the density ρ_X, we see that
    (ρ_X)_{α↾∂D} = ρ(∂/∂x¹, ..., ∂/∂x^{n−1}, X).
Let A be the linear transformation of T_x(M) given by
    A ∂/∂x^i = ∂/∂x^i for i = 1, ..., n − 1, and A ∂/∂xⁿ = X.
The matrix of A is the identity matrix in its first n − 1 columns, with last column ⟨X¹, ..., Xⁿ⟩, and therefore |det A| = |Xⁿ|. Thus we have
    (ρ_X)_{α↾∂D} = |Xⁿ| ρ_α at all points of α(U ∩ ∂D).    (6.2)
We can now state our result.
Theorem 6.1 (The divergence theorem).† Let D be a domain with regular boundary, let ρ ∈ 𝒫, and let X be a smooth vector field on M. Define the function ε_X on ∂D by
    ε_X(x) = 1 if X(x) points out of D,
    ε_X(x) = 0 if X(x) is tangent to ∂D,
    ε_X(x) = −1 if X(x) points into D.
Then
    ∫_D div ⟨X, ρ⟩ = ∫_{∂D} ε_X ρ_X.    (6.3)
Remark. In terms of a chart of type (iii), the function ε_X is given by
    ε_X = −sgn Xⁿ.    (6.4)
† This formulation and proof of the divergence theorem was suggested to us by Richard Rasala.
Figs. 10.7, 10.8, 10.9, 10.10
Proof. Let 𝒜 be an atlas of M each of whose charts is one of the three types. Let {g_i} be a partition of unity subordinate to 𝒜. Write ρ = Σ g_iρ. This is a finite sum. Since both sides of (6.3) are linear functions of ρ, it suffices to verify (6.3) for each of the summands g_iρ. Changing our notation (replacing g_iρ by ρ), we reduce the problem to proving (6.3) under the additional assumption supp ρ ⊂ U, where (U, α) is a chart of type (i), (ii), or (iii). There are therefore three cases to consider.
CASE I. supp ρ ⊂ U and U ∩ D̄ = ∅. (See Fig. 10.7.) Then both sides of (6.3) vanish, and so (6.3) is correct.
CASE II. supp ρ ⊂ U with U ⊂ int D. (See Fig. 10.8.) Then the right-hand side of (6.3) vanishes. We must show that the left-hand side does also. But
    ∫_D div ⟨X, ρ⟩ = ∫_U div ⟨X, ρ⟩ = ∫_{α(U)} Σ_i ∂(X^i ρ_α)/∂x^i = Σ_i ∫_{α(U)} ∂(X^i ρ_α)/∂x^i.
Now each of the functions X^i ρ_α has its support lying inside α(U). Choose some large R so that α(U) ⊂ □_{−R}^{R}. We can then replace ∫_{α(U)} by ∫_{□_{−R}^{R}}; we extend X^i ρ_α to all of ℝⁿ by setting it equal to zero outside α(U). (See Fig. 10.9.) Writing the integral as an iterated integral and integrating with respect to x^i first, we see that
    ∫_{α(U)} ∂(X^i ρ_α)/∂x^i = ∫ [X^i ρ_α(..., R, ...) − X^i ρ_α(..., −R, ...)] dx¹ ⋯ dx^{i−1} dx^{i+1} ⋯ dxⁿ = 0.
This last integral vanishes because the function X^i ρ_α vanishes outside α(U).
CASE III. supp ρ is contained in a chart of type (iii). (See Fig. 10.10.) Then
    ∫_D div ⟨X, ρ⟩ = ∫_{D∩U} div ⟨X, ρ⟩ = Σ_i ∫_{α(D∩U)} ∂(X^i ρ_α)/∂x^i.
Now
    α(U ∩ D) = α(U) ∩ {v : vⁿ ≥ 0}.
We can therefore replace the domain of integration by the rectangle from ⟨−R, ..., −R, 0⟩ to ⟨R, ..., R⟩. (See Fig. 10.11.) For 1 ≤ i < n all the integrals in the sum vanish as before. For i = n we obtain
    ∫_D div ⟨X, ρ⟩ = −∫_{ℝ^{n−1}} Xⁿ ρ_α (evaluated on vⁿ = 0).
If we compare this with (6.2) and (6.4), we see that this is exactly the assertion of (6.3). □
Fig. 10.11
If the manifold M is given a Riemann metric, then we can give an alternative version of the divergence theorem. Let dV be the volume density of the Riemann metric, so that
    dV(ξ₁, ..., ξ_n) = |det ((ξ_i, ξ_j))|^{1/2}, ξ_i ∈ T_x(M),
is the volume of the parallelepiped spanned by the ξ_i in the tangent space (with respect to the Euclidean metric given by the scalar product on the tangent space). Now the map ι is an immersion, and therefore we get an induced Riemann metric on ∂D. Let dS be the corresponding volume density on ∂D. Thus, if {ξ_i}_{i=1,...,n−1} are n − 1 vectors in T_x(∂D), dS(ξ₁, ..., ξ_{n−1}) is the (n − 1)-dimensional volume of the parallelepiped spanned by ι_*ξ₁, ..., ι_*ξ_{n−1} in ι_*T_x(∂D) ⊂ T_x(M). For any x ∈ ∂D let n ∈ T_x(M) be the vector of unit length which is orthogonal to ι_*T_x(∂D) and which points out of D (Fig. 10.12).
Fig. 10.12  Fig. 10.13
We clearly have
    dS(ξ₁, ..., ξ_{n−1}) = dV(ι_*ξ₁, ..., ι_*ξ_{n−1}, n).
For any vector X(x) ∈ T_x(M) (Fig. 10.13) the volume of the parallelepiped spanned by ξ₁, ..., ξ_{n−1}, X(x) is |(X(x), n)| dS(ξ₁, ..., ξ_{n−1}). [In fact, write
    X(x) = (X(x), n) n + m,
where m ∈ ι_*T(∂D).] If we compare this with (6.1), we see that
    dV_X = |(X, n)| dS.    (6.5)
Furthermore, it is clear that
    ε_X(x) = sgn (X(x), n).
Let ρ be any density on M. Then we can write
    ρ = j dV,
where j is a function. Furthermore, we clearly have ρ_X = j dV_X and
    div ⟨X, ρ⟩ = div ⟨X, j dV⟩.
We can then rewrite (6.3) as
    ∫_D div ⟨X, j dV⟩ = ∫_{∂D} j (X, n) dS.    (6.6)
7. MORE COMPLICATED DOMAINS
For many purposes, Theorem 6.1 is not quite sufficiently broad. The trouble is
that we would like to apply (6.3) to domains whose boundaries are not com-
pletely smooth. For instance, we would like to apply it to a rectangle in ℝⁿ. Now the boundary of a rectangle is regular at all points except those lying on an
edge (i.e., the intersection of two faces). Since the edges form a set "of dimension
n - 2", we would expect that their presence does not invalidate (6.3). This is
in fact the case.
Let M be a differentiable manifold, and let D be a subset of M. We say that D is a domain with almost regular boundary if to every x ∈ M there is a chart (U, α) about x, with coordinates x_α^1, ..., x_α^n, such that one of the following four possibilities holds:
 i) U ∩ D̄ = ∅;
 ii) U ⊂ D;
 iii) α(U ∩ D) = α(U) ∩ {v = ⟨v¹, ..., vⁿ⟩ ∈ ℝⁿ : vⁿ ≥ 0};
 iv) α(U ∩ D) = α(U) ∩ {v = ⟨v¹, ..., vⁿ⟩ ∈ ℝⁿ : v^k ≥ 0, ..., vⁿ ≥ 0}.
The novel point is that we are now allowing for possibility (iv) where k < n. This, of course, is a new possibility only if n > 1. Let us assume n > 1 and see what (iv) allows. We can write α(U ∩ ∂D) as the union of certain open subsets lying in (n − 1)-dimensional subspaces of ℝⁿ, together with a union of portions lying in subspaces of dimension n − 2.
Fig. 10.14
In fact, for k ≤ p ≤ n let
    H_p^k = {v : v^k > 0, ..., v^{p−1} > 0, v^p = 0, v^{p+1} > 0, ..., vⁿ > 0}.
Thus H_p^k is an open subset of the (n − 1)-dimensional subspace given by v^p = 0. (See Fig. 10.14.) We can write
    α(U ∩ ∂D) ⊂ α(U) ∩ {(H_k^k ∪ H_{k+1}^k ∪ ⋯ ∪ H_n^k) ∪ S},
where S is the union of the subspaces (of dimension n − 2) where at least two of the v^p vanish.
Fig. 10.15
Observe that if x ∈ U ∩ ∂D is such that α(x) ∈ H_p^k for some p, then there is a chart about x of type (iii). In fact, simply renumber the coordinates so that v^p becomes vⁿ; that is, map ℝⁿ → ℝⁿ by a map κ sending ⟨v¹, ..., vⁿ⟩ → ⟨w¹, ..., wⁿ⟩, where
    w^i = v^i for i < p,
    w^i = v^{i+1} for p ≤ i < n,
    wⁿ = v^p.
Then in a sufficiently small neighborhood U₁ of x the chart (U₁, κ ∘ α) is of type (iii). (See Fig. 10.15.)
We next observe that the set of x ∈ ∂D having a neighborhood of type (iii) forms a differentiable manifold. The argument is just as before. The only difference is that this time these points do not exhaust all of ∂D. We shall denote this manifold by ∂̃D. Thus ∂̃D is a manifold which, as a set, is not ∂D but only the set of "regular" points of ∂D, that is, those having charts of type (iii).
Theorem 7.1 (The divergence theorem). Let M be an n-dimensional manifold, and let D ⊂ M be a domain with almost regular boundary. Let ∂̃D be as above, and let ι be the injection of ∂̃D → M. Then for any ρ ∈ 𝒫 we have
    ∫_D div ⟨X, ρ⟩ = ∫_{∂̃D} ε_X ρ_X.    (7.1)
Proof. The proof proceeds as before. We choose an atlas of charts of types (i) through (iv) and a partition of unity {g_j} subordinate to the atlas. We write ρ = Σ g_jρ and now have four cases to consider. The first three cases have already been handled.
The new case arises when ρ has its support in U, where (U, α) is a chart of type (iv). We must evaluate
    ∫_{α(U∩D)} Σ_i ∂(X^i ρ_α)/∂x^i.
The terms in the sum corresponding to i < k make no contribution to the integral, as before. Let us extend X^i ρ_α to be defined on all of ℝⁿ by setting it equal to zero outside α(U), just as before. Then, for k ≤ i ≤ n we have
    ∫_{α(U∩D)} ∂(X^i ρ_α)/∂x^i = ∫_B ∂(X^i ρ_α)/∂x^i,
where B = {v : v^k ≥ 0, ..., vⁿ ≥ 0}. Writing this as an iterated integral and integrating first with respect to x^i, we obtain
    ∫_B ∂(X^i ρ_α)/∂x^i = −∫_{A_i} X^i ρ_α,
where the set A_i ⊂ ℝ^{n−1} is given by
    A_i = {⟨v¹, ..., v^{i−1}, v^{i+1}, ..., vⁿ⟩ : v^k ≥ 0, ..., vⁿ ≥ 0}.
Note that A_i differs from H_i^k by a set of content zero in ℝ^{n−1} (namely, where at least one of the v^l = 0 for k ≤ l ≤ n, l ≠ i). Thus we can replace the A_i by the H_i^k in the integral. Summing over k ≤ i ≤ n, we get
    ∫_D div ⟨X, ρ⟩ = −Σ_{i=k}^{n} ∫_{H_i^k} X^i ρ_α = ∫_{∂̃D} ε_X ρ_X,
which is exactly the assertion of Theorem 7.1 for case (iv). □
Figs. 10.16, 10.17, 10.18
We should point out that even Theorem 7.1 does not cover all cases for which
it is useful to have a divergence theorem. For instance, in the plane, Theorem 7.1
does apply to the case where D is a triangle. (See Fig. 10.16.) This is because
we can "stretch" each angle to a right angle (in fact, we can do this by a linear
change of variables of ~2). (See Fig. 10.17.)
However, Theorem 7.1 does not apply to a quadrilateral such as the one in Fig. 10.18, since there is no C¹-transformation that will convert an angle greater than π into one smaller than π (since its Jacobian at the corner must carry lines into lines). Thus Theorem 7.1 doesn't apply directly. However, we can
write the quadrilateral as the union of two triangles, apply Theorem 7.1 to each
triangle, and note that the contributions of each triangle coming from the
common boundary cancel each other out. Thus the divergence theorem does
apply to our quadrilateral.
This procedure works in a quite general context. In fact, it works for all
cases where we shall need the divergence theorem in this book, whether Theorem
7.1 applies directly or we can reduce to it by a finite subdivision of our domain,
followed by a limiting argument. We shall not, however, formulate a general
theorem covering all such cases; it is clear in each instance how to proceed.
EXERCISES
In Euclidean space we shall write div X instead of div⟨X, ρ⟩ when ρ is taken to be the Euclidean volume density.
7.1 Let x, y, z be rectangular coordinates on 𝔼³. Let the vector field X be given by
$$X = r^2\left(x\frac{\partial}{\partial x} + y\frac{\partial}{\partial y} + z\frac{\partial}{\partial z}\right),$$
where r² = x² + y² + z². Show directly that
$$\int_S (X, n)\, dA = \int_B \operatorname{div} X$$
by integrating both sides. Here B is a ball centered at the origin and S is its boundary.
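For readers who want a mechanical check of this computation, here is a small symbolic sketch (ours, not the book's) using the Python library sympy; it assumes only the classical coordinate formula for the divergence and the spherical volume element, and verifies that both sides equal 4πR⁵ for the ball of radius R.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
r, th, ph, R = sp.symbols('r theta phi R', positive=True)

r2 = x**2 + y**2 + z**2
X = [r2*x, r2*y, r2*z]
divX = sum(sp.diff(Xi, v) for Xi, v in zip(X, (x, y, z)))   # 5(x^2+y^2+z^2)

# Pass to spherical coordinates and integrate over the ball of radius R
# (volume element r^2 sin(phi) dr dtheta dphi).
spherical = {x: r*sp.sin(ph)*sp.cos(th), y: r*sp.sin(ph)*sp.sin(th), z: r*sp.cos(ph)}
lhs = sp.integrate(sp.simplify(divX.subs(spherical)) * r**2 * sp.sin(ph),
                   (r, 0, R), (th, 0, 2*sp.pi), (ph, 0, sp.pi))

# On the sphere r = R we have X = R^2 <x, y, z>, so (X, n) = R^3 and the
# flux is R^3 times the surface area 4 pi R^2.
rhs = R**3 * 4*sp.pi*R**2

print(lhs, rhs)                      # both 4*pi*R**5
assert sp.simplify(lhs - rhs) == 0
```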
7.2 Let the vector field Y be given by
$$Y = Y_r\,\mathbf{n}_r + Y_\theta\,\mathbf{n}_\theta + Y_\varphi\,\mathbf{n}_\varphi$$
in terms of polar "coordinates" r, θ, φ on 𝔼³, where $\mathbf{n}_r$, $\mathbf{n}_\theta$, and $\mathbf{n}_\varphi$ are the unit vectors in the directions ∂/∂r, ∂/∂θ, and ∂/∂φ respectively. Show that
$$\operatorname{div} Y = \frac{1}{r^2\sin\varphi}\left\{\frac{\partial}{\partial r}\bigl(r^2\sin\varphi\, Y_r\bigr) + \frac{\partial}{\partial\theta}\bigl(rY_\theta\bigr) + \frac{\partial}{\partial\varphi}\bigl(r\sin\varphi\, Y_\varphi\bigr)\right\}.$$
7.3 Compute the divergence of a vector field in terms of polar coordinates in the plane.
7.4 Compute the divergence of a vector field in terms of cylindrical coordinates
in 𝔼³.
7.5 Let σ be the volume (area) density on the unit sphere S². Compute div⟨X, σ⟩ in terms of the coordinates θ, φ (polar coordinates) on the sphere.
CHAPTER 11
EXTERIOR CALCULUS
Let M be a differentiable manifold and let ω be a linear differential form on M. For any differentiable curve C: [a, b] → M we can consider the integral
$$\int_a^b \langle C'(t), \omega(C(t))\rangle\, dt.$$
Let [c, d] → [a, b] be a differentiable map given by s → t(s). The curve B: [c, d] → M given by B(s) = C(t(s)) satisfies
$$B'(s) = t'(s)\,C'(t(s)).$$
Thus if t′(s) > 0 for all s,
$$\int_c^d \langle B'(s), \omega(B(s))\rangle\, ds = \int_a^b \langle C'(t), \omega(C(t))\rangle\, dt.$$
Thus a linear differential form is something we can integrate over "oriented" curves of M, and the integral is independent of the parametrization. In this chapter we shall introduce objects which can be integrated over "oriented k-dimensional surfaces" of M and study their properties.
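As a quick numerical illustration of this invariance (a sketch of ours, not part of the text), one can integrate the form ω = −y dx + x dy over the unit circle using two different parametrizations; the function names below are ours.

```python
import numpy as np

def line_integral(curve, dcurve, a, b, n=20001):
    """Approximate the integral of <C'(t), w(C(t))> dt for w = -y dx + x dy."""
    t = np.linspace(a, b, n)
    x, y = curve(t)
    dx, dy = dcurve(t)
    return np.trapz(-y*dx + x*dy, t)

C  = lambda t: (np.cos(t), np.sin(t))
dC = lambda t: (-np.sin(t), np.cos(t))

# B(s) = C(t(s)) with t(s) = s**2, an orientation-preserving change of
# parameter from [0, sqrt(2*pi)] onto [0, 2*pi].
B  = lambda s: (np.cos(s**2), np.sin(s**2))
dB = lambda s: (-2*s*np.sin(s**2), 2*s*np.cos(s**2))

print(line_integral(C, dC, 0, 2*np.pi))           # ~ 2*pi
print(line_integral(B, dB, 0, np.sqrt(2*np.pi)))  # ~ 2*pi as well
```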
1. EXTERIOR DIFFERENTIAL FORMS
We defined a linear differential form to be a rule which assigns an element of $T_x^*(M)$ to each x ∈ M. We can regard $T_x^*(M)$ as $\Lambda^1(T_x(M))$. In view of this, we make the following generalization of this definition. By an exterior differential form of degree q on M we mean a rule which assigns an element of $\Lambda^q(T_x(M))$ to each x ∈ M. If ω is an exterior form of degree q and (U, α) is a chart, then, since α identifies each $T_x(M)$ with V for x ∈ U, we obtain a $\Lambda^q(V)$-valued function $\omega_\alpha$ on α(U) defined by
$$\omega_\alpha(v)(\xi_\alpha^1, \ldots, \xi_\alpha^q) = \omega(x)(\xi^1, \ldots, \xi^q) \quad \text{if } v = \alpha(x) \text{ and } \xi^1, \ldots, \xi^q \in T_x(M).$$
It is easy to write down the transition laws. In fact, if (W, β) is a second chart, we have
$$\omega_\beta(\beta(x))(\xi_\beta^1, \ldots, \xi_\beta^q) = \omega(x)(\xi^1, \ldots, \xi^q) = \omega_\alpha(\alpha(x))(\xi_\alpha^1, \ldots, \xi_\alpha^q),$$
or, since $\xi_\beta = J_{\beta\circ\alpha^{-1}}(\alpha(x))(\xi_\alpha)$ for ξ ∈ $T_x(M)$, we see that
$$\omega_\alpha(v)(\xi_\alpha^1, \ldots, \xi_\alpha^q) = \omega_\beta\bigl(\beta\circ\alpha^{-1}(v)\bigr)\bigl(J_{\beta\circ\alpha^{-1}}(v)\xi_\alpha^1, \ldots, J_{\beta\circ\alpha^{-1}}(v)\xi_\alpha^q\bigr). \tag{1.1}$$
In order to write (1.1) in a less cumbersome form, we introduce the following notation. Let V₁ and V₂ be vector spaces, and let l: V₁ → V₂ be a linear map.
We define $\Lambda^p(l)$ to be the linear map of $\Lambda^p(V_2) \to \Lambda^p(V_1)$ given by
$$\Lambda^p(l)(\omega)(v_1, \ldots, v_p) = \omega(l(v_1), \ldots, l(v_p))$$
for all ω ∈ $\Lambda^p(V_2)$ and v₁, …, v_p ∈ V₁. Note that under the identification of $\Lambda^1(V)$ with V* the map $\Lambda^1(l)$ coincides with the map l*: $V_2^* \to V_1^*$. Note also that if ω₁ ∈ $\Lambda^p(V_2)$ and ω₂ ∈ $\Lambda^q(V_2)$, then
$$\Lambda^p(l)\omega_1 \wedge \Lambda^q(l)\omega_2 = \Lambda^{p+q}(l)(\omega_1 \wedge \omega_2). \tag{1.2}$$
This follows directly from the definitions. Also, if l₁: V₁ → V₂ and l₂: V₂ → V₃, then
$$\Lambda^p(l_2 \circ l_1) = \Lambda^p(l_1) \circ \Lambda^p(l_2). \tag{1.3}$$
It is clear that if l depends differentiably on some parameters, then so does $\Lambda^p(l)$ for any p.
We can now write (1.1) as
$$\omega_\alpha(v) = \Lambda^q\bigl(J_{\beta\circ\alpha^{-1}}(v)\bigr)\bigl(\omega_\beta(\beta\circ\alpha^{-1}(v))\bigr). \tag{1.1'}$$
It is clear from (1.1′) that it is consistent to require that $\omega_\alpha$ be a smooth function. We therefore say that ω is a smooth differential form if all the functions $\omega_\alpha$ are C^∞ on α(U) for all charts (U, α). As usual, it suffices to verify this for all charts in an atlas. We let $\Lambda^q(M)$ denote the space of all smooth exterior forms of degree q.
Let ω₁ ∈ $\Lambda^p(M)$ and ω₂ ∈ $\Lambda^q(M)$. We define the exterior (p + q)-form ω₁ ∧ ω₂ by
$$(\omega_1 \wedge \omega_2)(x) = \omega_1(x) \wedge \omega_2(x) \quad \text{for all } x \in M.$$
It is easy to check that ω₁ ∧ ω₂ is a smooth (p + q)-form. We thus get a multiplication on exterior forms. To make the formalism complete, it is convenient to denote the space of differentiable functions on M by $\Lambda^0(M)$ and to denote the product of a function f and a p-form ω by fω or f ∧ ω. This product is given by
$$(f \wedge \omega)(x) = (f\omega)(x) = f(x)\,\omega(x) \quad \text{for all } x \in M.$$
We have thus defined, for all 0 ≤ p ≤ n and 0 ≤ q ≤ n, a multiplication sending ω₁ ∈ $\Lambda^p(M)$ and ω₂ ∈ $\Lambda^q(M)$ into ω₁ ∧ ω₂ ∈ $\Lambda^{p+q}(M)$ (where ω₁ ∧ ω₂ ≡ 0 if p + q > n = dim M). The rules for the ∧-product on antisymmetric tensors carry over and thus, for instance,
$$\omega_1 \wedge (\omega_2 \wedge \omega_3) = (\omega_1 \wedge \omega_2) \wedge \omega_3, \qquad \omega_1 \wedge (\omega_2 + \omega_3) = \omega_1 \wedge \omega_2 + \omega_1 \wedge \omega_3,$$
and so on.
Let M₁ and M₂ be differentiable manifolds, and let φ: M₁ → M₂ be a differentiable map. For each ω ∈ $\Lambda^q(M_2)$ we define the form φ*ω ∈ $\Lambda^q(M_1)$ by
$$\varphi^*\omega(x) = \Lambda^q(\varphi_{*x})\bigl(\omega(\varphi(x))\bigr). \tag{1.4}$$
It is easy to check that φ*ω is indeed an element of $\Lambda^q(M_1)$, that is, it is a smooth q-form. Note also that (7.5) of Chapter 9 is a special case of (1.4), namely the case q = 1. (If we make the convention that $\Lambda^0(l) = \mathrm{id}$, then the case q = 0 of (1.4) is the rule for pullback of functions.)
It follows from (1.4) that φ* is linear, that is,
$$\varphi^*(\omega_1 + \omega_2) = \varphi^*\omega_1 + \varphi^*\omega_2, \tag{1.5}$$
and from (1.2) that
$$\varphi^*(\omega_1 \wedge \omega_2) = \varphi^*\omega_1 \wedge \varphi^*\omega_2. \tag{1.6}$$
If φ is a one-parameter group on a manifold M with infinitesimal generator X, then we can show that the limit
$$\lim_{t\to 0} \frac{\varphi_t^*\omega - \omega}{t} = D_X\omega$$
exists for any ω ∈ $\Lambda^q(M)$. The proof of the existence of this limit is straightforward and will be omitted. We shall derive a useful formula allowing a simple calculation of $D_X\omega$ in Section 3.
Let us now see how to compute with the $\Lambda^q(M)$ in terms of local coordinates. Let (U, α) be a chart of M with coordinates x¹, …, xⁿ. Then $dx^i \in \Lambda^1(U)$ (where by $\Lambda^q(U)$ we mean the set of differentiable q-forms defined on U). For any i₁, …, i_q the form $dx^{i_1} \wedge \cdots \wedge dx^{i_q}$ belongs to $\Lambda^q(U)$, and for every x ∈ U the forms
$$\{(dx^{i_1} \wedge \cdots \wedge dx^{i_q})(x)\}_{i_1 < \cdots < i_q}$$
form a basis for $\Lambda^q(T_x(M))$. From this it follows that every exterior form ω of degree q which is defined on U can be written as
$$\omega = \sum_{i_1 < \cdots < i_q} a_{i_1,\ldots,i_q}\, dx^{i_1} \wedge \cdots \wedge dx^{i_q}, \tag{1.7}$$
where the a's are functions; that is,
$$\omega(x) = \sum_{i_1 < \cdots < i_q} a_{i_1,\ldots,i_q}(x)\,(dx^{i_1} \wedge \cdots \wedge dx^{i_q})(x)$$
for all x ∈ U. It is easy to see that ω ∈ $\Lambda^q(U)$ if and only if all the functions $a_{i_1,\ldots,i_q}$ are C^∞-functions on U.
If (W, β) is a second chart with coordinates y¹, …, yⁿ and
$$\omega = \sum_{j_1 < \cdots < j_q} b_{j_1,\ldots,j_q}\, dy^{j_1} \wedge \cdots \wedge dy^{j_q}, \tag{1.8}$$
then it is easy to compute the transition law relating the b's to the a's on U ∩ W. In fact, on U ∩ W we have
$$dy^j = \sum_i \frac{\partial y^j}{\partial x^i}\, dx^i, \tag{1.9}$$
where $y^j = y^j(x^1, \ldots, x^n)$. Then all we have to do is to substitute (1.9) into (1.8) and collect the coefficients of $dx^{i_1} \wedge \cdots \wedge dx^{i_q}$. For instance, if q = 2,
then we have
$$\omega = \sum_{j_1 < j_2} b_{j_1j_2}\, dy^{j_1} \wedge dy^{j_2} = \sum_{j_1 < j_2} b_{j_1j_2}\left(\frac{\partial y^{j_1}}{\partial x^1}dx^1 + \cdots + \frac{\partial y^{j_1}}{\partial x^n}dx^n\right) \wedge \left(\frac{\partial y^{j_2}}{\partial x^1}dx^1 + \cdots + \frac{\partial y^{j_2}}{\partial x^n}dx^n\right).$$
If we collect the coefficients of $dx^{i_1} \wedge dx^{i_2}$ (remember the ∧-multiplication is anticommutative), we get
$$\omega = \sum_{i_1 < i_2}\left[\sum_{j_1 < j_2} b_{j_1j_2}\left(\frac{\partial y^{j_1}}{\partial x^{i_1}}\frac{\partial y^{j_2}}{\partial x^{i_2}} - \frac{\partial y^{j_1}}{\partial x^{i_2}}\frac{\partial y^{j_2}}{\partial x^{i_1}}\right)\right] dx^{i_1} \wedge dx^{i_2}.$$
Thus
$$a_{i_1i_2} = \sum_{j_1 < j_2} b_{j_1j_2}\left(\frac{\partial y^{j_1}}{\partial x^{i_1}}\frac{\partial y^{j_2}}{\partial x^{i_2}} - \frac{\partial y^{j_1}}{\partial x^{i_2}}\frac{\partial y^{j_2}}{\partial x^{i_1}}\right). \tag{1.10}$$
Although (1.10) looks a little formidable, the point is that all one has to remember is (1.9) and the law for ∧-multiplication. For general q the same argument gives
$$a_{i_1,\ldots,i_q} = \sum_{j_1 < \cdots < j_q} b_{j_1,\ldots,j_q}\,\det\begin{pmatrix}\dfrac{\partial y^{j_1}}{\partial x^{i_1}} & \cdots & \dfrac{\partial y^{j_1}}{\partial x^{i_q}}\\[2pt] \vdots & & \vdots\\[2pt] \dfrac{\partial y^{j_q}}{\partial x^{i_1}} & \cdots & \dfrac{\partial y^{j_q}}{\partial x^{i_q}}\end{pmatrix}. \tag{1.11}$$
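As a concrete instance of (1.9)-(1.11), here is a small symbolic sketch of ours (not the book's): the single 2 × 2 determinant of (1.11) expresses dx ∧ dy in polar coordinates.

```python
import sympy as sp

r, t = sp.symbols('r theta', positive=True)
x, y = r*sp.cos(t), r*sp.sin(t)   # (y^1, y^2) = (x, y) as functions of (x^1, x^2) = (r, theta)

# The determinant in (1.11) with q = 2:
a = sp.det(sp.Matrix([[sp.diff(x, r), sp.diff(x, t)],
                      [sp.diff(y, r), sp.diff(y, t)]]))
print(sp.simplify(a))   # r, i.e. dx ^ dy = r dr ^ dtheta
```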
The formula for pullback takes exactly the same form. Let φ: M₁ → M₂ be a differentiable map, and suppose that (U, α) and (W, β) are compatible charts, where x¹, …, x^m are the coordinates of (U, α) and y¹, …, yⁿ are those of (W, β). Then the $y^j \circ \varphi$ are functions on U and can thus be written as
$$y^j \circ \varphi = y^j(x^1, \ldots, x^m).$$
Since $\varphi^*\, dy^j = d(y^j \circ \varphi)$, we have
$$\varphi^*(dy^j) = \sum_i \frac{\partial y^j}{\partial x^i}\, dx^i. \tag{1.12}$$
If
$$\omega = \sum_{j_1 < \cdots < j_q} b_{j_1,\ldots,j_q}\, dy^{j_1} \wedge \cdots \wedge dy^{j_q},$$
then, by (1.5) and (1.6),
$$\varphi^*(\omega) = \sum_{j_1 < \cdots < j_q} (b_{j_1,\ldots,j_q} \circ \varphi)\,(\varphi^*\, dy^{j_1}) \wedge \cdots \wedge (\varphi^*\, dy^{j_q}). \tag{1.13}$$
The expression for (1.13) in terms of the dx's can be computed by substituting
(1.12) into (1.13) and collecting coefficients. The answer, of course, will look
just like it did before. If
$$\varphi^*(\omega) = \sum_{i_1 < \cdots < i_q} a_{i_1,\ldots,i_q}\, dx^{i_1} \wedge \cdots \wedge dx^{i_q},$$
then the a's are given by
$$a_{i_1,\ldots,i_q} = \sum_{j_1 < \cdots < j_q} (b_{j_1,\ldots,j_q} \circ \varphi)\,\det\begin{pmatrix}\dfrac{\partial y^{j_1}}{\partial x^{i_1}} & \cdots & \dfrac{\partial y^{j_1}}{\partial x^{i_q}}\\[2pt] \vdots & & \vdots\\[2pt] \dfrac{\partial y^{j_q}}{\partial x^{i_1}} & \cdots & \dfrac{\partial y^{j_q}}{\partial x^{i_q}}\end{pmatrix}. \tag{1.14}$$
Again, we emphasize that there is no need to remember a complicated-looking formula like (1.14); Eqs. (1.5), (1.6), and (1.12) (and of course the rules for ∧-multiplication) are sufficient. In many cases, it is much more convenient to do the substitutions directly than to use (1.14).
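For instance (a sketch of ours, with the map φ and the names u, v chosen for illustration): take φ: ℝ² → ℝ³ given by φ(u, v) = (u, v, u² + v²) and ω = dx ∧ dz. Substituting (1.12) directly, φ*ω = du ∧ (2u du + 2v dv) = 2v du ∧ dv, and the determinant of (1.14) gives the same coefficient.

```python
import sympy as sp

u, v = sp.symbols('u v')
x, y, z = u, v, u**2 + v**2          # the components of phi

# phi* dx = du and phi* dz = 2u du + 2v dv, so
# phi*(dx ^ dz) = du ^ (2u du + 2v dv) = 2v du ^ dv.
# The same coefficient comes from the determinant in (1.14):
a = sp.det(sp.Matrix([[sp.diff(x, u), sp.diff(x, v)],
                      [sp.diff(z, u), sp.diff(z, v)]]))
print(a)    # 2*v
```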
2. ORIENTED MANIFOLDS AND
THE INTEGRATION OF EXTERIOR DIFFERENTIAL FORMS
Let M be an n-dimensional manifold. Let (U, α) and (W, β) be two charts on M with coordinates x¹, …, xⁿ and y¹, …, yⁿ. Let ω be an exterior differential form of degree n. Then we can write
$$\omega = a\, dx^1 \wedge \cdots \wedge dx^n \quad \text{on } U$$
and
$$\omega = b\, dy^1 \wedge \cdots \wedge dy^n \quad \text{on } W,$$
where the functions a and b are related on U ∩ W by (1.11), which, in this case (q = n), becomes
$$a = b\,\det\left(\frac{\partial y^j}{\partial x^i}\right),$$
or
$$a(x) = b(x)\,\det J_{\beta\circ\alpha^{-1}}(\alpha(x)),$$
or, finally,
$$a_\alpha(v) = b_\beta\bigl(\beta\circ\alpha^{-1}(v)\bigr)\,\det J_{\beta\circ\alpha^{-1}}(v) \quad \text{for } v \in \alpha(U \cap W). \tag{2.1}$$
If ρ is a density on M, then the transition laws for $\rho_\alpha$ are given by
$$\rho_\alpha(v) = \rho_\beta\bigl(\beta\circ\alpha^{-1}(v)\bigr)\,\bigl|\det J_{\beta\circ\alpha^{-1}}(v)\bigr|. \tag{2.2}$$
Note that (2.2) and (2.1) look almost the same; the difference is the absolute-value sign that occurs in (2.2) but not in (2.1). In particular, if (U, α) and (W, β) were such that $\det J_{\beta\circ\alpha^{-1}} > 0$, then (2.2) and (2.1) would agree for this pair of charts.
This leads us to the following definition: An atlas 𝒜 of M is said to be oriented if for any pair of charts (U, α) and (W, β) of 𝒜 we have
$$\det J_{\beta\circ\alpha^{-1}}(\alpha(x)) > 0 \quad \text{for all } x \in U \cap W.$$
There is no guarantee that there exists an oriented atlas on a given manifold M. In fact, it is not difficult to show that there does not exist an oriented atlas on certain manifolds. (An example of a manifold possessing no oriented atlas is the Möbius strip.)
We say that a manifold M is orientable if it has an oriented atlas.
Let M be an orientable manifold, and let 𝒜₁ and 𝒜₂ be two oriented atlases. We say that 𝒜₁ and 𝒜₂ have the same orientation, and write 𝒜₁ ∼ 𝒜₂, if 𝒜₁ ∪ 𝒜₂ is again an oriented atlas. To say that 𝒜₁ ∼ 𝒜₂ means that for any (U, α) ∈ 𝒜₁ and any (W, β) ∈ 𝒜₂ we have
$$\det J_{\beta\circ\alpha^{-1}}(v) > 0 \quad \text{on } \alpha(U \cap W).$$
It is clear that ∼ is an equivalence relation. An equivalence class of oriented
atlases is called an orientation of M. An orientable manifold, together with a
choice of orientation, will be called an oriented manifold. We shall denote an
oriented manifold by M. That is, M is a manifold M together with a choice
of orientation. Thus an oriented one-dimensional manifold has a preferred
direction at each point (Fig. 11.1); an oriented two-dimensional manifold has a
notion of clockwise versus counterclockwise direction (Fig. 11.2); and at any
point of an oriented three-dimensional manifold we can distinguish between
right- and left-handedness.
Fig. 11.1 Fig. 11.2
In general, let M be an oriented manifold, and let (U, α) be a chart of M with coordinates x¹, …, xⁿ. We say that (U, α) is a positive chart if $\det J_{\beta\circ\alpha^{-1}} > 0$ for any chart (W, β) belonging to any oriented atlas defining (i.e., belonging to) the orientation. (It suffices to check this, of course, for all (W, β) belonging to one fixed atlas defining the orientation.) Note that if U is connected and (U, α) is not positive, then the chart (U, α′), where α′ is α followed by the reflection ⟨v¹, v², …, vⁿ⟩ → ⟨−v¹, v², …, vⁿ⟩, is a positive chart.
We shall say that (U, α) is a negative chart if $\det J_{\beta\circ\alpha^{-1}} < 0$ for all (W, β) belonging to an atlas defining the orientation. (Thus, if U is connected, then (U, α) must be either positive or negative.)
We now return to our initial observation comparing (2.1) with (2.2).
Proposition 2.1. Let M be an oriented n-dimensional manifold. We can identify exterior forms of degree n with densities by sending the form ω into the density $\rho^\omega$, where for any positive chart (U, α) with coordinates x¹, …, xⁿ, the function $\rho^\omega_\alpha$ is determined by
$$\omega = \rho^\omega_\alpha(\alpha(\cdot))\, dx^1 \wedge \cdots \wedge dx^n \quad \text{on } U. \tag{2.3}$$
Another way of writing (2.3) is
$$\omega(\partial/\partial x^1, \ldots, \partial/\partial x^n) = \rho(\partial/\partial x^1, \ldots, \partial/\partial x^n). \tag{2.3'}$$
In other words, if ω = a dx¹ ∧ ⋯ ∧ dxⁿ on U, then $\rho^\omega_\alpha = a_\alpha$. That $\rho^\omega$ is really a density follows from the fact that (2.2) reduces to (2.1) for all pairs of charts belonging to a positive atlas.
It is clear that this identification is additive,
$$\rho^{\omega_1+\omega_2} = \rho^{\omega_1} + \rho^{\omega_2}, \tag{2.4}$$
and that for any function f,
$$\rho^{f\omega} = f\rho^\omega. \tag{2.5}$$
Furthermore, if ω(x) = 0, then $\rho^\omega = 0$ at x. By the support of a differential form we mean, as usual, the closure of the set of x for which ω(x) ≠ 0. We say that an n-form ω is locally absolutely integrable if the density $\rho^\omega$ is locally absolutely integrable. Note that to say that ω is locally absolutely integrable means that for any chart (U, α), with coordinates x¹, …, xⁿ, of some atlas 𝒜, if
$$\omega = a\, dx^1 \wedge \cdots \wedge dx^n \quad \text{on } U,$$
then the function $a_\alpha = a \circ \alpha^{-1}$ is an absolutely integrable function on α(U). Let Γ(M) denote the space of absolutely integrable n-forms of compact support. It is clear that Γ(M) is a vector space and that fω ∈ Γ(M) if f is a (bounded) contented function and ω ∈ Γ(M). As a consequence of Proposition 2.1 and Theorem 3.1 of Chapter 10, we can state:
Theorem 2.1. Let M be an oriented manifold. There exists a unique linear function ∫ on Γ(M) satisfying the following condition: If supp ω ⊂ U, where (U, α) is a positive chart with coordinates x¹, …, xⁿ, and if ω = a dx¹ ∧ ⋯ ∧ dxⁿ, then
$$\int \omega = \int_{\alpha(U)} a_\alpha. \tag{2.6}$$
Observe that we can write
$$\int \omega = \int \rho^\omega \quad \text{for all } \omega \in \Gamma(M). \tag{2.7}$$
The recipe for computing ∫ω is now very simple. We break ω up into small pieces such that each piece lies in some U. (We can ignore sets of content zero in the process.) If supp ω ⊂ U, and if (U, α) is a positive chart, we express ω as ω = a dx¹ ∧ ⋯ ∧ dxⁿ. And if a is given as $a = a_\alpha(x^1, \ldots, x^n)$, we integrate the function $a_\alpha$ over ℝⁿ. The computations are automatic. The one point that has to be checked is that the chart (U, α) is positive. If it is negative, then ∫ω is given by $-\int a_\alpha$.
Let M₁ be an oriented manifold of dimension q, let φ: M₁ → M₂ be a differentiable map, and let ω ∈ $\Lambda^q(M_2)$. Then for any contented compact set A ⊂ M₁ the form $e_A\varphi^*(\omega)$ belongs to Γ(M₁), so we can consider its integral. This integral is sometimes denoted by $\int_{\varphi(A)}\omega$; that is, we make the definition
$$\int_{\varphi(A)} \omega = \int e_A\,\varphi^*(\omega). \tag{2.8}$$
If we regard φ(A) as an "oriented q-dimensional surface" in M₂, then we see that the elements of $\Lambda^q(M_2)$ are objects that we can integrate over such "surfaces". (Of course, if q = 1, we say "curves".)
Fig. 11.3 Fig. 11.4
Let us illustrate by some examples. Suppose that M₂ = ℝⁿ, and let A ⊂ ℝ¹ be the interval a ≤ t ≤ b. Let x¹, …, xⁿ be the coordinates of ℝⁿ, and let ω = a₁dx¹ + ⋯ + a_n dxⁿ. We regard ℝ¹ as an oriented manifold on which the identity chart is positive (and its coordinate is t). If C: ℝ¹ → ℝⁿ is a differentiable curve (Fig. 11.3), then
$$\int_{C([a,b])} \omega = \int e_{[a,b]}\,C^*(\omega) = \int_a^b\left(a_1\frac{dx^1}{dt} + \cdots + a_n\frac{dx^n}{dt}\right)dt = \int_a^b \langle C'(t), \omega\rangle\, dt. \tag{2.9}$$
From this last expression we see that C does not have to be differentiable everywhere in order for $\int_{C([a,b])}\omega$ to make sense. In fact, if C is differentiable everywhere on ℝ except at a finite number of points, and if C′(t) is always bounded (when regarded as an element of ℝⁿ), then the function ⟨C′(·), ω⟩ is defined everywhere except for a set of content zero and is bounded. Thus C*(ω) is a contented density and (2.9) still makes sense. Now the curve can have corners. (See Fig. 11.4.)
It should be observed that if ω = df (and if C is continuous), then
$$\int_{C([a,b])} df = \int_{[a,b]} d(f \circ C) = \int_a^b (f \circ C)' = f(C(b)) - f(C(a)). \tag{2.10}$$
In this case the integral depends not on the particular curve C but only on the endpoints. In general, $\int_C \omega$ depends on the curve C. We will obtain conditions for it to be independent of C in Section 5.
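The following numerical sketch (ours, not part of the text) illustrates (2.10): for ω = df the line integral depends only on the endpoints, while a general ω does not. Here f = xy on ℝ², and we compare with ω = y dx along two curves from (0, 0) to (1, 1).

```python
import numpy as np

t = np.linspace(0.0, 1.0, 20001)

def integrate(xs, ys, P, Q):
    """Approximate the integral of P dx + Q dy along the sampled curve."""
    return np.trapz(P(xs, ys) * np.gradient(xs, t) +
                    Q(xs, ys) * np.gradient(ys, t), t)

curves = [(t, t), (t, t**2)]            # a straight line and a parabola

# w = df with f = x*y, i.e. w = y dx + x dy: same answer on both curves.
for xs, ys in curves:
    print(integrate(xs, ys, lambda x, y: y, lambda x, y: x))     # ~1.0 twice

# w = y dx alone: the two curves give different answers (1/2 vs 1/3).
for xs, ys in curves:
    print(integrate(xs, ys, lambda x, y: y, lambda x, y: 0*x))   # ~0.5, ~0.333
```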
In the next example let M₂ = ℝ³ and M₁ = U ⊂ ℝ², where (u, v) are Euclidean coordinates on ℝ² and x, y, z are Euclidean coordinates on ℝ³. Let
$$\omega = P\, dx \wedge dy + Q\, dx \wedge dz + R\, dy \wedge dz$$
be an element of $\Lambda^2(\mathbb{R}^3)$. If φ: U → ℝ³ is given by the functions x(u, v), y(u, v), and z(u, v), then for A ⊂ U,
$$\int_{\varphi(A)} \omega = \int e_A\,\varphi^*\omega = \int e_A\,\varphi^*(P\, dx \wedge dy + Q\, dx \wedge dz + R\, dy \wedge dz)$$
$$= \int_A\left[(P \circ \varphi)\left(\frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial y}{\partial u}\frac{\partial x}{\partial v}\right) + (Q \circ \varphi)\left(\frac{\partial x}{\partial u}\frac{\partial z}{\partial v} - \frac{\partial z}{\partial u}\frac{\partial x}{\partial v}\right) + (R \circ \varphi)\left(\frac{\partial y}{\partial u}\frac{\partial z}{\partial v} - \frac{\partial z}{\partial u}\frac{\partial y}{\partial v}\right)\right].$$
We conclude this section with another look at the volume density of Riemann metrics, this time for an oriented manifold. If M is an oriented manifold with a Riemann metric, then the volume density σ corresponds to an n-form Ω. By our rule for this correspondence, if (U, α) is a positive chart with coordinates x¹, …, xⁿ, then
$$\Omega = a\, dx^1 \wedge \cdots \wedge dx^n,$$
where, by (4.1) of Chapter 10, $a(x) = |\det(g_{ij})|^{1/2}$ is the volume in $T_x(M)$ of the parallelepiped spanned by
$$\frac{\partial}{\partial x^1}(x), \ldots, \frac{\partial}{\partial x^n}(x).$$
Let e₁(x), …, e_n(x) be an orthonormal basis of $T_x(M)$ (relative to the scalar product given by the Riemann metric). Then
$$|\det(g_{ij})|^{1/2} = \left|\det\left[\left(\frac{\partial}{\partial x^i}, e_j\right)\right]\right| = |\det A|,$$
where $A = \bigl((\partial/\partial x^i, e_j)\bigr)$ is the matrix of the linear transformation carrying $e_j \to \partial/\partial x^j$. If ω¹(x), …, ωⁿ(x) is the dual basis of the e's, then
$$\omega^1(x) \wedge \cdots \wedge \omega^n(x) = \det A\; dx^1(x) \wedge \cdots \wedge dx^n(x).$$
Now ω¹(x), …, ωⁿ(x) can be any orthonormal basis of $T_x^*(M)$. [$T_x^*(M)$ has a scalar product, since it is the dual space of the scalar product space $T_x(M)$.] We thus get the following result: If ω¹, …, ωⁿ are linear differential forms such that for each x ∈ M, ω¹(x), …, ωⁿ(x) is an orthonormal basis of $T_x^*(M)$, then
$$\Omega = \pm\,\omega^1 \wedge \cdots \wedge \omega^n.$$
We can write
$$\Omega = \omega^1 \wedge \cdots \wedge \omega^n \tag{2.11}$$
if we know that ω¹ ∧ ⋯ ∧ ωⁿ is a positive multiple of dx¹ ∧ ⋯ ∧ dxⁿ.
Can we always find such forms ω¹, …, ωⁿ on U? The answer is "yes": we can do it by applying the orthonormalization procedure to dx¹, …, dxⁿ. That is, we set
$$\omega^1 = \frac{dx^1}{\|dx^1\|}, \quad \text{where } \|dx^1\|(x) = \|dx^1(x)\| > 0 \text{ is a } C^\infty\text{-function on } U,$$
$$\omega^2 = \frac{dx^2 - (dx^2, \omega^1)\,\omega^1}{\|dx^2 - (dx^2, \omega^1)\,\omega^1\|},$$
and so on. The matrix which relates the dx's to the ω's is composed of C^∞-functions, so that the $\omega^i \in \Lambda^1(U)$. Furthermore, it is a triangular matrix with positive entries on the diagonal, so its determinant is positive. We have thus constructed the desired forms ω¹, …, ωⁿ, so (2.11) holds. For instance, it follows from Eq. (9.10) of Chapter 9 that dθ, sin θ dφ form an orthonormal basis for $T_x^*(S^2)$ at all x ∈ S² (except the north and south poles). If we choose the orientation on S² so that θ, φ form a positive chart, then the volume form is given by
$$\Omega = \sin\theta\, d\theta \wedge d\varphi.$$
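As a quick check (ours, not the book's), integrating this volume form over the sphere recovers the total area 4π:

```python
import sympy as sp

th, ph = sp.symbols('theta phi', positive=True)
# Integrate sin(theta) dtheta ^ dphi over 0 < theta < pi, 0 < phi < 2*pi.
area = sp.integrate(sp.sin(th), (th, 0, sp.pi), (ph, 0, 2*sp.pi))
print(area)   # 4*pi
```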
3. THE OPERATOR d
With every function f we have associated a linear differential form df. We can thus regard d as a map from $\Lambda^0(M)$ to $\Lambda^1(M)$. As such, it is linear and satisfies
$$d(f_1f_2) = f_2\, df_1 + f_1\, df_2.$$
We now seek to define a d: $\Lambda^k(M) \to \Lambda^{k+1}(M)$ for k > 0 as well. We shall require that d be linear and satisfy some identity with regard to multiplication, generalizing the above formula for d(f₁f₂). The condition we will impose is that
$$d(\omega_1 \wedge \omega_2) = d\omega_1 \wedge \omega_2 + (-1)^p\,\omega_1 \wedge d\omega_2$$
if ω₁ is a form of degree p. The factor (−1)ᵖ accounts for the anticommutativity of ∧. The reader should check that this requirement is consistent with anticommutativity, at least to the extent that it gives d(ω₁ ∧ ω₂) = (−1)^{pq} d(ω₂ ∧ ω₁) if ω₁ is of degree p and ω₂ is of degree q.
We are going to impose one further condition on d which will uniquely determine it. This condition (which lies at the heart of the matter) requires some introduction.
Fig. 11.5
Let f be a differentiable function, and let C: I → M be a differentiable curve. For any points a, b ∈ I, the fundamental theorem of the calculus implies that
$$f(C(b)) - f(C(a)) = \int_a^b \frac{d(f \circ C)}{dt}\, dt = \int_a^b C^*\, df.$$
We can regard b and a (with ± signs attached) as the "oriented boundary" of the interval [a, b]. Let us make the convention that "integrating" an element of $\Lambda^0(p)$ is just evaluating the function at the point p. As such, the equation above says that the integral of the "pullback" of f over the "boundary", that is, f(b) − f(a), equals the integral of the "pullback" of df over [a, b]. In some sense, we would like to be able to say that if ω is a form of degree k, then the integral of the "pullback" of ω over the k-dimensional "boundary" of a (k + 1)-dimensional region is equal to the integral of the pullback of dω over the (k + 1)-dimensional region. Without trying to make this requirement precise, let us see what it says for the case where k = 1 and the region is a triangle in the plane.
Let φ be a smooth map of some neighborhood of the triangle Δ ⊂ ℝ² into M, and let the vertices of Δ be mapped by φ into x, y, and z (see Fig. 11.5). The boundary of Δ consists of three curves (segments) C₁, C₂, and C₃ (with the proper orientations). Let ω be a linear differential form on M. We would then expect that
$$\int_\Delta \varphi^*\, d\omega = \int_{C_1} \varphi^*\omega + \int_{C_2} \varphi^*\omega + \int_{C_3} \varphi^*\omega.$$
If ω = df, then the three integrals on the right become (by the fundamental theorem of the calculus) f(y) − f(x) + f(z) − f(y) + f(x) − f(z) = 0. Thus ∫φ* d(df) = 0. Since the triangle was arbitrary, we expect that
$$d(df) = 0.$$
We now assert:
Theorem 3.1. There exists a unique linear map d: $\Lambda^k(M) \to \Lambda^{k+1}(M)$ such that on $\Lambda^0$ it coincides with the old d and such that
$$d(\omega_1 \wedge \omega_2) = d\omega_1 \wedge \omega_2 + (-1)^p\,\omega_1 \wedge d\omega_2 \tag{3.1}$$
and
$$d(df) = 0 \quad \text{if } f \in \Lambda^0(M). \tag{3.2}$$
Proof. We first establish the uniqueness of d. To do this we observe that (3.1) implies that d is local, in the sense that if ω = ω′ on some open set U, then dω = dω′ on U. In fact, let W be an open set with $\bar W \subset U$, and let φ be a C^∞-function such that φ(x) ≡ 1 for x ∈ W and supp φ ⊂ U. Then φω = φω′ everywhere on M, and thus d(φω) = d(φω′). But, by (3.1), d(φω) = φ dω + dφ ∧ ω = dω on W, since φ ≡ 1 and dφ = 0 there. Thus dω = dω′ on W. Since W can be arbitrary, we conclude that dω = dω′ on U.
Let (U, α) be a chart with coordinates x¹, …, xⁿ. Every ω ∈ $\Lambda^k(M)$ can be written as
$$\omega = \sum_{i_1 < \cdots < i_k} a_{i_1,\ldots,i_k}\, dx^{i_1} \wedge \cdots \wedge dx^{i_k} \quad \text{on } U.$$
Now [by induction on k, using (3.1) and (3.2)] $d(dx^{i_1} \wedge \cdots \wedge dx^{i_k}) = 0$. Thus (3.1) implies that
$$d\omega = \sum_{i_1 < \cdots < i_k} da_{i_1,\ldots,i_k} \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_k} \quad \text{on } U. \tag{3.3}$$
Equation (3.3) gives a local formula for d. It also shows that d is unique. In fact, we have shown that there is at most one operator d on any open subset O ⊂ M mapping $\Lambda^k(O) \to \Lambda^{k+1}(O)$ and satisfying the hypotheses of the theorem (for O). On the set O ∩ U it must be given by (3.3).
We now claim that in order to establish the existence of d, it suffices to show that the d given by (3.3) [in any chart (U, α)] satisfies the requirements of the theorem on $\Lambda^k(U)$. In fact, suppose we have shown this to be so. Let 𝒜 be an atlas of M, and for each chart (U, α) ∈ 𝒜 define the operator $d_\alpha: \Lambda^k(U) \to \Lambda^{k+1}(U)$ by (3.3). We would like to set dω = $d_\alpha\omega$ on U. For this to be consistent, we must show that $d_\alpha\omega = d_\beta\omega$ on U ∩ W if (W, β) is some other chart. But both $d_\alpha$ and $d_\beta$ satisfy the hypotheses of the theorem on U ∩ W, and they must therefore coincide there.
Thus to prove the theorem, it suffices to check that the operator d, defined by (3.3), fulfills our requirements as a map of $\Lambda^k(U) \to \Lambda^{k+1}(U)$. It is obviously linear. To check (3.2), we observe that
$$df = \sum_i \frac{\partial f}{\partial x^i}\, dx^i,$$
so
$$d(df) = \sum_i d\left(\frac{\partial f}{\partial x^i}\right) \wedge dx^i = \sum_{i,j}\frac{\partial^2 f}{\partial x^j\,\partial x^i}\, dx^j \wedge dx^i = \sum_{i<j}\left(\frac{\partial^2 f}{\partial x^i\,\partial x^j} - \frac{\partial^2 f}{\partial x^j\,\partial x^i}\right) dx^i \wedge dx^j = 0$$
by the equality of mixed partials.
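The mixed-partials computation above can also be confirmed mechanically; the following sympy sketch (ours, not from the text) checks that each coefficient of dxⁱ ∧ dx^j in d(df) vanishes for an undetermined function f.

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = sp.Function('f')(x1, x2, x3)

# The coefficient of dx^i ^ dx^j in d(df) is d2f/dxi dxj - d2f/dxj dxi.
for xi, xj in [(x1, x2), (x1, x3), (x2, x3)]:
    coeff = sp.diff(f, xi, xj) - sp.diff(f, xj, xi)
    print(sp.simplify(coeff))   # 0 each time
```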
Now we turn to (3.1). Since both sides of (3.1) are linear in ω₁ and ω₂ separately, it suffices to check (3.1) for $\omega_1 = a\, dx^{i_1} \wedge \cdots \wedge dx^{i_p}$ and $\omega_2 = b\, dx^{j_1} \wedge \cdots \wedge dx^{j_q}$. Now $\omega_1 \wedge \omega_2 = ab\, dx^{i_1} \wedge \cdots \wedge dx^{i_p} \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_q}$ and d(ab) = b da + a db; therefore,
$$d(\omega_1 \wedge \omega_2) = b\, da \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_p} \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_q} + a\, db \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_p} \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_q},$$
while
$$d\omega_1 \wedge \omega_2 = (da \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_p}) \wedge (b\, dx^{j_1} \wedge \cdots \wedge dx^{j_q}) = b\, da \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_p} \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_q}$$
and
$$\omega_1 \wedge d\omega_2 = (a\, dx^{i_1} \wedge \cdots \wedge dx^{i_p}) \wedge (db \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_q}) = (-1)^p a\, db \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_p} \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_q},$$
so we see that (3.1) holds. This proves the theorem. □
We can draw a number of important corollaries from Eq. (3.3).
First of all, it follows immediately that for ω ∈ $\Lambda^k(M)$, for any k, we have
$$d(d\omega) = 0. \tag{3.4}$$
(Remember we merely assumed it for k = 0.)
Secondly, let φ: M₁ → M₂ be a differentiable map. Then for ω ∈ $\Lambda^k(M_2)$ we have
$$d\varphi^*\omega = \varphi^*\, d\omega. \tag{3.5}$$
To check (3.5), it suffices to verify it for any pair of compatible charts. But if x¹, …, xⁿ are coordinates on M₂ and, locally,
$$\omega = \sum a_{i_1,\ldots,i_k}\, dx^{i_1} \wedge \cdots \wedge dx^{i_k},$$
we have
$$d\varphi^*\omega = \sum d(a_{i_1,\ldots,i_k} \circ \varphi) \wedge d(x^{i_1} \circ \varphi) \wedge \cdots \wedge d(x^{i_k} \circ \varphi) = \varphi^*\, d\omega.$$
In particular, if X is a vector field on M, we conclude that
$$D_X\, d\omega = d(D_X\omega). \tag{3.6}$$
EXERCISES
3.1 Compute d of the following differential forms.
a) $\gamma = \sum_{i=1}^n (-1)^{i-1} x^i\, dx^1 \wedge \cdots \wedge dx^{i-1} \wedge dx^{i+1} \wedge \cdots \wedge dx^n$
b) $r^{-n}\gamma$, where γ is as in (a) and $r = \{(x^1)^2 + \cdots + (x^n)^2\}^{1/2}$
c) $\sum p_i\, dq_i$
d) $\sin(x^2 + y^2 + z^2)(x\, dx + y\, dy + z\, dz)$
Let V be a vector space equipped with a nonsingular bilinear form and an orientation. Then we can define the *-operator as in Chapter 7. Since we identify the tangent space $T_x(V)$ with V for any x ∈ V, we can consider the *-operator as mapping $\Lambda^k(V) \to \Lambda^{n-k}(V)$. For instance, in ℝ², with the rectangular coordinates (x, y), we have
$$*dx = dy, \qquad *dy = -dx,$$
and so on.
3.2 Show that
$$d * df = \left(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}\right) dx \wedge dy$$
for any function f on ℝ².
3.3 Obtain a similar expression for d * df in ℝⁿ with its usual scalar product. (Recall that
$$*dx^i = (-1)^{i-1}\, dx^1 \wedge \cdots \wedge dx^{i-1} \wedge dx^{i+1} \wedge \cdots \wedge dx^n$$
and, more generally,
$$*\,dx^{i_1} \wedge \cdots \wedge dx^{i_k} = \pm\, dx^{j_1} \wedge \cdots \wedge dx^{j_{n-k}},$$
where (i₁, …, i_k, j₁, …, j_{n−k}) is a permutation of (1, …, n) and the ± is the sign of the permutation.)
3.4 Let x, y, z, t be coordinates on ℝ⁴. Introduce a scalar product on the tangent space at each point so that
$$(dx, dx) = (dy, dy) = (dz, dz) = 1, \qquad (c\, dt, c\, dt) = -1,$$
and
$$(dx, dy) = (dx, dz) = (dx, dt) = (dy, dz) = (dy, dt) = (dz, dt) = 0,$$
where c is a positive constant. Let the two-form ω be given by
$$\omega = c(E_1\, dx \wedge dt + E_2\, dy \wedge dt + E_3\, dz \wedge dt) + B_1\, dy \wedge dz + B_2\, dz \wedge dx + B_3\, dx \wedge dy.$$
Let the three-form γ be given by
$$\gamma = \rho\, dx \wedge dy \wedge dz - (J_1\, dy \wedge dz + J_2\, dz \wedge dx + J_3\, dx \wedge dy) \wedge dt.$$
Write the equations
$$d\omega = 0, \qquad d * \omega = 4\pi\gamma$$
as equations involving the various coefficients and their partial derivatives.
4. STOKES' THEOREM
In this section we shall prove a theorem which is a far-reaching generalization of the fundamental theorem of the calculus of one variable. It should, perhaps, be called the fundamental theorem of the calculus of several variables. We first make some definitions.
Let D be a domain with regular boundary in a manifold M. We recall (page 419) that each point of M lies in a chart (U, α) which is one of three types.
Let (U, α) and (W, β) be two charts of M of type (iii). Then, as on page 420, the matrix of $J_{\beta\circ\alpha^{-1}}$ is given by
$$\begin{pmatrix}\partial y^1/\partial x^1 & \cdots & \partial y^1/\partial x^{n-1} & \partial y^1/\partial x^n\\ \vdots & & \vdots & \vdots\\ \partial y^{n-1}/\partial x^1 & \cdots & \partial y^{n-1}/\partial x^{n-1} & \partial y^{n-1}/\partial x^n\\ 0 & \cdots & 0 & \partial y^n/\partial x^n\end{pmatrix},$$
and so
$$\det J_{\beta\circ\alpha^{-1}} = \frac{\partial y^n}{\partial x^n}\,\det J_{(\beta\restriction\partial D)\circ(\alpha\restriction\partial D)^{-1}}. \tag{4.1}$$
Furthermore, $y^n(x^1, \ldots, x^n) > 0$ if $x^n > 0$, since α(U ∩ W) ∩ {v : vⁿ > 0} = α(U ∩ W ∩ int D). Thus ∂yⁿ/∂xⁿ > 0 at a boundary point, where xⁿ = 0.
Now suppose that M is an oriented manifold, and let D ⊂ M be a domain with regular boundary. We shall make ∂D into an oriented manifold. We say that an atlas 𝒜 is adjusted if each (U, α) ∈ 𝒜 is of type (i), (ii), or (iii) and, in addition, if each chart of 𝒜 is positive.
If dim M > 1, we can always find an adjusted atlas. In fact, by choosing the U connected, we find that every (U, α) is either positive or negative. If (U, α) is negative, we replace it by (U, α′), where x′¹ = −x¹ and x′ⁱ = xⁱ for i > 1.
If dim M = 1, then ∂D consists of a discrete set of points (which we can regard as a "zero-dimensional manifold"). Each x ∈ ∂D lies in a chart of type (iii) which is either positive or negative. We assign a plus sign to x if any chart (and hence every such chart) of type (iii) is negative. We assign a minus sign to x if its charts of type (iii) are positive. In this way we "orient" ∂D, as shown in Fig. 11.6.
Fig. 11.6
If dim M > 1, we choose an adjusted (oriented) atlas on M. It then follows from (4.1) and the fact that ∂yⁿ/∂xⁿ > 0 that
$$\det J_{(\beta\restriction\partial D)\circ(\alpha\restriction\partial D)^{-1}} > 0.$$
This shows that the charts (U ∩ ∂D, α ↾ ∂D) form an oriented atlas on ∂D. We thus get an orientation on ∂D. This is not quite the orientation we want on ∂D. For reasons that will soon become apparent, we choose the orientation on ∂D so that (U ∩ ∂D, α ↾ ∂D) has the same sign as (−1)ⁿ. That is, (U ∩ ∂D, α ↾ ∂D) is a positive chart if n is even, and we take the orientation opposite to that determined by (U ∩ ∂D, α ↾ ∂D) if n is odd. We can now state our main theorem.
Theorem 4.1 (Stokes' theorem). Let M be an n-dimensional oriented manifold, and let D ⊂ M be a domain with regular boundary. Let ∂D denote the boundary of D regarded as an oriented manifold. Then for any ω ∈ $\Lambda^{n-1}(M)$ with compact support we have
$$\int_{\partial D} \iota^*\omega = \int_D d\omega, \tag{4.2}$$
where, as usual, ι is the injection of ∂D into M.
Proof. For n = 1 this is just the fundamental theorem of the calculus.
For n > 1 our proof is almost exactly the same as the proof of Theorem 6.1 of Chapter 10. Choose an adjusted atlas and a partition of unity {g_j} subordinate to it. Since ω has compact support, we can write
$$\omega = \sum g_j\omega,$$
where the sum is finite. Since both sides of (4.2) are linear, it suffices to verify (4.2) for each of the summands $g_j\omega$. If supp $g_j\omega \subset U$, where (U, α) is a chart of the atlas, we must check the three possibilities: (U, α) satisfies (i), (ii), or (iii).
If (U, α) satisfies (i), then ι*ω = 0, since supp ω ∩ ∂D = ∅, and
$$\int_D d\omega = \int_M e_D\, d\omega = 0,$$
since D ∩ supp ω = ∅. Thus both sides of (4.2) vanish.
If (U, α) satisfies (ii), the left-hand side of (4.2) vanishes. We must show that the same holds for the right-hand side. Let x¹, …, xⁿ be the coordinates on (U, α), and write
$$g_j\omega = a_1\, dx^2 \wedge \cdots \wedge dx^n + a_2\, dx^1 \wedge dx^3 \wedge \cdots \wedge dx^n + \cdots + a_n\, dx^1 \wedge \cdots \wedge dx^{n-1}.$$
Then
$$d(g_j\omega) = \sum_i (-1)^{i-1}\frac{\partial a_i}{\partial x^i}\, dx^1 \wedge \cdots \wedge dx^n.$$
Since $g_j\omega$ has compact support, the functions $a_i$ have compact support, and we can replace the integral over ℝⁿ by the integral over the rectangle $\square_{-R}^{R}$, where R = ⟨R, …, R⟩ and R is chosen so large that supp $a_i \subset \square_{-R}^{R}$. But writing the multiple integral as an iterated integral and integrating first with respect to xⁱ, we get
$$\int_{\square_{-R}^{R}} \frac{\partial a_i}{\partial x^i} = \int_{\mathbb{R}^{n-1}} \bigl[a_i(\ldots, R, \ldots) - a_i(\ldots, -R, \ldots)\bigr] = 0,$$
since $a_i(\ldots, R, \ldots) = a_i(\ldots, -R, \ldots) = 0$.
Fig. 11.7
We now examine ∫ d(g_jω) in case (iii). The argument proceeds exactly as before, except that we must compute $\int_{\alpha(U\cap D)} \partial a_i/\partial x^i$ instead of $\int_{\alpha(U)}$. (See Fig. 11.7.)
We can now replace the region of integration by a rectangle of the form $\square^{\langle R,\ldots,R\rangle}_{\langle -R,\ldots,-R,0\rangle}$ for large R. If i < n, then ∫ ∂a_i/∂xⁱ = 0 as before. If i = n, we get
$$\int_{\alpha(U\cap D)} \frac{\partial a_n}{\partial x^n} = -\int_{\mathbb{R}^{n-1}} a_n(\cdot, \ldots, \cdot, 0),$$
so that
$$\int d(g_j\omega) = \sum_i (-1)^{i-1}\int \frac{\partial a_i}{\partial x^i} = (-1)^n \int_{\mathbb{R}^{n-1}} a_n(\cdot, \ldots, \cdot, 0).$$
Now since xⁿ = 0 on U ∩ ∂D, we see that ι* dxⁿ = 0. Thus
$$\iota^*(g_j\omega) = (\iota^* a_n)(\iota^*\, dx^1) \wedge \cdots \wedge (\iota^*\, dx^{n-1}),$$
or, if (by abuse of notation) we regard x¹, …, x^{n−1} as the coordinates of (U ∩ ∂D, α ↾ ∂D), we get
$$\iota^*(g_j\omega) = a_n(\cdot, \ldots, \cdot, 0)\, dx^1 \wedge \cdots \wedge dx^{n-1}.$$
In view of the choice we made for the orientation of ∂D, we conclude that
$$\int_{\partial D} \iota^*(g_j\omega) = (-1)^n \int_{\mathbb{R}^{n-1}} a_n(\cdot, \ldots, \cdot, 0).$$
This completes the proof of the theorem. □
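Before moving on, here is a small numerical sanity check of (4.2) in the plane (ours, not part of the text): for ω = −y dx + x dy on the unit disk D, dω = 2 dx ∧ dy, so both sides of Stokes' theorem equal 2 · area(D) = 2π.

```python
import numpy as np

# Boundary side: pull w back to the positively oriented unit circle;
# <C'(t), w(C(t))> = sin(t)^2 + cos(t)^2 = 1.
t = np.linspace(0, 2*np.pi, 100001)
boundary = np.trapz(np.sin(t)**2 + np.cos(t)**2, t)

# Interior side: integrate dw = 2 dx ^ dy over the disk in polar coordinates.
r = np.linspace(0, 1, 2001)
interior = np.trapz(2 * r, r) * 2*np.pi

print(boundary, interior)    # both ~ 2*pi
```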
Theorem 4.1, like the divergence theorem, is not sufficiently broad for us to apply to more general domains. For this purpose, we will again use the notion of a domain with almost regular boundary.
We have already seen that the set of x ∈ ∂D having a neighborhood of type (iii) forms a differentiable manifold. (Recall that these points need not exhaust all of ∂D.) Similarly, if M is an oriented manifold, then this collection of points becomes an oriented manifold [with (−1)ⁿ times the induced orientation, as before]. By abuse of language we shall denote this oriented manifold by ∂D. Thus ∂D is an oriented manifold which, as a set, is not ∂D but only the "regular" points of ∂D, that is, the points of $\widetilde{\partial D}$.
Theorem 4.2 (Stokes' theorem). Let M be an n-dimensional oriented manifold, and let D ⊂ M be a domain with almost regular boundary. Let ∂D be as above, and let ι be the injection of ∂D → M. Then for any ω ∈ $\Lambda^{n-1}(M)$ with compact support we have
$$\int_{\partial D} \iota^*\omega = \int_D d\omega. \tag{4.2}$$
Proof. The proof proceeds as before. We choose an adjusted atlas and a partition of unity {g_j} subordinate to the atlas. We write ω = Σ g_jω and now have four cases to consider. The first three cases have been handled already. The new case is where
$$g_j\omega = \sum_i a_i\, dx^1 \wedge \cdots \wedge \widehat{dx^i} \wedge \cdots \wedge dx^n,$$
where the hat indicates that dxⁱ is to be omitted, has its support contained in U, where (U, α) is a chart of type (iv). By linearity, it suffices to verify (4.2) for each summand on the right, i.e., for
$$a_j\, dx^1 \wedge \cdots \wedge \widehat{dx^j} \wedge \cdots \wedge dx^n.$$
Now $\iota^*(a_j\, dx^1 \wedge \cdots \wedge \widehat{dx^j} \wedge \cdots \wedge dx^n) = 0$ unless j ≥ k, since $dx^p$ vanishes on the piece of ∂D ∩ U whose image under α lies in $H_p^k$. If j < k, then all these $dx^p$ occur in the summand, and thus the pullback is zero. If j ≥ k, then $\iota^*(a_j\, dx^1 \wedge \cdots \wedge \widehat{dx^j} \wedge \cdots \wedge dx^n)$ vanishes everywhere except on the portion of ∂D which maps under α onto $H_j^k$.
On the other hand,
$$d(a_j\, dx^1 \wedge \cdots \wedge \widehat{dx^j} \wedge \cdots \wedge dx^n) = (-1)^{j-1}\frac{\partial a_j}{\partial x^j}\, dx^1 \wedge \cdots \wedge dx^n.$$
We can evaluate the integral over D by integrating over the rectangle
$$\square^{\langle R,\ldots,R\rangle}_{\langle -R,\ldots,-R,0,\ldots,0\rangle}$$
(where the −R's extend through the (k − 1)th position). Integrating first with respect to x^j, we obtain
$$\int_D d(a_j\, dx^1 \wedge \cdots \wedge \widehat{dx^j} \wedge \cdots \wedge dx^n) = (-1)^j \int_{H_j^k} a_j.$$
On the other hand, the orientation on $H_j^k$ is such that this integral has the sign necessary to make (4.2) hold. This proves Theorem 4.2. □
As before, we can apply Theorems 4.1 and 4.2 to still more general domains
by using a limit argument. For instance, Theorem 4.2, as stated, does not apply
to the domain D in Fig. 11.8, because the curves C₁ and C₂ are tangent at P.
It does apply, however, to the approximating domain obtained by "breaking
off a little piece" (Fig. 11.9), and it is clear that the values of both sides of (4.2)
for D' are close to those for D. We thus obtain (4.2) for D by passing to the
limit. As before, we will not state a more general theorem covering these cases.
It will be clear in each instance how to apply a limit argument.
11.4 STOKES' THEOREM 447
Fig. 11.8 Fig. 11.9
Since the statement and proof of Stokes' theorem are so close to those of the divergence theorem, the reader might suspect that one implies the other. On an oriented manifold, the divergence theorem is, indeed, a corollary of Stokes' theorem. To see this, let Ω be an element of $\Lambda^n(M)$ corresponding to the density ρ. If X is a vector field, then the n-form $D_X\Omega$ clearly corresponds to the density $D_X\rho = \operatorname{div}\langle X, \rho\rangle$. Anticipating some notation that we shall introduce in Section 6, let X ⌟ Ω be the (n − 1)-form defined by
$$X \lrcorner\, \Omega(\xi^1, \ldots, \xi^{n-1}) = (-1)^{n-1}\Omega(\xi^1, \ldots, \xi^{n-1}, X).$$
In terms of coordinates, if Ω = a dx¹ ∧ ⋯ ∧ dxⁿ, then
$$X \lrcorner\, \Omega = a\bigl[X^1\, dx^2 \wedge \cdots \wedge dx^n - X^2\, dx^1 \wedge dx^3 \wedge \cdots \wedge dx^n + \cdots + (-1)^{n-1}X^n\, dx^1 \wedge \cdots \wedge dx^{n-1}\bigr].$$
Note that
$$d(X \lrcorner\, \Omega) = \left(\sum_i \frac{\partial (aX^i)}{\partial x^i}\right) dx^1 \wedge \cdots \wedge dx^n,$$
which is exactly the n-form $D_X\Omega$, since it corresponds to the density $D_X\rho = \operatorname{div}\langle X, \rho\rangle$. Thus, by Stokes' theorem,
$$\int_{\partial D} \iota^*(X \lrcorner\, \Omega) = \int_D d(X \lrcorner\, \Omega) = \int_D \operatorname{div}\langle X, \rho\rangle.$$
We must compare X ⌟ Ω with the density $\rho_X$ on ∂D. By (2.2) they agree on everything up to sign. To check that the signs agree, it suffices to compare
$$\rho_X\left(\frac{\partial}{\partial x^1}, \ldots, \frac{\partial}{\partial x^{n-1}}\right) = \rho\left(\frac{\partial}{\partial x^1}, \ldots, \frac{\partial}{\partial x^{n-1}}, X\right)$$
with
$$\iota^*(X \lrcorner\, \Omega)\left(\frac{\partial}{\partial x^1}, \ldots, \frac{\partial}{\partial x^{n-1}}\right)$$
at any x ∈ ∂D. Now
$$\iota^*(X \lrcorner\, \Omega) = (-1)^{n-1}aX^n\, dx^1 \wedge \cdots \wedge dx^{n-1},$$
and, according to our convention, x¹, …, x^{n−1} is a positive or negative coordinate system on ∂D according to the sign of (−1)ⁿ. Thus the two coincide if and only if Xⁿ is negative, that is,
$$\int_{\partial D} \iota^*(X \lrcorner\, \Omega) = \int_{\widetilde{\partial D}} \rho_X.$$
EXERCISES
4.1 Compute the following surface integrals both directly and by using Stokes' theorem. Let □ denote the unit cube, and let B be the unit ball in ℝ³.
a) $\int_{\partial\square} x\, dy \wedge dz + y\, dz \wedge dx + z\, dx \wedge dy$
b) $\int_{\partial B} x^3\, dy \wedge dz$
c) $\int_{\partial\square} \cos z\, dx \wedge dy$
d) $\int_{\partial U} x\, dy \wedge dz$, where
U = {(x, y, z) : x ≥ 0, y ≥ 0, z ≥ 0, x² + y² + z² ≤ 1}
4.2 Let ω = yz dx + x dy + dz. Let γ be the unit circle in the plane oriented in the counterclockwise direction. Compute $\int_\gamma \omega$. Let
$$A_1 = \{(x, y, z) : z = 0,\ x^2 + y^2 \le 1\},$$
$$A_2 = \{(x, y, z) : z = 1 - x^2 - y^2,\ x^2 + y^2 \le 1\}.$$
Orient the surfaces A₁ and A₂ so that ∂A₁ = ∂A₂ = γ. Verify that $\int_{A_1} d\omega = \int_{A_2} d\omega = \int_\gamma \omega$ by computing the integrals.
4.3 Let S¹ be the circle and define ω = (1/2π) dθ, where θ is the angular coordinate.
a) Let φ: S¹ → S¹ be a differentiable map. Show that ∫φ*ω is an integer. This integer is called the degree of φ and is denoted by deg φ.
b) Let φ_t be a collection of maps (one for each t) which depends differentiably on t. Show that deg φ₀ = deg φ₁.
c) Let us regard S¹ as the unit circle in the complex numbers. Let f be some function on the complex numbers, and suppose that f(z) ≠ 0 for |z| = r. Define $\varphi_{r,f}$ by setting $\varphi_{r,f}(e^{i\theta}) = f(re^{i\theta})/|f(re^{i\theta})|$. Suppose f(z) = zⁿ. Compute deg $\varphi_{r,f}$ for r ≠ 0.
d) Let f be a polynomial of degree n ≥ 1. Thus
$$f(z) = a_nz^n + a_{n-1}z^{n-1} + \cdots + a_0,$$
where $a_n \ne 0$. Show that there is at least one complex number z₀ at which f(z₀) = 0. [Hint: Suppose the contrary. Then $\varphi_{r,(1/a_n)f}$ is defined for all 0 ≤ r < ∞ and deg $\varphi_{r,(1/a_n)f}$ = const, by (b). Evaluate $\lim_{r\to 0}$ and $\lim_{r\to\infty}$ of this expression.]
Let X be a vector field defined in some neighborhood U of the origin in ℝ², and suppose that X(0) = 0 and that X(x) ≠ 0 for x ≠ 0. Thus X vanishes only at the origin. Define the map $\varphi_r: S^1 \to S^1$ by
$$\varphi_r(e^{i\theta}) = \frac{X(re^{i\theta})}{\|X(re^{i\theta})\|}.$$
This map is defined for sufficiently small r. By Exercise 4.3(b) the degree of this map does not depend on r. This degree is called the index of the vector field X at the origin.
4.4 Compute the index of
a) $x\,\partial/\partial x + y\,\partial/\partial y$;
b) $x\,\partial/\partial x - y\,\partial/\partial y$;
c) $y\,\partial/\partial x - x\,\partial/\partial y$.
d) Construct a vector field with index 2.
e) Show that the index of −X is the same as the index of X for any vector field X.
4.5 Let X be a vector field on an oriented two-dimensional manifold M, and suppose that X(p) = 0 for some p ∈ M and that X does not vanish at any other point in a small neighborhood of p. By choosing an oriented chart mapping p into zero, we get a vector field on ℝ² vanishing at the origin. Show that the index of this vector field does not depend on the choice of chart. We can thus define the index of X at p.
4.6 a) On the sphere S² let X be a vector field which is tangent to the meridian circles everywhere and vanishes only at the north and south poles. What is its index at each pole?
b) Let Y be a vector field which is tangent to the circles of latitude everywhere and vanishes only at the north and south poles. What is its index at each pole?
5. SOME ILLUSTRATIONS OF STOKES' THEOREM
As a simple but important corollary of Theorem 4.2, we state:
Theorem 5.1. Let φ: M₁ → M₂ be a differentiable map of the oriented k-dimensional manifold M₁ into the n-dimensional manifold M₂. Let ω be a form of degree k − 1 on M₂, and let D ⊂ M₁ be a domain with almost regular boundary in M₁. Then we have
$$\int_{\varphi(\partial D)} \omega = \int_{\varphi(D)} d\omega. \tag{5.1}$$
Equation (5.1) follows directly from (4.2) and from the fact that φ*d = dφ*.
Fig. 11.10
We can regard the right-hand side of (5.1) as the integral of dω over the "oriented k-dimensional hypersurface" φ(D). Equation (5.1) says that this integral is equal to the integral of ω over the (k − 1)-dimensional hypersurface φ(∂D).
Fig. 11.11
We now give a simple application of Theorem 5.1. Let C₀: [0, 1] → M and C₁: [0, 1] → M be two differentiable curves with C₀(0) = C₁(0) = p and C₀(1) = C₁(1) = q. (See Fig. 11.10.) We say that C₀ and C₁ are (differentiably) homotopic if there exists a differentiable map φ of a neighborhood of the unit square [0, 1] × [0, 1] ⊂ ℝ² into M such that φ(t, 0) = C₀(t), φ(t, 1) = C₁(t), φ(0, s) = p, and φ(1, s) = q. (See Fig. 11.11.) For each value of s we get the curve C_s given by C_s(t) = φ(t, s). We think of φ as providing a differentiable "deformation" of the curve C₀ into the curve C₁.
Proposition 5.1. Let C₀ and C₁ be differentiably homotopic curves, and let ω be a linear differential form on M with dω = 0. Then
$$\int_{C_0} \omega = \int_{C_1} \omega. \tag{5.2}$$
Proof. In fact,
$$\int_{\partial\square^{\langle 1,1\rangle}_{\langle 0,0\rangle}} \varphi^*\omega = \int_{\square^{\langle 1,1\rangle}_{\langle 0,0\rangle}} \varphi^*\, d\omega = 0.$$
But the integral over the boundary is the sum of the four terms corresponding to the four sides of the square. The two vertical sides (t = 0 and t = 1) contribute nothing, since φ maps these curves into points. The top gives $-\int_{C_1}$ (because of the counterclockwise orientation), and the bottom gives $\int_{C_0}$. Thus $\int_{C_0}\omega - \int_{C_1}\omega = 0$, proving the proposition. □
It is easy to see that the proposition extends without difficulty to piecewise differentiable curves and piecewise differentiable homotopies. Let us say that two piecewise differentiable curves, C₀ and C₁, are (piecewise differentiably) homotopic if there is a continuous map φ: [0, 1] × [0, 1] → M such that
i) φ(0, s) = p, φ(1, s) = q;
ii) φ(t, 0) = C₀(t), φ(t, 1) = C₁(t);
iii) there are a finite number of points t₀ < t₁ < ⋯ < t_m such that φ coincides with the restriction of a differentiable map defined in some neighborhood of each rectangle [tᵢ, tᵢ₊₁] × [0, 1]. (See Fig. 11.12.)
To verify that Proposition 5.1 holds for the case of piecewise differentiable homotopies, we apply Stokes' theorem to each rectangle and observe that the contributions of the interior vertical lines cancel one another.
We say that a manifold M is connected if every pair of points can be joined by a (piecewise differentiable) curve. Thus ℝⁿ, for example, is connected. We say that M is simply connected if all (piecewise differentiable) curves joining the same two points are (piecewise differentiably) homotopic. (Note that the circle S¹ is not simply connected.) Let us verify that ℝⁿ is simply connected. If C₀ and C₁ are two curves, let φ: [0, 1] × [0, 1] → ℝⁿ be given by
$$\varphi(t, s) = (1 - s)C_0(t) + sC_1(t).$$
It is clear that φ has all the desired properties.
Fig. 11.12
Proposition 5.2. Let M be a connected and simply connected manifold, and let O ∈ M. Let ω ∈ $\Lambda^1(M)$ satisfy dω = 0. For any x ∈ M let f(x) = $\int_C \omega$, where C is some piecewise differentiable curve joining O to x. The function f is well defined and differentiable, and df = ω.
Proof. It follows from Proposition 5.1 that f is well defined: if C₀ and C₁ are two curves joining O to x, then they are homotopic, and so $\int_{C_0}\omega = \int_{C_1}\omega$. It is clear that f is continuous, since
$$f(x) - f(y) = \int_D \omega,$$
where D is any curve joining y to x (Fig. 11.13).
To check that f is differentiable, let (U, α) be a chart about x with coordinates ⟨x¹, …, xⁿ⟩. Then
$$f(x^1, \ldots, x^i + h, \ldots, x^n) - f(x^1, \ldots, x^n) = \int_C \omega,$$
where C is any curve joining p to q, where α(p) = (x¹, …, xⁱ, …, xⁿ) and α(q) = (x¹, …, xⁱ + h, …, xⁿ). We can take C to be the curve given by
$$\alpha \circ C(t) = (x^1, \ldots, x^i + ht, \ldots, x^n).$$
If ω = a₁ dx¹ + ⋯ + a_n dxⁿ, then
$$\int_C \omega = \int_0^1 ha_i\, dt = \int_0^h a_i(x^1, \ldots, x^i + s, \ldots, x^n)\, ds.$$
(See Fig. 11.14.) Thus
$$\lim_{h\to 0}\frac{1}{h}\bigl[f(x^1, \ldots, x^i + h, \ldots, x^n) - f(x^1, \ldots, x^n)\bigr] = a_i,$$
that is, ∂f/∂xⁱ = aᵢ. This shows that f is differentiable and that df = ω, proving the proposition. □
Fig. 11.13 Fig. 11.14
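A small symbolic sketch of Proposition 5.2 on ℝ² (ours, not from the text; the closed form below is chosen for illustration): define f by integrating ω along the straight segment from 0 to (x, y), and check that df = ω.

```python
import sympy as sp

x, y, s = sp.symbols('x y s')
P, Q = 2*x*y, x**2                    # w = 2xy dx + x^2 dy; dw = 0 since dP/dy = dQ/dx

# C(t) = (s*x, s*y) with C'(s) = (x, y), s in [0, 1]
integrand = P.subs({x: s*x, y: s*y})*x + Q.subs({x: s*x, y: s*y})*y
f = sp.integrate(integrand, (s, 0, 1))

print(sp.expand(f))                              # x**2*y
print(sp.diff(f, x) - P, sp.diff(f, y) - Q)      # 0 0, i.e. df = w
```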
We have thus established that every ω ∈ $\Lambda^1(\mathbb{R}^n)$ with dω = 0 is of the form df. More generally, it can be established that if Ω ∈ $\Lambda^k(\mathbb{R}^n)$ satisfies dΩ = 0, then Ω = dω for some ω ∈ $\Lambda^{k-1}(\mathbb{R}^n)$.
*This is not true for an arbitrary manifold. For instance, every ω ∈ $\Lambda^1(S^1)$ satisfies dω = 0. Yet the element of angle form (which is, unfortunately, denoted by dθ) is not the d of any function. The fact that d² = 0 shows that if Ω = dω, then dΩ = 0. Thus the space $d[\Lambda^{k-1}(M)] \subset \Lambda^k(M)$ is a subspace of the space $\ker_k d$ of elements in $\Lambda^k(M)$ satisfying dΩ = 0. The quotient space $\ker_k d/d[\Lambda^{k-1}]$ is denoted by $H^k(M)$ and is called the kth cohomology group of M. If M is compact, it can be shown that $H^k$ is finite-dimensional. It measures (roughly speaking) "how many" k-dimensional holes there are in M.*
6. THE LIE DERIVATIVE OF A DIFFERENTIAL FORM
Let M be a differentiable manifold, and let φ be a flow on M with infinitesimal generator X. For any ω ∈ $\Lambda^k(M)$ we can consider the expression
$$\frac{\varphi_t^*\omega - \omega}{t}.$$
It is not difficult (using local expressions) to verify that the limit as t → 0 exists and is again an element of $\Lambda^k(M)$, which we denote by $D_X\omega$. The purpose of this section is to provide an effective formula for computing $D_X\omega$. For this purpose, we first collect some properties of $D_X$. First of all, it is linear:
$$D_X(\omega_1 + \omega_2) = D_X\omega_1 + D_X\omega_2. \tag{6.1}$$
Secondly, we have
$$\varphi_t^*(\omega_1 \wedge \omega_2) - \omega_1 \wedge \omega_2 = (\varphi_t^*\omega_1) \wedge (\varphi_t^*\omega_2) - (\varphi_t^*\omega_1) \wedge \omega_2 + (\varphi_t^*\omega_1) \wedge \omega_2 - \omega_1 \wedge \omega_2.$$
Dividing by t and passing to the limit, we see that
$$D_X(\omega_1 \wedge \omega_2) = (D_X\omega_1) \wedge \omega_2 + \omega_1 \wedge D_X\omega_2. \tag{6.2}$$
Finally, since $\varphi_t^*\, d = d\,\varphi_t^*$, we have
$$D_X\, d\omega = d(D_X\omega). \tag{6.3}$$
Actually, these three formulas suffice for the computation. If
$$\omega = \sum a_{i_1,\ldots,i_k}\, dx^{i_1} \wedge \cdots \wedge dx^{i_k},$$
then, by (6.1), repeated use of (6.2), and (6.3),
$$D_X\omega = \sum\Bigl[(D_Xa_{i_1,\ldots,i_k})\, dx^{i_1} \wedge \cdots \wedge dx^{i_k} + a_{i_1,\ldots,i_k}\, d(D_Xx^{i_1}) \wedge \cdots \wedge dx^{i_k} + \cdots + a_{i_1,\ldots,i_k}\, dx^{i_1} \wedge \cdots \wedge d(D_Xx^{i_k})\Bigr].$$
Since this expression is rather cumbersome [the $d(D_Xx^i)$ have to be expanded and the terms collected], we shall derive a simpler and more convenient expression for $D_X\omega$. In order to do this, we make an algebraic detour.
Recall that the operator d: $\Lambda^k(M) \to \Lambda^{k+1}(M)$ is linear and satisfies the identity
$$d(\omega_1 \wedge \omega_2) = d\omega_1 \wedge \omega_2 + (-1)^k\omega_1 \wedge d\omega_2 \tag{6.4}$$
if ω₁ ∈ $\Lambda^k(M)$. More generally, any (sequence of linear) maps θ of $\Lambda^k(M) \to \Lambda^{k+1}(M)$
satisfying the identity
$$\theta(\omega_1 \wedge \omega_2) = \theta\omega_1 \wedge \omega_2 + (-1)^k\omega_1 \wedge \theta\omega_2 \tag{6.4'}$$
and
$$\operatorname{supp}\theta\omega \subset \operatorname{supp}\omega\,† \tag{6.5}$$
will be called an antiderivation of the algebra Λ(M).
It follows from (6.5) that if ω₁ ≡ ω₂ on an open set U, then θ(ω₁) ≡ θ(ω₂) on U. Now about every x ∈ M we can find a neighborhood U and functions x¹, …, xⁿ, so that ω ∈ $\Lambda^k(M)$ can be written as
$$\omega = \sum a_{i_1,\ldots,i_k}\, dx^{i_1} \wedge \cdots \wedge dx^{i_k} \quad \text{on } U. \tag{6.6}$$
Then by repeated use of (6.4′) we have
$$\theta(\omega) = \sum\Bigl[\theta(a_{i_1,\ldots,i_k}) \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_k} + a_{i_1,\ldots,i_k}\,\theta(dx^{i_1}) \wedge \cdots \wedge dx^{i_k} + \cdots + (-1)^{k-1}a_{i_1,\ldots,i_k}\, dx^{i_1} \wedge \cdots \wedge \theta(dx^{i_k})\Bigr]. \tag{6.7}$$
We thus arrive at the important conclusion:
Proposition 6.1. Any antiderivation θ: $\Lambda^k(M) \to \Lambda^{k+1}(M)$, k = 0, …, n, is uniquely determined by its action on $\Lambda^0(M)$ and $\Lambda^1(M)$. That is, if θ₁(ω) = θ₂(ω) for all ω ∈ $\Lambda^0(M)$ and $\Lambda^1(M)$, then θ₁(Ω) = θ₂(Ω) for Ω ∈ $\Lambda^k(M)$ for any k.
Now suppose we are given maps
$$\theta: \Lambda^0(M) \to \Lambda^1(M) \quad \text{and} \quad \theta: \Lambda^1(M) \to \Lambda^2(M)$$
which satisfy (6.5) and (6.4′) where it makes sense, that is,
$$\theta(f_1f_2) = \theta(f_1)f_2 + f_1\theta(f_2) \quad \text{and} \quad \theta(f\omega) = \theta(f) \wedge \omega + f\,\theta(\omega). \tag{6.8}$$
Then any chart (U, α) defines θ: $\Lambda^k(U) \to \Lambda^{k+1}(U)$ by (6.7). This gives an antiderivation $\theta_U$ on U, as can easily be checked by the use of the argument on pp. 440-441. By the uniqueness argument, if (W, β) is a second chart, the antiderivations $\theta_U$ and $\theta_W$ coincide on U ∩ W. Therefore, Eq. (6.7) is consistent and yields a well-defined antiderivation on Λ(M). (Observe that we have just repeated about two-thirds of the proof of Theorem 3.1 in the more general context of an arbitrary antiderivation.)
† This condition is actually a consequence of (6.4′). In fact, let U be an open set containing supp ω. Since {U, M − supp ω} is an open covering of M, we can find a partition of unity subordinate to it. In particular, we can find a C^∞-function φ which is identically one on supp ω and vanishes outside U. Then ω = φω, so that
$$\theta(\omega) = \theta(\varphi\omega) = \theta(\varphi) \wedge \omega + \varphi\,\theta(\omega).$$
Thus supp θ(ω) ⊂ supp ω ∪ supp φ ⊂ U. Since U is an arbitrary neighborhood of supp ω, we conclude that supp θ(ω) ⊂ supp ω.
Also observe that in the above arguments nothing changes if instead of θ: $\Lambda^k(M) \to \Lambda^{k+1}(M)$ we have θ: $\Lambda^k(M) \to \Lambda^{k-1}(M)$. [We take this to mean θ(f) = 0 for f ∈ $\Lambda^0(M)$.] In fact, the same argument works for
$$\theta: \Lambda^k \to \Lambda^{k+r}$$
for any odd integer r. We can thus state:
Proposition 6.2. Let θ: $\Lambda^0(M) \to \Lambda^r(M)$ and θ: $\Lambda^1(M) \to \Lambda^{r+1}(M)$ be linear maps satisfying (6.5) and (6.8), where r is odd. Then there exists one and only one way of extending θ to an antiderivation θ: $\Lambda^k(M) \to \Lambda^{k+r}(M)$ satisfying (6.4′).
As an application of this proposition, we will attach an antiderivation θ(X): $\Lambda^k(M) \to \Lambda^{k-1}(M)$ to every smooth vector field X on M. Since r = −1, for f ∈ $\Lambda^0(M)$ we set
$$\theta(X)f = 0.$$
For ω ∈ $\Lambda^1(M)$ we set
$$\theta(X)\omega = \langle X, \omega\rangle. \tag{6.9}$$
To verify (6.8) means to check that
$$\theta(X)(f\omega) = f\,\theta(X)\omega,$$
that is, that
$$\langle X, f\omega\rangle = f\langle X, \omega\rangle,$$
which is obvious.
If f is a function and θ is an antiderivation, we denote by fθ the map which sends ω → fθ(ω). It is easy to check that this is again an antiderivation. We can assert the following as a consequence of the uniqueness theorem: Let X and Y be smooth vector fields, and let f and g be smooth functions. Then
$$\theta(fX + gY) = f\,\theta(X) + g\,\theta(Y). \tag{6.10}$$
By the proposition, it suffices to check (6.10) on all ω ∈ $\Lambda^1(M)$. By (6.9), this is just
$$\langle fX + gY, \omega\rangle = f\langle X, \omega\rangle + g\langle Y, \omega\rangle,$$
which is obvious.
In particular, in a chart (U, α), if
$$X = X^1\frac{\partial}{\partial x^1} + \cdots + X^n\frac{\partial}{\partial x^n},$$
then
$$\theta(X) = \sum_i X^i\,\theta\left(\frac{\partial}{\partial x^i}\right).$$
To evaluate θ(∂/∂xⁱ), we use (6.8) and the fact that
$$\theta\left(\frac{\partial}{\partial x^i}\right) dx^j = \begin{cases} 0 & \text{if } i \ne j,\\ 1 & \text{if } i = j.\end{cases}$$
Thus, for example, θ(∂/∂xⁱ)(dx^p ∧ dx^q) = 0 if neither p = i nor q = i, while θ(∂/∂xⁱ)(dxⁱ ∧ dx^j) = dx^j, θ(∂/∂xⁱ)(dx^j ∧ dxⁱ) = −dx^j, etc.
Let us call a (sequence of) map(s) D: $\Lambda^k(M) \to \Lambda^{k+s}(M)$, where s is even, a derivation if it satisfies (6.5) and
$$D(\omega_1 \wedge \omega_2) = D\omega_1 \wedge \omega_2 + \omega_1 \wedge D\omega_2. \tag{6.11}$$
Since s is even, this is consistent. The most important example is $D_X$, where s = 0. Then (6.11) is just (6.2).
All the previous arguments about existence and uniqueness of extensions apply unchanged to derivations, as can easily be checked. We can therefore assert:
Proposition 6.3. Let D: $\Lambda^0(M) \to \Lambda^s(M)$ and D: $\Lambda^1(M) \to \Lambda^{1+s}(M)$, where s is even, be maps satisfying (6.5) and (6.8) (with θ replaced by D). Then there exists one and only one way of extending D to a derivation of Λ(M).
We need one further algebraic fact.
Proposition 6.4. Let θ₁: $\Lambda^k \to \Lambda^{k+r_1}$ and θ₂: $\Lambda^k \to \Lambda^{k+r_2}$ be antiderivations. Then θ₁θ₂ + θ₂θ₁: $\Lambda^k \to \Lambda^{k+r_1+r_2}$ is a derivation.
Proof. Since r₁ and r₂ are both odd, r₁ + r₂ is even. Equation (6.5) obviously holds. To verify (6.11), let ω₁ ∈ $\Lambda^k(M)$. Then
$$\theta_1\theta_2(\omega_1 \wedge \omega_2) = \theta_1\bigl[\theta_2\omega_1 \wedge \omega_2 + (-1)^k\omega_1 \wedge \theta_2\omega_2\bigr] = \theta_1\theta_2\omega_1 \wedge \omega_2 + (-1)^{k+r_2}\theta_2\omega_1 \wedge \theta_1\omega_2 + (-1)^k\theta_1\omega_1 \wedge \theta_2\omega_2 + \omega_1 \wedge \theta_1\theta_2\omega_2.$$
Similarly,
$$\theta_2\theta_1(\omega_1 \wedge \omega_2) = \theta_2\theta_1\omega_1 \wedge \omega_2 + (-1)^{k+r_1}\theta_1\omega_1 \wedge \theta_2\omega_2 + (-1)^k\theta_2\omega_1 \wedge \theta_1\omega_2 + \omega_1 \wedge \theta_2\theta_1\omega_2.$$
Since r₁ and r₂ are both odd, the middle terms cancel when we add. Hence we get
$$(\theta_1\theta_2 + \theta_2\theta_1)(\omega_1 \wedge \omega_2) = (\theta_1\theta_2 + \theta_2\theta_1)\omega_1 \wedge \omega_2 + \omega_1 \wedge (\theta_1\theta_2 + \theta_2\theta_1)\omega_2.\ \square$$
As a first application of Proposition 6.3, we observe that
$$\theta(X) \circ \theta(Y) = -\theta(Y) \circ \theta(X). \tag{6.12}$$
In fact, by Proposition 6.4, θ(X)θ(Y) + θ(Y)θ(X) is a derivation of degree −2; that is, it vanishes on Λ⁰ and Λ¹. It must therefore vanish identically. We could, of course, directly verify (6.12) from the local description of θ(X) and θ(Y).
As a more serious use of Proposition 6.4, consider θ(X) ∘ d + d ∘ θ(X), where X is a smooth vector field. Since d: $\Lambda^k \to \Lambda^{k+1}$ and θ(X): $\Lambda^k \to \Lambda^{k-1}$, we conclude that θ(X) ∘ d + d ∘ θ(X): $\Lambda^k \to \Lambda^k$. We now assert the main formula of this section:
$$D_X = \theta(X) \circ d + d \circ \theta(X). \tag{6.13}$$
Since both sides of (6.13) are derivations, it suffices to check (6.13) for functions and linear differential forms. If f ∈ $\Lambda^0(M)$, then θ(X)f = 0. Thus, by (6.9), Eq. (6.13) becomes
$$D_Xf = \langle X, df\rangle,$$
which we know holds. Next we must verify (6.13) for ω ∈ $\Lambda^1(M)$. By (6.5), it suffices to verify (6.13) locally. If we write ω = a₁ dx¹ + ⋯ + a_n dxⁿ, it suffices, by linearity, to verify (6.13) for each term $a_i\, dx^i$. Since both sides of (6.13) are derivations, we have
$$D_X(a_i\, dx^i) = (D_Xa_i)\, dx^i + a_i(D_X\, dx^i)$$
and
$$[\theta(X)\,d + d\,\theta(X)](a_i\, dx^i) = [\theta(X)\,d + d\,\theta(X)](a_i)\, dx^i + a_i[\theta(X)\,d + d\,\theta(X)]\, dx^i.$$
Since we have verified (6.13) for functions, we only have to check (6.13) for dxⁱ. Now
$$D_X\, dx^i = d(D_Xx^i)$$
and
$$[\theta(X)\,d + d\,\theta(X)]\, dx^i = d\,\theta(X)\, dx^i = d\langle X, dx^i\rangle = d(D_Xx^i).$$
This completes the proof of (6.13).
In many circumstances it will be convenient to free the letter θ for other uses. We shall therefore occasionally adopt the notation
$$X \lrcorner\, \omega = \theta(X)\omega.$$
The symbol ⌟ is called the interior product: X ⌟ ω is the interior product of the form ω with the vector field X. If ω ∈ $\Lambda^k$, then X ⌟ ω ∈ $\Lambda^{k-1}$. Equation (6.13) can then be rewritten as
$$D_X\omega = X \lrcorner\, d\omega + d(X \lrcorner\, \omega). \tag{6.14}$$
Let us see what (6.14) says in some special cases in terms of local coordinates. If ω = a₁ dx¹ + ⋯ + a_n dxⁿ and
$$X = X^1\frac{\partial}{\partial x^1} + \cdots + X^n\frac{\partial}{\partial x^n},$$
then
$$d\omega = \sum_{i,j}\frac{\partial a_i}{\partial x^j}\, dx^j \wedge dx^i.$$
Hence
$$X \lrcorner\, d\omega = \sum_i\left[\sum_j\left(X^j\frac{\partial a_i}{\partial x^j} - X^j\frac{\partial a_j}{\partial x^i}\right)\right] dx^i,$$
while
$$X \lrcorner\, \omega = \sum_j a_jX^j,$$
so
$$d(X \lrcorner\, \omega) = \sum_i\left[\sum_j\left(X^j\frac{\partial a_j}{\partial x^i} + a_j\frac{\partial X^j}{\partial x^i}\right)\right] dx^i.$$
Thus
$$D_X\omega = \sum_i\left(\sum_j X^j\frac{\partial a_i}{\partial x^j} + a_j\frac{\partial X^j}{\partial x^i}\right) dx^i,$$
which agrees with Eq. (7.12′) of Chapter 9.
As a second illustration, let Ω = a dx¹ ∧ ⋯ ∧ dxⁿ, where n = dim M. Then dΩ = 0, so (6.14) reduces to
$$D_X\Omega = d(X \lrcorner\, \Omega).$$
If X = Σ Xⁱ(∂/∂xⁱ), then
$$X \lrcorner\, \Omega = \sum_i aX^i\left(\frac{\partial}{\partial x^i}\right) \lrcorner\, (dx^1 \wedge \cdots \wedge dx^n) = \sum_i (-1)^{i-1}aX^i\, dx^1 \wedge \cdots \wedge dx^{i-1} \wedge dx^{i+1} \wedge \cdots \wedge dx^n,$$
which is merely the formula introduced at the end of Section 4. Then
$$D_X\Omega = d(X \lrcorner\, \Omega) = \left(\sum_i \frac{\partial (aX^i)}{\partial x^i}\right) dx^1 \wedge \cdots \wedge dx^n.$$
Since we can always locally identify a density with an n-form by identifying ρ with $\rho_\alpha\, dx^1 \wedge \cdots \wedge dx^n$ on (U, α), we obtain another proof of Proposition 5.2 of Chapter 10.
Appendix I. "VECTOR ANALYSIS"
We list here the relationships between notions introduced in this chapter and various concepts found in books on "vector analysis", although we shall have no occasion to use them.
In oriented Euclidean three-space 𝔼³, there are a number of identifications we can make which give a special form to some of the operations we have introduced in this chapter.
First of all, in 𝔼³, as in any Riemann space, we can (and shall) identify vector fields with linear differential forms. Thus for any function f we can regard df as a vector field. As such, it is called grad f. Thus, in 𝔼³, in terms of rectangular coordinates x, y, z,
$$\operatorname{grad} f = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right\rangle,$$
where we have also identified vector fields on 𝔼³ with 𝔼³-valued functions.
Secondly, since 𝔼³ is oriented, we can, via the *-operator (acting on each $T_x^*$), identify $\Lambda^2(\mathbb{E}^3)$ with $\Lambda^1(\mathbb{E}^3)$. Recall that * is given by
$$*(dx \wedge dy) = dz, \qquad *(dx \wedge dz) = -dy, \qquad *(dy \wedge dz) = dx. \tag{I.1}$$
In particular, if ω₁ = ⟨P, Q, R⟩ = P dx + Q dy + R dz and ω₂ = ⟨L, M, N⟩ = L dx + M dy + N dz, we can introduce the so-called "vector product" of ω₁ with ω₂. It is defined by
$$\omega_1 \times \omega_2 = *(\omega_1 \wedge \omega_2)$$
and is given [in view of (I.1)] by
$$\langle P, Q, R\rangle \times \langle L, M, N\rangle = \langle QN - RM,\ RL - PN,\ PM - QL\rangle.$$
Also we introduce the operator
$$\operatorname{curl}\omega = *d\omega.$$
Thus, if ω = ⟨P, Q, R⟩, we have
$$\operatorname{curl}\omega = \left\langle \frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z},\ \frac{\partial P}{\partial z} - \frac{\partial R}{\partial x},\ \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right\rangle.$$
Consider an oriented surface in 𝔼³; i.e., let φ: S → 𝔼³. Let Ω be the volume form on S associated with the Riemann metric induced by φ. By definition, if ξ₁, ξ₂ ∈ $T_x(S)$, then
$$\Omega(\xi_1, \xi_2) = dV(\varphi_*\xi_1, \varphi_*\xi_2, n),$$
where dV is the volume element of 𝔼³ and n is the unit normal vector. Another way of writing this is to say that
$$\Omega(\xi_1, \xi_2) = \nu(\varphi_*\xi_1, \varphi_*\xi_2),$$
where ν = *n when we regard n as a differential form. Now let ω̄ be a form on 𝔼³, and suppose that φ*ω̄ = fΩ for some function f. Then
$$f(x) = (\bar\omega, *n)(\varphi(x)).$$
Thus
$$\int_S \varphi^*(\bar\omega) = \int_S f\Omega = \int_S (\bar\omega, *n)\,\Omega = \int_S (*\bar\omega, n)\,\Omega.$$
Applying this to ω̄ = dω, where ω = P dx + Q dy + R dz, we can rewrite Stokes' theorem as
$$\int_C \omega = \int_C P\, dx + Q\, dy + R\, dz = \int_S (\operatorname{curl}\omega, n)\,\Omega,$$
where S is some surface spanning the closed curve C.
If we apply the remark to the case ω̄ = *ω and S = ∂D, we obtain, since ** = id (for n = 3),
$$\int_{\partial D} (\omega, n)\,\Omega = \int_D d * \omega.$$
Note that
$$d * \omega = \left(\frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}\right) dx \wedge dy \wedge dz,$$
which we write as div ω; that is, div ω = d * ω. (It is in fact div⟨ω, dV⟩, where dV is the volume element and we regard ω as a vector field.) Thus we get the divergence theorem again. Note that
$$\operatorname{curl}(\operatorname{grad} f) = *d\, df = 0$$
and
$$\operatorname{div}(\operatorname{curl}\omega) = d ** d\omega = d^2\omega = 0,$$
since d² = 0.
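The two classical identities just derived from d² = 0 can be checked directly in components; here is a sympy sketch of ours (not from the text), using the usual coordinate formulas for grad, curl, and div.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.Function('f')(x, y, z)
P, Q, R = [sp.Function(n)(x, y, z) for n in ('P', 'Q', 'R')]

def grad(g):
    return [sp.diff(g, v) for v in (x, y, z)]

def curl(F):
    Fx, Fy, Fz = F
    return [sp.diff(Fz, y) - sp.diff(Fy, z),
            sp.diff(Fx, z) - sp.diff(Fz, x),
            sp.diff(Fy, x) - sp.diff(Fx, y)]

def div(F):
    return sum(sp.diff(Fi, v) for Fi, v in zip(F, (x, y, z)))

print([sp.simplify(c) for c in curl(grad(f))])   # [0, 0, 0]: curl grad f = 0
print(sp.simplify(div(curl([P, Q, R]))))         # 0: div curl = 0
```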
Appendix II. ELEMENTARY DIFFERENTIAL
GEOMETRY OF SURFACES IN 1E3
For purposes of computation, it is convenient to introduce the notion of a vector-valued differential form. Let E be a vector space, and let M be a differentiable manifold. By an E-valued exterior differential form Ω of degree p we shall mean a rule which assigns an element Ω_x to each x ∈ M, where Ω_x is an antisymmetric E-valued multilinear function of degree p on T_x(M). For instance, if p = 0, then an E-valued zero-form is just a function on M with values in E. An E-valued one-form is a rule which assigns an element of E to each tangent vector ξ at any point of M, and so on.

Suppose that E is finite-dimensional and that {e₁, ..., e_N} is a basis for E. Let Ω₁, ..., Ω_N be (real-valued) p-forms. We can then consider the E-valued p-form Ω = Ω₁e₁ + ⋯ + Ω_N e_N, where, for any p vectors ξ₁, ..., ξ_p in T_x(M), we have

$$\Omega_x(\xi_1, \ldots, \xi_p) = \Omega_{1x}(\xi_1, \ldots, \xi_p)\,e_1 + \cdots + \Omega_{Nx}(\xi_1, \ldots, \xi_p)\,e_N.$$

Conversely, if Ω is an E-valued form, then real-valued forms Ω₁, ..., Ω_N can be defined by the above equation. In short, once a basis for an N-dimensional vector space E has been chosen, giving an E-valued differential form Ω is the same as giving N real-valued forms, and we can write

$$\Omega = \sum_{i=1}^{N} \Omega_i e_i$$
or

$$\Omega = \langle \Omega_1, \ldots, \Omega_N \rangle.$$

The rules for local description of E-valued forms, as well as the transition laws, are similar to those of real-valued forms, so we won't describe them in detail. For the sake of simplicity, we shall restrict our attention to the case where E is finite-dimensional, although for the most part this assumption is unnecessary.

If ω is a real-valued differential form of degree p, and if Ω is an E-valued form of degree q, then we can define the form ω ∧ Ω in the obvious way. In terms of a basis, if Ω = ⟨Ω₁, ..., Ω_N⟩, then ω ∧ Ω = ⟨ω ∧ Ω₁, ..., ω ∧ Ω_N⟩.
More generally, let E and F be (finite-dimensional) vector spaces, and let # be a bilinear map of E × F → G, where G is a third vector space. Let {e₁, ..., e_N} be a basis for E, let {f₁, ..., f_M} be a basis for F, and let {g₁, ..., g_K} be a basis for G. Suppose that the map # is given by

$$\#(e_i, f_j) = \sum_k a^k_{ij}\, g_k.$$

Then if W = Σ Wᵢeᵢ is an E-valued form and Ω = Σ Ωⱼfⱼ is an F-valued form, we define the G-valued form W ∧ Ω by

$$W \wedge \Omega = \sum_k \Bigl(\sum_{i,j} a^k_{ij}\; W_i \wedge \Omega_j\Bigr)\, g_k.$$

It is easy to check that this does not depend on the particular bases chosen.
We shall want to use this notion primarily in two contexts. First of all, we will be interested in the case where E = F and G = ℝ, so that # is a bilinear form on E. Suppose # is a scalar product and e₁, ..., e_N is an orthonormal basis. Then we shall write (W ∧ Ω) to remind us of the scalar product. If

$$W = \sum W_i e_i \qquad \text{and} \qquad \Omega = \sum \Omega_i e_i,$$

then

$$(W \wedge \Omega) = \sum W_i \wedge \Omega_i.$$

Note that in this case if W is a p-form and Ω is a q-form, then

$$(W \wedge \Omega) = (-1)^{pq}\,(\Omega \wedge W),$$

as in the case of real-valued forms.
The second case we shall be interested in is where F = G and E = Hom(F), and # is just the evaluation map evaluating a linear transformation on a vector of F to give another element of F. This time, choosing a basis for F determines a basis for Hom(F), so we can regard W as a matrix of real-valued differential forms. If W = (W_{ij}) and Ω = ⟨Ω₁, ..., Ω_M⟩, then

$$W \wedge \Omega = \Bigl\langle \sum_j W_{1j} \wedge \Omega_j,\; \ldots,\; \sum_j W_{Mj} \wedge \Omega_j \Bigr\rangle.$$
The operator d makes sense for vector-valued forms just as it did for real-valued forms, and it satisfies the same rules. Thus, if Ω = ⟨Ω₁, ..., Ω_N⟩, then dΩ = ⟨dΩ₁, ..., dΩ_N⟩ and

$$d(W \wedge \Omega) = dW \wedge \Omega + (-1)^p\; W \wedge d\Omega$$

if W is an E-valued form of degree p and Ω is an F-valued form.
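A small computational model of these rules may help. The sketch below is our own illustration (it assumes sympy and represents a real-valued form as a dictionary from increasing index tuples to coefficients); it checks the commutation rule and the Leibniz rule for E = ℝ³ with its standard scalar product.

    import sympy as sp

    coords = sp.symbols('x1 x2 x3')
    x1, x2, x3 = coords

    def sort_sign(idx):
        # sign of the permutation that sorts idx; 0 if an index repeats
        idx, sign = list(idx), 1
        for i in range(len(idx)):
            for j in range(i + 1, len(idx)):
                if idx[i] == idx[j]:
                    return 0, idx
                if idx[i] > idx[j]:
                    idx[i], idx[j] = idx[j], idx[i]
                    sign = -sign
        return sign, idx

    def wedge(a, b):
        # real-valued forms as dicts {increasing index tuple: coefficient}
        out = {}
        for I, ca in a.items():
            for J, cb in b.items():
                s, K = sort_sign(I + J)
                if s:
                    out[tuple(K)] = out.get(tuple(K), 0) + s*ca*cb
        return out

    def d(form):
        # exterior derivative of a real-valued form
        out = {}
        for I, c in form.items():
            for k, xk in enumerate(coords, start=1):
                s, K = sort_sign((k,) + I)
                if s:
                    out[tuple(K)] = out.get(tuple(K), 0) + s*sp.diff(c, xk)
        return out

    def add(a, b, scale=1):
        out = dict(a)
        for K, c in b.items():
            out[K] = out.get(K, 0) + scale*c
        return out

    def pairing_wedge(W, O):
        # (W ^ O) = sum_i W_i ^ O_i, for an orthonormal basis of E
        out = {}
        for a, b in zip(W, O):
            out = add(out, wedge(a, b))
        return out

    # two E-valued one-forms, E = R^3 with the standard scalar product
    W = [{(1,): x1*x2}, {(2,): x2*x3}, {(3,): x3*x1}]
    O = [{(2,): x3}, {(3,): sp.sin(x1)}, {(1,): x2**2}]

    # (W ^ O) = (-1)**(p*q) (O ^ W); here p = q = 1, so the sign is -1
    lhs, rhs = pairing_wedge(W, O), pairing_wedge(O, W)
    assert all(sp.simplify(lhs.get(K, 0) + rhs.get(K, 0)) == 0
               for K in set(lhs) | set(rhs))

    # d(W ^ O) = dW ^ O + (-1)**p W ^ dO, with d acting componentwise
    lhs2 = d(pairing_wedge(W, O))
    rhs2 = add(pairing_wedge([d(w) for w in W], O),
               pairing_wedge(W, [d(o) for o in O]), scale=-1)
    assert all(sp.simplify(lhs2.get(K, 0) - rhs2.get(K, 0)) == 0
               for K in set(lhs2) | set(rhs2))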
We shall apply the notion of vector-valued forms to develop (mostly in exercise form) some elementary facts about the geometry of oriented surfaces in E³. Let M be an oriented two-dimensional manifold, and let φ be a differentiable map of M into E³. We shall assume that φ* is not singular at any point of M, i.e., that φ is an immersion. Thus at each point p ∈ M the space φ*(T_p(M)) is a two-dimensional subspace of T_{φ(p)}(E³). Since we can identify T_{φ(p)}(E³) with E³, we can regard φ*(T_p(M)) as a two-dimensional subspace of E³. (See Fig. 11.15.)

Fig. 11.15

Since M is oriented, so is the tangent plane φ*(T_p(M)). Therefore, there is a unique unit vector orthogonal to the tangent plane which, together with an oriented basis of the tangent plane, gives an oriented basis of E³. This vector is called the normal vector and will be denoted by n(p). We can consider n an E³-valued function on M. Since ‖n‖ = 1, we can regard n as a mapping from M to the unit sphere. Note that φ(M) lies in a fixed plane of E³ if and only if n = const (n = the normal vector to the plane). We therefore can expect the variation of n to be useful in describing how the surface φ(M) is "bending".

Let Ω be the (oriented) area form on M corresponding to the Riemann metric induced by φ. Let Ω_S be the (oriented) area form on the unit sphere. Then n*(Ω_S) is a two-form on M, and therefore we can write

$$n^*\Omega_S = K\,\Omega.$$

The function K is called the Gaussian curvature of the surface φ(M). Note that K = 0 if φ(M) lies in a plane. Also, K = 0 if φ(M) is a cylinder (see the exercises).
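To make the definition concrete (an illustration of ours, assuming sympy): for a parametrized surface one can evaluate n*(Ω_S) and Ω on the coordinate vectors ∂/∂u, ∂/∂v and read off K as their ratio. The sketch below does so for a sphere of radius r (K = 1/r²) and for a cylinder (K = 0, as asserted above).

    import sympy as sp

    u, v, r = sp.symbols('u v r', positive=True)

    def gauss_curvature(phi):
        # K from n*(Omega_S) = K * Omega: evaluate both area forms on the
        # coordinate vectors (d/du, d/dv) and take the ratio.
        pu, pv = phi.diff(u), phi.diff(v)
        c = pu.cross(pv)
        n = c / sp.sqrt(c.dot(c))                  # the Gauss map n
        num = n.diff(u).cross(n.diff(v)).dot(n)    # n*(Omega_S) on (d/du, d/dv)
        den = c.dot(n)                             # Omega on (d/du, d/dv)
        return num / den

    sphere = r*sp.Matrix([sp.sin(u)*sp.cos(v), sp.sin(u)*sp.sin(v), sp.cos(u)])
    cylinder = sp.Matrix([sp.cos(u), sp.sin(u), v])

    # a sphere of radius r has K = 1/r**2; at r = 3 this is 1/9 = 0.111...
    print(gauss_curvature(sphere).subs({u: 1, v: 2, r: 3}).evalf())
    # a cylinder is flat, K = 0, as asserted in the text
    print(sp.simplify(gauss_curvature(cylinder)))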
For any oriented two-dimensional manifold with a Riemann metric we let 𝔉 denote the set of all oriented bases in all tangent spaces of M. Thus an element of 𝔉 is given by ⟨f₁, f₂⟩, where ⟨f₁, f₂⟩ is an orthonormal basis of T_x(M) for some x ∈ M. Note that f₂ is determined by f₁, because of the orientation and the fact that f₂ ⊥ f₁. Thus we can consider 𝔉 the space of all tangent vectors of unit length. For each x ∈ M the set of all unit vectors is just a circle. We leave it to the reader to verify that 𝔉 is, in fact, a three-dimensional manifold.

We denote by π the map that assigns to each ⟨f₁, f₂⟩ the point x when ⟨f₁, f₂⟩ is an orthonormal basis at x. Again, the reader should verify that π is a differentiable map of 𝔉 onto M.

In the case at hand, where the metric comes from an immersion φ, we define several vector-valued functions X, e₁, e₂, and e₃ on 𝔉 as follows:

$$X = \varphi \circ \pi,$$
$$e_1(\langle f_1, f_2 \rangle) = \varphi_* f_1,$$
$$e_2(\langle f_1, f_2 \rangle) = \varphi_* f_2,$$
$$e_3 = n \circ \pi.$$
(In the middle two equations we regard φ*fᵢ as elements of E³ via the identification of T_{φ(x)}E³ with E³.) Thus at any point z of 𝔉, the vectors e₁(z), e₂(z), e₃(z) form an orthonormal basis of E³, where e₁(z) and e₂(z) are tangent to the surface at φ(π(z)) = X(z) and e₃(z) is orthogonal to this surface. We can therefore write

$$(dX, e_3) = 0 \qquad \text{and} \qquad (de_3, e_3) = 0.$$

By the first equation we can write

$$dX = \omega_1 e_1 + \omega_2 e_2, \tag{II.1}$$

where

$$\omega_1 = (dX, e_1) \qquad \text{and} \qquad \omega_2 = (dX, e_2)$$

are (real-valued) linear differential forms defined on 𝔉.
Similarly, let us define the forms ωᵢⱼ by setting

$$\omega_{ij} = (de_i, e_j).$$

Applying d to the equation (eᵢ, eⱼ) = δᵢⱼ shows that

$$\omega_{ij} = -\omega_{ji}. \tag{II.2}$$

If we apply d to (II.1), we get

$$0 = d\,dX = d\omega_1\, e_1 - \omega_1 \wedge de_1 + d\omega_2\, e_2 - \omega_2 \wedge de_2.$$

Taking the scalar product of this equation with e₁ and e₂, respectively, shows (since ω₁₁ = 0 and ω₂₂ = 0) that

$$d\omega_1 = \omega_2 \wedge \omega_{21}, \qquad d\omega_2 = \omega_1 \wedge \omega_{12}. \tag{II.3}$$

If we apply d to the equation

$$de_i = \sum_j \omega_{ij}\, e_j,$$

we get

$$0 = \sum_j (d\omega_{ij}\, e_j - \omega_{ij} \wedge de_j),$$

and if we take the scalar product with eⱼ, we get

$$d\omega_{ij} = \sum_k \omega_{ik} \wedge \omega_{kj}. \tag{II.4}$$

If we apply d to the equation (dX, e₃) = 0, we get

$$0 = d(dX, e_3) = (dX \wedge de_3) = \bigl((\omega_1 e_1 + \omega_2 e_2) \wedge (\omega_{31} e_1 + \omega_{32} e_2)\bigr),$$

which implies that

$$\omega_1 \wedge \omega_{31} + \omega_2 \wedge \omega_{32} = 0. \tag{II.5}$$
We will now interpret these equations. Let z = ⟨f₁, f₂⟩ be a point of 𝔉. For any ξ ∈ T_z(𝔉) we have

$$\langle \xi, dX \rangle = \langle \xi, d(\varphi \circ \pi) \rangle = \langle \xi, \pi^* d\varphi \rangle = \langle \pi_*\xi, d\varphi \rangle = \varphi_*(\pi_*\xi).$$

Therefore,

$$\langle \xi, \omega_1 \rangle = (\varphi_*\pi_*\xi,\, e_1) = (\varphi_*\pi_*\xi,\, \varphi_* f_1) = (\pi_*\xi,\, f_1), \tag{II.6}$$

since the metric was defined to make φ* an isometry. In other words, ⟨ξ, ω₁⟩ and ⟨ξ, ω₂⟩ are the components of π*ξ with respect to the basis ⟨f₁, f₂⟩. If η is another tangent vector at z, then ω₁ ∧ ω₂(ξ, η) is the (oriented) area of the parallelogram spanned by π*ξ and π*η. In other words,

$$\omega_1 \wedge \omega_2 = \pi^*\Omega, \tag{II.7}$$

where Ω is the oriented area form on M.

Similarly,

$$\langle \xi, de_3 \rangle = n_*(\pi_*\xi), \tag{II.8}$$

and we have

$$n_*\pi_*\xi = \langle \xi, \omega_{31} \rangle e_1 + \langle \xi, \omega_{32} \rangle e_2 = \langle \xi, \omega_{31} \rangle \varphi_* f_1 + \langle \xi, \omega_{32} \rangle \varphi_* f_2. \tag{II.9}$$

Since we can regard e₁ and e₂ as an orthonormal basis of the tangent space to the unit sphere, we conclude that ω₃₁ ∧ ω₃₂(ξ, η) is the oriented area on the unit sphere of the parallelogram spanned by n*π*ξ and n*π*η. Thus

$$\omega_{31} \wedge \omega_{32} = \pi^* n^* \Omega_S = \pi^*(K\Omega) = K\,\omega_1 \wedge \omega_2.$$

Let

$$\begin{pmatrix} a & b \\ b' & c \end{pmatrix}$$

be the matrix of the linear transformation n*: T_x(M) → T_{n(x)}(S²) in terms of the basis ⟨f₁, f₂⟩ of T_x(M) and ⟨e₁, e₂⟩ of T_{n(x)}(S²). Then comparing (II.6) with (II.9) shows that

$$\omega_{31} = a\,\omega_1 + b\,\omega_2 \qquad \text{and} \qquad \omega_{32} = b'\,\omega_1 + c\,\omega_2. \tag{II.10}$$

If we substitute this into (II.5), we conclude that b = b′, i.e., that the matrix of n* is symmetric. This suggests that it corresponds to a symmetric bilinear form of some geometrical significance. In other words, we want to consider the quadratic form

$$a\,\omega_1^2 + 2b\,\omega_1\omega_2 + c\,\omega_2^2$$

[where it is understood that this is the quadratic form on T_z(𝔉) which assigns the number

$$a\langle \xi, \omega_1 \rangle^2 + 2b\langle \xi, \omega_1 \rangle\langle \xi, \omega_2 \rangle + c\langle \xi, \omega_2 \rangle^2$$

to any ξ ∈ T_z(𝔉)].
EXERCISES

II.1 Show that

$$a\langle \xi, \omega_1 \rangle^2 + 2b\langle \xi, \omega_1 \rangle\langle \xi, \omega_2 \rangle + c\langle \xi, \omega_2 \rangle^2 = (\varphi_*\pi_*\xi,\; n_*\pi_*\xi).$$

II.2 The quadratic form which assigns to each ξ ∈ T_x(M) the number (φ*ξ, n*ξ) is called the second fundamental form of the surface. We shall denote it by II(ξ). (What is usually called the first fundamental form is just ‖ξ‖² in our terminology.) Let C be any smooth curve with C′(0) = ξ. Show that

$$\mathrm{II}(\xi) = -\left( \frac{d^2(\varphi \circ C)}{dt^2}(0),\; n(x) \right).$$

Thus II(ξ) measures how much the curve φ ∘ C is bending in the n-direction. Suppose we choose C to be such that φ ∘ C lies in the plane spanned by φ*ξ and n(x). [Geometrically, this amounts to considering the curve obtained on the surface by intersecting the surface with the plane spanned by φ*ξ and n(x).] Show that II(ξ) is the curvature of this plane curve.

In this sense, the second fundamental form II(ξ) tells us how much the surface is bending in the direction of ξ.

Note that

$$K = ac - b^2.$$

Let λ₁ and λ₂ be the eigenvalues of the matrix

$$\begin{pmatrix} a & b \\ b & c \end{pmatrix}.$$

Thus

$$\lambda_1 = \max \mathrm{II}(\xi) \qquad \text{and} \qquad \lambda_2 = \min \mathrm{II}(\xi) \qquad \text{for } \|\xi\| = 1.$$

If λ₁ ≠ λ₂, there are two orthogonal eigenvectors which are called the directions of principal curvature of the surface. (Note that they must be orthogonal, since they are eigenvectors of a symmetric matrix.)
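As a concrete illustration (ours; the torus below is not discussed in the text, and the sketch assumes sympy), one can tabulate the first and second fundamental forms of an immersion in the coordinate basis and recover K = ac − b² as a quotient of determinants. Writing the second form against the unnormalized normal avoids square roots, since det I = ‖φᵤ × φᵥ‖².

    import sympy as sp

    u, v = sp.symbols('u v')

    # torus of revolution with radii R = 2, r = 1
    phi = sp.Matrix([(2 + sp.cos(u))*sp.cos(v),
                     (2 + sp.cos(u))*sp.sin(v),
                     sp.sin(u)])

    pu, pv = phi.diff(u), phi.diff(v)
    c = pu.cross(pv)                       # unnormalized normal

    # first fundamental form (the induced metric in the basis phi_u, phi_v)
    I1 = sp.Matrix([[pu.dot(pu), pu.dot(pv)],
                    [pv.dot(pu), pv.dot(pv)]])

    # second fundamental form scaled by |c| (avoids square roots)
    M = sp.Matrix([[pu.diff(u).dot(c), pu.diff(v).dot(c)],
                   [pv.diff(u).dot(c), pv.diff(v).dot(c)]])

    # K = det(II)/det(I) = det(M)/det(I)**2, since II = M/|c| and det I = |c|**2
    K = sp.simplify(M.det() / I1.det()**2)
    print(K)   # cos(u)/(cos(u) + 2): positive outside, negative inside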
If A is a Euclidean motion of E³, then ψ = A ∘ φ is another immersion of M, and it is easy to check that both the Riemann metric induced by ψ and the second fundamental form associated with ψ coincide with those attached to φ. What is not so obvious is the converse: If ψ and φ induce the same metric and the same second fundamental form, then ψ = A ∘ φ for some Euclidean motion A. We will not prove this fact, although it is a fairly easy consequence of what we have already established.
We have seen the meaning of ω₁, ω₂, ω₃₁, and ω₃₂ in geometric terms. Let us now interpret the one remaining form, ω₁₂.

Let γ be a differentiable curve on M. A differentiable family of unit vectors f₁(·) along γ [where f₁(s) ∈ T_{γ(s)}(M)] is the same as a curve C in 𝔉 with π ∘ C = γ. [Here C(s) = ⟨f₁(s), f₂(s)⟩.] Let us call the family f₁(s) parallel if the unit vectors are all changing normally to the surface in three-space. In other words, f₁(s) is parallel if the vector

$$\frac{d}{ds}\,\varphi_{*\gamma(s)}\bigl(f_1(s)\bigr)$$

is normal to φ(M) for all s. Let us see how to express this condition. Let ξₛ be the tangent vector to the curve C at C(s). Then, by the definition of de₁,

$$\frac{d}{ds}\,\varphi_{*\gamma(s)}\bigl(f_1(s)\bigr) = \langle \xi_s, de_1 \rangle.$$

Note that (⟨ξₛ, de₁⟩, e₁(C(s))) = 0 and (⟨ξₛ, de₁⟩, e₂(C(s))) = ⟨ξₛ, ω₁₂⟩. Now e₁ and e₂ span the tangent space to φ(M), so saying that f₁(·) is parallel is the same as saying that the e₂-component vanishes.

Thus f₁(s) is parallel along γ if and only if ⟨ξₛ, ω₁₂⟩ = 0.
Let M and M̄ be two-dimensional manifolds with Riemann metrics. Let u: M → M̄ be a differentiable map which is an isometry. Let 𝔉 be the manifold of orthonormal bases of M, and let 𝔉̄ be the manifold of orthonormal bases of M̄. Then u induces a map ū of 𝔉 → 𝔉̄ by

$$\bar u(\langle f_1, f_2 \rangle) = \langle u_* f_1,\; u_* f_2 \rangle.$$

Let ω̄₁ be the differential form on 𝔉̄ given, as in (II.6), by

$$\langle \bar\xi, \bar\omega_1 \rangle = (\bar\pi_*\bar\xi,\, \bar f_1) \qquad \text{for } \bar\xi \in T_{\bar z}(\bar{\mathfrak F}),$$

where z̄ = ⟨f̄₁, f̄₂⟩, with the corresponding definition for ω̄₂, ω₁, and ω₂. Then for any ξ ∈ T_z(𝔉) we have, since π̄ ∘ ū = u ∘ π,

$$\langle \xi, \bar u^* \bar\omega_1 \rangle = \langle \bar u_* \xi, \bar\omega_1 \rangle = (\bar\pi_* \bar u_* \xi,\, u_* f_1) = (u_* \pi_* \xi,\, u_* f_1) = (\pi_* \xi,\, f_1) = \langle \xi, \omega_1 \rangle.$$

In other words,

$$\bar u^* \bar\omega_1 = \omega_1 \qquad \text{and} \qquad \bar u^* \bar\omega_2 = \omega_2.$$
Now suppose that the metrics on M and M̄ come from immersions φ and φ̄. Then we get forms ωᵢⱼ and ω̄ᵢⱼ. Now by (II.3) we have

$$\bar u^*(\bar\omega_2 \wedge \bar\omega_{21}) = \bar u^*\, d\bar\omega_1 = d(\bar u^* \bar\omega_1) = d\omega_1 = \omega_2 \wedge \omega_{21}.$$

Thus

$$\bar u^*\bar\omega_2 \wedge \bar u^*\bar\omega_{21} = \omega_2 \wedge \omega_{21} \qquad \text{and, similarly,} \qquad \bar u^*\bar\omega_1 \wedge \bar u^*\bar\omega_{12} = \omega_1 \wedge \omega_{12},$$

or

$$\omega_2 \wedge (\bar u^*\bar\omega_{21} - \omega_{21}) = 0 \qquad \text{and} \qquad \omega_1 \wedge (\bar u^*\bar\omega_{12} - \omega_{12}) = 0.$$

Since the differential forms ω₁ and ω₂ are linearly independent, and since ω₂₁ = −ω₁₂, this can only happen if

$$\bar u^* \bar\omega_{12} = \omega_{12}.$$

In other words, if the two surfaces φ(M) and φ̄(M̄) are isometric, they have the "same" ω₁₂, that is, the same notion of "parallel vector fields". Observe that a piece of a cylinder and a piece of the plane are isometric, even though they are not congruent by a Euclidean motion. In different terms, while the forms ω₁₃ and ω₂₃ depend on how the surface is immersed in E³, the form ω₁₂ depends only on the Riemann metric induced by the immersion.
Now we have (II.4):

$$d\omega_{12} = \omega_{13} \wedge \omega_{32} = -\omega_{31} \wedge \omega_{32} = -K\,\omega_1 \wedge \omega_2.$$

From this we conclude that the Gaussian curvature K also does not depend on the immersion, but only on the Riemann metric coming from the immersion.

Since ω₁₂ does not depend on φ, we should be able to define it for an arbitrary two-dimensional manifold with a Riemann metric. Note that the preceding argument shows that ω₁₂ is uniquely determined by Eq. (II.3). It therefore suffices to construct an ω₁₂ on a coordinate neighborhood so as to satisfy (II.3).
It will then follow from the uniqueness that any two such coincide to give a well-defined form. Let U be a coordinate neighborhood of M, and let ψ: U → 𝔉 be a differentiable map such that π ∘ ψ = id. Thus ψ assigns a basis ⟨f₁, f₂⟩ to each x ∈ U, in a differentiable manner. (One possible way to construct ψ is to apply the orthonormalization procedure to the vector fields ⟨∂/∂x¹, ∂/∂x²⟩.)

Once we have chosen ψ, any basis of T_x differs from ψ(x) by a rotation. If we let τ denote the (angular) coordinate giving this rotation (so that τ is only defined mod 2π), then we can use the local coordinates on U together with τ as coordinates on π⁻¹(U). More precisely, if x¹ and x² are local coordinates on U, we define y¹, y², τ by

$$y^1 = x^1 \circ \pi, \qquad y^2 = x^2 \circ \pi,$$

and τ(z) is given for z = ⟨e₁, e₂⟩ by

$$e_1 = \cos\tau(z)\,f_1 + \sin\tau(z)\,f_2, \qquad e_2 = -\sin\tau(z)\,f_1 + \cos\tau(z)\,f_2, \tag{II.11}$$

where ⟨f₁, f₂⟩ = ψ(x) when ⟨e₁, e₂⟩ is a basis at x ∈ M.
Now let

$$\theta_1 = \psi^*\omega_1 \qquad \text{and} \qquad \theta_2 = \psi^*\omega_2,$$

so that θ₁ and θ₂ are forms defined on U and are, in fact, the dual basis for ψ(x) at each x ∈ M. If we set

$$\alpha_1 = \pi^*\theta_1 \qquad \text{and} \qquad \alpha_2 = \pi^*\theta_2,$$

then (II.11) gives

$$\omega_1 = \cos\tau\,\alpha_1 + \sin\tau\,\alpha_2 \qquad \text{and} \qquad \omega_2 = -\sin\tau\,\alpha_1 + \cos\tau\,\alpha_2.$$

Note that

$$\omega_1 \wedge \omega_2 = \alpha_1 \wedge \alpha_2.$$

Define the functions l₁ and l₂ on M by

$$d\theta_1 = l_1\,\theta_1 \wedge \theta_2 \qquad \text{and} \qquad d\theta_2 = l_2\,\theta_1 \wedge \theta_2.$$

Let k₁ = l₁ ∘ π and k₂ = l₂ ∘ π, so that

$$d\alpha_1 = k_1\,\alpha_1 \wedge \alpha_2 \qquad \text{and} \qquad d\alpha_2 = k_2\,\alpha_1 \wedge \alpha_2.$$
Now

$$d\omega_1 = -\sin\tau\,d\tau \wedge \alpha_1 + \cos\tau\,d\tau \wedge \alpha_2 + (k_1\cos\tau + k_2\sin\tau)\,\alpha_1 \wedge \alpha_2,$$

$$d\omega_2 = -\cos\tau\,d\tau \wedge \alpha_1 - \sin\tau\,d\tau \wedge \alpha_2 + (k_2\cos\tau - k_1\sin\tau)\,\alpha_1 \wedge \alpha_2.$$

Since ω₁ ∧ ω₂ = α₁ ∧ α₂, we can rewrite these equations as

$$d\omega_1 = \bigl(d\tau + (k_1\cos\tau + k_2\sin\tau)\,\omega_1\bigr) \wedge \omega_2,$$

$$d\omega_2 = -\bigl(d\tau + (k_2\cos\tau - k_1\sin\tau)\,\omega_2\bigr) \wedge \omega_1.$$

We thus see that the form

$$\omega_{12} = d\tau + (k_1\cos\tau + k_2\sin\tau)\,\omega_1 + (k_2\cos\tau - k_1\sin\tau)\,\omega_2 = d\tau + k_1\alpha_1 + k_2\alpha_2$$

satisfies the desired equations.
As before, on any two-dimensional Riemann manifold we will call a family of unit vectors parallel along a curve γ if ⟨ξₛ, ω₁₂⟩ = 0. With this definition of parallel translation we can state the following:

Theorem. Let γ be any differentiable curve on M. Given the unit vector g₁ ∈ T_{γ(0)}(M), there is a unique parallel family of unit vectors g₁(s) along γ, with g₁(0) = g₁. If g₁′(0) is another unit vector of T_{γ(0)}(M) differing from g₁ by an angle σ, then g₁′(s) differs from g₁(s) by the same angle σ for all s.

Proof. It is clearly sufficient (by breaking γ up into small pieces if necessary) to prove the theorem for curves γ lying entirely in a coordinate chart. Then we can use the local expression for ω₁₂.

Let us rewrite the condition for parallel translation along γ(s). In terms of local coordinates, the unit vector g₁(s) is given by a function τ(s), where

$$g_1(s) = \cos\tau(s)\,f_1(\gamma(s)) - \sin\tau(s)\,f_2(\gamma(s)).$$

Then, since the angular coordinate of the corresponding curve of frames in 𝔉 is −τ(s),

$$\langle \xi_s, \omega_{12} \rangle = \langle \xi_s, d\tau \rangle + \langle \xi_s, k_1\alpha_1 + k_2\alpha_2 \rangle = -\frac{d\tau(s)}{ds} + (\pi_*\xi_s,\; l_1\theta_1 + l_2\theta_2).$$

But π*ξₛ = γ̇(s) is the tangent vector to γ at γ(s). Thus

$$\langle \xi_s, \omega_{12} \rangle = -\left(\frac{d\tau(s)}{ds} - F_\gamma(s)\right),$$

where F_γ(s) = (γ̇(s), l₁θ₁ + l₂θ₂) is a function depending only on s. In particular, g₁(s) is parallel if and only if

$$\frac{d\tau(s)}{ds} = F_\gamma(s).$$

From this we see that given g₁(0) there is a unique parallel family g₁(s), starting with g₁(0). Furthermore, if g₁′(0) is a second unit vector at γ(0), the angle between g₁(s) and g₁′(s) is equal to the angle between g₁(0) and g₁′(0). Thus parallel translation preserves angles, which proves the theorem. ∎
Note that if M is (locally isometric to) Euclidean space, then we can choose

$$f_1 = \frac{\partial}{\partial x^1} \qquad \text{and} \qquad f_2 = \frac{\partial}{\partial x^2},$$

so that θ₁ = dx¹ and θ₂ = dx². In this case, k₁ = k₂ = 0 and τ is just the angle that g₁ makes with ∂/∂x¹, that is, with the x¹-axis. Thus ω₁₂ = −dτ in this case. Then the condition for parallel translation becomes dτ/ds = 0, which coincides with the usual notion of parallelism in Euclidean geometry. Note that in Euclidean space the parallelism does not depend on the curve γ. This is not true in general.
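Path dependence is easy to exhibit numerically. The following sketch (ours, assuming the Python library numpy) integrates the parallelism condition of this appendix (the derivative of the transported vector must be normal to the surface) along two routes on the unit sphere from the north pole to the point (0, 1, 0); the transported vectors differ by a right angle.

    import numpy as np

    def transport(path, v0, steps=20000):
        # Parallel transport on the unit sphere along path(t), 0 <= t <= 1.
        # At each Euler step we remove the tangential part of the change,
        # so that dv/ds stays normal to the surface.
        ts = np.linspace(0.0, 1.0, steps)
        dt = ts[1] - ts[0]
        v = v0.astype(float).copy()
        for t in ts[:-1]:
            n, n_next = path(t), path(t + dt)
            v = v - (v @ (n_next - n)) * n     # dv = -(v . dn) n
            v = v - (v @ n_next) * n_next      # re-project (controls drift)
        return v

    # route 1: great-circle arc in the y-z plane, north pole -> (0, 1, 0)
    route1 = lambda t: np.array([0.0, np.sin(t*np.pi/2), np.cos(t*np.pi/2)])

    # route 2: down the x-z meridian to (1, 0, 0), then along the equator
    def route2(t):
        if t < 0.5:
            s = t * np.pi
            return np.array([np.sin(s), 0.0, np.cos(s)])
        s = (t - 0.5) * np.pi
        return np.array([np.cos(s), np.sin(s), 0.0])

    v0 = np.array([1.0, 0.0, 0.0])    # unit vector at the north pole
    print(transport(route1, v0))      # ~ [1, 0, 0]
    print(transport(route2, v0))      # ~ [0, 0, -1]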
Exercise II.3. Let γ₁ and γ₂ be two arcs of great circles joining the north and south poles on S². Suppose that γ₁ and γ₂ are orthogonal at the poles. Let ξ be a tangent vector at the north pole. Compare its translates to the south pole via γ₁ and γ₂.
Let M be any two-dimensional Riemann manifold. For any curve γ on M there is an obvious way of choosing unit vectors along γ: just let g₁(s) be the unit tangent vector to γ at γ(s). Thus for every curve γ on M we get a curve, which we shall call γ̃, on 𝔉. [Here γ̃(s) = ⟨g₁(s), g₂(s)⟩, and g₁(s) is the tangent to γ at γ(s).]

We call the form γ̃*(ω₁₂) the geodesic curvature form of γ. [In the Euclidean case this is just the ordinary curvature (see the exercises).]
Let us consider those curves whose geodesic curvatures vanish, i.e., those curves whose tangent vectors are parallel. We shall call such a curve a geodesic with respect to the given Riemann metric. Note that the condition that a curve be geodesic is given, in local coordinates, by a second-order differential equation. Therefore, a geodesic C is uniquely specified by giving C(t) and C′(t) at any fixed value of t. In Chapter 13 we use the term "geodesic" to mean a curve which locally minimizes length. It is the purpose of the next few exercises to show that geodesics in our present sense have this property.
EXERCISES

II.4 Let x, y be local coordinates on U ⊂ M. Through each point of the curve y = 0 (that is, the x-axis in the local coordinates), construct the unique geodesic orthogonal to this curve. (See Fig. 11.16.) Let s be the arc-length parameter along the geodesic, so that the geodesic passing through (u, 0) is given by

$$\bigl(x(u, s),\; y(u, s)\bigr).$$

Show that the map (u, s) ↦ (x(u, s), y(u, s)) has nonzero Jacobian at (0, 0) and therefore defines a coordinate system in some open subset U′ ⊂ U.
Fig. 11.16        Fig. 11.17
II.5 We are going to make a further change of coordinates. Let Y be the vector field on U′ defined by the properties

$$\|Y\| = 1, \qquad \langle Y, du \rangle > 0.$$

Thus Y is orthogonal to the geodesics u = const and points in the increasing u-direction. Let us consider the solution curves of this vector field, parametrized by the initial position along the geodesic u = 0. That is, let v be the arc-length parameter along the geodesic u = 0, and consider the map

$$(u, v) \mapsto \bigl(u,\; s(u, v)\bigr),$$

where s(u, v) is the s-coordinate of the intersection of the solution curve of Y passing through (0, v) with the geodesic given by u. (See Fig. 11.17.) Again the existence theorem and smooth dependence on parameters, together with the fact that the curves u = 0 and s = 0 are already orthogonal, guarantee that we can find some neighborhood W so that (u, v) are coordinates on W. We have thus constructed coordinates such that the curves u = const are geodesics and the curves u = const and v = const are orthogonal. Such a system of coordinates is called a geodesic parallel coordinate system.
II.6 Let (u, v) be a coordinate system on U ⊂ M for which (∂/∂u, ∂/∂v) ≡ 0, so that the metric takes the form

$$ds^2 = p^2\,du^2 + q^2\,dv^2.$$

Define the choice of frame ψ by normalizing ∂/∂u, ∂/∂v, so that ψ(x) = ⟨f₁, f₂⟩, where f₁ = (∂/∂u)/‖∂/∂u‖ and f₂ = (∂/∂v)/‖∂/∂v‖. Show that the forms θ₁ and θ₂ are given by

$$\theta_1 = p\,du, \qquad \theta_2 = q\,dv,$$

and

$$\omega_{12} = d\tau - \pi^*\left(\frac{1}{q}\frac{\partial p}{\partial v}\,du - \frac{1}{p}\frac{\partial q}{\partial u}\,dv\right)$$

and

$$K = -\frac{1}{pq}\left[\frac{\partial}{\partial u}\left(\frac{1}{p}\frac{\partial q}{\partial u}\right) + \frac{\partial}{\partial v}\left(\frac{1}{q}\frac{\partial p}{\partial v}\right)\right].$$
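The curvature formula of Exercise II.6 is easy to exercise on familiar metrics (a sympy sketch of ours):

    import sympy as sp

    u, v, r = sp.symbols('u v r', positive=True)

    def K(p, q):
        # the formula of Exercise II.6 for ds^2 = p^2 du^2 + q^2 dv^2
        return sp.simplify(-(sp.diff(sp.diff(q, u)/p, u) +
                             sp.diff(sp.diff(p, v)/q, v)) / (p*q))

    print(K(r, r*sp.sin(u)))               # sphere of radius r: 1/r**2
    print(K(sp.Integer(1), sp.Integer(1))) # flat plane: 0
    print(K(sp.Integer(1), sp.cosh(u)))    # ds^2 = du^2 + cosh(u)^2 dv^2: -1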
II.7 Let (u, v) be a geodesic parallel coordinate system, as in Exercise II.5. The curve C_u given by C_u(v) = (u, v) is a geodesic. Thus ⟨C̃_u′(v), ω₁₂⟩ = 0, where C̃_u is the corresponding curve of unit tangent vectors in 𝔉. But in terms of our local coordinates, ⟨C̃_u′(v), dτ⟩ = 0, since C_u′(v) is always parallel to one of the base vectors, f₂, and

$$\langle \tilde C_u{}'(v),\; \pi^*du \rangle = \langle C_u{}'(v),\; du \rangle = 0,$$

since u = const along C_u. Thus we conclude that ∂q/∂u = 0, or q = q(v). Let us replace the parameter v by w = ∫₀ᵛ q(t) dt. Then (u, w) is a geodesic parallel coordinate system for which we have

$$ds^2 = p^2\,du^2 + dw^2,$$

and now the arc length along any curve u = const is ∫ dw.
II.8 Show that for |w| sufficiently small, any curve joining (0, 0) to (0, w) must have arc length at least |w|. Conclude that (since the choice of our original curve y = 0 was arbitrary) the geodesics locally minimize length.
II.9 Let ⟨w, z⟩ be local coordinates on an open set U of a Riemann manifold with the property that the curves C_z given by C_z(w) = (w, z) are geodesics parametrized according to arc length. Thus z = const is a geodesic and ‖∂/∂w‖ = 1. Let

$$a = \left(\frac{\partial}{\partial w},\; \frac{\partial}{\partial z}\right).$$

Show that ∂a/∂w = 0. [Hint: Show that by orthonormalizing ⟨∂/∂w, ∂/∂z⟩, we obtain a map ψ whose associated forms θ₁ and θ₂ are given by θ₁ = dw + a dz, θ₂ = b dz, wh
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus
Advanced_Calculus

More Related Content

PDF
A level further mathematics zimsec syllabus cambridge zimbabwe
PDF
An introduction to linear algebra
PDF
Syllabus 4-year-bs-math
PDF
G024047050
PDF
P. e. bland, rings and their modules
PDF
The quantum strategy of completeness
PDF
29 15021 variational final version khalid hammood(edit)
A level further mathematics zimsec syllabus cambridge zimbabwe
An introduction to linear algebra
Syllabus 4-year-bs-math
G024047050
P. e. bland, rings and their modules
The quantum strategy of completeness
29 15021 variational final version khalid hammood(edit)

Viewers also liked (15)

TXT
구청전세자금대출『BU797』.『COM』프랜차이즈체인 위해골프여행
PPTX
Treball nadal
PDF
GlobalBono en Mujer Emprendedora 18 julio 2012
PPTX
PUBLICACION 2
PDF
REAL-LIVE-PRESENTATION-MAY16-2016
PPTX
Historieta
PDF
Civil Society Organisations in Research Governance
PPTX
Alcohol jaume
PDF
Carta iberoamericana de_participacion
PDF
Business s4 plan de trabajo - dic 2016
PDF
Grammar Terms
PDF
Módulo 4 gobierno electrónico
PPTX
Sahara case
PDF
Black tip shark
구청전세자금대출『BU797』.『COM』프랜차이즈체인 위해골프여행
Treball nadal
GlobalBono en Mujer Emprendedora 18 julio 2012
PUBLICACION 2
REAL-LIVE-PRESENTATION-MAY16-2016
Historieta
Civil Society Organisations in Research Governance
Alcohol jaume
Carta iberoamericana de_participacion
Business s4 plan de trabajo - dic 2016
Grammar Terms
Módulo 4 gobierno electrónico
Sahara case
Black tip shark
Ad

Similar to Advanced_Calculus (20)

PDF
Advanced Linear Algebra (Third Edition) By Steven Roman
PDF
Galois Theory Escofier Jeanpierreschneps Leilatranslator
PDF
A Course In LINEAR ALGEBRA With Applications
PDF
Lectures On The Geometry Of Manifolds 2nd Edition Liviu I. Nicolaescu
PDF
A First Course In With Applications Complex Analysis
PDF
A combinatorial miscellany by Anders BJ Orner and Richard P. Stanley
PDF
A combinatorial miscellany by anders bj ¨orner and richard p. stanley
PDF
Conformal Field Theory And Topology Toshitake Kohno
PDF
Polynomials ( PDFDrive ).pdf
PDF
Grimmett&Stirzaker--Probability and Random Processes Third Ed(2001).pdf
PDF
Elementary geometry from an advanced standpoint(Geometría Elemental Desde Un ...
PDF
Calculus volume 1
PDF
Herstein 3th editon
PDF
Linear integral equations -rainer kress
PDF
Journey Into Mathematics An Introduction To Proofs Dover Ed Joseph J Rotman
PDF
A Book of Abstract Algebra.pdf
PDF
Functional Equations and Inequalities in Several Variables 1st Edition Stefan...
PDF
Fractional Calculus Models and Numerical Methods 2nd Edition Dumitru Baleanu
PDF
Analytic Theory Of Polynomials Qazi Ibadur Rahman Gerhard Schmeisser
PDF
Multiplicative number theory i.classical theory cambridge
Advanced Linear Algebra (Third Edition) By Steven Roman
Galois Theory Escofier Jeanpierreschneps Leilatranslator
A Course In LINEAR ALGEBRA With Applications
Lectures On The Geometry Of Manifolds 2nd Edition Liviu I. Nicolaescu
A First Course In With Applications Complex Analysis
A combinatorial miscellany by Anders BJ Orner and Richard P. Stanley
A combinatorial miscellany by anders bj ¨orner and richard p. stanley
Conformal Field Theory And Topology Toshitake Kohno
Polynomials ( PDFDrive ).pdf
Grimmett&Stirzaker--Probability and Random Processes Third Ed(2001).pdf
Elementary geometry from an advanced standpoint(Geometría Elemental Desde Un ...
Calculus volume 1
Herstein 3th editon
Linear integral equations -rainer kress
Journey Into Mathematics An Introduction To Proofs Dover Ed Joseph J Rotman
A Book of Abstract Algebra.pdf
Functional Equations and Inequalities in Several Variables 1st Edition Stefan...
Fractional Calculus Models and Numerical Methods 2nd Edition Dumitru Baleanu
Analytic Theory Of Polynomials Qazi Ibadur Rahman Gerhard Schmeisser
Multiplicative number theory i.classical theory cambridge
Ad

More from THILIVHALI CASTRO NDOU (10)

PDF
2016 General Prospectus for Wits
PDF
Mining Proffesors
PDF
ThesisMarwala
PDF
practical-operational-aspects-of-dense-medium-cyclone-separation-9f775
PDF
annual report - 2008-9
PDF
PDF
Dr Marais Research.pdf
PDF
10.4.MTech_MathTech_Structured_2016
PDF
PhD 2014 Thesis Corrections 8 Final NO RED.pdf (My other best lecturer's Thesis)
2016 General Prospectus for Wits
Mining Proffesors
ThesisMarwala
practical-operational-aspects-of-dense-medium-cyclone-separation-9f775
annual report - 2008-9
Dr Marais Research.pdf
10.4.MTech_MathTech_Structured_2016
PhD 2014 Thesis Corrections 8 Final NO RED.pdf (My other best lecturer's Thesis)

Advanced_Calculus

  • 1. L Y N N H. L 0 0 MIS and S H L 0 M 0 S T ERN B ERG Department of Mathematics, Harvard University ADVANCED CALCULUS REVISED EDITION JONES AND BARTLETT PUBLISHERS Boston London
  • 2. ~"' , ~ ", :,i.; J) Editorial, Sales, and Customer Service Offices: Jones and Bartlett Publishers, Inc, One Exeter Plaza Boston, MA 02116 Jones and Bartlett Publishers International POBox 1498 London W6 7RS England Copyright © 1990 by Jones and Bartlett Publishers, Inc. Copyright © 1968 by Addison-Wesley Publishing Company, Inc. All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without written permission from the copyright owner. Printed in the United States of America. 10 9 8 7 6 5 4 3 2 Library of Congress Cataloging-in-Publication Data Loomis, Lynn H. Advanced calculus / Lynn H. Loomis and Shlomo Sternberg. -Rev. ed. p. cm. Originally published: Reading, Mass. : Addison-Wesley Pub. Co., 1968. ISBN 0-86720-122-3 1. Calculus. I. Sternberg, Shlomo. II. Title. QA303.L87 1990 515--dc20 '. , 'I 89-15620 CIP
  • 3. PREFACE This book is based on an honors course in advanced calculus that we gave in the 1960's. The foundational material, presented in the unstarred sections of Chap- ters 1 through 11, was normally covered, but different applications of this basic material were stressed from year to year, and the book therefore contains more material than was covered in anyone year. It can accordingly be used (with omissions) as a text for a year's course in advanced calculus, or as a text for a three-semester introduction to analysis. These prerequisites are a good grounding in the calculus of one variable from a mathematically rigorous point of view, together with some acquaintance with linear algebra. The reader should be familiar with limit and continuity type arguments and have a certain amount of mathematical sophistication. AB possi- ble introductory texts, we mention Differential and Integral Calculus by R. Cou- rant, Calculus by T. Apostol, Calculus by M. Spivak, and Pure Mathematics by G. Hardy. The reader should also have some experience with partial derivatives. In overall plan the book divides roughly into a first half which develops the calculus (principally the differential calculus) in the setting of normed vector spaces, and a second halfwhich deals with the calculus ofdifferentiable manifolds. Vector space calculus is treated in two chapters, the differential calculus in Chapter 3, and the basic theory of ordinary differential equations in Chapter 6. The other early chapters are auxiliary. The first two chapters develop the neces- sary purely algebraic theory of vector spaces, Chapter 4 presents the material on compactness and completeness needed for the more substantive results of the calculus, and Chapter 5 contains a brief account of the extra structure en- countered in scalar product spaces. Chapter 7 is devoted to multilinear (tensor) algebra and is, in the main, a reference chapter for later use. Chapter 8 deals with the theory of (Riemann) integration on Euclidean spaces and includes (in exercise form) the fundamental facts about the Fourier transform. Chapters 9 and 10 develop the differential and integral calculus on manifolds, while Chapter 11 treats the exterior calculus of E. Cartan. The first eleven chapters form a logical unit, each chapter depending on the results of the preceding chapters. (Of course, many chapters contain material that can be omitted on first reading; this is generally found in starred sections.)
  • 4. On the other hand, Chapters 12, 13, and the latter parts of Chapters 6 and 11 are independent of each other, and are to be regarded as illustrative applications of the methods developed in the earlier chapters. Presented here are elementary Sturm-Liouville theory and Fourier series, elementary differential geometry, potential theory, and classical mechanics. We usually covered only one or two of these topics in our one-year course. We have not hesitated to present the same material more than once from different points of view. For example, although we have selected the contraction mapping fixed-point theorem as our basic approach to the in1plicit-function theorem, we have also outlined a "Newton's method" proof in the text and have sketched still a third proof in the exercises. Similarly, the calculus of variations is encountered twice-once in the context of the differential calculus of an infinite-dimensional vector space and later in the context of classical mechanics. The notion of a submanifold of a vector space is introduced in the early ohapters, while the invariant definition of a manifold is given later on. In the introductory treatment of vector space theory, we are more careful and precise than is customary. In fact, this level of precision of language is not maintained in the later chapters. Our feeling is that in linear algebra, where the concepts are so clear and the axioms so familiar, it is pedagogically sound to illustrate various subtle points, such as distinguishing between spaces that are normally identified, discussing the naturality of various maps, and so on. Later on, when overly precise language would be more cumbersome, the reader should be able to produce for hin1self a more precise version of any assertions that he finds to be formulated too loosely. Similarly, the proofs in the first few chapters are presented in more formal detail. Again, the philosophy is that once the student has mastered the notion of what constitutes a fonnal mathematical proof, it is safe and more convenient to present arguments in the usual mathe- matical colloquialisms. While the level of formality decreases, the level of mathematical sophisti- cation does not. Thus increasingly abstract and sophisticated mathematical objects are introduced. It has been our experience that Chapter 9 contains the concepts most difficult for students to absorb, especially the notions of the tangent space to a manifold and the Lie derivative of various objects with respect to a vector field.
  • 5. There are exercises of many different kinds spread throughout the book. Some are in the nature of routine applications. Others ask the r~ader to fill in or extend various proofs of results presented in the text. Sometimes whole topics, such as the Fourier transform or the residue calculus, are presented in exercise form. Due to the rather abstract nature of the textual material, the stu- dent is strongly advised to work out as many of the exercises as he possibly can. Any enterprise of this nature owes much to many people besides the authors, but we particularly wish to acknowledge the help of L. Ahlfors, A. Gleason, R. Kulkarni, R. Rasala, and G. Mackey and the general influence of the book by Dieudonne. We also wish to thank the staffofJones and Bartlett for their invaluable help in preparing this revised edition. Cambridge, Massachusetts 1968, 1989 L.H.L. S.S.
  • 7. CONTENTS Chapter 0 I n troduction 1 Logic: quantifiers 1 2 The logical connectives 3 3 Negations of quantifiers 6 4 Sets 6 5 Restricted variables . 8 6 Ordered pairs and relations. 9 7 Functions and mappings 10 8 Product sets; index notation 12 9 Composition 14 10 Duality 15 11 The Boolean operations . 17 12 Partitions and equivalence relations 19 Chapter 1 Vector Spaces 1 Fundamental notions 21 2 Vector spaces and geometry 36 3 Product spaces and Hom(V, TV) 43 4 Affine subspaces and quotient spaces 52 5 Direct sums 56 6 Bilinearity 67 Chapter 2 Finite-Dimensional Vector Spaces 1 Bases 71 2 Dimension 77 3 The dual space 81 4 .Matrices 88 5 Trace and determinant 99 6 Matrix computations 102 *7 The diagonalization of a quadratic form 111 Chapter 3 The Differential Calculus 1 Review in IR 117 2 Norms. 121 3 Continuity 126
  • 8. 4 Equivalent norms 132 5 Infinitesimals . 136 6 The differential 140 7 Directional derivatives; the mean-value theorem 146 8 The differential and product spaces 152 9 The differential and IRn • 156 10 Elementary applications 161 11 The implicit-function theorem 164 12 Submanifolds and Lagrange multipliers 172 *13 Functional dependence 175 *14 Uniform continuity and function-valued mappings 179 *15 The calculus of variations 182 *16 The second differential and the classification of critical points 186 *17 The Taylor formula . 191 Chapter 4 Compactness and Completeness 1 Metric spaces; open and closed sets 195 *2 Topology 201 3 Sequential convergence . 202 4 Sequential compactness. 205 5 Compactness and uniformity 210 6 Equicontinuity 215 7 Completeness. 216 8 A first look at Banach algebras 223 9 The contraction mapping fixed-point theorem 228 10 The integral of a parametrized arc 236 11 The complex number system 240 *12 Weak methods 245 Chapter 5 Scalar Product Spaces 1 Scalar products 248 2 Orthogonal projection 252 3 Self-adjoint transformations 257 4 Orthogonal transformations 262 5 Compact transformations 264
  • 9. Chapter 6 Differential Equations 1 The fundamental theorem 266 2 Differentiable dependence on parameters . 274 3 The linear equation 276 4 The nth-order linear equation 281 5 Solving the inhomogeneous equation 288 6 The boundary-value problem 294 7 Fourier series . 301 Chapter 7 Multilinear Functionals 1 Bilinear functionals 305 2 Multilinear functionals 306 3 Permutations. 308 4 The sign of a permutation 309 5 The subspace an of alternating tensors 310 6 The determinant . 312 7 The exterior algebra. 316 8 Exterior powers of scalar product spaces 319 9 The star operator 320 Chapter 8 Integration 1 Introduction 321 2 Axioms 322 3 Rectangles and paved sets 324 4 The minimal theory . 327 5 The minimal theory (continued) 328 6 Contented sets 331 7 When is a set contented? 333 8 Behavior under linear distortions 335 9 Axioms for integration 336 10 Integration of contented functions 338 11 The change of variables formula 342 12 Successive integration 346 13 Absolutely integrable functions 351 14 Problem set: The Fourier transform 355
  • 10. Chapter 9 Differentiable Manifolds 1 Atlases 364 2 Functions, convergence . 367 3 Differentiable manifolds 369 4 The tangent space 373 5 Flows and vector fields 376 6 Lie derivatives 383 7 Linear differential forms 390 8 Computations with coordinates 393 9 Riemann metrics . 397 Chapter 10 The Integral Calculus on Manifolds 1 Compactness . 403 2 Partitions of unity 405 3 Densities 408 4 Volume density of a Riemann metric 411 5 Pullback and Lie derivatives of densities 416 6 The divergence theorem 419 7 More complicated domains 424 Chapter 11 Exterior Calculus 1 Exterior differential forms 429 2 Oriented manifolds and the integration of exterior differential forms 433 3 The operator d 438 4 Stokes' theorem 442 5 Some illustrations of Stokes' theorem 449 6 The Lie derivative of a differential form 452 Appendix 1. "Vector analysis" . 457 Appendix II. Elementary differential geometry of surfaces in [3 459 Chapter 12 Potential Theory in lEn 1 Solid angle 474 2 Green's formulas . 476 3 The maximum principle 477 4 Green's functions 479
  • 11. 5 The Poisson integral formula 482 6 Consequences of the Poisson integral formula 485 7 Harnack's theorem 487 8 Subharmonic functions 489 9 Dirichlet's problem 491 10 Behavior near the boundary 495 11 Dirichlet's principle 499 12 Physical applications 501 13 Problem set: The calculus of residues 503 Chapter 13 Classical Mechanics 1 The tangent and cotangent bundles 511 2 Equations of variation 513 3 The fundamental linear differential form on T*(M) 515 4 The fundamental exterior two-form on T*(M) 517 5 Hamiltonian mechanics . 520 6 The central-force problem 521 7 The two-body problem 528 8 Lagrange's equations 530 9 Variational principles 532 10 Geodesic coordinates 537 11 Euler's equations 541 12 Rigid-body motion 544 13 Small oscillations 551 14 Small oscillations (continued) 553 15 Canonical transformations 558 Selected References . 569 Notation Index 572 Index 575
  • 13. CHAPTER 0 INTRODUCTION This preliminary chapter contains a short exposition of the set theory that forms the substratum of mathematical thinking today. It begins with a brief discussion of logic, so that set theory can be discussed with some precision, and continues with a review of the way in which mathematical objects can be defined as sets. The chapter ends with four sections which treat specific set-theoretic topics. It is intended that this material be used mainly for reference. Some of it will be familiar to the reader and some of it will probably be new. We suggest that he read the chapter through "lightly" at first, and then refer back to it for details as needed. 1. LOGIC: QUANTIFIERS A statement is a sentence which is true or false as it stands. Thus '1 < 2' and '4 +3 = 5' are, respectively, true and false mathematical statements. Many sentences occurring in mathematics contain variables and are therefore not true or false as they stand, but become statements when the variables are given values. Simple examples are 'x < 4', 'x < 1/', 'x is an integer', '3x2 + y2 = 10'. Such sentences will be called statementjrames. If P(x) is a frame containing the one variable 'x', then P(5) is the statement obtained by replacing 'x' in P(x) by the numeral '5'. For example, if P(x) is 'x < 4', then P(5) is '5 < 4', P(0) is '0 < 4', and so on. Another way to obtain a statement from the frame P(x) is to assert that P(x) is always true. We do this by prefixing the phrase 'for every x'. Thus, 'for every x, x < 4' is a false statement, and 'for every x, x2 - 1 = (x - 1)(x + 1)' is a true statement. This prefixing phrase is called a universal quantifier. Syn- onymous phrases are 'for each x' and 'for all x', and the symbol customarily used is '("Ix)', which can be read in any of these ways. One frequently presents sentences containing variables as being always true without explicitly writing the universal quantifiers. For instance, the associative law for the addition of numbers is often written x + (y +z) = (x + y) +z, where it is understood that the equation is true for all x, y and z. Thus the 1
  • 14. 2 INTRODUCTION 0.1 actual statement being made is (Vx) (Vy) (Vz) [x + (y + z) = (x + y) + z]. Finally, we can convert the frame P(x) into a statement by asserting that it is sometimes true, which we do by writing 'there exists an x such that P(x)'. This process is called existential quantification. Synonymous prefixing phrases here are 'there is an x such that', 'for some x', and, symbolically, '(::jx)'. The statement '(Vx)(x < 4)' still contains the variable 'x', of course, but 'x' is no longer free to be given values, and is now called a bound variable. Roughly speaking, quantified variables are bound and unquantified variables are free. The notation 'P(x), is used only when 'x' is free in the sentence being discussed. Now suppose that we have a sentence P(x, y) containing two free variables. Clearly, we need two quantifiers to obtain a statement from this sentence. This brings us to a very important observation. If quantifiers of both types are used, then the order in which they are written affects the meaning of the statement; (::jy)(Vx)P(x, y) and (Vx)(::jy)P(x, y) say different things. The first says that one y can be found that works for all x: "there exists a y such that for all x ... ". The second says that for each x a y can be found that works: "for each x there exists a y such that ... ". ~ut in the second case, it may very well happen that when x is changed, the y that can be found will also have to be changed. The existence of a single y that serves for all x is thus the stronger statement. For example, it is true that (Vx)(::jy)(x < y) and false that (::jy)(Vx)(x < y). The reader must be absolutely clear on this point; his whole mathematical future is at stake. The second statement says that there exists a y, call it Yo, such that (Vx)(x < Yo), that is, such that every number is less than Yo. This is false; Yo + 1, in particular, is not less than Yo. The first statement says that for each x we can find a corresponding y. And we can: take y = x + 1. On the other hand, among a group of quantifiers of the same type the order does not affect the meaning. Thus '(Vx) (Vy)' and '(Vy) (Vx) , have the same mean- ing. We often abbreviate such clumps of similar quantifiers by using the quan- tification symbol only once, as in '(Vx, y)', which can be read 'for every x and y'. Thus the strictly correct '(Vx) (Vy) (Vz) [x + (y + z) = (x + y) + zl' receives the slightly more idiomatic rendition '(Vx, y, z)[x + (y + z) = (x + y) + zl'. The situation is clearly the same for a group of existential quantifiers. The beginning student generally feels that the prefixing phrases 'for every x there exists a y such that' and 'there exists a y such that for every x' sound artificial and are unidiomatic. This is indeed the case, but this awkwardness is the price that has to be paid for the order of the quantifiers to be fixed, so that the meaning of the quantified statement is clear and unambiguous. Quantifiers do occur in ordinary idiomatic discourse, but their idiomatic occurrences often house ambiguity. The following two sentences are good examples of such ambiguous idiomatic usage: "Every x is less than some y" and "Some y is greater than every x". If a poll were taken, it would be found that most men on the
  • 15. 0.2 THE LOGICAL CONNECTIVES 3 street feel that these two sentences say the same thing, but half will feel that the common assertion is false and half will think it true! The trouble here is that the matrix is preceded by one quantifier and followed by another, and the poor reader doesn't know which to take as the inside, or first applied, quantifier. The two possible symbolic renditions of our first sentence, '[(Vx)(x < y)](3y)' and '(Vx)[(x < y)(3y)1', are respectively false and true. Mathematicians do use hanging quantifiers in the interests of more idiomatic writing, but only if they are sure the reader will understand their order of application, either from the context or by comparison with standard usage. In general, a hanging quantifier would probably be read as the inside, or first applied, quantifier, and with this understanding our two ambiguous sentences become true and false in that order. After this apology the reader should be able to tolerate t.he definit.ion of sequential convergence. It involves three quantifiers and runs as follows: The sequence {xn} converges to x if (Ve) (3N) (Vn) (if n > N then IXn - xl < e). In exactly the same format, we define a function f to be continuous at a if (Ve) (3 0)(Vx) (if Ix - al < 0 then If(x) - f(a) I < e). We often omit an inside universal quantifier by displaying the final frame, so that the universal quanti- fication is understood. Thus we define f to be continuous at a if for every e there is a 0 such that if Ix - al < 0, then If(x) - f(a) I < E. We shall study these definitions later. We remark only that it is perfectly possible to build up an intuitive understanding of what these and similar quantified statements actually say. 2. TIlE LOGICAL CONNECTIVES When the word 'and' is inserted between two sentences, the resulting sentence is true if both constituent sentences are true and is false otherwise. That is, the "truth value", T or F, of the compound sentence depends only on the truth values of the constituent sentences. We can thus describe the way 'and' acts in compounding sentences in the simple "truth table" P Q P and Q T T T T F F F T F F F F where 'P' and 'Q' stand for arbitrary statement frames. Words like 'and' are called logical connectives. It is often convenient to use symbols for connectives, and a standard symbol for 'and' is the ampersand '&'. Thus 'P & Q' is read 'P andQ'.
  • 16. 4 INTRODUCTION 0.2 Another logical connective is the word 'or'. Unfortunately, this word is used ambiguously in ordinary discourse. Sometimes it is used in the exclusive sense, where 'P or Q' means that one of P and Q is true, but not both, and sometimes it is used in the inclusive sense that at least one is true, and possibly both are true. Mathematics cannot tolerate any fundamental ambiguity, and in mathe- matics 'or' is always used in the latter way. We thus have the truth table P Q P orQ T T T T F T F T T F F Ii' The above two connectives are binary, in the sense that they combine two sentences to form one new sentence. The word 'not' applies to one sentence and really shouldn't be considered a connective at all; nevertheless, it is called a unary connective. A standard symbol for 'not' is '~'. Its truth table is obviously P ~P T F F T In idiomatic usage the word 'not' is generally buried in the interior of a sentence. We write' x is not equal to y' rather than' not (x is equal to y)'. However, for the purpose of logical manipulation, the negation sign (the word 'not' or a symbol like '~') precedes the sentence being negated. We shall, of course, continue to write 'x ~ y', but keep in mind that this is idiomatic for 'not (x = y)' or '~(x = y)'. We come now to the troublesome 'if ... ,then ...' connective, which we write as either 'if P, then Q' or 'P ==} Q'. This is almost always applied in the universally quantified context (Vx) (P(x) ==} Q(x»), and its meaning is best unraveled by a study of this usage. We consider 'if x < 3, then x < 5' to be a true sentence. More exactly, it is true for all x, so that the universal quantifi- cation (Vx)(x < 3 ==} x < 5) is a true statement. This conclusion forces us to agree that, in particular, '2 < 3 ==} 2 < 5', '4 < 3 ==} 4 < 5', and '6 < 3 ==} 6 < 5' are all true statements. The truth table for '==}' thus contains the values entered below. P Q P==}Q T T T T F F T T F F T
  • 17. 0.2 THE LOGICAL CONNECTIVES 5 On the other hand, we consider 'x < 7 ==} x < 5' to be a false sentence, and therefore have to agree that '6 < 7 ==} 6 < 5' is false. Thus the remaining row in the table above gives the value 'F' for P ==} Q. Combinations of frame variables and logical connectives such as we have been considering are called truth-functional forms. We can further combine the elementary forms such as 'P ==} Q' and '",P' by connectives to construct com- posite forms such as '",(P ==} Q)' and '(P ==} Q) & (Q ==} P)'. A sentence has a given (truth-functional) form if it can be obtained from that form by substitution. Thus 'x < y or ",(x < V)' has the form 'P or ",P', since it is obtained from this form by substituting the sentence 'x < y' for the sentence variable 'P'. Com- posite truth-functional forms have truth tables that can be worked out by combining the elementary tables. For example, '",(P ==} Q)' has the table below, the truth value for the whole form being in the column under the connective which is applied last ('",' in this example). P Q ",(P ==} Q) T T F T T F T F F T F T F F F T Thus", (P ==} Q) is true only when P is true and Q is false. A truth-functional form such as 'P or (",P), which is always true (i.e., has only 'T' in the final column of its truth table) is called a tautology or a tautologous form. The reader can check that and ((P ==} Q) & (Q ==} R)) ==} (P ==} R) are also tautologous. Indeed, any valid principle of reasoning that does not involve quantifiers must be expressed by a tautologous form. The 'if and only if' form 'P <=? Q', or 'P if and only if Q', or 'P iff Q', is an abbreviation for '(P ==} Q) & (Q ==} P)'. Its truth table works out to be P Q P<=?Q T T T T F F F T F F F T That is, P <=? Q is true if P and Q have the same truth values, and is false otherwise. Two truth-functional forms A and B are said to be equivalent if (the final columns of) their truth tables are the same, and, in view of the table for '<=?', we see that A and B are equivalent if A <=? B is tautologous, and conversely.
  • 18. 6 INTRODUCTION 0.4 Replacing a sentence obtained by substitution in a form A by the equivalent sentence obtained by the same substitutions in an equivalent form B is a device much used in logical reasoning. Thus to prove a statement P true, it suffices to prove the statement ",P false, since 'P' and '",(",P), are equivalent forms. Other important equivalences are ",(P or Q) ~ (",P) & (",Q), (P => Q) ~ Q or (",P), ",(P => Q) ~ P & (",Q). A bit of conventional sloppiness which we shall indulge in for smoother idiom is the use of 'if' instead of the correct 'if and only if' in definitions. We definefto be continuous at x if so-and-so, meaning, of course, thatfis continuous at x if and only if so-and-so. This causes no difficulty, since it is clear that 'if and only if' is meant when a definition is being given. 3. NEGATIONS OF QUANTIFIERS The combinations '",(V'x)' and '(3x)",' have the same meanings: something is not always true if and only if it is sometimes false. Similarly, '",(3y)' and '(V'y)",' have the same meanings. These equivalences can be applied to move a negation sign past each quantifier in a string of quantifiers, giving the following important practical rule: In taking the negation of a statement beginning with a string of quantifiers, we simply change each quantifier to the opposite kind and move the negation sign to the end of the string. Thus ",(V'x)(3y) (V'z)P(x, y, z) ~ (3x)(V'y)(3z)",P(x, y, z). There are other principles of quantificational reasoning that can be isolated and which we shall occasionally mention, but none seem worth formalizing here. 4. SETS It is present-day practice to define every mathematical object as a set of some kind or other, and we must examine this fundamental notion, however briefly. A set is a collection of objects that is itself considered an entity. The objects in the collection are called the elements or members of the set. The symbol for 'is a member of' is 'E' (a sort of capital epsilon), so that 'x E A' is read "x is a member of A", "x is an element of A", "x belongs to A", or "x is in A". We use the equals sign '=' in mathematics to mean logical identity; A = B means that A is B. Now a set A is considered to be the same object as a set B if and only if A and B have exactly the same members. That is, 'A = B' means that (V'x)(x E A ~ x E B).
  • 19. 0.4 SETS 7 We say that a set A is a subset of a set B, or that A is included in B (or that B is a superset of A) if every element of A is an element of B. The symbol for inclusion is Ie'. Thus 'A e B' means that (Yx)(x E A =} x E B). Clearly, (A = B) {=} (A e B) and (B e A). This is a frequently used way of establishing set identity: we prove that A = B by proving that A e B and that B e A. If the reader thinks about the above equivalence, he will see that it depends first on the equivalence of the truth-func- tional forms 'P {=} Q' and '(P =} Q) & (Q =} P)', and then on the obvious quantificational equivalence between '(Yx)(R & S)' and '(Yx)R & (Yx)S'. We define a set by specifying its members. If the set is finite, the members can actually be listed, and the notation used is braces surrounding a member- ship list. For example {I, 4, 7} is the set containing the three numbers 1, 4, 7, {x} is the unit set of x (the set having only the one object x as a member), and {x, y} is the pair set of x and y. We can abuse this notation to name some infinite sets. Thus {2, 4, 6, 8, ...} would certainly be considered the set of all even positive integers. But infinite sets are generally defined by statement frames. If P(x) is a frame containing the free variable 'x', then {x : P(x)} is the set of all x such that P(x) is true. In other words, {x : P(x)} is that set A such that yEA {=} P(y). For example, {x: x2 < 9} is the set of all real numbers x such that x 2 < 9, that is, the open interval (-3, 3), and y E {x : x2 < 9} {=} y2 < 9. A statement frame P(x) can be thought of as stating a property that an object x mayor may not have, and {x : P(x)} is the set of all objects having that property. We need the empty set 0, in much the same way that we need zero in arithmetic. If P(x) is never true, then {x: P(x)} = 0. For example, {x:x ~ x} = 0. When we said earlier that all mathematical objects are customarily con- sidered sets, it was taken for granted that the reader understands the distinction between an object and a name of that object. To be on the safe side, we add a few words. A chair is not the same thing as the word 'chair', and the number 4 is a mathematical object that is not the same thing as the numeral '4'. The numeral '4' is a name of the number 4, as also are 'four', '2 +2', and 'IV'. According to our present viewpoint, 4 itself is taken to be some specific set. There is no need in this course to carry logical analysis this far, but some readers may be interested to know that we usually define 4 as {O, 1, 2, 3}. Similarly, 2 = {O, I}, 1 = {O}, and 0 is the empty set 0. It should be clear from the above discussion and our exposition thus far that we are using a symbol surrounded by single quotation marks as a name of that symbol (the symbol itself being a name of something else). Thus' '4' , is a name of '4' (which is itself a name of the number 4). This is strictly correct
usage, but mathematicians almost universally mishandle it. It is accurate to write: let x be the number; call this number 'x'. However, the latter is almost always written: call this number x. This imprecision causes no difficulty to the reading mathematician, and it often saves the printed page from a shower of quotation marks. There is, however, a potential victim of such ambiguous treatment of symbols. This is the person who has never realized that mathematics is not about symbols but about objects to which the symbols refer. Since by now the present reader has safely avoided this pitfall, we can relax and occasionally omit the strictly necessary quotation marks.

In order to avoid overworking the word 'set', we use many synonyms, such as 'class', 'collection', 'family' and 'aggregate'. Thus we might say, "Let 𝒜 be a family of classes of sets". If a shoe store is a collection of pairs of shoes, then a chain of shoe stores is such a three-level object.

5. RESTRICTED VARIABLES

A variable used in mathematics is not allowed to take all objects as values; it can only take as values the members of a certain set, called the domain of the variable. The domain is sometimes explicitly indicated, but is often only implied. For example, the letter 'n' is customarily used to specify an integer, so that '(∀n)P(n)' would automatically be read "for every integer n, P(n)". However, sometimes n is taken to be a positive integer. In case of possible ambiguity or doubt, we would indicate the restriction explicitly and write '(∀n ∈ ℤ)P(n)', where 'ℤ' is the standard symbol for the set of all integers. The quantifier is read, literally, "for all n in ℤ", and more freely, "for every integer n". Similarly, '(∃n ∈ ℤ)P(n)' is read "there exists an n in ℤ such that P(n)" or "there exists an integer n such that P(n)". Note that the symbol '∈' is here read as the preposition 'in'. The above quantifiers are called restricted quantifiers. In the same way, we have restricted set formation, both implicit and explicit, as in '{n : P(n)}' and '{n ∈ ℤ : P(n)}', both of which are read "the set of all integers n such that P(n)".

Restricted variables can be defined as abbreviations of unrestricted variables by

    (∀x ∈ A)P(x) ⇔ (∀x)(x ∈ A ⇒ P(x)),
    (∃x ∈ A)P(x) ⇔ (∃x)(x ∈ A & P(x)),
    {x ∈ A : P(x)} = {x : x ∈ A & P(x)}.

Although there is never any ambiguity in sentences containing explicitly restricted variables, it sometimes helps the eye to see the structure of the sentence if the restricting phrases are written in superscript position, as in (∀ε>0)(∃n∈ℤ). Some restriction was implicit on page 1. If the reader agreed that (∀x)(x² − 1 = (x − 1)(x + 1)) was true, he probably took x to be a real number.
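Over a finite universe the displayed abbreviations can be verified directly. A minimal sketch (Python; the universe, the set A, and the frame P are all hypothetical, chosen only for illustration):

    universe = range(-10, 11)
    A = {n for n in universe if n % 2 == 0}      # a restricting set

    def P(n):
        return n >= 0                            # a hypothetical frame

    # (for all x in A)P(x)  <=>  (for all x)(x in A => P(x))
    assert all(P(x) for x in A) == all((x not in A) or P(x) for x in universe)
    # (there exists x in A)P(x)  <=>  (there exists x)(x in A & P(x))
    assert any(P(x) for x in A) == any((x in A) and P(x) for x in universe)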
6. ORDERED PAIRS AND RELATIONS

Ordered pairs are basic tools, as the reader knows from analytic geometry. According to our general principle, the ordered pair ⟨a, b⟩ is taken to be a certain set, but here again we don't care which particular set it is so long as it guarantees the crucial characterizing property:

    ⟨x, y⟩ = ⟨a, b⟩ ⇔ x = a and y = b.

Thus ⟨1, 3⟩ ≠ ⟨3, 1⟩.

The notion of a correspondence, or relation, and the special case of a mapping, or function, is fundamental to mathematics. A correspondence is a pairing of objects such that given any two objects x and y, the pair ⟨x, y⟩ either does or does not correspond. A particular correspondence (relation) is generally presented by a statement frame P(x, y) having two free variables, with x and y corresponding if and only if P(x, y) is true.

Given any relation (correspondence), the set of all ordered pairs ⟨x, y⟩ of corresponding elements is called its graph. Now a relation is a mathematical object, and, as we have said several times, it is current practice to regard every mathematical object as a set of some sort or other. Since the graph of a relation is a set (of ordered pairs), it is efficient and customary to take the graph to be the relation. Thus a relation (correspondence) is simply a set of ordered pairs.

If R is a relation, then we say that x has the relation R to y, and we write 'xRy', if and only if ⟨x, y⟩ ∈ R. We also say that x corresponds to y under R. The set of all first elements occurring in the ordered pairs of a relation R is called the domain of R and is designated dom R or 𝔇(R). Thus

    dom R = {x : (∃y)⟨x, y⟩ ∈ R}.

The set of second elements is called the range of R:

    range R = {y : (∃x)⟨x, y⟩ ∈ R}.

The inverse, R⁻¹, of a relation R is the set of ordered pairs obtained by reversing those of R:

    R⁻¹ = {⟨x, y⟩ : ⟨y, x⟩ ∈ R}.

A statement frame P(x, y) having two free variables actually determines a pair of mutually inverse relations R and S, called the graphs of P, as follows:

    R = {⟨x, y⟩ : P(x, y)},    S = {⟨y, x⟩ : P(x, y)}.

A two-variable frame together with a choice of which variable is considered to be first might be called a directed frame. Then a directed frame would have a uniquely determined relation for its graph. The relation of strict inequality on the real number system ℝ would be considered the set {⟨x, y⟩ : x < y}, since the variables in 'x < y' have a natural order.

The set A × B = {⟨x, y⟩ : x ∈ A & y ∈ B} of all ordered pairs with first element in A and second element in B is called the Cartesian product of the
sets A and B. A relation R is always a subset of dom R × range R. If the two "factor spaces" are the same, we can use exponential notation: A² = A × A. The Cartesian product ℝ² = ℝ × ℝ is the "analytic plane". Analytic geometry rests upon the one-to-one coordinate correspondence between ℝ² and the Euclidean plane E² (determined by an axis system in the latter), which enables us to treat geometric questions algebraically and algebraic questions geometrically. In particular, since a relation between sets of real numbers is a subset of ℝ², we can "picture" it by the corresponding subset of the Euclidean plane, or of any model of the Euclidean plane, such as this page. A simple Cartesian product is shown in Fig. 0.1 (A ∪ B is the union of the sets A and B).

    [Fig. 0.1: the product A × B when A = [1, 2] ∪ [2½, 3] and B = [1, 1½] ∪ {2}.]
    [Fig. 0.2: a relation R, a set A, and the image R[A].]

If R is a relation and A is any set, then the restriction of R to A, R ↾ A, is the subset of R consisting of those pairs with first element in A:

    R ↾ A = {⟨x, y⟩ : ⟨x, y⟩ ∈ R and x ∈ A}.

Thus R ↾ A = R ∩ (A × range R), where C ∩ D is the intersection of the sets C and D.

If R is a relation and A is any set, then the image of A under R, R[A], is the set of second elements of ordered pairs in R whose first elements are in A:

    R[A] = {y : (∃x)(x ∈ A & ⟨x, y⟩ ∈ R)}.

Thus R[A] = range (R ↾ A), as shown in Fig. 0.2.

7. FUNCTIONS AND MAPPINGS

A function is a relation f such that each domain element x is paired with exactly one range element y. This property can be expressed as follows:

    ⟨x, y⟩ ∈ f and ⟨x, z⟩ ∈ f ⇒ y = z.
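Since a relation is simply a set of ordered pairs, its domain, range, inverse, restriction, and images, and the defining property of a function, are all directly computable for finite examples. A sketch (Python; the relation R and the set A are hypothetical):

    R = {(1, 'a'), (1, 'b'), (2, 'b'), (3, 'c')}      # a finite relation

    dom   = {x for (x, y) in R}                       # domain of R
    rng   = {y for (x, y) in R}                       # range of R
    R_inv = {(y, x) for (x, y) in R}                  # inverse relation
    A = {1, 2}
    restr = {(x, y) for (x, y) in R if x in A}        # restriction of R to A
    image = {y for (x, y) in R if x in A}             # image R[A]

    def is_function(rel):
        # <x, y> in rel and <x, z> in rel  =>  y = z
        return all(y == z for (x, y) in rel for (u, z) in rel if x == u)

    assert not is_function(R)     # 1 is paired with both 'a' and 'b'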
The y which is thus uniquely determined by f and x is designated f(x):

    y = f(x) ⇔ ⟨x, y⟩ ∈ f.

One tends to think of a function as being active and a relation which is not a function as being passive. A function f acts on an element x in its domain to give f(x). We take x and apply f to it; indeed we often call a function an operator. On the other hand, if R is a relation but not a function, then there is in general no particular y related to an element x in its domain, and the pairing of x and y is viewed more passively.

We often define a function f by specifying its value f(x) for each x in its domain, and in this connection a stopped arrow notation is used to indicate the pairing. Thus x ↦ x² is the function assigning to each number x its square x².

    [Fig. 0.3]

If we want it to be understood that f is this function, we can write "Consider the function f: x ↦ x²". The domain of f must be understood for this notation to be meaningful.

If f is a function, then f⁻¹ is of course a relation, but in general it is not a function. For example, if f is the function x ↦ x², then f⁻¹ contains the pairs ⟨4, 2⟩ and ⟨4, −2⟩ and so is not a function (see Fig. 0.3). If f⁻¹ is a function, we say that f is one-to-one and that f is a one-to-one correspondence between its domain and its range. Each x ∈ dom f corresponds to only one y ∈ range f (f is a function), and each y ∈ range f corresponds to only one x ∈ dom f (f⁻¹ is a function).

The notation

    f: A → B

is read "a (the) function f on A into B" or "the function f from A to B". The notation implies that f is a function, that dom f = A, and that range f ⊂ B. Many people feel that the very notion of function should include all these ingredients; that is, a function should be considered an ordered triple ⟨f, A, B⟩, where f is a function according to our more limited definition, A is the domain
of f, and B is a superset of the range of f, which we shall call the codomain of f in this context. We shall use the terms 'map', 'mapping', and 'transformation' for such a triple, so that the notation f: A → B in its totality presents a mapping. Moreover, when there is no question about which set is the codomain, we shall often call the function f itself a mapping, since the triple ⟨f, A, B⟩ is then determined by f. The two arrow notations can be combined, as in: "Define f: ℝ → ℝ by x ↦ x²".

A mapping f: A → B is said to be injective if f is one-to-one, surjective if range f = B, and bijective if it is both injective and surjective. A bijective mapping f: A → B is thus a one-to-one correspondence between its domain A and its codomain B. Of course, a function is always surjective onto its range R, and the statement that f is surjective means that R = B, where B is the understood codomain.

8. PRODUCT SETS; INDEX NOTATION

One of the characteristic habits of the modern mathematician is that as soon as a new kind of object has been defined and discussed a little, he immediately looks at the set of all such objects. With the notion of a function from A to S well in hand, we naturally consider the set of all functions from A to S, which we designate S^A. Thus ℝ^ℝ is the set of all real-valued functions of one real variable, and S^ℤ⁺ is the set of all infinite sequences in S. (It is understood that an infinite sequence is nothing but a function whose domain is the set ℤ⁺ of all positive integers.) Similarly, if we set n̄ = {1, ..., n}, then S^n̄ is the set of all finite sequences of length n in S.

If B is a subset of S, then its characteristic function (relative to S) is the function on S, usually designated χ_B, which has the constant value 1 on B and the constant value 0 off B. The set of all characteristic functions of subsets of S is thus 2^S (since 2 = {0, 1}). But because this collection of functions is in a natural one-to-one correspondence with the collection of all subsets of S, χ_B corresponding to B, we tend to identify the two collections. Thus 2^S is also interpreted as the set of all subsets of S. We shall spend most of the remainder of this section discussing further similar definitional ambiguities which mathematicians tolerate.

The ordered triple ⟨x, y, z⟩ is usually defined to be the ordered pair ⟨⟨x, y⟩, z⟩. The reason for this definition is probably that a function of two variables x and y is ordinarily considered a function of the single ordered pair variable ⟨x, y⟩, so that, for example, a real-valued function of two real variables is a subset of (ℝ × ℝ) × ℝ. But we also consider such a function a subset of Cartesian 3-space ℝ³. Therefore, we define ℝ³ as (ℝ × ℝ) × ℝ; that is, we define the ordered triple ⟨x, y, z⟩ as ⟨⟨x, y⟩, z⟩.

On the other hand, the ordered triple ⟨x, y, z⟩ could also be regarded as the finite sequence {⟨1, x⟩, ⟨2, y⟩, ⟨3, z⟩}, which, of course, is a different object. These two models for an ordered triple serve equally well, and, again,
mathematicians tend to slur over the distinction. We shall have more to say on this point later when we discuss natural isomorphisms (Section 1.6). For the moment we shall simply regard ℝ³ and ℝ^3̄ as being the same; an ordered triple is something which can be "viewed" as being either an ordered pair of which the first element is an ordered pair or as a sequence of length 3 (or, for that matter, as an ordered pair of which the second element is an ordered pair). Similarly, we pretend that Cartesian 4-space ℝ⁴ is ℝ^4̄, ℝ² × ℝ², or ℝ¹ × ℝ³ = ℝ × ((ℝ × ℝ) × ℝ), etc. Clearly, we are in effect assuming an associative law for ordered pair formation that we don't really have. This kind of ambiguity, where we tend to identify two objects that really are distinct, is a necessary corollary of deciding exactly what things are. It is one of the prices we pay for the precision of set theory; in days when mathematics was vaguer, there would have been a single fuzzy notion.

The device of indices, which is used frequently in mathematics, also has ambiguous implications which we should examine. An indexed collection, as a set, is nothing but the range set of a function, the indexing function, and a particular indexed object, say xᵢ, is simply the value of that function at the domain element i. If the set of indices is I, the indexed set is designated {xᵢ : i ∈ I} or {xᵢ}ᵢ∈I (or {xᵢ}₁^∞ in case I = ℤ⁺). However, this notation suggests that we view the indexed set as being obtained by letting the index run through the index set I and collecting the indexed objects. That is, an indexed set is viewed as being the set together with the indexing function. This ambivalence is reflected in the fact that the same notation frequently designates the mapping. Thus we refer to the sequence {xₙ}₁^∞, where, of course, the sequence is the mapping n ↦ xₙ. We believe that if the reader examines his idea of a sequence he will find this ambiguity present. He means neither just the set nor just the mapping, but the mapping with emphasis on its range, or the range "together with" the mapping. But since set theory cannot reflect these nuances in any simple and graceful way, we shall take an indexed set to be the indexing function. Of course, the same range object may be repeated with different indices; there is no implication that an indexing is one-to-one. Note also that indexing imposes no restriction on the set being indexed; any set can at least be self-indexed (by the identity function).

Except for the ambiguous '{xᵢ : i ∈ I}', there is no universally used notation for the indexing function. Since xᵢ is the value of the function at i, we might think of 'xᵢ' as another way of writing 'x(i)', in which case we designate the function 'x' or 'x(·)'. We certainly do this in the case of ordered n-tuplets when we say, "Consider the n-tuplet x = ⟨x₁, ..., xₙ⟩". On the other hand, there is no compelling reason to use this notation. We can call the indexing function anything we want; if it is f, then of course f(i) = xᵢ for all i.

We come now to the general definition of Cartesian product. Earlier we argued (in a special case) that the Cartesian product A × B × C is the set of all ordered triples x = ⟨x₁, x₂, x₃⟩ such that x₁ ∈ A, x₂ ∈ B, and x₃ ∈ C. More generally, A₁ × A₂ × ⋯ × Aₙ, or ∏ᵢ₌₁ⁿ Aᵢ, is the set of ordered n-tuples x = ⟨x₁, ..., xₙ⟩ such that xᵢ ∈ Aᵢ for i = 1, ..., n. If we interpret
an ordered n-tuplet as a function on n̄ = {1, ..., n}, we have: ∏ᵢ₌₁ⁿ Aᵢ is the set of all functions x with domain n̄ such that xᵢ ∈ Aᵢ for all i ∈ n̄. This rephrasal generalizes almost verbatim to give us the notion of the Cartesian product of an arbitrary indexed collection of sets.

Definition. The Cartesian product ∏ᵢ∈I Sᵢ of the indexed collection of sets {Sᵢ : i ∈ I} is the set of all functions f with domain the index set I such that f(i) ∈ Sᵢ for all i ∈ I. We can also use the notation ∏{Sᵢ : i ∈ I} for the product and fᵢ for the value f(i).

9. COMPOSITION

If we are given maps f: A → B and g: B → C, then the composition of g with f, g ∘ f, is the map of A into C defined by

    (g ∘ f)(x) = g(f(x)) for all x ∈ A.

This is the function of a function operation of elementary calculus. If f and g are the maps from ℝ to ℝ defined by f(x) = x^(1/3) + 1 and g(x) = x², then f ∘ g(x) = (x²)^(1/3) + 1 = x^(2/3) + 1, and g ∘ f(x) = (x^(1/3) + 1)² = x^(2/3) + 2x^(1/3) + 1. Note that the codomain of f must be the domain of g in order for g ∘ f to be defined. This operation is perhaps the basic binary operation of mathematics.

Lemma. Composition satisfies the associative law:

    f ∘ (g ∘ h) = (f ∘ g) ∘ h.

Proof. (f ∘ (g ∘ h))(x) = f((g ∘ h)(x)) = f(g(h(x))) = (f ∘ g)(h(x)) = ((f ∘ g) ∘ h)(x) for all x ∈ dom h. ∎

If A is a set, the identity map I_A: A → A is the mapping taking every x ∈ A to itself. Thus I_A = {⟨x, x⟩ : x ∈ A}. If f maps A into B, then clearly

    f ∘ I_A = f = I_B ∘ f.

If g: B → A is such that g ∘ f = I_A, then we say that g is a left inverse of f and that f is a right inverse of g.

Lemma. If the mapping f: A → B has both a right inverse h and a left inverse g, they must necessarily be equal.

Proof. This is just algebraic juggling and works for any associative operation. We have

    h = I_A ∘ h = (g ∘ f) ∘ h = g ∘ (f ∘ h) = g ∘ I_B = g. ∎
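For numerical maps these definitions compute directly. The following sketch (Python) rechecks the associative law pointwise for the example f(x) = x^(1/3) + 1 and g(x) = x² given above, together with a third map h chosen only for illustration:

    def compose(g, f):
        return lambda x: g(f(x))                  # (g o f)(x) = g(f(x))

    f = lambda x: x ** (1 / 3) + 1
    g = lambda x: x ** 2
    h = lambda x: x + 1                           # hypothetical third map

    fg = compose(f, g)        # f o g : x |-> x**(2/3) + 1
    gf = compose(g, f)        # g o f : x |-> x**(2/3) + 2*x**(1/3) + 1

    # Associativity, checked pointwise at a few nonnegative test points.
    for x in [0.0, 1.0, 2.0, 5.0]:
        assert compose(f, compose(g, h))(x) == compose(compose(f, g), h)(x)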
When f: A → B has both a left and a right inverse, we call the uniquely determined map g: B → A such that f ∘ g = I_B and g ∘ f = I_A the inverse of f. We then have:

Theorem. A mapping f: A → B has an inverse if and only if it is bijective, in which case its inverse is its relational inverse f⁻¹.

Proof. If f is bijective, then the relational inverse f⁻¹ is a function from B to A, and the equations f ∘ f⁻¹ = I_B and f⁻¹ ∘ f = I_A are obvious. On the other hand, if f ∘ g = I_B, then f is surjective, since then every y in B can be written y = f(g(y)). And if g ∘ f = I_A, then f is injective, for then the equation f(x) = f(y) implies that x = g(f(x)) = g(f(y)) = y. Thus f is bijective if it has an inverse. ∎

Now let 𝔅(A) be the set of all bijections f: A → A. Then 𝔅(A) is closed under the binary operation of composition and

1) f ∘ (g ∘ h) = (f ∘ g) ∘ h for all f, g, h ∈ 𝔅;
2) there exists a unique I ∈ 𝔅(A) such that f ∘ I = I ∘ f = f for all f ∈ 𝔅;
3) for each f ∈ 𝔅 there exists a unique g ∈ 𝔅 such that f ∘ g = g ∘ f = I.

Any set G closed under a binary operation having these properties is called a group with respect to that operation. Thus 𝔅(A) is a group with respect to composition.

Composition can also be defined for relations as follows. If R ⊂ A × B and S ⊂ B × C, then S ∘ R ⊂ A × C is defined by

    ⟨x, z⟩ ∈ S ∘ R ⇔ (∃y ∈ B)(⟨x, y⟩ ∈ R & ⟨y, z⟩ ∈ S).

If R and S are mappings, this definition agrees with our earlier one.

10. DUALITY

There is another elementary but important phenomenon called duality which occurs in practically all branches of mathematics. Let F: A × B → C be any function of two variables. It is obvious that if x is held fixed, then F(x, y) is a function of the one variable y. That is, for each fixed x there is a function hˣ: B → C defined by hˣ(y) = F(x, y). Then x ↦ hˣ is a mapping φ of A into C^B. Similarly, each y ∈ B yields a function g_y ∈ C^A, where g_y(x) = F(x, y), and y ↦ g_y is a mapping θ from B to C^A.

Now suppose conversely that we are given a mapping φ: A → C^B. For each x ∈ A we designate the corresponding value of φ in index notation as hˣ, so that hˣ is a function from B to C, and we define F: A × B → C by F(x, y) = hˣ(y). We are now back where we started. Thus the mappings φ: A → C^B, F: A × B → C, and θ: B → C^A are equivalent, and can be thought of as three different ways of viewing the same phenomenon. The extreme mappings φ and θ will be said to be dual to each other.
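In programming this equivalence is the familiar passage between a function of two variables and its "curried" form, which makes the correspondence between F and φ concrete. A sketch (Python; the particular function F is hypothetical):

    def curry(F):
        return lambda x: (lambda y: F(x, y))     # phi(x) = h^x, h^x(y) = F(x, y)

    def uncurry(phi):
        return lambda x, y: phi(x)(y)            # recover F from phi

    F = lambda x, y: 10 * x + y                  # a hypothetical F: A x B -> C
    phi = curry(F)
    assert phi(3)(4) == F(3, 4) == uncurry(phi)(3, 4)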
The mapping φ is the indexed family of functions {hˣ : x ∈ A} ⊂ C^B. Now suppose that ℱ ⊂ C^B is an unindexed collection of functions on B into C, and define F: ℱ × B → C by F(f, y) = f(y). Then θ: B → C^ℱ is defined by g_y(f) = f(y). What is happening here is simply that in the expression f(y) we regard both symbols as variables, so that f(y) is a function on ℱ × B. Then when we hold y fixed, we have a function on ℱ mapping ℱ into C. We shall see some important applications of this duality principle as our subject develops.

For example, an m × n matrix is a function t = {tᵢⱼ} in ℝ^(m̄×n̄). We picture the matrix as a rectangular array of numbers, where 'i' is the row index and 'j' is the column index, so that tᵢⱼ is the number at the intersection of the ith row and the jth column. If we hold i fixed, we get the n-tuple forming the ith row, and the matrix can therefore be interpreted as an m-tuple of row n-tuples. Similarly (dually), it can be viewed as an n-tuple of column m-tuples. In the same vein, an n-tuple ⟨f₁, ..., fₙ⟩ of functions from A to B can be regarded as a single n-tuple-valued function from A to Bⁿ.

In a somewhat different application, duality will allow us to regard a finite-dimensional vector space V as being its own second conjugate space (V*)*.

It is instructive to look at elementary Euclidean geometry from this point of view. Today we regard a straight line as being a set of geometric points. An older and more neutral view is to take points and lines as being two different kinds of primitive objects. Accordingly, let A be the set of all points (so that A is the Euclidean plane as we now view it), and let B be the set of all straight lines. Let F be the incidence function: F(p, l) = 1 if p and l are incident (p is "on" l, l is "on" p) and F(p, l) = 0 otherwise. Thus F maps A × B into {0, 1}. Then for each l ∈ B the function g_l(p) = F(p, l) is the characteristic function of the set of points that we think of as being the line l (g_l(p) has the value 1 if p is on l and 0 if p is not on l). Thus each line determines the set of points that are on it. But, dually, each point p determines the set of lines l "on" it, through its characteristic function hᵖ(l). Thus, in complete duality we can regard a line as being a set of points and a point as being a set of lines. This duality aspect of geometry is basic in projective geometry.

It is sometimes awkward to invent new notation for the "partial" function obtained by holding a variable fixed in a function of several variables, as we did above when we set g_y(x) = F(x, y), and there is another device that is frequently useful in this situation. This is to put a dot in the position of the "varying variable". Thus F(a, ·) is the function of one variable obtained from F(x, y) by holding x fixed at the value a, so that in our beginning discussion of duality we have hˣ = F(x, ·), g_y = F(·, y). If f is a function of one variable, we can then write f = f(·), and so express the
above equations also as hˣ(·) = F(x, ·), g_y(·) = F(·, y). The flaw in this notation is that we can't indicate substitution without losing meaning. Thus the value of the function F(x, ·) at b is F(x, b), but from this evaluation we cannot read backward and tell what function was evaluated. We are therefore forced to some such cumbersome notation as F(x, ·)|_b, which can get out of hand. Nevertheless, the dot device is often helpful when it can be used without evaluation difficulties. In addition to eliminating the need for temporary notation, as mentioned above, it can also be used, in situations where it is strictly speaking superfluous, to direct the eye at once to the position of the variable. For example, later on D_ξF will designate the directional derivative of the function F in the (fixed) direction ξ. This is a function whose value at a is D_ξF(a), and the notation D_ξF(·) makes this implicitly understood fact explicit.

11. THE BOOLEAN OPERATIONS

Let S be a fixed domain, and let ℱ be a family of subsets of S. The union of ℱ, or the union of all the sets in ℱ, is the set of all elements belonging to at least one set in ℱ. We designate the union ∪ℱ or ∪_{A∈ℱ} A, and thus we have

    ∪ℱ = {x : (∃A ∈ ℱ)(x ∈ A)},    y ∈ ∪ℱ ⇔ (∃A ∈ ℱ)(y ∈ A).

We often consider the family ℱ to be indexed. That is, we assume given a set I (the set of indices) and a surjective mapping i ↦ Aᵢ from I to ℱ, so that ℱ = {Aᵢ : i ∈ I}. Then the union of the indexed collection is designated ∪ᵢ∈I Aᵢ or ∪{Aᵢ : i ∈ I}. The device of indices has both technical and psychological advantages, and we shall generally use it.

If ℱ is finite, and either it or the index set is listed, then a different notation is used for its union. If ℱ = {A, B}, we designate the union A ∪ B, a notation that displays the listed names. Note that here we have

    x ∈ A ∪ B ⇔ x ∈ A or x ∈ B.

If ℱ = {Aᵢ : i = 1, ..., n}, we generally write 'A₁ ∪ A₂ ∪ ⋯ ∪ Aₙ' or '∪ᵢ₌₁ⁿ Aᵢ' for ∪ℱ.

The intersection of the indexed family {Aᵢ}ᵢ∈I, designated ∩ᵢ∈I Aᵢ, is the set of all points that lie in every Aᵢ. Thus

    x ∈ ∩ᵢ∈I Aᵢ ⇔ (∀i ∈ I)(x ∈ Aᵢ).

For an unindexed family ℱ we use the notation ∩ℱ or ∩_{A∈ℱ} A, and if ℱ = {A, B}, then ∩ℱ = A ∩ B.

The complement, A′, of a subset A of S is the set of elements of S not in A: A′ = {x ∈ S : x ∉ A}. The law of De Morgan states that the complement of an intersection is the union of the complements:

    (∩ᵢ∈I Aᵢ)′ = ∪ᵢ∈I (Aᵢ′).

This is an immediate consequence of the rule for negating quantifiers. It is the
equivalence between 'not always in' and 'sometimes not in':

    [¬(∀i)(x ∈ Aᵢ) ⇔ (∃i)(x ∉ Aᵢ)]

says exactly that

    x ∈ (∩ᵢ Aᵢ)′ ⇔ x ∈ ∪ᵢ (Aᵢ′).

If we set Bᵢ = Aᵢ′ and take complements again, we obtain the dual form: (∪ᵢ∈I Bᵢ)′ = ∩ᵢ∈I (Bᵢ′).

Other principles of quantification yield the law

    B ∩ (∪ᵢ∈I Aᵢ) = ∪ᵢ∈I (B ∩ Aᵢ)

from P & (∃x)Q(x) ⇔ (∃x)(P & Q(x)), and similarly

    B ∪ (∩ᵢ∈I Aᵢ) = ∩ᵢ∈I (B ∪ Aᵢ),
    B ∩ (∩ᵢ∈I Aᵢ) = ∩ᵢ∈I (B ∩ Aᵢ),
    B ∪ (∪ᵢ∈I Aᵢ) = ∪ᵢ∈I (B ∪ Aᵢ).

In the case of two sets, these laws imply the following familiar laws of set algebra:

    (A ∪ B)′ = A′ ∩ B′,    (A ∩ B)′ = A′ ∪ B′    (De Morgan),
    A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C),
    A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

Even here, thinking in terms of indices makes the laws more intuitive. Thus

    (A₁ ∩ A₂)′ = A₁′ ∪ A₂′

is obvious when thought of as the equivalence between 'not always in' and 'sometimes not in'.

The family ℱ is disjoint if distinct sets in ℱ have no elements in common, i.e., if (∀X, Y ∈ ℱ)(X ≠ Y ⇒ X ∩ Y = ∅). For an indexed family {Aᵢ}ᵢ∈I the condition becomes i ≠ j ⇒ Aᵢ ∩ Aⱼ = ∅. If ℱ = {A, B}, we simply say that A and B are disjoint.

Given f: U → V and an indexed family {Bᵢ} of subsets of V, we have the following important identities:

    f⁻¹[∪ᵢ Bᵢ] = ∪ᵢ f⁻¹[Bᵢ],    f⁻¹[∩ᵢ Bᵢ] = ∩ᵢ f⁻¹[Bᵢ],

and, for a single set B ⊂ V,

    f⁻¹[B′] = (f⁻¹[B])′.

For example,

    x ∈ f⁻¹[∩ᵢ Bᵢ] ⇔ f(x) ∈ ∩ᵢ Bᵢ ⇔ (∀i)(f(x) ∈ Bᵢ)
                   ⇔ (∀i)(x ∈ f⁻¹[Bᵢ]) ⇔ x ∈ ∩ᵢ f⁻¹[Bᵢ].
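All three identities can be confirmed mechanically for a finite function. A sketch (Python; the sets U, V, B1, B2 and the function f are hypothetical, chosen only for illustration):

    U = set(range(-4, 5))
    V = set(range(0, 17))

    def f(x):
        return x * x                              # f: U -> V

    def preimage(B):
        return {x for x in U if f(x) in B}        # the inverse image f^-1[B]

    B1, B2 = {0, 1, 4}, {4, 9, 16}
    assert preimage(B1 | B2) == preimage(B1) | preimage(B2)   # union
    assert preimage(B1 & B2) == preimage(B1) & preimage(B2)   # intersection
    assert preimage(V - B1) == U - preimage(B1)               # complement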
The first, but not the other two, of the three identities above remains valid when f is replaced by any relation R. It follows from the commutative law

    (∃x)(∃y)A ⇔ (∃y)(∃x)A.

The second identity fails for a general R because '(∃x)(∀y)' and '(∀y)(∃x)' have different meanings.

12. PARTITIONS AND EQUIVALENCE RELATIONS

A partition of a set A is a disjoint family ℱ of sets whose union is A. We call the elements of ℱ 'fibers', and we say that ℱ fibers A or is a fibering of A. For example, the set of straight lines parallel to a given line in the Euclidean plane is a fibering of the plane. If 'x̄' designates the unique fiber containing the point x, then x ↦ x̄ is a surjective mapping π: A → ℱ which we call the projection of A on ℱ. Passing from a set A to a fibering ℱ of A is one of the principal ways of forming new mathematical objects.

Any function f automatically fibers its domain into sets on which f is constant. If A is the Euclidean plane and f(p) is the x-coordinate of the point p in some coordinate system, then f is constant on each vertical line; more exactly, f⁻¹(x) is a vertical line for every x in ℝ. Moreover, x ↦ f⁻¹(x) is a bijection from ℝ to the set of all fibers (vertical lines). In general, if f: A → B is any surjective mapping, and if for each value y in B we set A_y = f⁻¹(y) = {x ∈ A : f(x) = y}, then ℱ = {A_y : y ∈ B} is a fibering of A and φ: y ↦ A_y is a bijection from B to ℱ. Also φ ∘ f is the projection π: A → ℱ, since φ ∘ f(x) = φ(f(x)) is the set x̄ of all z in A such that f(z) = f(x).

The above process of generating a fibering of A from a function on A is relatively trivial. A more important way of obtaining a fibering of A is from an equality-like relation on A called an equivalence relation. An equivalence relation ≈ on A is a binary relation which is reflexive (x ≈ x for every x ∈ A), symmetric (x ≈ y ⇒ y ≈ x), and transitive (x ≈ y and y ≈ z ⇒ x ≈ z). Every fibering ℱ of A generates a relation ≈ by the stipulation that x ≈ y if and only if x and y are in the same fiber, and obviously ≈ is an equivalence relation. The most important fact to be established in this section is the converse.

Theorem. Every equivalence relation ≈ on A is the equivalence relation of a fibering.

Proof. We obviously have to define x̄ as the set of elements y equivalent to x, x̄ = {y : y ≈ x}, and our problem is to show that the family ℱ of all subsets of A obtained this way is a fibering. The reflexive, symmetric, and transitive laws become x ∈ x̄; x ∈ ȳ ⇒ y ∈ x̄; and x ∈ ȳ and y ∈ z̄ ⇒ x ∈ z̄. Reflexivity thus implies that ℱ covers A. Transitivity says that if y ∈ z̄, then x ∈ ȳ ⇒ x ∈ z̄; that is, if y ∈ z̄, then ȳ ⊂ z̄. But also, if y ∈ z̄, then z ∈ ȳ by
symmetry, and so z̄ ⊂ ȳ. Thus y ∈ z̄ implies ȳ = z̄. Therefore, if two of our sets ā and b̄ have a point x in common, then ā = x̄ = b̄. In other words, if ā is not the set b̄, then ā and b̄ are disjoint, and we have a fibering. ∎

The fundamental role this argument plays in mathematics is due to the fact that in many important situations equivalence relations occur as the primary object, and then are used to define partitions and functions. We give two examples.

Let ℤ be the integers (positive, negative, and zero). A fraction 'm/n' can be considered an ordered pair ⟨m, n⟩ of integers with n ≠ 0. The set of all fractions is thus ℤ × (ℤ − {0}). Two fractions ⟨m, n⟩ and ⟨p, q⟩ are "equal" if and only if mq = np, and equality is checked to be an equivalence relation. The equivalence class of ⟨m, n⟩ is the object taken to be the rational number m/n. Thus the rational number system ℚ is the set of fibers in a partition of ℤ × (ℤ − {0}).

Next, we choose a fixed integer p ∈ ℤ and define a relation E on ℤ by

    m E n ⇔ p divides m − n.

Then E is an equivalence relation, and the set ℤ_p of its equivalence classes is called the integers modulo p. It is easy to see that m E n if and only if m and n have the same remainder when divided by p, so that in this case there is an easily calculated function f, where f(m) is the remainder after dividing m by p, which defines the fibering. The set of possible remainders is {0, 1, ..., p − 1}, so that ℤ_p contains p elements.

A function on a set A can be "factored" through a fibering of A by the following theorem.

Theorem. Let g be a function on A, and let ℱ be a fibering of A. Then g is constant on each fiber of ℱ if and only if there exists a function ḡ on ℱ such that g = ḡ ∘ π.

Proof. If g is constant on each fiber of ℱ, then the association of this unique value with the fiber defines the function ḡ, and clearly g = ḡ ∘ π. The converse is obvious. ∎
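The second example above is easily computed. The following sketch (Python) builds the fibering of a finite window of ℤ by the remainder function f(m) = m mod p, for the illustrative choice p = 3:

    p = 3
    window = range(-9, 10)                       # a finite window of Z

    fibers = {}
    for m in window:
        fibers.setdefault(m % p, set()).add(m)   # f(m) = remainder mod p

    assert len(fibers) == p                      # Z_p has p elements
    # m E n  <=>  p divides m - n  <=>  m and n lie in the same fiber
    assert all((m - n) % p == 0
               for r in fibers for m in fibers[r] for n in fibers[r])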
CHAPTER 1

VECTOR SPACES

The calculus of functions of more than one variable unites the calculus of one variable, which the reader presumably knows, with the theory of vector spaces, and the adequacy of its treatment depends directly on the extent to which vector space theory really is used. The theories of differential equations and differential geometry are similarly based on a mixture of calculus and vector space theory. Such "vector calculus" and its applications constitute the subject matter of this book, and in order for our treatment to be completely satisfactory, we shall have to spend considerable time at the beginning studying vector spaces themselves. This we do principally in the first two chapters. The present chapter is devoted to general vector spaces and the next chapter to finite-dimensional spaces.

We begin this chapter by introducing the basic concepts of the subject (vector spaces, vector subspaces, linear combinations, and linear transformations) and then relate these notions to the lines and planes of geometry. Next we establish the most elementary formal properties of linear transformations and Cartesian product vector spaces, and take a brief look at quotient vector spaces. This brings us to our first major objective, the study of direct sum decompositions, which we undertake in the fifth section. The chapter concludes with a preliminary examination of bilinearity.

1. FUNDAMENTAL NOTIONS

Vector spaces and subspaces. The reader probably has already had some contact with the notion of a vector space. Most beginning calculus texts discuss geometric vectors, which are represented by "arrows" drawn from a chosen origin O. These vectors are added geometrically by the parallelogram rule: the sum of the vector OA (represented by the arrow from O to A) and the vector OB is the vector OP, where P is the vertex opposite O in the parallelogram having OA and OB as two sides (Fig. 1.1). Vectors can also be multiplied by numbers: x(OA) is that vector OB such that B is on the line through O and A, the distance from O to B is |x| times the distance from O to A, and B and A are on the same side of O if x is positive, and on opposite sides if x is negative
(Fig. 1.2). These two vector operations satisfy certain laws of algebra, which we shall soon state in the definition. The geometric proofs of these laws are generally sketchy, consisting more of plausibility arguments than of airtight logic. For example, the geometric figure in Fig. 1.3 is the essence of the usual proof that vector addition is associative. In each case the final vector OX is represented by the diagonal starting from O in the parallelepiped constructed from the three edges OA, OB, and OC. The set of all geometric vectors, together with these two operations and the laws of algebra that they satisfy, constitutes one example of a vector space. We shall return to this situation in Section 2.

    [Fig. 1.1: vector addition by the parallelogram rule. Fig. 1.2: scalar multiples of OA. Fig. 1.3: associativity of addition, via the parallelepiped on the edges OA, OB, OC.]

The reader may also have seen coordinate triples treated as vectors. In this system a three-dimensional vector is an ordered triple of numbers ⟨x₁, x₂, x₃⟩ which we think of geometrically as the coordinates of a point in space. Addition is now algebraically defined,

    ⟨x₁, x₂, x₃⟩ + ⟨y₁, y₂, y₃⟩ = ⟨x₁ + y₁, x₂ + y₂, x₃ + y₃⟩,

as is multiplication by numbers,

    t⟨x₁, x₂, x₃⟩ = ⟨tx₁, tx₂, tx₃⟩.

The vector laws are much easier to prove for these objects, since they are almost algebraic formalities. The set ℝ³ of all ordered triples of numbers, together with these two operations, is a second example of a vector space.
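The componentwise definitions translate directly into code, and the vector laws can then be spot-checked. A minimal sketch (Python, with triples represented as tuples):

    def add(x, y):
        return tuple(xi + yi for xi, yi in zip(x, y))    # componentwise sum

    def scale(t, x):
        return tuple(t * xi for xi in x)                 # scalar multiple

    x, y = (1, 2, 3), (4, 5, 6)
    assert add(x, y) == (5, 7, 9)
    assert scale(2, x) == (2, 4, 6)
    # One of the vector laws, S3: t(x + y) = tx + ty
    assert scale(2, add(x, y)) == add(scale(2, x), scale(2, y))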
If we think of an ordered triple ⟨x₁, x₂, x₃⟩ as a function x with domain the set of integers from 1 to 3, where xᵢ is the value of the function x at i (see Section 0.8), then this vector space suggests a general type called a function space, which we shall examine after the definition. For the moment we remark only that we defined the sum of the triple x and the triple y as that triple z such that zᵢ = xᵢ + yᵢ for every i.

A vector space, then, is a collection of objects that can be added to each other and multiplied by numbers, subject to certain laws of algebra. In this context a number is often called a scalar.

Definition. Let V be a set, and let there be given a mapping ⟨α, β⟩ ↦ α + β from V × V to V, called addition, and a mapping ⟨x, α⟩ ↦ xα from ℝ × V to V, called multiplication by scalars. Then V is a vector space with respect to these two operations if:

    A1. α + (β + γ) = (α + β) + γ for all α, β, γ ∈ V.
    A2. α + β = β + α for all α, β ∈ V.
    A3. There exists an element 0 ∈ V such that α + 0 = α for all α ∈ V.
    A4. For every α ∈ V there exists a β ∈ V such that α + β = 0.
    S1. (xy)α = x(yα) for all x, y ∈ ℝ, α ∈ V.
    S2. (x + y)α = xα + yα for all x, y ∈ ℝ, α ∈ V.
    S3. x(α + β) = xα + xβ for all x ∈ ℝ, α, β ∈ V.
    S4. 1α = α for all α ∈ V.

In contexts where it is clear (as it generally is) which operations are intended, we refer simply to the vector space V.

Certain further properties of a vector space follow directly from the axioms. Thus the zero element postulated in A3 is unique, and for each α the β of A4 is unique, and is called −α. Also 0α = 0, x0 = 0, and (−1)α = −α. These elementary consequences are considered in the exercises.

Our standard example of a vector space will be the set V = ℝ^A of all real-valued functions on a set A under the natural operations of addition of two functions and multiplication of a function by a number. This generalizes the example ℝ^{1,2,3} = ℝ³ that we looked at above. Remember that a function f in ℝ^A is simply a mathematical object of a certain kind. We are saying that two of these objects can be added together in a natural way to form a third such object, and that the set of all such objects then satisfies the above laws for addition. Of course, f + g is defined as the function whose value at a is f(a) + g(a), so that (f + g)(a) = f(a) + g(a) for all a in A. For example, in ℝ³ we defined the sum x + y as that triple whose value at i is xᵢ + yᵢ for all i. Similarly, cf is the function defined by (cf)(a) = c(f(a)) for all a. Laws A1 through S4 follow at once from these definitions and the corresponding laws of algebra for the real number system. For example, the equation (s + t)f = sf + tf means
that ((s + t)f)(a) = (sf + tf)(a) for all a ∈ A. But

    ((s + t)f)(a) = (s + t)(f(a)) = s(f(a)) + t(f(a)) = (sf)(a) + (tf)(a) = (sf + tf)(a),

where we have used the definition of scalar multiplication in ℝ^A, the distributive law in ℝ, the definition of scalar multiplication in ℝ^A, and the definition of addition in ℝ^A, in that order. Thus we have S2, and the other laws follow similarly.

The set A can be anything at all. If A = ℝ, then V = ℝ^ℝ is the vector space of all real-valued functions of one real variable. If A = ℝ × ℝ, then V = ℝ^(ℝ×ℝ) is the space of all real-valued functions of two real variables. If A = {1, 2} = 2̄, then V = ℝ^2̄ = ℝ² is the Cartesian plane, and if A = {1, ..., n} = n̄, then V = ℝⁿ is Cartesian n-space. If A contains a single point, then ℝ^A is a natural bijective image of ℝ itself, and of course ℝ is trivially a vector space with respect to its own operations.

Now let V be any vector space, and suppose that W is a nonempty subset of V that is closed under the operations of V. That is, if α and β are in W, then so is α + β, and if α is in W, then so is xα for every scalar x. For example, let V be the vector space ℝ^[a,b] of all real-valued functions on the closed interval [a, b] ⊂ ℝ, and let W be the set 𝒞([a, b]) of all continuous real-valued functions on [a, b]. Then W is a subset of V that is closed under the operations of V, since f + g and cf are continuous whenever f and g are. Or let V be Cartesian 2-space ℝ², and let W be the set of ordered pairs x = ⟨x₁, x₂⟩ such that x₁ + x₂ = 0. Clearly, W is closed under the operations of V.

Such a subset W is always a vector space in its own right. The universally quantified laws A1, A2, and S1 through S4 hold in W because they hold in the larger set V. And since there is some β in W, it follows that 0 = 0β is in W because W is closed under multiplication by scalars. For the same reason, if α is in W, then so is −α = (−1)α. Therefore, A3 and A4 also hold, and we see that W is a vector space. We have proved the following lemma.

Lemma. If W is a nonempty subset of a vector space V which is closed under the operations of V, then W is itself a vector space.

We call W a subspace of V. Thus 𝒞([a, b]) is a subspace of ℝ^[a,b], and the pairs ⟨x₁, x₂⟩ such that x₁ + x₂ = 0 form a subspace of ℝ². Subspaces will be with us from now to the end. A subspace of a vector space ℝ^A is called a function space. In other words, a function space is a collection of real-valued functions on a common domain which is closed under addition and multiplication by scalars.

What we have defined so far ought to be called the notion of a real vector space or a vector space over ℝ. There is an analogous notion of a complex vector space, for which the scalars are the complex numbers. Then laws S1 through S4 refer to multiplication by complex numbers, and the space ℂ^A of all complex-
valued functions on A is the standard example. In fact, if the reader knew what is meant by a field F, we could give a single general definition of a vector space over F, where scalar multiplication is by the elements of F, and the standard example is the space V = F^A of all functions from A to F. Throughout this book it will be understood that a vector space is a real vector space unless explicitly stated otherwise. However, much of the analysis holds as well for complex vector spaces, and most of the pure algebra is valid for any scalar field F.

EXERCISES

1.1 Sketch the geometric figure representing law S3, x(OA + OB) = x(OA) + x(OB), for geometric vectors. Assume that x > 1.

1.2 Prove S3 for ℝ³ using the explicit displayed form ⟨x₁, x₂, x₃⟩ for ordered triples.

1.3 The vector 0 postulated in A3 is unique, as elementary algebraic fiddling will show. For suppose that 0′ also satisfies A3. Then

    0′ = 0′ + 0    (A3 for 0)
       = 0 + 0′    (A2)
       = 0         (A3 for 0′).

Show by similar algebraic juggling that, given α, the β postulated in A4 is unique. This unique β is designated −α.

1.4 Prove similarly that 0α = 0, x0 = 0, and (−1)α = −α.

1.5 Prove that if xα = 0, then either x = 0 or α = 0.

1.6 Prove S1 for a function space ℝ^A. Prove S3.

1.7 Given that α is any vector in a vector space V, show that the set {xα : x ∈ ℝ} of all scalar multiples of α is a subspace of V.

1.8 Given that α and β are any two vectors in V, show that the set of all vectors xα + yβ, where x and y are any real numbers, is a subspace of V.

1.9 Show that the set of triples x in ℝ³ such that x₁ − x₂ + 2x₃ = 0 is a subspace M. If N is the similar subspace {x : x₁ + x₂ + x₃ = 0}, find a nonzero vector α in M ∩ N. Show that M ∩ N is the set {xα : x ∈ ℝ} of all scalar multiples of α.

1.10 Let A be the open interval (0, 1), and let V be ℝ^A. Given a point x in (0, 1), let V_x be the set of functions in V that have a derivative at x. Show that V_x is a subspace of V.

1.11 For any subsets A and B of a vector space V we define the set sum A + B by A + B = {α + β : α ∈ A and β ∈ B}. Show that (A + B) + C = A + (B + C).

1.12 If A ⊂ V and X ⊂ ℝ, we similarly define XA = {xα : x ∈ X and α ∈ A}. Show that a nonvoid set A is a subspace if and only if A + A = A and ℝA = A.

1.13 Let V be ℝ², and let M be the line through the origin with slope k. Let x be any nonzero vector in M. Show that M is the subspace ℝx = {tx : t ∈ ℝ}.
1.14 Show that any other line L with the same slope k is of the form M + α for some α.

1.15 Let M be a subspace of a vector space V, and let α and β be any two vectors in V. Given A = α + M and B = β + M, show that either A = B or A ∩ B = ∅. Show also that A + B = (α + β) + M.

1.16 State more carefully and prove what is meant by "a subspace of a subspace is a subspace".

1.17 Prove that the intersection of two subspaces of a vector space is always itself a subspace.

1.18 Prove more generally that the intersection W = ∩ᵢ∈I Wᵢ of any family {Wᵢ : i ∈ I} of subspaces of V is a subspace of V.

1.19 Let V again be ℝ^(0,1), and let W be the set of all functions f in V such that f′(x) exists for every x in (0, 1). Show that W is the intersection of the collection of subspaces of the form V_x that were considered in Exercise 1.10.

1.20 Let V be a function space ℝ^A, and for a point a in A let W_a be the set of functions such that f(a) = 0. W_a is clearly a subspace. For a subset B ⊂ A let W_B be the set of functions f in V such that f = 0 on B. Show that W_B is the intersection ∩_{a∈B} W_a.

1.21 Supposing again that X and Y are subspaces of V, show that if X + Y = V and X ∩ Y = {0}, then for every vector ζ in V there is a unique pair of vectors ξ ∈ X and η ∈ Y such that ζ = ξ + η.

1.22 Show that if X and Y are subspaces of a vector space V, then the union X ∪ Y can only be a subspace if either X ⊂ Y or Y ⊂ X.

Linear combinations and linear span. Because of the commutative and associative laws for vector addition, the sum of a finite set of vectors is the same for all possible ways of adding them. For example, the sum of the three vectors α_a, α_b, α_c can be calculated in 12 ways, all of which give the same result. Therefore, if I = {a, b, c} is the set of indices used, the notation Σᵢ∈I αᵢ, which indicates the sum without telling us how we got it, is unambiguous. In general, for any finite indexed set of vectors {αᵢ : i ∈ I} there is a uniquely determined sum vector Σᵢ∈I αᵢ which we can compute by ordering and grouping the αᵢ's in any way. The index set I is often a block of integers n̄ = {1, ..., n}. In this case the vectors αᵢ form an n-tuple {αᵢ}₁ⁿ, and unless directed to do otherwise we would add them in their natural order and write the sum as Σᵢ₌₁ⁿ αᵢ. Note that the way they are grouped is still left arbitrary. Frequently, however, we have to use indexed sets that are not ordered. For example, the general polynomial of degree at most 5 in the two variables 's' and 't' is

    Σ_{i+j≤5} aᵢⱼ sⁱtʲ,

and the finite set of monomials {sⁱtʲ}_{i+j≤5} has no natural order.
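The claim is easy to test exhaustively for a small indexed set. The following sketch (Python; the three vectors are hypothetical elements of ℝ²) sums an indexed set over all 3! = 6 orderings and confirms that a single vector results:

    from itertools import permutations

    alphas = {'a': (1, 0), 'b': (2, -1), 'c': (0, 5)}   # {alpha_i : i in I}

    def vsum(vectors):
        total = (0, 0)
        for v in vectors:
            total = (total[0] + v[0], total[1] + v[1])
        return total

    sums = {vsum(alphas[i] for i in order)
            for order in permutations(alphas)}
    assert len(sums) == 1      # every ordering gives the same sum vector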
*The formal proof that the sum of a finite collection of vectors is independent of how we add them is by induction. We give it only for the interested reader. In order to avoid looking silly, we begin the induction with two vectors, in which case the commutative law α_a + α_b = α_b + α_a displays the identity of all possible sums. Suppose then that the assertion is true for index sets having fewer than n elements, and consider a collection {αᵢ : i ∈ I} having n members. Let β and γ be the sum of these vectors computed in two ways. In the computation of β there was a last addition performed, so that β = (Σᵢ∈J₁ αᵢ) + (Σᵢ∈J₂ αᵢ), where {J₁, J₂} partitions I and where we can write these two partial sums without showing how they were formed, since by our inductive hypothesis all possible ways of adding them give the same result. Similarly, γ = (Σᵢ∈K₁ αᵢ) + (Σᵢ∈K₂ αᵢ). Now set Lⱼₖ = Jⱼ ∩ Kₖ and

    ξⱼₖ = Σᵢ∈Lⱼₖ αᵢ,

where it is understood that ξⱼₖ = 0 if Lⱼₖ is empty (see Exercise 1.37). Then Σᵢ∈J₁ αᵢ = ξ₁₁ + ξ₁₂ by the inductive hypothesis, and similarly for the other three sums. Thus

    β = (ξ₁₁ + ξ₁₂) + (ξ₂₁ + ξ₂₂) = (ξ₁₁ + ξ₂₁) + (ξ₁₂ + ξ₂₂) = γ,

which completes our proof.*

A vector β is called a linear combination of a subset A of the vector space V if β is a finite sum Σ xᵢαᵢ, where the vectors αᵢ are all in A and the scalars xᵢ are arbitrary. Thus, if A is the subset {tⁿ : n ≥ 0} ⊂ ℝ^ℝ of all "monomials", then a function f is a linear combination of the functions in A if and only if f is a polynomial function f(t) = Σ₀ⁿ cᵢtⁱ. If A is finite, it is often useful to take the indexed set {αᵢ} to be the whole of A, and to simply use a 0-coefficient for any vector missing from the sum. Thus, if A is the subset {sin t, cos t, eᵗ} of ℝ^ℝ, then we can consider A an ordered triple in the listed ordering, and the function 3 sin t − eᵗ = 3·sin t + 0·cos t + (−1)eᵗ is the linear combination of the triple A having the coefficient triple ⟨3, 0, −1⟩.

Consider now the set L of all linear combinations of the two vectors ⟨1, 1, 1⟩ and ⟨0, 1, −1⟩ in ℝ³. It is the set of all vectors

    s⟨1, 1, 1⟩ + t⟨0, 1, −1⟩ = ⟨s, s + t, s − t⟩,

where s and t are any real numbers. Thus L = {⟨s, s + t, s − t⟩ : ⟨s, t⟩ ∈ ℝ²}. It will be clear on inspection that L is closed under addition and scalar multiplication, and therefore is a subspace of ℝ³. Also, L contains each of the two given vectors, with coefficient pairs ⟨1, 0⟩ and ⟨0, 1⟩, respectively. Finally, any subspace M of ℝ³ which contains each of the two given vectors will also contain all of their linear combinations, and so will include L. That is, L is the smallest subspace of ℝ³ containing ⟨1, 1, 1⟩ and ⟨0, 1, −1⟩. It is called the linear span of the two vectors, or the subspace generated by the two vectors. In general, we have the following theorem.
Theorem 1.1. If A is a nonempty subset of a vector space V, then the set L(A) of all linear combinations of the vectors in A is a subspace, and it is the smallest subspace of V which includes the set A.

Proof. Suppose first that A is finite. We can assume that we have indexed A in some way, so that A = {αᵢ : i ∈ I} for some finite index set I, and every element of L(A) is of the form Σᵢ∈I xᵢαᵢ. Then we have

    (Σ xᵢαᵢ) + (Σ yᵢαᵢ) = Σ (xᵢ + yᵢ)αᵢ,

because the left-hand side becomes Σᵢ (xᵢαᵢ + yᵢαᵢ) when it is regrouped by pairs, and then S2 gives the right-hand side. We also have

    c(Σ xᵢαᵢ) = Σ (cxᵢ)αᵢ

by S3 and mathematical induction. Thus L(A) is closed under addition and multiplication by scalars and hence is a subspace. Moreover, L(A) contains each αᵢ (why?) and so includes A. Finally, if a subspace W includes A, then it contains each linear combination Σ xᵢαᵢ, so it includes L(A). Therefore, L(A) can be directly characterized as the uniquely determined smallest subspace which includes the set A.

If A is infinite, we obviously can't use a single finite listing. However, the sum (Σ₁ⁿ xᵢαᵢ) + (Σ₁ᵐ yⱼβⱼ) of two linear combinations of elements of A is clearly a finite sum of scalars times elements of A. If we wish, we can rewrite it as Σ₁ⁿ⁺ᵐ xᵢαᵢ, where we have set βⱼ = αₙ₊ⱼ and yⱼ = xₙ₊ⱼ for j = 1, ..., m. In any case, L(A) is again closed under addition and multiplication by scalars and so is a subspace. ∎

We call L(A) the linear span of A. If L(A) = V, we say that A spans V; V is finite-dimensional if it has a finite spanning set. If V = ℝ³, and if δ¹, δ², and δ³ are the "unit points on the axes", δ¹ = ⟨1, 0, 0⟩, δ² = ⟨0, 1, 0⟩, and δ³ = ⟨0, 0, 1⟩, then {δⁱ}₁³ spans V, since

    x = ⟨x₁, x₂, x₃⟩ = ⟨x₁, 0, 0⟩ + ⟨0, x₂, 0⟩ + ⟨0, 0, x₃⟩ = x₁δ¹ + x₂δ² + x₃δ³ = Σ₁³ xᵢδⁱ

for every x in ℝ³. More generally, if V = ℝⁿ and δʲ is the n-tuple having 1 in the jth place and 0 elsewhere, then we have similarly that x = ⟨x₁, ..., xₙ⟩ = Σᵢ₌₁ⁿ xᵢδⁱ, so that {δⁱ}₁ⁿ spans ℝⁿ. Thus ℝⁿ is finite-dimensional. In general, a function space on an infinite set A will not be finite-dimensional. For example, it is true but not obvious that 𝒞([a, b]) has no finite spanning set.

EXERCISES

1.23 Given α = ⟨1, 1, 1⟩, β = ⟨0, 1, −1⟩, γ = ⟨2, 0, 1⟩, compute the linear combinations α + β + γ, 3α − 2β + γ, xα + yβ + zγ. Find x, y, and z such that xα + yβ + zγ = ⟨0, 0, 1⟩ = δ³. Do the same for δ¹ and δ².

1.24 Given α = ⟨1, 1, 1⟩, β = ⟨0, 1, −1⟩, γ = ⟨1, 0, 2⟩, show that each of α, β, γ is a linear combination of the other two. Show that it is impossible to find coefficients x, y, and z such that xα + yβ + zγ = δ¹.
1.25 a) Find the linear combination of the set A = ⟨t, t² − 1, t² + 1⟩ with coefficient triple ⟨2, −1, 1⟩. Do the same for ⟨0, 1, 1⟩.
b) Find the coefficient triple for which the linear combination of the triple A is (t + 1)². Do the same for 1.
c) Show in fact that any polynomial of degree ≤ 2 is a linear combination of A.

1.26 Find the linear combination f of {eᵗ, e⁻ᵗ} ⊂ ℝ^ℝ such that f(0) = 1 and f′(0) = 2.

1.27 Find a linear combination f of sin x, cos x, and eˣ such that f(0) = 0, f′(0) = 1, and f″(0) = 1.

1.28 Suppose that a sin x + b cos x + ceˣ is the zero function. Prove that a = b = c = 0.

1.29 Prove that ⟨1, 1⟩ and ⟨1, 2⟩ span ℝ².

1.30 Show that the subspace M = {x : x₁ + x₂ = 0} ⊂ ℝ² is spanned by one vector.

1.31 Let M be the subspace {x : x₁ − x₂ + 2x₃ = 0} in ℝ³. Find two vectors α and β in M neither of which is a scalar multiple of the other. Then show that M is the linear span of α and β.

1.32 Find the intersection of the linear span of ⟨1, 1, 1⟩ and ⟨0, 1, −1⟩ in ℝ³ with the coordinate subspace x₂ = 0. Exhibit this intersection as a linear span.

1.33 Do the above exercise with the coordinate space replaced by M = {x : x₁ + x₂ = 0}.

1.34 By Theorem 1.1 the linear span L(A) of an arbitrary subset A of a vector space V has the following two properties:
i) L(A) is a subspace of V which includes A;
ii) if M is any subspace which includes A, then L(A) ⊂ M.
Using only (i) and (ii), show that
a) A ⊂ B ⇒ L(A) ⊂ L(B);
b) L(L(A)) = L(A).

1.35 Show that
a) if M and N are subspaces of V, then so is M + N;
b) for any subsets A, B ⊂ V, L(A ∪ B) = L(A) + L(B).

1.36 Remembering (Exercise 1.18) that the intersection of any family of subspaces is a subspace, show that the linear span L(A) of a subset A of a vector space V is the intersection of all the subspaces of V that include A. This alternative characterization is sometimes taken as the definition of linear span.

1.37 By convention, the sum of an empty set of vectors is taken to be the zero vector. This is necessary if Theorem 1.1 is to be strictly correct. Why? What about the preceding problem?

Linear transformations. The general function space ℝ^A and the subspace 𝒞([a, b]) of ℝ^[a,b] both have the property that in addition to being closed under the vector operations, they are also closed under the operation of multiplication of two functions. That is, the pointwise product of two functions is again a function [(fg)(a) = f(a)g(a)], and the product of two continuous functions is continuous. With respect to these three operations, addition, multiplication,
and scalar multiplication, ℝ^A and 𝒞([a, b]) are examples of algebras. If the reader noticed this extra operation, he may have wondered why, at least in the context of function spaces, we bother with the notion of vector space. Why not study all three operations? The answer is that the vector operations are exactly the operations that are "preserved" by many of the most important mappings of sets of functions. For example, define T: 𝒞([a, b]) → ℝ by T(f) = ∫ₐᵇ f(t) dt. Then the laws of the integral calculus say that

    T(f + g) = T(f) + T(g)  and  T(cf) = cT(f).

Thus T "preserves" the vector operations. Or we can say that T "commutes" with the vector operations, since plus followed by T equals T followed by plus. However, T does not preserve multiplication: it is not true in general that T(fg) = T(f)T(g).

Another example is the mapping T: x ↦ y from ℝ³ to ℝ² defined by

    y₁ = 2x₁ − x₂ + x₃,
    y₂ = x₁ + 3x₂ − 5x₃,

for which we can again verify that T(x + y) = T(x) + T(y) and T(cx) = cT(x). The theory of the solvability of systems of linear equations is essentially the theory of such mappings T; thus we have another important type of mapping that preserves the vector operations (but not products).

These remarks suggest that we study vector spaces in part so that we can study mappings which preserve the vector operations. Such mappings are called linear transformations.

Definition. If V and W are vector spaces, then a mapping T: V → W is a linear transformation or a linear map if

    T(α + β) = T(α) + T(β) for all α, β ∈ V,

and

    T(xα) = xT(α) for all α ∈ V, x ∈ ℝ.

These two conditions on T can be combined into the single equation

    T(xα + yβ) = xT(α) + yT(β) for all α, β ∈ V and all x, y ∈ ℝ.

Moreover, this equation can be extended to any finite sum by induction, so that if T is linear, then

    T(Σ xᵢαᵢ) = Σ xᵢT(αᵢ)

for any linear combination Σ xᵢαᵢ. For example, ∫ₐᵇ (Σ₁ⁿ cᵢfᵢ) = Σ₁ⁿ cᵢ ∫ₐᵇ fᵢ.

EXERCISES

1.38 Show that the most general linear map from ℝ to ℝ is multiplication by a constant.

1.39 For a fixed α in V the mapping x ↦ xα from ℝ to V is linear. Why?

1.40 Why is this true for α ↦ xα when x is fixed?

1.41 Show that every linear mapping from ℝ to V is of the form x ↦ xα for a fixed vector α in V.
1.42 Show that every linear mapping from ℝ² to V is of the form ⟨x₁, x₂⟩ ↦ x₁α₁ + x₂α₂ for a fixed pair of vectors α₁ and α₂ in V. What is the range of this mapping?

1.43 Show that the map f ↦ ∫ₐᵇ f(t) dt from 𝒞([a, b]) to ℝ does not preserve products.

1.44 Let g be any fixed function in ℝ^A. Prove that the mapping T: ℝ^A → ℝ^A defined by T(f) = gf is linear.

1.45 Let φ be any mapping from a set A to a set B. Show that composition by φ is a linear mapping from ℝ^B to ℝ^A. That is, show that T: ℝ^B → ℝ^A defined by T(f) = f ∘ φ is linear.

In order to acquire a supply of examples, we shall find all linear transformations having ℝⁿ as domain space. It may be well to start by looking at one such transformation. Suppose we choose some fixed triple of functions {fᵢ}₁³ in the space ℝ^ℝ of all real-valued functions on ℝ, say f₁(t) = sin t, f₂(t) = cos t, and f₃(t) = eᵗ = exp(t). Then for each triple of numbers x = {xᵢ}₁³ in ℝ³ we have the linear combination Σᵢ₌₁³ xᵢfᵢ with {xᵢ} as coefficients. This is the element of ℝ^ℝ whose value at t is Σ₁³ xᵢfᵢ(t) = x₁ sin t + x₂ cos t + x₃eᵗ. Different coefficient triples give different functions, and the mapping x ↦ Σᵢ₌₁³ xᵢfᵢ = x₁ sin + x₂ cos + x₃ exp is thus a mapping from ℝ³ to ℝ^ℝ. It is clearly linear. If we call this mapping T, then we can recover the determining triple of functions from T as the images of the "unit points" δʲ in ℝ³: T(δʲ) = Σᵢ δᵢʲfᵢ = fⱼ, and so T(δ¹) = sin, T(δ²) = cos, and T(δ³) = exp.

We are going to see that every linear mapping from ℝ³ to ℝ^ℝ is of this form. In the following theorem {δʲ}₁ⁿ is the spanning set for ℝⁿ that we defined earlier, so that x = Σⱼ xⱼδʲ for every n-tuple x = ⟨x₁, ..., xₙ⟩ in ℝⁿ.

Theorem 1.2. If {βⱼ}₁ⁿ is any fixed n-tuple of vectors in a vector space W, then the "linear combination mapping" x ↦ Σⱼ xⱼβⱼ is a linear transformation T from ℝⁿ to W, and T(δʲ) = βⱼ for j = 1, ..., n. Conversely, if T is any linear mapping from ℝⁿ to W, and if we set βⱼ = T(δʲ) for j = 1, ..., n, then T is the linear combination mapping x ↦ Σⱼ xⱼβⱼ.

Proof. The linearity of the linear combination map T follows by exactly the same argument that we used in Theorem 1.1 to show that L(A) is a subspace. Thus

    T(x + y) = Σ₁ⁿ (xᵢ + yᵢ)βᵢ = Σ₁ⁿ (xᵢβᵢ + yᵢβᵢ) = Σ₁ⁿ xᵢβᵢ + Σ₁ⁿ yᵢβᵢ = T(x) + T(y),

and

    T(sx) = Σ₁ⁿ (sxᵢ)βᵢ = Σ₁ⁿ s(xᵢβᵢ) = s Σ₁ⁿ xᵢβᵢ = sT(x).
Conversely, if T: ℝⁿ → W is linear, and if we set βⱼ = T(δʲ) for all j, then for any x = ⟨x₁, ..., xₙ⟩ in ℝⁿ we have T(x) = T(Σᵢ xᵢδⁱ) = Σᵢ xᵢT(δⁱ) = Σᵢ xᵢβᵢ. Thus T is the mapping x ↦ Σᵢ xᵢβᵢ. □

This is a tremendously important theorem, simple though it may seem, and the reader is urged to fix it in his mind. To this end we shall invent some terminology that we shall stay with for the first three chapters. If α = {α₁, ..., αₙ} is an n-tuple of vectors in a vector space W, let L_α be the corresponding linear combination mapping x ↦ Σᵢ xᵢαᵢ from ℝⁿ to W. Note that the n-tuple α itself is an element of Wⁿ. If T is any linear mapping from ℝⁿ to W, we shall call the n-tuple {T(δⁱ)}₁ⁿ the skeleton of T. In these terms the theorem can be restated as follows.

Theorem 1.2′. For each n-tuple α in Wⁿ, the map L_α: ℝⁿ → W is linear and its skeleton is α. Conversely, if T is any linear map from ℝⁿ to W, then T = L_β, where β is the skeleton of T.

Or again:

Theorem 1.2″. The map α ↦ L_α is a bijection from Wⁿ to the set of all linear maps T from ℝⁿ to W, and T ↦ skeleton(T) is its inverse.

A linear transformation from a vector space V to the scalar field ℝ is called a linear functional on V. Thus f ↦ ∫_a^b f(t) dt is a linear functional on V = C([a, b]). The above theorem is particularly simple for a linear functional F: since W = ℝ, each vector βᵢ = F(δⁱ) in the skeleton of F is simply a number bᵢ, and the skeleton {bᵢ}₁ⁿ is thus an element of ℝⁿ. In this case we would write F(x) = Σᵢ bᵢxᵢ, putting the numerical coefficient 'bᵢ' before the variable 'xᵢ'. Thus F(x) = 3x₁ − x₂ + 4x₃ is the linear functional on ℝ³ with skeleton ⟨3, −1, 4⟩. The set of all linear functionals on ℝⁿ is in a natural one-to-one correspondence with ℝⁿ itself; we get b from F by bᵢ = F(δⁱ) for all i, and we get F from b by F(x) = Σ bᵢxᵢ for all x in ℝⁿ.

We next consider the case where the codomain space of T is a Cartesian space ℝᵐ, and in order to keep the two spaces clear in our minds, we shall, for the moment, take the domain space to be ℝ³. Each vector βⱼ = T(δʲ) in the skeleton of T is now an m-tuple of numbers. If we picture this m-tuple as a column of numbers, then the three m-tuples βⱼ can be pictured as a rectangular array of numbers, consisting of three columns each of m numbers. Let tᵢⱼ be the ith number in the jth column. Then the doubly indexed set of numbers {tᵢⱼ} is called the matrix of the transformation T. We call it an m-by-3 (an m × 3) matrix because the pictured rectangular array has m rows and three columns. The matrix determines T uniquely, since its columns form the skeleton of T.

The identity T(x) = Σ₁³ xⱼT(δʲ) = Σ₁³ xⱼβⱼ allows the m-tuple T(x) to be calculated explicitly from x and the matrix {tᵢⱼ}. Picture multiplying the column m-tuple βⱼ by the scalar xⱼ and then adding across the three columns at the ith row, as below:
x₁ [t₁₁, ..., t_m1] + x₂ [t₁₂, ..., t_m2] + x₃ [t₁₃, ..., t_m3] (columns).

Since tᵢⱼ is the ith number in the m-tuple βⱼ, the ith number in the m-tuple Σⱼ₌₁³ xⱼβⱼ is Σⱼ₌₁³ xⱼtᵢⱼ. That is, if we let y be the m-tuple T(x), then

yᵢ = Σⱼ₌₁³ tᵢⱼxⱼ for i = 1, ..., m,

and this set of m scalar equations is equivalent to the one-vector equation y = T(x).

We can now replace three by n in the above discussion without changing anything except the diagram, and thus obtain the following specialization of Theorem 1.2.

Theorem 1.3. Every linear mapping T from ℝⁿ to ℝᵐ determines the m × n matrix t = {tᵢⱼ} having the skeleton of T as its columns, and the expression of the equation y = T(x) in linear combination form is equivalent to the m scalar equations

yᵢ = Σⱼ₌₁ⁿ tᵢⱼxⱼ for i = 1, ..., m.

Conversely, each m × n matrix t determines the linear combination mapping having the columns of t as its skeleton, and the mapping t ↦ T is therefore a bijection from the set of all m × n matrices to the set of all linear maps from ℝⁿ to ℝᵐ.

A linear functional F on ℝⁿ is a linear mapping from ℝⁿ to ℝ¹, so it must be expressed by a 1 × n matrix. That is, the n-tuple b in ℝⁿ which is the skeleton of F is viewed as a matrix of one row and n columns.

As a final example of linear maps, we look at an important class of special linear functionals defined on any function space, the so-called coordinate functionals. If V = ℝ^I and i ∈ I, then the ith coordinate functional πᵢ is simply evaluation at i, so that πᵢ(f) = f(i). These functionals are obviously linear. In fact, the vector operations on functions were defined to make them linear; since sf + tg is defined to be that function whose value at i is sf(i) + tg(i) for all i, we see that sf + tg is by definition that function such that πᵢ(sf + tg) = sπᵢ(f) + tπᵢ(g) for all i!

If V is ℝⁿ, then πⱼ is the mapping x = ⟨x₁, ..., xₙ⟩ ↦ xⱼ. In this case we know from the theorem that πⱼ must be of the form πⱼ(x) = Σ₁ⁿ bᵢxᵢ for some n-tuple b. What is b?
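As a concrete trace of Theorem 1.3 (the particular matrix and vector here are our own illustration, not the text's), take m = 2 and n = 3:

\[
t = \begin{pmatrix} 2 & -1 & 1 \\ 1 & 1 & 4 \end{pmatrix}, \qquad x = \langle 1, 2, 3 \rangle,
\]
\[
y_1 = 2(1) - 1(2) + 1(3) = 3, \qquad y_2 = 1(1) + 1(2) + 4(3) = 15,
\]

so T(x) = ⟨3, 15⟩. The skeleton form gives the same answer: T(x) = 1·β₁ + 2·β₂ + 3·β₃ = ⟨2, 1⟩ + 2⟨−1, 1⟩ + 3⟨1, 4⟩ = ⟨3, 15⟩, where the βⱼ are the columns of t.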
The general form of the linearity property, T(Σ xᵢαᵢ) = Σ xᵢT(αᵢ), shows that T and T⁻¹ both carry subspaces into subspaces.

Theorem 1.4. If T: V → W is linear, then the T-image of the linear span of any subset A ⊂ V is the linear span of the T-image of A: T[L(A)] = L(T[A]). In particular, if A is a subspace, then so is T[A]. Furthermore, if Y is a subspace of W, then T⁻¹[Y] is a subspace of V.

Proof. According to the formula T(Σ xᵢαᵢ) = Σ xᵢT(αᵢ), a vector in W is the T-image of a linear combination on A if and only if it is a linear combination on T[A]. That is, T[L(A)] = L(T[A]). If A is a subspace, then A = L(A) and T[A] = L(T[A]), a subspace of W. Finally, if Y is a subspace of W and {αᵢ} ⊂ T⁻¹[Y], then T(Σ xᵢαᵢ) = Σ xᵢT(αᵢ) ∈ L(Y) = Y. Thus Σ xᵢαᵢ ∈ T⁻¹[Y], and T⁻¹[Y] is its own linear span. □

The subspace T⁻¹(0) = {α ∈ V : T(α) = 0} is called the null space, or kernel, of T, and is designated N(T) or 𝔑(T). The range of T is the subspace T[V] of W. It is designated R(T) or ℛ(T).

Lemma 1.1. A linear mapping T is injective if and only if its null space is {0}.

Proof. If T is injective, and if α ≠ 0, then T(α) ≠ T(0) = 0, and the null space accordingly contains only 0. On the other hand, if N(T) = {0}, then whenever α ≠ β, we have α − β ≠ 0, T(α) − T(β) = T(α − β) ≠ 0, and T(α) ≠ T(β); this shows that T is injective. □

A linear map T: V → W which is bijective is called an isomorphism. Two vector spaces V and W are isomorphic if and only if there exists an isomorphism between them. For example, the map ⟨c₁, ..., cₙ⟩ ↦ Σᵢ₌₀ⁿ⁻¹ cᵢ₊₁xⁱ is an isomorphism of ℝⁿ with the vector space of all polynomials of degree < n. Isomorphic spaces "have the same form", and are identical as abstract vector spaces. That is, they cannot be distinguished from each other solely on the basis of vector properties which they do or do not have.

When a linear transformation is from V to itself, special things can happen. One possibility is that T can map a vector α essentially to itself, T(α) = xα for some x in ℝ. In this case α is called an eigenvector (proper vector, characteristic vector), and x is the corresponding eigenvalue.

EXERCISES

1.46 In the situation of Exercise 1.45, show that T is an isomorphism if φ is bijective by showing that
a) φ injective ⟹ T surjective,
b) φ surjective ⟹ T injective.
1.47 Find the linear functional l on ℝ² such that l(⟨1, 1⟩) = 0 and l(⟨1, 2⟩) = 1. That is, find b = ⟨b₁, b₂⟩ in ℝ² such that l is the linear combination map x ↦ b₁x₁ + b₂x₂.
1.48 Do the same for l(⟨2, 1⟩) = −3 and l(⟨1, 2⟩) = 4.
1.49 Find the linear T: ℝ² → ℝ^ℝ such that T(⟨1, 1⟩) = t² and T(⟨1, 2⟩) = t³. That is, find the functions f₁(t) and f₂(t) such that T is the linear combination map x ↦ x₁f₁ + x₂f₂.
1.50 Let T be the linear map from ℝ² to ℝ³ such that T(δ¹) = ⟨2, −1, 1⟩, T(δ²) = ⟨1, 0, 3⟩. Write down the matrix of T in standard rectangular form. Determine whether or not δ¹ is in the range of T.
1.51 Let T be the linear map from ℝ³ to ℝ³ whose matrix is

[1  2  3]
[2  0 −1]
[3 −1  1]

Find T(x) when x = ⟨1, 1, 0⟩; do the same for x = ⟨3, −2, 1⟩.
1.52 Let M be the linear span of ⟨1, −1, 0⟩ and ⟨0, 1, 1⟩. Find the subspace T[M] by finding two vectors spanning it, where T is as in the above exercise.
1.53 Let T be the map ⟨x, y⟩ ↦ ⟨x + 2y, y⟩ from ℝ² to itself. Show that T is a linear combination mapping, and write down its matrix in standard form.
1.54 Do the same for T: ⟨x, y, z⟩ ↦ ⟨x − z, x + z, y⟩ from ℝ³ to itself.
1.55 Find a linear transformation T from ℝ³ to itself whose range space is the span of ⟨1, −1, 0⟩ and ⟨−1, 0, 2⟩.
1.56 Find two linear functionals on ℝ⁴ the intersection of whose null spaces is the linear span of ⟨1, 1, 1, 1⟩ and ⟨1, 0, −1, 0⟩. You now have in hand a linear transformation whose null space is the above span. What is it?
1.57 Let V = C([a, b]) be the space of continuous real-valued functions on [a, b], also designated C⁰([a, b]), and let W = C¹([a, b]) be those having continuous first derivatives. Let D: W → V be differentiation (Df = f′), and define T on V by T(f) = F, where F(x) = ∫_a^x f(t) dt. By stating appropriate theorems of the calculus, show that D and T are linear, T maps into W, and D is a left inverse of T (D ∘ T is the identity on V).
1.58 In the above exercise, identify the range of T and the null space of D. We know that D is surjective and that T is injective. Why?
1.59 Let V be the linear span of the functions sin x and cos x. Then the operation of differentiation D is a linear transformation from V to V. Prove that D is an isomorphism from V to V. Show that D² = −I on V.
1.60 a) As the reader would guess, C³(ℝ) is the set of real-valued functions on ℝ having continuous derivatives up to and including the third. Show that f ↦ f′′′ is a surjective linear map T from C³(ℝ) to C(ℝ).
b) For any fixed a in ℝ show that f ↦ ⟨f(a), f′(a), f″(a)⟩ is an isomorphism from the null space N(T) to ℝ³. [Hint: Apply Taylor's formula with remainder.]
1.61 An integral analogue of the matrix equations yᵢ = Σⱼ tᵢⱼxⱼ, i = 1, ..., m, is the equation

g(s) = ∫₀¹ K(s, t)f(t) dt, s ∈ [0, 1].

Assuming that K(s, t) is defined on the square [0, 1] × [0, 1] and is continuous as a function of t for each s, check that f ↦ g is a linear mapping from C([0, 1]) to ℝ^[0,1].
1.62 For a finite set A = {αᵢ}, Theorem 1.1 is a corollary of Theorem 1.4. Why?
1.63 Show that the inverse of an isomorphism is linear (and hence is an isomorphism).
1.64 Find the eigenvectors and eigenvalues of T: ℝ² → ℝ² if the matrix of T is

[ 1 −1]
[−2  0]

Since every scalar multiple xα of an eigenvector α is clearly also an eigenvector, it will suffice to find one vector in each "eigendirection". This is a problem in elementary algebra.
1.65 Find the eigenvectors and eigenvalues of the transformations T whose matrices are

[−1 −1]      [ 1 −1]
[−1 −1]      [−2  2]

1.66 The five transformations in the above two exercises exhibit four different kinds of behavior according to the number of distinct eigendirections they have. What are the possibilities?
1.67 Let V be the vector space of polynomials of degree ≤ 3 and define T: V → V by f ↦ tf′(t). Find the eigenvectors and eigenvalues of T.

2. VECTOR SPACES AND GEOMETRY

The familiar coordinate systems of analytic geometry allow us to consider geometric entities such as lines and planes in vector settings, and these geometric notions give us valuable intuitions about vector spaces. Before looking at the vector forms of these geometric ideas, we shall briefly review the construction of the coordinate correspondence for three-dimensional Euclidean space. As usual, the confident reader can skip it.

We start with the line. A coordinate correspondence between a line L and the real number system ℝ is determined by choosing arbitrarily on L a zero point O and a unit point Q distinct from O. Then to each point X on L is assigned the number x such that |x| is the distance from O to X, measured in terms of the segment OQ as unit, and x is positive or negative according as X and Q are on the same side of O or on opposite sides. The mapping X ↦ x is the coordinate correspondence.

Now consider three-dimensional Euclidean space E³. We want to set up a coordinate correspondence between E³ and the Cartesian vector space ℝ³. We first choose arbitrarily a zero point O and three unit points Q₁, Q₂, and Q₃ in such a way that the four points do not lie in a plane.
Each of the unit points Qᵢ determines a line Lᵢ through O and a coordinate correspondence on this line, as defined above. The three lines L₁, L₂, and L₃ are called the coordinate axes. Consider now any point X in E³. The plane through X parallel to L₂ and L₃ intersects L₁ at a point X₁, and therefore determines a number x₁, the coordinate of X₁ on L₁. In a similar way, X determines points X₂ on L₂ and X₃ on L₃ which have coordinates x₂ and x₃, respectively. Altogether X determines a triple x = ⟨x₁, x₂, x₃⟩ in ℝ³, and we have thus defined a mapping θ: X ↦ x from E³ to ℝ³ (see Fig. 1.4). We call θ the coordinate correspondence defined by the axis system. The convention implicit in our notation above is that θ(Y) is y, θ(A) is a, etc. Note that the unit point Q₁ on L₁ has the coordinate triple δ¹ = ⟨1, 0, 0⟩, and similarly, that θ(Q₂) = δ² = ⟨0, 1, 0⟩ and θ(Q₃) = δ³ = ⟨0, 0, 1⟩.

Fig. 1.4

There are certain basic facts about the coordinate correspondence that have to be proved as theorems of geometry before the correspondence can be used to treat geometric questions algebraically. These geometric theorems are quite tricky, and are almost impossible to discuss adequately on the basis of the usual secondary school treatment of geometry. We shall therefore simply assume them. They are:

1) θ is a bijection from E³ to ℝ³.

2) Two line segments AB and XY are equal in length and parallel, and the direction from A to B is the same as that from X to Y, if and only if b − a = y − x (in the vector space ℝ³). This relationship between line segments is important enough to formalize. A directed line segment is a geometric line segment, together with a choice of one of the two directions along it. If we interpret AB as the directed line segment from A to B, and if we define the directed line segments AB and XY to be equivalent (and write AB ≈ XY) if they are equal in length, parallel, and similarly directed, then (2) can be restated: AB ≈ XY ⟺ b − a = y − x.

3) If X ≠ O, then Y is on the line through O and X in E³ if and only if y = tx for some t in ℝ. Moreover, this t is the coordinate of Y with respect to X as unit point on the line through O and X.
Fig. 1.5: s² = x₁² + x₂²; |OX|² = r² = s² + x₃².

4) If the axis system in E³ is Cartesian, that is, if the axes are mutually perpendicular and a common unit of distance is used, then the length |OX| of the segment OX is given by the so-called Euclidean norm on ℝ³, |OX| = (Σ₁³ xᵢ²)^{1/2}. This follows directly from the Pythagorean theorem. Then this formula and a second application of the Pythagorean theorem to the triangle OXY imply that the segments OX and OY are perpendicular if and only if the scalar product (x, y) = Σᵢ₌₁³ xᵢyᵢ has the value 0 (see Fig. 1.5).

In applying this result, it is useful to note that the scalar product (x, y) is linear as a function of either vector variable when the other is held fixed. Thus

(cx + dy, z) = Σ₁³ (cxᵢ + dyᵢ)zᵢ = c Σ₁³ xᵢzᵢ + d Σ₁³ yᵢzᵢ = c(x, z) + d(y, z).

Exactly the same theorems hold for the coordinate correspondence between the Euclidean plane E² and the Cartesian 2-space ℝ², except that now, of course, (x, y) = Σ₁² xᵢyᵢ = x₁y₁ + x₂y₂.

We can easily obtain the equations for lines and planes in E³ from these basic theorems. First, we see from (2) and (3) that if fixed points A and B are given, with A ≠ O, then the line through B parallel to the segment OA contains the point X if and only if there exists a scalar t such that x − b = ta (see Fig. 1.6). Therefore, the equation of this line is

x = ta + b.

Fig. 1.6

This vector equation is equivalent to the three numerical equations

xᵢ = aᵢt + bᵢ, i = 1, 2, 3.

These are customarily called the parametric equations of the line, since they present the coordinate triple x of the varying point X on the line as functions of the "parameter" t.
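For a numerical instance (the particular points are our own choice, not the text's), take a = ⟨2, 0, −1⟩ and b = ⟨1, 2, 3⟩. The line through B parallel to OA is then

\[
x = ta + b: \qquad x_1 = 2t + 1, \quad x_2 = 2, \quad x_3 = -t + 3, \qquad t \in \mathbb{R}.
\]

At t = 0 we recover B itself, and eliminating t between the first and third equations gives a nonparametric description of the same line: x₁ + 2x₃ = 7, x₂ = 2.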
Next, we know that the plane through B perpendicular to the direction of the segment OA contains the point X if and only if BX ⊥ OA, and it therefore follows from (2) and (4) that the plane contains X if and only if (x − b, a) = 0 (see Fig. 1.7). But (x − b, a) = (x, a) − (b, a) by the linearity of the scalar product in its first variable, and if we set l = (b, a), we see that the equation of the plane is (x, a) = l, or

Σ₁³ aᵢxᵢ = l.

That is, a point X is on the plane through B perpendicular to the direction of OA if and only if this equation holds for its coordinate triple x. Conversely, if a ≠ 0, then we can retrace the steps taken above to show that the set of points X in E³ whose coordinate triples x satisfy (x, a) = l is a plane.

Fig. 1.7

The fact that ℝ³ has the natural scalar product (x, y) is of course extremely important, both algebraically and geometrically. However, most vector spaces do not have natural scalar products, and we shall deliberately neglect scalar products in our early vector theory (but shall return to them in Chapter 5). This leads us to seek a different interpretation of the equation Σ₁³ aᵢxᵢ = l.

We saw in Section 1 that x ↦ Σ₁³ aᵢxᵢ is the most general linear functional f on ℝ³. Therefore, given any plane M in E³, there is a nonzero linear functional f on ℝ³ and a number l such that the equation of M is f(x) = l. And conversely, given any nonzero linear functional f: ℝ³ → ℝ and any l ∈ ℝ, the locus of f(x) = l is a plane M in E³. The reader will remember that we obtain the coefficient triple a from f by aᵢ = f(δⁱ), since then f(x) = f(Σ₁³ xᵢδⁱ) = Σ₁³ xᵢf(δⁱ) = Σ₁³ xᵢaᵢ.
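To see the two interpretations side by side in a concrete case (the numbers are ours, chosen for illustration): the functional f(x) = 3x₁ − x₂ + 4x₃ has coefficient triple a = ⟨3, −1, 4⟩ = ⟨f(δ¹), f(δ²), f(δ³)⟩, and the locus

\[
f(x) = 3x_1 - x_2 + 4x_3 = 5
\]

is, in scalar product language, the plane through b perpendicular to the direction of a for any b with f(b) = 5, e.g., b = ⟨1, 2, 1⟩, since (x − b, a) = f(x) − f(b) = 0 on this locus.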
Finally, we seek the vector form of the notion of parallel translation. In plane geometry, when we are considering two congruent figures that are parallel and similarly oriented, we often think of obtaining one from the other by "sliding the plane along itself" in such a way that all lines remain parallel to their original positions. This description of a parallel translation of the plane can be more elegantly stated as the condition that every directed line segment slides to an equivalent one. If X slides to Y and O slides to B, then OX slides to BY, so that OX ≈ BY and x = y − b by (2). Therefore, the coordinate form of such a parallel sliding is the mapping x ↦ y = x + b. Conversely, for any b in ℝ² the plane mapping defined by x ↦ y = x + b is easily seen to be a parallel translation. These considerations hold equally well for parallel translations of the Euclidean space E³.

It is geometrically clear that under a parallel translation planes map to parallel planes and lines map to parallel lines, and now we can expect an easy algebraic proof. Consider, for example, the plane M with equation f(x) = l; let us ask what happens to M under the translation x ↦ y = x + b. Since x = y − b, we see that a point x is on M if and only if its translate y satisfies the equation f(y − b) = l or, since f is linear, the equation f(y) = l′, where l′ = l + f(b). But this is the equation of a plane N. Thus the translate of M is the plane N.

It is natural to transfer all this geometric terminology from sets in E³ to the corresponding sets in ℝ³, and therefore to speak of the set of ordered triples x satisfying f(x) = l as a set of points in ℝ³ forming a plane in ℝ³, and to call the mapping x ↦ x + b the (parallel) translation of ℝ³ through b, etc. Moreover, since ℝ³ is a vector space, we would expect these geometric ideas to interplay with vector notions. For instance, translation through b is simply the operation of adding the constant vector b: x ↦ x + b. Thus if M is a plane, then the plane N obtained by translating M through b is just the vector set sum M + b. If the equation of M is f(x) = l, then the plane M goes through 0 if and only if l = 0, in which case M is a vector subspace of ℝ³ (the null space of f). It is easy to see that any plane M is a translate of a plane through 0. Similarly, the line {ta + b : t ∈ ℝ} is the translate through b of the line {ta : t ∈ ℝ}, and this second line is a subspace, the linear span of the one vector a. Thus planes and lines in ℝ³ are translates of subspaces.

These notions all carry over to an arbitrary real vector space in a perfectly satisfactory way and with additional dimensional variety. A plane in ℝ³ through 0 is a vector space which is two-dimensional in a strictly algebraic sense which we shall discuss in the next chapter, and a line is similarly one-dimensional. In ℝ³ there are no proper subspaces other than planes and lines through 0, but in a vector space V with dimension n > 3 proper subspaces occur with all dimensions from 1 to n − 1. We shall therefore use the term "plane" loosely to refer to any translate of a subspace, whatever its dimension. More properly, translates of vector subspaces are called affine subspaces.
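Here is the translation computation in the same numerical setting as before (ours, for illustration): let M be the plane f(x) = 3x₁ − x₂ + 4x₃ = 5 and translate through b = ⟨1, 0, −1⟩. Then f(b) = 3 + 0 − 4 = −1, so

\[
l' = l + f(b) = 5 + (-1) = 4,
\]

and M + b is the parallel plane with equation f(y) = 4; it passes through the origin exactly when l′ = 0, that is, when the translating vector b satisfies f(b) = −5.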
We shall see that if V is a finite-dimensional space with dimension n, then the null space of a nonzero linear functional f is always (n − 1)-dimensional, and therefore it cannot be a Euclidean-like two-dimensional plane except when n = 3. We use the term hyperplane for such a null space or one of its translates. Thus, in general, a hyperplane is a set with the equation f(x) = l, where f is a nonzero linear functional. It is a proper affine subspace (plane) which is maximal in the sense that the only affine subspace properly including it is the whole of V. In ℝ³ hyperplanes are ordinary geometric planes, and in ℝ² hyperplanes are lines!

EXERCISES

2.1 Assuming the theorem AB ≈ XY ⟺ b − a = y − x, show that OC is the sum of OA and OB, as defined in the preliminary discussion of Section 1, if and only if c = b + a. Considering also our assumed geometric theorem (3), show that the mapping x ↦ OX from ℝ³ to the vector space of geometric vectors is linear and hence an isomorphism.
2.2 Let L be the line in the Cartesian plane ℝ² with equation x₂ = 3x₁. Express L in parametric form as x = ta for a suitable ordered pair a.
2.3 Let V be any vector space, and let α and β be distinct vectors. Show that the line through α and β has the parametric equation

ξ = tβ + (1 − t)α, t ∈ ℝ.

Show also that the segment from α to β is the image of [0, 1] in the above mapping.
2.4 According to the Pythagorean theorem, a triangle with side lengths a, b, and c has a right angle at the vertex "opposite c" if and only if c² = a² + b². Prove from this that in a Cartesian coordinate system in E³ the length |OX| of a segment OX is given by

|OX|² = Σ₁³ xᵢ²,

where x = ⟨x₁, x₂, x₃⟩ is the coordinate triple of the point X. Next use our geometric theorem (2) to conclude that OX ⊥ OY if and only if (x, y) = 0, where

(x, y) = Σ₁³ xᵢyᵢ.

(Use the bilinearity of (x, y) to expand |X − Y|².)
2.5 More generally, the law of cosines says that in any triangle labeled as indicated,

c² = a² + b² − 2ab cos θ.
Apply this law to the diagram to prove that (x, y) = |x| |y| cos θ, where (x, y) is the scalar product Σ₁³ xᵢyᵢ, |x| = (x, x)^{1/2} = |OX|, etc.
2.6 Given a nonzero linear functional f: ℝ³ → ℝ, and given k ∈ ℝ, show that the set of points X in E³ such that f(x) = k is a plane. [Hint: Find a b in ℝ³ such that f(b) = k, and throw the equation f(x) = k into the form (x − b, a) = 0, etc.]
2.7 Show that for any b in ℝ³ the mapping X ↦ Y from E³ to itself defined by y = x + b is a parallel translation. That is, show that if X ↦ Y and Z ↦ W, then XZ ≈ YW.
2.8 Let M be the set in ℝ³ with equation 3x₁ − x₂ + x₃ = 2. Find triples a and b such that M is the plane through b perpendicular to the direction of a. What is the equation of the plane P = M + ⟨1, 2, 1⟩?
2.9 Continuing the above exercise, what is the condition on the triple b in order for N = M + b to pass through the origin? What is the equation of N?
2.10 Show that if the plane M in ℝ³ has the equation f(x) = l, then M is a translate of the null space N of the linear functional f. Show that any two translates M and P of N are either identical or disjoint. What is the condition on the ordered triple b in order that M + b = M?
2.11 Generalize the above exercise to hyperplanes in ℝⁿ.
2.12 Let N be the subspace (plane through the origin) in ℝ³ with equation f(x) = 0. Let M and P be any two planes obtained from N by parallel translation. Show that Q = M + P is a third such plane. If M and P have the equations f(x) = l₁ and f(x) = l₂, find the equation for Q.
2.13 If M is the plane in ℝ³ with equation f(x) = l, and if r is any nonzero number, show that the set product rM is a plane parallel to M.
2.14 In view of the above two exercises, discuss how we might consider the set of all parallel translates of the plane N with equation f(x) = 0 as forming a new vector space.
2.15 Let L be the subspace (line through the origin) in ℝ³ with parametric equation x = ta. Discuss the set of all parallel translates of L in the spirit of the above three exercises.
2.16 The best object to take as "being" the geometric vector AB is the equivalence class of all directed line segments XY such that XY ≈ AB. Assuming whatever you need from properties (1) through (4), show that this is an equivalence relation on the set of all directed line segments (Section 0.12).
2.17 Assuming that the geometric vector AB is defined as in the above exercise, show that, strictly speaking, it is actually the mapping of the plane (or space) into itself that we have called the parallel translation through AB. Show also that AB + CD is the composition of the two translations.
3. PRODUCT SPACES AND HOM(V, W)

Product spaces. If W is a vector space and A is an arbitrary set, then the set V = W^A of all W-valued functions on A is a vector space in exactly the same way that ℝ^A is. Addition is the natural addition of functions, (f + g)(a) = f(a) + g(a), and, similarly, (xf)(a) = x(f(a)) for every function f and scalar x. Laws A1 through S4 follow just as before and for exactly the same reasons. For variety, let us check the associative law for addition. The equation f + (g + h) = (f + g) + h means that (f + (g + h))(a) = ((f + g) + h)(a) for all a ∈ A. But

(f + (g + h))(a) = f(a) + (g + h)(a) = f(a) + (g(a) + h(a)) = (f(a) + g(a)) + h(a) = (f + g)(a) + h(a) = ((f + g) + h)(a),

where the middle equality in this chain of five holds by the associative law for W and the other four are applications of the definition of addition. Thus the associative law for addition holds in W^A because it holds in W, and the other laws follow in exactly the same way.

As before, we let πᵢ be evaluation at i, so that πᵢ(f) = f(i). Now, however, πᵢ is vector valued rather than scalar valued, because it is a mapping from V to W, and we call it the ith coordinate projection rather than the ith coordinate functional. Again these maps are all linear. In fact, as before, the natural vector operations on W^A are uniquely defined by the requirement that the projections πᵢ all be linear. We call the value f(j) = πⱼ(f) the jth coordinate of the vector f.

Here the analogue of Cartesian n-space is the set W^n̄ of all n-tuples α = ⟨α₁, ..., αₙ⟩ of vectors in W; it is also designated Wⁿ. Clearly, αⱼ is the jth coordinate of the n-tuple α.

There is no reason why we must use the same space W at each index, as we did above. In fact, if W₁, ..., Wₙ are any n vector spaces, then the set of all n-tuples α = ⟨α₁, ..., αₙ⟩ such that αⱼ ∈ Wⱼ for j = 1, ..., n is a vector space under the same definitions of the operations and for the same reasons. That is, the Cartesian product W = W₁ × W₂ × ··· × Wₙ is also a vector space of vector-valued functions. Such finite products will be very important to us. Of course, ℝⁿ is the product Πᵢ Wᵢ with each Wᵢ = ℝ; but ℝⁿ can also be considered ℝᵐ × ℝⁿ⁻ᵐ, or more generally, Π₁ᵏ Wᵢ, where Wᵢ = ℝ^{mᵢ} and Σ₁ᵏ mᵢ = n.

However, the most important use of finite product spaces arises from the fact that the study of certain phenomena on a vector space V may lead in a natural way to a collection {Vᵢ}₁ⁿ of subspaces of V such that V is isomorphic to the product Πᵢ Vᵢ. Then the extra structure that V acquires when we regard it as the product space Πᵢ Vᵢ is used to study the phenomena in question. This is the theory of direct sums, and we shall investigate it in Section 5.

Later in the course we shall need to consider a general Cartesian product of vector spaces.
We remind the reader that if {Wᵢ : i ∈ I} is any indexed collection of vector spaces, then the Cartesian product Π_{i∈I} Wᵢ of these vector spaces is defined as the set of all functions f with domain I such that f(i) ∈ Wᵢ for all i ∈ I (see Section 0.8).

The following is a simple concrete example to keep in mind. Let S be the ordinary unit sphere in ℝ³, S = {x : Σ₁³ xᵢ² = 1}, and for each point x on S let W_x be the subspace of ℝ³ tangent to S at x. By this we mean the subspace (plane through 0) parallel to the tangent plane to S at x, so that the translate W_x + x is the tangent plane (see Fig. 1.8). A function f in the product space W = Π_{x∈S} W_x is a function which assigns to each point x on S a vector in W_x, that is, a vector parallel to the tangent plane to S at x. Such a function is called a vector field on S. Thus the product set W is the set of all vector fields on S, and W itself is a vector space, as the next theorem states.

Fig. 1.8
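A concrete element of this product space (our example, not the text's): since W_x is the plane through 0 with normal x, we may define f on S by

\[
f(x) = \langle -x_2,\; x_1,\; 0 \rangle .
\]

Then (f(x), x) = −x₂x₁ + x₁x₂ + 0 = 0, so f(x) lies in W_x for every x ∈ S, and f is a vector field on S; it vanishes only at the two poles ⟨0, 0, ±1⟩.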
Of course, the jth coordinate projection on W = Π_{i∈I} Wᵢ is evaluation at j, πⱼ(f) = f(j), and the natural vector operations on W are uniquely defined by the requirement that the coordinate projections all be linear. Thus f + g must be that element of W whose value at j, πⱼ(f + g), is πⱼ(f) + πⱼ(g) = f(j) + g(j) for all j ∈ I, and similarly for multiplication by scalars.

Theorem 3.1. The Cartesian product of a collection of vector spaces can be made into a vector space in exactly one way so that the coordinate projections are all linear.

Proof. With the vector operations determined uniquely as above, the proofs of A1 through S4 that we sampled earlier hold verbatim. They did not require that the functions being added have all their values in the same space, but only that the values at a given domain element i all lie in the same space. □

Hom(V, W). Linear transformations have the simple but important properties that the sum of two linear transformations is linear and the composition of two linear transformations is linear. These imprecise statements are in essence the theme of this section, although they need bolstering by conditions on domains and codomains. Their proofs are simple formal algebraic arguments, but the objects being discussed will increase in conceptual complexity.

If W is a vector space and A is any set, we know that the space W^A of all mappings f: A → W is a vector space of functions (now vector valued) in the same way that ℝ^A is. If A is itself a vector space V, we naturally single out for special study the subset of W^V consisting of all linear mappings. We designate this subset Hom(V, W). The following elementary theorems summarize its basic algebraic properties.

Theorem 3.2. Hom(V, W) is a vector subspace of W^V.

Proof. The theorem is an easy formality. If S and T are in Hom(V, W), then

(S + T)(xα + yβ) = S(xα + yβ) + T(xα + yβ) = xS(α) + yS(β) + xT(α) + yT(β) = x(S + T)(α) + y(S + T)(β),

so S + T is linear and Hom(V, W) is closed under addition. The reader should be sure he knows the justification for each step in the above continued equality. The closure of Hom(V, W) under multiplication by scalars follows similarly, and since Hom(V, W) contains the zero transformation, and so is nonempty, it is a subspace. □

Theorem 3.3. The composition of linear maps is linear: if T ∈ Hom(V, W) and S ∈ Hom(W, X), then S ∘ T ∈ Hom(V, X). Moreover, composition is distributive over addition, under the obvious hypotheses on domains and codomains:

(S₁ + S₂) ∘ T = S₁ ∘ T + S₂ ∘ T and S ∘ (T₁ + T₂) = S ∘ T₁ + S ∘ T₂.

Finally, composition commutes with scalar multiplication: c(S ∘ T) = (cS) ∘ T = S ∘ (cT).

Proof. We have

S ∘ T(xα + yβ) = S(T(xα + yβ)) = S(xT(α) + yT(β)) = xS(T(α)) + yS(T(β)) = x(S ∘ T)(α) + y(S ∘ T)(β),

so S ∘ T is linear. The two distributive laws will be left to the reader. □

Corollary. If T ∈ Hom(V, W) is fixed, then composition on the right by T is a linear transformation from the vector space Hom(W, X) to the vector space Hom(V, X). It is an isomorphism if T is an isomorphism.

Proof. The algebraic properties of composition stated in the theorem can be combined as follows:

(c₁S₁ + c₂S₂) ∘ T = c₁(S₁ ∘ T) + c₂(S₂ ∘ T),
S ∘ (c₁T₁ + c₂T₂) = c₁(S ∘ T₁) + c₂(S ∘ T₂).

The first equation says exactly that composition on the right by a fixed T is a linear transformation. (Write S ∘ T as ℑ(S) if the equations still don't look right.) If T is an isomorphism, then composition by T⁻¹ "undoes" composition by T, and so is its inverse. □

The second equation implies a similar corollary about composition on the left by a fixed S.
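A minimal numerical check of the first of these combined equations (the maps are our own choosing): let T ∈ Hom(ℝ, ℝ²) be t ↦ ⟨t, 2t⟩, and let S₁, S₂ ∈ Hom(ℝ², ℝ) be the functionals S₁(y) = y₁ + y₂ and S₂(y) = y₁ − y₂. Then

\[
(S_1 \circ T)(t) = 3t, \qquad (S_2 \circ T)(t) = -t, \qquad ((S_1 + S_2) \circ T)(t) = 2t = (S_1 \circ T)(t) + (S_2 \circ T)(t),
\]

exactly as asserted, since (S₁ + S₂)(y) = 2y₁.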
Theorem 3.4. If W is a product vector space, W = Πᵢ Wᵢ, then a mapping T from a vector space V to W is linear if and only if πᵢ ∘ T is linear for each coordinate projection πᵢ.

Proof. If T is linear, then πᵢ ∘ T is linear by the above theorem. Now suppose, conversely, that all the maps πᵢ ∘ T are linear. Then

πᵢ(T(xα + yβ)) = πᵢ ∘ T(xα + yβ) = x(πᵢ ∘ T)(α) + y(πᵢ ∘ T)(β) = xπᵢ(T(α)) + yπᵢ(T(β)) = πᵢ(xT(α) + yT(β)).

But if πᵢ(f) = πᵢ(g) for all i, then f = g. Therefore, T(xα + yβ) = xT(α) + yT(β), and T is linear. □

If T is a linear mapping from ℝⁿ to W whose skeleton is {βⱼ}₁ⁿ, then πᵢ ∘ T has skeleton {πᵢ(βⱼ)}ⱼ₌₁ⁿ. If W is ℝᵐ, then πᵢ is the ith coordinate functional y ↦ yᵢ, and βⱼ is the jth column in the matrix t = {tᵢⱼ} of T. Thus πᵢ(βⱼ) = tᵢⱼ, and πᵢ ∘ T is the linear functional whose skeleton is the ith row of the matrix of T. In the discussion centering around Theorem 1.3, we replaced the vector equation y = T(x) by the equivalent set of m scalar equations yᵢ = Σⱼ₌₁ⁿ tᵢⱼxⱼ, which we obtained by reading off the ith coordinate in the vector equation. But in "reading off" the ith coordinate we were applying the coordinate mapping πᵢ, or in more algebraic terms, we were replacing the linear map T by the set of linear maps {πᵢ ∘ T}, which is equivalent to it by the above theorem.

Now consider in particular the space Hom(V, V), which we may as well designate 'Hom(V)'. In addition to being a vector space, it is also closed under composition, which we consider a multiplication operation. Since composition of functions is always associative (see Section 0.9), we thus have for multiplication the laws

A ∘ (B ∘ C) = (A ∘ B) ∘ C,
A ∘ (B + C) = (A ∘ B) + (A ∘ C),
(A + B) ∘ C = (A ∘ C) + (B ∘ C),
k(A ∘ B) = (kA) ∘ B = A ∘ (kB).

Any vector space which has in addition to the vector operations an operation of multiplication related to the vector operations in the above ways is called an algebra. Thus,

Theorem 3.5. Hom(V) is an algebra.

We noticed earlier that certain real-valued function spaces are also algebras. Examples were ℝ^A and C([0, 1]). In these cases multiplication is commutative, but in the case of Hom(V) multiplication is not commutative unless V is a trivial space (V = {0}) or V is isomorphic to ℝ. We shall check this later when we examine the finite-dimensional theory in greater detail.
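The failure of commutativity is already visible in Hom(ℝ²) with maps of our own choosing: let S be the coordinate swap ⟨x₁, x₂⟩ ↦ ⟨x₂, x₁⟩ and T the projection ⟨x₁, x₂⟩ ↦ ⟨x₁, 0⟩. Then

\[
(S \circ T)(x) = \langle 0, x_1 \rangle, \qquad (T \circ S)(x) = \langle x_2, 0 \rangle,
\]

and at x = ⟨1, 0⟩ these give ⟨0, 1⟩ and ⟨0, 0⟩ respectively, so S ∘ T ≠ T ∘ S.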
Product projections and injections. In addition to the coordinate projections, there is a second class of simple linear mappings that is of basic importance in the handling of a Cartesian product space W = Π_{k∈K} Wₖ. These are, for each j, the mapping θⱼ taking a vector α ∈ Wⱼ to the function in the product space having the value α at the index j and 0 elsewhere. For example, θ₂ for W₁ × W₂ × W₃ is the mapping α ↦ ⟨0, α, 0⟩ from W₂ to W. Or if we view ℝ³ as ℝ × ℝ², then θ₂ is the mapping ⟨x₂, x₃⟩ ↦ ⟨0, ⟨x₂, x₃⟩⟩ = ⟨0, x₂, x₃⟩. We call θⱼ the injection of Wⱼ into Πₖ Wₖ. The linearity of θⱼ is probably obvious.

The mappings πⱼ and θⱼ are clearly connected, and the following projection-injection identities state their exact relationship. If Iⱼ is the identity transformation on Wⱼ, then

πⱼ ∘ θⱼ = Iⱼ and πᵢ ∘ θⱼ = 0 if i ≠ j.

If K is finite and I is the identity on the product space W, then

Σ_{k∈K} θₖ ∘ πₖ = I.

In the case Π₁³ Wᵢ, we have θ₂ ∘ π₂(⟨α₁, α₂, α₃⟩) = ⟨0, α₂, 0⟩, and the identity simply says that

⟨α₁, 0, 0⟩ + ⟨0, α₂, 0⟩ + ⟨0, 0, α₃⟩ = ⟨α₁, α₂, α₃⟩

for all α₁, α₂, α₃. These identities will probably be clear to the reader, and we leave the formal proofs as an exercise.

The coordinate projections πⱼ are useful in the study of any product space, but because of the limitation in the above identity, the injections θⱼ are of interest principally in the case of finite products. Together they enable us to decompose and reassemble linear maps whose domains or codomains are finite product spaces. For a simple example, consider the T in Hom(ℝ³, ℝ²) whose matrix is

[2 −1 1]
[1  1 4]

Then π₁ ∘ T is the linear functional whose skeleton ⟨2, −1, 1⟩ is the first row in the matrix of T, and we know that we can visualize its expression in equation form, y₁ = 2x₁ − x₂ + x₃, as being obtained from the vector equation y = T(x) by "reading off the first row". Thus we "decompose" T into the two linear functionals lᵢ = πᵢ ∘ T. Then, speaking loosely, we have the reassembly T = ⟨l₁, l₂⟩; more exactly,

T(x) = ⟨2x₁ − x₂ + x₃, x₁ + x₂ + 4x₃⟩ = ⟨l₁(x), l₂(x)⟩

for all x. However, we want to present this reassembly as the action of the linear maps θ₁ and θ₂. We have

(θ₁ ∘ l₁ + θ₂ ∘ l₂)(x) = ⟨l₁(x), 0⟩ + ⟨0, l₂(x)⟩ = ⟨l₁(x), l₂(x)⟩ = T(x),

which shows that the decomposition and reassembly of T is an expression of the identity Σ θᵢ ∘ πᵢ = I.

In general, if T ∈ Hom(V, W) and W = Πᵢ Wᵢ, then Tᵢ = πᵢ ∘ T is in Hom(V, Wᵢ) for each i, and Tᵢ can be considered "the part of T going into Wᵢ", since Tᵢ(α) is the ith coordinate of T(α) for each α.
Then we can reassemble the Tᵢ's to form T again by T = Σ θᵢ ∘ Tᵢ, for

Σ θᵢ ∘ Tᵢ = (Σ θᵢ ∘ πᵢ) ∘ T = I ∘ T = T.

Moreover, any finite collection of Tᵢ's on a common domain can be put together in this way to make a T. For example, we can assemble an m-tuple {Tᵢ}₁ᵐ of linear maps on a common domain V to form a single m-tuple-valued linear map T. Given α in V, we simply define T(α) as that m-tuple whose ith coordinate is Tᵢ(α) for i = 1, ..., m, and then check that T is linear. Thus without having to calculate, we see from this assembly principle that T: x ↦ ⟨2x₁ − x₂ + x₃, x₁ + x₂ + 4x₃⟩ is a linear mapping from ℝ³ to ℝ², since we have formed T by assembling the two linear functionals l₁(x) = 2x₁ − x₂ + x₃ and l₂(x) = x₁ + x₂ + 4x₃ to form a single ordered-pair-valued map. This very intuitive process has an equally simple formal justification. We rigorize our discussion in the following theorem.

Theorem 3.6. If Tᵢ is in Hom(V, Wᵢ) for each i in a finite index set I, and if W is the product space Π_{i∈I} Wᵢ, then there is a uniquely determined T in Hom(V, W) such that Tᵢ = πᵢ ∘ T for all i in I.

Proof. If T exists such that Tᵢ = πᵢ ∘ T for each i, then

T = I_W ∘ T = (Σ θᵢ ∘ πᵢ) ∘ T = Σ θᵢ ∘ (πᵢ ∘ T) = Σ θᵢ ∘ Tᵢ.

Thus T is uniquely determined as Σ θᵢ ∘ Tᵢ. Moreover, this T does have the required property, since then

πⱼ ∘ T = πⱼ ∘ (Σᵢ θᵢ ∘ Tᵢ) = Σᵢ (πⱼ ∘ θᵢ) ∘ Tᵢ = Iⱼ ∘ Tⱼ = Tⱼ. □

In the same way, we can decompose a linear T whose domain is a product space V = Πⱼ₌₁ⁿ Vⱼ into the maps Tⱼ = T ∘ θⱼ with domains Vⱼ, and then reassemble these maps to form T by the identity T = Σⱼ₌₁ⁿ Tⱼ ∘ πⱼ (check it mentally!). Moreover, a finite collection of maps into a common codomain space can be put together to form a single map on the product of the domain spaces. Thus an n-tuple of maps {Tᵢ}₁ⁿ into W defines a single map T into W, where the domain of T is the product of the domains of the Tᵢ's, by the equation T(⟨α₁, ..., αₙ⟩) = Σ₁ⁿ Tᵢ(αᵢ), or T = Σ₁ⁿ Tᵢ ∘ πᵢ. For example, if T₁: ℝ → ℝ² is the map t ↦ t⟨2, 1⟩ = ⟨2t, t⟩, and T₂ and T₃ are similarly the maps t ↦ t⟨−1, 1⟩ and t ↦ t⟨1, 4⟩, then T = Σ₁³ Tᵢ ∘ πᵢ is the mapping from ℝ³ to ℝ² whose matrix is

[2 −1 1]
[1  1 4]

Again there is a simple formal argument, and we shall ask the reader to write out the proof of the following theorem.

Theorem 3.7. If Tⱼ is in Hom(Vⱼ, W) for each j in a finite index set J, and if V = Π_{j∈J} Vⱼ, then there exists a unique T in Hom(V, W) such that T ∘ θⱼ = Tⱼ for each j in J.

Finally, we should mention that Theorem 3.6 holds for all product spaces, finite or not, and states a property that characterizes product spaces.
We shall investigate this situation in the exercises. The proof of the general case of Theorem 3.6 has to get along without the injections θⱼ; instead, it is an application of Theorem 3.4.

The reader may feel that we are being overly formal in using the projections πᵢ and the injections θᵢ to give algebraic formulations of processes that are easily visualized directly, such as reading off the scalar "components" of a vector equation. However, the mappings

x ↦ xᵢ and xᵢ ↦ ⟨0, ..., 0, xᵢ, 0, ..., 0⟩

are clearly fundamental devices, and making their relationships explicit now will be helpful to us later on when we have to handle their occurrences in more complicated situations.

EXERCISES

3.1 Show that ℝᵐ × ℝⁿ is isomorphic to ℝⁿ⁺ᵐ.
3.2 Show more generally that if Σ₁ᵏ nᵢ = n, then Πᵢ₌₁ᵏ ℝ^{nᵢ} is isomorphic to ℝⁿ.
3.3 Show that if {B, C} is a partitioning of A, then ℝ^A and ℝ^B × ℝ^C are isomorphic.
3.4 Generalize the above to the case where {Aᵢ}₁ⁿ partitions A.
3.5 Show that a mapping T from a vector space V to a vector space W is linear if and only if (the graph of) T is a subspace of V × W.
3.6 Let S and T be nonzero linear maps from V to W. The definition of the map S + T is not the same as the set sum of (the graphs of) S and T as subspaces of V × W. Show that the set sum of (the graphs of) S and T cannot be a graph unless S = T.
3.7 Give the justification for each step of the calculation in Theorem 3.2.
3.8 Prove the distributive laws given in Theorem 3.3.
3.9 Let D: C¹([a, b]) → C([a, b]) be differentiation, and let S: C([a, b]) → ℝ be the definite integral map f ↦ ∫_a^b f. Compute the composition S ∘ D.
3.10 We know that the general linear functional F on ℝ² is the map x ↦ a₁x₁ + a₂x₂ determined by the pair a in ℝ², and that the general linear map T in Hom(ℝ²) is determined by a matrix

t = [t₁₁ t₁₂]
    [t₂₁ t₂₂]

Then F ∘ T is another linear functional, and hence is of the form x ↦ b₁x₁ + b₂x₂ for some b in ℝ². Compute b from t and a. Your computation should show you that a ↦ b is linear. What is its matrix?
3.11 Given S and T in Hom(ℝ²) whose matrices are

[· ·]      [· ·]
[· ·] and  [· ·]

respectively, find the matrix of S ∘ T in Hom(ℝ²).
3.12 Given S and T in Hom(ℝ²) whose matrices are

s = [s₁₁ s₁₂]    and    t = [t₁₁ t₁₂]
    [s₂₁ s₂₂]            [t₂₁ t₂₂]

find the matrix of S ∘ T.
3.13 With the above answer in mind, what would you guess the matrix of S ∘ T is if S and T are in Hom(ℝ³)? Verify your guess.
3.14 We know that if T ∈ Hom(V, W) is an isomorphism, then T⁻¹ is an isomorphism in Hom(W, V). Prove that

S ∘ T surjective ⟹ S surjective,
S ∘ T injective ⟹ T injective,

and, therefore, that if T ∈ Hom(V, W), S ∈ Hom(W, V), S ∘ T = I_V, and T ∘ S = I_W, then T is an isomorphism.
3.15 Show that if S⁻¹ and T⁻¹ exist, then (S ∘ T)⁻¹ exists and equals T⁻¹ ∘ S⁻¹. Give a more careful statement of this result.
3.16 Show that if S and T in Hom V commute with each other, then the null space of T, N = N(T), and its range R = R(T) are invariant under S (S[N] ⊂ N and S[R] ⊂ R).
3.17 Show that if α is an eigenvector of T and S commutes with T, then S(α) is an eigenvector of T and has the same eigenvalue.
3.18 Show that if S commutes with T and T⁻¹ exists, then S commutes with T⁻¹.
3.19 Given that α is an eigenvector of T with eigenvalue x, show that α is also an eigenvector of T² = T ∘ T, of Tⁿ, and of T⁻¹ (if T is invertible) and that the corresponding eigenvalues are x², xⁿ, and 1/x. Given that p(t) is a polynomial in t, define the operator p(T), and under the above hypotheses, show that α is an eigenvector of p(T) with eigenvalue p(x).
3.20 If S and T are in Hom V, we say that S doubly commutes with T (and write S cc T) if S commutes with every A in Hom V which commutes with T. Fix T, and set {T}″ = {S : S cc T}. Show that {T}″ is a commutative subalgebra of Hom V.
3.21 Given T in Hom V and α in V, let N be the linear span of the "trajectory of α under T" (the set {Tⁿα : n ∈ ℤ⁺}). Show that N is invariant under T.
3.22 A transformation T in Hom V such that Tⁿ = 0 for some n is said to be nilpotent. Show that if T is nilpotent, then I − T is invertible. [Hint: The power series 1/(1 − x) = Σ₀^∞ xⁿ is a finite sum if x is replaced by T.]
3.23 Suppose that T is nilpotent, that S commutes with T, and that S⁻¹ exists, where S, T ∈ Hom V. Show that (S − T)⁻¹ exists.
3.24 Let φ be an isomorphism from a vector space V to a vector space W. Show that T ↦ φ ∘ T ∘ φ⁻¹ is an algebra isomorphism from the algebra Hom V to the algebra Hom W.
3.25 Show the πⱼ's and θⱼ's explicitly for ℝ³ = ℝ × ℝ × ℝ, using the stopped-arrow notation. Also write out the identity Σ θⱼ ∘ πⱼ = I in explicit form.
3.26 Do the same for ℝ⁵ = ℝ² × ℝ³.
3.27 Show that the first two projection-injection identities (πᵢ ∘ θᵢ = Iᵢ and πⱼ ∘ θᵢ = 0 if j ≠ i) are simply a restatement of the definition of θᵢ. Show that the linearity of θᵢ follows formally from these identities and Theorem 3.4.
3.28 Prove the identity Σ θᵢ ∘ πᵢ = I by applying πⱼ to the equation and remembering that f = g if πⱼ(f) = πⱼ(g) for all j (this being just the equation f(j) = g(j) for all j).
3.29 Prove the general case of Theorem 3.6. We are given an indexed collection of linear maps {Tᵢ : i ∈ I} with common domain V and codomains {Wᵢ : i ∈ I}. The first question is how to define T: V → W = Πᵢ Wᵢ. Do this by defining T(ξ) suitably for each ξ ∈ V, and then applying Theorem 3.4 to conclude that T is linear.
3.30 Prove Theorem 3.7.
3.31 We know without calculation that the map from ℝ³ to ℝ⁴ is linear. Why? (Cite relevant theorems from the text.)
3.32 Write down the matrix for the transformation T in the above example, and then write down the mappings T ∘ θᵢ from ℝ to ℝ⁴ (for i = 1, 2, 3) in explicit ordered quadruplet form.
3.33 Let W = Π₁ⁿ Wᵢ be a finite product vector space, and set Pᵢ = θᵢ ∘ πᵢ, so that Pᵢ is in Hom W for all i. Prove from the projection-injection identities that Σ₁ⁿ Pᵢ = I (the identity map on W), Pᵢ ∘ Pⱼ = 0 if i ≠ j, and Pᵢ ∘ Pᵢ = Pᵢ. Identify the range Rᵢ = R(Pᵢ).
3.34 In the context of the above exercise, define T in Hom W as

T = Σ₁ⁿ iPᵢ.

Show that α is an eigenvector of T if and only if α is in one of the subspaces Rᵢ and that then the eigenvalue of α is i.
3.35 In the same situation show that the polynomial

Πⱼ₌₁ⁿ (T − jI) = (T − I) ∘ ··· ∘ (T − nI)

is the zero transformation.
3.36 Theorems 3.6 and 3.7 can be combined if T ∈ Hom(V, W), where both V and W are product spaces:

V = Πⱼ₌₁ⁿ Vⱼ and W = Πᵢ₌₁ᵐ Wᵢ.

State and prove a theorem which says that such a T can be decomposed into a doubly indexed family {Tᵢⱼ}, where Tᵢⱼ ∈ Hom(Vⱼ, Wᵢ), and conversely that any such doubly indexed family can be assembled to form a single T from V to W.
3.37 Apply your theorem to the special case where V = ℝⁿ and W = ℝᵐ (that is, Vⱼ = Wᵢ = ℝ for all i and j). Now Tᵢⱼ is from ℝ to ℝ and hence is simply multiplication by a number tᵢⱼ. Show that the indexed collection {tᵢⱼ} of these numbers is the matrix of T.
3.38 Given an m-tuple of vector spaces {Wᵢ}₁ᵐ, suppose that there are a vector space X and maps pᵢ in Hom(X, Wᵢ), i = 1, ..., m, with the following property:

P. For any m-tuple of linear maps {Tᵢ} from a common domain space V to the above spaces Wᵢ (so that Tᵢ ∈ Hom(V, Wᵢ), i = 1, ..., m), there is a unique T in Hom(V, X) such that Tᵢ = pᵢ ∘ T, i = 1, ..., m.

Prove that there is a "canonical" isomorphism from W = Π₁ᵐ Wᵢ to X under which the given maps pᵢ become the projections πᵢ. [Remark: The product space W itself has property P by Theorem 3.6, and this exercise therefore shows that P is an abstract characterization of the product space.]

4. AFFINE SUBSPACES AND QUOTIENT SPACES

In this section we shall look at the "planes" in a vector space V and see what happens to them when we translate them, intersect them with each other, take their images under linear maps, and so on. Then we shall confine ourselves to the set of all planes that are translates of a fixed subspace and discover that this set itself is a vector space in the most obvious way. Some of this material has been anticipated in Section 2.

Affine subspaces. If N is a subspace of a vector space V and α is any vector of V, then the set N + α = {ξ + α : ξ ∈ N} is called either the coset of N containing α or the affine subspace of V through α and parallel to N. The set N + α is also called the translate of N through α. We saw in Section 2 that affine subspaces are the general objects that we want to call planes. If N is given and fixed in a discussion, we shall use the notation ᾱ = N + α (see Section 0.12).

We begin with a list of some simple properties of affine subspaces. Some of these will generalize observations already made in Section 2, and the proofs of some will be left as exercises.

1) With a fixed subspace N assumed, if γ ∈ ᾱ, then γ̄ = ᾱ. For if γ = α + η₀, then γ + η = α + (η₀ + η) ∈ ᾱ, so γ̄ ⊂ ᾱ. Also α + η = γ + (η − η₀) ∈ γ̄, so ᾱ ⊂ γ̄. Thus ᾱ = γ̄.

2) With N fixed, for any α and β, either ᾱ = β̄ or ᾱ and β̄ are disjoint. For if ᾱ and β̄ are not disjoint, then there exists a γ in each, and ᾱ = γ̄ = β̄ by (1). The reader may find it illuminating to compare these calculations with the more general ones of Section 0.12. Here α ~ β if and only if α − β ∈ N.

3) Now let 𝒜 be the collection of all affine subspaces of V; 𝒜 is thus the set of all cosets of all vector subspaces of V. Then the intersection of any subfamily of 𝒜 is either empty or itself an affine subspace.
In fact, if {Aᵢ}_{i∈I} is an indexed collection of affine subspaces and Aᵢ is a coset of the vector subspace Wᵢ for each i ∈ I, then ∩_{i∈I} Aᵢ is either empty or a coset of the vector subspace ∩_{i∈I} Wᵢ. For if β ∈ ∩_{i∈I} Aᵢ, then (1) implies that Aᵢ = β + Wᵢ for all i, and then ∩ Aᵢ = β + ∩ Wᵢ.

4) If A, B ∈ 𝒜, then A + B ∈ 𝒜. That is, the set sum of any two affine subspaces is itself an affine subspace.

5) If A ∈ 𝒜 and T ∈ Hom(V, W), then T[A] is an affine subspace of W. In particular, if t ∈ ℝ, then tA ∈ 𝒜.

6) If B is an affine subspace of W and T ∈ Hom(V, W), then T⁻¹[B] is either empty or an affine subspace of V.

7) For a fixed α ∈ V the translation of V through α is the mapping S_α: V → V defined by S_α(ξ) = ξ + α for all ξ ∈ V. Translation is not linear; for example, S_α(0) = α. It is clear, however, that translation carries affine subspaces into affine subspaces. Thus S_α(A) = A + α and S_α(β + W) = (α + β) + W.

8) An affine transformation from a vector space V to a vector space W is a linear mapping from V to W followed by a translation in W. Thus an affine transformation is of the form ξ ↦ T(ξ) + β, where T ∈ Hom(V, W) and β ∈ W. Note that ξ ↦ T(ξ + α) is affine, since T(ξ + α) = T(ξ) + β, where β = T(α). It follows from (5) and (7) that an affine transformation carries affine subspaces of V into affine subspaces of W.

Quotient space. Now fix a subspace N of V, and consider the set W of all translates (cosets) of N. We are going to see that W itself is a vector space in the most natural way possible. Addition will be set addition, and scalar multiplication will be set multiplication (except in one special case). For example, if N is a line through the origin in ℝ³, then W consists of all lines in ℝ³ parallel to N. We are saying that this set of parallel lines will automatically turn out to be a vector space: the set sum of any two of the lines in W turns out to be a line in W! And if L ∈ W and t ≠ 0, then the set product tL is a line in W. The translates of L fiber ℝ³, and the set of fibers is a natural vector space.

During this discussion it will be helpful temporarily to indicate set sums by '+ₛ' and set products by '·ₛ'. With N fixed, it follows from (2) above that two cosets are disjoint or identical, so that the set W of all cosets is a fibering of V in the general case, just as it was in our example of the parallel lines. From (4) or by a direct calculation we know that ᾱ +ₛ β̄ = (α + β)‾. Thus W is closed under set addition, and, naturally, we take this to be our operation of addition on W. That is, we define + on W by ᾱ + β̄ = ᾱ +ₛ β̄.
Then the natural map π: α ↦ ᾱ from V to W preserves addition, π(α + β) = π(α) + π(β), since this is just our equation ᾱ + β̄ = (α + β)‾ above. Similarly, if t ∈ ℝ, then the set product t ·ₛ ᾱ is either (tα)‾ or {0}. Hence if we define tᾱ as the set product when t ≠ 0 and as 0̄ = N when t = 0, then π also preserves scalar multiplication, π(tα) = tπ(α).

We thus have two vectorlike operations on the set W of all cosets of N, and we naturally expect W to turn out to be a vector space. We could prove this by verifying all the laws, but it is more elegant to notice the general setting for such a verification proof.

Theorem 4.1. Let V be a vector space, and let W be a set having two vectorlike operations, which we designate in the usual way. Suppose that there exists a surjective mapping T: V → W which preserves the operations: T(sα + tβ) = sT(α) + tT(β). Then W is a vector space.

Proof. We have to check laws A1 through S4. However, one example should make it clear to the reader how to proceed. We show that T(0) satisfies A3 and hence is the zero vector of W. Since every β ∈ W is of the form T(α), we have T(0) + β = T(0) + T(α) = T(0 + α) = T(α) = β, which is A3. We shall ask the reader to check more of the laws in the exercises. □

Theorem 4.2. The cosets of a fixed subspace N of a vector space V themselves form a vector space, called the quotient space V/N, under the above natural operations, and the projection π is a surjective linear map from V to V/N.

Theorem 4.3. If T is in Hom(V, W), and if the null space of T includes the subspace M ⊂ V, then T has a unique factorization through V/M. That is, there exists a unique transformation S in Hom(V/M, W) such that T = S ∘ π.

Proof. Since T is zero on M, it follows that T is constant on each coset A of M, so that T[A] contains only one vector. If we define S(A) to be the unique vector in T[A], then S(ᾱ) = T(α), so S ∘ π = T by definition. Conversely, if T = R ∘ π, then R(ᾱ) = R ∘ π(α) = T(α), and R is our above S. The linearity of S is practically obvious. Thus

S(ᾱ + β̄) = S((α + β)‾) = T(α + β) = T(α) + T(β) = S(ᾱ) + S(β̄),

and homogeneity follows similarly. This completes the proof. □
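A concrete instance of the factorization (our example, not the text's): in V = ℝ², let N = {⟨t, 0⟩ : t ∈ ℝ}, so that the cosets ᾱ = N + α are the horizontal lines, and let T ∈ Hom(ℝ², ℝ) be T(x) = x₂. Then T is zero on N, and the factorization T = S ∘ π is given by

\[
S(\bar{\alpha}) = \alpha_2 ,
\]

the common height of the line ᾱ. Here S is in fact an isomorphism of ℝ²/N with ℝ, since N is exactly the null space of T (cf. Exercise 4.5 below).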
One more remark is of interest here. If N is invariant under a linear map T in Hom V (that is, T[N] ⊂ N), then for each α in V, T[ᾱ] is a subset of the coset (T(α))‾, for

T[ᾱ] = T[α + N] = T(α) +ₛ T[N] ⊂ T(α) +ₛ N = (T(α))‾.

There is therefore a map S: V/N → V/N defined by the requirement that S(ᾱ) = (T(α))‾ (or S ∘ π = π ∘ T), and it is easy to check that S is linear. Therefore,

Theorem 4.4. If a subspace N of a vector space V is carried into itself by a transformation T in Hom V, then there is a unique transformation S in Hom(V/N) such that S ∘ π = π ∘ T.

EXERCISES

4.1 Prove properties (4), (5), and (6) of affine subspaces.
4.2 Choose an origin O in the Euclidean plane P (your sheet of paper), and let L₁ and L₂ be two parallel lines not containing O. Let X and Y be distinct points on L₁ and Z any point on L₂. Draw the figure giving the geometric sums OX + OZ and OY + OZ (parallelogram rule), and state the theorem from plane geometry that says that these two sum points are on a third line L₃ parallel to L₁ and L₂.
4.3 a) Prove the associative law for addition for Theorem 4.1.
b) Prove also laws A4 and S2.
4.4 Return now to Exercise 2.1 and reexamine the situation in the light of Theorem 4.1. Show, finally, how we really know that the geometric vectors form a vector space.
4.5 Prove that the mapping S of Theorem 4.3 is injective if and only if N is the null space of T.
4.6 We know from Exercise 4.5 that if T is a surjective element of Hom(V, W) and N is the null space of T, then the S of Theorem 4.3 is an isomorphism from V/N to W. Its inverse S⁻¹ assigns a coset of N to each η in W. Show that the process of "indefinite integration" is an example of such a map S⁻¹. This is the process of calculating an integral and adding an arbitrary constant, as in ∫ sin x dx = −cos x + c.
4.7 Suppose that N and M are subspaces of a vector space V and that N ⊂ M. Show that then M/N is a subspace of V/N and that V/M is naturally isomorphic to the quotient space (V/N)/(M/N). [Hint: Every coset of N is a subset of some coset of M.]
4.8 Suppose that N and M are any subspaces of a vector space V. Prove that (M + N)/N is naturally isomorphic to M/(M ∩ N). (Start with the fact that each coset of M ∩ N is included in a unique coset of N.)
4.9 Prove that the map S of Theorem 4.4 is linear.
4.10 Given T ∈ Hom V, show that T² = 0 (T² = T ∘ T) if and only if R(T) ⊂ N(T).
4.11 Suppose that T ∈ Hom V and the subspace N are such that T is the identity on N and also on V/N. The latter assumption is that the S of Theorem 4.4 is the identity on V/N. Set R = T − I, and use the above exercise to show that R² = 0. Show that if T = I + R and R² = 0, then there is a subspace N such that T is the identity on N and also on V/N.
4.12 We now view the above situation a little differently. Supposing that T is the identity on N and on V/N, and setting R = T − I, show that there exists a K ∈ Hom(V/N, V) such that R = K ∘ π. Show that for any coset A of N the action of T on A can be viewed as translation through K(A). That is, if ξ ∈ A and η = K(A), then T(ξ) = ξ + η.

4.13 Consider the map T: ⟨x₁, x₂⟩ ↦ ⟨x₁ + 2x₂, x₂⟩ in Hom ℝ², and let N be the null space of R = T − I. Identify N and show that T is the identity on N and on ℝ²/N. Find the map K of the above exercise. Such a mapping T is called a shear transformation of ℝ² parallel to N. Draw the unit square and its image under T.

4.14 If we remember that the linear span L(A) of a subset A of a vector space V can be defined as the intersection of all the subspaces of V that include A, then the fact that the intersection of any collection of affine subspaces of a vector space V is either an affine subspace or empty suggests that we define the affine span M(A) of a nonempty subset A ⊂ V as the intersection of all affine subspaces including A. Then we know from (3) in our list of affine properties that M(A) is an affine subspace, and by its definition above that it is the smallest affine subspace including A. We now naturally wonder whether M(A) can be directly described in terms of linear combinations. Show first that if α ∈ A, then M(A) = L(A − α) + α; then prove that M(A) is the set of all linear combinations Σ xᵢαᵢ on A such that Σ xᵢ = 1.

4.15 Show that the linear span of a set B is the affine span of B ∪ {0}.

4.16 Show that M(A + γ) = M(A) + γ for any γ in V and that M(xA) = xM(A) for any x in ℝ.

5. DIRECT SUMS

We come now to the heart of the chapter. It frequently happens that the study of some phenomenon on a vector space V leads to a finite collection of subspaces {Vᵢ} such that V is naturally isomorphic to the product space Πᵢ Vᵢ. Under this isomorphism the maps θᵢ ∘ πᵢ on the product space become certain maps Pᵢ in Hom V, and the projection–injection identities are reflected in the identities Σ Pᵢ = I, Pⱼ ∘ Pⱼ = Pⱼ for all j, and Pᵢ ∘ Pⱼ = 0 if i ≠ j. Also, Vᵢ = range Pᵢ. The product structure that V thus acquires is then used to study the phenomenon that gave rise to it. For example, this is the way that we unravel the structure of a linear transformation in Hom V, the study of which is one of the central problems of linear algebra.

Direct sums. If V₁, ..., Vₙ are subspaces of the vector space V, then the mapping π: ⟨α₁, ..., αₙ⟩ ↦ Σ₁ⁿ αᵢ is a linear transformation from Π₁ⁿ Vᵢ to V, since it is the sum π = Σ₁ⁿ πᵢ of the coordinate projections.

Definition. We shall say that the Vᵢ's are independent if π is injective, and that V is the direct sum of the Vᵢ's if π is an isomorphism. We express the latter relationship by writing V = V₁ ⊕ ⋯ ⊕ Vₙ = ⊕₁ⁿ Vᵢ.

Thus V = ⊕ᵢ₌₁ⁿ Vᵢ if and only if π is injective and surjective, i.e., if and only if the subspaces {Vᵢ}₁ⁿ are both independent and span V. A useful restatement
of the direct sum condition is that each α ∈ V is uniquely expressible as a sum Σ₁ⁿ αᵢ, with αᵢ ∈ Vᵢ for all i; α has some such expression because the Vᵢ's span V, and the expression is unique by their independence.

For example, let V = 𝒞(ℝ) be the space of real-valued continuous functions on ℝ, let Vₑ be the subset of even functions (functions f such that f(−x) = f(x) for all x), and let Vₒ be the subset of odd functions (functions such that f(−x) = −f(x) for all x). It is clear that Vₑ and Vₒ are subspaces of V, and we claim that V = Vₑ ⊕ Vₒ. To see this, note that for any f in V, g(x) = (f(x) + f(−x))/2 is even, h(x) = (f(x) − f(−x))/2 is odd, and f = g + h. Thus V = Vₑ + Vₒ. Moreover, this decomposition of f is unique, for if f = g₁ + h₁ also, where g₁ is even and h₁ is odd, then g − g₁ = h₁ − h, and therefore g − g₁ = 0 = h₁ − h, since the only function that is both even and odd is zero. The even–odd components of eˣ are the hyperbolic cosine and sine functions:

eˣ = (eˣ + e⁻ˣ)/2 + (eˣ − e⁻ˣ)/2 = cosh x + sinh x.

Since π is injective if and only if its null space is {0} (Lemma 1.1), we have:

Lemma 5.1. The independence of the subspaces {Vᵢ}₁ⁿ is equivalent to the property that if αᵢ ∈ Vᵢ for all i and Σ₁ⁿ αᵢ = 0, then αᵢ = 0 for all i.

Corollary. If the subspaces {Vᵢ}₁ⁿ are independent, αᵢ ∈ Vᵢ for all i, and Σ₁ⁿ αᵢ is an element of Vⱼ, then αᵢ = 0 for i ≠ j.

We leave the proof to the reader. The case of two subspaces is particularly simple.

Lemma 5.2. The subspaces M and N of V are independent if and only if M ∩ N = {0}.

Proof. If α ∈ M, β ∈ N, and α + β = 0, then α = −β ∈ M ∩ N. If M ∩ N = {0}, this will further imply that α = β = 0, so M and N are independent. On the other hand, if 0 ≠ β ∈ M ∩ N, and if we set α = −β, then α ∈ M, β ∈ N, and α + β = 0, so M and N are not independent. □

Note that the first argument above is simply the general form of the uniqueness argument we gave earlier for the even–odd decomposition of a function on ℝ.

Corollary. V = M ⊕ N if and only if V = M + N and M ∩ N = {0}.

Definition. If V = M ⊕ N, then M and N are called complementary subspaces, and each is a complement of the other.

Warning: A subspace M of V does not have a unique complementary subspace unless M is trivial (that is, M = {0} or M = V). If we view ℝ³ as coordinatized Euclidean 3-space, then M is a proper subspace if and only if M is a plane containing the origin or M is a line through the origin (see Fig. 1.9). If M and N are
proper subspaces one of which is a plane and the other a line not lying in that plane, then M and N are complementary subspaces. Moreover, these are the only nontrivial complementary pairs in ℝ³. The reader will be asked to prove some of these facts in the exercises, and they will all be clear by the middle of the next chapter.

[Fig. 1.9: ℝ³ = N ⊕ L, with a vector ξ decomposed as ξ = η + λ.]

The following lemma is technically useful.

Lemma 5.3. If V₁ and V₀ are independent subspaces of V and {Vᵢ}₂ⁿ are independent subspaces of V₀, then {Vᵢ}₁ⁿ are independent subspaces of V.

Proof. If αᵢ ∈ Vᵢ for all i and Σ₁ⁿ αᵢ = 0, then, setting α₀ = Σ₂ⁿ αᵢ, we have α₁ + α₀ = 0, with α₀ ∈ V₀. Therefore, α₁ = α₀ = 0 by the independence of V₁ and V₀. But then α₂ = α₃ = ⋯ = αₙ = 0 by the independence of {Vᵢ}₂ⁿ, and we are done (Lemma 5.1). □

Corollary. V = V₁ ⊕ V₀ and V₀ = ⊕ᵢ₌₂ⁿ Vᵢ together imply that V = ⊕ᵢ₌₁ⁿ Vᵢ.

Projections. If V = ⊕ᵢ₌₁ⁿ Vᵢ, if π is the isomorphism ⟨α₁, ..., αₙ⟩ ↦ α = Σ₁ⁿ αᵢ, and if πⱼ is the jth projection map ⟨α₁, ..., αₙ⟩ ↦ αⱼ from Πᵢ₌₁ⁿ Vᵢ to Vⱼ, then (πⱼ ∘ π⁻¹)(α) = αⱼ.

Definition. We call αⱼ the jth component of α, and we call the linear map Pⱼ = πⱼ ∘ π⁻¹ the projection of V onto Vⱼ (with respect to the given direct sum decomposition of V). Since each α in V is uniquely expressible as a sum α = Σ₁ⁿ αᵢ, with αᵢ in Vᵢ for all i, we can view Pⱼ(α) = αⱼ as "the part of α in Vⱼ".

This use of the word "projection" is different from its use in the Cartesian product situation, and each is different from its use in the quotient space context (Section 0.12). It is apparent that these three uses are related, and the ambiguity causes little confusion since the proper meaning is always clear from the context.

Theorem 5.1. If the maps Pᵢ are the above projections, then range Pᵢ = Vᵢ, Pᵢ ∘ Pⱼ = 0 for i ≠ j, and Σ₁ⁿ Pᵢ = I.

Proof. Since π is an isomorphism and Pⱼ = πⱼ ∘ π⁻¹, we have range Pⱼ = range πⱼ = Vⱼ. Next, it follows directly from the corollary to Lemma 5.1 that
if α ∈ Vⱼ, then Pᵢ(α) = 0 for i ≠ j, and so Pᵢ ∘ Pⱼ = 0 for i ≠ j. Finally, Σᵢ Pᵢ = Σᵢ πᵢ ∘ π⁻¹ = (Σᵢ πᵢ) ∘ π⁻¹ = π ∘ π⁻¹ = I, and we are done. □

The above projection properties are clearly the reflection in V of the projection–injection identities for the isomorphic space Πᵢ Vᵢ. A converse theorem is also true.

Theorem 5.2. If {Pᵢ}₁ⁿ ⊂ Hom V satisfy Σᵢ Pᵢ = I and Pᵢ ∘ Pⱼ = 0 for i ≠ j, and if we set Vᵢ = range Pᵢ, then V = ⊕ᵢ₌₁ⁿ Vᵢ, and Pᵢ is the corresponding projection on Vᵢ.

Proof. The equation α = I(α) = Σᵢ Pᵢ(α) shows that the subspaces {Vᵢ}₁ⁿ span V. Next, if β ∈ Vⱼ, then Pᵢ(β) = 0 for i ≠ j, since β ∈ range Pⱼ and Pᵢ ∘ Pⱼ = 0 if i ≠ j. Then also Pⱼ(β) = (I − Σᵢ≠ⱼ Pᵢ)(β) = I(β) = β. Now consider α = Σᵢ αᵢ for any choice of αᵢ ∈ Vᵢ. Using the above two facts, we have Pⱼ(α) = Pⱼ(Σᵢ₌₁ⁿ αᵢ) = Σᵢ₌₁ⁿ Pⱼ(αᵢ) = αⱼ. Therefore, α = 0 implies that αⱼ = Pⱼ(0) = 0 for all j, and the subspaces Vᵢ are independent. Consequently, V = ⊕ᵢ Vᵢ. Finally, the fact that α = Σ Pᵢ(α) and Pᵢ(α) ∈ Vᵢ for all i shows that Pⱼ(α) is the jth component of α for every α, and therefore that Pⱼ is the projection of V onto Vⱼ. □

There is an intrinsic characterization of the kind of map that is a projection.

Lemma 5.4. The projections Pᵢ are idempotent (Pᵢ² = Pᵢ), or, equivalently, each is the identity on its range. The null space of Pᵢ is the sum of the spaces Vⱼ for j ≠ i.

Proof. Pⱼ² = Pⱼ ∘ (I − Σᵢ≠ⱼ Pᵢ) = Pⱼ ∘ I = Pⱼ. Since this can be rewritten as Pⱼ(Pⱼ(α)) = Pⱼ(α) for every α in V, it says exactly that Pⱼ is the identity on its range. Now set Wᵢ = Σⱼ≠ᵢ Vⱼ, and note that if β ∈ Wᵢ, then Pᵢ(β) = 0, since Pᵢ[Vⱼ] = 0 for j ≠ i. Thus Wᵢ ⊂ N(Pᵢ). Conversely, if Pᵢ(α) = 0, then α = I(α) = Σⱼ Pⱼ(α) = Σⱼ≠ᵢ Pⱼ(α) ∈ Wᵢ. Thus N(Pᵢ) ⊂ Wᵢ, and the two spaces are equal. □

Conversely:

Lemma 5.5. If P ∈ Hom(V) is idempotent, then V is the direct sum of its range and null space, and P is the corresponding projection on its range.

Proof. Setting Q = I − P, we have PQ = QP = P − P² = 0. Therefore, V is the direct sum of the ranges of P and Q, and P is the corresponding projection on its range, by the above theorem. Moreover, the range of Q is the null space of P, by the corollary. □

If V = M ⊕ N and P is the corresponding projection on M, we call P the projection on M along N. The projection P is not determined by M alone, since M does not determine N. A pair P and Q in Hom V such that P + Q = I and PQ = QP = 0 is called a pair of complementary projections.
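These identities are easy to experiment with numerically. Here is a minimal sketch in Python (the matrix P is our own illustrative choice, not an example from the text) of Lemma 5.5: an idempotent P splits ℝ³ into its range and null space, and Q = I − P is the complementary projection.

```python
import numpy as np

# Lemma 5.5, numerically: an idempotent P in Hom(R^3) makes V the
# direct sum of range(P) and null(P).  (Illustrative matrix only.)
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
assert np.allclose(P @ P, P)                 # P is idempotent

Q = np.eye(3) - P                            # the complementary projection
assert np.allclose(P @ Q, 0) and np.allclose(Q @ P, 0)
assert np.allclose(P + Q, np.eye(3))         # P + Q = I, PQ = QP = 0

# Each alpha decomposes uniquely as P(alpha) + Q(alpha), with
# P(alpha) in range(P) and Q(alpha) in null(P) = range(Q).
alpha = np.array([2.0, -1.0, 5.0])
assert np.allclose(P @ alpha + Q @ alpha, alpha)
assert np.allclose(P @ (Q @ alpha), 0)       # the Q-part lies in null(P)
```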
In the above discussion we have neglected another fine point. Strictly speaking, when we form the sum π = Σ₁ⁿ πᵢ, we are treating each πⱼ as though it were from Π₁ⁿ Vᵢ to V, whereas actually the codomain of πⱼ is Vⱼ. And we want Pⱼ to be from V to V, whereas πⱼ ∘ π⁻¹ has codomain Vⱼ, so the equation Pⱼ = πⱼ ∘ π⁻¹ can't quite be true either. To repair these flaws we have to introduce the injection ιⱼ: Vⱼ → V, which is the identity map on Vⱼ but which views Vⱼ as a subspace of V and so takes V as its codomain. If our concept of a mapping includes a codomain possibly larger than the range, then we have to admit such identity injections. Then, setting π̃ⱼ = ιⱼ ∘ πⱼ, we have the correct equations π = Σ₁ⁿ π̃ᵢ and Pⱼ = π̃ⱼ ∘ π⁻¹.

EXERCISES

5.1 Prove the corollary to Lemma 5.1.

5.2 Let α be the vector ⟨1, 1, 1⟩ in ℝ³, and let M = ℝα be its one-dimensional span. Show that each of the three coordinate planes is a complement of M.

5.3 Show that a finite product space V = Π₁ⁿ Vᵢ has subspaces {Wᵢ}₁ⁿ such that Wᵢ is isomorphic to Vᵢ and V = ⊕₁ⁿ Wᵢ. Show how the corresponding projections {Pᵢ} are related to the πᵢ's and θᵢ's.

5.4 If T ∈ Hom(V, W), show that (the graph of) T is a complement of W′ = {0} × W in V × W.

5.5 If l is a linear functional on V (l ∈ Hom(V, ℝ) = V*), and if α is a vector in V such that l(α) ≠ 0, show that V = N ⊕ M, where N is the null space of l and M = ℝα is the linear span of α. What does this result say about complements in ℝ³?

5.6 Show that any complement M of a subspace N of a vector space V is isomorphic to the quotient space V/N.

5.7 We suppose again that every subspace has a complement. Show that if T ∈ Hom V is not injective, then there is a nonzero S in Hom V such that T ∘ S = 0. Show that if T ∈ Hom V is not surjective, then there is a nonzero S in Hom V such that S ∘ T = 0.

5.8 Using the above exercise for half the arguments, show that T ∈ Hom V is injective if and only if T ∘ S = 0 ⇒ S = 0, and that T is surjective if and only if S ∘ T = 0 ⇒ S = 0. We thus have characterizations of injectivity and surjectivity that are formal, in the sense that they do not refer to the fact that S and T are transformations, but refer only to the algebraic properties of S and T as elements of an algebra.

5.9 Let M and N be complementary subspaces of a vector space V, and let X be a subspace such that X ∩ N = {0}. Show that there is a linear injection from X to M. [Hint: Consider the projection P of V onto M along N.] Show that any two complements of a subspace N are isomorphic by showing that the above injection is surjective if and only if X is a complement of N.

5.10 Going back to the first point of the preceding exercise, let Y be a complement of P[X] in M. Show that X ∩ Y = {0} and that X ⊕ Y is a complement of N.

5.11 Let M be a proper subspace of V, and let {αᵢ : i ∈ I} be a finite set in V. Set L = L({αᵢ}), and suppose that M + L = V. Show that there is a subset J ⊂ I such
that {αᵢ : i ∈ J} spans a complement of M. [Hint: Consider a largest possible subset J such that M ∩ L({αᵢ}_J) = {0}.]

5.12 Given T ∈ Hom(V, W) and S ∈ Hom(W, X), show that
a) S ∘ T is surjective ⇔ S is surjective and R(T) + N(S) = W;
b) S ∘ T is injective ⇔ T is injective and R(T) ∩ N(S) = {0};
c) S ∘ T is an isomorphism ⇔ S is surjective, T is injective, and W = R(T) ⊕ N(S).

5.13 Assuming that every subspace of V has a complement, show that T ∈ Hom V satisfies T² = 0 if and only if V has a direct sum decomposition V = M ⊕ N such that T = 0 on N and T[M] ⊂ N.

5.14 Suppose next that T³ = 0 but T² ≠ 0. Show that V can be written as V = V₁ ⊕ V₂ ⊕ V₃, where T[V₁] ⊂ V₂, T[V₂] ⊂ V₃, and T = 0 on V₃. (Assume again that any subspace of a vector space has a complement.)

5.15 We now suppose that Tⁿ = 0 but Tⁿ⁻¹ ≠ 0. Set Nᵢ = null space (Tⁱ) for i = 1, ..., n − 1, and let V₁ be a complement of Nₙ₋₁ in V. Show first that T[V₁] ∩ Nₙ₋₂ = {0} and that T[V₁] ⊂ Nₙ₋₁. Extend T[V₁] to a complement V₂ of Nₙ₋₂ in Nₙ₋₁, and show that in this way we can construct subspaces V₁, ..., Vₙ such that

V = ⊕₁ⁿ Vᵢ,  T[Vᵢ] ⊂ Vᵢ₊₁ for i < n,  and T[Vₙ] = {0}.

On solving a linear equation. Many important problems in mathematics are in the following general form. A linear operator T: V → W is given, and for a given η ∈ W the equation T(ξ) = η is to be solved for ξ ∈ V. In our terms, the condition that there exist a solution is exactly the condition that η be in the range space of T. In special circumstances this condition can be given more or less useful equivalent alternative formulations. Let us suppose that we know how to recognize R(T), in which case we may as well make it the new codomain, and so assume that T is surjective. There still remains the problem of determining what we mean by solving the equation.

The universal principle running through all the important instances of the problem is that a solution process calculates a right inverse to T, that is, a linear operator S: W → V such that T ∘ S = I_W, the identity on W. Thus a solution process picks one solution vector ξ ∈ V for each η ∈ W in such a way that the solving ξ varies linearly with η. Taking this as our meaning of solving, we have the following fundamental reformulation.

Theorem 5.3. Let T be a surjective linear map from the vector space V to the vector space W, and let N be its null space. Then a subspace M is a complement of N if and only if the restriction of T to M is an isomorphism from M to W. The mapping M ↦ (T↾M)⁻¹ is a bijection from the set of all such complementary subspaces M to the set of all linear right inverses of T.
Proof. It should be clear that a subspace M is the range of a linear right inverse of T (a map S such that T ∘ S = I_W) if and only if T↾M is an isomorphism to W, in which case S = (T↾M)⁻¹. Strictly speaking, the right inverse must be from W to V and therefore must be R = ι_M ∘ S, where ι_M is the identity injection from M to V. Then (R ∘ T)² = R ∘ (T ∘ R) ∘ T = R ∘ I_W ∘ T = R ∘ T, and R ∘ T is a projection whose range is M and whose null space is N (since R is injective). Thus V = M ⊕ N. Conversely, if V = M ⊕ N, then T↾M is injective because M ∩ N = {0}, and surjective because M + N = V implies that W = T[V] = T[M + N] = T[M] + T[N] = T[M] + {0} = T[M]. □

Polynomials in T. The material in this subsection will be used in our study of differential equations with constant coefficients and in the proof of the diagonalizability of a symmetric matrix. In linear algebra it is basic in almost any approach to the canonical forms of matrices.

If p₁(t) = Σ₀ⁿ aᵢtⁱ and p₂(t) = Σ₀ᵐ bⱼtʲ are any two polynomials, then their product is the polynomial

p(t) = p₁(t)p₂(t) = Σ₀^{m+n} cₖtᵏ,  where cₖ = Σ_{i+j=k} aᵢbⱼ = Σᵢ₌₀ᵏ aᵢbₖ₋ᵢ.

Now let T be any fixed element of Hom(V), and for any polynomial q(t) let q(T) be the transformation obtained by replacing t by T. That is, if q(t) = Σ₀ᵐ cₖtᵏ, then q(T) = Σ₀ᵐ cₖTᵏ, where, of course, Tˡ is the composition product T ∘ T ∘ ⋯ ∘ T with l factors. Then the bilinearity of composition (Theorem 3.3) shows that if p(t) = p₁(t)p₂(t), then p(T) = p₁(T) ∘ p₂(T). In particular, any two polynomials in T commute with each other under composition. More simply, the commutative law for addition implies that if p(t) = p₁(t) + p₂(t), then p(T) = p₁(T) + p₂(T).

The mapping p(t) ↦ p(T) from the algebra of polynomials to the algebra Hom(V) thus preserves addition, multiplication, and (obviously) scalar multiplication. That is, it preserves all the operations of an algebra and is therefore what is called an (algebra) homomorphism. The word "homomorphism" is a general term describing a mapping θ between two algebraic systems of the same kind such that θ preserves the operations of the system. Thus a homomorphism between vector spaces is simply a linear transformation, and a homomorphism between groups is a mapping preserving the one group operation. An accessible, but not really typical, example of the latter is the logarithm function, which is a homomorphism from the multiplicative group of positive real numbers to the additive group of ℝ. The logarithm function is actually a bijective homomorphism and is therefore a group isomorphism.

If this were a course in algebra, we would show that the division algorithm and the properties of the degree of a polynomial imply the following theorem. (However, see Exercises 5.16 through 5.20.)
Theorem 5.4. If p₁(t) and p₂(t) are relatively prime polynomials, then there exist polynomials a₁(t) and a₂(t) such that a₁(t)p₁(t) + a₂(t)p₂(t) = 1.

By relatively prime we mean having no common factors except constants. We shall assume this theorem and the results of the discussion preceding it in proving our next theorem. We say that a subspace M ⊂ V is invariant under T ∈ Hom(V) if T[M] ⊂ M [that is, T↾M ∈ Hom(M)].

Theorem 5.5. Let T be any transformation in Hom V, and let q be any polynomial. Then the null space N of q(T) is invariant under T, and if q = q₁q₂ is any factorization of q into relatively prime factors and N₁ and N₂ are the null spaces of q₁(T) and q₂(T), respectively, then N = N₁ ⊕ N₂.

Proof. Since T ∘ q(T) = q(T) ∘ T, we see that if q(T)(α) = 0, then q(T)(Tα) = T(q(T)(α)) = 0, so T[N] ⊂ N. Note also that since q(T) = q₁(T) ∘ q₂(T), it follows that any α in N₂ is also in N, so N₂ ⊂ N. Similarly, N₁ ⊂ N. We can therefore replace V by N and T by T↾N; hence we can assume that T ∈ Hom N and q(T) = q₁(T) ∘ q₂(T) = 0.

Now choose polynomials a₁ and a₂ so that a₁q₁ + a₂q₂ = 1. Since p ↦ p(T) is an algebra homomorphism, we then have a₁(T) ∘ q₁(T) + a₂(T) ∘ q₂(T) = I. Set A₁ = a₁(T), etc., so that A₁ ∘ Q₁ + A₂ ∘ Q₂ = I, Q₁ ∘ Q₂ = 0, and all the operators Aᵢ, Qᵢ commute with each other. Finally, set Pᵢ = Aᵢ ∘ Qᵢ = Qᵢ ∘ Aᵢ for i = 1, 2. Then P₁ + P₂ = I and P₁P₂ = P₂P₁ = 0. Thus P₁ and P₂ are projections, and N is the direct sum of their ranges: N = V₁ ⊕ V₂. Since each range is the null space of the other projection, we can rewrite this as N = N₁ ⊕ N₂, where Nᵢ = N(Pᵢ). It remains for us to show that N(Pᵢ) = N(Qᵢ). Note first that since Q₁ ∘ P₂ = Q₁ ∘ Q₂ ∘ A₂ = 0, we have Q₁ = Q₁ ∘ I = Q₁ ∘ (P₁ + P₂) = Q₁ ∘ P₁. Then the two identities Pᵢ = Aᵢ ∘ Qᵢ and Qᵢ = Qᵢ ∘ Pᵢ show that the null space of each of Pᵢ and Qᵢ is included in the other, and so they are equal. This completes the proof of the theorem. □

Corollary. Let p(t) = Πᵢ₌₁ᵐ pᵢ(t) be a factorization of the polynomial p(t) into relatively prime factors, let T be an element of Hom(V), and set Nᵢ = N(pᵢ(T)) for i = 1, ..., m and N = N(p(T)). Then N and all the Nᵢ are invariant under T, and N = ⊕ᵢ₌₁ᵐ Nᵢ.

Proof. The proof is by induction on m. The theorem is the case m = 2, and if we set q = Π₂ᵐ pᵢ(t) and M = N(q(T)), then the theorem implies that N = N₁ ⊕ M and that N₁ and M are invariant under T. Restricting T to M, we see that the inductive hypothesis implies that M = ⊕ᵢ₌₂ᵐ Nᵢ and that Nᵢ is invariant under T for i = 2, ..., m. The corollary to Lemma 5.3 then yields our result. □
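The machinery of this proof can be watched in a small numerical case. In the following sketch the operator T and the polynomial q(t) = (t − 3)(t + 1) are our own illustrative choices (they echo Exercise 5.25 below); for these factors the polynomials a₁, a₂ of Theorem 5.4 may be taken to be the constants −1/4 and 1/4.

```python
import numpy as np

# Theorem 5.5 in action (illustrative operator): T satisfies
# q(T) = 0 for q(t) = (t - 3)(t + 1), and (-1/4)q1 + (1/4)q2 = 1.
T = np.array([[1.0, 2.0],
              [2.0, 1.0]])
I = np.eye(2)
Q1, Q2 = T - 3 * I, T + I                    # Q1 = q1(T), Q2 = q2(T)
assert np.allclose(Q1 @ Q2, 0)               # q(T) = 0, so N = V here

P1 = -0.25 * Q1                              # P1 = A1 Q1 = a1(T) q1(T)
P2 = 0.25 * Q2                               # P2 = A2 Q2 = a2(T) q2(T)
assert np.allclose(P1 + P2, I)               # A1Q1 + A2Q2 = I
assert np.allclose(P1 @ P2, 0)               # hence P1, P2 are projections

# range(P2) = null(q1(T)) and range(P1) = null(q2(T)):
assert np.allclose(Q1 @ P2, 0) and np.allclose(Q2 @ P1, 0)
# So V = N1 (+) N2, with T = 3I on N1 and T = -I on N2.
```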
EXERCISES

5.16 Presumably the reader knows (or can see) that the degree d(P) of a polynomial P satisfies the laws

d(P + Q) ≤ max{d(P), d(Q)},  d(P·Q) = d(P) + d(Q),

if both P and Q are nonzero. The degree of the zero polynomial is undefined. (It would have to be −∞!) By induction on the degree of P, prove that for any two polynomials P and D, with D ≠ 0, there are polynomials Q and R such that P = DQ + R and d(R) < d(D) or R = 0. [Hint: If d(P) < d(D), we can take Q and R as what? If d(P) ≥ d(D), and if the leading terms of P and D are axⁿ and bxᵐ, respectively, with n ≥ m, then the polynomial

P′ = P − (a/b)xⁿ⁻ᵐ D

has degree less than d(P), so P′ = DQ′ + R′ by the inductive hypothesis. Now finish the proof.]

5.17 Assuming the above result, prove that R and Q are uniquely determined by P and D. (Assume also that P = DQ′ + R′, and prove from the properties of degree that R′ = R and Q′ = Q.) These two results together constitute the division algorithm for polynomials.

5.18 If P is any polynomial, P(x) = Σ₀ⁿ aₖxᵏ, and if t is any number, then of course P(t) is the number Σ₀ⁿ aₖtᵏ. Prove from the division algorithm that for any polynomial P and any number t there is a polynomial Q such that P(x) = (x − t)Q(x) + P(t), and therefore that P(x) is divisible by (x − t) if and only if P(t) = 0.

5.19 Let P and Q be nonzero polynomials, and choose polynomials A₀ and B₀ such that among all the polynomials of the form AP + BQ the polynomial D = A₀P + B₀Q is nonzero and has minimum degree. Prove that D is a factor of both P and Q. (Suppose that D does not divide P, and apply the division algorithm to get a contradiction with the choice of A₀ and B₀.)

5.20 Let P and Q be nonzero relatively prime polynomials. This means that if E is a common factor of P and Q (P = EP′, Q = EQ′), then E is a constant. Prove that there are polynomials A and B such that A(x)P(x) + B(x)Q(x) = 1. (Apply the above exercise.)
5.21 In the context of Theorem 5.5, show that the restriction of q₂(T) = Q₂ to N₁ is an isomorphism (from N₁ to N₁).

5.22 An involution on V is a mapping T ∈ Hom V such that T² = I. Show that if T is an involution, then V is a direct sum V = V₁ ⊕ V₂, where T(ξ) = ξ for every ξ ∈ V₁ (T = I on V₁) and T(ξ) = −ξ for every ξ ∈ V₂ (T = −I on V₂). (Apply Theorem 5.5.)

5.23 We noticed earlier (in an exercise) that if φ is any mapping from a set A to a set B, then f ↦ f ∘ φ is a linear map T_φ from ℝᴮ to ℝᴬ. Show now that if ψ: B → C, then T_{ψ∘φ} = T_φ ∘ T_ψ. (This should turn out to be a direct consequence of the associativity of composition.)

5.24 Let A be any set, and let φ: A → A be such that φ ∘ φ(a) = a for every a. Then T_φ: f ↦ f ∘ φ is an involution on V = ℝᴬ (since T_φ ∘ T_φ = T_{φ∘φ} = I). Show that the decomposition of ℝ^ℝ as the direct sum of the subspace of even functions and the subspace of odd functions arises from an involution on ℝ^ℝ defined by such a map φ: ℝ → ℝ.

5.25 Let V be a subspace of ℝ^ℝ consisting of differentiable functions, and suppose that V is invariant under differentiation (f ∈ V ⇒ Df ∈ V). Suppose also that on V the linear operator D ∈ Hom V satisfies D² − 2D − 3I = 0. Prove that V is the direct sum of two subspaces M and N such that D = 3I on M and D = −I on N. Actually, it follows that M is the linear span of a single vector, and similarly for N. Find these two functions, if you can. (f′ = 3f ⇒ f = ?)

*Block decompositions of linear maps. Given T in Hom V and a direct sum decomposition V = ⊕₁ⁿ Vᵢ, with corresponding projections {Pᵢ}₁ⁿ, we can consider the maps Tᵢⱼ = Pᵢ ∘ T ∘ Pⱼ. Although Tᵢⱼ is from V to V, we may also want to consider it as being from Vⱼ to Vᵢ (in which case, strictly speaking, what is it?). We picture the Tᵢⱼ's arranged schematically in a rectangular array similar to a matrix, as indicated below for n = 2:

[ T₁₁  T₁₂ ]
[ T₂₁  T₂₂ ]

Furthermore, since T = Σᵢ,ⱼ Tᵢⱼ, we call the doubly indexed family {Tᵢⱼ} the block decomposition of T associated with the given direct sum decomposition of V.

More generally, if T ∈ Hom(V, W) and W also has a direct sum decomposition W = ⊕ᵢ₌₁ᵐ Wᵢ, with corresponding projections {Qᵢ}₁ᵐ, then the family {Tᵢⱼ} defined by Tᵢⱼ = Qᵢ ∘ T ∘ Pⱼ and pictured as an m × n rectangular array is the block decomposition of T with respect to the two direct sum decompositions.

Whenever T in Hom V has a special relationship to a particular direct sum decomposition of V, the corresponding block diagram may have features that display these special properties in a vivid way; this then helps us to understand the nature of T better and to calculate with it more easily.
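A quick numerical sketch may make the definition concrete. Here V = ℝ⁴ is split as the direct sum of the first two and the last two coordinates (our own example, not the book's); the blocks Tᵢⱼ = Pᵢ ∘ T ∘ Pⱼ are computed literally from the projections and sum back to T, as Exercise 5.26 below asks you to prove.

```python
import numpy as np

# Block decomposition of T in Hom(R^4) for V1 = span{e1, e2},
# V2 = span{e3, e4}.  (Illustrative matrix only.)
T = np.arange(16, dtype=float).reshape(4, 4)

P1 = np.diag([1.0, 1.0, 0.0, 0.0])           # projection onto V1 along V2
P2 = np.diag([0.0, 0.0, 1.0, 1.0])           # projection onto V2 along V1

blocks = {(i, j): Pi @ T @ Pj                # Tij = Pi o T o Pj
          for i, Pi in ((1, P1), (2, P2))
          for j, Pj in ((1, P1), (2, P2))}

# The four blocks sum back to T:
assert np.allclose(sum(blocks.values()), T)

# Regarded as a map from V2 to V1, T12 is the upper-right 2 x 2
# submatrix of T:
assert np.allclose(blocks[(1, 2)][:2, 2:], T[:2, 2:])
```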
For example, if V = V₁ ⊕ V₂, then V₁ is invariant under T (i.e., T[V₁] ⊂ V₁) if and only if the block diagram is upper triangular, as shown in the following diagram:

[ T₁₁  T₁₂ ]
[  0   T₂₂ ]

Suppose, next, that T² = 0. Letting V₁ be the range of T, and supposing that V₁ has a complement V₂, the reader should clearly see that the corresponding block diagram is

[ 0  T₁₂ ]
[ 0   0  ]

This form is called strictly upper triangular; it is upper triangular and also zero on the main diagonal. Conversely, if T has some strictly upper-triangular 2 × 2 block diagram, then T² = 0.

If R is a composition product, R = S ∘ T, then its block components can be computed in terms of those of S and T. Thus

Rᵢₖ = PᵢRPₖ = PᵢSTPₖ = PᵢS(Σⱼ₌₁ⁿ Pⱼ)TPₖ = Σⱼ₌₁ⁿ SᵢⱼTⱼₖ.

We have used the identities I = Σⱼ₌₁ⁿ Pⱼ and Pⱼ = Pⱼ². The 2 × 2 case is pictured below:

[ S₁₁T₁₁ + S₁₂T₂₁   S₁₁T₁₂ + S₁₂T₂₂ ]
[ S₂₁T₁₁ + S₂₂T₂₁   S₂₁T₁₂ + S₂₂T₂₂ ]

From this we can read off a fact that will be useful to us later: if T is 2 × 2 upper triangular (T₂₁ = 0), and if Tᵢᵢ is invertible as a map from Vᵢ to Vᵢ (i = 1, 2), then T is invertible and its inverse is

[ T₁₁⁻¹   −T₁₁⁻¹T₁₂T₂₂⁻¹ ]
[  0          T₂₂⁻¹      ]

We find this solution by simply setting the product diagram equal to

[ I  0 ]
[ 0  I ]

and solving; but of course with the diagram in hand it can simply be checked to be correct.

EXERCISES

5.26 Show that if T ∈ Hom V, if V = ⊕₁ⁿ Vᵢ, and if {Pᵢ}₁ⁿ are the corresponding projections, then the sum of the transformations Tᵢⱼ = Pᵢ ∘ T ∘ Pⱼ is T.
5.27 If S and T are in Hom V and {Sᵢⱼ}, {Tᵢⱼ} are their block components with respect to some direct sum decomposition of V, show that Sᵢⱼ ∘ Tₗₖ = 0 if j ≠ l.

5.28 Verify that if T has an upper-triangular block diagram with respect to the direct sum decomposition V = V₁ ⊕ V₂, then V₁ is invariant under T.

5.29 Verify that if the diagram is strictly upper triangular, then T² = 0.

5.30 Show that if V = V₁ ⊕ V₂ ⊕ V₃ and T ∈ Hom V, then the subspaces Vᵢ are all invariant under T if and only if the block diagram for T is

[ T₁₁   0    0  ]
[  0   T₂₂   0  ]
[  0    0   T₃₃ ]

Show that T is invertible if and only if Tᵢᵢ is invertible (as an element of Hom Vᵢ) for each i.

5.31 Supposing that T has an upper-triangular 2 × 2 block diagram and that Tᵢᵢ is invertible as an element of Hom Vᵢ for i = 1, 2, verify that T is invertible by forming the 2 × 2 block diagram that is the product of the diagram for T and the diagram given in the text as the inverse of T.

5.32 Supposing that T is as in the preceding exercise, show that S = T⁻¹ must have the given block diagram by considering the two equations T ∘ S = I and S ∘ T = I in their block form.

5.33 What would strictly upper triangular mean for a 3 × 3 block diagram? What is the corresponding property of T? Show that T has this property if and only if it has a strictly upper-triangular block diagram. (See Exercise 5.14.)

5.34 Suppose that T in Hom V satisfies Tⁿ = 0 (but Tⁿ⁻¹ ≠ 0). Show that T has a strictly upper-triangular n × n block decomposition. (Apply Exercise 5.15.)

6. BILINEARITY

Bilinear mappings. The notion of a bilinear mapping is important to the understanding of linear algebra because it is the vector setting for the duality principle (Section 0.10).

Definition. If U, V, and W are vector spaces, then a mapping ω: ⟨ξ, η⟩ ↦ ω(ξ, η) from U × V to W is bilinear if it is linear in each variable when the other variable is held fixed. That is, if we hold ξ fixed, then η ↦ ω(ξ, η) is linear [and so belongs to Hom(V, W)]; if we hold η fixed, then similarly ω(ξ, η) is in Hom(U, W) as a function of ξ.

This is not the same notion as linearity on the product vector space U × V. For example, ⟨x, y⟩ ↦ x + y is a linear mapping from ℝ² to ℝ, but it is not bilinear. If y is held fixed, then the mapping x ↦ x + y is affine (translation through y), but it is not linear unless y is 0. On the other hand, ⟨x, y⟩ ↦ xy is a bilinear mapping from ℝ² to ℝ, but it is not linear. If y
is held fixed, then the mapping x ↦ yx is linear. But the sum of two ordered couples does not map to the sum of their images:

⟨x, y⟩ + ⟨u, v⟩ = ⟨x + u, y + v⟩ ↦ (x + u)(y + v),

which is not the sum of the images, xy + uv. Similarly, the scalar product (x, y) = Σ₁ⁿ xᵢyᵢ is bilinear from ℝⁿ × ℝⁿ to ℝ, as we observed in Section 2.

The linear meaning of bilinearity is partially explained in the following theorem.

Theorem 6.1. If ω: U × V → W is bilinear, then, by duality, ω is equivalent to a linear mapping from U to Hom(V, W) and also to a linear mapping from V to Hom(U, W).

Proof. For each fixed η ∈ V let ω_η be the mapping ξ ↦ ω(ξ, η). That is, ω_η(ξ) = ω(ξ, η). Then ω_η ∈ Hom(U, W) by the bilinear hypothesis. The mapping η ↦ ω_η is thus from V to Hom(U, W), and its linearity is due to the linearity of ω in η when ξ is held fixed:

ω_{cη+c′η′}(ξ) = ω(ξ, cη + c′η′) = cω(ξ, η) + c′ω(ξ, η′) = cω_η(ξ) + c′ω_{η′}(ξ),

so that ω_{cη+c′η′} = cω_η + c′ω_{η′}. Similarly, if we define ω^ξ by ω^ξ(η) = ω(ξ, η), then ξ ↦ ω^ξ is a linear mapping from U to Hom(V, W). Conversely, if φ: U → Hom(V, W) is linear, then the function ω defined by ω(ξ, η) = φ(ξ)(η) is bilinear. Moreover, ω^ξ = φ(ξ), so that φ is the mapping ξ ↦ ω^ξ. □

We shall see that bilinearity occurs frequently. Sometimes the reinterpretation provided by the above theorem provides new insights; at other times it seems less helpful. For example, the composition map ⟨S, T⟩ ↦ S ∘ T is bilinear, and the corollary of Theorem 3.3, which in effect states that composition on the right by a fixed T is a linear map, is simply part of an explicit statement of the bilinearity. But the linear map T ↦ (composition by T) is a complicated object that we have no need for except in the case W = ℝ. On the other hand, the linear combination formula Σ₁ⁿ xᵢαᵢ and Theorem 1.2 do receive new illumination.

Theorem 6.2. The mapping ω(x, α) = Σ₁ⁿ xᵢαᵢ is bilinear from ℝⁿ × Vⁿ to V. The mapping α ↦ ω_α is therefore a linear mapping from Vⁿ to Hom(ℝⁿ, V), and, in fact, is an isomorphism.

Proof. The linearity of ω in x for a fixed α was proved in Theorem 1.2, and its linearity in α for a fixed x is seen in the same way. Then α ↦ ω_α is linear by Theorem 6.1. Its bijectivity was implicit in Theorem 1.2. □

It should be remarked that we can use any finite index set I just as well as the special set n̄ = {1, ..., n} and conclude that ω(x, α) = Σᵢ∈I xᵢαᵢ is bilinear from ℝᴵ × Vᴵ
to V and that α ↦ ω_α is an isomorphism from Vᴵ to Hom(ℝᴵ, V). Also note that ω_α = L_α in the terminology of Section 1.

Corollary. The scalar product (x, a) = Σ₁ⁿ xᵢaᵢ is bilinear from ℝⁿ × ℝⁿ to ℝ; therefore, a ↦ ω_a = L_a is an isomorphism from ℝⁿ to Hom(ℝⁿ, ℝ).

Natural isomorphisms. We often find two vector spaces related to each other in such a way that a particular isomorphism between them is singled out. This phenomenon is hard to pin down in general terms but easy to describe by examples. Duality is one source of such "natural" isomorphisms. For example, an m × n matrix {tᵢⱼ} is a real-valued function of the two variables ⟨i, j⟩, and as such it is an element of the Cartesian space ℝ^(m×n). We can also view {tᵢⱼ} as a sequence of n column vectors in ℝᵐ. This is the dual point of view, where we hold j fixed and obtain a function of i for each j. From this point of view {tᵢⱼ} is an element of (ℝᵐ)ⁿ. This correspondence between ℝ^(m×n) and (ℝᵐ)ⁿ is clearly an isomorphism, and is an example of a natural isomorphism.

We review next the various ways of looking at Cartesian n-space itself. One standard way of defining an ordered n-tuplet is by induction. The ordered triplet ⟨x, y, z⟩ is defined as the ordered pair ⟨⟨x, y⟩, z⟩, and the ordered n-tuplet ⟨x₁, ..., xₙ⟩ is defined as ⟨⟨x₁, ..., xₙ₋₁⟩, xₙ⟩. Thus we define ℝⁿ inductively by setting ℝ¹ = ℝ and ℝⁿ = ℝⁿ⁻¹ × ℝ. The ordered n-tuplet can also be defined as the function on n̄ = {1, ..., n} which assigns xᵢ to i. Then ⟨x₁, ..., xₙ⟩ = {⟨1, x₁⟩, ..., ⟨n, xₙ⟩}, and Cartesian n-space is ℝⁿ = ℝ^{1,...,n}. Finally, we often wish to view Cartesian (n + m)-space as the Cartesian product of Cartesian n-space with Cartesian m-space, so we now take ⟨x₁, ..., xₙ₊ₘ⟩ as ⟨⟨x₁, ..., xₙ⟩, ⟨xₙ₊₁, ..., xₙ₊ₘ⟩⟩ and ℝⁿ⁺ᵐ as ℝⁿ × ℝᵐ. Here again, if we pair two different models for the same n-tuplet, we have an obvious natural isomorphism between the corresponding models for Cartesian n-space.

Finally, the characteristic properties of Cartesian product spaces given in Theorems 3.6 and 3.7 yield natural isomorphisms. Theorem 3.6 says that an n-tuple of linear maps {Tᵢ}₁ⁿ on a common domain V is equivalent to a single n-tuple-valued map T, where T(ξ) = ⟨T₁(ξ), ..., Tₙ(ξ)⟩ for all ξ ∈ V. (This is duality again! Tᵢ(ξ) is a function of the two variables i and ξ.) And it is not hard to see that this identification of T with {Tᵢ}₁ⁿ is an isomorphism from Πᵢ Hom(V, Wᵢ) to Hom(V, Πᵢ Wᵢ).

Similarly, Theorem 3.7 identifies an n-tuple of linear maps {Tᵢ}₁ⁿ into a common codomain V with a single linear map T of an n-tuple variable, and this identification is a natural isomorphism from Π₁ⁿ Hom(Wᵢ, V) to Hom(Π₁ⁿ Wᵢ, V).
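The equivalence in Theorem 6.1 is exactly what a programmer calls currying. The following sketch (our own illustration; Hom is informally represented by Python functions) curries the book's bilinear example ⟨x, y⟩ ↦ xy on ℝ × ℝ and checks the linearity of η ↦ ω_η pointwise.

```python
# Theorem 6.1 as currying (informal sketch; "Hom" here is just
# Python functions).  w is the bilinear map <x, y> |-> xy.
def w(x, y):
    return x * y

def curry_second(w):
    # eta |-> w_eta, where w_eta(xi) = w(xi, eta)
    return lambda eta: (lambda xi: w(xi, eta))

w_eta = curry_second(w)(3.0)      # the linear functional x |-> 3x
assert w_eta(2.0) == w(2.0, 3.0)

# Linearity of eta |-> w_eta, checked pointwise at xi = 5:
lhs = curry_second(w)(2.0 * 3.0 + 4.0)(5.0)           # w_{2*eta + eta'}
rhs = 2.0 * curry_second(w)(3.0)(5.0) + curry_second(w)(4.0)(5.0)
assert lhs == rhs
```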
An arbitrary isomorphism between two vector spaces identifies them in a transient way. For the moment we think of the vector spaces as representing the same abstract space, but only so long as the isomorphism is before us. If we shift to a different isomorphism between them, we obtain a new temporary identification. Natural isomorphisms, on the other hand, effect permanent identifications, and we think of paired objects as being two aspects of the same object in a deeper sense. Thus we think of a matrix as "being" either a sequence of row vectors, a sequence of column vectors, or a single function of two integer indices. We shall take a final look at this question at the end of Section 3 in the next chapter.

*We can now make the ultimate dissection of the theorems centering around the linear combination formula. Laws S1 through S3 state exactly that the scalar product xα is bilinear. More precisely, they state that the mapping s: ⟨x, α⟩ ↦ xα from ℝ × W to W is bilinear. In the language of Theorem 6.1, xα = ω_α(x), and from that theorem we conclude that the mapping α ↦ ω_α is an isomorphism from W to Hom(ℝ, W).

This isomorphism between W and Hom(ℝ, W) extends to an isomorphism from Wⁿ to (Hom(ℝ, W))ⁿ, which in turn is naturally isomorphic to Hom(ℝⁿ, W) by the second Cartesian product isomorphism. Thus Wⁿ is naturally isomorphic to Hom(ℝⁿ, W); the mapping is α ↦ L_α, where L_α(x) = Σ₁ⁿ xᵢαᵢ.

In particular, ℝⁿ is naturally isomorphic to the space Hom(ℝⁿ, ℝ) of all linear functionals on ℝⁿ, the n-tuple a corresponding to the functional ω_a defined by ω_a(x) = Σ₁ⁿ aᵢxᵢ. Also, (ℝᵐ)ⁿ is naturally isomorphic to Hom(ℝⁿ, ℝᵐ). And since ℝ^(m×n) is naturally isomorphic to (ℝᵐ)ⁿ, it follows that the spaces ℝ^(m×n) and Hom(ℝⁿ, ℝᵐ) are naturally isomorphic. This is simply our natural association of a transformation T in Hom(ℝⁿ, ℝᵐ) to an m × n matrix {tᵢⱼ}.
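This last natural isomorphism is easy to exhibit computationally. In the sketch below (our own illustration), a matrix {tᵢⱼ} in ℝ^(2×3) is read as the transformation T ∈ Hom(ℝ³, ℝ²) that it determines, and its columns are recovered as the images of the Kronecker basis vectors δʲ.

```python
import numpy as np

# The natural isomorphism R^(m x n) <-> Hom(R^n, R^m), m = 2, n = 3
# (illustrative matrix): column j of {tij} is the image of delta^j.
t = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

def T(x):                       # the transformation determined by t
    return t @ x

for j in range(3):
    delta_j = np.eye(3)[:, j]   # the Kronecker basis vector delta^(j+1)
    assert np.allclose(T(delta_j), t[:, j])
```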
CHAPTER 2

FINITE-DIMENSIONAL VECTOR SPACES

We have defined a vector space to be finite-dimensional if it has a finite spanning set. In this chapter we shall focus our attention on such spaces, although this restriction is unnecessary for some of our discussion. We shall see that we can assign to each finite-dimensional space V a unique integer, called the dimension of V, which satisfies our intuitive requirements about dimensionality and which becomes a principal tool in the deeper explorations into the nature of such spaces. A number of "dimensional identities" are crucial in these further investigations.

We shall find that the dual space of all linear functionals on V, V* = Hom(V, ℝ), plays a more satisfactory role in finite-dimensional theory than in the context of general vector spaces. (However, we shall see later in the book that when we add limit theory to our algebra, there are certain special infinite-dimensional vector spaces for which the dual space plays an equally important role.)

A finite-dimensional space can be characterized as a vector space isomorphic to some Cartesian space ℝⁿ, and such an isomorphism allows a transformation T in Hom V to be "transferred" to ℝⁿ, whereupon it acquires a matrix. The theory of linear transformations on such spaces is therefore mirrored completely by the theory of matrices. In this chapter we shall push much deeper into the nature of this relationship than we did in Chapter 1. We also include a section on matrix computations, a brief section describing the trace and determinant functions, and a short discussion of the diagonalization of a quadratic form.

1. BASES

Consider again a fixed finite indexed set of vectors α = {αᵢ : i ∈ I} in V and the corresponding linear combination map L_α: x ↦ Σ xᵢαᵢ from ℝᴵ to V having α as skeleton.

Definition. The finite indexed set {αᵢ : i ∈ I} is independent if the above mapping L_α is injective, and {αᵢ} is a basis for V if L_α is an isomorphism (onto V). In this situation we call {αᵢ : i ∈ I} an ordered basis or frame if I = n̄ = {1, ..., n} for some positive integer n.

Thus {αᵢ : i ∈ I} is a basis if and only if for each ξ ∈ V there exists a unique indexed "coefficient" set x = {xᵢ : i ∈ I} ∈ ℝᴵ such that ξ = Σ xᵢαᵢ. The
numbers xᵢ always exist because {αᵢ : i ∈ I} spans V, and x is unique because L_α is injective.

For example, we can check directly that h₁ = ⟨2, 1⟩ and h₂ = ⟨1, −3⟩ form a basis for ℝ². The problem is to show that for each y ∈ ℝ² there is a unique x such that

y = Σ₁² xᵢhᵢ = x₁⟨2, 1⟩ + x₂⟨1, −3⟩ = ⟨2x₁ + x₂, x₁ − 3x₂⟩.

Since this vector equation is equivalent to the two scalar equations y₁ = 2x₁ + x₂ and y₂ = x₁ − 3x₂, we can find the unique solution x₁ = (3y₁ + y₂)/7, x₂ = (y₁ − 2y₂)/7 by the usual elimination method of secondary school algebra.

The form of these definitions is dictated by our interpretation of the linear combination formula as a linear mapping. The more usual definition of independence is a corollary.

Lemma 1.1. The independence of the finite indexed set {αᵢ : i ∈ I} is equivalent to the property that Σᵢ xᵢαᵢ = 0 only if all the coefficients xᵢ are 0.

Proof. This is the property that the null space of L_α consists only of 0, and it is thus equivalent to the injectivity of L_α, that is, to the independence of {αᵢ}, by Lemma 1.1 of Chapter 1. □

If {αᵢ}₁ⁿ is an ordered basis (frame) for V, the unique n-tuple x such that ξ = Σ₁ⁿ xᵢαᵢ is called the coordinate n-tuple of ξ (with respect to the basis {αᵢ}), and xᵢ is the ith coordinate of ξ. We call xᵢαᵢ (and sometimes xᵢ) the ith component of ξ. The mapping L_α will be called a basis isomorphism, and its inverse L_α⁻¹, which assigns to each vector ξ ∈ V its unique coordinate n-tuple x, is a coordinate isomorphism. The linear functional ξ ↦ xⱼ is the jth coordinate functional; it is the composition of the coordinate isomorphism ξ ↦ x with the jth coordinate projection x ↦ xⱼ on ℝⁿ. We shall see in Section 3 that the n coordinate functionals form a basis for V* = Hom(V, ℝ).

In the above paragraph we took the index set I to be n̄ = {1, ..., n} and used the language of n-tuples. The only difference for an arbitrary finite index set is that we speak of a coordinate function x = {xᵢ : i ∈ I} instead of a coordinate n-tuple.

Our first concern will be to show that every finite-dimensional (finitely spanned) vector space has a basis. We start with some remarks about indices. We note first that a finite indexed set {αᵢ : i ∈ I} can be independent only if the indexing is injective as a mapping into V, for if αₖ = αₗ with k ≠ l, then Σ xᵢαᵢ = 0, where xₖ = 1, xₗ = −1, and xᵢ = 0 for the remaining indices. Also, if {αᵢ : i ∈ I} is independent and J ⊂ I, then {αᵢ : i ∈ J} is independent, since if Σ_J xᵢαᵢ = 0, and if we set xᵢ = 0 for i ∈ I − J, then Σ_I xᵢαᵢ = 0, and so each xᵢ is 0.

A finite unindexed set is said to be independent if it is independent in some
(necessarily bijective) indexing. It will of course then be independent with respect to any bijective indexing. An arbitrary set is independent if every finite subset is independent. It follows that a set A is dependent (not independent) if and only if there exist distinct elements α₁, ..., αₙ in A and scalars x₁, ..., xₙ not all zero such that Σ₁ⁿ xᵢαᵢ = 0. An unindexed basis would be defined in the obvious way. However, a set can always be regarded as being indexed, by itself if necessary!

Lemma 1.2. If B is an independent subset of a vector space V and β is any vector not in the linear span L(B), then B ∪ {β} is independent.

Proof. Otherwise there is a zero linear combination, xβ + Σ₁ⁿ xᵢβᵢ = 0, where β₁, ..., βₙ are distinct elements of B and the coefficients are not all 0. But then x cannot be zero: if it were, the equation would contradict the independence of B. We can therefore divide by x and solve for β, so that β ∈ L(B), a contradiction. □

The reader will remember that we call a vector space V finite-dimensional if it has a finite spanning set {αᵢ}₁ⁿ. We can use the above lemma to construct a basis for such a V by choosing some of the αᵢ's. We simply run through the sequence {αᵢ}₁ⁿ and choose those members that increase the linear span of the preceding choices. We end up with a spanning set since {αᵢ}₁ⁿ spans, and our subsequence is independent at each step, by the lemma. In the same way we can extend an independent set {βᵢ}₁ᵐ to a basis by choosing some members of a spanning set {αᵢ}₁ⁿ. This procedure is intuitive, but it is messy to set up rigorously. We shall therefore proceed differently.

Theorem 1.1. Any minimal finite spanning set is a basis, and therefore any finite-dimensional vector space V has a basis. More generally, if {βⱼ : j ∈ J} is a finite independent set and {αᵢ : i ∈ I} is a finite spanning set, and if K is a smallest subset of I such that {βⱼ}_J ∪ {αᵢ}_K spans, then this collection is independent and a basis. Therefore, any finite independent subset of a finite-dimensional space can be extended to a basis.

Proof. It is sufficient to prove the second assertion, since it includes the first as a special case. If {βⱼ}_J ∪ {αᵢ}_K is not independent, then there is a nontrivial zero linear combination Σ_J yⱼβⱼ + Σ_K xᵢαᵢ = 0. If every xᵢ were zero, this equation would contradict the independence of {βⱼ}_J. Therefore, some xₖ is not zero, and we can solve the equation for αₖ. That is, if we set L = K − {k}, then the linear span of {βⱼ}_J ∪ {αᵢ}_L contains αₖ. It therefore includes the whole original spanning set and hence is V. But this contradicts the minimal nature of K, since L is a proper subset of K. Consequently, {βⱼ}_J ∪ {αᵢ}_K is independent. □

We next note that ℝⁿ itself has a very special basis. In the indexing map j ↦ αⱼ the vector αⱼ corresponds to the index j, but under the linear combination map x ↦ Σ xᵢαᵢ the vector αⱼ corresponds to the function δʲ which has the value 1 at j and the value 0 elsewhere, so that Σᵢ δʲ(i)αᵢ = αⱼ. This function
δʲ is called a Kronecker delta function. It is clearly the characteristic function χ_B of the one-point set B = {j}, and the symbol 'δʲ' is ambiguous, just as 'χ_B' is ambiguous; in each case the meaning depends on what domain is implicit from the context. We have already used the delta functions on ℝⁿ in proving Theorem 1.2 of Chapter 1.

Theorem 1.2. The Kronecker functions {δʲ}ⱼ₌₁ⁿ form a basis for ℝⁿ.

Proof. Since Σᵢ xᵢδⁱ(j) = xⱼ by the definition of δⁱ, we see that Σ₁ⁿ xᵢδⁱ is the n-tuple x itself, so the linear combination mapping L_δ: x ↦ Σ₁ⁿ xᵢδⁱ is the identity mapping x ↦ x, a trivial isomorphism. □

Among all possible indexed bases for ℝⁿ, the Kronecker basis is thus singled out by the fact that its basis isomorphism is the identity; for this reason it is called the standard basis or the natural basis for ℝⁿ. The same holds for ℝᴵ for any finite set I.

Finally, we shall draw some elementary conclusions from the existence of a basis.

Theorem 1.3. If T ∈ Hom(V, W) is an isomorphism and α = {αᵢ : i ∈ I} is a basis for V, then {T(αᵢ) : i ∈ I} is a basis for W.

Proof. By hypothesis L_α is an isomorphism in Hom(ℝᴵ, V), and so T ∘ L_α is an isomorphism in Hom(ℝᴵ, W). Its skeleton {T(αᵢ)} is therefore a basis for W. □

We can view any basis {αᵢ} as the image of the standard basis {δⁱ} under the basis isomorphism. Conversely, any isomorphism θ: ℝᴵ → V becomes a basis isomorphism for the basis αⱼ = θ(δʲ).

Theorem 1.4. If X and Y are complementary subspaces of a vector space V, then the union of a basis for X and a basis for Y is a basis for V. Conversely, if a basis for V is partitioned into two sets, with linear spans X and Y, respectively, then X and Y are complementary subspaces of V.

Proof. We prove only the first statement. If {αᵢ : i ∈ J} is a basis for X and {αᵢ : i ∈ K} is a basis for Y, then it is clear that {αᵢ : i ∈ J ∪ K} spans V, since its span includes both X and Y, and so X + Y = V. Suppose then that Σ_{J∪K} xᵢαᵢ = 0. Setting ξ = Σ_J xᵢαᵢ and η = Σ_K xᵢαᵢ, we see that ξ ∈ X, η ∈ Y, and ξ + η = 0. But then ξ = η = 0, since X and Y are complementary. And then xᵢ = 0 for i ∈ J because {αᵢ}_J is independent, and xᵢ = 0 for i ∈ K because {αᵢ}_K is independent. Therefore, {αᵢ}_{J∪K} is a basis for V. We leave the converse argument as an exercise. □

Corollary. If V = ⊕₁ⁿ Vᵢ and Bᵢ is a basis for Vᵢ, then B = ∪₁ⁿ Bᵢ is a basis for V.

Proof. We see from the theorem that B₁ ∪ B₂ is a basis for V₁ ⊕ V₂. Proceeding inductively, we see that ∪ᵢ₌₁ʲ Bᵢ is a basis for ⊕ᵢ₌₁ʲ Vᵢ for j = 2, ..., n, and the corollary is the case j = n. □
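For a concrete basis, the coordinate isomorphism is computed by solving a linear system. The sketch below redoes the earlier example h₁ = ⟨2, 1⟩, h₂ = ⟨1, −3⟩ numerically (using numpy's solver in place of the schoolbook elimination) and checks it against the closed form found there.

```python
import numpy as np

# Coordinates with respect to the basis h1 = <2, 1>, h2 = <1, -3>,
# found by solving the linear system instead of by hand.
L = np.array([[2.0, 1.0],
              [1.0, -3.0]])     # columns are h1, h2, so L_h(x) = L @ x

y = np.array([1.0, 2.0])
x = np.linalg.solve(L, y)       # the coordinate isomorphism applied to y

# Agrees with the closed form x1 = (3y1 + y2)/7, x2 = (y1 - 2y2)/7:
assert np.allclose(x, [(3 * y[0] + y[1]) / 7, (y[0] - 2 * y[1]) / 7])
assert np.allclose(x[0] * np.array([2.0, 1.0]) + x[1] * np.array([1.0, -3.0]), y)
```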
If we follow a coordinate isomorphism by a linear combination map, we get the mapping of the following existence theorem, which we state only in n-tuple form.

Theorem 1.5. If β = {βᵢ}₁ⁿ is an ordered basis for the vector space V, and if {αᵢ}₁ⁿ is any n-tuple of vectors in a vector space W, then there exists a unique S ∈ Hom(V, W) such that S(βᵢ) = αᵢ for i = 1, ..., n.

Proof. By hypothesis L_β is an isomorphism in Hom(ℝⁿ, V), and so S = L_α ∘ (L_β)⁻¹ is an element of Hom(V, W) such that S(βᵢ) = L_α(δⁱ) = αᵢ. Conversely, if S ∈ Hom(V, W) is such that S(βᵢ) = αᵢ for all i, then S ∘ L_β(δⁱ) = αᵢ for all i, so that S ∘ L_β = L_α. Thus S is uniquely determined as L_α ∘ (L_β)⁻¹. □

It is natural to ask how the unique S above varies with the n-tuple {αᵢ}. The answer is: linearly and "isomorphically".

Theorem 1.6. Let {βᵢ}₁ⁿ be a fixed ordered basis for the vector space V, and for each n-tuple α = {αᵢ}₁ⁿ chosen from the vector space W let S_α ∈ Hom(V, W) be the unique transformation defined above. Then the map α ↦ S_α is an isomorphism from Wⁿ to Hom(V, W).

Proof. As above, S_α = L_α ∘ θ⁻¹, where θ is the basis isomorphism L_β. Now we know from Theorem 6.2 of Chapter 1 that α ↦ L_α is an isomorphism from Wⁿ to Hom(ℝⁿ, W), and composition on the right by the fixed coordinate isomorphism θ⁻¹ is an isomorphism from Hom(ℝⁿ, W) to Hom(V, W) by the corollary to Theorem 3.3 of Chapter 1. Composing these two isomorphisms gives us the theorem. □

*Infinite bases. Most vector spaces do not have finite bases, and it is natural to try to extend the above discussion to index sets I that may be infinite. The Kronecker functions {δⁱ : i ∈ I} have the same definitions, but they no longer span ℝᴵ. By definition f is a linear combination of the functions δⁱ if and only if f is of the form Σᵢ∈I₁ cᵢδⁱ, where I₁ is a finite subset of I. But then f = 0 outside of I₁. Conversely, if f ∈ ℝᴵ is 0 except on a finite set I₁, then f = Σᵢ∈I₁ f(i)δⁱ. The linear span of {δⁱ : i ∈ I} is thus exactly the set of all functions in ℝᴵ that are zero except on a finite set. We shall designate this subspace ℝ^(I).

If {αᵢ : i ∈ I} is an indexed set of vectors in V and f ∈ ℝ^(I), then the sum Σᵢ∈I f(i)αᵢ becomes meaningful if we adopt the reasonable convention that the sum of an arbitrary number of 0's is 0. Then Σᵢ∈I = Σᵢ∈I₀, where I₀ is any finite subset of I outside of which f is zero. With this convention, L_α: f ↦ Σ f(i)αᵢ is a linear map from ℝ^(I) to V, as in Theorem 1.2 of Chapter 1. And with the same convention, Σᵢ∈I f(i)αᵢ is an elegant expression for the general linear combination of the vectors αᵢ. Instead of choosing a finite subset I₁ and numbers cᵢ for just those indices i in I₁, we define cᵢ for all i ∈ I, but with the stipulation that cᵢ = 0 for all but a finite number of indices. That is, we take c = {cᵢ : i ∈ I} as a function in ℝ^(I).
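Returning to the finite-dimensional case, the formula S = L_α ∘ (L_β)⁻¹ of Theorem 1.5 is directly computable when V = W = ℝ². In this sketch the basis {βᵢ} and the prescribed images {αᵢ} are our own illustrative choices.

```python
import numpy as np

# Theorem 1.5: the unique S with S(beta_i) = alpha_i is
# S = L_alpha o (L_beta)^(-1).  (Illustrative basis and images.)
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])      # columns: an ordered basis {beta_i} of R^2
A = np.array([[3.0, 0.0],
              [1.0, 2.0]])      # columns: the prescribed images {alpha_i}

S = A @ np.linalg.inv(B)        # L_alpha composed with the coordinate map

for i in range(2):
    assert np.allclose(S @ B[:, i], A[:, i])   # S(beta_i) = alpha_i
```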
We make the same definitions of independence and basis as before. Then {αᵢ : i ∈ I} is a basis for V if and only if L_α: ℝ^(I) → V is an isomorphism, i.e., if and only if for each ξ ∈ V there exists a unique x ∈ ℝ^(I) such that ξ = Σᵢ xᵢαᵢ.

By using an axiom of set theory called the axiom of choice, it can be shown that every vector space has a basis in this sense and that any independent set can be extended to a basis. Then Theorems 1.4 and 1.5 hold with only minor changes in notation. In particular, if a basis for a subspace M of V is extended to a basis for V, then the linear span of the added part is a subspace N complementary to M. Thus, in a purely algebraic sense, every subspace has complementary subspaces. We assume this fact in some of our exercises.

The above sums are always finite (despite appearances), and the above notion of basis is purely algebraic. However, infinite bases in this sense are not very useful in analysis, and we shall therefore concentrate for the present on spaces that have finite bases (i.e., are finite-dimensional). Then in one important context later on we shall discuss infinite bases where the sums are genuinely infinite by virtue of limit theory.

EXERCISES

1.1 Show by a direct computation that {⟨1, −1⟩, ⟨0, 1⟩} is a basis for ℝ².

1.2 The student must realize that the ith coordinate of a vector depends on the whole basis and not just on the ith basis vector. Prove this for the second coordinate of vectors in ℝ², using the standard basis and the basis of the above exercise.

1.3 Show that {⟨1, 1⟩, ⟨1, 2⟩} is a basis for V = ℝ². The basis isomorphism from ℝ² to V is now a map from ℝ² to ℝ². Find its matrix. Find the matrix of the coordinate isomorphism. Compute the coordinates, with respect to this basis, of ⟨−1, 1⟩, ⟨0, 1⟩, ⟨2, 3⟩.

1.4 Show that {bᵢ}₁³, where b₁ = ⟨1, 0, 0⟩, b₂ = ⟨1, 1, 0⟩, and b₃ = ⟨1, 1, 1⟩, is a basis for ℝ³.

1.5 In the above exercise find the three linear functionals lᵢ that are the coordinate functionals with respect to the given basis. Since x = Σ₁³ lᵢ(x)bᵢ, finding the lᵢ is equivalent to solving x = Σ₁³ yᵢbᵢ for the yᵢ's in terms of x = ⟨x₁, x₂, x₃⟩.

1.6 Show that any set of polynomials no two of which have the same degree is independent.

1.7 Show that if {αᵢ}₁ⁿ is an independent subset of V and T in Hom(V, W) is injective, then {T(αᵢ)}₁ⁿ is an independent subset of W.

1.8 Show that if T is any element of Hom(V, W) and {T(αᵢ)}₁ⁿ is independent in W, then {αᵢ}₁ⁿ is independent in V.
1.9 Later on we are going to call a vector space V n-dimensional if every basis for V contains exactly n elements. If V is the span of a single vector α, so that V = ℝα, then V is clearly one-dimensional. Let {Vᵢ}₁ⁿ be a collection of one-dimensional subspaces of a vector space V, and choose a nonzero vector αᵢ in Vᵢ for each i. Prove that {αᵢ}₁ⁿ is independent if and only if the subspaces {Vᵢ}₁ⁿ are independent, and that {αᵢ}₁ⁿ is a basis if and only if V = ⊕₁ⁿ Vᵢ.

1.10 Finish the proof of Theorem 1.4.

1.11 Give a proof of Theorem 1.4 based on the existence of isomorphisms.

1.12 The reader would guess, and we shall prove in the next section, that every subspace of a finite-dimensional space is finite-dimensional. Prove now that a subspace N of a finite-dimensional vector space V is finite-dimensional if and only if it has a complement M. (Work from a combination of Theorems 1.1 and 1.4 and direct sum projections.)

1.13 Since {hᵢ}₁³ = {⟨1, 0, 0⟩, ⟨1, 1, 0⟩, ⟨1, 1, 1⟩} is a basis for ℝ³, there is a unique T in Hom(ℝ³, ℝ²) such that T(h₁) = ⟨1, 0⟩, T(h₂) = ⟨0, 1⟩, and T(h₃) = ⟨1, 1⟩. Find the matrix of T. (Find T(δⁱ) for i = 1, 2, 3.)

1.14 Find, similarly, the S in Hom ℝ³ such that S(hᵢ) = δⁱ for i = 1, 2, 3.

1.15 Show that the infinite sequence {tⁿ}₀^∞ is a basis for the vector space of all polynomials.

2. DIMENSION

The concept of dimension rests on the fact that two different bases for the same space always contain the same number of elements. This number, which is then the number of elements in every basis for V, is called the dimension of V. It tells all there is to know about V to within isomorphism: there exists an isomorphism between two spaces if and only if they have the same dimension. We shall consider only finite dimensions. If V is not finite-dimensional, its dimension is an infinite cardinal number, a concept with which the reader is probably unfamiliar.

Lemma 2.1. If V is finite-dimensional and T in Hom V is surjective, then T is an isomorphism.

Proof. Let n be the smallest number of elements that can span V. That is, there is some spanning set {αᵢ}₁ⁿ and none with fewer than n elements. Then {αᵢ}₁ⁿ is a basis, by Theorem 1.1, and the linear combination map θ: x ↦ Σ₁ⁿ xᵢαᵢ is accordingly a basis isomorphism. But {βᵢ}₁ⁿ = {T(αᵢ)}₁ⁿ also spans, since T is surjective, and so T ∘ θ is also a basis isomorphism, for the same reason. Then T = (T ∘ θ) ∘ θ⁻¹ is an isomorphism. □

Theorem 2.1. If V is finite-dimensional, then all bases for V contain the same number of elements.

Proof. Two bases with n and m elements determine basis isomorphisms θ: ℝⁿ → V and φ: ℝᵐ → V. Suppose that m < n and, viewing ℝⁿ as ℝᵐ × ℝⁿ⁻ᵐ,
let π be the projection of ℝⁿ onto ℝᵐ. Since T = θ⁻¹ ∘ φ is an isomorphism from ℝᵐ to ℝⁿ and T ∘ π: ℝⁿ → ℝⁿ is therefore surjective, it follows from the lemma that T ∘ π is an isomorphism. Then π = T⁻¹ ∘ (T ∘ π) is an isomorphism. But it isn't, because π(δⁿ) = 0, and we have a contradiction. Therefore no basis can be smaller than any other basis. □

The integer that is the number of elements in every basis for V is of course called the dimension of V, and we designate it d(V). Since the standard basis {δⁱ}₁ⁿ for ℝⁿ has n elements, we see that ℝⁿ is n-dimensional in this precise sense.

Corollary. Two finite-dimensional vector spaces are isomorphic if and only if they have the same dimension.

Proof. If T is an isomorphism from V to W and B is a basis for V, then T[B] is a basis for W by Theorem 1.3. Therefore d(V) = #B = #T[B] = d(W), where #A is the number of elements in A. Conversely, if d(V) = d(W) = n, then V and W are each isomorphic to ℝⁿ and so to each other. □

Theorem 2.2. Every subspace M of a finite-dimensional vector space V is finite-dimensional.

Proof. Let 𝒜 be the family of finite independent subsets of M. By Theorem 1.1, if A ∈ 𝒜, then A can be extended to a basis for V, and so #A ≤ d(V). Thus {#A : A ∈ 𝒜} is a finite set of integers, and we can choose B ∈ 𝒜 such that n = #B is the maximum of this finite set. But then L(B) = M, because otherwise for any α ∈ M − L(B) we have B ∪ {α} ∈ 𝒜, by Lemma 1.2, and #(B ∪ {α}) = n + 1, contradicting the maximal nature of n. Thus M is finitely spanned. □

Corollary. Every subspace M of a finite-dimensional space V has a complement.

Proof. Use Theorem 1.1 to extend a basis for M to a basis for V, and let N be the linear span of the added vectors. Then apply Theorem 1.4. □

Dimensional identities. We now prove two basic dimensional identities. We will always assume V finite-dimensional.

Lemma 2.2. If V₁ and V₂ are complementary subspaces of V, then d(V) = d(V₁) + d(V₂). More generally, if V = ⊕₁ⁿ Vᵢ, then d(V) = Σ₁ⁿ d(Vᵢ).

Proof. This follows at once from Theorem 1.4 and its corollary. □

Theorem 2.3. If U and W are subspaces of a finite-dimensional vector space, then

d(U + W) + d(U ∩ W) = d(U) + d(W).
  • 91. 2.2 DIMENSION 79 Proof. Let V be a complement of U n Win U. We start by showing that then V is also a complement of W in U + W. First V + W = V + ((U n W) + W) = (V + (U n W») + W = U + W. We have used the obvious fact that the sum of a vector space and a subspace is the vector space. Next, V n W = (V n U) n W = V n (U n W) = {O}, because V is a complement of Un W in U. We thus have both V + W = U + Wand V n W = {O}, and so V is a complement of W in U + W by the corollary of Lemma 5.2 of Chapter 1. The theorem is now a corollary of the above lemma. We have d(U) +deW) = (d(U n W) +dey») +deW) = d(U n W) +(d(V) + deW») = d(U n W) +d(U + W). 0 Theorem 2.4. Let V be finite-dimensional, and let W be any vector space. Let T E Hom(V, W) have null space N (in V) and range R (in W). Then R is finite-dimensional and dey) = deN) +d(R). Proof. Let U be a complement of N in V. Then we know that T r U is an isomorphism onto R. (See Theorem 5.3 of Chapter 1.) Therefore, R is finite- dimensional and d(R) + deN) = d(U) + deN) = dey) by our first identity. 0 Corollary. If W is finite-dimensional and deW) = dey), then T is injective if and only if it is surjective, so that in this case injectivity, surjectivity, and bijectivity are all equivalent. Proof. T is surjective if and only if R = W. But this is equivalent to d(R) = deW), and if deW) = deY), then.the theorem shows this is turn to be equivalent to deN) = 0, that is, to N = {O}. 0 Theorem 2.5. If dey) = nand deW) = m, then Hom(V, W) is finite- dimensional and its dimension is mn. Proof. By Theorem 1.6, Hom(V, W) is isomorphic to wn which is the direct sum of the n subspaces isomorphic to W under the injections (Ji for i = 1, ... ,n. The dimension of wn is therefore L~ m = mn by Lemma 2.2. 0 Another proof of Theorem 2.5 will be available in Section 4. EXERCISES 2.1 Prove that if d(V) = n, then any spanning subset of n elements is a basis. 2.2 Prove that if-d(V) = n, then any independent subset of n elements is a basis. 2.3 Show that if d(V) = nand lV is a subspace of the same dimension, then W = V.
  • 92. 80 FIKITE-DIMENSIONAL VECTOR SPACES 2.2 2.4 Prove by using dimensional identities that if f is a nonzero linear functional on an n-dimensional space V, then its null space has dimension n - 1. 2.5 Prove by u::;ing dimensional identities that if f is a linear functional on a finite- dimensional space V, and if a is a vector not in its null space N, then V = N (B IRa. 2.6 Given that N is an (n - I)-dimensional subspa<;e of an n-dimen:sional vector space V, show that N is the null space of a linear functional. 2.7 Let X and Y be subspaces of a finite-dimensional vector space V, and suppose that Tin Hom(V, W) has null space N = X n L Show that T[X + 1"] = T[X] (B T( y), and then deduce Theorem 2.3 from Lemma 2.2 and Theorem 2.4. This proof still depends on the existence of a T having N = X n Y as its null space. Do we know of any such T'? 2.8 Show that if T' i~ finite-dimen;;ional and S, T E Hom V, then SoT = I ==} T is invertible. Show also that To S = I ==} T is invertible. 2.9 A subspace N of a vector space V has finite codimension n if the quotient space V IN is finite-dimensional, with dimension n. Show that a subspace N has finite codimension n if and only if N has a complementary subspace J1 of dimension 7!. (:Iove a basis for VI N ba<;k into V.) Do not assume V to be finite-dimensional. 2.10 Show that if N 1 and N 2 are subspaces of a vector space V with finite codimC'n- sions, then N = N 1 n N 2 has finite codimension and cod(N) ::;: cod(NI) + cod(Nz). (Consider the mapping ~ f-+ < ~I' ~2 >- when ~i is the coset of Ni containing ~.) 2.11 In the above exercise, suppose that cod(NI) = cod(N2), that is, d(VIN1) d(VIN2). Prove that d(NJ/lv') = d(N2IN). 2.12 Given nonzero vectors (3 in V and f in V* such that f({3) ,e 0, show that some scalar multiple of the mapping ~ f-+ f(~){3 is a projection. Prove that any projedion having a one-dimensional rangC' arises in this way. 2.13 We know that the choice of an origin 0 in Euclidean 3-space 1E3 indu<;C's a vector space structure in 1E3 (under the correspondence X f-+ OX) and that this vector space is three-dimensional. Show that a geometric plane through 0 becomes a two- dimensional subspace. 2.14 An m-dimensional plane ,11 is a translate N +ao of an m-dimensional subspa<;e N. Let {{3i} ~ be any basis of N, and set ai = {3i +ao. Show that M is exactly the set of linear combinations such that m LXi = 1. o 2.15 Show that Exercise 2.14 is a corollary of Exercise 4.14 of Chapter l. 2.16 Show, conversely, that if a plane M is the affine span of m + 1 elements, then its dimension is ::;: m. 2.17 From the above two exer<;ises concoct a direct definition of the dimension of an affine subspace.
  • 93. 2.3 THE DUAL SPACE 81 2.18 Write a small essay suggested by the following definition. An (rn + I)-tuple {ai}O' is affinely independent if the conditions together imply that m L: Xiai = 0 o Xi = 0 and m L: Xi = 0 o for all i. 2.19 A polynomial on a vector space V is a real-valued function on V which can be represented as a finite sum of finite products of linear functionals. Define the degree of a polynomial; define a hornogeneous polynornial oj degree k. Show that the set of homogeneous polynomials of degree k is a vector space X k • - 2.20 Continuing the above exercise, show that if kl < k2 < ... < kN, then the vector spaces {XkJf are independent sub3paces of the vector space of all polynomials. [Assume that a polynom'ial p(t) of a real variable can be the zero function only if all its coefficients are O. For any polynomial P on V consider the polynomials p,,(t) = P(ta).J 2.21 Let -<a, {3 >- be a basis for the two-dimensional space V, and let -<~, p. >- be the corresponding coordinate projections (dual basis in V*). Show that every polynomial on V "is a polynomial in the two variables ~ and p.". 2.22 Let -<a, {3 >- be a basis for a two-dimensional vector space V, and let -<~, p. >- be the corresponding coordinate projections (dual basis for V*). Show that -<~2, ~p., p.2 >- is a basis for the vector space of homogeneous polynomials on V of degree 2. Similarly, compute the dimension of the space of homogeneous polynomials of degree 3 on a two-dimensional vector space. 2.23 Let V and W be two-dimensional vector spaces, and let F be a mapping from V to W. Using coordinate systems, define the notion of F being quadratic and then show that it is independent of coordinate systems. Generalize the above exercise to higher dimensions and also to higher degrees. 2.24 Now let F: V ~ W be a mapping between two-dimensional spaces such that for any u, v E V and any l E W*, l(F(tu + v)) is a quadratic function of t, that is, of the form at2 + bt + c. Show that F is quadratic according to your definition in the above exercises. 3. THE DUAL SPACE Although throughout this section all spaces will be assumed finite-dimensional, many of the definitions and properties are valid for infinite-dimensional spaces as well. But for such spaces there is a difference between pmely algebraic situations and situations in which algebra is mixed with hypotheses of continuity. One of the blessings of finite dimensionality is the absence of this complication. As the reader has probably surmised from the number of special linear functionals we have met, particularly the coordinate functionals, the space Hom(V, IR) of all linear functionals on V plays a special role.
  • 94. 82 FINITE-DIMENSIONAL VECTOR SPACES 2.3 Definition. The dual space (or conjugate space) V* of the vector space V is the vector space Hom(V, IR) of all linear mappings from V to IR. Its elements are called linear functionals. Weare going to see that in a certain sense V is in turn the dual space of V* (V and (V*)* are naturally isomorphic), so that the two spaces are sym- metrically related. We shall briefly study the notion of annihilation (orthogonal- ity) which has its origins in this setting, and then see that there is a natural isomorphism between Hom(V, W) and Hom(W*, V*). This gives the mathema- tician a new tool to use in studying a linear transformation Tin Hom(V, W); the relationship between T and its image T* exposes new properties of T itself. Dual bases. At the outset one naturally wonders how big a space V* is, and we settle the question immediately. Theorem 3.1. Let {f3i}~ be an ordered basis for V, and let ej be the corre- sponding jth coordinate functional on V: ej(l;) = Xi> where ~ = L~ Xif3i. Then {ejg is an ordered basis for V*. Proof. Let us first make the proof by a direct elementary calculation. a) Independence. Suppose that L~ Cjej = 0, that is, L~ Cjej(O = °for all ~ E V. Taking ~ = f3i and remembering that the coordinate n-tuple of f3i is ~i, we see that the above equation reduces to Ci = 0, and this for all i. There- fore, {ej}~ is independent. b) Spanning. First note that the basis expansion ~ = L Xif3i can be re- written ~ = L ei(~)f3i' Then for any AE V* we have A(~) = L~ liei(~)' where we have set li = A(f3i). That is, A= L liei. This shows that {ej} ~ spans V*, and, together with (a), that it is a basis. D Definition. The basis {ej} for V* is called the dual of the basis {f3i} for V. As usual, one of our fundamental isomorphisms is lurking behind all this, but we shall leave its exposure to an exercise. Corollary. d(V*) = dey). The three equations A(~) = L A(f3i) . ei(~) are worth looking at. The first two are symmetrically related, each presenting the basis expansion of a vector with its coefficients computed by applying the corresponding element of the dual basis to the vector. The third is symmetric itself between ~ and A. Since a finite-dimensional space V and its dual space V* have the same dimension, they are of course isomorphic. In fact, each basis for V defines an isomorphism, for we have the associated coordinate isomorphism from V to IRn, the dual basis isomorphism from IRn to V*, and therefore the composite isomor-
  • 95. 2.3 THE DUAL SPACE 83 phism from V to V*. This isomorphism varies with the basis, however, and there is in general no natural isomorphism between V and V*. It is another matter with Cartesian space IRn because it has a standard basis, and therefore a standard isomorphism with its dual space (IRn)*. It is not hard to see that this is the isomorphism a 1-+ La, where La(x) = L~ aiXi, that we discussed in Section 1.6. We can therefore feel free to identify IR n with (IRn)*, only keeping in mind that when we think of an n-tuple a as a linear functional, we mean the functional La(x) = L~ aiXi. The second conjugate space. Despite the fact that V and V* are not naturally isomorphic in general, we shall now see that V is naturally isomorphic to V** = (V*)*. TheorelD 3.2. The function w: V X V* ~ IR defined by w(~, f) = f(O is bilinear, and the mapping ~ 1-+ w~ from V to V** is a natural isomorphism. Proof. In this context we generally set ~** = w~, so that ~** is defined by ~**(f) = fW for all f E V*. The bilinearity of w should be clear, and Theorem 6.1 of Chapter 1 therefore applies. The reader might like to run through a direct check of the linearity of ~ 1-+ ~** starting with (Cl h +C2 ~2) **(1). There still is the question of the injectivity of this mapping. If a ~ 0, we can find f E V* so that f(a) ~ O. One way is to make a the first vector of an ordered basis and to takefas the first functional in the dual basis; thenf(a) = 1. Since a**(f) = f(a) ~ 0, we see in particular that a** ~ O. The mapping ~ ~ ~** is thus injective, and it is then bijective by the corollary of Theorem 2.4. 0 If we think of V** as being naturally identified with V in this way, the two Hpaces V and V* are symmetrically related to each other. Each is the dual of t.he other. In the expression 'f(~)' we think of both symbols as variables and t.hen hold one or the other fixed for the two interpretations. In such a situation we often use a more symmetric symbolism, such as (~,f), to indicate our inten- t.ion to treat both symbols as variables. LelDlDa 3.1. If {Xi} is the basis in V* dual to the basis {ai} in V, then {ai*} is the basis in V** dual to the basis {Xi} in V*. 1'l'oof. We have ai*(Xj) = Xj(ai) = 5}, which shows that a{* is the ith coordi- nate projection. In case the reader has forgotten, the basis expansion f = L CjXj implies that ai*(f) = f(ai) = (L CjXj) (ai) = Ci, so that ai* is the mapping J1-+ Ci. 0 Annihilator subspaces. It is in this dual situation that orthogonality first naturally appears. However, we shall save the term 'orthogonal' for the latter enntext in which V and V* have been identified through a scalar product, and shall speak here of the annihilator of a set rather than its orthogonal com- plement.
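Before passing to annihilators, it may help to see the dual basis machinery once in coordinates. Under the identification of (ℝⁿ)* with ℝⁿ described above, the functionals dual to a basis {βᵢ} (taken as the columns of a matrix b) are simply the rows of b⁻¹. The sketch below is an editorial illustration rather than part of the text; it assumes Python with the numpy library:

    import numpy as np

    # The basis {beta_i} of R^3 from Exercise 1.13, as the columns of b.
    b = np.array([[1.0, 1.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])

    # Row i of b_inv is the coordinate functional e_i: it reads off the
    # ith coordinate of xi in the expansion xi = sum_i x_i beta_i.
    b_inv = np.linalg.inv(b)

    # e_i(beta_j) = delta_ij: the rows of b_inv applied to the columns
    # of b give the identity matrix.
    print(np.allclose(b_inv @ b, np.eye(3)))     # True

    # Theorem 3.1(b): any functional lam is recovered from its values
    # on the basis, lam = sum_i lam(beta_i) e_i.
    lam = np.array([2.0, -1.0, 5.0])             # lam as a row vector
    values = lam @ b                             # the numbers lam(beta_i)
    print(np.allclose(values @ b_inv, lam))      # True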
Definition. If A ⊂ V, the annihilator of A, A°, is the set of all f in V* such that f(α) = 0 for all α in A. Similarly, if A ⊂ V*, then A° = {α ∈ V : f(α) = 0 for all f ∈ A}.

If we view V as (V*)*, the second definition is included in the first. The following properties are easily established and will be left as exercises:

1) A° is always a subspace.
2) A ⊂ B ⟹ B° ⊂ A°.
3) (L(A))° = A°.
4) (A ∪ B)° = A° ∩ B°.
5) A ⊂ A°°.

We now add one more crucial dimensional identity to those of the last section.

Theorem 3.3. If W is a subspace of V, then d(V) = d(W) + d(W°).

Proof. Let {βᵢ}₁ᵐ be a basis for W, and extend it to a basis {βᵢ}₁ⁿ for V. Let {λᵢ}₁ⁿ be the dual basis in V*. We claim that then {λₘ₊₁, ..., λₙ} is a basis for W°. First, if j > m, then λⱼ(βᵢ) = 0 for i = 1, ..., m, and so λⱼ is in W° by (3) above. Thus {λₘ₊₁, ..., λₙ} ⊂ W°. Now suppose that f ∈ W°, and let f = ∑ⱼ₌₁ⁿ cⱼλⱼ be its (dual) basis expansion. Then for each i ≤ m we have cᵢ = f(βᵢ) = 0, since βᵢ ∈ W and f ∈ W°; therefore, f = ∑ cⱼλⱼ with the sum running from m + 1 to n. Thus every f in W° is in the span of {λₘ₊₁, ..., λₙ}. Altogether, we have shown that W° is the span of {λₘ₊₁, ..., λₙ}, as claimed. Then d(W°) + d(W) = (n − m) + m = n = d(V), and we are done. □

Corollary. A°° = L(A) for every subset A ⊂ V.

Proof. Since (L(A))° = A°, we have d(L(A)) + d(A°) = d(V), by the theorem. Also d(A°) + d(A°°) = d(V*) = d(V). Thus d(A°°) = d(L(A)), and since L(A) ⊂ A°°, by (5) above, we have L(A) = A°°. □

The adjoint of T. We shall now see that with every T in Hom(V, W) there is naturally associated an element of Hom(W*, V*) which we call the adjoint of T and designate T*. One consequence of the intimate relationship between T and T* is that the range of T* is exactly the annihilator of the null space of T. Combined with our dimensional identities, this implies that the ranges of T and T* have the same dimension. And later on, after we have established the connection between matrix representations of T and T*, this turns into the very mysterious fact that the dimension of the linear span of the row vectors of an m-by-n matrix is the same as the dimension of the linear span of its column vectors, which gives us our notion of the rank of a matrix.

In Chapter 5 we shall study a situation (Hilbert space) in which we are given a fixed fundamental isomorphism between V and V*. If T is in Hom V, then of course T* is in Hom V*, and we can use this isomorphism to "transfer" T* into Hom V. But now T can be compared with its (transferred) adjoint T*, and they may be equal. That is, T may be self-adjoint. It turns out that the self-adjoint transformations are "nice" ones, as we shall see for ourselves in simple cases, and also, fortunately, that many important linear maps arising from theoretical physics are self-adjoint.

If T ∈ Hom(V, W) and l ∈ W*, then of course l∘T ∈ V*. Moreover, the mapping l ↦ l∘T (T fixed) is a linear mapping from W* to V* by the corollary to Theorem 3.3 of Chapter 1. This mapping is called the adjoint of T and is designated T*. Thus T* ∈ Hom(W*, V*) and T*(l) = l∘T for all l ∈ W*.

Theorem 3.4. The mapping T ↦ T* is an isomorphism from the vector space Hom(V, W) to the vector space Hom(W*, V*). Also (T∘S)* = S*∘T* under the relevant hypotheses on domains and codomains.

Proof. Everything we have said above through the linearity of T ↦ T* is a consequence of the bilinearity of ω(l, T) = l∘T. The map we have called T* is simply ω_T, and the linearity of T ↦ T* thus follows from Theorem 6.1 of Chapter 1. Again the reader might benefit from a direct linearity check, beginning with (c₁T₁ + c₂T₂)*(l). To see that T ↦ T* is injective, we take any T ≠ 0 and choose α ∈ V so that T(α) ≠ 0. We then choose l ∈ W* so that l(T(α)) ≠ 0. Since l(T(α)) = (T*(l))(α), we have verified that T* ≠ 0. Next, if d(V) = m and d(W) = n, then also d(V*) = m and d(W*) = n by the corollary of Theorem 3.1, and d(Hom(V, W)) = mn = d(Hom(W*, V*)) by Theorem 2.5. The injective map T ↦ T* is thus an isomorphism (by the corollary of Theorem 2.4). Finally, (T∘S)*l = l∘(T∘S) = (l∘T)∘S = S*(l∘T) = S*(T*(l)) = (S*∘T*)l, so that (T∘S)* = S*∘T*. □

The reader would probably guess that T** becomes identified with T under the identification of V with V**. This is so, and it is actually the reason for calling the isomorphism ξ ↦ ξ** natural. We shall return to this question at the end of the section. Meanwhile, we record an important elementary identity.

Theorem 3.5. (R(T*))° = N(T) and N(T*) = (R(T))°.

Proof. The following statements are definitionally equivalent in pairs as they occur: l ∈ N(T*); T*(l) = 0; l∘T = 0; l(T(ξ)) = 0 for all ξ ∈ V; l ∈ (R(T))°. Therefore, N(T*) = (R(T))°. The other proof is similar and will be left to the reader. [Start with α ∈ N(T) and end with α ∈ (R(T*))°.] □

The rank of a linear transformation is the dimension of its range space.

Corollary. The rank of T* is equal to the rank of T.

Proof. The dimensions of R(T) and (N(T))° are each d(V) − d(N(T)) by Theorems 2.4 and 3.3, and the second is d(R(T*)) by the above theorem. Therefore, d(R(T)) = d(R(T*)). □
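In matrix language (anticipating Section 4, where T* will be seen to act through the transposed matrix), the corollary says that a matrix and its transpose have the same rank. A quick numerical check, added editorially and assuming numpy:

    import numpy as np

    # A 3 x 4 matrix of rank 2: its third row is the sum of the first two.
    t = np.array([[1.0, 2.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0, 3.0],
                  [1.0, 3.0, 1.0, 4.0]])

    # rank T = d(R(T)); the corollary asserts rank T* = rank T.
    print(np.linalg.matrix_rank(t))    # 2
    print(np.linalg.matrix_rank(t.T))  # 2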
Dyads. Consider any T in Hom(V, W) whose range M is one-dimensional. If β is a nonzero vector in M, then x ↦ xβ is a basis isomorphism θ: ℝ → M, and θ⁻¹∘T: V → ℝ is a linear functional λ ∈ V*. Then T = θ∘λ and T(ξ) = λ(ξ)β for all ξ. We write this as T = λ(·)β, and call any such T a dyad.

Lemma 3.2. If T is the dyad λ(·)β, then T* is the dyad β**(·)λ.

Proof. (T*(l))(ξ) = (l∘T)(ξ) = l(T(ξ)) = l(λ(ξ)β) = l(β)λ(ξ), so that T*(l) = l(β)λ = β**(l)λ, and T* = β**(·)λ. □

*Natural isomorphisms again. We are now in a position to illustrate more precisely the notion of a natural isomorphism. We saw above that among all the isomorphisms from a finite-dimensional vector space V to its second dual, we could single one out naturally, namely, the map ξ ↦ ξ**, where ξ**(f) = f(ξ) for all f in V*. Let us call this isomorphism φ_V. The technical meaning of the word 'natural' pertains to the collection {φ_V} of all these isomorphisms; we found a way to choose one isomorphism φ_V for each space V, and the proof that this is a "natural" choice lies in the smooth way the various φ_V's relate to each other. To see what we mean by this, consider two finite-dimensional spaces V and W and a map T in Hom(V, W). Then T* is in Hom(W*, V*) and T** = (T*)* is in Hom(V**, W**). The setting for the four maps T, T**, φ_V, and φ_W can be displayed in a diagram as follows:

              T
       V ----------> W
       |             |
      φ_V           φ_W
       |             |
       v             v
      V** --------> W**
             T**

The diagram indicates two maps, φ_W∘T and T**∘φ_V, from V to W**, and we define the collection of isomorphisms {φ_V} to be natural if these two maps are always equal for any V, W, and T. This is the condition that the two ways of going around the diagram give the same result, i.e., that the diagram be commutative. Put another way, it is the condition that T "become" T** when V is identified with V** (by φ_V) and W is identified with W** (by φ_W). We leave its proof as an exercise.

EXERCISES

3.1 Let θ be an isomorphism from a vector space V to ℝⁿ. Show that the functionals {πᵢ∘θ}₁ⁿ form a basis for V*.

3.2 Show that the standard isomorphism from ℝⁿ to (ℝⁿ)* that we get by composing the coordinate isomorphism for the standard basis for ℝⁿ (the identity) with the dual basis isomorphism for (ℝⁿ)* is just our friend a ↦ Lₐ, where Lₐ(x) = ∑₁ⁿ aᵢxᵢ. (Show that the dual basis isomorphism is a ↦ ∑₁ⁿ aᵢπᵢ.)

3.3 We know from Theorem 1.6 that a choice of a basis {βᵢ} for V defines an isomorphism from Wⁿ to Hom(V, W) for any vector space W. Apply this fact and Theorem 1.3 to obtain a basis in V*, and show that this basis is the dual basis of {βᵢ}.

3.4 Prove the properties of A° that are listed in the text.

3.5 Find (a basis for) the annihilator of <1, 1, 1> in ℝ³. (Use the isomorphism of (ℝ³)* with ℝ³ to express the basis vectors as triples.)

3.6 Find (a basis for) the annihilator of {<1, 1, 1>, <1, 2, 3>} in ℝ³.

3.7 Find (a basis for) the annihilator of {<1, 1, 1, 1>, <1, 2, 3, 4>} in ℝ⁴.

3.8 Show that if V = M ⊕ N, then V* = M° ⊕ N°.

3.9 Show that if M is any subspace of an n-dimensional vector space V and d(M) = m, then M can be viewed as being the linear span of an independent subset of m elements of V or as being the annihilator of (the intersection of the null spaces of) an independent subset of n − m elements of V*.

3.10 If B = {fᵢ}₁ᵐ is a finite collection of linear functionals on V (B ⊂ V*), then its annihilator B° is simply the intersection N = ∩₁ᵐ Nᵢ of the null spaces Nᵢ = N(fᵢ) of the functionals fᵢ. State the dual of Theorem 3.3 in this context. That is, take W as the linear span of the functionals fᵢ, so that W ⊂ V* and W° ⊂ V. State the dual of the corollary.

3.11 Show that the following theorem is a consequence of the corollary of Theorem 3.3.

Theorem. Let N be the intersection ∩₁ᵐ Nᵢ of the null spaces of a set {fᵢ}₁ᵐ of linear functionals on V, and suppose that g in V* is zero on N. Then g is a linear combination of the set {fᵢ}₁ᵐ.

3.12 A corollary of Theorem 3.3 is that if W is a proper subspace of V, then there is at least one nonzero linear functional f in V* such that f = 0 on W. Prove this fact directly by elementary means. (You are allowed to construct a suitable basis.)

3.13 An m-tuple of linear functionals {fᵢ}₁ᵐ on a vector space V defines a linear mapping α ↦ <f₁(α), ..., fₘ(α)> from V to ℝᵐ. What theorem is being applied here? Prove that the range of this linear mapping is the whole of ℝᵐ if and only if {fᵢ}₁ᵐ is an independent set of functionals. [Hint: If the range is a proper subspace W, there is a nonzero m-tuple a such that ∑₁ᵐ aᵢxᵢ = 0 for all x ∈ W.]

3.14 Continuing the above exercise, what is the null space N of the linear mapping α ↦ <f₁(α), ..., fₘ(α)>? If g is a linear functional which is zero on N, show that g is a linear combination of the fᵢ, now as a corollary of the above exercise and Theorem 4.3 of Chapter 1. (Assume the set {fᵢ}₁ᵐ independent.)

3.15 Write out from scratch the proof that T* is linear [for a given T in Hom(V, W)]. Also prove directly that T ↦ T* is linear.

3.16 Prove the other half of Theorem 3.5.

3.17 Let θᵢ be the isomorphism α ↦ α** from Vᵢ to Vᵢ** for i = 1, 2, and suppose given T in Hom(V₁, V₂). The loose statement T = T** means exactly that T**∘θ₁ = θ₂∘T. Prove this identity. As usual, do this by proving that it holds for each α in V₁.

3.18 Let θ: ℝⁿ → V be a basis isomorphism. Prove that the adjoint θ* is the coordinate isomorphism for the dual basis if (ℝⁿ)* is identified with ℝⁿ in the natural way.

3.19 Let ω be any bilinear functional on V × W. Then the two associated linear transformations are T: V → W* defined by (T(ξ))(η) = ω(ξ, η) and S: W → V* defined by (S(η))(ξ) = ω(ξ, η). Prove that S = T* if W is identified with W**.

3.20 Suppose that f in (ℝᵐ)* has coordinate m-tuple a [f(y) = ∑₁ᵐ aᵢyᵢ] and that T in Hom(ℝⁿ, ℝᵐ) has matrix t = {tᵢⱼ}. Write out the explicit expression of the number f(T(x)) in terms of all these coordinates. Rearrange the sum so that it appears in the form g(x) = ∑₁ⁿ bᵢxᵢ, and then read off the formula for b in terms of a.

4. MATRICES

Matrices and linear transformations. The reader has already learned something about matrices and their relationship to linear transformations from Chapter 1; we shall begin our more systematic discussion by reviewing this earlier material.

By popular conception a matrix is a rectangular array of numbers such as

    t₁₁  t₁₂  ⋯  t₁ₙ
    t₂₁  t₂₂  ⋯  t₂ₙ
    ⋮    ⋮        ⋮
    tₘ₁  tₘ₂  ⋯  tₘₙ

Note that the first index numbers the rows and the second index numbers the columns. If there are m rows and n columns in the array, it is called an m-by-n (m × n) matrix. This notion is inexact. A rectangular array is a way of picturing a matrix, but a matrix is really a function, just as a sequence is a function. With the notation m̄ = {1, ..., m}, the above matrix is a function assigning a number to every pair of integers <i, j> in m̄ × n̄. It is thus an element of the set ℝ^(m̄×n̄). The addition of two m × n matrices is performed in the obvious place-by-place way, and is merely the addition of two functions in ℝ^(m̄×n̄); the same is true for scalar multiplication. The set of all m × n matrices is thus the vector space ℝ^(m̄×n̄), a Cartesian space with a rather fancy finite index set. We shall use the customary index notation tᵢⱼ for the value t(i, j) of the function t at <i, j>, and we shall also write {tᵢⱼ} for t, just as we do for sequences and other indexed collections.

The additional properties of matrices stem from the correspondence between m × n matrices {tᵢⱼ} and transformations T ∈ Hom(ℝⁿ, ℝᵐ). The following theorem restates results from the first chapter. See Theorems 1.2, 1.3, and 6.2 of Chapter 1 and the discussion of the linear combination map at the end of Section 1.6.

Theorem 4.1. Let {tᵢⱼ} be an m-by-n matrix, and let tʲ be the m-tuple that is its jth column for j = 1, ..., n. Then there is a unique T in Hom(ℝⁿ, ℝᵐ) such that skeleton T = {tʲ}, i.e., such that T(δʲ) = tʲ for all j. T is defined as the linear combination mapping x ↦ y = ∑ⱼ₌₁ⁿ xⱼtʲ, and an equivalent presentation of T is the collection of scalar equations

    yᵢ = ∑ⱼ₌₁ⁿ tᵢⱼxⱼ    for i = 1, ..., m.

Each T in Hom(ℝⁿ, ℝᵐ) arises this way, and the bijection {tᵢⱼ} ↦ T from ℝᵐˣⁿ to Hom(ℝⁿ, ℝᵐ) is a natural isomorphism.

The only additional remark called for here is that in identifying an m × n matrix with an n-tuple of m-tuples, we are making use of one of the standard identifications of duality (Section 0.10). We are treating the natural isomorphism between the really distinct spaces ℝᵐˣⁿ and (ℝᵐ)ⁿ as though it were the identity.

We can also relate T to {tᵢⱼ} by way of the rows of {tᵢⱼ}. As above, taking ith coordinates in the m-tuple equation y = ∑ⱼ₌₁ⁿ xⱼtʲ, we get the equivalent and familiar system of numerical (scalar) equations yᵢ = ∑ⱼ₌₁ⁿ tᵢⱼxⱼ for i = 1, ..., m. Now the mapping x ↦ ∑ⱼ₌₁ⁿ cⱼxⱼ from ℝⁿ to ℝ is the most general linear functional on ℝⁿ. In the above numerical equations, therefore, we have simply used the m rows of the matrix {tᵢⱼ} to present the m-tuple of linear functionals on ℝⁿ which is equivalent to the single m-tuple-valued linear mapping T in Hom(ℝⁿ, ℝᵐ) by Theorem 3.6 of Chapter 1.

The choice of ordered bases for arbitrary finite-dimensional spaces V and W allows us to transfer the above theorem to Hom(V, W). Since we are now going to correlate a matrix t in ℝᵐˣⁿ with a transformation T in Hom(V, W), we shall designate the transformation in Hom(ℝⁿ, ℝᵐ) discussed above by T̄.

Theorem 4.2. Let {αⱼ}₁ⁿ and {βᵢ}₁ᵐ be ordered bases for the vector spaces V and W, respectively. For each matrix {tᵢⱼ} in ℝᵐˣⁿ let T be the unique element of Hom(V, W) such that T(αⱼ) = ∑ᵢ₌₁ᵐ tᵢⱼβᵢ for j = 1, ..., n. Then the mapping {tᵢⱼ} ↦ T is an isomorphism from ℝᵐˣⁿ to Hom(V, W).

Proof. We simply combine the isomorphism {tᵢⱼ} ↦ T̄ of the above theorem with the isomorphism T̄ ↦ T = ψ∘T̄∘φ⁻¹ from Hom(ℝⁿ, ℝᵐ) to Hom(V, W), where φ and ψ are the two given basis isomorphisms. Then T is the transformation described in the theorem, for T(αⱼ) = ψ(T̄(φ⁻¹(αⱼ))) = ψ(T̄(δʲ)) = ψ(tʲ) = ∑ᵢ₌₁ᵐ tᵢⱼβᵢ. The map {tᵢⱼ} ↦ T is the composition of two isomorphisms and so is an isomorphism. □

It is instructive to look at what we have just done in a slightly different way. Given the matrix {tᵢⱼ}, let τⱼ be the vector in W whose coordinate m-tuple is the jth column tʲ of the matrix, so that τⱼ = ∑ᵢ₌₁ᵐ tᵢⱼβᵢ. Then let T be the unique element of Hom(V, W) such that T(αⱼ) = τⱼ for j = 1, ..., n. Now we have obtained T from {tᵢⱼ} in the following two steps: T corresponds to the n-tuple {τⱼ}₁ⁿ under the isomorphism from Hom(V, W) to Wⁿ given by Theorem 1.6, and {τⱼ}₁ⁿ corresponds to the matrix {tᵢⱼ} by extension of the coordinate isomorphism between W and ℝᵐ to its product isomorphism from Wⁿ to (ℝᵐ)ⁿ.

Corollary. If y is the coordinate m-tuple of the vector η in W and x is the coordinate n-tuple of ξ in V (with respect to the given bases), then η = T(ξ) if and only if yᵢ = ∑ⱼ₌₁ⁿ tᵢⱼxⱼ for i = 1, ..., m.

Proof. We know that the scalar equations are equivalent to y = T̄(x), which is the equation y = ψ⁻¹∘T∘φ(x). The isomorphism ψ converts this to the equation η = T(ξ). □

Our problem now is to discover the matrix analogues of relationships between linear transformations. For transformations between the Cartesian spaces ℝⁿ this is a fairly direct, uncomplicated business, because, as we know, the matrix here is a natural alter ego for the transformation. But when we leave the Cartesian spaces, a transformation T no longer has a matrix in any natural way, and only acquires one when bases are chosen and a corresponding T̄ on Cartesian spaces is thereby obtained. All matrices now are determined with respect to chosen bases, and all calculations are complicated by the necessary presence of the basis and coordinate isomorphisms. There are two ways of handling this situation. The first, which we shall follow in general, is to describe things directly for the general space V and simply to accept the necessarily more complicated statements involving bases and dual bases and the corresponding loss in transparency. The other possibility is first to read off the answers for the Cartesian spaces and then to transcribe them via coordinate isomorphisms.

Lemma 4.1. The matrix element tₖⱼ can be obtained from T by the formula

    tₖⱼ = μₖ(T(αⱼ)),

where μₖ is the kth element of the dual basis in W*.

Proof. μₖ(T(αⱼ)) = μₖ(∑ᵢ₌₁ᵐ tᵢⱼβᵢ) = ∑ᵢ tᵢⱼμₖ(βᵢ) = ∑ᵢ tᵢⱼδᵢₖ = tₖⱼ. □

In terms of Cartesian spaces, T̄(δʲ) is the jth column m-tuple tʲ in the matrix {tᵢⱼ} of T̄, and tₖⱼ is the kth coordinate of tʲ. From the point of view of linear maps, the kth coordinate is obtained by applying the kth coordinate projection πₖ, so that tₖⱼ = πₖ(T̄(δʲ)). Under the basis isomorphisms, πₖ becomes μₖ, T̄ becomes T, δʲ becomes αⱼ, and the Cartesian identity becomes the identity of the lemma.

The transpose. The transpose of the m × n matrix {tᵢⱼ} is the n × m matrix {tᵢⱼ*} defined by tᵢⱼ* = tⱼᵢ for all i, j. The rows of t* are of course the columns of t, and conversely.

Theorem 4.3. The matrix of T* with respect to the dual bases in W* and V* is the transpose of the matrix of T.

Proof. If s is the matrix of T*, then Lemmas 3.1 and 4.1 imply that

    sⱼᵢ = αⱼ**(T*(μᵢ)) = αⱼ**(μᵢ∘T) = (μᵢ∘T)(αⱼ) = μᵢ(T(αⱼ)) = tᵢⱼ. □

Definition. The row space of the matrix {tᵢⱼ} ∈ ℝᵐˣⁿ is the subspace of ℝⁿ spanned by the m row vectors. The column space is similarly the span of the n column vectors in ℝᵐ.

Corollary. The row and column spaces of a matrix have the same dimension.

Proof. If T is the element of Hom(ℝⁿ, ℝᵐ) defined by T(δʲ) = tʲ, then the set {tʲ}₁ⁿ of column vectors in the matrix {tᵢⱼ} is the image under T of the standard basis of ℝⁿ, and so its span, which we have called the column space of the matrix, is exactly the range of T. In particular, the dimension of the column space is d(R(T)) = rank T. Since the matrix of T* is the transpose t* of the matrix t, we have, similarly, that rank T* is the dimension of the column space of t*. But the column space of t* is the row space of t, and the assertion of the corollary is thus reduced to the identity rank T* = rank T, which is the corollary of Theorem 3.5. □

This common dimension is called the rank of the matrix.

Matrix products. If T ∈ Hom(ℝⁿ, ℝᵐ) and S ∈ Hom(ℝᵐ, ℝˡ), then of course R = S∘T ∈ Hom(ℝⁿ, ℝˡ), and it certainly should be possible to calculate the matrix r of R from the matrices s and t of S and T, respectively. To make this computation, we set y = T(x) and z = S(y), so that z = (S∘T)(x) = R(x). The equivalent scalar equations in terms of the matrices t and s are

    yᵢ = ∑ₕ₌₁ⁿ tᵢₕxₕ    and    zₖ = ∑ᵢ₌₁ᵐ sₖᵢyᵢ,

so that

    zₖ = ∑ᵢ₌₁ᵐ sₖᵢ(∑ₕ₌₁ⁿ tᵢₕxₕ) = ∑ₕ₌₁ⁿ (∑ᵢ₌₁ᵐ sₖᵢtᵢₕ)xₕ.

But zₖ = ∑ₕ₌₁ⁿ rₖₕxₕ for k = 1, ..., l. Taking x as δʲ, we have

    rₖⱼ = ∑ᵢ₌₁ᵐ sₖᵢtᵢⱼ

for all k and j. We thus have found the formula for the matrix r of the map R = S∘T: x → z. Of course, r is defined to be the product of the matrices s and t, and we write r = s·t or r = st.

Note that in order for the product st to be defined, the number of columns in the left factor must equal the number of rows in the right factor. We get the element rₖⱼ by going across the kth row of s and simultaneously down the jth column of t, multiplying corresponding elements as we go, and adding the resulting products. This process is illustrated in Fig. 2.1. In terms of the scalar product (x, y) = ∑₁ⁿ xᵢyᵢ on ℝⁿ, we see that the element rₖⱼ in r = st is the scalar product of the kth row of s and the jth column of t.

[Fig. 2.1: the product of an l × m matrix s and an m × n matrix t is the l × n matrix r; the kth row of s and the jth column of t combine to give the element rₖⱼ.]

Since we have defined the product of two matrices as the matrix of the product of the corresponding transformations, i.e., so that the mapping T ↦ {tᵢⱼ} preserves products (S∘T ↦ st), it follows from the general principle of Theorem 4.1 of Chapter 1 that the algebraic laws satisfied by composition of transformations will automatically hold for the product of matrices. For example, we know without making an explicit computation that matrix multiplication is associative. Then for square matrices we have the following theorem.

Theorem 4.4. The set Mₙ of square n × n matrices is an algebra naturally isomorphic to the algebra Hom(ℝⁿ).

Proof. We already know that T ↦ {tᵢⱼ} is a natural linear isomorphism from Hom(ℝⁿ) to Mₙ (Theorem 4.1), and we have defined the product of matrices so that the mapping also preserves multiplication. The laws of algebra (for an algebra) therefore follow for Mₙ from our observation in Theorem 3.5 of Chapter 1 that they hold for Hom(ℝⁿ). □

The identity I in Hom(ℝⁿ) takes the basis vector δʲ into itself, and therefore its matrix e has δʲ for its jth column: eʲ = δʲ. Thus eᵢⱼ = 1 if i = j and eᵢⱼ = 0 if i ≠ j. That is, the matrix e is 1 along the main diagonal (from upper left to lower right) and 0 elsewhere. Since I ↦ e under the algebra isomorphism T ↦ t, we know that e is the identity for matrix multiplication. Of course, we can check this directly: ∑ⱼ₌₁ⁿ tᵢⱼeⱼₖ = tᵢₖ, and similarly for multiplying by e on the left. The symbol 'e' is ambiguous in that we have used it to denote the identity in the space ℝⁿˣⁿ of square n × n matrices for any n.

Corollary. A square n × n matrix t has a multiplicative inverse if and only if its rank is n.

Proof. By the theorem there exists an s ∈ Mₙ such that st = ts = e if and only if there exists an S ∈ Hom(ℝⁿ) such that S∘T = T∘S = I. But such an S exists if and only if T is an isomorphism, and by the corollary to Theorem 2.4 this is equivalent to the dimension of the range of T being n. But this dimension is the rank of t, and the argument is complete. □

A square matrix (or a transformation in Hom V) is said to be nonsingular if it is invertible.

Theorem 4.5. If {αᵢ}₁ⁿ, {βⱼ}₁ᵐ, and {γₖ}₁ˡ are ordered bases for the vector spaces U, V, and W, respectively, and if T ∈ Hom(U, V) and S ∈ Hom(V, W), then the matrix of S∘T is the product of the matrices of S and T (with respect to the given bases).

Proof. By definition the matrix of S∘T is the matrix of (S∘T)‾ = χ⁻¹∘(S∘T)∘φ in Hom(ℝⁿ, ℝˡ), where φ and χ are the given basis isomorphisms for U and W. But if ψ is the basis isomorphism for V, we have

    (S∘T)‾ = (χ⁻¹∘S∘ψ)∘(ψ⁻¹∘T∘φ) = S̄∘T̄,

and therefore its matrix is the product of the matrices of S̄ and T̄ by the definition of matrix multiplication. The latter are the matrices of S and T with respect to the given bases. Putting these observations together, we have the theorem. □

There is a simple relationship between matrix products and transposition.

Theorem 4.6. If the matrix product st is defined, then so is t*s*, and t*s* = (st)*.

Proof. A direct calculation is easy. We have

    (st)ⱼₖ* = (st)ₖⱼ = ∑ᵢ₌₁ᵐ sₖᵢtᵢⱼ = ∑ᵢ₌₁ᵐ tⱼᵢ*sᵢₖ* = (t*s*)ⱼₖ.

Thus (st)* = t*s*, as asserted. □

This identity is clearly the matrix form of the transformation identity (S∘T)* = T*∘S*, and it can be deduced from the latter identity if desired.

Cartesian vectors as matrices. We can view an n-tuple x = <x₁, ..., xₙ> as being alternatively either an n × 1 matrix, in which case we call it a column vector, or a 1 × n matrix, in which case we call it a row vector. Of course, these identifications are natural isomorphisms. The point of doing this is, in part, that then the equations yᵢ = ∑ⱼ₌₁ⁿ tᵢⱼxⱼ say exactly that the column vector y is the matrix product of t and the column vector x, that is, y = t·x. The linear map T: ℝⁿ → ℝᵐ becomes left multiplication by the fixed matrix t when ℝⁿ is viewed as the space of n × 1 column vectors. For this reason we shall take the column vector as the standard matrix interpretation of an n-tuple x; then x* is the corresponding row vector.
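The formula rₖⱼ = ∑ᵢ sₖᵢtᵢⱼ, together with the interpretation y = t·x, translates directly into code. The following sketch is an editorial illustration in plain Python, not part of the text:

    def mat_mul(s, t):
        """Product r = s . t via r[k][j] = sum_i s[k][i] * t[i][j].

        s is l x m and t is m x n; the result r is l x n.  The number
        of columns of the left factor must equal the number of rows of
        the right factor, exactly as in the text.
        """
        l, m, n = len(s), len(t), len(t[0])
        assert all(len(row) == m for row in s), "columns of s must match rows of t"
        r = [[0.0] * n for _ in range(l)]
        for k in range(l):
            for j in range(n):
                # Scalar product of the kth row of s with the jth column of t.
                r[k][j] = sum(s[k][i] * t[i][j] for i in range(m))
        return r

    # y = t . x, with x viewed as an n x 1 column vector:
    t = [[1.0, 2.0], [3.0, 4.0]]
    x = [[5.0], [6.0]]
    print(mat_mul(t, x))  # [[17.0], [39.0]]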
In particular, a linear functional F ∈ (ℝⁿ)* becomes left multiplication by its matrix, which is of course 1 × n (F being from ℝⁿ to ℝ¹), and therefore is simply the row matrix interpretation of an n-tuple in ℝⁿ. That is, in the natural isomorphism a ↦ Lₐ from ℝⁿ to (ℝⁿ)*, where Lₐ(x) = ∑₁ⁿ aᵢxᵢ, the functional Lₐ can now be interpreted as left matrix multiplication by the n-tuple a viewed as the row vector a*. The matrix product of the row vector (1 × n matrix) a* and the column vector (n × 1 matrix) x is a 1 × 1 matrix a*·x, that is, a number.

Let us now see what these observations say about T*. The number Lₐ(T(x)) is the 1 × 1 matrix a*tx. Since Lₐ(T(x)) = (T*(Lₐ))(x) by the definition of T*, we see that the functional T*(Lₐ) is left multiplication by the row vector a*t. Since the row vector form of Lₐ is a* and the row vector form of T*(Lₐ) is a*t, this shows that when the functionals on ℝⁿ are interpreted as row vectors, T* becomes right multiplication by t. This only repeats something we already know. If we take transposes to throw the row vectors into the standard column vector form for n-tuples, it shows that T* is left multiplication by t*, and so gives another proof that the matrix of T* is t*.

Change of basis. If φ: x ↦ ξ = ∑₁ⁿ xᵢβᵢ and φ̄: y ↦ ξ = ∑₁ⁿ yᵢβ̄ᵢ are two basis isomorphisms for V, then A = φ̄⁻¹∘φ is the isomorphism in Hom(ℝⁿ) which takes the coordinate n-tuple x of a vector ξ with respect to the basis {βᵢ} into the coordinate n-tuple y of the same vector with respect to the basis {β̄ᵢ}. The isomorphism A is called the "change of coordinates" isomorphism. In terms of the matrix a of A, we have y = ax, as above.

The change of coordinate map A = φ̄⁻¹∘φ should not be confused with the similar looking T = φ̄∘φ⁻¹. The latter is a mapping on V, and is the element of Hom(V) which takes each βᵢ to β̄ᵢ.

[Fig. 2.2: a commutative diagram displaying T: V → W together with the basis isomorphisms φ₁, φ₂: ℝⁿ → V and ψ₁, ψ₂: ℝᵐ → W, the Cartesian maps T′, T″: ℝⁿ → ℝᵐ, and the change of coordinates maps A and B.]

We now want to see what happens to the matrix of a transformation T ∈ Hom(V, W) when we change bases in its domain and codomain spaces. Suppose then that φ₁ and φ₂ are basis isomorphisms from ℝⁿ to V, that ψ₁ and ψ₂ are basis isomorphisms from ℝᵐ to W, and that t′ and t″ are the matrices of T with respect to the first and second bases, respectively. That is, t′ is the matrix of T′ = (ψ₁)⁻¹∘T∘φ₁ ∈ Hom(ℝⁿ, ℝᵐ), and similarly for t″. The mapping A = φ₂⁻¹∘φ₁ ∈ Hom(ℝⁿ) is the change of coordinates transformation for V: if x is the coordinate n-tuple of a vector ξ with respect to the first basis [that is, ξ = φ₁(x)], then A(x) is its coordinate n-tuple with respect to the second basis. Similarly, let B be the change of coordinates map ψ₂⁻¹∘ψ₁ for W. The diagram in Fig. 2.2 will help keep the various relationships of these spaces and mappings straight. We say that the diagram is commutative, which means that any two paths between two points represent the same map. By selecting various pairs of paths, we can read off all the identities which hold for the nine maps T, T′, T″, φ₁, φ₂, A, ψ₁, ψ₂, B. For example, T″ can be obtained by going backward along A, forward along T′, and then forward along B. That is, T″ = B∘T′∘A⁻¹. Since these "outside maps" are all maps of Cartesian spaces, we can then read off the corresponding matrix identity

    t″ = bt′a⁻¹,

showing how the matrix of T with respect to the second pair of bases is obtained from its matrix with respect to the first pair.

What we have actually done in reading off the above identity from the diagram is to eliminate certain retraced steps in the longer path which the definitions would give us. Thus from the definitions we get

    B∘T′∘A⁻¹ = (ψ₂⁻¹∘ψ₁)∘(ψ₁⁻¹∘T∘φ₁)∘(φ₁⁻¹∘φ₂) = ψ₂⁻¹∘T∘φ₂ = T″.

In the above situation the domain and codomain spaces were different, and the two basis changes were independent of each other. If W = V, so that T ∈ Hom(V), then of course we consider only one basis change and the formula becomes t″ = a·t′·a⁻¹.

Now consider a linear functional F ∈ V*. If f″ and f′ are its coordinate n-tuples considered as column vectors (n × 1 matrices), then the matrices of F with respect to the two bases are the row vectors (f′)* and (f″)*, as we saw earlier. Also, there is no change of basis in the range space, since here W = ℝ, with its permanent natural basis vector 1. Therefore, b = e in the formula t″ = bt′a⁻¹, and we have (f″)* = (f′)*a⁻¹ or f″ = (a⁻¹)*f′. We want to compare this with the change of coordinates of a vector ξ ∈ V, which, as we saw earlier, is given by x″ = ax′. These changes go in opposite directions (with a transposition thrown in). For reasons largely historical, functionals F in V* are called covariant vectors, and since the matrix for a change of coordinates in V is the transpose of the inverse of the matrix for the corresponding change of coordinates in V*, the vectors ξ in V are called contravariant vectors. These terms are used in classical tensor analysis and differential geometry.

The isomorphism {tᵢⱼ} ↦ T, being from a Cartesian space ℝᵐˣⁿ, is automatically a basis isomorphism. Its basis in Hom(V, W) is the image under the isomorphism of the standard basis in ℝᵐˣⁿ, where the latter is the set of Kronecker functions δᵏˡ defined by δᵏˡ(i, j) = 0 if <k, l> ≠ <i, j> and δᵏˡ(k, l) = 1. (Remember that in ℝᴬ, δᵃ is that function such that δᵃ(b) = 0
if b ≠ a and δᵃ(a) = 1. Here A = m̄ × n̄ and the elements a of A are ordered pairs a = <k, l>.) The function δᵏˡ is that matrix whose columns are all 0 except for the lth, and the lth column is the m-tuple δᵏ. The corresponding transformation Dₖₗ thus takes every basis vector αⱼ to 0 except αₗ, and takes αₗ to βₖ. That is, Dₖₗ(αⱼ) = 0 if j ≠ l, and Dₖₗ(αₗ) = βₖ. Again, Dₖₗ takes the lth basis vector in V to the kth basis vector in W and takes the other basis vectors in V to 0. If ξ = ∑ xᵢαᵢ, it follows that Dₖₗ(ξ) = xₗβₖ.

Since {Dₖₗ} is the basis defined by the isomorphism {tᵢⱼ} ↦ T, it follows that {tᵢⱼ} is the coordinate set of T with respect to this basis; it is the image of T under the coordinate isomorphism. It is interesting to see how this basis expansion of T automatically appears. We have

    T(ξ) = ∑ᵢ,ⱼ tᵢⱼxⱼβᵢ = ∑ᵢ,ⱼ tᵢⱼDᵢⱼ(ξ),

so that T = ∑ᵢ,ⱼ tᵢⱼDᵢⱼ.

Our original discussion of the dual basis in V* was a special case of the present situation. There we had Hom(V, ℝ) = V*, with the permanent standard basis 1 for ℝ. The basis for V* corresponding to the basis {αᵢ} for V therefore consists of those maps Dₗ taking αₗ to 1 and αⱼ to 0 for j ≠ l. Then Dₗ(ξ) = Dₗ(∑ xⱼαⱼ) = xₗ, and Dₗ is the lth coordinate functional eₗ.

Finally, we note that the matrix expression of T ∈ Hom(ℝⁿ, ℝᵐ) is very suggestive of the block decompositions of T that we discussed earlier in Section 1.5. In the exercises we shall ask the reader to show that in fact Tₖₗ = tₖₗDₖₗ.

EXERCISES

4.1 Prove that if ω: V × V → ℝ is a bilinear functional on V and T: V → V* is the corresponding linear transformation defined by (T(η))(ξ) = ω(ξ, η), then for any basis {αᵢ} for V the matrix tᵢⱼ = ω(αᵢ, αⱼ) is the matrix of T.

4.2 Verify that the row and column ranks of the matrix

    [ -5  -10 ]
    [  2    4 ]

are both 1.

4.3 Show by a direct calculation that if the row rank of a 2 × 3 matrix is 1, then so is its column rank.

4.4 Let {fᵢ}₁³ be a linearly dependent set of C²-functions (twice continuously differentiable real-valued functions) on ℝ. Show that the three triples <fᵢ(x), fᵢ′(x), fᵢ″(x)> are dependent for any x. Prove therefore that sin t, cos t, and eᵗ are linearly independent. (Compute the derivative triples for a well-chosen x.)

4.5 Compute the indicated matrix product. [Display illegible.]

4.6 Compute

    [ a  b ]     [  d  -b ]
    [ c  d ]  ×  [ -c   a ].

From your answer give a necessary and sufficient condition for the inverse of the first matrix to exist.

4.7 A matrix a is idempotent if a² = a. Find a basis for the vector space ℝ²ˣ² of all 2 × 2 matrices consisting entirely of idempotents.

4.8 By a direct calculation show that the displayed matrix is invertible, and find its inverse. [Display illegible.]

4.9 Show, by explicitly solving the equation

    [ a  b ]   [ x  y ]     [ 1  0 ]
    [ c  d ] · [ z  w ]  =  [ 0  1 ],

that the matrix on the left is invertible if and only if (the determinant) ad − bc is not zero.

4.10 Find a nonzero 2 × 2 matrix whose square is zero.

4.11 Find all 2 × 2 matrices whose squares are zero.

4.12 Prove by computing matrix products that matrix multiplication is associative.

4.13 Similarly, prove directly the distributive law, (r + s)·t = r·t + s·t.

4.14 Show that left matrix multiplication by a fixed r in ℝᵐˣⁿ is a linear transformation from ℝⁿˣᵖ to ℝᵐˣᵖ. What theorem in Chapter 1 does this mirror?

4.15 Show that the rank of a product of two matrices is at most the minimum of their ranks. (Remember that the rank of a matrix is the dimension of the range space of its associated T.)

4.16 Let a be an m × n matrix, and let b be n × m. If m > n, show that a·b cannot be the identity e (m × m).

4.17 Let Z be the subset of 2 × 2 matrices of the form

    [ a  -b ]
    [ b   a ].

Prove that Z is a subalgebra of ℝ²ˣ² (that is, Z is closed under addition, scalar multiplication, and matrix multiplication). Show that in fact Z is isomorphic to the complex number system.

4.18 A matrix (necessarily square) which is equal to its transpose is said to be symmetric. As a square array it is symmetric about the main diagonal. Show that for any m × n matrix t the product t·t* is meaningful and symmetric.

4.19 Show that if s and t are symmetric n × n matrices, and if they commute, then s·t is symmetric. (Do not try to answer this by writing out matrix products.) Show conversely that if s, t, and s·t are all symmetric, then s and t commute.

4.20 Suppose that T in Hom ℝ² has a symmetric matrix and that T is not of the form cI. Show that T has exactly two eigenvectors (up to scalar multiples). What does the matrix of T become with respect to the "eigenbasis" for ℝ² consisting of these two eigenvectors?

4.21 Show that the symmetric 2 × 2 matrix t has a symmetric square root s (s² = t) if and only if its eigenvalues are nonnegative. (Assume the above exercise.)

4.22 Suppose that t is a 2 × 2 matrix such that t* = t⁻¹. Show that t has one of the forms

    [ a  -b ]          [ a   b ]
    [ b   a ]    or    [ b  -a ],

where a² + b² = 1.

4.23 Prove that multiplication by the above t is a Euclidean isometry. That is, show that if y = t·x, where x and y ∈ ℝ², then ‖x‖ = ‖y‖, where ‖x‖ = (x₁² + x₂²)^½.

4.24 Let {Dₖₗ} be the basis for Hom(V, W) defined in the text. Taking W = V, show that these operators satisfy the very important multiplication rules

    Dᵢⱼ∘Dₖₗ = 0   if j ≠ k,        Dᵢₖ∘Dₖₗ = Dᵢₗ.

4.25 Keeping the above identities in mind, show that if l ≠ m, then there are transformations S and T in Hom V such that S∘T − T∘S = Dₗₘ. Also find S and T such that S∘T − T∘S = Dₗₗ − Dₘₘ.

4.26 Given T in Hom ℝⁿ, we know from Chapter 1 that T = ∑ᵢ,ⱼ Tᵢⱼ, where Tᵢⱼ = PᵢTPⱼ and Pᵢ = θᵢπᵢ. Now we also have T = ∑ᵢ,ⱼ tᵢⱼDᵢⱼ. Show from the definition of Dᵢⱼ in the text that PᵢDᵢⱼPⱼ = Dᵢⱼ and that PᵢDₖₗPⱼ = 0 if either i ≠ k or j ≠ l. Conclude that Tᵢⱼ = tᵢⱼDᵢⱼ.
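The multiplication rules of Exercise 4.24 are quick to test numerically, since in the standard bases the transformation Dₖₗ is represented by the matrix δᵏˡ having a single 1 in row k, column l. A brief editorial sketch, not part of the text, assuming numpy and 0-indexed rows and columns:

    import numpy as np

    def D(k, l, n=3):
        """The matrix unit delta^{kl}: 1 in row k, column l, and 0 elsewhere."""
        d = np.zeros((n, n))
        d[k, l] = 1.0
        return d

    # D_ij o D_kl = 0 when j != k:
    print(np.allclose(D(0, 1) @ D(2, 0), 0))        # True
    # D_ik o D_kl = D_il:
    print(np.allclose(D(0, 2) @ D(2, 1), D(0, 1)))  # True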
5. TRACE AND DETERMINANT

Our aim in this short section is to acquaint the reader with two very special real-valued functions on Hom V and to describe some of their properties.

Theorem 5.1. If V is an n-dimensional vector space, there is exactly one linear functional λ on the vector space Hom(V) with the property that λ(S∘T) = λ(T∘S) for all S, T in Hom(V) and normalized so that λ(I) = n. If a basis is chosen for V and the corresponding matrix of T is {tᵢⱼ}, then λ(T) = ∑ᵢ₌₁ⁿ tᵢᵢ, the sum of the elements on the main diagonal.

Proof. If we choose a basis and define λ(T) as ∑₁ⁿ tᵢᵢ, then it is clear that λ is a linear functional on Hom(V) and that λ(I) = n. Moreover,

    λ(S∘T) = ∑ᵢ₌₁ⁿ (∑ⱼ₌₁ⁿ sᵢⱼtⱼᵢ) = ∑ᵢ,ⱼ₌₁ⁿ sᵢⱼtⱼᵢ = ∑ᵢ,ⱼ tⱼᵢsᵢⱼ = λ(T∘S).

That is, each basis for V gives us a functional λ in (Hom V)* such that λ(S∘T) = λ(T∘S), λ(I) = n, and λ(T) = ∑ tᵢᵢ for the matrix representation of that basis.

Now suppose that μ is any element of (Hom(V))* such that μ(S∘T) = μ(T∘S) and μ(I) = n. If we choose a basis for V and use the isomorphism Θ: {tᵢⱼ} ↦ T from ℝⁿˣⁿ to Hom V, we have a functional ν = μ∘Θ on ℝⁿˣⁿ (ν = Θ*μ) such that ν(st) = ν(ts) and ν(e) = n. By Theorem 4.1 (or 3.1) ν is given by a matrix c, ν(t) = ∑ᵢ,ⱼ₌₁ⁿ cᵢⱼtᵢⱼ, and the equation ν(st − ts) = 0 becomes ∑ᵢ,ⱼ,ₖ₌₁ⁿ cᵢⱼ(sᵢₖtₖⱼ − sⱼₖtₖᵢ) = 0. We are going to leave it as an exercise for the reader to show that if l ≠ m, then very simple special matrices s and t can be chosen so that this sum reduces to cₗₘ = 0, and, by a different choice, to cₗₗ − cₘₘ = 0. Together with the requirement that ν(e) = n, this implies that cₗₘ = 0 for l ≠ m and cₘₘ = 1 for m = 1, ..., n. That is, ν(t) = ∑₁ⁿ tₘₘ, and ν is the λ of the basis being used. Altogether this shows that there is a unique λ in (Hom V)* such that λ(S∘T) = λ(T∘S) for all S and T and λ(I) = n, and that λ(T) has the diagonal evaluation ∑ tᵢᵢ in every basis. □

This unique λ is called the trace functional, and λ(T) is the trace of T. It is usually designated tr(T).

The determinant function Δ(T) on Hom V is much more complicated, and we shall not prove that it exists until Chapter 7. Its geometric meaning is as follows. First, |Δ(T)| is the factor by which T multiplies volumes. More precisely, if we define a "volume" v for subsets of V by choosing a basis and using the coordinate correspondence to transfer to V the "natural" volume on ℝⁿ, then, for any figure A ⊂ V, v(T[A]) = |Δ(T)|·v(A). This will be spelled out in Chapter 8. Second, Δ(T) is positive or negative according as T preserves or reverses orientation, which again is a sophisticated notion to be explained later. For the moment we shall list properties of Δ(T) that are related to this geometric interpretation, and we give a sufficient number to show the uniqueness of Δ.
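The characterizing property λ(S∘T) = λ(T∘S) of Theorem 5.1 invites a quick numerical experiment before we take up Δ. The sketch below is an editorial illustration, not part of the text, assuming numpy:

    import numpy as np

    rng = np.random.default_rng(0)
    s = rng.standard_normal((4, 4))
    t = rng.standard_normal((4, 4))

    # tr is the sum of the diagonal entries; tr(st) = tr(ts) holds
    # even though st and ts are different matrices in general.
    print(np.isclose(np.trace(s @ t), np.trace(t @ s)))  # True
    print(np.allclose(s @ t, t @ s))                     # False in general
    print(np.trace(np.eye(4)))                           # 4.0, i.e., lambda(I) = n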
[Fig. 2.3: a shearing of V along the planes parallel to a subspace N, pictured in two dimensions.]

We assume that for each finite-dimensional vector space V there is a function Δ (or Δ_V when there is any question about domain) from Hom(V) to ℝ such that the following are true:

a) Δ(S∘T) = Δ(S)Δ(T) for any S, T in Hom(V).

b) If a subspace N of V is invariant under T and T is the identity on N and on V/N (that is, T[ᾱ] = ᾱ for each coset ᾱ = α + N of N), then Δ(T) = 1. Such a T is a shearing of V along the planes parallel to N. In two dimensions it can be pictured as in Fig. 2.3.

c) If V is a direct sum V = M ⊕ N of T-invariant subspaces M and N, and if R = T ↾ M and S = T ↾ N, then Δ(T) = Δ(R)Δ(S). More exactly, Δ_V(T) = Δ_M(R)Δ_N(S).

d) If V is one-dimensional, so that any T in Hom(V) is simply multiplication by a constant c_T, then Δ(T) is that constant c_T.

e) If V is two-dimensional and T interchanges a pair of independent vectors, then Δ(T) = −1. This is clearly a pure orientation-changing property.

The fact that Δ is uniquely determined by these properties will follow from our discussion in the next section, which will also give us a process for calculating Δ. This process is efficient for dimensions greater than two, but for T in Hom(ℝ²) there is a simple formula for Δ(T) which every student should know by heart.

Theorem 5.2. If T is in Hom(ℝ²) and {tᵢⱼ} is its 2 × 2 matrix, then

    Δ(T) = t₁₁t₂₂ − t₁₂t₂₁.

This is a special case of a general formula, which we shall derive in Chapter 7, that expresses Δ(T) as a sum of n! terms, each term being a product of n numbers from the matrix of T. This formula is too complicated to be useful in computations for large n, but for n = 3 it is about as easy to use as our row-reduction calculation in the next section, and for n = 2 it becomes the above simple expression.

There are a few more properties of Δ with which every student should be familiar. They will all be proved in Chapter 7.

Theorem 5.3. If T is in Hom V, then Δ(T*) = Δ(T). If θ is an isomorphism from V to W and S = θ∘T∘θ⁻¹, then Δ(S) = Δ(T).
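In the 2 × 2 case, property (a) and the similarity invariance of Theorem 5.3 can be verified directly from the formula of Theorem 5.2. A short editorial sketch in plain Python, not part of the text (the 2 × 2 inverse is taken from the cofactor pattern of Exercise 4.6):

    def det2(t):
        """Delta(T) for a 2 x 2 matrix t, by the formula of Theorem 5.2."""
        return t[0][0] * t[1][1] - t[0][1] * t[1][0]

    def mul2(s, t):
        return [[sum(s[k][i] * t[i][j] for i in range(2)) for j in range(2)]
                for k in range(2)]

    s = [[2.0, 1.0], [0.0, 3.0]]
    t = [[1.0, 4.0], [2.0, 5.0]]

    # Property (a): Delta(S o T) = Delta(S) Delta(T).
    print(abs(det2(mul2(s, t)) - det2(s) * det2(t)) < 1e-12)  # True

    # Theorem 5.3: Delta(theta T theta^{-1}) = Delta(T), with theta = s here.
    d = det2(s)
    s_inv = [[s[1][1] / d, -s[0][1] / d], [-s[1][0] / d, s[0][0] / d]]
    print(abs(det2(mul2(mul2(s, t), s_inv)) - det2(t)) < 1e-12)  # True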
Theorem 5.4. The transformation T is nonsingular (invertible) if and only if Δ(T) ≠ 0.

In the next theorem we consider T in Hom ℝⁿ, and we want to think of Δ(T) as a function of the matrix t of T. To emphasize this we shall use the notation D(t) = Δ(T).

Theorem 5.5 (Cramer's rule). Given an n × n matrix t and an n-tuple y, let t|ⱼy be the matrix obtained by replacing the jth column of t by y. Then

    y = t·x   ⟹   D(t)xⱼ = D(t|ⱼy)   for all j.

If t is nonsingular [D(t) ≠ 0], this becomes an explicit formula for the solution x of the equation y = t·x; it is theoretically important even in those cases when it is not useful in practice (large n).

EXERCISES

5.1 Finish the proof of Theorem 5.1 by applying Exercise 4.25.

5.2 It follows from our discussion of trace that tr(T) = ∑ tᵢᵢ is independent of the basis. Show that this fact follows directly from tr(t·s) = tr(s·t) and the change of basis formula in the preceding section.

5.3 Show by direct computation that the function d(t) = t₁₁t₂₂ − t₁₂t₂₁ satisfies d(s·t) = d(s)d(t) (where s and t are 2 × 2 matrices). Conclude that if V is two-dimensional and d(T) is defined for T in Hom V by choosing a basis and setting d(T) = d(t), then d(T) is actually independent of the basis.

5.4 Continuing the above exercise, show that d(T) = Δ(T) in any of the following cases:

1) T interchanges two independent vectors.
2) T has two eigenvectors.
3) T has a matrix of the form

    [ 1  c ]
    [ 0  1 ].

Show next that if T has none of the above forms, then T = R∘S, where S is of type (1) and R is of type (2) or (3). [Hint: Suppose T(α) = β, with α and β independent. Let S interchange α and β, and consider R = T∘S.] Show finally that d(T) = Δ(T) for all T in Hom V. (V is two-dimensional.)

5.5 If t is symmetric and 2 × 2, show that there is a 2 × 2 matrix s such that s* = s⁻¹, Δ(s) = 1, and sts⁻¹ is diagonal.

5.6 Assuming Theorem 5.2, verify Theorem 5.4 for the 2 × 2 case.

5.7 Assuming Theorem 5.2, verify Theorem 5.5 for the 2 × 2 case.

5.8 In this exercise we suppose that the reader remembers what a continuous function of a real variable is. Suppose that the 2 × 2 matrix function

    a(t) = [ a₁₁(t)  a₁₂(t) ]
           [ a₂₁(t)  a₂₂(t) ]

has continuous components aᵢⱼ(t) for t ∈ (0, 1), and suppose that a(t) is nonsingular for every t. Show that the solution y(t) to the linear equation a(t)·y(t) = x(t) has continuous components y₁(t) and y₂(t) if the functions x₁(t) and x₂(t) are continuous.

5.9 A homogeneous second-order linear differential equation is an equation of the form y″ + a₁y′ + a₀y = 0, where a₁ = a₁(t) and a₀ = a₀(t) are continuous functions. A solution is a C²-function f (i.e., a twice continuously differentiable function) such that f″(t) + a₁(t)f′(t) + a₀(t)f(t) = 0. Suppose that f and g are C²-functions [on (0, 1), say] such that the 2 × 2 matrix

    [ f(t)   g(t)  ]
    [ f′(t)  g′(t) ]

is always nonsingular. Show that there is a homogeneous second-order differential equation of which they are both solutions.

5.10 In the above exercise show that the space of all solutions is a two-dimensional vector space. That is, show that if h(t) is any third solution, then h is a linear combination of f and g.

5.11 By a "linear motion" of the Cartesian plane ℝ² into itself we shall mean a continuous map x ↦ t(x) from [0, 1] to the set of 2 × 2 nonsingular matrices such that t(0) = e. Show that Δ(t(1)) > 0.

5.12 Show that if Δ(s) = 1, then there is a linear motion whose final matrix t(1) is s.

6. MATRIX COMPUTATIONS

The computational process by which the reader learned to solve systems of linear equations in secondary school algebra was undoubtedly "elimination by successive substitutions". The first equation is solved for the first unknown, and the solution expression is substituted for the first unknown in the remaining equations, thereby eliminating the first unknown from the remaining equations. Next, the second equation is solved for the second unknown, and this unknown is then eliminated from the remaining equations. In this way the unknowns are eliminated one at a time, and a solution is obtained. This same procedure also solves the following additional problems:

1) to obtain an explicit basis for the linear span of a set of m vectors in ℝⁿ; therefore, in particular,
2) to find the dimension of such a subspace;
3) to compute the determinant of an m × m matrix;
4) to compute the inverse of an invertible m × m matrix.
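As a computational counterpoint to the elimination process developed in this section, Theorem 5.5 of the preceding section can be turned directly into a solver. The sketch below is an editorial illustration in plain Python; the helper det, a recursive cofactor expansion that takes exponential time in n, is an assumption of the sketch rather than anything developed in the text:

    def det(t):
        """Determinant by cofactor expansion along the first row."""
        n = len(t)
        if n == 1:
            return t[0][0]
        total = 0.0
        for j in range(n):
            minor = [row[:j] + row[j + 1:] for row in t[1:]]
            total += (-1) ** j * t[0][j] * det(minor)
        return total

    def cramer(t, y):
        """Solve y = t . x via D(t) x_j = D(t |_j y) (Theorem 5.5)."""
        d = det(t)
        assert d != 0, "t must be nonsingular"
        x = []
        for j in range(len(t)):
            # t |_j y: replace the jth column of t by y.
            t_j = [row[:j] + [y[i]] + row[j + 1:] for i, row in enumerate(t)]
            x.append(det(t_j) / d)
        return x

    t = [[2.0, 1.0], [5.0, 3.0]]   # D(t) = 1
    print(cramer(t, [4.0, 11.0]))  # [1.0, 2.0]

For large n the elimination process below is the practical method; Cramer's rule is the theoretical closed form.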
In this section we shall briefly study this process and the solutions to these problems. We start by noting that the kinds of changes we are going to make on a finite sequence of vectors do not alter its span.

Lemma 6.1. Let {αᵢ}₁ᵐ be any m-tuple of vectors in a vector space, and let {βᵢ}₁ᵐ be obtained from {αᵢ}₁ᵐ by any one of the following elementary operations:
1) interchanging two vectors;
2) multiplying some αᵢ by a nonzero scalar;
3) replacing αᵢ by αᵢ − xαⱼ for some j ≠ i and some x ∈ ℝ.
Then L({βᵢ}₁ᵐ) = L({αᵢ}₁ᵐ).

Proof. If βᵢ = αᵢ − xαⱼ, then αᵢ = βᵢ + xαⱼ. Thus if {βᵢ}₁ᵐ is obtained from {αᵢ}₁ᵐ by one operation of type (3), then {αᵢ}₁ᵐ can be obtained from {βᵢ}₁ᵐ by one operation of type (3). In particular, each sequence is in the linear span of the other, and the two linear spans are therefore the same. Similarly, each of the other operations can be undone by one of the same type, and the linear spans are unchanged. □

When we perform these operations on the sequence of row vectors in a matrix, we call them elementary row operations.

We define the order of an n-tuple x = <x₁, …, xₙ> as the index of the first nonzero entry. Thus if xᵢ = 0 for i < j and xⱼ ≠ 0, then the order of x is j. The order of <0, 0, 0, 2, −1, 0> is 4.

Let {aᵢⱼ} be an m × n matrix, let V be its row space, and let n₁ < n₂ < ⋯ < nₖ be the integers that occur as orders of nonzero vectors in V. We are going to construct a basis for V consisting of k elements having exactly the above set of orders.

If every nonzero row in {aᵢⱼ} has order > p, then every nonzero vector x in V has order > p, since x is a linear combination of these row vectors. Since some vector in V has the minimal order n₁, it follows that some row in {aᵢⱼ} has order n₁. We move such a row to the top by interchanging two rows. We then multiply this row x by a constant, so that its first nonzero entry x_{n₁} is 1. Let a¹, …, aᵐ be the row vectors that we now have, so that a¹ has order n₁ and a¹_{n₁} = 1. We next subtract multiples of a¹ from each of the other rows in such a way that the new ith row has 0 as its n₁-coordinate. Specifically, we replace aⁱ by aⁱ − aⁱ_{n₁} · a¹ for i > 1. The matrix that we thus obtain has the property that its jth column is the zero m-tuple for each j < n₁ and its n₁th column is δ¹ in ℝᵐ. Its first row has order n₁, and every other row has order > n₁. Its row space is still V. We again call it a.

Now let x = Σ₁ᵐ cᵢaⁱ be a vector in V with order n₂. Then c₁ = 0, for if c₁ ≠ 0, then the order of x is n₁. Thus x is a linear combination of the second
to the mth rows, and, just as in the first case, one of these rows must therefore have order n₂. We now repeat the above process all over again, keying now on this vector. We bring it to the second row, make its n₂-coordinate 1, and subtract multiples of it from all the other rows (including the first), so that the resulting matrix has δ² for its n₂th column. Next we find a row with order n₃, bring it to the third row, and make the n₃th column δ³, etc.

We exhibit this process below for one 3 × 4 matrix. This example is dishonest in that it has been chosen so that fractions will not occur through the application of (2). The reader will not be that lucky when he tries his hand. Our defense is that by keeping the matrices simple we make the process itself more apparent.

[The worked chain of matrices is illegible in this scan. In the original, a 3 × 4 matrix is carried step by step to row-reduced echelon form, each arrow labeled with the elementary operation, (2) or (3), being applied.]

Note that from the final matrix we can tell that the orders in the row space are 1, 2, and 4, whereas the original matrix only displays the orders 1 and 2.

We end up with an m × n matrix having the same row space V and the following special structure:
1) For 1 ≤ j ≤ k the jth row has order nⱼ.
2) If k < m, the remaining m − k rows are zero (since a nonzero row would have order > nₖ, a contradiction).
3) The nⱼth column is δʲ.

It follows that any linear combination of the first k rows with coefficients c₁, …, cₖ has cⱼ in the nⱼth place, and hence cannot be zero unless all the cⱼ's are zero. These k rows thus form a basis for V, solving problems (1) and (2). Our final matrix is said to be in row-reduced echelon form. It can be shown to be uniquely determined by the space V and the above requirements relating its rows to the orders of the elements of V. Its rows form the canonical basis of V.
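For the reader who wants to experiment, here is a compact Python rendering of the whole procedure. It is a sketch, not the text's algorithm verbatim: it assumes floating-point entries, and the tolerance tol (my addition) stands in for an exact test against zero.

    def rref(a, tol=1e-12):
        """Row-reduce a (a list of m length-n rows); the nonzero rows of the
        result are the canonical basis of the row space."""
        a = [row[:] for row in a]
        m, n = len(a), len(a[0])
        r = 0                                    # next row to fill
        for c in range(n):                       # scan the columns left to right
            p = next((i for i in range(r, m) if abs(a[i][c]) > tol), None)
            if p is None:
                continue                         # no remaining row has this order
            a[r], a[p] = a[p], a[r]              # (1) interchange
            piv = a[r][c]
            a[r] = [v / piv for v in a[r]]       # (2) make the leading entry 1
            for i in range(m):                   # (3) clear the rest of column c
                if i != r and abs(a[i][c]) > tol:
                    f = a[i][c]
                    a[i] = [u - f * v for u, v in zip(a[i], a[r])]
            r += 1
        return a

    for row in rref([[1.0, 2.0, 0.0, 1.0],
                     [2.0, 4.0, 1.0, 0.0],
                     [1.0, 2.0, 1.0, -1.0]]):
        print(row)

The sample matrix reduces to the rows [1, 2, 0, 1] and [0, 0, 1, -2] plus one zero row, so its row space has dimension 2 and orders 1 and 3.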
A typical row-reduced echelon matrix is shown in Fig. 2.4. This matrix is 8 × 11, its orders are 1, 4, 5, 7, 10, and its row space has dimension 5. It is entirely 0 below the broken line. The dashes in the first five lines represent arbitrary numbers, but any change in these remaining entries changes the spanned space V.

[Fig. 2.4: an 8 × 11 row-reduced echelon matrix, with leading 1's in columns 1, 4, 5, 7, 10, dashes marking the arbitrary entries, and zeros below the broken line.]

We shall now look for the significance of the row-reduction operations from the point of view of general linear theory. In this discussion it will be convenient to use the fact from Section 4 that if an n-tuple in ℝⁿ is viewed as an n × 1 matrix (i.e., as a column vector), then the system of linear equations yᵢ = Σⱼ₌₁ⁿ aᵢⱼxⱼ, i = 1, …, m, expresses exactly the single matrix equation y = a · x. Thus the associated linear transformation A ∈ Hom(ℝⁿ, ℝᵐ) is now viewed as being simply multiplication by the matrix a; y = A(x) if and only if y = a · x.

We first note that each of our elementary row operations on an m × n matrix a is equivalent to premultiplication by a corresponding m × m elementary matrix u. Supposing for the moment that this is so, we can find out what u is by using the m × m identity matrix e. Since u · a = (u · e) · a, we see that the result of performing the operation on the matrix a can also be obtained by premultiplying a by the matrix u · e. That is, if the elementary operation can be obtained as matrix multiplication by u, then the multiplier is u · e. This argument suggests that we should perform the operation on e and then see if premultiplying a by the resulting matrix performs the operation on a.

If the elementary operation is interchanging the i₀th and j₀th rows, then performing it on e gives the matrix u with u_{kk} = 1 for k ≠ i₀ and k ≠ j₀, u_{i₀j₀} = u_{j₀i₀} = 1, and u_{kl} = 0 for all other indices. Moreover, examination of the sums defining the elements of the product matrix u · a will show that premultiplying by this u does just interchange the i₀th and j₀th rows of any m × n matrix a.

In the same way, multiplying the i₀th row of a by c is equivalent to premultiplying by the matrix u which is the same as e except that u_{i₀i₀} = c. Finally, multiplying the j₀th row by x and adding it to the i₀th row is equivalent to premultiplying by the matrix u which is the identity e except that u_{i₀j₀} is x instead of 0.
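This is easy to check numerically. The sketch below (Python with NumPy, an assumption of convenience; none of these names are from the text) builds each elementary matrix by performing the operation on the identity e, and then verifies that premultiplication performs it on a.

    import numpy as np

    def swap_matrix(m, i0, j0):
        """The interchange operation performed on the identity e."""
        u = np.eye(m)
        u[[i0, j0]] = u[[j0, i0]]
        return u

    def scale_matrix(m, i0, c):
        """Same as e except that the (i0, i0) entry is c."""
        u = np.eye(m)
        u[i0, i0] = c
        return u

    def shear_matrix(m, i0, j0, x):
        """The identity e except that the (i0, j0) entry is x instead of 0."""
        u = np.eye(m)
        u[i0, j0] = x
        return u

    a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    print(swap_matrix(3, 0, 2) @ a)         # first and third rows interchanged (indices 0 and 2)
    print(shear_matrix(3, 2, 0, -5.0) @ a)  # -5 times the first row added to the third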
These three elementary matrices are indicated schematically in Fig. 2.5. Each has the value 1 on the main diagonal and 0 off the main diagonal except as indicated.

[Fig. 2.5: schematic pictures of the three elementary matrices, showing the interchanged rows i₀ and j₀, the entry c at position (i₀, i₀), and the entry x at position (i₀, j₀).]

These elementary matrices u are all nonsingular (invertible). The row interchange matrix is its own inverse. The inverse of multiplying the jth row by c is multiplying the same row by 1/c. And the inverse of adding c times the jth row to the ith row is adding −c times the jth row to the ith row.

If u¹, u², …, uᵖ is a sequence of elementary matrices, and if b = uᵖ · uᵖ⁻¹ · ⋯ · u¹, then b · a is the matrix obtained from a by performing the corresponding sequence of elementary row operations on a. If u¹, …, uᵖ is a sequence which row reduces a, then r = b · a is the resulting row-reduced echelon matrix.

Now suppose that a is a square m × m matrix and is nonsingular (invertible). Thus the dimension of the row space is m, and hence there are m different orders n₁, …, nₘ. That is, k = m, and since 1 ≤ n₁ < n₂ < ⋯ < nₘ = m, we must also have nᵢ = i, i = 1, …, m. Remembering that the nᵢth column in r is δⁱ, we see that now the ith column in r is δⁱ and therefore that r is simply the identity matrix e. Thus b · a = e and b is the inverse of a.

Let us find the inverse of

    [1 2; 3 4]

by this procedure. The row-reducing sequence is

    [1 2; 3 4] →(3) [1 2; 0 −2] →(2) [1 2; 0 1] →(3) [1 0; 0 1].

The corresponding elementary matrices are
    [1 0; −3 1],  [1 0; 0 −1/2],  [1 −2; 0 1].

The inverse is therefore the product

    [1 −2; 0 1] · [1 0; 0 −1/2] · [1 0; −3 1] = [−2 1; 3/2 −1/2].

Check it if you are in doubt.

Finally, since b · e = b, we see that we get b from e by applying the same row operations (gathered together as premultiplication by b) that we used to reduce a to echelon form. This is probably the best way of computing the inverse of a matrix. To keep track of the operations, we can place e to the right of a to form a single m × 2m matrix a | e, and then row reduce it. In echelon form it will then be the m × 2m matrix e | b, and we can read off the inverse b of the original matrix a.

Let us recompute the inverse of [1 2; 3 4] by this method. We row reduce the matrix [1 2 | 1 0; 3 4 | 0 1], getting

    [1 2 | 1 0; 3 4 | 0 1] →(3) [1 2 | 1 0; 0 −2 | −3 1] →(2) [1 2 | 1 0; 0 1 | 3/2 −1/2] →(3) [1 0 | −2 1; 0 1 | 3/2 −1/2],

from which we read off the inverse to be

    [−2 1; 3/2 −1/2].

Finally we consider the problem of computing the determinant of a square m × m matrix. We use two elementary operations (one modified) as follows:
1′) interchanging two rows and simultaneously changing the sign of one of them;
3) as before, replacing some row aᵢ by aᵢ − xaⱼ for some j ≠ i.

When applied to the rows of a square matrix, these operations leave the determinant unchanged. This follows from the properties of determinants listed in Section 5, and its proof will be left as an exercise. Moreover, these properties will be trivial consequences of our definition of a determinant in Chapter 7.

Consider, then, a square m × m matrix {aᵢⱼ}. We interchange the first and pth rows to bring a row of minimal order n₁ to the top, and change the sign of the row being moved down (the first row here). We do not make the leading
coefficient of the new first row 1; this elementary operation is not being used now. We do subtract multiples of the first row from the remaining rows, in order to make all the remaining entries in the n₁th column 0. The n₁th column is now c₁δ¹, where c₁ is the leading coefficient in the first row. And the new matrix has the same determinant as the original matrix.

We continue as before, subject to the above modifications. We change the sign of a row moved downward in an interchange, we do not make leading coefficients 1, and we do clear out the nⱼth column so that it becomes cⱼδʲ, where cⱼ is the leading coefficient of the jth row (1 ≤ j ≤ k). As before, the remaining m − k rows are 0 (if k < m). Let us call this resulting matrix semireduced. Note that we can find the corresponding reduced echelon matrix from it by k applications of (2); we simply multiply the jth row by 1/cⱼ for j = 1, …, k.

If s is the semireduced matrix which we obtained from a using (1′) and (3), then we shall show below that its determinant, and therefore the determinant of a also, is the product of the entries on the main diagonal: ∏ᵢ₌₁ᵐ sᵢᵢ.

Recapitulating, we can compute the determinant of a square matrix a by using the operations (1′) and (3) to change a to a semireduced matrix s, and then taking the product of the numbers on the main diagonal of s. If we apply this process to [1 2; 3 4], we get

    [1 2; 3 4] →(3) [1 2; 0 −2] →(3) [1 0; 0 −2],

and the determinant is 1 · (−2) = −2. Our 2 × 2 determinant formula, applied to [1 2; 3 4], gives 1 · 4 − 2 · 3 = 4 − 6 = −2.

If the original matrix {aᵢⱼ} is nonsingular, so that k = m and nᵢ = i for i = 1, …, m, then the jth column in the semireduced matrix is cⱼδʲ, so that sⱼⱼ = cⱼ, and we are claiming that the determinant is the product ∏ᵢ₌₁ᵐ cᵢ of the leading coefficients. To see this, note that if T is the transformation in Hom(ℝᵐ) corresponding to our semireduced matrix, then T(δʲ) = cⱼδʲ, so that ℝᵐ is the direct sum of m T-invariant, one-dimensional subspaces, on the jth of which T is multiplication by cⱼ. It follows from (c) and (d) of our list of determinant properties that Δ(T) = ∏₁ᵐ cⱼ = ∏₁ᵐ sⱼⱼ. This is nonzero.

On the other hand, if {aᵢⱼ} is singular, so that k = d(V) < m, then the mth row in the semireduced matrix is 0 and, in particular, sₘₘ = 0. The product ∏ᵢ sᵢᵢ is thus zero. Now, without altering the main diagonal, we can subtract multiples of the columns containing the leading row entries (the columns with
indices nⱼ) to make the mth column a zero column. This process is equivalent to postmultiplying by elementary matrices of type (3) and, therefore, again leaves the determinant unchanged. But now the transformation S of this matrix leaves ℝᵐ⁻¹ invariant (as the span of δ¹, …, δᵐ⁻¹ in ℝᵐ) and takes δᵐ to 0, so that Δ(S) = 0 by (c) in the list of determinant properties. So again the determinant is the product of the entries on the main diagonal of the semireduced matrix, zero in this case.

We have also found that a matrix is nonsingular (invertible) if and only if its determinant is nonzero.

EXERCISES

6.1 Compute the canonical basis of the row space of

    [−1 2 1; 2 3 −3; 0 4 −1].

6.2 Do the same for the second 3 × 3 matrix displayed in the text (its entries are illegible in this scan).

6.3 Do the same for the above matrix but with a different first choice.

6.4 Calculate the inverse of the 3 × 3 matrix displayed in the text (entries illegible in this scan) by row reduction. Check your answer by multiplication.

6.5 Row reduce the matrix obtained by adjoining the column <y₁, y₂, y₃> to the same 3 × 3 matrix. How does the fourth column in the row-reduced matrix compare with the inverse computed in the above exercise? Explain.

6.6 Check whether or not <1, 1, 1, 1>, <1, 2, 3, 4>, <0, 1, 0, 1>, and <4, 3, 2, 1> are linearly independent by row reducing. Part of one of the row-reducing operations is unnecessary for this check. What is it?
6.7 Let us call a k-tuple of vectors {αᵢ}₁ᵏ in ℝⁿ canonical if the k × n matrix a with αᵢ as its ith row for all i is in row-reduced echelon form. Supposing that an n-tuple ξ is in the row space of a, we can read off what its coordinates are with respect to the above canonical basis. What are they? How then can we check whether or not an arbitrary n-tuple ξ is in the row space?

6.8 Use the device of row reducing, as suggested in the above exercise, to determine whether or not δ¹ = <1, 0, 0, 0> is in the span of <1, 1, 1, 1>, <1, 2, 3, 4>, and <2, 0, 1, −1>. Do the same for <1, 2, 1, 2>, and also for <1, 1, 0, 4>.

6.9 Supposing that a ≠ 0, show that

    [a b; c d]

is invertible if and only if ad − bc ≠ 0 by reducing the matrix to echelon form.

6.10 Let a be an m × n matrix, and let u be the nonsingular matrix that row reduces a, so that r = u · a is the row-reduced echelon matrix obtained from a. Suppose that r has m − k > 0 zero rows at the bottom (the kth row being nonzero). Show that the bottom m − k rows of u span the annihilator (range A)° of the range of A. That is, y = ax for some x if and only if

    Σᵢ₌₁ᵐ cᵢyᵢ = 0

for each m-tuple c in the bottom m − k rows of u. [Hint: The bottom row of r is obtained by applying the bottom row of u to the columns of a.]

6.11 Remember that we find the row-reducing matrix u by applying to the m × m identity matrix e the row operations that reduce a to r. That is, we row reduce the m × (n + m) juxtaposition matrix a | e to r | u. Assuming the result stated in the above exercise, find the range of A ∈ Hom(ℝ³) as the null space of a functional if the matrix of A is the first 3 × 3 matrix displayed in the text (entries illegible in this scan).

6.12 Similarly, find the range of A if the matrix of A is the second displayed matrix (also illegible in this scan).

6.13 Let a be an m × n matrix, and let a be row reduced to r. Let A and R be the corresponding operators in Hom(ℝⁿ, ℝᵐ) [so that A(x) = a · x]. Show that A and R have the same null space and that A* and R* have the same range space.

6.14 Show that solving a system of m linear equations in n unknowns is equivalent to solving a matrix equation k = tx for the n-tuple x, given the m × n matrix t and the m-tuple k. Let T ∈ Hom(ℝⁿ, ℝᵐ) be multiplication by t. Review the possibilities for a solution from our general linear theory for T (range, null space, affine subspace).
6.15 Let b = c | d be the m × (n + p) matrix obtained by juxtaposing the m × n matrix c and the m × p matrix d. If a is an l × m matrix, show that a · b = ac | ad. State the similar result concerning the expression of b as the juxtaposition of n column m-tuples. State the corresponding theorem for the "distributivity" of right multiplication over juxtaposition.

6.16 Let a be an m × n matrix and k a column m-tuple. Let b | l be the m × (n + 1) matrix obtained from the m × (n + 1) juxtaposition matrix a | k by row reduction. Show that a · x = k if and only if b · x = l. Show that there is a solution x if and only if every row that is zero in b is zero in l. Restate this condition in terms of the notion of row rank.

6.17 Let b be the row-reduced echelon matrix obtained from an m × n matrix a. Thus b = u · a, where u is nonsingular, and B and A have the same null space (where B ∈ Hom(ℝⁿ, ℝᵐ) is multiplication by b). We can read off from b a basis for a subspace W ⊂ ℝⁿ such that B restricted to W is an isomorphism onto range B. What is this basis? We then know that the null space N of B is a complement of W. One complement of W, call it M, can be read off from W. What is M?

6.18 Continuing the above exercise, show that for each standard basis vector δⁱ in M we can read off from the matrix b a vector αᵢ in W such that δⁱ − αᵢ ∈ N. Show that these vectors {δⁱ − αᵢ} form a basis for N.

6.19 We still have to show that the modified elementary row operations leave the determinant of a square matrix unchanged, assuming the properties (a) through (e) from Section 5. First, show from (a), (c), (d), and (e) that if T in Hom ℝ² is defined by T(δ¹) = δ² and T(δ²) = −δ¹, then Δ(T) = 1. Do this by a very simple factorization, T = R ∘ S, where (e) can be applied to S. Conclude that a type (1′) elementary matrix has determinant 1.

6.20 Show from the determinant property (b) that an elementary matrix of type (3) has determinant 1. Show, therefore, that the modified elementary row operations on a square matrix leave its determinant unchanged.

*7. THE DIAGONALIZATION OF A QUADRATIC FORM

As we mentioned earlier, one of the crucial problems of linear algebra is the analysis of the "structure" of a linear transformation T in Hom V. From the point of view of bases, every theorem in this area asserts that with the choice of a special basis for V the matrix of T can be given such-and-such a simple form. This is a very difficult part of the subject, and we are only making contact with it in this book, although Theorem 5.5 of Chapter 1 and its corollary form a cornerstone of the structural results.

In this section we are going to solve a simpler problem. In the above language it is the problem of choosing a basis for V making simple the matrix of a transformation T in Hom(V, V*). Such a transformation is equivalent to a bilinear functional on V (by Theorem 6.1 of Chapter 1 and Theorem 3.2 of this chapter); we shall tackle the problem in this setting.
Let V be a finite-dimensional real vector space, and let ω: V × V → ℝ be a bilinear functional. If {αᵢ}₁ⁿ is a basis for V, then ω determines a matrix tᵢⱼ = ω(αᵢ, αⱼ). We know that if ω_η(ξ) = ω(ξ, η), then ω_η ∈ V* and η ↦ ω_η is a linear mapping T from V to V*. We leave it as an exercise for the reader to show that {tᵢⱼ} is the matrix of T with respect to the basis {αᵢ} for V and its dual basis for V* (Exercise 4.1). If ξ = Σ₁ⁿ xᵢαᵢ and η = Σ₁ⁿ yⱼαⱼ, then

    ω(ξ, η) = Σᵢ,ⱼ ω(αᵢ, αⱼ)xᵢyⱼ = Σᵢ,ⱼ tᵢⱼxᵢyⱼ.

In particular, if we set q(ξ) = ω(ξ, ξ), then q(ξ) = Σᵢ,ⱼ tᵢⱼxᵢxⱼ is a homogeneous quadratic polynomial in the coordinates xᵢ.

For the rest of this section we assume that ω is symmetric: ω(ξ, η) = ω(η, ξ). Then we can recover ω from the quadratic form q by

    ω(ξ, η) = [q(ξ + η) − q(ξ − η)]/4,

as the reader can easily check. In particular, if the bilinear form ω is not identically zero, then there are vectors ξ such that q(ξ) = ω(ξ, ξ) ≠ 0.

What we want to do is to show that we can find a basis {αᵢ}₁ⁿ for V such that ω(αᵢ, αⱼ) = 0 if i ≠ j and ω(αᵢ, αᵢ) has one of the three values 0, ±1. Borrowing from the standard usage of scalar product theory (see Chapter 5), we say that such a basis is orthonormal.

Our proof that an orthonormal basis exists will be an induction on n = dim V. If n = 1, then any nonzero vector β is a basis, and if ω(β, β) ≠ 0, then we can choose α = xβ so that x²ω(β, β) = ω(α, α) = ±1, the required value of x obviously being x = |ω(β, β)|^(−1/2). In the general case, if ω is the zero functional, then any basis will trivially be orthonormal, and we can therefore suppose that ω is not identically 0. Then there exists a β such that ω(β, β) ≠ 0, as we noted earlier. We set αₙ = xβ, where x is chosen to make q(αₙ) = ω(αₙ, αₙ) = ±1. The nonzero linear functional f(ξ) = ω(ξ, αₙ) has an (n − 1)-dimensional null space N, and if we let ω′ be the restriction of ω to N × N, then ω′ has an orthonormal basis {αᵢ}₁ⁿ⁻¹ by the inductive hypothesis. Also ω(αᵢ, αₙ) = ω(αₙ, αᵢ) = 0 if i < n, because αᵢ is in the null space of f. Therefore, {αᵢ}₁ⁿ is an orthonormal basis for ω, and we have reached our goal:

Theorem 7.1. If ω is a symmetric bilinear functional on a finite-dimensional real vector space V, then V has an ω-orthonormal basis.

For an ω-orthonormal basis the expansion ω(ξ, η) = Σ xᵢyⱼω(αᵢ, αⱼ) reduces to

    ω(ξ, η) = Σᵢ₌₁ⁿ xᵢyᵢq(αᵢ),

where q(αᵢ) = ±1 or 0. If we let V₁ be the span of those basis vectors αᵢ for which q(αᵢ) = 1, and similarly for V₋₁ and V₀, then we see that q(ξ) > 0 for every nonzero ξ in V₁, q(ξ) < 0 for every nonzero vector ξ in V₋₁, and q = 0
on V₀. Furthermore, V = V₁ ⊕ V₋₁ ⊕ V₀, and the three subspaces are ω-orthogonal to each other (which means that ω(ξ, η) = 0 if ξ ∈ V₁ and η ∈ V₋₁, etc.). Finally, q(ξ) ≤ 0 for every ξ in V₋₁ ⊕ V₀.

If we choose another orthonormal basis {βᵢ} and let W₁, W₋₁, and W₀ be its corresponding subspaces, then W₁ may be different from V₁, but their dimensions must be the same. For W₁ ∩ (V₋₁ ⊕ V₀) = {0}, since any nonzero ξ in this intersection would yield the contradictory inequalities q(ξ) > 0 and q(ξ) ≤ 0. Thus W₁ can be extended to a complement of V₋₁ ⊕ V₀, and since V₁ is a complement, we have d(W₁) ≤ d(V₁). Similarly, d(V₁) ≤ d(W₁), and the dimensions therefore are equal. Incidentally, this shows that W₁ is a complement of V₋₁ ⊕ V₀. In exactly the same way, we find that d(W₋₁) = d(V₋₁) and finally, by subtraction, that d(W₀) = d(V₀).

It is conventional to reorder an ω-orthonormal basis {αᵢ}₁ so that all the αᵢ's with q(αᵢ) = 1 come first, then those with q(αᵢ) = −1, and finally those with q(αᵢ) = 0. Our results above can then be stated as follows:

Theorem 7.2. If ω is a symmetric bilinear functional on a finite-dimensional space V, then there are integers n and p such that if {αᵢ}₁ᵐ is any ω-orthonormal basis in conventional order, and if ξ = Σ₁ᵐ xᵢαᵢ, then

    q(ξ) = x₁² + ⋯ + x_p² − x_{p+1}² − ⋯ − x_{p+n}² = Σ₁ᵖ xᵢ² − Σ_{p+1}^{p+n} xᵢ².

The integer p − n is called the signature of the form q (or its associated symmetric bilinear functional ω), and p + n is its rank. Note that p + n is the dimension of the column space of the above matrix of q, and hence equals the dimension of the range of the related linear map T. Therefore, p + n is the rank of every matrix of q.

An inductive proof that an orthonormal basis exists doesn't show us how to find one in practice. Let us suppose that we have the matrix {tᵢⱼ} of ω with respect to some basis {αᵢ}₁ⁿ before us, so that ω(ξ, η) = Σ tᵢⱼxᵢyⱼ, where ξ = Σ₁ⁿ xᵢαᵢ, η = Σ₁ⁿ yᵢαᵢ, and tᵢⱼ = ω(αᵢ, αⱼ), and we want to know how to go about actually finding an orthonormal basis {βᵢ}₁ⁿ. The main problem is to find an orthogonal basis; normalization is then trivial.

The first objective is to find a vector ξ such that ω(ξ, ξ) ≠ 0. If some tᵢᵢ = ω(αᵢ, αᵢ) is not zero, we can take ξ = αᵢ. If all tᵢᵢ = 0 and the form ω is not the zero form, there must be some tᵢⱼ ≠ 0, say t₁₂ ≠ 0. If we set γ₁ = α₁ + α₂ and γᵢ = αᵢ for i > 1, then {γᵢ}₁ⁿ is a basis, and the matrix s = {sᵢⱼ} of ω with respect to the basis {γᵢ} has

    s₁₁ = ω(γ₁, γ₁) = ω(α₁ + α₂, α₁ + α₂) = t₁₁ + 2t₁₂ + t₂₂ = 2t₁₂ ≠ 0.

Similarly, sᵢⱼ = tᵢⱼ if either i or j is greater than 1. For example, if ω is the bilinear form on ℝ² defined by ω(x, y) = x₁y₂ + x₂y₁, then its matrix tᵢⱼ = ω(δⁱ, δʲ) is

    [0 1; 1 0],
and we must change the basis to get t₁₁ ≠ 0. According to the above scheme, we set γ₁ = δ¹ + δ² and γ₂ = δ² and get the new matrix sᵢⱼ = ω(γᵢ, γⱼ), which works out to

    [2 1; 1 0].

The next step is to find a basis for the null space of the functional ω(ξ, γ₁) = Σ xᵢs₁ᵢ. We do this by modifying γ₂, …, γₙ; we replace γⱼ by γⱼ + cγ₁ and calculate c so that this vector is in the null space. Therefore, we want 0 = ω(γⱼ + cγ₁, γ₁) = sⱼ₁ + cs₁₁, and so c = −sⱼ₁/s₁₁. Note that we cannot take this orthogonalizing step until we have made s₁₁ ≠ 0. The new set still spans and thus is a basis, and the new matrix {rᵢⱼ} has r₁₁ ≠ 0 and r₁ⱼ = rⱼ₁ = 0 for j > 1. We now simply repeat the whole procedure for the restriction of ω to this (n − 1)-dimensional null space, with matrix {rᵢⱼ : 2 ≤ i, j ≤ n}, and so on. This is a long process, but until we normalize, it consists only of rational operations on the original matrix. We add, subtract, multiply, and divide, but we do not have to find roots of polynomial equations.

Continuing our above example, we set β₁ = γ₁, but we have to replace γ₂ by β₂ = γ₂ − (s₁₂/s₁₁)γ₁ = γ₂ − ½γ₁. The final matrix rᵢⱼ = ω(βᵢ, βⱼ) has r₁₁ = s₁₁ = 2:

    {rᵢⱼ} = [2 0; 0 −1/2].

The final basis is β₁ = γ₁ = δ¹ + δ² and β₂ = γ₂ − ½γ₁ = δ² − ½(δ¹ + δ²) = (δ² − δ¹)/2.

The steps we had to take above are reminiscent of row reduction, but since we are changing bases simultaneously in the domain and range spaces of the transformation T: V → V* associated with ω, each step involves simultaneously premultiplying and postmultiplying by an elementary matrix. That is, we are simultaneously row and column reducing. It should be intuitively clear that this has to be the case if we are to operate on a symmetric matrix in such a way as to keep it symmetric.

For additional information about quadratic forms, we go back to the change of basis formula for the matrix of a transformation: t′ = b · t · a⁻¹. Here the transformation T associated with the form ω is from V to V*, and so b = (a*)⁻¹, according to our calculations in Section 4. Now one of the properties of the determinant function is that Δ(T*) = Δ(T), and so Δ(a*) = Δ(a). Therefore, if t and s are the matrices of a quadratic form with respect to a first and second basis in V, and if a is the change of basis matrix, then s = (a*)⁻¹ · t · a⁻¹ and Δ(s) = (Δ(a⁻¹))² Δ(t). Therefore, a quadratic form has parity. If it is nonsingular, then its determinant is either always positive or always negative, and
we can call it even or odd. In our continuing example, the beginning and final matrices

    [0 1; 1 0]  and  [2 0; 0 −1/2]

both have determinant −1.

In the two-dimensional case, the determinant of a form with respect to an orthonormalized basis is +1 if the diagonal elements are both +1 or both −1, and −1 if they are of opposite sign. We can therefore read off the signature of a nonsingular form over a two-dimensional space without orthonormalizing. If the determinant t₁₁t₂₂ − (t₁₂)² is positive, the signature is ±2, and we can determine which by looking at t₁₁ (since t₁₁ is then unchanged by our orthogonalizing procedure). Thus the signature is +2 or −2 depending on whether t₁₁ > 0 or t₁₁ < 0. If the determinant is negative, then the signature is 0. Thus the signature of the form ω(x, y) = x₁y₂ + x₂y₁, with matrix

    [0 1; 1 0],

is known to be 0, without any calculation.

Theorems 7.1 and 7.2 are important for the classification of critical points of real-valued functions on vector spaces. We shall see in Section 3.16 that the second differential of such a function F is a symmetric bilinear functional, and that the signature of its form has the same significance in determining the behavior of F near a point at which its first differential is zero that the sign of the second derivative has in the elementary calculus.

A quadratic form q is said to be definite if q(ξ) is never zero except for ξ = 0. Then q(ξ) must always have the same sign, and q is accordingly called positive definite or negative definite. Looking back to Theorem 7.2, it should be obvious that q is positive definite if and only if p = d(V) and n = 0, and negative definite if and only if n = d(V) and p = 0. A symmetric bilinear functional whose associated quadratic form is positive definite is called a scalar product. This is a very important notion on general vector spaces, and the whole of Chapter 5 is devoted to developing some of its implications.
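The rational procedure just described is easy to carry out mechanically. Here is a hedged Python sketch (the names are mine; exact rational arithmetic via fractions emphasizes that no roots are needed). It stops at the diagonal matrix, before the final, generally irrational, normalization to ±1, so p, n, and the signature p − n can be read off from the signs of the diagonal entries.

    from fractions import Fraction

    def diagonalize(t):
        """Congruence-diagonalize a symmetric matrix t, using only the
        rational operations of the text; returns the diagonal entries."""
        n = len(t)
        t = [[Fraction(v) for v in row] for row in t]
        for k in range(n):
            if t[k][k] == 0:                   # need a nonzero diagonal entry first
                j = next((j for j in range(k + 1, n) if t[k][j] != 0), None)
                if j is None:
                    continue                   # row and column k are already zero
                for i in range(n):             # replace the kth basis vector by
                    t[k][i] += t[j][i]         # gamma_k = alpha_k + alpha_j:
                for i in range(n):             # add row j to row k, then
                    t[i][k] += t[i][j]         # column j to column k
            for j in range(k + 1, n):          # orthogonalize against gamma_k
                c = t[k][j] / t[k][k]
                for i in range(n):
                    t[j][i] -= c * t[k][i]     # row operation ...
                for i in range(n):
                    t[i][j] -= c * t[i][k]     # ... and the matching column operation
        return [t[i][i] for i in range(n)]

    d = diagonalize([[0, 1], [1, 0]])          # the example of the text
    p = sum(1 for v in d if v > 0)
    n = sum(1 for v in d if v < 0)
    print(d, "signature:", p - n)              # [2, -1/2] signature: 0

On the text's example this reproduces the intermediate matrix [2 1; 1 0], the final diagonal entries 2 and −1/2, and the signature 0, agreeing with the negative-determinant test above.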
CHAPTER 3

THE DIFFERENTIAL CALCULUS

Our algebraic background is now adequate for the differential calculus, but we still need some multidimensional limit theory. Roughly speaking, the differential calculus is the theory of linear approximations to nonlinear mappings, and we have to know what we mean by approximation in general vector settings. We shall therefore start this chapter by studying the notion of a measure of length, called a norm, for the vectors in a vector space V.

We can then study the phenomenon suggested by the way in which a tangent plane to a surface approximates the surface near the point of tangency. This is the general theory of unique local linear approximations of mappings, called differentials. The collection of rules for computing differentials includes all the familiar laws of the differential calculus, and achieves the same goal of allowing complicated calculations to be performed in a routine way. However, the theory is richer in the multidimensional setting, and one new aspect which we must master is the interplay between the linear transformations which are differentials and their evaluations at given vectors, which are directional derivatives in general and partial derivatives when the vectors belong to a basis. In particular, when the spaces in question are finite-dimensional and are replaced by Cartesian spaces through a choice of bases, then the differential is entirely equivalent to its matrix, which is a certain matrix of partial derivatives called the Jacobian matrix of the mapping. Then the rules of the differential calculus are expressed in terms of matrix operations.

Maximum and minimum points of real-valued functions are found exactly as before, by computing the differential and setting it equal to zero. However, we shall neglect this subject, except in starred sections. It also is much richer than its one-variable counterpart, and in certain infinite-dimensional situations it becomes the subject called the calculus of variations.

Finally, we shall begin our study of the inverse-mapping theorem and the implicit-function theorem. The inverse-mapping theorem states that if a mapping between vector spaces is continuously differentiable, and if its differential at a point α is invertible (as a linear transformation), then the mapping itself is invertible in the neighborhood of α. The implicit-function theorem states that if a continuously differentiable vector-valued function G of two vector variables is set equal to zero, and if the second partial differential of G is invertible (as a linear mapping) at a point <α, β> where G(α, β) = 0, then the equation
G(ξ, η) = 0 can be solved for η in terms of ξ near this point. That is, there is a uniquely determined mapping η = F(ξ) defined near α such that β = F(α) and such that G(ξ, F(ξ)) = 0 in the neighborhood of α. These two theorems are fundamental to the further development of analysis. They are deeper results than our work up to this point in that they depend on a special property of vector spaces called completeness; we shall have to put off part of their proofs to the next chapter, where we shall study completeness in a fairly systematic way.

In a number of starred sections at the end of the chapter we present some harder material that we do not expect the reader to master. However, he should try to get a rough idea of what is going on.

1. REVIEW IN ℝ

Every student of the calculus is presumed to be familiar with the properties of the real number system and the theory of limits. But we shall need more than familiarity at this point. It will be absolutely essential that the student understand the ε-definitions and be able to work with them. To be on the safe side, we shall review some of this material in the setting of limits of functions; the confident reader can skip it.

We suppose that all the functions we consider are defined at least on an open interval containing a, except possibly at a itself. The need for this exception is shown by the difference quotients of the calculus, which are not defined at the point near which their behavior is crucial.

Definition. f(x) approaches l as x approaches a (in symbols, f(x) → l as x → a) if for every positive ε there exists a positive δ such that

    0 < |x − a| < δ  ⟹  |f(x) − l| < ε.

We also say that l is the limit of f(x) as x approaches a and write lim_{x→a} f(x) = l.

The displayed statement in the definition is understood to be universally quantified in x, so that the definition really begins with the three quantifiers (∀ε>0)(∃δ>0)(∀x). These prefixing quantifiers make the definition sound artificial and unidiomatic when read as ordinary prose, but the reader will remember from our introductory discussion of quantification that this artificiality is absolutely necessary in order for the meaning of the sentence to be clear and unambiguous. Any change in the order of the quantifiers (∀ε)(∃δ)(∀x) changes the meaning of the statement. The meaning of the inner universal quantification

    (∀x)(0 < |x − a| < δ ⟹ |f(x) − l| < ε)

is intuitive and easily pictured (see Fig. 3.1).

[Fig. 3.1: the graph of f near a, with the values of f over the punctured interval (a − δ, a + δ) lying inside the horizontal band of half-width ε about l.]
For all x closer to a than δ the value of f at x is closer to l than ε. The definition begins by stating that such a positive δ can be found for each positive ε. Of course, δ will vary with ε; if ε is made smaller, we will generally have to go closer to a, that is, we will have to take δ smaller, before all the values of f on (a − δ, a + δ) − {a} become ε-close to l.

The variables 'ε' and 'δ' are almost always restricted to positive real numbers, and from now on we shall let this restriction be implicit unless there seems to be some special call for explicitness. Thus we shall write simply (∀ε)(∃δ) …

The definition of convergence is used in various ways. In the simplest situations we are given one or more functions having limits at a, say, f(x) → u and g(x) → v, and we want to prove that some other function h has a limit w at a. In such cases we always try to find an inequality expressing the quantity we wish to make small, |h(x) − w|, in terms of the quantities which we know can be made small, |f(x) − u| and |g(x) − v|.

For example, suppose that h = f + g. Since f(x) is close to u and g(x) is close to v, clearly h(x) is close to w = u + v. But how close? Since

    h(x) − w = (f(x) − u) + (g(x) − v),

we have |h(x) − w| ≤ |f(x) − u| + |g(x) − v|. From this it is clear that in order to make |h(x) − w| less than ε it is sufficient to make each of |f(x) − u| and |g(x) − v| less than ε/2. Therefore, given any ε, we can take δ₁ so that 0 < |x − a| < δ₁ ⟹ |f(x) − u| < ε/2, and δ₂ so that 0 < |x − a| < δ₂ ⟹ |g(x) − v| < ε/2, and we can then take δ as the smaller of these two numbers, so that if 0 < |x − a| < δ, then both inequalities hold. Thus

    0 < |x − a| < δ ⟹ |h(x) − w| ≤ |f(x) − u| + |g(x) − v| < ε/2 + ε/2 = ε,

and we have found the desired δ for the function h.

Suppose next that u ≠ 0 and that h = 1/f. Clearly, h(x) is close to w = 1/u when f(x) is close to u, and so we try to express h(x) − w in terms of f(x) − u. Thus

    h(x) − w = 1/f(x) − 1/u = (u − f(x))/(f(x)u),

and so |h(x) − w| ≤ |f(x) − u|/|f(x)u|. The trouble here is that the denominator is variable, and if it should happen to be very small, it might cancel the smallness of |f(x) − u| and not force a small quotient. But the answer to this problem is easy. Since f(x) is close to u and u is not zero, f(x) cannot be close to zero. For instance, if f(x) is closer to u than |u|/2, then f(x) must be farther from 0 than |u|/2. We therefore choose δ₁ so that 0 < |x − a| < δ₁ ⟹ |f(x) − u| < |u|/2, from which it follows that |f(x)| > |u|/2. Then |h(x) − w| < 2|f(x) − u|/|u|²,
and now, given any ε, we take δ₂ so that

    0 < |x − a| < δ₂ ⟹ |f(x) − u| < ε|u|²/2.

Again taking δ as the smaller of δ₁ and δ₂, so that both inequalities will hold simultaneously when 0 < |x − a| < δ, we have

    0 < |x − a| < δ ⟹ |h(x) − w| < 2|f(x) − u|/|u|² < 2(ε|u|²/2)/|u|² = ε,

and again we have found our δ for the function h.

We have tried to show how one would think about these situations. The actual proof that would be written down would only show the choice of δ. Thus,

Lemma 1.1. If f(x) → u and g(x) → v as x → a, then f(x) + g(x) → u + v as x → a.

Proof. Given ε, choose δ₁ so that 0 < |x − a| < δ₁ ⟹ |f(x) − u| < ε/2 (by the assumed convergence of f to u at a), and, similarly, choose δ₂ so that 0 < |x − a| < δ₂ ⟹ |g(x) − v| < ε/2. Take δ as the smaller of δ₁ and δ₂. Then

    0 < |x − a| < δ ⟹ |(f(x) + g(x)) − (u + v)| ≤ |f(x) − u| + |g(x) − v| < ε/2 + ε/2 = ε.

Thus we have proved that for every ε there is a δ such that

    0 < |x − a| < δ ⟹ |(f(x) + g(x)) − (u + v)| < ε,

and we are done. □

In addition to understanding ε-techniques in limit theory, it is necessary to understand and to be able to use the fundamental property of the real number system called the least upper bound property. In the following statement of the property the semi-infinite interval (−∞, a] is of course the subset {x ∈ ℝ : x ≤ a}.

If A is a nonempty subset of ℝ such that A ⊂ (−∞, a] for some a, then there exists a uniquely determined smallest number b such that A ⊂ (−∞, b].

A number a such that A ⊂ (−∞, a] is called an upper bound of A; clearly, a is an upper bound of A if and only if every x in A is less than or equal to a. A set having an upper bound is said to be bounded above. The property says that a nonempty set A which is bounded above has a least upper bound (lub). If we reverse the order relation by multiplying everything by −1, then we have the alternative formulation which asserts that a nonempty subset of ℝ that is bounded below has a greatest lower bound (glb).

The least upper bound of the interval (0, 1) is 1. The least upper bound of [0, 1] is also 1. The greatest lower bound of {1/n : n a positive integer} is 0. Furthermore, lub {x : x is a positive rational number and x² < 2} = √2, glb {eˣ : x ∈ ℝ} = 0, and lub {eˣ : x is rational and x < √2} = e^√2.
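The least upper bound property is exactly what a bisection search leans on. The following sketch (mine, not the text's) traps lub {x > 0 : x² < 2} between a point of the set and an upper bound; the two squeeze together onto the least upper bound √2.

    def lub_sqrt2(tol=1e-12):
        """Bisect for lub {x > 0 : x*x < 2}: lo always lies in the set,
        hi is always an upper bound of it."""
        lo, hi = 1.0, 2.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if mid * mid < 2:
                lo = mid       # mid is in the set, hence not an upper bound
            else:
                hi = mid       # mid is an upper bound
        return hi

    print(lub_sqrt2())  # 1.414213562373...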
EXERCISES

1.1 Prove that if f(x) → l and f(x) → m as x → a, then l = m. We can therefore talk about the limit of f as x → a.

1.2 Prove that if f(x) → l and g(x) → m (as x → a), then f(x)g(x) → lm as x → a.

1.3 Prove that |x − a| ≤ |a|/2 ⟹ |x| ≥ |a|/2.

1.4 Prove (in detail) the greatest lower bound property from the least upper bound property.

1.5 Show that lub A = x if and only if x is an upper bound of A and, for every positive ε, x − ε is not an upper bound of A.

1.6 Let A and B be subsets of ℝ that are nonempty and bounded above. Show that A + B is nonempty and bounded above and that lub (A + B) = lub A + lub B.

1.7 Formulate and prove a correct theorem about the least upper bound of the product of two sets.

1.8 Define the notion of a one-sided limit for a function whose domain is a subset of ℝ. For example, we want to be able to discuss the limit of f(x) as x approaches a from below, which we might designate lim_{x↑a} f(x).

1.9 If the domain of a real-valued function f is an interval, say [a, b], we say that f is an increasing function if x < y ⟹ f(x) ≤ f(y). Prove that an increasing function has one-sided limits everywhere.

1.10 Let [a, b] be a closed interval in ℝ, and let f: [a, b] → ℝ be increasing. Show that lim_{x→y} f(x) = f(y) for all y in [a, b] (f is continuous on [a, b]) if and only if the range of f does not omit any subinterval (c, d) ⊂ [f(a), f(b)]. [Hint: Suppose the range omits (c, d), and set y = lub {x : f(x) ≤ c}. Then f(x) does not converge to f(y) as x → y.]

1.11 A set that intersects every open subinterval of an interval [s, t] is said to be dense in [s, t]. Show that if f: [a, b] → ℝ is increasing and range f is dense in [f(a), f(b)], then range f = [f(a), f(b)]. (For any z between f(a) and f(b) set y = lub {x : f(x) ≤ z}, etc.)

1.12 Assuming the results of the above two exercises, show that if f is a continuous strictly increasing function from [a, b] to ℝ, and if r = f(a) and s = f(b), then f⁻¹ is a continuous strictly increasing function from [r, s] to ℝ. [A function f is continuous if f(x) → f(y) as x → y for every y in its domain; it is strictly increasing if x < y ⟹ f(x) < f(y).]

1.13 Argue somewhat as in Exercise 1.11 above to prove that if f: [a, b] → ℝ is continuous on [a, b], then the range of f includes [f(a), f(b)]. This is the intermediate-value theorem.

1.14 Suppose the function q: ℝ → ℝ satisfies q(xy) = q(x)q(y) for all x, y ∈ ℝ. Note that q(x) = xⁿ (n a positive integer) and q(x) = |x|ʳ (r any real number) satisfy this "functional equation". So does q(x) ≡ 0 (r = −∞?). Show that if q satisfies the functional equation and q(x) > 1 for x > 1, then there is a real number r > 0 such that q(x) = xʳ for all positive x.
1.15 Show that if q is continuous and satisfies the functional equation q(xy) = q(x)q(y) for all x, y ∈ ℝ, and if there is at least one point a where q(a) ≠ 0, 1, then q(x) = xʳ for all positive x. Conclude that if also q is nonnegative, then q(x) ≡ |x|ʳ on ℝ.

1.16 Show that if q(x) ≡ |x|ʳ, and if q(x + y) ≤ q(x) + q(y), then r ≤ 1. (Try y = 1 and x large; what is q′(x) like if r > 1?)

2. NORMS

In the limit theory of ℝ, as reviewed briefly above, the absolute-value function is used prominently in expressions like '|x − y|' to designate the distance between two numbers, here between x and y. The definition of the convergence of f(x) to u is simply a careful statement of what it means to say that the distance |f(x) − u| tends to zero as the distance |x − a| tends to zero. The properties of |x| which we have used in our proofs are
1) |x| > 0 if x ≠ 0, and |0| = 0;
2) |xy| = |x||y|;
3) |x + y| ≤ |x| + |y|.

The limit theory of vector spaces is studied in terms of functions called norms, which serve as multidimensional analogues of the absolute-value function on ℝ. Thus, if p: V → ℝ is a norm, then we want to interpret p(α) as the "size" of α and p(α − β) as the "distance" between α and β. However, if V is not one-dimensional, there is no one notion of size that is most natural. For example, if f is a positive continuous function on [a, b], and if we ask the reader for a number which could be used as a measure of how "large" f is, there are two possibilities that will probably occur to him: the maximum value of f and the area under the graph of f. Certainly, f must be considered small if max f is small. But also, we would have to agree that f is small in a different sense if its area is small. These are two examples of norms on the vector space V = C([a, b]) of all continuous functions on [a, b]:

    p(f) = max {|f(t)| : t ∈ [a, b]}   and   q(f) = ∫ₐᵇ |f(t)| dt.

Note that f can be small in the second sense and not in the first.

In order to be useful, a notion of size for a vector must have properties analogous to those of the absolute-value function on ℝ.

Definition. A norm is a real-valued function p on a vector space V such that
n1. p(α) > 0 if α ≠ 0 (positivity);
n2. p(xα) = |x|p(α) for all α ∈ V, x ∈ ℝ (homogeneity);
n3. p(α + β) ≤ p(α) + p(β) for all α, β ∈ V (triangle inequality).

A normed linear space (nls), or normed vector space, is a vector space V together with a norm p on V. A normed linear space is thus really a pair
<V, p>, but generally we speak simply of the normed linear space V, a definite norm on V then being understood.

It has been customary to designate the norm of α by ‖α‖, presumably to suggest the analogy with absolute value. The triangle inequality n3 then becomes

    ‖α + β‖ ≤ ‖α‖ + ‖β‖,

which is almost identical in form with the basic absolute-value inequality |x + y| ≤ |x| + |y|. Similarly, n2 becomes ‖xα‖ = |x|‖α‖, analogous to |xy| = |x||y| in ℝ. Furthermore, ‖α − β‖ is similarly interpreted as the distance between α and β. This is reasonable since if we set α = ξ − η and β = η − ζ, then n3 becomes the usual triangle inequality of geometry:

    ‖ξ − ζ‖ ≤ ‖ξ − η‖ + ‖η − ζ‖.

We shall use both the double bar notation and the "p"-notation for norms; each is on occasion superior to the other.

The most commonly used norms on ℝⁿ are ‖x‖₁ = Σ₁ⁿ |xᵢ|, the Euclidean norm ‖x‖₂ = (Σ₁ⁿ xᵢ²)^(1/2), and ‖x‖∞ = max {|xᵢ|}ᵢ. Similar norms on the infinite-dimensional vector space C([a, b]) of all continuous real-valued functions on [a, b] are

    ‖f‖₁ = ∫ₐᵇ |f(t)| dt,   ‖f‖₂ = (∫ₐᵇ |f(t)|² dt)^(1/2),   ‖f‖∞ = max {|f(t)| : a ≤ t ≤ b}.

It should be easy for the reader to check that ‖ ‖₁ is a norm in both cases above, and we shall take up the so-called uniform norms ‖ ‖∞ in the next paragraph. The Euclidean norms ‖ ‖₂ are trickier; their properties depend on scalar product considerations. These will be discussed in Chapter 5. Meanwhile, so that the reader can use the Euclidean norm ‖ ‖₂ on ℝⁿ, we shall ask him to prove the triangle inequality for it (the other axioms being obvious) by brute force in an exercise. On ℝ itself the absolute value is a norm, and it is the only norm to within a constant multiple.

We can transfer the above norms on ℝⁿ to arbitrary finite-dimensional spaces by the following general remark.

Lemma 2.1. If p is a norm on a vector space W and T is an injective linear map from a vector space V to W, then p ∘ T is a norm on V.

Proof. The proof is left to the reader.

Uniform norms. The two norms ‖ ‖∞ considered above are special cases of a very general situation. Let A be an arbitrary nonempty set, and let ℬ(A, ℝ) be the set of all bounded functions f: A → ℝ. That is, f ∈ ℬ(A, ℝ) if and only if f ∈ ℝᴬ and range f ⊂ [−b, b] for some b ∈ ℝ. This is the same as saying that range |f| ⊂ [0, b], and we call any such b a bound of |f|. The set ℬ(A, ℝ) is a
vector space V, since if |f| and |g| are bounded by b and c, respectively, then |xf + yg| is bounded by |x|b + |y|c. The uniform norm ‖f‖∞ is defined as the smallest bound of |f|. That is,

    ‖f‖∞ = lub {|f(p)| : p ∈ A}.

Of course, it has to be checked that ‖ ‖∞ is a norm. For any p in A,

    |f(p) + g(p)| ≤ |f(p)| + |g(p)| ≤ ‖f‖∞ + ‖g‖∞.

Thus ‖f‖∞ + ‖g‖∞ is a bound of |f + g| and is therefore greater than or equal to the smallest such bound, which is ‖f + g‖∞. This gives the triangle inequality. Next we note that if x ≠ 0, then b bounds |f| if and only if |x|b bounds |xf|, and it follows that ‖xf‖∞ = |x|‖f‖∞. Finally, ‖f‖∞ ≥ 0, and ‖f‖∞ = 0 only if f is the zero function.

We can replace ℝ by any normed linear space W in the above discussion. A function f: A → W is bounded by b if and only if ‖f(p)‖ ≤ b for all p in A, and we define the corresponding uniform norm on ℬ(A, W) by

    ‖f‖∞ = lub {‖f(p)‖ : p ∈ A}.

If f ∈ C([0, 1]), then we know that the continuous function f assumes the least upper bound of its range as a value (that is, f "assumes its maximum value"), so that then ‖f‖∞ is the maximum value of |f|. In general, however, the definition must be given in terms of lub.

Balls. Remembering that ‖α − β‖ is interpreted as the distance from α to β, it is natural to define the open ball of radius r about the center α as {ξ : ‖α − ξ‖ < r}. We designate this ball B_r(α). Translation through β preserves distance, and therefore ξ ∈ B_r(α) if and only if ξ + β ∈ B_r(α + β). That is, translation through β carries B_r(α) into B_r(α + β): T_β[B_r(α)] = B_r(α + β). Also, scalar multiplication by c multiplies all distances by |c|, and it follows in a similar way that cB_r(α) = B_{|c|r}(cα).

Although B_r(α) behaves like a ball, the actual set being defined is different for different norms, and some of them "look unspherelike". The unit balls about the origin in ℝ² for the three norms ‖ ‖₁, ‖ ‖₂, and ‖ ‖∞ are shown in Fig. 3.2.

A subset A of a nls V is bounded if it lies in some ball, say B_r(α). Then it also lies in a ball about the origin, namely B_{r+‖α‖}(0). This is simply the fact that if ‖ξ − α‖ < r, then ‖ξ‖ < r + ‖α‖, which we get from the triangle inequality upon rewriting ‖ξ‖ as ‖(ξ − α) + α‖.

The radius of the largest ball about a vector β which does not touch a set A is naturally called the distance from β to A. It is clearly glb {‖ξ − β‖ : ξ ∈ A} (see Fig. 3.3).
[Figs. 3.2–3.4: the unit balls in ℝ² for ‖ ‖₁, ‖ ‖₂, and ‖ ‖∞; the distance ρ(β, A) = r from a point β to a set A; a small ball B_δ(α) inside B_r(β).]

A point α is an interior point of a set A if some ball about α is included in A. This is equivalent to saying that the distance from α to the complement of A is positive (supposing that A is not the whole of V), and should coincide with the reader's intuitive notion of what an "inside" point should be. A subset A of a normed linear space is said to be open if every point of A is an interior point.

If our language is to be consistent, an open ball should be an open set. It is: if α ∈ B_r(β), then ‖α − β‖ < r, and then B_δ(α) ⊂ B_r(β), provided that δ ≤ r − ‖α − β‖, by virtue of the triangle inequality (see Fig. 3.4). The reader should write down the detailed proof. He has to show that if ξ ∈ B_δ(α), then ξ ∈ B_r(β). Our intuitions about distances are quite trustworthy, but they should always be checked by a computation.

The reader probably can see by a mental argument that the union of any collection of open sets is open. In particular, the union of any collection of open balls is open (Fig. 3.5), and this is probably the most intuitive way of visualizing an open set. (See Exercise 2.9.)

A subset C is said to be closed if its complement C′ is open. Our discussion above shows that a nonempty set C is closed if and only if every point not in it is at a positive distance from it: α ∉ C ⟹ ρ(α, C) > 0. The so-called closed ball of radius r about β,

    B̄ = {ξ : ‖ξ − β‖ ≤ r},

is a closed set. As Fig. 3.6 suggests, the proof is another application of the triangle inequality.
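A quick numerical check of the three norms on ℝⁿ and of the triangle inequality n3 (a sketch of mine, not the text's):

    import math

    def norm1(x):
        return sum(abs(v) for v in x)

    def norm2(x):
        return math.sqrt(sum(v * v for v in x))

    def norminf(x):
        return max(abs(v) for v in x)

    x, y = [1.0, -2.0, 2.0], [3.0, 0.0, -4.0]
    s = [a + b for a, b in zip(x, y)]
    for name, p in [("one", norm1), ("two", norm2), ("sup", norminf)]:
        print(name, p(s) <= p(x) + p(y))   # n3 holds in each norm

Note how the three norms disagree about size: for the x above, ‖x‖₁ = 5, ‖x‖₂ = 3, and ‖x‖∞ = 2, which is why their unit balls in Fig. 3.2 are nested rather than equal.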
EXERCISES

2.1 Show that if ‖ξ − α‖ ≤ ‖α‖/2, then ‖ξ‖ ≥ ‖α‖/2.

2.2 Prove in detail that ‖x‖₁ = Σ₁ⁿ |xᵢ| is a norm on ℝⁿ. Also prove that ‖f‖₁ = ∫ₐᵇ |f(t)| dt is a norm on C([a, b]).

2.3 For x in ℝⁿ let |x| be the Euclidean length |x| = [Σ₁ⁿ xᵢ²]^(1/2), and let (x, y) be the scalar product

    (x, y) = Σ₁ⁿ xᵢyᵢ.

The Schwarz inequality says that

    |(x, y)| ≤ |x||y|,

and that the inequality is strict if x and y are independent.
a) Prove the Schwarz inequality for the case n = 2 by squaring and canceling.
b) Now prove it for the general n in the same way.

2.4 Continuing the above exercise, prove that the Euclidean length |x| is a norm. The crucial step is the triangle inequality, |x + y| ≤ |x| + |y|. Reduce it to the Schwarz inequality by squaring and canceling. This is of course our two-norm ‖x‖₂.

2.5 Prove that the unit balls for the norms ‖ ‖₁ and ‖ ‖∞ on ℝ² are as shown in Fig. 3.2.

2.6 Prove that an open ball is an open set.

2.7 Prove that a closed ball is a closed set.

2.8 Give an example of a subset of ℝ² that is neither open nor closed.

2.9 Show from the definition of an open set that any open set is the union of a family (perhaps very large!) of open balls. Show that any union of open sets is open. Conclude, therefore, that a set is open if and only if it is a union of open balls.

2.10 A subset A of a normed linear space V is said to be convex if A includes the line segment joining any two of its points. We know that the line segment from α to β is the image of [0, 1] under the mapping t ↦ tβ + (1 − t)α. Thus A is convex if and only if

    α, β ∈ A and t ∈ [0, 1] ⟹ tβ + (1 − t)α ∈ A.

Prove that every ball B_r(γ) in a normed linear space V is convex.

2.11 A seminorm is the same as a norm except that the positivity condition n1 is relaxed to nonnegativity:

    n1′. p(α) ≥ 0 for all α.
Thus p(α) may be 0 for some nonzero α. Every norm is in particular a seminorm. Prove:
a) If p is a seminorm on a vector space W and T is a linear mapping from V to W, then p ∘ T is a seminorm on V.
b) p ∘ T is a norm if and only if T is injective and p is a norm on range T.

2.12 Show that the sum of two seminorms is a seminorm.

2.13 Prove from the above two exercises (and not by a direct calculation) that

    q(f) = ‖f′‖∞ + |f(t₀)|

is a seminorm on the space C¹([a, b]) of all continuously differentiable real-valued functions on [a, b], where t₀ is a fixed point in [a, b]. Prove that q is a norm.

2.14 Show that the sum of two bounded sets is bounded.

2.15 Prove that the sum B_r(α) + B_s(β) is exactly the ball B_{r+s}(α + β).

3. CONTINUITY

Let V and W be any two normed linear spaces. We shall designate both norms by ‖ ‖. This ambiguous usage does not cause confusion. It is like the ambiguous use of "0" for the zero elements of all the vector spaces under consideration.

If we replace the absolute value sign | | by the general norm symbol ‖ ‖ in the definition we gave earlier for the limit of a real-valued function of a real variable, it becomes verbatim the corresponding definition of convergence in the general setting. However, we shall repeat the definition and take the occasion to relax the hypothesis on the domain of f. Accordingly, let A be any subset of V, and let f be any mapping from A to W.

Definition. We say that f(ξ) approaches β as ξ approaches α, and write f(ξ) → β as ξ → α, if for every ε there is a δ such that

    ξ ∈ A and 0 < ‖ξ − α‖ < δ ⟹ ‖f(ξ) − β‖ < ε.

If α ∈ A and f(ξ) → f(α) as ξ → α, then we say that f is continuous at α. We can then drop the requirement that ξ ≠ α and have the direct ε,δ-characterization of continuity: f is continuous at α if for every ε there exists a δ such that

    ‖ξ − α‖ < δ ⟹ ‖f(ξ) − f(α)‖ < ε.

It is understood here that ξ is universally quantified over the domain A of f. We say that f is continuous if f is continuous at every point α in its domain.

If the absolute value of a number is replaced by the norm of a vector, the limit theorems that we sampled in Section 1 hold verbatim for normed linear spaces. We shall ask the reader to write out a few of these transcriptions in the exercises.

There is a property stronger than continuity at α which is much simpler to use when it is available. We say that f is Lipschitz continuous at α if there is a constant c such that ‖f(ξ) − f(α)‖ ≤ c‖ξ − α‖ for all ξ sufficiently close to α.
That is, there are constants c and r such that

    ‖ξ − α‖ < r ⟹ ‖f(ξ) − f(α)‖ ≤ c‖ξ − α‖.

The point is that now we can take δ simply as ε/c (provided ε is small enough so that this makes δ ≤ r; otherwise we have to set δ = min {ε/c, r}).

We say that f is a Lipschitz function (on its domain A) if there is a constant c such that ‖f(ξ) − f(η)‖ ≤ c‖ξ − η‖ for all ξ, η in A. For a linear map T: V → W the Lipschitz inequality is more simply written as

    ‖T(ξ)‖ ≤ c‖ξ‖ for all ξ ∈ V;

we just use the fact that now T(ξ) − T(η) = T(ξ − η) and apply the inequality to ξ − η. In this context it is conventional to call T a bounded linear mapping rather than a Lipschitz linear mapping, and any such c is called a bound of T.

We know from the beginning calculus that if f is a continuous real-valued function on [a, b] (that is, if f ∈ C([a, b])), then

    |∫ₐᵇ f(x) dx| ≤ m(b − a),

where m is the maximum value of |f(x)|. But this is just the uniform norm of f, so that the inequality can be rewritten as

    |∫ₐᵇ f| ≤ (b − a)‖f‖∞.

This shows that if the uniform norm is used on C([a, b]), then f ↦ ∫ₐᵇ f is a bounded linear functional, with bound b − a.

It should immediately be pointed out that this is not the same notion of boundedness we discussed earlier. There we called a real-valued function bounded if its range was a bounded subset of ℝ. The analogue here would be to call a vector-valued function bounded if its range is norm bounded. But a nonzero linear transformation cannot be bounded in this sense, because ‖T(xα)‖ = |x|‖T(α)‖. The present definition amounts to the boundedness in the earlier sense of the quotient ‖T(α)‖/‖α‖ (on V − {0}).

It turns out that for a linear map T, being continuous and being Lipschitz are the same thing.

Theorem 3.1. Let T be a linear mapping from a normed linear space V to a normed linear space W. Then the following conditions are equivalent:
1) T is continuous at one point;
2) T is continuous;
3) T is bounded.

Proof. (1) ⟹ (3). Suppose T is continuous at α₀. Then, taking ε = 1, there exists δ such that ‖α − α₀‖ < δ ⟹ ‖T(α) − T(α₀)‖ < 1. Setting ξ = α − α₀ and using the additivity of T, we have ‖ξ‖ < δ ⟹ ‖T(ξ)‖ < 1. Now for any nonzero η, ξ = δη/2‖η‖ has norm δ/2. Therefore, ‖T(ξ)‖ < 1. But ‖T(ξ)‖ = δ‖T(η)‖/2‖η‖, giving ‖T(η)‖ < 2‖η‖/δ. Thus T is bounded by c = 2/δ.
(3) ⇒ (2). Suppose ‖T(ξ)‖ ≤ c‖ξ‖ for all ξ. Then for any α_0 and any ε we can take δ = ε/c and have

    ‖α − α_0‖ < δ  ⇒  ‖T(α) − T(α_0)‖ = ‖T(α − α_0)‖ ≤ c‖α − α_0‖ < cδ = ε.

(2) ⇒ (1). Trivial. □

In the lemma below we prove that the norm function is a Lipschitz function from V to ℝ.

Lemma 3.1. For all α, β ∈ V,  | ‖α‖ − ‖β‖ | ≤ ‖α − β‖.

Proof. We have ‖α‖ = ‖(α − β) + β‖ ≤ ‖α − β‖ + ‖β‖, so that ‖α‖ − ‖β‖ ≤ ‖α − β‖. Similarly, ‖β‖ − ‖α‖ ≤ ‖β − α‖ = ‖α − β‖. This pair of inequalities is equivalent to the lemma. □

Other Lipschitz mappings will appear when we study mappings with continuous differentials. Roughly speaking, the Lipschitz property lies between continuity and continuous differentiability, and it is frequently the condition that we actually apply under the hypothesis of continuous differentiability.

The smallest bound of a bounded linear transformation T is called its norm. That is,

    ‖T‖ = lub {‖T(α)‖/‖α‖ : α ≠ 0}.

For example, let T: C([a, b]) → ℝ be the Riemann integral, T(f) = ∫_a^b f(x) dx. We saw earlier that if we use the uniform norm ‖f‖_∞ on C([a, b]), then T is bounded by b − a: |T(f)| ≤ (b − a)‖f‖_∞. On the other hand, there is no smaller bound, because ∫_a^b 1 = b − a = (b − a)‖1‖_∞. Thus ‖T‖ = b − a.

Other formulations of the above definition are useful. Since ‖T(α)‖/‖α‖ = ‖T(α/‖α‖)‖ by homogeneity, and since β = α/‖α‖ has norm 1, we have

    ‖T‖ = lub {‖T(β)‖ : ‖β‖ = 1}.

Finally, if ‖γ‖ ≤ 1, then γ = xβ, where ‖β‖ = 1 and |x| ≤ 1, and ‖T(γ)‖ = |x|‖T(β)‖ ≤ ‖T(β)‖. We therefore have an inefficient but still useful characterization:

    ‖T‖ = lub {‖T(γ)‖ : ‖γ‖ ≤ 1}.

These last two formulations are uniform norms. Thus, if B_1 is the closed unit ball {ξ : ‖ξ‖ ≤ 1}, we see that a linear T is bounded if and only if T restricted to B_1 is bounded in the old sense, and then ‖T‖ = ‖T↾B_1‖_∞.

A linear map T: V → W is bounded below by b if ‖T(ξ)‖ ≥ b‖ξ‖ for all ξ in V. If T has a bounded inverse and m = ‖T⁻¹‖, then T is bounded below by 1/m, for ‖T⁻¹(η)‖ ≤ m‖η‖ for all η ∈ W if and only if ‖ξ‖ ≤ m‖T(ξ)‖ for all ξ ∈ V.
If V is finite-dimensional, then it is true, conversely, that if T is bounded below, then it is invertible (why?), but in general this does not follow.

If V and W are normed linear spaces, then Hom(V, W) is defined to be the set of all bounded linear maps T: V → W. The results of Section 2.3 all remain true, but require some additional arguments.

Theorem 3.2. Hom(V, W) is itself a normed linear space if ‖T‖ is defined as above, as the smallest bound for T.

Proof. This follows from the uniform norm discussion of Section 2 by virtue of the identity ‖T‖ = ‖T↾B_1‖_∞. □

Theorem 3.3. If U, V, and W are normed linear spaces, and if T ∈ Hom(U, V) and S ∈ Hom(V, W), then S ∘ T ∈ Hom(U, W) and ‖S ∘ T‖ ≤ ‖S‖‖T‖. It follows that composition on the right by a fixed T is a bounded linear transformation from Hom(V, W) to Hom(U, W), and similarly for composition on the left by a fixed S.

Proof. ‖(S ∘ T)(α)‖ = ‖S(T(α))‖ ≤ ‖S‖‖T(α)‖ ≤ ‖S‖(‖T‖‖α‖) = (‖S‖·‖T‖)(‖α‖). Thus S ∘ T is bounded by ‖S‖·‖T‖, and everything else follows at once. □

As before, the conjugate space V* is Hom(V, ℝ), now the space of all bounded linear functionals.

EXERCISES

3.1 Write out the ε,δ-proofs of the following limit theorems.
1) Let V and W be normed linear spaces, and let F and G be mappings from V to W. If lim_{ξ→α} F(ξ) = μ and lim_{ξ→α} G(ξ) = ν, then lim_{ξ→α} (F + G)(ξ) = μ + ν.
2) Given F: V → W and g: V → ℝ, if F(ξ) → μ and g(ξ) → b as ξ → α, then (gF)(ξ) → bμ.

3.2 Prove that if F(ξ) → μ as ξ → α and G(η) → χ as η → μ, then G ∘ F(ξ) → χ as ξ → α. Give a careful, complete statement of the theorem you have proved.

3.3 Suppose that A is an open subset of a normed linear space V and that α_0 ∈ A. Suppose that F: A → ℝ is such that lim_{α→α_0} F(α) = b ≠ 0. Prove that 1/F(α) → 1/b as α → α_0 (ε,δ-proof).

3.4 The function f(x) = |x|^r is continuous at x = 0 for any positive r. Prove that f is not Lipschitz continuous at x = 0 if r < 1. Prove, however, that f is Lipschitz continuous at x = a if a > 0. (Use the mean-value theorem.)

3.5 Use the mean-value theorem of the calculus and the definition of the derivative to show that if f is a real-valued function on an interval I, and if f′ exists everywhere, then f is a Lipschitz mapping if and only if f′ is a bounded function. Show also that then ‖f′‖_∞ is the smallest Lipschitz constant C. (A numerical spot check follows.)
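Here is that small numerical spot check of Exercise 3.5 (our illustration; the exercise itself asks for a proof). For f = sin on [0, 2π] we have ‖f′‖_∞ = ‖cos‖_∞ = 1, so sampled difference quotients should approach 1 but never exceed it:

    import math

    # Sample difference quotients of sin on [0, 2*pi]; by Exercise 3.5 their
    # supremum (the smallest Lipschitz constant) equals ||cos||_inf = 1.
    xs = [2 * math.pi * k / 300 for k in range(301)]
    lip = max(abs(math.sin(x) - math.sin(y)) / (x - y)
              for i, x in enumerate(xs) for y in xs[:i])
    sup_deriv = max(abs(math.cos(x)) for x in xs)
    print(f"sampled Lipschitz constant = {lip:.5f}, ||f'||_inf = {sup_deriv:.5f}")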
3.6 The "working rules" for ‖T‖ are
1) ‖T(ξ)‖ ≤ ‖T‖‖ξ‖ for all ξ;
2) ‖T(ξ)‖ ≤ b‖ξ‖ for all ξ  ⇒  ‖T‖ ≤ b.
Prove these rules.

3.7 Prove that if we use the one-norm ‖x‖_1 = Σ_1^n |x_i| on ℝⁿ, then the norm of the linear functional l_a(x) = Σ_1^n a_i x_i is ‖a‖_∞.

3.8 Prove similarly that if ‖x‖ = ‖x‖_∞, then ‖l_a‖ = ‖a‖_1.

3.9 Use the above exercises to show that if ‖x‖ on ℝⁿ is the one-norm, then ‖x‖ = lub {|f(x)| : f ∈ (ℝⁿ)* and ‖f‖ ≤ 1}.

3.10 Show that if T in Hom(ℝⁿ, ℝᵐ) has matrix t = {t_ij}, and if we use the one-norm ‖x‖_1 on ℝⁿ and the uniform norm ‖y‖_∞ on ℝᵐ, then ‖T‖ = ‖t‖_∞, the maximum of the |t_ij|.

3.11 Show that the meaning of 'Hom(V, W)' has changed by giving an example of a linear mapping that fails to be bounded. There is one in the text.

3.12 For a fixed ξ in V define the mapping ev_ξ: Hom(V, W) → W by ev_ξ(T) = T(ξ). Prove that ev_ξ is a bounded linear mapping.

3.13 In the above exercise it is in fact true that ‖ev_ξ‖ = ‖ξ‖, but to prove this we need a new theorem.

Theorem. Given ξ in the normed linear space V, there exists a functional f in V* such that ‖f‖ = 1 and |f(ξ)| = ‖ξ‖.

Assuming this theorem, prove that ‖ev_ξ‖ = ‖ξ‖. [Hint: Presumably you have already shown that ‖ev_ξ‖ ≤ ‖ξ‖. You now need a T in Hom(V, W) such that ‖T‖ = 1 and ‖T(ξ)‖ = ‖ξ‖. Consider a suitable dyad.]

3.14 Let t = {t_ij} be a square matrix, and define ‖t‖ as max_i (Σ_j |t_ij|). Prove that this is a norm on the space ℝ^{n×n} of all n × n matrices. Prove that ‖st‖ ≤ ‖s‖·‖t‖. Compute the norm of the identity matrix.

3.15 Let V be the normed linear space ℝⁿ under the uniform norm ‖x‖_∞ = max {|x_i|}. If T ∈ Hom V, prove that ‖T‖ is the norm of its matrix ‖t‖ as defined in the above exercise. That is, show that ‖T‖ = max_i Σ_j |t_ij|. (Show first that ‖t‖ is an upper bound of T, and then show that ‖T(x)‖ = ‖t‖‖x‖ for a specially chosen x.) Does part of the previous exercise now become superfluous?

3.16 Assume the following fact: if f ∈ C([0, 1]) and ‖f‖_1 = a, then given ε, there is a function u ∈ C([0, 1]) such that ‖u‖_∞ = 1 and ∫_0^1 u(t)f(t) dt ≥ a − ε.
Let K(s, t) be continuous on [0, 1] × [0, 1] and bounded by b. Define T: C([0, 1]) → B([0, 1]) by Th = k, where k(s) = ∫_0^1 K(s, t) h(t) dt. If V and W are the normed linear spaces C and B under the uniform norms, prove that ‖T‖ = lub_s ∫_0^1 |K(s, t)| dt. [Hint: Proceed as in the above exercise.]

3.17 Let V and W be normed linear spaces, and let A be any subset of V containing more than one point. Let 𝔏(A, W) be the set of all Lipschitz mappings from A to W. For f in 𝔏(A, W), let ρ(f) be the smallest Lipschitz constant for f. That is,

    ρ(f) = lub_{ξ≠η} ‖f(ξ) − f(η)‖ / ‖ξ − η‖.

Prove that 𝔏(A, W) is a vector space and that ρ is a seminorm on it.

3.18 Continuing the above exercise, show that if α is any fixed point of A, then ρ(f) + ‖f(α)‖ is a norm on 𝔏(A, W).

3.19 Let K be a mapping from a subset A of a normed linear space V to V which differs from the identity by a Lipschitz mapping with constant c less than 1. We may as well take c = 1/2, and then our hypothesis is that

    ‖(K(ξ) − ξ) − (K(η) − η)‖ ≤ (1/2)‖ξ − η‖  for all ξ, η in A.

Prove that K is injective and that its inverse is a Lipschitz mapping with constant 2.

3.20 Continuing the above exercise, suppose in addition that the domain A of K is an open subset of V and that K[C] is a closed set whenever C is a closed ball lying in A. Prove that if C = C_r(α), the closed ball of radius r about α, is a subset of A, then K[C] includes the ball B = B_{r/7}(γ), where γ = K(α). This proof is elementary but tricky. If there is a point v of B not in K[C], then since K[C] is closed, there is a largest ball B′ about v disjoint from K[C] and a point η = K(ξ) in K[C] as close to B′ as we wish. Now if we change ξ by adding v − η, the change in the value of K will approximate v − η closely enough to force the new value of K to be in B′. If we can also show that the new value ξ + (v − η) is in C, then this new value of K is in K[C], and we have our contradiction. Draw a picture. Obviously, the radius ρ of B′ is at most r/7. Show that if η = K(ξ) is chosen so that ‖v − η‖ ≤ (3/2)ρ, then the above assertions follow from the triangle inequality and the Lipschitz inequality displayed in Exercise 3.19. You have to prove that ‖K(ξ + (v − η)) − v‖ < ρ and that ξ + (v − η) lies in C.

3.21 Assume the result of the above exercise and show that the image under K of any open ball about α whose closure lies in A includes an open ball about K(α).
Show, therefore, that K[A] is an open subset of V. State a theorem about the Lipschitz invertibility of K, including all the hypotheses on K that were used in the above exercises.

3.22 We shall see in the next chapter that if V and W are finite-dimensional spaces, then any continuous map from V to W takes bounded closed sets into bounded closed sets. Assuming this and the results of the above exercises, prove the following theorem.

Theorem. Let F be a mapping from an open subset A of a finite-dimensional normed linear space V to a finite-dimensional normed linear space W. Suppose that there is a T in Hom(V, W) such that T⁻¹ exists and such that F − T is Lipschitz on A, with constant 1/(2m), where m = ‖T⁻¹‖. Then F is injective, its range R = F[A] is an open subset of W, and its inverse F⁻¹ is Lipschitz continuous, with constant 2m.

4. EQUIVALENT NORMS

Two normed linear spaces V and W are norm isomorphic if there is a bijection T from V to W such that T ∈ Hom(V, W) and T⁻¹ ∈ Hom(W, V). That is, a norm isomorphism is a linear isomorphism T such that both T and T⁻¹ are continuous (bounded). As usual, we regard isomorphic spaces as being essentially the same. For two different norms on the same space we are led to the following definition.

Definition. Two norms p and q on the same vector space V are equivalent if there exist constants a and b such that p ≤ aq and q ≤ bp.

Then (1/b)q ≤ p ≤ aq and (1/a)p ≤ q ≤ bp, so that two norms are equivalent if and only if either can be bracketed by two multiples of the other.

The above definition simply says that the identity map ξ ↦ ξ from V to V, considered as a map from the normed linear space ⟨V, p⟩ to the normed linear space ⟨V, q⟩, is bounded in both directions, and hence that these two normed linear spaces are isomorphic.

If V is infinite-dimensional, two norms will in general not be equivalent. For example, if V = C([0, 1]) and f_n(t) = tⁿ, then ‖f_n‖_1 = 1/(n + 1) and ‖f_n‖_∞ = 1. Therefore, there is no constant a such that ‖f‖_∞ ≤ a‖f‖_1 for all f ∈ C([0, 1]), and the norms ‖ ‖_∞ and ‖ ‖_1 are not equivalent on V. This is why the very notion of a normed linear space depends on the assumption of a given norm. However, we have the following theorem, which we shall prove in the next chapter by more sophisticated methods than we are using at present.

Theorem 4.1. On a finite-dimensional vector space V all norms are equivalent.

We shall need this theorem and also the following consequence of it occasionally in the present chapter.

Theorem 4.2. If V and W are finite-dimensional normed linear spaces, then every linear mapping T from V to W is necessarily bounded.
Proof. Because of the above theorem, it is sufficient to prove T bounded with respect to some pair of norms. Let θ: ℝⁿ → V and φ: ℝᵐ → W be any basis isomorphisms, and let {t_ij} be the matrix of S = φ⁻¹ ∘ T ∘ θ in Hom(ℝⁿ, ℝᵐ). Then

    ‖S(x)‖_∞ ≤ b‖x‖_1,

where b = max |t_ij|. Now q(η) = ‖φ⁻¹(η)‖_∞ and p(ξ) = ‖θ⁻¹(ξ)‖_1 are norms on W and V respectively, by Lemma 2.1, and since q(T(ξ)) = ‖S(θ⁻¹ξ)‖_∞ ≤ b‖θ⁻¹ξ‖_1 = bp(ξ), we see that T is bounded by b with respect to the norms p and q on V and W. □

If we change to an equivalent norm, we are merely passing through an isomorphism, and all continuous linear properties remain unchanged. For example:

Theorem 4.3. The vector space Hom(V, W) remains the same if either the domain norm or the range norm is replaced by an equivalent norm, and the two induced norms on Hom(V, W) are equivalent.

Proof. The proof is left to the reader.

We now ask what kind of a norm we might want on the Cartesian product V × W of two normed linear spaces. It is natural to try to choose the product norm so that the fundamental mappings relating the product space to the two factor spaces, the two projections π_i and the two injections θ_i, should be continuous. It turns out that these requirements determine the product norm uniquely to within equivalence. For if ‖⟨α, ξ⟩‖ has these properties, then

    ‖⟨α, ξ⟩‖ = ‖⟨α, 0⟩ + ⟨0, ξ⟩‖ ≤ ‖⟨α, 0⟩‖ + ‖⟨0, ξ⟩‖ ≤ k_1‖α‖ + k_2‖ξ‖ ≤ k(‖α‖ + ‖ξ‖),

where k_i is a bound of the injection θ_i and k is the larger of k_1 and k_2. Also, ‖α‖ ≤ c_1‖⟨α, ξ⟩‖ and ‖ξ‖ ≤ c_2‖⟨α, ξ⟩‖, by the boundedness of the projections π_i, and so ‖α‖ + ‖ξ‖ ≤ c‖⟨α, ξ⟩‖, where c = c_1 + c_2. Now ‖α‖ + ‖ξ‖ is clearly a norm ‖ ‖_1 on V × W, and our argument above shows that ‖⟨α, ξ⟩‖ will satisfy our requirements if and only if it is equivalent to ‖ ‖_1. Any such norm will be called a product norm for V × W. The product norms most frequently used are the uniform (product) norm ‖⟨α, ξ⟩‖_∞ = max {‖α‖, ‖ξ‖}, the Euclidean (product) norm ‖⟨α, ξ⟩‖_2 = (‖α‖² + ‖ξ‖²)^{1/2}, and the above sum (product) norm ‖⟨α, ξ⟩‖_1. We shall leave the verification that the uniform and Euclidean norms actually are norms as exercises. Each of these three product norms can be defined as well for n factor spaces as for two, and we gather the facts for this general case into a theorem.
Theorem 4.4. If {⟨V_i, p_i⟩}_1^n is a finite set of normed linear spaces, then ‖ ‖_1, ‖ ‖_2, and ‖ ‖_∞, defined on V = Π_{i=1}^n V_i by ‖α‖_1 = Σ_1^n p_i(α_i), ‖α‖_2 = (Σ_1^n p_i(α_i)²)^{1/2}, and ‖α‖_∞ = max {p_i(α_i) : i = 1, ..., n}, are equivalent norms on V, and each is a product norm in the sense that the projections π_i and the injections θ_i are all continuous.

*It looks above as though all we are doing is taking any norm ‖ ‖ on ℝⁿ and then defining a norm ||| ||| on the product space V by

    |||α||| = ‖⟨p_1(α_1), ..., p_n(α_n)⟩‖.

This is almost correct. The interested reader will discover, however, that ‖ ‖ on ℝⁿ must have the property that if |x_i| ≤ |y_i| for i = 1, ..., n, then ‖x‖ ≤ ‖y‖, in order for the triangle inequality to follow for ||| ||| on V. If we call such a norm on ℝⁿ an increasing norm, then the following is true. If ‖ ‖ is any increasing norm on ℝⁿ, then |||α||| = ‖⟨p_1(α_1), ..., p_n(α_n)⟩‖ is a product norm on V = Π_1^n V_i. However, we shall use only the 1-, 2-, ∞-product norms in this book.*

The triangle inequality, the continuity of addition, and our requirements on a product norm form a set of nearly equivalent conditions. In particular, we make the following observation.

Lemma 4.1. If V is a normed linear space, then the operation of addition is a bounded linear map from V × V to V.

Proof. The triangle inequality for the norm on V says exactly that addition is bounded by 1 when the sum norm is used on V × V. □

A normed linear space V is a (norm) direct sum ⊕_1^n V_i if the mapping ⟨x_1, ..., x_n⟩ ↦ Σ_1^n x_i is a norm isomorphism from Π_1^n V_i to V. That is, the given norm on V must be equivalent to the product norm it acquires when it is viewed as Π_1^n V_i. If V is algebraically the direct sum ⊕_1^n V_i, we always have

    ‖x‖ = ‖Σ_1^n x_i‖ ≤ Σ_1^n ‖x_i‖

by the triangle inequality for the norm on V, and the sum on the right is the one-norm for Π_1^n V_i. Therefore, V will be the norm direct sum ⊕_1^n V_i if, conversely, there is an n-tuple of constants {k_i} such that ‖x_i‖ ≤ k_i‖x‖ for all x. This is the same as saying that the projections P_i: x ↦ x_i are all bounded. Thus,

Theorem 4.5. If V is a normed linear space and V is algebraically the direct sum V = ⊕_1^n V_i, then V = ⊕_1^n V_i as normed linear spaces if and only if the associated projections {P_i} are all bounded.
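The equivalence asserted in Theorem 4.4 is completely concrete for the coordinate norms on ℝⁿ, where ‖x‖_∞ ≤ ‖x‖_2 ≤ ‖x‖_1 ≤ n‖x‖_∞ (compare Exercise 4.2 below). A minimal numeric sketch of these bracketing inequalities (ours, with arbitrary sample vectors):

    import math
    import random

    # Check ||x||_inf <= ||x||_2 <= ||x||_1 <= n * ||x||_inf on random vectors;
    # these inequalities bracket each norm by multiples of the others.
    random.seed(0)
    n = 5
    for _ in range(3):
        x = [random.uniform(-1, 1) for _ in range(n)]
        one = sum(abs(t) for t in x)
        two = math.sqrt(sum(t * t for t in x))
        inf = max(abs(t) for t in x)
        assert inf <= two <= one <= n * inf
        print(f"||x||_1 = {one:.3f}  ||x||_2 = {two:.3f}  ||x||_inf = {inf:.3f}")

Note that the constant n in the last inequality grows with the dimension, which is exactly the phenomenon Exercise 4.2 asks about.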
EXERCISES

4.1 The fact that Hom(V, W) is unchanged when norms are replaced by equivalent norms can be viewed as a corollary of Theorem 3.3. Show that this is so.

4.2 Write down a string of quite obvious inequalities showing that the norms ‖ ‖_1, ‖ ‖_2, and ‖ ‖_∞ on ℝⁿ are equivalent. Discuss what happens as n → ∞.

4.3 Let V be an n-dimensional vector space, and consider the collection of all norms on V of the form p ∘ θ, where θ: V → ℝⁿ is a coordinate isomorphism and p is one of the norms ‖ ‖_1, ‖ ‖_2, ‖ ‖_∞ on ℝⁿ. Show that all of these norms are equivalent. (Use the above exercise and the reasoning in Theorem 4.2.)

4.4 Prove that ‖⟨α, ξ⟩‖ = max {‖α‖, ‖ξ‖} is a norm on V × W.

4.5 Prove that ‖⟨α, ξ⟩‖ = ‖α‖ + ‖ξ‖ is a norm on V × W.

4.6 Prove that ‖⟨α, ξ⟩‖ = (‖α‖² + ‖ξ‖²)^{1/2} is a norm on V × W.

4.7 Assuming Exercises 4.4 through 4.6, prove by induction the corresponding part of Theorem 4.4.

4.8 Prove that if A is an open subset of V × W, then π_1[A] is an open subset of V.

4.9 Prove (ε, δ) that ⟨T, S⟩ ↦ S ∘ T is a continuous map from Hom(V_1, V_2) × Hom(V_2, V_3) to Hom(V_1, V_3), where the V_i are all normed linear spaces.

4.10 Let ‖ ‖ be any increasing norm on ℝⁿ; that is, ‖x‖ ≤ ‖y‖ if |x_i| ≤ |y_i| for all i. Let p_i be a norm on the vector space V_i for i = 1, ..., n. Show that |||α||| = ‖⟨p_1(α_1), ..., p_n(α_n)⟩‖ is a norm on V = Π_1^n V_i.

4.11 Suppose that p: V → ℝ is a nonnegative function such that p(xα) = |x|p(α) for all x, α. This is surely a minimum requirement for any function purporting to be a measure of the length of a vector.
a) Define continuity with respect to p and show that Theorem 3.1 is valid.
b) Our next requirement is that addition be continuous as a map from V × V to V, and we decide that continuity at 0 means that for every ε there is a δ such that p(α) < δ and p(β) < δ ⇒ p(α + β) < ε. Argue again as in Theorem 3.1 to show that there is a constant c such that p(α + β) ≤ c(p(α) + p(β)) for all α, β ∈ V.

4.12 Let V and W be normed linear spaces, and let f: V × W → ℝ be bounded and bilinear. Let T be the corresponding linear map from V to W*. Prove that T is bounded and that ‖T‖ is the smallest bound for f, that is, the smallest b such that |f(α, β)| ≤ b‖α‖‖β‖ for all α, β.

4.13 Let the normed linear space V be a norm direct sum M ⊕ N. Prove that the subspaces M and N are closed sets in V. (The converse theorem is false.)
4.14 Let N be a closed subspace of the normed linear space V. If A is a coset N + α, define |||A||| as glb {‖ξ‖ : ξ ∈ A}. Prove that |||A||| is a norm on the quotient space V/N. Prove also that if ξ̄ is the coset containing ξ, then the mapping ξ ↦ ξ̄ (the natural projection π of V onto V/N) is bounded by 1.

4.15 Let V and W be normed linear spaces, and let T in Hom(V, W) have a null space which includes the closed subspace N. Prove that the unique linear S from V/N to W defined by T = S ∘ π (Theorem 4.3 of Chapter 1) is bounded and that ‖S‖ = ‖T‖.

4.16 Let N be a closed subspace of a normed linear space V, and suppose that N has a finite-dimensional complement M in the purely algebraic sense. Prove that then V is the norm direct sum M ⊕ N. (Use the above exercise and Theorem 4.2 to prove that if P is the projection of V onto N along M, then P is bounded.)

4.17 Let N_1 and N_2 be closed subspaces of the normed linear space V, and suppose that they have the same finite codimension. Prove that N_1 and N_2 are norm isomorphic. (Assume the results of the above exercise and Exercise 2.11 of Chapter 2.)

4.18 Prove that if p is a seminorm on a vector space V, then its null set is a subspace N, p is constant on the cosets of N, and p factors: p = q ∘ π, where q is a norm on V/N and π is the natural projection ξ ↦ ξ̄ of V onto V/N. Note that ξ ↦ ξ̄ is thus an isometric surjection from the seminormed space V to the normed space V/N. An isometry is a distance-preserving map.

5. INFINITESIMALS

The notion of an infinitesimal was abused in the early literature of the calculus, its treatment generally amounting to logical nonsense, and the term fell into such disrepute that many modern books avoid it completely. Nevertheless, it is a very useful idea, and we shall base our development of the differential upon the properties of two special classes of infinitesimals which we shall call "big oh" and "little oh" (and designate '𝒪' and '𝔬', respectively).

Originally an infinitesimal was considered to be a number that "is infinitely small but not zero". Of course, there is no such number. Later, an infinitesimal was considered to be a variable that approaches zero as its limit. However, we know that it is functions that have limits, and a variable can be considered to have a limit only if it is somehow considered to be a function. We end up looking at functions φ such that φ(t) → 0 as t → 0. The definition of the derivative involves several such infinitesimals. If f′(x) exists and has the value a, then the fundamental difference quotient (f(x + t) − f(x))/t is the quotient of two infinitesimals, and, furthermore, ((f(x + t) − f(x))/t) − a also approaches 0 as t → 0. This last function is not defined at 0, but we can get around this if we wish by multiplying through by t, obtaining

    (f(x + t) − f(x)) − at = φ(t),

where f(x + t) − f(x) is the "change in f" infinitesimal, at is a linear infinitesimal, and φ(t) is an infinitesimal that approaches 0 faster than t (i.e., φ(t)/t → 0 as t → 0). If we divide the last equation by t again, we see that this property of the
infinitesimal φ, that it converges to 0 faster than t as t → 0, is exactly equivalent to the fact that the difference quotient of f converges to a. This makes it clear that the study of derivatives is included in the study of the rate at which infinitesimals get small, and the usefulness of this paraphrase will shortly become clear.

Definition. A subset A of a normed linear space V is a neighborhood of a point α if A includes some open ball about α. A deleted neighborhood of α is a neighborhood of α minus the point α itself.

We define special sets of functions ℐ, 𝒪, and 𝔬 as follows. It will be assumed in these definitions that each function is from a neighborhood of 0 in a normed linear space V to a normed linear space W.

f ∈ ℐ if f(0) = 0 and f is continuous at 0. These functions are the infinitesimals.

f ∈ 𝒪 if f(0) = 0 and f is Lipschitz continuous at 0. That is, there exist positive constants r and c such that ‖f(ξ)‖ ≤ c‖ξ‖ on B_r(0).

f ∈ 𝔬 if f(0) = 0 and ‖f(ξ)‖/‖ξ‖ → 0 as ξ → 0.

When the spaces V and W are not understood, we specify them by writing 𝒪(V, W), etc. A simple set of functions from ℝ to ℝ makes the qualitative difference between these classes apparent. The function f(x) = |x|^{1/2} is in ℐ(ℝ, ℝ) but not in 𝒪, g(x) = x is in 𝒪 and therefore in ℐ but not in 𝔬, and h(x) = x² is in all three classes.

[Fig. 3.7: the graphs of f, g, and h near 0.]

It is clear that ℐ, 𝒪, and 𝔬 are unchanged when the norms on V and W are replaced by equivalent norms.

Our previous notion of the sum of two functions does not apply to a pair of functions f, g ∈ ℐ(V, W) because their domains may be different. However, f + g is defined on the intersection dom f ∩ dom g, which is still a neighborhood of 0. Moreover, addition remains commutative and associative when extended in this way. It is clear that then ℐ(V, W) is almost a vector space. The only trouble occurs in connection with the equation f + (−f) = 0; the domain of the function on the left is dom f, whereas we naturally take 0 to be the zero function on the whole of V.
*The way out of this difficulty is to identify two functions f and g in ℐ if they are the same on some ball about 0. We define f and g to be equivalent (f ∼ g) if and only if there exists a neighborhood of 0 on which f = g. We then check (in our minds) that this is an equivalence relation and that we now do have a vector space. Its elements are called germs of functions at 0. Strictly speaking, a germ is thus an equivalence class of functions, but in practice one tends to think of germs in terms of their representing functions, only keeping in mind that two functions are the same as germs when they agree on a neighborhood of 0.*

As one might guess from our introductory discussion, the algebraic properties of the three classes ℐ, 𝒪, and 𝔬 are crucial for the differential calculus. We gather them together in the following theorem.

Theorem 5.1
1) 𝔬(V, W) ⊂ 𝒪(V, W) ⊂ ℐ(V, W), and each of the three classes is closed under addition and multiplication by scalars.
2) If f ∈ 𝒪(V, W), and if g ∈ 𝒪(W, X), then g ∘ f ∈ 𝒪(V, X), where dom g ∘ f = f⁻¹[dom g].
3) If either f or g above is in 𝔬, then so is g ∘ f.
4) If f ∈ 𝒪(V, W) and g ∈ ℐ(V, ℝ), then fg ∈ 𝔬(V, W), and similarly if f ∈ ℐ and g ∈ 𝒪.
5) In (4), if either f or g is in 𝒪 and the other is merely bounded on a neighborhood of 0, then fg ∈ 𝒪(V, W).
6) Hom(V, W) ⊂ 𝒪(V, W).
7) Hom(V, W) ∩ 𝔬(V, W) = {0}.

Proof. Let 𝒪_ε(V, W) be the set of infinitesimals f such that ‖f(ξ)‖ ≤ ε‖ξ‖ on some ball about 0. Then f ∈ 𝒪 if and only if f is in some 𝒪_ε, and f ∈ 𝔬 if and only if f is in every 𝒪_ε. Obviously, 𝔬 ⊂ 𝒪 ⊂ ℐ.

1) If ‖f(ξ)‖ ≤ a‖ξ‖ on B_t(0) and ‖g(ξ)‖ ≤ b‖ξ‖ on B_u(0), then ‖f(ξ) + g(ξ)‖ ≤ (a + b)‖ξ‖ on B_r(0), where r = min {t, u}. Thus 𝒪 is closed under addition. The closure of 𝔬 under addition follows similarly, or simply from the limit of a sum being the sum of the limits.

2) If ‖f(ξ)‖ ≤ a‖ξ‖ when ‖ξ‖ ≤ t and ‖g(η)‖ ≤ b‖η‖ when ‖η‖ ≤ u, then ‖g(f(ξ))‖ ≤ b‖f(ξ)‖ ≤ ab‖ξ‖ when ‖ξ‖ ≤ t and ‖f(ξ)‖ ≤ u, and so when ‖ξ‖ ≤ r = min {t, u/a}.
3) Now suppose that f ∈ 𝔬 in (2). Then, given ε, we can take a = ε/b and have ‖g(f(ξ))‖ ≤ ε‖ξ‖ when ‖ξ‖ ≤ r. Thus g ∘ f ∈ 𝔬. The argument when g ∈ 𝔬 and f ∈ 𝒪 is essentially the same.

4) Given ‖f(ξ)‖ ≤ c‖ξ‖ on B_r(0) and given ε, we choose δ such that |g(ξ)| ≤ ε/c on B_δ(0) and have ‖f(ξ)g(ξ)‖ ≤ ε‖ξ‖ when ‖ξ‖ ≤ min (δ, r). The other result follows similarly, as also does (5).

6) A bounded linear transformation is in 𝒪 by definition.

7) Suppose that f ∈ Hom(V, W) ∩ 𝔬(V, W). Take any α ≠ 0. Given ε, choose r so that ‖f(ξ)‖ ≤ ε‖ξ‖ on B_r(0). Then write α as α = xξ, where ‖ξ‖ < r. (Find ξ and x.) Then

    ‖f(α)‖ = ‖f(xξ)‖ = |x|·‖f(ξ)‖ ≤ |x|·ε·‖ξ‖ = ε‖α‖.

Thus ‖f(α)‖ ≤ ε‖α‖ for every positive ε, and so f(α) = 0. Thus f = 0, proving (7). □

Remark. The additivity of f was not used in this argument, only its homogeneity. It follows therefore that there is no homogeneous function (of degree 1) in 𝔬 except 0.

Sometimes when more than one variable is present it is necessary to indicate with respect to which variable a function is in 𝒪 or 𝔬. We then write "f(ξ) = 𝔬(ξ)" for "f ∈ 𝔬", where "𝔬(ξ)" is used to designate an arbitrary element of 𝔬.

The following rather curious lemma will be useful later in our proof of the differentiability of an implicitly defined function. It is understood that η = f(ξ), where f is the function we are studying.

Lemma 5.1. If η = 𝒪(ξ) + 𝔬(⟨ξ, η⟩) and also η = ℐ(ξ), then η = 𝒪(ξ).

Proof. The hypotheses imply that there are numbers b, r_1, and ρ such that ‖η‖ ≤ b‖ξ‖ + (1/2)(‖ξ‖ + ‖η‖) if ‖ξ‖ ≤ r_1 and ‖ξ‖ + ‖η‖ ≤ ρ, and then that ‖η‖ ≤ ρ/2 if ‖ξ‖ is smaller than some r_2. If ‖ξ‖ ≤ r = min {r_1, r_2, ρ/2}, then all the conditions are met and ‖η‖ ≤ b‖ξ‖ + (1/2)(‖ξ‖ + ‖η‖). But this yields the inequality ‖η‖ ≤ (2b + 1)‖ξ‖, and so η = 𝒪(ξ). □

We shall also need the following straightforward result.

Lemma 5.2. If f ∈ 𝒪(V, X) and g ∈ 𝒪(V, Y), then ⟨f, g⟩ ∈ 𝒪(V, X × Y). That is, ⟨𝒪(ξ), 𝒪(ξ)⟩ = 𝒪(ξ).

Proof. The proof is left to the reader.
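The qualitative difference among ℐ, 𝒪, and 𝔬 is visible in the ratio ‖f(ξ)‖/‖ξ‖ as ξ → 0. A small numeric sketch (ours, not the book's) for the three sample functions of this section:

    # Track |f(x)| / |x| as x -> 0 for the sample functions of the text:
    #   f(x) = |x|**0.5 : ratio unbounded   -> in I but not in O
    #   g(x) = x        : ratio constant 1  -> in O but not in o
    #   h(x) = x**2     : ratio tends to 0  -> in o
    for x in [1e-1, 1e-3, 1e-5]:
        rf = abs(x) ** 0.5 / abs(x)
        rg = abs(x) / abs(x)
        rh = x ** 2 / abs(x)
        print(f"x = {x:.0e}   f: {rf:10.1f}   g: {rg:.1f}   h: {rh:.0e}")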
EXERCISES

5.1 Prove in detail that the class ℐ(V, W) is unchanged if the norms on V and W are replaced by equivalent norms.

5.2 Do the same for 𝒪 and 𝔬.

5.3 Prove (5) of the 𝒪𝔬-theorem (Theorem 5.1).

5.4 Prove also that if in (4) either f or g is in 𝔬 and the other is merely bounded on a neighborhood of 0, then fg ∈ 𝔬(V, W).

5.5 Prove Lemma 5.2. (Remember that F = ⟨F_1, F_2⟩ is loose language for F = θ_1 ∘ F_1 + θ_2 ∘ F_2.) State the generalization to n functions. State the 𝔬-form of the theorem.

5.6 Given F_1 ∈ 𝒪(V_1, W) and F_2 ∈ 𝒪(V_2, W), define F from (a subset of) V = V_1 × V_2 to W by F(α_1, α_2) = F_1(α_1) + F_2(α_2). Prove that F ∈ 𝒪(V, W). (First state the defining equation as an identity involving the projections π_1 and π_2 and not involving explicit mention of the domain vectors α_1 and α_2.)

5.7 Given F_1 ∈ 𝒪(V_1, W) and F_2 ∈ 𝒪(V_2, ℝ), define precisely what you mean by F_1F_2 and show that it is in 𝔬(V_1 × V_2, W).

5.8 Define the class 𝒪ⁿ as follows: f ∈ 𝒪ⁿ if f ∈ ℐ and ‖f(ξ)‖/‖ξ‖ⁿ is bounded in some deleted ball about 0. (A deleted neighborhood of α is a neighborhood minus α.) State and prove a theorem about f + g when f ∈ 𝒪ⁿ and g ∈ 𝒪ᵐ.

5.9 State and prove a theorem about f ∘ g when f ∈ 𝒪ⁿ and g ∈ 𝒪ᵐ.

5.10 State and prove a theorem about fg when f ∈ 𝒪ⁿ and g ∈ 𝒪ᵐ.

5.11 Define a similar class 𝔬ⁿ. State and prove a theorem about f ∘ g when f ∈ 𝒪ⁿ and g ∈ 𝔬ᵐ.

6. THE DIFFERENTIAL

Before considering the notion of the differential, we shall review some geometric material from the elementary calculus. We do this for motivation only; our subsequent theory is independent of the preliminary discussion.

In the elementary one-variable calculus the derivative f′(a) of a function f at the point a has geometric meaning as the slope of the tangent line to the graph of f at the point a. (Of course, according to our notion of a function, the graph of f is f.) The tangent line thus has the (point-slope) equation y − f(a) = f′(a)(x − a), and is the graph of the affine map x ↦ f′(a)(x − a) + f(a).

We ordinarily examine the nature of the curve f near the point ⟨a, f(a)⟩ by using new variables which are zero at this point. That is, we express everything in terms of s = y − f(a) and t = x − a. This change of variables is simply the translation ⟨x, y⟩ ↦ ⟨t, s⟩ = ⟨x − a, y − f(a)⟩ in the Cartesian plane ℝ², which brings the point of interest ⟨a, f(a)⟩ to the origin. If we picture the situation in a Euclidean plane, of which this page is a satisfactory local model, then this translation in ℝ² is represented by a choice of new axes, the t- and s-axes, with origin at the point of tangency. Since y = f(x)
if and only if s = f(a + t) − f(a), we see that the image of f under this translation is the function Δf_a defined by

    Δf_a(t) = f(a + t) − f(a).

(See Fig. 3.8.) Of course, Δf_a is simply our old friend the change in f brought about by changing x from a to a + t.

[Fig. 3.8: the graphs of Δf_a and df_a near the origin, with df_a(t) − Δf_a(t) = 𝔬(t).]

Similarly, the equation y − f(a) = f′(a)(x − a) becomes s = f′(a)t, and the tangent line accordingly translates to the line that is (the graph of) the linear functional l: t ↦ f′(a)t having the number f′(a) as its skeleton (matrix).

Remember that from the point of view of the geometric configuration (curve and tangent line) in the Euclidean plane, all that we are doing is choosing the natural axis system, with origin at the point of tangency. Then the curve is (the graph of) the function Δf_a, and the tangent line is (the graph of) the linear map l. Now it follows from the definition of f′(a) that l can also be characterized as the linear function that approximates Δf_a most closely. For, by definition,

    Δf_a(t)/t → f′(a)  as  t → 0,

and this is exactly the same as saying that

    (Δf_a(t) − l(t))/t → 0  as  t → 0,

or Δf_a − l ∈ 𝔬. But we know from the 𝒪𝔬-theorem that the expression of the function Δf_a as the sum l + 𝔬 is unique. This unique linear approximation l is called the differential of f at a and is designated df_a. Again, the differential of f at a is the linear function l: ℝ → ℝ that approximates the actual change in f, Δf_a, in the sense that Δf_a − l ∈ 𝔬; we saw above that if the derivative f′(a) exists, then the differential of f at a exists and has f′(a) as its skeleton (1 × 1 matrix).

Similarly, if f is a function of two variables, then (the graph of) f is a surface in Cartesian 3-space ℝ³ = ℝ² × ℝ, and the tangent plane to this surface at ⟨a, b, f(a, b)⟩ has the equation

    z − f(a, b) = f_1(a, b)(x − a) + f_2(a, b)(y − b),
where f_1 = ∂f/∂x and f_2 = ∂f/∂y. If, as above, we set

    Δf_{⟨a,b⟩}(s, t) = f(a + s, b + t) − f(a, b)  and  l(s, t) = s f_1(a, b) + t f_2(a, b),

then Δf_{⟨a,b⟩} is the change in f around ⟨a, b⟩ and l is the linear functional on ℝ² with matrix (skeleton) ⟨f_1(a, b), f_2(a, b)⟩. Moreover, it is a theorem of the standard calculus that if the partial derivatives of f are continuous, then again l approximates Δf_{⟨a,b⟩}, with error in 𝔬. Here also l is called the differential of f at ⟨a, b⟩ and is designated df_{⟨a,b⟩} (Fig. 3.9). The notation in the figure has been changed to show the value at t = ⟨t_1, t_2⟩ of the differential df_α of f at α = ⟨α_1, α_2⟩.

[Fig. 3.9]

The following definition should now be clear. As above, the local function ΔF_α is defined by ΔF_α(ξ) = F(α + ξ) − F(α).

Definition. Let V and W be normed linear spaces, and let A be a neighborhood of α in V. A mapping F: A → W is differentiable at α if there is a T in Hom(V, W) such that

    ΔF_α(ξ) = T(ξ) + 𝔬(ξ).

The 𝒪𝔬-theorem implies then that T is uniquely determined, for if also ΔF_α = S + 𝔬, then T − S ∈ 𝔬, and so T − S = 0 by (7) of the theorem. This uniquely determined T is called the differential of F at α and is designated dF_α. Thus

    ΔF_α = dF_α + 𝔬,

where dF_α is the unique (bounded) linear approximation to ΔF_α.
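A concrete numeric rendering of the definition may help (our sketch; the map F and the point α are arbitrary choices, with dF_α realized by the usual Jacobian matrix). The ratio ‖ΔF_α(ξ) − dF_α(ξ)‖/‖ξ‖ should tend to 0 with ‖ξ‖:

    import math

    # F(x, y) = (x**2 * y, sin x + y**2); its Jacobian at a = (x, y) is
    #   [[2xy, x**2], [cos x, 2y]],
    # and the remainder Delta F_a(xi) - J_a(xi) should be o(xi).
    def F(x, y):
        return (x * x * y, math.sin(x) + y * y)

    def dF(a, xi):
        (x, y), (u, v) = a, xi
        return (2 * x * y * u + x * x * v, math.cos(x) * u + 2 * y * v)

    a, e = (1.0, 2.0), (0.6, -0.8)          # ||e||_2 = 1
    for t in [1e-1, 1e-2, 1e-3, 1e-4]:
        xi = (t * e[0], t * e[1])
        Fa, Fax = F(*a), F(a[0] + xi[0], a[1] + xi[1])
        Tx = dF(a, xi)
        err = math.hypot(Fax[0] - Fa[0] - Tx[0], Fax[1] - Fa[1] - Tx[1])
        print(f"||xi|| = {t:.0e}   error / ||xi|| = {err / t:.2e}")
    # The printed ratios fall roughly in proportion to ||xi||: the error is in o.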
*Our preliminary discussion should make it clear that this definition of the differential agrees with standard usage when the domain space is ℝⁿ. However, in certain cases when the domain space is an infinite-dimensional function space, dF_α is called the first variation of F at α. This is due to the fact that although the early writers on the calculus of variations saw its analogy with the differential calculus, they did not realize that it was the same subject.*

We gather together in the next two theorems the familiar rules for differentiation. They follow immediately from the definition and the 𝒪𝔬-theorem. It will be convenient to use the notation 𝔇_α(V, W) for the set of all mappings from neighborhoods of α in V to W that are differentiable at α.

Theorem 6.1
1) If F ∈ 𝔇_α(V, W), then ΔF_α ∈ 𝒪(V, W).
2) If F, G ∈ 𝔇_α(V, W), then F + G ∈ 𝔇_α(V, W) and d(F + G)_α = dF_α + dG_α.
3) If F ∈ 𝔇_α(V, ℝ) and G ∈ 𝔇_α(V, W), then FG ∈ 𝔇_α(V, W) and d(FG)_α = F(α) dG_α + dF_α G(α), the second term being a dyad.
4) If F is a constant function on V, then F is differentiable and dF_α = 0.
5) If F ∈ Hom(V, W), then F is differentiable at every α ∈ V and dF_α = F.

Proof
1) ΔF_α = dF_α + 𝔬 = 𝒪 + 𝔬 = 𝒪 by (1) and (6) of the 𝒪𝔬-theorem.
2) It is clear that Δ(F + G)_α = ΔF_α + ΔG_α. Therefore, Δ(F + G)_α = (dF_α + 𝔬) + (dG_α + 𝔬) = (dF_α + dG_α) + 𝔬 by (1) of the 𝒪𝔬-theorem. Since dF_α + dG_α ∈ Hom(V, W), we have (2).
3) Δ(FG)_α(ξ) = F(α + ξ)G(α + ξ) − F(α)G(α) = ΔF_α(ξ)G(α) + F(α) ΔG_α(ξ) + ΔF_α(ξ) ΔG_α(ξ), as the reader will see upon expanding and canceling. This is just the usual device of adding and subtracting middle terms in order to arrive at the form involving the Δ's. Thus Δ(FG)_α = (dF_α + 𝔬)G(α) + F(α)(dG_α + 𝔬) + 𝒪𝒪 = dF_α G(α) + F(α) dG_α + 𝔬 by the 𝒪𝔬-theorem.
4) If F is constant, then ΔF_α = 0, and dF_α = 0 by (7) of the 𝒪𝔬-theorem.
5) ΔF_α(ξ) = F(α + ξ) − F(α) = F(ξ). Thus ΔF_α = F ∈ Hom(V, W). □

The composite-function rule is somewhat more complicated.

Theorem 6.2. If F ∈ 𝔇_α(V, W) and G ∈ 𝔇_{F(α)}(W, X), then G ∘ F ∈ 𝔇_α(V, X) and

    d(G ∘ F)_α = dG_{F(α)} ∘ dF_α.
Proof. We have

    Δ(G ∘ F)_α(ξ) = G(F(α + ξ)) − G(F(α)) = G(F(α) + ΔF_α(ξ)) − G(F(α))
                  = ΔG_{F(α)}(ΔF_α(ξ)) = dG_{F(α)}(ΔF_α(ξ)) + 𝔬(ΔF_α(ξ))
                  = dG_{F(α)}(dF_α(ξ)) + dG_{F(α)}(𝔬(ξ)) + 𝔬 ∘ 𝒪
                  = (dG_{F(α)} ∘ dF_α)(ξ) + 𝒪 ∘ 𝔬 + 𝔬 ∘ 𝒪.

Thus Δ(G ∘ F)_α = dG_{F(α)} ∘ dF_α + 𝔬, and since dG_{F(α)} ∘ dF_α ∈ Hom(V, X), this proves the theorem. The reader should be able to justify each step taken in this chain of equalities. □

EXERCISES

6.1 The coordinate mapping ⟨x, y⟩ ↦ x from ℝ² to ℝ is differentiable. Why? What is its differential?

6.2 Prove that differentiation commutes with the application of bounded linear maps. That is, show that if F: V → W is differentiable at α and if T ∈ Hom(W, X), then T ∘ F is differentiable at α and d(T ∘ F)_α = T ∘ dF_α.

6.3 Prove that F ∈ 𝔇_α(V, ℝ) and F(α) ≠ 0 ⇒ G = 1/F ∈ 𝔇_α(V, ℝ) and

    dG_α = −dF_α / (F(α))².

6.4 Let F: V → ℝ be differentiable at α, and let f: ℝ → ℝ be a function whose derivative exists at a = F(α). Prove that f ∘ F is differentiable at α and that d(f ∘ F)_α = f′(a) dF_α. [Remember that the differential of f at a is simply multiplication by its derivative: df_a(h) = h f′(a).] Show that the preceding problem is a special case.

6.5 Let V and W be normed linear spaces, and let F: V → W and G: W → V be continuous maps such that G ∘ F = I_V and F ∘ G = I_W. Suppose that F is differentiable at α and that G is differentiable at β = F(α). Prove that dG_β = (dF_α)⁻¹.

6.6 Let f: V → ℝ be differentiable at α. Show that g = fⁿ is differentiable at α and that

    dg_α = n(f(α))^{n−1} df_α.

(Prove this both by an induction on the product rule and by the composite-function rule, assuming in the second case that D_x xⁿ = nx^{n−1}.)
6.7 Prove from the product rule by induction that if the n functions f_i: V → ℝ, i = 1, ..., n, are all differentiable at α, then so is f = Π_1^n f_i, and that

    df_α = Σ_{i=1}^n ( Π_{j≠i} f_j(α) ) d(f_i)_α.

6.8 A monomial of degree n on the normed linear space V is a product Π_1^n l_i of linear functionals (l_i ∈ V*). A homogeneous polynomial of degree n is a finite sum of monomials of degree n. A polynomial of degree n is a sum of homogeneous polynomials p_i, i = 0, ..., n, where p_0 is a constant. Show from the above exercise and other known facts that a polynomial is differentiable everywhere.

6.9 Show that if F_1: V → W_1 and F_2: V → W_2 are both differentiable at α, then so is F = ⟨F_1, F_2⟩ from V to W = W_1 × W_2 (use the injections θ_1 and θ_2).

6.10 Show without using explicit computations, but using the results of earlier exercises instead, that the mapping F: ℝ² → ℝ² defined by ⟨x, y⟩ ↦ ⟨(x − y)², (x + y)³⟩ is everywhere differentiable. Now compute its differential at ⟨a, b⟩.

6.11 Let F: V → X and G: W → X be differentiable at α and β respectively, and define K: V × W → X by K(ξ, η) = F(ξ) + G(η). Show that K is differentiable at ⟨α, β⟩
a) by a direct Δ-calculation;
b) by using the projections π_1 and π_2 to express K in terms of F and G without explicit reference to the variable, and then applying the differentiation rules.

6.12 Now suppose given F: V → ℝ and G: W → X, and define K by K(ξ, η) = F(ξ)G(η). Show that if F and G are differentiable at α and β respectively, then K is differentiable at ⟨α, β⟩, in the manner of (b) in the above exercise.

6.13 Let V and W be normed linear spaces. Prove that the map ⟨α, β⟩ ↦ ‖α‖‖β‖ from V × W to ℝ is in 𝔬(V × W, ℝ). Use the maximum norm on the product space. Let f: V × W → ℝ be bounded and bilinear. Here boundedness means that there is some b such that |f(α, β)| ≤ b‖α‖‖β‖ for all α, β. Prove that f is differentiable everywhere and find its differential.

6.14 Let f and g be differentiable functions from ℝ to ℝ. We know from the composite-function rule of the ordinary calculus that (f ∘ g)′(a) = f′(g(a))g′(a). Our composite-function rule says that d(f ∘ g)_a = df_{g(a)} ∘ dg_a, where df_x is the linear mapping t ↦ f′(x)t. Show that these two statements are equivalent.
6.15 Prove that f(x, y) = ‖⟨x, y⟩‖_1 = |x| + |y| is differentiable except on the coordinate axes (that is, df_{⟨a,b⟩} exists if a and b are both nonzero).

6.16 Comparing the shapes of the unit balls for ‖ ‖_1 and ‖ ‖_∞ on ℝ², guess from the above the theorem about the differentiability of ‖ ‖_∞. Prove it.

6.17 Let V and W be fixed normed linear spaces, let X_d be the set of all maps from V to W that are differentiable at 0, let X_o be the set of all maps from V to W that belong to 𝔬(V, W), and let X_l be Hom(V, W). Prove that X_d and X_o are vector spaces and that X_d = X_o ⊕ X_l.

6.18 Let F be a Lipschitz function with constant C which is differentiable at a point α. Prove that ‖dF_α‖ ≤ C.

7. DIRECTIONAL DERIVATIVES; THE MEAN-VALUE THEOREM

Directional derivatives form the connecting link between differentials and the derivatives of the elementary calculus, and, although they add one more concept that has to be fitted into the scheme of things, the reader should find them intuitively satisfying and technically useful.

A continuous function f from an interval I ⊂ ℝ to a normed linear space W can have a derivative f′(x) at a point x ∈ I in exactly the sense of the elementary calculus:

    f′(x) = lim_{t→0} (f(x + t) − f(x))/t.

The range of such a function f is a curve or arc in W, and it is conventional to call f itself a parametrized arc when we want to keep this geometric notion in mind. We shall also call f′(x), if it exists, the tangent vector to the arc f at x. This terminology fits our geometric intuition, as Fig. 3.10 suggests. For simplicity we have set x = 0 and f(x) = 0. If f′(x) exists, we say that the parametrized arc f is smooth at x. We also say that f is smooth at α = f(x), but this terminology is ambiguous if f is not injective (i.e., if the arc crosses itself). An arc is smooth if it is smooth at every value of the parameter.

We naturally wonder about the relationship between the existence of the tangent vector f′(x) and the differentiability of f at x. If df_x exists, then, being a linear map on ℝ, it is simply multiplication "by" the fixed vector α that is its skeleton, df_x(h) = h df_x(1) = hα, and we expect α to be the tangent vector f′(x).

[Fig. 3.10: the difference quotient (f(0 + t) − f(0))/t approaching the tangent vector f′(0).]
We showed this, and also the converse result, for the ordinary calculus in our preliminary discussion in Section 6. Actually, our argument was valid for vector-valued functions, but we shall repeat it anyway. When we think of a vector-valued function of a real variable as being an arc, we often use Greek letters like 'λ' and 'γ' for the function, as we do below. This of course does not in any way change what is being proved, but is slightly suggestive of a geometric interpretation.

Theorem 7.1. A parametrized arc γ: [a, b] → V is differentiable at x ∈ (a, b) if and only if the tangent vector (derivative) α = γ′(x) exists, in which case the tangent vector is the skeleton of the differential, dγ_x(h) = hγ′(x) = hα.

Proof. If the parametrized arc γ: [a, b] → V is differentiable at x ∈ (a, b), then dγ_x(h) = h dγ_x(1) = hα, where α = dγ_x(1). Since Δγ_x − dγ_x ∈ 𝔬, this gives ‖Δγ_x(h) − hα‖/|h| → 0, and so Δγ_x(h)/h → α as h → 0. Thus α is the derivative γ′(x) in the ordinary sense. By reversing the above steps we see that the existence of γ′(x) implies the differentiability of γ at x. □

Now let F be a function from an open set A in a normed linear space V to a normed linear space W. One way to study the behavior of F in the neighborhood of a point α in A is to consider how it behaves on each straight line through α. That is, we study F by temporarily restricting it to a one-dimensional domain. The advantage gained in doing this is that the restricted F is then simply a parametrized arc, and its differential is simply multiplication by its ordinary derivative.

For any nonzero ξ ∈ V the straight line through α in the direction ξ has the parametric representation t ↦ α + tξ. The restriction of F to this line is the parametrized arc γ: γ(t) = F(α + tξ). Its tangent vector (derivative) at the origin t = 0, if it exists, is called the derivative of F in the direction ξ at α, or the derivative of F with respect to ξ at α, and is designated D_ξF(α). Clearly,

    D_ξF(α) = lim_{t→0} (F(α + tξ) − F(α))/t.

Comparing this with our original definition of f′, we see that the tangent vector γ′(x) to a parametrized arc γ is the directional derivative D_1γ(x) with respect to the standard basis vector 1 in ℝ.

Strictly speaking, we are misusing the word "direction", because different vectors can have the same direction. Thus, if η = cξ with c > 0, then η and ξ point in the same direction, but, because D_ξF(α) is linear in ξ (as we shall see in a moment), their associated derivatives are different: D_ηF(α) = cD_ξF(α).

We now want to establish the relationship between directional derivatives, which are vectors, and differentials, which are linear maps. We saw above that for an arc γ differentiability is equivalent to the existence of γ′(x) = D_1γ(x). In the general case the relationship is not as simple as it is for arcs, but in one direction everything goes smoothly.
Theorem 7.2. If F is differentiable at α, and if λ is any smooth arc through α, with α = λ(x), then γ = F ∘ λ is smooth at x, and γ′(x) = dF_α(λ′(x)). In particular, if F is differentiable at α, then every directional derivative D_ξF(α) exists, and D_ξF(α) = dF_α(ξ).

Proof. The smoothness of γ is equivalent to its differentiability at x and therefore follows from the composite-function theorem. Moreover,

    γ′(x) = dγ_x(1) = d(F ∘ λ)_x(1) = dF_α(dλ_x(1)) = dF_α(λ′(x)).

If λ is the parametrized line λ(t) = α + tξ, then it has the constant derivative ξ, and since α = λ(0) here, the above formula becomes γ′(0) = dF_α(ξ). That is, D_ξF(α) = γ′(0) = dF_α(ξ). □

It is not true, conversely, that the existence of all the directional derivatives D_ξF(α) of a function F at a point α implies the differentiability of F at α. The easiest counterexample involves the notion of a homogeneous function. We say that a function F: V → W is homogeneous if F(xξ) = xF(ξ) for all x and ξ. For such a function the directional derivative D_ξF(0) exists because the arc γ(t) = F(0 + tξ) = tF(ξ) is linear, and γ′(0) = F(ξ). Thus, all of the directional derivatives of a homogeneous function F exist at 0 and D_ξF(0) = F(ξ). If F is also differentiable at 0, then dF_0(ξ) = D_ξF(0) = F(ξ) and F = dF_0. Thus a differentiable homogeneous function must be linear. Therefore, any nonlinear homogeneous function F will be a function such that D_ξF(0) exists for all ξ but dF_0 does not exist. Taking the simplest possible situation, define F: ℝ² → ℝ by F(x, y) = x³/(x² + y²) if ⟨x, y⟩ ≠ ⟨0, 0⟩ and F(0, 0) = 0. Then F(tx, ty) = tF(x, y), so that F is homogeneous, but F is not linear.

However, if V is finite-dimensional, and if for each ξ in a spanning set of vectors the directional derivative D_ξF(α) exists and is a continuous function of α on an open set A, then F is continuously differentiable on A. The proof of this fact depends on the mean-value theorem, which we take up next, but we shall not complete it until Section 9 (Theorem 9.3).

The reader will remember the mean-value theorem as a cornerstone of the calculus, and this is just as true in our general theory. We shall apply it in the next section to give the proof of the general form of the above-mentioned theorem, and practically all of our more advanced work will depend on it. The ordinary mean-value theorem does not have an exact analogue here. Instead we shall prove a theorem that in the one-variable calculus is an easy consequence of the mean-value theorem.

Theorem 7.3. Let f be a continuous function (parametrized arc) from a closed interval [a, b] to a normed linear space, and suppose that f′(t) exists and that ‖f′(t)‖ ≤ m for all t ∈ (a, b). Then ‖f(b) − f(a)‖ ≤ m(b − a).

Proof. Fix ε > 0, and let A be the set of points x ∈ [a, b] such that

    ‖f(x) − f(a)‖ ≤ (m + ε)(x − a) + ε.
A includes at least a small interval [a, c], because f is continuous at a. Set l = lub A. Then ‖f(l) − f(a)‖ ≤ (m + ε)(l − a) + ε by the continuity of f at l. Thus l ∈ A, and a < l ≤ b. We claim that l = b. For if l < b, then f′(l) exists and ‖f′(l)‖ ≤ m. Therefore, there is a δ such that ‖(f(x) − f(l))/(x − l)‖ < m + ε when |x − l| ≤ δ. It follows that

    ‖f(l + δ) − f(a)‖ ≤ ‖f(l + δ) − f(l)‖ + ‖f(l) − f(a)‖
                      ≤ (m + ε)δ + (m + ε)(l − a) + ε = (m + ε)(l + δ − a) + ε,

so that l + δ ∈ A, a contradiction. Therefore, l = b. We thus have ‖f(b) − f(a)‖ ≤ (m + ε)(b − a) + ε, and, since ε is arbitrary, ‖f(b) − f(a)‖ ≤ m(b − a). □

The following more general version of the mean-value theorem is the form in which it is ordinarily applied. As usual, F and G are from a subset of V to W.

Theorem 7.4. If F is differentiable in the ball B_r(α), and if ‖dF_β‖ ≤ c for every β in this ball, then ‖ΔF_β(ξ)‖ ≤ c‖ξ‖ whenever β and β + ξ are in the ball. More generally, the same result holds if the ball B_r(α) is replaced by any convex set C.

Proof. The segment from β to β + ξ is the range of the parametrized arc λ(t) = β + tξ from [0, 1] to V. If β and β + ξ are in the ball B_r(α), then this segment is a subset of the ball. Setting γ(t) = F(β + tξ), we then have

    γ′(x) = dF_{β+xξ}(λ′(x)) = dF_{β+xξ}(ξ),

from Theorem 7.2. Therefore, ‖γ′(x)‖ ≤ c‖ξ‖ on [0, 1], and the mean-value theorem then implies that

    ‖ΔF_β(ξ)‖ = ‖F(β + ξ) − F(β)‖ = ‖γ(1) − γ(0)‖ ≤ c‖ξ‖(1 − 0) = c‖ξ‖,

which is the desired inequality. The only property of B_r(α) that we have used is that it includes the line segment joining any two of its points. This is the definition of convexity, and the theorem is therefore true for any convex set. □

Corollary. If G is differentiable on the convex set C, if T ∈ Hom(V, W), and if ‖dG_β − T‖ ≤ c for all β in C, then ‖ΔG_β(ξ) − T(ξ)‖ ≤ c‖ξ‖ whenever β and β + ξ are in C.

Proof. Set F = G − T, and note that dF_β = dG_β − T and ΔF_β = ΔG_β − T. □

We end this section with a few words about notation. Notice the reversal of the positions of the variables in the identity (D_ξF)(α) = dF_α(ξ). This difference has practical importance. We have a function of the two variables 'α' and 'ξ' which we can convert to a function of one variable by holding the other variable fixed; it is convenient technically to put the fixed variable in subscript
position. Thus we think of dF_α(ξ) with α held fixed and have the function dF_α in Hom(V, W), whereas in (D_ξF)(α) we hold ξ fixed and have the directional derivative D_ξF: A → W in the fixed direction ξ as a function of α, generalizing the notation for an ordinary partial derivative ∂F/∂x_i(α) as a function of α. We can also express this implication of the subscript position of a variable in the dot notation (Section 0.10): when we write D_ξF(α), we are thinking of the value at α of the function D_ξF(·).

Still a third notation that we shall use in later chapters puts the function symbol in subscript position. We write J_F(α) = dF_α. This notation implies that the mapping F is going to be fixed through a discussion and gets it "out of the way" by putting it in subscript position.

If F is differentiable at each point of the open set A, then we naturally consider dF to be the map α ↦ dF_α from A to Hom(V, W). In the "J"-notation, dF = J_F. Later in this chapter we are going to consider the differentiability of this map at α. This notion of the second differential d²F_α = d(dF)_α is probably confusing at first sight, and a preliminary look at it now may ease the later discussion. We simply have a new map G = dF from an open set A in a normed linear space V to a normed linear space X = Hom(V, W), and we consider its differentiability at α. If dG_α = d(dF)_α exists, it is a linear map from V to Hom(V, W), and there is something special now. Referring back to Theorem 6.1 of Chapter 1, we know that dG_α = d²F_α is equivalent by duality to a bilinear mapping ω from V × V to W: since dG_α(ξ) is itself a transformation in Hom(V, W), we can evaluate it at η, and we define ω by

    ω(ξ, η) = dG_α(ξ)(η) = (d²F_α(ξ))(η).

The dot notation may be helpful here. The mapping α ↦ dF_α is simply dF(·), and we have defined G by G(·) = dF(·). Later, the fact that dG_α(ξ) is a mapping can be emphasized by writing it as dG_α(ξ)(·). In each case here we have a function of one variable, and the dot only reminds us of that fact and shows us where we shall put the variable when indicating an evaluation. In the case of ω we have the original use of the dot, as in ω(ξ, ·) = dG_α(ξ).

EXERCISES

7.1 Given f: ℝ → ℝ such that f′(a) exists, show that the "directional derivative" D_bf(a) has the value bf′(a), by a direct evaluation of the limit of the difference quotient.

7.2 Let f be a real-valued function on an n-dimensional space V, and suppose that f is differentiable at α ∈ V. Show that the directions ξ in which the derivative D_ξf(α) is zero make up an (n − 1)-dimensional subspace of V (or the whole of V). What similar conclusions can be drawn if f maps V to a two-dimensional space W?
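Before the remaining exercises, a numeric companion (ours) to the homogeneous counterexample of this section: for F(x, y) = x³/(x² + y²) every directional derivative at 0 exists and equals F(ξ), but ξ ↦ F(ξ) is not additive, so these derivatives cannot come from a differential dF_0 (compare Exercise 7.12 below):

    # F is homogeneous, so D_xi F(0) = F(xi); check that the map xi -> F(xi)
    # fails additivity, ruling out a (linear) differential at 0.
    def F(x, y):
        return 0.0 if x == 0.0 and y == 0.0 else x ** 3 / (x ** 2 + y ** 2)

    def dir_deriv(xi, t=1e-8):
        return (F(t * xi[0], t * xi[1]) - F(0.0, 0.0)) / t

    print(dir_deriv((1.0, 0.0)))     # 1.0 = F(1, 0)
    print(dir_deriv((0.0, 1.0)))     # 0.0 = F(0, 1)
    print(dir_deriv((1.0, 1.0)))     # 0.5 = F(1, 1), not 1.0 + 0.0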
7.3 a) Show by a direct argument on limits that if f and g are two functions from an interval I ⊂ ℝ to a normed linear space V, and if f′(x) and g′(x) both exist, then (f + g)′(x) exists and (f + g)′(x) = f′(x) + g′(x).
b) Prove the same result as a corollary of Theorems 7.1 and 7.2 and the differentiation rules of Section 6.

7.4 a) Given f: I → V and g: I → W, show by a direct limit argument that if f′(x) and g′(x) both exist, and if F = ⟨f, g⟩: I → V × W, then F′(x) exists and F′(x) = ⟨f′(x), g′(x)⟩.
b) Prove the same result from Theorems 7.1 and 7.2 and the differentiation rules of Section 6, using the exact relation F = θ_1 ∘ f + θ_2 ∘ g.

7.5 In the spirit of the above two exercises, state a product law for derivatives of arcs and prove it as in the (b) proofs above.

7.6 Find the tangent vector to the arc ⟨e^t, sin t⟩ at t = 0; at t = π/2. [Apply Exercise 7.4(a).] What is the differential of the above parametrized arc at these two points? That is, if f(t) = ⟨e^t, sin t⟩, what are df_0 and df_{π/2}?

7.7 Let F: ℝ² → ℝ² be the mapping ⟨x, y⟩ ↦ ⟨3x²y, x²y³⟩. Compute the directional derivative D_{⟨1,2⟩}F(3, −1)
a) as the tangent vector at ⟨3, −1⟩ to the arc F ∘ λ, where λ is the straight line through ⟨3, −1⟩ in the direction ⟨1, 2⟩;
b) by first computing dF_{⟨3,−1⟩} and then evaluating at ⟨1, 2⟩.

7.8 Let λ and μ be any two linear functionals on a vector space V. Evaluate the product f(ξ) = λ(ξ)μ(ξ) along the line ξ = tα, and hence compute D_αf(α). Now evaluate f along the general line ξ = tα + β, and from it compute D_αf(β).

7.9 Work the above exercise by computing differentials.

7.10 If f: ℝⁿ → ℝ is differentiable at a, we know that its differential df_a, being a linear functional on ℝⁿ, is given by its skeleton n-tuple l according to the formula

    df_a(x) = (l, x) = Σ_1^n l_i x_i.

In this context we call the n-tuple l the gradient of f at a. Show from the Schwarz inequality (Exercise 2.3) that if we use vectors y of Euclidean length 1, then the directional derivative D_yf(a) is maximum when y points in the direction of the gradient of f.

7.11 Let W be a normed linear space, and let V be the set of parametrized arcs λ: [−1, 1] → W such that λ(0) = 0 and λ′(0) exists. Show that V is a vector space and that λ ↦ λ′(0) is a surjective linear mapping from V to W. Describe in words the elements of the quotient space V/N, where N is the null space of the above map.

7.12 Find another homogeneous nonlinear function. Evaluate its directional derivatives D_ξF(0), and show again that they do not make up a linear map.

7.13 Prove that if F is a differentiable mapping from an open ball B of a normed linear space V to a normed linear space W such that dF_α = 0 for every α in B, then F is a constant function.

7.14 Generalize the above exercise to the case where the domain of F is an open set A with the property that any two points of A can be joined by a smooth arc lying in A.
Show by a counterexample that the result does not generalize to arbitrary open sets A as the domain of F.

7.15 Prove the following generalization of the mean-value theorem. Let f be a continuous mapping from the closed interval [a, b] to a normed linear space V, and let g be a continuous real-valued function on [a, b]. Suppose that f′(t) and g′(t) both exist at all points of the open interval (a, b) and that ‖f′(t)‖ ≤ g′(t) on (a, b). Then ‖f(b) − f(a)‖ ≤ g(b) − g(a). [Consider the points x such that ‖f(x) − f(a)‖ ≤ g(x) − g(a) + ε(x − a) + ε.]

8. THE DIFFERENTIAL AND PRODUCT SPACES

In this section we shall relate the differentiation rules to the special configurations resulting from the expression of a vector space as a finite Cartesian product. When dealing with the range, this is a trivial consideration, but when the domain is a product space, we become involved with a deeper theorem. These general product considerations will be specialized to the ℝⁿ-spaces in the next section, but they also have a more general usefulness, as we shall see in the later sections of this chapter and in later chapters.

We know that an m-tuple of functions on a common domain, F_i: A → W_i, i = 1, ..., m, is equivalent to a single m-tuple-valued function F: A → W = Π_1^m W_i, F(α) being the m-tuple {F_i(α)}_1^m for each α ∈ A. We now check the obviously necessary fact that F is differentiable at α if and only if each F_i is differentiable at α.

Theorem 8.1. Given F_i: A → W_i, i = 1, ..., m, and F = ⟨F_1, ..., F_m⟩, then F is differentiable at α if and only if all the functions F_i are, in which case dF_α = ⟨d(F_1)_α, ..., d(F_m)_α⟩.

Proof. Strictly speaking, F = Σ_1^m θ_i ∘ F_i, where θ_j is the injection of W_j into the product space W = Π_1^m W_i (see Section 1.3). Since each θ_i is linear and hence differentiable, with d(θ_i)_α = θ_i, we see that if each F_i is differentiable at α, then so is F, and dF_α = Σ_1^m θ_i ∘ d(F_i)_α. Less exactly, this is the statement dF_α = ⟨d(F_1)_α, ..., d(F_m)_α⟩. The converse follows similarly from F_i = π_i ∘ F, where π_j is the projection of Π_1^m W_i onto W_j. □

Theorems 7.1 and 8.1 have the following obvious corollary (which can also be proved as easily by a direct inspection of the limits involved).

Lemma 8.1. If f_i is an arc from [a, b] to W_i, for i = 1, ..., n, and if f is the n-tuple-valued arc f = ⟨f_1, ..., f_n⟩, then f′(x) exists if and only if f_i′(x) exists for each i, in which case f′(x) = ⟨f_1′(x), ..., f_n′(x)⟩.

When the domain space V is a product space Π_1^n V_j the situation is more complicated. A function F(ξ_1, ..., ξ_n) of n vector variables does not decompose
into an equivalent $n$-tuple of functions. Moreover, although its differential $dF_\alpha$ does decompose into an equivalent $n$-tuple of partial differentials $\{dF^i_\alpha\}$, we do not have the simple theorem that $dF_\alpha$ exists if and only if the partial differentials $dF^i_\alpha$ all exist.

Of course, we regard a function $F(\xi_1, \ldots, \xi_n)$ of $n$ vector variables as being a function of the single $n$-tuple variable $\xi = \langle\xi_1, \ldots, \xi_n\rangle$, so that in principle there is nothing new when we consider the differentiability of $F$. However, when we consider a composition $F \circ G$, the inner function $G$ must now be an $n$-tuple-valued function $G = \langle g^1, \ldots, g^n\rangle$, where $g^i$ is from an open subset $A$ of some normed linear space $X$ to $V_i$, and we naturally try to express the differential of $F \circ G$ in terms of the differentials $dg^i$. To accomplish this we need the partial differentials $dF^j_\alpha$ of $F$. For the moment we shall define the $j$th partial differential of $F$ at $\alpha = \langle\alpha_1, \ldots, \alpha_n\rangle$ as the restriction of the differential $dF_\alpha$ to $V_j$, considered as a subspace of $V = \prod_1^n V_i$. As usual, this really involves the injection $\theta_j$ of $V_j$ into $\prod_1^n V_i$, and our formal (temporary) definition, accordingly, is $dF^j_\alpha = dF_\alpha \circ \theta_j$. Then, since $\xi = \langle\xi_1, \ldots, \xi_n\rangle = \sum_1^n \theta_i(\xi_i)$, we have
$$dF_\alpha(\xi) = \sum_1^n dF^i_\alpha(\xi_i).$$
Similarly, since $G = \langle g^1, \ldots, g^n\rangle = \sum_1^n \theta_i \circ g^i$, we have
$$d(F \circ G)_\gamma = \sum_1^n dF^i_{G(\gamma)} \circ dg^i_\gamma,$$
which we shall call the general chain rule. There is ambiguity in the "$i$"-superscripts in this formula: to be more proper we should write $(dF)^i$ and $d(g^i)_\gamma$.

We shall now work around to the real definition of a partial differential. Since
$$\Delta F_\alpha \circ \theta_j = (dF_\alpha + o) \circ \theta_j = dF_\alpha \circ \theta_j + o = dF^j_\alpha + o,$$
we see that $dF^j_\alpha$ can be directly characterized, independently of $dF_\alpha$, as follows: $dF^i_\alpha$ is the unique element $T_i$ of $\mathrm{Hom}(V_i, W)$ such that $\Delta F_\alpha \circ \theta_i = T_i + o$. That is, $dF^i_\alpha$ is the differential at $\alpha_i$ of the function of the one variable $\xi_i$ obtained by holding the other variables in $F(\xi_1, \ldots, \xi_n)$ fixed at the values $\xi_j = \alpha_j$. This is important because in practice it is often such partial differentiability that we come upon as the primary phenomenon. We shall therefore take this direct characterization as our definition of $dF^i_\alpha$, after which our motivating calculation above is the proof of the following lemma.

Lemma 8.2. If $A$ is an open subset of a product space $V = \prod_1^n V_i$, and if $F\colon A \to W$ is differentiable at $\alpha$, then all the partial differentials $dF^i_\alpha$ exist and $dF^i_\alpha = dF_\alpha \circ \theta_i$.
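The decomposition $dF_\alpha(\xi) = \sum_1^n dF^i_\alpha(\xi_i)$ lends itself to a numerical check. The sketch below is an illustration we add here, not part of the text: it uses finite differences in Python/NumPy for a map $F\colon \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}^2$, where the particular function `F`, the base point, and the step `h` are all our own choices.

```python
import numpy as np

# F takes xi = <xi1, xi2> with xi1, xi2 in R^2 and returns a vector in R^2.
def F(xi1, xi2):
    return np.array([xi1[0] * xi2[1] + np.sin(xi1[1]),
                     xi2[0] ** 2 - xi1[0] * xi1[1]])

def jacobian(f, a, h=1e-6):
    """Central-difference Jacobian of f: R^k -> R^m at a."""
    a = np.asarray(a, dtype=float)
    cols = []
    for j in range(a.size):
        e = np.zeros_like(a); e[j] = h
        cols.append((f(a + e) - f(a - e)) / (2 * h))
    return np.column_stack(cols)

alpha1, alpha2 = np.array([1.0, 2.0]), np.array([0.5, -1.0])
xi1, xi2 = np.array([0.3, 0.7]), np.array([-0.2, 0.4])

# The full differential applied to <xi1, xi2>.
full = jacobian(lambda v: F(v[:2], v[2:]), np.concatenate([alpha1, alpha2]))
dF = full @ np.concatenate([xi1, xi2])

# Partial differentials: hold one variable fixed, differentiate in the other.
dF1 = jacobian(lambda v: F(v, alpha2), alpha1) @ xi1
dF2 = jacobian(lambda v: F(alpha1, v), alpha2) @ xi2

print(np.allclose(dF, dF1 + dF2, atol=1e-4))  # True: dF = dF^1 + dF^2
```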
The question then occurs as to whether the existence of all the partial differentials $dF^i_\alpha$ implies the existence of $dF_\alpha$. The answer in general is negative, as we shall see in the next section, but if all the partial differentials $dF^i_\alpha$ exist for each $\alpha$ in an open set $A$ and are continuous functions of $\alpha$, then $F$ is continuously differentiable on $A$. Note that Lemma 8.2 and the projection-injection identities show us what $dF_\alpha$ must be if it exists: $dF^i_\alpha = dF_\alpha \circ \theta_i$ and $\sum \theta_i \circ \pi_i = I$ together imply that $dF_\alpha = \sum dF^i_\alpha \circ \pi_i$.

Theorem 8.2. Let $A$ be an open subset of the normed linear space $V = V_1 \times V_2$, and suppose that $F\colon A \to W$ has continuous partial differentials $dF^1_{\langle\alpha,\beta\rangle}$ and $dF^2_{\langle\alpha,\beta\rangle}$ on $A$. Then $dF_{\langle\alpha,\beta\rangle}$ exists and is continuous on $A$, and
$$dF_{\langle\alpha,\beta\rangle}(\xi, \eta) = dF^1_{\langle\alpha,\beta\rangle}(\xi) + dF^2_{\langle\alpha,\beta\rangle}(\eta).$$

Proof. We shall use the sum norm on $V = V_1 \times V_2$. Given $\epsilon$, we choose $\delta$ so that $\|dF^i_{\langle\mu,\nu\rangle} - dF^i_{\langle\alpha,\beta\rangle}\| < \epsilon$ for every $\langle\mu, \nu\rangle$ in the $\delta$-ball about $\langle\alpha, \beta\rangle$ and for $i = 1, 2$. Setting $G(\xi) = F(\alpha + \xi, \beta + \eta) - dF^1_{\langle\alpha,\beta\rangle}(\xi)$, we have $\|dG_\xi\| = \|dF^1_{\langle\alpha+\xi,\beta+\eta\rangle} - dF^1_{\langle\alpha,\beta\rangle}\| < \epsilon$, and the corollary of Theorem 7.4 implies that
$$\|F(\alpha + \xi, \beta + \eta) - F(\alpha, \beta + \eta) - dF^1_{\langle\alpha,\beta\rangle}(\xi)\| \le \epsilon\|\xi\|$$
when $\|\langle\xi, \eta\rangle\| < \delta$. Arguing similarly with $H(\eta) = F(\alpha, \beta + \eta) - dF^2_{\langle\alpha,\beta\rangle}(\eta)$, we find that
$$\|F(\alpha, \beta + \eta) - F(\alpha, \beta) - dF^2_{\langle\alpha,\beta\rangle}(\eta)\| \le \epsilon\|\eta\|$$
when $\|\langle 0, \eta\rangle\| < \delta$. Combining the two inequalities, we have
$$\|\Delta F_{\langle\alpha,\beta\rangle}(\xi, \eta) - T(\xi, \eta)\| \le \epsilon(\|\xi\| + \|\eta\|) = \epsilon\|\langle\xi, \eta\rangle\|$$
when $\|\langle\xi, \eta\rangle\| < \delta$, where $T = dF^1_{\langle\alpha,\beta\rangle} \circ \pi_1 + dF^2_{\langle\alpha,\beta\rangle} \circ \pi_2$. That is, $\Delta F_{\langle\alpha,\beta\rangle} - T = o$, and so $dF_{\langle\alpha,\beta\rangle}$ exists and equals $T$. □

The theorem for more than two factor spaces is a corollary.

Theorem 8.3. If $A$ is an open subset of $\prod_1^n V_i$ and $F\colon A \to W$ is such that for each $i = 1, \ldots, n$ the partial differential $dF^i_\alpha$ exists for all $\alpha \in A$ and is continuous as a function of $\alpha = \langle\alpha_1, \ldots, \alpha_n\rangle$, then $dF_\alpha$ exists and is continuous on $A$. If $\xi = \langle\xi_1, \ldots, \xi_n\rangle$, then $dF_\alpha(\xi) = \sum_1^n dF^i_\alpha(\xi_i)$.

Proof. The existence and continuity of $dF^1_\alpha$ and $dF^2_\alpha$ imply by the theorem that $dF^1_\alpha \circ \pi_1 + dF^2_\alpha \circ \pi_2$ is the differential of $F$ considered as a function of the first two variables when the others are held fixed. Since it is the sum of continuous
functions, it is itself continuous in $\alpha$, and we can now apply the theorem again to add $dF^3_\alpha$ to this sum partial differential, concluding that $\sum_1^3 dF^i_\alpha \circ \pi_i$ is the partial differential of $F$ on the factor space $V_1 \times V_2 \times V_3$, and so on (which is colloquial for induction). □

As an illustration of the use of these two theorems, we shall deduce the general product rule (although a direct proof based on $\Delta$-estimates is perfectly feasible). A general product is simply a bounded bilinear mapping $\omega\colon X \times Y \to W$, where $X$, $Y$, and $W$ are all normed linear spaces. The boundedness inequality here is $\|\omega(\xi, \eta)\| \le b\|\xi\|\,\|\eta\|$. We first show that $\omega$ is differentiable.

Lemma 8.3. A bounded bilinear mapping $\omega\colon X \times Y \to W$ is everywhere differentiable and
$$d\omega_{\langle\alpha,\beta\rangle}(\xi, \eta) = \omega(\alpha, \eta) + \omega(\xi, \beta).$$

Proof. With $\beta$ held fixed, $g_\beta(\xi) = \omega(\xi, \beta)$ is in $\mathrm{Hom}(X, W)$ and therefore is everywhere differentiable and equal to its own differential. That is, $d\omega^1$ exists and $d\omega^1_{\langle\alpha,\beta\rangle}(\xi) = \omega(\xi, \beta)$. Since $\beta \mapsto g_\beta$ is a bounded linear mapping, $d\omega^1_{\langle\alpha,\beta\rangle} = g_\beta$ is a continuous function of $\langle\alpha, \beta\rangle$. Similarly, $d\omega^2_{\langle\alpha,\beta\rangle}(\eta) = \omega(\alpha, \eta)$, and $d\omega^2$ is continuous. The lemma is now a direct corollary of Theorem 8.2. □

If $\omega(\xi, \eta)$ is thought of as a product of $\xi$ and $\eta$, then the product of two functions $g(\zeta)$ and $h(\zeta)$ is $\omega(g(\zeta), h(\zeta))$, where $g$ is from an open subset $A$ of a normed linear space $V$ to $X$ and $h$ is from $A$ to $Y$. The product rule is now just what would be expected: the differential of the product is the first times the differential of the second plus the second times the differential of the first.

Theorem 8.4. If $g\colon A \to X$ and $h\colon A \to Y$ are differentiable at $\beta$, then so is the product $F(\zeta) = \omega(g(\zeta), h(\zeta))$ and
$$dF_\beta(\zeta) = \omega(g(\beta), dh_\beta(\zeta)) + \omega(dg_\beta(\zeta), h(\beta)).$$

Proof. This is a direct corollary of Theorem 8.1, Lemma 8.3, and the chain rule. □

EXERCISES

8.1 Find the tangent vector to the arc $\langle\sin t, \cos t, t^2\rangle$ at $t = 0$; at $t = \pi/2$. What is the differential of the above parametrized arc at the two given points? That is, if $f(t) = \langle\sin t, \cos t, t^2\rangle$, what are $df_0$ and $df_{\pi/2}$?

8.2 Give the detailed proof of Lemma 8.1.

8.3 The formula
$$dF_\alpha(\xi) = \sum_1^n dF^i_\alpha(\xi_i)$$
is probably obvious in view of the identity $\xi = \sum_1^n \theta_i(\xi_i)$ and the definition of partial differentials, but write out an explicit, detailed proof anyway.

8.4 Let $F$ be a differentiable mapping from an $n$-dimensional vector space $V$ to a finite-dimensional vector space $W$, and define $G\colon V \times W \to W$ by $G(\xi, \eta) = \eta - F(\xi)$. Thus the graph of $F$ in $V \times W$ is the null set of $G$. Show that the null space of $dG_{\langle\alpha,\beta\rangle}$ has dimension $n$ for every $\langle\alpha, \beta\rangle \in V \times W$.

8.5 Let $F(\xi, \eta)$ be a continuously differentiable function defined on a product $A \times B$, where $B$ is a ball and $A$ is an open set. Suppose that $dF^2_{\langle\alpha,\beta\rangle} = 0$ for all $\langle\alpha, \beta\rangle$ in $A \times B$. Prove that $F$ is independent of $\eta$. That is, show that there is a continuously differentiable function $G(\xi)$ defined on $A$ such that $F(\xi, \eta) = G(\xi)$ on $A \times B$.

8.6 By considering a domain in $\mathbb{R}^2$ as indicated at the right, show that there exists a function $f(x, y)$ on an open set $A$ in $\mathbb{R}^2$ such that $\partial f/\partial y = 0$ everywhere and such that $f(x, y)$ is not a function of $x$ alone.

8.7 Let $F(\xi, \eta, \zeta)$ be any function of three vector variables, and for fixed $\gamma$ set $G(\xi, \eta) = F(\xi, \eta, \gamma)$. Prove that the partial differential $dF^1_{\langle\alpha,\beta,\gamma\rangle}$ exists if and only if $dG^1_{\langle\alpha,\beta\rangle}$ exists, in which case they are equal.

8.8 Give a more careful proof of Theorem 8.3. That is, state the inductive hypothesis and show that the theorem follows from it and Theorem 8.2. If you are meticulous in your argument, you will need a form of the above exercise.

8.9 Let $f$ be a differentiable mapping from $\mathbb{R}^2$ to $\mathbb{R}$. Regarding $\mathbb{R}^2$ as $\mathbb{R} \times \mathbb{R}$, show that the two partial differentials of $f$ are simply multiplication by its partial derivatives. Generalize to $n$ dimensions. Show that the above is still true for a map $F$ from $\mathbb{R}^2$ to a general vector space $V$, the partial derivatives now being vectors.

8.10 Give the details of the proof of Theorem 8.4.

9. THE DIFFERENTIAL AND $\mathbb{R}^n$

We shall now apply the results of the last two sections to mappings involving the Cartesian spaces $\mathbb{R}^n$, the bread and butter spaces of finite-dimensional theory. We start with the domain.

Theorem 9.1. If $F$ is a mapping from (an open subset of) $\mathbb{R}^n$ to a normed linear space $W$, then the directional derivative of $F$ in the direction of the $j$th standard basis vector $\delta^j$ is just the partial derivative $\partial F/\partial x_j$, and the $j$th partial differential is multiplication by $\partial F/\partial x_j$: $dF^j_a(h) = h(\partial F/\partial x_j)(a)$. More exactly, if any one of the above three objects exists at $a$, then they all do, with the above relationships.
Proof. We have
$$\frac{\partial F}{\partial x_j}(a) = \lim_{t\to 0}\frac{F(a_1, \ldots, a_j + t, \ldots, a_n) - F(a_1, \ldots, a_j, \ldots, a_n)}{t} = \lim_{t\to 0}\frac{F(a + t\delta^j) - F(a)}{t} = D_{\delta^j}F(a).$$
Moreover, since the restriction of $F$ to $a + \mathbb{R}\delta^j$ is a parametrized arc whose differential at $0$ is by definition the $j$th partial differential of $F$ at $a$ and whose tangent vector at $0$ we have just computed to be $(\partial F/\partial x_j)(a)$, the remainder of the theorem follows from Theorem 7.1. □

Combining this theorem and Theorem 7.2, we obtain the following result.

Theorem 9.2. If $V = \mathbb{R}^n$ and $F$ is differentiable at $a$, then the partial derivatives $(\partial F/\partial x_j)(a)$ all exist and the $n$-tuple of partial derivatives at $a$, $\{(\partial F/\partial x_j)(a)\}_1^n$, is the skeleton of $dF_a$. In particular,
$$D_yF(a) = \sum_1^n y_j\frac{\partial F}{\partial x_j}(a).$$

Proof. Since $dF_a(\delta^i) = D_{\delta^i}F(a) = (\partial F/\partial x_i)(a)$, as we noted above, we have
$$D_yF(a) = dF_a(y) = dF_a\Bigl(\sum_1^n y_i\delta^i\Bigr) = \sum_1^n y_i\,dF_a(\delta^i) = \sum_1^n y_i\frac{\partial F}{\partial x_i}(a).$$
All that we have done here is to display $dF_a$ as the linear combination mapping defined by its skeleton $\{dF_a(\delta^i)\}$ (see Theorem 1.2 of Chapter 1), where $T(\delta^i) = dF_a(\delta^i)$ is now recognized as the partial derivative $(\partial F/\partial x_i)(a)$. □

The above formula shows the barbarism of the classical notation for partial derivatives: note how it comes out if we try to evaluate $dF_a(x)$. The notation $D_{\delta^i}F$ is precise but cumbersome. Other notations are $F_j$ and $D_jF$. Each has its problems, but the second probably minimizes the difficulties. Using it, our formula reads $dF_a(y) = \sum_{j=1}^n y_j D_jF(a)$.

In the opposite direction we have the corresponding specialization of Theorem 8.3.

Theorem 9.3. If $A$ is an open subset of $\mathbb{R}^n$, and if $F$ is a mapping from $A$ to a normed linear space $W$ such that all of the partial derivatives $(\partial F/\partial x_j)(a)$ exist and are continuous on $A$, then $F$ is continuously differentiable on $A$.

Proof. Since the $j$th partial differential of $F$ is simply multiplication by $\partial F/\partial x_j$, we are (by Theorem 9.1) assuming the existence and continuity of all the partial differentials $dF^j$ on $A$. Theorem 9.3 thus becomes a special case of Theorem 8.3. □

Now suppose that the range space of $F$ is also a Cartesian space, so that $F$ is a mapping from an open subset $A$ of $\mathbb{R}^n$ to $\mathbb{R}^m$. Then $dF_a$ is in $\mathrm{Hom}(\mathbb{R}^n, \mathbb{R}^m)$.
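The formula $D_yF(a) = \sum_1^n y_j(\partial F/\partial x_j)(a)$ of Theorem 9.2 can be corroborated numerically. The sketch below is our own addition, with a made-up function $F\colon \mathbb{R}^3 \to \mathbb{R}$ and sample points chosen purely for illustration.

```python
import numpy as np

# Our own example F: R^3 -> R (not from the text).
def F(x):
    return x[0] * np.sin(x[1]) + x[2] ** 2

a = np.array([1.0, 0.5, -2.0])
y = np.array([0.4, -1.3, 2.2])
h = 1e-6

# Directional derivative D_y F(a) as a one-variable difference quotient.
D_y = (F(a + h * y) - F(a - h * y)) / (2 * h)

# Sum of the partials weighted by the components of y (Theorem 9.2).
partials = np.array([(F(a + h * np.eye(3)[j]) - F(a - h * np.eye(3)[j])) / (2 * h)
                     for j in range(3)])
print(np.allclose(D_y, partials @ y, atol=1e-5))  # True
```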
For computational purposes we want to represent linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$ by their matrices, and it is therefore of the utmost importance to find the matrix $\mathbf{t}$ of the differential $T = dF_a$. This matrix is called the Jacobian matrix of $F$ at $a$. The columns of $\mathbf{t}$ form the skeleton of $dF_a$, and we saw above that this skeleton is the $n$-tuple of partial derivatives $(\partial F/\partial x_j)(a)$. If we write the $m$-tuple-valued $F$ loosely as an $m$-tuple of functions, $F = \langle f_1, \ldots, f_m\rangle$, then according to Lemma 8.1, the $j$th column of $\mathbf{t}$ is the $m$-tuple
$$\frac{\partial F}{\partial x_j}(a) = \Bigl\langle\frac{\partial f_1}{\partial x_j}(a), \ldots, \frac{\partial f_m}{\partial x_j}(a)\Bigr\rangle.$$
Thus,

Theorem 9.4. Let $F$ be a mapping from an open subset of $\mathbb{R}^n$ to $\mathbb{R}^m$, and suppose that $F$ is differentiable at $a$. Then the matrix of $dF_a$ (the Jacobian matrix of $F$ at $a$) is given by
$$t_{ij} = \frac{\partial f_i}{\partial x_j}(a).$$
If we use the notation $y_i = f_i(x)$, we have $t_{ij} = (\partial y_i/\partial x_j)(a)$.

If we also have a differentiable map $z = G(y) = \langle g_1(y), \ldots, g_l(y)\rangle$ from an open set $B \subset \mathbb{R}^m$ into $\mathbb{R}^l$, then $dG_b$ has, similarly, the matrix
$$\frac{\partial g_k}{\partial y_i}(b) = \frac{\partial z_k}{\partial y_i}(b).$$
Also, if $B$ contains $b = F(a)$, then the composite-function rule $d(G \circ F)_a = dG_b \circ dF_a$ has the matrix form
$$\Bigl[\frac{\partial z_k}{\partial x_j}\Bigr] = \Bigl[\frac{\partial z_k}{\partial y_i}\Bigr]\Bigl[\frac{\partial y_i}{\partial x_j}\Bigr],$$
or simply
$$\frac{\partial z_k}{\partial x_j} = \sum_{i=1}^m\frac{\partial z_k}{\partial y_i}\frac{\partial y_i}{\partial x_j}.$$
This is the usual form of the chain rule in the calculus. We see that it is merely the expression of the composition of linear maps as matrix multiplication.

We saw in Section 8 that the ordinary derivative $f'(a)$ of a function $f$ of one real variable is the skeleton of the differential $df_a$, and it is perfectly reasonable to generalize this relationship and define the derivative $F'(a)$ of a function $F$ of $n$ real variables to be the skeleton of $dF_a$, so that $F'(a)$ is the $n$-tuple of partial derivatives $\{(\partial F/\partial x_i)(a)\}_1^n$, as we saw above. In particular, if $F$ is from an open subset of $\mathbb{R}^n$ to $\mathbb{R}^m$, then $F'(a)$ is the Jacobian matrix of $F$ at $a$. This gives the
matrix chain rule the standard form
$$(G \circ F)'(a) = G'(F(a))\,F'(a).$$
Some authors use the word 'derivative' for what we have called the differential, but this is a change from the traditional meaning in the one-variable case, and we prefer to maintain the distinction as discussed above: the differential $dF_a$ is the linear map approximating $\Delta F_a$, and the derivative $F'(a)$ must be the matrix of this linear map when the domain and range spaces are Cartesian. However, we shall stay with the language of Jacobians.

Suppose now that $A$ is an open subset of a finite-dimensional vector space $V$ and that $H\colon A \to W$ is differentiable at $\alpha \in A$. Suppose that $W$ is also finite-dimensional and that $\varphi\colon V \to \mathbb{R}^n$ and $\psi\colon W \to \mathbb{R}^m$ are any coordinate isomorphisms. If $\tilde A = \varphi[A]$, then $\tilde A$ is an open subset of $\mathbb{R}^n$ and $\tilde H = \psi \circ H \circ \varphi^{-1}$ is a mapping from $\tilde A$ to $\mathbb{R}^m$ which is differentiable at $\tilde a = \varphi(\alpha)$, with $d\tilde H_{\tilde a} = \psi \circ dH_\alpha \circ \varphi^{-1}$. Then $d\tilde H_{\tilde a}$ is given by its Jacobian matrix $\{(\partial\tilde h_i/\partial x_j)(\tilde a)\}$, which we now call the Jacobian matrix of $H$ with respect to the chosen bases in $V$ and $W$. Change of bases in $V$ and $W$ changes the Jacobian matrix according to the rule given in Section 2.4.

If $F$ is a mapping from $\mathbb{R}^n$ to itself, then the determinant of the Jacobian matrix $(\partial f_i/\partial x_j)(a)$ is called the Jacobian of $F$ at $a$. It is designated
$$\frac{\partial(f_1, \ldots, f_n)}{\partial(x_1, \ldots, x_n)}(a) \quad\text{or}\quad \frac{\partial(y_1, \ldots, y_n)}{\partial(x_1, \ldots, x_n)}(a)$$
if it is understood that $y_i = f_i(x)$. Another notation is $J_F(a)$ (or simply $J(a)$ if $F$ is understood). However, this is sometimes used to indicate the differential $dF_a$, and we shall write $\det J_F(a)$ instead.

If $F(x) = \langle x_1^2 - x_2^2, 2x_1x_2\rangle$, then its Jacobian matrix is
$$\begin{bmatrix} 2x_1 & -2x_2 \\ 2x_2 & 2x_1 \end{bmatrix}.$$

EXERCISES

9.1 By analogy with the notion of a parametrized arc, we define a smooth parametrized two-dimensional surface in a normed linear space $W$ to be a continuously differentiable map $\Gamma$ from a rectangle $I \times J$ in $\mathbb{R}^2$ to $W$. Suppose that $I \times J = [-1, 1] \times [-1, 1]$, and invent a definition of the tangent space to the range of $\Gamma$ in $W$ at the point $\Gamma(0, 0)$. Show that the two vectors $\frac{\partial\Gamma}{\partial x}(0, 0)$ and $\frac{\partial\Gamma}{\partial y}(0, 0)$ are a basis for this tangent space. (This should not have been your definition.)
9.2 Generalize the above exercise to a smooth parametrized $n$-dimensional surface in a normed linear space $W$.

9.3 Compute the Jacobian matrix of the mapping $\langle x, y\rangle \mapsto \langle x^2, y^2, (x+y)^2\rangle$. Show that its rank is two except at the origin.

9.4 Let $F = \langle f_1, f_2, f_3\rangle$ from $\mathbb{R}^3$ to $\mathbb{R}^3$ be defined by $f_1(x, y, z) = x + y + z$, $f_2(x, y, z) = x^2 + y^2 + z^2$, and $f_3(x, y, z) = x^3 + y^3 + z^3$. Compute the Jacobian of $F$ at $\langle a, b, c\rangle$. Show that it is nonsingular unless two of the three coordinates are equal. Describe the locus of its singularities.

9.5 Compute the Jacobian of the mapping $F\colon \langle x, y\rangle \mapsto \langle(x+y)^2, y^3\rangle$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ at $\langle 1, -1\rangle$; at $\langle 1, 0\rangle$; at $\langle a, b\rangle$. Compute the Jacobian of $G\colon \langle s, t\rangle \mapsto \langle s - t, s + t\rangle$ at $\langle u, v\rangle$.

9.6 In the above exercise compute the compositions $F \circ G$ and $G \circ F$. Compute the Jacobian of $F \circ G$ at $\langle u, v\rangle$. Compute the corresponding product of the Jacobians of $F$ and $G$.

9.7 Compute the Jacobian matrix and determinant of the mapping $T$ defined by $x = r\cos\theta$, $y = r\sin\theta$, $z = z$. Composing a function $f(x, y, z)$ with this mapping gives a new function $g(r, \theta, z) = f(r\cos\theta, r\sin\theta, z)$. That is, $g = f \circ T$. This composition (substitution) is called the change to cylindrical coordinates in $\mathbb{R}^3$.

9.8 Compute the Jacobian determinant of the polar coordinate transformation $\langle r, \theta\rangle \mapsto \langle x, y\rangle$, where $x = r\cos\theta$, $y = r\sin\theta$.

9.9 The transformation to spherical coordinates is given by $x = r\sin\varphi\cos\theta$, $y = r\sin\varphi\sin\theta$, $z = r\cos\varphi$. Compute the Jacobian $\partial(x, y, z)/\partial(r, \varphi, \theta)$.

9.10 Write out the chain rule for the following special cases: $dw/dt = {}?$, where $w = F(x, y)$, $x = g(t)$, $y = h(t)$. Find $dw/dt$ when $w = F(x_1, \ldots, x_n)$ and $x_i = g_i(t)$, $i = 1, \ldots, n$. Find $\partial w/\partial u$ when $w = F(x, y)$, $x = g(u, v)$, $y = h(u, v)$. The special case where $g(u, v) = u$ can be rewritten $\frac{\partial}{\partial x}F(x, h(x, v))$. Compute it.

9.11 If $w = f(x, y)$, $x = r\cos\theta$, and $y = r\sin\theta$, show that
$$\Bigl[\frac{\partial w}{\partial r}\Bigr]^2 + \Bigl[\frac{1}{r}\frac{\partial w}{\partial\theta}\Bigr]^2 = \Bigl[\frac{\partial w}{\partial x}\Bigr]^2 + \Bigl[\frac{\partial w}{\partial y}\Bigr]^2.$$
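As a quick numerical companion to Exercises 9.7 and 9.8, the sketch below (our own addition, with a sample point chosen arbitrarily) checks that the determinant of the finite-difference Jacobian of the polar-coordinate map equals $r$.

```python
import numpy as np

# Polar coordinates T(r, theta) = (r cos theta, r sin theta); the
# determinant of its Jacobian should be r (Exercise 9.8).
def T(p):
    r, th = p
    return np.array([r * np.cos(th), r * np.sin(th)])

def jacobian(f, a, h=1e-6):
    a = np.asarray(a, dtype=float)
    return np.column_stack([(f(a + h * e) - f(a - h * e)) / (2 * h)
                            for e in np.eye(a.size)])

p = np.array([2.0, 0.7])   # our own sample point (r, theta)
print(np.isclose(np.linalg.det(jacobian(T, p)), p[0]))  # det = r -> True
```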
10. ELEMENTARY APPLICATIONS

The elementary max-min theory from the standard calculus generalizes with little change, and we include a brief discussion of it at this point.

Theorem 10.1. Let $F$ be a real-valued function defined on an open subset $A$ of a normed linear space $V$, and suppose that $F$ assumes a relative maximum value at a point $\alpha$ in $A$ where $dF_\alpha$ exists. Then $dF_\alpha = 0$.

Proof. By definition $D_\xi F(\alpha)$ is the derivative $\gamma'(0)$ of the function $\gamma(t) = F(\alpha + t\xi)$, and the domain of $\gamma$ is a neighborhood of $0$ in $\mathbb{R}$. Since $\gamma$ has a relative maximum value at $0$, we have $\gamma'(0) = 0$ by the elementary calculus. Thus $dF_\alpha(\xi) = D_\xi F(\alpha) = 0$ for all $\xi$, and so $dF_\alpha = 0$. □

A point $\alpha$ such that $dF_\alpha = 0$ is called a critical point. The theorem states that a differentiable real-valued function can have an interior extremal value only at a critical point. If $V$ is $\mathbb{R}^n$, then the above argument shows that a real-valued function $F$ can have a relative maximum (or minimum) at $a$ only if the partial derivatives $(\partial F/\partial x_i)(a)$ are all zero, and, as in the elementary calculus, this often provides a way of calculating maximum (or minimum) values.

Suppose, for example, that we want to show that the cube is the most efficient rectangular parallelepiped from the point of view of minimizing surface area for a given volume $V$. If the edges are $x$, $y$, and $z$, we have $V = xyz$ and
$$A = 2(xy + xz + yz) = 2(xy + V/y + V/x).$$
Then from $0 = \partial A/\partial x = 2(y - V/x^2)$, we see that $V = yx^2$, and, similarly, $\partial A/\partial y = 0$ implies that $V = xy^2$. Therefore, $yx^2 = xy^2$, and since neither $x$ nor $y$ can be $0$, it follows that $x = y$. Then $V = yx^2 = x^3$, and $x = V^{1/3} = y$. Finally, substituting in $V = xyz$ shows that $z = V^{1/3}$. Our critical configuration is thus a cube, with minimum area $A = 6V^{2/3}$.

It was assumed above that $A$ has an absolute minimum at some point $\langle x, y, z\rangle$. The reader might enjoy showing that $A \to \infty$ if any of $x$, $y$, $z$ tends to $0$ or $\infty$, which implies that the minimum does indeed exist. We shall return to the problem of determining critical points in Sections 12, 15, and 16.

The condition $dF_\alpha = 0$ is necessary but not sufficient for an interior maximum or minimum. The reader will remember a sufficient condition from beginning calculus: If $f'(x) = 0$ and $f''(x) < 0$ ($>0$), then $x$ is a relative maximum (minimum) point for $f$. We shall prove the corresponding general theorem in Section 16. There are more possibilities now; among them we have the analogous sufficient condition that if $dF_\alpha = 0$ and $d^2F_\alpha$ is negative (positive) definite as a quadratic form on $V$, then $\alpha$ is a relative maximum (minimum) point of $F$.

We consider next the notion of a tangent plane to a graph. The calculation of tangent lines to curves and tangent planes to surfaces is ordinarily considered a geometric application of the derivative, and we take this as sufficient justification for considering the general question here.
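Before turning to tangent planes, the box example above can be corroborated numerically. The crude grid search below is our own sanity check, not part of the text; the volume $V = 8$ and the search grid are arbitrary choices.

```python
import numpy as np

# Minimize A(x, y) = 2(xy + V/y + V/x), the surface area with z eliminated.
# The minimizer should be x = y = V**(1/3) with area 6*V**(2/3).
V = 8.0
xs = np.linspace(0.5, 4.0, 400)
X, Y = np.meshgrid(xs, xs)
A = 2 * (X * Y + V / Y + V / X)
i, j = np.unravel_index(np.argmin(A), A.shape)
print(X[i, j], Y[i, j], A[i, j])  # both near 2.0 = 8**(1/3); area near 24
```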
Let $F$ be a mapping from an open subset $A$ of a normed linear space $V$ to a normed linear space $W$. When we view $F$ as a graph in $V \times W$, we think of it as a "surface" $S$ lying "over" the domain $A$, generalizing the geometric interpretation of the graph of a real-valued function of two real variables in $\mathbb{R}^3 = \mathbb{R}^2 \times \mathbb{R}$. The projection $\pi_1\colon V \times W \to V$ projects $S$ "down" onto $A$, $\langle\xi, F(\xi)\rangle \mapsto \xi$, and the mapping $\xi \mapsto \langle\xi, F(\xi)\rangle$ gives the point of $S$ lying "over" $\xi$. Our geometric imagery views $V$ as the plane (subspace) $V \times \{0\}$ in $V \times W$, just as we customarily visualize $\mathbb{R}$ as the real axis $\mathbb{R} \times \{0\}$ in $\mathbb{R}^2$.

We now assume that $F$ is differentiable at $\alpha$. Our preliminary discussion in Section 6 suggested that (the graph of) the linear function $dF_\alpha$ is the tangent plane to (the graph of) the function $\Delta F_\alpha$ in $V \times W$, and that its translate $M$ through $\langle\alpha, F(\alpha)\rangle$ is the tangent plane at $\langle\alpha, F(\alpha)\rangle$ to the surface $S$ that is (the graph of) $F$. The equation of this plane is
$$\eta - F(\alpha) = dF_\alpha(\xi - \alpha),$$
and it is accordingly (the graph of) the affine function $G(\xi) = dF_\alpha(\xi - \alpha) + F(\alpha)$. Now we know that $dF_\alpha$ is the unique $T$ in $\mathrm{Hom}(V, W)$ such that $\Delta F_\alpha(\zeta) = T(\zeta) + o(\zeta)$, and if we set $\zeta = \xi - \alpha$, it is easy to see that this is the same as saying that $G$ is the unique affine map from $V$ to $W$ such that $F(\xi) - G(\xi) = o(\xi - \alpha)$. That is, $M$ is the unique plane over $V$ that "fits" the surface $S$ around $\langle\alpha, F(\alpha)\rangle$ in the sense of $o$-approximation. However, there is one further geometric fact that greatly strengthens our feeling that this really is the tangent plane.

Theorem 10.2. The plane with equation $\eta - F(\alpha) = dF_\alpha(\xi - \alpha)$ is exactly the union of all the straight lines through $\langle\alpha, F(\alpha)\rangle$ in $V \times W$ that are tangent to smooth curves on the surface $S = \mathrm{graph}\,F$ passing through this point. In other words, the vectors in the subspace $dF_\alpha$ of $V \times W$ are exactly the tangent vectors to curves lying in $S$ and passing through $\langle\alpha, F(\alpha)\rangle$.

Proof. This is nearly trivial. If $\langle\xi, \eta\rangle \in dF_\alpha$, then the arc $\gamma(t) = \langle\alpha + t\xi, F(\alpha + t\xi)\rangle$ in $S$ lying over the line $t \mapsto \alpha + t\xi$ in $V$ has $\langle\xi, dF_\alpha(\xi)\rangle = \langle\xi, \eta\rangle$ as its tangent vector at $\langle\alpha, F(\alpha)\rangle$, by Lemma 8.1 and Theorem 8.2. Conversely, if $t \mapsto \langle\lambda(t), F(\lambda(t))\rangle$ is any smooth arc in $S$ passing through $\alpha$, with $\alpha = \lambda(t_0)$, then its tangent vector at $\langle\alpha, F(\alpha)\rangle$ is $\langle\lambda'(t_0), dF_\alpha(\lambda'(t_0))\rangle$, a vector in (the graph of) $dF_\alpha$. □
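The $o$-approximation property of the tangent plane can also be seen numerically. The sketch below is our own addition; it uses the same map treated in the worked example that follows, with the Jacobian hard-coded at the point $a = \langle 1, 2\rangle$ and step sizes chosen arbitrarily.

```python
import numpy as np

# F(x) = ((x1^2 - x2^2)/2, x1*x2) and its Jacobian at a = (1, 2).
def F(x):
    return np.array([(x[0] ** 2 - x[1] ** 2) / 2, x[0] * x[1]])

a = np.array([1.0, 2.0])
J = np.array([[1.0, -2.0], [2.0, 1.0]])

# Affine approximation G(x) = J(x - a) + F(a); the error should vanish
# faster than |x - a| as x -> a (the o-approximation property).
for t in [1e-1, 1e-2, 1e-3]:
    x = a + t * np.array([0.6, -0.8])
    err = np.linalg.norm(F(x) - (J @ (x - a) + F(a)))
    print(t, err / t)   # the ratio err/|x - a| tends to 0 with t
```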
As an example of the general tangent plane discussed above, let $F = \langle f_1, f_2\rangle$ be the map from $\mathbb{R}^2$ to $\mathbb{R}^2$ defined by $f_1(x) = (x_1^2 - x_2^2)/2$, $f_2(x) = x_1x_2$. The graph of $F$ is a surface over $\mathbb{R}^2$ in $\mathbb{R}^4 = \mathbb{R}^2 \times \mathbb{R}^2$. According to our above discussion, the tangent plane at $\langle a, F(a)\rangle$ has the equation $y = dF_a(x - a) + F(a)$. At $a = \langle 1, 2\rangle$ the Jacobian matrix of $dF_a$ is
$$\begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix},$$
and $F(a) = \langle -\tfrac{3}{2}, 2\rangle$. The equation of the tangent plane $M$ at $\langle 1, 2\rangle$ is thus
$$\langle y_1, y_2\rangle = \begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix}\langle x_1 - 1, x_2 - 2\rangle + \langle -\tfrac{3}{2}, 2\rangle.$$
Computing the matrix product, we have the scalar equations
$$y_1 = x_1 - 2x_2 + (-1 + 4 - \tfrac{3}{2}) = x_1 - 2x_2 + \tfrac{3}{2},$$
$$y_2 = 2x_1 + x_2 + (-2 - 2 + 2) = 2x_1 + x_2 - 2.$$
Note that these two equations present the affine space $M$ as the intersection of the hyperplane in $\mathbb{R}^4$ consisting of all $\langle x_1, x_2, y_1, y_2\rangle$ such that $x_1 - 2x_2 - y_1 = -\tfrac{3}{2}$, with the hyperplane having the equation $2x_1 + x_2 - y_2 = 2$.

EXERCISES

10.1 Find the maximum value of $f(x, y, z) = x + y + z$ on the ellipsoid $x^2 + 2y^2 + 3z^2 = 1$.

10.2 Find the maximum value of the linear functional $f(x) = \sum_1^n c_ix_i$ on the unit sphere $\sum_1^n x_i^2 = 1$.

10.3 Find the (minimum) distance between the two lines $x = t\langle\ldots\rangle + \langle\ldots\rangle$ and $y = s\langle 1, 1, 1\rangle + \langle 1, 0, -1\rangle$ in $\mathbb{R}^3$.

10.4 Show that there is a uniquely determined pair of closest points on the two lines $x = ta + l$ and $y = sb + m$ in $\mathbb{R}^n$ unless $b = ka$ for some $k$. We assume that $a \ne 0 \ne b$. Remember that if $b$ is not of the form $ka$, then $|(a, b)| < \|a\|\,\|b\|$, according to the Schwarz inequality.

10.5 Show that the origin is the only critical point of $f(x, y, z) = xy + yz + zx$. Find a line through the origin along which $0$ is a maximum point for $f$, and find another line along which $0$ is a minimum point.
10.6 In the problem of minimizing the area of a rectangular parallelepiped of given volume $V$ worked out in the text, it was assumed that
$$A = 2(xy + V/y + V/x)$$
has an absolute minimum at an interior point of the first quadrant. Prove this. Show first that $A \to \infty$ if $\langle x, y\rangle$ approaches the boundary in any way: $x \to 0$, $x \to \infty$, $y \to 0$, or $y \to \infty$.

10.7 Let $F\colon \mathbb{R}^2 \to \mathbb{R}^2$ be the mapping defined by $\ldots$. Find the equation of the tangent plane in $\mathbb{R}^4$ to the graph of $F$ over the point $a = \langle\pi/4, \pi/4\rangle$.

10.8 Define $F\colon \mathbb{R}^3 \to \mathbb{R}^2$ by $y_1 = \sum_1^3 x_i^2$, $y_2 = \sum_1^3 x_i^3$. Find the equation of the tangent plane to the graph of $F$ in $\mathbb{R}^5$ over $a = \langle 1, 2, -1\rangle$.

10.9 Let $\omega(\xi, \eta)$ be a bounded bilinear mapping from a product normed linear space $V \times W$ to a normed linear space $X$. Show that the equation of the tangent plane to the graph $S$ of $\omega$ in $V \times W \times X$ at the point $\langle\alpha, \beta, \gamma\rangle \in S$ is
$$\zeta = \omega(\xi, \beta) + \omega(\alpha, \eta) - \omega(\alpha, \beta).$$

10.10 Let $F$ be a bounded linear functional on the normed linear space $V$. Show that the equation of the tangent plane to the graph of $F^3$ in $V \times \mathbb{R}$ over the point $\alpha$ can be written in the form
$$y = F^2(\alpha)\bigl(3F(\xi) - 2F(\alpha)\bigr).$$

10.11 Show that if the general equation for a tangent plane given in the text is applied to a mapping $F$ in $\mathrm{Hom}(V, W)$, then it reduces to the equation for $F$ itself [$\eta = F(\xi)$], no matter where the point of tangency. (Naturally!)

10.12 Continuing Exercise 9.1, show that the tangent space to the range of $\Gamma$ in $W$ at $\Gamma(0)$ is the projection on $W$ of the tangent space to the graph of $\Gamma$ in $\mathbb{R}^2 \times W$ at the point $\langle 0, \Gamma(0)\rangle$. Now define the tangent plane to range $\Gamma$ in $W$ at $\Gamma(0)$, and show that it is similarly the projection of the tangent plane to the graph of $\Gamma$.

10.13 Let $F\colon V \to W$ be differentiable at $\alpha$. Show that the range of $dF_\alpha$ is the projection on $W$ of the tangent space to the graph of $F$ in $V \times W$ at the point $\langle\alpha, F(\alpha)\rangle$.

11. THE IMPLICIT-FUNCTION THEOREM

The formula for the Jacobian of a composite map that we obtained in Section 9 is reminiscent of the chain rule for the differential of a composite map that we derived earlier (Section 8). The Jacobian formula involves numbers (partial derivatives) that we multiply and add; the differential chain rule involves linear maps (partial differentials) that we compose and add. (The similarity becomes a full formal analogy if we use block decompositions.) Roughly speaking, the
whole differential calculus goes this way. In the one-variable calculus a differential is a linear map from the one-dimensional space $\mathbb{R}$ to itself, and is therefore multiplication by a number, the derivative. In the many-variable calculus when we decompose with respect to one-dimensional subspaces, we get blocks of such numbers, i.e., Jacobian matrices. When we generalize the whole theory to vector spaces that are not one-dimensional, we get essentially the same formulas but with numbers replaced by linear maps (differentials) and multiplication by composition. Thus the derivative of an inverse function is the reciprocal of the derivative of the function: if $g = f^{-1}$ and $b = f(a)$, then $g'(b) = 1/f'(a)$. The differential of an inverse map is the composition inverse of the differential of the map: if $G = F^{-1}$ and $F(\alpha) = \beta$, then $dG_\beta = (dF_\alpha)^{-1}$.

If the equation $g(x, y) = 0$ defines $y$ implicitly as a function of $x$, $y = f(x)$, we learn to compute $f'(a)$ in the elementary calculus by differentiating the identity $g(x, f(x)) \equiv 0$, and we get
$$\frac{\partial g}{\partial x}(a, b) + \frac{\partial g}{\partial y}(a, b)\,f'(a) = 0,$$
where $b = f(a)$. Hence
$$f'(a) = -\frac{\partial g/\partial x}{\partial g/\partial y}.$$
We shall see below that if $G(\xi, \eta) = 0$ defines $\eta$ as a function of $\xi$, $\eta = F(\xi)$, and if $\beta = F(\alpha)$, then we calculate the differential $dF_\alpha$ by differentiating the identity $G(\xi, F(\xi)) = 0$, and we get a formula formally identical to the above. Finally, in exactly the same way, the so-called auxiliary variable method of solving max-min problems in the elementary calculus has the same formal structure as our later solution of a "constrained" maximum problem by Lagrange multipliers.

In this section we shall consider the existence and differentiability of functions implicitly defined. Suppose that we are given a (vector-valued) function $G(\xi, \eta)$ of two vector variables, and we want to know whether setting $G$ equal to $0$ defines $\eta$ as a function of $\xi$, that is, whether there exists a unique function $F$ such that $G(\xi, F(\xi))$ is identically zero. Supposing that such an "implicitly defined" function $F$ exists and that everything is differentiable, we can try to compute the differential of $F$ at $\alpha$ by differentiating the equation $G(\xi, F(\xi)) = 0$, or $G \circ \langle I, F\rangle = 0$. We get
$$dG^1_{\langle\alpha,\beta\rangle} \circ dI_\alpha + dG^2_{\langle\alpha,\beta\rangle} \circ dF_\alpha = 0,$$
where we have set $\beta = F(\alpha)$. If $dG^2$ is invertible, we can solve for $dF_\alpha$, getting
$$dF_\alpha = -(dG^2_{\langle\alpha,\beta\rangle})^{-1} \circ dG^1_{\langle\alpha,\beta\rangle}.$$
Note that this has the same form as the corresponding expression from the elementary calculus that we reviewed above. If $F$ is uniquely determined, then so is $dF_\alpha$, and the above calculation therefore strongly suggests that we are
going to need the existence of $(dG^2_{\langle\alpha,\beta\rangle})^{-1}$ as a necessary condition for the existence of a uniquely defined implicit function around the point $\langle\alpha, \beta\rangle$. Since $\beta$ is $F(\alpha)$, we also need $G(\alpha, \beta) = 0$. These considerations will lead us to the right theorem, but we shall have to postpone part of its proof to the next chapter. What we can prove here is that if there is an implicitly defined function, then it must be differentiable.

Theorem 11.1. Let $V$, $W$, and $X$ be normed linear spaces, and let $G$ be a mapping from an open subset $A \times B$ of $V \times W$ to $X$. Suppose that $F$ is a continuous mapping from $A$ to $B$ implicitly defined by the equation $G(\xi, \eta) = 0$, that is, satisfying $G(\xi, F(\xi)) = 0$ on $A$. Finally, suppose that $G$ is differentiable at $\langle\alpha, \beta\rangle$, where $\beta = F(\alpha)$, and that $dG^2_{\langle\alpha,\beta\rangle}$ is invertible. Then $F$ is differentiable at $\alpha$ and
$$dF_\alpha = -(dG^2_{\langle\alpha,\beta\rangle})^{-1} \circ dG^1_{\langle\alpha,\beta\rangle}.$$

Proof. Set $\eta = \Delta F_\alpha(\xi)$, so that $G(\alpha + \xi, \beta + \eta) = G(\alpha + \xi, F(\alpha + \xi)) = 0$. Then
$$0 = G(\alpha + \xi, \beta + \eta) - G(\alpha, \beta) = \Delta G_{\langle\alpha,\beta\rangle}(\xi, \eta) = dG_{\langle\alpha,\beta\rangle}(\xi, \eta) + o(\xi, \eta) = dG^1_{\langle\alpha,\beta\rangle}(\xi) + dG^2_{\langle\alpha,\beta\rangle}(\eta) + o(\xi, \eta).$$
Applying $T^{-1}$ to this equation, where $T = dG^2_{\langle\alpha,\beta\rangle}$, and solving for $\eta$, we get
$$\eta = -T^{-1}\bigl(dG^1_{\langle\alpha,\beta\rangle}(\xi)\bigr) + O\bigl(o(\langle\xi, \eta\rangle)\bigr).$$
This equation is of the form $\eta = O(\xi) + o(\langle\xi, \eta\rangle)$, and since $\eta = \Delta F_\alpha(\xi)$ is an infinitesimal function of $\xi$, by the continuity of $F$ at $\alpha$, Lemmas 5.1 and 5.2 imply first that $\eta = O(\xi)$ and then that $\langle\xi, \eta\rangle = O(\xi)$. Thus $O(o(\langle\xi, \eta\rangle)) = o(O(\xi)) = o(\xi)$, and we have
$$\Delta F_\alpha(\xi) = \eta = S(\xi) + o(\xi),$$
where $S = -(dG^2_{\langle\alpha,\beta\rangle})^{-1} \circ dG^1_{\langle\alpha,\beta\rangle}$, an element of $\mathrm{Hom}(V, W)$. Therefore, $F$ is differentiable at $\alpha$ and $dF_\alpha$ has the asserted value. □

We shall show in the next chapter, as an application of the fixed-point theorem, that if $V$, $W$, and $X$ are finite-dimensional, and if $G$ is a continuously differentiable mapping from an open subset $A \times B$ of $V \times W$ to $X$ such that at the point $\langle\alpha, \beta\rangle$ we have both $G(\alpha, \beta) = 0$ and $dG^2_{\langle\alpha,\beta\rangle}$ invertible, then there is a uniquely determined continuous mapping $F$ from a neighborhood $M$ of $\alpha$ to $B$ such that $F(\alpha) = \beta$ and $G(\xi, F(\xi)) = 0$ on $M$. The same theorem is true for the more general class of complete normed linear spaces which we shall study in the next chapter. For these spaces it is also true that if $T^{-1}$ exists, then so does $S^{-1}$ for all $S$ sufficiently close to $T$, and the mapping $S \mapsto S^{-1}$ is continuous. Therefore $dG^2_{\langle\mu,\nu\rangle}$ is invertible for all $\langle\mu, \nu\rangle$ sufficiently close to $\langle\alpha, \beta\rangle$, and the above theorem then implies that $F$ is differentiable on a neighborhood of $\alpha$. Moreover, only continuous mappings are involved in the formula given by the theorem for $dF\colon \mu \mapsto dF_\mu$, and it follows that $F$ is in fact continuously differentiable near $\alpha$. These conclusions constitute the implicit-function theorem, which we now restate.
Theorem 11.2. Let $V$, $W$, and $X$ be finite-dimensional (or, more generally, complete) normed linear spaces, let $A \times B$ be an open subset of $V \times W$, and let $G\colon A \times B \to X$ be continuously differentiable. Suppose that at the point $\langle\alpha, \beta\rangle$ in $A \times B$ we have both $G(\alpha, \beta) = 0$ and $dG^2_{\langle\alpha,\beta\rangle}$ invertible. Then there is a ball $M$ about $\alpha$ and a uniquely defined continuously differentiable mapping $F$ from $M$ to $B$ such that $F(\alpha) = \beta$ and $G(\xi, F(\xi)) = 0$ on $M$.

The so-called inverse-mapping theorem is a special case of the implicit-function theorem.

Theorem 11.3. Let $H$ be a continuously differentiable mapping from an open subset $B$ of a finite-dimensional (or complete) normed linear space $W$ to a normed linear space $V$, and suppose that its differential is invertible at a point $\beta$. Then $H$ itself is invertible near $\beta$. That is, there is a ball $M$ about $\alpha = H(\beta)$ and a uniquely determined continuously differentiable function $F$ from $M$ to $B$ such that $F(\alpha) = \beta$ and $H(F(\xi)) = \xi$ on $M$.

Proof. Set $G(\xi, \eta) = \xi - H(\eta)$. Then $G$ is continuously differentiable from $V \times B$ to $V$ and $dG^2_{\langle\alpha,\beta\rangle} = -dH_\beta$ is invertible. The implicit-function theorem then gives us a ball $M$ about $\alpha$ and a uniquely determined continuously differentiable mapping $F$ from $M$ to $B$ such that $F(\alpha) = \beta$ and $0 = G(\xi, F(\xi)) = \xi - H(F(\xi))$ on $M$. □

The inverse-mapping theorem is often given a slightly different formulation which we state as a corollary.

Corollary. Under the hypotheses of the above theorem there exists an open neighborhood $U$ of $\beta$ such that $H$ is injective on $U$, $N = H[U]$ is open in $V$, and $H^{-1}$ is continuously differentiable on $N$. (See Fig. 3.11.)

Fig. 3.11

Proof. The proof of the corollary is left as an exercise.

In practice we often have to apply the Cartesian formulations of these theorems. The student should certainly be able to write these down, but we shall state them anyway, starting with the simpler inverse-mapping theorem.

Theorem 11.4. Suppose that we are given $n$ continuously differentiable real-valued functions $G_i(y_1, \ldots, y_n)$, $i = 1, \ldots, n$, of $n$ real variables defined on a neighborhood $B$ of a point $b$ in $\mathbb{R}^n$, and suppose that the Jacobian determinant
$$\frac{\partial(G_1, \ldots, G_n)}{\partial(y_1, \ldots, y_n)}(b)$$
is not zero. Then there is a ball $M$ about $a = G(b)$ in $\mathbb{R}^n$ and a uniquely determined $n$-tuple $F = \langle F_1, \ldots, F_n\rangle$ of continuously differentiable real-valued functions defined on $M$ such that $F(a) = b$ and $G_i(F(x)) = x_i$ on $M$ for $i = 1, \ldots, n$. That is,
$$G_i\bigl(F_1(x_1, \ldots, x_n), \ldots, F_n(x_1, \ldots, x_n)\bigr) = x_i$$
for all $x$ in $M$ and for $i = 1, \ldots, n$.

For example, if $x = \langle y_1^3 + y_2^3, y_1^2 + y_2^2\rangle$, then at the point $b = \langle 1, 2\rangle$ we have
$$\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} = \det\begin{bmatrix} 3y_1^2 & 3y_2^2 \\ 2y_1 & 2y_2 \end{bmatrix}_{\langle 1,2\rangle} = \det\begin{bmatrix} 3 & 12 \\ 2 & 4 \end{bmatrix} = -12 \ne 0,$$
and we therefore know without trying to solve explicitly that there is a unique solution for $y$ in terms of $x$ near $x = \langle 1^3 + 2^3, 1^2 + 2^2\rangle = \langle 9, 5\rangle$. The reader would find it virtually impossible to solve for $y$, since he would quickly discover that he had to solve a polynomial equation of degree 6. This clearly shows the power of the theorem: we are guaranteed the existence of a mapping which may be very difficult if not impossible to find explicitly. (However, in the next chapter we shall discover an iterative procedure for approximating the inverse mapping as closely as we want.)

Everything we have said here applies all the more to the implicit-function theorem, which we now state in Cartesian form.

Theorem 11.5. Suppose that we are given $m$ continuously differentiable real-valued functions $G_i(x, y) = G_i(x_1, \ldots, x_n, y_1, \ldots, y_m)$ of $n + m$ real variables defined on an open subset $A \times B$ of $\mathbb{R}^{n+m}$ and an $(n+m)$-tuple $\langle a, b\rangle = \langle a_1, \ldots, a_n, b_1, \ldots, b_m\rangle$ such that $G_i(a, b) = 0$ for $i = 1, \ldots, m$, and such that the Jacobian determinant
$$\frac{\partial(G_1, \ldots, G_m)}{\partial(y_1, \ldots, y_m)}(a, b)$$
is not zero. Then there is a ball $M$ about $a$ in $\mathbb{R}^n$ and a uniquely determined $m$-tuple $F = \langle F_1, \ldots, F_m\rangle$ of continuously differentiable real-valued functions $F_j(x) = F_j(x_1, \ldots, x_n)$ defined on $M$ such that $b = F(a)$ and $G_i(x, F(x)) = 0$ on $M$ for $i = 1, \ldots, m$. That is, $b_i = F_i(a_1, \ldots, a_n)$ for $i = 1, \ldots, m$, and
$$G_i\bigl(x_1, \ldots, x_n; F_1(x_1, \ldots, x_n), \ldots, F_m(x_1, \ldots, x_n)\bigr) = 0$$
for all $x$ in $M$ and for $i = 1, \ldots, m$.

For example, the equations
$$x_1^2 + x_2^2 - y_1^2 - y_2^2 = 0, \qquad x_1^3 - x_2^3 - y_1^3 - y_2^3 = 0$$
can be solved uniquely for $y$ in terms of $x$ near $\langle x, y\rangle = \langle 1, 1, 1, -1\rangle$,
because they hold at that point and because
$$\frac{\partial(G_1, G_2)}{\partial(y_1, y_2)} = \det\begin{bmatrix} -2y_1 & -2y_2 \\ -3y_1^2 & -3y_2^2 \end{bmatrix} = 6(y_1y_2^2 - y_2y_1^2)$$
has the value 12 there. Of course, we mean only that the solution functions exist, not that we can explicitly produce them.

EXERCISES

11.1 Show that $\langle x, y\rangle \mapsto \langle e^x + e^y, e^x + e^{-y}\rangle$ is locally invertible about any point $\langle a, b\rangle$, and compute the Jacobian matrix of the inverse map.

11.2 Show that $\langle u, v\rangle \mapsto \langle e^u + e^v, e^u - e^v\rangle$ is locally invertible about any point $\langle a, b\rangle$ in $\mathbb{R}^2$, by computing the Jacobian matrix. In this case the whole mapping is invertible, with an easily computed inverse. Make this calculation, compute the Jacobian matrix of the inverse map, and verify that the two matrices are inverses at the appropriate points.

11.3 Show that the mapping $\langle x, y, z\rangle \mapsto \langle\sin x, \cos y, e^z\rangle$ from $\mathbb{R}^3$ to $\mathbb{R}^3$ is locally invertible about $\langle 0, \pi/2, 0\rangle$. Show that $\langle x, y, z\rangle \mapsto \langle\sin(x+y+z), \cos(x-y+z), e^{x+y-z}\rangle$ is locally invertible about $\langle\pi/4, -\pi/4, 0\rangle$.

11.4 Express the second map of the above exercise as the composition of two maps, and obtain your answer a second way.

11.5 Let $F\colon \langle x, y\rangle \mapsto \langle u, v\rangle$ be the mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$ defined by $u = x^2 + y^2$, $v = 2xy$. Compute an inverse $G$ of $F$, being careful to give the domain and range of $G$. How many inverse mappings are there? Compute the Jacobian matrices of $F$ at $\langle 1, 2\rangle$ and of $G$ at $\langle 5, 4\rangle$, and show by multiplying them that they are inverse.

11.6 Consider now the mapping $F\colon \langle x, y\rangle \mapsto \langle x^3, y^3\rangle$. Show that $dF_{\langle 0,0\rangle}$ is singular and yet that the mapping has an inverse $G$. What conclusion do we draw about the differentiability of $G$ at the origin?

11.7 Define $F\colon \mathbb{R}^2 \to \mathbb{R}^2$ by $\langle x, y\rangle \mapsto \langle e^x\cos y, e^x\sin y\rangle$. Prove that $F$ is locally invertible about every point.

11.8 Define $F\colon \mathbb{R}^3 \to \mathbb{R}^3$ by $x \mapsto y$, where
$$y_1 = x_1 + x_2^2 + (x_3 - 1)^2, \qquad y_2 = x_1 + x_2 + (x_3^3 - 3x_3), \qquad y_3 = x_1^3 + x_2^2 + x_3.$$
Prove that $x \mapsto y = F(x)$ is locally invertible about $x = \langle 0, 0, 1\rangle$.

11.9 For a function $f\colon \mathbb{R} \to \mathbb{R}$ the proof of local invertibility around a point $a$ where $df_a$ is nonsingular is much simpler than the general case. Show first that the Jacobian matrix of $f$ at $a$ is the number $f'(a)$. We are therefore assuming that $f'(x)$ is continuous in a neighborhood of $a$ and that $f'(a) \ne 0$. Prove that then $f$ is strictly increasing (or decreasing) in an interval about $a$. Now finish the theorem. (See Exercise 1.12.)
11.10 Show that the equations
$$t^2 + x^3 + y^3 + z^3 = 0, \qquad t + x^2 + y^2 + z^2 = 2,$$
have differentiable solutions $x(t)$, $y(t)$, $z(t)$ around $\langle t, x, y, z\rangle = \langle 0, -1, 1, 0\rangle$.

11.11 Show that the equations
$$e^x + e^{2y} + e^{3u} + e^{4v} = 4, \qquad \ldots$$
can be uniquely solved for $u$ and $v$ in terms of $x$ and $y$ around the point $\langle 0, 0, 0, 0\rangle$.

11.12 Let $S$ be the graph of the equation $xz + \sin(xy) + \cos(xz) = 1$ in $\mathbb{R}^3$. Determine whether in the neighborhood of $(0, 1, 1)$ $S$ is the graph of a differentiable function in any of the following forms: $z = f(x, y)$, $x = g(y, z)$, $y = h(x, z)$.

11.13 Given functions $f$ and $g$ from $\mathbb{R}^3$ to $\mathbb{R}$ such that $f(a, b, c) = 0$ and $g(a, b, c) = 0$, write down the condition on the partial derivatives of $f$ and $g$ that guarantees the existence of a unique pair of differentiable functions $y = h(x)$ and $z = k(x)$ satisfying $h(a) = b$, $k(a) = c$, and
$$f(x, y, z) = f(x, h(x), k(x)) = 0, \qquad g(x, y, z) = g(x, h(x), k(x)) = 0$$
around $\langle a, b, c\rangle$.

11.14 Let $G(\xi, \eta, \zeta)$ be a continuously differentiable mapping from $V = \prod_1^3 V_i$ to $W$ such that $dG^3\colon V_3 \to W$ is invertible and $G(\alpha) = G(\alpha_1, \alpha_2, \alpha_3) = 0$. Prove that there exists a uniquely determined function $\zeta = F(\xi, \eta)$ defined around $\langle\alpha_1, \alpha_2\rangle$ in $V_1 \times V_2$ such that $G(\xi, \eta, F(\xi, \eta)) = 0$ and $F(\alpha_1, \alpha_2) = \alpha_3$. Also show that
$$dF^1_{\langle\xi,\eta\rangle} = -\bigl[dG^3_{\langle\xi,\eta,\zeta\rangle}\bigr]^{-1} \circ \bigl[dG^1_{\langle\xi,\eta,\zeta\rangle}\bigr],$$
where $\zeta = F(\xi, \eta)$.

11.15 Let $F(\xi, \eta)$ be a continuously differentiable function from $V \times W$ to $X$, and suppose that $dF^2_{\langle\alpha,\beta\rangle}$ is invertible. Setting $\gamma = F(\alpha, \beta)$, show that there is a product neighborhood $L \times M \times N$ of $\langle\gamma, \alpha, \beta\rangle$ in $X \times V \times W$ and a unique continuously differentiable mapping $G\colon L \times M \to N$ such that on $L \times M$, $F(\xi, G(\zeta, \xi)) = \zeta$.

11.16 Suppose that the equation $g(x, y, z) = 0$ can be solved for $z$ in terms of $x$ and $y$. This means that there is a function $f(x, y)$ such that $g(x, y, f(x, y)) = 0$. Suppose also that everything is differentiable, and compute $\partial z/\partial x$.

11.17 Suppose that the equations $g(x, y, z) = 0$ and $h(x, y, z) = 0$ can be solved for $y$ and $z$ as functions of $x$. Compute $dy/dx$.

11.18 Suppose that $g(x, y, u, v) = 0$ and $h(x, y, u, v) = 0$ can be solved for $u$ and $v$ as functions of $x$ and $y$. Compute $\partial u/\partial x$.

11.19 Compute $dz/dx$ where $x^3 + y^3 + z^3 = 0$ and $x^2 + y^2 + z^2 = 1$.

11.20 If $t^3 + x^3 + y^3 + z^3 = 0$ and $t^2 + x^2 + y^2 + z^2 = 1$, then $\partial z/\partial x$ is ambiguous. We are obviously going to think of two of the variables as functions of the other two,
Also $z$ is going to be dependent and $x$ independent. But is $t$ or $y$ going to be the other independent variable? Compute $\partial z/\partial x$ under each of these assumptions.

11.21 We are given four "physical variables" $p$, $v$, $t$, and $\varphi$ such that each of them is a function of any two of the other three. Show that $\partial t/\partial p$ has two quite different meanings, and make explicit what the relationship between them is by labeling the various functions that are relevant and applying the implicit differentiation process.

11.22 Again the "one-dimensional" case is substantially simpler. Let $G$ be a continuously differentiable mapping from $\mathbb{R}^2$ to $\mathbb{R}$ such that $G(a, b) = 0$ and $(\partial G/\partial y)(a, b) = G_2(a, b) > 0$. Show that there are positive numbers $\epsilon$ and $\delta$ such that for each $c$ in $(a - \delta, a + \delta)$ the function $g(y) = G(c, y)$ is strictly increasing on $[b - \epsilon, b + \epsilon]$ and $G(c, b - \epsilon) < 0 < G(c, b + \epsilon)$. Conclude from the intermediate-value theorem (Exercise 1.13) that there exists a unique function $F\colon (a - \delta, a + \delta) \to (b - \epsilon, b + \epsilon)$ such that $G(x, F(x)) = 0$.

11.23 By applying the same argument used in the above exercise a second time, prove that $F$ is continuous.

11.24 In the inverse-function theorem show that $dF_\alpha = (dH_\beta)^{-1}$. That is, the differential of the inverse of $H$ is the inverse of the differential of $H$. Show this a) by applying the implicit-function theorem; b) by a direct calculation from the identity $H(F(\xi)) = \xi$.

11.25 Again in the context of the inverse-mapping theorem, show that there is a neighborhood $M$ of $\beta$ in $B$ such that $F(H(\eta)) = \eta$ on $M$. (Don't work at this. Just apply the theorem again.)

11.26 We continue in the context of the inverse-mapping theorem. Assume the result (from the next chapter) that if $dH_\beta^{-1}$ exists, then so does $dH_\eta^{-1}$ for $\eta$ sufficiently close to $\beta$. Show that there is an open neighborhood $U$ of $\beta$ in $B$ such that $H$ is injective on $U$, $H[U]$ is an open set $N$ in $V$, and $H^{-1}$ is continuously differentiable on $N$.

11.27 Use Exercise 3.21 to give a direct proof of the existence of a Lipschitz-continuous local inverse in the context of the inverse-mapping theorem. [Hint: Apply Theorem 7.4.]

11.28 A direct proof of the differentiability of an inverse function is simpler than the implicit-function theorem proof. Work out such a proof, modeling your arguments in a general way upon those in Theorem 11.1.

11.29 Prove that the implicit-function theorem can be deduced from the inverse-function theorem as follows. Set $H(\xi, \eta) = \langle\xi, G(\xi, \eta)\rangle$, and show that $dH_{\langle\alpha,\beta\rangle}$ has the block diagram
$$\begin{bmatrix} I & 0 \\ dG^1 & dG^2 \end{bmatrix}.$$
Then show that $dH_{\langle\alpha,\beta\rangle}^{-1}$ exists from the block diagram results of Chapter 1. Apply the inverse-mapping theorem.
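Exercise 11.5 above can also be checked numerically. The sketch below is our own addition; the branch of the inverse chosen near $\langle 5, 4\rangle$ is one of several, and the formula for it is our own computation, not from the text.

```python
import numpy as np

# F(x, y) = (x^2 + y^2, 2xy); its Jacobian at (1, 2) and the Jacobian of a
# local inverse at F(1, 2) = (5, 4) should be matrix inverses of one another.
JF = np.array([[2.0, 4.0],    # row [2x, 2y] at (1, 2)
               [4.0, 2.0]])   # row [2y, 2x] at (1, 2)

# One local inverse near (5, 4): since u + v = (x+y)^2 and u - v = (x-y)^2,
# and here x < y, take x = (s - d)/2, y = (s + d)/2 with s, d as below.
def G(u, v):
    s, d = np.sqrt(u + v), np.sqrt(u - v)
    return np.array([(s - d) / 2, (s + d) / 2])

h = 1e-6
JG = np.column_stack([(G(5 + h, 4) - G(5 - h, 4)) / (2 * h),
                      (G(5, 4 + h) - G(5, 4 - h)) / (2 * h)])
print(np.allclose(JF @ JG, np.eye(2), atol=1e-4))  # True
```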
12. SUBMANIFOLDS AND LAGRANGE MULTIPLIERS

If $V$ and $W$ are finite-dimensional spaces, with dimensions $n$ and $m$, respectively, and if $F$ is a continuous mapping from an open subset $A$ of $V$ to $W$, then (the graph of) $F$ is a subset of $V \times W$ which we visualize as a kind of "$n$-dimensional surface" $S$ spread out over $A$. (See Section 10.) We shall call $F$ an $n$-dimensional patch in $V \times W$. More generally, if $X$ is any $(n+m)$-dimensional vector space, we shall call a subset $S$ an $n$-dimensional patch if there is an isomorphism $\varphi$ from $X$ to a product space $V \times W$ such that $V$ is $n$-dimensional and $\varphi[S]$ is a patch in $V \times W$. That is, $S$ becomes a patch in the above sense when $X$ is considered to be $V \times W$. This means that if $\pi_1$ is the projection of $X = V \times W$ onto $V$, then $\pi_1[S]$ is an open subset $A$ of $V$, and the restriction $\pi_1|_S$ is one-to-one and has a continuous inverse. If $\pi_2$ is the projection on $W$, then $F = \pi_2 \circ (\pi_1|_S)^{-1}$ is the map from $A$ to $W$ whose graph in $V \times W$ is $S$ (when $V \times W$ is identified with $X$).

Now there are important surfaces that aren't such "patch" surfaces. Consider, for instance, the surface of the unit ball in $\mathbb{R}^3$, $S = \{x : \sum_1^3 x_i^2 = 1\}$. $S$ is obviously a two-dimensional surface in $\mathbb{R}^3$ which cannot be expressed as a graph, no matter how we try to express $\mathbb{R}^3$ as a direct sum. However, it should be equally clear that $S$ is the union of overlapping surface patches. If $\alpha$ is any point on $S$, then any sufficiently small neighborhood $N$ of $\alpha$ in $\mathbb{R}^3$ will intersect $S$ in a patch; we take $V$ as the subspace parallel to the tangent plane at $\alpha$ and $W$ as the perpendicular line through $0$. Moreover, this property of $S$ is a completely adequate definition of what we mean by a submanifold. A subset $S$ of an $(n+m)$-dimensional vector space $X$ is an $n$-dimensional submanifold of $X$ if each $\alpha$ on $S$ has a neighborhood $N$ in $X$ whose intersection with $S$ is an $n$-dimensional patch. We say that $S$ is smooth if all these patches $S_\alpha$ are smooth, that is, if the function $F\colon A \to W$ whose graph in $V \times W$ is the patch $S_\alpha$ (when $X$ is viewed as $V \times W$) is continuously differentiable for every such patch $S_\alpha$. The sphere we considered above is a two-dimensional smooth submanifold of $\mathbb{R}^3$.

Submanifolds are frequently presented as zero sets of mappings. For example, our sphere above is the zero set of the mapping $G$ from $\mathbb{R}^3$ to $\mathbb{R}$ defined by $G(x) = \sum_1^3 x_i^2 - 1$. It is obviously important to have a condition guaranteeing that such a null set is a submanifold.

Theorem 12.1. Let $G$ be a continuously differentiable mapping from an open subset $U$ of an $(n+m)$-dimensional vector space $X$ to an $m$-dimensional vector space $Y$ such that $dG_\alpha$ is surjective for every $\alpha$ in the zero set $S$ of $G$. Then $S$ is an $n$-dimensional submanifold of $X$.

Proof. Choose any point $\gamma$ of $S$. Since $dG_\gamma$ is surjective from the $(n+m)$-dimensional vector space $X$ to the $m$-dimensional vector space $Y$, we know that the null space $V$ of $dG_\gamma$ has dimension $n$ (Theorem 2.4, Chapter 2). Let $W$ be any
complement of $V$, and think of $X$ as $V \times W$, so that $G$ now becomes a function of two vector variables and $\gamma$ is a point $\langle\alpha, \beta\rangle$ such that $G(\alpha, \beta) = 0$. The restriction of $dG_{\langle\alpha,\beta\rangle}$ to $W$ is an isomorphism from $W$ to $Y$; that is, $(dG^2_{\langle\alpha,\beta\rangle})^{-1}$ exists. Therefore, by the implicit-function theorem, there is a product neighborhood $S_\delta(\alpha) \times S_r(\beta)$ of $\langle\alpha, \beta\rangle$ in $X$ whose intersection with $S$ is the graph of a function on $S_\delta(\alpha)$. This proves our theorem. □

If $S$ is a smooth submanifold, then the function $F$ whose graph is the patch of $S$ around $\gamma$ (when $X$ is viewed suitably as $V \times W$) is continuously differentiable, and therefore $S$ has a uniquely determined $n$-dimensional tangent plane $M$ at $\gamma$ that fits $S$ most closely around $\gamma$ in the sense of our $o$-approximations. If $\gamma = 0$, this tangent plane is an $n$-dimensional subspace, and in general it is the translate through $\gamma$ of a subspace $N$. We call $N$ the tangent space of $S$ at $\gamma$; its elements are exactly the vectors in $X$ tangent to parametrized arcs drawn in $S$ through $\gamma$.

What we are going to do later is to describe an $n$-dimensional manifold $S$ independently of any imbedding of $S$ in a vector space. The tangent space to $S$ at a point $\gamma$ will still be an invaluable notion, but we are not going to be able to visualize it by an actual tangent plane in a space $X$ carrying $S$. Instead, we will have to construct the vector space tangent to $S$ at $\gamma$ somehow. The clue is provided by Theorem 10.2, which tells us that if $S$ is imbedded as a submanifold in a vector space $X$, then each vector tangent to $S$ at $\gamma$ can be presented as the unique tangent vector at $\gamma$ to some smooth curve lying in $S$. This mapping from the set of smooth curves in $S$ through $\gamma$ to the tangent space at $\gamma$ is not injective; clearly, different curves can be tangent to each other at $\gamma$ and so have the same tangent vector there. Therefore, the object in $S$ that corresponds to a tangent vector at $\gamma$ is an equivalence class of smooth curves through $\gamma$, and this will in fact be our definition of a tangent vector for a general manifold.

The notion of a submanifold allows us to consider in an elegant way a classical "constrained" maximum problem. We are given an open subset $U$ of a finite-dimensional vector space $X$, a differentiable real-valued function $F$ defined on $U$, and a submanifold $S$ lying in $U$. We shall suppose that the submanifold $S$ is the zero set of a continuously differentiable mapping $G$ from $U$ to a vector space $Y$ such that $dG_\gamma$ is surjective for each $\gamma$ on $S$. We wish to consider the problem of maximizing (or minimizing) $F(\gamma)$ when $\gamma$ is "constrained" to lie on $S$. We cannot expect to find such a maximum point $\gamma_0$ by setting $dF_\gamma = 0$ and solving for $\gamma$, because $\gamma_0$ will not be a critical point for $F$. Consider, for example, the function $g(x) = \sum_1^3 x_i^2 - 1$ from $\mathbb{R}^3$ to $\mathbb{R}$ and $F(x) = x_2$. Here the "surface" defined by $g = 0$ is the unit sphere $\sum_1^3 x_i^2 = 1$, and on this sphere $F$ has its maximum value 1 at $\langle 0, 1, 0\rangle$. But $F$ is linear, and so $dF_\gamma = F$ can never be the zero transformation. The device known as Lagrange multipliers shows that we can nevertheless find such constrained critical points by solving $dL_\gamma = 0$ for a suitable function $L$.
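Anticipating the Cartesian form of the multiplier method worked out below, the sphere example can be solved numerically. The sketch is our own addition: the Newton iteration on the combined system and the starting point are arbitrary choices, and with this starting point the iteration should converge to the constrained maximum.

```python
import numpy as np

# Maximize F(x) = x2 subject to g(x) = x1^2 + x2^2 + x3^2 - 1 = 0.
# The Lagrange system dF = c*dg together with g = 0, in the unknowns (x, c).
def system(z):
    x, c = z[:3], z[3]
    return np.array([0 - 2 * c * x[0],
                     1 - 2 * c * x[1],
                     0 - 2 * c * x[2],
                     x @ x - 1])

def jac(z, h=1e-7):
    return np.column_stack([(system(z + h * e) - system(z - h * e)) / (2 * h)
                            for e in np.eye(4)])

z = np.array([0.2, 0.8, 0.1, 0.4])   # rough initial guess near the answer
for _ in range(30):
    z = z - np.linalg.solve(jac(z), system(z))
print(z)   # expected: x = (0, 1, 0), c = 1/2
```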
Theorem 12.2. Suppose that $F$ has a maximum value on $S$ at the point $\gamma$. Then there is a functional $l$ in $Y^*$ such that $\gamma$ is a critical point of the function $F - (l \circ G)$.

Proof. By the implicit-function theorem we can express $X$ as $V \times W$ in such a way that the neighborhood of $S$ around $\gamma$ is the graph of a mapping $H$ from an open set $A$ in $V$ to $W$. Thus, expressing $F$ and $G$ as functions on $V \times W$, we have $G(\xi, \eta) = 0$ near $\gamma = \langle\alpha, \beta\rangle$ if and only if $\eta = H(\xi)$, and the restriction of $F(\xi, \eta)$ to this zero surface is thus the function $K\colon A \to \mathbb{R}$ defined by $K(\xi) = F(\xi, H(\xi))$. By assumption $\alpha$ is a critical point for this function. Thus
$$0 = dK_\alpha = dF^1_{\langle\alpha,\beta\rangle} + dF^2_{\langle\alpha,\beta\rangle} \circ dH_\alpha.$$
Also from the identity $G(\xi, H(\xi)) = 0$, we get
$$0 = dG^1_{\langle\alpha,\beta\rangle} + dG^2_{\langle\alpha,\beta\rangle} \circ dH_\alpha.$$
Since $dG^2_{\langle\alpha,\beta\rangle}$ is invertible, we can solve the second equation for $dH_\alpha$ and substitute in the first, thus getting, dropping the subscripts for simplicity,
$$dF^1 - dF^2 \circ (dG^2)^{-1} \circ dG^1 = 0.$$
Let $l \in Y^*$ be the functional $dF^2 \circ (dG^2)^{-1}$. Then we have $dF^1 = l \circ dG^1$ and, by definition, $dF^2 = l \circ dG^2$. Composing the first equation (on the right) with $\pi_1\colon V \times W \to V$ and the second with $\pi_2$, and adding, we get $dF_{\langle\alpha,\beta\rangle} = l \circ dG_{\langle\alpha,\beta\rangle}$. That is, $d(F - l \circ G)_\gamma = 0$. □

Nothing we have said so far explains the phrase "Lagrange multipliers". This comes out of the Cartesian expression of the theorem, where we have $U$ an open subset of a Cartesian space $\mathbb{R}^n$, $Y = \mathbb{R}^m$, $G = \langle g_1, \ldots, g_m\rangle$, and $l$ in $Y^*$ of the form $l_c\colon l(y) = \sum_1^m c_iy_i$. Then $F - l \circ G = F - \sum_1^m c_ig_i$, and $d(F - l \circ G)_a = 0$ becomes
$$\frac{\partial F}{\partial x_j} - \sum_1^m c_i\frac{\partial g_i}{\partial x_j} = 0, \qquad j = 1, \ldots, n.$$
These $n$ equations together with the $m$ equations $G = \langle g_1, \ldots, g_m\rangle = 0$ give $m + n$ equations in the $m + n$ unknowns $x_1, \ldots, x_n$, $c_1, \ldots, c_m$.

Our original trivial example will show how this works out in practice. We want to maximize $F(x) = x_2$ from $\mathbb{R}^3$ to $\mathbb{R}$ subject to the constraint $\sum_1^3 x_i^2 = 1$. Here $g(x) = \sum_1^3 x_i^2 - 1$ is also from $\mathbb{R}^3$ to $\mathbb{R}$, and our method tells us to look for a critical point of $F - cg$ subject to $g = 0$. Our system of equations is
$$0 - 2cx_1 = 0, \qquad 1 - 2cx_2 = 0, \qquad 0 - 2cx_3 = 0, \qquad \sum_1^3 x_i^2 = 1.$$
The first says that $c = 0$ or $x_1 = 0$, and the second implies that $c$ cannot be $0$. Therefore, $x_1 = x_3 = 0$, and the fourth equation then shows that $x_2 = \pm 1$.

Another example is our problem of minimizing the surface area $A = 2(xy + yz + zx)$ of a rectangular parallelepiped, subject to the constraint of a constant volume, $xyz = V$. The theorem says that the minimum point will be a critical point of $A - \lambda V$ for some $\lambda$, and, setting the differential of this function equal to zero, we get the equations
$$2(y + z) - \lambda yz = 0, \qquad 2(x + z) - \lambda xz = 0, \qquad 2(x + y) - \lambda xy = 0,$$
together with the constraint $xyz = V$. The first three equations imply that $x = y = z$; the last then gives $V^{1/3}$ at the common value.

*13. FUNCTIONAL DEPENDENCE

The question, roughly, is this: If we are given a collection of continuous functions, all defined on some open set $A$, how can we tell whether or not some of them are functions of the rest? For example, if we are given three real-valued continuous functions $f_1$, $f_2$, and $f_3$, how can we tell whether or not some one of them is a function of the other two, say $f_3$ is a function of $f_1$ and $f_2$, which means that there is a function of two variables $g(x, y)$ such that $f_3(t) = g(f_1(t), f_2(t))$ for all $t$ in the common domain $A$? If this happens, we say that $f_3$ is functionally dependent on $f_1$ and $f_2$.

This is very nearly the same as asking when it will be the case that the range $S$ of the mapping $F\colon t \mapsto \langle f_1(t), f_2(t), f_3(t)\rangle$ is a two-dimensional submanifold of $\mathbb{R}^3$. However, there are differences in these questions that are worth noting. If $f_3$ is functionally dependent on $f_1$ and $f_2$, then the range of $F$ certainly lies on a two-dimensional submanifold of $\mathbb{R}^3$, namely, the graph of $g$. But this is no guarantee that it itself forms a two-dimensional submanifold. For example, both $f_2$ and $f_3$ might be functionally dependent on $f_1$, $f_2 = g \circ f_1$ and $f_3 = h \circ f_1$, in which case the range of $F$ lies on the curve $\langle s, g(s), h(s)\rangle$ in $\mathbb{R}^3$, which is a one-dimensional submanifold. In the opposite direction, the range of $F$ can be a two-dimensional submanifold $M$ without $f_3$ being functionally dependent on $f_2$ and $f_1$. All we can conclude in this case is that locally one of the functions $\{f_i\}_1^3$ is a function of the other two, since locally $M$ is a surface patch, in the language of the last section. But if we move a little bit away on the curving surface $M$ to the neighborhood of another point, we may have to solve for a different one of the functions. Nevertheless, if $M = \mathrm{range}\,F$ is a subset of a two-dimensional manifold, it is reasonable to say that the functions $\{f_i\}_1^3$ are functionally dependent, and we are led to examine this more natural notion.
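The rank condition about to be developed can be previewed numerically. The sketch below is our own made-up example, not from the text: when $f_3 = g(f_1, f_2)$, the rows of the Jacobian of $F = \langle f_1, f_2, f_3\rangle$ are dependent, so the rank never reaches 3.

```python
import numpy as np

# f1, f2 are arbitrary smooth functions on R^3; f3 = f1^2 - 3*f2 depends on them.
def F(t):
    f1 = t[0] + t[1] * t[2]
    f2 = np.sin(t[0]) + t[2]
    return np.array([f1, f2, f1 ** 2 - 3 * f2])

def jacobian(f, a, h=1e-6):
    a = np.asarray(a, dtype=float)
    return np.column_stack([(f(a + h * e) - f(a - h * e)) / (2 * h)
                            for e in np.eye(a.size)])

rng = np.random.default_rng(0)
ranks = {int(np.linalg.matrix_rank(jacobian(F, t), tol=1e-4))
         for t in rng.normal(size=(20, 3))}
print(ranks)   # {2}: the rank stays below 3 at every sampled point
```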
If we assume that $F = \langle f_1, f_2, f_3\rangle$ is continuously differentiable and that the rank of $dF_\alpha$ is 3 at some point $\alpha$ in $A$, then the implicit-function theorem implies that $F[A]$ includes a whole ball in $\mathbb{R}^3$ about the point $F(\alpha)$. Thus a necessary condition for $M = \mathrm{range}\,F$ to lie on a two-dimensional submanifold in $\mathbb{R}^3$ is that the rank of $dF_\alpha$ be everywhere less than 3. We shall see, in fact, that if the rank of $dF_\alpha$ is 2 for all $\alpha$, then $M = \mathrm{range}\,F$ is essentially a two-dimensional manifold. (There is still a tiny difficulty that we shall explain later.) Our tools are going to be the implicit-function theorem and the following theorem, which could well have come much earlier, that the rank of $T$ is a "lower semicontinuous" function of $T$.

Theorem 13.1. Let $V$ and $W$ be finite-dimensional vector spaces, normed in some way. Then for any $T$ in $\mathrm{Hom}(V, W)$ there is an $\epsilon$ such that
$$\|S - T\| < \epsilon \implies \mathrm{rank}\,S \ge \mathrm{rank}\,T.$$

Proof. Let $T$ have null space $N$ and range $R$, and let $X$ be any complement of $N$ in $V$. Then the restriction of $T$ to $X$ is an isomorphism to $R$, and hence is bounded below by some positive $m$. (Its inverse from $R$ to $X$ is bounded by some $b$, by Theorem 4.2, and we set $m = 1/b$.) Then if $\|S - T\| < m/2$, it follows that $S$ is bounded below on $X$ by $m/2$, for the inequalities $\|T(\alpha)\| \ge m\|\alpha\|$ and $\|(S - T)(\alpha)\| \le (m/2)\|\alpha\|$ together imply that $\|S(\alpha)\| \ge (m/2)\|\alpha\|$. In particular, $S$ is injective on $X$, and so
$$\mathrm{rank}\,S = d(\mathrm{range}\,S) \ge d(X) = d(R) = \mathrm{rank}\,T. \;\square$$

We can now prove the general local theorem.

Theorem 13.2. Let $V$ and $W$ be finite-dimensional spaces, let $r$ be an integer less than the dimension of $W$, and let $F$ be a continuously differentiable map from an open subset $A \subset V$ to $W$ such that the rank of $dF_\gamma = r$ for all $\gamma$ in $A$. Then each point $\gamma$ in $A$ has a neighborhood $U$ such that $F[U]$ is an $r$-dimensional patch submanifold of $W$.

Proof. For a fixed $\gamma$ in $A$ let $V_1$ and $Y$ be the null space and range of $dF_\gamma$, let $V_2$ be a complement of $V_1$ in $V$, and view $V$ as $V_1 \times V_2$. Then $F$ becomes a function $F(\xi, \eta)$ of two variables, and if $\gamma = \langle\alpha, \beta\rangle$, then $dF^2_{\langle\alpha,\beta\rangle}$ is an isomorphism from $V_2$ to $Y$. At this point we can already choose the decomposition $W = W_1 \oplus W_2$ with respect to which $F[A]$ is going to be a graph (locally). We simply choose any direct sum decomposition $W = W_1 \oplus W_2$ such that $W_2$ is a complement of $Y = \mathrm{range}\,dF_{\langle\alpha,\beta\rangle}$. Thus $W_1$ might be $Y$, but it doesn't have to be. Let $P$ be the projection of $W$ onto $W_1$ along $W_2$. Since $Y$ is a complement of the null space of $P$, we know that $P|_Y$ is an isomorphism from $Y$ to $W_1$. In particular, $W_1$ is $r$-dimensional, and $\mathrm{rank}\,P \circ dF_{\langle\alpha,\beta\rangle} = r$.
We can now prove the general local theorem.

Theorem 13.2. Let $V$ and $W$ be finite-dimensional spaces, let $r$ be an integer less than the dimension of $W$, and let $F$ be a continuously differentiable map from an open subset $A \subset V$ to $W$ such that the rank of $dF_\gamma$ is $r$ for all $\gamma$ in $A$. Then each point $\gamma$ in $A$ has a neighborhood $U$ such that $F[U]$ is an $r$-dimensional patch submanifold of $W$.

Proof. For a fixed $\gamma$ in $A$ let $V_1$ and $Y$ be the null space and range of $dF_\gamma$, let $V_2$ be a complement of $V_1$ in $V$, and view $V$ as $V_1 \times V_2$. Then $F$ becomes a function $F(\xi, \eta)$ of two variables, and if $\gamma = \langle\alpha, \beta\rangle$, then $dF^2_{\langle\alpha,\beta\rangle}$ is an isomorphism from $V_2$ to $Y$. At this point we can already choose the decomposition $W = W_1 \oplus W_2$ with respect to which $F[A]$ is going to be a graph (locally). We simply choose any direct sum decomposition $W = W_1 \oplus W_2$ such that $W_2$ is a complement of $Y = \operatorname{range} dF_{\langle\alpha,\beta\rangle}$. Thus $W_1$ might be $Y$, but it doesn't have to be. Let $P$ be the projection of $W$ onto $W_1$ along $W_2$. Since $Y$ is a complement of the null space of $P$, we know that $P \restriction Y$ is an isomorphism from $Y$ to $W_1$. In particular, $W_1$ is $r$-dimensional, and $\operatorname{rank} P \circ dF_{\langle\alpha,\beta\rangle} = r$.

Moreover, and this is crucial, $P$ is an isomorphism from the range of $dF_{\langle\xi,\eta\rangle}$ to $W_1$ for all $\langle\xi, \eta\rangle$ sufficiently close to $\langle\alpha, \beta\rangle$. For the above rank theorem implies that $\operatorname{rank} P \circ dF_{\langle\xi,\eta\rangle} \ge \operatorname{rank} P \circ dF_{\langle\alpha,\beta\rangle} = r$ on some neighborhood of $\langle\alpha, \beta\rangle$. On the other hand, the range of $P \circ dF_{\langle\xi,\eta\rangle}$ is included in the range of $P$, which is $W_1$, and so $\operatorname{rank} P \circ dF_{\langle\xi,\eta\rangle} \le r$. Thus $\operatorname{rank} P \circ dF_{\langle\xi,\eta\rangle} = r$ for $\langle\xi, \eta\rangle$ near $\langle\alpha, \beta\rangle$, and since $\operatorname{rank} dF_{\langle\xi,\eta\rangle} = r$ by hypothesis, we see that $P$ is an isomorphism on the range of any such $dF_{\langle\xi,\eta\rangle}$.

Now define $H\colon W_1 \times A \to W_1$ as the mapping $\langle\zeta, \xi, \eta\rangle \mapsto P \circ F(\xi, \eta) - \zeta$. If $\mu = P \circ F(\alpha, \beta)$, then $dH^3_{\langle\mu,\alpha,\beta\rangle} = P \circ dF^2_{\langle\alpha,\beta\rangle}$, which is an isomorphism from $V_2$ to $W_1$. Therefore, by the implicit-function theorem there exists a neighborhood $L \times M \times N$ of $\langle\mu, \alpha, \beta\rangle$ and a uniquely determined continuously differentiable mapping $G$ from $L \times M$ to $N$ such that $H(\zeta, \xi, G(\zeta, \xi)) = 0$ on $L \times M$. That is, $\zeta = P \circ F(\xi, G(\zeta, \xi))$ on $L \times M$.

The remainder of our argument consists in showing that $F(\xi, G(\zeta, \xi))$ is a function of $\zeta$ alone. We start by differentiating the above equation with respect to $\xi$, getting

$$0 = P \circ (dF^1 + dF^2 \circ dG^2) = P \circ dF \circ \langle I, dG^2\rangle.$$

As noted above, $P$ is an isomorphism on the range of $dF_{\langle\xi,\eta\rangle}$ for all $\langle\xi, \eta\rangle$ sufficiently close to $\langle\alpha, \beta\rangle$, and if we suppose that $L \times M$ is also taken small enough so that this holds, then the above equation implies that $dF_{\langle\xi,\eta\rangle} \circ \langle I, dG^2\rangle = 0$ for all $\langle\zeta, \xi\rangle \in L \times M$. But this is just the statement that the partial differential with respect to $\xi$ of $F(\xi, G(\zeta, \xi))$ is identically 0, and hence that $F(\xi, G(\zeta, \xi))$ is a continuously differentiable function $K$ of $\zeta$ alone: $F(\xi, G(\zeta, \xi)) = K(\zeta)$.

Since $\eta = G(\zeta, \xi)$ and $\zeta = P \circ F(\xi, \eta)$, we thus have $F(\xi, \eta) = K(P \circ F(\xi, \eta))$, or $F = K \circ P \circ F$, and this holds on the open set $U$ consisting of those points $\langle\xi, \eta\rangle$ in $M \times N$ such that $P \circ F(\xi, \eta) \in L$. If we think of $W$ as $W_1 \times W_2$, then $F$ and $K$ are ordered pairs of functions, $F = \langle F^1, F^2\rangle$ and $K = \langle l, k\rangle$, $P$ is the mapping $\langle\zeta, \nu\rangle \mapsto \zeta$, and the second component of the above equation is $F^2 = k \circ F^1$.
Since $F^1[U] = P \circ F[U] = L$, the above equation says that $F[U]$ is the graph of the mapping $k$ from $L$ to $W_2$. Moreover, $L$ is an open subset of the $r$-dimensional vector space $W_1$, and therefore $F[U]$ is an $r$-dimensional patch manifold in $W = W_1 \times W_2$. □

The above theorem includes the answer to our original question about functional dependence.

Corollary. Let $F = \{f^i\}_1^m$ be an $m$-tuple of continuously differentiable real-valued functions defined on an open subset $A$ of a normed linear space $V$, and suppose that the rank of $dF_\alpha$ has the constant value $r$ on $A$, where $r$ is less than $m$. Then any point $\gamma$ in $A$ has a neighborhood $U$ over which $m - r$ of the functions are functionally dependent on the remaining $r$.

Proof. By hypothesis the range $Y$ of $dF_\gamma = \langle df^1_\gamma, \ldots, df^m_\gamma\rangle$ is an $r$-dimensional subspace of $\mathbb{R}^m$. We can therefore find a basis for a complementary subspace $W_2$ by choosing $m - r$ of the standard basis elements $\{\delta^i\}$, and we may as well renumber the functions $f^i$ so that these are $\delta^{r+1}, \ldots, \delta^m$. Then the projection $P$ of $\mathbb{R}^m$ onto $W_1 = L(\delta^1, \ldots, \delta^r)$ is an isomorphism from $Y$ to $W_1$ (since $Y$ is a complement of its null space), and by the theorem there is a neighborhood $U$ of $\gamma$ over which $(I - P) \circ F$ is a function $k$ of $P \circ F$. But this says exactly that $\langle f^{r+1}, \ldots, f^m\rangle = k \circ \langle f^1, \ldots, f^r\rangle$. That is, $k$ is an $(m - r)$-tuple-valued function, $k = \langle k^{r+1}, \ldots, k^m\rangle$, and $f^j = k^j \circ \langle f^1, \ldots, f^r\rangle$ for $j = r + 1, \ldots, m$. □
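The corollary also has a computational face: if one of the functions is functionally dependent on the others, the Jacobian matrix of $F$ must have rank less than $m$ everywhere. A minimal sketch (our own invented example, assuming SymPy), with $f_3 = f_1^2 + \sin f_2$:

```python
import sympy as sp

t1, t2, t3 = sp.symbols('t1 t2 t3')
f1 = t1 + t2 + t3
f2 = t1*t2 + t3**2
f3 = f1**2 + sp.sin(f2)       # functionally dependent: f3 = g(f1, f2)

# grad f3 = 2*f1*grad f1 + cos(f2)*grad f2, so the three gradients
# are everywhere linearly dependent and the Jacobian is singular.
J = sp.Matrix([f1, f2, f3]).jacobian([t1, t2, t3])
print(sp.simplify(J.det()))   # 0 identically: rank dF < 3 everywhere
```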
[Fig. 3.12]

We mentioned earlier in the section that there was a difficulty in concluding that if $F$ is a continuously differentiable map from an open subset $A$ of $V$ to $W$ whose differential has constant rank $r$ less than $d(W)$, then $S = \operatorname{range} F$ is an $r$-dimensional submanifold of $W$. The flaw can be described as follows. The definition of a submanifold $S$ of $X$ required that each point of $S$ have a neighborhood in $X$ whose intersection with $S$ is a patch. In the case before us, what we can conclude is that if $q$ is a point of $S$, then $q = F(\alpha)$ for some $\alpha$ in $A$, and $\alpha$ has a neighborhood $U$ whose image under $F$ is a patch. But this image may not be a full neighborhood of $q$ in $S$, because $S$ may curve back on itself in such a way as to intrude into every neighborhood of $q$. Consider, for example, the one-dimensional curve $\Gamma$ imbedded in $\mathbb{R}^3$ suggested by Fig. 3.12. The curve begins in the $xz$-plane along the $z$-axis, curves over, and when it comes to the $xy$-plane it starts spiraling in to the origin in the $xy$-plane (the point of changeover from the $xz$-plane to the $xy$-plane is a singularity that we could smooth out). The origin is not a point having a neighborhood in $\mathbb{R}^3$ whose intersection with $\Gamma$ is a patch, but the full curve is the image of $(-1, 1)$ under a continuously differentiable injection. We would consider $\Gamma$ to be a one-dimensional manifold without any difficulty, but something has gone wrong with its imbedding in $\mathbb{R}^3$, so it is not a one-dimensional submanifold of $\mathbb{R}^3$.

*14. UNIFORM CONTINUITY AND FUNCTION-VALUED MAPPINGS

In the next chapter we shall see that a continuous function $F$ whose domain is a bounded closed subset of a finite-dimensional vector space $V$ is necessarily uniformly continuous. This means that given $\epsilon$, there is a $\delta$ such that

$$\|\xi - \eta\| < \delta \implies \|F(\xi) - F(\eta)\| < \epsilon$$

for all vectors $\xi$ and $\eta$ in the domain of $F$. The point is that $\delta$ depends only on $\epsilon$ and not, as in ordinary continuity, on the "anchor" point at which continuity is being asserted. This is a very important property. In this section we shall see that it underlies a class of theorems in which a point map is escalated to a function-valued map, and properties of the point map imply corresponding properties of the function-valued map. Such theorems have powerful applications, as we shall see in Section 15 and in Section 1 of Chapter 6. An application that we shall get immediately here is the theorem on differentiation under the integral sign. However, it is only Theorem 14.3 that will be used later in the book.

Suppose first that $F(\xi, \eta)$ is a bounded continuous function from a product open set $M \times N$ to a normed linear space $X$. Holding $\eta$ fixed, we have a function $f_\eta(\xi) = F(\xi, \eta)$ which is a bounded continuous function on $M$, that is, an element of the normed linear space $Y = \mathcal{BC}(M, X)$ of all bounded continuous maps from $M$ to $X$. This function is also indicated $F(\cdot, \eta)$, so that $f_\eta = F(\cdot, \eta)$. We are supposing that the uniform norm is being used on $Y$:

$$\|f_\eta\| = \operatorname{lub}\{\|f_\eta(\xi)\| : \xi \in M\} = \operatorname{lub}\{\|F(\xi, \eta)\| : \xi \in M\}.$$

Theorem 14.1. In the above context, if $F$ is uniformly continuous, then the mapping $\eta \mapsto f_\eta$ (or $\eta \mapsto F(\cdot, \eta)$) is continuous, in fact, uniformly continuous, from $N$ to $Y$.
Proof. Given $\epsilon$, choose $\delta$ so that $\|\langle\xi, \eta\rangle - \langle\mu, \nu\rangle\| < \delta \Rightarrow \|F(\xi, \eta) - F(\mu, \nu)\| < \epsilon$. Taking $\mu = \xi$ and rewriting, we have

$$\|\eta - \nu\| < \delta \implies \|f_\eta(\xi) - f_\nu(\xi)\| < \epsilon \quad\text{for all } \xi.$$

Thus

$$\|\eta - \nu\| < \delta \implies \|f_\eta - f_\nu\|_\infty \le \epsilon. \;\square$$

We have proved that if a function of two variables is uniformly continuous, then the mappings obtained from it by the general duality principle are continuous. This phenomenon lies behind many well-known facts. For example:

Corollary. If $F(x, y)$ is a uniformly continuous real-valued function on the unit square $[0, 1] \times [0, 1]$ in $\mathbb{R}^2$, then $\int_0^1 F(x, y)\,dx$ is a continuous function of $y$.

Proof. The mapping $y \mapsto \int_0^1 F(x, y)\,dx$ is the composition of the bounded linear mapping $f \mapsto \int_0^1 f$ from $\mathcal{C}([0, 1])$ to $\mathbb{R}$ with the continuous mapping $y \mapsto F(\cdot, y)$ from $[0, 1]$ to $\mathcal{C}([0, 1])$, and is continuous as the composition of continuous mappings. □

We consider next the differentiability of the above duality-induced mapping.

Theorem 14.2. If $F$ is a bounded continuous mapping from an open product set $M \times N$ of a normed linear space $V \times W$ to a normed linear space $X$, and if $dF^2_{\langle\alpha,\beta\rangle}$ exists and is a bounded uniformly continuous function of $\langle\alpha, \beta\rangle$ on $M \times N$, then $\varphi\colon \eta \mapsto F(\cdot, \eta)$ is a differentiable mapping from $N$ to $Y = \mathcal{BC}(M, X)$, and $[d\varphi_\beta(\eta)](\xi) = dF^2_{\langle\xi,\beta\rangle}(\eta)$.

Proof. Given $\epsilon$, we choose $\delta$ by the uniform continuity of $dF^2$, so that $\|\mu - \nu\| < \delta \Rightarrow \|dF^2_{\langle\xi,\mu\rangle} - dF^2_{\langle\xi,\nu\rangle}\| < \epsilon$ for all $\xi \in M$. The corollary to Theorem 7.4 then implies that

$$\|\Delta F^2_{\langle\xi,\beta\rangle}(\eta) - dF^2_{\langle\xi,\beta\rangle}(\eta)\| \le \epsilon\|\eta\|$$

for all $\xi \in M$, all $\beta \in N$, and all $\eta$ such that the line segment from $\beta$ to $\beta + \eta$ is in $N$. We fix $\beta$ and rewrite the left member of the above inequality. This is the heart of the proof. First,

$$\Delta F^2_{\langle\xi,\beta\rangle}(\eta) = F(\xi, \beta + \eta) - F(\xi, \beta) = [f_{\beta+\eta} - f_\beta](\xi) = [\varphi(\beta + \eta) - \varphi(\beta)](\xi) = [\Delta\varphi_\beta(\eta)](\xi).$$

Next we can check that if $\|dF^2_{\langle\mu,\nu\rangle}\| \le b$ for $\langle\mu, \nu\rangle \in M \times N$, then the mapping $T$ defined by the formula $[T(\eta)](\xi) = dF^2_{\langle\xi,\beta\rangle}(\eta)$ is an element of $\operatorname{Hom}(W, Y)$ of norm at most $b$. We leave the detailed verification of this as an
exercise for the reader. The last displayed inequality now takes the form

$$\|\eta\| < \delta \implies \|[\Delta\varphi_\beta(\eta) - T(\eta)](\xi)\| \le \epsilon\|\eta\| \quad\text{for all } \xi \in M,$$

and hence

$$\|\eta\| < \delta \implies \|\Delta\varphi_\beta(\eta) - T(\eta)\|_\infty \le \epsilon\|\eta\|.$$

This says exactly that the mapping $\varphi$ is differentiable at $\beta$ and $d\varphi_\beta = T$. □

The mapping $\varphi$ is in fact continuously differentiable, as can be seen by arguing a little further in the above manner. The situation is very close to being an application of Theorem 14.1.

The classical theorem on differentiability under the integral sign is a corollary of the above theorem. We give a simple case. Note that if $\eta$ is a real variable $y$, then the above formula for $d\varphi$ can be rewritten in terms of arc derivatives:

$$[\varphi'(b)](\xi) = \frac{\partial F}{\partial y}(\xi, b).$$

Corollary. If $F(x, y)$ is a continuous real-valued function on the unit square $[0, 1] \times [0, 1]$, and if $\partial F/\partial y$ exists and is a uniformly continuous function on the square, then $\int_0^1 F(x, y)\,dx$ is a differentiable function of $y$ and its derivative is $\int_0^1 (\partial F/\partial y)(x, y)\,dx$.

Proof. The mapping $T\colon y \mapsto \int_0^1 F(x, y)\,dx$ is the composition of the bounded linear mapping $f \mapsto \int_0^1 f(x)\,dx$ from $\mathcal{C}([0, 1])$ to $\mathbb{R}$ with the differentiable mapping $\varphi\colon y \mapsto F(\cdot, y)$ from $[0, 1]$ to $\mathcal{C}([0, 1])$, and is therefore differentiable by the composite-function rule. Then Theorem 7.2 and the fact that the differential of a bounded linear map is itself give

$$T'(y) = \int_0^1 [\varphi'(y)](x)\,dx = \int_0^1 \frac{\partial F}{\partial y}(x, y)\,dx. \;\square$$
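The corollary is easy to test numerically. A minimal sketch (not from the text, assuming SciPy), with the arbitrary choice $F(x, y) = \sin(xy)$, which is smooth on the unit square:

```python
import numpy as np
from scipy.integrate import quad

# phi(y) = integral_0^1 F(x, y) dx with F(x, y) = sin(x*y);
# the corollary says phi'(y) = integral_0^1 x*cos(x*y) dx.
phi = lambda y: quad(lambda x: np.sin(x * y), 0, 1)[0]
dphi = lambda y: quad(lambda x: x * np.cos(x * y), 0, 1)[0]

y, h = 0.7, 1e-6
print((phi(y + h) - phi(y - h)) / (2 * h))   # central difference quotient
print(dphi(y))                               # agrees to high accuracy
```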
We come now to the situation of most importance to us, where a point-to-point map generates a function-to-function map by composition. Let $A$ be an open set in a normed linear space $V$, let $S$ be an arbitrary set, and let $\mathcal{A}$ be the set of bounded maps $f$ from $S$ to $A$. Then $\mathcal{A}$ is a subset of the normed linear space $\mathcal{B}(S, V)$ of all bounded functions from $S$ to $V$ under the uniform norm. A function $f \in \mathcal{A}$ will be an interior point of $\mathcal{A}$ if and only if the distance from the range of $f$ to the boundary of $A$ is a positive number $\delta$, for this is clearly equivalent to saying that $\mathcal{A}$ includes a ball in $\mathcal{B}(S, V)$ about the point $f$.

Now let $g$ be any bounded mapping from $A$ to a normed linear space $W$, and let $G\colon \mathcal{A} \to \mathcal{B}(S, W)$ be composition by $g$. That is, $h = G(f)$ if and only if $f \in \mathcal{A}$ and $h = g \circ f$. We can consider both the continuity and differentiability of $G$, but we shall only work out the differentiability theorem.

Theorem 14.3. Let the function $g\colon A \to W$ be differentiable at each point $\alpha$ in $A$, and let $dg_\alpha$ be a bounded uniformly continuous function of $\alpha$. Then the mapping $G\colon \mathcal{A} \to \mathcal{B}(S, W)$ defined by $G(f) = g \circ f$ is differentiable at any interior point $f$ in $\mathcal{A}$, and $dG_f\colon \mathcal{B}(S, V) \to \mathcal{B}(S, W)$ is defined by $[dG_f(h)](s) = dg_{f(s)}(h(s))$ for all $s \in S$.

Proof. Given $\epsilon$, choose $\delta$ by the uniform continuity of $dg$ so that

$$\|\alpha - \beta\| < \delta \implies \|dg_\alpha - dg_\beta\| < \epsilon,$$

and then apply the corollary to Theorem 7.4 once more to conclude that

$$\|\Delta g_\alpha(\xi) - dg_\alpha(\xi)\| \le \epsilon\|\xi\|,$$

provided the line segment from $\alpha$ to $\alpha + \xi$ is in $A$. Now choose any fixed interior point $f$ in $\mathcal{A}$, and choose $\delta' \le \delta$ so that $B_{\delta'}(f) \subset \mathcal{A}$. Then for any $h$ in $\mathcal{B}(S, V)$,

$$\|h\|_\infty < \delta' \implies \|\Delta g_{f(s)}(h(s)) - dg_{f(s)}(h(s))\| \le \epsilon\|h(s)\| \quad\text{for all } s \in S.$$

Define a map $T\colon \mathcal{B}(S, V) \to \mathcal{B}(S, W)$ by $[T(h)](s) = dg_{f(s)}(h(s))$. Then the above displayed inequality can be rewritten as

$$\|h\|_\infty < \delta' \implies \|\Delta G_f(h) - T(h)\|_\infty \le \epsilon\|h\|_\infty.$$

That is, $\Delta G_f(h) - T(h) = o(h)$. We will therefore be done when we have shown that $T \in \operatorname{Hom}(\mathcal{B}(S, V), \mathcal{B}(S, W))$. First, we have

$$(T(h_1 + h_2))(s) = dg_{f(s)}((h_1 + h_2)(s)) = dg_{f(s)}(h_1(s) + h_2(s)) = dg_{f(s)}(h_1(s)) + dg_{f(s)}(h_2(s)) = (T(h_1))(s) + (T(h_2))(s).$$

Thus $T(h_1 + h_2) = T(h_1) + T(h_2)$, and homogeneity follows similarly. Second, if $b$ is a bound to $\|dg_\alpha\|$ on $A$, then

$$\|T(h)\|_\infty = \operatorname{lub}\{\|(T(h))(s)\| : s \in S\} \le \operatorname{lub}\{\|dg_{f(s)}\| \cdot \|h(s)\| : s \in S\} \le b\|h\|_\infty.$$

Therefore, $\|T\| \le b$, and we are finished. □

In the above situation, if $g$ is from $A \times U$ to $W$, so that $G(f)$ is the function $h$ given by $h(t) = g(f(t), t)$, then nothing is changed except that the theorem is about $dg^1$ instead of $dg$. If, in addition, $V$ is a product space $V_1 \times V_2$, so that $f$ is of the form $\langle f_1, f_2\rangle$ and $[G(f)](t) = g(f_1(t), f_2(t), t)$, then our rules about partial differentials give us the formula

$$[dG_f(h)](t) = dg^1_{f(t)}(h_1(t)) + dg^2_{f(t)}(h_2(t)).$$
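For a finite index set $S$ the content of Theorem 14.3 can be seen concretely: the differential of the composition operator acts pointwise through $dg_{f(s)}$. A minimal sketch (our own illustration, assuming NumPy, with $g = \tanh$):

```python
import numpy as np

g = np.tanh
dg = lambda x: 1.0 - np.tanh(x)**2    # the pointwise differential of g

s = np.linspace(0.0, 1.0, 100)        # coordinates for a finite set S
f = np.sin(3 * s)                     # a bounded map f: S -> A
h = np.cos(5 * s)                     # a direction h in B(S, V)

t = 1e-6
lhs = (g(f + t * h) - g(f)) / t       # difference quotient of G(f) = g o f
rhs = dg(f) * h                       # [dG_f(h)](s) = dg_{f(s)}(h(s))
print(np.max(np.abs(lhs - rhs)))      # small: they agree to first order
```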
*15. THE CALCULUS OF VARIATIONS

The problems of the calculus of variations are simply critical-point problems of a certain type with a characteristic twist in the way the condition $dF_\alpha = 0$ is used. We shall illustrate the subject by proving one of its standard theorems. Since we want to solve a constrained maximum problem in which the domain is an infinite-dimensional vector space, a systematic discussion would start off with a more general form of the Lagrange multiplier theorem. However, for our purpose it is sufficient to note that if $S$ is a closed plane $M + \alpha$, then the restriction of $F$ to $S$ is equivalent to a new function on the vector space $M$, and its differential at $\beta = \eta + \alpha$ in $S$ is clearly just the restriction of $dF_\beta$ to $M$. The requirement that $\beta$ be a critical point for the constrained function is therefore simply the requirement that $dF_\beta$ vanish on $M$.

Let $F$ be a uniformly continuous differentiable real-valued function of three variables defined on (an open subset of) $W \times W \times \mathbb{R}$, where $W$ is a normed linear space. Given a closed interval $[a, b] \subset \mathbb{R}$, let $V$ be the normed linear space $\mathcal{C}^1([a, b], W)$ of smooth arcs $f\colon [a, b] \to W$, with $\|f\|$ taken as $\|f\|_\infty + \|f'\|_\infty$. The problem is to maximize the (nonlinear) functional

$$G(f) = \int_a^b F(f(t), f'(t), t)\,dt,$$

subject to the restraints $f(a) = \alpha$ and $f(b) = \beta$. That is, we consider only smooth arcs in $W$ with fixed endpoints $\alpha$ and $\beta$, and we want to find that arc from $\alpha$ to $\beta$ which maximizes (or minimizes) the integral.

Now we can show that $G$ is a continuously differentiable function from (an open subset of) $V$ to $\mathbb{R}$. The easiest way to do this is to let $X$ be the space $\mathcal{C}([a, b], W)$ of continuous arcs under the uniform norm, and to consider first the more general functional $K$ from $X \times X$ to $\mathbb{R}$ defined by

$$K(f, g) = \int_a^b F(f(t), g(t), t)\,dt.$$

By Theorem 14.3 the integrand map $\langle f, g\rangle \mapsto F(f(\cdot), g(\cdot), \cdot)$ is differentiable from $X \times X$ to $\mathcal{C}([a, b])$, and its differential at $\langle f, g\rangle$ evaluated at $\langle h, k\rangle$ is the function $dF^1_{\langle f(t),g(t),t\rangle}(h(t)) + dF^2_{\langle f(t),g(t),t\rangle}(k(t))$. Since $f \mapsto \int_a^b f(t)\,dt$ is a bounded linear functional on $\mathcal{C}$, it is differentiable and equal to its differential. The composite-function rule therefore implies that $K$ is differentiable and that

$$dK_{\langle f,g\rangle}(h, k) = \int_a^b [dF^1(h(t)) + dF^2(k(t))]\,dt,$$

where the partial differentials in the integrand are at the point $\langle f(t), g(t), t\rangle$.

Now the pairs $\langle f, g\rangle$ such that $f'$ exists and equals $g$ form a closed subspace of $X \times X$ which is isomorphic to $V$. It is obvious that they form a subspace, but to see that it is closed requires the theory of the integral for parametrized arcs from Chapter 4, for it depends on the representation $f(t) = f(a) + \int_a^t f'(s)\,ds$ and the consequent norm inequality $\|f(t) - f(a)\| \le (t - a)\|f'\|_\infty$. Assuming this, we see that our original functional $G$ is just the restriction of $K$ to this subspace (isomorphic to) $V$, and hence is differentiable with

$$dG_f(h) = \int_a^b [dF^1(h(t)) + dF^2(h'(t))]\,dt.$$

This differential $dG_f$ is called the first variation of $G$ about $f$.

The fixed endpoints $\alpha$ and $\beta$ for the arc $f$ determine in turn a closed plane $P$ in $V$, for the evaluation maps (coordinate projections) $\pi_x\colon f \mapsto f(x)$ are bounded and $P$ is the intersection of the hyperplanes $\pi_a = \alpha$ and $\pi_b = \beta$. Since $P$ is a translate of the subspace $M = \{f \in V : f(a) = f(b) = 0\}$, our constrained maximum equation is
$$dG_f(h) = \int_a^b [dF^1(h(t)) + dF^2(h'(t))]\,dt = 0 \quad\text{for all } h \text{ in } M.$$

We come now to the special trick of the calculus of variations, called the lemma of Du Bois-Reymond.

Suppose for simplicity that $W = \mathbb{R}$. Then $F$ is a function $F(x, y, t)$ of three real variables, the partial differentials are equivalent to ordinary partial derivatives, and our critical-point equation is

$$dG_f(h) = \int_a^b \left(\frac{\partial F}{\partial x}\cdot h + \frac{\partial F}{\partial y}\cdot h'\right) = 0.$$

If we integrate the first term in the integral by parts and remember that $h(a) = h(b) = 0$, we see that the equation becomes

$$\int_a^b \left(\frac{\partial F}{\partial y} - \int_a^t \frac{\partial F}{\partial x}\,ds\right) g(t)\,dt = 0,$$

where $g = h'$. Since $h$ is an arbitrary continuously differentiable function except for the constraints $h(a) = h(b) = 0$, we see that $g$ is an arbitrary continuous function except for the constraint $\int_a^b g(t)\,dt = 0$. That is, $\partial F/\partial y - \int_a^t \partial F/\partial x$ is orthogonal to the null space $N$ of the linear functional $g \mapsto \int_a^b g(t)\,dt$. Since the one-dimensional space $N^\perp$ is clearly the set of constant functions, our condition becomes

$$\frac{\partial F}{\partial y}(f(t), f'(t), t) = \int_a^t \frac{\partial F}{\partial x}(f(s), f'(s), s)\,ds + C.$$

This equation implies, in particular, that the left member is differentiable. This is not immediately apparent, since $f'$ is only assumed to be continuous. Differentiating, we conclude finally that $f$ is a critical point of the mapping $G$ if and only if it is a solution of the differential equation

$$\frac{d}{dt}\frac{\partial F}{\partial y}(f(t), f'(t), t) = \frac{\partial F}{\partial x}(f(t), f'(t), t),$$

which is called the Euler equation of the variational problem. It is an ordinary differential equation for the unknown function $f$; when the indicated derivative is computed, it takes the form

$$\frac{\partial^2 F}{\partial y^2}\,f'' + \frac{\partial^2 F}{\partial y\,\partial x}\,f' + \frac{\partial^2 F}{\partial y\,\partial t} - \frac{\partial F}{\partial x} = 0.$$
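Symbolic algebra systems automate exactly the computation just performed. A minimal sketch, assuming SymPy's `euler_equations` helper behaves as documented; the integrand $F(x, y, t) = y^2$ (the "energy" of an arc) is our own choice, and its Euler equation should force $f'' = 0$, so the critical arcs are straight lines.

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols('t')
f = sp.Function('f')

# F(x, y, t) = y**2 with x = f(t) and y = f'(t).
F = f(t).diff(t)**2
print(euler_equations(F, f(t), t))
# Expected output: [Eq(-2*Derivative(f(t), (t, 2)), 0)], i.e. f'' = 0.
```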
If $W$ is not $\mathbb{R}$, we get exactly the same result from the general form of the integration by parts formula (using Theorem 6.3) and a more sophisticated version of the above argument. (See Exercises 10.14 and 10.15 of Chapter 4.) That is, the smooth arc $f$ with fixed endpoints $\alpha$ and $\beta$ is a critical point of the mapping $g \mapsto \int_a^b F(g(t), g'(t), t)\,dt$ if and only if it satisfies the Euler differential equation

$$\frac{d}{dt}\,dF^2_{\langle f(t),f'(t),t\rangle} = dF^1_{\langle f(t),f'(t),t\rangle}.$$

This is now a vector-valued equation, with values in $W^*$. If $W$ is finite-dimensional, with dimension $n$, then a choice of basis makes $W^*$ into $\mathbb{R}^n$, and this vector equation is equivalent to $n$ scalar equations

$$\frac{d}{dt}\frac{\partial F}{\partial y_i}(f(t), f'(t), t) = \frac{\partial F}{\partial x_i}(f(t), f'(t), t), \qquad i = 1, \ldots, n,$$

where $F$ is now a function of $2n + 1$ real variables, $F(x, y, t) = F(x_1, \ldots, x_n, y_1, \ldots, y_n, t)$.

Finally, let us see what happens to the simpler variational problem ($W = \mathbb{R}$) when the endpoints of $f$ are not fixed. Now the critical-point equation is $dG_f(h) = 0$ for all $h$ in $V$, and when we integrate by parts it becomes

$$\left.\frac{\partial F}{\partial y}\,h\right|_a^b + \int_a^b \left(\frac{\partial F}{\partial x} - \frac{d}{dt}\frac{\partial F}{\partial y}\right) h = 0$$

for all $h$ in $V$. We can reason essentially as above, but a little more closely, to conclude that a function $f$ is a critical point if and only if it satisfies the Euler equation

$$\frac{d}{dt}\left(\frac{\partial F}{\partial y}\right) - \frac{\partial F}{\partial x} = 0$$

and also the endpoint conditions

$$\left.\frac{\partial F}{\partial y}\right|_{t=a} = \left.\frac{\partial F}{\partial y}\right|_{t=b} = 0.$$

This has been only a quick look at the variational calculus, and the interested reader can pursue it further in treatises devoted to the subject. There are many more questions of the general type we have considered. For example, we may want neither fixed nor completely free endpoints but freedom subject to constraints. We shall take this up in Chapter 13 in the special case of the variational equations of mechanics. Or again, $f$ may be a function of two or more variables and the integral may be a multiple integral. In this case the Euler equation may become a system of partial differential equations in the unknown $f$. Finally, there is the question of sufficient conditions for the critical function to give a maximum or minimum value to the integral. This will naturally involve a study of the second differential of the functional $G$, or its second variation, as it is known in this subject.
*16. THE SECOND DIFFERENTIAL AND THE CLASSIFICATION OF CRITICAL POINTS

Suppose that $V$ and $W$ are normed linear spaces, that $A$ is an open subset of $V$, and that $F\colon A \to W$ is a continuously differentiable mapping. The first differential of $F$ is the continuous mapping $dF\colon \gamma \mapsto dF_\gamma$ from $A$ to $\operatorname{Hom}(V, W)$. We now want to study the differentiability of this mapping at the point $\alpha$. Presumably, we know what it means to say that $dF$ is differentiable at $\alpha$. By definition $d(dF)_\alpha$ is a bounded linear transformation $T$ from $V$ to $\operatorname{Hom}(V, W)$ such that $\Delta(dF)_\alpha(\eta) - T(\eta) = o(\eta)$. That is, $dF_{\alpha+\eta} - dF_\alpha - T(\eta)$ is an element of $\operatorname{Hom}(V, W)$ of norm less than $\epsilon\|\eta\|$ for $\eta$ sufficiently small. We set $d^2F_\alpha = d(dF)_\alpha$ and repeat: $d^2F_\alpha = d^2F_\alpha(\cdot)$ is a linear map from $V$ to $\operatorname{Hom}(V, W)$, $d^2F_\alpha(\eta) = d^2F_\alpha(\eta)(\cdot)$ is an element of $\operatorname{Hom}(V, W)$, and $d^2F_\alpha(\eta)(\xi)$ is a vector in $W$. Also, we know that $d^2F_\alpha$ is equivalent to a bounded bilinear map $\omega\colon V \times V \to W$, where $\omega(\eta, \xi) = d^2F_\alpha(\eta)(\xi)$.

The vector $d^2F_\alpha(\eta)(\xi)$ clearly ought to be some kind of second derivative of $F$ at $\alpha$, and the reader might even conjecture that it is the mixed derivative in the directions $\xi$ and $\eta$.

Theorem 16.1. If $F\colon A \to W$ is continuously differentiable, and if the second differential $d^2F_\alpha$ exists, then for each fixed $\mu \in V$ the function $D_\mu F\colon \gamma \mapsto D_\mu F(\gamma)$ from $A$ to $W$ is differentiable at $\alpha$ and $D_\nu(D_\mu F)(\alpha) = (d^2F_\alpha(\nu))(\mu)$.

Proof. We use the evaluation-at-$\mu$ map $\operatorname{ev}_\mu\colon \operatorname{Hom}(V, W) \to W$ defined for a fixed $\mu$ in $V$ by $\operatorname{ev}_\mu(T) = T(\mu)$. It is a bounded linear mapping. Then $(D_\mu F)(\alpha) = dF_\alpha(\mu) = \operatorname{ev}_\mu(dF_\alpha) = (\operatorname{ev}_\mu \circ dF)(\alpha)$, so that the function $D_\mu F$ is the composition $\operatorname{ev}_\mu \circ dF$. It is differentiable at $\alpha$ because $d(dF)_\alpha$ exists and $\operatorname{ev}_\mu$ is linear. Thus

$$(D_\nu(D_\mu F))(\alpha) = d(D_\mu F)_\alpha(\nu) = d(\operatorname{ev}_\mu \circ dF)_\alpha(\nu) = (\operatorname{ev}_\mu \circ d(dF)_\alpha)(\nu) = \operatorname{ev}_\mu[(d^2F_\alpha)(\nu)] = (d^2F_\alpha(\nu))(\mu). \;\square$$

The reader must remember in going through the above argument that $D_\mu F$ is the function $(D_\mu F)(\cdot)$, and he might prefer to use this notation, as follows:

$$D_\nu((D_\mu F)(\cdot))\big|_\alpha = d((D_\mu F)(\cdot))_\alpha(\nu) = d(\operatorname{ev}_\mu \circ dF(\cdot))_\alpha(\nu) = [\operatorname{ev}_\mu \circ d(dF(\cdot))_\alpha](\nu) = \operatorname{ev}_\mu(d^2F_\alpha(\nu)).$$

If the domain space $V$ is the Cartesian space $\mathbb{R}^n$, then the differentiability of $(D_{\delta_j}F)(\cdot) = (\partial F/\partial x_j)(\cdot)$ at $\alpha$ implies the existence of the second partial derivatives $(\partial^2 F/\partial x_i\,\partial x_j)(\alpha)$ by Theorem 9.2, and with $b$ and $c$ fixed, we then have

$$D_c(D_b F) = D_c\left(\sum_i b_i \frac{\partial F}{\partial x_i}\right) = \sum_i b_i D_c \frac{\partial F}{\partial x_i} = \sum_i b_i \left(\sum_j c_j \frac{\partial}{\partial x_j}\left(\frac{\partial F}{\partial x_i}\right)\right) = \sum_{i,j} b_i c_j \frac{\partial^2 F}{\partial x_j\,\partial x_i}.$$
Thus:

Corollary 1. If $V = \mathbb{R}^n$ in the above theorem, then the existence of $d^2F_\alpha$ implies the existence of all the second partial derivatives $(\partial^2 F/\partial x_i\,\partial x_j)(\alpha)$, and

$$d^2F_\alpha(b, c) = \sum_{i,j} b_i c_j \frac{\partial^2 F}{\partial x_j\,\partial x_i}(\alpha).$$

Moreover, from the above considerations and Theorem 9.3 we can also conclude that:

Theorem 16.2. If $V = \mathbb{R}^n$, and if all the second partial derivatives $(\partial^2 F/\partial x_i\,\partial x_j)(\alpha)$ exist and are continuous on the open set $A$, then the second differential $d^2F_\alpha$ exists on $A$ and is continuous.

Proof. We have directly from Theorem 9.3 that each first partial derivative $(\partial F/\partial x_j)(\cdot)$ is differentiable. But $\partial F/\partial x_j = \operatorname{ev}_{\delta_j} \circ dF$, and the theorem is then a consequence of the following general principle. □

Lemma. If $\{S_i\}_1^k$ is a finite collection of linear maps on a vector space $W$ such that $S = \langle S_1, \ldots, S_k\rangle$ is invertible, then a mapping $F\colon A \to W$ is differentiable at $\alpha$ if and only if $S_i \circ F$ is differentiable at $\alpha$ for all $i$.

Proof. For then $S \circ F$ and $F = S^{-1} \circ S \circ F$ are differentiable, by Theorems 8.1 and 6.2. □

These considerations clearly extend to any number of differentiations. Thus, if $d^2F(\cdot)\colon \gamma \mapsto d^2F_\gamma$ is differentiable at $\alpha$, then for fixed $b$ and $c$ the evaluation $d^2F(\cdot)(b, c)$ is differentiable at $\alpha$, and the formula shows (for special choices of $b$ and $c$) that all the second partials $(\partial^2 F/\partial x_j\,\partial x_i)(\cdot)$ are differentiable at $\alpha$, with $D_a\big(d^2F(\cdot)(b, c)\big)\big|_\alpha = \big(d(d^2F)_\alpha(a)\big)(b, c)$. Conversely, if all the third partials exist and are continuous on $A$, then the second partials are differentiable on $A$ by Theorem 9.3, and then $d^2F(\cdot)$ is differentiable by the lemma, since $(\partial^2 F/\partial x_i\,\partial x_j)(\cdot) = \operatorname{ev}_{\langle\delta_i,\delta_j\rangle} \circ d^2F(\cdot)$.

As the reader will remember, it is crucially important in working with higher-order derivatives that $\partial^2 F/\partial x_i\,\partial x_j = \partial^2 F/\partial x_j\,\partial x_i$, and we very much need the same theorem here.

Theorem 16.3. The second differential is a symmetric function of its two arguments: $(d^2F_\alpha(\eta))(\xi) = (d^2F_\alpha(\xi))(\eta)$.
Proof. By the definition of $d(dF)_\alpha$, given $\epsilon$, there is a $\delta$ such that

$$\|\Delta(dF)_\alpha(\eta) - d^2F_\alpha(\eta)\| \le \epsilon\|\eta\|$$

whenever $\|\eta\| \le \delta$. Of course, $\Delta(dF)_\alpha(\eta) = dF_{\alpha+\eta} - dF_\alpha$. If we write down the same inequality with $\eta$ replaced by $\eta + \zeta$, then the difference of the transformations in the left members of the two inequalities is $dF_{\alpha+\eta+\zeta} - dF_{\alpha+\eta} - d^2F_\alpha(\zeta)$, and the triangle inequality therefore implies that

$$\|dF_{\alpha+\eta+\zeta} - dF_{\alpha+\eta} - d^2F_\alpha(\zeta)\| \le 2\epsilon(\|\eta\| + \|\zeta\|),$$

provided that both $\eta$ and $\eta + \zeta$ have norms at most $\delta$. We shall take $\|\zeta\| \le \delta/3$ and $\|\eta\| \le 2\delta/3$. If we hold $\zeta$ fixed, and if we set $T = d^2F_\alpha(-\zeta)$ and $G(\xi) = F(\xi) - F(\xi + \zeta)$, then this inequality becomes

$$\|dG_{\alpha+\eta} - T\| \le 2\epsilon(\|\eta\| + \|\zeta\|),$$

and since it holds whenever $\|\eta\| \le 2\delta/3$, we can apply the corollary to Theorem 7.4 and conclude that

$$\|\Delta G_{\alpha+\eta}(\xi) - T(\xi)\| \le 2\epsilon(\|\eta\| + \|\zeta\|)\|\xi\|,$$

provided that $\eta$ and $\eta + \xi$ have norms at most $2\delta/3$. This inequality therefore holds if $\eta$, $\zeta$, and $\xi$ all have norms at most $\delta/3$. If we now set $\zeta = -\eta$, we have $T = d^2F_\alpha(\eta)$ and

$$\Delta G_{\alpha+\eta}(\xi) = F(\alpha + \eta + \xi) - F(\alpha + \eta) - F(\alpha + \xi) + F(\alpha).$$

This function of $\eta$ and $\xi$ is called the second difference of $F$ at $\alpha$, and is designated $\Delta^2F_\alpha(\eta, \xi)$. Note that it is symmetric in $\xi$ and $\eta$. Our final inequality can now be rewritten as

$$\|\Delta^2F_\alpha(\eta, \xi) - d^2F_\alpha(\eta)(\xi)\| \le 4\epsilon\|\eta\|\,\|\xi\|.$$

Reversing $\eta$ and $\xi$, and using the symmetry of $\Delta^2F_\alpha$, we see that

$$\|d^2F_\alpha(\eta)(\xi) - d^2F_\alpha(\xi)(\eta)\| \le 8\epsilon\|\eta\|\,\|\xi\|,$$

provided $\eta$ and $\xi$ have norms at most $\delta/3$. But now it follows by the usual homogeneity argument that this inequality holds for all $\eta$ and $\xi$. Finally, since $\epsilon$ is arbitrary, the left-hand side is zero. □

The reader will remember from the elementary calculus that a critical point $a$ for a function $f$ [$f'(a) = 0$] is a relative extremum point if the second derivative $f''(a)$ exists and is not zero. In fact, if $f''(a) < 0$, then $f$ has a relative maximum at $a$, because $f''(a) < 0$ implies that $f'$ is decreasing in a neighborhood of $a$ and the graph of $f$ is therefore concave down in a neighborhood of $a$. Similarly, $f$ has a relative minimum at $a$ if $f'(a) = 0$ and $f''(a) > 0$. If $f''(a) = 0$, nothing can be concluded.
If $f$ is a real-valued function defined on an open set $A$ in a finite-dimensional vector space $V$, if $\alpha \in A$ is a critical point of $f$, and if $d^2f_\alpha$ exists and is a nonsingular element of $\operatorname{Hom}(V, V^*)$, then we can draw similar conclusions about the behavior of $f$ near $\alpha$, only now there is a richer variety of possibilities. The reader is probably already familiar with what happens for a function $f$ from $\mathbb{R}^2$ to $\mathbb{R}$. Then $\alpha$ may be a relative maximum point (a "cap" point on the graph of $f$), a relative minimum point, or a saddle point, as shown in Fig. 3.13 for the graph of the translated function $\Delta f_\alpha$. However, it must be realized that new axes may have to be chosen for the orientation of the saddle to the axes to look as shown. Replacing $f$ by $\Delta f_\alpha$ amounts to supposing that 0 is the critical point and that $f(0) = 0$. Note that if 0 is a saddle point, then there are two complementary subspaces, the coordinate axes in Fig. 3.13, such that 0 is a relative maximum for $f$ when $f$ is restricted to one of them, and a relative minimum point for the restriction of $f$ to the other.

[Fig. 3.13]

We shall now investigate the general case and find that it is just like the two-dimensional case except that when there is a saddle point the subspace on which the critical point is a maximum point may have any dimension from 1 to $n - 1$ [where $d(V) = n$]. Moreover, this dimension is exactly the number of $-1$'s in the standard orthonormal basis representation of the quadratic form $q(\xi) = \omega(\xi, \xi) = d^2f_\alpha(\xi, \xi)$.

Our hypotheses, then, are that $f$ is a continuously differentiable real-valued function on an open subset of a finite-dimensional normed linear space $V$, that $\alpha \in A$ is a critical point for $f$ ($df_\alpha = 0$), and that the mapping $d^2f_\alpha\colon V \to V^*$ exists and is nonsingular. This last hypothesis is equivalent to assuming that the bilinear form $\omega(\xi, \eta) = d^2f_\alpha(\xi, \eta)$ has a nonsingular matrix with respect to any basis for $V$.

We now use Theorem 7.1 of Chapter 2 to choose an $\omega$-orthonormal basis $\{\alpha_i\}_1^n$. Remember that this means that $\omega(\alpha_i, \alpha_j) = 0$ if $i \ne j$, $\omega(\alpha_i, \alpha_i) = 1$ for $i = 1, \ldots, p$, and $\omega(\alpha_i, \alpha_i) = -1$ for $i = p + 1, \ldots, n$. There cannot be any 0 values for $\omega(\alpha_i, \alpha_i)$ because the matrix $t_{ij} = \omega(\alpha_i, \alpha_j)$ is nonsingular: if $\omega(\alpha_i, \alpha_i) = 0$, then the whole $i$th column is zero, the column space has dimension $\le n - 1$, and the matrix is singular.

We can use the basis isomorphism $\varphi$ to replace $V$ by $\mathbb{R}^n$ (i.e., replace $f$ by $f \circ \varphi$), and we can therefore suppose that $V = \mathbb{R}^n$ and that the standard basis is
$\omega$-orthonormal, with $\omega(x, y) = \sum_1^p x_iy_i - \sum_{p+1}^n x_iy_i$. Since

$$\omega(\delta_i, \delta_j) = d^2f_\alpha(\delta_i, \delta_j) = D_{\delta_i}D_{\delta_j}f(\alpha) = \frac{\partial^2 f}{\partial x_i\,\partial x_j}(\alpha),$$

our hypothesis of $\omega$-orthogonality is that $(\partial^2 f/\partial x_i\,\partial x_j)(\alpha) = 0$ for $i \ne j$, $\partial^2 f/\partial x_i^2 = 1$ for $i = 1, \ldots, p$, and $\partial^2 f/\partial x_i^2 = -1$ for $i = p + 1, \ldots, n$.

Since $p$ can have any value from 0 to $n$, there are $n + 1$ possibilities. We show first that if $p = n$, then $\alpha$ is a relative minimum of $f$. In this case the quadratic form $q$ is said to be positive definite, since $q(x) = \omega(x, x)$ is positive for every nonzero $x$. We also say that the bilinear form $\omega(x, y) = d^2f_\alpha(x, y)$ is positive definite, and, in the language of Chapter 5, that $\omega$ is a scalar product.

Theorem 16.4. Let $f$ be a continuously differentiable real-valued function defined on an open subset $A$ of $\mathbb{R}^n$, and let $\alpha \in A$ be a critical point of $f$ at which $d^2f$ exists and is positive definite. Then $f$ has a relative minimum at $\alpha$.

Proof. We suppose, as above, that the standard basis $\{\delta_i\}_1^n$ is $\omega$-orthonormal. By the definition of $d^2f_\alpha$, given $\epsilon$, there is a $\delta$ such that

$$\|df_{\alpha+y} - df_\alpha - d^2f_\alpha(y)\| \le \epsilon\|y\|$$

whenever $\|y\| \le \delta$. Now $df_\alpha = 0$, since $\alpha$ is a critical point of $f$, and $d^2f_\alpha(x, y) = \sum_1^n x_iy_i$, by the assumption that $\{\delta_i\}_1^n$ is $\omega$-orthonormal. Therefore, if we use the two-norm on $\mathbb{R}^n$ and set $y = tx$, we have

$$(1 - \epsilon)t\|x\|^2 \le df_{\alpha+tx}(x) \le (1 + \epsilon)t\|x\|^2.$$

Also, if $h(t) = f(\alpha + tx)$, then $h'(t) = df_{\alpha+tx}(x)$, and this inequality therefore says that $(1 - \epsilon)t\|x\|^2 \le h'(t) \le (1 + \epsilon)t\|x\|^2$. Integrating, and remembering that $h(1) - h(0) = f(\alpha + x) - f(\alpha) = \Delta f_\alpha(x)$, we have

$$\left(\frac{1 - \epsilon}{2}\right)\|x\|^2 \le \Delta f_\alpha(x) \le \left(\frac{1 + \epsilon}{2}\right)\|x\|^2$$

whenever $\|x\| \le \delta$. This shows not only that $\alpha$ is a relative minimum point but also that $\Delta f_\alpha$ lies between two very close paraboloids when $x$ is sufficiently small. □

The above argument will work just as well in general. If

$$q(x) = \sum_1^p x_i^2 - \sum_{p+1}^n x_i^2$$

is the quadratic form of the second differential and $\|x\|^2 = \sum_1^n x_i^2$, then replacing $\|x\|^2$ inside the absolute values in the above inequalities by $q(x)$, we conclude that

$$\frac{q(x) - \epsilon\|x\|^2}{2} \le \Delta f_\alpha(x) \le \frac{q(x) + \epsilon\|x\|^2}{2},$$
or

$$\frac{1}{2}\left(\sum_1^p (1 - \epsilon)x_i^2 - \sum_{p+1}^n (1 + \epsilon)x_i^2\right) \le \Delta f_\alpha(x) \le \frac{1}{2}\left(\sum_1^p (1 + \epsilon)x_i^2 - \sum_{p+1}^n (1 - \epsilon)x_i^2\right).$$

This shows that $\Delta f_\alpha$ lies between two very close quadratic surfaces of the same type when $\|x\| \le \delta$. If $1 \le p \le n - 1$ and $\alpha = 0$, then $f$ has a relative minimum on the subspace $V_1 = L(\{\delta_i\}_1^p)$ and a relative maximum on the complementary space $V_2 = L(\{\delta_i\}_{p+1}^n)$.

According to our remarks at the end of Section 2.7, we can read off the type of a critical point for a function of two variables without orthonormalizing by looking at the determinant of the matrix of the (assumed nonsingular) form $d^2f_\alpha$. This determinant is

$$t_{11}t_{22} - (t_{12})^2 = \frac{\partial^2 f}{\partial x_1^2}\,\frac{\partial^2 f}{\partial x_2^2} - \left(\frac{\partial^2 f}{\partial x_1\,\partial x_2}\right)^2.$$

If it is positive, then $\alpha$ is either a relative minimum or a relative maximum. We can tell which by following $f$ along a single line, say the $x_1$-axis. Thus, if $\partial^2 f/\partial x_1^2 < 0$, then $\alpha$ is a relative maximum point. On the other hand, if the above expression is negative, then $\alpha$ is a saddle point.

It is important for the calculus of variations that Theorem 16.4 remains true when the domain space is replaced by a space of the general type that we shall study in the next chapter, called a Banach space. The hypotheses now are that $\alpha$ is a critical point of $f$, that $q(\xi) = d^2f_\alpha(\xi, \xi)$ is positive definite, and that the scalar product norm $q^{1/2}$ (see Chapter 5) is equivalent to the given norm on $V$. The proof remains virtually unchanged.
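For a symmetric Hessian matrix the number $p$ above can be read off from the signs of the eigenvalues (by Sylvester's law of inertia this agrees with the $\omega$-orthonormal count). A minimal classification sketch, not from the text, assuming NumPy:

```python
import numpy as np

def classify(hessian, tol=1e-10):
    """Classify a critical point from the matrix of d2f_a."""
    eig = np.linalg.eigvalsh(hessian)   # real symmetric: real eigenvalues
    if np.any(np.abs(eig) < tol):
        return 'degenerate: the nonsingularity hypothesis fails'
    p = int(np.sum(eig > 0))            # dimension of the "minimum" subspace
    if p == len(eig):
        return 'relative minimum'
    if p == 0:
        return 'relative maximum'
    return f'saddle: minimum on a {p}-dimensional subspace'

# f(x, y) = x**2 - y**2 at the origin, the standard saddle (p = 1):
print(classify(np.array([[2.0, 0.0],
                         [0.0, -2.0]])))
```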
*17. HIGHER ORDER DIFFERENTIALS. THE TAYLOR FORMULA

We have seen that if $V$ and $W$ are normed linear spaces, and if $F$ is a differentiable mapping from an open set $A$ in $V$ to $W$, then its differential $dF = dF(\cdot)$ is a mapping from $A$ to $\operatorname{Hom}(V, W)$. If this mapping is differentiable on $A$, then its differential $d(dF) = d(dF(\cdot))$ is a mapping from $A$ to $\operatorname{Hom}(V, \operatorname{Hom}(V, W))$. We remember that an element of $\operatorname{Hom}(V, \operatorname{Hom}(V, W))$ is equivalent by duality to a bilinear mapping from $V \times V$ to $W$, and if we designate the space of all such bilinear mappings by $\operatorname{Hom}^2(V, W)$, then $d(dF)$ can be considered to be from $A$ to $\operatorname{Hom}^2(V, W)$. We write $d(dF) = d^2F$, and call this mapping the second differential of $F$. In Section 16 we saw that $d^2F_\alpha(\xi, \eta) = D_\xi(D_\eta F)(\alpha)$, and that if $V = \mathbb{R}^n$, then

$$d^2F_\alpha(b, c) = D_bD_cF(\alpha) = \sum_{i,j} b_ic_j \frac{\partial^2 F}{\partial x_j\,\partial x_i}(\alpha).$$

The differentials of higher order are defined in the same way. If $d^2F\colon A \to \operatorname{Hom}^2(V, W)$ is differentiable on $A$, then its differential, $d(d^2F) = d^3F$, is from $A$ to $\operatorname{Hom}(V, \operatorname{Hom}^2(V, W)) = \operatorname{Hom}^3(V, W)$, the space of all trilinear mappings from $V^3 = V \times V \times V$ to $W$. Continuing inductively, we arrive at the notion of the $n$th differential of $F$ on $A$ as a mapping from $A$ to $\operatorname{Hom}(V, \operatorname{Hom}^{n-1}(V, W)) = \operatorname{Hom}^n(V, W)$, the space of all $n$-linear mappings from $V^n$ to $W$.

The theorem that $d^2F_\alpha$ is a symmetric element of $\operatorname{Hom}^2(V, W)$ extends inductively to show that $d^nF_\alpha$ is a symmetric element of $\operatorname{Hom}^n(V, W)$. We shall omit this proof. Our theorem on the evaluation of the second differential by mixed directional derivatives also generalizes by induction to give

$$D_{\xi_1}\cdots D_{\xi_n}F(\alpha) = d^nF_\alpha(\xi_1, \ldots, \xi_n),$$

for starting from the left-hand term, we have

$$D_{\xi_1}(D_{\xi_2}\cdots D_{\xi_n}F)(\cdot)\big|_\alpha = d\big(d^{n-1}F(\cdot)(\xi_2, \ldots, \xi_n)\big)_\alpha(\xi_1) = d\big(\operatorname{ev}_{\langle\xi_2,\ldots,\xi_n\rangle} \circ d^{n-1}F(\cdot)\big)_\alpha(\xi_1) = \big[\operatorname{ev}_{\langle\xi_2,\ldots,\xi_n\rangle} \circ d(d^{n-1}F)_\alpha\big](\xi_1) = \operatorname{ev}_{\langle\xi_2,\ldots,\xi_n\rangle}\big(d^nF_\alpha(\xi_1)\big) = \big(d^nF_\alpha(\xi_1)\big)(\xi_2, \ldots, \xi_n) = d^nF_\alpha(\xi_1, \ldots, \xi_n).$$

If $V = \mathbb{R}^n$, then our conclusions about partial derivatives extend inductively in the same way to show that $F$ has continuous differentials on $A$ up through order $m$ if and only if all the $m$th-order partial derivatives $\partial^m F/\partial x_{i_1}\cdots\partial x_{i_m}$ exist and are continuous on $A$, with

$$d^mF_\alpha(c^1, \ldots, c^m) = \sum_{i_1,\ldots,i_m=1}^n c^1_{i_1}\cdots c^m_{i_m}\,\frac{\partial^m F}{\partial x_{i_1}\cdots\partial x_{i_m}}(\alpha).$$

We now consider the behavior of $F$ along the line $t \mapsto \alpha + t\eta$, where, of course, $\alpha$ and $\eta$ are fixed. If $A(t) = F(\alpha + t\eta)$, then we can prove by induction that

$$\frac{d^mA}{dt^m}(t) = D_\eta^m F(\alpha + t\eta).$$

We know this to be true for $j = 1$ by Theorem 7.2, and assuming it for $j = m$, we have, by the same theorem,

$$\frac{d^{m+1}A}{dt^{m+1}} = \left(\frac{d^mA}{dt^m}\right)'(t) = d(D_\eta^m F)_{\alpha+t\eta}(\eta) = D_\eta(D_\eta^m F)(\alpha + t\eta) = D_\eta^{m+1}F(\alpha + t\eta).$$

Now suppose that $F$ is real-valued ($W = \mathbb{R}$). We then have Taylor's formula:

$$A(t) = A(0) + tA'(0) + \cdots + \frac{t^m}{m!}A^{(m)}(0) + \frac{t^{m+1}}{(m+1)!}A^{(m+1)}(kt)$$

for some $k$ between 0 and 1. Taking $t = 1$ and substituting from above, we have

$$F(\alpha + \eta) = F(\alpha) + D_\eta F(\alpha) + \cdots + \frac{1}{m!}D_\eta^m F(\alpha) + \frac{1}{(m+1)!}D_\eta^{m+1}F(\alpha + k\eta),$$
which is the general Taylor formula in the normed linear space context. In terms of differentials, it is

$$F(\alpha + \eta) = F(\alpha) + dF_\alpha(\eta) + \cdots + \frac{1}{m!}\,d^mF_\alpha(\eta, \ldots, \eta) + \frac{1}{(m+1)!}\,d^{m+1}F_{\alpha+k\eta}(\eta, \ldots, \eta).$$

If $V = \mathbb{R}^n$, then $D_yG = \sum_1^n y_i\,\partial G/\partial x_i$, and so the general term in the Taylor expansion is

$$\frac{1}{m!}\left(\sum_1^n y_i\frac{\partial}{\partial x_i}\right)^{\!m} F(\alpha) = \frac{1}{m!}\sum_{i_1,\ldots,i_m=1}^n y_{i_1}\cdots y_{i_m}\,\frac{\partial^m F}{\partial x_{i_1}\cdots\partial x_{i_m}}(\alpha).$$

If $m = n = 2$, and if we use the notation $\mathbf{x} = \langle x, y\rangle$, $\mathbf{s} = \langle s, t\rangle$, then

$$\frac{1}{2!}D_{\mathbf{s}}^2F(\mathbf{a}) = \frac{1}{2}\left[s^2\frac{\partial^2 F}{\partial x^2}(\mathbf{a}) + 2st\frac{\partial^2 F}{\partial x\,\partial y}(\mathbf{a}) + t^2\frac{\partial^2 F}{\partial y^2}(\mathbf{a})\right].$$

The above description is logically simple, but it is inefficient in that it repeats identical terms such as $y_1y_2(\partial^2 F/\partial x_1\,\partial x_2)$ and $y_2y_1(\partial^2 F/\partial x_2\,\partial x_1)$. We conclude by describing for the interested reader the modern "multi-index" notation for this very complicated situation. Remember that we are looking at the $m$th term of the Taylor formula for $F$, and that $F$ has $n$ variables. For any $n$-tuple $k = \langle k_1, \ldots, k_n\rangle$ of nonnegative integers, we define $|k|$ as $\sum_1^n k_i$, and for $x \in \mathbb{R}^n$, we set $x^k = x_1^{k_1}x_2^{k_2}\cdots x_n^{k_n}$. Also we set $F_k = F_{k_1k_2\cdots k_n}$, or better, if $D_jF = \partial F/\partial x_j$, we set $D^kF = D_1^{k_1}D_2^{k_2}\cdots D_n^{k_n}F = F_k$. Finally, we set $k! = k_1!\,k_2!\cdots k_n!$, and if $p \ge |k|$, we set $\binom{p}{k} = p!/\big(k!\,(p - |k|)!\big)$. Then the $m$th term of the Taylor expansion of $F$ is

$$\frac{1}{m!}\sum_{|k|=m}\binom{m}{k}D^kF(a)\,x^k,$$

which is surely a notational triumph.

The general Taylor formula is too cumbersome to be of much use in practice; it is principally of theoretical value. The Taylor expansions that we actually compute are generally found by other means, such as substitution of a polynomial (or power series) in a power series. For example,

$$\sin(x + y^2) = (x + y^2) - \frac{(x + y^2)^3}{3!} + \frac{(x + y^2)^5}{5!} - \cdots = x + y^2 - \frac{x^3}{3!} - \frac{x^2y^2}{2} + \left(\frac{x^5}{5!} - \frac{xy^4}{2}\right) + \left(\frac{x^4y^2}{4!} - \frac{y^6}{3!}\right) + \cdots.$$
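The substitution method in the example is mechanical enough to delegate to a computer algebra system. A minimal sketch (assuming SymPy); we expand $\sin u$ and substitute $u = x + y^2$, truncating at $u^5$ as above:

```python
import sympy as sp

x, y, u = sp.symbols('x y u')

sin_series = sp.series(sp.sin(u), u, 0, 6).removeO()   # u - u**3/6 + u**5/120
expansion = sp.expand(sin_series.subs(u, x + y**2))
print(expansion)
# Contains x + y**2 - x**3/6 - x**2*y**2/2 - x*y**4/2 + x**5/120 + ...,
# matching the hand computation (truncate by total degree as needed).
```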
A mapping from $A$ to $W$ which has continuous differentials of all orders through $k$ is said to be of class $C^k$ on $A$, and the collection of all such mappings is designated $C^k(A, W)$ or $\mathcal{C}^k(A, W)$. It is clear that $C^k(A, W)$ is a vector space (induction). Moreover, it can also be shown by induction that a composition of $C^k$-maps is itself of class $C^k$. This depends on recognizing the general form of the $m$th differential of a composition $F \circ G$ as being a finite sum, each term of which is a composition of functions chosen from $F, dF, \ldots, d^mF, G, dG, \ldots, d^mG$. Functions of many variables are involved in these calculations, and it is simplest to treat each as a function of a single $n$-tuplet variable and to apply the obvious corollary of Theorem 8.1 that if $G_1, \ldots, G_n$ are of class $C^k$, then so is $G = \langle G_1, \ldots, G_n\rangle$, with $d^kG = \langle d^kG_1, \ldots, d^kG_n\rangle$. As a special case of composition, we can conclude that a product of $C^k$-maps is of class $C^k$.

We shall see in the next chapter that $\varphi\colon T \mapsto T^{-1}$ is a differentiable map on the open set of invertible elements in $\operatorname{Hom} V$ (if $V$ is a Banach space) and that $d\varphi_T(H) = -T^{-1}HT^{-1}$. Since $\langle S, H, T\rangle \mapsto S^{-1}HT^{-1}$ then has continuous partial differentials, we can continue, and another induction shows that $\varphi$ is of class $C^k$ for every $k$ and that $d^m\varphi_T(H_1, \ldots, H_m)$ is a finite sum of finite products of $T^{-1}, H_1, \ldots, H_m$. It then follows that a function $F$ defined implicitly by a $C^k$-function $G$ is also of class $C^k$, for its differential, as computed in the implicit-function theorem, is then a composition of maps of class $C^{k-1}$.

A mapping $F$ which is of class $C^k$ for all $k$ is said to be of class $C^\infty$, and it follows from our remarks above that the family of $C^\infty$-maps is closed under all the operations that we have met in the calculus. If the domain of $F$ is an open set in $\mathbb{R}^n$, then $F \in \mathcal{C}^\infty(A, W)$ if and only if all the partial derivatives of $F$ exist and are continuous on $A$.
CHAPTER 4

COMPACTNESS AND COMPLETENESS

In this chapter we shall investigate two properties of subsets of a normed linear space $V$ which are concerned with the fact that in a certain sense all the points which ought to be there really are there. These notions are largely independent of the algebraic structure of $V$, and we shall therefore study them in their own most natural setting, that of metric spaces. The stronger of these two properties, compactness, helps to explain why the theory of finite-dimensional spaces is so simple and satisfactory. The weaker property, completeness, is shared by important infinite-dimensional normed linear spaces, and allows us to treat these spaces in almost as satisfactory a way.

It is these properties that save the calculus from being largely a formal theory. They allow us to define crucial elements by limiting processes, and are responsible, for example, for an infinite series having a sum, a continuous real-valued function assuming a maximum value, and a definite integral existing. For the real number system itself, the compactness property is equivalent to the least upper bound property, which has already been an absolutely essential tool in our construction of the differential calculus in Chapter 3.

In Sections 8 through 10 we shall apply completeness to the calculus. The first of these sections is devoted to the existence and differentiability of functions defined by power series, and since we want to include power series in an operator $T$, we shall take the occasion to introduce and exploit the notion of a Banach algebra. Next we shall prove the contraction mapping fixed-point theorem, which is the missing ingredient in our unfinished proof of the implicit-function theorem in Chapter 3 and which will be the basis for the fundamental existence and uniqueness theorem for ordinary differential equations in Chapter 6. In Section 10 we shall prove a simple extension theorem for linear mappings into a complete normed linear space and apply it to construct the Riemann integral of a parametrized arc.

1. METRIC SPACES; OPEN AND CLOSED SETS

In the preceding chapter we occasionally treated questions of convergence and continuity in situations where the domain was an arbitrary subset $A$ of a normed linear space $V$. In such discussions the algebraic structure of $V$ fades into the background, and the vector operations of $V$ are used only to produce the combination
$\|\alpha - \beta\|$, which is interpreted as the distance from $\alpha$ to $\beta$. If we distill out of these contexts what is essential to the convergence and continuity arguments, we find that we need a space $A$ and a function $\rho\colon A \times A \to \mathbb{R}$, $\rho(x, y)$ being called the distance from $x$ to $y$, such that

1) $\rho(x, y) > 0$ if $x \ne y$, and $\rho(x, x) = 0$;
2) $\rho(x, y) = \rho(y, x)$ for all $x, y \in A$;
3) $\rho(x, z) \le \rho(x, y) + \rho(y, z)$ for all $x, y, z \in A$.

Any set $A$ together with such a function $\rho$ from $A \times A$ to $\mathbb{R}$ is called a metric space; the function $\rho$ is the metric. It is obvious that a normed linear space is a metric space under the norm metric $\rho(\alpha, \beta) = \|\alpha - \beta\|$ and that any subset $B$ of a metric space $A$ is itself a metric space under $\rho \restriction B \times B$. If we start with a nice intuitive space, like $\mathbb{R}^n$ under one of its standard norms, and choose a weird subset $B$, it will be clear that a metric space can be a very odd object, and may fail to have almost any property one can think of.

Metric spaces very often arise in practice as subsets of normed linear spaces with the norm metric, but they come from other sources too. Even in the normed linear space context, metrics other than the norm metric are used. For example, $S$ might be a two-dimensional spherical surface in $\mathbb{R}^3$, say $S = \{x : \sum_1^3 x_i^2 = 1\}$, and $\rho(x, y)$ might be the great circle distance from $x$ to $y$. Or, more generally, $S$ might be any smooth two-dimensional surface in $\mathbb{R}^3$, and $\rho(x, y)$ might be the length of the shortest curve connecting $x$ to $y$ in $S$.

In this chapter we shall adopt the metric space context for our arguments wherever it is appropriate, so that the student may become familiar with this more general but very intuitive notion. We begin by reproducing the basic definitions in the language of metrics. Because the scalar-vector dichotomy is not a factor in this context, we shall drop our convention that points be represented by Greek or boldface roman letters and shall use whatever letters we wish.

Definition. If $X$ and $Y$ are metric spaces, then $f\colon X \to Y$ is continuous at $a \in X$ if for every $\epsilon$ there is a $\delta$ such that $\rho(x, a) < \delta \Rightarrow \rho(f(x), f(a)) < \epsilon$. Here we have used the same symbol '$\rho$' for metrics on different spaces, just as earlier we made ambiguous use of the norm symbol.

Definition. The (open) ball of radius $r$ about $p$, $B_r(p)$, is simply the set of points whose distance from $p$ is less than $r$: $B_r(p) = \{x : \rho(x, p) < r\}$.

Definition. A subset $A \subset X$ is open if every point $p$ in $A$ is the center of some ball included in $A$, that is, if $(\forall p \in A)(\exists r > 0)\,B_r(p) \subset A$.
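The great-circle distance mentioned above really is a metric, and the three axioms can be spot-checked numerically on random points of the sphere. A minimal sketch, not from the text, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

def great_circle(p, q):
    """Great-circle distance between unit vectors p and q on the sphere."""
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

def random_unit():
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)

for _ in range(1000):
    x, y, z = random_unit(), random_unit(), random_unit()
    assert great_circle(x, x) < 1e-12                            # rho(x, x) = 0
    assert abs(great_circle(x, y) - great_circle(y, x)) < 1e-12  # symmetry
    # triangle inequality, with a small tolerance for roundoff:
    assert great_circle(x, z) <= great_circle(x, y) + great_circle(y, z) + 1e-9
print('metric axioms hold on all sampled triples')
```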
Lemma 1.1. Every ball is open; in fact, if $q \in B_r(p)$ and $\delta = r - \rho(p, q)$, then $B_\delta(q) \subset B_r(p)$.

Proof. This amounts to the triangle inequality. For, if $x \in B_\delta(q)$, then $\rho(x, q) < \delta$ and

$$\rho(x, p) \le \rho(x, q) + \rho(q, p) < \delta + \rho(p, q) = r,$$

so that $x \in B_r(p)$. Thus $B_\delta(q) \subset B_r(p)$. □

Lemma 1.2. If $p$ is held fixed, then $\rho(p, x)$ is a continuous function of $x$.

Proof. A symbol-by-symbol paraphrase of Lemma 3.1 of Chapter 3 shows that $|\rho(p, x) - \rho(p, y)| \le \rho(x, y)$, so that $\rho(p, x)$ is actually a Lipschitz function with constant 1. □

Theorem 1.1. The family $\mathfrak{I}$ of all open subsets of a metric space $S$ has the following properties:

1) The union of any collection of open sets is open; that is, if $\{A_i : i \in I\} \subset \mathfrak{I}$, then $\bigcup_{i \in I} A_i \in \mathfrak{I}$.
2) The intersection of two open sets is open; that is, if $A, B \in \mathfrak{I}$, then $A \cap B \in \mathfrak{I}$.
3) $\emptyset, S \in \mathfrak{I}$.

Proof. These properties follow immediately from the definition. Thus any point $p$ in $\bigcup_i A_i$ lies in some $A_j$, and therefore, since $A_j$ is open, some ball about $p$ is a subset of $A_j$ and hence of the larger set $\bigcup_i A_i$. □

Corollary. A set is open if and only if it is a union of open balls.

Proof. This follows from the definition of open set, the lemma above, and property (1) of the theorem. □

The union of all the open subsets of an arbitrary set $A$ is an open subset of $A$, by (1), and therefore is the largest open subset of $A$. It is called the interior of $A$ and is designated $A^{\mathrm{int}}$. Clearly, $p$ is in $A^{\mathrm{int}}$ if and only if some ball about $p$ is a subset of $A$, and it is helpful to visualize $A^{\mathrm{int}}$ as the union of all the balls that are in $A$.

Definition. A set $A$ is closed if $A'$ is open.

The theorem above and De Morgan's law (Section 0.11) then yield the following complementary set of properties for closed sets.

Theorem 1.2.

1) The intersection of any family of closed sets is closed.
2) The union of two closed sets is closed.
3) $\emptyset$ and $S$ are closed.

Proof. Suppose, for example, that $\{B_i : i \in I\}$ is a family of closed sets. Then the complement $B_i'$ is open for each $i$, so that $\bigcup_i B_i'$ is open by the above theorem.
Also, $\bigcup_i B_i' = \big(\bigcap_i B_i\big)'$, by De Morgan's law (see Section 0.11). Thus $\bigcap_i B_i$ is the complement of an open set and is closed. □

Continuing our "complementary" development, we define the closure, $\bar{A}$, of an arbitrary set $A$ as the intersection of all closed sets including $A$, and we have from (1) above that $\bar{A}$ is the smallest closed set including $A$. De Morgan's law implies the important identity

$$(\bar{A})' = (A')^{\mathrm{int}}.$$

For $F$ is a closed superset of $A$ if and only if its complement $U = F'$ is an open subset of $A'$. By De Morgan's law the complement of the intersection of all such sets $F$ is the union of all such sets $U$. That is, the complement of $\bar{A}$ is $(A')^{\mathrm{int}}$. This identity yields a direct characterization of closure:

Lemma 1.3. A point $p$ is in $\bar{A}$ if and only if every ball about $p$ intersects $A$.

Proof. A point $p$ is not in $\bar{A}$ if and only if $p$ is in the interior of $A'$, that is, if and only if some ball about $p$ does not intersect $A$. Negating the extreme members of this equivalence gives the lemma. □

Definition. The boundary, $\partial A$, of an arbitrary set $A$ is the difference between its closure and its interior. Thus $\partial A = \bar{A} - A^{\mathrm{int}}$.

Since $A - B = A \cap B'$, we have the symmetric characterization $\partial A = \bar{A} \cap \overline{(A')}$. Therefore, $\partial A = \partial(A')$; also, $p \in \partial A$ if and only if every ball about $p$ intersects both $A$ and $A'$.

Example. A ball $B_r(\alpha)$ is an open set. In a normed linear space the closure of $B_r(\alpha)$ is the closed ball about $\alpha$ of radius $r$, $\{\xi : \rho(\xi, \alpha) \le r\}$. This is easily seen from Lemma 1.3. The boundary $\partial B_r(\alpha)$ is then the spherical surface of radius $r$ about $\alpha$, $\{\xi : \rho(\xi, \alpha) = r\}$. If some but not all of the points of this surface are added to the open ball, we obtain a set that is neither open nor closed. The student should expect that a random set he may encounter will be neither open nor closed.

Continuous functions furnish an important source of open and closed sets by the following lemma.

Lemma 1.4. If $X$ and $Y$ are metric spaces, and if $f$ is a continuous mapping from $X$ to $Y$, then $f^{-1}[A]$ is open in $X$ whenever $A$ is open in $Y$.

Proof. If $p \in f^{-1}[A]$, then $f(p) \in A$, and, since $A$ is open, some ball $B_\epsilon(f(p))$ is a subset of $A$. But the continuity of $f$ at $p$ says exactly that there is a $\delta$ such that $f[B_\delta(p)] \subset B_\epsilon(f(p))$. In particular, $f[B_\delta(p)] \subset A$ and $B_\delta(p) \subset f^{-1}[A]$. Thus for each $p$ in $f^{-1}[A]$ there is a ball about $p$ included in $f^{-1}[A]$, and this set is therefore open. □
Since $f^{-1}[A'] = (f^{-1}[A])'$, we also have the following corollary.

Corollary. If $f\colon X \to Y$ is continuous, then $f^{-1}[C]$ is closed in $X$ whenever $C$ is closed in $Y$.

The converses of both of these results hold as well. As an example of the use of this lemma, consider for a fixed $\alpha \in X$ the continuous function $f\colon X \to \mathbb{R}$ defined by $f(\xi) = \rho(\xi, \alpha)$. The sets $(-r, r)$, $[0, r]$, and $\{r\}$ are respectively open, closed, and closed subsets of $\mathbb{R}$. Therefore, their inverse images under $f$ (the ball $B_r(\alpha)$, the closed ball $\{\xi : \rho(\xi, \alpha) \le r\}$, and the spherical surface $\{\xi : \rho(\xi, \alpha) = r\}$) are respectively open, closed, and closed in $X$. In particular, the triangle inequality argument demonstrating directly that $B_r(\alpha)$ is open is now seen to be unnecessary by virtue of the triangle inequality argument that demonstrates the continuity of the distance function (Lemma 1.2).

It is not true that continuous functions take closed sets into closed sets in the forward direction. For example, if $f\colon \mathbb{R} \to \mathbb{R}$ is the arc tangent function, then $f[\mathbb{R}] = \operatorname{range} f = (-\pi/2, \pi/2)$, which is not a closed subset of $\mathbb{R}$. The reader may feel that this example cheats and that we should only expect the $f$-image of a closed set to be a closed subset of the metric space that is the range of $f$. He might then consider $f(x) = 2x/(1 + x^2)$ from $\mathbb{R}$ to its range $[-1, 1]$. The set of positive integers $\mathbb{Z}^+$ is a closed subset of $\mathbb{R}$, but $f[\mathbb{Z}^+] = \{2n/(1 + n^2)\}_1^\infty$ is not closed in $[-1, 1]$, since 0 is clearly in its closure.

The distance between two nonempty sets $A$ and $B$, $\rho(A, B)$, is defined as $\operatorname{glb}\{\rho(a, b) : a \in A \text{ and } b \in B\}$. If $A$ and $B$ intersect, the distance is zero. If $A$ and $B$ are disjoint, the distance may still be zero. For example, the interior and exterior of a circle in the plane are disjoint open sets whose distance apart is zero. The $x$-axis and (the graph of) the function $f(x) = 1/x$ are disjoint closed sets whose distance apart is zero. As we have remarked earlier, a set $A$ is closed if and only if every point not in $A$ is a positive distance from $A$. More generally, for any set $A$ a point $p$ is in $\bar{A}$ if and only if $\rho(p, A) = 0$.

We list below some simple properties of the distance between subsets of a normed linear space.

1) Distance is unchanged by a translation: $\rho(A, B) = \rho(A + \gamma, B + \gamma)$ (because $\|(\alpha + \gamma) - (\beta + \gamma)\| = \|\alpha - \beta\|$).
2) $\rho(kA, kB) = |k|\,\rho(A, B)$ (because $\|k\alpha - k\beta\| = |k|\,\|\alpha - \beta\|$).
3) If $N$ is a subspace, then the distance from $B$ to $N$ is unchanged when we translate $B$ through a vector in $N$: $\rho(N, B) = \rho(N, B + \eta)$ if $\eta \in N$ (because $N - \eta = N$).
4) If $T \in \operatorname{Hom}(V, W)$, then $\rho(T[A], T[B]) \le \|T\|\,\rho(A, B)$ (because $\|T(\alpha) - T(\beta)\| \le \|T\| \cdot \|\alpha - \beta\|$).
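The x-axis and hyperbola example above shows that the glb in the definition need not be attained; numerically the infimum is approached but never reached. A small sketch of ours, assuming NumPy:

```python
import numpy as np

# A = the x-axis, B = the graph of f(x) = 1/x: disjoint closed sets.
# The distance from the point (x, 1/x) in B to A is exactly 1/x.
xs = np.linspace(1.0, 1e4, 100_000)
print((1.0 / xs).min())   # 1e-4 here, and -> 0 as x -> infinity,
                          # so rho(A, B) = 0 without any closest pair.
```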
Lemma 1.5. If $N$ is a proper closed subspace and $0 < \epsilon < 1$, there exists an $\alpha$ such that $\|\alpha\| = 1$ and $\rho(\alpha, N) > 1 - \epsilon$.

Proof. Choose any $\beta \notin N$. Then $\rho(\beta, N) > 0$ (because $N$ is closed), and there exists an $\eta \in N$ such that $\|\beta - \eta\| < \rho(\beta, N)/(1 - \epsilon)$ [by the definition of $\rho(\beta, N)$]. Set $\alpha = (\beta - \eta)/\|\beta - \eta\|$. Then $\|\alpha\| = 1$ and

$$\rho(\alpha, N) = \rho(\beta - \eta, N)/\|\beta - \eta\| = \rho(\beta, N)/\|\beta - \eta\| > \rho(\beta, N)(1 - \epsilon)/\rho(\beta, N) = 1 - \epsilon,$$

by (2), (3), and the definition of $\eta$. □

The reader may feel that we ought to be able to improve this lemma. Surely, all we have to do is choose the point in $N$ which is closest to $\beta$, and so obtain $\|\beta - \eta\| = \rho(\beta, N)$, giving finally a vector $\alpha$ such that $\|\alpha\| = 1$ and $\rho(\alpha, N) = 1$. However, this is a matter on which our intuition lets us down: if $N$ is infinite-dimensional, there may not be a closest point $\eta$! For example, as we shall see later in the exercises of Chapter 5, if $V$ is the space $\mathcal{C}([-1, 1])$ under the two-norm $\|f\| = (\int_{-1}^1 f^2)^{1/2}$, and if $N$ is the set of functions $g$ in $V$ such that $\int_0^1 g = 0$, then $N$ is a closed subspace for which we cannot find such a "best" $\alpha$. But if $N$ is finite-dimensional, we can always find such a point, and if $V$ is a Hilbert space (see Chapter 5), we can also.

EXERCISES

1.1 Write out the proof of Lemma 1.2.
1.2 Prove (2) and (3) of Theorem 1.1.
1.3 Prove (2) of Theorem 1.2.
1.4 It is not true that the intersection of a sequence of open sets is necessarily open. Find a counterexample in $\mathbb{R}$.
1.5 Prove the corollary of Lemma 1.4.
1.6 Prove that $p \in \bar{A}$ if and only if $\rho(p, A) = 0$.
1.7 Let $X$ and $Y$ be metric spaces, and let $f\colon X \to Y$ have the property that $f^{-1}[B]$ is open in $X$ whenever $B$ is open in $Y$. Prove that $f$ is continuous.
1.8 Show that $\rho(x, A) = \rho(x, \bar{A})$.
1.9 Show that $\rho(x, A)$ is a continuous function of $x$. (In fact, it is Lipschitz continuous.)
1.10 Invent metric spaces $S$ (by choosing subsets of $\mathbb{R}^2$) having the following properties:
1) $S$ has $n$ points.
2) $S$ is infinite and $\rho(x, y) \ge 1$ if $x \ne y$.
3) $S$ has a ball $B_1(a)$ such that the closed ball $\{x : \rho(x, a) \le 1\}$ is not the same as the closure of $B_1(a)$.
1.11 Prove that in a normed linear space a closed ball is the closure of the corresponding open ball.
1.12 Show that if $f\colon X \to Y$ and $g\colon Y \to Z$ are continuous (where $X$, $Y$, and $Z$ are metric spaces), then so is $g \circ f$.
1.13 Let $X$ and $Y$ be metric spaces. Define the notion of a product metric on $Z = X \times Y$. Define a 1-metric $\rho_1$ and a uniform metric $\rho_\infty$ on $Z$ (showing that they are metrics) in analogy with the one-norm and uniform norm on a product of normed linear spaces, and show that each is a product metric according to your definition above.
1.14 Do the same for a 2-metric $\rho_2$ on $Z = X \times Y$.
1.15 Let $X$ and $Y$ be metric spaces, and let $V$ be a normed linear space. Let $f\colon X \to \mathbb{R}$ and $g\colon Y \to V$ be continuous maps. Prove that $\langle x, y\rangle \mapsto f(x)g(y)$ is a continuous map from $X \times Y$ to $V$.

*2. TOPOLOGY

If $X$ is an arbitrary set and $\mathfrak{I}$ is any family of subsets of $X$ satisfying properties (1) through (3) in Theorem 1.1, then $\mathfrak{I}$ is called a topology on $X$. Theorem 1.1 thus asserts that the open subsets of a metric space $X$ form a topology on $X$. The subsequent definitions of interior, closed set, and closure were purely topological in the sense that they depended only on the topology $\mathfrak{I}$, as were Theorem 1.2 and the identity $(\bar{A})' = (A')^{\mathrm{int}}$. The study of the consequences of the existence of a topology is called general topology. On the other hand, the definitions of balls and continuity given earlier were metric definitions, and therefore part of metric space theory. In metric spaces, then, we have not only the topology, but also our $\epsilon$-definitions of continuity and balls and the spherical characterizations of closure and interior.

The reader may be surprised to be told now that although continuity and convergence were defined metrically, they also have purely topological characterizations and are therefore topological ideas. This is easy to see if one keeps in mind that in a metric space an open set is nothing but a union of balls. We have: $f$ is continuous at $p$ if and only if for every open set $A$ containing $f(p)$ there exists an open set $B$ containing $p$ such that $f[B] \subset A$.

This local condition involving behavior around a single point $p$ is more fluently rendered in terms of the notion of neighborhood. A set $A$ is a neighborhood of a point $p$ if $p \in A^{\mathrm{int}}$. Then we have: $f$ is continuous at $p$ if and only if for every neighborhood $N$ of $f(p)$, $f^{-1}[N]$ is a neighborhood of $p$.

Finally there is an elegant topological characterization of global continuity. Suppose that $S_1$ and $S_2$ are topological spaces. Then $f\colon S_1 \to S_2$ is continuous
(everywhere) if and only if $f^{-1}[A]$ is open whenever $A$ is open. Also, $f$ is continuous if and only if $f^{-1}[B]$ is closed whenever $B$ is closed. These conditions are not surprising in view of Lemma 1.4.

3. SEQUENTIAL CONVERGENCE

In addition to shifting to the more general point of view of metric space theory, we also want to add to our kit of tools the notion of sequential convergence, which the reader will probably remember from his previous encounter with the calculus. One of the principal reasons why metric space theory is simpler and more intuitive than general topology is that nearly all metric arguments can be presented in terms of sequential convergence, and in this chapter we shall partially make up for our previous neglect of this tool by using it constantly and in preference to other alternatives.

Definition. We say that the infinite sequence $\{x_n\}$ converges to the point $a$ if for every $\epsilon$ there is an $N$ such that $n > N \Rightarrow \rho(x_n, a) < \epsilon$. We also say that $x_n$ approaches $a$ as $n$ approaches (or tends to) infinity, and we call $a$ the limit of the sequence. In symbols we write $x_n \to a$ as $n \to \infty$, or $\lim_{n\to\infty} x_n = a$.

Formally, this definition is practically identical with our earlier definition of function convergence, and where there are parallel theorems the arguments that we use in one situation will generally hold almost verbatim in the other. Thus the proof of Lemma 1.1 of Chapter 3 can be altered slightly to give the following result.

Lemma 3.1. If $\{\xi_i\}$ and $\{\eta_i\}$ are two sequences in a normed linear space $V$, then $\xi_i \to \alpha$ and $\eta_i \to \beta \Rightarrow \xi_i + \eta_i \to \alpha + \beta$.

The main difference is that we now choose $N$ as $\max\{N_1, N_2\}$ instead of choosing $\delta$ as $\min\{\delta_1, \delta_2\}$. Similarly:

Lemma 3.2. If $\xi_i \to \alpha$ in $V$ and $x_i \to a$ in $\mathbb{R}$, then $x_i\xi_i \to a\alpha$.

As before, the definition begins with three quantifiers, $(\forall\epsilon)(\exists N)(\forall n)$. A somewhat more idiomatic form can be obtained by rephrasing the definition in terms of balls and the notion of "almost all $n$". We say that $P(n)$ is true for almost all $n$ if $P(n)$ is true for all but a finite number of integers $n$, or equivalently, if $(\exists N)(\forall n > N)P(n)$. Then we see that $\lim x_n = a$ if and only if every ball about $a$ contains almost all the $x_n$.

The following sequential characterization provides probably the most intuitive way of viewing the notion of closure and closed sets.
Theorem 3.1. A point x is in the closure Ā of a set A if and only if there is a sequence {xₙ} in A converging to x. Therefore, a set A is closed if and only if every convergent sequence lying in A has its limit in A.

Proof. If {xₙ} ⊂ A and xₙ → x, then every ball about x contains almost every xₙ, and so, in particular, intersects A. Thus x ∈ Ā by Lemma 1.3. Conversely, if x ∈ Ā, then every ball about x intersects A, and we can construct a sequence in A that converges to x by choosing xₙ as any point in B₁/ₙ(x) ∩ A. Since A is closed if and only if A = Ā, the second statement of the theorem follows from the first. □

There is also a sequential characterization of continuity which helps greatly in using the notion of continuity in a flexible way. Let X and Y be metric spaces, and let f be any function from X to Y.

Theorem 3.2. The function f is continuous at a if and only if, for any sequence {xₙ} in X, if xₙ → a, then f(xₙ) → f(a).

Proof. Suppose first that f is continuous at a, and let {xₙ} be any sequence converging to a. Then, given any ε, there is a δ such that ρ(x, a) < δ ⇒ ρ(f(x), f(a)) < ε, by the continuity of f at a, and for this δ there is an N such that n > N ⇒ ρ(xₙ, a) < δ, because xₙ → a. Combining these implications, we see that given ε we have found N so that n > N ⇒ ρ(f(xₙ), f(a)) < ε. That is, f(xₙ) → f(a).

Now suppose that f is not continuous at a. In considering such a negation it is important that implicit universal quantifiers be made explicit. Thus, formally, we are assuming that

    ¬(∀ε)(∃δ)(∀x)[ρ(x, a) < δ ⇒ ρ(f(x), f(a)) < ε],

that is, that

    (∃ε)(∀δ)(∃x)[ρ(x, a) < δ and ρ(f(x), f(a)) ≥ ε].

Such symbolization will not be necessary after the reader has had some practice in computing logical negations; the experienced thinker will intuit the correct negation without a formal calculation. In any event, we now have a fixed ε, and for each δ of the form δ = 1/n we can let xₙ be a corresponding x. We then have ρ(xₙ, a) < 1/n and ρ(f(xₙ), f(a)) ≥ ε for all n. The first inequality shows that xₙ → a; the second shows that f(xₙ) ↛ f(a). Thus, if f is not continuous at a, then the sequential condition is not satisfied. □

The above type of argument is used very frequently and almost amounts to an automatic proof procedure in the relevant situations. We want to prove, say, that (∀x)(∃y)(∀z)P(x, y, z). Arguing by contradiction, we suppose this false, so that (∃x)(∀y)(∃z)¬P(x, y, z). Then, instead of trying to use all numbers y, we let y run through some sequence converging to zero, such as {1/n}, and we choose
one corresponding z, zₙ, for each such y. We end up with ¬P(x, 1/n, zₙ) for the given x and all n, and we finish by arguing sequentially.

The reader will remember that two norms p and q on a vector space V are equivalent if and only if the identity map ξ ↦ ξ is continuous from ⟨V, p⟩ to ⟨V, q⟩ and also from ⟨V, q⟩ to ⟨V, p⟩. By virtue of the above theorem we now see that:

Theorem 3.3. The norms p and q are equivalent if and only if they yield exactly the same collection of convergent sequences.

Earlier we argued that a norm on a product V × W of two normed linear spaces should be equivalent to ‖⟨α, ξ⟩‖₁ = ‖α‖ + ‖ξ‖. Now with respect to this sum norm it is clear that a sequence ⟨αₙ, ξₙ⟩ in V × W converges to ⟨α, ξ⟩ if and only if αₙ → α in V and ξₙ → ξ in W. We now see (again by Theorem 3.2) that:

Theorem 3.4. A product norm on V × W is any norm with the property that ⟨αₙ, ξₙ⟩ → ⟨α, ξ⟩ in V × W if and only if αₙ → α in V and ξₙ → ξ in W.

EXERCISES

3.1 Prove that a convergent sequence in a metric space has a unique limit. That is, show that if xₙ → a and xₙ → b, then a = b.
3.2 Show that xₙ → x in the metric space X if and only if ρ(xₙ, x) → 0 in ℝ.
3.3 Prove that if xₙ → a in ℝ and xₙ ≥ 0 for all n, then a ≥ 0.
3.4 Prove that if xₙ → 0 in ℝ and |yₙ| ≤ xₙ for all n, then yₙ → 0.
3.5 Give detailed ε, N-proofs of Lemmas 3.1 and 3.2.
3.6 By applying Theorem 3.2, prove that if X is a metric space, V is a normed linear space, and F and G are continuous maps from X to V, then F + G is continuous. State and prove the similar theorem for a product FG.
3.7 Prove that continuity is preserved under composition by applying Theorem 3.2.
3.8 Show that (the range of) a sequence of points in a metric space is in general not a closed set. Show that it may be a closed set.
3.9 The fact that in a normed linear space the closure of an open ball includes the corresponding closed ball is practically trivial on the basis of Lemma 3.2 and Theorem 3.1. Show that this is so.
3.10 Show directly that if the maximum norm ‖⟨α, ξ⟩‖ = max {‖α‖, ‖ξ‖} is used on V = V₁ × V₂, then it is true that ⟨αₙ, ξₙ⟩ → ⟨α, ξ⟩ in V if and only if αₙ → α in V₁ and ξₙ → ξ in V₂.
3.11 Show that if ‖ ‖ is any increasing norm on ℝ² (see the remark after Theorem 4.3 of Chapter 3), then ρ(⟨x₁, y₁⟩, ⟨x₂, y₂⟩) = ‖⟨ρ(x₁, x₂), ρ(y₁, y₂)⟩‖ is a metric on the product X × Y of two metric spaces X and Y.
3.12 In the above exercise show that ⟨xₙ, yₙ⟩ → ⟨x, y⟩ in X × Y if and only if xₙ → x in X and yₙ → y in Y. This property would be our minimal requirement for a product metric.
3.13 Defining a product metric as above, use Theorem 3.2 to show that ⟨f, g⟩: S → X × Y is continuous if and only if f: S → X and g: S → Y are both continuous.
3.14 Let X, Y, and Z be metric spaces, and let f: X × Y → Z be a mapping such that f(x, y) is continuous in the variables separately. Suppose also that the continuity in x is uniform over y. That is, suppose that given ε and x₀, there is a δ such that ρ(x, x₀) < δ ⇒ ρ(f(x, y), f(x₀, y)) < ε for every value of y. Show that then f is continuous on X × Y.
3.15 Define the function f on the closed unit square [0, 1] × [0, 1] by f(0, 0) = 0,

    f(x, y) = xy/(x + y)²  if ⟨x, y⟩ ≠ ⟨0, 0⟩.

Then f is continuous as a function of x for each fixed value of y, and conversely. Show, however, that f is not continuous at the origin. That is, find a sequence ⟨xₙ, yₙ⟩ converging to ⟨0, 0⟩ in the plane such that f(xₙ, yₙ) does not converge to 0. This example shows that continuity of a function of two variables is a stronger property than continuity in each variable separately.

4. SEQUENTIAL COMPACTNESS

The reader is probably familiar with the idea of a subsequence. A subsequence of a sequence {xₙ} is a new sequence {yₘ} that is formed by selecting an infinite number, but generally not all, of the terms xₙ, and counting them off in the order of the selected indices. Thus, if n₁ is the first selected n, n₂ the next, and so on, and if we set yₘ = x_{nₘ}, then we obtain the subsequence

    x_{n₁}, x_{n₂}, x_{n₃}, …  or  y₁, y₂, y₃, … .

Strictly speaking, this counting off of the selected set of indices n is a sequence m ↦ nₘ from ℤ⁺ to ℤ⁺ which preserves order: nₘ₊₁ > nₘ for all m. And the subsequence m ↦ x_{nₘ} is the composition of the sequence n ↦ xₙ and the selector sequence. In order to avoid subscripts on subscripts, we may use the notation n(m) instead of nₘ. In either case we are being conventionally sloppy: we are using the same symbol 'n' as an integer-valued variable, when we write xₙ, and as the selector function, when we write n(m) or nₘ. This is one of the standard
notational ambiguities which we tolerate in elementary calculus, because the cure is considered worse than the disease. We could say: let f be a sequence, i.e., a function from ℤ⁺ to ℝ. Then a subsequence of f is a composition f ∘ g, where g is a mapping from ℤ⁺ to ℤ⁺ such that g(m + 1) > g(m) for all m.

If you have grasped the idea of subsequence, you should be able to see that any infinite sequence of 0's and 1's, say {0, 1, 0, 0, 1, 0, 0, 0, 1, …}, can be obtained as a subsequence of {0, 1, 0, 1, 0, 1, …, [1 + (−1)ⁿ]/2, …}.

If xₙ → a, then it should be clear that every subsequence also converges to a. We leave the details as an exercise. On the other hand, if the sequence {xₙ} does not converge to a, then there is an ε such that for every N there is some larger n at which ρ(xₙ, a) ≥ ε. Now we can choose such an n for every N, taking care that n_{N+1} > n_N, and thus choose a subsequence all of whose terms are at a distance at least ε from a. Then this sequence has no subsequence converging to a. Thus, if {xₙ} does not converge to a, then it has a subsequence no (sub)subsequence of which converges to a. Therefore:

Lemma 4.1. If the sequence {xₙ} and the point a are such that every subsequence of {xₙ} has itself a subsequence that converges to a, then xₙ → a.

This is a wild and unlikely sounding lemma, but we shall use it to prove a most important theorem (Theorem 4.2).

Definition. A subset A of a metric space is sequentially compact if every sequence in A has a subsequence that converges to a point of A.

Here, so to speak, we create convergence out of nothing. One would expect a compact set to have very powerful properties, and perhaps suspect that there aren't many such sets. We shall soon see, however, that every bounded closed subset of ℝⁿ is compact, and it is in the theory of finite-dimensional spaces that we most frequently use this notion. Sequential compactness in infinite-dimensional spaces is a much rarer phenomenon, but when it does occur it is very important, as we shall see in our brief look at Sturm–Liouville theory in Chapter 6.

We begin with a few simple but important general results.

Lemma 4.2. If A is a sequentially compact subset of a metric space S, then A is closed and bounded.

Proof. Suppose that {xₙ} ⊂ A and that xₙ → b. By the compactness of A there exists a subsequence {x_{n(i)}}ᵢ that converges to a point a ∈ A. But a subsequence of a convergent sequence converges to the same limit. Therefore, a = b and b ∈ A. Thus A is closed.

Boundedness here will mean lying in some ball about a given point b. If A is not bounded, for each n there exists a point xₙ ∈ A such that ρ(xₙ, b) > n. By compactness a subsequence {x_{n(i)}}ᵢ converges to a point a ∈ A, and ρ(x_{n(i)}, b) → ρ(a, b). This clearly contradicts ρ(x_{n(i)}, b) > n(i) ≥ i. □
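To make the definition concrete, here is a small worked example (added for illustration; the choice of sets is ours). By the results proved below, [0, 1] is sequentially compact, while the half-open interval (0, 1] is not, as the following computation in the metric ρ(x, y) = |x − y| shows:

\[
x_n = \tfrac{1}{n} \in (0,1], \qquad x_{n(i)} \to L \;\Longrightarrow\; L = \lim_{i\to\infty} \tfrac{1}{n(i)} = 0 \notin (0,1],
\]

so every subsequence of {xₙ} converges to 0 and no subsequence converges to a point of (0, 1]. Note that (0, 1] is bounded but not closed, consistent with Lemma 4.2.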
Continuous functions carry compact sets into compact sets. The proof of the following result is left as an exercise.

Theorem 4.1. If f is continuous and A is a sequentially compact subset of its domain, then f[A] is sequentially compact.

A nonempty compact set A ⊂ ℝ contains maximum and minimum elements. This is because lub A is the limit of a sequence in A, and hence belongs to A itself, since A is closed. Combining this fact with the above theorem, we obtain the following well-known corollary.

Corollary. If f is a continuous real-valued function and dom(f) is nonempty and sequentially compact, then f is bounded and assumes maximum and minimum values.

The following very useful result is related to the above theorem.

Theorem 4.2. If f is continuous and bijective and dom(f) is sequentially compact, then f⁻¹ is continuous.

Proof. We have to show that if yₙ → y in the range of f, and if xₙ = f⁻¹(yₙ) and x = f⁻¹(y), then xₙ → x. It is sufficient to show that every subsequence {x_{n(i)}}ᵢ has itself a subsequence converging to x (by Lemma 4.1). But, since dom(f) is compact, there is a subsequence {x_{n(i(j))}}ⱼ converging to some z, and the continuity of f implies that f(z) = limⱼ→∞ f(x_{n(i(j))}) = limⱼ→∞ y_{n(i(j))} = y. Therefore, z = f⁻¹(y) = x, which is what we had to prove. Thus f⁻¹ is continuous. □

We now take up the problem of showing that bounded closed sets in ℝⁿ are compact. We first prove it for ℝ itself and then give an inductive argument for ℝⁿ. A sequence {xₙ} ⊂ ℝ is said to be increasing if xₙ ≤ xₙ₊₁ for all n. It is strictly increasing if xₙ < xₙ₊₁ for all n. The notions of a decreasing sequence and a strictly decreasing sequence are obvious. A sequence which is either increasing or decreasing is said to be monotone. The relevance of these notions here lies in the following two lemmas.

Lemma 4.3. A bounded monotone sequence in ℝ is convergent.

Proof. Suppose that {xₙ} is increasing and bounded above. Let l be the least upper bound of its range. That is, xₙ ≤ l for all n, but for every ε, l − ε is not an upper bound, and so l − ε < x_N for some N. Then n > N ⇒ l − ε < x_N ≤ xₙ ≤ l, and so |xₙ − l| < ε. That is, xₙ → l as n → ∞. □

Lemma 4.4. Any sequence in ℝ has a monotone subsequence.

Proof. Call xₙ a peak term if it is greater than or equal to all later terms. If there are infinitely many peak terms, then they obviously form a decreasing
subsequence. On the other hand, if there are only finitely many peak terms, then there is a last one x_{n₀} (or none at all), and then every later term is strictly less than some other still later term. We choose any n₁ greater than n₀, and then we can choose n₂ > n₁ so that x_{n₁} < x_{n₂}, etc. Therefore, in this case we can choose a strictly increasing subsequence. We have thus shown that any sequence {xₙ} in ℝ has either a decreasing subsequence or a strictly increasing subsequence. □

Putting these two lemmas together, we have:

Theorem 4.3. Every bounded sequence in ℝ has a convergent subsequence.

Now we can generalize to ℝⁿ by induction.

Theorem 4.4. Every bounded sequence in ℝⁿ has a convergent subsequence (using any product norm, say ‖ ‖₁).

Proof. The above theorem is the case n = 1. Suppose then that the theorem is true for n − 1, and let {xᵐ}ₘ be a bounded sequence in ℝⁿ. Thinking of ℝⁿ as ℝⁿ⁻¹ × ℝ, we have xᵐ = ⟨yᵐ, zᵐ⟩, and {yᵐ}ₘ is bounded in ℝⁿ⁻¹, because if x = ⟨y, z⟩, then ‖x‖₁ = ‖y‖₁ + |z| ≥ ‖y‖₁. Therefore, there is a subsequence {y^{n(i)}}ᵢ converging to some y in ℝⁿ⁻¹, by the inductive hypothesis. Since {z^{n(i)}} is bounded in ℝ, it has a subsequence {z^{n(i(p))}}ₚ converging to some z in ℝ. Of course, the corresponding subsubsequence {y^{n(i(p))}}ₚ still converges to y in ℝⁿ⁻¹, and then {x^{n(i(p))}}ₚ converges to x = ⟨y, z⟩ in ℝⁿ = ℝⁿ⁻¹ × ℝ, since its two component sequences now converge to y and z, respectively. We have thus found a convergent subsequence of {xᵐ}. □

Theorem 4.5. If A is a bounded closed subset of ℝⁿ, then A is sequentially compact (in any product norm).

Proof. If {xₙ} ⊂ A, then there is a subsequence {x_{n(i)}}ᵢ converging to some x in ℝⁿ, by Theorem 4.4, and x is in A, since A is closed. Thus A is compact. □

We can now fill in one of the minor gaps in the last chapter.

Theorem 4.6. All norms on ℝⁿ are equivalent.

Proof. It is sufficient to prove that an arbitrary norm ‖ ‖ is equivalent to ‖ ‖₁. Setting a = max {‖δⁱ‖}₁ⁿ, we have ‖x‖ = ‖Σ₁ⁿ xᵢδⁱ‖ ≤ Σ₁ⁿ |xᵢ| ‖δⁱ‖ ≤ a‖x‖₁, so one of our inequalities is trivial. We also have |‖x‖ − ‖y‖| ≤ ‖x − y‖ ≤ a‖x − y‖₁, so ‖x‖ is a continuous function on ℝⁿ with respect to the one-norm. Now the unit one-sphere S = {x : ‖x‖₁ = 1} is closed and bounded and so compact (in the one-norm). The restriction of the continuous function ‖x‖ to this compact set S has a minimum value m, and m cannot be zero because S does not contain the zero vector. We thus have ‖x‖ ≥ m‖x‖₁ on S, and so ‖x‖ ≥ m‖x‖₁ on ℝⁿ, by homogeneity. Altogether we have found positive constants a and m such that m‖ ‖₁ ≤ ‖ ‖ ≤ a‖ ‖₁. □
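As a sanity check on Theorem 4.6 (an added illustration; the constants are computed by us), take for ‖ ‖ the two-norm on ℝ². Here explicit constants can be exhibited:

\[
\|x\|_2 = \bigl(x_1^2 + x_2^2\bigr)^{1/2} \le |x_1| + |x_2| = \|x\|_1 \le \sqrt{2}\,\|x\|_2 ,
\]

where the second inequality follows by squaring, since 2|x₁||x₂| ≤ x₁² + x₂². Thus m = 1/√2 and a = 1 work in the conclusion m‖ ‖₁ ≤ ‖ ‖₂ ≤ a‖ ‖₁.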
Composing with a coordinate isomorphism, we see that all norms on any finite-dimensional vector space are equivalent.

Corollary. If M is a finite-dimensional subspace of the normed linear space V, then M is a closed subspace of V.

Proof. Suppose that {ξₙ} ⊂ M and ξₙ → α ∈ V. We have to show that α is in M. Now {ξₙ} is a bounded subset of M, and its closure in M is therefore sequentially compact, by the theorem. Therefore, some subsequence converges to a point β in M as well as to α, and so α = β ∈ M. □

EXERCISES

4.1 Prove by induction that if f: ℤ⁺ → ℤ⁺ is such that f(n + 1) > f(n) for all n, then f(n) ≥ n for all n.
4.2 Prove carefully that if xₙ → a as n → ∞, then x_{n(m)} → a as m → ∞ for any subsequence. The above exercise is useful in this proof.
4.3 Prove that if {xₙ} is an increasing sequence in ℝ (xₙ₊₁ ≥ xₙ for all n), and if {xₙ} has a convergent subsequence, then {xₙ} converges.
4.4 Give a more detailed version of the argument that if the sequence {xₙ} does not converge to a, then there is an ε and a subsequence {x_{n(m)}}ₘ such that ρ(x_{n(m)}, a) ≥ ε for all m.
4.5 Find a sequence in ℝ having no convergent subsequence.
4.6 Find a nonconvergent sequence in ℝ such that the set of limit points of convergent subsequences consists exactly of the number 1.
4.7 Show that there is a sequence {xₙ} in [0, 1] such that for any y ∈ [0, 1] there is a subsequence {x_{n(m)}} converging to y.
4.8 Show that the set of limits of convergent subsequences of a sequence {xₙ} in a metric space X is a closed subset of X.
4.9 Prove Theorem 4.1.
4.10 Prove that the Cartesian product of two sequentially compact metric spaces is sequentially compact. (The proof is essentially in the text.)
4.11 A metric space is boundedly compact if every closed bounded set is sequentially compact. Prove that the Cartesian product of two boundedly compact metric spaces is boundedly compact (using, say, the maximum metric on the product space).
4.12 Prove that the sum A + B of two sequentially compact subsets of a normed linear space is sequentially compact.
4.13 Prove that the sum A + B of a closed set and a compact set is closed.
4.14 Show by an example in ℝ that the sum of two closed sets need not be closed.
4.15 Let {Cₙ} be a decreasing sequence (Cₙ₊₁ ⊂ Cₙ for all n) of nonempty closed subsets of a sequentially compact metric space S. Prove that ⋂ₙ Cₙ is nonempty.
4.16 Give an example of a decreasing sequence {Cₙ} of nonempty closed subsets of a metric space such that ⋂ₙ Cₙ = ∅.
4.17 Suppose the metric space S has the property that every decreasing sequence {Cₙ} of nonempty closed subsets of S has nonempty intersection. Prove that then S must be sequentially compact. [Hint: Given any sequence {xᵢ} ⊂ S, let Cₙ be the closure of {xᵢ : i ≥ n}.]
4.18 Let A be a sequentially compact subset of a normed linear space V, and let B be obtained from A by drawing all line segments from points of A to the origin (that is, B = {tα : α ∈ A and t ∈ [0, 1]}). Prove that B is compact.
4.19 Show by applying a compactness argument to Lemma 1.5 that if N is a proper closed subspace of a finite-dimensional vector space V, then there exists α in V such that ‖α‖ = ρ(α, N) = 1.

5. COMPACTNESS AND UNIFORMITY

The word 'uniform' is frequently used as a qualifying adjective in mathematics. Roughly speaking, it concerns a "point" property P(y) which may or may not hold at each point y in a domain A and whose definition involves an existential quantifier. A typical form for P(y) is (∀c)(∃d)Q(y, c, d). Thus, if P(y) is 'f is continuous at y', then P(y) has the form (∀ε)(∃δ)Q(y, ε, δ). The property holds on A if it holds for all y in A, that is, if

    (∀y ∈ A)[(∀c)(∃d)Q(y, c, d)].

Here d will, in general, depend both on y and c; if either y or c is changed, the corresponding d may have to be changed. Thus δ in the definition of continuity depends both on ε and on the point y at which continuity is being asserted. The property is said to hold uniformly on A, or uniformly in y, if a value d can be found that is independent of y (but still dependent on c). Thus the property holds uniformly in y if

    (∀c)(∃d)(∀y ∈ A)Q(y, c, d);

the uniformity of the property is expressed in the reversal of the order of the quantifiers (∀y ∈ A) and (∃d). Thus f is uniformly continuous on A if

    (∀ε)(∃δ)(∀y, z ∈ A)[ρ(y, z) < δ ⇒ ρ(f(y), f(z)) < ε].

Now δ is independent of the point at which continuity is being asserted, but still dependent on ε, of course. We saw in Section 14 of the last chapter how much more powerful the point condition of continuity becomes when it holds uniformly. In the remainder of this section we shall discuss some other uniform notions, and shall see that the uniform property is often implied by the point property if the domain over which it holds is sequentially compact.

The formal statement forms we have examined above show clearly the distinction between uniformity and nonuniformity. However, in writing an argument, we would generally follow our more idiomatic practice of dropping
the inside universal quantifier. For example, a sequence of functions {fₙ} ⊂ W^A converges pointwise to f: A → W if it converges to f at every point p in A, that is, if for every point p in A and for every ε there is an N such that n > N ⇒ ρ(fₙ(p), f(p)) ≤ ε. The sequence converges uniformly on A if an N exists that is independent of p, that is, if for every ε there is an N such that n > N ⇒ ρ(fₙ(p), f(p)) ≤ ε for every p in A.

When ρ(ξ, η) = ‖ξ − η‖, saying that ρ(fₙ(p), f(p)) ≤ ε for all p is the same as saying that ‖fₙ − f‖∞ ≤ ε. Thus fₙ → f uniformly if and only if ‖fₙ − f‖∞ → 0; this is why the norm ‖f‖∞ is called the uniform norm.

Pointwise convergence does not imply uniform convergence. Thus fₙ(x) = xⁿ on A = (0, 1) converges pointwise to the zero function but does not converge uniformly. Nor does continuity on A imply uniform continuity. The function f(x) = 1/x is continuous on (0, 1) but is not uniformly continuous. The function sin(1/x) is continuous and bounded on (0, 1) but is not uniformly continuous. Compactness changes the latter situation, however.

Theorem 5.1. If f is continuous on A and A is compact, then f is uniformly continuous on A.

Proof. This is one of our "automatic" negation proofs. Uniform continuity (UC) is the property

    (∀ε > 0)(∃δ > 0)(∀x, y ∈ A)[ρ(x, y) < δ ⇒ ρ(f(x), f(y)) < ε].

Therefore,

    ¬UC ⟺ (∃ε)(∀δ)(∃x, y)[ρ(x, y) < δ and ρ(f(x), f(y)) ≥ ε].

Take δ = 1/n, with corresponding xₙ and yₙ. Thus, for all n, ρ(xₙ, yₙ) < 1/n and ρ(f(xₙ), f(yₙ)) ≥ ε, where ε is a fixed positive number. Now {xₙ} has a convergent subsequence, say x_{n(i)} → x, by the compactness of A. Since ρ(y_{n(i)}, x_{n(i)}) < 1/i, we also have y_{n(i)} → x. By the continuity of f at x,

    ρ(f(x_{n(i)}), f(y_{n(i)})) ≤ ρ(f(x_{n(i)}), f(x)) + ρ(f(x), f(y_{n(i)})) → 0,

which contradicts ρ(f(x_{n(i)}), f(y_{n(i)})) ≥ ε. This completes the proof by negation. □

The compactness of A does not, however, automatically convert the pointwise convergence of a sequence of functions on A into uniform convergence. The "piecewise linear" functions fₙ: [0, 1] → [0, 1] defined by the graph shown in Fig. 4.1 converge pointwise to zero on the compact domain [0, 1], but the convergence is not uniform. (However, see Exercise 5.4.)
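To see quantitatively why fₙ(x) = xⁿ fails to converge uniformly on (0, 1) (a supplementary computation of ours), note that the uniform norm of fₙ does not tend to zero:

\[
\|f_n - 0\|_\infty = \sup_{0<x<1} x^n = 1 \quad \text{for every } n,
\]

since xⁿ → 1 as x → 1⁻. Indeed, taking xₙ = 2^{-1/n} gives fₙ(xₙ) = 1/2 for all n, so no single N can work for ε < 1/2 at every point at once.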
[Fig. 4.1: graph of fₙ, a piecewise linear peak of height 1 rising at 1/n and returning to 0 at 2/n. Fig. 4.2: the "peak" function used in Section 5.]

We pointed out earlier that the distance between a pair of disjoint closed sets may be zero. However, if one of the closed sets is compact, then the distance must be positive.

Theorem 5.2. If A and C are disjoint nonempty closed sets, one of which is compact, then ρ(A, C) > 0.

Proof. The proof is by automatic contradiction, and is left to the reader.

This result is again a uniformity condition. Saying that a set A is disjoint from a closed set C is saying that (∀x ∈ A)(∃r > 0)(B_r(x) ∩ C = ∅). Saying that ρ(A, C) > 0 is saying that (∃r > 0)(∀x ∈ A) … .

As a last consequence of sequential compactness, we shall establish a very powerful property which is taken as the definition of compactness in general topology. First, however, we need some preparatory work.

If A is a subset of a metric space S, the r-neighborhood of A, B_r[A], is simply the union of all the balls of radius r about points of A:

    B_r[A] = ∪{B_r(a) : a ∈ A} = {x : (∃a ∈ A)(ρ(x, a) < r)}.

A subset A ⊂ S is r-dense in S if S ⊂ B_r[A], that is, if each point of S is closer than r to some point of A. A subset A of a metric space S is dense in S if Ā = S. This is the same as saying that for every point p in S there are points of A arbitrarily close to p. The set ℚ of all rational numbers is a dense subset of the real number system ℝ, because any irrational real number x can be arbitrarily closely approximated by rational numbers. Since we do arithmetic in decimal notation, it is customary to use decimal approximations, and if 0 < x < 1 and the decimal expansion of x is x = Σ₁^∞ aₙ/10ⁿ, where each aₙ is an integer and 0 ≤ aₙ < 10, then Σ₁^N aₙ/10ⁿ is a rational number differing from x by less than 10⁻ᴺ. Note that A is a dense subset of B if and only if A is r-dense in B for every positive r.

A set B is said to be totally bounded if for every positive r there is a finite set which is r-dense in B. Thus for every positive r the set B can be covered by a finite number of balls of radius r. For example, the n − 1 numbers {i/n}₁^{n−1} are (1/n)-dense in the open interval (0, 1) for each n, and so (0, 1) is totally bounded.
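For instance (an added numerical illustration of the decimal remark; the number is our choice), with x = 0.14159… and N = 3, the rational number Σ₁³ aₙ/10ⁿ = 1/10 + 4/100 + 1/1000 = 0.141 satisfies

\[
|x - 0.141| = 0.00059\ldots < 10^{-3},
\]

so the three-place decimals form a finite 10⁻³-dense subset of (0, 1), in agreement with the total boundedness just asserted.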
Total boundedness is a much stronger property than boundedness, as the following lemma shows.

Lemma 5.1. If the normed linear space V is infinite-dimensional, then its closed unit ball B₁ = {ξ : ‖ξ‖ ≤ 1} cannot be covered by a finite number of balls of radius ¼.

Proof. Since V is not finite-dimensional, we can choose a sequence {αₙ} such that αₙ₊₁ is not in the linear span Mₙ of {α₁, …, αₙ}, for each n. Since Mₙ is closed in V, by the corollary of Theorem 4.6, we can apply Lemma 1.5 to find a vector ξₙ in Mₙ such that ‖ξₙ‖ = 1 and ρ(ξₙ, Mₙ₋₁) > ½ for all n > 1. We take ξ₁ = α₁/‖α₁‖, and we have a sequence {ξₙ} ⊂ B₁ such that ‖ξₘ − ξₙ‖ > ½ if m ≠ n. Then no ball of radius ¼ can contain more than one ξₙ, proving the lemma. □

For a concrete example, let V be C([0, 1]), and let fₙ be the "peak" function sketched in Fig. 4.2, where the three points on the base are 1/(2n + 2), 1/(2n + 1), and 1/2n. Then fₙ₊₁ is "disjoint" from fₙ (that is, fₙ₊₁fₙ = 0), and we have ‖fₙ‖∞ = 1 for all n and ‖fₙ − fₘ‖∞ = 1 if n ≠ m. Thus no ball of radius ½ can contain more than one of the functions fₙ, and accordingly the closed unit ball in V cannot be covered by a finite number of balls of radius ½.

Lemma 5.2. Every sequentially compact set A is totally bounded.

Proof. If A is not totally bounded, then there exists an r such that no finite subset F is r-dense in A. We can then define a sequence {pₙ} inductively by taking p₁ as any point of A, p₂ as any point of A not in B_r(p₁), and pₙ as any point of A not in B_r[{p₁, …, pₙ₋₁}] = ∪₁^{n−1} B_r(pᵢ). Then {pₙ} is a sequence in A such that ρ(pᵢ, pⱼ) ≥ r for all i ≠ j. But this sequence can have no convergent subsequence. Thus, if A is not totally bounded, then A is not sequentially compact, proving the lemma. □

Corollary. A normed linear space V is finite-dimensional if and only if its closed unit ball is sequentially compact.

Proof. This follows from Theorem 4.4 in one direction and from the above two lemmas in the other direction. □

Lemma 5.3. Suppose that A is sequentially compact and that {Eᵢ : i ∈ I} is an open covering of A (that is, {Eᵢ} is a family of open sets and A ⊂ ∪ᵢEᵢ). Then there exists an r > 0 with the property that for every point p in A the ball B_r(p) is included in some Eᵢ.

Proof. Otherwise, for every r there is a point p in A such that B_r(p) is not a subset of any Eᵢ. Take r = 1/n, with corresponding sequence {pₙ}. Thus B_{1/n}(pₙ) is not a subset of any Eᵢ. Since A is sequentially compact, {pₙ} has a convergent subsequence, p_{n(m)} → p as m → ∞. Since {Eᵢ} covers A, some Eⱼ contains p,
and then B_ε(p) ⊂ Eⱼ for some ε > 0, since Eⱼ is open. Taking m large enough so that 1/m < ε/2 and also ρ(p_{n(m)}, p) < ε/2, we have

    B_{1/n(m)}(p_{n(m)}) ⊂ B_ε(p) ⊂ Eⱼ,

contradicting the fact that B_{1/n}(pₙ) is not a subset of any Eᵢ. The lemma has thus been proved. □

Theorem 5.3. If ℱ is an open covering of a sequentially compact set A, then some finite subfamily of ℱ covers A.

Proof. By the lemma immediately above there exists an r > 0 such that for every p in A the ball B_r(p) lies entirely in some set of ℱ, and by the first lemma there exist p₁, …, pₙ in A such that A ⊂ ∪₁ⁿ B_r(pᵢ). Taking corresponding sets Eᵢ in ℱ such that B_r(pᵢ) ⊂ Eᵢ for i = 1, …, n, we clearly have A ⊂ ∪₁ⁿ Eᵢ. □

In general topology, a set A such that every open covering of A includes a finite covering is said to be compact or to have the Heine–Borel property. The above theorem says that in a metric space every sequentially compact set is compact. We shall see below that the reverse implication also holds, so that the two notions are in fact equivalent in a metric space.

Theorem 5.4. If A is a compact metric space, then A is sequentially compact.

Proof. Let {xₙ} be any sequence in A, and let ℱ be the collection of open balls B such that B contains only finitely many xᵢ. If ℱ were to cover A, then by compactness A would be the union of finitely many balls in ℱ, and this would clearly imply that the whole of A contains only finitely many xᵢ, contradicting the fact that {xᵢ} is an infinite sequence. Therefore, ℱ does not cover A, and so there is a point x in A such that every ball about x contains infinitely many of the xᵢ. More precisely, every ball about x contains xᵢ for infinitely many indices i. It can now be safely left to the reader to see that a subsequence of {xₙ} converges to x. □

EXERCISES

5.1 Show that fₙ(x) = xⁿ does not converge uniformly on (0, 1).
5.2 Show that f(x) = 1/x is not uniformly continuous on (0, 1).
5.3 Define the notion of a function K: X × Y → Z being uniformly Lipschitz in its second variable over its first variable.
5.4 Let S be a sequentially compact metric space, and let {fₙ} be a sequence of continuous real-valued functions on S that decreases pointwise to zero (that is, {fₙ(p)} is a decreasing sequence in ℝ and fₙ(p) → 0 as n → ∞ for each p in S). Prove that the convergence is uniform. (Try to apply Exercise 4.15.)
5.5 Restate the corollaries of Theorems 15.1 and 15.2 of Chapter 3, employing the weaker hypotheses that suffice by virtue of Theorem 5.1 of the present section.
5.6 Prove Theorem 5.2.
5.7 Prove that if A is an r-dense subset of a set X in a normed linear space V, and if B is an s-dense subset of a set Y ⊂ V, then A + B is (r + s)-dense in X + Y. Conclude that the sum of two totally bounded subsets of V is totally bounded.
5.8 Suppose that the n points {pᵢ}₁ⁿ are r-dense in a metric space X. Let A be any subset of X. Show that A has a subset of at most n points that is 2r-dense in A. Conclude that any subset of a totally bounded metric space is itself totally bounded.
5.9 Prove that the Cartesian product of two totally bounded metric spaces is totally bounded.
5.10 Show that if a metric space X has a dense subset A that is totally bounded, then X is totally bounded.
5.11 Show that if two continuous mappings f and g from a metric space X to a metric space Y are equal on a dense subset of X, then they are equal everywhere.
5.12 Write out in explicit quantified form involving the existence of balls the statement that the interiors of the sets {Aᵢ} cover the metric space A. Then show that the conclusion of Lemma 5.3 is another uniformity assertion.
5.13 Reprove the theorem that a continuous function on a compact domain is bounded on the basis of Theorem 5.3.
5.14 Reprove the theorem that a continuous function on a compact domain is uniformly continuous from Theorem 5.3.

6. EQUICONTINUITY

The application of sequential compactness that we shall make in an infinite-dimensional context revolves around the notion of an equicontinuous family of functions. If A and B are metric spaces, then a subset ℱ ⊂ B^A is said to be equicontinuous at p₀ in A if all the functions of ℱ are continuous at p₀ and if, given ε, there is a δ which works for them all, i.e., such that ρ(p, p₀) < δ ⇒ ρ(f(p), f(p₀)) < ε for every f in ℱ. The family ℱ is uniformly equicontinuous if δ is also independent of p₀, and so is dependent only on ε. Our quantifier string is thus

    (∀ε)(∃δ)(∀p, q ∈ A)(∀f ∈ ℱ).

For example, given m > 0, let ℱ be a collection of functions f from (0, 1) to (0, 1) such that f′ exists and |f′| ≤ m on (0, 1). Then |f(x) − f(y)| ≤ m|x − y|, by the ordinary mean-value theorem. Therefore, given any ε, we can take δ = ε/m and have

    |x − y| < δ ⇒ |f(x) − f(y)| < ε

for all x, y ∈ (0, 1) and all f ∈ ℱ. The collection ℱ is thus uniformly equicontinuous.

Theorem 6.1. If A and B are totally bounded metric spaces, and if ℱ is a uniformly equicontinuous subfamily of B^A, then ℱ is totally bounded in the uniform metric.
Proof. Given ε > 0, choose δ so that for all f in ℱ and all p₁, p₂ in A, ρ(p₁, p₂) < δ ⇒ ρ(f(p₁), f(p₂)) < ε/4. Let D be a finite subset of A which is δ-dense in A, and let E be a finite subset of B which is (ε/4)-dense in B. Let G be the set E^D of all functions on D into E. G is of course finite; in fact, #G = nᵐ, where m = #D and n = #E. Finally, for each g ∈ G let ℱ_g be the set of all functions f ∈ ℱ such that ρ(f(p), g(p)) < ε/4 for every p ∈ D.

We claim that the collections ℱ_g cover ℱ and that each ℱ_g has diameter at most ε. We will then obtain a finite ε-dense subset of ℱ by choosing one function from each nonempty ℱ_g, and the theorem will be proved.

To show that every f ∈ ℱ is in some ℱ_g, we simply construct a suitable g. For each p in D there exists a q in E whose distance from f(p) is less than ε/4. If we choose one such q in E for each p in D, we have a function g in G such that f ∈ ℱ_g.

The final thing we have to show is that if f, h ∈ ℱ_g, then ρ(f, h) ≤ ε. Since ρ(h, g) < ε/4 on D and ρ(f, g) < ε/4 on D, it follows that ρ(f(p), h(p)) < ε/2 for every p ∈ D. Then for any p′ ∈ A we have only to choose p ∈ D such that ρ(p′, p) < δ, and we have

    ρ(f(p′), h(p′)) ≤ ρ(f(p′), f(p)) + ρ(f(p), h(p)) + ρ(h(p), h(p′)) ≤ ε/4 + ε/2 + ε/4 = ε. □

The above proof is a good example of a mathematical argument that is completely elementary but hard. When referring to mathematical reasoning, the words 'sophisticated' and 'difficult' are by no means equivalent.

7. COMPLETENESS

If xₙ → a as n → ∞, then the terms xₙ obviously get close to each other as n gets large. On the other hand, if {xₙ} is a sequence whose terms get arbitrarily close to each other as n → ∞, then {xₙ} clearly ought to converge to a limit. It may not, however; the desired limit point may be missing from the space. If a metric space S is such that every sequence which ought to converge actually does converge, then we say that S is complete. We now make this notion precise.

Definition. {xₙ} is a Cauchy sequence if for every ε there is an N such that m > N and n > N ⇒ ρ(xₘ, xₙ) < ε.

Lemma 7.1. If {xₙ} is convergent, then {xₙ} is Cauchy.

Proof. Given ε, we choose N such that n > N ⇒ ρ(xₙ, a) < ε/2, where a is the limit of the sequence. Then if m and n are both greater than N, we have ρ(xₘ, xₙ) ≤ ρ(xₘ, a) + ρ(a, xₙ) < ε/2 + ε/2 = ε. □
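Before continuing, here is a small example of a Cauchy sequence whose would-be limit is missing (an added illustration; the space is our choice). In the metric space S = (0, 1] with ρ(x, y) = |x − y|, take xₙ = 1/n. Given ε, let N = ⌈2/ε⌉; then

\[
m, n > N \;\Longrightarrow\; \Bigl|\tfrac{1}{m} - \tfrac{1}{n}\Bigr| \le \tfrac{1}{m} + \tfrac{1}{n} < \tfrac{2}{N} \le \varepsilon,
\]

so {xₙ} is Cauchy in S; but its only possible limit, 0, is not a point of S, so S fails the completeness property defined below.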
Lemma 7.2. If {xₙ} is Cauchy, and if a subsequence is convergent, then {xₙ} itself converges.

Proof. Suppose that x_{n(i)} → a as i → ∞. Given ε, we take N so that m, n > N ⇒ ρ(xₙ, xₘ) < ε. Because x_{n(i)} → a as i → ∞, we can choose an i such that n(i) > N and ρ(x_{n(i)}, a) < ε. Thus if m > N, we have

    ρ(xₘ, a) ≤ ρ(xₘ, x_{n(i)}) + ρ(x_{n(i)}, a) < 2ε,

and so xₘ → a. □

Actually, of course, if m, n > N ⇒ ρ(xₘ, xₙ) < ε, and if xₙ → a, then for any m > N it is true that ρ(xₘ, a) ≤ ε. Why?

Lemma 7.3. If A and B are metric spaces, and if T is a Lipschitz mapping from A to B, then T carries Cauchy sequences in A into Cauchy sequences in B. This is true in particular if A and B are normed linear spaces and T is an element of Hom(A, B).

Proof. Let {xₙ} be a Cauchy sequence in A, and set yₙ = T(xₙ). Given ε, choose N so that m, n > N ⇒ ρ(xₘ, xₙ) < ε/C, where C is a Lipschitz constant for T. Then

    m, n > N ⇒ ρ(yₘ, yₙ) = ρ(T(xₘ), T(xₙ)) ≤ Cρ(xₘ, xₙ) < Cε/C = ε. □

This lemma has a substantial generalization, as follows.

Theorem 7.1. If A and B are metric spaces, {xₙ} is Cauchy in A, and F: A → B is uniformly continuous, then {F(xₙ)} is Cauchy in B.

Proof. The proof is left as an exercise. The student should try to acquire a good intuitive feel for the truth of these lemmas, after which the technical proofs become more or less obvious.

Definition. A metric space A is complete if every Cauchy sequence in A converges to a limit in A. A complete normed linear space is called a Banach space.

We are now going to list some important examples of Banach spaces. In each case a proof is necessary, so the list becomes a collection of theorems.

Theorem 7.2. ℝ is complete.

Proof. Let {xₙ} be Cauchy in ℝ. Then {xₙ} is bounded (why?) and so, by Theorem 4.3, has a convergent subsequence. Lemma 7.2 then implies that {xₙ} is convergent. □

Theorem 7.3. If A is a complete metric space, and if f is a continuous bijective mapping from A to a metric space B such that f⁻¹ is Lipschitz continuous, then B is complete. In particular, if V is a Banach space, and if T in Hom(V, W) is invertible, then W is a Banach space.
Proof. Suppose that {yₙ} is a Cauchy sequence in B, and set xᵢ = f⁻¹(yᵢ) for all i. Then {xᵢ} is Cauchy in A, by Lemma 7.3, and so converges to some x in A, since A is complete. But then yₙ = f(xₙ) → f(x), because f is continuous. Thus every Cauchy sequence in B is convergent, and B is complete. □

The Banach space assertion is a special case, because the invertibility of T means that T⁻¹ exists in Hom(W, V) and hence is a Lipschitz mapping.

Corollary. If p and q are equivalent norms on V and ⟨V, p⟩ is complete, then so is ⟨V, q⟩.

Theorem 7.4. If V₁ and V₂ are Banach spaces, then so is V₁ × V₂.

Proof. If {⟨ξₙ, ηₙ⟩} is Cauchy, then so are each of {ξₙ} and {ηₙ} (by Lemma 7.3, since the projections πᵢ are bounded). Then ξₙ → α and ηₙ → β for some α ∈ V₁ and β ∈ V₂. Thus ⟨ξₙ, ηₙ⟩ → ⟨α, β⟩ in V₁ × V₂. (See Theorem 3.4.) □

Corollary 1. If {Vᵢ}₁ⁿ are Banach spaces, then so is Πᵢ₌₁ⁿ Vᵢ.

Corollary 2. Every finite-dimensional vector space is a Banach space (in any norm).

Proof. ℝⁿ is complete (in the one-norm, say) by Theorem 7.2 and Corollary 1 above. We then impose a one-norm on V by choosing a basis, and apply the corollary of Theorem 7.3 to pass to any other norm. □

Theorem 7.5. Let W be a Banach space, let A be any set, and let ℬ(A, W) be the vector space of all bounded functions from A to W with the uniform norm ‖f‖∞ = lub {‖f(a)‖ : a ∈ A}. Then ℬ(A, W) is a Banach space.

Proof. Let {fₙ} be Cauchy, and choose any a ∈ A. Since ‖fₙ(a) − fₘ(a)‖ ≤ ‖fₙ − fₘ‖∞, it follows that {fₙ(a)} is Cauchy in W and so convergent. Define g: A → W by g(a) = lim fₙ(a) for each a ∈ A. We have to show that g is bounded and that fₙ → g. Given ε, we choose N so that m, n > N ⇒ ‖fₘ − fₙ‖∞ < ε. Then

    ‖fₘ(a) − g(a)‖ = limₙ→∞ ‖fₘ(a) − fₙ(a)‖ ≤ ε.

Thus, if m > N, then ‖fₘ(a) − g(a)‖ ≤ ε for all a ∈ A, and hence ‖fₘ − g‖∞ ≤ ε. This implies both that fₘ − g ∈ ℬ(A, W), and so g = fₘ − (fₘ − g) ∈ ℬ(A, W), and that fₘ → g in the uniform norm. □

Theorem 7.6. If V is a normed linear space and W is a Banach space, then Hom(V, W) is a Banach space.

The method of proof is identical to that of the preceding theorem, and we leave it as an exercise. Boundedness here has a different meaning, but it is used
in essentially the same way. One additional fact has to be established, namely, that the limit map (corresponding to g in the above theorem) is linear.

Theorem 7.7. A closed subset of a complete metric space is complete. A complete subset of any metric space is closed.

Proof. The proof is left to the reader.

It follows from Theorem 7.7 that a complete metric space A is absolutely closed, in the sense that no matter how we extend A to a larger metric space B, A is always a closed subset of B. Actually, this property is equivalent to completeness, for if A is not complete, then a very important construction of metric space theory shows that A can be completed. That is, we can construct a complete metric space B which includes A. Now, if A is not complete, then the closure of A in B, being complete, is different from A, and A is not absolutely closed. See Exercises 7.13 through 7.18 for constructions of the completion of a metric space. The completion of a normed linear space is of course a Banach space.

Theorem 7.8. In the context of Theorem 7.5, let A be a metric space, let 𝒞(A, W) be the space of continuous functions from A to W, and set ℬ𝒞(A, W) = ℬ(A, W) ∩ 𝒞(A, W). Then ℬ𝒞 is a closed subspace of ℬ.

[Fig. 4.3: schematic of the "up, over, and down" estimate: from g(x) up to fₙ(x), over to fₙ(a), and down to g(a), each step of size less than ε/3.]

Proof. We suppose that {fₙ} ⊂ ℬ𝒞 and that ‖fₙ − g‖∞ → 0, where g ∈ ℬ. We have to show that g is continuous. This is an application of a much used "up, over, and down" argument, which can be schematically indicated as in Fig. 4.3. Given ε, we first choose any n such that ‖fₙ − g‖∞ < ε/3. Consider now any a ∈ A. Since fₙ is continuous at a, there exists a δ such that

    ρ(x, a) < δ ⇒ ‖fₙ(x) − fₙ(a)‖ < ε/3.
Then

    ρ(x, a) < δ ⇒ ‖g(x) − g(a)‖ ≤ ‖g(x) − fₙ(x)‖ + ‖fₙ(x) − fₙ(a)‖ + ‖fₙ(a) − g(a)‖ < ε/3 + ε/3 + ε/3 = ε.

Thus g is continuous at a for every a ∈ A, and so g ∈ ℬ𝒞. □

This important classical result is traditionally stated as follows: The limit of a uniformly convergent sequence of continuous functions is continuous.

Remark. The proof was slightly more general. We actually showed that if fₙ → f uniformly, and if each fₙ is continuous at a, then f is continuous at a.

Corollary. ℬ𝒞(A, W) is a Banach space.

Theorem 7.9. If A is a sequentially compact metric space, then A is complete.

Proof. A Cauchy sequence in A has a subsequence converging to a limit in A, and therefore, by Lemma 7.2, itself converges to that limit. Thus A is complete. □

In Section 5 we proved that a compact set is also totally bounded. It can be shown, conversely, that a complete, totally bounded set A is sequentially compact, so that these two properties together are equivalent to compactness. The crucial fact is that if A is totally bounded, then every sequence in A has a Cauchy subsequence. If A is also complete, this Cauchy subsequence will converge to a point of A. Thus the fact that total boundedness and completeness together are equivalent to compactness follows directly from the next lemma.

Lemma 7.4. If A is totally bounded, then every sequence in A has a Cauchy subsequence.

Proof. Let {pₘ} be any sequence in A. Since A can be covered by a finite number of balls of radius 1, at least one ball in such a covering contains infinitely many of the points {pₘ}. More precisely, there exists an infinite set M₁ ⊂ ℤ⁺ such that the set {pₘ : m ∈ M₁} lies in a single ball of radius 1. Suppose that M₁, …, Mₙ ⊂ ℤ⁺ have been defined so that Mᵢ₊₁ ⊂ Mᵢ for i = 1, …, n − 1, Mₙ is infinite, and {pₘ : m ∈ Mᵢ} is a subset of a ball of radius 1/i for i = 1, …, n. Since A can be covered by a finite family of balls of radius 1/(n + 1), at least one covering ball contains infinitely many points of the set {pₘ : m ∈ Mₙ}. More precisely, there exists an infinite set Mₙ₊₁ ⊂ Mₙ such that {pₘ : m ∈ Mₙ₊₁} is a subset of a ball of radius 1/(n + 1). We thus define an infinite sequence {Mₙ} of subsets of ℤ⁺ having the above properties. Now choose m₁ ∈ M₁, m₂ ∈ M₂ so that m₂ > m₁, and, in general, mₙ₊₁ ∈ Mₙ₊₁ so that mₙ₊₁ > mₙ. Then the subsequence {p_{mₙ}}ₙ is Cauchy. For given ε, we can choose n so that 1/n < ε/2. Then

    i, j > n ⇒ mᵢ, mⱼ ∈ Mₙ ⇒ ρ(p_{mᵢ}, p_{mⱼ}) < 2(1/n) < ε.

This proves the lemma, and our theorem is a corollary. □
Theorem 7.10. A metric space S is sequentially compact if and only if S is totally bounded and complete.

The next three sections will be devoted to applications of completeness to the calculus, but before embarking on these vital matters we should say a few words about infinite series. As in the ordinary calculus, if {ξₙ} is a sequence in a normed linear space V, we say that the series Σξᵢ converges and has the sum α, and write Σ₁^∞ ξᵢ = α, if the sequence of partial sums converges to α. This means that σₙ → α as n → ∞, where σₙ is the finite sum Σ₁ⁿ ξᵢ for each n. We say that Σξᵢ converges absolutely if the series of norms Σ‖ξᵢ‖ converges in ℝ. This is abuse of language unless it is true that every absolutely convergent series converges, and the importance of the notion stems from the following theorem.

Theorem 7.11. If V is a Banach space, then every absolutely convergent series in V is convergent.

Proof. Let Σξᵢ be absolutely convergent. This means that Σ‖ξᵢ‖ converges in ℝ, i.e., that the sequence {sₙ} converges in ℝ, where sₙ = Σ₁ⁿ ‖ξᵢ‖. If m < n, then

    ‖σₙ − σₘ‖ = ‖Σ_{m+1}ⁿ ξᵢ‖ ≤ Σ_{m+1}ⁿ ‖ξᵢ‖ = sₙ − sₘ.

Since {sᵢ} is Cauchy in ℝ, this inequality shows that {σᵢ} is Cauchy in V and therefore, because V is complete, that {σₙ} is convergent in V. That is, Σξᵢ is convergent in V. □

The reader will be asked to show in an exercise that, conversely, if a normed linear space V is such that every absolutely convergent series converges, then V is complete. This property therefore characterizes Banach spaces.

We shall make frequent use of the above theorem. For the moment we note just one corollary, the classical Weierstrass comparison test.

Corollary. If {fₙ} is a sequence of bounded real-valued (or W-valued, for some Banach space W) functions on a common domain A, and if there is a sequence {Mₙ} of positive constants such that ΣMₙ is convergent and ‖fₙ‖∞ ≤ Mₙ for each n, then Σfₙ is uniformly convergent.

Proof. The hypotheses imply that Σ‖fₙ‖∞ converges, and so Σfₙ converges in the Banach space ℬ(A, W) by the theorem. But convergence in ℬ(A, W) is uniform convergence. □

EXERCISES

7.1 Prove that a Cauchy sequence in a metric space is a bounded set.
7.2 Let V be a normed linear space. Prove that the sum of two Cauchy sequences in V is Cauchy.
7.3 Show also that if {ξₙ} is Cauchy in V and {aₙ} is Cauchy in ℝ, then {aₙξₙ} is Cauchy in V.
7.4 Prove that if {ξₙ} is a Cauchy sequence in a normed linear space V, then {‖ξₙ‖} is a Cauchy sequence in ℝ.
7.5 Prove that if {xₙ} and {yₙ} are two Cauchy sequences in a metric space S, then {ρ(xₙ, yₙ)} is a Cauchy sequence in ℝ.
7.6 Prove the statement made after the proof of Lemma 7.2.
7.7 The rational number system is an incomplete metric space. Prove this by exhibiting a Cauchy sequence of rational numbers that does not converge to a rational number.
7.8 Prove Theorem 7.1.
7.9 Deduce a strengthened form of Theorem 7.3 from Theorem 7.1.
7.10 Write out a careful proof of Theorem 7.6, modeled on the proof of Theorem 7.5.
7.11 Prove Theorem 7.7.
7.12 Let the metric space X have a dense subset Y such that every Cauchy sequence in Y is convergent in X. Prove that X is complete.
7.13 Show that the set W of all Cauchy sequences in a normed linear space V is itself a vector space and that a seminorm p can be defined on W by p({ξₙ}) = lim ‖ξₙ‖. (Put this together from the material in the text and the preceding problems.)
7.14 Continuing the above exercise, for each ξ ∈ V, let ξᶜ be the constant sequence all of whose terms are ξ. Show that θ: ξ ↦ ξᶜ is an isometric linear injection of V into W and that θ[V] is dense in W in terms of the seminorm from the above exercise.
7.15 Prove next that every Cauchy sequence in θ[V] is convergent in W. Put Exercise 4.18 of Chapter 3 and Exercises 7.12 through 7.14 of this chapter together to conclude that if N is the set of null Cauchy sequences in W, then W/N is a Banach space, and that ξ ↦ ξᶜ is an isometric linear injection from V to a dense subspace of W/N. This constitutes the standard completion of the normed linear space V.
7.16 We shall now sketch a nonstandard way of forming the completion of a metric space S. Choose some point p₀ in S, and let V be the set of real-valued functions f on S such that f(p₀) = 0 and f is a Lipschitz function. For f in V define ‖f‖ as the smallest Lipschitz constant for f. That is,

    ‖f‖ = lub {|f(p) − f(q)|/ρ(p, q) : p ≠ q}.

Prove that V is a normed linear space under this norm. (V actually is complete, but we do not need this fact.)
7.17 Continuing the above exercise, we know that the dual space V* of all bounded linear functionals on V is complete by Theorem 7.6. We now want to show that S can be isometrically imbedded in V*; then the closure of S as a subset of V* will be the desired completion of S. For each p ∈ S, let θ_p: V → ℝ be "evaluation at p". That is, θ_p(f) = f(p). Show that θ_p ∈ V* and that ‖θ_p − θ_q‖ ≤ ρ(p, q).
7.18 In order to conclude that the mapping θ: p ↦ θ_p is an isometry (i.e., is distance-preserving), we have to prove the opposite inequality ‖θ_p − θ_q‖ ≥ ρ(p, q). To do this, choose p and consider the special function f(x) = ρ(p, x) − ρ(p, p₀). Show that f is in V and that ‖f‖ = 1 (from an early lemma in the chapter). Now apply the definition of ‖θ_p − θ_q‖ and conclude that θ is an isometric injection of S into V*. Then the closure of θ[S] is our constructed completion.
7.19 Prove that if a normed linear space V has the property that every absolutely convergent series converges, then V is complete. (Let {αₙ} be a Cauchy sequence. Show that there is a subsequence {α_{n(i)}}ᵢ such that if ξᵢ = α_{n(i+1)} − α_{n(i)}, then ‖ξᵢ‖ < 2⁻ⁱ. Conclude that the subsequence converges and finish up.)
7.20 The above exercise gives a very useful criterion for V to be complete. Use it to prove that if V is a Banach space and N is a closed subspace, then V/N is a Banach space (see Exercise 4.14 of Chapter 3 for the norm on V/N).
7.21 Prove that the sum of a uniformly convergent series of infinitesimals (all on the same domain) is an infinitesimal.

8. A FIRST LOOK AT BANACH ALGEBRAS

When we were considering the implicit-function theorem and the inverse-function theorem in the last chapter, we saw how useful it is to know that if a transformation T has an inverse T⁻¹, then so does S whenever ‖S − T‖ is small enough, and that the mapping T ↦ T⁻¹ is continuous on the open set of all invertible elements. When the spaces in question are finite-dimensional, these facts can be made to follow from the continuity of the determinant function T ↦ Δ(T) from Hom V to ℝ. It is also possible to produce them by arguing directly in terms of upper and lower bounds for T and its close approximations S. But the most natural, most elegant, and, in the case of Banach spaces, easiest way to prove these things is to show that if V is a Banach space and T in Hom V has norm less than one, then the sum of the geometric series Σ₀^∞ Tⁿ is the inverse of I − T, just as in the elementary calculus. But in making this argument, the fact that T is a linear transformation has little importance, and we shall digress for a moment to explore this situation.

Let us summarize the norm and algebraic properties of Hom V when V is a Banach space. First of all, we know that Hom V is also a Banach space. Second, it is an algebra. That is, it possesses an associative multiplication operation (composition) that relates to the linear operations according to the following laws:

    S(T₁ + T₂) = ST₁ + ST₂,
    (S₁ + S₂)T = S₁T + S₂T,
    c(ST) = (cS)T = S(cT).

Finally, multiplication is related to the norm by

    ‖ST‖ ≤ ‖S‖ ‖T‖ and ‖I‖ = 1.

This list of properties constitutes exactly the axioms for a Banach algebra. Just as we can see certain properties of functions most clearly by forgetting that they are functions and considering them only as elements of a vector space, now it turns out that we can treat certain properties of transformations in Hom(V) most simply by forgetting the complicated nature of a linear transformation and considering it merely as an element of an abstract Banach algebra A.
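As a concrete instance of these axioms besides Hom V (an added example; the verification is ours, using the corollary of Theorem 7.8 for completeness), the space ℬ𝒞(A, ℝ) of bounded continuous real-valued functions on a metric space A, under pointwise multiplication and the uniform norm, is a commutative Banach algebra: it is a Banach space, multiplication is associative and satisfies the three distributive laws, the constant function 1 is an identity of norm one, and

\[
\|fg\|_\infty = \sup_{a \in A} |f(a)g(a)| \;\le\; \sup_{a \in A}|f(a)| \cdot \sup_{a \in A}|g(a)| = \|f\|_\infty \|g\|_\infty .
\]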
The most important simple thing we can do in a Banach algebra that we couldn't do in a Banach space is to consider power series. The following theorem shows that the geometric series, in particular, plays the same central role here that it plays in elementary calculus. Since we are not thinking of the elements of A as transformations, we shall designate them by lower-case letters; e is the identity of A.

Theorem 8.1. If A is a Banach algebra, and if x in A has norm less than one, then (e − x) is invertible and its inverse is the sum of the geometric series:

    (e − x)⁻¹ = Σ₀^∞ xⁿ.

Also, ‖e − (e − x)⁻¹‖ ≤ r/(1 − r), where r = ‖x‖.

Proof. Since ‖xⁿ‖ ≤ ‖x‖ⁿ = rⁿ, the series Σxⁿ is absolutely convergent when ‖x‖ < 1 by comparison with the ordinary geometric series Σrⁿ. It is therefore convergent, and if y = Σ₀^∞ xⁿ, then

    (e − x)y = limₙ→∞ (e − x) Σ₀ⁿ xⁱ = limₙ→∞ (e − xⁿ⁺¹) = e,

since ‖xⁿ⁺¹‖ ≤ rⁿ⁺¹ → 0. That is, y = (e − x)⁻¹. Finally,

    ‖e − (e − x)⁻¹‖ = ‖Σ₁^∞ xⁿ‖ ≤ Σ₁^∞ rⁿ = r/(1 − r). □

Theorem 8.2. The set 𝔑 of invertible elements in a Banach algebra A is open and the mapping x ↦ x⁻¹ is continuous from 𝔑 to 𝔑. In fact, if y⁻¹ exists and m = ‖y⁻¹‖, then (y − h)⁻¹ exists whenever ‖h‖ < 1/m and

    ‖(y − h)⁻¹ − y⁻¹‖ ≤ m²‖h‖/(1 − m‖h‖).

Proof. Set x = y⁻¹h. Then (y − h) = y(e − x), where ‖x‖ = ‖y⁻¹h‖ ≤ m‖h‖, and so by the above theorem y − h will be invertible, with (y − h)⁻¹ = (e − x)⁻¹y⁻¹, provided ‖h‖ < 1/m. Then also

    ‖y⁻¹ − (y − h)⁻¹‖ ≤ ‖e − (e − x)⁻¹‖ · m,

and this is bounded above by mr/(1 − r) ≤ m²‖h‖/(1 − m‖h‖), by the last inequality in the above theorem. □

Corollary. If V and W are Banach spaces, then the invertible elements in Hom(V, W) form an open set, and the map T ↦ T⁻¹ is continuous on this domain.

Proof. Suppose that T⁻¹ exists, and set m = ‖T⁻¹‖. Then if ‖T − S‖ < 1/m, we have ‖I − T⁻¹S‖ ≤ ‖T⁻¹‖ ‖T − S‖ < 1, and so T⁻¹S = I − (I − T⁻¹S) is an invertible element of Hom V. Therefore, S = T(T⁻¹S) is invertible and S⁻¹ = (T⁻¹S)⁻¹T⁻¹. The continuity of S ↦ S⁻¹ is left to the reader. □
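Here is Theorem 8.1 in the smallest nontrivial setting (an added worked example; the matrix is our choice). In the Banach algebra of 2 × 2 real matrices, with the operator norm coming from the Euclidean norm on ℝ², take

\[
x = \begin{pmatrix} 0 & \tfrac{1}{2} \\ 0 & 0 \end{pmatrix}, \qquad x^2 = 0,
\]

so the geometric series terminates and

\[
(e - x)^{-1} = e + x = \begin{pmatrix} 1 & \tfrac{1}{2} \\ 0 & 1 \end{pmatrix},
\]

as the direct multiplication (e − x)(e + x) = e − x² = e confirms. Here ‖x‖ = 1/2, and the bound of the theorem reads ‖e − (e − x)⁻¹‖ = ‖x‖ = 1/2 ≤ r/(1 − r) = 1.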
We saw above that the map x ↦ (e − x)⁻¹ from the open unit ball B₁(0) in a Banach algebra A to A is the sum of the geometric power series. We can define many other mappings by convergent power series, at hardly any greater effort.

Theorem 8.3. Let A be a Banach algebra. Let the sequence {aₙ} ⊂ A and the positive number δ be such that the sequence {‖aₙ‖δⁿ} is bounded. Then Σaₙxⁿ converges for x in the ball B_δ(0) in A, and if 0 < s < δ, then the series converges uniformly on B_s(0).

Proof. Set r = s/δ, and let b be a bound for the sequence {‖aₙ‖δⁿ}. On the ball B_s(0) we then have

    ‖aₙxⁿ‖ ≤ ‖aₙ‖sⁿ = ‖aₙ‖δⁿrⁿ ≤ brⁿ,

and the series therefore converges uniformly on this ball by comparison with the geometric series bΣrⁿ, since r < 1. □

The series of most interest to us will have real coefficients. They are included in the above argument because the product of the vector x and the scalar t is the algebra product (te)x. In addition to dealing with the above geometric series, we shall be particularly interested in the exponential function eˣ = Σ₀^∞ xⁿ/n!. The usual comparison arguments of the elementary calculus show just as easily here that this series converges for every x in A and uniformly on any ball.

It is natural to consider the differentiability of the maps from A to A defined by such convergent series, and we state the basic facts below, starting with a fundamental theorem on the differentiability of a limit of a sequence.

Theorem 8.4. Let {Fₙ} be a sequence of maps from a ball B in a normed linear space V to a normed linear space W such that Fₙ converges pointwise to a map F on B and such that {dFⁿ_α} converges for each α, and uniformly over α. Then F is differentiable on B and dF_β = lim dFⁿ_β for each β in B.

Proof. Fix β and set T = lim dFⁿ_β. By the uniform convergence of {dFⁿ}, given ε, there is an N such that ‖dFⁿ_α − dFᵐ_α‖ ≤ ε for all m, n ≥ N and for all α in B. It then follows from the mean-value theorem for differentials that

    ‖(ΔFⁿ_β(ξ) − dFⁿ_β(ξ)) − (ΔFᵐ_β(ξ) − dFᵐ_β(ξ))‖ ≤ 2ε‖ξ‖

for all m, n ≥ N and all ξ such that β + ξ ∈ B. Letting m → ∞ and regrouping, we have

    ‖(ΔF_β(ξ) − T(ξ)) − (ΔFⁿ_β(ξ) − dFⁿ_β(ξ))‖ ≤ 2ε‖ξ‖

for all such ξ. But, by the definition of dFⁿ_β there is a δ such that ‖ΔFⁿ_β(ξ) − dFⁿ_β(ξ)‖ ≤ ε‖ξ‖ when ‖ξ‖ < δ. Putting these last two inequalities together, we see that

    ‖ξ‖ < δ ⇒ ‖ΔF_β(ξ) − T(ξ)‖ ≤ 3ε‖ξ‖.

Thus F is differentiable at β and dF_β = T. □

The remaining proofs are left as a set of exercises.
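To see Theorem 8.3 in action (an added example; the series is the standard logarithmic one, which reappears in Exercises 8.19 and 8.20), take the real coefficients aₙ = (1/n)e for n ≥ 1. With δ = 1 the sequence ‖aₙ‖δⁿ = 1/n is bounded, so

\[
\sum_{n=1}^{\infty} \frac{x^n}{n}
\]

converges for every x in the open unit ball B₁(0) of A, and uniformly on each smaller ball B_s(0), s < 1; one may take this sum as defining −log(e − x) there. Note that δ = 1 itself is allowed in the hypothesis even though the resulting ball of convergence is open.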
Lemma 8.1. Multiplication on a Banach algebra A is differentiable (from A × A to A). If we let p be the product function, so that p(x, y) = xy, then

    dp_{⟨a,b⟩}(x, y) = ay + xb.

Lemma 8.2. Let A be a commutative Banach algebra, and let p be the monomial function p(x) = axⁿ. Then p is everywhere differentiable and dp_y(x) = nay^{n−1}x.

Lemma 8.3. If {‖aₙ‖rⁿ} is a bounded sequence in ℝ, then {n‖aₙ‖sⁿ} is bounded for any 0 < s < r, and therefore Σnaₙxⁿ⁻¹ converges uniformly on any ball in A smaller than B_r(0).

Theorem 8.5. If A is a commutative Banach algebra and {aₙ} ⊂ A is such that {‖aₙ‖rⁿ} is bounded in ℝ, then F(x) = Σ₀^∞ aₙxⁿ is defined and differentiable on the ball B_r(0) in A, and

    dF_y(x) = (Σ₁^∞ naₙy^{n−1}) · x.

It is natural to call the element Σ₁^∞ naₙy^{n−1} the derivative of F at y and to designate it F′(y), although this departs from our rule that derivatives are vectors obtained as limits of difference quotients. The remarkable aspect of the above theorem is that for this kind of differentiable mapping from an open subset of A to A the linear transformation dF_y is multiplication by an element of A: dF_y(x) = F′(y) · x.

In particular, the exponential function exp(x) = eˣ = Σ₀^∞ xⁿ/n! is its own derivative, since Σ₁^∞ nxⁿ⁻¹/n! = Σ₀^∞ xᵐ/m!, and from this fact (see the exercises) or from direct algebraic manipulation of the series in question, we can deduce the law of exponents eˣ⁺ʸ = eˣeʸ. Remember, though, that this is on a commutative Banach algebra. The function x ↦ eˣ = Σ₀^∞ xⁿ/n! can be defined just as easily on any Banach algebra A, but it is not nearly as pleasant when A is noncommutative. However, one thing that we can always do, and often thereby save the day, is to restrict the exponential mapping to a commutative subalgebra of A, say that generated by a single element x. For example, we can consider the parametrized arc γ(t) = e^{tx} (x fixed) into any Banach algebra A, and, because its range lies in the commutative subalgebra X generated by x, we can apply Theorem 7.2 of Chapter 3 to conclude that γ is differentiable and that

    γ′(t) = d exp_{tx}(x) = xe^{tx}.

This can also easily be proved directly from the law of exponents:

    Δγₜ(h) = e^{(t+h)x} − e^{tx} = e^{tx}(e^{hx} − e),

and since it is clear from the series that (e^{hx} − e)/h → x as h → 0, we have that

    γ′(t) = lim_{h→0} Δγₜ(h)/h = xe^{tx}.
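For the reader who wants the direct series manipulation mentioned above (an added sketch; commutativity is essential, and the rearrangement is justified by the absolute convergence of both series), the law of exponents comes from the binomial theorem:

\[
e^{x}e^{y} = \sum_{i=0}^{\infty}\frac{x^i}{i!}\,\sum_{j=0}^{\infty}\frac{y^j}{j!}
= \sum_{n=0}^{\infty}\;\sum_{i+j=n}\frac{x^i y^j}{i!\,j!}
= \sum_{n=0}^{\infty}\frac{1}{n!}\sum_{i=0}^{n}\binom{n}{i}x^i y^{n-i}
= \sum_{n=0}^{\infty}\frac{(x+y)^n}{n!} = e^{x+y},
\]

where the binomial expansion of (x + y)ⁿ requires xy = yx.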
EXERCISES

8.1 Finish the proof of the corollary of Theorem 8.2.

8.2 Let $A$ be a Banach algebra, and let $\{a_i\} \subset \mathbb{R}$ and $x \in A$ be such that $\sum a_ix^i$ converges. Suppose also that $x$ satisfies a polynomial identity $p(x) = \sum_0^n b_ix^i = 0$, where $\{b_i\} \subset \mathbb{R}$ and $b_n \neq 0$. Prove that the element $\sum_0^\infty a_ix^i$ is a polynomial in $x$ of degree $\le n - 1$. (Let $M$ be the linear span of $\{x^i\}_0^{n-1}$, and show first that $x^i \in M$ for all $i$.)

8.3 Let $A$ be any Banach algebra, let $x$ be a fixed element in $A$, and let $X$ be the smallest closed subalgebra of $A$ containing $x$. Prove that $X$ is a commutative Banach algebra. (The set of polynomials $p(x) = \sum_0^n a_ix^i$ is the smallest algebra containing $x$. Consider its closure in $A$.)

8.4 Prove Lemma 8.1. [Hint: $\langle x, y\rangle \mapsto xy$ is a bounded bilinear map.]

8.5 Prove Lemma 8.2 by making a direct $\Delta$-estimate from the binomial expansion, as in the elementary calculus.

8.6 Prove Lemma 8.2 by induction from Lemma 8.1.

8.7 Let $A$ be any Banach algebra. Prove that $p\colon x \mapsto x^3$ is differentiable and that $dp_a(x) = xa^2 + axa + a^2x$.

8.8 Prove by induction that if $q(x) = x^n$, then $q$ is differentiable and
$$dq_a(x) = \sum_{i=0}^{n-1} a^ixa^{n-1-i}.$$
Deduce Lemma 8.2 as a corollary.

8.9 Let $A$ be any Banach algebra. Prove that $r\colon x \mapsto x^{-1}$ is everywhere differentiable on the open set $U$ of invertible elements and that $dr_a(x) = -a^{-1}xa^{-1}$. [Hint: Examine the proofs of Theorems 8.1 and 8.2.] (A numerical check of this formula appears after these exercises.)

8.10 Let $A$ be an open subset of a normed linear space $V$, and let $F$ and $G$ be mappings from $A$ to a Banach algebra $X$ that are differentiable at $a$. Prove that the product mapping $FG$ is differentiable at $a$ and that $d(FG)_a = F(a)\,dG_a + dF_a\,G(a)$. Does it follow that $d(F^2)_a = 2F(a)\,dF_a$?

8.11 Continuing the above exercise, show that if $X$ is a commutative Banach algebra, then $d(F^n)_a = nF^{n-1}(a)\,dF_a$.

8.12 Let $F\colon A \to X$ be a differentiable map from an open set $A$ of a normed linear space to a Banach algebra $X$, and suppose that the element $F(\xi)$ is invertible in $X$ for every $\xi$ in $A$. Prove that the map $G\colon \xi \mapsto [F(\xi)]^{-1}$ is differentiable and that $dG_a(\xi) = -F(a)^{-1}\,dF_a(\xi)\,F(a)^{-1}$. Show also that if $F$ is a parametrized arc ($A = I \subset \mathbb{R}$), then $G'(a) = -F(a)^{-1}\cdot F'(a)\cdot F(a)^{-1}$.

8.13 Prove Lemma 8.3.

8.14 Prove Theorem 8.5 by showing that Lemma 8.3 makes Theorem 8.4 applicable.

8.15 Show that in Theorem 8.4 the convergence of $F_n$ to $F$ needs only to be assumed at one point, provided we know that the codomain space $W$ is a Banach space.

8.16 We want to prove the law of exponents for the exponential function on a commutative Banach algebra. Show first that $(\exp(-x))(\exp x) = e$ by applying Exercise 7.13 of Chapter 3, the above Exercise 8.10, and the fact that $d\exp_a(x) = (\exp a)x$.

8.17 Show that if $X$ is a commutative Banach algebra and $F\colon X \to X$ is a differentiable map such that $dF_a(\xi) = \xi F(a)$, then $F(\xi) = \beta\exp\xi$ for some constant $\beta$. [Consider the differential of $F(\xi)\exp(-\xi)$.]

8.18 Now set $F(\xi) = \exp(\xi + \eta)$ and prove from the above exercise that $\exp(\xi + \eta) = \exp(\xi)\exp(\eta)$. You will also need the fact that $\exp 0 = 1$.

8.19 Let $z$ be a nilpotent element in a commutative Banach algebra $X$; that is, $z^p = 0$ for some positive integer $p$. Show by an elementary estimate based on the binomial expansion that if $\|x\| < 1$, then $\|(x + z)^n\| \le kn^p\|x\|^{n-p}$ for $n > p$. The series of positive terms $\sum n^ar^n$ converges for $r < 1$ (by the ratio test). Show, therefore, that the series for $\log(1 - (x + z))$ and for $(1 - (x + z))^{-1}$ converge when $\|x\| < 1$.

8.20 Continuing the above exercise, show that $F(y) = \log(1 - y)$ is defined and differentiable on the ball $\|y - z\| < 1$ and that $dF_a(x) = -(1 - a)^{-1}\cdot x$. Show, therefore, that $\exp(\log(1 - y)) = 1 - y$ on this ball, either by applying the inverse-mapping theorem or by applying the composite-function rule for differentiating. Conclude that for every nilpotent element $z$ in $X$ there exists a $u$ in $X$ such that $\exp u = 1 - z$.

8.21 Let $X_1, \ldots, X_n$ be Banach algebras. Show that the product Banach space $X = \prod_i X_i$ becomes a Banach algebra if the product $xy = \langle x_1, \ldots, x_n\rangle\langle y_1, \ldots, y_n\rangle$ is defined as $\langle x_1y_1, \ldots, x_ny_n\rangle$ and if the maximum norm is used on $X$.

8.22 In the above situation the projections $\pi_i$ have now become bounded algebra homomorphisms. In fact, just as in our original vector definitions on a product space, our definition of multiplication on $X$ was determined by the requirement that $\pi_i(xy) = \pi_i(x)\pi_i(y)$ for all $i$. State and prove an algebra theorem analogous to Theorem 3.4 of Chapter 1.

8.23 Continuing the above discussion, suppose that the series $\sum a_nx^n$ converges in $X$, with sum $y$. Show that then $\sum (a_n)_i(x_i)^n$ converges in $X_i$ to $y_i$ for each $i$, where, of course, $y = \langle y_1, \ldots, y_n\rangle$. Conclude that $e^x = \langle e^{x_1}, \ldots, e^{x_n}\rangle$ for any $x = \langle x_1, \ldots, x_n\rangle$ in $X$.

8.24 Define the sine and cosine functions on a commutative Banach algebra, and show that $\sin' = \cos$, $\cos' = -\sin$, $\sin^2 + \cos^2 = e$.
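The differentiation formulas of Exercises 8.9 and 8.12 lend themselves to a quick numerical check in the matrix algebra $\mathbb{R}^{3\times 3}$. The following sketch (an illustration under the stated assumptions, using NumPy; the matrices are arbitrary) compares the difference quotient of $r(x) = x^{-1}$ at $a$ with the claimed differential $-a^{-1}xa^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.eye(3) + 0.1 * rng.standard_normal((3, 3))   # invertible, near e
x = rng.standard_normal((3, 3))                     # direction of differentiation
a_inv = np.linalg.inv(a)

h = 1e-6
finite_diff = (np.linalg.inv(a + h * x) - a_inv) / h   # difference quotient of r
formula = -a_inv @ x @ a_inv                           # dr_a(x) from Exercise 8.9
print(np.max(np.abs(finite_diff - formula)))           # ~1e-6, i.e. O(h) agreement
```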
9. THE CONTRACTION MAPPING FIXED-POINT THEOREM

In this section we shall prove the very simple and elegant fixed-point theorem for contraction mappings, and then shall use it to complete the proof of the implicit-function theorem. Later, in Chapter 6, it will be the basis of our proof of the fundamental existence and uniqueness theorem for ordinary differential equations. The section concludes with a comparison of the iterative procedure of the fixed-point theorem and that of Newton's method.
A mapping $K$ from a metric space $X$ to itself is a contraction if it is a Lipschitz mapping with constant less than 1; that is, if there is a constant $C$ with $0 < C < 1$ such that $\rho(K(x), K(y)) \le C\rho(x, y)$ for all $x, y \in X$. A fixed point of $K$ is, of course, a point $x$ such that $K(x) = x$. A contraction $K$ can have at most one fixed point, since if $K(x) = x$ and $K(y) = y$, then $\rho(x, y) = \rho(K(x), K(y)) \le C\rho(x, y)$, and so $(1 - C)\rho(x, y) \le 0$. Since $C < 1$, this implies that $\rho(x, y) = 0$ and $x = y$.

Theorem 9.1. Let $X$ be a nonempty complete metric space, and let $K\colon X \to X$ be a contraction. Then $K$ has a (unique) fixed point.

Proof. Choose any $x_0$ in $X$, and define the sequence $\{x_n\}_0^\infty$ inductively by setting $x_1 = K(x_0)$, $x_2 = K(x_1) = K^2(x_0)$, and $x_n = K(x_{n-1}) = K^n(x_0)$. Set $\delta = \rho(x_1, x_0)$. Then $\rho(x_2, x_1) = \rho(K(x_1), K(x_0)) \le C\rho(x_1, x_0) = C\delta$, and, by induction, $\rho(x_{n+1}, x_n) = \rho(K(x_n), K(x_{n-1})) \le C\rho(x_n, x_{n-1}) \le C\cdot C^{n-1}\delta = C^n\delta$. It follows that $\{x_n\}$ is Cauchy, for if $m > n$, then
$$\rho(x_m, x_n) \le \sum_{i=n}^{m-1} \rho(x_{i+1}, x_i) \le \sum_{i=n}^{m-1} C^i\delta < C^n\delta/(1 - C),$$
and $C^n \to 0$ as $n \to \infty$, because $C < 1$. Since $X$ is complete, $\{x_n\}$ converges to some $a$ in $X$, and it then follows that $K(a) = \lim K(x_n) = \lim x_{n+1} = a$, so that $a$ is a fixed point. $\square$

In practice, we meet mappings $K$ that are contractions only near some particular point $p$, and we have to establish that a suitable neighborhood of $p$ is carried into itself by $K$. We show below that if $K$ is a contraction on a ball about $p$, and if $K$ doesn't move the center $p$ very far, then the theorem can be applied.

Corollary 1. Let $D$ be a closed ball in a complete metric space $X$, and let $K\colon D \to X$ be a contraction which moves the center of $D$ a distance at most $(1 - C)r$, where $r$ is the radius of $D$ and $C$ is the contraction constant. Then $K$ has a unique fixed point, and it is in $D$.

Proof. We simply check that the range of $K$ is actually in $D$. If $p$ is the center of $D$ and $x$ is any point in $D$, then $\rho(K(x), p) \le \rho(K(x), K(p)) + \rho(K(p), p) \le C\rho(x, p) + (1 - C)r \le Cr + (1 - C)r = r$. $\square$

Corollary 2. Let $B$ be an open ball in a complete metric space $X$, and let $K\colon B \to X$ be a contraction which moves the center of $B$ a distance less than $(1 - C)r$, where $r$ is the radius of $B$ and $C$ is the contraction constant. Then $K$ has a unique fixed point.

Proof. Restrict $K$ to any slightly smaller closed ball $D$ concentric with $B$, and apply the above corollary. $\square$
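The proof of Theorem 9.1 is itself an algorithm. As a minimal sketch (not from the text; plain Python), we iterate $K(x) = \cos x$, which is a contraction on $[0, 1]$ since $|K'| \le \sin 1 < 1$ there and $K$ maps $[0, 1]$ into itself; the iterates converge geometrically to the unique fixed point.

```python
import math

def fixed_point(K, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{n+1} = K(x_n) until successive iterates agree to tol."""
    x = x0
    for _ in range(max_iter):
        x_next = K(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence")

a = fixed_point(math.cos, 0.5)
print(a, math.cos(a) - a)   # fixed point ~0.7390851, residual ~0
```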
Corollary 3. Let $K$ be a contraction on the complete metric space $X$, and suppose that $K$ moves the point $x$ a distance $d$. Then the distance from $x$ to the fixed point is at most $d/(1 - C)$, where $C$ is the contraction constant.

Proof. Let $D$ be the closed ball about $x$ of radius $r = d/(1 - C)$, and apply Corollary 1 to the restriction of $K$ to $D$. It implies that the fixed point is in $D$. $\square$

We now suppose that the contraction $K$ contains a parameter $s$, so that $K$ is now a function of two variables $K(s, x)$. We shall assume that $K$ is a contraction in $x$ uniformly over $s$, which means that $\rho(K(s, x), K(s, y)) \le C\rho(x, y)$ for all $x$, $y$, and $s$, where $0 < C < 1$. We shall also assume that $K$ is a continuous function of $s$ for each fixed $x$.

Corollary 4. Let $K$ be a mapping from $S \times X$ to $X$, where $X$ is a complete metric space and $S$ is any metric space, and suppose that $K(s, x)$ is a contraction in $x$ uniformly over $s$ and is continuous in $s$ for each $x$. Then the fixed point $p_s$ is a continuous function of $s$.

Proof. Given $\epsilon$, we use the continuity of $K$ in its first variable around the point $\langle t, p_t\rangle$ to choose $\delta$ so that if $\rho(s, t) < \delta$, then the distance from $K(s, p_t)$ to $K(t, p_t)$ is at most $\epsilon$. Since $K(t, p_t) = p_t$, this simply says that the contraction with parameter value $s$ moves $p_t$ a distance at most $\epsilon$, and so the distance from $p_t$ to the fixed point $p_s$ is at most $\epsilon/(1 - C)$ by Corollary 3. That is, $\rho(s, t) < \delta \Rightarrow \rho(p_s, p_t) \le \epsilon/(1 - C)$, where $C$ is the uniform contraction constant, and the mapping $s \mapsto p_s$ is accordingly continuous at $t$. $\square$

Combining Corollaries 2 and 4, we have the following theorem.

Theorem 9.2. Let $B$ be a ball in a complete metric space $X$, let $S$ be any metric space, and let $K$ be a mapping from $S \times B$ to $X$ which is a contraction in its second variable uniformly over its first variable and is continuous in its first variable for each value of its second variable. Suppose also that $K$ moves the center of $B$ a distance less than $(1 - C)r$ for every $s$ in $S$, where $r$ is the radius of $B$ and $C$ is the uniform contraction constant. Then for each $s$ in $S$ there is a unique $p_s$ in $B$ such that $K(s, p_s) = p_s$, and the mapping $s \mapsto p_s$ is continuous from $S$ to $B$.

We can now complete the proof of the implicit-function theorem.

Theorem 9.3. Let $V$, $W$, and $X$ be Banach spaces, let $A \times B$ be an open subset of $V \times W$, and let $G\colon A \times B \to X$ be continuous and have a continuous second partial differential. Suppose that the point $\langle\alpha, \beta\rangle$ in $A \times B$ is such that $G(\alpha, \beta) = 0$ and $dG^2_{\langle\alpha,\beta\rangle}$ is invertible. Then there are open balls $M$ and $N$ about $\alpha$ and $\beta$, respectively, such that for each $\xi$ in $M$ there is a unique $\eta$ in $N$ satisfying $G(\xi, \eta) = 0$. The function $F$ thus uniquely defined near $\langle\alpha, \beta\rangle$ by the condition $G(\xi, F(\xi)) = 0$ is continuous.

Proof. Set $T = dG^2_{\langle\alpha,\beta\rangle}$ and $K(\xi, \eta) = \eta - T^{-1}(G(\xi, \eta))$. Then $K$ is a continuous mapping from $A \times B$ to $W$ such that $K(\alpha, \beta) = \beta$, and $K$ has a continuous second partial differential such that $dK^2_{\langle\alpha,\beta\rangle} = 0$.
Because $dK^2_{\langle\mu,\nu\rangle}$ is a continuous function of $\langle\mu, \nu\rangle$, we can choose a product ball $M \times N$ about $\langle\alpha, \beta\rangle$ on which $dK^2_{\langle\mu,\nu\rangle}$ is bounded by $\frac{1}{2}$, and we can then decrease the ball $M$ if necessary so that for $\mu$ in $M$ we also have $\|K(\mu, \beta) - \beta\| < r/2$, where $r$ is the radius of the ball $N$. The mean-value theorem for differentials implies that $K$ is a contraction in its second variable with constant $\frac{1}{2}$. The preceding theorem therefore shows that for each $\xi$ in $M$ there is a unique $\eta$ in $N$ such that $K(\xi, \eta) = \eta$ and the mapping $F\colon \xi \mapsto \eta$ is continuous. Since $K(\xi, \eta) = \eta$ if and only if $G(\xi, \eta) = 0$, we are done. $\square$

Theorems 8.2 and 9.3 complete the list of ingredients of the implicit-function theorem. (However, see Exercise 9.8.)

We next show, in the other direction, that if a contraction depending on a parameter is continuously differentiable, then the fixed point is a continuously differentiable function of the parameter.

Theorem 9.4. Let $V$ and $W$ be Banach spaces, and let $K$ be a differentiable mapping from an open subset $A \times B$ of $V \times W$ to $W$ which satisfies the hypotheses of Theorem 9.2. Then the function $F$ from $A$ to $B$ uniquely defined by the equation $K(\xi, F(\xi)) = F(\xi)$ is differentiable.

Proof. The inequality $\|K(\xi, \eta') - K(\xi, \eta'')\| \le C\|\eta' - \eta''\|$ is equivalent to $\|dK^2_{\langle\alpha,\beta\rangle}\| \le C$ for all $\langle\alpha, \beta\rangle$ in $A \times B$. We now define $G$ by $G(\xi, \eta) = \eta - K(\xi, \eta)$, and observe that $dG^2 = I - dK^2$ and that $dG^2$ is therefore invertible by Theorem 8.1. Since $G(\xi, F(\xi)) = 0$, it follows from Theorem 11.1 of Chapter 3 that $F$ is differentiable and that its differential is obtained by differentiating the above equation. $\square$

Corollary. If $K$ is continuously differentiable, then so is $F$.

*We should emphasize that the fixed-point theorem not only has the implicit-function theorem as a consequence, but the proof of the fixed-point theorem gives an iterative procedure for actually finding the value of $F(\xi)$, once we know how to compute $T^{-1}$ (where $T = dG^2_{\langle\alpha,\beta\rangle}$). In fact, for a given value of $\xi$ in a small enough ball about $\alpha$, consider the function $G(\xi, \cdot)$. If we set $K(\xi, \eta) = \eta - T^{-1}G(\xi, \eta)$, then the inductive procedure $\eta_{i+1} = K(\xi, \eta_i)$ becomes
$$\eta_{i+1} = \eta_i - T^{-1}G(\xi, \eta_i). \tag{9.1}$$
The meaning of this iterative procedure is easily seen by studying the graph of the situation where $V = W = \mathbb{R}^1$. (See Fig. 4.4.) As was proved above, under suitable hypotheses, the series $\sum\|\eta_{i+1} - \eta_i\|$ converges geometrically.

It is instructive to compare this procedure with Newton's method of elementary calculus. There the iterative scheme (9.1) is replaced by
$$\eta_{i+1} = \eta_i - S_i^{-1}G(\xi, \eta_i), \tag{9.2}$$
[Fig. 4.4: graph of $z = G(x, \cdot)$]

where $S_i = dG^2_{\langle\xi,\eta_i\rangle}$. (See Fig. 4.5.) As we shall see, this procedure (when it works) converges much more rapidly than (9.1), but it suffers from the disadvantage that we must be able to compute the inverses of an infinite number of linear transformations $S_i$.

[Fig. 4.5]
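The contrast between (9.1) and (9.2) is easy to observe numerically. In the sketch below (illustrative only, assuming NumPy; the map $G$ is a hypothetical test function, not one from the text), the frozen-derivative iteration (9.1) uses $T = dG_0$ throughout, while Newton's scheme (9.2) re-solves with $dG$ at the current point; the printed residuals shrink geometrically for (9.1) and roughly quadratically for Newton.

```python
import numpy as np

def G(x):   # hypothetical test map on R^2 with a zero near the origin
    return np.array([x[0] + x[1]**2 - 0.1, x[0]**3 + x[1] - 0.2])

def dG(x):  # its Jacobian
    return np.array([[1.0, 2 * x[1]], [3 * x[0]**2, 1.0]])

T_inv = np.linalg.inv(dG(np.zeros(2)))       # fixed T^{-1} for scheme (9.1)
x = np.zeros(2)                              # Newton iterate
y = np.zeros(2)                              # frozen-derivative iterate
for i in range(6):
    x = x - np.linalg.solve(dG(x), G(x))     # scheme (9.2)
    y = y - T_inv @ G(y)                     # scheme (9.1)
    print(i, np.linalg.norm(G(x)), np.linalg.norm(G(y)))
```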
Let us suppress the $\xi$, which will be fixed in the argument, and consider a map $G$ defined in some neighborhood of the origin in a Banach space. Suppose that $G$ has two continuous differentials. For definiteness, we assume that $G$ is defined in the unit ball $B$, and we suppose that for each $x \in B$ the map $dG_x$ is invertible and, in fact, $\|dG_x^{-1}\| \le K$. Let $x_0 = 0$ and, assuming that $x_n$ has been defined, we set $x_{n+1} = x_n - S_n^{-1}G(x_n)$, where $S_n = dG_{x_n}$. We shall show that if $\|G(0)\|$ is sufficiently small (in terms of $K$), then the procedure is well defined (that is, $\|x_{n+1}\| < 1$) and converges rapidly. In fact, if $\tau$ is any real number between one and two (for instance $\tau = \frac{3}{2}$), we shall show that for some $c$ (which can be made large if $\|G(0)\|$ is small)
$$\|x_n - x_{n-1}\| \le e^{-c\tau^n}. \tag{*}$$
Note that if we can establish (*) for large enough $c$, then $\|x_n\| \le 1$ follows. In fact,
$$\|x_n\| \le \sum_{i=1}^{n} \|x_i - x_{i-1}\| \le \sum_{i=1}^{\infty} e^{-c\tau^i} \le \sum_{i=1}^{\infty} e^{-ci(\tau-1)} = \frac{e^{-c(\tau-1)}}{1 - e^{-c(\tau-1)}},$$
which is $\le 1$ if $c$ is large.

Let us try to prove (*) by induction. Assuming it true for $n$, we have
$$\|x_{n+1} - x_n\| = \|S_n^{-1}G(x_n)\| \le K\|G(x_{n-1} - S_{n-1}^{-1}G(x_{n-1}))\| \le K\left\{\|G(x_{n-1}) - dG_{x_{n-1}}S_{n-1}^{-1}G(x_{n-1})\| + K\|x_n - x_{n-1}\|^2\right\}$$
by Taylor's theorem. Now the first term on the right of the inequality vanishes, and we have $\|x_{n+1} - x_n\| \le K^2\|x_n - x_{n-1}\|^2 \le K^2e^{-2c\tau^n}$. For the induction to work we must have
$$K^2e^{-2c\tau^n} \le e^{-c\tau^{n+1}}, \quad\text{or}\quad K^2 \le e^{(2-\tau)c\tau^n}. \tag{**}$$
Since $\tau < 2$, this last inequality can be arranged by choosing $c$ sufficiently large. We must still verify (*) for $n = 1$. This says that $\|x_1 - x_0\| = \|S_0^{-1}G(0)\| \le e^{-c\tau}$, or
$$\|G(0)\| \le \frac{e^{-c\tau}}{K}. \tag{***}$$
In summary, for $1 < \tau < 2$ choose $c$ so that
$$K^2 \le e^{(2-\tau)c\tau} \qquad\text{and}\qquad \frac{e^{-c(\tau-1)}}{1 - e^{-c(\tau-1)}} \le 1.$$
Then if (***) holds, the sequence $x_n$ converges exponentially, that is, (*) holds. If $x = \lim x_i$, then $G(x) = \lim G(x_n) = \lim S_n(x_n - x_{n+1}) = 0$. This is Newton's method.

As a possible choice of $c$ and $\tau$, let $\tau = \frac{3}{2}$, and let $c$ be given by $K^2 = e^{3c/4}$, so that (**) just holds. We may also assume that $K \ge 2^{3/4}$, so that $e^{3c/4} \ge 4^{3/4}$, or $e^c \ge 4$, which guarantees that $e^{-c/2} \le \frac{1}{2}$, implying that $e^{-c/2}/(1 - e^{-c/2}) \le 1$. Then (***) becomes the requirement $\|G(0)\| \le K^{-5}$.

We end this section with an example of the fixed-point iterative procedure in its simplest context, that of the inverse-mapping theorem. We suppose that $H(0) = 0$ and that $dH_0^{-1}$ exists, and we want to invert $H$ near zero, i.e., solve the equation $H(\eta) - \xi = 0$ for $\eta$ in terms of $\xi$. Our theory above tells us that the $\eta$ corresponding to $\xi$ will be the fixed point of the contraction $K(\xi, \eta) = \eta - T^{-1}H(\eta) + T^{-1}(\xi)$, where $T = dH_0$.
In order to make our example as simple as possible, we shall take $H$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ and choose it so that $dH_0 = I$. Also, in order to avoid indices, we shall use the mongrel notation $\mathbf{x} = \langle x, y\rangle$, $\mathbf{u} = \langle u, v\rangle$. Consider the mapping $\mathbf{x} = H(\mathbf{u})$ defined by
$$x = u + v^2, \qquad y = u^3 + v.$$
The Jacobian matrix
$$\begin{bmatrix} 1 & 2v \\ 3u^2 & 1 \end{bmatrix}$$
is clearly the identity at the origin. Moreover, in the expression $K(\mathbf{x}, \mathbf{u}) = \mathbf{x} + \mathbf{u} - H(\mathbf{u})$, the difference $H(\mathbf{u}) - \mathbf{u}$ is just the function $J(\mathbf{u}) = \langle v^2, u^3\rangle$. This cancellation of the first-order terms is the practical expression of the fact that in forming $K(\xi, \eta) = \eta - T^{-1}G(\xi, \eta)$, we have acted to make $dK^2 = 0$ at the "center point" (the origin here). We naturally start the iteration with $\mathbf{u}_0 = 0$, and then our fixed-point sequence proceeds $\mathbf{u}_1 = K(\mathbf{x}, \mathbf{u}_0) = K(\mathbf{x}, 0), \ldots, \mathbf{u}_n = K(\mathbf{x}, \mathbf{u}_{n-1})$. Thus $\mathbf{u}_0 = 0$ and $\mathbf{u}_n = K(\mathbf{x}, \mathbf{u}_{n-1}) = \mathbf{x} - J(\mathbf{u}_{n-1})$, giving
$$u_1 = x, \qquad v_1 = y,$$
$$u_2 = x - y^2, \qquad v_2 = y - x^3,$$
$$u_3 = x - (y - x^3)^2, \qquad v_3 = y - (x - y^2)^3,$$
$$u_4 = x - [y - (x - y^2)^3]^2, \qquad v_4 = y - [x - (y - x^3)^2]^3.$$
We are guaranteed that this sequence $\mathbf{u}_n$ will converge geometrically provided the starting point $\mathbf{x}$ is close enough to 0, and it seems clear that these two sequences of polynomials are computing the Taylor series expansions for the inverse functions $u(x, y)$ and $v(x, y)$. We shall ask the reader to prove this in an exercise. The two Taylor series start out
$$u(x, y) = x - y^2 + 2x^3y + \cdots, \qquad v(x, y) = y - x^3 + 3x^2y^2 + \cdots.$$
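The fixed-point iteration of this example can be carried out symbolically, so that one literally watches the Taylor expansion of the inverse map stabilize (this is the content of Exercises 9.10 and 9.11 below). A minimal sketch, assuming SymPy:

```python
import sympy as sp

x, y = sp.symbols('x y')
u, v = sp.Integer(0), sp.Integer(0)          # u_0 = 0

# u_n = x - J(u_{n-1}) componentwise, with J(u) = <v^2, u^3> as in the text
for _ in range(4):
    u, v = sp.expand(x - v**2), sp.expand(y - u**3)

# Keep only the terms of total degree <= 4; by Exercise 9.10 these have stabilized.
low = lambda p: sum(c * x**i * y**j
                    for (i, j), c in sp.Poly(p, x, y).terms() if i + j <= 4)
print(low(u))   # x - y**2 + 2*x**3*y
print(low(v))   # y - x**3 + 3*x**2*y**2
```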
EXERCISES

9.1 Let $B$ be a compact subset of a normed linear space such that $rB \subset B$ for all $r \in [0, 1]$. Suppose that $F\colon B \to B$ is a Lipschitz mapping with constant 1 (i.e., $\|F(\xi) - F(\eta)\| \le \|\xi - \eta\|$ for all $\xi, \eta \in B$). Prove that $F$ has a fixed point. [Hint: Consider first $G = rF$ for $0 < r < 1$.]

9.2 Give an example to show that the fixed point in the above exercise may not be unique.

9.3 Let $X$ be a compact metric space, and let $K\colon X \to X$ "reduce each nonzero distance"; that is, $\rho(K(x), K(y)) < \rho(x, y)$ if $x \neq y$. Prove that $K$ has a unique fixed point. (Show that otherwise $\operatorname{glb}\{\rho(K(x), x)\}$ is positive and achieved as a minimum. Then get a contradiction.)

9.4 Let $K$ be a mapping from $S \times X$ to $X$, where $X$ is a complete metric space and $S$ is any metric space, and suppose that $K(s, x)$ is a contraction in $x$ uniformly over $s$ and is Lipschitz continuous in $s$ uniformly over $x$. Show that the fixed point $p_s$ is a Lipschitz continuous function of $s$. [Hint: Modify the $\epsilon,\delta$-beginning of the proof of Corollary 4 of Theorem 9.1.]

9.5 Let $D$ be an open subset of a Banach space $V$, and let $K\colon D \to V$ be such that $I - K$ is Lipschitz with constant $\frac{1}{2}$.
a) Show that if $B_r(\alpha) \subset D$ and $\beta = K(\alpha)$, then $B_{r/2}(\beta) \subset K[D]$. (Apply a corollary of the fixed-point theorem to a certain simple contraction mapping.)
b) Conclude that $K$ is injective and has an open range, and that $K^{-1}$ is Lipschitz with constant 2.

9.6 Deduce an improved version of the result in Exercise 3.20, Chapter 3, from the result in the above exercise.

9.7 In the context of Theorem 9.3, show that $dG^2_{\langle\mu,\nu\rangle}$ is invertible if $\|dK^2_{\langle\mu,\nu\rangle}\| < 1$. (Do not be confused by the notation. We merely want to know that $S$ is invertible if $\|I - T^{-1}\circ S\| < 1$.)

9.8 There is a slight discrepancy between the statements of Theorem 11.2 in Chapter 3 and Theorem 9.3. In the one case we assert the existence of a unique continuous mapping from a ball $M$, and in the other case, from the ball $M$ to the ball $N$. Show that the requirement that the range be in $N$ can be dropped by showing that two continuous solutions must agree on $M$. (Use the point-by-point uniqueness of Theorem 9.3.)

9.9 Compute the expression for $dF_a$ from the identity $G(\xi, F(\xi)) = 0$ in Theorem 9.4, and show that if $K$ is continuously differentiable, then all the maps involved in the solution expression are continuous and that $a \mapsto dF_a$ is therefore continuous.

9.10 Going back to the example worked out at the end of Section 9, show by induction that the polynomials $u_n - u_{n-1}$ and $v_n - v_{n-1}$ contain no terms of degree less than $n$.

9.11 Continuing the above exercise, show therefore that the power series defined by taking the terms of degree at most $n$ from $u_n$ is convergent in a ball about 0 and that its sum is the first component $u(x, y)$ of the mapping inverse to $H$.

9.12 The above conclusions hold generally. Let $J = \langle K, L\rangle$ be any mapping from a ball about 0 in $\mathbb{R}^2$ to $\mathbb{R}^2$ defined by the convergent power series
$$K(x, y) = \sum a_{ij}x^iy^j, \qquad L(x, y) = \sum b_{ij}x^iy^j,$$
in which there are no terms of degree 0 or 1. With the conventions $\mathbf{x} = \langle x, y\rangle$ and $\mathbf{u} = \langle u, v\rangle$, consider the iterative sequence $\mathbf{u}_0 = 0$, $\mathbf{u}_n = \mathbf{x} - J(\mathbf{u}_{n-1})$. Make any necessary assumptions about what happens when one power series is substituted in another, and show by induction that $\mathbf{u}_n - \mathbf{u}_{n-1}$ contains no terms of degree less than $n$, and therefore that the $\mathbf{u}_n$ define a convergent power series whose sum is the function $\mathbf{u}(x, y) = \langle u(x, y), v(x, y)\rangle$ inverse to $H$ in a neighborhood of 0. [Remember that $J(\eta) = H(\eta) - \eta$.]

9.13 Let $A$ be a Banach algebra, and let $x$ be an element of $A$ of norm less than 1. Show that
$$(e - x)^{-1} = \prod_{i=0}^{\infty} \left(e + x^{2^i}\right).$$
This means that if $\pi_n$ is the partial product $\prod_{i=0}^{n-1}(e + x^{2^i})$, then $\pi_n \to (e - x)^{-1}$. [Hint: Prove by induction that $(e - x)\pi_n = e - x^{2^n}$.] This is another example of convergence at an exponential rate, like Newton's method in the text.
10. THE INTEGRAL OF A PARAMETRIZED ARC

In this section we shall make our final application of completeness. We first prove a very general extension theorem, and then apply it to the construction of the Riemann integral as an extension of an elementary integral defined for step functions.

Theorem 10.1. Let $U$ be a subspace of a normed linear space $V$, and let $T$ be a bounded linear mapping from $U$ to a Banach space $W$. Then $T$ has a uniquely determined extension to a bounded linear transformation $S$ from the closure $\bar{U}$ to $W$. Moreover, $\|S\| = \|T\|$.

Proof. Fix $\alpha \in \bar{U}$ and choose $\{\xi_n\} \subset U$ so that $\xi_n \to \alpha$. Then $\{\xi_n\}$ is Cauchy and $\{T(\xi_n)\}$ is Cauchy (by the lemmas of Section 7), so that $\{T(\xi_n)\}$ converges to some $\beta \in W$. If $\{\eta_n\}$ is any other sequence in $U$ converging to $\alpha$, then $\xi_n - \eta_n \to 0$, $T(\xi_n) - T(\eta_n) = T(\xi_n - \eta_n) \to 0$, and so $T(\eta_n) \to \beta$ also. Thus $\beta$ is independent of the sequence chosen, and, clearly, $\beta$ must be the value $S(\alpha)$ at $\alpha$ of any continuous extension $S$ of $T$. If $\alpha \in U$, then $\beta = \lim T(\xi_n) = T(\alpha)$ by the continuity of $T$. We thus have $S$ uniquely defined on $\bar{U}$ by the requirement that it be a continuous extension of $T$.

It remains to be shown that $S$ is linear and bounded by $\|T\|$. For any $\alpha, \beta \in \bar{U}$ we choose $\{\xi_n\}, \{\eta_n\} \subset U$, so that $\xi_n \to \alpha$ and $\eta_n \to \beta$. Then $x\xi_n + y\eta_n \to x\alpha + y\beta$, so that $S(x\alpha + y\beta) = \lim T(x\xi_n + y\eta_n) = x\lim T(\xi_n) + y\lim T(\eta_n) = xS(\alpha) + yS(\beta)$. Thus $S$ is linear. Finally, $\|S(\alpha)\| = \lim\|T(\xi_n)\| \le \|T\|\lim\|\xi_n\| = \|T\|\cdot\|\alpha\|$. Thus $\|T\|$ is a bound for $S$, and, since $S$ includes $T$, $\|S\| = \|T\|$. $\square$

The above theorem has many applications, but we shall use it only once, to obtain the Riemann integral $\int_a^b f(t)\,dt$ of a continuous function $f$ mapping a closed interval $[a, b]$ into a Banach space $W$ as an extension of the trivial integral for step functions. If $W$ is a normed linear space and $f\colon [a, b] \to W$ is a continuous function defined on a closed interval $[a, b] \subset \mathbb{R}$, we might expect to be able to define $\int_a^b f(t)\,dt$ as a suitable vector in $W$ and to proceed with the integral calculus of vector-valued functions of one real variable. We haven't done this until now because we need the completeness of $W$ to prove that the integral exists! At first we shall integrate only certain elementary functions called step functions.
A finite subset $A$ of $[a, b]$ which contains the two endpoints $a$ and $b$ will be called a partition of $[a, b]$. Thus $A$ is (the range of) some finite sequence $\{t_i\}_0^n$, where $a = t_0 < t_1 < \cdots < t_n = b$, and $A$ subdivides $[a, b]$ into a sequence of smaller intervals. To be definite, we shall take the open intervals $(t_{i-1}, t_i)$, $i = 1, \ldots, n$, as the intervals of the subdivision. If $A$ and $B$ are partitions and $A \subset B$, we shall say that $B$ is a refinement of $A$. Then each interval $(s_{j-1}, s_j)$ of the $B$-subdivision is included in an interval $(t_{i-1}, t_i)$ of the $A$-subdivision; $t_{i-1}$ is the largest element of $A$ which is less than or equal to $s_{j-1}$, and $t_i$ is the smallest greater than or equal to $s_j$.

A step function is simply a map $f\colon [a, b] \to W$ which is constant on the intervals of some subdivision $A = \{t_i\}_0^n$. That is, there exists a sequence of vectors $\{\alpha_i\}_1^n$ such that $f(\xi) = \alpha_i$ when $\xi \in (t_{i-1}, t_i)$. The values of $f$ at the subdividing points may be among these values or they may be different. For each step function $f$ we define $\int_a^b f(t)\,dt$ as $\sum_{i=1}^n \alpha_i\,\Delta t_i$, where $f = \alpha_i$ on $(t_{i-1}, t_i)$ and $\Delta t_i = t_i - t_{i-1}$. If $f$ were real-valued, this would be simply the sum of the areas of the rectangles making up the region between the graph of $f$ and the $t$-axis.

Now $f$ may be described as a step function in terms of many different subdivisions. For example, if $f$ is constant on the intervals of $A$, and if we obtain $B$ from $A$ by adding one new point $s$, then $f$ is constant on the (smaller) intervals of $B$. We have to be sure that the value of the integral of $f$ doesn't change when we change the describing subdivision. In the case just mentioned this is easy to see. The one new point $s$ lies in some interval $(t_{i-1}, t_i)$ defined by the partition $A$. The contribution of this interval to the $A$-sum is $\alpha_i(t_i - t_{i-1})$, while in the $B$-sum it splits into $\alpha_i(t_i - s) + \alpha_i(s - t_{i-1})$. But this is the same vector. The remaining summands are the same in the two sums, and the integral is therefore unchanged.

In general, suppose that $f$ is a step function with respect to $A$ and also with respect to $C$. Set $B = A \cup C$, the "common refinement" of $A$ and $C$. We can pass from $A$ to $B$ in a sequence of steps at each of which we add one new point. As we have seen, the integral remains unchanged at each of these steps, and so it is the same for $A$ as for $B$. It is similarly the same for $C$ and $B$, and so for $A$ and $C$. We have thus shown that $\int_a^b f$ is independent of the subdivision used to define $f$.

Now fix $[a, b]$ and $W$, and let $\mathcal{S}$ be the set of all step functions from $[a, b]$ to $W$. Then $\mathcal{S}$ is a vector space. For, if $f$ and $g$ in $\mathcal{S}$ are step functions relative to partitions $A$ and $B$, then both functions are constant on the intervals of $C = A \cup B$, and therefore $xf + yg$ is also. Moreover, if $C = \{t_i\}_0^n$, and if on $(t_{i-1}, t_i)$ we have $f = \alpha_i$ and $g = \beta_i$, so that $xf + yg = x\alpha_i + y\beta_i$ there, then the equation
$$\sum_{i=1}^n (x\alpha_i + y\beta_i)\,\Delta t_i = x\sum_{i=1}^n \alpha_i\,\Delta t_i + y\sum_{i=1}^n \beta_i\,\Delta t_i$$
is just $\int_a^b (xf + yg) = x\int_a^b f + y\int_a^b g$. The map $f \mapsto \int_a^b f$ is thus linear from $\mathcal{S}$ to $W$.
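A step-function integral is just the finite sum $\sum \alpha_i\,\Delta t_i$, and the refinement-invariance argued above is easy to watch numerically. A minimal sketch (illustrative, assuming NumPy, with $W = \mathbb{R}^2$):

```python
import numpy as np

def step_integral(breaks, values):
    """breaks: t_0 < ... < t_n; values[i] is the constant value on (t_{i-1}, t_i).
    Returns sum_i values[i] * (t_i - t_{i-1})."""
    return sum(a * (t1 - t0) for a, t0, t1 in zip(values, breaks[:-1], breaks[1:]))

breaks = [0.0, 0.25, 0.6, 1.0]
values = [np.array([1.0, 0.0]), np.array([2.0, -1.0]), np.array([0.5, 3.0])]

# Refine by adding the point 0.4, where f is already constant; the integral
# is unchanged, exactly as in the argument above.
breaks2 = [0.0, 0.25, 0.4, 0.6, 1.0]
values2 = [values[0], values[1], values[1], values[2]]
print(step_integral(breaks, values), step_integral(breaks2, values2))  # equal
```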
Finally,
$$\left\|\int_a^b f\right\| = \left\|\sum_{i=1}^n \alpha_i\,\Delta t_i\right\| \le \sum_{i=1}^n \|\alpha_i\|\,\Delta t_i \le (b - a)\|f\|_\infty,$$
where $\|f\|_\infty = \operatorname{lub}\{\|f(t)\| : t \in [a, b]\} = \max\{\|\alpha_i\| : 1 \le i \le n\}$. That is, if we use on $\mathcal{S}$ the uniform norm defined from the norm of $W$, then the linear mapping $f \mapsto \int_a^b f$ is bounded by $(b - a)$. If $W$ is complete, this transformation therefore has a unique bounded linear extension to the closure $\bar{\mathcal{S}}$ of $\mathcal{S}$ in $\mathcal{B}([a, b], W)$ by Theorem 10.1. But we can show that $\bar{\mathcal{S}}$ includes the space $\mathcal{C}([a, b], W)$ of all continuous functions from $[a, b]$ to $W$, and the integral of a continuous function is thus uniquely defined.

Lemma 10.1. $\mathcal{C}([a, b], W) \subset \bar{\mathcal{S}}$.

Proof. A continuous function $f$ on $[a, b]$ is uniformly continuous (Theorem 5.1). That is, given $\epsilon > 0$, there exists $\delta > 0$ such that $|s - t| < \delta \Rightarrow \|f(s) - f(t)\| < \epsilon$. Now take any partition $A = \{t_i\}_0^n$ of $[a, b]$ such that $\Delta t_i = t_i - t_{i-1} < \delta$ for all $i$, and take $\alpha_i$ as any value of $f$ on $(t_{i-1}, t_i)$. Then $\|f(t) - \alpha_i\| < \epsilon$ on $[t_{i-1}, t_i]$. Thus, if $g$ is the step function with value $\alpha_i$ on $(t_{i-1}, t_i]$ and $g(a) = \alpha_1$, then $\|f - g\|_\infty \le \epsilon$. Thus $f$ is in $\bar{\mathcal{S}}$, as desired. $\square$

Our main theorem is a recapitulation.

Theorem 10.2. If $W$ is a Banach space and $V = \mathcal{C}([a, b], W)$ under the uniform norm, then there exists a $J \in \operatorname{Hom}(V, W)$ uniquely determined by setting $J(f) = \lim \int_a^b f_n$, where $\{f_n\}$ is any sequence in $\mathcal{S}$ converging to $f$ and $\int_a^b f_n$ is the integral on $\mathcal{S}$ defined above. Moreover, $\|J\| \le (b - a)$.

If $f$ is elementary (a step function) from $[a, b]$ to $W$ and $c \in [a, b]$, then of course $f$ is elementary on each of $[a, c]$ and $[c, b]$. If $c$ is added to a subdivision $A$ used in defining $f$, and if the sum defining $\int_a^b f$ with respect to $B = A \cup \{c\}$ is broken into two sums at $c$, we clearly have $\int_a^b f = \int_a^c f + \int_c^b f$. This same identity then follows for any continuous function $f$ on $[a, b]$, since
$$\int_a^b f = \lim \int_a^b f_n = \lim\left(\int_a^c f_n + \int_c^b f_n\right) = \lim \int_a^c f_n + \lim \int_c^b f_n = \int_a^c f + \int_c^b f.$$

The fundamental theorem of the calculus is still with us.

Theorem 10.3. If $f \in \mathcal{C}([a, b], W)$ and $F\colon [a, b] \to W$ is defined by $F(x) = \int_a^x f(t)\,dt$, then $F'$ exists on $(a, b)$ and $F'(x) = f(x)$.

Proof. By the continuity of $f$ at $x_0$, for every $\epsilon$ there exists a $\delta$ such that $\|f(x_0) - f(x)\| < \epsilon$ whenever $|x - x_0| < \delta$. But then $\|\int_{x_0}^x (f(x_0) - f(t))\,dt\| \le \epsilon|x - x_0|$, and since $\int_{x_0}^x f(x_0)\,dt = f(x_0)(x - x_0)$ by the definition of the integral for an elementary function, we see that
$$\left\|f(x_0) - \frac{\int_{x_0}^x f(t)\,dt}{x - x_0}\right\| \le \epsilon.$$
Since $\int_{x_0}^x f(t)\,dt = F(x) - F(x_0)$, this is exactly the statement that the difference quotient for $F$ converges to $f(x_0)$, as was to be proved. $\square$
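Theorem 10.3 can be checked numerically by approximating the integral with step functions, as in its construction. The sketch below (illustrative, assuming NumPy, with $W = \mathbb{R}^2$) compares the difference quotient of $F(x) = \int_0^x f$ with $f(x_0)$:

```python
import numpy as np

def integral(f, a, b, n=1000):
    """Approximates the integral of the arc f by the step function taking
    the value f(t_i) on each of n equal subintervals of [a, b]."""
    t = np.linspace(a, b, n, endpoint=False)
    return sum(f(ti) for ti in t) * (b - a) / n

f = lambda t: np.array([np.sin(t), np.cos(t)])

# (F(x0 + h) - F(x0))/h equals (1/h) times the integral of f over [x0, x0 + h]
x0, h = 1.0, 1e-4
diff_quot = integral(f, x0, x0 + h) / h
print(diff_quot, f(x0))   # agreement to about h, as Theorem 10.3 predicts
```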
EXERCISES

10.1 Prove the following analogue of Theorem 10.1. Let $A$ be a subset of a metric space $B$, let $C$ be a complete metric space, and let $F\colon A \to C$ be uniformly continuous. Then $F$ extends uniquely to a continuous map from $\bar{A}$ to $C$.

10.2 In Exercises 7.16 through 7.18 we have constructed a completion of $S$, namely, $\overline{\Theta[S]}$ in $V^*$. Prove that this completion is unique to within isometry. That is, supposing that $\varphi$ is some other isometric imbedding of $S$ in a complete space $X$, show that the identification of the two images of $S$ by $\varphi \circ \Theta^{-1}$ (from $\Theta[S]$ to $\varphi[S]$) extends to an isometric bijection from $\overline{\Theta[S]}$ to $\overline{\varphi[S]}$. [Hint: Apply the above exercise.]

10.3 Suppose that $S$ is a normed linear space $X$ and that $X$ is a dense subset of a complete metric space $Y$. This means, remember, that every point of $Y$ is the limit of a sequence lying in the subset $X$. Prove that the vector space structure of $X$ extends in a unique way to make $Y$ a Banach space. Since we know from Exercise 7.18 that a metric space can be completed, this shows again that a normed linear space can always be completed to a Banach space.

10.4 In the elementary calculus, if $f$ is continuous, then $\int_a^b f(t)\,dt = f(x)(b - a)$ for some $x$ in $(a, b)$. Show that this is not true for vector-valued continuous functions $f$ by considering the arc $f\colon [0, \pi] \to \mathbb{R}^2$ defined by $f(t) = \langle\sin t, \cos t\rangle$.

10.5 Show that integration commutes with the application of linear transformations. That is, show that if $f$ is a continuous function from $[a, b]$ to a Banach space $W$, and if $T \in \operatorname{Hom}(W, X)$, where $X$ is a Banach space, then
$$\int_a^b T(f(t))\,dt = T\left[\int_a^b f(t)\,dt\right].$$
[Hint: Make the computation directly for step functions.]

10.6 State and prove the theorem suggested by the following identity:
$$\int_a^b \langle f(t), g(t)\rangle\,dt = \left\langle\int_a^b f(t)\,dt,\ \int_a^b g(t)\,dt\right\rangle.$$
(Apply the above exercise.)

10.7 Let $W$ be any normed linear space, $\{\alpha_i\}_1^n$ a finite set of vectors in $W$, and $\{f_i\}_1^n$ a corresponding set of real-valued continuous functions on $[a, b]$. Define the arc $\gamma$ by
$$\gamma(t) = \sum_1^n f_i(t)\alpha_i.$$
Prove that $\int_a^b \gamma(t)\,dt$ exists and equals $\sum_1^n\left[\int_a^b f_i(t)\,dt\right]\alpha_i$.
10.8 Let $f$ be a continuous function from $\mathbb{R}^2$ to a Banach space $W$. Describe how one might set up a theory of a double integral
$$\iint_{I\times J} f(s, t)\,ds\,dt,$$
where $I \times J$ is a closed rectangle.

10.9 Prove that if $f_n$ converges uniformly to $f$, then $\int_a^b f_n(t)\,dt \to \int_a^b f(t)\,dt$. This is trivial if you have understood the definition and properties of the integral.

10.10 Suppose that $\{f_n\}$ is a sequence of smooth arcs from $[a, b]$ to a Banach space $W$ such that $\sum_1^\infty f_n'(t)$ is uniformly convergent. Suppose also that $\sum_1^\infty f_n(a)$ is convergent. Prove that then $\sum f_n(t)$ is uniformly convergent, that $f = \sum_1^\infty f_n$ is smooth, and that $f' = \sum_1^\infty f_n'$. (Use the above exercise and the fundamental theorem of the calculus.)

10.11 Prove that even if $W$ is not a Banach space, if the arc $f\colon [a, b] \to W$ has a continuous derivative, then $\int_a^b f'$ exists and equals $f(b) - f(a)$.

10.12 Let $X$ be a normed linear space, and set $(l, \xi) = l(\xi)$ for $\xi \in X$ and $l \in X^*$. Now let $f$ and $g$ be continuously differentiable functions (arcs) from the closed interval $[a, b]$ to $X$ and $X^*$, respectively. Prove the integration-by-parts formula:
$$(g(b), f(b)) - (g(a), f(a)) = \int_a^b (g(t), f'(t))\,dt + \int_a^b (g'(t), f(t))\,dt.$$
[Hint: Apply Theorem 8.4 from Chapter 3.]

10.13 State the generalization of the above integration-by-parts formula that holds for any bounded bilinear mapping $\omega\colon V \times W \to X$, where $X$ is a Banach space.

10.14 Let $t \mapsto l_t$ be a fixed continuous map from a closed interval $[a, b]$ to the dual $W^*$ of a Banach space $W$. Suppose that for any continuous map $g$ from $[a, b]$ to $W$,
$$\int_a^b g(t)\,dt = 0 \;\Rightarrow\; \int_a^b l_t(g(t))\,dt = 0.$$
Show that there exists a fixed $L \in W^*$ such that
$$\int_a^b l_t(g(t))\,dt = L\left(\int_a^b g(t)\,dt\right)$$
for all continuous arcs $g\colon [a, b] \to W$. Show that it then follows that $l_t = L$ for all $t$.

10.15 Use the above exercise to deduce the general Euler equation of Section 3.15.

11. THE COMPLEX NUMBER SYSTEM

The complex number system $\mathbb{C}$ is the third basic number field that must be studied, after the rational numbers and the real numbers, and the reader surely has had some contact with it in the past.

Almost everybody views a complex number $\zeta$ as being equivalent to a pair of real numbers, the "real and imaginary parts" of $\zeta$, and the complex number system $\mathbb{C}$ is thus viewed as being Cartesian 2-space $\mathbb{R}^2$ with some further structure.
In particular, a complex-valued function is simply a certain kind of vector-valued function, and is equivalent to an ordered pair of real-valued functions, again its real and imaginary parts.

What distinguishes the complex number system $\mathbb{C}$ from its vector substratum $\mathbb{R}^2$ is the presence of an additional operation, complex multiplication. The vector operations of $\mathbb{R}^2$ together with this complex multiplication operation make $\mathbb{C}$ into a commutative algebra. Moreover, it turns out that $\langle 1, 0\rangle$ is the unique multiplicative identity in $\mathbb{C}$ and that every nonzero complex number $\zeta$ has a multiplicative inverse. These additional facts are summarized by saying that $\mathbb{C}$ is a field, and they allow us to use $\mathbb{C}$ as a new scalar field in vector space theory. In fact, the whole development of Chapters 1 and 2 remains valid when $\mathbb{R}$ is replaced everywhere by $\mathbb{C}$. Scalar multiplication is now multiplication by complex numbers. Thus $\mathbb{C}^n$ is the vector space of ordered $n$-tuples of complex numbers $\langle\zeta_1, \ldots, \zeta_n\rangle$, and the product of an $n$-tuple by a complex scalar $\alpha$ is defined by $\alpha\langle\zeta_1, \ldots, \zeta_n\rangle = \langle\alpha\zeta_1, \ldots, \alpha\zeta_n\rangle$, where $\alpha\zeta_i$ is complex multiplication.

It is time to come to grips with complex multiplication. As the reader probably knows, it is given by an odd-looking formula that is motivated by thinking of an element $\zeta = \langle x_1, x_2\rangle$ as being in the form $x_1 + ix_2$, where $i^2 = -1$, and then using the ordinary laws of algebra. Then we have
$$\zeta\eta = (x_1 + ix_2)(y_1 + iy_2) = x_1y_1 + ix_1y_2 + ix_2y_1 + i^2x_2y_2 = (x_1y_1 - x_2y_2) + i(x_1y_2 + x_2y_1),$$
and thus our definition is
$$\langle x_1, x_2\rangle\langle y_1, y_2\rangle = \langle x_1y_1 - x_2y_2,\ x_1y_2 + x_2y_1\rangle.$$
Of course, it has to be verified that this operation is commutative and satisfies the laws for an algebra. A straightforward check is possible but dull, and we shall indicate a neater way in the exercises.

The mapping $x \mapsto \langle x, 0\rangle$ is an isomorphic injection of the field $\mathbb{R}$ into the field $\mathbb{C}$. It clearly preserves sums, and the reader can check in his mind that it also preserves products. It is conventional to identify $x$ with its image $\langle x, 0\rangle$, and so to view $\mathbb{R}$ as a subfield of $\mathbb{C}$. The mysterious $i$ can be identified in $\mathbb{C}$ as the pair $\langle 0, 1\rangle$, since then $i^2 = \langle 0, 1\rangle\langle 0, 1\rangle = \langle -1, 0\rangle$, which we have identified with $-1$. With these identifications we have $\langle x, y\rangle = \langle x, 0\rangle + \langle 0, y\rangle = \langle x, 0\rangle + \langle 0, 1\rangle\langle y, 0\rangle = x + iy$, and this is the way we shall write complex numbers from now on.

The mapping $x + iy \mapsto x - iy$ is a field isomorphism of $\mathbb{C}$ with itself. That is, it preserves both sums and products, as the reader can easily check. Such a self-isomorphism is called an automorphism. The above automorphism is called complex conjugation, and the image $x - iy$ of $\zeta = x + iy$ is called the conjugate of $\zeta$, and is designated $\bar{\zeta}$.
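The pair formula for complex multiplication can be checked against any complex arithmetic package. A minimal sketch in plain Python, whose built-in complex type implements exactly this operation:

```python
def cmul(z, w):
    """<x1, x2><y1, y2> = <x1 y1 - x2 y2, x1 y2 + x2 y1>."""
    x1, x2 = z
    y1, y2 = w
    return (x1 * y1 - x2 * y2, x1 * y2 + x2 * y1)

z, w = (1.0, 2.0), (3.0, -4.0)
print(cmul(z, w))                    # (11.0, 2.0)
print(complex(*z) * complex(*w))     # (11+2j): the built-in product agrees
```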
We shall ask the reader to show in an exercise that conjugation is the only automorphism of $\mathbb{C}$ (except the identity automorphism) which leaves the elements of the subfield $\mathbb{R}$ fixed.

The Euclidean norm of $\zeta = x + iy = \langle x, y\rangle$ is called the absolute value of $\zeta$, and is designated $|\zeta|$, so that $|\zeta| = |x + iy| = (x^2 + y^2)^{1/2}$. This is reasonable because it then turns out that $|\zeta\gamma| = |\zeta|\,|\gamma|$. This can be verified by squaring and multiplying, but it is much more elegant first to notice the relationship between absolute value and the conjugation automorphism, namely, $\zeta\bar{\zeta} = |\zeta|^2$ [for $(x + iy)(x - iy) = x^2 - (iy)^2 = x^2 + y^2$]. Then
$$|\zeta\gamma|^2 = (\zeta\gamma)(\overline{\zeta\gamma}) = (\zeta\bar{\zeta})(\gamma\bar{\gamma}) = |\zeta|^2|\gamma|^2,$$
and taking square roots gives us our identity. The identity $\zeta\bar{\zeta} = |\zeta|^2$ also shows us that if $\zeta \neq 0$, then $\bar{\zeta}/|\zeta|^2$ is its multiplicative inverse.

Because the real number system $\mathbb{R}$ is a subfield of the complex number system $\mathbb{C}$, any vector space over $\mathbb{C}$ is automatically also a vector space over $\mathbb{R}$: multiplication by complex scalars includes multiplication by real scalars. And any complex linear transformation between complex vector spaces is automatically real linear. The converse, of course, does not hold. For example, a real linear mapping $T$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ is not in general complex linear from $\mathbb{C}$ to $\mathbb{C}$, nor does a real linear $S$ in $\operatorname{Hom}\mathbb{R}^4$ become a complex linear mapping in $\operatorname{Hom}\mathbb{C}^2$ when $\mathbb{R}^4$ is viewed as $\mathbb{C}^2$. We shall study this question in the exercises.

The complex differentiability of a mapping $F$ between complex vector spaces has the obvious definition $\Delta F_\alpha = T + \theta$, where $T$ is complex linear, and then $F$ is also real differentiable, in view of the above remarks. But $F$ may be real differentiable without being complex differentiable. It follows from the discussion at the end of Section 8 that if $\{a_n\} \subset \mathbb{C}$ and $\{|a_n|\delta^n\}$ is bounded, then the series $\sum a_n\zeta^n$ converges on the ball $B_\delta(0)$ in the (real) Banach algebra $\mathbb{C}$, and $F(\zeta) = \sum_0^\infty a_n\zeta^n$ is real differentiable on this ball, with $dF_\beta(\zeta) = \left(\sum_1^\infty na_n\beta^{n-1}\right)\zeta = F'(\beta)\cdot\zeta$. But multiplication by $F'(\beta)$ is obviously a complex linear operation on the one-dimensional complex vector space $\mathbb{C}$. Therefore, complex-valued functions defined by convergent complex power series are automatically complex differentiable. But we can go even further. In this case, if $\zeta \neq 0$, we can divide by $\zeta$ in the defining equation to get the result that
$$\frac{\Delta F_\beta(\zeta)}{\zeta} \to F'(\beta) \quad\text{as}\quad \zeta \to 0.$$
That is, $F'(\beta)$ is now an honest derivative again, with the complex infinitesimal $\zeta$ in the denominator of the difference quotient.

The consequences of complex differentiability are incalculable, and we shall mostly leave them as future pleasures to be experienced in a course on functions of complex variables. See, however, the problems on the residue calculus at the end of Chapter 12 and the proof in Chapter 11, Exercise 4.3, of the following fundamental theorem of algebra.
Theorem. Every polynomial with complex coefficients is a product of linear factors.

A weaker but equivalent statement is that every polynomial has at least one (complex) root. The crux of the matter is that $x^2 + 1$ cannot be factored over $\mathbb{R}$ (i.e., it has no real root), but over $\mathbb{C}$ we have $x^2 + 1 = (x + i)(x - i)$, with the two roots $\pm i$.

For later use we add a few more words about the complex exponential function $\exp\zeta = e^\zeta = \sum_0^\infty \zeta^n/n!$. If $\zeta = x + iy$, we have $e^\zeta = e^{x+iy} = e^xe^{iy}$, and
$$e^{iy} = \sum_0^\infty (iy)^n/n! = \left(1 - \frac{y^2}{2!} + \frac{y^4}{4!} - \cdots\right) + i\left(y - \frac{y^3}{3!} + \frac{y^5}{5!} - \cdots\right) = \cos y + i\sin y.$$
Thus $e^{x+iy} = e^x(\cos y + i\sin y)$. That is, the real and imaginary parts of the complex-valued function $\exp(x + iy)$ are $e^x\cos y$ and $e^x\sin y$, respectively.
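The Euler formula just derived is easy to test to machine precision. A one-line check in plain Python (the test point is arbitrary):

```python
import cmath, math

x, y = 0.7, 2.3   # arbitrary test point
lhs = cmath.exp(complex(x, y))
rhs = math.exp(x) * complex(math.cos(y), math.sin(y))
print(abs(lhs - rhs))   # ~1e-16: e^(x+iy) = e^x (cos y + i sin y)
```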
EXERCISES

11.1 Prove the associativity of complex multiplication directly from its definition.

11.2 Prove the distributive law, $\alpha(\xi + \eta) = \alpha\xi + \alpha\eta$, for complex numbers.

11.3 Show that scalar multiplication by a real number $a$, $a\langle x, y\rangle = \langle ax, ay\rangle$, in $\mathbb{C} = \mathbb{R}^2$ is consistent with the interpretation of $a$ as the complex number $\langle a, 0\rangle$ and the definition of complex multiplication.

11.4 Let $\theta$ be an automorphism of the complex number field leaving the real numbers fixed. Prove that $\theta$ is either the identity or complex conjugation. [Hint: $(\theta(i))^2 = \theta(i^2) = \theta(-1) = -1$. Show that the only complex numbers $x + iy$ whose squares are $-1$ are $\pm i$, and then finish up.]

11.5 If we remember that $\mathbb{C}$ is in particular the two-dimensional real vector space $\mathbb{R}^2$, we see that multiplying the elements of $\mathbb{C}$ by the complex number $a + ib$ must define a linear transformation on $\mathbb{R}^2$. Show that its matrix is
$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix}.$$

11.6 The above exercise suggests that the complex number system may be like the set $A$ of all $2 \times 2$ real matrices of the form
$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix}.$$
Prove that $A$ is a subalgebra of the matrix algebra $\mathbb{R}^{2\times 2}$ (that is, $A$ is closed under multiplication, addition, and scalar multiplication) and that the mapping
$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix} \mapsto a + ib$$
is a bijection from $A$ to $\mathbb{C}$ that preserves all algebra operations. We therefore can conclude that the laws of an algebra automatically hold for $\mathbb{C}$. Why?

11.7 In the above matrix model of the complex number system show that the absolute value identity $|\zeta\gamma| = |\zeta|\,|\gamma|$ is a determinant property.

11.8 Let $W$ be a real vector space, and let $V$ be the real vector space $W \times W$. Show that there is a $\theta$ in $\operatorname{Hom} V$ such that $\theta^2 = -I$. (Think of $\mathbb{C}$ as being the real vector space $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$ under multiplication by $i$.)

11.9 Let $V$ be a real vector space, and let $\theta$ in $\operatorname{Hom} V$ satisfy $\theta^2 = -I$. Show that $V$ becomes a complex vector space if $i\alpha$ is defined as $\theta(\alpha)$. If the complex vector space $V$ is made from the real vector space $W$ as in this and the above exercise, we shall call $V$ the complexification of $W$. We shall regard $W$ itself as being a real subspace of $V$ (actually $W \times \{0\}$), and then $V = W \oplus iW$.

11.10 Show that the complex vector space $\mathbb{C}^n$ is the complexification of $\mathbb{R}^n$. Show more generally that for any set $A$ the complex vector space $\mathbb{C}^A$ is the complexification of the real vector space $\mathbb{R}^A$.

11.11 Let $V$ be the complexification of the real vector space $W$. Define the operation of complex conjugation on $V$. That is, show that there is a real linear mapping $\varphi$ such that $\varphi^2 = I$ and $\varphi(i\alpha) = -i\varphi(\alpha)$. Show, conversely, that if $V$ is a complex vector space and $\varphi$ is a conjugation on $V$ [a real linear mapping $\varphi$ such that $\varphi^2 = I$ and $\varphi(i\alpha) = -i\varphi(\alpha)$], then $V$ is (isomorphic to) the complexification of a real linear space $W$. (Apply Theorem 5.5 of Chapter 1 to the identity $\varphi^2 - I = 0$.)

11.12 Let $W$ be a real vector space, and let $V$ be its complexification. Show that every $T$ in $\operatorname{Hom} W$ "extends" to a complex linear $S$ in $\operatorname{Hom} V$ which commutes with the conjugation $\varphi$. By $S$ extending $T$ we mean, of course, that $S \restriction (W \times \{0\}) = T$. Show, conversely, that if $S$ in $\operatorname{Hom} V$ commutes with conjugation, then $S$ is the extension of a $T$ in $\operatorname{Hom} W$.

11.13 In this situation we naturally call $S$ the complexification of $T$. Show finally that if $S$ is the complexification of $T$, then its null space $X$ in $V$ is the direct sum $X = N \oplus iN$, where $N$ is the null space of $T$ in $W$. Remember that we are viewing $V$ as $W \oplus iW$.

11.14 On a complex normed linear space $V$ the norm is required to be complex homogeneous: $\|\lambda\alpha\| = |\lambda|\cdot\|\alpha\|$ for all complex numbers $\lambda$. Show that the natural definitions of $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$ on $\mathbb{C}^n$ have this property.

11.15 If a real normed linear space $W$ is complexified to $V = W \oplus iW$, there is no trivial formula which converts the real norm for $W$ into a complex norm for $V$. Show that, nevertheless, any product norm on $V$ (which really is $W \times W$) can be used to generate an equivalent complex norm. [Hint: Given $\langle\xi, \eta\rangle \in V$, consider the set of numbers $\{\|(x + iy)\langle\xi, \eta\rangle\| : |x + iy| = 1\}$, and try to obtain from this set a single number that works.]

11.16 Show that every nonzero complex number has a logarithm. That is, show that if $u + iv \neq 0$, then there exists an $x + iy$ such that $e^{x+iy} = u + iv$. (Write the equation $e^x(\cos y + i\sin y) = u + iv$, and solve by being slightly clever.)
11.17 The fundamental theorem of algebra and Theorem 5.5 of Chapter 1 imply that if $V$ is a complex vector space and $T$ in $\operatorname{Hom} V$ satisfies $p(T) = 0$ for a polynomial $p$, then there are subspaces $\{V_i\}_1^n$ of $V$, complex numbers $\{\lambda_i\}_1^n$, and integers $\{m_i\}_1^n$ such that $V = \bigoplus_1^n V_i$, $V_i$ is $T$-invariant for each $i$, and $(T - \lambda_iI)^{m_i} = 0$ on $V_i$ for each $i$. Show that this is so. Show also that if $V$ is finite-dimensional, then every $T$ in $\operatorname{Hom} V$ must satisfy some polynomial equation $p(t) = 0$. (Consider the linear independence or dependence of the vectors $I, T, T^2, \ldots, T^{n^2}, \ldots$ in the vector space $\operatorname{Hom} V$.)

11.18 Suppose that the polynomial $p$ in the above exercise has real coefficients. Use the fact that complex conjugation is an automorphism of $\mathbb{C}$ to prove that if $\lambda$ is a root of $p$, then so is $\bar{\lambda}$. Show that if $V$ is the complexification of a real space $W$ and $T$ is the complexification of $R \in \operatorname{Hom} W$, then there exists a real polynomial $p$ such that $p(T) = 0$.

11.19 Show that if $W$ is a finite-dimensional real vector space and $R \in \operatorname{Hom} W$ is an isomorphism, then there exists an $A \in \operatorname{Hom} W$ such that $R = e^A$ (that is, $\log R$ exists). This is a hard exercise, but it can be proved from Exercises 8.19 through 8.23, 11.12, 11.17, and 11.18.

*12. WEAK METHODS

Our theorem that all norms are equivalent on a finite-dimensional space suggests that the limit theory of such spaces should be accessible independently of norms, and our earlier theorem that every linear transformation with a finite-dimensional domain is automatically bounded reinforces this impression. We shall look into this question in this section. In a sense this effort is irrelevant, since we can't do without norms completely, and since they are so handy that we use them even when we don't have to.

Roughly speaking, what we are going to do is to study a vector-valued map $F$ by studying the whole collection of real-valued maps $\{l \circ F : l \in V^*\}$.

Theorem 12.1. If $V$ is finite-dimensional, then $\xi_n \to \xi$ in $V$ (with respect to any, and so every, norm) if and only if $l(\xi_n) \to l(\xi)$ in $\mathbb{R}$ for each $l$ in $V^*$.

Proof. If $\xi_n \to \xi$ and $l \in V^*$, then $l(\xi_n) \to l(\xi)$, since $l$ is automatically continuous. Conversely, if $l(\xi_n) \to l(\xi)$ for every $l$ in $V^*$, then, choosing a basis $\{\beta_i\}_1^n$ for $V$, we have $l_i(\xi_n) \to l_i(\xi)$ for each functional $l_i$ in the dual basis, and this implies that $\xi_n \to \xi$ in the associated one-norm, since $\|\xi_n - \xi\|_1 = \sum_1^n |l_i(\xi_n) - l_i(\xi)| \to 0$. $\square$

Remark. If $V$ is an arbitrary normed linear space, so that $V^* = \operatorname{Hom}(V, \mathbb{R})$ is the set of bounded linear functionals, then we say that $\xi_n \to \xi$ weakly if $l(\xi_n) \to l(\xi)$ for each $l \in V^*$. The above theorem can therefore be rephrased to say that in a finite-dimensional space, weak convergence and norm convergence are equivalent notions.

We shall now see that in a similar way the integration and differentiation of parametrized arcs can all be thrown back to the standard calculus of real-valued functions of a real variable by applying functionals from $V^*$ and using the natural isomorphism of $V^{**}$ with $V$.
Thus, if $f \in \mathcal{C}([a, b], V)$ and $\lambda \in V^*$, then $\lambda \circ f \in \mathcal{C}([a, b], \mathbb{R})$, and so the integral $\int_a^b \lambda \circ f$ exists from standard calculus. If we vary $\lambda$, we can check that the map $\lambda \mapsto \int_a^b \lambda \circ f$ is linear, hence is in $V^{**}$, and therefore is given by a uniquely determined vector $\alpha \in V$ (by duality; see Chapter 2, Theorem 3.2). That is, there exists a unique $\alpha \in V$ such that $\lambda(\alpha) = \int_a^b \lambda \circ f$ for every $\lambda \in V^*$, and we define this $\alpha$ to be $\int_a^b f$. Thus integration is defined so as to commute with the application of linear functionals: $\int_a^b f$ is that vector such that
$$\lambda\left(\int_a^b f\right) = \int_a^b \lambda(f(t))\,dt \quad\text{for all}\quad \lambda \in V^*.$$

Similarly, if all the real-valued functions $\{\lambda \circ f : \lambda \in V^*\}$ are differentiable at $x_0$, then the mapping $\lambda \mapsto (\lambda \circ f)'(x_0)$ is linear by the linearity of the derivative in the standard calculus. Therefore, there is again a unique $\alpha \in V$ such that $(\lambda \circ f)'(x_0) = \lambda(\alpha)$ for all $\lambda \in V^*$, and if we define this $\alpha$ to be the derivative $f'(x_0)$, we have again defined an operation of the calculus by commutativity with linear functionals:
$$\lambda(f'(x_0)) = (\lambda \circ f)'(x_0).$$

Now the fundamental theorem of the calculus appears as follows. If $F(x) = \int_a^x f$, then $(\lambda \circ F)(x) = \int_a^x \lambda \circ f$ by the weak definition of the integral. The fundamental theorem of the standard calculus then says that $(\lambda \circ F)'$ exists and $(\lambda \circ F)'(x) = (\lambda \circ f)(x) = \lambda(f(x))$. By the weak definition of the derivative we then have that $F'$ exists and $F'(x) = f(x)$.

The one conclusion that we don't get so easily by weak methods is the norm inequality $\|\int_a^b f\| \le (b - a)\|f\|_\infty$. This requires a theorem about norms on finite-dimensional spaces that we shall not prove in this course.

Theorem 12.2. $\|\alpha^{**}\| = \|\alpha\|$ for each $\alpha \in V$.

What is being asserted is that $\operatorname{lub}|\alpha^{**}(\lambda)|/\|\lambda\| = \|\alpha\|$. Since $\alpha^{**}(\lambda) = \lambda(\alpha)$, and since $|\lambda(\alpha)| \le \|\lambda\|\cdot\|\alpha\|$ by the definition of $\|\lambda\|$, we see that $\operatorname{lub}|\alpha^{**}(\lambda)|/\|\lambda\| \le \|\alpha\|$. Our problem is therefore to find $\lambda \in V^*$ with $\|\lambda\| = 1$ and $|\lambda(\alpha)| = \|\alpha\|$. If we multiply through by a suitable constant (replacing $\alpha$ by $c\alpha$, where $c = 1/\|\alpha\|$), we can suppose that $\|\alpha\| = 1$. Then $\alpha$ is on the unit spherical surface, and the problem is to find a functional $\lambda \in V^*$ such that the affine subspace (hyperplane) where $\lambda = 1$ touches the unit sphere at $\alpha$ (so that $\lambda(\alpha) = 1$) and otherwise lies outside the unit sphere (so that $|\lambda(\xi)| \le 1$ when $\|\xi\| = 1$, and hence $\|\lambda\| \le 1$). It is clear geometrically that such "tangent planes" must exist, but we shall drop the matter here.
If we assume this theorem, then, since
$$\left|\lambda\left(\int_a^b f\right)\right| = \left|\int_a^b \lambda(f(t))\,dt\right| \le (b - a)\max\{|\lambda(f(t))| : t \in [a, b]\} \le (b - a)\|\lambda\|\max\{\|f(t)\|\} = (b - a)\|\lambda\|\cdot\|f\|_\infty$$
(from $|\lambda(\alpha)| \le \|\lambda\|\cdot\|\alpha\|$), we get
$$\left\|\int_a^b f\right\| = \operatorname{lub}_\lambda \frac{\left|\lambda\left(\int_a^b f\right)\right|}{\|\lambda\|} \le (b - a)\|f\|_\infty,$$
the extreme members of which form the desired inequality.
CHAPTER 5

SCALAR PRODUCT SPACES

In this short chapter we shall look into what is going on behind two-norms, and we shall find that a wholly new branch of linear analysis is opened up. These norms can be characterized abstractly as those arising from scalar products. They are the finite- and infinite-dimensional analogues of ordinary geometric length, and they carry with them practically all the concepts of Euclidean geometry, such as the notion of the angle between two vectors, perpendicularity (orthogonality) and the Pythagorean theorem, and the existence of many rigid motions.

The impact of this extra structure is particularly dramatic for infinite-dimensional spaces. Infinite orthogonal bases exist in great profusion and can be handled about as easily as bases in finite-dimensional spaces, although the basis expansion of a vector is now a convergent infinite series, $\xi = \sum_1^\infty x_n\alpha_n$. Many of the most important series expansions in mathematics are examples of such orthogonal basis expansions. For example, we shall see in the next chapter that the Fourier series expansion of a continuous function $f$ on $[0, \pi]$ is the basis expansion of $f$ under the two-norm $\|f\|_2 = (\int_0^\pi f^2)^{1/2}$ for the particular orthogonal basis $\{\alpha_n\}_1^\infty = \{\sin nt\}_1^\infty$. If a vector space is complete under a scalar product norm, it is called a Hilbert space. The more advanced theory of such spaces is one of the most beautiful parts of mathematics.

1. SCALAR PRODUCTS

A scalar product on a real vector space $V$ is a real-valued function from $V \times V$ to $\mathbb{R}$, its value at the pair $\langle\xi, \eta\rangle$ ordinarily being designated $(\xi, \eta)$, such that

a) $(\xi, \eta)$ is linear in $\xi$ when $\eta$ is held fixed;
b) $(\xi, \eta) = (\eta, \xi)$ (symmetry);
c) $(\xi, \xi) > 0$ if $\xi \neq 0$ (positive definiteness).

If (c) is replaced by the weaker condition

c′) $(\xi, \xi) \ge 0$ for all $\xi \in V$,

then $(\xi, \eta)$ is called a semiscalar product.

Two important examples of scalar products are
$$(x, y) = \sum_1^n x_iy_i \quad\text{when}\quad V = \mathbb{R}^n$$
and
$$(f, g) = \int_a^b f(t)g(t)\,dt \quad\text{when}\quad V = \mathcal{C}([a, b]).$$
On a complex vector space (b) must be replaced by

b′) $(\xi, \eta) = \overline{(\eta, \xi)}$ (Hermitian symmetry),

where the bar denotes complex conjugation. The corresponding examples are $(z, w) = \sum_1^n z_i\bar{w}_i$ when $V = \mathbb{C}^n$ and $(f, g) = \int_a^b f\bar{g}$ when $V$ is the space of continuous complex-valued functions on $[a, b]$. We shall study only the real case.

It follows from (a) and (b) that a semiscalar product is also linear in the second variable when the first variable is held fixed, and therefore is a symmetric bilinear functional whose associated quadratic form $q(\xi) = (\xi, \xi)$ is positive definite or positive semidefinite [(c) or (c′); see the last section in Chapter 2]. The definiteness of the form $q$ has far-reaching consequences, as we shall begin to see at once.

Theorem 1.1. The Schwarz inequality
$$|(\xi, \eta)| \le (\xi, \xi)^{1/2}(\eta, \eta)^{1/2}$$
is valid for any semiscalar product.

Proof. We have $0 \le (\xi - t\eta, \xi - t\eta) = (\xi, \xi) - 2t(\xi, \eta) + t^2(\eta, \eta)$ for every $t \in \mathbb{R}$. Since this quadratic in $t$ is never negative, it cannot have distinct roots, and the usual $(b^2 - 4ac)$-formula implies that $4(\xi, \eta)^2 - 4(\xi, \xi)(\eta, \eta) \le 0$, which is equivalent to the Schwarz inequality. $\square$

We can also proceed directly. If $(\eta, \eta) > 0$, and if we set $t = (\xi, \eta)/(\eta, \eta)$ in the quadratic inequality in the first line of the proof, then the resulting expression simplifies to the Schwarz inequality. If $(\eta, \eta) = 0$, then $(\xi, \eta)$ must also be 0 (or else the beginning inequality is clearly false for some $t$), and now the Schwarz inequality holds trivially.

Corollary. If $(\xi, \eta)$ is a scalar product, then $\|\xi\| = (\xi, \xi)^{1/2}$ is a norm.

Proof.
$$\|\xi + \eta\|^2 = (\xi + \eta, \xi + \eta) = \|\xi\|^2 + 2(\xi, \eta) + \|\eta\|^2 \le \|\xi\|^2 + 2\|\xi\|\,\|\eta\| + \|\eta\|^2 = (\|\xi\| + \|\eta\|)^2$$
(by Schwarz), proving the triangle inequality. Also, $\|c\xi\| = (c\xi, c\xi)^{1/2} = (c^2(\xi, \xi))^{1/2} = |c|\,\|\xi\|$. $\square$

Note that the Schwarz inequality $|(\xi, \eta)| \le \|\xi\|\,\|\eta\|$ is now just the statement that the bilinear functional $(\xi, \eta)$ is bounded by one with respect to the scalar product norm.

A normed linear space $V$ in which the norm is a scalar product norm is called a pre-Hilbert space. If $V$ is complete in this norm, it is a Hilbert space.
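The Schwarz inequality for the integral scalar product can be observed on a grid. A minimal numerical sketch (illustrative, assuming NumPy; the two functions are arbitrary):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 10001)
dt = t[1] - t[0]
inner = lambda F, G: np.sum(F * G) * dt   # (f, g) = integral of f g over [0, 1]

F = np.sin(3 * t) + t**2
G = np.exp(-t)
lhs = abs(inner(F, G))
rhs = np.sqrt(inner(F, F) * inner(G, G))
print(lhs <= rhs, lhs, rhs)   # |(f, g)| <= ||f||_2 ||g||_2
```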
The two examples of scalar products mentioned earlier give us the real explanation of our two-norms for the first time:
$$\|x\|_2 = \left(\sum_1^n x_i^2\right)^{1/2} \ \text{for}\ x \in \mathbb{R}^n \qquad\text{and}\qquad \|f\|_2 = \left(\int_a^b f^2\right)^{1/2} \ \text{for}\ f \in \mathcal{C}([a, b])$$
are scalar product norms. Since the scalar product norm on $\mathbb{R}^n$ becomes Euclidean length under a Cartesian coordinate correspondence with Euclidean $n$-space, it is conventional to call $\mathbb{R}^n$ itself Euclidean $n$-space $\mathbb{E}^n$ when we want it understood that the scalar product norm is being used.

Any finite-dimensional space $V$ is a Hilbert space with respect to any scalar product norm, because its finite dimensionality guarantees its completeness. On the other hand, we shall see in Exercise 1.10 that $\mathcal{C}([a, b])$ is incomplete in the two-norm, and is therefore a pre-Hilbert space but not a Hilbert space in this norm. (Remember, however, that $\mathcal{C}([a, b])$ is complete in the uniform norm $\|f\|_\infty$.) It is important to the real uses of Hilbert spaces in mathematics that any pre-Hilbert space can be completed to a Hilbert space, but the theory of infinite-dimensional Hilbert spaces is for the most part beyond the scope of this book. Scalar product norms have in some sense the smoothest possible unit spheres, because these spheres are quadratic surfaces.

It is orthogonality that gives the theory of pre-Hilbert spaces its special flavor. Two vectors $\alpha$ and $\beta$ are said to be orthogonal, written $\alpha \perp \beta$, if $(\alpha, \beta) = 0$. This definition gets its inspiration from geometry; we noted in Chapter 1 that two geometric vectors are perpendicular if and only if their coordinate triples $x$ and $y$ satisfy $(x, y) = 0$. It is an interesting problem to go further and to show from the law of cosines ($c^2 = a^2 + b^2 - 2ab\cos\theta$) that the angle $\theta$ between two geometric vectors is given by $(x, y) = \|x\|\,\|y\|\cos\theta$. This would motivate us to define the angle $\theta$ between two vectors $\xi$ and $\eta$ in a pre-Hilbert space by $(\xi, \eta) = \|\xi\|\,\|\eta\|\cos\theta$, but we shall have no use for this more general formulation.

We say that two subsets $A$ and $B$ are orthogonal, and we write $A \perp B$, if $\alpha \perp \beta$ for every $\alpha$ in $A$ and $\beta$ in $B$; for any subset $A$ we set $A^\perp = \{\beta \in V : \beta \perp A\}$.

Lemma 1.1. If $\beta$ is orthogonal to the set $A$, then $\beta$ is orthogonal to $\overline{L(A)}$, the closure of the linear span of $A$. It follows that $B^\perp$ is a closed subspace for every subset $B$.

Proof. The first assertion depends on the linearity and continuity of the scalar product in one of its variables; it will be left to the reader. As for $B^\perp$, it includes the closure of its own linear span, by the first part, and so is a closed subspace. $\square$
Lemma 1.2. In any pre-Hilbert space we have the parallelogram law,

$$\|\alpha + \beta\|^2 + \|\alpha - \beta\|^2 = 2(\|\alpha\|^2 + \|\beta\|^2),$$

and the Pythagorean theorem, $\alpha \perp \beta$ if and only if

$$\|\alpha + \beta\|^2 = \|\alpha\|^2 + \|\beta\|^2.$$

If $\{\alpha_i\}_1^n$ is a (pairwise) orthogonal collection of vectors, then

$$\Big\|\sum_1^n \alpha_i\Big\|^2 = \sum_1^n \|\alpha_i\|^2.$$

Proof. Since $\|\alpha + \beta\|^2 = \|\alpha\|^2 + 2(\alpha, \beta) + \|\beta\|^2$, by the bilinearity of the scalar product, we see that $\|\alpha + \beta\|^2 = \|\alpha\|^2 + \|\beta\|^2$ if and only if $(\alpha, \beta) = 0$, which is the Pythagorean theorem. Writing down the similar expansion of $\|\alpha - \beta\|^2$ and adding, we have the parallelogram law. The last statement follows from the Pythagorean theorem and Lemma 1.1 by induction. Or we can obtain this statement directly by expanding the scalar product on the left and noticing that all "mixed terms" drop out by orthogonality. □

The reader will notice that the Schwarz inequality has not been used in this lemma, but it would have been silly to state the lemma before proving that $\|\xi\| = (\xi, \xi)^{1/2}$ is a norm. If $\{\alpha_i\}_1^n$ are orthogonal and nonzero, then the identity $\|\sum_1^n x_i\alpha_i\|^2 = \sum_1^n x_i^2\|\alpha_i\|^2$ shows that $\sum_1^n x_i\alpha_i$ can be zero only if all the coefficients $x_i$ are zero. Thus,

Corollary. A finite collection of (pairwise) orthogonal nonzero vectors is independent. Similarly, a finite collection of orthogonal subspaces is independent.

EXERCISES

1.1 Complete the second proof of Theorem 1.1.

1.2 Reexamine the proof of Theorem 1.1 and show that if $\xi$ and $\eta$ are independent, then the Schwarz inequality is strict.

1.3 Continuing the above exercise, now show that the triangle inequality is strict if $\xi$ and $\eta$ are independent.

1.4 a) Show that the sum of two semiscalar products is a semiscalar product.
b) Show that if $(\mu, \nu)$ is a semiscalar product on a vector space $W$ and if $T$ is a linear transformation from a vector space $V$ to $W$, then $[\xi, \eta] = (T\xi, T\eta)$ is a semiscalar product on $V$.
c) Deduce from (a) and (b) that

$$(f, g) = f(a)g(a) + \int_a^b f'(t)\,g'(t)\,dt$$

is a semiscalar product on $V = C^1([a, b])$. Prove that it is a scalar product.
1.5 If $\alpha$ is held fixed, we know that $f(\xi) = (\xi, \alpha)$ is continuous. Why? Prove more generally that $(\xi, \eta)$ is continuous as a map from $V \times V$ to $\mathbb{R}$.

1.6 Let $V$ be a two-dimensional Hilbert space, and let $\{\alpha_1, \alpha_2\}$ be any basis for $V$. Show that a scalar product $(\xi, \eta)$ has the form

$$(\xi, \eta) = ax_1y_1 + b(x_1y_2 + x_2y_1) + cx_2y_2,$$

where $b^2 < ac$. Here, of course, $\xi = x_1\alpha_1 + x_2\alpha_2$ and $\eta = y_1\alpha_1 + y_2\alpha_2$.

1.7 Prove that if $\omega(x, y) = ax_1y_1 + b(x_1y_2 + x_2y_1) + cx_2y_2$ and $b^2 < ac$, then $\omega$ is a scalar product on $\mathbb{R}^2$.

1.8 Let $\omega(\xi, \eta)$ be any symmetric bilinear functional on a finite-dimensional vector space $V$, and let $q(\xi) = \omega(\xi, \xi)$ be its associated quadratic form. Show that for any choice of a basis for $V$ the equation $q(\xi) = 1$ becomes a quadratic equation in the coordinates $\{x_i\}$ of $\xi$.

1.9 Prove in detail that if a vector $\beta$ is orthogonal to a set $A$ in a pre-Hilbert space, then $\beta$ is orthogonal to $\overline{L(A)}$.

1.10 We know from the last chapter that the Riemann integral is defined for the set $\mathcal{C}$ of uniform limits of real-valued step functions on $[0, 1]$ and that $\mathcal{C}$ includes all the continuous functions. Given that $k$ is the step function whose value is 1 on $[0, \frac12]$ and 0 on $(\frac12, 1]$, show that $\|f - k\|_2 > 0$ for any continuous function $f$. Show, however, that there is a sequence of continuous functions $\{f_n\}$ such that $\|f_n - k\|_2 \to 0$. Show, therefore, that $C([0, 1])$ is incomplete in the two-norm, by showing that the above sequence $\{f_n\}$ is Cauchy but not convergent in $C([0, 1])$.

2. ORTHOGONAL PROJECTION

One of the most important devices in geometric reasoning is "dropping a perpendicular" from a point to a line or a plane and then using right triangle arguments. This device is equally important in pre-Hilbert space theory. If $M$ is a subspace and $\alpha$ is any element in $V$, then by "the foot of the perpendicular dropped from $\alpha$ to $M$" we mean that vector $\mu$ in $M$ such that $(\alpha - \mu) \perp M$, if such a $\mu$ exists. (See Fig. 5.1.) Writing $\alpha$ as $\mu + (\alpha - \mu)$, we see that the existence of the "foot" $\mu$ in $M$ for each $\alpha$ in $V$ is equivalent to the direct sum decomposition $V = M \oplus M^\perp$. Now it is precisely this direct sum decomposition that the completeness of a Hilbert space guarantees, as we shall shortly see. We start by proving the geometrically intuitive fact that $\mu$ is the foot of the perpendicular dropped from $\alpha$ to $M$ if and only if $\mu$ is the point in $M$ closest to $\alpha$.

Lemma 2.1. If $\mu$ is in the subspace $M$, then $(\alpha - \mu) \perp M$ if and only if $\mu$ is the unique point in $M$ closest to $\alpha$, that is, $\mu$ is the "best approximation" to $\alpha$ in $M$.

Proof. If $(\alpha - \mu) \perp M$ and $\xi$ is any other point in $M$, then

$$\|\alpha - \xi\|^2 = \|(\alpha - \mu) + (\mu - \xi)\|^2 = \|\alpha - \mu\|^2 + \|\mu - \xi\|^2 > \|\alpha - \mu\|^2.$$

Thus $\mu$ is the
unique point in $M$ closest to $\alpha$. Conversely, suppose that $\mu$ is a point in $M$ closest to $\alpha$, and let $\xi$ be any nonzero vector in $M$. Then $\|\alpha - \mu\|^2 \le \|(\alpha - \mu) + t\xi\|^2$, which becomes $0 \le 2t(\alpha - \mu, \xi) + t^2\|\xi\|^2$ when the right-hand scalar product is expanded. This can hold for all $t$ only if $(\alpha - \mu, \xi) = 0$ (otherwise let $t = \,$?). Therefore, $(\alpha - \mu) \perp M$. □

On the basis of this lemma it is clear that a way to look for $\mu$ is to take a sequence $\mu_n$ in $M$ such that $\|\alpha - \mu_n\| \to \rho(\alpha, M)$ and to hope to define $\mu$ as its limit. Here is the crux of the matter: we can prove that such a sequence $\{\mu_n\}$ is always Cauchy, but its limit may not exist if $M$ is not complete!

Lemma 2.2. If $\{\mu_n\}$ is a sequence in the subspace $M$ whose distance from some vector $\alpha$ converges to the distance $\rho$ from $\alpha$ to $M$, then $\{\mu_n\}$ is Cauchy.

Proof. By the parallelogram law,

$$\|\mu_n - \mu_m\|^2 = \|(\alpha - \mu_n) - (\alpha - \mu_m)\|^2 = 2\big(\|\alpha - \mu_n\|^2 + \|\alpha - \mu_m\|^2\big) - \|2\alpha - (\mu_n + \mu_m)\|^2.$$

Since the first term on the right converges to $4\rho^2$ as $n, m \to \infty$, and since the second term is always $\le -4\rho^2$ (factor out the 2), we see that $\|\mu_n - \mu_m\|^2 \to 0$ as $n, m \to \infty$. □

Theorem 2.1. If $M$ is a complete subspace of a pre-Hilbert space $V$, then $V = M \oplus M^\perp$. In particular, this is true for any finite-dimensional subspace of a pre-Hilbert space and for any closed subspace of a Hilbert space.

Proof. This follows at once from the last two lemmas, since now $\mu = \lim \mu_n$ exists, $\|\alpha - \mu\| = \rho(\alpha, M)$, and so $(\alpha - \mu) \perp M$. □

If $V = M \oplus M^\perp$, then the projection on $M$ along $M^\perp$ is called the orthogonal projection on $M$, or simply the projection on $M$, since among all the projections on $M$ associated with the various complements of $M$, the orthogonal projection is distinguished. Thus, if $M$ is a complete subspace of $V$, and if $P$ is the projection on $M$, then $P(\xi)$ is at once the foot of the perpendicular dropped from $\xi$ to $M$ (which is where the word "projection" comes from) and also the best approximation to $\xi$ in $M$ (Lemma 2.1).

Lemma 2.3. If $\{M_i\}_1^n$ is a finite collection of complete, pairwise orthogonal subspaces, and if for a vector $\alpha$ in $V$, $\alpha_i$ is the projection of $\alpha$ on $M_i$ for $i = 1, \ldots, n$, then $\sum_1^n \alpha_i$ is the projection of $\alpha$ on $\bigoplus_1^n M_i$.

Proof. We have to show that $\alpha - \sum_1^n \alpha_i$ is orthogonal to $\bigoplus_1^n M_j$, and it is sufficient to show it orthogonal to each $M_j$ separately. But if $\xi \in M_j$, then $(\alpha - \sum_1^n \alpha_i, \xi) = (\alpha - \alpha_j, \xi)$, since $(\alpha_i, \xi) = 0$ for $i \ne j$, and $(\alpha - \alpha_j, \xi) = 0$ because $\alpha_j$ is the projection of $\alpha$ on $M_j$. Thus $(\alpha - \sum_1^n \alpha_i, \xi) = 0$. □

Lemma 2.4. The projection of $\xi$ on the one-dimensional span of a single nonzero vector $\eta$ is $\big((\xi, \eta)/\|\eta\|^2\big)\eta$.

Proof. Here $\mu$ must be of the form $x\eta$. But $(\xi - x\eta) \perp \eta$ if and only if

$$(\xi, \eta) - x\|\eta\|^2 = 0,$$

or $x = (\xi, \eta)/\|\eta\|^2$. □
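In coordinates, Lemmas 2.3 and 2.4 give a concrete recipe: project onto each member of an orthogonal family and add. A small numerical sketch (the vectors are illustrative choices of our own):

```python
import numpy as np

def project_onto(xi, eta):
    # Projection of xi on the span of a single nonzero eta (Lemma 2.4).
    return (np.dot(xi, eta) / np.dot(eta, eta)) * eta

# An orthogonal (not normalized) pair spanning the x-y plane M in R^3.
e1 = np.array([1.0, 1.0, 0.0])
e2 = np.array([1.0, -1.0, 0.0])
xi = np.array([3.0, 4.0, 5.0])

# By Lemma 2.3 the projection on M is the sum of the one-dimensional projections.
mu = project_onto(xi, e1) + project_onto(xi, e2)
print(mu)                                        # [3. 4. 0.]
print(np.dot(xi - mu, e1), np.dot(xi - mu, e2))  # both 0.0: (xi - mu) is orthogonal to M
```

As Lemma 2.1 promises, this $\mu$ is also the point of $M$ closest to $\xi$.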
We call the number $(\xi, \eta)/\|\eta\|^2$ the $\eta$-Fourier coefficient of $\xi$. If $\eta$ is a unit (normalized) vector, then this Fourier coefficient is just $(\xi, \eta)$. It follows from Lemma 2.3 that if $\{\varphi_i\}_1^n$ is an orthogonal collection of nonzero vectors, and if $\{x_i\}_1^n$ are the corresponding Fourier coefficients of a vector $\xi$, then $\sum_1^n x_i\varphi_i$ is the projection of $\xi$ on the subspace $M$ spanned by $\{\varphi_i\}_1^n$. Therefore, $\xi - \sum_1^n x_i\varphi_i \perp M$, and (Lemma 2.1) $\sum_1^n x_i\varphi_i$ is the best approximation to $\xi$ in $M$. If $\xi$ is in $M$, then both of these statements say that $\xi = \sum_1^n x_i\varphi_i$. (This can of course be verified directly, by letting $\xi = \sum_1^n a_i\varphi_i$ be the basis expansion of $\xi$ and computing $(\xi, \varphi_j) = \sum_1^n a_i(\varphi_i, \varphi_j) = a_j\|\varphi_j\|^2$.)

If an orthogonal set of vectors $\{\varphi_i\}$ is also normalized ($\|\varphi_i\| = 1$), then we call the set orthonormal.

Theorem 2.2. If $\{\varphi_i\}_1^\infty$ is an infinite orthonormal sequence, and if $\{x_i\}_1^\infty$ are the corresponding Fourier coefficients of a vector $\xi$, then

$$\sum_1^\infty x_i^2 \le \|\xi\|^2 \qquad\text{(Bessel's inequality)},$$

and $\xi = \sum_1^\infty x_i\varphi_i$ if and only if $\sum_1^\infty x_i^2 = \|\xi\|^2$ (Parseval's equation).

Proof. Setting $\sigma_n = \sum_1^n x_i\varphi_i$ and $\xi = (\xi - \sigma_n) + \sigma_n$, and remembering that $\xi - \sigma_n \perp \sigma_n$, we have

$$\|\xi\|^2 = \|\xi - \sigma_n\|^2 + \sum_1^n x_i^2.$$

Therefore, $\sum_1^n x_i^2 \le \|\xi\|^2$ for all $n$, proving Bessel's inequality, and $\sigma_n \to \xi$ (that is, $\|\xi - \sigma_n\| \to 0$) if and only if $\sum_1^n x_i^2 \to \|\xi\|^2$, proving Parseval's identity. □

We call the formal series $\sum x_i\varphi_i$ the Fourier series of $\xi$ (with respect to the orthonormal set $\{\varphi_i\}$). The Parseval condition says that the Fourier series of $\xi$ converges to $\xi$ if and only if $\|\xi\|^2 = \sum_1^\infty x_i^2$. An infinite orthonormal sequence $\{\varphi_i\}_1^\infty$ is called a basis for a pre-Hilbert space $V$ if every element in $V$ is the sum of its Fourier series.

Theorem 2.3. An infinite orthonormal sequence $\{\varphi_i\}_1^\infty$ is a basis for a pre-Hilbert space $V$ if (and only if) its linear span is dense in $V$.

Proof. Let $\xi$ be any element of $V$, and let $\{x_i\}$ be its sequence of Fourier coefficients. Since the linear span of $\{\varphi_i\}$ is dense in $V$, given any $\epsilon$, there is a finite linear combination $\sum_1^n y_i\varphi_i$ which approximates $\xi$ to within $\epsilon$. But $\sum_1^m x_i\varphi_i$ is the best approximation to $\xi$ in the span of $\{\varphi_i\}_1^m$, by Lemmas 2.3 and 2.1, and so

$$\Big\|\xi - \sum_1^m x_i\varphi_i\Big\| \le \Big\|\xi - \sum_1^n y_i\varphi_i\Big\| < \epsilon$$

for any $m \ge n$. That is, $\xi = \sum_1^\infty x_i\varphi_i$. □
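Theorem 2.2 is easy to watch numerically. The sketch below (an illustration of our own) uses the functions $\sqrt{2/\pi}\,\sin nt$, which form an orthonormal sequence in $C([0, \pi])$ under $(f, g) = \int_0^\pi fg$ (their orthogonality is Exercise 2.4 below), and the vector $\xi$ given by $f(t) = t$; the Bessel partial sums increase toward $\|\xi\|^2$.

```python
import numpy as np

t = np.linspace(0.0, np.pi, 200_001)
f = t                                          # the vector xi is f(t) = t

def sp(u, v):
    # (u, v) = integral of u v over [0, pi], approximated by a Riemann sum.
    return np.mean(u * v) * np.pi

phis = [np.sqrt(2 / np.pi) * np.sin(n * t) for n in range(1, 51)]
x = np.array([sp(f, phi) for phi in phis])     # Fourier coefficients

print(np.sum(x**2))    # Bessel partial sum over 50 terms, about 10.21 ...
print(sp(f, f))        # ... bounded by ||xi||^2 = pi^3/3, about 10.34
```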
Corollary. If $V$ is a Hilbert space, then the orthonormal sequence $\{\varphi_i\}_1^\infty$ is a basis if and only if $\{\varphi_i\}^\perp = \{0\}$.

Proof. Let $M$ be the closure of the linear span of $\{\varphi_i\}_1^\infty$. Since $V = M + M^\perp$, and since $M^\perp = \{\varphi_i\}^\perp$, by Lemma 1.1, we see that $\{\varphi_i\}^\perp = \{0\}$ if and only if $V = M$, and, by the theorem, this holds if and only if $\{\varphi_i\}$ is a basis. □

Note that when orthogonal bases only are being used, the coefficient of a vector $\xi$ at a basis element $\beta$ is always the Fourier coefficient $(\xi, \beta)/\|\beta\|^2$. Thus the $\beta$-coefficient of $\xi$ depends only on $\beta$ and is independent of the choice of the rest of the basis. However, we know from Chapter 2 that when an arbitrary basis containing $\beta$ is being used, then the $\beta$-coefficient of $\xi$ varies with the basis. This partly explains the favored position of orthogonal bases.

We often obtain an orthonormal sequence by "orthogonalizing" some given sequence.

Lemma 2.5. If $\{\alpha_i\}$ is a finite or infinite sequence of independent vectors, then there is an orthonormal sequence $\{\varphi_i\}$ such that $\{\alpha_i\}_1^n$ and $\{\varphi_i\}_1^n$ have the same linear span for all $n$.

Proof. Since normalizing is trivial, we shall only orthogonalize. Suppose, to be definite, that the sequence is infinite, and let $M_n$ be the linear span of $\{\alpha_1, \ldots, \alpha_n\}$. Let $\mu_n$ be the orthogonal projection of $\alpha_n$ on $M_{n-1}$, and set $\varphi_n = \alpha_n - \mu_n$ (and $\varphi_1 = \alpha_1$). This is our sequence. We have $\varphi_i \in M_i \subset M_{n-1}$ if $i < n$, and $\varphi_n \perp M_{n-1}$, so that the vectors $\varphi_i$ are mutually orthogonal. Also, $\varphi_n \ne 0$, since $\alpha_n$ is not in $M_{n-1}$. Thus $\{\varphi_i\}_1^n$ is an independent subset of the $n$-dimensional vector space $M_n$, by the corollary of Lemma 1.2, and so $\{\varphi_i\}_1^n$ spans $M_n$. □

The actual calculation of the orthogonalized sequence $\{\varphi_n\}$ can be carried out recursively, starting with $\varphi_1 = \alpha_1$, by noticing that since $\mu_n$ is the projection of $\alpha_n$ on the span of $\varphi_1, \ldots, \varphi_{n-1}$, it must be the vector $\sum_1^{n-1} c_i\varphi_i$, where $c_i$ is the Fourier coefficient of $\alpha_n$ with respect to $\varphi_i$.

Consider, for example, the sequence $\{x^n\}_0^\infty$ in $C([0, 1])$. We have $\varphi_1 = \alpha_1 = 1$. Next, $\varphi_2 = \alpha_2 - \mu_2 = x - c\cdot 1$, where $c = (\alpha_2, \varphi_1)/\|\varphi_1\|^2 = \int_0^1 x\cdot 1 \big/ \int_0^1 1^2 = \frac12$. Then $\varphi_3 = \alpha_3 - (c_2\varphi_2 + c_1\varphi_1)$, where $c_1 = \int_0^1 x^2\cdot 1 \big/ \int_0^1 1^2 = \frac13$ and $c_2 = \int_0^1 x^2\big(x - \frac12\big) \big/ \int_0^1 \big(x - \frac12\big)^2 = \big(\frac14 - \frac16\big)\big/\frac1{12} = 1$. Thus the first three terms in the orthogonalization of $\{x^n\}_0^\infty$ in $C([0, 1])$ are

$$1, \qquad x - \tfrac12, \qquad x^2 - \big(x - \tfrac12\big) - \tfrac13 = x^2 - x + \tfrac16.$$

This process is completely elementary, but the calculations obviously become burdensome after only a few terms.
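The recursion is mechanical enough to hand to a computer algebra system. A sketch using sympy (the function names are our own) reproduces the three polynomials just computed and continues one step further:

```python
import sympy as sp

x = sp.symbols('x')

def sprod(f, g):
    # The scalar product (f, g) = integral of f g over [0, 1].
    return sp.integrate(f * g, (x, 0, 1))

def orthogonalize(alphas):
    # The recursion of Lemma 2.5: phi_n = alpha_n - sum of c_i phi_i,
    # with c_i the phi_i-Fourier coefficient of alpha_n.
    phis = []
    for a in alphas:
        mu = sum((sprod(a, p) / sprod(p, p)) * p for p in phis)
        phis.append(sp.expand(a - mu))
    return phis

print(orthogonalize([1, x, x**2, x**3]))
# [1, x - 1/2, x**2 - x + 1/6, x**3 - 3*x**2/2 + 3*x/5 - 1/20]
```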
an exercise. If $V$ is finite-dimensional, the injectivity of $\theta$ implies that $\theta$ is an isomorphism. But we have a much more startling result:

Theorem 2.4. $\theta$ is an isomorphism if and only if $V$ is a Hilbert space.

Proof. Suppose first that $V$ is a Hilbert space. We have to show that $\theta$ is surjective, i.e., that every nonzero $F$ in $V^*$ is of the form $\theta_\beta$. Given such an $F$, let $N$ be its null space, let $\alpha$ be a vector orthogonal to $N$ (Theorem 2.1), and consider $\beta = c\alpha$, where $c$ is to be determined later. Every vector $\xi$ in $V$ is uniquely a sum $\xi = x\beta + \eta$, where $\eta$ is in $N$. [This only says that $V/N$ is one-dimensional, which presumably we know, but we can check it directly by applying $F$ and seeing that $F(\xi - x\beta) = 0$ if and only if $x = F(\xi)/F(\beta)$.] But now the equations

$$F(\xi) = F(x\beta + \eta) = xF(\beta) = xcF(\alpha)$$

and

$$\theta_\beta(\xi) = (\xi, \beta) = (x\beta + \eta, \beta) = x\|\beta\|^2 = xc^2\|\alpha\|^2$$

show that $\theta_\beta = F$ if we take $c = F(\alpha)/\|\alpha\|^2$.

Conversely, if $\theta$ is surjective (and assuming that it is an isometry), then it is an isomorphism in $\operatorname{Hom}(V, V^*)$, and since $V^*$ is complete by Theorem 7.6, Chapter 4, it follows that $V$ is complete by Theorem 7.3 of the same chapter. We are finished. □

EXERCISES

2.1 In the proof of Lemma 2.1, if $(\alpha - \mu, \xi) \ne 0$, what value of $t$ will contradict the inequality $0 \le 2t(\alpha - \mu, \xi) + t^2\|\xi\|^2$?

2.2 Prove the "only if" part of Theorem 2.3.

2.3 Let $\{M_i\}$ be an orthogonal sequence of complete subspaces of a pre-Hilbert space $V$, and let $P_i$ be the (orthogonal) projection on $M_i$. Prove that $\{P_i\xi\}$ is Cauchy for any $\xi$ in $V$.

2.4 Show that the functions $\{\sin nt\}_{n=1}^\infty$ form an orthogonal collection of elements in the pre-Hilbert space $C([0, \pi])$ with respect to the standard scalar product $(f, g) = \int_0^\pi f(t)\,g(t)\,dt$. Show also that $\|\sin nt\|_2 = \sqrt{\pi/2}$.

2.5 Compute the Fourier coefficients of the function $f(t) = t$ in $C([0, \pi])$ with respect to the above orthogonal set. What then is the best two-norm approximation to $t$ in the two-dimensional space spanned by $\sin t$ and $\sin 2t$? Sketch the graph of this approximating function, indicating its salient features in the usual manner of calculus curve sketching.

2.6 The "step" function $f$ defined by $f(t) = \pi/2$ on $[0, \pi/2]$ and $f(t) = 0$ on $(\pi/2, \pi]$ is of course discontinuous at $\pi/2$. Nevertheless, calculate its Fourier coefficients with respect to $\{\sin nt\}_{n=1}^\infty$ in $C([0, \pi])$ and graph its best approximation in the span of $\{\sin nt\}_1^4$.

2.7 Show that the functions $\{\sin nt\}_{n=1}^\infty \cup \{\cos nt\}_{n=0}^\infty$ form an orthogonal collection of elements in the pre-Hilbert space $C([-\pi, \pi])$ with respect to the standard scalar product $(f, g) = \int_{-\pi}^{\pi} f(t)\,g(t)\,dt$.
2.8 Calculate the first three terms in the orthogonalization of $\{x^n\}_0^\infty$ in $C([-1, 1])$.

2.9 Use the definition of the norm of a bounded linear transformation and the Schwarz inequality to show that $\|\theta_\beta\| \le \|\beta\|$ [where $\theta_\beta(\xi) = (\xi, \beta)$]. In order to conclude that $\beta \mapsto \theta_\beta$ is an isometry, we also need the opposite inequality, $\|\theta_\beta\| \ge \|\beta\|$. Prove this by using a special value of $\xi$.

2.10 Show that if $V$ is an incomplete pre-Hilbert space, then $V$ has a proper closed subspace $M$ such that $M^\perp = \{0\}$. [Hint: There must exist $F \in V^*$ not of the form $F(\xi) = (\xi, \alpha)$.] Together with Theorem 2.1, this shows that a pre-Hilbert space $V$ is a Hilbert space if and only if $V = M \oplus M^\perp$ for every closed subspace $M$.

2.11 The isometry $\theta: \alpha \mapsto \theta_\alpha$ [where $\theta_\alpha(\xi) = (\xi, \alpha)$] imbeds the pre-Hilbert space $V$ in its conjugate space $V^*$. We know that $V^*$ is complete. Why? The closure of $V$ as a subspace of $V^*$ is therefore complete, and we can hence complete $V$ as a Banach space. Let $H$ be its completion. It is a Banach space including (the isometric image of) $V$ as a dense subspace. Show that the scalar product on $V$ extends uniquely to $H$ and that the norm on $H$ is the extended scalar product norm, so that $H$ is a Hilbert space.

2.12 Show that under the isometric imbedding $\alpha \mapsto \theta_\alpha$ of a pre-Hilbert space $V$ into $V^*$ orthogonality is equivalent to annihilation as discussed in Section 2.3. Discuss the connection between the properties of the annihilator $A^0$ and Lemma 1.1 of this chapter.

2.13 Prove that if $C$ is a nonempty complete convex subset of a pre-Hilbert space $V$, and if $\alpha$ is any vector not in $C$, then there is a unique $\mu \in C$ closest to $\alpha$. (Examine the proof of Lemma 2.2.)

3. SELF-ADJOINT TRANSFORMATIONS

Definition. If $V$ is a pre-Hilbert space, then $T$ in $\operatorname{Hom} V$ is self-adjoint if $(T\alpha, \beta) = (\alpha, T\beta)$ for every $\alpha, \beta \in V$. The set of all self-adjoint transformations will be designated SA.

Self-adjointness suggests that $T$ ought to become its own adjoint under the injection $\theta$ of $V$ into $V^*$. We check this now. Since $(\alpha, \beta) = \theta_\beta(\alpha)$, we can rewrite the equation $(T\alpha, \beta) = (\alpha, T\beta)$ as $\theta_\beta(T\alpha) = \theta_{T\beta}(\alpha)$, and again as $\big(T^*(\theta_\beta)\big)(\alpha) = \theta_{T\beta}(\alpha)$, by the definition of $T^*$. This holds for all $\alpha$ and $\beta$ if and only if $T^*(\theta_\beta) = \theta_{T\beta}$ for all $\beta \in V$, or $T^* \circ \theta = \theta \circ T$, which is the asserted identification.

Lemma 3.1. If $V$ is a finite-dimensional Hilbert space and $\{\varphi_i\}_1^n$ is an orthonormal basis for $V$, then $T \in \operatorname{Hom}(V)$ is self-adjoint if and only if the matrix $\{t_{ij}\}$ of $T$ with respect to $\{\varphi_i\}$ is symmetric ($t = t^*$).

Proof. If we substitute the basis expansions of $\alpha$ and $\beta$ and expand, we see that $(\alpha, T\beta) = (T\alpha, \beta)$ for all $\alpha$ and $\beta$ if and only if $(\varphi_i, T\varphi_j) = (T\varphi_i, \varphi_j)$ for all $i$ and $j$. But $T\varphi_j = \sum_{k=1}^n t_{kj}\varphi_k$, and when this is substituted in these last scalar products, the equation becomes $t_{ij} = t_{ji}$. That is, $T$ is self-adjoint if and only if $t = t^*$. □

A self-adjoint $T$ is said to be nonnegative if $(T\xi, \xi) \ge 0$ for all $\xi$. Then $[\xi, \eta] = (T\xi, \eta)$ is a semiscalar product!
Lemma 3.2. If $T$ is a nonnegative self-adjoint transformation, then $\|T\xi\| \le \|T\|^{1/2}(T\xi, \xi)^{1/2}$ for all $\xi$. Therefore, if $(T\xi, \xi) = 0$, then $T\xi = 0$, and, more generally, if $(T\xi_n, \xi_n) \to 0$, then $T(\xi_n) \to 0$.

Proof. If $T$ is nonnegative as well as self-adjoint, then $[\xi, \eta] = (T\xi, \eta)$ is a semiscalar product, and so, by Schwarz's inequality,

$$|(T\xi, \eta)| = |[\xi, \eta]| \le [\xi, \xi]^{1/2}[\eta, \eta]^{1/2} = (T\xi, \xi)^{1/2}(T\eta, \eta)^{1/2}.$$

Taking $\eta = T\xi$, the factor on the right becomes $\big(T(T\xi), T\xi\big)^{1/2}$, which is less than or equal to $\|T\|^{1/2}\|T\xi\|$, by Schwarz and the definition of $\|T\|$. Dividing by $\|T\xi\|$, we get the inequality of the lemma. □

If $\alpha \ne 0$ and $T(\alpha) = c\alpha$ for some $c$, then $\alpha$ is called an eigenvector (proper vector, characteristic vector) of $T$, and $c$ is the associated eigenvalue (proper value, characteristic value).

Theorem 3.1. If $V$ is a finite-dimensional Hilbert space and $T$ is a self-adjoint element of $\operatorname{Hom} V$, then $V$ has an orthonormal basis consisting entirely of eigenvectors of $T$.

Proof. Consider the function $(T\xi, \xi)$. It is a continuous real-valued function of $\xi$, and on the unit sphere $S = \{\xi : \|\xi\| = 1\}$ it is bounded above by $\|T\|$ (by Schwarz). Set $m = \operatorname{lub}\{(T\xi, \xi) : \|\xi\| = 1\}$. Since $S$ is compact (being bounded and closed), $(T\xi, \xi)$ assumes the value $m$ at some point $\alpha$ on $S$. Now $m - T$ is a nonnegative self-adjoint transformation (check this!), and $(T\alpha, \alpha) = m$ is equivalent to $\big((m - T)\alpha, \alpha\big) = 0$. Therefore, $(m - T)\alpha = 0$ by Lemma 3.2, and $T\alpha = m\alpha$. We have thus found one eigenvector for $T$.

Now set $V_1 = V$, $\alpha_1 = \alpha$, and $m_1 = m$, and let $V_2$ be $\{\alpha_1\}^\perp$. Then $T[V_2] \subset V_2$, for if $\xi \perp \alpha_1$, then $(T\xi, \alpha_1) = (\xi, T\alpha_1) = m(\xi, \alpha_1) = 0$. We can therefore repeat the above argument for the restriction of $T$ to the Hilbert space $V_2$ and find $\alpha_2$ in $V_2$ such that $\|\alpha_2\| = 1$ and $T(\alpha_2) = m_2\alpha_2$, where $m_2 = \operatorname{lub}\{(T\xi, \xi) : \|\xi\| = 1 \text{ and } \xi \in V_2\}$. Clearly, $m_2 \le m_1$. We then set $V_3 = \{\alpha_1, \alpha_2\}^\perp$ and continue, arriving finally at an orthonormal basis $\{\alpha_i\}_1^n$ of eigenvectors of $T$. □

Now let $\lambda_1, \ldots, \lambda_r$ be the distinct values in the list $m_1, \ldots, m_n$, and let $M_j$ be the linear span of those basis vectors $\alpha_i$ for which $m_i = \lambda_j$. Then the subspaces $M_j$ are orthogonal to each other, $V = \bigoplus_1^r M_j$, each $M_j$ is $T$-invariant, and the restriction of $T$ to $M_j$ is $\lambda_j$ times the identity. Since all the nonzero vectors in $M_j$ are eigenvectors with eigenvalue $\lambda_j$, if the $\alpha_i$'s spanning $M_j$ are replaced by any other orthonormal basis for $M_j$, then we still have an orthonormal basis of eigenvectors. The $\alpha_i$'s are therefore not in general uniquely determined. But the subspaces $M_j$ and the eigenvalues $\lambda_j$ are unique. This will follow if we show that every eigenvector is in an $M_j$.
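In coordinates (Lemma 3.1), Theorem 3.1 says that a symmetric matrix has an orthonormal basis of eigenvectors, and numerical libraries implement exactly this. A sketch with numpy (the example matrix is a choice of our own):

```python
import numpy as np

t = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # symmetric, hence self-adjoint on R^2

m, b = np.linalg.eigh(t)                # eigh handles symmetric/Hermitian matrices
print(m)                                # [1. 3.]: the eigenvalues m_i
print(b)                                # columns: an orthonormal eigenbasis
print(np.allclose(b.T @ b, np.eye(2)))  # True: the basis is orthonormal
print(np.allclose(t @ b, b * m))        # True: T alpha_i = m_i alpha_i
```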
Lemma 3.3. In the context of the above discussion, if $\xi \ne 0$ and $T(\xi) = x\xi$ for some $x$ in $\mathbb{R}$, then $\xi \in M_j$ (and so $x = \lambda_j$) for some $j$.

Proof. Since $V = \bigoplus_1^r M_j$, we have $\xi = \sum_1^r \xi_i$ with $\xi_i \in M_i$. Then

$$\sum_1^r x\xi_i = x\xi = T(\xi) = \sum_1^r T(\xi_i) = \sum_1^r \lambda_i\xi_i \qquad\text{and so}\qquad \sum_1^r (x - \lambda_i)\xi_i = 0.$$

Since the subspaces $M_i$ are independent, every component $(x - \lambda_i)\xi_i$ is $0$. But some $\xi_j \ne 0$, since $\xi \ne 0$. Therefore, $x = \lambda_j$, $\xi_i = 0$ for $i \ne j$, and $\xi = \xi_j \in M_j$. □

We have thus proved the following theorem.

Theorem 3.2. If $V$ is a finite-dimensional Hilbert space and $T$ is a self-adjoint element of $\operatorname{Hom} V$, then there are uniquely determined subspaces $\{V_i\}_1^r$ of $V$, and distinct scalars $\{\lambda_i\}_1^r$, such that $\{V_i\}$ is an orthogonal family whose sum is $V$ and the restriction of $T$ to $V_i$ is $\lambda_i$ times the identity.

If $V$ is a finite-dimensional vector space and we are given $T \in \operatorname{Hom} V$, then we know how to compute related mappings such as $T^2$ and $T^{-1}$ (if it exists) and vectors $T\alpha$, $T^{-1}\alpha$, etc., by choosing a basis for $V$ and then computing matrix products, inverses (when they exist), and so on. Some of these computations, particularly those related to inverses, can be quite arduous. One enormous advantage of a basis consisting of eigenvectors for $T$ is that it trivializes all of these calculations. To see this, let $\{\beta_n\}$ be a basis of $V$ consisting entirely of eigenvectors for $T$, and let $\{r_n\}$ be the corresponding eigenvalues. To compute $T\xi$, we write down the basis expansion for $\xi$, $\xi = \sum_1^n x_i\beta_i$, and then $T\xi = \sum_1^n r_ix_i\beta_i$. $T^2$ has the same eigenvectors, but with eigenvalues $\{r_i^2\}$. Thus $T^2\alpha = \sum_1^n r_i^2x_i\beta_i$. $T^{-1}$ exists if and only if no $r_i = 0$, in which case it has the same eigenvectors with eigenvalues $\{1/r_i\}$. Thus $T^{-1}\xi = \sum_1^n (x_i/r_i)\beta_i$. If $P(t) = \sum_0^m a_kt^k$ is any polynomial, then $P(T)$ takes $\beta_i$ into $P(r_i)\beta_i$. Thus $P(T)\xi = \sum_1^n P(r_i)x_i\beta_i$. By now the point should be amply clear. The additional value of orthonormality in a basis is already clear from the last section. Basically, it enables us to compute the coefficients $\{x_i\}$ of $\xi$ by scalar products: $x_i = (\xi, \beta_i)$.

This is a good place to say a few words about the general eigenvalue problem in finite-dimensional theory. Our complete analysis above was made possible by the self-adjointness of $T$ (or the symmetry of the matrix $t$). What we can say about an arbitrary $T$ in $\operatorname{Hom} V$ is much less satisfactory.

We first note that the eigenvalues of $T$ can be determined algebraically, for $\lambda$ is an eigenvalue if and only if $T - \lambda I$ is not injective, or, equivalently, is singular, and we know that $T - \lambda I$ is singular if and only if its determinant $\Delta(T - \lambda I)$ is $0$. If we choose any basis for $V$, the determinant of $T - \lambda I$ is the determinant of its matrix $t - \lambda e$, and our later formula in Chapter 7 shows that this is a polynomial of degree $n$ in $\lambda$. It is easy to see that this polynomial is independent of the basis; it is called the characteristic polynomial of $T$. Thus the eigenvalues of $T$ are exactly the roots of the characteristic polynomial of $T$.
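The eigenbasis calculus just described is easy to demonstrate numerically; the sketch below (continuing the illustrative matrix used earlier) computes $T^2\xi$ and $T^{-1}\xi$ by scaling Fourier coefficients, and checks the results against direct matrix arithmetic.

```python
import numpy as np

t = np.array([[2.0, 1.0],
              [1.0, 2.0]])
r, b = np.linalg.eigh(t)      # eigenvalues r_i and an orthonormal eigenbasis

def apply_func(f, xi):
    # Expand xi in the eigenbasis (x_i = (xi, beta_i)), scale by f(r_i), resum.
    x = b.T @ xi
    return b @ (f(r) * x)

xi = np.array([1.0, 2.0])
print(apply_func(lambda r: r**2, xi), t @ t @ xi)                 # T^2 xi
print(apply_func(lambda r: 1.0 / r, xi), np.linalg.solve(t, xi))  # T^-1 xi
```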
However, $T$ need not have any eigenvectors! Consider, for example, a $90°$ rotation in the Cartesian plane. This is the map $T: \langle x, y\rangle \mapsto \langle -y, x\rangle$. Thus $T(\delta^1) = \delta^2$ and $T(\delta^2) = -\delta^1$, so the matrix of $T$ is

$$\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$$

Then the matrix of $T - \lambda$ is

$$\begin{bmatrix} -\lambda & -1 \\ 1 & -\lambda \end{bmatrix},$$

and the characteristic polynomial of $T$ is the determinant of this matrix: $\lambda^2 + 1$. Since this polynomial is irreducible over $\mathbb{R}$, there are no eigenvalues.

Note how different the outcome is if we consider the transformation with the same matrix on complex 2-space $\mathbb{C}^2$. Here the scalar field is the complex number system, and $T$ is the map $\langle z_1, z_2\rangle \mapsto \langle -z_2, z_1\rangle$ from $\mathbb{C}^2$ to $\mathbb{C}^2$. But now $\lambda^2 + 1 = (\lambda + i)(\lambda - i)$, and $T$ has eigenvalues $\pm i$! To find the eigenvectors for $i$, we solve $T(z) = iz$, which is the equation $\langle -z_2, z_1\rangle = \langle iz_1, iz_2\rangle$, or $z_2 = -iz_1$. Thus $\langle 1, -i\rangle$ (or $i\langle 1, -i\rangle = \langle i, 1\rangle$) is the unique eigenvector for $i$ to within a scalar multiple.

We return to our real theory. If $T \in \operatorname{Hom} V$ and $n = d(V)$, so that $d(\operatorname{Hom} V) = n^2$, then the set of $n^2 + 1$ vectors $\{T^i\}_0^{n^2}$ in $\operatorname{Hom} V$ is dependent. But this is exactly the same as saying that $p(T) = 0$ for some polynomial $p$ of degree $\le n^2$. That is, any $T$ in $\operatorname{Hom} V$ satisfies a polynomial identity $p(T) = 0$. Now suppose that $r$ is an eigenvalue of $T$ and that $T(\xi) = r\xi$. Then $p(T)(\xi) = p(r)\xi = 0$, and so $p(r) = 0$. That is, every eigenvalue of $T$ is a root of the polynomial $p$. Conversely, if $p(r) = 0$, then we know from the remainder theorem of algebra that $t - r$ is a factor of the polynomial $p(t)$, and therefore $(t - r)^m$ will be one of the relatively prime factors of $p$. Now suppose that $p$ is the minimal polynomial of $T$ (see Exercise 3.5). Theorem 5.5, Chapter 1, tells us that $(T - rI)^m$ is zero on a corresponding subspace $N$ of $V$ and therefore, in particular, that $T - rI$ is not injective when restricted to $N$. That is, $r$ is an eigenvalue. We have proved:

Theorem 3.3. The eigenvalues of $T$ are zeros (roots) of every polynomial $p(t)$ such that $p(T) = 0$, and are exactly the roots of the minimal polynomial.
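The complex half of the rotation example can be checked with numpy, whose general eigensolver works over $\mathbb{C}$ (an illustrative check of our own):

```python
import numpy as np

t = np.array([[0.0, -1.0],
              [1.0,  0.0]])    # the 90-degree rotation: no real eigenvalues

w, v = np.linalg.eig(t)        # eig diagonalizes over the complex field
print(w)                       # [0.+1.j  0.-1.j]: the roots of lambda^2 + 1
print(v[:, 0])                 # a scalar multiple of <1, -i>
```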
EXERCISES

3.1 Use the defining identity $(T\xi, \eta) = (\xi, T\eta)$ to show that the set SA of all self-adjoint elements of $\operatorname{Hom} V$ is a subspace. Prove similarly that if $S$ and $T$ are self-adjoint, then $ST$ is self-adjoint if and only if $ST = TS$. Conclude that if $T$ is self-adjoint, then so is $p(T)$ for any polynomial $p$.

3.2 Show that if $T$ is self-adjoint, then $S = T^2$ is nonnegative. Show, therefore, that if $T$ is self-adjoint and $a > 0$, then $T^2 + aI$ cannot be the zero transformation.

3.3 Let $p(t) = t^2 + bt + c$ be an irreducible quadratic polynomial ($b^2 < 4c$), and let $T$ be a self-adjoint transformation. Show that $p(T) \ne 0$. (Complete the square and apply earlier exercises.)

3.4 Let $T$ be self-adjoint and nilpotent ($T^n = 0$ for some $n$). Prove that $T = 0$. This can be done in various ways. One method is to show it first for $n = 2$ and then for $n = 2^m$ by induction. Finally, any $n$ can be bracketed by powers of 2, $2^m \le n < 2^{m+1}$.

3.5 Let $V$ be any vector space, and let $T$ be an element of $\operatorname{Hom} V$. Suppose that there is a polynomial $q$ such that $q(T) = 0$, and let $p$ be such a polynomial of minimum degree. Show that $p$ is unique (to within a constant multiple). It is called the minimal polynomial of $T$. Show that if we apply Theorem 5.5 of Chapter 1 to the minimal polynomial $p$ of $T$, then the subspaces $N_i$ must both be nontrivial.

3.6 It is a corollary of the fundamental theorem of algebra that a polynomial with real coefficients can be factored into a product of linear factors $(t - r)$ and irreducible quadratic factors ($t^2 + bt + c$). Let $T$ be a self-adjoint transformation on a finite-dimensional Hilbert space, and let $p(t)$ be its minimal polynomial. Deduce a new proof of Theorem 3.1 by applying to $p(t)$ the above remark, Theorem 5.5 of Chapter 1, and Exercises 3.1 through 3.4.

3.7 Prove that if $T$ is a self-adjoint transformation on a pre-Hilbert space $V$, then its null space is the orthogonal complement of its range: $N(T) = (R(T))^\perp$. Conclude that if $V$ is a Hilbert space, then a self-adjoint $T$ is injective if and only if its range is dense (in $V$).

3.8 Assuming the above exercise, show that if $V$ is a Hilbert space and $T$ is a self-adjoint element of $\operatorname{Hom} V$ that is bounded below (as well as bounded), then $T$ is surjective.

3.9 Let $T$ be self-adjoint and nonnegative, and set $m = \operatorname{lub}\{(T\xi, \xi) : \|\xi\| = 1\}$. Use the Schwarz inequality and the inequality of Lemma 3.2 to show that $m = \|T\|$.

3.10 Let $V$ be a Hilbert space, let $T$ be a self-adjoint element of $\operatorname{Hom} V$, and set $m = \operatorname{lub}\{(T\xi, \xi) : \|\xi\| = 1\}$. Show that if $a > m$, then $a - T$ ($= aI - T$) is invertible and $\|(a - T)^{-1}\| \le 1/(a - m)$. (Apply the Schwarz inequality, the definition of $m$, and Exercise 3.8.)

3.11 Let $P$ be a bounded linear transformation on a pre-Hilbert space $V$ that is a projection in the sense of Chapter 1. Prove that if $P$ is self-adjoint, then $P$ is an orthogonal projection. Now prove the converse.

3.12 Let $V$ be a finite-dimensional Hilbert space, let $T$ in $\operatorname{Hom} V$ be self-adjoint, and suppose that $S$ in $\operatorname{Hom} V$ commutes with $T$. Show that the subspaces $M_i$ of Theorem 3.1 and Lemma 3.3 are invariant under $S$.

3.13 A self-adjoint transformation $T$ on a finite-dimensional Hilbert space $V$ is said to have a simple spectrum if all its eigenvalues are distinct. By this we mean that all the subspaces $M_i$ are one-dimensional. Suppose that $T$ is a self-adjoint transformation with a simple spectrum, and suppose that $S$ commutes with $T$. Show that $S$ is also self-adjoint. (Apply the above exercise.)

3.14 Let $H$ be a Hilbert space, and let $\omega[\xi, \eta]$ be a bounded bilinear form on $H \times H$. That is, there is a constant $b$ such that $|\omega[\xi, \eta]| \le b\|\xi\|\,\|\eta\|$ for all $\xi, \eta \in H$. Show that there is a unique $T$ in $\operatorname{Hom} H$ such that $\omega[\xi, \eta] = (\xi, T\eta)$. Show that $T$ is self-adjoint if and only if $\omega$ is symmetric.
4. ORTHOGONAL TRANSFORMATIONS

Assuming that $V$ is a Hilbert space and that therefore $\theta: V \to V^*$ is an isomorphism, we can of course replace the adjoint $T^* \in \operatorname{Hom} V^*$ of any $T \in \operatorname{Hom} V$ by the corresponding transformation $\theta^{-1} \circ T^* \circ \theta \in \operatorname{Hom} V$. In Hilbert space theory it is this mapping that is called the adjoint of $T$ and is designated $T^*$. Then, exactly as in our discussion of a self-adjoint $T$, we see that $(T\alpha, \beta) = (\alpha, T^*\beta)$ for all $\alpha, \beta \in V$ and that $T^*$ is uniquely defined by this identity. Finally, $T$ is self-adjoint if and only if $T = T^*$.

Although it really amounts to the above way of introducing $T^*$ into $\operatorname{Hom} V$, we can make a direct definition as follows. For each $\eta$ the mapping $\xi \mapsto (T\xi, \eta)$ is linear and bounded, and so is an element of $V^*$, which, by Theorem 2.4, is given by a unique element $\beta_\eta$ in $V$ according to the formula $(T\xi, \eta) = (\xi, \beta_\eta)$. Now we check that $\eta \mapsto \beta_\eta$ is linear and bounded and is therefore an element of $\operatorname{Hom} V$ which we call $T^*$, etc.

The matrix calculations of Lemma 3.1 generalize verbatim to show that the matrix of $T^*$ in $\operatorname{Hom} V$ is the transpose $t^*$ of the matrix $t$ of $T$.

Another very important type of transformation on a Hilbert space is one that preserves the scalar product.

Definition. A transformation $T \in \operatorname{Hom} V$ is orthogonal if $(T\alpha, T\beta) = (\alpha, \beta)$ for all $\alpha, \beta \in V$.

By the basic adjoint identity above this is entirely equivalent to $(\alpha, T^*T\beta) = (\alpha, \beta)$, for all $\alpha, \beta$, and hence to $T^*T = I$. An orthogonal $T$ is injective, since $\|T\alpha\|^2 = \|\alpha\|^2$, and is therefore invertible if $V$ is finite-dimensional. Whether $V$ is finite-dimensional or not, if $T$ is invertible, then the above condition becomes $T^* = T^{-1}$.

If $T \in \operatorname{Hom} \mathbb{R}^n$, the matrix form of the equation $T^*T = I$ is of course $t^*t = e$, and if this is written out, it becomes

$$\sum_{k=1}^n t_{ki}t_{kj} = e_{ij}$$

for all $i, j$, which simply says that the columns of $t$ form an orthonormal set (and hence a basis) in $\mathbb{R}^n$. We thus have:

Theorem 4.1. A transformation $T \in \operatorname{Hom} \mathbb{R}^n$ is orthogonal if and only if the image of the standard basis $\{\delta^i\}_1^n$ under $T$ is another orthonormal basis (with respect to the standard scalar product).

The necessity of this condition is, of course, obvious from the scalar-product-preserving definition of orthogonality, and the sufficiency can also be checked directly using the basis expansions of $\alpha$ and $\beta$.

We can now state the eigenbasis theorem in different terms. By a diagonal matrix we mean a matrix which is zero everywhere except on the main diagonal.
Theorem 4.2. Let $t = \{t_{ij}\}$ be a symmetric $n \times n$ matrix. Then there exists an orthogonal $n \times n$ matrix $b$ such that $b^{-1}tb$ is a diagonal matrix.

Proof. Since the transformation $T \in \operatorname{Hom} \mathbb{R}^n$ defined by $t$ is self-adjoint, there exists an orthonormal basis $\{b^i\}_1^n$ of eigenvectors of $T$, with corresponding eigenvalues $\{r_i\}_1^n$. Let $B$ be the orthogonal transformation defined by $B(\delta^j) = b^j$, $j = 1, \ldots, n$. (The $n$-tuples $b^j$ are the columns of the matrix $b = \{b_{ij}\}$ of $B$.) Then $(B^{-1} \circ T \circ B)(\delta^j) = r_j\delta^j$. Since $(B^{-1} \circ T \circ B)(\delta^j)$ is the $j$th column of $b^{-1}tb$, we see that $s = b^{-1}tb$ is diagonal, with $s_{jj} = r_j$. □

For later applications we are also going to want the following result.

Theorem 4.3. Any invertible $T \in \operatorname{Hom} V$ on a finite-dimensional Hilbert space $V$ can be expressed in the form $T = RS$, where $R$ is orthogonal and $S$ is self-adjoint and positive.

Proof. For any $T$, $T^*T$ is self-adjoint, since $(T^*T)^* = T^*T^{**} = T^*T$. Let $\{\varphi_i\}_1^n$ be an orthonormal eigenbasis, and let $\{r_i\}_1^n$ be the corresponding eigenvalues of $T^*T$. Then $0 < \|T\varphi_i\|^2 = (T^*T\varphi_i, \varphi_i) = (r_i\varphi_i, \varphi_i) = r_i$ for each $i$. Since all the eigenvalues of $T^*T$ are thus positive, we can define a positive square root $S = (T^*T)^{1/2}$ by $S\varphi_i = r_i^{1/2}\varphi_i$, $i = 1, 2, \ldots, n$. It is clear that $S^2 = T^*T$ and that $S$ is self-adjoint. Then $A = ST^{-1}$ is orthogonal, for

$$(ST^{-1}\alpha, ST^{-1}\beta) = (T^{-1}\alpha, S^2T^{-1}\beta) = (T^{-1}\alpha, T^*TT^{-1}\beta) = (T^{-1}\alpha, T^*\beta) = (TT^{-1}\alpha, \beta) = (\alpha, \beta).$$

Since $T = A^{-1}S$, we set $R = A^{-1}$ and have the theorem. □

It is not hard to see that the above factorization of $T$ is unique. Also, by starting with $TT^*$, we can express $T$ in the form $T = SR$, where $S$ is self-adjoint and positive and $R$ is orthogonal. We call these factorizations the polar decompositions of $T$, since they function somewhat like the polar coordinate factorization $z = re^{i\theta}$ of a complex number.

Corollary. Any nonsingular $n \times n$ matrix $t$ can be expressed as $t = udv$, where $u$ and $v$ are orthogonal and $d$ is diagonal.

Proof. From the theorem we have $t = rs$, where $r$ is orthogonal and $s$ is symmetric. By Theorem 4.2, $s = bdb^{-1}$, where $d$ is diagonal and $b$ is orthogonal. Thus $t = rs = (rb)db^{-1} = udv$, where $u = rb$ and $v = b^{-1}$ are both orthogonal. □
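Both factorizations are standard numerical routines. A sketch using scipy and numpy (the matrix is an arbitrary nonsingular choice of our own; the corollary's $t = udv$ is, up to signs, the singular value decomposition):

```python
import numpy as np
from scipy.linalg import polar

t = np.array([[4.0, 1.0],
              [2.0, 3.0]])                  # nonsingular

r, s = polar(t)                             # polar decomposition t = r s
print(np.allclose(r.T @ r, np.eye(2)))      # True: r is orthogonal
print(np.allclose(s, s.T), np.all(np.linalg.eigvalsh(s) > 0))  # s symmetric, positive
print(np.allclose(t, r @ s))                # True: t = r s

u, d, v = np.linalg.svd(t)                  # the corollary: t = u d v
print(np.allclose(t, u @ np.diag(d) @ v))   # True
```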
EXERCISES

4.1 Let $V$ be a Hilbert space, and suppose that $S$ and $T$ in $\operatorname{Hom} V$ satisfy $(T\xi, \eta) = (\xi, S\eta)$ for all $\xi, \eta$. Write out the proof of the identity $S = \theta^{-1} \circ T^* \circ \theta$.

4.2 Write out the analogue of the proof of Lemma 3.1 which shows that the matrix of $T^*$ is the transpose of the matrix of $T$.

4.3 Once again show that if $(\xi, \eta) = (\xi, \zeta)$ for all $\xi$, then $\eta = \zeta$. Conclude that if $S$, $T$ in $\operatorname{Hom} V$ are such that $(\xi, T\eta) = (\xi, S\eta)$ for all $\xi$ and $\eta$, then $T = S$.

4.4 Let $\{a, b\}$ be an orthonormal basis for $\mathbb{R}^2$, and let $t$ be the $2 \times 2$ matrix whose columns are $a$ and $b$. Show by direct calculation that the rows of $t$ are also orthonormal.

4.5 State again why it is that if $V$ is finite-dimensional, and if $S$ and $T$ in $\operatorname{Hom} V$ satisfy $S \circ T = I$, then $T$ is invertible and $S = T^{-1}$. Now let $V$ be a finite-dimensional Hilbert space, and let $T$ be an orthogonal transformation in $\operatorname{Hom} V$. Show that $T^*$ is also orthogonal.

4.6 Let $t$ be an $n \times n$ matrix whose columns form an orthonormal basis for $\mathbb{R}^n$. Prove that the rows of $t$ also form an orthonormal basis. (Apply the above exercise.)

4.7 Show that a nonnegative self-adjoint transformation $S$ on a finite-dimensional Hilbert space has a uniquely determined nonnegative self-adjoint square root.

4.8 Prove that if $V$ is a finite-dimensional Hilbert space and $T \in \operatorname{Hom} V$, then the "polar decomposition" of $T$, $T = RS$, of Theorem 4.3 is unique. (Apply the above exercise.)

5. COMPACT TRANSFORMATIONS

Theorem 3.1 breaks down when $V$ is an infinite-dimensional Hilbert space. A self-adjoint transformation $T$ does not in general have enough eigenvectors to form a basis for $V$, and a more sophisticated analysis, allowing for a "continuous spectrum" as well as a "discrete spectrum", is necessary. This enriched situation is the reason for the need for further study of Hilbert space theory at the graduate level, and is one of the sources of complexity in the mathematical structure of quantum mechanics. However, there is one very important special case in which the eigenbasis theorem is available, and which will have a startling application in the next chapter.

Definition. Let $V$ and $W$ be any normed linear spaces, and let $S$ be the unit ball in $V$. A transformation $T$ in $\operatorname{Hom}(V, W)$ is compact if the closure of $T[S]$ in $W$ is sequentially compact.

Theorem 5.1. Let $V$ be any pre-Hilbert space, and let $T \in \operatorname{Hom} V$ be self-adjoint and compact. Then the pre-Hilbert space $R = \operatorname{range}(T)$ has an orthonormal basis $\{\varphi_i\}$ consisting entirely of eigenvectors of $T$, and the corresponding sequence of eigenvalues $\{r_n\}$ converges to $0$ (or is finite).

Proof. The proof is just like that of Theorem 3.1 except that we have to start a little differently. Set $m = \|T\| = \operatorname{lub}\{\|T(\xi)\| : \|\xi\| = 1\}$, and choose a sequence $\{\xi_n\}$ such that $\|\xi_n\| = 1$ for all $n$ and $\|T(\xi_n)\| \to m$. Then

$$\big((m^2 - T^2)\xi_n, \xi_n\big) = m^2 - \|T(\xi_n)\|^2 \to 0,$$
and since $m^2 - T^2$ is a nonnegative self-adjoint transformation, Lemma 3.2 tells us that $(m^2 - T^2)(\xi_n) \to 0$. But since $T$ is compact, we can suppose (passing to a subsequence if necessary) that $\{T\xi_n\}$ converges, say to $\beta$. Then $T^2\xi_n \to T\beta$, and so $m^2\xi_n \to T\beta$ also. Thus $\xi_n \to T\beta/m^2$ and

$$\beta = \lim T\xi_n = T^2(\beta)/m^2.$$

Since $\|\beta\| = \lim \|T(\xi_n)\| = m$, we have a nonzero vector $\beta$ such that $T^2(\beta) = m^2\beta$. Set $\alpha = \beta/\|\beta\|$. We have thus found a vector $\alpha$ such that $\|\alpha\| = 1$ and $0 = (m^2 - T^2)(\alpha) = (m - T)(m + T)(\alpha)$. Then either $(m + T)(\alpha) = 0$, in which case $T(\alpha) = -m\alpha$, or $(m + T)(\alpha) = \gamma \ne 0$ and $(m - T)\gamma = 0$, in which case $T\gamma = m\gamma$. Thus there exists a vector $\varphi_1$ ($\alpha$ or $\gamma/\|\gamma\|$) such that $\|\varphi_1\| = 1$ and $T(\varphi_1) = r_1\varphi_1$, where $|r_1| = m$.

We now proceed just as in Theorem 3.1. For notational consistency we set $m_1 = m$, $V_1 = V$, and now set $V_2 = \{\varphi_1\}^\perp$. Then $T[V_2] \subset V_2$, since if $\alpha \perp \varphi_1$, then $(T\alpha, \varphi_1) = (\alpha, T\varphi_1) = r_1(\alpha, \varphi_1) = 0$. Thus $T \upharpoonright V_2$ is compact and self-adjoint, and if $m_2 = \|T \upharpoonright V_2\|$, there exists $\varphi_2$ with $\|\varphi_2\| = 1$ and $T(\varphi_2) = r_2\varphi_2$, where $|r_2| = m_2$. We continue inductively, obtaining an orthonormal sequence $\{\varphi_n\} \subset V$ and a sequence $\{r_n\} \subset \mathbb{R}$ such that $T\varphi_n = r_n\varphi_n$ and $|r_n| = \|T \upharpoonright V_n\|$, where $V_n = \{\varphi_1, \ldots, \varphi_{n-1}\}^\perp$.

We suppose for the moment, since this is the most interesting case, that $r_n \ne 0$ for all $n$. Then we claim that $|r_n| \to 0$. For $|r_n|$ is decreasing in any case, and if it does not converge to $0$, then there exists a $b > 0$ such that $|r_n| \ge b$ for all $n$. Then

$$\|T(\varphi_i) - T(\varphi_j)\|^2 = \|r_i\varphi_i - r_j\varphi_j\|^2 = \|r_i\varphi_i\|^2 + \|r_j\varphi_j\|^2 = r_i^2 + r_j^2 \ge 2b^2$$

for all $i \ne j$, and the sequence $\{T(\varphi_i)\}$ can have no convergent subsequence, contradicting the compactness of $T$. Therefore $|r_n| \downarrow 0$.

Finally, we have to show that the orthonormal sequence $\{\varphi_i\}$ is a basis for $R$. If $\beta = T(\alpha)$, and if $\{b_n\}$ and $\{a_n\}$ are the Fourier coefficients of $\beta$ and $\alpha$, then we expect that $b_n = r_na_n$, and this is easy to check:

$$b_n = (\beta, \varphi_n) = (T(\alpha), \varphi_n) = (\alpha, T(\varphi_n)) = (\alpha, r_n\varphi_n) = r_n(\alpha, \varphi_n) = r_na_n.$$

This is just saying that $T(a_n\varphi_n) = b_n\varphi_n$, and therefore $\beta - \sum_1^n b_i\varphi_i = T(\alpha - \sum_1^n a_i\varphi_i)$. Now $\alpha - \sum_1^n a_i\varphi_i$ is orthogonal to $\{\varphi_i\}_1^n$ and therefore is an element of $V_{n+1}$, and the norm of $T$ on $V_{n+1}$ is $|r_{n+1}|$. Moreover, $\|\alpha - \sum_1^n a_i\varphi_i\| \le \|\alpha\|$, by the Pythagorean theorem. Altogether we can conclude that

$$\Big\|\beta - \sum_1^n b_i\varphi_i\Big\| \le |r_{n+1}|\,\|\alpha\|,$$

and since $r_{n+1} \to 0$, this implies that $\beta = \sum_i b_i\varphi_i$. Thus $\{\varphi_i\}$ is a basis for $R(T)$. Also, since $T$ is self-adjoint, $N(T) = R(T)^\perp = \{\varphi_i\}^\perp = \bigcap_i V_i$.

If some $r_i = 0$, then there is a first $n$ such that $r_n = 0$. In this case $\|T \upharpoonright V_n\| = |r_n| = 0$, so that $V_n \subset N(T)$. But $\varphi_i \in R(T)$ if $i < n$, since then $\varphi_i = T(\varphi_i)/r_i$, and so $N(T) = R(T)^\perp \subset \{\varphi_1, \ldots, \varphi_{n-1}\}^\perp = V_n$. Therefore, $N(T) = V_n$ and $R(T)$ is the span of $\{\varphi_i\}_1^{n-1}$. □
CHAPTER 6

DIFFERENTIAL EQUATIONS

This chapter is not a small differential equations textbook; we leave out far too much. We are principally concerned with some of the theory of the subject, although we shall say one or two practical things. Our first goal is the fundamental existence and uniqueness theorem of ordinary differential equations, which we prove as an elegant application of the fixed-point theorem. Next we look at the linear theory, where we make vital use of material from the first two chapters and get quite specific about the process of actually finding solutions. So far our development is linked to the initial-value problem, concerning the existence of, and in some cases the ways of finding, a unique solution passing through some initially prescribed point in the space containing the solution curves. However, some of the most important aspects of the subject relate to what are called boundary-value problems, and our last and most sophisticated effort will be directed toward making a first step into this large area. This will involve us in the theory of Chapter 5, for we shall find ourselves studying self-adjoint operators. In fact, the basic theorem about Fourier series expansions will come out of recognizing a certain right inverse of a differential operator to be a compact self-adjoint operator.

1. THE FUNDAMENTAL THEOREM

Let $A$ be an open subset of a Banach space $W$, let $I$ be an open interval in $\mathbb{R}$, and let $F: I \times A \to W$ be continuous. We want to study the differential equation

$$d\alpha/dt = F(t, \alpha).$$

A solution of this equation is a function $f: J \to A$, where $J$ is an open subinterval of $I$, such that $f'(t)$ exists and $f'(t) = F(t, f(t))$ for every $t$ in $J$. Note that a solution $f$ has to be continuously differentiable, for the existence of $f'$ implies the continuity of $f$, and then $f'(t) = F(t, f(t))$ is continuous by the continuity of $F$.

We are going to see that if $F$ has a continuous second partial differential, then there exists a uniquely determined "local" solution through any point $\langle t_0, \alpha_0\rangle \in I \times A$.
In saying that the solution $f$ goes through $\langle t_0, \alpha_0\rangle$, we mean, of course, that $\alpha_0 = f(t_0)$. The requirement that the solution $f$ have the value $\alpha_0$ when $t = t_0$ is called an initial condition.

The existence and continuity of $dF^2_{\langle t, \alpha\rangle}$ implies, via the mean-value theorem, that $F(t, \alpha)$ is locally uniformly Lipschitz in $\alpha$. By this we mean that for any point $\langle t_0, \alpha_0\rangle$ in $I \times A$ there is a neighborhood $M \times N$ and a constant $b$ such that

$$\|F(t, \xi) - F(t, \eta)\| \le b\|\xi - \eta\|$$

for all $t$ in $M$ and all $\xi, \eta$ in $N$. To see this we simply choose balls $M$ and $N$ about $t_0$ and $\alpha_0$ such that $dF^2_{\langle t, \alpha\rangle}$ is bounded, say by $b$, on $M \times N$, and apply Theorem 7.4 of Chapter 3. This is the condition that we actually use below.

Theorem 1.1. Let $A$ be an open subset of a Banach space $W$, let $I$ be an open interval in $\mathbb{R}$, and let $F$ be a continuous mapping from $I \times A$ to $W$ which is locally uniformly Lipschitz in its second variable. Then for any point $\langle t_0, \alpha_0\rangle$ in $I \times A$, for some neighborhood $U$ of $\alpha_0$ and for any sufficiently small interval $J$ containing $t_0$, there is a unique function $f$ from $J$ to $U$ which is a solution of the differential equation passing through the point $\langle t_0, \alpha_0\rangle$.

Proof. If $f$ is a solution on $J$ through $\langle t_0, \alpha_0\rangle$, then an integration gives

$$f(t) - f(t_0) = \int_{t_0}^t F(s, f(s))\,ds,$$

so that

$$f(t) = \alpha_0 + \int_{t_0}^t F(s, f(s))\,ds$$

for $t \in J$. Conversely, if $f$ satisfies this "integral equation", then the fundamental theorem of the calculus implies that $f'(t)$ exists and equals $F(t, f(t))$ on $J$, so that $f$ is a solution of the differential equation which clearly goes through $\langle t_0, \alpha_0\rangle$.

Now for any continuous $f: J \to A$ we can define $g: J \to W$ by

$$g(t) = \alpha_0 + \int_{t_0}^t F(s, f(s))\,ds,$$

and our argument above shows that $f$ is a solution of the differential equation if and only if $f$ is a fixed point of the mapping $K: f \mapsto g$. This suggests that we try to show that $K$ is a contraction, so that we can apply the fixed-point theorem.

We start by choosing a neighborhood $L \times U$ of $\langle t_0, \alpha_0\rangle$ on which $F(t, \alpha)$ is bounded and Lipschitz in $\alpha$ uniformly over $t$. Let $J$ be some open subinterval of $L$ containing $t_0$, and let $V$ be the Banach space $\mathcal{BC}(J, W)$ of bounded continuous functions from $J$ to $W$. Our later calculation will show how small we have to take $J$. We assume that the neighborhood $U$ is a ball about $\alpha_0$ of radius $r$, and we consider the ball of functions $\mathcal{U} = B_r(\alpha_0)$ in $V$, where $\alpha_0$ is the constant function with value $\alpha_0$. Then any $f$ in $\mathcal{U}$ has its range in $U$, so that $F(t, f(t))$ is defined, bounded, and continuous. That is, $K$ as defined earlier maps the ball $\mathcal{U}$ into $V$.
We now calculate. Let $F$ be bounded by $m$ on $L \times U$ and let $\delta$ be the length of $J$. Then

$$\|K(\alpha_0) - \alpha_0\|_\infty = \operatorname{lub}\Big\{\Big\|\int_{t_0}^t F(s, \alpha_0)\,ds\Big\| : t \in J\Big\} \le \delta m \tag{1}$$

by the norm inequality for integrals (see Section 10 of Chapter 4). Also, if $f_1$ and $f_2$ are in $\mathcal{U}$, and if $c$ is a Lipschitz constant for $F$ on $L \times U$, then

$$\|K(f_1) - K(f_2)\|_\infty = \operatorname{lub}\Big\{\Big\|\int_{t_0}^t \big[F(s, f_1(s)) - F(s, f_2(s))\big]\,ds\Big\|\Big\} \le \delta\operatorname{lub}\{\|F(s, f_1(s)) - F(s, f_2(s))\|\} \le \delta c\operatorname{lub}\{\|f_1(s) - f_2(s)\|\} = \delta c\,\|f_1 - f_2\|_\infty. \tag{2}$$

From (2) we see that $K$ is a contraction with constant $C = \delta c$ if $\delta c < 1$, and from (1) we see that $K$ moves the center $\alpha_0$ of the ball $\mathcal{U}$ a distance less than $(1 - C)r$ if $\delta m < (1 - \delta c)r$. This double requirement on $\delta$ is equivalent to

$$\delta < \frac{r}{m + cr},$$

and with any such $\delta$ the theorem follows from a corollary of the fixed-point theorem (Corollary 2 of Theorem 9.1, Chapter 4). □

Corollary. The theorem holds if $F: I \times A \to W$ is continuous and has a continuous second partial differential.

We next show that any two solutions through $\langle t_0, \alpha_0\rangle$ must agree on the intersection of their domains (under the hypotheses of Theorem 1.1).

Lemma 1.1. Let $g_1$ and $g_2$ be any two solutions of $d\alpha/dt = F(t, \alpha)$ through $\langle t_0, \alpha_0\rangle$. Then $g_1(t) = g_2(t)$ for all $t$ in the intersection $J = J_1 \cap J_2$ of their domains.

Proof. Otherwise there is a point $s$ in $J$ such that $g_1(s) \ne g_2(s)$. Suppose that $s > t_0$, and set $C = \{t : t > t_0 \text{ and } g_1(t) \ne g_2(t)\}$ and $x = \operatorname{glb} C$. The set $C$ is open, since $g_1$ and $g_2$ are continuous, and therefore $x$ is not in $C$. That is, $g_1(x) = g_2(x)$. Call this common value $a$ and apply the theorem to $\langle x, a\rangle$. With $r$ such that $B_r(a) \subset A$, we choose $\delta$ small enough so that the differential equation has a unique solution $g$ from $(x - \delta, x + \delta)$ to $B_r(a)$ passing through $\langle x, a\rangle$, and we also take $\delta$ small enough so that the restrictions of $g_1$ and $g_2$ to $(x - \delta, x + \delta)$ have ranges in $B_r(a)$. But then $g_1 = g_2 = g$ on this interval by the uniqueness of $g$, and this contradicts the definition of $x$. Therefore, $g_1 = g_2$ on the intersection of their domains. □

This lemma allows us to remove the restriction on the range of $f$ in the theorem.

Theorem 1.2. Let $A$, $I$, and $F$ be as in Theorem 1.1. Then for any point $\langle t_0, \alpha_0\rangle$ in $I \times A$ and any sufficiently small interval neighborhood $J$ of $t_0$, there is a unique solution from $J$ to $A$ passing through $\langle t_0, \alpha_0\rangle$.
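The fixed point of $K$ can be computed by iterating $K$ from the constant function $\alpha_0$, and this iteration is the subject of several exercises below. A sketch (our own illustrative choice of equation) for $d\alpha/dt = \alpha$, $\alpha(0) = 1$, where the iterates turn out to be the Taylor partial sums of $e^t$:

```python
import sympy as sp

t, s = sp.symbols('t s')
F = lambda s, a: a                 # d(alpha)/dt = F(t, alpha) = alpha, alpha(0) = 1

f = sp.Integer(1)                  # f_0 is the constant function alpha_0 = 1
for _ in range(5):
    # K: f -> alpha_0 + integral from 0 to t of F(s, f(s)) ds (Theorem 1.1)
    f = 1 + sp.integrate(F(s, f.subs(t, s)), (s, 0, t))
print(sp.expand(f))                # 1 + t + t**2/2 + ... + t**5/120, tending to e**t
```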
Global solutions. The solutions we have found for the differential equation $d\alpha/dt = F(t, \alpha)$ are defined only in sufficiently small neighborhoods of the initial point $t_0$ and are accordingly called local solutions. Now if we run along to a point $\langle t_1, \alpha_1\rangle$ near the end of such a local solution and then consider the local solution about $\langle t_1, \alpha_1\rangle$, first of all it will have to agree with our first solution on the intersection of the two domains, and secondly it will in general extend farther beyond $t_1$ than the first solution, so the two local solutions will fit together to make a solution on a larger $t$-interval than either gives separately. We can continue in this way to extend our original solution to what might be called a global solution, made up of a patchwork of matching local solutions.

These notions are somewhat vague as described above, and we now turn to a more precise construction of a global solution. Given $\langle t_0, \alpha_0\rangle \in I \times A$, let $\mathcal{F}$ be the family of all solutions through $\langle t_0, \alpha_0\rangle$. Thus $g \in \mathcal{F}$ if and only if $g$ is a solution on an interval $J \subset I$, $t_0 \in J$, and $g(t_0) = \alpha_0$. Lemma 1.1 shows exactly that the union† $f$ of all the functions $g$ in $\mathcal{F}$ is itself a function, for if $\langle t_1, \alpha_1\rangle \in g_1$ and $\langle t_1, \alpha_2\rangle \in g_2$, then $\alpha_1 = g_1(t_1) = g_2(t_1) = \alpha_2$. Moreover, $f$ is a solution, because around any $x$ in its domain $f$ agrees with some $g \in \mathcal{F}$. By the way $f$ was defined we see that $f$ is the unique maximum solution through $\langle t_0, \alpha_0\rangle$. We have thus proved the following theorem.

Theorem 1.3. Let $F: I \times A \to W$ be a function satisfying the hypotheses of Theorem 1.1. Then through each $\langle t_0, \alpha_0\rangle$ in $I \times A$ there is a uniquely determined maximal solution to the differential equation $d\alpha/dt = F(t, \alpha)$.

In general, we would have to expect a maximal solution to "run into the boundary of $A$" and therefore to have a domain interval $J$ properly included in $I$, as Fig. 6.1 suggests.

† Remember that we are taking a function to be a set of ordered pairs, so that the union of a family of functions makes precise sense.
However, if $A$ is the whole space $W$, and if $F(t, \alpha)$ is Lipschitz in $\alpha$ for each $t$, with a Lipschitz bound $c(t)$ that is continuous in $t$, then we can show that each maximal solution is over the whole of $I$. We shall shortly see that this condition is a natural one for the linear equation.

Theorem 1.4. Let $W$ be a Banach space, and let $I$ be an open interval in $\mathbb{R}$. Let $F: I \times W \to W$ be continuous, and suppose that there is a continuous function $c: I \to \mathbb{R}$ such that

$$\|F(t, \alpha_1) - F(t, \alpha_2)\| \le c(t)\,\|\alpha_1 - \alpha_2\|$$

for all $t$ in $I$ and all $\alpha_1, \alpha_2$ in $W$. Then each maximal solution to the differential equation $d\alpha/dt = F(t, \alpha)$ has the whole of $I$ for its domain.

Proof. Suppose, on the contrary, that $g$ is a maximal solution whose domain interval $J$ has right-hand endpoint $b$ less than that of $I$. We choose a finite open interval $L$ containing $b$ and such that $\bar{L} \subset I$ (see Fig. 6.2). Since $\bar{L}$ is compact, the continuous function $c(t)$ has a maximum value $c$ on $\bar{L}$. We choose any $t_1$ in $L \cap J$ close enough to $b$ so that $b - t_1 < 1/c$, and we set $\alpha_1 = g(t_1)$ and $m = \max \|F(t, \alpha_1)\|$ on $\bar{L}$. With these values of $c$ and $m$, and with any $r$, the proof of Theorem 1.1 gives us a local solution $f$ through $\langle t_1, \alpha_1\rangle$ with domain $(t_1 - \delta, t_1 + \delta)$ for any $\delta$ less than $r/(m + rc) = 1/\big(c + (m/r)\big)$. Since we now have no restriction on $r$ (because $A = W$), this bound on $\delta$ becomes $1/c$, and since we chose $t_1$ so that $t_1 + (1/c) > b$, we can now choose $\delta$ so that $t_1 + \delta > b$. But this gives us a contradiction; the maximal solution $g$ through $\langle t_1, \alpha_1\rangle$ includes the local solution $f$, so that, in particular, $t_1 + \delta \le b$. We have thus proved the theorem. □

Going back to our original situation, we can conclude that if the Lipschitz control of $F$ is of the stronger type assumed above, and if the domain $J$ of some maximal solution $g$ is less than $I$, then the open set $A$ cannot be the whole of $W$. It is in fact true that the distance from $g(t)$ to the boundary of $A$ approaches zero as $t$ approaches an endpoint $b$ of $J$ which is interior to $I$. That is, it is now a theorem that $\rho(g(t), A') \to 0$ as $t \to b$. The proof is more complicated than our argument above, and we leave it as a set of exercises for the interested reader.

The nth-order equation. Let $A_1, A_2, \ldots, A_n$ be open subsets of a Banach space $W$, let $I$ be an open interval in $\mathbb{R}$, and let $G: I \times A_1 \times A_2 \times \cdots \times A_n \to W$ be continuous. We consider the differential equation

$$d^n\alpha/dt^n = G\big(t, \alpha, d\alpha/dt, \ldots, d^{n-1}\alpha/dt^{n-1}\big).$$
A function $f: J \to W$ is a solution to this equation if $J$ is an open subinterval of $I$, $f$ has continuous derivatives on $J$ up to the $n$th order, $f^{(i-1)}[J] \subset A_i$, $i = 1, \ldots, n$, and

$$f^{(n)}(t) = G\big(t, f(t), f'(t), \ldots, f^{(n-1)}(t)\big)$$

for $t \in J$. An initial condition is now given by a point $\langle t_0, \beta_1, \beta_2, \ldots, \beta_n\rangle \in I \times A_1 \times \cdots \times A_n$. The basic theorem is almost the same as before. To simplify our notation, let $\mathbf{a}$ be the $n$-tuple $\langle \alpha_1, \alpha_2, \ldots, \alpha_n\rangle$ in $W^n = V$, and set $A = \prod_1^n A_i$. Also let $\psi$ be the mapping $f \mapsto \langle f, f', \ldots, f^{(n-1)}\rangle$. Then the solution equation becomes $f^{(n)}(t) = G(t, \psi f(t))$.

Theorem 1.5. Let $G: I \times A \to W$ be as above and suppose, in addition, that $G(t, \mathbf{a})$ is locally uniformly Lipschitz in $\mathbf{a}$. Then for any $\langle t_0, \beta\rangle$ in $I \times A$ and for any sufficiently small open interval $J$ containing $t_0$, there is a unique function $f$ from $J$ to $W$ such that $f$ is a solution to the above $n$th-order equation satisfying the initial condition $\psi f(t_0) = \beta$.

Proof. There is an ancient and standard device for reducing a single $n$th-order equation to a system of first-order equations. The idea is to replace the single equation $d^n\alpha/dt^n = G(t, \alpha, d\alpha/dt, \ldots, d^{n-1}\alpha/dt^{n-1})$ by the system of equations

$$d\alpha_1/dt = \alpha_2,\quad d\alpha_2/dt = \alpha_3,\quad \ldots,\quad d\alpha_{n-1}/dt = \alpha_n,\quad d\alpha_n/dt = G(t, \alpha_1, \alpha_2, \ldots, \alpha_n),$$

and then to recognize this system as equivalent to a single first-order equation on a different space. In fact, if we define the mapping $F = \langle F_1, \ldots, F_n\rangle$ from $I \times A$ to $V = W^n$ by setting $F_i(t, \mathbf{a}) = \alpha_{i+1}$ for $i = 1, \ldots, n - 1$, and $F_n(t, \mathbf{a}) = G(t, \mathbf{a})$, then the above system becomes the single equation

$$d\mathbf{a}/dt = F(t, \mathbf{a}),$$

where $F$ is clearly locally uniformly Lipschitz in $\mathbf{a}$. Now a function $f = \langle f_1, \ldots, f_n\rangle$ from $J$ to $V$ is a solution of this equation if and only if

$$f_1' = f_2,\quad f_2' = f_3,\quad \ldots,\quad f_{n-1}' = f_n,\quad f_n' = G(t, f_1, \ldots, f_n),$$
that is, if and only if $f_1$ has derivatives up to order $n$, $\psi(f_1) = f$, and $f_1^{(n)}(t) = G(t, \psi f_1(t))$. The $n$-tuplet initial condition $\psi f_1(t_0) = \beta$ is now just $f(t_0) = \beta$. Thus the $n$th-order theorem for $G$ has turned into the first-order theorem for $F$, and so follows from Theorems 1.1 and 1.2. □

The local solution through $\langle t_0, \beta\rangle$ extends to a unique maximal solution by Theorem 1.3 applied to our first-order problem $d\mathbf{a}/dt = F(t, \mathbf{a})$, and the domain of the maximal solution is the whole of $I$ if $G(t, \mathbf{a})$ is Lipschitz in $\mathbf{a}$ with a bound $c(t)$ that is continuous and if $A = W^n$, as in Theorem 1.4. (A numerical sketch of this reduction device follows the exercises below.)

EXERCISES

1.1 Consider the equation $d\alpha/dt = F(t, \alpha)$ in the special case where $W = \mathbb{R}^2$. Write out the equation as a pair of equations involving real-valued functions and real variables.

1.2 Consider the system of differential equations $dy/dt = \cos xy$. Define the function $F: \mathbb{R}^3 \to \mathbb{R}^2$ so that the above system becomes $d\alpha/dt = F(t, \alpha)$, where $\alpha = \langle x, y\rangle$.

1.3 In the above exercise show that $F$ is uniformly Lipschitz in $\alpha$ on $\mathbb{R} \times A$, where $A$ is any bounded open set in $\mathbb{R}^2$. Is $F$ uniformly Lipschitz on $\mathbb{R} \times \mathbb{R}^2$?

1.4 Write out the above system in terms of a solution function $f = \langle f_1, f_2\rangle$. Write out for this system the integrated form used in proving Theorem 1.1.

1.5 The fixed-point theorem iteration sequence that we used in proving Theorem 1.1 starts off with $f_0$ as the constant function $\alpha_0$ and then proceeds by

$$f_n(t) = \alpha_0 + \int_{t_0}^t F(s, f_{n-1}(s))\,ds.$$

Compute this sequence as far as $f_4$ for the differential equation $dx/dt = t + x$ [$f'(t) = t + f(t)$] with the initial condition $f(0) = 0$. That is, take $f_0 = 0$ and compute $f_1, f_2, f_3$, and $f_4$ from the formula. Now guess the solution $f$ and verify it.

1.6 Compute the iterates $f_0, f_1, f_2$, and $f_3$ for the initial-value problem $dy/dx = x + y^2$, $y(0) = 0$. Supposing that the solution $f$ has a power series expansion about $0$, what are its first three nonzero terms?

1.7 Make the computation in the above exercise for the initial condition $f(0) = -1$.

1.8 Do the same for $f(0) = +1$.
1.9 Suppose that $W$ is a Banach space and that $F$ and $G$ are functions from $\mathbb{R} \times W^4$ to $W$ satisfying suitable Lipschitz conditions. Show how the second-order system

$$\eta'' = F(t, \xi, \eta, \xi', \eta'), \qquad \xi'' = G(t, \xi, \eta, \xi', \eta')$$

would be brought under our standard theory by making it into a single second-order equation.

1.10 Answer the above exercise by converting it to a first-order system and then to a single first-order equation.

1.11 Let $\theta$ be a nonnegative, continuous, real-valued function defined on an interval $[0, a) \subset \mathbb{R}$, and suppose that there are constants $b$ and $c > 0$ such that

$$\theta(x) \le c\int_0^x \theta(t)\,dt + bx$$

for all $x \in [0, a)$.

a) Prove by induction that if $m = \|\theta\|_\infty$, then

$$\theta(x) \le m\,\frac{(cx)^n}{n!} + \frac{b}{c}\sum_{j=1}^n \frac{(cx)^j}{j!}$$

for every $n$.

b) Then prove that

$$\theta(x) \le \frac{b}{c}\,(e^{cx} - 1)$$

for all $x$.

1.12 Let $W$ be a Banach space, let $I$ be an interval in $\mathbb{R}$, and let $F$ be a continuous mapping from $I \times W$ to $W$. Suppose that $\|F(t, \alpha_0)\| \le b$ for all $t \in I$ and that $\|F(t, \alpha) - F(t, \beta)\| \le c\|\alpha - \beta\|$ for all $t$ in $I$ and all $\alpha, \beta$ in $W$. Let $f$ be the global solution through $\langle t_0, \alpha_0\rangle$, and set $\theta(x) = \|f(t_0 + x) - \alpha_0\|$. Prove that

$$\theta(x) \le c\int_0^x \theta(t)\,dt + bx$$

for $x > 0$ and $t_0 + x$ in $I$. Then use the result in the above exercise to derive a much stronger bound than we have in the text on the growth of the solution $f(t)$ as $t$ goes away from $t_0$.

1.13 With the hypotheses on $F$ as in the above exercise, show that the iteration sequence for the solution through $\langle t_0, \alpha_0\rangle$ converges on the whole of $I$ by showing inductively that if $f_0 = \alpha_0$ and

$$f_n(t) = \alpha_0 + \int_{t_0}^t F(s, f_{n-1}(s))\,ds,$$

then

$$\|f_n(t) - f_{n-1}(t)\| \le \frac{b}{c}\,\frac{(c|t - t_0|)^n}{n!}.$$

From these inequalities prove directly that the solution $f$ through $\langle t_0, \alpha_0\rangle$ satisfies

$$\|f(t) - \alpha_0\| \le \frac{b}{c}\,\big(e^{c|t - t_0|} - 1\big).$$
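As promised after Theorem 1.5, here is the order-reduction device in numerical form. The second-order equation $\alpha'' = -\alpha$ (an illustrative choice of our own, with $W = \mathbb{R}$) becomes the first-order system $\alpha_1' = \alpha_2$, $\alpha_2' = -\alpha_1$, which standard solvers consume directly; scipy's solve_ivp is assumed to be available.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Second-order equation alpha'' = G(t, alpha, alpha') with G = -alpha_1.
# Reduction: a = <alpha_1, alpha_2> = <alpha, alpha'>, so
# alpha_1' = alpha_2 and alpha_2' = G(t, alpha_1, alpha_2).
def F(t, a):
    a1, a2 = a
    return [a2, -a1]

# Initial condition psi f(t_0) = beta = <0, 1>, i.e. alpha(0) = 0, alpha'(0) = 1.
sol = solve_ivp(F, (0.0, 2 * np.pi), [0.0, 1.0], dense_output=True)
print(sol.sol(np.pi / 2))    # approximately [1, 0]: the solution is alpha(t) = sin t
```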
2. DIFFERENTIABLE DEPENDENCE ON PARAMETERS

It is exceedingly important in some applications to know how the solution to the system

f′(t) = F(t, f(t)), f(t₁) = α₁,

varies with the initial point ⟨t₁, α₁⟩. In order to state the problem precisely, we fix an open interval J, set 𝒰 = B_r(α₀) ⊂ V = BC(J, W) as in the previous section, and require a solution in 𝒰 passing through ⟨t₁, α₁⟩, where ⟨t₁, α₁⟩ is near ⟨t₀, α₀⟩. Supposing that a unique solution f exists, we then have a mapping ⟨t₁, α₁⟩ ↦ f, and it is the continuity and differentiability of this map that we wish to study.

Theorem 2.1. Let L × U be a neighborhood of ⟨t₀, α₀⟩ in the Banach space ℝ × W, and let F(t, α) be a bounded continuous mapping from L × U to W which is Lipschitz in α uniformly over t. Then there is a neighborhood J × N of ⟨t₀, α₀⟩ with the following property. For any ⟨t₁, α₁⟩ in J × N there is a unique function f from J to U which is a solution of the differential equation dα/dt = F(t, α) passing through ⟨t₁, α₁⟩, and the mapping ⟨t₁, α₁⟩ ↦ f from J × N to V is continuous.

Proof. We simply reexamine the calculation of Theorem 1.1 and take δ a little smaller. Let K(t₁, α₁, f) be the mapping of that theorem but with initial point ⟨t₁, α₁⟩, so that g = K(t₁, α₁, f) if and only if

g(t) = α₁ + ∫_{t₁}^t F(s, f(s)) ds

for all t in J. Clearly K is continuous in ⟨t₁, α₁⟩ for each fixed f. If N is the ball B_{r/2}(α₀), then the inequality (1) in the proof shows that

‖K(t₁, α₁, α₀) − α₀‖ ≤ ‖α₁ − α₀‖ + δm ≤ r/2 + δm.

The second inequality remains unchanged. Therefore, f ↦ K(t₁, α₁, f) is a map from 𝒰 to V which is a contraction with constant C = δc if δc < 1, and which moves the center α₀ of 𝒰 a distance less than (1 − C)r if r/2 + δm < (1 − δc)r. This new double requirement on δ is equivalent to

δ < r / (2(m + cr)),

which is just half the old value. With J of length δ, we can now apply Theorem 9.2 of Chapter 4 to the map K(t₁, α₁, f) from (J × N) × 𝒰 to V, and so have our theorem. □

If we want the map ⟨t₁, α₁⟩ ↦ f to be differentiable, it is sufficient, by Theorem 9.4 of Chapter 4, to know in addition to the above that K: (J × N) × 𝒰 → V is continuously differentiable. And to deduce this, it is sufficient to suppose that dF exists and is uniformly continuous on L × U.

Theorem 2.2. Let L × U be a neighborhood of ⟨t₀, α₀⟩ in the Banach space ℝ × W, and let F(t, α) be a bounded mapping from L × U to W such that dF exists, is bounded, and is uniformly continuous on L × U. Then, in
the context of the above theorem, the solution f is a continuously differentiable function of the initial value ⟨t₁, α₁⟩.

Proof. We have to show that the map K(t₁, α₁, f) from (J × N) × 𝒰 to V is continuously differentiable, after which we can apply Theorem 9.4 of Chapter 4, as we remarked above. Now the mapping h ↦ k defined by k(t) = ∫_{t₁}^t h(s) ds is a bounded linear mapping from V to V which clearly depends continuously on t₁, and by Theorem 14.3 of Chapter 3 the integrand map f ↦ h defined by h(s) = F(s, f(s)) is continuously differentiable on 𝒰. Composing these two maps we see that dK³_{⟨t₁,α₁,f⟩} exists and is continuous on J × N × 𝒰. Now ΔK²_{⟨t₁,α₁,f⟩}(ξ) = ξ, so that dK² = I, and

ΔK¹_{⟨t₁,α₁,f⟩}(h) = −∫_{t₁}^{t₁+h} F(s, f(s)) ds,

from which it follows easily that dK¹_{⟨t₁,α₁,f⟩}(h) = −h F(t₁, f(t₁)). The three partial differentials dK¹, dK², and dK³ thus exist and are continuous on J × N × 𝒰, and it follows from Theorem 8.3 of Chapter 3 that K(t₁, α₁, f) is continuously differentiable there. □

Corollary. If s is any point in J, then the value f(s) of a solution at s is a differentiable function of its value at t₀.

Proof. Let f_α be the solution through ⟨t₀, α⟩. By the theorem, α ↦ f_α is a continuously differentiable map from N to the function space V = BC(J, W). But π_s: f ↦ f(s) is a bounded linear mapping and thus trivially continuously differentiable. Composing these two maps, we see that α ↦ f_α(s) is continuously differentiable on N. □

It is also possible to make the continuous and differentiable dependence of the solution on its initial value ⟨t₀, α₀⟩ into a global affair. The following is the theorem. We shall not go into its proof here.

Theorem 2.3. Let f be the maximal solution through ⟨t₀, α₀⟩ with domain J, and let [a, b] be any finite closed subinterval of J containing t₀. Then there exists an ε > 0 such that for every ⟨t₁, α₁⟩ ∈ B_ε(⟨t₀, α₀⟩) the domain of the global solution through ⟨t₁, α₁⟩ includes [a, b], and the restriction of this solution to [a, b] is a continuous function of ⟨t₁, α₁⟩. If F satisfies the hypotheses of Theorem 2.2, then this dependence is continuously differentiable.

Finally, suppose that F depends continuously (or continuously differentiably) on a parameter λ, so that we have F(λ, t, α) on M × I × A. Now the solution f to the initial-value problem

f′(t) = F(λ, t, f(t)), f(t₁) = α₁,

depends on the parameter λ as well as on the initial condition, and if the reader has fully understood our arguments above, he will see that we can show in the same way that the dependence of f on λ is also continuous (continuously differentiable). We shall not go into these details here.
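A quick check of the corollary (not from the text; it assumes scipy, and the sample equation is mine): for the linear equation x′ = x + sin t the difference quotient of the solution value at s with respect to its initial value at 0 should reproduce e^s, the solution of the variational equation v′ = v, v(0) = 1.

    # Finite-difference sensitivity of the solution value to the initial value.
    import numpy as np
    from scipy.integrate import solve_ivp

    def rhs(t, x):
        return x + np.sin(t)

    s_end, a = 2.0, 1e-6
    sol0 = solve_ivp(rhs, (0.0, s_end), [0.0], rtol=1e-10, atol=1e-12)
    sola = solve_ivp(rhs, (0.0, s_end), [a],   rtol=1e-10, atol=1e-12)
    quotient = (sola.y[0, -1] - sol0.y[0, -1]) / a
    print(quotient, np.exp(s_end))   # both close to e^2 = 7.389...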
3. THE LINEAR EQUATION

We now suppose that the function F of Section 1 is from I × W to W and continuous, and that F(t, α) is linear in α for each fixed t. It is not hard to see that we then automatically have the strong Lipschitz hypothesis of Theorem 1.4, which we shall in any case now assume. Here this is a boundedness condition on a linear map: we are assuming that F(t, α) = T_t(α), where T_t ∈ Hom W, and that ‖T_t‖ ≤ c(t) for all t, where c(t) is continuous on I. As one might expect, in this situation the existence and uniqueness theory of Section 1 makes contact with general linear theory.

Let X⁰ be the vector space C(I, W) of all continuous functions from I to W, and let X¹ be its subspace C¹(I, W) of all functions having continuous first derivatives. Norms will play no role in our theorem.

Theorem 3.1. The mapping S: X¹ → X⁰ defined by setting g = Sf if

g(t) = f′(t) − F(t, f(t))

is a surjective linear mapping. The set N of global solutions of the differential equation dα/dt = F(t, α) is the null space of S, and is therefore, in particular, a vector space. For each t₀ ∈ I the restriction to N of the coordinate (evaluation) mapping π_{t₀}: f ↦ f(t₀) is an isomorphism from N to W. The null space M of π_{t₀} is therefore a complement of N in X¹, and so determines a right inverse R of S. The mapping f ↦ ⟨Sf, f(t₀)⟩ is an isomorphism from X¹ to X⁰ × W, and this fact is equivalent to all the above assertions.

Proof. For any fixed g in X⁰ we set G(t, α) = F(t, α) + g(t) and consider the (nonlinear) equation dα/dt = G(t, α). By Theorems 1.3 and 1.4 it has a unique maximal solution f through any initial point ⟨t₀, α₀⟩, and the domain of f is the whole of I. That is, for each pair ⟨g, α⟩ in X⁰ × W there is a unique f in X¹ such that ⟨Sf, f(t₀)⟩ = ⟨g, α⟩. The mapping ⟨S, π_{t₀}⟩: f ↦ ⟨Sf, f(t₀)⟩ is thus bijective, and since it is clearly linear, it is an isomorphism. In particular, S is surjective. The null space N of S is the inverse image of {0} × W under the above isomorphism; that is, π_{t₀}↾N is an isomorphism from N to W. Finally, the null space M of π_{t₀} is the inverse image of X⁰ × {0} under ⟨S, π_{t₀}⟩, and the direct sum decomposition X¹ = M ⊕ N simply reflects the decomposition X⁰ × W = (X⁰ × {0}) ⊕ ({0} × W) under the inverse isomorphism. This finishes the proof of the theorem. □

The problem of finding, for a given g in X⁰ and a given α₀ in W, the unique f in X¹ such that S(f) = g and f(t₀) = α₀ is called the initial-value problem. At the theoretical level, the problem is solved by the above theorem, which states that the uniquely determined f exists. At the practical level of computation, the problem remains important.

The fact that M = M_{t₀} is a complement of N breaks down the initial-value problem into two independent subproblems. The right inverse R associated with
M_{t₀} finds h in X¹ such that S(h) = g and h(t₀) = 0. The inverse of the isomorphism f ↦ f(t₀) from N to W selects that k in X¹ such that S(k) = 0 and k(t₀) = α₀. Then f = h + k. The first subproblem is the problem of "solving the inhomogeneous equation with homogeneous initial data", and the second is the problem of "solving the homogeneous equation with inhomogeneous initial data". In a certain sense the initial-value problem is the "direct sum" of these two independent problems.

We shall now study the homogeneous equation dα/dt = T_t(α) more closely. As we saw above, its solution space N is isomorphic to W under each projection map π_t: f ↦ f(t). Let φ_t be this isomorphism (so that φ_t = π_t↾N). We now choose some fixed t₀ in I (we may as well suppose that I contains 0 and take t₀ = 0) and set K_t = φ_t ∘ φ₀⁻¹. Then {K_t} is a one-parameter family of linear isomorphisms of W with itself, and if we set f_β(t) = K_t(β), then f_β is the solution of dα/dt = T_t(α) passing through ⟨0, β⟩. We call K_t a fundamental solution of the homogeneous equation dα/dt = T_t(α).

Since f_β′(t) = T_t(f_β(t)), we see that d(K_t)/dt = T_t ∘ K_t in the sense that the equation is true at each β in W. However, the derivative d(K_t)/dt does not necessarily exist as a norm limit in Hom W. This is because our hypotheses on T_t do not imply that the mapping t ↦ T_t is continuous from I to Hom W.

If this mapping is continuous, then the mapping ⟨t, A⟩ ↦ T_t ∘ A is continuous from I × Hom W to Hom W, and the initial-value problem

dA/dt = T_t ∘ A, A₀ = I,

has a unique solution A_t in C¹(I, Hom W). Because evaluation at β is a bounded linear mapping from Hom W to W, A_t(β) is a differentiable function of t and

d/dt [A_t(β)] = (T_t ∘ A_t)(β) = T_t(A_t(β)).

This implies that A_t(β) = K_t(β) for all β, so K_t = A_t. In particular, the fundamental solution t ↦ K_t is now a differentiable map into Hom W, and dK_t/dt = T_t ∘ K_t. We have proved the following theorem.

Theorem 3.2. Let t ↦ T_t be a continuous map from an interval neighborhood I of 0 to Hom W. Then the fundamental solution t ↦ K_t of the differential equation dα/dt = T_t(α) is the parametrized arc from I to Hom W that is the solution of the initial-value problem dA/dt = T_t ∘ A, A₀ = I.

In terms of the isomorphisms K_t = K(t), we can now obtain an explicit solution for the inhomogeneous equation dα/dt = T_t(α) + g(t). We want f such that f′(t) − T_t(f(t)) = g(t). Now K′(t) = T_t ∘ K(t), so that T_t = K′(t) ∘ K(t)⁻¹, and it follows from Exercise 8.12 of Chapter 4 and the general product rule for differentiation
(Theorem 8.4 of Chapter 3) that the left side of the equation above is exactly

K(t) (d/dt [K(t)⁻¹(f(t))]).

The equation we have to solve can thus be rewritten as

d/dt [K(t)⁻¹(f(t))] = K(t)⁻¹(g(t)).

We therefore have an obvious solution, and even if the reader has found our motivating argument too technical, he should be able to check the solution by differentiating.

Theorem 3.3. In the context of Theorem 3.2, the function

f(t) = K_t [∫₀ᵗ K_s⁻¹(g(s)) ds]

is the solution of the inhomogeneous initial-value problem

dα/dt = T_t(α) + g(t), f(0) = 0.

This therefore is a formula for the right inverse R of S determined by the complement M₀ of the null space N of S.

The special case of the constant coefficient equation, where the "coefficient" operator T_t is a fixed T in Hom W, is extremely important. The first new fact to be observed is that if f is a solution of dα/dt = T(α), then so is f′. For the equation f′(t) = T(f(t)) has a differentiable right-hand side, and differentiating, we get f″(t) = T(f′(t)). That is:

Lemma 3.1. The solution space N of the constant coefficient equation dα/dt = T(α) is invariant under the derivative operator D.

Moreover, we see from the differential equation that the operator D on N is just composition with T. More precisely, the equation f′(t) = T(f(t)) can be rewritten π_t ∘ D = T ∘ π_t, and since the restriction of π_t to N is the isomorphism φ_t from N to W, this equation can be solved for T. We thus have the following lemma.

Lemma 3.2. For each fixed t the isomorphism φ_t from N to W takes the derivative operator D on N to the operator T on W. That is, T = φ_t ∘ D ∘ φ_t⁻¹.

The equation for the fundamental solution K_t is now dA/dt = T ∘ A. In the elementary calculus this is the equation for the exponential function, which leads us to expect, and immediately check, that K_t = e^{tT}. (See the end of Section 8 of Chapter 4.) The solution of dα/dt = T(α) through ⟨0, β⟩ is thus the function

e^{tT}β = Σ_{j=0}^∞ (tʲ/j!) Tʲ(β).
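This can be tested numerically. The sketch below (not from the text; the matrix T, the vector β, and the tolerances are arbitrary sample choices) compares the truncated series Σ (tʲ/j!)Tʲ with scipy's expm, and checks that e^{tT}β satisfies f′ = T(f).

    # Matrix exponential as fundamental solution of dA/dt = T o A, A_0 = I.
    import numpy as np
    from scipy.linalg import expm

    T = np.array([[0.0, 1.0], [-2.0, -3.0]])
    beta = np.array([1.0, 0.0])
    t = 0.7

    series = np.zeros_like(T)
    term = np.eye(2)
    for j in range(1, 30):
        series += term                 # adds t^{j-1} T^{j-1} / (j-1)!
        term = term @ T * (t / j)      # next term t^j T^j / j!
    print(np.max(np.abs(series - expm(t * T))))   # ~ machine precision

    # Derivative check: d/dt e^{tT} beta = T e^{tT} beta.
    h = 1e-6
    lhs = (expm((t + h) * T) @ beta - expm((t - h) * T) @ beta) / (2 * h)
    print(np.max(np.abs(lhs - T @ (expm(t * T) @ beta))))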
If T satisfies a polynomial equation p(T) = 0, as we know it must if W is finite-dimensional, then our analysis can be carried significantly further. Suppose for now that p has only real roots, so that its relatively prime factorization is p(x) = ∏₁ᵏ (x − λᵢ)^{mᵢ}. Then we know from Theorem 5.5 of Chapter 1 that W is the direct sum W = ⊕₁ᵏ Wᵢ of the null spaces Wᵢ of the transformations (T − λᵢ)^{mᵢ}, and that each Wᵢ is invariant under T. This gives us a much simpler form for the solution curve e^{tT}α if the point α is in one of the null spaces Wᵢ. Taking such a subspace Wᵢ itself as W for the moment, we have (T − λI)^m = 0, so that T = λI + R, where R^m = 0, and the factorization e^{tT} = e^{tλ}e^{tR}, together with the now finite series expansion of e^{tR}, gives us

e^{tT}α = e^{tλ} [α + tR(α) + ⋯ + (t^{m−1}/(m−1)!) R^{m−1}(α)].

Note that the number of terms on the right is the degree of the factor (t − λ)^m in the polynomial p(t). In the general situation where W = ⊕₁ᵏ Wᵢ, we have α = Σ₁ᵏ αᵢ, e^{tT}(α) = Σ₁ᵏ e^{tT}(αᵢ), and each e^{tT}(αᵢ) is of the above form. The solution of f′(t) = T(f(t)) through the general point ⟨0, α⟩ is thus a finite sum of terms of the form tʲe^{tλᵢ}β_{ij}, the number of terms being the degree of the polynomial p.

If W is a complex Banach space, then the restriction that p have only real roots is superfluous. We get exactly the same formula but with complex values of λ. This introduces more variety into the behavior of the solution curves, since an outside exponential factor e^{tλ} = e^{tμ}e^{itν} now has a periodic factor if ν ≠ 0. Altogether we have proved the following theorem.

Theorem 3.4. If W is a real or complex Banach space and T ∈ Hom W, then the solution curve in W of the initial-value problem f′(t) = T(f(t)), f(0) = β, is

f(t) = e^{tT}β = Σ_{j=0}^∞ (tʲ/j!) Tʲ(β).

If T satisfies a polynomial equation (T − λ)^m = 0, then

f(t) = e^{tλ} [β + tR(β) + ⋯ + (t^{m−1}/(m−1)!) R^{m−1}(β)],

where R = T − λI. If T satisfies a polynomial equation p(T) = 0 and p has the relatively prime factorization p(x) = ∏₁ᵏ (x − λᵢ)^{mᵢ}, then f(t) is a sum of k terms of the above type, and so has the form

f(t) = Σ_{i,j} tʲe^{tλᵢ}β_{ij},

where the number of terms on the right is the degree of the polynomial p, and each β_{ij} is a fixed (constant) vector.
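A sketch (not from the text; the example matrix is mine) checking the finite form of Theorem 3.4 on a 3 × 3 example with (T − λI)³ = 0:

    # When R = T - lam*I is nilpotent the exponential series terminates:
    # e^{tT} = e^{lam t}(I + tR + t^2 R^2/2).  Check on a Jordan block.
    import numpy as np
    from scipy.linalg import expm

    lam, t = 2.0, 0.9
    T = lam * np.eye(3) + np.diag([1.0, 1.0], k=1)   # Jordan block
    R = T - lam * np.eye(3)                          # nilpotent, R^3 = 0
    finite = np.exp(lam * t) * (np.eye(3) + t * R + (t**2 / 2) * (R @ R))
    print(np.max(np.abs(finite - expm(t * T))))      # ~ 1e-15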
It is important to notice how the asymptotic behavior of f(t) as t → +∞ is controlled by the polynomial roots λᵢ. We first restrict ourselves to the solution through a vector α in one of the subspaces Wᵢ, which amounts to supposing that (T − λ)^m = 0. Then if λ has a positive real part, so that e^{tλ} = e^{tμ}e^{itν} with μ > 0, then ‖f(t)‖ → ∞ in exponential fashion. If λ has a negative real part, then f(t) approaches zero as t → ∞ (but its norm becomes infinite exponentially fast as t → −∞). If the real part of λ is zero, then ‖f(t)‖ → ∞ like t^{m−1} if m > 1. Thus the only way for f to be bounded on the whole of ℝ is for the real part of λ to be zero and m = 1, in which case f is periodic. Similarly, in the general case where p(T) = ∏ₙ (T − λₙ)^{mₙ} = 0, it will be true that all the solution curves are bounded on the whole of ℝ if and only if the roots λₙ are all pure imaginary and all the multiplicities mₙ are 1.

EXERCISES

3.1 Let I be an open interval in ℝ, and let W be a normed linear space. Let F(t, α) be a continuous function from I × W to W which is linear in α for each fixed t. Prove that there is a function c(t) which is bounded on every closed interval [a, b] included in I and such that ‖F(t, α)‖ ≤ c(t)‖α‖ for all α and t. Then show that c can be made continuous. (You may want to use the Heine-Borel property: if [a, b] is covered by a collection of open intervals, then some finite subcollection already covers [a, b].)

3.2 In the text we omitted checking that f ↦ f^(n) − G(t, f, f′, …, f^(n−1)) is surjective from Xⁿ to X⁰. Prove that this is so by tracking down the surjectivity through the reduction to a first-order system.

3.3 Suppose that the coefficients aᵢ(t) in the operator

Tf = Σ_{i=0}^n aᵢ f^(i)

are all themselves in C¹. Show that the null space N of T is a subspace of C^{n+1}. State a generalization of this theorem and indicate roughly why it is true.

3.4 Suppose that W is a Banach space, T ∈ Hom W, and β is an eigenvector of T with eigenvalue r. Show that the solution of the constant coefficient equation dα/dt = T(α) through ⟨0, β⟩ is f(t) = e^{rt}β.

3.5 Suppose next that W is finite-dimensional and has a basis {βᵢ}₁ⁿ consisting of eigenvectors of T, with corresponding eigenvalues rᵢ. Find a formula for the solution through ⟨0, α⟩ in terms of the basis expansion of α.

3.6 A very important special case of the linear equation dα/dt = T_t(α) is when the operator function T_t is periodic. Suppose, for example, that T_{t+1} = T_t for all t. Show that then K_{t+n} = K_t ∘ (K₁)ⁿ for all t and n. Assume next that K₁ has a logarithm, and so can be written K₁ = e^A for some A in Hom W. (We know from Exercise 11.19 of Chapter 4 that this is always possible if W is finite-dimensional.) Show that now K_t can be written in the form K_t = B(t)e^{tA}, where B(t) is periodic with period 1.
3.7 Continuing the above exercise, suppose now that W is a finite-dimensional complex vector space. Using the analysis of e^{tA}β given in the text, show that the differential equation dα/dt = T_t(α) has a periodic solution (with any period) only if K₁ has an eigenvalue of absolute value 1. Show also that if K₁ has an nth root of unity as an eigenvalue, then the differential equation has a periodic solution with period n.

3.8 Write out the special form that the formula of Theorem 3.3 takes in the constant coefficient situation.

3.9 It is interesting to look at the facts of Theorem 3.1 from the point of view of Theorem 5.3 of Chapter 1. Assume that S: X¹ → X⁰ is surjective and that its null space N is isomorphic to W under the coordinate (evaluation) map π_{t₀}. Prove that if M is the null space of π_{t₀} in X¹, then S↾M is an isomorphism onto X⁰ by applying this theorem.

4. THE nTH-ORDER LINEAR EQUATION

The nth-order linear differential equation is the equation

dⁿα/dtⁿ = G(t, α, dα/dt, …, d^{n−1}α/dt^{n−1}),

where G(t, α) = G(t, α₁, …, αₙ) is now linear from V = Wⁿ to W for each t in I. We convert this to a first-order equation dα/dt = F(t, α) just as before, where now F is a map from I × V to V that is linear in its second variable α, F(t, α) = T_t(α). Our proof of Theorem 1.5 showed that a function f in Cⁿ(I, W) is a solution of the nth-order equation dⁿα/dtⁿ = G(t, α, …, d^{n−1}α/dt^{n−1}) if and only if the n-tuple ψ_f = ⟨f, f′, …, f^(n−1)⟩ is a solution of the first-order equation dα/dt = F(t, α) = T_t(α). We know that the latter solutions form a vector subspace N of C¹(I, Wⁿ), and since the map ψ: f ↦ ⟨f, f′, …, f^(n−1)⟩ is linear from Cⁿ(I, W) to C¹(I, Wⁿ), it follows that the set 𝒩 of solutions of the nth-order equation is a subspace of Cⁿ(I, W) and ψ↾𝒩 is an isomorphism from 𝒩 to N. Since the coordinate evaluation φ_t = π_t↾N is an isomorphism from N to Wⁿ for each t (Theorem 3.1), it follows that the map

π_t ∘ ψ: f ↦ ⟨f(t), f′(t), …, f^(n−1)(t)⟩

takes 𝒩 isomorphically to Wⁿ. Its null space M_t is a complement of 𝒩 in Cⁿ, as before. Here M_t is the set of functions f in Cⁿ(I, W) such that f(t) = ⋯ = f^(n−1)(t) = 0.

We now consider the special case W = ℝ. For each fixed t, G is now a linear map from ℝⁿ to ℝ, that is, an element of (ℝⁿ)*, and its coordinate set with respect to the standard basis is an n-tuple k = ⟨k₁, …, kₙ⟩. Since the linear map varies continuously with t, the n-tuple k varies continuously with t. Thus, when we take t into account, we have an n-tuple k(t) = ⟨k₁(t), …, kₙ(t)⟩ of continuous real-valued functions on I such that

G(t, x₁, …, xₙ) = Σ_{i=1}^n kᵢ(t)xᵢ.
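In matrix terms, T_t is the companion matrix built from the coefficient n-tuple k(t). A minimal sketch of this reduction for W = ℝ (not from the text; the helper name and test equation are mine):

    # nth-order scalar equation as the first-order system da/dt = T_t(a).
    import numpy as np
    from scipy.integrate import solve_ivp

    def companion(k):
        """Matrix with superdiagonal 1's and last row k = (k_1, ..., k_n)."""
        n = len(k)
        T = np.zeros((n, n))
        T[:-1, 1:] = np.eye(n - 1)
        T[-1, :] = k
        return T

    # x'' = -x (so k = (-1, 0)); the solution through <0, <1, 0>> is cos t.
    T = companion([-1.0, 0.0])
    sol = solve_ivp(lambda t, a: T @ a, (0.0, np.pi), [1.0, 0.0], rtol=1e-9)
    print(sol.y[0, -1])   # close to cos(pi) = -1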
The solution space 𝒩 of the nth-order differential equation

dⁿα/dtⁿ = G(t, α, …, d^{n−1}α/dt^{n−1})

is just the null space of the linear transformation L: Cⁿ(I, ℝ) → C⁰(I, ℝ) defined by

(Lf)(t) = f^(n)(t) − Σ_{i=1}^n kᵢ(t)f^(i−1)(t).

If we shift indices to coincide with the order of the derivative, and if we let f^(n) also have a coefficient function, then our nth-order linear differential operator L appears as

(Lf)(t) = aₙ(t)f^(n)(t) + ⋯ + a₀(t)f(t).

Giving f^(n) a coefficient function aₙ changes nothing provided aₙ(t) is never zero, since then it can be divided out to give the form we have studied. This is called the regular case. The singular case, where aₙ(t) is zero for some t, requires further study, and we shall not go into it here.

We recapitulate what our general linear theory tells us about this situation.

Theorem 4.1. L is a surjective linear transformation from the space Cⁿ(I) of all real-valued functions on I having continuous derivatives through order n to the space C⁰(I) = C(I) of continuous functions on I. Its null space 𝒩 is the solution space of our original differential equation. For each t₀ in I the restriction to 𝒩 of the mapping φ_{t₀} ∘ ψ: f ↦ ⟨f(t₀), …, f^(n−1)(t₀)⟩ is an isomorphism from 𝒩 to ℝⁿ, and the set M_{t₀} of functions f in Cⁿ such that f(t₀) = ⋯ = f^(n−1)(t₀) = 0 is therefore a complement of 𝒩 in Cⁿ(I), and determines a linear right inverse of L.

The practical problem of "solving" the differential equation L(f) = g for f when g is given falls into two parts. First we have to find the null space 𝒩 of L; that is, we have to solve the homogeneous equation L(f) = 0. Since 𝒩 is an n-dimensional vector space, the problem of delineating it is equivalent to finding a basis, and this is clearly the efficient way to proceed. Our first problem therefore is to find n linearly independent solutions {uᵢ}₁ⁿ of L(f) = 0. Our second problem is to find a right inverse of L, that is, a linear way of picking one f such that L(f) = g for each g. Here the obvious thing to do is to try to make the formula of Theorem 3.3 into a practical computation. If v is one solution of L(f) = g, then of course the set of all solutions is the affine subspace 𝒩 + v.

We shall start with the first problem, that of finding a basis {uᵢ}₁ⁿ of solutions to L(f) = 0. Unfortunately, there is no general method available, and we have to be content with partial success. We shall see that we can easily solve the first-order equation directly, and that if we can find one solution of the nth-order equation, then we can reduce the problem to solving an equation of order n − 1. Moreover, in the very important special case of an operator L with constant coefficients, Theorem 3.4 gives a complete explicit solution.

The first-order homogeneous linear equation can be written in the form y′ + a(t)y = 0, where the coefficient of y′ has been divided out. Dividing by y
and remembering that y′/y = (log y)′, we see that, formally at least, a solution is given by log y = −∫a(t) dt, or y = e^{−∫a(t)dt}, and we can check it by inspection. Thus the equation y′ + y/t = 0 has a solution y = e^{−log t} = 1/t, as the reader might have noticed directly.

Suppose now that L is an nth-order operator and that we know one solution u of Lf = 0. Our problem then is to find n − 1 solutions v₁, …, v_{n−1} independent of each other and of u. It might even be reasonable to guess that these could be determined as solutions of an equation of order n − 1. We try to find a second solution v(t) in the form c(t)u(t), where c(t) is an unknown function. Our motivation, in part, is that such a solution would automatically be independent of u unless c(t) turns out to be a constant. Now if v(t) = c(t)u(t), then v′ = cu′ + c′u, and generally

v^(j) = Σ_{i=0}^j (j choose i) c^(i)u^(j−i).

If we write down L(v) = Σ₀ⁿ aⱼ(t)v^(j)(t) and collect those terms involving c(t), we get

L(v) = c(t) Σ₀ⁿ aⱼu^(j) + terms involving c′, …, c^(n) = cL(u) + S(c′) = S(c′),

where S is a certain linear differential operator of order n − 1 which can be explicitly computed from the above formulas. We claim that solving S(f) = 0 solves our original problem. For suppose that {gᵢ}₁^{n−1} is a basis for the null space of S, and set Cᵢ(t) = ∫ gᵢ. Then

L(Cᵢu) = S(Cᵢ′) = S(gᵢ) = 0 for i = 1, …, n − 1.

Moreover, u, C₁u, …, C_{n−1}u are independent, for if u = Σ₁^{n−1} kᵢCᵢu, then 1 = Σ₁^{n−1} kᵢCᵢ(t) and 0 = Σ₁^{n−1} kᵢCᵢ′(t) = Σ₁^{n−1} kᵢgᵢ(t), contradicting the independence of the set {gᵢ}.

We have thus shown that if we can find one solution u of the nth-order equation Lf = 0, then its complete solution is reduced to solving an equation Sf = 0 of order n − 1 (although our independence argument was a little sketchy).

This reduction procedure does not combine with the solution of the first-order equation to build up a sequence of independent solutions of the nth-order equation because, roughly speaking, it "works off the top instead of off the bottom". For the combination to be successful, we would have to be able to find from a given nth-order operator L a first-order operator S such that N(S) ⊂ N(L), and we can't do this in general. However, we can do it when the coefficient functions in L are all constants, although we shall in fact proceed differently.

Meanwhile it is valuable to note that a second-order equation Lf = 0 can be solved completely if we can find one solution u, since the above argument reduces the remaining problem to a first-order equation which can then be solved by an integration, as we saw earlier. Consider, for instance, the equation y″ − 2y/t² = 0 over any interval I not containing 0, so that the coefficient a₀(t) = −2/t² is continuous on I. We see by inspection that u(t) = t² is one solution. Then we
know that we can find a solution v(t) independent of u(t) in the form v(t) = t²c(t) and that the problem will become a first-order problem for c′. We have, in fact, v′ = t²c′ + 2tc and v″ = t²c″ + 4tc′ + 2c, so that

L(v) = v″ − 2v/t² = t²c″ + 4tc′,

and L(v) = 0 if and only if (c′)′ + (4/t)c′ = 0. Thus

c′ = e^{−∫(4/t)dt} = e^{−4 log t} = 1/t⁴, c = 1/t³

(to within a scalar multiple; we only want a basis!), and v = t²c(t) = 1/t. (The reader may wish to check that this is the promised solution.) The null space of the operator L(f) = f″ − 2f/t² is thus the linear span of {t², 1/t}.

We now turn to an important tractable case, the differential operator

Lf = aₙf^(n) + a_{n−1}f^(n−1) + ⋯ + a₀f,

where the coefficients aᵢ are constants and aₙ might as well be taken to be 1. What makes this case accessible is that now L is a polynomial in the derivative operator D. That is, if Df = f′, so that Dʲf = f^(j), then L = p(D), where p(x) = Σ₀ⁿ aᵢxⁱ. The most elegant, but not the most elementary, way to handle this equation is to go over to the equivalent first-order system dx/dt = T(x) on ℝⁿ and to apply the relevant theory from the last section.

Theorem 4.2. If p(t) = (t − b)ⁿ, then the solution space 𝒩 of the constant coefficient nth-order equation p(D)f = 0 has the basis {e^{bt}, te^{bt}, …, t^{n−1}e^{bt}}. If p(t) is a polynomial which has a relatively prime factorization p(t) = ∏₁ᵏ pᵢ(t) with each pᵢ(t) of the above form, then the solution space of the constant coefficient equation p(D)f = 0 has the basis ∪Bᵢ, where Bᵢ is the above basis for the solution space 𝒩ᵢ of pᵢ(D)f = 0.

Proof. We know that the mapping ψ: f ↦ ⟨f, f′, …, f^(n−1)⟩ is an isomorphism from the null space 𝒩 of p(D) to the null space N of dx/dt − T(x). It is clear that ψ commutes with differentiation, ψ(Df) = ⟨f′, …, f^(n)⟩ = Dψ(f), and since we know that N is invariant under D by Lemma 3.1, it follows (and can easily be checked directly) that 𝒩 is invariant under D. By Lemma 3.2 we have T = φ_t ∘ D ∘ φ_t⁻¹, which simply says that the isomorphism φ_t: N → ℝⁿ takes the operator D on N into the operator T on ℝⁿ. Altogether φ_t ∘ ψ takes D on 𝒩 into T on ℝⁿ, and since p(D) = 0 on 𝒩, it follows that p(T) = 0 on ℝⁿ.

We saw in Theorem 3.4 that if p(T) = 0 and p = (t − b)ⁿ, then the solution space N of dx/dt = T(x) is spanned by vectors of the form

e^{bt}[α + tR(α) + ⋯ + (t^{n−1}/(n−1)!) R^{n−1}(α)].

The first coordinates of the n-tuple-valued functions g in N form the space 𝒩 (under the isomorphism f = ψ⁻¹g), and we therefore see that 𝒩 is spanned by the functions e^{bt}, …, t^{n−1}e^{bt}. Since 𝒩 is n-dimensional, and since there are n of these functions, the spanning set forms a basis.
The remainder of the theorem can be viewed as the combination of the above and the direct application of Theorem 5.5 of Chapter 1 to the equation p(D) = 0 on 𝒩, or as the carry-over to 𝒩 under the isomorphism ψ⁻¹ of the facts already established for N in the last section. □

If the roots of the polynomial p are not all real, then we have to resort to the complexification theory that we developed in the exercises of Section 11, Chapter 4. Except for one final step, the results are the same. The one extra fact that has to be applied is that the null space of a real operator T acting on a real vector space Y is exactly the intersection with Y of the null space of the complexification S of T acting on the complexification Z = Y ⊕ iY of Y. This implies that if p(t) is a polynomial with real coefficients, then we get the real solutions of p(D)f = 0 as the real parts of the complex solutions.

In order to see exactly what this means, suppose that q(x) = (x² − 2bx + c)^m is one of the relatively prime factors of p(x) over ℝ, with x² − 2bx + c irreducible over ℝ. Over ℂ, q(x) factors into (x − λ)^m(x − λ̄)^m, where λ = b + iω and ω² = c − b². It follows from our general theory above that the complex 2m-dimensional null space of q(D) is the complex span of

{e^{λt}, te^{λt}, …, t^{m−1}e^{λt}, e^{λ̄t}, te^{λ̄t}, …, t^{m−1}e^{λ̄t}}.

The real parts of the complex linear combinations of these 2m functions form a 2m-dimensional real vector space spanned by the real parts of the above functions and the real parts of i times the above functions. That is, the null space of the real operator q(D) is a 2m-dimensional real space spanned by

{e^{bt} cos ωt, te^{bt} cos ωt, …, t^{m−1}e^{bt} cos ωt; e^{bt} sin ωt, …, t^{m−1}e^{bt} sin ωt}.

Since there are 2m of these functions, they must be independent and must form a basis for the real solution space of q(D)f = 0. Thus:

Theorem 4.3. If p(t) = (t² − 2bt + c)^m and b² < c, then the solution space of the constant coefficient 2mth-order equation p(D)f = 0 has the basis

{tⁱe^{bt} cos ωt}_{i=0}^{m−1} ∪ {tⁱe^{bt} sin ωt}_{i=0}^{m−1},

where ω² = c − b². For any polynomial p(t) with real coefficients, if p(t) = ∏₁ᵏ pᵢ(t) is its relatively prime factorization into powers of linear factors and powers of irreducible quadratic factors, then the solution space 𝒩 of p(D)f = 0 has the basis ∪₁ᵏ Bᵢ, where Bᵢ is the basis for the null space of pᵢ(D) that we displayed above if pᵢ(t) is a power of an irreducible quadratic, and Bᵢ is the basis of Theorem 4.2 if pᵢ(t) is a power of a linear factor.

Suppose, for example, that we want to find a basis for the null space of D⁴ − 1. Here p(x) = x⁴ − 1 = (x − 1)(x + 1)(x − i)(x + i). The basis for the complex solution space is therefore {eᵗ, e⁻ᵗ, e^{it}, e^{−it}}. Since e^{it} = cos t + i sin t, the basis for the real solution space is {eᵗ, e⁻ᵗ, cos t, sin t}.
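Both kinds of basis can be checked mechanically. The following sketch (not from the text; it assumes sympy) verifies the D⁴ − 1 basis just found and the repeated-root basis of Theorem 4.2 for (D − b)³:

    # Each candidate function is annihilated by the corresponding p(D).
    import sympy as sp

    t, b = sp.symbols('t b')

    # p(D) = D^4 - 1: real basis {e^t, e^{-t}, cos t, sin t}
    for u in (sp.exp(t), sp.exp(-t), sp.cos(t), sp.sin(t)):
        assert sp.simplify(sp.diff(u, t, 4) - u) == 0

    # p(D) = (D - b)^3 expanded: D^3 - 3b D^2 + 3b^2 D - b^3
    D = lambda f: sp.diff(f, t)
    pD = lambda f: D(D(D(f))) - 3*b*D(D(f)) + 3*b**2*D(f) - b**3*f
    for u in (sp.exp(b*t), t*sp.exp(b*t), t**2*sp.exp(b*t)):
        assert sp.simplify(pD(u)) == 0
    print('both bases check out')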
The same problem for D³ − 1 = 0 gives us

p(x) = x³ − 1 = (x − 1)(x² + x + 1) = (x − 1)(x + (1 + i√3)/2)(x + (1 − i√3)/2),

so that the basis for the complex solution space is {eᵗ, e^{−[(1+i√3)/2]t}, e^{−[(1−i√3)/2]t}} and the basis for the real solution space is {eᵗ, e^{−t/2} cos (√3 t/2), e^{−t/2} sin (√3 t/2)}.

*Our results above suggest that the collection 𝒜 of all real-valued solutions of constant coefficient homogeneous linear differential equations contains the functions tⁱ, e^{rt}, cos ωt, sin ωt for all i, r, and ω, is closed under addition and multiplication, and is in fact the algebra generated by these functions. We can easily prove this conjecture.

We first consider sums. Suppose that T(f) = 0 and that S(g) = 0, where S and T are two such constant coefficient operators. Then f + g is in the null space of S ∘ T because S and T commute:

(S ∘ T)(f + g) = (S ∘ T)(f) + (S ∘ T)(g) = S(Tf) + T(Sg) = 0 + 0 = 0.

We know that S and T commute because they are both polynomials in D.

In order to treat products, we first have to recognize that the linear span of all the trigonometric functions sin at, cos bt is an algebra. In other words, any finite product of such functions is a linear combination of such functions. This is the role of a certain class of trigonometric identities, such as

2 sin x cos y = sin (x + y) + sin (x − y),

which the reader has undoubtedly had to struggle with. (And again the mystery disappears when we are allowed to treat them as complex exponentials.) Then we observe that any function in the algebra 𝒜 is a finite sum of terms each of which is of the form tⁱe^{rt} sin ωt or tⁱe^{rt} cos ωt for some i, r, and ω. We can exhibit an operator T having such a function in its null space, and our finite sum of such terms will then be in the null space of the composition of these operators T, by our first argument.

We are tempted to say one more thing. The functions tⁱ, e^{rt}, sin ωt, cos ωt, and sums of their products can be shown to be exactly the continuous functions f: ℝ → ℝ such that the set of translates of f has a finite-dimensional span. That is, if we define translation through x, K_x, by (K_x f)(t) = f(t − x), then for exactly the above functions f the linear span of {K_x f : x ∈ ℝ} is finite-dimensional. This second characterization of exactly the same class of functions cannot be accidental. Part of the secret lies in the fact that the constant coefficient operators T are exactly those linear differential operators that commute with translation. That is, if T is a linear differential operator, then T ∘ K_x = K_x ∘ T for all x if and only if T has constant coefficients. Now we have noted in an early chapter that if T ∘ S = S ∘ T, then the null space of T is invariant under S. Therefore, the null space N of a constant coefficient operator T is invariant under all translations: K_x[N] ⊂ N for all x. Now we know that N is finite-dimensional
from our differential equation theory. Therefore, the functions in N are such that their translates have a finite-dimensional span!

This device of gaining additional information about the null space N of a linear operator T by finding operators S that commute with T, so that N is S-invariant, is much used in advanced mathematics. It is especially important when we have a group of commuting operators S, as we do in the above case with the operators S = K_x. What we have not shown is that if a continuous function f is such that its translates generate a finite-dimensional vector space, then f is in the null space of some constant coefficient operator p(D). This is delicate, and it depends on showing that if {K_t} is a one-parameter family of linear transformations on a finite-dimensional space such that K_{s+t} = K_s ∘ K_t and K_t → I as t → 0, then there is an S in Hom V such that K_t = e^{tS}.*

EXERCISES

Find solutions for the following equations.

4.1 x″ − 3x′ + 2x = 0
4.2 x″ + 2x′ − 3x = 0
4.3 x″ + 2x′ + 3x = 0
4.4 x″ + 2x′ + x = 0
4.5 x‴ − 3x″ + 3x′ − x = 0
4.6 x‴ − x = 0
4.7 x^(6) − x″ = 0
4.8 x‴ = 0
4.9 x‴ − x″ = 0

4.10 Solve the initial-value problem x″ + 4x′ − 5x = 0, x(0) = 1, x′(0) = 2.

4.11 Solve the initial-value problem x‴ + x′ = 0, x(0) = 0, x′(0) = −1, x″(0) = 1.

4.12 Find one solution u of the equation 4t²x″ + x = 0 by trying u(t) = tⁿ, and then find a second solution as in the text by setting v(t) = c(t)u(t).

4.13 Solve t³x‴ − 3tx′ + 3x = 0 by trying u(t) = tⁿ.

4.14 Solve tx″ + x′ = 0.

4.15 Solve t(x‴ + x′) + 2(x″ + x) = 0.

4.16 Knowing that e^{−bt} cos ωt and e^{−bt} sin ωt are solutions of a second-order linear differential equation, and observing that their values at 0 are 1 and 0, we know that they are independent. Why?

4.17 Find constant coefficient differential equations of which the following functions are solutions: t², sin t, t² sin t.

4.18 If f and g are independent solutions of a second-order linear differential equation u″ + a₁u′ + a₂u = 0 with continuous coefficient functions, then we know that the vectors ⟨f(x), f′(x)⟩ and ⟨g(x), g′(x)⟩ are independent at every point x. Show conversely that if two functions have this latter property, then they are solutions of a second-order differential equation.

4.19 Solve the equation (D − a)³f = 0 by applying the order-reducing procedure discussed in the text starting with the obvious solution e^{at}.
5. SOLVING THE INHOMOGENEOUS EQUATION

We come now to the problem of solving the inhomogeneous equation L(f) = g. We shall briefly describe a practical method which works easily some of the time and a theoretical method which works all the time, but which may be hard to apply. The latter is just the translation of Theorem 3.3 into matrix language.

We first consider the constant coefficient equation L(f) = g in the special case where g itself is in the null space of a constant coefficient operator S. A simple example is y′ − ay = e^{bt} (or y′ − ay = sin bt), where g(t) = e^{bt} is in the null space of S = (D − b). In such a situation a solution f must be in the null space of S ∘ L, for S ∘ L(f) = S(g) = 0. We know what all these functions are, and our problem is to select f among them such that L(f) is the given g.

For the moment suppose that the polynomials L and S (polynomials in D) have no factors in common. Then we know that L is an isomorphism on the null space N_S of S and therefore that there exists an f in N_S such that Lf = g. Since we have a basis for N_S, we could construct the matrix for the action of L on N_S and find f by solving a matrix equation, but the simplest thing to do is take a general linear combination of the basis, with unknown coefficients, let L act on it, and see what the coefficients must be to give g. For example, to solve y′ − ay = e^{bt}, we try f(t) = ce^{bt} and apply L:

(D − a)(ce^{bt}) = (b − a)ce^{bt},

and setting this equal to e^{bt} we see that c = 1/(b − a). Again, to solve y′ − ay = cos bt, we observe that cos bt is in the null space of S = D² + b² and that this null space has the basis {sin bt, cos bt}. We therefore set f(t) = c₁ sin bt + c₂ cos bt and solve (D − a)f = cos bt, getting

(−ac₁ − bc₂) sin bt + (bc₁ − ac₂) cos bt = cos bt,
−ac₁ − bc₂ = 0, bc₁ − ac₂ = 1,

and

f(t) = (b/(a² + b²)) sin bt − (a/(a² + b²)) cos bt.

When L and S do have factors in common, the situation is more complicated, but a similar procedure can be proved to work. Now an extra factor tⁱ must be introduced, where i is the number of occurrences of the common factor in L. For example, in solving (D − r)²f = e^{rt}, we have S ∘ L = (D − r)³, and so we must set f(t) = ct²e^{rt}. Our equation then becomes

(D − r)²(ct²e^{rt}) = 2ce^{rt} = e^{rt},

and so c = ½. For (D² + 1)f = sin t we have to set f(t) = t(c₁ sin t + c₂ cos t), and after we work it out we find that c₁ = 0 and c₂ = −½, so that f = −(t/2) cos t.
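These two computations are easy to verify symbolically (a sketch, not from the text, assuming sympy):

    # Verify f = t^2 e^{rt}/2 for (D - r)^2 f = e^{rt},
    # and f = -(t/2) cos t for (D^2 + 1) f = sin t.
    import sympy as sp

    t, r = sp.symbols('t r')

    f1 = t**2 * sp.exp(r*t) / 2
    lhs1 = sp.diff(f1, t, 2) - 2*r*sp.diff(f1, t) + r**2*f1    # (D - r)^2 f1
    assert sp.simplify(lhs1 - sp.exp(r*t)) == 0

    f2 = -t * sp.cos(t) / 2
    assert sp.simplify(sp.diff(f2, t, 2) + f2 - sp.sin(t)) == 0
    print('particular solutions verified')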
This procedure, called, naturally, the method of undetermined coefficients, violates our philosophy about a solution process being a linear right inverse. Indeed, it is not a single process, applicable to any g occurring on the right, but varies with the operator S. However, when it is available, it is the easiest way to compute explicit solutions.

We describe next a general theoretical method, called variation of parameters, that is a right inverse to L and does therefore apply to every g. Moreover, it inverts the general (variable coefficient) linear nth-order operator L:

(Lf)(t) = Σ_{i=0}^n aᵢ(t)f^(i)(t).

We are assuming that we know the null space 𝒩 of L; that is, we assume known n linearly independent solutions {uᵢ}₁ⁿ of the homogeneous equation Lf = 0. What we are going to do is to translate into this context our formula

K_t ∫₀ᵗ K_s⁻¹(g(s)) ds

for the solution to dα/dt = T_t(α) + g(t).

Since ψ: f ↦ ⟨f, f′, …, f^(n−1)⟩ is an isomorphism from the solution space 𝒩 of the nth-order equation L(f) = 0 to the solution space N of the equivalent first-order system dx/dt = T_t(x), it follows that if we have a basis {uᵢ}₁ⁿ for 𝒩, then the columns of the matrix W_{ij} = u_j^(i−1) form a basis for N. Let W(t) be the matrix W_{ij}(t) = u_j^(i−1)(t). Since evaluation at t is the isomorphism φ_t from N to ℝⁿ, the columns of W(t) form a basis for ℝⁿ, for each t. But K_t(α) is the value at t of the solution of dx/dt = T_t(x) through the initial point ⟨0, α⟩, and it follows that the linear transformation K_t takes the columns of the matrix W(0) to the corresponding columns of W(t). The matrix for K_t is therefore W(t)·W(0)⁻¹, and the matrix form of our formula

f(t) = K_t ∫₀ᵗ (K_s)⁻¹(g(s)) ds

is therefore

f(t) = W(t)·W(0)⁻¹·∫₀ᵗ W(0)·W(s)⁻¹·g(s) ds.

Moreover, since integration commutes with the application of a constant linear transformation (here multiplication by a constant matrix), the middle W(0) factors cancel, and we have the result that

f(t) = W(t)·∫₀ᵗ W(s)⁻¹·g(s) ds

is the solution of dx/dt = T_t(x) + g(t) which passes through ⟨0, 0⟩. Finally, set k(s) = W(s)⁻¹·g(s), so that this solution formula splits into the pair

f(t) = W(t)·∫₀ᵗ k(s) ds, W(s)·k(s) = g(s).
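The matrix formula can be tried numerically. The sketch below (not from the text; the test equation x″ + x = 1 and the helper names are mine) evaluates f(t) = W(t)·∫₀ᵗ W(s)⁻¹·g(s) ds with basis u₁ = sin t, u₂ = cos t; the solution through ⟨0, 0⟩ should be 1 − cos t.

    # Variation of parameters in matrix form for x'' + x = 1.
    import numpy as np
    from scipy.integrate import quad

    def W(t):   # W_ij = u_j^{(i-1)}
        return np.array([[np.sin(t), np.cos(t)],
                         [np.cos(t), -np.sin(t)]])

    def k(s):   # k(s) = W(s)^{-1} g(s), with g = <0, g(s)> = <0, 1>
        return np.linalg.solve(W(s), np.array([0.0, 1.0]))

    t = 1.3
    integral = np.array([quad(lambda s, i=i: k(s)[i], 0.0, t)[0] for i in range(2)])
    f = W(t) @ integral
    print(f[0], 1.0 - np.cos(t))    # first component matches 1 - cos t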
Now we want to solve the inhomogeneous nth-order equation L(f) = g, and this means solving the first-order system with 𝐠 = ⟨0, …, 0, g⟩. Therefore, the second equation above is equivalent to

Σ_j W_{ij}(s)k_j(s) = 0, i < n,
Σ_j W_{nj}(s)k_j(s) = g(s).

Moreover, the solution f of the nth-order equation is the first component of the n-tuple solution 𝐟 (that is, f = ψ⁻¹𝐟), and so we end up with

f(t) = Σ_{j=1}^n W_{1j}(t) ∫₀ᵗ k_j(s) ds = Σ_j u_j(t)C_j(t),

where C_j(t) is the antiderivative ∫₀ᵗ k_j(s) ds. Any other antiderivative would do as well, since the difference between the two resulting formulas is of the form Σ_j a_j u_j(t), a solution of the homogeneous equation L(f) = 0. We have proved the following theorem.

Theorem 5.1. If {uᵢ(t)}₁ⁿ is a basis for the solution space of the homogeneous equation L(h) = 0, and if f(t) = Σ₁ⁿ Cᵢ(t)uᵢ(t), where the derivatives Cᵢ′(t) are determined as the solutions of the equations

Σᵢ Cᵢ′(t)uᵢ^(j)(t) = 0, j = 0, …, n − 2,
Σᵢ Cᵢ′(t)uᵢ^(n−1)(t) = g(t),

then L(f) = g.

We now consider a simple example of this method. The equation y″ + y = sec x has constant coefficients, and we can therefore easily find the null space of the homogeneous equation y″ + y = 0. A basis for it is {sin x, cos x}. But we can't use the method of undetermined coefficients, because sec x is not a solution of a constant coefficient equation. We therefore try for a solution

v(x) = c₁(x) sin x + c₂(x) cos x.

Our system of equations to be solved is

c₁′ sin x + c₂′ cos x = 0,
c₁′ cos x − c₂′ sin x = sec x.

Thus c₂′ = −c₁′ tan x and c₁′(cos x + sin x tan x) = sec x, giving

c₁′ = 1, c₁ = x,
c₂′ = −tan x, c₂ = log cos x,

and

v(x) = x sin x + (log cos x) cos x.

(Check it!)
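The requested check can also be done by machine (a sketch, not from the text, assuming sympy):

    # Confirm that v'' + v - sec x simplifies to zero on (-pi/2, pi/2).
    import sympy as sp

    x = sp.Symbol('x')
    v = x*sp.sin(x) + sp.log(sp.cos(x))*sp.cos(x)
    print(sp.simplify(sp.diff(v, x, 2) + v - sp.sec(x)))   # prints 0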
This is all we shall say about the process of finding solutions. In cases where everything works we have complete control of the solutions of L(f) = g, and we can then solve the initial-value problem. If L has order n, then we know that the null space 𝒩 is n-dimensional, and if for a given g the function v is one solution of the inhomogeneous equation L(f) = g, then the set of all solutions is the n-dimensional plane (affine subspace) M = 𝒩 + v. If we have found a basis {uᵢ}₁ⁿ for 𝒩, then every solution of L(f) = g is of the form f = Σ₁ⁿ cᵢuᵢ + v. The initial-value problem is the problem of finding f such that L(f) = g and

f(t₀) = α₁⁰, f′(t₀) = α₂⁰, …, f^(n−1)(t₀) = αₙ⁰,

where ⟨α₁⁰, …, αₙ⁰⟩ = α⁰ is the given initial value. We can now find this unique f by using these n conditions to determine the n coefficients cᵢ in f = Σ cᵢuᵢ + v. We get n equations in the n unknowns cᵢ. Our ability to solve this problem uniquely again comes back to the fact that the matrix W_{ij}(t₀) = u_j^(i−1)(t₀) is nonsingular, as did our success in carrying out the variation of parameters process.

We conclude this section by discussing a very simple and important example. When a perfectly elastic spring is stretched or compressed, it resists with a "restoring" force proportional to its deformation. If we picture a coiled spring lying along the x-axis, with one end fixed and the free end at the origin when undisturbed (Fig. 6.3), then when the coil is stretched a distance x (compression being negative stretching), the force it exerts is −cx, where c is a constant representing the stiffness, or elasticity, of the spring, and the minus sign shows that the force is in the direction opposite to the displacement. This is Hooke's law.

[Fig. 6.3]

Suppose that we attach a point mass m to the free end of the spring, pull the spring out to an initial position x₀ = a, and let go. The reader knows perfectly well that the system will then oscillate, and we want to describe its vibration explicitly. We disregard the mass of the spring itself (which amounts to adjusting m), and for the moment we suppose that friction is zero, so that the system will oscillate forever. Newton's law says that if the force F is applied to the mass m, then the particle will accelerate according to the equation

m d²x/dt² = F.

Here F = −cx, so the equation combining the laws of Newton and Hooke is

m d²x/dt² + cx = 0.
This is almost the simplest constant coefficient equation, and we know that the general solution is

x = c₁ sin Ωt + c₂ cos Ωt, where Ω = √(c/m).

Our initial condition was that x = a and x′ = 0 when t = 0. Thus c₂ = a and c₁ = 0, so x = a cos Ωt. The particle oscillates forever between x = −a and x = a. The maximum displacement a is called the amplitude A of the oscillation. The number of complete oscillations per unit time is called the frequency f, so f = Ω/2π = √c/(2π√m). This is the quantitative expression of the intuitively clear fact that the frequency will increase with the stiffness c and decrease as the mass m increases.

Other initial conditions are equally reasonable. We might consider the system originally at rest and strike it, so that we start with an initial velocity v and an initial displacement 0 at time t = 0. Now c₂ = 0 and x = c₁ sin Ωt. In order to evaluate c₁, we remember that dx/dt = v at t = 0, and since dx/dt = c₁Ω cos Ωt, we have v = c₁Ω and c₁ = v/Ω, the amplitude for this motion. In general, the initial condition would be x = a and x′ = v when t = 0, and the unique solution thus determined would involve both terms of the general solution, with amplitude to be calculated.

The situation is both more realistic and more interesting when friction is taken into account. Frictional resistance is ideally a force proportional to the velocity dx/dt but again with a negative sign, since its direction is opposite to that of the motion. Our new equation is thus

m d²x/dt² + k dx/dt + cx = 0,

and we know that the system will act in quite different ways depending on the relationship among the constants m, k, and c. The reader will be asked to explore these equations further in the exercises.

It is extraordinary that exactly the same equation governs a freely oscillating electric circuit. It is now written

L d²x/dt² + R dx/dt + x/C = 0,

where L, R, and C are the inductance, resistance, and capacitance of the circuit, respectively, and dx/dt is the current. However, the ordinary operation of such a circuit involves forced rather than free oscillation. An alternating (sinusoidal) voltage is applied as an extra, external, "force" term, and the equation is now

L d²x/dt² + R dx/dt + x/C = a sin ωt.

This shows the most interesting behavior of all. Using the method of undetermined coefficients, we find that the solution contains transient terms that die away, contributed by the homogeneous equation, and a permanent part of frequency ω/2π, arising from the inhomogeneous term a sin ωt. New phenomena called phase and resonance now appear, as the reader will discover in the exercises.
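A numerical sketch (not from the text; all constants are arbitrary sample values) of the damped driven equation, showing the transient dying away and the steady-state amplitude that undetermined coefficients predicts:

    # m x'' + k x' + c x = a sin(wt): transient decay and steady state.
    import numpy as np
    from scipy.integrate import solve_ivp

    m, k, c, a, w = 1.0, 0.4, 4.0, 1.0, 1.5

    def rhs(t, u):
        x, v = u
        return [v, (a*np.sin(w*t) - k*v - c*x) / m]

    sol = solve_ivp(rhs, (0.0, 60.0), [1.0, 0.0], max_step=0.01)
    late = sol.y[0][sol.t > 40.0]                 # steady-state portion
    print('observed amplitude ~', 0.5*(late.max() - late.min()))
    print('predicted a/sqrt((c - m w^2)^2 + (k w)^2) =',
          a/np.sqrt((c - m*w**2)**2 + (k*w)**2))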
EXERCISES

Find particular solutions of the following equations.

5.1 x″ − x = t⁴
5.2 x″ − x = sin t
5.3 x″ − x = sin t + t⁴
5.4 x″ + x = sin t
5.5 y″ − y′ = x²
5.6 y″ − y′ = eˣ (Here y′ = dy/dx.)

5.7 Consider the equation y″ + y = sec x that was solved in the text. To what interval I must we limit our discussion? Check that the particular solution found in the text is correct. Solve the initial-value problem for f″(x) + f(x) = sec x, f(0) = 1, f′(0) = −1.

Solve the following equations by variation of parameters.

5.8 x″ + x = tan t
5.9 x‴ + x′ = t
5.10 y″ + y = 1
5.11 y⁽⁴⁾ − y = cos x
5.12 y″ + 4y = sec 2x
5.13 y″ + 4y = sec x

5.14 Show that the general solution of the frictionless elastic equation m(d²x/dt²) + cx = 0 can be rewritten in the form A sin (Ωt − α). (Remember that sin (x − y) = sin x cos y − cos x sin y.) This type of motion along a line is called simple harmonic motion.

5.15 In the above exercise express A and α in terms of the initial values dx/dt = v and x = a when t = 0.

5.16 Consider now the freely vibrating system with friction taken into account, and therefore having the equation m(d²x/dt²) + k(dx/dt) + cx = 0, all coefficients being positive. Show that if k² < 4mc, then the system oscillates forever, but with amplitude decreasing exponentially. Determine the frequency of oscillation. Use Exercise 5.14 to simplify the solution, and sketch its graph.

5.17 Show that if the frictional force is sufficiently large (k² ≥ 4mc), then a freely vibrating system does not in fact vibrate. Taking the simplest case k² = 4mc, sketch the behavior of the system for the initial condition dx/dt = 0 and x = a when t = 0. Do the same for the initial condition dx/dt = v and x = 0 when t = 0.

5.18 Use the method of undetermined coefficients to find a particular solution of the equation of the driven electric circuit

L d²x/dt² + R dx/dt + x/C = a sin ωt.

Assuming that R > 0, show by a general argument that your particular solution is in fact the steady-state part (the part without exponential decay) of the general solution.

5.19 In the above exercise show that the "current" dx/dt for your solution can be written in the form

dx/dt = (a/√(R² + X²)) sin (ωt − α), where X = Lω − 1/ωC.

Here α is called the phase angle.

5.20 Continuing our discussion, show that the current flowing in the circuit will have a maximum amplitude when the frequency of the "impressed voltage" a sin ωt is 1/(2π√(LC)). This is the phenomenon of resonance. Show also that the current is in phase with the impressed voltage (i.e., that α = 0) if and only if X = 0.

5.21 What is the condition that the phase α be approximately 90°? −90°?

5.22 In the theory of a stable equilibrium point in a dynamical system we end up with two scalar products (ξ, η) and ⟨ξ, η⟩ on a finite-dimensional vector space V, the quadratic form q(ξ) = ½⟨ξ, ξ⟩ being the potential energy and p(ξ) = ½(ξ, ξ) being the kinetic energy. Now we know that dq_α(ξ) = ⟨α, ξ⟩, and similarly for p, and because of this fact it can be shown that the Lagrangian equations can be written

(d/dt)(dξ/dt, η) = −⟨ξ, η⟩.

Prove that a basis {βᵢ}₁ⁿ can be found for V such that this vector equation becomes the system of second-order equations

d²xᵢ/dt² = −λᵢxᵢ, i = 1, …, n,

where the constants λᵢ are positive. Show therefore that the motion of the system is the sum of n linearly independent simple harmonic motions.
6. THE BOUNDARY-VALUE PROBLEM

We now turn to a problem which seems to be like the initial-value problem but which turns out to be of a wholly different character. Suppose that T is a second-order operator, which we consider over a closed interval [a, b]. Some of the most important problems in physics require us to find solutions to T(f) = g such that f has given values at a and b, instead of f and f′ having given values at a single point t₀. This new problem is called a boundary-value problem, because {a, b} is the boundary of the domain I = [a, b].

The boundary-value problem, like the initial-value problem, breaks neatly into two subproblems if the set

M = {f ∈ C²([a, b]) : f(a) = f(b) = 0}

turns out to be a complement of the null space N of T. However, if the reader will consider this general question for a moment, he will realize that he doesn't have a clue to it from our initial-value development, and, in fact, wholly new tools have to be devised.

Our procedure will be to forget that we are trying to solve the boundary-value problem and instead to speculate on the nature of a linear differential
operator T from the point of view of scalar products and the theory of self-adjoint operators. That is, our present study of T will be by means of the scalar product (f, g) = ∫ₐᵇ f(t)g(t) dt, the general problem being the usual one of solving Tf = g by finding a right inverse S of T. Also, as usual, S may be determined by finding a complement M of N(T). Now, however, it turns out that if T is "formally self-adjoint", then suitable choices of M will make the associated right inverses S self-adjoint and compact, and the eigenvectors of S, computed as those solutions of the homogeneous equation Tf − λf = 0 which lie in M, then allow (relatively) the same easy handling of S, by virtue of Theorem 5.1 of Chapter 5, that they gave us earlier in the finite-dimensional situation.

We first consider the notion of "formal adjoint" for an nth-order linear differential operator T. The ordinary formula for integration by parts,

∫ₐᵇ f′g = fg|ₐᵇ − ∫ₐᵇ fg′,

allows the derivatives of f occurring in the scalar product (Tf, g) to be shifted one at a time to g. At the end, f is undifferentiated and g is acted on by a certain nth-order linear differential operator R. The endpoint evaluations, like the above fg|ₐᵇ, that accumulate step by step can be described as

B(f, g)|ₐᵇ = Σ_{0≤i+j<n} k_{ij}(x) f^(i)(x) g^(j)(x)|ₐᵇ,

where the coefficient functions k_{ij}(x) are linear combinations of the coefficient functions aᵢ(x) and their derivatives. Thus

(Tf, g) = (f, Rg) + B(f, g)|ₐᵇ.

The operator R is called the formal adjoint of T, and if R = T, we say that T is formally self-adjoint. Every application of the integration by parts formula introduces a sign change, and the reader may be able to see that the leading coefficient of R is (−1)ⁿ times the leading coefficient of T. Assuming this, we see that a necessary condition for formal self-adjointness is that n be even, so that R and T have the same first terms.

Supposing that T is formally self-adjoint, we seek a complement M of the null space N of T in Cⁿ([a, b]) with the further property that S, the associated right inverse of T, is self-adjoint as a mapping from the pre-Hilbert space C⁰([a, b]) to itself. Let us see what this further requirement amounts to. For any u, v ∈ C⁰, set f = Su and g = Sv, so that f and g are in M and u = Tf, v = Tg. Then

(u, Sv) = (Tf, g) = (f, Tg) + B(f, g)|ₐᵇ = (Su, v) + B(f, g)|ₐᵇ.

We thus have:

Lemma 6.1. If T is a formally self-adjoint differential operator and M is a complement of the null space of T, then the right inverse of T determined by M is self-adjoint if and only if

f, g ∈ M ⟹ B(f, g)|ₐᵇ = 0.
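The second-order computation carried out below can be previewed symbolically. This sketch (not from the text; it assumes sympy) checks that for Tf = (c₂f′)′ + c₀f the integrand (Tf)g − f(Tg) is the exact derivative of c₂(f′g − g′f), so that (Tf, g) − (f, Tg) is a pure boundary term:

    # (Tf)g - f(Tg) = d/dx [ c2 (f'g - g'f) ] for T f = (c2 f')' + c0 f.
    import sympy as sp

    x = sp.Symbol('x')
    f, g, c2, c0 = (sp.Function(n)(x) for n in ('f', 'g', 'c2', 'c0'))

    T = lambda u: sp.diff(c2*sp.diff(u, x), x) + c0*u
    boundary = c2*(sp.diff(f, x)*g - sp.diff(g, x)*f)
    assert sp.expand(T(f)*g - f*T(g) - sp.diff(boundary, x)) == 0
    print('difference is an exact derivative, as Lemma 6.2 asserts')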
From now on we shall consider only the second-order case. However, almost everything that we are going to do works perfectly well for the general case, the price of generality being only additional notational complexity.

We start by computing the formal adjoint of the second-order operator Tf = c₂f″ + c₁f′ + c₀f. We have

(Tf, g) = ∫c₂f″g + ∫c₁f′g + ∫c₀fg,
∫c₁f′g = c₁fg|ₐᵇ − ∫f(c₁g)′,
∫c₂f″g = c₂f′g|ₐᵇ − ∫f′(c₂g)′ = (c₂f′g − f(c₂g)′)|ₐᵇ + ∫f(c₂g)″,

giving

(f, Rg) = ∫f[(c₂g)″ − (c₁g)′ + c₀g]

and

B(f, g) = c₂(f′g − g′f) + (c₁ − c₂′)fg.

Thus Rg = c₂g″ + (2c₂′ − c₁)g′ + (c₂″ − c₁′ + c₀)g, and R = T if and only if 2c₂′ − c₁ = c₁ (and c₂″ − c₁′ = 0), that is, c₂′ = c₁. We have proved:

Lemma 6.2. The second-order differential operator T is formally self-adjoint if and only if

Tf = c₂f″ + c₂′f′ + c₀f = (c₂f′)′ + c₀f,

in which case B(f, g) = c₂(f′g − g′f).

A constant coefficient operator is thus formally self-adjoint if and only if c₁ = 0.

Supposing that T is formally self-adjoint, we now try to find a complement M of its null space N such that f, g ∈ M ⟹ B(f, g)|ₐᵇ = 0. Since N is two-dimensional, any complement M can be described as the intersection of the null spaces of two linear functionals l₁ and l₂ on C²([a, b]). For example, the "one-point" complement M_{t₀} that we had earlier in connection with the initial-value problem is the intersection of the null spaces of the two functionals l₁(f) = f(t₀) and l₂(f) = f′(t₀). Here, however, the vanishing of l₁ and l₂ for two functions f and g must imply that B(f, g)|ₐᵇ = c₂(f′g − g′f)|ₐᵇ = 0, and the functionals lᵢ(f) must therefore involve the values of f and f′ at a and at b. We would naturally guess, and it can be proved, that each of l₁ and l₂ must be of the form

l(f) = k₁f(a) + k₂f′(a) + k₃f(b) + k₄f′(b).

Our problem can therefore be restated as follows. We must find two linear functionals l₁ and l₂ of the above general form such that if M is the intersection
of their null spaces, then

a) M is a complement of N, and
b) f, g ∈ M ⇒ c₂(f′g − g′f)]_a^b = 0,

in which case we call the boundary condition l₁(f) = l₂(f) = 0 self-adjoint.

Lemma 6.3. We can replace (a) by

a′) T is injective on M.

Proof. If T is injective on M, then M ∩ N = {0}, so that the map f ↦ ⟨l₁(f), l₂(f)⟩ is injective on N, and therefore, because N is two-dimensional, is an isomorphism from N to ℝ² (by the corollary of Theorem 2.4, Chapter 2). Then M is a complement of N by Theorem 5.3 of Chapter 1. □

Now we can easily write down various pairs l₁ and l₂ that form a self-adjoint boundary condition. We list some below.

1) f ∈ M ⇔ f(a) = f(b) = 0 [that is, l₁(f) = f(a) and l₂(f) = f(b)].
2) f ∈ M ⇔ f′(a) = f′(b) = 0.
3) More generally, f′(a) = bf(a), f′(b) = cf(b). (In fact, l₁ can be any l that depends only on the values at a, and l₂ can be any l that depends only on b. Thus l₁(f) = k₁f(a) + k₂f′(a), and if l₁(f) = l₁(g) = 0, then the pairs ⟨f(a), f′(a)⟩ and ⟨g(a), g′(a)⟩ are dependent, since both lie in the one-dimensional null space of l₁, and so f′g − g′f = 0 at a. The same holds for l₂ and b, so that this split pair of endpoint conditions makes B(f, g)]_a^b = 0 by making the values of B at a and at b separately 0.)
4) If c₂(a) = c₂(b), then take f ∈ M ⇔ f(a) = f(b) and f′(a) = f′(b). That is, l₁(f) = f(a) − f(b) and l₂(f) = f′(a) − f′(b).

We now show that in every case but (3) the condition (a′) also holds if we replace T by T − λ for a suitable λ. This is true also for case (3), but the proof is harder, and we shall omit it.

Lemma 6.4. Suppose that M is defined by one of the self-adjoint boundary conditions (1), (2), or (4) above, that c₂(t) ≥ m > 0 on [a, b], and that λ ≥ c₀(t) + 1 on [a, b]. Then

|((T − λ)f, f)| ≥ m||f′||₂² + ||f||₂²

for all f ∈ M. In particular, M is a complement of the null space of T − λ and hence defines a self-adjoint right inverse of T − λ.

Proof. We have

((λ − T)f, f) = −∫_a^b (c₂f′)′f + ∫_a^b (λ − c₀)f² = −c₂f′f]_a^b + ∫_a^b c₂(f′)² + ∫_a^b (λ − c₀)f².
Under any of conditions (1), (2), or (4), c₂f′f]_a^b = 0, and the two integral terms are clearly bounded below by m||f′||₂² and ||f||₂², respectively. Lemma 6.3 then implies that M is a complement of the null space of T − λ. □

We come now to our main theorem. It says that the right inverse S of T − λ determined by the subspace M above is a compact self-adjoint mapping of the pre-Hilbert space C⁰([a, b]) into itself, and is therefore endowed with all the rich eigenvalue structure of Theorem 5.1 of the last chapter. First we present some classical terminology. A Sturm-Liouville system on [a, b] is a formally self-adjoint second-order differential operator Tf = (c₂f′)′ + c₀f defined over the closed interval [a, b], together with a self-adjoint boundary condition l₁(f) = l₂(f) = 0 for that interval. If c₂(t) is never zero on [a, b], the system is called regular. If c₂(a) or c₂(b) is zero, or if the interval [a, b] is replaced by an infinite interval such as [a, ∞), then the system is called singular.

Theorem 6.1. If T; l₁, l₂ is a regular Sturm-Liouville system on [a, b], with c₂ positive, then the subspace M defined by the homogeneous boundary condition is a complement of N(T − λ) if λ is taken sufficiently large, and the right inverse of T − λ thus determined by M is a compact self-adjoint mapping of the pre-Hilbert space C⁰([a, b]) into itself.

Proof. The proof depends on the inequality of the above lemma. Since we have proved this inequality only for boundary conditions (1), (2), and (4), our proof will be complete only for those cases. Set g = (T − λ)f. Since ||g||₂||f||₂ ≥ |((T − λ)f, f)| by the Schwarz inequality, we see from the lemma first that ||f||₂² ≤ ||g||₂||f||₂, so that ||f||₂ ≤ ||g||₂, and then that m||f′||₂² ≤ ||g||₂||f||₂ ≤ ||g||₂², so that ||f′||₂ ≤ ||g||₂/√m.

We have already checked that the right inverse S of the formally self-adjoint T − λ defined by M is self-adjoint, and it remains for us to show that the set S[U] = {f : ||g||₂ ≤ 1} has compact closure. For any such f the Schwarz inequality and the above inequality imply that

|f(y) − f(x)| ≤ ∫_x^y |f′| = ∫_x^y |f′| · 1 ≤ ||f′||₂ |y − x|^{1/2} ≤ |y − x|^{1/2}/√m.

Thus S[U] is uniformly equicontinuous. Since the common domain of the functions in S[U] is the compact set [a, b], we will be able to conclude from Theorem 6.1 of Chapter 4 that the set S[U] is totally bounded if we can show that there is a constant C such that all the functions in S[U] have their ranges in [−C, C]. Taking y and x in the last inequality where |f| assumes its maximum and minimum values, we have ||f||_∞ − min |f| ≤ (b − a)^{1/2}/√m. But
(min |f|)(b − a)^{1/2} ≤ ||f||₂ ≤ ||g||₂ ≤ 1, and therefore

||f||_∞ ≤ C = 1/(b − a)^{1/2} + (b − a)^{1/2}/√m.

Thus S[U] is a uniformly equicontinuous set of functions mapping the compact set [a, b] into the compact set [−C, C], and is therefore totally bounded in the uniform norm. Since C([a, b]) is complete in the uniform norm, every sequence in S[U] has a subsequence uniformly converging to some f ∈ C, and since ||f||₂ ≤ (b − a)^{1/2}||f||_∞, this subsequence also converges to f in the two-norm.

We have thus shown that if H is the pre-Hilbert space C([a, b]) under the standard scalar product, then the image S[U] of the unit ball U ⊂ H under S has the property that every sequence in S[U] has a subsequence converging in H. This is the property we actually used in proving Theorem 5.1 of Chapter 5, but it is not quite the definition of the compactness of S, which requires us to show that the closure of S[U] is compact in H. However, if {ξₙ} is any sequence in this closure, then we can choose {sₙ} in S[U] so that ||ξₙ − sₙ|| < 1/n. The sequence {sₙ} has a convergent subsequence {s_{n(m)}} as above, and then {ξ_{n(m)}} converges to the same limit. Thus S is a compact operator. □

Theorem 6.2. There exists an orthonormal sequence {φₙ} consisting entirely of eigenvectors of T and forming a basis for M. Moreover, the Fourier expansion of any f ∈ M with respect to the basis {φₙ} converges uniformly to f (as well as in the two-norm).

Proof. By Theorem 5.1 of Chapter 5 there exists an eigenbasis for the range of S, which is M. Since Sφₙ = rₙφₙ for some nonzero rₙ, we have (T − λ)(rₙφₙ) = φₙ and Tφₙ = ((1 + λrₙ)/rₙ)φₙ. The uniformity of the series convergence comes out of the following general consideration.

Lemma 6.5. Suppose that T is a self-adjoint operator on a pre-Hilbert space V and that T is compact as a mapping from V to ⟨V, q⟩, where q is a second norm on V that dominates the scalar product norm p (q ≥ cp). Then T is compact (from p to p), and the eigenbasis expansion Σ bₙφₙ of an element β in the range of T converges to β in both norms.

Proof. Let U be the unit ball of V in the scalar product norm. By the hypothesis of the lemma, the q-closure B of T[U] is compact. B is then also p-compact, for any sequence in it has a q-convergent subsequence which also p-converges to the same limit, because q dominates p. We can therefore apply the eigenbasis theorem. Now let α and β = T(α) have the Fourier series Σ aᵢφᵢ and Σ bᵢφᵢ, and let T(φᵢ) = rᵢφᵢ. Then bᵢ = rᵢaᵢ, because bᵢ = (T(α), φᵢ) = (α, T(φᵢ)) = (α, rᵢφᵢ) = rᵢ(α, φᵢ) = rᵢaᵢ. Since the sequence of partial sums Σ₁ⁿ aᵢφᵢ is p-bounded (Bessel's inequality), the sequence {Σ₁ⁿ bᵢφᵢ} = {T(Σ₁ⁿ aᵢφᵢ)} is totally q-bounded. Any subsequence of it therefore has a subsubsequence q-converging to some element γ in V. Since it then p-converges to γ, γ must be β. Thus every subsequence has a subsubsequence q-converging to β, and so {Σ₁ⁿ bᵢφᵢ} itself q-converges to β by Lemma 4.1 of Chapter 4. □
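Theorem 6.1 can also be watched numerically. The following sketch (ours, not the book's) replaces T = D² under boundary condition (1) by its standard second-difference matrix; the discrete right inverse S is then a symmetric matrix whose eigenvalues accumulate at zero, the finite-dimensional shadow of compactness:

    import numpy as np

    n = 400
    h = np.pi / (n + 1)
    off = np.ones(n - 1)
    # Second-difference approximation of D^2 on (0, pi) with f(0) = f(pi) = 0.
    T = (np.diag(-2.0 * np.ones(n)) + np.diag(off, 1) + np.diag(off, -1)) / h**2
    S = np.linalg.inv(T)                  # the discrete right inverse

    print(np.allclose(S, S.T))            # True: S is self-adjoint
    mu = np.sort(np.linalg.eigvalsh(S))
    print(mu[:5])                         # approx -1, -1/4, -1/9, -1/16, -1/25
    # The eigenvalues r_n tend to 0, as they must for a compact operator, and
    # the corresponding eigenvectors approximate sin nx (see Section 7).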
EXERCISES

6.1 Given that Tf(x) = xf″(x) + f(x) and Sf(x) = f′(x), compute T∘S and S∘T.

6.2 Show that the differential operators T = aD and S = bD commute if and only if the functions a(x) and b(x) are proportional.

6.3 Show that the differential operators T = aD² and S = bD commute if and only if b(x) is a first-degree polynomial b(x) = cx + d and a(x) = k(b(x))².

6.4 Compute the formal adjoint S of T if
a) Tf = f′,
b) Tf = f″,
c) Tf = f‴,
d) (Tf)(x) = xf′(x),
e) (Tf)(x) = x³f″(x).

6.5 Let S and T be linear differential operators of orders m and n, respectively. What are the coefficient conditions for S∘T to be a linear differential operator of order m + n?

6.6 Let T be the second-order linear differential operator (Tf)(t) = a₂(t)f″(t) + a₁(t)f′(t) + a₀(t)f(t). What are the conditions on its coefficient functions for its formal adjoint to exist? What are these conditions for T of order n?

6.7 Let S and T be linear differential operators of orders m and n, respectively, and suppose that all coefficients are C^∞-functions (infinitely differentiable). Prove that S∘T − T∘S is of order ≤ m + n − 1.

6.8 A δ-blip is a continuous nonnegative function φ such that φ = 0 outside of [−δ, δ] and ∫_{−δ}^{δ} φ = 1 (Fig. 6.4). We assume that there exists an infinitely differentiable 1-blip. Show that there exists an infinitely differentiable δ-blip for every δ > 0. Define what you would mean by a δ-blip centered at x, and show that one exists.

Fig. 6.4

6.9 Let f be a continuous function on [a, b] such that (f, g) = ∫_a^b fg = 0 whenever g is an infinitely differentiable function which vanishes near a and b. Show that f = 0. (Use the above exercise.)

6.10 Let C^∞([a, b]) be the vector space of infinitely differentiable functions on [a, b], and let T be a second-order linear differential operator with coefficients in C^∞: (Tf)(t) = a₂(t)f″(t) + a₁(t)f′(t) + a₀(t)f(t). Let S be a linear operator on C^∞([a, b]) such that (Tf, g) − (f, Sg) = K(f, g) is a bilinear functional depending only on the values of f, g, f′, and g′ at a and b. Prove that S is the formal adjoint of T. [Hint: Take f to be a δ-blip centered at x. Then K(f, g) = 0. Now try to work the assertion to be proved into a form to which the above exercise can be applied.]
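Exercise 6.8 assumes that an infinitely differentiable 1-blip exists. One standard construction (supplied here as an illustration; it is not given in the text) uses exp(−1/(1 − x²)) on (−1, 1), extended by zero and normalized; the normalizing constant is computed numerically in this sketch:

    import numpy as np

    def bump(x):
        """C-infinity function: exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
        x = np.asarray(x, dtype=float)
        out = np.zeros_like(x)
        inside = np.abs(x) < 1
        out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
        return out

    t = np.linspace(-1, 1, 200_001)
    c = np.trapz(bump(t), t)          # normalize so the integral is 1

    def blip(x, delta=1.0, center=0.0):
        """A delta-blip centered at `center`; scaling keeps the integral 1."""
        return bump((x - center) / delta) / (c * delta)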
6.11 Prove an nth-order generalization of the above exercise.

6.12 Let X be the space of linear differential operators with C^∞-coefficients, and let A_T be the formal adjoint of T. Prove that T ↦ A_T is an isomorphism from X to X. Prove that A_{T∘S} = A_S ∘ A_T.

7. FOURIER SERIES

There are not many regular Sturm-Liouville systems whose associated orthonormal eigenbases have proved to be important in actual calculations. Most orthonormal bases that are used, such as those due to Bessel, Legendre, Hermite, and Laguerre, arise from singular Sturm-Liouville systems and are therefore beyond the limitations we have set for this discussion. However, the best-known example, Fourier series, is available to us. We shall consider the constant coefficient operator Tf = D²f, which is clearly both formally self-adjoint and regular, and either the boundary condition f(0) = f(π) = 0 on [0, π] (type 1) or the periodic boundary condition f(−π) = f(π), f′(−π) = f′(π) on [−π, π] (type 4).

To solve the first problem, we have to find the solutions of f″ − λf = 0 which satisfy f(0) = f(π) = 0. If λ > 0, then we know that the two-dimensional solution space is spanned by {e^{rx}, e^{−rx}}, where r = λ^{1/2}. But if c₁e^{rx} + c₂e^{−rx} is 0 at both 0 and π, then c₁ = c₂ = 0 (because the pairs ⟨1, 1⟩ and ⟨e^{rπ}, e^{−rπ}⟩ are independent). Therefore, there are no solutions satisfying the boundary conditions when λ > 0. If λ = 0, then f(x) = c₁x + c₀ and again c₁ = c₀ = 0. If λ < 0, then the solution space is spanned by {sin rx, cos rx}, where r = (−λ)^{1/2}. Now if c₁ sin rx + c₂ cos rx is 0 at x = 0 and x = π, we get, first, that c₂ = 0 and, second, that rπ = nπ for some integer n. Thus the eigenfunctions for the first system form the set {sin nx}₁^∞, and the corresponding eigenvalues of D² are {−n²}₁^∞.

At the end of this section we shall prove that the functions in C²([a, b]) that are zero near a and b are dense in C([a, b]) in the two-norm. Assuming this, it follows from Theorem 2.3 of Chapter 5 that a basis for M is a basis for C⁰, and we now have the following corollary of the Sturm-Liouville theorem.

Theorem 7.1. The sequence {sin nx}₁^∞ is an orthogonal basis for the pre-Hilbert space C⁰([0, π]). If f ∈ C²([0, π]) and f(0) = f(π) = 0, then the Fourier series for f converges uniformly to f.

We now consider the second boundary problem. The computations are a little more complicated, but again if f(x) = c₁e^{rx} + c₂e^{−rx}, and if f(−π) = f(π) and f′(−π) = f′(π), then f = 0. For now we have
c₁e^{−rπ} + c₂e^{rπ} = c₁e^{rπ} + c₂e^{−rπ} and c₁e^{−rπ} − c₂e^{rπ} = c₁e^{rπ} − c₂e^{−rπ},

giving c₁(e^{rπ} − e^{−rπ}) = 0, and so c₁ = 0; similarly c₂ = 0. Again f(x) = c₁x + c₀ is ruled out except for the constants, which do satisfy the boundary conditions. Finally, if f(x) = c₁ sin rx + c₂ cos rx, our boundary conditions become 2c₁ sin rπ = 0 and 2rc₂ sin rπ = 0, so that again r = n, but this time the full solution space of (D² + n²)f = 0 satisfies the boundary condition.

Theorem 7.2. The set {sin nx}₁^∞ ∪ {cos nx}₀^∞ forms an orthogonal basis for the pre-Hilbert space C⁰([−π, π]). If f ∈ C²([−π, π]) and f(−π) = f(π), f′(−π) = f′(π), then the Fourier series for f converges to f uniformly on [−π, π].

Remaining proof. This theorem follows from our general Sturm-Liouville discussion except for the orthogonality of sin nx and cos nx. We have

(sin nx, cos nx) = ∫_{−π}^{π} sin nt cos nt dt = ½ ∫_{−π}^{π} sin 2nt dt = −(1/4n) cos 2nt]_{−π}^{π} = 0.

Or we can simply remark that the integrand sin nt cos nt is an odd function and therefore its integral over any symmetric interval [−a, a] is necessarily zero. The orthogonality of eigenvectors having different eigenvalues follows, of course, as in the proof of Theorem 3.1 of Chapter 5. □

Finally, we prove the density theorem we needed above. There are very slick ways of doing this, but they require more machinery than we have available, and rather than taking the time to make the machines, we shall prove the theorem with our bare hands.

It is standard notation to let a subscript zero on a symbol denoting a class of functions pick out those functions in the class that are zero "on the boundary" in some suitable sense. Here C₀([a, b]) will denote the functions in C([a, b]) that are zero in neighborhoods of a and b, and similarly for C₀²([a, b]).

Theorem 7.3. C²([a, b]) is dense in C([a, b]) in the uniform norm, and C₀²([a, b]) is dense in C([a, b]) in the two-norm.

Proof. We first approximate f ∈ C([a, b]) to within ε by a piecewise "linear" function g by drawing straight line segments between the adjacent points on the graph of f lying over a subdivision a = x₀ < x₁ < ⋯ < xₙ = b of [a, b]. If f varies by less than ε on each interval [x_{i−1}, x_i], then ||f − g||_∞ ≤ ε. Now g′(t) is a step function which is constant on the intervals of the above subdivision. We now alter g′(t) slightly near each jump in such a way that the new function h(t) is continuous there. If we do it as sketched in Fig. 6.5, the total
Fig. 6.5    Fig. 6.6

integral error at the jump is zero, ∫(h − g′) = 0 over the δ-interval around the jump, and the maximum error |∫_a^x (h − g′)| is δη/4, where η is the height of the jump. This will be less than ε if we take δ = ε/||g′||_∞, since η ≤ 2||g′||_∞. We now have a continuous function h such that

|∫_a^x h(t) dt − (f(x) − f(a))| < 2ε.

In other words, we have approximated f uniformly by a continuously differentiable function.

Now choose g and h in C¹([a, b]) so that first ||f − g||_∞ < ε/2 and then ||g′ − h||_∞ < ε/(2(b − a)). Then |g(x) − g(a) − ∫_a^x h| < ε/2, and so H(x) = ∫_a^x h + g(a) is a twice continuously differentiable function such that ||f − H||_∞ < ε. In other words, C²([a, b]) is dense in C([a, b]) in the uniform norm. It is then also dense in the two-norm, since

||f||₂ = (∫_a^b f²)^{1/2} ≤ ||f||_∞ (∫_a^b 1)^{1/2} = (b − a)^{1/2} ||f||_∞.

But now we can do something which we couldn't do for the uniform norm: we can alter the approximating function to one that is zero on neighborhoods of a and b, and keep the two-norm approximation good. Given δ, let e(t) be a nonnegative function on [a, b] such that e(t) = 1 on [a + 2δ, b − 2δ], e(t) = 0 on [a, a + δ] and on [b − δ, b], e″ is continuous, and ||e||_∞ = 1. Such an e(t) clearly exists, since we can draw it. We leave it as an interesting exercise to actually define e(t). Here is a hint: Show somehow that there is a fifth-degree polynomial p(t) having a graph between 0 and 1 as shown in Fig. 6.6, with a zero second derivative at 0 and at 1, and then use a piece of the graph, suitably translated, compressed, rotated, etc., to help patch together e(t). Anyway, then ||g − eg||₂ ≤ ||g||_∞(4δ)^{1/2} for any g in C([a, b]), and if g has continuous derivatives up to order 2, then so does eg. Thus, if we start with f in C and approximate it by g in C², and then approximate g by eg, we have altogether the second approximation of the theorem. □
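For the hinted fifth-degree polynomial, one concrete choice (added here; it also answers Exercise 7.8 below) is

p(t) = 6t⁵ − 15t⁴ + 10t³,

for which p(0) = p′(0) = p″(0) = 0 and p(1) = 1, while p′(t) = 30t²(t − 1)² and p″(t) = 60t(t − 1)(2t − 1) both vanish at t = 1 as well.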
EXERCISES

7.1 Convert the orthogonal basis {sin nx}₁^∞ for the pre-Hilbert space C([0, π]) to an orthonormal basis.

7.2 Do the same for the orthogonal basis {sin nx}₁^∞ ∪ {cos nx}₀^∞ for C([−π, π]).

7.3 Show that {sin nx}₁^∞ is an orthogonal basis for the vector space V of all odd continuous functions on [−π, π]. (Be clever. Do not calculate from scratch.) Normalize the above basis.

7.4 State and prove the corresponding theorem for the even functions on [−π, π].

7.5 Prove that the derivative of an odd function is even, and conversely.

7.6 We now want to prove the following stronger theorem about the uniform convergence of Fourier series.

Theorem. Let f have a continuous derivative on [−π, π], and suppose that f(−π) = f(π). Then the Fourier series for f converges to f uniformly.

Assume for convenience that f is even. (This only cuts down the number of calculations.) Show first that the Fourier series for f′ is the series obtained from the Fourier series for f by term-by-term differentiation. Apply the above exercises here. Next show from the two-norm convergence of its Fourier series to f′ and the Schwarz inequality that the Fourier series for f converges uniformly.

7.7 Prove that {cos nx}₀^∞ is an orthogonal basis for the space M of C²-functions on [0, π] such that f′(0) = f′(π) = 0.

7.8 Find a fifth-degree polynomial p(x) such that p(0) = p′(0) = p″(0) = 0, p′(1) = p″(1) = 0, and p(1) = 1. (Forget the last condition until the end.) Sketch the graph of p.

7.9 Use a "piece" of the above polynomial p to construct a function e(x) such that e′ and e″ exist and are continuous, e(x) = 0 when x < a + δ and x > b − δ, e(x) = 1 on [a + 2δ, b − 2δ], and ||e||_∞ = 1.

7.10 Prove the Weierstrass theorem given below on [0, π] in the following steps. We know that f can be uniformly approximated by a C²-function g.
1) Show that c and d can be found such that g(t) − ct − d is 0 at 0 and π.
2) Use the Fourier series expansion of this function and the Maclaurin series for the functions sin nx to show that the polynomial p(x) can be found.

Theorem (The Weierstrass approximation theorem). The polynomials are dense in C([a, b]) in the uniform norm. That is, given any continuous function f on [a, b] and any ε, there is a polynomial p such that |f(x) − p(x)| < ε for all x in [a, b].
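Before leaving Fourier series, here is a numerical companion to Theorem 7.1 (an illustration of ours, not part of the text): the partial sums of the sine series of f(x) = x(π − x), which satisfies the type-1 boundary condition, can be seen converging in the uniform norm.

    import numpy as np

    x = np.linspace(0.0, np.pi, 2001)
    f = x * (np.pi - x)                 # satisfies f(0) = f(pi) = 0

    def sine_partial_sum(f_vals, x, N):
        s = np.zeros_like(x)
        for n in range(1, N + 1):
            phi = np.sin(n * x)
            # Projection coefficient of f on sin nx in the two-norm.
            b_n = np.trapz(f_vals * phi, x) / np.trapz(phi * phi, x)
            s += b_n * phi
        return s

    for N in (1, 3, 7, 15):
        err = np.max(np.abs(f - sine_partial_sum(f, x, N)))
        print(N, err)                   # the uniform error decreases with N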
CHAPTER 7

MULTILINEAR FUNCTIONALS

This chapter is principally for reference. Although most of the proofs will be included, the reader is not expected to study them. Our goal is a collection of basic theorems about alternating multilinear functionals, or exterior forms, and the determinant function is one of our rewards.

1. BILINEAR FUNCTIONALS

We have already studied various aspects of bilinear functionals. We looked at their duality implications in Section 6, Chapter 1; we considered the "canonical forms" of symmetric bilinear functionals and their equivalent quadratic forms in Section 7, Chapter 2; and, of course, the whole scalar product theory of Chapter 5 is the theory of a still more special kind of bilinear functional. In this chapter we shall restrict ourselves to bilinear and multilinear functionals over finite-dimensional spaces, and our concerns are purely algebraic. We begin with some material related to our earlier algebra.

If V and W are finite-dimensional vector spaces, then the set of all bilinear functionals on V × W is pretty clearly a vector space. We designate it V* ⊗ W* and call it the tensor product of V* and W*. Our first theorem simply states something that was implicit in Theorem 6.1 of Chapter 1.

Theorem 1.1. The vector spaces V* ⊗ W*, Hom(V, W*), and Hom(W, V*) are naturally isomorphic.

Proof. We saw in Theorem 6.1 of Chapter 1 that each f in V* ⊗ W* determines a linear mapping α ↦ f_α from W to V*, where f_α(ξ) = f(ξ, α), and we also noted that this correspondence from V* ⊗ W* to Hom(W, V*) is bijective. All that the present theorem adds is that this bijective correspondence is linear and so constitutes a natural isomorphism, as does the similar one from V* ⊗ W* to Hom(V, W*). To see this, let f_T be the bilinear functional corresponding to T in Hom(V, W*). Then f_{T+S} = f_T + f_S, for

f_{T+S}(α, β) = ((T + S)(α))(β) = (T(α) + S(α))(β) = (T(α))(β) + (S(α))(β) = f_T(α, β) + f_S(α, β).

We can do the same for homogeneity. The isomorphism of V* ⊗ W* with Hom(W, V*) follows in exactly the same way by reversing the roles of the variables. We are thus finished with the proof. □
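In coordinates, Theorem 1.1 is the familiar identification of a bilinear functional with a matrix. A small sketch of ours (bases fixed arbitrarily; none of the names below are from the text):

    import numpy as np

    # With bases fixed, a bilinear functional f on V x W is a matrix A,
    # f(xi, eta) = xi^T A eta, and the corresponding element of Hom(W, V*)
    # sends eta to the functional xi -> xi^T (A eta). Both are "the same" A.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 2))          # f on R^3 x R^2

    def f(xi, eta):
        return xi @ A @ eta

    def T(eta):                              # element of Hom(W, V*)
        return A @ eta                       # coordinates of a functional on V

    xi, eta = rng.standard_normal(3), rng.standard_normal(2)
    print(np.isclose(f(xi, eta), xi @ T(eta)))   # True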
Before looking for bases in V* ⊗ W*, we define a bilinear functional γ ⊗ λ for any two functionals γ ∈ V* and λ ∈ W* by

(γ ⊗ λ)(ξ, η) = γ(ξ)λ(η).

We call γ ⊗ λ the tensor product of the functionals γ and λ and call any bilinear functional having this form elementary. It is not too hard to see that f ∈ V* ⊗ W* is elementary if and only if the corresponding T ∈ Hom(V, W*) is a dyad.

If V and W are finite-dimensional, with dimensions m and n, respectively, then the above isomorphism of V* ⊗ W* with Hom(V, W*) shows that the dimension of V* ⊗ W* is mn. We now describe the basis determined by given bases in V and W.

Theorem 1.2. Let {α_i}₁^m and {β_j}₁^n be any bases for V and W, and let their dual bases in V* and W* be {μ_i}₁^m and {ν_j}₁^n. Then the mn elementary bilinear functionals {μ_i ⊗ ν_j} form the corresponding basis for V* ⊗ W*.

Proof. Since (μ_i ⊗ ν_j)(ξ, η) = μ_i(ξ)ν_j(η) = x_i y_j, the matrix expansion f(ξ, η) = Σ_{i,j} t_ij x_i y_j becomes f(ξ, η) = Σ_{i,j} t_ij (μ_i ⊗ ν_j)(ξ, η), or

f = Σ_{i,j} t_ij (μ_i ⊗ ν_j).

The set {μ_i ⊗ ν_j} thus spans V* ⊗ W*. Since it contains the same number of elements (mn) as the dimension of V* ⊗ W*, it forms a basis. □

Of course, independence can also be checked directly. If Σ_{i,j} t_ij (μ_i ⊗ ν_j) = 0, then for every pair ⟨k, l⟩,

t_kl = Σ_{i,j} t_ij (μ_i ⊗ ν_j)(α_k, β_l) = 0.

We should also remark that this theorem is entirely equivalent to our discussion of the basis for Hom(V, W) at the end of Section 4, Chapter 2.

2. MULTILINEAR FUNCTIONALS

All the above considerations generalize to multilinear functionals f: V₁ × ⋯ × Vₙ → ℝ. We change notation, just as we do in replacing the traditional ⟨x, y⟩ ∈ ℝ² by x = ⟨x₁, …, xₙ⟩ ∈ ℝⁿ. Thus we write f(α₁, …, αₙ) = f(α), where α = ⟨α₁, …, αₙ⟩ ∈ V₁ × ⋯ × Vₙ. Our requirement now is that f(α₁, …, αₙ) be a linear functional of α_j when α_i is held fixed for all i ≠ j. The set of all such functionals is a vector space, called the tensor product of the dual spaces V₁*, …, Vₙ*, and is designated V₁* ⊗ ⋯ ⊗ Vₙ*. As before, there are natural isomorphisms between these tensor product spaces and various Hom spaces. For example, V₁* ⊗ V₂* ⊗ V₃* and Hom(V₁, V₂* ⊗ V₃*) are naturally isomorphic. Also, there are additional isomorphisms of a variety
not encountered in the bilinear case. However, it will not be necessary for us to look into these questions.

We define elementary multilinear functionals as before. If λ_i ∈ V_i*, i = 1, …, n, and ξ = ⟨ξ₁, …, ξₙ⟩, then

(λ₁ ⊗ ⋯ ⊗ λₙ)(ξ) = λ₁(ξ₁) ⋯ λₙ(ξₙ).

To keep our notation as simple as possible, and also because it is the case of most interest to us, we shall consider the question of bases only when V₁ = V₂ = ⋯ = Vₙ = V. In this case (V*)^⊗n = V* ⊗ ⋯ ⊗ V* (n factors) is called the space of covariant tensors of order n (over V). If {α_j}₁^m is a basis for V and f ∈ (V*)^⊗n, then we can expand the value f(ξ) = f(ξ₁, …, ξₙ) with respect to the basis expansions of the vectors ξ_i just as we did when f was bilinear, but now the result is notationally more complex. If we set ξ_i = Σ_{j=1}^m x^i_j α_j for i = 1, …, n (so that the coordinate set of ξ_i is x^i = {x^i_j}_j) and use the linearity of f(ξ₁, …, ξₙ) in its separate variables one variable at a time, we get

f(ξ₁, …, ξₙ) = Σ x^1_{p₁} x^2_{p₂} ⋯ x^n_{pₙ} f(α_{p₁}, α_{p₂}, …, α_{pₙ}),

where the sum is taken over all n-tuples p = ⟨p₁, …, pₙ⟩ such that 1 ≤ p_i ≤ m for each i from 1 to n. The set of all these n-tuples is just the set of all functions from {1, …, n} to {1, …, m}. We have designated this set m^n, using the notation n = {1, …, n} (and m = {1, …, m}), and the scope of the above sum can thus be indicated in the formula as follows:

f(ξ₁, …, ξₙ) = Σ_{p ∈ m^n} x^1_{p₁} ⋯ x^n_{pₙ} f(α_{p₁}, …, α_{pₙ}).

A strict proof of this formula would require an induction on n, and is left to the interested reader. At the inductive step he will have to rewrite a double sum Σ_{p∈m^n} Σ_{j∈m} as the single sum Σ_{q∈m^{n+1}}, using the fact that an ordered pair ⟨p, j⟩ in m^n × m is equivalent to an (n + 1)-tuple q ∈ m^{n+1}, where q_i = p_i for i = 1, …, n and q_{n+1} = j.

If {μ_j}₁^m is the dual basis for V* and q ∈ m^n, let μ_q be the elementary functional μ_{q₁} ⊗ ⋯ ⊗ μ_{qₙ}. Thus μ_q(α_{p₁}, …, α_{pₙ}) = Π_{i=1}^n μ_{q_i}(α_{p_i}) = 0 unless p = q, in which case its value is 1. More generally,

μ_q(ξ₁, …, ξₙ) = μ_{q₁}(ξ₁) ⋯ μ_{qₙ}(ξₙ) = x^1_{q₁} ⋯ x^n_{qₙ}.

Therefore, if we set c_q = f(α_{q₁}, …, α_{qₙ}), the general expansion now appears as

f(ξ₁, …, ξₙ) = Σ_{p ∈ m^n} c_p μ_p(ξ₁, …, ξₙ),

or f = Σ c_p μ_p, which is the same formula we obtained in the bilinear case, but with more sophisticated notation. The functionals {μ_p : p ∈ m^n} thus span (V*)^⊗n. They are also independent. For, if Σ c_p μ_p = 0, then for each q, c_q = Σ_p c_p μ_p(α_{q₁}, …, α_{qₙ}) = 0. We have proved the following theorem.
Theorem 2.1. The set {μ_p : p ∈ m^n} is a basis for (V*)^⊗n. For any f in (V*)^⊗n its coordinate function {c_p} is defined by c_p = f(α_{p₁}, …, α_{pₙ}). Thus f = Σ c_p μ_p and f(ξ₁, …, ξₙ) = Σ c_p μ_p(ξ₁, …, ξₙ) = Σ c_p x^1_{p₁} ⋯ x^n_{pₙ} for any f ∈ (V*)^⊗n and any ⟨ξ₁, …, ξₙ⟩ ∈ Vⁿ.

Corollary. The dimension of (V*)^⊗n is mⁿ.

Proof. There are mⁿ functions in m^n, so the basis {μ_p : p ∈ m^n} has mⁿ elements. □

3. PERMUTATIONS

A permutation on a set S is a bijection f: S → S. If 𝒮(S) is the set of all permutations on S, then 𝒮 = 𝒮(S) is closed under composition (σ, ρ ∈ 𝒮 ⇒ σ∘ρ ∈ 𝒮) and inversion (σ ∈ 𝒮 ⇒ σ⁻¹ ∈ 𝒮). Also, the identity map I is in 𝒮, and, of course, the composition operation is associative. Together these statements say exactly that 𝒮 is a group under composition. The simplest kind of permutation other than I is one which interchanges a pair of elements of S and leaves every other element fixed. Such a permutation is called a transposition.

We now take S to be the finite set n = {1, …, n} and set Sₙ = 𝒮(n). It is not hard to see that then any permutation can be expressed as a product of transpositions, and in more than one way. A more elementary fact that we shall need is that if ρ is a fixed element of Sₙ, then the mapping σ ↦ σ∘ρ is a bijection from Sₙ to Sₙ. It is surjective because any σ′ can be written σ′ = (σ′∘ρ⁻¹)∘ρ, and it is injective because σ₁∘ρ = σ₂∘ρ ⇒ (σ₁∘ρ)∘ρ⁻¹ = (σ₂∘ρ)∘ρ⁻¹ ⇒ σ₁ = σ₂. Similarly, the mapping σ ↦ ρ∘σ (ρ fixed) is bijective.

We also need the fact that there are n! elements in Sₙ. This is the elementary count from secondary school algebra. In defining an element σ ∈ Sₙ, σ(1) can be chosen in n ways. For each of these choices σ(2) can be chosen in n − 1 ways, so that ⟨σ(1), σ(2)⟩ can be chosen in n(n − 1) ways. For each of these choices σ(3) can be chosen in n − 2 ways, etc. Altogether σ can be chosen in n(n − 1)(n − 2) ⋯ 1 = n! ways.

In the sequel we shall often write 'ρσ' instead of 'ρ∘σ', just as we occasionally wrote 'ST' instead of 'S∘T' for the composition of linear maps.

If ξ = ⟨ξ₁, …, ξₙ⟩ ∈ Vⁿ and σ ∈ Sₙ, then we can "apply σ to ξ", or "permute the elements of ⟨ξ₁, …, ξₙ⟩ through σ". We mean, of course, that we can replace ⟨ξ₁, …, ξₙ⟩ by ⟨ξ_{σ(1)}, …, ξ_{σ(n)}⟩; that is, we can replace ξ by ξ∘σ. Permuting the variables changes a functional f ∈ (V*)^⊗n into a new such functional. Specifically, given f ∈ (V*)^⊗n and σ ∈ Sₙ, we define f^σ by

f^σ(ξ) = f(ξ∘σ⁻¹) = f(ξ_{σ⁻¹(1)}, …, ξ_{σ⁻¹(n)}).

The reason for using σ⁻¹ instead of σ is, in part, that it gives us the following formula.
f^{σ₁σ₂} = (f^{σ₁})^{σ₂}.

Proof. f^{σ₁σ₂}(ξ) = f(ξ∘(σ₁∘σ₂)⁻¹) = f(ξ∘(σ₂⁻¹∘σ₁⁻¹)) = f((ξ∘σ₂⁻¹)∘σ₁⁻¹) = f^{σ₁}(ξ∘σ₂⁻¹) = (f^{σ₁})^{σ₂}(ξ). □

Theorem 3.1. For each σ in Sₙ the mapping T_σ defined by f ↦ f^σ is a linear isomorphism of (V*)^⊗n onto itself. The mapping σ ↦ T_σ is an antihomomorphism from the group Sₙ to the group of nonsingular elements of Hom((V*)^⊗n).

Proof. Permuting the variables does not alter the property of multilinearity, so T_σ maps (V*)^⊗n into itself. It is linear, since (af + bg)^σ = af^σ + bg^σ. And T_{ρσ} = T_σ ∘ T_ρ, because f^{ρσ} = (f^ρ)^σ. Thus σ ↦ T_σ preserves products, but in the reverse order. This is why it is called an antihomomorphism. Finally,

T_σ ∘ T_{σ⁻¹} = T_{σ⁻¹σ} = T_I = identity,

so that T_σ is invertible (nonsingular, an isomorphism). □

The mapping σ ↦ T_σ is a representation (really an antirepresentation) of the group Sₙ by linear transformations on (V*)^⊗n.

Lemma 3.2. Each T_σ carries the basis {μ_p} into itself, and so is a permutation on the basis.

Proof. We have (μ_p)^σ(ξ) = μ_p(ξ∘σ⁻¹) = Π_{i=1}^n μ_{p_i}(ξ_{σ⁻¹(i)}). Setting j = σ⁻¹(i), and so having i = σ(j), this product can be rewritten Π_{j=1}^n μ_{p_{σ(j)}}(ξ_j) = μ_{p∘σ}(ξ). Thus (μ_p)^σ = μ_{p∘σ}, and since p ↦ p∘σ is a permutation on m^n, we are done. □

4. THE SIGN OF A PERMUTATION

We consider now the special polynomial E on ℝⁿ defined by

E(x) = E(x₁, …, xₙ) = Π_{1 ≤ i < j ≤ n} (x_i − x_j).

This is the product over all pairs ⟨i, j⟩ ∈ n × n such that i < j. This set of ordered pairs is in one-to-one correspondence with the collection P of all pair sets {i, j} ⊂ n such that i ≠ j, the ordered pair being obtained from the unordered pair by putting it in its natural order. Now it is clear that for any permutation σ ∈ Sₙ, the mapping {i, j} ↦ {σ(i), σ(j)} is a permutation of P. This means that the factors in the polynomial E^σ(x) = E(x∘σ) are exactly the same as in the polynomial E(x) except for the changes of sign that occur when σ reverses the order of a pair. Therefore, if N is the number of these reversals, we have E^σ = (−1)^N E. The mapping σ ↦ (−1)^N is designated 'sgn' (and called "sign"). Thus sgn is a function from Sₙ to {1, −1} such that

E^σ = (sgn σ)E
for all σ ∈ Sₙ. It follows that sgn ρσ = (sgn ρ)(sgn σ), for

(sgn ρσ)E = E^{ρσ} = (E^ρ)^σ = (sgn σ)E^ρ = (sgn ρ)(sgn σ)E,

and we can evaluate E at any n-tuple x such that E(x) ≠ 0 and cancel the factor E(x). Also, sgn σ = −1 if σ is a transposition. This is clear if σ interchanges adjacent numbers, because it then changes the sign of just one factor in E(x); we leave the general case as an exercise for the interested reader.

5. THE SUBSPACE aⁿ OF ALTERNATING TENSORS

Definition. A covariant tensor f ∈ (V*)^⊗n is symmetric if f^σ = f for all σ ∈ Sₙ. If f is bilinear [f ∈ (V*)^⊗2], this is just the condition f(ξ, η) = f(η, ξ) for all ξ, η ∈ V.

Definition. A covariant tensor f ∈ (V*)^⊗n is antisymmetric or alternating if f^σ = (sgn σ)f for all σ ∈ Sₙ.

Since each σ is a product of transpositions, this can also be expressed as the fact that f just changes sign if two of its arguments are interchanged. In the case of a bilinear functional it is the condition f(ξ, η) = −f(η, ξ) for all ξ, η ∈ V. It is important to note that if f is alternating, then f(ξ) = 0 whenever the n-tuple ξ = ⟨ξ₁, …, ξₙ⟩ is not injective (ξ_i = ξ_j for some i ≠ j). The set of all symmetric elements of (V*)^⊗n is clearly a subspace, as is also the (for us) more important set aⁿ of all alternating elements. There is an important linear projection from (V*)^⊗n to aⁿ which we now describe.

Theorem 5.1. The mapping f ↦ (1/n!)Σ_{σ∈Sₙ} (sgn σ)f^σ is a projection π from (V*)^⊗n to aⁿ.

Proof. We first check that πf ∈ aⁿ for every f in (V*)^⊗n. We have (πf)^ρ = (1/n!)Σ_σ (sgn σ)f^{σρ}. Now sgn σ = (sgn σρ)(sgn ρ). Setting σ′ = σ∘ρ and remembering that σ ↦ σ′ is a bijection, we thus have

(πf)^ρ = ((sgn ρ)/n!) Σ_{σ′} (sgn σ′)f^{σ′} = (sgn ρ)(πf).

Hence πf ∈ aⁿ. If f is already in aⁿ, then f^σ = (sgn σ)f and πf = (1/n!)Σ_{σ∈Sₙ} f. Since Sₙ has n! elements, πf = f. Thus π is a projection from (V*)^⊗n to aⁿ. □

Lemma 5.1. π(f^ρ) = (sgn ρ)πf.

Proof. The formula for π(f^ρ) is the same as that for (πf)^ρ except that ρσ replaces σρ. The proof is thus the same as the one for the theorem above. □
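Both the sign function and the projection π are easy to compute with. The following sketch (ours, not the book's) evaluates sgn directly from the defining polynomial E and then checks that the alternation map of Theorem 5.1 really produces an alternating functional, for a random trilinear f on ℝ³ given by its coordinate tensor:

    import numpy as np
    from itertools import permutations

    def E(x):
        n = len(x)
        out = 1.0
        for i in range(n):
            for j in range(i + 1, n):
                out *= x[i] - x[j]
        return out

    def sgn(s):                       # s is a tuple; s[i] is the image of i
        x = np.arange(1.0, len(s) + 1)
        return round(E(x[list(s)]) / E(x))      # sgn = E(x o s) / E(x)

    rng = np.random.default_rng(0)
    c = rng.standard_normal((3, 3, 3))          # coordinates c_p of f
    # pi f: average of sgn(s) * (f permuted by s) over all s in S_3.
    alt = sum(sgn(s) * np.transpose(c, s) for s in permutations(range(3))) / 6

    v1, v2, v3 = rng.standard_normal((3, 3))
    lhs = np.einsum('ijk,i,j,k->', alt, v1, v2, v3)
    rhs = -np.einsum('ijk,i,j,k->', alt, v2, v1, v3)   # two arguments swapped
    print(np.isclose(lhs, rhs))                        # True: pi f alternates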
Theorem 5.2. The vector space aⁿ of alternating n-linear functionals over the m-dimensional vector space V has dimension (m over n), the binomial coefficient.

Proof. If f ∈ aⁿ and f = Σ_p c_p μ_p, then since f^σ = (sgn σ)f, we have Σ_p c_p μ_{p∘σ} = Σ_p (sgn σ)c_p μ_p for any σ in Sₙ. Setting p∘σ = q, the left sum becomes Σ_q c_{q∘σ⁻¹} μ_q, and since the basis expansion is unique, we must have c_{q∘σ⁻¹} = (sgn σ)c_q, or c_p = (sgn σ)c_{p∘σ}, for all p ∈ m^n. Working backward, we see, conversely, that this condition implies that f^σ = (sgn σ)f. Thus f ∈ aⁿ if and only if its coordinate function c_p satisfies the identity

c_p = (sgn σ)c_{p∘σ}

for all p ∈ m^n and all σ ∈ Sₙ.

This has many consequences. For one thing, c_p = 0 unless p is one-to-one (injective). For if p_i = p_j and σ is the transposition interchanging i and j, then p∘σ = p, c_p = (sgn σ)c_{p∘σ} = −c_p, and so c_p = 0. Since no p can be injective if n > m, we see that in this case the only element of aⁿ is the zero functional. Thus n > m ⇒ dim aⁿ = 0.

Now suppose that n ≤ m. For any injective p, the set {p∘σ : σ ∈ Sₙ} consists of all the (injective) n-tuples with the same range set as p. There are clearly n! of them. Exactly one q = p∘σ counts off the range set in its natural order, i.e., satisfies q₁ < q₂ < ⋯ < qₙ. We select this unique q as the representative of all the elements p∘σ having this range. The collection C of these canonical (representative) q's is thus in one-to-one correspondence with the collection of all (range) subsets of m = {1, …, m} of size n. Each injective p ∈ m^n is uniquely expressible as p = q∘σ for some q ∈ C, σ ∈ Sₙ. Thus each f in aⁿ is the sum Σ_{q∈C} Σ_{σ∈Sₙ} c_{q∘σ} μ_{q∘σ}. Since c_{q∘σ} = (sgn σ)c_q, this sum can be rewritten

Σ_{q∈C} c_q Σ_σ (sgn σ)μ_{q∘σ} = Σ_{q∈C} c_q ν_q,

where we have set ν_q = Σ_σ (sgn σ)μ_{q∘σ} = n!π(μ_q).

We are just about done. Each ν_q is alternating, since it is in the range of π, and the expansion which we have just found to be valid for every f ∈ aⁿ shows that the set {ν_q : q ∈ C} spans aⁿ. It is also independent, since Σ_{q∈C} t_q ν_q = Σ_{p∈m^n} t_p μ_p and the set {μ_p} is independent. It is therefore a basis for aⁿ.

Now the total number of injective mappings p from n = {1, …, n} to m = {1, …, m} is m(m − 1) ⋯ (m − n + 1), for the first element can be chosen in m ways, the second in m − 1 ways, and so on down through n choices, the last element having m − (n − 1) = m − n + 1 possibilities. We have seen above that the number of these p's with a given range is n!. Therefore, the number of different range sets is

m(m − 1) ⋯ (m − n + 1)/n! = m!/(n!(m − n)!) = (m over n).

And this is the number of elements q ∈ C. □
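For concreteness (an added example): when m = 3 and n = 2 the canonical set C consists of the increasing pairs q = ⟨1, 2⟩, ⟨1, 3⟩, ⟨2, 3⟩, so that dim a² = 3 = (3 over 2), and, for instance,

ν_{⟨1,2⟩} = μ₁ ⊗ μ₂ − μ₂ ⊗ μ₁.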
The case n = m is very important. Now C contains only one element, the identity I in Sₘ, so that

f = c_I ν_I = c_I Σ_σ (sgn σ)μ_σ and f(ξ₁, …, ξₘ) = c_I Σ_σ (sgn σ) x^1_{σ(1)} ⋯ x^m_{σ(m)}.

This is essentially the formula for the determinant, as we shall see.

6. THE DETERMINANT

We saw in Section 5 that the dimension of the space aᵐ of alternating m-forms over an m-dimensional V is (m over m) = 1. Thus, to within scalar multiples there is only one alternating m-linear functional D over V = ℝᵐ, and we can adjust the constant so that D(δ¹, …, δᵐ) = 1. This uniquely determined m-form is the determinant functional, and its value D(x¹, …, xᵐ) at the m-tuple ⟨x¹, …, xᵐ⟩ is the determinant of the matrix x = {x_ij} whose jth column is x^j for j = 1, …, m.

Lemma 6.1. D(t¹, …, tᵐ) = Σ_{σ∈Sₘ} (sgn σ) t_{σ(1),1} ⋯ t_{σ(m),m}.

Proof. This is just the last remark of the last section, with the constant c_I = 1, since D(δ¹, …, δᵐ) = 1, and with the notation changed to the usual matrix form t_ij. □

Corollary 1. D(t*) = D(t).

Proof. If we reorder the factors of the product t_{σ(1),1} ⋯ t_{σ(m),m} in the order of the values σ(i), the product becomes t_{1,ρ(1)} ⋯ t_{m,ρ(m)}, where ρ = σ⁻¹. Since σ ↦ σ⁻¹ is a bijection from Sₘ to Sₘ, and since sgn(σ⁻¹) = sgn σ, the sum in the lemma can be rewritten as Σ_{ρ∈Sₘ} (sgn ρ) t_{1,ρ(1)} ⋯ t_{m,ρ(m)}. But this is D(t*). □

Corollary 2. D(t) is an alternating m-linear functional of the rows of t.

Now let dim V = m, and let f be any nonzero alternating m-form on V. For any T in Hom V the functional f_T defined by f_T(ξ₁, …, ξₘ) = f(Tξ₁, …, Tξₘ) also belongs to aᵐ. Since aᵐ is one-dimensional, f_T = k_T f for some constant k_T. Moreover, k_T is independent of f, since if g_T = k_T′g and g = cf, we must have cf_T = k_T′cf, and k_T′ = k_T. This unique constant is called the determinant of T; we shall designate it Δ(T). Note that Δ(T) is defined independently of any basis for V.

Theorem 6.1. Δ(S∘T) = Δ(S)Δ(T).
Proof.

Δ(S∘T)f(ξ₁, …, ξₘ) = f((S∘T)(ξ₁), …, (S∘T)(ξₘ)) = f(S(T(ξ₁)), …, S(T(ξₘ))) = Δ(S)f(T(ξ₁), …, T(ξₘ)) = Δ(T)Δ(S)f(ξ₁, …, ξₘ).

Now divide out f. □

Theorem 6.2. If θ is an isomorphism from V to W, and if T ∈ Hom V and S = θ∘T∘θ⁻¹, then Δ(S) = Δ(T).

Proof. If f is any nonzero alternating m-form on W, and if we define g by g(ξ₁, …, ξₘ) = f(θξ₁, …, θξₘ), then g is a nonzero alternating m-form on V. Now f(S∘θξ₁, …, S∘θξₘ) = Δ(S)f(θξ₁, …, θξₘ) = Δ(S)g(ξ₁, …, ξₘ), and also f(S∘θξ₁, …, S∘θξₘ) = f(θ∘Tξ₁, …, θ∘Tξₘ) = g(Tξ₁, …, Tξₘ) = Δ(T)g(ξ₁, …, ξₘ). Thus Δ(S)g = Δ(T)g and Δ(S) = Δ(T). □

The reader will expect the two notions of determinant we have introduced to agree; we prove this now.

Corollary 1. If t is the matrix of T with respect to some basis in V, then D(t) = Δ(T).

Proof. If θ is the coordinate isomorphism, then T̄ = θ∘T∘θ⁻¹ is in Hom ℝᵐ and Δ(T̄) = Δ(T) by the theorem. Also, the columns of t are the m-tuples T̄(δ¹), …, T̄(δᵐ). Thus D(t) = D(t¹, …, tᵐ) = D(T̄(δ¹), …, T̄(δᵐ)) = Δ(T̄)D(δ¹, …, δᵐ) = Δ(T̄). Altogether we have D(t) = Δ(T). □

Corollary 2. If s and t are m × m matrices, then D(s·t) = D(s)D(t).

Proof. D(s·t) = Δ(S∘T) = Δ(S)Δ(T) = D(s)D(t). □

Corollary 3. D(t) = 0 if and only if t is singular.

Proof. If t is nonsingular, then t⁻¹ exists and D(t)D(t⁻¹) = D(t·t⁻¹) = D(I) = 1. In particular, D(t) ≠ 0. If t is singular, some column, say t¹, is a linear combination of the others, t¹ = Σ₂ᵐ c_i t^i, and D(t¹, …, tᵐ) = Σ₂ᵐ c_i D(t^i, t², …, tᵐ) = 0, since each term in the sum evaluates D at an m-tuple having two identical elements, and so is 0 by the alternating property. □

We still have to show that Δ has all the properties we ascribed to it in Chapter 2. Some of them are in hand. We know that Δ(S∘T) = Δ(S)Δ(T), and the one- and two-dimensional properties are trivial. Thus, if T interchanges independent vectors α₁ and α₂ in a two-dimensional space, then its matrix with respect to them as a basis is t = [0 1; 1 0], and so Δ(T) = D(t) = −1. The following lemma will complete the job.

Lemma 6.2. Consider D(t) = D(t¹, …, tᵐ) under the special assumption that tᵐ = δᵐ. If s is the (m − 1) × (m − 1) matrix obtained from the m × m matrix t by deleting its last row and last column, then D(s) = D(t).
Proof. This can be made to follow from an inspection of the formula of Lemma 6.1, but we shall argue directly. If t has δᵐ also as its jth column for some j ≠ m, then of course D(t) = 0 by the alternating property. This means that D(t) is unchanged if the jth column is altered in the mth place, and therefore D(t) depends only on the values t_ij in the rows i ≠ m. That is, D(t) depends only on s. Now t ↦ s is clearly a surjective mapping to ℝ^{(m−1)×(m−1)}, and, as a function of s, D(t) is alternating (m − 1)-linear. It therefore is a constant multiple of the determinant D on ℝ^{(m−1)×(m−1)}. To see what the constant is, we evaluate at t = I. Then D(s) = 1 = D(t) for this special choice, and so D(s) = D(t) in general. □

In order to get a hold on the remaining two properties, we consider an m × m matrix t whose last m − n columns are δ^{n+1}, …, δᵐ, and we apply the above lemma repeatedly. We have, first, D(t) = D((t)_{mm}), where (t)_{mm} is the (m − 1) × (m − 1) matrix obtained from t by deleting the last row and the last column. Since this matrix has δ^{m−1} as its last column (δ^{m−1} now being an (m − 1)-tuple), the same argument shows that its determinant is the same as that of the (m − 2) × (m − 2) matrix obtained from it in the same way. We can keep on going as long as the δ-columns last, and thus see that D(t) is the determinant of the n × n matrix that is the upper left corner of t. If we interpret this in terms of transformations, we have the following lemma.

Lemma 6.3. Suppose that V is m-dimensional and that T in Hom V is the identity on an (m − n)-dimensional subspace X. Let Y be a complement of X, and let p be the projection on Y along X. Then p∘(T↾Y) can be considered an element of Hom Y, and Δ(T) = Δ_Y(p∘(T↾Y)).

Proof. Let α₁, …, αₙ be a basis for Y, and let α_{n+1}, …, αₘ be a basis for X. Then {α_i}₁^m is a basis for V, and since T(α_i) = α_i for i = n + 1, …, m, the matrix for T has δ^i as its ith column for i = n + 1, …, m. The lemma will therefore follow from our above discussion if we can show that the matrix of p∘(T↾Y) in Hom Y is the n × n upper left corner of t. The student should be able to see this if he visualizes what vector (p∘T)(α_i) is for i ≤ n. □

Corollary. In the above situation, if Y is also invariant under T, then Δ(T) = Δ_Y(T↾Y).

Proof. The proof follows immediately, since now p∘(T↾Y) = T↾Y. □

If the roles of X and Y are interchanged, both being invariant under T and T being the identity on Y, then this same lemma tells us that Δ(T) = Δ_X(T↾X). If we only know that X and Y are T-invariant, then we can factor T into a commuting product T = T₁∘T₂ = T₂∘T₁, where T₁ and T₂ are of the two more special types discussed above, and so have the rule

Δ(T) = Δ(T₁)Δ(T₂) = Δ_X(T↾X)Δ_Y(T↾Y),

another of our properties listed in Chapter 2.
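The permutation-sum formula of Lemma 6.1 can be run directly on a machine; the following sketch (ours, not the book's) checks it against a library determinant. Its cost grows like m!, so it is for small m only:

    import numpy as np
    from itertools import permutations

    def perm_sign(sigma):
        # Sign from the cycle decomposition: a cycle of length L
        # contributes (-1)^(L-1).
        sign, seen = 1, [False] * len(sigma)
        for i in range(len(sigma)):
            if not seen[i]:
                j, length = i, 0
                while not seen[j]:
                    seen[j] = True
                    j = sigma[j]
                    length += 1
                sign *= -1 if length % 2 == 0 else 1
        return sign

    def D(t):
        m = t.shape[0]
        return sum(perm_sign(s) * np.prod([t[s[j], j] for j in range(m)])
                   for s in permutations(range(m)))

    t = np.random.default_rng(1).standard_normal((4, 4))
    print(np.isclose(D(t), np.linalg.det(t)))    # True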
The final rule is also a consequence of the above lemma. If T is the identity on X and also on V/X, then it isn't too hard to see that p∘(T↾Y) is the identity, as an element of Hom Y, and so Δ(T) = 1 by the lemma.

We now prove the theorem concerning "expansion by minors (or cofactors)". Let t be an m × m matrix, and let (t)_{pr} be the (m − 1) × (m − 1) submatrix obtained from t by deleting the pth row and rth column. Then:

Theorem 6.3. D(t) = Σ_{i=1}^m (−1)^{i+r} t_{ir} D((t)_{ir}).

That is, we can evaluate D(t) by going down the rth column, multiplying each element by the determinant of the (m − 1) × (m − 1) matrix associated with it, and adding. The two occurrences of 'D' in the theorem are of course over dimensions m and m − 1, respectively.

Proof. Consider D(t) = D(t¹, …, tᵐ) under the special assumption that t^r = δ^p. Since D(t) is an alternating linear functional both of the columns of t and of the rows of t, we can move the rth column and pth row to the right-bottom border and apply Lemma 6.2. Thus D(t) = (−1)^{m−r}(−1)^{m−p} D((t)_{pr}) = (−1)^{p+r} D((t)_{pr}), assuming that the rth column of t is δ^p. In general, t^r = Σ_{i=1}^m t_{ir} δ^i, and if we expand D(t¹, …, tᵐ) with respect to this sum in the rth place, and if we use the above evaluation of the separate terms of the resulting sum, we get D(t) = Σ_{i=1}^m (−1)^{i+r} t_{ir} D((t)_{ir}). □

Corollary 1. If s ≠ r, then Σ_{i=1}^m (−1)^{i+r} t_{is} D((t)_{ir}) = 0.

Proof. We now have the expansion of the theorem for a matrix with identical sth and rth columns, and the determinant of this matrix is zero by the alternating property. □

For simpler notation, set c_ij = (−1)^{i+j} D((t)_{ij}). This is called the cofactor of the element t_ij in t. Our two results together say that

Σ_{i=1}^m c_{ir} t_{is} = δ^r_s D(t).

In particular, if D(t) ≠ 0, then the matrix s whose entries are s_{ri} = c_{ir}/D(t) is the inverse of t.

This observation gives us a neat way to express the solution of a system of linear equations. We want to solve t·x = y for x in terms of y, supposing that D(t) ≠ 0. Since s is the inverse of t, we have x = s·y. That is, x_j = Σ_{i=1}^m s_{ji} y_i = (Σ_{i=1}^m y_i c_{ij})/D(t) for j = 1, …, m. According to our expansion theorem, the numerator in this expression is exactly the determinant d_j of the matrix obtained from t by replacing its jth column by the m-tuple y. Hence, with d_j defined this way, the solution to t·x = y is the m-tuple

x = ⟨d₁/D(t), …, d_m/D(t)⟩.

This is Cramer's rule. It was stated in slightly different notation in Section 2.5.
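A quick numerical check of Cramer's rule (ours, added for illustration):

    import numpy as np

    rng = np.random.default_rng(2)
    t = rng.standard_normal((4, 4))
    y = rng.standard_normal(4)

    x = np.empty(4)
    for j in range(4):
        tj = t.copy()
        tj[:, j] = y                   # replace the j-th column by y
        x[j] = np.linalg.det(tj) / np.linalg.det(t)   # x_j = d_j / D(t)

    print(np.allclose(t @ x, y))       # True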
7. THE EXTERIOR ALGEBRA

Our final job is to introduce a multiplication operation between alternating n-linear functionals (also now called exterior n-forms). We first extend the tensor product operation that we have used to fashion elementary covariant tensors out of functionals.

Definition. If f ∈ (V*)^⊗n and g ∈ (V*)^⊗l, then f ⊗ g is that element of (V*)^⊗(n+l) defined as follows:

(f ⊗ g)(ξ₁, …, ξ_{n+l}) = f(ξ₁, …, ξₙ)g(ξ_{n+1}, …, ξ_{n+l}).

We naturally ask how this operation combines with the projection π of (V*)^⊗(n+l) onto a^{n+l}.

Theorem 7.1. π(f ⊗ g) = π(f ⊗ πg) = π(πf ⊗ g).

Proof. We have

π(f ⊗ πg) = (1/(n + l)!) Σ_σ (sgn σ)(f ⊗ πg)^σ
= (1/(n + l)!) Σ_σ (sgn σ)(f ⊗ (1/l!)Σ_ρ (sgn ρ)g^ρ)^σ
= (1/((n + l)! l!)) Σ_{σ,ρ} (sgn σ)(sgn ρ)(f ⊗ g^ρ)^σ.

We can regard ρ as acting on the full n + l places of f ⊗ g by taking it as the identity on the first n places. Then (f ⊗ g^ρ)^σ = (f ⊗ g)^{ρσ}. Set ρσ = σ′. For each σ′ there are exactly l! pairs ⟨ρ, σ⟩ with ρσ = σ′, namely, the pairs {⟨ρ, ρ⁻¹σ′⟩ : ρ ∈ S_l}. Thus the above sum is

(1/(n + l)!) Σ_{σ′} (sgn σ′)(f ⊗ g)^{σ′} = π(f ⊗ g).

The proof for π(πf ⊗ g) is essentially the same. □

Definition. If f ∈ aⁿ and g ∈ a^l, then f ∧ g = (n+l over n) π(f ⊗ g).

Lemma 7.1. f₁ ∧ f₂ ∧ ⋯ ∧ f_k = (n!/(n₁!n₂!⋯n_k!)) π(f₁ ⊗ ⋯ ⊗ f_k), where n_i is the order of f_i, i = 1, …, k, and n = Σ₁^k n_i.

Proof. This is simply an induction, using the definition of the wedge operation ∧ and the above theorem. □

Corollary. If λ_i ∈ V*, i = 1, …, n, then

λ₁ ∧ ⋯ ∧ λₙ = n! π(λ₁ ⊗ ⋯ ⊗ λₙ).

In particular, if q₁ < ⋯ < qₙ and {μ_i}₁^m is a basis for V*, then μ_{q₁} ∧ ⋯ ∧ μ_{qₙ} = n!π(μ_q) = the basis element ν_q of aⁿ.
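The lowest case is worth writing out (an added example). For λ, μ ∈ V*, the definition gives

λ ∧ μ = 2π(λ ⊗ μ) = λ ⊗ μ − μ ⊗ λ,

so that (λ ∧ μ)(ξ, η) = λ(ξ)μ(η) − λ(η)μ(ξ), a 2 × 2 determinant.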
Theorem 7.2. If f ∈ aⁿ and g ∈ a^l, then g ∧ f = (−1)^{ln} f ∧ g. In particular, λ ∧ λ = 0 for λ ∈ V*.

Proof. We have g ⊗ f = (f ⊗ g)^σ, where σ is the permutation moving each of the last l places over each of the first n places. Thus σ is the product of ln transpositions, sgn σ = (−1)^{ln}, and

π(g ⊗ f) = π((f ⊗ g)^σ) = (sgn σ)π(f ⊗ g) = (−1)^{ln}π(f ⊗ g).

We multiply by (n+l over n) and have the theorem. □

Corollary. If {λ_i}₁ⁿ ⊂ V*, then λ₁ ∧ ⋯ ∧ λₙ = 0 if and only if the sequence {λ_i}₁ⁿ is dependent.

Proof. If {λ_i} is independent, it can be extended to a basis for V*, and then λ₁ ∧ ⋯ ∧ λₙ is some basis vector ν_q of aⁿ by the above corollary. In particular, λ₁ ∧ ⋯ ∧ λₙ ≠ 0. If {λ_i} is dependent, then one of its elements, say λ₁, is a linear combination of the rest, λ₁ = Σ₂ⁿ c_i λ_i and λ₁ ∧ λ₂ ∧ ⋯ ∧ λₙ = Σ_{i=2}^n c_i λ_i ∧ (λ₂ ∧ ⋯ ∧ λₙ). The ith of these terms repeats λ_i, and so is 0 by the lemma and the above corollary. □

Lemma 7.2. The mapping ⟨f, g⟩ ↦ f ∧ g is a bilinear mapping from aⁿ × a^l to a^{n+l}.

Proof. This follows at once from the obvious bilinearity of f ⊗ g. □

We conclude with an important extension theorem.

Theorem 7.3. Let θ be the alternating n-linear map ⟨λ₁, …, λₙ⟩ ↦ λ₁ ∧ ⋯ ∧ λₙ from (V*)ⁿ to aⁿ. Then for any alternating n-linear functional F(λ₁, …, λₙ) on (V*)ⁿ, there is a uniquely determined linear functional G on aⁿ such that F = G∘θ. The mapping G ↦ F is thus a canonical isomorphism from (aⁿ)* to aⁿ(V*).

Proof. The straightforward way to prove this is to define G by establishing its necessary values on a basis, using the equation F = G∘θ, and then to show from the linearity of G, the alternating multilinearity of θ, and the alternating multilinearity of F that the identity F = G∘θ holds everywhere. This computation becomes notationally complex. Instead, we shall be devious. We shall see that by proving more than the theorem asserts we get a shorter proof of the theorem.

We consider the space aⁿ(V*) of all alternating n-linear functions on (V*)ⁿ. We know from Theorem 5.2 that d(aⁿ(V*)) = (m over n), since d(V*) = d(V) = m.
Now for each functional G in (aⁿ)*, the functional G∘θ is alternating and n-linear, and so G ↦ F = G∘θ is a mapping from (aⁿ)* to aⁿ(V*) which is clearly linear. Moreover, it is injective, for if G ≠ 0, then F(μ_{q(1)}, …, μ_{q(n)}) = G(ν_q) ≠ 0 for some basis vector ν_q = μ_{q(1)} ∧ ⋯ ∧ μ_{q(n)} of aⁿ. Since d(aⁿ(V*)) = (m over n) = d(aⁿ) = d((aⁿ)*), the mapping is an isomorphism (by the corollary of Theorem 2.4, Chapter 2). In particular, every F in aⁿ(V*) is of the form G∘θ. □

It can be shown further that the property asserted in the above theorem is an abstract characterization of aⁿ. By this we mean the following. Suppose that a vector space X and an alternating mapping φ from (V*)ⁿ to X are given, and suppose that every alternating functional F on (V*)ⁿ extends uniquely to a linear functional G on X (that is, F = G∘φ). Then X is isomorphic to aⁿ, and in such a way that φ becomes θ. To see this we simply note that the hypothesis of unique extensibility is exactly the hypothesis that Φ: G ↦ F = G∘φ is an isomorphism from X* to aⁿ(V*). The theorem gave an isomorphism Θ from (aⁿ)* to aⁿ(V*), and the adjoint (Φ⁻¹∘Θ)* is thus an isomorphism from X** to (aⁿ)**, that is, from X to aⁿ. We won't check that φ "becomes" θ.

By virtue of Corollary 1 of Theorem 6.2, the identity D(t) = D(t*) is the matrix form of the more general identity Δ(T) = Δ(T*), and it is interesting to note the "coordinate free" proof of this equation. Here, of course, T ∈ Hom V. We first note that the identity (T*λ)(ξ) = λ(Tξ) carries through the definitions of ⊗ and ∧ to give

(T*λ₁) ∧ ⋯ ∧ (T*λₙ)(ξ₁, …, ξₙ) = λ₁ ∧ ⋯ ∧ λₙ(Tξ₁, …, Tξₙ). (*)

Also, for each ξ ∈ Vⁿ, ev_ξ : ⟨λ₁, …, λₙ⟩ ↦ λ₁ ∧ ⋯ ∧ λₙ(ξ₁, …, ξₙ) is an alternating n-linear functional on (V*)ⁿ, that is, an element of aⁿ(V*). The left member of (*) is thus ev_ξ(T*λ₁, …, T*λₙ), and, if n = dim V, this is Δ(T*)ev_ξ(λ₁, …, λₙ) by the definition of Δ. By the same definition the right side of (*) becomes Δ(T)[λ₁ ∧ ⋯ ∧ λₙ(ξ₁, …, ξₙ)] = Δ(T)ev_ξ(λ₁, …, λₙ). Thus (*) implies the identity Δ(T*)ev_ξ = Δ(T)ev_ξ. Since ev_ξ ≠ 0 if ξ = {ξ_i}₁ⁿ is independent, we have proved that Δ(T*) = Δ(T).

We call a wedge product λ₁ ∧ ⋯ ∧ λₙ of functionals λ_i ∈ V* a multivector. We saw above that λ₁ ∧ ⋯ ∧ λₙ ≠ 0 if and only if {λ_i}₁ⁿ is independent, in which case {λ_i}₁ⁿ spans an n-dimensional subspace of V*. The following lemma shows that this geometric connection is not accidental.

Lemma 7.3. Two independent n-tuples {λ_i}₁ⁿ and {μ_i}₁ⁿ in V* have the same linear span if and only if μ₁ ∧ ⋯ ∧ μₙ = k(λ₁ ∧ ⋯ ∧ λₙ) for some k.

Proof. If {μ_j}₁ⁿ ⊂ L({λ_i}₁ⁿ), then each μ_j is a linear combination of the λ_i's, and if we expand μ₁ ∧ ⋯ ∧ μₙ according to these basis expansions, we get k(λ₁ ∧ ⋯ ∧ λₙ). If, furthermore, {μ_i}₁ⁿ is independent, then k cannot be zero.
Now suppose, conversely, that μ₁ ∧ ⋯ ∧ μₙ = k(λ₁ ∧ ⋯ ∧ λₙ), where k ≠ 0. This implies first that {μ_i}₁ⁿ is independent, and then that μ_j ∧ (λ₁ ∧ ⋯ ∧ λₙ) = 0 for each j, so that each μ_j is dependent on {λ_i}₁ⁿ. Together, these two consequences imply that the set {μ_i}₁ⁿ has the same linear span as {λ_i}₁ⁿ. □

This lemma shows that a multivector has a relationship to the subspace it determines like that of a single vector to its span.

8. EXTERIOR POWERS OF SCALAR PRODUCT SPACES

Let V be a finite-dimensional vector space, and let ( , ) be a nondegenerate (nonsingular) symmetric bilinear form on V. In this and the next section we shall call any such bilinear form a scalar product, even though it may not be positive definite. We know that the bilinear form ( , ) induces an isomorphism of V with V* sending y ∈ V into ȳ ∈ V*, where ȳ(x) = (x, y) for all x ∈ V. We then get a nondegenerate form (scalar product), which we shall continue to denote by ( , ), on V* by setting (ū, v̄) = (u, v). We also obtain a nondegenerate scalar product on a^q by setting

(ū₁ ∧ ⋯ ∧ ū_q, v̄₁ ∧ ⋯ ∧ v̄_q) = det (u_i, v_j). (8.1)

To check that (8.1) makes sense, we first remark that for fixed v₁, …, v_q, the right-hand side of (8.1) is an antisymmetric multilinear function of the functionals ū₁, …, ū_q, and therefore corresponds to a linear function on a^q by Theorem 7.3. Similarly, holding the ū's fixed determines a linear function on a^q, and (8.1) is well defined and extends to a bilinear form on a^q. The right-hand side of (8.1) is clearly symmetric in u and v, so that the bilinear form we get is indeed symmetric. To see that it is nondegenerate, let us choose a basis u₁, …, uₙ so that

(u_i, u_j) = 0 for i ≠ j, (u_i, u_i) = ±1. (8.2)

(We can always find such a basis by Theorem 7.1 of Chapter 2.) We know that {ū_i} = {ū_{i₁} ∧ ⋯ ∧ ū_{i_q}} forms a basis for a^q, where i = ⟨i₁, …, i_q⟩ ranges over all q-tuplets of integers such that 1 ≤ i₁ < ⋯ < i_q ≤ n, and we claim that

(ū_i, ū_j) = 0 if i ≠ j, (ū_i, ū_i) = ±1. (8.3)

In fact, if i ≠ j, then i_r ≠ j_s for some value of r between 1 and q and for all s. In this case one whole row of the matrix ((u_{i_l}, u_{j_m})) vanishes, namely, the rth row. Thus (8.1) gives zero in this case. If i = j, then (8.2) says that the matrix has ±1 down the diagonal and zeros elsewhere, establishing (8.3), and thus the fact that ( , ) is nondegenerate on a^q. In particular, we have

(ū₁ ∧ ⋯ ∧ ūₙ, ū₁ ∧ ⋯ ∧ ūₙ) = (−1)^#, (8.4)

where # is the number of minus signs occurring in (8.2).
9. THE STAR OPERATOR

Let V be a finite-dimensional vector space endowed with a nondegenerate scalar product as in Section 8. The space aⁿ is one-dimensional if n is the dimension of V. The induced scalar product on aⁿ is nondegenerate, so that (u, u) is either always positive or always negative for all nonzero u ∈ aⁿ. In particular, there are exactly two u's in aⁿ with (u, u) = ±1. Let us choose one of them and hold it fixed for the remainder of this section. Geometrically, this amounts to choosing an orientation on V. We thus have picked a u ∈ aⁿ with

(u, u) = ±1. (9.1)

Let v̄ be some fixed element of a^q. Then for any y ∈ a^{n−q}, v̄ ∧ y ∈ aⁿ, and so we can write v̄ ∧ y = f_v(y)u, where f_v(y) depends linearly on y. Since the induced scalar product ( , ) on a^{n−q} is nondegenerate, there is a unique element *v̄ ∈ a^{n−q} such that (y, *v̄) = f_v(y). To repeat, we have assigned a *v̄ ∈ a^{n−q} to each v̄ ∈ a^q by setting

(y, *v̄)u = v̄ ∧ y. (9.2)

We have thus defined a map, *, from a^q to a^{n−q}. It is clear from (9.2) that this map is linear.

Let u₁, …, uₙ be a basis for V satisfying (8.2) and also u = ū₁ ∧ ⋯ ∧ ūₙ, and construct the corresponding bases for the spaces a^q and a^{n−q}. Then ū_i ∧ ū_j = 0 if any i_l occurring in the q-tuplet i also occurs in j. If no i_l occurs in j, then

ū_i ∧ ū_j = ε_k u, where ε_k = sgn k

and k = ⟨i₁, …, i_q, j₁, …, j_{n−q}⟩, regarded as a permutation of ⟨1, …, n⟩. If we compare this with (9.2) and (8.3), we see that

*ū_i = ±ε_k ū_j, (9.3)

where j is the complementary (n − q)-tuplet and the sign is the same as that occurring in (8.3), i.e., the sign is positive or negative according as the number of j_l with (u_{j_l}, u_{j_l}) = −1 which appear in j is even or odd. Applying * to (9.3), we see that

**v̄ = (−1)^{q(n−q)+#} v̄. (9.4)

Let v and w be elements of a^q. Then

(*v, *w)u = v ∧ *w = (−1)^{q(n−q)} *w ∧ v = (−1)^{q(n−q)}(**w, v)u.

If we apply (9.4), we see that

(*v, *w) = (−1)^#(v, w). (9.5)
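Example (added for concreteness). Take V = ℝ³ with the ordinary positive definite scalar product, so # = 0, and u = ū₁ ∧ ū₂ ∧ ū₃. Formula (9.3) then gives the familiar star on one-forms:

*ū₁ = ū₂ ∧ ū₃, *ū₂ = −ū₁ ∧ ū₃ = ū₃ ∧ ū₁, *ū₃ = ū₁ ∧ ū₂,

the sign in the middle case coming from sgn⟨2, 1, 3⟩ = −1; and (9.4) reduces to **v̄ = (−1)^{q(n−q)}v̄ = v̄ for q = 1, since q(n − q) = 2 is even.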
CHAPTER 8

INTEGRATION

1. INTRODUCTION

In this chapter we shall present a theory of integration in n-dimensional Euclidean space Eⁿ, which the reader will remember is simply Cartesian n-space ℝⁿ together with the standard scalar product. Our main item of business is to introduce a notion of size for subsets of Eⁿ (area in two dimensions, volume in three, …). Before proceeding to the formal definitions, let us see what properties we would like our notion of size to have. We are looking for a function μ which assigns a number μ(A) to bounded subsets A ⊂ Eⁿ.

i) We would like μ(A) to be a nonnegative real number.
ii) If A ⊂ B, we would expect to have μ(A) ≤ μ(B).
iii) If A and B are disjoint (that is, A ∩ B = ∅), then we would expect to have μ(A ∪ B) = μ(A) + μ(B).
iv) Let T be any Euclidean motion.* For any set A let TA be the set of all points of the form Tx, where x ∈ A. We then would expect to have μ(TA) = μ(A). (Thus we want "congruent" sets to have the same size.)
v) We would expect a "lower-dimensional set" (where this is suitably defined) to have zero size. Thus points in the line, curves in the plane, surfaces in three-space, etc., should all have zero size.
vi) By the same token, we would expect open sets to have positive size.

In the above discussion we did not specify what kind of sets we were talking about. One might be ambitious and try to assign a size to every subset of Eⁿ. This proves to be impossible, however, for the following reason. Let U and V be any two bounded open subsets of E³. It can be shown† that we can find

* Recall that a Euclidean motion is an isometry of Eⁿ and can thus be represented as the composition of a translation and an orthogonal transformation.
† S. Banach and A. Tarski, Sur la décomposition des ensembles de points en parties respectivement congruentes, Fund. Math. 6, 244-277 (1924). R. M. Robinson, On the decomposition of spheres, Fund. Math. 34, 246-260 (1947).
decompositions $U = U_1 \cup \dots \cup U_k$ and $V = V_1 \cup \dots \cup V_k$ with $U_i \cap U_j = \varnothing = V_i \cap V_j$ for $i \ne j$, and Euclidean motions $T_i$ with $T_iU_i = V_i$. In other words, we can break up $U$ into finitely many pieces, move these pieces around, and then recombine them to get $V$. Needless to say, the sets $U_i$ will have to look very bad. A moment's reflection shows that if we wish to assign a size to all subsets (including those like $U_i$), we cannot satisfy (ii), (iii), (iv), and (vi). In fact, (iii) [repeated $(k - 1)$ times] implies that

$\mu(U) = \sum_{i=1}^{k} \mu(U_i),$

and (iv) implies that $\mu(U_i) = \mu(V_i)$. Thus $\mu(U) = \mu(V)$; that is, the sizes of any two bounded open sets would coincide. Since any open set contains two disjoint open sets, this implies, by (ii), that $\mu(U) \ge 2\mu(U)$, so $\mu(U) = 0$.

We are thus faced with a choice. Either we dispense with some of requirements (i) through (vi) above, or we do not assign a size to every subset of $\mathbb{E}^n$. Since our requirements are reasonable, we prefer the second alternative. This means, of course, that now, in addition to introducing a notion of size, we must describe the class of "good" sets we wish to admit. We shall proceed axiomatically, listing some "reasonable" axioms for a class of subsets and a function $\mu$.

2. AXIOMS

Our axioms will concern a class $\mathcal{D}$ of subsets of $\mathbb{E}^n$ and a function $\mu$ defined on $\mathcal{D}$. (That is, $\mu(A)$ is defined if $A$ is a subset of $\mathbb{E}^n$ belonging to our collection $\mathcal{D}$.)

I. $\mathcal{D}$ is a collection of subsets of $\mathbb{E}^n$ such that:

$\mathcal{D}$1. If $A \in \mathcal{D}$ and $B \in \mathcal{D}$, then $A \cup B \in \mathcal{D}$, $A \cap B \in \mathcal{D}$, and $A - B \in \mathcal{D}$.
$\mathcal{D}$2. If $A \in \mathcal{D}$ and $T$ is a translation, then $TA \in \mathcal{D}$.
$\mathcal{D}$3. The set $\square_0^1 = \{x : 0 \le x^i < 1\}$ belongs to $\mathcal{D}$.

II. The real-valued function $\mu$ has the following properties:

$\mu$1. $\mu(A) \ge 0$ for all $A \in \mathcal{D}$.
$\mu$2. If $A \in \mathcal{D}$, $B \in \mathcal{D}$, and $A \cap B = \varnothing$, then $\mu(A \cup B) = \mu(A) + \mu(B)$.
$\mu$3. For any $A \in \mathcal{D}$ and any translation $T$, we have $\mu(TA) = \mu(A)$.
$\mu$4. $\mu(\square_0^1) = 1$.

Before proceeding, some remarks about our axioms are in order. Axiom $\mathcal{D}$1 will allow us to perform elementary set-theoretical operations with the elements of $\mathcal{D}$. Note that in Axioms $\mathcal{D}$2 and $\mu$3 we are only allowing translations, but in
our list of desired properties we wanted proper behavior with respect to all Euclidean motions in (iv). The reason for this is that we shall show that for "good" choices of $\mathcal{D}$, the axioms, as they stand, uniquely determine $\mu$. It will then turn out that $\mu$ actually satisfies the stronger condition (iv), while we assume the weaker condition $\mu$3 as an axiom.

Fig. 8.1

Axiom $\mathcal{D}$3 guarantees that our theory is not completely trivial, i.e., the collection $\mathcal{D}$ is not empty. Axiom $\mu$4 has the effect of normalizing $\mu$. Without it, any $\mu$ satisfying $\mu$1, $\mu$2, and $\mu$3 could be multiplied by any nonnegative real number, and the new function $\mu'$ so obtained would still satisfy our axioms. In particular, $\mu$4 guarantees that we do not choose $\mu$ to be the trivial function assigning to each $A$ the value zero.

Fig. 8.2

Our program for the next few sections is to make some reasonable choices for $\mathcal{D}$ and to show that for the given $\mathcal{D}$ there exists a unique $\mu$ satisfying $\mu$1 through $\mu$4. An important elementary consequence of the $\mathcal{D}, \mu$-axioms that we shall frequently use without comment is:

$\mu$5. If $A \subset \bigcup_1^k A_i$ and all the sets are in $\mathcal{D}$, then $\mu(A) \le \sum_1^k \mu(A_i)$.

Our beginning work will be largely combinatorial. We will first consider (generalized) rectangles, which are just Cartesian products of intervals, and the way in which a point inside a rectangle determines a splitting of the rectangle into a collection of smaller rectangles, as indicated in Fig. 8.1. This is associated with the fact that the intersection of any two rectangles is a rectangle and the difference of two rectangles is a finite disjoint union of rectangles (see Fig. 8.2).
Fig. 8.3

We call a set $A$ paved if it can be expressed as the union of a finite disjoint collection $p$ of rectangles (a paving of $A$). It will follow from our combinatorial considerations that the collection $\mathcal{D}_{\min}$ of all the paved sets satisfies Axioms $\mathcal{D}$1 through $\mathcal{D}$3 and is the smallest family that does: any other collection $\mathcal{D}$ satisfying the axioms includes $\mathcal{D}_{\min}$. It will then follow that if $\mu$ satisfies $\mu$1 through $\mu$4 on $\mathcal{D}_{\min}$, then it must have the natural value (the product of the lengths of the sides) for a rectangle. This implies that $\mu$ is uniquely defined on $\mathcal{D}_{\min}$ by requirements $\mu$1 through $\mu$4, since the value $\mu(A)$ for any paved set $A$ must be the sum of the natural values for the rectangles in a paving of $A$. The existence of $\mu$ on $\mathcal{D}_{\min}$ thus depends on the crucial lemma that two different pavings of the set $A$ give the same sum. (See Fig. 8.3.) This comes down to the fact that the "intersection" of two pavings of $A$ is a third paving "finer" than either, and the fact that when a single rectangle is broken up, the natural values of $\mu$ for the pieces add up to $\mu$ for the fragmented rectangle. All these considerations are elementary but exceedingly messy in detail. We give the proofs below for the reader to refer to in case of doubt, but he may prefer to study only the definitions and statements of results and then to proceed to Section 6.

3. RECTANGLES AND PAVED SETS

We first introduce some notation and terminology. Let $a = \langle a^1, \dots, a^n \rangle$ and $b = \langle b^1, \dots, b^n \rangle$ be elements of $\mathbb{E}^n$. By the rectangle $\square_a^b$ we shall mean the set of all $x = \langle x^1, \dots, x^n \rangle$ in $\mathbb{E}^n$ with $a^i \le x^i < b^i$. Thus

$\square_a^b = \{x : a^i \le x^i < b^i,\ i = 1, \dots, n\}.$  (3.1)

Note that in order for $\square_a^b$ to be nonempty, we must have $a^i < b^i$ for all $i$. In other words,

$\square_a^b = \varnothing$ if $a^i \ge b^i$ for some $i$.  (3.2)

In the plane ($n = 2$), for instance, our rectangles $\square_a^b$ correspond to ordinary Euclidean rectangles whose sides are parallel to the axes. (We should perhaps use an additional adjective and call our sets level rectangles, braced rectangles, or something else, but for simplicity we shall just call them rectangles.) Note that in the plane our rectangles include the left-hand and lower edges but not the right-hand and upper ones (see Fig. 8.4).
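In computational terms, (3.1) through (3.3) are one-liners. Here is a minimal Python sketch (the helper names are mine, not the book's):

```python
# Half-open rectangles [a^1, b^1) x ... x [a^n, b^n) as pairs of tuples.
def is_empty(a, b):
    # (3.2): the rectangle is empty iff a^i >= b^i for some i
    return any(ai >= bi for ai, bi in zip(a, b))

def content(a, b):
    # the natural value: product of the side lengths (0 if empty)
    if is_empty(a, b):
        return 0.0
    p = 1.0
    for ai, bi in zip(a, b):
        p *= bi - ai
    return p

def intersect(a, b, c, d):
    # (3.3): coordinatewise max of lower corners, min of upper corners
    e = tuple(max(ai, ci) for ai, ci in zip(a, c))
    f = tuple(min(bi, di) for bi, di in zip(b, d))
    return e, f

if __name__ == "__main__":
    e, f = intersect((0.0, 0.0), (2.0, 1.0), (1.0, 0.5), (3.0, 2.0))
    print(e, f, content(e, f))   # (1.0, 0.5) (2.0, 1.0) 0.5
```

Note that (3.2) makes the empty case of the intersection come out automatically, which is exactly the convenience the half-open convention buys.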
Fig. 8.4

For general $n$, if we set $1 = \langle 1, \dots, 1 \rangle$, then our notation coincides with that of $\mathcal{D}$3. We now collect some elementary facts about rectangles. It follows immediately from the definition (3.1) that if $a = \langle a^1, \dots, a^n \rangle$, $b = \langle b^1, \dots, b^n \rangle$, etc., then

$\square_a^b \cap \square_c^d = \square_e^f,$  (3.3)

where $e^i = \max(a^i, c^i)$ and $f^i = \min(b^i, d^i)$, $i = 1, \dots, n$. (The reader should draw various different instances of this equation in the plane to get the correct geometrical feeling.) Note that the case where $\square_a^b \cap \square_c^d = \varnothing$ is included in (3.3) by (3.2). Another immediate consequence of the definition (3.1) is

$T\square_a^b = \square_{Ta}^{Tb}$ for any translation $T$.  (3.4)

We will now establish some elementary results which will imply that any $\mathcal{D}$ satisfying Axioms $\mathcal{D}$1 through $\mathcal{D}$3 must contain all rectangles.

Lemma 3.1. Any rectangle $\square_a^b$ can be written as the disjoint union

$\square_a^b = \bigcup_{r=1}^{k} \square_{a_r}^{b_r},$ where $b_r - a_r \in \square_0^1$.

(What this says is that any "big" rectangle can be written as a finite union of "small" ones.)

Proof. We may assume that $\square_a^b \ne \varnothing$ (otherwise take $k = 0$ in the union). Thus $b^i > a^i$. In particular, if we choose the integer $m$ sufficiently large, $(1/2^m)(b - a)$ will lie in $\square_0^1$. By induction, it therefore suffices to prove that we can decompose $\square_a^b$ into the disjoint union

$\square_a^b = \bigcup_{s=1}^{2^n} \square_{c_s}^{d_s}$ with $d_s - c_s = \tfrac{1}{2}(b - a).$  (3.5)

(For then we can continue to subdivide until the rectangles we get are small enough.) We get this subdivision in the obvious way by choosing the vertex "in the middle" of the rectangle and considering all rectangles obtained by cutting $\square_a^b$ through this point by coordinate hyperplanes. To write down an explicit
Fig. 8.5

formula, it will be convenient to use the set of all subsets of $\{1, \dots, n\}$ as an indexing set, rather than the integers $1, \dots, 2^n$. Let $J$ denote an arbitrary subset of $\{1, 2, \dots, n\}$. Let $a_J = \langle a_J^1, \dots, a_J^n \rangle$ and $b_J = \langle b_J^1, \dots, b_J^n \rangle$ be given by

$a_J^i = \tfrac{1}{2}(a^i + b^i)$ if $i \in J$, $\quad a_J^i = a^i$ if $i \notin J$,

and

$b_J^i = b^i$ if $i \in J$, $\quad b_J^i = \tfrac{1}{2}(a^i + b^i)$ if $i \notin J$.

Then any $x \in \square_a^b$ lies in one and only one $\square_{a_J}^{b_J}$. In other words, $\square_{a_J}^{b_J} \cap \square_{a_K}^{b_K} = \varnothing$ if $J \ne K$ and $\bigcup_{\text{all } J} \square_{a_J}^{b_J} = \square_a^b$. (The case where $n = 2$ is shown in Fig. 8.5.) Since $b_J - a_J = \tfrac{1}{2}(b - a)$ for all $J$, we have proved the lemma. $\square$

We now observe that for any $c \in \square_0^1$ we have, by (3.3),

$\square_0^c = \square_0^1 \cap \square_{c-1}^{c}.$  (3.6)

Let $T_v$ denote translation through the vector $v$. Then $\square_{c-1}^{c} = T_{c-1}\square_0^1$ by (3.4). Thus by Axioms $\mathcal{D}$2 and $\mathcal{D}$3 the rectangle $\square_{c-1}^{c}$ must belong to $\mathcal{D}$. By (3.6) and Axiom $\mathcal{D}$1 we conclude that $\square_0^c \in \mathcal{D}$ for any $c \in \square_0^1$. Observe that $T_a\square_0^{b-a} = \square_a^b$ by (3.4). Thus $\square_a^b \in \mathcal{D}$ whenever $b - a \in \square_0^1$. If we now apply Lemma 3.1, we conclude that

$\square_a^b \in \mathcal{D}$ for all $a$ and $b$.  (3.7)

We make the following definition.

Definition 3.1. A subset $S \subset \mathbb{E}^n$ will be called a paved set if $S$ is the disjoint union of finitely many rectangles.

We can then assert:

Proposition 3.1. Any $\mathcal{D}$ satisfying Axioms $\mathcal{D}$1 through $\mathcal{D}$3 must contain all paved sets. Let $\mathcal{D}_{\min}$ denote the collection of all finite unions of rectangles; then $\mathcal{D}_{\min}$ satisfies Axioms $\mathcal{D}$1 through $\mathcal{D}$3.

Proof. We have already proved the first part of this proposition. We leave the second part as an exercise for the reader.
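The halving step (3.5), with the subsets $J$ as the index set, is short to write out explicitly. A Python sketch (names are mine, not the book's):

```python
# Split [a, b) into 2^n congruent half-open boxes indexed by the subsets J
# of {0, ..., n-1}, following the proof of Lemma 3.1.
from itertools import combinations, chain

def halves(a, b):
    n = len(a)
    m = tuple((ai + bi) / 2 for ai, bi in zip(a, b))    # the middle vertex
    all_J = chain.from_iterable(combinations(range(n), r) for r in range(n + 1))
    pieces = []
    for J in all_J:
        aJ = tuple(m[i] if i in J else a[i] for i in range(n))
        bJ = tuple(b[i] if i in J else m[i] for i in range(n))
        pieces.append((aJ, bJ))
    return pieces

if __name__ == "__main__":
    ps = halves((0.0, 0.0), (1.0, 2.0))
    print(len(ps))                                               # 4 = 2^2
    print(sum((q[0] - p[0]) * (q[1] - p[1]) for p, q in ps))     # 2.0 total
```

Iterating this function is exactly the repeated subdivision the proof appeals to: after $m$ rounds every piece has $d - c = (1/2^m)(b - a)$.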
4. THE MINIMAL THEORY

We are now going to see how far $\mu$ is determined by Axioms $\mu$1 through $\mu$4. In fact, we are going to show that $\mu(\square_a^b)$ is what it should be; i.e., if $a = \langle a^1, \dots, a^n \rangle$ and $b = \langle b^1, \dots, b^n \rangle$, then we must have

$\mu(\square_a^b) = \begin{cases} 0 & \text{if } \square_a^b = \varnothing, \\ (b^1 - a^1) \cdots (b^n - a^n) & \text{if } \square_a^b \ne \varnothing. \end{cases}$  (4.1)

Axiom $\mu$4 says that (4.1) holds for the special case $a = 0$, $b = 1$. Examining the proof of Lemma 3.1 shows that $\square_0^1$ can be written as the disjoint union of $2^n$ rectangles, all congruent (via translation) to $\square_0^{1/2}$, where $\tfrac{1}{2} = \langle \tfrac{1}{2}, \dots, \tfrac{1}{2} \rangle$. Axioms $\mu$2 and $\mu$3 then imply that $\mu(\square_0^{1/2}) = 1/2^n$. Repeating this argument inductively shows that

$\mu(\square_0^{1/2^r}) = \frac{1}{2^{rn}}.$  (4.2)

We shall now use (4.2) to verify (4.1). The idea is to approximate any rectangle by unions of translates of cubes $\square_0^{1/2^r}$. (See Fig. 8.6.)

Fig. 8.6

Observe that in proving (4.1) we need to consider only rectangles of the form $\square_0^c$. In fact, we take $c = b - a$ and observe that $T_{-a}(\square_a^b) = \square_0^c$, so Axiom $\mu$3 implies that $\mu(\square_a^b) = \mu(\square_0^c)$, and by definition $c^1 \cdots c^n = (b^1 - a^1) \cdots (b^n - a^n)$. If $\square_0^c = \varnothing$, then (4.1) is trivially true (from Axiom $\mu$2). Suppose that $\square_0^c \ne \varnothing$. Then $c = \langle c^1, \dots, c^n \rangle$ with $c^i > 0$ for all $i$. For each $r$ there are $n$ integers $N_1, \dots, N_n$ such that (Fig. 8.6)

$\frac{N_i}{2^r} \le c^i < \frac{N_i + 1}{2^r}.$  (4.3)

In what follows, let $k = \langle k_1, \dots, k_n \rangle$, $l = \langle l_1, \dots, l_n \rangle$, etc., denote vectors with integral coordinates (i.e., the $k_i$'s are integers). Let us write $k < l$ if $k_i < l_i$ for all $i$. If $N = \langle N_1, \dots, N_n \rangle$, then it follows from (4.3) and the definitions that

$\square_{(1/2^r)k}^{(1/2^r)k + 1/2^r} \subset \square_0^c$ whenever $0 \le k < N$.

For any $k$ and $l$,

$\square_{(1/2^r)k}^{(1/2^r)k + 1/2^r} \cap \square_{(1/2^r)l}^{(1/2^r)l + 1/2^r} = \varnothing$ if $k \ne l$.
Since

$\bigcup_{0 \le k < N} \square_{(1/2^r)k}^{(1/2^r)k + 1/2^r} \subset \square_0^c,$

we conclude by (4.2) (and Axiom $\mu$2) that

$\mu(\square_0^c) \ge \frac{1}{2^{rn}} \times (\text{the number of } k \text{ satisfying } 0 \le k < N).$

It is easy to see that there are $N_1 \cdot N_2 \cdots N_n$ such $k$, so that

$\mu(\square_0^c) \ge \frac{1}{2^{rn}}(N_1 \cdots N_n) = \left(\frac{N_1}{2^r}\right) \cdots \left(\frac{N_n}{2^r}\right).$

According to (4.3), $N_i/2^r \ge c^i - 1/2^r$, so we have

$\mu(\square_0^c) \ge \left(c^1 - \frac{1}{2^r}\right) \cdots \left(c^n - \frac{1}{2^r}\right).$  (4.4)

Similarly,

$\square_0^c \subset \bigcup_{0 \le k \le N} \square_{(1/2^r)k}^{(1/2^r)k + 1/2^r},$

and we conclude that

$\mu(\square_0^c) \le \left(c^1 + \frac{1}{2^r}\right) \cdots \left(c^n + \frac{1}{2^r}\right).$  (4.5)

Letting $r \to \infty$ in (4.4) and (4.5) proves (4.1).

In deriving (4.1) we made use of Axiom $\mu$4. Examining our argument shows that if $\mu'$ satisfied $\mu$2 and $\mu$3 but not $\mu$4, we could argue in the same manner, except that we would have to multiply everything by the fixed constant $\mu'(\square_0^1)$. To sum up, we have proved:

Proposition 4.1. If $\mu$ satisfies Axioms $\mu$1 through $\mu$4, then the value of $\mu$ on any rectangle is uniquely determined and is given by (4.1). If $\mu'$ satisfies $\mu$1 through $\mu$3, then for any rectangle $\square_a^b$,

$\mu'(\square_a^b) = K\mu(\square_a^b),$ where $K = \mu'(\square_0^1)$.

5. THE MINIMAL THEORY (Continued)

We will now show that formula (4.1) extends to give a unique $\mu$ defined on $\mathcal{D}_{\min}$ so as to satisfy Axioms $\mu$1 through $\mu$4. We must establish essentially two facts.

1) Every union of rectangles can be written as a disjoint union of rectangles. This will then allow us to use Axiom $\mu$2 to determine $\mu(A)$ for every $A \in \mathcal{D}_{\min}$ by setting

$\mu(A) = \sum \mu(\square_{a_i}^{b_i})$

if $A$ is the disjoint union of the $\square_{a_i}^{b_i}$. Since $A$ might be written in another way as a disjoint union of rectangles, this formula is not well defined until we establish that:

2) If $A = \bigcup \square_{a_i}^{b_i} = \bigcup \square_{c_j}^{d_j}$ are two representations of $A$ as a disjoint union of rectangles, then

$\sum \mu(\square_{a_i}^{b_i}) = \sum \mu(\square_{c_j}^{d_j}).$
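Returning for a moment to the dyadic approximation of Section 4: counting the cubes of side $1/2^r$ inside and meeting $\square_0^c$ reproduces the bounds (4.4) and (4.5) numerically. A short Python sketch:

```python
# Dyadic bounds (4.4)-(4.5) for mu of the box [0, c): count cubes of side
# 1/2^r contained in it (floor) and meeting it (ceil).
from math import floor, ceil, prod

def dyadic_bounds(c, r):
    s = 2 ** r
    n = len(c)
    inner = prod(floor(ci * s) for ci in c) / s ** n
    outer = prod(ceil(ci * s) for ci in c) / s ** n
    return inner, outer

if __name__ == "__main__":
    c = (0.7, 1.3)
    for r in (1, 4, 8, 12):
        print(r, dyadic_bounds(c, r))   # both tend to 0.7 * 1.3 = 0.91
```

The two printed values squeeze the product $c^1 \cdots c^n$ between them at rate $1/2^r$, exactly as in the passage from (4.4) and (4.5) to (4.1).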
Fig. 8.7  Fig. 8.8

We first introduce some notation.

Definition 5.1. A paving $p$ of $\mathbb{E}^n$ is a finite collection of mutually disjoint rectangles. The floor of this paving, denoted by $|p|$, is the union of all rectangles belonging to $p$. If $p = \{\square_{a_i}^{b_i}\}$ and $T$ is a translation, we set $Tp = \{T\square_{a_i}^{b_i}\}$. If $p$ and $q$ are two pavings, we say that $q$ is finer than $p$ (and write $q \prec p$) if every rectangle of $p$ is a union of rectangles of $q$. It is clear that if $p \prec r$ and $q \prec p$, then $q \prec r$. Note also that $q \prec p$ implies $|p| \subset |q|$.

Proposition 5.1. Let $p$ and $q$ be any two pavings. There exists a third paving $r$ such that $r \prec p$ and $r \prec q$.

Proof. The idea of the proof is very simple. Each rectangle in $p$ or in $q$ determines $2n$ hyperplanes (each hyperplane containing a face of the rectangle). If we collect all these hyperplanes, they will "enclose" a number of rectangles. We let $r$ consist of those rectangles in this collection which do not contain any smaller rectangle. Figure 8.7 shows the case (for $n = 2$) where $p$ and $q$ each contain one rectangle. Here $r$ contains nine rectangles.

We now fill in the details of this argument. Let $c_1 = \langle c_1^1, \dots, c_1^n \rangle, \dots, c_k = \langle c_k^1, \dots, c_k^n \rangle$ be all the vectors that occur in the description of the rectangles of $p$ and $q$. (In other words, if $\square_a^b \in p$ or $\in q$, then $a$ and $b$ are among the $c$'s.) Let $d_1, \dots, d_{k^n}$ be the vectors of the form $\langle c_{i_1}^1, \dots, c_{i_n}^n \rangle$, where the $i_l$'s range independently from 1 to $k$ (so that there are $k^n$ of them). (See Fig. 8.8 for the case where $n = 2$ and $p$ and $q$ consist of one rectangle each.) For each $d_i$ there is at most one smallest $d_{j(i)}$ such that $d_i < d_{j(i)}$. In fact, if $d_i = \langle c_{i_1}^1, \dots, c_{i_n}^n \rangle$, then set $d_{j(i)} = \langle c_{j_1}^1, \dots, c_{j_n}^n \rangle$, where

$c_{j_l}^l = \min\{c_m^l : c_m^l > c_{i_l}^l\}.$
Let $r = \{\square_{d_i}^{d_{j(i)}}\}$. Then $r$ is finer than $p$ and $q$. In fact, if $\square_a^b \in p$, say, then $a = d_\alpha$ and $b = d_\beta$ for suitable $\alpha$ and $\beta$, and

$\square_{d_\alpha}^{d_\beta} = \bigcup_{d_\alpha \le d_i,\ d_{j(i)} \le d_\beta} \square_{d_i}^{d_{j(i)}}.$  (5.1)

To see this, observe that if $x \in \square_{d_\alpha}^{d_\beta}$, then $d_\alpha \le x < d_\beta$. Choose a largest $d_i \le x$. Then $d_i \le x < d_{j(i)}$, so $x \in \square_{d_i}^{d_{j(i)}}$. This proves the proposition. We will later want to use the particular form of the $r$ we constructed to find additional information. $\square$

We can now prove (1) and (2).

Lemma 5.1. Let $p_1, \dots, p_l$ be pavings. Then there exists a paving $s$ such that $|s| = |p_1| \cup \dots \cup |p_l|$.

Proof. By repeated applications of Proposition 5.1 we can choose a paving $r$ which is finer than all the $p_i$'s. Then each $|p_i|$ is the union of suitable rectangles of $r$. Let $s$ be the collection of all these rectangles occurring in all the $p_i$'s. Then $|s| = |p_1| \cup \dots \cup |p_l|$. $\square$

In particular, we have proved (1). More generally, we have shown that every $A \in \mathcal{D}_{\min}$ is of the form $A = |p|$ for a suitable paving $p$. We now wish to turn our attention to (2).

Lemma 5.2. Let $c_1^1 < \dots < c_{r_1}^1$, $c_1^2 < \dots < c_{r_2}^2$, ..., $c_1^n < \dots < c_{r_n}^n$ be $n$ sequences of numbers. Then

$\mu\Big(\square_{\langle c_1^1, \dots, c_1^n \rangle}^{\langle c_{r_1}^1, \dots, c_{r_n}^n \rangle}\Big) = \sum_{\substack{1 \le i_1 < r_1,\ \dots,\ 1 \le i_n < r_n}} \mu\Big(\square_{\langle c_{i_1}^1, \dots, c_{i_n}^n \rangle}^{\langle c_{i_1+1}^1, \dots, c_{i_n+1}^n \rangle}\Big).$

Proof. In fact, $c_{r_i}^i - c_1^i = (c_2^i - c_1^i) + (c_3^i - c_2^i) + \dots + (c_{r_i}^i - c_{r_i-1}^i)$, so that the lemma follows from (4.1) when we multiply out all the factors. $\square$

We now prove (2). Let $p = \{\square_{a_i}^{b_i}\}$ and $s = \{\square_{c_j}^{d_j}\}$, where $A = |p| = |s|$. Let $r$ be the paving we constructed in the proof of Proposition 5.1. Let $t = \{\square_{e_m}^{f_m}\}$ be the collection of those rectangles of $r$ which lie in $|p| = |s|$. Then to prove (2) it suffices to show that

$\sum \mu(\square_{e_m}^{f_m}) = \sum \mu(\square_{a_i}^{b_i}) = \sum \mu(\square_{c_j}^{d_j}).$  (5.2)

Now each rectangle $\square_{a_i}^{b_i}$ is decomposed into rectangles $\square_{d_i}^{d_{j(i)}}$ according to (5.1); that is, $a_i = d_\alpha$, $b_i = d_\beta$, etc. By construction of the $d$'s, this is exactly a decomposition of the type described in Lemma 5.2. Thus (5.1) implies that

$\mu(\square_{d_\alpha}^{d_\beta}) = \sum_{d_\alpha \le d_i,\ d_{j(i)} \le d_\beta} \mu(\square_{d_i}^{d_{j(i)}}).$
Summing over all $\square_{a_i}^{b_i}$ (and doing the same for $\square_{c_j}^{d_j}$) proves (5.2). We can thus state:

Theorem 5.1. Every $A \in \mathcal{D}_{\min}$ can be written as $A = |p|$. The number $\mu(A) = \sum_{\square \in p} \mu(\square)$ does not depend on the choice of $p$. We thus get a well-defined function $\mu$ on $\mathcal{D}_{\min}$. It satisfies Axioms $\mu$1 through $\mu$4. If $\mu'$ is any other function on $\mathcal{D}_{\min}$ satisfying $\mu$2 and $\mu$3, then $\mu'(A) = K\mu(A)$, where $K = \mu'(\square_0^1)$.

Proof. The proof of the last two assertions of the theorem is easy and is left as an exercise for the reader.

6. CONTENTED SETS

Theorem 5.1 shows that our axioms are not vacuous. It does not provide us with a satisfactory theory, however, because $\mathcal{D}_{\min}$ contains far too few sets. In particular, it does not fulfill requirement (iv), since $\mathcal{D}_{\min}$ is not invariant under rotations, except under very special ones. We are now going to remedy this by repeating the arguments of Section 4; we are going to try to approximate more general sets by sets whose $\mu$'s we know, i.e., by sets contained in $\mathcal{D}_{\min}$. This idea goes back to Archimedes, who used it to find the areas of figures in the plane.

Definition 6.1. Let $A$ be any subset of $\mathbb{E}^n$. We say that $p$ is an inner paving of $A$ if $|p| \subset A$. We say that $s$ is an outer paving of $A$ if $A \subset |s|$.

We list several obvious facts.

If $|p| \subset A \subset |s|$, then $\mu(|p|) \le \mu(|s|)$.  (6.1)
If $|p| \subset A \subset |s|$, then $|Tp| \subset TA \subset |Ts|$.  (6.2)
If $A_1 \cap A_2 = \varnothing$ and $|p_1| \subset A_1$, $|p_2| \subset A_2$, then $p_1 \cup p_2$ is an inner paving of $A_1 \cup A_2$.  (6.3)

Definition 6.2. For any bounded subset $A$ of $\mathbb{E}^n$ let

$\mu_*(A) = \operatorname{lub}_{|p| \subset A} \mu(|p|)$

be called the inner content of $A$ and let

$\bar{\mu}(A) = \operatorname{glb}_{A \subset |s|} \mu(|s|)$

be called the outer content of $A$.

Note that since $A$ is bounded, there exists an $s$ with $A \subset |s|$. This shows that $\bar{\mu}(A)$ is defined. This together with (6.1) shows that $\mu_*(A)$ is defined and that

$\mu_*(A) \le \bar{\mu}(A).$  (6.4)
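Definition 6.2 suggests an immediate computation: pave the plane by squares of side $h$, sum those lying inside the set for an inner content estimate and those meeting it for an outer one. A Python sketch for the unit disk (both estimates tend to $\pi$; the membership tests use convexity of the disk):

```python
# Inner and outer content of the unit disk estimated with squares of side h.
from math import hypot, pi

def clamp(t, lo, hi):
    return min(max(t, lo), hi)

def disk_contents(h):
    inner = outer = 0.0
    n = int(1.0 / h) + 2
    for i in range(-n, n):
        for j in range(-n, n):
            x0, x1, y0, y1 = i * h, (i + 1) * h, j * h, (j + 1) * h
            # farthest corner inside the disk => the whole square is inside
            far = max(hypot(x, y) for x in (x0, x1) for y in (y0, y1))
            # nearest point of the square inside the disk => they meet
            near = hypot(clamp(0.0, x0, x1), clamp(0.0, y0, y1))
            if far <= 1.0:
                inner += h * h
            if near <= 1.0:
                outer += h * h
    return inner, outer

if __name__ == "__main__":
    for h in (0.2, 0.05, 0.01):
        print(h, disk_contents(h), pi)
```

The gap between the two estimates is carried entirely by the squares straddling the boundary circle, which is the numerical face of Proposition 6.1 below.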
Definition 6.3. A set $A$ will be called contented if $\mu_*(A) = \bar{\mu}(A)$. We call $\mu_*(A) = \bar{\mu}(A)$ the content of $A$ and denote it by $\mu(A)$.

Observe that every $A \in \mathcal{D}_{\min}$ is contented. In fact, if $A = |p|$, then $p$ is both an inner and an outer paving of $A$. Thus $\mu_*(A) = \bar{\mu}(A) = \mu(|p|)$, and the new definition of $\mu(A)$ coincides with the old one. Our next immediate objective is to show that the collection of all contented sets fulfills Axioms $\mathcal{D}$1 through $\mathcal{D}$3.

Proposition 6.1. A set $A$ is contented if and only if its boundary is contented and has content zero.

Proof. Suppose $A$ is contented. For any $\delta > 0$ we can find an inner paving $p$ and an outer paving $s$ such that $\mu(s) - \mu(p) < \delta/2$. We want to replace $p$ by a close paving $p'$ with $|p'| \subset \operatorname{int} A$. To do this, we choose a small number $\eta$ and replace each rectangle $\square_a^b$ of $p$ by $\square_{a + \eta(b-a)}^{b - \eta(b-a)}$. We let $p_\eta$ be the collection of all these rectangles. Then $|p_\eta| \subset \operatorname{int} |p|$, so $|p_\eta| \subset \operatorname{int} A$. Furthermore, $\mu(|p_\eta|) = (1 - 2\eta)^n\mu(|p|)$, since the factor $(1 - 2\eta)$ is the decrease of each side of each rectangle of $p$. Similarly, we replace $s$ by a slightly larger $s_\eta$, with $A \subset \operatorname{int} |s_\eta|$ and $\mu(s_\eta) \le (1 + 2\eta)^n\mu(s)$. By choosing $\eta$ sufficiently small, we can thus arrange that $\mu(s_\eta) - \mu(p_\eta) < \delta$. Let $r$ be a paving which is finer than $s_\eta$ and $p_\eta$, with $|r| = |s_\eta|$. Let $t \subset r$ consist of those rectangles of $r$ lying in $\operatorname{int} A$. Then $|r| = |s_\eta| \supset |t| \supset |p_\eta|$, so $\mu(|r|) - \mu(|t|) \le \delta$. But $\partial A \subset |r - t|$, so that

$\bar{\mu}(\partial A) \le \mu(|r - t|) = \mu(|r|) - \mu(|t|) < \delta.$

In other words, $\bar{\mu}(\partial A) = 0$.

Conversely, suppose that $\partial A$ has content zero. Let $t$ be an outer paving of $\partial A$ with $\mu(|t|) < \epsilon$. Let $r$ be a paving finer than $t$ and such that $A \subset |r|$. Let $p \subset r$ consist of those rectangles contained in $A$. Let $s \subset r$ consist of those rectangles lying in $|p| \cup |t|$. Then $\mu(|s|) \le \mu(|p|) + \mu(|t|) < \mu(|p|) + \epsilon$. Furthermore, $A \subset |s|$. In fact, let $x \in A$. Then $x \in \square$ for some $\square \in r$. If $\square \cap \partial A \ne \varnothing$, then $\square \cap |t| \ne \varnothing$, so $\square \subset |t|$, since $r$ is a refinement of $t$. If $\square \cap \partial A = \varnothing$, then every point of $\square$ must lie in $A$, so that $\square \subset |p|$. We have thus constructed $p$ and $s$ with $|p| \subset A \subset |s|$ and $\mu(s) - \mu(p) < \epsilon$. Since we can do this for any $\epsilon$, this implies that $A$ is contented. $\square$

Proposition 6.2. The union of any finite number of sets with content zero has content zero. If $A \subset B$ and $B$ has content zero, then so does $A$.

Proof. The proof is obvious.

Theorem 6.1. Let $\mathcal{D}_{\text{con}}$ denote the collection of all contented sets. Then $\mathcal{D}_{\text{con}}$ satisfies Axioms $\mathcal{D}$1 through $\mathcal{D}$3, and the $\mu$ given in Definition 6.3 satisfies $\mu$1 through $\mu$4. If $\mu'$ is any other function on $\mathcal{D}_{\text{con}}$ satisfying $\mu$1 through $\mu$3, then $\mu' = K\mu$, where $K = \mu'(\square_0^1)$.

Proof. Let us verify the axioms.

$\mathcal{D}$1. For any $A$ and $B$, $\partial(A \cup B) \subset \partial A \cup \partial B$ and $\partial(A \cap B) \subset \partial A \cup \partial B$.
By Proposition 6.1, if $A$ and $B$ are contented, then $\partial A$ and $\partial B$ have content zero. Thus so do $\partial A \cup \partial B$, $\partial(A \cup B)$, and $\partial(A \cap B)$, by Proposition 6.2. Hence $A \cup B$ and $A \cap B$ are contented.

$\mathcal{D}$2. Follows immediately from (6.2).
$\mathcal{D}$3. Is obvious.

$\mu$2. If $A_1$ and $A_2$ are contented, we can find inner pavings $p_1$ and $p_2$ such that $\mu(A_1) - \mu(|p_1|) < \epsilon/2$ and $\mu(A_2) - \mu(|p_2|) < \epsilon/2$. If $A_1 \cap A_2 = \varnothing$, then $p_1 \cup p_2$ is an inner paving of $A_1 \cup A_2$, and so $\mu(A_1 \cup A_2) \ge \mu(A_1) + \mu(A_2) - \epsilon$; since $\epsilon$ is arbitrary, $\mu(A_1 \cup A_2) \ge \mu(A_1) + \mu(A_2)$. On the other hand, let $s_1$ and $s_2$ be outer pavings of $A_1$ and $A_2$, respectively, with $\mu(s_1) < \mu(A_1) + \epsilon/2$ and $\mu(s_2) < \mu(A_2) + \epsilon/2$. Let $r$ be a paving with $|r| = |s_1| \cup |s_2|$. Then $r$ is an outer paving of $A_1 \cup A_2$ and $\mu(|r|) \le \mu(|s_1|) + \mu(|s_2|)$. Thus $\mu(A_1 \cup A_2) \le \mu(|r|) \le \mu(A_1) + \mu(A_2) + \epsilon$, or $\mu(A_1 \cup A_2) \le \mu(A_1) + \mu(A_2)$. These two inequalities together give $\mu$2.

$\mu$1. Is obvious.
$\mu$3. Follows from (6.2) and Definition 6.3.
$\mu$4. We already know.

The second part of the theorem follows from Theorem 5.1 and Definition 6.3. In fact, we know that $\mu'(|p|) = K\mu(|p|)$, and (6.1) together with Axiom $\mu$2 implies that $\mu'(|p|) \le \mu'(A) \le \mu'(|s|)$. Since we can choose $p$ and $s$ to be arbitrarily close approximations to $A$ (relative to $\mu$), we are done. $\square$

Remark. It is useful to note that we have actually proved a little more than what is stated in Theorem 6.1. We have proved, namely, that if $\mathcal{D}$ is any collection of sets satisfying $\mathcal{D}$1 through $\mathcal{D}$3, such that $\mathcal{D}_{\min} \subset \mathcal{D} \subset \mathcal{D}_{\text{con}}$, and if $\mu' : \mathcal{D} \to \mathbb{R}$ satisfies $\mu$1 through $\mu$3, then $\mu'(A) = K\mu(A)$ for all $A$ in $\mathcal{D}$, where $K = \mu'(\square_0^1)$.

7. WHEN IS A SET CONTENTED?

We will now establish some useful criteria for deciding whether a given set is contented. Recall that a closed ball $B_x^r$ with center $x$ and radius $r$ is given by

$B_x^r = \{y : \|y - x\| \le r\}.$  (7.1)

Note that for any $\epsilon > 0$,

$B_x^r \subset \square_{x - (r+\epsilon)1}^{x + (r+\epsilon)1},$  (7.2)

and

$\square_{x - r1}^{x + r1} \subset B_x^{r\sqrt{n}}.$  (7.3)

(See Fig. 8.9.)

Fig. 8.9
If we combine (7.2) and (7.3), we see that any cube $\square$ lies in a ball $B$ such that $\bar{\mu}(B) \le 2^n(\sqrt{n})^n\mu(\square)$ and that any ball $B$ lies in a cube $\square$ such that $\mu(\square) \le 3^n(\sqrt{n})^n\bar{\mu}(B)$.

Lemma 7.1. Let $A$ be a subset of $\mathbb{E}^n$. Then $A$ has content zero if and only if for every $\epsilon > 0$ there exist a finite number of balls $\{B_i\}$ covering $A$ with $\sum \bar{\mu}(B_i) < \epsilon$.

Proof. If we have such a collection of covering balls, then by the above remark we can enlarge each ball to a rectangle to get a paving $p$ such that $A \subset |p|$ and $\mu(|p|) < 3^n(\sqrt{n})^n\epsilon$. Therefore $\bar{\mu}(A) = 0$ if we can always find the $\{B_i\}$. Conversely, suppose $A$ has content 0. Then for any $\delta$ we can find an outer paving $p$ with $\mu(|p|) < \delta$. For each rectangle $\square$ in the paving we can, by the arguments of Section 4, find a finite number of cubes which cover $\square$ and whose total content is as close as we like to $\mu(\square)$, say $< 2\mu(\square)$. By doing this for each $\square \in p$, we have a finite number of cubes $\{\square_i\}$ covering $A$ with total content less than $2\delta$. Then by our remark before the lemma each cube $\square_i$ lies in a ball $B_i$ such that $\bar{\mu}(B_i) \le 2^n(\sqrt{n})^n\mu(\square_i)$, and so we have a covering of $A$ by balls $B_i$ such that $\sum \bar{\mu}(B_i) < 2^{n+1}(\sqrt{n})^n\delta$. If we take $\delta = \epsilon/2^{n+1}(\sqrt{n})^n$, we have the desired collection of balls, proving the lemma. $\square$

Recall that a map $\varphi$ of $U \subset \mathbb{E}^n$ into $\mathbb{E}^n$ is said to satisfy a Lipschitz condition if there is a constant $K$ (called the Lipschitz constant) such that

$\|\varphi(y) - \varphi(x)\| \le K\|y - x\|.$  (7.4)

Proposition 7.1. Let $A$ be a set of content zero with $\bar{A} \subset U$, and let $\varphi : U \to \mathbb{E}^n$ satisfy a Lipschitz condition. Then $\varphi(A)$ has content zero.

Proof. The proof consists of applying both parts of Lemma 7.1. Since $A$ has content zero, for any $\epsilon > 0$ we can find a finite number of balls covering $A$ whose total outer content is less than $\epsilon/K^n$. By (7.4), $\varphi(B_x^r) \subset B_{\varphi(x)}^{Kr}$, so that the images of the balls covering $A$ cover $\varphi(A)$ and have a total outer content less than $\epsilon$. $\square$

Recall that if $\varphi$ is a (continuously) differentiable map of an open set $U$ into $\mathbb{E}^n$, then $\varphi$ satisfies a Lipschitz condition on any compact subset of $U$. As a consequence of Proposition 7.1, we can thus state:

Proposition 7.2. Let $\varphi$ be a continuously differentiable map defined on an open set $U$, and let $A$ be a bounded set of content zero with $\bar{A} \subset U$. Then $\varphi(A)$ has content zero.

Let $A$ be any compact subset of $\mathbb{E}^n$ lying entirely in the subspace given by $x^n = 0$. Then $A$ has content zero. In fact, for some sufficiently large fixed $r$, the set $A$ is contained in the rectangle $\square_{\langle -r, \dots, -r, 0 \rangle}^{\langle r, \dots, r, \epsilon \rangle}$ for any $\epsilon > 0$, which has arbitrarily small volume.
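Before continuing the argument, the covering idea behind these propositions can be seen numerically. The Python sketch below (curve, constants, and helper name are illustrative, not from the text) samples the curve $\psi(t) = \langle t, \sin t \rangle$, $0 \le t \le 1$, at spacing $1/N$ and totals the content of the grid boxes of side $2/N$ met by the samples; since consecutive samples are closer together than the box side, a genuine covering of the curve costs at most a bounded multiple of this total, which tends to zero like $1/N$.

```python
# Illustrative sketch: the graph of a differentiable curve in E^2 is
# covered by O(N) boxes of side O(1/N), so its outer content is O(1/N).
from math import sin

def sampled_box_content(N):
    side = 2.0 / N                 # box side, larger than the sample spacing
    boxes = set()
    for k in range(N + 1):
        t = k / N
        x, y = t, sin(t)           # psi(t) = (t, sin t), Lipschitz on [0, 1]
        boxes.add((int(x // side), int(y // side)))
    # content of the boxes met by the samples; a full covering of the
    # curve costs at most a bounded multiple of this total
    return len(boxes) * side * side

if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N, sampled_box_content(N))
```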
Now let $\psi : V \subset \mathbb{E}^{n-1} \to \mathbb{E}^n$ be a continuously differentiable map given by

$\langle y^1, \dots, y^{n-1} \rangle \mapsto \langle \psi^1(y^1, \dots, y^{n-1}), \dots, \psi^n(y^1, \dots, y^{n-1}) \rangle.$

Let $B$ be any bounded subset of $\mathbb{E}^{n-1}$ with $\bar{B} \subset V$. We can then write $\psi(B) = \tilde{\psi}(A)$, where $A$ is the set of points in $\mathbb{E}^n$ of the form $\langle y, 0 \rangle$, where $y \in B$, and where $\tilde{\psi}$ is a differentiable map such that

$\tilde{\psi}(x^1, \dots, x^n) = \langle \psi^1(x^1, \dots, x^{n-1}), \dots, \psi^n(x^1, \dots, x^{n-1}) \rangle.$

By Proposition 7.2 we see that $\mu(\psi(B)) = 0$. Thus:

Proposition 7.3. Let $\psi$ be a differentiable map of $V \subset \mathbb{E}^{n-1}$ into $\mathbb{E}^n$, and let $B$ be a bounded set such that $\bar{B} \subset V$. Then $\psi(B)$ has content zero.

We have thus recovered requirement (v) of Section 1. An immediate consequence of Propositions 7.3 and 6.1 is:

Proposition 7.4. Let $A \subset \mathbb{E}^n$ be such that $\partial A \subset \bigcup \psi_i(B_i)$, where each $\psi_i$ and $B_i$ is as in Proposition 7.3. Then $A$ is contented.

This shows that every set "we can draw" is contented.

Exercise. Show that every ball is contented.

8. BEHAVIOR UNDER LINEAR DISTORTIONS

We shall continue to derive consequences of Proposition 7.1.

Proposition 8.1. Let $\psi$ be a one-to-one map of $U \to \mathbb{E}^n$ which satisfies a Lipschitz condition and is such that $\psi^{-1}$ is continuous. If $A \subset U$ is contented, then so is $\psi(A)$.

Proof. Since $A$ is contented, $\partial A$ has content zero. By the conditions on $\psi$, we know that $\partial\psi(A) = \psi(\partial A)$. Thus $\partial\psi(A)$ has content zero, and so $\psi(A)$ is contented. $\square$

An immediate consequence of Proposition 8.1 is:

Proposition 8.2. Let $L$ be a linear transformation of $\mathbb{E}^n$. Then $LA$ is contented whenever $A$ is contented.

Proof. If $L$ is nonsingular, Proposition 8.1 applies. If $L$ is singular, it maps all of $\mathbb{E}^n$ onto a proper subspace. Any such subspace is contained in the image of $\{x : x^n = 0\}$ by a suitable linear transformation, and so $\mu(LA) = 0$ for any contented $A$. $\square$

Theorem 8.1. Let $L$ be a linear transformation of $\mathbb{E}^n$. Then for any contented $A$ we have

$\mu(LA) = |\det L|\,\mu(A).$  (8.1)

Proof. We can restrict our attention to nonsingular $L$, since we have already checked Eq. (8.1) for $\det L = 0$. If $L$ is nonsingular, then $L$ carries the class of
contented sets into itself. Let us define $\mu'$ by $\mu'(A) = \mu(LA)$ for each $A \in \mathcal{D}_{\text{con}}$. We claim that $\mu'$ satisfies Axioms $\mu$1 through $\mu$3 on $\mathcal{D}_{\text{con}}$. In fact, $\mu$1 and $\mu$2 are obviously true; $\mu$3 follows from the fact that for any translation $T_v$ we have $T_{Lv}L = LT_v$, so that

$\mu'(T_vA) = \mu(LT_vA) = \mu(T_{Lv}LA) = \mu(LA) = \mu'(A).$

By Theorem 6.1 we thus conclude that $\mu' = k_L\mu$, where $k_L$ is some constant depending on $L$. We must show that $k_L = |\det L|$.

We first observe that if $O$ is an orthogonal transformation, then $\mu(OA) = \mu(A)$. In fact, we know that $\mu(OA) = k_O\mu(A)$. If we take $A$ to be the unit ball $B_0^1$, then $OB_0^1 = B_0^1$, so $k_O = 1$. Next we observe that $\mu(L_1L_2A) = k_{L_1}\mu(L_2A) = k_{L_1}k_{L_2}\mu(A)$, so that

$k_{L_1L_2} = k_{L_1}k_{L_2}.$

Now we recall that any nonsingular $L$ can be written as $L = PO$, where $P$ is a positive self-adjoint operator and $O$ is orthogonal. Thus $k_L = k_P$ and $|\det L| = |\det P|\,|\det O| = |\det P|$, so we need only verify (8.1) for positive self-adjoint linear transformations. Any such $P$ can be written as $P = O_1DO_1^{-1}$, where $O_1$ is orthogonal and $D$ is diagonal. Since $P$ is positive, all the eigenvalues of $D$ are positive. Since $\det P = \det D$ and $k_P = k_D$, we need only verify (8.1) for the case where $L$ is given by a diagonal matrix with positive eigenvalues $\lambda_1, \dots, \lambda_n$. But then $L\square_0^1 = \square_0^{\langle \lambda_1, \dots, \lambda_n \rangle}$, so that

$\mu'(\square_0^1) = \mu(\square_0^{\langle \lambda_1, \dots, \lambda_n \rangle}) = \lambda_1 \cdots \lambda_n = |\det L|,$

verifying (8.1). $\square$

Exercise. Let $v_1, \dots, v_n$ be vectors of $\mathbb{E}^n$. By the parallelepiped spanned by $v_1, \dots, v_n$ we mean the set of all vectors of the form $\sum_{i=1}^{n} x^iv_i$, where $0 \le x^i \le 1$. Show that its content is $|\det((v_i, v_j))|^{1/2}$.

9. AXIOMS FOR INTEGRATION

So far we have shown that there is a unique $\mu$ defined for a large collection of sets in $\mathbb{E}^n$. However, we do not have an effective way to compute $\mu$, except in very special cases. To remedy this we must introduce a theory of integration. We first introduce some notation.

Definition 9.1. Let $f$ be any real-valued function on $\mathbb{E}^n$. By the support of $f$, denoted by $\operatorname{supp} f$, we shall mean the closure of the set where $f$ is not zero; that is,

$\operatorname{supp} f = \overline{\{x : f(x) \ne 0\}}.$
Observe that

$\operatorname{supp}(f + g) \subset \operatorname{supp} f \cup \operatorname{supp} g$  (9.1)

and

$\operatorname{supp} fg \subset \operatorname{supp} f \cap \operatorname{supp} g.$  (9.2)

We shall say that $f$ has compact support if $\operatorname{supp} f$ is compact. Equation (9.1) [and Eq. (9.2) applied to constant $g$] shows that the set of all functions with compact support forms a vector space.

Let $T$ be any one-to-one transformation of $\mathbb{E}^n$ onto itself. For any function $f$ we denote by $Tf$ the function given by

$(Tf)(x) = f(T^{-1}x).$  (9.3)

Observe that if $T$ and $T^{-1}$ are continuous, then

$\operatorname{supp} Tf = T\operatorname{supp} f.$  (9.4)

Definition 9.2. Let $A$ be a subset of $\mathbb{E}^n$. By the characteristic function of $A$, denoted by $e_A$, we shall mean the function given by

$e_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases}$  (9.5)

Note that

$e_{A_1 \cap A_2} = e_{A_1} \cdot e_{A_2},$  (9.6)
$e_{A_1 \cup A_2} = e_{A_1} + e_{A_2} - e_{A_1 \cap A_2},$  (9.7)
$\operatorname{supp} e_A = \bar{A},$  (9.8)
$Te_A = e_{TA}$  (9.9)

for any one-to-one map $T$ of $\mathbb{E}^n$ onto itself.

By a theory of integration on $\mathbb{E}^n$ we shall mean a collection $\mathcal{F}$ of functions and a rule $\int$ which assigns a real number $\int f$ to each $f \in \mathcal{F}$, subject to the following axioms:

$\mathcal{F}$1. $\mathcal{F}$ is a vector subspace of the space of all bounded functions of compact support.
$\mathcal{F}$2. If $f \in \mathcal{F}$ and $T$ is a translation, then $Tf \in \mathcal{F}$.
$\mathcal{F}$3. $e_\square$ belongs to $\mathcal{F}$ for any rectangle $\square$.

$\int$1. $\int$ is a linear function on $\mathcal{F}$.
$\int$2. $\int Tf = \int f$ for any translation $T$.
$\int$3. If $f \ge 0$, then $\int f \ge 0$.
$\int$4. $\int e_{\square_0^1} = 1$.

Note that the axioms imply that $\mathcal{F}$ contains all functions of the form $e_{\square_1} + e_{\square_2} + \dots + e_{\square_k}$ for any rectangles $\square_1, \dots, \square_k$. In particular, for any paving $p$, the function $e_{|p|}$ must belong to $\mathcal{F}$.
Also note that from $\int$3 we have at once the stronger version:

$\int$3'. $f \le g \Rightarrow \int f \le \int g$, since then $g - f \ge 0$.

Proposition 9.1. Let $\mathcal{F}, \int$ be a system satisfying Axioms $\mathcal{F}$1 through $\mathcal{F}$3 and $\int$1 through $\int$4. Then

$\int e_A = \mu(A)$  (9.10)

for every contented set $A$ such that $e_A \in \mathcal{F}$, and

$\int$5. $|\int f| \le \|f\|_\infty\,\bar{\mu}(\operatorname{supp} f)$ for every $f \in \mathcal{F}$.

Proof. The axioms guarantee that $e_A \in \mathcal{F}$ for every $A \in \mathcal{D}_{\min}$ and that $\nu(A) = \int e_A$ satisfies $\mu$1 through $\mu$4. Therefore $\int e_A = \mu(A)$ for every $A \in \mathcal{D}_{\min}$ by the uniqueness of $\mu$ (Proposition 4.1). It follows that if $A$ is a contented set such that $e_A \in \mathcal{F}$, and if $p$ and $s$ are inner and outer pavings of $A$, then

$\mu(|p|) = \int e_{|p|} \le \int e_A \le \int e_{|s|} = \mu(|s|).$

Therefore $\int e_A$ lies between $\mu_*(A)$ and $\bar{\mu}(A)$, and so equals $\mu(A)$.

For any $f \in \mathcal{F}$ and any $A \in \mathcal{D}_{\min}$ such that $\operatorname{supp} f \subset A$, we have $-\|f\|_\infty e_A \le f \le \|f\|_\infty e_A$, and therefore $|\int f| \le \|f\|_\infty\mu(A)$ by $\int$3' and (9.10). Taking the greatest lower bound of the right side over all such sets $A$, we have $\int$5. $\square$

10. INTEGRATION OF CONTENTED FUNCTIONS

We will now proceed to deal with Axioms $\mathcal{F}$ and $\int$ in the same way we dealt with Axioms $\mathcal{D}$ and $\mu$. We will construct a "minimal" theory and then get a "big" one by approximating. According to Proposition 9.1, the class $\mathcal{F}$ must contain the function $e_{|p|}$ for any paving $p$. By $\mathcal{F}$1 it must therefore contain all linear combinations of such.

Definition 10.1. By a paved function we shall mean a function $f = f_p$ given by

$f_p = \sum_i c_ie_{\square_i}$  (10.1)

for some paving $p = \{\square_i\}$ and constants $c_i$.

It is easy to see that the collection of all paved functions satisfies Axioms $\mathcal{F}$1 through $\mathcal{F}$3. Furthermore, by Proposition 9.1 and Axiom $\int$1 the integral $\int$ is uniquely determined on the class of all paved functions by

$\int f = \sum_i c_i\,\mu(\square_i)$  (10.2)

if $f$ is given by (10.1). The reader should verify that if we let $\mathcal{F}_p$ be the class of all paved functions and let $\int$ be given by (10.2), then all our axioms are satisfied. Don't forget to show that $\int$ is well defined: if $f$ is expressed as in (10.1) in two ways, then the sums given by (10.2) are equal.
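In computational terms, (10.1) and (10.2) say that a paved function is a finite list of (constant, rectangle) pairs and that its integral is a finite sum. A minimal Python sketch (helper names are mine, not the book's):

```python
# A paved function as a list of (c_i, a_i, b_i): the constant c_i on the
# half-open rectangle [a_i, b_i), rectangles mutually disjoint.
def integral_paved(paving):
    total = 0.0
    for c, a, b in paving:
        mu = 1.0
        for ai, bi in zip(a, b):
            mu *= max(bi - ai, 0.0)       # content of the rectangle, (4.1)
        total += c * mu                    # (10.2): sum of c_i * mu(box_i)
    return total

if __name__ == "__main__":
    f = [( 2.0, (0.0, 0.0), (1.0, 1.0)),   # value 2 on the unit square
         (-1.0, (1.0, 0.0), (3.0, 0.5))]   # value -1 on a disjoint box
    print(integral_paved(f))               # 2*1.0 + (-1)*1.0 = 1.0
```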
The paved functions obviously form too small a collection of functions. We would like to have an $\mathcal{F}$ including all continuous functions with compact support and all characteristic functions of the form $e_A$ with $A$ contented, for example.

Definition 10.2. A bounded function $f$ with compact support is said to be contented if for any $\epsilon > 0$ and $\delta > 0$ there exist a paved function $g = g_{\epsilon,\delta}$ and a contented set $A = A_{\epsilon,\delta}$ such that

$|f(x) - g(x)| < \epsilon$ for all $x \notin A$  (10.3)

and

$\mu(A) < \delta.$  (10.4)

The pair $\langle g, A \rangle$ will be called a paved $\epsilon, \delta$-approximation to $f$.

Let us verify that the collection of all contented functions, $\mathcal{F}_{\text{con}}$, satisfies Axioms $\mathcal{F}$1 through $\mathcal{F}$3. It is clear that if $f$ is contented, so is $af$ for any constant $a$. If $f_1$ and $f_2$ are contented, let $\langle g_1, A_1 \rangle$ and $\langle g_2, A_2 \rangle$ be paved $\epsilon, \delta$-approximations to $f_1$ and $f_2$, respectively. Then

$|f_1(x) + f_2(x) - g_1(x) - g_2(x)| < 2\epsilon$ for all $x \notin A_1 \cup A_2$, and $\mu(A_1 \cup A_2) < 2\delta$.

Thus $\langle g_1 + g_2, A_1 \cup A_2 \rangle$ gives a paved $2\epsilon, 2\delta$-approximation to $f_1 + f_2$. To verify $\mathcal{F}$2 we simply observe that if $\langle g, A \rangle$ is a paved $\epsilon, \delta$-approximation to $f$, then $\langle Tg, TA \rangle$ is one to $Tf$. A similar argument establishes the analogous result for multiplication:

Proposition 10.1. Let $f_1$ and $f_2$ be two contented functions. Then $f_1f_2$ is contented.

Proof. Let $M$ be such that $|f_1(x)| < M$ and $|f_2(x)| < M$ for all $x$. Recall that the product of two paved functions is a paved function. Using the same notation as before, we have

$|f_1f_2(x) - g_1(x)g_2(x)| \le |f_1(x)|\,|f_2(x) - g_2(x)| + |g_2(x)|\,|f_1(x) - g_1(x)| < M\epsilon + (M + \epsilon)\epsilon$

for $x \notin A_1 \cup A_2$. Thus $\langle g_1g_2, A_1 \cup A_2 \rangle$ is a paved $(2M + \epsilon)\epsilon, 2\delta$-approximation to $f_1f_2$. $\square$

As for $\mathcal{F}$3, it is immediate that a stronger statement is true:

Proposition 10.2. If $B$ is a contented set, then $e_B$ is a contented function.

Proof. In fact, let $p$ be an inner paving of $B$ with $\mu(B) - \mu(|p|) < \delta$. Then $e_B(x) - e_{|p|}(x) = 0$ if $x \notin B - |p|$, and $\bar{\mu}(B - |p|) < \delta$, so $\langle e_{|p|}, B - |p| \rangle$ is a paved $\epsilon, \delta$-approximation to $e_B$ for any $\epsilon > 0$. $\square$
We now establish a useful alternative characterization of a contented function.

Proposition 10.3. A function $f$ is contented if and only if for every $\epsilon$ there are paved functions $h$ and $k$ such that $h \le f \le k$ and $\int(k - h) < \epsilon$.

Proof. If $f$ is contented, let $R$ be a rectangle including $\operatorname{supp} f$. Let $\langle g, A \rangle$ be an $\epsilon, \delta$-approximation to $f$. Let $P$ be a paved set including $A = A_{\epsilon,\delta}$ such that $\mu(P) < \delta$, and let $m$ be a bound of $|f|$. Then

$g - \epsilon e_R - me_P \le f \le g + \epsilon e_R + me_P,$

where the outside functions are clearly paved and the difference of their integrals is less than $2\epsilon\mu(R) + 2m\delta$. Since $\epsilon$ and $\delta$ are arbitrary, we have our $h$ and $k$. Conversely, if $h$ and $k$ are paved functions such that $h \le f \le k$ and $\int(k - h) < \alpha$, then the set where $k - h \ge \alpha^{1/2}$ is a paved set $A$. Furthermore,

$\alpha^{1/2}\mu(A) \le \int e_A(k - h) \le \int(k - h) \le \alpha,$

so that $\mu(A) \le \alpha^{1/2}$. Given $\epsilon$ and $\delta$, we only have to choose $\alpha \le \min(\epsilon^2, \delta^2)$ and take $g$ as either $k$ or $h$ to see that $f$ is contented. $\square$

Corollary. A function $f$ is contented if for every $\epsilon$ there are contented functions $f_1$ and $f_2$ such that $f_1 \le f \le f_2$ and $\int(f_2 - f_1) < \epsilon$.

Proof. For then we can find paved functions $h \le f_1$ and $k \ge f_2$ such that $\int(f_1 - h) < \epsilon$ and $\int(k - f_2) < \epsilon$ and end up with $h \le f \le k$ and $\int(k - h) < 3\epsilon$. $\square$

Theorem 10.1. Let $\mathcal{F}$ be a class of functions satisfying Axioms $\mathcal{F}$1 through $\mathcal{F}$3 and such that $\mathcal{F}_p \subset \mathcal{F} \subset \mathcal{F}_{\text{con}}$. Then there exists a unique $\int$ satisfying Axioms $\int$1 through $\int$4 on $\mathcal{F}$.

Proof. If $\int$ is any integral on $\mathcal{F}$ satisfying Axioms $\int$1 through $\int$4, then we must have $\int f$ simultaneously equal to $\operatorname{lub} \int h$ for $h$ paved and $\le f$ and equal to $\operatorname{glb} \int k$ for $k$ paved and $\ge f$, by Proposition 10.3. The integral is thus uniquely determined on $\mathcal{F}$. Moreover, it is easy to see that if the integral on $\mathcal{F}$ is defined by $\int f = \operatorname{lub} \int h = \operatorname{glb} \int k$, then Axioms $\int$1 through $\int$4 follow from the fact that they hold for the uniquely determined integral on the paved functions. $\square$

Exercise 10.1. Let $f$ and $g$ be contented functions such that $f(x) = g(x)$ for $x \notin A$, where $\mu(A) = 0$. Then $\int f = \int g$. (This shows that for the purpose of integration we need to know a function only up to a set of content zero.)

Definition 10.3. Let $f$ be a contented function and $A$ a contented set. We call $\int e_Af$ the integral of $f$ over $A$ and denote it by $\int_A f$. Thus

$\int_A f = \int e_Af.$  (10.5)

An immediate consequence of Axiom $\int$1 and (9.7) is

$\int_{A \cup B} f = \int_A f + \int_B f - \int_{A \cap B} f.$  (10.6)

An immediate consequence of Exercise 10.1 is

$\Big|\int_A f\Big| \le \sup_{x \in A} |f(x)|\,\mu(A).$
We close this section by giving another useful characterization of contented functions.

Proposition 10.4. Let $f$ be a bounded function with compact support. Then $f$ is contented if and only if for every $\epsilon > 0$ and $\delta > 0$ we can find an $\eta > 0$ and a contented set $A_\delta$ such that $\mu(A_\delta) < \delta$ and

$|f(x) - f(y)| < \epsilon$ whenever $\|x - y\| < \eta$ and $x, y \notin A_\delta$.  (10.7)

Proof. Suppose that for every $\epsilon, \delta$ we can find $\eta$ and $A_\delta$. Let $p = \{\square_i\}$ be a paving such that

i) $\operatorname{supp} f \subset |p|$;
ii) if $x, y \in \square_i$, then $\|x - y\| < \eta$;
iii) if $s = \{\square_i \in p : \square_i \cap A_\delta \ne \varnothing\}$, then $\mu(|s|) < 2\delta$.

Then let $f_{\epsilon,2\delta}(x) = f(x_i)$ when $x \in \square_i$, where $x_i$ is some point of $\square_i$. By (ii) and (iii), we see that $\langle f_{\epsilon,2\delta}, |s| \rangle$ is a paved $\epsilon, 2\delta$-approximation to $f$. Thus $f$ is contented.

Conversely, suppose that $f$ is contented, and let $f_{\epsilon/2,\delta/2}, A_{\epsilon/2,\delta/2}$ be a paved approximation to $f$. Let $p = \{\square_i\}$ be the paving associated with $f_{\epsilon/2,\delta/2}$. Replace each $\square_i$ by the rectangle $\square_i'$ obtained by contracting $\square_i$ about its center by a factor $(1 - \rho)$. (See Fig. 8.10.) Thus $\mu(\square_i') = (1 - \rho)^n\mu(\square_i)$. For any $x, y \in \bigcup \square_i'$, if $\|x - y\| < \eta$, where $\eta$ is sufficiently small, then $x$ and $y$ belong to the same $\square_i'$. If $x, y \in \bigcup \square_i'$, $x, y \notin A_{\epsilon/2,\delta/2}$, and $\|x - y\| < \eta$, then

$|f(x) - f(y)| \le |f(x) - f_{\epsilon/2,\delta/2}(x)| + |f(y) - f_{\epsilon/2,\delta/2}(y)| + |f_{\epsilon/2,\delta/2}(x) - f_{\epsilon/2,\delta/2}(y)|.$

But the third term vanishes, so that $|f(x) - f(y)| < \epsilon$. Now by first choosing $\rho$ sufficiently small, we can arrange that $\mu(|p| - \bigcup \square_i') < \delta/2$. Then we can choose $\eta$ so small that $\|x - y\| < \eta$ implies that $x, y$ belong to the same $\square_i'$ if $x, y \in \bigcup \square_i'$. For this $\eta$ and for

$A_\delta = A_{\epsilon/2,\delta/2} \cup \Big(|p| - \bigcup \square_i'\Big),$

Eq. (10.7) holds, and $\mu(A_\delta) < \delta$. $\square$

Fig. 8.10

In particular, a bounded function which is continuous except on a set of content zero and has compact support is contented.
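For such functions the integral can be computed in practice by sampling on a fine paving, in the spirit of the Riemann approximating sums of the next exercise set. A minimal Python sketch (the function and grid sizes are illustrative choices):

```python
# Sample a continuous function of compact support at the centers of a
# paving of [a, b)^2 by N^2 congruent squares and form the paved sum.
from math import exp

def grid_sum(f, a, b, N):
    h = (b - a) / N
    s = 0.0
    for i in range(N):
        for j in range(N):
            s += f(a + (i + 0.5) * h, a + (j + 0.5) * h) * h * h
    return s

if __name__ == "__main__":
    f = lambda x, y: exp(-x * x - y * y)
    for N in (8, 32, 128):
        print(N, grid_sum(f, -4.0, 4.0, N))   # tends to pi = 3.14159...
```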
EXERCISES

10.2 Show that for any bounded set $A$, $e_A$ is a contented function if and only if $A$ is a contented set.

10.3 Let $f$ be a contented function whose support is contained in a cube $\square$. For each $\delta$ let $p_\delta = \{\square_{i,\delta}\}_{i \in I_\delta}$ be a paving with $|p_\delta| = \square$ and whose rectangles have diameter less than $\delta$. Let $x_{i,\delta}$ be some point of $\square_{i,\delta}$. The expression

$\sum_{i \in I_\delta} f(x_{i,\delta})\,\mu(\square_{i,\delta})$

is called a Riemann $\delta$-approximating sum for $f$. Show that for any $\epsilon > 0$ there exists a $\delta_0$ [$= \delta_0(f)$] $> 0$ such that every Riemann $\delta$-approximating sum differs from $\int f$ by less than $\epsilon$ whenever $\delta < \delta_0$.

11. THE CHANGE OF VARIABLES FORMULA

This section will be devoted to the proof of the following theorem, which is of fundamental importance.

Theorem 11.1. Let $U$ and $V$ be bounded open sets in $\mathbb{R}^n$, and let $\varphi$ be a continuously differentiable one-to-one map of $U$ onto $V$ with $\varphi^{-1}$ differentiable. Let $f$ be a contented function with $\operatorname{supp} f \subset V$. Then $f \circ \varphi$ is a contented function, and

$\int_V f = \int_U (f \circ \varphi)\,|\det J_\varphi|.$  (11.1)

Recall that if the map $\varphi$ is given by $y^i = \varphi^i(x^1, \dots, x^n)$, then $J_\varphi$ is the linear transformation whose matrix is $[\partial\varphi^i/\partial x^j]$.

Note that if $\varphi$ is a nonsingular linear transformation (so that $J_\varphi$ is just $\varphi$), then Theorem 11.1 is an easy consequence of Theorem 8.1. In fact, for functions of the form $e_A$ we observe that $e_A \circ \varphi = e_{\varphi^{-1}A}$, and Eq. (11.1) reduces, in this case, to (8.1). By linearity, (11.1) is valid for all paved functions. Furthermore, $f \circ \varphi$ is contented. Suppose $|f(x) - f(y)| < \epsilon$ when $\|x - y\| < \eta$ and $x, y \notin A$, with $\mu(A) < \delta$. Then $|f \circ \varphi(u) - f \circ \varphi(v)| < \epsilon$ when $\|u - v\| < \eta/\|\varphi\|$ and $u, v \notin \varphi^{-1}(A)$, with $\mu(\varphi^{-1}A) < \delta/|\det \varphi|$. Now let $g_{\epsilon,\delta}, A_{\epsilon,\delta}$ be an approximating family of paved functions for $f$. Then $|f \circ \varphi(x) - g_{\epsilon,\delta} \circ \varphi(x)| < \epsilon$ for $x \notin \varphi^{-1}(A_{\epsilon,\delta})$ and $\mu(\varphi^{-1}A_{\epsilon,\delta}) < \delta/|\det \varphi|$. Thus $\int(g_{\epsilon,\delta} \circ \varphi)|\det \varphi| \to \int(f \circ \varphi)|\det \varphi|$, and Eq. (11.1) is valid for all contented $f$.

The proof of Theorem 11.1 for nonlinear maps is a bit more tricky. It consists essentially of approximating $\varphi$ locally by linear maps, and we shall do it in several steps. We shall use the uniform norm $\|x\|_\infty = \max |x^i|$ on $\mathbb{R}^n$. This is convenient because a ball in this norm is actually a cube, although this nicety isn't really necessary.
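Before the local-linear-approximation argument, the linear case just established is easy to check numerically: counting grid cells whose centers land in $L(\square_0^1)$ recovers $|\det L|$, in agreement with (8.1). A minimal Python sketch (the matrix is an arbitrary example):

```python
# Numerical check of the linear case: mu(L([0,1)^2)) = |det L|.
# Scan cell centers of a fine grid over a box containing the image and
# test membership via L^{-1} (Cramer's rule for the 2x2 inverse).
def image_content(h=0.01):
    a, b, c, d = 2.0, 1.0, 0.0, 1.5        # L = [[a, b], [c, d]], det = 3
    det = a * d - b * c
    area = 0.0
    for i in range(int(3.0 / h)):          # image lies in [0,3) x [0,1.5)
        for j in range(int(1.5 / h)):
            x, y = (i + 0.5) * h, (j + 0.5) * h
            u = ( d * x - b * y) / det     # first coordinate of L^{-1}(x,y)
            v = (-c * x + a * y) / det     # second coordinate
            if 0.0 <= u < 1.0 and 0.0 <= v < 1.0:
                area += h * h
    return area, abs(det)

if __name__ == "__main__":
    print(image_content())                 # approximately (3.0, 3.0)
```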
Let $\psi$ be a (continuously) differentiable map defined on a convex open set $U$. If the cube $\square = \square_{p - r1}^{p + r1}$ lies in $U$, then the mean-value theorem (Section 7, Chapter 3) implies that for any $y \in \square$,

$\|\psi(y) - \psi(p)\|_\infty \le \|y - p\|_\infty \sup_{z \in \square} \|J_\psi(z)\|.$

Thus

$\psi(\square) \subset \square_{\psi(p) - Kr1}^{\psi(p) + Kr1},$ where $K = \sup_{z \in \square} \|J_\psi(z)\|$.  (11.2)

Lemma 11.1. Let $\varphi$ be as in Theorem 11.1. Then for any contented set $A$ with $\bar{A} \subset U$ we have

$\mu(\varphi(A)) \le \int_A |\det J_\varphi|.$  (11.3)

Proof. Let us apply Eq. (11.2) to the map $\psi = L^{-1}\varphi$, where $L$ is a nonsingular linear transformation. Since $\varphi(\square) = L\psi(\square)$, Theorem 8.1 and (11.2) give

$\mu(\varphi(\square)) \le |\det L|\Big(\sup_{z \in \square} \|L^{-1}J_\varphi(z)\|\Big)^n \mu(\square)$  (11.4)

for any cube $\square$ contained in the domain of the definition of $\varphi$ and for any such $L$. For any $\epsilon > 0$, let $\delta$ be so small that $\|J_\varphi(x)^{-1}J_\varphi(y)\| < 1 + \epsilon$ for $\|x - y\|_\infty < \delta$, for all $x, y$ in a compact neighborhood of $\bar{A}$. (It is possible to choose such a $\delta$, since $J_\varphi(x)$ is a uniformly continuous function of $x$, so that $J_\varphi(x)^{-1}J_\varphi(y)$ is close to the identity matrix when $x$ is close to $y$; see Section 8, Chapter 4.) Choose an outer paving $t = \{\square_i\}$ of $A$, where the $\square_i$ are cubes all having edges of length less than $\delta$. Let $x_i$ be a point of $\square_i$. Then applying (11.4) to each $\square_i$, taking $L = J_\varphi(x_i)$, we get

$\mu(\varphi(A)) \le \mu(\varphi(|t|)) = \sum \mu(\varphi(\square_i)) \le \sum |\det J_\varphi(x_i)|\,(1 + \epsilon)^n\mu(\square_i).$

We can also suppose $\delta$ to have been taken small enough so that $|\det J_\varphi(z)| > (1 - \epsilon)|\det J_\varphi(x_i)|$ for all $z \in \square_i$ and all $i$. Then we have

$\int_{\square_i} |\det J_\varphi| > (1 - \epsilon)|\det J_\varphi(x_i)|\,\mu(\square_i),$

and so

$\mu(\varphi(A)) \le \frac{(1 + \epsilon)^n}{1 - \epsilon}\int_{|t|} |\det J_\varphi|.$

Since $\epsilon$ is arbitrary and $t$ is an arbitrary outer paving of $A$, we get (11.3). $\square$
We can now conclude that $f \circ \varphi$ is contented for any contented $f$ with $\operatorname{supp} f \subset V$. In fact, let $K$ be chosen so large that it is a Lipschitz constant for $\varphi$ on $\varphi^{-1}(\operatorname{supp} f)$, and so large that $K > |\det J_{\varphi^{-1}}(u)|$ for $u \in \operatorname{supp} f$. Now given $\epsilon$ and $\delta$, we can find an $\eta$ such that $|f(u) - f(v)| < \epsilon$ if $\|u - v\| < \eta$ and $u, v \notin A_\delta$ with $\mu(A_\delta) < \delta$. But this implies that $|f \circ \varphi(x) - f \circ \varphi(y)| < \epsilon$ if $\|x - y\| < \eta/K$ and $x, y \notin \varphi^{-1}(A_\delta)$, where $\mu(\varphi^{-1}(A_\delta)) < K\delta$, by (11.3). Since $K$ was chosen independently of $\epsilon$ and $\delta$, this shows that $f \circ \varphi$ is contented.

Lemma 11.2. Let $\varphi$, $U$, and $V$ be as in Theorem 11.1. Let $f$ be a nonnegative contented function with $\operatorname{supp} f \subset V$. Then

$\int f \le \int (f \circ \varphi)\,|\det J_\varphi|.$  (11.5)

Proof. Let $\langle g, A \rangle$ be a paved $\epsilon, \delta$-approximation to $f$ with $g(u) \le f(u)$ for all $u$. If $p = \{\square_i\}$ is the paving associated with $g$, with $g$ taking the constant value $c_i$ on $\square_i$, we may assume that $\operatorname{supp} f \subset |p|$. Then by Lemma 11.1 and the fact that $c_i \le f(u)$ for $u \in \square_i$,

$\int g = \sum_i c_i\,\mu(\square_i) \le \sum_i c_i\int_{\varphi^{-1}(\square_i)} |\det J_\varphi| \le \sum_i \int_{\varphi^{-1}(\square_i)} (f \circ \varphi)\,|\det J_\varphi| \le \int (f \circ \varphi)\,|\det J_\varphi|.$

Since we can choose $g$ so that $\int g \to \int f$, we obtain (11.5). $\square$

Lemma 11.3. Let $\varphi$, $U$, $V$, and $f$ be as in Theorem 11.1, with $f$ nonnegative. Then Eq. (11.1) holds.

Proof. Let us apply (11.5) to the map $\varphi^{-1}$ and the function $(f \circ \varphi)|\det J_\varphi|$. Since $J_\varphi(x) \circ J_{\varphi^{-1}}(\varphi(x)) = \operatorname{id}$, we obtain

$\int (f \circ \varphi)|\det J_\varphi| \le \int [(f \circ \varphi) \circ \varphi^{-1}]\,(|\det J_\varphi| \circ \varphi^{-1})\,|\det J_{\varphi^{-1}}| = \int f.$

Combining this with (11.5) proves the lemma. $\square$

Completion of the proof of Theorem 11.1. Any real-valued contented function can be written as the difference of two nonnegative contented functions. If for all $x$, $f(x) > -M$ for some large $M$, we write $f = (f + Me_\square) - Me_\square$, where $\operatorname{supp} f \subset \square$. Since we have verified Eq. (11.1) for nonnegative functions, and since both sides of (11.1) are linear in $f$, we are done. Similarly, any bounded complex-valued contented function $f$ can be written as $f = f_1 + if_2$, where $f_1$ and $f_2$ are bounded real-valued contented functions. $\square$
In practice, we sometimes may apply Eq. (11.1) to a situation where the hypotheses of Theorem 11.1 are not, strictly speaking, verified. For instance, in $\mathbb{R}^2$ we may want to introduce "polar coordinates". That is, we let $r, \theta$ be coordinates on $\mathbb{R}^2$; if $S$ is the set $0 \le \theta < 2\pi$, $0 \le r$, we consider the map $\varphi : S \to \mathbb{R}^2$ given by

$x = r\cos\theta, \quad y = r\sin\theta,$

where $x, y$ are coordinates on a second copy of $\mathbb{R}^2$. Now this map is one-to-one and has positive Jacobian for $r > 0$. If we consider the open sets $U \subset S$ given by $0 < r$, $0 < \theta < 2\pi$ and $V \subset \mathbb{R}^2$ given by $V = \mathbb{R}^2 - \{\langle x, y \rangle : y = 0, x \ge 0\}$, the hypotheses of Theorem 11.1 are fulfilled, and we can write (since $\det J_\varphi = r$)

$\int f = \int (f \circ \varphi)\,r$  (11.6)

if $\operatorname{supp} f \subset V$. However, Eq. (11.6) is valid without the restriction $\operatorname{supp} f \subset V$. In fact, if $D_\epsilon$ is a strip of width $\epsilon$ about the ray $y = 0$, $x \ge 0$, then $f = fe_{D_\epsilon} + fe_{\mathbb{R}^2 - D_\epsilon}$ and $\int fe_{D_\epsilon} \to 0$ as $\epsilon \to 0$ (Fig. 8.11). Similarly, $\int (f \circ \varphi)\,r\,(e_{D_\epsilon} \circ \varphi) \to 0$, so that (11.6) is valid for all contented $f$ by this simple limit argument.

Fig. 8.11

We will not state a general theorem covering all such useful extensions of Theorem 11.1. In each case the limit argument is usually quite straightforward and will be left to the reader.

EXERCISES

11.1 By the parallelepiped spanned by $v_1, \dots, v_n$ we mean the set of all $x = \xi^1v_1 + \dots + \xi^nv_n$, where $0 \le \xi^i < 1$. Show that the content of this parallelepiped is given by $|\det((v_i, v_j))|^{1/2}$.

11.2 Express the content of the ellipsoid

$\left\{x : \frac{(x^1)^2}{(a^1)^2} + \dots + \frac{(x^n)^2}{(a^n)^2} \le 1\right\}$

in terms of the content of the unit ball.

11.3 Compute the Jacobian determinant of the map $\langle r, \theta \rangle \mapsto \langle x, y \rangle$, where $x = r\cos\theta$, $y = r\sin\theta$.

11.4 Compute the Jacobian determinant of the map $\langle r, \theta, \varphi \rangle \mapsto \langle x, y, z \rangle$, where $x = r\cos\varphi\sin\theta$, $y = r\sin\varphi\sin\theta$, $z = r\cos\theta$.

11.5 Compute the Jacobian determinant of the map $\langle r, \theta, z \rangle \mapsto \langle x, y, z \rangle$, where $x = r\cos\theta$, $y = r\sin\theta$, $z = z$.
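Equation (11.6) is easy to test numerically. The sketch below (Python; grid sizes are arbitrary choices) integrates $f(x, y) = e^{-(x^2 + y^2)}$ once over a Cartesian grid and once over an $\langle r, \theta \rangle$ grid with the factor $r = \det J_\varphi$; both approximate $\pi$.

```python
# Check of (11.6): Cartesian and polar evaluations of the integral of
# exp(-(x^2 + y^2)) agree (both approximate pi).
from math import exp, pi

def cartesian(N=400, R=6.0):
    h = 2 * R / N
    return sum(exp(-(((i + 0.5) * h - R) ** 2 + ((j + 0.5) * h - R) ** 2)) * h * h
               for i in range(N) for j in range(N))

def polar(Nr=400, Nt=360, R=6.0):
    hr, ht = R / Nr, 2 * pi / Nt
    return sum(exp(-((i + 0.5) * hr) ** 2) * ((i + 0.5) * hr) * hr * ht
               for i in range(Nr) for _ in range(Nt))

if __name__ == "__main__":
    print(cartesian(), polar(), pi)
```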
12. SUCCESSIVE INTEGRATION

In the case of one variable, i.e., the theory of integration on $\mathbb{R}^1$, the fundamental theorem of the calculus reduces the computation of the integral of a function to the computation of its antiderivative. The generalization of this theorem to $n$ dimensions will be presented in a later chapter. In this section we will show how, in many cases, the computation of an $n$-dimensional integral can be reduced to $n$ successive one-dimensional integrations.

Suppose we regard $\mathbb{R}^n$, in some fixed way, as the direct product $\mathbb{R}^n = \mathbb{R}^k \times \mathbb{R}^l$. We shall write every $z \in \mathbb{R}^n$ as $z = \langle x, y \rangle$, where $x \in \mathbb{R}^k$ and $y \in \mathbb{R}^l$.

Definition 12.1. We say that a contented function $f$ is contented relative to the decomposition $\mathbb{R}^n = \mathbb{R}^k \times \mathbb{R}^l$ if there exists a set $A_f \subset \mathbb{R}^k$ of content zero (in $\mathbb{R}^k$) such that

i) for each fixed $x \in \mathbb{R}^k$, $x \notin A_f$, the function $f(x, \cdot)$ is a contented function on $\mathbb{R}^l$;
ii) the function $\int_{\mathbb{R}^l} f$ which assigns to $x$ the number $\int_{\mathbb{R}^l} f(x, \cdot)$ is a contented function on $\mathbb{R}^k$.

It is easy to see that the set of all such functions satisfies Axioms $\mathcal{F}$1 through $\mathcal{F}$3. (The only axiom that is not immediate is $\mathcal{F}$2. But this is an easy consequence of the fact that any translation $T$ can be rewritten as $T_1T_2$, where $T_1$ is a translation in $\mathbb{R}^k$ and $T_2$ is a translation in $\mathbb{R}^l$.) It is equally easy to verify that the rule which assigns to any such $f$ the number $\int_{\mathbb{R}^k}(\int_{\mathbb{R}^l} f)$ satisfies Axioms $\int$1 through $\int$4. The only one which isn't immediately obvious is $\int$3. However, if $p$ is any paving with $\operatorname{supp} f \subset |p|$, then $f \le \|f\|e_{|p|}$ and

$\int_{\mathbb{R}^k}\Big(\int_{\mathbb{R}^l} \|f\|e_{|p|}\Big) = \|f\|\int_{\mathbb{R}^k}\int_{\mathbb{R}^l} e_{|p|} = \|f\|\,\mu(|p|),$

since $\int_{\mathbb{R}^k}\int_{\mathbb{R}^l} e_\square = \mu(\square)$ for any rectangle (direct verification). Thus, by the uniqueness part of Theorem 10.1, we have

$\int f = \int_{\mathbb{R}^k}\Big(\int_{\mathbb{R}^l} f\Big).$  (12.1)

Note, in particular, that if $f$ is also contented relative to the decomposition $\mathbb{R}^n = \mathbb{R}^l \times \mathbb{R}^k$, then

$\int_{\mathbb{R}^k}\Big(\int_{\mathbb{R}^l} f\Big) = \int_{\mathbb{R}^l}\Big(\int_{\mathbb{R}^k} f\Big).$

In particular, for such $f$ the double integration is independent of the order.
In practice, all the functions that we shall come across will be contented relative to any decomposition of $\mathbb{R}^n$. In particular, writing $\mathbb{R}^n = \mathbb{R}^1 \times \dots \times \mathbb{R}^1$, we have

$\int f = \int_{\mathbb{R}^1}\Big(\dots\Big(\int_{\mathbb{R}^1} f\Big)\dots\Big).$  (12.2)

In terms of the rectangular coordinates $x^1, \dots, x^n$, this last expression is usually written as

$\int\dots\int f(x^1, \dots, x^n)\,dx^1 \dots dx^n.$

For this reason, the expression on the left-hand side of (12.2) is frequently written as

$\int_{\mathbb{E}^n} f = \int\dots\int f(x^1, \dots, x^n)\,dx^1 \dots dx^n.$

Let us work out some simple examples illustrating the methods of integration given in the previous sections.

Example 1. Compute the volume of the intersection of the solid cone with vertex angle $\alpha$ (vertex at 0) with the spherical shell $1 \le r \le 2$ (Fig. 8.12). By a Euclidean motion we may assume that the axis of the cone is the $z$-axis. If we introduce polar coordinates, we see that the set in question is the image of the set $\square_{\langle 1, 0, 0 \rangle}^{\langle 2, 2\pi, \alpha/2 \rangle}$, that is, $1 \le r < 2$, $0 \le \varphi < 2\pi$, $0 \le \theta < \alpha/2$, in the $\langle r, \varphi, \theta \rangle$-space (Fig. 8.13).

Fig. 8.12  Fig. 8.13

By the change of variables formula and Exercise 11.4 we see that the volume in question is given by

$\int r^2\sin\theta = \int_1^2\int_0^{2\pi}\int_0^{\alpha/2} r^2\sin\theta\,d\theta\,d\varphi\,dr = 2\pi\int_1^2\int_0^{\alpha/2} r^2\sin\theta\,d\theta\,dr = 2\pi\int_1^2 [1 - \cos(\alpha/2)]\,r^2\,dr = 2\pi[1 - \cos(\alpha/2)]\Big(\tfrac{8}{3} - \tfrac{1}{3}\Big).$
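As a check on Example 1, the iterated integral can be evaluated numerically (a Python sketch; step counts are arbitrary) and compared with the closed form:

```python
# Numerical check of Example 1: volume of the cone-shell intersection.
from math import sin, cos, pi

def shell_cone_volume(alpha, N=300):
    hr, hth = 1.0 / N, (alpha / 2) / N
    s = 0.0
    for i in range(N):
        r = 1.0 + (i + 0.5) * hr
        for j in range(N):
            s += r * r * sin((j + 0.5) * hth) * hr * hth
    return 2 * pi * s          # the phi-integration contributes 2*pi

if __name__ == "__main__":
    alpha = pi / 3
    exact = 2 * pi * (1 - cos(alpha / 2)) * (8 / 3 - 1 / 3)
    print(shell_cone_volume(alpha), exact)
```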
Example 2. Let $B$ be a contented set in the plane, and let $f_1$ and $f_2$ be two contented functions defined on $B$. Let $A$ be the set of all $\langle x, y, z \rangle \in \mathbb{E}^3$ such that $\langle x, y \rangle \in B$ and $f_1(x, y) \le z \le f_2(x, y)$. If $G$ is any contented function on $A$, we can express the integral $\int_A G$ as

$\int_A G = \int_B\left\{\int_{f_1(x,y)}^{f_2(x,y)} G(x, y, z)\,dz\right\}dx\,dy.$

For example, compute the integral $\int_A z$, where $A$ is the set of all points in the unit ball lying above the surface $z = x^2 + y^2$ (Fig. 8.14). Thus

$A = \{\langle x, y, z \rangle : x^2 + y^2 + z^2 \le 1,\ z \ge x^2 + y^2\}.$

Fig. 8.14

We must have $x^2 + y^2 \le a$, where $a^2 + a = 1$ [so that $a = (\sqrt{5} - 1)/2$], in order for $\langle x, y, z \rangle$ to belong to $A$. Then $f_1(x, y) = x^2 + y^2$, $f_2(x, y) = \sqrt{1 - (x^2 + y^2)}$, and

$\int_{f_1(x,y)}^{f_2(x,y)} z\,dz = \tfrac{1}{2}[1 - (x^2 + y^2) - (x^2 + y^2)^2],$

so that, using polar coordinates in the plane (and Exercise 11.3),

$\int_A z = \tfrac{1}{2}\int_{x^2 + y^2 \le a} [1 - (x^2 + y^2) - (x^2 + y^2)^2] = \pi\int_0^{\sqrt{a}} r(1 - r^2 - r^4)\,dr.$

As we saw in the last example, part of the problem of computing an integral as an iterated integral is to determine a good description of the domain of integration in terms of the decomposition of the vector space. It is usually a great help in visualizing the situation to draw a figure.

Example 3. Compute the volume enclosed by a surface of revolution. Here we are given a function $f$ of one variable, and we consider the surface obtained by rotating the curve $x = f(z)$, $z_1 \le z \le z_2$, around the $z$-axis (Fig. 8.15). We thus wish to compute $\mu(A)$, where

$A = \{\langle x, y, z \rangle : x^2 + y^2 \le f(z)^2,\ z_1 \le z \le z_2\}.$

Here it is obviously convenient to use cylindrical coordinates, and we see that $A$ is the image of the set $B = \{\langle r, \theta, z \rangle : r \le f(z),\ 0 \le \theta < 2\pi\}$ in the $\langle r, \theta, z \rangle$-space. By Exercise 11.5, we wish to compute

$\int_B r = \int_0^{2\pi}\int_{z_1}^{z_2}\int_0^{f(z)} r\,dr\,dz\,d\theta = 2\pi\int_{z_1}^{z_2}\Big(\int_0^{f(z)} r\,dr\Big)dz = 2\pi\int_{z_1}^{z_2}\frac{f(z)^2}{2}\,dz.$

Thus

$\mu(A) = \pi\int_{z_1}^{z_2} f(z)^2\,dz.$
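The iterated-integral pattern of Example 2, an inner $z$-integration between the two graphs followed by a two-dimensional sum over $B$, translates directly into code. A Python sketch (grid size arbitrary) for $\int_A z$:

```python
# Numerical check of Example 2: integral of G = z over
# A = {x^2 + y^2 + z^2 <= 1, z >= x^2 + y^2}.
from math import sqrt, pi

def integral_of_z(N=400):
    a = (sqrt(5) - 1) / 2                     # boundary radius^2: a^2 + a = 1
    h = 2 * sqrt(a) / N
    s = 0.0
    for i in range(N):
        for j in range(N):
            x = -sqrt(a) + (i + 0.5) * h
            y = -sqrt(a) + (j + 0.5) * h
            r2 = x * x + y * y
            if r2 <= a:
                z1, z2 = r2, sqrt(1 - r2)             # f1 <= z <= f2
                s += (z2 * z2 - z1 * z1) / 2 * h * h  # inner integral of z
    return s

if __name__ == "__main__":
    a = (sqrt(5) - 1) / 2
    exact = pi * (a / 2 - a ** 2 / 4 - a ** 3 / 6)    # from the 1-D reduction
    print(integral_of_z(), exact)
```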
Fig. 8.15

EXERCISES

12.1 Compute the volume of the region between the surfaces $z = x^2 + y^2$ and $z = x + y$.

12.2 Find the volume of the region in $\mathbb{E}^3$ bounded by the plane $z = 0$, the cylinder $x^2 + y^2 = 2x$, and the cone $z = +\sqrt{x^2 + y^2}$.

12.3 Compute $\int_A (x^2 + y^2)^2\,dx\,dy\,dz$, where $A$ is the region bounded by the plane $z = 2$ and the surface $x^2 + y^2 = 2z$.

12.4 Compute $\int_A x$, where $A = \{\langle x, y, z \rangle : x^2 + y^2 + z^2 \le a^2,\ x \ge 0,\ y \ge 0,\ z \ge 0\}$.

12.5 Compute

$\int_A\left(\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2}\right)^{1/2},$

where $A$ is the region bounded by the ellipsoid

$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1.$

Let $\rho$ be a nonnegative function (to be called the density of mass in the following discussion) defined on a domain $V$ in $\mathbb{E}^3$. The total mass of $\langle V, \rho \rangle$ is defined as

$M = \int_V \rho(x)\,dx.$

If $M \ne 0$, the center of gravity of $\langle V, \rho \rangle$ is the point $C = \langle C_1, C_2, C_3 \rangle$, where

$C_1 = \frac{1}{M}\int_V x_1\rho(x)\,dx, \quad C_2 = \frac{1}{M}\int_V x_2\rho(x)\,dx, \quad C_3 = \frac{1}{M}\int_V x_3\rho(x)\,dx.$
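The mass and center-of-gravity formulas are themselves iterated integrals, so they can be evaluated by the same midpoint sums. A Python sketch for the unit cube with the illustrative density $\rho(x) = x_1 + x_3$ (a density chosen arbitrarily for this example; it is not one of the exercises):

```python
# Total mass and center of gravity of the unit cube with rho = x1 + x3,
# by a triple midpoint sum.  Exact values: M = 1, C = (7/12, 1/2, 7/12).
def mass_and_center(N=40):
    h = 1.0 / N
    pts = [(k + 0.5) * h for k in range(N)]
    M = 0.0
    C = [0.0, 0.0, 0.0]
    for x1 in pts:
        for x2 in pts:
            for x3 in pts:
                dm = (x1 + x3) * h ** 3      # rho(x) dx
                M += dm
                C[0] += x1 * dm
                C[1] += x2 * dm
                C[2] += x3 * dm
    return M, [c / M for c in C]

if __name__ == "__main__":
    print(mass_and_center())
```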
12.6 A homogeneous solid (where $\rho$ is constant) is given by $x_1 \ge 0$, $x_2 \ge 0$, $x_3 \ge 0$, and

$\frac{x_1^2}{a^2} + \frac{x_2^2}{b^2} + \frac{x_3^2}{c^2} \le 1.$

Find its center of gravity.

12.7 The unit cube has density $\rho(x) = x_1x_3$. Find its total mass and its center of gravity.

12.8 Find the center of mass of the homogeneous body bounded by the surfaces $x^2 + y^2 + z^2 = a^2$ and $x^2 + y^2 = ax$.

The notion of center of mass can, of course, be defined for a region in a Euclidean space of any dimension. Thus, for a region $D$ in the plane with density $\rho$, the center of mass will be the point $\langle x_0, y_0 \rangle$, where

$x_0 = \frac{\int_D x\rho}{\int_D \rho} \quad\text{and}\quad y_0 = \frac{\int_D y\rho}{\int_D \rho}.$

12.9 Let $D$ be a region in the $xz$-plane which lies entirely in the half-plane $x > 0$. Let $A$ be the solid in $\mathbb{E}^3$ obtained by rotating $D$ about the $z$-axis. Show that $\mu(A) = 2\pi d\,\mu(D)$, where $d$ is the distance of the center of mass of the region $D$ (with uniform density) from the $z$-axis. (Use cylindrical coordinates.) This is known as Guldin's rule.

Observe that in the definition of center of gravity we obtain a vector (i.e., a point in $\mathbb{E}^3$) as the answer by integrating each of its coordinates. This suggests the following definition: Let $V$ be a finite-dimensional vector space, and let $e_1, \dots, e_k$ be a basis for $V$. Call a map $f$ from $\mathbb{E}^n$ to $V$ ($f$ is a vector-valued function on $\mathbb{E}^n$ with values in $V$) contented if, when we write $f(x) = \sum f^i(x)e_i$, each of the (real-valued) functions $f^i$ is contented. Define the integral of $f$ over $D$ by

$\int_D f = \sum_i\Big(\int_D f^i\Big)e_i.$

12.10 Show that the condition that a function be contented and the value of its integral are independent of the choice of basis $e_1, \dots, e_k$.

Let $\xi$ be a point not in the closed domain $D$, which has a mass distribution $\rho$. The gravitational force on a particle of unit mass situated at $\xi$ is defined to be the vector

$\int_D \frac{\rho(x)(x - \xi)}{\|x - \xi\|^3}\,dx$

(here $x - \xi$ is an $\mathbb{E}^3$-valued function on $\mathbb{E}^3$).

Fig. 8.16

12.11 Let $D$ be the spherical shell bounded by two concentric spheres $S_1$ and $S_2$ (Fig. 8.16), with center at the origin. Let $\rho$ be a mass distribution on $D$ which depends only on the distance from the center, that is, $\rho(x) = f(\|x\|)$. Show that the gravitational force vanishes at any $\xi$ inside $S_1$.

12.12 $\langle D, \rho \rangle$ is as in Exercise 12.11. Show that the gravitational force on a point outside $S_2$ is the same as that due to a particle situated at the origin and whose mass is the total mass of $D$.
13. ABSOLUTELY INTEGRABLE FUNCTIONS

Thus far we have been dealing with bounded functions of compact support. In practice, we would like to be able to integrate functions which neither are bounded nor have compact support. Let $f$ be a function defined on $\mathbb{E}^n$, let $M$ be a nonnegative real number, and let $A$ be a (bounded) contented subset of $\mathbb{E}^n$. Let $f_A^M$ be the function

$f_A^M(x) = \begin{cases} 0 & \text{if } x \notin A, \\ M\operatorname{sgn} f(x) & \text{if } x \in A \text{ and } |f(x)| > M, \\ f(x) & \text{if } x \in A \text{ and } |f(x)| \le M. \end{cases}$

Thus $f_A^M$ is a bounded function of compact support. It is obtained from $f$ by cutting $f$ back to zero outside $A$ and cutting $f$ back to $M$ when $|f(x)| > M$. We say that a function $f$ is absolutely integrable if

i) $f_A^M$ is a contented function for all $M > 0$ and contented sets $A$; and
ii) for any $\epsilon > 0$ there is a bounded contented set $A_\epsilon$ such that $e_{A_\epsilon} \cdot f$ is bounded and, for all $M > 0$ and all $B$ with $B \cap A_\epsilon = \varnothing$,

$\int |f_B^M| < \epsilon.$

It is easy to check that the sum of two absolutely integrable functions is again absolutely integrable. Thus the set of absolutely integrable functions forms a vector space. Note that if $f$ satisfies condition (i) and $|f(x)| \le |g(x)|$ for all $x$, where $g$ is absolutely integrable, then $f$ is absolutely integrable.

Let $f$ be an absolutely integrable function. Given any $\epsilon$, choose a corresponding $A_\epsilon$. Then for any numbers $M_1$ and $M_2 \ge \max_{x \in A_\epsilon} |f(x)|$ and for any sets $A_1 \supset A_\epsilon$ and $A_2 \supset A_\epsilon$,

$\Big|\int f_{A_1}^{M_1} - \int f_{A_2}^{M_2}\Big| < 2\epsilon.$

If we let $\epsilon \to 0$ and choose a corresponding family of $A_\epsilon$, then the above inequality implies that the limit $\lim \int f_{A_\epsilon}$ is independent of the choice of the $A_\epsilon$. We define this limit to be $\int f$.

We now list some very crude sufficient criteria for a function to be absolutely integrable. We will consider the two different causes of trouble: nonboundedness and lack of compact support.

Let $f$ be a bounded function with $f_A$ contented for any contented set $A$. Suppose $|f(x)| \le C\|x\|^{-k}$ for large values of $\|x\|$. Let $B_r$ be the ball of radius $r$ centered at the origin. If $r_1$ is large enough so that the inequality holds for $\|x\| \ge r_1$, then for $r_2 \ge r_1$ we have

$\int |f_{B_{r_2} - B_{r_1}}| \le C\int_{B_{r_2} - B_{r_1}} \|x\|^{-k} = C\Omega_n\int_{r_1}^{r_2} r^{n-1-k}\,dr,$

where $\Omega_n$ is some constant depending on $n$ (in fact, it is the "surface area" of the unit sphere in $\mathbb{E}^n$). If $k > n$, this last integral becomes

$\frac{C\Omega_n}{n - k}\Big(r_2^{n-k} - r_1^{n-k}\Big),$
which is at most $[C\,\Omega_n/(k-n)]\,r_1^{\,n-k}$, and this tends to zero as $r_1 \to \infty$ when $k > n$. Thus we can assert:

Let $f$ be a bounded function such that $f_A$ is contented for any contented set $A$. Suppose that $|f(x)| \to 0$ as $\|x\| \to \infty$ in such a way that $\|x\|^k |f(x)|$ is bounded for some $k > n$. Then $f$ is absolutely integrable.

Now let us examine the situation when $f$ is of compact support but unbounded. Suppose first that there is a point $x_0$ such that $f$ is bounded in the complement of any neighborhood of $x_0$. Suppose, furthermore, that $|f(x)| \le C\|x-x_0\|^{-k}$ for some constants $C$ and $k$. Then $|f(x)| > M$ implies $\|x-x_0\|^{-k} > M/C$, that is, $\|x-x_0\| < (C/M)^{1/k}$. Let $B_1$ be the ball of radius $(C/M_1)^{1/k}$ centered at $x_0$. Then $|f(x)| > M_1$ implies that $x \in B_1$. Furthermore, for $M_2 > M_1$ we have
$$\left|\int f^{M_2} - \int f^{M_1}\right| \le M_2\,\mu(B_2) + C\int_{B_1 - B_2} \|x-x_0\|^{-k},$$
where $B_2$ is the ball of radius $(C/M_2)^{1/k}$ centered at $x_0$. Thus
$$\left|\int f^{M_2} - \int f^{M_1}\right| \le V_n\,C^{n/k} M_2^{(k-n)/k} + C\,\Omega_n \int_{(C/M_2)^{1/k}}^{(C/M_1)^{1/k}} r^{n-1-k}\,dr,$$
where $\Omega_n$ and $V_n$ depend only on $n$. If $k < n$, the integral on the right becomes
$$\frac{C^{n/k}\,\Omega_n}{n-k}\left(M_1^{(k-n)/k} - M_2^{(k-n)/k}\right) < \frac{C^{n/k}\,\Omega_n}{n-k}\,M_1^{(k-n)/k}.$$
Thus
$$\left|\int f^{M_2} - \int f^{M_1}\right| \le \mathrm{const}\cdot M_1^{(k-n)/k},$$
which can be made arbitrarily small by choosing $M_1$ large. Thus if $f$ has compact support and is such that $f^M$ is contented for all $M$ and $|f(x)| \le C\|x-x_0\|^{-k}$ with $k < n$, then $f$ is absolutely integrable.

More generally, let $S$ be a bounded subset of an $l$-dimensional subspace of $\mathbb{E}^n$. Let $d(x)$ denote the distance from $x$ to $S$. Let $f$ be a function of compact support with $f^M$ contented for all $M$. If $|f(x)| \le C\,d(x)^{-k}$ with $k < n-l$, then $f$ is absolutely integrable. The proof is similar to that given above and is left to the reader.

Let $\{f_k\}$ be a sequence of absolutely integrable functions. Under what conditions will $\int f_k \to \int f$ when $f_k(x) \to f(x)$? Even if the sequence converges uniformly, there is no guarantee that the integrals converge. For instance, if $f_k = (1/k^n)\,e_{D_k}$, where $D_k$ is a contented set with $\mu(D_k) = k^n$ (a cube of side $k$, say), then $|f_k(x)| \le 1/k^n$, so that $f_k$ approaches zero uniformly. On the other hand, $\int f_k = 1$ for all $k$.
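The truncated integrals $\int f_A^M$ in the definition above are easy to watch converge (or fail to) in a simple one-dimensional case. The following sketch is ours, not from the text; it assumes Python with scipy, and the function names are illustrative.

```python
from scipy.integrate import quad

# The truncated integrals int f_A^M over A = (0, 1].  f(x) = x^(-1/2)
# satisfies |f| <= C d(x)^(-k) with k = 1/2 < n = 1, so the truncations
# stabilize; g(x) = 1/x is the borderline case k = n and they diverge.

def truncated(f, M):
    # integral over (0, 1) of f cut back to M; f >= 0 here, so f^M = min(f, M)
    val, _ = quad(lambda x: min(f(x), M), 0.0, 1.0, limit=200)
    return val

f = lambda x: x**-0.5 if x > 0 else float("inf")
g = lambda x: 1.0 / x if x > 0 else float("inf")

for M in (10.0, 1e2, 1e3, 1e4):
    print(M, truncated(f, M), truncated(g, M))
# the f-column tends to int_0^1 x^(-1/2) dx = 2; the g-column grows like log M
```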
We say that a set of functions $\{f_k\}$ is uniformly absolutely integrable if for any $\varepsilon > 0$ there is an $A_\varepsilon$, which can be chosen independently of $k$, such that
$$\left|\int (f_k)_B^M\right| < \varepsilon \quad\text{for all } M, \text{ whenever } B \cap A_\varepsilon = \emptyset.$$
We frequently verify that $\{f_k\}$ is uniformly absolutely integrable by showing that there is an absolutely integrable function $g$ such that $|f_k(x)| \le |g(x)|$ for all $k$ and $x$.

Let $\{f_k\}$ be a uniformly absolutely integrable sequence of functions. Suppose that $f_k \to f$ uniformly. Suppose in addition that $f$ is absolutely integrable. Then $\int f_k \to \int f$. In fact, for any $\delta > 0$ we can find a $k_0$ such that $|f_k(x) - f(x)| < \delta$ for all $k > k_0$ and all $x$. We can also find $A_\varepsilon$ and $M_\varepsilon$ such that
$$\left|\int f_k - \int f\right| \le \left|\int f_k - \int (f_k)_{A_\varepsilon}^{M_\varepsilon}\right| + \left|\int f_{A_\varepsilon}^{M_\varepsilon} - \int f\right| + \left|\int (f_k)_{A_\varepsilon}^{M_\varepsilon} - \int f_{A_\varepsilon}^{M_\varepsilon}\right| < \varepsilon + \varepsilon + \delta\,\mu(A_\varepsilon),$$
which can be made arbitrarily small by first choosing $\varepsilon$ small (which then gives an $A_\varepsilon$) and then choosing $\delta$ small (which means choosing $k_0$ large).

The main applications that we shall make of the preceding ideas will be to the problems of computing iterated integrals and of differentiating under the integral sign.

Proposition 13.1. Let $f$ be a function on $\mathbb{R}^k \times \mathbb{R}^l$. Suppose that the set of functions $\{f(x,\cdot)\}$ is uniformly absolutely integrable, where $x$ is restricted to lie in a bounded contented set $K \subset \mathbb{R}^k$. Then the function $e_{K\times\mathbb{R}^l}\,f$ is absolutely integrable, and
$$\int_{K\times\mathbb{R}^l} f = \int_K \int_{\mathbb{R}^l} f(x,y)\,dy\,dx = \int_{\mathbb{R}^l}\int_K f(x,y)\,dx\,dy.$$

Proof. By assumption, for any $\varepsilon > 0$ we can find $M$ and $A_\varepsilon \subset \mathbb{R}^l$ such that
$$\left|\int f(x,\cdot)_A^M\right| < \varepsilon \quad\text{for all } x \in K, \text{ if } A \cap A_\varepsilon = \emptyset. \tag{13.1}$$
Now for any set $B$ in $\mathbb{R}^n$,
$$\left|\int \left(e_{K\times\mathbb{R}^l}f\right)_B^M\right| \le \int_K \left|\int f(x,\cdot)_{B_x}^M\right|\,dx \le \mu(K)\,\varepsilon \quad\text{if } B \cap (K\times A_\varepsilon) = \emptyset,$$
where $B_x = \{y : (x,y) \in B\}$. This shows that $e_{K\times\mathbb{R}^l}f$ is absolutely integrable on $\mathbb{R}^n$. Now choose a sufficiently large $C = C_1 \times C_2$ and an $M$ such that
$$\left|\int_{K\times\mathbb{R}^l} f - \int \left(e_{K\times\mathbb{R}^l}f\right)_C^M\right| < \varepsilon,$$
and also such that
$$\left|\int_{\mathbb{R}^l} f(x,\cdot) - \int f(x,\cdot)_{C_2}^M\right| < \varepsilon \quad\text{for all } x \in K.$$
Then we have
$$\int \left(e_{K\times\mathbb{R}^l}f\right)_C^M = \int_K \int f(x,\cdot)_{C_2}^M\,dy\,dx$$
(taking $C_1 \supset K$), since the integrand is a bounded contented function of compact support, whose integral may therefore be computed as an iterated integral. Thus
$$\left|\int_{K\times\mathbb{R}^l} f - \int_K\int_{\mathbb{R}^l} f(x,y)\,dy\,dx\right| \le \varepsilon + \mu(K)\,\varepsilon,$$
so that
$$\int_{K\times\mathbb{R}^l} f = \int_K\int_{\mathbb{R}^l} f(x,y)\,dy\,dx.$$
Finally, Eq. (13.1) shows that the function $F(y) = \int_K f(\cdot,y)$ is absolutely integrable. In fact, using the same $A_\varepsilon$ and $M$ as in (13.1), we get
$$\left|\int F_B^M\right| \le \mu(K)\,\varepsilon \quad\text{whenever } B \cap A_\varepsilon = \emptyset.$$
Thus we get
$$\int_{K\times\mathbb{R}^l} f = \int_K\int_{\mathbb{R}^l} f(x,y)\,dy\,dx = \int_{\mathbb{R}^l}\int_K f(x,y)\,dx\,dy. \qquad\square$$

An extension of the same argument shows the following.

Proposition 13.2. Let $f$ be absolutely integrable on $\mathbb{R}^n$ and such that the functions $f(x,\cdot)$ are uniformly absolutely integrable for each $x \in \mathbb{R}^k$. Then
$$\int f = \int\int f(x,y)\,dy\,dx.$$

We now turn our attention to the problem of differentiating under the integral sign.

Proposition 13.3. Let $(t,x) \mapsto F(t,x)$ be a function on $I \times \mathbb{R}^n$, where $I = [a,b] \subset \mathbb{R}$. Suppose that

i) $F$ and $\partial F/\partial t$ are continuous functions on $I \times \mathbb{R}^n$;

ii) $(\partial F/\partial t)(t,\cdot)$ is a uniformly absolutely integrable family of functions;

iii) $F(t,\cdot)$ is absolutely integrable for all $t \in I$.

Let $f(t) = \int F(t,\cdot)$. Then $f$ is a differentiable function of $t$, and
$$f'(t) = \int_{\mathbb{R}^n} (\partial F/\partial t)(t,\cdot).$$

Proof. Let $G(t) = \int_{\mathbb{R}^n}(\partial F/\partial t)(t,\cdot)$. Then $G(t)$ is continuous, since we may pass to the limit under the integral sign for a uniformly absolutely integrable family of continuous functions. Furthermore,
$$\int_a^t G(s)\,ds = \int_{\mathbb{R}^n}\int_a^t (\partial F/\partial t)(s,\cdot)\,ds$$
by Proposition 13.1. Thus
$$\int_a^t G(s)\,ds = \int_{\mathbb{R}^n}\left(F(t,\cdot) - F(a,\cdot)\right) = \int_{\mathbb{R}^n} F(t,\cdot) - \int_{\mathbb{R}^n} F(a,\cdot) = f(t) - f(a).$$
Differentiating this equation with respect to $t$ gives the desired result. $\square$

Finally, let us state the change of variables formula for absolutely integrable functions. Let $\varphi\colon U \to V$ be a differentiable one-to-one map with differentiable inverse, where $U$ and $V$ are two open sets in $\mathbb{R}^n$. Let $f$ be an absolutely integrable function defined on $V$. Then $(f\circ\varphi)\,|\det J_\varphi|$ is an absolutely integrable function on $U$, and
$$\int_V f = \int_U (f\circ\varphi)\,|\det J_\varphi|.$$

Proof. To show that $(f\circ\varphi)\,|\det J_\varphi|$ is absolutely integrable, let $\varepsilon > 0$ and choose an $A_\varepsilon \subset V$ such that (ii) holds. Then $A_\varepsilon$ is compact, and therefore so is $\varphi^{-1}(A_\varepsilon)$. In particular, $\varphi^{-1}(A_\varepsilon)$ is a bounded contented set, and $|\det J_\varphi|$ is bounded on it. If $B \cap \varphi^{-1}(A_\varepsilon) = \emptyset$, where $B \subset U$ is bounded and contented, then
$$\int_B |\det J_\varphi|\,\left|(f\circ\varphi)^M\right| \le \int_{\varphi(B)} \left|f^M\right| < \varepsilon.$$
This shows that $(f\circ\varphi)\,|\det J_\varphi|$ is absolutely integrable. The rest of the proposition then follows from
$$\int_{A_\varepsilon} f^M = \int_{\varphi^{-1}(A_\varepsilon)} (f^M\circ\varphi)\,|\det J_\varphi|$$
by letting $\varepsilon \to 0$. $\square$

EXERCISES

13.1 Evaluate the integral $\int_{-\infty}^{\infty} e^{-x^2}\,dx$. [Hint: Compute its square.]

13.2 Evaluate the integral $\int_0^{\infty} e^{-x^2}x^{2k}\,dx$.

13.3 Evaluate the volume of the unit ball in an odd-dimensional space. [Hint: Observe that the Jacobian determinant for "polar" coordinates is of the form $r^{n-1}\times j$, where $j$ is a function of the "angular variables". Thus the volume of the unit ball is of the form $C\int_0^1 r^{n-1}\,dr$, where $C$ is determined by integrating $j$ over the "angular variables". Evaluate $C$ by computing $\left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)^n$.]
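The hint to Exercise 13.1 can be checked numerically. The sketch below (ours, assuming Python with scipy) computes the integral by quadrature and its square as a double integral over the plane; both agree with $\pi$, so the integral itself is $\sqrt{\pi}$.

```python
import numpy as np
from scipy.integrate import quad, dblquad

# Numerical companion to Exercise 13.1: the square of int e^{-x^2} dx
# equals the plane integral of e^{-(x^2+y^2)}, which is pi.

I, _ = quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
print(I**2, np.pi)          # both ~3.141592653589793, so I = sqrt(pi)

J, _ = dblquad(lambda y, x: np.exp(-(x**2 + y**2)),
               -np.inf, np.inf, lambda x: -np.inf, lambda x: np.inf)
print(J)                     # ~pi as well, in agreement with I**2
```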
14. PROBLEM SET: THE FOURIER TRANSFORM

Let $a = \langle a_1,\dots,a_n\rangle$ be an $n$-tuple whose entries are nonnegative integers. By $D^a$ we shall mean the differential operator
$$D^a = \frac{\partial^{\,a_1+\cdots+a_n}}{\partial x_1^{a_1}\cdots\partial x_n^{a_n}}.$$
Let $|a| = a_1 + \cdots + a_n$. Let $Q(x,D) = \sum_{|a|\le k} a_a(x)D^a$ be the differential operator where each $a_a$ is a polynomial in $x$. Thus if $f$ is a $C^k$-function on $\mathbb{R}^n$, we have
$$Qf(x) = \sum_{|a|\le k} a_a(x)\,D^a f(x).$$
For any $f$ which is $C^\infty$ on $\mathbb{R}^n$ we set
$$\|f\|_Q = \sup_{x\in\mathbb{R}^n} |Qf(x)|.$$
We denote by $S$ the space of all $f \in C^\infty$ such that
$$\|f\|_Q < \infty \tag{14.1}$$
for all $Q$. To see what this means, let us consider those $Q$ with $k = 0$. Then (14.1) says that for any polynomial $a(\cdot)$ the function $a\cdot f$ is bounded. In other words, $f$ vanishes at infinity faster than the inverse of any polynomial; that is,
$$\lim_{\|x\|\to\infty} \|x\|^p f(x) = 0$$
for all $p$. To say that (14.1) holds means that the same is true for any derivative of $f$ as well.

If $f$ is a $C^\infty$-function of compact support, then (14.1) obviously holds, so $f \in S$. A more instructive example is provided by the function $n$ given by $n(x) = e^{-\|x\|^2}$. Since $\lim_{r\to\infty} r^p e^{-r^2} = 0$ for any $p$, it follows that $\lim_{\|x\|\to\infty} a(x)n(x) = 0$. On the other hand, it is easy to see (by induction) that $D^a n(x) = P_a(x)\,n(x)$ for some polynomial $P_a$. Thus $Qn(x) = P_Q(x)\,n(x)$, where $P_Q$ is a polynomial. Thus $n \in S$.

It is easy to see that the space $S$ is a vector space. We shall introduce a notion of convergence on this space by saying that $f_n \to f$ if for every fixed $Q$, $\|f_n - f\|_Q \to 0$. (Note that the space $S$ is not a Banach space, in that convergence depends on an infinity of different norms.)

EXERCISES

14.1 Let $\varphi$ be a $C^\infty$-function which grows slowly at infinity. That is, suppose that for every $a$ there is a polynomial $P_a$ such that
$$|D^a\varphi(x)| \le |P_a(x)| \quad\text{for all } x.$$
Show that if $f \in S$, then $\varphi f \in S$. Furthermore, the map of $S$ into itself sending $f \mapsto \varphi f$ is continuous; that is, if $f_n \to f$, then $\varphi f_n \to \varphi f$.
For $x = \langle x^1,\dots,x^n\rangle \in \mathbb{R}^n$ and $\xi = \langle \xi_1,\dots,\xi_n\rangle \in \mathbb{R}^{n*}$ we denote the value of $\xi$ at $x$ by
$$\langle x,\xi\rangle = x^1\xi_1 + \cdots + x^n\xi_n.$$
Also, for any $a = \langle a_1,\dots,a_n\rangle$ and any $x \in \mathbb{R}^n$ we let $x^a = (x^1)^{a_1}\cdots(x^n)^{a_n}$, and similarly $\xi^a = (\xi_1)^{a_1}\cdots(\xi_n)^{a_n}$, etc. For any $f \in S$ we define its Fourier transform $\hat f$, which is a function on $\mathbb{R}^{n*}$, by
$$\hat f(\xi) = \int e^{-i\langle x,\xi\rangle} f(x)\,dx.$$
We note that $\hat f(0) = \int f$ and that $\hat f$ is bounded, since $|\hat f(\xi)| \le \int |f|$.

14.2 Show that $\hat f$ possesses derivatives of all orders with respect to $\xi$ and that
$$D^a_\xi \hat f(\xi) = (-i)^{|a|}\int e^{-i\langle x,\xi\rangle} x^a f(x)\,dx;$$
in other words, $D^a_\xi\hat f(\xi) = \hat g(\xi)$, where $g(x) = (-i)^{|a|}x^a f(x)$.

14.3 Show that
$$\widehat{\frac{\partial f}{\partial x^j}}(\xi) = i\,\xi_j\,\hat f(\xi).$$
[Hint: Write the integral as an iterated integral and use integration by parts with respect to the $j$th variable.]

14.4 Conclude that the map $f \mapsto \hat f$ sends $S(\mathbb{R}^n)$ into $S(\mathbb{R}^{n*})$ and that if $f_n \to 0$ in $S$, then $\hat f_n \to 0$ in $S(\mathbb{R}^{n*})$.

14.5 Recall that $T_w f(x) = f(x-w)$. Show that, for any $w \in \mathbb{R}^n$,
$$\widehat{T_w f}(\xi) = e^{-i\langle w,\xi\rangle}\,\hat f(\xi).$$

14.6 For any $f \in S$ define $\tilde f$ by $\tilde f(x) = \overline{f(-x)}$, where the bar denotes complex conjugation. Show that $\hat{\tilde f}(\xi) = \overline{\hat f(\xi)}$.

14.7 Let $n = 1$, and let $f$ be an even real-valued function of $x$. Show that
$$\hat f(\xi) = \int \cos(x\xi)\, f(x)\,dx.$$

14.8 Let $n(x) = e^{-(1/2)x^2}$, where $x \in \mathbb{R}^1$. Show that
$$\frac{d\hat n}{d\xi}(\xi) = -\xi\,\hat n(\xi),$$
and conclude that
$$\log \hat n(\xi) = -\tfrac12 \xi^2 + \mathrm{const},$$
so that $\hat n(\xi) = \mathrm{const}\times e^{-(1/2)\xi^2}$. Evaluate this constant as $\sqrt{2\pi}$ by setting $\xi = 0$ and using Exercise 13.1. Thus
$$\hat n(\xi) = \sqrt{2\pi}\,e^{-(1/2)\xi^2}.$$

14.9 Show that the limit $\lim_{\varepsilon\to 0}\int_\varepsilon^\infty (\sin x)/x\,dx$ exists. Let us call this limit $d$. Show that for any $R > 0$, $\lim_{\varepsilon\to 0}\int_\varepsilon^\infty (\sin Rx)/x\,dx = d$.

If $f \in S$, we have seen that $\hat f \in S(\mathbb{R}^{n*})$. We can therefore consider the function $\int e^{i\langle y,\xi\rangle}\hat f(\xi)\,d\xi$. The purpose of the next few exercises is to show that
$$f(y) = \frac{1}{(2\pi)^n}\int e^{i\langle y,\xi\rangle}\,\hat f(\xi)\,d\xi. \tag{14.2}$$
We first remark that since all the integrals involved are absolutely convergent, it suffices to show that
$$f(y) = \lim_{R_1\to\infty}\cdots\lim_{R_n\to\infty} \frac{1}{(2\pi)^n}\int_{-R_n}^{R_n}\!\!\cdots\int_{-R_1}^{R_1} \hat f(\xi_1,\dots,\xi_n)\,e^{i(y^1\xi_1+\cdots+y^n\xi_n)}\,d\xi_1\cdots d\xi_n.$$
Substituting the definition of $\hat f$ into this formula and interchanging the order of integration with respect to $x$ and $\xi$, we get
$$\lim_{R_1\to\infty}\cdots\lim_{R_n\to\infty}\left(\frac{1}{2\pi}\right)^n \int\int_{-R_n}^{R_n}\!\!\cdots\int_{-R_1}^{R_1} f(x^1,\dots,x^n)\,e^{i[(y^1-x^1)\xi_1+\cdots+(y^n-x^n)\xi_n]}\,d\xi_1\cdots d\xi_n\,dx.$$
It therefore suffices to evaluate this limit one variable at a time (provided the convergence is uniform, which will be clear from the proof). We have thus reduced the problem to functions of one variable. We must show that if $f \in S(\mathbb{R}^1)$, then
$$f(y) = \lim_{R\to\infty}\frac{1}{2\pi}\int\int_{-R}^{R} f(x)\,e^{i(y-x)\xi}\,d\xi\,dx.$$
We shall first show that
$$f(y) = \lim_{R\to\infty}\frac{1}{4d}\int\int_{-R}^{R} f(x)\,e^{i(y-x)\xi}\,d\xi\,dx,$$
where $d$ is given in Exercise 14.9.

14.10 Show that this last integral can be written as
$$\frac{1}{2d}\int_{-\infty}^{\infty} f(x)\,\frac{\sin R(y-x)}{y-x}\,dx = \frac{1}{d}\int_0^\infty \frac{f(y-u)+f(y+u)}{2}\cdot\frac{\sin Ru}{u}\,du.$$
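The Gaussian transform computed in Exercise 14.8 is a convenient test case for the sign and normalization conventions above. The sketch below (ours, assuming Python with scipy) computes $\hat n$ by quadrature and compares it with $\sqrt{2\pi}\,e^{-\xi^2/2}$.

```python
import numpy as np
from scipy.integrate import quad

# Check of Exercise 14.8: with fhat(xi) = int e^{-i x xi} f(x) dx, the
# transform of n(x) = exp(-x^2/2) should be sqrt(2*pi)*exp(-xi^2/2).

def fourier_transform(f, xi):
    # real part only: n is even and real, so the transform is a cosine
    # integral and the imaginary part vanishes (Exercise 14.7)
    val, _ = quad(lambda x: f(x) * np.cos(x * xi), -np.inf, np.inf)
    return val

n = lambda x: np.exp(-x**2 / 2)
for xi in (0.0, 0.5, 1.0, 2.0):
    print(xi, fourier_transform(n, xi), np.sqrt(2 * np.pi) * np.exp(-xi**2 / 2))
# the two columns agree to quadrature accuracy
```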
14.11 Let
$$g(u) = \frac{f(y-u)+f(y+u)}{2} - f(y).$$
Show that $g(0) = 0$, and conclude that $g(x) = xh(x)$ for $0 \le x \le 1$, where $h \in C^1$. By integrating by parts, show that
$$\left|\int_0^1 \frac{g(u)}{u}\,\sin Ru\,du + \int_1^\infty \frac{g(u)}{u}\,\sin Ru\,du\right| \le \mathrm{const}\cdot\frac{1}{R}.$$
Conclude that
$$\lim_{R\to\infty}\frac{1}{d}\int_0^\infty \frac{f(y-u)+f(y+u)}{2}\cdot\frac{\sin Ru}{u}\,du = f(y).$$
This proves that
$$f(y) = \frac{1}{4d}\int e^{iy\xi}\,\hat f(\xi)\,d\xi.$$

14.12 Using Exercise 14.8, conclude that $d = \pi/2$.

Let $f_1 \in S$ and $f_2 \in S$. Define the function $f_1 * f_2$ by setting
$$f_1*f_2(x) = \int f_1(x-y)\,f_2(y)\,dy.$$
Note that this makes good sense, since the integrand on the right clearly converges for each fixed value of $x$. We can be more precise. Since $f_1, f_2 \in S$, we can, for any integer $p$, find a $K_p$ such that
$$|f_i(y)| \le \frac{K_p}{1+\|y\|^p} \qquad (i = 1, 2),$$
so that
$$\int_{\|y\|>R} |f_i(y)|\,dy \le \frac{L_p\,R^n}{1+R^p}.$$
Then
$$\int (1+\|x\|^q)\,f_1(x-y)f_2(y)\,dy = \int_{\|y\|\le\frac12\|x\|} (1+\|x\|^q)\,f_1(x-y)f_2(y)\,dy + \int_{\|y\|>\frac12\|x\|} (1+\|x\|^q)\,f_1(x-y)f_2(y)\,dy.$$
For $\|y\| \le \frac12\|x\|$ we have $\|x-y\| \ge \frac12\|x\|$, so the first integral is at most
$$C_n\left(\tfrac12\|x\|\right)^n (1+\|x\|^q)\,\max_z|f_2(z)|\,\frac{K_p}{1+(\tfrac12\|x\|)^p},$$
while the second is at most
$$(1+\|x\|^q)\,\max_u|f_1(u)|\,\frac{L_p(\tfrac12\|x\|)^n}{1+(\tfrac12\|x\|)^p}.$$
By choosing $p > q+n$, we see that both terms go to zero. Thus
$$\lim_{\|x\|\to\infty} (1+\|x\|^q)\,f_1*f_2(x) = 0.$$

14.13 Show that
$$\frac{\partial}{\partial x^i}(f_1*f_2) = \frac{\partial f_1}{\partial x^i}*f_2 = f_1*\frac{\partial f_2}{\partial x^i}.$$
Conclude that $f_1*f_2 \in S$.
14.14 Show that if $\varphi$ is any bounded continuous function on $\mathbb{R}^n$, then
$$\int\int \varphi(x+y)\,f_1(x)\,f_2(y)\,dx\,dy = \int \varphi(u)\,(f_1*f_2)(u)\,du.$$

14.15 Conclude that
$$\widehat{f_1*f_2}(\xi) = \hat f_1(\xi)\,\hat f_2(\xi).$$

14.16 Show that
$$f*\tilde f(y) = \left(\frac{1}{2\pi}\right)^n \int |\hat f(\xi)|^2\, e^{i\langle y,\xi\rangle}\,d\xi.$$

14.17 Conclude that for any $f \in S$,
$$\int |f|^2 = \left(\frac{1}{2\pi}\right)^n \int |\hat f(\xi)|^2\,d\xi. \tag{14.3}$$
[Hint: Set $y = 0$ in Exercise 14.16.]

The following exercises use the Fourier transform to develop facts which are useful in the study of partial differential equations. We will make use of these facts at the end of the last chapter. The reader may prefer to postpone his study of these problems until then.

On the space $S$, define the norm $\|\ \|_s$ by setting
$$\|f\|_s^2 = (2\pi)^{-n}\int (1+\|\xi\|^2)^s\,|\hat f(\xi)|^2\,d\xi,$$
and the scalar product $(f,g)_s$ by
$$(f,g)_s = (2\pi)^{-n}\int (1+\|\xi\|^2)^s\,\hat f(\xi)\,\overline{\hat g(\xi)}\,d\xi.$$

14.18 Let $s = R$ be a nonnegative integer. Show that
$$\|f\|_R^2 = \sum_{|a|\le R} \frac{R!}{a!\,(R-|a|)!}\int |D^a f(x)|^2\,dx,$$
where $a! = a_1!\cdots a_n!$. [Use the multinomial theorem, a repeated application of Exercise 14.3, and Eq. (14.3).]

We thus see that $\|f\|_R$ measures the size of $f$ and its derivatives up to order $R$ in the square integral norm. It is helpful to think of $\|\ \|_s$ as a generalization of this notion of size, where now $s$ can be an arbitrary real number. Note that $\|f\|_s \le \|f\|_t$ if $s \le t$.

For any real $s$ define the operator $K^s$ by setting
$$\widehat{K^s f}(\xi) = (1+\|\xi\|^2)^s\,\hat f(\xi).$$

14.19 Show that the operator $K = K^1$ is given by
$$Kf = f - \sum_l \frac{\partial^2 f}{\partial x_l^2}.$$
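Equation (14.3) is another identity that can be tested by quadrature. The following sketch (ours, assuming Python with scipy) checks it in one dimension for an odd test function, whose transform reduces to a sine integral.

```python
import numpy as np
from scipy.integrate import quad

# Check of Eq. (14.3) for n = 1:  int |f|^2 dx = (1/(2*pi)) int |fhat|^2 d(xi).
# Test function f(x) = x*exp(-x^2/2); f is odd and real, so
# fhat(xi) = -i * int sin(x*xi) f(x) dx, and |fhat|^2 is a real sine integral.

f = lambda x: x * np.exp(-x**2 / 2)

def fhat_abs2(xi):
    val, _ = quad(lambda x: f(x) * np.sin(x * xi), -np.inf, np.inf)
    return val**2

lhs, _ = quad(lambda x: f(x)**2, -np.inf, np.inf)
rhs, _ = quad(fhat_abs2, -np.inf, np.inf)
print(lhs, rhs / (2 * np.pi))   # both equal sqrt(pi)/2 ~ 0.8862
```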
14.20 Show that for any real numbers $s$ and $t$,
$$\|K^s f\|_t = \|f\|_{t+2s} \qquad\text{and}\qquad (f, K^s g)_t = (f,g)_{t+s}.$$

14.21 Show that $K^{s+t} = K^s \circ K^t$, so that, in particular, $K^s$ is invertible for all $s$.

We now define the space $H_s$ to be the completion of $S$ under the norm $\|\ \|_s$. The space $H_s$ is a Hilbert space with a scalar product $(\ ,\ )_s$. We can think of the elements of $H_s$ as "generalized functions with generalized derivatives up to order $s$". By construction, the space $S$ is a dense subspace of $H_s$ in the norm $\|\ \|_s$. We note that Exercise 14.20 implies that the operator $K^s$ can be extended to an isometric map of $H_t$ into $H_{t-2s}$. We shall also denote this extended map by $K^s$. By Exercise 14.21, $K^{-s}$ is the inverse of $K^s$, so that $K^s$ is a norm-preserving isomorphism of $H_t$ onto $H_{t-2s}$.

14.22 Let $u \in H_s$ and $v \in H_{-s}$. Show that
$$|(u,v)_0| \le \|u\|_s\,\|v\|_{-s}.$$
Thus we can extend $\langle u,v\rangle \mapsto (u,v)_0$ to a function on $H_s \times H_{-s}$ which is linear in $u$ and antilinear in $v$ [that is, $(u, av_1+bv_2)_0 = \bar a(u,v_1)_0 + \bar b(u,v_2)_0$] and satisfies the above inequality. Thus any $v \in H_{-s}$ defines a bounded linear function $l$ on $H_s$ by $l(u) = (u,v)_0$.

14.23 Conversely, let $l$ be a bounded linear function on $H_s$. Show that there is a $v \in H_{-s}$ with $l(u) = (u,v)_0$ for all $u \in H_s$. [Hint: Consider $v = K^s w$, where $w$ is a suitable element of $H_s$, using Theorem 2.4 of Chapter 5.]

14.24 Show that
$$\|v\|_{-s} = \sup_{\substack{u\in H_s\\ u\ne 0}} \frac{|(u,v)_0|}{\|u\|_s}.$$
(Exercise 14.22 gives one inequality. If $v \ne 0$, take $u = K^{-s}v$ to get
$$(u,v)_0 = (v,v)_{-s} = \|v\|_{-s}^2 = \|u\|_s\,\|v\|_{-s}$$
in order to get equality.)

14.25 Let $2s > n$ (where our functions are defined on $\mathbb{R}^n$). Show that for any $f \in S$ we have
$$\sup_x |f(x)| \le C\,\|f\|_s \qquad\text{(Sobolev's inequality)}.$$
(Use Eq. (14.2), Schwarz's inequality, and the fact that the integral on the right of the inequality is absolutely convergent.)

Sobolev's inequality shows that the injection of $S$ into $C(\mathbb{R}^n)$ extends to a continuous injection of $H_s$ into $C(\mathbb{R}^n)$, where $C(\mathbb{R}^n)$ is given the uniform norm. We can thus regard the elements of $H_s$ as actual functions on $\mathbb{R}^n$ if $s > n/2$.
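Sobolev's inequality is also easy to probe numerically. In the following sketch (ours, assuming Python with scipy), the Schwarz-inequality argument suggested in Exercise 14.25, carried out for $n = 1$, $s = 1$, gives the admissible constant $C^2 = (2\pi)^{-1}\int (1+\xi^2)^{-1}\,d\xi = \tfrac12$, and we test the resulting bound on a Gaussian.

```python
import numpy as np
from scipy.integrate import quad

# Sobolev's inequality for n = 1, s = 1:  sup|f| <= C ||f||_1 with C^2 = 1/2.
# Test function: f(x) = exp(-x^2/2), whose sup is 1 (attained at x = 0).

f = lambda x: np.exp(-x**2 / 2)

def fhat(xi):   # cosine transform; f is even and real (Exercise 14.7)
    return quad(lambda x: f(x) * np.cos(x * xi), -np.inf, np.inf)[0]

norm1_sq = quad(lambda xi: (1 + xi**2) * fhat(xi)**2,
                -np.inf, np.inf)[0] / (2 * np.pi)

print(1.0, np.sqrt(0.5 * norm1_sq))   # 1.0 <= ~1.153, as the inequality asserts
```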
By induction on $|a|$ we can assert that for $s > n/2$, any $f \in H_{|a|+s}$ has $|a|$ continuous derivatives, and
$$\sup_x |D^a f(x)| \le C\,\|f\|_{|a|+s}. \tag{14.4}$$

14.26 Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$. Let $\varphi \in S$ satisfy $\operatorname{supp}\varphi \subset \Omega$. Show that
$$|\xi^a\hat\varphi(\xi)| \le \mu(\Omega)^{1/2}\,\|D^a\varphi\|_0 \quad\text{for all } \xi.$$

14.28 Show that, consequently,
$$(1+\|\xi\|^2)^k\,|\hat\varphi(\xi)|^2 \le \mu(\Omega)\,\|\varphi\|_k^2 \quad\text{for all } \xi.$$

14.29 More generally, let $\psi$ be a function in $S$ which satisfies $\psi(x) = 1$ for all $x \in \Omega$, and let $\varphi \in S$ satisfy $\operatorname{supp}\varphi \subset \Omega$. Show that
$$|\hat\varphi(\xi)| = |(\varphi,\psi_\xi)_0| \le \|\varphi\|_s\,\|\psi_\xi\|_{-s}, \quad\text{where } \psi_\xi(x) = \psi(x)\,e^{-i\langle x,\xi\rangle},$$
and that
$$|D^a_\xi\hat\varphi(\xi)| \le \|\varphi\|_s\,\|\psi^a_\xi\|_{-s}, \quad\text{where } \psi^a_\xi(x) = x^a\psi(x)\,e^{-i\langle x,\xi\rangle}.$$

Let us denote by $H_s^\Omega$ the completion under $\|\ \|_s$ of the space of those functions in $S$ whose supports lie in $\Omega$. According to Exercise 14.29, any $\varphi \in H_s^\Omega$ defines an actual function $\hat\varphi$ of $\xi$ which is differentiable and satisfies
$$|D^a_\xi\hat\varphi(\xi)| \le \|\varphi\|_s\,\|\psi^a_\xi\|_{-s},$$
where $\|\psi^a_\xi\|_{-s}$ depends only on $\Omega$, $a$, $\xi$, and $s$, and is independent of $\varphi$. Furthermore,
$$\|\varphi\|_s^2 = (2\pi)^{-n}\int (1+\|\xi\|^2)^s\,|\hat\varphi(\xi)|^2\,d\xi.$$

14.30 Let $s < t$. Then the injection $H_t^\Omega \to H_s^\Omega$ is a compact mapping. That is, if $\{\varphi_i\}$ is a sequence of elements of $H_t^\Omega$ such that $\|\varphi_i\|_t \le 1$ for all $i$, then we can select a subsequence $\{\varphi_{i_j}\}$ which converges in $\|\ \|_s$. [Hint: By Exercise 14.29, the sequence of functions $\hat\varphi_i(\xi)$ is bounded and equicontinuous on $\{\xi : \|\xi\| \le r\}$ for any fixed $r$. We can thus choose a subsequence which converges uniformly, and therefore a subsubsequence which converges on $\{\xi : \|\xi\| \le r\}$ for all $r$ (the uniformity possibly depending on $r$). Then if $\{\varphi_{i_j}\}$ is this subsubsequence,
$$\|\varphi_{i_j}-\varphi_{i_k}\|_s^2 = (2\pi)^{-n}\int (1+\|\xi\|^2)^s\,\bigl|\hat\varphi_{i_j}(\xi)-\hat\varphi_{i_k}(\xi)\bigr|^2\,d\xi$$
$$= (2\pi)^{-n}\int_{\|\xi\|\le r} (1+\|\xi\|^2)^s\,\bigl|\hat\varphi_{i_j}(\xi)-\hat\varphi_{i_k}(\xi)\bigr|^2\,d\xi + (2\pi)^{-n}\int_{\|\xi\|>r} (1+\|\xi\|^2)^s\,\bigl|\hat\varphi_{i_j}(\xi)-\hat\varphi_{i_k}(\xi)\bigr|^2\,d\xi$$
$$\le (2\pi)^{-n}\int_{\|\xi\|\le r} (1+\|\xi\|^2)^s\,\bigl|\hat\varphi_{i_j}(\xi)-\hat\varphi_{i_k}(\xi)\bigr|^2\,d\xi + 2(1+r^2)^{s-t}\left\{\|\varphi_{i_j}\|_t^2 + \|\varphi_{i_k}\|_t^2\right\}.]$$
CHAPTER 9

DIFFERENTIABLE MANIFOLDS

Thus far our study of the calculus has been devoted to properties of, and operations on, functions defined on (subsets of) a vector space. One of the ideas used was the approximation of possibly nonlinear functions at each point by linear functions. In this chapter we shall generalize our notion of space to include spaces which cannot, in any natural way, be regarded as open subsets of a vector space. One of the tools we shall use is the "approximation" of such a space at each point by a linear space.

Suppose we are interested in studying functions on (the surface of) the unit sphere in $\mathbb{E}^3$. The sphere is a two-dimensional object in the sense that we can describe a neighborhood of every point of the sphere in a bicontinuous way by two coordinates. On the other hand, we cannot map the sphere in a bicontinuous one-to-one way onto an open subset of the plane (since the sphere is compact and an open subset of $\mathbb{E}^2$ is not). Thus pieces of the sphere can be described by open subsets of $\mathbb{E}^2$, but the whole sphere cannot. Therefore, if we want to do calculus on the whole sphere at once, we must introduce a more general class of spaces and study functions on them.

Even if a space can be regarded as a subset of a vector space, it is conceivable that it cannot be so regarded in any canonical way. Thus the state of a (homogeneous ideal) gas in equilibrium is specified when one gives any two of the three parameters: temperature, pressure, or volume. There is no reason to prefer any two to the third. The transition from one set of parameters to another is given by a one-to-one bidifferentiable map. Thus any function of the states of the gas which is a differentiable function in terms of one choice of parameters is differentiable in terms of any other. Thus it makes sense to talk of differentiable functions on the states of the gas. However, a function which is linear in terms of one choice of parameters need not be linear in terms of the other. Thus it doesn't really make sense to talk of linear functions on the states of the gas. In such a situation we would like to know what properties of functions and what operations make sense in the space and are not artifacts of the description we give of the space.

Finally, even in a vector space it is sometimes convenient to introduce "nonlinear coordinates" for the solution of specific problems: for example, polar coordinates in Exercises 11.3 and 11.4, Chapter 8. We would therefore like to know how various objects change when we change coordinates and, if possible, to introduce notation which is independent of the coordinate system.
We will begin our formal discussion with the definition of differentiable manifolds. The basic idea is similar to the one that is used in everyday life to describe the surface of the earth. One gives a collection of charts describing small overlapping portions of the globe. We can piece the whole picture together by seeing how the charts match up.

Fig. 9.1

1. ATLASES

Let $M$ be a set. Let $V$ be a Banach space. (For almost all our applications we shall take $V$ to be $\mathbb{R}^n$ for some integer $n$.) A $V$-atlas of class $C^k$ on $M$ is a collection $\mathcal{A}$ of pairs $(U_i,\varphi_i)$, called charts, where $U_i$ is a subset of $M$ and $\varphi_i$ is a bijective map of $U_i$ onto an open subset of $V$, subject to the following conditions (Fig. 9.1):

A1. For any $(U_i,\varphi_i) \in \mathcal{A}$ and $(U_j,\varphi_j) \in \mathcal{A}$ the sets $\varphi_i(U_i\cap U_j)$ and $\varphi_j(U_i\cap U_j)$ are open subsets of $V$, and the maps
$$\varphi_i\circ\varphi_j^{-1}\colon \varphi_j(U_i\cap U_j) \to \varphi_i(U_i\cap U_j)$$
are differentiable of class $C^k$.

A2. $\bigcup U_i = M$.

The functions $\varphi_i\circ\varphi_j^{-1}$ are called the transition functions of the atlas $\mathcal{A}$. The following are examples of sets with atlases.

Example 1. The trivial example. Let $M$ be an open subset of $V$. If we take $\mathcal{A}$ to consist of the single element $(U,\varphi)$, where $U = M$ and $\varphi\colon U\to V$ is the identity map, then Axioms A1 and A2 are trivially fulfilled.

Example 2. The sphere. Let $M = S^n$ denote the subset of $\mathbb{R}^{n+1}$ given by
$$(x^1)^2 + \cdots + (x^{n+1})^2 = 1.$$
Let the set $U_1$ consist of those points for which $x^{n+1} > -1$, and let $U_2$ consist of those points for which $x^{n+1} < 1$. Let
$$\varphi_1\colon U_1 \to \mathbb{R}^n$$
be given by
$$y^i\circ\varphi_1(x^1,\dots,x^{n+1}) = \frac{x^i}{1+x^{n+1}}, \qquad i = 1,\dots,n,$$
where $y^1,\dots,y^n$ are coordinates on $\mathbb{R}^n$. Thus the map $\varphi_1$ is given by the projection from the "south pole" $\langle 0,\dots,0,-1\rangle$ to $\mathbb{R}^n$ regarded as the equatorial plane (see Fig. 9.2). Similarly, define $\varphi_2$ by
$$y^i\circ\varphi_2(x^1,\dots,x^{n+1}) = \frac{x^i}{1-x^{n+1}}.$$
Then
$$\varphi_1(U_1\cap U_2) = \varphi_2(U_1\cap U_2) = \{y \in \mathbb{R}^n : y \ne 0\}.$$
Now
$$\sum_i \left(y^i\circ\varphi_1\right)^2(x^1,\dots,x^{n+1}) = \frac{(x^1)^2+\cdots+(x^n)^2}{(1+x^{n+1})^2} = \frac{1-(x^{n+1})^2}{(1+x^{n+1})^2} = \frac{1-x^{n+1}}{1+x^{n+1}}.$$
Thus
$$\varphi_2(x) = \frac{\varphi_1(x)}{\|\varphi_1(x)\|^2}.$$
In other words, the map $\varphi_2\circ\varphi_1^{-1}$, defined for all $y \ne 0$, is given by
$$\varphi_2\circ\varphi_1^{-1}(y) = \frac{y}{\|y\|^2}.$$
Thus conditions A1 and A2 are fulfilled.

Fig. 9.2

Note that the atlas we gave for the sphere contains only two charts (each given by polar projection). An atlas of the earth usually contains many more charts. In other words, many different atlases can be used to describe the same set. We shall return to this point later.
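The computation above can be confirmed numerically. The sketch below (ours, assuming Python with numpy) evaluates both charts at a random point of $S^2$ and checks that the transition map $y \mapsto y/\|y\|^2$ carries the one chart value to the other.

```python
import numpy as np

# Check of Example 2: for a random point x on S^2, the two stereographic
# charts phi1, phi2 and the transition map y -> y/||y||^2 should satisfy
# phi2(x) = (phi2 o phi1^{-1})(phi1(x)).

rng = np.random.default_rng(0)
x = rng.normal(size=3)
x /= np.linalg.norm(x)          # a random point of S^2

phi1 = x[:2] / (1 + x[2])       # projection from the south pole
phi2 = x[:2] / (1 - x[2])       # projection from the north pole

transition = phi1 / np.dot(phi1, phi1)   # y -> y / ||y||^2
print(phi2, transition)                   # the two vectors coincide
```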
Fig. 9.3

Example 3. The circle. The circle $S^1$ is a "one-dimensional sphere" and therefore has an atlas as described in Example 2. We wish to describe a different atlas on $S^1$. Regard $S^1$ as the unit circle $x_1^2 + x_2^2 = 1$, and consider the function $\theta_1$, defined in a neighborhood of $\langle 1,0\rangle$ on the upper semicircle of $S^1$, which gives the angle from the point on $S^1$ to $\langle 1,0\rangle$ (see Fig. 9.3). As we move counterclockwise around the circle, this function is well defined until we hit $\langle 1,0\rangle$ again. We will take, as the first chart in our atlas, $(U_1,\theta_1)$, where $U_1 = S^1 - \{\langle 1,0\rangle\}$ and $\theta_1$ is the function defined above. Let $U_2 = S^1 - \{\langle 0,1\rangle\}$, and define $\theta_2$ to be $\pi/2$ plus the angle (measured counterclockwise) from $\langle 0,1\rangle$ (see Fig. 9.4). Now
$$U_1\cap U_2 = S^1 - \{\langle 1,0\rangle, \langle 0,1\rangle\},$$
and
$$\theta_1(U_1\cap U_2) = (0,2\pi) - \{\pi/2\}.$$

Fig. 9.4

The map $\theta_2\circ\theta_1^{-1}$ is given by
$$\theta_2\circ\theta_1^{-1}(x) = \begin{cases} x + 2\pi & \text{if } 0 < x < \pi/2,\\ x & \text{if } \pi/2 < x < 2\pi.\end{cases}$$

Example 4. The product of two atlases. Let $\mathcal{A} = \{(U_i,\varphi_i)\}$ be a $V_1$-atlas on a set $M$, and let $\mathcal{B} = \{(W_j,\psi_j)\}$ be a $V_2$-atlas on a set $N$, where $V_1$ and $V_2$ are Banach spaces. Then the collection $\mathcal{C} = \{(U_i\times W_j,\ \varphi_i\times\psi_j)\}$ is a $(V_1\times V_2)$-atlas on $M\times N$. Here
$$\varphi_i\times\psi_j(p,q) = \langle \varphi_i(p), \psi_j(q)\rangle \quad\text{if } \langle p,q\rangle \in U_i\times W_j.$$
It is easy to check that $\mathcal{C}$ satisfies conditions A1 and A2. We shall call $\mathcal{C}$ the product of $\mathcal{A}$ and $\mathcal{B}$ and write $\mathcal{C} = \mathcal{A}\times\mathcal{B}$.

For instance, let $M = (0,1) \subset \mathbb{R}^1$ and $N = S^1$. Then we can regard $M\times N$ as a cylinder or an annulus. If $M = N = S^1$, then $M\times N$ is a torus.

(Figures: a cylinder, an annulus, and a torus.)
It is an instructive exercise to write down the atlases and transition functions explicitly in these cases.

Example 5. As a generalization of our first example, let $S$ be a submanifold of an $(n+m)$-dimensional vector space $X$, as defined in Section 12 of Chapter 3. For each neighborhood $N$ defined there, the set $S\cap N$, together with the map $\varphi$ which is defined as the projection $\pi_1$ restricted to $S$, provides a chart with values in $V$ (where $X$ is viewed as $V\times W$). In such a neighborhood $N$ the set $S$ is presented as the graph of a function $F$. In other words,
$$S\cap N = \{\langle x, F(x)\rangle \in V\times W : x \in \pi_1(S)\},$$
where $F$ is a smooth map of $A = \pi_1(S\cap N)$ into $W$. Let $N'$ be another such neighborhood with corresponding projection $\pi_1'$ (where now $X$ is identified with $V\times W$ in some other way). Then
$$\varphi'\circ\varphi^{-1}(x) = \pi_1'(x, F(x)),$$
which shows that $\varphi'\circ\varphi^{-1}$ is a smooth map. Thus every submanifold in the sense of Chapter 3 possesses an atlas.

Exercise. Let $\mathbb{P}^n$ (projective $n$-space) denote the space of all lines through the origin in $\mathbb{R}^{n+1}$. Any such line is determined by a nonzero vector lying on the line. Two such vectors, $\langle x^1,\dots,x^{n+1}\rangle$ and $\langle y^1,\dots,y^{n+1}\rangle$, determine the same line if and only if they differ by a factor, that is, $y^i = \lambda x^i$ for all $i$, where $\lambda$ is some (nonzero) real number. We can thus regard an element of $\mathbb{P}^n$ as an equivalence class of nonzero vectors. For each $i$ between $1$ and $n+1$, let $U_i \subset \mathbb{P}^n$ be the set of those elements coming from vectors with $x^i \ne 0$. Map
$$\alpha_i\colon U_i \to \mathbb{R}^n$$
by sending
$$\langle x^1,\dots,x^{n+1}\rangle \mapsto \left\langle \frac{x^1}{x^i},\dots,\frac{x^{i-1}}{x^i},\frac{x^{i+1}}{x^i},\dots,\frac{x^{n+1}}{x^i}\right\rangle.$$
Show that the map $\alpha_i$ is well defined and that $\{(U_i,\alpha_i)\}$ is an atlas on $\mathbb{P}^n$.

2. FUNCTIONS, CONVERGENCE

Let $\mathcal{A}$ be a $V$-atlas of class $C^k$ on a set $M$. Let $f$ be a real-valued function defined on $M$. For a chart $(U_i,\varphi_i)$ we obtain a function $f_i$ defined on $\varphi_i(U_i)$ by setting
$$f_i = f\circ\varphi_i^{-1}. \tag{2.1}$$
The function $f_i$ can be regarded as the "local expression of $f$" in terms of the chart $(U_i,\varphi_i)$. In general, the functions $f_i$ will look quite different from one another. For example, let $M = S^n$, let $\mathcal{A}$ be the atlas described above, and let $f$ be the function on the sphere assigning to the point $\langle x^1,\dots,x^{n+1}\rangle$ the value $x^{n+1}$. Then
$$f_1(y) = f\circ\varphi_1^{-1}(y) = \frac{1-\|y\|^2}{1+\|y\|^2},$$
while
$$f_2(y) = f\circ\varphi_2^{-1}(y) = 1 - \frac{2}{1+\|y\|^2},$$
as one can check by solving the equations.
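The two local expressions just computed can be checked against the function itself. The following sketch (ours, assuming Python with numpy) does this for a random point of $S^2$.

```python
import numpy as np

# Check of the local expressions above: for a random x on S^2, the value
# f(x) = x_3 should agree with f1(phi1(x)) and with f2(phi2(x)).

rng = np.random.default_rng(1)
x = rng.normal(size=3)
x /= np.linalg.norm(x)

y1 = x[:2] / (1 + x[2])                 # phi1(x)
y2 = x[:2] / (1 - x[2])                 # phi2(x)

f1 = (1 - y1 @ y1) / (1 + y1 @ y1)      # local expression in the first chart
f2 = 1 - 2 / (1 + y2 @ y2)              # local expression in the second chart
print(x[2], f1, f2)                     # all three numbers agree
```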
Returning to the general discussion, we observe that the functions $f_i$ are not completely independent of one another. In fact, it follows from the definition (2.1) that we have
$$f_i\circ(\varphi_i\circ\varphi_j^{-1}) = f_j \quad\text{on } \varphi_j(U_i\cap U_j). \tag{2.2}$$
[Thus in the example cited above we indeed have $f_2(y) = f_1(y/\|y\|^2)$, as is required by (2.2).]

We now come to a simple but important observation. Suppose we start with a collection of functions $\{f_i\}$, each $f_i$ defined on $\varphi_i(U_i)$, and such that (2.2) holds. Then there exists a unique function $f$ on $M$ such that $f_i = f\circ\varphi_i^{-1}$. In fact, define $f$ by setting $f(p) = f_i(\varphi_i(p))$ if $p \in U_i$. For $f$ to be well defined, we must be sure that this definition is consistent, i.e., that if $p$ is also in $U_j$, then $f_i(\varphi_i(p)) = f_j(\varphi_j(p))$; but this is exactly what (2.2) says.

We can thus think of a real-valued function in two ways: as either

i) an object defined invariantly on $M$, i.e., a map from $M$ to $\mathbb{R}$, or

ii) a collection of objects (in this case functions), one defined for each chart and satisfying certain "transition laws", namely (2.2).

This dual way of looking at objects on $M$ will recur quite frequently in what follows.

Let $M$ be a set with an atlas of class $C^k$. We will say that a function $f$ is of class $C^l$ ($l \le k$) if each of the functions $f_i$ defined by (2.1) is of class $C^l$. Note that since $l \le k$, this can happen without any interference from (2.2): if $f_i \in C^l$ and $\varphi_i\circ\varphi_j^{-1} \in C^k$ ($k \ge l$), then $f_i\circ(\varphi_i\circ\varphi_j^{-1}) \in C^l$. If $l$ were larger than $k$, then in general $f_i$ would not be of class $C^l$ if $f_j$ were, and there would be very few functions of class $C^l$. Since we will not wish to constantly specify degrees of differentiability of our atlas, from now on when we speak of an atlas we shall mean an atlas of class $C^\infty$.

Let $M$ be a set with an atlas $\mathcal{A}$. We shall say that a sequence of points $\{x_k \in M\}$ converges to $x \in M$ if

i) there exists a chart $(U_i,\varphi_i) \in \mathcal{A}$ and an integer $N$ such that $x \in U_i$ and $x_k \in U_i$ for all $k > N$;

ii) $\varphi_i(x_k)$, $k > N$, converges to $\varphi_i(x)$.

Note that if $(U_j,\varphi_j)$ is any other chart with $x \in U_j$, then there exists an $N_j$ such that $x_k \in U_j$ for $k > N_j$ and $\varphi_j(x_k) \to \varphi_j(x)$. In fact, choose $N_j$ so that $\varphi_i(x_k) \in \varphi_i(U_i\cap U_j)$ for all $k \ge N_j$. (This is possible since $\varphi_i(U_i\cap U_j)$ is open by A1.) The fact that the $\varphi_j(x_k)$ converge to $\varphi_j(x)$ follows from the continuity of $\varphi_j\circ\varphi_i^{-1}$. It thus makes good sense to say that $\{x_k\}$ converges to $x$.

Warning. It does not make sense to say that a sequence $\{x_k\}$ is a Cauchy sequence. Thus, for example, let $M = S^n$ with the atlas described above. If $\{x_k\}$ is a sequence of points converging to the north pole in $S^n$, then $\varphi_1(x_k) \to 0$, while $\|\varphi_2(x_k)\| \to \infty$. This example becomes even more sticky if we remove the north pole, i.e., let $M = S^n - \{\langle 0,\dots,0,1\rangle\}$ and define the charts as before.
Then $\{x_k\}$ has no limit (in $M$). Clearly, $\{\varphi_1(x_k)\}$ is a Cauchy sequence, while $\{\varphi_2(x_k)\}$ is not.

Once we have a notion of convergence, we can talk about such things as open sets and closed sets. We could also define them directly. For instance, a set $U$ is open if $\varphi_i(U\cap U_i)$ is an open subset of $\varphi_i(U_i)$ for all charts $(U_i,\varphi_i)$, and so on.

EXERCISES

2.1 Show that the above definition of a set's being open is consistent, i.e., that there exist nonempty open sets. (In fact, each of the $U_i$'s is open.)

2.2 Show that a sequence $\{x_\alpha\}$ converges to $x$ if and only if for every open set $U$ containing $x$ there is an $N_U$ with $x_\alpha \in U$ for $\alpha > N_U$.

Let $\mathcal{A} = \{(U_i,\varphi_i)\}$ be an atlas on $M$, and let $U$ be an open subset of $M$ relative to this atlas. Let $\mathcal{A}|U$ be the collection of all pairs $(U_i\cap U,\ \varphi_i|U)$. It is easy to check that $\mathcal{A}|U$ is an atlas on $U$. We shall call it the restriction of $\mathcal{A}$ to $U$.

Let $f$ be a function defined on the open set $U$. We say that $f$ is of class $C^l$ on $U$ if it is of class $C^l$ relative to the atlas $\mathcal{A}|U$ on $U$. For later convenience we shall say that a function $f$ defined on a subset of $M$ is of class $C^l$ if

i) the domain of $f$ is some open set $U$ of $M$, and

ii) $f$ is of class $C^l$ on $U$.

3. DIFFERENTIABLE MANIFOLDS

In our discussion of the examples in Section 1, the particular choice of atlas that we made in each case was rather arbitrary. We could equally well have introduced a different atlas in each case without changing the class of differentiable functions, or the class of open sets, or convergent sequences, and so on. We therefore introduce an equivalence relation between atlases on $M$: Let $\mathcal{A}_1$ and $\mathcal{A}_2$ be atlases on $M$. We say that they are equivalent if their union $\mathcal{A}_1\cup\mathcal{A}_2$ is again an atlas on $M$. The crucial condition is that A1 still hold for the union. This means that for any charts $(U_i,\varphi_i) \in \mathcal{A}_1$ and $(W_j,\psi_j) \in \mathcal{A}_2$ the sets $\varphi_i(U_i\cap W_j)$ and $\psi_j(U_i\cap W_j)$ are open and $\varphi_i\circ\psi_j^{-1}$ is a differentiable map of $\psi_j(U_i\cap W_j)$ onto $\varphi_i(U_i\cap W_j)$ with a differentiable inverse.

It is clear that the relation introduced is an equivalence relation. Furthermore, it is an easy exercise to check that if $f$ is a function of class $C^l$ with respect to a given atlas, it is of class $C^l$ with respect to any equivalent one. The same is true for the notions of open set and convergence.
Definition 3.1. A set $M$ together with an equivalence class of atlases on $M$ is called a differentiable manifold if it satisfies the "Hausdorff property": for any two points $x_1 \ne x_2$ of $M$ there are open sets $U_1$ and $U_2$ with $x_1 \in U_1$ and $x_2 \in U_2$ and $U_1\cap U_2 = \emptyset$.

In what follows we shall (by abuse of the language) denote a differentiable manifold by $M$, where the equivalence class of atlases is understood. By an atlas of $M$ we shall then mean an atlas belonging to the given equivalence class, and by a chart of $M$ we shall mean a chart belonging to some atlas of $M$. We shall also adopt the notational convention that $V$ is the Banach space where the charts on $M$ take their values (and shall say that $M$ is a $V$-manifold). If there are several manifolds, $M_1$, $M_2$, etc., under discussion, we shall denote the corresponding vector spaces by $V_1$, $V_2$, etc. If $V = \mathbb{R}^n$, we say that $M$ is an $n$-dimensional manifold.

Let $M_1$ and $M_2$ be differentiable manifolds. A map $\varphi\colon M_1\to M_2$ is called continuous if for any open set $U_2 \subset M_2$ the set $\varphi^{-1}(U_2)$ is an open subset of $M_1$. Let $x_2 \in M_2$, and let $U_2$ be any open set containing $x_2$. If $\varphi(x_1) = x_2$, then $\varphi^{-1}(U_2)$ is an open set containing $x_1$. If $(W,\alpha)$ is a chart about $x_1$, then $W\cap\varphi^{-1}(U_2)$ is an open subset of $W$, and $\alpha(W\cap\varphi^{-1}(U_2))$ is an open set in $V_1$ containing $\alpha(x_1)$. Therefore, there exists an $\varepsilon > 0$ such that $\varphi(x) \in U_2$ for all $x \in W$ with $\|\alpha(x)-\alpha(x_1)\| < \varepsilon$. In this sense, all points "close to $x_1$" are mapped "close to $x_2$". Note that the choice of $\varepsilon$ will depend on the chart $(W,\alpha)$ as well as on $x_1$, $x_2$, $U_2$, and $\varphi$.

If $M_1$, $M_2$, and $M_3$ are differentiable manifolds, and if $\varphi\colon M_1\to M_2$ and $\psi\colon M_2\to M_3$ are continuous maps, it is easy to see that their composition $\psi\circ\varphi$ is a continuous map from $M_1$ to $M_3$.

Let $\varphi$ be a continuous map from $M_1$ to $M_2$. Let $(W_1,\alpha_1)$ be a chart on $M_1$ and $(W_2,\alpha_2)$ a chart on $M_2$. We say that these charts are compatible (under $\varphi$) if $\varphi(W_1) \subset W_2$. If $\mathcal{A}_2$ is an atlas on $M_2$ and $\mathcal{A}_1$ is an atlas on $M_1$, we say that $\mathcal{A}_1$ and $\mathcal{A}_2$ are compatible under $\varphi$ if for every $(W_1,\alpha_1) \in \mathcal{A}_1$ there exists a $(W_2,\alpha_2) \in \mathcal{A}_2$ compatible with it, i.e., such that $\varphi(W_1) \subset W_2$. (Note that the map $\alpha_2\circ(\varphi|W_1)\circ\alpha_1^{-1}$ is then a continuous map of an open subset of $V_1$ into $V_2$.) Given $\mathcal{A}_2$ and $\varphi$, we can always find an $\mathcal{A}_1$ compatible with $\mathcal{A}_2$ under $\varphi$. In fact, let $\mathcal{A}_1'$ be any atlas on $M_1$, and set
$$\mathcal{A}_1 = \left\{\left(W_1\cap\varphi^{-1}(W_2),\ \alpha\,\big|\,\left(W_1\cap\varphi^{-1}(W_2)\right)\right)\right\},$$
where $(W_1,\alpha)$ ranges over all charts of $\mathcal{A}_1'$ and $(W_2,\alpha_2)$ ranges over all charts of $\mathcal{A}_2$.

Definition 3.2. Let $M_1$ and $M_2$ be differentiable manifolds, and let $\varphi$ be a map $M_1 \xrightarrow{\ \varphi\ } M_2$. We say that $\varphi$ is differentiable if the following hold:

i) $\varphi$ is continuous.

ii) Let $\mathcal{A}_1$ and $\mathcal{A}_2$ be compatible atlases under $\varphi$. Then for any compatible $(W_1,\alpha_1) \in \mathcal{A}_1$ and $(W_2,\alpha_2) \in \mathcal{A}_2$, the map
$$\alpha_2\circ\varphi\circ\alpha_1^{-1}\colon \alpha_1(W_1) \to \alpha_2(W_2)$$
is differentiable (as a map of an open subset of a Banach space into a Banach space). (See Fig. 9.5.)
Fig. 9.5

In order to check that a continuous map $\varphi$ is differentiable, it suffices to check much less than (ii). Condition (ii) refers to every pair of compatible atlases and every pair of compatible charts. In fact, we can assert:

Proposition 3.1. Let $\varphi\colon M_1\to M_2$ be continuous, and let $\mathcal{A}_1$ and $\mathcal{A}_2$ be compatible atlases under $\varphi$. Suppose that for every $(W_1,\alpha_1) \in \mathcal{A}_1$ there exists a $(W_2,\alpha_2) \in \mathcal{A}_2$ with $\varphi(W_1) \subset W_2$ and $\alpha_2\circ\varphi\circ\alpha_1^{-1}$ differentiable. Then $\varphi$ is differentiable.

Proof. Let $(U_1,\beta_1)$ and $(U_2,\beta_2)$ be any charts on $M_1$ and $M_2$ with $\varphi(U_1) \subset U_2$. We must show that $\beta_2\circ\varphi\circ\beta_1^{-1}$ is differentiable. It suffices to show that it is differentiable in the neighborhood of every point $\beta_1(x_1)$, where $x_1 \in U_1$. Choose $(W_1,\alpha_1) \in \mathcal{A}_1$ with $x_1 \in W_1$, and choose $(W_2,\alpha_2) \in \mathcal{A}_2$ with $\varphi(W_1) \subset W_2$. Then on $\beta_1(W_1\cap U_1)$ we have
$$\beta_2\circ\varphi\circ\beta_1^{-1} = (\beta_2\circ\alpha_2^{-1})\circ(\alpha_2\circ\varphi\circ\alpha_1^{-1})\circ(\alpha_1\circ\beta_1^{-1}),$$
so that the left-hand side is differentiable. $\square$

In other words, it suffices to verify differentiability with one pair of atlases. We have as a consequence:

Proposition 3.2. Let $\varphi\colon M_1\to M_2$ and $\psi\colon M_2\to M_3$ be differentiable. Then $\psi\circ\varphi$ is differentiable.

Proof. Let $\mathcal{A}_3$ be an atlas on $M_3$. Choose $\mathcal{A}_2$ compatible with $\mathcal{A}_3$ under $\psi$, and then choose an atlas $\mathcal{A}_1$ on $M_1$ compatible with $\mathcal{A}_2$ under $\varphi$. For any $(W_1,\alpha_1) \in \mathcal{A}_1$ choose $(W_2,\alpha_2) \in \mathcal{A}_2$ and $(W_3,\alpha_3) \in \mathcal{A}_3$ with $\varphi(W_1) \subset W_2$ and $\psi(W_2) \subset W_3$. Then
$$\alpha_3\circ\psi\circ\varphi\circ\alpha_1^{-1} = (\alpha_3\circ\psi\circ\alpha_2^{-1})\circ(\alpha_2\circ\varphi\circ\alpha_1^{-1})$$
is differentiable. $\square$

Exercise 3.1. Let $M_1 = S^n$, let $M_2 = \mathbb{P}^n$, and let $\varphi\colon M_1\to M_2$ be the map sending each point of the unit sphere into the line it determines. (Note that two antipodal
points of $S^n$ go into the same point of $\mathbb{P}^n$.) Construct compatible atlases for $\varphi$ and show that $\varphi$ is differentiable.

Note that if $f$ is any function on $M$ with values in a Banach space, then $f$ is differentiable as a function (in the sense of Section 2) if and only if it is differentiable as a map of manifolds. In particular, let $\varphi\colon M_1\to M_2$ be a differentiable map, and let $f$ be a differentiable function on $M_2$ (defined on some open subset, say $U_2$). Then $f\circ\varphi$ is a differentiable function on $M_1$ [defined on the open set $\varphi^{-1}(U_2)$]. Thus $\varphi$ "pulls back" a differentiable function on $M_2$ to $M_1$. From this point of view we can say that $\varphi$ induces a map from the collection of differentiable functions on $M_2$ to the collection of differentiable functions on $M_1$. We shall denote this induced map by $\varphi^*$. Thus
$$\varphi^*\colon \{\text{differentiable functions on } M_2\} \to \{\text{differentiable functions on } M_1\}$$
is given by
$$\varphi^*[f] = f\circ\varphi.$$
If $\psi\colon M_2\to M_3$ is a second differentiable map, then $(\psi\circ\varphi)^*$ goes from functions on $M_3$ to functions on $M_1$, and we have
$$(\psi\circ\varphi)^* = \varphi^*\circ\psi^* \tag{3.1}$$
(note the change of order). In fact, for $g$ on $M_3$,
$$(\psi\circ\varphi)^*g = g\circ(\psi\circ\varphi) = (g\circ\psi)\circ\varphi = \varphi^*[\psi^*[g]].$$

Observe that if $\varphi$ is any map from $M_1\to M_2$ and $f$ is any function defined on a subset $S_2$ of $M_2$, then the "pullback" $\varphi^*[f] = f\circ\varphi$ is a function defined on the subset $\varphi^{-1}(S_2)$ of $M_1$. The fact that $\varphi$ is continuous allows us to conclude that if $S_2$ is open, then so is $\varphi^{-1}(S_2)$. The fact that $\varphi$ is differentiable implies that $\varphi^*[f]$ is differentiable whenever $f$ is.

The map $\varphi^*$ commutes with all algebraic operations whenever they are defined. More precisely, suppose $f$ and $g$ take values in the same vector space and have domains of definition $U_1$ and $U_2$. Then $f+g$ is defined on $U_1\cap U_2$, and $\varphi^*[f]+\varphi^*[g]$ is defined on $\varphi^{-1}(U_1\cap U_2)$, and we clearly have
$$\varphi^*[f+g] = \varphi^*[f]+\varphi^*[g].$$

EXERCISES

3.2 Let $M_2$ be a finite-dimensional manifold, and let $\varphi\colon M_1\to M_2$ be continuous. Suppose that $\varphi^*[f]$ is differentiable for any (locally defined) differentiable real-valued function $f$. Conclude that $\varphi$ is differentiable.

3.3 Show that if $\varphi$ is a bounded linear map between Banach spaces, then $\varphi^*$ as defined above is an extension of $\varphi^*$ as defined in Section 3, Chapter 2.
4. THE TANGENT SPACE

In this section we are going to construct an "approximating vector space" to a differentiable manifold at each point of the manifold. This will allow us to formulate most of the notions of the differential calculus on manifolds.

Let $M$ be a differentiable manifold, and let $x$ be a point of $M$ (Fig. 9.6). Let $I \subset \mathbb{R}$ be an interval containing the origin. Let $\varphi$ be a differentiable map of $I$ into $M$ such that $\varphi(0) = x$. We will call $\varphi$ a (differentiable) curve through $x$. Let $f$ be any differentiable real-valued function on $M$ defined in a neighborhood of $x$. Then $\varphi^*[f]$ is differentiable on $\mathbb{R}$, and we can consider its derivative at the origin. Define the operator $D_\varphi$ by
$$D_\varphi(f) = \frac{d\varphi^*[f]}{dt}\bigg|_{t=0}.$$

Fig. 9.6

In view of the linearity of $\varphi^*$, the map $f\mapsto D_\varphi(f)$ is linear:
$$D_\varphi(af+bg) = a\,D_\varphi(f) + b\,D_\varphi(g).$$
Similarly, we have Leibnitz's rule:
$$D_\varphi(fg) = f(x)\,D_\varphi(g) + g(x)\,D_\varphi(f),$$
which can easily be checked.

The functional $D_\varphi$ depends on the curve $\varphi$. If $\psi$ is a second curve, then, in general, $D_\psi \ne D_\varphi$. If, however, $D_\psi = D_\varphi$, then we say that the curves $\varphi$ and $\psi$ are tangent at $x$, and we write $\varphi \sim \psi$. Thus $\varphi \sim \psi$ if and only if $D_\varphi(f) = D_\psi(f)$ for all differentiable functions $f$. It is easy to check that $\sim$ is an equivalence relation. An equivalence class of curves through $x$ will be called a tangent vector at $x$. If $\xi$ is a tangent vector at $x$ and $\varphi \in \xi$, we say that $\xi$ is tangent to $\varphi$ at $x$. For any differentiable function $f$ defined about $x$ and any tangent vector $\xi$, we set
$$\xi(f) = D_\varphi(f), \quad\text{where } \varphi \in \xi.$$
Thus $\xi$ gives us a functional on differentiable functions defined about $x$. We have
$$\xi(af+bg) = a\,\xi(f) + b\,\xi(g), \tag{4.1}$$
$$\xi(fg) = f(x)\,\xi(g) + g(x)\,\xi(f). \tag{4.2}$$

Let us examine what the equivalence relation $\sim$ says in terms of a chart $(W,\alpha)$ about $x$. The functional $D_\varphi(f)$ can be written as
$$\frac{d(f\circ\varphi)}{dt}\bigg|_{t=0} = \frac{d\left[(f\circ\alpha^{-1})\circ(\alpha\circ\varphi)\right]}{dt}\bigg|_{t=0}.$$
If we set $\Phi = \alpha\circ\varphi$ and $F = f\circ\alpha^{-1}$, then $\Phi$ is a parametrized curve in a Banach space and $F$ is a differentiable function there. We can thus write
$$D_\varphi(f) = dF(\Phi'(0)) = D_{\Phi'(0)}F.$$
From this expression we see (setting $\Psi = \alpha\circ\psi$) that $\psi \sim \varphi$ if and only if $\Phi'(0) = \Psi'(0)$. We thus see that in terms of a chart $(W,\alpha)$, every tangent vector $\xi$ at $x$ corresponds to a unique vector $\xi_\alpha \in V$ given by
$$\xi_\alpha = (\alpha\circ\varphi)'(0), \quad\text{where } \varphi \in \xi.$$
Conversely, given any $v \in V$, there is a tangent vector $\xi$ with $\xi_\alpha = v$. In fact, define $\varphi$ by setting
$$\varphi(t) = \alpha^{-1}(\alpha(x)+tv).$$
Then $\varphi$ is defined in a small enough interval about $0$, and $(\alpha\circ\varphi)' = v$. In short, a choice of chart allows us to identify the set of all tangent vectors at $x$ with $V$.

Let $(U,\beta)$ be a second chart about $x$. Then
$$\xi_\beta = (\beta\circ\varphi)'(0) = \left((\beta\circ\alpha^{-1})\circ(\alpha\circ\varphi)\right)'(0).$$
By the chain rule we thus have
$$\xi_\beta = J_{\beta\circ\alpha^{-1}}(\alpha(x))\,\xi_\alpha, \tag{4.3}$$
where $J_\gamma(p)$ is the differential $d\gamma_p$ of $\gamma$ at $p$. Since $J_{\beta\circ\alpha^{-1}}(\alpha(x))$ is a linear map of $V$ into itself, Eq. (4.3) says that the set of all tangent vectors at $x$ can be identified with $V$, the identification being determined up to an automorphism of $V$. In particular, we can make the set of all tangent vectors at $x$ into a vector space by defining $a\xi + b\eta = \zeta$, where $\zeta$ is determined by
$$\zeta_\alpha = a\,\xi_\alpha + b\,\eta_\alpha$$
for some chart $\alpha$. Equation (4.3) shows that this definition is independent of $\alpha$. We shall denote the space of tangent vectors at $x$ by $T_x(M)$ and shall call it the tangent space (to $M$) at $x$.

Let $\psi$ be a differentiable map of $M_1$ to $M_2$, and let $\varphi$ be a curve passing through $x \in M_1$ (see Fig. 9.7). Then $\psi\circ\varphi$ is a curve passing through $\psi(x) \in M_2$. It is easy to check that if $\varphi \sim \tilde\varphi$, then $\psi\circ\varphi \sim \psi\circ\tilde\varphi$. Thus the map $\psi$ induces a mapping of $T_x(M_1)$ into $T_{\psi(x)}(M_2)$, which we shall denote by $\psi_{*x}$. To repeat,
if $\xi \in T_x(M_1)$, then $\psi_{*x}(\xi) = \eta$ is determined by
$$\psi\circ\varphi \in \eta \quad\text{for all } \varphi \in \xi.$$
Let $(U,\alpha)$ be a chart about $x$, and let $(W,\beta)$ be a chart about $\psi(x)$. Then
$$\xi_\alpha = (\alpha\circ\varphi)'(0)$$
and
$$\eta_\beta = (\beta\circ\psi\circ\varphi)'(0) = \left((\beta\circ\psi\circ\alpha^{-1})\circ(\alpha\circ\varphi)\right)'(0).$$
By the chain rule we can thus write
$$\eta_\beta = J_{\beta\circ\psi\circ\alpha^{-1}}(\alpha(x))\,\xi_\alpha.$$
This says that if we identify $T_x(M_1)$ with $V_1$ via $\alpha$ and identify $T_{\psi(x)}(M_2)$ with $V_2$ via $\beta$, then $\psi_{*x}$ becomes identified with the linear map $J_{\beta\circ\psi\circ\alpha^{-1}}(\alpha(x))$. In particular, the map $\psi_{*x}$ is a continuous linear mapping from $T_x(M_1)$ to $T_{\psi(x)}(M_2)$.

If $\varphi\colon M_1\to M_2$ and $\psi\colon M_2\to M_3$ are two differentiable mappings, then it follows immediately from the definitions that
$$(\psi\circ\varphi)_{*x} = \psi_{*\varphi(x)}\circ\varphi_{*x}. \tag{4.4}$$

Fig. 9.8  Fig. 9.9

We have seen that the choice of chart $(U,\alpha)$ identifies $T_x(M)$ with $V$. Now suppose that $M$ is actually $V$ itself (or an open subset of $V$) regarded as a differentiable manifold. Then $M$ has a distinguished chart, namely $(M,\mathrm{id})$. Thus on an open subset of $V$ the identity chart gives us a distinguished way of identifying $T_x(M)$ with $V$. It is sometimes convenient to picture $T_x(M)$ as a copy of $V$ whose origin has been translated to $x$. We would then draw a tangent vector at $x$ as an arrow originating at $x$. (See Fig. 9.8.)

Now suppose that $M$ is a general manifold and that $\psi$ is a differentiable map of $M$ into a vector space $V_1$. Then $\psi_*(T_x(M))$ is a subspace of $T_{\psi(x)}(V_1)$. If we regard $\psi_*(T_x(M))$ as a subspace of $V_1$ and consider the corresponding hyperplane through $\psi(x)$, we get the "plane tangent to $\psi(M)$ at $\psi(x)$" in the intuitive sense (Fig. 9.9).
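The chart-change law (4.3) can be tested concretely on the sphere with the two stereographic charts of Section 1. The following sketch (ours, assuming Python with numpy) differentiates a curve in both charts and compares the two velocity vectors through the Jacobian of the transition map $y \mapsto y/\|y\|^2$, which is $J(y) = (\|y\|^2 I - 2yy^{\mathsf T})/\|y\|^4$.

```python
import numpy as np

# Illustration of Eq. (4.3): for a curve c(t) on S^2, the chart
# representatives xi_alpha, xi_beta of its tangent vector at t = 0 are
# related by the Jacobian of the transition map y -> y/||y||^2.

def c(t):  # a differentiable curve on S^2
    v = np.array([np.cos(t), np.sin(2 * t), 1.0 + t])
    return v / np.linalg.norm(v)

phi1 = lambda x: x[:2] / (1 + x[2])
phi2 = lambda x: x[:2] / (1 - x[2])

h = 1e-6   # central differences for the chart velocities
xi_a = (phi1(c(h)) - phi1(c(-h))) / (2 * h)
xi_b = (phi2(c(h)) - phi2(c(-h))) / (2 * h)

y = phi1(c(0))                       # Jacobian of y -> y/||y||^2 at y
r2 = y @ y
J = (r2 * np.eye(2) - 2 * np.outer(y, y)) / r2**2

print(xi_b, J @ xi_a)                # the two vectors agree to O(h^2)
```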
It is very convenient to think of tangent vectors in this way, that is, to regard them as vectors tangent to $M$ as if $M$ were mapped into a vector space.

If $f$ is a real-valued differentiable function defined in a neighborhood $U$ of $x \in M$, then we can regard it as a map of the manifold $U$ to the manifold $\mathbb{R}^1$. We therefore get a map $f_{*x}\colon T_x(M)\to T_{f(x)}(\mathbb{R}^1)$. Recall that we identify $T_y(\mathbb{R}^1)$ with $\mathbb{R}^1$ for any $y \in \mathbb{R}^1$. Therefore, $f_{*x}$ can be viewed as a map from $T_x(M)$ to $\mathbb{R}^1$. The reader should check that this map is indeed given by
$$f_{*x}(\xi) = \xi(f) \quad\text{for } \xi \in T_x(M). \tag{4.5}$$
In particular, if we take $M_3 = \mathbb{R}$ and $\psi = f$ in (4.4), we can assert: Let $\psi$ be a differentiable map of $M_1$ to $M_2$, and let $f$ be a differentiable function on $M_2$ defined in a neighborhood of $\psi(x)$. Then for any $\xi \in T_x(M_1)$,
$$\xi(\psi^*(f)) = \psi_{*x}(\xi)(f). \tag{4.6}$$
From now on, we shall frequently drop the subscript $x$ in $\psi_{*x}$ when it can be understood from the context. Thus we would write (4.4) as $(\psi\circ\varphi)_* = \psi_*\circ\varphi_*$. Some authors call the mapping $\psi_{*x}$ the differential of $\psi$ at $x$ and designate it $d\psi_x$. If $M_1$ and $M_2$ are open subsets of Banach spaces $V_1$ and $V_2$ (and hence are differentiable manifolds under their identity charts), then $\psi_{*x}$ as defined above does reduce to the differential $d\psi_x$ when $T_x(M_i)$ is identified with $V_i$. This reduction does depend on the identification, however.

5. FLOWS AND VECTOR FIELDS

Let $M_1$ and $M_2$ be differentiable manifolds. A map $g$ from $M_1\to M_2$ is called a diffeomorphism if $g$ is a differentiable one-to-one map of $M_1$ onto $M_2$ such that $g^{-1}$ is also differentiable.

Let $M$ be a differentiable manifold. A map $\varphi\colon M\times\mathbb{R}\to M$ is called a one-parameter group if

i) $\varphi$ is differentiable;

ii) $\varphi(x,0) = x$ for all $x \in M$;

iii) $\varphi(\varphi(x,s),t) = \varphi(x,s+t)$ for all $x \in M$ and $s,t \in \mathbb{R}$.

We can express conditions (ii) and (iii) a little differently. Let $\varphi_t\colon M\to M$ be given by $\varphi_t(x) = \varphi(x,t)$. For each $t \in \mathbb{R}$ the map $\varphi_t$ is differentiable. In fact,
$$\varphi_t = \varphi\circ\iota_t,$$
where $\iota_t$ is the differentiable map of $M\to M\times\mathbb{R}$ given by $\iota_t(x) = (x,t)$. Then condition (ii) says that $\varphi_0 = \mathrm{id}$. Condition (iii) says that $\varphi_t\circ\varphi_s = \varphi_{t+s}$. If we take $t = -s$ in this equation, we get $\varphi_t\circ\varphi_{-t} = \mathrm{id}$. Thus for each $t$ the map $\varphi_t$ is a diffeomorphism and $(\varphi_t)^{-1} = \varphi_{-t}$.
Fig. 9.10

We now give some examples of one-parameter groups.

Example 1. Let $M = V$ be a vector space, and let $w \in M$. Let $\varphi\colon V\times\mathbb{R}\to V$ be given by
$$\varphi(v,t) = v + tw.$$
It is easy to check that (i), (ii), and (iii) are satisfied. (See Fig. 9.10.)

Example 2. Let $M = V$ be a finite-dimensional vector space, and let $A$ be a linear transformation $A\colon V\to V$. Recall that the linear transformation $e^{tA}$ is defined by
$$e^{tA} = I + tA + \frac{t^2A^2}{2!} + \frac{t^3A^3}{3!} + \cdots,$$
i.e., for any $v \in V$,
$$e^{tA}v = \sum_{i=0}^{\infty} \frac{t^i}{i!}\,A^i v.$$
(See Figs. 9.11 and 9.12, which show the flows $e^{tA}$ on $V = \mathbb{R}^2$ for two choices of the matrix $A$.)
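The group law of Example 2 is directly checkable by machine. The following sketch (ours, assuming Python with scipy, whose expm computes the matrix exponential) verifies $\varphi_t\circ\varphi_s = \varphi_{s+t}$ and $\varphi_0 = \mathrm{id}$ for a rotation generator.

```python
import numpy as np
from scipy.linalg import expm

# Example 2 in action: phi_t(v) = expm(t*A) v is a one-parameter group.
A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])          # infinitesimal rotation of the plane
v = np.array([1.0, 2.0])
s, t = 0.7, -1.3

lhs = expm(t * A) @ (expm(s * A) @ v)    # phi_t(phi_s(v))
rhs = expm((s + t) * A) @ v              # phi_{s+t}(v)
print(lhs, rhs)                           # the two vectors agree

print(expm(0.0 * A))                      # phi_0 is the identity matrix
```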
Since the convergence of the series is uniform on any compact set of $\langle v,t\rangle$, the map $\varphi\colon M\times\mathbb{R}\to M$ given by
$$\varphi(v,t) = e^{tA}v$$
is easily seen to be differentiable and to satisfy (ii) and (iii) as well.

Example 3. Let $M$ be the circle $S^1$, and let $a$ be any real number. Let $\varphi_t^a$ be the diffeomorphism consisting of rotation through the angle $ta$. In terms of the atlas $\mathcal{A} = \{(U_1,\theta_1),(U_2,\theta_2)\}$, the map $\varphi$ is given by
$$\theta_1(\varphi(x,t)) = \begin{cases} \theta_1(x)+ta & x\in U_1,\ \theta_1(x) < 2\pi - ta,\\ \theta_1(x)+ta-2\pi & x\in U_1,\ \theta_1(x) > 2\pi - ta,\end{cases}$$
$$\theta_2(\varphi(x,t)) = \begin{cases} \theta_2(x)+ta & x\in U_2,\ \theta_2(x) < 2\pi + \pi/2 - ta,\\ \theta_2(x)+ta-2\pi & x\in U_2,\ \theta_2(x) > 2\pi + \pi/2 - ta.\end{cases}$$
(Strictly speaking, this doesn't quite define $\varphi$ for all values of $\langle x,t\rangle$. If $x = \langle 1,0\rangle$ and $ta = \pi/2$, then $x \notin U_1$ and $\varphi(x,\pi/2) \notin U_2$. This is easily remedied by the introduction of a third chart.) It is easy to see that $\varphi$ is a one-parameter group.

Example 4. Let $M = S^1\times S^1$ be the torus, and let $a$ and $b$ be real numbers. Write $x \in M$ as $x = \langle x_1,x_2\rangle$, where $x_i \in S^1$. Define $\varphi^{\langle a,b\rangle}$ by
$$\varphi^{\langle a,b\rangle}(x_1,x_2,t) = \langle \varphi_t^a(x_1),\ \varphi_t^b(x_2)\rangle,$$
where $\varphi^a$ and $\varphi^b$ are given in Example 3. Then $\varphi^{\langle a,b\rangle}$ is a one-parameter group, and indeed a rather instructive one. The reader should check that essentially different behavior arises according to whether $b/a$ is rational or irrational.

[The construction of Example 4 from Example 3 can be generalized as follows. If $\varphi\colon M\times\mathbb{R}\to M$ and $\psi\colon N\times\mathbb{R}\to N$ are one-parameter groups, then we can construct a one-parameter group on $M\times N$ given by $\varphi_t\times\psi_t$. The map of $M\times N\times\mathbb{R}\to M\times N$ sending $\langle x,y,t\rangle \mapsto \langle\varphi_t(x),\psi_t(y)\rangle$ is differentiable because it can be written as the composite $(\varphi\times\psi)\circ\Delta$, where
$$\varphi\times\psi\colon M\times\mathbb{R}\times N\times\mathbb{R}\to M\times N,$$
and $\Delta\colon M\times N\times\mathbb{R}\to M\times\mathbb{R}\times N\times\mathbb{R}$ is given by $\Delta(x,y,t) = \langle x,t,y,t\rangle$.]
In each of the four preceding examples we started out with an "infinitesimal generator" to construct the one-parameter group, namely, the vector $w$ in Example 1, the linear transformation $A$ in Example 2, the real number $a$ in Example 3, and the pair $\langle a,b\rangle$ in Example 4. We will now show that associated with any one-parameter group on a manifold there is a nice object which we can regard as the infinitesimal generator of the one-parameter group.

Let $\varphi\colon M\times\mathbb{R}\to M$ be a one-parameter group. For each $x \in M$ consider the map $\varphi_x$ of $\mathbb{R}\to M$ given by
$$\varphi_x(t) = \varphi(x,t).$$
In view of condition (ii), we know that $\varphi_x(0) = x$. Thus $\varphi_x$ is a curve passing through $x$ (see Fig. 9.13). Let us denote the tangent to this curve by $X(x)$. We thus get a mapping $X$ which assigns to each $x \in M$ a vector $X(x) \in T_x(M)$. Any such map, i.e., any rule assigning to each $x \in M$ a vector in $T_x(M)$, will be called a vector field. We have seen that every one-parameter group gives rise to a vector field, which we shall call the infinitesimal generator of the one-parameter group.

Fig. 9.13

Let $Y$ be a vector field on $M$, and let $(U,\alpha)$ be a chart on $M$. For each $x \in U$ we get a vector $Y(x)_\alpha \in V$. We can regard this as defining a $V$-valued function $Y_\alpha$ on $\alpha(U)$:
$$Y_\alpha(v) = Y(\alpha^{-1}(v))_\alpha \quad\text{for } v \in \alpha(U). \tag{5.1}$$
Let $(W,\beta)$ be a second chart, and let $Y_\beta$ be the corresponding $V$-valued function on $\beta(W)$. If we compare (5.1) with (4.3), we see that
$$Y_\beta(\beta\circ\alpha^{-1}(v)) = J_{\beta\circ\alpha^{-1}}(v)\,Y_\alpha(v) \quad\text{if } v \in \alpha(U\cap W). \tag{5.2}$$
Equation (5.1) gives the "local expression" of a vector field with respect to a chart, and Eq. (5.2) describes the "transition law" from one chart to another.

Conversely, let $\mathcal{A}$ be an atlas of $M$, and let $Y_\alpha$ be a $V$-valued function defined on $\alpha(U)$ for each chart $(U,\alpha) \in \mathcal{A}$. Suppose that the $Y_\alpha$ satisfy (5.2). Then for each $x \in M$ we can let $Y(x) \in T_x(M)$ be defined by setting
$$Y(x)_\alpha = Y_\alpha(\alpha(x))$$
for some chart $(U,\alpha)$ about $x$. It follows from the transition law given by (4.3) and (5.2) that this definition does not depend on the choice of $(U,\alpha)$.

Observe that $J_{\beta\circ\alpha^{-1}}$ is a $C^\infty$-function (linear-transformation-valued function) on $\alpha(U\cap W)$. Therefore, if $Y$ is a vector field and $Y_\alpha$ is a $V$-valued $C^\infty$-function on $\alpha(U)$, the function $Y_\beta$ will be $C^\infty$ on $\beta(U\cap W)$. In other words, it is consistent to require that the functions $Y_\alpha$ be of class $C^\infty$. We shall therefore say that $Y$
is a $C^\infty$-vector field if the function $Y_\alpha$ is $C^\infty$ for every chart $(U,\alpha)$. As in the case of functions and mappings, in order to verify that $Y$ is $C^\infty$, it suffices to check that the $Y_\alpha$ are $C^\infty$ for all charts $(U,\alpha)$ belonging to some atlas of $M$.

Let us check that the infinitesimal generator $X$ of a one-parameter group $\varphi$ is a $C^\infty$-vector field. In fact, if $(U,\alpha)$ is a chart, then $X_\alpha(v) = (\alpha\circ\varphi_x)'(0)$, where $\varphi_x(t) = \varphi(x,t)$ and $v = \alpha(x)$. We can write $\alpha\circ\varphi_x(t) = \Phi_\alpha(v,t)$, where
$$\Phi_\alpha(v,t) = \alpha\circ\varphi(\alpha^{-1}(v),t).$$
Let $U' \subset U$ be a neighborhood of $x$ such that $\varphi(y,t) \in U$ for $y \in U'$ and $|t| < \varepsilon$. Then $\Phi_\alpha$ is a differentiable map of $\alpha(U')\times I\to\alpha(U)$, where $I = \{t: |t| < \varepsilon\}$. In terms of this representation, we can write
$$X_\alpha(v) = \frac{\partial\Phi_\alpha}{\partial t}(v,0). \tag{5.3}$$
This shows that $X$ is a $C^\infty$-vector field. If we evaluate (5.3) in the case of Example 1, we get $\Phi_{\mathrm{id}}(v,t) = v + tw$, so that $X_{\mathrm{id}} = w$. In the case of Example 2 we get $X_{\mathrm{id}}(v) = Av$.

There are various algebraic operations that can be performed with vector fields. The set of all vector fields on $M$ forms a vector space in the obvious way. If $X$ and $Y$ are $C^\infty$-vector fields, then so is $aX + bY$ ($a$ and $b$ constants), where
$$(aX+bY)(x) = a\,X(x) + b\,Y(x), \qquad x \in M.$$
Similarly, we can multiply a vector field by a function. If $f$ is a function and $X$ is a vector field, we define $fX$ by
$$(fX)(x) = f(x)\,X(x), \qquad x \in M.$$
It is easy to see that if $f$ and $X$ are differentiable, then so is $fX$. It is also easy to check the various associative laws for this multiplication.

We have seen that any one-parameter group defines a smooth vector field. Let us examine the converse. Does any $C^\infty$-vector field define a one-parameter group? The answer to the question as stated is "no". In fact, let $X = \partial/\partial x^1$ be the vector field corresponding to translation in the $x^1$-direction. Let $M = \mathbb{R}^2 - C$, where $C$ is some nonempty closed set of $\mathbb{R}^2$. Then if $p$ is any point of $M$ that lies on a line parallel to the $x^1$-axis which intersects $C$ (Fig. 9.14), $\varphi_t(p)$ will not be defined (will not lie in $M$) for every $t$.

Fig. 9.14

The reader may object that $M$ "has some points missing" and that is why $X$ does not generate a one-parameter group. But we can construct a counterexample on $\mathbb{R}^2$ itself. In fact, if we consider the vector field $X$ on $\mathbb{R}^2$ given by
$$X_{\mathrm{id}}(x^1,x^2) = \langle 1,\ -(x^2)^2\rangle,$$
then (5.3) shows that $\varphi$, if defined, satisfies
$$\frac{d\Phi}{dt}(x,t) = \frac{\partial\Phi}{\partial t}(\Phi(x,t),0) = X_{\mathrm{id}}(\Phi(x,t)),$$
where $\Phi = \Phi_{\mathrm{id}}$. If we let $y^i(t,x) = x^i\circ\Phi(x,t)$, then
$$\frac{dy^1}{dt} = 1, \quad y^1(0) = x^1, \qquad \frac{dy^2}{dt} = -(y^2)^2, \quad y^2(0) = x^2.$$
If $x^2 \ne 0$, then the unique solution of the second equation is given by
$$y^2(t) = \frac{1}{t + 1/x^2},$$
which is not defined for all values of $t$.
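The finite-time escape just derived is easy to watch happen. The following sketch (ours, assuming Python with scipy) integrates the field numerically from a starting point with $x^2 < 0$ and compares with the explicit solution, which escapes to infinity at $t = -1/x^2$.

```python
import numpy as np
from scipy.integrate import solve_ivp

# The field X = <1, -(x2)^2> has integral curves y2(t) = 1/(t + 1/x2);
# for x2 < 0 the solution escapes to infinity at the finite time t = -1/x2.

x0 = np.array([0.0, -0.5])           # starting point with x2 = -0.5
blowup = -1.0 / x0[1]                # predicted escape time t = 2

sol = solve_ivp(lambda t, y: [1.0, -y[1]**2], (0.0, 1.99), x0, rtol=1e-10)
print(sol.t[-1], sol.y[1, -1])       # y2 is already near -100 just before t = 2
print(1.0 / (sol.t[-1] + 1.0 / x0[1]))   # matches the explicit solution
```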
Of course, the trouble is that we only have a local existence theorem for differential equations. We must therefore give up the requirement that $\varphi$ be defined on all of $M\times\mathbb{R}$.

Definition 5.1. A flow on $M$ is a map $\varphi$ of an open set $U \subset M\times\mathbb{R} \to M$ such that

i) $M\times\{0\} \subset U$;

ii) $\varphi$ is differentiable;

iii) $\varphi(x,0) = x$;

iv) $\varphi(\varphi(x,s),t) = \varphi(x,s+t)$ whenever both sides of this equation are defined.

For $x$ fixed, $\varphi_x(t) = \varphi(x,t)$ is defined for sufficiently small $t$, so that $\varphi$ gives rise to a vector field $X$ as before. We shall call $X$ the infinitesimal generator of the flow $\varphi$. As the previous examples show, there may be no $t \ne 0$ such that $\varphi(x,t)$ is defined for all $x$, and there may be no $x$ such that $\varphi(x,t)$ is defined for all $t$.

Proposition 5.1. Let $X$ be a smooth vector field on $M$. Then there exists a neighborhood $U$ of $M\times\{0\}$ in $M\times\mathbb{R}$ and a flow $\varphi\colon U\to M$ having $X$ as its infinitesimal generator.

Proof. We shall first construct the curve $\varphi_x(t)$ for any $x \in M$, and shall then verify that $\langle x,t\rangle \mapsto \varphi(x,t)$ is indeed a flow.

Let $x$ be a point of $M$, and let $(U,\alpha)$ be a chart about $x$. Then $X_\alpha$ gives us an ordinary differential equation in $\alpha(U)$, namely,
$$\frac{dv}{dt} = X_\alpha(v), \qquad v \in \alpha(U).$$
By the fundamental existence theorem for ordinary differential equations, there exists an $\varepsilon > 0$, an open set $O$ containing $\alpha(x)$, and a map
$$\Phi_\alpha\colon O\times\{t: |t| < \varepsilon\}\to\alpha(U)$$
such that $\Phi_\alpha$ is $C^\infty$, $\Phi_\alpha(v,0) = v$, and
$$\frac{d\Phi_\alpha(v,t)}{dt} = X_\alpha(\Phi_\alpha(v,t)). \tag{5.4}$$
Here the choice of the open set $O$ and of $\varepsilon$ depends on $\alpha(x)$ and $\alpha(U)$. The uniqueness part of the theorem asserts that $\Phi_\alpha$ is uniquely determined up to the domain of definition; i.e., if $\Phi_v$ is any curve defined for $|t| < \varepsilon'$ with $\Phi_v(0) = v$ and
$$\frac{d\Phi_v(t)}{dt} = X_\alpha(\Phi_v(t)),$$
then $\Phi_v(t) = \Phi_\alpha(v,t)$. This implies that
$$\Phi_\alpha(v,t+s) = \Phi_\alpha(\Phi_\alpha(v,s),t)$$
whenever both sides are defined. (Just hold $s$ fixed in the equation.)

Consider the curve $\varphi_x(\cdot)$ defined by
$$\varphi_x(t) = \alpha^{-1}(\Phi_\alpha(\alpha(x),t)). \tag{5.5}$$
It is defined for $|t| < \varepsilon$, and is a continuous, in fact differentiable, map of this interval into $M$. Furthermore, if we write $\eta = \varphi_x$, then (5.4) asserts that the tangent vector to the curve $\eta$ at time $t$ is $X(\eta(t))$, the value of the vector field at the point $\eta(t)$. We will write this condition as
$$\eta'(t) = X(\eta(t)). \tag{5.6}$$
Equation (5.6) is the way we would write the "first order differential equation" on $M$ corresponding to the vector field $X$. A differentiable curve $\eta$ satisfying (5.6) is called an integral curve of $X$. We can now formulate a manifold version of the uniqueness theorem of differential equations:

Lemma 5.1. Let $\eta_1\colon I\to M$ and $\eta_2\colon I\to M$ be two integral curves of $X$ defined on the same interval $I$. If $\eta_1(s) = \eta_2(s)$ at some point $s \in I$, then $\eta_1 = \eta_2$; i.e., $\eta_1(t) = \eta_2(t)$ for all $t \in I$.

Proof. We wish to show that the set where $\eta_1(t) \ne \eta_2(t)$ is empty. Let
$$A = \{t : t \ge s \text{ and } \eta_1(t) \ne \eta_2(t)\}.$$
We wish to show that $A$ is empty, and similarly that the set $B = \{t : t \le s \text{ and } \eta_1(t) \ne \eta_2(t)\}$ is empty. Suppose that $A$ is not empty, and let
$$t^+ = \operatorname{glb} A = \operatorname{glb}\{t : t \ge s \text{ and } \eta_1(t) \ne \eta_2(t)\}.$$
We will derive a contradiction by

i) using the uniqueness theorem for differential equations to show that $\eta_1(t^+) \ne \eta_2(t^+)$, and

ii) using the Hausdorff property of manifolds to show that $\eta_1(t^+) = \eta_2(t^+)$.
  • 395. 9.6 LIE DERIVATIVES 383 Details: i). Suppose that 111(t +) = 112(t+) =Y EM. We can find a coordinate chart (13, W) abouty, and then 13 a 111 and 13 a 112 are solutions ofthe same system of first order ordinary differential equations, and they both take on the value 13(Y) at t = t+. Hence, by uniqueness for differential equations, 13 a 111 and 13 a 112 must be equal in some interval about t +, and hence 111(t) = 112(t) for all t in this interval. This is impossible since there must be points: arbitrarily close to t + where 111(t) ;t.! 112(t) by the glb property oft +. This proves i). Now suppose that 111(t +) ;t.! 112(t+). We can find neighborhoods U1of111(t +) and U2of112(t +) such that U1n U2 = 0. But then the continuity ofthe 111 imply that 111(t) E Uland 112(t) E U2 for t close enough to t+, and hence that IIl(t) ;t.! 112(t) for t in some interval about t+. This once again contradicts the glb property oft +, proving ii). The same argument with glb replaced by lub shows that B is empty proving Lemma 5.1. The above argument is typical of a "connectedness argument." We showed that the set where IIl(t) = 112(t) is both open and closed, and hence must be the whole interval I. Lemma 5.1 shows that (5.5) defines a solution curve ofX passing through x at time t = 0, and is independent of the choice of chart in any common interval ofdefinition about o. In other words it is legitimate to define the curve <Px(-) by which defines <px(t)for It I < E. Unfortunately the E depends not only on x but also on the choice ofchart. We use Lemma 5.1 and extend the definition of<Px(·) as far as possible, much as we did for ordinary differential equations on a vector space in Chapter 6: For any S with IsI < Ewe lety = <Px(s) and obtain a curve <Py(.) defined for ItI < E'. By Lemma 5.1 <Py(t) = <pX<s+t)ifls+tl < E. (5.7) It may happen that Is I + E' > E. Then there will exist a t with It I < E' and Is+t I > E. Then the right hand side of (5.7) is not defined, but the left is. We then take (5.7) as the definition of <Px(s +t), extendingthe domain ofdefinition of <Px(-). We then continue: Let Ix + denote the set of all s > 0 for which there exists a finite sequence ofreal numbers So = 0 < SI < ... < Sk = s and points Xo, . • •Xk-l E M with Xo = x, SI in the domain ofdefinition of<Px(·), X2 = <Px(SI) and, inductively, Si+l in the domain of definition of <PXj(·) and Xi+ 1 = <Pxj(Si+ 1). If S E Tt, so is Sf for 0 < Sf < s, and so is s + TJ for sufficiently small positive TJ. Thus rt is an interval, half open on the right. By repeated use of (5.4) we define IPx(s) for s E Tt We construct r; in a similar fashion and set Ix = It u r;. Then IPx(s) is defined for s E Ix, and I is the maximal interval for which our construction defines IPx. For each x E M we obtain an open interval I x in which the curve IPx(·) is defined. Let U = UXEM{X} X Ix. Then U is an open subset of M X I. To verify this, let (x, 8) E U. We must show that there is a neighborhood W of x and an E > 0 such that s E I x for all Is - 81 < E and x E W. By definition, there is a finite
  • 396. 384 DIFFERENTIABLE MANIFOLDS 9.6 sequence of points x = XO, Xl, ... , Xk and charts (U b al), ... , (Uk, ak) with Xi-l E Ui and Xi E Ui and such that ai(xi) = <J!",/ai(xi_l), ti), where tl +... +tk = s. It is now clear from the continuity properties of the <J!", that if we choose Xo such that al(xO) is close enough to al(xO), then the points Xi defined inductively by ai(xi) = <J!",; (ai(xi-l), ti) will be well defined. [That is, aj{Xj_l) will be in the domain of the definition of <J!",;(. ,ti).] This means that "8 E Ixo for all such points Xo. The same argument shows that "8 + TJ E I Xo for TJ sufficiently small and X sufficiently close to X. This shows that U is open. Now define cp by setting cp(x, t) = CPx(t) for (x, t) E U. That cp is differentiable near M X {O} follows from the fact that cp is given (in terms of a chart) as the solution of an ordinary differential equation. The fundamental existence theorem then guarantees the differentiability. Near the point (x, t) we can write and so cp is differentiable because it is the composite of differentiable maps. D 6. LIE DERIVATIVES Let cp be a one-parameter group on a manifold M, and let f be a differentiable function on M. Then for each t the function cpi[f] is differentiable, and for t ~ 0 we can form the function (6.1) We claim that the limit of this expression as t --+ 0 exists. In fact, for any x E M, cpi[f](x) = f 0 CPt(x) = f 0 CPx(t) and, therefore, lim cpi[f] - f (x) = lim f 0 CPx(t) - f 0 CPx(O) = D"",f = X(x)f. (6.2) hO t hO t Here X(x) is a tangent vector at x and we are using the notation introduced in Section 4. We shall call the limit of (6.1) the derivative of f with respect to the one-parameter group cp, and shall denote it by Dxf. More generally, for any smooth vector field X and differentiable function f we define Dxf by Dxf(x) = X(x)f for all x E M, (6.3) and call it the Lie derivative of f with respect to X. In terms of the flow generated by X, we can, near any x E M, represent Dxf as the limit of (6.1),
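The finite escape time above, and the local group law (iv) of Definition 5.1, are easy to watch numerically. The following sketch is our own illustration, not part of the text: it integrates $dv/dt = X_\alpha(v)$ for the planar field $X(y^1, y^2) = \langle 1, -(y^2)^2 \rangle$ of the example with a standard ODE solver (the solver, tolerances, and sample points are all our choices), checks property (iv) at one pair $(s, t)$, and exhibits the blowup that prevents $U$ from being all of $M \times \mathbb{R}$.

```python
import numpy as np
from scipy.integrate import solve_ivp

def X(t, y):
    # The vector field of the example: dy^1/dt = 1, dy^2/dt = -(y^2)^2.
    return [1.0, -y[1] ** 2]

def flow(x, t):
    # Approximate phi(x, t) by integrating dv/dt = X(v) from 0 to t.
    sol = solve_ivp(X, (0.0, t), x, rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

x = np.array([0.0, 2.0])          # x^2 = 2, so y^2(t) = 1/(t + 1/2) exactly
s, t = 0.3, 0.5

print(flow(flow(x, s), t))        # phi(phi(x, s), t)
print(flow(x, s + t))             # phi(x, s + t): the same, property (iv)
print(1.0 / (s + t + 1.0 / x[1])) # exact second component, for comparison

# Integrating backward toward t = -1/2 shows the finite escape time:
print(flow(x, -0.49)[1])          # roughly 100 already; the flow is only local
```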
6. LIE DERIVATIVES

Let $\varphi$ be a one-parameter group on a manifold $M$, and let $f$ be a differentiable function on $M$. Then for each $t$ the function $\varphi_t^*[f]$ is differentiable, and for $t \neq 0$ we can form the function
$$\frac{\varphi_t^*[f] - f}{t}. \tag{6.1}$$
We claim that the limit of this expression as $t \to 0$ exists. In fact, for any $x \in M$, $\varphi_t^*[f](x) = f \circ \varphi_t(x) = f \circ \varphi_x(t)$ and, therefore,
$$\lim_{t \to 0} \frac{\varphi_t^*[f] - f}{t}(x) = \lim_{t \to 0} \frac{f \circ \varphi_x(t) - f \circ \varphi_x(0)}{t} = X(x)f. \tag{6.2}$$
Here $X(x)$ is a tangent vector at $x$, and we are using the notation introduced in Section 4. We shall call the limit of (6.1) the derivative of $f$ with respect to the one-parameter group $\varphi$, and shall denote it by $D_Xf$. More generally, for any smooth vector field $X$ and differentiable function $f$ we define $D_Xf$ by
$$D_Xf(x) = X(x)f \qquad \text{for all } x \in M, \tag{6.3}$$
and call it the Lie derivative of $f$ with respect to $X$. In terms of the flow generated by $X$, we can, near any $x \in M$, represent $D_Xf$ as the limit of (6.1), where, in general, (6.1) will only be defined on a sufficiently small neighborhood of $x$ and for sufficiently small $|t|$.

Our notation generalizes the notation in Chapter 3 for the directional derivative. In fact, if $M$ is an open subset of $V$ and $X$ is the "constant vector field" of Example 1, $X_{\mathrm{id}} = w \in V$, then $(D_Xf)_{\mathrm{id}} = D_wf_{\mathrm{id}}$, where $D_w$ is the directional derivative with respect to $w$. Note that $D_Xf$ is linear in $X$; that is, $D_{aX+bY}f = aD_Xf + bD_Yf$ if $X$ and $Y$ are vector fields and $a$ and $b$ are constants.

Let $\psi$ be a diffeomorphism of $M_1$ onto $M_2$, and let $X$ be a vector field on $M_2$. We define the "pullback" vector field $\psi^*[X]$ on $M_1$ by setting
$$\psi^*[X](x) = \psi_*^{-1}\bigl(X(\psi(x))\bigr) \qquad \text{for all } x \in M_1. \tag{6.4}$$
Note that $\psi$ must be a diffeomorphism for (6.4) to make sense, since $\psi^{-1}$ enters into the definition. This is in contrast to the "pullback" for functions, which made sense for any differentiable map. Equation (6.4) does indeed define a vector field, since $\psi_{*x}: T_x(M_1) \to T_{\psi(x)}(M_2)$ is an isomorphism for each $x$, so that $\psi_{*x}^{-1}X(\psi(x))$ is a tangent vector in $T_x(M_1)$.

Let us check that $\psi^*[X]$ is a smooth vector field if $X$ is. To this effect, let $\mathfrak{a}_1$ and $\mathfrak{a}_2$ be compatible atlases on $M_1$ and $M_2$, and let $(U, \alpha) \in \mathfrak{a}_1$ and $(W, \beta) \in \mathfrak{a}_2$ be compatible charts. Then (6.4) says that
$$\psi^*[X]_\alpha(v) = J_{\alpha \circ \psi^{-1} \circ \beta^{-1}}\bigl(\beta \circ \psi \circ \alpha^{-1}(v)\bigr)\,X_\beta\bigl(\beta \circ \psi \circ \alpha^{-1}(v)\bigr) \qquad \text{for } v \in \alpha(U),$$
which is a differentiable function of $v$. Since, by the chain rule,
$$J_{\alpha \circ \psi^{-1} \circ \beta^{-1}}\bigl(\beta \circ \psi \circ \alpha^{-1}(v)\bigr) \cdot J_{\beta \circ \psi \circ \alpha^{-1}}(v) = I,$$
we can rewrite the last expression more simply as
$$\psi^*[X]_\alpha(v) = \bigl(J_{\beta \circ \psi \circ \alpha^{-1}}(v)\bigr)^{-1}X_\beta\bigl(\beta \circ \psi \circ \alpha^{-1}(v)\bigr) \qquad \text{for } v \in \alpha(U). \tag{6.5}$$
Thus $\psi^*[X]_\alpha$ is the product of a smooth $\operatorname{Hom}(V_2, V_1)$-valued function and a smooth $V_2$-valued function, which shows that $\psi^*[X]$ is a smooth vector field.

Exercise. Let $\varphi$ be the flow generated by $X$ on $M_2$. Show that the flow generated by $\psi^*[X]$ is given by
$$\langle x, t \rangle \mapsto \psi^{-1}\bigl(\varphi(\psi(x), t)\bigr). \tag{6.6}$$
If $\varphi$ is a one-parameter group, then we can write (6.6) as
$$\psi^{-1} \circ \varphi_t \circ \psi. \tag{6.6'}$$
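Formula (6.5) can be evaluated mechanically. The following symbolic sketch is ours, with a made-up shear diffeomorphism; it computes $\psi^*[X]_\alpha$ as the inverse Jacobian applied to $X_\beta \circ (\beta \circ \psi \circ \alpha^{-1})$ in the identity charts of $\mathbb{R}^2$.

```python
import sympy as sp

x, y = sp.symbols('x y')

# A diffeomorphism psi of R^2 (a shear), and a vector field X on the target
# whose components in the target coordinates are constant: X = d/dv1.
psi = sp.Matrix([x, y + x**2])
X_target = sp.Matrix([1, 0])          # X_beta at the point psi(x, y)

J = psi.jacobian(sp.Matrix([x, y]))   # J_{beta.psi.alpha^-1} at (x, y)
pullback = sp.simplify(J.inv() * X_target)
print(pullback)                       # Matrix([[1], [-2*x]])
```

Since the sample $X_\beta$ is constant, the composition with $\beta \circ \psi \circ \alpha^{-1}$ is invisible here; for a nonconstant field one would substitute the target coordinates first, exactly as (6.5) prescribes.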
[Fig. 9.15, parts (a)-(j): the constant field $X$ ($Xf = \partial f/\partial x$) and the field $Y$ ($Yf = x\,\partial f/\partial y$) on the plane, together with $\varphi_t^*Y$, $\psi_t^*X$, and the difference quotients $(\varphi_t^*Y - Y)/t = D_XY$ and $(\psi_t^*X - X)/t = D_YX$ (both independent of $t$).]

It is easy to check that if $\psi_1: M_1 \to M_2$ and $\psi_2: M_2 \to M_3$ are diffeomorphisms and $Y$ is a vector field on $M_3$, then
$$(\psi_2 \circ \psi_1)^*Y = \psi_1^*\psi_2^*Y.$$
If $f$ is a differentiable function on $M_2$, then
$$D_{\psi^*[X]}(\psi^*[f]) = \psi^*(D_Xf). \tag{6.7}$$
In fact, by (6.3) and (4.6) we have, for $x \in M_1$,
$$D_{\psi^*[X]}\psi^*[f](x) = \psi^*[X](x)\,\psi^*[f] \qquad \text{by (6.3)}$$
$$= \psi_*^{-1}X(\psi(x))\,\psi^*[f] \qquad \text{by (6.4)}$$
$$= \bigl(\psi_*\psi_*^{-1}X(\psi(x))\bigr)f \qquad \text{by (4.6)}$$
$$= X(\psi(x))f = (D_Xf)(\psi(x)) = \psi^*(D_Xf)(x).$$

Let $\varphi$ be a one-parameter group on $M$ with infinitesimal generator $X$, and let $Y$ be another smooth vector field on $M$. For $t \neq 0$ we can form the vector field
$$\frac{\varphi_t^*[Y] - Y}{t} \tag{6.8}$$
and investigate its limit as $t \to 0$, which we shall call $D_XY$.

In Fig. 9.15 we have shown the calculation of $D_YX$ and $D_XY$ for two very simple fields on the Cartesian plane $\mathbb{R}^2$. The field $X$ is the constant field $X_{\mathrm{id}} = \delta_1$, so that $Xf = \partial f/\partial x$ in terms of Cartesian coordinates $x, y$. The corresponding flow is given by $\varphi_t(x, y) = \langle x + t, y \rangle$. Thus $\varphi_{t*} = \mathrm{id}$ if we identify the tangent space at each point of the plane with the plane itself. Then $Y \mapsto \varphi_t^*Y$ consists of "moving" the vector field $Y$ to the left by $t$ units. Here we have taken $Y = x\delta_2$, so that $Yf = x(\partial f/\partial y)$. In Fig. 9.15(c) we have pictured $\varphi_t^*Y$, and have superimposed $Y$ and $\varphi_t^*Y$ in Fig. 9.15(d). Figure 9.15(e) represents $\varphi_t^*Y - Y$, and Fig. 9.15(f) is $(1/t)(\varphi_t^*Y - Y)$, which coincides with its limit, $D_XY$, since the expression is independent of $t$. The one-parameter group generated by $Y$ is $\psi_t$, where $\psi_t(x, y) = \langle x, y + tx \rangle$. Here at any $p \in \mathbb{R}^2$ we have $\psi_{t*}\delta_1 = \delta_1 + t\delta_2$, so that
$$\psi_t^*X = \psi_{-t*}X(\psi_t(x)) = \delta_1 - t\delta_2.$$
In Fig. 9.15(g) we have drawn $\psi_t^*X$, and in Fig. 9.15(h) we have superimposed it on $X$. Note that $D_XY = -D_YX$. However, these two derivatives are nonzero for quite different reasons. The field $\varphi_t^*Y$ varies with $t$ because the field $Y$ is not constant. The field $\psi_t^*X$ varies with $t$ because of "distortion" in the flow $\psi_t$. See Fig. 9.15(g) and (h). In the general case, $D_XY$ will result from a superposition of these two effects.

We now make the general calculation. Let $(U, \alpha)$ be a chart on $M$, and for $v \in \alpha(U)$ let $O$ be a sufficiently small open set containing $v$, and let $\varepsilon > 0$ be sufficiently small, so that $\Phi_\alpha$, given by $\Phi_\alpha(w, t) = \alpha \circ \varphi_t \circ \alpha^{-1}(w)$, is defined for $w \in O$ and $|t| < \varepsilon$. Then, for $|t| < \varepsilon$, Eq. (6.5) implies that
$$(\varphi_t^*Y)_\alpha(v) = \bigl(J_{\Phi_\alpha(\cdot, t)}(v)\bigr)^{-1}Y_\alpha\bigl(\Phi_\alpha(v, t)\bigr). \tag{6.9}$$
The right-hand side of this equation is of the form $A_t^{-1}Z_t$, where $A_t$ and $Z_t$ are differentiable functions of $t$ with $A_0 = I$. Therefore, its derivative with respect to $t$ exists, and
$$\frac{d(A_t^{-1}Z_t)}{dt}\bigg|_{t=0} = \lim_{t \to 0}\frac{A_t^{-1}Z_t - Z_0}{t} = \lim_{t \to 0}A_t^{-1}\frac{Z_t - A_tZ_0}{t} = \lim_{t \to 0}\left(\frac{Z_t - Z_0}{t} - \frac{A_tZ_0 - Z_0}{t}\right) = Z_0' - A_0'Z_0.$$
Now in (6.9), $Z_t = Y_\alpha(\Phi_\alpha(v, t))$, so
$$Z_0' = dY_\alpha\left(\frac{\partial\Phi_\alpha}{\partial t}(v, 0)\right) = dY_\alpha(X_\alpha(v)).$$
Here $Y_\alpha$ is a $V$-valued function, so $dY_\alpha$ is its differential at the point $\Phi_\alpha(v, 0) = v$. Hence $dY_\alpha(X_\alpha(v))$ is the value of this differential at $X_\alpha(v)$. The transformation $A_t = J_{\Phi_\alpha(\cdot, t)}(v) = d(\Phi_\alpha(\cdot, t))_v$, so
$$A_0' = \frac{dA_t}{dt}\bigg|_{t=0} = d\left(\frac{\partial\Phi_\alpha}{\partial t}\bigg|_{t=0}\right)_v = d(X_\alpha)_v.$$
Thus the derivative of (6.9) at $t = 0$ can be written as
$$d(Y_\alpha)_v(X_\alpha(v)) - d(X_\alpha)_v(Y_\alpha(v)) = D_{X_\alpha(v)}Y_\alpha - D_{Y_\alpha(v)}X_\alpha.$$
We have thus shown that the limit in (6.8) exists. If we denote it by $D_XY$, we can write
$$(D_XY)_\alpha(v) = D_{X_\alpha(v)}Y_\alpha - D_{Y_\alpha(v)}X_\alpha. \tag{6.10}$$
As before, we can use (6.10) as the definition of $D_XY$ for arbitrary vector fields $X$ and $Y$. Again, this represents the derivative of $Y$ with respect to the flow generated by $X$, that is, the limit of (6.8), where now $\varphi_t$ is only locally defined. From (6.10) we derive the surprising result that $D_XY = -D_YX$. For this reason it is convenient to introduce a notation which expresses the antisymmetry more clearly, and we shall write
$$D_XY = [X, Y].$$
The expression on the right-hand side is called the Lie bracket of $X$ and $Y$. We have
$$[X, Y] = -[Y, X]. \tag{6.11}$$
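In a chart, (6.10) is a one-line computation with Jacobian matrices. The sketch below is our own illustration of it (the helper name is ours), applied to the two fields of Fig. 9.15; it also exhibits the antisymmetry (6.11).

```python
import sympy as sp

def bracket(X, Y, coords):
    # Components of [X, Y] in a chart via (6.10): dY(X) - dX(Y).
    JX = sp.Matrix(X).jacobian(coords)
    JY = sp.Matrix(Y).jacobian(coords)
    return sp.simplify(JY * sp.Matrix(X) - JX * sp.Matrix(Y))

x, y = sp.symbols('x y')
X = [1, 0]        # Xf = df/dx, the constant field of Fig. 9.15
Y = [0, x]        # Yf = x df/dy

print(bracket(X, Y, [x, y]).T)    # [[0, 1]]: D_X Y = d/dy, as in Fig. 9.15(f)
print(bracket(Y, X, [x, y]).T)    # [[0, -1]]: the antisymmetry (6.11)
```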
Let us evaluate the Lie bracket for some of the examples listed at the beginning of Section 5. Let $M = \mathbb{R}^n$.

Example 1. If $X_{\mathrm{id}} = w_1$ and $Y_{\mathrm{id}} = w_2$ are "constant" vector fields, then (6.10) shows that $[X, Y] = 0$.

Example 2. Let $X_{\mathrm{id}}(v) = Av$, where $A$ is a linear transformation, and let $Y_{\mathrm{id}} = w$. Then (6.10) says that $[X, Y]_{\mathrm{id}}(v) = -Aw$, since the directional derivative of the linear function $Av$ with respect to $w$ is $Aw$.

Example 3. Let $X_{\mathrm{id}}(v) = Av$ and $Y_{\mathrm{id}}(v) = Bv$, where $A$ and $B$ are linear transformations. Then by (6.10),
$$[X, Y]_{\mathrm{id}}(v) = BAv - ABv = (BA - AB)v. \tag{6.12}$$
Thus in this case $[X, Y]$ again comes from a linear transformation, namely $BA - AB$. In this case the antisymmetry in $A$ and $B$ is quite apparent.
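Example 3 can be checked numerically; the following sketch is ours, with randomly chosen matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
v = rng.standard_normal(3)

# (6.10) for the linear fields X(v) = Av, Y(v) = Bv: the differential of
# w -> Bw is B itself, so dY(X)(v) - dX(Y)(v) = B(Av) - A(Bv).
bracket_v = B @ (A @ v) - A @ (B @ v)
print(np.allclose(bracket_v, (B @ A - A @ B) @ v))   # True, as in (6.12)
```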
We now return to the general case. Let $\varphi$ be a one-parameter group on $M$, let $Y$ be a smooth vector field on $M$, and let $f$ be a differentiable function on $M$. According to (6.7), $D_{\varphi_t^*[Y]}(\varphi_t^*[f]) = \varphi_t^*(D_Yf)$. Then
$$\frac{\varphi_t^*(D_Yf) - D_Yf}{t} = \frac{D_{\varphi_t^*[Y]}(\varphi_t^*[f]) - D_Y(\varphi_t^*[f])}{t} + \frac{D_Y(\varphi_t^*[f]) - D_Yf}{t} = D_{\{(\varphi_t^*[Y] - Y)/t\}}\varphi_t^*[f] + D_Y\left(\frac{\varphi_t^*[f] - f}{t}\right).$$
Since the functions $\varphi_t^*[f]$ are uniformly differentiable, we may take the limit as $t \to 0$ to obtain
$$D_X(D_Yf) = D_{D_XY}f + D_Y(D_Xf) = D_{[X,Y]}f + D_Y(D_Xf).$$
In other words,
$$D_{[X,Y]}f = D_X(D_Yf) - D_Y(D_Xf). \tag{6.13}$$
In view of its definition as a derivative, it is clear that $D_XY$ is linear in $Y$:
$$D_X(aY_1 + bY_2) = aD_XY_1 + bD_XY_2$$
if $a$ and $b$ are constants and $X$, $Y_1$, $Y_2$ are vector fields. By the antisymmetry, it must therefore also be linear in $X$. That is,
$$D_{aX_1 + bX_2}Y = [aX_1 + bX_2, Y] = a[X_1, Y] + b[X_2, Y] = aD_{X_1}Y + bD_{X_2}Y.$$

Let $X$ and $Y$ be vector fields on a manifold $M_2$, and let $\psi$ be a diffeomorphism of $M_1$ onto $M_2$. Then
$$\psi^*[X, Y] = [\psi^*X, \psi^*Y]. \tag{6.14}$$
In fact, suppose $X$ generates the flow $\varphi$. Then
$$\psi^*[X, Y] = \psi^*D_XY = \psi^*\lim_{t \to 0}\frac{\varphi_t^*Y - Y}{t} = \lim_{t \to 0}\frac{\psi^*\varphi_t^*Y - \psi^*Y}{t} = \lim_{t \to 0}\frac{(\psi^{-1} \circ \varphi_t \circ \psi)^*\psi^*Y - \psi^*Y}{t}.$$
Since $\psi^{-1} \circ \varphi_t \circ \psi$ is the flow generated by $\psi^*X$, we conclude that the last limit is $D_{\psi^*X}\psi^*Y$, which proves (6.14).

Now let $Y$ and $Z$ be smooth vector fields on $M$, and let $X$ be the infinitesimal generator of $\varphi$. Then
$$D_X[Y, Z] = \lim_{t \to 0}\frac{\varphi_t^*[Y, Z] - [Y, Z]}{t} = \lim_{t \to 0}\frac{[\varphi_t^*Y, \varphi_t^*Z] - [Y, Z]}{t} = \lim_{t \to 0}\left\{\left[\frac{\varphi_t^*Y - Y}{t}, \varphi_t^*Z\right] + \left[Y, \frac{\varphi_t^*Z - Z}{t}\right]\right\} = [D_XY, Z] + [Y, D_XZ].$$
Thus
$$[X, [Y, Z]] = [[X, Y], Z] + [Y, [X, Z]]. \tag{6.15}$$
In view of the antisymmetry of the Lie bracket, Eq. (6.15) can be rewritten as
$$[X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y]] = 0. \tag{6.16}$$
Equation (6.15), or (6.16), is known as Jacobi's identity.

7. LINEAR DIFFERENTIAL FORMS

Let $M$ be a differentiable manifold. We have attached to each $x \in M$ a vector space $T_x(M)$. Its dual space, $(T_x(M))^*$, is called the cotangent space to $M$ at $x$, and will be denoted by $T_x^*(M)$. Thus an element of $T_x^*(M)$ is a continuous linear function on $T_x(M)$; it is called a covector. Some explanation of the word "continuous" is in order. In the case where $M$ [and hence $T_x(M)$] is finite-dimensional, all linear functions on $T_x(M)$ are continuous, so no further comment is necessary. We shall be concerned primarily with this case. More generally, let $l$ be a linear function on $T_x(M)$. For any chart $(U, \alpha)$ about $x$ we have identified $T_x(M)$ with $V$, thus identifying $\xi \in T_x(M)$ with $\xi_\alpha \in V$. Then $l$ determines a linear function $l_\alpha$ on $V$ by
$$\langle \xi_\alpha, l_\alpha \rangle = \langle \xi, l \rangle. \tag{7.1}$$
If $(W, \beta)$ is a second chart, then $\langle \xi_\beta, l_\beta \rangle = \langle J_{\alpha \circ \beta^{-1}}(\beta(x))\xi_\beta, l_\alpha \rangle$. Since $J_{\alpha \circ \beta^{-1}}(\beta(x))$ is a continuous map of $V$ into $V$, we see that $l_\alpha$ is continuous if and only if $l_\beta$ is. We shall therefore say that $l$ is continuous if $l_\alpha$ is continuous for some (and hence any) $\alpha$. In this case we see that (7.1) gives us an identification of $T_x^*(M)$ with $V^*$, sending $l$ into $l_\alpha$. The last equation says that the rule for change of charts is given by
$$l_\beta = \bigl(J_{\alpha \circ \beta^{-1}}(\beta(x))\bigr)^*l_\alpha. \tag{7.2}$$
Let $f$ be a differentiable function on $M$, and let $x \in M$. Then the function on $T_x(M)$ sending each $\xi \in T_x(M)$ into $\xi(f)$ will be denoted by $df(x)$. Thus $\langle \xi, df(x) \rangle = \xi f$. It is easy to see that $df(x) \in T_x^*(M)$. In fact, in terms of a chart $(U, \alpha)$ about $x$,
$$\langle \xi, df(x) \rangle = D_{\xi_\alpha}(f_\alpha)(\alpha(x)).$$
Note that $f$ assigns an element $df(x)$ of $T_x^*(M)$ to each $x \in M$. A map which assigns to each $x \in M$ an element of $T_x^*(M)$ will be called a covector field or a linear differential form. The linear differential form determined by the function $f$ will be denoted simply by $df$.

Let $\omega$ be a linear differential form. Thus $\omega(x) \in T_x^*(M)$ for each $x \in M$. Let $\mathfrak{a}$ be an atlas of $M$. For each $(U, \alpha) \in \mathfrak{a}$ we obtain the $V^*$-valued function $\omega_\alpha$ on $\alpha(U)$ defined by
$$\omega_\alpha(v) = \bigl(\omega(\alpha^{-1}(v))\bigr)_\alpha \qquad \text{for } v \in \alpha(U). \tag{7.3}$$
If $(W, \beta) \in \mathfrak{a}$, then (7.2) says that
$$\omega_\beta\bigl(\beta \circ \alpha^{-1}(v)\bigr) = \bigl(J_{\alpha \circ \beta^{-1}}(\beta \circ \alpha^{-1}(v))\bigr)^*\omega_\alpha(v) = \bigl((J_{\beta \circ \alpha^{-1}}(v))^{-1}\bigr)^*\omega_\alpha(v) \qquad \text{for } v \in \alpha(U \cap W). \tag{7.4}$$
As before, Eq. (7.4) shows that it makes sense to require that $\omega$ be smooth. We say that $\omega$ is a $C^k$-differential form if $\omega_\alpha$ is a $V^*$-valued $C^k$-function for every chart $(U, \alpha)$. By (7.4) it suffices to check this for all charts in an atlas. Also, if we are given $V^*$-valued functions $\omega_\alpha$, each defined on $\alpha(U)$, $(U, \alpha) \in \mathfrak{a}$, and satisfying (7.4), then they define a linear differential form $\omega$ on $M$ via (7.3). If $\omega$ is a differential form and $f$ is a function, we define the form $f\omega$ by $f\omega(x) = f(x)\omega(x)$. Similarly, we define $\omega_1 + \omega_2$ by $(\omega_1 + \omega_2)(x) = \omega_1(x) + \omega_2(x)$.

Let $M_1$ and $M_2$ be differentiable manifolds, and let $\psi: M_1 \to M_2$ be a differentiable map. For any $x \in M_1$ we have the map $\psi_{*x}: T_x(M_1) \to T_{\psi(x)}(M_2)$. It therefore defines a dual map $(\psi_{*x})^*: T_{\psi(x)}^*(M_2) \to T_x^*(M_1)$. (The reader can check that if $l \in T_{\psi(x)}^*(M_2)$, then $\xi \mapsto \langle \psi_{*x}(\xi), l \rangle$ is a continuous linear function of $\xi$, by verification in terms of a chart.) Now let $\omega$ be a differential form on $M_2$. It assigns $\omega(\psi(x)) \in T_{\psi(x)}^*(M_2)$ to $\psi(x)$, and thus assigns an element $(\psi_{*x})^*(\omega(\psi(x))) \in T_x^*(M_1)$ to $x \in M_1$. We thus "pull back" the form $\omega$ to obtain a form on $M_1$, which we shall call $\psi^*\omega$. Thus
$$\psi^*\omega(x) = (\psi_{*x})^*\bigl(\omega(\psi(x))\bigr). \tag{7.5}$$
Note that $\psi^*$ is defined for any differentiable map, as in the case of functions, not only for diffeomorphisms (and in contrast to the situation for vector fields). It is easy to give the expression for $\psi^*\omega$ in terms of compatible charts $(U, \alpha)$ of $M_1$ and $(W, \beta)$ of $M_2$. In fact, from the local expression for $\psi_*$ we see that
$$(\psi^*\omega)_\alpha(v) = \bigl(J_{\beta \circ \psi \circ \alpha^{-1}}(v)\bigr)^*\omega_\beta\bigl(\beta \circ \psi \circ \alpha^{-1}(v)\bigr), \qquad v \in \alpha(U). \tag{7.6}$$
From (7.6) we see that $\psi^*\omega$ is smooth if $\omega$ is. It is clear that $\psi^*$ preserves algebraic operations:
$$\psi^*(\omega_1 + \omega_2) = \psi^*\omega_1 + \psi^*\omega_2 \tag{7.7}$$
and
$$\psi^*(f\omega) = \psi^*(f)\,\psi^*(\omega). \tag{7.8}$$
If $\varphi: M_1 \to M_2$ and $\psi: M_2 \to M_3$ are differentiable maps, then (4.4) and (7.5) show that
$$(\psi \circ \varphi)^*\omega = \varphi^*\psi^*\omega. \tag{7.9}$$
Let $\psi: M_1 \to M_2$ be a differentiable map, and let $f$ be a differentiable function on $M_2$. Then (4.6) and the definition of $df$ show that
$$d(\psi^*(f)) = \psi^*\,df. \tag{7.10}$$
Let $\varphi$ be a flow on $M$ with infinitesimal generator $X$, and let $\omega$ be a smooth linear differential form on $M$. Then the form $\varphi_t^*\omega$ is locally defined and, as in the case of functions and vector fields, the limit as $t \to 0$ of
$$\frac{\varphi_t^*\omega - \omega}{t}$$
exists. We can verify this by using (7.6) and proceeding as we did in the case of vector fields. The limit will be a smooth covector field which we shall call $D_X\omega$. We could give an expression for $D_X\omega$ in terms of a chart, just as we did for vector fields.

If $f$ is a differentiable function, $\omega$ a smooth differential form, and $X$ the infinitesimal generator of $\varphi$, then
$$D_X(f\omega) = (D_Xf)\omega + fD_X\omega. \tag{7.11}$$
In fact,
$$D_X(f\omega) = \lim_{t \to 0}\frac{\varphi_t^*(f\omega) - f\omega}{t} = \lim_{t \to 0}\left(\frac{\varphi_t^*f - f}{t}\,\varphi_t^*(\omega) + f\,\frac{\varphi_t^*\omega - \omega}{t}\right) = (D_Xf)\omega + fD_X\omega.$$
If $g$ is a differentiable function on $M$, then
$$\frac{\varphi_t^*\,dg - dg}{t} = \frac{d\varphi_t^*[g] - dg}{t} = d\left(\frac{\varphi_t^*[g] - g}{t}\right).$$
An easy verification in terms of a chart shows that the limit of this last expression exists and is indeed $d(D_Xg)$. Thus
$$D_X(df) = d(D_Xf). \tag{7.12}$$
Equations (7.11) and (7.12) show that if $\omega = f_1\,dg_1 + \cdots + f_k\,dg_k$, then
$$D_X\omega = (D_Xf_1)\,dg_1 + \cdots + (D_Xf_k)\,dg_k + f_1\,d(D_Xg_1) + \cdots + f_k\,d(D_Xg_k). \tag{7.12'}$$
Let $\omega$ be a smooth linear differential form, and let $X$ be a smooth vector field. Then $\langle X, \omega \rangle$ is a smooth function given by $\langle X, \omega \rangle(x) = \langle X(x), \omega(x) \rangle$. Note that $\langle X, \omega \rangle$ is linear in both $X$ and $\omega$. Also observe that for any smooth function $f$ we have
$$\langle X, df \rangle = D_Xf. \tag{7.13}$$
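Formula (7.12'), specialized to $\omega = \sum_i a_i\,dx^i$ (take $g_i = x^i$), gives the chart components of $D_X\omega$, and (7.12) then becomes a mechanical identity. The sketch below is our own check of it on a made-up field and function.

```python
import sympy as sp

x, y = sp.symbols('x y')
coords = [x, y]
X = [y, x * y]                    # an arbitrary smooth field (our choice)

def D_X_function(f):
    # Lie derivative of a function, as in (6.3).
    return sum(Xi * sp.diff(f, c) for Xi, c in zip(X, coords))

def D_X_form(a):
    # Components of D_X(sum_i a_i dx^i), from (7.12') with g_i = x^i:
    # (D_X omega)_j = X(a_j) + sum_i a_i dX^i/dx^j.
    return [sp.simplify(D_X_function(aj)
                        + sum(ai * sp.diff(Xi, cj) for ai, Xi in zip(a, X)))
            for aj, cj in zip(a, coords)]

f = x**2 * y
df = [sp.diff(f, c) for c in coords]                   # components of df
lhs = D_X_form(df)                                     # D_X(df)
rhs = [sp.diff(D_X_function(f), c) for c in coords]    # d(D_X f)
print([sp.simplify(l - r) for l, r in zip(lhs, rhs)])  # [0, 0], as (7.12) says
```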
8. COMPUTATIONS WITH COORDINATES

For the remainder of this chapter we shall assume that our manifolds are finite-dimensional. Let $M$ be a differentiable manifold whose $V = \mathbb{R}^n$. If $(U, \alpha)$ is a chart of $M$, then we define the function $x_\alpha^i$ on $U$ by setting $x_\alpha^i(x) = i$th coordinate of $\alpha(x)$. If $f$ is any differentiable function on $U$, then we can write Eq. (2.1) as
$$f(x) = f_\alpha\bigl(x_\alpha^1(x), \ldots, x_\alpha^n(x)\bigr), \tag{8.1}$$
which we shall write as
$$f = f_\alpha(x_\alpha^1, \ldots, x_\alpha^n). \tag{8.2}$$
We define the vector field $\partial/\partial x_\alpha^i$ on $U$ by
$$\left(\frac{\partial}{\partial x_\alpha^i}\right)_\alpha(v) = \delta_i \;\bigl(= \langle 0, \ldots, 1, \ldots, 0 \rangle, \text{ with the } 1 \text{ in the } i\text{th position}\bigr). \tag{8.3}$$
If $X$ is any vector field on $U$, then we have
$$X = X_\alpha^1\frac{\partial}{\partial x_\alpha^1} + \cdots + X_\alpha^n\frac{\partial}{\partial x_\alpha^n}, \tag{8.4}$$
where the functions $X_\alpha^i$ are defined by
$$(X)_\alpha(\alpha(x)) = \langle X_\alpha^1(x), \ldots, X_\alpha^n(x) \rangle. \tag{8.5}$$
Equation (8.4) allows us to regard the vector field $X$ as a "differential operator." In fact, it follows from the definitions that
$$D_Xf = X_\alpha^1\frac{\partial f_\alpha}{\partial x^1} + \cdots + X_\alpha^n\frac{\partial f_\alpha}{\partial x^n}. \tag{8.6}$$
Since $x_\alpha^i$ is a differentiable function on $U$, $dx_\alpha^i$ is a differential form on $U$, and
$$(dx_\alpha^i)_\alpha(v) = \delta^i \;(\text{the } i\text{th dual basis vector of } \mathbb{R}^{n*}) \qquad \text{for all } v \in \alpha(U). \tag{8.7}$$
In particular,
$$\left\langle \frac{\partial}{\partial x_\alpha^i},\, dx_\alpha^j \right\rangle = \delta_i^j. \tag{8.8}$$
If $\omega$ is a differential form on $U$, then
$$\omega = a_{1\alpha}\,dx_\alpha^1 + \cdots + a_{n\alpha}\,dx_\alpha^n, \tag{8.9}$$
where the functions $a_{i\alpha}$ are defined by
$$\omega_\alpha(\alpha(x)) = \langle a_{1\alpha}(x), \ldots, a_{n\alpha}(x) \rangle \in \mathbb{R}^{n*}. \tag{8.10}$$
It then follows from the definitions that
$$df = \frac{\partial f_\alpha}{\partial x^1}\,dx_\alpha^1 + \cdots + \frac{\partial f_\alpha}{\partial x^n}\,dx_\alpha^n. \tag{8.11}$$
Equation (8.11) has built into it the transition law for differential forms under a change of charts. In fact, if $(W, \beta)$ is a second chart, then on $U \cap W$ we have, by (8.11),
$$dx_\beta^i = \frac{\partial x_\beta^i}{\partial x_\alpha^1}\,dx_\alpha^1 + \cdots + \frac{\partial x_\beta^i}{\partial x_\alpha^n}\,dx_\alpha^n. \tag{8.12}$$
If we write $\omega = a_{1\beta}\,dx_\beta^1 + \cdots + a_{n\beta}\,dx_\beta^n$ and substitute (8.12), we get
$$a_{i\alpha} = \sum_j \frac{\partial x_\beta^j}{\partial x_\alpha^i}\,a_{j\beta}.$$
Now $\bigl[\partial x_\beta^j/\partial x_\alpha^i\bigr]$ is the matrix $J_{\beta \circ \alpha^{-1}}$. If we compare with (8.10), we see that we have recovered (7.4).

Since the subscripts $\alpha$, $\beta$, etc., clutter up the formulas, we shall frequently use the following notational conventions: Instead of writing $x_\alpha^i$ we shall write $x^i$, and instead of writing $x_\beta^i$ we shall write $y^i$. Thus $x^i = x_\alpha^i$, $y^j = x_\beta^j$, $z^k = x_\gamma^k$, etc. Similarly, we shall write $X^i$ for $X_\alpha^i$, $Y^i$ for $X_\beta^i$, $a_i$ for $a_{i\alpha}$, $b_i$ for $a_{i\beta}$, and so on. Then Eqs. (8.1) through (8.12) can be written as
$$x^i(x) = i\text{th coordinate of } \alpha(x), \tag{8.1'}$$
$$f = f_\alpha(x^1, \ldots, x^n), \tag{8.2'}$$
$$(\partial/\partial x^i)_\alpha(v) = \delta_i, \tag{8.3'}$$
$$X = X^1\frac{\partial}{\partial x^1} + \cdots + X^n\frac{\partial}{\partial x^n}, \tag{8.4'}$$
$$(X)_\alpha(\alpha(x)) = \langle X^1(x), \ldots, X^n(x) \rangle, \tag{8.5'}$$
$$D_Xf = X^1\frac{\partial f_\alpha}{\partial x^1} + \cdots + X^n\frac{\partial f_\alpha}{\partial x^n}, \tag{8.6'}$$
$$(dx^i)_\alpha(v) = \delta^i, \tag{8.7'}$$
$$\left\langle \frac{\partial}{\partial x^i},\, dx^j \right\rangle = \delta_i^j, \tag{8.8'}$$
$$\omega = a_1\,dx^1 + \cdots + a_n\,dx^n, \tag{8.9'}$$
$$\omega_\alpha(\alpha(x)) = \langle a_1(x), \ldots, a_n(x) \rangle, \tag{8.10'}$$
$$df = \frac{\partial f_\alpha}{\partial x^1}\,dx^1 + \cdots + \frac{\partial f_\alpha}{\partial x^n}\,dx^n, \tag{8.11'}$$
$$dy^i = \frac{\partial y^i}{\partial x^1}\,dx^1 + \cdots + \frac{\partial y^i}{\partial x^n}\,dx^n. \tag{8.12'}$$

The formulas for "pullback" also take a simple form. Let $\psi: M_1 \to M_2$ be a differentiable map, and suppose that $M_1$ is $m$-dimensional and $M_2$ is $n$-dimensional. Let $(U, \alpha)$ and $(W, \beta)$ be compatible charts. Then the map $\beta \circ \psi \circ \alpha^{-1}: \alpha(U) \to \beta(W)$ is given by
$$y^i = y^i(x^1, \ldots, x^m), \qquad i = 1, \ldots, n, \tag{8.13}$$
that is, by $n$ functions of $m$ real variables. If $f$ is a function on $M_2$ with $f = f_\beta(y^1, \ldots, y^n)$ on $W$, then $\psi^*[f] = f_\alpha(x^1, \ldots, x^m)$ on $U$, where
$$f_\alpha(x^1, \ldots, x^m) = f_\beta\bigl(y^1(x^1, \ldots, x^m), \ldots, y^n(x^1, \ldots, x^m)\bigr). \tag{8.14}$$
The rule for "pulling back" a differential form is also very easy. In fact, if $\omega = a_1\,dy^1 + \cdots + a_n\,dy^n$ on $W$, then $\psi^*\omega$ has the same form on $U$, where we now regard the $a$'s and $y$'s as functions of the $x$'s and expand by using (8.12). Thus
$$\psi^*\omega = \sum_{i,j} a_i\frac{\partial y^i}{\partial x^j}\,dx^j, \tag{8.15}$$
where $a_i = a_i\bigl(y^1(x^1, \ldots, x^m), \ldots, y^n(x^1, \ldots, x^m)\bigr)$. Let $x \in U$. Then
$$\psi_*\left(\frac{\partial}{\partial x^i}\right)(x) = \sum_j \frac{\partial y^j}{\partial x^i}(x)\,\frac{\partial}{\partial y^j}(\psi(x))$$
gives the formula for $\psi_{*x}$.

EXERCISES

8.1 Let $x$ and $y$ be rectangular coordinates on $\mathbb{R}^2$, and let $(r, \theta)$ be polar "coordinates" on $\mathbb{R}^2 - \{0\}$. Express the vector fields $\partial/\partial r$ and $\partial/\partial\theta$ in terms of rectangular coordinates. Express $\partial/\partial x$ and $\partial/\partial y$ in terms of polar coordinates.

8.2 Let $x, y, z$ be rectangular coordinates on $\mathbb{R}^3$. Let
$$X = y\frac{\partial}{\partial z} - z\frac{\partial}{\partial y}, \qquad Y = z\frac{\partial}{\partial x} - x\frac{\partial}{\partial z}, \qquad Z = x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}.$$
Compute $[X, Y]$, $[X, Z]$, and $[Y, Z]$. Note that $X$ represents the infinitesimal generator of the one-parameter group of rotations about the $x$-axis. We sometimes call $X$ the "infinitesimal rotation about the $x$-axis." We can do the same for $Y$ and $Z$.

8.3 Let
$$A = y\frac{\partial}{\partial z} + z\frac{\partial}{\partial y}, \qquad B = x\frac{\partial}{\partial z} + z\frac{\partial}{\partial x}, \qquad C = x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}.$$
Compute $[A, B]$, $[A, C]$, and $[B, C]$. Show that $Af = Bf = Cf = 0$ if $f(x, y, z) = x^2 + y^2 - z^2$. Sketch the integral curves of each of the vector fields $A$, $B$, and $C$.

8.4 Let
$$D = u\frac{\partial}{\partial v} + v\frac{\partial}{\partial u}, \qquad E = u\frac{\partial}{\partial v} - v\frac{\partial}{\partial u}, \qquad F = u\frac{\partial}{\partial u} - v\frac{\partial}{\partial v}.$$
Compute $[D, E]$, $[D, F]$, and $[E, F]$.

8.5 Let $P_1, \ldots, P_n$ be polynomials in $x^1, \ldots, x^n$ with no constant term, that is, $P_i(0, \ldots, 0) = 0$. Let
$$I = x^1\frac{\partial}{\partial x^1} + \cdots + x^n\frac{\partial}{\partial x^n} \qquad \text{and} \qquad X = P_1\frac{\partial}{\partial x^1} + \cdots + P_n\frac{\partial}{\partial x^n}.$$
Show that $[I, X] = 0$ if and only if the $P_i$'s are linear. [Hint: Consider the expansion of the $P_i$'s into homogeneous terms.]

8.6 Let $X$ and the $P_i$'s be as in Exercise 8.5, and suppose that the $P_i$'s are linear. Let
$$A = \lambda_1 x^1\frac{\partial}{\partial x^1} + \cdots + \lambda_n x^n\frac{\partial}{\partial x^n},$$
and suppose that $\lambda_i \neq \lambda_j$ for $i \neq j$. Show that $[A, X] = 0$ if and only if $P_i = \mu_i x^i$, that is,
$$X = \mu_1 x^1\frac{\partial}{\partial x^1} + \cdots + \mu_n x^n\frac{\partial}{\partial x^n}$$
for some $\mu_1, \ldots, \mu_n$.

8.7 Let $A$ be as in Exercise 8.6, and suppose, in addition, that $\lambda_i + \lambda_j \neq \lambda_r$ for any $i, j, r$. Show that if the $P_i$'s are at most quadratic, then $[A, X] = 0$ if and only if $P_i = \mu_i x^i$. Generalize this result to the case where $P_i$ can be a polynomial of degree at most $m$.

9. RIEMANN METRICS

Let $M$ be a finite-dimensional differentiable manifold. A Riemann metric, $m$, on $M$ is a rule which assigns a positive definite scalar product $(\ ,\ )_{m,x}$ to the vector space $T_x(M)$ for each $x \in M$. We shall usually drop the subscripts $m$ and $x$ when they are understood from the context. Thus if $m$ is a Riemann metric on $M$, $x \in M$, and $\xi, \eta \in T_x(M)$, we shall write the scalar product of $\xi$ and $\eta$ as $(\xi, \eta) = (\xi, \eta)_{m,x}$.

Let $(U, \alpha)$ be a chart of $M$. Define the functions $g_{ij}$ on $U$ by setting
$$g_{ij}(x) = \left(\frac{\partial}{\partial x^i}(x), \frac{\partial}{\partial x^j}(x)\right), \tag{9.1}$$
so that $g_{ij} = g_{ji}$. If $\xi, \eta \in T_x(M)$ with
$$\xi = \sum_i \xi^i\frac{\partial}{\partial x^i}(x) \qquad \text{and} \qquad \eta = \sum_j \eta^j\frac{\partial}{\partial x^j}(x),$$
then
$$(\xi, \eta) = \sum_{i,j} g_{ij}(x)\,\xi^i\eta^j.$$
Since $dx^1(x), \ldots, dx^n(x)$ is the basis of $T_x^*(M)$ dual to the basis $\partial/\partial x^1(x), \ldots, \partial/\partial x^n(x)$, we have $\xi^i = \langle \xi, dx^i \rangle$, so that the last equation can be written as
$$(\xi, \eta)_{m,x} = \sum g_{ij}(x)\,\langle \xi, dx^i \rangle\langle \eta, dx^j \rangle. \tag{9.2}$$
Equation (9.2) is usually written more succinctly as
$$m \mid U = \sum g_{ij}\,dx^i\,dx^j. \tag{9.3}$$
[Here (9.3) is to be interpreted as a short way of writing (9.2).] Let $(W, \beta)$ be a second chart with
$$h_{kl}(x) = \left(\frac{\partial}{\partial y^k}(x), \frac{\partial}{\partial y^l}(x)\right), \qquad x \in W,$$
that is,
$$m \mid W = \sum h_{kl}\,dy^k\,dy^l. \tag{9.4}$$
Then for $x \in U \cap W$ we have
$$\frac{\partial}{\partial x^i}(x) = \sum_k \frac{\partial y^k}{\partial x^i}(x)\,\frac{\partial}{\partial y^k}(x),$$
so that
$$g_{ij} = \sum_{k,l} h_{kl}\,\frac{\partial y^k}{\partial x^i}\frac{\partial y^l}{\partial x^j}. \tag{9.5}$$
Note that (9.5) is the answer we would get if we formally substituted (8.12) for the $dy$'s in (9.4) and collected the coefficients of $dx^i\,dx^j$. In any event, it is clear from (9.5) that if the $h_{kl}$ are all smooth functions on $W$, then the $g_{ij}$ are smooth on $U \cap W$. In view of this we shall say that a Riemann metric is smooth if the functions $g_{ij}$ given by (9.3) are smooth for any chart $(U, \alpha)$ belonging to an atlas $\mathfrak{a}$ of $M$. Also, if we are given functions $g_{ij} = g_{ji}$ defined for each $(U, \alpha) \in \mathfrak{a}$ such that

i) $\sum g_{ij}(x)\,\xi^i\xi^j > 0$ unless $\xi = 0$, for all $x \in U$;
ii) the transition law (9.5) holds;

then the $g_{ij}$ define a Riemann metric on $M$. In the following discussion we shall assume that our Riemann metrics are smooth.

Let $\psi: M_1 \to M_2$ be a differentiable map, and let $m$ be a Riemann metric on $M_2$. For any $x \in M_1$ define $(\ ,\ )_{\psi^*(m),x}$ on $T_x(M_1)$ by
$$(\xi, \eta)_{\psi^*(m),x} = (\psi_*\xi, \psi_*\eta)_{m,\psi(x)}. \tag{9.6}$$
Note that this defines a symmetric bilinear function of $\xi$ and $\eta$. It is not necessarily positive definite, however, since it is conceivable that $\psi_*(\xi) = 0$ with $\xi \neq 0$. Thus, in general, (9.6) does not define a Riemann metric on $M_1$. For certain $\psi$ it does. A differentiable map $\psi: M_1 \to M_2$ is called an immersion if $\psi_{*x}$ is an injection (i.e., is one-to-one) for all $x \in M_1$. If $\psi: M_1 \to M_2$ is an immersion and $m$ is a Riemann metric on $M_2$, then we define the Riemann metric $\psi^*(m)$ on $M_1$ by (9.6). Let $(U, \alpha)$ and $(W, \beta)$ be compatible charts of $M_1$ and $M_2$, and let $m \mid W = \sum h_{kl}\,dy^k\,dy^l$. Then
$$\psi^*(m) \mid U = \sum g_{ij}\,dx^i\,dx^j, \qquad \text{where} \qquad g_{ij} = \sum_{k,l} (h_{kl} \circ \psi)\,\frac{\partial y^k}{\partial x^i}\frac{\partial y^l}{\partial x^j},$$
which is just (9.5) again (with a different interpretation). Or, more succinctly,
$$\psi^*(m) \mid U = \sum \psi^*(h_{kl})\,\psi^*(dy^k)\,\psi^*(dy^l).$$
Let us give some examples of these formulas. If $M = \mathbb{R}^n$, then the identity chart induces a Riemann metric on $\mathbb{R}^n$ given by $(dx^1)^2 + \cdots + (dx^n)^2$. Let us see what this looks like in terms of polar coordinates in $\mathbb{R}^2$ and $\mathbb{R}^3$. In $\mathbb{R}^2$, if we write
$$x^1 = r\cos\theta, \qquad x^2 = r\sin\theta,$$
then
$$dx^1 = \cos\theta\,dr - r\sin\theta\,d\theta, \qquad dx^2 = \sin\theta\,dr + r\cos\theta\,d\theta,$$
so
$$(dx^1)^2 + (dx^2)^2 = dr^2 + r^2\,d\theta^2. \tag{9.7}$$
Note that (9.7) holds whenever the forms $dr$ and $d\theta$ are defined, i.e., on all of $\mathbb{R}^2 - \{0\}$. (Even though the function $\theta$ is not well defined on all of $\mathbb{R}^2 - \{0\}$, the form $d\theta$ is. In fact, we can write
$$d\theta = \frac{x^1\,dx^2 - x^2\,dx^1}{(x^1)^2 + (x^2)^2}.)$$
In $\mathbb{R}^3$ we introduce
$$x^1 = r\cos\varphi\sin\theta, \qquad x^2 = r\sin\varphi\sin\theta, \qquad x^3 = r\cos\theta. \tag{9.8}$$
Then
$$dx^1 = \cos\varphi\sin\theta\,dr - r\sin\varphi\sin\theta\,d\varphi + r\cos\varphi\cos\theta\,d\theta,$$
$$dx^2 = \sin\varphi\sin\theta\,dr + r\cos\varphi\sin\theta\,d\varphi + r\sin\varphi\cos\theta\,d\theta,$$
$$dx^3 = \cos\theta\,dr - r\sin\theta\,d\theta.$$
Thus
$$(dx^1)^2 + (dx^2)^2 + (dx^3)^2 = dr^2 + r^2\sin^2\theta\,d\varphi^2 + r^2\,d\theta^2. \tag{9.9}$$
Again, (9.9) is valid wherever the forms on the right are defined, which this time means when $(x^1)^2 + (x^2)^2 \neq 0$.

Let us consider the map $\iota$ of the unit sphere $S^2 \to \mathbb{R}^3$ which consists of regarding a point of $S^2$ as a point of $\mathbb{R}^3$. We then get an induced Riemann metric on $S^2$. Let us set $d\bar\theta = \iota^*\,d\theta$ and $d\bar\varphi = \iota^*\,d\varphi$, so the forms $d\bar\theta$ and $d\bar\varphi$ are defined on $U = S^2 - \{\langle 0, 0, 1 \rangle, \langle 0, 0, -1 \rangle\}$. Then on $U$ we can write (since $r = 1$ on $S^2$)
$$\iota^*(m) = d\bar\theta^2 + \sin^2\bar\theta\,d\bar\varphi^2. \tag{9.10}$$
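The polar computations (9.7) and (9.9) amount to the transition law (9.5): in matrix form, $[g_{ij}] = J^T[h_{kl}]J$ with $[h_{kl}]$ the identity and $J$ the Jacobian of the chart change. The following symbolic sketch (ours) reproduces both.

```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)

# R^2: (x^1, x^2) = (r cos th, r sin th); the Euclidean metric has h = I.
F2 = sp.Matrix([r * sp.cos(th), r * sp.sin(th)])
J2 = F2.jacobian([r, th])
print(sp.simplify(J2.T * J2))     # diag(1, r**2): dr^2 + r^2 dtheta^2, (9.7)

# R^3: the spherical coordinates (9.8).
F3 = sp.Matrix([r * sp.cos(ph) * sp.sin(th),
                r * sp.sin(ph) * sp.sin(th),
                r * sp.cos(th)])
J3 = F3.jacobian([r, ph, th])
print(sp.simplify(J3.T * J3))     # diag(1, r**2 sin^2 th, r**2), as in (9.9)
```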
We now return to general considerations. Let $M$ be a differentiable manifold, and let $C: I \to M$ be a differentiable map, where $I$ is an interval in $\mathbb{R}^1$. Let $t$ denote the coordinate of the identity chart on $I$. We shall set
$$C'(s) = C_*\!\left(\frac{d}{dt}\right)(s), \qquad s \in I,$$
so that $C'(s) \in T_{C(s)}(M)$ is the tangent vector to the curve $C$ at $s$. If $(U, \alpha)$ is a chart on $M$ and $x^1, \ldots, x^n$ are the coordinate functions of $(U, \alpha)$, then if $C(I') \subset U$ for some $I' \subset I$, $\alpha \circ C = \langle x^1 \circ C, \ldots, x^n \circ C \rangle$, so that
$$C'(t)_\alpha = \left\langle \frac{d(x^1 \circ C)}{dt}, \ldots, \frac{d(x^n \circ C)}{dt} \right\rangle(t).$$
When there is no possibility of confusion, we shall omit the $\circ\,C$ and simply write $x^i(t)$ for $x^i \circ C(t)$ and $x^{i\prime}(t)$ for $d(x^i \circ C)/dt$. Now let $m$ be a Riemann metric on $M$. Then $\|C'(t)\| = (C'(t), C'(t))^{1/2}$ is a continuous function. In fact, in terms of a chart, we can write
$$\|C'(t)\| = \sqrt{\sum g_{ij}(C(t))\,x^{i\prime}(t)\,x^{j\prime}(t)}.$$
The integral
$$\int_I \|C'(t)\|\,dt \tag{9.11}$$
is called the length of the curve $C$. It will be defined if $\|C'(t)\|$ is integrable over $I$. This will certainly be the case if $I$ and $\|C'(t)\|$ are both bounded, for instance. Note that the length is independent of the parametrization. More precisely, let $\varphi: J \to I$ be a one-to-one differentiable map, and let $C_1 = C \circ \varphi$. Then at any $\tau \in J$ we have
$$C_1'(\tau) = \frac{d\varphi}{d\tau}\,C'(\varphi(\tau)),$$
that is,
$$\|C_1'(\tau)\| = \|C'(\varphi(\tau))\|\left|\frac{d\varphi}{d\tau}\right|. \tag{9.12}$$
On the other hand, by the law for change of variable in an integral, we have
$$\int_I \|C'\| = \int_J \|C'(\varphi(\cdot))\|\left|\frac{d\varphi}{d\tau}\right| = \int_J \|C_1'\|$$
by (9.12). More generally, we say that a curve $C$ defined on an interval $I$ is piecewise differentiable if

i) $C$ is continuous;
ii) $I = I_1 \cup \cdots \cup I_r$ and $C$, on each $I_j$, is the restriction of a differentiable curve defined on some interval $I_j'$ strictly containing $I_j$.

(Thus a piecewise differentiable curve is a curve with a finite number of "corners.") If $C$ is piecewise differentiable, then $\|C'(t)\|$ is defined and continuous except at a finite number of $t$'s, where it may have a jump discontinuity. In particular, the integral (9.11) will exist and the curve will have a length.

Exercise. Let $C$ be a curve mapping $I$ onto a straight line segment in $\mathbb{R}^n$ in a one-to-one manner. Show that the length of $C$ is the same as the length of the segment.

Let $C: [0, 1] \to \mathbb{R}^2$ be a curve with $C(0) = 0$ and $C(1) = v \in \mathbb{R}^2$. If we use the expression (9.7), we see that
$$\int_0^1 \|C'(t)\|\,dt = \int_0^1 \sqrt{(r'(t))^2 + (r(t)\theta'(t))^2}\,dt \geq \int_0^1 |r'(t)|\,dt \geq \int_0^1 r'(t)\,dt = \|v\|,$$
with equality holding if and only if $\theta' \equiv 0$ and $r' \geq 0$. Thus among all curves joining $0$ to $v$, the straight line segment has shortest length. Similarly, on the sphere, let $C$ be any curve $C: [0, 1] \to S^2$ with $C(0) = \langle 0, 0, 1 \rangle$ and $C(1) = p \neq \langle 0, 0, -1 \rangle$, and let $\theta_1 = \theta(C(1))$. Then
$$\int_0^1 \|C'(t)\|\,dt = \int_0^1 \sqrt{(\theta'(t))^2 + \sin^2\theta(t)\,(\varphi'(t))^2}\,dt \geq \int_0^1 |\theta'(t)|\,dt.$$
If we let $t_1$ denote the first point in $[0, 1]$ where $\theta = \theta_1$, then
$$\int_0^1 \|C'(t)\|\,dt \geq \int_0^1 |\theta'(t)|\,dt \geq \int_0^{t_1} |\theta'(t)|\,dt \geq \int_0^{t_1} \theta'(t)\,dt = \theta_1,$$
with equality only if $\varphi' \equiv 0$ and $t_1 = 1$. Thus the shortest curve joining any two points on $S^2$ is the great circle joining them. In both examples above we were aided by a very fortuitous choice of coordinates (polar coordinates in the plane and a kind of polar coordinates on the sphere). We shall see in Section 11 of Chapter 13 that this is not accidental: on any Riemann manifold one can introduce local coordinates in terms of which it is easy to describe the curves that locally minimize length.
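The sphere argument can be watched numerically. The sketch below (our own illustration) evaluates the length integral (9.11) in the metric (9.10) for the meridian reaching $\theta_1$, and for a curve with the same endpoints whose $\varphi$ oscillates; the particular curves are our choices.

```python
import numpy as np
from scipy.integrate import quad

theta1 = 1.0

def length(theta, phi, dtheta, dphi):
    # Length (9.11) of t -> (theta(t), phi(t)), 0 <= t <= 1, in metric (9.10).
    integrand = lambda t: np.sqrt(dtheta(t) ** 2
                                  + np.sin(theta(t)) ** 2 * dphi(t) ** 2)
    return quad(integrand, 0.0, 1.0)[0]

# The meridian: theta = theta1 * t, phi = 0.  Its length is theta1.
print(length(lambda t: theta1 * t, lambda t: 0.0,
             lambda t: theta1,     lambda t: 0.0))            # 1.0

# Same endpoints, but phi = 0.3 sin(2 pi t) oscillates: strictly longer.
print(length(lambda t: theta1 * t,
             lambda t: 0.3 * np.sin(2 * np.pi * t),
             lambda t: theta1,
             lambda t: 0.6 * np.pi * np.cos(2 * np.pi * t)))   # > 1.0
```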
CHAPTER 10

THE INTEGRAL CALCULUS ON MANIFOLDS

In this chapter we shall study integration on manifolds. In order to develop the integral calculus, we shall have to restrict the class of manifolds under consideration. In this chapter we shall assume that all manifolds $M$ that arise satisfy the following two conditions:

1) $M$ is finite-dimensional.
2) $M$ possesses an atlas $\mathfrak{a}$ containing (at most) a countable number of charts; that is, $\mathfrak{a} = \{(U_i, \alpha_i)\}_{i=1,2,\ldots}$.

Before getting down to the business of integration, there are several technical facts to be established. The first two sections will be devoted to this task.

1. COMPACTNESS

A subset $A$ of a manifold $M$ is said to be compact if it has the following property:

i) If $\{U_\iota\}$ is any collection of open sets with $A \subset \bigcup U_\iota$, there exist finitely many of the $U_\iota$, say $U_{\iota_1}, \ldots, U_{\iota_r}$, such that $A \subset U_{\iota_1} \cup \cdots \cup U_{\iota_r}$.

Alternatively, we can say:

ii) A set $A$ is compact if and only if for any family $\{F_\iota\}$ of closed sets such that $A \cap \bigcap_\iota F_\iota = \varnothing$, there exist finitely many of the $F_\iota$ such that $A \cap F_{\iota_1} \cap \cdots \cap F_{\iota_r} = \varnothing$.

The equivalence of (i) and (ii) can be seen by taking $U_\iota$ equal to the complement of $F_\iota$. In Section 5 of Chapter 4 we established that if $M = U$ is an open subset of $\mathbb{R}^n$, then $A \subset U$ is compact if and only if $A$ is a closed bounded subset of $\mathbb{R}^n$.

We make some further trivial remarks about compactness:

iii) If $A_1, \ldots, A_r$ are compact, so is $A_1 \cup \cdots \cup A_r$.

In fact, if $\{U_\iota\}$ covers $A_1 \cup \cdots \cup A_r$, it certainly covers each $A_j$. We can thus choose for each $j$ a finite subcollection which covers $A_j$. The union of these subcollections forms a finite subcollection covering $A_1 \cup \cdots \cup A_r$.

iv) If $\psi: M_1 \to M_2$ is continuous and $A \subset M_1$ is compact, then $\psi[A]$ is compact.

In fact, if $\{U_\iota\}$ covers $\psi[A]$, then $\{\psi^{-1}(U_\iota)\}$ covers $A$. If the $U_\iota$ are open, so are the $\psi^{-1}(U_\iota)$, since $\psi$ is continuous. We can thus choose $\iota_1, \ldots, \iota_r$ so that $A \subset \psi^{-1}(U_{\iota_1}) \cup \cdots \cup \psi^{-1}(U_{\iota_r})$, which implies that $\psi[A] \subset U_{\iota_1} \cup \cdots \cup U_{\iota_r}$.

We see from this that if $A = A_1 \cup \cdots \cup A_n$, where each $A_j$ is contained in some $W_i$, where $(W_i, \beta_i)$ is a chart, and $\beta_i(A_j)$ is a compact subset of $\mathbb{R}^n$, then $A$ is compact. In particular, the manifold $M$ itself may be compact. For instance, we can write $S^n$ as the union of the upper and lower hemispheres:
$$S^n = \{x : x^{n+1} \geq 0\} \cup \{x : x^{n+1} \leq 0\}.$$
Each hemisphere is compact. In fact, the upper hemisphere is mapped onto $\{y : \|y\| \leq 1\}$ by the map $\varphi_1$ of Section 8.1, and the lower hemisphere is mapped onto the same set by $\varphi_2$. Thus the sphere is compact.

On the other hand, an open subset of $\mathbb{R}^n$ is not compact. However, it can be written as a countable union of compact sets. In fact, if $U \subset \mathbb{R}^n$ is an open set, let
$$A_n = \{x \in U : \|x\| \leq n \text{ and } \rho(x, \partial U) \geq 1/n\}.$$
It is easy to check that $A_n$ is compact and that $\bigcup A_n = U$. In view of condition (2), we can say the same for any manifold $M$ under consideration:

Proposition 1.1. Any manifold $M$ satisfying (1) and (2) can be written as
$$M = \bigcup_{i=1}^\infty A_i,$$
where each $A_i \subset M$ is compact.

Proof. By (2) we can write $M = \bigcup_j U_j$ for a countable atlas $\{(U_j, \alpha_j)\}$, and by the preceding discussion each $U_j$ can be written as the countable union of compact sets. Since the countable union of a countable union is still countable, we obtain Proposition 1.1. $\square$

An immediate corollary is:

Proposition 1.2. Let $M$ be a manifold [satisfying (1) and (2)], and let $\{U_\iota\}$ be an open covering of $M$. Then we can select a countable subcollection $\{U_j\}$ such that $M = \bigcup_j U_j$.

Proof. Write $M = \bigcup A_r$, where $A_r$ is compact. For each $r$ we can choose finitely many $U_{r,1}, U_{r,2}, \ldots, U_{r,k_r}$ so that $A_r \subset U_{r,1} \cup \cdots \cup U_{r,k_r}$. The collection $\{U_{r,j}\}$ is a countable subcollection covering $M$. $\square$

2. PARTITIONS OF UNITY

In the following discussion it will be convenient for us to have a method of "breaking up" functions, vector fields, etc., into "little pieces." For this purpose we introduce the following notion:

Definition 2.1. A collection $\{g_i\}$ of $C^\infty$-functions is said to be a partition of unity if

i) $g_i \geq 0$ for all $i$;
ii) $\operatorname{supp} g_i$† is compact for all $i$;
iii) each $x \in M$ has a neighborhood $V_x$ such that $V_x \cap \operatorname{supp} g_i = \varnothing$ for all but a finite number of $i$; and
iv) $\sum g_i(x) = 1$ for all $x \in M$.

Note that in view of (iii) the sum occurring in (iv) is actually finite, since for any $x$ all but a finite number of the $g_i(x)$ vanish. Note also that:

Proposition 2.1. If $A$ is a compact set and $\{g_i\}$ is a partition of unity, then $A \cap \operatorname{supp} g_i = \varnothing$ for all but a finite number of $i$.

Proof. In fact, each $x \in A$ has a neighborhood $V_x$ given by (iii). The sets $\{V_x\}_{x \in A}$ form an open covering of $A$. Since $A$ is compact, we can select a finite subcollection $\{V_1, \ldots, V_r\}$ with $A \subset V_1 \cup \cdots \cup V_r$. Since each $V_k$ has a nonempty intersection with only finitely many of the $\operatorname{supp} g_i$, so does their union, and so a fortiori does $A$. $\square$

† Recall that $\operatorname{supp} g$ is the closure of the set $\{x : g(x) \neq 0\}$.

Definition 2.2. Let $\{U_\iota\}$ be an open covering of $M$, and let $\{g_j\}$ be a partition of unity. We say that $\{g_j\}$ is subordinate to $\{U_\iota\}$ if for every $j$ there exists an $\iota(j)$ such that
$$\operatorname{supp} g_j \subset U_{\iota(j)}. \tag{2.1}$$

Theorem 2.1. Let $\{U_\iota\}$ be any open covering of $M$. There exists a partition of unity $\{g_j\}$ subordinate to $\{U_\iota\}$.

The proof that we shall present below is due to Bonic and Frampton.† First we introduce some preliminary notions. The function $f$ on $\mathbb{R}$ defined by
$$f(u) = \begin{cases} e^{-1/u} & \text{if } u > 0, \\ 0 & \text{if } u \leq 0 \end{cases}$$
is $C^\infty$. For $u \neq 0$ it is clear that $f$ has derivatives of all orders. To check that $f$ is $C^\infty$ at $0$, it suffices to show that $f^{(k)}(u) \to 0$ as $u \to 0$ from the right. But $f^{(k)}(u) = P_k(1/u)e^{-1/u}$, where $P_k$ is a polynomial of degree $2k$. So
$$\lim_{u \to 0^+} f^{(k)}(u) = \lim_{s \to \infty} P_k(s)e^{-s} = 0,$$
since $e^s$ goes to infinity faster than any polynomial. Note that $f(u) > 0$ if and only if $u > 0$. Now consider the function $g_a^b$ on $\mathbb{R}$ defined by
$$g_a^b(x) = f(x - a)f(b - x).$$
Then $g_a^b$ is $C^\infty$ and nonnegative, and $g_a^b(x) > 0$ if and only if $a < x < b$. More generally, if $a = \langle a^1, \ldots, a^k \rangle$ and $b = \langle b^1, \ldots, b^k \rangle$, define the function $g_a^b$ on $\mathbb{R}^k$ by setting
$$g_a^b(x) = g_{a^1}^{b^1}(x^1)\,g_{a^2}^{b^2}(x^2)\cdots g_{a^k}^{b^k}(x^k), \qquad \text{where } x = \langle x^1, \ldots, x^k \rangle.$$
Then $g_a^b \geq 0$, $g_a^b \in C^\infty$, and
$$g_a^b(x) > 0 \quad \text{if and only if} \quad a^1 < x^1 < b^1, \ldots, a^k < x^k < b^k. \tag{2.2}$$

Lemma. Let $f_1, \ldots, f_k$ be $C^\infty$-functions on a manifold $M$, and let
$$W = \{x : a^1 < f_1(x) < b^1, \ldots, a^k < f_k(x) < b^k\}.$$
There exists a nonnegative $C^\infty$-function $g$ such that $W = \{x : g(x) > 0\}$. In fact, if we define $g$ by
$$g(x) = g_a^b\bigl(f_1(x), \ldots, f_k(x)\bigr),$$
then it is clear that $g$ has the desired properties.

† "Smooth functions on Banach manifolds," J. Math. and Mech. 15, 877-898 (1966).
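The functions $f$ and $g_a^b$ are easy to implement directly; the sketch below is ours and only illustrates (2.2).

```python
import numpy as np

def f(u):
    # f(u) = exp(-1/u) for u > 0 and 0 for u <= 0; C-infinity on R.
    u = np.atleast_1d(np.asarray(u, dtype=float))
    out = np.zeros_like(u)
    pos = u > 0
    out[pos] = np.exp(-1.0 / u[pos])
    return out

def g(x, a, b):
    # g_a^b(x) = prod_i f(x^i - a^i) f(b^i - x^i): positive iff a < x < b,
    # componentwise, as in (2.2).
    x, a, b = (np.atleast_1d(np.asarray(v, dtype=float)) for v in (x, a, b))
    return float(np.prod(f(x - a) * f(b - x)))

print(g(0.5, 0.0, 1.0) > 0)                 # True: inside (0, 1)
print(g(1.0, 0.0, 1.0) == 0.0)              # True: vanishes at the endpoint
print(g([0.5, 0.2], [0, 0], [1, 1]) > 0)    # True: inside the open square
```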
We now turn to the proof of Theorem 2.1.

Proof. For each $x \in M$ choose a $U_\iota$ containing $x$ and a chart $(U, \alpha)$ about $x$. Then $\alpha(U \cap U_\iota)$ is an open set containing $\alpha(x)$ in $\mathbb{R}^n$. Choose $a$ and $b$ such that $\alpha(x) \in \operatorname{int} D_a^b$ and $D_a^b \subset \alpha(U \cap U_\iota)$, where $D_a^b$ is the closed rectangle $\{v : a^i \leq v^i \leq b^i\}$. Let $W_x = \alpha^{-1}(\operatorname{int} D_a^b)$. Then $\overline{W}_x \subset U_\iota$ and $\overline{W}_x$ is compact. Also, if $x^1, \ldots, x^n$ are the coordinates given by $\alpha$,
$$W_x = \{y : a^1 < x^1(y) < b^1, \ldots, a^n < x^n(y) < b^n\}.$$
By our lemma we can find a nonnegative $C^\infty$-function $f_x$ such that
$$W_x = \{y : f_x(y) > 0\}. \tag{2.3}$$
Since $x \in W_x$, the $\{W_x\}$ cover $M$. By Proposition 1.2 we can select a countable subcovering $\{W_j\}$. Let us denote the corresponding functions by $f_j$; that is, if $W_j = W_{x_j}$, we set $f_j = f_{x_j}$. Let
$$V_1 = W_1 = \{x : f_1(x) > 0\},$$
$$V_2 = \{x : f_2(x) > 0,\ f_1(x) < \tfrac{1}{2}\},$$
$$\vdots$$
$$V_r = \{x : f_r(x) > 0,\ f_1(x) < 1/r, \ldots, f_{r-1}(x) < 1/r\}.$$
It is clear that $V_j$ is open and that $\overline{V}_j \subset \overline{W}_j$, so that, by (2.3), $\overline{V}_j$ is compact and
$$\overline{V}_j \subset U_{\iota(j)} \tag{2.4}$$
for some $\iota = \iota(j)$. For each $x \in M$ let $q(x)$ denote the first integer $q$ for which $f_q(x) > 0$. Thus $f_p(x) = 0$ if $p < q(x)$, and $f_{q(x)}(x) > 0$. Let
$$V_x = \{y : f_{q(x)}(y) > \tfrac{1}{2}f_{q(x)}(x)\}.$$
Since $f_{q(x)}(x) > 0$, it follows that $x \in V_x$ and $V_x$ is open. Furthermore,
$$V_x \cap V_r = \varnothing \qquad \text{if } r > q(x) \text{ and } 1/r < \tfrac{1}{2}f_{q(x)}(x). \tag{2.5}$$
According to the lemma, each set $V_r$ can be given as $V_r = \{x : h_r(x) > 0\}$, where $h_r$ is a suitable $C^\infty$-function. Let $g = \sum h_r$. In view of (2.5), this is really a finite sum in the neighborhood of any $x$. Thus $g$ is $C^\infty$. Now $h_{q(x)}(x) > 0$, since $x \in V_{q(x)}$. Thus $g > 0$. Set
$$g_j = \frac{h_j}{g}.$$
We claim that $\{g_j\}$ is the desired partition of unity. In fact, (i) holds by our construction, (ii) and (2.1) follow from (2.4), (iii) follows from (2.5), and (iv) holds by construction. $\square$

3. DENSITIES

If we regard $\mathbb{R}^n$ as a differentiable manifold, then the law for change of variables for an integral shows that the integrand does not have the same transition law as that of a function under change of chart. For this reason we cannot expect to integrate functions on a manifold. We now introduce the type of object that we can integrate.

Definition 3.1. A density $\rho$ is a rule which assigns to each chart $(U, \alpha)$ of $M$ a function $\rho_\alpha$ defined on $\alpha(U)$, subject to the following transition law: If $(W, \beta)$ is a second chart of $M$, then
$$\rho_\alpha(v) = \rho_\beta\bigl(\beta \circ \alpha^{-1}(v)\bigr)\,\bigl|\det J_{\beta \circ \alpha^{-1}}(v)\bigr| \qquad \text{for } v \in \alpha(U \cap W). \tag{3.1}$$

If $\mathfrak{a}$ is an atlas of $M$ and functions $\rho_{\alpha_i}$ are given for all $(U_i, \alpha_i) \in \mathfrak{a}$ satisfying (3.1), then the $\rho_{\alpha_i}$ define a density $\rho$ on $M$. In fact, if $(U, \alpha)$ is any chart of $M$ (not necessarily belonging to $\mathfrak{a}$), define $\rho_\alpha$ on $\alpha(U \cap U_i)$ by
$$\rho_\alpha(v) = \rho_{\alpha_i}\bigl(\alpha_i \circ \alpha^{-1}(v)\bigr)\,\bigl|\det J_{\alpha_i \circ \alpha^{-1}}(v)\bigr|.$$
This definition is consistent: If $v \in \alpha(U \cap U_i) \cap \alpha(U \cap U_j)$, then by (3.1),
$$\rho_{\alpha_j}\bigl(\alpha_j \circ \alpha^{-1}(v)\bigr)\,\bigl|\det J_{\alpha_j \circ \alpha^{-1}}(v)\bigr| = \rho_{\alpha_i}\Bigl(\alpha_i \circ \alpha_j^{-1}\bigl(\alpha_j \circ \alpha^{-1}(v)\bigr)\Bigr)\,\bigl|\det J_{\alpha_i \circ \alpha_j^{-1}}\bigl(\alpha_j \circ \alpha^{-1}(v)\bigr)\bigr|\,\bigl|\det J_{\alpha_j \circ \alpha^{-1}}(v)\bigr| = \rho_{\alpha_i}\bigl(\alpha_i \circ \alpha^{-1}(v)\bigr)\,\bigl|\det J_{\alpha_i \circ \alpha^{-1}}(v)\bigr|$$
by the chain rule and the multiplicative property of determinants.

In view of (3.1) it makes sense to talk about local smoothness properties of densities. We will say that a density $\rho$ is $C^k$ if for any chart $(U, \alpha)$ the function $\rho_\alpha$ is $C^k$. As usual, it suffices to verify this for all charts $(U, \alpha)$ belonging to some atlas. Similarly, we say that a density $\rho$ is locally absolutely integrable if for any chart $(U, \alpha)$ the function $\rho_\alpha$ is absolutely integrable. By the last proposition of Chapter 8 this is again independent of the choice of atlases.

Let $\rho$ be a density on $M$, and let $x$ be a point of $M$. It does not make sense to talk about the value of $\rho$ at $x$. However, (3.1) shows that it does make sense to talk about the sign of $\rho$ at $x$. More precisely, we say that $\rho > 0$ at $x$ if
$$\rho_\alpha(\alpha(x)) > 0 \tag{3.2}$$
for a chart $(U, \alpha)$ about $x$. Equation (3.1) shows that if $\rho_\alpha(\alpha(x)) > 0$, then $\rho_\beta(\beta(x)) > 0$ for any other chart $(W, \beta)$ about $x$. Similarly, it makes sense to say that $\rho < 0$ at $x$, $\rho \geq 0$ at $x$, or $\rho \leq 0$ at $x$.

Definition 3.2. Let $\rho$ be a density on $M$. By the support of $\rho$, denoted by $\operatorname{supp}\rho$, we shall mean the closure of the set of points of $M$ at which $\rho$ does not vanish. That is,
$$\operatorname{supp}\rho = \overline{\{x : \rho \neq 0 \text{ at } x\}}.$$
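The transition law (3.1) can be checked numerically. In the sketch below (ours), the two charts are the identity (Cartesian) chart $\alpha$ and the polar chart $\beta$ on the right half plane, and $\rho$ is the density whose polar-chart function is $\rho_\beta(r, \theta) = r$; (3.1) then forces $\rho_\alpha \equiv 1$, which is what the computation returns. The numerical Jacobian is our own helper.

```python
import numpy as np

def beta_of_alpha_inv(v):
    # The chart change beta . alpha^-1: cartesian (x, y) -> polar (r, theta).
    x, y = v
    return np.array([np.hypot(x, y), np.arctan2(y, x)])

def jacobian(F, v, h=1e-6):
    # Numerical Jacobian of F at v by central differences.
    v = np.asarray(v, dtype=float)
    cols = []
    for i in range(len(v)):
        e = np.zeros_like(v); e[i] = h
        cols.append((F(v + e) - F(v - e)) / (2 * h))
    return np.column_stack(cols)

rho_beta = lambda w: w[0]                  # rho in the polar chart: r
v = np.array([1.2, 0.7])                   # a point in the cartesian chart

w = beta_of_alpha_inv(v)
rho_alpha = rho_beta(w) * abs(np.linalg.det(jacobian(beta_of_alpha_inv, v)))
print(rho_alpha)                           # ~1.0, as required by (3.1)
```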
Let $\rho_1$ and $\rho_2$ be densities. We define their sum by setting
$$(\rho_1 + \rho_2)_\alpha = \rho_{1\alpha} + \rho_{2\alpha} \tag{3.3}$$
for any chart $(U, \alpha)$. It is immediate that the right-hand side of (3.3) satisfies the transition law (3.1), and so defines a density on $M$. Let $\rho$ be a density, and let $f$ be a function. We define the density $f\rho$ by
$$(f\rho)_\alpha = (f \circ \alpha^{-1})\,\rho_\alpha. \tag{3.4}$$
Again, the verification of (3.1) is immediate in view of the transition laws for functions. It is clear that
$$\operatorname{supp}(\rho_1 + \rho_2) \subset \operatorname{supp}\rho_1 \cup \operatorname{supp}\rho_2 \tag{3.5}$$
and
$$\operatorname{supp}(f\rho) = \operatorname{supp} f \cap \operatorname{supp}\rho. \tag{3.6}$$
We shall write $\rho_1 \leq \rho_2$ at $x$ if $\rho_2 - \rho_1 \geq 0$ at $x$, and $\rho_1 \leq \rho_2$ if $\rho_1 \leq \rho_2$ at all $x \in M$.

Let $P$ denote the space of locally absolutely integrable densities of compact support. We observe that $P$ is a vector space and that the product $f\rho$ belongs to $P$ if $f$ is a (bounded) locally contented function and $\rho \in P$.

Theorem 3.1. There exists a unique linear function $\int$ on $P$ satisfying the following condition: If $\rho \in P$ is such that $\operatorname{supp}\rho \subset U$, where $(U, \alpha)$ is a chart of $M$, then
$$\int \rho = \int_{\alpha(U)} \rho_\alpha. \tag{3.7}$$

Proof. We first show that there is at most one linear function satisfying (3.7). Let $\mathfrak{a}$ be an atlas of $M$, and let $\{g_j\}$ be a partition of unity subordinate to $\mathfrak{a}$. For each $j$ choose an $i(j)$ so that $\operatorname{supp} g_j \subset U_{i(j)}$. Write
$$\rho = 1 \cdot \rho = \sum g_j \cdot \rho.$$
Since $\operatorname{supp}\rho$ is compact, only finitely many of the terms $g_j\rho$ are not identically zero. Thus the sum is finite. Since $\int$ is linear,
$$\int \rho = \sum_j \int g_j\rho.$$
By (3.7),
$$\int g_j\rho = \int_{\alpha_{i(j)}(U_{i(j)})} (g_j\rho)_{\alpha_{i(j)}}.$$
Thus
$$\int \rho = \sum_j \int_{\alpha_{i(j)}(U_{i(j)})} (g_j\rho)_{\alpha_{i(j)}}. \tag{3.8}$$
Thus $\int$, if it exists, must be given by (3.8). To establish the existence of $\int$, we must show that (3.8) defines a linear function on $P$ satisfying (3.7). The linearity is obvious; we must verify (3.7). Suppose $\operatorname{supp}\rho \subset U$ for some chart $(U, \alpha)$. We must show that
$$\int_{\alpha(U)} \rho_\alpha = \sum_j \int_{\alpha_{i(j)}(U_{i(j)})} (g_j\rho)_{\alpha_{i(j)}}.$$
Since $\rho = \sum g_j\rho$ and therefore $\rho_\alpha = \sum (g_j\rho)_\alpha$, it suffices to show that
$$\int_{\alpha(U)} (g_j\rho)_\alpha = \int_{\alpha_i(U_i)} (g_j\rho)_{\alpha_i}, \tag{3.9}$$
where $\operatorname{supp} g_j\rho \subset U \cap U_i$. By (3.1),
$$(g_j\rho)_\alpha = (g_j\rho)_{\alpha_i} \circ (\alpha_i \circ \alpha^{-1}) \cdot |\det J_{\alpha_i \circ \alpha^{-1}}|,$$
so that (3.9) holds by the transformation law for integrals in $\mathbb{R}^n$. $\square$

We can derive a number of useful properties of the integral from the formula (3.8):
$$\text{If } \rho_1 \leq \rho_2, \text{ then } \int \rho_1 \leq \int \rho_2. \tag{3.10}$$
In fact, since $g_j \geq 0$, we have $(g_j\rho_1)_\alpha \leq (g_j\rho_2)_\alpha$ for any chart $(U, \alpha)$. Thus (3.10) follows from the corresponding fact on $\mathbb{R}^n$ if we use (3.8).

Let us say that a set $A$ has content zero if $A \subset A_1 \cup \cdots \cup A_p$, where each $A_i$ is compact, $A_i \subset U_i$ for some chart $(U_i, \alpha_i)$, and $\alpha_i(A_i)$ has content zero in $\mathbb{R}^n$. It is easy to see that the union of any finite number of sets of content zero has content zero. It is also clear that the function $e_A$ is contented. Let us call a set $B \subset M$ contented if the function $e_B$ is contented. For any $\rho \in P$ we define $\int_B \rho$ by
$$\int_B \rho = \int e_B\rho. \tag{3.11}$$
It follows from (3.8) that
$$\int_A \rho = 0$$
for any $\rho \in P$ if $A$ has content zero. We can thus ignore sets of content zero for the purpose of integration. In practice, one usually takes advantage of this when computing integrals, rather than using (3.8). For instance, in computing an integral over $S^n$, we can "ignore" any meridian: for example, if
$$A = \{x \in S^n : x = \langle t, 0, \ldots, 0, \pm\sqrt{1 - t^2} \rangle \in \mathbb{R}^{n+1}\},$$
then
$$\int_{S^n} \rho = \int_{S^n - A} \rho$$
for any $\rho$. This means that we can compute $\int_{S^n} \rho$ by introducing polar coordinates (Fig. 10.1) and expressing $\rho$ in terms of them. Thus on $S^2$, if $U = S^2 - A$ and $\alpha$ is the polar coordinate chart on $U$, then
$$\int_{S^2} \rho = \int_0^{2\pi}\!\!\int_0^\pi \rho_\alpha\,d\theta\,d\varphi.$$

[Fig. 10.1: the polar coordinate chart $\alpha$ maps $S^2 - A$ onto an open rectangle in the $(\theta, \varphi)$-plane.]

It is worth observing that if $N$ is a differentiable manifold of dimension less than $\dim M$ and $\psi$ is a differentiable map of $N \to M$, then Proposition 7.3 of Chapter 8 implies that if $A$ is any compact subset of $N$, then $\psi(A)$ has content zero in $M$. In this sense, one can ignore "lower-dimensional sets" when integrating on $M$.

4. VOLUME DENSITY OF A RIEMANN METRIC

Let $M$ be a differentiable manifold with a Riemann metric $m$. We define the density $\sigma$ $[= \sigma(m)]$ as follows. For each chart $(U, \alpha)$ with coordinates $x^1, \ldots, x^n$ let
$$\sigma_\alpha(\alpha(x)) = \left|\det\left[\left(\frac{\partial}{\partial x^i}(x), \frac{\partial}{\partial x^j}(x)\right)\right]\right|^{1/2} = |\det(g_{ij}(x))|^{1/2}. \tag{4.1}$$
Here $\left[(\partial/\partial x^i(x), \partial/\partial x^j(x))\right]$ is the matrix whose $ij$th entry is the scalar product of the vectors $\partial/\partial x^i(x)$ and $\partial/\partial x^j(x)$, so that (in view of Exercise 8.1 of Chapter 8) $\sigma_\alpha(\alpha(x))$ is the volume of the parallelepiped spanned by the $(\partial/\partial x^i)(x)$ with respect to the Euclidean metric $(\ ,\ )_{m,x}$ on $T_x(M)$.

It is easy to see that (4.1) actually defines a density. Let $(W, \beta)$ be a second chart about $x$ with coordinates $y^1, \ldots, y^n$. Then
$$\frac{\partial}{\partial x^i}(x) = \sum_k \frac{\partial y^k}{\partial x^i}(x)\,\frac{\partial}{\partial y^k}(x),$$
so that
$$\left(\frac{\partial}{\partial x^i}(x), \frac{\partial}{\partial x^j}(x)\right) = \sum_{k,l} \frac{\partial y^k}{\partial x^i}\frac{\partial y^l}{\partial x^j}\left(\frac{\partial}{\partial y^k}(x), \frac{\partial}{\partial y^l}(x)\right)$$
for all $i$ and $j$. We can write this as the matrix equation
$$\left[\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\right)\right] = \left[\frac{\partial y^k}{\partial x^i}\right]^T\left[\left(\frac{\partial}{\partial y^k}, \frac{\partial}{\partial y^l}\right)\right]\left[\frac{\partial y^l}{\partial x^j}\right],$$
so that
$$\sigma_\alpha(\alpha(x)) = \left|\det\left[\left(\frac{\partial}{\partial y^k}(x), \frac{\partial}{\partial y^l}(x)\right)\right]\det\left[\frac{\partial y^k}{\partial x^i}\right]\det\left[\frac{\partial y^l}{\partial x^j}\right]\right|^{1/2} = \left|\det\left[\left(\frac{\partial}{\partial y^k}(x), \frac{\partial}{\partial y^l}(x)\right)\right]\right|^{1/2}\left|\det\left[\frac{\partial y^k}{\partial x^i}\right]\right| = \sigma_\beta(\beta(x))\left|\det\left[\frac{\partial y^k}{\partial x^i}\right]\right|(x),$$
which is the transition law (3.1).

If $M$ is an open subset of Euclidean space with the Euclidean metric, then the volume density, when integrated over any contented set, yields the ordinary Euclidean volume of that set. In fact, if $x^1, \ldots, x^n$ are orthonormal coordinates corresponding to the identity chart, then $g_{ij}(x) = 0$ if $i \neq j$ and $g_{ii} = 1$, so that $\sigma_{\mathrm{id}} \equiv 1$ and thus
$$\int_A \sigma = \int_A 1 = \mu(A).$$
More generally, let $\varphi$ be an immersion of a $k$-dimensional manifold $M$ into $\mathbb{R}^n$ such that $\varphi(M)$ is an open subset of a $k$-dimensional hyperplane in $\mathbb{R}^n$, and let $m$ be the Riemann metric induced on $M$ by $\varphi$. Then, if $\sigma$ denotes the corresponding volume density, $\int_A \sigma$ is the $k$-dimensional Euclidean volume of $\varphi(A)$. In fact, by a Euclidean motion we may assume that $\varphi$ maps $M$ into $\mathbb{R}^k \subset \mathbb{R}^n$. Then, since $\varphi$ is an immersion and $M$ is $k$-dimensional, we can use $x^1, \ldots, x^k$ as coordinates on $M$ and conclude, as before, that $\sigma$ is given by the function $1$ in terms of these coordinates, and hence that $\int_A \sigma = \mu(\varphi(A))$.

Now let $\varphi_1$ and $\varphi_2$ be two immersions of $M \to \mathbb{R}^n$. Let $(U, \alpha)$ be a coordinate chart on $M$ with coordinates $y^1, \ldots, y^k$. If $m_i$ is the Riemann metric induced by $\varphi_i$, then the corresponding metric functions are
$$\left(\frac{\partial\varphi_1}{\partial y^i}, \frac{\partial\varphi_1}{\partial y^j}\right) \qquad \text{and} \qquad \left(\frac{\partial\varphi_2}{\partial y^i}, \frac{\partial\varphi_2}{\partial y^j}\right),$$
where the scalar product on the right is the Euclidean scalar product. Let $\sigma_1$ and $\sigma_2$ be the volume densities corresponding to $m_1$ and $m_2$. Then
$$\sigma_{1\alpha} = \left|\det\left[\left(\frac{\partial\varphi_1}{\partial y^i}, \frac{\partial\varphi_1}{\partial y^j}\right)\right]\right|^{1/2} \qquad \text{and} \qquad \sigma_{2\alpha} = \left|\det\left[\left(\frac{\partial\varphi_2}{\partial y^i}, \frac{\partial\varphi_2}{\partial y^j}\right)\right]\right|^{1/2}.$$
In particular, given an $L > 0$, there is a $K = K(k, n, L)$ such that if
$$\left\|\frac{\partial\varphi_1}{\partial y^i}\right\| < L \qquad \text{and} \qquad \left\|\frac{\partial\varphi_2}{\partial y^i}\right\| < L \qquad \text{for all } i = 1, \ldots, k,$$
then, by the mean-value theorem,
$$|\sigma_{1\alpha} - \sigma_{2\alpha}| \leq K\left(\left\|\frac{\partial\varphi_2}{\partial y^1} - \frac{\partial\varphi_1}{\partial y^1}\right\| + \cdots + \left\|\frac{\partial\varphi_2}{\partial y^k} - \frac{\partial\varphi_1}{\partial y^k}\right\|\right).$$
Roughly speaking, this means that if $\varphi_1$ and $\varphi_2$ are close, in the sense that their derivatives are close, then the densities they induce are close.

We apply this remark to the following situation. We let $\varphi_1$ be an immersion of $M$ into $\mathbb{R}^n$ and let $(W, \alpha)$ be some chart of $M$ with coordinates $y^1, \ldots, y^k$. We let $U = W - C = \bigcup U_l$, where $C$ is some closed set of content zero and $U_l \cap U_{l'} = \varnothing$ if $l \neq l'$. For each $l$ let $z_l$ be a point of $U_l$ whose coordinates are $\langle y_l^1, \ldots, y_l^k \rangle$, and for $z = \langle y^1, \ldots, y^k \rangle \in U_l$ define $\varphi_2$ by setting
$$\varphi_2(y^1, \ldots, y^k) = \varphi_1(z_l) + \sum_i (y^i - y_l^i)\,\frac{\partial\varphi_1}{\partial y^i}(z_l). \qquad \text{(See Fig. 10.2.)}$$
If the $U_l$'s are sufficiently small, then $\left\|\frac{\partial\varphi_2}{\partial y^i} - \frac{\partial\varphi_1}{\partial y^i}\right\|$ will be small. More generally, we could choose $\varphi_2$ to be any affine linear map approximating $\varphi_1$ on each $U_l$. We thus see that the volume of $W$ in terms of the Riemann metric induced by $\varphi_1$ is the limit of the (surface) volume of polyhedra approximating $\varphi_1(W)$. Here the approximation must be in the sense of slope (i.e., the derivatives must be close) and not merely in the sense of position.

The construction of the volume density can be generalized and suggests an alternative definition of the notion of density. In fact, let $\rho$ be a rule which assigns to each $x$ in $M$ a function $\rho_x$ on $n$ tangent vectors in $T_x(M)$ subject to the rule
$$\rho_x(A\xi_1, \ldots, A\xi_n) = |\det A|\,\rho_x(\xi_1, \ldots, \xi_n), \tag{4.2}$$
where $\xi_i \in T_x(M)$ and $A: T_x(M) \to T_x(M)$ is a linear transformation. Then we see that $\rho$ determines a density by setting
$$\rho_\alpha(\alpha(x)) = \rho_x\left(\frac{\partial}{\partial u^1}(x), \ldots, \frac{\partial}{\partial u^n}(x)\right) \tag{4.3}$$
if $(U, \alpha)$ is a chart with coordinates $u^1, \ldots, u^n$. The fact that (4.3) defines a density follows immediately from (4.2) and the transformation law for the $\partial/\partial u^i$ under change of coordinates. Conversely, given a density $\rho$ in terms of the $\rho_\alpha$, define $\rho_x(\partial/\partial u^1, \ldots, \partial/\partial u^n)$ by (4.3). Since the vectors $\{\partial/\partial u^i\}_{i=1,\ldots,n}$ form a basis at each $x$ in $U$, any $\xi_1, \ldots, \xi_n$ in $T_x(M)$ can be written as
$$\xi_i = B\frac{\partial}{\partial u^i}(x),$$
where $B$ is a linear transformation of $T_x(M)$ into itself. Then (4.2) determines $\rho_x(\xi_1, \ldots, \xi_n)$ as
$$\rho_x(\xi_1, \ldots, \xi_n) = |\det B|\,\rho_\alpha(\alpha(x)). \tag{4.4}$$
That this definition is consistent (i.e., doesn't depend on $\alpha$) follows from (4.2) and the transformation law (3.1) for densities.
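As an illustration of (4.1) (and of Exercise 4.6 below), the following sketch, which is ours, computes the volume density of the induced metric (9.10) on the unit sphere in the chart (9.8) with $r = 1$ and integrates it over the chart; the removed meridian has content zero and does not affect the integral.

```python
import sympy as sp

ph, th = sp.symbols('phi theta', positive=True)
emb = sp.Matrix([sp.cos(ph) * sp.sin(th),
                 sp.sin(ph) * sp.sin(th),
                 sp.cos(th)])                 # the chart (9.8) with r = 1

J = emb.jacobian([ph, th])
g = sp.simplify(J.T * J)                      # induced metric matrix (g_ij)
print(g)                                      # diag(sin(theta)**2, 1)
print(sp.simplify(g.det()))                   # sin(theta)**2

sigma = sp.sin(th)                            # sqrt(det g); sin >= 0 on [0, pi]
print(sp.integrate(sigma, (ph, 0, 2 * sp.pi), (th, 0, sp.pi)))  # 4*pi
```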
EXERCISES

4.1 Let $M = S^1 \times S^1$ be the torus, and let $\varphi: M \to \mathbb{R}^4$ be given by
$$x^1 \circ \varphi(\theta_1, \theta_2) = \cos\theta_1, \qquad x^2 \circ \varphi(\theta_1, \theta_2) = \sin\theta_1,$$
$$x^3 \circ \varphi(\theta_1, \theta_2) = 2\cos\theta_2, \qquad x^4 \circ \varphi(\theta_1, \theta_2) = 2\sin\theta_2,$$
where $x^1, \ldots, x^4$ are the rectangular coordinates on $\mathbb{R}^4$ and $\theta_1, \theta_2$ are angular coordinates on $M$.

a) Express the Riemann metric induced on $M$ by $\varphi$ (from the Euclidean metric on $\mathbb{R}^4$) in terms of the coordinates $\theta_1, \theta_2$. [That is, compute the $g_{ij}(\theta_1, \theta_2)$.]
b) What is the volume of $M$ relative to this Riemann metric?

4.2 Consider the Riemann metric induced on $S^1 \times S^1$ by the immersion $\varphi$ into $\mathbb{R}^3$ given by
$$x \circ \varphi(u, v) = (a - \cos u)\cos v, \qquad y \circ \varphi(u, v) = (a - \cos u)\sin v, \qquad z \circ \varphi(u, v) = \sin u,$$
where $u$ and $v$ are angular coordinates and $a > 2$. What is the total surface area of $S^1 \times S^1$ under this metric?

4.3 Let $\varphi$ map a region $U$ of the $xy$-plane into $\mathbb{R}^3$ by the formula
$$\varphi(x, y) = (x, y, F(x, y)),$$
so that $\varphi(U)$ is the surface $z = F(x, y)$. (See Fig. 10.3.) Show that the area of this surface is given by
$$\iint_U \sqrt{1 + \left(\frac{\partial F}{\partial x}\right)^2 + \left(\frac{\partial F}{\partial y}\right)^2}.$$

4.4 Find the area of the paraboloid $z = x^2 + y^2$ for $x^2 + y^2 \leq 1$.

4.5 Let $U \subset \mathbb{R}^2$, and let $\varphi: U \to \mathbb{R}^3$ be given by $\varphi(u, v) = (x(u, v), y(u, v), z(u, v))$, where $x, y, z$ are rectangular coordinates on $\mathbb{R}^3$. Show that the area of the surface $\varphi(U)$ is given by
$$\iint_U \sqrt{\left(\frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u}\right)^2 + \left(\frac{\partial y}{\partial u}\frac{\partial z}{\partial v} - \frac{\partial y}{\partial v}\frac{\partial z}{\partial u}\right)^2 + \left(\frac{\partial z}{\partial u}\frac{\partial x}{\partial v} - \frac{\partial z}{\partial v}\frac{\partial x}{\partial u}\right)^2}.$$

4.6 Compute the surface area of the unit sphere in $\mathbb{R}^3$.

4.7 Let $M_1$ and $M_2$ be differentiable manifolds, and let $\sigma$ be a density on $M_2$ which is nowhere zero. For each density $\rho$ on $M_1 \times M_2$, each product chart $(U_1 \times U_2, \alpha_1 \times \alpha_2)$, and each $x_2 \in U_2$, define the function $\rho_{1\alpha_1}(\cdot, x_2)$ by
$$\rho_{1\alpha_1}(v_1, x_2)\,\sigma_{\alpha_2}(\alpha_2(x_2)) = \rho_{\alpha_1 \times \alpha_2}(v_1, \alpha_2(x_2)) \qquad \text{for all } v_1 \in \alpha_1(U_1).$$

a) Show that $\rho_{1\alpha_1}(v_1, x_2)$ is independent of the chart $(U_2, \alpha_2)$.
b) Show that for each fixed $x_2 \in M_2$ the functions $\rho_{1\alpha_1}(\cdot, x_2)$ define a density on $M_1$. We shall call this density $\rho_1(x_2)$.
c) Show that if $\rho$ is a smooth density of compact support on $M_1 \times M_2$ and $\sigma$ is smooth, then $\rho_1(x_2)$ is a smooth density of compact support on $M_1$.
d) Let $\rho$ be as in (c). Define the function $F_\rho$ on $M_2$ by
$$F_\rho(x_2) = \int_{M_1} \rho_1(x_2).$$
Sketch how you would prove the fact that $F_\rho$ is a smooth function of compact support on $M_2$ and that
$$\int_{M_1 \times M_2} \rho = \int_{M_2} F_\rho\,\sigma.$$

5. PULLBACK AND LIE DERIVATIVES OF DENSITIES

Let $\varphi: M_1 \to M_2$ be a diffeomorphism, and let $\rho$ be a density on $M_2$. Define the density $\varphi^*\rho$ on $M_1$ by
$$(\varphi^*\rho)_x(\xi_1, \ldots, \xi_n) = \rho_{\varphi(x)}(\varphi_*\xi_1, \ldots, \varphi_*\xi_n) \tag{5.1}$$
for $\xi_i \in T_x(M_1)$ and $\varphi_* = \varphi_{*x}$. To show that $\varphi^*\rho$ is actually a density, we must check that (4.2) holds for any linear transformation $A$ of $T_x(M_1)$. But
$$\varphi^*\rho(A\xi_1, \ldots, A\xi_n) = \rho(\varphi_*A\xi_1, \ldots, \varphi_*A\xi_n) = \rho(\varphi_*A\varphi_*^{-1}\varphi_*\xi_1, \ldots, \varphi_*A\varphi_*^{-1}\varphi_*\xi_n) = |\det\varphi_*A\varphi_*^{-1}|\,\rho(\varphi_*\xi_1, \ldots, \varphi_*\xi_n) = |\det A|\,\varphi^*\rho(\xi_1, \ldots, \xi_n),$$
which is the desired identity.

Let $(U, \alpha)$ and $(W, \beta)$ be compatible charts on $M_1$ and $M_2$ with coordinates $u^1, \ldots, u^n$ and $w^1, \ldots, w^n$, respectively. Then for all points of $U$ we have, by (4.3),
$$(\varphi^*\rho)_\alpha(\alpha(\cdot)) = \rho\left(\varphi_*\frac{\partial}{\partial u^1}, \ldots, \varphi_*\frac{\partial}{\partial u^n}\right) = \left|\det\left(\frac{\partial w^i}{\partial u^j}\right)\right|\rho\left(\frac{\partial}{\partial w^1}, \ldots, \frac{\partial}{\partial w^n}\right) = \left|\det\left(\frac{\partial w^i}{\partial u^j}\right)\right|\rho_\beta(\beta \circ \varphi(\cdot)).$$
In other words, we have
$$(\varphi^*\rho)_\alpha = |\det J_{\beta \circ \varphi \circ \alpha^{-1}}|\,\rho_\beta\bigl(\beta \circ \varphi \circ \alpha^{-1}(\cdot)\bigr).$$
The density $\varphi^*\rho$ is called the pullback of $\rho$ by $\varphi$. It is clear that
$$\varphi^*(\rho_1 + \rho_2) = \varphi^*(\rho_1) + \varphi^*(\rho_2)$$
and that
$$\varphi^*(f\rho) = \varphi^*(f)\,\varphi^*(\rho)$$
for any function $f$. It follows directly from the definition that
$$\operatorname{supp}\varphi^*\rho = \varphi^{-1}[\operatorname{supp}\rho]. \tag{5.2}$$

Proposition 5.1. Let $\varphi: M_1 \to M_2$ be a diffeomorphism, and let $\rho$ be a locally absolutely integrable density with compact support on $M_2$. Then
$$\int \varphi^*\rho = \int \rho. \tag{5.3}$$

Proof. It suffices to prove (5.3) for the case $\operatorname{supp}\rho \subset \varphi(U)$ for some chart $(U, \alpha)$ of $M_1$ with $\varphi(U) \subset W$, where $(W, \beta)$ is a chart of $M_2$. In fact, the set of all such $\varphi(U)$ is an open covering of $M_2$, and we can therefore choose a partition of unity $\{g_j\}$ subordinate to it. If we write $\rho = \sum g_j\rho$, then the sum is finite and each $g_j\rho$ has the desired property. Since both sides of (5.3) are linear, we conclude that it suffices to prove (5.3) for each term. Now if $\operatorname{supp}\rho \subset \varphi(U)$, then $\operatorname{supp}\varphi^*\rho \subset U$,
$$\int \rho = \int_{\beta(W)} \rho_\beta = \int_{\beta \circ \varphi(U)} \rho_\beta,$$
and
$$\int \varphi^*\rho = \int_{\alpha(U)} (\varphi^*\rho)_\alpha = \int_{\alpha(U)} |\det J_{\beta \circ \varphi \circ \alpha^{-1}}|\,\rho_\beta\bigl(\beta \circ \varphi \circ \alpha^{-1}(\cdot)\bigr) = \int_{\beta \circ \varphi(U)} \rho_\beta$$
by the change-of-variables formula in $\mathbb{R}^n$, thus establishing (5.3). $\square$
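Proposition 5.1 is the change-of-variables formula in disguise, and a one-dimensional numerical check is immediate. The sketch below is ours; the diffeomorphism and the rapidly decaying stand-in for a compactly supported density are made-up choices.

```python
import numpy as np
from scipy.integrate import quad

phi = lambda x: x ** 3 + x               # a diffeomorphism of R onto R
dphi = lambda x: 3 * x ** 2 + 1
rho = lambda y: np.exp(-y ** 2)          # rapidly decaying stand-in for a
                                         # compactly supported rho_id

# (phi* rho)_id(x) = |phi'(x)| rho_id(phi(x)), the chart formula above.
lhs = quad(lambda x: abs(dphi(x)) * rho(phi(x)), -10, 10)[0]   # int phi* rho
rhs = quad(rho, -40, 40)[0]                                    # int rho
print(lhs, rhs)                          # both ~ sqrt(pi), as (5.3) asserts
```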
Now let $\varphi_t$ be a one-parameter group on $M$ with infinitesimal generator $X$. Let $\rho$ be a density on $M$, let $(U, \alpha)$ be a chart, and let $W$ be an open subset of $U$ such that $\varphi_t(W) \subset U$ for all $|t| < \varepsilon$. Then
$$(\varphi_t^*\rho)_\alpha(v) = \rho_\alpha(\Phi_\alpha(v, t))\left|\det\left(\frac{\partial\Phi_\alpha}{\partial v}\right)_{(v,t)}\right| \qquad \text{for } v \in \alpha(W),$$
where $\Phi_\alpha(v, t) = \alpha \circ \varphi_t \circ \alpha^{-1}(v)$ and $(\partial\Phi_\alpha/\partial v)_{(v,t)}$ is the Jacobian of $v \mapsto \Phi_\alpha(v, t)$. We would like to compute the derivative of this expression with respect to $t$ at $t = 0$. Now $\Phi_\alpha(v, 0) = v$, and so
$$\det\left(\frac{\partial\Phi_\alpha}{\partial v}\right)_{(v,0)} = 1.$$
Consequently, we can conclude that
$$\det\left(\frac{\partial\Phi_\alpha}{\partial v}\right)_{(v,t)} > 0$$
for $t$ close to zero. We can therefore omit the absolute-value sign and write
$$\frac{d(\varphi_t^*\rho)_\alpha}{dt}\bigg|_{t=0} = \frac{d\rho_\alpha(\Phi_\alpha)}{dt}\bigg|_{t=0} + \rho_\alpha(v)\,\frac{d}{dt}\left(\det\frac{\partial\Phi_\alpha}{\partial v}\right)\bigg|_{t=0}.$$
We evaluate the first derivative on the right by the chain rule, and get
$$\frac{d\rho_\alpha(\Phi_\alpha)}{dt}\bigg|_{t=0} = d\rho_\alpha(X_\alpha(v)).$$
In terms of coordinates $x^1, \ldots, x^n$, we can write
$$d\rho_\alpha(X_\alpha(v)) = \sum_i \frac{\partial\rho_\alpha}{\partial x^i}X_\alpha^i \qquad \text{if } X_\alpha = \langle X_\alpha^1, \ldots, X_\alpha^n \rangle.$$
To evaluate the second term on the right, we need to make a preliminary observation. Let $A(t) = (a_{ij}(t))$ be a differentiable matrix-valued function of $t$ with $A(0) = \mathrm{id} = (\delta_i^j)$. Then
$$\frac{d(\det A(t))}{dt}\bigg|_{t=0} = \lim_{t \to 0}\frac{1}{t}\bigl(\det A(t) - 1\bigr).$$
Now $a_{ii}(0) = 1$ and $a_{ij}(0) = 0$ $(i \neq j)$. To say that $A$ is differentiable means that each of the functions $a_{ij}(t)$ is differentiable. We can therefore find a constant $K$ such that $|a_{ij}(t)| \leq K|t|$ $(i \neq j)$ and $|a_{ii}(t) - 1| \leq K|t|$. In the expansion of $\det A(t)$, the only term which does not vanish at least as fast as $t^2$ is the diagonal product $a_{11}(t)\cdots a_{nn}(t)$. In fact, any other term in $\sum \pm a_{1i_1}(t)\cdots a_{ni_n}(t)$ involves at least two off-diagonal factors and thus vanishes at least as fast as $t^2$. Thus
$$\frac{d(\det A(t))}{dt}\bigg|_{t=0} = \lim_{t \to 0}\frac{1}{t}\bigl(a_{11}(t)\cdots a_{nn}(t) - 1\bigr) = a_{11}'(0) + \cdots + a_{nn}'(0) = \operatorname{tr} A'(0).$$
If we take $A = \partial\Phi_\alpha/\partial v$, we conclude that
$$\frac{d}{dt}\left(\det\frac{\partial\Phi_\alpha}{\partial v}\right)\bigg|_{t=0} = \operatorname{tr}\frac{\partial X_\alpha}{\partial v} = \sum_i \frac{\partial X_\alpha^i}{\partial x^i}.$$
Thus
$$\frac{d(\varphi_t^*\rho)_\alpha}{dt}\bigg|_{t=0} = \sum_i \frac{\partial\rho_\alpha}{\partial x^i}X_\alpha^i + \rho_\alpha\sum_i \frac{\partial X_\alpha^i}{\partial x^i} = \sum_i \frac{\partial(X_\alpha^i\rho_\alpha)}{\partial x^i}.$$
We repeat:

Proposition 5.2. Let $\varphi_t$ be a one-parameter group of diffeomorphisms of $M$ with infinitesimal generator $X$, and let $\rho$ be a differentiable density on $M$. Then
$$D_X\rho = \lim_{t \to 0}\frac{\varphi_t^*\rho - \rho}{t}$$
exists and is given locally by
$$(D_X\rho)_\alpha = \sum_i \frac{\partial(X_\alpha^i\rho_\alpha)}{\partial x^i}$$
if $X_\alpha = \langle X_\alpha^1, \ldots, X_\alpha^n \rangle$ on the chart $(U, \alpha)$.
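Proposition 5.2 can be verified numerically by comparing the difference quotient with the local formula. In the sketch below (ours), the field is linear, so $\varphi_t$ is the matrix exponential and the pullback is computed exactly; the matrix, base point, and step sizes are our choices.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-0.5, 0.3]])      # X(v) = Av, so phi_t = expm(tA)
rho = lambda v: np.exp(-(v[0] ** 2 + v[1] ** 2))

def pullback_rho(v, t):
    # (phi_t* rho)(v) = rho(phi_t(v)) |det d(phi_t)_v|, as in this section.
    E = expm(t * A)
    return rho(E @ v) * abs(np.linalg.det(E))

def div_X_rho(v, h=1e-5):
    # sum_i d(X^i rho)/dx^i by central differences.
    total = 0.0
    for i in range(2):
        e = np.zeros(2); e[i] = h
        Xr = lambda w, i=i: (A @ w)[i] * rho(w)
        total += (Xr(v + e) - Xr(v - e)) / (2 * h)
    return total

v = np.array([0.4, -0.2])
t = 1e-6
print((pullback_rho(v, t) - rho(v)) / t)     # difference quotient at small t
print(div_X_rho(v))                          # agrees to about 1e-5
```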
The density $D_X\rho$ is sometimes called the divergence of $\langle X, \rho\rangle$ and is denoted by $\operatorname{div}\langle X, \rho\rangle$. Thus $\operatorname{div}\langle X, \rho\rangle = D_X\rho$ is the density given by
\[ (\operatorname{div}\langle X, \rho\rangle)_\alpha = \sum_i \frac{\partial(X_\alpha^i\rho_\alpha)}{\partial x^i} \quad \text{on } (U, \alpha). \]
Now let $\rho$ be a differentiable density, and let $A$ be a compact contented set. Then
\[ \int_{\varphi_t(A)}\rho = \int_M e_{\varphi_t(A)}\rho = \int_M \varphi_t^*(e_{\varphi_t(A)}\rho) = \int_M (\varphi_t^*e_{\varphi_t(A)})(\varphi_t^*\rho) = \int_M e_A\,\varphi_t^*(\rho) = \int_A \varphi_t^*\rho. \]
Thus
\[ \frac{1}{t}\Bigl(\int_{\varphi_t(A)}\rho - \int_A\rho\Bigr) = \int_A \frac{1}{t}(\varphi_t^*\rho - \rho). \]
Using a partition of unity, we can easily see that the limit under the integral sign is uniform, and we thus have the formula
\[ \frac{d}{dt}\Bigl(\int_{\varphi_t(A)}\rho\Bigr)\Bigr|_{t=0} = \int_A D_X\rho = \int_A \operatorname{div}\langle X, \rho\rangle. \]

6. THE DIVERGENCE THEOREM

Let $\varphi$ be a flow on a differentiable manifold $M$ with infinitesimal generator $X$. Let $\rho$ be a density belonging to $P$, and let $A$ be a contented subset of $M$. Then for small values of $t$, we would expect the difference $\int_{\varphi_t(A)}\rho - \int_A\rho$ to depend only on what is happening near the boundary of $A$ (Fig. 10.4). In the limit, we would expect the derivative of $\int_{\varphi_t(A)}\rho$ at $t = 0$ (which is given by $\int_A \operatorname{div}\langle X, \rho\rangle$) to be given by some integral over $\partial A$. In order to formulate such a result, we must first single out a class of sets whose boundaries are sufficiently nice to allow us to integrate over them. We therefore make the following definition:

Definition. Let $M$ be a differentiable manifold, and let $D$ be a subset of $M$. We say that $D$ is a domain with regular boundary if for every $x \in M$ there is a chart $(U, \alpha)$ about $x$, with coordinates $x^1, \ldots, x^n$, such that one of the following three possibilities holds:
i) $U \cap D = \emptyset$;
ii) $U \subset D$;
iii) $\alpha(U \cap D) = \alpha(U) \cap \{v = \langle v^1, \ldots, v^n\rangle \in \mathbb{R}^n : v^n \ge 0\}$.
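To illustrate the definition (this example is ours, not the text's), the closed unit disk $D = \{p \in \mathbb{R}^2 : \|p\| \le 1\}$ is a domain with regular boundary:

\[
\begin{aligned}
&x \notin D: && \text{any small chart whose domain misses } D \text{ satisfies (i);}\\
&x \in \operatorname{int} D: && \text{any small chart whose domain lies in the open disk satisfies (ii);}\\
&x \in \partial D: && \alpha(p) = \bigl(\theta(p),\ 1 - r(p)\bigr), \text{ on a neighborhood avoiding one radial slit, satisfies (iii),}
\end{aligned}
\]

since $r(p) \le 1$ exactly when the second coordinate $v^2 = 1 - r$ is nonnegative, so $\alpha(U \cap D) = \alpha(U) \cap \{v^2 \ge 0\}$.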
Note that if $x \notin \bar D$, we can always find a chart $(U, \alpha)$ about $x$ such that (i) holds. If $x \in \operatorname{int} D$, we can always find a chart $(U, \alpha)$ about $x$ such that (ii) holds. This imposes no restrictions on $D$. The crucial condition is imposed when $x \in \partial D$. Then we cannot find charts about $x$ satisfying (i) or (ii). In this case, (iii) implies that $\alpha(U \cap \partial D)$ is an open subset of $\mathbb{R}^{n-1}$ (Fig. 10.5). In fact,
\[ \alpha(U \cap \partial D) = \{v \in \alpha(U) : v^n = 0\} = \alpha(U) \cap \mathbb{R}^{n-1}, \]
where we regard $\mathbb{R}^{n-1}$ as the subspace of $\mathbb{R}^n$ consisting of those vectors with last component zero.

Let $\mathcal{A}$ be an atlas of $M$ such that each chart of $\mathcal{A}$ satisfies either (i), (ii), or (iii). For each $(U, \alpha) \in \mathcal{A}$ consider the map
\[ \alpha{\restriction}\partial D\colon U \cap \partial D \to \mathbb{R}^{n-1} \subset \mathbb{R}^n. \]
[Of course, the maps $\alpha{\restriction}\partial D$ will have a nonempty domain of definition only for charts of type (iii).] We claim that $\{(U \cap \partial D,\ \alpha{\restriction}\partial D)\}$ is an atlas on $\partial D$. In fact, let $(U, \alpha)$ and $(W, \beta)$ be two charts in $\mathcal{A}$ such that $U \cap W \cap \partial D \neq \emptyset$. Let $x^1, \ldots, x^n$ be the coordinates of $(U, \alpha)$, and let $y^1, \ldots, y^n$ be those of $(W, \beta)$. The map $\beta\circ\alpha^{-1}$ is given by
\[ y^i = y^i(x^1, \ldots, x^n). \]
On $\alpha(U \cap W \cap \partial D)$, we have $x^n = 0$ and $y^n = 0$. In particular, $y^n(x^1, \ldots, x^{n-1}, 0) \equiv 0$, and the functions $y^1(x^1, \ldots, x^{n-1}, 0), \ldots, y^{n-1}(x^1, \ldots, x^{n-1}, 0)$ are differentiable. This shows that $(\beta{\restriction}\partial D)\circ(\alpha{\restriction}\partial D)^{-1}$ is differentiable on $\alpha(U \cap \partial D)$. We thus get a manifold structure on $\partial D$.

It is easy to see that this manifold structure is independent of the particular atlas of $M$ that was chosen. We shall denote by $\iota$ the map of $\partial D \to M$ which sends each $x \in \partial D$, regarded as an element of $M$, into itself. It is clear that $\iota$ is a differentiable map. [In fact, $(U \cap \partial D,\ \alpha{\restriction}\partial D)$ and $(U, \alpha)$ are compatible charts in terms of which $\alpha\circ\iota\circ(\alpha{\restriction}\partial D)^{-1}$ is just the inclusion map of $\mathbb{R}^{n-1} \to \mathbb{R}^n$.]

Let $x$ be a point of $\partial D$ regarded as a point of $M$, and let $\xi$ be an element of $T_x(M)$. We say that $\xi$ points into $D$ if for every curve $C$ with $C'(0) = \xi$ we have $C(t) \in D$ for sufficiently small positive $t$ (Fig. 10.6).
In terms of a chart $(U, \alpha)$ of type (iii), let $\xi_\alpha = \langle \xi^1, \ldots, \xi^n\rangle$. Then it is clear that $\xi$ points into $D$ if and only if $\xi^n > 0$. Similarly, a tangent vector $\xi$ points out of $D$ (obvious definition) if and only if $\xi^n < 0$. If $\xi^n = 0$, then $\xi$ is tangent to the boundary; it lies in $\iota_*T_x(\partial D)$.

Let $\rho$ be a density on $M$ and $X$ a vector field on $M$. Define the density $\rho_X$ on $\partial D$ by
\[ \rho_X(\xi_1, \ldots, \xi_{n-1}) = \rho(\iota_*\xi_1, \ldots, \iota_*\xi_{n-1}, X) \quad \text{for } \xi_i \in T_x(\partial D). \tag{6.1} \]
It is easy to check that (6.1) defines a density. (This is left as an exercise for the reader.) If $(U, \alpha)$ is a chart of type (iii) about $x$ and $X_\alpha = \langle X^1, \ldots, X^n\rangle$, then applying (4.3) to the chart $(U \cap \partial D,\ \alpha{\restriction}\partial D)$ and the density $\rho_X$, we see that
\[ (\rho_X)_{\alpha\restriction\partial D} = \rho\Bigl(\frac{\partial}{\partial x^1}, \ldots, \frac{\partial}{\partial x^{n-1}}, X\Bigr). \]
Let $A$ be the linear transformation of $T_x(M)$ given by
\[ A\frac{\partial}{\partial x^1} = \frac{\partial}{\partial x^1}, \quad \ldots, \quad A\frac{\partial}{\partial x^{n-1}} = \frac{\partial}{\partial x^{n-1}}, \qquad A\frac{\partial}{\partial x^n} = X. \]
The matrix of $A$ is
\[ \begin{pmatrix} 1 & 0 & \cdots & 0 & X^1 \\ 0 & 1 & \cdots & 0 & X^2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & X^{n-1} \\ 0 & 0 & \cdots & 0 & X^n \end{pmatrix}, \]
and therefore $|\det A| = |X^n|$. Thus we have
\[ (\rho_X)_{\alpha\restriction\partial D} = |X^n|\,\rho_\alpha \quad \text{at all points of } \alpha(U \cap \partial D). \tag{6.2} \]
We can now state our results.

Theorem 6.1 (The divergence theorem).† Let $D$ be a domain with regular boundary, let $\rho \in P$, and let $X$ be a smooth vector field on $M$. Define the function $\epsilon_X$ on $\partial D$ by
\[ \epsilon_X(x) = \begin{cases} 1 & \text{if } X(x) \text{ points out of } D, \\ 0 & \text{if } X(x) \text{ is tangent to } \partial D, \\ -1 & \text{if } X(x) \text{ points into } D. \end{cases} \]
Then
\[ \int_D \operatorname{div}\langle X, \rho\rangle = \int_{\partial D} \epsilon_X\rho_X. \tag{6.3} \]

Remark. In terms of a chart of type (iii), the function $\epsilon_X$ is given by
\[ \epsilon_X = -\operatorname{sgn} X^n. \tag{6.4} \]

† This formulation and proof of the divergence theorem was suggested to us by Richard Rasala.
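Before the proof it may help to see what (6.3) says in the simplest possible situation; the following is our illustration, not part of the text. Take $M = \mathbb{R}^n$ with $\rho$ the Lebesgue density, $D = \{x^n \ge 0\}$ (so the identity chart is of type (iii)), and let $X$ be a compactly supported smooth vector field. Then (6.3) reads

\[
\int_{x^n \ge 0} \sum_{i=1}^n \frac{\partial X^i}{\partial x^i}\; dx^1 \cdots dx^n
= -\int_{\mathbb{R}^{n-1}} X^n(x^1, \ldots, x^{n-1}, 0)\; dx^1 \cdots dx^{n-1},
\]

since $\rho_X = |X^n|\,dx^1 \cdots dx^{n-1}$ on the boundary by (6.2) and $\epsilon_X = -\operatorname{sgn} X^n$ by (6.4). Each term with $i < n$ integrates to zero by compact support, while for $i = n$ the inner integral $\int_0^\infty \partial X^n/\partial x^n\, dx^n$ gives exactly $-X^n(\cdot, 0)$.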
Proof. Let $\mathcal{A}$ be an atlas of $M$ each of whose charts is one of the three types. Let $\{g_i\}$ be a partition of unity subordinate to $\mathcal{A}$. Write $\rho = \sum g_i\rho$. This is a finite sum. Since both sides of (6.3) are linear functions of $\rho$, it suffices to verify (6.3) for each of the summands $g_i\rho$. Changing our notation (replacing $g_i\rho$ by $\rho$), we reduce the problem to proving (6.3) under the additional assumption $\operatorname{supp}\rho \subset U$, where $(U, \alpha)$ is a chart of type (i), (ii), or (iii). There are therefore three cases to consider.

CASE I. $\operatorname{supp}\rho \subset U$ and $U \cap \bar D = \emptyset$. (See Fig. 10.7.) Then both sides of (6.3) vanish, and so (6.3) is correct.

CASE II. $\operatorname{supp}\rho \subset U$ with $U \subset \operatorname{int} D$. (See Fig. 10.8.) Then the right-hand side of (6.3) vanishes. We must show that the left-hand side does also. But
\[ \int_D \operatorname{div}\langle X, \rho\rangle = \int_U \operatorname{div}\langle X, \rho\rangle = \int_{\alpha(U)} \sum_i \frac{\partial(X^i\rho_\alpha)}{\partial x^i}. \]
Now each of the functions $X^i\rho_\alpha$ has its support lying inside $\alpha(U)$. Choose some large $R$ so that $\alpha(U)$ lies inside the cube $\square_R = \{v : |v^i| < R\}$. We can then replace $\int_{\alpha(U)}$ by $\int_{\square_R}$, extending the domain of definition of $X^i\rho_\alpha$ to all of $\mathbb{R}^n$ by setting it equal to zero outside $\alpha(U)$. (See Fig. 10.9.) Writing the integral as an iterated integral and integrating with respect to $x^i$ first, we see that
\[ \int_{\alpha(U)} \frac{\partial(X^i\rho_\alpha)}{\partial x^i} = \int \bigl[X^i\rho_\alpha(\ldots, R, \ldots) - X^i\rho_\alpha(\ldots, -R, \ldots)\bigr]\; dx^1 \cdots dx^{i-1}\, dx^{i+1} \cdots dx^n = 0. \]
This last integral vanishes, because the function $X^i\rho_\alpha$ vanishes outside $\alpha(U)$.
CASE III. $\operatorname{supp}\rho$ is contained in a chart of type (iii). (See Fig. 10.10.) Then
\[ \int_D \operatorname{div}\langle X, \rho\rangle = \int_{D\cap U} \operatorname{div}\langle X, \rho\rangle = \int_{\alpha(D\cap U)} \sum_i \frac{\partial(X^i\rho_\alpha)}{\partial x^i}. \]
Now $\alpha(U \cap D) = \alpha(U) \cap \{v : v^n \ge 0\}$. We can therefore replace the domain of integration by the rectangle with corners $\langle -R, \ldots, -R, 0\rangle$ and $\langle R, \ldots, R\rangle$. (See Fig. 10.11.) For $1 \le i < n$ all the integrals in the sum vanish as before. For $i = n$ we obtain
\[ \int_D \operatorname{div}\langle X, \rho\rangle = -\int_{\mathbb{R}^{n-1}} X^n\rho_\alpha(\cdot, \ldots, \cdot, 0). \]
If we compare this with (6.2) and (6.4), we see that this is exactly the assertion of (6.3). $\square$

If the manifold $M$ is given a Riemann metric, then we can give an alternative version of the divergence theorem. Let $dV$ be the volume density of the Riemann metric, so that
\[ dV(\xi_1, \ldots, \xi_n) = |\det((\xi_i, \xi_j))|^{1/2}, \qquad \xi_i \in T_x(M), \]
is the volume of the parallelepiped spanned by the $\xi_i$ in the tangent space (with respect to the Euclidean metric given by the scalar product on the tangent space). Now the map $\iota$ is an immersion, and therefore we get an induced Riemann metric on $\partial D$. Let $dS$ be the corresponding volume density on $\partial D$. Thus, if $\{\xi_i\}_{i=1,\ldots,n-1}$ are $n - 1$ vectors in $T_x(\partial D)$, then $dS(\xi_1, \ldots, \xi_{n-1})$ is the $(n-1)$-dimensional volume of the parallelepiped spanned by $\iota_*\xi_1, \ldots, \iota_*\xi_{n-1}$ in $\iota_*T_x(\partial D) \subset T_x(M)$. For any $x \in \partial D$ let $n \in T_x(M)$ be the vector of unit length which is orthogonal to $\iota_*T_x(\partial D)$ and which points out of $D$ (Figs. 10.12 and 10.13).
We clearly have
\[ dS(\xi_1, \ldots, \xi_{n-1}) = dV(\iota_*\xi_1, \ldots, \iota_*\xi_{n-1}, n). \]
For any vector $X(x) \in T_x(M)$ the volume of the parallelepiped spanned by $\xi_1, \ldots, \xi_{n-1}, X(x)$ is
\[ |(X(x), n)|\; dS(\xi_1, \ldots, \xi_{n-1}). \]
[In fact, write $X(x) = (X(x), n)n + m$, where $m \in \iota_*T(\partial D)$.] If we compare this with (6.1), we see that
\[ dV_X = |(X, n)|\, dS. \tag{6.5} \]
Furthermore, it is clear that $\epsilon_X(x) = \operatorname{sgn}(X(x), n)$. Let $\rho$ be any density on $M$. Then we can write $\rho = f\,dV$, where $f$ is a function. Furthermore, we clearly have $\rho_X = f\,dV_X$ and $\operatorname{div}\langle X, \rho\rangle = \operatorname{div}\langle X, f\,dV\rangle$. We can then rewrite (6.3) as
\[ \int_D \operatorname{div}\langle X, f\,dV\rangle = \int_{\partial D} f\cdot(X, n)\, dS. \tag{6.6} \]

7. MORE COMPLICATED DOMAINS

For many purposes, Theorem 6.1 is not quite sufficiently broad. The trouble is that we would like to apply (6.3) to domains whose boundaries are not completely smooth. For instance, we would like to apply it to a rectangle in $\mathbb{R}^n$. Now the boundary of a rectangle is regular at all points except those lying on an edge (i.e., the intersection of two faces). Since the edges form a set "of dimension $n - 2$", we would expect that their presence does not invalidate (6.3). This is in fact the case.

Let $M$ be a differentiable manifold, and let $D$ be a subset of $M$. We say that $D$ is a domain with almost regular boundary if to every $x \in M$ there is a chart $(U, \alpha)$ about $x$, with coordinates $x^1, \ldots, x^n$, such that one of the following four possibilities holds:
i) $U \cap D = \emptyset$;
ii) $U \subset D$;
iii) $\alpha(U \cap D) = \alpha(U) \cap \{v = \langle v^1, \ldots, v^n\rangle \in \mathbb{R}^n : v^n \ge 0\}$;
iv) $\alpha(U \cap D) = \alpha(U) \cap \{v = \langle v^1, \ldots, v^n\rangle \in \mathbb{R}^n : v^k \ge 0, \ldots, v^n \ge 0\}$.

The novel point is that we are now allowing for possibility (iv), where $k < n$. This, of course, is a new possibility only if $n > 1$. Let us assume $n > 1$ and see what (iv) allows. We can write $\alpha(U \cap \partial D)$ as the union of certain open subsets lying in $(n-1)$-dimensional subspaces of $\mathbb{R}^n$, together with a union of portions lying in subspaces of dimension $n - 2$.
In fact, for $k \le p \le n$ let
\[ H_p^k = \{v : v^k > 0, \ldots, v^{p-1} > 0,\ v^p = 0,\ v^{p+1} > 0, \ldots, v^n > 0\}. \]
Thus $H_p^k$ is an open subset of the $(n-1)$-dimensional subspace given by $v^p = 0$. (See Fig. 10.14.) We can write
\[ \alpha(U \cap \partial D) \subset \alpha(U) \cap \{(H_k^k \cup H_{k+1}^k \cup \cdots \cup H_n^k) \cup S\}, \]
where $S$ is the union of the subspaces (of dimension $n - 2$) where at least two of the $v^p$ vanish.

Observe that if $x \in U \cap \partial D$ is such that $\alpha(x) \in H_p^k$ for some $p$, then there is a chart about $x$ of type (iii). In fact, simply renumber the coordinates so that $v^p$ becomes $v^n$; that is, map $\mathbb{R}^n \xrightarrow{\;\lambda\;} \mathbb{R}^n$ by sending $\langle v^1, \ldots, v^n\rangle \to \langle w^1, \ldots, w^n\rangle$, where
\[ w^i = v^i \quad \text{for } i < p, \qquad w^i = v^{i+1} \quad \text{for } p \le i < n, \qquad w^n = v^p. \]
Then in a sufficiently small neighborhood $U_1$ of $x$ the chart $(U_1, \lambda\circ\alpha)$ is of type (iii). (See Fig. 10.15.)
We next observe that the set of $x \in \partial D$ having a neighborhood of type (iii) forms a differentiable manifold. The argument is just as before. The only difference is that this time these points do not exhaust all of $\partial D$. We shall denote this manifold by $\widetilde{\partial D}$. Thus $\widetilde{\partial D}$ is a manifold which, as a set, is not $\partial D$ but only the "regular" points of $\partial D$, that is, those having charts of type (iii).

Theorem 7.1 (The divergence theorem). Let $M$ be an $n$-dimensional manifold, and let $D \subset M$ be a domain with almost regular boundary. Let $\widetilde{\partial D}$ be as above, and let $\tilde\iota$ be the injection of $\widetilde{\partial D} \to M$. Then for any $\rho \in P$ we have
\[ \int_D \operatorname{div}\langle X, \rho\rangle = \int_{\widetilde{\partial D}} \epsilon_X\rho_X. \tag{7.1} \]

Proof. The proof proceeds as before. We choose an atlas of charts of types (i) through (iv) and a partition of unity $\{g_j\}$ subordinate to the atlas. We write $\rho = \sum g_j\rho$ and now have four cases to consider. The first three cases have already been handled. The new case arises when $\rho$ has its support in $U$, where $(U, \alpha)$ is a chart of type (iv). We must evaluate
\[ \int_{\alpha(U\cap D)} \sum_i \frac{\partial(X^i\rho_\alpha)}{\partial x^i}. \]
The terms in the sum corresponding to $i < k$ make no contribution to the integral, as before. Let us extend $X^i\rho_\alpha$ to be defined on all of $\mathbb{R}^n$ by setting it equal to zero outside $\alpha(U)$, just as before. Then, for $k \le i \le n$ we have
\[ \int_{\alpha(U\cap D)} \frac{\partial(X^i\rho_\alpha)}{\partial x^i} = \int_B \frac{\partial(X^i\rho_\alpha)}{\partial x^i}, \qquad \text{where } B = \{v : v^k \ge 0, \ldots, v^n \ge 0\}. \]
Writing this as an iterated integral and integrating first with respect to $x^i$, we obtain
\[ \int_B \frac{\partial(X^i\rho_\alpha)}{\partial x^i} = -\int_{A_i} X^i\rho_\alpha, \]
where the set $A_i \subset \mathbb{R}^{n-1}$ is given by
\[ A_i = \{\langle v^1, \ldots, v^{i-1}, v^{i+1}, \ldots, v^n\rangle : v^k \ge 0, \ldots, v^n \ge 0\}. \]
Note that $A_i$ differs from $H_i^k$ by a set of content zero in $\mathbb{R}^{n-1}$ (namely, the set where at least one of the $v^l$, $k \le l \le n$, $l \neq i$, vanishes). Thus we can replace the $A_i$ by the $H_i^k$ in the integral. Summing over $k \le i \le n$, we get
\[ \int_D \operatorname{div}\langle X, \rho\rangle = -\sum_{i=k}^n \int_{H_i^k} X^i\rho_\alpha = \int_{\widetilde{\partial D}} \epsilon_X\rho_X, \]
which is exactly the assertion of Theorem 7.1 for case (iv). $\square$
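For a concrete picture (our illustration, not the text's), take $D = [0,1]^2 \subset \mathbb{R}^2$. Near the corner $\langle 0, 0\rangle$ the identity chart is of type (iv) with $k = 1$; the regular boundary pieces are $H_1^1 = \{v^1 = 0,\ v^2 > 0\}$ and $H_2^1 = \{v^1 > 0,\ v^2 = 0\}$, the two open edges. With $\rho$ the Lebesgue density, (7.1) becomes the familiar identity

\[
\int_{[0,1]^2}\Bigl(\frac{\partial X^1}{\partial x} + \frac{\partial X^2}{\partial y}\Bigr)dx\,dy
= \int_0^1 \bigl[X^1(1, y) - X^1(0, y)\bigr]\,dy + \int_0^1 \bigl[X^2(x, 1) - X^2(x, 0)\bigr]\,dx,
\]

the four corners (a set of content zero in the edges) contributing nothing, exactly as in the proof above.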
We should point out that even Theorem 7.1 does not cover all cases for which it is useful to have a divergence theorem. For instance, in the plane, Theorem 7.1 does apply to the case where $D$ is a triangle. (See Fig. 10.16.) This is because we can "stretch" each angle to a right angle (in fact, we can do this by a linear change of variables of $\mathbb{R}^2$). (See Fig. 10.17.) However, Theorem 7.1 does not apply to a quadrilateral such as the one in Fig. 10.18, since there is no $C^1$-transformation that will convert an angle greater than $\pi$ into one smaller than $\pi$ (its Jacobian at the corner must carry lines into lines). Thus Theorem 7.1 doesn't apply directly. However, we can write the quadrilateral as the union of two triangles, apply Theorem 7.1 to each triangle, and note that the contributions of the two triangles coming from the common boundary cancel each other out. Thus the divergence theorem does apply to our quadrilateral.

This procedure works in a quite general context. In fact, it works for all cases where we shall need the divergence theorem in this book, whether Theorem 7.1 applies directly or we can reduce to it by a finite subdivision of our domain, followed by a limiting argument. We shall not, however, formulate a general theorem covering all such cases; it is clear in each instance how to proceed.

EXERCISES

In Euclidean space we shall write $\operatorname{div} X$ instead of $\operatorname{div}\langle X, \rho\rangle$ when $\rho$ is taken to be the Euclidean volume density.

7.1 Let $x, y, z$ be rectangular coordinates on $\mathbb{E}^3$. Let the vector field $X$ be given by
\[ X = r^2\Bigl(x\frac{\partial}{\partial x} + y\frac{\partial}{\partial y} + z\frac{\partial}{\partial z}\Bigr), \]
where $r^2 = x^2 + y^2 + z^2$. Show directly that
\[ \int_S (X, n)\, dA = \int_B \operatorname{div} X \]
by integrating both sides. Here $B$ is a ball centered at the origin and $S$ is its boundary. (A symbolic check of this computation is sketched after Exercise 7.5 below.)
7.2 Let the vector field $Y$ be given by $Y = Y_r n_r + Y_\theta n_\theta + Y_\varphi n_\varphi$ in terms of polar "coordinates" $r, \theta, \varphi$ on $\mathbb{E}^3$, where $n_r$, $n_\theta$, and $n_\varphi$ are the unit vectors in the directions $\partial/\partial r$, $\partial/\partial\theta$, and $\partial/\partial\varphi$, respectively. Show that
\[ \operatorname{div} Y = \frac{1}{r^2\sin\varphi}\Bigl\{\frac{\partial}{\partial r}(r^2\sin\varphi\, Y_r) + \frac{\partial}{\partial\theta}(r\,Y_\theta) + \frac{\partial}{\partial\varphi}(r\sin\varphi\, Y_\varphi)\Bigr\}. \]

7.3 Compute the divergence of a vector field in terms of polar coordinates in the plane.

7.4 Compute the divergence of a vector field in terms of cylindrical coordinates in $\mathbb{E}^3$.

7.5 Let $q$ be the volume (area) density on the unit sphere $S^2$. Compute $\operatorname{div}\langle X, q\rangle$ in terms of the coordinates $\theta, \varphi$ (polar coordinates) on the sphere.
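The following sympy sketch (ours, not the book's; the radius symbol $R$ is our assumption) checks Exercise 7.1: it shows $\operatorname{div} X = 5r^2$ and that both sides of the identity equal $4\pi R^5$ for the ball of radius $R$. On the sphere $r = R$ the field is $R^3$ times the outward unit normal, so $(X, n) = R^3$ there.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
r2 = x**2 + y**2 + z**2
X = [r2 * x, r2 * y, r2 * z]  # the field of Exercise 7.1

# div X = sum_i d(X^i)/dx^i; it comes out to 5 r^2.
div = sum(sp.diff(Xi, v) for Xi, v in zip(X, (x, y, z)))
print(sp.factor(div))  # 5*(x**2 + y**2 + z**2)

# Volume side, in spherical coordinates: integrand 5 rho^2 * (rho^2 sin phi).
rho, phi, theta, R = sp.symbols('rho phi theta R', positive=True)
vol = sp.integrate(5 * rho**4 * sp.sin(phi),
                   (rho, 0, R), (phi, 0, sp.pi), (theta, 0, 2 * sp.pi))
# Surface side: (X, n) = R^3 on S, so the flux is R^3 * area(S).
flux = R**3 * 4 * sp.pi * R**2
assert sp.simplify(vol - flux) == 0  # both equal 4*pi*R**5
```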
CHAPTER 11

EXTERIOR CALCULUS

Let $M$ be a differentiable manifold and let $\omega$ be a linear differential form on $M$. For any differentiable curve $C\colon [a, b] \to M$ we can consider the integral
\[ \int_a^b \langle C'(t), \omega_{C(t)}\rangle\, dt. \]
Let $[c, d] \to [a, b]$ be a differentiable map given by $s \to t(s)$. The curve $B\colon [c, d] \to M$ given by $B(s) = C(t(s))$ satisfies $B'(s) = t'(s)C'(t(s))$. Thus if $t'(s) > 0$ for all $s$,
\[ \int_c^d \langle B'(s), \omega_{B(s)}\rangle\, ds = \int_a^b \langle C'(t), \omega_{C(t)}\rangle\, dt. \]
Thus a linear differential form is something we can integrate over "oriented" curves of $M$, independently of the parametrization. In this chapter we shall introduce objects which can be integrated over "oriented $k$-dimensional surfaces" of $M$ and study their properties.

1. EXTERIOR DIFFERENTIAL FORMS

We defined a linear differential form to be a rule which assigns an element of $T_x^*(M)$ to each $x \in M$. We can regard $T_x^*(M)$ as $\Lambda^1(T_x(M))$. In view of this, we make the following generalization of this definition. By an exterior differential form of degree $q$ on $M$ we mean a rule which assigns an element of $\Lambda^q(T_x(M))$ to each $x \in M$. If $\omega$ is an exterior form of degree $q$ and $(U, \alpha)$ is a chart, then, since $\alpha$ identifies each $T_x(M)$ with $V$ for $x \in U$, we obtain a $\Lambda^q(V)$-valued function $\omega_\alpha$ on $\alpha(U)$ defined by
\[ \omega_\alpha(v)(\xi_\alpha^1, \ldots, \xi_\alpha^q) = \omega(x)(\xi_1, \ldots, \xi_q) \]
if $v = \alpha(x)$ and $\xi_1, \ldots, \xi_q \in T_x(M)$. It is easy to write down the transition laws. In fact, if $(W, \beta)$ is a second chart, we have
\[ \omega_\beta(\beta(x))(\xi_\beta^1, \ldots, \xi_\beta^q) = \omega(x)(\xi_1, \ldots, \xi_q) = \omega_\alpha(\alpha(x))(\xi_\alpha^1, \ldots, \xi_\alpha^q), \]
or, since $\xi_\beta = J_{\beta\circ\alpha^{-1}}(\alpha(x))(\xi_\alpha)$ for $\xi \in T_x(M)$, we see that
\[ \omega_\alpha(v)(\xi_\alpha^1, \ldots, \xi_\alpha^q) = \omega_\beta(\beta\circ\alpha^{-1}(v))\bigl(J_{\beta\circ\alpha^{-1}}(v)\xi_\alpha^1, \ldots, J_{\beta\circ\alpha^{-1}}(v)\xi_\alpha^q\bigr). \tag{1.1} \]
In order to write (1.1) in a less cumbersome form, we introduce the following notation. Let $V_1$ and $V_2$ be vector spaces, and let $l\colon V_1 \to V_2$ be a linear map.
We define $\Lambda^p(l)$ to be the linear map of $\Lambda^p(V_2) \to \Lambda^p(V_1)$ given by
\[ \Lambda^p(l)(\omega)(v_1, \ldots, v_p) = \omega(l(v_1), \ldots, l(v_p)) \]
for all $\omega \in \Lambda^p(V_2)$ and $v_1, \ldots, v_p \in V_1$. Note that under the identification of $\Lambda^1(V)$ with $V^*$ the map $\Lambda^1(l)$ coincides with the map $l^*\colon V_2^* \to V_1^*$. Note also that if $\omega_1 \in \Lambda^p(V_2)$ and $\omega_2 \in \Lambda^q(V_2)$, then
\[ \Lambda^p(l)\omega_1 \wedge \Lambda^q(l)\omega_2 = \Lambda^{p+q}(l)(\omega_1 \wedge \omega_2). \tag{1.2} \]
This follows directly from the definitions. Also, if $l_1\colon V_1 \to V_2$ and $l_2\colon V_2 \to V_3$, then
\[ \Lambda^p(l_2\circ l_1) = \Lambda^p(l_1)\circ\Lambda^p(l_2). \tag{1.3} \]
It is clear that if $l$ depends differentiably on some parameters, then so does $\Lambda^p(l)$ for any $p$. We can now write (1.1) as
\[ \omega_\alpha(v) = \Lambda^q(J_{\beta\circ\alpha^{-1}}(v))\bigl(\omega_\beta(\beta\circ\alpha^{-1}(v))\bigr). \tag{1.1'} \]
It is clear from (1.1') that it is consistent to require that $\omega_\alpha$ be a smooth function. We therefore say that $\omega$ is a smooth differential form if all the functions $\omega_\alpha$ are $C^\infty$ on $\alpha(U)$ for all charts $(U, \alpha)$. As usual, it suffices to verify this for all charts in an atlas. We let $\Lambda^q(M)$ denote the space of all smooth exterior forms of degree $q$.

Let $\omega_1 \in \Lambda^p(M)$ and $\omega_2 \in \Lambda^q(M)$. We define the exterior $(p+q)$-form $\omega_1 \wedge \omega_2$ by
\[ (\omega_1 \wedge \omega_2)(x) = \omega_1(x) \wedge \omega_2(x) \]
for all $x \in M$. It is easy to check that $\omega_1 \wedge \omega_2$ is a smooth $(p+q)$-form. We thus get a multiplication on exterior forms. To make the formalism complete, it is convenient to denote the space of differentiable functions on $M$ by $\Lambda^0(M)$ and to denote the product of a function $f$ and a $p$-form $\omega$ by $f\omega$ or $f \wedge \omega$. This product is given by
\[ (f \wedge \omega)(x) = (f\omega)(x) = f(x)\omega(x) \]
for all $x \in M$. We have thus defined, for all $0 \le p \le n$ and $0 \le q \le n$, a multiplication sending $\omega_1 \in \Lambda^p(M)$ and $\omega_2 \in \Lambda^q(M)$ into $\omega_1 \wedge \omega_2 \in \Lambda^{p+q}(M)$ (where $\omega_1 \wedge \omega_2 \equiv 0$ if $p + q > n = \dim M$). The rules for the $\wedge$-product on antisymmetric tensors carry over, and thus, for instance,
\[ \omega_1 \wedge (\omega_2 \wedge \omega_3) = (\omega_1 \wedge \omega_2) \wedge \omega_3, \qquad \omega_1 \wedge (\omega_2 + \omega_3) = \omega_1 \wedge \omega_2 + \omega_1 \wedge \omega_3, \]
and so on.

Let $M_1$ and $M_2$ be differentiable manifolds, and let $\varphi\colon M_1 \to M_2$ be a differentiable map. For each $\omega \in \Lambda^q(M_2)$ we define the form $\varphi^*\omega \in \Lambda^q(M_1)$ by
\[ \varphi^*\omega(x) = \Lambda^q(\varphi_{*x})\bigl(\omega(\varphi(x))\bigr). \tag{1.4} \]
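A special case is worth recording here (our remark; it is the germ of formula (2.1) below): if $l\colon V \to V$ with $\dim V = n$, then $\Lambda^n(V)$ is one-dimensional and

\[
\Lambda^n(l)\,\omega = (\det l)\,\omega \qquad \text{for every } \omega \in \Lambda^n(V),
\]

since $(v_1, \ldots, v_n) \mapsto \omega(l(v_1), \ldots, l(v_n))$ is an alternating $n$-linear function and therefore a multiple of $\omega$; evaluating both sides on a basis identifies the multiple as $\det l$.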
It is easy to check that $\varphi^*\omega$ is indeed an element of $\Lambda^q(M_1)$, that is, it is a smooth $q$-form. Note also that (7.5) of Chapter 9 is a special case of (1.4), namely the case $q = 1$. (If we make the convention that $\Lambda^0(l) = \mathrm{id}$, then the case $q = 0$ of (1.4) is the rule for pullback of functions.) It follows from (1.4) that $\varphi^*$ is linear, that is,
\[ \varphi^*(\omega_1 + \omega_2) = \varphi^*\omega_1 + \varphi^*\omega_2, \tag{1.5} \]
and from (1.2) that
\[ \varphi^*(\omega_1 \wedge \omega_2) = \varphi^*\omega_1 \wedge \varphi^*\omega_2. \tag{1.6} \]
If $\varphi_t$ is a one-parameter group on a manifold $M$ with infinitesimal generator $X$, then we can show that the limit
\[ D_X\omega = \lim_{t\to 0}\frac{\varphi_t^*\omega - \omega}{t} \]
exists for any $\omega \in \Lambda^q(M)$. The proof of the existence of this limit is straightforward and will be omitted. We shall derive a useful formula allowing a simple calculation of $D_X\omega$ in Section 3.

Let us now see how to compute with the $\Lambda^q(M)$ in terms of local coordinates. Let $(U, \alpha)$ be a chart of $M$ with coordinates $x^1, \ldots, x^n$. Then $dx^i \in \Lambda^1(U)$ (where by $\Lambda^q(U)$ we mean the set of differentiable $q$-forms defined on $U$). For any $i_1, \ldots, i_q$ the form $dx^{i_1} \wedge \cdots \wedge dx^{i_q}$ belongs to $\Lambda^q(U)$, and for every $x \in U$ the forms
\[ \{(dx^{i_1} \wedge \cdots \wedge dx^{i_q})(x)\}_{i_1 < \cdots < i_q} \]
form a basis for $\Lambda^q(T_x(M))$. From this it follows that every exterior form $\omega$ of degree $q$ which is defined on $U$ can be written as
\[ \omega = \sum_{i_1 < \cdots < i_q} a_{i_1,\ldots,i_q}\, dx^{i_1} \wedge \cdots \wedge dx^{i_q}, \tag{1.7} \]
where the $a$'s are functions; that is,
\[ \omega(x) = \sum_{i_1 < \cdots < i_q} a_{i_1,\ldots,i_q}(x)\,(dx^{i_1} \wedge \cdots \wedge dx^{i_q})(x) \]
for all $x \in U$. It is easy to see that $\omega \in \Lambda^q(U)$ if and only if all the functions $a_{i_1,\ldots,i_q}$ are $C^\infty$-functions on $U$. If $(W, \beta)$ is a second chart with coordinates $y^1, \ldots, y^n$ and
\[ \omega = \sum_{j_1 < \cdots < j_q} b_{j_1,\ldots,j_q}\, dy^{j_1} \wedge \cdots \wedge dy^{j_q}, \tag{1.8} \]
then it is easy to compute the transition law relating the $b$'s to the $a$'s on $U \cap W$. In fact, on $U \cap W$ we have
\[ dy^j = \frac{\partial y^j}{\partial x^1}\, dx^1 + \cdots + \frac{\partial y^j}{\partial x^n}\, dx^n, \tag{1.9} \]
where $y^j = y^j(x^1, \ldots, x^n)$. Then all we have to do is to substitute (1.9) into (1.8) and collect the coefficients of $dx^{i_1} \wedge \cdots \wedge dx^{i_q}$. For instance, if $q = 2$,
then we have
\[ \omega = \sum_{j_1 < j_2} b_{j_1j_2}\, dy^{j_1} \wedge dy^{j_2} = \sum_{j_1 < j_2} b_{j_1j_2}\Bigl(\frac{\partial y^{j_1}}{\partial x^1}\, dx^1 + \cdots + \frac{\partial y^{j_1}}{\partial x^n}\, dx^n\Bigr) \wedge \Bigl(\frac{\partial y^{j_2}}{\partial x^1}\, dx^1 + \cdots + \frac{\partial y^{j_2}}{\partial x^n}\, dx^n\Bigr). \]
If we collect the coefficients of $dx^{i_1} \wedge dx^{i_2}$ (remember the $\wedge$-multiplication is anticommutative), we get
\[ \omega = \sum_{i_1 < i_2}\Bigl[\sum_{j_1 < j_2} b_{j_1j_2}\Bigl(\frac{\partial y^{j_1}}{\partial x^{i_1}}\frac{\partial y^{j_2}}{\partial x^{i_2}} - \frac{\partial y^{j_1}}{\partial x^{i_2}}\frac{\partial y^{j_2}}{\partial x^{i_1}}\Bigr)\Bigr]\, dx^{i_1} \wedge dx^{i_2}. \]
Thus
\[ a_{i_1i_2} = \sum_{j_1 < j_2} b_{j_1j_2}\det\begin{pmatrix} \partial y^{j_1}/\partial x^{i_1} & \partial y^{j_1}/\partial x^{i_2} \\ \partial y^{j_2}/\partial x^{i_1} & \partial y^{j_2}/\partial x^{i_2} \end{pmatrix}. \tag{1.10} \]
Although (1.10) looks a little formidable, the point is that all one has to remember is (1.9) and the law for $\wedge$-multiplication. For general $q$ the same argument gives
\[ a_{i_1,\ldots,i_q} = \sum_{j_1 < \cdots < j_q} b_{j_1,\ldots,j_q}\det\begin{pmatrix} \partial y^{j_1}/\partial x^{i_1} & \cdots & \partial y^{j_1}/\partial x^{i_q} \\ \vdots & & \vdots \\ \partial y^{j_q}/\partial x^{i_1} & \cdots & \partial y^{j_q}/\partial x^{i_q} \end{pmatrix}. \tag{1.11} \]
The formula for pullback takes exactly the same form. Let $\varphi\colon M_1 \to M_2$ be a differentiable map, and suppose that $(U, \alpha)$ and $(W, \beta)$ are compatible charts, where $x^1, \ldots, x^m$ are the coordinates of $(U, \alpha)$ and $y^1, \ldots, y^n$ are those of $(W, \beta)$. Then the $y^j\circ\varphi$ are functions on $U$ and can thus be written as $y^j\circ\varphi = y^j(x^1, \ldots, x^m)$. Since $\varphi^*\,dy^j = d(y^j\circ\varphi)$, we have
\[ \varphi^*(dy^j) = \sum_i \frac{\partial y^j}{\partial x^i}\, dx^i. \tag{1.12} \]
If
\[ \omega = \sum_{j_1 < \cdots < j_q} b_{j_1,\ldots,j_q}\, dy^{j_1} \wedge \cdots \wedge dy^{j_q}, \]
then, by (1.5) and (1.6),
\[ \varphi^*(\omega) = \sum_{j_1 < \cdots < j_q} (b_{j_1,\ldots,j_q}\circ\varphi)\,(\varphi^*\,dy^{j_1}) \wedge \cdots \wedge (\varphi^*\,dy^{j_q}). \tag{1.13} \]
The expression for (1.13) in terms of the $dx$'s can be computed by substituting (1.12) into (1.13) and collecting coefficients. The answer, of course, will look
just like it did before. If
\[ \varphi^*(\omega) = \sum_{i_1 < \cdots < i_q} a_{i_1,\ldots,i_q}\, dx^{i_1} \wedge \cdots \wedge dx^{i_q}, \]
then the $a$'s are given by
\[ a_{i_1,\ldots,i_q} = \sum_{j_1 < \cdots < j_q} (b_{j_1,\ldots,j_q}\circ\varphi)\det\begin{pmatrix} \partial y^{j_1}/\partial x^{i_1} & \cdots & \partial y^{j_1}/\partial x^{i_q} \\ \vdots & & \vdots \\ \partial y^{j_q}/\partial x^{i_1} & \cdots & \partial y^{j_q}/\partial x^{i_q} \end{pmatrix}. \tag{1.14} \]
Again, we emphasize that there is no need to remember a complicated-looking formula like (1.14); Eqs. (1.5), (1.6), and (1.12) (and of course the rules for $\wedge$-multiplication) are sufficient. In many cases, it is much more convenient to do the substitutions directly than to use (1.14).

2. ORIENTED MANIFOLDS AND THE INTEGRATION OF EXTERIOR DIFFERENTIAL FORMS

Let $M$ be an $n$-dimensional manifold. Let $(U, \alpha)$ and $(W, \beta)$ be two charts on $M$ with coordinates $x^1, \ldots, x^n$ and $y^1, \ldots, y^n$. Let $\omega$ be an exterior differential form of degree $n$. Then we can write
\[ \omega = a\, dx^1 \wedge \cdots \wedge dx^n \quad \text{on } U \qquad \text{and} \qquad \omega = b\, dy^1 \wedge \cdots \wedge dy^n \quad \text{on } W, \]
where the functions $a$ and $b$ are related on $U \cap W$ by (1.11), which, in this case ($q = n$), becomes
\[ a = b\det\Bigl(\frac{\partial y^i}{\partial x^j}\Bigr), \]
or, finally,
\[ a_\alpha(v) = b_\beta(\beta\circ\alpha^{-1}(v))\det J_{\beta\circ\alpha^{-1}}(v) \quad \text{for } v \in \alpha(U \cap W). \tag{2.1} \]
If $\rho$ is a density on $M$, then the transition laws for $\rho_\alpha$ are given by
\[ \rho_\alpha(v) = \rho_\beta(\beta\circ\alpha^{-1}(v))\,|\det J_{\beta\circ\alpha^{-1}}(v)|. \tag{2.2} \]
Note that (2.2) and (2.1) look almost the same; the difference is the absolute-value sign that occurs in (2.2) but not in (2.1). In particular, if $(U, \alpha)$ and $(W, \beta)$ were such that $\det J_{\beta\circ\alpha^{-1}} > 0$, then (2.2) and (2.1) would agree for this pair of charts.
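The simplest instance of the difference between (2.1) and (2.2) (our example) is the reflection $x \mapsto y = -x$ on $\mathbb{R}$:

\[
dx = -\,dy \quad \text{(the 1-form changes sign, by (2.1) with } \det J = -1\text{)}, \qquad |dx| = |dy| \quad \text{(the density does not, by (2.2))}.
\]

An $n$-form can therefore be integrated without an absolute value only over a family of charts whose transition Jacobians all have positive determinant, which is the point of the definitions that follow.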
This leads us to the following definition: An atlas $\mathcal{A}$ of $M$ is said to be oriented if for any pair of charts $(U, \alpha)$ and $(W, \beta)$ of $\mathcal{A}$ we have
\[ \det J_{\beta\circ\alpha^{-1}}(\alpha(x)) > 0 \quad \text{for all } x \in U \cap W. \]
There is no guarantee that there exists an oriented atlas on a given manifold $M$. In fact, it is not difficult to show that there does not exist an oriented atlas on certain manifolds. (An example of a manifold possessing no oriented atlas is the Möbius strip.) We say that a manifold $M$ is orientable if it has an oriented atlas.

Let $M$ be an orientable manifold, and let $\mathcal{A}_1$ and $\mathcal{A}_2$ be two oriented atlases. We say that $\mathcal{A}_1$ and $\mathcal{A}_2$ have the same orientation, and write $\mathcal{A}_1 \sim \mathcal{A}_2$, if $\mathcal{A}_1 \cup \mathcal{A}_2$ is again an oriented atlas. To say that $\mathcal{A}_1 \sim \mathcal{A}_2$ means that for any $(U, \alpha) \in \mathcal{A}_1$ and any $(W, \beta) \in \mathcal{A}_2$ we have $\det J_{\beta\circ\alpha^{-1}}(v) > 0$ on $\alpha(U \cap W)$. It is clear that $\sim$ is an equivalence relation. An equivalence class of oriented atlases is called an orientation of $M$. An orientable manifold, together with a choice of orientation, will be called an oriented manifold; we shall usually denote an oriented manifold simply by $M$. Thus an oriented one-dimensional manifold has a preferred direction at each point (Fig. 11.1); an oriented two-dimensional manifold has a notion of clockwise versus counterclockwise direction (Fig. 11.2); and at any point of an oriented three-dimensional manifold we can distinguish between right- and left-handedness.

In general, let $M$ be an oriented manifold, and let $(U, \alpha)$ be a chart of $M$ with coordinates $x^1, \ldots, x^n$. We say that $(U, \alpha)$ is a positive chart if $\det J_{\beta\circ\alpha^{-1}} > 0$ for any chart $(W, \beta)$ belonging to any oriented atlas defining (i.e., belonging to) the orientation. (It suffices to check this, of course, for all $(W, \beta)$ belonging to one fixed atlas defining the orientation.) Note that if $U$ is connected and $(U, \alpha)$ is not positive, then the chart $(U, \alpha')$, where $\alpha'$ is obtained from $\alpha$ by changing the sign of the first coordinate ($x^1_{\alpha'} = -x^1_\alpha$, $x^i_{\alpha'} = x^i_\alpha$ for $i > 1$), is a positive chart. We shall say that $(U, \alpha)$ is a negative chart if $\det J_{\beta\circ\alpha^{-1}} < 0$ for all $(W, \beta)$ belonging to an atlas defining the orientation. (Thus, if $U$ is connected, then $(U, \alpha)$ must be either positive or negative.)
We now return to our initial observation comparing (2.1) with (2.2).

Proposition 2.1. Let $M$ be an oriented $n$-dimensional manifold. We can identify exterior forms of degree $n$ with densities by sending the form $\omega$ into the density $\rho^\omega$, where for any positive chart $(U, \alpha)$ with coordinates $x^1, \ldots, x^n$, the function $\rho^\omega_\alpha$ is determined by
\[ \omega = \rho^\omega_\alpha(\alpha(\cdot))\, dx^1 \wedge \cdots \wedge dx^n \quad \text{on } U. \tag{2.3} \]
Another way of writing (2.3) is
\[ \omega(\partial/\partial x^1, \ldots, \partial/\partial x^n) = \rho^\omega(\partial/\partial x^1, \ldots, \partial/\partial x^n). \tag{2.3'} \]
In other words, if $\omega = a\, dx^1 \wedge \cdots \wedge dx^n$ on $U$, then $\rho^\omega_\alpha(v) = a_\alpha(v)$. That $\rho^\omega$ is really a density follows from the fact that (2.2) reduces to (2.1) for all pairs of charts belonging to a positive atlas. It is clear that this identification is additive,
\[ \rho^{\omega_1+\omega_2} = \rho^{\omega_1} + \rho^{\omega_2}, \tag{2.4} \]
and that for any function $f$,
\[ \rho^{f\omega} = f\rho^\omega. \tag{2.5} \]
Furthermore, if $\omega(x) = 0$, then $\rho^\omega = 0$ at $x$.

By the support of a differential form we mean, as usual, the closure of the set of $x$ for which $\omega(x) \neq 0$. We say that an $n$-form $\omega$ is locally absolutely integrable if the density $\rho^\omega$ is locally absolutely integrable. Note that to say that $\omega$ is locally absolutely integrable means that for any chart $(U, \alpha)$ with coordinates $x^1, \ldots, x^n$ of some atlas $\mathcal{A}$, if $\omega = a\, dx^1 \wedge \cdots \wedge dx^n$ on $U$, then the function $a_\alpha = a\circ\alpha^{-1}$ is an absolutely integrable function on $\alpha(U)$. Let $\Gamma(M)$ denote the space of absolutely integrable $n$-forms of compact support. It is clear that $\Gamma(M)$ is a vector space and that $f\omega \in \Gamma(M)$ if $f$ is a (bounded) contented function and $\omega \in \Gamma(M)$. As a consequence of Proposition 2.1 and Theorem 3.1 of Chapter 10, we can state:

Theorem 2.1. Let $M$ be an oriented manifold. There exists a unique linear function $\int$ on $\Gamma(M)$ satisfying the following condition: If $\operatorname{supp}\omega \subset U$, where $(U, \alpha)$ is a positive chart with coordinates $x^1, \ldots, x^n$, and if $\omega = a\, dx^1 \wedge \cdots \wedge dx^n$, then
\[ \int \omega = \int_{\alpha(U)} a_\alpha. \tag{2.6} \]
Observe that we can write
\[ \int \omega = \int \rho^\omega \quad \text{for all } \omega \in \Gamma(M). \tag{2.7} \]
The recipe for computing $\int\omega$ is now very simple. We break $\omega$ up into small pieces such that each piece lies in some $U$. (We can ignore sets of content zero
in the process.) If $\operatorname{supp}\omega \subset U$, and if $(U, \alpha)$ is a positive chart, we express $\omega$ as $\omega = a\, dx^1 \wedge \cdots \wedge dx^n$. And if $a$ is given as $a = a_\alpha(x^1, \ldots, x^n)$, we integrate the function $a_\alpha$ over $\mathbb{R}^n$. The computations are automatic. The one point that has to be checked is that the chart $(U, \alpha)$ is positive. If it is negative, then $\int\omega$ is given by $-\int a_\alpha$.

Let $M_1$ be an oriented manifold of dimension $q$, let $\varphi\colon M_1 \to M_2$ be a differentiable map, and let $\omega \in \Lambda^q(M_2)$. Then for any contented compact set $A \subset M_1$ the form $e_A\varphi^*(\omega)$ belongs to $\Gamma(M_1)$, so we can consider its integral. This integral is sometimes denoted by $\int_{\varphi(A)}\omega$; that is, we make the definition
\[ \int_{\varphi(A)}\omega = \int e_A\varphi^*\omega. \tag{2.8} \]
If we regard $\varphi(A)$ as an "oriented $q$-dimensional surface" in $M_2$, then we see that the elements of $\Lambda^q(M_2)$ are objects that we can integrate over such "surfaces". (Of course, if $q = 1$, we say "curves".)

Let us illustrate by some examples. Suppose that $M_2 = \mathbb{R}^n$, and let $A \subset \mathbb{R}^1$ be the interval $a \le t \le b$. Let $x^1, \ldots, x^n$ be the coordinates of $\mathbb{R}^n$, and let $\omega = a_1\,dx^1 + \cdots + a_n\,dx^n$. We regard $\mathbb{R}^1$ as an oriented manifold on which the identity chart is positive (and its coordinate is $t$). If $C\colon \mathbb{R}^1 \to \mathbb{R}^n$ is a differentiable curve (Fig. 11.3), then
\[ \int_{C([a,b])}\omega = \int e_{[a,b]}C^*(\omega) = \int_a^b\Bigl(a_1\frac{dx^1}{dt} + \cdots + a_n\frac{dx^n}{dt}\Bigr)dt = \int_a^b \langle C'(t), \omega\rangle\, dt. \tag{2.9} \]
From this last expression we see that $C$ does not have to be differentiable everywhere in order for $\int_{C([a,b])}\omega$ to make sense. In fact, if $C$ is differentiable everywhere on $\mathbb{R}$ except at a finite number of points, and if $C'(t)$ is always bounded (when regarded as an element of $\mathbb{R}^n$), then the function $\langle C'(\cdot), \omega\rangle$ is defined everywhere except for a set of content zero and is bounded. Thus
$C^*(\omega)$ is a contented density and (2.9) still makes sense. Now the curve can have corners. (See Fig. 11.4.)

It should be observed that if $\omega = df$ (and if $C$ is continuous), then
\[ \int_{C([a,b])} df = \int_a^b (f\circ C)' = f(C(b)) - f(C(a)). \tag{2.10} \]
In this case the integral depends not on the particular curve $C$ but only on the endpoints. In general, $\int_C\omega$ depends on the curve $C$. We will obtain conditions for it to be independent of $C$ in Section 5.

In the next example let $M_2 = \mathbb{R}^3$ and $M_1 = U \subset \mathbb{R}^2$, where $(u, v)$ are Euclidean coordinates on $\mathbb{R}^2$ and $x, y, z$ are Euclidean coordinates on $\mathbb{R}^3$. Let
\[ \omega = P\,dx \wedge dy + Q\,dx \wedge dz + R\,dy \wedge dz \]
be an element of $\Lambda^2(\mathbb{R}^3)$. If $\varphi\colon U \to \mathbb{R}^3$ is given by the functions $x(u,v)$, $y(u,v)$, and $z(u,v)$, then for $A \subset U$,
\[ \int_{\varphi(A)}\omega = \int e_A\varphi^*\omega = \int_A\Bigl[(P\circ\varphi)\Bigl(\frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial y}{\partial u}\frac{\partial x}{\partial v}\Bigr) + (Q\circ\varphi)\Bigl(\frac{\partial x}{\partial u}\frac{\partial z}{\partial v} - \frac{\partial z}{\partial u}\frac{\partial x}{\partial v}\Bigr) + (R\circ\varphi)\Bigl(\frac{\partial y}{\partial u}\frac{\partial z}{\partial v} - \frac{\partial z}{\partial u}\frac{\partial y}{\partial v}\Bigr)\Bigr]. \]

We conclude this section with another look at the volume density of Riemann metrics, this time for an oriented manifold. If $M$ is an oriented manifold with a Riemann metric, then the volume density $\sigma$ corresponds to an $n$-form $\Omega$. By our rule for this correspondence, if $(U, \alpha)$ is a positive chart with coordinates $x^1, \ldots, x^n$, then $\Omega = a\, dx^1 \wedge \cdots \wedge dx^n$, where, by (4.1) of Chapter 10,
\[ a(x) = |\det(g_{ij})|^{1/2} \]
is the volume in $T_x(M)$ of the parallelepiped spanned by
\[ \frac{\partial}{\partial x^1}(x), \ldots, \frac{\partial}{\partial x^n}(x). \]
Let $e_1(x), \ldots, e_n(x)$ be an orthonormal basis of $T_x(M)$ (relative to the scalar product given by the Riemann metric). Then
\[ |\det(g_{ij})|^{1/2} = \Bigl|\det\Bigl[\Bigl(\frac{\partial}{\partial x^i}, e_j\Bigr)\Bigr]\Bigr| = |\det A|, \]
where $A = \bigl((\partial/\partial x^i, e_j)\bigr)$ is the matrix of the linear transformation carrying $e_j \to \partial/\partial x^j$. If $w^1(x), \ldots, w^n(x)$ is the dual basis of the $e$'s, then
\[ w^1(x) \wedge \cdots \wedge w^n(x) = \det A\; dx^1(x) \wedge \cdots \wedge dx^n(x). \]
Now $w^1(x), \ldots, w^n(x)$ can be any orthonormal basis of $T_x^*(M)$. [$T_x^*(M)$ has a scalar product, since it is the dual space of the scalar product space $T_x(M)$.] We thus get the following result: If $w^1, \ldots, w^n$ are linear differential forms such that for each $x \in M$, $w^1(x), \ldots, w^n(x)$ is an orthonormal basis of $T_x^*(M)$, then $\Omega = \pm\, w^1 \wedge \cdots \wedge w^n$. We can write
\[ \Omega = w^1 \wedge \cdots \wedge w^n \tag{2.11} \]
if we know that $w^1 \wedge \cdots \wedge w^n$ is a positive multiple of $dx^1 \wedge \cdots \wedge dx^n$.

Can we always find such forms $w^1, \ldots, w^n$ on $U$? The answer is "yes": we can do it by applying the orthonormalization procedure to $dx^1, \ldots, dx^n$. That is, we set
\[ w^1 = \frac{dx^1}{\|dx^1\|}, \qquad \text{where } \|dx^1\|(x) = \|dx^1(x)\| > 0 \text{ is a } C^\infty\text{-function on } U, \]
\[ w^2 = \frac{dx^2 - (dx^2, w^1)w^1}{\|dx^2 - (dx^2, w^1)w^1\|}, \qquad \text{etc.} \]
The matrix which relates the $dx$'s to the $w$'s is composed of $C^\infty$-functions, so that the $w^i \in \Lambda^1(U)$. Furthermore, it is a triangular matrix with positive entries on the diagonal, so its determinant is positive. We have thus constructed the desired forms $w^1, \ldots, w^n$, so (2.11) holds. For instance, it follows from Eq. (9.10) of Chapter 9 that $d\theta$, $\sin\theta\, d\varphi$ form an orthonormal basis for $T_x^*(S^2)$ at all $x \in S^2$ (except the north and south poles). If we choose the orientation on $S^2$ so that $\theta, \varphi$ form a positive chart, then the volume form is given by
\[ \Omega = \sin\theta\, d\theta \wedge d\varphi. \]

3. THE OPERATOR d

With every function $f$ we have associated a linear differential form $df$. We can thus regard $d$ as a map from $\Lambda^0(M)$ to $\Lambda^1(M)$. As such, it is linear and satisfies
\[ d(f_1f_2) = f_2\, df_1 + f_1\, df_2. \]
We now seek to define a $d\colon \Lambda^k(M) \to \Lambda^{k+1}(M)$ for $k > 0$ as well. We shall require that $d$ be linear and satisfy some identity with regard to multiplication, generalizing the above formula for $d(f_1f_2)$. The condition we will impose is that
\[ d(\omega_1 \wedge \omega_2) = d\omega_1 \wedge \omega_2 + (-1)^p\omega_1 \wedge d\omega_2 \]
if $\omega_1$ is a form of degree $p$. The factor $(-1)^p$ accounts for the anticommutativity of $\wedge$. The reader should check that $d$ is consistent with this law, at least to the extent that $d(\omega_1 \wedge \omega_2) = (-1)^{pq}\, d(\omega_2 \wedge \omega_1)$ if $\omega_1$ is of degree $p$ and $\omega_2$ is of degree $q$. We are going to impose one further condition on $d$ which will uniquely determine it. This condition (which lies at the heart of the matter) requires
some introduction. Let $f$ be a differentiable function, and let $C\colon I \to M$ be a differentiable curve. For any points $a, b \in I$, the fundamental theorem of the calculus implies that
\[ f(C(b)) - f(C(a)) = \int_a^b \frac{d(f\circ C)}{dt}\, dt = \int_a^b C^*\, df. \]
We can regard $b$ and $a$ (with $\pm$ signs attached) as the "oriented boundary" of the interval $[a, b]$. Let us make the convention that "integrating" an element of $\Lambda^0(p)$ is just evaluating the function at the point $p$. With this convention, the equation above says that the integral of the "pullback" of $f$ over the "boundary", that is, $f(C(b)) - f(C(a))$, equals the integral of the "pullback" of $df$ over $[a, b]$. In some sense, we would like to be able to say that if $\omega$ is a form of degree $k$, then the integral of the "pullback" of $\omega$ over the $k$-dimensional "boundary" of a $(k+1)$-dimensional region is equal to the integral of the pullback of $d\omega$ over the $(k+1)$-dimensional region. Without trying to make this requirement precise, let us see what it says for the case where $k = 1$ and the region is a triangle in the plane.

Let $\varphi$ be a smooth map of some neighborhood of the triangle $\Delta \subset \mathbb{R}^2$ into $M$, and let the vertices of $\Delta$ be mapped by $\varphi$ into $x$, $y$, and $z$ (see Fig. 11.5). The boundary of $\Delta$ consists of three curves (segments) $C_1$, $C_2$, and $C_3$ (with the proper orientations). Let $\omega$ be a linear differential form on $M$. We would then expect that
\[ \int \varphi^*\, d\omega = \int_{C_1}\varphi^*\omega + \int_{C_2}\varphi^*\omega + \int_{C_3}\varphi^*\omega. \]
If $\omega = df$, then the three integrals on the right become (by the fundamental theorem of the calculus)
\[ f(y) - f(x) + f(z) - f(y) + f(x) - f(z) = 0. \]
Thus $\int\varphi^*\, d(df) = 0$. Since the triangle was arbitrary, we expect that $d(df) = 0$. We now assert:

Theorem 3.1. There exists a unique linear map $d\colon \Lambda^k(M) \to \Lambda^{k+1}(M)$ such that on $\Lambda^0$ it coincides with the old $d$ and such that
\[ d(\omega_1 \wedge \omega_2) = d\omega_1 \wedge \omega_2 + (-1)^p\omega_1 \wedge d\omega_2 \tag{3.1} \]
and
\[ d(df) = 0 \quad \text{if } f \in \Lambda^0(M). \tag{3.2} \]
Proof. We first establish the uniqueness of $d$. To do this we observe that (3.1) implies that $d$ is local, in the sense that if $\omega = \omega'$ on some open set $U$, then $d\omega = d\omega'$ on $U$. In fact, let $W$ be an open set with $\bar W \subset U$, and let $\varphi$ be a $C^\infty$-function such that $\varphi(x) \equiv 1$ for $x \in W$ and $\operatorname{supp}\varphi \subset U$. Then $\varphi\omega = \varphi\omega'$ everywhere on $M$, and thus $d(\varphi\omega) = d(\varphi\omega')$. But, by (3.1),
\[ d(\varphi\omega) = \varphi\, d\omega + d\varphi \wedge \omega = d\omega \quad \text{on } W, \]
since $\varphi \equiv 1$ and $d\varphi = 0$ there. Thus $d\omega = d\omega'$ on $W$. Since $W$ can be arbitrary, we conclude that $d\omega = d\omega'$ on $U$.

Let $(U, \alpha)$ be a chart with coordinates $x^1, \ldots, x^n$. Every $\omega \in \Lambda^k(M)$ can be written as
\[ \omega = \sum_{i_1 < \cdots < i_k} a_{i_1,\ldots,i_k}\, dx^{i_1} \wedge \cdots \wedge dx^{i_k} \quad \text{on } U. \]
Now [by induction on $k$, using (3.1) and (3.2)] $d(dx^{i_1} \wedge \cdots \wedge dx^{i_k}) = 0$. Thus (3.1) implies that
\[ d\omega = \sum_{i_1 < \cdots < i_k} da_{i_1,\ldots,i_k} \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_k} \quad \text{on } U. \tag{3.3} \]
Equation (3.3) gives a local formula for $d$. It also shows that $d$ is unique. In fact, we have shown that there is at most one operator $d$ on any open subset $O \subset M$ mapping $\Lambda^k(O) \to \Lambda^{k+1}(O)$ and satisfying the hypotheses of the theorem (for $O$). On the set $O \cap U$ it must be given by (3.3).

We now claim that in order to establish the existence of $d$, it suffices to show that the $d$ given by (3.3) [in any chart $(U, \alpha)$] satisfies the requirements of the theorem on $\Lambda^k(U)$. In fact, suppose we have shown this to be so. Let $\mathcal{A}$ be an atlas of $M$, and for each chart $(U, \alpha) \in \mathcal{A}$ define the operator $d_\alpha\colon \Lambda^k(U) \to \Lambda^{k+1}(U)$ by (3.3). We would like to set $d\omega = d_\alpha\omega$ on $U$. For this to be consistent, we must show that $d_\alpha\omega = d_\beta\omega$ on $U \cap W$ if $(W, \beta)$ is some other chart. But both $d_\alpha$ and $d_\beta$ satisfy the hypotheses of the theorem on $U \cap W$, and they must therefore coincide there.

Thus to prove the theorem, it suffices to check that the operator $d$, defined by (3.3), fulfills our requirements as a map of $\Lambda^k(U) \to \Lambda^{k+1}(U)$. It is obviously linear. To check (3.2), we observe that
\[ df = \sum_i \frac{\partial f}{\partial x^i}\, dx^i, \]
so
\[ d(df) = \sum_i d\Bigl(\frac{\partial f}{\partial x^i}\Bigr) \wedge dx^i = \sum_{i,j} \frac{\partial^2 f}{\partial x^j\,\partial x^i}\, dx^j \wedge dx^i = \sum_{i<j}\Bigl(\frac{\partial^2 f}{\partial x^i\,\partial x^j} - \frac{\partial^2 f}{\partial x^j\,\partial x^i}\Bigr)\, dx^i \wedge dx^j = 0 \]
by the equality of mixed partials.

Now we turn to (3.1). Since both sides of (3.1) are linear in $\omega_1$ and $\omega_2$ separately, it suffices to check (3.1) for $\omega_1 = a\, dx^{i_1} \wedge \cdots \wedge dx^{i_p}$ and
$\omega_2 = b\, dx^{j_1} \wedge \cdots \wedge dx^{j_q}$. Now
\[ \omega_1 \wedge \omega_2 = ab\; dx^{i_1} \wedge \cdots \wedge dx^{i_p} \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_q} \]
and $d(ab) = b\, da + a\, db$; therefore,
\[ d(\omega_1 \wedge \omega_2) = b\, da \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_p} \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_q} + a\, db \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_p} \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_q}, \]
while
\[ d\omega_1 \wedge \omega_2 = (da \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_p}) \wedge (b\, dx^{j_1} \wedge \cdots \wedge dx^{j_q}) = b\, da \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_p} \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_q} \]
and
\[ \omega_1 \wedge d\omega_2 = (a\, dx^{i_1} \wedge \cdots \wedge dx^{i_p}) \wedge (db \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_q}) = (-1)^p a\, db \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_p} \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_q}, \]
so we see that (3.1) holds. This proves the theorem. $\square$

We can draw a number of important corollaries from Eq. (3.3). First of all, it follows immediately that for $\omega \in \Lambda^k(M)$, for any $k$, we have
\[ d(d\omega) = 0. \tag{3.4} \]
(Remember we merely assumed it for $k = 0$.) Secondly, let $\varphi\colon M_1 \to M_2$ be a differentiable map. Then for $\omega \in \Lambda^k(M_2)$ we have
\[ d\varphi^*\omega = \varphi^*\, d\omega. \tag{3.5} \]
To check (3.5), it suffices to verify it for any pair of compatible charts. But if $x^1, \ldots, x^n$ are coordinates on $M_2$ and, locally, $\omega = \sum a_{i_1,\ldots,i_k}\, dx^{i_1} \wedge \cdots \wedge dx^{i_k}$, we have
\[ d\varphi^*\omega = d\sum (a_{i_1,\ldots,i_k}\circ\varphi)\, d(x^{i_1}\circ\varphi) \wedge \cdots \wedge d(x^{i_k}\circ\varphi) = \sum d(a_{i_1,\ldots,i_k}\circ\varphi) \wedge d(x^{i_1}\circ\varphi) \wedge \cdots \wedge d(x^{i_k}\circ\varphi) = \varphi^*\, d\omega. \]
In particular, if $X$ is a vector field on $M$, we conclude that
\[ D_X\, d\omega = d(D_X\omega). \tag{3.6} \]

EXERCISES

3.1 Compute $d$ of the following differential forms. (A computational sketch follows the list.)
a) $\gamma = \sum_1^n (-1)^{i-1}x^i\, dx^1 \wedge \cdots \wedge dx^{i-1} \wedge dx^{i+1} \wedge \cdots \wedge dx^n$
b) $r^{-n}\gamma$, where $\gamma$ is as in (a) and $r = \{x_1^2 + \cdots + x_n^2\}^{1/2}$
c) $\sum p_i\, dq_i$
d) $\sin(x^2 + y^2 + z^2)(x\, dx + y\, dy + z\, dz)$
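The local formula (3.3) is easy to implement. The following sketch (ours, not the book's) represents a $k$-form on $\mathbb{R}^n$ as a dict mapping increasing index tuples (0-based) to sympy coefficients, computes $d$ by (3.3), and checks Exercise 3.1(a) for $n = 3$ as well as the identity $d(d\omega) = 0$ of (3.4):

```python
import sympy as sp

x = sp.symbols('x1:4')  # coordinates x1, x2, x3 on R^3
n = len(x)

def wedge_sort(idx):
    """Sort an index tuple; return (sign, sorted tuple), sign 0 if an index repeats."""
    idx, sign = list(idx), 1
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return (0 if len(set(idx)) < len(idx) else sign), tuple(idx)

def d(form):
    """Exterior derivative via (3.3): d(a dx^I) = sum_i (da/dx^i) dx^i ^ dx^I."""
    result = {}
    for idx, a in form.items():
        for i in range(n):
            sign, new_idx = wedge_sort((i,) + idx)
            if sign:
                result[new_idx] = result.get(new_idx, 0) + sign * sp.diff(a, x[i])
    return {k: sp.simplify(v) for k, v in result.items() if sp.simplify(v) != 0}

# Exercise 3.1(a) for n = 3: gamma = x1 dx2^dx3 - x2 dx1^dx3 + x3 dx1^dx2.
gamma = {(1, 2): x[0], (0, 2): -x[1], (0, 1): x[2]}
print(d(gamma))            # {(0, 1, 2): 3}, i.e. d(gamma) = 3 dx1^dx2^dx3
# d of a 1-form, and d(d(.)) = 0:
omega = {(0,): x[1] * x[2], (1,): x[0], (2,): sp.Integer(1)}
print(d(omega))            # the 2-form d(omega)
assert d(d(omega)) == {}   # identity (3.4)
```

The general answer to 3.1(a), $d\gamma = n\, dx^1 \wedge \cdots \wedge dx^n$, is visible in the printed coefficient 3 for $n = 3$.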
Let $V$ be a vector space equipped with a nonsingular bilinear form and an orientation. Then we can define the $*$-operator as in Chapter 7. Since we identify the tangent space $T_x(V)$ with $V$ for any $x \in V$, we can consider the $*$-operator as mapping $\Lambda^k(V) \to \Lambda^{n-k}(V)$. For instance, in $\mathbb{R}^2$, with the rectangular coordinates $(x, y)$, we have $*dx = dy$, $*dy = -dx$, and so on.

3.2 Show that
\[ d * df = \Bigl(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}\Bigr)\, dx \wedge dy \]
for any function $f$ on $\mathbb{R}^2$.

3.3 Obtain a similar expression for $d * d$ in $\mathbb{R}^n$ with its usual scalar product. (Recall that
\[ *dx^i = (-1)^{i-1}\, dx^1 \wedge \cdots \wedge dx^{i-1} \wedge dx^{i+1} \wedge \cdots \wedge dx^n \]
and, more generally, $*(dx^{i_1} \wedge \cdots \wedge dx^{i_k}) = \pm\, dx^{j_1} \wedge \cdots \wedge dx^{j_{n-k}}$, where $(i_1, \ldots, i_k, j_1, \ldots, j_{n-k})$ is a permutation of $(1, \ldots, n)$ and the $\pm$ is the sign of the permutation.)

3.4 Let $x, y, z, t$ be coordinates on $\mathbb{R}^4$. Introduce a scalar product on the tangent space at each point so that
\[ (dx, dx) = (dy, dy) = (dz, dz) = 1, \qquad (c\, dt, c\, dt) = -1, \]
\[ (dx, dy) = (dx, dz) = (dx, dt) = (dy, dz) = (dy, dt) = (dz, dt) = 0, \]
where $c$ is a positive constant. Let the two-form $\omega$ be given by
\[ \omega = c(E_1\, dx \wedge dt + E_2\, dy \wedge dt + E_3\, dz \wedge dt) + B_1\, dy \wedge dz + B_2\, dz \wedge dx + B_3\, dx \wedge dy. \]
Let the three-form $\gamma$ be given by
\[ \gamma = \rho\, dx \wedge dy \wedge dz - (J_1\, dy \wedge dz + J_2\, dz \wedge dx + J_3\, dx \wedge dy) \wedge dt. \]
Write the equations
\[ d\omega = 0, \qquad d * \omega = 4\pi\gamma \]
as equations involving the various coefficients and their partial derivatives.

4. STOKES' THEOREM

In this section we shall prove a theorem which is a far-reaching generalization of the fundamental theorem of the calculus of one variable. It should, perhaps, be called the fundamental theorem of the calculus of several variables. We first make some definitions.

Let $D$ be a domain with regular boundary in a manifold $M$. We recall (page 419) that each point of $M$ lies in a chart $(U, \alpha)$ which is one of three types.
Let $(U, \alpha)$ and $(W, \beta)$ be two charts of $M$ of type (iii). Then, as on page 420, the matrix of $J_{\beta\circ\alpha^{-1}}$ at points where $x^n = 0$ is given by
\[ \begin{pmatrix} \partial y^1/\partial x^1 & \cdots & \partial y^1/\partial x^{n-1} & \partial y^1/\partial x^n \\ \vdots & & \vdots & \vdots \\ \partial y^{n-1}/\partial x^1 & \cdots & \partial y^{n-1}/\partial x^{n-1} & \partial y^{n-1}/\partial x^n \\ 0 & \cdots & 0 & \partial y^n/\partial x^n \end{pmatrix}, \]
and so
\[ \det J_{\beta\circ\alpha^{-1}} = \frac{\partial y^n}{\partial x^n}\,\det J_{(\beta\restriction\partial D)\circ(\alpha\restriction\partial D)^{-1}}. \tag{4.1} \]
Furthermore, $y^n(x^1, \ldots, x^n) > 0$ if $x^n > 0$, since
\[ \alpha(U \cap W) \cap \{v : v^n > 0\} = \alpha(U \cap W \cap \operatorname{int} D). \]
Thus $\partial y^n/\partial x^n > 0$ at a boundary point where $x^n = 0$.

Now suppose that $M$ is an oriented manifold, and let $D \subset M$ be a domain with regular boundary. We shall make $\partial D$ into an oriented manifold. We say that an atlas $\mathcal{A}$ is adjusted if each $(U, \alpha) \in \mathcal{A}$ is of type (i), (ii), or (iii) and, in addition, if each chart of $\mathcal{A}$ is positive. If $\dim M > 1$, we can always find an adjusted atlas. In fact, by choosing the $U$ connected, we find that every $(U, \alpha)$ is either positive or negative. If $(U, \alpha)$ is negative, we replace it by $(U, \alpha')$, where $x^1_{\alpha'} = -x^1_\alpha$.

If $\dim M = 1$, then $\partial D$ consists of a discrete set of points (which we can regard as a "zero-dimensional manifold"). Each $x \in \partial D$ lies in a chart of type (iii) which is either positive or negative. We assign a plus sign to $x$ if any chart (and hence every chart) of type (iii) about $x$ is negative. We assign a minus sign to $x$ if its charts of type (iii) are positive. In this way we "orient" $\partial D$, as shown in Fig. 11.6.

If $\dim M > 1$, we choose an adjusted (oriented) atlas on $M$. It then follows from (4.1) and the fact that $\partial y^n/\partial x^n > 0$ that
\[ \det J_{(\beta\restriction\partial D)\circ(\alpha\restriction\partial D)^{-1}} > 0. \]
This shows that $\{(U \cap \partial D,\ \alpha{\restriction}\partial D)\}$ is an oriented atlas on $\partial D$. We thus get an orientation on $\partial D$. This is not quite the orientation we want on $\partial D$. For reasons that will soon become apparent, we choose the orientation on $\partial D$ so that
the chart $(U \cap \partial D,\ \alpha{\restriction}\partial D)$ has the same sign as $(-1)^n$. That is, $(U \cap \partial D,\ \alpha{\restriction}\partial D)$ is a positive chart if $n$ is even, and we take the orientation opposite to that determined by the $(U \cap \partial D,\ \alpha{\restriction}\partial D)$ if $n$ is odd. We can now state our main theorem.

Theorem 4.1 (Stokes' theorem). Let $M$ be an $n$-dimensional oriented manifold, and let $D \subset M$ be a domain with regular boundary. Let $\partial D$ denote the boundary of $D$ regarded as an oriented manifold. Then for any $\omega \in \Lambda^{n-1}(M)$ with compact support we have
\[ \int_{\partial D} \iota^*\omega = \int_D d\omega, \tag{4.2} \]
where, as usual, $\iota$ is the injection of $\partial D$ into $M$.

Proof. For $n = 1$ this is just the fundamental theorem of the calculus. For $n > 1$ our proof is almost exactly the same as the proof of Theorem 6.1 of Chapter 10. Choose an adjusted atlas $\mathcal{A}$ and a partition of unity $\{g_j\}$ subordinate to $\mathcal{A}$. Since $\omega$ has compact support, we can write $\omega = \sum g_j\omega$, where the sum is finite. Since both sides of (4.2) are linear, it suffices to verify (4.2) for each of the summands $g_j\omega$. Since $\operatorname{supp} g_j\omega \subset U$, where $(U, \alpha) \in \mathcal{A}$, we must check the three possibilities: $(U, \alpha)$ satisfies (i), (ii), or (iii).

If $(U, \alpha)$ satisfies (i), then $\iota^*\omega = 0$, since $\operatorname{supp}\omega \cap \partial D = \emptyset$, and
\[ \int_D d\omega = \int_M e_D\, d\omega = 0, \]
since $D \cap \operatorname{supp}\omega = \emptyset$. Thus both sides of (4.2) vanish.

If $(U, \alpha)$ satisfies (ii), the left-hand side of (4.2) vanishes. We must show that the same holds for the right-hand side. Let $x^1, \ldots, x^n$ be the coordinates on $(U, \alpha)$, and write
\[ g_j\omega = a_1\, dx^2 \wedge \cdots \wedge dx^n + a_2\, dx^1 \wedge dx^3 \wedge \cdots \wedge dx^n + \cdots + a_n\, dx^1 \wedge \cdots \wedge dx^{n-1}. \]
Then
\[ d(g_j\omega) = \sum_i (-1)^{i-1}\frac{\partial a_i}{\partial x^i}\, dx^1 \wedge \cdots \wedge dx^n, \]
and thus
\[ \int_D d(g_j\omega) = \sum_i (-1)^{i-1}\int \frac{\partial a_i}{\partial x^i}. \]
Since $g_j\omega$ has compact support, the functions $a_i$ have compact support, and we can replace the integral over $\mathbb{R}^n$ by the integral over the cube $\square_{\mathbf R}$, where $\mathbf R = \langle R, \ldots, R\rangle$ and $R$ is chosen so large that $\operatorname{supp} a_i \subset \square_{\mathbf R}$. But writing the multiple integral as an iterated integral, we get
\[ \int_{\square_{\mathbf R}} \frac{\partial a_i}{\partial x^i} = \int_{\mathbb{R}^{n-1}}\bigl[a_i(\ldots, R, \ldots) - a_i(\ldots, -R, \ldots)\bigr] = 0, \]
since $a_i(\ldots, R, \ldots) = a_i(\ldots, -R, \ldots) = 0$.
We now examine $\int d(g_j\omega)$ in case (iii). The argument proceeds exactly as before, except that we must compute $\int_{\alpha(U\cap D)} \partial a_i/\partial x^i$ instead of $\int_{\alpha(U)}$. (See Fig. 11.7.) We can now replace the region of integration by the rectangle with corners $\langle -R, \ldots, -R, 0\rangle$ and $\langle R, \ldots, R\rangle$ for large $R$. If $i < n$, then $\int \partial a_i/\partial x^i = 0$ as before. If $i = n$, we get
\[ \int_{\alpha(U\cap D)} \frac{\partial a_n}{\partial x^n} = -\int_{\mathbb{R}^{n-1}} a_n(\cdot, \ldots, \cdot, 0), \]
so that
\[ \int_D d(g_j\omega) = \sum_i (-1)^{i-1}\int \frac{\partial a_i}{\partial x^i} = (-1)^n\int_{\mathbb{R}^{n-1}} a_n(\cdot, \ldots, \cdot, 0). \]
Now since $x^n = 0$ on $U \cap \partial D$, we see that $\iota^*\, dx^n = 0$. Thus
\[ \iota^*\omega = (\iota^*a_n)(\iota^*\, dx^1) \wedge \cdots \wedge (\iota^*\, dx^{n-1}), \]
or, if (by abuse of notation) we regard $x^1, \ldots, x^{n-1}$ as the coordinates of $(U \cap \partial D,\ \alpha{\restriction}\partial D)$, we get
\[ \iota^*\omega = a_n(\cdot, \cdot, \ldots, \cdot, 0)\, dx^1 \wedge \cdots \wedge dx^{n-1}. \]
In view of the choice we made for the orientation of $\partial D$, we conclude that
\[ \int_{\partial D}\iota^*\omega = (-1)^n\int_{\mathbb{R}^{n-1}} a_n(\cdot, \cdot, \ldots, \cdot, 0). \]
This completes the proof of the theorem. $\square$

Theorem 4.1, like the divergence theorem, is not sufficiently broad for us to apply to more general domains. For this purpose, we will again use the notion of a domain with almost regular boundary. We have already seen that the set of $x \in \partial D$ having a neighborhood of type (iii) forms a differentiable manifold. (Recall that these points need not exhaust all of $\partial D$.) Similarly, if $M$ is an oriented manifold, then this collection of points becomes an oriented manifold [with $(-1)^n$ times the induced orientation, as before]. By abuse of language we shall denote this oriented manifold by $\partial D$. Thus $\partial D$ is an oriented manifold which, as a set, is not the whole boundary of $D$ but only the "regular" points of the boundary, that is, the points of $\widetilde{\partial D}$.
Theorem 4.2 (Stokes' theorem). Let $M$ be an $n$-dimensional oriented manifold, and let $D \subset M$ be a domain with almost regular boundary. Let $\partial D$ be as above, and let $\iota$ be the injection of $\partial D \to M$. Then for any $\omega \in \Lambda^{n-1}(M)$ with compact support we have
\[ \int_{\partial D} \iota^*\omega = \int_D d\omega. \tag{4.2} \]

Proof. The proof proceeds as before. We choose an adjusted atlas and a partition of unity $\{g_j\}$ subordinate to the atlas. We write $\omega = \sum g_j\omega$ and now have four cases to consider. The first three cases have been handled already. The new case is where
\[ g_j\omega = \sum_j a_j\, dx^1 \wedge \cdots \wedge \widehat{dx^j} \wedge \cdots \wedge dx^n, \]
where the $\widehat{\phantom{x}}$ indicates that $dx^j$ is to be omitted, has its support contained in $U$, where $(U, \alpha)$ is a chart of type (iv). By linearity, it suffices to verify (4.2) for each summand on the right, i.e., for $a_j\, dx^1 \wedge \cdots \wedge \widehat{dx^j} \wedge \cdots \wedge dx^n$. Now $dx^p$ vanishes on the piece of $\partial D \cap U$ whose image under $\alpha$ lies in $H_p^k$, and our form contains $dx^p$ for every $p \neq j$; hence $\iota^*(a_j\, dx^1 \wedge \cdots \wedge \widehat{dx^j} \wedge \cdots \wedge dx^n)$ vanishes everywhere except on the portion of $\partial D$ which maps under $\alpha$ onto $H_j^k$. In particular, if $j < k$, then no piece $H_j^k$ occurs, and $\iota^*(a_j\, dx^1 \wedge \cdots \wedge \widehat{dx^j} \wedge \cdots \wedge dx^n) = 0$ identically. On the other hand,
\[ d(a_j\, dx^1 \wedge \cdots \wedge \widehat{dx^j} \wedge \cdots \wedge dx^n) = (-1)^{j-1}\frac{\partial a_j}{\partial x^j}\, dx^1 \wedge \cdots \wedge dx^n. \]
We can evaluate the integral $\int_D$ by integrating over the rectangle with corners $\langle -R, \ldots, -R, 0, \ldots, 0\rangle$ and $\langle R, \ldots, R\rangle$ (where the $-R$'s extend through the $(k-1)$th position). Integrating first with respect to $x^j$, we obtain
\[ \int_D d(a_j\, dx^1 \wedge \cdots \wedge \widehat{dx^j} \wedge \cdots \wedge dx^n) = (-1)^j\int_{H_j^k} a_j. \]
On the other hand, the orientation on $H_j^k$ is such that this integral has the sign necessary to make (4.2) hold. This proves Theorem 4.2. $\square$

As before, we can apply Theorems 4.1 and 4.2 to still more general domains by using a limit argument. For instance, Theorem 4.2, as stated, does not apply to the domain $D$ in Fig. 11.8, because the curves $C_1$ and $C_2$ are tangent at $P$. It does apply, however, to the approximating domain obtained by "breaking off a little piece" (Fig. 11.9), and it is clear that the values of both sides of (4.2) for $D'$ are close to those for $D$. We thus obtain (4.2) for $D$ by passing to the limit. As before, we will not state a more general theorem covering these cases. It will be clear in each instance how to apply a limit argument.
Since the statement and proof of Stokes' theorem are so close to those of the divergence theorem, the reader might suspect that one implies the other. On an oriented manifold, the divergence theorem is, indeed, a corollary of Stokes' theorem. To see this, let $\Omega$ be an element of $\Lambda^n(M)$ corresponding to the density $\rho$. If $X$ is a vector field, then the $n$-form $D_X\Omega$ clearly corresponds to the density $D_X\rho = \operatorname{div}\langle X, \rho\rangle$. Anticipating some notation that we shall introduce in Section 6, let $X \mathbin{\lrcorner} \Omega$ be the $(n-1)$-form defined by
\[ X \mathbin{\lrcorner} \Omega(\xi_1, \ldots, \xi_{n-1}) = (-1)^{n-1}\Omega(\xi_1, \ldots, \xi_{n-1}, X). \]
In terms of coordinates, if $\Omega = a\, dx^1 \wedge \cdots \wedge dx^n$, then
\[ X \mathbin{\lrcorner} \Omega = a\bigl[X^1\, dx^2 \wedge \cdots \wedge dx^n - X^2\, dx^1 \wedge dx^3 \wedge \cdots \wedge dx^n + \cdots + (-1)^{n-1}X^n\, dx^1 \wedge \cdots \wedge dx^{n-1}\bigr]. \]
Note that
\[ d(X \mathbin{\lrcorner} \Omega) = \Bigl(\sum_i \frac{\partial(aX^i)}{\partial x^i}\Bigr)\, dx^1 \wedge \cdots \wedge dx^n, \]
which is exactly the $n$-form $D_X\Omega$, since it corresponds to the density $D_X\rho = \operatorname{div}\langle X, \rho\rangle$. Thus, by Stokes' theorem,
\[ \int_{\partial D} \iota^*(X \mathbin{\lrcorner} \Omega) = \int_D d(X \mathbin{\lrcorner} \Omega) = \int_D \operatorname{div}\langle X, \rho\rangle. \]
We must compare $X \mathbin{\lrcorner} \Omega$ with the density $\rho_X$ on $\partial D$. By (2.2) they agree on everything up to sign. To check that the signs agree, it suffices to compare
\[ \rho_X\Bigl(\frac{\partial}{\partial x^1}, \ldots, \frac{\partial}{\partial x^{n-1}}\Bigr) = \rho\Bigl(\frac{\partial}{\partial x^1}, \ldots, \frac{\partial}{\partial x^{n-1}}, X\Bigr) \]
with
\[ \iota^*(X \mathbin{\lrcorner} \Omega)\Bigl(\frac{\partial}{\partial x^1}, \ldots, \frac{\partial}{\partial x^{n-1}}\Bigr) \]
at any $x \in \partial D$. Now
\[ \iota^*(X \mathbin{\lrcorner} \Omega) = (-1)^{n-1}X^n a\; dx^1 \wedge \cdots \wedge dx^{n-1}, \]
and, according to our convention, $x^1, \ldots, x^{n-1}$ is a positive or negative coordinate system according to the sign of $(-1)^n$. Thus the two coincide if and only if $X^n$ is negative; that is,
\[ \int_{\partial D} \iota^*(X \mathbin{\lrcorner} \Omega) = \int_{\partial D} \epsilon_X\rho_X. \]
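For $n = 3$ (our illustration) with $\Omega = dx \wedge dy \wedge dz$, the definition gives

\[
X \mathbin{\lrcorner} \Omega = X^1\, dy \wedge dz + X^2\, dz \wedge dx + X^3\, dx \wedge dy,
\qquad
d(X \mathbin{\lrcorner} \Omega) = \Bigl(\frac{\partial X^1}{\partial x} + \frac{\partial X^2}{\partial y} + \frac{\partial X^3}{\partial z}\Bigr)\, dx \wedge dy \wedge dz,
\]

so Stokes' theorem applied to the 2-form $X \mathbin{\lrcorner} \Omega$ is precisely the classical Gauss theorem: the flux of $X$ through $\partial D$ equals the integral of $\operatorname{div} X$ over $D$, in agreement with (6.6) of Chapter 10.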
EXERCISES

4.1 Compute the following surface integrals both directly and by using Stokes' theorem. Let $\square$ denote the unit cube, and let $B$ be the unit ball in $\mathbb{R}^3$.
a) $\int_{\partial\square} x\, dy \wedge dz + y\, dz \wedge dx + z\, dx \wedge dy$
b) $\int_{\partial B} x^3\, dy \wedge dz$
c) $\int_{\partial\square} \cos z\, dx \wedge dy$
d) $\int_{\partial U} x\, dy \wedge dz$, where $U = \{(x, y, z) : x \ge 0,\ y \ge 0,\ z \ge 0,\ x^2 + y^2 + z^2 \le 1\}$

4.2 Let $\omega = yz\, dx + x\, dy + dz$. Let $\gamma$ be the unit circle in the plane oriented in the counterclockwise direction. Compute $\int_\gamma\omega$. Let
\[ A_1 = \{(x, y, z) : z = 0,\ x^2 + y^2 \le 1\}, \qquad A_2 = \{(x, y, z) : z = 1 - x^2 - y^2,\ x^2 + y^2 \le 1\}. \]
Orient the surfaces $A_1$ and $A_2$ so that $\partial A_1 = \partial A_2 = \gamma$. Verify that $\int_{A_1}d\omega = \int_{A_2}d\omega = \int_\gamma\omega$ by computing the integrals.

4.3 Let $S^1$ be the circle and define $\omega = (1/2\pi)\, d\theta$, where $\theta$ is the angular coordinate.
a) Let $\varphi\colon S^1 \to S^1$ be a differentiable map. Show that $\int\varphi^*\omega$ is an integer. This integer is called the degree of $\varphi$ and is denoted by $\deg\varphi$.
b) Let $\varphi_t$ be a collection of maps (one for each $t$) which depends differentiably on $t$. Show that $\deg\varphi_0 = \deg\varphi_1$.
c) Let us regard $S^1$ as the unit circle in the complex numbers. Let $f$ be some function on the complex numbers, and suppose that $f(z) \neq 0$ for $|z| = r$. Define $\varphi_{r,f}$ by setting
\[ \varphi_{r,f}(e^{i\theta}) = f(re^{i\theta})/|f(re^{i\theta})|. \]
Suppose $f(z) = z^n$. Compute $\deg\varphi_{r,f}$ for $r \neq 0$.
d) Let $f$ be a polynomial of degree $n \ge 1$. Thus $f(z) = a_nz^n + a_{n-1}z^{n-1} + \cdots + a_0$, where $a_n \neq 0$. Show that there is at least one complex number $z_0$ at which $f(z_0) = 0$. [Hint: Suppose the contrary. Then $\varphi_{r,(1/a_n)f}$ is defined for all $0 \le r < \infty$ and $\deg\varphi_{r,(1/a_n)f} = \text{const}$, by (b). Evaluate $\lim_{r\to 0}$ and $\lim_{r\to\infty}$ of this expression.]

Let $X$ be a vector field defined in some neighborhood $U$ of the origin in $\mathbb{R}^2$, and suppose that $X(0) = 0$ and that $X(x) \neq 0$ for $x \neq 0$. Thus $X$ vanishes only at the origin. Define the map $\varphi_r\colon S^1 \to S^1$ by
\[ \varphi_r(e^{i\theta}) = \frac{X(re^{i\theta})}{\|X(re^{i\theta})\|}. \]
This map is defined for sufficiently small $r$. By Exercise 4.3(b) the degree of this map does not depend on $r$. This degree is called the index of the vector field $X$ at the origin.

4.4 Compute the index of
a) $x\,\dfrac{\partial}{\partial x} + y\,\dfrac{\partial}{\partial y}$,
b) $x\,\dfrac{\partial}{\partial x} - y\,\dfrac{\partial}{\partial y}$,
c) $y\,\dfrac{\partial}{\partial x} - x\,\dfrac{\partial}{\partial y}$.
d) Construct a vector field with index 2.
e) Show that the index of $-X$ is the same as the index of $X$ for any vector field $X$.
4.5 Let $X$ be a vector field on an oriented two-dimensional manifold, and suppose that $X(p) = 0$ for some $p \in M$ and that $X$ does not vanish at any other point in a small neighborhood of $p$. By choosing an oriented chart mapping $p$ into zero, we get a vector field on $\mathbb{R}^2$ vanishing at the origin. Show that the index of this vector field does not depend on the choice of chart. We can thus define the index of $X$ at $p$.

4.6 a) On the sphere $S^2$ let $X$ be a vector field which is tangent to the meridian circles everywhere and vanishes only at the north and south poles. What is its index at each pole?
b) Let $Y$ be a vector field which is tangent to the circles of latitude everywhere and vanishes only at the north and south poles. What is its index at each pole?

5. SOME ILLUSTRATIONS OF STOKES' THEOREM

As a simple but important corollary of Theorem 4.2, we state:

Theorem 5.1. Let $\varphi\colon M_1 \to M_2$ be a differentiable map of the oriented $k$-dimensional manifold $M_1$ into the $n$-dimensional manifold $M_2$. Let $\omega$ be a form of degree $k - 1$ on $M_2$, and let $D \subset M_1$ be a domain with almost regular boundary on $M_1$. Then we have
\[ \int_{\varphi(\partial D)}\omega = \int_{\varphi(D)} d\omega. \tag{5.1} \]
Equation (5.1) follows directly from (4.2) and from the fact that $\varphi^*d = d\varphi^*$.

We can regard the right-hand side of (5.1) as the integral of $d\omega$ over the "oriented $k$-dimensional hypersurface" $\varphi(D)$. Equation (5.1) says that this integral is equal to the integral of $\omega$ over the $(k-1)$-dimensional hypersurface $\varphi(\partial D)$.

We now give a simple application of Theorem 5.1. Let $C_0\colon [0, 1] \to M$ and $C_1\colon [0, 1] \to M$ be two differentiable curves with $C_0(0) = C_1(0) = p$ and $C_0(1) = C_1(1) = q$. (See Fig. 11.10.) We say that $C_0$ and $C_1$ are (differentiably) homotopic if there exists a differentiable map $\varphi$ of a neighborhood of the unit square $[0,1] \times [0,1] \subset \mathbb{R}^2$ into $M$ such that $\varphi(t, 0) = C_0(t)$, $\varphi(t, 1) = C_1(t)$, $\varphi(0, s) = p$, and $\varphi(1, s) = q$. (See Fig. 11.11.) For each value of $s$ we get the curve $C_s$ given by $C_s(t) = \varphi(t, s)$. We think of $\varphi$ as providing a differentiable "deformation" of the curve $C_0$ into the curve $C_1$.
Proposition 5.1. Let $C_0$ and $C_1$ be differentiably homotopic curves, and let $\omega$ be a linear differential form on $M$ with $d\omega = 0$. Then
\[ \int_{C_0}\omega = \int_{C_1}\omega. \tag{5.2} \]

Proof. In fact, letting $\square$ denote the unit square $[0,1] \times [0,1]$, we have
\[ \int_{\varphi(\partial\square)}\omega = \int_{\varphi(\square)} d\omega = 0. \]
But $\int_{\varphi(\partial\square)}\omega$ is the sum of the four terms corresponding to the four sides of the square. The two vertical sides ($t = 0$ and $t = 1$) contribute nothing, since $\varphi$ maps these curves into points. The top gives $-\int_{C_1}\omega$ (because of the counterclockwise orientation), and the bottom gives $\int_{C_0}\omega$. Thus $\int_{C_0}\omega - \int_{C_1}\omega = 0$, proving the proposition. $\square$

It is easy to see that the proposition extends without difficulty to piecewise differentiable curves and piecewise differentiable homotopies. Let us say that two piecewise differentiable curves $C_0$ and $C_1$ are (piecewise differentiably) homotopic if there is a continuous map $\varphi$ of $[0,1] \times [0,1] \to M$ such that
i) $\varphi(0, s) = p$, $\varphi(1, s) = q$;
ii) $\varphi(t, 0) = C_0(t)$, $\varphi(t, 1) = C_1(t)$;
iii) there are a finite number of points $t_0 < t_1 < \cdots < t_m$ such that $\varphi$ coincides with the restriction of a differentiable map defined in some neighborhood of each rectangle $[t_i, t_{i+1}] \times [0, 1]$. (See Fig. 11.12.)

To verify that Proposition 5.1 holds for the case of piecewise differentiable homotopies, we apply Stokes' theorem to each rectangle and observe that the contributions of the interior vertical lines cancel one another.

We say that a manifold $M$ is connected if every pair of points can be joined by a (piecewise differentiable) curve. Thus $\mathbb{R}^n$, for example, is connected. We say that $M$ is simply connected if all (piecewise differentiable) curves joining the same two points are (piecewise differentiably) homotopic. (Note that the circle $S^1$ is not simply connected.) Let us verify that $\mathbb{R}^n$ is simply connected. If $C_0$ and $C_1$ are two curves, let $\varphi\colon [0,1] \times [0,1] \to \mathbb{R}^n$ be given by
\[ \varphi(t, s) = (1 - s)C_0(t) + sC_1(t). \]
It is clear that $\varphi$ has all the desired properties.

Proposition 5.2. Let $M$ be a connected and simply connected manifold, and let $o \in M$. Let $\omega \in \Lambda^1(M)$ satisfy $d\omega = 0$. For any $x \in M$ let $f(x) = \int_C\omega$, where $C$ is some piecewise differentiable curve joining $o$ to $x$. The function $f$ is well defined and differentiable, and $df = \omega$.
Proof. It follows from Proposition 5.1 that $f$ is well defined. If $C_0$ and $C_1$ are two curves joining $o$ to $x$, then they are homotopic, and so $\int_{C_0}\omega = \int_{C_1}\omega$. It is clear that $f$ is continuous, since $f(x) - f(y) = \int_D\omega$, where $D$ is any curve joining $y$ to $x$ (Fig. 11.13).

To check that $f$ is differentiable, let $(U, \alpha)$ be a chart about $x$ with coordinates $\langle x^1, \ldots, x^n\rangle$. Then
\[ f(x^1, \ldots, x^i + h, \ldots, x^n) - f(x^1, \ldots, x^n) = \int_C\omega, \]
where $C$ is any curve joining $p$ to $q$, where $\alpha(p) = (x^1, \ldots, x^i, \ldots, x^n)$ and $\alpha(q) = (x^1, \ldots, x^i + h, \ldots, x^n)$. We can take $C$ to be the curve given by
\[ \alpha\circ C(t) = (x^1, \ldots, x^i + ht, \ldots, x^n). \]
If $\omega = a_1\, dx^1 + \cdots + a_n\, dx^n$, then
\[ \int_C\omega = \int_0^1 ha_i\, dt = \int_0^h a_i(x^1, \ldots, x^i + s, \ldots, x^n)\, ds. \]
(See Fig. 11.14.) Thus
\[ \lim_{h\to 0}\frac{1}{h}\bigl[f(x^1, \ldots, x^i + h, \ldots, x^n) - f(x^1, \ldots, x^n)\bigr] = a_i, \]
that is, $\partial f/\partial x^i = a_i$. This shows that $f$ is differentiable and that $df = \omega$, proving the proposition. $\square$

We have thus established that every $\omega \in \Lambda^1(\mathbb{R}^n)$ with $d\omega = 0$ is of the form $df$. More generally, it can be established that if $\Omega \in \Lambda^k(\mathbb{R}^n)$ satisfies $d\Omega = 0$, then $\Omega = d\omega$ for some $\omega \in \Lambda^{k-1}(\mathbb{R}^n)$.

*This is not true for an arbitrary manifold. For instance, every $\omega \in \Lambda^1(S^1)$ satisfies $d\omega = 0$. Yet the element of angle form (which is, unfortunately, denoted by $d\theta$) is not the $d$ of any function. The fact that $d^2 = 0$ shows that if $\Omega = d\omega$, then $d\Omega = 0$. Thus the space $d[\Lambda^{k-1}(M)] \subset \Lambda^k(M)$ is a subspace of the space $\ker_k d$ of elements in $\Lambda^k(M)$ satisfying $d\Omega = 0$. The quotient space $\ker_k d/d[\Lambda^{k-1}]$ is denoted by $H^k(M)$ and is called the $k$th cohomology group of $M$. If $M$ is compact, it can be shown that $H^k$ is finite-dimensional. It measures (roughly speaking) "how many" $k$-dimensional holes there are in $M$.*
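The angle form can be examined concretely on $\mathbb{R}^2$ minus the origin (a sketch of ours, not the book's): $\omega = (x\,dy - y\,dx)/(x^2 + y^2)$ satisfies $d\omega = 0$, yet its integral around the unit circle is $2\pi \neq 0$, so by (2.10) it cannot be $df$ on any region containing the circle.

```python
import sympy as sp

x, y, t = sp.symbols('x y t')
r2 = x**2 + y**2
a1, a2 = -y / r2, x / r2     # omega = a1 dx + a2 dy, the "d theta" form

# d(omega) = (da2/dx - da1/dy) dx ^ dy vanishes away from the origin:
print(sp.simplify(sp.diff(a2, x) - sp.diff(a1, y)))   # 0

# But the integral over the unit circle C(t) = (cos t, sin t) is 2*pi:
sub = {x: sp.cos(t), y: sp.sin(t)}
integrand = (a1.subs(sub) * sp.diff(sp.cos(t), t)
             + a2.subs(sub) * sp.diff(sp.sin(t), t))
print(sp.integrate(sp.simplify(integrand), (t, 0, 2 * sp.pi)))  # 2*pi
```

This is the simplest witness that $H^1$ of the punctured plane (equivalently, of $S^1$) is nonzero: a closed 1-form that is not exact.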
6. THE LIE DERIVATIVE OF A DIFFERENTIAL FORM

Let $M$ be a differentiable manifold, and let $\varphi$ be a flow on $M$ with infinitesimal generator $X$. For any $\omega \in \Lambda^k(M)$ we can consider the expression

$$\frac{\varphi_t^*\omega - \omega}{t}.$$

It is not difficult (using local expressions) to verify that the limit as $t \to 0$ exists and is again an element of $\Lambda^k(M)$, which we denote by $D_X\omega$. The purpose of this section is to provide an effective formula for computing $D_X\omega$. For this purpose, we first collect some properties of $D_X$. First of all, we have that it is linear:

$$D_X(c_1\omega_1 + c_2\omega_2) = c_1\,D_X\omega_1 + c_2\,D_X\omega_2. \tag{6.1}$$

Secondly, we have

$$\varphi_t^*(\omega_1\wedge\omega_2) - \omega_1\wedge\omega_2 = (\varphi_t^*\omega_1)\wedge(\varphi_t^*\omega_2) - \omega_1\wedge\omega_2 = (\varphi_t^*\omega_1)\wedge(\varphi_t^*\omega_2) - (\varphi_t^*\omega_1)\wedge\omega_2 + (\varphi_t^*\omega_1)\wedge\omega_2 - \omega_1\wedge\omega_2.$$

Dividing by $t$ and passing to the limit, we see that

$$D_X(\omega_1\wedge\omega_2) = (D_X\omega_1)\wedge\omega_2 + \omega_1\wedge D_X\omega_2. \tag{6.2}$$

Finally, since $\varphi_t^*\,d = d\,\varphi_t^*$, we have

$$D_X\,d\omega = d(D_X\omega). \tag{6.3}$$

Actually, these three formulas suffice for the computation. If

$$\omega = \sum a_{i_1,\dots,i_k}\,dx^{i_1}\wedge\cdots\wedge dx^{i_k},$$

then

$$D_X\omega = \sum D_X\!\left(a_{i_1,\dots,i_k}\,dx^{i_1}\wedge\cdots\wedge dx^{i_k}\right) \qquad\text{by (6.1)}$$

$$= \sum\left[(D_Xa_{i_1,\dots,i_k})\,dx^{i_1}\wedge\cdots\wedge dx^{i_k} + a_{i_1,\dots,i_k}(D_X\,dx^{i_1})\wedge\cdots\wedge dx^{i_k} + \cdots + a_{i_1,\dots,i_k}\,dx^{i_1}\wedge\cdots\wedge(D_X\,dx^{i_k})\right]$$

by repeated use of (6.2)

$$= \sum\left[(D_Xa_{i_1,\dots,i_k})\,dx^{i_1}\wedge\cdots\wedge dx^{i_k} + a_{i_1,\dots,i_k}\,d(D_Xx^{i_1})\wedge\cdots\wedge dx^{i_k} + \cdots + a_{i_1,\dots,i_k}\,dx^{i_1}\wedge\cdots\wedge d(D_Xx^{i_k})\right]$$

by (6.3). Since this expression is rather cumbersome (the $d(D_Xx^i)$ have to be expanded and the terms collected), we shall derive a simpler and more convenient expression for $D_X\omega$. In order to do this, we make an algebraic detour. Recall that the operator $d\colon \Lambda^k(M) \to \Lambda^{k+1}(M)$ is linear and satisfies the identity

$$d(\omega_1\wedge\omega_2) = d\omega_1\wedge\omega_2 + (-1)^k\omega_1\wedge d\omega_2 \quad\text{if } \omega_1 \in \Lambda^k(M). \tag{6.4}$$

More generally, any (sequence of linear) maps $\theta$ of $\Lambda^k(M) \to \Lambda^{k+1}(M)$
satisfying the identity

$$\theta(\omega_1\wedge\omega_2) = \theta\omega_1\wedge\omega_2 + (-1)^k\omega_1\wedge\theta\omega_2 \tag{6.4'}$$

and

$$\operatorname{supp}\theta\omega \subset \operatorname{supp}\omega\,\dagger \tag{6.5}$$

will be called an antiderivation of the algebra $\Lambda(M)$. It follows from (6.5) that if $\omega_1 \equiv \omega_2$ on an open set $U$, then $\theta(\omega_1) \equiv \theta(\omega_2)$ on $U$. Now about every $x \in M$ we can find a neighborhood $U$ and functions $x^1, \dots, x^n$, so that $\omega \in \Lambda^k(M)$ can be written as

$$\omega = \sum a_{i_1,\dots,i_k}\,dx^{i_1}\wedge\cdots\wedge dx^{i_k} \quad\text{on } U. \tag{6.6}$$

Then by repeated use of (6.4') we have

$$\theta(\omega) = \sum\left[\theta(a_{i_1,\dots,i_k})\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_k} + a_{i_1,\dots,i_k}\,\theta(dx^{i_1})\wedge\cdots\wedge dx^{i_k} + \cdots + (-1)^{k-1}a_{i_1,\dots,i_k}\,dx^{i_1}\wedge\cdots\wedge\theta(dx^{i_k})\right]. \tag{6.7}$$

We thus arrive at the important conclusion:

Proposition 6.1. Any antiderivation $\theta\colon \Lambda^k(M) \to \Lambda^{k+1}(M)$, $k = 0, \dots, n$, is uniquely determined by its action on $\Lambda^0(M)$ and $\Lambda^1(M)$. That is, if $\theta_1(\omega) = \theta_2(\omega)$ for all $\omega \in \Lambda^0(M)$ and $\Lambda^1(M)$, then $\theta_1(\Omega) = \theta_2(\Omega)$ for $\Omega \in \Lambda^k(M)$ for any $k$.

Now suppose we are given maps $\theta\colon \Lambda^0(M) \to \Lambda^1(M)$ and $\theta\colon \Lambda^1(M) \to \Lambda^2(M)$ which satisfy (6.5) and (6.4') where it makes sense, that is,

$$\theta(fg) = \theta(f)g + f\theta(g) \quad\text{and}\quad \theta(f\omega) = \theta(f)\wedge\omega + f\theta(\omega). \tag{6.8}$$

Then any chart $(U, \alpha)$ defines $\theta\colon \Lambda^k(U) \to \Lambda^{k+1}(U)$ by (6.7). This gives an antiderivation $\theta_U$ on $U$, as can easily be checked by the use of the argument on pp. 440-441. By the uniqueness argument, if $(W, \beta)$ is a second chart, the antiderivations $\theta_U$ and $\theta_W$ coincide on $U \cap W$. Therefore, Eq. (6.7) is consistent and yields a well-defined antiderivation on $\Lambda(M)$. (Observe that we have just repeated about two-thirds of the proof of Theorem 3.1 for the more general context of any antiderivation.)

† This condition is actually a consequence of (6.4'). In fact, let $U$ be an open set containing $\operatorname{supp}\omega$. Since $\{U, M - \operatorname{supp}\omega\}$ is an open covering of $M$, we can find a partition of unity subordinate to it. In particular, we can find a $C^\infty$-function $\varphi$ which is identically one on $\operatorname{supp}\omega$ and vanishes outside $U$. Then $\omega = \varphi\omega$, so that

$$\theta(\omega) = \theta(\varphi\omega) = \theta(\varphi)\wedge\omega + \varphi\,\theta(\omega).$$

Thus $\operatorname{supp}\theta(\omega) \subset \operatorname{supp}\omega \cup \operatorname{supp}\varphi \subset U$. Since $U$ is an arbitrary neighborhood of $\operatorname{supp}\omega$, we conclude that $\operatorname{supp}\theta(\omega) \subset \operatorname{supp}\omega$.
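As a quick added check of (6.7), note that $d$ itself is an antiderivation (with $r = 1$), and since $d(dx^{i_j}) = 0$, formula (6.7) collapses to the familiar local expression

$$d\!\left(a\,dx^{i_1}\wedge\cdots\wedge dx^{i_k}\right) = da\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_k};$$

in the language of Proposition 6.1, $d$ is the unique antiderivation extending $f \mapsto df$ on $\Lambda^0(M)$ and $\omega \mapsto d\omega$ on $\Lambda^1(M)$.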
Also observe that in the above arguments, nothing changes if instead of $\theta\colon \Lambda^k(M) \to \Lambda^{k+1}(M)$ we have $\theta\colon \Lambda^k(M) \to \Lambda^{k-1}(M)$. [We take this to mean $\theta(f) = 0$ for $f \in \Lambda^0(M)$.] In fact, the same argument works for $\theta\colon \Lambda^k \to \Lambda^{k+r}$ for any odd integer $r$. We can thus state:

Proposition 6.2. Let $\theta\colon \Lambda^0(M) \to \Lambda^r(M)$ and $\theta\colon \Lambda^1(M) \to \Lambda^{r+1}(M)$ be linear maps satisfying (6.5) and (6.8), where $r$ is odd. Then there exists one and only one way of extending $\theta$ to an antiderivation $\theta\colon \Lambda^k(M) \to \Lambda^{k+r}(M)$ satisfying (6.4').

As an application of this proposition, we will attach an antiderivation $\theta(X)\colon \Lambda^k(M) \to \Lambda^{k-1}(M)$ to every smooth vector field $X$ on $M$. Since $r = -1$, for $f \in \Lambda^0(M)$ we set $\theta(X)f = 0$. For $\omega \in \Lambda^1(M)$ we set

$$\theta(X)\omega = (X, \omega). \tag{6.9}$$

To verify (6.8) means to check that $\theta(X)(f\omega) = f\theta(X)\omega$, that is, that $(X, f\omega) = f(X, \omega)$, which is obvious.

If $f$ is a function and $\theta$ is an antiderivation, we denote by $f\theta$ the map which sends $\omega \mapsto f\theta(\omega)$. It is easy to check that this is again an antiderivation. We can assert the following as a consequence of the uniqueness theorem: Let $X$ and $Y$ be smooth vector fields, and let $f$ and $g$ be smooth functions. Then

$$\theta(fX + gY) = f\theta(X) + g\theta(Y). \tag{6.10}$$

By the proposition, it suffices to check (6.10) on all $\omega \in \Lambda^1(M)$. By (6.9), this is just $(fX + gY, \omega) = f(X, \omega) + g(Y, \omega)$, which is obvious.

In particular, in a chart $(U, \alpha)$, if

$$X = X^1\frac{\partial}{\partial x^1} + \cdots + X^n\frac{\partial}{\partial x^n},$$

then

$$\theta(X) = \sum X^i\,\theta\!\left(\frac{\partial}{\partial x^i}\right).$$

To evaluate $\theta(\partial/\partial x^i)$, we use (6.8) and the fact that

$$\theta\!\left(\frac{\partial}{\partial x^i}\right)dx^j = \left(\frac{\partial}{\partial x^i},\, dx^j\right) = \begin{cases} 0 & \text{if } i \ne j, \\ 1 & \text{if } i = j.\end{cases}$$
Thus, for example, $\theta(\partial/\partial x^i)\,dx^p\wedge dx^q = 0$ if neither $p = i$ nor $q = i$, while

$$\theta(\partial/\partial x^i)(dx^i\wedge dx^q) = dx^q, \qquad \theta(\partial/\partial x^i)(dx^p\wedge dx^i) = -dx^p, \quad\text{etc.}$$

Let us call a (sequence of) map(s) $D\colon \Lambda^k(M) \to \Lambda^{k+s}(M)$, where $s$ is even, a derivation if it satisfies (6.5) and

$$D(\omega_1\wedge\omega_2) = D\omega_1\wedge\omega_2 + \omega_1\wedge D\omega_2. \tag{6.11}$$

Since $s$ is even, this is consistent. The most important example is $D_X$, where $s = 0$. Then (6.11) is just (6.2). All the previous arguments about existence and uniqueness of extensions apply unchanged to derivations, as can easily be checked. We can therefore assert:

Proposition 6.3. Let $D\colon \Lambda^0(M) \to \Lambda^s(M)$ and $D\colon \Lambda^1(M) \to \Lambda^{1+s}(M)$, where $s$ is even, be maps satisfying (6.5) and (6.8) (with $\theta$ replaced by $D$). Then there exists one and only one way of extending $D$ to a derivation of $\Lambda(M)$.

We need one further algebraic fact.

Proposition 6.4. Let $\theta_1\colon \Lambda^k \to \Lambda^{k+r_1}$ and $\theta_2\colon \Lambda^k \to \Lambda^{k+r_2}$ be antiderivations. Then $\theta_1\theta_2 + \theta_2\theta_1\colon \Lambda^k \to \Lambda^{k+r_1+r_2}$ is a derivation.

Proof. Since $r_1$ and $r_2$ are both odd, $r_1 + r_2$ is even. Equation (6.5) obviously holds. To verify (6.11), let $\omega_1 \in \Lambda^k(M)$. Then

$$\theta_1\theta_2(\omega_1\wedge\omega_2) = \theta_1\left[\theta_2\omega_1\wedge\omega_2 + (-1)^k\omega_1\wedge\theta_2\omega_2\right] = \theta_1\theta_2\omega_1\wedge\omega_2 + (-1)^{k+r_2}\theta_2\omega_1\wedge\theta_1\omega_2 + (-1)^k\theta_1\omega_1\wedge\theta_2\omega_2 + \omega_1\wedge\theta_1\theta_2\omega_2.$$

Similarly,

$$\theta_2\theta_1(\omega_1\wedge\omega_2) = \theta_2\theta_1\omega_1\wedge\omega_2 + (-1)^{k+r_1}\theta_1\omega_1\wedge\theta_2\omega_2 + (-1)^k\theta_2\omega_1\wedge\theta_1\omega_2 + \omega_1\wedge\theta_2\theta_1\omega_2.$$

Since $r_1$ and $r_2$ are both odd, the middle terms cancel when we add. Hence we get

$$(\theta_1\theta_2 + \theta_2\theta_1)(\omega_1\wedge\omega_2) = (\theta_1\theta_2 + \theta_2\theta_1)\omega_1\wedge\omega_2 + \omega_1\wedge(\theta_1\theta_2 + \theta_2\theta_1)\omega_2. \;\square$$

As a first application of Proposition 6.3, we observe that

$$\theta(X)\circ\theta(Y) = -\theta(Y)\circ\theta(X). \tag{6.12}$$

In fact, by Proposition 6.4, $\theta(X)\theta(Y) + \theta(Y)\theta(X)$ is a derivation of degree $-2$; that is, it vanishes on $\Lambda^0$ and $\Lambda^1$. It must therefore vanish identically. We could, of course, directly verify (6.12) from the local description of $\theta(X)$ and $\theta(Y)$.

As a more serious use of Proposition 6.4, consider $\theta(X)\circ d + d\circ\theta(X)$, where $X$ is a smooth vector field. Since $d\colon \Lambda^k \to \Lambda^{k+1}$ and $\theta(X)\colon \Lambda^k \to \Lambda^{k-1}$, we conclude that $\theta(X)\circ d + d\circ\theta(X)\colon \Lambda^k \to \Lambda^k$. We now assert the main formula of this section:

$$D_X = \theta(X)\circ d + d\circ\theta(X). \tag{6.13}$$
Since both sides of (6.13) are derivations, it suffices to check (6.13) for functions and linear differential forms. If $f \in \Lambda^0(M)$, then $\theta(X)f = 0$. Thus, by (6.9), Eq. (6.13) becomes $D_Xf = (X, df)$, which we know holds. Next we must verify (6.13) for $\omega \in \Lambda^1(M)$. By (6.5), it suffices to verify (6.13) locally. If we write $\omega = a_1\,dx^1 + \cdots + a_n\,dx^n$, it suffices, by linearity, to verify (6.13) for each term $a_i\,dx^i$. Since both sides of (6.13) are derivations, we have

$$D_X(a_i\,dx^i) = (D_Xa_i)\,dx^i + a_i(D_X\,dx^i)$$

and

$$[\theta(X)\,d + d\,\theta(X)](a_i\,dx^i) = \left([\theta(X)\,d + d\,\theta(X)]a_i\right)dx^i + a_i[\theta(X)\,d + d\,\theta(X)]\,dx^i.$$

Since we have verified (6.13) for functions, we only have to check (6.13) for $dx^i$. Now

$$D_X\,dx^i = d(D_Xx^i) \qquad\text{by (6.3)},$$

and

$$[\theta(X)\,d + d\,\theta(X)]\,dx^i = d\,\theta(X)\,dx^i = d(X, dx^i) = d\,D_Xx^i.$$

This completes the proof of (6.13).

In many circumstances it will be convenient to free the letter $\theta$ for other uses. We shall therefore occasionally adopt the notation

$$X \lrcorner\,\omega = \theta(X)\omega.$$

The symbol $\lrcorner$ is called the interior product. $X \lrcorner\,\omega$ is the interior product of the form $\omega$ with the vector field $X$. If $\omega \in \Lambda^k$, then $X \lrcorner\,\omega \in \Lambda^{k-1}$. Equation (6.13) can then be rewritten as

$$D_X\omega = X \lrcorner\,d\omega + d(X \lrcorner\,\omega). \tag{6.14}$$

Let us see what (6.14) says in some special cases in terms of local coordinates. If

$$\omega = a_1\,dx^1 + \cdots + a_n\,dx^n \quad\text{and}\quad X = X^1\frac{\partial}{\partial x^1} + \cdots + X^n\frac{\partial}{\partial x^n},$$

then

$$d\omega = \sum_{i,j}\frac{\partial a_i}{\partial x^j}\,dx^j\wedge dx^i.$$

Hence

$$X \lrcorner\,d\omega = \sum_{i,j}\frac{\partial a_i}{\partial x^j}\left(X^j\,dx^i - X^i\,dx^j\right),$$

while

$$X \lrcorner\,\omega = \sum_j a_jX^j,$$

so

$$d(X \lrcorner\,\omega) = \sum_{i,j}X^j\frac{\partial a_j}{\partial x^i}\,dx^i + \sum_{i,j}a_j\frac{\partial X^j}{\partial x^i}\,dx^i.$$

Thus

$$D_X\omega = \sum_i\left(\sum_j X^j\frac{\partial a_i}{\partial x^j} + a_j\frac{\partial X^j}{\partial x^i}\right)dx^i,$$

which agrees with Eq. (7.12') of Chapter 9.
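Here is a small added example of (6.14) at work. On $\mathbb{R}^2$ take $X = \partial/\partial x$ and $\omega = x\,dy$. Then $X \lrcorner\,\omega = 0$ and

$$X \lrcorner\,d\omega = X \lrcorner\,(dx\wedge dy) = dy, \qquad\text{so}\qquad D_X\omega = X \lrcorner\,d\omega + d(X \lrcorner\,\omega) = dy.$$

This agrees with the definition of $D_X$ by the flow $\varphi_t(x, y) = (x + t, y)$:

$$\frac{\varphi_t^*\omega - \omega}{t} = \frac{(x + t)\,dy - x\,dy}{t} = dy \quad\text{for every } t.$$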
As a second illustration, let $\Omega = a\,dx^1\wedge\cdots\wedge dx^n$, where $n = \dim M$. Then $d\Omega = 0$, so (6.14) reduces to $D_X\Omega = d(X \lrcorner\,\Omega)$. If $X = \sum X^i(\partial/\partial x^i)$, then

$$X \lrcorner\,\Omega = a\sum X^i\left(\frac{\partial}{\partial x^i}\right)\lrcorner\,(dx^1\wedge\cdots\wedge dx^n) = \sum(-1)^{i-1}aX^i\,dx^1\wedge\cdots\wedge dx^{i-1}\wedge dx^{i+1}\wedge\cdots\wedge dx^n,$$

which is merely the formula introduced at the end of Section 4. Then

$$D_X\Omega = d(X \lrcorner\,\Omega) = \left(\sum\frac{\partial(aX^i)}{\partial x^i}\right)dx^1\wedge\cdots\wedge dx^n.$$

Since we can always locally identify a density with an $n$-form by identifying $\rho$ with $\rho_\alpha\,dx^1\wedge\cdots\wedge dx^n$ on $(U, \alpha)$, we obtain another proof of Proposition 5.2 of Chapter 10.
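For instance (an added check), on $\mathbb{R}^2$ with $\Omega = dx\wedge dy$ and $X = x\,\partial/\partial x + y\,\partial/\partial y$,

$$X \lrcorner\,\Omega = x\,dy - y\,dx, \qquad D_X\Omega = d(X \lrcorner\,\Omega) = 2\,dx\wedge dy = 2\Omega,$$

matching $\sum\partial X^i/\partial x^i = 2$: the flow $\varphi_t(x, y) = (e^tx, e^ty)$ magnifies areas by $e^{2t}$, whose derivative at $t = 0$ is $2$.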
Appendix I. "VECTOR ANALYSIS"

We list here the relationships between notions introduced in this chapter and various concepts found in books on "vector analysis", although we shall have no occasion to use them. In oriented Euclidean three-space $\mathbb{E}^3$, there are a number of identifications we can make which give a special form to some of the operations we have introduced in this chapter. First of all, in $\mathbb{E}^3$, as in any Riemann space, we can (and shall) identify vector fields with linear differential forms. Thus for any function $f$ we can regard $df$ as a vector field. As such, it is called $\operatorname{grad} f$. Thus, in $\mathbb{E}^3$, in terms of rectangular coordinates $x, y, z$,

$$\operatorname{grad} f = \left\langle\frac{\partial f}{\partial x},\; \frac{\partial f}{\partial y},\; \frac{\partial f}{\partial z}\right\rangle,$$

where we have also identified vector fields on $\mathbb{E}^3$ with $\mathbb{E}^3$-valued functions. Secondly, since $\mathbb{E}^3$ is oriented, we can, via the $*$-operator (acting on each $T_x$), identify $\Lambda^2(\mathbb{E}^3)$ with $\Lambda^1(\mathbb{E}^3)$. Recall that $*$ is given by

$$*(dx\wedge dy) = dz, \qquad *(dx\wedge dz) = -dy, \qquad *(dy\wedge dz) = dx. \tag{I.1}$$

In particular, if $\omega_1 = \langle P, Q, R\rangle = P\,dx + Q\,dy + R\,dz$ and $\omega_2 = \langle L, M, N\rangle = L\,dx + M\,dy + N\,dz$, we can introduce the so-called "vector product" of $\omega_1$ with $\omega_2$. It is defined by

$$\omega_1\times\omega_2 = *(\omega_1\wedge\omega_2)$$

and is given [in view of (I.1)] by

$$\langle P, Q, R\rangle\times\langle L, M, N\rangle = \langle QN - RM,\; RL - PN,\; PM - QL\rangle.$$

Also we introduce the operator $\operatorname{curl}\omega = *\,d\omega$. Thus, if $\omega = \langle P, Q, R\rangle$, we have

$$\operatorname{curl}\omega = \left\langle\frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z},\; \frac{\partial P}{\partial z} - \frac{\partial R}{\partial x},\; \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right\rangle.$$

Consider an oriented surface in $\mathbb{E}^3$; i.e., let $\varphi\colon S \to \mathbb{E}^3$. Let $\Omega$ be the volume form on $S$ associated with the Riemann metric induced by $\varphi$. By definition, if $\xi_1, \xi_2 \in T_x(S)$, then $\Omega(\xi_1, \xi_2) = dV(\varphi_*\xi_1, \varphi_*\xi_2, n)$, where $dV$ is the volume element of $\mathbb{E}^3$ and $n$ is the unit normal vector. Another way of writing this is to say that $\Omega(\xi_1, \xi_2) = U(\varphi_*\xi_1, \varphi_*\xi_2)$, where $U = *n$ when we regard $n$ as a differential form. Now let $\bar\omega$ be a form in $\mathbb{E}^3$, and suppose that $\varphi^*\bar\omega = f\Omega$ for some function $f$. Then $f(x) = (\bar\omega, *n)(\varphi(x))$. Thus

$$\int_S\varphi^*(\bar\omega) = \int_S f\Omega = \int_S(\bar\omega, *n)\Omega = \int_S(*\bar\omega, n)\Omega.$$

Applying this to $\bar\omega = d\omega$, where $\omega = P\,dx + Q\,dy + R\,dz$, we can rewrite Stokes' theorem as

$$\int_C\omega = \int_C P\,dx + Q\,dy + R\,dz = \int_S(\operatorname{curl}\omega, n)\Omega,$$

where $S$ is some surface spanning the closed curve $C$.
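For example (added), if $\omega = -y\,dx + x\,dy$ — the field of a rigid rotation about the $z$-axis — then

$$d\omega = -dy\wedge dx + dx\wedge dy = 2\,dx\wedge dy, \qquad \operatorname{curl}\omega = *\,d\omega = 2\,dz = \langle 0, 0, 2\rangle,$$

twice the angular velocity, as the classical formula predicts.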
If we apply the remark to the case $\bar\omega = *\omega$ and $S = \partial D$, we obtain, since $** = \mathrm{id}$ (for $n = 3$),

$$\int_{\partial D}(\omega, n)\Omega = \int_D d*\omega.$$

Note that

$$d*\omega = \left(\frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}\right)dx\wedge dy\wedge dz,$$

which we write as $\operatorname{div}\omega$; that is, $\operatorname{div}\omega = d*\omega$. (It is in fact $D_w\,dV = d(w \lrcorner\,dV)$, where $dV$ is the volume element and $w$ is the vector field corresponding to $\omega$.) Thus we get the divergence theorem again. Note that

$$\operatorname{curl}(\operatorname{grad} f) = *\,d\,df \equiv 0 \quad\text{and}\quad \operatorname{div}(\operatorname{curl}\omega) = d**d\omega = d^2\omega = 0,$$

since $d^2 = 0$.

Appendix II. ELEMENTARY DIFFERENTIAL GEOMETRY OF SURFACES IN $\mathbb{E}^3$

For purposes of computation, it is convenient to introduce the notion of a vector-valued differential form. Let $E$ be a vector space, and let $M$ be a differentiable manifold. By an $E$-valued exterior differential form $\Omega$ of degree $p$ we shall mean a rule which assigns an element $\Omega_x$ to each $x \in M$, where $\Omega_x$ is an antisymmetric $E$-valued multilinear function of degree $p$ on $T_x(M)$. For instance, if $p = 0$, then an $E$-valued zero-form is just a function on $M$ with values in $E$. An $E$-valued one-form is a rule which assigns an element of $E$ to each tangent vector $\xi$ at any point of $M$, and so on.

Suppose that $E$ is finite-dimensional and that $\{e_1, \dots, e_N\}$ is a basis for $E$. Let $\Omega_1, \dots, \Omega_N$ be (real-valued) $p$-forms. We can then consider the $E$-valued $p$-form $\Omega = \Omega_1e_1 + \cdots + \Omega_Ne_N$, where, for any $p$ vectors $\xi_1, \dots, \xi_p$ in $T_x(M)$, we have

$$\Omega_x(\xi_1, \dots, \xi_p) = \Omega_{1x}(\xi_1, \dots, \xi_p)e_1 + \cdots + \Omega_{Nx}(\xi_1, \dots, \xi_p)e_N.$$

Conversely, if $\Omega$ is an $E$-valued form, then real-valued forms $\Omega_1, \dots, \Omega_N$ can be defined by the above equation. In short, once a basis for an $N$-dimensional vector space $E$ has been chosen, giving an $E$-valued differential form $\Omega$ is the same as giving $N$ real-valued forms, and we can write

$$\Omega = \sum_{i=1}^N\Omega_ie_i \quad\text{or}\quad \Omega = \langle\Omega_1, \dots, \Omega_N\rangle.$$

The rules for local description of $E$-valued forms, as well as the transition laws, are similar to those of real-valued forms, so we won't describe them in detail. For the sake of simplicity, we shall restrict our attention to the case where $E$ is finite-dimensional, although for the most part this assumption is unnecessary.

If $\omega$ is a real-valued differential form of degree $p$, and if $\Omega$ is an $E$-valued form of degree $q$, then we can define the form $\omega\wedge\Omega$ in the obvious way. In terms of a basis, if $\Omega = \langle\Omega_1, \dots, \Omega_N\rangle$, then $\omega\wedge\Omega = \langle\omega\wedge\Omega_1, \dots, \omega\wedge\Omega_N\rangle$.
More generally, let $E$ and $F$ be (finite-dimensional) vector spaces, and let $\#$ be a bilinear map of $E\times F \to G$, where $G$ is a third vector space. Let $\{e_1, \dots, e_N\}$ be a basis for $E$, let $\{f_1, \dots, f_M\}$ be a basis for $F$, and let $\{g_1, \dots, g_K\}$ be a basis for $G$. Suppose that the map $\#$ is given by

$$\#\{e_i, f_j\} = \sum_k a^k_{ij}\,g_k.$$

Then if $W = \sum W_ie_i$ is an $E$-valued form and $\Omega = \sum\Omega_jf_j$ is an $F$-valued form, we define the $G$-valued form $W\wedge\Omega$ by

$$W\wedge\Omega = \sum_k\left(\sum_{i,j}a^k_{ij}\,W_i\wedge\Omega_j\right)g_k.$$

It is easy to check that this does not depend on the particular bases chosen.

We shall want to use this notion primarily in two contexts. First of all, we will be interested in the case where $E = F$ and $G = \mathbb{R}$, so that $\#$ is a bilinear form on $E$. Suppose $\#$ is a scalar product and $e_1, \dots, e_N$ is an orthonormal basis. Then we shall write $(W\wedge\Omega)$ to remind us of the scalar product. If

$$W = \sum W_ie_i \quad\text{and}\quad \Omega = \sum\Omega_ie_i,$$

then

$$(W\wedge\Omega) = \sum W_i\wedge\Omega_i.$$

Note that in this case if $W$ is a $p$-form and $\Omega$ is a $q$-form, then

$$(W\wedge\Omega) = (-1)^{pq}(\Omega\wedge W),$$

as in the case of real-valued forms.

The second case we shall be interested in is where $F = G$ and $E = \operatorname{Hom}(F)$, and $\#$ is just the evaluation map evaluating a linear transformation on a vector of $F$ to give another element of $F$. This time, choosing a basis for $F$ determines a basis for $\operatorname{Hom}(F)$, so we can regard $W$ as a matrix of real-valued differential forms. If $W = (W_{ij})$ and $\Omega = \langle\Omega_1, \dots, \Omega_M\rangle$, then

$$W\wedge\Omega = \left\langle\sum_j W_{1j}\wedge\Omega_j,\; \dots,\; \sum_j W_{Mj}\wedge\Omega_j\right\rangle.$$

The operator $d$ makes sense for vector-valued forms just as it did for real-valued forms, and it satisfies the same rules. Thus, if $\Omega = \langle\Omega_1, \dots, \Omega_N\rangle$, then $d\Omega = \langle d\Omega_1, \dots, d\Omega_N\rangle$ and

$$d(W\wedge\Omega) = dW\wedge\Omega + (-1)^pW\wedge d\Omega$$

if $W$ is an $E$-valued form of degree $p$ and $\Omega$ is an $F$-valued form.
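A small added example of the $\operatorname{Hom}(F)$ pairing: for $M = 2$, if

$$W = \begin{pmatrix} 0 & \omega \\ -\omega & 0 \end{pmatrix} \quad\text{and}\quad \Omega = \langle\Omega_1, \Omega_2\rangle, \qquad\text{then}\qquad W\wedge\Omega = \langle\omega\wedge\Omega_2,\; -\omega\wedge\Omega_1\rangle,$$

exactly as in matrix-times-vector multiplication, with multiplication of entries replaced by the wedge product.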
We shall apply the notion of vector-valued forms to develop (mostly in exercise form) some elementary facts about the geometry of oriented surfaces in $\mathbb{E}^3$. Let $M$ be an oriented two-dimensional manifold, and let $\varphi$ be a differentiable map of $M$ into $\mathbb{E}^3$. We shall assume that $\varphi_*$ is not singular at any point of $M$, i.e., that $\varphi$ is an immersion. Thus at each point $p \in M$ the space $\varphi_*(T_p(M))$ is a two-dimensional subspace of $T_{\varphi(p)}(\mathbb{E}^3)$. Since we can identify $T_{\varphi(p)}(\mathbb{E}^3)$ with $\mathbb{E}^3$, we can regard $\varphi_*(T_p(M))$ as a two-dimensional subspace of $\mathbb{E}^3$. (See Fig. 11.15.) Since $M$ is oriented, so is the tangent plane $\varphi_*(T_p(M))$. Therefore, there is a unique unit vector orthogonal to the tangent plane which, together with an oriented basis of the tangent plane, gives an oriented basis of $\mathbb{E}^3$.

Fig. 11.15

This vector is called the normal vector and will be denoted by $n(p)$. We can consider $n$ an $\mathbb{E}^3$-valued function on $M$. Since $\|n\| = 1$, we can regard $n$ as a mapping from $M$ to the unit sphere. Note that $\varphi(M)$ lies in a fixed plane of $\mathbb{E}^3$ if and only if $n = \text{const}$ ($n$ = the normal vector to the plane). We therefore can expect the variation of $n$ to be useful in describing how the surface $\varphi(M)$ is "bending". Let $\Omega$ be the (oriented) area form on $M$ corresponding to the Riemann metric induced by $\varphi$. Let $\Omega_S$ be the (oriented) area form on the unit sphere. Then $n^*(\Omega_S)$ is a two-form on $M$, and therefore we can write

$$n^*\Omega_S = K\Omega.$$

The function $K$ is called the Gaussian curvature of the surface $\varphi(M)$. Note that $K = 0$ if $\varphi(M)$ lies in a plane. Also, $K = 0$ if $\varphi(M)$ is a cylinder (see the exercises).
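For example (an added computation), let $\varphi(M)$ be the sphere of radius $R$ about the origin, with the outward normal. Then $n = \varphi/R$, so $n_*\xi = \varphi_*\xi/R$ for every tangent vector $\xi$, and $n$ shrinks the area of every tangent parallelogram by the factor $1/R^2$. Hence

$$n^*\Omega_S = \frac{1}{R^2}\,\Omega, \qquad K = \frac{1}{R^2}.$$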
For any oriented two-dimensional manifold with a Riemann metric we let $\mathcal{F}$ denote the set of all oriented bases in all tangent spaces of $M$. Thus an element of $\mathcal{F}$ is given by $\langle f_1, f_2\rangle$, where $\langle f_1, f_2\rangle$ is an orthonormal basis of $T_x(M)$ for some $x \in M$. Note that $f_2$ is determined by $f_1$, because of the orientation and the fact that $f_2 \perp f_1$. Thus we can consider $\mathcal{F}$ the space of all tangent vectors of unit length. For each $x \in M$ the set of all unit vectors is just a circle. We leave it to the reader to verify that $\mathcal{F}$ is, in fact, a three-dimensional manifold. We denote by $\pi$ the map that assigns to each $\langle f_1, f_2\rangle$ the point $x$ when $\langle f_1, f_2\rangle$ is an orthonormal basis at $x$. Again, the reader should verify that $\pi$ is a differentiable map of $\mathcal{F}$ onto $M$.

In the case at hand, where the metric comes from an immersion $\varphi$, we define several vector-valued functions $X$, $e_1$, $e_2$, and $e_3$ on $\mathcal{F}$ as follows:

$$X = \varphi\circ\pi, \qquad e_1(\langle f_1, f_2\rangle) = \varphi_*f_1, \qquad e_2(\langle f_1, f_2\rangle) = \varphi_*f_2, \qquad e_3 = n\circ\pi.$$

(In the middle two equations we regard $\varphi_*f_i$ as elements of $\mathbb{E}^3$ via the identification of $T_{\varphi(x)}\mathbb{E}^3$ with $\mathbb{E}^3$.) Thus at any point $z$ of $\mathcal{F}$, the vectors $e_1(z)$, $e_2(z)$, $e_3(z)$ form an orthonormal basis of $\mathbb{E}^3$, where $e_1(z)$ and $e_2(z)$ are tangent to the surface at $\varphi(\pi(z)) = X(z)$ and $e_3(z)$ is orthogonal to this surface. We can therefore write

$$(dX, e_3) = 0 \quad\text{and}\quad (de_3, e_3) = 0.$$

By the first equation we can write

$$dX = \omega_1e_1 + \omega_2e_2, \tag{II.1}$$

where $\omega_1 = (dX, e_1)$ and $\omega_2 = (dX, e_2)$ are (real-valued) linear differential forms defined on $\mathcal{F}$. Similarly, let us define the forms $\omega_{ij}$ by setting $\omega_{ij} = (de_i, e_j)$. Applying $d$ to the equation $(e_i, e_j) = \delta_{ij}$ shows that

$$\omega_{ij} + \omega_{ji} = 0. \tag{II.2}$$

If we apply $d$ to (II.1), we get

$$0 = d\,dX = d\omega_1\,e_1 - \omega_1\wedge de_1 + d\omega_2\,e_2 - \omega_2\wedge de_2.$$

Taking the scalar product of this equation with $e_1$ and $e_2$, respectively, shows (since $\omega_{11} = 0$ and $\omega_{22} = 0$) that

$$d\omega_1 = \omega_{12}\wedge\omega_2 \quad\text{and}\quad d\omega_2 = \omega_{21}\wedge\omega_1. \tag{II.3}$$

If we apply $d$ to the equation

$$de_i = \sum_k\omega_{ik}e_k,$$

we get

$$0 = \sum_k\left(d\omega_{ik}\,e_k - \omega_{ik}\wedge de_k\right),$$

and if we take the scalar product with $e_j$, we get

$$d\omega_{ij} = \sum_k\omega_{ik}\wedge\omega_{kj}. \tag{II.4}$$

If we apply $d$ to the equation $(dX, e_3) = 0$, we get

$$0 = d(dX, e_3) = (dX\wedge de_3) = \left((\omega_1e_1 + \omega_2e_2)\wedge(\omega_{31}e_1 + \omega_{32}e_2)\right),$$

which implies that

$$\omega_1\wedge\omega_{31} + \omega_2\wedge\omega_{32} = 0. \tag{II.5}$$

We will now interpret these equations. Let $z = \langle f_1, f_2\rangle$ be a point of $\mathcal{F}$. For any $\xi \in T_z(\mathcal{F})$ we have

$$(\xi, dX) = (\xi, d(\varphi\circ\pi)) = (\xi, \pi^*\,d\varphi) = (\pi_*\xi, d\varphi) = \varphi_*(\pi_*\xi).$$
Therefore,

$$(\xi, \omega_1) = (\varphi_*\pi_*\xi, e_1) = (\varphi_*\pi_*\xi, \varphi_*f_1) = (\pi_*\xi, f_1), \tag{II.6}$$

since the metric was defined to make $\varphi_*$ an isometry. In other words, $(\xi, \omega_1)$ and $(\xi, \omega_2)$ are the components of $\pi_*\xi$ with respect to the basis $\langle f_1, f_2\rangle$. If $\eta$ is another tangent vector at $z$, then $\omega_1\wedge\omega_2(\xi, \eta)$ is the (oriented) area of the parallelogram spanned by $\pi_*\xi$ and $\pi_*\eta$. In other words,

$$\omega_1\wedge\omega_2 = \pi^*\Omega, \tag{II.7}$$

where $\Omega$ is the oriented area form on $M$. Similarly, since $e_3 = n\circ\pi$, we have

$$n_*\pi_*\xi = (\xi, \omega_{31})e_1 + (\xi, \omega_{32})e_2 \tag{II.8}$$

$$= (\xi, \omega_{31})\varphi_*f_1 + (\xi, \omega_{32})\varphi_*f_2. \tag{II.9}$$

Since we can regard $e_1$ and $e_2$ as an orthonormal basis of the tangent space to the unit sphere, we conclude that $\omega_{31}\wedge\omega_{32}(\xi, \eta)$ is the oriented area on the unit sphere of the parallelogram spanned by $n_*\pi_*\xi$ and $n_*\pi_*\eta$. Thus

$$\omega_{31}\wedge\omega_{32} = \pi^*n^*\Omega_S = \pi^*K\Omega = K\,\omega_1\wedge\omega_2.$$

Let

$$\begin{pmatrix} a & b \\ b' & c \end{pmatrix}$$

be the matrix of the linear transformation $n_*\colon T_x(M) \to T_{n(x)}(S^2)$ in terms of the basis $\langle f_1, f_2\rangle$ of $T_x(M)$ and $\langle e_1, e_2\rangle$ of $T_{n(x)}(S^2)$. Then comparing (II.6) with (II.9) shows that

$$\omega_{31} = a\omega_1 + b\omega_2 \quad\text{and}\quad \omega_{32} = b'\omega_1 + c\omega_2. \tag{II.10}$$

If we substitute this into (II.5), we conclude that $b = b'$, i.e., that the matrix of $n_*$ is symmetric. This suggests that it corresponds to a symmetric bilinear form of some geometrical significance. In other words, we want to consider the quadratic form

$$a\omega_1^2 + 2b\,\omega_1\omega_2 + c\,\omega_2^2$$

[where it is understood that this is the quadratic form on $T_z(\mathcal{F})$ which assigns the number

$$a(\xi, \omega_1)^2 + 2b(\xi, \omega_1)(\xi, \omega_2) + c(\xi, \omega_2)^2$$

to any $\xi \in T_z(\mathcal{F})$].
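For the unit sphere with outward normal (an added check of these formulas), $e_3 = X$, so $de_3 = dX$ and (II.1) gives

$$\omega_{31} = \omega_1, \qquad \omega_{32} = \omega_2, \qquad\text{that is,}\quad a = c = 1,\; b = b' = 0.$$

Then $\omega_{31}\wedge\omega_{32} = \omega_1\wedge\omega_2$, so $K = 1$, and the quadratic form reduces to $\omega_1^2 + \omega_2^2$, the metric itself.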
EXERCISES

II.1 Show that

$$a(\xi, \omega_1)^2 + 2b(\xi, \omega_1)(\xi, \omega_2) + c(\xi, \omega_2)^2 = (\varphi_*\pi_*\xi,\; n_*\pi_*\xi).$$

II.2 The quadratic form which assigns to each $\xi \in T_x(M)$ the number $(\varphi_*\xi, n_*\xi)$ is called the second fundamental form of the surface. We shall denote it by $\mathrm{II}(\xi)$. (What is usually called the first fundamental form is just $\|\xi\|^2$ in our terminology.) Let $C$ be any smooth curve with $C'(0) = \xi$. Show that

$$\mathrm{II}(\xi) = -\left(\frac{d^2\,\varphi\circ C}{dt^2}(0),\; n(x)\right).$$

Thus $\mathrm{II}(\xi)$ measures how much the curve $\varphi\circ C$ is bending in the $n$-direction. Suppose we choose $C$ to be such that $\varphi\circ C$ lies in the plane spanned by $\varphi_*\xi$ and $n(x)$. [Geometrically, this amounts to considering the curve obtained on the surface by intersecting the surface with the plane spanned by $\varphi_*\xi$ and $n(x)$.] Show that $\mathrm{II}(\xi)$ is the curvature of this plane curve. In this sense, the second fundamental form $\mathrm{II}(\xi)$ tells us how much the surface is bending in the direction of $\xi$. Note that

$$K = ac - b^2.$$

Let $\lambda_1$ and $\lambda_2$ be the eigenvalues of the matrix

$$\begin{pmatrix} a & b \\ b & c \end{pmatrix}.$$

Thus

$$\lambda_1 = \max\mathrm{II}(\xi) \quad\text{and}\quad \lambda_2 = \min\mathrm{II}(\xi) \qquad\text{for } \|\xi\| = 1.$$

If $\lambda_1 \ne \lambda_2$, there are two orthogonal eigenvectors which are called the directions of principal curvature of the surface. (Note that they must be orthogonal, since they are eigenvectors of a symmetric matrix.)

If $A$ is a Euclidean motion of $\mathbb{E}^3$, then $\psi = A\circ\varphi$ is another immersion of $M$, and it is easy to check that both the Riemann metric induced by $\psi$ and the second fundamental form associated with $\psi$ coincide with those attached to $\varphi$. What is not so obvious is the converse: If $\psi$ and $\varphi$ induce the same metric and the same second fundamental form, then $\psi = A\circ\varphi$ for some Euclidean motion $A$. We will not prove this fact, although it is a fairly easy consequence of what we have already established.
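As an added illustration, consider the cylinder of radius $R$ about the $z$-axis. At each point one principal direction wraps around the circular cross section and the other runs along the ruling, with (up to the sign fixed by the choice of normal)

$$\lambda_1 = \frac{1}{R}, \qquad \lambda_2 = 0, \qquad K = \det\begin{pmatrix} a & b \\ b & c \end{pmatrix} = \lambda_1\lambda_2 = 0,$$

consistent with the earlier remark that the cylinder has $K = 0$.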
We have seen the meaning of $\omega_1$, $\omega_2$, $\omega_{31}$, and $\omega_{32}$ in geometric terms. Let us now interpret the one remaining form, $\omega_{12}$. Let $\gamma$ be a differentiable curve on $M$. A differentiable family of unit vectors $f_1(s)$ along $\gamma$ [where $f_1(s) \in T_{\gamma(s)}(M)$] is the same as a curve $C$ in $\mathcal{F}$ with $\pi\circ C = \gamma$. [Here $C(s) = \langle f_1(s), f_2(s)\rangle$.] Let us call the family $f_1(s)$ parallel if the unit vectors are all changing normally to the surface in three-space. In other words, $f_1(s)$ is parallel if the vector

$$\frac{d}{ds}\,\varphi_{*\gamma(s)}(f_1(s))$$

is normal to $\varphi(M)$ for all $s$. Let us see how to express this condition. Let $\xi_s$ be the tangent vector to the curve $C$ at $C(s)$. Then, by the definition of $de_1$,

$$\frac{d}{ds}\,\varphi_{*\gamma(s)}(f_1(s)) = (\xi_s, de_1).$$

Note that $\left((\xi_s, de_1),\, e_1(C(s))\right) = 0$ and $\left((\xi_s, de_1),\, e_2(C(s))\right) = (\xi_s, \omega_{12})$. Now $e_1$ and $e_2$ span the tangent space to $\varphi(M)$, so saying that $f_1(s)$ is parallel is the same as saying that $(\xi_s, \omega_{12}) = 0$. Thus $f_1(s)$ is parallel along $\gamma$ if and only if $(\xi_s, \omega_{12}) = 0$.

Let $M$ and $\bar M$ be two-dimensional manifolds with Riemann metrics. Let $u\colon M \to \bar M$ be a differentiable map which is an isometry. Let $\mathcal{F}$ be the manifold of orthonormal bases of $M$, and let $\bar{\mathcal{F}}$ be the manifold of orthonormal bases of $\bar M$. Then $u$ induces a map $\tilde u$ of $\mathcal{F} \to \bar{\mathcal{F}}$ by

$$\tilde u(\langle f_1, f_2\rangle) = \langle u_*f_1, u_*f_2\rangle.$$

Let $\bar\omega_1$ be the differential form on $\bar{\mathcal{F}}$ given, as in (II.6), by

$$(\bar\xi, \bar\omega_1) = (\bar\pi_*\bar\xi, \bar f_1)$$

for $\bar\xi \in T_{\bar z}(\bar{\mathcal{F}})$, where $\bar z = \langle\bar f_1, \bar f_2\rangle$, with the corresponding definition for $\bar\omega_2$, $\omega_1$, and $\omega_2$. Then for any $\xi \in T_z(\mathcal{F})$ we have, since $\bar\pi\circ\tilde u = u\circ\pi$,

$$(\xi, \tilde u^*\bar\omega_1) = (\tilde u_*\xi, \bar\omega_1) = (\bar\pi_*\tilde u_*\xi, u_*f_1) = (u_*\pi_*\xi, u_*f_1) = (\pi_*\xi, f_1) = (\xi, \omega_1).$$

In other words,

$$\tilde u^*\bar\omega_1 = \omega_1 \quad\text{and}\quad \tilde u^*\bar\omega_2 = \omega_2.$$

Now suppose that the metrics on $M$ and $\bar M$ come from immersions $\varphi$ and $\bar\varphi$. Then we get forms $\omega_{ij}$ and $\bar\omega_{ij}$. Now by (II.3) we have

$$\tilde u^*(\bar\omega_{12}\wedge\bar\omega_2) = \tilde u^*\,d\bar\omega_1 = d(\tilde u^*\bar\omega_1) = d\omega_1 = \omega_{12}\wedge\omega_2.$$

Thus

$$\tilde u^*\bar\omega_{12}\wedge\omega_2 = \omega_{12}\wedge\omega_2 \quad\text{and, similarly,}\quad \tilde u^*\bar\omega_{12}\wedge\omega_1 = \omega_{12}\wedge\omega_1.$$

Since the differential forms $\omega_1$ and $\omega_2$ are linearly independent, this can only happen if

$$\tilde u^*\bar\omega_{12} = \omega_{12}.$$

In other words, if the two surfaces $\varphi(M)$ and $\bar\varphi(\bar M)$ are isometric, they have the "same" $\omega_{12}$, that is, the same notion of "parallel vector fields". Observe that a piece of a cylinder and a piece of the plane are isometric, even though they are not congruent by a Euclidean motion. In different terms, while the forms $\omega_{13}$ and $\omega_{23}$ depend on how the surface is immersed in $\mathbb{E}^3$, the form $\omega_{12}$ depends only on the Riemann metric induced by the immersion.
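The cylinder–plane isometry just mentioned can be written out (an added sketch): the map

$$\sigma(u, v) = \left(R\cos\frac{u}{R},\; R\sin\frac{u}{R},\; v\right)$$

wraps a strip of the $(u, v)$-plane onto the cylinder of radius $R$, and

$$d\sigma\cdot d\sigma = \sin^2\!\frac{u}{R}\,du^2 + \cos^2\!\frac{u}{R}\,du^2 + dv^2 = du^2 + dv^2,$$

so $\sigma$ preserves the metric; yet no Euclidean motion carries the strip onto the cylinder, and the forms $\omega_{13}$, $\omega_{23}$ of the two surfaces differ.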
Now we have (II.4):

$$d\omega_{12} = \omega_{13}\wedge\omega_{32} = -\omega_{31}\wedge\omega_{32} = -K\,\omega_1\wedge\omega_2.$$

From this we conclude that the Gaussian curvature $K$ also does not depend on the immersion, but only on the Riemann metric coming from the immersion.

Since $\omega_{12}$ does not depend on $\varphi$, we should be able to define it for an arbitrary two-dimensional manifold with a Riemann metric. Note that the preceding argument shows that $\omega_{12}$ is uniquely determined by Eq. (II.3). It therefore suffices to construct an $\omega_{12}$ on a coordinate neighborhood so as to satisfy (II.3). It will then follow from the uniqueness that any two such coincide to give a well-defined form. Let $U$ be a coordinate neighborhood of $M$, and let $\psi\colon U \to \mathcal{F}$ be a differentiable map such that $\pi\circ\psi = \mathrm{id}$. Thus $\psi$ assigns a basis $\langle f_1, f_2\rangle$ to each $x \in U$, in a differentiable manner. (One possible way to construct $\psi$ is to apply the orthonormalization procedure to the vector fields $\langle\partial/\partial x^1, \partial/\partial x^2\rangle$.) Once we have chosen $\psi$, any basis of $T_x$ differs from $\psi(x)$ by a rotation. If we let $\tau$ denote the (angular) coordinate giving this rotation (so that $\tau$ is only defined mod $2\pi$), then we can use the local coordinates on $U$ together with $\tau$ as coordinates on $\pi^{-1}(U)$. More precisely, if $x^1$ and $x^2$ are local coordinates on $U$, we define $y^1$, $y^2$, $\tau$ by

$$y^1 = x^1\circ\pi, \qquad y^2 = x^2\circ\pi,$$

and $\tau(z)$ is given for $z = \langle e_1, e_2\rangle$ by

$$e_1 = \cos\tau(z)\,f_1 + \sin\tau(z)\,f_2, \qquad e_2 = -\sin\tau(z)\,f_1 + \cos\tau(z)\,f_2, \tag{II.11}$$

where $\langle f_1, f_2\rangle = \psi(x)$ when $\langle e_1, e_2\rangle$ is an orthonormal basis of $T_x(M)$. Now let $\theta_1$ and $\theta_2$ be the linear differential forms on $U$ which at each $x \in M$ form the dual basis to $\psi(x)$. If we set

$$\alpha_1 = \pi^*\theta_1 \quad\text{and}\quad \alpha_2 = \pi^*\theta_2,$$

then (II.11) gives

$$\omega_1 = \cos\tau\,\alpha_1 + \sin\tau\,\alpha_2 \quad\text{and}\quad \omega_2 = -\sin\tau\,\alpha_1 + \cos\tau\,\alpha_2.$$

Note that

$$\omega_1\wedge\omega_2 = \alpha_1\wedge\alpha_2.$$

Define the functions $l_1$ and $l_2$ on $U$ by

$$d\theta_1 = l_1\,\theta_1\wedge\theta_2 \quad\text{and}\quad d\theta_2 = l_2\,\theta_1\wedge\theta_2.$$

Let $k_1 = l_1\circ\pi$ and $k_2 = l_2\circ\pi$, so that

$$d\alpha_1 = k_1\,\alpha_1\wedge\alpha_2 \quad\text{and}\quad d\alpha_2 = k_2\,\alpha_1\wedge\alpha_2.$$

Now

$$d\omega_1 = -\sin\tau\,d\tau\wedge\alpha_1 + \cos\tau\,d\tau\wedge\alpha_2 + (k_1\cos\tau + k_2\sin\tau)\,\alpha_1\wedge\alpha_2,$$

$$d\omega_2 = -\cos\tau\,d\tau\wedge\alpha_1 - \sin\tau\,d\tau\wedge\alpha_2 + (k_2\cos\tau - k_1\sin\tau)\,\alpha_1\wedge\alpha_2.$$
Since $\omega_1\wedge\omega_2 = \alpha_1\wedge\alpha_2$, we can rewrite these equations as

$$d\omega_1 = \left(d\tau + (k_1\cos\tau + k_2\sin\tau)\omega_1\right)\wedge\omega_2,$$

$$d\omega_2 = -\left(d\tau + (k_2\cos\tau - k_1\sin\tau)\omega_2\right)\wedge\omega_1.$$

We thus see that the form

$$\omega_{12} = d\tau + (k_1\cos\tau + k_2\sin\tau)\omega_1 + (k_2\cos\tau - k_1\sin\tau)\omega_2 = d\tau + k_1\alpha_1 + k_2\alpha_2$$

satisfies the desired equations. As before, on any two-dimensional Riemann manifold we will call a family of unit vectors parallel along a curve $\gamma$ if $(\xi_s, \omega_{12}) = 0$. With this definition of parallel translation we can state the following:

Theorem. Let $\gamma$ be any differentiable curve on $M$. Given the unit vector $g_1 \in T_{\gamma(0)}(M)$, there is a unique parallel family of unit vectors $g_1(s)$ along $\gamma$, with $g_1(0) = g_1$. If $g_1'(0)$ is another unit vector of $T_{\gamma(0)}(M)$ differing from $g_1$ by an angle $\sigma$, then $g_1'(s)$ differs from $g_1(s)$ by the same angle $\sigma$ for all $s$.

Proof. It is clearly sufficient (by breaking $\gamma$ up into small pieces if necessary) to prove the theorem for curves $\gamma$ lying entirely in a coordinate chart. Then we can use the local expression for $\omega_{12}$. Let us rewrite the condition for parallel translation along $\gamma(s)$. In terms of local coordinates, the unit vector $g_1(s)$ is given by a function $\tau(s)$, where

$$g_1(s) = \cos\tau(s)\,f_1(\gamma(s)) - \sin\tau(s)\,f_2(\gamma(s))$$

(so that the angular coordinate of the corresponding curve $C(s) = \langle g_1(s), g_2(s)\rangle$ in $\mathcal{F}$ is $-\tau(s)$). Then

$$(\xi_s, \omega_{12}) = -\frac{d\tau(s)}{ds} + (\xi_s, \pi^*(k_1\theta_1 + k_2\theta_2)) = -\frac{d\tau(s)}{ds} + (\pi_*\xi_s,\; k_1\theta_1 + k_2\theta_2).$$

But $\pi_*\xi_s = \dot\gamma_s$ is the tangent vector to $\gamma$ at $\gamma(s)$. Thus

$$(\xi_s, \omega_{12}) = 0 \quad\text{if and only if}\quad \frac{d\tau(s)}{ds} = F_\gamma(s), \qquad\text{where}\quad F_\gamma(s) = (\dot\gamma_s,\; k_1\theta_1 + k_2\theta_2)$$

is a function depending only on $s$. In particular, $g_1(s)$ is parallel if and only if $d\tau(s)/ds = F_\gamma(s)$. From this we see that given $g_1(0)$ there is a unique parallel family $g_1(s)$, starting with $g_1(0)$.
Furthermore, if $g_1'(0)$ is a second unit vector at $\gamma(0)$, the angle between $g_1(s)$ and $g_1'(s)$ is equal to the angle between $g_1(0)$ and $g_1'(0)$, since the corresponding functions $\tau(s)$ and $\tau'(s)$ satisfy the same differential equation and hence differ by a constant. Thus parallel translation preserves angles, which proves the theorem. $\square$

Note that if $M$ is (locally isometric to) Euclidean space, then we can choose $\psi$ so that

$$f_1 = \frac{\partial}{\partial x^1} \quad\text{and}\quad f_2 = \frac{\partial}{\partial x^2},$$

so that $\theta_1 = dx^1$ and $\theta_2 = dx^2$. In this case, $k_1 = k_2 = 0$ and $\tau$ is just the angle that $g_1$ makes with $\partial/\partial x^1$, that is, with the $x^1$-axis. Thus $\omega_{12} = -d\tau$ in this case. Then the condition for parallel translation becomes $d\tau/ds = 0$, which coincides with the usual notion of parallelism in Euclidean geometry. Note that in Euclidean space the parallelism does not depend on the curve $\gamma$. This is not true in general.

Exercise II.3. Let $\gamma_1$ and $\gamma_2$ be two arcs of great circles joining the north and south poles on $S^2$. Suppose that $\gamma_1$ and $\gamma_2$ are orthogonal at the poles. Let $\xi$ be a tangent vector at the north pole. Compare its translates to the south pole via $\gamma_1$ and $\gamma_2$.

Let $M$ be any two-dimensional Riemann manifold. For any curve $\gamma$ on $M$ there is an obvious way of choosing unit vectors along $\gamma$: just let $g_1(s)$ be the unit tangent vector to $\gamma$ at $\gamma(s)$. Thus for every curve $\gamma$ on $M$ we get a curve, which we shall call $\tilde\gamma$, on $\mathcal{F}$. [Here $\tilde\gamma(s) = \langle g_1(s), g_2(s)\rangle$, and $g_1(s)$ is the tangent to $\gamma$ at $\gamma(s)$.] We call the form $\tilde\gamma^*(\omega_{12})$ the geodesic curvature form of $\gamma$. [In the Euclidean case this is just the ordinary curvature (see the exercises).] Let us consider those curves whose geodesic curvatures vanish, i.e., those curves whose tangent vectors are parallel. We shall call such a curve a geodesic with respect to the given Riemann metric. Note that the condition that a curve be geodesic is given, in local coordinates, by a second-order differential equation. Therefore, a geodesic $C$ is uniquely specified by giving $C(t)$ and $C'(t)$ at any fixed value of $t$. In Chapter 13 we use the term "geodesic" to mean a curve which locally minimizes length. It is the purpose of the next few exercises to show that geodesics in our present sense have this property.
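To see the path dependence concretely — a sketch of the computation behind Exercise II.3 — take $\gamma_1(s) = (\sin s, 0, \cos s)$ and $\gamma_2(s) = (0, \sin s, \cos s)$, $0 \le s \le \pi$, and let $\xi = e_x$ at the north pole. Along $\gamma_1$ the tangent field $\gamma_1'(s) = (\cos s, 0, -\sin s)$ is parallel, since

$$\frac{d}{ds}\,\gamma_1'(s) = -\gamma_1(s)$$

is normal to the sphere; it starts at $e_x$ and arrives at the south pole as $-e_x$. Along $\gamma_2$ the constant field $e_x$ is tangent to the sphere at every point of $\gamma_2$ and has derivative zero, so it is parallel and arrives as $+e_x$. The two translates of $\xi$ thus differ by the angle $\pi$.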
EXERCISES

II.4 Let $x, y$ be local coordinates on $U \subset M$. Through each point of the curve $y = 0$ (that is, the $x$-axis in the local coordinates), construct the unique geodesic orthogonal to this curve. (See Fig. 11.16.) Let $s$ be the arc-length parameter along the geodesic, so that the geodesic passing through $(u, 0)$ is given by $(x(u, s), y(u, s))$. Show that the map $(u, s) \mapsto (x(u, s), y(u, s))$ has nonzero Jacobian at $(0, 0)$ and therefore defines a coordinate system in some open subset $U' \subset U$.

Fig. 11.16   Fig. 11.17

II.5 We are going to make a further change of coordinates. Let $Y$ be the vector field on $U'$ defined by the properties $\|Y\| = 1$, $(Y, \partial/\partial s) = 0$, and $(Y, du) > 0$. Thus $Y$ is orthogonal to the geodesics $u = \text{const}$ and points in the increasing $u$-direction. Let us consider the solution curves of this vector field parametrized by the initial position along the geodesic $u = 0$. That is, let $v$ be the arc-length parameter along the geodesic $u = 0$, and consider the map $(u, v) \mapsto (u, s(u, v))$, where $s(u, v)$ is the $s$-coordinate of the intersection of the solution curve of $Y$ passing through $(0, v)$ with the geodesic given by $u$. (See Fig. 11.17.) Again the existence theorem and smooth dependence on parameters, together with the fact that the curves $u = 0$ and $s = 0$ are already orthogonal, guarantee that we can find some neighborhood $W$ so that $(u, v)$ are coordinates on $W$. We have thus constructed coordinates such that the curves $u = \text{const}$ are geodesics and the curves $u = \text{const}$ and $v = \text{const}$ are orthogonal. Such a system of coordinates is called a geodesic parallel coordinate system.

II.6 Let $(u, v)$ be a coordinate system on $U \subset M$ for which $(\partial/\partial u, \partial/\partial v) \equiv 0$, so that the metric takes the form

$$ds^2 = p^2\,du^2 + q^2\,dv^2.$$

Define the choice of frame $\psi$ by normalizing $\partial/\partial u$, $\partial/\partial v$, so that $\psi(x) = \langle f_1, f_2\rangle$, where

$$f_1 = \frac{\partial/\partial u}{\|\partial/\partial u\|} \quad\text{and}\quad f_2 = \frac{\partial/\partial v}{\|\partial/\partial v\|}.$$

Show that the forms $\theta_1$ and $\theta_2$ are given by

$$\theta_1 = p\,du, \qquad \theta_2 = q\,dv,$$

and that

$$\omega_{12} = d\tau - \pi^*\!\left(\frac{1}{q}\frac{\partial p}{\partial v}\,du - \frac{1}{p}\frac{\partial q}{\partial u}\,dv\right)$$

and

$$K = -\frac{1}{pq}\left[\frac{\partial}{\partial u}\!\left(\frac{1}{p}\frac{\partial q}{\partial u}\right) + \frac{\partial}{\partial v}\!\left(\frac{1}{q}\frac{\partial p}{\partial v}\right)\right].$$
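As an added check of this formula, the unit sphere in the coordinates $u$ = colatitude, $v$ = longitude has $ds^2 = du^2 + \sin^2u\,dv^2$, i.e., $p = 1$ and $q = \sin u$, so

$$K = -\frac{1}{\sin u}\,\frac{\partial}{\partial u}(\cos u) = \frac{\sin u}{\sin u} = 1.$$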
II.7 Let $(u, v)$ be a geodesic parallel coordinate system, as in Exercise II.5. The curve $C_u$ given by $C_u(v) = (u, v)$ is a geodesic. Thus $(\tilde C_u'(v), \omega_{12}) = 0$, where $\tilde C_u$ is the corresponding curve of tangent frames in $\mathcal{F}$. But in terms of our local coordinates, $(\tilde C_u'(v), d\tau) = 0$, since $C_u'(v)$ is always parallel to one of the base vectors, $f_2$, and

$$(\tilde C_u'(v), \pi^*du) = (C_u'(v), du) = 0,$$

since $u = \text{const}$ along $C_u$. Thus we conclude that $\partial q/\partial u = 0$, or $q = q(v)$. Let us replace the parameter $v$ by

$$w = \int_0^v q(t)\,dt.$$

Then $(u, w)$ is a geodesic parallel coordinate system for which we have

$$ds^2 = p^2\,du^2 + dw^2,$$

and now the arc length along any curve $u = \text{const}$ is $\int dw$.

II.8 Show that for $|w|$ sufficiently small, any curve joining $(0, 0)$ to $(0, w)$ must have arc length at least $|w|$. Conclude that (since the choice of our original curve $y = 0$ was arbitrary) the geodesics locally minimize length.

II.9 Let $\langle w, z\rangle$ be local coordinates on an open set $U$ of a Riemann manifold with the property that the curves $C_z$ given by $C_z(w) = (w, z)$ are geodesics parametrized according to arc length. Thus $z = \text{const}$ is a geodesic and $\|\partial/\partial w\| = 1$. Let

$$a = \left(\frac{\partial}{\partial w}, \frac{\partial}{\partial z}\right).$$

Show that $\partial a/\partial w = 0$. [Hint: Show that by orthonormalizing $\langle\partial/\partial w, \partial/\partial z\rangle$, we obtain a map $\psi$ whose associated forms $\theta_1$ and $\theta_2$ are given by $\theta_1 = dw + a\,dz$, $\theta_2 = b\,dz$, where $b = \left(\|\partial/\partial z\|^2 - a^2\right)^{1/2}$ …]