Sadri Hassani
Mathematical Physics
A Modern Introduction to
Its Foundations
With 152 Figures
Springer
Sadri Hassani
Department of Physics
Illinois State University
Normal, IL 61790
USA
hassani@entropy.phy.ilstu.edu
To my wife Sarah
and to my children
Dane Arash and Daisy Rita
Library of Congress Cataloging-in-Publication Data
Hassani, Sadri.
Mathematical physics: a modern introduction to its foundations /
Sadri Hassani.
p. cm.
Includes bibliographical references and index.
ISBN 0-387-98579-4 (alk. paper)
1. Mathematical physics. I. Title.
QC20.H394 1998
530.15—dc21 98-24738
Printed on acid-free paper.
© 1999 Springer-Verlag New York, Inc.
All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY
10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in
connection with any form of information storage and retrieval, electronic adaptation, computer
software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the
former are not especially identified, is not to be taken as a sign that such names, as understood by the
Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.
Production managed by Karina Mikhli; manufacturing supervised by Thomas King.
Photocomposed copy prepared from the author's TeX files.
Printed and bound by Hamilton Printing Co., Rensselaer, NY.
Printed in the United States of America.
9 8 7 6 5 4 3 (Corrected third printing, 2002)
ISBN 0-387-98579-4 SPIN 10854281
Springer-Verlag New York Berlin Heidelberg
A member of BertelsmannSpringer Science+Business Media GmbH
Preface
"Ich kann es nun einmal nicht lassen, in diesem Drama von
Mathematik und Physik---<lie sich im Dunkeln befrnchten,
aber von Angesicht zu Angesicht so geme einander verkennen
und verleugnen-die Rolle des (wie ich gentigsam erfuhr, oft
unerwiinschten) Boten zu spielen."
Hermann Weyl
It is said that mathematics is the language of Nature. If so, then physics is its
poetry. Nature started to whisper into our ears when Egyptians and Babylonians
were compelled to invent and use mathematics in their day-to-day activities. The
faint geometric and arithmetical pidgin of over four thousand years ago, suitable
for rudimentary conversations with nature as applied to simple landscaping, has
turned into a sophisticated language in which the heart of matter is articulated.
The interplay between mathematics and physics needs no emphasis. What
may need to be emphasized is that mathematics is not merely a tool with which the
presentation of physics is facilitated, but the only medium in which physics can
survive. Just as language is the means by which humans can express their thoughts
and without which they lose their unique identity, mathematics is the only language
through which physics can express itself and without which it loses its identity.
And just as language is perfected due to its constant usage, mathematics develops
in the most dramatic way because of its usage in physics. The quotation by Weyl
above, an approximation to whose translation is "In this drama of mathematics
and physics—which fertilize each other in the dark, but which prefer to deny and
misconstrue each other face to face—I cannot, however, resist playing the role
of a messenger, albeit, as I have abundantly learned, often an unwelcome one,"
is a perfect description of the natural intimacy between what mathematicians and
physicists do, and the unnatural estrangement between the two camps. Some of the
most beautiful mathematics has been motivated by physics (differential equations
by Newtonian mechanics, differential geometry by general relativity, and operator
theory by quantum mechanics), and some of the most fundamental physics has been
expressed in the most beautiful poetry of mathematics (mechanics in symplectic
geometry, and fundamental forces in Lie group theory).
I do not want to give the impression that mathematics and physics cannot
develop independently. On the contrary, it is precisely the independence of each
discipline that reinforces not only itself, but the other discipline as well—just as the
study of the grammar of a language improves its usage and vice versa. However,
the most effective means by which the two camps can accomplish great success
is through an intense dialogue. Fortunately, with the advent of gauge and string
theories of particle physics, such a dialogue has been reestablished between physics
and mathematics after a relatively long lull.
Level and Philosophy of Presentation
This is a book for physics students interested in the mathematics they use. It
is also a book for mathematics students who wish to see some of the abstract
ideas with which they are familiar come alive in an applied setting. The level of
presentation is that of an advanced undergraduate or beginning graduate course (or
sequence of courses) traditionally called "Mathematical Methods of Physics" or
some variation thereof. Unlike most existing mathematical physics books intended
for the same audience, which are usually lexicographic collections of facts about
the diagonalization of matrices, tensor analysis, Legendre polynomials, contour
integration, etc., with little emphasis on formal and systematic development of
topics, this book attempts to strike a balance between formalism and application,
between the abstract and the concrete.
I have tried to include as much of the essential formalism as is necessary to
render the book optimally coherent and self-contained. This entails stating and
proving a large number of theorems, propositions, lemmas, and corollaries. The
benefit of such an approach is that the student will recognize clearly both the power
and the limitation of a mathematical idea used in physics. There is a tendency on the
part of the novice to universalize the mathematical methods and ideas encountered
in physics courses because the limitations of these methods and ideas are not
clearly pointed out.
There is a great deal of freedom in the topics and the level of presentation that
instructors can choose from this book. My experience has shown that Parts I, II,
III, Chapter 12, selected sections of Chapter 13, and selected sections or examples
of Chapter 19 (or a large subset of all this) will be a reasonable course content for
advanced undergraduates. If one adds Chapters 14 and 20, as well as selected topics
from Chapters 21 and 22, one can design a course suitable for first-year graduate
students. By judicious choice of topics from Parts VII and VIII, the instructor
can bring the content of the course to a more modern setting. Depending on the
sophistication of the students, this can be done either in the first year or the second
year of graduate school.
Features
To better understand theorems, propositions, and so forth, students need to see
them in action. There are over 350 worked-out examples and over 850 problems
(many with detailed hints) in this book, providing a vast arena in which students
can watch the formalism unfold. The philosophy underlying this abundance can
be summarized as "An example is worth a thousand words of explanation." Thus,
whenever a statement is intrinsically vague or hard to grasp, worked-out examples
and/or problems with hints are provided to clarify it. The inclusion of such a
large number of examples is the means by which the balance between formalism
and application has been achieved. However, although applications are essential
in understanding mathematical physics, they are only one side of the coin. The
theorems, propositions, lemmas, and corollaries, being highly condensed versions
of knowledge, are equally important.
A conspicuous feature of the book, which is not emphasized in other comparable
books, is the attempt to exhibit—as much as it is useful and applicable—the
interrelationships among various topics covered. Thus, the underlying theme of a
vector space (which, in my opinion, is the most primitive concept at this level of
presentation) recurs throughout the book and alerts the reader to the connection
between various seemingly unrelated topics.
Another useful feature is the presentation of the historical setting in which
men and women of mathematics and physics worked. I have gone against the
trend of the "ahistoricism" of mathematicians and physicists by summarizing the
life stories of the people behind the ideas. Many a time, the anecdotes and the
historical circumstances in which a mathematical or physical idea takes form can
go a long way toward helping us understand and appreciate the idea, especially if
the interaction among-and the contributions of-all those having a share in the
creation of the idea is pointed out, and the historical continuity of the development
of the idea is emphasized.
To facilitate reference to them, all mathematical statements (definitions, theo-
rems, propositions, lemmas, corollaries, and examples) have been numbered con-
secutively within each section and are preceded by the section number. For exam-
ple, 4.2.9 Definition indicates the ninth mathematical statement (which happens
to be a definition) in Section 4.2. The end of a proof is marked by an empty square
□, and that of an example by a filled square ■, placed at the right margin of each.
Finally, a comprehensive index, a large number of marginal notes, and many
explanatory underbraced and overbraced comments in equations facilitate the use
and comprehension of the book. In this respect, the book is also useful as a
reference.
Organization and Topical Coverage
Aside from Chapter 0, which is a collection of purely mathematical concepts,
the book is divided into eight parts. Part I, consisting of the first four chapters, is
devoted to a thorough study of finite-dimensional vector spaces and linear operators
defined on them. As the unifying theme of the book, vector spaces demand careful
analysis, and Part I provides this in the more accessible setting of finite dimension
in a language that is conveniently generalized to the more relevant infinite dimensions,
the subject of the next part.
Following a brief discussion of the technical difficulties associated with in-
finity, Part II is devoted to the two main infinite-dimensional vector spaces of
mathematical physics: the classical orthogonal polynomials, and Fourier series
and transform.
Complex variables appear in Part III. Chapter 9 deals with basic properties of
complex functions, complex series, and their convergence. Chapter 10 discusses
the calculus of residues and its application to the evaluation of definite integrals.
Chapter 11 deals with more advanced topics such as multivalued functions, analytic
continuation, and the method of steepest descent.
Part IV treats mainly ordinary differential equations. Chapter 12 shows how
ordinary differential equations of second order arise in physical problems, and
Chapter 13 consists of a formal discussion of these differential equations as well
as methods of solving them numerically. Chapter 14 brings in the power of complex
analysis to a treatment of the hypergeometric differential equation. The last
chapter of this part deals with the solution of differential equations using integral
transforms.
Part V starts with a formal chapter on the theory of operators and their spectral
decomposition in Chapter 16. Chapter 17 focuses on a specific type of operator,
namely the integral operators and their corresponding integral equations. The for-
malism and applications of Sturm-Liouville theory appear in Chapters 18 and 19,
respectively.
The entire Part VI is devoted to a discussion of Green's functions. Chapter
20 introduces these functions for ordinary differential equations, while Chapters
21 and 22 discuss the Green's functions in an m-dimensional Euclidean space.
Some of the derivations in these last two chapters are new and, as far as I know,
unavailable anywhere else.
Parts VII and VIII contain a thorough discussion of Lie groups and their ap-
plications. The concept of group is introduced in Chapter 23. The theory of group
representation, with an eye on its application in quantum mechanics, is discussed
in the next chapter. Chapters 25 and 26 concentrate on tensor algebra and ten-
sor analysis on manifolds. In Part VIII, the concepts of group and manifold are
brought together in the context of Lie groups. Chapter 27 discusses Lie groups
and their algebras as well as their representations, with special emphasis on their
application in physics. Chapter 28 is on differential geometry, including a brief
introduction to general relativity. Lie's original motivation for constructing the
groups that bear his name is discussed in Chapter 29 in the context of a systematic
treatment of differential equations using their symmetry groups. The book ends in
a chapter that blends many of the ideas developed throughout the previous parts
in order to treat variational problems and their symmetries. It also provides a most
fitting example of the claim made at the beginning of this preface and one of the
most beautiful results of mathematical physics: Noether's theorem on the relation
between symmetries and conservation laws.
Acknowledgments
It gives me great pleasure to thank all those who contributed to the making of
this book. George Rutherford was kind enough to volunteer for the difficult task
of condensing hundreds of pages of biography into tens of extremely informative
pages. Without his help this unique and valuable feature of the book would have
been next to impossible to achieve. I thank him wholeheartedly. Rainer Grobe and
Qichang Su helped me with my rusty computational skills. (R. G. also helped me
with my rusty German!) Many colleagues outside my department gave valuable
comments and stimulating words of encouragement on the earlier version of the
book. I would like to record my appreciation to Neil Rasband for reading part of the
manuscript and commenting on it. Special thanks go to Tom von Foerster, senior
editor of physics and mathematics at Springer-Verlag, not only for his patience and
support, but also for the extreme care he took in reading the entire manuscript and
giving me invaluable advice as a result. Needless to say, the ultimate responsibility
for the content of the book rests on me. Last but not least, I thank my wife, Sarah,
my son, Dane, and my daughter, Daisy, for the time taken away from them while
I was writing the book, and for their support during the long and arduous writing
process.
Many excellent textbooks, too numerous to cite individually here, have influ-
enced the writing of this book. The following, however, are noteworthy for both
their excellence and the amount of their influence:
Birkhoff, G., and G.-C. Rota, Ordinary Differential Equations, 3rd ed., New York,
Wiley, 1978.
Bishop, R., and S. Goldberg, Tensor Analysis on Manifolds, New York, Dover,
1980.
Dennery, P., and A. Krzywicki, Mathematics for Physicists, New York, Harper &
Row, 1967.
Halmos, P., Finite-Dimensional Vector Spaces, 2nd ed., Princeton, Van Nostrand,
1958.
x PREFACE
Hamermesh, M., Group Theory and its Application to Physical Problems, Dover,
New York, 1989.
Olver, P., Applications of Lie Groups to Differential Equations, New York, Springer-
Verlag, 1986.
Unless otherwise indicated, all biographical sketches have been taken from the
following three sources:
Gillispie, C., ed., Dictionary of Scientific Biography, Charles Scribner's, New York,
1970.
Simmons, G., Calculus Gems, New York, McGraw-Hill, 1992.
History ofMathematics archive at www-groups.dcs.st-and.ac.uk:80.
I would greatly appreciate any comments and suggestions for improvements.
Although extreme care was taken to correct all the misprints, the mere volume of
the book makes it very likely that I have missed some (perhaps many) of them. I
shall be most grateful to those readers kind enough to bring to my attention any
remaining mistakes, typographical or otherwise. Please feel free to contact me.
Sadri Hassani
Campus Box 4560
Department of Physics
Illinois State University
Normal, IL 61790-4560, USA
e-mail: hassani@entropy.phy.ilstu.edu
It is my pleasure to thank all those readers who pointed out typographical mistakes
and suggested a few clarifying changes. With the exception of a couple that required
substantial revision, I have incorporated all the corrections and suggestions in this
second printing.
Note to the Reader
Mathematics and physics are like the game of chess (or, for that matter, like any
game)—you will learn only by "playing" them. No amount of reading about the
game will make you a master. In this book you will find a large number of examples
and problems. Go through as many examples as possible, and try to reproduce them.
Pay particular attention to sentences like "The reader may check ..." or "It is
straightforward to show ..." These are red flags warning you that for a good
understanding of the material at hand, you need to provide the missing steps. The
problems often fill in missing steps as well; and in this respect they are essential
for a thorough understanding of the book. Do not get discouraged if you cannot get
to the solution of a problem at your first attempt. If you start from the beginning
and think about each problem hard enough, you will get to the solution, and you
will see that the subsequent problems will not be as difficult.
The extensive index makes the specific topics about which you may be in-
terested to learn easily accessible. Often the marginal notes will help you easily
locate the index entry you are after.
I have included a large collection of biographical sketches of mathematical
physicists of the past. These are truly inspiring stories, and I encourage you to read
them. They let you see that even under excruciating circumstances, the human mind
can work miracles. You will discover how these remarkable individuals overcame
the political, social, and economic conditions of their time to let us get a faint
glimpse of the truth. They are our true heroes.
Contents
Preface
Note to the Reader
List of Symbols
0 Mathematical Preliminaries
0.1 Sets .
0.2 Maps .
0.3 Metric Spaces .
0.4 Cardinality........
0.5 Mathematical Induction
0.6 Problems..........................
I Finite-Dimensional Vector Spaces
1 Vectors and Transformations
1.1 Vector Spaces
1.2 Inner Product .
1.3 Linear Transformations .
1.4 Algebras.
1.5 Problems..........
2 Operator Algebra
2.1 The Algebra L(V)
2.2 Derivatives of Functions of Operators .
2.3 Conjugation of Operators
2.4 Hermitian and Unitary Operators
2.5 Projection Operators
2.6 Operators in Numerical Analysis
2.7 Problems.............
3 Matrices: Operator Representations
3.1 Matrices..........
3.2 Operations on Matrices
3.3 Orthonormal Bases
3.4 Change of Basis and Similarity Transformation
3.5 The Determinant . . ..
3.6 The Trace
3.7 Problems......
4 Spectral Decomposition
4.1 Direct Sums . . . . . . . . . .
4.2 Invariant Subspaces
4.3 Eigenvalues and Eigenvectors
4.4 Spectral Decomposition
4.5 Functions of Operators
4.6 Polar Decomposition
4.7 Real Vector Spaces
4.8 Problems.......
II Infinite-Dimensional Vector Spaces
5 Hilbert Spaces
5.1 The Question of Convergence
5.2 The Space of Square-Integrable Functions
5.3 Problems...................
6 Generalized Functions
6.1 Continuous Index
6.2 Generalized Functions .
6.3 Problems........
7 Classical Orthogonal Polynomials
7.1 General Properties
7.2 Classification
7.3 Recurrence Relations
7.4 Examples of Classical Orthogonal Polynomials
7.5 Expansion in Terms of Orthogonal Polynomials
7.6 Generating Functions
7.7 Problems
8 Fourier Analysis
8.1 Fourier Series .....
8.2 The Fourier Transform .
8.3 Problems........
III Complex Analysis
9 Complex Calculus
9.1 Complex Functions
9.2 Analytic Functions
9.3 Conformal Maps
9.4 Integration of Complex Functions
9.5 Derivatives as Integrals
9.6 Taylor and Laurent Series
9.7 Problems
10 Calculus of Residues
10.1 Residues
10.2 Classification of Isolated Singularities
10.3 Evaluation of Definite Integrals
10.4 Problems
11 Complex Analysis: Advanced Topics
11.1 Meromorphic Functions
11.2 Multivalued Functions
11.3 Analytic Continuation
11.4 The Gamma and Beta Functions
11.5 Method of Steepest Descent
11.6 Problems
IV Differential Equations
12 Separation of Variables in Spherical Coordinates
12.1 PDEs of Mathematical Physics
12.2 Separation of the Angular Part of the Laplacian
12.3 Construction of Eigenvalues of L²
12.4 Eigenvectors of L²: Spherical Harmonics
12.5 Problems
13 Second-Order Linear Differential Equations
13.1 General Properties of ODEs
13.2 Existence and Uniqueness for First-Order DEs
13.3 General Properties of SOLDEs
13.4 The Wronskian
13.5 Adjoint Differential Operators
13.6 Power-Series Solutions of SOLDEs
13.7 SOLDEs with Constant Coefficients
13.8 The WKB Method
13.9 Numerical Solutions of DEs
13.10 Problems
14 Complex Analysis of SOLDEs
14.1 Analytic Properties of Complex DEs
14.2 Complex SOLDEs
14.3 Fuchsian Differential Equations
14.4 The Hypergeometric Function
14.5 Confluent Hypergeometric Functions
14.6 Problems
15 Integral Transforms and Differential Equations
15.1 Integral Representation of the Hypergeometric Function
15.2 Integral Representation of the Confluent Hypergeometric Function
15.3 Integral Representation of Bessel Functions
15.4 Asymptotic Behavior of Bessel Functions
15.5 Problems
V Operators on Hilbert Spaces
16 An Introduction to Operator Theory
16.1 From Abstract to Integral and Differential Operators .
16.2 Bounded Operators in Hilbert Spaces . .
16.3 Spectra of Linear Operators
16.4 Compact Sets .
16.5 Compact Operators .
16.6 Spectrum of Compact Operators
16.7 Spectral Theorem for Compact Operators .
16.8 Resolvents
16.9 Problems. . . . . . . . . . . . . . . . . .
17 Integral Equations
17.1 Classification.
17.2 Fredholm Integral Equations
17.3 Problems
18 Sturm-Liouville Systems: Formalism
18.1 Unbounded Operators with Compact Resolvent
18.2 Sturm-Liouville Systems and SOLDEs
18.3 Other Properties of Sturm-Liouville Systems
18.4 Problems

19 Sturm-Liouville Systems: Examples
19.1 Expansions in Terms of Eigenfunctions
19.2 Separation in Cartesian Coordinates
19.3 Separation in Cylindrical Coordinates
19.4 Separation in Spherical Coordinates
19.5 Problems
VI Green's Functions
20 Green's Functions in One Dimension
20.1 Calculation of Some Green's Functions
20.2 Formal Considerations
20.3 Green's Functions for SOLDOs
20.4 Eigenfunction Expansion of Green's Functions
20.5 Problems
21 Multidimensional Green's Functions: Formalism
21.1 Properties of Partial Differential Equations
21.2 Multidimensional GFs and Delta Functions
21.3 Formal Development
21.4 Integral Equations and GFs
21.5 Perturbation Theory
21.6 Problems
22 Multidimensional Green's Functions: Applications
22.1 Elliptic Equations
22.2 Parabolic Equations
22.3 Hyperbolic Equations
22.4 The Fourier Transform Technique
22.5 The Eigenfunction Expansion Technique
22.6 Problems
VII Groups and Manifolds
23 Group Theory
23.1 Groups.
23.2 Subgroups . . .
23.3 Group Action .
23.4 The Symmetric Group Sn
23.5 Problems. . . . . . . . .
24 Group Representation Theory
24.1 Definitionsand Examples
24.2 Orthogonality Properties
24.3 Analysis of Representations
24.4 Group Algebra
24.5 Relationship of Characters to Those of a Subgroup
24.6 Irreducible Basis Functions
24.7 Tensor Product of Representations
24.8 Representations of the Symmetric Group
24.9 Problems
25 Algebra of Tensors
25.1 Multilinear Mappings
25.2 Symmetries of Tensors
25.3 Exterior Algebra
25.4 Inner Product Revisited
25.5 The Hodge Star Operator
25.6 Problems. . . . . . . . .
26 Analysis of Tensors
26.1 Differentiable Manifolds
26.2 Curves and Tangent Vectors
26.3 Differential of a Map
26.4 Tensor Fields on Manifolds
26.5 Exterior Calculus ...
26.6 Symplectic Geometry
26.7 Problems. . . . . . . .
VIII Lie Groups and Their Applications
27 Lie Groups and Lie Algebras
27.1 Lie Groups and Their Algebras . . . . . . .
27.2 An Outline of Lie Algebra Theory ...
27.3 Representation of Compact Lie Groups . . .
27.4 Representation of the General Linear Group
27.5 Representationof Lie Algebras
27.6 Problems .
28 Differential Geometry
28.1 Vector Fields and Curvature
28.2 Riemannian Manifolds
28.3 Covariant Derivative and Geodesics
28.4 Isometries and Killing Vector Fields
28.5 Geodesic Deviation and Curvature
28.6 General Theory of Relativity
28.7 Problems
29 Lie Groups and Differential Equations
29.1 Symmetries of Algebraic Equations
29.2 Symmetry Groups of Differential Equations
29.3 The Central Theorems
29.4 Application to Some Known PDEs
29.5 Application to ODEs
29.6 Problems
30 Calculus of Variations, Symmetries, and Conservation Laws
30.1 The Calculus of Variations
30.2 Symmetry Groups of Variational Problems
30.3 Conservation Laws and Noether's Theorem
30.4 Application to Classical Field Theory
30.5 Problems
Bibliography

Index
List of Symbols
∈, (∉)          "belongs to", ("does not belong to")
ℤ               Set of integers
ℝ               Set of real numbers
ℝ⁺              Set of positive real numbers
ℂ               Set of complex numbers
ℕ               Set of nonnegative integers
ℚ               Set of rational numbers
∼A              Complement of the set A
A × B           Set of ordered pairs (a, b) with a ∈ A and b ∈ B
Aⁿ              {(a1, a2, ..., an) | ai ∈ A}
∪, (∩)          Union, (Intersection)
A ≡ B           A is equivalent to B
x ↦ f(x)        x is mapped to f(x) via the map f
∀               for all (values of)
∃               There exists (a value of)
[a]             Equivalence class to which a belongs
g ∘ f           Composition of maps f and g
iff             if and only if
C^k(a, b)       Set of functions on (a, b) with continuous derivatives up to order k
ℂⁿ (or ℝⁿ)      Set of complex (or real) n-tuples
P^c[t]          Set of polynomials in t with complex coefficients
P^r[t]          Set of polynomials in t with real coefficients
P^c_n[t]        Set of polynomials with complex coefficients of degree n or less
C^∞             Set of all complex sequences {ai} such that Σ_{i=1}^∞ |ai|² < ∞
⟨a|b⟩           Inner product of |a⟩ and |b⟩
‖a‖             Norm (length) of the vector |a⟩
L(V)            Set of endomorphisms (linear operators) on vector space V
[S, T]          Commutator of operators S and T
T†              Adjoint (hermitian conjugate) of operator T
Aᵗ              Transpose of matrix A
U ⊕ V           Direct sum of vector spaces U and V
δ(x − x₀)       Dirac delta function nonvanishing only at x = x₀
Res[f(z₀)]      Residue of f at point z₀
DE, ODE, PDE    Differential equation, Ordinary DE, Partial DE
SOLDE           Second-order linear (ordinary) differential equation
GL(V)           Set of all invertible operators on vector space V
GL(n, ℂ)        Set of all n × n complex matrices of nonzero determinant
SL(n, ℂ)        Set of all n × n complex matrices of unit determinant
T₁ ⊗ T₂         Tensor product of T₁ and T₂
A ∧ B           Exterior (wedge) product of skew-symmetric tensors A and B
Λᵖ(V)           Set of all skew-symmetric tensors of type (p, 0) on V
0
Mathematical Preliminaries
This introductory chapter gathers together some of the most basic tools and notions
that are used throughout the book. It also introduces some common vocabulary
and notations used in modern mathematical physics literature. Readers familiar
with such concepts as sets, maps, equivalence relations, and metric spaces may
wish to skip this chapter.
0.1. Sets
Modern mathematics starts with the basic (and undefinable) concept of set. We
think of a set as a structureless family, or collection, of objects. We speak, for
example, of the set of students in a college, of men in a city, of women working
for a corporation, of vectors in space, of points in a plane, or of events in the
continuum of space-time. Each member a of a set A is called an element of that
set. This relation is denoted by a ∈ A (read "a is an element of A" or "a belongs
to A"), and its negation by a ∉ A. Sometimes a is called a point of the set A to
emphasize a geometric connotation.
A set is usually designated by enumeration of its elements between braces.
For example, {2, 4, 6, 8} represents the set consisting of the first four even natural
numbers; {0, ±1, ±2, ±3, ...} is the set of all integers; {1, x, x², x³, ...} is the
set of all nonnegative powers of x; and {1, i, −1, −i} is the set of the four complex
fourth roots of unity. In many cases, a set is defined by a (mathematical) statement
that holds for all of its elements. Such a set is generally denoted by {x | P(x)} and
read "the set of all x's such that P(x) is true." The foregoing examples of sets can
be written alternatively as follows:
{n | n is even and 1 < n < 9}
{±n | n is a natural number}
{y | y = xⁿ and n is a natural number}
{z | z⁴ = 1 and z is a complex number}
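Set-builder notation has a direct analogue in the set comprehensions of many programming languages. As a rough illustration (Python is used here purely as a vehicle, not as part of the text, and the infinite sets are necessarily truncated to finite samples):

```python
# The example sets above, rendered as Python set comprehensions.
evens = {n for n in range(2, 10) if n % 2 == 0}             # {n | n is even and 1 < n < 9}
some_integers = {s * n for n in range(4) for s in (1, -1)}  # finite sample of {±n | n natural}
powers_of_x = {f"x^{n}" for n in range(4)}                  # stands in for {1, x, x^2, x^3, ...}
fourth_roots = {1j ** k for k in range(4)}                  # {z | z^4 = 1}, i.e. {1, i, -1, -i}

print(sorted(evens))                     # [2, 4, 6, 8]
print(fourth_roots == {1, 1j, -1, -1j})  # True
```

As in the text, the defining property appears to the right of the `for`/`if` clauses, so the comprehension reads much like "{x | P(x)}".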
In a frequently used shorthand uotation, the last two sets can be abbreviated as
[x" I n 2: aand n is an integer} and [z E iC I Z4 = I}. Similarly, the uuit circle
can be deuoted by {z [z] = I}, the closed interval [a, b] as {xla ::; x ::; b}, the
open interval (a, b) as {x I a < x < b}, and the set of all nonnegative powers of
x as {x"}~o' This last notation will be used frequeutly iu this book. A set with a
single element is called a singleton.
If a ∈ A whenever a ∈ B, we say that B is a subset of A and write B ⊂ A or
A ⊃ B. If B ⊂ A and A ⊂ B, then A = B. If B ⊂ A and A ≠ B, then B is called
a proper subset of A. The set defined by {a | a ≠ a} is called the empty set and
is denoted by ∅. Clearly, ∅ contains no elements and is a subset of any arbitrary
set. The collection of all subsets (including ∅) of a set A is denoted by 2^A. The
reason for this notation is that the number of subsets of a set containing n elements
is 2ⁿ (Problem 0.1). If A and B are sets, their union, denoted by A ∪ B, is the set
containing all elements that belong to A or B or both. The intersection of the sets
A and B, denoted by A ∩ B, is the set containing all elements belonging to both
A and B. If {B_α}_{α∈I} is a collection of sets,¹ we denote their union by ∪_α B_α and
their intersection by ∩_α B_α.
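Since these definitions are purely combinatorial, they are easy to experiment with on finite sets. The following Python sketch (illustrative only, not part of the text; the sample sets are arbitrary) checks the union and intersection of two small sets and the fact, cited above, that a set with n elements has 2ⁿ subsets:

```python
from itertools import chain, combinations

A = {2, 4, 6, 8}   # the first four even natural numbers, as in the text
B = {1, 2, 3, 4}   # a second sample set

print(A | B)       # union: elements in A or B or both
print(A & B)       # intersection: elements in both A and B

def power_set(s):
    """All subsets of s, including the empty set and s itself."""
    items = list(s)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

# A set with n elements has 2^n subsets -- the fact behind the notation 2^A.
assert len(power_set(A)) == 2 ** len(A)
```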
In any application of set theory there is an underlying universal set whose
subsets are the objects of study. This universal set is usually clear from the context.
For example, in the study of the properties of integers, the set of integers, denoted
by ℤ, is the universal set. The set of reals, ℝ, is the universal set in real analysis,
and the set of complex numbers, ℂ, is the universal set in complex analysis. With
a universal set X in mind, one can write X ∼ A instead of ∼A. The complement
of a set A is denoted by ∼A and defined as

∼A ≡ {a | a ∉ A}.

The complement of B in A (or their difference) is

A ∼ B ≡ {a | a ∈ A and a ∉ B}.

From two given sets A and B, it is possible to form the Cartesian product of A
and B, denoted by A × B, which is the set of ordered pairs (a, b), where a ∈ A
and b ∈ B. This is expressed in set-theoretic notation as

A × B = {(a, b) | a ∈ A and b ∈ B}.
¹Here I is an index set (or a counting set) with its typical element denoted by α. In most cases, I is the set of (nonnegative)
integers, but, in principle, it can be any set, for example, the set of real numbers.
0.1 SETS 3
We can generalize this to an arbitrary number of sets. If A₁, A₂, ..., A_n are sets,
then the Cartesian product of these sets is

A₁ × A₂ × ... × A_n = {(a₁, a₂, ..., a_n) | aᵢ ∈ Aᵢ},

which is a set of ordered n-tuples. If A₁ = A₂ = ... = A_n = A, then we write
Aⁿ instead of A × A × ... × A, and

Aⁿ = {(a₁, a₂, ..., a_n) | aᵢ ∈ A}.

The most familiar example of a Cartesian product occurs when A = ℝ. Then
ℝ² is the set of pairs (x₁, x₂) with x₁, x₂ ∈ ℝ. This is simply the points in the
Euclidean plane. Similarly, ℝ³ is the set of triplets (x₁, x₂, x₃), or the points in
space, and ℝⁿ = {(x₁, x₂, ..., x_n) | xᵢ ∈ ℝ} is the set of real n-tuples.
0.1.1 Equivalence Relations
There are many instances in which the elements of a set are naturally grouped
together. For example, all vector potentials that differ by the gradient of a scalar
function can be grouped together because they all give the same magnetic field.
Similarly, all quantum state functions (of unit "length") that differ by a multiplicative
complex number of unit length can be grouped together because they all
represent the same physical state. The abstraction of these ideas is summarized in
the following definition.
0.1.1. Definition. Let A be a set. A relation on A is a comparison test between
ordered pairs of elements of A. If the pair (a, b) ∈ A × A passes this test, we write
a ▷ b and read "a is related to b." An equivalence relation on A is a relation that
has the following properties:

a ▷ a   ∀ a ∈ A,   (reflexivity)
a ▷ b ⟹ b ▷ a   a, b ∈ A,   (symmetry)
a ▷ b, b ▷ c ⟹ a ▷ c   a, b, c ∈ A.   (transitivity)

When a ▷ b, we say that "a is equivalent to b." The set [a] = {b ∈ A | b ▷ a} of all
elements that are equivalent to a is called the equivalence class of a.
The reader may verify the following property of equivalence relations.

0.1.2. Proposition. If ▷ is an equivalence relation on A and a, b ∈ A, then either
[a] ∩ [b] = ∅ or [a] = [b].

Therefore, a′ ∈ [a] implies that [a′] = [a]. In other words, any element of
an equivalence class can be chosen to be a representative of that class. Because
of the symmetry of equivalence relations, sometimes we denote them by ⋈.
0.1.3. Example. Let A be the set of human beings. Let a ▷ b be interpreted as "a is older
than b." Then clearly, ▷ is a relation but not an equivalence relation. On the other hand, if
we interpret a ▷ b as "a and b have the same paternal grandfather," then ▷ is an equivalence
relation, as the reader may check. The equivalence class of a is the set of all grandchildren
of a's paternal grandfather.
Let V be the set of vector potentials. Write A ▷ A′ if A − A′ = ∇f for some function
f. The reader may verify that ▷ is an equivalence relation, and that [A] is the set of all
vector potentials giving rise to the same magnetic field.
Let the underlying set be ℤ × (ℤ − {0}). Say "(a, b) is related to (c, d)" if ad = bc.
Then this relation is an equivalence relation. Furthermore, [(a, b)] can be identified as the
ratio a/b. ■
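The last relation of this example can be checked mechanically. In the Python sketch below (the pair list and helper names are invented for illustration), pairs (a, b) with ad = bc land in the same class, and each class is labeled by the common ratio a/b:

```python
from fractions import Fraction

# Sample pairs (a, b) in Z x (Z - {0}); the list is an arbitrary illustration.
pairs = [(1, 2), (2, 4), (-3, -6), (1, 3), (2, 6), (5, 1)]

def related(p, q):
    """(a, b) is related to (c, d) iff a*d == b*c."""
    (a, b), (c, d) = p, q
    return a * d == b * c

assert related((1, 2), (-3, -6))

# Group pairs into equivalence classes, labeling each class by the common
# ratio a/b -- the identification [(a, b)] = a/b from the text.
classes = {}
for p in pairs:
    classes.setdefault(Fraction(*p), []).append(p)

assert classes[Fraction(1, 2)] == [(1, 2), (2, 4), (-3, -6)]
```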
0.1.4. Definition. Let A be a set and {B_α} a collection of subsets of A. We say that
{B_α} is a partition of A, or {B_α} partitions A, if the B_α's are disjoint, i.e., have
no element in common, and ∪_α B_α = A.

Now consider the collection {[a] | a ∈ A} of all equivalence classes of A.
These classes are disjoint, and evidently their union covers all of A. Therefore,
the collection of equivalence classes of A is a partition of A. This collection is
denoted by A/⋈ and is called the quotient set of A under the equivalence relation
⋈.
0.1.5. Example. Let the underlying set be ℝ³. Define an equivalence relation on ℝ³ by
saying that P₁ ∈ ℝ³ and P₂ ∈ ℝ³ are equivalent if they lie on the same line passing through
the origin. Then ℝ³/⋈ is the set of all lines in space passing through the origin. If we
choose the unit vector with positive third coordinate along a given line as the representative
of that line, then ℝ³/⋈ can be identified with the upper unit hemisphere.² ℝ³/⋈ is called
the projective space associated with ℝ³.
On the set ℤ of integers define a relation by writing m ▷ n for m, n ∈ ℤ if m − n is
divisible by k, where k is a fixed integer. Then ▷ is not only a relation, but an equivalence
relation. In this case, we have

ℤ/⋈ = {[0], [1], ..., [k − 1]},

as the reader is urged to verify.
For the equivalence relation defined on ℤ × (ℤ − {0}) in Example 0.1.3, the set
ℤ × (ℤ − {0})/⋈ can be identified with ℚ, the set of rational numbers. ■
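For the divisibility relation, one can verify on a finite sample that the classes [0], [1], ..., [k − 1] are disjoint and cover everything, i.e., form a partition. A small Python check (the value of k and the sample range are arbitrary choices, not from the text):

```python
k = 5                      # the fixed integer k; 5 is an arbitrary choice
universe = range(-20, 21)  # a finite sample of Z

# m is related to n iff m - n is divisible by k; the class [r] collects
# everything congruent to r, so the quotient set is {[0], [1], ..., [k-1]}.
classes = {r: [n for n in universe if (n - r) % k == 0] for r in range(k)}

assert len(classes) == k   # exactly k equivalence classes
# Disjoint classes whose union is the whole sample set: a partition.
members = sorted(n for c in classes.values() for n in c)
assert members == list(universe)
```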
0.2 Maps
To communicate between sets, one introduces the concept of a map. A map f
from a set X to a set Y, denoted by f : X → Y, is a correspondence
between elements of X and those of Y in which all the elements of X participate,

²Furthermore, we need to identify any two points on the edge of the hemisphere which lie on the same diameter.
Figure 1 The map f maps all of the set X onto a subset of Y. The shaded area in Y is
f(X), the range of f.
and each element of X corresponds to only one element of Y (see Figure 1). If
y ∈ Y is the element that corresponds to x ∈ X via the map f, we write

y = f(x)   or   x ↦ f(x),

and call f(x) the image of x under f. Thus, by the definition of map, x ∈ X can
have only one image. The set X is called the domain, and Y the codomain or
the target space. Two maps f : X → Y and g : X → Y are said to be equal if
f(x) = g(x) for all x ∈ X.
0.2.1. Box. A map whose codomain is the set of real numbers ℝ or the set
of complex numbers ℂ is commonly called a function.

A special map that applies to all sets A is id_A : A → A, called the identity
map of A, and defined by

id_A(a) = a   ∀ a ∈ A.

The graph Γ_f of a map f : A → B is a subset of A × B defined by

Γ_f = {(a, f(a)) | a ∈ A} ⊂ A × B.
This general definition reduces to the ordinary graphs encountered in algebra and
calculus where A = B = ℝ and A × B is the xy-plane. If A is a subset of X,
we call f(A) = {f(x) | x ∈ A} the image of A. Similarly, if B ⊂ f(X), we call
f⁻¹(B) = {x ∈ X | f(x) ∈ B} the inverse image, or preimage, of B. In words,
f⁻¹(B) consists of all elements in X whose images are in B ⊂ Y. If B consists
of a single element b, then f⁻¹(b) = {x ∈ X | f(x) = b} consists of all elements
of X that are mapped to b. Note that it is possible for many points of X to have
the same image in Y. The subset f(X) of the codomain of a map f is called the
range of f (see Figure 1).
Figure 2 The composition of two maps is another map.
If f : X → Y and g : Y → W, then the mapping h : X → W given
by h(x) = g(f(x)) is called the composition of f and g, and is denoted by
h = g ∘ f (see Figure 2).³ It is easy to verify that

f ∘ id_X = f = id_Y ∘ f.

If f(x₁) = f(x₂) implies that x₁ = x₂, we call f injective, or one-to-one
(denoted 1-1). For an injective map only one element of X corresponds to an
element of Y. If f(X) = Y, the mapping is said to be surjective, or onto. A
map that is both injective and surjective is said to be bijective, or to be a one-to-
one correspondence. Two sets that are in one-to-one correspondence have, by
definition, the same number of elements. If f : X → Y is a bijection from X
onto Y, then for each y ∈ Y there is one and only one element x in X for which
f(x) = y. Thus, there is a mapping f⁻¹ : Y → X given by f⁻¹(y) = x, where
x is the unique element such that f(x) = y. This mapping is called the inverse
of f. The inverse of f is also identified as the map that satisfies f ∘ f⁻¹ = id_Y
and f⁻¹ ∘ f = id_X. For example, one can easily verify that ln⁻¹ = exp and
exp⁻¹ = ln, because ln(eˣ) = x and e^{ln x} = x.
Given a map f : X → Y, we can define a relation ⋈ on X by saying x₁ ⋈
x₂ if f(x₁) = f(x₂). The reader may check that this is in fact an equivalence
relation. The equivalence classes are subsets of X all of whose elements map to
the same point in Y. In fact, [x] = f⁻¹(f(x)). Corresponding to f, there is a
map f̃ : X/⋈ → Y given by f̃([x]) = f(x). This map is injective because
if f̃([x₁]) = f̃([x₂]), then f(x₁) = f(x₂), so x₁ and x₂ belong to the same
equivalence class; therefore, [x₁] = [x₂]. It follows that f̃ : X/⋈ → f(X) is
bijective.

If f and g are both bijections with inverses f⁻¹ and g⁻¹, respectively, then g ∘ f
also has an inverse, and verifying that (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹ is straightforward.
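For finite sets, maps can be modeled as dictionaries, and the identity (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹ can be verified directly. A Python sketch (the particular sets and values are invented for illustration):

```python
# Two finite bijections represented as dictionaries: f : X -> Y, g : Y -> W.
f = {1: 'a', 2: 'b', 3: 'c'}
g = {'a': 10, 'b': 20, 'c': 30}

def compose(outer, inner):
    """Return (outer o inner) as a dictionary on inner's domain."""
    return {x: outer[inner[x]] for x in inner}

def inverse(h):
    """Invert a bijection by swapping keys and values."""
    return {v: k for k, v in h.items()}

gof = compose(g, f)   # g o f : X -> W
# The inverse of the composition is the composition of the inverses,
# taken in the reverse order, as stated in the text.
assert inverse(gof) == compose(inverse(f), inverse(g))
```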
³Note the importance of the order in which the composition is written. The reverse order may not even exist.
0.2.2. Example. As an example of the preimage of a set, consider the sine and cosine
functions. Then it should be clear that

sin⁻¹(0) = {nπ}_{n=−∞}^{∞},   cos⁻¹(0) = {π/2 + nπ}_{n=−∞}^{∞}.
Similarly, sin⁻¹[0, ½] consists of all the intervals on the x-axis marked by heavy line
segments in Figure 3, i.e., all the points whose sine lies between 0 and ½. ■
As examples of maps, we consider functions f : ℝ → ℝ studied in calculus.
The two functions f : ℝ → ℝ and g : ℝ → (−1, +1) given, respectively, by
f(x) = x³ and g(x) = tanh x are bijective. The latter function, by the way, shows
that there are as many points in the whole real line as there are in the interval
(−1, +1). If we denote the set of positive real numbers by ℝ⁺, then the function
f : ℝ → ℝ⁺ given by f(x) = x² is surjective but not injective (both x and
−x map to x²). The function g : ℝ⁺ → ℝ given by the same rule, g(x) = x²,
is injective but not surjective. On the other hand, h : ℝ⁺ → ℝ⁺ again given by
h(x) = x² is bijective, but u : ℝ → ℝ given by the same rule is neither injective
nor surjective.
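These domain-dependence statements can be illustrated numerically. The Python sketch below uses finite sampled stand-ins for ℝ and ℝ⁺ (an illustration only; the true statements concern the full infinite sets, and the helper names are invented):

```python
# Finite samples standing in for R and R+.
reals = [x / 2 for x in range(-6, 7)]       # sample of R, includes negatives
positives = [x for x in reals if x > 0]     # sample of R+

def is_injective(domain, rule):
    """A rule is injective on a finite domain iff no image value repeats."""
    images = [rule(x) for x in domain]
    return len(images) == len(set(images))

def square(x):
    return x * x

assert not is_injective(reals, square)          # both x and -x map to x^2
assert is_injective(positives, square)          # restricting the domain helps
assert is_injective(reals, lambda x: x ** 3)    # x -> x^3 is one-to-one
```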
Let M_{n×n} denote the set of n × n real matrices. Define a function det : M_{n×n} → ℝ
that assigns to each A ∈ M_{n×n} its determinant det A. This function
is clearly surjective (why?) but not injective. The set of all matrices whose
determinant is 1 is det⁻¹(1). Such matrices occur frequently in physical applications.

Another example of interest is f : ℂ → ℝ given by f(z) = |z|. This function
is also neither injective nor surjective. Here f⁻¹(1) is the unit circle, the circle
of radius 1 in the complex plane.
The domain of a map can be a Cartesian product of a set, as in f : X × X → Y.
Two specific cases are worthy of mention. The first is when Y = ℝ. An example
of this case is the dot product on vectors. Thus, if X is the set of vectors in space,
we can define f(a, b) = a · b. The second case is when Y = X. Then f is
called a binary operation on X, whereby an element in X is associated with two
elements in X. For instance, let X = ℤ, the set of all integers; then the function
f : ℤ × ℤ → ℤ defined by f(m, n) = mn is the binary operation of multiplication
of integers. Similarly, g : ℝ × ℝ → ℝ given by g(x, y) = x + y is the binary
operation of addition of real numbers.
0.3 Metric Spaces
Although sets are at the root of modern mathematics, they are only of formal and
abstract interest by themselves. To make sets useful, it is necessary to introduce
some structures on them. There are two general procedures for the implementation
of such structures. These are the abstractions of the two major branches of
mathematics: algebra and analysis.
Figure 3 The union of all the intervals on the x-axis marked by heavy line segments is
sin⁻¹[0, ½].
We can turn a set into an algebraic structure by introducing a binary operation
on it. For example, a vector space consists, among other things, of the binary
operation of vector addition. A group is, among other things, a set together with the
binary operation of "multiplication". There are many other examples of algebraic
systems, and they constitute the rich subject of algebra.
When analysis, the other branch of mathematics, is abstracted using the concept
of sets, it leads to topology, in which the concept of continuity plays a central role.
This is also a rich subject with far-reaching implications and applications. We shall
not go into any details of these two areas of mathematics. Although some algebraic
systems will be discussed and the ideas of limit and continuity will be used in the
sequel, this will be done in an intuitive fashion, by introducing and employing the
concepts when they are needed. On the other hand, some general concepts will
be introduced when they require minimum prerequisites. One of these is a metric
space:
0.3.1. Definition. A metric space is a set X together with a real-valued function
d : X × X → ℝ such that

(a) d(x, y) ≥ 0 ∀ x, y, and d(x, y) = 0 iff x = y,

(b) d(x, y) = d(y, x),   (symmetry)

(c) d(x, y) ≤ d(x, z) + d(z, y).   (the triangle inequality)
It is worthwhile to point out that X is a completely arbitrary set and needs
no other structure. In this respect Definition 0.3.1 is very broad and encompasses
many different situations, as the following examples will show. Before examining
the examples, note that the function d defined above is the abstraction of the notion
of distance: (a) says that the distance between any two points is always nonnegative
and is zero only if the two points coincide; (b) says that the distance between two
points does not change if the two points are interchanged; (c) states the known fact
that the sum of the lengths of two sides of a triangle is always greater than or equal
to the length of the third side. Now consider these examples:
1. Let X = ℚ, the set of rational numbers, and define d(x, y) = |x − y|.

2. Let X = ℝ, and again define d(x, y) = |x − y|.

3. Let X consist of the points on the surface of a sphere. We can define two
distance functions on X. Let d₁(P, Q) be the length of the chord joining P
and Q on the sphere. We can also define another metric, d₂(P, Q), as the
length of the arc of the great circle passing through points P and Q on the
surface of the sphere. It is not hard to convince oneself that d₁ and d₂ satisfy
all the properties of a metric function.
4. Let C⁰[a, b] denote the set of continuous real-valued functions on the closed
interval [a, b]. We can define d(f, g) = ∫_a^b |f(x) − g(x)| dx for f, g ∈
C⁰[a, b].

5. Let C_B[a, b] denote the set of bounded continuous real-valued functions on
the closed interval [a, b]. We then define

d(f, g) = max_{x∈[a,b]} |f(x) − g(x)|

for f, g ∈ C_B[a, b]. This notation says: Take the absolute value of the
difference in f and g at all x in the interval [a, b] and then pick the maximum
of all these values.
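Both metrics can be approximated numerically by sampling the interval. In the Python sketch below (the choice f = sin, g(x) = x/2 on [0, π] and the grid size are arbitrary illustrations, not from the text), a Riemann sum approximates the integral metric of example 4 and a sampled maximum approximates the metric of example 5:

```python
import math

# Discretizations of the two metrics on functions over [a, b] = [0, pi].
a, b, n = 0.0, math.pi, 10_000
xs = [a + (b - a) * i / n for i in range(n + 1)]

def f(x):
    return math.sin(x)

def g(x):
    return 0.5 * x

diffs = [abs(f(x) - g(x)) for x in xs]
d_integral = sum(diffs) * (b - a) / (n + 1)   # Riemann sum for the integral of |f - g|
d_max = max(diffs)                            # sampled maximum of |f - g|

# Both candidates are nonnegative and vanish only when f = g.
print(d_integral, d_max)
```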
The metric function creates a natural setting in which to test the "closeness"
of points in a metric space. One occasion on which the idea of closeness becomes
essential is in the study of a sequence. A sequence is a mapping s : ℕ → X from
the set of natural numbers ℕ into the metric space X. Such a mapping associates
with a positive integer n a point s(n) of the metric space X. It is customary to write
s_n (or x_n to match the symbol X) instead of s(n) and to enumerate the values of
the function by writing {x_n}_{n=1}^{∞}.

Knowledge of the behavior of a sequence for large values of n is of fundamental
importance. In particular, it is important to know whether a sequence approaches
a finite value as n increases.
0.3.2. Box. Suppose that for some x and for any positive real number ε, there
exists a natural number N such that d(x_n, x) < ε whenever n > N. Then we
say that the sequence {x_n}_{n=1}^{∞} converges to x and write lim_{n→∞} d(x_n, x) =
0, or d(x_n, x) → 0, or simply x_n → x.
It may not be possible to test directly for the convergence of a given sequence
because this requires a knowledge of the limit point x. However, it is possible to
Figure 4 The distance between the elements of a Cauchy sequence gets smaller and
smaller.
do the next best thing: to see whether the points of the sequence get closer and
closer as n gets larger and larger. A Cauchy sequence is a sequence for which
lim_{m,n→∞} d(x_m, x_n) = 0, as shown in Figure 4. We can test directly whether
or not a sequence is Cauchy. However, the fact that a sequence is Cauchy does
not guarantee that it converges. For example, let the metric space be the set of
rational numbers ℚ with the metric function d(x, y) = |x − y|, and consider the
sequence {x_n}_{n=1}^{∞} where x_n = Σ_{k=1}^{n} (−1)^{k+1}/k. It is clear that x_n is a rational
number for any n. Also, to show that |x_m − x_n| → 0 is an exercise in calculus.
Thus, the sequence is Cauchy. However, it is probably known to the reader that
lim_{n→∞} x_n = ln 2, which is not a rational number.
A metric space in which every Cauchy sequence converges is called a complete
metric space. Complete metric spaces play a crucial role in modern analysis. The
preceding example shows that ℚ is not a complete metric space. However, if the
limit points of all Cauchy sequences are added to ℚ, the resulting space becomes
complete. This complete space is, of course, the real number system ℝ. It turns out
that any incomplete metric space can be "enlarged" to a complete metric space.
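The alternating-series example can be watched numerically. In the Python sketch below (the particular indices 200 and 400 are arbitrary), every partial sum is an exact rational number, consecutive late terms get close together (the Cauchy property), yet the values approach ln 2, which lies outside ℚ:

```python
import math
from fractions import Fraction

def x(n):
    """Exact partial sum x_n = sum_{k=1}^{n} (-1)^(k+1)/k, a rational number."""
    return sum(Fraction((-1) ** (k + 1), k) for k in range(1, n + 1))

# Late terms cluster together (the Cauchy property)...
assert abs(float(x(400) - x(200))) < 1 / 200
# ...but the limit, ln 2, is irrational, so the sequence has no limit
# inside Q: Q with d(x, y) = |x - y| is not complete.
assert abs(float(x(400)) - math.log(2)) < 1e-2
```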
0.4 Cardinality
The process of counting is a one-to-one comparison of one set with another. If two
sets are in one-to-one correspondence, they are said to have the same cardinality.
Two sets with the same cardinality essentially have the same "number" of elements.
The set F_n = {1, 2, ..., n} is finite and has cardinality n. Any set from which there
is a bijection onto F_n is said to be finite with n elements.
Although some steps had been taken before him in the direction of a definitive theory of
sets, the creator of the theory of sets is considered to be Georg Cantor (1845-1918), who
was born in Russia of Danish-Jewish parentage but moved to Germany with his parents.
His father urged him to study engineering, and Cantor entered the University of Berlin in
1863 with that intention. There he came under the influence of Weierstrass and turned to
pure mathematics. He became Privatdozent at Halle in 1869 and professor in 1879. When
he was twenty-nine he published his first revolutionary paper on the theory of infinite sets in
the Journal für Mathematik. Although some of its propositions were deemed faulty by the
older mathematicians, its overall originality and brilliance attracted attention. He continued
to publish papers on the theory of sets and on transfinite numbers until 1897.
One of Cantor's main concerns was to differentiate among
infinite sets by "size" and, like Bolzano before him, he decided
that one-to-one correspondence should be the basic principle.
In his correspondence with Dedekind in 1873, Cantor posed the
question of whether the set of real numbers can be put into one-
to-one correspondence with the integers, and some weeks later
he answered in the negative. He gave two proofs. The first is more
complicated than the second, which is the one most often used
today. In 1874 Cantor occupied himself with the equivalence of
the points of a line and the points of ℝⁿ and sought to prove
that a one-to-one correspondence between these two sets was
impossible. Three years later he proved that there is such a correspondence. He wrote to
Dedekind, "I see it but I do not believe it." He later showed that given any set, it is always
possible to create a new set, the set of subsets of the given set, whose cardinal number is
larger than that of the given set. If ℵ₀ is the cardinal number of the given set, then the
cardinal number of the set of subsets is denoted by 2^ℵ₀. Cantor proved that 2^ℵ₀ = c,
where c is the cardinal number of the continuum, i.e., the set of real numbers.
Cantor's work, which resolved age-old problems and reversed much previous thought,
could hardly be expected to receive immediate acceptance. His ideas on transfinite ordi-
nal and cardinal numbers aroused the hostility of the powerful Leopold Kronecker, who
attacked Cantor's theory savagely over more than a decade, repeatedly preventing Cantor
from obtaining a more prominent appointment in Berlin. Though Kronecker died in 1891,
his attacks left mathematicians suspicious of Cantor's work. Poincaré referred to set theory
as an interesting "pathological case." He also predicted that "Later generations will regard
[Cantor's] Mengenlehre as a disease from which one has recovered." At one time Cantor
suffered a nervous breakdown, but resumed work in 1887.
Many prominent mathematicians, however, were impressed by the uses to which the new
theory had already been put in analysis, measure theory, and topology. Hilbert spread Cantor's
ideas in Germany, and in 1926 said, "No one shall expel us from the paradise which Cantor
created for us." He praised Cantor's transfinite arithmetic as "the most astonishing product
of mathematical thought, one of the most beautiful realizations of human activity in the
domain of the purely intelligible." Bertrand Russell described Cantor's work as "probably the
greatest of which the age can boast." The subsequent utility of Cantor's work in formalizing
mathematics, a movement largely led by Hilbert, seems at odds with Cantor's Platonic
view that the greater importance of his work was in its implications for metaphysics and
theology. That his work could be so seamlessly diverted from the goals intended by its
creator is strong testimony to its objectivity and craftsmanship.
Now consider the set of natural numbers ℕ = {1, 2, 3, ...}. If there exists a
bijection between a set A and ℕ, then A is said to be countably infinite. Some
examples of countably infinite sets are the set of all integers, the set of even natural
numbers, the set of odd natural numbers, the set of all prime numbers, and the set
of energy levels of the bound states of a hydrogen atom.
It may seem surprising that a subset (such as the set of all even numbers)
can be put into one-to-one correspondence with the full set (the set of all natural
numbers); however, this is a property shared by all infinite sets. In fact, sometimes
infinite sets are defined as those sets that are in one-to-one correspondence with at
least one of their proper subsets. It is also astonishing to discover that there are as
many rational numbers as there are natural numbers. After all, there are infinitely
many rational numbers just in the interval (0, 1), or between any two distinct real
numbers.
Sets that are neither finite nor countably infinite are said to be uncountable. In
some sense they are "more infinite" than any countable set. Examples of uncountable
sets are the points in the interval (−1, +1), the real numbers, the points in a
plane, and the points in space. It can be shown that these sets have the same cardinality:
There are as many points in three-dimensional space, the whole universe, as
there are in the interval (−1, +1) or in any other finite interval.
Cardinality is a very intricate mathematical notion with many surprising results.
Consider the interval [0, 1]. Remove the open interval (1/3, 2/3) from its middle. This
means that the points 1/3 and 2/3 will not be removed. From the remaining portion,
[0, 1/3] ∪ [2/3, 1], remove the two middle thirds; the remaining portion will then be

[0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1]

(see Figure 5). Do this indefinitely. What is the cardinality of the remaining set,
which is called the Cantor set? Intuitively we expect hardly anything to be left.
We might persuade ourselves into accepting the fact that the number of points
remaining is at most infinite but countable. The surprising fact is that the cardinality
is that of the continuum! Thus, after removal of infinitely many middle thirds, the
set that remains has as many points as the original set!
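The dissection process is easy to carry out explicitly. The Python sketch below (the four-level cutoff is an arbitrary choice, matching Figure 5) tracks the closed intervals that survive: after n dissections there are 2ⁿ of them, of total length (2/3)ⁿ, even though the limiting Cantor set has the cardinality of the continuum:

```python
from fractions import Fraction

def dissect(intervals):
    """Remove the open middle third of every closed interval in the list."""
    out = []
    for lo, hi in intervals:
        third = (hi - lo) / 3
        out += [(lo, lo + third), (hi - third, hi)]
    return out

level = [(Fraction(0), Fraction(1))]   # start from [0, 1]
for _ in range(4):                     # four "dissections," as in Figure 5
    level = dissect(level)

assert len(level) == 2 ** 4            # 2^n intervals survive n dissections
assert level[0] == (Fraction(0), Fraction(1, 81))
# The total surviving length (2/3)^n tends to 0, yet the limiting Cantor
# set still has the cardinality of the continuum.
assert sum(hi - lo for lo, hi in level) == Fraction(2, 3) ** 4
```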
0.5 Mathematical Induction
Many a time it is desirable to make a mathematical statement that is true for all
natural numbers. For example, we may want to establish a formula involving an
integer parameter that will hold for all positive integers. One encounters this situation
when, after experimenting with the first few positive integers, one recognizes
a pattern and discovers a formula, and wants to make sure that the formula holds
for all natural numbers. For this purpose, one uses mathematical induction. The
essence of mathematical induction is stated as follows:
Figure 5 The Cantor set after one, two, three, and four "dissections."
0.5.1. Box. Suppose that there is associated with each natural number (positive
integer) n a statement S_n. Then S_n is true for every positive integer
provided the following two conditions hold:

1. S₁ is true.

2. If S_m is true for some given positive integer m, then S_{m+1} is also true.
We illustrate the use of mathematical induction by proving the binomial theorem:

(a + b)^m = \sum_{k=0}^{m} \binom{m}{k} a^{m−k} b^k,   (1)

where we have used the notation

\binom{m}{k} ≡ \frac{m!}{k!(m−k)!}.   (2)
The mathematical statement S_m is Equation (1). We note that S₁ is trivially true.
Now we assume that S_m is true and show that S_{m+1} is also true. This means starting
with Equation (1) and showing that

(a + b)^{m+1} = \sum_{k=0}^{m+1} \binom{m+1}{k} a^{m+1−k} b^k.
Then the induction principle ensures that the statement (equation) holds for all
positive integers.
Multiply both sides of Equation (1) by a + b to obtain

(a + b)^{m+1} = \sum_{k=0}^{m} \binom{m}{k} a^{m−k+1} b^k + \sum_{k=0}^{m} \binom{m}{k} a^{m−k} b^{k+1}.

Now separate the k = 0 term from the first sum and the k = m term from the
second sum, and in the remaining part of the second sum let k = j − 1:

(a + b)^{m+1} = a^{m+1} + \sum_{k=1}^{m} \binom{m}{k} a^{m−k+1} b^k + \sum_{j=1}^{m} \binom{m}{j−1} a^{m−j+1} b^j + b^{m+1}.

The second sum in the last line involves j. Since this is a dummy index, we can
substitute any symbol we please. The choice k is especially useful because then
we can unite the two summations. This gives

(a + b)^{m+1} = a^{m+1} + \sum_{k=1}^{m} \left[ \binom{m}{k} + \binom{m}{k−1} \right] a^{m+1−k} b^k + b^{m+1}.

If we now use

\binom{m}{k} + \binom{m}{k−1} = \binom{m+1}{k},

which the reader can easily verify, we finally obtain

(a + b)^{m+1} = \sum_{k=0}^{m+1} \binom{m+1}{k} a^{m+1−k} b^k.
Mathematical induction is also used in defining quantities involving integers.
Such definitions are called inductive definitions. For example, inductive definition
is used in defining powers: a¹ = a and a^m = a^{m−1}a.
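Both the inductive definition of powers and the binomial theorem proved above can be spot-checked. A Python sketch (the sample values a = 2, b = 5, m = 7 are arbitrary choices):

```python
from math import comb

def power(a, m):
    """Inductive definition of powers: a^1 = a and a^m = a^(m-1) * a."""
    return a if m == 1 else power(a, m - 1) * a

assert power(3, 5) == 3 ** 5

# Spot check of the binomial theorem for arbitrarily chosen a, b, m.
a, b, m = 2, 5, 7
lhs = power(a + b, m)
rhs = sum(comb(m, k) * a ** (m - k) * b ** k for k in range(m + 1))
assert lhs == rhs
```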
0.6 Problems
0.1. Show that the number of subsets of a set containing n elements is 2".
0.2. Let A, B, and C be sets in a universal set U. Show that

(a) A ⊂ B and B ⊂ C implies A ⊂ C.
(b) A ⊂ B iff A ∩ B = A iff A ∪ B = B.
(c) A ⊂ B and B ⊂ C implies (A ∪ B) ⊂ C.
(d) A ∪ B = (A ∼ B) ∪ (A ∩ B) ∪ (B ∼ A).

Hint: To show the equality of two sets, show that each set is a subset of the other.
0.3. For each n ∈ ℕ, let

I_n = {x | |x − 1| < n and |x + 1| > 1/n}.

Find ∪_n I_n and ∩_n I_n.
0.4. Show that a′ ∈ [a] implies that [a′] = [a].

0.5. Can you define a binary operation of "multiplication" on the set of vectors in
space? What about vectors in the plane?

0.6. Show that (f ∘ g)⁻¹ = g⁻¹ ∘ f⁻¹ when f and g are both bijections.
0.7. Take any two open intervals (a, b) and (c, d), and show that there are as many
points in the first as there are in the second, regardless of the size of the intervals.
Hint: Find a (linear) algebraic relation between points of the two intervals.
0.8. Use mathematical induction to derive the Leibniz rule for differentiating a
product:

\frac{d^n}{dx^n}(f · g) = \sum_{k=0}^{n} \binom{n}{k} \frac{d^k f}{dx^k} \frac{d^{n−k} g}{dx^{n−k}}.
0.9. Use mathematical induction to derive the following results:

\sum_{k=0}^{n} r^k = \frac{r^{n+1} − 1}{r − 1},   \sum_{k=0}^{n} k = \frac{n(n+1)}{2}.
Additional Reading
1. Halmos, P. Naive Set Theory, Springer-Verlag, 1974. A classic text on intuitive
(as opposed to axiomatic) set theory covering all the topics discussed
in this chapter and much more.

2. Kelley, J. General Topology, Springer-Verlag, 1985. The introductory chapter
of this classic reference is a detailed introduction to set theory and mappings.

3. Simmons, G. Introduction to Topology and Modern Analysis, Krieger, 1983.
The first chapter of this book covers not only set theory and mappings, but
also the Cantor set and the fact that integers are as abundant as rational
numbers.
Part I
Finite-Dimensional Vector Spaces
1
Vectors and Transformations
Two- and three-dimensional vectors, undoubtedly familiar objects to the reader,
can easily be generalized to higher dimensions. Representing vectors by their
components, one can conceive of vectors having N components. This is the most
immediate generalization of vectors in the plane and in space, and such vectors
are called N-dimensional Cartesian vectors. Cartesian vectors are limited in two
respects: Their components are real, and their dimensionality is finite. Some
applications in physics require the removal of one or both of these limitations. It is
therefore convenient to study vectors stripped of any dimensionality or reality of
components. Such properties become consequences of more fundamental definitions.
Although we will be concentrating on finite-dimensional vector spaces in
this part of the book, many of the concepts and examples introduced here apply to
infinite-dimensional spaces as well.
1.1 Vector Spaces
Let us begin with the definition of an abstract (complex) vector space.¹
1.1.1. Definition. A vector space V over ℂ is a set of objects denoted by |a⟩, |b⟩,
|z⟩, and so on, called vectors, with the following properties:²

1. To every pair of vectors |a⟩ and |b⟩ in V there corresponds a vector |a⟩ + |b⟩,
also in V, called the sum of |a⟩ and |b⟩, such that

(a) |a⟩ + |b⟩ = |b⟩ + |a⟩,

¹Keep in mind that ℂ is the set of complex numbers and ℝ the set of reals.
²The bra, ⟨ |, and ket, | ⟩, notation for vectors, invented by Dirac, is very useful when dealing with complex vector spaces.
However, it is somewhat clumsy for certain topics such as norm and metrics and will therefore be abandoned in those discussions.
(b) |a⟩ + (|b⟩ + |c⟩) = (|a⟩ + |b⟩) + |c⟩,

(c) There exists a unique vector |0⟩ ∈ V, called the zero vector, such that
|a⟩ + |0⟩ = |a⟩ for every vector |a⟩,

(d) To every vector |a⟩ ∈ V there corresponds a unique vector −|a⟩ (also
in V) such that |a⟩ + (−|a⟩) = |0⟩.
2. To every complex number: a-{llso called a scalar-s-and every vector la)
there corresponds a vector a la) in V such that
(a) a(f3la}) = (af3) la),
(b) J ]a) = la).
3. Multiplication involving vectors and scalars is distributive:
(a) a(la) + Ib}) = ala) +alb).
(b) (a +13) la} = ala) + f3la).
The vector space defined above is also called a complex vector space. It is
possible to replace ℂ with ℝ, the set of real numbers, in which case the resulting
space will be called a real vector space. Real and complex numbers are prototypes
of a mathematical structure called a field. A field is a set of objects with two binary
operations called addition and multiplication. Multiplication distributes over
addition, and each operation has an identity. The identity for addition is
denoted by 0 and is called the additive identity. The identity for multiplication is
denoted by 1 and is called the multiplicative identity. Furthermore, every element has an
additive inverse, and every element except the additive identity has a multiplicative
inverse.
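Readers who like to compute can check the axioms of Definition 1.1.1 on a concrete candidate space. The following Python sketch is an editorial illustration, not part of the text; it verifies the axioms for ℂ² with componentwise operations (all function names are our own):

```python
# Check the vector-space axioms of Definition 1.1.1 for C^2 over C,
# with addition and scalar multiplication defined componentwise.

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(alpha, a):
    return tuple(alpha * x for x in a)

a, b, c = (1 + 2j, -3j), (0.5, 4 - 1j), (2j, 7)
alpha, beta = 2 - 1j, 3 + 0.5j
zero = (0j, 0j)

assert add(a, b) == add(b, a)                      # property 1(a)
assert add(a, add(b, c)) == add(add(a, b), c)      # property 1(b)
assert add(a, zero) == a                           # property 1(c)
assert add(a, scale(-1, a)) == zero                # property 1(d)
assert scale(alpha, scale(beta, a)) == scale(alpha * beta, a)   # 2(a)
assert scale(1, a) == a                                         # 2(b)
assert scale(alpha, add(a, b)) == add(scale(alpha, a), scale(alpha, b))  # 3(a)
```

The sample values are chosen so that every product is exactly representable in floating point; for arbitrary scalars the checks would hold only up to rounding error.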
1.1.2. Example. SOME VECTOR SPACES

1. ℝ is a vector space over the field of real numbers.

2. ℂ is a vector space over the field of real numbers.

3. ℂ is a vector space over the complex numbers.

4. Let V = ℝ and let the field of scalars be ℂ. This is not a vector space, because property
   2 of Definition 1.1.1 is not satisfied: A complex number times a real number is not
   a real number and therefore does not belong to V.

5. The set of "arrows" in the plane (or in space) forms a vector space over ℝ under the
   parallelogram law of addition of planar (or spatial) vectors.
³Complex numbers, particularly when they are treated as variables, are usually denoted by z, and we shall adhere to this
convention in Part III. However, in the discussion of vector spaces, we have found it more convenient to use lowercase Greek
letters to denote complex numbers as scalars.
6. Let 𝒫ᶜ[t] be the set of all polynomials with complex coefficients in a variable t.
   Then 𝒫ᶜ[t] is a vector space under the ordinary addition of polynomials and the
   multiplication of a polynomial by a complex number. In this case the zero vector is
   the zero polynomial.

7. For a given positive integer n, let 𝒫ᶜₙ[t] be the set of all polynomials with complex
   coefficients of degree less than or equal to n. Again it is easy to verify that 𝒫ᶜₙ[t]
   is a vector space under the usual addition of polynomials and their multiplication
   by complex scalars. In particular, the sum of two polynomials of degree less than
   or equal to n is also a polynomial of degree less than or equal to n, and multiply-
   ing a polynomial with complex coefficients by a complex number gives another
   polynomial of the same type. Here the zero polynomial is the zero vector.

8. The set 𝒫ʳₙ[t] of polynomials of degree less than or equal to n with real coefficients
   is a vector space over the reals, but it is not a vector space over the complex numbers.

9. Let ℂⁿ consist of all complex n-tuples such as |a⟩ = (α₁, α₂, ..., αₙ) and |b⟩ =
   (β₁, β₂, ..., βₙ). Let α be a complex number. Then we define

   |a⟩ + |b⟩ = (α₁ + β₁, α₂ + β₂, ..., αₙ + βₙ),
   α|a⟩ = (αα₁, αα₂, ..., ααₙ),
   |0⟩ = (0, 0, ..., 0),
   −|a⟩ = (−α₁, −α₂, ..., −αₙ).

   It is easy to verify that ℂⁿ is a vector space over the complex numbers. It is called
   the n-dimensional complex coordinate space.

10. The set of all real n-tuples, ℝⁿ, is a vector space over the real numbers under
    operations similar to those of ℂⁿ. It is called the n-dimensional real coordinate space,
    or Cartesian n-space. It is not a vector space over the complex numbers.

11. The set of all complex matrices with m rows and n columns, ℳₘₓₙ, is a vector space
    under the usual addition of matrices and multiplication by complex numbers. The
    zero vector is the m × n matrix with all entries equal to zero.

12. Let ℂ^∞ be the set of all complex sequences |a⟩ = {αᵢ}ᵢ₌₁^∞ such that Σᵢ₌₁^∞ |αᵢ|² < ∞.
    One can show that with addition and scalar multiplication defined component-
    wise, ℂ^∞ is a vector space over the complex numbers.

13. The set of all complex-valued functions of a single real variable that are continuous
    in the real interval (a, b) is a vector space over the complex numbers.

14. The set 𝒞ⁿ(a, b) of all real-valued functions of a single real variable on (a, b) that
    possess continuous derivatives of all orders up to n forms a vector space over the
    reals.

15. The set 𝒞^∞(a, b) of all real-valued functions on (a, b) of a single real variable that
    possess derivatives of all orders forms a vector space over the reals. ■
It is clear from the example above that the existence of a vector space depends
as much on the nature of the vectors as on the nature of the scalars.
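Items 6 and 7 of Example 1.1.2 are easy to make concrete on a computer. A minimal sketch (ours, not the author's), representing a polynomial by its list of complex coefficients:

```python
# Polynomials with complex coefficients (Examples 6 and 7 above),
# represented as coefficient lists [c0, c1, ..., ck] for c0 + c1*t + ... .

def poly_add(p, q):
    # pad the shorter list with zeros, then add componentwise
    n = max(len(p), len(q))
    p = p + [0j] * (n - len(p))
    q = q + [0j] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_scale(alpha, p):
    return [alpha * c for c in p]

p = [1, 0, 2 + 1j]          # 1 + (2+i) t^2
q = [0, -3j]                # -3i t
s = poly_add(p, q)
assert s == [1, -3j, 2 + 1j]
# the degree of a sum never exceeds the larger degree: closure for P_n[t]
assert len(s) <= max(len(p), len(q))
```

The closure check in the last line is the observation made in item 7: the sum of two polynomials of degree at most n again has degree at most n.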
1.1.3. Definition. The vectors |a₁⟩, |a₂⟩, ..., |aₙ⟩ are said to be linearly inde-
pendent if for αᵢ ∈ ℂ, the relation Σᵢ₌₁ⁿ αᵢ|aᵢ⟩ = 0 implies αᵢ = 0 for all i. The
sum Σᵢ₌₁ⁿ αᵢ|aᵢ⟩ is called a linear combination of {|aᵢ⟩}ᵢ₌₁ⁿ.
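For two vectors in ℂ², Definition 1.1.3 reduces to a determinant condition, which the following sketch (an illustration we supply, not from the text) tests numerically:

```python
# Definition 1.1.3 in C^2: |a1>, |a2> are linearly independent iff the
# only solution of x1|a1> + x2|a2> = 0 is x1 = x2 = 0, i.e. iff the
# 2x2 matrix of components has nonzero determinant.

def independent_2d(a1, a2):
    det = a1[0] * a2[1] - a1[1] * a2[0]
    return det != 0

assert independent_2d((1, 0), (0, 1))            # standard basis of C^2
assert independent_2d((1, 1j), (1, -1j))
assert not independent_2d((1, 2j), (2, 4j))      # (2, 4i) = 2 * (1, 2i)
```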
1.1.4. Definition. A subspace W of a vector space V is a nonempty subset of V
with the property that if |a⟩, |b⟩ ∈ W, then α|a⟩ + β|b⟩ also belongs to W for all
α, β ∈ ℂ.

A subspace is a vector space in its own right. The reader may verify that the
intersection of two subspaces is also a subspace.
1.1.5. Theorem. If S is any nonempty set of vectors in a vector space V, then the
set W_S of all linear combinations of vectors in S is a subspace of V. We say that
W_S is the span of S, or that S spans W_S, or that W_S is spanned by S. W_S is
sometimes denoted by Span{S}.

The proof of Theorem 1.1.5 is left as Problem 1.8.
1.1.6. Definition. A basis of a vector space V is a set B of linearly independent
vectors that spans all of V. A vector space that has a finite basis is called finite-
dimensional; otherwise, it is infinite-dimensional.
We state the following without proof (see [Axle96, page 31]):
1.1.7. Theorem. All bases of a given finite-dimensional vector space have the
same number of linearly independent vectors. This number is called the dimension
of the vector space. A vector space of dimension N is sometimes denoted by V_N.

If |a⟩ is a vector in an N-dimensional vector space V and B = {|aᵢ⟩}ᵢ₌₁ᴺ a basis
in that space, then by the definition of a basis, there exists a unique (see Problem
1.4) set of scalars {α₁, α₂, ..., α_N} such that |a⟩ = Σᵢ₌₁ᴺ αᵢ|aᵢ⟩. The set {αᵢ}ᵢ₌₁ᴺ
is called the components of |a⟩ with respect to the basis B.
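Finding the components of a vector amounts to solving a linear system. The sketch below (our illustration; the basis chosen is hypothetical) does this for ℝ² by Cramer's rule:

```python
# Components of |a> in a basis (Theorem 1.1.7): solve
# a = x1*b1 + x2*b2 for the unique scalars (x1, x2) by Cramer's rule.

def components_2d(a, b1, b2):
    det = b1[0] * b2[1] - b1[1] * b2[0]     # must be nonzero for a basis
    x1 = (a[0] * b2[1] - a[1] * b2[0]) / det
    x2 = (b1[0] * a[1] - b1[1] * a[0]) / det
    return x1, x2

# expand |a> = (3, 5) in the basis {(1, 1), (1, -1)}
x1, x2 = components_2d((3, 5), (1, 1), (1, -1))
assert (x1, x2) == (4.0, -1.0)
# reassembling x1*b1 + x2*b2 recovers |a>
assert (x1 * 1 + x2 * 1, x1 * 1 + x2 * (-1)) == (3.0, 5.0)
```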
1.1.8. Example. The following are subspaces of some of the vector spaces considered in
Example 1.1.2.

• The "space" of real numbers is a subspace of ℂ over the reals.

• ℝ is not a subspace of ℂ over the complex numbers, because, as explained in Example
  1.1.2, ℝ cannot be a vector space over the complex numbers.

• The set of all vectors along a given line going through the origin is a subspace of
  arrows in the plane (or space) over ℝ.

• 𝒫ᶜₙ[t] is a subspace of 𝒫ᶜ[t].

• ℂⁿ⁻¹ is a subspace of ℂⁿ when ℂⁿ⁻¹ is identified with all complex n-tuples with
  zero last entry. In general, ℂᵐ is a subspace of ℂⁿ for m < n when ℂᵐ is identified
  with all n-tuples whose last n − m elements are zero.

• ℳᵣₓₛ is a subspace of ℳₘₓₙ for r ≤ m and s ≤ n. Here, we identify an r × s matrix
  with an m × n matrix whose last m − r rows and n − s columns are all zero.

• 𝒫ᶜₘ[t] is a subspace of 𝒫ᶜₙ[t] for m < n.

• 𝒫ʳₘ[t] is a subspace of 𝒫ʳₙ[t] for m < n. Note that both 𝒫ʳₘ[t] and 𝒫ʳₙ[t] are vector
  spaces over the reals only.

• ℝᵐ is a subspace of ℝⁿ for m < n. Therefore, ℝ², the plane, is a subspace of ℝ³, the
  Euclidean space. Also, ℝ¹ ≡ ℝ is a subspace of both the plane ℝ² and the Euclidean
  space ℝ³. ■
1.1.9. Example. The following are bases for the vector spaces given in Example 1.1.2.

• The number 1 is a basis for ℝ, which is therefore one-dimensional.

• The numbers 1 and i = √−1 are basis vectors for the vector space ℂ over ℝ. Thus,
  this space is two-dimensional.

• The number 1 is a basis for ℂ over ℂ, and the space is one-dimensional. Note that
  although the vectors are the same as in the preceding item, changing the nature of
  the scalars changes the dimensionality of the space.

• The set {êₓ, ê_y, ê_z} of the unit vectors in the directions of the three axes forms a
  basis in space. The space is three-dimensional.

• A basis of 𝒫ᶜ[t] can be formed by the monomials 1, t, t², .... It is clear that this
  space is infinite-dimensional.

• A basis of ℂⁿ is given by e₁, e₂, ..., eₙ, where eⱼ is an n-tuple that has a 1 at the
  jth position and zeros everywhere else. This basis is called the standard basis of
  ℂⁿ. Clearly, the space has n dimensions.

• A basis of ℳₘₓₙ is given by e₁₁, e₁₂, ..., eᵢⱼ, ..., eₘₙ, where eᵢⱼ is the m × n
  matrix with zeros everywhere except at the intersection of the ith row and jth column,
  where it has a one.

• A set consisting of the monomials 1, t, t², ..., tⁿ forms a basis of 𝒫ᶜₙ[t]. Thus, this
  space is (n + 1)-dimensional.

• The standard basis of ℂⁿ is a basis of ℝⁿ as well. It is also called the standard basis
  of ℝⁿ. Thus, ℝⁿ is n-dimensional.

• If we assume that a < 0 < b, then the set of monomials 1, x, x², ... forms a basis
  for 𝒞^∞(a, b), because, by Taylor's theorem, any function belonging to 𝒞^∞(a, b)
  can be expanded in an infinite power series about x = 0. Thus, this space is infinite-
  dimensional. ■
Given a space V with a basis B = {|aᵢ⟩}ᵢ₌₁ⁿ, the span of any m vectors (m < n)
of B is an m-dimensional subspace of V.
1.2 Inner Product
A vector space, as given by Definition 1.1.1, is too general and structureless to be
of much physical interest. One useful structure introduced on a vector space is a
scalar product. Recall that the scalar (dot) product of vectors in the plane or in space
is a rule that associates a real number with two vectors a and b. This association,
denoted symbolically by g : V × V → ℝ, with g(a, b) = a · b, is symmetric:
g(a, b) = g(b, a), is linear in the first (and by symmetry, in the second) factor:⁴

g(αa + βb, c) = αg(a, c) + βg(b, c)   or   (αa + βb) · c = αa · c + βb · c,

⁴A function that is linear in both of its arguments is called a bilinear function.
24 1. VECTORS AND TRANSFORMATIONS
gives the "length" of a vector: |a|² = g(a, a) = a · a ≥ 0, and ensures that the
only vector with zero length⁵ is the zero vector: g(a, a) = 0 if and only if a = 0.

We want to generalize these properties to abstract vector spaces for which the
scalars are complex numbers. A verbatim generalization of the foregoing proper-
ties, however, leads to a contradiction. Using the linearity in both arguments and
a nonzero |a⟩, we obtain

g(i|a⟩, i|a⟩) = i²g(|a⟩, |a⟩) = −g(|a⟩, |a⟩).     (1.1)
Either the right-hand side (RHS) or left-hand side (LHS) of this equation must be
negative! But this is inconsistent with the positivity of the "length" of a vector,
which requires g(|a⟩, |a⟩) to be positive for all nonzero vectors, including i|a⟩.
The source of the problem is the linearity in both arguments. If we can change this
property in such a way that one of the i's in Equation (1.1) comes out complex-
conjugated, the problem may go away. This requires linearity in one argument
and complex-conjugate linearity in the other. Which argument is to be complex-
conjugate linear is a matter of convention. We choose the first argument to be so.⁶
We thus have

g(α|a⟩ + β|b⟩, |c⟩) = α*g(|a⟩, |c⟩) + β*g(|b⟩, |c⟩),

where α* denotes the complex conjugate. Consistency then requires us to change
the symmetry property as well. In fact, we must demand that g(|a⟩, |b⟩) =
(g(|b⟩, |a⟩))*, from which the reality of g(|a⟩, |a⟩), a necessary condition for its
positivity, follows immediately.
The question of the existence of an inner product on a vector space is a deep
problem in higher analysis. Generally, if an inner product exists, there may be
many ways to introduce one on a vector space. However, as we shall see in Section
1.2.4, a finite-dimensional vector space always has an inner product and this inner
product is unique.⁷ So, for all practical purposes we can speak of the inner product
on a finite-dimensional vector space, and as with the two- and three-dimensional
cases, we can omit the letter g and use a notation that involves only the vectors.
There are several such notations in use, but the one that will be employed in this
book is the Dirac bra(c)ket notation, whereby g(|a⟩, |b⟩) is denoted by ⟨a|b⟩.
Using this notation, we have

1.2.1. Definition. The inner product of two vectors, |a⟩ and |b⟩, in a vector space
V is a complex number, ⟨a|b⟩ ∈ ℂ, such that

1. ⟨a|b⟩ = ⟨b|a⟩*

2. ⟨a|(β|b⟩ + γ|c⟩) = β⟨a|b⟩ + γ⟨a|c⟩
⁵In our present discussion, we are avoiding situations in which a nonzero vector can have zero "length." Such occasions arise
in relativity, and we shall discuss them in Part VII.
⁶In some books, particularly in the mathematical literature, the second argument is chosen to be linear.
⁷This uniqueness holds up to a certain equivalence of inner products that we shall not get into here.
3. ⟨a|a⟩ ≥ 0, and ⟨a|a⟩ = 0 if and only if |a⟩ = |0⟩.

The last relation is called the positive definite property of the inner product.⁸ A pos-
itive definite inner product is also called a Riemannian inner product; otherwise,
it is called pseudo-Riemannian.

Note that linearity in the first argument is absent, because, as explained earlier,
it would be inconsistent with the first property, which expresses the "symmetry"
of the inner product. The extra operation of complex conjugation renders true
linearity in the first argument impossible. Because of this complex conjugation,
the inner product on a complex vector space is not truly bilinear; it is commonly
called sesquilinear.
A shorthand notation will be useful when dealing with the inner product of a
linear combination of vectors.

1.2.2. Box. We write the LHS of the second equation in the definition above
as ⟨a|βb + γc⟩.

This has the advantage of treating a linear combination as a single vector. The
second property then states that if the complex scalars happen to be in a ket, they
"split out" unaffected:

⟨a|βb + γc⟩ = β⟨a|b⟩ + γ⟨a|c⟩.     (1.2)

On the other hand, if the complex scalars happen to be in the first factor (the bra),
then they should be conjugated when they are "split out":

⟨βb + γc|a⟩ = β*⟨b|a⟩ + γ*⟨c|a⟩.     (1.3)
A vector space V on which an inner product is defined is called an inner
product space. As mentioned above, all finite-dimensional vector spaces can be
turned into inner product spaces.

1.2.3. Example. In this example we introduce some of the most common inner products.
The reader is urged to verify that in all cases, we indeed have an inner product.

• Let |a⟩, |b⟩ ∈ ℂⁿ, with |a⟩ = (α₁, α₂, ..., αₙ) and |b⟩ = (β₁, β₂, ..., βₙ), and
  define an inner product on ℂⁿ as

  ⟨a|b⟩ = α₁*β₁ + α₂*β₂ + ··· + αₙ*βₙ = Σᵢ₌₁ⁿ αᵢ*βᵢ.

  That this product satisfies all the required properties of an inner product is easily
  checked. For example, if |b⟩ = |a⟩, we obtain ⟨a|a⟩ = |α₁|² + |α₂|² + ··· + |αₙ|²,
  which is clearly nonnegative.
⁸The positive definiteness must be relaxed in the space-time of relativity theory, in which nonzero vectors can have zero
"length."
• Similarly, for |a⟩, |b⟩ ∈ ℝⁿ the same definition (without the complex conjugation)
  satisfies all the properties of an inner product.
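The natural inner product on ℂⁿ is a one-liner to implement. The following sketch (our illustration, not from the text) also checks the defining properties of Definition 1.2.1 on sample vectors:

```python
# The natural inner product on C^n from Example 1.2.3:
# <a|b> = sum_i conj(alpha_i) * beta_i  (conjugate-linear in the bra).

def inner(a, b):
    return sum(x.conjugate() * y for x, y in zip(a, b))

a = (1 + 1j, 2j)
b = (3, 1 - 1j)

# Hermitian symmetry, property 1: <a|b> = <b|a>*
assert inner(a, b) == inner(b, a).conjugate()
# positivity, property 3: <a|a> = |1+i|^2 + |2i|^2 = 2 + 4 = 6, real
aa = inner(a, a)
assert aa == 6
assert aa.imag == 0
```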
• For |a⟩, |b⟩ ∈ ℂ^∞ the natural inner product is defined as ⟨a|b⟩ = Σᵢ₌₁^∞ αᵢ*βᵢ. The
  question of the convergence of this sum is the subject of Problem 1.16.
• Let x(t), y(t) ∈ 𝒫ᶜ[t], the space of all polynomials in t with complex coefficients.
  Define

  ⟨x|y⟩ ≡ ∫ₐᵇ w(t)x*(t)y(t) dt,     (1.4)

  where a and b are real numbers, or infinity, for which the integral exists, and w(t)
  is a real-valued, continuous function that is always strictly positive in the interval
  (a, b). Then Equation (1.4) defines an inner product. Depending on the so-called
  weight function w(t), there can be many different inner products defined on the
  infinite-dimensional space 𝒫ᶜ[t].
• Let f, g ∈ 𝒞(a, b) and define their inner product by

  ⟨f|g⟩ ≡ ∫ₐᵇ w(x)f*(x)g(x) dx.

  It is easily shown that ⟨f|g⟩ satisfies all the requirements of the inner product if,
  as in the previous case, the weight function w(x) is always positive in the interval
  (a, b). This is called the standard inner product on 𝒞(a, b). ■
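The dependence of Equation (1.4) on the weight function is easy to see numerically. The sketch below (an editorial illustration; the midpoint quadrature and the weights chosen are our own) approximates ⟨x|y⟩ on (0, 1) for two different strictly positive weights:

```python
# Midpoint-rule approximation of the weighted inner product, Eq. (1.4),
# on the interval (a, b) = (0, 1).

def inner(f, g, w, a=0.0, b=1.0, n=20000):
    h = (b - a) / n
    s = 0j
    for k in range(n):
        t = a + (k + 0.5) * h          # midpoint of the k-th subinterval
        s += w(t) * f(t).conjugate() * g(t)
    return s * h

x = lambda t: complex(t)     # the polynomial x(t) = t
y = lambda t: complex(1)     # the constant polynomial y(t) = 1

# weight w(t) = 1:   <x|y> = integral_0^1 t dt = 1/2
assert abs(inner(x, y, lambda t: 1.0) - 0.5) < 1e-6
# weight w(t) = 2t:  <x|y> = integral_0^1 2t * t dt = 2/3
assert abs(inner(x, y, lambda t: 2 * t) - 2 / 3) < 1e-6
```

Two different weights give genuinely different inner products on the same space, as the text asserts.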
1.2.1 Orthogonality
The vectors of analytic geometry and calculus are often expressed in terms of unit
vectors along the axes, i.e., vectors that are of unit length and perpendicular to one
another. Such vectors are also important in abstract inner product spaces.
1.2.4. Definition. Vectors |a⟩, |b⟩ ∈ V are orthogonal if ⟨a|b⟩ = 0. A normal
vector, or normalized vector, |e⟩ is one for which ⟨e|e⟩ = 1. A basis B = {|eᵢ⟩}ᵢ₌₁ᴺ
in an N-dimensional vector space V is an orthonormal basis if

⟨eᵢ|eⱼ⟩ = δᵢⱼ = { 1 if i = j,
                  0 if i ≠ j,

where δᵢⱼ, defined by the last equality, is called the Kronecker delta.
1.2.5. Example. Here are examples of orthonormal bases:

• The standard basis of ℝⁿ (or ℂⁿ)

  |e₁⟩ = (1, 0, ..., 0), |e₂⟩ = (0, 1, ..., 0), ..., |eₙ⟩ = (0, 0, ..., 1)     (1.5)

  is orthonormal under the usual inner product of those spaces.
Figure 1.1 The essence of the Gram-Schmidt process is neatly illustrated in two dimensions.
The figure depicts the stages of the construction of two orthonormal vectors from |a₁⟩ and |a₂⟩.
• Let |eₖ⟩ = e^{ikx}/√(2π) be functions in 𝒞(0, 2π) with w(x) = 1. Then

  ⟨eₖ|eₖ⟩ = (1/2π) ∫₀^{2π} e^{−ikx}e^{ikx} dx = 1,

  and for l ≠ k,

  ⟨eₗ|eₖ⟩ = (1/2π) ∫₀^{2π} e^{−ilx}e^{ikx} dx = (1/2π) ∫₀^{2π} e^{i(k−l)x} dx = 0.

  Thus, ⟨eₗ|eₖ⟩ = δₗₖ. ■
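The orthonormality relation above can be confirmed numerically. A sketch (ours, not the author's) using a midpoint sum over (0, 2π):

```python
# Numerical check that <e_l|e_k> = delta_{lk} for e_k(x) = e^{ikx}/sqrt(2*pi)
# on (0, 2*pi), using a midpoint Riemann sum.
import cmath
import math

def inner_ek(l, k, n=20000):
    h = 2 * math.pi / n
    s = 0j
    for j in range(n):
        x = (j + 0.5) * h
        s += cmath.exp(-1j * l * x) * cmath.exp(1j * k * x)
    return s * h / (2 * math.pi)

assert abs(inner_ek(3, 3) - 1) < 1e-9   # l = k gives 1
assert abs(inner_ek(1, 3)) < 1e-9       # l != k gives 0
```

For integer k − l, the uniform-grid sum of e^{i(k−l)x} cancels almost exactly, so the check is sharp even with a modest number of sample points.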
1.2.2 The Gram-Schmidt Process
It is always possible to convert any basis in V into an orthonormal basis. A process
by which this may be accomplished is called Gram-Schmidt orthonormaliza-
tion. Consider a basis B = {|a₁⟩, |a₂⟩, ..., |a_N⟩}. We intend to take linear com-
binations of |aᵢ⟩ in such a way that the resulting vectors are orthonormal. First, we
let |e₁⟩ = |a₁⟩/√⟨a₁|a₁⟩ and note that ⟨e₁|e₁⟩ = 1. If we subtract from |a₂⟩ its
projection along |e₁⟩, we obtain a vector that is orthogonal to |e₁⟩ (see Figure 1.1).
Calling the resulting vector |e₂′⟩, we have |e₂′⟩ = |a₂⟩ − ⟨e₁|a₂⟩|e₁⟩, which can
be written more symmetrically as |e₂′⟩ = |a₂⟩ − |e₁⟩⟨e₁|a₂⟩. Clearly, this vector
is orthogonal to |e₁⟩. In order to normalize |e₂′⟩, we divide it by √⟨e₂′|e₂′⟩. Then
|e₂⟩ ≡ |e₂′⟩/√⟨e₂′|e₂′⟩ will be a normal vector orthogonal to |e₁⟩. Subtracting from
|a₃⟩ its projections along the first and second unit vectors obtained so far will give
the vector

|e₃′⟩ = |a₃⟩ − |e₁⟩⟨e₁|a₃⟩ − |e₂⟩⟨e₂|a₃⟩ = |a₃⟩ − Σᵢ₌₁² |eᵢ⟩⟨eᵢ|a₃⟩,
Figure 1.2 Once the orthonormal vectors in the plane of two vectors are obtained, the third orthonormal vector
is easily constructed.
which is orthogonal to both |e₁⟩ and |e₂⟩ (see Figure 1.2):

⟨e₁|e₃′⟩ = ⟨e₁|a₃⟩ − ⟨e₁|e₁⟩⟨e₁|a₃⟩ − ⟨e₁|e₂⟩⟨e₂|a₃⟩ = 0,

since ⟨e₁|e₁⟩ = 1 and ⟨e₁|e₂⟩ = 0. Similarly, ⟨e₂|e₃′⟩ = 0.
Erhard Schmidt (1876-1959) obtained his doctorate under
the supervision of David Hilbert. His main interest was in in-
tegral equations and Hilbert spaces. He is the "Schmidt" of
the Gram-Schmidt orthogonalization process, which takes
a basis of a space and constructs an orthonormal one from
it. (Laplace had presented a special case of this process long
before Gram or Schmidt.)

In 1908 Schmidt worked on infinitely many equations in
infinitely many unknowns, introducing various geometric no-
tations and terms that are still in use for describing spaces of
functions. Schmidt's ideas were to lead to the geometry of
Hilbert spaces. This was motivated by the study of integral equations (see Chapter 17) and
an attempt at their abstraction.

Earlier, Hilbert regarded a function as given by its Fourier coefficients. These satisfy
the condition that Σₙ₌₁^∞ aₙ² is finite. He introduced sequences of real numbers {xₙ} such
that Σₙ₌₁^∞ xₙ² is finite. Riesz and Fischer showed that there is a one-to-one correspondence
between square-integrable functions and square-summable sequences of their Fourier co-
efficients. In 1907 Schmidt and Fréchet showed that a consistent theory could be obtained
if the square-summable sequences were regarded as the coordinates of points in an infinite-
dimensional space that is a generalization of n-dimensional Euclidean space. Thus functions
can be regarded as points of a space, now called a Hilbert space.
In general, if we have calculated m orthonormal vectors |e₁⟩, ..., |eₘ⟩, with
m < N, then we can find the next one using the following relations:

|e′ₘ₊₁⟩ = |aₘ₊₁⟩ − Σᵢ₌₁ᵐ |eᵢ⟩⟨eᵢ|aₘ₊₁⟩,
|eₘ₊₁⟩ = |e′ₘ₊₁⟩/√⟨e′ₘ₊₁|e′ₘ₊₁⟩.     (1.6)

Even though we have been discussing finite-dimensional vector spaces, the process
of Equation (1.6) can continue for infinite dimensions as well. The reader is asked
to pay attention to the fact that, at each stage of the Gram-Schmidt process, one
is taking linear combinations of the original vectors.
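Equation (1.6) translates directly into a short program. The following sketch (our illustration, not the author's) implements the process for vectors in ℂⁿ with the natural inner product:

```python
# A sketch of the Gram-Schmidt process of Equation (1.6) for vectors
# in C^n, using the inner product <a|b> = sum conj(a_i) b_i.
import math

def inner(a, b):
    return sum(x.conjugate() * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    basis = []
    for v in vectors:
        w = list(v)
        # subtract the projections along the e_i found so far
        for e in basis:
            c = inner(e, v)                      # <e_i | a_{m+1}>
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(inner(w, w).real)       # sqrt(<e'|e'>)
        basis.append([wi / norm for wi in w])
    return basis

e = gram_schmidt([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
# orthonormality: <e_i|e_j> = delta_ij, up to rounding error
for i in range(3):
    for j in range(3):
        expected = 1.0 if i == j else 0.0
        assert abs(inner(e[i], e[j]) - expected) < 1e-12
```

This is the classical form of the algorithm; note, as the text remarks, that each |eᵢ⟩ is a linear combination of the original vectors.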
1.2.3 The Schwarz Inequality

Let us now consider an important inequality that is valid in both finite and infinite
dimensions and whose restriction to two and three dimensions is equivalent to the
fact that the cosine of the angle between two vectors is never greater than one.

1.2.6. Theorem. For any pair of vectors |a⟩, |b⟩ in an inner product space V, the
Schwarz inequality holds: ⟨a|a⟩⟨b|b⟩ ≥ |⟨a|b⟩|². Equality holds when |a⟩ is
proportional to |b⟩.
Proof. Let |c⟩ ≡ |b⟩ − (⟨a|b⟩/⟨a|a⟩)|a⟩, and note that ⟨a|c⟩ = 0. Write |b⟩ =
(⟨a|b⟩/⟨a|a⟩)|a⟩ + |c⟩ and take the inner product of |b⟩ with itself:

⟨b|b⟩ = |⟨a|b⟩|²/⟨a|a⟩ + ⟨c|c⟩.

Since the last term is never negative, we have

⟨b|b⟩ ≥ |⟨a|b⟩|²/⟨a|a⟩  ⇒  ⟨a|a⟩⟨b|b⟩ ≥ |⟨a|b⟩|².

Equality holds iff ⟨c|c⟩ = 0, i.e., |c⟩ = |0⟩. From the definition of |c⟩, we conclude
that |a⟩ and |b⟩ must be proportional. □
Notice the power of abstraction: We have derived the Schwarz inequality solely
from the basic assumptions of inner product spaces, independent of the specific
nature of the inner product. Therefore, we do not have to prove the Schwarz
inequality every time we encounter a new inner product space.
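Since Theorem 1.2.6 holds in any inner product space, it is easy to stress-test on random vectors. A sketch (our own illustration) for ℂ³ with the natural inner product:

```python
# Theorem 1.2.6 (Schwarz inequality) checked on random vectors in C^3:
# <a|a><b|b> >= |<a|b>|^2, with equality when |a> and |b> are proportional.
import random

def inner(a, b):
    return sum(x.conjugate() * y for x, y in zip(a, b))

random.seed(0)
for _ in range(100):
    a = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]
    b = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]
    lhs = (inner(a, a) * inner(b, b)).real
    assert lhs >= abs(inner(a, b)) ** 2 - 1e-12   # small slack for rounding

# equality for proportional vectors: |b> = (2 - i)|a>
a = [1 + 1j, 2j, -1]
b = [(2 - 1j) * x for x in a]
lhs = (inner(a, a) * inner(b, b)).real
rhs = abs(inner(a, b)) ** 2
assert abs(lhs - rhs) < 1e-9
```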
Karl Herman Amandus Schwarz (1843-1921), the son of an architect, was born in what
is now Sobiecin, Poland. After gymnasium, Schwarz studied chemistry in Berlin for a time
before switching to mathematics, receiving his doctorate in 1864. He was greatly influenced
by the reigning mathematicians in Germany at the time, especially Kummer and Weierstrass.
The lecture notes that Schwarz took while attending Weierstrass's lectures on the integral
calculus still exist. Schwarz received an initial appointment at Halle and later appointments
in Zurich and Göttingen before being named as Weierstrass's successor at Berlin in 1892.
These later years, filled with students and lectures, were not Schwarz's most productive,
but his early papers assure his place in mathematics history.
Schwarz's favorite tool was geometry, which he soon
turned to the study of analysis. He conclusively proved some
of Riemann's results that had been previously (and justifiably)
challenged. The primary result in question was the assertion
that every simply connected region in the plane could be con-
formally mapped onto a circular area. From this effort came
several well-known results now associated with Schwarz's
name, including the principle of reflection and Schwarz's
lemma. He also worked on surfaces of minimal area, the
branch of geometry beloved by all who dabble with soap bub-
bles.
Schwarz's most important work, for the occasion of Weierstrass's seventieth birthday,
again dealt with minimal area, specifically whether a minimal surface yields a minimal area.
Along the way, Schwarz demonstrated second variation in a multiple integral, constructed
a function using successive approximation, and demonstrated the existence of a "least"
eigenvalue for certain differential equations. This work also contained the most famous
inequality in mathematics, which bears his name.

Schwarz's success obviously stemmed from a matching of his aptitude and training to
the mathematical problems of the day. One of his traits, however, could be viewed as either
positive or negative: his habit of treating all problems, whether trivial or monumental, with
the same level of attention to detail. This might also at least partly explain the decline in
Schwarz had interests outside mathematics, although his marriage was a mathematical
one, since he married Kummer's daughter. Outside mathematics he was the captain of the
local voluntary fire brigade, and he assisted the stationmaster at the local railway station by
closing the doors of the trains!
1.2.4 Length of a Vector

In dealing with objects such as directed line segments in the plane or in space, the
intuitive idea of the length of a vector is used to define the dot product. However,
sometimes it is more convenient to introduce the inner product first and then define
the length, as we shall do now.

1.2.7. Definition. The norm, or length, of a vector |a⟩ in an inner product space
is denoted by ‖a‖ and defined as ‖a‖ ≡ √⟨a|a⟩. We use the notation ‖αa + βb‖
for the norm of the vector α|a⟩ + β|b⟩.
One can easily show that the norm has the following properties:

1. The norm of the zero vector is zero: ‖0‖ = 0.

2. ‖a‖ ≥ 0, and ‖a‖ = 0 if and only if |a⟩ = |0⟩.

3. ‖αa‖ = |α| ‖a‖ for any⁹ complex α.

4. ‖a + b‖ ≤ ‖a‖ + ‖b‖. This property is called the triangle inequality.
A vector space on which a norm is defined is called a normed linear space.
One can introduce the idea of the "distance" between two vectors in a normed
linear space. The distance between |a⟩ and |b⟩, denoted by d(a, b), is simply
the norm of their difference: d(a, b) ≡ ‖a − b‖. It can be readily shown that this
has all the properties one expects of the distance (or metric) function introduced
in Chapter 0. However, one does not need a normed space to define distance. For
example, as explained in Chapter 0, one can define the distance between two points
on the surface of a sphere, but the addition of two points on a sphere, a necessary
operation for vector space structure, is not defined. Thus the points on a sphere
form a metric space, but not a vector space.

Inner product spaces are automatically normed spaces, but the converse is not,
in general, true: There are normed spaces, i.e., spaces satisfying properties 1-4
above, that cannot be promoted to inner product spaces. However, if the norm
satisfies the parallelogram law,

‖a + b‖² + ‖a − b‖² = 2‖a‖² + 2‖b‖²,     (1.7)

then one can define

⟨a|b⟩ = ¼(‖a + b‖² − ‖a − b‖²) − (i/4)(‖a + ib‖² − ‖a − ib‖²)

and show that it is indeed an inner product. In fact, we have (see [Frie 82, pp.
203-204] for a proof) the following theorem.

1.2.8. Theorem. A normed linear space is an inner product space if and only if
the norm satisfies the parallelogram law.
Now consider any N-dimensional vector space V. Choose a basis {|aᵢ⟩}ᵢ₌₁ᴺ in
V, and for any vector |a⟩ whose components are {αᵢ}ᵢ₌₁ᴺ in this basis, define

‖a‖² ≡ Σᵢ₌₁ᴺ |αᵢ|².

The reader may check that this defines a norm, and that the norm satisfies the
parallelogram law. From Theorem 1.2.8 we have the following:

1.2.9. Theorem. Every finite-dimensional vector space can be turned into an inner
product space.
1.2.10. Example. Let the space be ℂⁿ. The natural inner product of ℂⁿ gives rise to a
norm, which, for the vector |a⟩ = (α₁, α₂, ..., αₙ), is

‖a‖ = √⟨a|a⟩ = √(Σᵢ₌₁ⁿ |αᵢ|²).

This norm yields the following distance between |a⟩ and |b⟩ = (β₁, β₂, ..., βₙ):

d(a, b) = ‖a − b‖ = √⟨a − b|a − b⟩ = √(Σᵢ₌₁ⁿ |αᵢ − βᵢ|²).

One can define other norms, such as ‖a‖₁ ≡ Σᵢ₌₁ⁿ |αᵢ|, which has all the required properties
of a norm, and leads to the distance

d₁(a, b) = ‖a − b‖₁ = Σᵢ₌₁ⁿ |αᵢ − βᵢ|.

Another norm defined on ℂⁿ is given by

‖a‖ₚ = (Σᵢ₌₁ⁿ |αᵢ|ᵖ)^{1/p},

where p is a positive integer. It is proved in higher mathematical analysis that ‖·‖ₚ has all
the properties of a norm. (The nontrivial part of the proof is to verify the triangle inequality.)
The associated distance is

dₚ(a, b) = ‖a − b‖ₚ = (Σᵢ₌₁ⁿ |αᵢ − βᵢ|ᵖ)^{1/p}.

The other two norms introduced above are special cases, for p = 2 and p = 1. ■
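The p-norms of the example, and their relation to the parallelogram law of Equation (1.7), can be tried out numerically. A sketch (our own, with sample vectors chosen for illustration):

```python
# The p-norms of Example 1.2.10 on C^n (here n = 2).

def p_norm(a, p):
    """||a||_p = (sum |alpha_i|^p)^(1/p)."""
    return sum(abs(x) ** p for x in a) ** (1 / p)

a = (3 + 4j, 1)
b = (1j, -2)

assert abs(p_norm(a, 1) - 6.0) < 1e-12           # |3+4i| + |1| = 5 + 1
assert abs(p_norm(a, 2) - 26 ** 0.5) < 1e-12     # sqrt(|3+4i|^2 + |1|^2)

# triangle inequality for several values of p
s = tuple(x + y for x, y in zip(a, b))
for p in (1, 2, 3):
    assert p_norm(s, p) <= p_norm(a, p) + p_norm(b, p) + 1e-12

# the p = 2 norm satisfies the parallelogram law, Equation (1.7)
d = tuple(x - y for x, y in zip(a, b))
lhs = p_norm(s, 2) ** 2 + p_norm(d, 2) ** 2
rhs = 2 * p_norm(a, 2) ** 2 + 2 * p_norm(b, 2) ** 2
assert abs(lhs - rhs) < 1e-9
```

By Theorem 1.2.8, the last check is what singles out p = 2 as the norm that comes from an inner product.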
1.3 Linear Transformations
We have made progress in enriching vector spaces with structures such as norms
and inner products. However, this enrichment, although important, will be of little
value if it is imprisoned in a single vector space. We would like to give vector
space properties freedom of movement, so they can go from one space to another.
The vehicle that carries these properties is a linear transformation, which is the
subject of this section. However, first it is instructive to review the concept of a
mapping (discussed in Chapter 0) by considering some examples relevant to the
present discussion.
⁹The first property follows from this by letting α = 0.

1.3.1. Example. The following are a few familiar examples of mappings.

1. Let f : ℝ → ℝ be given by f(x) = x².

2. Let g : ℝ² → ℝ be given by g(x, y) = x² + y² − 4.

3. Let F : ℝ² → ℂ be given by F(x, y) = U(x, y) + iV(x, y), where U : ℝ² → ℝ
   and V : ℝ² → ℝ.

4. Let T : ℝ → ℝ² be given by T(t) = (t + 3, 2t − 5).

5. Motion of a point particle in space can be considered as a mapping M : [a, b] → ℝ³,
   where [a, b] is an interval of the real line. For each t ∈ [a, b], we define M(t) =
   (x(t), y(t), z(t)), where x(t), y(t), and z(t) are real-valued functions of t. If we
   identify t with time, which is assumed to have a value in the interval [a, b], then
   M(t) describes the path of the particle as a function of time, and a and b are the
   beginning and the end of the motion, respectively. ■
Let us consider an arbitrary mapping F : V → W from a vector space V
to another vector space W. It is assumed that the two vector spaces are over the
same scalars, say ℂ. Consider |a⟩ and |b⟩ in V and |x⟩ and |y⟩ in W such that
F(|a⟩) = |x⟩ and F(|b⟩) = |y⟩. In general, F does not preserve the vector space
structure. That is, the image of a linear combination of vectors is not the same as
the linear combination of the images:

F(α|a⟩ + β|b⟩) ≠ αF(|a⟩) + βF(|b⟩).

This is the case for all the mappings of Example 1.3.1 except the fourth item.
There are many applications in which the preservation of the vector space structure
(preservation of the linear combination) is desired.
1.3.2. Definition. A linear transformation from the complex vector space V to
the complex vector space W is a mapping T : V → W such that

T(α|a⟩ + β|b⟩) = αT(|a⟩) + βT(|b⟩)   ∀|a⟩, |b⟩ ∈ V and α, β ∈ ℂ.

A linear transformation T : V → V is called an endomorphism of V or a linear
operator on V. The action of a linear transformation on a vector is written without
the parentheses: T(|a⟩) ≡ T|a⟩.

The same definition applies to real vector spaces. Note that the definition
demands that both vector spaces have the same set of scalars: The same scalars
multiply vectors in V on the LHS and those in W on the RHS. An immediate
consequence of the definition is the following:

1.3.3. Box. Two linear transformations T : V → W and U : V → W are
equal if and only if T|aᵢ⟩ = U|aᵢ⟩ for all |aᵢ⟩ in some basis of V. Thus, a
linear transformation is uniquely determined by its action on some basis of
its domain space.
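The defining condition of Definition 1.3.2 can be tested directly on a concrete map. The sketch below (an illustration we supply; the operator chosen is hypothetical, not from the text) checks it for T : ℂ² → ℂ², T(x₁, x₂) = (x₁ + x₂, x₁ − x₂):

```python
# Definition 1.3.2 for a concrete map on C^2:
# T(x1, x2) = (x1 + x2, x1 - x2) preserves linear combinations.

def T(v):
    return (v[0] + v[1], v[0] - v[1])

a, b = (1 + 2j, -1j), (3, 4 - 1j)
alpha, beta = 2j, -1

combo = tuple(alpha * x + beta * y for x, y in zip(a, b))
lhs = T(combo)                                            # T(alpha a + beta b)
rhs = tuple(alpha * x + beta * y for x, y in zip(T(a), T(b)))
assert lhs == rhs
```

The sample scalars and components are exact in floating point, so the two sides agree exactly here; for generic values they would agree up to rounding.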
The equality in this box is simply the set-theoretic equality of maps discussed
in Chapter 0. An important example of a linear transformation occurs when the
second vector space, W, happens to be the set of scalars, ℂ or ℝ, in which case
the linear transformation is called a linear functional.

The set of linear transformations from V to W is denoted by ℒ(V, W), and this
set happens to be a vector space. The zero transformation, 0, is defined to take every
vector in V to the zero vector of W. The sum of two linear transformations T and U
is the linear transformation T + U, whose action on a vector |a⟩ ∈ V is defined to be
(T + U)|a⟩ ≡ T|a⟩ + U|a⟩. Similarly, define αT by (αT)|a⟩ ≡ α(T|a⟩) = αT|a⟩.
The set of endomorphisms of V is denoted by ℒ(V) rather than ℒ(V, V).
The set of linear functionals ℒ(V, ℂ), or ℒ(V, ℝ) if V is a real vector space,
is denoted by V* and is called the dual space of V.
1.3.4. Example. The following are some examples of linear operators in various vector spaces. The proofs of linearity are simple in all cases and are left as exercises for the reader.

1. Let {|a_1⟩, |a_2⟩, ..., |a_m⟩} be an arbitrary finite set of vectors in V, and {f_1, f_2, ..., f_m} an arbitrary set of linear functionals on V. Let

A ≡ Σ_{k=1}^m |a_k⟩ f_k ∈ ℒ(V)

be defined by A|x⟩ = Σ_{k=1}^m |a_k⟩ f_k(|x⟩) = Σ_{k=1}^m f_k(|x⟩) |a_k⟩. Then A is a linear operator on V.

2. Let π be a permutation (shuffling) of the integers {1, 2, ..., n}. If |x⟩ = (η_1, η_2, ..., η_n) is a vector in ℂ^n, we can write

A_π |x⟩ = (η_{π(1)}, η_{π(2)}, ..., η_{π(n)}).

Then A_π is a linear operator.

3. For any |x⟩ ∈ 𝒫[t], with x(t) = Σ_{k=0}^n α_k t^k, write |y⟩ = D|x⟩, where |y⟩ is defined as y(t) = Σ_{k=1}^n k α_k t^{k−1}. Then D is a linear operator, the derivative operator.

4. For every |x⟩ ∈ 𝒫[t], with x(t) = Σ_{k=0}^n α_k t^k, write |y⟩ = S|x⟩, where |y⟩ ∈ 𝒫[t] is defined as y(t) = Σ_{k=0}^n [α_k/(k + 1)] t^{k+1}. Then S is a linear operator, the integration operator.

5. Define the operator int : C^0(a, b) → ℝ by

int(f) = ∫_a^b f(t) dt.

Then int is a linear functional on the vector space C^0(a, b).

6. Let C^n(a, b) be the set of real-valued functions defined in the interval [a, b] whose first n derivatives exist and are continuous. For any |f⟩ ∈ C^n(a, b) define |u⟩ = G|f⟩, with u(t) = g(t)f(t) and g(t) a fixed function in C^n(a, b). Then G is linear. In particular, the operation of multiplying by t, whose operator is denoted by T, is linear.
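The coefficient-level action of these operators is easy to illustrate numerically. The following Python sketch (ours, not part of the text; the names D, S, and A_pi are our own, polynomials are represented by coefficient lists, and the permutation is 0-based) verifies the linearity of the derivative operator directly:

```python
def D(coeffs):
    """Derivative operator: sum a_k t^k -> sum k a_k t^(k-1)."""
    return [k * a for k, a in enumerate(coeffs)][1:]

def S(coeffs):
    """Integration operator: sum a_k t^k -> sum a_k/(k+1) t^(k+1)."""
    return [0.0] + [a / (k + 1) for k, a in enumerate(coeffs)]

def A_pi(pi, x):
    """Permutation operator on C^n: (A_pi x)_i = x_pi(i) (0-based indices)."""
    return [x[pi[i]] for i in range(len(x))]

# Linearity check: D(alpha x + beta y) == alpha D(x) + beta D(y)
x, y, alpha, beta = [1, 2, 3], [0, 1, 4], 2.0, -1.0
lhs = D([alpha * a + beta * b for a, b in zip(x, y)])
rhs = [alpha * a + beta * b for a, b in zip(D(x), D(y))]
assert lhs == rhs
```

For instance, D applied to the coefficients of 1 + 2t + 3t² yields those of 2 + 6t.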
An immediate consequence of Definition 1.3.2 is that the image of the zero vector in V is the zero vector in W. This is not true for a general mapping, but it is necessarily true for a linear mapping. As the zero vector of V is mapped onto the zero vector of W, other vectors of V may also be dragged along. In fact, we have the following theorem.
1.3.5. Theorem. The set of vectors in V that are mapped onto the zero vector of W under the linear transformation T : V → W form a subspace of V called the kernel, or null space, of T and denoted by ker T.
kernel of a linear
transformation
Proof. The proof is left as an exercise. □
nullity The dimension of ker T is also called the nullity of T.
The proof of the following is also left as an exercise.
rank of a linear
transformation
1.3.6. Theorem. The range T(V) of a linear transformation T : V → W is a subspace of W. The dimension of T(V) is called the rank of T.
1.3.7. Theorem. A linear transformation is 1-1 (injective) iff its kernel is zero.
Proof. The "only if" part is trivial. For the "if" part, suppose T|a_1⟩ = T|a_2⟩; then linearity of T implies that T(|a_1⟩ − |a_2⟩) = 0. Since ker T = 0, we must have |a_1⟩ = |a_2⟩. □
Suppose we start with a basis of ker T and add enough linearly independent vectors to it to get a basis for V. Without loss of generality, let us assume that the first n vectors in this basis form a basis of ker T. So let B = {|a_1⟩, |a_2⟩, ..., |a_N⟩} be a basis for V and B′ = {|a_1⟩, |a_2⟩, ..., |a_n⟩} be a basis for ker T. Here N = dim V and n = dim ker T. It is straightforward to show that {T|a_{n+1}⟩, ..., T|a_N⟩} is a basis for T(V). We therefore have the following result.
dimension theorem 1.3.8. Theorem. Let T : V → W be a linear transformation. Then^10

dim V = dim ker T + dim T(V).
This theorem is called the dimension theorem. One of its consequences is that an injective endomorphism is automatically surjective, and vice versa:
1.3.9. Proposition. An endomorphism of a finite-dimensional vector space is bijective if it is either injective or surjective.
The dimension theorem is obviously valid only for finite-dimensional vector spaces. In particular, neither surjectivity nor injectivity implies bijectivity for infinite-dimensional vector spaces.
^10 Recall that the dimension of a vector space depends on the scalars used in that space. Although we are dealing with two different vector spaces here, since they are both over the same set of scalars (complex or real), no confusion in the concept of dimension arises.
1.3.10. Example. Let us try to find the kernel of T : ℝ^4 → ℝ^3 given by

2x_1 + x_2 + x_3 − x_4 = 0,
x_1 + x_2 + 2x_3 + 2x_4 = 0,
x_1 − x_3 − 3x_4 = 0.

The "solution" to these equations is x_1 = x_3 + 3x_4 and x_2 = −3x_3 − 5x_4. Thus, to be in ker T, a vector in ℝ^4 must be of the form

(x_3 + 3x_4, −3x_3 − 5x_4, x_3, x_4) = x_3(1, −3, 1, 0) + x_4(3, −5, 0, 1),

where x_3 and x_4 are arbitrary real numbers. It follows that ker T consists of vectors that can be written as linear combinations of the two linearly independent vectors (1, −3, 1, 0) and (3, −5, 0, 1). Therefore, dim ker T = 2. Theorem 1.3.8 then says that dim T(V) = 2; that is, the range of T is two-dimensional. This becomes clear when one notes that

T(x_1, x_2, x_3, x_4) = (2x_1 + x_2 + x_3 − x_4)(1, 0, 1) + (x_1 + x_2 + 2x_3 + 2x_4)(0, 1, −1),

and therefore T(x_1, x_2, x_3, x_4), an arbitrary vector in the range of T, is a linear combination of only two linearly independent vectors, (1, 0, 1) and (0, 1, −1). ■
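The computations in this example can be verified numerically. The following Python sketch (ours, not the book's) checks that the two claimed kernel vectors are annihilated by T, and that the components of any image obey the relation implied by the basis vectors (1, 0, 1) and (0, 1, −1):

```python
def T(x1, x2, x3, x4):
    """The linear map of Example 1.3.10, R^4 -> R^3."""
    return (2*x1 + x2 + x3 - x4,
            x1 + x2 + 2*x3 + 2*x4,
            x1 - x3 - 3*x4)

# the two basis vectors of ker T are mapped to zero:
assert T(1, -3, 1, 0) == (0, 0, 0)
assert T(3, -5, 0, 1) == (0, 0, 0)

# every image a(1,0,1) + b(0,1,-1) has third component a - b:
a, b, c = T(4, 7, -2, 5)
assert c == a - b
```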
isomorphism and
automorphism
In many cases, two vector spaces may "look" different, while in reality they are very much the same. For example, the set of complex numbers ℂ is a two-dimensional vector space over the reals, as is ℝ^2. Although we call the vectors of these two spaces by different names, they have very similar properties. This notion of "similarity" is made precise in the following definition.
1.3.11. Definition. A vector space V is said to be isomorphic to another vector space W if there exists a bijective linear mapping T : V → W. Then T is called an isomorphism.^11 A bijective linear map of V onto itself is called an automorphism of V. The set of automorphisms of V is denoted by GL(V).
For all practical purposes, two isomorphic vector spaces are different manifestations of the "same" vector space. In the example discussed above, the correspondence T : ℂ → ℝ^2, with T(x + iy) = (x, y), establishes an isomorphism between the two vector spaces. It should be emphasized that only as vector spaces are ℂ and ℝ^2 isomorphic. If we go beyond the vector space structures, the two sets are quite different. For example, ℂ has a natural multiplication for its elements, but ℝ^2 does not. The following theorem gives a working criterion for isomorphism.
^11 The word "isomorphism," as we shall see, is used in conjunction with many algebraic structures. To distinguish them, qualifiers need to be used. In the present context, we speak of linear isomorphism. We shall use qualifiers when necessary. However, the context usually makes the meaning of isomorphism clear.
only two
N-dimensional
vector spaces
1.3.12. Theorem. A linear surjective map T : V → W is an isomorphism if and only if its nullity is zero.
Proof. The "only if" part is obvious. To prove the "if" part, assume that the nullity is zero. Then by Theorem 1.3.7, T is 1-1. Since it is already surjective, T must be bijective. □
1.3.13. Theorem. An isomorphism T : V → W carries linearly independent sets of vectors onto linearly independent sets of vectors.
Proof. Assume that {|a_i⟩}_{i=1}^m is a set of linearly independent vectors in V. To show that {T|a_i⟩}_{i=1}^m is linearly independent in W, assume that there exist α_1, α_2, ..., α_m such that Σ_{i=1}^m α_i T|a_i⟩ = |0⟩. Then the linearity of T and Theorem 1.3.12 give T(Σ_{i=1}^m α_i |a_i⟩) = |0⟩, or Σ_{i=1}^m α_i |a_i⟩ = |0⟩, and the linear independence of the |a_i⟩ implies that α_i = 0 for all i. Therefore, {T|a_i⟩}_{i=1}^m must be linearly independent. □
The following theorem shows that finite-dimensional vector spaces are severely
limited in number:
1.3.14. Theorem. Two finite-dimensional vector spaces are isomorphic if and only if they have the same dimension.
Proof. Let B_V = {|a_i⟩}_{i=1}^N be a basis for V and B_W = {|b_i⟩}_{i=1}^N a basis for W. Define the linear transformation T|a_i⟩ = |b_i⟩, i = 1, 2, ..., N. The rest of the proof involves showing that T is an isomorphism. We leave this as an exercise for the reader. □
A consequence of Theorem 1.3.14 is that all N-dimensional vector spaces over ℝ are isomorphic to ℝ^N and all complex N-dimensional vector spaces are isomorphic to ℂ^N. So, for all practical purposes, we have only two N-dimensional vector spaces, ℝ^N and ℂ^N.
1.3.1 More on Linear Functionals
An example of isomorphism is that between a vector space and its dual space, which we discuss now. Consider an N-dimensional vector space with a basis B = {|a_1⟩, |a_2⟩, ..., |a_N⟩}. For any given set of N scalars, {α_1, α_2, ..., α_N}, define the linear functional f_α by f_α|a_j⟩ = α_j. When f_α acts on any arbitrary vector |b⟩ = Σ_{j=1}^N β_j |a_j⟩ in V, the result is

f_α|b⟩ = Σ_{j=1}^N β_j f_α|a_j⟩ = Σ_{j=1}^N α_j β_j.   (1.9)

This expression suggests that |b⟩ can be represented as a column vector with entries β_1, β_2, ..., β_N and f_α as a row vector with entries α_1, α_2, ..., α_N. Then f_α|b⟩ is
Every set ofN
scalars defines a
linear functional.
merely the matrix product^12 of the row vector (on the left) and the column vector (on the right).
f_α is uniquely determined by the set {α_1, α_2, ..., α_N}. In other words, corresponding to every set of N scalars there exists a unique linear functional. This leads us to a particular set of functionals, f_1, f_2, ..., f_N corresponding, respectively, to the sets of scalars {1, 0, 0, ..., 0}, {0, 1, 0, ..., 0}, ..., {0, 0, 0, ..., 1}. This means that

f_1|a_1⟩ = 1   and   f_1|a_j⟩ = 0   for j ≠ 1,
f_2|a_2⟩ = 1   and   f_2|a_j⟩ = 0   for j ≠ 2,
...
f_N|a_N⟩ = 1   and   f_N|a_j⟩ = 0   for j ≠ N,

or that

f_i|a_j⟩ = δ_ij,   (1.10)

where δ_ij is the Kronecker delta.
The functionals of Equation (1.10) form a basis of the dual space V*. To show this, consider an arbitrary g ∈ V*, which is uniquely determined by its action on the vectors in a basis B = {|a_1⟩, |a_2⟩, ..., |a_N⟩}. Let g|a_i⟩ = γ_i ∈ ℂ. Then we claim that g = Σ_{i=1}^N γ_i f_i. In fact, consider an arbitrary vector |a⟩ in V with components (α_1, α_2, ..., α_N) with respect to B. Then, on the one hand, g|a⟩ = g(Σ_{i=1}^N α_i |a_i⟩) = Σ_{i=1}^N α_i g|a_i⟩ = Σ_{i=1}^N α_i γ_i. On the other hand,

(Σ_{i=1}^N γ_i f_i)|a⟩ = (Σ_{i=1}^N γ_i f_i)(Σ_{j=1}^N α_j |a_j⟩)
= Σ_{i=1}^N γ_i Σ_{j=1}^N α_j f_i|a_j⟩ = Σ_{i=1}^N γ_i Σ_{j=1}^N α_j δ_ij = Σ_{i=1}^N γ_i α_i.

Since the actions of g and Σ_{i=1}^N γ_i f_i yield equal results for arbitrary |a⟩, we conclude that g = Σ_{i=1}^N γ_i f_i, i.e., {f_i}_{i=1}^N span V*. Thus, we have the following result.
1.3.15. Theorem. If V is an N-dimensional vector space with a basis B = {|a_1⟩, |a_2⟩, ..., |a_N⟩}, then there is a corresponding unique basis B* = {f_i}_{i=1}^N in V* with the property that f_i|a_j⟩ = δ_ij.
By this theorem the dual space of an N-dimensional vector space is also N-dimensional, and thus isomorphic to it. The basis B* is called the dual basis of
dual basis
B. A corollary to Theorem 1.3.15 is that to every vector in V there corresponds a
^12 Matrices will be taken up in Chapter 3. Here, we assume only a nodding familiarity with elementary matrix operations.
unique linear functional in V*. This can be seen by noting that every vector |a⟩ is uniquely determined by its components (α_1, α_2, ..., α_N) in a basis B. The unique linear functional f_α corresponding to |a⟩, also called the dual of |a⟩, is simply Σ_{j=1}^N α_j f_j, with f_j ∈ B*.
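Concretely, if the basis vectors |a_j⟩ are assembled as the columns of a matrix, the dual basis functionals f_i are the rows of its inverse. A small Python sketch (ours, not the book's; the particular basis of ℝ² is an arbitrary illustrative choice) verifies f_i|a_j⟩ = δ_ij:

```python
basis = [(1, 0), (1, 1)]   # B = {|a1>, |a2>} in R^2
dual  = [(1, -1), (0, 1)]  # B* = {f1, f2}: rows of the inverse of the
                           # matrix whose columns are the |a_j>

def apply(f, v):
    """A functional is a row vector; f(v) is the row-times-column product."""
    return sum(fi * vi for fi, vi in zip(f, v))

for i, f in enumerate(dual):
    for j, a in enumerate(basis):
        # the defining property of the dual basis: f_i |a_j> = delta_ij
        assert apply(f, a) == (1 if i == j else 0)
```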
annihilator of a vector
and a subspace
1.3.16. Definition. An annihilator of |a⟩ ∈ V is a linear functional f ∈ V* such that f|a⟩ = 0. Let W be a subspace of V. The set of linear functionals in V* that annihilate all vectors in W is denoted by W^0.
The reader may check that W^0 is a subspace of V*. Moreover, if we extend a basis {|a_i⟩}_{i=1}^k of W to a basis B = {|a_i⟩}_{i=1}^N of V, then we can show that the functionals {f_j}_{j=k+1}^N, chosen from the basis B* = {f_j}_{j=1}^N dual to B, span W^0. It then follows that

dim V = dim W + dim W^0.   (1.11)
dual, or pull back, of
a linear
transformation
We shall have occasions to use annihilators later on when we discuss symplectic
geometry.
We have "dualed" a vector, a basis, and a complete vector space. The only object remaining is a linear transformation.
1.3.17. Definition. Let T : V → U be a linear transformation. Define T* : U* → V* by^13

[T*(g)]|a⟩ = g(T|a⟩)   ∀|a⟩ ∈ V, g ∈ U*.

T* is called the dual, or pull back, of T.
One can readily verify that T* ∈ ℒ(U*, V*), i.e., that T* is a linear transformation. Some of the mapping properties of T* are tied to those of T. To see this we first consider the kernel of T*. Clearly, g is in the kernel of T* if and only if g annihilates all vectors of the form T|a⟩, i.e., all vectors in T(V). It follows that g is in T(V)^0. In particular, if T is surjective, T(V) = U, and g annihilates all vectors in U, i.e., it is the zero linear functional. We conclude that ker T* = 0, and therefore, T* is injective. Similarly, one can show that if T is injective, then T* is surjective. We summarize the discussion above:
1.3.18. Proposition. Let T be a linear transformation and T* its pull back. Then ker T* = T(V)^0. If T is surjective (injective), then T* is injective (surjective). In particular, T* is an isomorphism if T is.
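In matrix language the pull back is represented by the transpose: a functional g on the target space, written as a row vector, is sent to the row vector of g composed with T. A Python sketch (ours, not the book's; the particular T and g are arbitrary choices) verifies the defining relation [T*(g)]|a⟩ = g(T|a⟩):

```python
T = [(1, 0), (0, 1), (1, 1)]   # rows of the 3x2 matrix of T: R^2 -> R^3

def T_apply(v):
    """Apply T to a column vector v in R^2."""
    return tuple(sum(r[k] * v[k] for k in range(2)) for r in T)

def g(w):
    """A sample linear functional on R^3 (row vector (2, -1, 3))."""
    return 2*w[0] - w[1] + 3*w[2]

def T_star_g(v):
    """The pull back: (T* g)|a> = g(T|a>)."""
    return g(T_apply(v))

# T* g is again a linear functional on R^2; its row vector is obtained by
# evaluating it on the standard basis:
row = (T_star_g((1, 0)), T_star_g((0, 1)))
v = (5, -2)
assert T_star_g(v) == row[0]*v[0] + row[1]*v[1]
```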
It is useful to make a connection between the inner product and linear functionals. To do this, consider a basis {|a_1⟩, |a_2⟩, ..., |a_N⟩} and let α_j = ⟨a|a_j⟩. As noted earlier, the set of scalars {α_i}_{i=1}^N defines a unique linear functional f_α such that f_α|a_i⟩ = α_i. Since ⟨a|a_i⟩ is also equal to α_i, it is natural to identify f_α with
duals and inner
products
the symbol ⟨a|, and write T : f_α ↦ ⟨a|, where T is the identification map.
^13 Do not confuse this "*" with complex conjugation.
It is also convenient to introduce the notation^14

(|a⟩)† ≡ ⟨a|,   (1.12)

dagger of a linear
combination of
vectors
where the symbol † means "dual, or dagger of." Now we ask: How does this dagger operation act on a linear combination of vectors? Let |c⟩ = α|a⟩ + β|b⟩ and take the inner product of |c⟩ with an arbitrary vector |x⟩ using linearity in the second factor: ⟨x|c⟩ = α⟨x|a⟩ + β⟨x|b⟩. Now complex conjugate both sides and use the (sesqui)symmetry of the inner product:

(LHS)* = ⟨x|c⟩* = ⟨c|x⟩,
(RHS)* = α*⟨x|a⟩* + β*⟨x|b⟩* = α*⟨a|x⟩ + β*⟨b|x⟩
= (α*⟨a| + β*⟨b|)|x⟩.

Since this is true for all |x⟩, we must have (|c⟩)† ≡ ⟨c| = α*⟨a| + β*⟨b|. Therefore, in a duality "operation" the complex scalars must be conjugated. So, we have

(α|a⟩ + β|b⟩)† = α*⟨a| + β*⟨b|.   (1.13)

Thus, unlike the association |a⟩ ↦ f_α, which is linear, the association f_α ↦ ⟨a| is not linear, but sesquilinear, i.e., the identification map T mentioned above is sesquilinear:

T(αf_a + βf_b) = α*⟨a| + β*⟨b| = α*T(f_a) + β*T(f_b).
It is convenient to represent |a⟩ ∈ ℂ^n as a column vector

|a⟩ = (α_1, α_2, ..., α_n)^T.

Then the definition of the complex inner product suggests that the dual of |a⟩ must be represented as a row vector with complex conjugate entries:

⟨a| = (α_1* α_2* ... α_n*),   (1.14)

Compare (1.14) with
the comments after
(1.9). The complex
conjugation in (1.14)
is the result of the
sesquilinearity of the
association
|a⟩ ↔ ⟨a|.
and the inner product can be written as the (matrix) product

⟨a|b⟩ = (α_1* α_2* ... α_n*)(β_1, β_2, ..., β_n)^T = Σ_{i=1}^n α_i* β_i.

^14 The significance of this notation will become clear in Section 2.3.
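This column/row picture is easy to emulate with Python's built-in complex numbers. The following sketch (ours, not the book's notation) represents a ket by a list of complex entries, forms the bra by entrywise conjugation, and checks the sesquilinearity of the dagger operation, Equation (1.13):

```python
def bra(a):
    """Dagger: |a> -> <a| (complex-conjugate the entries of the column)."""
    return [z.conjugate() for z in a]

def inner(a, b):
    """<a|b> as the matrix product of the row <a| with the column |b>."""
    return sum(ai_star * bi for ai_star, bi in zip(bra(a), b))

a, b = [1 + 2j, 3j], [2, 1 - 1j]
alpha, beta = 2 + 1j, -1j

# sesquilinearity: (alpha|a> + beta|b>)^dagger = alpha* <a| + beta* <b|
lhs = bra([alpha*x + beta*y for x, y in zip(a, b)])
rhs = [alpha.conjugate()*x + beta.conjugate()*y for x, y in zip(bra(a), bra(b))]
assert lhs == rhs

# <a|a> is real and positive for |a> != 0 (positive definiteness)
assert inner(a, a).imag == 0 and inner(a, a).real > 0
```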
1.3.19. Example. Let U and V be vector spaces with bases B_U = {|u_i⟩}_{i=1}^m and B_V = {|v_j⟩}_{j=1}^n, respectively. Consider an mn-dimensional vector space W whose basis B_W is in one-to-one correspondence with the pairs (|u_i⟩, |v_j⟩), and let |u_i v_j⟩ be the vector corresponding to (|u_i⟩, |v_j⟩). For |u⟩ ∈ U with components {α_i}_{i=1}^m in B_U and |v⟩ ∈ V with components {η_j}_{j=1}^n in B_V, define the vector |u, v⟩ ∈ W whose components in B_W are {α_i η_j}. One can easily show that if |u⟩, |u′⟩, and |u″⟩ are vectors in U and |u″⟩ = α|u⟩ + β|u′⟩, then

|u″, v⟩ = α|u, v⟩ + β|u′, v⟩.

The space W thus defined is called the tensor product of U and V and denoted by U ⊗ V.
One can also define the tensor product of three or more vector spaces. Of special interest are tensor products of a vector space and its dual. The tensor product space V_{r,s} of type (r, s) is defined as follows:

V_{r,s} = V ⊗ V ⊗ ... ⊗ V ⊗ V* ⊗ V* ⊗ ... ⊗ V*   (r factors of V, s factors of V*).

We shall come back to this space in Chapter 25. ■
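In components, |u, v⟩ is exactly the Kronecker product of the two coefficient lists. A Python sketch (ours, not the book's) checks the linearity property |u″, v⟩ = α|u, v⟩ + β|u′, v⟩ asserted in the example:

```python
def tensor(u, v):
    """Components of |u, v>: all products alpha_i * eta_j, flattened."""
    return [ui * vj for ui in u for vj in v]

u, u_prime, v = [1, 2], [3, -1], [4, 5, 6]
alpha, beta = 2, -3

# |u''> = alpha|u> + beta|u'>
u_dp = [alpha*x + beta*y for x, y in zip(u, u_prime)]

lhs = tensor(u_dp, v)
rhs = [alpha*x + beta*y for x, y in zip(tensor(u, v), tensor(u_prime, v))]
assert lhs == rhs   # |u'', v> = alpha|u, v> + beta|u', v>
```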
1.4 Algebras
In many physical applications, a vector space has a natural "product," the prime
example being the vector space of matrices. It is therefore useful to consider vector
spaces for which such a product exists.
algebra defined 1.4.1. Definition. An algebra 𝒜 over ℂ (or ℝ) is a vector space over ℂ (or ℝ), together with a binary operation μ : V × V → V, called multiplication, that satisfies^15

a(βb + γc) = βab + γac   ∀a, b, c ∈ 𝒜, ∀β, γ ∈ ℂ (or ℝ),

dimension of the
algebra; associativity;
commutativity;
identity; and right
and left inverses
with a similar relation for multiplication on the right. The dimension of the vector space is called the dimension of the algebra. The algebra is called associative if the product satisfies a(bc) = (ab)c and commutative if it satisfies ab = ba. An algebra with identity is an algebra that has an element 1 satisfying a1 = 1a = a. An element b of an algebra with identity is said to be a left inverse of a if ba = 1. Right inverse is defined similarly.
1.4.2. Example. Define the following product on ℝ^2:

(x_1, x_2)(y_1, y_2) = (x_1y_1 − x_2y_2, x_1y_2 + x_2y_1).

The reader is urged to verify that this product turns ℝ^2 into a commutative algebra.
^15 We shall abandon the Dirac bra-and-ket notation in this section due to its clumsiness; instead we use boldface roman letters to denote vectors. It is customary to write ab for μ(a, b).
Similarly, the vector (cross) product on ℝ^3 turns it into a nonassociative, noncommutative algebra.
The paradigm of all algebras is the matrix algebra, whose binary operation is ordinary multiplication of n × n matrices. This algebra is associative but not commutative.
All the examples above are finite-dimensional algebras. An example of an infinite-dimensional algebra is C^∞(a, b), the vector space of infinitely differentiable real-valued functions on a real interval (a, b). The multiplication is defined pointwise: If f ∈ C^∞(a, b) and g ∈ C^∞(a, b), then

fg(x) ≡ f(x)g(x)   ∀x ∈ (a, b).

This algebra is commutative and associative. ■
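The product of Example 1.4.2 is precisely complex multiplication transplanted to ℝ², which is one way to see that it is commutative (and, in fact, associative). A Python sketch (ours, not the book's) makes the comparison explicit:

```python
def mult(x, y):
    """The product of Example 1.4.2 on R^2."""
    return (x[0]*y[0] - x[1]*y[1], x[0]*y[1] + x[1]*y[0])

x, y = (1.0, 2.0), (3.0, -4.0)

# same answer as multiplying x1 + i x2 by y1 + i y2:
z = complex(*x) * complex(*y)
assert mult(x, y) == (z.real, z.imag)

assert mult(x, y) == mult(y, x)   # commutativity
```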
derivation of an
algebra defined
The last item in the example above has a feature that turns out to be of great significance in all algebras, the product rule for differentiation.
1.4.3. Definition. A vector space endomorphism D : 𝒜 → 𝒜 is called a derivation on 𝒜 if it has the additional property

D(ab) = [D(a)]b + a[D(b)].
1.4.4. Example. Let 𝒜 be the set of n × n matrices. Define the binary operation, denoted by ∘, as

A ∘ B ≡ AB − BA,

where the RHS is ordinary matrix multiplication. The reader may check that 𝒜 together with this operation becomes an algebra. Now let A be a fixed matrix, and define the linear transformation D_A(B) ≡ A ∘ B. Then we note that

D_A(B ∘ C) = A ∘ (B ∘ C) = A(B ∘ C) − (B ∘ C)A
= A(BC − CB) − (BC − CB)A = ABC − ACB − BCA + CBA.

On the other hand,

(D_A B) ∘ C + B ∘ (D_A C)
= (A ∘ B) ∘ C + B ∘ (A ∘ C)
= (AB − BA) ∘ C + B ∘ (AC − CA)
= (AB − BA)C − C(AB − BA) + B(AC − CA) − (AC − CA)B
= ABC + CBA − BCA − ACB.

So, D_A is a derivation on 𝒜. ■
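The identity verified in this example is the Jacobi identity for the commutator, so it holds for any three matrices. A Python sketch (ours, not the book's, using small 2 × 2 matrices as nested lists) checks the Leibniz rule numerically:

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k]*B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matsub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def circ(A, B):
    """The algebra product: A o B = AB - BA."""
    return matsub(matmul(A, B), matmul(B, A))

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, -1]]

def D_A(X):
    """The derivation D_A(X) = A o X for the fixed matrix A above."""
    return circ(A, X)

# Leibniz rule: D_A(B o C) = D_A(B) o C + B o D_A(C)
lhs = D_A(circ(B, C))
rhs = [[x + y for x, y in zip(r1, r2)]
       for r1, r2 in zip(circ(D_A(B), C), circ(B, D_A(C)))]
assert lhs == rhs
```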
The linear transformations connecting vector spaces can be modified slightly to accommodate the binary operation of multiplication of the corresponding algebras:
algebra
homomorphism and
isomorphism
structure constants
of an algebra
1.4.5. Definition. Let 𝒜 and ℬ be algebras. A linear transformation T : 𝒜 → ℬ is called an algebra homomorphism if T(ab) = T(a)T(b). A bijective algebra homomorphism is called an algebra isomorphism.
1.4.6. Example. Let 𝒜 be ℝ^3, and ℬ the set of 3 × 3 antisymmetric matrices of the form

(  0    −x_3   x_2 )
(  x_3   0    −x_1 )
( −x_2   x_1   0  ).

Then the map T : 𝒜 → ℬ defined by sending (x_1, x_2, x_3) to the matrix above can be shown to be a linear isomorphism. Let the cross product be the binary operation on 𝒜, turning it into an algebra. For ℬ, define the binary operation of Example 1.4.4. The reader may check that, with these operations, T is extended to an algebra isomorphism. ■
Given an algebra 𝒜 and a basis B = {e_i}_{i=1}^N for the underlying vector space, one can write

e_i e_j = Σ_{k=1}^N c_{ij}^k e_k.   (1.15)

The complex numbers c_{ij}^k, the components of the vector e_i e_j in the basis B, are called the structure constants of 𝒜. These constants determine the product of any two vectors once they are expressed in terms of the basis vectors of B. Conversely,

1.4.7. Box. Given any N-dimensional vector space V, one can turn it into an algebra by choosing a basis and a set of N^3 numbers {c_{ij}^k} and defining the product of basis vectors by Equation (1.15).
1.4.8. Example. Consider the vector space of n × n matrices with its standard basis {e_ij}_{i,j=1}^n, where e_ij has a 1 at the ij-th position and zero everywhere else. This means that (e_ij)_lk = δ_il δ_jk, and

(e_ij e_kl)_mn = Σ_{r=1}^n (e_ij)_mr (e_kl)_rn = Σ_{r=1}^n δ_im δ_jr δ_kr δ_ln = δ_im δ_jk δ_ln = δ_jk (e_il)_mn,

or

e_ij e_kl = δ_jk e_il.

The structure constants are c_{ij,kl}^{mn} = δ_im δ_jk δ_ln. Note that one needs a double index to label these constants. ■
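The relation e_ij e_kl = δ_jk e_il can be checked mechanically. A Python sketch (ours, not the book's; n = 2 and 0-based indices):

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k]*B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def e(i, j, n=2):
    """Matrix unit e_ij: 1 in the ij-th slot, zeros elsewhere (0-based)."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]

n = 2
zero = [[0]*n for _ in range(n)]
for i in range(n):
    for j in range(n):
        for k in range(n):
            for l in range(n):
                # e_ij e_kl = delta_jk e_il
                expected = e(i, l) if j == k else zero
                assert matmul(e(i, j), e(k, l)) == expected
```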
1.4.9. Example. In the standard basis {e_i} of ℝ^4, choose the structure constants as follows:

e_1^2 = −e_2^2 = −e_3^2 = −e_4^2 = e_1,
e_1 e_i = e_i e_1 = e_i   for i = 2, 3, 4,
e_i e_j = Σ_k ε_ijk e_k   for i, j = 2, 3, 4, i ≠ j

algebra of
quaternions
[ε_ijk is defined in Equation (3.19)]. The reader may verify that these relations turn ℝ^4 into an associative, but noncommutative, algebra, called the algebra of quaternions and denoted by ℍ. In this context, e_1 is usually denoted by 1, and e_2, e_3, and e_4 by i, j, and k, respectively, and one writes q = x + iy + jz + kw for an element of ℍ. It then becomes evident that ℍ is a generalization of ℂ. In analogy with ℂ, x is called the real part of q, and (y, z, w) the pure part of q. Similarly, the conjugate of q is q* = x − iy − jz − kw. ■
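These structure constants are equivalent to the familiar Hamilton product. A Python sketch (ours, not the book's; quaternions are stored as 4-tuples (x, y, z, w) representing x + iy + jz + kw) verifies i² = j² = k² = −1 and ij = k = −ji:

```python
def qmul(p, q):
    """Hamilton product of quaternions p = x + iy + jz + kw."""
    x1, y1, z1, w1 = p
    x2, y2, z2, w2 = q
    return (x1*x2 - y1*y2 - z1*z2 - w1*w2,
            x1*y2 + y1*x2 + z1*w2 - w1*z2,
            x1*z2 - y1*w2 + z1*x2 + w1*y2,
            x1*w2 + y1*z2 - z1*y2 + w1*x2)

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
minus_one = (-1, 0, 0, 0)

assert qmul(i, i) == qmul(j, j) == qmul(k, k) == minus_one
assert qmul(i, j) == k and qmul(j, i) == (0, 0, 0, -1)   # noncommutative

# associativity spot-check: (ij)k == i(jk)
assert qmul(qmul(i, j), k) == qmul(i, qmul(j, k))
```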
Algebras have a surprisingly rich structure and are used extensively in many branches of mathematics and physics. We shall see their usefulness in our discussion of group theory in Part VII. To close this section, and to complete this introductory discussion of algebras, we cite a few more useful notions.
left, right, and 1.4.10. Definition. Let 𝒜 be an algebra. A subspace ℬ of 𝒜 is called a subalgebra
two-sided ideals of 𝒜 if ℬ contains the products of all its members. If ℬ has the extra property that it contains ab for all a ∈ 𝒜 and b ∈ ℬ, then ℬ is called a left ideal of 𝒜. A right ideal and a two-sided ideal are defined similarly.
It is clear from the definition that an ideal is automatically a subalgebra, and that

1.4.11. Box. No proper ideal of an algebra with identity can contain the identity element.

In fact, no proper left (right) ideal can contain an element that has a left (right)
minimal ideal inverse. An ideal can itself contain a proper (sub)ideal. If an ideal does not contain any proper subideal, it is called a minimal ideal.
1.4.12. Example. The vector space C^0(a, b) of all continuous real-valued functions on an interval (a, b) is turned into a commutative algebra by pointwise multiplication: If f, g ∈ C^0(a, b), then the product fg is defined by fg(x) ≡ f(x)g(x) for all x ∈ (a, b). The set of functions that vanish at a given fixed point c ∈ (a, b) constitutes an ideal in C^0(a, b). Since the algebra is commutative, the ideal is two-sided. ■
ideals generated by
an element of an
algebra
One can easily construct left ideals for an algebra 𝒜: Take any element x ∈ 𝒜 and consider the set

𝒜x ≡ {ax | a ∈ 𝒜}.

The reader may check that 𝒜x is a left ideal. A right ideal can be constructed similarly. To construct a two-sided ideal, consider the set

𝒜x𝒜 ≡ {axb | a, b ∈ 𝒜}.

These are all called ideals generated by x.
1.5 Problems
1.1. Let ℝ^+ denote the set of positive real numbers. Define the "sum" of two elements of ℝ^+ to be their usual product, and define scalar multiplication by elements of ℝ as being given by r · p = p^r, where r ∈ ℝ and p ∈ ℝ^+. With these operations, show that ℝ^+ is a vector space over ℝ.
1.2. Show that the intersection of two subspaces is also a subspace.
1.3. For each of the following subsets of ℝ^3 determine whether it is a subspace of ℝ^3:
(a) {(x, y, z) ∈ ℝ^3 | x + y − 2z = 0};
(b) {(x, y, z) ∈ ℝ^3 | x + y − 2z = 3};
(c) {(x, y, z) ∈ ℝ^3 | xyz = 0}.
1.4. Prove that the components of a vector in a given basis are unique.
1.5. Show that the following vectors form a basis for ℂ^n (or ℝ^n):

|a_1⟩ = (1, 1, ..., 1, 1)^T,   |a_2⟩ = (1, 1, ..., 1, 0)^T,   ...,   |a_n⟩ = (1, 0, ..., 0, 0)^T.
1.6. Let W be a subspace of ℝ^5 defined by

W = {(x_1, ..., x_5) ∈ ℝ^5 | x_1 = 3x_2 + x_3, x_2 = x_5, and x_4 = 2x_3}.

Find a basis for W.
1.7. Show that the inner product of any vector with |0⟩ is zero.
1.8. Prove Theorem 1.1.5.
1.9. Find a_0, b_0, b_1, c_0, c_1, and c_2 such that the polynomials a_0, b_0 + b_1 t, and c_0 + c_1 t + c_2 t^2 are mutually orthonormal in the interval [0, 1]. The inner product is as defined for polynomials in Example 1.2.3 with w(t) = 1.
1.10. Given the linearly independent vectors x_n(t) = t^n, for n = 0, 1, 2, ... in 𝒫[t], use the Gram-Schmidt process to find the orthonormal polynomials e_0(t), e_1(t), and e_2(t)
(a) when the inner product is defined as ⟨x|y⟩ = ∫_{−1}^{1} x*(t)y(t) dt;
(b) when the inner product is defined with a nontrivial weight function:

⟨x|y⟩ = ∫_{−∞}^{∞} e^{−t²} x*(t)y(t) dt.

Hint: Use the following result:

∫_{−∞}^{∞} e^{−t²} t^n dt = √π                              if n = 0,
                          = 0                               if n is odd,
                          = [1·3·5···(n − 1)/2^{n/2}] √π    if n is even.
1.11. (a) Use the Gram-Schmidt process to find an orthonormal set of vectors out of (1, −1, 1), (−1, 0, 1), and (2, −1, 2).
(b) Are these three vectors linearly independent? If not, find a zero linear combination of them by using part (a).
1.12. (a) Use the Gram-Schmidt process to find an orthonormal set of vectors out of (1, −1, 2), (−2, 1, −1), and (−1, −1, 4).
(b) Are these three vectors linearly independent? If not, find a zero linear combination of them by using part (a).
1.13. Show that
1.14. Show that

∫_{−∞}^{∞} dx ∫_{−∞}^{∞} dy (x^5 − x^3 + 2x^2 − 2)(y^5 − y^3 + 2y^2 − 2) e^{−(x^4 + y^4)}
≤ ∫_{−∞}^{∞} dx ∫_{−∞}^{∞} dy (x^4 − 2x^2 + 1)(y^6 + 4y^3 + 4) e^{−(x^4 + y^4)}.
Hint: Define an appropriate inner product and use the Schwarz inequality.
1.15. Show that for any set of n complex numbers α_1, α_2, ..., α_n, we have

|α_1 + α_2 + ... + α_n|^2 ≤ n(|α_1|^2 + |α_2|^2 + ... + |α_n|^2).

Hint: Apply the Schwarz inequality to (1, 1, ..., 1) and (α_1, α_2, ..., α_n).
1.16. Using the Schwarz inequality show that if {α_i}_{i=1}^∞ and {β_i}_{i=1}^∞ are in ℂ^∞, then Σ_{i=1}^∞ α_i* β_i is convergent.
1.17. Show that T : ℝ^2 → ℝ^3 given by T(x, y) = (x^2 + y^2, x + y, 2x − y) is not a linear mapping.
1.18. Verify that all the transformations of Example 1.3.4 are linear.
1.19. Let π be the permutation that takes (1, 2, 3) to (3, 1, 2). Find

A_π |e_i⟩,   i = 1, 2, 3,

where {|e_i⟩}_{i=1}^3 is the standard basis of ℝ^3 (or ℂ^3), and A_π is as defined in Example 1.3.4.
1.20. Show that if T ∈ ℒ(ℂ, ℂ), then there exists α ∈ ℂ such that T|a⟩ = α|a⟩ for all |a⟩ ∈ ℂ.
1.21. Show that if {|a_i⟩}_{i=1}^N spans V and T ∈ ℒ(V, W) is surjective, then {T|a_i⟩}_{i=1}^N spans W.
1.22. Give an example of a function f : ℝ^2 → ℝ such that

f(α|a⟩) = αf(|a⟩)   ∀α ∈ ℝ and |a⟩ ∈ ℝ^2

but f is not linear. Hint: Consider a homogeneous function of degree 1.
1.23. Show that the following transformations are linear:
(a) V is ℂ over the reals and C|z⟩ = |z*⟩. Is C linear if, instead of real numbers, complex numbers are used as scalars?
(b) V is 𝒫[t] and T|x(t)⟩ = |x(t + 1)⟩ − |x(t)⟩.
1.24. Verify that the kernel of a transformation T : V → W is a subspace of V, and that T(V) is a subspace of W.
1.25. Let V and W be finite-dimensional vector spaces. Show that if T ∈ ℒ(V, W) is surjective, then dim W ≤ dim V.
1.26. Suppose that V is finite-dimensional and T ∈ ℒ(V, W) is not zero. Prove that there exists a subspace U of V such that ker T ∩ U = {0} and T(V) = T(U).
1.27. Show that W^0 is a subspace of V* and

dim V = dim W + dim W^0.

1.28. Show that T and T* have the same rank. In particular, show that if T is injective, then T* is surjective. Hint: Use the dimension theorem for T and T* and Equation (1.11).
1.29. Show that (a) the product on ℝ^2 defined in Example 1.4.2 turns ℝ^2 into an associative and commutative algebra, and (b) the cross product on ℝ^3 turns it into a nonassociative, noncommutative algebra.
1.30. Fix a vector a ∈ ℝ^3 and define the linear transformation D_a : ℝ^3 → ℝ^3 by D_a(b) = a × b. Show that D_a is a derivation of ℝ^3 with the cross product as multiplication.
1.31. Show that the linear transformation of Example 1.4.6 is an isomorphism of the two algebras 𝒜 and ℬ.
1.32. Write down all the structure constants for the algebra of quaternions. Show that this algebra is associative.
1.33. Show that a quaternion is real iff it commutes with every quaternion and that it is pure iff its square is a nonpositive real number.
1.34. Let p and q be two quaternions. Show that
(a) (pq)* = q*p*.
(b) q ∈ ℝ iff q* = q, and q ∈ ℝ^3 iff q* = −q.
(c) qq* = q*q is a nonnegative real number.
1.35. Show that no proper left (right) ideal of an algebra with identity can contain an element that has a left (right) inverse.
1.36. Let 𝒜 be an algebra, and x ∈ 𝒜. Show that 𝒜x is a left ideal, x𝒜 is a right ideal, and 𝒜x𝒜 is a two-sided ideal.
Additional Reading
1. Axler, S. Linear Algebra Done Right, Springer-Verlag, 1996. A small text packed with information. Lots of marginal notes and historical remarks.
2. Greub, W. Linear Algebra, 4th ed., Springer-Verlag, 1975. Chapter V has a good discussion of algebras and their properties.
3. Halmos, P. Finite Dimensional Vector Spaces, 2nd ed., Van Nostrand, 1958. Another great book by the master expositor.
2

Operator Algebra
Recall that a vector space in which one can multiply two vectors to obtain a third vector is called an algebra. In this chapter, we want to investigate the algebra of linear transformations. We have already established that the set of linear transformations ℒ(V, W) from V to W is a vector space. Let us attempt to define a multiplication as well. The best candidate is the composition of linear transformations. If T : V → U and S : U → W are linear operators, then the composition S ∘ T : V → W is also a linear operator, as can easily be verified.
This product, however, is not defined on a single vector space, but is such that it takes an element in ℒ(V, U) and another element in a second vector space ℒ(U, W) to give an element in yet another vector space ℒ(V, W). An algebra requires a single vector space. We can accomplish this by letting V = U = W. Then the three spaces of linear transformations collapse to the single space ℒ(V, V), the set of endomorphisms of V, which we abbreviate as ℒ(V) and to which T, S, ST ≡ S ∘ T, and TS ≡ T ∘ S belong. The space ℒ(V) is the algebra of the linear operators on V.
2.1 Algebra of ℒ(V)
Operator algebra encompasses various operations on, and relations among, operators. One of these relations is the equality of operators, which is intuitively obvious; nevertheless, we make it explicit in (see also Box 1.3.3) the following definition.
operator equality 2.1.1. Definition. Two linear operators T, U ∈ ℒ(V) are equal if T|a⟩ = U|a⟩ for all |a⟩ ∈ V.
Because of the linearity of T and U, we have
2.1.2. Box. Two endomorphisms T, U ∈ ℒ(V) are equal if T|a_i⟩ = U|a_i⟩ for all |a_i⟩ ∈ B, where B is a basis of V. Therefore, an endomorphism is uniquely determined by its action on the vectors of a basis.
The equality of operators can also be established by other, more convenient,
methods when an inner product is defined on the vector space. The following two
theorems contain the essence of these alternatives.
2.1.3. Theorem. An endomorphism T of an inner product space is 0 if and only if^1 ⟨b|T|a⟩ ≡ ⟨b|Ta⟩ = 0 for all |a⟩ and |b⟩.
Proof. Clearly, if T = 0 then ⟨b|T|a⟩ = 0. Conversely, if ⟨b|T|a⟩ = 0 for all |a⟩ and |b⟩, then, choosing |b⟩ = T|a⟩ ≡ |Ta⟩, we obtain

⟨Ta|Ta⟩ = 0 ∀|a⟩   ⇒   T|a⟩ = 0 ∀|a⟩   ⇒   T = 0

by positive definiteness of the inner product. □
2.1.4. Theorem. A linear operator T on an inner product space is 0 if and only if ⟨a|T|a⟩ = 0 for all |a⟩.
Proof. Obviously, if T = 0, then ⟨a|T|a⟩ = 0. Conversely, choose a vector α|a⟩ + β|b⟩, sandwich T between this vector and its dual, and rearrange terms to obtain
polarization identity what is known as the polarization identity:

α*β⟨a|T|b⟩ + αβ*⟨b|T|a⟩ = ⟨αa + βb|T|αa + βb⟩ − |α|²⟨a|T|a⟩ − |β|²⟨b|T|b⟩.

According to the assumption of the theorem, the RHS is zero. Thus, if we let α = β = 1 we obtain ⟨a|T|b⟩ + ⟨b|T|a⟩ = 0. Similarly, with α = 1 and β = i we get i⟨a|T|b⟩ − i⟨b|T|a⟩ = 0. These two equations give ⟨a|T|b⟩ = 0 for all |a⟩, |b⟩. By Theorem 2.1.3, T = 0. □
To show that two operators U and T are equal, one can either have them act
on an arbitrary vector and show that they give the same result, or one verifies that
U − T is the zero operator by means of one of the theorems above. Equivalently, one
shows that ⟨a|T|b⟩ = ⟨a|U|b⟩ or ⟨a|T|a⟩ = ⟨a|U|a⟩ for all |a⟩, |b⟩. In addition
to the zero element, which is present in all algebras, L(V) has an identity element,
1, which satisfies the relation 1|a⟩ = |a⟩ for all |a⟩ ∈ V. With 1 in our possession,
we can ask whether it is possible to find an operator T⁻¹ with the property that
T⁻¹T = TT⁻¹ = 1. Generally speaking, only bijective mappings have inverses.
Therefore, only automorphisms of a vector space are invertible.
¹It is convenient here to use the notation |Ta⟩ for T|a⟩. This would then allow us to write the dual of the vector as ⟨Ta|,
emphasizing that it is indeed the bra associated with T|a⟩.
2.1 ALGEBRA OF L(V) 51
2.1.5. Example. Let the linear operator T : ℝ³ → ℝ³ be defined by
T(x₁, x₂, x₃) = (x₁ + x₂, x₂ + x₃, x₁ + x₃).
We want to see whether T is invertible and, if so, find its inverse. T has an inverse if and
only if it is bijective. By the comments after Theorem 1.3.8 this is the case if and only if T
is either surjective or injective. The latter is equivalent to ker T = {0}. But ker T is the set
of all vectors satisfying T(x₁, x₂, x₃) = (0, 0, 0), or
x₁ + x₂ = 0,  x₂ + x₃ = 0,  x₁ + x₃ = 0.
The reader may check that the unique solution to these equations is x₁ = x₂ = x₃ = 0.
Thus, the only vector belonging to ker T is the zero vector. Therefore, T has an inverse.
To find T⁻¹, apply T⁻¹T = 1 to (x₁, x₂, x₃):
(x₁, x₂, x₃) = T⁻¹T(x₁, x₂, x₃) = T⁻¹(x₁ + x₂, x₂ + x₃, x₁ + x₃).
This equation demonstrates how T⁻¹ acts on vectors. To make this more apparent, we let
x₁ + x₂ = x, x₂ + x₃ = y, x₁ + x₃ = z, solve for x₁, x₂, and x₃ in terms of x, y, and z,
and substitute in the preceding equation to obtain
T⁻¹(x, y, z) = ½(x − y + z, x + y − z, −x + y + z).
Rewriting this equation in terms of x₁, x₂, and x₃ gives
T⁻¹(x₁, x₂, x₃) = ½(x₁ − x₂ + x₃, x₁ + x₂ − x₃, −x₁ + x₂ + x₃).
We can easily verify that T⁻¹T = 1 and that TT⁻¹ = 1. ■
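The inverse found in this example can be checked numerically. The following Python/numpy sketch (an illustration, not part of the text) represents T by its matrix in the standard basis of ℝ³ and verifies T⁻¹T = TT⁻¹ = 1.

```python
import numpy as np

# Matrix of T in the standard basis: T(x1, x2, x3) = (x1+x2, x2+x3, x1+x3)
T = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)

# ker T = {0} iff det T != 0; here det T = 2, so T is invertible
assert abs(np.linalg.det(T) - 2.0) < 1e-12

# The inverse found in the example: T^{-1}(x,y,z) = (1/2)(x-y+z, x+y-z, -x+y+z)
T_inv = 0.5 * np.array([[ 1, -1,  1],
                        [ 1,  1, -1],
                        [-1,  1,  1]])

assert np.allclose(T @ T_inv, np.eye(3))   # T T^{-1} = 1
assert np.allclose(T_inv @ T, np.eye(3))   # T^{-1} T = 1
print("T inverse verified")
```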
The following theorem, whose proof is left as an exercise, describes some
properties of the inverse operator.
2.1.6. Theorem. The inverse of a linear operator is unique. If T and S are two
invertible linear operators, then TS is also invertible, and
(TS)⁻¹ = S⁻¹T⁻¹.
An endomorphism T : V → V is invertible if and only if it carries a basis of V
onto another basis of V.
2.1.1 Polynomials of Operators
With products and sums of operators defined, we can construct polynomials of
operators. We define powers of T inductively as Tᵐ = TTᵐ⁻¹ = Tᵐ⁻¹T for all
positive integers m ≥ 1. The consistency of this equation (for m = 1) demands
that T⁰ = 1. It follows that polynomials such as
p(T) = α₀1 + α₁T + α₂T² + ··· + αₙTⁿ
can be defined.
2.1.7. Example. Let T_θ : ℝ² → ℝ² be the linear operator that rotates vectors in the
xy-plane through the angle θ, that is,
T_θ(x, y) = (x cos θ − y sin θ, x sin θ + y cos θ).
We are interested in powers of T_θ. Writing x′ = x cos θ − y sin θ and y′ = x sin θ + y cos θ,
we have
T_θ²(x, y) = T_θ(x cos θ − y sin θ, x sin θ + y cos θ)
= (x′ cos θ − y′ sin θ, x′ sin θ + y′ cos θ)
= ((x cos θ − y sin θ) cos θ − (x sin θ + y cos θ) sin θ,
(x cos θ − y sin θ) sin θ + (x sin θ + y cos θ) cos θ)
= (x cos 2θ − y sin 2θ, x sin 2θ + y cos 2θ).
Thus, T_θ² rotates (x, y) by 2θ. Similarly, one can show that
T_θ³(x, y) = (x cos 3θ − y sin 3θ, x sin 3θ + y cos 3θ),
and in general, T_θⁿ(x, y) = (x cos nθ − y sin nθ, x sin nθ + y cos nθ), which shows that T_θⁿ
is a rotation of (x, y) through the angle nθ, that is, T_θⁿ = T_{nθ}. This result could have been
guessed, because T_θⁿ is equivalent to rotating (x, y) n times, each time by an angle θ. ■
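The relation T_θⁿ = T_{nθ} is easy to confirm numerically. In the sketch below (illustrative; the angle 0.3 and the power n = 5 are arbitrary choices), T_θ is represented by its 2 × 2 rotation matrix.

```python
import numpy as np

theta = 0.3  # an arbitrary illustrative angle

def rot(a):
    """Matrix of T_a: rotation of the xy-plane through angle a."""
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

# Applying T_theta n times equals a single rotation through n*theta
n = 5
assert np.allclose(np.linalg.matrix_power(rot(theta), n), rot(n * theta))
print("T_theta^n = T_{n theta} verified for n =", n)
```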
Negative powers of an invertible linear operator T are defined by T⁻ᵐ ≡
(T⁻¹)ᵐ. The exponents of T satisfy the usual rules. In particular, for any two
integers m and n (positive or negative), TᵐTⁿ = Tᵐ⁺ⁿ and (Tᵐ)ⁿ = Tᵐⁿ. The
first relation implies that the inverse of Tᵐ is T⁻ᵐ. One can further generalize the
exponent to include fractions and ultimately all real numbers; but we need to wait
until Chapter 4, in which we discuss the spectral decomposition theorem.
2.1.8. Example. Let us evaluate T_θ⁻ⁿ for the operator of the previous example. First, let
us find T_θ⁻¹ (see Figure 2.1). We are looking for an operator such that T_θ⁻¹T_θ(x, y) = (x, y),
or
T_θ⁻¹(x cos θ − y sin θ, x sin θ + y cos θ) = (x, y). (2.1)
We define x′ = x cos θ − y sin θ and y′ = x sin θ + y cos θ and solve x and y in terms of
x′ and y′ to obtain x = x′ cos θ + y′ sin θ and y = −x′ sin θ + y′ cos θ. Substituting for x
and y in Equation (2.1) yields
T_θ⁻¹(x′, y′) = (x′ cos θ + y′ sin θ, −x′ sin θ + y′ cos θ).
Comparing this with the action of T_θ in the previous example, we discover that the only
difference between the two operators is the sign of the sine term. We conclude that T_θ⁻¹
has the same effect as T_{−θ}. So we have
T_θ⁻¹ = T_{−θ} and T_θ⁻ⁿ = (T_θ⁻¹)ⁿ = (T_{−θ})ⁿ = T_{−nθ}.
"J8Cx,y)
(x,y)
Figure 2.1 Theoperator T(;I andits inverse astheyacton apointin theplane.
It is instructive to verify that T_θ⁻ⁿT_θⁿ = 1. With x′ = x cos nθ − y sin nθ and
y′ = x sin nθ + y cos nθ,
T_θ⁻ⁿT_θⁿ(x, y) = T_θ⁻ⁿ(x cos nθ − y sin nθ, x sin nθ + y cos nθ)
= (x′ cos nθ + y′ sin nθ, −x′ sin nθ + y′ cos nθ)
= ((x cos nθ − y sin nθ) cos nθ + (x sin nθ + y cos nθ) sin nθ,
−(x cos nθ − y sin nθ) sin nθ + (x sin nθ + y cos nθ) cos nθ)
= (x(cos² nθ + sin² nθ), y(sin² nθ + cos² nθ)) = (x, y).
Similarly, we can show that T_θⁿT_θ⁻ⁿ(x, y) = (x, y). ■
One has to keep in mind that p(T) is not, in general, invertible, even if T is. In
fact, the sum of two invertible operators is not necessarily invertible. For example,
although T and −T are invertible, their sum, the zero operator, is not.
2.1.2 Functions of Operators
We can go one step beyond polynomials of operators and, via Taylor expansion,
define functions of them. Consider an ordinary function f(x), which has the Taylor
expansion
f(x) = Σ_{k=0}^∞ [(x − x₀)ᵏ/k!] (dᵏf/dxᵏ)|_{x=x₀},
in which x₀ is a point where f(x) and all its derivatives are defined. To this function,
there corresponds a function of the operator T, defined as
f(T) = Σ_{k=0}^∞ (dᵏf/dxᵏ)|_{x=x₀} (T − x₀1)ᵏ/k!. (2.2)
Because this series is an infinite sum of operators, difficulties may arise concerning
its convergence. However, as will be shown in Chapter 4, f(T) is always defined
for finite-dimensional vector spaces. In fact, it is always a polynomial in T. For
the time being, we shall think of f(T) as a formal infinite series. A simplification
results when the function can be expanded about x = 0. In this case we obtain
f(T) = Σ_{k=0}^∞ (dᵏf/dxᵏ)|_{x=0} Tᵏ/k!. (2.3)
A widely used function is the exponential, whose expansion is easily found to be
e^T ≡ exp(T) = Σ_{k=0}^∞ Tᵏ/k!. (2.4)
2.1.9. Example. Let us evaluate exp(αT) when T : ℝ² → ℝ² is given by
T(x, y) = (−y, x).
We can find a general formula for the action of Tⁿ on (x, y). Start with n = 2:
T²(x, y) = T(−y, x) = (−x, −y) = −(x, y) = −1(x, y).
Thus, T² = −1. From T and T² we can easily obtain higher powers of T. For example,
T³ = T(T²) = −T, T⁴ = T²T² = 1, and in general,
T²ⁿ = (−1)ⁿ1 for n = 0, 1, 2, ...,
T²ⁿ⁺¹ = (−1)ⁿT for n = 0, 1, 2, ....
Thus,
exp(αT) = Σ_{n odd} (αT)ⁿ/n! + Σ_{n even} (αT)ⁿ/n! = Σ_{k=0}^∞ (αT)²ᵏ⁺¹/(2k+1)! + Σ_{k=0}^∞ (αT)²ᵏ/(2k)!
= Σ_{k=0}^∞ α²ᵏ⁺¹T²ᵏ⁺¹/(2k+1)! + Σ_{k=0}^∞ α²ᵏT²ᵏ/(2k)!
= [Σ_{k=0}^∞ (−1)ᵏα²ᵏ⁺¹/(2k+1)!] T + [Σ_{k=0}^∞ (−1)ᵏα²ᵏ/(2k)!] 1.
The two series are recognized as sin α and cos α, respectively. Therefore, we get
e^{αT} = T sin α + 1 cos α,
which shows that e^{αT} is a polynomial (of first degree) in T.
The action of e^{αT} on (x, y) is given by
e^{αT}(x, y) = (T sin α + 1 cos α)(x, y) = sin α T(x, y) + cos α (x, y)
= (sin α)(−y, x) + (cos α)(x, y)
= (−y sin α, x sin α) + (x cos α, y cos α)
= (x cos α − y sin α, x sin α + y cos α).
The reader will recognize the final expression as a rotation in the xy-plane through an angle
α. Thus, we can think of e^{αT} as a rotation operator of angle α about the z-axis. In this
context T is called the generator of the rotation. ■
2.1.3 Commutators
The result of multiplication of two operators depends on the order in which the
operators appear. This means that if T, U ∈ L(V), then TU ∈ L(V) and UT ∈ L(V);
however, in general UT ≠ TU. When this is the case, we say that U and T do
not commute. The extent to which two operators fail to commute is given in the
following definition.
commutator defined 2.1.10. Definition. The commutator [U, T] of the two operators U and T in L(V)
is another operator in L(V), defined as
[U, T] ≡ UT − TU.
An immediate consequence of this definition is the following:
2.1.11. Proposition. For S, T, U ∈ L(V) and α, β ∈ ℂ (or ℝ), we have
[U, T] = −[T, U], (antisymmetry)
[αU, βT] = αβ[U, T], (linearity)
[S, T + U] = [S, T] + [S, U], (linearity in the right entry)
[S + T, U] = [S, U] + [T, U], (linearity in the left entry)
[ST, U] = S[T, U] + [S, U]T, (right derivation property)
[S, TU] = [S, T]U + T[S, U], (left derivation property)
[[S, T], U] + [[U, S], T] + [[T, U], S] = 0. (Jacobi identity)
Proof. In almost all cases the proof follows immediately from the definition. The
only minor exceptions are the derivation properties. We prove the left derivation
property by adding and subtracting TSU:
[S, TU] = S(TU) − (TU)S = STU − TUS + TSU − TSU
= (ST − TS)U + T(SU − US) = [S, T]U + T[S, U].
The right derivation property is proved in exactly the same way. □
A useful consequence of the definition and Proposition 2.1.11 is
[A, Aᵐ] = 0 for m = 0, ±1, ±2, ....
In particular, [A, 1] = 0 and [A, A⁻¹] = 0.
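The identities of Proposition 2.1.11 can be spot-checked on random matrices, since matrix multiplication realizes the operator product. The sketch below (illustrative, not from the text) tests antisymmetry, the left derivation property, and the Jacobi identity.

```python
import numpy as np

rng = np.random.default_rng(0)
S, T, U = (rng.standard_normal((3, 3)) for _ in range(3))

def comm(A, B):
    """Commutator [A, B] = AB - BA."""
    return A @ B - B @ A

assert np.allclose(comm(U, T), -comm(T, U))                        # antisymmetry
assert np.allclose(comm(S, T @ U), comm(S, T) @ U + T @ comm(S, U))  # left derivation
jacobi = comm(comm(S, T), U) + comm(comm(U, S), T) + comm(comm(T, U), S)
assert np.allclose(jacobi, 0)                                      # Jacobi identity
print("commutator identities hold")
```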
2.2 Derivatives of Functions of Operators
a time-dependent operator does not commute with itself at different times
derivative of an operator
Up to this point we have been discussing the algebraic properties of operators,
static objects that obey certain algebraic rules and fulfill the static needs of some
applications. However, physical quantities are dynamic, and if we want operators
to represent physical quantities, we must allow them to change with time. This
dynamism is best illustrated in quantum mechanics, where physical observables
are represented by operators.
Let us consider a mapping H : ℝ → L(V), which² takes in a real number and
gives out a linear operator on the vector space V. We denote the image of t ∈ ℝ by
H(t), which acts on the underlying vector space V. The physical meaning of this is
that as t (usually time) varies, its image H(t) also varies. Therefore, for different
values of t, we have different operators. In particular, [H(t), H(t′)] ≠ 0 for t ≠ t′.
A concrete example is an operator that is a linear combination of the operators D
and T introduced in Example 1.3.4, with time-dependent scalars. To be specific,
let H(t) = D cos ωt + T sin ωt, where ω is a constant. As time passes, H(t) changes
its identity from D to T and back to D. Most of the time it has a hybrid identity!
Since D and T do not commute, values of H(t) for different times do not necessarily
commute.
Of particular interest are operators that can be written as exp H(t), where H(t)
is a "simple" operator; i.e., the dependence of H(t) on t is simpler than the corre-
sponding dependence of exp H(t). We have already encountered such a situation
in Example 2.1.9, where it was shown that the operation of rotation around the
z-axis could be written as exp αT, and the action of T on (x, y) was a great deal
simpler than the corresponding action of exp αT.
Such a state of affairs is very common in physics. In fact, it can be shown
that many operators of physical interest can be written as a product of simpler
operators, each being of the form exp aT. For example, we know from Euler's
theorem in mechanics that an arbitrary rotation in three dimensions can be written
as a product of three simpler rotations, each being a rotation through a so-called
Euler angle about an axis.
2.2.1. Definition. For the mapping H : ℝ → L(V), we define the derivative as
dH/dt = lim_{Δt→0} [H(t + Δt) − H(t)]/Δt.
This derivative also belongs to L(V).
As long as we keep track of the order, practically all the rules of differentiation
apply to operators. For example,
d/dt (UT) = (dU/dt)T + U(dT/dt).
²Strictly speaking, the domain of H must be an interval [a, b] of the real line, because H may not be defined for all of ℝ. However,
for our purposes, such a fine distinction is not necessary.
2.2 DERIVATIVES OF FUNCTIONS OF OPERATORS 57
We are not allowed to change the order of multiplication on the RHS, not even
when both operators being multiplied are the same on the LHS. For instance, if
we let U = T = H in the preceding equation, we obtain
d/dt (H²) = (dH/dt)H + H(dH/dt).
This is not, in general, equal to 2H(dH/dt).
2.2.2. Example. Let us find the derivative of exp(tH), where H is independent of t. Using
Definition 2.2.1, we have
d/dt exp(tH) = lim_{Δt→0} {exp[(t + Δt)H] − exp(tH)}/Δt.
However, for infinitesimal Δt we have
exp[(t + Δt)H] − exp(tH) = e^{tH}e^{ΔtH} − e^{tH} = e^{tH}(1 + HΔt) − e^{tH} = e^{tH}HΔt.
Therefore,
d/dt exp(tH) = lim_{Δt→0} e^{tH}HΔt/Δt = e^{tH}H.
Since H and e^{tH} commute,³ we also have
d/dt exp(tH) = He^{tH}.
Note that in deriving the equation for the derivative of e^{tH}, we have used the relation
e^{tH}e^{ΔtH} = e^{(t+Δt)H}. This may seem trivial, but it will be shown later that in general,
e^{S}e^{T} ≠ e^{S+T}. ■
Now let us evaluate the derivative of a more general time-dependent operator,
exp[H(t)]:
d/dt exp[H(t)] = lim_{Δt→0} {exp[H(t + Δt)] − exp[H(t)]}/Δt.
If H(t) possesses a derivative, we have, to the first order in Δt,
H(t + Δt) = H(t) + Δt dH/dt,
and we can write exp[H(t + Δt)] = exp[H(t) + Δt dH/dt]. It is very tempting to
factor out the exp[H(t)] and expand the remaining part. However, as we will see
presently, this is not possible in general. As preparation, consider the following
example, which concerns the integration of an operator.
evolution operator
2.2.3. Example. The Schrödinger equation i ∂/∂t |ψ(t)⟩ = H|ψ(t)⟩ can be turned into an
operator differential equation as follows. Define the so-called evolution operator U(t) by
|ψ(t)⟩ = U(t)|ψ(0)⟩, and substitute in the Schrödinger equation to obtain
i ∂U(t)/∂t |ψ(0)⟩ = HU(t)|ψ(0)⟩.
Ignoring the arbitrary vector |ψ(0)⟩ results in a differential equation in U(t). For the purposes
of this example, let us consider an operator differential equation of the form dU/dt = HU(t),
where H is not dependent on t. We can find a solution to such an equation by repeated
differentiation followed by Taylor series expansion. Thus,
d²U/dt² = H dU/dt = H[HU(t)] = H²U(t),
d³U/dt³ = d/dt [H²U(t)] = H² dU/dt = H³U(t).
In general, dⁿU/dtⁿ = HⁿU(t). Assuming that U(t) is well-defined at t = 0, the above
relations say that all derivatives of U(t) are also well-defined at t = 0. Therefore, we can
expand U(t) around t = 0 to obtain
U(t) = Σ_{n=0}^∞ (tⁿ/n!) (dⁿU/dtⁿ)|_{t=0} = Σ_{n=0}^∞ (tⁿ/n!) HⁿU(0)
= (Σ_{n=0}^∞ (tH)ⁿ/n!) U(0) = e^{tH}U(0). ■
Let us see under what conditions we have exp(S + T) = exp(S) exp(T). We
consider only the case where the commutator of the two operators commutes
with both of them: [T, [S, T]] = 0 = [S, [S, T]]. Now consider the operator U(t) =
e^{tS}e^{tT}e^{−t(S+T)} and differentiate it using the result of Example 2.2.2 and the product
rule for differentiation:
dU/dt = Se^{tS}e^{tT}e^{−t(S+T)} + e^{tS}Te^{tT}e^{−t(S+T)} − e^{tS}e^{tT}(S + T)e^{−t(S+T)}
= Se^{tS}e^{tT}e^{−t(S+T)} − e^{tS}e^{tT}Se^{−t(S+T)}. (2.5)
The three factors of U(t) are present in all terms; however, they are not always
next to one another. We can switch the operators if we introduce a commutator.
For instance, e^{tT}S = Se^{tT} + [e^{tT}, S].
It is left as a problem for the reader to show that if [S, T] commutes with S and T,
then [e^{tT}, S] = −t[S, T]e^{tT}, and therefore, e^{tT}S = Se^{tT} − t[S, T]e^{tT}. Substituting
this in Equation (2.5) and noting that e^{tS}S = Se^{tS} yields dU/dt = t[S, T]U(t).
The solution to this equation is
U(t) = exp((t²/2)[S, T]) ⟹ e^{tS}e^{tT}e^{−t(S+T)} = exp((t²/2)[S, T])
³This is a consequence of a more general result that if two operators commute, any pair of functions of those operators also
commute (see Problem 2.14).
because U(0) = 1. We thus have the following:
Baker-Campbell-Hausdorff formula
2.2.4. Proposition. Let S, T ∈ L(V). If [S, [S, T]] = 0 = [T, [S, T]], then the
Baker-Campbell-Hausdorff formula holds:
e^{tS}e^{tT}e^{−(t²/2)[S,T]} = e^{t(S+T)}. (2.6)
In particular, e^{tS}e^{tT} = e^{t(S+T)} if and only if [S, T] = 0.
If t = 1, Equation (2.6) reduces to
e^{S}e^{T}e^{−(1/2)[S,T]} = e^{S+T}. (2.7)
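Equation (2.7) can be tested on matrices whose commutator is nonzero yet commutes with both factors, for example Heisenberg-type strictly upper triangular matrices (an illustrative choice, not from the text). Since these matrices are nilpotent, the exponential series terminates and the check is exact.

```python
import math
import numpy as np

def expm_series(M, terms=10):
    """Matrix exponential via Taylor series (exact here: M is nilpotent)."""
    return sum(np.linalg.matrix_power(M, k) / math.factorial(k) for k in range(terms))

# [S, T] is nonzero but commutes with both S and T
S = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]], dtype=float)
T = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
C = S @ T - T @ S
assert np.allclose(S @ C, C @ S) and np.allclose(T @ C, C @ T)

# Equation (2.7): e^S e^T e^{-[S,T]/2} = e^{S+T}
lhs = expm_series(S) @ expm_series(T) @ expm_series(-C / 2)
assert np.allclose(lhs, expm_series(S + T))

# e^S e^T alone differs from e^{S+T}, since [S, T] != 0
assert not np.allclose(expm_series(S) @ expm_series(T), expm_series(S + T))
print("Baker-Campbell-Hausdorff relation verified")
```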
Now assume that both H(t) and its derivative commute with [H, dH/dt]. Letting
S = H(t) and T = Δt dH/dt in (2.7), we obtain
e^{H(t+Δt)} = e^{H(t)+Δt dH/dt} = e^{H(t)} e^{Δt(dH/dt)} e^{−[H(t), Δt dH/dt]/2}.
For infinitesimal Δt, this yields
e^{H(t+Δt)} = e^{H(t)} (1 + Δt dH/dt)(1 − ½Δt[H(t), dH/dt])
= e^{H(t)} {1 + Δt dH/dt − ½Δt[H(t), dH/dt]},
and we have
d/dt e^{H(t)} = e^H dH/dt − ½ e^H [H, dH/dt].
We can also write
e^{H(t+Δt)} = e^{H(t)+Δt dH/dt} = e^{Δt dH/dt + H(t)} = e^{Δt dH/dt} e^{H(t)} e^{−[Δt dH/dt, H(t)]/2},
which yields
d/dt e^{H(t)} = (dH/dt) e^H + ½ e^H [H, dH/dt].
Adding the above two expressions and dividing by 2 yields the following symmetric
expression for the derivative:
d/dt e^{H(t)} = ½ ((dH/dt) e^H + e^H dH/dt) = ½ {dH/dt, e^H},
anticommutator where {S, T} ≡ ST + TS is called the anticommutator of the operators S and T.
We, therefore, have the following proposition.
2.2.5. Proposition. Let H : ℝ → L(V) and assume that H and its derivative
commute with [H, dH/dt]. Then
d/dt e^{H(t)} = ½ {dH/dt, e^H}.
In particular, if [H, dH/dt] = 0, then
d/dt e^{H(t)} = (dH/dt) e^H = e^H (dH/dt).
A frequently encountered operator is F(t) = e^{tA}Be^{−tA}, where A and B are
t-independent. It is straightforward to show that
dF/dt = [A, F(t)] and d/dt [A, F(t)] = [A, dF/dt].
Using these results, we can write
d²F/dt² = d/dt [A, F(t)] = [A, [A, F(t)]] ≡ A²[F(t)],
and in general, dⁿF/dtⁿ = Aⁿ[F(t)], where Aⁿ[F(t)] is defined inductively as
Aⁿ[F(t)] = [A, Aⁿ⁻¹[F(t)]], with A⁰[F(t)] ≡ F(t). For example,
A³[F(t)] = [A, A²[F(t)]] = [A, [A, A[F(t)]]] = [A, [A, [A, F(t)]]].
Evaluating F(t) and all its derivatives at t = 0 and substituting in the Taylor
expansion about t = 0, we get
F(t) = Σ_{n=0}^∞ (tⁿ/n!) (dⁿF/dtⁿ)|_{t=0} = Σ_{n=0}^∞ (tⁿ/n!) Aⁿ[F(0)] = Σ_{n=0}^∞ (tⁿ/n!) Aⁿ[B].
That is,
e^{tA}Be^{−tA} = Σ_{n=0}^∞ (tⁿ/n!) Aⁿ[B] ≡ B + t[A, B] + (t²/2!)[A, [A, B]] + ··· .
Sometimes this is written symbolically as
e^{tA}Be^{−tA} = e^{tA}[B],
where the RHS is merely an abbreviation of the infinite sum in the middle.
For t = 1 we obtain a widely used formula:
e^{A}Be^{−A} = e^{A}[B] = Σ_{n=0}^∞ (1/n!) Aⁿ[B] ≡ B + [A, B] + (1/2!)[A, [A, B]] + ··· .
2.3 CONJUGATION OF OPERATORS 61
If A commutes with [A, B], then the infinite series truncates at the second term,
and we have
e^{tA}Be^{−tA} = B + t[A, B].
For instance, if A and B are replaced by D and T of Example 1.3.4, we get (see
Problem 2.3)
e^{tD}Te^{−tD} = T + t[D, T] = T + t1.
generator of translation: The RHS shows that the operator T has been translated by an amount t (more
precisely, by t times the unit operator). We therefore call exp(tD) the translation
operator of T by t, and we call D the generator of translation. With a little
modification T and D become, respectively, the position and momentum operators in
quantum mechanics. Thus,
momentum as generator of translation
2.2.6. Box. Momentum is the generator of translation in quantum mechan-
ics.
But more of this later!
2.3 Conjugation of Operators
We have discussed the notion of the dual of a vector in conjunction with inner
products. We now incorporate linear operators into this notion. Let |b⟩, |c⟩ ∈ V
and assume that |c⟩ = T|b⟩. We know that there are linear functionals in the dual
space V* that are associated with (|b⟩)† = ⟨b| and (|c⟩)† = ⟨c|. Is there a linear
operator belonging to L(V*) that somehow corresponds to T? In other words, can
we find a linear operator that relates ⟨b| and ⟨c| just as T relates |b⟩ and |c⟩? The
answer comes in the following definition.
adjoint of an operator 2.3.1. Definition. Let T ∈ L(V) and |a⟩, |b⟩ ∈ V. The adjoint, or hermitian
conjugate, of T is denoted by T† and defined by
⟨a|T|b⟩* = ⟨b|T†|a⟩. (2.8)
The LHS of Equation (2.8) can be written as ⟨a|c⟩* or ⟨c|a⟩, in which case
we can identify
⟨c| = (T|b⟩)† = ⟨b|T†. (2.9)
This equation is sometimes used as the definition of the hermitian conjugate.
From Equation (2.8), the reader may easily verify that 1† = 1. Thus, using the unit
operator for T, (2.9) justifies Equation (1.12).
Some of the properties of conjugation are listed in the following theorem,
whose proof is left as an exercise.
2.3.2. Theorem. Let U, T ∈ L(V) and α ∈ ℂ. Then
1. (U + T)† = U† + T†.
2. (UT)† = T†U†.
3. (αT)† = α*T†.
4. (T†)† = T.
The last identity holds for finite-dimensional vector spaces; it does not apply to
infinite-dimensional vector spaces in general.
In previous examples dealing with linear operators T : ℝⁿ → ℝⁿ, an element
of ℝⁿ was denoted by a row vector, such as (x, y) for ℝ² and (x, y, z) for ℝ³. There
was no confusion, because we were operating only in V. However, since elements
of both V and V* are required when discussing T, T*, and T†, it is helpful to make
a distinction between them. We therefore resort to the convention introduced in
Example 1.2.3 by which
2.3.3. Box. Kets are represented as column vectors and bras as row vectors.
2.3.4. Example. Let us find the hermitian conjugate of an operator T : ℂ³ → ℂ³ whose
action on a column vector (α₁; α₂; α₃) is a set of complex linear combinations of the
components, the third being α₁ − α₂ + iα₃. Introduce
|a⟩ = (α₁; α₂; α₃) and |b⟩ = (β₁; β₂; β₃)
with dual vectors ⟨a| = (α₁* α₂* α₃*) and ⟨b| = (β₁* β₂* β₃*), respectively. We use
Equation (2.8) to find T†: compute ⟨a|T|b⟩, take its complex conjugate, and read off the
components of T†|a⟩ from the coefficients of β₁*, β₂*, and β₃* in ⟨b|T†|a⟩. ■
2.4 HERMITIAN AND UNITARY OPERATORS 63
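In the column-vector convention of Box 2.3.3, the hermitian conjugate of a matrix is its complex-conjugate transpose, and Equation (2.8) becomes a statement about ordinary matrix products. The sketch below checks (2.8) for a random complex matrix (an illustrative choice, not the operator of the example).

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
a = rng.standard_normal(3) + 1j * rng.standard_normal(3)   # ket |a>
b = rng.standard_normal(3) + 1j * rng.standard_normal(3)   # ket |b>

T_dag = T.conj().T   # hermitian conjugate: complex-conjugate transpose

# Defining property (2.8): <a|T|b>* = <b|T_dag|a>
lhs = np.conj(a.conj() @ T @ b)
rhs = b.conj() @ T_dag @ a
assert np.isclose(lhs, rhs)
print("(2.8) holds for the conjugate-transpose matrix")
```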
hermitian and anti-hermitian operators
2.4 Hermitian and Unitary Operators
The process of conjugation of linear operators looks much like conjugation of
complex numbers. Equation (2.8) alludes to this fact, and Theorem 2.3.2 provides
further evidence. It is therefore natural to look for operators that are counter-
parts of real numbers. One can define complex conjugation for operators and
thereby construct real operators. However, these real operators will not be inter-
esting because, as it turns out, they completely ignore the complex character
of the vector space. The following alternative definition makes use of hermitian
conjugation, and the result will have much wider application than is allowed by a
mere complex conjugation.
2.4.1. Definition. A linear operator H ∈ L(V) is called hermitian, or self-adjoint,
if H† = H. Similarly, A ∈ L(V) is called anti-hermitian if A† = −A.
Charles Hermite (1822-1901), one of the most eminent
French mathematicians of the nineteenth century, was par-
ticularly distinguished for the clean elegance and high artis-
tic quality of his work. As a student, he courted disaster
by neglecting his routine assigned work to study the classic
masters of mathematics; and though he nearly failed his ex-
aminations, he became a first-rate creative mathematician
while still in his early twenties. In 1870 he was appointed to
a professorship at the Sorbonne, where he trained a whole
generation of well-known French mathematicians, includ-
ing Picard, Borel, and Poincaré.
The character of his mind is suggested by a remark of Poincaré: "Talk with M. Hermite.
He never evokes a concrete image, yet you soon perceive that the most abstract entities are to
him like living creatures." He disliked geometry, but was strongly attracted to number theory
and analysis, and his favorite subject was elliptic functions, where these two fields touch
in many remarkable ways. Earlier in the century the Norwegian genius Abel had proved
that the general equation of the fifth degree cannot be solved by functions involving only
rational operations and root extractions. One of Hermite's most surprising achievements (in
1858) was to show that this equation can be solved by elliptic functions.
His 1873 proof of the transcendence of e was another high point of his career.⁴ If he
had been willing to dig even deeper into this vein, he could probably have disposed of π as
well, but apparently he had had enough of a good thing. As he wrote to a friend, "I shall risk
nothing on an attempt to prove the transcendence of the number π. If others undertake this
enterprise, no one will be happier than I at their success, but believe me, my dear friend, it
will not fail to cost them some efforts." As it turned out, Lindemann's proof nine years later
rested on extending Hermite's method.
Several of his purely mathematical discoveries had unexpected applications many years
later to mathematical physics. For example, the Hermitian forms and matrices that he in-
vented in connection with certain problems of number theory turned out to be crucial for
Heisenberg's 1925 formulation of quantum mechanics, and Hermite polynomials (see Chap-
ter 7) are useful in solving Schrödinger's wave equation.
⁴Transcendental numbers are those that are not roots of polynomials with integer coefficients.
The following observations strengthen the above conjecture that conjugation
of complex numbers and hermitian conjugation of operators are somehow related.
expectation value 2.4.2. Definition. The expectation value ⟨T⟩ₐ of an operator T in the "state" |a⟩
is a complex number defined by ⟨T⟩ₐ = ⟨a|T|a⟩.
The complex conjugate of the expectation value is⁵
⟨T⟩* = ⟨a|T|a⟩* = ⟨a|T†|a⟩.
In words, T†, the hermitian conjugate of T, has an expectation value that is the
complex conjugate of the latter's expectation value. In particular, if T is hermitian,
that is, equal to its hermitian conjugate, its expectation value will be real.
What is the analogue of the known fact that a complex number is the sum of a
real number and a pure imaginary one? The decomposition
T = ½(T + T†) + ½(T − T†)
shows that any operator can be written as a sum of a hermitian operator H =
½(T + T†) and an anti-hermitian operator A = ½(T − T†).
We can go even further, because any anti-hermitian operator A can be written
as A = i(−iA), in which −iA is hermitian: (−iA)† = (−i)*A† = i(−A) = −iA.
Denoting −iA by H′, we write T = H + iH′, where both H and H′ are hermitian.
This is the analogue of the decomposition z = x + iy in which both x and y are
real.
Clearly, we should expect some departures from a perfect correspondence.
This is due to a lack of commutativity among operators. For instance, although the
product of two real numbers is real, the product of two hermitian operators is not,
in general, hermitian: (HH′)† = H′†H† = H′H ≠ HH′.
⁵When no risk of confusion exists, it is common to drop the subscript "a" and write ⟨T⟩ for the expectation value of T.
We have seen the relation between expectation values and conjugation properties
of operators. The following theorem completely characterizes hermitian operators
in terms of their expectation values:
2.4.3. Theorem. A linear transformation H on a complex inner product space is
hermitian if and only if ⟨a|H|a⟩ is real for all |a⟩.
Proof. We have already pointed out that a hermitian operator has real expectation
values. Conversely, assume that ⟨a|H|a⟩ is real for all |a⟩. Then
⟨a|H|a⟩ = ⟨a|H|a⟩* = ⟨a|H†|a⟩ ⟺ ⟨a|H − H†|a⟩ = 0 ∀|a⟩.
By Theorem 2.1.4 we must have H − H† = 0. □
2.4.4. Example. In this example, we illustrate the result of the above theorem with 2 × 2
matrices. The matrix H = (0 −i; i 0) (rows separated by semicolons) is hermitian⁶ and acts
on ℂ². Let us take an arbitrary vector |a⟩ = (α₁; α₂) and evaluate ⟨a|H|a⟩. We have
H|a⟩ = (0 −i; i 0)(α₁; α₂) = (−iα₂; iα₁).
Therefore,
⟨a|H|a⟩ = (α₁* α₂*)(−iα₂; iα₁) = −iα₁*α₂ + iα₂*α₁
= iα₂*α₁ + (iα₂*α₁)* = 2Re(iα₂*α₁),
and ⟨a|H|a⟩ is real.
For the most general 2 × 2 hermitian matrix H = (α β; β* γ), where α and γ are real, we
have
H|a⟩ = (α β; β* γ)(α₁; α₂) = (αα₁ + βα₂; β*α₁ + γα₂)
and
⟨a|H|a⟩ = (α₁* α₂*)(αα₁ + βα₂; β*α₁ + γα₂) = α₁*(αα₁ + βα₂) + α₂*(β*α₁ + γα₂)
= α|α₁|² + α₁*βα₂ + α₂*β*α₁ + γ|α₂|²
= α|α₁|² + γ|α₂|² + 2Re(α₁*βα₂).
Again ⟨a|H|a⟩ is real. ■
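Theorem 2.4.3 can be spot-checked numerically: for any hermitian matrix, ⟨a|H|a⟩ comes out real. The sketch below tests the Pauli-type matrix of this example and a randomly generated hermitian matrix (illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(3)

# The matrix of the example above (entries 0, -i, i, 0)
sigma = np.array([[0, -1j],
                  [1j, 0]])
assert np.allclose(sigma, sigma.conj().T)          # hermitian

# A generic hermitian matrix built from a random complex matrix
M = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
H = (M + M.conj().T) / 2

a = rng.standard_normal(2) + 1j * rng.standard_normal(2)   # arbitrary |a>
for X in (sigma, H):
    expectation = a.conj() @ X @ a
    assert np.isclose(expectation.imag, 0)         # <a|X|a> is real
print("expectation values of hermitian operators are real")
```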
positive operators 2.4.5. Definition. An operator A on an inner product space is called positive
(written A ≥ 0) if A is hermitian and ⟨a|A|a⟩ ≥ 0 for all |a⟩.
⁶We assume that the reader has a casual familiarity with hermitian matrices. Think of an n × n matrix as a linear operator that
acts on column vectors whose elements are components of vectors defined in the standard basis of ℂⁿ or ℝⁿ. A hermitian matrix
then becomes a hermitian operator.
2.4.6. Example. An example of a positive operator is the square of a hermitian opera-
tor.⁷ We note that for any hermitian operator H and any vector |a⟩, we have ⟨a|H²|a⟩ =
⟨a|H†H|a⟩ = ⟨Ha|Ha⟩ ≥ 0 because of the positive definiteness of the inner product. ■
positive definite operators: An operator T satisfying the extra condition that ⟨a|T|a⟩ = 0
implies |a⟩ = 0 is called positive definite. From the discussion of the example above, we
conclude that the square of an invertible hermitian operator is positive definite.
The reader may be familiar with two- and three-dimensional rigid rotations and
the fact that they preserve distances and the scalar product. Can this be generalized
to complex inner product spaces? Let |a⟩, |b⟩ ∈ V, and let U be an operator on V
that preserves the scalar product; that is, given |b′⟩ = U|b⟩ and |a′⟩ = U|a⟩, then
⟨a′|b′⟩ = ⟨a|b⟩. This yields
⟨a′|b′⟩ = (⟨a|U†)(U|b⟩) = ⟨a|U†U|b⟩ = ⟨a|b⟩ = ⟨a|1|b⟩.
Since this is true for arbitrary |a⟩ and |b⟩, we obtain U†U = 1. In the next chapter,
when we introduce the concept of the determinant of operators, we shall see that
this relation implies that U and U† are both invertible,⁸ with each one being the
inverse of the other.
2.4.7. Definition. Let V be a finite-dimensional inner product space. An operator
unitary operators U is called a unitary operator if U† = U⁻¹. Unitary operators preserve the inner
product of V.
2.4.8. Example. Consider a linear transformation T : ℂ³ → ℂ³ each of whose three
components is a complex linear combination of α₁, α₂, and α₃ with normalization factor
1/√2 or 1/√6; the third component is {α₁ − α₂ + α₃ + i(α₁ + α₂ + α₃)}/√6. This T is
unitary. In fact, let |a⟩ = (α₁; α₂; α₃) and |b⟩ = (β₁; β₂; β₃) with dual vectors
⟨a| = (α₁* α₂* α₃*) and ⟨b| = (β₁* β₂* β₃*), respectively. We use Equation (2.8) and
the procedure of Example 2.3.4 to find T†,
⁷This is further evidence that hermitian operators are analogues of real numbers: The square of any real number is positive.
⁸This implication holds only for finite-dimensional vector spaces.
2.5 PROJECTION OPERATORS 67
and we can verify that TT† = 1. Similarly, we can show that T†T = 1 and therefore
that T is unitary. ■
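Unitarity is straightforward to test numerically: U†U = UU† = 1 and inner products are preserved. The 2 × 2 matrix below is an illustrative unitary, not the operator of this example.

```python
import numpy as np

# A concrete 2x2 unitary matrix (illustrative choice)
U = np.array([[1, 1j],
              [1j, 1]]) / np.sqrt(2)

assert np.allclose(U.conj().T @ U, np.eye(2))   # U†U = 1
assert np.allclose(U @ U.conj().T, np.eye(2))   # UU† = 1

# Unitarity preserves the inner product: <a'|b'> = <a|b>
rng = np.random.default_rng(4)
a = rng.standard_normal(2) + 1j * rng.standard_normal(2)
b = rng.standard_normal(2) + 1j * rng.standard_normal(2)
assert np.isclose((U @ a).conj() @ (U @ b), a.conj() @ b)
print("U is unitary and preserves the inner product")
```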
2.5 Projection Operators
We have already considered subspaces briefly. The significance of subspaces is
that physics frequently takes place not inside the whole vector space, but in one
of its subspaces. For instance, although motion generally takes place in a three-
dimensional space, it may restrict itself to a plane either because of constraints or
due to the nature of the force responsible for the motion. An example is planetary
motion, which is confined to a plane because the force of gravity is central. Fur-
thermore, the example of projectile motion teaches us that it is very convenient
to "project" the motion onto the horizontal and vertical axes and to study these
projections separately. It is, therefore, appropriate to ask how we can go from a
full space to one of its subspaces in the context of linear operators. Let us first
consider a simple example. A point in the plane is designated by the coordinates
(x, y). A subspace of the plane is the x-axis. Is there a linear operator,⁹ say Pₓ,
that acts on such a point and somehow sends it into that subspace? Of course, there
are many operators from ℝ² to ℝ. However, we are looking for a specific one.
We want Pₓ to project the point onto the x-axis. Such an operator has to act on
(x, y) and produce (x, 0): Pₓ(x, y) = (x, 0). Note that if the point already lies on
the x-axis, Pₓ does not change it. In particular, if we apply Pₓ twice, we get the
same result as if we apply it only once. And this is true for any point in the plane.
Therefore, our operator must have the property Pₓ² = Pₓ. We can generalize the
above discussion in the following definition.¹⁰
projection operators

2.5.1. Definition. A hermitian operator P ∈ 𝓛(V) is called a projection operator
if P² = P.
From this definition it immediately follows that the only projection operator
with an inverse is the identity operator. (Show this!)
Consider two projection operators P₁ and P₂. We want to investigate conditions
under which P₁ + P₂ becomes a projection operator. By definition, P₁ + P₂ =
(P₁ + P₂)² = P₁² + P₁P₂ + P₂P₁ + P₂². So P₁ + P₂ is a projection operator if and
only if

P₁P₂ + P₂P₁ = 0.    (2.10)
⁹We want this operator to preserve the vector-space structure of the plane and the axis.
¹⁰It is sometimes useful to relax the condition of hermiticity. However, in this part of the book, we demand that P be hermitian.
68 2. OPERATOR ALGEBRA
Multiply this on the left by P₁ to get

P₁²P₂ + P₁P₂P₁ = 0  ⟹  P₁P₂ + P₁P₂P₁ = 0.

Now multiply the same equation on the right by P₁ to get

P₁P₂P₁ + P₂P₁² = 0  ⟹  P₁P₂P₁ + P₂P₁ = 0.

These last two equations yield

P₁P₂ − P₂P₁ = 0.    (2.11)
orthogonal projection operators

completeness relation
The solution to Equations (2.10) and (2.11) is P₁P₂ = P₂P₁ = 0. We therefore
have the following result.
2.5.2. Proposition. Let P₁, P₂ ∈ 𝓛(V) be projection operators. Then P₁ + P₂
is a projection operator if and only if P₁P₂ = P₂P₁ = 0. Projection operators
satisfying this condition are called orthogonal projection operators.
More generally, if there is a set {Pᵢ}ᵢ₌₁ᵐ of projection operators satisfying

PᵢPⱼ = { Pᵢ  if i = j,
       { 0   if i ≠ j,

then P = Σᵢ₌₁ᵐ Pᵢ is also a projection operator. Given a normal vector |e⟩, one can
show easily that P = |e⟩⟨e| is a projection operator:

• P is hermitian: P† = (|e⟩⟨e|)† = (⟨e|)†(|e⟩)† = |e⟩⟨e|.
• P equals its square: P² = (|e⟩⟨e|)(|e⟩⟨e|) = |e⟩⟨e|e⟩⟨e| = |e⟩⟨e|, because ⟨e|e⟩ = 1.

In fact, we can take an orthonormal basis B = {|eᵢ⟩}ᵢ₌₁ᴺ and construct a set of
projection operators {Pᵢ = |eᵢ⟩⟨eᵢ|}ᵢ₌₁ᴺ. The operators Pᵢ are mutually orthogonal.
Thus, their sum Σᵢ₌₁ᴺ Pᵢ is also a projection operator.
2.5.3. Proposition. Let B = {|eᵢ⟩}ᵢ₌₁ᴺ be an orthonormal basis for V_N. Then the
set {Pᵢ = |eᵢ⟩⟨eᵢ|}ᵢ₌₁ᴺ consists of mutually orthogonal projection operators, and
Σᵢ₌₁ᴺ Pᵢ = Σᵢ₌₁ᴺ |eᵢ⟩⟨eᵢ| = 1. This relation is called the completeness relation.

Proof. The mutual orthogonality of the Pᵢ is an immediate consequence of the
orthonormality of the |eᵢ⟩. To show the second part, consider an arbitrary vector
|a⟩, written in terms of the |eⱼ⟩: |a⟩ = Σⱼ₌₁ᴺ αⱼ|eⱼ⟩. Apply Pᵢ to both sides to obtain
Pᵢ|a⟩ = Σⱼ₌₁ᴺ αⱼ|eᵢ⟩⟨eᵢ|eⱼ⟩ = αᵢ|eᵢ⟩.
Therefore, we have

|a⟩ = Σᵢ₌₁ᴺ αᵢ|eᵢ⟩ = Σᵢ₌₁ᴺ Pᵢ|a⟩ = (Σᵢ₌₁ᴺ Pᵢ)|a⟩.

Since this holds for an arbitrary |a⟩, the two operators must be equal. □
If we choose only the first m < N vectors instead of the entire basis, then
the projection operator P⁽ᵐ⁾ ≡ Σᵢ₌₁ᵐ |eᵢ⟩⟨eᵢ| projects arbitrary vectors into the
subspace spanned by the first m basis vectors {|eᵢ⟩}ᵢ₌₁ᵐ. In other words, when P⁽ᵐ⁾
acts on any vector |a⟩ ∈ V, the result will be a linear combination of only the first
m vectors. The simple proof of this fact is left as an exercise. These points are
illustrated in the following example.
2.5.4. Example. Consider the three orthonormal vectors {|eᵢ⟩}ᵢ₌₁³ ∈ ℝ³ given by

|e₁⟩ = (1/√2)(1, 1, 0)ᵗ,   |e₂⟩ = (1/√6)(1, −1, 2)ᵗ,   |e₃⟩ = (1/√3)(−1, 1, 1)ᵗ.

The projection operators associated with each of these can be obtained by noting that ⟨eᵢ|
is a row vector. Therefore,

P₁ = |e₁⟩⟨e₁| = (1/2) ( 1  1  0
                        1  1  0
                        0  0  0 ).

Similarly,

P₂ = |e₂⟩⟨e₂| = (1/6) (  1  −1   2
                        −1   1  −2
                         2  −2   4 )

and

P₃ = |e₃⟩⟨e₃| = (1/3) (  1  −1  −1
                        −1   1   1
                        −1   1   1 ).
Note that Pᵢ projects onto the line along |eᵢ⟩. This can be tested by letting Pᵢ act on
an arbitrary vector and showing that the resulting vector is perpendicular to the other two
vectors. For example, let P₂ act on an arbitrary column vector:

|a⟩ ≡ P₂ (x, y, z)ᵗ = (1/6)( x − y + 2z,  −x + y − 2z,  2x − 2y + 4z )ᵗ.

We verify that |a⟩ is perpendicular to both |e₁⟩ and |e₃⟩:

⟨e₁|a⟩ = (1/√2)(1  1  0) · (1/6)( x − y + 2z,  −x + y − 2z,  2x − 2y + 4z )ᵗ = 0.
Similarly, ⟨e₃|a⟩ = 0. So indeed, |a⟩ is along |e₂⟩.

We can find the operator that projects onto the plane formed by |e₁⟩ and |e₂⟩. This is

P₁ + P₂ = (1/3) ( 2   1   1
                  1   2  −1
                  1  −1   2 ).

When this operator acts on an arbitrary column vector, it produces a vector lying in the
plane of |e₁⟩ and |e₂⟩, or perpendicular to |e₃⟩:

|b⟩ ≡ (P₁ + P₂)(x, y, z)ᵗ = (1/3)( 2x + y + z,  x + 2y − z,  x − y + 2z )ᵗ.

It is easy to show that ⟨e₃|b⟩ = 0. The operators that project onto the other two planes are
obtained similarly. Finally, we verify easily that

P₁ + P₂ + P₃ = ( 1  0  0
                 0  1  0
                 0  0  1 ) = 1. ■
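The checks in this example can be carried out mechanically. Below is a minimal sketch of ours (plain Python with exact rational arithmetic, not code from the text) that builds the three projection matrices Pᵢ = |eᵢ⟩⟨eᵢ| and verifies idempotency, mutual orthogonality, and the completeness relation; the helper names `outer` and `matmul` are our own.

```python
from fractions import Fraction as F

# The orthonormal vectors of Example 2.5.4, stored unnormalized together with
# their squared norms, so every matrix entry is an exact rational number.
e1, n1 = [F(1), F(1), F(0)], 2    # |e1> = (1, 1, 0)/sqrt(2)
e2, n2 = [F(1), F(-1), F(2)], 6   # |e2> = (1, -1, 2)/sqrt(6)
e3, n3 = [F(-1), F(1), F(1)], 3   # |e3> = (-1, 1, 1)/sqrt(3)

def outer(v, norm2):
    """P = |e><e| for |e> = v/sqrt(norm2): entries v_i v_j / norm2."""
    return [[vi * vj / norm2 for vj in v] for vi in v]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

P1, P2, P3 = outer(e1, n1), outer(e2, n2), outer(e3, n3)

identity = [[F(i == j) for j in range(3)] for i in range(3)]
zero = [[F(0)] * 3 for _ in range(3)]

# P_i^2 = P_i, P_i P_j = 0 for i != j, and P1 + P2 + P3 = 1:
assert all(matmul(P, P) == P for P in (P1, P2, P3))
assert matmul(P1, P2) == zero and matmul(P2, P3) == zero and matmul(P1, P3) == zero
assert [[P1[i][j] + P2[i][j] + P3[i][j] for j in range(3)]
        for i in range(3)] == identity
```

Using `Fraction` avoids the round-off that would otherwise blur the exact equalities P² = P and ΣPᵢ = 1.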
2.6 Operators in Numerical Analysis
forward, backward, and central difference operators
In numerical calculations, limiting operations involving infinities and zeros are
replaced with finite values. The most natural setting for the discussion of such
operations is the set of ideas developed in this chapter. In this section, we shall assume
that all operators are invertible, and (rather sloppily) manipulate them with no
mathematical justification.
2.6.1 Finite-Difference Operators
In all numerical manipulations, a function is considered as a table with two
columns. The first column lists the (discrete) values of the independent variable
xᵢ, and the second column lists the value of the function f at xᵢ. We often write
fᵢ for f(xᵢ).

Three operators that are in use in numerical analysis are the forward difference
operator Δ, the backward difference operator ∇ (not to be confused with the
gradient), and the central difference operator δ. These are defined as follows:

Δfᵢ ≡ fᵢ₊₁ − fᵢ,   ∇fᵢ ≡ fᵢ − fᵢ₋₁,   δfᵢ ≡ fᵢ₊₁/₂ − fᵢ₋₁/₂.    (2.12)

The last equation has only theoretical significance, because a half-step is not used
in the tabulation of functions or in computer calculations. Typically, the data are
equally spaced, so xᵢ₊₁ − xᵢ = h is the same for all i. Then fᵢ±₁ = f(xᵢ ± h),
and we define fᵢ±₁/₂ ≡ f(xᵢ ± h/2).
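On an equally spaced table these operators are one-liners. The sketch below (our own minimal Python, not from the text) applies Δ and ∇ to a tabulated function and checks the relation δ² = Δ − ∇ that the text derives from these definitions:

```python
# Tabulated f_i = f(x_i) on an equally spaced grid x_i = i*h (here h = 1).
f = [x**3 - 2*x for x in range(8)]

def fwd(f, i):   # forward difference:  Delta f_i = f_{i+1} - f_i
    return f[i + 1] - f[i]

def bwd(f, i):   # backward difference: Nabla f_i = f_i - f_{i-1}
    return f[i] - f[i - 1]

def central2(f, i):  # delta^2 f_i = f_{i+1} - 2 f_i + f_{i-1}  (Eq. 2.14)
    return f[i + 1] - 2 * f[i] + f[i - 1]

i = 3
# Second forward difference, Eq. (2.13): Delta^2 f_i = f_{i+2} - 2 f_{i+1} + f_i
assert fwd(f, i + 1) - fwd(f, i) == f[i + 2] - 2 * f[i + 1] + f[i]
# delta^2 = Delta - Nabla:
assert central2(f, i) == fwd(f, i) - bwd(f, i)
```

The half-step values never appear: only the integer-index combinations of Equation (2.14) are computable from the table, exactly as the text remarks.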
2.6 OPERATORS IN NUMERICAL ANALYSIS 71
We can define products (composition) of the three operators. In particular, Δ²
is given by

Δ²fᵢ = Δ(fᵢ₊₁ − fᵢ) = fᵢ₊₂ − 2fᵢ₊₁ + fᵢ.    (2.13)

Similarly,

∇²fᵢ = ∇(fᵢ − fᵢ₋₁) = fᵢ − 2fᵢ₋₁ + fᵢ₋₂.

We note that

δ²fᵢ = fᵢ₊₁ − 2fᵢ + fᵢ₋₁,    (2.14)

and

δ²fᵢ = fᵢ₊₁ − fᵢ − (fᵢ − fᵢ₋₁) = (Δ − ∇)fᵢ  ⟹  δ² = Δ − ∇.

This shows that the three operators are related.

shifting and averaging operators
It is convenient to introduce the shifting and averaging operators, respectively
E and μ, as

E f(x) = f(x + h),   μ f(x) = ½ [ f(x + h/2) + f(x − h/2) ].    (2.15)

Note that for any positive integer n, Eⁿf(x) = f(x + nh). We generalize this to
any real number α:

Eᵅ f(x) = f(x + αh).    (2.16)

All the other finite-difference operators can be written in terms of E:

Δ = E − 1,   ∇ = 1 − E⁻¹,   δ = E^{1/2} − E^{−1/2},   μ = ½(E^{1/2} + E^{−1/2}).    (2.17)

The first two equations of (2.17) can be rewritten as

E = 1 + Δ,   E = (1 − ∇)⁻¹.    (2.18)
We can obtain a useful formula for the shifting operator when it acts on polynomials
of degree n or less. First note that

1 − ∇ⁿ⁺¹ = (1 − ∇)(1 + ∇ + ··· + ∇ⁿ).

But ∇ⁿ⁺¹ annihilates all polynomials of degree n or less (see Problem 2.33).
Therefore, for such polynomials, we have

1 = (1 − ∇)(1 + ∇ + ··· + ∇ⁿ),
which shows that E = (1 − ∇)⁻¹ = 1 + ∇ + ··· + ∇ⁿ. Now let n → ∞ and
obtain

E = (1 − ∇)⁻¹ = Σ_{k=0}^∞ ∇ᵏ    (2.19)

for polynomials of any degree and, by Taylor expansion, for any (well-behaved)
function.
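The claim that ∇ⁿ⁺¹ kills polynomials of degree n (Problem 2.33), and hence that the series (2.19) terminates for them, is easy to test numerically. A small sketch of ours on a quadratic tabulated with h = 1:

```python
def nabla(table):
    """One application of the backward difference to a whole table:
    (Nabla f)_i = f_i - f_{i-1} (the leftmost entry is consumed)."""
    return [table[i] - table[i - 1] for i in range(1, len(table))]

f = [x**2 - 3*x + 1 for x in range(10)]   # a degree-2 polynomial on x_i = i

# Nabla^3 annihilates a quadratic:
assert all(v == 0 for v in nabla(nabla(nabla(f))))

# Hence E = (1 - Nabla)^{-1} = 1 + Nabla + Nabla^2 exactly, for this f:
i = 5
shifted = f[i] + (f[i] - f[i - 1]) + (f[i] - 2 * f[i - 1] + f[i - 2])
assert shifted == f[i + 1]                # E f_i = f_{i+1}
```

For a degree-n polynomial the geometric series stops after ∇ⁿ, so the "infinite" operator expansion is exact in integer arithmetic.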
2.6.1. Example. Numerical interpolation illustrates the use of the formulas derived
above. Suppose that we are given a table of the values of a function, and we want the
value of the function for an x located between two entries. We cannot use the table directly,
but we may use the following procedure.

Assume that the values of the function f are given for x₁, x₂, ..., xᵢ, ..., and we are
interested in the value of the function for x such that xᵢ < x < xᵢ₊₁. This corresponds to
fᵢ₊ᵣ, where 0 < r < 1. We have

fᵢ₊ᵣ = Eʳfᵢ = (1 + Δ)ʳfᵢ = ( 1 + rΔ + (r(r − 1)/2)Δ² + ··· ) fᵢ.    (2.20)

In practice, the infinite sum is truncated after a finite number of terms.
If only two terms are kept, we have

fᵢ₊ᵣ ≈ (1 + rΔ)fᵢ = fᵢ + r(fᵢ₊₁ − fᵢ) = (1 − r)fᵢ + rfᵢ₊₁.    (2.21)

In particular, for r = ½, Equation (2.21) yields fᵢ₊₁/₂ ≈ ½(fᵢ + fᵢ₊₁), which states the
reasonable result that the value at the midpoint is approximately equal to the average of the
values at the endpoints.
If the third term of the series in Equation (2.20) is also retained, then

fᵢ₊ᵣ ≈ [ 1 + rΔ + (r(r − 1)/2)Δ² ] fᵢ = fᵢ + rΔfᵢ + (r(r − 1)/2)Δ²fᵢ
     = fᵢ + r(fᵢ₊₁ − fᵢ) + (r(r − 1)/2)(fᵢ₊₂ − 2fᵢ₊₁ + fᵢ)    (2.22)
     = ((r − 1)(r − 2)/2) fᵢ + r(2 − r) fᵢ₊₁ + (r(r − 1)/2) fᵢ₊₂.

For r = ½, that is, at the midpoint between xᵢ and xᵢ₊₁, Equation (2.22) yields

fᵢ₊₁/₂ ≈ (3/8)fᵢ + (3/4)fᵢ₊₁ − (1/8)fᵢ₊₂,

which turns out to be a better approximation than Equation (2.21). However, it involves not
only the two points on either side of x but also a relatively distant point, xᵢ₊₂. If we were to
retain terms up to Δᵏ for k > 2, then fᵢ₊ᵣ would be given in terms of fᵢ, fᵢ₊₁, ..., fᵢ₊ₖ,
and the result would be more accurate than (2.22). Thus, the more information we have
about the behavior of a function at distant points, the better we can approximate it at
x ∈ (xᵢ, xᵢ₊₁).

The foregoing analysis was based on forward interpolation. We may want to use back-
ward interpolation, where fᵢ₋ᵣ is sought for 0 < r < 1. In such a case we use the backward
difference operator:

fᵢ₋ᵣ = (E⁻¹)ʳfᵢ = (1 − ∇)ʳfᵢ = ( 1 − r∇ + (r(r − 1)/2)∇² + ··· ) fᵢ. ■
2.6.2. Example. Let us check the conclusion made in Example 2.6.1 using a calculator
and a specific function, say sin x. A calculator gives sin(0.1) = 0.0998334, sin(0.2) =
0.1986693, sin(0.3) = 0.2955202. Suppose that we want to find sin(0.15) by interpolation.
Using Equation (2.21) with r = ½, we obtain

sin(0.15) ≈ ½[sin(0.1) + sin(0.2)] = 0.1492514.

On the other hand, using (2.22) with r = ½ yields

sin(0.15) ≈ (3/8) sin(0.1) + (3/4) sin(0.2) − (1/8) sin(0.3) = 0.1494995.

The value of sin(0.15) obtained using a calculator is 0.1494381. It is clear that (2.22) gives
a better estimate than (2.21). ■
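Example 2.6.2 can be reproduced in a few lines. The sketch below (an illustration of ours, not code from the text) sums the truncated binomial series of Equation (2.20); `forward_interp` and its arguments are our own naming:

```python
import math

def forward_interp(table, r, terms):
    """f_{i+r} from the table [f_i, f_{i+1}, ...] via the truncated series
    f_{i+r} = (1 + Delta)^r f_i, Eq. (2.20)."""
    def delta_k(k):  # k-th forward difference Delta^k f_0
        return sum((-1) ** (k - j) * math.comb(k, j) * table[j]
                   for j in range(k + 1))
    def binom(r, k):  # generalized binomial coefficient C(r, k)
        prod = 1.0
        for j in range(k):
            prod *= r - j
        return prod / math.factorial(k)
    return sum(binom(r, k) * delta_k(k) for k in range(terms))

table = [math.sin(0.1), math.sin(0.2), math.sin(0.3)]
two_term = forward_interp(table, 0.5, 2)    # Eq. (2.21): about 0.1492514
three_term = forward_interp(table, 0.5, 3)  # Eq. (2.22): about 0.1494995
```

The three-term estimate is closer to sin(0.15) ≈ 0.1494381, just as the example found.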
2.6.2 Differentiation and Integration Operators

The two most important operations of mathematical physics can be written in terms
of the finite-difference operators. Define the differentiation operator D and the
integration operator J by

D f(x) ≡ f′(x),   J f(x) ≡ ∫ₓ^{x+h} f(t) dt.    (2.23)
Assuming that D⁻¹ exists, we note that f(x) = D⁻¹[f′(x)]. This shows that D⁻¹ is
the operation of antidifferentiation: D⁻¹f(x) = F(x), where F is any primitive of
f. On the other hand, ΔF(x) = F(x + h) − F(x) = Jf(x). These two equations
and the fact that J and D commute (reader, verify!) show that

ΔD⁻¹ = J  ⟹  JD = DJ = Δ = E − 1.    (2.24)

"exact" relation between shifting and differentiation operators

Using the Taylor expansion, we can write

E f(x) = f(x + h) = Σ_{n=0}^∞ (hⁿ/n!) (dⁿf/dxⁿ) = ( Σ_{n=0}^∞ (hD)ⁿ/n! ) f(x) = e^{hD} f(x),

so that

E = e^{hD}    (2.25)

and

hD = ln E = ln(1 + Δ) = Δ − ½Δ² + ⅓Δ³ − ···.    (2.26)
2.6.3. Example. Let us calculate cos(0.1), considering cos x to be (d/dx)(sin x) and
using the values given in Example 2.6.2. Using Equation (2.26) to second order,
D ≈ (1/h)(Δ − ½Δ²), we get

f′(xᵢ) ≈ (1/2h)( −3fᵢ + 4fᵢ₊₁ − fᵢ₊₂ ).
This gives

cos(0.1) ≈ 5[ −0.295520 + 4(0.198669) − 3(0.099833) ] = 0.998284.
In comparison, the value obtained directly from a calculator is 0.995004. ■

"exact" relation between integration and difference operators
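The second-order truncation of Equation (2.26) used in this example amounts to a three-point forward-difference formula for the first derivative. A short sketch of ours (the function name is an assumption, not the book's):

```python
import math

def dfdx_forward2(f, x, h):
    """f'(x) from h*D ~ Delta - Delta^2/2 (Eq. 2.26 truncated at Delta^2):
    f'(x) ~ (-3 f(x) + 4 f(x + h) - f(x + 2h)) / (2h)."""
    return (-3 * f(x) + 4 * f(x + h) - f(x + 2 * h)) / (2 * h)

approx = dfdx_forward2(math.sin, 0.1, 0.1)  # estimate of cos(0.1)
```

With h = 0.1 this reproduces the 0.998284 of the example; the true value is cos(0.1) ≈ 0.995004, and the discrepancy shrinks as O(h²) when h is reduced.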
The operator J is a special case of a more general operator J_α, defined by

J_α f(x) = ∫ₓ^{x+αh} f(t) dt = F(x + αh) − F(x)
         = (Eᵅ − 1)F(x) = (Eᵅ − 1)D⁻¹ f(x),

or

J_α = (Eᵅ − 1)D⁻¹ = h (Eᵅ − 1)/ln E = h [ (1 + Δ)ᵅ − 1 ]/ln(1 + Δ),    (2.27)

where we used Equation (2.26).
2.6.3 Numerical Integration

Suppose that we are interested in the numerical value of ∫ₐᵇ f(x) dx. Let x₀ ≡ a
and x_N ≡ b, and divide the interval [a, b] into N equal parts, each of length
h = (b − a)/N. The method commonly used in calculating integrals is to find
the integral ∫_{xᵢ}^{xᵢ+αh} f(x) dx, where α is a suitable number, and then add all such
integrals. More specifically,

I = ∫_{x₀}^{x₀+αh} f(x) dx + ∫_{x₀+αh}^{x₀+2αh} f(x) dx + ··· + ∫_{x₀+(M−1)αh}^{x_N} f(x) dx,

where M is a suitably chosen number. In fact, since x_N = x₀ + Nh, we have
Mα = N. We next employ Equation (2.27) to get

I = J_α f₀ + J_α f_α + ··· + J_α f_{(M−1)α} = J_α ( Σ_{k=0}^{M−1} f_{kα} ),    (2.28)
where f_{kα} ≡ f(x₀ + kαh). We thus need an expression for J_α to evaluate the
integral. Such an expression can be derived elegantly, by noting that

∫₀ᵅ Eˢ ds = [ Eˢ/ln E ]₀ᵅ = (Eᵅ − 1)/ln E = (1/h) J_α    [by Equation (2.27)],

so that

J_α = h ∫₀ᵅ Eˢ ds = h ∫₀ᵅ (1 + Δ)ˢ ds = h ∫₀ᵅ [ 1 + sΔ + (s(s − 1)/2!)Δ² + ··· ] ds,    (2.29)
trapezoidal rule for numerical integration
where we expanded (1 + Δ)ˢ using the binomial infinite series. Equations (2.28)
and (2.29) give the desired evaluation of the integral.
Let us make a few remarks before developing any commonly used rules of
approximation. First, once h is set, the function can be evaluated only at x₀ + nh,
where n is a positive integer. This means that fₙ is given only for positive integers
n. Thus, in the sum in (2.28), kα must be an integer. Since k is an integer, we
conclude that α must be an integer. Second, since N = Mα for some integer M,
we must choose N to be a multiple of α. Third, if we are to be able to evaluate
J_α f_{(M−1)α} [the last term in (2.28)], J_α cannot have powers of Δ higher than α,
because Δⁿ f_{(M−1)α} contains a term of the form

f(x₀ + (M − 1)αh + nh) = f(x_N + (n − α)h),

which for n > α gives f at a point beyond the upper limit. Thus, in the power-series
expansion of J_α, we must make sure that no power of Δ beyond α is retained.
There are several specific J_α's commonly used in numerical integration. We
will consider these next. The trapezoidal rule sets α = 1. According to the
remarks above, we therefore retain terms up to the first power of Δ in the expansion of
J_α. Then (2.29) gives J₁ ≡ J = h(1 + ½Δ). Substituting this in Equation (2.28),
we obtain

I = h(1 + ½Δ) Σ_{k=0}^{N−1} fₖ = h Σ_{k=0}^{N−1} [ fₖ + ½(fₖ₊₁ − fₖ) ]
  = (h/2) Σ_{k=0}^{N−1} (fₖ + fₖ₊₁) = (h/2)( f₀ + 2f₁ + ··· + 2f_{N−1} + f_N ).    (2.30)
Simpson's one-third rule for numerical integration

Simpson's three-eighths rule for numerical integration
Simpson's one-third rule sets α = 2. Thus, we have to retain all terms
up to the Δ² term. However, for α = 2, the third power of Δ disappears in
Equation (2.29), and we get an extra "power" of accuracy for free! Because of
this, Simpson's one-third rule is popular for numerical integrations. Equation (2.29)
yields J₂ = 2h(1 + Δ + Δ²/6). Substituting this in (2.28) yields

I = (h/3) Σ_{k=0}^{N/2−1} (6 + 6Δ + Δ²) f₂ₖ = (h/3) Σ_{k=0}^{N/2−1} ( f₂ₖ₊₂ + 4f₂ₖ₊₁ + f₂ₖ )
  = (h/3)( f₀ + 4f₁ + 2f₂ + 4f₃ + ··· + 4f_{N−1} + f_N ).    (2.31)

It is understood, of course, that N is an even integer. The factor ⅓ gives this method
its name.
For Simpson's three-eighths rule, we set α = 3, retain terms up to Δ³, and
use Equation (2.29) to obtain

J₃ = 3h( 1 + (3/2)Δ + (3/4)Δ² + (1/8)Δ³ ) = (3h/8)( 8 + 12Δ + 6Δ² + Δ³ ).

Substituting in (2.28), we get

I = (3h/8) Σ_{k=0}^{N/3−1} ( 8 + 12Δ + 6Δ² + Δ³ ) f₃ₖ
  = (3h/8) Σ_{k=0}^{N/3−1} ( f₃ₖ₊₃ + 3f₃ₖ₊₂ + 3f₃ₖ₊₁ + f₃ₖ ).    (2.32)
2.6.4. Example. Let us use Simpson's one-third rule with four intervals to evaluate the
familiar integral I = ∫₀¹ eˣ dx. With h = 0.25 and N = 4, Equation (2.31) yields

I ≈ (0.25/3)( 1 + 4e^{0.25} + 2e^{0.5} + 4e^{0.75} + e ) = 1.71832.

This is very close to the "exact" result e − 1 ≈ 1.71828. ■
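The three rules can be coded directly from Equations (2.30)–(2.32). The Python below is a minimal sketch of ours (function names assumed, not the book's); it reproduces the 1.71832 of Example 2.6.4:

```python
import math

def trapezoid(f, a, b, n):
    """Eq. (2.30): (h/2)(f0 + 2 f1 + ... + 2 f_{N-1} + f_N)."""
    h = (b - a) / n
    fs = [f(a + k * h) for k in range(n + 1)]
    return h / 2 * (fs[0] + 2 * sum(fs[1:-1]) + fs[-1])

def simpson_13(f, a, b, n):
    """Eq. (2.31); n must be even."""
    h = (b - a) / n
    fs = [f(a + k * h) for k in range(n + 1)]
    return h / 3 * (fs[0] + fs[-1] + 4 * sum(fs[1:-1:2]) + 2 * sum(fs[2:-1:2]))

def simpson_38(f, a, b, n):
    """Eq. (2.32); n must be a multiple of 3."""
    h = (b - a) / n
    fs = [f(a + k * h) for k in range(n + 1)]
    total = sum(fs[k] + 3 * fs[k + 1] + 3 * fs[k + 2] + fs[k + 3]
                for k in range(0, n, 3))
    return 3 * h / 8 * total

exact = math.e - 1                       # integral of e^x over [0, 1]
s13 = simpson_13(math.exp, 0.0, 1.0, 4)  # Example 2.6.4: about 1.71832
```

The restrictions on n mirror the text's remark that N must be a multiple of α for each rule.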
2.7 Problems
2.1. Consider a linear operator T on a finite-dimensional vector space V. Show that
there exists a polynomial p such that p(T) = 0. Hint: Take a basis B = {|aᵢ⟩}ᵢ₌₁ᴺ
and consider the vectors {Tᵏ|a₁⟩}ₖ₌₀ᴹ for large enough M and conclude that there
exists a polynomial p₁(t) such that p₁(T)|a₁⟩ = 0. Do the same for |a₂⟩, etc. Now
take the product of all such polynomials.
2.2. Use mathematical induction to show that [A, Aᵐ] = 0.

2.3. For D and T defined in Example 1.3.4:
(a) Show that [D, T] = 1.
(b) Calculate the linear transformations D³T³ and T³D³.
2.4. Consider three linear operators L₁, L₂, and L₃ satisfying the commutation re-
lations [L₁, L₂] = L₃, [L₃, L₁] = L₂, [L₂, L₃] = L₁, and define the new operators
L± = L₁ ± iL₂.
(a) Show that the operator L² ≡ L₁² + L₂² + L₃² commutes with Lₖ, k = 1, 2, 3.
(b) Show that the set {L₊, L₋, L₃} is closed under commutation, i.e., the commuta-
tor of any two of them can be written as a linear combination of the set. Determine
these commutators.
(c) Write L² in terms of L₊, L₋, and L₃.
2.5. Prove the rest of Proposition 2.1.11.
2.6. Show that if [[A, B], A] = 0, then for every positive integer k,

[Aᵏ, B] = kAᵏ⁻¹[A, B].

Hint: First prove the relation for low values of k; then use mathematical induction.
2.7 PROBLEMS 77
2.7. Show that for D and T defined in Example 1.3.4, [Dᵏ, T] = kDᵏ⁻¹ and
[Tᵏ, D] = −kTᵏ⁻¹.
2.8. Evaluate the derivative of H⁻¹(t) in terms of the derivative of H(t).
2.9. Show that for any α, β ∈ ℝ and any H ∈ 𝓛(V), we have

exp(αH) exp(βH) = exp[(α + β)H].
2.10. Show that (U + T)(U − T) = U² − T² if and only if [U, T] = 0.

2.11. Prove that if A and B are hermitian, then i[A, B] is also hermitian.
2.12. Find the solution to the operator differential equation

dU/dt = tHU(t).

Hint: Make the change of variable y = t² and use the result of Example 2.2.3.
2.13. Verify that

d(H³)/dt = (dH/dt)H² + H(dH/dt)H + H²(dH/dt).
2.14. Show that if A and B commute, and f and g are arbitrary functions, then
f(A) and g(B) also commute.
2.15. Assuming that [[S, T], T] = 0 = [[S, T], S], show that

[S, exp(tT)] = t[S, T] exp(tT).

Hint: Expand the exponential and use Problem 2.6.
2.16. Prove that

exp(H₁ + H₂ + H₃) = exp(H₁) exp(H₂) exp(H₃)
                    · exp{ −½([H₁, H₂] + [H₁, H₃] + [H₂, H₃]) }

provided that H₁, H₂, and H₃ commute with all the commutators. What is the
generalization to H₁ + H₂ + ··· + Hₙ?
2.17. Denoting the derivative of A(t) by Ȧ, show that

d/dt [A, B] = [Ȧ, B] + [A, Ḃ].
2.18. Prove Theorem 2.3.2. Hint: Use Equation (2.8) and Theorem 2.1.3.
2.19. Let A(t) ≡ exp(tH)A₀ exp(−tH), where H and A₀ are constant operators.
Show that dA/dt = [H, A(t)]. What happens when H commutes with A(t)?
2.20. Let |f⟩, |g⟩ ∈ C(a, b) with the additional property that f(a) = g(a) =
f(b) = g(b) = 0. Show that for such functions, the derivative operator D is
anti-hermitian. The inner product is defined as usual:

⟨f|g⟩ ≡ ∫ₐᵇ f*(t)g(t) dt.
Heisenberg uncertainty principle

2.21. In this problem, you will go through the steps of proving the rigorous state-
ment of the Heisenberg uncertainty principle. Denote the expectation (average)
value of an operator A in a state |ψ⟩ by A_avg. Thus, A_avg = ⟨A⟩ = ⟨ψ|A|ψ⟩. The
uncertainty (deviation from the mean) in the state |ψ⟩ of the operator A is given by

ΔA = √⟨(A − A_avg)²⟩ = √⟨ψ| (A − A_avg 1)² |ψ⟩.

(a) Show that for any two hermitian operators A and B, we have

|⟨ψ|AB|ψ⟩|² ≤ ⟨ψ|A²|ψ⟩ ⟨ψ|B²|ψ⟩.

Hint: Apply the Schwarz inequality to an appropriate pair of vectors.
(b) Using the above and the triangle inequality for complex numbers, show that

|⟨ψ|[A, B]|ψ⟩|² ≤ 4 ⟨ψ|A²|ψ⟩ ⟨ψ|B²|ψ⟩.

(c) Define the operators A′ = A − α1, B′ = B − β1, where α and β are real
numbers. Show that A′ and B′ are hermitian and [A′, B′] = [A, B].
(d) Now use all the results above to show the celebrated uncertainty relation

(ΔA)(ΔB) ≥ ½ |⟨ψ|[A, B]|ψ⟩|.

What does this reduce to for the position operator x and the momentum operator p if
[x, p] = iℏ?
2.22. Show that U = exp A is unitary if and only if A is anti-hermitian.
2.23. Find T† for each of the following linear operators.
(a) T : ℝ² → ℝ² given by
(b) T : ℝ³ → ℝ³ given by
T(x, y, z)ᵗ = ( … , … , −x + 2y + 3z )ᵗ.
(c) T : ℝ² → ℝ² given by

T(x, y)ᵗ = ( x cos θ − y sin θ ,  x sin θ + y cos θ )ᵗ,
where θ is a real number. What is T†T?
(d) T : ℂ² → ℂ² given by

T(α₁, α₂)ᵗ = ( α₁ − iα₂ ,  iα₁ + α₂ )ᵗ.

(e) T : ℂ³ → ℂ³ given by
2.24. Show that if P is a (hermitian) projection operator, so are (a) 1 − P and (b)
U†PU for any unitary operator U.
2.25. For the vector

(a) Find the associated projection matrix, P_a.
(b) Verify that P_a does project an arbitrary vector in ℂ⁴ along |a⟩.
(c) Verify directly that the matrix 1 − P_a is also a projection operator.
2.26. Let |a₁⟩ ≡ a₁ = (1, 1, −1) and |a₂⟩ ≡ a₂ = (−2, 1, −1).
(a) Construct (in the form of a matrix) the projection operators P₁ and P₂ that
project onto the directions of |a₁⟩ and |a₂⟩, respectively. Verify that they are indeed
projection operators.
(b) Construct (in the form of a matrix) the operator P = P₁ + P₂ and verify
directly that it is a projection operator.
(c) Let P act on an arbitrary vector (x, y, z). What is the dot product of the resulting
vector with the vector a₁ × a₂? What can you say about P and your conclusion in
(b)?
2.27. Let P⁽ᵐ⁾ = Σᵢ₌₁ᵐ |eᵢ⟩⟨eᵢ| be a projection operator constructed out of the
first m orthonormal vectors of the basis B = {|eᵢ⟩}ᵢ₌₁ᴺ of V_N. Show that P⁽ᵐ⁾ projects
into the subspace spanned by the first m vectors in B.
2.28. What is the length of the projection of the vector (3, 4, −4) onto a line whose
parametric equation is x = 2t + 1, y = −t + 3, z = t − 1? Hint: Find a unit vector
in the direction of the line and construct its projection operator.
2.29. The parametric equation of a line L in a coordinate system with origin O is

x = 2t + 1,   y = t + 1,   z = −2t + 2.
A point P has coordinates (3, −2, 1).
(a) Using the projection operators, find the length of the projection of OP on the
line L.
(b) Find the vector whose beginning is P and whose end is on the line L, perpendicular
to L.
(c) From this vector calculate the distance from P to the line L.
2.30. Let the operator U : ℂ² → ℂ² be given by

U(α₁, α₂)ᵗ = ( α₁/√2 − iα₂/√2 ,  α₁/√2 + iα₂/√2 )ᵗ.

Is U unitary?
2.31. Show that the product of two unitary operators is always unitary, but the
product of two hermitian operators is hermitian if and only if they commute.

2.32. Let S be an operator that is both unitary and hermitian. Show that
(a) S is involutive (i.e., S² = 1).
(b) S = P₊ − P₋, where P₊ and P₋ are hermitian projection operators.
2.33. Show that when the forward difference operator Δ is applied to a polynomial,
the degree of the polynomial is reduced by 1. (Hint: Consider xⁿ first.) Then show
that Δⁿ⁺¹ annihilates all polynomials of degree n or less.

2.34. Show that δⁿ⁺¹ and ∇ⁿ⁺¹ annihilate any polynomial of degree n.
2.35. Show that all of the finite-difference operators commute with one another.
2.36. Verify the identities

∇E = δE^{1/2} = Δ,   ∇ + Δ = 2μδ = E − E⁻¹,   E^{−1/2} = μ − (δ/2).
2.37. By writing everything in terms of E, show that δ² = Δ − ∇ = Δ∇.

2.38. Write expressions for E^{1/2}, Δ, ∇, and μ in terms of δ.
2.39. Show that

D = (2/h) sinh⁻¹(δ/2).

2.40. Show that

D² = (1/h²)( Δ² − Δ³ + (11/12)Δ⁴ − (5/6)Δ⁵ − ··· )

and derive Equation (2.27).
2.41. Find an expression for J_α in powers of Δ. Retain all terms up to the fourth
power.
2.42. Show that for α = 2, the third power of Δ disappears in Equation (2.29).
2.43. Evaluate the following integrals numerically, using six subintervals with
the trapezoidal rule, Simpson's one-third rule, and Simpson's three-eighths rule.
Compare with the exact result when possible.
(a) ∫₁⁵ x³ dx.   (b) ∫₀… e⁻ˣ dx.   (c) ∫₀… x eˣ cos x dx.
(d) ∫₁… dx/x.   (e) ∫… sin x dx.   (f) ∫₁… eˣ sin x dx.
(g) ∫₋₁¹ dx/(1 + x²).   (h) ∫₀… x eˣ dx.   (i) ∫₁… eˣ tan x dx.
Additional Reading
1. Axler, S. Linear Algebra Done Right, Springer-Verlag, 1996.
2. Greub, W. Linear Algebra, 4th ed., Springer-Verlag, 1975.
3. Hildebrand, F. Introduction to Numerical Analysis, 2nd ed., Dover, 1987.
Uses operator techniques in numerical analysis. It has a detailed discussion
of error analysis, a topic completely ignored in our text.
3

Matrices: Operator Representations
So far, our theoretical investigation has been dealing mostly with abstract vec-
tors and abstract operators. As we have seen in examples and problems, concrete
representations of vectors and operators are necessary in most applications. Such
representations are obtained by choosing a basis and expressing all operations in
terms of components of vectors and matrix representations of operators.
3.1 Matrices
Let us choose a basis B_V = {|aᵢ⟩}ᵢ₌₁ᴺ of a vector space V_N, and express an arbitrary
vector |x⟩ in this basis: |x⟩ = Σᵢ₌₁ᴺ ξᵢ|aᵢ⟩. We write

x = ( ξ₁, ξ₂, ..., ξ_N )ᵗ    (3.1)

representation of vectors

and say that the column vector x represents |x⟩ in B_V. We can also have a linear
transformation A ∈ 𝓛(V_N, W_M) act on the basis vectors in B_V to give vectors in
the M-dimensional vector space W_M: |wₖ⟩ = A|aₖ⟩. The latter can be written as
a linear combination of basis vectors B_W = {|bⱼ⟩}ⱼ₌₁ᴹ in W_M:
|w₁⟩ = Σⱼ₌₁ᴹ αⱼ₁|bⱼ⟩,   |w₂⟩ = Σⱼ₌₁ᴹ αⱼ₂|bⱼ⟩,   ...,   |w_N⟩ = Σⱼ₌₁ᴹ αⱼN|bⱼ⟩.
Note that the components have an extra subscript to denote which of the N vectors
{|wᵢ⟩}ᵢ₌₁ᴺ they are representing. The components can be arranged in a column as
3.1 MATRICES 83
before to give a representation of the corresponding vectors. The operator itself is
determined by the collection of all these vectors, i.e., by a matrix. We write this as

A = ( α₁₁  α₁₂  ···  α₁N
      α₂₁  α₂₂  ···  α₂N
       ⋮    ⋮          ⋮
      αM₁  αM₂  ···  αMN )    (3.2)
representation of operators

and call A the matrix representing A in the bases B_V and B_W. This statement is also
summarized symbolically as

A|aᵢ⟩ = Σⱼ₌₁ᴹ αⱼᵢ|bⱼ⟩,   i = 1, 2, ..., N.    (3.3)

We thus have the following rule:

3.1.1. Box. To find the matrix A representing A in bases B_V = {|aᵢ⟩}ᵢ₌₁ᴺ
and B_W = {|bⱼ⟩}ⱼ₌₁ᴹ, express A|aᵢ⟩ as a linear combination of the vectors
in B_W. The components form the ith column of A.
Now consider the vector |y⟩ = A|x⟩ in W_M. This vector can be written in two
ways: On the one hand, |y⟩ = Σⱼ₌₁ᴹ ηⱼ|bⱼ⟩. On the other hand,

|y⟩ = A|x⟩ = A Σᵢ₌₁ᴺ ξᵢ|aᵢ⟩ = Σᵢ₌₁ᴺ ξᵢ A|aᵢ⟩ = Σᵢ₌₁ᴺ ξᵢ Σⱼ₌₁ᴹ αⱼᵢ|bⱼ⟩ = Σⱼ₌₁ᴹ ( Σᵢ₌₁ᴺ αⱼᵢξᵢ ) |bⱼ⟩.

Since |y⟩ has a unique set of components in the basis B_W, we conclude that

ηⱼ = Σᵢ₌₁ᴺ αⱼᵢξᵢ,   j = 1, 2, ..., M.    (3.4)
84 3. MATRICES: OPERATOR REPRESENTATIONS
This is written as

( η₁ )     ( α₁₁  α₁₂  ···  α₁N )  ( ξ₁ )
( η₂ )  =  ( α₂₁  α₂₂  ···  α₂N )  ( ξ₂ )
(  ⋮ )     (  ⋮    ⋮          ⋮ )  (  ⋮ )
( ηM )     ( αM₁  αM₂  ···  αMN )  ( ξN )    (3.5)
The operator T_A associated with a matrix A

in which the matrix multiplication rule is understood. This matrix equation is the
representation of the operator equation |y⟩ = A|x⟩ in the bases B_V and B_W.

The construction above indicates that, once the bases are fixed in the two
vector spaces, to every operator there corresponds a unique matrix. This unique-
ness is the result of the uniqueness of the components of vectors in a basis. On
the other hand, given an M × N matrix A with elements αᵢⱼ, one can construct a
unique linear operator T_A defined by its action on the basis vectors (see Box 1.3.3):
T_A|aᵢ⟩ ≡ Σⱼ₌₁ᴹ αⱼᵢ|bⱼ⟩. Thus, there is a one-to-one correspondence between op-
erators and matrices. This correspondence is in fact a linear isomorphism:
3.1.2. Proposition. The two vector spaces 𝓛(V_N, W_M) and 𝓜_{M×N} are isomor-
phic. An explicit isomorphism is established only when a basis is chosen for each
vector space, in which case an operator is identified with its matrix representation.
Given the linear transformations A : V_N → W_M and B : W_M → U_K,
we can form the composite linear transformation B∘A : V_N → U_K. We can
also choose bases B_V = {|aᵢ⟩}ᵢ₌₁ᴺ, B_W = {|bᵢ⟩}ᵢ₌₁ᴹ, B_U = {|cᵢ⟩}ᵢ₌₁ᴷ for V, W,
and U, respectively. Then A, B, and B∘A will be represented by an M × N, a
K × M, and a K × N matrix, respectively, the latter being the matrix product of
the other two matrices. Matrices are determined entirely by their elements. For
this reason a matrix A whose elements are α₁₁, α₁₂, ... is sometimes denoted by
(αᵢⱼ). Similarly, the elements of this matrix are denoted by (A)ᵢⱼ. So, on the one
hand, we have (αᵢⱼ) = A, and on the other hand (A)ᵢⱼ = αᵢⱼ. In the context of this
notation, therefore, we can write

(A + B)ᵢⱼ = (A)ᵢⱼ + (B)ᵢⱼ  ⟹  (αᵢⱼ + βᵢⱼ) = (αᵢⱼ) + (βᵢⱼ),
(γA)ᵢⱼ = γ(A)ᵢⱼ  ⟹  γ(αᵢⱼ) = (γαᵢⱼ),
(0)ᵢⱼ = 0,
(1)ᵢⱼ = δᵢⱼ.
A matrix as a representation of a linear operator is well-defined only in refer-
ence to a specific basis. A collection of rows and columns of numbers by themselves
has no operational meaning. When we manipulate matrices and attach meaning
to them, we make an unannounced assumption regarding the basis: We have the
standard basis of ℂⁿ (or ℝⁿ) in mind. The following example should clarify this
subtlety.
3.1.3. Example. Let us find the matrix representation of the linear operator A ∈ 𝓛(ℝ³),
given by

A(x, y, z)ᵗ = ( x − y + 2z,  3x − z,  2y + z )ᵗ    (3.6)

in the basis

B = { |a₁⟩ = (1, 1, 0)ᵗ,  |a₂⟩ = (1, 0, 1)ᵗ,  |a₃⟩ = (0, 1, 1)ᵗ }.

There is a tendency to associate the matrix

A = ( 1  −1   2
      3   0  −1
      0   2   1 )

with the operator A. The following discussion will show that this is false. To obtain the first
column of the matrix representing A, we consider

A|a₁⟩ = A(1, 1, 0)ᵗ = (0, 3, 2)ᵗ = ½|a₁⟩ − ½|a₂⟩ + (5/2)|a₃⟩.

So, by Box 3.1.1, the first column of the matrix is ( ½, −½, 5/2 )ᵗ.
The other two columns are obtained from

A|a₂⟩ = A(1, 0, 1)ᵗ = (3, 2, 1)ᵗ = 2|a₁⟩ + |a₂⟩,
A|a₃⟩ = A(0, 1, 1)ᵗ = (1, −1, 3)ᵗ = −(3/2)|a₁⟩ + (5/2)|a₂⟩ + ½|a₃⟩,

giving the second and the third columns, respectively. The whole matrix is then

A = (  1/2   2  −3/2
      −1/2   1   5/2
       5/2   0   1/2 ).

As long as all vectors are represented by columns whose entries are expansion coefficients
of the vectors in B, the matrix A and the operator A are indistinguishable. However, the action of A on the column
vector (x, y, z)ᵗ will not yield the RHS of Equation (3.6)! Although this is not usually emphasized,
the column vector on the LHS of Equation (3.6) is really the vector x(1, 0, 0)ᵗ + y(0, 1, 0)ᵗ + z(0, 0, 1)ᵗ,
which is an expansion in terms of the standard basis of ℝ³ rather than in terms of B.

We can expand A(x, y, z)ᵗ in terms of B, yielding

A(x, y, z)ᵗ = ( x − y + 2z,  3x − z,  2y + z )ᵗ
            = ( 2x − (3/2)y )(1, 1, 0)ᵗ + ( −x + ½y + 2z )(1, 0, 1)ᵗ + ( x + (3/2)y − z )(0, 1, 1)ᵗ.

This says that in the basis B this vector has the representation

( A(x, y, z)ᵗ )_B = ( 2x − (3/2)y,  −x + ½y + 2z,  x + (3/2)y − z )ᵗ.    (3.7)

Similarly, (x, y, z)ᵗ is represented by

( (x, y, z)ᵗ )_B = ( ½x + ½y − ½z,  ½x − ½y + ½z,  −½x + ½y + ½z )ᵗ.    (3.8)

Applying A to the RHS of (3.8) yields the RHS of (3.7), as it should. ■
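The bookkeeping of this example can be automated: with the basis vectors of B as the columns of a matrix S, the representation of the operator in B is S⁻¹AS. A sketch of ours in exact arithmetic (S⁻¹ is entered by hand for this 3 × 3 case rather than computed):

```python
from fractions import Fraction as F

# A in the standard basis of R^3, and the basis B as the columns of S.
A_std = [[F(1), F(-1), F(2)],
         [F(3), F(0), F(-1)],
         [F(0), F(2), F(1)]]
S = [[F(1), F(1), F(0)],
     [F(1), F(0), F(1)],
     [F(0), F(1), F(1)]]
S_inv = [[F(1, 2), F(1, 2), F(-1, 2)],   # inverse of S, computed by hand
         [F(1, 2), F(-1, 2), F(1, 2)],
         [F(-1, 2), F(1, 2), F(1, 2)]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Column i of A_B holds the B-components of A|a_i> (Box 3.1.1):
A_B = matmul(S_inv, matmul(A_std, S))
```

The first column of the result is (1/2, −1/2, 5/2)ᵗ, the B-expansion of A|a₁⟩, confirming the column-by-column construction of the example.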
rank of a matrix

Given any M × N matrix A, an operator T_A ∈ 𝓛(V_N, W_M) can be associated
with A, and one can construct the kernel and the range of T_A. The rank of T_A
is called the rank of A. Since the rank of an operator is basis independent, this
definition makes sense.

Now suppose that we choose a basis for the kernel of T_A and extend it to a basis
of V. Let V₁ denote the span of the remaining basis vectors. Similarly, we choose a
basis for T_A(V) and extend it to a basis for W. In these two bases, the M × N matrix
representing T_A will have all zeros except for an r × r submatrix, where r is the
rank of T_A. The reader may verify that this submatrix has a nonzero determinant.
In fact, the submatrix represents the isomorphism between V₁ and T_A(V), and,
by its very construction, is the largest such matrix. Since the determinant of an
operator is basis-independent, we have the following proposition:

3.1.4. Proposition. The rank of a matrix is the dimension of the largest (square)
submatrix whose determinant is not zero.
3.2 OPERATIONS ON MATRICES 87
3.2 Operations on Matrices
There are two basic operations that one can perform on a matrix to obtain a new
one; these are transposition and complex conjugation.

transpose of a matrix

The transpose of an M × N matrix A is an N × M matrix Aᵗ obtained by interchanging the rows and columns
of A:

(Aᵗ)ᵢⱼ = (A)ⱼᵢ,  or  (αᵢⱼ)ᵗ = (αⱼᵢ).    (3.9)
The following theorem, whose proof follows immediately from the definition
of transpose, summarizes the important properties of the operation of transposition.

3.2.1. Theorem. Let A and B be two (square) matrices. Then

(a) (A + B)ᵗ = Aᵗ + Bᵗ,   (b) (AB)ᵗ = BᵗAᵗ,   (c) (Aᵗ)ᵗ = A.    (3.10)
symmetric and antisymmetric matrices

orthogonal matrix

complex conjugation

hermitian conjugate
Of special interest is a matrix that is identical to its transpose. Such matrices
occur frequently in physics and are called symmetric matrices. Similarly, anti-
symmetric matrices are those satisfying Aᵗ = −A. Any matrix A can be written
as A = ½(A + Aᵗ) + ½(A − Aᵗ), where the first term is symmetric and the second
is antisymmetric.

The elements of a symmetric matrix A satisfy the relation αⱼᵢ = (Aᵗ)ᵢⱼ =
(A)ᵢⱼ = αᵢⱼ; i.e., the matrix is symmetric under reflection through the main di-
agonal. On the other hand, for an antisymmetric matrix we have αⱼᵢ = −αᵢⱼ. In
particular, the diagonal elements of an antisymmetric matrix are all zero.

A (real) matrix satisfying AᵗA = AAᵗ = 1 is called orthogonal.

Complex conjugation is an operation under which all elements of a matrix
are complex conjugated. Denoting the complex conjugate of A by A*, we have
(A*)ᵢⱼ = ((A)ᵢⱼ)*, or (αᵢⱼ)* ≡ (α*ᵢⱼ). A matrix is real if and only if A* = A. Clearly,
(A*)* = A.

Under the combined operation of complex conjugation and transposition, the
rows and columns of a matrix are interchanged and all of its elements are complex
conjugated. This combined operation is called the adjoint operation, or hermitian
conjugation, and is denoted by †, as with operators. Thus, we have

A† = (Aᵗ)* = (A*)ᵗ,   (A†)ᵢⱼ = (A)*ⱼᵢ  or  (αᵢⱼ)† = (α*ⱼᵢ).
Two types of matrices are important enough to warrant a separate definition.

hermitian and unitary matrices

3.2.2. Definition. A hermitian matrix H satisfies H† = H, or, in terms of elements,
η*ⱼᵢ = ηᵢⱼ. A unitary matrix U satisfies U†U = UU† = 1, or, in terms of elements,
Σₖ₌₁ᴺ μᵢₖμ*ⱼₖ = Σₖ₌₁ᴺ μ*ₖᵢμₖⱼ = δᵢⱼ.
Remarks: It follows immediately from this definition that
1. The diagonal elements of a hermitian matrix are real.
2. The kth column of a hermitian matrix is the complex conjugate of its kth row,
and vice versa.
3. A real hermitian matrix is symmetric.
4. The rows of an N x N unitary matrix, when considered as vectors in C^N, form
an orthonormal set, as do the columns.
5. A real unitary matrix is orthogonal.
It is sometimes possible (and desirable) to transform a matrix into a form in
which all of its off-diagonal elements are zero. Such a matrix is called a diagonal
diagonal matrices matrix. A diagonal matrix whose diagonal elements are {λk}_{k=1}^N is denoted by
diag(λ1, λ2, ..., λN).
3.2.3. Example. In this example, we derive a useful identity for functions of a diagonal
matrix. Let D = diag(λ1, λ2, ..., λn) be a diagonal matrix, and f(x) a function that has a
Taylor series expansion f(x) = Σ_{k=0}^∞ ak x^k. The same function of D can be written as

f(D) = Σ_{k=0}^∞ ak D^k = Σ_{k=0}^∞ ak diag(λ1^k, λ2^k, ..., λn^k)
     = diag(Σ_{k=0}^∞ ak λ1^k, ..., Σ_{k=0}^∞ ak λn^k) = diag(f(λ1), f(λ2), ..., f(λn)).

In words, the function of a diagonal matrix is equal to a diagonal matrix whose entries are the
same function of the corresponding entries of the original matrix. In the above derivation,
we used the following obvious properties of diagonal matrices:

a diag(λ1, λ2, ..., λn) = diag(aλ1, aλ2, ..., aλn),
diag(λ1, λ2, ..., λn) + diag(ω1, ω2, ..., ωn) = diag(λ1 + ω1, ..., λn + ωn),
diag(λ1, λ2, ..., λn) · diag(ω1, ω2, ..., ωn) = diag(λ1ω1, ..., λnωn). III
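As a quick numerical illustration (not part of the text), the identity can be checked with NumPy; here f is taken to be the exponential, computed from a partial sum of its Taylor series:

```python
import numpy as np

# f(D) for D = diag(lam) equals diag(f(lam_1), ..., f(lam_n)).
# We take f = exp and sum the Taylor series sum_k D^k / k! directly.
lam = np.array([0.5, -1.0, 2.0])
D = np.diag(lam)

f_D = np.zeros_like(D)
term = np.eye(3)
for k in range(1, 50):          # partial sum of the Taylor series
    f_D += term
    term = term @ D / k

# Compare with applying f entrywise to the diagonal.
assert np.allclose(f_D, np.diag(np.exp(lam)))
```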
3.2.4. Example. (a) A prototypical symmetric matrix is that of the moment of in-
ertia encountered in mechanics. The ijth element of this matrix is defined as Iij ≡
∫∫∫ ρ(x1, x2, x3) xi xj dV, where xi is the ith Cartesian coordinate of a point in the dis-
tribution of mass described by the volume density ρ(x1, x2, x3). It is clear that Iij = Iji,
or I = It. The moment of inertia matrix can be represented as

I = ( I11  I12  I13 )
    ( I12  I22  I23 )
    ( I13  I23  I33 )

It has six independent elements.
(b) An example of an antisymmetric matrix is the electromagnetic field tensor given by

F = (  0    B3  -B2  -E1 )
    ( -B3   0    B1  -E2 )
    (  B2  -B1   0   -E3 )
    (  E1   E2   E3   0  )
Pauli spin matrices (c) Examples of hermitian matrices are the 2 x 2 Pauli spin matrices:

σ1 = ( 0  1 ),   σ2 = ( 0  -i ),   σ3 = ( 1   0 ).
     ( 1  0 )          ( i   0 )         ( 0  -1 )
(d) The most frequently encountered orthogonal matrices are rotations. One such matrix
Euler angles represents the rotation of a 3-dimensional rigid body in terms of Euler angles and is used in
mechanics. Attaching a coordinate system to the body, a general rotation can be decomposed
into a rotation of angle φ about the z-axis, followed by a rotation of angle θ about the new
x-axis, followed by a rotation of angle ψ about the new z-axis. We simply exhibit this matrix
in terms of these angles and leave it to the reader to show that it is indeed orthogonal.
( cos ψ cos φ - sin ψ cos θ sin φ   -cos ψ sin φ - sin ψ cos θ cos φ    sin ψ sin θ )
( sin ψ cos φ + cos ψ cos θ sin φ   -sin ψ sin φ + cos ψ cos θ cos φ   -cos ψ sin θ )
( sin θ sin φ                        sin θ cos φ                        cos θ       )
III
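Readers with NumPy at hand can verify the claimed orthogonality numerically. The sketch below (an illustration, using one common convention for composing the three successive rotations) checks R Rt = 1 and det R = +1:

```python
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

# Rotation by phi about z, then theta about the (new) x, then psi about
# the (new) z; the product of orthogonal matrices is orthogonal in any case.
phi, theta, psi = 0.7, 1.1, -0.4
R = rot_z(psi) @ rot_x(theta) @ rot_z(phi)

# Orthogonality: R R^t = 1, and det R = +1 (a pure rotation).
assert np.allclose(R @ R.T, np.eye(3))
assert np.isclose(np.linalg.det(R), 1.0)
```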
3.3 Orthonormal Bases
Matrix representation is facilitated by choosing an orthonormal basis B =
{|ei)}_{i=1}^N. The matrix elements of an operator A can be found in such a basis
by "multiplying" both sides of A|ei) = Σ_{k=1}^N αki |ek) on the left by (ej|:

(ej|A|ei) = Σ_{k=1}^N αki (ej|ek) = Σ_{k=1}^N αki δjk = αji,  or  αij = (ei|A|ej).  (3.11)
We can also show that in an orthonormal basis, the ith component ξi of a
vector is found by multiplying it by (ei|. This expression for ξi allows us to write
the expansion of |x) as

|x) = Σ_{j=1}^N |ej)(ej|x)  ⇒  1 = Σ_{j=1}^N |ej)(ej|,  (3.12)
which is the same as in Proposition 2.5.3. Let us now investigate the representation
of the special operators discussed in Chapter 2 and find the connection between
those operators and the matrices encountered in the last section. We begin by
calculating the matrix representing the hermitian conjugate of an operator T. In
an orthonormal basis, the elements of this matrix are given by Equation (3.11),
τij = (ei|T|ej). Taking the complex conjugate of this equation, and using the
definition of T† given in Equation (2.8), we obtain τ*ij = (ei|T|ej)* = (ej|T†|ei),
or (T†)ij = τ*ji. This is precisely how the adjoint of a matrix was defined.
Note how crucially this conclusion depends on the orthonormality of the basis
vectors. If the basis were not orthonormal, we could not use Equation (3.11) on
which the conclusion is based. Therefore,
3.3.1. Box. Only in an orthonormal basis is the adjoint of an operator rep-
resented by the adjoint of the matrix representing that operator.
In particular, a hermitian operator is represented by a hermitian matrix only if
an orthonormal basis is used. The following example illustrates this point.
3.3.2. Example. Consider the matrix representation of the hermitian operator H in a
general (not orthonormal) basis B = {|ai)}_{i=1}^N. The elements of the matrix corresponding
to H are given by

H|ak) = Σ_{j=1}^N ηjk |aj)  or  H|ai) = Σ_{j=1}^N ηji |aj).  (3.13)

Taking the product of the first equation with (ai| and complex-conjugating the result gives
(ai|H|ak)* = (Σ_{j=1}^N ηjk (ai|aj))* = Σ_{j=1}^N η*jk (aj|ai). But by the definition of a
hermitian operator,
(ai|H|ak)* = (ak|H†|ai) = (ak|H|ai).
So we have (ak|H|ai) = Σ_{j=1}^N η*jk (aj|ai).
On the other hand, multiplying the second equation in (3.13) by (ak| gives
(ak|H|ai) = Σ_{j=1}^N ηji (ak|aj). The only conclusion we can draw from this discussion is
Σ_{j=1}^N η*jk (aj|ai) = Σ_{j=1}^N ηji (ak|aj). Because this equation does not say anything
about each individual ηij, we cannot conclude, in general, that ηij = η*ji. However,
if the |ai)'s are orthonormal, then (aj|ai) = δji and (ak|aj) = δkj, and we obtain
Σ_{j=1}^N η*jk δji = Σ_{j=1}^N ηji δkj, or η*ik = ηki, as expected of a hermitian matrix. II
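This basis dependence is easy to see numerically. In the sketch below (an illustration, not from the text), the columns of an invertible matrix S are taken as the new basis vectors, so the representation of H in that basis is S^{-1}HS; for a generic non-orthonormal basis the result is not a hermitian matrix, while for a unitary (orthonormal) basis it is:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hermitian operator, written in the standard orthonormal basis.
X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
H = X + X.conj().T                      # H^dagger = H

# Columns of S are the new basis vectors; eta = S^-1 H S represents
# the same operator in that basis (from H S = S eta).
S = rng.standard_normal((3, 3))         # generic, non-orthonormal basis
eta = np.linalg.inv(S) @ H @ S
assert not np.allclose(eta, eta.conj().T)    # generically NOT hermitian

# With an orthonormal (unitary) basis the matrix is hermitian again.
Q, _ = np.linalg.qr(X)                  # Q is unitary
eta_on = Q.conj().T @ H @ Q             # Q^-1 = Q^dagger
assert np.allclose(eta_on, eta_on.conj().T)
```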
Similarly, we expect the matrices representing unitary operators to be uni-
tary only if the basis is orthonormal. This is an immediate consequence of
Equation (3.10), but we shall prove it in order to provide yet another example
of how the completeness relation, Equation (3.12), is used. Since UU† = 1,
we have (ei|UU†|ej) = (ei|1|ej) = δij. We insert the completeness relation
1 = Σ_{k=1}^N |ek)(ek| between U and U† on the LHS:

(ei|U (Σ_{k=1}^N |ek)(ek|) U†|ej) = Σ_{k=1}^N (ei|U|ek)(ek|U†|ej) = Σ_{k=1}^N μik μ*jk = δij,

where (ei|U|ek) = μik and (ek|U†|ej) = (ej|U|ek)* = μ*jk.
This equation gives the first half of the requirement for a unitary matrix given in
Definition 3.2.2. By redoing the calculation for U†U, we could obtain the second
half of that requirement.
3.4 Change of Basis and Similarity Transformation
It is often advantageous to describe a physical problem in a particular basis because
it takes a simpler form there, but the general form of the result may still be of
importance. In such cases the problem is solved in one basis, and the result is
transformed to other bases. Let us investigate this point in some detail.
Given a basis B = {|ai)}_{i=1}^N, we can write an arbitrary vector |a) with com-
ponents {α1, α2, ..., αN} in B as |a) = Σ_{i=1}^N αi |ai). Now suppose that we
change the basis to B' = {|a'i)}_{i=1}^N. How are the components of |a) in B' re-
lated to those in B? To answer this question, we write |ai) in terms of B' vectors,
|ai) = Σ_{j=1}^N ρji |a'j), and substitute for |ai) in the expansion of |a), obtaining
|a) = Σ_{i=1}^N αi Σ_{j=1}^N ρji |a'j) = Σ_{i,j} αi ρji |a'j). If we denote the jth component
of |a) in B' by α'j, then this equation tells us that

α'j = Σ_{i=1}^N ρji αi    for j = 1, 2, ..., N.  (3.14)
If we use a', R, and a, respectively, to designate a column vector with elements α'i,
an N x N matrix with elements ρij, and a column vector with elements αi, then
Equation (3.14) can be written in matrix form as

( α'1 )   ( ρ11  ρ12  ...  ρ1N ) ( α1 )
( α'2 ) = ( ρ21  ρ22  ...  ρ2N ) ( α2 )    or  a' = Ra.  (3.15)
(  .  )   (  .    .          .  ) (  . )
( α'N )   ( ρN1  ρN2  ...  ρNN ) ( αN )
basis transformation The matrix R is called the basis transformation matrix. It is invertible because
matrix it is a linear transformation that maps one basis onto another (see Theorem 2.1.6).
What happens to a matrix when we transform the basis? Consider the equation
|b) = A|a), where |a) and |b) have components {αi}_{i=1}^N and {βi}_{i=1}^N, respectively,
in B. This equation has a corresponding matrix equation b = Aa. Now, if we
change the basis, the components of |a) and |b) will change to those of a' and b',
respectively. We seek a matrix A' such that b' = A'a'. This matrix will clearly be
the transform of A. Using Equation (3.15), we write Rb = A'Ra, or b = R^{-1}A'Ra.
Comparing this with b = Aa and applying the fact that both equations hold for
arbitrary a and b, we conclude that

A' = RAR^{-1}  or  A = R^{-1}A'R.  (3.16)
similarity This is called a similarity transformation on A, and A' is said to be similar to A.
transformation The transformation matrix R can easily be found for orthonormal bases B =
{|ei)}_{i=1}^N and B' = {|e'i)}_{i=1}^N. We have |ei) = Σ_{k=1}^N ρki |e'k). Multiplying this
equation by (e'j|, we obtain

(e'j|ei) = Σ_{k=1}^N ρki (e'j|e'k) = Σ_{k=1}^N ρki δjk = ρji.

That is,

ρji = (e'j|ei).  (3.17)
3.4.1. Box. To find the ijth element of the matrix that changes the compo-
nents of a vector in the orthonormal basis B to those of the same vector in
the orthonormal basis B', take the jth ket in B and multiply it by the ith bra
in B'.
To find the ijth element of the matrix that changes B' into B, we take the
jth ket in B' and multiply it by the ith bra in B: ρ'ij = (ei|e'j). However, the
matrix R' must be R^{-1}, as can be seen from Equation (3.15). On the other hand,
ρ'ij = (ei|e'j) = (e'j|ei)* = ρ*ji, or

(R^{-1})ij = ρ'ij = ρ*ji = (R†)ij.  (3.18)
This shows that R is a unitary matrix and yields an important result.
3.4.2. Theorem. The matrix that transforms one orthonormal basis into another
is necessarily unitary.
From Equations (3.17) and (3.18) we have (R†)ij = (ei|e'j). Thus,

3.4.3. Box. To obtain the jth column of R†, we take the jth vector in the
new basis and successively "multiply" it by (ei| for i = 1, 2, ..., N.

In particular, if the original basis is the standard basis of C^N and |e'j) is rep-
resented by a column vector in that basis, then the jth column of R† is simply the
vector |e'j).
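As an illustration (not in the text), one can build R entry by entry from ρji = (e'j|ei) for two orthonormal bases of C^3 and confirm Theorem 3.4.2 and Box 3.4.3 numerically:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two orthonormal bases of C^3: the standard basis and a random unitary one.
E = np.eye(3, dtype=complex)                       # columns are |e_i>
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Ep, _ = np.linalg.qr(M)                            # columns are |e'_j>, unitary

# rho_ji = <e'_j | e_i>   (Equation (3.17))
R = np.array([[np.vdot(Ep[:, j], E[:, i]) for i in range(3)]
              for j in range(3)])

# Theorem 3.4.2: R is unitary.
assert np.allclose(R @ R.conj().T, np.eye(3))

# Box 3.4.3: the jth column of R^dagger is the vector |e'_j> itself
# (components taken in the standard basis).
assert np.allclose(R.conj().T, Ep)
```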
3.4.4. Example. In this example, we show that the similarity transform of a function
of a matrix is the same function of the similarity transform of the matrix: Rf(A)R^{-1} =
f(RAR^{-1}). The proof involves inserting 1 = R^{-1}R between factors of A in the Taylor series
expansion of f(A):

Rf(A)R^{-1} = R (Σ_{k=0}^∞ ak A^k) R^{-1} = Σ_{k=0}^∞ ak R A^k R^{-1} = Σ_{k=0}^∞ ak R (A A ... A) R^{-1}
                                                                             (k times)
= Σ_{k=0}^∞ ak (RAR^{-1})(RAR^{-1}) ... (RAR^{-1}) = Σ_{k=0}^∞ ak (RAR^{-1})^k = f(RAR^{-1}). III
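A quick NumPy check of this identity (an illustration, with f a low-order polynomial so that the series is exact):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
R = rng.standard_normal((4, 4))          # generic, hence invertible
Rinv = np.linalg.inv(R)

def f(M):
    # f(x) = 2 + 3x + x^2, applied to a matrix via its power series
    return 2 * np.eye(len(M)) + 3 * M + M @ M

lhs = R @ f(A) @ Rinv
rhs = f(R @ A @ Rinv)
assert np.allclose(lhs, rhs)             # R f(A) R^-1 == f(R A R^-1)
```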
3.5 The Determinant
An important concept associated with linear operators is the determinant. Deter-
minants are usually defined in terms of matrix representations of operators in a
particular basis. This may give the impression that determinants are basis depen-
dent. However, we shall show that the value of the determinant of an operator is the
same in all bases. In fact, it is possible to define determinants of operators without
resort to a specific representation of the operator in terms of matrices (see Section
25.3.1).
Let us first introduce a permutation symbol ε_{i1 i2 ... iN}, which will be used exten-
sively in this chapter. It is defined by

ε_{1 2 ... N} = 1,   ε_{i1 ... ik ... il ... iN} = -ε_{i1 ... il ... ik ... iN}.  (3.19)

In other words, ε_{i1 i2 ... iN} is completely antisymmetric (or skew-symmetric) under
interchange of any pair of its indices. We will use this permutation symbol to define
determinants. An immediate consequence of the definition above is that ε_{i1 i2 ... iN}
will be zero if any two of its indices are equal. Also note that ε_{i1 i2 ... iN} is +1 if
(i1, i2, ..., iN) is an even permutation¹ (shuffling) of (1, 2, ..., N), and -1 if it
is an odd permutation of (1, 2, ..., N).
3.5.1 Determinantof a Matrix
determinant defined 3.5.1. Definition. The determinant is a mapping, det : M_{NxN} → C, given in
terms of the elements αij of a matrix A by

det A = Σ_{i1,...,iN=1}^N ε_{i1 i2 ... iN} α_{1 i1} ... α_{N iN}.

Definition 3.5.1 gives det A in terms of an expansion in rows, so the first entry
is from the first row, the second from the second row, and so on. It is also possible
to expand in terms of columns, as the following theorem shows.

3.5.2. Theorem. The determinant of a matrix A can be written as

det A = Σ_{i1,...,iN=1}^N ε_{i1 i2 ... iN} α_{i1 1} ... α_{iN N}.

Therefore, det A = det At.
¹An even permutation means an even number of exchanges of pairs of elements. Thus, (2, 3, 1) is an even permutation
of (1, 2, 3), while (2, 1, 3) is an odd permutation. It can be shown (see Chapter 23) that the parity (evenness or oddness) of
a permutation is well-defined; i.e., that although there may be many routes of reaching a permutation from a given (fiducial)
permutation via exchanges of pairs of elements, all such routes require either even numbers of exchanges or odd numbers of
exchanges.
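Definition 3.5.1 can be transcribed directly into (inefficient but transparent) code; the sketch below sums over all permutations, using the parity of each permutation as the value of the ε symbol:

```python
import numpy as np
from itertools import permutations

def parity(p):
    """Sign of a permutation p of (0, ..., n-1), by counting inversions."""
    inv = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inv % 2 else 1

def det(A):
    # det A = sum over permutations (i1,...,iN) of
    #         eps_{i1...iN} * a_{1 i1} * ... * a_{N iN}
    n = len(A)
    return sum(parity(p) * np.prod([A[r, p[r]] for r in range(n)])
               for p in permutations(range(n)))

A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0, -2.0],
              [2.0, 1.0, -1.0]])
assert np.isclose(det(A), np.linalg.det(A))   # both give -5
```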
Proof. We shall go into some detail in the proof of only this theorem on determi-
nants to illustrate the manipulations involved in working with the ε symbol. We
shall not reproduce such details for the other theorems.
In the equation of Definition 3.5.1, i1, i2, ..., iN are all different and form a
permutation of (1, 2, ..., N). So one of the α's must have 1 as its second index.
We assume that it is the j1th term; that is, i_{j1} = 1.
We move this term all the way to the left to get α_{j1 1} α_{1 i1} ... α_{N iN}. Now we look
for the entry with 2 as the second index and assume that it occurs at the j2th
position; that is, i_{j2} = 2. We move this to the left, next to α_{j1 1}, and write
α_{j1 1} α_{j2 2} α_{1 i1} ... α_{N iN}. We continue in this fashion until we get α_{j1 1} α_{j2 2} ... α_{jN N}.
Since j1, j2, ..., jN is really a reshuffling of i1, i2, ..., iN, the summation indices
can be changed to j1, j2, ..., jN, and we can write²

det A = Σ_{j1,...,jN=1}^N ε_{i1 i2 ... iN} α_{j1 1} ... α_{jN N}.

If we can show that ε_{i1 i2 ... iN} = ε_{j1 j2 ... jN}, we are done. In the equation of the
theorem, the sequence of integers (i1, i2, ..., iN) is obtained by some shuffling
of (1, 2, ..., N). What we have done just now is to reshuffle (i1, i2, ..., iN) in
reverse order to get back to (1, 2, ..., N). Thus, if the shuffling in the equation of
the theorem is even (odd), the reshuffling will also be even (odd). Thus, ε_{i1 i2 ... iN} =
ε_{j1 j2 ... jN}, and we obtain the first part of the theorem:

det A = Σ_{j1,...,jN=1}^N ε_{j1 j2 ... jN} α_{j1 1} ... α_{jN N}.

For the second part, we simply note that the rows of At are columns of A and vice
versa. □
3.5.3. Theorem. Interchanging two rows (or two columns) of a matrix changes
the sign ofits determinant.
Proof. The proof is a simple exercise in permutations left for the reader. □
An immediate consequence of this theorem is the following corollary.
3.5.4. Corollary. The determinant ofa matrix with two equal rows (or two equal
columns) is zero. Therefore, one can add a multiple ofa row (column) to another
row (column) ofa matrix without changing its determinant.
²The ε symbol in the sum is not independent of the j's, although it appears without such indices. In reality, the i indices are
"functions" of the j indices.

cofactor of an element of a matrix
Since every term of the determinant in Definition 3.5.1 contains one and only
one element from each row, we can write

det A = αi1 Ai1 + αi2 Ai2 + ... + αiN AiN = Σ_{j=1}^N αij Aij,  (3.20)

where Aij contains products of elements of the matrix A other than the element
αij. Since each element of a row or column occurs at most once in each term of the
expansion, Aij cannot contain any element from the ith row or the jth column. The
quantity Aij is called the cofactor of αij, and the above expression is known as the
(Laplace) expansion of det A by its ith row. Clearly, there is a similar expansion by
the ith column of the determinant, which is obtained by a similar argument using
the equation of Theorem 3.5.2. We collect both results in the following equation:

det A = Σ_{j=1}^N αij Aij = Σ_{j=1}^N αji Aji.
Vandermonde, Alexandre-Théophile, also known as Alexis, Abnit, and Charles-Auguste
Vandermonde (1735-1796) had a father, a physician who directed his sickly son toward a
musical career. An acquaintanceship with Fontaine, however, so stimulated Vandermonde
that in 1771 he was elected to the Académie des Sciences, to which he presented four
mathematical papers (his total mathematical production) in 1771-1772. Later, Vandermonde
wrote several papers on harmony, and it was said at that time that musicians considered
Vandermonde to be a mathematician and that mathematicians viewed him as a musician.
Vandermonde's membership in the Academy led to a paper on experiments with cold,
made with Bezout and Lavoisier in 1776, and a paper on the manufacture of steel with
Berthollet and Monge in 1786. Vandermonde became an ardent and active revolutionary,
being such a close friend of Monge that he was termed "femme de Monge." He was a
member of the Commune of Paris and the club of the Jacobins. In 1782 he was director of
the Conservatoire des Arts et Métiers and in 1792, chief of the Bureau de l'Habillement
des Armées. He joined in the design of a course in political economy for the École Normale
and in 1795 was named a member of the Institut National.
Vandermonde is best known for the theory of determinants. Lebesgue believed that
the attribution of determinant to Vandermonde was due to a misreading of his notation.
Nevertheless, Vandermonde's fourth paper was the first to give a connected exposition of
determinants, because he (1) defined a contemporary symbolism that was more complete,
simple, and appropriate than that of Leibniz; (2) defined determinants as functions apart
from the solution of linear equations presented by Cramer but also treated by Vandermonde;
and (3) gave a number of properties of these functions, such as the number and signs of the
terms and the effect of interchanging two consecutive indices (rows or columns), which he
used to show that a determinant is zero if two rows or columns are identical.
Vandermonde's real and unrecognized claim to fame was lodged in his first paper, in
which he approached the general problem of the solvability of algebraic equations through
a study of functions invariant under permutations of the roots of the equations. Cauchy
assigned priority in this to Lagrange and Vandermonde. Vandermonde read his paper in
November 1770, but he did not become a member of the Academy until 1771, and the
paper was not published until 1774. Although Vandermonde's methods were close to those
later developed by Abel and Galois for testing the solvability of equations, and although his
treatment of the binomial equation x^n - 1 = 0 could easily have led to the anticipation
of Gauss's results on constructible polygons, Vandermonde himself did not rigorously or
completely establish his results, nor did he see the implications for geometry. Nevertheless,
Kronecker dates the modern movement in algebra to Vandermonde's 1770 paper.
Unfortunately, Vandermonde's spurt of enthusiasm and creativity, which in two years
produced four insightful mathematical papers at least two of which were of substantial
importance, was quickly diverted by the exciting politics of the time and perhaps by poor
health.
3.5.5. Proposition. If i ≠ k, then Σ_{j=1}^N αij Akj = 0 = Σ_{j=1}^N αji Ajk.

Proof. Consider the matrix B obtained from A by replacing row k by row i (row
i remains unchanged, of course). The matrix B has two equal rows, and its de-
terminant is therefore zero. Now, if we expand det B by its kth row according to
Equation (3.20), we obtain 0 = det B = Σ_{j=1}^N βkj Bkj. But the elements of the kth
row of B are the elements of the ith row of A; that is, βkj = αij, and the cofactors
of the kth row of B are the same as those of A, that is, Bkj = Akj. Thus, the first
equation of the proposition is established. The second equation can be established
using expansion by columns. □
minor of a matrix A minor of order N - 1 of an N x N matrix A is the determinant of a matrix
obtained by striking out one row and one column of A. If we strike out the ith row
and jth column of A, then the minor is denoted by Mij.
3.5.6. Theorem. Aij = (-1)^{i+j} Mij.

Proof. The proof involves separating α11 from the rest of the terms in the expansion
of the determinant. The unique coefficient of α11 is A11 by Equation (3.20). We
can show that it is also M11 by examining the ε expansion of the determinant and
performing the first sum. This will establish the equality A11 = M11. The general
equality is obtained by performing enough interchanges of rows and columns of
the matrix to bring αij into the first-row first-column position, each exchange
introducing a negative sign, thus the (-1)^{i+j} factor. The details are left as an
exercise. □
The combination of Equation (3.20) and Theorem 3.5.6 gives the familiar
routine of evaluating the determinant of a matrix.
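That familiar routine also mechanizes directly; a recursive (and exponentially slow, but faithful) transcription of the Laplace expansion along the first row:

```python
import numpy as np

def minor(A, i, j):
    """Matrix obtained by striking out row i and column j of A."""
    return np.delete(np.delete(A, i, axis=0), j, axis=1)

def det(A):
    # Laplace expansion along the first row:
    # det A = sum_j a_{0j} * (-1)^(0+j) * M_{0j}
    n = len(A)
    if n == 1:
        return A[0, 0]
    return sum(A[0, j] * (-1) ** j * det(minor(A, 0, j)) for j in range(n))

A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0, -2.0],
              [2.0, 1.0, -1.0]])
assert np.isclose(det(A), np.linalg.det(A))
```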
3.5.2 Determinants of Products of Matrices
One extremely useful property of determinants is expressed in the following the-
orem.
3.5.7. Theorem. det(AB) = (det A)(det B).

Proof. The proof consists in keeping track of index shuffling while rearranging the
order of products of matrix elements. We shall leave the details as an exercise. □
3.5.8. Example. Let O and U denote, respectively, an orthogonal and a unitary n x n
matrix; that is, OOt = OtO = 1, and UU† = U†U = 1. Taking the determinant of the
first equation and using Theorems 3.5.2 and 3.5.7, we obtain (det O)(det Ot) = (det O)² =
det 1 = 1. Therefore, for an orthogonal matrix, we get det O = ±1.
Orthogonal transformations preserve a real inner product. Among such transformations
are the so-called inversions, which, in their simplest form, multiply a vector by -1. In three
dimensions this corresponds to a reflection through the origin. The matrix associated with
this operation is -1:

-1 = ( -1   0   0 )
     (  0  -1   0 )
     (  0   0  -1 )

which has a determinant of -1. This is a prototype of other, more complicated, orthogonal
transformations whose determinants are -1.
The other orthogonal transformations, whose determinants are +1, are of special in-
terest because they correspond to rotations in three dimensions. The set of orthogonal
transformations in n dimensions having the determinant +1 is denoted by SO(n). These
transformations are special because they have the mathematical structure of a (continuous)
group, which finds application in many areas of advanced physics. We shall come back to
the topic of group theory later in the book.
We can obtain a similar result for unitary transformations. We take the determinant of
both sides of U†U = 1:

det (U*)t det U = det U* det U = (det U)*(det U) = |det U|² = 1.

Thus, we can generally write det U = e^{iα}, with α ∈ R. The set of those transformations
with α = 0 forms a group to which 1 belongs and that is denoted by SU(n). This group has
found applications in the attempts at unifying the fundamental interactions. II
3.5.3 Inverse of a Matrix
One of the most useful properties of the determinant is the simple criterion it
gives for a matrix to be invertible. We are ready to investigate this criterion now.
We combine Equation (3.20) and the content of Proposition 3.5.5 into a single
equation,

Σ_{j=1}^N αij Akj = (det A) δik = Σ_{j=1}^N αji Ajk,  (3.21)
and construct a matrix CA, whose elements are the cofactors of the elements of the
matrix A:

(CA)ij = Aij.  (3.22)

Then Equation (3.21) can be written as

Σ_{j=1}^N αij ((CA)t)jk = (det A) δik = Σ_{j=1}^N ((CA)t)kj αji,

or, in matrix form, as

A (CA)t = (det A) 1 = (CA)t A.  (3.23)
3.5.9. Theorem. The inverse of a matrix (if it exists) is unique. The matrix A has
inverse of a matrix an inverse if and only if det A ≠ 0. Furthermore,

A^{-1} = (CA)t / det A,  (3.24)

where CA is the matrix of the cofactors of A.

Proof. Let B and C be inverses of A. Then

B = 1B = (CA)B = C(AB) = C1 = C,

because CA = 1 = AB. For the second part, we note that if A has an inverse B, then

AB = 1  ⇒  det A det B = det 1 = 1,

whence det A ≠ 0. Conversely, if det A ≠ 0, then dividing both sides of Equation
(3.23) by det A, we obtain the unique inverse (3.24) of A. □
The inverse of a 2 x 2 matrix is easily found:

( a  b )^{-1}  =  1/(ad - bc) (  d  -b )    (3.25)
( c  d )                      ( -c   a )

if ad - bc ≠ 0. There is a more practical way of calculating the inverse of matrices.
In the following discussion of this method, we shall confine ourselves simply to
stating a couple of definitions and the main theorem, with no attempt at providing
any proofs. The practical utility of the method will be illustrated by a detailed
analysis of examples.
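Before turning to that method, Equation (3.24) itself is easy to transcribe; the sketch below builds the cofactor matrix from Theorem 3.5.6 and checks Equations (3.23) and (3.24) against NumPy:

```python
import numpy as np

def cofactor_matrix(A):
    """(C_A)_ij = A_ij = (-1)^(i+j) * M_ij, with M_ij the minor (Thm 3.5.6)."""
    n = len(A)
    C = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            M = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(M)
    return C

A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0, -2.0],
              [2.0, 1.0, -1.0]])
C = cofactor_matrix(A)
d = np.linalg.det(A)

assert np.allclose(A @ C.T, d * np.eye(3))         # Equation (3.23)
assert np.allclose(C.T / d, np.linalg.inv(A))      # Equation (3.24)
```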
elementary row operation
triangular, or row-echelon form of a matrix
3.5.10. Definition. An elementary row operation on a matrix is one of the fol-
lowing: (a) interchange of two rows of the matrix, (b) multiplication of a row by a
nonzero number, and (c) addition of a multiple of one row to another.
Elementary column operations are defined analogously.
3.5.11. Definition. A matrix is in triangular, or row-echelon, form if it satisfies
the following three conditions:
1. Any row consisting of only zeros is below any row that contains at least one
nonzero element.
2. Going from left to right, the first nonzero entry of any row is to the left of the
first nonzero entry of any lower row.
3. The first nonzero entry of each row is 1.
3.5.12. Theorem. For any invertible n x n matrix A,

• The n x 2n matrix (A|1) can be transformed into the n x 2n matrix (1|A^{-1})
by means of a finite number of elementary row operations.³

• If (A|1) is transformed into (1|B) by means of elementary row operations,
then B = A^{-1}.

A systematic way of transforming (A|1) into (1|A^{-1}) is first to bring A into
triangular form and then eliminate all nonzero elements of each column by ele-
mentary row operations.
3.5.13. Example. Let us evaluate the inverse of

A = ( 1  2  -1 )
    ( 0  1  -2 )
    ( 2  1  -1 )

We start with

M = ( 1  2  -1 | 1  0  0 )
    ( 0  1  -2 | 0  1  0 )
    ( 2  1  -1 | 0  0  1 )

and apply elementary row operations to M to bring the left half of it into triangular form. If
we denote the kth row by (k) and the three operations of Definition 3.5.10, respectively, by
(k) <=> (j), a(k), and a(k) + (j), we get

M  --[-2(1)+(3)]-->  ( 1   2  -1 |  1  0  0 )
                     ( 0   1  -2 |  0  1  0 )
                     ( 0  -3   1 | -2  0  1 )

   --[3(2)+(3)]-->   ( 1   2  -1 |  1  0  0 )
                     ( 0   1  -2 |  0  1  0 )
                     ( 0   0  -5 | -2  3  1 )

   --[-(1/5)(3)]-->  ( 1   2  -1 |  1    0    0   )
                     ( 0   1  -2 |  0    1    0   )  ==  M'
                     ( 0   0   1 |  2/5 -3/5 -1/5 )
³The matrix (A|1) denotes the n x 2n matrix obtained by juxtaposing the n x n unit matrix to the right of A. It can easily be
shown that if A, B, and C are n x n matrices, then A(B|C) = (AB|AC).
The left half of M' is in triangular form. However, we want all entries above any 1 in a
column to be zero as well, i.e., we want the left-hand matrix to be 1. We can do this by
appropriate use of type 3 elementary row operations:

M'  --[-2(2)+(1)]-->  ( 1  0   3 |  1   -2    0   )
                      ( 0  1  -2 |  0    1    0   )
                      ( 0  0   1 |  2/5 -3/5 -1/5 )

    --[-3(3)+(1), 2(3)+(2)]-->  ( 1  0  0 | -1/5 -1/5  3/5 )
                                ( 0  1  0 |  4/5 -1/5 -2/5 )
                                ( 0  0  1 |  2/5 -3/5 -1/5 )

The right half of the resulting matrix is A^{-1}. III
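The procedure of Example 3.5.13 mechanizes directly. Here is a minimal Gauss-Jordan sketch (with partial pivoting added for numerical safety, which the hand calculation did not need):

```python
import numpy as np

def inverse(A):
    """Invert A by row-reducing (A|1) to (1|A^-1), as in Theorem 3.5.12."""
    n = len(A)
    M = np.hstack([A.astype(float), np.eye(n)])   # the n x 2n matrix (A|1)
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))   # partial pivoting
        if np.isclose(M[pivot, col], 0.0):
            raise ValueError("matrix is not invertible")
        M[[col, pivot]] = M[[pivot, col]]         # type (a): swap rows
        M[col] /= M[col, col]                     # type (b): scale pivot to 1
        for r in range(n):
            if r != col:                          # type (c): clear the column
                M[r] -= M[r, col] * M[col]
    return M[:, n:]

A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0, -2.0],
              [2.0, 1.0, -1.0]])
assert np.allclose(inverse(A), np.linalg.inv(A))
assert np.allclose(inverse(A),
                   np.array([[-0.2, -0.2,  0.6],
                             [ 0.8, -0.2, -0.4],
                             [ 0.4, -0.6, -0.2]]))   # result of Example 3.5.13
```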
3.5.14. Example. It is instructive to start with a matrix that is not invertible and show
that it is impossible to turn it into 1 by elementary row operations. Consider the matrix

B = (  2  -1  3 )
    (  1  -2  1 )
    ( -1   5  0 )

Let us systematically bring it into triangular form:

M = (  2  -1  3 | 1  0  0 )   --[(1)<=>(2)]-->   (  1  -2  1 | 0  1  0 )
    (  1  -2  1 | 0  1  0 )                      (  2  -1  3 | 1  0  0 )
    ( -1   5  0 | 0  0  1 )                      ( -1   5  0 | 0  0  1 )

    --[-2(1)+(2)]-->  (  1  -2  1 | 0   1  0 )   --[(1)+(3)]-->  ( 1  -2  1 | 0   1  0 )
                      (  0   3  1 | 1  -2  0 )                   ( 0   3  1 | 1  -2  0 )
                      ( -1   5  0 | 0   0  1 )                   ( 0   3  1 | 0   1  1 )

    --[-(2)+(3)]-->  ( 1  -2  1 | 0   1  0 )   --[(1/3)(2)]-->  ( 1  -2  1   |  0    1    0 )
                     ( 0   3  1 | 1  -2  0 )                    ( 0   1  1/3 |  1/3 -2/3  0 )
                     ( 0   0  0 |-1   3  1 )                    ( 0   0  0   | -1    3    1 )

The matrix B is now in triangular form, but its third row contains all zeros. There is no
way we can bring this into the form of a unit matrix. We therefore conclude that B is not
invertible. This is, of course, obvious, since it can easily be verified that B has a vanishing
determinant. III
We mentioned earlier that the determinant is a property of linear transforma-
tions, although determinants are defined in terms of matrices that represent them. We can
now show this. First, note that taking the determinant of both sides of AA^{-1} = 1,
one obtains det(A^{-1}) = 1/det A. Now recall that the representations of an opera-
tor in two different bases are related via a similarity transformation. Thus, if the operator is
represented by A in one basis and by A' in another, then there exists an invertible
matrix R such that A' = RAR^{-1}. Taking the determinant of both sides, we get

det A' = det R det A (1/det R) = det A.

Thus, the determinant is an intrinsic property of the operator, independent of the
basis chosen in which to represent the operator.

trace of a square matrix
3.6 The Trace
Another intrinsic quantity associated with an operator that is usually defined in
terms of matrices is given in the following definition.
3.6.1. Definition. Let A be an N x N matrix. The mapping tr : M_{NxN} → C (or
R) given by tr A = Σ_{i=1}^N αii is called the trace of A.

3.6.2. Theorem. The trace is a linear mapping. Furthermore,

tr At = tr A  and  tr(AB) = tr(BA).
connection between trace and determinant
Proof. The linearity of the trace and the first identity follow directly from the
definition. To prove the second identity of the theorem, we use the definitions of
the trace and the matrix product:

tr(AB) = Σ_{i=1}^N (AB)ii = Σ_{i=1}^N Σ_{j=1}^N (A)ij (B)ji = Σ_{j=1}^N Σ_{i=1}^N (B)ji (A)ij
       = Σ_{j=1}^N (Σ_{i=1}^N (B)ji (A)ij) = Σ_{j=1}^N (BA)jj = tr(BA).  □
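A quick numerical confirmation of Theorem 3.6.2, including the cyclic property tr(ABC) = tr(BCA) that follows from it:

```python
import numpy as np

rng = np.random.default_rng(3)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))

assert np.isclose(np.trace(A.T), np.trace(A))            # tr A^t = tr A
assert np.isclose(np.trace(A @ B), np.trace(B @ A))      # tr(AB) = tr(BA)
# Applying tr(AB) = tr(BA) with "A" taken as AB and "B" as C gives cyclicity:
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))
```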
3.6.3. Example. In this example, we show a very useful connection between the trace
and the determinant that holds when a matrix is only infinitesimally different from the unit
matrix. Let us calculate the determinant of 1 + ϵA to first order in ϵ. Using the definition of
determinant, we write

det(1 + ϵA) = Σ_{i1,...,in=1}^n ε_{i1...in} (δ_{1 i1} + ϵ α_{1 i1}) ... (δ_{n in} + ϵ α_{n in})

= Σ_{i1,...,in=1}^n ε_{i1...in} δ_{1 i1} ... δ_{n in} + ϵ Σ_{k=1}^n Σ_{i1,...,in=1}^n ε_{i1...in} δ_{1 i1} ... δ̂_{k ik} ... δ_{n in} α_{k ik}.  (3.26)
The first sum is just the product of all the Kronecker deltas. In the second sum, δ̂_{k ik} means
that in the product of the deltas, δ_{k ik} is absent. This term is obtained by multiplying the
second term of the kth parentheses by the first term of all the rest. Since we are interested
only in the first power of ϵ, we stop at this term. Now, the first sum is reduced to ε_{12...n} = 1
after all the Kronecker deltas are summed over. For the second sum, we get

ϵ Σ_{k=1}^n Σ_{i1,...,in=1}^n ε_{i1...in} δ_{1 i1} ... δ̂_{k ik} ... δ_{n in} α_{k ik} = ϵ Σ_{k=1}^n Σ_{ik=1}^n ε_{1 2 ... ik ... n} α_{k ik}
= ϵ Σ_{k=1}^n ε_{1 2 ... k ... n} α_{kk} = ϵ Σ_{k=1}^n α_{kk} = ϵ tr A,

where the last line follows from the fact that the only nonzero value for ε_{1 2 ... ik ... n} is obtained
when ik is equal to the missing index, i.e., k, in which case it will be 1. Thus det(1 + ϵA) =
1 + ϵ tr A. III
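Numerically, det(1 + ϵA) - (1 + ϵ tr A) should shrink like ϵ²; a quick check:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))

for eps in (1e-2, 1e-3, 1e-4):
    exact = np.linalg.det(np.eye(5) + eps * A)
    first_order = 1 + eps * np.trace(A)
    # The error is O(eps^2): dividing by eps^2 gives a roughly constant number.
    print(eps, (exact - first_order) / eps**2)
```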
Similar matrices have the same trace: If A' = RAR^{-1}, then

tr A' = tr(RAR^{-1}) = tr[R(AR^{-1})] = tr[(AR^{-1})R]
      = tr[A(R^{-1}R)] = tr(A1) = tr A.

The preceding discussion is summarized in the following proposition.

3.6.4. Proposition. To every operator A ∈ L(V) are associated two intrinsic
numbers, det A and tr A, which are the determinant and trace of the matrix repre-
sentation of the operator in any basis of V.

It follows from this proposition that the result of Example 3.6.3 can be written
in terms of operators:

det(1 + ϵA) = 1 + ϵ tr A.  (3.27)
A particularly useful formula that can be derived from this equation is the derivative
at t = 0 of an operator A(t) depending on a single variable with the property
that A(0) = 1. To first order in t, we can write A(t) = 1 + tȦ(0), where a dot
represents differentiation with respect to t. Substituting this in Equation (3.27)
and differentiating with respect to t, we obtain the important result

d/dt det(A(t))|_{t=0} = tr Ȧ(0).  (3.28)
3.6.5. Example. We have seen that the determinant of a product of matrices is the product
of the determinants. On the other hand, the trace of a sum of matrices is the sum of traces.
When dealing with numbers, products and sums are related via the logarithm and expo-
nential: αβ = exp{ln α + ln β}. A generalization of this relation exists for diagonalizable
matrices. Let A be such a matrix, i.e., let D = RAR^{-1} for some similarity transformation R
and some diagonal matrix D = diag(λ1, λ2, ..., λn). The determinant of a diagonal matrix
is simply the product of its elements:

det D = λ1 λ2 ... λn.

Taking the natural log of both sides and using the result of Example 3.2.3, we have

ln(det D) = ln λ1 + ln λ2 + ... + ln λn = tr(ln D),

which can also be written as det D = exp(tr(ln D)).
In terms of A, this reads det(RAR^{-1}) = exp{tr(ln(RAR^{-1}))}. Now invoke the invariance
of determinant and trace under similarity transformation and the result of Example 3.4.4 to
obtain

det A = exp{tr(R(ln A)R^{-1})} = exp{tr(ln A)}.  (3.29)

This is an important equation, which is sometimes used to define the determinant of operators
in infinite-dimensional vector spaces. II
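For a matrix with positive eigenvalues, Equation (3.29) can be checked with an eigendecomposition standing in for ln A (an illustration; a matrix-logarithm routine such as SciPy's logm would serve the same purpose):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((4, 4))
A = X @ X.T + 4 * np.eye(4)       # symmetric, with positive eigenvalues

lam, R = np.linalg.eigh(A)        # A = R diag(lam) R^t
log_A = R @ np.diag(np.log(lam)) @ R.T

# det A = exp(tr(ln A))   (Equation (3.29))
assert np.isclose(np.linalg.det(A), np.exp(np.trace(log_A)))
```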
Both the determinant and the trace are mappings from M_{NxN} to C. The deter-
minant is not a linear mapping, but the trace is; and this opens up the possibility
of defining an inner product in the vector space of N x N matrices in terms of the
trace:
3.6.6. Proposition. For any two matrices A, B E MNxN, the mapping g :
MNxN x MNxN --> C defined by g{A, B) = tr(AtB) is a sesquilinear inner
product.
Proof The proof follows directly from lhe linearity of trace and lhe definition of
hermitian conjugate. D
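The defining properties of this inner product can be verified numerically; the following sketch (not part of the text; A and B are arbitrary example matrices) checks hermitian symmetry, positivity, and linearity in the second argument:

```python
import numpy as np

# The trace inner product g(A, B) = tr(A† B) of Proposition 3.6.6.
def g(A, B):
    return np.trace(A.conj().T @ B)

A = np.array([[1 + 2j, 0], [3j, 1]])   # arbitrary example matrices
B = np.array([[2, 1j], [1, -1]])

herm = np.isclose(g(A, B), np.conj(g(B, A)))   # g(A,B) = g(B,A)*
pos = g(A, A).real > 0                          # g(A,A) > 0 for A != 0
lin = np.isclose(g(A, 2j * B), 2j * g(A, B))    # linear in the second slot
assert herm and pos and lin
```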
3.7 Problems
3.1. Show that if |c⟩ = |a⟩ + |b⟩, then in any basis the components of |c⟩ are equal
to the sums of the corresponding components of |a⟩ and |b⟩. Also show that the
elements of the matrix representing the sum of two operators are the sums of the
elements of the matrices representing those two operators.

3.2. Show that the unit operator 1 is represented by the unit matrix in any basis.

3.3. The linear operator A : ℝ³ → ℝ² is given by

A(x, y, z) = (2x + y − 3z, x + y − z).

Construct the matrix representing A in the standard bases of ℝ³ and ℝ².

3.4. The linear transformation T : ℝ³ → ℝ³ is defined as
Find the matrix representation of T in
(a) the standard basis of ℝ³,
(b) the basis consisting of |a1⟩ = (1, 1, 0), |a2⟩ = (1, 0, −1), and |a3⟩ = (0, 2, 3).

3.5. Show that the diagonal elements of an antisymmetric matrix are all zero.

3.6. Show that the number of independent real parameters for an N × N
1. (real) symmetric matrix is N(N + 1)/2,
2. (real) antisymmetric matrix is N(N − 1)/2,
3. (real) orthogonal matrix is N(N − 1)/2,
4. (complex) unitary matrix is N²,
5. (complex) hermitian matrix is N².
3.7. Show that an arbitrary orthogonal 2 × 2 matrix can be written in one of the
following two forms:

( cos θ  −sin θ )        ( cos θ   sin θ )
( sin θ   cos θ )   or   ( sin θ  −cos θ ).

The first is a pure rotation (its determinant is +1), and the second has determinant
−1. The form of the choices is dictated by the assumption that the first entry of
the matrix reduces to 1 when θ = 0.

3.8. Derive the formulas

cos(θ1 + θ2) = cos θ1 cos θ2 − sin θ1 sin θ2,
sin(θ1 + θ2) = sin θ1 cos θ2 + cos θ1 sin θ2

by noting that the rotation by the angle θ1 + θ2 in the xy-plane is the product of
two rotations. (See Problem 3.7.)
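The composition property behind Problem 3.8 is easy to check numerically; the following sketch (not part of the text; the angles are arbitrary) multiplies two rotation matrices and compares with a single rotation by the summed angle:

```python
import numpy as np

# A plane rotation by theta, as in Problem 3.7 (determinant +1 form).
def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

t1, t2 = 0.7, 1.1            # arbitrary example angles
lhs = rot(t1) @ rot(t2)      # product of two rotations
rhs = rot(t1 + t2)           # single rotation by the sum of the angles

# Entry-by-entry equality encodes the angle-addition formulas.
assert np.allclose(lhs, rhs)
```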
3.9. Prove that if a matrix M satisfies MM† = 0, then M = 0. Note that in general,
M² = 0 does not imply that M is zero. Find a nonzero 2 × 2 matrix whose square
is zero.

3.10. Construct the matrix representations of the derivative and multiplication-by-t
operators. Choose {1, t, t², t³} as your basis of 𝒫₃[t] and {1, t, t², t³, t⁴} as your
basis of 𝒫₄[t]. Use the matrix of D so obtained to find the first, second, third,
fourth, and fifth derivatives of a general polynomial of degree 4.
3.11. Find the transformation matrix R that relates the (orthonormal) standard
basis of C3 to the orthonormal basis obtained from the following vectors via the
Gram-Schmidt process:
Verify that R is unitary, as expected from Theorem 3.4.2.
3.12. If the matrix representation of an endomorphism Tof C2 with respect to the
standard basis is (11), what is its matrix representation with respect to the basis
{(D, (!I)}?
3.13. If the matrix representation of an endomorphism T of ℂ³ with respect to the
standard basis is
what is the representation of T with respect to the basis

{(0, 1, −1), (1, −1, 1), (−1, 1, 0)}?
3.14. Using Definition 3.5.1, calculate the determinant of a general 3 x 3 matrix
and obtain the familiar expansion of such a determinant in terms of the first row
of the matrix.
3.15. Prove Corollary 3.5.4.
3.16. Show that det(αA) = α^N det A for an N × N matrix A and a complex number
α.

3.17. Show that det 1 = 1 for any unit matrix.

3.18. Find a specific pair of matrices A and B such that det(A + B) ≠ det A + det B.
Therefore, the determinant is not a linear mapping. Hint: Almost any pair of matrices
will work. In fact, the challenge is to find a pair such that det(A + B) =
det A + det B.
3.19. Demonstrate Proposition 3.5.5 using an arbitrary 3 x 3 matrix and evaluating
the sum explicitly.
3.20. Find the inverse of the matrix
A= (i
-2
-I
o
I
3.21. Show explicitly that det(AB) = det A det B for 2 × 2 matrices.

3.22. Given three N × N matrices A, B, and C such that AB = C with C invertible,
show that both A and B must be invertible. Thus, any two operators A and B on
a finite-dimensional vector space satisfying AB = 1 are invertible and each is the
inverse of the other. Note: This is not true for infinite-dimensional vector spaces.
3.23. Show directly that the similarity transformation induced by R does not
change the determinant or the trace of A, where

R = ( 1  2  −1 )          A = ( 3  −1   2 )
    ( 0  1  −2 )   and        ( 0   1  −2 ).
    ( 2  1  −1 )              ( 1  −3  −1 )
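As a numerical check of the invariance asserted in Problem 3.23 (the entries below are transcribed from the matrices printed above), one can compare det and tr of RAR⁻¹ with those of A:

```python
import numpy as np

# R and A as printed in Problem 3.23.
R = np.array([[1.0, 2.0, -1.0], [0.0, 1.0, -2.0], [2.0, 1.0, -1.0]])
A = np.array([[3.0, -1.0, 2.0], [0.0, 1.0, -2.0], [1.0, -3.0, -1.0]])

Ap = R @ A @ np.linalg.inv(R)   # the similarity-transformed matrix

assert np.isclose(np.linalg.det(Ap), np.linalg.det(A))
assert np.isclose(np.trace(Ap), np.trace(A))
```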
3.24. Find the matrix that transforms the standard basis of ℂ³ to the vectors

|a1⟩ = (  1/√2    )    |a2⟩ = ( −1/√2     )    |a3⟩ = (   0       )
       (  i/√6    )           (  i/√6     )           (  −2/√6    )
       ( (1+i)/√6 )           ( (−1+i)/√6 )           ( (1+i)/√6  )
Show that this matrix is unitary.
3.25. Consider the three operators L1, L2, and L3 satisfying [L1, L2] = iL3,
[L3, L1] = iL2, [L2, L3] = iL1. Show that the trace of each of these operators
is necessarily zero.

3.26. Show that in the expansion of the determinant given in Definition 3.5.1, no
two elements of the same row or the same column can appear in each term of the
sum.
3.27. Find inverses for the following matrices using both methods discussed in
this chapter.
A=U
I
-1) ( 2 -1)
C= (~1
-1
~ ),
I 2 , B= 0 I -2 , I
-I -2 -2 2 1 -1 -I -2
D = ( 1/√2    0      (1−i)/(2√2)    (1+i)/(2√2) )
    (  0     1/√2    (1−i)/(2√2)   −(1+i)/(2√2) )
    ( 1/√2    0     −(1−i)/(2√2)   −(1+i)/(2√2) )
    (  0     1/√2   −(1−i)/(2√2)    (1+i)/(2√2) ).
3.28. Let A be an operator on V. Show that if det A = 0, then there exists a nonzero
vector |x⟩ ∈ V such that A|x⟩ = 0.

3.29. For which values of a are the following matrices invertible? Find the inverses
whenever possible.
A=G
a
;), B=G
I
D,
1 a
a 1
c= (! 1
~), D=G
I
~).
a I
0 a
3.30. Let {a_i}_{i=1}^N be the set consisting of the N rows of an N × N matrix A, and
assume that the a_i are orthogonal to each other. Show that

|det A| = ‖a_1‖ ‖a_2‖ ··· ‖a_N‖.

Hint: Consider AA†. What would the result be if A were a unitary matrix?
3.31. Prove that a set of n homogeneous linear equations in n unknowns has a
nontrivial solution if and only if the determinant of the matrix of coefficients is
zero.
3.32. Use determinants to show that an antisymmetric matrix whose dimension is
odd cannot have an inverse.
3.33. Show that tr(|a⟩⟨b|) = ⟨b|a⟩. Hint: Evaluate the trace in an orthonormal
basis.

3.34. Show that if two invertible N × N matrices A and B anticommute (that is,
AB + BA = 0), then (a) N must be even, and (b) tr A = tr B = 0.

3.35. Show that for a spatial rotation R_n̂(θ) of an angle θ about an arbitrary axis
n̂, tr R_n̂(θ) = 1 + 2 cos θ.

3.36. Express the sum of the squares of the elements of a matrix as a trace. Show that
this sum is invariant under an orthogonal transformation of the matrix.

3.37. Let S and A be a symmetric and an antisymmetric matrix, respectively, and
let M be a general matrix. Show that
(a) tr M = tr Mᵗ,
(b) tr(SA) = 0; in particular, tr A = 0,
(c) SA is antisymmetric if and only if [S, A] = 0,
(d) MSMᵗ is symmetric and MAMᵗ is antisymmetric,
(e) MHM† is hermitian if H is.
3.38. Find the trace of each of the following linear operators:
(a) T : ℝ³ → ℝ³ given by
T(x, y, z) = (x + y − z, 2x + 3y − 2z, x − y).
(b) T : ℝ³ → ℝ³ given by
T(x, y, z) = (y − z, x + 2y + z, z − y).
(c) T : ℂ⁴ → ℂ⁴ given by
T(x, y, z, w) = (x + iy − z + iw, 2ix + 3y − 2iz − w, x − iy, z + iw).
3.39. Use Equation (3.29) to derive Equation (3.27).
3.40. Suppose that there are two operators A and B such that [A, B] = c1, where
c is a constant. Show that the vector space in which such operators are defined
cannot be finite-dimensional. Conclude that the position and momentum operators
of quantum mechanics can be defined only in infinite dimensions.
Additional Reading
1. Axler, S. Linear Algebra Done Right, Springer-Verlag, 1996.
2. Birkhoff, G. and MacLane, S. Modern Algebra, 4th ed., Macmillan, 1977.
Discusses matrices from the standpoint of row and column operations.
3. Greub, W.Linear Algebra, 4th ed., Springer-Verlag, 1975.
4
Spectral Decomposition

The last chapter discussed the matrix representation of operators. It was pointed out
there that such a representation is basis-dependent. In some bases, the operator may
"look" quite complicated, while in others it may take a simple form. In a "spe-
cial" basis, the operator may look the simplest: It may be a diagonal matrix. This
chapter investigates conditions under which a basis exists in which the operator is
represented by a diagonal matrix.
4.1 Direct Sums
sum of two subspaces defined
Sometimes it is possible, and convenient, to break up a vector space into special
(disjoint) subspaces. For instance, it is convenient to decompose the motion of a
projectile into its horizontal and vertical components. Similarly, the study of the
motion of a particle in R3 under the influence of a central force field is facili-
tated by decomposing the motion into its projections onto the direction of angular
momentum and onto a plane perpendicular to the angular momentum. This corre-
sponds to decomposing a vector in space into a vector, say, in the xy-plane and a
vector along the z-axis. We can generalize this to any vector space, but first some
notation: Let U and W be subspaces of a vector space V. Denote by U + W the
collection of all vectors in V that can be written as a sum of two vectors, one in U
and one in W. It is easy to show that U + W is a subspace of V.
4.1.1. Example. Let U be the xy-plane and W the yz-plane. These are both subspaces
of ℝ³, and so is U + W. In fact, U + W = ℝ³, because given any vector (x, y, z) in ℝ³,
we can write it as

(x, y, z) = (x, y/2, 0) + (0, y/2, z),

where the first vector is in U and the second in W. This decomposition is not unique:
We could also write (x, y, z) = (x, y/3, 0) + (0, 2y/3, z), and a host of other relations. ■
direct sum defined
4.1.2. Definition. Let U and W be subspaces of a vector space V such that V =
U + W and the only vector common to both U and W is the zero vector. Then we
say that V is the direct sum of U and W and write V = U ⊕ W.

uniqueness of direct sum
4.1.3. Proposition. Let U and W be subspaces of V. Then V = U ⊕ W if and only
if any vector in V can be written uniquely as a vector in U plus a vector in W.
Proof. Assume V = U ⊕ W, and let |v⟩ ∈ V be written as a sum of a vector in U
and a vector in W in two different ways:

|v⟩ = |u⟩ + |w⟩ = |u′⟩ + |w′⟩  ⇔  |u⟩ − |u′⟩ = |w′⟩ − |w⟩.

The LHS is in U. Since it is equal to the RHS, which is in W, it must be in
W as well. Therefore, the LHS must equal zero, as must the RHS. Thus, |u⟩ =
|u′⟩, |w′⟩ = |w⟩, and there is only one way that |v⟩ can be written as a sum of a
vector in U and a vector in W.
Conversely, if |a⟩ ∈ U and also |a⟩ ∈ W, then one can write

|a⟩ = |a⟩ + |0⟩   and   |a⟩ = |0⟩ + |a⟩,

where in each decomposition the first term is in U and the second in W.
Uniqueness of the decomposition of |a⟩ implies that |a⟩ = |0⟩. Therefore, the
only vector common to both U and W is the zero vector. This implies that V =
U ⊕ W. □
dimensions in a direct sum
4.1.4. Proposition. If V = U ⊕ W, then dim V = dim U + dim W.

Proof. Let {|u_i⟩}_{i=1}^m be a basis for U and {|w_i⟩}_{i=1}^k a basis for W. Then it is easily
verified that {|u_1⟩, |u_2⟩, ..., |u_m⟩, |w_1⟩, |w_2⟩, ..., |w_k⟩} is a basis for V. The
details are left as an exercise. □
We can generalize the notion of the direct sum to more than two subspaces. For
example, we can write ℝ³ = 𝒳 ⊕ 𝒴 ⊕ 𝒵, where 𝒳, 𝒴, and 𝒵 are the one-dimensional
subspaces corresponding to the three axes. Now assume that

V = U_1 ⊕ U_2 ⊕ ··· ⊕ U_r,    (4.1)

i.e., V is the direct sum of r of its subspaces that have no common vectors among
themselves except the zero vector and have the property that any vector in V can be
written (uniquely) as a sum of vectors, one from each subspace. Define the linear
operator P_j by P_j|u⟩ = |u_j⟩, where |u⟩ = Σ_{j=1}^r |u_j⟩, |u_j⟩ ∈ U_j. Then it is readily
verified that P_j² = P_j and P_jP_k = 0 for j ≠ k. Thus, the P_j's are (not necessarily
hermitian) projection operators. Furthermore, for an arbitrary vector |u⟩, we have

|u⟩ = Σ_{j=1}^r P_j|u⟩   ∀ |u⟩ ∈ V.

Since this is true for all vectors, we have the identity

1 = Σ_{j=1}^r P_j.    (4.2)
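The projector identities above can be sketched numerically; the following example (not from the text) uses the direct sum ℝ³ = 𝒳 ⊕ 𝒴 ⊕ 𝒵 of the coordinate axes, for which the P_j happen to be hermitian:

```python
import numpy as np

# Projectors onto the three coordinate axes of R^3.
P = [np.zeros((3, 3)) for _ in range(3)]
for j in range(3):
    P[j][j, j] = 1.0

assert np.allclose(sum(P), np.eye(3))               # Eq. (4.2): sum P_j = 1
assert all(np.allclose(Pj @ Pj, Pj) for Pj in P)    # P_j^2 = P_j
assert np.allclose(P[0] @ P[1], np.zeros((3, 3)))   # P_j P_k = 0 for j != k
```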
orthogonal complement of a subspace
4.1.5. Definition. Let V be an inner product space. Let M be any subspace of V.
Denote by M⊥ the set of all vectors in V orthogonal to all the vectors in M. M⊥
(pronounced "em perp") is called the orthogonal complement of M.
4.1.6. Proposition. M⊥ is a subspace of V.

Proof. In fact, if |a⟩, |b⟩ ∈ M⊥, then for any vector |c⟩ ∈ M, we have

⟨c|(α|a⟩ + β|b⟩) = α⟨c|a⟩ + β⟨c|b⟩ = 0,

because ⟨c|a⟩ = 0 = ⟨c|b⟩. So α|a⟩ + β|b⟩ ∈ M⊥ for arbitrary α, β ∈ ℂ and |a⟩, |b⟩ ∈ M⊥. □
If V of Equation (4.1) is an inner product space, and the subspaces are mutually
orthogonal, then for arbitrary |u⟩, |v⟩ ∈ V,

⟨u|P_j v⟩ = ⟨u_j|v_j⟩ = ⟨P_j u|v⟩,

which shows that P_j is hermitian. In Chapter 2, we assumed the projection operators
to be hermitian. Now we see that only in an inner product space (and only if
the subspaces of a direct sum are orthogonal) do we recover the hermiticity of
projection operators.
4.1.7. Example. Consider an orthonormal basis B_M = {|e_i⟩}_{i=1}^m for M, and extend it
to a basis B = {|e_i⟩}_{i=1}^N for V. Now construct a (hermitian) projection operator P =
Σ_{i=1}^m |e_i⟩⟨e_i|. This is the operator that projects an arbitrary vector in V onto the subspace
M. It is straightforward to show that 1 − P is the projection operator that projects onto M⊥
(see Problem 4.2).
An arbitrary vector |a⟩ ∈ V can be written as

|a⟩ = (P + 1 − P)|a⟩ = P|a⟩ + (1 − P)|a⟩,

where the first term is in M and the second in M⊥.
Furthermore, the only vector that can be in both M and M⊥ is the zero vector, because it is
the only vector orthogonal to itself. ■
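A minimal numerical sketch of this example (not from the text; M is taken to be the span of the first standard basis vector of ℝ³, and the vector |a⟩ is arbitrary):

```python
import numpy as np

# P = |e1><e1| projects onto M; 1 - P projects onto M-perp.
e1 = np.array([[1.0], [0.0], [0.0]])
P = e1 @ e1.conj().T
Q = np.eye(3) - P

a = np.array([[2.0], [3.0], [-1.0]])   # an arbitrary vector |a>

assert np.allclose(P + Q, np.eye(3))   # |a> = P|a> + (1 - P)|a>
assert np.allclose(P @ Q, 0)           # the two projectors annihilate each other
assert np.isclose(((P @ a).T @ (Q @ a)).item(), 0)  # pieces are orthogonal
```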
From this example and the remarks immediately preceding it, we may conclude
the following:

4.1.8. Proposition. If V is an inner product space, then V = M ⊕ M⊥ for any
subspace M. Furthermore, the projection operators corresponding to M and M⊥
are hermitian.
4.2 Invariant Subspaces
invariant subspace; reduction of an operator
matrix representation of an operator in a subspace
block diagonal matrix defined
This section explores the possibility of obtaining subspaces by means of the action
of a linear operator on vectors of an N-dimensional vector space V. Let |a⟩ be any
vector in V, and A a linear operator on V. The vectors

|a⟩, A|a⟩, A²|a⟩, ..., A^N|a⟩

are linearly dependent (there are N + 1 of them!). Let M ≡ Span{A^k|a⟩}_{k=0}^N. It
follows that m ≡ dim M ≤ dim V, and M has the property that for any vector
|x⟩ ∈ M the vector A|x⟩ also belongs to M (show this!). In other words, no vector
in M "leaves" the subspace when acted on by A.
4.2.1. Definition. A subspace M is an invariant subspace of the operator A if A
transforms vectors of M into vectors of M. This is written succinctly as A(M) ⊂ M.
We say that M reduces A if both M and M⊥ are invariant subspaces of A.
Starting with a basis of M, we can extend it to a basis B = {|a_i⟩}_{i=1}^N of V
whose first m vectors span M. The matrix representation of A in such a basis is
given by the relation A|a_i⟩ = Σ_{j=1}^N α_{ji}|a_j⟩, i = 1, 2, ..., N. If i ≤ m, then
α_{ji} = 0 for j > m, because A|a_i⟩ belongs to M when i ≤ m and therefore can
be written as a linear combination of only {|a_1⟩, |a_2⟩, ..., |a_m⟩}. Thus, the matrix
representation of A in B will have the form

A = ( A_11  A_12 )
    ( 0_21  A_22 ),

where A_11 is an m × m matrix, A_12 an m × (N − m) matrix, 0_21 the (N − m) × m
zero matrix, and A_22 an (N − m) × (N − m) matrix. We say that A_11 represents
the operator A in the m-dimensional subspace M.
It may also happen that the subspace spanned by the remaining basis vectors
in B, namely |a_{m+1}⟩, |a_{m+2}⟩, ..., |a_N⟩, is also an invariant subspace of A. Then
A_12 will be zero, and A will take a block diagonal form:¹

A = ( A_11   0   )
    (  0    A_22 ).

reducible and irreducible matrices
A matrix representation of an operator that can be brought into this form by a
suitable choice of basis is said to be reducible; otherwise, it is called irreducible.
¹From now on, we shall denote all zero matrices by the same symbol regardless of their dimensionality.
A reducible matrix A is denoted in two different ways:²

A = ( A_1   0  )  = A_1 ⊕ A_2.    (4.3)
    (  0   A_2 )
condition for invariance
For example, when M reduces A and one chooses a basis whose first m vectors
are in M and the remaining ones in M⊥, then A is reducible. We have
seen on a number of occasions the significance of the hermitian conjugate of an
operator (e.g., in relation to hermitian and unitary operators). The importance of
this operator will be borne out further when we study the spectral theorem later in
this chapter. Let us now investigate some properties of the adjoint of an operator
in the context of invariant subspaces.
4.2.2. Lemma. A subspace M of an inner product space V is invariant under the
linear operator A if and only if M⊥ is invariant under A†.

Proof. The proof is left as a problem. □
An immediate consequence of the above lemma and the two identities (A†)† =
A and (M⊥)⊥ = M is contained in the following theorem.

4.2.3. Theorem. A subspace of V reduces A if and only if it is invariant under
both A and A†.
4.2.4. Lemma. Let M be a subspace of V and P the hermitian projection operator
onto M. Then M is invariant under the linear operator A if and only if AP = PAP.

Proof. Suppose M is invariant. Then for any |x⟩ in V, we have

P|x⟩ ∈ M  ⇒  AP|x⟩ ∈ M  ⇒  PAP|x⟩ = AP|x⟩.

Since the last equality holds for arbitrary |x⟩, we have AP = PAP.
Conversely, suppose AP = PAP. For any |y⟩ ∈ M, we have

P|y⟩ = |y⟩  ⇒  AP|y⟩ = A|y⟩ = PAP|y⟩ = P(AP|y⟩) ∈ M.

Therefore, M is invariant under A. □
4.2.5. Theorem. Let M be a subspace of V, P the hermitian projection operator
of V onto M, and A a linear operator on V. Then M reduces A if and only if A and
P commute.
²It is common to use a single subscript for submatrices of a block diagonal matrix, just as it is common to use a single subscript
for entries of a diagonal matrix.
Proof. Suppose M reduces A. Then by Theorem 4.2.3, M is invariant under both
A and A†. Lemma 4.2.4 then implies

AP = PAP   and   A†P = PA†P.    (4.4)

Taking the adjoint of the second equation yields (A†P)† = (PA†P)†, or PA = PAP.
This equation together with the first equation of (4.4) yields PA = AP.
Conversely, suppose that PA = AP. Then P²A = PAP, whence PA = PAP.
Taking adjoints gives A†P = PA†P, because P is hermitian. By Lemma 4.2.4, M
is invariant under A†. Similarly, from PA = AP, we get PAP = AP², whence
PAP = AP. Once again by Lemma 4.2.4, M is invariant under A. By Theorem
4.2.3, M reduces A. □
The main goal of the remaining part of this chapter is to prove that certain
operators, e.g. hermitian operators, are diagonalizable, that is, that we can always
find an (orthonormal) basis in which they are represented by a diagonal matrix.
4.3 Eigenvalues and Eigenvectors
eigenvalue and
eigenvector
Let us begin by considering eigenvalues and eigenvectors, which are generaliza-
tions of familiar concepts in two and three dimensions. Consider the operation of
rotation about the z-axis by an angle θ, denoted by R_z(θ). Such a rotation takes any
vector (x, y) in the xy-plane to a new vector (x cos θ − y sin θ, x sin θ + y cos θ).
Thus, unless (x, y) = (0, 0) or θ is an integer multiple of 2π, the vector will
change. Is there a nonzero vector that is so special (eigen, in German) that it does
not change when acted on by R_z(θ)? As long as we confine ourselves to two di-
mensions, the answer is no. But if we lift ourselves up from the two-dimensional
xy-plane, we encounter many such vectors, all of which lie along the z-axis.
The foregoing example can be generalized to any rotation (normally specified
by Euler angles). In fact, the methods developed in this section can be used to
show that a general rotation, given by Euler angles, always has an unchanged
vector lying along the axis around which the rotation takes place. This concept is
further generalized in the following definition.
4.3.1. Definition. A scalar λ is an eigenvalue and a nonzero vector |a⟩ is an
eigenvector of the linear transformation A ∈ 𝓛(V) if

A|a⟩ = λ|a⟩.    (4.5)
4.3.2. Proposition. Add the zero vector to the set of all eigenvectors of A belonging
to the same eigenvalue λ, and denote the span of the resulting set by M_λ. Then
M_λ is a subspace of V, and every (nonzero) vector in M_λ is an eigenvector of A
with eigenvalue λ.
Proof. The proof follows immediately from the above definition and the definition
of a subspace. □
eigenspace; spectrum
4.3.3. Definition. The subspace M_λ is referred to as the eigenspace of A corre-
sponding to the eigenvalue λ. Its dimension is called the geometric multiplicity
of λ. An eigenvalue is called simple if its geometric multiplicity is 1. The set of
eigenvalues of A is called the spectrum of A.
By their very construction, eigenspaces corresponding to different eigenvalues
have no vectors in common except the zero vector. This can be demonstrated by
noting that if |v⟩ ∈ M_λ ∩ M_μ for λ ≠ μ, then

0 = (A − λ1)|v⟩ = A|v⟩ − λ|v⟩ = μ|v⟩ − λ|v⟩ = (μ − λ)|v⟩  ⇒  |v⟩ = 0,

because μ − λ ≠ 0.
Let us rewrite Equation (4.5) as (A − λ1)|a⟩ = 0. This equation says that |a⟩
is an eigenvector of A if and only if |a⟩ belongs to the kernel of A − λ1. If the
latter is invertible, then its kernel will consist of only the zero vector, which is
not acceptable as a solution of Equation (4.5). Thus, if we are to obtain nontrivial
solutions, A − λ1 must have no inverse. This is true if and only if

det(A − λ1) = 0.    (4.6)
characteristic polynomial and characteristic roots of an operator
The determinant in Equation (4.6) is a polynomial in λ, called the characteris-
tic polynomial of A. The roots of this polynomial are called characteristic roots
and are simply the eigenvalues of A. Now, any polynomial of degree greater than
or equal to 1 has at least one (complex) root, which yields the following theorem.
4.3.4. Theorem. Every operator on a finite-dimensional vector space over ℂ has
at least one eigenvalue and therefore at least one eigenvector.
Let λ_1, λ_2, ..., λ_p be the distinct roots of the characteristic polynomial of A,
and let λ_j occur m_j times. Then³

det(A − λ1) = (λ_1 − λ)^{m_1} ··· (λ_p − λ)^{m_p} = ∏_{j=1}^p (λ_j − λ)^{m_j}.    (4.7)

For λ = 0, this gives

det A = λ_1^{m_1} λ_2^{m_2} ··· λ_p^{m_p} = ∏_{j=1}^p λ_j^{m_j}.    (4.8)
Equation (4.8) states that the determinant of an operator is the product of all
its eigenvalues. In particular, if one of the eigenvalues is zero, then the operator is
not invertible.
³m_j is called the algebraic multiplicity of λ_j.
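Equation (4.8) is easy to confirm numerically; the following sketch (not from the text; the matrix A is an arbitrary example) compares the product of the eigenvalues with the determinant:

```python
import numpy as np

# Check Eq. (4.8): det A = product of eigenvalues (with multiplicity).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 1.0]])   # arbitrary example; det A = 7

lam = np.linalg.eigvals(A)
assert np.isclose(np.prod(lam).real, np.linalg.det(A))
```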
4.3.5. Example. Let us find the eigenvalues of a projection operator P. If |a⟩ is an eigen-
vector, then P|a⟩ = λ|a⟩. Applying P on both sides again, we obtain

P²|a⟩ = λP|a⟩ = λ(λ|a⟩) = λ²|a⟩.

But P² = P; thus, P|a⟩ = λ²|a⟩. It follows that λ²|a⟩ = λ|a⟩, or (λ² − λ)|a⟩ = 0. Since
|a⟩ ≠ 0, we must have λ(λ − 1) = 0, or λ = 0, 1. Thus, the only eigenvalues of a projection
operator are 0 and 1. The presence of zero as an eigenvalue of P is an indication that P is
not invertible. ■
4.3.6. Example. To be able to see the difference between algebraic and geometric multi-
plicities, consider the matrix A = (1 1; 0 1), whose characteristic polynomial is (1 − λ)². Thus,
the matrix has only one eigenvalue, λ = 1, with algebraic multiplicity m_1 = 2. However,
the most general vector |a⟩ satisfying (A − 1)|a⟩ = 0 is easily shown to be of the form (α; 0).
This shows that M_{λ=1} is one-dimensional, i.e., the geometric multiplicity of λ is 1. ■
diagonalizable operators
As mentioned at the beginning of this chapter, it is useful to represent an
operator by as simple a matrix as possible. The simplest matrix is a diagonal
matrix. This motivates the following definition:
4.3.7. Definition. A linear operator A on a vector space V is said to be diagonal-
izable if there is a basis for V all of whose vectors are eigenvectors of A.
4.3.8. Theorem. Let A be a diagonalizable operator on a vector space V with
distinct eigenvalues {λ_j}_{j=1}^r. There are (not necessarily hermitian) projection op-
erators P_j on V such that

(1) 1 = Σ_{j=1}^r P_j,   (2) P_iP_j = 0 for i ≠ j,   (3) A = Σ_{j=1}^r λ_jP_j.
Proof. Let M_j denote the eigenspace corresponding to the eigenvalue λ_j. Since
the eigenvectors span V and the only common vector of two different eigenspaces
is the zero vector (see the comments after Definition 4.3.3), we have

V = M_1 ⊕ M_2 ⊕ ··· ⊕ M_r.

This immediately gives (1) and (2) if we use Equations (4.1) and (4.2), where P_j
is the projection operator onto M_j.
To prove (3), let |v⟩ be an arbitrary vector in V. Then |v⟩ can be written uniquely
as a sum of vectors, each coming from one eigenspace. Therefore,

A|v⟩ = Σ_{j=1}^r A|v_j⟩ = Σ_{j=1}^r λ_j|v_j⟩ = (Σ_{j=1}^r λ_jP_j)|v⟩.

Since this equality holds for all vectors |v⟩, (3) follows. □
4.4 Spectral Decomposition
This section derives one of the most powerful theorems in the theory of linear
operators, the spectral decomposition theorem. We shall derive the theorem for
operators that generalize hermitian and unitary operators.
normal operator defined
4.4.1. Definition. A normal operator is an operator on an inner product space
that commutes with its adjoint.

An important consequence of this definition is that

‖A|x⟩‖ = ‖A†|x⟩‖  ∀ |x⟩ ∈ V   if and only if A is normal.    (4.9)
4.4.2. Proposition. Let A be a normal operator on V. Then |x⟩ is an eigenvector
of A with eigenvalue λ if and only if |x⟩ is an eigenvector of A† with eigenvalue λ*.

Proof. By Equation (4.9), the fact that (A − λ1)† = A† − λ*1, and the fact that A − λ1
is normal (reader, verify), we have ‖(A − λ1)x‖ = 0 if and only if ‖(A† − λ*1)x‖ =
0. Since it is only the zero vector that has zero norm, we get

(A − λ1)|x⟩ = 0   if and only if   (A† − λ*1)|x⟩ = 0.

This proves the proposition. □
We obtain a useful consequence of this proposition by applying it to a hermitian
operator H and a unitary operator⁴ U. In the first case, we get

λ|x⟩ = H|x⟩ = H†|x⟩ = λ*|x⟩  ⇒  λ = λ*.

Therefore, λ is real. In the second case, we write

|x⟩ = 1|x⟩ = U†U|x⟩ = U†(λ|x⟩) = λU†|x⟩ = λλ*|x⟩  ⇒  λλ* = 1.

Therefore, λ is unimodular (has absolute value equal to 1). We summarize the
foregoing discussion:
4.4.3. Corollary. The eigenvalues of a hermitian operator are real. The eigenval-
ues of a unitary operator have unit absolute value.
4.4.4. Example. Let us find the eigenvalues and eigenvectors of the hermitian matrix
H = (0 −i; i 0). We have

det(H − λ1) = det( −λ  −i ) = λ² − 1 = 0  ⇒  λ = ±1.
                  (  i  −λ )

Thus the eigenvalues, λ_1 = 1 and λ_2 = −1, are real, as expected.

⁴Obviously, both are normal operators.
To find the eigenvectors, we write

(H − 1)|a_1⟩ = ( −1  −i ) ( α_1 ) = ( −α_1 − iα_2 ) = 0,
               (  i  −1 ) ( α_2 )   (  iα_1 − α_2 )

or α_2 = iα_1, which gives |a_1⟩ = (α_1; iα_1) = α_1(1; i), where α_1 is an arbitrary complex
number. Also,

(H + 1)|a_2⟩ = (  1  −i ) ( β_1 ) = ( β_1 − iβ_2 ) = 0,
               (  i   1 ) ( β_2 )   ( iβ_1 + β_2 )

or β_2 = −iβ_1, which gives |a_2⟩ = (β_1; −iβ_1) = β_1(1; −i), where β_1 is an arbitrary complex
number.

Always normalize the eigenvectors!

It is desirable, in most situations, to orthonormalize the eigenvectors. In the present case,
they are already orthogonal. This is a property shared by all eigenvectors of a hermitian (in
fact, normal) operator, stated in the next theorem. We therefore need merely to normalize
the eigenvectors:

1 = ⟨a_1|a_1⟩ = |α_1|²(1 + 1) = 2|α_1|²,

or |α_1| = 1/√2 and α_1 = e^{iφ}/√2 for some φ ∈ ℝ. A similar result is obtained for β_1. The
choice φ = 0 yields

|e_1⟩ = (1/√2)( 1 )   and   |e_2⟩ = (1/√2)(  1 ).    ■
              ( i )                        ( −i )
The following theorem proves for all normal operators the orthogonality prop-
erty of their eigenvectors, illustrated in the example above for a simple hermitian
operator.
4.4.5. Theorem. An eigenspace of a normal operator reduces that operator. More-
over, eigenspaces of a normal operator are mutually orthogonal.

Proof. The first part of the theorem is a trivial consequence of Proposition 4.4.2
and Theorem 4.2.3. To prove the second part, let |u⟩ ∈ M_λ and |v⟩ ∈ M_μ with
λ ≠ μ. Then, using Proposition 4.4.2 once more, we obtain

λ⟨v|u⟩ = ⟨v|Au⟩ = ⟨A†v|u⟩ = ⟨μ*v|u⟩ = μ⟨v|u⟩.

It follows that (λ − μ)⟨v|u⟩ = 0, and since λ ≠ μ, ⟨v|u⟩ = 0. □
spectral decomposition theorem
4.4.6. Theorem. (Spectral Decomposition Theorem) Let A be a normal operator
on a finite-dimensional complex inner product space V. Let λ_1, λ_2, ..., λ_r be its
distinct eigenvalues. Then there exist nonzero (hermitian) projection operators
P_1, P_2, ..., P_r such that

1. Σ_{i=1}^r P_i = 1,
2. P_iP_j = 0 for all i ≠ j,
3. Σ_{i=1}^r λ_iP_i = A.
Proof. Let P_i be the operator that projects onto the eigenspace M_i corresponding
to eigenvalue λ_i. By the comments after Proposition 4.1.6, these projection operators
are hermitian. Because of Theorem 4.4.5, the only vector common to any two
distinct eigenspaces is the zero vector. So, it makes sense to talk about the direct
sum of these eigenspaces. Let M = M_1 ⊕ M_2 ⊕ ··· ⊕ M_r and P = Σ_{i=1}^r P_i, where
P is the orthogonal projection operator onto M. Since A commutes with every P_i
(Theorem 4.2.5), it commutes with P. Hence, by Theorem 4.2.5, M reduces A, i.e.,
M⊥ is also invariant under A. Now regard the restriction of A to M⊥ as an operator
in its own right on the finite-dimensional vector space M⊥. Theorem 4.3.4 now
forces A to have at least one eigenvector in M⊥. But this is impossible, because all
eigenvectors of A have been accounted for in its eigenspaces. The only resolution
is for M⊥ to be zero. This gives

M = V   and   P = Σ_{i=1}^r P_i = 1.

The second equation follows from the first and Equations (4.1) and (4.2). The
remaining part of the theorem follows from arguments similar to those used in the
proof of Theorem 4.3.8. □
We can now establish the connection between the diagonalizability of a normal
operator and the spectral theorem. In each subspace M_j, we choose an orthonormal
basis. The union of all these bases is clearly a basis for the whole space V. Let
us label these basis vectors |e_j^i⟩, where the subscript indicates the subspace and
the superscript the particular vector in that subspace. Clearly, ⟨e_j^i|e_k^l⟩ =
δ_{jk}δ^{il}, and P_j = Σ_{i=1}^{m_j} |e_j^i⟩⟨e_j^i|. Noting that P_k|e_j^i⟩ = δ_{kj}|e_j^i⟩, we can obtain
the matrix elements of A in such a basis:

⟨e_j^i|A|e_k^l⟩ = ⟨e_j^i|(Σ_{n=1}^r λ_nP_n)|e_k^l⟩ = λ_k δ_{jk}δ^{il}.

Only the diagonal elements are nonzero. We note that for each subscript j we have
m_j orthonormal vectors |e_j^i⟩, where m_j is the dimension of M_j. Thus, λ_j occurs
m_j times as a diagonal element. Therefore, in such an orthonormal basis, A will
be represented by

diag(λ_1, ..., λ_1, λ_2, ..., λ_2, ..., λ_r, ..., λ_r),

where λ_j appears m_j times.
Let us summarize the foregoing discussion:
4.4.7. Corollary. If A ∈ 𝓛(V) is normal, then V has an orthonormal basis consist-
ing of eigenvectors of A. Therefore, a normal operator on a complex inner product
space is diagonalizable.
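A minimal numerical sketch of this corollary (not from the text), using a normal but non-hermitian matrix, a plane rotation, whose eigenvalues are complex:

```python
import numpy as np

# A 2D rotation is unitary, hence normal, with eigenvalues e^{+/- i theta}.
theta = 0.3
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.allclose(A @ A.conj().T, A.conj().T @ A)   # A is normal

lam, V = np.linalg.eig(A)   # eig returns unit-norm eigenvector columns
# For a normal matrix with distinct eigenvalues the columns are orthonormal,
# so V is unitary and V† A V is diagonal.
assert np.allclose(V.conj().T @ V, np.eye(2))
assert np.allclose(V.conj().T @ A @ V, np.diag(lam))
```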
computation of the largest and the smallest eigenvalues of a normal operator
Using this corollary, the reader may show the following:
4.4.8. Corollary. A hermitian operator is positive if and only if all its eigenvalues
are positive.
4.4.9. Example. COMPUTATION OF LARGEST AND SMALLEST EIGENVALUES
There is an elegant technique that yields the largest and the smallest (in absolute value)
eigenvalues of a normal operator A in a straightforward way if the eigenspaces of these
eigenvalues are one-dimensional. For convenience, assume that the eigenvalues are labeled
in order of decreasing absolute value:

|λ_1| > |λ_2| > ··· > |λ_r| ≠ 0.

Let {|a_k⟩}_{k=1}^N be a basis of V consisting of eigenvectors of A, and |x⟩ = Σ_{k=1}^N ξ_k|a_k⟩ an
arbitrary vector in V. Then

A^m|x⟩ = Σ_{k=1}^N ξ_kA^m|a_k⟩ = Σ_{k=1}^N ξ_kλ_k^m|a_k⟩ = λ_1^m [ ξ_1|a_1⟩ + Σ_{k=2}^N ξ_k (λ_k/λ_1)^m |a_k⟩ ].

In the limit m → ∞, the summation in the brackets vanishes. Therefore,

A^m|x⟩ ≈ λ_1^m ξ_1|a_1⟩   and   ⟨y|A^m|x⟩ ≈ λ_1^m ξ_1⟨y|a_1⟩
A hermitian matrix can be diagonalized by a unitary matrix.
for any |y⟩ ∈ V. Taking the ratio of this equation and the corresponding one for m + 1, we
obtain

lim_{m→∞} ⟨y|A^{m+1}|x⟩ / ⟨y|A^m|x⟩ = λ_1.
Note how crucially this relation depends on the fact that λ_1 is nondegenerate, i.e., that M_1
is one-dimensional. By taking larger and larger values for m, we can obtain a better and
better approximation to the largest eigenvalue.
Assuming that zero is not the smallest eigenvalue λ_r, and therefore not an eigenvalue,
of A, we can find the smallest eigenvalue by replacing A with A⁻¹ and λ_1 with 1/λ_r. The
details are left as an exercise for the reader. ■
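The technique of Example 4.4.9 (the power method) can be sketched in a few lines; this illustration is not from the text, and the matrix A and the vectors x, y are arbitrary examples:

```python
import numpy as np

# Power-method estimate of the largest eigenvalue, as in Example 4.4.9.
A = np.array([[4.0, 1.0], [2.0, 3.0]])   # eigenvalues 5 and 2
x = np.array([1.0, 0.0])                  # arbitrary starting vector
y = np.array([1.0, 0.0])                  # arbitrary fixed vector

m = 30
Amx = np.linalg.matrix_power(A, m) @ x
Am1x = np.linalg.matrix_power(A, m + 1) @ x
ratio = (y @ Am1x) / (y @ Amx)   # <y|A^{m+1}|x> / <y|A^m|x>

assert abs(ratio - 5.0) < 1e-6   # converges to the largest eigenvalue
```

The error shrinks like (λ_2/λ_1)^m = (2/5)^m, so m = 30 is already far more than enough here.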
Any given hermitian matrix H can be thought of as the representation of a
hermitian operator in the standard orthonormal basis. We can find a unitary matrix
U that transforms the standard basis to the orthonormal basis consisting of |e_j⟩,
the eigenvectors of the hermitian operator. The representation of the hermitian
operator in the new basis is UHU†, as discussed in Section 3.3. However, the above
argument showed that the new matrix is diagonal. We therefore have the following
result.
4.4.10. Corollary. A hermitian matrix can always be brought to diagonal form by means of a unitary transformation matrix.
4.4.11. Example. Let us consider the diagonalization of the hermitian matrix

H = (  0     0    −1+i  −1−i )
    (  0     0    −1+i   1+i )
    ( −1−i  −1−i   0      0  )
    ( −1+i   1−i   0      0  )
The characteristic polynomial is det(H − λ1) = (λ + 2)²(λ − 2)². Thus, λ1 = −2 with multiplicity m1 = 2, and λ2 = 2 with multiplicity m2 = 2. To find the eigenvectors, we first look at the matrix equation (H + 21)|a⟩ = 0, or

(  2     0    −1+i  −1−i ) ( α1 )
(  0     2    −1+i   1+i ) ( α2 )  = 0.
( −1−i  −1−i   2      0  ) ( α3 )
( −1+i   1−i   0      2  ) ( α4 )

This is a system of linear equations whose "solution" is

α3 = ½(1+i)(α1 + α2),   α4 = ½(1−i)(α1 − α2).

We have two arbitrary parameters, so we expect two linearly independent solutions. For the two choices α1 = 2, α2 = 0 and α1 = 0, α2 = 2, we obtain, respectively,

|a1⟩ = (2, 0, 1+i, 1−i)   and   |a2⟩ = (0, 2, 1+i, −1+i),

which happen to be orthogonal. We simply normalize them to obtain

|e1⟩ = (1/(2√2)) (2, 0, 1+i, 1−i)   and   |e2⟩ = (1/(2√2)) (0, 2, 1+i, −1+i).

Similarly, the second eigenvalue equation, (H − 21)|a⟩ = 0, gives rise to the conditions α3 = −½(1+i)(α1 + α2) and α4 = −½(1−i)(α1 − α2), which produce the orthonormal vectors

|e3⟩ = (1/(2√2)) (2, 0, −1−i, −1+i)   and   |e4⟩ = (1/(2√2)) (0, 2, −1−i, 1−i).
The unitary matrix that diagonalizes H can be constructed from these column vectors using the remarks before Example 3.4.4, which imply that if we simply put the vectors |ei⟩ together as columns, the resulting matrix is U†:

U† = (1/(2√2)) ( 2    0     2     0   )
               ( 0    2     0     2   )
               ( 1+i  1+i  −1−i  −1−i )
               ( 1−i  −1+i  −1+i  1−i )
and the unitary matrix will be

U = (U†)† = (1/(2√2)) ( 2   0   1−i   1+i )
                      ( 0   2   1−i  −1−i )
                      ( 2   0  −1+i  −1−i )
                      ( 0   2  −1+i   1+i )

We can easily check that U diagonalizes H, i.e., that UHU† is diagonal. ■
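The check suggested at the end of the example is quick to carry out numerically; in the sketch below the matrices are transcribed from the example, and UHU† should come out as diag(−2, −2, 2, 2).

```python
# Numerical check of Example 4.4.11: U H U-dagger should be diagonal with
# entries -2, -2, 2, 2 (matrices transcribed from the example).

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(A):
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

H = [[0, 0, -1 + 1j, -1 - 1j],
     [0, 0, -1 + 1j,  1 + 1j],
     [-1 - 1j, -1 - 1j, 0, 0],
     [-1 + 1j,  1 - 1j, 0, 0]]

s = 1 / (2 * 2 ** 0.5)
U = [[s * z for z in row] for row in
     [[2, 0, 1 - 1j,  1 + 1j],
      [0, 2, 1 - 1j, -1 - 1j],
      [2, 0, -1 + 1j, -1 - 1j],
      [0, 2, -1 + 1j,  1 + 1j]]]

D = matmul(matmul(U, H), dagger(U))
diagonal = [D[i][i].real for i in range(4)]
print(diagonal)  # -> approximately [-2.0, -2.0, 2.0, 2.0]
```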
application of diagonalization in electromagnetism
4.4.12. Example. In some physical applications the ability to diagonalize matrices can be very useful. As a simple but illustrative example, let us consider the motion of a charged particle in a constant magnetic field pointing in the z direction. The equation of motion for such a particle is

m dv/dt = qv × B = q det ( êx  êy  êz )
                          ( vx  vy  vz )
                          ( 0   0   B  )

which in component form becomes

dvx/dt = (qB/m) vy,   dvy/dt = −(qB/m) vx,   dvz/dt = 0.

Ignoring the uniform motion in the z direction, we need to solve the first two coupled equations, which in matrix form become

d/dt ( vx ) = (qB/m) (  0  1 ) ( vx ) = −iω (  0  i ) ( vx ),    (4.10)
     ( vy )          ( −1  0 ) ( vy )       ( −i  0 ) ( vy )
where we have introduced a factor of i to render the matrix hermitian, and defined ω = qB/m. If the 2 × 2 matrix were diagonal, we would get two uncoupled equations, which we could solve easily. Diagonalizing the matrix involves finding a matrix R such that

R (  0  i ) R† = ( λ'1  0  ).
  ( −i  0 )      ( 0   λ'2 )

If we could do such a diagonalization, we would multiply (4.10) by R to get⁵

d/dt [ R ( vx ) ] = −iω [ R (  0  i ) R† ] R ( vx ),
         ( vy )             ( −i  0 )        ( vy )

which can be written as

d/dt ( v'x ) = −iω ( λ'1  0  ) ( v'x ),
     ( v'y )       ( 0   λ'2 ) ( v'y )

where

( v'x ) ≡ R ( vx ).
( v'y )     ( vy )
⁵The fact that R is independent of t is crucial in this step. This fact, in turn, is a consequence of the independence from t of the original 2 × 2 matrix.
We then would have a pair of uncoupled equations

dv'x/dt = −iωλ'1 v'x,   dv'y/dt = −iωλ'2 v'y,

that have v'x = v'0x e^{−iωλ'1 t} and v'y = v'0y e^{−iωλ'2 t} as a solution set, in which v'0x and v'0y are integration constants.

To find R, we need the normalized eigenvectors of the matrix (0 i; −i 0). But these are obtained in precisely the same fashion as in Example 4.4.4. There is, however, an arbitrariness in the solutions due to the choice in numbering the eigenvalues. If we choose the normalized eigenvectors

|e1⟩ = (1/√2) ( i ),   |e2⟩ = (1/√2) ( −i ),
              ( 1 )                  (  1 )

then from comments at the end of Section 3.3, we get

R⁻¹ = R† = (1/√2) ( i  −i ).
                  ( 1   1 )

With this choice of R, we have

R = (R†)† = (1/√2) ( −i  1 ),
                   (  i  1 )

so that λ'1 = 1 = −λ'2. Having found R†, we can write

( vx ) = R† ( v'x ) = (1/√2) ( i  −i ) ( v'0x e^{−iωt} ).    (4.11)
( vy )      ( v'y )          ( 1   1 ) ( v'0y e^{iωt}  )

If the x and y components of velocity at t = 0 are v0x and v0y, respectively, then

( v0x ) = R† ( v'0x ),   or   ( v'0x ) = R ( v0x ) = (1/√2) ( −i v0x + v0y ).
( v0y )      ( v'0y )         ( v'0y )     ( v0y )          (  i v0x + v0y )

Substituting in (4.11), we obtain

( vx ) = ½ ( i  −i ) ( (−i v0x + v0y) e^{−iωt} ) = (  v0x cos ωt + v0y sin ωt ).
( vy )     ( 1   1 ) ( ( i v0x + v0y) e^{iωt}  )   ( −v0x sin ωt + v0y cos ωt )
simultaneous diagonalization defined
This gives the velocity as a function of time. Antidifferentiating once with respect to time yields the position vector. ■
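The diagonalization recipe of this example can be checked numerically. In the sketch below (the parameter values are arbitrary choices, not from the text), the velocity is propagated in the primed basis and transformed back; it reproduces the closed form found above.

```python
import cmath
import math

# Numerical check of Example 4.4.12: propagate the velocity in the
# eigenbasis of (0 i; -i 0) and compare with the closed form
# vx = v0x cos(wt) + v0y sin(wt).  Parameter values are arbitrary.
w, t = 1.3, 0.7
v0x, v0y = 2.0, -1.0
s = 1 / math.sqrt(2)

# v'(0) = R v(0), with R = (1/sqrt 2) [[-i, 1], [i, 1]]
vpx0 = s * (-1j * v0x + v0y)
vpy0 = s * ( 1j * v0x + v0y)

# uncoupled evolution: v'x(t) = v'x(0) e^{-iwt}, v'y(t) = v'y(0) e^{+iwt}
vpx = vpx0 * cmath.exp(-1j * w * t)
vpy = vpy0 * cmath.exp( 1j * w * t)

# transform back: v = R-dagger v', with R-dagger = (1/sqrt 2) [[i, -i], [1, 1]]
vx = s * (1j * vpx - 1j * vpy)
vy = s * (vpx + vpy)

print(vx.real, vy.real)
```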
In many situations of physical interest, it is desirable to know whether two
operators are simultaneously diagonalizable. For instance, if there exists a basis
of a Hilbert space of a quantum-mechanical system consisting of simultaneous
eigenvectors of two operators, then one can measure those two operators at the
same time. In particular, they are not restricted by an uncertainty relation.
4.4.13. Definition. Two operators are said to be simultaneously diagonalizable if they can be written in terms of the same set of projection operators, as in Theorem 4.4.6.
This definition is consistent with the matrix representation of the two operators, because if we take the orthonormal basis B = {|ej⟩} discussed right after Theorem 4.4.6, we obtain diagonal matrices for both operators. What are the conditions under which two operators can be simultaneously diagonalized? Clearly, a necessary condition is that the two operators commute. This is an immediate consequence of the orthogonality of the projection operators, which trivially implies PiPj = PjPi for all i and j. It is also apparent in the matrix representation of the operators: Any two diagonal matrices commute. What about sufficiency? Is the commutativity of the two operators sufficient for them to be simultaneously diagonalizable? To answer this question, we need the following lemma:
4.4.14. Lemma. An operator T commutes with a normal operator A if and only if T commutes with all the projection operators of A.

Proof. The "if" part is trivial. To prove the "only if" part, suppose AT = TA, and let |x⟩ be any vector in one of the eigenspaces of A, say Mj. Then we have A(T|x⟩) = T(A|x⟩) = T(λj|x⟩) = λj(T|x⟩); i.e., T|x⟩ is in Mj, or Mj is invariant under T. Since Mj is arbitrary, T leaves all eigenspaces invariant. In particular, it leaves Mj⊥, the orthogonal complement of Mj (the direct sum of all the remaining eigenspaces), invariant. By Theorems 4.2.3 and 4.2.5, TPj = PjT; and this holds for all j. □
necessary and sufficient condition for simultaneous diagonalizability
4.4.15. Theorem. A necessary and sufficient condition for two normal operators A and B to be simultaneously diagonalizable is [A, B] = 0.

Proof. As claimed above, the "necessity" is trivial. To prove the "sufficiency," let A = Σ_{j=1}^r λj Pj and B = Σ_{k=1}^s μk Qk, where {λj} and {Pj} are eigenvalues and projections of A, and {μk} and {Qk} are those of B. Assume [A, B] = 0. Then by Lemma 4.4.14, AQk = QkA. Since Qk commutes with A, it must commute with the latter's projection operators: PjQk = QkPj. To complete the proof, define Rjk ≡ PjQk, and note that

Rjk† = (PjQk)† = Qk†Pj† = QkPj = PjQk = Rjk,
(Rjk)² = (PjQk)² = PjQkPjQk = PjPjQkQk = PjQk = Rjk.

Therefore, the Rjk are hermitian projection operators. Furthermore,

Σ_{j=1}^r Rjk = Σ_{j=1}^r PjQk = (Σ_{j=1}^r Pj) Qk = Qk,

because Σ_{j=1}^r Pj = 1. Similarly, Σ_{k=1}^s Rjk = Σ_{k=1}^s PjQk = Pj Σ_{k=1}^s Qk = Pj. We can now write A and B as

A = Σ_{j=1}^r λj Pj = Σ_{j=1}^r Σ_{k=1}^s λj Rjk,   B = Σ_{k=1}^s μk Qk = Σ_{k=1}^s Σ_{j=1}^r μk Rjk.
spectral decomposition of a Pauli spin matrix
By definition, they are simultaneously diagonalizable. □
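A minimal numerical sketch of this theorem, with two arbitrarily chosen commuting real symmetric (hence normal) matrices that share the same projections P1 and P2:

```python
# Sketch of Theorem 4.4.15 on a concrete pair: A and B below commute, so
# the same projections P1, P2 appear in both spectral decompositions.
# The matrices are arbitrary illustrative choices, not from the text.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1.0, 1.0], [1.0, 1.0]]    # eigenvalues 2 and 0
B = [[2.0, -1.0], [-1.0, 2.0]]  # eigenvalues 1 and 3

# check [A, B] = 0
AB, BA = matmul(A, B), matmul(B, A)
commutes = all(abs(AB[i][j] - BA[i][j]) < 1e-12
               for i in range(2) for j in range(2))

# common projections onto span{(1,1)} and span{(1,-1)}
P1 = [[0.5, 0.5], [0.5, 0.5]]
P2 = [[0.5, -0.5], [-0.5, 0.5]]

# spectral forms: A = 2 P1 + 0 P2,  B = 1 P1 + 3 P2
A_rec = [[2 * P1[i][j] + 0 * P2[i][j] for j in range(2)] for i in range(2)]
B_rec = [[1 * P1[i][j] + 3 * P2[i][j] for j in range(2)] for i in range(2)]

print(commutes, A_rec == A, B_rec == B)  # -> True True True
```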
4.4.16. Example. Let us find the spectral decomposition of the Pauli spin matrix

σ2 = ( 0  −i ).
     ( i   0 )

The eigenvalues and eigenvectors have been found in Example 4.4.4. These are

λ1 = 1, |e1⟩ = (1/√2) ( 1 )   and   λ2 = −1, |e2⟩ = (1/√2) (  1 ).
                      ( i )                                ( −i )

The subspaces M_{λj} are one-dimensional; therefore,

P1 = |e1⟩⟨e1| = (1/√2) ( 1 ) (1/√2) (1  −i) = ½ ( 1  −i ),
                       ( i )                    ( i   1 )

P2 = |e2⟩⟨e2| = ½ (  1 ) (1  i) = ½ (  1   i ).
                  ( −i )            ( −i   1 )

We can check that P1 + P2 = (1 0; 0 1) and

λ1 P1 + λ2 P2 = ½ ( 1  −i ) − ½ (  1   i ) = ( 0  −i ) = σ2.  ■
                  ( i   1 )     ( −i   1 )   ( i   0 )
What restrictions are to be imposed on the most general operator T to make it diagonalizable? We saw in Chapter 2 that T can be written in terms of its so-called Cartesian components as T = H + iH', where both H and H' are hermitian and can therefore be decomposed according to Theorem 4.4.6. Can we conclude that T is also decomposable? No, because the projection operators used in the decomposition of H may not be the same as those used for H'. However, if H and H' are simultaneously diagonalizable such that

H = Σ_{k=1}^r λk Pk   and   H' = Σ_{k=1}^r λ'k Pk,    (4.12)

then T = Σ_{k=1}^r (λk + iλ'k) Pk. It follows that T has a spectral decomposition, and therefore is diagonalizable. Theorem 4.4.15 now implies that H and H' must commute. Since H = ½(T + T†) and H' = (1/2i)(T − T†), we have [H, H'] = 0 if and only if [T, T†] = 0; i.e., T is normal. We thus get back to the condition with which we started the whole discussion of spectral decomposition in this section.
4.5 Functions of Operators
Functions of transformations were discussed in Chapter 2. With the power of
spectral decomposition at our disposal, we can draw many important conclusions
about them.
First, we note that if T = Σ_{i=1}^r λi Pi, then, because of the orthogonality of the Pi's,

T² = Σ_{i=1}^r λi² Pi,   ...,   Tⁿ = Σ_{i=1}^r λiⁿ Pi.

Thus, any polynomial p in T has a spectral decomposition given by p(T) = Σ_{i=1}^r p(λi) Pi. Generalizing this to functions expandable in power series gives

f(T) = Σ_{i=1}^r f(λi) Pi.    (4.13)
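Equation (4.13) can be illustrated numerically, for example with f = exp on an arbitrary 2 × 2 symmetric matrix (an illustrative choice, not from the text): the spectral form agrees with the power series definition of the exponential.

```python
import math

# Illustration of f(T) = sum_i f(lambda_i) P_i for f = exp, on an
# arbitrary symmetric matrix T = 3*P1 + 1*P2 (not from the text).

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

P1 = [[0.5, 0.5], [0.5, 0.5]]     # projection onto span{(1, 1)}
P2 = [[0.5, -0.5], [-0.5, 0.5]]   # projection onto span{(1, -1)}
T = [[2.0, 1.0], [1.0, 2.0]]      # = 3*P1 + 1*P2

# spectral form of exp(T): e^3 P1 + e^1 P2
spectral = [[math.exp(3) * P1[i][j] + math.exp(1) * P2[i][j]
             for j in range(2)] for i in range(2)]

# power series exp(T) = sum_n T^n / n!
series = [[1.0, 0.0], [0.0, 1.0]]
term = [[1.0, 0.0], [0.0, 1.0]]
for n in range(1, 30):
    term = matmul(term, T)
    term = [[term[i][j] / n for j in range(2)] for i in range(2)]
    series = [[series[i][j] + term[i][j] for j in range(2)] for i in range(2)]

print(spectral[0][0], series[0][0])  # both ~ (e^3 + e)/2 ~ 11.40
```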
4.5.1. Example. Let us investigate the spectral decomposition of the following unitary (actually orthogonal) matrix:

U = ( cos θ  −sin θ ).
    ( sin θ   cos θ )

We find the eigenvalues from

det ( cos θ − λ   −sin θ    ) = λ² − 2λ cos θ + 1 = 0,
    ( sin θ      cos θ − λ )

yielding λ1 = e^{−iθ} and λ2 = e^{iθ}. For λ1 we have (reader, provide the missing steps)

( cos θ − e^{−iθ}   −sin θ          ) ( α1 ) = 0  ⇒  α2 = iα1  ⇒  |e1⟩ = (1/√2) ( 1 ),
( sin θ            cos θ − e^{−iθ} ) ( α2 )                                     ( i )

and for λ2,

( cos θ − e^{iθ}   −sin θ         ) ( α1 ) = 0  ⇒  α2 = −iα1  ⇒  |e2⟩ = (1/√2) (  1 ).
( sin θ           cos θ − e^{iθ} ) ( α2 )                                      ( −i )

We note that the M_{λj} are one-dimensional and spanned by |ej⟩. Thus,

P1 = |e1⟩⟨e1| = ½ ( 1 ) (1  −i) = ½ ( 1  −i ),
                  ( i )             ( i   1 )

P2 = |e2⟩⟨e2| = ½ (  1 ) (1  i) = ½ (  1   i ).
                  ( −i )            ( −i   1 )

Clearly, P1 + P2 = 1, and

e^{−iθ} P1 + e^{iθ} P2 = ½ ( e^{−iθ} + e^{iθ}     −ie^{−iθ} + ie^{iθ} ) = ( cos θ  −sin θ ) = U.  ■
                           ( ie^{−iθ} − ie^{iθ}    e^{−iθ} + e^{iθ}   )   ( sin θ   cos θ )

If we take the natural log of this equation and use Equation (4.13), we obtain

ln U = ln(e^{−iθ}) P1 + ln(e^{iθ}) P2 = −iθ P1 + iθ P2 = i(−θ P1 + θ P2) ≡ iH,    (4.14)

where H ≡ −θP1 + θP2 is a hermitian operator because θ is real and P1 and P2 are hermitian. Inverting Equation (4.14) gives U = e^{iH}, where

H = θ(−P1 + P2) = θ (  0  i ).
                    ( −i  0 )
The square root of an operator is plagued by multivaluedness. In the real numbers, we have only two-valuedness!
The example above shows that the unitary 2 × 2 matrix U can be written as an exponential of an anti-hermitian operator. This is a general result. In fact, we have the following theorem, whose proof is left as an exercise for the reader (see Problem 4.22).

4.5.2. Theorem. A unitary operator U on a finite-dimensional complex inner product space can be written as U = e^{iH} where H is hermitian. Furthermore, a unitary matrix can be brought to diagonal form by a unitary transformation matrix.

The last statement follows from Corollary 4.4.10 and the fact that

W f(H) W† = f(WHW†)

for any unitary W and any function f that can be expanded in a Taylor series.
A useful function of an operator is its square root. A natural way to define the square root of an operator A is √A = Σ_{i=1}^r (±√λi) Pi. This clearly gives many candidates for the root, because each term in the sum can have either the plus sign or the minus sign.

4.5.3. Definition. The positive square root of a positive operator A = Σ_{i=1}^r λi Pi is √A = Σ_{i=1}^r √λi Pi.
The uniqueness of the spectral decomposition implies that the positive square root of a positive operator is unique.
4.5.4. Example. Let us evaluate √A where

A = (  5   3i ).
    ( −3i   5 )

First, we have to spectrally decompose A. Its characteristic equation is

λ² − 10λ + 16 = 0,

with roots λ1 = 8 and λ2 = 2. Since both eigenvalues are positive and A is hermitian, we conclude that A is indeed positive (Corollary 4.4.8). We can also easily find its normalized eigenvectors:

|e1⟩ = (1/√2) ( i )   and   |e2⟩ = (1/√2) ( −i ).
              ( 1 )                       (  1 )

Thus,

P1 = |e1⟩⟨e1| = ½ (  1  i )   and   P2 = |e2⟩⟨e2| = ½ ( 1  −i ),
                  ( −i  1 )                           ( i   1 )

and

√A = √8 P1 + √2 P2 = (1/√2) (  3  i ).
                            ( −i  3 )

We can easily check that (√A)² = A. ■
Intuitively, higher and higher powers of T, when acting on a few vectors of the space, eventually exhaust all vectors, and further increase in power will be a repetition of lower powers. This intuitive idea can be made more precise by looking at the projection operators. We have already seen that Tⁿ = Σ_{j=1}^r λjⁿ Pj, n = 1, 2, .... For various n's one can "solve" for Pj in terms of powers of T. Since there are only a finite number of Pj's, only a finite number of powers of T will suffice. In fact, we can explicitly construct the polynomial in T for Pj. If there is such a polynomial, by Equation (4.13) it must satisfy Pj = pj(T) = Σ_{k=1}^r pj(λk) Pk, where pj is some polynomial to be determined. By orthogonality of the projection operators, pj(λk) must be zero unless k = j, in which case it must be 1. In other words, pj(λk) = δkj. Such a polynomial can be explicitly constructed:

pj(x) = Π_{k=1, k≠j}^r (x − λk)/(λj − λk).

Therefore,

Pj = Π_{k=1, k≠j}^r (T − λk 1)/(λj − λk),    (4.15)
and we have the following result.
4.5.5. Proposition. Every function of a normal operator on a finite-dimensional vector space can be expressed as a polynomial. In fact, from Equations (4.13) and (4.15),

f(T) = Σ_{j=1}^r f(λj) Π_{k=1, k≠j}^r (T − λk 1)/(λj − λk).    (4.16)
4.5.6. Example. Let us write √A of the last example as a polynomial in A. We have λ1 = 8 and λ2 = 2, so p1(A) = (A − 2·1)/6 and p2(A) = −(A − 8·1)/6. Substituting in Equation (4.16), we obtain

√A = √λ1 p1(A) + √λ2 p2(A) = (√8/6)(A − 2·1) − (√2/6)(A − 8·1) = (√2/6) A + (√8/3) 1.

The RHS is clearly a (first-degree) polynomial in A, and it is easy to verify that it is the matrix of √A obtained in the previous example. ■
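The verification suggested at the end of the example is immediate by machine: squaring the first-degree polynomial (√2/6)A + (√8/3)1 recovers A.

```python
import math

# Check of Example 4.5.6: the polynomial (sqrt(2)/6) A + (sqrt(8)/3) 1
# squares back to A = (5, 3i; -3i, 5) of Example 4.5.4.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[5, 3j], [-3j, 5]]
I = [[1, 0], [0, 1]]

root = [[math.sqrt(2) / 6 * A[i][j] + math.sqrt(8) / 3 * I[i][j]
         for j in range(2)] for i in range(2)]

square = matmul(root, root)
print(square)  # -> approximately [[5, 3i], [-3i, 5]]
```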
4.6 Polar Decomposition
We have seen many similarities between operators and complex numbers. For instance, hermitian operators behave very much like the real numbers: they have real eigenvalues; their squares are positive; every operator can be written as H + iH', where both H and H' are hermitian; and so forth. Also, unitary operators can be written as e^{iH}, where H is hermitian. So unitary operators are the analogue of complex numbers of unit magnitude such as e^{iθ}. A general complex number can be written as re^{iθ}. Can we write an arbitrary operator in an analogous way? The following theorem provides the answer.
polar decomposition theorem

4.6.1. Theorem. (Polar decomposition theorem) An operator A on a finite-dimensional complex inner product space can be written as A = UR, where R is a (unique) positive operator and U a unitary operator. If A is invertible, then U is also unique.
Proof. We will prove the theorem for the case where the operator is invertible. The proof of the general case can be found in books on linear algebra (such as [Halm 58]).

The reader may show that the operator A†A is positive. Therefore, it has a unique positive square root R. We let V = RA⁻¹, or VA = R. Then

VV† = RA⁻¹(RA⁻¹)† = RA⁻¹(A⁻¹)†R† = R(A†A)⁻¹R† = R(R²)⁻¹R† = R(R†R)⁻¹R† = RR⁻¹(R†)⁻¹R† = 1,

and similarly V†V = 1, so V is indeed a unitary operator. Now choose U = V† to get the desired decomposition.

To prove uniqueness we note that UR = U'R' implies that R = U†U'R' and

R² = R†R = (U†U'R')†(U†U'R') = R'†U'†UU†U'R' = R'†R' = R'².

Since the positive transformation R² (or R'²) has only one positive square root, it follows that R = R'.

If A is invertible, then so is R = U†A. Therefore,

UR = U'R ⇒ URR⁻¹ = U'RR⁻¹ ⇒ U = U',

and U is also unique. □
It is interesting to note that the positive definiteness of R and the nonuniqueness of U are the analogue of the positivity of r and the nonuniqueness of e^{iθ} in the polar representation of complex numbers:

z = re^{iθ} = re^{i(θ+2nπ)}   ∀n ∈ ℤ.
In practice, R is found by spectrally decomposing A†A and taking its positive square root.⁶ Once R is found, U can be calculated from the definition A = UR.
4.6.2. Example. Let us find the polar decomposition of

A = ( −2i  √7 ).
    (  0    3 )

We have

R² = A†A = ( 2i  0 ) ( −2i  √7 ) = (  4      2i√7 ).
           ( √7  3 ) (  0    3 )   ( −2i√7   16   )

The eigenvalues and eigenvectors of R² are routinely found to be

λ1 = 18,  |e1⟩ = (1/(2√2)) (  i ),        λ2 = 2,  |e2⟩ = (1/(2√2)) ( −i√7 ).
                           ( √7 )                                  (   1  )

The projection matrices are

P1 = (1/8) (   1    i√7 ),    P2 = (1/8) (   7   −i√7 ).
           ( −i√7    7  )                (  i√7    1  )

Thus,

R = √18 P1 + √2 P2 = (√2/4) (   5    i√7 ).
                            ( −i√7   11  )

To find U, we note that det A is nonzero. Hence, A is invertible, which implies that R is also invertible. The inverse of R is

R⁻¹ = (√2/24) ( 11    −i√7 ).
              (  i√7    5  )

The unitary matrix is simply

U = AR⁻¹ = (1/24) ( −15i√2   3√14 ).
                  (  3i√14   15√2 )

It is left for the reader to verify that U is indeed unitary. ■
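The verification left to the reader is easy to do numerically; the sketch below (matrices transcribed from the example) checks that U is unitary and that UR reproduces A.

```python
import math

# Numerical verification of Example 4.6.2: UU-dagger = 1 and UR = A.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(A):
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

r7, r2, r14 = math.sqrt(7), math.sqrt(2), math.sqrt(14)

A = [[-2j, r7], [0, 3]]
R = [[5 * r2 / 4, 1j * r14 / 4], [-1j * r14 / 4, 11 * r2 / 4]]
U = [[-15j * r2 / 24, 3 * r14 / 24], [3j * r14 / 24, 15 * r2 / 24]]

UU = matmul(U, dagger(U))   # should be the identity
UR = matmul(U, R)           # should reproduce A
print(UR)
```

Note that √7·√2 = √14 was used to abbreviate the entries of R.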
4.7 Real Vector Spaces
The treatment so far in this chapter has focused on complex inner product spaces.
The complex number system is far more complete than the real numbers. For
example, in preparation for the proof of the spectral decomposition theorem, we
used the existence of n roots of a polynomial of degree n over the complex field
⁶It is important to pay attention to the order of the two operators: One decomposes A†A, not AA†.
(this is the fundamental theorem of algebra). A polynomial over the reals, on the other hand, does not necessarily have all its roots in the real number system.

It may therefore seem that vector spaces over the reals will not satisfy the useful theorems and results developed for complex spaces. However, through a process called complexification of a real vector space, in which an imaginary part is added to such a space, it is possible to prove (see, for example, [Halm 58]) practically all the results obtained for complex vector spaces. Only the results are given here.
4.7.1. Theorem. A real symmetric operator has a spectral decomposition as stated in Theorem 4.4.6.
This theorem is especially useful in applications of classical physics, which
deal mostly with real vector spaces. A typical situation involves a vector that
is related to another vector by a symmetric matrix. It is then convenient to find
a coordinate system in which the two vectors are related in a simple manner. This involves diagonalizing the symmetric matrix by a rotation (a real orthogonal matrix). Theorem 4.7.1 reassures us that such a diagonalization is possible.
4.7.2. Example. For a system of N point particles constituting a rigid body, the total angular momentum L = Σ_{i=1}^N mi(ri × vi) is related to the angular frequency via L = Σ_{i=1}^N mi[ri × (ω × ri)] = Σ_{i=1}^N mi[ω(ri · ri) − ri(ri · ω)], or

( Lx )   ( Ixx  Ixy  Ixz ) ( ωx )
( Ly ) = ( Iyx  Iyy  Iyz ) ( ωy ),
( Lz )   ( Izx  Izy  Izz ) ( ωz )

where

Ixx = Σ_{i=1}^N mi(ri² − xi²),   Ixy = −Σ_{i=1}^N mi xi yi,
Iyy = Σ_{i=1}^N mi(ri² − yi²),   Ixz = −Σ_{i=1}^N mi xi zi,
Izz = Σ_{i=1}^N mi(ri² − zi²),   Iyz = −Σ_{i=1}^N mi yi zi,

with Ixy = Iyx, Ixz = Izx, and Iyz = Izy.
The 3 × 3 matrix is denoted by I and is called the moment of inertia matrix. It is symmetric, and Theorem 4.7.1 permits its diagonalization by an orthogonal transformation (the counterpart of a unitary transformation in a real vector space). But an orthogonal transformation in three dimensions is merely a rotation of coordinates.⁷ Thus, Theorem 4.7.1 says that it is always possible to choose coordinate systems in which the moment of inertia matrix is diagonal. In such a coordinate system we have Lx = Ixx ωx, Ly = Iyy ωy, and Lz = Izz ωz, simplifying the equations considerably.
Similarly, the kinetic energy of the rigid rotating body,

T = Σ_{i=1}^N ½mi vi² = Σ_{i=1}^N ½mi vi · (ω × ri) = Σ_{i=1}^N ½mi ω · (ri × vi) = ½ω · L = ½ωᵗIω,
⁷This is not entirely true! There are orthogonal transformations that are composed of a rotation followed by a reflection about the origin. See Example 3.5.8.
which in general has off-diagonal terms involving Ixy and so forth, reduces to the simple form T = ½Ixx ωx² + ½Iyy ωy² + ½Izz ωz². ■
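The inertia sums above can be coded directly. The sketch below uses the three unit point masses of Problem 4.14 at (a, a, 0), (a, 0, a), (0, a, a) with a = 1 (an illustrative choice) and checks that the symmetry axis (1, 1, 1) is a principal axis.

```python
# Moment of inertia matrix from the sums of Example 4.7.2, for the three
# unit point masses of Problem 4.14 with a = 1 (illustrative choice).

masses = [1.0, 1.0, 1.0]
positions = [(1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]

I = [[0.0] * 3 for _ in range(3)]
for m, (x, y, z) in zip(masses, positions):
    r2 = x * x + y * y + z * z
    I[0][0] += m * (r2 - x * x)
    I[1][1] += m * (r2 - y * y)
    I[2][2] += m * (r2 - z * z)
    I[0][1] -= m * x * y
    I[0][2] -= m * x * z
    I[1][2] -= m * y * z
I[1][0], I[2][0], I[2][1] = I[0][1], I[0][2], I[1][2]   # symmetry

# (1,1,1) should be an eigenvector: I (1,1,1)^t = 2 (1,1,1)^t
Lv = [sum(I[i][j] for j in range(3)) for i in range(3)]
print(I, Lv)  # -> I = [[4,-1,-1],[-1,4,-1],[-1,-1,4]], Lv = [2.0, 2.0, 2.0]
```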
4.7.3. Example. Another application of Theorem 4.7.1 is in the study of conic sections. The most general form of the equation of a conic section is

a1x² + a2y² + a3xy + a4x + a5y + a6 = 0,

where a1, ..., a6 are constants. If the coordinate axes coincide with the principal axes of the conic section, the xy term will be absent, and the equation of the conic section takes the familiar form. On geometrical grounds we have to be able to rotate the xy-coordinates to coincide with the principal axes. We shall do this using the ideas discussed in this chapter.
First, we note that the general equation for a conic section can be written in matrix form as

( x  y ) ( a1    a3/2 ) ( x ) + ( a4  a5 ) ( x ) + a6 = 0.
         ( a3/2  a2   ) ( y )              ( y )

The 2 × 2 matrix is symmetric and can therefore be diagonalized by means of an orthogonal matrix R. Then RᵗR = 1, and we can write

( x  y ) Rᵗ R ( a1    a3/2 ) Rᵗ R ( x ) + ( a4  a5 ) Rᵗ R ( x ) + a6 = 0.
              ( a3/2  a2   )      ( y )                   ( y )

Let

( x' ) = R ( x ),   ( a4'  a5' ) = ( a4  a5 ) Rᵗ,   R ( a1    a3/2 ) Rᵗ = ( a1'  0   ).
( y' )     ( y )                                      ( a3/2  a2   )      ( 0    a2' )

Then we get

( x'  y' ) ( a1'  0   ) ( x' ) + ( a4'  a5' ) ( x' ) + a6 = 0,
           ( 0    a2' ) ( y' )                ( y' )

or

a1'x'² + a2'y'² + a4'x' + a5'y' + a6 = 0.

The cross term has disappeared. The orthogonal matrix R is simply a rotation. In fact, it rotates the original coordinate system to coincide with the principal axes of the conic section. ■
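The rotation to principal axes can be carried out numerically. The sketch below does it for the conic of Problem 4.10(a), 11x² + 3y² + 6xy − 12 = 0, using the standard choice tan 2θ = 2q/(p − r) for the symmetric matrix (p, q; q, r), which here is (11, 3; 3, 3).

```python
import math

# Principal-axes rotation for 11x^2 + 3y^2 + 6xy - 12 = 0 (Problem 4.10a).
# Rotating by theta with tan(2 theta) = 2q/(p - r) kills the cross term.

p, q, r = 11.0, 3.0, 3.0
theta = 0.5 * math.atan2(2 * q, p - r)
C, S = math.cos(theta), math.sin(theta)

# entries of R^t S R for S = (p, q; q, r)
a1 = p * C * C + 2 * q * C * S + r * S * S
a2 = p * S * S - 2 * q * C * S + r * C * C
cross = (r - p) * C * S + q * (C * C - S * S)

# 12 x'^2 + 2 y'^2 = 12: an ellipse
print(a1, a2, cross)  # -> 12.0  2.0  ~0
```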
4.7.4. Example. In this example we investigate conditions under which a multivariable function has a maximum or a minimum.

A point a = (a1, a2, ..., an) ∈ ℝⁿ is a maximum (minimum) of a function f(x1, x2, ..., xn) ≡ f(r) if

∇f|_{xi=ai} = (∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn)|_{xi=ai} = 0

and for small xi − ai, the difference f(r) − f(a) is negative (positive). To relate this difference to the topics of this section, write the Taylor expansion of the function around a, keeping terms up to the second order:

f(r) = f(a) + Σ_{i=1}^n (xi − ai)(∂f/∂xi)|_{r=a} + ½ Σ_{i,j} (xi − ai)(xj − aj)(∂²f/∂xi∂xj)|_{r=a} + ...,

or, constructing a column vector out of δi ≡ xi − ai and a symmetric matrix D_ij out of the second derivatives, we can write

f(r) − f(a) = ½ δᵗDδ + ...

because the first derivatives vanish. For a to be a minimum point of f, the RHS of the last equation must be positive for arbitrary δ. This means that D must be a positive matrix.⁸ Thus, all its eigenvalues must be positive (Corollary 4.4.8). Similarly, we can show that for a to be a maximum point of f, −D must be positive definite. This means that D must have negative eigenvalues.
When we specialize the foregoing discussion to two dimensions, we obtain results that are familiar from calculus. For the function f(x, y) to have a minimum, the eigenvalues of the matrix

( fxx  fxy )
( fyx  fyy )

must be positive. The characteristic polynomial

det ( fxx − λ   fxy     ) = 0
    ( fyx       fyy − λ )

yields two eigenvalues:

λ1 = [fxx + fyy + √((fxx − fyy)² + 4fxy²)] / 2,
λ2 = [fxx + fyy − √((fxx − fyy)² + 4fxy²)] / 2.

These eigenvalues will be both positive if

fxx + fyy > √((fxx − fyy)² + 4fxy²),

and both negative if

fxx + fyy < −√((fxx − fyy)² + 4fxy²).

Squaring these inequalities and simplifying yields

fxx fyy > fxy²,
⁸Note that D is already symmetric, the real analogue of hermitian.
which shows that fxx and fyy must have the same sign. If they are both positive (negative), we have a minimum (maximum). This is the familiar condition for the attainment of extrema by a function of two variables. ■
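The eigenvalue criterion can be tried on a concrete function. The sketch below (the function f is an arbitrary illustrative choice, not from the text) estimates the Hessian at a critical point by finite differences and applies the test above.

```python
import math

# Second-derivative test for f(x, y) = 3x^2 + xy + y^2 at its critical
# point (0, 0), with the Hessian estimated by finite differences.

def f(x, y):
    return 3 * x * x + x * y + y * y

h = 1e-3
fxx = (f(h, 0) - 2 * f(0, 0) + f(-h, 0)) / h ** 2
fyy = (f(0, h) - 2 * f(0, 0) + f(0, -h)) / h ** 2
fxy = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h ** 2)

disc = math.sqrt((fxx - fyy) ** 2 + 4 * fxy ** 2)
lam1 = (fxx + fyy + disc) / 2
lam2 = (fxx + fyy - disc) / 2

print(lam1 > 0 and lam2 > 0, fxx * fyy > fxy ** 2)  # -> True True: a minimum
```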
Although the establishment of spectral decomposition for symmetric operators is fairly straightforward, the case of orthogonal operators (the counterpart of unitary operators in a real vector space) is more complicated. In fact, we have already seen in Example 4.5.1 that the eigenvalues of an orthogonal transformation in two dimensions are, in general, complex. This is in contrast to symmetric transformations.
Think of the orthogonal operator O as a unitary operator.⁹ Since the absolute value of the eigenvalues of a unitary operator is 1, the only real possibilities are ±1. To find the other eigenvalues we note that as a unitary operator, O can be written as e^A, where A is anti-hermitian (see Problem 4.22). Since hermitian conjugation and transposition coincide for real vector spaces, we conclude that A = −Aᵗ, and A is antisymmetric. It is also real, because O is.

Let us now consider the eigenvalues of A. If λ is an eigenvalue of A corresponding to the eigenvector |a⟩, then ⟨a|A|a⟩ = λ⟨a|a⟩. Taking the complex conjugate of both sides gives ⟨a|A†|a⟩ = λ*⟨a|a⟩; but A† = Aᵗ = −A, because A is real and antisymmetric. We therefore have ⟨a|A|a⟩ = −λ*⟨a|a⟩, which gives λ* = −λ. It follows that if we restrict λ to be real, then it can only be zero; otherwise, it must be purely imaginary. Furthermore, the reader may verify that if λ is an eigenvalue of A, so is −λ. Therefore, the diagonal form of A looks like this:

A_diag = diag(0, 0, ..., 0, iθ1, −iθ1, iθ2, −iθ2, ..., iθk, −iθk),

which gives O the following diagonal form:

O_diag = e^{A_diag} = diag(e⁰, e⁰, ..., e⁰, e^{iθ1}, e^{−iθ1}, e^{iθ2}, e^{−iθ2}, ..., e^{iθk}, e^{−iθk}),

with θ1, θ2, ..., θk all real. It is clear that if O has −1 as an eigenvalue, then some of the θ's must equal ±π. Separating the π's from the rest of the θ's and putting all of the above arguments together, we get

O_diag = diag(1, 1, ..., 1, −1, −1, ..., −1, e^{iθ1}, e^{−iθ1}, e^{iθ2}, e^{−iθ2}, ..., e^{iθm}, e^{−iθm}),

where there are N+ ones and N− minus ones, and N+ + N− + 2m = dim V.
Getting insight from Example 4.5.1, we can argue, admittedly in a nonrigorous way, that corresponding to each pair e^{±iθj} is a 2 × 2 matrix of the form

R2(θj) ≡ ( cos θj  −sin θj ).    (4.17)
         ( sin θj   cos θj )
9This can always be done by formally identifying transposition with hermitian conjugation, an identification that holds when
the underlying field of numbers is real.
We therefore have the following theorem (refer to [Halm 58] for a rigorous treatment).

4.7.5. Theorem. A real orthogonal operator on a real inner product space V cannot, in general, be completely diagonalized. The closest it can get to a diagonal form is

O_diag = diag(1, 1, ..., 1, −1, −1, ..., −1, R2(θ1), R2(θ2), ..., R2(θm)),

where N+ + N− + 2m = dim V and R2(θj) is as given in (4.17). Furthermore, the matrix that transforms an orthogonal matrix into the form above is itself an orthogonal matrix.
The last statement follows from Theorem 4.5.2 and the fact that an orthogonal
matrix is the real analogue of a unitary matrix.
4.7.6. Example. An interesting application of Theorem 4.7.5 occurs in classical mechanics, where it is shown that the motion of a rigid body consists of a translation and a rotation. The rotation is represented by a 3 × 3 orthogonal matrix. Theorem 4.7.5 states that by an appropriate choice of coordinate systems (i.e., by applying the same orthogonal transformation that diagonalizes the rotation matrix of the rigid body), one can "diagonalize" the 3 × 3 orthogonal matrix. The "diagonal" form is

( ±1   0    0 )         ( ±1   0       0     )
(  0  ±1    0 )   or    (  0  cos θ  −sin θ ).
(  0   0   ±1 )         (  0  sin θ   cos θ )

Excluding the reflections (corresponding to −1's) and the trivial identity rotation, we conclude that any rotation of a rigid body can be written as

( 1   0       0     )
( 0  cos θ  −sin θ ),
( 0  sin θ   cos θ )

which is a rotation through the angle θ about the (new) x-axis. ■
Combining the rotation of the example above with the translations, we obtain
the following theorem.
4.7.7. Theorem. (Euler) The general motion of a rigid body consists of the translation of one point of that body and a rotation about a single axis through that point.
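Euler's theorem can be probed numerically. The sketch below (axis and angle are arbitrary choices) builds a rotation about a unit axis n with the Rodrigues formula R = cos θ 1 + sin θ K + (1 − cos θ) nnᵗ and checks that n is an eigenvector with eigenvalue 1 and that the trace equals 1 + 2 cos θ, as the block form of Theorem 4.7.5 implies.

```python
import math

# Sketch of Euler's theorem: a rotation about the unit axis n leaves n
# fixed, and its trace equals 1 + 2 cos(theta).  Axis/angle are arbitrary.

theta = 0.9
n = [1 / math.sqrt(3)] * 3   # unit axis (1, 1, 1)/sqrt(3)
c, s = math.cos(theta), math.sin(theta)

K = [[0, -n[2], n[1]],       # cross-product matrix: K v = n x v
     [n[2], 0, -n[0]],
     [-n[1], n[0], 0]]

R = [[c * (i == j) + s * K[i][j] + (1 - c) * n[i] * n[j]
      for j in range(3)] for i in range(3)]

Rn = [sum(R[i][j] * n[j] for j in range(3)) for i in range(3)]
trace = R[0][0] + R[1][1] + R[2][2]
print(trace, 1 + 2 * c)  # equal: the rotation angle can be read off the trace
```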
Finally, we quote the polar decomposition for real inner product spaces.
4.7.8. Theorem. Any operator A on a real inner product space can be written as
A = OR, where R is a (unique) symmetric positive operator and 0 is orthogonal.
4.7.9. Example. Let us decompose the following matrix into its polar form:

A = ( 2   0 ).
    ( 3  −2 )

The procedure is the same as in the complex case. We have

R² = AᵗA = ( 2   3 ) ( 2   0 ) = ( 13  −6 ),
           ( 0  −2 ) ( 3  −2 )   ( −6   4 )

with eigenvalues λ1 = 1 and λ2 = 16 and normalized eigenvectors

|e1⟩ = (1/√5) ( 1 )   and   |e2⟩ = (1/√5) (  2 ).
              ( 2 )                       ( −1 )

The projection operators are

P1 = |e1⟩⟨e1| = (1/5) ( 1  2 ),    P2 = |e2⟩⟨e2| = (1/5) (  4  −2 ).
                      ( 2  4 )                           ( −2   1 )

Thus, we have

R = 1·P1 + 4·P2 = (1/5) ( 17  −6 ).
                        ( −6   8 )

We note that A is invertible. Thus, R is also invertible, and

R⁻¹ = (1/20) ( 8   6  ).
             ( 6   17 )

This gives O = AR⁻¹, or

O = (1/5) ( 4   3 ).
          ( 3  −4 )

It is readily verified that O is indeed orthogonal. ■
Our excursion through operator algebra and matrix theory has revealed to us the diversity of diagonalizable operators. Could it be perhaps that all operators are diagonalizable? In other words, given any operator, can we find a basis in which the matrix representing that operator is diagonal? The answer is, in general, no! (See Problem 4.27.) Discussion of this topic entails a treatment of the Hamilton-Cayley theorem and the Jordan canonical form of a matrix, in which the so-called generalized eigenvectors are introduced. A generalized eigenvector belongs to the kernel of (A − λ1)^m for some positive integer m. Then λ is called a generalized eigenvalue. We shall not pursue this matter here. The interested reader can find such a discussion in books on linear algebra and matrix theory. We shall, however, see the application of this notion to special operators on infinite-dimensional vector spaces in Chapter 16. One result is worth mentioning at this point.
4.7.10. Proposition. If the roots ofthe characteristic polynomial ofa matrix are
all simple, then the matrix can be brought to diagonal form by a similarity (not
necessarily a unitary) transformation.
4.7.11. Example. As a final example of the application of the results of this section, let us evaluate the n-fold integral

I_n = ∫_{−∞}^∞ dx1 ∫_{−∞}^∞ dx2 ... ∫_{−∞}^∞ dxn e^{−Σ_{i,j=1}^n m_ij x_i x_j},    (4.18)

where the m_ij are elements of a real, symmetric, positive definite matrix, say M. Because it is symmetric, M can be diagonalized by an orthogonal matrix R so that RMRᵗ = D is a diagonal matrix whose diagonal entries are the eigenvalues, λ1, λ2, ..., λn, of M, whose positive definiteness ensures that none of these eigenvalues is zero or negative.

The exponent in (4.18) can be written as

Σ_{i,j=1}^n m_ij x_i x_j = xᵗMx = xᵗRᵗRMRᵗRx = x'ᵗDx' = λ1 x'1² + ... + λn x'n²,

where

x' = Rx,

or, in component form, x'_i = Σ_{j=1}^n r_ij x_j for i = 1, 2, ..., n. Similarly, since x = Rᵗx', it follows that x_i = Σ_{j=1}^n r_ji x'_j for i = 1, 2, ..., n.

The "volume element" dx1 ... dxn is related to the primed volume element as follows:

dx1 ... dxn = |∂(x1, x2, ..., xn)/∂(x'1, x'2, ..., x'n)| dx'1 ... dx'n ≡ |det J| dx'1 ... dx'n,

where J is the Jacobian matrix whose ijth element is ∂x_i/∂x'_j. But ∂x_i/∂x'_j = r_ji, so

|det J| = |det Rᵗ| = 1.

Therefore, in terms of x', the integral I_n becomes

I_n = ∫_{−∞}^∞ dx'1 ∫_{−∞}^∞ dx'2 ... ∫_{−∞}^∞ dx'n e^{−λ1 x'1² − λ2 x'2² − ... − λn x'n²}
    = (∫_{−∞}^∞ dx'1 e^{−λ1 x'1²}) (∫_{−∞}^∞ dx'2 e^{−λ2 x'2²}) ... (∫_{−∞}^∞ dx'n e^{−λn x'n²})
    = √(π/λ1) √(π/λ2) ... √(π/λn) = π^{n/2} / √(λ1 λ2 ... λn) = π^{n/2} (det M)^{−1/2},

because the determinant of a matrix is the product of its eigenvalues.

analytic definition of the determinant of a matrix

This result can be written as

∫_{−∞}^∞ dⁿx e^{−xᵗMx} = π^{n/2} (det M)^{−1/2}  ⇒  det M = πⁿ / (∫_{−∞}^∞ dⁿx e^{−xᵗMx})²,

which gives an analytic definition of the determinant. ■
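The determinant formula is easy to check numerically in two dimensions. The sketch below (M is an arbitrary positive definite choice) approximates the Gaussian integral on a grid and compares it with π/√(det M).

```python
import math

# Numerical check of the Gaussian-integral formula for n = 2: the double
# integral of exp(-x^t M x) should equal pi / sqrt(det M).

M = [[2.0, 0.5], [0.5, 1.0]]
detM = M[0][0] * M[1][1] - M[0][1] * M[1][0]

h, L = 0.02, 8.0          # grid spacing and cutoff (tails are negligible)
n = int(2 * L / h)
total = 0.0
for i in range(n + 1):
    x = -L + i * h
    for j in range(n + 1):
        y = -L + j * h
        e = M[0][0] * x * x + (M[0][1] + M[1][0]) * x * y + M[1][1] * y * y
        total += math.exp(-e)
integral = total * h * h

print(integral, math.pi / math.sqrt(detM))  # agree to high accuracy
```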
4.8 Problems
4.1. Let U1 and U2 be subspaces of V. Show that
(a) dim(U1 + U2) = dim U1 + dim U2 − dim(U1 ∩ U2). Hint: Extend a basis of U1 ∩ U2 to both U1 and U2.
(b) If U1 + U2 = V and dim U1 + dim U2 = dim V, then V = U1 ⊕ U2.
(c) If dim U1 + dim U2 > dim V, then U1 ∩ U2 ≠ {0}.
4.2. Let P be the (hermitian) projection operator onto a subspace M. Show that 1 − P projects onto M⊥. Hint: You need to show that ⟨m|P|a⟩ = ⟨m|a⟩ for arbitrary |a⟩ and |m⟩ ∈ M; therefore, consider ⟨m|P|a⟩*, and use the hermiticity of P.
4.3. Show that a subspace M of an inner product space V is invariant under the linear operator A if and only if M⊥ is invariant under A†.
4.4. Show that the intersection of two invariant subspaces of an operator is also
an invariant subspace.
4.5. Let π be a permutation of the integers {1, 2, ..., n}. Find the spectrum of A_π, if for |x⟩ = (α1, α2, ..., αn) ∈ ℂⁿ, we define

A_π|x⟩ = (α_π(1), ..., α_π(n)).
4.6. Show that
(a) the coefficient of λ^N in the characteristic polynomial is (−1)^N, where N = dim V, and
(b) the constant in the characteristic polynomial of an operator is its determinant.
4.7. Operators A and B satisfy the commutation relation [A, B] = 1. Let |b⟩ be an eigenvector of B with eigenvalue λ. Show that e^{−τA}|b⟩ is also an eigenvector of B, but with eigenvalue λ + τ. This is why e^{−τA} is called the translation operator for B. Hint: First find [B, e^{−τA}].
4.8. Find the eigenvalues of an involutive operator, that is, an operator A with the property A² = 1.
4.9. Assume that A and A' are similar matrices. Show that they have the same
eigenvalues.
4.10. In each of the following cases, determine the counterclockwise rotation of the xy-axes that brings the conic section into the standard form and determine the conic section.
(a) 11x² + 3y² + 6xy − 12 = 0
(b) 5x² − 3y² + 6xy + 6 = 0
(c) 2x² + 5y² − 4xy − 3 = 0
(d) 6x² + 3y² − 4xy − 7 = 0
(e) 2x² + 5y² − 4xy − 36 = 0
4.11. Show that if A is invertible, then the eigenvectors of A-I are the same as
those of A and the eigenvalues of A-I are the reciprocals of those of A.
4.12. Find all eigenvalues and eigenvectors of the following matrices:
Al = (~ ~) 81 = (~ ~) Cl= CI
-2
~I)
3
-4 -I
A2= G
0
~) 82= G
I
D C
I I
JJ
I 0 C2 = : -I
0 I I
A3= G
I
D83 = G
I
D (
I
D
I I C3 = : 0
0 I I
4.13. Show that a 2 × 2 rotation matrix does not have a real eigenvalue (and,
therefore, no real eigenvector) when the rotation angle is not an integer multiple of π.
What is the physical interpretation of this?
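A quick numerical sketch of this fact (illustrative only, not part of the original problem set; numpy is assumed):

```python
import numpy as np

# Sketch for Problem 4.13: the eigenvalues of a 2x2 rotation by theta are
# e^{±i·theta}, which are real only when theta is an integer multiple of pi.
def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

eigs = np.linalg.eigvals(rotation(np.pi / 3))
print(eigs)  # complex pair: 0.5 ± 0.866i
print(np.linalg.eigvals(rotation(np.pi)))  # both -1 (real, up to roundoff)
```

Physically, no real eigenvector means no direction in the plane is left unchanged by a rotation through a generic angle.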
4.14. Three equal point masses are located at (a, a, 0), (a, 0, a), and (0, a, a).
Find the moment of inertia matrix as well as its eigenvalues and the corresponding
eigenvectors.
4.15. Consider (α₁, α₂, ..., α_n) ∈ ℂⁿ and define E_{ij} as the operator that inter-
changes α_i and α_j. Find the eigenvalues of this operator.
4.16. Find the eigenvalues and eigenvectors of the operator −i d/dx acting in the
vector space of differentiable functions C¹(−∞, ∞).
4.17. Show that a hermitian operator is positive if and only if its eigenvalues are
positive.
4.18. What are the spectral decompositions of A†, A⁻¹, and AA† for an invertible
normal operator A?
4.19. Consider the matrix
I +i)
3 .
(a) Find the eigenvalues and the orthonormal eigenvectors of A.
(b) Calculate the projection operators (matrices) P₁ and P₂ and verify that ∑_i P_i = 1
and ∑_i λ_i P_i = A.
(c) Find the matrices √A, sin(πA/6), and cos(πA/6).
(d) Is A invertible? If so, find the eigenvalues and eigenvectors of A⁻¹.
4.20. Consider the matrix
A= (~i ~ ~}
(a) Find the eigenvalues of A. Hint: Try λ = 3 in the characteristic polynomial of A.
(b) For each λ, find a basis for M_λ, the eigenspace associated with the eigenvalue λ.
(c) Use the Gram-Schmidt process to orthonormalize the above basis vectors.
(d) Calculate the projection operators (matrices) P_i for each subspace and verify
that ∑_i P_i = 1 and ∑_i λ_i P_i = A.
(e) Find the matrices √A, sin(πA/2), and cos(πA/2).
(f) Is A invertible? If so, find the eigenvalues and eigenvectors of A⁻¹.
4.21. Show that if two hermitian matrices have the same set of eigenvalues, then
they are unitarily related.
4.22. Prove that corresponding to every unitary operator U acting on a finite-
dimensional vector space, there is a hermitian operator H such that U = exp(iH).
4.23. Find the polar decomposition of the following matrices:
(
2i 0)
A="fi3' (
41
B = 12i
-12i)
34 ' (
1 0
c= 0 1
I i
4.24. Show that an arbitrary matrix A can be "diagonalized" as D = UAV, where
U and V are unitary and D is a real diagonal matrix with only nonnegative entries.
Hint: Consider AA†.
4.25. Show that (a) if λ is an eigenvalue of an antisymmetric operator, then so
is −λ, and (b) antisymmetric operators (matrices) of odd dimension cannot be
invertible.
4.26. Find the unitary matrices that diagonalize the following hermitian matrices:
Al = C12
_i -1+ i) A2=C· ;), A3 = G-i)
-1 ' -, o '
BI = (~I
-1
~) B2 = (~.
0
~i) .
0 , , -1
-i -I -,
Warning! You may have to resort to numerical approximations for some of these.
4.27. For A = (b1), where x ≠ 0, show that it is impossible to find an invertible
2 × 2 matrix R such that RAR⁻¹ is diagonal. (This shows that not all operators are
diagonalizable.)
Additional Reading
1. Axler, S. Linear Algebra Done Right, Springer-Verlag, 1996. Concise but
useful discussion of real and complex spectral theory.
2. DeVito, C. Functional Analysis and Linear Operator Theory, Addison-
Wesley, 1990. Has a good discussion of spectral theory for finite and infinite
dimensions.
3. Halmos, P. Finite Dimensional Vector Spaces, 2nd ed., Van Nostrand, 1958.
Comprehensive treatment of real and complex spectral theory for operators
on finite-dimensional vector spaces.
Part II
Infinite-Dimensional Vector
Spaces
5
Hilbert Spaces
The basic concepts of finite-dimensional vector spaces introduced in Chapter 1 can
readily be generalized to infinite dimensions. The definition of a vector space and
concepts of linear combination, linear independence, basis, subspace, span, and so
forth all carry over to infinite dimensions. However, one thing is crucially different
in the new situation, and this difference makes the study of infinite-dimensional
vector spaces both richer and more nontrivial: In a finite-dimensional vector space
we dealt with finite sums; in infinite dimensions we encounter infinite sums. Thus,
we have to investigate the convergence of such sums.
5.1 The Question of Convergence
The intuitive notion of convergence acquired in calculus makes use of the idea of
closeness. This, in turn, requires the notion of distance.¹ We considered such a
notion in Chapter 1 in the context of a norm, and saw that the inner product had
an associated norm. However, it is possible to introduce a norm on a vector space
without an inner product.
One such norm, applicable to ℂⁿ and ℝⁿ, is

‖a‖_p = (∑_{i=1}^n |α_i|^p)^{1/p},

where p is an integer and the α_i are the components of |a⟩. The "natural" norm, i.e., that induced on ℂⁿ (or ℝⁿ) by
the usual inner product, corresponds to p = 2. The distance between two points
¹It is possible to introduce the idea of closeness abstractly, without resort to the notion of distance, as is done in topology.
However, distance, as applied in vector spaces, is as abstract as we want to get.
Closeness is a relative concept!
Cauchy sequence defined
complete vector space defined
depends on the particular norm used. For example, consider the "point" (or vector)
|b⟩ = (0.1, 0.1, ..., 0.1) in a 1000-dimensional space (n = 1000). One can easily
check that the distance of this vector from the origin varies considerably with p:
‖b‖₁ = 100, ‖b‖₂ = 3.16, ‖b‖₁₀ = 0.2. This variation may give the impression
that there is no such thing as "closeness," and that it all depends on how one defines the
norm. This is not true, because closeness is a relative concept: One always compares
distances. A norm with large p shrinks all distances of a space, and a norm with
small p stretches them. Thus, although it is impossible (and meaningless) to say
that "|a⟩ is close to |b⟩" because of the dependence of distance on p, one can
always say "|a⟩ is closer to |b⟩ than |c⟩ is to |d⟩," regardless of the value of p.
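These numbers are easy to reproduce (a numerical sketch, not from the original text; numpy is assumed, and ‖a‖_p is computed directly from its definition):

```python
import numpy as np

# The p-norm of b = (0.1, ..., 0.1) in 1000 dimensions varies strongly with p,
# reproducing the values quoted in the text.
b = np.full(1000, 0.1)

def p_norm(v, p):
    # ||v||_p = (sum |v_i|^p)^(1/p)
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

print(p_norm(b, 1))    # 100.0
print(p_norm(b, 2))    # 3.162... = sqrt(10)
print(p_norm(b, 10))   # 0.1995... ≈ 0.2
```

Ratios of distances, on the other hand, are far less sensitive to p, which is the sense in which closeness remains a meaningful (relative) concept.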
Now that we have a way of telling whether vectors are close together or far
apart, we can talk about limits and the convergence of sequences of vectors. Let
us begin by recalling the definition of a Cauchy sequence.

5.1.1. Definition. An infinite sequence of vectors {|a_i⟩}_{i=1}^∞ in a normed linear
space V is called a Cauchy sequence if lim_{i,j→∞} ‖a_i − a_j‖ = 0.
A convergent sequence is necessarily Cauchy. This can be shown using the
triangle inequality (see Problem 5.2). However, there may be Cauchy sequences
in a given vector space that do not converge to any vector in that space (see the
example below). Such convergence requires an additional property of a vector
space, summarized in the following definition.

5.1.2. Definition. A complete vector space V is a normed linear space for which
every Cauchy sequence of vectors in V has a limit vector in V. In other words,
if {|a_i⟩}_{i=1}^∞ is a Cauchy sequence, then there exists a vector |a⟩ ∈ V such that
lim_{i→∞} ‖a_i − a‖ = 0.
5.1.3. Example. 1. ℝ is complete with respect to the absolute-value norm ‖a‖ = |a|. In
other words, every Cauchy sequence of real numbers has a limit in ℝ. This is proved in real
analysis.
2. ℂ is complete with respect to the norm ‖a‖ = |a| = √((Re a)² + (Im a)²). Using
|a| ≤ |Re a| + |Im a|, one can show that the completeness of ℂ follows from that of ℝ.
Details are left as an exercise for the reader.
3. The set of rational numbers ℚ is not complete with respect to the absolute-value norm.
In fact, {(1 + 1/k)^k}_{k=1}^∞ is a sequence of rational numbers that is Cauchy but does not
converge to a rational number; it converges to e, the base of the natural logarithm, which is
known to be an irrational number. ■
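The third item can be made concrete with exact rational arithmetic (an illustrative sketch using Python's fractions module; not from the original text):

```python
from fractions import Fraction
import math

# Each term (1 + 1/k)^k is an exact rational number, yet the sequence
# converges to the irrational number e: a Cauchy sequence in Q whose
# limit lies outside Q.
terms = [(1 + Fraction(1, k)) ** k for k in (1, 10, 100, 1000)]
for t in terms:
    print(float(t))   # 2.0, 2.59..., 2.70..., 2.716...
print(math.e)         # 2.718281828...
```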
Let {|a_j⟩}_{j=1}^∞ be a Cauchy sequence of vectors in a finite-dimensional vector
space V_N. Choose an orthonormal basis {|e_k⟩}_{k=1}^N in V_N such that² |a_i⟩ =
∑_{k=1}^N α_k^{(i)} |e_k⟩ and |a_j⟩ = ∑_{k=1}^N α_k^{(j)} |e_k⟩. Then

‖a_i − a_j‖² = ⟨a_i − a_j|a_i − a_j⟩ = ‖∑_{k=1}^N (α_k^{(i)} − α_k^{(j)}) |e_k⟩‖²
= ∑_{k,l=1}^N (α_k^{(i)} − α_k^{(j)})* (α_l^{(i)} − α_l^{(j)}) ⟨e_k|e_l⟩ = ∑_{k=1}^N |α_k^{(i)} − α_k^{(j)}|².

²Recall that one can always define an inner product on a finite-dimensional vector space. So, the existence of orthonormal
bases is guaranteed.

all finite-dimensional vector spaces are complete

The LHS goes to zero, because the sequence is assumed Cauchy. Furthermore, all
terms on the RHS are positive. Thus, they too must go to zero as i, j → ∞. By
the completeness of ℂ, there must exist α_k ∈ ℂ such that lim_{n→∞} α_k^{(n)} = α_k for
k = 1, 2, ..., N. Now consider |a⟩ ∈ V_N given by |a⟩ = ∑_{k=1}^N α_k |e_k⟩. We claim
that |a⟩ is the limit of the above sequence of vectors in V_N. Indeed,

‖a_j − a‖² = ∑_{k=1}^N |α_k^{(j)} − α_k|² → 0   as j → ∞.

We have proved the following:
5.1.4. Proposition. Every Cauchy sequence in a finite-dimensional inner product
space over ℂ (or ℝ) is convergent. In other words, every finite-dimensional complex
(or real) inner product space is complete with respect to the norm induced by its
inner product.

The next example shows how important the word "finite" is.
5.1.5. Example. Consider {f_k}_{k=1}^∞, the infinite sequence of continuous functions defined
on the interval [−1, +1] by

f_k(x) = { 1            if 1/k ≤ x ≤ 1,
           (kx + 1)/2   if −1/k ≤ x ≤ 1/k,
           0            if −1 ≤ x ≤ −1/k.

This sequence belongs to C⁰(−1, 1), the inner product space of continuous functions with
its usual inner product: ⟨f|g⟩ = ∫_{−1}^{1} f*(x)g(x) dx. It is straightforward to verify that
‖f_k − f_j‖² = ∫_{−1}^{1} |f_k(x) − f_j(x)|² dx → 0 as k, j → ∞. Therefore, the sequence is Cauchy.
However, the limit of this sequence is (see Figure 5.1)

f(x) = { 1 if 0 < x < 1,
         0 if −1 < x < 0,

which is discontinuous at x = 0 and therefore does not belong to the space in which the
original sequence lies. ■
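The Cauchy property is easy to check numerically (an illustrative sketch, not from the original text; numpy is assumed, and the integrals are approximated by Riemann sums on a uniform grid):

```python
import numpy as np

# The continuous ramps f_k of Example 5.1.5 form a Cauchy sequence in the
# L^2 norm on [-1, 1], even though their pointwise limit is a discontinuous
# step function.  clip reproduces the piecewise definition of f_k.
def f(k, x):
    return np.clip((k * x + 1) / 2, 0.0, 1.0)

x = np.linspace(-1, 1, 200001)
dx = x[1] - x[0]

def dist2(k, j):
    # ||f_k - f_j||^2, approximated by a Riemann sum
    return np.sum((f(k, x) - f(j, x)) ** 2) * dx

print(dist2(10, 20), dist2(100, 200), dist2(1000, 2000))  # shrinking toward 0
```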
Figure 5.1 The limit of the sequence of the continuous functions f_k is a discontinuous
function that is 1 for x > 0 and 0 for x < 0.
We see that infinite-dimensional vector spaces are not generally complete. It is
a nontrivial task to show whether or not a given infinite-dimensional vector space
is complete.
Any vector space (finite- or infinite-dimensional) contains all finite linear com-
binations of the form ∑_{i=1}^n α_i |a_i⟩ when it contains all the |a_i⟩'s. This follows from
the very definition of a vector space. However, the situation is different when n
goes to infinity. For the vector space to contain the infinite sum, firstly, the mean-
ing of such a sum has to be clarified, i.e., a norm and an associated convergence
criterion need to be put in place. Secondly, the vector space has to be complete
with respect to that norm. A complete normed vector space is called a Banach space.
Banach space
We shall not deal with a general Banach space, but only with those spaces
whose norms arise naturally from an inner product. This leads to the following
definition:
Hilbert space defined
5.1.6. Definition. A complete inner product space, commonly denoted by ℋ, is
called a Hilbert space.

Thus, all finite-dimensional real or complex vector spaces are Hilbert spaces.
However, when we speak of a Hilbert space, we shall usually assume that it is
infinite-dimensional.
It is convenient to use orthonormal vectors in studying Hilbert spaces. So, let
us consider an infinite sequence {|e_i⟩}_{i=1}^∞ of orthonormal vectors all belonging to
a Hilbert space ℋ. Next, take any vector |f⟩ ∈ ℋ, construct the complex numbers
f_i = ⟨e_i|f⟩, and form the sequence of vectors³

|f_n⟩ = ∑_{i=1}^n f_i |e_i⟩   for n = 1, 2, ...   (5.1)

³We can consider |f_n⟩ as an "approximation" to |f⟩, because both share the same components along the same set of orthonormal
vectors. The sequence of orthonormal vectors acts very much as a basis. However, to be a basis, an extra condition must be met.
We shall discuss this condition shortly.
For the pair of vectors |f⟩ and |f_n⟩, the Schwarz inequality gives

|⟨f|f_n⟩|² ≤ ⟨f|f⟩ ⟨f_n|f_n⟩,   (5.2)

where Equation (5.1) has been used to evaluate ⟨f_n|f_n⟩. On the other hand, taking
the inner product of (5.1) with ⟨f| yields

⟨f|f_n⟩ = ∑_{i=1}^n f_i ⟨f|e_i⟩ = ∑_{i=1}^n f_i f_i* = ∑_{i=1}^n |f_i|².

Parseval inequality
Substitution of this in Equation (5.2) yields the Parseval inequality:

∑_{i=1}^n |f_i|² ≤ ⟨f|f⟩.   (5.3)
This conclusion is true for arbitrarily large n and can be stated as follows:

5.1.7. Proposition. Let {|e_i⟩}_{i=1}^∞ be an infinite set of orthonormal vectors in a
Hilbert space ℋ. Let |f⟩ ∈ ℋ and define the complex numbers f_i = ⟨e_i|f⟩. Then
Bessel inequality
the Bessel inequality holds: ∑_{i=1}^∞ |f_i|² ≤ ⟨f|f⟩.
The Bessel inequality shows that the vector

∑_{i=1}^∞ f_i |e_i⟩ ≡ lim_{n→∞} ∑_{i=1}^n f_i |e_i⟩

complete orthonormal sequence of vectors
converges; that is, it has a finite norm. However, the inequality does not say whether
the vector converges to |f⟩. To make such a statement we need completeness:

5.1.8. Definition. A sequence of orthonormal vectors {|e_i⟩}_{i=1}^∞ in a Hilbert space
ℋ is called complete if the only vector in ℋ that is orthogonal to all the |e_i⟩ is the
zero vector.

This completeness property is the extra condition alluded to (in the footnote)
above, and is what is required to make a basis.
5.1.9. Proposition. Let {|e_i⟩}_{i=1}^∞ be an orthonormal sequence in ℋ. Then the
following statements are equivalent:
1. {|e_i⟩}_{i=1}^∞ is complete.
2. |f⟩ = ∑_{i=1}^∞ |e_i⟩⟨e_i|f⟩   ∀ |f⟩ ∈ ℋ.
3. ∑_{i=1}^∞ |e_i⟩⟨e_i| = 1.
4. ⟨f|g⟩ = ∑_{i=1}^∞ ⟨f|e_i⟩⟨e_i|g⟩   ∀ |f⟩, |g⟩ ∈ ℋ.
5. ‖f‖² = ∑_{i=1}^∞ |⟨e_i|f⟩|²   ∀ |f⟩ ∈ ℋ.

Proof. We shall prove the implications 1 ⇒ 2 ⇒ 3 ⇒ 4 ⇒ 5 ⇒ 1.
1 ⇒ 2: It is sufficient to show that the vector |ψ⟩ ≡ |f⟩ − ∑_{i=1}^∞ |e_i⟩⟨e_i|f⟩ is
orthogonal to all the |e_j⟩. Using ⟨e_j|e_i⟩ = δ_{ij}, we get

⟨e_j|ψ⟩ = ⟨e_j|f⟩ − ∑_{i=1}^∞ ⟨e_j|e_i⟩⟨e_i|f⟩ = ⟨e_j|f⟩ − ⟨e_j|f⟩ = 0.

2 ⇒ 3: Since |f⟩ = 1|f⟩ = ∑_{i=1}^∞ (|e_i⟩⟨e_i|)|f⟩ is true for all |f⟩ ∈ ℋ, we must
have 1 = ∑_{i=1}^∞ |e_i⟩⟨e_i|.
3 ⇒ 4: ⟨f|g⟩ = ⟨f|1|g⟩ = ⟨f|(∑_{i=1}^∞ |e_i⟩⟨e_i|)|g⟩ = ∑_{i=1}^∞ ⟨f|e_i⟩⟨e_i|g⟩.
4 ⇒ 5: Let |g⟩ = |f⟩ in statement 4 and recall that ⟨f|e_i⟩ = ⟨e_i|f⟩*.
5 ⇒ 1: Let |f⟩ be orthogonal to all the |e_i⟩. Then all the terms in the sum are
zero, implying that ‖f‖² = 0, which in turn gives |f⟩ = 0, because only the zero
vector has a zero norm. □
Parseval equality; generalized Fourier coefficients
The equality

‖f‖² = ⟨f|f⟩ = ∑_{i=1}^∞ |⟨e_i|f⟩|² = ∑_{i=1}^∞ |f_i|²,   f_i = ⟨e_i|f⟩,   (5.4)

is called the Parseval equality, and the complex numbers f_i are called generalized
Fourier coefficients. The relation

1 = ∑_{i=1}^∞ |e_i⟩⟨e_i|   (5.5)

completeness relation
is called the completeness relation.
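The Bessel and Parseval inequalities can be illustrated numerically (a sketch, not from the original text; the choices f(x) = x on (−π, π) and the orthonormal set e_n(x) = sin(nx)/√π are illustrative assumptions, and numpy is assumed):

```python
import numpy as np

# Partial sums of |f_n|^2 for f(x) = x with e_n(x) = sin(nx)/sqrt(pi):
# they increase toward <f|f> = 2*pi^3/3 but never exceed it (Bessel).
x = np.linspace(-np.pi, np.pi, 200001)
dx = x[1] - x[0]
f = x
norm2 = np.sum(f ** 2) * dx            # <f|f> = 2*pi^3/3 ≈ 20.67

partial = 0.0
for n in range(1, 51):
    fn = np.sum(f * np.sin(n * x) / np.sqrt(np.pi)) * dx   # f_n = <e_n|f>
    partial += fn ** 2
    assert partial <= norm2 + 1e-6      # Bessel inequality at every step
print(partial, norm2)
```

Because this particular f is odd, the sine functions alone capture it completely, so the partial sums actually approach the Parseval bound as more terms are added.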
basis for Hilbert spaces
5.1.10. Definition. A complete orthonormal sequence {|e_i⟩}_{i=1}^∞ in a Hilbert space
ℋ is called a basis of ℋ.
5.2 The Space of Square-Integrable Functions
Chapter 1 showed that the collection of all continuous functions defined on an
interval [a, b] forms a linear vector space. Example 5.1.5 showed that this space
is not complete. Can we enlarge this space to make it complete? Since we are
interested in an inner product as well, and since a natural inner product for func-
tions is defined in terms of integrals, we want to make sure that our functions
are integrable. However, integrability does not require continuity; it only requires
piecewise continuity. In this section we shall discuss conditions under which the
space of functions becomes complete. An important class of functions has already
been mentioned in Chapter 1. These functions satisfy the inner product given by

square-integrable functions
⟨g|f⟩ = ∫_a^b g*(x) f(x) w(x) dx.

If g(x) = f(x), we obtain

⟨f|f⟩ = ∫_a^b |f(x)|² w(x) dx.   (5.6)

Functions for which such an integral is defined are said to be square-integrable.
David Hilbert (1862-1943), the greatest mathematician of this
century, received his Ph.D. from the University of Königsberg
and was a member of the staff there from 1886 to 1895. In 1895
he was appointed to the chair of mathematics at the University
of Göttingen, where he continued to teach for the rest of his life.
Hilbert is one of that rare breed of late 19th-century math-
ematicians whose spectrum of expertise covered a wide range,
with formal set theory at one end and mathematical physics at the
other. He did superb work in geometry, algebraic geometry, alge-
braic number theory, integral equations, and operator theory. The
seminal two-volume book Methoden der mathematischen Physik
by R. Courant, still one of the best books on the subject, was greatly influenced by Hilbert.
Hilbert's work in geometry had the greatest influence in that area since Euclid. A system-
atic study of the axioms of Euclidean geometry led Hilbert to propose 21 such axioms, and
he analyzed their significance. He published Grundlagen der Geometrie in 1899, putting
geometry on a formal axiomatic foundation. His famous 23 Paris problems challenged (and
still today challenge) mathematicians to solve fundamental questions.
It was late in his career that Hilbert turned to the subject for which he is most famous
among physicists. A lecture by Erik Holmgren in 1901 on Fredholm's work on integral
equations, which had already been published in Sweden, aroused Hilbert's interest in the
subject. David Hilbert, having established himself as the leading mathematician of his time
by his work on algebraic numbers, algebraic invariants, and the foundations of geometry,
now turned his attention to integral equations. He says that an investigation of the subject
showed him that it was important for the theory of definite integrals, for the development of
arbitrary functions in series (of special functions or trigonometric functions), for the theory
of linear differential equations, for potential theory, and for the calculus of variations. He
wrote a series of six papers from 1904 to 1910 and reproduced them in his book Grundzüge
einer allgemeinen Theorie der linearen Integralgleichungen (1912). During the latter part
of this work he applied integral equations to problems of mathematical physics.
It is said that Hilbert discovered the correct field equation for general relativity in 1915
(one year before Einstein) using the variational principle, but never claimed priority.
Hilbert claimed that he worked best out-of-doors. He accordingly attached an 18-foot
blackboard to his neighbor's wall and built a covered walkway there so that he could work
outside in any weather. He would intermittently interrupt his pacing and his blackboard
computations with a few turns around the rest of the yard on his bicycle, or he would pull
some weeds, or do some garden trimming. Once, when a visitor called, the maid sent him
to the backyard and advised that if the master wasn't readily visible at the blackboard to
look for him up in one of the trees.
Highly gifted and highly versatile, David Hilbert radiated over mathematics a catching
optimism and a stimulating vitality that can only be called "the spirit of Hilbert." Engraved
on a stone marker set over Hilbert's grave in Göttingen are the master's own optimistic
words: "Wir müssen wissen. Wir werden wissen." ("We must know. We shall know.")
The space of square-integrable functions over the interval [a, b] is denoted by
L²_w(a, b). In this notation L stands for Lebesgue, who generalized the notion of
the ordinary Riemann integral to cases for which the integrand could be highly
discontinuous; 2 stands for the power of f(x) in the integral; a and b denote the
limits of integration; and w refers to the weight function (a strictly positive real-
valued function). When w(x) = 1, we use the notation L²(a, b). The significance
of L²_w(a, b) lies in the following theorem (for a proof, see [Reed 80, Chapter III]):

L²_w(a, b) is complete
5.2.1. Theorem. (Riesz-Fischer theorem) The space L²_w(a, b) is complete.

A complete infinite-dimensional inner product space was earlier defined to be
a Hilbert space. The following theorem shows that the number of Hilbert spaces
is severely restricted. (For a proof, see [Frie 82, p. 216].)

all Hilbert spaces are alike
5.2.2. Theorem. All infinite-dimensional complete inner product spaces are iso-
morphic to L²_w(a, b).
L²_w(a, b) is defined in terms of functions that satisfy Equation (5.6). Yet an inner
product involves integrals of the form ∫_a^b g*(x)f(x)w(x) dx. Are such integrals
well-defined and finite? Using the Schwarz inequality, which holds for any inner
product space, finite or infinite, one can show that the integral is defined. The
isomorphism of Theorem 5.2.2 makes the Hilbert space more tangible, because it
identifies the space with a space of functions, objects that are more familiar than
abstract vectors. Nonetheless, a faceless function is very little improvement over
an abstract vector. What is desirable is a set of concrete functions with which we
can calculate. The following theorem provides such functions (for a proof, see
[Simon 83, pp. 154-161]).
5.2.3. Theorem. (Stone-Weierstrass approximation theorem) The sequence of
functions (monomials) {x^k}, where k = 0, 1, 2, ..., forms a basis of L²_w(a, b).

Thus, any function f can be written as f(x) = ∑_{k=0}^∞ a_k x^k. Note that the
{x^k} are not orthonormal but are linearly independent. If we wish to obtain an
orthonormal, or simply orthogonal, linear combination of these vectors, we can
use the Gram-Schmidt process. The result will be certain polynomials, denoted by
C_n(x), that are orthogonal to one another and span L²_w(a, b).
Such orthogonal polynomials satisfy very useful recurrence relations, which
we now derive. In the following discussion p_{≤k}(x) denotes a generic polynomial
of degree less than or equal to k. For example, 3x⁵ − 4x² + 5, 2x + 1, −2.4x⁴ +
3x³ − x² + 6, and 2 can all be denoted by p_{≤5}(x), p_{≤8}(x), or p_{≤59}(x), because they
all have degrees less than or equal to 5, 8, and 59. Since a polynomial of degree
less than n can be written as a linear combination of C_k(x) with k < n, we have
the obvious property

∫_a^b C_n(x) p_{≤n−1}(x) w(x) dx = 0.   (5.7)

Let k_m and k'_m denote, respectively, the coefficients of x^m and x^{m−1} in C_m(x),
and let

h_m = ∫_a^b [C_m(x)]² w(x) dx.   (5.8)

The polynomial C_{n+1}(x) − (k_{n+1}/k_n) x C_n(x) has degree less than or equal to n,
and therefore can be expanded as a linear combination of the C_j(x):

C_{n+1}(x) − (k_{n+1}/k_n) x C_n(x) = ∑_{j=0}^n a_j C_j(x).   (5.9)
Take the inner product of both sides of this equation with C_m(x):

∫_a^b C_{n+1}(x) C_m(x) w(x) dx − (k_{n+1}/k_n) ∫_a^b x C_n(x) C_m(x) w(x) dx
= ∑_{j=0}^n a_j ∫_a^b C_j(x) C_m(x) w(x) dx.

The first integral on the LHS vanishes as long as m ≤ n; the second integral
vanishes if m ≤ n − 2 [if m ≤ n − 2, then xC_m(x) is a polynomial of degree at most
n − 1]. Thus, we have

∑_{j=0}^n a_j ∫_a^b C_j(x) C_m(x) w(x) dx = 0   for m ≤ n − 2.

The integral in the sum is zero unless j = m, by orthogonality. Therefore, the sum
reduces to

a_m ∫_a^b [C_m(x)]² w(x) dx = 0   for m ≤ n − 2.

Since the integral is nonzero, we conclude that a_m = 0 for m = 0, 1, 2, ..., n − 2,
and Equation (5.9) reduces to

C_{n+1}(x) − (k_{n+1}/k_n) x C_n(x) = a_{n−1} C_{n−1}(x) + a_n C_n(x).   (5.10)
It can be shown that if we define

α_n = k_{n+1}/k_n,   β_n = α_n (k'_{n+1}/k_{n+1} − k'_n/k_n),   γ_n = −(h_n/h_{n−1})(α_n/α_{n−1}),   (5.11)

a recurrence relation for orthogonal polynomials
then Equation (5.10) can be expressed as

C_{n+1}(x) = (α_n x + β_n) C_n(x) + γ_n C_{n−1}(x),   (5.12)

or

x C_n(x) = (1/α_n) C_{n+1}(x) − (β_n/α_n) C_n(x) − (γ_n/α_n) C_{n−1}(x).   (5.13)

Other recurrence relations, involving higher powers of x, can be obtained from
the one above. For example, a recurrence relation involving x² can be obtained
by multiplying both sides of Equation (5.13) by x and expanding each term of the
RHS using that same equation. The result will be an expansion of x² C_n(x) in
terms of C_{n−2}(x) through C_{n+2}(x).   (5.14)
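The recurrence (5.12) can be checked concretely for the Legendre polynomials of Example 5.2.5 below (w = 1 on [−1, 1], standardized so that P_n(1) = 1), for which k'_n = 0 and one finds α_n = (2n+1)/(n+1), β_n = 0, γ_n = −n/(n+1). A numerical sketch (not from the original text; numpy is assumed):

```python
import numpy as np
from numpy.polynomial import legendre as L

# Verify C_{n+1} = (alpha_n x + beta_n) C_n + gamma_n C_{n-1} for Legendre
# polynomials, where alpha_n = (2n+1)/(n+1), beta_n = 0, gamma_n = -n/(n+1).
x = np.linspace(-1, 1, 101)

def P(n):
    # P_n evaluated on the grid; [0]*n + [1] selects the n-th Legendre series
    return L.legval(x, [0] * n + [1])

for n in range(1, 6):
    alpha, gamma = (2 * n + 1) / (n + 1), -n / (n + 1)
    assert np.allclose(P(n + 1), alpha * x * P(n) + gamma * P(n - 1))
print("recurrence verified for n = 1..5")
```

This is the classical Bonnet recurrence (n+1)P_{n+1}(x) = (2n+1)xP_n(x) − nP_{n−1}(x) in disguise.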
5.2.4. Example. As an application of the recurrence relations above, let us evaluate

I₁ ≡ ∫_a^b x C_m(x) C_n(x) w(x) dx.

Substituting (5.13) in the integral gives

I₁ = (1/α_n) ∫_a^b C_m(x) C_{n+1}(x) w(x) dx − (β_n/α_n) ∫_a^b C_m(x) C_n(x) w(x) dx
− (γ_n/α_n) ∫_a^b C_m(x) C_{n−1}(x) w(x) dx.

We now use the orthogonality relations among the C_k(x) to obtain

I₁ = (1/α_n) δ_{m,n+1} ∫_a^b C_m²(x) w(x) dx − (β_n/α_n) δ_{mn} ∫_a^b C_m²(x) w(x) dx
− (γ_n/α_n) δ_{m,n−1} ∫_a^b C_m²(x) w(x) dx
= ((1/α_{m−1}) δ_{m,n+1} − (β_m/α_m) δ_{mn} − (γ_{m+1}/α_{m+1}) δ_{m,n−1}) h_m,
or

I₁ = { h_m/α_{m−1}            if m = n + 1,
       −β_m h_m/α_m           if m = n,
       −γ_{m+1} h_m/α_{m+1}   if m = n − 1,
       0                      otherwise.   ■
5.2.5. Example. Let us find the orthogonal polynomials forming a basis of L²(−1, +1),
which we denote by P_k(x), where k is the degree of the polynomial. Let P₀(x) = 1. To find
P₁(x), write P₁(x) = ax + b, and determine a and b in such a way that P₁(x) is orthogonal
to P₀(x):

0 = ∫_{−1}^{1} P₁(x) P₀(x) dx = ∫_{−1}^{1} (ax + b) dx = ½ax²|_{−1}^{1} + 2b = 2b.

So one of the coefficients, b, is zero. To find the other one, we need some standardization
procedure. We "standardize" P_k(x) by requiring that P_k(1) = 1 ∀k. For k = 1 this yields
a × 1 = 1, or a = 1, so that P₁(x) = x.
We can calculate P₂(x) similarly: Write P₂(x) = ax² + bx + c, impose the condition
that it be orthogonal to both P₁(x) and P₀(x), and enforce the standardization procedure.
All this will yield

0 = ∫_{−1}^{1} P₂(x) P₀(x) dx = (2/3)a + 2c,   0 = ∫_{−1}^{1} P₂(x) P₁(x) dx = (2/3)b,

and P₂(1) = a + b + c = 1. These three equations have the unique solution a = 3/2, b = 0,
c = −1/2. Thus, P₂(x) = ½(3x² − 1). These are the first three Legendre polynomials,
which are part of a larger group of polynomials to be discussed in Chapter 7. ■
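The Gram-Schmidt construction of this example can be mimicked numerically (an illustrative sketch, not the book's code; numpy is assumed, and the integrals are approximated by Riemann sums on a uniform grid):

```python
import numpy as np

# Gram-Schmidt on {1, x, x^2} with <f|g> = ∫_{-1}^{1} f(x) g(x) dx, then the
# standardization P_k(1) = 1, reproduces the first three Legendre polynomials.
x = np.linspace(-1, 1, 100001)
dx = x[1] - x[0]
inner = lambda f, g: np.sum(f * g) * dx

basis = [np.ones_like(x), x, x ** 2]
ortho = []
for v in basis:
    for u in ortho:
        v = v - inner(v, u) / inner(u, u) * u   # subtract projections
    ortho.append(v)

P = [p / p[-1] for p in ortho]   # p[-1] is the value at x = 1: enforce P_k(1) = 1
print(P[2][[0, 50000, 100000]])  # values at x = -1, 0, 1: close to 1, -0.5, 1
```

The orthogonalized x² comes out as x² − 1/3; dividing by its value 2/3 at x = 1 gives exactly (3x² − 1)/2, as derived above.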
5.2.1 Orthogonal Polynomials and Least Squares

The method of least squares is no doubt familiar to the reader. In the simplest
procedure, one tries to find a linear function that most closely fits a set of data.
By definition, "most closely" means that the sum of the squares of the differences
between the data points and the corresponding values of the linear function is
a minimum. More generally, one seeks the best polynomial fit to the data.
We shall consider a related topic, namely least-squares fitting of a given function
with polynomials. Suppose f(x) is a function defined on (a, b). We want to find a
polynomial that most closely approximates f. Write such a polynomial as p(x) =
∑_{k=0}^n a_k x^k, where the a_k's are to be determined such that

S(a₀, a₁, ..., a_n) ≡ ∫_a^b [f(x) − a₀ − a₁x − ... − a_n xⁿ]² dx

is a minimum. Differentiating S with respect to the a_k's and setting the result equal
to zero gives

0 = ∂S/∂a_j = ∫_a^b 2(−x^j) [f(x) − ∑_{k=0}^n a_k x^k] dx,
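Setting these derivatives to zero yields the linear "normal equations" ∑_k a_k ∫_a^b x^{j+k} dx = ∫_a^b x^j f(x) dx, which can be solved directly. A numerical sketch (the choices f(x) = e^x, (a, b) = (0, 1), and n = 2 are illustrative assumptions, and numpy is assumed; not from the original text):

```python
import numpy as np

# Solve the normal equations M a = c for the best quadratic fit of f(x) = e^x
# on (0, 1):  M_{jk} = ∫ x^{j+k} dx (computed exactly), c_j = ∫ x^j f(x) dx.
a, b, n = 0.0, 1.0, 2
x = np.linspace(a, b, 100001)
dx = x[1] - x[0]
f = np.exp(x)

M = np.array([[(b ** (j + k + 1) - a ** (j + k + 1)) / (j + k + 1)
               for k in range(n + 1)] for j in range(n + 1)])
c = np.array([np.sum(x ** j * f) * dx for j in range(n + 1)])
coeffs = np.linalg.solve(M, c)   # a_0, a_1, a_2 of the minimizing polynomial
print(coeffs)
```

The resulting quadratic tracks e^x to within about a percent over the whole interval, which is the sense in which it is the "closest" polynomial of its degree.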
(a) ‖f ± g‖ = ‖f‖ + ‖g‖.
(b) ‖f + g‖² + ‖f − g‖² = 2(‖f‖ + ‖g‖)².
(c) Using parts (a), (b), and Theorem 1.2.8, show that L¹(ℝ) is not an inner
product space. This shows that not all norms arise from an inner product.
5.6. Use Equation (5.10) to derive Equation (5.12). Hint: To find a_n, equate the
coefficients of xⁿ on both sides of Equation (5.10). To find a_{n−1}, multiply both
sides of Equation (5.10) by C_{n−1}(x)w(x) and integrate, using the definitions of k_n,
k'_n, and h_n.
5.7. Evaluate the integral ∫_a^b x² C_m(x) C_n(x) w(x) dx.
Additional Reading
1. Boccara, N. Functional Analysis, Academic Press, 1990. An application
oriented book with many abstract topics related to Hilbert spaces (e.g.,
Lebesgue measure) explained for a physics audience.
2. DeVito, C. Functional Analysis and Linear Operator Theory, Addison-
Wesley, 1990.
3. Reed, M., and Simon, B. Functional Analysis, Academic Press, 1980. Coau-
thored by a mathematical physicist (B.S.), this first volume of a four-volume
encyclopedic treatise on functional analysis and Hilbert spaces has many
examples and problems to help the reader comprehend the rather abstract
presentation.
4. Zeidler, E. Applied Functional Analysis, Springer-Verlag, 1995. Another
application-oriented book on Hilbert spaces suitable for a physics audience.
6
Generalized Functions
Once we allow the number of dimensions to be infinite, we open the door for
numerous possibilities that are not present in the finite case. One such possibility
arises because of the variety of infinities. We have encountered two types of infinity
in Chapter 0, the countable infinity and the uncountable infinity. The paradigm of
the former is the "number" of integers, and that of the latter is the "number" of
real numbers. The nature of the dimensionality of a vector space is reflected in the
components of a general vector, which has a finite number of components in a
finite-dimensional vector space, a countably infinite number of components in
an infinite-dimensional vector space with a countable basis, and an uncountably
infinite number of components in an infinite-dimensional vector space with no
countable basis.
6.1 Continuous Index
To gain an understanding of the nature of, and differences between, the three
types of vector spaces mentioned above, it is convenient to think of components
as functions of a "counting set." Thus, the components f_i of a vector |f⟩ in an
N-dimensional vector space can be thought of as values of a function f defined
on the finite set {1, 2, ..., N}, and to emphasize such functional dependence, we
write f(i) instead of f_i. Similarly, the components f_i of a vector |f⟩ in a Hilbert
space with the countable basis B = {|e_i⟩}_{i=1}^∞ can be thought of as values of a
function f : ℕ → ℂ, where ℕ is the (infinite) set of natural numbers. The next
step is to allow the counting set to be uncountable, i.e., a continuum such as the
real numbers or an interval thereof. This leads to a "component" of the form f(x)
corresponding to a function f : ℝ → ℂ. What about the vectors themselves?
What sort of a basis gives rise to such components?
Because of the isomorphism of Theorem 5.2.2, we shall concentrate on
L²_w(a, b). In keeping with our earlier notation, let {|e_x⟩}_{x∈ℝ} denote the elements
of an orthonormal set and interpret f(x) as ⟨e_x|f⟩. The inner product of L²_w(a, b)
can now be written as

⟨g|f⟩ = ∫_a^b g*(x) f(x) w(x) dx = ∫_a^b ⟨g|e_x⟩⟨e_x|f⟩ w(x) dx
= ⟨g| (∫_a^b |e_x⟩ w(x) ⟨e_x| dx) |f⟩.

The last line suggests writing

∫_a^b |e_x⟩ w(x) ⟨e_x| dx = 1.

In the physics literature the "e" is ignored, and one writes |x⟩ for |e_x⟩. Hence, we
obtain the completeness relation for a continuous index:

completeness relation for a continuous index
∫_a^b |x⟩ w(x) ⟨x| dx = 1,   or   ∫_a^b |x⟩⟨x| dx = 1,   (6.1)

where in the second integral, w(x) is set equal to unity. We also have

|f⟩ = (∫_a^b |x⟩ w(x) ⟨x| dx) |f⟩ = ∫_a^b f(x) w(x) |x⟩ dx,   (6.2)

which shows how to expand a vector |f⟩ in terms of the |x⟩'s.
Take the inner product of (6.2) with ⟨x'|:

⟨x'|f⟩ = f(x') = ∫_a^b f(x) w(x) ⟨x'|x⟩ dx,

where x' is assumed to lie in the interval (a, b); otherwise f(x') = 0 by definition.
This equation, which holds for arbitrary f, tells us immediately that w(x)⟨x'|x⟩
is no ordinary function of x and x'. For instance, suppose f(x') = 0. Then the
result of integration is always zero, regardless of the behavior of f at other points.
Clearly, there is an infinitude of functions that vanish at x', yet all of them give the
same integral! Pursuing this line of argument more quantitatively, one can show
that w(x)⟨x'|x⟩ = 0 if x ≠ x', w(x)⟨x|x⟩ = ∞, w(x)⟨x'|x⟩ is an even function
of x − x', and ∫_a^b w(x)⟨x'|x⟩ dx = 1. The proof is left as a problem. The reader
Dirac delta function
may recognize this as the Dirac delta function

δ(x − x') = w(x)⟨x'|x⟩,   (6.3)

which, for a function f defined on the interval (a, b), has the following property:¹

∫_a^b f(x) δ(x − x') dx = { f(x') if x' ∈ (a, b),
                             0     if x' ∉ (a, b).   (6.4)

¹For an elementary discussion of the Dirac delta function with many examples of its application, see [Hass 99].
Figure 6.1 The Gaussian bell-shaped curve approaches the Dirac delta function as the
width of the curve approaches zero. The value of ε is 1 for the dashed curve, 0.25 for the
heavy curve, and 0.05 for the light curve.
Written in the form ⟨x'|x⟩ = δ(x − x')/w(x), Equation (6.3) is the generalization
of the orthonormality relation of vectors to the case of a continuous index.
The Dirac delta function is anything but a "function." Nevertheless, there is
a well-developed branch of mathematics, called generalized function theory or
functional analysis, studying it and many other functions like it in a highly rigorous
fashion. We shall only briefly explore this territory of mathematics in the next
section. At this point we simply mention the fact that the Dirac delta function
can be represented as the limit of certain sequences of ordinary functions. The
following three examples illustrate some of these representations.
6.1.1. Example. Consider a Gaussian curve whose width approaches zero at the same
time that its height approaches infinity in such a way that its area remains constant.
In the limit, we obtain the Dirac delta function. In fact, we have

δ(x − x') = lim_{ε→0} (1/√(πε)) e^{−(x−x')²/ε}.

In the limit of ε → 0, the height of this Gaussian goes to infinity while its width goes
to zero (see Figure 6.1). Furthermore, for any nonzero value of ε, we can easily verify that

∫_{−∞}^{∞} (1/√(πε)) e^{−(x−x')²/ε} dx = 1.

This relation is independent of ε and therefore still holds in the limit ε → 0. The limit of
the Gaussian behaves like the Dirac delta function. ■
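This limiting behavior is easy to probe numerically. The sketch below (the grid and the test function f(x) = cos x are my choices, not the text's) integrates the unit-area Gaussian against f and watches the result approach f(x') as ε shrinks:

```python
# Numerical check of the sifting property: integrating the unit-area Gaussian
# (1/sqrt(pi*eps)) * exp(-(x - xp)**2 / eps) against f approaches f(xp).
import numpy as np

def gaussian_delta(x, xp, eps):
    """Gaussian approximation to delta(x - x'); its area is 1 for every eps."""
    return np.exp(-(x - xp) ** 2 / eps) / np.sqrt(np.pi * eps)

f, xp = np.cos, 0.3                      # test function and the sifted point x'
x = np.linspace(-10.0, 10.0, 400001)
dx = x[1] - x[0]

for eps in (1.0, 0.1, 0.001):
    integral = (gaussian_delta(x, xp, eps) * f(x)).sum() * dx
    print(eps, integral)                 # tends to f(0.3) = cos(0.3) as eps -> 0
```

With ε = 0.001 the integral already agrees with cos(0.3) ≈ 0.9553 to about three decimal places.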
6.1.2. Example. Consider the function D_T(x − x') defined as

D_T(x − x') ≡ (1/2π) ∫_{−T}^{T} e^{i(x−x')t} dt.
Figure 6.2 The function sin(Tx)/(πx) also approaches the Dirac delta function as the width
of the curve approaches zero. The value of T is 0.5 for the dashed curve, 2 for the heavy
curve, and 15 for the light curve.
The integral is easily evaluated, with the result

D_T(x − x') = (1/2π) [ e^{i(x−x')t} / (i(x − x')) ]│_{t=−T}^{T} = sin T(x − x') / (π(x − x')).

The graph of D_T(x − 0) as a function of x for various values of T is shown in Figure 6.2.
Note that the width of the curve decreases as T increases. The area under the curve can be
calculated:

∫_{−∞}^{∞} D_T(x − x') dx = (1/π) ∫_{−∞}^{∞} [sin T(x − x') / (x − x')] dx
                          = (1/π) ∫_{−∞}^{∞} (sin y / y) dy = 1.    (6.5)
Figure 6.2 shows that D_T(x − x') becomes more and more like the Dirac delta function
as T gets larger and larger. In fact, we have

δ(x − x') = lim_{T→∞} (1/π) sin T(x − x') / (x − x').    (6.6)

To see this, we note that for any finite T we can write

D_T(x − x') = (T/π) · sin T(x − x') / (T(x − x')).    (6.7)

Furthermore, for values of x that are very close to x', T(x − x') → 0 and

sin T(x − x') / (T(x − x')) → 1.

Thus, for such values of x and x', we have D_T(x − x') ≈ T/π, which is large when T
is large. This is as expected of a delta function: δ(0) = ∞. On the other hand, the width
of D_T(x − x') around x' is given, roughly, by the distance between the points at which
D_T(x − x') drops to zero: T(x − x') = ±π, or x − x' = ±π/T. This width is roughly
Δx = 2π/T, which goes to zero as T grows. Again, this is as expected of the delta function. ■

step function, or θ function
δ function as derivative of θ function
The preceding example suggests another representation of the Dirac delta function:

δ(x − x') = (1/2π) ∫_{−∞}^{∞} e^{i(x−x')t} dt.
6.1.3. Example. A third representation of the Dirac delta function involves the step
function θ(x − x'), which is defined as

θ(x − x') ≡ { 0   if x < x',
              1   if x > x'

and is discontinuous at x = x'. We can approximate this step function by many continuous
functions, such as T_ε(x − x') defined by

T_ε(x − x') ≡ { 0                    if x ≤ x' − ε,
                (1/2ε)(x − x' + ε)   if x' − ε ≤ x ≤ x' + ε,
                1                    if x ≥ x' + ε,

where ε is a small positive number, as shown in Figure 6.3. It is clear that

θ(x − x') = lim_{ε→0} T_ε(x − x').

Now let us consider the derivative of T_ε(x − x') with respect to x:

(dT_ε/dx)(x − x') = { 0        if x < x' − ε,
                      1/(2ε)   if x' − ε < x < x' + ε,
                      0        if x > x' + ε.

We note that the derivative is not defined at x = x' − ε and x = x' + ε, and that dT_ε/dx
is zero everywhere except when x lies in the interval (x' − ε, x' + ε), where it is equal to
1/(2ε) and goes to infinity as ε → 0. Here again we see signs of the delta function. In fact,
we also note that

∫_{−∞}^{∞} (dT_ε/dx) dx = ∫_{x'−ε}^{x'+ε} (dT_ε/dx) dx = ∫_{x'−ε}^{x'+ε} (1/2ε) dx = 1.

It is not surprising, then, to find that lim_{ε→0} (dT_ε/dx)(x − x') = δ(x − x'). Assuming that the
interchange of the order of differentiation and the limiting process is justified, we obtain
the important identity

(d/dx) θ(x − x') = δ(x − x'). ■
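A numerical sketch of the same observations (the grid, ε, and the test function sin x are my choices): the box-shaped derivative has unit area for every ε and sifts out the value of a test function at x'.

```python
# dT_eps/dx is a box of height 1/(2*eps) on (x'-eps, x'+eps) and 0 outside;
# its area is 1 and, for small eps, it extracts f(x') from an integral.
import numpy as np

def dT_eps(x, xp, eps):
    """Derivative of the ramp T_eps: a box of height 1/(2*eps), unit area."""
    return np.where(np.abs(x - xp) < eps, 1.0 / (2.0 * eps), 0.0)

xp, eps = 0.5, 1e-3
x = np.linspace(-5.0, 5.0, 2_000_001)
dx = x[1] - x[0]

area = dT_eps(x, xp, eps).sum() * dx                  # close to 1
sift = (dT_eps(x, xp, eps) * np.sin(x)).sum() * dx    # close to sin(0.5)
print(area, sift)
```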
Figure 6.3 The step function, or θ-function, shown in the figure has the Dirac delta
function as its derivative.
Now that we have some understanding of one continuous index, we can generalize
the results to several continuous indices. In the earlier discussion we looked at
f(x) as the xth component of some abstract vector |f⟩. For functions of n variables,
we can think of f(x₁, ..., x_n) as the component of an abstract vector |f⟩ along a
basis vector |x₁, ..., x_n⟩.² This basis is a direct generalization of one continuous
index to n. Then f(x₁, ..., x_n) is defined as f(x₁, ..., x_n) = ⟨x₁, ..., x_n|f⟩. If
the region of integration is denoted by Ω, and we use the abbreviations r = (x₁, ..., x_n)
and |r⟩ = |x₁, ..., x_n⟩, then we can write

|f⟩ = ∫_Ω dⁿx f(r) w(r) |r⟩,        ∫_Ω dⁿx |r⟩ w(r) ⟨r| = 1,
f(r') = ∫_Ω dⁿx f(r) w(r) ⟨r'|r⟩,   ⟨r'|r⟩ w(r) = δ(r − r'),    (6.8)

where dⁿx is the "volume" element and Ω is the region of integration of interest.
For instance, if the region of definition of the functions under consideration is
the surface of the unit sphere, then [with w(r) = 1], one gets

∫₀^{2π} dφ ∫₀^{π} sin θ dθ |θ, φ⟩⟨θ, φ| = 1.    (6.9)

²Do not confuse this with an n-dimensional vector. In fact, the dimension is n-fold infinite: each x_i counts one infinite set of
numbers!
This will be used in our discussion of spherical harmonics in Chapter 12.
An important identity involving the three-dimensional Dirac delta function comes
from potential theory. This is (see [Hass 99] for a discussion of this equation)

∇² ( 1/|r − r'| ) = −4π δ(r − r').    (6.10)
6.2 Generalized Functions
Paul Adrien Maurice Dirac discovered the delta function in the late 1920s while
investigating scattering problems in quantum mechanics. This "function" seemed
to violate most properties of other functions known to mathematicians at the time.
Furthermore, the derivative of the delta function, δ'(x − x'), is such that for any
ordinary function f(x),

∫_{−∞}^{∞} f(x)δ'(x − x') dx = −∫_{−∞}^{∞} f'(x)δ(x − x') dx = −f'(x').

We can define δ'(x − x') by this relation. In addition, we can define the derivative
of any function, including discontinuous functions, at any point (including points
of discontinuity, where the usual definition of derivative fails) by this relation. That
is, if φ(x) is a "bad" function whose derivative is not defined at some point(s), and
f(x) is a "good" function, we can define the derivative of φ(x) by

∫_{−∞}^{∞} f(x)φ'(x) dx ≡ −∫_{−∞}^{∞} f'(x)φ(x) dx.

The integral on the RHS is well-defined.
Functions such as the Dirac delta function and its derivatives of all orders are not
functions in the traditional sense. What is common among all of them is that in most
applications they appear inside an integral, and we saw in Chapter 1 that integration
can be considered as a linear functional on the space of continuous functions. It
is therefore natural to describe such functions in terms of linear functionals. This
idea was picked up by Laurent Schwartz in the 1950s, who developed it into a new
branch of mathematics called generalized functions, or distributions.

A distribution is a mathematical entity that appears inside an integral in
conjunction with a well-behaved test function (which we assume to depend on n
variables) such that the result of integration is a well-defined number. Depending
on the type of test function used, different kinds of distributions can be defined.
If we want to include the Dirac delta function and its derivatives of all orders,
then the test functions must be infinitely differentiable, that is, they must be C^∞
functions on ℝⁿ (or ℂⁿ). Moreover, in order for the theory of distributions to be
mathematically feasible, all the test functions must vanish outside a finite "volume"
of ℝⁿ (or ℂⁿ).³ One common notation for such functions is C_F^∞(ℝⁿ) or C_F^∞(ℂⁿ)

³Such functions are said to be of compact support.
generalized functions
and distributions
defined
(F stands for "finite"). The defining property of distributions concerns the way
they combine with test functions to give a number. The test functions used clearly
form a vector space over ℝ or ℂ. In this vector-space language, distributions are
linear functionals. The linearity is a simple consequence of the properties of the
integral. We therefore have the following definition of a distribution.

6.2.1. Definition. A distribution, or generalized function, is a continuous⁴ linear
functional on the space C_F^∞(ℝⁿ) or C_F^∞(ℂⁿ). If f ∈ C_F^∞ and φ is a distribution,
then φ[f] = ∫_{−∞}^{∞} φ(r)f(r) dⁿx.
Another notation used in place of φ[f] is ⟨φ, f⟩. This is more appealing not
only because φ is linear, in the sense that φ[αf + βg] = αφ[f] + βφ[g], but
also because the set of all such linear functionals forms a vector space; that is,
the linear combination of the φ's is also defined. Thus, ⟨φ, f⟩ suggests a mutual
"democracy" for both f's and φ's.

We now have a shorthand way of writing integrals. For instance, if δ_a represents
the Dirac delta function δ(x − a), with an integration over x understood,
then ⟨δ_a, f⟩ = f(a). Similarly, ⟨δ'_a, f⟩ = −f'(a), and for linear combinations,
⟨αδ_a + βδ'_a, f⟩ = αf(a) − βf'(a).
6.2.2. Example. An ordinary (continuous) function g can be thought of as a special case
of a distribution. The linear functional g : C_F^∞(ℝ) → ℝ is simply defined by ⟨g, f⟩ ≡
g[f] = ∫_{−∞}^{∞} g(x)f(x) dx. ■
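The shorthand ⟨φ, f⟩ translates almost literally into code: a distribution is a callable that eats a test function and returns a number, and linear combinations come for free. A sketch (the class and helper names are mine, not the book's):

```python
# Distributions as linear functionals on test functions.
import numpy as np

class Distribution:
    """Wraps a map f -> number; the pairing plays the role of <phi, f>."""
    def __init__(self, pairing):
        self.pairing = pairing
    def __call__(self, f):
        return self.pairing(f)
    def __add__(self, other):
        return Distribution(lambda f: self(f) + other(f))
    def __rmul__(self, scalar):
        return Distribution(lambda f: scalar * self(f))

def delta(a):
    """<delta_a, f> = f(a)."""
    return Distribution(lambda f: f(a))

def delta_prime(a, h=1e-6):
    """<delta'_a, f> = -f'(a), with f' taken by a central difference."""
    return Distribution(lambda f: -(f(a + h) - f(a - h)) / (2.0 * h))

# <alpha*delta_a + beta*delta'_a, f> = alpha*f(a) - beta*f'(a)
phi = 2.0 * delta(1.0) + 3.0 * delta_prime(1.0)
print(phi(np.sin))    # equals 2*sin(1) - 3*cos(1) up to the difference step
```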
6.2.3. Example. An interesting application of distributions (generalized functions) occurs
when the notion of density is generalized to include not only (smooth) volume densities,
but also point-like, linear, and surface densities.

A point charge q located at r₀ can be thought of as having a charge density ρ(r) = qδ(r −
r₀). In the language of linear functionals, we interpret ρ as a distribution, ρ : C_F^∞(ℝ³) → ℝ,
which for an arbitrary function f gives

ρ[f] = ⟨ρ, f⟩ = qf(r₀).    (6.11)

The delta function character of ρ can be detected from this equation by recalling that the
LHS is

ρ[f] = ∫ ρ(r) f(r) d³x = lim_{ΔV_i→0} Σ_i ρ(r_i) f(r_i) ΔV_i.

On the RHS of this equation, the only volume element that contributes is the one that
contains the point r₀; all the rest contribute zero. As ΔV_i → 0, the only way that the
RHS can give a nonzero number is for ρ(r₀)f(r₀) to be infinite. Since f is a well-behaved
function, ρ(r₀) must be infinite, implying that ρ(r) acts as a delta function. This shows
that the definition of Equation (6.11) leads to a delta-function behavior for ρ. Similarly for
linear and surface densities. ■
⁴See [Zeidler 95], pp. 27, 156–160, for a formal definition of the continuity of linear functionals.
"The amount of
theoretical ground
one has tocover
before being able to
solve problems of
real practical value is
rather large, but this
circumstance isan
inevitable
consequence ofthe
fundamental part
played by
transformation
theory and is likely to
become more
pronounced inthe
theoretical physics of
thefuture."
P.A.M. Dirac (1930)
The example above and Problems 6.5 and 6.6 suggest that a distribution that
confines an integral to a lower-dimensional space must have a delta function in its
definition.
"Physical Laws should have mathematical beauty." This statement
was Dirac's response to the question of his philosophy of physics,
posed to him in Moscow in 1955. He wrote it on a blackboard that
is still preserved today.
Paul Adrien Maurice Dirac (1902-1984), was born in 1902
in Bristol, England, of a Swiss, French-speaking father and an
English mother. His father, a taciturn man who refused to receive
friends at home, enforced young Paul's silence by requiring that
only French be spoken at the dinner table. Perhaps this explains
Dirac's later disinclination toward collaboration and his general
tendency to be a loner in most aspects of his life. The fundamental
nature of his work made the involvement of students difficult, so perhaps Dirac's personality
was well-suited to his extraordinary accomplishments.
Dirac went to Merchant Venturer's School, the public school where his father taught
French, and while there displayed great mathematical abilities. Upon graduation, he fol-
lowed in his older brother's footsteps and went to Bristol University to study electrical
engineering. He was 19 when he graduated Bristol University in 1921. Unable to find a
suitable engineering position due to the economic recession that gripped post-World War I
England, Dirac accepted a fellowship to study mathematics at Bristol University. This fel-
lowship, together with a grant from the Department of Scientific and Industrial Research,
made it possible for Dirac to go to Cambridge as a research student in 1923. At Cambridge
Dirac was exposed to the experimental activities of the Cavendish Laboratory, and he be-
came a member of the intellectual circle over which Rutherford and Fowler presided. He
took his Ph.D. in 1926 and was elected in 1927 as a fellow. His appointment as university
lecturer came in 1929. He assumed the Lucasian professorship following Joseph Larmor
in 1932 and retired from it in 1969. Two years later he accepted a position at Florida State
University where he lived out his remaining years. The FSU library now carries his name.
In the late 1920s the relentless march of ideas and discoveries had carried physics to a
generally accepted relativistic theory of the electron. Dirac, however, was dissatisfied with
the prevailing ideas and, somewhat in isolation, sought for a better formulation. By 1928
he succeeded in finding an equation, the Dirac equation, that accorded with his own ideas
and also fit most of the established principles of the time. Ultimately, this equation, and
the physical theory behind it, proved to be one of the great intellectual achievements of the
period. It was particularly remarkable for the internal beauty of its mathematical structure,
which not only clarified previously mysterious phenomena such as spin and the Fermi-
Dirac statistics associated with it, but also predicted the existence ofan electron-like particle
of negative energy, the antielectron, or positron,and, more recently, it has come to playa
. role of great importance in modem mathematics, particularly in the interrelations between
topology, geometry, and analysis. Heisenberg characterized the discovery of antimatter by
Dirac as "the most decisive discovery in connection with the properties or the nature of
elementary particles.... This discovery of particles and antiparticles by Dirac... changed
our whole outlook on atomic physics completely." One of the interesting implications of
his work that predicted the positron was the prediction of a magnetic monopole. Dirac won
the Nobel Prize in 1933 for this work.

Dirac is not only one of the chief authors of quantum mechanics, but he is also the
creator of quantum electrodynamics and one of the principal architects of quantum field
theory. While studying the scattering theory of quantum particles, he invented the (Dirac)
delta function; in his attempt at quantizing the general theory of relativity, he founded
constrained Hamiltonian dynamics, which is one of the most active areas of theoretical
physics research today. One of his greatest contributions is the invention of the bra, ⟨ |, and
the ket, | ⟩.
While at Cambridge, Dirac did not accept many research students. Those who worked
with him generally thought that he was a good supervisor, but one who did not spend
much time with his students. A student needed to be extremely independent to work under
Dirac. One such student was Dennis Sciama, who later became the supervisor of Stephen
Hawking, the current holder of the Lucasian chair. Salam and Wigner, in their preface to the
Festschrift that honors Dirac on his seventieth birthday and commemorates his contributions
to quantum mechanics, succinctly assessed the man:

Dirac is one of the chief creators of quantum mechanics.... Posterity will
rate Dirac as one of the greatest physicists of all time. The present generation
values him as one of its greatest teachers.... On those privileged to know
him, Dirac has left his mark... by his human greatness. He is modest,
affectionate, and sets the highest possible standards of personal and scientific
integrity. He is a legend in his own lifetime and rightly so.

(Taken from Schweber, S. S. "Some chapters for a history of quantum field theory: 1938–
1952," in Relativity, Groups, and Topology II, vol. 2, B. S. DeWitt and R. Stora, eds.,
North-Holland, Amsterdam, 1984.)
We have seen that the delta function can be thought of as the limit of an ordinary
function. This idea can be generalized.

6.2.4. Definition. Let {φ_n(x)} be a sequence of functions such that

lim_{n→∞} ∫_{−∞}^{∞} φ_n(x) f(x) dx

exists for all f ∈ C_F^∞(ℝ). Then the sequence is said to converge to the distribution
φ, defined by

⟨φ, f⟩ = lim_{n→∞} ∫_{−∞}^{∞} φ_n(x) f(x) dx    ∀ f.

This convergence is denoted by φ_n → φ.

For example, it can be verified that

(n/√π) e^{−n²x²} → δ(x)    and    (1 − cos nx)/(πnx²) → δ(x),

and so on. The proofs are left as exercises.
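Definition 6.2.4 suggests an immediate numerical experiment (the grid and the test function below are my choices): pair each φ_n with a fixed f and watch the pairing drift toward f(0) = 1.

```python
# Weak convergence of two delta sequences: <phi_n, f> -> f(0) as n grows.
import numpy as np

x = np.linspace(-10.0, 10.0, 2_000_000)   # even point count keeps x = 0 off the grid
dx = x[1] - x[0]
f = lambda t: np.exp(-t**2) * np.cos(t)   # smooth, rapidly decaying, f(0) = 1

def pair(phi_vals):
    """Rectangle-rule approximation of the pairing integral."""
    return float((phi_vals * f(x)).sum() * dx)

for n in (1, 10, 100):
    gauss = n / np.sqrt(np.pi) * np.exp(-(n * x) ** 2)
    cos_seq = (1.0 - np.cos(n * x)) / (np.pi * n * x**2)
    print(n, pair(gauss), pair(cos_seq))  # both columns approach f(0) = 1
```

The second sequence converges more slowly (its tails fall off only like 1/x²), which the printed columns make visible.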
derivative of a distribution
6.2.5. Definition. The derivative of a distribution φ is another distribution φ'
defined by ⟨φ', f⟩ = −⟨φ, f'⟩ ∀ f ∈ C_F^∞.
6.2.6. Example. We can combine the last two definitions to show that if the functions θ_n
are defined as

θ_n(x) ≡ { 0             if x < −1/n,
           (nx + 1)/2    if −1/n ≤ x ≤ 1/n,
           1             if x ≥ 1/n,

then θ_n'(x) → δ(x).

We write the definition of the derivative, ⟨θ_n', f⟩ = −⟨θ_n, f'⟩, in terms of integrals:

∫_{−∞}^{∞} θ_n'(x) f(x) dx = −∫_{−∞}^{∞} θ_n(x) (df/dx) dx = −∫_{−∞}^{∞} θ_n(x) df
    = −( ∫_{−∞}^{−1/n} θ_n(x) df + ∫_{−1/n}^{1/n} θ_n(x) df + ∫_{1/n}^{∞} θ_n(x) df )
    = −( 0 + ∫_{−1/n}^{1/n} ((nx + 1)/2) df + ∫_{1/n}^{∞} df )
    = −(n/2) ∫_{−1/n}^{1/n} x df − (1/2) ∫_{−1/n}^{1/n} df − ∫_{1/n}^{∞} df
    = −(n/2) ( x f(x)│_{−1/n}^{1/n} − ∫_{−1/n}^{1/n} f(x) dx )
      − (1/2) ( f(1/n) − f(−1/n) ) − f(∞) + f(1/n).

For large n, we have 1/n ≈ 0 and f(±1/n) ≈ f(0). Thus,

∫_{−∞}^{∞} θ_n'(x) f(x) dx ≈ −(n/2) ( (1/n) f(1/n) + (1/n) f(−1/n) − (2/n) f(0) ) + f(0) ≈ f(0).

The approximation becomes equality in the limit n → ∞. Thus,

lim_{n→∞} ∫_{−∞}^{∞} θ_n'(x) f(x) dx = f(0) = ⟨δ₀, f⟩  ⟹  θ_n' → δ.

Note that f(∞) = 0 because of the assumption that all functions must vanish outside a
finite volume. ■
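The computation above can be mimicked numerically (the test function is my choice): evaluate −⟨θ_n, f'⟩ on a grid and watch it approach f(0).

```python
# <theta_n', f> = -<theta_n, f'> -> f(0): the ramp's distributional derivative
# acts like a delta function at the origin.
import numpy as np

def theta_n(x, n):
    """The ramp: 0 below -1/n, (n*x + 1)/2 in between, 1 above 1/n."""
    return np.clip((n * x + 1.0) / 2.0, 0.0, 1.0)

f = lambda t: np.exp(-(t - 0.2) ** 2)    # smooth test function, f(0) = exp(-0.04)
fp = lambda t: -2.0 * (t - 0.2) * f(t)   # its derivative
x = np.linspace(-50.0, 50.0, 2_000_001)
dx = x[1] - x[0]

for n in (1, 10, 1000):
    pairing = -(theta_n(x, n) * fp(x)).sum() * dx
    print(n, pairing)                    # approaches f(0) = exp(-0.04)
```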
6.3 Problems
6.1. Write a density function for two point charges q₁ and q₂ located at r = r₁
and r = r₂, respectively.

6.2. Write a density function for four point charges q₁ = q, q₂ = −q, q₃ = q,
and q₄ = −q, located at the corners of a square of side 2a, lying in the xy-plane,
whose center is at the origin and whose first corner is at (a, a).
6.3. Show that δ(f(x)) = (1/|f'(x₀)|) δ(x − x₀), where x₀ is a root of f and x is
confined to values close to x₀. Hint: Make a change of variable to y = f(x).
6.4. Show that

δ(f(x)) = Σ_k (1/|f'(x_k)|) δ(x − x_k),

where the x_k's are all the roots of f in the interval on which f is defined.
6.5. Define the distribution ρ : C^∞(ℝ³) → ℝ by

⟨ρ, f⟩ = ∫_S σ(r) f(r) da(r),

where σ(r) is a smooth function on a smooth surface S in ℝ³. Show that ρ(r) is
zero if r is not on S and infinite if r is on S.
6.6. Define the distribution ρ : C^∞(ℝ³) → ℝ by

⟨ρ, f⟩ = ∫_C λ(r) f(r) dl(r),

where λ(r) is a smooth function on a smooth curve C in ℝ³. Show that ρ(r) is
zero if r is not on C and infinite if r is on C.
6.7. Express the three-dimensional Dirac delta function as a product of three one-
dimensional delta functions involving the coordinates in
(a) cylindrical coordinates,
(b) spherical coordinates,
(c) general curvilinear coordinates.
Hint: The Dirac delta function in ℝ³ satisfies ∫∫∫ δ(r) d³x = 1.
6.8. Show that J""oo 8'(x)f(x) dx = - f'(O) where 8'(x) sa !x8(x).
6.9. Evaluate the following integrals:

(a) ∫_{−∞}^{∞} δ(x² − 5x + 6)(3x² − 7x + 2) dx.    (b) ∫_{−∞}^{∞} δ(x² − π²) cos x dx.

(c) ∫_{0.5}^{∞} δ(sin πx)(~)ˣ dx.    (d) ∫_{−∞}^{∞} δ(e^{−x²}) ln x dx.

Hint: Use the result of Problem 6.4.
6.10. Consider |x| as a generalized function and find its derivative.
6.11. Let ψ ∈ C^∞(ℝⁿ) be a smooth function on ℝⁿ, and let φ be a distribution.
Show that ψφ is also a distribution. What is the natural definition for ψφ? What is
(ψφ)', the derivative of ψφ?
6.12. Show that each of the following sequences of functions approaches δ(x) in
the sense of Definition 6.2.4:

(a) (n/√π) e^{−n²x²}.    (b) (1 − cos nx)/(πnx²).    (d) sin(nx)/(πx).

Hint: Approximate φ_n(x) for large n and x ≈ 0, and then evaluate the appropriate
integral.
6.13. Show that ½(1 + tanh nx) → θ(x) as n → ∞.

6.14. Show that xδ'(x) = −δ(x).
Additional Reading
1. Hassani, S. Mathematical Methods, Springer-Verlag, 2000. An elementary
treatment of the Dirac delta function with many examples drawn from
mechanics and electromagnetism.
2. Rudin, W. Functional Analysis, McGraw-Hill, 1991. Part II of this mathematical
but (for those with a strong undergraduate mathematics background)
very readable book is devoted to the theory of distributions.
3. Reed, M. and Simon, B. Functional Analysis, Academic Press, 1980.
7
Classical Orthogonal Polynomials
The last example of Chapter 5 discussed only one of the many types of the so-called
classical orthogonal polynomials. Historically, these polynomials were discovered
as solutions to differential equations arising in various physical problems.
Such polynomials can be produced by starting with 1, x, x², ... and employing
the Gram-Schmidt process. However, there is a more elegant, albeit less general,
approach that simultaneously studies most polynomials of interest to physicists.
We will employ this approach.¹
7.1 General Properties

Most relevant properties of the polynomials of interest are contained in

7.1.1. Theorem. Consider the functions

F_n(x) = (1/w(x)) (dⁿ/dxⁿ)(w sⁿ)    for n = 0, 1, 2, ...,    (7.1)

where

1. F₁(x) is a first-degree polynomial in x,
2. s(x) is a polynomial in x of degree less than or equal to 2 with only real
roots,
3. w(x) is a strictly positive function, integrable in the interval (a, b), that
satisfies the boundary conditions w(a)s(a) = 0 = w(b)s(b).

¹This approach is due to F. G. Tricomi [Tric 55]. See also [Denn 67].
Then F_n(x) is a polynomial of degree n in x and is orthogonal, on the interval
(a, b) with weight w(x), to any polynomial p_k(x) of degree k < n, i.e.,

∫_a^b p_k(x) F_n(x) w(x) dx = 0    for k < n.

These polynomials are collectively called classical orthogonal polynomials.
Before proving the theorem, we need two lemmas.²

7.1.2. Lemma. The following identity holds:

(d^m/dx^m)(w sⁿ p_{≤k}) = w s^{n−m} p_{≤k+m},    m ≤ n.

Proof. See Problem 7.1. □

7.1.3. Lemma. All the derivatives d^m/dx^m (w sⁿ) vanish at x = a and x = b, for
all values of m < n.

Proof. Set k = 0 in the identity of the previous lemma and let p_{≤0} = 1. Then we
have (d^m/dx^m)(w sⁿ) = w s^{n−m} p_{≤m}. The RHS vanishes at x = a and x = b due to the
third condition stated in the theorem. □
Proof of the theorem. We prove the orthogonality first. The proof involves multiple
use of integration by parts:

∫_a^b p_k(x) F_n(x) w(x) dx = ∫_a^b p_k(x) (1/w) [ (dⁿ/dxⁿ)(w sⁿ) ] w dx
    = ∫_a^b p_k(x) (d/dx) [ (d^{n−1}/dx^{n−1})(w sⁿ) ] dx
    = p_k(x) (d^{n−1}/dx^{n−1})(w sⁿ)│_a^b − ∫_a^b (dp_k/dx) (d^{n−1}/dx^{n−1})(w sⁿ) dx,

where the boundary term vanishes by Lemma 7.1.3. This shows that each integration by
parts transfers one differentiation from w sⁿ to p_k and introduces a minus sign. Thus, after
k integrations by parts, we get

∫_a^b p_k(x) F_n(x) w(x) dx = (−1)^k ∫_a^b (d^k p_k/dx^k) (d^{n−k}/dx^{n−k})(w sⁿ) dx
    = C ∫_a^b (d/dx) [ (d^{n−k−1}/dx^{n−k−1})(w sⁿ) ] dx = C (d^{n−k−1}/dx^{n−k−1})(w sⁿ)│_a^b = 0,

where we have used the fact that the kth derivative of a polynomial of degree k
is a constant C. Note that n − k − 1 ≥ 0 because k < n, so that the last line of the
equation is well-defined.

To prove the first part of the theorem, we use Lemma 7.1.2 with k = 0 and m =
n to get (dⁿ/dxⁿ)(w sⁿ) = w p_{≤n}, or F_n(x) = (1/w)(dⁿ/dxⁿ)(w sⁿ) = p_{≤n}. To prove that F_n(x)
is a polynomial of degree precisely equal to n, we write F_n(x) = p_{≤n−1}(x) + k_n xⁿ,
multiply both sides by w(x)F_n(x), and integrate over (a, b):

∫_a^b [F_n(x)]² w(x) dx = ∫_a^b p_{≤n−1} F_n(x) w(x) dx + k_n ∫_a^b xⁿ F_n(x) w(x) dx.

The LHS is a positive quantity because both w(x) and [F_n(x)]² are positive, and
the first integral on the RHS vanishes by the first part of the proof. Therefore, the
second term on the RHS cannot be zero. In particular, k_n ≠ 0, and F_n(x) is of
degree n. □

²Recall that p_{≤k} is a generic polynomial with degree less than or equal to k.
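The orthogonality just proved is easy to spot-check numerically. The sketch below takes the Laguerre case, w = e^{−x} and s = x on (0, ∞) (see Section 7.2), and uses Gauss-Laguerre quadrature, which integrates these polynomial-times-weight integrands exactly up to rounding:

```python
# Check that int_0^inf x**k * L_n(x) * exp(-x) dx vanishes for every power
# x**k with k < n, using Gauss-Laguerre quadrature.
import numpy as np
from numpy.polynomial import laguerre

nodes, weights = laguerre.laggauss(60)    # exact for polynomial degree <= 119
for n in range(1, 6):
    Ln = laguerre.Laguerre.basis(n)       # the nth Laguerre polynomial
    for k in range(n):
        integral = np.sum(weights * nodes**k * Ln(nodes))
        assert abs(integral) < 1e-8
print("orthogonality of L_n against x**k (k < n) confirmed for n <= 5")
```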
It is customary to introduce a normalization constant in the definition of F_n(x),
and write

generalized Rodriguez formula

F_n(x) = (1/(K_n w)) (dⁿ/dxⁿ)(w sⁿ).    (7.2)

This equation is called the generalized Rodriguez formula. For historical reasons,
different polynomial functions are normalized differently, which is why K_n is
introduced here.

From Theorem 7.1.1 it is clear that the sequence F₀(x), F₁(x), F₂(x), ... of
polynomials forms an orthogonal set of polynomials on [a, b] with weight function
w(x).

differential equation for classical orthogonal polynomials
All the varieties of classical orthogonal polynomials were discovered as solutions
of differential equations. Here, we give a single generic differential equation
satisfied by all the F_n's. The proof is outlined in Problem 7.4.
7.1.4. Proposition. Let k₁ be the coefficient of x in F₁(x) and σ₂ the coefficient of
x² in s(x). Then the orthogonal polynomials F_n satisfy the differential equation³

(d/dx) [ w(x) s(x) (dF_n/dx) ] = λ_n w(x) F_n(x),    where λ_n = n [K₁k₁ + (n − 1)σ₂].

We shall study the differential equation above in the context of the Sturm-
Liouville problem (see Chapters 18 and 19), which is an eigenvalue problem
involving differential operators.

³A prime is a symbol for derivative with respect to x.
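As a concrete check, take the Legendre case of the proposition: w = 1, s = 1 − x², and, under the standard Legendre normalization, K₁ = −2, k₁ = 1, σ₂ = −1, so that λ_n = n[K₁k₁ + (n − 1)σ₂] = −n(n + 1). These identifications are mine; sympy then confirms the Sturm-Liouville form d/dx[w s F_n'] = λ_n w F_n symbolically:

```python
# Symbolic verification of d/dx[w*s*F_n'] = lambda_n * w * F_n for Legendre.
import sympy as sp

x = sp.symbols('x')
w, s = sp.Integer(1), 1 - x**2
K1, k1, sigma2 = -2, 1, -1

for n in range(1, 6):
    Fn = sp.legendre(n, x)
    lam = n * (K1 * k1 + (n - 1) * sigma2)   # equals -n*(n + 1)
    lhs = sp.diff(w * s * sp.diff(Fn, x), x)
    assert sp.simplify(lhs - lam * w * Fn) == 0
print("differential equation verified for n = 1..5")
```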
7.2 Classification

Let us now investigate the consequences of various choices of s(x). We start with
F₁(x), and note that it satisfies Equation (7.2) with n = 1:

F₁(x) = (1/(K₁w)) (d/dx)(ws),

or

(d/dx)(ws) = K₁ w F₁,    (7.3)

which can be integrated to yield ws = A exp( ∫ (K₁F₁(x)/s) dx ), where A is a
constant. On the other hand, being a polynomial of degree 1, F₁(x) can be written
as F₁(x) = k₁x + k₁'. It follows that

w(x)s(x) = A exp( ∫ (K₁(k₁x + k₁')/s) dx ),    w(a)s(a) = 0 = w(b)s(b).
Next we look at the three choices for s(x): a constant, a polynomial of degree
1, and a polynomial of degree 2. For a constant s(x), Equation (7.3) can be easily
integrated:

w(x)s(x) = A exp( ∫ (K₁(k₁x + k₁')/s) dx ) = A exp( ∫ (2αx + β) dx )
         = A e^{αx² + βx + c} = B e^{αx² + βx}.

The interval (a, b) is determined by w(a)s(a) = 0 = w(b)s(b), which yields
B e^{αa² + βa} = 0 = B e^{αb² + βb}. The only way that this equality can hold is for a and
b to be infinite. Since a < b, we must take a = −∞ and b = +∞, in which case
α < 0. With y = √|α| (x + β/(2α)) and choosing B = s exp(β²/(4α)), we obtain
w(y) = exp(−y²). We also take the constant s to be 1. This is always possible by
a proper choice of constants such as B.
If the degree of s is 1, then s(x) = σ₁x + σ₀ and

w(x)s(x) = B(σ₁x + σ₀)^ρ e^{γx},

where γ = K₁k₁/σ₁, ρ = K₁k₁'/σ₁ − K₁k₁σ₀/σ₁², and B is A modified by
the constant of integration. The last equation above must satisfy the boundary
conditions at a and b: B(σ₁a + σ₀)^ρ e^{γa} = 0 = B(σ₁b + σ₀)^ρ e^{γb}, which give
a = −σ₀/σ₁, ρ > 0, γ < 0, and b = +∞. With appropriate redefinition of
variables and parameters, we can write w(y) = y^ν e^{−y}, ν > −1, and s(x) = x,
a = 0, b = +∞.
μ          ν          w(x)                 Polynomial
0          0          1                    Legendre, P_n(x)
λ − ½      λ − ½      (1 − x²)^{λ−1/2}     Gegenbauer, C_n^λ(x), λ > −½
−½         −½         (1 − x²)^{−1/2}      Chebyshev of the first kind, T_n(x)
½          ½          (1 − x²)^{1/2}       Chebyshev of the second kind, U_n(x)

Table 7.1 Special cases of Jacobi polynomials
Similarly, we can obtain the weight function and the interval of integration for
the case when s(x) is of degree 2. This result, as well as the results obtained above,
are collected in the following proposition.

7.2.1. Proposition. If the conditions of Theorem 7.1.1 prevail, then

(a) For s(x) of degree zero we get w(x) = e^{−x²} with s(x) = 1, a = −∞, and
b = +∞. The resulting polynomials are called Hermite polynomials and
are denoted by H_n(x).

(b) For s(x) of degree 1, we obtain w(x) = x^ν e^{−x} with ν > −1, s(x) = x,
a = 0, and b = +∞. The resulting polynomials are called Laguerre
polynomials and are denoted by L_n^ν(x).

(c) For s(x) of degree 2, we get w(x) = (1 + x)^μ (1 − x)^ν with μ, ν > −1,
s(x) = 1 − x², a = −1, and b = +1. The resulting polynomials are called
Jacobi polynomials and are denoted by P_n^{μ,ν}(x).
Jacobi polynomials are themselves divided into other subcategories depending
on the values of μ and ν. The most common and widely used of these are collected
in Table 7.1. Note that the definition of each of the preceding polynomials involves a
"standardization," which boils down to a particular choice of K_n in the generalized
Rodriguez formula.
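The generalized Rodriguez formula (7.2) can be exercised directly. A symbolic sketch for the Hermite row of the classification, with w = e^{−x²} and s = 1, and the conventional Hermite standardization K_n = (−1)ⁿ (a particular choice, not forced by the theorem):

```python
# F_n = (1/(K_n * w)) * d^n/dx^n (w * s**n) reproduces the Hermite polynomials.
import sympy as sp

x = sp.symbols('x')
w, s = sp.exp(-x**2), sp.Integer(1)

def rodriguez(n, K_n):
    """Generalized Rodriguez formula (7.2) for this w and s."""
    return sp.expand(sp.diff(w * s**n, x, n) / (K_n * w))

for n in range(5):
    Fn = rodriguez(n, (-1)**n)
    assert sp.expand(Fn - sp.hermite(n, x)) == 0
    print(n, Fn)    # 1, 2*x, 4*x**2 - 2, 8*x**3 - 12*x, ...
```

Swapping in w = x^ν e^{−x}, s = x (with the Laguerre standardization) reproduces L_n^ν in the same way.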
7.3 Recurrence Relations

Besides the recurrence relations obtained in Section 5.2, we can use the differential
equation of Proposition 7.1.4 to construct new recurrence relations involving
derivatives. These relations apply only to classical orthogonal polynomials, and
not to general ones. We start with Equation (5.12),

F_{n+1}(x) = (α_n x + β_n) F_n(x) + γ_n F_{n−1}(x),    (7.4)

differentiate both sides twice, and substitute for the second derivative from the
differential equation of Proposition 7.1.4. This will yield

2wsα_n F_n' + [ α_n (d/dx)(ws) + wλ_n(α_n x + β_n) ] F_n
    − wλ_{n+1} F_{n+1} + wγ_n λ_{n−1} F_{n−1} = 0.    (7.5)
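The recurrence relation above can be verified symbolically for a specific family. For Hermite polynomials the ingredients are w = e^{−x²}, s = 1, the recursion H_{n+1} = 2xH_n − 2nH_{n−1} (so α_n = 2, β_n = 0, γ_n = −2n), and λ_n = −2n; these identifications are standard but supplied by me, not by the text:

```python
# Check: 2*w*s*a_n*F_n' + [a_n*(w*s)' + w*lam_n*(a_n*x + b_n)]*F_n
#        - w*lam_{n+1}*F_{n+1} + w*g_n*lam_{n-1}*F_{n-1} = 0  (Hermite case)
import sympy as sp

x = sp.symbols('x')
w, s = sp.exp(-x**2), sp.Integer(1)

n = 3
alpha, beta, gamma = 2, 0, -2 * n   # from H_{n+1} = 2x H_n - 2n H_{n-1}
lam = lambda m: -2 * m              # eigenvalue lambda_m for Hermite
H = lambda m: sp.hermite(m, x)

expr = (2 * w * s * alpha * sp.diff(H(n), x)
        + (alpha * sp.diff(w * s, x) + w * lam(n) * (alpha * x + beta)) * H(n)
        - w * lam(n + 1) * H(n + 1)
        + w * gamma * lam(n - 1) * H(n - 1))
assert sp.simplify(expr) == 0
print("recurrence holds for the Hermite case, n = 3")
```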
Karl Gustav Jacob Jacobi (1804-1851) was the second son
born to a well-to-do Jewish banking family in Potsdam. An
obviously bright young man, Jacobi was soon moved to the
highest class in spite of his youth and remained at the gym-
nasium for four years only because he could not enter the
university until he was sixteen. He excelled at the University
of Berlin in all the classical subjects as well as mathematical
studies, the topic he soon chose as his career. He passed the
examination to become a secondary school teacher, then later
the examination that allowed university teaching, and joined
the faculty at Berlin at the age of twenty. Since promotion
there appeared unlikely,he moved in 1826 to the University of Konigsberg in search of a
more permanent position. He was known as a lively and creative lecturer who often injected
his latest research topics into the lectures. He began what is now a common practice at
most universities-the research seminar-for the most advanced students and his faculty
collaborators. The Jacobi "school," together with the influence of Bessel and Neumann (also
at Konigsberg), sparked a renewal of mathematical excellence in Germany.
In 1843 Jacobi fell gravely ill with diabetes. After seeing his condition, Dirichlet, with
the help of von Humboldt, secured a donation to enable Jacobi to spend several months in
Italy, a therapy recommended by his doctor. The friendly atmosphere and healthful climate
there soon improved his condition. Jacobi was later given royal permission to move from
Konigsberg to Berlin so that his health would not be affected by the harsh winters in the
former location. A salary bonus given to Jacobi to offset the higher cost of living in the capital
was revoked after he made some politically sensitive remarks in an impromptu speech. A
permanent position at Berlin was also refused, and the reduced salary and lack of security
caused considerable hardship for Jacobi and his family. Only after he accepted a position in
Vienna did the Prussian government recognize the desirability of keeping the distinguished
mathematician within its borders, offering him special concessions that together with his
love for his homeland convinced Jacobi to stay. In 1851 Jacobi died after contracting both
influenza and smallpox.
Jacobi's mathematical reputation began largely with his heated competition with Abel in
the study of elliptic functions. Legendre, formerly the star of such studies, wrote Jacobi of his
happiness at having "lived long enough to witness these magnanimous contests between two
young athletes equally strong." Although Jacobi and Abel could reasonably be considered
contemporary researchers who arrived at many of the same results independently, Jacobi
suggested the names "Abelian functions" and "Abelian theorem" in a review he wrote for
Crelle's Journal. Jacobi also extended his discoveries in elliptic functions to number theory
and the theory of integration. He also worked in other areas of number theory, such as the
theory of quadratic forms and the representation of integers as sums of squares and cubes. He
presented the well-known Jacobian, or functional determinant, in 1841. To physicists, Jacobi
is probably best known for his work in dynamics with the form introduced by Hamilton.
Although elegant and quite general, Hamiltonian dynamics did not lend itself to easy solution
of many practical problems in mechanics. In the spirit of Lagrange, Poisson, and others,
Jacobi investigated transformations of Hamilton's equations that preserved their canonical
nature (loosely speaking, that preserved the Poisson brackets in each representation). After
much work and a little simplification, the resulting equations of motion, now known as
Hamilton-Jacobi equations, allowed Jacobi to solve several important problems in ordinary
and celestial mechanics. Clebsch and later Helmholtz amplified their use in other areas of
physics.
We can get another recurrence relation involving derivatives by substituting
(7.4) in (7.5) and simplifying:
2ws\alpha_n F_n' + \left[\alpha_n \frac{d}{dx}(ws) + w(\lambda_n - \lambda_{n+1})(\alpha_n x + \beta_n)\right] F_n
+ w\gamma_n(\lambda_{n-1} - \lambda_{n+1}) F_{n-1} = 0.   (7.6)
Two other recurrence relations can be obtained by differentiating Equations
(7.6) and (7.5), respectively, and using the differential equation for F_n. Now solve
the first equation so obtained for \gamma_n (d/dx)(wF_{n-1}) and substitute the result in the
second equation. After simplification, the result will be
2w\alpha_n\lambda_n F_n + \frac{d}{dx}\left\{\left[\alpha_n \frac{d}{dx}(ws) + w(\lambda_n - \lambda_{n-1})(\alpha_n x + \beta_n)\right] F_n\right\}
+ (\lambda_{n-1} - \lambda_{n+1}) \frac{d}{dx}(wF_{n+1}) = 0.   (7.7)
Finally, we record one more useful recurrence relation:
A_n(x) F_n - \lambda_{n+1}(\alpha_n x + \beta_n)\frac{dw}{dx} F_{n+1} + \gamma_n\lambda_{n-1}(\alpha_n x + \beta_n)\frac{dw}{dx} F_{n-1}
+ B_n(x) F_{n+1}' + \gamma_n D_n(x) F_{n-1}' = 0,   (7.8)
where
A_n(x) = (\alpha_n x + \beta_n)\left[2w\alpha_n\lambda_n + \alpha_n \frac{d^2}{dx^2}(ws) + \lambda_n(\alpha_n x + \beta_n)\frac{dw}{dx}\right] - \alpha_n^2 \frac{d}{dx}(ws),
B_n(x) = \alpha_n \frac{d}{dx}(ws) - w(\alpha_n x + \beta_n)(\lambda_{n+1} - \lambda_n),
D_n(x) = w(\alpha_n x + \beta_n)(\lambda_{n-1} - \lambda_n) - \alpha_n \frac{d}{dx}(ws).
Details of the derivation of this relation are left for the reader. All these recurrence
relations seem to be very complicated. However, complexity is the price we pay
for generality. When we work with specific orthogonal polynomials, the equa-
tions simplify considerably. For instance, for Hermite and Legendre polynomials
Equation (7.6) yields, respectively,
useful recurrence relations for Hermite and Legendre polynomials:

H_n' = 2n H_{n-1}   and   (1 - x^2)P_n' + nx P_n - n P_{n-1} = 0.   (7.9)
Also, applying Equation (7.7) to Legendre polynomials gives
P_{n+1}' - x P_n' - (n+1) P_n = 0,   (7.10)

and Equation (7.8) yields

P_{n+1}' - P_{n-1}' - (2n+1) P_n = 0.   (7.11)
It is possible to find many more recurrence relations by manipulating the existing
recurrence relations.
Before studying specific orthogonal polynomials, let us pause for a moment
to appreciate the generality and elegance of the foregoing discussion. With a few
assumptions and a single defining equation we have severely restricted the choice
of the weight function and with it the choice of the interval (a, b). We have nev-
ertheless exhausted the list of the so-called classical orthogonal polynomials.
7.4 Examples of Classical Orthogonal Polynomials
We now construct the specific polynomials used frequently in physics. We have
seen that the four parameters K_n, k_n, k_n', and h_n determine all the properties of the
polynomials. Once K_n is fixed by some standardization, we can determine all the
other parameters: k_n and k_n' will be given by the generalized Rodriguez formula,
and hn can be calculated as
h_n = \int_a^b F_n^2(x) w(x)\,dx = \int_a^b (k_n x^n + \cdots) F_n(x) w(x)\,dx
    = k_n \int_a^b w x^n \frac{1}{K_n w}\frac{d^n}{dx^n}(ws^n)\,dx
    = \frac{k_n}{K_n}\int_a^b x^n \frac{d}{dx}\left[\frac{d^{n-1}}{dx^{n-1}}(ws^n)\right] dx
    = \frac{k_n}{K_n}\,x^n \frac{d^{n-1}}{dx^{n-1}}(ws^n)\bigg|_a^b
      - \frac{k_n}{K_n}\int_a^b \frac{d}{dx}(x^n)\,\frac{d^{n-1}}{dx^{n-1}}(ws^n)\,dx.

The first term of the last line is zero by Lemma 7.1.3. It is clear that each integra-
tion by parts introduces a minus sign and shifts one differentiation from ws^n to
x^n. Thus, after n integrations by parts and noting that d^0/dx^0(ws^n) = ws^n and
d^n/dx^n(x^n) = n!, we obtain

h_n = \frac{(-1)^n k_n n!}{K_n}\int_a^b w s^n\,dx.   (7.12)
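Equation (7.12) can be tested directly. The sketch below (a numerical check assuming NumPy, with k_n and K_n for Legendre polynomials taken from Section 7.4.3) recovers h_n = 2/(2n+1):

```python
import math
import numpy as np

# h_n from Eq. (7.12) specialized to Legendre polynomials, where w = 1,
# s = 1 - x^2, K_n = (-1)^n 2^n n!, and k_n = 2^n Gamma(n+1/2)/(n! Gamma(1/2)).
def h_legendre(n):
    k_n = 2**n * math.gamma(n + 0.5) / (math.factorial(n) * math.gamma(0.5))
    K_n = (-1)**n * 2**n * math.factorial(n)
    x = np.linspace(-1.0, 1.0, 20001)
    y = (1 - x**2)**n
    dx = x[1] - x[0]
    integral = dx * (y.sum() - 0.5 * (y[0] + y[-1]))   # trapezoid rule for the w s^n integral
    return (-1)**n * k_n * math.factorial(n) / K_n * integral

for n in range(6):
    assert abs(h_legendre(n) - 2 / (2 * n + 1)) < 1e-6
print("h_n = 2/(2n+1) confirmed for n = 0..5")
```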
summary of properties of Hermite polynomials
7.4.1 Hermite Polynomials
The Hermite polynomials are standardized such that K_n = (-1)^n. Thus, the
generalized Rodriguez formula (7.2) and Proposition 7.2.1 give

H_n(x) = (-1)^n e^{x^2}\frac{d^n}{dx^n}(e^{-x^2}).   (7.13)

It is clear that each time e^{-x^2} is differentiated, a factor of -2x is introduced.
The highest power of x is obtained when we differentiate e^{-x^2} n times. This yields

(-1)^n e^{x^2}(-2x)^n e^{-x^2} = 2^n x^n  \Rightarrow  k_n = 2^n.

To obtain k_n', we find it helpful to see whether the polynomial is even or odd.
We substitute -x for x in Equation (7.13) and get H_n(-x) = (-1)^n H_n(x), which
shows that if n is even (odd), H_n is an even (odd) polynomial, i.e., it can have only
even (odd) powers of x. In either case, the next-highest power of x in H_n(x) is
not n - 1 but n - 2. Thus, the coefficient of x^{n-1} is zero for H_n(x), and we have
k_n' = 0. For h_n, we use (7.12) to obtain h_n = \sqrt{\pi}\,2^n n!.
Next we calculate the recurrence relation of Equation (5.12). We can readily
calculate the constants needed: \alpha_n = 2, \beta_n = 0, \gamma_n = -2n. Then substitute these
in Equation (5.12) to obtain

H_{n+1} = 2x H_n - 2n H_{n-1}.   (7.14)
summary of properties of Laguerre polynomials
Other recurrence relations can be obtained similarly.
Finally, the differential equation of H_n(x) is obtained by first noting that K_1 =
-1, \sigma_2 = 0, F_1(x) = 2x \Rightarrow k_1 = 2. All of this gives \lambda_n = -2n, which can be
used in the equation of Proposition 7.1.4 to get

\frac{d^2 H_n}{dx^2} - 2x\frac{dH_n}{dx} + 2n H_n = 0.   (7.15)
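The Hermite results just summarized can be verified numerically. The sketch below (assuming NumPy, whose Hermite class implements the physicists' polynomials with the same standardization K_n = (-1)^n used here) checks the recurrence, the differential equation, and the normalization h_n:

```python
import math
import numpy as np
from numpy.polynomial.hermite import Hermite, hermgauss

n = 6
x = np.linspace(-2, 2, 9)
Hn, Hp1, Hm1 = (Hermite.basis(k) for k in (n, n + 1, n - 1))

# recurrence (7.14): H_{n+1} = 2x H_n - 2n H_{n-1}
assert np.allclose(Hp1(x), 2 * x * Hn(x) - 2 * n * Hm1(x))

# differential equation (7.15): H_n'' - 2x H_n' + 2n H_n = 0
assert np.allclose(Hn.deriv(2)(x) - 2 * x * Hn.deriv()(x) + 2 * n * Hn(x), 0)

# normalization h_n = sqrt(pi) 2^n n!, computed by Gauss-Hermite quadrature
nodes, weights = hermgauss(60)
h_n = np.sum(weights * Hn(nodes)**2)    # weighted integral of H_n^2 with w = e^{-x^2}
assert math.isclose(h_n, math.sqrt(math.pi) * 2**n * math.factorial(n), rel_tol=1e-9)
print("Hermite checks passed for n =", n)
```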
7.4.2 Laguerre Polynomials

For Laguerre polynomials, the standardization is K_n = n!. Thus, the generalized
Rodriguez formula (7.2) and Proposition 7.2.1 give

L_n^\nu(x) = \frac{1}{n!\,x^\nu e^{-x}}\frac{d^n}{dx^n}(x^\nu e^{-x} x^n) = \frac{1}{n!}\,x^{-\nu}e^x\frac{d^n}{dx^n}(x^{n+\nu}e^{-x}).   (7.16)

To find k_n we note that differentiating e^{-x} does not introduce any new powers
of x but only a factor of -1. Thus, the highest power of x is obtained by leaving
x^{n+\nu} alone and differentiating e^{-x} n times. This gives

\frac{1}{n!}\,x^{-\nu}e^x x^{n+\nu}(-1)^n e^{-x} = \frac{(-1)^n}{n!}\,x^n  \Rightarrow  k_n = \frac{(-1)^n}{n!}.
We may try to check the evenness or oddness of L_n^\nu(x); however, this will not
be helpful because changing x to -x distorts the RHS of Equation (7.16). In fact,
k_n' \neq 0 in this case, and it can be calculated by noticing that the next-highest power
of x is obtained by adding the first derivative of x^{n+\nu} n times and multiplying the
result by (-1)^{n-1}, which comes from differentiating e^{-x}. We obtain

\frac{1}{n!}\,x^{-\nu}e^x\left[(-1)^{n-1} n(n+\nu) x^{n+\nu-1} e^{-x}\right] = \frac{(-1)^{n-1}(n+\nu)}{(n-1)!}\,x^{n-1},

and therefore k_n' = (-1)^{n-1}(n+\nu)/(n-1)!.
Finally, for h_n we get

h_n = \frac{(-1)^n[(-1)^n/n!]\,n!}{n!}\int_0^\infty x^\nu e^{-x} x^n\,dx = \frac{1}{n!}\int_0^\infty x^{n+\nu} e^{-x}\,dx.
If \nu is not an integer (and it need not be), the integral on the RHS cannot be
evaluated by elementary methods. In fact, this integral occurs so frequently in
the gamma function mathematical applications that it is given a special name, the gamma function.
A detailed discussion of this function can be found in Chapter 11. At this point,
we simply note that

\Gamma(n+1) = n!  for n \in \mathbb{N},   (7.17)

and write h_n as

h_n = \frac{\Gamma(n+\nu+1)}{n!} = \frac{\Gamma(n+\nu+1)}{\Gamma(n+1)}.
The relevant parameters for the recurrence relation can be easily calculated:

\alpha_n = -\frac{1}{n+1},   \beta_n = \frac{2n+\nu+1}{n+1},   \gamma_n = -\frac{n+\nu}{n+1}.

Substituting these in Equation (5.12) and simplifying yields

(n+1) L_{n+1}^\nu = (2n+\nu+1-x) L_n^\nu - (n+\nu) L_{n-1}^\nu.
With k_1 = -1 and \sigma_2 = 0, we get \lambda_n = -n, and the differential equation of
Proposition 7.1.4 becomes

x\frac{d^2 L_n^\nu}{dx^2} + (\nu+1-x)\frac{dL_n^\nu}{dx} + n L_n^\nu = 0.   (7.18)
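A quick numerical check of these Laguerre results is possible as well. The sketch below assumes SciPy; its scipy.special.eval_genlaguerre evaluates the generalized Laguerre polynomials in the same standardization K_n = n! adopted here, and the chosen value of \nu is an arbitrary noninteger:

```python
import math
import numpy as np
from scipy.special import eval_genlaguerre, gamma

nu, n = 0.5, 4                      # nu need not be an integer
x = np.linspace(0.0, 5.0, 11)
L = lambda m: eval_genlaguerre(m, nu, x)

# recurrence: (n+1) L_{n+1}^nu = (2n + nu + 1 - x) L_n^nu - (n + nu) L_{n-1}^nu
assert np.allclose((n + 1) * L(n + 1),
                   (2 * n + nu + 1 - x) * L(n) - (n + nu) * L(n - 1))

# value at the origin (cf. Problem 7.31): L_n^nu(0) = Gamma(n+nu+1)/(n! Gamma(nu+1))
L0 = float(eval_genlaguerre(n, nu, 0.0))
assert math.isclose(L0, gamma(n + nu + 1) / (math.factorial(n) * gamma(nu + 1)),
                    rel_tol=1e-12)
print("Laguerre checks passed")
```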
7.4.3 Legendre Polynomials
summary of properties of Legendre polynomials
Instead of discussing the Jacobi polynomials as a whole, we will discuss a special
case of them, the Legendre polynomials P_n(x), which are more widely used in
physics.
With \mu = 0 = \nu, corresponding to the Legendre polynomials, the weight
function for the Jacobi polynomials reduces to w(x) = 1. The standardization is
K_n = (-1)^n 2^n n!. Thus, the generalized Rodriguez formula reads

P_n(x) = \frac{(-1)^n}{2^n n!}\frac{d^n}{dx^n}\left[(1-x^2)^n\right].   (7.19)
To find k_n, we expand the expression in square brackets using the binomial theorem
and take the nth derivative of the highest power of x. This yields

k_n x^n = \frac{(-1)^n}{2^n n!}\frac{d^n}{dx^n}\left[(-x^2)^n\right] = \frac{1}{2^n n!}\frac{d^n}{dx^n}(x^{2n})
        = \frac{1}{2^n n!}\,2n(2n-1)(2n-2)\cdots(n+1)\,x^n.

After some algebra (see Problem 7.7), we get k_n = \frac{2^n\Gamma(n+\frac{1}{2})}{n!\,\Gamma(\frac{1}{2})}.
Adrien-Marie Legendre (1752-1833) came from a well-
to-do Parisian family and received an excellent education in
science and mathematics. His university work was advanced
enough that his mentor used many of Legendre's essays in
a treatise on mechanics. A man of modest fortune until the
revolution, Legendre was able to devote himself to study
and research without recourse to an academic position. In
1782 he won the prize of the Berlin Academy for calculat-
ing the trajectories of cannonballs taking air resistance into
account. This essay brought him to the attention of Lagrange
and helped pave the way to acceptance in French scientific
circles, notably the Academy of Sciences, to which Legendre submitted numerous papers.
In July 1784 he submitted a paper on planetary orbits that contained the now-famous Leg-
endre polynomials, mentioning that Lagrange had been able to "present a more complete
theory" in a recent paper by using Legendre's results. In the years that followed, Legendre
concentrated his efforts in number theory, celestial mechanics, and the theory of elliptic
functions. In addition, he was a prolific calculator, producing large tables of the values of
special functions, and he also authored an elementary textbook that remained in use for
many decades. In 1824 Legendre refused to vote for the government's candidate for Institut
National. Because of this, his pension was stopped and he died in poverty and in pain at the
age of 80 after several years of failing health.
Legendre produced a large number of useful ideas but did not always develop them
in the most rigorous manner, claiming to hold the priority for an idea if he had presented
merely a reasonable argument for it. Gauss, with whom he had several quarrels over priority,
considered rigorous proof the standard of ownership. To Legendre's credit, however, he
was an enthusiastic supporter of his young rivals Abel and Jacobi and gave their work
considerable attention in his writings. Especially in the theory of elliptic functions, the area
of competition with Abel and Jacobi, Legendre is considered more of a trailblazer than a
great builder. Hermite wrote that Legendre "is considered the founder of the theory of elliptic
functions" and "greatly smoothed the way for his successors," but notes that the recognition
of the double periodicity of the inverse function, which allowed the great progress of others,
was missing from Legendre's work.
Legendre also contributed to practical efforts in science and mathematics. He and two
of his contemporaries were assigned in 1787 to a panel conducting geodetic work in co-
operation with the observatories at Paris and Greenwich. Four years later the same panel
members were appointed as the Academy's commissioners to undertake the measurements
and calculations necessary to determine the length of the standard meter. Legendre's seem-
ingly tireless skill at calculating produced large tables of the values of trigonometric and
elliptic functions, logarithms, and solutions to various special equations.
In his famous textbook Elements de geometrie (1794) he gave a simple proof that \pi is
irrational and conjectured that it is not the root of any algebraic equation of finite degree with
rational coefficients. The textbook was somewhat dogmatic in its presentation of ordinary
Euclidean thought and includes none of the non-Euclidean ideas beginning to be formed
around that time. It was Legendre who first gave a rigorous proof of the theorem (assuming
all of Euclid's postulates, of course) that the sum of the angles of a triangle is "equal to
two right angles." Very little of his research in this area was of memorable quality. The
same could possibly be argued for the balance of his writing, but one must acknowledge
the very fruitful ideas he left behind in number theory and elliptic functions and, of course,
the introduction of Legendre polynomials and the important Legendre transformation used
both in thermodynamics and Hamiltonian mechanics.
To find k_n', we look at the evenness or oddness of the polynomials. By an
investigation of the Rodriguez formula, as in our study of Hermite polynomials,
we note that F_n(-x) = (-1)^n F_n(x), which tells us that F_n(x) is either even or
odd. In either case, x will not have an (n-1)st power. Therefore, k_n' = 0.
We now calculate h_n as given by (7.12):

h_n = \frac{(-1)^n k_n n!}{K_n}\int_{-1}^{1}(1-x^2)^n\,dx = \frac{2^n\Gamma(n+\frac{1}{2})/\Gamma(\frac{1}{2})}{2^n n!}\int_{-1}^{1}(1-x^2)^n\,dx.

The integral can be evaluated by repeated integration by parts (see Problem 7.8).
Substituting the result in the expression above yields h_n = 2/(2n+1).
We need \alpha_n, \beta_n, and \gamma_n for the recurrence relation:

\alpha_n = \frac{k_{n+1}}{k_n} = \frac{2^{n+1}\Gamma(n+1+\frac{1}{2})}{(n+1)!\,\Gamma(\frac{1}{2})}\,\frac{n!\,\Gamma(\frac{1}{2})}{2^n\Gamma(n+\frac{1}{2})} = \frac{2n+1}{n+1},

where we used the relation \Gamma(n+1+\frac{1}{2}) = (n+\frac{1}{2})\Gamma(n+\frac{1}{2}). We also have \beta_n = 0
(because k_n' = 0 = k_{n+1}') and \gamma_n = -n/(n+1). Therefore, the recurrence relation
is
(n+1) P_{n+1}(x) = (2n+1)x P_n(x) - n P_{n-1}(x).   (7.20)
Now we use K_1 = -2, P_1(x) = x \Rightarrow k_1 = 1, and \sigma_2 = -1 to obtain
\lambda_n = -n(n+1), which yields the following differential equation:

\frac{d}{dx}\left[(1-x^2)\frac{dP_n}{dx}\right] = -n(n+1)P_n.   (7.21)

This can also be expressed as

(1-x^2)\frac{d^2 P_n}{dx^2} - 2x\frac{dP_n}{dx} + n(n+1)P_n = 0.   (7.22)
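The Legendre formulas can be confirmed numerically as well; the sketch below (assuming NumPy, whose Legendre class matches the standardization used here) checks (7.20) and (7.22) for several values of n:

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

x = np.linspace(-1, 1, 11)
for n in range(1, 7):
    Pm1, Pn, Pp1 = (Legendre.basis(k) for k in (n - 1, n, n + 1))
    # recurrence (7.20): (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}
    assert np.allclose((n + 1) * Pp1(x), (2 * n + 1) * x * Pn(x) - n * Pm1(x))
    # differential equation (7.22): (1-x^2) P_n'' - 2x P_n' + n(n+1) P_n = 0
    assert np.allclose((1 - x**2) * Pn.deriv(2)(x) - 2 * x * Pn.deriv()(x)
                       + n * (n + 1) * Pn(x), 0)
print("Legendre recurrence and differential equation verified")
```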
7.4.4 Other Classical Orthogonal Polynomials
The rest of the classical orthogonal polynomials can be constructed similarly. For
the sake of completeness, we merely quote the results.
Jacobi Polynomials, P_n^{\mu,\nu}(x)

Standardization: K_n = (-2)^n n!

Constants: k_n = 2^{-n}\frac{\Gamma(2n+\mu+\nu+1)}{n!\,\Gamma(n+\mu+\nu+1)},   k_n' = \frac{n(\nu-\mu)}{2n+\mu+\nu}\,k_n,

h_n = \frac{2^{\mu+\nu+1}\Gamma(n+\mu+1)\Gamma(n+\nu+1)}{n!\,(2n+\mu+\nu+1)\Gamma(n+\mu+\nu+1)}

Rodriguez Formula:

P_n^{\mu,\nu}(x) = \frac{(-1)^n}{2^n n!}(1-x)^{-\mu}(1+x)^{-\nu}\frac{d^n}{dx^n}\left[(1-x)^{n+\mu}(1+x)^{n+\nu}\right]

Differential Equation:

(1-x^2)\frac{d^2 P_n^{\mu,\nu}}{dx^2} + [\mu-\nu-(\mu+\nu+2)x]\frac{dP_n^{\mu,\nu}}{dx} + n(n+\mu+\nu+1)P_n^{\mu,\nu} = 0

A Recurrence Relation:

2(n+1)(n+\mu+\nu+1)(2n+\mu+\nu)P_{n+1}^{\mu,\nu}
= (2n+\mu+\nu+1)\left[(2n+\mu+\nu)(2n+\mu+\nu+2)x + \nu^2 - \mu^2\right]P_n^{\mu,\nu}
- 2(n+\mu)(n+\nu)(2n+\mu+\nu+2)P_{n-1}^{\mu,\nu}
Gegenbauer Polynomials, C_n^\lambda(x)

Standardization: K_n = (-2)^n n!\,\frac{\Gamma(n+\lambda+\frac{1}{2})\Gamma(2\lambda)}{\Gamma(n+2\lambda)\Gamma(\lambda+\frac{1}{2})}

Constants: k_n = 2^n\frac{\Gamma(n+\lambda)}{n!\,\Gamma(\lambda)},   k_n' = 0,   h_n = \frac{\sqrt{\pi}\,\Gamma(n+2\lambda)\Gamma(\lambda+\frac{1}{2})}{n!\,(n+\lambda)\Gamma(2\lambda)\Gamma(\lambda)}

Rodriguez Formula:

C_n^\lambda(x) = \frac{(-1)^n}{2^n n!}\,\frac{\Gamma(n+2\lambda)\Gamma(\lambda+\frac{1}{2})}{\Gamma(2\lambda)\Gamma(n+\lambda+\frac{1}{2})}(1-x^2)^{-\lambda+1/2}\frac{d^n}{dx^n}\left[(1-x^2)^{n+\lambda-1/2}\right]

Differential Equation:

(1-x^2)\frac{d^2 C_n^\lambda}{dx^2} - (2\lambda+1)x\frac{dC_n^\lambda}{dx} + n(n+2\lambda)C_n^\lambda = 0

A Recurrence Relation:

(n+1)C_{n+1}^\lambda = 2(n+\lambda)x C_n^\lambda - (n+2\lambda-1)C_{n-1}^\lambda
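The quoted Gegenbauer recurrence can be spot-checked numerically. The sketch below assumes SciPy; its scipy.special.eval_gegenbauer evaluates C_n^\lambda in the standard normalization, which satisfies the same recurrence, and the value of \lambda is an arbitrary illustrative choice:

```python
import numpy as np
from scipy.special import eval_gegenbauer

lam = 1.5                               # arbitrary illustrative value of lambda
x = np.linspace(-0.9, 0.9, 9)
for n in range(1, 7):
    Cm1, Cn, Cp1 = (eval_gegenbauer(k, lam, x) for k in (n - 1, n, n + 1))
    # (n+1) C_{n+1} = 2(n+lambda) x C_n - (n+2*lambda-1) C_{n-1}
    assert np.allclose((n + 1) * Cp1,
                       2 * (n + lam) * x * Cn - (n + 2 * lam - 1) * Cm1)
print("Gegenbauer recurrence verified for lambda =", lam)
```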
Chebyshev Polynomials of the First Kind, T_n(x)

Standardization: K_n = (-1)^n\frac{(2n)!}{2^n n!}

Constants: k_n = 2^{n-1} (n \geq 1),   k_n' = 0,   h_n = \pi/2 (n \neq 0),   h_0 = \pi

Rodriguez Formula: T_n(x) = \frac{(-1)^n 2^n n!}{(2n)!}(1-x^2)^{1/2}\frac{d^n}{dx^n}\left[(1-x^2)^{n-1/2}\right]

Differential Equation: (1-x^2)\frac{d^2 T_n}{dx^2} - x\frac{dT_n}{dx} + n^2 T_n = 0

A Recurrence Relation: T_{n+1} = 2x T_n - T_{n-1}

Chebyshev Polynomials of the Second Kind, U_n(x)

Standardization: K_n = (-1)^n\frac{(2n+1)!}{2^n(n+1)!}

Constants: k_n = 2^n,   k_n' = 0,   h_n = \pi/2
Rodriguez Formula:

U_n(x) = \frac{(-1)^n 2^n (n+1)!}{(2n+1)!}(1-x^2)^{-1/2}\frac{d^n}{dx^n}\left[(1-x^2)^{n+1/2}\right]

Differential Equation: (1-x^2)\frac{d^2 U_n}{dx^2} - 3x\frac{dU_n}{dx} + n(n+2)U_n = 0

A Recurrence Relation: U_{n+1} = 2x U_n - U_{n-1}
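The Chebyshev results are conveniently checked through the trigonometric representations T_n(\cos\theta) = \cos n\theta and U_n(\cos\theta) = \sin((n+1)\theta)/\sin\theta. The following sketch (assuming NumPy; its Chebyshev class implements T_n) verifies the differential equation and both recurrences:

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

theta = np.linspace(0.1, 3.0, 8)
x = np.cos(theta)
for n in range(8):
    Tn = Chebyshev.basis(n)
    # trigonometric representation T_n(cos theta) = cos(n theta)
    assert np.allclose(Tn(x), np.cos(n * theta))
    # differential equation (1 - x^2) T_n'' - x T_n' + n^2 T_n = 0
    assert np.allclose((1 - x**2) * Tn.deriv(2)(x) - x * Tn.deriv()(x) + n**2 * Tn(x), 0)

# U_n built from the recurrence U_{n+1} = 2x U_n - U_{n-1},
# compared with sin((n+1) theta) / sin(theta)
U = [np.ones_like(x), 2 * x]
for n in range(1, 7):
    U.append(2 * x * U[n] - U[n - 1])
for n in range(7):
    assert np.allclose(U[n], np.sin((n + 1) * theta) / np.sin(theta))
print("Chebyshev checks passed")
```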
7.5 Expansion in Terms of Orthogonal Polynomials
Having studied the different classical orthogonal polynomials, we can now use
them to write an arbitrary function f \in \mathcal{L}^2_w(a,b) as a series of these polynomials.
If we denote a complete set of orthogonal (not necessarily classical) polynomials
by |C_k\rangle and the given function by |f\rangle, we may write

|f\rangle = \sum_{k=0}^{\infty} a_k |C_k\rangle,   (7.23)
where a_k is found by multiplying both sides of the equation by \langle C_j| and using the
orthogonality of the |C_k\rangle's:

\langle C_j|f\rangle = \sum_{k=0}^{\infty} a_k \langle C_j|C_k\rangle = a_j\langle C_j|C_j\rangle  \Rightarrow  a_j = \frac{\langle C_j|f\rangle}{\langle C_j|C_j\rangle}.   (7.24)

This is written in function form as

a_j = \frac{\int_a^b C_j^*(x) f(x) w(x)\,dx}{\int_a^b |C_j(x)|^2 w(x)\,dx}.   (7.25)
We can also "derive" the functional form of Equation (7.23) by multiplying both
of its sides by \langle x| and using the fact that \langle x|f\rangle = f(x) and \langle x|C_k\rangle = C_k(x).
The result will be

f(x) = \sum_{k=0}^{\infty} a_k C_k(x).   (7.26)
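Equations (7.25) and (7.26) translate directly into a numerical procedure. The sketch below (assuming NumPy and SciPy; the function f(x) = e^x is an arbitrary illustrative choice) computes Legendre coefficients with w = 1 and h_j = 2/(2j+1), then checks the truncated series against f:

```python
import numpy as np
from numpy.polynomial.legendre import Legendre
from scipy.integrate import quad

f = np.exp                          # illustrative choice of f on (-1, 1)
N = 10
a = []
for j in range(N):
    Pj = Legendre.basis(j)
    num, _ = quad(lambda x: Pj(x) * f(x), -1, 1)    # numerator of (7.25)
    a.append(num / (2 / (2 * j + 1)))               # divide by h_j = 2/(2j+1)

x = np.linspace(-1, 1, 5)
fN = sum(a[j] * Legendre.basis(j)(x) for j in range(N))
assert np.allclose(fN, f(x), atol=1e-8)             # ten terms already suffice
print("max truncation error:", np.abs(fN - f(x)).max())
```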
7.5.1. Example. The solution of Laplace's equation in spherically symmetric electro-
static problems that are independent of the azimuthal angle is given by

\Phi(r,\theta) = \sum_{k=0}^{\infty}\left(\frac{b_k}{r^{k+1}} + c_k r^k\right) P_k(\cos\theta).   (7.27)
Consider two conducting hemispheres of radius a separated by a small insulating gap
at the equator. The upper hemisphere is held at potential V_0 and the lower one at -V_0, as
shown in Figure 7.1. We want to find the potential at points outside the resulting sphere.
Since the potential must vanish at infinity, we expect the second term in Equation (7.27) to
be absent, i.e., c_k = 0 \forall k. To find b_k, substitute a for r in (7.27) and let \cos\theta \equiv x. Then,
\Phi(a,x) = \sum_{k=0}^{\infty}\frac{b_k}{a^{k+1}} P_k(x),

where

\Phi(a,x) = \begin{cases} -V_0 & \text{if } -1 < x < 0, \\ +V_0 & \text{if } 0 < x < 1. \end{cases}
From Equation (7.25), we have

\frac{b_k}{a^{k+1}} = \frac{2k+1}{2}\int_{-1}^{1} P_k(x)\Phi(a,x)\,dx.

To proceed, we rewrite the first integral:

\int_{-1}^{0} P_k(x)\,dx = -\int_{+1}^{0} P_k(-y)\,dy = \int_{0}^{1} P_k(-y)\,dy = (-1)^k\int_{0}^{1} P_k(x)\,dx,

where we made use of the parity property of P_k(x). Therefore,

\frac{b_k}{a^{k+1}} = \frac{2k+1}{2}\,V_0\left[1 - (-1)^k\right]\int_{0}^{1} P_k(x)\,dx.
It is now clear that only odd polynomials contribute to the expansion. Using the result of
Problem 7.26, we get

b_{2m+1} = (4m+3)\,a^{2m+2}\,V_0\,(-1)^m\frac{(2m)!}{2^{2m+1}\,m!\,(m+1)!}.
Note that \Phi(a,x) is an odd function; that is, \Phi(a,-x) = -\Phi(a,x), as is evident from its
definition. Thus, only odd polynomials appear in the expansion of \Phi(a,x) to preserve this
property. Having found the coefficients, we can write the potential:

\Phi(r,\theta) = V_0\sum_{m=0}^{\infty}(-1)^m\frac{(4m+3)(2m)!}{2^{2m+1}\,m!\,(m+1)!}\left(\frac{a}{r}\right)^{2m+2} P_{2m+1}(\cos\theta). ■
Figure 7.1 The voltage is +V_0 for the upper hemisphere, where 0 \leq \theta < \pi/2, or where
0 < \cos\theta \leq 1. It is -V_0 for the lower hemisphere, where \pi/2 < \theta \leq \pi, or where
-1 \leq \cos\theta < 0.
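The series just derived converges rapidly for r > a. A sketch of its partial sums (assuming NumPy, with V_0 = a = 1 as illustrative defaults) confirms two of its qualitative features, the antisymmetry about the equator and the dipole-like far field:

```python
import math
from numpy.polynomial.legendre import Legendre

def phi(r, theta, V0=1.0, a=1.0, terms=40):
    """Partial sum of the series for the potential outside the split sphere."""
    total = 0.0
    for m in range(terms):
        c = ((-1)**m * (4 * m + 3) * math.factorial(2 * m)
             / (2**(2 * m + 1) * math.factorial(m) * math.factorial(m + 1)))
        total += c * (a / r)**(2 * m + 2) * Legendre.basis(2 * m + 1)(math.cos(theta))
    return V0 * total

# antisymmetry about the equator: Phi(r, pi - theta) = -Phi(r, theta)
assert math.isclose(phi(2.0, 0.3), -phi(2.0, math.pi - 0.3), rel_tol=1e-10)
# far field dominated by the m = 0 term, (3/2) V0 (a/r)^2 cos(theta)
assert abs(phi(50.0, 0.7) - 1.5 * 50.0**-2 * math.cos(0.7)) < 1e-6
print("potential at r = 2a on the axis:", phi(2.0, 0.0))
```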
The place where Legendre polynomials appear most naturally is, as mentioned
above, in the solution of Laplace's equation in spherical coordinates. After the par-
tial differential equation is transformed into three ordinary differential equations
using the method of the separation of variables, the differential equation corre-
sponding to the polar angle \theta gives rise to solutions of which Legendre polynomi-
als are special cases. This differential equation simplifies to the Legendre differential
equation if the substitution x = \cos\theta is made; in that case, the solutions will
be Legendre polynomials in x, or in \cos\theta. That is why the argument of P_k(x) is
restricted to the interval [-1, +1].
7.5.2. Example. We can expand the Dirac delta function in terms of Legendre polyno-
mials. We write

\delta(x) = \sum_{n=0}^{\infty} a_n P_n(x),

where

a_n = \frac{2n+1}{2}\int_{-1}^{1} P_n(x)\delta(x)\,dx = \frac{2n+1}{2}\,P_n(0).
For odd n this will give zero, because P_n(x) is an odd polynomial. To evaluate P_n(0) for
even n, we use the recurrence relation (7.20) for x = 0:

n P_n(0) = -(n-1) P_{n-2}(0),   or   P_n(0) = -\frac{n-1}{n}\,P_{n-2}(0).

Iterating this m times, we obtain

P_n(0) = (-1)^m\frac{(n-1)(n-3)\cdots(n-2m+1)}{n(n-2)(n-4)\cdots(n-2m+2)}\,P_{n-2m}(0).

For n = 2m, this yields P_{2m}(0) = (-1)^m\frac{(2m-1)(2m-3)\cdots 3\cdot 1}{2m(2m-2)\cdots 4\cdot 2}\,P_0(0). Now we "fill
the gaps" in the numerator by multiplying it, and the denominator, of course, by the
denominator. This yields

P_{2m}(0) = (-1)^m\frac{2m(2m-1)(2m-2)\cdots 3\cdot 2\cdot 1}{[2m(2m-2)\cdots 4\cdot 2]^2}
         = (-1)^m\frac{(2m)!}{[2^m m!]^2} = (-1)^m\frac{(2m)!}{2^{2m}(m!)^2},

because P_0(x) = 1. Thus, we can write

\delta(x) = \sum_{m=0}^{\infty}\frac{4m+1}{2}(-1)^m\frac{(2m)!}{2^{2m}(m!)^2}\,P_{2m}(x).   (7.28)
We can also derive this expansion as follows. For any complete set of orthonormal
vectors \{|f_k\rangle\}_{k=1}^{\infty}, we have

\delta(x-x') = w(x)\langle x|x'\rangle = w(x)\langle x|\mathbf{1}|x'\rangle
= w(x)\left\langle x\left|\left(\sum_k |f_k\rangle\langle f_k|\right)\right|x'\right\rangle = w(x)\sum_k f_k(x') f_k(x).

Legendre polynomials are not orthonormal; but we can make them so by dividing P_k(x) by
h_k^{1/2} = \sqrt{2/(2k+1)}. Then, noting that w(x) = 1, we obtain

\delta(x-x') = \sum_{k=0}^{\infty}\frac{P_k(x')}{\sqrt{2/(2k+1)}}\,\frac{P_k(x)}{\sqrt{2/(2k+1)}} = \sum_{k=0}^{\infty}\frac{2k+1}{2}\,P_k(x') P_k(x).

For x' = 0 we get \delta(x) = \sum_{k=0}^{\infty}\frac{2k+1}{2}\,P_k(0) P_k(x), which agrees with the previous
result. ■
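The expansion of \delta(x) can be tested through its sifting property: integrated against any polynomial g of degree at most 2N, the partial sum through P_{2N} must reproduce g(0) exactly, since the omitted terms are orthogonal to g. A sketch assuming NumPy and SciPy (the test polynomial is an arbitrary choice):

```python
import math
from numpy.polynomial.legendre import Legendre
from scipy.integrate import quad

def delta_partial(N):
    """Partial sum of the expansion of delta(x), through P_{2N}."""
    s = Legendre([0.0])
    for m in range(N + 1):
        a = ((4 * m + 1) / 2 * (-1)**m
             * math.factorial(2 * m) / (2**(2 * m) * math.factorial(m)**2))
        s = s + a * Legendre.basis(2 * m)
    return s

S = delta_partial(6)                     # includes P_0 through P_12
g = lambda x: 1 + x**2 - 3 * x**4        # any polynomial of degree <= 12
val, _ = quad(lambda x: S(x) * g(x), -1, 1)
assert math.isclose(val, g(0.0), abs_tol=1e-7)   # sifting property: the integral is g(0)
print("integral against g:", val)
```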
7.6 Generating Functions
It is possible to generate all orthogonal polynomials of a certain kind from a single
function of two variables g(x,t) by repeated differentiation of that function. Such
generating function a function is called a generating function. This generating function is assumed
to be expandable in the form

g(x,t) = \sum_{n=0}^{\infty} a_n t^n F_n(x),

so that the nth derivative of g(x,t) with respect to t evaluated at t = 0 gives F_n(x)
to within a multiplicative constant. The constant a_n is introduced for convenience.
Clearly, for g(x,t) to be useful, it must be in closed form. The derivation of such
a function for general F_n(x) is nontrivial, and we shall not attempt to derive such
a general generating function, as we did, for instance, for the general Rodriguez
formula. Instead, we simply quote these functions in Table 7.2, and leave the
derivation of the generating functions of Hermite and Legendre polynomials as
Problems 7.14 and 7.20. For the derivation of the Laguerre generating function, see
[Hassani, 2000] pp. 606-607.
Polynomial                      Generating function                      a_n
Hermite, H_n(x)                 exp(-t^2 + 2xt)                          1/n!
Laguerre, L_n^\nu(x)            exp[-xt/(1-t)]/(1-t)^{\nu+1}             1
Chebyshev (1st kind), T_n(x)    (1 - t^2)(t^2 - 2xt + 1)^{-1}            2, n \neq 0; a_0 = 1
Chebyshev (2nd kind), U_n(x)    (t^2 - 2xt + 1)^{-1}                     1

Table 7.2 Generating functions for selected polynomials
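Each entry of Table 7.2 can be verified by summing the series at a small value of t and comparing with the closed form. The sketch below (assuming NumPy) does this for the Hermite and first-kind Chebyshev rows; the sample values of x and t are arbitrary:

```python
import math
from numpy.polynomial.hermite import Hermite
from numpy.polynomial.chebyshev import Chebyshev

x, t = 0.7, 0.1          # |t| < 1 so the truncated series converges rapidly

# Hermite row: exp(-t^2 + 2xt) = sum_n (1/n!) t^n H_n(x)
lhs = math.exp(-t**2 + 2 * x * t)
rhs = sum(t**n * Hermite.basis(n)(x) / math.factorial(n) for n in range(30))
assert math.isclose(lhs, rhs, rel_tol=1e-12)

# Chebyshev (1st kind) row: (1 - t^2)/(t^2 - 2xt + 1) = 1 + 2 sum_{n>=1} t^n T_n(x)
lhs = (1 - t**2) / (t**2 - 2 * x * t + 1)
rhs = 1 + 2 * sum(t**n * Chebyshev.basis(n)(x) for n in range(1, 60))
assert math.isclose(lhs, rhs, rel_tol=1e-12)
print("generating-function rows verified at x =", x, "t =", t)
```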
7.7 Problems
7.1. Let n = 1 in Equation (7.1) and solve for s\,dw/dx. Now substitute this in the
derivative of w s^n p_k and show that the derivative is equal to w s^{n-1} p_{k+1}. Repeat
this process m times to prove Lemma 7.1.2.
7.2. Find w(x), a, and b for the case of the classical orthogonal polynomials in
which s(x) is of second degree.
7.3. Integrate by parts twice and use Lemma 7.1.2 to show that

\int_a^b F_m (wsF_n')'\,dx = 0   for m < n.
7.4. (a) Using Lemma 7.1.2 conclude that (wsF_n')'/w is a polynomial of degree
less than or equal to n.
(b) Write (wsF_n')'/w as a linear combination of F_i(x), and use their orthogonality
and Problem 7.3 to show that the linear combination collapses to a single term.
(c) Multiply both sides of the differential equation so obtained by F_n and integrate.
The RHS becomes h_n\lambda_n. For the LHS, carry out the differentiation and note that
(ws)'/w = K_1 F_1. Now show that K_1 F_1 F_n' + sF_n'' is a polynomial of degree n, and
that the LHS of the differential equation yields \{K_1 k_1 n + \sigma_2 n(n-1)\} h_n. Now
find \lambda_n.
7.5. Derive the recurrence relation of Equation (7.8). Hint: Differentiate Equation
(7.5) and substitute for F_n'' from the differential equation. Now multiply the result-
ing equation by \alpha_n x + \beta_n and substitute for (\alpha_n x + \beta_n)F_n' from one of the earlier
recurrence relations.
7.6. Using only the orthogonality of Hermite polynomials

\int_{-\infty}^{\infty} e^{-x^2} H_m(x) H_n(x)\,dx = \sqrt{\pi}\,2^n n!\,\delta_{mn}

(and the fact that they are polynomials) generate the first three of them.
7.7. Show that for Legendre polynomials, k_n = 2^n\Gamma(n+\frac{1}{2})/[n!\,\Gamma(\frac{1}{2})]. Hint:
Multiply and divide the expression given in the book by n!; take a factor of 2 out
of all terms in the numerator; the even terms yield a factor of n!, and the odd terms
give a gamma function.
7.8. Using integration by parts several times, show that

\int_{-1}^{1}(1-x^2)^n\,dx = \frac{2^m n(n-1)\cdots(n-m+1)}{1\cdot 3\cdot 5\cdot 7\cdots(2m-1)}\int_{-1}^{1} x^{2m}(1-x^2)^{n-m}\,dx.

Now show that \int_{-1}^{1}(1-x^2)^n\,dx = 2\Gamma(\frac{1}{2})n!/[(2n+1)\Gamma(n+\frac{1}{2})].
7.9. Use the generalized Rodriguez formula for Hermite polynomials and integra-
tion by parts to expand x^{2k} and x^{2k+1} in terms of Hermite polynomials.
7.10. Use the recurrence relation for Hermite polynomials to show that

\int_{-\infty}^{\infty} x e^{-x^2} H_m(x) H_n(x)\,dx = \sqrt{\pi}\,2^{n-1} n!\left[\delta_{m,n-1} + 2(n+1)\delta_{m,n+1}\right].

What happens when m = n?
7.11. Apply the general formalism of the recurrence relations given in the book
to Hermite polynomials to find the following:

H_n + H_{n-1}' - 2x H_{n-1} = 0.
7.12. Show that \int_{-\infty}^{\infty} x^2 e^{-x^2} H_n^2(x)\,dx = \sqrt{\pi}\,2^n(n+\frac{1}{2})\,n!.
7.13. Use a recurrence relation for Hermite polynomials to show that

H_n(0) = \begin{cases} 0 & \text{if } n \text{ is odd}, \\ (-1)^m\dfrac{(2m)!}{m!} & \text{if } n = 2m. \end{cases}
7.14. Differentiate the expansion of g(x,t) for Hermite polynomials with respect
to x (treating t as a constant) and choose a_n such that n a_n = a_{n-1} to obtain
a differential equation for g. Solve this differential equation. To determine the
"constant" of integration use the result of Problem 7.13 to show that g(0,t) = e^{-t^2}.
7.15. Use the expansion of the generating function for Hermite polynomials to
obtain

e^{-t^2-s^2+2x(t+s)} = \sum_{m,n=0}^{\infty}\frac{s^m t^n}{m!\,n!}\,H_m(x) H_n(x).

Then integrate both sides over x and use the orthogonality of the Hermite polyno-
mials to get

\sum_{n=0}^{\infty}\frac{(st)^n}{(n!)^2}\int_{-\infty}^{\infty} e^{-x^2} H_n^2(x)\,dx = \sqrt{\pi}\,e^{2st}.

Deduce from this the normalization constant h_n of H_n(x).
7.16. Using the recurrence relation of Equation (7.14) repeatedly, show that

\int_{-\infty}^{\infty} x^k e^{-x^2} H_m(x) H_{m+n}(x)\,dx = \begin{cases} 0 & \text{if } n > k, \\ \sqrt{\pi}\,2^m(m+k)! & \text{if } n = k. \end{cases}
7.17. Given that P_0(x) = 1 and P_1(x) = x,
(a) use (7.20) repeatedly to show that P_n(1) = 1.
(b) Using the same equation, find P_2(x), P_3(x), and P_4(x).
7.18. Apply the general formalism of the recurrence relations given in the book
to find the following two relations for Legendre polynomials:
(a) n P_n - x P_n' + P_{n-1}' = 0.
(b) (1-x^2) P_n' - n P_{n-1} + n x P_n = 0.
7.19. Show that \int_{-1}^{1} x^n P_n(x)\,dx = 2^{n+1}(n!)^2/(2n+1)!. Hint: Use the definition
of h_n and k_n and the fact that P_n is orthogonal to any polynomial of degree lower
than n.
7.20. Differentiate the expansion of g(x,t) for Legendre polynomials, and choose
a_n = 1. For P_n', you will substitute two different expressions to get two equations.
First use Equation (7.11) with n+1 replaced by n, to obtain

(1-t^2)\frac{dg}{dx} + tg = 2\sum_{n=2}^{\infty} n t^n P_{n-1} + 2t.

As an alternative, use Equation (7.10) to substitute for P_n' and get

(1-xt)\frac{dg}{dx} = \sum_{n=2}^{\infty} n t^n P_{n-1} + t.

Combine the last two equations to get (t^2 - 2xt + 1)g' = tg. Solve this differential
equation and determine the constant of integration by evaluating g(x,0).
7.21. Use the generating function for Legendre polynomials to show that P_n(1) =
1, P_n(-1) = (-1)^n, P_n(0) = 0 for odd n, and P_n'(1) = n(n+1)/2.
7.22. Both electrostatic and gravitational potential energies depend on the quantity
1/|r - r'|, where r' is the position of the source (charge or mass) and r is the
observation point.
(a) Let r lie along the z-axis, and use spherical coordinates and the definition of
generating functions to show that

\frac{1}{|r - r'|} = \frac{1}{r_>}\sum_{n=0}^{\infty}\left(\frac{r_<}{r_>}\right)^n P_n(\cos\theta),

where r_< (r_>) is the smaller (larger) of r and r', and \theta is the polar angle.
(b) The electrostatic or gravitational potential energy \Phi(r) is given by

\Phi(r) = k\iiint\frac{\rho(r')}{|r - r'|}\,d^3x',

where k is a constant and \rho(r') is the (charge or mass) density function. Use the
result of part (a) to show that if the density depends only on r', and not on any
angle (i.e., \rho is spherically symmetric), then \Phi(r) reduces to the potential energy
of a point charge at the origin for r > r'.
(c) What is \Phi(r) (in the form of an integral) for r < a for a spherically sym-
metric density that extends from the origin to a?
(d) Show that E (or g) is given by [kQ(r)/r^2]\,\hat{e}_r, where Q(r) is the charge (or
mass) enclosed in a sphere of radius r.
7.23. Use the generating function for Legendre polynomials and their orthogonal-
ity to derive the relation

\int_{-1}^{1}\frac{dx}{1-2xt+t^2} = \sum_{n=0}^{\infty} t^{2n}\int_{-1}^{1} P_n^2(x)\,dx.

Integrate the LHS, expand the result in powers of t, and compare these powers on
both sides to obtain the normalization constant h_n.
7.24. Evaluate the following integral using the expansion of the generating func-
tion for Legendre polynomials.

(a) \int_0^{\pi}\frac{(a\cos\theta + b)\sin\theta}{\sqrt{a^2 + 2ab\cos\theta + b^2}}\,d\theta.
7.25. Differentiate the expansion of the Legendre polynomial generating function
with respect to x and manipulate the resulting expression to obtain

(1-2xt+t^2)\sum_{n=0}^{\infty} t^n P_n'(x) = t\sum_{n=0}^{\infty} t^n P_n(x).

Equate equal powers of t on both sides to derive the recurrence relation

P_{n+1}' + P_{n-1}' - 2x P_n' - P_n = 0.
7.26. Show that

\int_0^1 P_k(x)\,dx = \begin{cases} \delta_{k0} & \text{if } k \text{ is even}, \\ \dfrac{(-1)^{(k-1)/2}(k-1)!}{2^k\left(\frac{k-1}{2}\right)!\left(\frac{k+1}{2}\right)!} & \text{if } k \text{ is odd}. \end{cases}

Hint: For even k, extend the region of integration to (-1,1) and use the orthogo-
nality property. For odd k, note that

\left.\frac{d^{k-1}}{dx^{k-1}}(1-x^2)^k\right|_0^1

gives zero for the upper limit (by Lemma 7.1.3). For the lower limit, expand the
expression using the binomial theorem, and carry out the differentiation, keeping
in mind that only one term of the expansion contributes.
7.27. Show that g(x,t) = g(-x,-t) for both Hermite and Legendre polynomials.
Now expand g(x,t) and g(-x,-t) and compare the coefficients of t^n to obtain
parity relations the parity relations for these polynomials:

H_n(-x) = (-1)^n H_n(x)   and   P_n(-x) = (-1)^n P_n(x).
7.28. Derive the orthogonality of Legendre polynomials directly from the differ-
ential equation they satisfy.
7.29. Expand |x| in the interval (-1,+1) in terms of Legendre polynomials. Hint:
Use the result of Problem 7.26.
7.30. Apply the general formalism of the recurrence relations given in the book
to find the following two relations for Laguerre polynomials:
(a) n L_n^\nu - (n+\nu) L_{n-1}^\nu - x\dfrac{dL_n^\nu}{dx} = 0.
(b) (n+1) L_{n+1}^\nu - (2n+\nu+1-x) L_n^\nu + (n+\nu) L_{n-1}^\nu = 0.
7.31. From the generating function for Laguerre polynomials given in Table 7.2
deduce that L_n^\nu(0) = \Gamma(n+\nu+1)/[n!\,\Gamma(\nu+1)].
7.32. Let L_n \equiv L_n^0. Now differentiate both sides of

g(x,t) = \frac{e^{-xt/(1-t)}}{1-t} = \sum_{n=0}^{\infty} t^n L_n(x)

with respect to x and compare powers of t to obtain L_n'(0) = -n and L_n''(0) =
\frac{1}{2}n(n-1). Hint: Differentiate 1/(1-t) = \sum_{n=0}^{\infty} t^n to get an expression for
(1-t)^{-2}.
7.33. Expand e^{-kx} as a series of Laguerre polynomials L_n^\nu(x). Find the coefficients
by using (a) the orthogonality of L_n^\nu(x) and (b) the generating function.
7.34. Derive the recurrence relations given in the book for Jacobi, Gegenbauer,
and Chebyshev polynomials.
7.35. Show that T_n(-x) = (-1)^n T_n(x) and U_n(-x) = (-1)^n U_n(x). Hint: Use
g(x,t) = g(-x,-t).
7.36. Show that T_n(1) = 1, U_n(1) = n+1, T_n(-1) = (-1)^n, U_n(-1) =
(-1)^n(n+1), T_{2m}(0) = (-1)^m = U_{2m}(0), and T_{2m+1}(0) = 0 = U_{2m+1}(0).
Additional Reading
1. Dennery, P. and Krzywicki, A. Mathematics for Physicists, Harper and Row,
1967. Treats the classical orthogonal polynomials in the spirit of this chapter.
2. Tricomi, F. Vorlesungen über Orthogonalreihen, Springer, 1955. The origi-
nal unified treatment of the classical orthogonal polynomials.
8
Fourier Analysis
The single most recurring theme of mathematical physics is Fourier analysis. It
shows up, for example, in classical mechanics and the analysis of normal modes,
in electromagnetic theory and the frequency analysis of waves, in noise consid-
erations and thermal physics, in quantum theory and the transformation between
momentum and coordinate representations, and in relativistic quantum field theory
and creation and annihilation operator formalism.
8.1 Fourier Series
One way to begin the study of Fourier series and transforms is to invoke a general-
ization of the Stone-Weierstrass Approximation Theorem (Theorem 5.2.3), which
established the completeness of monomials, x^k. The generalization of Theorem
5.2.3 permits us to find another set of orthogonal functions in terms of which
we can expand an arbitrary function. This generalization involves polynomials in
more than one variable. (For a proof of this theorem, see Simmons [Simm 83, pp.
160-161].)
generalized Stone-Weierstrass theorem
8.1.1. Theorem. (generalized Stone-Weierstrass theorem) Suppose that f(x_1, x_2,
\ldots, x_n) is continuous in the domain \{a_i \leq x_i \leq b_i\}_{i=1}^n. Then it can be expanded
in terms of the monomials x_1^{k_1} x_2^{k_2}\cdots x_n^{k_n}, where the k_i are nonnegative integers.
Now let us consider functions that are periodic and investigate their expan-
sion in terms of elementary periodic functions. We use the generalized Stone-
Weierstrass theorem with two variables, x and y. A function g(x,y) can be written
as g(x,y) = \sum_{k,m=0}^{\infty} a_{km} x^k y^m. In this equation, x and y can be considered as co-
ordinates in the xy-plane, which in turn can be written in terms of polar coordinates
r and \theta. In that case, we obtain

f(r,\theta) \equiv g(r\cos\theta, r\sin\theta) = \sum_{k,m=0}^{\infty} a_{km}\,r^{k+m}\cos^k\theta\,\sin^m\theta.

In particular, if we let r = 1, we obtain a function of \theta alone, which upon substi-
tution of complex exponentials for \sin\theta and \cos\theta becomes

f(\theta) = \sum_{n=-\infty}^{\infty} b_n e^{in\theta},   (8.1)

where b_n is a constant that depends on the a_{km}. The RHS of (8.1) is periodic with
period 2\pi; thus, it is especially suitable for periodic functions f(\theta) that satisfy the
periodicity condition f(\theta - \pi) = f(\theta + \pi).
We can also write Equation (8.1) as

f(\theta) = b_0 + \sum_{n=1}^{\infty} \left( b_n e^{in\theta} + b_{-n} e^{-in\theta} \right)
          = b_0 + \sum_{n=1}^{\infty} \left[ \underbrace{(b_n + b_{-n})}_{\equiv A_n} \cos n\theta + \underbrace{i(b_n - b_{-n})}_{\equiv B_n} \sin n\theta \right]
          = b_0 + \sum_{n=1}^{\infty} \left( A_n \cos n\theta + B_n \sin n\theta \right).    (8.2)
If f(\theta) is real, then b_0, A_n, and B_n are also real. Equation (8.1) or (8.2) is called the Fourier series expansion of f(\theta).
Let us now concentrate on the elementary periodic functions e^{in\theta}. We define the set \{|e_n\rangle\}_{n=-\infty}^{\infty} such that their "\theta th components" are given by

\langle\theta|e_n\rangle = \frac{1}{\sqrt{2\pi}} e^{in\theta}, \qquad \theta \in (-\pi, \pi).

These functions, or ket vectors, belong to \mathcal{L}^2(-\pi, \pi) and are orthonormal, as can be easily verified. It can also be shown that they are complete. In fact, for functions that are continuous on (-\pi, \pi), this is a result of the generalized Stone-Weierstrass theorem. It turns out, however, that \{|e_n\rangle\}_{n=-\infty}^{\infty} is also a complete orthonormal sequence for piecewise continuous functions on (-\pi, \pi).1 Therefore, any periodic piecewise continuous function of \theta can be expressed as a linear combination of these orthonormal vectors. Thus if |f\rangle \in \mathcal{L}^2(-\pi, \pi), then

|f\rangle = \sum_{n=-\infty}^{\infty} f_n |e_n\rangle, \qquad \text{where } f_n = \langle e_n|f\rangle.    (8.3)
1 A piecewise continuous function on a finite interval is one that has a finite number of discontinuities in its interval of definition.
198 8. FOURIER ANALYSIS
We can write this as a functional relation if we take the \theta th component of both sides: \langle\theta|f\rangle = \sum_{n=-\infty}^{\infty} f_n \langle\theta|e_n\rangle, or

Fourier series expansion: angular expression

f(\theta) = \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} f_n e^{in\theta},    (8.4)

with f_n given by

f_n = \langle e_n|\mathbf{1}|f\rangle = \langle e_n| \left( \int_{-\pi}^{\pi} |\theta\rangle\langle\theta|\, d\theta \right) |f\rangle = \int_{-\pi}^{\pi} \langle e_n|\theta\rangle \langle\theta|f\rangle\, d\theta
    = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} e^{-in\theta} f(\theta)\, d\theta.    (8.5)

"The profound study of nature is the most fruitful source of mathematical discoveries."
Joseph Fourier

fundamental cell of a periodic function
It is important to note that even though f(\theta) may be defined only for -\pi \leq \theta \leq \pi, Equation (8.4) extends the domain of definition of f(\theta) to all the intervals (2k-1)\pi \leq \theta \leq (2k+1)\pi for all k \in \mathbb{Z}. Thus, if a function is to be represented by Equation (8.4) without any specification of the interval of definition, it must be periodic in \theta. For such functions, the interval of definition can be translated by a multiple of 2\pi. Thus, f(\theta) with -\pi \leq \theta \leq \pi is equivalent to f(\theta - 2m\pi) with 2m\pi - \pi \leq \theta \leq 2m\pi + \pi; both will give the same Fourier series expansion. We shall define periodic functions in their fundamental cell, such as (-\pi, \pi).
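Equations (8.4) and (8.5) are easy to exercise numerically: compute the coefficients f_n by quadrature and check that the partial sums of (8.4) reproduce the function. The following is a minimal sketch, not from the text; the choice f(\theta) = \theta^2 and all numerical parameters are illustrative.

```python
import numpy as np

# Illustrative check of Eqs. (8.4)-(8.5) for f(theta) = theta**2 on (-pi, pi).
theta = np.linspace(-np.pi, np.pi, 20001)
dtheta = theta[1] - theta[0]
f = theta**2

def coeff(n):
    # f_n = (1/sqrt(2*pi)) * integral_{-pi}^{pi} exp(-i*n*theta) f(theta) d(theta)
    return np.sum(np.exp(-1j * n * theta) * f) * dtheta / np.sqrt(2 * np.pi)

N = 50
ns = np.arange(-N, N + 1)
fn = np.array([coeff(n) for n in ns])

# Partial sum of Eq. (8.4); it should approach f as N grows
f_approx = (np.exp(1j * np.outer(theta, ns)) @ fn).real / np.sqrt(2 * np.pi)
max_err = np.max(np.abs(f_approx - f))
```

Since f(-\pi) = f(\pi), the periodic extension is continuous and the truncation error is bounded by the tail of the coefficients, here of order 1/N.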
Joseph Fourier (1768-1830) did very well as a young student of mathematics but had set his heart on becoming an army officer. Denied a commission because he was the son of a tailor, he went to a Benedictine school with the hope that he could continue studying mathematics at its seminary in Paris. The French Revolution changed those plans and set the stage for many of the personal circumstances of Fourier's later years, due in part to his courageous defense of some of its victims, an action that led to his arrest in 1794. He was released later that year, and he enrolled as a student in the Ecole Normale, which opened and closed within a year. His performance there, however, was enough to earn him a position as assistant lecturer (under Lagrange and Monge) in the Ecole Polytechnique. He was an excellent mathematical physicist, was a friend of Napoleon (so far as such people have friends), and accompanied him in 1798 to Egypt, where Fourier held various diplomatic and administrative posts while also conducting research. Napoleon took note of his accomplishments and, on Fourier's return to France in 1801, appointed him prefect of the district of Isère, in southeastern France, and in this capacity built the first real road from Grenoble to Turin. He also befriended the boy Champollion, who later deciphered the Rosetta stone as the first long step toward understanding the hieroglyphic writing of the ancient Egyptians.
Like other scientists of his time, Fourier took up the flow of heat. The flow was of interest as a practical problem in the handling of metals in industry and as a scientific problem in attempts to determine the temperature in the interior of the earth, the variation of that temperature with time, and other such questions. He submitted a basic paper on heat conduction to the Academy of Sciences of Paris in 1807. The paper was judged by Lagrange, Laplace, and Legendre. The paper was not published, mainly due to the objections of Lagrange, who had earlier rejected the use of trigonometric series. But the Academy did wish to encourage Fourier to develop his ideas, and so made the problem of the propagation of heat the subject of a grand prize to be awarded in 1812. Fourier submitted a revised paper in 1811, which was judged by the men already mentioned and others. It won the prize but was criticized for its lack of rigor and so was not published at that time in the Mémoires of the Academy.
He developed a mastery of clear notation, some of which is still in use today. (The modern integral sign and the placement of the limits of integration near its top and bottom were introduced by Fourier.) It was also his habit to maintain close association between mathematical relations and physically measurable quantities, especially in limiting or asymptotic cases, even performing some of the experiments himself. He was one of the first to begin full incorporation of physical constants into his equations, and made considerable strides toward the modern ideas of units and dimensional analysis.
Fourier continued to work on the subject of heat and, in 1822, published one of the classics of mathematics, Théorie Analytique de la Chaleur, in which he made extensive use of the series that now bear his name and incorporated the first part of his 1811 paper practically without change. Two years later he became secretary of the Academy and was able to have his 1811 paper published in its original form in the Mémoires.
Fourier series were of profound significance in connection with the evolution of the concept of a function, the rigorous theory of definite integrals, and the development of Hilbert spaces. Fourier claimed that "arbitrary" graphs can be represented by trigonometric series and should therefore be treated as legitimate functions, and it came as a shock to many that he turned out to be right. The classical definition of the definite integral due to Riemann was first given in his fundamental paper of 1854 on the subject of Fourier series. Hilbert thought of a function as represented by an infinite sequence, the Fourier coefficients of the function.
Fourier himself is one of the fortunate few: his name has become rooted in all civilized languages as an adjective that is well-known to physical scientists and mathematicians in every part of the world.
Functions are not always defined on (-\pi, \pi). Let us consider a function F(x) that is defined on (a, b) and is periodic with period L = b - a. We define a new variable,

\theta = \frac{2\pi}{L} \left( x - a - \frac{L}{2} \right),

and note that f(\theta) \equiv F((L/2\pi)\theta + a + L/2) has period 2\pi because

f(\theta \pm \pi) = F\left( \frac{L}{2\pi}(\theta \pm \pi) + a + \frac{L}{2} \right) = F\left( x \pm \frac{L}{2} \right)

and F(x + L/2) = F(x - L/2). It follows that we can expand the latter as in Equation (8.4). Using that equation, but writing \theta in terms of x, we obtain

Fourier series expansion: general expression

F(x) = F\left( \frac{L}{2\pi}\theta + a + \frac{L}{2} \right) = \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} f_n \exp\left[ \frac{2n\pi i}{L} \left( x - a - \frac{L}{2} \right) \right]
     = \frac{1}{\sqrt{L}} \sum_{n=-\infty}^{\infty} F_n e^{2n\pi i x/L},    (8.6)

where we have introduced2 F_n \equiv \sqrt{L/2\pi}\, f_n e^{-i(2n\pi/L)(a+L/2)}. Using Equation (8.5), we can write

F_n = \sqrt{\frac{L}{2\pi}}\, e^{-i(2n\pi/L)(a+L/2)} \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} e^{-in\theta} f(\theta)\, d\theta
    = \sqrt{\frac{L}{2\pi}}\, e^{-i(2n\pi/L)(a+L/2)} \int_{a}^{a+L} e^{-i(2n\pi/L)(x-a-L/2)} F(x)\, \frac{2\pi}{L}\, dx
    = \frac{1}{\sqrt{L}} \int_{a}^{b} e^{-i(2n\pi/L)x} F(x)\, dx.    (8.7)

The functions e^{2\pi i n x/L}/\sqrt{L} are easily seen to be orthonormal as members of \mathcal{L}^2(a, b). We can introduce \{|e_n\rangle\}_{n=-\infty}^{\infty} with the "xth component" given by \langle x|e_n\rangle = (1/\sqrt{L}) e^{2\pi i n x/L}. Then the reader may check that Equations (8.6) and (8.7) can be written as |F\rangle = \sum_{n=-\infty}^{\infty} F_n |e_n\rangle with F_n = \langle e_n|F\rangle.
square wave voltage

8.1.2. Example. In the study of electrical circuits, periodic voltage signals of different shapes are encountered. An example is a square wave voltage of height U_0, "duration" T, and "rest duration" T [see Figure 8.1(a)]. The potential as a function of time V(t) can be expanded as a Fourier series. The interval is (0, 2T) because that is one whole cycle of the potential variation. We therefore use Equation (8.6) and write

V(t) = \frac{1}{\sqrt{2T}} \sum_{n=-\infty}^{\infty} V_n e^{2n\pi i t/2T}, \qquad \text{where } V_n = \frac{1}{\sqrt{2T}} \int_{0}^{2T} e^{-2n\pi i t/2T} V(t)\, dt.

The problem is to find V_n. This is easily done by substituting

V(t) = \begin{cases} U_0 & \text{if } 0 \leq t \leq T, \\ 0 & \text{if } T \leq t \leq 2T \end{cases}

in the last integral:

V_n = \frac{U_0}{\sqrt{2T}} \int_{0}^{T} e^{-in\pi t/T}\, dt = \frac{U_0}{\sqrt{2T}} \left( -\frac{T}{in\pi} \right) \left[ (-1)^n - 1 \right] \quad \text{where } n \neq 0
    = \begin{cases} 0 & \text{if } n \text{ is even and } n \neq 0, \\ \dfrac{\sqrt{2T}\, U_0}{in\pi} & \text{if } n \text{ is odd.} \end{cases}
2 The F_n are defined such that what they multiply in the expansion are orthonormal in the interval (a, b).
Figure 8.1 (a) The periodic square wave potential. (b) Various approximations to the Fourier series of the square-wave potential. The dashed plot is that of the first term of the series, the thick grey plot keeps 3 terms, and the solid plot 15 terms.
For n = 0, we obtain V_0 = \frac{1}{\sqrt{2T}} \int_{0}^{2T} V(t)\, dt = \frac{1}{\sqrt{2T}} \int_{0}^{T} U_0\, dt = U_0 \sqrt{\frac{T}{2}}. Therefore, we can write

V(t) = \frac{1}{\sqrt{2T}} \left[ U_0 \sqrt{\frac{T}{2}} + \frac{\sqrt{2T}\, U_0}{i\pi} \left( \sum_{\substack{n=-\infty \\ n\ \text{odd}}}^{-1} \frac{1}{n} e^{in\pi t/T} + \sum_{\substack{n=1 \\ n\ \text{odd}}}^{\infty} \frac{1}{n} e^{in\pi t/T} \right) \right]
     = U_0 \left\{ \frac{1}{2} + \frac{1}{i\pi} \left[ \sum_{\substack{n=1 \\ n\ \text{odd}}}^{\infty} \frac{1}{-n} e^{-in\pi t/T} + \sum_{\substack{n=1 \\ n\ \text{odd}}}^{\infty} \frac{1}{n} e^{in\pi t/T} \right] \right\}
     = U_0 \left\{ \frac{1}{2} + \frac{2}{\pi} \sum_{k=0}^{\infty} \frac{1}{2k+1} \sin\left( \frac{(2k+1)\pi t}{T} \right) \right\}.

Figure 8.1(b) shows the graphical representation of the above sum when only a finite number of terms are present.
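The series just derived can be checked numerically by summing the odd-harmonic sine terms and comparing with the square wave. A minimal sketch follows; the values U_0 = T = 1 and the number of retained terms are arbitrary illustrative choices.

```python
import numpy as np

# Partial sums of the square-wave series of Example 8.1.2:
# V(t) = U0 * (1/2 + (2/pi) * sum over odd n of sin(n*pi*t/T)/n)
U0, T = 1.0, 1.0

def partial_sum(t, terms):
    s = 0.5 * np.ones_like(t)
    for k in range(terms):
        n = 2 * k + 1
        s += (2 / np.pi) * np.sin(n * np.pi * t / T) / n
    return U0 * s

t = np.linspace(0.05, 2 * T - 0.05, 2000)        # one full cycle, away from jumps
V_exact = np.where(t < T, U0, 0.0)
mean_err = np.mean(np.abs(partial_sum(t, 200) - V_exact))
mid_val = partial_sum(np.array([0.5 * T]), 200)[0]   # should be close to U0
```

The mean error shrinks as more terms are kept, except in a narrow region around the discontinuities (the Gibbs phenomenon, discussed below).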
sawtooth voltage

8.1.3. Example. Another frequently used voltage is the sawtooth voltage [see Figure 8.2(a)]. The equation for V(t) with period T is V(t) = U_0 t/T for 0 \leq t \leq T, and its Fourier representation is

V(t) = \frac{1}{\sqrt{T}} \sum_{n=-\infty}^{\infty} V_n e^{2n\pi i t/T}, \qquad \text{where } V_n = \frac{1}{\sqrt{T}} \int_{0}^{T} e^{-2\pi i n t/T} V(t)\, dt.
Substituting for V(t) in the integral above yields

V_n = \frac{1}{\sqrt{T}} \int_{0}^{T} e^{-2\pi i n t/T}\, U_0 \frac{t}{T}\, dt = U_0 T^{-3/2} \int_{0}^{T} e^{-2\pi i n t/T}\, t\, dt
    = U_0 T^{-3/2} \left( -\frac{T t}{2\pi i n} e^{-2\pi i n t/T} \Big|_{0}^{T} + \frac{T}{2\pi i n} \underbrace{\int_{0}^{T} e^{-2\pi i n t/T}\, dt}_{=0} \right)
    = U_0 T^{-3/2} \left( \frac{T^2}{-2\pi i n} \right) = \frac{U_0 \sqrt{T}}{-2\pi i n} \quad \text{where } n \neq 0,

V_0 = \frac{1}{\sqrt{T}} \int_{0}^{T} V(t)\, dt = \frac{1}{\sqrt{T}} \int_{0}^{T} U_0 \frac{t}{T}\, dt = \frac{1}{2} U_0 \sqrt{T}.

Thus,

V(t) = \frac{1}{\sqrt{T}} \left[ \frac{1}{2} U_0 \sqrt{T} - \frac{U_0 \sqrt{T}}{2\pi i} \left( \sum_{n=-\infty}^{-1} \frac{1}{n} e^{2\pi i n t/T} + \sum_{n=1}^{\infty} \frac{1}{n} e^{2\pi i n t/T} \right) \right]
     = U_0 \left\{ \frac{1}{2} - \frac{1}{\pi} \sum_{n=1}^{\infty} \frac{1}{n} \sin\left( \frac{2n\pi t}{T} \right) \right\}.

Figure 8.2(b) shows the graphical representation of the above series keeping the first few terms.
The foregoing examples indicate an important fact about Fourier series. At points of discontinuity (for example, t = T in the preceding two examples), the value of the function is not defined, but the Fourier series expansion assigns it a value: the average of the two values on the right and left of the discontinuity. For instance, when we substitute t = T in the series of Example 8.1.3, all the sine terms vanish and we obtain V(T) = U_0/2, the average of U_0 (on the left) and 0 (on the right). We express this as

V(T) = \frac{1}{2}[V(T - 0) + V(T + 0)] \equiv \frac{1}{2} \lim_{\epsilon \to 0} [V(T - \epsilon) + V(T + \epsilon)].

This is a general property of Fourier series. In fact, the main theorem of Fourier series, which follows, incorporates this property. (For a proof of this theorem, see [Cour 62].)
8.1.4. Theorem. The Fourier series of a function f(\theta) that is piecewise continuous in the interval (-\pi, \pi) converges to

\frac{1}{2}[f(\theta + 0) + f(\theta - 0)] \quad \text{for } -\pi < \theta < \pi;
\frac{1}{2}[f(\pi) + f(-\pi)] \quad \text{for } \theta = \pm\pi.
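The theorem is easy to verify on the sawtooth series of Example 8.1.3: at t = T every sine term vanishes identically, so any partial sum returns U_0/2, the average of the one-sided limits U_0 and 0. A minimal sketch (U_0 = T = 1 are arbitrary choices):

```python
import numpy as np

# Sawtooth series V(t) = U0 * (1/2 - (1/pi) * sum_n sin(2*pi*n*t/T)/n),
# evaluated at the discontinuity t = T and at an interior point.
U0, T = 1.0, 1.0

def sawtooth_sum(t, terms):
    s = 0.5
    for n in range(1, terms + 1):
        s -= np.sin(2 * np.pi * n * t / T) / (np.pi * n)
    return U0 * s

V_at_jump = sawtooth_sum(T, 500)          # every sine term is sin(2*pi*n) = 0
V_interior = sawtooth_sum(0.25 * T, 500)  # should approach U0 * t/T = 0.25
```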
Although we used exponential functions to find the Fourier expansion of the two examples above, it is more convenient to start with the trigonometric series when the expansion of a real function is sought.

Figure 8.2 (a) The periodic sawtooth potential. (b) Various approximations to the Fourier series of the sawtooth potential. The dashed plot is that of the first term of the series, the thick grey plot keeps 3 terms, and the solid plot 15 terms.

Equation (8.2) already gives such an expansion. All we need to do now is find expressions for A_n and B_n. From the definitions of A_n and the relation between b_n and f_n we get
A_n = b_n + b_{-n} = \frac{1}{\sqrt{2\pi}} (f_n + f_{-n})
    = \frac{1}{\sqrt{2\pi}} \left( \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} e^{-in\theta} f(\theta)\, d\theta + \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} e^{in\theta} f(\theta)\, d\theta \right)
    = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left[ e^{-in\theta} + e^{in\theta} \right] f(\theta)\, d\theta = \frac{1}{\pi} \int_{-\pi}^{\pi} \cos n\theta\, f(\theta)\, d\theta.    (8.8)

Similarly,

B_n = \frac{1}{\pi} \int_{-\pi}^{\pi} \sin n\theta\, f(\theta)\, d\theta, \qquad b_0 = \frac{1}{\sqrt{2\pi}} f_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\theta)\, d\theta \equiv \frac{1}{2} A_0.    (8.9)
So, for a function f(\theta) defined in (-\pi, \pi), the Fourier trigonometric series is as in Equation (8.2) with the coefficients given by Equations (8.8) and (8.9). For a function F(x), defined on (a, b), the trigonometric series becomes

F(x) = \frac{1}{2} A_0 + \sum_{n=1}^{\infty} \left( A_n \cos\frac{2n\pi x}{L} + B_n \sin\frac{2n\pi x}{L} \right),    (8.10)

where

A_n = \frac{2}{L} \int_{a}^{b} \cos\left( \frac{2n\pi x}{L} \right) F(x)\, dx, \qquad B_n = \frac{2}{L} \int_{a}^{b} \sin\left( \frac{2n\pi x}{L} \right) F(x)\, dx.    (8.11)

A convenient rule to remember is that for even (odd) functions, which are necessarily defined on a symmetric interval around the origin, only cosine (sine) terms appear in the Fourier expansion.
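The coefficient formulas (8.11) and the even/odd rule can be tested directly by quadrature. In this sketch (the choice F(x) = x on (a, b) = (-1, 1) is an illustrative assumption, not from the text), the function is odd, so every A_n should vanish, while B_n = 2(-1)^{n+1}/(n\pi).

```python
import numpy as np

# Eq. (8.11) for the odd function F(x) = x on (a, b) = (-1, 1), with L = 2.
x = np.linspace(-1.0, 1.0, 200001)
dx = x[1] - x[0]
F = x
L = 2.0

def A(n):
    # cosine coefficient: should vanish for an odd function
    return (2 / L) * np.sum(np.cos(2 * n * np.pi * x / L) * F) * dx

def B(n):
    # sine coefficient: analytically 2*(-1)**(n+1) / (n*pi) for F(x) = x
    return (2 / L) * np.sum(np.sin(2 * n * np.pi * x / L) * F) * dx
```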
8.1.5. Example. An alternating current is turned into a direct current by starting with a signal of the form V(t) \propto |\sin\omega t|, i.e., a harmonic function that is never negative, as shown in Figure 8.3(a). Then by proper electronics, one smooths out the "bumps" so that the output signal is very nearly a direct voltage. Let us Fourier-analyze the above signal. Since V(t) is even for -\pi < \omega t < \pi, we expect only cosine terms to be present. If for the time being we use \theta instead of \omega t, we can write |\sin\theta| = \frac{1}{2}A_0 + \sum_{n=1}^{\infty} A_n \cos n\theta, where

A_n = \frac{1}{\pi} \int_{-\pi}^{\pi} |\sin\theta| \cos n\theta\, d\theta = \frac{2}{\pi} \int_{0}^{\pi} \sin\theta \cos n\theta\, d\theta
    = \frac{1}{\pi} \int_{0}^{\pi} \left[ \sin(n+1)\theta - \sin(n-1)\theta \right] d\theta = \frac{2\left[(-1)^n + 1\right]}{(1 - n^2)\pi}
    = \begin{cases} -\dfrac{4}{\pi} \dfrac{1}{n^2 - 1} & \text{for } n \text{ even and } n \neq 0, \\ 0 & \text{for } n \text{ odd,} \end{cases}

and A_0 = (1/\pi) \int_{-\pi}^{\pi} |\sin\theta|\, d\theta = 4/\pi. The expansion then yields

|\sin\omega t| = \frac{2}{\pi} - \frac{4}{\pi} \sum_{k=1}^{\infty} \frac{\cos 2k\omega t}{4k^2 - 1},

where in the sum we substituted 2k for n, and \omega t for \theta. Figure 8.3(b) shows the graph of the series above when only the first few terms are kept.
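The rectified-sine series can be confirmed numerically; the truncation error is bounded by the tail \sum_{k > K} 4/[\pi(4k^2 - 1)] \sim 1/(\pi K). A minimal sketch (the number of terms is an arbitrary choice):

```python
import numpy as np

# Checking Example 8.1.5:
# |sin(theta)| = 2/pi - (4/pi) * sum_k cos(2*k*theta) / (4*k**2 - 1)
theta = np.linspace(-np.pi, np.pi, 2001)
series = (2 / np.pi) * np.ones_like(theta)
for k in range(1, 500):
    series -= (4 / np.pi) * np.cos(2 * k * theta) / (4 * k**2 - 1)
err = np.max(np.abs(series - np.abs(np.sin(theta))))
```

Note that the series converges uniformly here: |\sin\theta| is continuous, so there is no Gibbs overshoot.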
It is useful to have a representation of the Dirac delta function in terms of the present orthonormal basis of Fourier expansion. First we note that we can represent the delta function in terms of a series in any set of orthonormal functions (see Problem 8.23):

\delta(x - x') = \sum_{n} f_n(x) f_n^*(x') w(x).    (8.12)

Next we use the basis of the Fourier expansion, for which w(x) = 1. We then obtain

\delta(x - x') = \sum_{n=-\infty}^{\infty} \frac{e^{2\pi i n x/L}}{\sqrt{L}} \frac{e^{-2\pi i n x'/L}}{\sqrt{L}} = \frac{1}{L} \sum_{n=-\infty}^{\infty} e^{2\pi i n (x - x')/L}.
Figure 8.3 (a) The periodic "monopolar" sine potential. (b) Various approximations to the Fourier series of the "monopolar" sine potential. The dashed plot is that of the first term of the series, the thick grey plot keeps 3 terms, and the solid plot 15 terms.
8.1.1 The Gibbs Phenomenon

The plots of the Fourier series expansions in Figures 8.1(b) and 8.2(b) exhibit a feature that is common to all such expansions: At the discontinuity of the periodic function, the truncated Fourier series overestimates the actual function. This is called the Gibbs phenomenon, and is the subject of this subsection.
Let us approximate the infinite series with a finite sum. Then

f_N(\theta) = \frac{1}{\sqrt{2\pi}} \sum_{n=-N}^{N} f_n e^{in\theta} = \frac{1}{\sqrt{2\pi}} \sum_{n=-N}^{N} e^{in\theta} \frac{1}{\sqrt{2\pi}} \int_{0}^{2\pi} e^{-in\theta'} f(\theta')\, d\theta'
            = \frac{1}{2\pi} \int_{0}^{2\pi} d\theta'\, f(\theta') \sum_{n=-N}^{N} e^{in(\theta - \theta')},

where we substituted Equation (8.5) in the sum and, without loss of generality, changed the interval of integration from (-\pi, \pi) to (0, 2\pi). Problem 8.2 shows that the sum in the last equation is

\sum_{n=-N}^{N} e^{in(\theta - \theta')} = \frac{\sin[(N + \frac{1}{2})(\theta - \theta')]}{\sin[\frac{1}{2}(\theta - \theta')]}.

It follows that

f_N(\theta) = \frac{1}{2\pi} \int_{0}^{2\pi} d\theta'\, f(\theta')\, \frac{\sin[(N + \frac{1}{2})(\theta - \theta')]}{\sin[\frac{1}{2}(\theta - \theta')]}
            = \frac{1}{2\pi} \int_{-\theta}^{2\pi - \theta} d\phi\, f(\phi + \theta)\, \frac{\sin[(N + \frac{1}{2})\phi]}{\sin(\frac{1}{2}\phi)} \equiv \frac{1}{2\pi} \int_{-\theta}^{2\pi - \theta} d\phi\, f(\phi + \theta)\, S(\phi),    (8.13)

where \phi \equiv \theta' - \theta and S(\phi) \equiv \sin[(N + \frac{1}{2})\phi]/\sin(\frac{1}{2}\phi).
We want to investigate the behavior of f_N at a discontinuity of f. By translating the limits of integration if necessary, we can assume that the discontinuity of f occurs at a point a such that 0 < a < 2\pi. Let us denote the jump at this discontinuity for the function itself by \Delta f, and for its finite Fourier sum by \Delta f_N:

\Delta f \equiv f(a + \epsilon) - f(a - \epsilon), \qquad \Delta f_N \equiv f_N(a + \epsilon) - f_N(a - \epsilon).

maximum overshoot in Gibbs phenomenon calculated

Then, we have

\Delta f_N = \frac{1}{2\pi} \int_{-a-\epsilon}^{2\pi - a - \epsilon} d\phi\, f(\phi + a + \epsilon) S(\phi) - \frac{1}{2\pi} \int_{-a+\epsilon}^{2\pi - a + \epsilon} d\phi\, f(\phi + a - \epsilon) S(\phi)
 = \frac{1}{2\pi} \left\{ \int_{-a-\epsilon}^{-a+\epsilon} d\phi\, f(\phi + a + \epsilon) S(\phi) + \int_{-a+\epsilon}^{2\pi - a - \epsilon} d\phi\, f(\phi + a + \epsilon) S(\phi) \right\}
 - \frac{1}{2\pi} \left\{ \int_{-a+\epsilon}^{2\pi - a - \epsilon} d\phi\, f(\phi + a - \epsilon) S(\phi) + \int_{2\pi - a - \epsilon}^{2\pi - a + \epsilon} d\phi\, f(\phi + a - \epsilon) S(\phi) \right\}
 = \frac{1}{2\pi} \left\{ \int_{-a-\epsilon}^{-a+\epsilon} d\phi\, f(\phi + a + \epsilon) S(\phi) - \int_{2\pi - a - \epsilon}^{2\pi - a + \epsilon} d\phi\, f(\phi + a - \epsilon) S(\phi) \right\}
 + \frac{1}{2\pi} \int_{-a+\epsilon}^{2\pi - a - \epsilon} d\phi\, [f(\phi + a + \epsilon) - f(\phi + a - \epsilon)] S(\phi).

The first two integrals give zero because of the small ranges of integration and the continuity of the integrands in those intervals. The integrand of the third integral is almost zero for all values of the range of integration except when \phi \approx 0. Hence, we can confine the integration to the small interval (-\delta, +\delta), for which the difference in the square brackets is simply \Delta f. It now follows that

\Delta f_N(\delta) \approx \frac{\Delta f}{2\pi} \int_{-\delta}^{\delta} \frac{\sin[(N + \frac{1}{2})\phi]}{\sin(\frac{1}{2}\phi)}\, d\phi \approx \frac{\Delta f}{\pi} \int_{0}^{\delta} \frac{\sin[(N + \frac{1}{2})\phi]}{\frac{1}{2}\phi}\, d\phi,

where we have emphasized the dependence of f_N on \delta and approximated the sine in the denominator by its argument, a good approximation due to the smallness of \phi. The reader may find the plot of the integrand in Figure 6.2, where it is shown clearly that the major contribution to the integral comes from the interval [0, \pi/(N + \frac{1}{2})], where \pi/(N + \frac{1}{2}) is the first zero of the integrand. Furthermore, it is clear that if the upper limit is larger than \pi/(N + \frac{1}{2}), the result of the integral will decrease, because in each interval of length 2\pi, the area below the horizontal axis is larger than that above. Therefore, if we are interested in the maximum overshoot
of the finite sum, we must set the upper limit equal to \pi/(N + \frac{1}{2}). It follows firstly that the maximum overshoot of the finite sum occurs at \pi/(N + \frac{1}{2}) \approx \pi/N to the right of the discontinuity. Secondly, the amount of the maximum overshoot is

(\Delta f_N)_{max} \approx \frac{2\Delta f}{\pi} \int_{0}^{\pi/(N + \frac{1}{2})} \frac{\sin[(N + \frac{1}{2})\phi]}{\phi}\, d\phi = \frac{2\Delta f}{\pi} \int_{0}^{\pi} \frac{\sin x}{x}\, dx \approx 1.179\, \Delta f,    (8.14)

where in the second step we changed the variable of integration to x = (N + \frac{1}{2})\phi. Thus

8.1.6. Box. (Gibbs phenomenon) The finite (large-N) sum approximation of the discontinuous function overshoots the function itself at a discontinuity by about 18 percent.
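The 18 percent figure can be reproduced numerically: the overshoot factor of Eq. (8.14) is (2/\pi)\int_0^{\pi} (\sin x/x)\, dx \approx 1.179, and the jump of a truncated square-wave series (Example 8.1.2, unit jump) shows the same value. A sketch, with all parameter values illustrative:

```python
import numpy as np
from scipy.integrate import quad

# Overshoot factor of Eq. (8.14): (2/pi) * integral_0^pi sin(x)/x dx
gibbs = (2 / np.pi) * quad(lambda x: np.sin(x) / x, 0, np.pi)[0]

# Jump of an N-term square-wave partial sum near its discontinuity at t = 0
T, N = 1.0, 400
t = np.linspace(-0.05, 0.05, 20001)
s = 0.5 * np.ones_like(t)
for k in range(N):
    n = 2 * k + 1
    s += (2 / np.pi) * np.sin(n * np.pi * t / T) / n
jump_N = s.max() - s.min()   # overshoots the unit jump by about 18 percent
```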
8.1.2 Fourier Series in Higher Dimensions

It is instructive to generalize the Fourier series to more than one dimension. This generalization is especially useful in crystallography and solid-state physics, which deal with three-dimensional periodic structures. To generalize to N dimensions, we first consider a special case in which an N-dimensional periodic function is a product of N one-dimensional periodic functions. That is, we take the N functions

f^{(j)}(x) = \frac{1}{\sqrt{L_j}} \sum_{k=-\infty}^{\infty} f_k^{(j)} e^{2\pi i k x/L_j}, \qquad j = 1, 2, \ldots, N,

and multiply them on both sides to obtain

F(\mathbf{r}) = \frac{1}{\sqrt{V}} \sum_{\mathbf{k}} F_{\mathbf{k}}\, e^{i \mathbf{g}_{\mathbf{k}} \cdot \mathbf{r}},    (8.15)

where we have used the following new notations:

F(\mathbf{r}) \equiv f^{(1)}(x_1) f^{(2)}(x_2) \cdots f^{(N)}(x_N),
\mathbf{k} \equiv (k_1, k_2, \ldots, k_N),
\mathbf{g}_{\mathbf{k}} \equiv 2\pi (k_1/L_1, \ldots, k_N/L_N),
V \equiv L_1 L_2 \cdots L_N,
F_{\mathbf{k}} \equiv f_{k_1} \cdots f_{k_N},
\mathbf{r} \equiv (x_1, x_2, \ldots, x_N).

We take Equation (8.15) as the definition of the Fourier series for any periodic function of N variables (not just the product of N functions of a single variable). However, application of (8.15) requires some clarification. In one dimension, the
shape of the smallest region of periodicity is unique. It is simply a line segment of length L, for example. In two and more dimensions, however, such regions may have a variety of shapes. For instance, in two dimensions, they can be rectangles, pentagons, hexagons, and so forth. Thus, we let V in Equation (8.15) stand for a primitive cell of the N-dimensional lattice. This cell is important in solid-state physics, and (in three dimensions) is called the Wigner-Seitz cell.
It is customary to absorb the factor 1/\sqrt{V} into F_{\mathbf{k}}, and write

F(\mathbf{r}) = \sum_{\mathbf{k}} F_{\mathbf{k}}\, e^{i \mathbf{g}_{\mathbf{k}} \cdot \mathbf{r}}, \qquad F_{\mathbf{k}} = \frac{1}{V} \int F(\mathbf{r})\, e^{-i \mathbf{g}_{\mathbf{k}} \cdot \mathbf{r}}\, d^N x,    (8.16)

where the integral is over a single Wigner-Seitz cell.
Recall that F(\mathbf{r}) is a periodic function of \mathbf{r}. This means that when \mathbf{r} is changed by \mathbf{R}, where \mathbf{R} is a vector describing the boundaries of a cell, then we should get the same function: F(\mathbf{r} + \mathbf{R}) = F(\mathbf{r}). When substituted in (8.16), this yields F(\mathbf{r} + \mathbf{R}) = \sum_{\mathbf{k}} F_{\mathbf{k}} e^{i \mathbf{g}_{\mathbf{k}} \cdot (\mathbf{r} + \mathbf{R})} = \sum_{\mathbf{k}} e^{i \mathbf{g}_{\mathbf{k}} \cdot \mathbf{R}} F_{\mathbf{k}} e^{i \mathbf{g}_{\mathbf{k}} \cdot \mathbf{r}}, which is equal to F(\mathbf{r}) if

e^{i \mathbf{g}_{\mathbf{k}} \cdot \mathbf{R}} = 1, \qquad \text{i.e., } \mathbf{g}_{\mathbf{k}} \cdot \mathbf{R} = 2\pi \times (\text{integer}).    (8.17)

In three dimensions \mathbf{R} = m_1 \mathbf{a}_1 + m_2 \mathbf{a}_2 + m_3 \mathbf{a}_3, where m_1, m_2, and m_3 are integers and \mathbf{a}_1, \mathbf{a}_2, and \mathbf{a}_3 are crystal axes, which are not generally orthogonal. On the other hand, \mathbf{g}_{\mathbf{k}} = n_1 \mathbf{b}_1 + n_2 \mathbf{b}_2 + n_3 \mathbf{b}_3, where n_1, n_2, and n_3 are integers, and \mathbf{b}_1, \mathbf{b}_2, and \mathbf{b}_3 are the reciprocal lattice vectors defined by

\mathbf{b}_1 = 2\pi \frac{\mathbf{a}_2 \times \mathbf{a}_3}{\mathbf{a}_1 \cdot (\mathbf{a}_2 \times \mathbf{a}_3)}, \qquad \mathbf{b}_2 = 2\pi \frac{\mathbf{a}_3 \times \mathbf{a}_1}{\mathbf{a}_1 \cdot (\mathbf{a}_2 \times \mathbf{a}_3)}, \qquad \mathbf{b}_3 = 2\pi \frac{\mathbf{a}_1 \times \mathbf{a}_2}{\mathbf{a}_1 \cdot (\mathbf{a}_2 \times \mathbf{a}_3)}.

The reader may verify that \mathbf{b}_i \cdot \mathbf{a}_j = 2\pi \delta_{ij}. Thus

\mathbf{g}_{\mathbf{k}} \cdot \mathbf{R} = \left( \sum_{i=1}^{3} n_i \mathbf{b}_i \right) \cdot \left( \sum_{j=1}^{3} m_j \mathbf{a}_j \right) = \sum_{i,j} n_i m_j\, \mathbf{b}_i \cdot \mathbf{a}_j = 2\pi \sum_{j=1}^{3} m_j n_j = 2\pi \times (\text{integer}),

and Equation (8.17) is satisfied.
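The duality \mathbf{b}_i \cdot \mathbf{a}_j = 2\pi\delta_{ij} is quick to check numerically for a non-orthogonal set of axes. A minimal sketch; the particular axes below are arbitrary illustrative choices.

```python
import numpy as np

# Reciprocal lattice vectors: b1 = 2*pi*(a2 x a3)/(a1 . (a2 x a3)), cyclic in 1,2,3.
a1 = np.array([1.0, 0.0, 0.0])
a2 = np.array([0.5, np.sqrt(3) / 2, 0.0])   # not orthogonal to a1
a3 = np.array([0.2, 0.1, 1.3])

vol = np.dot(a1, np.cross(a2, a3))          # cell volume a1 . (a2 x a3)
b1 = 2 * np.pi * np.cross(a2, a3) / vol
b2 = 2 * np.pi * np.cross(a3, a1) / vol
b3 = 2 * np.pi * np.cross(a1, a2) / vol

# Matrix of dot products b_i . a_j; should be 2*pi times the identity
gram = np.array([[np.dot(b, a) for a in (a1, a2, a3)] for b in (b1, b2, b3)])
```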
8.2 The Fourier Transform
The Fourier series representation of F(x) is valid for the entire real line as long as F(x) is periodic. However, most functions encountered in physical applications are defined in some interval (a, b) without repetition beyond that interval. It would be useful if we could also expand such functions in some form of Fourier "series." One way to do this is to start with the periodic series and then let the period go to infinity while extending the domain of the definition of the function. As a
Figure 8.4 (a) The function we want to represent. (b) The Fourier series representation of the function.
specific case, suppose we are interested in representing a function f(x) that is defined only for the interval (a, b) and is assigned the value zero everywhere else [see Figure 8.4(a)]. To begin with, we might try the Fourier series representation, but this will produce a repetition of our function. This situation is depicted in Figure 8.4(b).
Next we may try a function g_\Lambda(x) defined in the interval (a - \Lambda/2, b + \Lambda/2), where \Lambda is an arbitrary positive number:

g_\Lambda(x) = \begin{cases} 0 & \text{if } a - \Lambda/2 < x < a, \\ f(x) & \text{if } a < x < b, \\ 0 & \text{if } b < x < b + \Lambda/2. \end{cases}

This function, which is depicted in Figure 8.5, has the Fourier series representation

g_\Lambda(x) = \frac{1}{\sqrt{L + \Lambda}} \sum_{n=-\infty}^{\infty} g_{\Lambda,n}\, e^{2i\pi n x/(L + \Lambda)},    (8.18)

where

g_{\Lambda,n} = \frac{1}{\sqrt{L + \Lambda}} \int_{a - \Lambda/2}^{b + \Lambda/2} e^{-2i\pi n x/(L + \Lambda)} g_\Lambda(x)\, dx.    (8.19)
We have managed to separate various copies of the original periodic function by \Lambda. It should be clear that if \Lambda \to \infty, we can completely isolate the function and stop the repetition. Let us investigate the behavior of Equations (8.18) and (8.19) as \Lambda grows without bound. First, we notice that the quantity k_n defined by k_n \equiv 2n\pi/(L + \Lambda) and appearing in the exponent becomes almost continuous. In other words, as n changes by one unit, k_n changes only slightly. This suggests that the terms in the sum in Equation (8.18) can be lumped together in intervals of width \Delta n_j, giving

g_\Lambda(x) = \frac{1}{\sqrt{L + \Lambda}} \sum_{j} g_\Lambda(k_j)\, e^{i k_j x}\, \Delta n_j,

where k_j \equiv 2n_j\pi/(L + \Lambda), and g_\Lambda(k_j) \equiv g_{\Lambda,n_j}. Substituting \Delta n_j = [(L + \Lambda)/2\pi] \Delta k_j in the above sum, we obtain

g_\Lambda(x) = \frac{1}{\sqrt{2\pi}} \sum_{j} \tilde{g}_\Lambda(k_j)\, e^{i k_j x}\, \Delta k_j,

where we introduced \tilde{g}_\Lambda(k_j) defined by \tilde{g}_\Lambda(k_j) \equiv \sqrt{(L + \Lambda)/2\pi}\, g_\Lambda(k_j). It is now clear that the preceding sum approaches an integral in the limit that \Lambda \to \infty. In the same limit, g_\Lambda(x) \to f(x), and we have

Fourier integral transforms

f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \tilde{f}(k)\, e^{ikx}\, dk,    (8.20)

where

\tilde{f}(k) \equiv \lim_{\Lambda \to \infty} \tilde{g}_\Lambda(k_j) = \lim_{\Lambda \to \infty} \sqrt{\frac{L + \Lambda}{2\pi}}\, g_\Lambda(k_j)
 = \lim_{\Lambda \to \infty} \sqrt{\frac{L + \Lambda}{2\pi}} \frac{1}{\sqrt{L + \Lambda}} \int_{a - \Lambda/2}^{b + \Lambda/2} e^{-i k_j x} g_\Lambda(x)\, dx
 = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx.    (8.21)

Equations (8.20) and (8.21) are called the Fourier integral transforms of \tilde{f}(k) and f(x), respectively.
8.2.1. Example. Let us evaluate the Fourier transform of the function defined by

f(x) = \begin{cases} b & \text{if } |x| < a, \\ 0 & \text{if } |x| > a \end{cases}

(see Figure 8.6). From (8.21) we have

\tilde{f}(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx = \frac{b}{\sqrt{2\pi}} \int_{-a}^{a} e^{-ikx}\, dx = \frac{2ab}{\sqrt{2\pi}} \left( \frac{\sin ka}{ka} \right),

which is the function encountered (and depicted) in Example 6.1.2.
Figure 8.5 By introducing the parameter \Lambda, we have managed to separate the copies of the function.
Let us discuss this result in detail. First, note that if a \to \infty, the function f(x) becomes a constant function over the entire real line, and we get

\tilde{f}(k) = \frac{2b}{\sqrt{2\pi}} \lim_{a \to \infty} \frac{\sin ka}{k} = \frac{2b}{\sqrt{2\pi}}\, \pi\, \delta(k) = \sqrt{2\pi}\, b\, \delta(k)

by the result of Example 6.1.2. This is the Fourier transform of an everywhere-constant function (see Problem 8.12). Next, let b \to \infty and a \to 0 in such a way that 2ab, which is the area under f(x), is 1. Then f(x) will approach the delta function, and \tilde{f}(k) becomes

\tilde{f}(k) = \lim_{\substack{b \to \infty \\ a \to 0}} \frac{2ab}{\sqrt{2\pi}} \frac{\sin ka}{ka} = \frac{1}{\sqrt{2\pi}} \lim_{a \to 0} \frac{\sin ka}{ka} = \frac{1}{\sqrt{2\pi}}.

So the Fourier transform of the delta function is the constant 1/\sqrt{2\pi}.
Finally, we note that the width of f(x) is \Delta x = 2a, and the width of \tilde{f}(k) is roughly the distance, on the k-axis, between its first two roots, k_+ and k_-, on either side of k = 0: \Delta k = k_+ - k_- = 2\pi/a. Thus increasing the width of f(x) results in a decrease in the width of \tilde{f}(k). In other words, when the function is wide, its Fourier transform is narrow. In the limit of infinite width (a constant function), we get infinite sharpness (the delta function). The last two statements are very general. In fact, it can be shown that \Delta x\, \Delta k \geq 1 for any function f(x). When both sides of this inequality are multiplied by the (reduced) Planck constant \hbar \equiv h/(2\pi), the result is the celebrated Heisenberg uncertainty relation:3

Heisenberg uncertainty relation

\Delta x\, \Delta p \geq \hbar,

where p = \hbar k is the momentum of the particle. Having obtained the transform of f(x), we can write

f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{2b}{\sqrt{2\pi}} \frac{\sin ka}{k}\, e^{ikx}\, dk = \frac{b}{\pi} \int_{-\infty}^{\infty} \frac{\sin ka}{k}\, e^{ikx}\, dk.
3 In the context of the uncertainty relation, the width of the function (the so-called wave packet) measures the uncertainty in the position x of a quantum mechanical particle. Similarly, the width of the Fourier transform measures the uncertainty in k, which is related to momentum p via p = \hbar k.
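The transform of the square bump can be verified directly from Eq. (8.21) by numerical integration. A minimal sketch; the values a = 1.5, b = 2.0, and k = 0.7 are arbitrary illustrative choices.

```python
import numpy as np
from scipy.integrate import quad

# Numerical Fourier transform of the bump of height b on |x| < a,
# compared with the analytic result (2ab/sqrt(2*pi)) * sin(ka)/(ka).
a, b = 1.5, 2.0

def ft(k):
    # real and imaginary parts of (1/sqrt(2*pi)) * integral f(x) exp(-i*k*x) dx
    re = quad(lambda x: b * np.cos(k * x), -a, a)[0]
    im = -quad(lambda x: b * np.sin(k * x), -a, a)[0]
    return (re + 1j * im) / np.sqrt(2 * np.pi)

k = 0.7
analytic = (2 * a * b / np.sqrt(2 * np.pi)) * np.sin(k * a) / (k * a)
```

Since f is real and even, the imaginary part of the transform vanishes, as the quadrature confirms.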
Figure 8.6 The square "bump" function.
8.2.2. Example. Let us evaluate the Fourier transform of a Gaussian g(x) = a e^{-bx^2} with a, b > 0:

\tilde{g}(k) = \frac{a}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-b(x^2 + ikx/b)}\, dx = \frac{a e^{-k^2/4b}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-b(x + ik/2b)^2}\, dx.

To evaluate this integral rigorously, we would have to use techniques developed in complex analysis, which are not introduced until Chapter 10 (see Example 10.3.8). However, we can ignore the fact that the exponent is complex, substitute y = x + ik/(2b), and write

\int_{-\infty}^{\infty} e^{-b[x + ik/(2b)]^2}\, dx = \int_{-\infty}^{\infty} e^{-by^2}\, dy = \sqrt{\frac{\pi}{b}}.

Thus, we have \tilde{g}(k) = \frac{a}{\sqrt{2b}}\, e^{-k^2/(4b)}, which is also a Gaussian.
We note again that the width of g(x), which is proportional to 1/\sqrt{b}, is in inverse relation to the width of \tilde{g}(k), which is proportional to \sqrt{b}. We thus have \Delta x\, \Delta k \sim 1.
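The Gaussian result can also be checked by quadrature, sidestepping the complex-analysis subtlety entirely. A sketch (the values a = 1.0, b = 0.5, k = 1.2 are illustrative):

```python
import numpy as np
from scipy.integrate import quad

# Transform of g(x) = a*exp(-b*x**2); should equal (a/sqrt(2b)) * exp(-k**2/(4b)).
a, b = 1.0, 0.5

def ft(k):
    # g is even, so the sine part of exp(-i*k*x) integrates to zero
    re = quad(lambda x: a * np.exp(-b * x**2) * np.cos(k * x), -np.inf, np.inf)[0]
    return re / np.sqrt(2 * np.pi)

k = 1.2
analytic = a / np.sqrt(2 * b) * np.exp(-k**2 / (4 * b))
```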
Equations (8.20) and (8.21) are reciprocals of one another. However, it is not obvious that they are consistent. In other words, if we substitute (8.20) in the RHS of (8.21), do we get an identity? Let's try this:

\tilde{f}(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, e^{-ikx} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \tilde{f}(k')\, e^{ik'x}\, dk' \right]
 = \frac{1}{2\pi} \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} \tilde{f}(k')\, e^{i(k'-k)x}\, dk'.

We now change the order of the two integrations:

\tilde{f}(k) = \int_{-\infty}^{\infty} dk'\, \tilde{f}(k') \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} dx\, e^{i(k'-k)x} \right].

But the expression in the square brackets is the delta function (see Example 6.1.2). Thus, we have \tilde{f}(k) = \int_{-\infty}^{\infty} dk'\, \tilde{f}(k')\, \delta(k' - k), which is an identity.
As in the case of Fourier series, Equations (8.20) and (8.21) are valid even if f and \tilde{f} are piecewise continuous. In that case the Fourier transforms are written as

\frac{1}{2}[f(x + 0) + f(x - 0)] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \tilde{f}(k)\, e^{ikx}\, dk,
\frac{1}{2}[\tilde{f}(k + 0) + \tilde{f}(k - 0)] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx,    (8.22)

where each zero on the LHS is an \epsilon that has gone to its limit.
It is useful to generalize the Fourier transform equations to more than one dimension. The generalization is straightforward:

f(\mathbf{r}) = \frac{1}{(2\pi)^{n/2}} \int d^n k\, e^{i \mathbf{k} \cdot \mathbf{r}}\, \tilde{f}(\mathbf{k}), \qquad \tilde{f}(\mathbf{k}) = \frac{1}{(2\pi)^{n/2}} \int d^n x\, f(\mathbf{r})\, e^{-i \mathbf{k} \cdot \mathbf{r}}.    (8.23)
Let us now use the abstract notation of Chapter 6 to get more insight into the preceding results. In the language of Chapter 6, Equation (8.20) can be written as

\langle x|f\rangle = \int_{-\infty}^{\infty} dk\, \langle x|k\rangle \langle k|\tilde{f}\rangle,    (8.24)

where we have defined

\langle x|k\rangle = \frac{1}{\sqrt{2\pi}}\, e^{ikx}.    (8.25)

Equation (8.24) suggests the identification |\tilde{f}\rangle \equiv |f\rangle as well as the identity

\mathbf{1} = \int_{-\infty}^{\infty} |k\rangle\langle k|\, dk,    (8.26)

which is the same as (6.1). Equation (6.3) yields

\langle k|k'\rangle = \delta(k - k'),    (8.27)

which upon the insertion of a unit operator gives an integral representation of the delta function:

\delta(k - k') = \langle k|\mathbf{1}|k'\rangle = \langle k| \left( \int_{-\infty}^{\infty} |x\rangle\langle x|\, dx \right) |k'\rangle = \int_{-\infty}^{\infty} \langle k|x\rangle \langle x|k'\rangle\, dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} dx\, e^{i(k'-k)x}.

Obviously, we can also write \delta(x - x') = \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, e^{i(x - x')k}.
If more than one dimension is involved, we use

\delta(\mathbf{k} - \mathbf{k}') = \frac{1}{(2\pi)^n} \int d^n x\, e^{i(\mathbf{k} - \mathbf{k}') \cdot \mathbf{r}}, \qquad \delta(\mathbf{r} - \mathbf{r}') = \frac{1}{(2\pi)^n} \int d^n k\, e^{i(\mathbf{r} - \mathbf{r}') \cdot \mathbf{k}},    (8.28)

with the inner product relations

\langle\mathbf{r}|\mathbf{k}\rangle = \frac{1}{(2\pi)^{n/2}}\, e^{i \mathbf{k} \cdot \mathbf{r}}, \qquad \langle\mathbf{k}|\mathbf{r}\rangle = \frac{1}{(2\pi)^{n/2}}\, e^{-i \mathbf{k} \cdot \mathbf{r}}.    (8.29)
Equations (8.28) and (8.29) and the identification |\tilde{f}\rangle \equiv |f\rangle exhibit a striking resemblance between |\mathbf{r}\rangle and |\mathbf{k}\rangle. In fact, any given abstract vector |f\rangle can be expressed either in terms of its r representation, \langle\mathbf{r}|f\rangle = f(\mathbf{r}), or in terms of its k representation, \langle\mathbf{k}|f\rangle \equiv \tilde{f}(\mathbf{k}). These two representations are completely equivalent, and there is a one-to-one correspondence between the two, given by Equation (8.23). The representation that is used in practice is dictated by the physical application. In quantum mechanics, for instance, most of the time the r representation, corresponding to the position, is used, because then the operator equations turn into differential equations that are normally linear and easier to solve than the corresponding equations in the k representation, which is related to the momentum.
Yukawa potential

8.2.3. Example. In this example we evaluate the Fourier transform of the Coulomb potential V(r) of a point charge q: V(r) = q/r. The Fourier transform is important in scattering experiments with atoms, molecules, and solids. As we shall see in the following, the Fourier transform of V(r) is not defined. However, if we work with the Yukawa potential,

V_\alpha(r) = \frac{q e^{-\alpha r}}{r}, \qquad \alpha > 0,

the Fourier transform will be well-defined, and we can take the limit \alpha \to 0 to recover the Coulomb potential. Thus, we seek the Fourier transform of V_\alpha(r).
We are working in three dimensions and therefore may write

\tilde{V}_\alpha(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}} \int d^3 x\, e^{-i \mathbf{k} \cdot \mathbf{r}}\, \frac{q e^{-\alpha r}}{r}.

It is clear from the presence of r that spherical coordinates are appropriate. We are free to pick any direction as the z-axis. A simplifying choice in this case is the direction of \mathbf{k}. So, we let \mathbf{k} = |\mathbf{k}| \hat{e}_z = k \hat{e}_z, or \mathbf{k} \cdot \mathbf{r} = kr \cos\theta, where \theta is the polar angle in spherical coordinates. Now we have

\tilde{V}_\alpha(k) = \frac{q}{(2\pi)^{3/2}} \int_{0}^{\infty} r^2\, dr \int_{0}^{\pi} \sin\theta\, d\theta \int_{0}^{2\pi} d\varphi\, \frac{e^{-ikr\cos\theta}\, e^{-\alpha r}}{r}.

The \varphi integration is trivial and gives 2\pi. The \theta integration is done next:

\int_{0}^{\pi} \sin\theta\, e^{-ikr\cos\theta}\, d\theta = \int_{-1}^{1} e^{-ikru}\, du = \frac{1}{ikr}\left( e^{ikr} - e^{-ikr} \right).
8.2 THE FOURIER TRANSFORM 215
We thus have
$$\tilde V_\alpha(\mathbf k) = \frac{q(2\pi)}{(2\pi)^{3/2}} \int_0^\infty dr\, r^2 e^{-\alpha r}\, \frac{1}{ikr}\left(e^{ikr} - e^{-ikr}\right)
= \frac{q}{(2\pi)^{1/2}}\, \frac{1}{ik} \int_0^\infty dr\, \left[e^{(-\alpha + ik)r} - e^{-(\alpha + ik)r}\right]
= \frac{q}{(2\pi)^{1/2}}\, \frac{1}{ik} \left( \left.\frac{e^{(-\alpha + ik)r}}{-\alpha + ik}\right|_0^\infty + \left.\frac{e^{-(\alpha + ik)r}}{\alpha + ik}\right|_0^\infty \right).$$
Note how the factor $e^{-\alpha r}$ has tamed the divergent behavior of the exponential at $r \to \infty$. This was the reason for introducing it in the first place. Simplifying the last expression yields $\tilde V_\alpha(k) = (2q/\sqrt{2\pi})(\alpha^2 + k^2)^{-1}$. The parameter $\alpha$ is a measure of the range of the potential. It is clear that the larger $\alpha$ is, the smaller the range. In fact, it was in response to the short range of nuclear forces that Yukawa introduced $\alpha$. For electromagnetism, where the range is infinite, $\alpha$ becomes zero and $V_\alpha(r)$ reduces to $V(r)$. Thus, the Fourier transform of the Coulomb potential is
$$\tilde V_{\mathrm{Coul}}(k) = \frac{2q}{\sqrt{2\pi}}\, \frac{1}{k^2}.$$
If a charge distribution is involved, the Fourier transform will be different.
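The closed form above is easy to check numerically. The following Python sketch (an illustration added here, not part of the text) evaluates the radial integral $\int_0^\infty e^{-\alpha r}\sin(kr)\,dr$ by the trapezoidal rule and compares the result with $2q/(\sqrt{2\pi}(\alpha^2 + k^2))$:

```python
import numpy as np

def yukawa_ft(k, q=1.0, alpha=1.0, rmax=60.0, n=200001):
    # Radial form of the 3-D transform derived above:
    # V_alpha(k) = (2q / (sqrt(2*pi) * k)) * Int_0^inf exp(-alpha*r) sin(k*r) dr
    r = np.linspace(0.0, rmax, n)
    g = np.exp(-alpha * r) * np.sin(k * r)
    dr = r[1] - r[0]
    integral = (g.sum() - 0.5 * (g[0] + g[-1])) * dr   # trapezoidal rule
    return 2.0 * q / (np.sqrt(2.0 * np.pi) * k) * integral

def yukawa_ft_exact(k, q=1.0, alpha=1.0):
    # Closed form obtained in the example; as alpha -> 0 it tends to the
    # Coulomb result 2q / (sqrt(2*pi) k^2).
    return 2.0 * q / (np.sqrt(2.0 * np.pi) * (alpha**2 + k**2))

for k in (0.5, 1.0, 3.0):
    print(k, yukawa_ft(k), yukawa_ft_exact(k))
```

The exponential cutoff $e^{-\alpha r}$ is what makes the numerical integral converge, just as it tames the analytic one.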
8.2.4. Example. The example above deals with the electrostatic potential of a point charge. Let us now consider the case where the charge is distributed over a finite volume. Then the potential is
$$V(\mathbf r) = \iiint \frac{q\rho(\mathbf r')}{|\mathbf r' - \mathbf r|}\, d^3x' \equiv q \int \frac{\rho(\mathbf r')}{|\mathbf r' - \mathbf r|}\, d^3x',$$
where $q\rho(\mathbf r')$ is the charge density at $\mathbf r'$, and we have used a single integral because $d^3x'$ already indicates the number of integrations to be performed. Note that we have normalized $\rho(\mathbf r')$ so that its integral over the volume is 1. Figure 8.7 shows the geometry of the situation. Making a change of variables, $\mathbf R \equiv \mathbf r' - \mathbf r$, or $\mathbf r' = \mathbf R + \mathbf r$, and $d^3x' = d^3X$, with $\mathbf R = (X, Y, Z)$, we get
$$\tilde V(\mathbf k) = \frac{1}{(2\pi)^{3/2}} \int d^3x\, e^{-i\mathbf k\cdot\mathbf r}\, q \int \frac{\rho(\mathbf R + \mathbf r)}{R}\, d^3X. \tag{8.30}$$
To evaluate Equation (8.30), we substitute for $\rho(\mathbf R + \mathbf r)$ in terms of its Fourier transform,
$$\rho(\mathbf R + \mathbf r) = \frac{1}{(2\pi)^{3/2}} \int d^3k'\, \tilde\rho(\mathbf k')\, e^{i\mathbf k'\cdot(\mathbf R + \mathbf r)}. \tag{8.31}$$
Combining (8.30) and (8.31), we obtain
$$\tilde V(\mathbf k) = \frac{q}{(2\pi)^3} \int d^3x\, d^3X\, d^3k'\, \frac{e^{i\mathbf k'\cdot\mathbf R}}{R}\, \tilde\rho(\mathbf k')\, e^{i\mathbf r\cdot(\mathbf k' - \mathbf k)}
= \int d^3X\, d^3k'\, \frac{e^{i\mathbf k'\cdot\mathbf R}}{R}\, q\tilde\rho(\mathbf k') \underbrace{\left( \frac{1}{(2\pi)^3} \int d^3x\, e^{i\mathbf r\cdot(\mathbf k' - \mathbf k)} \right)}_{\delta(\mathbf k' - \mathbf k)}. \tag{8.32}$$
216 8. FOURIER ANALYSIS
Figure 8.7 The Fourier transform of the potential of a continuous charge distribution at P is calculated using this geometry.
What is nice about this result is that the contribution of the charge distribution, $\tilde\rho(\mathbf k)$, has been completely factored out. The integral, aside from a constant and a change in the sign of $\mathbf k$, is simply the Fourier transform of the Coulomb potential of a point charge obtained in the previous example. We can therefore write Equation (8.32) as
$$\tilde V(\mathbf k) = (2\pi)^{3/2}\, \tilde\rho(\mathbf k)\, \tilde V_{\mathrm{Coul}}(-\mathbf k) = \frac{4\pi q \tilde\rho(\mathbf k)}{|\mathbf k|^2}.$$
form factor
Fourier transform and the discovery of quarks
This equation is important in analyzing the structure of atomic particles. The Fourier transform $\tilde\rho(\mathbf k)$ is directly measurable in scattering experiments. In a typical experiment a (charged) target is probed with a charged point particle (electron). If the analysis of the scattering data shows a deviation from $1/k^2$ in the behavior of $\tilde V(\mathbf k)$, then it can be concluded that the target particle has a charge distribution. More specifically, a plot of $k^2\tilde V(\mathbf k)$ versus $k$ gives the variation of $\tilde\rho(\mathbf k)$, the form factor, with $k$. If the resulting graph is a constant, then $\tilde\rho(\mathbf k)$ is a constant, and the target is a point particle [$\tilde\rho(\mathbf k)$ is a constant for point particles, where $\rho(\mathbf r') \propto \delta(\mathbf r - \mathbf r')$]. If there is any deviation from a constant function, $\tilde\rho(\mathbf k)$ must have a dependence on $\mathbf k$, and correspondingly, the target particle must have a charge distribution.
The above discussion, when generalized to four-dimensional relativistic space-time, was the basis for a strong argument in favor of the existence of point-like particles (quarks) inside a proton in 1968, when the results of the scattering of high-energy electrons off protons at the Stanford Linear Accelerator Center revealed deviation from a constant form factor.
8.2.1 Fourier Transforms and Derivatives
The Fourier transform is very useful for solving differential equations. This is because the derivative operator in $\mathbf r$ space turns into ordinary multiplication in $\mathbf k$ space. For example, if we differentiate $f(\mathbf r)$ in Equation (8.23) with respect to $x_j$, we obtain
$$\frac{\partial}{\partial x_j} f(\mathbf r) = \frac{1}{(2\pi)^{n/2}} \int d^nk\, \frac{\partial}{\partial x_j} e^{i(k_1x_1 + \cdots + k_jx_j + \cdots + k_nx_n)}\, \tilde f(\mathbf k)
= \frac{1}{(2\pi)^{n/2}} \int d^nk\, (ik_j)\, e^{i\mathbf k\cdot\mathbf r}\, \tilde f(\mathbf k).$$
That is, every time we differentiate with respect to any component of $\mathbf r$, the corresponding component of $\mathbf k$ "comes down." Thus, the $n$-dimensional gradient is $\nabla f(\mathbf r) = (2\pi)^{-n/2} \int d^nk\, (i\mathbf k)\, e^{i\mathbf k\cdot\mathbf r}\, \tilde f(\mathbf k)$, and the $n$-dimensional Laplacian is $\nabla^2 f(\mathbf r) = (2\pi)^{-n/2} \int d^nk\, (-k^2)\, e^{i\mathbf k\cdot\mathbf r}\, \tilde f(\mathbf k)$.
We shall use Fourier transforms extensively in solving differential equations
later in the book. Here, we can illustrate the above points with a simple example.
Consider the ordinary second-order differential equation
$$C_2 \frac{d^2y}{dx^2} + C_1 \frac{dy}{dx} + C_0 y = f(x),$$
where $C_0$, $C_1$, and $C_2$ are constants. We can "solve" this equation by simply substituting the following in it:
$$y(x) = \frac{1}{\sqrt{2\pi}} \int dk\, \tilde y(k)\, e^{ikx}, \qquad \frac{dy}{dx} = \frac{1}{\sqrt{2\pi}} \int dk\, \tilde y(k)\, (ik)\, e^{ikx},$$
$$\frac{d^2y}{dx^2} = -\frac{1}{\sqrt{2\pi}} \int dk\, \tilde y(k)\, k^2 e^{ikx}, \qquad f(x) = \frac{1}{\sqrt{2\pi}} \int dk\, \tilde f(k)\, e^{ikx}.$$
This gives
$$\frac{1}{\sqrt{2\pi}} \int dk\, \tilde y(k) \left( -C_2 k^2 + iC_1 k + C_0 \right) e^{ikx} = \frac{1}{\sqrt{2\pi}} \int dk\, \tilde f(k)\, e^{ikx}.$$
Equating the coefficients of $e^{ikx}$ on both sides, we obtain
$$\tilde y(k) = \frac{\tilde f(k)}{-C_2 k^2 + iC_1 k + C_0}.$$
If we know $\tilde f(k)$ [which can be obtained from $f(x)$], we can calculate $y(x)$ by Fourier-transforming $\tilde y(k)$. The resulting integrals are not generally easy to evaluate. In some cases the methods of complex analysis may be helpful; in others numerical integration may be the last resort. However, the real power of the Fourier transform lies in the formal analysis of differential equations.
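On a periodic grid, the recipe $\tilde y(k) = \tilde f(k)/(-C_2k^2 + iC_1k + C_0)$ can be carried out with the discrete transform. The sketch below (an illustration using NumPy's FFT, not the author's code; the coefficients and source are arbitrary choices) solves the equation for a Gaussian source and then checks the answer by substituting it back with spectral derivatives:

```python
import numpy as np

# Solve C2*y'' + C1*y' + C0*y = f(x) on a periodic grid using the FFT,
# the discrete analogue of y~(k) = f~(k) / (-C2 k^2 + i C1 k + C0).
C2, C1, C0 = 1.0, 2.0, 5.0
L, N = 40.0, 1024
x = np.linspace(-L/2, L/2, N, endpoint=False)
k = 2*np.pi*np.fft.fftfreq(N, d=L/N)          # wavenumbers matching np.fft

f = np.exp(-x**2)                             # a localized source term
y = np.fft.ifft(np.fft.fft(f) / (-C2*k**2 + 1j*C1*k + C0)).real

# Check: substitute y back into the ODE using spectral derivatives.
dy  = np.fft.ifft(1j*k*np.fft.fft(y)).real
d2y = np.fft.ifft(-(k**2)*np.fft.fft(y)).real
residual = np.max(np.abs(C2*d2y + C1*dy + C0*y - f))
print("max residual:", residual)
```

The residual is at machine-precision level, since the division in $k$ space inverts the operator exactly on the grid.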
8.2.2 The Discrete Fourier Transform
The preceding remarks alluded to the power of the Fourier transform in solving certain differential equations. If such a solution is combined with numerical techniques, the integrals must be replaced by sums. This is particularly true if our function is given by a table rather than a mathematical relation, a common feature of numerical analysis. So suppose that we are given a set of measurements performed in equal time intervals of $\Delta t$. Suppose that the overall period in which these measurements are done is $T$. We are seeking a Fourier transform of this finite set of data. First we write
$$\tilde f(\omega) = \frac{1}{\sqrt{2\pi}} \int_0^T f(t)\, e^{-i\omega t}\, dt \approx \frac{1}{\sqrt{2\pi}} \sum_{n=0}^{N-1} f(n\Delta t)\, e^{-i\omega n\Delta t}\, \Delta t,$$
or, discretizing the frequency as well and writing $\omega_m = m\Delta\omega$, with $\Delta\omega$ to be determined later, we have
$$\tilde f(m\Delta\omega) = \frac{1}{\sqrt{2\pi}} \sum_{n=0}^{N-1} f(n\Delta t)\, e^{-i(m\Delta\omega)n\Delta t} \left( \frac{T}{N} \right). \tag{8.33}$$
Since the Fourier transform is given in terms of a finite sum, let us explore the idea of writing the inverse transform also as a sum. So, multiply both sides of the above equation by $[e^{i(m\Delta\omega)k\Delta t}/\sqrt{2\pi}]\,\Delta\omega$ and sum over $m$:
$$\frac{1}{\sqrt{2\pi}} \sum_{m=0}^{N-1} \tilde f(m\Delta\omega)\, e^{i(m\Delta\omega)k\Delta t}\, \Delta\omega = \frac{T\Delta\omega}{2\pi N} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} f(n\Delta t)\, e^{im\Delta\omega\Delta t(k - n)}
= \frac{T\Delta\omega}{2\pi N} \sum_{n=0}^{N-1} f(n\Delta t) \sum_{m=0}^{N-1} e^{im\Delta\omega\Delta t(k - n)}.$$
Problem 8.2 shows that
$$\sum_{m=0}^{N-1} e^{im\Delta\omega\Delta t(k - n)} =
\begin{cases}
N & \text{if } k = n, \\[1ex]
\dfrac{e^{iN\Delta\omega\Delta t(k - n)} - 1}{e^{i\Delta\omega\Delta t(k - n)} - 1} & \text{if } k \neq n.
\end{cases}$$
We want the sum to vanish when $k \neq n$. This suggests demanding that $N\Delta\omega\Delta t(k - n)$ be an integer multiple of $2\pi$. Since $\Delta\omega$ and $\Delta t$ are to be independent of this (arbitrary) integer (as well as $k$ and $n$), we must write
$$N\Delta\omega\Delta t(k - n) = 2\pi(k - n) \;\Longrightarrow\; N\Delta\omega\frac{T}{N} = 2\pi \;\Longrightarrow\; \Delta\omega = \frac{2\pi}{T}.$$
discrete Fourier transforms
With this choice, we have the following discrete Fourier transforms:
$$\tilde f(\omega_m) = \sum_{n=0}^{N-1} f(n\Delta t)\, e^{-2\pi i m n/N}, \qquad f(n\Delta t) = \frac{1}{N} \sum_{m=0}^{N-1} \tilde f(\omega_m)\, e^{2\pi i m n/N}, \qquad \omega_j = \frac{2\pi j}{T}, \tag{8.34}$$
where we have redefined the new $\tilde f$ to be $\sqrt{2\pi}N/T$ times the old $\tilde f$.
Discrete Fourier transforms are used extensively in numerical calculation of problems in which ordinary Fourier transforms are used. For instance, if a differential equation lends itself to a solution via the Fourier transform as discussed before, then discrete Fourier transforms will give a procedure for finding the solution numerically. Similarly, the frequency analysis of signals is nicely handled by discrete Fourier transforms.
fast Fourier transform
It turns out that discrete Fourier analysis is very intensive computationally. Its present status as a popular tool in computational physics is due primarily to a very efficient method of calculation known as the fast Fourier transform. In a typical Fourier transform, one has to perform a sum of $N$ terms for every point. Since there are $N$ points to transform, the total computational time will be of order $N^2$. In the fast Fourier transform, one takes $N$ to be even and divides the sum into two other sums, one over the even terms and one over the odd terms. Then the computation time will be of order $2 \times (N/2)^2$, or half the original calculation. Similarly, if $N/2$ is even, one can further divide the odd and even sums by two and obtain a computation time of $4 \times (N/4)^2$, or a quarter of the original calculation. In general, if $N = 2^k$, then by dividing the sums consecutively, we end up with $N$ transforms to be performed after $k$ steps. So, the computation time will be $kN = N\log_2 N$. For $N = 128$, the computation time will be $128\log_2 128 = 896$ as opposed to $128^2 \approx 16{,}400$, a reduction by a factor of about 20. The fast Fourier transform is indeed fast!
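The $N^2$ sum and the fast transform can be compared directly. The sketch below (an illustration added here, using NumPy's built-in FFT as the fast reference) implements the naive one-sum-per-point transform and contrasts the operation counts for $N = 128$:

```python
import numpy as np

def naive_dft(f):
    # O(N^2) transform: an N-term sum for each of the N output points.
    N = len(f)
    n = np.arange(N)
    W = np.exp(-2j*np.pi*np.outer(n, n)/N)    # W[m, n] = exp(-2 pi i m n / N)
    return W @ f

rng = np.random.default_rng(0)
f = rng.standard_normal(128)
print(np.allclose(naive_dft(f), np.fft.fft(f)))   # same result either way
print(128**2, 128*7)                              # N^2 = 16384 vs N log2 N = 896
```

Both routines produce the same coefficients; only the operation count differs.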
8.2.3 The Fourier Transform of a Distribution
Although one can define the Fourier transform of a distribution in exact analogy
to an ordinary function, sometimes it is convenient to define the Fourier transform
of the distribution as a linear functional.
Let us ignore the distinction between the two variables $x$ and $k$, and simply define the Fourier transform of a function $f: \mathbb R \to \mathbb R$ as
$$\tilde f(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-iut}\, dt.$$
Now we consider two functions, $f$ and $g$, and note that
$$\langle \tilde f, g\rangle \equiv \int_{-\infty}^{\infty} \tilde f(u)\, g(u)\, du = \int_{-\infty}^{\infty} g(u) \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-iut}\, dt \right] du
= \int_{-\infty}^{\infty} f(t) \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(u)\, e^{-iut}\, du \right] dt = \int_{-\infty}^{\infty} f(t)\, \tilde g(t)\, dt = \langle f, \tilde g\rangle.$$
The following definition is motivated by the last equation.
8.2.5. Definition. Let $\varphi$ be a distribution and let $f$ be a $\mathcal C^\infty$ function whose Fourier transform $\tilde f$ exists and is also a $\mathcal C^\infty$ function. Then we define the Fourier transform $\tilde\varphi$ of $\varphi$ to be the distribution given by
$$\langle \tilde\varphi, f\rangle = \langle \varphi, \tilde f\rangle.$$
8.2.6. Example. The Fourier transform of $\delta(x)$ is given by
$$\langle \tilde\delta, f\rangle = \langle \delta, \tilde f\rangle = \tilde f(0) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, dt = \int_{-\infty}^{\infty} \left( \frac{1}{\sqrt{2\pi}} \right) f(t)\, dt = \left\langle \frac{1}{\sqrt{2\pi}}, f \right\rangle.$$
Thus, $\tilde\delta = 1/\sqrt{2\pi}$, as expected.
The Fourier transform of $\delta(x - x') \equiv \delta_{x'}(x)$ is given by
$$\langle \tilde\delta_{x'}, f\rangle = \langle \delta_{x'}, \tilde f\rangle = \tilde f(x') = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-ix't}\, dt = \int_{-\infty}^{\infty} \left( \frac{1}{\sqrt{2\pi}}\, e^{-ix't} \right) f(t)\, dt.$$
Thus, if $\varphi(x) = \delta(x - x')$, then $\tilde\varphi(t) = (1/\sqrt{2\pi})\, e^{-ix't}$.
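Since $\delta_{x'}$ is the $\sigma \to 0$ limit of a normalized Gaussian centered at $x'$, the result $\tilde\varphi(t) = (1/\sqrt{2\pi})e^{-ix't}$ can be checked numerically: the transform of a narrower and narrower Gaussian should approach it. A sketch (an illustration added here, not from the text; the grid and widths are arbitrary choices):

```python
import numpy as np

def ft(f_vals, t, u):
    # Trapezoidal estimate of (1/sqrt(2 pi)) * Integral f(t) e^{-i u t} dt
    g = f_vals * np.exp(-1j*u*t)
    dt = t[1] - t[0]
    return (g.sum() - 0.5*(g[0] + g[-1])) * dt / np.sqrt(2*np.pi)

xp, u = 1.5, 2.0                                  # x' and the transform variable
t = np.linspace(-20.0, 20.0, 400001)
target = np.exp(-1j*u*xp) / np.sqrt(2*np.pi)      # (1/sqrt(2 pi)) e^{-i x' u}
for sigma in (0.5, 0.1, 0.02):
    # Normalized Gaussian of width sigma centered at x' approximates delta_{x'}
    approx = np.exp(-(t - xp)**2 / (2*sigma**2)) / (sigma*np.sqrt(2*np.pi))
    err = abs(ft(approx, t, u) - target)
    print(sigma, err)                             # err shrinks as sigma -> 0
```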
8.3 Problems
8.1. Consider the function $f(\theta) = \sum_{m=-\infty}^{\infty} \delta(\theta - 2m\pi)$.
(a) Show that $f$ is periodic of period $2\pi$.
(b) What is the Fourier series expansion for $f(\theta)$?
8.2. Break the sum $\sum_{n=-N}^{N} e^{in(\theta - \theta')}$ into $\sum_{n=-N}^{-1} + 1 + \sum_{n=1}^{N}$. Use the geometric sum formula
$$\sum_{n=0}^{N} a r^n = a\, \frac{r^{N+1} - 1}{r - 1}$$
to obtain
$$\sum_{n=1}^{N} e^{in(\theta - \theta')} = e^{i(\theta - \theta')}\, \frac{e^{iN(\theta - \theta')} - 1}{e^{i(\theta - \theta')} - 1} = e^{\frac{i}{2}(N+1)(\theta - \theta')}\, \frac{\sin[\frac{1}{2}N(\theta - \theta')]}{\sin[\frac{1}{2}(\theta - \theta')]}.$$
By changing $n$ to $-n$ or, equivalently, $(\theta - \theta')$ to $-(\theta - \theta')$, find a similar sum from $-N$ to $-1$. Now put everything together and use the trigonometric identity
$$2\cos\alpha\sin\beta = \sin(\alpha + \beta) - \sin(\alpha - \beta)$$
to show that
$$\sum_{n=-N}^{N} e^{in(\theta - \theta')} = \frac{\sin[(N + \frac{1}{2})(\theta - \theta')]}{\sin[\frac{1}{2}(\theta - \theta')]}.$$
8.3. Find the Fourier series expansion of the periodic function defined on its fundamental cell as
$$f(\theta) = \begin{cases} -\tfrac{1}{2}(\pi + \theta) & \text{if } -\pi \le \theta < 0, \\ \tfrac{1}{2}(\pi - \theta) & \text{if } 0 < \theta \le \pi. \end{cases}$$
8.4. Show that $A_n$ and $B_n$ in Equation (8.2) are real when $f(\theta)$ is real.
8.5. Find the Fourier series expansion of the periodic function $f(\theta)$ defined on its fundamental cell, $(-\pi, \pi)$, as $f(\theta) = \cos a\theta$,
(a) when $a$ is an integer; (b) when $a$ is not an integer.
8.6. Find the Fourier series expansion of the periodic function defined on its fundamental cell, $(-\pi, \pi)$, as $f(\theta) = \theta$.
8.7. Consider the periodic function that is defined on its fundamental cell, $(-a, a)$, as $f(x) = |x|$.
(a) Find its Fourier series expansion.
(b) Evaluate both sides of the expansion at $x = 0$, and show that
$$\frac{\pi^2}{8} = \sum_{k=0}^{\infty} \frac{1}{(2k+1)^2}.$$
(c) Show that the infinite series gives the same result as the function when both are evaluated at $x = a$.
8.8. Let $f(x) = x$ be a periodic function defined over the interval $(0, 2a)$. Find the Fourier series expansion of $f$.
8.9. Show that the piecewise parabolic "approximation" to $a^2\sin(\pi x/a)$ in the interval $(-a, a)$ given by the function
$$f(x) = \begin{cases} 4x(a + x) & \text{if } -a \le x \le 0, \\ 4x(a - x) & \text{if } 0 \le x \le a \end{cases}$$
has the Fourier series expansion
$$f(x) = \frac{32a^2}{\pi^3} \sum_{n=0}^{\infty} \frac{1}{(2n+1)^3} \sin\frac{(2n+1)\pi x}{a}.$$
Plot $f(x)$, $a^2\sin(\pi x/a)$, and the series expansion (up to 20 terms) for $a = 1$ between $-1$ and $+1$ on the same graph.
8.10. Find the Fourier series expansion of $f(\theta) = \theta^2$ for $|\theta| < \pi$. Then show that
$$\frac{\pi^2}{6} = \sum_{n=1}^{\infty} \frac{1}{n^2} \qquad \text{and} \qquad \frac{\pi^2}{12} = -\sum_{n=1}^{\infty} \frac{(-1)^n}{n^2}.$$
8.11. Find the Fourier series expansion of
$$f(t) = \begin{cases} \sin\omega t & \text{if } 0 \le t \le \pi/\omega, \\ 0 & \text{if } -\pi/\omega \le t \le 0. \end{cases}$$
8.12. What is the Fourier transform of
(a) the constant function $f(x) = C$, and
(b) the Dirac delta function $\delta(x)$?
8.13. Show that
(a) if $g(x)$ is real, then $\tilde g^*(k) = \tilde g(-k)$, and
(b) if $g(x)$ is even (odd), then $\tilde g(k)$ is also even (odd).
8.14. Let $g_c(x)$ stand for the single function that is nonzero only on a subinterval of the fundamental cell $(a, a + L)$. Define the function $g(x)$ as
$$g(x) = \sum_{j=-\infty}^{\infty} g_c(x - jL).$$
(a) Show that $g(x)$ is periodic with period $L$.
(b) Find its Fourier transform $\tilde g(k)$, and verify that
$$\tilde g(k) = 2\pi\, \tilde g_c(k) \sum_{m=-\infty}^{\infty} \delta(kL - 2m\pi).$$
(c) Find the (inverse) transform of $\tilde g(k)$, and show that it is the Fourier series of $g_c(x)$.
8.15. Evaluate the Fourier transform of
$$g(x) = \begin{cases} b - b|x|/a & \text{if } |x| < a, \\ 0 & \text{if } |x| > a. \end{cases}$$
8.16. Let $f(\theta)$ be a periodic function given by $f(\theta) = \sum_{n=-\infty}^{\infty} a_n e^{in\theta}$. Find its Fourier transform $\tilde f(t)$.
8.17. Let
$$f(t) = \begin{cases} \sin\omega_0 t & \text{if } |t| < T, \\ 0 & \text{if } |t| > T. \end{cases}$$
Show that
$$\tilde f(\omega) = \frac{1}{\sqrt{2\pi}} \left\{ \frac{\sin[(\omega - \omega_0)T]}{\omega - \omega_0} - \frac{\sin[(\omega + \omega_0)T]}{\omega + \omega_0} \right\}.$$
Verify the uncertainty relation $\Delta\omega\,\Delta t \approx 4\pi$.
8.18. If $f(x) = g(x + a)$, show that $\tilde f(k) = e^{-iak}\tilde g(k)$.
8.19. For $a > 0$ find the Fourier transform of $f(x) = e^{-a|x|}$. Is $\tilde f(k)$ symmetric? Is it real? Verify the uncertainty relations.
8.20. The displacement of a damped harmonic oscillator is given by
$$f(t) = \begin{cases} Ae^{-\alpha t} e^{i\omega_0 t} & \text{if } t > 0, \\ 0 & \text{if } t < 0. \end{cases}$$
Find $\tilde f(\omega)$ and show that the frequency distribution $|\tilde f(\omega)|^2$ is given by
$$|\tilde f(\omega)|^2 = \frac{A^2}{2\pi}\, \frac{1}{(\omega - \omega_0)^2 + \alpha^2}.$$
convolution theorem
8.21. Prove the convolution theorem:
$$\int_{-\infty}^{\infty} f(x)\, g(y - x)\, dx = \int_{-\infty}^{\infty} \tilde f(k)\, \tilde g(k)\, e^{iky}\, dk.$$
What will this give when $y = 0$?
Parseval's relation
8.22. Prove Parseval's relation for Fourier transforms:
$$\int_{-\infty}^{\infty} f(x)\, g^*(x)\, dx = \int_{-\infty}^{\infty} \tilde f(k)\, \tilde g^*(k)\, dk.$$
In particular, the norm of a function (with weight function equal to 1) is invariant under Fourier transform.
8.23. Use the completeness relation $\mathbf 1 = \sum_n |n\rangle\langle n|$ and sandwich it between $|x\rangle$ and $\langle x'|$ to find an expression for the Dirac delta function in terms of an infinite series of orthonormal functions.
8.24. Use a Fourier transform in three dimensions to find a solution of the Poisson equation: $\nabla^2\Phi(\mathbf r) = -4\pi\rho(\mathbf r)$.
8.25. For $\varphi(x) = \delta(x - x')$, find $\tilde{\tilde\varphi}(y)$.
8.26. Show that $\tilde{\tilde f}(t) = f(-t)$.
8.27. The Fourier transform of a distribution $\varphi$ is given by
$$\tilde\varphi(t) = \sum_{n=0}^{\infty} \frac{1}{n!}\, \delta'(t - n).$$
What is $\varphi(x)$? Hint: Use $\tilde{\tilde\varphi}(x) = \varphi(-x)$.
8.28. For $f(x) = \sum_{k=0}^{n} a_k x^k$, show that
$$\tilde f(u) = \sqrt{2\pi} \sum_{k=0}^{n} i^k a_k\, \delta^{(k)}(u),$$
where $\delta^{(k)}$ is the $k$th derivative of the delta function.
Additional Reading
1. Courant, R. and Hilbert, D. Methods of Mathematical Physics, vol. 1, Interscience, 1962. The classic book by two masters. This is a very readable book written specifically for physicists. Its treatment of Fourier series and transforms is very clear.
2. DeVries, P. A First Course in Computational Physics, Wiley, 1994. A good discussion of the fast Fourier transform including some illustrative computer programs.
3. Reed, M., and Simon, B. Fourier Analysis, Self-Adjointness, Academic Press, 1980. Second volume of a four-volume series. A comprehensive exposition of Fourier analysis with emphasis on operator theory.
4. Richtmyer, R. Principles of Advanced Mathematical Physics, Springer-Verlag, 1978. A two-volume book on mathematical physics written in a formal style, but very useful due to its comprehensiveness and the large number of examples drawn from physics. Chapter 4 discusses Fourier analysis and distributions.
Part III
Complex Analysis
9
Complex Calculus
Complex analysis, just like real analysis, deals with questions of continuity, convergence of series, differentiation, integration, and so forth. The reader is assumed to have been exposed to the algebra of complex numbers.
9.1 ComplexFunctions
A complex function is a map $f: \mathbb C \to \mathbb C$, and we write $f(z) = w$, where both $z$ and $w$ are complex numbers.¹ The map $f$ can be geometrically thought of as a correspondence between two complex planes, the $z$-plane and the $w$-plane. The $w$-plane has a real axis and an imaginary axis, which we can call $u$ and $v$, respectively. Both $u$ and $v$ are real functions of the coordinates of $z$, i.e., $x$ and $y$. Therefore, we may write
$$f(z) = u(x, y) + iv(x, y). \tag{9.1}$$
This equation gives a unique point $(u, v)$ in the $w$-plane for each point $(x, y)$ in the $z$-plane (see Figure 9.1). Under $f$, regions of the $z$-plane are mapped onto regions of the $w$-plane. For instance, a curve in the $z$-plane may be mapped into a curve in the $w$-plane. The following example illustrates this point.
9.1.1. Example. Let us investigate the behavior of a couple of elementary complex functions. In particular, we shall look at the way a line $y = mx$ in the $z$-plane is mapped into curves in the $w$-plane.
¹Strictly speaking, we should write $f: S \to \mathbb C$, where $S$ is a subset of the complex plane. The reason is that most functions are not defined for the entire set of complex numbers, so that the domain of such functions is not necessarily $\mathbb C$. We shall specify the domain only when it is absolutely necessary. Otherwise, we use the generic notation $f: \mathbb C \to \mathbb C$, even though $f$ is defined only on a subset of $\mathbb C$.
228 9. COMPLEX CALCULUS
Figure 9.1 A map $f$ from the $z$-plane $(x, y)$ to the $w$-plane $(u, v)$.
(a) For $w = f(z) = z^2$, we have
$$w = (x + iy)^2 = x^2 - y^2 + 2ixy,$$
with $u(x, y) = x^2 - y^2$ and $v(x, y) = 2xy$. For $y = mx$, i.e., for a line in the $z$-plane with slope $m$, these equations yield $u = (1 - m^2)x^2$ and $v = 2mx^2$. Eliminating $x$ in these equations, we find $v = [2m/(1 - m^2)]u$. This is a line passing through the origin of the $w$-plane [see Figure 9.2(a)]. Note that the angle the image line makes with the real axis of the $w$-plane is twice the angle the original line makes with the $x$-axis. (Show this!)
(b) The function $w = f(z) = e^z = e^{x+iy}$ gives $u(x, y) = e^x\cos y$ and $v(x, y) = e^x\sin y$. Substituting $y = mx$, we obtain $u = e^x\cos mx$ and $v = e^x\sin mx$. Unlike part (a), we cannot eliminate $x$ to find $v$ as an explicit function of $u$. Nevertheless, the last pair of equations are parametric equations of a curve, which we can plot in a $uv$-plane, as shown in Figure 9.2(b). ■
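Part (a) can be verified numerically: points on $y = mx$ land on the line $v = [2m/(1 - m^2)]u$, and the image of the line's direction vector has twice its phase. A short sketch (an illustration added here, not from the text):

```python
import cmath, math

m = 0.5                                   # slope of the line y = m x
for xv in (0.3, 1.0, 2.0):
    w = complex(xv, m*xv)**2              # image of a point on the line under z^2
    u, v = w.real, w.imag
    print(v, 2*m/(1 - m**2) * u)          # the two columns agree

alpha = math.atan(m)                      # angle of the original line
beta = cmath.phase(complex(1.0, m)**2)    # angle of its image under z^2
print(math.isclose(beta, 2*alpha))
```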
Limits of complex functions are defined in terms of absolute values. Thus, $\lim_{z\to a} f(z) = w_0$ means that given any real number $\epsilon > 0$, we can find a corresponding real number $\delta > 0$ such that $|f(z) - w_0| < \epsilon$ whenever $|z - a| < \delta$. Similarly, we say that a function $f$ is continuous at $z = a$ if $\lim_{z\to a} f(z) = f(a)$.
9.2 Analytic Functions
The derivative of a complex function is defined as usual:
9.2.1. Definition. Let $f: \mathbb C \to \mathbb C$ be a complex function. The derivative of $f$ at $z_0$ is
$$\left.\frac{df}{dz}\right|_{z_0} = \lim_{\Delta z \to 0} \frac{f(z_0 + \Delta z) - f(z_0)}{\Delta z},$$
provided that the limit exists and is independent of $\Delta z$.
9.2 ANALYTIC FUNCTIONS 229
Example illustrating path dependence of derivative
Figure 9.2 (a) The map $z^2$ takes a line with slope angle $\alpha$ and maps it to a line with twice the angle in the $w$-plane. (b) The map $e^z$ takes the same line and maps it to a spiral in the $w$-plane.
In this definition, "independent of $\Delta z$" means independent of $\Delta x$ and $\Delta y$ (the components of $\Delta z$) and, therefore, independent of the direction of approach to $z_0$. The restrictions of this definition apply to the real case as well. For instance, the derivative of $f(x) = |x|$ at $x = 0$ does not exist² because it approaches $+1$ from the right and $-1$ from the left.
It can easily be shown that all the formal rules of differentiation that apply to the real case also apply to the complex case. For example, if $f$ and $g$ are differentiable, then $f \pm g$, $fg$, and, as long as $g$ is not zero, $f/g$ are also differentiable, and their derivatives are given by the usual rules of differentiation.
9.2.2. Example. Let us examine the derivative of $f(z) = x^2 + 2iy^2$ at $z = 1 + i$:
$$\left.\frac{df}{dz}\right|_{z=1+i} = \lim_{\Delta z \to 0} \frac{f(1 + i + \Delta z) - f(1 + i)}{\Delta z}
= \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{(1 + \Delta x)^2 + 2i(1 + \Delta y)^2 - 1 - 2i}{\Delta x + i\Delta y}
= \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{2\Delta x + 4i\Delta y + (\Delta x)^2 + 2i(\Delta y)^2}{\Delta x + i\Delta y}.$$
Let us approach $z = 1 + i$ along the line $y - 1 = m(x - 1)$. Then $\Delta y = m\Delta x$, and the limit yields
$$\left.\frac{df}{dz}\right|_{z=1+i} = \lim_{\Delta x \to 0} \frac{2\Delta x + 4im\Delta x + (\Delta x)^2 + 2im^2(\Delta x)^2}{\Delta x + im\Delta x} = \frac{2 + 4im}{1 + im}.$$
It follows that we get infinitely many values for the derivative depending on the value we assign to $m$, i.e., depending on the direction along which we approach $1 + i$. Thus, the derivative does not exist at $z = 1 + i$. ■
²One can rephrase this and say that the derivative exists, but not in terms of ordinary functions; rather, in terms of generalized functions, in this case $\delta(x)$, discussed in Chapter 6.
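A short numerical sketch (an illustration added here, not from the text) makes the path dependence concrete: the difference quotient along $y - 1 = m(x - 1)$ settles on the $m$-dependent value $(2 + 4im)/(1 + im)$.

```python
def f(z):
    # f(z) = x^2 + 2 i y^2, not differentiable at z = 1 + i
    return z.real**2 + 2j*z.imag**2

z0 = 1 + 1j
dx = 1e-6
for m in (0.0, 1.0, 2.0):
    dz = complex(dx, m*dx)                # approach along y - 1 = m(x - 1)
    quotient = (f(z0 + dz) - f(z0)) / dz
    predicted = (2 + 4j*m) / (1 + 1j*m)   # the m-dependent limit found above
    print(m, quotient, predicted)
```

Each direction gives a different answer, which is exactly why the derivative fails to exist there.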
It is clear from the definition that differentiability puts a severe restriction on $f(z)$ because it requires the limit to be the same for all paths going through $z_0$. Furthermore, differentiability is a local property: To test whether or not a function $f(z)$ is differentiable at $z_0$, we move away from $z_0$ by a small amount $\Delta z$ and check the existence of the limit in Definition 9.2.1.
What are the conditions under which a complex function is differentiable? For $f(z) = u(x, y) + iv(x, y)$, Definition 9.2.1 yields
$$\left.\frac{df}{dz}\right|_{z_0} = \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \left\{ \frac{u(x_0 + \Delta x, y_0 + \Delta y) - u(x_0, y_0)}{\Delta x + i\Delta y} + i\, \frac{v(x_0 + \Delta x, y_0 + \Delta y) - v(x_0, y_0)}{\Delta x + i\Delta y} \right\}.$$
If this limit is to exist for all paths, it must exist for the two particular paths on which $\Delta y = 0$ (parallel to the $x$-axis) and $\Delta x = 0$ (parallel to the $y$-axis). For the first path we get
$$\left.\frac{df}{dz}\right|_{z_0} = \lim_{\Delta x \to 0} \frac{u(x_0 + \Delta x, y_0) - u(x_0, y_0)}{\Delta x} + i \lim_{\Delta x \to 0} \frac{v(x_0 + \Delta x, y_0) - v(x_0, y_0)}{\Delta x} = \left.\frac{\partial u}{\partial x}\right|_{(x_0, y_0)} + i\left.\frac{\partial v}{\partial x}\right|_{(x_0, y_0)}.$$
For the second path ($\Delta x = 0$), we obtain
$$\left.\frac{df}{dz}\right|_{z_0} = \lim_{\Delta y \to 0} \frac{u(x_0, y_0 + \Delta y) - u(x_0, y_0)}{i\Delta y} + i \lim_{\Delta y \to 0} \frac{v(x_0, y_0 + \Delta y) - v(x_0, y_0)}{i\Delta y} = -i\left.\frac{\partial u}{\partial y}\right|_{(x_0, y_0)} + \left.\frac{\partial v}{\partial y}\right|_{(x_0, y_0)}.$$
If $f$ is to be differentiable at $z_0$, the derivatives along the two paths must be equal. Equating the real and imaginary parts of both sides of this equation and ignoring the subscript $z_0$ ($x_0$, $y_0$, or $z_0$ is arbitrary), we obtain
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \qquad \text{and} \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \tag{9.2}$$
Cauchy-Riemann conditions
These two conditions, which are necessary for the differentiability of $f$, are called the Cauchy-Riemann conditions. An alternative way of writing the Cauchy-Riemann (C-R) conditions is obtained by making the substitution³ $x = \frac{1}{2}(z + z^*)$ and $y = \frac{1}{2i}(z - z^*)$ in $u(x, y)$ and $v(x, y)$, using the chain rule to write Equation (9.2) in terms of $z$ and $z^*$, substituting the results in $\frac{\partial f}{\partial z^*} = \frac{\partial u}{\partial z^*} + i\frac{\partial v}{\partial z^*}$, and showing that Equation (9.2) is equivalent to the single equation $\partial f/\partial z^* = 0$. This equation says that
9.2.3. Box. If $f$ is to be differentiable, it must be independent of $z^*$.
³We use $z^*$ to indicate the complex conjugate of $z$. Occasionally we may use $\bar z$.
If the derivative of $f$ exists, the arguments leading to Equation (9.2) imply that the derivative can be expressed as
Expression for the derivative of a differentiable complex function
$$\frac{df}{dz} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} - i\frac{\partial u}{\partial y}. \tag{9.3}$$
The C-R conditions assure us that these two equations are equivalent.
The following example illustrates the differentiability of complex functions.
The following example illustrates the differentiability of complex functions.
9.2.4. Example. Let us determine whether or not the following functions are differentiable:
(a) We have already established that $f(z) = x^2 + 2iy^2$ is not differentiable at $z = 1 + i$. We can now show that it has no derivative at any point in the complex plane (except at the origin). This is easily seen by noting that $u = x^2$ and $v = 2y^2$, and that $\partial u/\partial x = 2x \neq \partial v/\partial y = 4y$, so the first Cauchy-Riemann condition is not satisfied. The second C-R condition is satisfied, but that is not enough.
We can also write $f(z)$ in terms of $z$ and $z^*$:
$$f(z) = \left[\tfrac{1}{2}(z + z^*)\right]^2 + 2i\left[\tfrac{1}{2i}(z - z^*)\right]^2 = \tfrac{1}{4}(1 - 2i)(z^2 + z^{*2}) + \tfrac{1}{2}(1 + 2i)zz^*.$$
$f(z)$ has an explicit dependence on $z^*$. Therefore, it is not differentiable.
(b) Now consider $f(z) = x^2 - y^2 + 2ixy$, for which $u = x^2 - y^2$ and $v = 2xy$. The C-R conditions become $\partial u/\partial x = 2x = \partial v/\partial y$ and $\partial u/\partial y = -2y = -\partial v/\partial x$. Thus, $f(z)$ may be differentiable. Recall that the C-R conditions are only necessary conditions; we have not shown (but we will, shortly) that they are also sufficient.
To check the dependence of $f$ on $z^*$, substitute $x = (z + z^*)/2$ and $y = (z - z^*)/(2i)$ in $u$ and $v$ to show that $f(z) = z^2$, and thus there is no $z^*$ dependence.
(c) Let $u(x, y) = e^x\cos y$ and $v(x, y) = e^x\sin y$. Then $\partial u/\partial x = e^x\cos y = \partial v/\partial y$ and $\partial u/\partial y = -e^x\sin y = -\partial v/\partial x$, and the C-R conditions are satisfied. Also,
$$f(z) = e^x\cos y + ie^x\sin y = e^x(\cos y + i\sin y) = e^x e^{iy} = e^{x+iy} = e^z,$$
and there is no $z^*$ dependence.
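The C-R conditions for the three functions above can also be checked by finite differences. In the sketch below (an illustration added here; the helper `cr_residual` is not from the text), both returned components vanish exactly where the conditions hold:

```python
import math

def cr_residual(u, v, x, y, h=1e-5):
    # Central-difference estimates of (du/dx - dv/dy, du/dy + dv/dx);
    # both components vanish where the Cauchy-Riemann conditions hold.
    ux = (u(x+h, y) - u(x-h, y)) / (2*h)
    uy = (u(x, y+h) - u(x, y-h)) / (2*h)
    vx = (v(x+h, y) - v(x-h, y)) / (2*h)
    vy = (v(x, y+h) - v(x, y-h)) / (2*h)
    return ux - vy, uy + vx

# (a) f = x^2 + 2i y^2: the first condition fails (2x != 4y at (1, 1))
print(cr_residual(lambda x, y: x*x, lambda x, y: 2*y*y, 1.0, 1.0))
# (b) f = z^2 and (c) f = e^z: both conditions hold everywhere
print(cr_residual(lambda x, y: x*x - y*y, lambda x, y: 2*x*y, 1.0, 1.0))
print(cr_residual(lambda x, y: math.exp(x)*math.cos(y),
                  lambda x, y: math.exp(x)*math.sin(y), 1.0, 1.0))
```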
The requirement of differentiability is very restrictive: The derivative must exist along infinitely many paths. On the other hand, the C-R conditions seem deceptively mild: They are derived for only two paths. Nevertheless, the two paths are, in fact, true representatives of all paths; that is, the C-R conditions are not only necessary, but also sufficient:
9.2.5. Theorem. The function $f(z) = u(x, y) + iv(x, y)$ is differentiable in a region of the complex plane if and only if the Cauchy-Riemann conditions,
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \qquad \text{and} \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}$$
(or, equivalently, $\partial f/\partial z^* = 0$), are satisfied and all first partial derivatives of $u$ and $v$ are continuous in that region. In that case
$$\frac{df}{dz} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} - i\frac{\partial u}{\partial y}.$$
Proof. We have already shown the "only if" part. To show the "if" part, note that if the derivative exists at all, it must equal (9.3). Thus, we have to show that
$$\lim_{\Delta z \to 0} \frac{f(z + \Delta z) - f(z)}{\Delta z} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}$$
or, equivalently, that
$$\left| \frac{f(z + \Delta z) - f(z)}{\Delta z} - \left( \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} \right) \right| < \epsilon \qquad \text{whenever } |\Delta z| < \delta.$$
By definition,
$$f(z + \Delta z) - f(z) = u(x + \Delta x, y + \Delta y) + iv(x + \Delta x, y + \Delta y) - u(x, y) - iv(x, y).$$
Since $u$ and $v$ have continuous first partial derivatives, we can write
$$u(x + \Delta x, y + \Delta y) = u(x, y) + \frac{\partial u}{\partial x}\Delta x + \frac{\partial u}{\partial y}\Delta y + \epsilon_1\Delta x + \delta_1\Delta y,$$
$$v(x + \Delta x, y + \Delta y) = v(x, y) + \frac{\partial v}{\partial x}\Delta x + \frac{\partial v}{\partial y}\Delta y + \epsilon_2\Delta x + \delta_2\Delta y,$$
where $\epsilon_1$, $\epsilon_2$, $\delta_1$, and $\delta_2$ are real numbers that approach zero as $\Delta x$ and $\Delta y$ approach zero. Using these expressions, we can write
$$f(z + \Delta z) - f(z) = \left( \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} \right)\Delta x + i\left( -i\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y} \right)\Delta y + (\epsilon_1 + i\epsilon_2)\Delta x + (\delta_1 + i\delta_2)\Delta y
= \left( \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} \right)(\Delta x + i\Delta y) + \epsilon\Delta x + \delta\Delta y,$$
where $\epsilon \equiv \epsilon_1 + i\epsilon_2$, $\delta \equiv \delta_1 + i\delta_2$, and we used the C-R conditions in the last step. Dividing both sides by $\Delta z = \Delta x + i\Delta y$, we get
$$\frac{f(z + \Delta z) - f(z)}{\Delta z} - \left( \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} \right) = \epsilon\frac{\Delta x}{\Delta z} + \delta\frac{\Delta y}{\Delta z}.$$
By the triangle inequality, $|\mathrm{RHS}| \le |\epsilon_1 + i\epsilon_2| + |\delta_1 + i\delta_2|$. This follows from the fact that $|\Delta x|/|\Delta z|$ and $|\Delta y|/|\Delta z|$ are both at most 1. The $\epsilon$ and $\delta$ terms can be made as small as desired by making $\Delta z$ small enough. We have thus established that when the C-R conditions hold, the function $f$ is differentiable. □
Augustin-Louis Cauchy (1789-1857) was one of the most influential French mathematicians of the nineteenth century. He began his career as a military engineer, but when his health broke down in 1813 he followed his natural inclination and devoted himself wholly to mathematics.
In mathematical productivity Cauchy was surpassed only by Euler, and his collected works fill 27 fat volumes. He made substantial contributions to number theory and determinants; is considered to be the originator of the theory of finite groups; and did extensive work in astronomy, mechanics, optics, and the theory of elasticity.
His greatest achievements, however, lay in the field of analysis. Together with his contemporaries Gauss and Abel, he was a pioneer in the rigorous treatment of limits, continuous functions, derivatives, integrals, and infinite series. Several of the basic tests for the convergence of series are associated with his name. He also provided the first existence proof for solutions of differential equations, gave the first proof of the convergence of a Taylor series, and was the first to feel the need for a careful study of the convergence behavior of Fourier series (see Chapter 8). However, his most important work was in the theory of functions of a complex variable, which in essence he created and which has continued to be one of the dominant branches of both pure and applied mathematics. In this field, Cauchy's integral theorem and Cauchy's integral formula are fundamental tools without which modern analysis could hardly exist (see Chapter 9).
Unfortunately, his personality did not harmonize with the fruitful power of his mind. He was an arrogant royalist in politics and a self-righteous, preaching, pious believer in religion, all this in an age of republican skepticism, and most of his fellow scientists disliked him and considered him a smug hypocrite. It might be fairer to put first things first and describe him as a great mathematician who happened also to be a sincere but narrow-minded bigot.
analyticity and singularity; regular and singular points; entire functions
9.2.6. Definition. A function $f: \mathbb C \to \mathbb C$ is called analytic at $z_0$ if it is differentiable at $z_0$ and at all other points in some neighborhood of $z_0$. A point at which $f$ is analytic is called a regular point of $f$. A point at which $f$ is not analytic is called a singular point or a singularity of $f$. A function for which all points in $\mathbb C$ are regular is called an entire function.
9.2.7. Example. DERIVATIVES OF SOME FUNCTIONS
(a) $f(z) = z$. Here $u = x$ and $v = y$; the C-R conditions are easily shown to hold, and for any $z$, we have $df/dz = \partial u/\partial x + i\partial v/\partial x = 1$. Therefore, the derivative exists at all points of the complex plane.
(b) $f(z) = z^2$. Here $u = x^2 - y^2$ and $v = 2xy$; the C-R conditions hold, and for all points $z$ of the complex plane, we have $df/dz = \partial u/\partial x + i\partial v/\partial x = 2x + 2iy = 2z$. Therefore, $f(z)$ is differentiable at all points.
(c) $f(z) = z^n$ for $n \ge 1$. We can use mathematical induction and the fact that the product of two entire functions is an entire function to show that $\frac{d}{dz}(z^n) = nz^{n-1}$.
(d) $f(z) = a_0 + a_1z + \cdots + a_{n-1}z^{n-1} + a_nz^n$, where the $a_i$ are arbitrary constants. That $f(z)$ is entire follows directly from part (c) and the fact that the sum of two entire functions is entire.
(e) $f(z) = 1/z$. The derivative can be found to be $f'(z) = -1/z^2$, which does not exist for $z = 0$. Thus, $z = 0$ is a singularity of $f(z)$. However, any other point is a regular point of $f$.
(f) $f(z) = |z|^2$. Using the definition of the derivative, we obtain
$$\frac{\Delta f}{\Delta z} = \frac{|z + \Delta z|^2 - |z|^2}{\Delta z} = \frac{(z + \Delta z)(z^* + \Delta z^*) - zz^*}{\Delta z} = z^* + \Delta z^* + z\frac{\Delta z^*}{\Delta z}.$$
For $z = 0$, $\Delta f/\Delta z = \Delta z^*$, which goes to zero as $\Delta z \to 0$. Therefore, $df/dz = 0$ at $z = 0$.⁴ However, if $z \neq 0$, the limit of $\Delta f/\Delta z$ will depend on how $z$ is approached. Thus, $df/dz$ does not exist if $z \neq 0$. This shows that $|z|^2$ is differentiable only at $z = 0$ and nowhere else in its neighborhood. It also shows that even if the real (here, $u = x^2 + y^2$) and imaginary (here, $v = 0$) parts of a complex function have continuous partial derivatives of all orders at a point, the function may not be differentiable there.
(g) $f(z) = 1/\sin z$. This gives $df/dz = -\cos z/\sin^2 z$. Thus, $f$ has infinitely many (isolated) singular points at $z = \pm n\pi$ for $n = 0, 1, 2, \ldots$. ■
9.2.8. Example. THE COMPLEX EXPONENTIAL FUNCTION
In this example, we find the (unique) function $f: \mathbb C \to \mathbb C$ that has the following three properties:
(a) $f$ is single-valued and analytic for all $z$,
(b) $df/dz = f(z)$, and
(c) $f(z_1 + z_2) = f(z_1)f(z_2)$.
Property (b) shows that if $f(z)$ is well behaved, then $df/dz$ is also well behaved. In particular, if $f(z)$ is defined for all values of $z$, then $f$ must be entire.
For $z_1 = 0 = z_2$, property (c) yields $f(0) = [f(0)]^2 \Rightarrow f(0) = 1$, or $f(0) = 0$. On the other hand,
$$\frac{df}{dz} = \lim_{\Delta z \to 0} \frac{f(z + \Delta z) - f(z)}{\Delta z} = \lim_{\Delta z \to 0} \frac{f(z)f(\Delta z) - f(z)}{\Delta z} = f(z) \lim_{\Delta z \to 0} \frac{f(\Delta z) - 1}{\Delta z}.$$
Property (b) now implies that
$$\lim_{\Delta z \to 0} \frac{f(\Delta z) - 1}{\Delta z} = 1 \;\Rightarrow\; f'(0) = 1 \quad \text{and} \quad f(0) = 1.$$
The first implication follows from the definition of derivative, and the second from the fact that the only other choice, namely $f(0) = 0$, would yield $-\infty$ for the limit.
⁴Although the derivative of $|z|^2$ exists at $z = 0$, it is not analytic there (or anywhere else). To be analytic at a point, a function must have derivatives at all points in some neighborhood of the given point.
Now, we write $f(z) = u(x, y) + iv(x, y)$, for which property (b) becomes
$$\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = u + iv \;\Rightarrow\; \frac{\partial u}{\partial x} = u, \quad \frac{\partial v}{\partial x} = v.$$
These equations have the most general solution $u(x, y) = a(y)e^x$ and $v(x, y) = b(y)e^x$, where $a(y)$ and $b(y)$ are the "constants" of integration. The Cauchy-Riemann conditions now yield $a(y) = db/dy$ and $da/dy = -b(y)$, whose most general solution is $a(y) = A\cos y + B\sin y$, $b(y) = A\sin y - B\cos y$. On the other hand, $f(0) = 1$ yields $u(0, 0) = 1$ and $v(0, 0) = 0$, implying that $a(0) = 1$, $b(0) = 0$, or $A = 1$, $B = 0$. We therefore conclude that
$$f(z) = a(y)e^x + ib(y)e^x = e^x(\cos y + i\sin y) = e^{x+iy} = e^z.$$
Both $e^x$ and $e^{iy}$ are well-defined in the entire complex plane. Hence, $e^z$ is defined and differentiable over all $\mathbb C$; therefore, it is entire. ■
Example 9.2.7 shows that any polynomial in z is entire. Example 9.2.8 shows
that the exponential function e^z is also entire. Therefore, any product and/or sum
of polynomials and e^z will also be entire. We can build other entire functions. For
instance, e^{iz} and e^{−iz} are entire functions; therefore, the trigonometric functions,
defined as
sin z = (e^{iz} − e^{−iz})/(2i)   and   cos z = (e^{iz} + e^{−iz})/2,   (9.4)
are also entire functions. Problem 9.5 shows that sin z and cos z have only real
zeros. The hyperbolic functions can be defined similarly:
sinh z = (e^z − e^{−z})/2   and   cosh z = (e^z + e^{−z})/2.   (9.5)
Although the sum and the product of entire functions are entire, the ratio, in
general, is not. For instance, if f(z) and g(z) are polynomials of degrees m and
n, respectively, then for n > 0 the ratio f(z)/g(z) is not entire, because at the
zeros of g(z) (which always exist, and which we assume are not zeros of f(z)) the
derivative is not defined.
The functions u(x, y) and v(x, y) of an analytic function have an interesting
property that the following example investigates.
9.2.9. Example. The family of curves u(x, y) = constant is perpendicular to the family
of curves v(x, y) = constant at each point of the complex plane where f(z) = u + iv is
analytic.
This can easily be seen by looking at the normals to the curves. The normal to the curve
u(x, y) = constant is simply ∇u = (∂u/∂x, ∂u/∂y). Similarly, the normal to the curve
v(x, y) = constant is ∇v = (∂v/∂x, ∂v/∂y). Taking the dot product of these two normals,
we obtain
(∇u) · (∇v) = (∂u/∂x)(∂v/∂x) + (∂u/∂y)(∂v/∂y) = (∂u/∂x)(−∂u/∂y) + (∂u/∂y)(∂u/∂x) = 0
by the C-R conditions. ∎
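The orthogonality of the level curves can be checked concretely for a specific analytic function. The sketch below is ours (not from the text): for f(z) = z², we have u = x² − y² and v = 2xy, so the gradients can be written out explicitly and their dot product tested at a few sample points:

```python
# For f(z) = z^2: u = x^2 - y^2, v = 2xy, so
# grad u = (2x, -2y) and grad v = (2y, 2x).
def grad_u(x, y):
    return (2 * x, -2 * y)

def grad_v(x, y):
    return (2 * y, 2 * x)

def dot(p, q):
    return p[0] * q[0] + p[1] * q[1]

# The dot product vanishes at every sample point, as Example 9.2.9 asserts.
for x, y in [(1.0, 2.0), (-0.5, 3.0), (4.0, -1.5)]:
    assert dot(grad_u(x, y), grad_v(x, y)) == 0.0
```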
9.3 Conformal Maps
The real and imaginary parts of an analytic function separately satisfy the two-
dimensional Laplace's equation:
∂²u/∂x² + ∂²u/∂y² = 0,   ∂²v/∂x² + ∂²v/∂y² = 0.   (9.6)
This can easily be verified from the C-R conditions. Laplace's equation in three
dimensions,
∂²Φ/∂x² + ∂²Φ/∂y² + ∂²Φ/∂z² = 0,
describes the electrostatic potential Φ in a charge-free region of space. In a typ-
ical electrostatic problem the potential Φ is given at certain boundaries (usually
conducting surfaces), and its value at every point in space is sought. There are
numerous techniques for solving such problems, and some of them will be dis-
cussed later in the book. However, some of these problems have a certain degree
of symmetry that reduces them to two-dimensional problems. In such cases, the
theory of analytic functions can be extremely helpful.
The symmetry mentioned above is cylindrical symmetry, where the potential is
known a priori to be independent of the z-coordinate (the axis of symmetry). This
situation occurs when conductors are cylinders and, if there are charge distribu-
tions in certain regions of space, the densities are z-independent. In such cases,
∂Φ/∂z = 0, and the problem reduces to a two-dimensional one.
Functions satisfying Laplace's equation are called harmonic functions. Thus,
the electrostatic potential is a three-dimensional harmonic function, and the po-
tential for a cylindrically symmetric charge distribution and boundary condition is
a two-dimensional harmonic function. Since the real and the imaginary parts of a
complex analytic function are also harmonic, techniques of complex analysis are
sometimes useful in solving electrostatic problems with cylindrical symmetry.⁵
To illustrate the connection between electrostatics and complex analysis, con-
sider a long straight filament with a constant linear charge density λ. It is shown in
introductory electromagnetism that the potential Φ (disregarding the arbitrary con-
stant that determines the reference potential) is given, in cylindrical coordinates,
by Φ = 2λ ln ρ = 2λ ln[(x² + y²)^{1/2}] = 2λ ln|z|. Since Φ satisfies Laplace's
equation, we conclude that Φ could be the real part of an analytic function w(z),
which we call the complex potential. Example 9.2.9, plus the fact that the curves
u = Φ = constant are circles, implies that the constant-v curves are rays, i.e.,
v ∝ φ. Choosing the constant of proportionality as 2λ, we obtain
w(z) = 2λ ln ρ + i2λφ = 2λ ln(ρe^{iφ}) = 2λ ln z.
⁵We use electrostatics because it is more familiar to physics students. Engineering students are familiar with steady-state heat
transfer as well, which also involves Laplace's equation and therefore is amenable to this technique.
It is useful to know the complex potential of more than one filament of charge.
To find such a potential we must first find w(z) for a line charge when it is displaced
from the origin. If the line is located at z₀ = x₀ + iy₀, then it is easy to show that
w(z) = 2λ ln(z − z₀). If there are n line charges with densities λ₁, ..., λₙ located at
z₁, z₂, ..., zₙ, then
w(z) = 2 Σ_{k=1}^{n} λ_k ln(z − z_k).   (9.7)
The function w(z) can be used directly to solve a number of electrostatic problems
involving simple charge distributions and conductor arrangements. Some of these
are illustrated in problems at the end of this chapter. Instead of treating w(z) as a
complex potential, let us look at it as a map from the z-plane (or xy-plane) to the
w-plane (or uv-plane). In particular, the equipotential curves (circles) are mapped
onto lines parallel to the v-axis in the w-plane. This is so because equipotential
curves are defined by u = constant. Similarly, the constant-v curves are mapped
onto horizontal lines in the w-plane.
This is an enormous simplification of the geometry. Straight lines, especially
when they are parallel to the axes, are by far simpler geometrical objects than circles,⁶
especially if the circles are not centered at the origin. So let us consider two
complex "worlds." One is represented by the xy-plane and denoted by z. The
other, the "prime world," is represented⁷ by z′, and its real and imaginary parts
by x′ and y′. We start in z, where we need to find a physical quantity such as the
electrostatic potential Φ(x, y). If the problem is too complicated in the z-world,
we transfer it to the z′-world, in which it may be easily solvable; we solve the
problem there (in terms of x′ and y′) and then transfer back to the z-world (x and
y). The mapping that relates z and z′ must be cleverly chosen. Otherwise, there is
no guarantee that the problem will simplify.
Two conditions are necessary for the above strategy to work. First, the dif-
ferential equation describing the physics must not get more complicated with the
transfer to z′. Since Laplace's equation is already of the simplest type, the z′-world
must also respect Laplace's equation. Second, and more importantly, the map-
ping must preserve the angles between curves. This is necessary because we want
the equipotential curves and the field lines to be perpendicular in both worlds. A
mapping that preserves the angle between two curves at a given point is called
a conformal mapping. We already have such mappings at our disposal, as the
following proposition shows.
9.3.1. Proposition. Let γ₁ and γ₂ be curves in the complex z-plane that intersect at
a point z₀ at an angle α. Let f : ℂ → ℂ be a mapping given by f(z) = z′ = x′ + iy′
that is analytic at z₀. Let γ₁′ and γ₂′ be the images of γ₁ and γ₂ under this mapping,
for i = 1, 2, and let them intersect at an angle α′. Then
(a) α′ = α, that is, the mapping f is conformal, if (dz′/dz)|_{z₀} ≠ 0;
(b) a function that is harmonic in (x, y) is also harmonic in (x′, y′).
⁶This statement is valid only in Cartesian coordinates. But these are precisely the coordinates we are using in this discussion.
⁷We are using z′ instead of w, and (x′, y′) instead of (u, v).
Proof. We sketch the proof of the first part. The details, as well as the proof of the
second part, involve partial differentiation and the chain rule and are left for the
reader. The angle between the two curves is obtained by taking the inner product
of the two unit vectors tangent to the curves at z₀. A small displacement along γᵢ
can be written as ê_x Δxᵢ + ê_y Δyᵢ for i = 1, 2, and the unit vectors as
êᵢ = (ê_x Δxᵢ + ê_y Δyᵢ)/√((Δxᵢ)² + (Δyᵢ)²).
Therefore,
ê₁ · ê₂ = (Δx₁Δx₂ + Δy₁Δy₂)/(√((Δx₁)² + (Δy₁)²) √((Δx₂)² + (Δy₂)²)).
Similarly, in the prime plane, we have
ê₁′ · ê₂′ = (Δx₁′Δx₂′ + Δy₁′Δy₂′)/(√((Δx₁′)² + (Δy₁′)²) √((Δx₂′)² + (Δy₂′)²)),
where x′ = u(x, y) and y′ = v(x, y), and u and v are the real and imaginary parts
of the analytic function f. Using the relations
Δxᵢ′ = (∂u/∂x)Δxᵢ + (∂u/∂y)Δyᵢ,   Δyᵢ′ = (∂v/∂x)Δxᵢ + (∂v/∂y)Δyᵢ,   i = 1, 2,
and the Cauchy-Riemann conditions, the reader may verify that ê₁′ · ê₂′ = ê₁ · ê₂. ∎
The following are some examples of conformal mappings.
(a) z′ = z + a, where a is an arbitrary complex constant. This is simply a trans-
lation of the z-plane.
(b) z′ = bz, where b is an arbitrary complex constant. This is a dilation, whereby
distances are dilated by a factor |b|. A graph in the z-plane is mapped onto a sim-
ilar graph in the z′-plane that is reduced (|b| < 1) or enlarged (|b| > 1) by a
factor of |b|.
(c) z′ = 1/z. This is called an inversion. Example 9.3.2 will show that under such
a mapping, circles are mapped onto circles or straight lines.
(d) Combining the preceding three transformations yields the general mapping
z′ = (az + b)/(cz + d),   (9.8)
which is conformal if cz + d ≠ 0 ≠ dz′/dz. The latter conditions are equivalent
to ad − bc ≠ 0.
9.3.2. Example. A circle of radius r whose center is at a in the z-plane is described by
the equation |z − a| = r. When transforming to the z′-plane under inversion, this equation
becomes |1/z′ − a| = r, or |1 − az′| = r|z′|. Squaring both sides and simplifying yields
(r² − |a|²)|z′|² + 2 Re(az′) − 1 = 0.
In terms of Cartesian coordinates, this becomes
(r² − |a|²)(x′² + y′²) + 2(a_r x′ − a_i y′) − 1 = 0,   (9.9)
where a ≡ a_r + ia_i. We now consider two cases:
1. r ≠ |a|: Divide by r² − |a|² and complete the squares to get
(x′ + a_r/(r² − |a|²))² + (y′ − a_i/(r² − |a|²))² − (a_r² + a_i²)/(r² − |a|²)² − 1/(r² − |a|²) = 0,
or, defining a_r′ ≡ −a_r/(r² − |a|²), a_i′ ≡ a_i/(r² − |a|²), and r′ ≡ r/|r² − |a|²|, we
have (x′ − a_r′)² + (y′ − a_i′)² = r′², which can also be written as
|z′ − a′| = r′,   a′ = a_r′ + ia_i′ = ā/(|a|² − r²).
This is a circle in the z′-plane with center at a′ and radius r′.
2. r = |a|: Then Equation (9.9) reduces to 2(a_r x′ − a_i y′) = 1, which is the equation of a
line.
If we use the transformation z′ = 1/(z − c) instead of z′ = 1/z, then |z − a| = r
becomes |1/z′ − (a − c)| = r, and all the above analysis goes through exactly as before,
except that a is replaced by a − c. ∎
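The center-and-radius formulas of Example 9.3.2 can be verified numerically. The sketch below is our own check (names and sample values are ours): it places points on a circle |z − a| = r with r ≠ |a|, applies the inversion z′ = 1/z, and confirms that every image point lies on the predicted circle:

```python
import cmath

# Circle |z - a| = r with r != |a|, so the image under inversion is a circle.
a, r = 2.0 + 1.0j, 0.75

# Predicted image: center a' = conj(a)/(|a|^2 - r^2), radius r' = r/| r^2 - |a|^2 |.
a_p = a.conjugate() / (abs(a) ** 2 - r ** 2)
r_p = r / abs(r ** 2 - abs(a) ** 2)

for k in range(12):
    z = a + r * cmath.exp(2j * cmath.pi * k / 12)  # a point on the original circle
    zp = 1 / z                                     # its image under inversion
    assert abs(abs(zp - a_p) - r_p) < 1e-12        # lies on the predicted circle
```

A quick sanity check with real data: a = 2, r = 1 maps the interval [1, 3] to [1/3, 1], whose center 2/3 and radius 1/3 agree with the formulas.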
Mappings of the form given in Equation (9.8) are called homographic trans-
formations. A useful property of such transformations is that they can map an
infinite region of the z-plane onto a finite region of the z′-plane. In fact, points
with very large values of |z| are mapped onto a neighborhood of the point z′ = a/c.
Of course, this argument goes both ways: Equation (9.8) also maps a neighbor-
hood of z = −d/c in the z-plane onto large regions of the z′-plane. The usefulness of
homographic transformations is illustrated in the following example.
9.3.3. Example. Consider two cylindrical conductors of equal radius r, held at potentials
u₁ and u₂, respectively, whose centers are D units of length apart. Choose the x- and the
y-axes such that the centers of the cylinders are located on the x-axis at distances a₁ and
a₂ from the origin, as shown in Figure 9.3. Let us find the electrostatic potential produced
by such a configuration in the xy-plane.
We know from elementary electrostatics that the problem becomes very simple if the
two cylinders are concentric (and, of course, of different radii). Thus, we try to map the
two circles onto two concentric circles in the z′-plane such that the infinite region outside
the two circles in the z-plane gets mapped onto the finite annular region between the two
concentric circles in the z′-plane. We then (easily) find the potential in the z′-plane, and
transfer it back to the z-plane.
The most general mapping that may be able to do the job is that given by Equation
(9.8). However, it turns out that we do not have to be this general. In fact, the special case
Figure 9.3 In the z-plane, we see two equal cylinders whose centers are separated.
z′ = 1/(z − c), in which c is a real constant, will be sufficient. So z = (1/z′) + c, and the
circles |z − a_k| = r for k = 1, 2 will be mapped onto the circles |z′ − a_k′| = r_k′, where (by
Example 9.3.2) a_k′ = (a_k − c)/[(a_k − c)² − r²] and r_k′ = r/|(a_k − c)² − r²|.
Can we arrange the parameters so that the circles in the z′-plane are concentric, i.e.,
so that a₁′ = a₂′? The answer is yes. We set a₁′ = a₂′ and solve for a₂ in terms of a₁. The result
is either the trivial solution a₂ = a₁, or a₂ = c − r²/(a₁ − c). If we place the origin of
the z-plane at the center of the first cylinder, then a₁ = 0 and a₂ = D = c + r²/c. We can
also find a₁′ and a₂′: a₁′ = a₂′ ≡ a′ = −c/(c² − r²), and the geometry of the problem is as
shown in Figure 9.4.
For such a geometry the potential at a point in the annular region is given by Φ′ =
A ln ρ + B = A ln|z′ − a′| + B, where A and B are real constants determined by the
conditions Φ′(r₁′) = u₁ and Φ′(r₂′) = u₂, which yield
A = (u₁ − u₂)/ln(r₁′/r₂′)   and   B = (u₂ ln r₁′ − u₁ ln r₂′)/ln(r₁′/r₂′).
The potential Φ′ is the real part of the complex function⁸
F(z′) = A ln(z′ − a′) + B,
which is analytic except at z′ = a′, a point lying outside the region of interest. We can now
go back to the z-plane by substituting z′ = 1/(z − c) to obtain
G(z) = A ln(1/(z − c) − a′) + B,
⁸Writing z = |z|e^{iθ}, we note that ln z = ln|z| + iθ, so that the real part of a complex log function is the log of the absolute
value.
Figure 9.4 In the z′-plane, we see two concentric unequal cylinders.
whose real part is the potential in the z-plane:
Φ(x, y) = Re[G(z)] = A ln|(1 − a′z + a′c)/(z − c)| + B
= A ln|((1 + a′c − a′x) − ia′y)/((x − c) + iy)| + B
= (A/2) ln[((1 + a′c − a′x)² + a′²y²)/((x − c)² + y²)] + B.
This is the potential we want. ∎
9.4 Integration of Complex Functions
The derivative of a complex function is an important concept and, as the previous
section demonstrated, provides a powerful tool in physical applications. The con-
cept of integration is even more important. In fact, we will see in the next section
that derivatives can be written in terms of integrals. We will study integrals of
complex functions in detail in this section.
The definite integral of a complex function is defined in analogy to that of a
real function:
∫_{α₁}^{α₂} f(z) dz = lim_{N→∞, Δzᵢ→0} Σ_{i=1}^{N} f(zᵢ) Δzᵢ,
where Δzᵢ is a small segment, situated at zᵢ, of the curve that connects the complex
number α₁ to the complex number α₂ in the z-plane. Since there are infinitely many
ways of connecting α₁ to α₂, it is possible to obtain different values for the integral
for different paths.
One encounters a similar situation when one tries to evaluate the line integral
of a vector field. In fact, we can turn the integral of a complex function into a line
integral as follows. We substitute f(z) = u + iv and dz = dx + i dy in the integral
to obtain
∫_{α₁}^{α₂} f(z) dz = ∫_{α₁}^{α₂} (u dx − v dy) + i ∫_{α₁}^{α₂} (v dx + u dy).
If we define the two-dimensional vectors A₁ ≡ (u, −v) and A₂ ≡ (v, u), then we
get ∫_{α₁}^{α₂} f(z) dz = ∫_{α₁}^{α₂} A₁ · dr + i ∫_{α₁}^{α₂} A₂ · dr. It follows from Stokes' theorem
(or Green's theorem, since the vectors lie in a plane) that the integral of f is path-
independent only if both A₁ and A₂ have vanishing curls. This in turn follows if
and only if u and v satisfy the C-R conditions, and this is exactly what is needed
for f(z) to be analytic.
Path-independence of a line integral of a vector A is equivalent to the vanishing
of the integral along a closed path, and the latter is equivalent to the vanishing of
∇ × A at every point of the region bordered by the closed path. The preceding
discussion is encapsulated in an important theorem, which we shall state shortly.
First, however, it is worthwhile to become familiar with some terminology used
frequently in complex analysis.
1. A curve is a map γ : [a, b] → ℂ from the real interval into the complex
plane given by γ(t) = γ_r(t) + iγ_i(t), where a ≤ t ≤ b, and γ_r and γ_i are the
real and imaginary parts of γ; γ(a) is called the initial point of the curve
and γ(b) its final point.
2. A simple arc, or a Jordan arc, is a curve that does not cross itself, i.e., γ is
injective (or one-to-one), so that γ(t₁) ≠ γ(t₂) when t₁ ≠ t₂.
3. A path is a finite collection {γ₁, γ₂, ..., γₙ} of simple arcs such that the
initial point of γ_{k+1} coincides with the final point of γ_k.
4. A smooth arc is a curve for which dγ/dt = dγ_r/dt + i dγ_i/dt exists and
is nonzero for t ∈ [a, b].
5. A contour is a path whose arcs are smooth. When the initial point of γ₁
coincides with the final point of γₙ, the contour is said to be a simple closed
contour.
9.4.1. Theorem. (Cauchy-Goursat theorem) Let f : ℂ → ℂ be analytic on a
simple closed contour C and at all points inside C. Then
∮_C f(z) dz = 0.
Figure 9.5 The three different paths of integration corresponding to the integrals I₁, I₁′,
I₂, and I₂′.
9.4.2. Example. EXAMPLES OF DEFINITE INTEGRALS
(a) Let us evaluate the integral I₁ = ∫_{γ₁} z dz, where γ₁ is the straight line drawn from the
origin to the point (1, 2) (see Figure 9.5). Along such a line y = 2x and, using t for x,
γ₁(t) = t + 2it where 0 ≤ t ≤ 1; so
I₁ = ∫_{γ₁} z dz = ∫₀¹ (t + 2it)(dt + 2i dt) = ∫₀¹ (−3t dt + 4it dt) = −3/2 + 2i.
For a different path γ₂, along which y = 2x², we get γ₂(t) = t + 2it² where 0 ≤ t ≤ 1,
and
I₁′ = ∫_{γ₂} z dz = ∫₀¹ (t + 2it²)(dt + 4it dt) = −3/2 + 2i.
Therefore, I₁ = I₁′. This is what is expected from the Cauchy-Goursat theorem, because
the function f(z) = z is analytic on the two paths and in the region bounded by them.
(b) To find I₂ ≡ ∫_{γ₁} z² dz with γ₁ as in part (a), substitute for z in terms of t:
I₂ = ∫₀¹ (t + 2it)²(dt + 2i dt) = (1 + 2i)³ ∫₀¹ t² dt = −11/3 − (2/3)i.
Next we compare I₂ with I₂′ = ∫_{γ₃} z² dz, where γ₃ is as shown in Figure 9.5. This path can
be described by
γ₃(t) = t   for 0 ≤ t ≤ 1,
γ₃(t) = 1 + i(t − 1)   for 1 ≤ t ≤ 3.
Therefore,
I₂′ = ∫₀¹ t² dt + ∫₁³ [1 + i(t − 1)]² (i dt) = 1/3 − 4 − (2/3)i = −11/3 − (2/3)i,
Figure 9.6 The two semicircular paths for calculating I₃ and I₃′.
which is identical to I₂, once again because the function is analytic on γ₁ and γ₃ as well as
in the region bounded by them.
(c) Now consider I₃ ≡ ∫_{γ₄} dz/z, where γ₄ is the upper semicircle of unit radius, as shown
in Figure 9.6. A parametric equation for γ₄ can be given in terms of θ:
γ₄(θ) = e^{iθ},   0 ≤ θ ≤ π.
Thus, we obtain
I₃ = ∫_{γ₄} dz/z = ∫₀^{π} (ie^{iθ}/e^{iθ}) dθ = iπ.
On the other hand, for the lower semicircle γ₄′, traversed from θ = 2π to θ = π,
I₃′ = ∫_{γ₄′} dz/z = ∫_{2π}^{π} (ie^{iθ}/e^{iθ}) dθ = −iπ.
Here the two integrals are not equal. From γ₄ and γ₄′ we can construct a counterclockwise
simple closed contour C, along which the integral of f(z) = 1/z becomes ∮_C dz/z =
I₃ − I₃′ = 2iπ. That the integral is not zero is a consequence of the fact that 1/z is not
analytic at all points of the region bounded by the closed contour C. ∎
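The path dependence of ∫ dz/z can be reproduced numerically. The sketch below is our own construction (the helper `integrate` and its names are not from the text): it approximates a contour integral by a Riemann sum over small chords and checks the values of I₃, I₃′, and their difference:

```python
import cmath

def integrate(f, gamma, a, b, n=20000):
    """Riemann-sum approximation of the contour integral of f along gamma(t), t from a to b."""
    total = 0.0 + 0.0j
    for k in range(n):
        t0 = a + (b - a) * k / n
        t1 = a + (b - a) * (k + 1) / n
        zmid = gamma((t0 + t1) / 2)          # midpoint value of the integrand
        total += f(zmid) * (gamma(t1) - gamma(t0))  # times the chord dz
    return total

circle = lambda t: cmath.exp(1j * t)

# Upper semicircle: theta from 0 to pi; lower semicircle: theta from 2*pi down to pi.
I3 = integrate(lambda z: 1 / z, circle, 0.0, cmath.pi)
I3p = integrate(lambda z: 1 / z, circle, 2 * cmath.pi, cmath.pi)

assert abs(I3 - 1j * cmath.pi) < 1e-6
assert abs(I3p + 1j * cmath.pi) < 1e-6
assert abs((I3 - I3p) - 2j * cmath.pi) < 1e-6   # the closed-contour value 2*pi*i
```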
The Cauchy-Goursat theorem applies to more complicated regions. When a
region contains points at which f(z) is not analytic, those points can be avoided
by redefining the region and the contour. Such a procedure requires an agreement
on the direction we will take.
Convention. When integrating along a closed contour, we agree to move along
the contour in such a way that the enclosed region lies to our left. An integration
that follows this convention is called integration in the positive sense. Integration
performed in the opposite direction acquires a minus sign.
Figure 9.7 A complicated contour can be broken up into simpler ones. Note that the
boundaries of the "eyes" and the "mouth" are forced to be traversed in the (negative)
clockwise direction.
For a simple closed contour, movement in the counterclockwise direction yields
integration in the positive sense. However, as the contour becomes more compli-
cated, this conclusion breaks down. Figure 9.7 shows a complicated path enclosing
a region (shaded) in which the integrand is analytic. Note that it is possible to tra-
verse a portion of the region twice in opposite directions without affecting the
integral, which may be a sum of integrals for different pieces of the contour. Also
note that the "eyes" and the "mouth" are traversed clockwise! This is necessary
because of the convention above. A region such as that shown in Figure 9.7, in
which holes are "punched out," is called multiply connected. In contrast, a sim-
ply connected region is one in which every simple closed contour encloses only
points of the region.
One important consequence of the Cauchy-Goursat theorem is the following:
9.4.3. Theorem. (Cauchy integral formula) Let f be analytic on and within a
simple closed contour C integrated in the positive sense. Let z₀ be any interior
point of C. Then
f(z₀) = (1/2πi) ∮_C f(z)/(z − z₀) dz.
To prove the Cauchy integral formula (CIF), we need the following lemma.
9.4.4. Lemma. (Darboux inequality) Suppose f : ℂ → ℂ is continuous and
bounded on a path γ, i.e., there exists a positive number M such that |f(z)| ≤ M
for all values z ∈ γ. Then
|∫_γ f(z) dz| ≤ M L_γ,
where L_γ is the length of the path of integration.
Proof.
|∫_γ f(z) dz| = |lim_{N→∞, Δzᵢ→0} Σ_{i=1}^{N} f(zᵢ)Δzᵢ| = lim_{N→∞, Δzᵢ→0} |Σ_{i=1}^{N} f(zᵢ)Δzᵢ|
≤ lim_{N→∞, Δzᵢ→0} Σ_{i=1}^{N} |f(zᵢ)Δzᵢ| = lim_{N→∞, Δzᵢ→0} Σ_{i=1}^{N} |f(zᵢ)| |Δzᵢ|
≤ M lim_{N→∞, Δzᵢ→0} Σ_{i=1}^{N} |Δzᵢ| = M L_γ.
The first inequality follows from the triangle inequality, the second from the bound-
edness of f, and the last equality follows from the definition of the length of a
path. ∎
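The lemma is easy to illustrate numerically. The sketch below is our own construction (not from the text): for f(z) = z² on the unit circle, |f| ≤ M = 1 and the path length is L_γ = 2π, so the Darboux inequality bounds the closed-contour integral by 2π; by the Cauchy-Goursat theorem the integral is in fact zero:

```python
import cmath

# Riemann sum for the closed-contour integral of z^2 over the unit circle.
n = 5000
total = 0j
for k in range(n):
    z0 = cmath.exp(2j * cmath.pi * k / n)
    z1 = cmath.exp(2j * cmath.pi * (k + 1) / n)
    zm = cmath.exp(2j * cmath.pi * (k + 0.5) / n)   # midpoint of the chord
    total += zm ** 2 * (z1 - z0)

M = 1.0                    # bound on |z^2| on the unit circle
L = 2 * cmath.pi           # length of the contour
assert abs(total) <= M * L   # the Darboux bound
assert abs(total) < 1e-6     # Cauchy-Goursat: the integral actually vanishes
```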
Now we are ready to prove the Cauchy integral formula.
Proof of CIF. Consider the shaded region in Figure 9.8, which is bounded by C,
by γ₀ (a circle of arbitrarily small radius δ centered at z₀), and by L₁ and L₂, two
straight line segments infinitesimally close to one another (we can, in fact, assume
that L₁ and L₂ are right on top of one another; however, they are separated in the
figure for clarity). Let us use C′ to denote the union of all these curves.
Since f(z)/(z − z₀) is analytic everywhere on the contour C′ and inside the
shaded region, we can write
0 = (1/2πi) ∮_{C′} f(z)/(z − z₀) dz   (9.10)
= (1/2πi) [∮_C f(z)/(z − z₀) dz + ∮_{γ₀} f(z)/(z − z₀) dz + ∫_{L₁} f(z)/(z − z₀) dz + ∫_{L₂} f(z)/(z − z₀) dz].
The contributions from L₁ and L₂ cancel because they are integrals along the same
line segment in opposite directions. Let us evaluate the contribution from the in-
finitesimal circle γ₀. First we note that because f(z) is continuous (differentiability
implies continuity), we can write
|(f(z) − f(z₀))/(z − z₀)| = |f(z) − f(z₀)|/|z − z₀| = |f(z) − f(z₀)|/δ < ε/δ
Figure 9.8 The integrand is analytic within and on the boundary of the shaded region. It
is always possible to construct contours that exclude all singular points.
for z ∈ γ₀, where ε is a small positive number. We now apply the Darboux
inequality and write
|∮_{γ₀} (f(z) − f(z₀))/(z − z₀) dz| < (ε/δ) 2πδ = 2πε.
This means that the integral goes to zero as δ → 0, or
∮_{γ₀} f(z)/(z − z₀) dz = ∮_{γ₀} f(z₀)/(z − z₀) dz = f(z₀) ∮_{γ₀} dz/(z − z₀).
We can easily calculate the integral on the RHS by noting that z − z₀ = δe^{iφ} and
that γ₀ has a clockwise direction:
∮_{γ₀} dz/(z − z₀) = −∫₀^{2π} iδe^{iφ} dφ/(δe^{iφ}) = −2πi ⟹ ∮_{γ₀} f(z)/(z − z₀) dz = −2πi f(z₀).
Substituting this in (9.10) yields the desired result. ∎
9.4.5. Example. We can use the CIF to evaluate the integrals
I₁ = ∮_{C₁} z² dz/[(z − i)(z² + 3)²],   I₂ = ∮_{C₂} (z² − 1) dz/[(z − 1/2)(z² − 4)³],
I₃ = ∮_{C₃} e^{z/2} dz/[(z − iπ)(z² − 20)⁴],
where C₁, C₂, and C₃ are circles centered at the origin with radii r₁ = 3/2, r₂ = 1, and
r₃ = 4.
For I₁ we note that f(z) = z²/(z² + 3)² is analytic within and on C₁, and z₀ = i lies
in the interior of C₁. Thus,
I₁ = ∮_{C₁} f(z) dz/(z − i) = 2πi f(i) = 2πi i²/(i² + 3)² = −iπ/2.
Similarly, f(z) = (z² − 1)/(z² − 4)³ for the integral I₂ is analytic on and within C₂, and
z₀ = 1/2 is an interior point of C₂. Thus, the CIF gives
I₂ = ∮_{C₂} f(z) dz/(z − 1/2) = 2πi f(1/2) = 32πi/1125.
For the last integral, f(z) = e^{z/2}/(z² − 20)⁴, and the interior point is z₀ = iπ:
I₃ = 2πi f(iπ) = 2πi e^{iπ/2}/((iπ)² − 20)⁴ = −2π/(π² + 20)⁴. ∎
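A CIF evaluation of this kind can be cross-checked by computing the contour integral directly. The sketch below is our own check (names are ours): it approximates the integral of (z² − 1)/[(z − 1/2)(z² − 4)³] around the unit circle by a Riemann sum and compares it with 2πi f(1/2) = 32πi/1125:

```python
import cmath

def f(z):
    """The analytic part of the integrand: (z^2 - 1)/(z^2 - 4)^3."""
    return (z ** 2 - 1) / (z ** 2 - 4) ** 3

# Riemann sum over the unit circle C2, traversed counterclockwise.
n = 20000
I2 = 0j
for k in range(n):
    z0 = cmath.exp(2j * cmath.pi * k / n)
    z1 = cmath.exp(2j * cmath.pi * (k + 1) / n)
    zm = cmath.exp(2j * cmath.pi * (k + 0.5) / n)
    I2 += f(zm) / (zm - 0.5) * (z1 - z0)

expected = 2j * cmath.pi * f(0.5)       # the CIF prediction
assert abs(I2 - expected) < 1e-6
assert abs(expected - 32j * cmath.pi / 1125) < 1e-12
```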
The Cauchy integral formula gives the value of an analytic function at every
point inside a simple closed contour when it is given the value of the function
only at points on the contour. It seems as though an analytic function is not free to
change inside a region once its value is fixed on the contour enclosing that region.
There is an analogous situation in electrostatics: The specification of the poten-
tial at the boundaries, such as the surfaces of conductors, automatically determines
the potential at any other point in the region of space bounded by the conductors.
This is the content of the uniqueness theorem used in electrostatic boundary value
problems. However, the electrostatic potential Φ is bound by another condition,
Laplace's equation; and the combination of Laplace's equation and the boundary
conditions furnishes the uniqueness of Φ. Similarly, the real and imaginary parts
of an analytic function separately satisfy Laplace's equation in two dimensions!
Thus, it should come as no surprise that the value of an analytic function on a
boundary (contour) determines the function at all points inside the boundary.
9.5 Derivatives as Integrals
The Cauchy integral formula is a very powerful tool for working with analytic
functions. One of the applications of this formula is in evaluating the derivatives
of such functions. It is convenient to change the dummy integration variable to ξ
and write the CIF as
f(z) = (1/2πi) ∮_C f(ξ) dξ/(ξ − z),   (9.11)
where C is a simple closed contour in the ξ-plane and z is a point within C.
As preparation for defining the derivative of an analytic function, we need the
following result.
9.5.1. Proposition. Let γ be any path (a contour, for example) and g a contin-
uous function on that path. The function f(z) defined by
f(z) = (1/2πi) ∫_γ g(ξ) dξ/(ξ − z)
is analytic at every point z ∉ γ.
Proof. The proof follows immediately from differentiation of the integral:
df/dz = (1/2πi) (d/dz) ∫_γ g(ξ) dξ/(ξ − z) = (1/2πi) ∫_γ g(ξ) dξ (d/dz)(1/(ξ − z)) = (1/2πi) ∫_γ g(ξ) dξ/(ξ − z)².
This is defined for all values of z not on γ.⁹ Thus, f(z) is analytic there. ∎
We can generalize the formula above to the nth derivative, and obtain
dⁿf/dzⁿ = (n!/2πi) ∫_γ g(ξ) dξ/(ξ − z)^{n+1}.   (9.12)
Applying this result to an analytic function expressed by Equation (9.11), we obtain
the following important theorem.
9.5.2. Theorem. The derivatives of all orders of an analytic function f(z) exist
in the domain of analyticity of the function and are themselves analytic in that
domain. The nth derivative of f(z) is given by
f^{(n)}(z) = dⁿf/dzⁿ = (n!/2πi) ∮_C f(ξ) dξ/(ξ − z)^{n+1}.
9.5.3. Example. Let us apply Equation (9.12) directly to some simple functions. In all
cases, we will assume that the contour is a circle of radius r centered at z.
(a) Let f(z) = K, a constant. Then, for n = 1 we have
df/dz = (1/2πi) ∮_C K dξ/(ξ − z)².
Since ξ is on the circle C centered at z, ξ − z = re^{iθ} and dξ = ire^{iθ} dθ. So we have
df/dz = (1/2πi) ∫₀^{2π} Kire^{iθ} dθ/(re^{iθ})² = (K/2πr) ∫₀^{2π} e^{−iθ} dθ = 0.
(b) Given f(z) = z, its first derivative will be
df/dz = (1/2πi) ∮_C ξ dξ/(ξ − z)² = (1/2πi) ∫₀^{2π} (z + re^{iθ})ire^{iθ} dθ/(re^{iθ})²
= (1/2π) ((z/r) ∫₀^{2π} e^{−iθ} dθ + ∫₀^{2π} dθ) = (1/2π)(0 + 2π) = 1.
(c) Given f(z) = z², for the first derivative Equation (9.12) yields
df/dz = (1/2πi) ∮_C ξ² dξ/(ξ − z)² = (1/2πi) ∫₀^{2π} (z + re^{iθ})²ire^{iθ} dθ/(re^{iθ})²
= (1/2π) ∫₀^{2π} [z² + (re^{iθ})² + 2zre^{iθ}](re^{iθ})^{−1} dθ
= (1/2π) ((z²/r) ∫₀^{2π} e^{−iθ} dθ + r ∫₀^{2π} e^{iθ} dθ + 2z ∫₀^{2π} dθ) = 2z.
It can be shown that, in general, (d/dz)zᵐ = mz^{m−1}. The proof is left as Problem
9.30. ∎
⁹The interchange of differentiation and integration requires justification. Such an interchange can be done if the integral has
some restrictive properties. We shall not concern ourselves with such details. In fact, one can achieve the same result by using
the definition of derivatives and the usual properties of integrals.
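The n = 1 case of Equation (9.12) can also be checked numerically. The sketch below is our own (the point z and radius r are arbitrary choices): it approximates (1/2πi)∮ ξ² dξ/(ξ − z)² over a circle around z and compares the result with the known derivative 2z from part (c):

```python
import cmath

z, r, n = 1.3 - 0.8j, 1.0, 20000

# Riemann sum for the contour integral of xi^2/(xi - z)^2 around |xi - z| = r.
total = 0j
for k in range(n):
    xi0 = z + r * cmath.exp(2j * cmath.pi * k / n)
    xi1 = z + r * cmath.exp(2j * cmath.pi * (k + 1) / n)
    xim = z + r * cmath.exp(2j * cmath.pi * (k + 0.5) / n)
    total += xim ** 2 / (xim - z) ** 2 * (xi1 - xi0)

deriv = total / (2j * cmath.pi)   # Equation (9.12) with n = 1, g = f
assert abs(deriv - 2 * z) < 1e-6  # agrees with d(z^2)/dz = 2z
```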
The CIF is a central formula in complex analysis, and we shall see its sig-
nificance in much of the later development of complex analysis. For now, let us
demonstrate its usefulness in proving a couple of important properties of analytic
functions.
9.5.4. Proposition. The absolute value of an analytic function f(z) cannot have
a local maximum within the region of analyticity of the function.
Proof. Let S ⊂ ℂ be the region of analyticity of f. Suppose z₀ ∈ S were a local
maximum. Then we could find a circle γ₀ of small enough radius δ, centered at z₀,
such that |f(z₀)| > |f(z)| for all z on γ₀. We now show that this cannot happen.
Using the CIF, and noting that z − z₀ = δe^{iθ}, we have
|f(z₀)| = |(1/2πi) ∮_{γ₀} f(z)/(z − z₀) dz| = (1/2π) |∫₀^{2π} f(z)iδe^{iθ} dθ/(δe^{iθ})|
≤ (1/2π) ∫₀^{2π} |f(z)| dθ ≤ (1/2π) ∫₀^{2π} M dθ = M,
where M is the maximum value of |f(z)| for z ∈ γ₀. This inequality says that
there is at least one point z on the circle γ₀ (the point at which the maximum of
|f(z)| is attained) such that |f(z₀)| ≤ |f(z)|. This contradicts our assumption.
Therefore, there can be no local maximum within S. ∎
9.5.5. Proposition. A bounded entire function is necessarily a constant.
Proof. We show that the derivative of such a function is zero. Consider
df/dz = (1/2πi) ∮_C f(ξ) dξ/(ξ − z)².
Since f is an entire function, the closed contour C can be chosen to be a very large
circle of radius R with center at z. Taking the absolute value of both sides and
applying the Darboux inequality yields
|df/dz| ≤ (1/2π)(M/R²)(2πR) = M/R,
where M is the maximum of the function in the complex plane. Now, as R → ∞,
the derivative goes to zero, and the function must be a constant. ∎
Proposition 9.5.5 is a very powerful statement about analytic functions. There
are many interesting and nontrivial real functions that are bounded and have deriva-
tives of all orders on the entire real line. For instance, e^{−x²} is such a function. No
such freedom exists for complex analytic functions. Any nontrivial analytic func-
tion is either not bounded (goes to infinity somewhere on the complex plane) or
not entire (it is not analytic at some point(s) of the complex plane).
A consequence of Proposition 9.5.5 is the fundamental theorem of algebra,
which states that any polynomial of degree n ≥ 1 has n roots (some of which may
be repeated). In other words, the polynomial
p(x) = a₀ + a₁x + ··· + aₙxⁿ   for n ≥ 1
can be factored completely as p(x) = c(x − z₁)(x − z₂)···(x − zₙ), where c is a
constant and the zᵢ are, in general, complex numbers.
To see how Proposition 9.5.5 implies the fundamental theorem of algebra, we
let f(z) = 1/p(z) and assume the contrary, i.e., that p(z) is never zero for any
(finite) z ∈ ℂ. Then f(z) is bounded and analytic for all z ∈ ℂ, and Proposition
9.5.5 says that f(z) is a constant. This is obviously wrong. Thus, there must be at
least one z, say z = z₁, for which p(z) is zero. So, we can factor out (z − z₁) from
p(z) and write p(z) = (z − z₁)q(z), where q(z) is of degree n − 1. Applying the
above argument to q(z), we have p(z) = (z − z₁)(z − z₂)r(z), where r(z) is of
degree n − 2. Continuing in this way, we can factor p(z) into linear factors. The
last polynomial will be a constant (a polynomial of degree zero), which we have
denoted by c.
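The complete factorization can be illustrated with a concrete polynomial of our own choosing (not from the text): p(z) = z³ − 6z² + 11z − 6 has the three roots 1, 2, 3 and factors as c(z − z₁)(z − z₂)(z − z₃) with c = 1, as the fundamental theorem of algebra guarantees:

```python
def p(z):
    """Our sample cubic: p(z) = z^3 - 6z^2 + 11z - 6 = (z - 1)(z - 2)(z - 3)."""
    return z ** 3 - 6 * z ** 2 + 11 * z - 6

roots = [1, 2, 3]
assert all(p(z) == 0 for z in roots)   # a degree-3 polynomial has 3 roots

def factored(z, c=1):
    """The product c * (z - z1)(z - z2)(z - z3)."""
    prod = c
    for zk in roots:
        prod *= z - zk
    return prod

# The factored form agrees with p at real and complex sample points.
for z in [0, 0.5, -2, 1j, 3 + 4j]:
    assert abs(p(z) - factored(z)) < 1e-9
```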
The primitive (indefinite integral) of an analytic function can be defined using
definite integrals just as in the real case. Let f : C ---> C be analytic in a region
S of the complex plane. Let zo and z be two points in S, and defmel'' F(z) es
fz: f(l;) d~. We can show that F(z) is the primitive of f(z) by showing that
lim IF(z + t>z) - F(z) - f(z)1 = o.
.6.z--+O f).z
We leave the details as a problem for the reader.
9.5.6. Proposition. Let $f : \mathbb{C} \to \mathbb{C}$ be analytic in a region $S$ of $\mathbb{C}$. Then at every point $z \in S$, there exists an analytic function $F : \mathbb{C} \to \mathbb{C}$ such that
$$\frac{dF}{dz} = f(z).$$
$^{10}$Note that the integral is path-independent due to the analyticity of $f$. Thus, $F$ is well-defined.
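As a numerical illustration of this construction (a sketch added here, with $f(z) = z^2$ and the straight-line path both arbitrary choices), one can build $F(z)$ by integrating along a segment from $z_0$ to $z$ and check that the difference quotient approaches $f(z)$:

```python
import numpy as np

def F(z, z0=0.0, f=lambda w: w**2, n=20001):
    # F(z) = integral of f(xi) d(xi) along the straight segment z0 -> z; by
    # analyticity the value is path-independent, so this path is as good as any.
    t = np.linspace(0.0, 1.0, n)
    xi = z0 + t * (z - z0)
    return np.trapz(f(xi), t) * (z - z0)

z, dz = 1.0 + 1.0j, 1e-4
print(abs((F(z + dz) - F(z)) / dz - z**2))  # small: the difference quotient approximates f(z)
```

Shrinking `dz` (and refining the quadrature grid along with it) drives the printed error toward zero, as the proposition asserts.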
252 9. COMPLEX CALCULUS
In the sketch of the proof of Proposition 9.5.6, we used only the continuity of $f$ and the fact that the integral was well-defined. These two conditions are sufficient to establish the analyticity of $F$ and $f$, since the latter is the derivative of the former. The following theorem, due to Morera, states this fact and is the converse of the Cauchy-Goursat theorem.
9.5.7. Theorem. (Morera's theorem) Let a function $f : \mathbb{C} \to \mathbb{C}$ be continuous in a simply connected region $S$. If for each simple closed contour $C$ in $S$ we have $\oint_C f(\xi)\, d\xi = 0$, then $f$ is analytic throughout $S$.
9.6 Taylor and Laurent Series
The expansion of functions in terms of polynomials or monomials is important
in calculus and was emphasized in the analysis of Chapter 5. We now apply this
concept to analytic functions.
9.6.1 Properties of Series
The reader is assumed to have some familiarity with complex series. Nevertheless,
we state (without proof) the most important properties of complex series before
discussing Taylor and Laurent series.
A complex series is said to converge absolutely if the real series $\sum_{k=0}^{\infty} |z_k| = \sum_{k=0}^{\infty} \sqrt{x_k^2 + y_k^2}$ converges. Clearly, absolute convergence implies convergence.
9.6.1. Proposition. If the power series $\sum_{k=0}^{\infty} a_k (z - z_0)^k$ converges for $z_1 \neq z_0$, then it converges absolutely for every value of $z$ such that $|z - z_0| < |z_1 - z_0|$. Similarly, if the power series $\sum_{k=0}^{\infty} b_k/(z - z_0)^k$ converges for $z_2 \neq z_0$, then it converges absolutely for every value of $z$ such that $|z - z_0| > |z_2 - z_0|$.
A geometric interpretation of this proposition is that if a power series with positive powers converges for a point at a distance $r_1$ from $z_0$, then it converges for all interior points of the circle whose center is $z_0$ and whose radius is $r_1$. Similarly, if a power series with negative powers converges for a point at a distance $r_2$ from $z_0$, then it converges for all exterior points of the circle whose center is $z_0$ and whose radius is $r_2$ (see Figure 9.9). Generally speaking, positive powers are used for points inside a circle and negative powers for points outside it.
The largest circle about $z_0$ such that the first power series of Proposition 9.6.1 converges is called the circle of convergence of the power series. The proposition implies that the series cannot converge at any point outside the circle of convergence. (Why?)
In determining the convergence of a power series
$$S(z) \equiv \sum_{n=0}^{\infty} a_n (z - z_0)^n, \tag{9.13}$$
Figure 9.9 (a) Power series with positive exponents converge for the interior points of a circle. (b) Power series with negative exponents converge for the exterior points of a circle.
we look at the behavior of the sequence of partial sums
$$S_N(z) \equiv \sum_{n=0}^{N} a_n (z - z_0)^n.$$
Convergence of (9.13) implies that for any $\varepsilon > 0$, there exists an integer $N_\varepsilon$ such that
$$|S(z) - S_N(z)| < \varepsilon \quad \text{whenever } N > N_\varepsilon.$$
In general, the integer $N_\varepsilon$ may depend on $z$; that is, for different values of $z$, we may be forced to pick different $N_\varepsilon$'s. When $N_\varepsilon$ is independent of $z$, we say that the convergence is uniform.
9.6.2. Theorem. The power series $S(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n$ is uniformly convergent for all points within its circle of convergence and represents a function that is analytic there.
By substituting the reciprocal of $(z - z_0)$ in the power series, we can show that if $\sum_{k=0}^{\infty} b_k/(z - z_0)^k$ is convergent in the annulus $r_2 < |z - z_0| < r_1$, then it is uniformly convergent for all $z$ in that annulus.
9.6.3. Theorem. A convergent power series can be differentiated and integrated term by term; that is, if $S(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n$, then
$$\frac{dS(z)}{dz} = \sum_{n=1}^{\infty} n a_n (z - z_0)^{n-1}, \qquad \int_\gamma S(z)\, dz = \sum_{n=0}^{\infty} a_n \int_\gamma (z - z_0)^n\, dz \tag{9.14}$$
for any path $\gamma$ lying in the circle of convergence of the power series.
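A quick numerical spot-check of the differentiation claim (our illustrative choice, not from the text: the geometric series $\sum z^n = 1/(1-z)$, whose term-by-term derivative should sum to $1/(1-z)^2$ inside $|z| < 1$):

```python
import numpy as np

z = 0.3 + 0.2j                     # a point inside the circle of convergence |z| < 1
n = np.arange(1, 60)
# term-by-term derivative of sum z^n is sum n z^(n-1), which should equal 1/(1-z)^2
termwise = np.sum(n * z**(n - 1))
print(abs(termwise - 1.0 / (1.0 - z)**2))   # essentially zero
```

Sixty terms suffice here because $|z| \approx 0.36$, so the truncated tail is negligible.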
9.6.2 Taylor and Laurent Series
We now state and prove the two main theorems of this section. A Taylor series consists of terms with only positive powers. A Laurent series allows for negative powers as well.
9.6.4. Theorem. (Taylor series) Let $f$ be analytic throughout the interior of a circle $C_0$ having radius $r_0$ and centered at $z_0$. Then at each point $z$ inside $C_0$,
$$f(z) = f(z_0) + f'(z_0)(z - z_0) + \cdots = \sum_{n=0}^{\infty} \frac{f^{(n)}(z_0)}{n!} (z - z_0)^n.$$
That is, the power series converges to $f(z)$ when $|z - z_0| < r_0$.
Proof. From the CIF and the fact that $z$ is inside $C_0$, we have
$$f(z) = \frac{1}{2\pi i} \oint_{C_0} \frac{f(\xi)}{\xi - z}\, d\xi.$$
On the other hand,
$$\frac{1}{\xi - z} = \frac{1}{\xi - z_0 + z_0 - z} = \frac{1}{(\xi - z_0)\left(1 - \dfrac{z - z_0}{\xi - z_0}\right)} = \frac{1}{\xi - z_0} \sum_{n=0}^{\infty} \left( \frac{z - z_0}{\xi - z_0} \right)^n.$$
The last equality follows from the fact that $|(z - z_0)/(\xi - z_0)| < 1$ (because $z$ is inside the circle $C_0$ and $\xi$ is on it) and from the sum of a geometric series. Substituting in the CIF and using Theorem 9.5.2, we obtain the result. $\square$
For $z_0 = 0$ we obtain the Maclaurin series:
$$f(z) = f(0) + f'(0) z + \cdots = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} z^n.$$
The Taylor expansion requires analyticity of the function at all points interior to the circle $C_0$. On many occasions there may be a point inside $C_0$ at which the function is not analytic. The Laurent series accommodates such cases.
9.6.5. Theorem. (Laurent series) Let $C_1$ and $C_2$ be circles of radii $r_1$ and $r_2$, both centered at $z_0$ in the $z$-plane with $r_1 > r_2$. Let $f : \mathbb{C} \to \mathbb{C}$ be analytic on $C_1$ and $C_2$ and throughout $S$, the annular region between the two circles. Then, at each point $z \in S$, $f(z)$ is given by
$$f(z) = \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n \quad \text{where} \quad a_n = \frac{1}{2\pi i} \oint_C \frac{f(\xi)}{(\xi - z_0)^{n+1}}\, d\xi \tag{9.15}$$
and $C$ is any contour within $S$ that encircles $z_0$.
Figure 9.10 The annular region within and on whose contour the expanded function is analytic.
Proof. Let $\gamma$ be a small closed contour in $S$ enclosing $z$, as shown in Figure 9.10. For the composite contour $C'$ the Cauchy-Goursat theorem gives
$$0 = \oint_{C'} \frac{f(\xi)}{\xi - z}\, d\xi = \oint_{C_1} \frac{f(\xi)}{\xi - z}\, d\xi - \oint_{C_2} \frac{f(\xi)}{\xi - z}\, d\xi - \oint_{\gamma} \frac{f(\xi)}{\xi - z}\, d\xi,$$
where the $\gamma$ and $C_2$ integrations are negative because their interior lies to our right as we traverse them. The $\gamma$ integral is simply $2\pi i f(z)$ by the CIF. Thus, we obtain
$$2\pi i f(z) = \oint_{C_1} \frac{f(\xi)}{\xi - z}\, d\xi - \oint_{C_2} \frac{f(\xi)}{\xi - z}\, d\xi.$$
Now we use the same trick we used in deriving the Taylor expansion. Since $z$ is located in the annular region, $r_2 < |z - z_0| < r_1$. We have to keep this in mind when expanding the fractions. In particular, for $\xi \in C_1$ we want the $\xi$ term in the denominator, and for $\xi \in C_2$ we want it in the numerator. Substituting such expansions in Equation (9.15) yields
$$2\pi i f(z) = \sum_{n=0}^{\infty} (z - z_0)^n \oint_{C_1} \frac{f(\xi)\, d\xi}{(\xi - z_0)^{n+1}} + \sum_{n=0}^{\infty} \frac{1}{(z - z_0)^{n+1}} \oint_{C_2} f(\xi)(\xi - z_0)^n\, d\xi. \tag{9.16}$$
Now we consider an arbitrary contour $C$ in $S$ that encircles $z_0$. Figure 9.11 shows a region bounded by a contour composed of $C_1$ and $C$. In this region $f(\xi)/(\xi - z_0)^{n+1}$ is analytic (because $\xi$ can never equal $z_0$). Thus, the integral over the composite contour must vanish by the Cauchy-Goursat theorem. It follows that the integral over $C_1$ is equal to that over $C$. A similar argument shows that the $C_2$ integral can also be replaced by an integral over $C$. We let $n + 1 = -m$ in the second sum of Equation (9.16) to transform it into
$$\sum_{m=-1}^{-\infty} \frac{1}{(z - z_0)^{-m}} \oint_C f(\xi)(\xi - z_0)^{-m-1}\, d\xi = \sum_{m=-\infty}^{-1} (z - z_0)^m \oint_C \frac{f(\xi)\, d\xi}{(\xi - z_0)^{m+1}}.$$
Changing the dummy index back to $n$ and substituting the result in Equation (9.16) yields
$$2\pi i f(z) = \sum_{n=0}^{\infty} (z - z_0)^n \oint_C \frac{f(\xi)}{(\xi - z_0)^{n+1}}\, d\xi + \sum_{n=-\infty}^{-1} (z - z_0)^n \oint_C \frac{f(\xi)}{(\xi - z_0)^{n+1}}\, d\xi.$$
We can now combine the sums and divide both sides by $2\pi i$ to get the desired expansion. $\square$
The Laurent expansion is convergent as long as $r_2 < |z - z_0| < r_1$. In particular, if $r_2 = 0$, and if the function is analytic throughout the interior of the larger circle, then $a_n$ will be zero for $n = -1, -2, \ldots$ because $f(\xi)/(\xi - z_0)^{n+1}$ will be analytic for negative $n$, and the integral will be zero by the Cauchy-Goursat theorem. Thus, only positive powers of $(z - z_0)$ will be present in the series, and we will recover the Taylor series, as we should.
It is clear that we can expand $C_1$ and shrink $C_2$ until we encounter a point at which $f$ is no longer analytic. This is obvious from the construction of the proof, in which only the analyticity in the annular region is important, not its size. Thus, we can include all the possible analytic points by expanding $C_1$ and shrinking $C_2$.
9.6.6. Example. Let us expand some functions in terms of series. For an entire function there is no point in the entire complex plane at which it is not analytic. Thus, only positive powers of $(z - z_0)$ will be present, and we will have a Taylor expansion that is valid for all values of $z$.
(a) Let us expand $e^z$ around $z_0 = 0$. The $n$th derivative of $e^z$ is $e^z$. Thus, $f^{(n)}(0) = 1$, and the Taylor (Maclaurin) expansion gives
$$e^z = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} z^n = \sum_{n=0}^{\infty} \frac{z^n}{n!}.$$
(b) The Maclaurin series for $\sin z$ is obtained by noting that
$$\left. \frac{d^n}{dz^n} \sin z \right|_{z=0} = \begin{cases} 0 & \text{if } n \text{ is even}, \\ (-1)^{(n-1)/2} & \text{if } n \text{ is odd}, \end{cases}$$
and substituting this in the Maclaurin expansion:
$$\sin z = \sum_{n\ \text{odd}} (-1)^{(n-1)/2} \frac{z^n}{n!} = \sum_{k=0}^{\infty} (-1)^k \frac{z^{2k+1}}{(2k+1)!}.$$
Figure 9.11 The arbitrary contour in the annular region used in the Laurent expansion.
Similarly, we can obtain
$$\cos z = \sum_{k=0}^{\infty} (-1)^k \frac{z^{2k}}{(2k)!}, \qquad \sinh z = \sum_{k=0}^{\infty} \frac{z^{2k+1}}{(2k+1)!}, \qquad \cosh z = \sum_{k=0}^{\infty} \frac{z^{2k}}{(2k)!}.$$
(c) The function $1/(1 + z)$ is not entire, so the region of its convergence is limited. Let us find the Maclaurin expansion of this function. The function is analytic within all circles of radii $r < 1$. At $r = 1$ we encounter a singularity, the point $z = -1$. Thus, the series converges for all points$^{11}$ $z$ for which $|z| < 1$. For such points we have
$$f^{(n)}(0) = \left. \frac{d^n}{dz^n} \left[ (1 + z)^{-1} \right] \right|_{z=0} = (-1)^n n!.$$
Thus,
$$\frac{1}{1 + z} = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} z^n = \sum_{n=0}^{\infty} (-1)^n z^n. \qquad \blacksquare$$
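The three Maclaurin expansions of this example can be generated symbolically; the sketch below (an illustration added alongside the text) uses sympy's `series`:

```python
import sympy as sp

z = sp.symbols('z')
# Maclaurin series of parts (a)-(c), truncated after a few terms
print(sp.series(sp.exp(z), z, 0, 4))   # 1 + z + z**2/2 + z**3/6 + O(z**4)
print(sp.series(sp.sin(z), z, 0, 6))   # z - z**3/6 + z**5/120 + O(z**6)
print(sp.series(1/(1 + z), z, 0, 4))   # 1 - z + z**2 - z**3 + O(z**4)
```

The printed coefficients match $1/n!$, $(-1)^k/(2k+1)!$, and $(-1)^n$, respectively.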
Taylor and Laurent series allow us to express an analytic function as a power series. For a Taylor series of $f(z)$, the expansion is routine because the coefficient of its $n$th term is simply $f^{(n)}(z_0)/n!$, where $z_0$ is the center of the circle of convergence. When a Laurent series is applicable, however, the $n$th coefficient is not, in general, easy to evaluate. Usually it can be found by inspection and certain manipulations of other known series. But if we use such an intuitive approach
$^{11}$As remarked before, the series diverges for all points outside the circle $|z| = 1$. This does not mean that the function cannot be represented by a series for points outside the circle. On the contrary, we shall see shortly that Laurent series, with negative powers of $z - z_0$, are designed precisely for such a purpose.
to determine the coefficients, can we be sure that we have obtained the correct Laurent series? The following theorem answers this question.
9.6.7. Theorem. If the series $\sum_{n=-\infty}^{\infty} a_n (z - z_0)^n$ converges to $f(z)$ at all points in some annular region about $z_0$, then it is the unique Laurent series expansion of $f(z)$ in that region.
Proof. Multiply both sides of $f(z) = \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n$ by
$$\frac{1}{2\pi i (z - z_0)^{k+1}},$$
integrate the result along a contour $C$ in the annular region, and use the easily verifiable fact that
$$\frac{1}{2\pi i} \oint_C \frac{dz}{(z - z_0)^{k-n+1}} = \delta_{kn}$$
to obtain
$$\frac{1}{2\pi i} \oint_C \frac{f(z)}{(z - z_0)^{k+1}}\, dz = a_k.$$
Thus, the coefficient in the power series of $f$ is precisely the coefficient in the Laurent series, and the two must be identical. $\square$
We will look at some examples that illustrate the abstract ideas developed in the preceding collection of theorems and propositions. However, we can consider a much broader range of examples if we know the arithmetic of power series. The following theorem giving arithmetical manipulations with power series is not difficult to prove (see [Chur 74]).
9.6.8. Theorem. Let the two power series $f(z) = \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n$ and $g(z) = \sum_{n=-\infty}^{\infty} b_n (z - z_0)^n$ be convergent within some annular region $r_2 < |z - z_0| < r_1$. Then the sum $\sum_{n=-\infty}^{\infty} (a_n + b_n)(z - z_0)^n$ converges to $f(z) + g(z)$, and the product
$$\sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} a_n b_m (z - z_0)^{m+n} \equiv \sum_{k=-\infty}^{\infty} c_k (z - z_0)^k$$
converges to $f(z) g(z)$ for $z$ interior to the annular region. Furthermore, if $g(z) \neq 0$ in some neighborhood of $z_0$, then the series obtained by long division of $\sum_{n=-\infty}^{\infty} a_n (z - z_0)^n$ by $\sum_{m=-\infty}^{\infty} b_m (z - z_0)^m$ converges to $f(z)/g(z)$ in that neighborhood.
This theorem, in essence, says that converging power series can be manipulated as though they were finite sums (polynomials). Such manipulations are extremely useful when dealing with Taylor and Laurent expansions in which the straightforward calculation of coefficients may be tedious. The following examples illustrate the power of infinite-series arithmetic.
9.6.9. Example. To expand the function $f(z) = \dfrac{2 + 3z}{z^2 + z^3}$ in a Laurent series about $z = 0$, rewrite it as
$$f(z) = \frac{1}{z^2} \left( \frac{2 + 3z}{1 + z} \right) = \frac{1}{z^2} \left( 3 - \frac{1}{1 + z} \right) = \frac{1}{z^2} \left( 3 - \sum_{n=0}^{\infty} (-1)^n z^n \right)$$
$$= \frac{1}{z^2} \left( 3 - 1 + z - z^2 + z^3 - \cdots \right) = \frac{2}{z^2} + \frac{1}{z} - 1 + z - z^2 + \cdots.$$
This series converges for $0 < |z| < 1$. We note that negative powers of $z$ are also present.$^{12}$ Using the notation of Theorem 9.6.5, we have $a_{-2} = 2$, $a_{-1} = 1$, $a_n = 0$ for $n \le -3$, and $a_n = (-1)^{n+1}$ for $n \ge 0$. $\blacksquare$
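sympy's `series` also produces Laurent expansions about a pole directly, so the coefficients just found can be checked mechanically (an added illustration):

```python
import sympy as sp

z = sp.symbols('z')
f = (2 + 3*z) / (z**2 + z**3)
expansion = sp.series(f, z, 0, 3).removeO()
# Read off the Laurent coefficients a_{-2}, a_{-1}, a_0 about z = 0
print(expansion.coeff(z, -2), expansion.coeff(z, -1), expansion.coeff(z, 0))
```

The printed coefficients reproduce $a_{-2} = 2$, $a_{-1} = 1$, $a_0 = -1$ from the example.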
9.6.10. Example. The function $f(z) = 1/(4z - z^2)$ is the ratio of two entire functions. Therefore, by Theorem 9.6.8, it is analytic everywhere except at the zeros of its denominator, $z = 0$ and $z = 4$. For the annular region (here $r_2$ of Theorem 9.6.5 is zero) $0 < |z| < 4$, we expand $f(z)$ in the Laurent series around $z = 0$. Instead of actually calculating $a_n$, we first note that
$$f(z) = \frac{1}{4z} \left( \frac{1}{1 - z/4} \right).$$
The second factor can be expanded in a geometric series because $|z/4| < 1$:
$$\frac{1}{1 - z/4} = \sum_{n=0}^{\infty} \left( \frac{z}{4} \right)^n = \sum_{n=0}^{\infty} 4^{-n} z^n.$$
Dividing this by $4z$, and noting that $z = 0$ is the only zero of $4z$ and is excluded from the annular region, we obtain the expansion
$$f(z) = \sum_{n=0}^{\infty} 4^{-n-1} z^{n-1}.$$
Although we derived this series using manipulations of other series, the uniqueness of series representations assures us that this is the Laurent series for the indicated region.
How can we represent $f(z)$ in the region for which $|z| > 4$? This region is exterior to the circle $|z| = 4$, so we expect negative powers of $z$. To find the Laurent expansion we write
$$f(z) = -\frac{1}{z^2} \left( \frac{1}{1 - 4/z} \right)$$
and note that $|4/z| < 1$ for points exterior to the larger circle. The second factor can be written as a geometric series:
$$\frac{1}{1 - 4/z} = \sum_{n=0}^{\infty} \left( \frac{4}{z} \right)^n = \sum_{n=0}^{\infty} 4^n z^{-n}.$$
Dividing by $-z^2$, which is nonzero in the region exterior to the larger circle, yields
$$f(z) = -\sum_{n=0}^{\infty} 4^n z^{-n-2}. \qquad \blacksquare$$
$^{12}$This is a reflection of the fact that the function is not analytic inside the entire circle $|z| = 1$; it blows up at $z = 0$.
9.6.11. Example. The function $f(z) = z/[(z - 1)(z - 2)]$ has a Taylor expansion around the origin for $|z| < 1$. To find this expansion, we write$^{13}$
$$f(z) = \frac{-1}{z - 1} + \frac{2}{z - 2} = \frac{1}{1 - z} - \frac{1}{1 - z/2}.$$
Expanding both fractions in geometric series (both $|z|$ and $|z/2|$ are less than 1), we obtain $f(z) = \sum_{n=0}^{\infty} z^n - \sum_{n=0}^{\infty} (z/2)^n$. Adding the two series, using Theorem 9.6.8, yields
$$f(z) = \sum_{n=0}^{\infty} (1 - 2^{-n}) z^n \quad \text{for } |z| < 1.$$
This is the unique Taylor expansion of $f(z)$ within the circle $|z| = 1$.
For $1 < |z| < 2$ we have a Laurent series. To obtain this series, write
$$f(z) = \frac{1/z}{1/z - 1} - \frac{1}{1 - z/2} = -\frac{1}{z} \left( \frac{1}{1 - 1/z} \right) - \frac{1}{1 - z/2}.$$
Since both fractions on the RHS converge in the annular region ($|1/z| < 1$, $|z/2| < 1$), we get
$$f(z) = -\frac{1}{z} \sum_{n=0}^{\infty} \left( \frac{1}{z} \right)^n - \sum_{n=0}^{\infty} \frac{z^n}{2^n} = -\sum_{n=0}^{\infty} z^{-n-1} - \sum_{n=0}^{\infty} 2^{-n} z^n = \sum_{n=-\infty}^{\infty} a_n z^n,$$
where $a_n = -1$ for $n < 0$ and $a_n = -2^{-n}$ for $n \ge 0$. This is the unique Laurent expansion of $f(z)$ in the given region.
Finally, for $|z| > 2$ we have only negative powers of $z$. We obtain the expansion in this region by rewriting $f(z)$ as follows:
$$f(z) = -\frac{1/z}{1 - 1/z} + \frac{2/z}{1 - 2/z}.$$
Expanding the fractions yields
$$f(z) = -\sum_{n=0}^{\infty} z^{-n-1} + \sum_{n=0}^{\infty} 2^{n+1} z^{-n-1} = \sum_{n=0}^{\infty} (2^{n+1} - 1) z^{-n-1}.$$
This is again the unique expansion of $f(z)$ in the region $|z| > 2$. $\blacksquare$
$^{13}$We could, of course, evaluate the derivatives of all orders of the function at $z = 0$ and use Maclaurin's formula. However, the present method gives the same result much more quickly.
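Each of the three expansions of the preceding example can be verified numerically at a sample point of its region; the sketch below (added here, with arbitrarily chosen sample points) sums the series with numpy:

```python
import numpy as np

f = lambda z: z / ((z - 1) * (z - 2))
n = np.arange(0, 80)

z = 0.4 + 0.2j                                  # |z| < 1: Taylor series
print(abs(np.sum((1 - 2.0**-n) * z**n) - f(z)))

z = 1.5j                                        # 1 < |z| < 2: Laurent series
print(abs(-np.sum(z**(-n - 1.0)) - np.sum(2.0**-n * z**n) - f(z)))

z = 3.0 + 1.0j                                  # |z| > 2: only negative powers
print(abs(np.sum((2.0**(n + 1) - 1) * z**(-n - 1.0)) - f(z)))
```

All three printed differences are negligible, confirming that each series converges to $f$ in its own region (and only there).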
9.6.12. Example. Define $f(z)$ as
$$f(z) = \begin{cases} (1 - \cos z)/z^2 & \text{for } z \neq 0, \\ \tfrac{1}{2} & \text{for } z = 0. \end{cases}$$
We can show that $f(z)$ is an entire function.
Since $1 - \cos z$ and $z^2$ are entire functions, their ratio is analytic everywhere except at the zeros of its denominator. The only such zero is $z = 0$. Thus, Theorem 9.6.8 implies that $f(z)$ is analytic everywhere except possibly at $z = 0$. To see the behavior of $f(z)$ at $z = 0$, we look at its Maclaurin series:
$$1 - \cos z = 1 - \sum_{n=0}^{\infty} (-1)^n \frac{z^{2n}}{(2n)!},$$
which implies that
$$\frac{1 - \cos z}{z^2} = \frac{1}{2} - \frac{z^2}{4!} + \frac{z^4}{6!} - \cdots.$$
The expansion on the RHS shows that the value of the series at $z = 0$ is $\tfrac{1}{2}$, which, by definition, is $f(0)$. Thus, the series converges for all $z$, and Theorem 9.6.2 says that $f(z)$ is entire. $\blacksquare$
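The series just described can be generated directly with sympy (an added illustration):

```python
import sympy as sp

z = sp.symbols('z')
s = sp.series((1 - sp.cos(z)) / z**2, z, 0, 6)
print(s)                       # 1/2 - z**2/24 + z**4/720 + O(z**6)
print(s.removeO().subs(z, 0))  # 1/2, matching the defined value f(0)
```

Note that $1/4! = 1/24$ and $1/6! = 1/720$, so the symbolic output agrees term by term with the expansion above.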
A Laurent series can give information about the integral of a function around a closed contour in whose interior the function may not be analytic. In fact, the coefficient of the first negative power in a Laurent series is given by
$$a_{-1} = \frac{1}{2\pi i} \oint_C f(\xi)\, d\xi. \tag{9.17}$$
Thus, to find the integral of a (nonanalytic) function around a closed contour surrounding $z_0$, we write the Laurent series for the function and read off the coefficient of the $1/(z - z_0)$ term.
9.6.13. Example. As an illustration of this idea, let us evaluate the integral $I = \oint_C dz/[z^2(z - 2)]$, where $C$ is the circle of radius 1 centered at the origin. The function is analytic in the annular region $0 < |z| < 2$. We can therefore expand it as a Laurent series about $z = 0$ in that region:
$$\frac{1}{z^2(z - 2)} = -\frac{1}{2z^2} \left( \frac{1}{1 - z/2} \right) = -\frac{1}{2z^2} \sum_{n=0}^{\infty} \left( \frac{z}{2} \right)^n = -\frac{1}{2z^2} - \frac{1}{4z} - \frac{1}{8} - \cdots.$$
Thus, $a_{-1} = -\tfrac{1}{4}$, and $\oint_C dz/[z^2(z - 2)] = 2\pi i\, a_{-1} = -i\pi/2$. A direct evaluation of the integral is nontrivial. In fact, we will see later that to find certain integrals, it is advantageous to cast them in the form of a contour integral and use either Equation (9.17) or a related equation. $\blacksquare$
Let $f : \mathbb{C} \to \mathbb{C}$ be analytic at $z_0$. Then by definition, there exists a neighborhood of $z_0$ in which $f$ is analytic. In particular, we can find a circle $|z - z_0| = r > 0$ in whose interior $f$ has a Taylor expansion.
9.6.14. Definition. Let
$$f(z) = \sum_{n=0}^{\infty} \frac{f^{(n)}(z_0)}{n!} (z - z_0)^n \equiv \sum_{n=0}^{\infty} a_n (z - z_0)^n.$$
Then $f$ is said to have a zero of order $k$ at $z_0$ if $f^{(n)}(z_0) = 0$ for $n = 0, 1, \ldots, k - 1$ but $f^{(k)}(z_0) \neq 0$.
In that case $f(z) = (z - z_0)^k \sum_{n=0}^{\infty} a_{k+n} (z - z_0)^n$, where $a_k \neq 0$ and $|z - z_0| < r$. We define $g(z)$ as
$$g(z) = \sum_{n=0}^{\infty} a_{k+n} (z - z_0)^n \quad \text{where } |z - z_0| < r$$
and note that $g(z_0) = a_k \neq 0$. Convergence of the series on the RHS implies that $g(z)$ is continuous at $z_0$. Consequently, for each $\varepsilon > 0$, there exists $\delta$ such that $|g(z) - a_k| < \varepsilon$ whenever $|z - z_0| < \delta$. If we choose $\varepsilon = |a_k|/2$, then, for some $\delta_0 > 0$, $|g(z) - a_k| < |a_k|/2$ whenever $|z - z_0| < \delta_0$. Thus, as long as $z$ is inside the circle $|z - z_0| < \delta_0$, $g(z)$ cannot vanish (because if it did, the first inequality would imply that $|a_k| < |a_k|/2$). We therefore have the following result.
9.6.15. Theorem. Let $f : \mathbb{C} \to \mathbb{C}$ be analytic at $z_0$ and $f(z_0) = 0$. Then there exists a neighborhood of $z_0$ throughout which $f$ has no other zeros unless $f$ is identically zero there. Thus, the zeros of an analytic function are isolated.
When $k = 1$, we say that $z_0$ is a simple zero of $f$. To find the order of the zero of a function at a point, we differentiate the function, evaluate the derivative at that point, and continue the process until we obtain a nonzero value for the derivative.
9.6.16. Example. (a) The zeros of $\cos z$, which are $z = (2k + 1)\pi/2$, are all simple, because
$$\left. \frac{d}{dz} \cos z \right|_{z=(2k+1)\pi/2} = -\sin\left[ (2k + 1)\frac{\pi}{2} \right] \neq 0.$$
(b) To find the order of the zero of $f(z) = e^z - 1 - z - z^2/2$ at $z = 0$, we differentiate $f(z)$ and evaluate $f'(0)$:
$$f'(0) = (e^z - 1 - z)_{z=0} = 0.$$
Differentiating again gives $f''(0) = (e^z - 1)_{z=0} = 0$. Differentiating once more yields $f'''(0) = (e^z)_{z=0} = 1$. Thus, the zero is of order 3. $\blacksquare$
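The differentiate-until-nonzero procedure is easy to automate; a sympy sketch (added illustration) applied to part (b):

```python
import sympy as sp

z = sp.symbols('z')
f = sp.exp(z) - 1 - z - z**2/2

# Differentiate until the derivative no longer vanishes at z = 0;
# the number of differentiations is the order of the zero
order = 0
g = f
while g.subs(z, 0) == 0:
    g = sp.diff(g, z)
    order += 1
print(order)   # 3
```

Since sympy works with exact rationals here, the test `g.subs(z, 0) == 0` is exact, not a floating-point comparison.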
9.7 Problems
9.1. Show that the function $w = 1/z$ maps the straight line $y = a$ in the $z$-plane onto a circle in the $w$-plane with radius $1/(2|a|)$ and center $(0, -1/(2a))$.
9.2. (a) Using the chain rule, find $\partial f/\partial z^*$ and $\partial f/\partial z$ in terms of partial derivatives with respect to $x$ and $y$.
(b) Evaluate $\partial f/\partial z^*$ and $\partial f/\partial z$ assuming that the Cauchy-Riemann conditions hold.
9.3. Show that when $z$ is represented by polar coordinates, the derivative of a function $f(z)$ can be written as
$$\frac{df}{dz} = e^{-i\theta} \left( \frac{\partial U}{\partial r} + i \frac{\partial V}{\partial r} \right),$$
where $U$ and $V$ are the real and imaginary parts of $f(z)$ written in polar coordinates. What are the C-R conditions in polar coordinates? Hint: Start with the C-R conditions in Cartesian coordinates and apply the chain rule to them using $x = r\cos\theta$ and $y = r\sin\theta$.
9.4. Show that $d/dz(\ln z) = 1/z$. Hint: Find $u(x, y)$ and $v(x, y)$ for $\ln z$ and differentiate them.
9.5. Show that $\sin z$ and $\cos z$ have only real roots.
9.6. Show that
(a) the sum and the product of two entire functions are entire, and
(b) the ratio of two entire functions is analytic everywhere except at the zeros of the denominator.
9.7. Given that $u = 2\lambda \ln[(x^2 + y^2)^{1/2}]$, show that $v = 2\lambda \tan^{-1}(y/x)$, where $u$ and $v$ are the real and imaginary parts of an analytic function $w(z)$.
9.8. If $w(z)$ is any complex potential, show that its (complex) derivative gives the components of the electric field.
9.9. (a) Show that the flux through an element of area $da$ of the lateral surface of a cylinder (with arbitrary cross section) is $d\phi = dz(|E|\, ds)$ where $ds$ is an arc length along the equipotential surface.
(b) Prove that $|E| = |dw/dz| = \partial v/\partial s$ where $v$ is the imaginary part of the complex potential, and $s$ is the parameter describing the length along the equipotential curves.
(c) Combine (a) and (b) to get
$$\text{flux per unit } z\text{-length} = \frac{\phi}{z_2 - z_1} = v(P_2) - v(P_1)$$
for any two points $P_1$ and $P_2$ on the cross-sectional curve of the lateral surface. Conclude that the total flux per unit $z$-length through a cylinder (with arbitrary cross section) is $[v]$, the total change in $v$ as one goes around the curve.
(d) Using Gauss's law, show that the capacitance per unit length for the capacitor consisting of the two conductors with potentials $u_1$ and $u_2$ is
$$c \equiv \frac{\text{charge per unit length}}{\text{potential difference}} = \frac{[v]/4\pi}{|u_2 - u_1|}.$$
9.10. Using Equation (9.7),
(a) find the equipotential curves (curves of constant $u$) and curves of constant $v$ for two line charges of equal magnitude and opposite signs located at $y = a$ and $y = -a$ in the $xy$-plane.
(b) Show that
$$z = a \left( \sin\frac{v}{2\lambda} + i \sinh\frac{u}{2\lambda} \right) \Big/ \left( \cosh\frac{u}{2\lambda} - \cos\frac{v}{2\lambda} \right)$$
by solving Equation (9.7) for $z$ and simplifying.
(c) Show that the equipotential curves are circles in the $xy$-plane of radii $a/\sinh(u/2\lambda)$ with centers at $(0, a\coth(u/2\lambda))$, and that the curves of constant $v$ are circles of radii $a/\sin(v/2\lambda)$ with centers at $(a\cot(v/2\lambda), 0)$.
9.11. In this problem, you will find the capacitance per unit length of two cylindrical conductors of radii $R_1$ and $R_2$, the distance between whose centers is $D$, by looking for two line charge densities $+\lambda$ and $-\lambda$ such that the two cylinders are two of the equipotential surfaces. From Problem 9.10, we have
$$R_i = \frac{a}{\sinh(u_i/2\lambda)}, \qquad y_i = a \coth(u_i/2\lambda), \qquad i = 1, 2,$$
where $y_1$ and $y_2$ are the locations of the centers of the two conductors on the $y$-axis (which we assume to connect the two centers).
(a) Show that $D = |y_1 - y_2| = \left| R_1 \cosh\dfrac{u_1}{2\lambda} - R_2 \cosh\dfrac{u_2}{2\lambda} \right|$.
(b) Square both sides and use $\cosh(a - b) = \cosh a \cosh b - \sinh a \sinh b$ and the expressions for the $R$'s and the $y$'s given above to obtain
$$\cosh\left( \frac{u_1 - u_2}{2\lambda} \right) = \frac{|R_1^2 + R_2^2 - D^2|}{2 R_1 R_2}.$$
(c) Now find the capacitance per unit length. Consider the special case of two concentric cylinders.
(d) Find the capacitance per unit length of a cylinder and a plane, by letting one of the radii, say $R_1$, go to infinity while $h \equiv R_1 - D$ remains fixed.
9.12. Use Equations (9.4) and (9.5) to establish the following identities.
(a) $\operatorname{Re}(\sin z) = \sin x \cosh y$, $\operatorname{Im}(\sin z) = \cos x \sinh y$.
(b) $\operatorname{Re}(\cos z) = \cos x \cosh y$, $\operatorname{Im}(\cos z) = -\sin x \sinh y$.
(c) $\operatorname{Re}(\sinh z) = \sinh x \cos y$, $\operatorname{Im}(\sinh z) = \cosh x \sin y$.
(d) $\operatorname{Re}(\cosh z) = \cosh x \cos y$, $\operatorname{Im}(\cosh z) = \sinh x \sin y$.
(e) $|\sin z|^2 = \sin^2 x + \sinh^2 y$, $|\cos z|^2 = \cos^2 x + \sinh^2 y$.
(f) $|\sinh z|^2 = \sinh^2 x + \sin^2 y$, $|\cosh z|^2 = \sinh^2 x + \cos^2 y$.
9.13. Find all the zeros of $\sinh z$ and $\cosh z$.
9.14. Verify the following hyperbolic identities.
(a) $\cosh^2 z - \sinh^2 z = 1$.
(b) $\cosh(z_1 + z_2) = \cosh z_1 \cosh z_2 + \sinh z_1 \sinh z_2$.
(c) $\sinh(z_1 + z_2) = \sinh z_1 \cosh z_2 + \cosh z_1 \sinh z_2$.
(d) $\cosh 2z = \cosh^2 z + \sinh^2 z$, $\sinh 2z = 2 \sinh z \cosh z$.
(e) $\tanh(z_1 + z_2) = \dfrac{\tanh z_1 + \tanh z_2}{1 + \tanh z_1 \tanh z_2}$.
9.15. Show that
(a) $\tanh\left(\dfrac{z}{2}\right) = \dfrac{\sinh x + i \sin y}{\cosh x + \cos y}$.
(b) $\coth\left(\dfrac{z}{2}\right) = \dfrac{\sinh x - i \sin y}{\cosh x - \cos y}$.
9.16. Find all values of $z$ such that
(a) $e^z = -3$.  (b) $e^z = 1 + i\sqrt{3}$.  (c) $e^{2z-1} = 1$.
9.17. Show that $|e^{-z}| < 1$ if and only if $\operatorname{Re}(z) > 0$.
9.18. Show that both the real and the imaginary parts of an analytic function are harmonic.
9.19. Show that each of the following functions, call each one $u(x, y)$, is harmonic, and find the function's harmonic partner, $v(x, y)$, such that $u(x, y) + iv(x, y)$ is analytic.
(a) $x^3 - 3xy^2$.  (b) $e^x \cos y$.
(d) $e^{-2y} \cos 2x$.  (e) $e^{y^2 - x^2} \cos 2xy$.
(f) $e^x (x \cos y - y \sin y) + 2 \sinh y \sin x + x^3 - 3xy^2 + y$.
9.20. Prove the following identities.
(a) $\cos^{-1} z = -i \ln(z \pm \sqrt{z^2 - 1})$.
(b) $\sin^{-1} z = -i \ln\left[iz \pm \sqrt{1 - z^2}\right]$.
(c) $\tan^{-1} z = \dfrac{1}{2i} \ln\left( \dfrac{i - z}{i + z} \right)$.
(d) $\cosh^{-1} z = \ln(z \pm \sqrt{z^2 - 1})$.
(e) $\sinh^{-1} z = \ln(z \pm \sqrt{z^2 + 1})$.
(f) $\tanh^{-1} z = \dfrac{1}{2} \ln\left( \dfrac{1 + z}{1 - z} \right)$.
9.21. Find the curve defined by each of the following equations.
(a) $z = 1 - it$, $0 \le t \le 2$.
(b) $z = t + it^2$, $-\infty < t < \infty$.
(c) $z = a(\cos t + i \sin t)$, $\dfrac{\pi}{2} \le t \le \dfrac{3\pi}{2}$.
(d) $z = t + \dfrac{i}{t}$, $-\infty < t < 0$.
9.22. Provide the details of the proof of part (a) of Proposition 9.3.1. Prove part (b) by showing that if $f(z) = z' = x' + iy'$ is analytic and $\dfrac{\partial^2 \phi}{\partial x^2} + \dfrac{\partial^2 \phi}{\partial y^2} = 0$, then
$$\frac{\partial^2 \phi}{\partial x'^2} + \frac{\partial^2 \phi}{\partial y'^2} = 0.$$
9.23. Let $f(t) = u(t) + iv(t)$ be a (piecewise) continuous complex-valued function of a real variable $t$ defined in the interval $a \le t \le b$. Show that if $F(t) = U(t) + iV(t)$ is a function such that $dF/dt = f(t)$, then
$$\int_a^b f(t)\, dt = F(b) - F(a).$$
This is the fundamental theorem of calculus for complex variables.
9.24. Find the value of the integral $\int_C [(z + 2)/z]\, dz$, where $C$ is (a) the semicircle $z = 2e^{i\theta}$, for $0 \le \theta \le \pi$; (b) the semicircle $z = 2e^{i\theta}$, for $\pi \le \theta \le 2\pi$; and (c) the circle $z = 2e^{i\theta}$, for $-\pi \le \theta \le \pi$.
9.25. Evaluate the integral $\int_\gamma dz/(z - 1 - i)$ where $\gamma$ is (a) the line joining $z_1 = 2i$ and $z_2 = 3$, and (b) the broken path from $z_1$ to the origin and from there to $z_2$.
9.26. Evaluate the integral $\int_C z^m (z^*)^n\, dz$, where $m$ and $n$ are integers and $C$ is the circle $|z| = 1$ taken counterclockwise.
9.27. Let $C$ be the boundary of the square with vertices at the points $z = 0$, $z = 1$, $z = 1 + i$, and $z = i$ with counterclockwise direction. Evaluate $\oint_C (5z + 2)\, dz$ and
9.28. Let $C_1$ be a simple closed contour. Deform $C_1$ into a new contour $C_2$ in such a way that $C_1$ does not encounter any singularity of an analytic function $f$ in the process. Show that
$$\oint_{C_1} f(z)\, dz = \oint_{C_2} f(z)\, dz.$$
That is, the contour can always be deformed into simpler shapes (such as a circle) and the integral evaluated.
9.29. Use the result of the previous problem to show that
$$\oint_C \frac{dz}{z - 1 - i} = 2\pi i \quad \text{and} \quad \oint_C (z - 1 - i)^{m-1}\, dz = 0 \quad \text{for } m = \pm 1, \pm 2, \ldots$$
when $C$ is the boundary of a square with vertices at $z = 0$, $z = 2$, $z = 2 + 2i$, and $z = 2i$, taken counterclockwise.
9.30. Use Equation (9.12) and the binomial expansion to show that
$$\frac{d}{dz}(z^m) = m z^{m-1}.$$
9.31. Evaluate $\oint_C dz/(z^2 - 1)$ where $C$ is the circle $|z| = 3$ integrated in the positive sense. Hint: Deform $C$ into a contour $C'$ that bypasses the singularities of the integrand.
9.32. Show that when $f$ is analytic within and on a simple closed contour $C$ and $z_0$ is not on $C$, then
$$\oint_C \frac{f'(z)}{z - z_0}\, dz = \oint_C \frac{f(z)}{(z - z_0)^2}\, dz.$$
9.33. Let $C$ be the boundary of a square whose sides lie along the lines $x = \pm 3$ and $y = \pm 3$. For the positive sense of integration, evaluate each of the following integrals.
(a) $\oint_C \dfrac{e^{-z}}{z - i\pi/2}\, dz$.  (b) $\oint_C \dfrac{e^z}{z(z^2 + 10)}\, dz$.  (c) $\oint_C \dfrac{\cos z}{(z - \pi/4)(z^2 - 10)}\, dz$.
(d) $\oint_C \dfrac{\sinh z}{z^4}\, dz$.  (e) $\oint_C \dfrac{\cosh z}{z^4}\, dz$.  (f) $\oint_C \dfrac{\cos z}{z^3}\, dz$.
(g) $\oint_C \dfrac{\cos z}{(z - i\pi/2)^2}\, dz$.  (h) $\oint_C \dfrac{e^z}{(z - i\pi)^2}\, dz$.  (i) $\oint_C \dfrac{\cos z}{z + i\pi}\, dz$.
(j) $\oint_C \dfrac{e^z}{z^2 - 5z + 4}\, dz$.  (k) $\oint_C \dfrac{\sinh z}{(z - i\pi/2)^2}\, dz$.  (l) $\oint_C \dfrac{\cosh z}{(z - i\pi/2)^2}\, dz$.
(m) $\oint_C \dfrac{\tan z}{(z - a)^2}\, dz$ for $-3 < a < 3$.  (n) $\oint_C \dfrac{z^2}{(z - 2)(z^2 - 10)}\, dz$.
9.34. Let $C$ be the circle $|z - i| = 3$ integrated in the positive sense. Find the value of each of the following integrals.
(a) $\oint_C \dfrac{e^z}{z^2 + \pi^2}\, dz$.  (b) $\oint_C \dfrac{\sinh z}{(z^2 + \pi^2)^2}\, dz$.  (c) $\oint_C \dfrac{dz}{z^2 + 9}$.
(d) $\oint_C \dfrac{dz}{(z^2 + 9)^2}$.  (e) $\oint_C \dfrac{\cosh z}{(z^2 + \pi^2)^3}\, dz$.  (f) $\oint_C \dfrac{z^2 - 3z + 4}{z^2 - 4z + 3}\, dz$.
9.35. Show that Legendre polynomials (for $|x| < 1$) can be represented as
$$P_n(x) = \frac{(-1)^n}{2^n (2\pi i)} \oint_C \frac{(1 - z^2)^n}{(z - x)^{n+1}}\, dz,$$
where $C$ is the unit circle around the origin.
9.36. Let $f$ be analytic within and on the circle $\gamma_0$ given by $|z - z_0| = r_0$ and integrated in the positive sense. Show that Cauchy's inequality holds:
$$|f^{(n)}(z_0)| \le \frac{n!\, M}{r_0^n},$$
where $M$ is the maximum value of $|f(z)|$ on $\gamma_0$.
9.37. Expand $\sinh z$ in a Taylor series about the point $z = i\pi$.
9.38. What is the largest circle within which the Maclaurin series for $\tanh z$ converges to $\tanh z$?
9.39. Find the (unique) Laurent expansion of each of the following functions about the origin for its entire region of analyticity.
(a) $\dfrac{1}{(z - 2)(z - 3)}$.  (b) $\dfrac{1}{z^2(1 - z)}$.  (c) $\dfrac{1}{(1 - z)^3}$.
(d) $\dfrac{\sinh z - z}{z^4}$.  (e) $\dfrac{z^2 - 4}{z^2 - 9}$.  (f) $\dfrac{1}{z^2 - 1}$.
(g) $\dfrac{z}{(z^2 + 1)^2}$.  (h) $\dfrac{z}{z - 1}$.
9.40. Show that the following functions are entire.
(a) $f(z) = \begin{cases} \dfrac{e^{2z} - 1}{z} & \text{for } z \neq 0, \\ 2 & \text{for } z = 0. \end{cases}$
(b) $f(z) = \begin{cases} \dfrac{\sin z}{z} & \text{for } z \neq 0, \\ 1 & \text{for } z = 0. \end{cases}$
(c) $f(z) = \begin{cases} \dfrac{\cos z}{z^2 - \pi^2/4} & \text{for } z \neq \pm\pi/2, \\ -1/\pi & \text{for } z = \pm\pi/2. \end{cases}$
9.41. Let $f$ be analytic at $z_0$ and $f(z_0) = f'(z_0) = \cdots = f^{(k)}(z_0) = 0$. Show that the following function is analytic at $z_0$:
$$g(z) = \begin{cases} \dfrac{f(z)}{(z - z_0)^{k+1}} & \text{for } z \neq z_0, \\[1ex] \dfrac{f^{(k+1)}(z_0)}{(k + 1)!} & \text{for } z = z_0. \end{cases}$$
9.42. Obtain the first few nonzero terms of the Laurent series expansion of each of the following functions about the origin. Also find the integral of the function along a small simple closed contour encircling the origin.
(a) $\dfrac{1}{\sin z}$.  (b) $\dfrac{1}{1 - \cos z}$.  (c) $\dfrac{z}{1 - \cosh z}$.  (d) $\dfrac{1}{z - \sin z}$.
(e) $\dfrac{z^4}{6z + z^3 - 6\sinh z}$.  (f) $\dfrac{1}{z^2 \sin z}$.  (g) $\dfrac{1}{e^z - 1}$.
Additional Reading
1. Churchill, R. and Verhey, R. Complex Variables and Applications, 3rd ed., McGraw-Hill, 1974. An introductory text on complex variables with many examples and exercises.
2. Lang, S. Complex Analysis, 2nd ed., Springer-Verlag, 1985. An extremely well-written book by a master expositor. Although the book has a formal tone, the clarity of exposition and the use of many examples make this book very readable.
10
Calculus of Residues
One of the most powerful tools made available by complex analysis is the theory of residues, which makes possible the routine evaluation of certain definite integrals that are impossible to calculate otherwise. The derivation, application, and analysis of this tool constitute the main focus of this chapter. In the preceding chapter we saw examples in which integrals were related to expansion coefficients of Laurent series. Here we will develop a systematic way of evaluating both real and complex integrals.
10.1 Residues
Recall that a singular point $z_0$ of $f : \mathbb{C} \to \mathbb{C}$ is a point at which $f$ fails to be analytic. If in addition, there is some neighborhood of $z_0$ in which $f$ is analytic at every point (except of course at $z_0$ itself), then $z_0$ is called an isolated singularity of $f$. Almost all the singularities we have encountered so far have been isolated singularities. However, we will see later, when discussing multivalued functions, that singularities that are not isolated do exist.
Let $z_0$ be an isolated singularity of $f$. Then there exists an $r > 0$ such that within the "annular" region $0 < |z - z_0| < r$, the function $f$ has the Laurent expansion
$$f(z) = \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n = \sum_{n=0}^{\infty} a_n (z - z_0)^n + \frac{b_1}{z - z_0} + \frac{b_2}{(z - z_0)^2} + \cdots$$
where
$$a_n = \frac{1}{2\pi i} \oint_C \frac{f(\xi)\, d\xi}{(\xi - z_0)^{n+1}} \quad \text{and} \quad b_n = a_{-n}. \tag{10.1}$$
10.1 RESIDUES 271
In particular,
hi = 2
1
. 1. f(~) d~,
7T:Z Yc
where C is any simple closed contour around zo. traversed in the positive sense,
on and interior to which f is analytic except at the point zo itself. The complex
number hj, which is essentially the integral of f(z) along the contour, is called
residue defined the residue of f at the isolated singular point zo. It is important to note that the
residue is independent of the contour C as long as zo is the only isolated singular
point within C.
Pierre Alphonse Laurent (1813–1854) graduated from the École Polytechnique near the top of his class and became a second lieutenant in the engineering corps. On his return from the war in Algeria, he took part in the effort to improve the port at Le Havre, spending six years there directing various parts of the project. Laurent's superior officers admired the breadth of his practical experience and the good judgment it afforded the young engineer. During this period he wrote his first scientific paper, on the calculus of variations, and submitted it to the French Academy of Sciences for the grand prix in mathematics. Unfortunately, the competition had already closed (although the judges had not yet declared a winner), and Laurent's submission was not successful. However, the paper so impressed Cauchy that he recommended its publication, also without success.

The paper for which Laurent is most well known suffered a similar fate. In it he described a more general form of a theorem earlier proven by Cauchy for the power series expansion of a function. Laurent realized that one could generalize this result to hold in any annular region between two singular or discontinuous points by using both positive and negative powers in the series, thus allowing treatment of regions beyond the first singular or discontinuous point. Again, Cauchy argued for the paper's publication, without success. The passage of time provided a more just reward, however, and the use of Laurent series became a fundamental tool in complex analysis.

Laurent later worked in the theory of light waves and contended with Cauchy over the interpretation of the differential equations the latter had formulated to explain the behavior of light. Little came of his work in this area, however, and Laurent died at the age of forty-two, a captain serving on the committee on fortifications in Paris. His widow pressed to have two more of his papers read to the Academy, only one of which was published.
We use the notation Res[f(z_0)] to denote the residue of f at the isolated singular point z_0. Equation (10.1) can then be written as
$$\oint_C f(z)\, dz = 2\pi i\, \mathrm{Res}[f(z_0)]. \tag{10.2}$$
What if there are several isolated singular points within the simple closed contour C? The following theorem provides the answer.

10.1.1. Theorem. (the residue theorem) Let C be a positively oriented simple closed contour within and on which a function f is analytic except at a finite number of isolated singular points z_1, z_2, ..., z_m interior to C. Then
$$\oint_C f(z)\, dz = 2\pi i \sum_{k=1}^{m} \mathrm{Res}[f(z_k)].$$

Figure 10.1 Singularities are avoided by going around them.
Proof. Let C_k be the positively traversed circle around z_k. Then Figure 10.1 and the Cauchy-Goursat theorem yield
$$0 = \oint_{C'} f(z)\, dz = -\sum_{\text{circles}}\oint f(z)\, dz + \sum_{\text{parallel lines}}\int f(z)\, dz + \oint_C f(z)\, dz,$$
where C' is the union of all the contours, and the minus sign on the first integral is due to the fact that the interiors of all circles lie to our right as we traverse their boundaries. The contributions of the parallel lines cancel out, and we obtain
$$\oint_C f(z)\, dz = \sum_{k=1}^{m}\oint_{C_k} f(z)\, dz = \sum_{k=1}^{m} 2\pi i\, \mathrm{Res}[f(z_k)],$$
where in the last step the definition of residue at z_k has been used. □
10.1.2. Example. Let us evaluate the integral $\oint_C (2z-3)\,dz/[z(z-1)]$ where C is the circle |z| = 2. There are two isolated singularities in C, z_1 = 0 and z_2 = 1. To find Res[f(z_1)], we expand around the origin:
$$\frac{2z-3}{z(z-1)} = \frac{3}{z} - \frac{1}{z-1} = \frac{3}{z} + \frac{1}{1-z} = \frac{3}{z} + 1 + z + \cdots \quad\text{for } |z| < 1.$$
This gives Res[f(z_1)] = 3. Similarly, expanding around z = 1 gives
$$\frac{2z-3}{z(z-1)} = \frac{3}{z-1+1} - \frac{1}{z-1} = -\frac{1}{z-1} + 3\sum_{n=0}^{\infty}(-1)^n (z-1)^n,$$
which yields Res[f(z_2)] = -1. Thus,
$$\oint_C \frac{2z-3}{z(z-1)}\,dz = 2\pi i\,\{\mathrm{Res}[f(z_1)] + \mathrm{Res}[f(z_2)]\} = 2\pi i\,(3-1) = 4\pi i.$$
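A brute-force numerical check of this example (the helper below is ours, not part of the text): summing the integrand around the circle |z| = 2 should reproduce $4\pi i$.

```python
import cmath

def circle_integral(f, radius, n=4000):
    # Uniform-angle Riemann sum around |z| = radius, counterclockwise;
    # spectrally accurate because the integrand is periodic in theta.
    h = 2 * cmath.pi / n
    total = 0j
    for k in range(n):
        z = radius * cmath.exp(1j * k * h)
        total += f(z) * 1j * radius * cmath.exp(1j * k * h)
    return total * h

# Both poles (z = 0 and z = 1) lie inside |z| = 2; residues 3 and -1
# give 2*pi*i*(3 - 1) = 4*pi*i.
val = circle_integral(lambda z: (2 * z - 3) / (z * (z - 1)), radius=2.0)
```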
10.2 Classification of Isolated Singularities

Let f: ℂ → ℂ have an isolated singularity at z_0. Then there exist a real number r > 0 and an annular region 0 < |z - z_0| < r such that f can be represented by the Laurent series
$$f(z) = \sum_{n=0}^{\infty} a_n (z-z_0)^n + \sum_{n=1}^{\infty}\frac{b_n}{(z-z_0)^n}. \tag{10.3}$$
The second sum in Equation (10.3), involving negative powers of (z - z_0), is called the principal part of f at z_0. We can use the principal part to distinguish three types of isolated singularities. The behavior of the function near the isolated singularity is fundamentally different in each case.

1. If b_n = 0 for all n ≥ 1, z_0 is called a removable singular point of f. In this case, the Laurent series contains only nonnegative powers of (z - z_0), and setting f(z_0) = a_0 makes the function analytic at z_0. For example, the function f(z) = (e^z - 1 - z)/z², which is indeterminate at z = 0, becomes entire if we set f(0) = 1/2, because its Laurent series
$$f(z) = \frac{1}{2!} + \frac{z}{3!} + \frac{z^2}{4!} + \cdots$$
has no negative power.

2. If b_n = 0 for all n > m and b_m ≠ 0, z_0 is called a pole of order m. In this case, the expansion takes the form
$$f(z) = \sum_{n=0}^{\infty} a_n (z-z_0)^n + \frac{b_1}{z-z_0} + \cdots + \frac{b_m}{(z-z_0)^m}$$
for 0 < |z - z_0| < r. In particular, if m = 1, z_0 is called a simple pole.

3. If the principal part of f at z_0 has an infinite number of nonzero terms, the point z_0 is called an essential singularity. A prototype of functions that have essential singularities is
$$e^{1/z} = \sum_{n=0}^{\infty}\frac{1}{n!\, z^n},$$
which has an essential singularity at z = 0 and a residue of 1 there. To see how strange such functions are, we let a be any real number, and consider z = 1/(ln a + 2nπi) for n = 0, ±1, ±2, .... For such a z we have e^{1/z} = e^{ln a + 2nπi} = a e^{2nπi} = a. In particular, as n → ∞, z gets arbitrarily close to the origin. Thus, in an arbitrarily small neighborhood of the origin, there are infinitely many points at which the function exp(1/z) takes on an arbitrary value a. In other words, as z → 0, the function gets arbitrarily close to any real number! This result holds for all functions with essential singularities.
10.2.1. Example. ORDER OF POLES
(a) The function (z² - 3z + 5)/(z - 1) has a Laurent series around z = 1 containing only three terms:
$$\frac{z^2-3z+5}{z-1} = -1 + (z-1) + \frac{3}{z-1}.$$
Thus, it has a simple pole at z = 1, with a residue of 3.
(b) The function sin z/z⁶ has a Laurent series
$$\frac{\sin z}{z^6} = \frac{1}{z^6}\sum_{n=0}^{\infty}(-1)^n\frac{z^{2n+1}}{(2n+1)!} = \frac{1}{z^5} - \frac{1}{6z^3} + \frac{1}{(5!)\,z} - \frac{z}{7!} + \cdots$$
about z = 0. The principal part has three terms. The pole, at z = 0, is of order 5, and the function has a residue of 1/120 at z = 0.
(c) The function (z² - 5z + 6)/(z - 2) has a removable singularity at z = 2, because
$$\frac{z^2-5z+6}{z-2} = \frac{(z-2)(z-3)}{z-2} = z - 3 = -1 + (z-2)$$
and b_n = 0 for all n.
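Part (b) is easy to confirm numerically: by Eq. (10.2), the residue is the small-circle integral divided by $2\pi i$, and for $\sin z/z^6$ it should come out to $1/120$. The helper below is our own sketch.

```python
import cmath

def residue_at_origin(f, radius=1.0, n=4000):
    # Res[f(0)] = (1/(2*pi*i)) * (closed contour integral around 0),
    # approximated by a uniform-angle Riemann sum over a circle.
    h = 2 * cmath.pi / n
    total = 0j
    for k in range(n):
        z = radius * cmath.exp(1j * k * h)
        total += f(z) * 1j * radius * cmath.exp(1j * k * h)
    return total * h / (2j * cmath.pi)

# sin z / z^6 has a pole of order 5 at the origin with residue 1/120.
res_b = residue_at_origin(lambda z: cmath.sin(z) / z**6)
```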
10.2.2. Example. SINGULARITIES OF A RATIONAL FUNCTION
In this example we show that a function whose only singularities in the entire complex plane are poles must be a rational function,¹ i.e., the ratio of two polynomials. Let f be such a function, and let {z_j}_{j=1}^n be its poles, with z_j of order m_j. Expand the function about z_1 in a Laurent series
$$f(z) = \frac{b_1}{z-z_1} + \cdots + \frac{b_{m_1}}{(z-z_1)^{m_1}} + \sum_{k=0}^{\infty} a_k (z-z_1)^k \equiv \frac{P_1(z)}{(z-z_1)^{m_1}} + g_1(z),$$
where P_1(z) is a polynomial of degree m_1 - 1 in z and g_1 is analytic at z_1. It should be clear that the remaining poles of f are in g_1. So, expand g_1 about z_2 in a Laurent series. A similar argument as above yields g_1(z) = P_2(z)/(z - z_2)^{m_2} + g_2(z), where P_2(z) is a polynomial of degree m_2 - 1 in z and g_2 is analytic at z_1 and z_2. Continuing in this manner, we get
$$f(z) = \sum_{j=1}^{n}\frac{P_j(z)}{(z-z_j)^{m_j}} + g(z),$$
where g has no poles. Since all poles of f have been isolated in the sum, g must be analytic everywhere in ℂ, i.e., an entire function. Now substitute 1/t for z, take the limit t → 0, and note that, since the degree of P_j is m_j - 1, all the terms in the preceding equation go to zero except possibly g(1/t). Moreover,
$$\lim_{t\to 0} g(1/t) \neq \infty,$$
because, by assumption, the point at infinity is not a pole of f. Thus, g is a bounded entire function. By Proposition 9.5.5, g must be a constant. Taking a common denominator for all the terms yields a ratio of two polynomials.

¹We assume that the point at infinity is not a pole of the function, i.e., that f(1/z) does not have a pole at the origin.
The type of isolated singularity that is most important in applications is the second type, poles. For a function that has a pole of order m at z_0, the calculation of residues is routine. Such a calculation, in turn, enables us to evaluate many integrals effortlessly. How do we calculate the residue of a function f having a pole of order m at z_0?

It is clear that if f has a pole of order m, then g: ℂ → ℂ defined by g(z) ≡ (z - z_0)^m f(z) is analytic at z_0. Thus, for any simple closed contour C that contains z_0 but no other singular point of f, we have
$$\mathrm{Res}[f(z_0)] = \frac{1}{2\pi i}\oint_C f(z)\,dz = \frac{1}{2\pi i}\oint_C \frac{g(z)\,dz}{(z-z_0)^m} = \frac{g^{(m-1)}(z_0)}{(m-1)!},$$
where the last equality follows from the Cauchy integral formula for derivatives.
In terms of f this yields²
$$\mathrm{Res}[f(z_0)] = \frac{1}{(m-1)!}\lim_{z\to z_0}\frac{d^{m-1}}{dz^{m-1}}\left[(z-z_0)^m f(z)\right]. \tag{10.4}$$
For the special, but important, case of a simple pole, we obtain
$$\mathrm{Res}[f(z_0)] = \lim_{z\to z_0}\left[(z-z_0)\, f(z)\right]. \tag{10.5}$$
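The two routes to the residue, the small-circle integral and the derivative formula (10.4), can be compared numerically. The sketch below (our own, with the derivative taken by finite differences rather than symbolically) uses $f(z) = e^z/(z-1)^2$, a pole of order 2 at $z_0 = 1$ whose residue is $e$.

```python
import cmath
import math

def residue_contour(f, z0, radius=0.5, n=4000):
    # Residue as (1/(2*pi*i)) times a small-circle integral around z0.
    h = 2 * cmath.pi / n
    total = 0j
    for k in range(n):
        z = z0 + radius * cmath.exp(1j * k * h)
        total += f(z) * 1j * radius * cmath.exp(1j * k * h)
    return total * h / (2j * cmath.pi)

# f(z) = e^z/(z-1)^2 has a pole of order m = 2 at z0 = 1; Res = d/dz e^z at 1 = e.
res_contour = residue_contour(lambda z: cmath.exp(z) / (z - 1)**2, 1.0 + 0j)

# Equation (10.4) with m = 2: differentiate g(z) = (z-1)^2 f(z) = e^z once;
# here we use a central finite difference as a stand-in for the derivative.
dh = 1e-5
res_formula = (cmath.exp(1 + dh) - cmath.exp(1 - dh)) / (2 * dh)
```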
10.3 Evaluation of Definite Integrals
The most widespread application of residues occurs in the evaluation of real definite integrals. It is possible to "complexify" certain real definite integrals and relate them to contour integrations in the complex plane. We will discuss this method shortly; however, we first need a lemma.

10.3.1. Lemma. (Jordan's lemma) Let C_R be a semicircle of radius R in the upper half of the complex plane (UHP), centered at the origin. Let f be a function that tends uniformly to zero faster than 1/|z| for arg(z) ∈ [0, π] as |z| → ∞. Let α be a nonnegative real number. Then
$$I_R \equiv \int_{C_R} e^{i\alpha z} f(z)\, dz \to 0 \quad\text{as } R \to \infty.$$

²The limit is taken because in many cases the mere substitution of z_0 may result in an indeterminate form.
Proof. For z ∈ C_R we write z = Re^{iθ}, dz = iRe^{iθ} dθ, and
$$i\alpha z = i\alpha(R\cos\theta + iR\sin\theta) = i\alpha R\cos\theta - \alpha R\sin\theta,$$
and substitute in the absolute value of the integral to show that
$$|I_R| \le \int_0^{\pi} e^{-\alpha R\sin\theta}\, R\,|f(Re^{i\theta})|\, d\theta.$$
By assumption, R|f(Re^{iθ})| < ε(R) independent of θ, where ε(R) is an arbitrary positive number that tends to zero as R → ∞. By breaking up the interval of integration into two equal pieces and changing θ to π - θ in the second integral, one can show that
$$|I_R| < 2\epsilon(R)\int_0^{\pi/2} e^{-\alpha R\sin\theta}\, d\theta.$$
Furthermore, sin θ ≥ 2θ/π for 0 ≤ θ ≤ π/2 (see Figure 10.2 for a "proof"). Thus,
$$|I_R| < 2\epsilon(R)\int_0^{\pi/2} e^{-(2\alpha R/\pi)\theta}\, d\theta = \frac{\pi\epsilon(R)}{\alpha R}\left(1 - e^{-\alpha R}\right),$$
which goes to zero as R gets larger and larger. □

Note that Jordan's lemma applies for α = 0 as well, because (1 - e^{-αR}) → αR as α → 0. If α < 0, the lemma is still valid if the semicircle C_R is taken in the lower half of the complex plane (LHP) and f(z) goes to zero uniformly for π ≤ arg(z) ≤ 2π.

We are now in a position to apply the residue theorem to the evaluation of definite integrals. The three types of integrals most commonly encountered are discussed separately below. In all cases we assume that Jordan's lemma holds.
10.3.1 Integrals of Rational Functions

The first type of integral we can evaluate using the residue theorem is of the form
$$I_1 = \int_{-\infty}^{\infty}\frac{p(x)}{q(x)}\,dx,$$
where p(x) and q(x) are real polynomials, and q(x) ≠ 0 for any real x. We can then write
$$I_1 = \lim_{R\to\infty}\int_{-R}^{R}\frac{p(x)}{q(x)}\,dx = \lim_{R\to\infty}\int_{C_x}\frac{p(z)}{q(z)}\,dz,$$
where C_x is the (open) contour lying on the real axis from -R to +R. Assuming that Jordan's lemma holds, we can close that contour by adding to it the semicircle
Figure 10.2 The "proof" of sin θ ≥ 2θ/π for 0 ≤ θ ≤ π/2. The line is the graph of y = 2θ/π; the curve is that of y = sin θ.
of radius R [see Figure 10.3(a)]. This will not affect the value of the integral, because in the limit R → ∞, the contribution of the integral over the semicircle tends to zero. We close the contour in the UHP if q(z) has at least one zero there. We then get
$$I_1 = \lim_{R\to\infty}\oint_C \frac{p(z)}{q(z)}\,dz = 2\pi i\sum_{j=1}^{m}\mathrm{Res}\left[\frac{p(z_j)}{q(z_j)}\right],$$
where C is the closed contour composed of the interval (-R, R) and the semicircle C_R, and {z_j}_{j=1}^m are the zeros of q(z) in the UHP. We may instead close the contour in the LHP,³ in which case
$$I_1 = -2\pi i\sum_{j=1}^{m}\mathrm{Res}\left[\frac{p(z_j)}{q(z_j)}\right],$$
where {z_j}_{j=1}^m are the zeros of q(z) in the LHP. The minus sign indicates that in the LHP we (are forced to) integrate clockwise.
10.3.2. Example. Let us evaluate the integral $I = \int_0^{\infty} x^2\,dx/[(x^2+1)(x^2+9)]$. Since the integrand is even, we can extend the interval of integration to all real numbers (and divide the result by 2). It is shown below that Jordan's lemma holds. Therefore, we write the contour integral corresponding to I:
$$2I = \oint_C \frac{z^2\, dz}{(z^2+1)(z^2+9)},$$
where C is as shown in Figure 10.3(a). Note that the contour is traversed in the positive sense. This is always true for the UHP. The singularities of the function in the UHP are the simple poles i and 3i, corresponding to the simple zeros of the denominator. The residues at these poles are
$$\mathrm{Res}[f(i)] = \lim_{z\to i}\,(z-i)\frac{z^2}{(z-i)(z+i)(z^2+9)} = -\frac{1}{16i},$$
$$\mathrm{Res}[f(3i)] = \lim_{z\to 3i}\,(z-3i)\frac{z^2}{(z^2+1)(z-3i)(z+3i)} = \frac{3}{16i}.$$
Thus, we obtain
$$2I = 2\pi i\left(-\frac{1}{16i} + \frac{3}{16i}\right) = \frac{\pi}{4} \quad\Longrightarrow\quad I = \frac{\pi}{8}.$$
It is instructive to obtain the same result using the LHP. In this case, the contour is as shown in Figure 10.3(b) and is taken clockwise, so we have to introduce a minus sign. The singular points are at z = -i and z = -3i. These are simple poles at which the residues of the function are
$$\mathrm{Res}[f(-i)] = \lim_{z\to -i}\,(z+i)\frac{z^2}{(z-i)(z+i)(z^2+9)} = \frac{1}{16i},$$
$$\mathrm{Res}[f(-3i)] = \lim_{z\to -3i}\,(z+3i)\frac{z^2}{(z^2+1)(z-3i)(z+3i)} = -\frac{3}{16i}.$$
Therefore,
$$2I = -2\pi i\left(\frac{1}{16i} - \frac{3}{16i}\right) = \frac{\pi}{4},$$
as before. To show that Jordan's lemma applies to this integral, we have only to establish that lim_{R→∞} R|f(Re^{iθ})| = 0. In the case at hand, α = 0 because there is no exponential function in the integrand. Thus,
$$R\,|f(Re^{i\theta})| = \frac{R^3}{|(R^2e^{2i\theta}+1)(R^2e^{2i\theta}+9)|} \sim \frac{1}{R},$$
which clearly goes to zero as R → ∞.

³Provided that Jordan's lemma holds there.
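A direct quadrature over the real line provides an independent check of the closed form $I = \pi/8$. The sketch below (our own helper, not from the text) truncates at $T = 200$ and adds the asymptotic tail, since the integrand falls off like $1/x^2$.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3

f = lambda x: x**2 / ((x**2 + 1) * (x**2 + 9))
T = 200.0
# For large x the integrand ~ 1/x^2, so the tail beyond T contributes ~ 1/T.
approx = simpson(f, 0.0, T, 20000) + 1.0 / T
```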
10.3.3. Example. Let us now consider a slightly more complicated integral:
$$\int_{-\infty}^{\infty} \frac{x^2\, dx}{(x^2+1)(x^2+4)^2},$$
which in the UHP turns into $\oint_C z^2\,dz/[(z^2+1)(z^2+4)^2]$. The poles in the UHP are at z = i and z = 2i. The former is a simple pole, and the latter is a pole of order 2.

Figure 10.3 (a) The large semicircle is chosen in the UHP. (b) Note how the direction of contour integration is forced to be clockwise when the semicircle is chosen in the LHP.

Thus, using Equations (10.5) and (10.4), we obtain
$$\mathrm{Res}[f(i)] = \lim_{z\to i}\,(z-i)\frac{z^2}{(z-i)(z+i)(z^2+4)^2} = -\frac{1}{18i},$$
$$\mathrm{Res}[f(2i)] = \frac{1}{(2-1)!}\lim_{z\to 2i}\frac{d}{dz}\left[(z-2i)^2\frac{z^2}{(z^2+1)(z+2i)^2(z-2i)^2}\right] = \lim_{z\to 2i}\frac{d}{dz}\left[\frac{z^2}{(z^2+1)(z+2i)^2}\right] = \frac{5}{72i},$$
and
$$\int_{-\infty}^{\infty}\frac{x^2\,dx}{(x^2+1)(x^2+4)^2} = 2\pi i\left(-\frac{1}{18i}+\frac{5}{72i}\right) = \frac{\pi}{36}.$$
Closing the contour in the LHP would yield the same result.
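The value $\pi/36$ can likewise be confirmed by direct quadrature; the integrand decays like $1/x^4$, so truncating at $x = 100$ leaves a tail below $10^{-6}$. The helper below is our own sketch.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3

f = lambda x: x**2 / ((x**2 + 1) * (x**2 + 4)**2)
# The integrand is even, so the full-line integral is twice the half-line one.
approx = 2 * simpson(f, 0.0, 100.0, 10000)
```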
10.3.2 Products of Rational and Trigonometric Functions

The second type of integral we can evaluate using the residue theorem is of the form
$$\int_{-\infty}^{\infty}\frac{p(x)}{q(x)}\cos ax\, dx \quad\text{or}\quad \int_{-\infty}^{\infty}\frac{p(x)}{q(x)}\sin ax\, dx,$$
where a is a real number, p(x) and q(x) are real polynomials in x, and q(x) has no real zeros. These integrals are the real and imaginary parts of
$$I_2 = \int_{-\infty}^{\infty}\frac{p(x)}{q(x)}\,e^{iax}\, dx.$$
The presence of e^{iax} dictates the choice of the half-plane: If a ≥ 0, we choose the UHP; otherwise, we choose the LHP. We must, of course, have enough powers of x in the denominator to render R|p(Re^{iθ})/q(Re^{iθ})| uniformly convergent to zero.
10.3.4. Example. Let us evaluate $\int_{-\infty}^{\infty}[\cos ax/(x^2+1)^2]\,dx$ where a ≠ 0. This integral is the real part of the integral $I_2 = \int_{-\infty}^{\infty} e^{iax}\,dx/(x^2+1)^2$. When a > 0, we close in the UHP as advised by Jordan's lemma. Then we proceed as for integrals of rational functions. Thus, we have
$$I_2 = \oint_C \frac{e^{iaz}}{(z^2+1)^2}\,dz = 2\pi i\,\mathrm{Res}[f(i)] \quad\text{for } a > 0,$$
because there is only one pole (of order 2) in the UHP, at z = i. We next calculate the residue:
$$\mathrm{Res}[f(i)] = \lim_{z\to i}\frac{d}{dz}\left[(z-i)^2\frac{e^{iaz}}{(z-i)^2(z+i)^2}\right] = \lim_{z\to i}\frac{d}{dz}\left[\frac{e^{iaz}}{(z+i)^2}\right] = \lim_{z\to i}\frac{(z+i)\,ia\,e^{iaz} - 2e^{iaz}}{(z+i)^3} = \frac{e^{-a}}{4i}(1+a).$$
Substituting this in the expression for I_2, we obtain I_2 = (π/2)e^{-a}(1+a) for a > 0.

When a < 0, we have to close the contour in the LHP, where the pole of order 2 is at z = -i and the contour is taken clockwise. Thus, we get
$$I_2 = \oint_C \frac{e^{iaz}}{(z^2+1)^2}\,dz = -2\pi i\,\mathrm{Res}[f(-i)] \quad\text{for } a < 0.$$
For the residue we obtain
$$\mathrm{Res}[f(-i)] = \lim_{z\to -i}\frac{d}{dz}\left[(z+i)^2\frac{e^{iaz}}{(z-i)^2(z+i)^2}\right] = -\frac{e^{a}}{4i}(1-a),$$
and the expression for I_2 becomes I_2 = (π/2)e^{a}(1-a) for a < 0. We can combine the two results and write
$$\int_{-\infty}^{\infty}\frac{\cos ax}{(x^2+1)^2}\,dx = \mathrm{Re}(I_2) = I_2 = \frac{\pi}{2}\,(1+|a|)\,e^{-|a|}.$$
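As a numerical sanity check of the combined formula (our own sketch; the cutoff at 50 is safe because the integrand decays like $1/x^4$):

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3

a = 2.0
f = lambda x: math.cos(a * x) / (x**2 + 1)**2
approx = 2 * simpson(f, 0.0, 50.0, 20000)  # even integrand
target = (math.pi / 2) * (1 + abs(a)) * math.exp(-abs(a))
```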
10.3.5. Example. As another example, let us evaluate $\int_{-\infty}^{\infty}[x\sin ax/(x^4+4)]\,dx$ where a ≠ 0. This is the imaginary part of the integral $I_2 = \int_{-\infty}^{\infty} x e^{iax}\,dx/(x^4+4)$, which, in terms of z and for the closed contour in the UHP (when a > 0), becomes
$$I_2 = \oint_C \frac{z\,e^{iaz}}{z^4+4}\,dz = 2\pi i\sum_{j=1}^{m}\mathrm{Res}[f(z_j)] \quad\text{for } a > 0. \tag{10.6}$$
The singularities are determined by the zeros of the denominator: z⁴ + 4 = 0, or z = 1 ± i, -1 ± i. Of these four simple poles only two, 1 + i and -1 + i, are in the UHP. We now calculate the residues:
$$\mathrm{Res}[f(1+i)] = \lim_{z\to 1+i}(z-1-i)\frac{z\,e^{iaz}}{(z-1-i)(z-1+i)(z+1-i)(z+1+i)} = \frac{(1+i)\,e^{ia(1+i)}}{(2i)(2)(2+2i)} = \frac{e^{ia}e^{-a}}{8i},$$
$$\mathrm{Res}[f(-1+i)] = \lim_{z\to -1+i}(z+1-i)\frac{z\,e^{iaz}}{(z+1-i)(z+1+i)(z-1-i)(z-1+i)} = \frac{(-1+i)\,e^{ia(-1+i)}}{(2i)(-2)(-2+2i)} = -\frac{e^{-ia}e^{-a}}{8i}.$$
Substituting in Equation (10.6), we obtain
$$I_2 = 2\pi i\,\frac{e^{-a}}{8i}\left(e^{ia} - e^{-ia}\right) = \frac{i\pi}{2}\,e^{-a}\sin a.$$
Thus,
$$\int_{-\infty}^{\infty}\frac{x\sin ax}{x^4+4}\,dx = \mathrm{Im}(I_2) = \frac{\pi}{2}\,e^{-a}\sin a \quad\text{for } a > 0. \tag{10.7}$$
For a < 0, we could close the contour in the LHP. But there is an easier way of getting to the answer. We note that -a > 0, and Equation (10.7) yields
$$\int_{-\infty}^{\infty}\frac{x\sin ax}{x^4+4}\,dx = -\int_{-\infty}^{\infty}\frac{x\sin[(-a)x]}{x^4+4}\,dx = -\frac{\pi}{2}\,e^{-(-a)}\sin(-a) = \frac{\pi}{2}\,e^{a}\sin a.$$
We can collect the two cases in
$$\int_{-\infty}^{\infty}\frac{x\sin ax}{x^4+4}\,dx = \frac{\pi}{2}\,e^{-|a|}\sin a.$$
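A quadrature check of this result (our own sketch): the integrand $x\sin ax/(x^4+4)$ is even, and integration by parts shows the tail beyond $x = 100$ is of order $10^{-6}$ thanks to the oscillation.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3

a = 1.0
f = lambda x: x * math.sin(a * x) / (x**4 + 4)
approx = 2 * simpson(f, 0.0, 100.0, 40000)  # even integrand
target = (math.pi / 2) * math.exp(-abs(a)) * math.sin(a)
```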
10.3.3 Functions of Trigonometric Functions

The third type of integral we can evaluate using the residue theorem involves only trigonometric functions and is of the form
$$\int_0^{2\pi} F(\sin\theta, \cos\theta)\,d\theta,$$
where F is some (typically rational) function of its arguments. Since θ varies from 0 to 2π, we can consider it an argument of a point z on the unit circle centered at the origin. Then z = e^{iθ} and e^{-iθ} = 1/z, and we can substitute cos θ = (z + 1/z)/2, sin θ = (z - 1/z)/(2i), and dθ = dz/(iz) in the original integral, to obtain
$$\oint_C F\left(\frac{z-1/z}{2i},\ \frac{z+1/z}{2}\right)\frac{dz}{iz}.$$
This integral can often be evaluated using the method of residues.
10.3.6. Example. Let us evaluate the integral $\int_0^{2\pi} d\theta/(1 + a\cos\theta)$ where |a| < 1. Substituting for cos θ and dθ in terms of z, we obtain
$$\oint_C \frac{dz/iz}{1 + a[(z^2+1)/(2z)]} = \frac{2}{i}\oint_C \frac{dz}{az^2 + 2z + a},$$
where C is the unit circle centered at the origin. The singularities of the integrand are the zeros of its denominator:
$$z_1 = \frac{-1+\sqrt{1-a^2}}{a} \quad\text{and}\quad z_2 = \frac{-1-\sqrt{1-a^2}}{a}.$$
For |a| < 1 it is clear that z_2 will lie outside the unit circle C; therefore, it does not contribute to the integral. But z_1 lies inside, and we obtain
$$\oint_C \frac{dz}{az^2+2z+a} = 2\pi i\,\mathrm{Res}[f(z_1)].$$
The residue of the simple pole at z_1 can be calculated:
$$\mathrm{Res}[f(z_1)] = \lim_{z\to z_1}(z-z_1)\frac{1}{a(z-z_1)(z-z_2)} = \frac{1}{a(z_1-z_2)} = \frac{1}{a}\left(\frac{a}{2\sqrt{1-a^2}}\right) = \frac{1}{2\sqrt{1-a^2}}.$$
It follows that
$$\int_0^{2\pi}\frac{d\theta}{1+a\cos\theta} = \frac{2}{i}\oint_C\frac{dz}{az^2+2z+a} = \frac{2}{i}\,2\pi i\left(\frac{1}{2\sqrt{1-a^2}}\right) = \frac{2\pi}{\sqrt{1-a^2}}.$$
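Integrals of this third type are especially easy to check numerically, because a uniform Riemann sum over a full period of a smooth periodic function converges exponentially fast. A minimal sketch (ours) for $a = 1/2$:

```python
import math

a = 0.5
n = 20000
h = 2 * math.pi / n
# Uniform Riemann sum over one full period; for smooth periodic
# integrands this is spectrally accurate, so n is vastly more than needed.
approx = sum(1.0 / (1 + a * math.cos(k * h)) for k in range(n)) * h
exact = 2 * math.pi / math.sqrt(1 - a * a)
```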
10.3.7. Example. As another example, let us consider the integral
$$I = \int_0^{\pi}\frac{d\theta}{(a+\cos\theta)^2} \quad\text{where } a > 1.$$
Since cos θ is an even function of θ, we may write
$$I = \frac{1}{2}\int_{-\pi}^{\pi}\frac{d\theta}{(a+\cos\theta)^2} \quad\text{where } a > 1.$$
This integration is over a complete cycle around the origin, and we can make the usual substitution:
$$I = \frac{1}{2}\oint_C\frac{dz/iz}{[a+(z^2+1)/(2z)]^2} = \frac{2}{i}\oint_C\frac{z\,dz}{(z^2+2az+1)^2}.$$
The denominator has the roots $z_1 = -a + \sqrt{a^2-1}$ and $z_2 = -a - \sqrt{a^2-1}$, which are both of order 2. The second root is outside the unit circle because a > 1. Also, it is easily verified that for all a > 1, z_1 is inside the unit circle. Since z_1 is a pole of order 2, we have
$$\mathrm{Res}[f(z_1)] = \lim_{z\to z_1}\frac{d}{dz}\left[(z-z_1)^2\frac{z}{(z-z_1)^2(z-z_2)^2}\right] = \lim_{z\to z_1}\frac{d}{dz}\left[\frac{z}{(z-z_2)^2}\right] = -\frac{z_1+z_2}{(z_1-z_2)^3} = \frac{a}{4(a^2-1)^{3/2}}.$$
We thus obtain
$$I = \frac{2}{i}\,2\pi i\,\mathrm{Res}[f(z_1)] = \frac{\pi a}{(a^2-1)^{3/2}}.$$
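A quick numerical check of the closed form $\int_0^{\pi} d\theta/(a+\cos\theta)^2 = \pi a/(a^2-1)^{3/2}$, again exploiting the spectral accuracy of a uniform periodic sum (helper code ours), for $a = 2$:

```python
import math

a = 2.0
n = 20000
h = 2 * math.pi / n
# Sum over the full period [0, 2*pi]; the integral over [0, pi] is half of it
# because the integrand is even in theta.
full = sum(1.0 / (a + math.cos(k * h))**2 for k in range(n)) * h
approx = full / 2
exact = math.pi * a / (a * a - 1)**1.5
```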
10.3.4 Some Other Integrals

The three types of definite integrals discussed above do not exhaust all possible applications of the residue theorem. There are other integrals that do not fit into any of the foregoing three categories but are still manageable. As the next two examples demonstrate, an ingenious choice of contours allows evaluation of other types of integrals.
10.3.8. Example. Let us evaluate the Gaussian integral
$$I = \int_{-\infty}^{\infty} e^{iax-bx^2}\,dx \quad\text{where } a, b \in \mathbb{R},\ b > 0.$$
Completing squares in the exponent, we have
$$I = \int_{-\infty}^{\infty} e^{-b[x-ia/(2b)]^2 - a^2/(4b)}\,dx = e^{-a^2/(4b)}\lim_{R\to\infty}\int_{-R}^{R} e^{-b[x-ia/(2b)]^2}\,dx.$$
If we change the variable of integration to z = x - ia/(2b), we obtain
$$I = e^{-a^2/(4b)}\lim_{R\to\infty}\int_{-R-ia/(2b)}^{R-ia/(2b)} e^{-bz^2}\,dz.$$
Let us now define I_R:
$$I_R \equiv \int_{-R-ia/(2b)}^{R-ia/(2b)} e^{-bz^2}\,dz.$$
This is an integral along a straight line C_1 that is parallel to the x-axis (see Figure 10.4). We close the contour as shown and note that e^{-bz²} is analytic throughout the interior of the closed contour (it is an entire function!). Thus, the contour integral must vanish by the Cauchy-Goursat theorem. So we obtain
$$I_R + \int_{C_3} e^{-bz^2}\,dz + \int_{R}^{-R} e^{-bx^2}\,dx + \int_{C_4} e^{-bz^2}\,dz = 0.$$
Along C_3, z = R + iy and
$$\int_{C_3} e^{-bz^2}\,dz = \int_{-a/(2b)}^{0} e^{-b(R+iy)^2}\,i\,dy = i\,e^{-bR^2}\int_{-a/(2b)}^{0} e^{by^2 - 2ibRy}\,dy,$$
which clearly tends to zero as R → ∞. We get a similar result for the integral along C_4. Therefore, we have
$$I_R = \int_{-R}^{R} e^{-bx^2}\,dx \quad\Longrightarrow\quad \lim_{R\to\infty} I_R = \int_{-\infty}^{\infty} e^{-bx^2}\,dx = \sqrt{\frac{\pi}{b}}.$$
Finally, we get
$$I = \int_{-\infty}^{\infty} e^{iax-bx^2}\,dx = \sqrt{\frac{\pi}{b}}\; e^{-a^2/(4b)}.$$
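The Gaussian result is simple to verify by direct complex-valued quadrature, since the integrand is essentially zero beyond $|x| \approx 10$ for $b = 1$. A minimal sketch (ours):

```python
import cmath
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals;
    works unchanged for complex-valued integrands."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3

aa, bb = 2.0, 1.0
# e^{-x^2} ~ 1e-44 at |x| = 10, so truncating the real line there is harmless.
val = simpson(lambda x: cmath.exp(1j * aa * x - bb * x * x), -10.0, 10.0, 20000)
exact = math.sqrt(math.pi / bb) * math.exp(-aa * aa / (4 * bb))
```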
Figure 10.4 The contour for the evaluation of the Gaussian integral.
10.3.9. Example. Let us evaluate $I = \int_0^{\infty} dx/(x^3+1)$. If the integrand were even, we could extend the lower limit of integration to -∞ and close the contour in the UHP. Since this is not the case, we need to use a different trick. To get a hint as to how to close the contour, we study the singularities of the integrand. These are simply the roots of the denominator: z³ = -1, or z_n = e^{i(2n+1)π/3} with n = 0, 1, 2. These, as well as a contour that has only z_0 as an interior point, are shown in Figure 10.5. We thus have
$$I + \int_{C_R}\frac{dz}{z^3+1} + \int_{C_2}\frac{dz}{z^3+1} = 2\pi i\,\mathrm{Res}[f(z_0)]. \tag{10.8}$$
The C_R integral vanishes, as usual. Along C_2, z = re^{iα} with constant α, so that dz = e^{iα}dr and
$$\int_{C_2}\frac{dz}{z^3+1} = \int_{\infty}^{0}\frac{e^{i\alpha}\,dr}{(re^{i\alpha})^3+1} = -e^{i\alpha}\int_0^{\infty}\frac{dr}{r^3 e^{3i\alpha}+1}.$$
In particular, if we choose 3α = 2π, we obtain
$$\int_{C_2}\frac{dz}{z^3+1} = -e^{i2\pi/3}\int_0^{\infty}\frac{dr}{r^3+1} = -e^{i2\pi/3}\, I.$$
Substituting this in Equation (10.8) gives
$$\left(1 - e^{i2\pi/3}\right) I = 2\pi i\,\mathrm{Res}[f(z_0)] \quad\Longrightarrow\quad I = \frac{2\pi i}{1-e^{i2\pi/3}}\,\mathrm{Res}[f(z_0)].$$
On the other hand,
$$\mathrm{Res}[f(z_0)] = \lim_{z\to z_0}(z-z_0)\frac{1}{(z-z_0)(z-z_1)(z-z_2)} = \frac{1}{(z_0-z_1)(z_0-z_2)} = \frac{1}{(e^{i\pi/3}-e^{i\pi})(e^{i\pi/3}-e^{i5\pi/3})}.$$
These last two equations yield
$$I = \frac{2\pi i}{1-e^{i2\pi/3}}\,\frac{1}{(e^{i\pi/3}-e^{i\pi})(e^{i\pi/3}-e^{i5\pi/3})} = \frac{2\pi}{3\sqrt{3}}.$$
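The value $2\pi/(3\sqrt{3})$ checks out against a straightforward truncated quadrature with an asymptotic tail correction (helper code ours):

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3

f = lambda x: 1.0 / (x**3 + 1)
T = 200.0
# tail: integral from T to infinity of dx/(x^3+1) ~ 1/(2 T^2)
approx = simpson(f, 0.0, T, 20000) + 1.0 / (2 * T * T)
exact = 2 * math.pi / (3 * math.sqrt(3))
```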
Figure 10.5 The contour is chosen so that only one of the poles lies inside.
10.3.5 Principal Value of an Integral
So far we have discussed only integrals of functions that have no singularities on the contour. Let us now investigate the consequences of the presence of singular points on the contour. Consider the integral
$$\int_{-\infty}^{\infty}\frac{f(x)}{x-x_0}\,dx, \tag{10.9}$$
where x_0 is a real number and f is analytic at x_0. To avoid x_0, which causes the integrand to diverge, we bypass it by indenting the contour as shown in Figure 10.6 and denoting the new contour by C_u. The contour C_0 is simply a semicircle of radius ε. For the contour C_u, we have
$$\int_{C_u}\frac{f(z)}{z-x_0}\,dz = \int_{-\infty}^{x_0-\epsilon}\frac{f(x)}{x-x_0}\,dx + \int_{x_0+\epsilon}^{\infty}\frac{f(x)}{x-x_0}\,dx + \int_{C_0}\frac{f(z)}{z-x_0}\,dz.$$
In the limit ε → 0, the sum of the first two terms on the RHS, when it exists, defines the principal value of the integral in Equation (10.9):
$$P\int_{-\infty}^{\infty}\frac{f(x)}{x-x_0}\,dx = \lim_{\epsilon\to 0}\left[\int_{-\infty}^{x_0-\epsilon}\frac{f(x)}{x-x_0}\,dx + \int_{x_0+\epsilon}^{\infty}\frac{f(x)}{x-x_0}\,dx\right].$$
The integral over the semicircle is calculated by noting that z - x_0 = εe^{iθ} and dz = iεe^{iθ}dθ:
$$\int_{C_0}\frac{f(z)}{z-x_0}\,dz = -i\pi f(x_0).$$
Therefore,
$$\int_{C_u}\frac{f(z)}{z-x_0}\,dz = P\int_{-\infty}^{\infty}\frac{f(x)}{x-x_0}\,dx - i\pi f(x_0). \tag{10.10}$$

Figure 10.6 The contour C_u avoids x_0.

On the other hand, if C_0 is taken below the singularity, on a contour C_d, say, we obtain
$$\int_{C_d}\frac{f(z)}{z-x_0}\,dz = P\int_{-\infty}^{\infty}\frac{f(x)}{x-x_0}\,dx + i\pi f(x_0).$$
We see that the contour integral depends on how the singular point x_0 is avoided. However, the principal value, if it exists, is unique. To calculate this principal value we close the contour by adding a large semicircle to it as before, assuming that the contribution from this semicircle goes to zero by Jordan's lemma. The contours C_u and C_d are replaced by a closed contour, and the value of the integral will be given by the residue theorem. We therefore have
$$P\int_{-\infty}^{\infty}\frac{f(x)}{x-x_0}\,dx = \pm i\pi f(x_0) + 2\pi i\sum_{j=1}^{m}\mathrm{Res}\left[\frac{f(z_j)}{z_j-x_0}\right], \tag{10.11}$$
where the plus sign corresponds to placing the infinitesimal semicircle in the UHP, as shown in Figure 10.6, and the minus sign corresponds to the other choice.
10.3.10. Example. Let us use the principal-value method to evaluate the integral
$$I = \int_0^{\infty}\frac{\sin x}{x}\,dx = \frac{1}{2}\int_{-\infty}^{\infty}\frac{\sin x}{x}\,dx.$$
It appears that x = 0 is a singular point of the integrand; in reality, however, it is only a removable singularity, as can be verified by the Taylor expansion of sin x/x. To make use of the principal-value method, we write
$$I = \frac{1}{2}\,\mathrm{Im}\left(\int_{-\infty}^{\infty}\frac{e^{ix}}{x}\,dx\right) = \frac{1}{2}\,\mathrm{Im}\left(P\int_{-\infty}^{\infty}\frac{e^{ix}}{x}\,dx\right).$$
We now use Equation (10.11) with the small circle in the UHP, noting that there are no singularities for e^{iz}/z there. This yields
$$P\int_{-\infty}^{\infty}\frac{e^{ix}}{x}\,dx = i\pi e^{i0} = i\pi.$$
Therefore,
$$\int_0^{\infty}\frac{\sin x}{x}\,dx = \frac{1}{2}\,\mathrm{Im}(i\pi) = \frac{\pi}{2}.$$
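The slowly converging integral $\int_0^\infty \sin x/x\,dx = \pi/2$ can be checked numerically if the cutoff is chosen wisely: integration by parts gives a tail of $\cos T/T + O(1/T^2)$, so cutting off where $\cos T = 0$ leaves an error of only $\sim 1/T^2$. The sketch below is ours.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3

def sinc(x):
    # The singularity at x = 0 is removable: sin x/x -> 1.
    return 1.0 if x == 0 else math.sin(x) / x

# Cut off at T = (200 + 1/2)*pi, where cos T = 0; the neglected tail is ~ 1/T^2.
T = 200.5 * math.pi
approx = simpson(sinc, 0.0, T, 40000)
```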
Figure 10.7 The equivalent contour obtained by "stretching" C_u, the contour of Figure 10.6.
The principal value of an integral can be written more compactly if we deform the contour C_u by stretching it into that shown in Figure 10.7. For small enough ε, such a deformation will not change the number of singularities within the infinite closed contour. Thus, the LHS of Equation (10.10) will have limits of integration -∞ + iε and +∞ + iε. If we change the variable of integration to ξ = z - iε, this integral becomes
$$\int_{-\infty}^{\infty}\frac{f(\xi+i\epsilon)}{\xi+i\epsilon-x_0}\,d\xi = \int_{-\infty}^{\infty}\frac{f(\xi)\,d\xi}{\xi-x_0+i\epsilon} = \int_{-\infty}^{\infty}\frac{f(z)\,dz}{z-x_0+i\epsilon}, \tag{10.12}$$
where in the last step we changed the dummy integration variable back to z. Note that since f is assumed to be continuous at all points on the contour, f(ξ + iε) → f(ξ) for small ε. The last integral of Equation (10.12) shows that there is no singularity on the new x-axis; we have pushed the singularity down to x_0 - iε. In other words, we have given the singularity on the x-axis a small negative imaginary part. We can thus rewrite Equation (10.10) as
$$P\int_{-\infty}^{\infty}\frac{f(x)}{x-x_0}\,dx = i\pi f(x_0) + \int_{-\infty}^{\infty}\frac{f(x)\,dx}{x-x_0+i\epsilon},$$
where x is used instead of z in the last integral because we are indeed integrating along the new x-axis, assuming that no other singularities are present in the UHP. A similar argument, this time for the LHP, introduces a minus sign for the first term on the RHS and for the ε term in the denominator. Therefore,
$$P\int_{-\infty}^{\infty}\frac{f(x)}{x-x_0}\,dx = \pm i\pi f(x_0) + \int_{-\infty}^{\infty}\frac{f(x)\,dx}{x-x_0\pm i\epsilon}, \tag{10.13}$$
where the plus (minus) sign refers to the UHP (LHP). This result is sometimes abbreviated as
$$\frac{1}{x-x_0\pm i\epsilon} = P\frac{1}{x-x_0} \mp i\pi\delta(x-x_0), \qquad \epsilon > 0. \tag{10.14}$$
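Equation (10.14) can be illustrated numerically: with a test function $f(x)=e^{-x^2}$ and $x_0 = 0$, the smeared integral $\int f(x)/(x-i\epsilon)\,dx$ should approach $P\cdot 0 + i\pi f(0) = i\pi$ as $\epsilon \to 0$ (the principal value vanishes because $e^{-x^2}/x$ is odd). The sketch below is our own construction.

```python
import cmath
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals;
    works unchanged for complex-valued integrands."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3

def smeared(eps, n):
    # integral of e^{-x^2}/(x - i*eps) over the real line (truncated);
    # by Eq. (10.14) it tends to i*pi as eps -> 0.
    return simpson(lambda x: cmath.exp(-x * x) / (x - 1j * eps), -10.0, 10.0, n)

I1 = smeared(0.05, 20000)
I2 = smeared(0.01, 100000)  # finer grid to resolve the narrower peak
err1 = abs(I1.imag - math.pi)
err2 = abs(I2.imag - math.pi)
```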
10.3.11. Example. Let us use residues to evaluate the function
$$f(k) = \frac{1}{2\pi i}\int_{-\infty}^{\infty}\frac{e^{ikx}\,dx}{x-i\epsilon}.$$
We have to close the contour by adding a large semicircle. Whether we do this in the UHP or the LHP is dictated by the sign of k: If k > 0, we close in the UHP. Thus,
$$f(k) = \frac{1}{2\pi i}\oint_C\frac{e^{ikz}\,dz}{z-i\epsilon} = \mathrm{Res}\left[\frac{e^{ikz}}{z-i\epsilon}\right]_{z=i\epsilon} = \lim_{z\to i\epsilon}\left[(z-i\epsilon)\frac{e^{ikz}}{z-i\epsilon}\right] = e^{-k\epsilon} \xrightarrow[\epsilon\to 0]{} 1.$$
On the other hand, if k < 0, we must close in the LHP, in which the integrand is analytic. Thus, by the Cauchy-Goursat theorem, the integral vanishes. Therefore, we have
$$f(k) = \begin{cases} 1 & \text{if } k > 0, \\ 0 & \text{if } k < 0. \end{cases}$$
This is precisely the definition of the theta function (or step function). Thus, we have obtained an integral representation of that function:
$$\theta(x) = \frac{1}{2\pi i}\int_{-\infty}^{\infty}\frac{e^{ixt}}{t-i\epsilon}\,dt.$$
Now suppose that there are two singular points on the real axis, at x_1 and x_2. Let us avoid x_1 and x_2 by making little semicircles, as before, letting both semicircles be in the UHP (see Figure 10.8).

Figure 10.8 One of the four choices of contours for evaluating the principal value of the integral when there are two poles on the real axis.

Without writing the integrands, we can represent the contour integral by
$$\int_{-\infty}^{x_1-\epsilon} + \int_{C_1} + \int_{x_1+\epsilon}^{x_2-\epsilon} + \int_{C_2} + \int_{x_2+\epsilon}^{\infty} = 2\pi i\sum\mathrm{Res}.$$
The principal value of the integral is naturally defined to be the sum of all integrals having ε in their limits. The contribution from the small semicircle C_1 can be calculated by substituting z - x_1 = εe^{iθ} in the integral:
$$\int_{C_1}\frac{f(z)\,dz}{(z-x_1)(z-x_2)} = \int_{\pi}^{0}\frac{f(x_1+\epsilon e^{i\theta})\,i\epsilon e^{i\theta}\,d\theta}{\epsilon e^{i\theta}(x_1+\epsilon e^{i\theta}-x_2)} \xrightarrow[\epsilon\to 0]{} -i\pi\,\frac{f(x_1)}{x_1-x_2},$$
with a similar result for C_2. Putting everything together, we get
$$P\int_{-\infty}^{\infty}\frac{f(x)\,dx}{(x-x_1)(x-x_2)} - i\pi\,\frac{f(x_2)-f(x_1)}{x_2-x_1} = 2\pi i\sum\mathrm{Res}.$$
If we include the case where both C_1 and C_2 are in the LHP, we get
$$P\int_{-\infty}^{\infty}\frac{f(x)\,dx}{(x-x_1)(x-x_2)} = \pm i\pi\,\frac{f(x_2)-f(x_1)}{x_2-x_1} + 2\pi i\sum\mathrm{Res}, \tag{10.15}$$
where the plus sign is for the case where C_1 and C_2 are in the UHP and the minus sign for the case where both are in the LHP. We can also obtain the result for the case where the two singularities coincide by taking the limit x_1 → x_2. Then the RHS of the last equation becomes a derivative, and we obtain
$$P\int_{-\infty}^{\infty}\frac{f(x)}{(x-x_0)^2}\,dx = \pm i\pi f'(x_0) + 2\pi i\sum\mathrm{Res}.$$
10.3.12. Example. An expression encountered in the study of Green's functions or propagators (which we shall discuss later in the book) is
$$\int_{-\infty}^{\infty}\frac{e^{itx}\,dx}{x^2-k^2},$$
where k and t are real constants. We want to calculate the principal value of this integral. We use Equation (10.15) and note that for t > 0, we need to close the contour in the UHP, where there are no poles:
$$P\int_{-\infty}^{\infty}\frac{e^{itx}\,dx}{x^2-k^2} = P\int_{-\infty}^{\infty}\frac{e^{itx}\,dx}{(x-k)(x+k)} = i\pi\,\frac{e^{ikt}-e^{-ikt}}{2k} = -\pi\,\frac{\sin kt}{k}.$$
When t < 0, we have to close the contour in the LHP, where again there are no poles:
$$P\int_{-\infty}^{\infty}\frac{e^{itx}\,dx}{x^2-k^2} = P\int_{-\infty}^{\infty}\frac{e^{itx}\,dx}{(x-k)(x+k)} = -i\pi\,\frac{e^{ikt}-e^{-ikt}}{2k} = \pi\,\frac{\sin kt}{k}.$$
The two results above can be combined into a single relation:
$$P\int_{-\infty}^{\infty}\frac{e^{itx}\,dx}{x^2-k^2} = -\pi\,\frac{\sin k|t|}{k}.$$
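The final algebraic step, that $i\pi\,(e^{ikt}-e^{-ikt})/(2k)$ collapses to $-\pi\sin(kt)/k$, is easy to confirm with complex arithmetic (a trivial check, ours):

```python
import cmath
import math

k, t = 1.3, 0.7
# i*pi*(e^{ikt} - e^{-ikt})/(2k) = i*pi*(2i sin kt)/(2k) = -pi*sin(kt)/k
pv = 1j * math.pi * (cmath.exp(1j * k * t) - cmath.exp(-1j * k * t)) / (2 * k)
closed_form = -math.pi * math.sin(k * t) / k
```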
290 10. CALCULUS OF RESIOUES
10.4 Problems
(c) J cosz dz:
Yc z(z - n)
(f) J I-~OSZ dz:
Yc z
(") J dz
I Yc Z3(Z +5)'
i:
(I) zdz.
cZ
J 4z-3
(a) Yc z(z _ 2) dz.
i 2+1
(d) z dz:
c z(z - I)
()i
sinh z
g -4- dz.
c z
(j) i tanzdz.:
i dz
(m) -.--
c z2 sin Z'
10.1. Evaluate each of the following integrals, for all of which C is the circle
lel = 3. I
!
i eli
(b) i dz:
c z(z i in)
i cosh z
(e) 2 -I- 2 dz:
c z n
(h) izcos (D dz.
J dz
(k) Yc sinh 2z'
i e'dz
(n) .
C (z - I)(z - 2)
10.2. Leth(z) be analytic and have a simple zero at z = zo. and let g(z) be analytic
there. Let !(z) = g(z)/h(z), and show that
g(zo)
Res[!(zo)] = h'(zo)'
10.3. Find the residne of !(z) = 1/ cos z at each of its poles.
10.4. Evaluate the integral J:'dx/[(x2
+ l)(x2 +4)] by closing the contour (a)
in the UHP and (b) in the LHP.
10.5. Evaluate the following integrals, in which a and b are nonzero real constants.
Figure 10.9 The contour used in Problem 10.8.

10.6. Evaluate each of the following integrals by turning it into a contour integral around a unit circle.

$$\text{(a)}\ \int_0^{2\pi} \frac{d\theta}{5+4\sin\theta} \qquad \text{(b)}\ \int_0^{2\pi} \frac{d\theta}{a+\cos\theta} \quad\text{where } a > 1$$
$$\text{(c)}\ \int_0^{2\pi} \frac{d\theta}{1+\sin^2\theta} \qquad \text{(d)}\ \int_0^{2\pi} \frac{d\theta}{(a+b\cos^2\theta)^2} \quad\text{where } a, b > 0$$
$$\text{(e)}\ \int_0^{2\pi} \frac{\cos^2 3\theta}{5-4\cos 2\theta}\,d\theta \qquad \text{(f)}\ \int_0^{\pi} \frac{d\phi}{1-2a\cos\phi+a^2} \quad\text{where } a \neq \pm 1$$
$$\text{(g)}\ \int_0^{\pi} \frac{\cos^2 3\phi\,d\phi}{1-2a\cos\phi+a^2} \quad\text{where } a \neq \pm 1 \qquad \text{(h)}\ \int_0^{\pi} \frac{\cos 2\phi\,d\phi}{1-2a\cos\phi+a^2} \quad\text{where } a \neq \pm 1$$
$$\text{(i)}\ \int_0^{\pi} \tan(x+ia)\,dx \quad\text{where } a \in \mathbb{R} \qquad \text{(j)}\ \int_0^{2\pi} e^{\cos\phi}\cos(n\phi-\sin\phi)\,d\phi \quad\text{where } n \in \mathbb{Z}$$
10.7. Evaluate the integral $I = \int_{-\infty}^{\infty} e^{ax}\,dx/(1+e^x)$ for $0 < a < 1$. Hint: Choose a closed (long) rectangle that encloses only one of the zeros of the denominator. Show that the contributions of the short sides of the rectangle are zero.
10.8. Derive the integration formula

$$\int_0^{\infty} e^{-x^2}\cos(2bx)\,dx = \frac{\sqrt{\pi}}{2}\,e^{-b^2} \quad\text{where } b \neq 0$$

by integrating the function $e^{-z^2}$ around the rectangular path shown in Figure 10.9.
10.9. Use the result of Example 10.3.11 to show that $\delta'(k) = \delta(k)$.
10.10. Find the principal values of the following integrals.

$$\text{(a)}\ \int_{-\infty}^{\infty} \frac{\sin x\,dx}{(x^2+4)(x-1)} \qquad \text{(b)}\ \int_{-\infty}^{\infty} \frac{\cos ax}{1+x^3}\,dx \quad\text{where } a \geq 0$$
$$\text{(c)}\ \int_{-\infty}^{\infty} \frac{x\cos x}{x^2-5x+6}\,dx \qquad \text{(d)}\ \int_{-\infty}^{\infty} \frac{1-\cos x}{x^2}\,dx$$

10.11. Evaluate the following integrals.

$$\text{(a)}\ \int_0^{\infty} \frac{x^2-b^2}{x^2+b^2}\left(\frac{\sin ax}{x}\right)dx \qquad \text{(b)}\ \int_0^{\infty} \frac{\sin ax}{x(x^2+b^2)}\,dx$$
$$\text{(c)}\ \int_0^{\infty} \frac{\sin ax}{x(x^2+b^2)^2}\,dx \qquad \text{(d)}\ \int_0^{\infty} \frac{\cos 2ax - \cos 2bx}{x^2}\,dx$$
$$\text{(e)}\ \int_0^{\infty} \frac{\sin^2 x}{x^2}\,dx \qquad \text{(f)}\ \int_0^{\infty} \frac{\sin^3 x}{x^3}\,dx$$

Additional Reading
1. Dennery, P. and Krzywicki, A. Mathematics for Physicists, Harper and Row, 1967. Includes a detailed discussion of complex analysis encompassing applications of conformal mappings and the residue theorem to physical problems.
2. Mathews, J. and Walker, R. Mathematical Methods of Physics, 2nd ed.,
Benjamin, 1970. A "practical" guide to mathematical physics, including a
long discussion of "how to evaluate integrals" and the use of the residue
theorem.
11
Complex Analysis: Advanced Topics
The subject of complex analysis is an extremely rich and powerful area of mathematics. We have already seen some of this richness and power in the previous chapter. This chapter concludes our discussion of complex analysis by introducing some other topics with varying degrees of importance.
11.1 Meromorphic Functions
Complex functions that have only simple poles as their singularities are numerous in applications and are called meromorphic functions. In this section, we derive an important result for such functions.

Assume that $f(z)$ has simple poles at $\{z_j\}_{j=1}^N$, where $N$ could be infinity. Then, if $z \neq z_j$ for all $j$, the residue theorem yields¹

$$\frac{1}{2\pi i}\oint_{C_n} \frac{f(\xi)}{\xi - z}\,d\xi = f(z) + \sum_{j=1}^{n} \mathrm{Res}_{\xi=z_j}\!\left(\frac{f(\xi)}{\xi - z}\right),$$

where $C_n$ is a circle containing the first $n$ poles, and it is assumed that the poles are arranged in order of increasing absolute values. Since the poles of $f$ are assumed to be simple, we have

$$\mathrm{Res}_{\xi=z_j}\!\left(\frac{f(\xi)}{\xi - z}\right) = \lim_{\xi\to z_j}(\xi - z_j)\frac{f(\xi)}{\xi - z} = \frac{1}{z_j - z}\lim_{\xi\to z_j}[(\xi - z_j)f(\xi)] = \frac{1}{z_j - z}\,\mathrm{Res}_{\xi=z_j}[f(\xi)] \equiv \frac{r_j}{z_j - z},$$

¹Note that the residue of $f(\xi)/(\xi - z)$ at $\xi = z$ is simply $f(z)$.
where $r_j$ is, by definition, the residue of $f(\xi)$ at $\xi = z_j$. Substituting in the preceding equation gives

$$f(z) = \frac{1}{2\pi i}\oint_{C_n} \frac{f(\xi)}{\xi - z}\,d\xi - \sum_{j=1}^{n} \frac{r_j}{z_j - z}.$$

Taking the difference between this and the same equation evaluated at $z = 0$ (assumed to be none of the poles),² we can write

$$f(z) - f(0) = \frac{z}{2\pi i}\oint_{C_n} \frac{f(\xi)}{\xi(\xi - z)}\,d\xi + \sum_{j=1}^{n} r_j\left(\frac{1}{z - z_j} + \frac{1}{z_j}\right). \tag{11.1}$$

If $|f(\xi)|$ approaches a finite value as $|\xi| \to \infty$, the integral vanishes for an infinite circle (which includes all poles now), and we obtain what is called the Mittag-Leffler expansion of the meromorphic function $f$:

$$f(z) = f(0) + \sum_{j=1}^{\infty} r_j\left(\frac{1}{z - z_j} + \frac{1}{z_j}\right).$$
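For a meromorphic function with finitely many poles that is bounded at infinity, the expansion can be verified term by term. A minimal sketch (the test function $f(z) = 1/((z-1)(z+2))$ is our own choice, not one from the text):

```python
# f has simple poles at z1 = 1 (residue 1/3) and z2 = -2 (residue -1/3), f(0) = -1/2
f = lambda z: 1.0/((z - 1)*(z + 2))
poles = [(1.0, 1/3), (-2.0, -1/3)]    # (z_j, r_j) pairs

z = 2.5 + 0.4j                        # any point away from the poles
ml = f(0) + sum(r*(1.0/(z - zj) + 1.0/zj) for zj, r in poles)
print(abs(ml - f(z)))                 # zero up to rounding
```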
Now we let $g$ be an entire function with simple zeros. We claim that (a) $(dg/dz)/g(z)$ is a meromorphic function that is bounded for all values of $z$, and (b) its residues are all unity. To see this, note that $g$ is of the form³

$$g(z) = (z - z_1)(z - z_2)\cdots(z - z_N)f(z),$$

where $z_1, \ldots, z_N$ are all the zeros of $g$, and $f$ is an analytic function that does not vanish anywhere in the complex plane. It is now easy to see that

$$\frac{g'(z)}{g(z)} = \sum_{j=1}^{N} \frac{1}{z - z_j} + \frac{f'(z)}{f(z)}.$$

This expression has both properties (a) and (b) mentioned above. Furthermore, the last term is an entire function that is bounded for all $\mathbb{C}$. Therefore, it must be a constant by Proposition 9.5.5. This derivation also verifies Equation (11.1), which in the case at hand can be written as

$$\frac{d}{dz}\ln g(z) = \frac{g'(z)}{g(z)} = \left.\frac{d}{dz}\ln g(z)\right|_{z=0} + \sum_{j=1}^{N}\left(\frac{1}{z - z_j} + \frac{1}{z_j}\right),$$

whose solution is readily found to be

$$g(z) = g(0)\,e^{cz}\prod_{j=1}^{N}\left(1 - \frac{z}{z_j}\right)e^{z/z_j}, \quad\text{where}\quad c = \frac{(dg/dz)|_{z=0}}{g(0)}, \tag{11.2}$$

and it is assumed that $z_j \neq 0$ for all $j$.

²This is not a restrictive assumption, because we can always move our coordinate system so that the origin avoids all poles.
³One can "prove" this by factoring the simple zeros one by one, writing $g(z) = (z - z_1)f_1(z)$ and noting that $g(z_2) = 0$, with $z_2 \neq z_1$, implies that $f_1(z) = (z - z_2)f_2(z)$, etc.
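Equation (11.2) can be tested on $g(z) = \cos z$, whose zeros $\pm(k+\tfrac{1}{2})\pi$ come in opposite pairs; since $g(0) = 1$ and $g'(0) = 0$, the constant $c$ vanishes and each pair of factors combines into $1 - z^2/z_j^2$. A numerical sketch (our own illustration):

```python
import numpy as np

z = 0.8 + 0.3j
ks = np.arange(200000)
zeros_sq = ((ks + 0.5)*np.pi)**2       # squared zeros (k + 1/2)^2 pi^2 of cos z
prod = np.prod(1 - z**2/zeros_sq)      # paired factors of Eq. (11.2), with c = 0
print(prod, np.cos(z))                 # agree to high accuracy
```

The exponential convergence factors $e^{z/z_j}$ cancel within each $\pm$ pair, which is why they do not appear in the truncated product.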
Figure 11.1 (a) The angle $\theta_0$ changes by $2\pi$ as $z_0$ makes a complete circuit around $C$. (b) The angle $\theta_0$ returns to its original value when $z_0$ completes the circuit.
11.2 Multivalued Functions
The arbitrariness, up to a multiple of $2\pi$, of the angle $\theta = \arg(z)$ in $z = re^{i\theta}$ leads to functions that can take different values at the same point. Consider, for example, the function $f(z) = \sqrt{z}$. Writing $z$ in polar coordinates, we obtain $f(z) = f(r, \theta) = (re^{i\theta})^{1/2} = \sqrt{r}\,e^{i\theta/2}$. This shows that for the same $z = (r, \theta) = (r, \theta + 2\pi)$, we get two different values, $f(r, \theta)$ and $f(r, \theta + 2\pi) = -f(r, \theta)$.
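The two values can be exhibited directly (a sketch using Python's cmath):

```python
import cmath, math

r, theta = 2.0, 0.5
z = r*cmath.exp(1j*theta)                             # same point as (r, theta + 2*pi)
f1 = math.sqrt(r)*cmath.exp(1j*theta/2)
f2 = math.sqrt(r)*cmath.exp(1j*(theta + 2*math.pi)/2)
print(f1**2 - z, f2**2 - z)   # both are square roots of the same z ...
print(f1 + f2)                # ... yet f2 = -f1
```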
This may be disturbing at first. After all, the definition of a function (mapping) ensures that for any point in the domain a unique image is obtained. Here two different images are obtained for the same $z$. Riemann found a cure for this complex "double vision" by introducing what are now called Riemann sheets. We will discuss these briefly below, but first let us take a closer look at a prototype of multivalued functions. Consider the natural log function, $\ln z$. For $z = re^{i\theta}$ this is defined as $\ln z = \ln r + i\theta = \ln|z| + i\arg(z)$, where $\arg(z)$ is defined only to within a multiple of $2\pi$; that is, $\arg(z) = \theta + 2n\pi$, for $n = 0, \pm 1, \pm 2, \ldots$.

We can see the peculiar nature of the logarithmic function by considering a closed path around the point $z = 0$, as shown in Figure 11.1(a). Starting at $z_0$, we move counterclockwise, noticing the constant increase in the angle $\theta_0$, until we reach the initial point in the $z$-plane. However, the angle is then $\theta_0 + 2\pi$. Thus, the process of moving around the origin has changed the value of the log function by $2\pi i$. Thus, $(\ln z_0)_{\text{final}} - (\ln z_0)_{\text{initial}} = 2\pi i$. Note that in this process $z_0$ does not change, because $z_0 = r_0 e^{i\theta_0} = r_0 e^{i(\theta_0 + 2\pi)}$.
11.2.1. Definition. A branch point of a function $f : \mathbb{C} \to \mathbb{C}$ is a complex number $z_0$ with the property that $f(r_0, \theta_0) \neq f(r_0, \theta_0 + 2\pi)$ for any closed curve $C$ encircling $z_0$. Here $(r_0, \theta_0)$ are the polar coordinates of $z_0$.
Victor-Alexandre Puiseux (1820-1883) was the first to take up the subject of multivalued functions. In 1850 Puiseux published a celebrated paper on complex algebraic functions given by $f(u, z) = 0$, $f$ a polynomial in $u$ and $z$. He first made clear the distinction between poles and branch points that Cauchy had barely perceived, and introduced the notion of an essential singular point, to which Weierstrass independently had called attention. Though Cauchy, in the 1846 paper, did consider the variation of simple multivalued functions along paths that enclosed branch points, Puiseux clarified this subject too.

Puiseux also showed that the development of a function of $z$ about a branch point $z = a$ must involve fractional powers of $z - a$. He then improved on Cauchy's theorem on the expansion of a function in a Maclaurin series. By his significant investigations of many-valued functions and their branch points in the complex plane, and by his initial work on integrals of such functions, Puiseux brought Cauchy's pioneering work in function theory to the end of what might be called the first stage. The difficulties in the theory of multiple-valued functions and integrals of such functions were still to be overcome. Cauchy did write other papers on the integrals of multivalued functions in which he attempted to follow up on Puiseux's work; and though he introduced the notion of branch cuts (lignes d'arret), he was still confused about the distinction between poles and branch points. This subject of algebraic functions and their integrals was to be pursued by Riemann.

Puiseux was a keen mountaineer and was the first to scale the Alpine peak that is now named after him.
Thus, $z = 0$ is a branch point of the logarithmic function. Studying the behavior of $\ln(1/z) = -\ln z$ around $z = 0$ will reveal that the point "at infinity" is also a branch point of $\ln z$. Figure 11.1(b) shows that any other point of the complex plane, such as $z'$, cannot be a branch point because $\theta_0$ does not change when $C'$ is traversed completely.
11.2.1 Riemann Surfaces

The idea of a Riemann surface begins with the removal of all points that lie on the line (or any other curve) joining two branch points. For $\ln z$ this means the removal of all points lying on a curve that starts at $z = 0$ and extends all the way to infinity. Such a curve is called a branch cut, or simply a cut.

Let us concentrate on $\ln z$ and take the cut to be along the negative half of the real axis. Let us also define the functions

$$f_n(z) = f_n(r, \theta) = \ln r + i(\theta + 2n\pi) \quad\text{for } -\pi < \theta < \pi;\ r > 0;\ n = 0, \pm 1, \ldots,$$
so $f_n(z)$ takes on the same values for $-\pi < \theta < \pi$ that $\ln z$ takes in the range $(2n-1)\pi < \theta < (2n+1)\pi$. We have replaced the multivalued logarithmic function by a series of different functions that are analytic in the cut $z$-plane.
This process of cutting the $z$-plane and then defining a sequence of functions eliminates the contradiction caused by the existence of branch points, since we are no longer allowed to completely encircle a branch point. A complete circulation involves crossing the cut, which, in turn, violates the domain of definition of $f_n(z)$.

We have made good progress. We have replaced the (nonanalytic) multivalued function $\ln z$ with a series of analytic (in their domain of definition) functions $f_n(z)$. However, there is a problem left: $f_n(z)$ has a discontinuity at the cut. In fact, just above the cut $f_n(r, \pi - \epsilon) = \ln r + i(\pi - \epsilon + 2n\pi)$ with $\epsilon > 0$, and just below it $f_n(r, -\pi + \epsilon) = \ln r + i(-\pi + \epsilon + 2n\pi)$, so that

$$\lim_{\epsilon\to 0}\,[f_n(r, \pi - \epsilon) - f_n(r, -\pi + \epsilon)] = 2\pi i.$$
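The $2\pi i$ jump is visible in any numerical library's principal branch of the logarithm, which uses exactly this cut along the negative real axis (a small sketch):

```python
import cmath

eps = 1e-12
above = cmath.log(complex(-2.0, +eps))   # just above the cut
below = cmath.log(complex(-2.0, -eps))   # just below the cut
print(above - below)                     # approximately 2*pi*i
```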
To cure this we make the observation that the value of $f_n(z)$ just above the cut is the same as the value of $f_{n+1}(z)$ just below the cut. This suggests the following geometrical construction, due to Riemann: Superpose an infinite series of cut complex planes one on top of the other, each plane corresponding to a different value of $n$. The adjacent planes are connected along the cut such that the upper lip of the cut in the $(n-1)$th plane is connected to the lower lip of the cut in the $n$th plane. All planes contain the two branch points. That is, the branch points appear as "hinges" at which all the planes are joined. With this geometrical construction, if we cross the cut, we end up on a different plane adjacent to the previous one (Figure 11.2).

The geometric surface thus constructed is called a Riemann surface; each plane is called a Riemann sheet and is denoted by $R_j$, for $j = 0, \pm 1, \pm 2, \ldots$. A single-valued function defined on a Riemann sheet is called a branch of the original multivalued function.
We have achieved the following: From a multivalued function we have constructed a sequence of single-valued functions, each defined in a single complex plane; from this sequence of functions we have constructed a single complex function defined on a single Riemann surface. Thus, the logarithmic function is analytic throughout the Riemann surface except at the branch points, which are simply the function's singular points.

It is now easy to see the geometrical significance of branch points. A complete cycle around a branch point takes us to another Riemann sheet, where the function takes on a different form. On the other hand, a complete cycle around an ordinary point either never crosses the cut, or if it does, it will cross it back to the original sheet.

Let us now briefly consider two of the more common multivalued functions and their Riemann surfaces.
Figure 11.2 A few sheets of the Riemann surface of the logarithmic function. The path $C$ encircling the origin $O$ ends up on the lower sheet.

11.2.2. Example. THE FUNCTION $f(z) = z^{1/n}$
The only branch points for the function $f(z) = z^{1/n}$ are $z = 0$ and the point at infinity. Defining $f_k(z) \equiv r^{1/n}\,e^{i(\theta + 2k\pi)/n}$ for $k = 0, 1, \ldots, n-1$ and $0 < \theta < 2\pi$, and following the same procedure as for the logarithmic function, we see that there must be $n$ Riemann sheets, labeled $R_0, R_1, \ldots, R_{n-1}$, in the Riemann surface. The lower edge of $R_{n-1}$ is pasted to the upper edge of $R_0$ along the cut, which is taken to be along the positive real axis. The Riemann surface for $n = 2$ is shown in Figure 11.3.
It is clear that for any noninteger value of $\alpha$ the function $f(z) = z^\alpha$ has a branch point at $z = 0$ and another at the point at infinity. For irrational $\alpha$ the number of Riemann sheets is infinite. ■
11.2.3. Example. THE FUNCTION $f(z) = (z^2 - 1)^{1/2}$
The branch points for the function $f(z) = (z^2 - 1)^{1/2}$ are at $z_1 = +1$ and $z_2 = -1$ (see Figure 11.4). Writing $z - 1 = r_1 e^{i\theta_1}$ and $z + 1 = r_2 e^{i\theta_2}$, we have

$$f(z) = (r_1 e^{i\theta_1})^{1/2}(r_2 e^{i\theta_2})^{1/2} = \sqrt{r_1 r_2}\,e^{i(\theta_1 + \theta_2)/2}.$$

The cut is along the real axis from $z = -1$ to $z = +1$. There are two Riemann sheets in the Riemann surface. Clearly, only cycles of $2\pi$ involving one branch point will cross the cut and therefore end up on a different sheet. Any closed curve that has both $z_1$ and $z_2$ as interior points will remain entirely on the original sheet. ■
The notion of branch cuts can be used to evaluate certain integrals that do not fit into the three categories discussed in Chapter 10. The basic idea is to circumvent the cut by constructing a contour that is infinitesimally close to the cut and circles around branch points.
Figure 11.3 The Riemann surface for $f(z) = z^{1/2}$.

Figure 11.4 The cut for the function $f(z) = (z^2 - 1)^{1/2}$ is from $z_1$ to $z_2$. Paths that circle only one of the points cross the cut and end up on the other sheet.
11.2.4. Example. To evaluate the integral $I = \int_0^\infty x^\alpha\,dx/(x^2+1)$ for $|\alpha| < 1$, consider the complex integral $I' = \oint_C z^\alpha\,dz/(z^2+1)$, where $C$ is as shown in Figure 11.5 and the cut is taken along the positive real axis. To evaluate the contribution from $C_R$ and $C_r$, we let $\rho$ stand for either $r$ or $R$. Then we have

$$I_\rho = \left|\int_{C_\rho} \frac{z^\alpha}{z^2+1}\,dz\right| \leq \frac{\rho^\alpha}{|\rho^2 - 1|}\,2\pi\rho.$$

It is clear that since $|\alpha| < 1$, $I_\rho \to 0$ as $\rho \to 0$ or $\rho \to \infty$.
Figure 11.5 The contour for the evaluation of the integrals of Examples 11.2.4 and 11.2.5.

The contributions from $L_1$ and $L_2$ do not cancel one another because the value of the function changes above and below the cut. To evaluate these two integrals we have to choose a branch of the function. Let us choose that branch on which $z^\alpha = |z|^\alpha e^{i\alpha\theta}$ for $0 < \theta < 2\pi$. Along $L_1$, $\theta \approx 0$ or $z^\alpha = x^\alpha$, and along $L_2$, $\theta \approx 2\pi$ or $z^\alpha = (xe^{2\pi i})^\alpha$. Thus,

$$\oint_C \frac{z^\alpha}{z^2+1}\,dz = \int_0^\infty \frac{x^\alpha}{x^2+1}\,dx + \int_\infty^0 \frac{(xe^{2\pi i})^\alpha}{(xe^{2\pi i})^2 + 1}\,dx = \left(1 - e^{2\pi i\alpha}\right)\int_0^\infty \frac{x^\alpha}{x^2+1}\,dx. \tag{11.3}$$
The LHS of this equation can be obtained using the residue theorem. There are two simple poles, at $z = +i$ and $z = -i$, with residues $\mathrm{Res}[f(i)] = e^{i\alpha\pi/2}/2i$ and $\mathrm{Res}[f(-i)] = -e^{i3\alpha\pi/2}/2i$. Thus,

$$\oint_C \frac{z^\alpha}{z^2+1}\,dz = 2\pi i\left(\frac{e^{i\alpha\pi/2}}{2i} - \frac{e^{i3\alpha\pi/2}}{2i}\right) = \pi\left(e^{i\alpha\pi/2} - e^{i3\alpha\pi/2}\right).$$

Combining this with Equation (11.3), we obtain

$$\int_0^\infty \frac{x^\alpha}{x^2+1}\,dx = \frac{\pi\left(e^{i\alpha\pi/2} - e^{i3\alpha\pi/2}\right)}{1 - e^{2\pi i\alpha}} = \frac{\pi}{2}\sec\frac{\alpha\pi}{2}.$$

If we had chosen a different branch of the function, both the LHS and the RHS of Equation (11.3) would have been different, but the final result would still have been the same. ■
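The closed form is easy to confirm by direct quadrature (a sketch assuming SciPy is available):

```python
import numpy as np
from scipy.integrate import quad

alpha = 0.4   # any value with |alpha| < 1
val, _ = quad(lambda x: x**alpha/(x**2 + 1), 0, np.inf)
print(val, (np.pi/2)/np.cos(alpha*np.pi/2))   # equal within quadrature error
```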
Figure 11.6 The contour for the evaluation of the integral of Example 11.2.6.
11.2.5. Example. Here is another integral involving a branch cut:

$$I = \int_0^\infty \frac{x^{-a}}{x+1}\,dx \quad\text{for } 0 < a < 1.$$

To evaluate this integral we use the zeroth branch of the function and the contour of the previous example (Figure 11.5). Thus, writing $z = \rho e^{i\theta}$, we have

$$2\pi i\,\mathrm{Res}[f(-1)] = \oint_C \frac{z^{-a}}{z+1}\,dz = \int_0^\infty \frac{\rho^{-a}}{\rho+1}\,d\rho + \int_{C_R} \frac{z^{-a}}{z+1}\,dz + \int_\infty^0 \frac{(\rho e^{2i\pi})^{-a}}{\rho e^{2i\pi}+1}\,e^{2i\pi}\,d\rho + \int_{C_r} \frac{z^{-a}}{z+1}\,dz. \tag{11.4}$$

The contributions from both circles vanish by the same argument used in the previous example. On the other hand, $\mathrm{Res}[f(-1)] = (-1)^{-a}$. For the branch we are using, $-1 = e^{i\pi}$. Thus, $\mathrm{Res}[f(-1)] = e^{-ia\pi}$. The RHS of Equation (11.4) yields

$$\int_0^\infty \frac{\rho^{-a}}{\rho+1}\,d\rho - e^{-2i\pi a}\int_0^\infty \frac{\rho^{-a}}{\rho+1}\,d\rho = \left(1 - e^{-2i\pi a}\right)I.$$

It follows from (11.4) that $(1 - e^{-2i\pi a})I = 2\pi i e^{-i\pi a}$, or

$$\int_0^\infty \frac{x^{-a}}{x+1}\,dx = \frac{\pi}{\sin a\pi} \quad\text{for } 0 < a < 1. \qquad\blacksquare$$
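This result, too, can be confirmed numerically (a sketch; the integral is split at $x = 1$ so the integrable singularity at the origin and the algebraic tail are handled separately):

```python
import numpy as np
from scipy.integrate import quad

a = 0.3
v1, _ = quad(lambda x: x**(-a)/(x + 1), 0, 1)       # integrable singularity at 0
v2, _ = quad(lambda x: x**(-a)/(x + 1), 1, np.inf)  # algebraic decay at infinity
val = v1 + v2
print(val, np.pi/np.sin(a*np.pi))
```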
11.2.6. Example. Let us evaluate $I = \int_0^\infty \ln x\,dx/(x^2 + a^2)$ with $a > 0$. We choose the zeroth branch of the logarithmic function, in which $-\pi < \theta < \pi$, and use the contour of Figure 11.6. For $L_1$, $z = \rho e^{i\pi}$ (note that $\rho > 0$), and for $L_2$, $z = \rho$. Thus, we have

$$2\pi i\,\mathrm{Res}[f(ia)] = \oint_C \frac{\ln z}{z^2 + a^2}\,dz = \int_\infty^\epsilon \frac{\ln(\rho e^{i\pi})}{(\rho e^{i\pi})^2 + a^2}\,e^{i\pi}\,d\rho + \int_{C_r} \frac{\ln z}{z^2 + a^2}\,dz + \int_\epsilon^\infty \frac{\ln\rho}{\rho^2 + a^2}\,d\rho + \int_{C_R} \frac{\ln z}{z^2 + a^2}\,dz, \tag{11.5}$$
where $z = ia$ is the only singularity, a simple pole, in the UHP. Now we note that

$$\int_\infty^\epsilon \frac{\ln(\rho e^{i\pi})}{(\rho e^{i\pi})^2 + a^2}\,e^{i\pi}\,d\rho = \int_\epsilon^\infty \frac{\ln\rho + i\pi}{\rho^2 + a^2}\,d\rho = \int_\epsilon^\infty \frac{\ln\rho}{\rho^2 + a^2}\,d\rho + i\pi\int_\epsilon^\infty \frac{d\rho}{\rho^2 + a^2}.$$

The contributions from the circles tend to zero. On the other hand,

$$\mathrm{Res}[f(ia)] = \lim_{z\to ia}(z - ia)\frac{\ln z}{(z - ia)(z + ia)} = \frac{\ln(ia)}{2ia} = \frac{1}{2ia}\left(\ln a + i\frac{\pi}{2}\right).$$

Substituting the last two results in Equation (11.5), we obtain

$$\frac{\pi}{a}\left(\ln a + i\frac{\pi}{2}\right) = 2\int_\epsilon^\infty \frac{\ln\rho}{\rho^2 + a^2}\,d\rho + i\pi\int_\epsilon^\infty \frac{d\rho}{\rho^2 + a^2}.$$

It can also easily be shown that $\int_0^\infty d\rho/(\rho^2 + a^2) = \pi/(2a)$. Thus, in the limit $\epsilon \to 0$, we get $I = \frac{\pi}{2a}\ln a$. The sign of $a$ is irrelevant because it appears as a square in the integral. Thus, we can write

$$\int_0^\infty \frac{\ln x}{x^2 + a^2}\,dx = \frac{\pi}{2|a|}\ln|a|, \qquad a \neq 0. \qquad\blacksquare$$
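A quick numerical sketch confirming the formula (splitting the integral at $x = 1$ to isolate the logarithmic singularity at the origin):

```python
import numpy as np
from scipy.integrate import quad

a = 2.0
v1, _ = quad(lambda x: np.log(x)/(x**2 + a**2), 0, 1)       # log singularity at 0
v2, _ = quad(lambda x: np.log(x)/(x**2 + a**2), 1, np.inf)  # smooth decaying tail
print(v1 + v2, np.pi*np.log(a)/(2*a))
```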
equal on a piece, equal all over
11.3 Analytic Continuation
Analytic functions have certain unique properties, some of which we have already noted. For instance, the Cauchy integral formula gives the value of an analytic function inside a simple closed contour once its value on the contour is known. We have also seen that we can deform the contours of integration as long as we do not encounter any singularities of the function.

Combining these two properties and assuming that $f : \mathbb{C} \to \mathbb{C}$ is analytic within a region $S \subset \mathbb{C}$, we can ask the following question: Is it possible to extend $f$ beyond $S$? We shall see in this section that the answer is yes in many cases of interest.⁴ First consider

11.3.1. Theorem. Let $f_1, f_2 : \mathbb{C} \to \mathbb{C}$ be analytic in a region $S$. If $f_1 = f_2$ in a neighborhood of a point $z \in S$, or for a segment of a curve in $S$, then $f_1 = f_2$ for all $z \in S$.
Proof. Let $g = f_1 - f_2$, and $U = \{z \in S \mid g(z) = 0\}$. Then $U$ is a subset of $S$ that includes the neighborhood of $z$ (or the line segment) in which $f_1 = f_2$. If $U$ is the entire region $S$, we are done. Otherwise, $U$ has a boundary beyond which $g(z) \neq 0$. Since all points within the boundary satisfy $g(z) = 0$, and since $g$ is continuous (more than that, it is analytic) on $S$, $g$ must vanish also on the boundary. But the boundary points are not isolated: Any small circle around any one of them includes points of $U$ as well as points outside $U$. Thus, $g$ must vanish on a neighborhood of any boundary point, implying that $g$ vanishes for some points outside $U$. This contradicts our assumption. Thus, $U$ must include the entire region $S$. □

⁴Provided that $S$ is not discrete (countable). (See [Lang85, p. 91].)
A consequence of this theorem is the following corollary.

11.3.2. Corollary. The behavior of a function that is analytic in a region $S \subset \mathbb{C}$ is completely determined by its behavior in a (small) neighborhood of an arbitrary point in that region.

This process of determining the behavior of an analytic function outside the region in which it was originally defined is called analytic continuation. Although there are infinitely many ways of analytically continuing beyond regions of definition, the values of all functions obtained as a result of diverse continuations are the same at any given point. This follows from Theorem 11.3.1.
Let $f_1, f_2 : \mathbb{C} \to \mathbb{C}$ be analytic in regions $S_1$ and $S_2$, respectively. Suppose that $f_1$ and $f_2$ have different functional forms in their respective regions of analyticity. If there is an overlap between $S_1$ and $S_2$ and if $f_1 = f_2$ within that overlap, then the (unique) analytic continuation of $f_1$ into $S_2$ must be $f_2$, and vice versa. In fact, we may regard $f_1$ and $f_2$ as a single function $f : \mathbb{C} \to \mathbb{C}$ such that

$$f(z) = \begin{cases} f_1(z) & \text{when } z \in S_1, \\ f_2(z) & \text{when } z \in S_2. \end{cases}$$

Clearly, $f$ is analytic for the combined region $S = S_1 \cup S_2$. We then say that $f_1$ and $f_2$ are analytic continuations of one another.
11.3.3. Example. Let us consider the function $f_1(z) = \sum_{n=0}^\infty z^n$, which is analytic for $|z| < 1$. We have seen that it converges to $1/(1-z)$ for $|z| < 1$. Thus, we have $f_1(z) = 1/(1-z)$ when $|z| < 1$, and $f_1$ is not defined for $|z| > 1$.

Now let us consider a second function, $f_2(z) = \sum_{n=0}^\infty \left(\frac{2}{3}\right)^{n+1}\left(z + \frac{1}{2}\right)^n$, which converges for $|z + \frac{1}{2}| < \frac{3}{2}$. To see what it converges to, we note that $f_2(z) = \frac{2}{3}\sum_{n=0}^\infty \left[\frac{2}{3}\left(z + \frac{1}{2}\right)\right]^n$. Thus,

$$f_2(z) = \frac{2/3}{1 - \frac{2}{3}\left(z + \frac{1}{2}\right)} = \frac{1}{1-z} \quad\text{when } \left|z + \tfrac{1}{2}\right| < \tfrac{3}{2}.$$

We observe that although $f_1(z)$ and $f_2(z)$ have different series representations in the two overlapping regions (see Figure 11.7), they represent the same function, $f(z) = 1/(1-z)$. We can therefore write

$$f(z) = \begin{cases} f_1(z) & \text{when } |z| < 1, \\ f_2(z) & \text{when } \left|z + \frac{1}{2}\right| < \frac{3}{2}, \end{cases}$$

and $f_1$ and $f_2$ are analytic continuations of one another. In fact, $f(z) = 1/(1-z)$ is the analytic continuation of both $f_1$ and $f_2$ for all of $\mathbb{C}$ except $z = 1$. Figure 11.7 shows $S_i$, the region of definition of $f_i$, for $i = 1, 2$. ■
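At a point inside both circles of convergence the two series can be summed term by term and compared (a sketch, with $f_2$ taken as the geometric series $\sum (2/3)^{n+1}(z+\tfrac12)^n$ of the example):

```python
z = -0.2 + 0.3j   # inside |z| < 1 and inside |z + 1/2| < 3/2
f1 = sum(z**n for n in range(200))
f2 = sum((2/3)**(n + 1)*(z + 0.5)**n for n in range(200))
print(f1, f2, 1/(1 - z))   # all three agree
```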
11.3.4. Example. The function $f_1(z) = \int_0^\infty e^{-zt}\,dt$ exists only if $\mathrm{Re}(z) > 0$, in which case $f_1(z) = 1/z$. Its region of definition $S_1$ is shown in Figure 11.8 and is simply the right half-plane.
Figure 11.7 The function defined in the smaller circle is continued analytically into the larger circle.

Figure 11.8 The functions $f_1$ and $f_2$ are analytic continuations of each other: $f_1$ analytically continues $f_2$ into the right half-plane, and $f_2$ analytically continues $f_1$ into the semicircle in the left half-plane.
Now we define $f_2$ by a geometric series: $f_2(z) = i\sum_{n=0}^\infty \left[(z+i)/i\right]^n$, where $|z + i| < 1$. This series converges, within its circle of convergence $S_2$, to

$$i\,\frac{1}{1 - (z+i)/i} = \frac{1}{z}.$$

Thus, we have

$$\frac{1}{z} = \begin{cases} f_1(z) & \text{when } z \in S_1, \\ f_2(z) & \text{when } z \in S_2. \end{cases}$$

The two functions are analytic continuations of one another, and $f(z) = 1/z$ is the analytic continuation of both $f_1$ and $f_2$ for all $z \in \mathbb{C}$ except $z = 0$. ■

Figure 11.9 (a) Regions $S_1$ and $S_2$ separated by the boundary $B$ and the contour $C$. (b) The contour $C$ splits up into $C_1$ and $C_2$.
11.3.1 The Schwarz Reflection Principle
A result that is useful in some physical applications is referred to as a dispersion relation. To derive such a relation we need to know the behavior of analytic functions on either side of the real axis. This is found using the Schwarz reflection principle, for which we need the following result.

11.3.5. Proposition. Let $f_i$ be analytic throughout $S_i$, where $i = 1, 2$. Let $B$ be the boundary between $S_1$ and $S_2$ (Figure 11.9) and assume that $f_1$ and $f_2$ are continuous on $B$ and coincide there. Then the two functions are analytic continuations of one another and together they define a (unique) function

$$f(z) = \begin{cases} f_1(z) & \text{when } z \in S_1 \cup B, \\ f_2(z) & \text{when } z \in S_2 \cup B, \end{cases}$$

which is analytic throughout the entire region $S_1 \cup S_2 \cup B$.
Proof. The proof consists in showing that the function integrates to zero along any closed curve in $S_1 \cup S_2 \cup B$. Once this is done, one can use Morera's theorem to conclude analyticity. The case when the closed curve is entirely in either $S_1$ or $S_2$ is trivial. When the curve is partially in $S_1$ and partially in $S_2$ the proof becomes only slightly more complicated, because one has to split up the contour $C$ into $C_1$ and $C_2$ of Figure 11.9(b). The details are left as an exercise. □
11.3.6. Theorem. (Schwarz reflection principle) Let $f$ be a function that is analytic in a region $S$ that has a segment of the real axis as part of its boundary $B$. If $f(z)$ is real whenever $z$ is real, then the analytic continuation $g$ of $f$ into $S^*$ (the mirror image of $S$ with respect to the real axis) exists and is given by

$$g(z) = f^*(z^*) \quad\text{where } z \in S^*.$$

Proof. First, we show that $g$ is analytic in $S^*$. Let

$$f(z) \equiv u(x, y) + iv(x, y), \qquad g(z) \equiv U(x, y) + iV(x, y).$$

Then $f(z^*) = f(x, -y) = u(x, -y) + iv(x, -y)$ and $g(z) = f^*(z^*)$ imply that $U(x, y) = u(x, -y)$ and $V(x, y) = -v(x, -y)$. Therefore,

$$\frac{\partial U}{\partial x} = \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} = -\frac{\partial v}{\partial(-y)} = \frac{\partial V}{\partial y}, \qquad \frac{\partial U}{\partial y} = -\frac{\partial u}{\partial y} = \frac{\partial v}{\partial x} = -\frac{\partial V}{\partial x}.$$

These are the Cauchy-Riemann conditions for $g(z)$. Thus, $g$ is analytic.

Next, we note that $f(x, 0) = g(x, 0)$, implying that $f$ and $g$ agree on the real axis. Proposition 11.3.5 then implies that $f$ and $g$ are analytic continuations of one another. □
It follows from this theorem that there exists an analytic function $h$ such that

$$h(z) = \begin{cases} f(z) & \text{when } z \in S, \\ g(z) & \text{when } z \in S^*. \end{cases}$$

We note that $h(z^*) = g(z^*) = f^*(z) = h^*(z)$.
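For a concrete function that is real on the real axis, the reflection formula reproduces the function itself, which is consistent with $h(z^*) = h^*(z)$ (a sketch using the exponential):

```python
import cmath

f = cmath.exp                     # entire, and real on the real axis
z = 1.2 - 0.7j                    # a point in the lower half-plane S*
g = f(z.conjugate()).conjugate()  # the reflection formula g(z) = f*(z*)
print(g, f(z))                    # identical: exp is its own continuation
```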
11.3.2 Dispersion Relations
Let $f$ be analytic throughout the complex plane except at a cut along the real axis extending from $x_0$ to infinity. For a point $z$ not on the $x$-axis, the Cauchy integral formula gives $f(z) = (2\pi i)^{-1}\oint_C f(\xi)\,d\xi/(\xi - z)$, where $C$ is the contour shown in Figure 11.10.

Figure 11.10 The contour used for dispersion relations.

We assume that $f$ drops to zero fast enough that the contribution from the large circle tends to zero. The reader may show that the contribution from the small half-circle around $x_0$ also vanishes. Then

$$f(z) = \frac{1}{2\pi i}\left[\int_{x_0+i\epsilon}^{\infty+i\epsilon} \frac{f(\xi)}{\xi - z}\,d\xi - \int_{x_0-i\epsilon}^{\infty-i\epsilon} \frac{f(\xi)}{\xi - z}\,d\xi\right] = \frac{1}{2\pi i}\left[\int_{x_0}^{\infty} \frac{f(x+i\epsilon)}{x - z + i\epsilon}\,dx - \int_{x_0}^{\infty} \frac{f(x-i\epsilon)}{x - z - i\epsilon}\,dx\right].$$

Since $z$ is not on the real axis, we can ignore the $i\epsilon$ terms in the denominators, so that $f(z) = (2\pi i)^{-1}\int_{x_0}^{\infty}[f(x+i\epsilon) - f(x-i\epsilon)]\,dx/(x - z)$. The Schwarz reflection principle in the form $f^*(z) = f(z^*)$ can now be used to yield

$$f(x+i\epsilon) - f(x-i\epsilon) = f(x+i\epsilon) - f^*(x+i\epsilon) = 2i\,\mathrm{Im}[f(x+i\epsilon)].$$

The final result is

$$f(z) = \frac{1}{\pi}\int_{x_0}^{\infty} \frac{\mathrm{Im}[f(x+i\epsilon)]}{x - z}\,dx.$$

This is one form of a dispersion relation. It expresses the value of a function at any point of the cut complex plane in terms of an integral of the imaginary part of the function on the upper edge of the cut.

When there are no residues in the UHP, we can obtain other forms of dispersion relations by equating the real and imaginary parts of Equation (10.11). The result
is

$$\mathrm{Re}[f(x_0)] = \pm\frac{1}{\pi}\,P\int_{-\infty}^{\infty} \frac{\mathrm{Im}[f(x)]}{x - x_0}\,dx, \qquad \mathrm{Im}[f(x_0)] = \mp\frac{1}{\pi}\,P\int_{-\infty}^{\infty} \frac{\mathrm{Re}[f(x)]}{x - x_0}\,dx, \tag{11.6}$$

where the upper (lower) sign corresponds to placing the small semicircle around $x_0$ in the UHP (LHP). The real and imaginary parts of $f$, as related by Equation (11.6), are sometimes said to be the Hilbert transform of one another.
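Equation (11.6) with the upper sign can be spot-checked on $f(z) = 1/(z+i)$, which is analytic in the UHP and has $\mathrm{Re}\,f(x) = x/(x^2+1)$ and $\mathrm{Im}\,f(x) = -1/(x^2+1)$ on the real axis. A sketch (the tail beyond $|x| = A$ is neglected; it contributes only $O(1/A^2)$):

```python
import numpy as np
from scipy.integrate import quad

x0, A = 1.0, 1000.0
im_f = lambda x: -1.0/(x**2 + 1)                     # Im part of f(z) = 1/(z + i)
pv, _ = quad(im_f, -A, A, weight='cauchy', wvar=x0)  # P.V. of Im f(x)/(x - x0)
print(pv/np.pi, x0/(x0**2 + 1))                      # Re f(x0) = 1/2
```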
In some applications the imaginary part of $f$ is an odd function of its argument. Then the first equation in (11.6) can be written as

$$\mathrm{Re}[f(x_0)] = \pm\frac{2}{\pi}\,P\int_0^{\infty} \frac{x\,\mathrm{Im}[f(x)]}{x^2 - x_0^2}\,dx.$$

To arrive at dispersion relations, the following condition must hold:

$$\lim_{R\to\infty} R\,|f(Re^{i\theta})| = 0,$$

where $R$ is the radius of the large semicircle in the UHP (or LHP). If $f$ does not satisfy this prerequisite, it is still possible to obtain a dispersion relation called a dispersion relation with one subtraction. This can be done by introducing an extra factor of $x$ in the denominator of the integrand. We start with Equation (10.15), confining ourselves to the UHP and assuming that there are no poles there, so that the sum over residues is dropped:

$$\frac{f(x_2) - f(x_1)}{x_2 - x_1} = \frac{1}{i\pi}\,P\int_{-\infty}^{\infty} \frac{f(x)}{(x - x_1)(x - x_2)}\,dx.$$

The reader may check that by equating the real and imaginary parts on both sides, letting $x_1 = 0$ and $x_2 = x_0$, and changing $x$ to $-x$ in the first half of the interval of integration, we obtain

$$\frac{\mathrm{Re}[f(x_0)]}{x_0} = \frac{\mathrm{Re}[f(0)]}{x_0} + \frac{1}{\pi}\left[P\int_0^{\infty} \frac{\mathrm{Im}[f(-x)]}{x(x + x_0)}\,dx + P\int_0^{\infty} \frac{\mathrm{Im}[f(x)]}{x(x - x_0)}\,dx\right].$$

For the case where $\mathrm{Im}[f(-x)] = -\mathrm{Im}[f(x)]$, this equation yields

$$\mathrm{Re}[f(x_0)] = \mathrm{Re}[f(0)] + \frac{2x_0^2}{\pi}\,P\int_0^{\infty} \frac{\mathrm{Im}[f(x)]}{x(x^2 - x_0^2)}\,dx. \tag{11.7}$$
11.3.7. Example. In optics, it has been shown that the imaginary part of the forward-scattering light amplitude with frequency $\omega$ is related, by the so-called optical theorem, to the total cross section for the absorption of light of that frequency:

$$\mathrm{Im}[f(\omega)] = \frac{\omega}{4\pi}\,\sigma_{\mathrm{tot}}(\omega).$$

Substituting this in Equation (11.7) yields

$$\mathrm{Re}[f(\omega_0)] = \mathrm{Re}[f(0)] + \frac{\omega_0^2}{2\pi^2}\,P\int_0^{\infty} \frac{\sigma_{\mathrm{tot}}(\omega)}{\omega^2 - \omega_0^2}\,d\omega. \tag{11.8}$$

Thus, the real part of the (coherent) forward scattering of light, that is, the real part of the index of refraction, can be computed from Equation (11.8) by either measuring or calculating $\sigma_{\mathrm{tot}}(\omega)$, the simpler quantity describing the absorption of light in the medium. Equation (11.8) is the original Kramers-Kronig relation. ■
11.4 The Gamma and Beta Functions
We have already encountered the gamma function. In this section, we derive some useful relations involving the gamma function and the closely related beta function. The gamma function is a generalization of the factorial function (which is defined only for positive integers) to the system of complex numbers. By differentiating the integral $I(\alpha) \equiv \int_0^\infty e^{-\alpha t}\,dt = 1/\alpha$ with respect to $\alpha$ repeatedly and setting $\alpha = 1$ at the end, we get $\int_0^\infty t^n e^{-t}\,dt = n!$. This fact motivates the generalization

$$\Gamma(z) \equiv \int_0^{\infty} t^{z-1}e^{-t}\,dt \quad\text{for } \mathrm{Re}(z) > 0, \tag{11.9}$$

where $\Gamma$ is called the gamma (or factorial) function. It is also called Euler's integral of the second kind. It is clear from its definition that

$$\Gamma(n+1) = n! \tag{11.10}$$

if $n$ is a positive integer. The restriction $\mathrm{Re}(z) > 0$ assures the convergence of the integral.

An immediate consequence of Equation (11.9) is obtained by integrating it by parts:

$$\Gamma(z+1) = z\Gamma(z). \tag{11.11}$$

This also leads to Equation (11.10) by iteration.
Another consequence is the analyticity of $\Gamma(z)$. Differentiating Equation (11.11) with respect to $z$, we obtain

$$\frac{d\Gamma(z+1)}{dz} = \Gamma(z) + z\frac{d\Gamma(z)}{dz}.$$

Thus, $d\Gamma(z)/dz$ exists and is finite if and only if $d\Gamma(z+1)/dz$ is finite (recall that $z \neq 0$). The procedure of showing the latter is outlined in Problem 11.16. Therefore, $\Gamma(z)$ is analytic whenever $\Gamma(z+1)$ is. To see the singularities of $\Gamma(z)$, we note that

$$\Gamma(z+n) = z(z+1)(z+2)\cdots(z+n-1)\Gamma(z),$$

or

$$\Gamma(z) = \frac{\Gamma(z+n)}{z(z+1)(z+2)\cdots(z+n-1)}. \tag{11.12}$$

The numerator is analytic as long as $\mathrm{Re}(z+n) > 0$, or $\mathrm{Re}(z) > -n$. Thus, for $\mathrm{Re}(z) > -n$, the singularities of $\Gamma(z)$ are the poles at $z = 0, -1, -2, \ldots, -n+1$. Since $n$ is arbitrary, we conclude that

11.4.1. Box. $\Gamma(z)$ is analytic at all $z \in \mathbb{C}$ except at $z = 0, -1, -2, \ldots$, where $\Gamma(z)$ has simple poles.
A useful result is obtained by setting $z = \frac{1}{2}$ in Equation (11.9):

$$\Gamma\!\left(\tfrac{1}{2}\right) = \sqrt{\pi}. \tag{11.13}$$

This can be obtained by making the substitution $u = \sqrt{t}$ in the integral.
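The defining properties (11.10), (11.11), and (11.13) are all easy to confirm numerically (a sketch assuming SciPy, whose gamma function accepts complex arguments):

```python
import numpy as np
from scipy.special import gamma

print(gamma(0.5), np.sqrt(np.pi))       # Gamma(1/2) = sqrt(pi)
print(gamma(6), 120.0)                  # Gamma(n + 1) = n!

z = 2.3 + 1.1j
print(abs(gamma(z + 1) - z*gamma(z)))   # recurrence Gamma(z+1) = z Gamma(z)
```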
We can derive an expression for the logarithmic derivative of the gamma function that involves an infinite series. To do so, we use Equation (11.2), noting that 1/Γ(z + 1) is an entire function with simple zeros at {−k}_{k=1}^∞. Equation (11.2) gives

1/Γ(z + 1) = e^{γz} ∏_{k=1}^∞ (1 + z/k) e^{−z/k},

where γ is a constant to be determined. Using Equation (11.11), we obtain

1/Γ(z) = z e^{γz} ∏_{k=1}^∞ (1 + z/k) e^{−z/k}.  (11.14)

Euler-Mascheroni
constant

To determine γ, let z = 1 in Equation (11.14) and evaluate the resulting product numerically. The result is γ = 0.57721566 ..., the so-called Euler-Mascheroni constant.
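Setting z = 1 in Equation (11.14) and taking logarithms turns the product into a sum, γ = Σ_{k≥1} [1/k − ln(1 + 1/k)], which can be evaluated directly. A minimal sketch (the cutoff N is an arbitrary choice; the truncation tail is of order 1/(2N)):

```python
import math

# gamma = sum over k of [1/k - ln(1 + 1/k)], from Equation (11.14) at z = 1.
# Truncating at N leaves a tail ~ 1/(2N), so N = 100000 gives ~5 digits.
N = 100_000
gamma_em = sum(1.0 / k - math.log1p(1.0 / k) for k in range(1, N + 1))
print(gamma_em)   # ≈ 0.57721 (Euler-Mascheroni constant)
```

`math.log1p` is used instead of `log(1 + x)` to avoid the loss of precision when 1/k is tiny.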
Differentiating the logarithm of both sides of Equation (11.14), we obtain

(d/dz) ln[Γ(z)] = −1/z − γ + Σ_{k=1}^∞ (1/k − 1/(z + k)).  (11.15)
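Equation (11.15) can be checked at z = 1, where the series telescopes and the right-hand side reduces to −γ. The sketch below (term cutoff arbitrary) also compares the series against a finite-difference derivative of `math.lgamma`, the standard library's ln Γ:

```python
import math

GAMMA_EM = 0.5772156649015329   # Euler-Mascheroni constant

def digamma_series(z, terms=200_000):
    """d/dz ln Gamma(z) from the series (11.15); truncation tail ~ z/terms."""
    s = -1.0 / z - GAMMA_EM
    for k in range(1, terms + 1):
        s += 1.0 / k - 1.0 / (z + k)
    return s

# At z = 1 the sum telescopes: -1 - gamma + 1 = -gamma.  Cross-check with a
# central difference of ln Gamma.
h = 1e-5
numeric = (math.lgamma(1.0 + h) - math.lgamma(1.0 - h)) / (2.0 * h)
print(digamma_series(1.0), numeric)   # both ≈ -0.5772
```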
Other properties of the gamma function are derivable from the results presented here. Those derivations are left as problems. The beta function, or Euler's integral of the first kind, is defined for complex numbers a and b as follows:

beta function
defined

B(a, b) ≡ ∫_0^1 t^{a−1} (1 − t)^{b−1} dt,  (11.16)

where Re(a), Re(b) > 0.
11.4 THE GAMMA AND BETA FUNCTIONS 311
By changing t to 1/t, we can also write

B(a, b) ≡ ∫_1^∞ t^{−a−b} (t − 1)^{b−1} dt.  (11.17)

Since 0 ≤ t ≤ 1 in Equation (11.16), we can define θ by t = sin²θ. This gives

B(a, b) = 2 ∫_0^{π/2} sin^{2a−1}θ cos^{2b−1}θ dθ.  (11.18)
gamma function and
beta function are
related

This relation can be used to establish a connection between the gamma and beta functions. We note that

Γ(a) = ∫_0^∞ t^{a−1} e^{−t} dt = 2 ∫_0^∞ x^{2a−1} e^{−x²} dx,

where in the last step we changed the variable to x = √t. Multiply Γ(a) by Γ(b) and express the resulting double integral in terms of polar coordinates to obtain Γ(a)Γ(b) = Γ(a + b)B(a, b), or

B(a, b) = B(b, a) = Γ(a)Γ(b) / Γ(a + b).  (11.19)
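Equation (11.19) is easy to test numerically: integrate Equation (11.16) by the midpoint rule and compare with the gamma-function ratio (the values of a, b and the step count below are arbitrary choices):

```python
import math

def beta_quad(a, b, n=200_000):
    """B(a, b) = integral_0^1 t^(a-1) (1-t)^(b-1) dt by the midpoint rule
    (Equation 11.16).  Midpoints avoid the endpoints, where the integrand
    is singular whenever a or b < 1."""
    h = 1.0 / n
    return sum(((i + 0.5) * h) ** (a - 1.0) * (1.0 - (i + 0.5) * h) ** (b - 1.0)
               for i in range(n)) * h

a, b = 2.5, 1.5                    # arbitrary test values
lhs = beta_quad(a, b)
rhs = math.gamma(a) * math.gamma(b) / math.gamma(a + b)   # Equation (11.19)
print(lhs, rhs)                    # agree to several digits
```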
Let us now establish the following useful relation:

Γ(z)Γ(1 − z) = π / sin(πz)  for 0 < Re(z) < 1.  (11.20)

With a = z and b = 1 − z, and using u = tan θ, Equations (11.18) and (11.19) give

Γ(z)Γ(1 − z) = B(z, 1 − z) = 2 ∫_0^∞ u^{2z−1} / (u² + 1) du.

Using the result obtained in Example 11.2.4, we immediately get Equation (11.20), valid for 0 < Re(z) < 1. By analytic continuation we then generalize Equation (11.20) to values of z for which both sides are analytic.
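A direct numerical check of Equation (11.20) on the real interval 0 < z < 1, where `math.gamma` applies without any analytic continuation; note that z = 1/2 reproduces Equation (11.13):

```python
import math

# Equation (11.20): Gamma(z) Gamma(1 - z) = pi / sin(pi z), tested at a few
# real points in (0, 1).
for z in (0.1, 0.25, 0.5, 0.8):
    lhs = math.gamma(z) * math.gamma(1.0 - z)
    rhs = math.pi / math.sin(math.pi * z)
    assert abs(lhs - rhs) < 1e-9 * abs(rhs)
print("reflection formula verified")   # z = 0.5 gives Gamma(1/2)**2 = pi
```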
11.4.2. Example. As an illustration of the use of Equation (11.20), let us show that Γ(z) can also be written as

1/Γ(z) = (1/2πi) ∫_C (e^t / t^z) dt,  (11.21)

where C is the contour shown in Figure 11.11. From Equations (11.9) and (11.20) it follows that

1/Γ(z) = (sin πz / π) Γ(1 − z) = (sin πz / π) ∫_0^∞ e^{−r} r^{−z} dr
       = [(e^{iπz} − e^{−iπz}) / 2πi] ∫_0^∞ (e^{−r} / r^z) dr.
Figure 11.11 The contour C used in evaluating the reciprocal gamma function.
The contour integral of Equation (11.21) can be evaluated by noting that above the real axis, t = re^{iπ} = −r, below it t = re^{−iπ} = −r, and, as the reader may check, that the contribution from the small circle at the origin is zero; so

∫_C (e^t / t^z) dt = ∫_0^∞ [e^{−r} / (re^{iπ})^z] (−dr) + ∫_∞^0 [e^{−r} / (re^{−iπ})^z] (−dr)
                   = −e^{−iπz} ∫_0^∞ (e^{−r} / r^z) dr + e^{iπz} ∫_0^∞ (e^{−r} / r^z) dr.

Comparison with the last equation above yields the desired result. ■
Another useful relation can be obtained by combining Equations (11.11) and (11.20): Γ(z)Γ(1 − z) = Γ(z)(−z)Γ(−z) = π / sin πz. Thus,

Γ(z)Γ(−z) = −π / (z sin πz).  (11.22)

Once we know Γ(x) for positive values of real x, we can use Equation (11.22) to find Γ(x) for x < 0. Thus, for instance, Γ(1/2) = √π gives Γ(−1/2) = −2√π. Equation (11.22) also shows that the gamma function has simple poles wherever z is a negative integer.
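Equation (11.22) can be used to evaluate the gamma function at negative arguments; the sketch below recovers Γ(−1/2) = −2√π and agrees with the value `math.gamma` reports directly (the stdlib function handles negative non-integer arguments):

```python
import math

# Equation (11.22): Gamma(z) Gamma(-z) = -pi / (z sin(pi z)).  Solving for
# Gamma(-z) at z = 1/2 recovers Gamma(-1/2) = -2 sqrt(pi).
z = 0.5
gamma_neg = -math.pi / (z * math.sin(math.pi * z) * math.gamma(z))
print(gamma_neg, math.gamma(-0.5), -2.0 * math.sqrt(math.pi))   # all ≈ -3.5449
```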
11.5 Method of Steepest Descent
It is shown in statistical mechanics ([Hill 87, pp. 150-152]) that the partition function, which generates all the thermodynamical quantities, can be written as a contour integral. Debye found a very elegant technique of approximating this contour integral, which we investigate in this section. Consider the integral

I(α) ≡ ∫_C e^{αf(z)} g(z) dz,  (11.23)

where |α| is large and f and g are analytic in some region of ℂ containing the contour C. Since this integral occurs frequently in physical applications, it would be helpful if we could find a general approximation for it that is applicable for all f and g. The fact that |α| is large will be of great help. By redefining f(z), if necessary, we can assume that α = |α|e^{i arg(α)} is real and positive [absorb e^{i arg(α)} into the function f(z) if need be].
The exponent of the integrand can be written as

αf(z) = αu(x, y) + iαv(x, y).  (11.24)
Since α is large and positive, we expect the exponential to be the largest at the maximum of u(x, y). Thus, if we deform the contour so that it passes through a point z₀ at which u(x, y) is maximum, the contribution to the integral may come mostly from the neighborhood of z₀. This opens up the possibility of expanding the exponent about z₀ and keeping the lowest terms in the expansion, which is what we are after. There is one catch, however. Because of the largeness of α, the imaginary part of αf in the exponent will oscillate violently as v(x, y) changes even by a small amount. This oscillation can make the contribution of the real part of f(z₀) negligibly small and render the whole procedure useless. Thus, we want to tame the variation of exp[iv(x, y)] by making v(x, y) vary as slowly as possible. A necessary condition is for the derivative of v to vanish at z₀. This and the fact that the real part is to have a maximum at z₀ lead to

∂u/∂x + i ∂v/∂x = df/dz|_{z₀} = 0.

However, we do not stop here but demand that the imaginary part of f be constant along the deformed contour: Im[f(z)] = Im[f(z₀)], or v(x, y) = v(x₀, y₀).
Equation (11.24) and the Cauchy-Riemann conditions imply that ∂u/∂x = 0 = ∂u/∂y at z₀. Thus, it might appear that z₀ is a maximum (or minimum) of the surface described by the function u(x, y). This is not true: For the surface to have a maximum (minimum), both second derivatives, ∂²u/∂x² and ∂²u/∂y², must be negative (positive). But that is impossible because u(x, y) is harmonic: the sum of these two derivatives is zero. Recall that a point at which the derivatives vanish but that is neither a maximum nor a minimum is called a saddle point. That is why the procedure described below is sometimes called the saddle point approximation.
We are interested in values of z close to z₀. So let us expand f(z) in a Taylor series about z₀, use Equation (11.24), and keep terms only up to the second, to obtain

f(z) = f(z₀) + ½(z − z₀)² f''(z₀).  (11.25)

Let us assume that f''(z₀) ≠ 0, and define

z − z₀ = r₁ e^{iθ₁}  and  ½ f''(z₀) = r₂ e^{iθ₂},  (11.26)

and substitute in the above expansion to obtain

f(z) − f(z₀) = r₁² r₂ e^{i(2θ₁+θ₂)},  (11.27)
Figure 11.12 A segment of the contour C₀ in the vicinity of z₀. The lines mentioned in the text are small segments of the contour C₀ centered at z₀.
or

Re[f(z) − f(z₀)] = r₁² r₂ cos(2θ₁ + θ₂),
Im[f(z) − f(z₀)] = r₁² r₂ sin(2θ₁ + θ₂).  (11.28)

The constancy of Im[f(z)] implies that sin(2θ₁ + θ₂) = 0, or 2θ₁ + θ₂ = nπ. Thus, for θ₁ = −θ₂/2 + nπ/2 where n = 0, 1, 2, 3, the imaginary part of f is constant. The angle θ₂ is determined by the second equation in (11.26). Once we determine n, the path of saddle point integration will be specified.
To get insight into this specification, consider z − z₀ = r₁ e^{i(−θ₂/2+nπ/2)}, and eliminate r₁ from its real and imaginary parts to obtain

y − y₀ = [tan(nπ/2 − θ₂/2)] (x − x₀).

This is the equation of a line passing through z₀ = (x₀, y₀) and making an angle of θ₁ = (nπ − θ₂)/2 with the real axis. For n = 0, 2 we get one line, and for n = 1, 3 we get another that is perpendicular to the first (see Figure 11.12). It is to be emphasized that along both these lines the imaginary part of f(z) remains constant. To choose the correct line, we need to look at the real part of the function. Also note that these "lines" are small segments of (or tangents to) the deformed contour at z₀.

method of steepest
descent

We are looking for directions along which Re(f) goes through a relative maximum at z₀. In fact, we are after a path on which the function decreases maximally. This occurs when Re[f(z)] − Re[f(z₀)] takes the largest negative value. Equation (11.28) determines such a path: It is that path on which cos(2θ₁ + θ₂) = −1, or when n = 1, 3. There is only one such path in the region of interest, and the procedure is uniquely determined.⁵ Because the descent from the maximum value at z₀ is maximum along such a path, this procedure is called the method of steepest descent.
Now that we have determined the contour, let us approximate the integral. Substituting 2θ₁ + θ₂ = π, 3π in Equation (11.27), we get

f(z) − f(z₀) = −r₁² r₂ ≡ −t² = ½(z − z₀)² f''(z₀).  (11.29)

Using this in Equation (11.23) yields

I(α) ≈ ∫_{C₀} e^{α[f(z₀)−t²]} g(z) dz = e^{αf(z₀)} ∫_{C₀} e^{−αt²} g(z) dz,  (11.30)
where C₀ is the deformed contour passing through z₀.
To proceed, we need to solve for z in terms of t. From Equation (11.29) we have

(z − z₀)² = −2t² / f''(z₀) = −(t²/r₂) e^{−iθ₂}.

Therefore, |z − z₀| = |t|/√r₂, or z − z₀ = (|t|/√r₂) e^{iθ₁}, by the first equation of (11.26). Let us agree that for t > 0, the point z on the contour will move in the direction that makes an angle of 0 ≤ θ₁ < π, and that t < 0 corresponds to the opposite direction. This convention removes the remaining ambiguity of the angle θ₁, and gives

z = z₀ + (1/√r₂) e^{iθ₁} t.  (11.31)
Using the Taylor expansion of g(z) about z₀, we can write

g(z) dz = {Σ_{n=0}^∞ [t^n / (r₂^{n/2} n!)] e^{inθ₁} g^{(n)}(z₀)} (e^{iθ₁}/√r₂) dt
        = Σ_{n=0}^∞ [t^n / (r₂^{(n+1)/2} n!)] e^{i(n+1)θ₁} g^{(n)}(z₀) dt,

and substituting this in Equation (11.30) yields

I(α) ≈ e^{αf(z₀)} ∫_{C₀} e^{−αt²} {Σ_{n=0}^∞ [t^n / (r₂^{(n+1)/2} n!)] e^{i(n+1)θ₁} g^{(n)}(z₀)} dt
     = e^{αf(z₀)} Σ_{n=0}^∞ [e^{i(n+1)θ₁} / (r₂^{(n+1)/2} n!)] g^{(n)}(z₀) ∫_{−∞}^∞ e^{−αt²} t^n dt.  (11.32)

⁵The angle θ₁ is still ambiguous by π, because n can be 1 or 3. However, by a suitable sign convention described below, we can remove this ambiguity.
The extension of the integral limits to infinity does not alter the result significantly because α is assumed large and positive. The integral in the sum is zero for odd n. When n is even, we make the substitution u = αt² and show that ∫_{−∞}^∞ e^{−αt²} t^n dt = α^{−(n+1)/2} Γ[(n + 1)/2]. With n = 2k, and using r₂ = |f''(z₀)|/2, the sum becomes

asymptotic
expansion of I(α)

I(α) ≈ e^{αf(z₀)} Σ_{k=0}^∞ [2^{k+1/2} e^{i(2k+1)θ₁} / (|f''(z₀)|^{k+1/2} (2k)!)] g^{(2k)}(z₀) Γ(k + ½) α^{−k−1/2}.  (11.33)

This is called the asymptotic expansion of I(α). In most applications, only the first term of the above series is retained, giving

I(α) ≈ e^{αf(z₀)} √(2π/α) e^{iθ₁} g(z₀) / √|f''(z₀)|.  (11.34)
Stirling
approximation

11.5.1. Example. Let us approximate the integral

I(α) ≡ Γ(α + 1) = ∫_0^∞ e^{−z} z^α dz,  (11.35)

where α is a positive real number. First, we must rewrite the integral in the form of Equation (11.23). We can do this by noting that z^α = e^{α ln z}. Thus, we have

I(α) = ∫_0^∞ e^{α ln z − z} dz = ∫_0^∞ e^{α(ln z − z/α)} dz,

and we identify f(z) = ln z − z/α and g(z) = 1. The saddle point is found from f'(z) = 0, or z₀ = α. Furthermore, from

½ f''(z₀) = ½(−1/α²) = (1/2α²) e^{iπ}

and 2θ₁ + θ₂ = π, 3π, as well as the condition 0 ≤ θ₁ < π, we conclude that θ₁ = 0. Substitution in Equation (11.34) yields

Γ(α + 1) ≈ e^{αf(z₀)} √(2π/α) (1/√(1/α²)) = √(2πα) e^{α(ln α − 1)} = √(2π) e^{−α} α^{α+1/2},

called the Stirling approximation. ■
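How good is the one-term approximation? The sketch below compares the Stirling formula with `math.gamma` for a few values of α; the relative error of the leading term is known to shrink like 1/(12α), which the printed ratios reflect:

```python
import math

# One-term steepest descent gave Gamma(alpha + 1) ≈ sqrt(2 pi) e^(-alpha)
# alpha^(alpha + 1/2).  Relative error of the leading term ~ 1/(12 alpha).
ratios = {}
for alpha in (5.0, 20.0, 100.0):
    exact = math.gamma(alpha + 1.0)
    stirling = math.sqrt(2.0 * math.pi) * math.exp(-alpha) * alpha ** (alpha + 0.5)
    ratios[alpha] = stirling / exact
    print(alpha, ratios[alpha])    # ratio approaches 1 as alpha grows
```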
11.5.2. Example. The Hankel function of the first kind is defined as

H_ν^{(1)}(α) ≡ (1/iπ) ∫_C e^{(α/2)(z − 1/z)} dz/z^{ν+1},

where C is the contour shown in Figure 11.13. We want to find the asymptotic expansion of this function, choosing the branch of the function in which −π < θ < π.
We identify f(z) = ½(z − 1/z) and g(z) = z^{−ν−1}. Next, the stationary points of f are calculated:

df/dz = ½ + 1/(2z²) = 0  ⇒  z₀ = ±i.
Figure 11.13 The contour for the evaluation of the Hankel function of the first kind.
The contour of integration suggests the saddle point z₀ = +i. The second derivative evaluated at the saddle point gives f''(z₀) = −1/z₀³ = −i = e^{−iπ/2}, or θ₂ = −π/2. This, and the convention 0 ≤ θ₁ < π, force us to choose θ₁ = 3π/4. Substituting this in Equation (11.34) and noting that f(i) = i and |f''(z₀)| = 1, we obtain

H_ν^{(1)}(α) ≈ (1/iπ) e^{αi} √(2π/α) e^{i3π/4} i^{−ν−1} = √(2/πα) e^{i(α − νπ/2 − π/4)},

where we have used i^{−ν−1} = e^{−i(ν+1)π/2}. ■
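The standard library has no Hankel functions, but since H_ν^{(1)} = J_ν + iY_ν, the real part of the asymptotic form predicts J_n(α) ≈ √(2/πα) cos(α − nπ/2 − π/4) for large α. The sketch below checks this against the standard integral representation of integer-order J_n (midpoint rule; step count arbitrary):

```python
import math

def bessel_j(n, x, steps=20_000):
    """Integer-order Bessel function via the standard integral representation
    J_n(x) = (1/pi) * integral_0^pi cos(n*theta - x*sin(theta)) d(theta)."""
    h = math.pi / steps
    s = sum(math.cos(n * (i + 0.5) * h - x * math.sin((i + 0.5) * h))
            for i in range(steps))
    return s * h / math.pi

# Real part of the Hankel asymptotic form (H^(1) = J + iY).
n, alpha = 0, 50.0
approx = math.sqrt(2.0 / (math.pi * alpha)) * math.cos(
    alpha - n * math.pi / 2.0 - math.pi / 4.0)
print(bessel_j(n, alpha), approx)   # agree to about three decimal places
```

The residual discrepancy is of the size of the next term in the asymptotic series, consistent with Equation (11.33).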
Although Equation (11.34) is adequate for most applications, we shall have occasion to demand a better approximation. One may try to keep higher-order terms of Equation (11.33), but that infinite sum is in reality inconsistent. The reason is that in the product g(z) dz, we kept only the first power of t in the expansion of z. To restore consistency, let us expand z(t) as well. Suppose

z − z₀ = Σ_{m=1}^∞ b_m t^m  ⇒  dz = Σ_{m=0}^∞ (m + 1) b_{m+1} t^m dt,

so that

g(z) dz = Σ_{n=0}^∞ [t^n / (r₂^{n/2} n!)] e^{inθ₁} g^{(n)}(z₀) Σ_{m=0}^∞ (m + 1) b_{m+1} t^m dt
        = Σ_{m,n=0}^∞ [e^{inθ₁} / (r₂^{n/2} n!)] (m + 1) b_{m+1} g^{(n)}(z₀) t^{m+n} dt.

Now introduce l = m + n and note that the summation over n goes up to l. This gives
Substituting this in Equation (11.30) and changing the contour integration into the integral from −∞ to ∞ as before yields

I(α) ≈ e^{αf(z₀)} Σ_{k=0}^∞ a_{2k} α^{−k−1/2} Γ(k + ½),

a_{2k} = Σ_{n=0}^{2k} [e^{inθ₁} / (r₂^{n/2} n!)] (2k − n + 1) b_{2k−n+1} g^{(n)}(z₀).  (11.36)
The only thing left to do is to evaluate b_m. We shall not give a general formula for these coefficients. Instead, we shall calculate the first three of them. This should reveal to the reader the general method of approximating them to any order. We have already calculated b₁ in Equation (11.31). To calculate b₂, keep the next-highest term in the expansion of both z and t². Thus write

z − z₀ = b₁t + b₂t²  and  t² = −½ f''(z₀)(z − z₀)² − (1/6) f'''(z₀)(z − z₀)³.

Now substitute the first equation in the second and equate the coefficients of equal powers of t on both sides. The second power of t gives nothing new: It merely reaffirms the value of b₁. The coefficient of the third power of t is −b₁b₂ f''(z₀) − (1/6) b₁³ f'''(z₀). Setting this equal to zero gives

b₂ = −b₁² f'''(z₀) / [6 f''(z₀)] = f'''(z₀) e^{4iθ₁} / [3 |f''(z₀)|²],  (11.37)

where we substituted for b₁ from Equation (11.31) and used 2θ₁ + θ₂ = π.
To calculate b₃, keep one more term in the expansion of both z and t² to obtain

z − z₀ = b₁t + b₂t² + b₃t³

and

t² = −½ f''(z₀)(z − z₀)² − (1/6) f'''(z₀)(z − z₀)³ − (1/24) f^{(iv)}(z₀)(z − z₀)⁴.

Once again substitute the first equation in the second and equate the coefficients of equal powers of t on both sides. The second and third powers of t give nothing new. Setting the coefficient of the fourth power of t equal to zero yields

b₃ = b₁³ {5[f'''(z₀)]² / (72[f''(z₀)]²) − f^{(iv)}(z₀) / (24 f''(z₀))}
   = [√2 e^{3iθ₁} / (12 |f''(z₀)|^{3/2})] {5[f'''(z₀)]² / (3[f''(z₀)]²) − f^{(iv)}(z₀) / f''(z₀)}.  (11.38)
Figure 11.14 Contour used for Problem 11.4.
11.6 Problems
11.1. Derive Equation (11.2) from its logarithmic derivative.
11.2. Show that the point at infinity is not a branch point for f(z) = (z² − 1)^{1/2}.
11.3. Find the following integrals, for which 0 ≠ a ∈ ℝ.

(e) ∫_0^∞ (ln x)² / (x² + a²) dx.
11.4. Use the contour in Figure 11.14 to evaluate the following integrals.

(a) ∫_0^∞ (sin ax / sinh x) dx    (b) ∫_0^∞ (x cos ax / sinh x) dx
11.5. Show that ∫_0^π f(sin θ) dθ = 2 ∫_0^{π/2} f(sin θ) dθ for an arbitrary function f defined in the interval [−1, +1].
11.6. Find the principal value of the integral ∫_{−∞}^∞ x sin x dx / (x² − x₀²) and evaluate

I = ∫_{−∞}^∞ x sin x dx / [(x − x₀ ± iε)(x + x₀ ± iε)]

for the four possible choices of signs.
11.7. Use analytic continuation, the analyticity of the exponential, hyperbolic, and trigonometric functions, and the analogous identities for real z to prove the following identities.

(a) e^z = cosh z + sinh z.    (b) cosh² z − sinh² z = 1.    (c) sin 2z = 2 sin z cos z.
11.8. Show that the function 1/z² represents the analytic continuation into the domain ℂ − {0} (all the complex plane minus the origin) of the function defined by Σ_{n=0}^∞ (n + 1)(z + 1)^n where |z + 1| < 1.

11.9. Find the analytic continuation into ℂ − {i, −i} (all the complex plane except i and −i) of f(z) = ∫_0^∞ e^{−zt} sin t dt, where Re(z) > 0.

11.10. Expand f(z) = Σ_{n=0}^∞ z^n (defined in its circle of convergence) in a Taylor series about z = a. For what values of a does this expansion permit the function f(z) to be continued analytically?
11.11. The two power series

f₁(z) = Σ_{n=1}^∞ z^n/n  and  f₂(z) = iπ + Σ_{n=1}^∞ (−1)^n (z − 2)^n / n

have no common domain of convergence. Show that they are nevertheless analytic continuations of one another.
11.12. Prove that the functions defined by the two series

1 + az + a²z² + ···  and  1/(1 − z) − (1 − a)z/(1 − z)² + (1 − a)²z²/(1 − z)³ − ···

are analytic continuations of one another.
11.13. Show that the function f₁(z) = 1/(z² + 1), where z ≠ ±i, is the analytic continuation into ℂ − {i, −i} of the function f₂(z) = Σ_{n=0}^∞ (−1)^n z^{2n}, where |z| < 1.
11.14. Find the analytic continuation into ℂ − {0} of the function

f(z) = ∫_0^∞ t e^{−zt} dt,  where Re(z) > 0.

11.15. Show that the integral in Equation (11.9) converges. Hint: First show that |Γ(z + 1)| ≤ ∫_0^∞ t^x e^{−t} dt where x = Re(z). Now show that t^x ≤ t^n for t ≥ 1 and some integer n > 0, and conclude that Γ(z) is finite.

11.16. Show that dΓ(z + 1)/dz exists and is finite by establishing the following:
(a) |ln t| < t + 1/t for t > 0. Hint: For t ≥ 1, show that t − ln t is a monotonically increasing function. For t < 1, make the substitution t = 1/s.
(b) Use the result from part (a) in the integral for dΓ(z + 1)/dz to show that |dΓ(z + 1)/dz| is finite. Hint: Differentiate inside the integral.
11.17. Derive Equation (11.11) from Equation (11.9).

11.18. Show that Γ(½) = √π, and that

(2k − 1)!! ≡ (2k − 1)(2k − 3) ··· 5 · 3 · 1 = (2^k/√π) Γ((2k + 1)/2).

11.19. Show that Γ(z) = ∫_0^1 [ln(1/t)]^{z−1} dt with Re(z) > 0.

11.20. Derive the identity ∫_0^∞ e^{−x^a} dx = Γ[(a + 1)/a].

11.21. Consider the function f(z) = (1 + z)^α.
(a) Show that d^n f/dz^n |_{z=0} = Γ(α + 1)/Γ(α − n + 1), and use it to derive the relation

(α n) ≡ α!/[n!(α − n)!] ≡ Γ(α + 1)/[n! Γ(α − n + 1)],

where (α n) denotes the binomial coefficient.
(b) Show that for general complex numbers a and b we can formally write

(a + b)^α = Σ_{n=0}^∞ (α n) a^n b^{α−n}.

(c) Show that if α is a positive integer m, the series in part (b) truncates at n = m.

11.22. Prove that the residue of Γ(z) at z = −k is r_k = (−1)^k/k!. Hint: Use Equation (11.12).

11.23. Derive the following relation for z = x + iy:

11.24. Using the definition of B(a, b), Equation (11.16), show that B(a, b) = B(b, a).

11.25. Integrate Equation (11.21) by parts and derive Equation (11.11).

11.26. For positive integers n, show that Γ(½ − n)Γ(½ + n) = (−1)^n π.

11.27. Show that
(a) B(a, b) = B(a + 1, b) + B(a, b + 1).
(b) B(a, b + 1) = [b/(a + b)] B(a, b).
(c) B(a, b) B(a + b, c) = B(b, c) B(a, b + c).
Figure 11.15 The contour for the evaluation of the Hankel function of the second kind.
11.28. Verify that ∫_{−1}^1 (1 + t)^a (1 − t)^b dt = 2^{a+b+1} B(a + 1, b + 1).

11.29. Show that the volume of the solid formed by the surface z = x^a y^b, the xy-, yz-, and xz-planes, and the plane parallel to the z-axis and going through the points (0, y₀) and (x₀, 0) is

[x₀^{a+1} y₀^{b+1} / (a + b + 2)] B(a + 1, b + 1).

11.30. Derive this relation:

∫_0^∞ (sinh^a x / cosh^b x) dx = ½ B((a + 1)/2, (b − a)/2),  where −1 < a < b.

Hint: Let t = tanh²x in Equation (11.16).
11.31. The Hankel function of the second kind is defined as

H_ν^{(2)}(α) ≡ −(1/iπ) ∫_C e^{(α/2)(z − 1/z)} dz/z^{ν+1},

where C is the contour shown in Figure 11.15. Find the asymptotic expansion of this function.
11.32. Find the asymptotic dependence of the modified Bessel function of the first kind, defined as

I_ν(α) ≡ (1/2πi) ∫_C e^{(α/2)(z + 1/z)} dz/z^{ν+1},

where C starts at −∞, approaches the origin and circles it, and goes back to −∞. Thus the negative real axis is excluded from the domain of analyticity.

11.33. Find the asymptotic dependence of the modified Bessel function of the second kind:

K_ν(α) ≡ ½ ∫_C e^{−(α/2)(z + 1/z)} dz/z^{ν+1},

where C starts at ∞, approaches the origin and circles it, and goes back to ∞. Thus the positive real axis is excluded from the domain of analyticity.
Additional Reading
1. Dennery, P. and Krzywicki, A. Mathematics for Physicists, Harper and Row, 1967.
2. Lang, S. Complex Analysis, 2nd ed., Springer-Verlag, 1985. Contains a very
lucid discussion of analytic continuation.
Part IV
Differential Equations
12
Separation of Variables in Spherical Coordinates
The laws of physics are almost exclusively written in the form of differential equations (DEs). In (point) particle mechanics there is only one independent variable, leading to ordinary differential equations (ODEs). In other areas of physics in which extended objects such as fields are studied, variations with respect to position are also important. Partial derivatives with respect to coordinate variables show up in the differential equations, which are therefore called partial differential equations (PDEs). We list the most common PDEs of mathematical physics in the following.
12.1 PDEs of Mathematical Physics
In electrostatics, where time-independent scalar fields such as potentials and vector fields such as electrostatic fields are studied, the law is described by Poisson's equation,

Poisson's equation

∇²Φ(r) = −4πρ(r).  (12.1)

Laplace's equation

In vacuum, where ρ(r) = 0, Equation (12.1) reduces to Laplace's equation,

∇²Φ(r) = 0.  (12.2)

Many electrostatic problems involve conductors held at constant potentials and situated in vacuum. In the space between such conducting surfaces, the electrostatic potential obeys Equation (12.2).

heat equation

The most simplified version of the heat equation is

∂T/∂t = a² ∇²T,  (12.3)
where T is the temperature and a is a constant characterizing the medium in which heat is flowing.
One of the most frequently recurring PDEs encountered in mathematical physics is the wave equation,

wave equation

∇²Ψ = (1/c²) ∂²Ψ/∂t².  (12.4)

This equation (or its simplification to lower dimensions) is applied to the vibration of strings and drums; the propagation of sound in gases, solids, and liquids; the propagation of disturbances in plasmas; and the propagation of electromagnetic waves.
Schrödinger equation

The Schrödinger equation, describing nonrelativistic quantum phenomena, is

−(ħ²/2m) ∇²Ψ + V(r)Ψ = iħ ∂Ψ/∂t,  (12.5)

where m is the mass of a subatomic particle, ħ is Planck's constant (divided by 2π), V is the potential energy of the particle, and |Ψ(r, t)|² is the probability density of finding the particle at r at time t.

Klein-Gordon
equation

A relativistic generalization of the Schrödinger equation for a free particle of mass m is the Klein-Gordon equation, which, in terms of the natural units (ħ = 1 = c), reduces to

∇²φ − m²φ = ∂²φ/∂t².  (12.6)
time is separated
from space

Equations (12.3)-(12.6) have partial derivatives with respect to time. As a first step toward solving these PDEs and as an introduction to similar techniques used in the solution of PDEs not involving time,¹ let us separate the time variable. We will denote the functions in all four equations by the generic symbol Ψ(r, t). The basic idea is to separate the r and t dependence into factors: Ψ(r, t) ≡ R(r)T(t). This factorization permits us to separate the two operations of space differentiation and time differentiation. Let L stand for all spatial derivative operators and write all the relevant equations either as LΨ = ∂Ψ/∂t or as LΨ = ∂²Ψ/∂t². With this notation and the above separation, we have

L(RT) = T(LR) = { R dT/dt  or  R d²T/dt² }.

¹See [Hass 99] for a thorough discussion of separation in Cartesian and cylindrical coordinates. Chapter 19 of this book also contains examples of solutions to some second-order linear DEs resulting from such separation.
Dividing both sides by RT, we obtain

(1/R) LR = { (1/T) dT/dt  or  (1/T) d²T/dt² }.  (12.7)

Now comes the crucial step in the process of separation of variables. The LHS of Equation (12.7) is a function of position alone, and the RHS is a function of time alone. Since r and t are independent variables, the only way that (12.7) can hold is for both sides to be constant, say α:²

(1/R) LR = α  ⇒  LR = αR,
(1/T) dT/dt = α  ⇒  dT/dt = αT.

We have reduced the original time-dependent PDE to an ODE,

dT/dt = αT  or  d²T/dt² = αT,  (12.8)

and a PDE involving only the position variables, (L − α)R = 0. The most general form of L − α arising from Equations (12.3)-(12.6) is L − α ≡ ∇² + f(r). Therefore, Equations (12.3)-(12.6) are equivalent to (12.8), and

∇²R + f(r)R = 0.  (12.9)

To include Poisson's equation, we replace the zero on the RHS by g(r) ≡ −4πρ(r), obtaining ∇²R + f(r)R = g(r). With the exception of Poisson's equation (an inhomogeneous PDE), in all the foregoing equations the term on the RHS is zero.³ We will restrict ourselves to this so-called homogeneous case and rewrite (12.9) as

∇²Ψ(r) + f(r)Ψ(r) = 0.  (12.10)
Depending on the geometry of the problem, Equation (12.10) is further separated into ODEs each involving a single coordinate of a suitable coordinate system. We shall see examples of all major coordinate systems (Cartesian, cylindrical, and spherical) in Chapter 19. For the rest of this chapter, we shall concentrate on some general aspects of the spherical coordinates.

²In most cases, α is chosen to be real. In the case of the Schrödinger equation, it is more convenient to choose α to be purely imaginary so that the i in the definition of L can be compensated. In all cases, the precise nature of α is determined by boundary conditions.
³Techniques for solving inhomogeneous PDEs are discussed in Chapters 21 and 22.
Jean Le Rond d'Alembert (1717-1783) was the illegitimate son of a famous salon hostess of eighteenth-century Paris and a cavalry officer. Abandoned by his mother, d'Alembert was raised by a foster family and later educated by the arrangement of his father at a nearby church-sponsored school, in which he received instruction in the classics and above-average instruction in mathematics. After studying law and medicine, he finally chose to pursue a career in mathematics. In the 1740s he joined the ranks of the philosophes,
a growing group of deistic and materialistic thinkers and writers who actively questioned
the social and intellectual standards of the day. He traveled little (he left France only once,
to visit the court of Frederick the Great), preferring instead the company of his friends in
the salons, among whom he was well known for his wit and laughter.
D'Alembert turned his mathematical and philosophical talents to many of the outstanding scientific problems of the day, with mixed success. Perhaps his most famous scientific work, entitled Traité de dynamique, shows his appreciation that a revolution was taking place in the science of mechanics: the formalization of the principles stated by Newton into a rigorous mathematical framework. The philosophy to which d'Alembert subscribed, however, refused to acknowledge the primacy of a concept as unclear and arbitrary as "force," introducing a certain awkwardness to his treatment and perhaps causing him to overlook the important principle of conservation of energy. Later, d'Alembert produced a treatise on fluid mechanics (the priority of which is still debated by historians), a paper dealing with vibrating strings (in which the wave equation makes its first appearance in physics), and a skillful treatment of celestial mechanics. D'Alembert is also credited with use of the first partial differential equation as well as the first solution to such an equation using separation of variables. (One should be careful interpreting "first": many of d'Alembert's predecessors and contemporaries gave similar, though less satisfactory, treatments of these milestones.) Perhaps his most well-known contribution to mathematics (at least among students) is the ratio test for the convergence of infinite series.
Much of the work for which d'Alembert is remembered occurred outside mathematical physics. He was chosen as the science editor of the Encyclopédie, and his lengthy Discours Préliminaire in that volume is considered one of the defining documents of the Enlightenment. Other works included writings on law, religion, and music.
Since d'Alembert's final years were not especially happy ones, perhaps this account of his life should end with a glimpse at the humanity his philosophy often gave his work. Like many of his contemporaries, he considered the problem of calculating the relative risk associated with the new practice of smallpox inoculation, which in rare cases caused the disease it was designed to prevent. Although not very successful in the mathematical sense, he was careful to point out that the probability of accidental infection, however slight or elegantly derived, would be small consolation to a father whose child died from the inoculation. It is greatly to his credit that d'Alembert did not believe such considerations irrelevant to the problem.
12.2 Separation of the Angular Part of the Laplacian
angular momentum
operator

commutation
relations between
components of
angular momentum
operator
With Cartesian and cylindrical variables, the boundary conditions are important in determining the nature of the solutions of the ODEs obtained from the PDE. In almost all applications, however, the angular part of the spherical variables can be separated and studied very generally. This is because the angular part of the Laplacian in the spherical coordinate system is closely related to the operation of rotation and the angular momentum, which are independent of any particular situation.
The separation of the angular part in spherical coordinates can be done in a fashion exactly analogous to the separation of time by writing Ψ as a product of three functions, each depending on only one of the variables. However, we will follow an approach that is used in quantum mechanical treatments of angular momentum. This approach, which is based on the operator algebra of Chapter 2 and is extremely powerful and elegant, gives solutions for the angular part in closed form.
Define the vector operator p as p = −i∇, so that its jth Cartesian component is p_j = −i∂/∂x_j, for j = 1, 2, 3. In quantum mechanics p (multiplied by ħ) is the momentum operator. It is easy to verify that⁴ [x_j, p_k] = iδ_{jk} and [x_j, x_k] = 0 = [p_j, p_k].
We can also define the angular momentum operator as L = r × p. This is expressed in components as L_i = (r × p)_i = ε_{ijk} x_j p_k for i = 1, 2, 3, where Einstein's summation convention (summing over repeated indices) is utilized.⁵ Using the commutation relations above, we obtain

[L_j, L_k] = iε_{jkl} L_l.
We will see shortly that L can be written solely in terms of the angles θ and φ. Moreover, there is one factor of p in the definition of L, so if we square L, we will get two factors of p, and a Laplacian may emerge in the expression for L·L. In this manner, we may be able to write ∇² in terms of L², which depends only on

⁴These operators act on the space of functions possessing enough "nice" properties as to render the space suitable. The operator x_j simply multiplies functions, while p differentiates them.
⁵It is assumed that the reader is familiar with vector algebra using indices and such objects as δ_{ij} and ε_{ijk}. For an introductory treatment, sufficient for our present discussion, see [Hass 99]. A more advanced treatment of these objects (tensors) can be found in Part VII of this book.
angles. Let us try this:

L² = L·L = Σ_{i=1}^3 L_i L_i = ε_{ijk} x_j p_k ε_{imn} x_m p_n = ε_{ijk} ε_{imn} x_j p_k x_m p_n
   = (δ_{jm} δ_{kn} − δ_{jn} δ_{km}) x_j p_k x_m p_n = x_j p_k x_j p_k − x_j p_k x_k p_j.

We need to write this expression in such a way that factors with the same index are next to each other, to give a dot product. We must also try, when possible, to keep the p factors to the right so that they can operate on functions without intervention from the x factors. We do this using the commutation relations between the x's and the p's:

L² = x_j(x_j p_k − iδ_{kj}) p_k − (p_k x_j + iδ_{kj}) x_k p_j
   = x_j x_j p_k p_k − i x_j p_j − p_k x_k x_j p_j − i x_j p_j
   = x_j x_j p_k p_k − 2i x_j p_j − (x_k p_k − iδ_{kk}) x_j p_j.
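The ε-δ contraction used in the first step, ε_{ijk} ε_{imn} = δ_{jm} δ_{kn} − δ_{jn} δ_{km}, involves only finitely many index values and is easy to verify exhaustively:

```python
# Brute-force check of eps_{ijk} eps_{imn} = d_{jm} d_{kn} - d_{jn} d_{km}
# over all index values 0, 1, 2 (with summation over the repeated index i).
def eps(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def delta(i, j):
    return 1 if i == j else 0

for j in range(3):
    for k in range(3):
        for m in range(3):
            for n in range(3):
                lhs = sum(eps(i, j, k) * eps(i, m, n) for i in range(3))
                assert lhs == delta(j, m) * delta(k, n) - delta(j, n) * delta(k, m)
print("epsilon-delta identity verified")
```

The closed-form `(i - j)(j - k)(k - i)/2` reproduces the Levi-Civita values ±1 and 0 for indices restricted to {0, 1, 2}.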
Recalling that 8kk = Ei~l 8kk = 3 and XjXj = E]~l XjXj = r·r= r
2
etc., we
can write L2 = r 2p.p+rr· p- (r. p)(i'. p), which, if we make the substitution
p= -iV, yields
V2 = _r-2L2 +r-2(r. V)(r· V) +r-2r · V.
Letting both sides act on the function $\psi(r, \theta, \varphi)$, we get
$$\nabla^2\psi = -\frac{1}{r^2}L^2\psi + \frac{1}{r^2}(\mathbf{r}\cdot\nabla)(\mathbf{r}\cdot\nabla)\psi + \frac{1}{r^2}\,\mathbf{r}\cdot\nabla\psi. \tag{12.11}$$
But we note that $\mathbf{r}\cdot\nabla = r\,\hat{\mathbf{e}}_r\cdot\nabla = r\,\partial/\partial r$. We thus get the final form of $\nabla^2\psi$ in
spherical coordinates, the Laplacian separated into angular and radial parts:
$$\nabla^2\psi = -\frac{1}{r^2}L^2\psi + \frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial\psi}{\partial r}\right) + \frac{1}{r}\frac{\partial\psi}{\partial r}. \tag{12.12}$$
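The equivalence of the radial part of (12.12) with the more familiar form $(1/r^2)\,\partial_r(r^2\,\partial_r\psi)$ of the spherical Laplacian can be spot-checked symbolically. The following is a minimal sketch using sympy; the function name `psi` is an illustrative placeholder, not notation from the text:

```python
import sympy as sp

r = sp.symbols('r', positive=True)
psi = sp.Function('psi')

# Radial part as it appears in Eq. (12.12): (1/r) d/dr(r dpsi/dr) + (1/r) dpsi/dr
radial_12_12 = sp.diff(r*sp.diff(psi(r), r), r)/r + sp.diff(psi(r), r)/r

# Familiar radial part of the spherical Laplacian: (1/r^2) d/dr(r^2 dpsi/dr)
radial_standard = sp.diff(r**2*sp.diff(psi(r), r), r)/r**2

print(sp.simplify(radial_12_12 - radial_standard))  # 0
```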
It is important to note that Equation (12.11) is a general relation that holds in
all coordinate systems. Although all the manipulations leading to it were done in
Cartesian coordinates, since it is written in vector notation, there is no indication
in the final form that it was derived using specific coordinates.
Equation (12.12) is the spherical version of (12.11) and is the version we shall
use. We will first make the simplifying assumption that in Equation (12.10), the
master equation, $f(\mathbf{r})$ is a function of $r$ only. Equation (12.10) then becomes
$$-\frac{1}{r^2}L^2\psi + \frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial\psi}{\partial r}\right) + \frac{1}{r}\frac{\partial\psi}{\partial r} + f(r)\psi = 0.$$
Assuming, for the time being, that $L^2$ depends only on $\theta$ and $\varphi$, and separating
$\psi$ into a product of two functions, $\psi(r, \theta, \varphi) = R(r)Y(\theta, \varphi)$, we can rewrite this
equation as
$$-\frac{1}{r^2}L^2(RY) + \frac{1}{r}\frac{\partial}{\partial r}\left[r\frac{\partial}{\partial r}(RY)\right] + \frac{1}{r}\frac{\partial}{\partial r}(RY) + f(r)RY = 0.$$
Dividing by $RY$ and multiplying by $r^2$ yields
$$\underbrace{-\frac{1}{Y}L^2(Y)}_{-\alpha} + \underbrace{\frac{r}{R}\frac{d}{dr}\left(r\frac{dR}{dr}\right) + \frac{r}{R}\frac{dR}{dr} + r^2 f(r)}_{+\alpha} = 0.$$
The first term depends only on $\theta$ and $\varphi$, while the remaining terms depend only on $r$; each group must therefore equal a constant, which we call $-\alpha$ and $+\alpha$, respectively. Thus
$$L^2 Y = \alpha Y \tag{12.13}$$
and
$$\frac{d^2R}{dr^2} + \frac{2}{r}\frac{dR}{dr} + \left[f(r) - \frac{\alpha}{r^2}\right]R = 0. \tag{12.14}$$
We will concentrate on the angular part, Equation (12.13), leaving the radial part to
the general discussion of ODEs. The rest of this subsection will focus on showing
that $L_1 \equiv L_x$, $L_2 \equiv L_y$, and $L_3 \equiv L_z$ are independent of $r$.
Since $L_i$ is an operator, we can study its action on an arbitrary function $f$.
Thus, $L_i f = -i\epsilon_{ijk}x_j\partial_k f \equiv -i\epsilon_{ijk}x_j\,\partial f/\partial x_k$. We can express the Cartesian
$x_j$ in terms of $r$, $\theta$, and $\varphi$, and use the chain rule to express $\partial f/\partial x_k$ in terms of
spherical coordinates. This will give us $L_i f$ expressed in terms of $r$, $\theta$, and $\varphi$. It
will then emerge that $r$ is absent in the final expression.
Let us start with $x = r\sin\theta\cos\varphi$, $y = r\sin\theta\sin\varphi$, $z = r\cos\theta$, and their
inverses, $r = (x^2 + y^2 + z^2)^{1/2}$, $\cos\theta = z/r$, $\tan\varphi = y/x$, and express the
Cartesian derivatives in terms of spherical coordinates using the chain rule. The
first such derivative is
$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial r}\frac{\partial r}{\partial x} + \frac{\partial f}{\partial \theta}\frac{\partial \theta}{\partial x} + \frac{\partial f}{\partial \varphi}\frac{\partial \varphi}{\partial x}. \tag{12.15}$$
The derivative of one coordinate system with respect to the other can be easily
calculated. For example, $\partial r/\partial x = x/r = \sin\theta\cos\varphi$, and differentiating both
sides of the equation $\cos\theta = z/r$, we obtain
$$-\sin\theta\,\frac{\partial\theta}{\partial x} = -\frac{z}{r^2}\frac{\partial r}{\partial x} = -\frac{zx}{r^3} = -\frac{\cos\theta\sin\theta\cos\varphi}{r} \;\Longrightarrow\; \frac{\partial\theta}{\partial x} = \frac{\cos\theta\cos\varphi}{r}.$$
Finally, differentiating both sides of $\tan\varphi = y/x$ with respect to $x$ yields $\partial\varphi/\partial x =
-\sin\varphi/(r\sin\theta)$. Using these expressions in Equation (12.15), we get
$$\frac{\partial f}{\partial x} = \sin\theta\cos\varphi\,\frac{\partial f}{\partial r} + \frac{\cos\theta\cos\varphi}{r}\frac{\partial f}{\partial \theta} - \frac{\sin\varphi}{r\sin\theta}\frac{\partial f}{\partial \varphi}.$$
In exactly the same way, we obtain
$$\frac{\partial f}{\partial y} = \sin\theta\sin\varphi\,\frac{\partial f}{\partial r} + \frac{\cos\theta\sin\varphi}{r}\frac{\partial f}{\partial \theta} + \frac{\cos\varphi}{r\sin\theta}\frac{\partial f}{\partial \varphi},$$
$$\frac{\partial f}{\partial z} = \cos\theta\,\frac{\partial f}{\partial r} - \frac{\sin\theta}{r}\frac{\partial f}{\partial \theta}.$$
We can now calculate Lx by letting it act on an arbitrary function and expressing
all Cartesian coordinates and derivatives in terms of spherical coordinates. The
result is
$$L_x f = -iy\frac{\partial f}{\partial z} + iz\frac{\partial f}{\partial y} = i\left(\sin\varphi\frac{\partial}{\partial\theta} + \cot\theta\cos\varphi\frac{\partial}{\partial\varphi}\right)f,$$
or, for the Cartesian components of the angular momentum operator expressed in spherical coordinates,
$$L_x = i\left(\sin\varphi\frac{\partial}{\partial\theta} + \cot\theta\cos\varphi\frac{\partial}{\partial\varphi}\right). \tag{12.16}$$
Analogous arguments yield
$$L_y = i\left(-\cos\varphi\frac{\partial}{\partial\theta} + \cot\theta\sin\varphi\frac{\partial}{\partial\varphi}\right), \qquad L_z = -i\frac{\partial}{\partial\varphi}. \tag{12.17}$$
It is left as a problem for the reader to show that by adding the squares of the
components of the angular momentum operator, one obtains the angular momentum
squared as a differential operator in $\theta$ and $\varphi$:
$$L^2 = -\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) - \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\varphi^2}. \tag{12.18}$$
Substitution in Equation (12.12) yields the familiar expression for the Laplacian
in spherical coordinates.
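That check, left to the reader in the text, can also be sketched with sympy: apply the operators of (12.16) and (12.17) twice to a generic function and compare the sum of the squares with (12.18). The generic function `f` is a placeholder, not notation from the text:

```python
import sympy as sp

theta, phi = sp.symbols('theta phi')
f = sp.Function('f')(theta, phi)

# Differential operators from Eqs. (12.16) and (12.17)
Lx = lambda g: sp.I*( sp.sin(phi)*sp.diff(g, theta) + sp.cot(theta)*sp.cos(phi)*sp.diff(g, phi))
Ly = lambda g: sp.I*(-sp.cos(phi)*sp.diff(g, theta) + sp.cot(theta)*sp.sin(phi)*sp.diff(g, phi))
Lz = lambda g: -sp.I*sp.diff(g, phi)

L2f = Lx(Lx(f)) + Ly(Ly(f)) + Lz(Lz(f))

# L^2 as given in Eq. (12.18)
target = -(sp.diff(sp.sin(theta)*sp.diff(f, theta), theta)/sp.sin(theta)
           + sp.diff(f, phi, 2)/sp.sin(theta)**2)

print(sp.simplify(sp.expand(L2f - target)))  # expect 0
```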
12.3 Construction of Eigenvalues of L²
Now that we have $L^2$ in terms of $\theta$ and $\varphi$, we could substitute in Equation (12.13),
separate the $\theta$ and $\varphi$ dependence, and solve the corresponding ODEs. However,
there is a much more elegant way of solving this problem algebraically, because
Equation (12.13) is simply an eigenvalue equation for $L^2$. In this section, we will
find the eigenvalues of $L^2$; the next section will evaluate its eigenvectors.
Let us consider $L^2$ as an abstract operator and write (12.13) as
$$L^2\,|Y\rangle = \alpha\,|Y\rangle,$$
where $|Y\rangle$ is an abstract vector whose $(\theta, \varphi)$th component can be calculated later.
Since $L^2$ is a differential operator, it does not have a (finite-dimensional) matrix
representation. Thus, the determinantal procedure for calculating eigenvalues and
eigenfunctions will not work here, and we have to find another way.
The equation above specifies an eigenvalue, $\alpha$, and an eigenvector, $|Y\rangle$. There
may be more than one $|Y\rangle$ corresponding to the same $\alpha$. To distinguish among these
so-called degenerate eigenvectors, we choose a second operator, say $L_3 \in \{L_i\}$, that
commutes with $L^2$. This allows us to select a basis in which both $L^2$ and $L_3$ are
diagonal, or, equivalently, a basis whose vectors are simultaneous eigenvectors
of both $L^2$ and $L_3$. This is possible by Theorem 4.4.15 and the fact that both $L^2$
and $L_3$ are hermitian operators in the space of square-integrable functions. (The
proof is left as a problem.) In general, we would want to continue adding operators
until we obtained a maximal set of commuting operators which could label the
eigenvectors. In this case, $L^2$ and $L_3$ exhaust the set.6 Using the more common
subscripts $x$, $y$, and $z$ instead of 1, 2, 3 and attaching labels to the eigenvectors,
we have
$$L^2\,|Y_{\alpha,\beta}\rangle = \alpha\,|Y_{\alpha,\beta}\rangle, \qquad L_z\,|Y_{\alpha,\beta}\rangle = \beta\,|Y_{\alpha,\beta}\rangle. \tag{12.19}$$
The hermiticity of $L^2$ and $L_z$ implies the reality of $\alpha$ and $\beta$. Next we need to
determine the possible values for $\alpha$ and $\beta$.
Define two new operators $L_+ \equiv L_x + iL_y$ and $L_- \equiv L_x - iL_y$, the angular momentum raising and lowering operators. It is then easily
verified that
$$[L^2, L_\pm] = 0 \qquad\text{and}\qquad [L_z, L_\pm] = \pm L_\pm. \tag{12.20}$$
The first equation implies that $L_\pm$ are invariant operators when acting in the subspace
corresponding to the eigenvalue $\alpha$; that is, $L_\pm\,|Y_{\alpha,\beta}\rangle$ are eigenvectors of $L^2$
with the same eigenvalue $\alpha$:
$$L^2(L_\pm\,|Y_{\alpha,\beta}\rangle) = L_\pm(L^2\,|Y_{\alpha,\beta}\rangle) = \alpha L_\pm\,|Y_{\alpha,\beta}\rangle.$$
The second equation in (12.20) yields
$$\begin{aligned}
L_z(L_+\,|Y_{\alpha,\beta}\rangle) &= (L_zL_+)\,|Y_{\alpha,\beta}\rangle = (L_+L_z + L_+)\,|Y_{\alpha,\beta}\rangle \\
&= L_+L_z\,|Y_{\alpha,\beta}\rangle + L_+\,|Y_{\alpha,\beta}\rangle = \beta L_+\,|Y_{\alpha,\beta}\rangle + L_+\,|Y_{\alpha,\beta}\rangle \\
&= (\beta + 1)L_+\,|Y_{\alpha,\beta}\rangle.
\end{aligned}$$
This indicates that $L_+\,|Y_{\alpha,\beta}\rangle$ has one more unit of the $L_z$ eigenvalue than $|Y_{\alpha,\beta}\rangle$
does. In other words, $L_+$ raises the eigenvalue of $L_z$ by one unit. That is why $L_+$
is called a raising operator. Similarly, $L_-$ is called a lowering operator because
$L_z(L_-\,|Y_{\alpha,\beta}\rangle) = (\beta - 1)L_-\,|Y_{\alpha,\beta}\rangle$.
We can summarize the above discussion as
$$L_\pm\,|Y_{\alpha,\beta}\rangle = C_\pm\,|Y_{\alpha,\beta\pm 1}\rangle,$$
6We could just as well have chosen $L^2$ and any other component as our maximal set. However, $L^2$ and $L_3$ is the universally
accepted choice.
where C± are constants to be determined by a suitable normalization.
There are restrictions on (and relations between) $\alpha$ and $\beta$. First note that as $L^2$
is a sum of squares of hermitian operators, it must be a positive operator; that is,
$\langle a|\,L^2\,|a\rangle \geq 0$ for all $|a\rangle$. In particular,
$$0 \leq \langle Y_{\alpha,\beta}|\,L^2\,|Y_{\alpha,\beta}\rangle = \alpha\,\langle Y_{\alpha,\beta}|Y_{\alpha,\beta}\rangle = \alpha\|Y_{\alpha,\beta}\|^2.$$
Therefore, $\alpha \geq 0$. Next, one can readily show that
$$L^2 = L_+L_- + L_z^2 - L_z = L_-L_+ + L_z^2 + L_z. \tag{12.21}$$
Sandwiching both sides of the first equality between $|Y_{\alpha,\beta}\rangle$ and $\langle Y_{\alpha,\beta}|$ yields
$$\langle Y_{\alpha,\beta}|\,L^2\,|Y_{\alpha,\beta}\rangle = \langle Y_{\alpha,\beta}|\,L_+L_-\,|Y_{\alpha,\beta}\rangle + \langle Y_{\alpha,\beta}|\,L_z^2\,|Y_{\alpha,\beta}\rangle - \langle Y_{\alpha,\beta}|\,L_z\,|Y_{\alpha,\beta}\rangle,$$
with an analogous expression involving $L_-L_+$. Using the fact that $L_+ = (L_-)^\dagger$,
we get
$$\begin{aligned}
\alpha\|Y_{\alpha,\beta}\|^2 &= \langle Y_{\alpha,\beta}|\,L_+L_-\,|Y_{\alpha,\beta}\rangle + \beta^2\|Y_{\alpha,\beta}\|^2 - \beta\|Y_{\alpha,\beta}\|^2 \\
&= \langle Y_{\alpha,\beta}|\,L_-L_+\,|Y_{\alpha,\beta}\rangle + \beta^2\|Y_{\alpha,\beta}\|^2 + \beta\|Y_{\alpha,\beta}\|^2 \\
&= \|L_\mp\,|Y_{\alpha,\beta}\rangle\|^2 + \beta^2\|Y_{\alpha,\beta}\|^2 \mp \beta\|Y_{\alpha,\beta}\|^2. \tag{12.22}
\end{aligned}$$
Because of the positivity of norms, this yields $\alpha \geq \beta^2 - \beta$ and $\alpha \geq \beta^2 + \beta$. Adding
these two inequalities gives $2\alpha \geq 2\beta^2 \Rightarrow -\sqrt{\alpha} \leq \beta \leq \sqrt{\alpha}$. It follows that the
values of $\beta$ are bounded. That is, there exist a maximum $\beta$, denoted by $\beta_+$, and
a minimum $\beta$, denoted by $\beta_-$, beyond which there are no more values of $\beta$. This
can happen only if
$$L_+\,|Y_{\alpha,\beta_+}\rangle = 0 \qquad\text{and}\qquad L_-\,|Y_{\alpha,\beta_-}\rangle = 0,$$
because if $L_\pm\,|Y_{\alpha,\beta_\pm}\rangle$ are not zero, then they must have values of $\beta$ corresponding
to $\beta_\pm \pm 1$, which are not allowed.
Using $\beta_+$ for $\beta$ in Equation (12.22) yields
$$(\alpha - \beta_+^2 - \beta_+)\|Y_{\alpha,\beta_+}\|^2 = 0.$$
By definition $|Y_{\alpha,\beta_+}\rangle \neq 0$ (otherwise $\beta_+ - 1$ would be the maximum). Thus, we
obtain $\alpha = \beta_+^2 + \beta_+$. An analogous procedure using $\beta_-$ for $\beta$ yields $\alpha = \beta_-^2 - \beta_-$.
We solve these two equations for $\beta_+$ and $\beta_-$:
$$\beta_+ = \tfrac{1}{2}\bigl(-1 \pm \sqrt{1 + 4\alpha}\bigr), \qquad \beta_- = \tfrac{1}{2}\bigl(1 \pm \sqrt{1 + 4\alpha}\bigr).$$
Since $\beta_+ \geq \beta_-$ and $\sqrt{1 + 4\alpha} \geq 1$, we must choose
$$\beta_+ = \tfrac{1}{2}\bigl(-1 + \sqrt{1 + 4\alpha}\bigr) = -\beta_-.$$
Starting with $|Y_{\alpha,\beta_+}\rangle$, we can apply $L_-$ to it repeatedly. In each step we decrease
the value of $\beta$ by one unit. There must be a limit to the number of vectors obtained
in this way, because $\beta$ has a minimum. Therefore, there must exist a nonnegative
integer $k$ such that
$$(L_-)^{k+1}\,|Y_{\alpha,\beta_+}\rangle = L_-\bigl(L_-^k\,|Y_{\alpha,\beta_+}\rangle\bigr) = 0.$$
Thus, $L_-^k\,|Y_{\alpha,\beta_+}\rangle$ must be proportional to $|Y_{\alpha,\beta_-}\rangle$. In particular, since $L_-^k\,|Y_{\alpha,\beta_+}\rangle$
has a $\beta$ value equal to $\beta_+ - k$, we have $\beta_- = \beta_+ - k$. Now, using $\beta_- = -\beta_+$
(derived above) yields the important result
$$\beta_+ = \frac{k}{2} \equiv j \qquad\text{for } k \in \mathbb{N},$$
or $\alpha = j(j + 1)$, since $\alpha = \beta_+^2 + \beta_+$. This result is important enough to be stated
as a theorem.
12.3.1. Theorem. The eigenvectors of $L^2$, denoted by $|Y_{jm}\rangle$, satisfy the eigenvalue
relations
$$L^2\,|Y_{jm}\rangle = j(j + 1)\,|Y_{jm}\rangle, \qquad L_z\,|Y_{jm}\rangle = m\,|Y_{jm}\rangle,$$
where $j$ is a positive integer or half-integer, and $m$ can take a value in the set
$\{-j, -j + 1, \ldots, j - 1, j\}$ of $2j + 1$ numbers.
Let us briefly consider the normalization of the eigenvectors. We already know
that the $|Y_{jm}\rangle$, being eigenvectors of the hermitian operators $L^2$ and $L_z$, are orthogonal.
We also demand that they be of unit norm; that is,
$$\langle Y_{jm}|Y_{j'm'}\rangle = \delta_{jj'}\delta_{mm'}. \tag{12.23}$$
This will determine the constants $C_\pm$ introduced earlier. Let us consider $C_+$ first,
which is defined by $L_+\,|Y_{jm}\rangle = C_+\,|Y_{j,m+1}\rangle$. The hermitian conjugate of this
equation is $\langle Y_{jm}|\,L_- = C_+^*\,\langle Y_{j,m+1}|$. We contract these two equations to get
$\langle Y_{jm}|\,L_-L_+\,|Y_{jm}\rangle = |C_+|^2\,\langle Y_{j,m+1}|Y_{j,m+1}\rangle$. Then we use the second relation
in Equation (12.21), Theorem 12.3.1, and (12.23) to obtain
$$j(j + 1) - m(m + 1) = |C_+|^2 \;\Longrightarrow\; |C_+| = \sqrt{j(j + 1) - m(m + 1)}.$$
Adopting the convention that the argument (phase) of the complex number $C_+$ is
zero (and therefore that $C_+$ is real), we get
$$C_+ = \sqrt{j(j + 1) - m(m + 1)}.$$
Similarly, $C_- = \sqrt{j(j + 1) - m(m - 1)}$. Thus, we get
$$L_+\,|Y_{jm}\rangle = \sqrt{j(j + 1) - m(m + 1)}\,|Y_{j,m+1}\rangle, \qquad L_-\,|Y_{jm}\rangle = \sqrt{j(j + 1) - m(m - 1)}\,|Y_{j,m-1}\rangle. \tag{12.24}$$
12.3.2. Example. Let us find an expression for $|Y_{lm}\rangle$ by repeatedly applying $L_-$ to $|Y_{ll}\rangle$.
The action of $L_-$ is completely described by Equation (12.24). For the first power of $L_-$,
we obtain
$$L_-\,|Y_{ll}\rangle = \sqrt{l(l + 1) - l(l - 1)}\,|Y_{l,l-1}\rangle = \sqrt{2l}\,|Y_{l,l-1}\rangle.$$
We apply $L_-$ once more:
$$(L_-)^2\,|Y_{ll}\rangle = \sqrt{2l}\,L_-\,|Y_{l,l-1}\rangle = \sqrt{2l}\sqrt{l(l + 1) - (l - 1)(l - 2)}\,|Y_{l,l-2}\rangle = \sqrt{2l}\sqrt{2(2l - 1)}\,|Y_{l,l-2}\rangle = \sqrt{2(2l)(2l - 1)}\,|Y_{l,l-2}\rangle.$$
Applying $L_-$ a third time yields
$$(L_-)^3\,|Y_{ll}\rangle = \sqrt{2(2l)(2l - 1)}\,L_-\,|Y_{l,l-2}\rangle = \sqrt{2(2l)(2l - 1)}\sqrt{6(l - 1)}\,|Y_{l,l-3}\rangle = \sqrt{3!(2l)(2l - 1)(2l - 2)}\,|Y_{l,l-3}\rangle.$$
The pattern suggests the following formula for a general power $k$:
$$L_-^k\,|Y_{ll}\rangle = \sqrt{k!(2l)(2l - 1)\cdots(2l - k + 1)}\,|Y_{l,l-k}\rangle,$$
or $L_-^k\,|Y_{ll}\rangle = \sqrt{k!(2l)!/(2l - k)!}\,|Y_{l,l-k}\rangle$. If we set $l - k = m$ and solve for $|Y_{l,m}\rangle$, we get
$$|Y_{l,m}\rangle = \sqrt{\frac{(l + m)!}{(l - m)!(2l)!}}\,L_-^{l-m}\,|Y_{ll}\rangle.$$
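The ladder-operator algebra can be realized concretely by finite matrices, which makes Theorem 12.3.1 and Equations (12.21) and (12.24) easy to spot-check numerically. The following sketch (the helper name is illustrative, not from the text) builds the matrices for $j = 3/2$:

```python
import numpy as np

def angular_momentum_matrices(j):
    """Build Lz, L+, L- in the basis |j, m>, with m = j, j-1, ..., -j."""
    dim = int(round(2*j)) + 1
    m = j - np.arange(dim)                      # Lz eigenvalues, descending
    Lz = np.diag(m)
    # L+|j,m> = sqrt(j(j+1) - m(m+1)) |j,m+1>   (Eq. 12.24)
    Lp = np.zeros((dim, dim))
    for k in range(1, dim):                     # column k holds the state with m[k]
        Lp[k-1, k] = np.sqrt(j*(j+1) - m[k]*(m[k]+1))
    Lm = Lp.T                                   # L- = (L+)^dagger (real matrix here)
    return Lz, Lp, Lm

j = 3/2
Lz, Lp, Lm = angular_momentum_matrices(j)
L2 = Lp @ Lm + Lz @ Lz - Lz                     # first equality in Eq. (12.21)
print(np.allclose(L2, j*(j+1)*np.eye(int(2*j) + 1)))   # True: L^2 = j(j+1) 1
print(np.allclose(Lz @ Lp - Lp @ Lz, Lp))              # True: [Lz, L+] = L+
```

The same construction works for any integer or half-integer $j$, echoing the theorem's statement that both families occur.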
The discussion in this section is the standard treatment of angular momentum in
quantum mechanics. In the context of quantum mechanics, Theorem 12.3.1 states
the far-reaching physical result that particles can have integer or half-integer spin.
Such a conclusion is tied to the rotation group in three dimensions, which, in turn, is
an example of a Lie group, or a continuous group of transformations. We shall come
back to a study of groups later. It is worth noting that it was the study of differential
equations that led the Norwegian mathematician Sophus Lie to the investigation
of their symmetries and the development of the beautiful branch of mathematics
and theoretical physics that bears his name. Thus, the existence of a connection
between group theory (rotation, angular momentum) and the differential equation
we are trying to solve should not come as a surprise.
12.4 Eigenvectors of L²: Spherical Harmonics
The treatment in the preceding section took place in an abstract vector space. Let
us go back to the function space and represent the operators and vectors in terms
of $\theta$ and $\varphi$.
First, let us consider $L_z$ in the form of a differential operator, as given in
Equation (12.17). The eigenvalue equation for $L_z$ becomes
$$-i\frac{\partial}{\partial\varphi}Y_{jm}(\theta, \varphi) = m\,Y_{jm}(\theta, \varphi).$$
We write $Y_{jm}(\theta, \varphi) = P_{jm}(\theta)Q_{jm}(\varphi)$ and substitute in the above equation to
obtain the ODE for $\varphi$, $dQ_{jm}/d\varphi = imQ_{jm}$, which has a solution of the form
$Q_{jm}(\varphi) = C_{jm}e^{im\varphi}$, where $C_{jm}$ is a constant. Absorbing this constant into $P_{jm}$,
we can write
$$Y_{jm}(\theta, \varphi) = P_{jm}(\theta)e^{im\varphi}.$$
In classical physics the value of functions must be the same at $\varphi$ as at $\varphi + 2\pi$.
This condition restricts the values of $m$ to integers. In quantum mechanics, on the
other hand, it is the absolute values of functions that are physically measurable
quantities, and therefore $m$ can also be a half-integer.
12.4.1. Box. From now on, we shall assume that $m$ is an integer and denote
the eigenvectors of $L^2$ by $Y_{lm}(\theta, \varphi)$, in which $l$ is a nonnegative integer.
Our task is to find an analytic expression for $Y_{lm}(\theta, \varphi)$. We need differential
expressions for $L_\pm$. These can easily be obtained from the expressions for $L_x$ and
$L_y$ given in Equations (12.16) and (12.17). (The straightforward manipulations are
left as a problem.) We thus have
$$L_\pm = e^{\pm i\varphi}\left(\pm\frac{\partial}{\partial\theta} + i\cot\theta\frac{\partial}{\partial\varphi}\right). \tag{12.25}$$
Since $l$ is the highest value of $m$, when $L_+$ acts on $Y_{ll}(\theta, \varphi) = P_{ll}(\theta)e^{il\varphi}$ the result
must be zero. This leads to the differential equation
$$\left(\frac{\partial}{\partial\theta} + i\cot\theta\frac{\partial}{\partial\varphi}\right)\bigl[P_{ll}(\theta)e^{il\varphi}\bigr] = 0 \;\Longrightarrow\; \left(\frac{d}{d\theta} - l\cot\theta\right)P_{ll}(\theta) = 0.$$
The solution to this differential equation is readily found to be
$$P_{ll}(\theta) = C_l(\sin\theta)^l.$$
The constant is subscripted because each $P_{ll}$ may lead to a different constant of
integration. We can now write $Y_{ll}(\theta, \varphi) = C_l(\sin\theta)^l e^{il\varphi}$.
With $Y_{ll}(\theta, \varphi)$ at our disposal, we can obtain any $Y_{lm}(\theta, \varphi)$ by repeated application
of $L_-$. In principle, the result of Example 12.3.2 gives all the (abstract)
eigenvectors. In practice, however, it is helpful to have a closed form (in terms
of derivatives) for just the $\theta$ part of $Y_{lm}(\theta, \varphi)$. So, let us apply $L_-$, as given in
Equation (12.25), to $Y_{ll}(\theta, \varphi)$:
$$\begin{aligned}
L_-Y_{ll} &= e^{-i\varphi}\left(-\frac{\partial}{\partial\theta} + i\cot\theta\frac{\partial}{\partial\varphi}\right)\bigl[P_{ll}(\theta)e^{il\varphi}\bigr] = e^{-i\varphi}\left[-\frac{d}{d\theta} + i\cot\theta\,(il)\right]\bigl[P_{ll}(\theta)e^{il\varphi}\bigr] \\
&= (-1)e^{i(l-1)\varphi}\left(\frac{d}{d\theta} + l\cot\theta\right)P_{ll}(\theta).
\end{aligned}$$
It can be shown that for $n$ a positive integer,
$$\left(\frac{d}{d\theta} + n\cot\theta\right)f(\theta) = \frac{1}{\sin^n\theta}\frac{d}{d\theta}\bigl[\sin^n\theta\,f(\theta)\bigr]. \tag{12.26}$$
Using this result yields
$$L_-Y_{ll} = \sqrt{2l}\,Y_{l,l-1} = \sqrt{2l}\,e^{i(l-1)\varphi}P_{l,l-1}(\theta) = (-1)e^{i(l-1)\varphi}\frac{1}{\sin^l\theta}\frac{d}{d\theta}\bigl[\sin^l\theta\,(C_l\sin^l\theta)\bigr] = (-1)C_l\,\frac{e^{i(l-1)\varphi}}{\sin^l\theta}\frac{d}{d\theta}\bigl(\sin^{2l}\theta\bigr). \tag{12.27}$$
We apply $L_-$ to (12.27), and use Equation (12.26) with $n = l - 1$ to obtain $(L_-)^2Y_{ll}$.
With a little more effort one can detect a pattern and obtain
$$L_-^k Y_{ll} = C_l\,\frac{e^{i(l-k)\varphi}}{(1 - u^2)^{(l-k)/2}}\frac{d^k}{du^k}\bigl[(1 - u^2)^l\bigr],$$
where $u = \cos\theta$. If we let $k = l - m$ and make use of the result obtained in Example 12.3.2, we
obtain
$$Y_{lm}(\theta, \varphi) = \sqrt{\frac{(l + m)!}{(l - m)!(2l)!}}\,C_l\,\frac{e^{im\varphi}}{(1 - u^2)^{m/2}}\frac{d^{l-m}}{du^{l-m}}\bigl[(1 - u^2)^l\bigr].$$
To specify $Y_{lm}(\theta, \varphi)$ completely, we need to evaluate $C_l$. Since $C_l$ does not depend
on $m$, we set $m = 0$ in the above expression, obtaining
$$Y_{l0}(u, \varphi) = \frac{1}{\sqrt{(2l)!}}\,C_l\,\frac{d^l}{du^l}\bigl[(1 - u^2)^l\bigr].$$
The RHS looks very much like the Legendre polynomials of Chapter 7. In fact,
$$Y_{l0}(u, \varphi) = \frac{C_l}{\sqrt{(2l)!}}(-1)^l\,2^l l!\,P_l(u) \equiv A_l P_l(u). \tag{12.28}$$
Therefore, the normalization of $Y_{l0}$ and the Legendre polynomials $P_l$ determines
$C_l$.
We now use Equation (6.9) to obtain the integral form of the orthonormality
relation for $Y_{lm}$:
$$\int_0^{2\pi}d\varphi\int_0^\pi Y^*_{l'm'}(\theta, \varphi)Y_{lm}(\theta, \varphi)\sin\theta\,d\theta = \delta_{ll'}\delta_{mm'}, \tag{12.29}$$
which in terms of $u = \cos\theta$ becomes
$$\int_0^{2\pi}d\varphi\int_{-1}^{1} Y^*_{l'm'}(u, \varphi)Y_{lm}(u, \varphi)\,du = \delta_{ll'}\delta_{mm'}. \tag{12.30}$$
Problem 12.15 shows that using (12.29) one gets $A_l = \sqrt{(2l + 1)/(4\pi)}$. Therefore,
Equation (12.28) yields not only the value of $C_l$, but also the useful relation
$$Y_{l0}(u, \varphi) = \sqrt{\frac{2l + 1}{4\pi}}\,P_l(u).$$
Substituting the value of $C_l$ thus obtained, we finally get the spherical harmonics
$$Y_{lm}(\theta, \varphi) = \frac{(-1)^l}{2^l l!}\sqrt{\frac{(2l + 1)(l + m)!}{4\pi(l - m)!}}\;e^{im\varphi}\,\frac{1}{(1 - u^2)^{m/2}}\frac{d^{l-m}}{du^{l-m}}\bigl[(1 - u^2)^l\bigr], \tag{12.31}$$
where $u = \cos\theta$. These functions, the eigenfunctions of $L^2$ and $L_z$, are called
spherical harmonics. They occur frequently in those physical applications for
which the Laplacian is expressed in terms of spherical coordinates.
One can immediately read off the $\theta$ part of the spherical harmonics:
$$P_{lm}(u) = \frac{(-1)^l}{2^l l!}\sqrt{\frac{(2l + 1)(l + m)!}{4\pi(l - m)!}}\,\frac{1}{(1 - u^2)^{m/2}}\frac{d^{l-m}}{du^{l-m}}\bigl[(1 - u^2)^l\bigr]. \tag{12.32}$$
However, this is not the version used in the literature. For historical reasons the
associated Legendre functions $P_l^m(u)$ are used. These are defined by
$$P_l^m(u) = (-1)^m\sqrt{\frac{(l + m)!}{(l - m)!}}\sqrt{\frac{4\pi}{2l + 1}}\,P_{lm}(u) = \frac{(-1)^{l+m}}{2^l l!}\frac{(l + m)!}{(l - m)!}\,(1 - u^2)^{-m/2}\frac{d^{l-m}}{du^{l-m}}\bigl[(1 - u^2)^l\bigr]. \tag{12.33}$$
Thus,
$$Y_{lm}(\theta, \varphi) = (-1)^m\sqrt{\frac{(2l + 1)(l - m)!}{4\pi(l + m)!}}\;P_l^m(\cos\theta)\,e^{im\varphi}. \tag{12.34}$$
We generated the spherical harmonics starting with $Y_{ll}(\theta, \varphi)$ and applying the
lowering operator $L_-$. We could have started with $Y_{l,-l}(\theta, \varphi)$ instead, and applied
the raising operator $L_+$. The latter procedure is identical to the former; nevertheless,
we outline it below because of some important relations that emerge along the way.
We first note that
$$|Y_{l,-m}\rangle = \sqrt{\frac{(l + m)!}{(l - m)!(2l)!}}\,L_+^{l-m}\,|Y_{l,-l}\rangle. \tag{12.35}$$
(This can be obtained following the steps of Example 12.3.2.) Next, we use
$L_-\,|Y_{l,-l}\rangle = 0$ in differential form to obtain
$$\left(\frac{d}{d\theta} - l\cot\theta\right)P_{l,-l}(\theta) = 0,$$
which has the same form as the differential equation for $P_{ll}$. Thus, the solution is
$P_{l,-l}(\theta) = C_l'(\sin\theta)^l$, and
$$Y_{l,-l}(\theta, \varphi) = P_{l,-l}(\theta)e^{-il\varphi} = C_l'(\sin\theta)^l e^{-il\varphi}.$$
Applying $L_+$ repeatedly yields
$$L_+^k Y_{l,-l}(u, \varphi) = C_l'\,\frac{(-1)^k e^{-i(l-k)\varphi}}{(1 - u^2)^{(l-k)/2}}\frac{d^k}{du^k}\bigl[(1 - u^2)^l\bigr],$$
where $u = \cos\theta$. Substituting $k = l - m$ and using Equation (12.35) gives
$$Y_{l,-m}(u, \varphi) = \sqrt{\frac{(l + m)!}{(l - m)!(2l)!}}\,C_l'\,\frac{(-1)^{l-m}e^{-im\varphi}}{(1 - u^2)^{m/2}}\frac{d^{l-m}}{du^{l-m}}\bigl[(1 - u^2)^l\bigr].$$
The constant $C_l'$ can be determined as before. In fact, for $m = 0$ we get exactly
the same result as before, so we expect $C_l'$ to be identical to $C_l$. Comparison with
Equation (12.32) yields
$$Y_{l,-m}(\theta, \varphi) = (-1)^m Y^*_{lm}(\theta, \varphi), \tag{12.36}$$
and using the definition $Y_{l,-m}(\theta, \varphi) = P_{l,-m}(\theta)e^{-im\varphi}$ and the first part of Equation
(12.33), we obtain
$$P_l^{-m}(\theta) = (-1)^m\frac{(l - m)!}{(l + m)!}\,P_l^m(\theta). \tag{12.37}$$
The first few spherical harmonics with positive m are given below. Those with
negative m can be obtained using Equation (12.36).
For $l = 0$,
$$Y_{00} = \frac{1}{\sqrt{4\pi}}.$$
For $l = 1$,
$$Y_{10} = \sqrt{\frac{3}{4\pi}}\cos\theta, \qquad Y_{11} = -\sqrt{\frac{3}{8\pi}}\,e^{i\varphi}\sin\theta.$$
For $l = 2$,
$$Y_{20} = \sqrt{\frac{5}{16\pi}}\,(3\cos^2\theta - 1), \qquad Y_{21} = -\sqrt{\frac{15}{8\pi}}\,e^{i\varphi}\sin\theta\cos\theta, \qquad Y_{22} = \sqrt{\frac{15}{32\pi}}\,e^{2i\varphi}\sin^2\theta.$$
For $l = 3$,
$$Y_{30} = \sqrt{\frac{7}{16\pi}}\,(5\cos^3\theta - 3\cos\theta), \qquad Y_{31} = -\sqrt{\frac{21}{64\pi}}\,e^{i\varphi}\sin\theta\,(5\cos^2\theta - 1),$$
$$Y_{32} = \sqrt{\frac{105}{32\pi}}\,e^{2i\varphi}\sin^2\theta\cos\theta, \qquad Y_{33} = -\sqrt{\frac{35}{64\pi}}\,e^{3i\varphi}\sin^3\theta.$$
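The closed form (12.31) can be checked against the table above numerically. Here is a minimal sympy sketch; the helper name `Ylm` is hypothetical, and only $m \geq 0$ is handled:

```python
import sympy as sp

theta, phi = sp.symbols('theta phi')

def Ylm(l, m):
    """Spherical harmonic built directly from the closed form (12.31), m >= 0."""
    u, w = sp.cos(theta), sp.symbols('w')
    pref = sp.Integer(-1)**l/(2**l*sp.factorial(l))*sp.sqrt(
        (2*l + 1)*sp.factorial(l + m)/(4*sp.pi*sp.factorial(l - m)))
    deriv = sp.diff((1 - w**2)**l, w, l - m).subs(w, u)
    return pref*sp.exp(sp.I*m*phi)*(1 - u**2)**(-sp.Rational(m, 2))*deriv

# Table entries to compare against
table = {
    (1, 0): sp.sqrt(3/(4*sp.pi))*sp.cos(theta),
    (2, 1): -sp.sqrt(15/(8*sp.pi))*sp.exp(sp.I*phi)*sp.sin(theta)*sp.cos(theta),
    (3, 3): -sp.sqrt(35/(64*sp.pi))*sp.exp(3*sp.I*phi)*sp.sin(theta)**3,
}
pt = {theta: 0.7, phi: 0.3}               # arbitrary sample angles
for (l, m), expr in table.items():
    diff = complex((Ylm(l, m) - expr).evalf(subs=pt))
    print(l, m, abs(diff) < 1e-12)        # True for each entry
```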
From Equations (12.13), (12.18), and (12.34) and the fact that $\alpha = l(l + 1)$
for some nonnegative integer $l$, we obtain
$$L^2\bigl[P_l^m(\cos\theta)e^{im\varphi}\bigr] = l(l + 1)P_l^m(\cos\theta)e^{im\varphi},$$
which gives
$$\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{dP_l^m}{d\theta}\right) - \frac{m^2}{\sin^2\theta}P_l^m + l(l + 1)P_l^m = 0.$$
As before, we let $u = \cos\theta$ to obtain the associated Legendre differential equation
$$\frac{d}{du}\left[(1 - u^2)\frac{dP_l^m}{du}\right] + \left[l(l + 1) - \frac{m^2}{1 - u^2}\right]P_l^m = 0. \tag{12.38}$$
This is called the associated Legendre differential equation. Its solutions, the
associated Legendre functions, are given in closed form in Equation (12.33). For
$m = 0$, Equation (12.38) reduces to the Legendre differential equation whose
solutions, again given by Equation (12.33) with $m = 0$, are the Legendre polynomials
encountered in Chapter 7. When $m = 0$, the spherical harmonics become
$\varphi$-independent. This corresponds to a physical situation in which there is an explicit
azimuthal symmetry. In such cases (when it is obvious that the physical property in
question does not depend on $\varphi$) a Legendre polynomial, depending only on $\cos\theta$,
will multiply the radial function.
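As a quick sanity check, sympy's built-in `assoc_legendre` (which follows the standard literature convention) can be substituted into (12.38); any $P_l^m$ should satisfy it, e.g. $l = 3$, $m = 2$:

```python
import sympy as sp

x = sp.symbols('x')
l, m = 3, 2
P = sp.assoc_legendre(l, m, x)    # sympy's associated Legendre function

# Left-hand side of the associated Legendre equation (12.38)
lhs = sp.diff((1 - x**2)*sp.diff(P, x), x) + (l*(l + 1) - m**2/(1 - x**2))*P
print(sp.simplify(lhs))  # 0
```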
12.4.1 Expansion of Angular Functions
The orthonormality of spherical harmonics can be utilized to expand functions
of $\theta$ and $\varphi$ in terms of them. The fact that these functions are complete will be
discussed in a general way in the context of Sturm-Liouville theory. Assuming
completeness for now, we write
$$f(\theta, \varphi) = \begin{cases} \displaystyle\sum_{l=0}^{\infty}\sum_{m=-l}^{l} a_{lm}Y_{lm}(\theta, \varphi) & \text{if } l \text{ is not fixed,} \\ \displaystyle\sum_{m=-l}^{l} a_{lm}Y_{lm}(\theta, \varphi) & \text{if } l \text{ is fixed,} \end{cases} \tag{12.39}$$
where we have included the case where it is known a priori that $f(\theta, \varphi)$ has a given
fixed $l$ value. To find $a_{lm}$, we multiply both sides by $Y^*_{lm}(\theta, \varphi)$ and integrate over
the solid angle. The result, obtained by using the orthonormality relation, is
$$a_{lm} = \iint d\Omega\,f(\theta, \varphi)\,Y^*_{lm}(\theta, \varphi), \tag{12.40}$$
where $d\Omega \equiv \sin\theta\,d\theta\,d\varphi$ is the element of solid angle. A useful special case of
this formula is
$$a_{l0}^{(f)} = \iint d\Omega\,f(\theta, \varphi)\,Y^*_{l0}(\theta, \varphi) = \sqrt{\frac{2l + 1}{4\pi}}\iint d\Omega\,f(\theta, \varphi)\,P_l(\cos\theta), \tag{12.41}$$
where we have introduced an extra superscript to emphasize the relation of the
expansion coefficients with the function being expanded. Another useful relation
is obtained when we let $\theta = 0$ in Equation (12.39):
$$f(\theta, \varphi)\big|_{\theta=0} = \begin{cases} \displaystyle\sum_{l=0}^{\infty}\sum_{m=-l}^{l} a_{lm}Y_{lm}(\theta, \varphi)\big|_{\theta=0} & \text{if } l \text{ is not fixed,} \\ \displaystyle\sum_{m=-l}^{l} a_{lm}Y_{lm}(\theta, \varphi)\big|_{\theta=0} & \text{if } l \text{ is fixed.} \end{cases}$$
From Equations (12.33) and (12.34) one can show that
$$Y_{lm}(0, \varphi) = \delta_{m0}\sqrt{\frac{2l + 1}{4\pi}}.$$
Therefore,
$$f(\theta, \varphi)\big|_{\theta=0} = \begin{cases} \displaystyle\sum_{l=0}^{\infty} a_{l0}^{(f)}\sqrt{\frac{2l + 1}{4\pi}} & \text{if } l \text{ is not fixed,} \\ a_{l0}^{(f)}\sqrt{\dfrac{2l + 1}{4\pi}} & \text{if } l \text{ is fixed.} \end{cases} \tag{12.42}$$
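A numerical illustration of (12.40): for $f(\theta, \varphi) = \cos\theta = \sqrt{4\pi/3}\,Y_{10}$, the only nonzero coefficient is $a_{10} = \sqrt{4\pi/3}$, and all others vanish by orthogonality. A midpoint-rule quadrature over the solid angle (the grid size is an arbitrary choice) confirms this:

```python
import numpy as np

n = 400
th = (np.arange(n) + 0.5)*np.pi/n          # midpoints in theta
ph = (np.arange(n) + 0.5)*2*np.pi/n        # midpoints in phi
dth, dph = np.pi/n, 2*np.pi/n
TH, PH = np.meshgrid(th, ph, indexing='ij')

f = np.cos(TH)
Y10 = np.sqrt(3/(4*np.pi))*np.cos(TH)              # real, so Y10* = Y10
Y20 = np.sqrt(5/(16*np.pi))*(3*np.cos(TH)**2 - 1)  # real as well

# a_lm = integral of f * Ylm* * sin(theta) dtheta dphi   (Eq. 12.40)
a10 = (f*Y10*np.sin(TH)).sum()*dth*dph
a20 = (f*Y20*np.sin(TH)).sum()*dth*dph
print(a10, np.sqrt(4*np.pi/3))   # both approximately 2.0467
print(abs(a20) < 1e-10)          # True: orthogonality kills the l = 2 term
```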
Figure 12.1 The unit vectors $\hat{\mathbf{e}}_r$ and $\hat{\mathbf{e}}_{r'}$, with their spherical angles and the angle $\gamma$
between them.
12.4.2 Addition Theorem for Spherical Harmonics
An important consequence of the expansion in terms of $Y_{lm}$ is called the addition
theorem for spherical harmonics. Consider two unit vectors $\hat{\mathbf{e}}_r$ and $\hat{\mathbf{e}}_{r'}$, making
spherical angles $(\theta, \varphi)$ and $(\theta', \varphi')$, respectively, as shown in Figure 12.1. Let $\gamma$
be the angle between the two vectors. The addition theorem states that
$$P_l(\cos\gamma) = \frac{4\pi}{2l + 1}\sum_{m=-l}^{l} Y^*_{lm}(\theta', \varphi')\,Y_{lm}(\theta, \varphi). \tag{12.43}$$
We shall not give a proof of this theorem here and refer the reader to an
elegant proof on page 866 which uses the representation theory of groups. The
addition theorem is particularly useful in the expansion of the frequently occurring
expression $1/|\mathbf{r} - \mathbf{r}'|$. For definiteness we assume $|\mathbf{r}'| \equiv r' < |\mathbf{r}| \equiv r$. Then,
introducing $t = r'/r$, we have
$$\frac{1}{|\mathbf{r} - \mathbf{r}'|} = \frac{1}{(r^2 + r'^2 - 2rr'\cos\gamma)^{1/2}} = \frac{1}{r}\bigl(1 + t^2 - 2t\cos\gamma\bigr)^{-1/2}.$$
Recalling the generating function for Legendre polynomials from Chapter 7 and
using the addition theorem, we get
$$\frac{1}{|\mathbf{r} - \mathbf{r}'|} = \frac{1}{r}\sum_{l=0}^{\infty} t^l P_l(\cos\gamma) = \sum_{l=0}^{\infty}\frac{r'^l}{r^{l+1}}\,\frac{4\pi}{2l + 1}\sum_{m=-l}^{l} Y^*_{lm}(\theta', \varphi')\,Y_{lm}(\theta, \varphi).$$
It is clear that if $r < r'$, we should expand in terms of the ratio $r/r'$. It is therefore
customary to use $r_<$ to denote the smaller and $r_>$ to denote the larger of the two
radii $r$ and $r'$. Then the above equation, the expansion of $1/|\mathbf{r} - \mathbf{r}'|$ in spherical
coordinates, is written as
$$\frac{1}{|\mathbf{r} - \mathbf{r}'|} = \sum_{l=0}^{\infty}\frac{r_<^l}{r_>^{l+1}}\,\frac{4\pi}{2l + 1}\sum_{m=-l}^{l} Y^*_{lm}(\theta', \varphi')\,Y_{lm}(\theta, \varphi). \tag{12.44}$$
This equation is used frequently in the study of Coulomb-like potentials.
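The multipole expansion above, with the $m$-sum collapsed back to $P_l(\cos\gamma)$ via the addition theorem (12.43), converges rapidly when $r_</r_>$ is small and is easy to verify numerically. This sketch (the helper name `legendre_P` is illustrative) uses the standard three-term recurrence for the Legendre polynomials:

```python
import numpy as np

def legendre_P(l, x):
    """P_l(x) via the recurrence (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    p0, p1 = 1.0, x
    if l == 0:
        return p0
    for k in range(1, l):
        p0, p1 = p1, ((2*k + 1)*x*p1 - k*p0)/(k + 1)
    return p1

r, rp, gamma = 2.0, 0.5, 0.8                     # sample geometry with r' < r
exact = 1.0/np.sqrt(r**2 + rp**2 - 2*r*rp*np.cos(gamma))
series = sum(rp**l/r**(l + 1)*legendre_P(l, np.cos(gamma)) for l in range(30))
print(abs(series - exact) < 1e-12)               # True: t = r'/r = 1/4 converges fast
```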
12.5 Problems
12.1. By applying the operator $[x_j, p_k]$ to an arbitrary function $f(\mathbf{r})$, show that
$[x_j, p_k] = i\delta_{jk}$.
12.2. Use the defining relation $L_i = \epsilon_{ijk}x_jp_k$ to show that $x_jp_k - x_kp_j = \epsilon_{ijk}L_i$.
In both of these expressions a sum over the repeated indices is understood.
12.3. For the angular momentum operator $L_i = \epsilon_{ijk}x_jp_k$, show that the commutation
relation $[L_j, L_k] = i\epsilon_{jkl}L_l$ holds.
12.4. Evaluate $\partial f/\partial y$ and $\partial f/\partial z$ in spherical coordinates and find $L_y$ and $L_z$ in
terms of spherical coordinates.
12.5. Obtain an expression for $L^2$ in terms of $\theta$ and $\varphi$, and substitute the result in
Equation (12.12) to obtain the Laplacian in spherical coordinates.
12.6. Show that $L^2 = L_+L_- + L_z^2 - L_z$ and $L^2 = L_-L_+ + L_z^2 + L_z$.
12.7. Show that $L^2$, $L_x$, $L_y$, and $L_z$ are hermitian operators in the space of square-integrable
functions.
12.8. Verify the following commutation relations:
$$[L^2, L_\pm] = 0, \qquad [L_z, L_\pm] = \pm L_\pm.$$
12.9. Show that $L_-\,|Y_{\alpha,\beta}\rangle$ has $\beta - 1$ as its eigenvalue for $L_z$, and that $|Y_{\alpha,\beta_\pm}\rangle$ cannot
be zero.
12.10. Show that if the $|Y_{jm}\rangle$ are normalized to unity, then with proper choice of
phase, $L_-\,|Y_{jm}\rangle = \sqrt{j(j + 1) - m(m - 1)}\,|Y_{j,m-1}\rangle$.
12.11. Derive Equation (12.35).
12.12. Starting with $L_x$ and $L_y$, derive the following expression for $L_\pm$:
$$L_\pm = e^{\pm i\varphi}\left(\pm\frac{\partial}{\partial\theta} + i\cot\theta\frac{\partial}{\partial\varphi}\right).$$
12.13. Integrate $dP/d\theta - l\cot\theta\,P = 0$ to find $P(\theta)$.
12.14. Verify the following differential identity:
$$\left(\frac{d}{d\theta} + n\cot\theta\right)f(\theta) = \frac{1}{\sin^n\theta}\frac{d}{d\theta}\bigl[\sin^n\theta\,f(\theta)\bigr].$$
12.15. Let $l = l'$ and $m = m' = 0$ in Equation (12.30), and substitute for $Y_{l0}$ from
Equation (12.28) to obtain $A_l = \sqrt{(2l + 1)/(4\pi)}$.
12.16. Show that
$$L_+^k Y_{l,-l}(u, \varphi) = C_l'\,\frac{(-1)^k e^{-i(l-k)\varphi}}{(1 - u^2)^{(l-k)/2}}\frac{d^k}{du^k}\bigl[(1 - u^2)^l\bigr].$$
12.17. Derive the relations $Y_{l,-m}(\theta, \varphi) = (-1)^m Y^*_{lm}(\theta, \varphi)$ and
$$P_l^{-m}(\theta) = (-1)^m\frac{(l - m)!}{(l + m)!}\,P_l^m(\theta).$$
12.18. Show that $\sum_{m=-l}^{l}|Y_{lm}(\theta, \varphi)|^2 = (2l + 1)/(4\pi)$. Verify this explicitly for
$l = 1$ and $l = 2$.
12.19. Show that the addition theorem for spherical harmonics can be written as
$$P_l(\cos\gamma) = P_l(\cos\theta)P_l(\cos\theta') + 2\sum_{m=1}^{l}\frac{(l - m)!}{(l + m)!}\,P_l^m(\cos\theta)P_l^m(\cos\theta')\cos[m(\varphi - \varphi')].$$
Additional Reading
1. Morse, P. and Feshbach, M. Methods of Theoretical Physics, McGraw-Hill,
1953. A two-volume classic including a long discussion of the separation of
variables in many (sometimes exotic) coordinate systems.
2. The angular momentum eigenvalues and eigenfunctions are discussed in
most books on quantum mechanics. See, e.g., Messiah, A. Quantum Mechanics,
volume II, Wiley, 1966.
13
Second-Order Linear Differential
Equations
The discussion of Chapter 12 has clearly singled out ODEs, especially those of
second order, as objects requiring special attention because most common PDEs
of mathematical physics can be separated into ODEs (of second order). This is
really an oversimplification of the situation. Many PDEs of physics, both at the
fundamental theoretical level (as in the general theory of relativity) and from
a practical standpoint (weather forecasting), are nonlinear, and the method of the
separation of variables does not work. Since no general analytic solutions for
such nonlinear systems have been found, we shall confine ourselves to the linear
systems, especially those that admit a separated solution.
With the exception of the infinite power series, no systematic method of solving
DEs existed during the first half of the nineteenth century. The majority of solutions
were completely ad hoc and obtained by trial and error, causing frustration and
anxiety among mathematicians. It was to overcome this frustration that Sophus
Lie, motivated by the newly developed concept of group, took up the systematic
study of DEs in the second half of the nineteenth century. This study not only
gave a handle on the disarrayed area of DEs, but also gave birth to one of the most
beautiful and fundamental branches of mathematical physics, Lie group theory.
We shall come back to a thorough treatment of this theory in Parts VII and VIII.
Our main task in this chapter is to study second-order linear differential
equations (SOLDEs). However, to understand SOLDEs, we need some basic understanding
of differential equations in general. The next section outlines some
essential properties of general DEs. Section 2 is a very brief introduction to first-order
DEs, and the remainder of the chapter deals with SOLDEs.
13.1 General Properties of ODEs
The most general ODE can be expressed as
$$F\left(x, y, \frac{dy}{dx}, \frac{d^2y}{dx^2}, \ldots, \frac{d^ny}{dx^n}\right) = 0, \tag{13.1}$$
in which $F : \mathbb{R}^{n+2} \to \mathbb{R}$ is a real-valued function of $n + 2$ real variables. When
$F$ depends explicitly and nontrivially on $d^ny/dx^n$, Equation (13.1) is called an
$n$th-order ODE. An ODE is said to be linear if the part of the function $F$ that
includes $y$ and all its derivatives is linear in $y$. The most general $n$th-order linear
ODE is
$$p_0(x)y + p_1(x)\frac{dy}{dx} + \cdots + p_n(x)\frac{d^ny}{dx^n} = q(x) \qquad\text{for } p_n(x) \neq 0, \tag{13.2}$$
where $\{p_i\}_{i=0}^{n}$ and $q$ are functions of the independent variable $x$. Equation (13.2)
is said to be homogeneous if $q = 0$; otherwise, it is said to be inhomogeneous
and $q(x)$ is called the inhomogeneous term. It is customary, and convenient, to
define a linear differential operator $\mathbf{L}$ by1
$$\mathbf{L} \equiv p_0(x) + p_1(x)\frac{d}{dx} + \cdots + p_n(x)\frac{d^n}{dx^n}, \tag{13.3}$$
and write Equation (13.2) as
$$\mathbf{L}[y] = q(x). \tag{13.4}$$
A solution of Equation (13.2) or (13.4) is a single-variable function $f : \mathbb{R} \to \mathbb{R}$
such that $F\bigl(x, f(x), f'(x), \ldots, f^{(n)}(x)\bigr) = 0$, or $\mathbf{L}[f] = q(x)$, for all $x$ in the
domain of definition of $f$. The solution of a differential equation may not exist if
we put too many restrictions on it. For instance, if we demand that $f : \mathbb{R} \to \mathbb{R}$
be differentiable too many times, we may not be able to find a solution, as the
following example shows.
13.1.1. Example. The most general solution of $dy/dx = |x|$ that vanishes at $x = 0$ is
$$f(x) = \begin{cases} \tfrac{1}{2}x^2 & \text{if } x > 0, \\ -\tfrac{1}{2}x^2 & \text{if } x \leq 0. \end{cases}$$
This function is continuous and has first derivative $f'(x) = |x|$, which is also continuous
at $x = 0$. However, if we demand that its second derivative also be continuous at $x = 0$,
we cannot find a solution, because
$$f''(x) = \begin{cases} +1 & \text{if } x > 0, \\ -1 & \text{if } x < 0. \end{cases}$$
If we want $f''(x)$ to exist at $x = 0$, then we have to expand the notion of a function to
include distributions, or generalized functions.
1Do not confuse this linear differential operator with the angular momentum (vector) operator $\mathbf{L}$.
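The piecewise solution of Example 13.1.1 is easy to verify with sympy: the derivative of the piecewise $f$ reproduces $|x|$ at sample points on both sides of the origin. A minimal sketch:

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Piecewise((x**2/2, x >= 0), (-x**2/2, True))
fp = sp.diff(f, x)                                 # the piecewise derivative

for x0 in (sp.Rational(-1, 2), sp.Rational(1, 2)):
    print(fp.subs(x, x0) == sp.Abs(x0))            # True on both sides
```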
Overrestricting a solution for a differential equation results in its absence, but
underrestricting it allows multiple solutions. To strike a balance between these two
extremes, we agree to make a solution as many times differentiable as plausible and
to satisfy certain initial conditions. For an $n$th-order DE such initial conditions
are commonly equivalent (but not restricted) to a specification of the function and
of its first $n - 1$ derivatives. This sort of specification is made feasible by the
following theorem.
13.1.2. Theorem. (implicit function theorem) Let $G : \mathbb{R}^{n+1} \to \mathbb{R}$, given by
$G(x_1, x_2, \ldots, x_{n+1}) \in \mathbb{R}$, have continuous partial derivatives up to the $k$th
order in some neighborhood of a point $P_0 = (r_1, r_2, \ldots, r_{n+1})$ in $\mathbb{R}^{n+1}$. Let
$(\partial G/\partial x_{n+1})|_{P_0} \neq 0$. Then there exists a unique function $F : \mathbb{R}^n \to \mathbb{R}$ that is
continuously differentiable $k$ times at (some smaller) neighborhood of $P_0$ such
that $x_{n+1} = F(x_1, x_2, \ldots, x_n)$ for all points $P = (x_1, x_2, \ldots, x_{n+1})$ in a neighborhood
of $P_0$ and
$$G\bigl(x_1, x_2, \ldots, x_n, F(x_1, x_2, \ldots, x_n)\bigr) = 0.$$
Theorem 13.1.2 simply asserts that under certain (mild) conditions we can
"solve" for one of the independent variables in $G(x_1, x_2, \ldots, x_{n+1}) = 0$ in terms
of the others. A proof of this theorem is usually given in advanced calculus books.
Application of this theorem to Equation (13.1) leads to
$$\frac{d^ny}{dx^n} = F\left(x, y, \frac{dy}{dx}, \frac{d^2y}{dx^2}, \ldots, \frac{d^{n-1}y}{dx^{n-1}}\right),$$
provided that $G$ satisfies the conditions of the theorem. If we know the solution
$y = f(x)$ and its derivatives up to order $n - 1$, we can evaluate its $n$th derivative
using this equation. In addition, we can calculate the derivatives of all orders
(assuming they exist) by differentiating this equation. This allows us to expand
the solution in a Taylor series. Thus, for solutions that have derivatives of all
orders, knowledge of the value of a solution and its first $n - 1$ derivatives at a
point $x_0$ determines that solution at a neighboring point $x$.
We shall not study the general ODE of Equation (13.1) or even its simpler
linear version (13.2). We will only briefly study ODEs of the first order in the next
section, and then concentrate on linear ODEs of the second order for the rest of
this chapter.
13.2 Existence and Uniqueness for First-Order DEs
A general first-order DE (FODE) is of the form G(x, y, y') = O. We can find y'
(the derivative of y) in terms of a function of x and y if the function G(Xl, x2, X3)
(13.6)
themost general
FODE In normal form
explicit solution toa
general first-order
linear differential
equation
13.2 EXISTENCE AND UNIQUENESS FOR FIRST-ORDER DES 351
is differentiablewith respect to its third argument and aGjax3 0; O. In that case
we have
, dy
y sa - = F(x, y), (13.5)
dx
which is said to be a normal FODE. If F (x, y) is a linear function of y, then
Equation (13.5) becomes a first-orderlinear DE (FOLDE), which can generally
be written as
dy
PI(X)- + po(x)y = q(x).
dx
It can be shown that the general FOLDE has an explicit solution (see [Hass99]):

13.2.1. Theorem. Any first-order linear DE of the form p1(x)y' + p0(x)y = q(x),
in which p0, p1, and q are continuous functions in some interval (a, b), has a
general solution

    f(x) = [1/(p1(x)μ(x))] [C + ∫_{x1}^{x} μ(t)q(t) dt],             (13.7)

where C is an arbitrary constant and

    μ(x) = [1/p1(x)] exp[∫_{x0}^{x} p0(t)/p1(t) dt],                 (13.8)

where x0 and x1 are arbitrary points in the interval (a, b).
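As a quick symbolic check of Theorem 13.2.1, one can build μ(x) from Equation (13.8) and the solution from Equation (13.7) for a concrete equation and verify that the FOLDE is satisfied. The sketch below uses Python's sympy library; the sample choice p1 = 1, p0 = 1, q = x (i.e., y' + y = x, with x0 = x1 = 0) is merely illustrative:

```python
import sympy as sp

x, t, C = sp.symbols('x t C')

# sample FOLDE y' + y = x, i.e. p1 = 1, p0 = 1, q(t) = t
p1 = sp.Integer(1)
p0 = sp.Integer(1)
q = t

# integrating factor (13.8): mu(x) = (1/p1) exp[ int_{x0}^x p0/p1 dt ], with x0 = 0
mu = (sp.Integer(1) / p1) * sp.exp(sp.integrate(p0 / p1, (t, 0, x)))

# general solution (13.7): f = (1/(p1*mu)) [ C + int_{x1}^x mu(t) q(t) dt ], with x1 = 0
mu_t = mu.subs(x, t)
f = (C + sp.integrate(mu_t * q, (t, 0, x))) / (p1 * mu)

# residual of the original FOLDE should vanish identically in x and C
residual = sp.simplify(sp.diff(f, x) + f - x)
print(residual)   # -> 0
```

The residual vanishes for every value of the constant C, confirming that (13.7) with (13.8) is a one-parameter family of solutions.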
No such explicit solution exists for nonlinear first-order DEs. Nevertheless, it
is reassuring to know that a solution of such a DE always exists, and under some
mild conditions, this solution is unique. We summarize some of the ideas involved
in the proof of the existence and uniqueness of the solutions to FODEs. (For proofs,
see the excellent book by Birkhoff and Rota [Birk78].) We first state an existence
theorem due to Peano:

Peano existence theorem
13.2.2. Theorem. (Peano existence theorem) If the function F(x, y) is continuous
for the points on and within the rectangle defined by |y - c| ≤ K and |x - a| ≤ N,
and if |F(x, y)| ≤ M there, then the differential equation y' = F(x, y) has at
least one solution, y = f(x), defined for |x - a| ≤ min(N, K/M) and satisfying
the initial condition f(a) = c.

This theorem guarantees only the existence of solutions. To ensure uniqueness,
the function F needs to have some additional properties. An important property is
stated in the following definition.

Lipschitz condition
13.2.3. Definition. A function F(x, y) satisfies a Lipschitz condition in a domain
D ⊂ ℝ² if for some finite constant L (Lipschitz constant), it satisfies the inequality

    |F(x, y1) - F(x, y2)| ≤ L|y1 - y2|

for all points (x, y1) and (x, y2) in D.
352 13. SECOND-ORDER LINEAR DIFFERENTIAL EQUATIONS
uniqueness theorem
13.2.4. Theorem. (uniqueness theorem) Let f(x) and g(x) be any two solutions
of the FODE y' = F(x, y) in a domain D, where F satisfies a Lipschitz condition
with Lipschitz constant L. Then

    |f(x) - g(x)| ≤ e^{L|x-a|} |f(a) - g(a)|.

In particular, the FODE has at most one solution curve passing through the point
(a, c) ∈ D.

The final conclusion of this theorem is an easy consequence of the assumed
differentiability of F and the requirement f(a) = g(a) = c. The theorem says
that if there is a solution y = f(x) to the DE y' = F(x, y) satisfying f(a) = c,
then it is the solution.
The requirements of the Peano existence theorem are too broad to yield so-
lutions that have some nice properties. For instance, the interval of definition of
the solutions may depend on their initial values. The following example illustrates
this point.
13.2.5. Example. Consider the DE dy/dx = e^y. The general solution of this DE can be
obtained by direct integration:

    e^{-y} dy = dx  ⇒  -e^{-y} = x + C.

If y = b when x = 0, then C = -e^{-b}, and

    e^{-y} = -x + e^{-b}  ⇒  y = -ln(e^{-b} - x).

Thus, the solution is defined for -∞ < x < e^{-b}; i.e., the interval of definition of a solution
changes with its initial value.  ■
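The initial-value dependence of the interval of definition in Example 13.2.5 can also be seen numerically. The following plain-Python sketch (the step size, cap, and function names are choices of this illustration, not from the text) integrates dy/dx = e^y by forward Euler until the solution exceeds a large cap, and compares the x reached with the predicted blow-up point e^{-b}:

```python
import math

def euler_blowup(b, h=1e-5, cap=25.0):
    """Integrate dy/dx = e^y from y(0) = b with forward Euler
    until y exceeds `cap`; return the x reached (an estimate of
    the right endpoint of the interval of definition)."""
    x, y = 0.0, b
    while y < cap:
        y += h * math.exp(y)
        x += h
    return x

for b in (0.0, 1.0, 2.0):
    # numeric blow-up point versus the exact endpoint e^{-b}
    print(b, euler_blowup(b), math.exp(-b))
```

For each initial value b the numeric estimate lands close to e^{-b}, so the solution through (0, b) really does live on a shorter interval as b grows.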
To avoid situations illustrated in the example above, one demands not just the
continuity of F, as does the Peano existence theorem, but a Lipschitz condition
for it. Then one ensures not only the existence, but also the uniqueness:

local existence and uniqueness theorem
13.2.6. Theorem. (local existence and uniqueness theorem) Suppose that the func-
tion F(x, y) is defined and continuous in the rectangle |y - c| ≤ K, |x - a| ≤ N
and satisfies a Lipschitz condition there. Let M = max |F(x, y)| in this rectan-
gle. Then the differential equation y' = F(x, y) has a unique solution y = f(x)
satisfying f(a) = c and defined on the interval |x - a| ≤ min(N, K/M).
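The solution whose existence Theorem 13.2.6 guarantees is classically constructed by Picard iteration, f_{k+1}(x) = c + ∫_a^x F(t, f_k(t)) dt, and a few iterations are easy to carry out symbolically. The sketch below (a sympy illustration whose function name and the sample problem y' = y, y(0) = 1 are mine, not the book's) shows the iterates converging to the Taylor series of e^x:

```python
import sympy as sp

x, t, y = sp.symbols('x t y')

def picard(F, a, c, n):
    # Picard iterates f_{k+1}(x) = c + Integral_a^x F(t, f_k(t)) dt
    f = sp.Integer(c)
    for _ in range(n):
        integrand = F.subs(y, f).subs(x, t)   # form F(t, f_k(t))
        f = c + sp.integrate(integrand, (t, a, x))
    return sp.expand(f)

# y' = y, y(0) = 1: the iterates are the Taylor partial sums of e^x
approx = picard(y, 0, 1, 5)
print(approx)   # 1 + x + x^2/2 + x^3/6 + x^4/24 + x^5/120
```

Each iteration adds one more correct Taylor term, which is the mechanism behind the convergence proof under the Lipschitz condition.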
13.3 General Properties of SOLDEs
The most general SOLDE is

    p2(x) d²y/dx² + p1(x) dy/dx + p0(x) y = p3(x).                   (13.9)
Dividing by p2(x) and writing p for p1/p2, q for p0/p2, and r for p3/p2 reduces
this to the normal form

normal form of a SOLDE

    d²y/dx² + p(x) dy/dx + q(x) y = r(x).                            (13.10)

singular points of a SOLDE
Equation (13.10) is equivalent to (13.9) if p2(x) ≠ 0. The points at which p2(x)
vanishes are called the singular points of the differential equation.
There is a crucial difference between the singular points of linear differential
equations and those of nonlinear differential equations. For a nonlinear differential
equation such as (x² - y)y' = x² + y², the curve y = x² is the collection of singular
points. This makes it impossible to construct solutions y = f(x) that are defined
on an interval I = [a, b] of the x-axis, because for any x ∈ I, there is a y for
which the differential equation is undefined. Linear differential equations do not
have this problem, because the coefficients of the derivatives are functions of x
only. Therefore, all the singular "curves" are vertical. Thus, we have the following
definition.
regular SOLDE
13.3.1. Definition. The normal form of a SOLDE, Equation (13.10), is regular on
an interval [a, b] of the x-axis if p(x), q(x), and r(x) are continuous on [a, b].
A solution of a normal SOLDE is a twice-differentiable function y = f(x) that
satisfies the SOLDE at every point of [a, b].

It is clear that any function that satisfies Equation (13.10), or Equation
(13.9), must necessarily be twice differentiable, and that is all that is demanded
of the solutions. Any higher-order differentiability requirement may be too restric-
tive, as was pointed out in Example 13.1.1. Most solutions to a normal SOLDE,
however, automatically have derivatives of order higher than two.
We write Equation (13.9) in the operator form as

    L[y] = p3,                                                       (13.11)

where

    L ≡ p2 d²/dx² + p1 d/dx + p0.

It is clear that L is a linear operator because d/dx is linear, as are all powers of
it. Thus, for constants α and β, L[αy1 + βy2] = αL[y1] + βL[y2]. In particular,
if y1 and y2 are two solutions of Equation (13.11), then L[y1 - y2] = 0. That
is, the difference between any two solutions of a SOLDE is a solution of the
homogeneous equation obtained by setting p3 = 0.²
An immediate consequence of the linearity of L is the following:

13.3.2. Lemma. If L[u] = r(x), L[v] = s(x), α and β are constants, and w =
αu + βv, then L[w] = αr(x) + βs(x).

superposition principle
The proof of this lemma is trivial, but the result describes the fundamental prop-
erty of linear operators: When r = s = 0, that is, in dealing with homogeneous
equations, the lemma says that any linear combination of solutions of the homo-
geneous SOLDE (HSOLDE) is also a solution. This is called the superposition
principle.

²This conclusion is, of course, not limited to the SOLDE; it holds for all linear DEs.
Based on physical intuition, we expect to be able to predict the behavior of
a physical system if we know the differential equation obeyed by that system,
and, equally importantly, the initial data. A prediction is not a prediction unless it
is unique.³ This expectation for linear equations is borne out in the language of
mathematics in the form of an existence theorem and a uniqueness theorem. We
consider the latter next. But first, we need a lemma.
13.3.3. Lemma. The only solution g(x) of the homogeneous equation y'' + py' +
qy = 0 defined on the interval [a, b] that satisfies g(a) = 0 = g'(a) is the trivial
solution g = 0.

Proof. Introduce the nonnegative function u(x) ≡ [g(x)]² + [g'(x)]² and differ-
entiate it to get

    u'(x) = 2gg' + 2g'g'' = 2g'(g + g'') = 2g'(g - pg' - qg)
          = -2p(g')² + 2(1 - q)gg'.

Since (g ± g')² ≥ 0, it follows that 2|gg'| ≤ g² + g'². Thus,

    2(1 - q)gg' ≤ 2|(1 - q)gg'| = 2|1 - q| |gg'|
                ≤ |1 - q|(g² + g'²) ≤ (1 + |q|)(g² + g'²),

and therefore,

    u'(x) ≤ |u'(x)| = |-2pg'² + 2(1 - q)gg'|
          ≤ 2|p|g'² + (1 + |q|)(g² + g'²)
          = [1 + |q(x)|]g² + [1 + |q(x)| + 2|p(x)|]g'².

Now let K = 1 + max[|q(x)| + 2|p(x)|], where the maximum is taken over [a, b].
Then we obtain

    u'(x) ≤ K(g² + g'²) = Ku(x)   ∀ x ∈ [a, b].

Using the result of Problem 13.1 yields u(x) ≤ u(a)e^{K(x-a)} for all x ∈ [a, b].
This equation, plus u(a) = 0, as well as the fact that u(x) ≥ 0, imply that u(x) =
g²(x) + g'²(x) = 0. It follows that g(x) = 0 = g'(x) for all x ∈ [a, b].  □
uniqueness of solutions to a SOLDE
13.3.4. Theorem. (uniqueness theorem) If p and q are continuous on [a, b], then
at most one solution y = f(x) of Equation (13.10) can satisfy the initial conditions
f(a) = c1 and f'(a) = c2, where c1 and c2 are arbitrary constants.

Proof. Let f1 and f2 be two solutions satisfying the given initial conditions. Then
their difference, g ≡ f1 - f2, satisfies the homogeneous equation [with r(x) = 0].
The initial condition that g(x) satisfies is clearly g(a) = 0 = g'(a). By Lemma
13.3.3, g = 0, or f1 = f2.  □

³Physical intuition also tells us that if the initial conditions are changed by an infinitesimal amount, then the solutions
will be changed infinitesimally. Thus, the solutions of linear differential equations are said to be continuous functions of the
initial conditions. Nonlinear differential equations can have completely different solutions for two initial conditions that are
infinitesimally close. Since initial conditions cannot be specified with mathematical precision in practice, nonlinear differential
equations lead to unpredictable solutions, or chaos. This subject has received much attention in recent years. For an elementary
discussion of chaos see [Hass99, Chapter 15].
Theorem 13.3.4 can be applied to any homogeneous SOLDE to find the latter's
most general solution. In particular, let f1(x) and f2(x) be any two solutions of

    y'' + p(x)y' + q(x)y = 0                                         (13.12)

defined on the interval [a, b]. Assume that the two vectors v1 = (f1(a), f1'(a))
and v2 = (f2(a), f2'(a)) in ℝ² are linearly independent.⁴ Let g(x) be another
solution. The vector (g(a), g'(a)) can be written as a linear combination of v1 and
v2, giving the two equations

    g(a) = c1 f1(a) + c2 f2(a),
    g'(a) = c1 f1'(a) + c2 f2'(a).

Now consider the function u(x) ≡ g(x) - c1 f1(x) - c2 f2(x), which satisfies
Equation (13.12) and the initial conditions u(a) = u'(a) = 0. By Lemma 13.3.3,
we must have u(x) = 0, or g(x) = c1 f1(x) + c2 f2(x). We have proved the
following:
13.3.5. Theorem. Let f1 and f2 be two solutions of the HSOLDE

    y'' + py' + qy = 0,

where p and q are continuous functions defined on the interval [a, b]. If

    (f1(a), f1'(a))   and   (f2(a), f2'(a))

are linearly independent vectors in ℝ², then every solution g(x) of this HSOLDE
is equal to some linear combination g(x) = c1 f1(x) + c2 f2(x) of f1 and f2 with
constant coefficients c1 and c2.
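Theorem 13.3.5 can be illustrated concretely: take y'' + y = 0 with basis sin x, cos x, pick any third solution g, and recover the coefficients c1, c2 from the initial-data system at a = 0. The sympy sketch below (the specific initial values 2 and -3 are an arbitrary choice of this illustration) does exactly that:

```python
import sympy as sp

x = sp.Symbol('x')
y = sp.Function('y')

# HSOLDE y'' + y = 0 with basis f1 = sin x, f2 = cos x
f1, f2 = sp.sin(x), sp.cos(x)

# a third solution, fixed by arbitrary initial data g(0) = 2, g'(0) = -3
g = sp.dsolve(y(x).diff(x, 2) + y(x), y(x),
              ics={y(0): 2, y(x).diff(x).subs(x, 0): -3}).rhs

# solve g(0) = c1 f1(0) + c2 f2(0), g'(0) = c1 f1'(0) + c2 f2'(0)
c1, c2 = sp.symbols('c1 c2')
sol = sp.solve([c1 * f1.subs(x, 0) + c2 * f2.subs(x, 0) - 2,
                c1 * sp.diff(f1, x).subs(x, 0) + c2 * sp.diff(f2, x).subs(x, 0) + 3],
               [c1, c2])
combo = sol[c1] * f1 + sol[c2] * f2
print(sp.simplify(g - combo))   # -> 0
```

The vanishing difference shows that g is exactly the linear combination determined by the 2x2 system at the initial point, as the theorem asserts.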
13.4 The Wronskian
The two solutions f1(x) and f2(x) in Theorem 13.3.5 have the property that any
other solution g(x) can be expressed as a linear combination of them. We call
basis of solutions
f1 and f2 a basis of solutions of the HSOLDE. To form a basis of solutions, f1
and f2 must be linearly independent. The linear dependence or independence of
a number of functions {f_i}, i = 1, ..., n, from [a, b] to ℝ is a concept that must hold for all
x ∈ [a, b]. Thus, if constants {α_i}, i = 1, ..., n, in ℝ can be found such that

    α1 f1(x0) + α2 f2(x0) + ... + αn fn(x0) = 0

for some x0 ∈ [a, b], it does not mean that the f's are linearly dependent. Linear
dependence requires that the equality hold for all x ∈ [a, b]. In fact, we must write

    α1 f1 + α2 f2 + ... + αn fn = 0,

where 0 is the zero function.

⁴If they are not, then one must choose a different initial point for the interval.
Wronskian defined
13.4.1. Definition. The Wronskian of any two differentiable functions f1(x) and
f2(x) is

    W(f1, f2; x) = f1(x)f2'(x) - f2(x)f1'(x) = det ( f1(x)  f1'(x) )
                                                   ( f2(x)  f2'(x) ).
13.4.2. Proposition. The Wronskian of any two solutions of Equation (13.12) sat-
isfies

    W(f1, f2; x) = W(f1, f2; c) exp[-∫_c^x p(t) dt],

where c is any number in the interval [a, b].

Proof. Differentiating both sides of the definition of the Wronskian and substituting
from Equation (13.12) yields a FOLDE for W(f1, f2; x), which can be easily
solved. The details are left as a problem.  □

An important consequence of Proposition 13.4.2 is that the Wronskian of any
two solutions of Equation (13.12) does not change sign in [a, b]. In particular, if
the Wronskian vanishes at one point in [a, b], it vanishes at all points in [a, b].
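Proposition 13.4.2 is easy to confirm symbolically for an equation with a nonconstant p. The sympy sketch below (the Cauchy-Euler example y'' + (1/x)y' - (1/x²)y = 0 with basis x and 1/x is my choice, not the book's) compares the directly computed Wronskian with the exponential formula:

```python
import sympy as sp

x, t = sp.symbols('x t', positive=True)

# y'' + p y' + q y = 0 with p = 1/x, q = -1/x^2 (Cauchy-Euler); basis x, 1/x
p = 1 / x
f1, f2 = x, 1 / x

# direct Wronskian f1 f2' - f2 f1'
W = sp.simplify(f1 * sp.diff(f2, x) - f2 * sp.diff(f1, x))

# Proposition 13.4.2 with reference point c = 1
c = sp.Integer(1)
W_abel = W.subs(x, c) * sp.exp(-sp.integrate(p.subs(x, t), (t, c, x)))

print(sp.simplify(W - W_abel))   # -> 0
```

Here W(x) = -2/x never vanishes on x > 0, consistent with the sign-constancy remark above.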
The real importance of the Wronskian is contained in the following theorem.
13.4.3. Theorem. Two differentiable functions f1 and f2, which are nonzero in
the interval [a, b], are linearly dependent if and only if their Wronskian vanishes.

Proof. If f1 and f2 are linearly dependent, then one is a multiple of the other, and
the Wronskian is readily seen to vanish. Conversely, assume that the Wronskian
is zero. Then

    f1(x)f2'(x) - f2(x)f1'(x) = 0  ⇒  f1 df2 = f2 df1  ⇒  f2 = cf1,

and the two functions are linearly dependent.  □
Josef Hoene de Wronski (1778-1853) was born Josef Hoene, but he adopted the name
Wronski around 1810, just after he married. He had moved to France and become a French
citizen in 1800 and moved to Paris in 1810, the same year he published his first memoir on
the foundations of mathematics, which received less than favorable reviews from Lacroix
and Lagrange. His other interests included the design of caterpillar vehicles to compete with
the railways. However, they were never manufactured.
Wronski was interested mainly in applying philosophy to
mathematics, the philosophy taking precedence over rigorous
mathematical proofs. He criticised Lagrange's use of infinite
series and introduced his own ideas for series expansions of a
function. The coefficients in this series are determinants now
known as Wronskians [so named by Thomas Muir (1844-
1934), a Glasgow High School science master who became
an authority on determinants by devoting most of his life to
writing a five-volume treatise on the history of determinants].
For many years Wronski's work was dismissed as rubbish.
However, a closer examination of the work in more recent
times shows that although some is wrong and he has an incredibly high opinion of himself
and his ideas, there are also some mathematical insights of great depth and brilliance hidden
within the papers.
13.4.4. Example. Let f1(x) = x and f2(x) = |x| for x ∈ [-1, 1]. These two functions
are linearly independent in the given interval, because α1 x + α2 |x| = 0 for all x if and only
if α1 = α2 = 0. The Wronskian, on the other hand, vanishes for all x ∈ [-1, +1]:

    W(f1, f2; x) = x d|x|/dx - |x| dx/dx = x d|x|/dx - |x|
                 = { x - x = 0         if x > 0,
                   { -x - (-x) = 0     if x < 0.

Thus, it is possible for two functions to have a vanishing Wronskian without being linearly
dependent. However, as we showed in the proof of the theorem above, if the functions
are differentiable in their interval of definition, then they are linearly dependent if their
Wronskian vanishes.  ■
13.4.5. Example. The Wronskian can be generalized to n functions. The Wronskian of
the functions f1, f2, ..., fn is

                                ( f1(x)   f1'(x)   ...   f1^(n-1)(x) )
    W(f1, f2, ..., fn; x) = det ( f2(x)   f2'(x)   ...   f2^(n-1)(x) )
                                (  ...      ...    ...       ...     )
                                ( fn(x)   fn'(x)   ...   fn^(n-1)(x) )

If the functions are linearly dependent, then W(f1, f2, ..., fn; x) = 0.
For instance, it is clear that e^x, e^{-x}, and sinh x are linearly dependent. Thus, we expect

                                   ( e^x      e^x       e^x    )
    W(e^x, e^{-x}, sinh x; x) = det( e^{-x}  -e^{-x}    e^{-x} )
                                   ( sinh x   cosh x    sinh x )

to vanish, as is easily seen (the first and last columns are the same).
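The n-function Wronskian of Example 13.4.5 can be computed directly; sympy even ships a `wronskian` helper. The sketch below checks the dependent triple from the example and, for contrast, an independent one (the triple 1, x, x² is my choice):

```python
import sympy as sp

x = sp.Symbol('x')

# Wronskian of the linearly dependent triple e^x, e^{-x}, sinh x
W = sp.wronskian([sp.exp(x), sp.exp(-x), sp.sinh(x)], x)
print(sp.simplify(W))   # -> 0

# an independent triple for contrast: 1, x, x^2
W2 = sp.wronskian([sp.Integer(1), x, x**2], x)
print(W2)   # -> 2
```

The first determinant vanishes identically, while the nonzero constant W2 certifies that 1, x, x² form an independent set.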
13.4.1 A Second Solution to the HSOLDE

If we know one solution to Equation (13.12), say f1, then by dividing both
sides of

    f1(x)f2'(x) - f2(x)f1'(x) = W(x) = W(c) e^{-∫_c^x p(t) dt}

by f1², and noting that the LHS will be the derivative of f2/f1,
we can solve for f2 in terms of f1. The result is

    f2(x) = f1(x) {C + K ∫_a^x [1/f1²(s)] exp[-∫_c^s p(t) dt] ds},   (13.13)

where K ≡ W(c) is another arbitrary (nonzero) constant; we do not have to know
W(x) (this would require knowledge of f2, which we are trying to calculate!)
to obtain W(c). In fact, the reader is urged to check directly that f2(x) satisfies
the DE of (13.12) for arbitrary C and K. Whenever possible, and convenient,
it is customary to set C = 0, because its presence simply gives a term that is
proportional to the known solution f1(x).
13.4.6. Example. (a) A solution to the SOLDE y'' - k²y = 0 is e^{kx}. To find a second
solution, we let C = 0 and K = 1 in Equation (13.13). Since p(x) = 0, we have

    f2(x) = e^{kx} (0 + ∫_a^x ds/e^{2ks}) = -[1/(2k)] e^{-kx} + [e^{-2ka}/(2k)] e^{kx},

which, ignoring the second term (which is proportional to the first solution), leads directly
to the choice of e^{-kx} as a second solution.
(b) The differential equation y'' + k²y = 0 has sin kx as a solution. With C = 0, a = π/(2k),
and K = 1, we get

    f2(x) = sin kx (0 + ∫_{π/2k}^x ds/sin²ks) = -(1/k) sin kx cot ks |_{π/2k}^x = -(cos kx)/k.
(c) For the solutions in part (a),

    W(x) = det ( e^{kx}     k e^{kx}  )
               ( e^{-kx}   -k e^{-kx} ) = -2k,

and for those in part (b),

    W(x) = det ( sin kx    k cos kx )
               ( cos kx   -k sin kx ) = -k.

Both Wronskians are constant. In general, the Wronskian of any two linearly independent
solutions of y'' + q(x)y = 0 is constant.  ■
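The second-solution formula (13.13) lends itself to a direct symbolic check. The sympy sketch below reproduces part (a) of Example 13.4.6 (with C = 0, K = 1, p = 0, and the base point a = 0 chosen for convenience) and verifies that the result solves the same SOLDE:

```python
import sympy as sp

x, s, k = sp.symbols('x s k', positive=True)

# Equation (13.13) with C = 0, K = 1, p = 0, a = 0,
# applied to y'' - k^2 y = 0 with known solution f1 = e^{kx}
f1 = sp.exp(k * x)
f2 = sp.expand(f1 * sp.integrate(sp.exp(-2 * k * s), (s, 0, x)))   # integral of 1/f1(s)^2
print(f2)   # a combination of e^{-kx} and e^{kx}

# check that f2 solves the same SOLDE
residual = sp.simplify(sp.diff(f2, x, 2) - k**2 * f2)
print(residual)   # -> 0
```

As the text notes, the e^{kx} piece of f2 is proportional to f1 and can be discarded, leaving e^{-kx} as the second basis solution.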
Most special functions used in mathematical physics are solutions of SOLDEs.
The behavior of these functions at certain special points is determined by the
physics of the particular problem. In most situations physical expectation leads to
a preference for one particular solution over the other. For example, although there
are two linearly independent solutions to the Legendre DE

    d/dx [(1 - x²) dy/dx] + n(n + 1)y = 0,

the solution that is most frequently encountered is the Legendre polynomial Pn(x)
discussed in Chapter 7. The other solution can be obtained by solving the Legendre
equation or by using Equation (13.13), as done in the following example.
13.4.7. Example. The Legendre equation can be reexpressed as

    d²y/dx² - [2x/(1 - x²)] dy/dx + [n(n + 1)/(1 - x²)] y = 0.

This is an HSOLDE with

    p(x) = -2x/(1 - x²)   and   q(x) = n(n + 1)/(1 - x²).

One solution of this HSOLDE is the well-known Legendre polynomial Pn(x). Using this
as our input and employing Equation (13.13), we can generate another set of solutions.
Let Qn(x) stand for the linearly independent "partner" of Pn(x). Then, setting C =
0 = c in Equation (13.13) yields

    Qn(x) = K Pn(x) ∫_a^x [1/Pn²(s)] exp[∫_0^s 2t/(1 - t²) dt] ds
          = K Pn(x) ∫_a^x [1/Pn²(s)] [1/(1 - s²)] ds = An Pn(x) ∫_a^x ds/[(1 - s²)Pn²(s)],

where An is an arbitrary constant determined by standardization, and a is an arbitrary point
in the interval [-1, +1]. For instance, for n = 0, we have P0 = 1, and we obtain

    Q0(x) = A0 ∫_a^x ds/(1 - s²) = A0 [ (1/2) ln|(1 + x)/(1 - x)| - (1/2) ln|(1 + a)/(1 - a)| ].

The standard form of Q0(x) is obtained by setting A0 = 1 and a = 0:

    Q0(x) = (1/2) ln|(1 + x)/(1 - x)|    for |x| < 1.

Similarly, since P1(x) = x,

    Q1(x) = A1 x ∫_a^x ds/[s²(1 - s²)] = Ax + Bx ln|(1 + x)/(1 - x)| + C.

Here standardization is A = 0, B = 1/2, and C = -1. Thus,

    Q1(x) = (x/2) ln|(1 + x)/(1 - x)| - 1    for |x| < 1.  ■
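The Legendre functions of the second kind obtained in Example 13.4.7 can be checked against the Legendre equation itself. The sympy sketch below (the helper name `legendre_residual` is mine) substitutes Q0 and Q1 into [(1 - x²)y']' + n(n + 1)y and confirms that the residual vanishes:

```python
import sympy as sp

x = sp.Symbol('x')

# standardized second solutions from Example 13.4.7 (valid for |x| < 1)
Q0 = sp.log((1 + x) / (1 - x)) / 2
Q1 = x * sp.log((1 + x) / (1 - x)) / 2 - 1

def legendre_residual(y, n):
    # left-hand side of the Legendre DE: [(1 - x^2) y']' + n(n+1) y
    return sp.simplify(sp.diff((1 - x**2) * sp.diff(y, x), x) + n * (n + 1) * y)

print(legendre_residual(Q0, 0))   # -> 0
print(legendre_residual(Q1, 1))   # -> 0
```

Both residuals vanish identically on |x| < 1, so Q0 and Q1 are indeed the partners of P0 = 1 and P1 = x.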
13.4.2 The General Solution to an ISOLDE

Inhomogeneous SOLDEs (ISOLDEs) can be most elegantly discussed in terms of
Green's functions, the subject of Chapter 20, which automatically incorporate the
boundary conditions. However, the most general solution of an ISOLDE, with no
boundary specification, can be discussed at this point.
Let g(x) be a particular solution of

    L[y] = y'' + py' + qy = r(x)                                     (13.14)

and let h(x) be any other solution of this equation. Then h(x) - g(x) satisfies
Equation (13.12) and can be written as a linear combination of a basis of solutions
f1(x) and f2(x), leading to the following equation:

    h(x) = c1 f1(x) + c2 f2(x) + g(x).                               (13.15)

Thus, if we have a particular solution of the ISOLDE of Equation (13.14) and two
basis solutions of the HSOLDE, then the most general solution of (13.14) can be
expressed as the sum of a linear combination of the two basis solutions and the
particular solution.
We know how to find a second solution to the HSOLDE once we know one
solution. We now show that knowing one such solution will also allow us to find a
particular solution to the ISOLDE. The method we use is called the variation of
constants. This method can also be used to find a second solution to the HSOLDE.

method of variation of constants
Let f1 and f2 be the two (known) solutions of the HSOLDE and g(x) the
sought-after solution to Equation (13.14). Write g as g(x) = f1(x)v(x) and sub-
stitute it in (13.14) to get a SOLDE for v(x):

    v'' + (p + 2f1'/f1) v' = r/f1.

This is a first-order linear DE in v', which has a solution of the form

    v' = [W(x)/f1²(x)] [C + ∫_a^x f1(t)r(t)/W(t) dt],

where W(x) is the (known) Wronskian of f1 and f2. Substituting

    W(x)/f1²(x) = [f1(x)f2'(x) - f2(x)f1'(x)]/f1²(x) = d/dx (f2/f1)

in the above expression for v' and setting C = 0 (we are interested in a particular
solution), we get

    dv/dx = d/dx (f2/f1) ∫_a^x f1(t)r(t)/W(t) dt
          = d/dx [ (f2(x)/f1(x)) ∫_a^x f1(t)r(t)/W(t) dt ] - (f2(x)/f1(x)) d/dx ∫_a^x f1(t)r(t)/W(t) dt,

where the last derivative equals f1(x)r(x)/W(x). Integration then gives

    v(x) = (f2(x)/f1(x)) ∫_a^x f1(t)r(t)/W(t) dt - ∫_a^x f2(t)r(t)/W(t) dt.

This leads to the particular solution

    g(x) = f1(x)v(x) = f2(x) ∫_a^x f1(t)r(t)/W(t) dt - f1(x) ∫_a^x f2(t)r(t)/W(t) dt.   (13.16)
We have just proved the following result.

13.4.8. Proposition. Given a single solution f1(x) of the homogeneous equation
corresponding to an ISOLDE, one can use Equation (13.13) to find a second solu-
tion f2(x) of the homogeneous equation and Equation (13.16) to find a particular
solution g(x). The most general solution h will then be

    h(x) = c1 f1(x) + c2 f2(x) + g(x).
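Equation (13.16) can be exercised on a concrete ISOLDE. The sympy sketch below (the sample problem y'' - y = x with basis e^x, e^{-x} and base point a = 0 is my choice) builds the particular solution from (13.16) and verifies it:

```python
import sympy as sp

x, t = sp.symbols('x t')

# ISOLDE y'' - y = x, homogeneous basis f1 = e^x, f2 = e^{-x}
f1, f2 = sp.exp(x), sp.exp(-x)
r = x

# Wronskian f1 f2' - f2 f1'; constant (-2) for this equation, so no subs needed
W = sp.simplify(f1 * sp.diff(f2, x) - f2 * sp.diff(f1, x))

# particular solution from Equation (13.16) with a = 0
g = (f2 * sp.integrate((f1 * r).subs(x, t) / W, (t, 0, x))
     - f1 * sp.integrate((f2 * r).subs(x, t) / W, (t, 0, x)))
g = sp.simplify(g)
print(g)

residual = sp.simplify(sp.diff(g, x, 2) - g - x)
print(residual)   # -> 0
```

The formula returns -x plus a multiple of sinh x; the sinh x piece is a homogeneous solution, so (13.16) agrees with the familiar particular solution y_p = -x up to terms absorbed into c1 f1 + c2 f2.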
13.4.3 Separation and Comparison Theorems

The Wronskian can be used to derive some properties of the graphs of solutions of
HSOLDEs. One such property concerns the relative position of the zeros of two
linearly independent solutions of an HSOLDE.

the separation theorem
13.4.9. Theorem. (the separation theorem) The zeros of two linearly independent
solutions of an HSOLDE occur alternately.

Proof. Let f1(x) and f2(x) be two independent solutions of Equation (13.12).
We have to show that a zero of f1 exists between any two zeros of f2. The linear
independence of f1 and f2 implies that W(f1, f2; x) ≠ 0 for any x ∈ [a, b]. Let
xi ∈ [a, b] be a zero of f2. Then

    0 ≠ W(f1, f2; xi) = f1(xi)f2'(xi) - f2(xi)f1'(xi) = f1(xi)f2'(xi).

Thus, f1(xi) ≠ 0 and f2'(xi) ≠ 0. Suppose that x1 and x2, where x2 > x1, are
two successive zeros of f2. Since f2 is continuous in [a, b] and f2'(x1) ≠ 0, f2 has
to be either increasing [f2'(x1) > 0] or decreasing [f2'(x1) < 0] at x1. For f2 to
be zero at x2, the next point, f2'(x2) must have the opposite sign from f2'(x1) (see
Figure 13.1). We proved earlier that the sign of the Wronskian does not change
in [a, b] (see Proposition 13.4.2 and comments after it). The above equation then
says that f1(x1) and f1(x2) also have opposite signs. The continuity of f1 then
implies that f1 must cross the x-axis somewhere between x1 and x2. A similar
argument shows that there exists one zero of f2 between any two zeros of f1.  □
Figure 13.1 If f2'(x1) > 0 > f2'(x2), then (assuming that the Wronskian is positive)
f1(x1) > 0 > f1(x2).
13.4.10. Example. Two linearly independent solutions of y'' + y = 0 are sin x and
cos x. The separation theorem suggests that the zeros of sin x and cos x must alternate, a
fact known from elementary trigonometry: The zeros of cos x occur at odd multiples of
π/2, and those of sin x occur at even multiples of π/2.  ■

A second useful result is known as the comparison theorem (for a proof, see
[Birk78, p. 38]).

the comparison theorem
13.4.11. Theorem. (the comparison theorem) Let f and g be nontrivial solutions
of u'' + p(x)u = 0 and v'' + q(x)v = 0, respectively, where p(x) ≥ q(x) for
all x ∈ [a, b]. Then f vanishes at least once between any two zeros of g, unless
p = q and f is a constant multiple of g.
The form of the differential equations used in the comparison theorem is not
restrictive, because any HSOLDE can be cast in this form, as the following example
shows.
13.4.12. Example. We show that y'' + p(x)y' + q(x)y = 0 can be cast in the form
u'' + S(x)u = 0 by an appropriate functional transformation. Define w(x) by y = wu, and
substitute in the HSOLDE to obtain

    (u'w + w'u)' + p(u'w + w'u) + quw = 0,

or

    wu'' + (2w' + pw)u' + (qw + pw' + w'')u = 0.                     (13.17)

If we demand that the coefficient of u' be zero, we obtain the DE 2w' + pw = 0, whose
solution is

    w(x) = c exp[-(1/2) ∫^x p(t) dt].

Dividing (13.17) by this w and substituting for w yields

    u'' + S(x)u = 0,   where   S(x) = q + p w'/w + w''/w = q - (1/4)p² - (1/2)p'.
A useful special case of the comparison theorem is given as the following
corollary, whose straightforward but instructive proof is left as a problem.

13.4.13. Corollary. If q(x) ≤ 0 for all x ∈ [a, b], then no nontrivial solution of
the differential equation v'' + q(x)v = 0 can have more than one zero.

oscillation of the Bessel function of order zero
13.4.14. Example. It should be clear from the preceding discussion that the oscillations
of the solutions of v'' + q(x)v = 0 are mostly determined by the sign and magnitude of
q(x). For q(x) ≤ 0 there is no oscillation; that is, there is no solution that changes sign
more than once. Now suppose that q(x) ≥ k² > 0 for some real k. Then, by Theorem
13.4.11, any solution of v'' + q(x)v = 0 must have at least one zero between any two
successive zeros of the solution sin kx of u'' + k²u = 0. This means that any solution of
v'' + q(x)v = 0 has a zero in any interval of length π/k if q(x) ≥ k² > 0.
Let us apply this to the Bessel DE,

    y'' + (1/x)y' + (1 - n²/x²)y = 0.

We can eliminate the y' term by substituting v/√x for y.⁵ This transforms the Bessel DE
into

    v'' + [1 - (4n² - 1)/(4x²)] v = 0.

We compare this, for n = 0, with u'' + u = 0, which has a solution u = sin x, and conclude
that each interval of length π of the positive x-axis contains at least one zero of any solution
of order zero (n = 0) of the Bessel equation. Thus, in particular, the zeroth Bessel function,
denoted by J0(x), has a zero in each interval of length π of the x-axis.
On the other hand, for 4n² - 1 > 0, or n > 1/2, we have 1 > 1 - (4n² - 1)/(4x²).
This implies that sin x has at least one zero between any two successive zeros of the Bessel
functions of order greater than 1/2. It follows that such a Bessel function can have at most
one zero between any two successive zeros of sin x (or in each interval of length π on the
positive x-axis).  ■
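The normal-form coefficient used in Example 13.4.14 follows from the S(x) formula of Example 13.4.12, and that step is easy to check symbolically with sympy (the variable names below are mine):

```python
import sympy as sp

x, n = sp.symbols('x n', positive=True)

# normal-form coefficients of the Bessel equation y'' + (1/x) y' + (1 - n^2/x^2) y = 0
p = 1 / x
q = 1 - n**2 / x**2

# S(x) = q - p^2/4 - p'/2 from Example 13.4.12
S = sp.simplify(q - p**2 / 4 - sp.diff(p, x) / 2)
expected = 1 - (4 * n**2 - 1) / (4 * x**2)
print(sp.simplify(S - expected))   # -> 0
```

So the substitution y = v/√x does turn the Bessel equation into v'' + [1 - (4n² - 1)/(4x²)]v = 0, which is what the comparison with u'' + u = 0 relies on.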
13.4.15. Example. Let us apply Corollary 13.4.13 to v'' - v = 0, in which q(x) = -1 <
0. According to the corollary, the most general solution, c1 e^x + c2 e^{-x}, can have at most
one zero. Indeed,

    c1 e^x + c2 e^{-x} = 0  ⇒  x = (1/2) ln(-c2/c1),

and this (real) x (if it exists) is the only possible solution, as predicted by the corollary.  ■

⁵Because of the square root in the denominator, the range of x will have to be restricted to positive values.
13.5 AdjointDifferential Operators
We discussed adjoint operators in detail in the context of finite-dimensional vector
spaces in Chapter 2. In particular, the importance of self-adjoint, or hermitian,
operators was clearly spelled out by the spectral decomposition theorem of Chapter
4. A consequence of that theorem is the completeness of the eigenvectors of a
hermitian operator, the fact that an arbitrary vector can be expressed as a linear
combination of the (orthonormal) eigenvectors of a hermitian operator.
Self-adjoint differential operators are equally important because their "eigen-
functions" also form complete orthogonal sets, as we shall see later. This section
will generalize the concept of the adjoint to the case of a differential operator (of
second degree).
exact SOLDE
13.5.1. Definition. The HSOLDE

    L[y] ≡ p2(x)y'' + p1(x)y' + p0(x)y = 0                           (13.18)

is said to be exact if

    p2 f'' + p1 f' + p0 f = d/dx [A(x)f' + B(x)f]                    (13.19)

integrating factor for a SOLDE
for all f ∈ C²[a, b] and for some A, B ∈ C¹[a, b]. An integrating factor for L[y]
is a function μ(x) such that μ(x)L[y] is exact.

If an integrating factor exists, then Equation (13.18) reduces to

    d/dx [A(x)y' + B(x)y] = 0  ⇒  A(x)y' + B(x)y = C,

a FOLDE with a constant inhomogeneous term. Even the ISOLDE corresponding
to Equation (13.18) can be solved, because

    μ(x)L[y] = μ(x)r(x)  ⇒  d/dx [A(x)y' + B(x)y] = μ(x)r(x)
             ⇒  A(x)y' + B(x)y = ∫^x μ(t)r(t) dt,

which is a general FOLDE. Thus, the existence of an integrating factor completely
solves a SOLDE. It is therefore important to know whether or not a SOLDE admits
an integrating factor. First let us give a criterion for the exactness of a SOLDE.
13.5.2. Proposition. The SOLDE of Equation (13.18) is exact if and only if p2'' -
p1' + p0 = 0.

Proof. If the SOLDE is exact, then Equation (13.19) holds for all f, implying that
p2 = A, p1 = A' + B, and p0 = B'. It follows that p2'' = A'' and p1' = A'' + B',
which in turn give p2'' - p1' + p0 = 0.
Conversely, if p2'' - p1' + p0 = 0, then, substituting p0 = -p2'' + p1' in the
LHS of Equation (13.18), we obtain

    p2 y'' + p1 y' + (p1' - p2'')y = d/dx [p2 y' + (p1 - p2')y],

and the DE is exact.  □
A general SOLDE is clearly not exact. Can we make it exact by multiplying
it by an integrating factor as we did with a FOLDE? The following proposition
contains the answer.

13.5.3. Proposition. A function μ is an integrating factor of the SOLDE of Equa-
tion (13.18) if and only if it is a solution of the HSOLDE

    M[μ] ≡ (p2 μ)'' - (p1 μ)' + p0 μ = 0.                            (13.20)

Proof. This is an immediate consequence of Proposition 13.5.2.  □

We can expand Equation (13.20) to obtain the equivalent equation

    p2 μ'' + (2p2' - p1)μ' + (p2'' - p1' + p0)μ = 0.                 (13.21)

adjoint of a second-order linear differential operator
The operator M given by

    M ≡ p2 d²/dx² + (2p2' - p1) d/dx + (p2'' - p1' + p0)             (13.22)

is called the adjoint of the operator L and denoted by M ≡ L†. The reason for the
use of the word "adjoint" will be made clear below.
Proposition 13.5.3 confirms the existence of an integrating factor. However,
the latter can be obtained only by solving Equation (13.21), which is at least as
difficult as solving the original differential equation! In contrast, the integrating
factor for a FOLDE can be obtained by a mere integration [see Equation (13.8)].
Although integrating factors for SOLDEs are not as useful as their counterparts
for FOLDEs, they can facilitate the study of SOLDEs. Let us first note that the
adjoint of the adjoint of a differential operator is the original operator: (L†)† = L
(see Problem 13.11). This suggests that if v is an integrating factor of L[u], then u
will be an integrating factor of M[v] ≡ L†[v]. In particular, multiplying the first one
by v and the second one by u and subtracting the results, we obtain [see Equations
(13.18) and (13.20)]

    vL[u] - uM[v] = v p2 u'' - u(p2 v)'' + v p1 u' + u(p1 v)',

which can be simplified to

    vL[u] - uM[v] = d/dx [p2 v u' - (p2 v)'u + p1 u v].              (13.23)
Lagrange identities
Integrating this from a to b yields

    ∫_a^b (vL[u] - uM[v]) dx = [p2 v u' - (p2 v)'u + p1 u v]|_a^b.   (13.24)

Equations (13.23) and (13.24) are called the Lagrange identities. Equation (13.24)
embodies the reason for calling M the adjoint of L: If we consider u and v as abstract
vectors |u⟩ and |v⟩, and L and M as operators in a Hilbert space with the inner product
⟨u|v⟩ = ∫_a^b u*(x)v(x) dx, then Equation (13.24) can be written as

    ⟨v|L|u⟩ - ⟨u|M|v⟩ = ⟨u|L†|v⟩* - ⟨u|M|v⟩ = [p2 v u' - (p2 v)'u + p1 u v]|_a^b.

If the RHS is zero, then ⟨u|L†|v⟩* = ⟨u|M|v⟩ for all |u⟩, |v⟩, and since all these
operators and functions are real, L† = M.
As in the case of finite-dimensional vector spaces, a self-adjoint differential
operator merits special consideration. For M[v] ≡ L†[v] to be equal to L[v], we must
have [see Equations (13.18) and (13.21)] 2p2' - p1 = p1 and p2'' - p1' + p0 = p0.
The first equation gives p2' = p1, which also solves the second equation. If this
condition holds, then we can write Equation (13.18) as L[y] = p2 y'' + p2' y' + p0 y,
or

    L[y] = d/dx [p2(x) dy/dx] + p0(x)y = 0.
Can we make all SOLDEs self-adjoint? Let us multiply both sides of Equation
(13.18) by a function h(x), to be determined later. We get the new DE

    h(x)p2(x)y'' + h(x)p1(x)y' + h(x)p0(x)y = 0,

which we desire to be self-adjoint. This will be accomplished if we choose h(x)
such that hp1 = (hp2)', or p2 h' + h(p2' - p1) = 0, which can be readily integrated
to give

    h(x) = (1/p2) exp[∫^x p1(t)/p2(t) dt].

We have just proved the following:

all SOLDEs can be made self-adjoint
13.5.4. Theorem. The SOLDE of Equation (13.18) is self-adjoint if and only if
p2' = p1, in which case the DE has the form

    d/dx [p2(x) dy/dx] + p0(x)y = 0.

If it is not self-adjoint, it can be made so by multiplying it through by

    h(x) = (1/p2) exp[∫^x p1(t)/p2(t) dt].
13.5.5. Example. (a) The Legendre equation in normal form,

y″ − (2x/(1 − x²))y′ + (λ/(1 − x²))y = 0,

is not self-adjoint. However, we get a self-adjoint version if we multiply through by h(x) =
1 − x²:

(1 − x²)y″ − 2xy′ + λy = 0,   or   [(1 − x²)y′]′ + λy = 0.

(b) Similarly, the normal form of the Bessel equation

y″ + (1/x)y′ + (1 − n²/x²)y = 0

is not self-adjoint, but multiplying through by h(x) = x yields

(d/dx)(x dy/dx) + (x − n²/x)y = 0,

which is clearly self-adjoint.
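The integrating factor of Theorem 13.5.4 is easy to check symbolically. The sketch below (not from the text; it assumes SymPy is available) recovers h(x) = x for the Bessel equation of part (b) and verifies the self-adjointness condition (hp₂)′ = hp₁:

```python
import sympy as sp

x, n = sp.symbols('x n', positive=True)

# Normal form of the Bessel equation: y'' + (1/x) y' + (1 - n^2/x^2) y = 0,
# so p2 = 1 and p1 = 1/x.
p2, p1 = sp.Integer(1), 1/x

# Theorem 13.5.4: multiply through by h = (1/p2) exp( integral of p1/p2 ).
h = (1/p2)*sp.exp(sp.integrate(p1/p2, x))
assert sp.simplify(h - x) == 0                       # recovers h(x) = x

# Self-adjointness requires (h p2)' = h p1.
assert sp.simplify(sp.diff(h*p2, x) - h*p1) == 0
```

The same two lines work for the Legendre equation of part (a) with p₂ = 1, p₁ = −2x/(1 − x²), yielding h = 1 − x².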
13.6 Power-Series Solutions of SOLDEs
Analysis is one of the richest branches of mathematics, focusing on the endless
variety of objects we call functions. The simplest kind of function is a polyno-
mial, which is obtained by performing the simple algebraic operations of addition
and multiplication on the independent variable x. The next in complexity are the
trigonometric functions, which are obtained by taking ratios of geometric objects.
If we demand a simplistic, intuitive approach to functions, the list ends there. It
was only with the advent of derivatives, integrals, and differential equations that a
vastly rich variety of functions exploded into existence in the eighteenth and nine-
teenth centuries. For instance, eˣ, nonexistent before the invention of calculus, can
be thought of as the function that solves dy/dx = y.

Although the definition of a function in terms of DEs and integrals seems a bit
artificial, for most applications it is the only way to define a function. For instance,
the error function, used in statistics, is defined as

erf(x) ≡ (1/√π) ∫_{−∞}^x e^(−t²) dt.
Such a function cannot be expressed in terms of elementary functions. Similarly,
functions (of x) such as

∫_x^∞ (sin t / t) dt,

and so on are encountered frequently in applications. None of these functions can
be expressed in terms of other well-known functions.
An effective way of studying such functions is to study the differential equa-
tions they satisfy. In fact, the majority of functions encountered in mathematical
physics obey the HSOLDE of Equation (13.18) in which the pᵢ(x) are elementary
functions, mostly ratios of polynomials (of degree at most 2). Of course, to specify
functions completely, appropriate boundary conditions are necessary. For instance,
the error function mentioned above satisfies the HSOLDE y″ + 2xy′ = 0 with the
boundary conditions y(0) = ½ and y′(0) = 1/√π.

The natural tendency to resist the idea of a function as a solution of a SOLDE
is mostly due to the abstract nature of differential equations. After all, it is easier to
imagine constructing functions by simple multiplications or with simple geometric
figures that have been around for centuries. The following beautiful example (see
[Birk 78, pp. 85–87]) should overcome this resistance and convince the skeptic
that differential equations contain all the information about a function.
13.6.1. Example. We can show that the solutions to y″ + y = 0 have all the properties
we expect of sin x and cos x. Let us denote the two linearly independent solutions of this
equation by C(x) and S(x). To specify these functions completely, we set C(0) = S′(0) = 1
and C′(0) = S(0) = 0. We claim that this information is enough to identify C(x) and S(x)
as cos x and sin x, respectively.

First, let us show that the solutions exist and are well-behaved functions. With C(0)
and C′(0) given, the equation y″ + y = 0 can generate all derivatives of C(x) at zero:
C″(0) = −C(0) = −1, C‴(0) = −C′(0) = 0, C⁽⁴⁾(0) = −C″(0) = +1, and, in general,

C⁽ⁿ⁾(0) = 0 if n is odd,   C⁽ⁿ⁾(0) = (−1)ᵏ if n = 2k, where k = 0, 1, 2, ....

Thus, the Taylor expansion of C(x) is

C(x) = Σ_{k=0}^∞ (−1)ᵏ x²ᵏ/(2k)!.    (13.25)

Similarly,

S(x) = Σ_{k=0}^∞ (−1)ᵏ x²ᵏ⁺¹/(2k+1)!.    (13.26)

This example illustrates that all information about sine and cosine is hidden in their
differential equation.

A simple ratio test on the series representation of C(x) yields

lim_{k→∞} a_{k+1}/a_k = lim_{k→∞} [(−1)^{k+1}x^{2(k+1)}/(2k+2)!] / [(−1)ᵏx²ᵏ/(2k)!] = lim_{k→∞} −x²/[(2k+2)(2k+1)] = 0,

which shows that the series for C(x) converges for all values of x. Similarly, the series for
S(x) is also convergent. Thus, we are dealing with well-defined finite-valued functions.

Let us now enumerate and prove some properties of C(x) and S(x).
(a) C′(x) = −S(x).
We prove this relation by differentiating C″(x) + C(x) = 0 and writing the result as
[C′(x)]″ + C′(x) = 0 to make evident the fact that C′(x) is also a solution. Since C′(0) = 0
and [C′(0)]′ = C″(0) = −1, and since −S(x) satisfies the same initial conditions, the
uniqueness theorem implies that C′(x) = −S(x). Similarly, S′(x) = C(x).
(b) C²(x) + S²(x) = 1.
Since the p(x) term is absent from the SOLDE, Proposition 13.4.2 implies that the Wron-
skian of C(x) and S(x) is constant. On the other hand,

W(C, S; x) = C(x)S′(x) − C′(x)S(x) = C²(x) + S²(x) = W(C, S; 0) = C²(0) + S²(0) = 1.
(c) S(a + x) = S(a)C(x) + C(a)S(x).
The use of the chain rule easily shows that S(a + x) is a solution of the equation y″ + y =
0. Thus, it can be written as a linear combination of C(x) and S(x) [which are linearly
independent because their Wronskian is nonzero by (b)]:

S(a + x) = AS(x) + BC(x).    (13.27)

This is a functional identity, which for x = 0 gives S(a) = BC(0) = B. If we differentiate
both sides of Equation (13.27), we get

C(a + x) = AS′(x) + BC′(x) = AC(x) − BS(x),

which for x = 0 gives C(a) = A. Substituting the values of A and B in Equation (13.27)
yields the desired identity. A similar argument leads to

C(a + x) = C(a)C(x) − S(a)S(x).
(d) Periodicity of C(x) and S(x).
Let x₀ be the smallest positive real number such that S(x₀) = C(x₀). Then property (b)
implies that C(x₀) = S(x₀) = 1/√2. On the other hand,

S(x₀ + x) = S(x₀)C(x) + C(x₀)S(x) = C(x₀)C(x) + S(x₀)S(x)
          = C(x₀)C(x) − S(x₀)S(−x) = C(x₀ − x).

The third equality follows because, by Equation (13.26), S(x) is an odd function of x. This
is true for all x; in particular, for x = x₀ it yields S(2x₀) = C(0) = 1, and by property (b),
C(2x₀) = 0. Using property (c) once more, we get

S(2x₀ + x) = S(2x₀)C(x) + C(2x₀)S(x) = C(x),
C(2x₀ + x) = C(2x₀)C(x) − S(2x₀)S(x) = −S(x).

Substituting x = 2x₀ yields S(4x₀) = C(2x₀) = 0 and C(4x₀) = −S(2x₀) = −1.
Continuing in this manner, we can easily obtain

S(8x₀ + x) = S(x),   C(8x₀ + x) = C(x),

which prove the periodicity of S(x) and C(x) and show that their period is 8x₀. It is even
possible to determine x₀. This determination is left as a problem, but the result is

x₀ = ∫_0^{1/√2} dt/√(1 − t²).

A numerical calculation will show that this is π/4.
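The construction in this example can be mimicked numerically: the ODE y″ + y = 0 forces the two-term recursion c_{k+2} = −cₖ/[(k+1)(k+2)] on the Taylor coefficients, and partial sums of (13.25) and (13.26) reproduce cosine and sine. A minimal sketch (helper names are illustrative, not from the text):

```python
import math

def ode_coefficients(c0, c1, terms=40):
    """Taylor coefficients of a solution of y'' + y = 0: the ODE forces
    (k+1)(k+2) c_{k+2} = -c_k once c_0 = y(0) and c_1 = y'(0) are fixed."""
    c = [0.0]*terms
    c[0], c[1] = c0, c1
    for k in range(terms - 2):
        c[k + 2] = -c[k] / ((k + 1)*(k + 2))
    return c

def evaluate(c, x):
    return sum(ck * x**k for k, ck in enumerate(c))

C = lambda x: evaluate(ode_coefficients(1.0, 0.0), x)   # C(0) = 1, C'(0) = 0
S = lambda x: evaluate(ode_coefficients(0.0, 1.0), x)   # S(0) = 0, S'(0) = 1

for x in (0.3, 1.0, 2.0):
    assert abs(C(x) - math.cos(x)) < 1e-12              # identifies C with cos
    assert abs(S(x) - math.sin(x)) < 1e-12              # identifies S with sin
    assert abs(C(x)**2 + S(x)**2 - 1.0) < 1e-12         # property (b)
```

Forty terms are far more than needed for moderate x; the factorial in the denominators makes the truncation error negligible.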
13.6.1 Frobenius Method of Undetermined Coefficients

A proper treatment of SOLDEs requires the medium of complex analysis and will
be undertaken in the next chapter. At this point, however, we are seeking a formal
infinite series solution to the SOLDE

y″ + p(x)y′ + q(x)y = 0,

where p(x) and q(x) are real and analytic. This means that p(x) and q(x) can be
represented by convergent power series in some interval (a, b). [The interesting
case where p(x) and q(x) may have singularities will be treated in the context of
complex solutions.]
The general procedure is to write the expansions⁶

p(x) = Σ_{k=0}^∞ aₖxᵏ,   q(x) = Σ_{k=0}^∞ bₖxᵏ,   y(x) = Σ_{k=0}^∞ cₖxᵏ,    (13.28)
for the coefficient functions p and q and the solution y, substitute them in the
SOLDE, and equate the coefficient of each power of x to zero. For this purpose,
we need expansions for the derivatives of y:

y′ = Σ_{k=1}^∞ kcₖxᵏ⁻¹ = Σ_{k=0}^∞ (k+1)c_{k+1}xᵏ,
y″ = Σ_{k=1}^∞ (k+1)kc_{k+1}xᵏ⁻¹ = Σ_{k=0}^∞ (k+2)(k+1)c_{k+2}xᵏ.

Thus

p(x)y′ = Σ_{k=0}^∞ Σ_{m=0}^∞ aₘxᵐ(k+1)c_{k+1}xᵏ = Σ_{k,m} (k+1)aₘc_{k+1}x^{k+m}.

Let k + m ≡ n and sum over n. Then the other sum, say over m, cannot exceed n. Thus,

p(x)y′ = Σ_{n=0}^∞ Σ_{m=0}^n (n − m + 1)aₘc_{n−m+1}xⁿ.

Similarly, q(x)y = Σ_{n=0}^∞ Σ_{m=0}^n bₘc_{n−m}xⁿ. Substituting these sums and the se-
ries for y″ in the SOLDE, we obtain

Σ_{n=0}^∞ {(n+1)(n+2)c_{n+2} + Σ_{m=0}^n [(n − m + 1)aₘc_{n−m+1} + bₘc_{n−m}]} xⁿ = 0.
⁶Here we are expanding about the origin. If such an expansion is impossible or inconvenient, one can expand about another
point, say x₀. One would then replace all powers of x in all expressions below with powers of x − x₀. These expansions assume
that p, q, and y have no singularity at x = 0. In general, this assumption is not valid, and a different approach, in which the whole
series is multiplied by a (not necessarily positive integer) power of x, ought to be taken. Details are provided in Chapter 14.
For this to be true for all x, the coefficient of each power of x must vanish:

(n+1)(n+2)c_{n+2} = −Σ_{m=0}^n [(n − m + 1)aₘc_{n−m+1} + bₘc_{n−m}]   for n ≥ 0,

or

n(n+1)c_{n+1} = −Σ_{m=0}^{n−1} [(n − m)aₘc_{n−m} + bₘc_{n−m−1}]   for n ≥ 1.    (13.29)
the SOLDE existence theorem

If we know c₀ and c₁ (for instance from boundary conditions), we can uniquely
determine cₙ for n ≥ 2 from Equation (13.29). This, in turn, gives a unique power-
series expansion for y, and we have the following theorem.

13.6.2. Theorem. (the existence theorem) For any SOLDE of the form y″ +
p(x)y′ + q(x)y = 0 with analytic coefficient functions given by the first two
equations of (13.28), there exists a unique power series, given by the third equa-
tion of (13.28), that formally satisfies the SOLDE for each choice of c₀ and c₁.
This theorem merely states the existence of a formal power series and says
nothing about its convergence. The following example will demonstrate that con-
vergence is not necessarily guaranteed.
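Equation (13.29) is directly computable. The sketch below (an illustration, not the book's code) builds the cₙ from given Maclaurin coefficients of p and q, using the first form of the recursion, and checks itself on y″ + y = 0, where it must reproduce the cosine series:

```python
from math import factorial

def series_solution(a, b, c0, c1, terms=20):
    """Formal power-series solution of y'' + p y' + q y = 0 via the first form
    of Eq. (13.29); a and b hold the Maclaurin coefficients of p and q."""
    a = list(a) + [0.0]*(terms - len(a))
    b = list(b) + [0.0]*(terms - len(b))
    c = [0.0]*terms
    c[0], c[1] = c0, c1
    for n in range(terms - 2):
        s = sum((n - m + 1)*a[m]*c[n - m + 1] + b[m]*c[n - m]
                for m in range(n + 1))
        c[n + 2] = -s / ((n + 1)*(n + 2))
    return c

# Sanity check on y'' + y = 0 (p = 0, q = 1) with c0 = 1, c1 = 0: cosine series.
c = series_solution([0.0], [1.0], 1.0, 0.0, terms=10)
assert abs(c[2] + 0.5) < 1e-15 and abs(c[4] - 1/factorial(4)) < 1e-15
```

As the theorem warns, the routine always produces *formal* coefficients; it says nothing about whether the resulting series converges.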
13.6.3. Example. The formal power-series solution for x²y′ − y + x = 0 can be obtained
by letting y = Σ_{n=0}^∞ cₙxⁿ. Then y′ = Σ_{n=0}^∞ (n+1)c_{n+1}xⁿ, and substitution in the DE
gives Σ_{n=0}^∞ (n+1)c_{n+1}xⁿ⁺² − Σ_{n=0}^∞ cₙxⁿ + x = 0, or

Σ_{n=0}^∞ (n+1)c_{n+1}xⁿ⁺² − c₀ − c₁x − Σ_{n=2}^∞ cₙxⁿ + x = 0.

We see that c₀ = 0, c₁ = 1, and (n+1)c_{n+1} = c_{n+2} for n ≥ 0. Thus, we have the
recursion relation ncₙ = c_{n+1} for n ≥ 1, whose unique solution is cₙ = (n − 1)!, which
generates the following solution for the DE:

y = x + x² + (2!)x³ + (3!)x⁴ + ⋯ + (n − 1)!xⁿ + ⋯.

This series is not convergent for any nonzero x. ■
As we shall see later, for normal SOLDEs, the power series of y in Equation
(13.28) converges to an analytic function. The SOLDE solved in the preceding
example is not normal.
13.6.4. Example. As an application of Theorem 13.6.2, let us consider the Legendre
equation in its normal form

y″ − (2x/(1 − x²))y′ + (λ/(1 − x²))y = 0.

For |x| < 1 both p and q are analytic, and

p(x) = −2x Σ_{m=0}^∞ (x²)ᵐ = Σ_{m=0}^∞ (−2)x²ᵐ⁺¹,
q(x) = λ Σ_{m=0}^∞ (x²)ᵐ = Σ_{m=0}^∞ λx²ᵐ.

Thus, the coefficients of Equation (13.28) are

aₘ = 0 if m is even, aₘ = −2 if m is odd;   bₘ = λ if m is even, bₘ = 0 if m is odd.
We want to substitute for aₘ and bₘ in Equation (13.29) to find c_{n+1}. It is convenient
to consider two cases: when n is odd and when n is even. For n = 2r + 1, Equation
(13.29), after some algebra, yields

(2r+1)(2r+2)c_{2r+2} = Σ_{m=0}^r (4r − 4m − λ)c_{2(r−m)}.    (13.30)

With r → r + 1, this becomes

(2r+3)(2r+4)c_{2r+4} = Σ_{m=0}^{r+1} (4r + 4 − 4m − λ)c_{2(r+1−m)}
  = (4r + 4 − λ)c_{2(r+1)} + Σ_{m=1}^{r+1} (4r + 4 − 4m − λ)c_{2(r+1−m)}
  = (4r + 4 − λ)c_{2r+2} + Σ_{m=0}^r (4r − 4m − λ)c_{2(r−m)}
  = (4r + 4 − λ)c_{2r+2} + (2r+1)(2r+2)c_{2r+2}
  = [−λ + (2r+3)(2r+2)]c_{2r+2},

where in going from the second equality to the third we changed the dummy index, and in
going from the third equality to the fourth we used Equation (13.30). Now we let 2r + 2 ≡ k
to obtain (k+1)(k+2)c_{k+2} = [k(k+1) − λ]cₖ, or

c_{k+2} = {[k(k+1) − λ]/[(k+1)(k+2)]} cₖ   for even k.

It is not difficult to show that starting with n = 2r, the case of even n, we obtain this same
equation for odd k. Thus, we can write

c_{n+2} = {[n(n+1) − λ]/[(n+1)(n+2)]} cₙ.    (13.31)

For arbitrary c₀ and c₁, we obtain two independent solutions, one of which has only even
powers of x and the other only odd powers. The generalized ratio test (see [Hass 99, Chapter
5]) shows that the series is divergent for x = ±1 unless λ = l(l + 1) for some positive
integer l. In that case the infinite series becomes a polynomial, the Legendre polynomial
encountered in Chapter 7.
Equation (13.31) could have been obtained by substituting Equation (13.28) directly
into the Legendre equation. The roundabout way to (13.31) taken here shows the generality
of Equation (13.29). With specific differential equations it is generally better to substitute
(13.28) directly. ■
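The recursion (13.31) can be run with exact rational arithmetic to watch the truncation happen: setting λ = l(l + 1) kills every coefficient beyond x^l, and the surviving ratios match the Legendre polynomials of Chapter 7 up to normalization. A sketch (the helper name is hypothetical, not from the text):

```python
from fractions import Fraction

def legendre_series(l, terms=12):
    """Coefficients from Eq. (13.31): c_{k+2} = [k(k+1) - lam]/[(k+1)(k+2)] c_k,
    with lam = l(l+1); the seed coefficient matches the parity of l."""
    lam = l*(l + 1)
    c = [Fraction(0)]*terms
    c[l % 2] = Fraction(1)
    for k in range(terms - 2):
        c[k + 2] = Fraction(k*(k + 1) - lam, (k + 1)*(k + 2)) * c[k]
    return c

c = legendre_series(3)
assert all(ck == 0 for ck in c[4:])        # series truncates at degree l = 3
assert c[3]/c[1] == Fraction(-5, 3)        # same ratio as P_3 = (5x^3 - 3x)/2
```

For non-integer λ the loop never produces a zero factor and the coefficients go on forever, consistent with the divergence at x = ±1 noted above.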
quantum harmonic oscillator: power series method

13.6.5. Example. We studied Hermite polynomials in Chapter 7 in the context of classical
orthogonal polynomials. Let us see how they arise in physics.

The one-dimensional time-independent Schrödinger equation for a particle of mass m
in a potential V(x) is

−(ℏ²/2m) d²ψ/dx² + V(x)ψ = Eψ,

where E is the total energy of the particle.

For a harmonic oscillator, V(x) = ½kx² ≡ ½mω²x², and

ψ″ − (m²ω²/ℏ²)x²ψ + (2m/ℏ²)Eψ = 0.

Substituting ψ(x) = H(x) exp(−mωx²/2ℏ) and then making the change of variables
x = (1/√(mω/ℏ))y yields

H″ − 2yH′ + λH = 0,   where λ = 2E/(ℏω) − 1.    (13.32)
This is the Hermite differential equation in normal form. We assume the expansion H(y) =
Σ_{n=0}^∞ cₙyⁿ, which yields

H′(y) = Σ_{n=1}^∞ ncₙyⁿ⁻¹ = Σ_{n=0}^∞ (n+1)c_{n+1}yⁿ,
H″(y) = Σ_{n=1}^∞ n(n+1)c_{n+1}yⁿ⁻¹ = Σ_{n=0}^∞ (n+1)(n+2)c_{n+2}yⁿ.

Substituting in Equation (13.32) gives

Σ_{n=0}^∞ [(n+1)(n+2)c_{n+2} + λcₙ]yⁿ − 2Σ_{n=0}^∞ (n+1)c_{n+1}yⁿ⁺¹ = 0,

or

2c₂ + λc₀ + Σ_{n=0}^∞ [(n+2)(n+3)c_{n+3} + λc_{n+1} − 2(n+1)c_{n+1}]yⁿ⁺¹ = 0.

Setting the coefficients of powers of y equal to zero, we obtain

c₂ = −(λ/2)c₀,   c_{n+3} = {[2(n+1) − λ]/[(n+2)(n+3)]} c_{n+1}   for n ≥ 0,
or, replacing n with n − 1,

c_{n+2} = {(2n − λ)/[(n+1)(n+2)]} cₙ,   n ≥ 1.    (13.33)

The ratio test easily shows that the series is convergent for all values of y.

Thus, the infinite series whose coefficients obey the recursive relation in Equa-
tion (13.33) converges for all y. However, on physical grounds, i.e., the demand that
lim_{x→∞} ψ(x) = 0, the series must be truncated. This happens only if λ = 2l for some
integer l (see Problem 13.20 and [Hass 99, Chapter 13]), and in that case we obtain a
polynomial, the Hermite polynomial of order l. A consequence of such a truncation is the
quantization of the harmonic oscillator energy:

2l = λ = 2E/(ℏω) − 1   ⟹   E = (l + ½)ℏω.

Two solutions are generated from Equation (13.33), one including only even powers
and the other only odd powers. These are clearly linearly independent. Thus, knowledge of
c₀ and c₁ determines the general solution of the HSOLDE of (13.32). ■
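The same truncation can be observed for Equation (13.33): with λ = 2l the series stops at degree l, and the coefficient ratios agree with the Hermite polynomials of Chapter 7 up to normalization. A sketch (helper name hypothetical, not from the text):

```python
from fractions import Fraction

def hermite_series(l, terms=12):
    """Coefficients from Eq. (13.33): c_{n+2} = (2n - lam)/((n+1)(n+2)) c_n,
    with lam = 2l; the seed coefficient matches the parity of l."""
    lam = 2*l
    c = [Fraction(0)]*terms
    c[l % 2] = Fraction(1)
    for n in range(terms - 2):
        c[n + 2] = Fraction(2*n - lam, (n + 1)*(n + 2)) * c[n]
    return c

c = hermite_series(3)
assert all(ck == 0 for ck in c[4:])        # truncation at degree l = 3
assert c[3]/c[1] == Fraction(-2, 3)        # same ratio as H_3 = 8y^3 - 12y
```

The truncation at λ = 2l is exactly the origin of the quantized energies E = (l + ½)ℏω quoted above.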
The preceding two examples show how certain special functions used in math-
ematical physics are obtained in an analytic way, by solving a differential equation.
We saw in Chapter 12 how to obtain spherical harmonics and Legendre polynomi-
als by algebraic methods. It is instructive to solve the harmonic oscillator problem
using algebraic methods, as the following example demonstrates.

quantum harmonic oscillator: algebraic method

13.6.6. Example. The Hamiltonian of a one-dimensional harmonic oscillator is

H = p²/2m + ½mω²x²,

where p = −iℏ d/dx is the momentum operator. Let us find the eigenvectors and eigenvalues
of H.

We define the operators

a = √(mω/2ℏ) x + ip/√(2mωℏ)   and   a† = √(mω/2ℏ) x − ip/√(2mωℏ).

Using the commutation relation [x, p] = iℏ1, we can show that

[a, a†] = 1   and   H = ℏω(a†a + ½1).    (13.34)
creation and annihilation operators

Furthermore, one can readily show that

[H, a] = −ℏωa,   [H, a†] = +ℏωa†.    (13.35)

Let |ψ_E⟩ be the eigenvector corresponding to the eigenvalue E: H|ψ_E⟩ = E|ψ_E⟩, and
note that Equation (13.35) gives Ha|ψ_E⟩ = (aH − ℏωa)|ψ_E⟩ = (E − ℏω)a|ψ_E⟩ and
Ha†|ψ_E⟩ = (E + ℏω)a†|ψ_E⟩. Thus, a|ψ_E⟩ is an eigenvector of H, with eigenvalue E − ℏω,
and a†|ψ_E⟩ is an eigenvector with eigenvalue E + ℏω. That is why a† and a are called the
raising and lowering (or creation and annihilation) operators, respectively.

By applying a repeatedly, we obtain states of lower and lower energies. But there is a
limit to this because H is a positive operator: It cannot have a negative eigenvalue. Thus,
there must exist a ground state, |ψ₀⟩, such that a|ψ₀⟩ = 0. The energy of this ground state
(or the eigenvalue corresponding to |ψ₀⟩) can be obtained:⁷

H|ψ₀⟩ = (ℏωa†a + ½ℏω)|ψ₀⟩ = ½ℏω|ψ₀⟩.
Repeated application of the raising operator yields both higher-level states and eigenvalues.
We thus define |ψₙ⟩ by

|ψₙ⟩ = (1/cₙ)(a†)ⁿ|ψ₀⟩,    (13.36)

where cₙ is a normalizing constant. The energy of |ψₙ⟩ is n units higher than the ground
state's, or

Eₙ = (n + ½)ℏω,

which is what we obtained in the preceding example.

To find cₙ, we demand orthonormality for the |ψₙ⟩. Taking the inner product of (13.36)
with itself, we can show (see Problem 13.21) that |cₙ|² = n|c_{n−1}|², or |cₙ|² = n!|c₀|²,
which for |c₀| = 1 and real cₙ yields cₙ = √(n!). It follows, then, that
quantum harmonic oscillator: connection between algebraic and analytic methods

|ψₙ⟩ = (1/√(n!))(a†)ⁿ|ψ₀⟩.    (13.37)

In terms of functions and derivative operators, a|ψ₀⟩ = 0 gives

[√(mω/2ℏ) x + √(ℏ/2mω) d/dx] ψ₀(x) = 0,

with the solution ψ₀(x) = c exp(−mωx²/2ℏ). Normalizing ψ₀(x) gives

1 = ⟨ψ₀|ψ₀⟩ = c² ∫_{−∞}^∞ exp(−mωx²/ℏ) dx = c²(ℏπ/mω)^{1/2}.

Thus, c = (mω/ℏπ)^{1/4}.
We can now write Equation (13.37) in terms of differential operators:

ψₙ(x) = (1/√(n!)) (mω/ℏπ)^{1/4} [√(mω/2ℏ) x − √(ℏ/2mω) d/dx]ⁿ e^(−mωx²/(2ℏ)).

Defining a new variable y = √(mω/ℏ) x transforms this equation into

ψₙ = (mω/ℏπ)^{1/4} (1/√(2ⁿn!)) (y − d/dy)ⁿ e^(−y²/2).

⁷From here on, the unit operator 1 will not be shown explicitly.
From this, the relation between Hermite polynomials, and the solutions of the one-
dimensional harmonic oscillator as given in the previous example, we can obtain a general
formula for Hₙ(x). In particular, if we note that (see Problem 13.21)

e^(y²/2) (y − d/dy) e^(−y²/2) = −e^(y²) (d/dy) e^(−y²)

and, in general,

e^(y²/2) (y − d/dy)ⁿ e^(−y²/2) = (−1)ⁿ e^(y²) (dⁿ/dyⁿ) e^(−y²),

we recover the generalized Rodriguez formula of Chapter 7. ■
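The operator algebra of this example can be checked with finite matrices. Truncating the number basis to N states is an approximation introduced here, but the commutator (13.35) and the spectrum Eₙ = (n + ½)ℏω come out exactly for the truncated matrices below (units with ℏ = ω = 1 are a choice made here; a sketch, not from the text):

```python
import numpy as np

N = 8                    # truncated number-basis dimension (an approximation)
hbar = omega = 1.0       # units chosen here for convenience

# Annihilation operator in the number basis: a|n> = sqrt(n)|n-1>.
a = np.diag(np.sqrt(np.arange(1, N)), k=1)
adag = a.T

H = hbar*omega*(adag @ a + 0.5*np.eye(N))

# Eq. (13.35): [H, a] = -hbar*omega*a holds exactly for these matrices.
assert np.allclose(H @ a - a @ H, -hbar*omega*a)

# Spectrum: E_n = (n + 1/2) hbar*omega.
assert np.allclose(np.linalg.eigvalsh(H), np.arange(N) + 0.5)
```

The boundary of the truncation shows up in [a, a†], which fails in the last row and column, but a†a, and hence H and [H, a], are unaffected.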
To end this section, we simply quote the following important theorem (for a
proof, see [Birk 78, p. 95]):

13.6.7. Theorem. For any choice of c₀ and c₁, the radius of convergence of any
power-series solution y = Σ_{k=0}^∞ cₖxᵏ for the normal HSOLDE

y″ + p(x)y′ + q(x)y = 0

whose coefficients satisfy the recursion relation of (13.29) is at least as large as
the smaller of the two radii of convergence of the two series for p(x) and q(x).

In particular, if p(x) and q(x) are analytic in an interval around x = 0, then
the solution of the normal HSOLDE is also analytic in a neighborhood of x = 0.
13.7 SOLDEs with Constant Coefficients

The solution to a SOLDE with constant coefficients can always be found in closed
form. In fact, we can treat an nth-order linear differential equation (NOLDE) with
constant coefficients with no extra effort. This brief section outlines the procedure
for solving such an equation. For details, the reader is referred to any elementary
book on differential equations (see also [Hass 99]). The most general nth-order
linear differential equation (NOLDE) with constant coefficients can be written as

L[y] ≡ y⁽ⁿ⁾ + a_{n−1}y⁽ⁿ⁻¹⁾ + ⋯ + a₁y′ + a₀y = r(x).    (13.38)

The corresponding homogeneous NOLDE (HNOLDE) is obtained by setting
r(x) = 0. Let us consider such a homogeneous case first. The solution to the
homogeneous NOLDE

L[y] ≡ y⁽ⁿ⁾ + a_{n−1}y⁽ⁿ⁻¹⁾ + ⋯ + a₁y′ + a₀y = 0    (13.39)

characteristic polynomial of an HNOLDE

can be found by making the exponential substitution y = e^(λx), which results in the
equation L[e^(λx)] = (λⁿ + a_{n−1}λⁿ⁻¹ + ⋯ + a₁λ + a₀)e^(λx) = 0. This equation will
hold only if λ is a root of the characteristic polynomial

p(λ) = λⁿ + a_{n−1}λⁿ⁻¹ + ⋯ + a₁λ + a₀,

which, by the fundamental theorem of algebra, can be written as

p(λ) = (λ − λ₁)^{k₁}(λ − λ₂)^{k₂} ⋯ (λ − λᵣ)^{kᵣ}.    (13.40)

The λⱼ are the distinct (complex) roots of p(λ) and have multiplicity kⱼ.

13.7.1. Theorem. Let {λⱼ}_{j=1}^r be the roots of the characteristic polynomial of the
real HNOLDE of Equation (13.39), and let the respective roots have multiplicities
{kⱼ}_{j=1}^r. Then the functions

x^s e^(λⱼx),   s = 0, 1, …, kⱼ − 1,   j = 1, 2, …, r,

are a basis of solutions of Equation (13.39).
When a λ is complex, one can write its corresponding solution in terms of
trigonometric functions.

13.7.2. Example. An equation that is used in both mechanics and circuit theory is

d²y/dt² + a dy/dt + by = 0   for a, b > 0.    (13.41)

Its characteristic polynomial is p(λ) = λ² + aλ + b, which has the roots

λ₁ = ½(−a + √(a² − 4b))   and   λ₂ = ½(−a − √(a² − 4b)).

We can distinguish three different possible motions depending on the relative sizes of
a and b.

(a) a² > 4b (overdamped): Here we have two distinct simple roots. The multiplicities are
both one (k₁ = k₂ = 1); therefore, the power of t for both solutions is zero (s₁ = s₂ = 0).
Let γ ≡ ½√(a² − 4b). Then the most general solution is

y(t) = c₁e^((−a/2+γ)t) + c₂e^((−a/2−γ)t).

Since a > 2γ, this solution starts at y = c₁ + c₂ at t = 0 and continuously decreases; so,
as t → ∞, y(t) → 0.

(b) a² = 4b (critically damped): In this case we have one multiple root of order 2 (k₁ = 2);
therefore, the power of t can be zero or 1 (s₁ = 0, 1). Thus, the general solution is

y(t) = c₁te^(−at/2) + c₀e^(−at/2).

This solution starts at y(0) = c₀ at t = 0, reaches a maximum (or minimum) at t =
2/a − c₀/c₁, and subsequently decays (grows) exponentially to zero.

(c) a² < 4b (underdamped): Once more, we have two distinct simple roots. The multi-
plicities are both one (k₁ = k₂ = 1); therefore, the power of t for both solutions is zero
(s₁ = s₂ = 0). Let ω ≡ ½√(4b − a²). Then λ₁ = −a/2 + iω and λ₂ = λ₁*. The roots are
complex, and the most general solution is thus of the form

y(t) = e^(−at/2)(c₁ cos ωt + c₂ sin ωt) = Ae^(−at/2) cos(ωt + α).

The solution is a harmonic variation with a decaying amplitude A exp(−at/2). Note that if
a = 0, the amplitude does not decay. That is why a is called the damping factor (or the
damping constant).

These equations describe either a mechanical system oscillating (with no external force)
in a viscous (dissipative) fluid, or an electrical circuit consisting of a resistance R, an
inductance L, and a capacitance C. For RLC circuits, a = R/L and b = 1/(LC). Thus,
the damping factor depends on the relative magnitudes of R and L. On the other hand, the
frequency

ω = ½√(4/(LC) − R²/L²)

depends on all three elements. In particular, for R ≥ 2√(L/C) the circuit does not oscillate. ■
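The classification in this example amounts to inspecting the discriminant a² − 4b of the characteristic polynomial. A small sketch (function names are illustrative, not from the text):

```python
import numpy as np

def classify(a, b):
    """Sign of the discriminant of p(lam) = lam^2 + a*lam + b classifies
    the motion of y'' + a y' + b y = 0 with a, b > 0."""
    disc = a*a - 4*b
    if disc > 0:
        return "overdamped"
    if disc == 0:
        return "critically damped"
    return "underdamped"

assert classify(3.0, 1.0) == "overdamped"
assert classify(2.0, 1.0) == "critically damped"
assert classify(0.5, 1.0) == "underdamped"

# Underdamped roots: lam = -a/2 +/- i*omega with omega = sqrt(4b - a^2)/2.
a, b = 0.5, 1.0
roots = np.roots([1.0, a, b])
assert np.allclose([r.real for r in roots], [-a/2, -a/2])
assert np.allclose(sorted(abs(r.imag) for r in roots),
                   [np.sqrt(4*b - a*a)/2]*2)
```

For an RLC circuit one would feed in a = R/L and b = 1/(LC), and "underdamped" corresponds to R < 2√(L/C).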
A physical system whose behavior in the absence of a driving force is described
by an HNOLDE will obey an inhomogeneous NOLDE in the presence of the driving
force. This driving force is simply the inhomogeneous term of the NOLDE. The
best way to solve such an inhomogeneous NOLDE in its most general form is by
using Fourier transforms and Green's functions, as we will do in Chapter 20. For
the particular, but important, case in which the inhomogeneous term is a product
of polynomials and exponentials, the solution can be found in closed form.

13.7.3. Theorem. The NOLDE L[y] = e^(λx)S(x), where S(x) is a polynomial, has
the particular solution e^(λx)q(x), where q(x) is also a polynomial. The degree of
q(x) equals that of S(x) unless λ = λⱼ, a root of the characteristic polynomial of
L, in which case the degree of q(x) exceeds that of S(x) by kⱼ, the multiplicity of
λⱼ.

Once we know the form of the particular solution of the NOLDE, we can find
the coefficients in the polynomial of the solution by substituting in the NOLDE
and matching the powers on both sides.
13.7.4. Example. Let us find the most general solutions for the following two differential
equations subject to the boundary conditions y(0) = 0 and y′(0) = 1.

(a) The first DE we want to consider is

y″ + y = xeˣ.    (13.42)

The characteristic polynomial is λ² + 1, whose roots are λ₁ = i and λ₂ = −i. Thus, a basis
of solutions is {cos x, sin x}. To find the particular solution we note that λ (the coefficient
of x in the exponential part of the inhomogeneous term) is 1, which is neither of the roots
λ₁ and λ₂. Thus, the particular solution is of the form q(x)eˣ, where q(x) = Ax + B is
of degree 1 [same degree as that of S(x) = x]. We now substitute u = (Ax + B)eˣ in
Equation (13.42) to obtain the relation

2Axeˣ + (2A + 2B)eˣ = xeˣ.

Matching the coefficients, we have

2A = 1   and   2A + 2B = 0   ⟹   A = ½ = −B.

Thus, the most general solution is

y = c₁ cos x + c₂ sin x + ½(x − 1)eˣ.

Imposing the given boundary conditions yields 0 = y(0) = c₁ − ½ and 1 = y′(0) = c₂.
Thus,

y = ½ cos x + sin x + ½(x − 1)eˣ

is the unique solution.
(b) The next DE we want to consider is

y″ − y = xeˣ.    (13.43)

Here p(λ) = λ² − 1, and the roots are λ₁ = 1 and λ₂ = −1. A basis of solutions is
{eˣ, e⁻ˣ}. To find a particular solution, we note that S(x) = x and λ = 1 = λ₁. Theorem
13.7.3 then implies that q(x) must be of degree 2, because λ₁ is a simple root, i.e., k₁ = 1.
We therefore try

q(x) = Ax² + Bx + C   ⟹   u = (Ax² + Bx + C)eˣ.

Taking the derivatives and substituting in Equation (13.43) yields two equations,

4A = 1   and   A + B = 0,

whose solution is A = −B = ¼. Note that C is not determined, because Ceˣ is a solution
of the homogeneous DE corresponding to Equation (13.43), so when L is applied to u, it
eliminates the term Ceˣ. Another way of looking at the situation is to note that the most
general solution to (13.43) is of the form

y = c₁eˣ + c₂e⁻ˣ + ¼(x² − x)eˣ + Ceˣ.

The term Ceˣ could be absorbed in c₁eˣ. We therefore set C = 0, apply the boundary
conditions, and find the unique solution

y = (5/8)eˣ − (5/8)e⁻ˣ + ¼(x² − x)eˣ. ■
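Both parts of this example can be verified with a computer algebra system. The sketch below (assuming SymPy is available; not from the text) solves part (a) with the stated boundary conditions and confirms the guessed particular solution ½(x − 1)eˣ:

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

# Part (a): y'' + y = x e^x with y(0) = 0, y'(0) = 1.
sol = sp.dsolve(sp.Eq(y(x).diff(x, 2) + y(x), x*sp.exp(x)),
                y(x), ics={y(0): 0, y(x).diff(x).subs(x, 0): 1})
expected = (sp.Rational(1, 2)*sp.cos(x) + sp.sin(x)
            + sp.Rational(1, 2)*(x - 1)*sp.exp(x))
assert sp.simplify(sol.rhs - expected) == 0

# The guessed particular solution (Ax + B)e^x with A = 1/2 = -B checks out:
u = (x/2 - sp.Rational(1, 2))*sp.exp(x)
assert sp.simplify(u.diff(x, 2) + u - x*sp.exp(x)) == 0
```

Replacing the equation with y″ − y = xeˣ reproduces part (b), where `dsolve` returns a quadratic polynomial times eˣ, in line with Theorem 13.7.3.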
13.8 The WKB Method

In this section, we treat the somewhat specialized method, due to Wentzel,
Kramers, and Brillouin, of obtaining an approximate solution to a particular type
of second-order DE arising from the Schrödinger equation in one dimension. Sup-
pose we are interested in finding approximate solutions of the DE

d²y/dx² + q(x)y = 0    (13.44)

in which q varies "slowly" with respect to x in the sense discussed below. If q
varies infinitely slowly, i.e., if it is a constant, the solution to Equation (13.44)
is simply an imaginary exponential (or trigonometric). So, let us define φ(x) by
y = e^(iφ(x)) and rewrite the DE as

(φ′)² − iφ″ − q = 0.    (13.45)

Assuming that φ″ is small (compared to q), so that y does not oscillate too rapidly,
we can find an approximate solution to the DE:

φ′ ≈ ±√q   ⟹   φ ≈ ±∫ √(q(x)) dx.    (13.46)
The condition of validity of our assumption is obtained by differentiating (13.46):

|φ″| ≈ ½ |q′/√q| ≪ |q|.

It follows from Equation (13.46) and the definition of φ that 1/√q is approximately
1/(2π) times one "wavelength" of the solution y. Therefore, the approximation is
valid if the change in q in one wavelength is small compared to |q|.

The approximation can be improved by inserting the derivative of (13.46) in
the DE and solving for a new φ:

(φ′)² ≈ q ± iq′/(2√q)   ⟹   φ′ ≈ ±[q ± iq′/(2√q)]^{1/2} ≈ ±√q + iq′/(4q),

or

φ ≈ ±∫ √(q(x)) dx + (i/4) ln q.

The two choices give rise to two different solutions, a linear combination of which
gives the most general solution. Thus,

y ≈ [1/q^{1/4}(x)] {C₁ exp[i∫ √q dx] + C₂ exp[−i∫ √q dx]}.    (13.47)
Equation (13.47) gives an approximate solution to (13.44) in any region in
which the condition of validity holds. The method fails if q changes too rapidly or
if it is zero at some point of the region. The latter is a serious difficulty, since we
often wish to join a solution in a region in which q(x) > 0 to one in a region in
which q(x) < 0. There is a general procedure for deriving the so-called connection
formulas relating the constants C₁ and C₂ of the two solutions on either side of the
point where q(x) = 0. We shall not go into the details of such a derivation, as it is
not particularly illuminating.⁸ We simply quote a particular result that is useful in
applications.

Suppose that q passes through zero at x₀, is positive to the right of x₀, and
satisfies the condition of validity in regions both to the right and to the left of x₀.
Furthermore, assume that the solution of the DE decreases exponentially to the
left of x₀. Under such conditions, the solution to the left will be of the form

[1/(−q(x))^{1/4}] exp[−∫_x^{x₀} √(−q(t)) dt],    (13.48)

while to the right, we have

[2/q(x)^{1/4}] cos[∫_{x₀}^x √(q(t)) dt − π/4].    (13.49)

A similar procedure gives connection formulas for the case where q is positive on
the left and negative on the right of x₀.
13.8.1. Example. Consider the Schrödinger equation in one dimension,

d²ψ/dx² + (2m/ℏ²)[E − V(x)]ψ = 0,

where V(x) is a potential well meeting the horizontal line of constant E at x = a and x = b,
so that

q(x) = (2m/ℏ²)[E − V(x)]   is   > 0 if a < x < b,   < 0 if x < a or x > b.

The solution that is bounded to the left of a must be exponentially decaying. Therefore, in
the interval (a, b) the approximate solution, as given by Equation (13.49), is

ψ(x) ≈ [A/(E − V)^{1/4}] cos(∫_a^x √((2m/ℏ²)[E − V(t)]) dt − π/4),

where A is some arbitrary constant. The solution that is bounded to the right of b must also
be exponentially decaying. Hence, the solution for a < x < b is

ψ(x) ≈ [B/(E − V)^{1/4}] cos(∫_x^b √((2m/ℏ²)[E − V(t)]) dt − π/4).

Since these two expressions give the same function in the same region, they must be equal.
Thus, A = B, and, more importantly,

cos(∫_a^x √((2m/ℏ²)[E − V(t)]) dt − π/4) = cos(∫_x^b √((2m/ℏ²)[E − V(t)]) dt − π/4),

or

∫_a^b √(2m[E − V(x)]) dx = (n + ½)πℏ.

This is essentially the Bohr-Sommerfeld quantization condition of pre-1925 quantum me-
chanics. ■

⁸The interested reader is referred to the book by Mathews and Walker, pp. 27-37.
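For the harmonic oscillator, V(x) = ½mω²x², the left-hand side of the quantization condition can be evaluated numerically; it equals (n + ½)πℏ exactly at Eₙ = (n + ½)ℏω, in agreement with the algebraic treatment of Example 13.6.6. A sketch in units ℏ = m = ω = 1 (the substitution x = b sin θ is a choice made here to tame the turning-point singularity, not something from the text):

```python
import numpy as np

m = omega = hbar = 1.0   # units chosen here for convenience

def action_integral(E, npts=20001):
    """Integral of sqrt(2m[E - V(x)]) from a to b for V = (1/2) m w^2 x^2,
    with turning points a = -b, b = sqrt(2E/(m w^2)). The substitution
    x = b sin(theta) removes the square-root singularity at the endpoints."""
    b = np.sqrt(2*E/(m*omega**2))
    theta = np.linspace(-np.pi/2, np.pi/2, npts)
    x = b*np.sin(theta)
    f = np.sqrt(np.maximum(2*m*(E - 0.5*m*omega**2*x**2), 0.0))*b*np.cos(theta)
    h = theta[1] - theta[0]
    return h*(f.sum() - 0.5*(f[0] + f[-1]))   # trapezoidal rule

# Quantization condition: the integral equals (n + 1/2)*pi*hbar at E_n.
for n in range(4):
    E = (n + 0.5)*hbar*omega
    assert abs(action_integral(E) - (n + 0.5)*np.pi*hbar) < 1e-8
```

That WKB is exact for the oscillator spectrum is a known peculiarity of this potential; for a general well the condition only locates the energies approximately.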
13.8.1 Classical Limit of the Schrödinger Equation

As long as we are approximating solutions of second-order DEs that arise naturally
from the Schrödinger equation, it is instructive to look at another approximation
to the Schrödinger equation, its classical limit in which the Planck constant goes
to zero.

The idea is to note that since ψ(r, t) is a complex function, one can write it as

ψ(r, t) = A(r, t) exp[(i/ℏ)S(r, t)],    (13.50)

where A(r, t) and S(r, t) are real-valued functions. Substituting (13.50) in the
Schrödinger equation and separating the real and the imaginary parts yields

∂S/∂t + (∇S)²/(2m) + V = (ℏ²/2m)(∇²A)/A,
m ∂A/∂t + ∇A·∇S + (A/2)∇²S = 0.    (13.51)

These two equations are completely equivalent to the Schrödinger equation.
The second equation has a direct physical interpretation. Define

ρ(r, t) ≡ A²(r, t) = |ψ(r, t)|²   and   J(r, t) ≡ A²(r, t)(∇S/m) = ρv,   where v ≡ ∇S/m,    (13.52)

multiply the second equation in (13.51) by 2A/m, and note that it then can be
written as

∂ρ/∂t + ∇·J = 0,    (13.53)

which is the continuity equation for probability. The fact that J is indeed the
probability current density is left for Problem 13.30.
Schrödinger equation describes a classical statistical mixture when ℏ → 0

The first equation of (13.51) gives an interesting result when ℏ → 0, because
in this limit the RHS of the equation will be zero, and we get

∂S/∂t + ½mv² + V = 0.

Taking the gradient of this equation, we obtain

(∂/∂t + v·∇)mv + ∇V = 0,

which is the equation of motion of a classical fluid with velocity field v = ∇S/m.
We thus have the following:

13.8.2. Proposition. In the classical limit, the solution of the Schrödinger equa-
tion describes a fluid (statistical mixture) of noninteracting classical particles of
mass m subject to the potential V(r). The density and the current density of this
fluid are, respectively, the probability density ρ = |ψ|² and the probability current
density J of the quantum particle.
13.9 Numerical Solutions of DEs

The majority of differential equations encountered in physics do not have known
analytic solutions. One therefore resorts to numerical solutions. There are a variety
of methods having various degrees of simplicity of use and accuracy. This section
considers a few representatives applicable to the solution of ODEs. We make
frequent use of techniques developed in Section 2.6. Therefore, the reader is urged
to consult that section as needed.

Any normal differential equation of nth order,

dⁿx/dtⁿ = F(x, ẋ, …, x⁽ⁿ⁻¹⁾, t),

can be reduced to a system of n first-order differential equations by defining x₁ = x,
x₂ = ẋ, …, xₙ = x⁽ⁿ⁻¹⁾. This gives the system

ẋ₁ = x₂,   ẋ₂ = x₃,   …,   ẋ_{n−1} = xₙ,   ẋₙ = F(x₁, x₂, …, xₙ, t).

We restrict ourselves to a FODE of the form ẋ = f(x, t) in which f is a well-
behaved function of two real variables. At the end of the section, we briefly outline
a technique for solving second-order DEs.

Two general types of problems are encountered in applications. An initial
value problem (IVP) gives x(t) at an initial time t₀ and asks for the value of x at
other times. The second type, the boundary value problem (BVP), applies only to
differential equations of higher order than first. A second-order BVP specifies the
value of x(t) and/or ẋ(t) at one or more points and asks for x or ẋ at other values
of t. We shall consider only IVPs.
384 13. SECOND-ORDER LINEAR DIFFERENTIAL EQUATIONS
13.9.1 Using the Backward Difference Operator
Let us consider the IVP

ẋ = f(x, t),   x(t_0) = x_0.   (13.54)

The problem is to find {x_k = x(t_0 + kh), k = 1, 2, ...}, given (13.54).
Let us begin by integrating (13.54) between t_n and t_n + h:

x(t_n + h) − x(t_n) = ∫ from t_n to t_n+h of ẋ(t) dt.

Changing the variable of integration to s = (t − t_n)/h and using the shift operator E introduced in Section 2.6 yields

x_{n+1} − x_n = h ∫_0^1 ẋ(t_n + sh) ds = h ∫_0^1 E^s ẋ(t_n) ds.   (13.55)
Since a typical situation involves calculating x_{n+1} from the values of x(t) and ẋ(t) at preceding steps, we want an expression in which the RHS of Equation (13.55) contains such preceding terms. This suggests expressing E in terms of the backward difference operator. It will also be useful to replace the lower limit of integration by −p, where p is a number to be chosen later for convenience. Thus, Equation (13.55) becomes
x_{n+1} = x_{n−p} + h [∫_{−p}^{1} (1 − ∇)^{−s} ds] ẋ_n
        = x_{n−p} + h ∫_{−p}^{1} [ Σ_{k=0}^{∞} Γ(−s + 1)/(k! Γ(−s − k + 1)) (−∇)^k ] ds ẋ_n
        = x_{n−p} + h ( Σ_{k=0}^{∞} a_k^(p) ∇^k ) ẋ_n,   (13.56)

where

a_k^(p) = ((−1)^k/k!) ∫_{−p}^{1} [Γ(−s + 1)/Γ(−s − k + 1)] ds = (1/k!) ∫_{−p}^{1} s(s + 1)···(s + k − 1) ds.   (13.57)
Keeping the first few terms for p = 0, we obtain the useful formula

x_{n+1} ≈ x_n + h (1 + (1/2)∇ + (5/12)∇² + (3/8)∇³ + (251/720)∇⁴ + ···) ẋ_n.   (13.58)
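As a quick check (not part of the text), the first few coefficients a_k^(0) defined by (13.57) can be computed exactly with rational arithmetic; they reproduce 1, 1/2, 5/12, 3/8, 251/720:

```python
from fractions import Fraction

def open_coeff(k):
    # a_k^(0) = (1/k!) * integral_0^1 s(s+1)...(s+k-1) ds, evaluated exactly.
    poly = [Fraction(1)]                      # coefficients of the product, lowest power first
    for j in range(k):                        # multiply the polynomial by (s + j)
        shifted = [Fraction(0)] + poly        # s * poly
        scaled = [j * c for c in poly] + [Fraction(0)]   # j * poly
        poly = [a + b for a, b in zip(shifted, scaled)]
    integral = sum(c / (n + 1) for n, c in enumerate(poly))   # integrate term by term over [0, 1]
    fact = Fraction(1)
    for j in range(2, k + 1):                 # k!
        fact *= j
    return integral / fact

# Adam's method (below) keeps terms through the 3/8 coefficient.
print([open_coeff(k) for k in range(5)])
```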
Due to the presence of ∇ in Equation (13.58), finding the value of x(t) at t_{n+1} requires a knowledge of x(t) and ẋ(t) at the points t_0, t_1, ..., t_n. Because of this, Equation (13.58) is called a formula of open type. In contrast, in formulas of closed type, the RHS contains values at t_{n+1} as well. We can obtain a formula of closed type by changing E^s ẋ_n to its equivalent form, E^{s−1} ẋ_{n+1}. The result is
x_{n+1} = x_{n−p} + h Σ_{k=0}^{∞} b_k^(p) ∇^k ẋ_{n+1},   (13.59)

where

b_k^(p) ≡ ((−1)^k/k!) ∫_{−p}^{1} [Γ(−s + 2)/Γ(−s − k + 2)] ds.

Keeping the first few terms for p = 0, we obtain

x_{n+1} ≈ x_n + h (1 − (1/2)∇ − (1/12)∇² − (1/24)∇³ − (19/720)∇⁴ − (3/160)∇⁵ − ···) ẋ_{n+1},   (13.60)
which involves evaluation at t_{n+1} on the RHS.
For p = 1 (p = 3), Equation (13.56) [(13.59)] results in an expansion in powers of ∇ in which the coefficient of ∇^p (∇^{p+2}) is zero. Thus, retaining terms up to the (p − 1)st [(p + 1)st] power of ∇ automatically gives us an accuracy of h^p (h^{p+2}). This is the advantage of using nonzero values of p and the reason we considered such cases. The reason for the use of formulas of the closed type is the smallness of the error involved. All the formulas derived in this section involve powers of ∇ operating on ẋ_n or ẋ_{n+1}. This means that to find x_{n+1}, we must know the values of ẋ_k for k ≤ n + 1. However, ẋ = f(x, t), or ẋ_k = f(x_k, t_k), implies that knowledge of ẋ_k requires knowledge of x_k. Therefore, to find x_{n+1}, we must know not only the values of ẋ(t) but also the values of x(t) at t_k for k ≤ n + 1. In particular, we cannot start with n = 0 because we would get negative indices for x due to the high powers of ∇. This means that the first few values of x_k must be obtained using a different method. One common method of starting the solution is to use a Taylor series expansion:
x_k = x(t_0 + kh) = x_0 + h ẋ_0 k + (h² ẍ_0/2) k² + ···,   (13.61)

where

ẋ_0 = f(x_0, t_0),   ẍ_0 = (∂f/∂x)|_{x_0,t_0} ẋ_0 + (∂f/∂t)|_{x_0,t_0},   ....
For the general case, it is clear that the derivatives required for the RHS of Equation (13.61) involve very complicated expressions. The following example illustrates the procedure for a specific case.
13.9.1. Example. Let us solve the IVP ẋ + x + eᵗx² = 0 with x(0) = 1. We can obtain a Taylor series expansion for x by noting that

ẋ_0 = −x_0 − x_0²,   ẍ_0 = −ẋ_0 − 2x_0ẋ_0 − x_0²,
x⃛_0 = −ẍ_0 − x_0² − 4x_0ẋ_0 − 2ẋ_0² − 2x_0ẍ_0.

Continuing in this way, we can obtain derivatives of all orders. Substituting x_0 = 1 and keeping terms up to the fifth order, we obtain

ẋ_0 = −2,   ẍ_0 = 5,   x⃛_0 = −16,   (d⁴x/dt⁴)|_{t=0} = 65,   (d⁵x/dt⁵)|_{t=0} = −326.

Substituting these values in a Taylor series expansion with h = 0.1 yields

x_k = 1 − 0.2k + 0.025k² − 0.0027k³ + (2.7 × 10⁻⁴)k⁴ − (2.7 × 10⁻⁵)k⁵ + ···.

Thus, x_1 = 0.82254, x_2 = 0.68186, and x_3 = 0.56741. The corresponding values of ẋ can be calculated using the DE. We simply quote the result: ẋ_1 = −1.57026, ẋ_2 = −1.24973, ẋ_3 = −1.00200. ■
Once the starting values are obtained, either a formula of open type or one of closed type is used to find the next x value. Only formulas of open type will be discussed here. However, as mentioned earlier, the accuracy of closed-type formulas is better. The price one pays for having ẋ_{n+1} on the RHS is that using closed-type formulas requires estimating x_{n+1}. This estimate is then substituted in the RHS, and an improved estimate is found. The process is continued until no further improvement in the estimate is achieved.
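The iteration just described can be sketched in code for the simplest closed-type formula, the trapezoidal rule x_{n+1} = x_n + (h/2)(ẋ_n + ẋ_{n+1}), obtained by keeping ∇ only to the first power (the function name and structure are mine, a sketch rather than the text's program):

```python
import math

def closed_step(f, x, t, h, tol=1e-12, max_iter=50):
    # x_{n+1} = x_n + (h/2)[f(x_n, t_n) + f(x_{n+1}, t_{n+1})],
    # solved by fixed-point iteration starting from an open-type (Euler) estimate.
    fx = f(x, t)
    x_new = x + h * fx                      # first estimate of x_{n+1}
    for _ in range(max_iter):
        x_next = x + 0.5 * h * (fx + f(x_new, t + h))
        if abs(x_next - x_new) < tol:       # no further improvement
            return x_next
        x_new = x_next
    return x_new

# Try it on xdot = -x, whose exact solution is e^{-t}.
x = closed_step(lambda x, t: -x, 1.0, 0.0, 0.1)
print(x)
```

For a well-behaved f and small h the fixed-point iteration converges in a few passes; stiff problems would instead call for a Newton-type solve of the implicit equation.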
The use of open-type formulas involves simple substitution of the known quantities x_0, x_1, ..., x_n on the RHS to obtain x_{n+1}. The master equation (for p = 0) is (13.58). The number of powers of ∇ that are retained gives rise to different methods. For instance, when no power is retained, the method is called Euler's method, for which we use x_{n+1} ≈ x_n + hẋ_n. A more commonly used method is Adam's method, for which all powers of ∇ up to and including the third are retained. We then have

x_{n+1} ≈ x_n + h (1 + (1/2)∇ + (5/12)∇² + (3/8)∇³) ẋ_n,

or, in terms of values of ẋ,

x_{n+1} ≈ x_n + (h/24)(55ẋ_n − 59ẋ_{n−1} + 37ẋ_{n−2} − 9ẋ_{n−3}).   (13.62)
Recall that ẋ_k = f(x_k, t_k). Thus, if we know the values x_n, x_{n−1}, x_{n−2}, and x_{n−3}, we can obtain x_{n+1}.
13.9.2. Example. Knowing x_0, x_1, x_2, and x_3, we can use Equation (13.62) to calculate x_4 for Example 13.9.1:

x_4 ≈ x_3 + (0.1/24)(55ẋ_3 − 59ẋ_2 + 37ẋ_1 − 9ẋ_0) = 0.47793.

With x_4 at our disposal, we can evaluate ẋ_4 = −x_4 − x_4²e^{t_4}, and substitute it in (13.62) to find x_5, and so on. ■
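As a sketch of this step in code (variable names are mine, not the text's), the start-up values of Example 13.9.1 feed one step of Adam's method:

```python
import math

def f(x, t):                      # the DE of Example 13.9.1: xdot = -x - e^t x^2
    return -x - math.exp(t) * x * x

h = 0.1
xs = [1.0, 0.82254, 0.68186, 0.56741]          # start-up values from the Taylor series
xdots = [f(x, k * h) for k, x in enumerate(xs)]

# Adam's method: x_4 = x_3 + (h/24)(55 xdot_3 - 59 xdot_2 + 37 xdot_1 - 9 xdot_0)
x4 = xs[3] + (h / 24) * (55 * xdots[3] - 59 * xdots[2] + 37 * xdots[1] - 9 * xdots[0])
print(round(x4, 5))    # prints 0.47793, matching the example
```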
A crucial fact about such methods is that every value obtained is in error by some amount, and using such values to obtain new values propagates the error. Thus, error can accumulate rapidly and make approximations worse at each step. Discussion of error propagation and error analysis (topics that we have not, and shall not, cover) is common in the literature (see, for example, [Hild 87, pp. 267-268]).
13.9.2 The Runge-Kutta Method
The FODE of Equation (13.54) leads to a unique Taylor series,

x(t_0 + h) = x_0 + hẋ_0 + (h²/2)ẍ_0 + ···,   (13.63)

where ẋ_0, ẍ_0, and all the rest of the derivatives can be evaluated by differentiating ẋ = f(x, t). Thus, theoretically, the Taylor series gives the solution (for t_0 + h; but t_0 + 2h, t_0 + 3h, and so on can be obtained similarly). However, in practice, the Taylor series converges slowly, and the accuracy involved is not high. Thus one resorts to other methods of solution such as those described earlier.
Another method, known as the Runge-Kutta method, replaces the Taylor series with

x_{n+1} = x_n + h [ a_0 f(x_n, t_n) + Σ_{j=1}^{p} a_j f(x_n + b_j h, t_n + μ_j h) ],   (13.64)

where a_0 and {a_j, b_j, μ_j}_{j=1}^{p} are constants chosen such that if the RHS of (13.64) were expanded in powers of the spacing h, the coefficients of a certain number of the leading terms would agree with the corresponding expansion coefficients of the RHS of (13.63). It is customary to express the b's as linear combinations of preceding values of f:

h b_i = Σ_{r=0}^{i−1} λ_{ir} k_r,   i = 1, 2, ..., p.   (13.65)
388 13. SECOND-ORDER LINEAR DIFFERENTIAL EQUATIONS
The k_r are recursively defined as

k_0 = hf(x_n, t_n),   k_r = hf(x_n + b_r h, t_n + μ_r h).

Then Equation (13.64) gives x_{n+1} = x_n + Σ_{r=0}^{p} a_r k_r. The (nontrivial) task now is to determine the parameters a_r, μ_r, and λ_{ij}.
Carle David Tolmé Runge (1856-1927), after returning from a six-month vacation in Italy, enrolled at the University of Munich to study literature. However, after six weeks of the course he changed to mathematics and physics.
Runge attended courses with Max Planck, and they became close friends. In 1877 both went to Berlin, but Runge turned to pure mathematics after attending Weierstrass's lectures. His doctoral dissertation (1880) dealt with differential geometry.
After taking a secondary-school teachers' certification test, he returned to Berlin, where he was influenced by Kronecker. Runge then worked on a procedure for the numerical solution of algebraic equations in which the roots were expressed as infinite series of rational functions of the coefficients. In the area of numerical analysis, he is credited with an efficient method of solving differential equations numerically, work he did with Martin Kutta.
Runge published little at that stage, but after visiting Mittag-Leffler in Stockholm in September 1884 he produced a large number of papers in Mittag-Leffler's journal Acta Mathematica. In 1886, Runge obtained a chair at Hanover and remained there for 18 years. Within a year Runge had moved away from pure mathematics to study the wavelengths of the spectral lines of elements other than hydrogen. He did a great deal of experimental work and published a great quantity of results, including a separation of the spectral lines of helium into two spectral series.
In 1904 Klein persuaded Göttingen to offer Runge a chair of applied mathematics, a post that Runge held until he retired in 1925.
Runge was always a fit and active man, and on his 70th birthday he entertained his grandchildren by doing handstands. However, a few months later he had a heart attack and died.
In general, the determination of these constants is extremely tedious. Let us consider the very simple case where p = 1, and let λ ≡ λ_{10} and μ ≡ μ_1. Then we obtain

x_{n+1} = x_n + a_0 k_0 + a_1 k_1,

where k_0 = hf(x_n, t_n) and k_1 = hf(x_n + λk_0, t_n + μh).
Taylor-expanding k_1, a function of two variables, gives⁹

k_1 = hf + h²(μf_t + λf f_x) + (h³/2)(μ²f_tt + 2λμ f f_xt + λ²f² f_xx) + O(h⁴),

⁹The symbol O(hᵐ) means that all terms of order hᵐ and higher have been neglected.
where f_t ≡ ∂f/∂t, etc. Substituting this in the expression x_{n+1} = x_n + a_0 k_0 + a_1 k_1, we get

x_{n+1} = x_n + h(a_0 + a_1)f + h²a_1(μf_t + λf f_x) + (h³/2)a_1(μ²f_tt + 2λμ f f_xt + λ²f² f_xx) + O(h⁴).   (13.66)
On the other hand, with

ẋ = f,   ẍ = df/dt = (∂f/∂x)(dx/dt) + ∂f/∂t = ẋ f_x + f_t = f f_x + f_t,
x⃛ = f_tt + 2f f_xt + f² f_xx + f_x(f f_x + f_t),

Equation (13.63) gives

x_{n+1} = x_n + hf + (h²/2)(f f_x + f_t) + (h³/6)[f_tt + 2f f_xt + f² f_xx + f_x(f f_x + f_t)] + O(h⁴).   (13.67)
If we demand that (13.66) and (13.67) agree up to the h² term (we cannot demand agreement for h³ or higher because of overspecification), then we must have a_0 + a_1 = 1, a_1 μ = 1/2, a_1 λ = 1/2. These are only three equations for four unknowns. Therefore, there will be an arbitrary parameter β in terms of which the unknowns can be written:

a_0 = 1 − β,   a_1 = β,   μ = λ = 1/(2β).
Substituting these values in the expression for x_{n+1} gives

x_{n+1} = x_n + h [ (1 − β) f(x_n, t_n) + β f(x_n + (h/(2β)) f(x_n, t_n), t_n + h/(2β)) ].

This formula becomes useful if we let β = 1/2. Then t_n + h/(2β) = t_n + h = t_{n+1}, which makes evaluation of the second term in square brackets convenient. For β = 1/2, we have

x_{n+1} = x_n + (h/2)[f(x_n, t_n) + f(x_n + hf, t_{n+1})] + O(h³).   (13.68)

What is nice about this equation is that it needs no starting up! We can plug in the known quantities t_n, t_{n+1}, and x_n on the RHS and find x_{n+1} starting with n = 0. However, the result is not very accurate, and we cannot make it any more accurate by demanding agreement for higher powers of h, because, as mentioned earlier, such a demand overspecifies the unknowns.
Martin Wilhelm Kutta (1867-1944) lost his parents when he was still a child, and together with his brother went to his uncle in Breslau to go to gymnasium. He attended the University of Breslau from 1885-1890, and the University of Munich from 1891-1894, concentrating mainly on mathematics, but he was also interested in languages, music, and art. Although he completed the certification for teaching mathematics and physics in 1894, he did not start teaching immediately. Instead, he assisted von Dyck at the Technische Hochschule München until 1897 (and then again from 1899 to 1903).
From 1898 to 1899 he studied at Cambridge, and a year later, he finished his Ph.D. at the University of Munich. In 1902, he completed his habilitation in pure and applied mathematics at the Technische Hochschule München, where he became professor of applied mathematics five years later. In 1909 he accepted an offer from the University of Vienna, but a year later he went to the Technische Hochschule Aachen as a professor. From 1912 until his retirement in 1935 he worked at the Technische Hochschule Stuttgart.
Kutta's name is well known not only to physicists, applied mathematicians, and engineers, but also to specialists in aerospace science and fluid mechanics. The first group use the Runge-Kutta method, developed at the beginning of the twentieth century, to obtain numerical solutions to ordinary differential equations. The second group use the Kutta-Zhukovskii formula for the theoretical description of the buoyancy of a body immersed in a nonturbulent moving fluid. Kutta's work on the application of conformal mapping to the study of airplane wings was later applied to the flight of birds, and further developed by L. Prandtl in the theory of wings.
Kutta obtained the motivation for his first scientific publication from Boltzmann and others (including a historian of mathematics) when working on the theoretical determination of the heat exchanged between two concentric cylinders kept at constant temperatures. By applying the conformal mapping technique, Kutta managed to obtain numerical values for the heat conductivity of air that agreed well with the experimental values of the time.
Three of Kutta's publications dealt with the history of mathematics, for which he profited greatly because of his knowledge of the Arabic language.
One of the most important tasks of applied mathematics is to approximate numerically the initial value problem of ODEs whose solutions cannot be found in closed form. After Euler (1770) had already expressed the basic idea, Runge (1895) and Heun (1900) wrote down the appropriate formulas. Kutta's contribution was to considerably increase the accuracy, and allow for a larger selection of the parameters involved. After accepting a professorship in Stuttgart in 1912, Kutta devoted all his time to teaching. He was very much in demand as a teacher, and it is said that his lectures were so good that even engineering students took an interest in mathematics.
(Taken from W. Schulz, "Martin Wilhelm Kutta," Neue Deutsche Biographie 13, Berlin, (1952- ) 348-350.)
Formulas that give more accurate results can be obtained by retaining terms beyond p = 1. Thus, for p = 2, if we write x_{n+1} = x_n + Σ_{r=0}^{2} a_r k_r, there will be eight unknowns (three a's, three λ_{ij}'s, and two μ's), and the demand for agreement between the Taylor expansion and the expansion of f up to h³ will yield only six equations. Therefore, there will be two arbitrary parameters whose specification results in various formulas. The details of this kind of algebraic derivation are very messy, so we will merely consider two specific formulas. One such formula, due to Kutta, is

x_{n+1} = x_n + (1/6)(k_0 + 4k_1 + k_2) + O(h⁴),   (13.69)

where

k_0 = hf(x_n, t_n),   k_1 = hf(x_n + (1/2)k_0, t_n + (1/2)h),
k_2 = hf(x_n + 2k_1 − k_0, t_n + h).

A second formula, due to Heun, has the form

x_{n+1} = x_n + (1/4)(k_0 + 3k_2) + O(h⁴),

where

k_0 = hf(x_n, t_n),   k_1 = hf(x_n + (1/3)k_0, t_n + (1/3)h),
k_2 = hf(x_n + (2/3)k_1, t_n + (2/3)h).
These two formulas are of about the same order of accuracy.
13.9.3. Example. Let us solve the DE of Example 13.9.1 using the Runge-Kutta method. With t_0 = 0, x_0 = 1, h = 0.1, and n = 0, Equation (13.69) gives k_0 = −0.2, k_1 = −0.17515, k_2 = −0.16476, so that

x_1 = 1 + (1/6)[−0.2 + 4(−0.17515) − 0.16476] = 0.82244.

This x_1, h = 0.1, and t_1 = t_0 + h = 0.1 yield the following new values: k_0 = −0.15700, k_1 = −0.13870, k_2 = −0.13040, which in turn give

x_2 = 0.82244 + (1/6)[−0.15700 + 4(−0.13870) − 0.13040] = 0.68207.

We similarly obtain x_3 = 0.56964 and x_4 = 0.47858. On the other hand, solving the FODE analytically gives the exact result x(t) = e⁻ᵗ/(1 + t).
Table 13.1 compares the values obtained here, those obtained using Adam's method, and the exact values to five decimal places. It is clear that the Runge-Kutta method is more accurate than the methods discussed earlier. ■
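The same computation can be sketched in code (a hypothetical rendering of (13.69); names are mine):

```python
import math

def kutta3_step(f, x, t, h):
    # Third-order Kutta formula (13.69)
    k0 = h * f(x, t)
    k1 = h * f(x + 0.5 * k0, t + 0.5 * h)
    k2 = h * f(x + 2 * k1 - k0, t + h)
    return x + (k0 + 4 * k1 + k2) / 6

f = lambda x, t: -x - math.exp(t) * x * x    # the DE of Example 13.9.1
x, t, h = 1.0, 0.0, 0.1
vals = []
for n in range(4):
    x = kutta3_step(f, x, t, h)
    t += h
    vals.append(round(x, 5))
print(vals)    # compare with x_1, ..., x_4 quoted in the example
```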
The accuracy of the Runge-Kutta method and the fact that it requires no start-up procedure make it one of the most popular methods for solving differential equations. The Runge-Kutta method can be made more accurate by using higher values of p. For instance, a formula that is used for p = 3 is

x_{n+1} = x_n + (1/6)(k_0 + 2k_1 + 2k_2 + k_3) + O(h⁵),   (13.70)

where

k_0 = hf(x_n, t_n),   k_1 = hf(x_n + (1/2)k_0, t_n + (1/2)h),
k_2 = hf(x_n + (1/2)k_1, t_n + (1/2)h),   k_3 = hf(x_n + k_2, t_n + h).
t      Analytical   Runge-Kutta   Adam's method
0      1            1             1
0.1    0.82258      0.82244       0.82254
0.2    0.68228      0.68207       0.68186
0.3    0.56986      0.56964       0.56741
0.4    0.47880      0.47858       0.47793

Table 13.1 Solutions to the differential equation of Example 13.9.1 obtained in three different ways.
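A code sketch of the fourth-order formula (13.70) (the function name is mine); run on the same example, it lands essentially on the analytical column of Table 13.1:

```python
import math

def rk4_step(f, x, t, h):
    # The p = 3 Runge-Kutta formula (13.70)
    k0 = h * f(x, t)
    k1 = h * f(x + 0.5 * k0, t + 0.5 * h)
    k2 = h * f(x + 0.5 * k1, t + 0.5 * h)
    k3 = h * f(x + k2, t + h)
    return x + (k0 + 2 * k1 + 2 * k2 + k3) / 6

f = lambda x, t: -x - math.exp(t) * x * x    # Example 13.9.1 again
x, t, h = 1.0, 0.0, 0.1
for _ in range(4):
    x = rk4_step(f, x, t, h)
    t += h
exact = math.exp(-t) / (1 + t)               # 0.47880 to five places
print(x, exact)
```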
13.9.3 Higher-Order Equations
Any nth-order differential equation is equivalent to n first-order differential equations in n + 1 variables. Thus, for instance, the most general SODE, F(ẍ, ẋ, x, t) = 0, can be reduced to two FODEs by solving for ẍ to obtain ẍ = G(ẋ, x, t), and defining ẋ = u to get the system of equations

u̇ = G(u, x, t),   ẋ = u.

These two equations are completely equivalent to the original SODE. Thus, it is appropriate to discuss numerical solutions of systems of FODEs in several variables. The discussion here will be limited to systems consisting of two equations. The generalization to several equations is not difficult.
Consider the IVP of the following system of equations:

ẋ = f(x, u, t),   x(t_0) = x_0;   u̇ = g(x, u, t),   u(t_0) = u_0.   (13.71)

Using an obvious generalization of Equation (13.70), we can write

x_{n+1} = x_n + (1/6)(k_0 + 2k_1 + 2k_2 + k_3) + O(h⁵),
u_{n+1} = u_n + (1/6)(m_0 + 2m_1 + 2m_2 + m_3) + O(h⁵),

where

k_0 = hf(x_n, u_n, t_n),   k_1 = hf(x_n + (1/2)k_0, u_n + (1/2)m_0, t_n + (1/2)h),
k_2 = hf(x_n + (1/2)k_1, u_n + (1/2)m_1, t_n + (1/2)h),
k_3 = hf(x_n + k_2, u_n + m_2, t_n + h),

and

m_0 = hg(x_n, u_n, t_n),   m_1 = hg(x_n + (1/2)k_0, u_n + (1/2)m_0, t_n + (1/2)h),
m_2 = hg(x_n + (1/2)k_1, u_n + (1/2)m_1, t_n + (1/2)h),
m_3 = hg(x_n + k_2, u_n + m_2, t_n + h).   (13.72)
These formulas are more general than needed for a SODE, since, as mentioned above, such a SODE is equivalent to the simpler system in which f(x, u, t) ≡ u. Therefore, Equation (13.72) specializes to

k_0 = hu_n = hẋ_n,   k_1 = h(u_n + (1/2)m_0) = hẋ_n + (1/2)hm_0,
k_2 = hẋ_n + (1/2)hm_1,   k_3 = hẋ_n + hm_2,

and

x_{n+1} = x_n + hẋ_n + (h/6)(m_0 + m_1 + m_2) + O(h⁵),
ẋ_{n+1} = ẋ_n + (1/6)(m_0 + 2m_1 + 2m_2 + m_3) + O(h⁵),   (13.73)

where

m_0 = hg(x_n, ẋ_n, t_n),   m_1 = hg(x_n + (1/2)hẋ_n, ẋ_n + (1/2)m_0, t_n + (1/2)h),
m_2 = hg(x_n + (1/2)hẋ_n + (1/4)hm_0, ẋ_n + (1/2)m_1, t_n + (1/2)h),
m_3 = hg(x_n + hẋ_n + (1/2)hm_1, ẋ_n + m_2, t_n + h).
13.9.4. Example. The IVP ẍ + x = 0, x(0) = 0, ẋ(0) = 1 clearly has the analytic solution x(t) = sin t. Nevertheless, let us use Equation (13.73) to illustrate the Runge-Kutta method and compare the result with the exact solution.
For this problem g(x, ẋ, t) = −x. Therefore, we can easily calculate the m's:

m_0 = −hx_n,   m_1 = −h(x_n + (1/2)hẋ_n),
m_2 = −h(x_n + (1/2)hẋ_n − (1/4)h²x_n),
m_3 = −h[x_n + hẋ_n − (1/2)h²(x_n + (1/2)hẋ_n)].

These lead to the following expressions for x_{n+1} and ẋ_{n+1}:

x_{n+1} = x_n + hẋ_n − (h²/6)(3x_n + hẋ_n − (1/4)h²x_n),
ẋ_{n+1} = ẋ_n − (h/6)[6x_n + 3hẋ_n − h²(x_n + (1/4)hẋ_n)].

Starting with x_0 = 0 and ẋ_0 = 1, we can generate x_1, x_2, and so on by using the last two equations successively. The results for 10 values of x with h = 0.1 are given to five significant figures in Table 13.2. Note that up to x_5 there is complete agreement with the exact result. ■
The Runge-Kutta method lends itself readily to use in computer programs. Because Equation (13.73) does not require any start-up values, it can be used directly to generate solutions to any IVP involving a SODE.
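Such a program can be sketched as follows (a minimal rendering of (13.73); identifiers are mine), rerunning the oscillator of Example 13.9.4:

```python
import math

def sode_step(g, x, xd, t, h):
    # One step of Eq. (13.73) for xddot = g(x, xdot, t)
    m0 = h * g(x, xd, t)
    m1 = h * g(x + 0.5 * h * xd, xd + 0.5 * m0, t + 0.5 * h)
    m2 = h * g(x + 0.5 * h * xd + 0.25 * h * m0, xd + 0.5 * m1, t + 0.5 * h)
    m3 = h * g(x + h * xd + 0.5 * h * m1, xd + m2, t + h)
    x_new = x + h * xd + h * (m0 + m1 + m2) / 6
    xd_new = xd + (m0 + 2 * m1 + 2 * m2 + m3) / 6
    return x_new, xd_new

g = lambda x, xd, t: -x              # xddot + x = 0
x, xd, t, h = 0.0, 1.0, 0.0, 0.1
for _ in range(10):
    x, xd = sode_step(g, x, xd, t, h)
    t += h
# compare with sin(1) = 0.84147 (Table 13.2 lists 0.84161 for this scheme)
print(x)
```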
Another, more direct, method of solving higher-order differential equations is to substitute D = −(1/h) ln(1 − ∇) for the derivative operator in the differential equation, expand in terms of ∇, and keep an appropriate number of terms. Problem 13.33 illustrates this point for a linear SODE.
t      Runge-Kutta   sin t
0.1    0.09983       0.09983
0.2    0.19867       0.19867
0.3    0.29552       0.29552
0.4    0.38942       0.38942
0.5    0.47943       0.47943
0.6    0.56466       0.56464
0.7    0.64425       0.64422
0.8    0.71741       0.71736
0.9    0.78342       0.78333
1.0    0.84161       0.84147

Table 13.2 Comparison of the Runge-Kutta and exact solutions to the second-order DE of Example 13.9.4.
13.10 Problems
13.1. Let u(x) be a differentiable function satisfying the differential inequality u′(x) ≤ Ku(x) for x ∈ [a, b], where K is a constant. Show that u(x) ≤ u(a)e^{K(x−a)}. Hint: Multiply both sides of the inequality by e^{−Kx}, and show that the result can be written as the derivative of a nonincreasing function. Then use the fact that a ≤ x to get the final result.
13.2. Prove Proposition 13.4.2.
13.3. Let f and g be two differentiable functions that are linearly dependent. Show
that their Wronskian vanishes.
13.4. Show that if (f_1, f_1′) and (f_2, f_2′) are linearly dependent at one point, then f_1 and f_2 are linearly dependent at all x ∈ [a, b]. Here f_1 and f_2 are solutions of the DE of (13.12). Hint: Derive the identity

W(f_1, f_2; x_2) = W(f_1, f_2; x_1) exp[−∫_{x_1}^{x_2} p(t) dt].
13.5. Show that the solutions to the SOLDE y″ + q(x)y = 0 have a constant Wronskian.
13.6. Find (in terms of an integral) G_n(x), the linearly independent "partner" of the Hermite polynomial H_n(x). Specialize this to n = 0, 1. Is it possible to find G_0(x) and G_1(x) in terms of elementary functions?
13.7. Let f_1, f_2, and f_3 be any three solutions of y″ + py′ + qy = 0. Show that the (generalized 3 × 3) Wronskian of these solutions is zero. Thus, any three solutions of the HSOLDE are linearly dependent.
13.8. For the HSOLDE y″ + py′ + qy = 0, show that

p = (f_2 f_1″ − f_1 f_2″)/W(f_1, f_2)   and   q = (f_1′ f_2″ − f_2′ f_1″)/W(f_1, f_2).

Thus, knowing two solutions of an HSOLDE allows us to reconstruct the DE.
13.9. Let f_1, f_2, and f_3 be three solutions of the third-order linear differential equation y‴ + p_2(x)y″ + p_1(x)y′ + p_0(x)y = 0. Derive a FODE satisfied by the (generalized 3 × 3) Wronskian of these solutions.
13.10. Prove Corollary 13.4.13. Hint: Consider the solution u = 1 of the DE u″ = 0 and apply Theorem 13.4.11.
13.11. Show that the adjoint of M given in Equation (13.21) is the original L.
13.12. Show that if u(x) and v(x) are solutions of the self-adjoint DE (pu′)′ + qu = 0, then Abel's identity, p(uv′ − vu′) = constant, holds.
13.13. Reduce each DE to self-adjoint form.
(a) x²y″ + xy′ + y = 0.   (b) y″ + y′ tan x = 0.
13.14. Reduce the self-adjoint DE (py′)′ + qy = 0 to u″ + S(x)u = 0 by an appropriate change of the dependent variable. What is S(x)? Apply this reduction to the Legendre DE for P_n(x), and show that

S(x) = [1 + n(n + 1) − n(n + 1)x²]/(1 − x²)².

Now use this result to show that every solution of the Legendre equation has at least (2n + 1)/π zeros on (−1, +1).
13.15. Substitute v = y′/y in the homogeneous SOLDE

y″ + p(x)y′ + q(x)y = 0

and:
(a) Show that it turns into v′ + v² + p(x)v + q(x) = 0, which is a first-order nonlinear equation called the Riccati equation. Would the same substitution work if the DE were inhomogeneous?
(b) Show that by an appropriate transformation, the Riccati equation can be directly cast in the form u′ + u² + S(x) = 0.
13.16. For the function S(x) defined in Example 13.6.1, let S⁻¹(x) be the inverse, i.e., S⁻¹(S(x)) = x. Show that

(d/dx)[S⁻¹(x)] = 1/√(1 − x²),

and given that S⁻¹(0) = 0, conclude that

S⁻¹(x) = ∫_0^x dt/√(1 − t²).
13.17. Define sinh x and cosh x as the solutions of y″ = y satisfying the boundary conditions y(0) = 0, y′(0) = 1 and y(0) = 1, y′(0) = 0, respectively. Using Example 13.6.1 as a guide, show that
(a) cosh²x − sinh²x = 1.
(b) cosh(−x) = cosh x.
(c) sinh(−x) = −sinh x.
(d) sinh(a + x) = sinh a cosh x + cosh a sinh x.
13.18. (a) Derive Equation (13.30) of Example 13.6.4.
(b) Derive Equation (13.31) of Example 13.6.4 by direct substitution.
(c) Let λ = l(l + 1) in Example 13.6.4 and calculate the Legendre polynomials P_l(x) for l = 0, 1, 2, 3, subject to the condition P_l(1) = 1.
13.19. Use Equation (13.33) of Example 13.6.5 to generate the first three Hermite polynomials. Use the normalization

∫_{−∞}^{∞} e^{−x²} H_n²(x) dx = 2ⁿ n! √π

to determine the arbitrary constant.
13.20. The function defined by

f(x) = Σ_{n=0}^{∞} c_n xⁿ,   where   c_{n+2} = [(2n − λ)/((n + 1)(n + 2))] c_n,

can be written as f(x) = c_0 g(x) + c_1 h(x), where g is even and h is odd in x. Show that f(x) goes to infinity at least as fast as e^{x²} does, i.e., lim_{x→∞} f(x)e^{−x²} ≠ 0. Hint: Consider g(x) and h(x) separately and show that

g(x) = Σ_{n=0}^{∞} b_n x^{2n},   where   b_{n+1} = [(4n − λ)/((2n + 1)(2n + 2))] b_n.

Then concentrate on the ratio g(x)/e^{x²}, where g and e^{x²} are approximated by polynomials of very high degrees. Take the limit of this ratio as x → ∞, and use recursion relations for g and e^{x²}. The odd case follows similarly.
13.21. Refer to Example 13.6.6 for this problem.
(a) Derive the commutation relation [a, a†] = 1.
(b) Show that the Hamiltonian can be written as given in Equation (13.34).
(c) Derive the commutation relation [a, (a†)ⁿ] = n(a†)ⁿ⁻¹.
(d) Take the inner product of Equation (13.36) with itself and use (c) to show that |c_n|² = n|c_{n−1}|². From this, conclude that |c_n|² = n!|c_0|².
(e) For any function f(y), show that

(y − d/dy)(e^{y²/2} f) = −e^{y²/2} df/dy.

Apply (y − d/dy) repeatedly to both sides of the above equation to obtain

(y − d/dy)ⁿ (e^{y²/2} f) = (−1)ⁿ e^{y²/2} dⁿf/dyⁿ.

(f) Choose an appropriate f(y) in part (e) and show that

e^{y²/2} (y − d/dy)ⁿ e^{−y²/2} = (−1)ⁿ e^{y²} (dⁿ/dyⁿ)(e^{−y²}).
13.22. Solve Airy's DE, y″ + xy = 0, by the power-series method. Show that the radius of convergence for both independent solutions is infinite. Use the comparison theorem to show that for x > 0 these solutions have infinitely many zeros, but for x < 0 they can have at most one zero.
13.23. Show that the functions x^r e^{λx}, where r = 0, 1, 2, ..., k, are linearly independent. Hint: Starting with (D − λ)^k, apply powers of D − λ to a linear combination of x^r e^{λx} for all possible r's.
13.24. Find a basis of real solutions for each DE.
(a) y″ + 5y′ + 6y = 0.
(b) y‴ + 6y″ + 12y′ + 8y = 0.
(c) d⁴y/dx⁴ = y.
(d) d⁴y/dx⁴ = −y.
13.25. Solve the following initial value problems.
(a) d⁴y/dx⁴ = y,   y(0) = y′(0) = y‴(0) = 0, y″(0) = 1.
(b) d⁴y/dx⁴ + d²y/dx² = 0,   y(0) = y″(0) = y‴(0) = 0, y′(0) = 1.
(c) d⁴y/dx⁴ = 0,   y(0) = y′(0) = y″(0) = 0, y‴(0) = 2.
13.26. Solve y″ − 2y′ + y = xe^x subject to the initial conditions y(0) = 0, y′(0) = 1.
13.27. Find the general solution of each equation.
(a) y″ = xe^x.
(b) y″ − 4y′ + 4y = x².
(c) y″ + y = sin x sin 2x.
(d) y″ − y = (1 + e^{−x})².
(e) y″ − y = e^x sin 2x.
(f) y⁽⁶⁾ − y⁽⁴⁾ = x².
(g) y″ − 4y′ + 4y = e^x + xe^{2x}.
(h) y″ + y = e^{2x}.

13.28. Consider the Euler equation,

xⁿy⁽ⁿ⁾ + a_{n−1}x^{n−1}y⁽ⁿ⁻¹⁾ + ··· + a_1 xy′ + a_0 y = r(x).

Substitute x = eᵗ and show that such a substitution reduces this to a DE with constant coefficients. In particular, solve x²y″ − 4xy′ + 6y = x.

13.29. (a) Show that the substitution (13.50) reduces the Schrödinger equation to (13.51).
(b) From the second equation of (13.51), derive the continuity equation for probability.

13.30. Show that the usual definition of probability current density,

J = Re[ψ* (ℏ/(im)) ∇ψ],

reduces to that in Equation (13.52) if we use (13.50).

13.31. Write a computer program that solves the following differential equations by
(a) Adam's method [Equation (13.62)] and
(b) the Runge-Kutta method [Equation (13.70)].

ẋ = t + sin x,   x(0) = π/2;
ẋ = sin xt,      x(0) = 1;
ẋ = t − x²,      x(0) = 1;
ẋ = e^{−xt},     x(0) = 1;
ẋ = x²t² + 1,    x(0) = 1.
13.32. Solve the following IVPs numerically, with h = 0.1. Find the first ten values of x.
(a) ẍ + 0.2ẋ² + 10x = 20t,   x(0) = 0,   ẋ(0) = 0.
(b) ẍ + 4x = t²,             x(0) = 1,   ẋ(0) = 0.
(c) ẍ + ẋ + x = 0,           x(0) = 2,   ẋ(0) = 0.
(d) tẍ + ẋ + xt = 0,         x(0) = 1,   ẋ(0) = 0.
(e) ẍ + ẋ + x² = t,          x(0) = 1,   ẋ(0) = 0.
(f) ẍ + xt = 0,              x(0) = 0,   ẋ(0) = 1.
(g) ẍ + sin x = t,           x(0) = π/2, ẋ(0) = 0.
13.33. Substitute d/dt = D = −(1/h) ln(1 − ∇) in the SOLDE ẍ + p(t)ẋ + q(t)x = r(t) and expand the log terms to obtain

(∇² + ∇³)x_n + hp_n(∇ + (1/2)∇²)x_n + h²q_n x_n = h²r_n.

Since ∇ is of order h, one has to keep one power fewer in the second term. Find an expression for x_n in terms of x_{n−1}, x_{n−2}, and x_{n−3}, valid to h².
Additional Reading
1. Birkhoff, G. and Rota, G.-C. Ordinary Differential Equations, 3rd ed., Wiley, 1978. The small size of this book is very deceptive. It is loaded with information. Written by two excellent mathematicians and authors, the book covers all the topics of this chapter and much more in a very clear and lucid style.
2. DeVries, P. A First Course in Computational Physics, Wiley, 1994. The numerical solutions of differential equations are discussed in detail. The approach is slightly different from the one used in this chapter.
3. Hildebrand, F. Introduction to Numerical Analysis, 2nd ed., Dover, 1987. Our treatment of numerical solutions of differential equations closely follows that of this reference.
4. Mathews, J. and Walker, R. Mathematical Methods of Physics, 2nd ed., Benjamin, 1970. A good source for WKB approximation.
14
Complex Analysis of SOLDEs
We have familiarized ourselves with some useful techniques for finding solutions to differential equations. One powerful method that leads to formal solutions is power series. We also stated Theorem 13.6.7, which guarantees the convergence of the power series solution within a circle whose size is at least as large as the smallest of the circles of convergence of the coefficient functions. Thus, the convergence of the solution is related to the convergence of the coefficient functions. What about the nature of the convergence, or the analyticity of the solution? Is it related to the analyticity of the coefficient functions? If so, how? Are the singular points of the coefficients also singular points of the solution? Is the nature of the singularities the same? This chapter answers some of these questions.
Analyticity is best handled in the complex plane. An important reason for this is the property of analytic continuation discussed in Chapter 11. The differential equation du/dx = u² has a solution u = −1/x for all x except x = 0. Thus, we have to "puncture" the real line by removing x = 0 from it. Then we have two solutions, because the domain of definition of u = −1/x is not connected on the real line (technically, the definition of a function includes its domain as well as the rule for going from the domain to the range). In addition, if we confine ourselves to the real line, there is no way that we can connect the x > 0 region to the x < 0 region. However, in the complex plane the same equation, dw/dz = w², has the complex solution w = −1/z, which is analytic everywhere except at z = 0. Puncturing the complex plane does not destroy the connectivity of the region of definition of w. Thus, the solution in the x > 0 region can be analytically continued to the solution in the x < 0 region by going around the origin.
The aim of this chapter is to investigate the analytic properties of the solutions of some well-known SOLDEs in mathematical physics. We begin with a result from differential equation theory (for a proof, see [Birk 78, p. 223]).
14.0.1. Proposition. (continuation principle) The function obtained by analytic
continuation of any solution of an analytic differential equation along any path
in the complex plane is a solution of the analytic continuation of the differential
equation along the same path.
An analytic differential equation is one with analytic coefficient functions.
This proposition makes it possible to find a solution in one region of the complex
plane and then continue it analytically. The following example shows how the
singularities of the coefficient functions affect the behavior of the solution.
14.0.2. Example. Let us consider the FOLDE w' - (γ/z)w = 0 for γ ∈ ℝ. The coefficient
function p(z) = -γ/z has a simple pole at z = 0. The solution to the FOLDE is easily found
to be w = z^γ. Thus, depending on whether γ is a nonnegative integer, a negative integer
-m, or a noninteger, the solution has a regular point, a pole of order m, or a branch point
at z = 0, respectively. ■
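The branch behavior in this example is easy to verify numerically. The sketch below (an illustration, not part of the text) computes the factor that w = z^γ picks up when z traverses one full circle around the origin; the factor equals 1 precisely when γ is an integer.

```python
import cmath

# Continuing z^gamma once around z = 0 multiplies it by e^{2*pi*i*gamma}:
# track arg z continuously and compare theta = 0 with theta = 2*pi.
def monodromy_factor(gamma, r=0.5):
    w_before = cmath.exp(gamma * cmath.log(r))                    # theta = 0
    w_after = cmath.exp(gamma * (cmath.log(r) + 2j * cmath.pi))   # theta = 2*pi
    return w_after / w_before
```

For γ = 1/2 the factor is -1 (a branch point), while for integer γ it is 1 (a regular point or a pole).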
This example shows that the singularities of the solution depend on the param-
eters of the differential equation.
14.1 Analytic Properties of Complex DEs
To prepare for discussing the analytic properties of the solutions of SOLDEs,
let us consider some general properties of differential equations from a complex
analytical point of view.
14.1.1 Complex FOLDEs
In the homogeneous FOLDE
dw/dz + p(z)w = 0,    (14.1)
p(z) is assumed to have only isolated singular points. It follows that p(z) can be
expanded about a point z0 (which may be a singularity of p(z)) as a Laurent
series in some annular region r1 < |z - z0| < r2:

p(z) = Σ_{n=-∞}^∞ a_n (z - z0)^n,    where r1 < |z - z0| < r2.
The solution to Equation (14.1), as given in Theorem 13.2.1 with q = 0, is

w(z) = exp[-∫ p(z) dz]
     = C exp[-a_{-1} ∫ dz/(z - z0) - Σ_{n=0}^∞ a_n ∫ (z - z0)^n dz - Σ_{n=1}^∞ a_{-n-1} ∫ (z - z0)^{-n-1} dz]
     = C exp[-a_{-1} ln(z - z0) - Σ_{n=0}^∞ (a_n/(n+1)) (z - z0)^{n+1} + Σ_{n=1}^∞ (a_{-n-1}/n) (z - z0)^{-n}].
We can write this solution as

w(z) = C(z - z0)^α g(z),    (14.2)

where α ≡ -a_{-1} and g(z) is an analytic single-valued function in the annular
region r1 < |z - z0| < r2, because g(z) is the exponential of an analytic function.
For the special case in which p has a simple pole, i.e., when a_{-n} = 0 for all
n ≥ 2, the second sum in the exponent will be absent, and g will be analytic even
at z0. In fact, g(z0) = 1, and choosing C = 1, we can write

w(z) = (z - z0)^α g(z),    g(z0) = 1.    (14.3)
The singularity of the coefficient functions of an FOLDE determines the singularity of the solution.
Depending on the nature of the singularity of p(z) at z0, the solutions given by
Equation (14.2) have different classifications. For instance, if p(z) has a removable
singularity (if a_{-n} = 0 for all n ≥ 1), the solution is Cg(z), which is analytic. In this
case, we say that the FOLDE [Equation (14.1)] has a removable singularity at z0.
If p(z) has a simple pole at z0 (if a_{-1} ≠ 0 and a_{-n} = 0 for all n ≥ 2), then in general,
the solution has a branch point at z0. In this case we say that the FOLDE has a
regular singular point. Finally, if p(z) has a pole of order m > 1, then the solution
will have an essential singularity (see Problem 14.1). In this case the FOLDE is
said to have an irregular singular point.
To arrive at the solution given by Equation (14.2), we had to solve the FOLDE.
Since higher-order differential equations are not as easily solved, it is desirable to
obtain such a solution through other considerations. The following example sets
the stage for this endeavor.
14.1.1. Example. A FOLDE has a unique solution, to within a multiplicative constant,
given by Theorem 13.2.1. Thus, given a solution w(z), any other solution must be of the
form Cw(z). Let z0 be a singularity of p(z), and let z - z0 = re^{iθ}. Start at a point z and
circle z0 so that θ → θ + 2π. Even though p(z) may have a simple pole at z0, the solution
may have a branch point there. This is clear from the general solution, where α may be
a noninteger. Thus, w̃(z) ≡ w(z0 + re^{i(θ+2π)}) may be different from w(z). To discover
this branch point (without solving the DE) invoke Proposition 14.0.1 and conclude that
w̃(z) is also a solution to the FOLDE. Thus, w̃(z) can differ from w(z) by at most
a multiplicative constant: w̃(z) = Cw(z). Define the complex number α by C = e^{2πiα}.
Then the function g(z) ≡ (z - z0)^{-α} w(z) is single-valued around z0. In fact,

g(z0 + re^{i(θ+2π)}) = [re^{i(θ+2π)}]^{-α} w(z0 + re^{i(θ+2π)})
                     = (z - z0)^{-α} e^{-2πiα} e^{2πiα} w(z) = (z - z0)^{-α} w(z) = g(z).

This argument shows that a solution w(z) of the FOLDE of Equation (14.1) can be
written as w(z) = (z - z0)^α g(z), where g(z) is single-valued. ■
14.1.2 The Circuit Matrix

The method used in Example 14.1.1 can be generalized to obtain a similar result
for the NOLDE

L[w] = d^n w/dz^n + p_{n-1}(z) d^{n-1}w/dz^{n-1} + ⋯ + p_1(z) dw/dz + p_0(z)w = 0,    (14.4)

where all the p_i(z) are analytic in r1 < |z - z0| < r2.
Let {w_j(z)}_{j=1}^n be a basis of solutions of Equation (14.4), and let z - z0 = re^{iθ}.
Start at z and analytically continue the functions w_j(z) one complete turn to θ + 2π.
Let w̃_j(z) ≡ w̃_j(z0 + re^{iθ}) ≡ w_j(z0 + re^{i(θ+2π)}). Then, by a generalization of
Proposition 14.0.1, {w̃_j(z)}_{j=1}^n are not only solutions, but they are linearly inde-
pendent (because they are the w_j's evaluated at a different point). Therefore, they also
form a basis of solutions. On the other hand, w̃_j(z) can be expressed as a linear com-
bination of the w_j(z). Thus, w̃_j(z) = w_j(z0 + re^{i(θ+2π)}) = Σ_{k=1}^n a_{jk} w_k(z). The
matrix A = (a_{jk}), called the circuit matrix of the NOLDE, is invertible, because
it transforms one basis into another. Therefore, it has only nonzero eigenvalues.
We let λ be one such eigenvalue, and choose the column vector C, with entries
{c_j}_{j=1}^n, to be the corresponding eigenvector of the transpose of A (note that A
and A^t have the same set of eigenvalues). At least one such eigenvector always
exists, because the characteristic polynomial of A^t has at least one root. Now we
let w(z) = Σ_{j=1}^n c_j w_j(z). Clearly, this w(z) is a solution of (14.4), and

w̃(z) ≡ w(z0 + re^{i(θ+2π)}) = Σ_{j=1}^n c_j w_j(z0 + re^{i(θ+2π)})
     = Σ_{j=1}^n c_j Σ_{k=1}^n a_{jk} w_k(z) = Σ_{j,k} (A^t)_{kj} c_j w_k(z) = Σ_{k=1}^n λ c_k w_k(z) = λ w(z).
If we define α by λ = e^{2πiα}, then w(z0 + re^{i(θ+2π)}) = e^{2πiα} w(z). Now we write
f(z) ≡ (z - z0)^{-α} w(z). Following the argument used in Example 14.1.1, we get
f(z0 + re^{i(θ+2π)}) = f(z); that is, f(z) is single-valued around z0. We thus have
the following theorem.
14.1.2. Theorem. Any homogeneous NOLDE with analytic coefficient functions
in r1 < |z - z0| < r2 admits a solution of the form

w(z) = (z - z0)^α f(z),

where f(z) is single-valued around z0 in r1 < |z - z0| < r2.
An isolated singular point z0 near which an analytic function w(z) can be
written as w(z) = (z - z0)^α f(z), where f(z) is single-valued and analytic in
the punctured neighborhood of z0, is called a simple branch point of w(z). The
arguments leading to Theorem 14.1.2 imply that a solution with a simple branch
point exists if and only if the vector C whose components appear in w(z) is an
eigenvector of A^t, the transpose of the circuit matrix. Thus, there are as many
solutions with simple branch points as there are linearly independent eigenvectors
of A^t.
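As a concrete numerical sketch (the matrix below is made up for illustration, not taken from the text), one can extract the exponent α of Theorem 14.1.2 from an eigenvalue λ of the circuit matrix via λ = e^{2πiα}:

```python
import cmath

# Eigenvalues of a 2x2 matrix from its characteristic polynomial
# lam^2 - (tr A) lam + det A = 0; A and A^t share these eigenvalues.
def eigenvalues_2x2(a, b, c, d):
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

# Hypothetical circuit matrix: continuing the basis one turn "rotates" it.
A = ((0, 1), (-1, 0))
lam1, lam2 = eigenvalues_2x2(A[0][0], A[0][1], A[1][0], A[1][1])

# One branch of alpha for each eigenvalue; the corresponding eigensolutions
# behave like (z - z0)^alpha near z0.
alpha1 = cmath.log(lam1) / (2j * cmath.pi)
alpha2 = cmath.log(lam2) / (2j * cmath.pi)
```

Here λ = ±i gives α = ±1/4, so both eigensolutions have simple branch points with exponents ±1/4 (up to the integer ambiguity in the logarithm).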
14.2 Complex SOLDEs
canonical basis of the SOLDE

Let us now consider the SOLDE w'' + p(z)w' + q(z)w = 0. Given two linearly
independent solutions w1(z) and w2(z), we form the 2 × 2 circuit matrix A and try to
diagonalize it. There are three possible outcomes:
1. The matrix A is diagonalizable, and we can find two eigenvectors, F(z)
and G(z), corresponding, respectively, to two distinct eigenvalues, λ1 and
λ2. This means that F(z0 + re^{i(θ+2π)}) = λ1 F(z) and G(z0 + re^{i(θ+2π)}) =
λ2 G(z). Defining λ1 = e^{2πiα} and λ2 = e^{2πiβ}, we get F(z) = (z - z0)^α f(z)
and G(z) = (z - z0)^β g(z), as Theorem 14.1.2 suggests. The set {F(z), G(z)}
is called a canonical basis of the SOLDE.
2. The matrix A is diagonalizable, and the two eigenvalues are the same. In this
case both F(z) and G(z) have the same constant α:

F(z) = (z - z0)^α f(z)    and    G(z) = (z - z0)^α g(z).
3. We cannot find two eigenvectors. This corresponds to the case where A
is not diagonalizable. However, we can always find one eigenvector, so
A has only one eigenvalue, λ. We let w1(z) be the solution of the form
(z - z0)^α f(z), where f(z) is single-valued and λ = e^{2πiα}. The existence
of such a solution is guaranteed by Theorem 14.1.2. Let w2(z) be any other
linearly independent solution (Theorem 13.3.5 ensures the existence of such
a second solution). Then

w̃1(z) = λ w1(z),    w̃2(z) = a w1(z) + b w2(z),

and the circuit matrix will be A = [[λ, 0], [a, b]] (written row by row), which has eigenvalues λ and b.
Since A is assumed to have only one eigenvalue (otherwise we would have
the first outcome again), we must have b = λ. This reduces A to A = [[λ, 0], [a, λ]],
where a ≠ 0. The condition a ≠ 0 is necessary to distinguish this case from
the second outcome. Now we analytically continue h(z) ≡ w2(z)/w1(z)
one whole turn around z0, obtaining

h̃(z) = w̃2(z)/w̃1(z) = [a w1(z) + λ w2(z)]/[λ w1(z)] = h(z) + a/λ.
It then follows that the function

g1(z) ≡ h(z) - [a/(2πiλ)] ln(z - z0)

(recall that ln(z - z0) increases by 2πi for each turn around z0) is single-valued
in r1 < |z - z0| < r2. If we redefine g1(z) and w2(z) as
(2πiλ/a) g1(z) and (2πiλ/a) w2(z), respectively, we have the following:
14.2.1. Theorem. If p(z) and q(z) are analytic in the annular region r1 < |z -
z0| < r2, then the SOLDE w'' + p(z)w' + q(z)w = 0 admits a basis of solutions
{w1, w2} in the neighborhood of the singular point z0, where either

w1(z) = (z - z0)^α f(z),    w2(z) = (z - z0)^β g(z),

or, in exceptional cases (when the circuit matrix is not diagonalizable),

w1(z) = (z - z0)^α f(z),    w2(z) = w1(z)[g1(z) + ln(z - z0)].

The functions f(z), g(z), and g1(z) are analytic and single-valued in the annular
region.
This theorem allows us to factor out the branch point z0 from the rest of the
solution. However, even though f(z), g(z), and g1(z) are analytic in the annular
region r1 < |z - z0| < r2, they may very well have poles of arbitrary orders at z0.
Can we also factor out the poles? In general, we cannot; however, under special
circumstances, described in the following definition, we can.
regular singular point of a SOLDE defined

14.2.2. Definition. A SOLDE of the form w'' + p(z)w' + q(z)w = 0 that is analytic
in 0 < |z - z0| < r has a regular singular point at z0 if p(z) has at worst a simple
pole and q(z) has at worst a pole of order 2 there.
In a neighborhood of a regular singular point z0, the coefficient functions p(z)
and q(z) have the power-series expansions

p(z) = a_{-1}/(z - z0) + Σ_{k=0}^∞ a_k (z - z0)^k,
q(z) = b_{-2}/(z - z0)^2 + b_{-1}/(z - z0) + Σ_{k=0}^∞ b_k (z - z0)^k.
Multiplying both sides of the first equation by z - z0 and the second by (z - z0)^2
and introducing P(z) ≡ (z - z0) p(z), Q(z) ≡ (z - z0)^2 q(z), we obtain

P(z) = Σ_{k=0}^∞ a_{k-1} (z - z0)^k,    Q(z) = Σ_{k=0}^∞ b_{k-2} (z - z0)^k.

It is also convenient to multiply the SOLDE by (z - z0)^2 and write it as

(z - z0)^2 w'' + (z - z0) P(z) w' + Q(z) w = 0.    (14.5)

Inspired by the discussion leading to Theorem 14.2.1, we write

w(z) = (z - z0)^ν Σ_{k=0}^∞ C_k (z - z0)^k,    C0 = 1,    (14.6)
where we have chosen the arbitrary multiplicative constant in such a way that
C0 = 1. Substitute this in Equation (14.5), and change the dummy variables so
that all sums start at 0, to obtain

Σ_{n=0}^∞ { (n + ν)(n + ν - 1) C_n + Σ_{k=0}^n [(k + ν) a_{n-k-1} + b_{n-k-2}] C_k } (z - z0)^{n+ν} = 0,

which results in the recursion relation

(n + ν)(n + ν - 1) C_n = -Σ_{k=0}^n [(k + ν) a_{n-k-1} + b_{n-k-2}] C_k.    (14.7)

For n = 0, this leads to what is known as the indicial equation for the exponent
ν:

I(ν) ≡ ν(ν - 1) + a_{-1} ν + b_{-2} = 0.    (14.8)

The roots of this equation are called the characteristic exponents of z0, and I(ν) is
called its indicial polynomial. In terms of this polynomial, (14.7) can be expressed
as

I(n + ν) C_n = -Σ_{k=0}^{n-1} [(k + ν) a_{n-k-1} + b_{n-k-2}] C_k    for n = 1, 2, ....    (14.9)
Equation (14.8) determines what values of ν are possible, and Equation (14.9)
gives C1, C2, C3, ..., which in turn determine w(z). Special care must be taken
if the indicial polynomial vanishes at n + ν for some positive integer n, that is, if
n + ν, in addition to ν, is a root of the indicial polynomial: I(n + ν) = 0 = I(ν).
If ν1 and ν2 are characteristic exponents of the indicial equation and Re(ν1) >
Re(ν2), then a solution for ν1 always exists. A solution for ν2 also exists if ν1 - ν2 ≠
n for any (positive) integer n. In particular, if z0 is an ordinary point [a point at
which both p(z) and q(z) are analytic], then only one solution is determined by
(14.9). (Why?) The foregoing discussion is summarized in the following:
14.2.3. Theorem. If the differential equation w'' + p(z)w' + q(z)w = 0 has a
regular singular point at z = z0, then at least one power series of the form of
(14.6) formally solves the equation. If ν1 and ν2 are the characteristic exponents
of z0, then there are two linearly independent formal solutions unless ν1 - ν2 is
an integer.
14.2.4. Example. Let us consider some familiar differential equations.
(a) The Bessel equation is

w'' + (1/z) w' + (1 - α^2/z^2) w = 0.

In this case, the origin is a regular singular point, a_{-1} = 1, and b_{-2} = -α^2. Thus, the
indicial equation is ν(ν - 1) + ν - α^2 = 0, and its solutions are ν1 = α and ν2 =
-α. Therefore, there are two linearly independent solutions to the Bessel equation unless
ν1 - ν2 = 2α is an integer, i.e., unless α is either an integer or a half-integer.
(b) For the Coulomb potential f(r) = β/r, the most general radial equation [Equation
(12.14)] reduces to

w'' + (2/z) w' + (β/z - α/z^2) w = 0.

The point z = 0 is a regular singular point at which a_{-1} = 2 and b_{-2} = -α. The indicial
polynomial is I(ν) = ν^2 + ν - α with characteristic exponents ν1 = -1/2 + (1/2)√(1 + 4α) and
ν2 = -1/2 - (1/2)√(1 + 4α). There are two independent solutions unless ν1 - ν2 = √(1 + 4α)
is an integer. In practice, α = l(l + 1), where l is some integer; so ν1 - ν2 = 2l + 1, and
only one solution is obtained.
(c) The hypergeometric differential equation is

w'' + {[γ - (α + β + 1)z]/[z(1 - z)]} w' - {αβ/[z(1 - z)]} w = 0.

A substantial number of functions in mathematical physics are solutions of this remarkable
equation, with appropriate values for α, β, and γ. The regular singular points² are z = 0 and
z = 1. At z = 0, a_{-1} = γ and b_{-2} = 0. The indicial polynomial is I(ν) = ν(ν + γ - 1),
whose roots are ν1 = 0 and ν2 = 1 - γ. Unless γ is an integer, we have two formal
solutions. ■
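The recursion (14.9) is easy to run by machine. The sketch below (an illustration, not from the text) specializes it to the Bessel equation of part (a), where p = 1/z and q = 1 - α^2/z^2 give a_{-1} = 1, b_{-2} = -α^2, b_0 = 1, and all other a_k, b_k vanish, so (14.9) collapses to I(n + ν) C_n = -C_{n-2}.

```python
from fractions import Fraction

# Frobenius coefficients for the Bessel equation via Eq. (14.9):
# I(n + nu) C_n = -C_{n-2}, with I(v) = v(v - 1) + a_{-1} v + b_{-2} = v^2 - alpha^2.
def frobenius_bessel(alpha, nu, n_terms):
    indicial = lambda v: v * v - alpha * alpha
    C = [Fraction(1)]                     # C_0 = 1
    for n in range(1, n_terms):
        rhs = -C[n - 2] if n >= 2 else Fraction(0)
        C.append(rhs / indicial(n + nu))  # valid as long as I(n + nu) != 0
    return C
```

For α = ν = 1/2 this reproduces the series of sin z / z = 1 - z^2/6 + z^4/120 - ⋯, consistent with the known fact that J_{1/2}(z) is proportional to z^{-1/2} sin z.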
It is shown in differential equation theory [Birk 78, pp. 40-242] that as long
as ν1 - ν2 is not an integer, the series solution of Theorem 14.2.3 is convergent
in a neighborhood of z0. What happens when ν1 - ν2 is an integer? First, as a
convenience, we translate the coordinate axes so that the point z0 coincides with
the origin. This will save us some writing, because instead of powers of z - z0,
we will have powers of z. Next we let ν1 = ν2 + n with n a positive integer. Then,
since it is impossible to encounter any new zero of the indicial polynomial beyond
²The coefficient of w need not have a pole of order 2. Its pole can be of order one as well.
ν1, the recursion relation, Equation (14.9), will be valid for all values of n, and we
obtain a solution:

w1(z) = z^{ν1} f(z) = z^{ν1} (1 + Σ_{k=1}^∞ C_k z^k),

which is convergent in the region 0 < |z| < r for some r > 0. To investigate the
nature and the possibility of the second solution, write the recursion relations of
Equation (14.9) for the smaller characteristic root ν2:
I(ν2 + 1) C1 = -(ν2 a0 + b_{-1}) C0 ≡ p1 I(ν2 + 1)  ⇒  C1 = p1,
I(ν2 + 2) C2 = -(ν2 a1 + b0) C0 - [(ν2 + 1) a0 + b_{-1}] C1 ≡ p2 I(ν2 + 2)  ⇒  C2 = p2,
    ⋮
I(ν2 + n - 1) C_{n-1} = ⋯ ≡ p_{n-1} I(ν2 + n - 1)  ⇒  C_{n-1} = p_{n-1},
I(ν2 + n) C_n = I(ν1) C_n = p_n C0  ⇒  0 = p_n,    (14.10)

where in each step, we have used the result of the previous step in which C_k is
given as a multiple of C0 = 1. Here, the p's are constants depending (possibly in
a very complicated way) on the a_k's and b_k's.
Theorem 14.2.3 guarantees two power series solutions only when ν1 - ν2 is
not an integer. When ν1 - ν2 is an integer, Equation (14.10) shows that a necessary
condition for a second power series solution to exist is that p_n = 0. Therefore,
when p_n ≠ 0, we have to resort to other means of obtaining the second solution.
Let us define the second solution as

w2(z) ≡ w1(z) h(z) = z^{ν1} f(z) h(z)    (14.11)

and substitute in the SOLDE to obtain a FOLDE in h', namely, h'' + (p +
2w1'/w1) h' = 0, or, by substituting w1'/w1 = ν1/z + f'/f, the equivalent FOLDE

h'' + (2ν1/z + 2f'/f + p) h' = 0.    (14.12)
14.2.5. Lemma. The coefficient of h' in Equation (14.12) has a residue of n + 1.

Proof. Recall that the residue of a function is the coefficient of z^{-1} in the Laurent
expansion of the function (about z = 0). Let us denote this residue for the coef-
ficient of h' by A_{-1}. Since f(0) = 1, the ratio f'/f is analytic at z = 0. Thus,
the simple pole at z = 0 comes from the other two terms. Substituting the Laurent
expansion of p(z) gives

2ν1/z + p = 2ν1/z + a_{-1}/z + a0 + a1 z + ⋯.
This shows that A_{-1} = 2ν1 + a_{-1}. On the other hand, comparing the two versions
of the indicial polynomial, ν^2 + (a_{-1} - 1)ν + b_{-2} and (ν - ν1)(ν - ν2) = ν^2 - (ν1 +
ν2)ν + ν1ν2, gives ν1 + ν2 = -(a_{-1} - 1), or 2ν1 - n = -(a_{-1} - 1). Therefore,
A_{-1} = 2ν1 + a_{-1} = n + 1. □
14.2.6. Theorem. Suppose that the characteristic exponents of a SOLDE with a
regular singular point at z = 0 are ν1 and ν2. Consider three cases:

1. ν1 - ν2 is not an integer.
2. ν2 = ν1 - n, where n is a nonnegative integer, and p_n, as defined in Equation
(14.10), vanishes.
3. ν2 = ν1 - n, where n is a nonnegative integer, and p_n, as defined in Equation
(14.10), does not vanish.

Then, in the first two cases, there exists a basis of solutions {w1, w2} of the form

w_i(z) = z^{ν_i} (1 + Σ_{k=1}^∞ C_k^{(i)} z^k),    i = 1, 2,

and in the third case, the basis of solutions takes the form

w1(z) = z^{ν1} (1 + Σ_{k=1}^∞ a_k z^k),    w2(z) = z^{ν2} (1 + Σ_{k=1}^∞ b_k z^k) + C w1(z) ln z,

where the power series are convergent in a neighborhood of z = 0.
Proof. The first two cases have been shown before. For the third case, we use
Lemma 14.2.5 and write

2ν1/z + 2f'/f + p = (n + 1)/z + Σ_{k=0}^∞ c_k z^k,

and the solution for the FOLDE in h' will be [see Equation (14.3) and the discussion
preceding it]

h'(z) = z^{-n-1} (1 + Σ_{k=1}^∞ b_k z^k).

For n = 0, i.e., when the indicial polynomial has a double root, this yields h'(z) =
1/z + Σ_{k=1}^∞ b_k z^{k-1}, or h(z) = ln z + g1(z), where g1 is analytic in a neighborhood
of z = 0. For n ≠ 0, we have h'(z) = b_n/z + Σ_{k≠n} b_k z^{k-n-1} (with b_0 ≡ 1) and, by integration,

h(z) = b_n ln z + Σ_{k≠n} [b_k/(k - n)] z^{k-n} = b_n ln z + z^{-n} g2(z),
where g2 is analytic in a neighborhood of z = 0. Substituting h in Equation (14.11)
and recalling that ν2 = ν1 - n, we obtain the desired results of the theorem. □
14.3 Fuchsian Differential Equations
In many cases of physical interest, the behavior of the solution of a SOLDE at
infinity is important. For instance, bound-state solutions of the Schrödinger equa-
tion describing the probability amplitudes of particles in quantum mechanics must
tend to zero as the distance from the center of the binding force increases.
We have seen that the behavior of a solution is determined by the behavior
of the coefficient functions. To determine the behavior at infinity, we substitute
z = 1/t in the SOLDE
d^2w/dz^2 + p(z) dw/dz + q(z) w = 0    (14.13)

and obtain

d^2v/dt^2 + [2/t - (1/t^2) r(t)] dv/dt + (1/t^4) s(t) v = 0,    (14.14)

where v(t) = w(1/t), r(t) = p(1/t), and s(t) = q(1/t).
Clearly, as z → ∞, t → 0. Thus, we are interested in the behavior of (14.14)
at t = 0. We assume that both r(t) and s(t) are analytic at t = 0. Equation (14.14)
shows, however, that the solution v(t) may still have singularities at t = 0 because
of the extra terms appearing in the coefficient functions.
We assume that infinity is a regular singular point of (14.13), by which we
mean that t = 0 is a regular singular point of (14.14). Therefore, in the Taylor
expansions of r(t) and s(t), the first (constant) term of r(t) and the first two terms
of s(t) must be zero. Thus, we write
r(t) = a1 t + a2 t^2 + ⋯ = Σ_{k=1}^∞ a_k t^k,
s(t) = b2 t^2 + b3 t^3 + ⋯ = Σ_{k=2}^∞ b_k t^k.

By their definitions, these two equations imply that for p(z) and q(z), and for large
values of |z|, we must have expressions of the form

p(z) = a1/z + a2/z^2 + ⋯ = Σ_{k=1}^∞ a_k z^{-k},    (14.15)
q(z) = b2/z^2 + b3/z^3 + ⋯ = Σ_{k=2}^∞ b_k z^{-k}.    (14.16)
Fuchsian DE

A second-order Fuchsian DE with two regular singular points leads to uninteresting solutions! A second-order Fuchsian DE with three regular singular points leads to interesting solutions!
When infinity is a regular singular point of Equation (14.13), or, equiva-
lently, when the origin is a regular singular point of (14.14), it follows from
Theorem 14.2.6 that there exists at least one solution of the form v1(t) =
t^α (1 + Σ_{k=1}^∞ c_k t^k) or, in terms of z,

w1(z) = z^{-α} (1 + Σ_{k=1}^∞ c_k z^{-k}).    (14.17)

Here α is a characteristic exponent at t = 0 of (14.14), whose indicial polynomial
is easily found to be α(α - 1) + (2 - a1)α + b2 = 0.
14.3.1. Definition. A homogeneous differential equation with single-valued an-
alytic coefficient functions is called a Fuchsian differential equation (FDE) if it
has only regular singular points in the extended complex plane, i.e., the complex
plane including the point at infinity.
It turns out that a particular kind of FDE describes a large class of nonelemen-
tary functions encountered in mathematical physics. Therefore, it is instructive to
classify various kinds of FDEs. A fact that is used in such a classification is that
complex functions whose only singularities in the extended complex plane are
poles are rational functions, i.e., ratios of polynomials (see Example 10.2.2). We
thus expect FDEs to have only rational functions as coefficients.
Consider the case where the equation has at most two regular singular points, at
z1 and z2. We introduce a new variable ξ(z) = (z - z1)/(z - z2). The regular singular points
at z1 and z2 are mapped onto the points ξ1 = ξ(z1) = 0 and ξ2 = ξ(z2) = ∞,
respectively, in the extended ξ-plane. Equation (14.13) becomes
respectively, in the extended ~-plane. Equation (14.13) becomes
d2u du
d~2 + <I>(~) d~ + e(~)u = 0,
where u, <1>, and e are functions of ~ obtained when Z is expressed in terms of
~ in w(z), p(z), and qtz), respectively. From Equation (14.15) and the fact that
~ = 0 is at most a simple pole of <I>(~), we obtain <I>(~) = al/~. Similarly,
e(~) = b2/~2. Thus, a SOFDE with two regular singular points is equivaleut
to the DE wIt + (al/Z)w' + (b2/Z2)W = O. Mnltiplying both sides by Z2, we
obtain Z2w" + alzw' + b2W = 0, which is the second-order Euler differential
equation. A general nth-orderEuler differential equation is equivalent to a NOLDE
with constant coefficients (see Problem 13.28). Thus, a second order Fuchsian DE
(SOFDE) with two regular singular points is eqnivalent to a SOLDE with constant
coefficients and produces nothing new.
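A quick numeric check (a sketch with made-up coefficients, not from the text): z^ν solves the Euler equation z^2 w'' + a1 z w' + b2 w = 0 exactly when ν is a root of ν^2 + (a1 - 1)ν + b2 = 0, which is just the indicial equation at z = 0.

```python
# Residual of the second-order Euler equation for the trial solution w = z^nu.
def euler_residual(nu, a1, b2, z):
    w = z ** nu
    wp = nu * z ** (nu - 1)
    wpp = nu * (nu - 1) * z ** (nu - 2)
    return z * z * wpp + a1 * z * wp + b2 * w

# Hypothetical coefficients a1 = 1, b2 = -4: indicial equation nu^2 - 4 = 0.
residuals = [euler_residual(nu, 1, -4, 1.7 + 0.3j) for nu in (2, -2)]
```

The residual vanishes (to rounding) for the indicial roots ν = ±2 and not otherwise, mirroring how constant-coefficient SOLDEs are solved by exponentials.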
The simplest SOFDE whose solutions may include nonelementary functions
is therefore one having three regular singular points, at say z1, z2, and z3. By the
transformation

ξ(z) = [(z - z1)(z3 - z2)] / [(z - z2)(z3 - z1)]
we can map z1, z2, and z3 onto ξ1 = 0, ξ2 = ∞, and ξ3 = 1. Thus, we assume
that the three regular singular points are at z = 0, z = 1, and z = ∞. It can be
shown [see Problem (14.8)] that the most general p(z) and q(z) are

p(z) = A1/z + B1/(z - 1)    and    q(z) = A2/z^2 + B2/(z - 1)^2 + A3/[z(z - 1)].
We thus have the following theorem.

14.3.2. Theorem. The most general second-order Fuchsian DE with three regular
singular points can be transformed into the form

w'' + [A1/z + B1/(z - 1)] w' + [A2/z^2 + B2/(z - 1)^2 + A3/(z(z - 1))] w = 0,    (14.18)

where A1, A2, A3, B1, and B2 are constants. This equation is called the Riemann
differential equation.
We can write the Riemann DE in terms of pairs of characteristic exponents,
(λ1, λ2), (μ1, μ2), and (ν1, ν2), belonging to the singular points 0, 1, and ∞,
respectively. The indicial equations are easily found to be

λ^2 + (A1 - 1)λ + A2 = 0,
μ^2 + (B1 - 1)μ + B2 = 0,
ν^2 + (1 - A1 - B1)ν + A2 + B2 - A3 = 0.

By writing the indicial equations as (λ - λ1)(λ - λ2) = 0, and so forth, and
comparing coefficients, we can find the following relations:

A1 = 1 - λ1 - λ2,    A2 = λ1λ2,
B1 = 1 - μ1 - μ2,    B2 = μ1μ2,
A1 + B1 = ν1 + ν2 + 1,    A2 + B2 - A3 = ν1ν2.

These equations lead easily to the Riemann identity

λ1 + λ2 + μ1 + μ2 + ν1 + ν2 = 1.    (14.19)
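The identity can be confirmed numerically; the sketch below (with arbitrary made-up coefficients, not from the text) solves the three indicial equations and checks that the six exponents sum to 1.

```python
import cmath

# Roots of x^2 + b x + c = 0; each of the three indicial equations has this form.
def roots(b, c):
    disc = cmath.sqrt(b * b - 4 * c)
    return (-b + disc) / 2, (-b - disc) / 2

A1, A2, B1, B2, A3 = 0.3, -0.1, 0.9, 0.2, 0.05   # hypothetical values
lam = roots(A1 - 1, A2)                           # exponents at z = 0
mu = roots(B1 - 1, B2)                            # exponents at z = 1
nu = roots(1 - A1 - B1, A2 + B2 - A3)             # exponents at infinity
total = sum(lam) + sum(mu) + sum(nu)
```

Since each quadratic contributes minus its linear coefficient to the sum of its roots, total = (1 - A1) + (1 - B1) + (A1 + B1 - 1) = 1 for any choice of the five constants.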
Substituting these results in (14.18) gives the following result.

14.3.3. Theorem. A second-order Fuchsian DE with three regular singular points
in the extended complex plane is equivalent to the Riemann DE,

w'' + [(1 - λ1 - λ2)/z + (1 - μ1 - μ2)/(z - 1)] w'
   + [λ1λ2/z^2 + μ1μ2/(z - 1)^2 + (ν1ν2 - λ1λ2 - μ1μ2)/(z(z - 1))] w = 0,    (14.20)

which is uniquely determined by the pairs of characteristic exponents at each
singular point. The characteristic exponents satisfy the Riemann identity, Equation
(14.19).
The uniqueness of the Riemann DE allows us to derive identities for solutions
and reduce the independent parameters of Equation (14.20) from five to three. We
first note that if w(z) is a solution of the Riemann DE corresponding to (λ1, λ2),
(μ1, μ2), and (ν1, ν2), then the function

u(z) = z^λ (z - 1)^μ w(z)

has branch points at z = 0, 1, ∞ [because w(z) does]; therefore, it is a solution of
the Riemann DE. Its pairs of characteristic exponents are (see Problem 14.10)

(λ1 + λ, λ2 + λ),    (μ1 + μ, μ2 + μ),    (ν1 - λ - μ, ν2 - λ - μ).

In particular, if we let λ = -λ1 and μ = -μ1, then the pairs reduce to

(0, λ2 - λ1),    (0, μ2 - μ1),    (ν1 + λ1 + μ1, ν2 + λ1 + μ1).

Defining α ≡ ν1 + λ1 + μ1, β ≡ ν2 + λ1 + μ1, and γ ≡ 1 - λ2 + λ1, and using
(14.19), we can write the pairs as

(0, 1 - γ),    (0, γ - α - β),    (α, β),

which yield the third version of the Riemann DE

w'' + [γ/z + (1 - γ + α + β)/(z - 1)] w' + {αβ/[z(z - 1)]} w = 0.
hypergeometric DE

This important equation is commonly written in the equivalent form

z(1 - z) w'' + [γ - (1 + α + β)z] w' - αβ w = 0    (14.21)

and is called the hypergeometric differential equation (HGDE). We will study
this equation next.
14.4 The Hypergeometric Function

The two characteristic exponents of Equation (14.21) at z = 0 are 0 and 1 - γ.
It follows from Theorem 14.2.6 that there exists an analytic solution (correspond-
ing to the characteristic exponent 0) at z = 0. Let us denote this solution, the
hypergeometric function, by F(α, β; γ; z) and write

F(α, β; γ; z) = Σ_{k=0}^∞ a_k z^k,    where a0 = 1.

Substituting in the DE, we obtain the recurrence relation

a_{k+1} = [(α + k)(β + k) / ((k + 1)(γ + k))] a_k    for k ≥ 0.    (14.23)
These coefficients can be determined successively if γ is neither zero nor a negative
integer:

F(α, β; γ; z) = 1 + Σ_{k=1}^∞ {[α(α + 1)⋯(α + k - 1) β(β + 1)⋯(β + k - 1)] / [k! γ(γ + 1)⋯(γ + k - 1)]} z^k
             = [Γ(γ)/(Γ(α)Γ(β))] Σ_{k=0}^∞ {[Γ(α + k)Γ(β + k)] / [Γ(k + 1)Γ(γ + k)]} z^k.    (14.22)

The series in (14.22) is called the hypergeometric series, because it is the gener-
alization of F(1, β; β; z), which is simply the geometric series.
We note immediately from (14.22) that

14.4.1. Box. The hypergeometric series becomes a polynomial if either α
or β is a negative integer.

This is because for k < |α| (or k < |β|) both Γ(α + k) [or Γ(β + k)] and Γ(α)
[or Γ(β)] have poles that cancel each other. However, Γ(α + k) [or Γ(β + k)]
becomes finite for k > |α| (or k > |β|), and the pole in Γ(α) [or Γ(β)] makes the
denominator infinite. Therefore, all terms of the series (14.22) beyond k = |α| (or
k = |β|) will be zero.
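Both remarks are easy to see in code. The sketch below (an illustration, not from the text) sums the series using the recurrence (14.23); for α = 1 and β = γ it reproduces the geometric series, and for a negative-integer α the terms cut off, leaving a polynomial.

```python
# Partial sum of F(alpha, beta; gamma; z) built from the recurrence (14.23):
# each term is a_k z^k, with a_{k+1} = (alpha+k)(beta+k) / ((k+1)(gamma+k)) a_k.
def hypergeometric(alpha, beta, gamma, z, n_terms=80):
    term, total = 1.0, 0.0
    for k in range(n_terms):
        total += term
        term *= (alpha + k) * (beta + k) * z / ((k + 1) * (gamma + k))
    return total
```

For example, F(1, β; β; z) = 1/(1 - z) inside the unit disk, while F(-3, β; β; z) = (1 - z)^3 terminates after k = 3 and is valid for every z.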
Many of the properties of the hypergeometric function can be obtained directly
from the HGDE, Equation (14.21). For instance, differentiating the HGDE and
letting v = w', we obtain

z(1 - z)v'' + [γ + 1 - (α + β + 3)z]v' - (α + 1)(β + 1)v = 0,

which shows that F'(α, β; γ; z) = C F(α + 1, β + 1; γ + 1; z). The constant C
can be determined by differentiating Equation (14.22), setting z = 0 in the result,³
and noting that F(α + 1, β + 1; γ + 1; 0) = 1. Then we obtain

F'(α, β; γ; z) = (αβ/γ) F(α + 1, β + 1; γ + 1; z).
Now assume that γ ≠ 1, and make the substitution w = z^{1-γ} u in the HGDE
to obtain⁴ z(1 - z)u'' + [γ1 - (α1 + β1 + 1)z]u' - α1β1 u = 0, where α1 = α - γ + 1,
β1 = β - γ + 1, and γ1 = 2 - γ. Thus,

u = F(α - γ + 1, β - γ + 1; 2 - γ; z),

³Note that the hypergeometric function evaluates to 1 at z = 0 regardless of its parameters.
⁴In the following discussion, α1, β1, and γ1 will represent the parameters of the new DE satisfied by the new function defined
in terms of the old.
and u is therefore analytic at z = 0. This leads to an interesting result. Provided
that γ is not an integer, the two functions

w1(z) ≡ F(α, β; γ; z),    w2(z) ≡ z^{1-γ} F(α - γ + 1, β - γ + 1; 2 - γ; z)    (14.24)

form a canonical basis of solutions to the HGDE at z = 0. This follows from
Theorem 14.2.6 and the fact that (0, 1 - γ) are a pair of (different) characteristic
exponents at z = 0.
Johann Carl Friedrich Gauss (1777–1855) was the greatest of
all mathematicians and perhaps the most richly gifted genius of
whom there is any record. He was born in the city of Brunswick
in northern Germany. His exceptional skill with numbers was
clear at a very early age, and in later life he joked that he knew
how to count before he could talk. It is said that Goethe wrote
and directed little plays for a puppet theater when he was 6 and
that Mozart composed his first childish minuets when he was
5, but Gauss corrected an error in his father's payroll accounts
at the age of 3. At the age of seven, when he started elemen-
tary school, his teacher was amazed when Gauss summed the
integers from 1 to 100 instantly by spotting that the sum was
50 pairs of numbers, each pair summing to 101.
His long professional life is so filled with accomplishments that it is impossible to give
a full account of them in the short space available here. All we can do is simply give a
chronology of his almost uncountable discoveries.
1792-1794: Gauss reads the works of Newton, Euler, and Lagrange; discovers the prime
number theorem (at the age of 14 or 15); invents the method of least squares; conceives the
Gaussian law of distribution in the theory of probability.
1795: (only 18 years old!) Proves that a regular polygon with n sides is constructible (by ruler
and compass) if and only if n is the product of a power of 2 and distinct prime numbers of the
form p_k = 2^{2^k} + 1, and completely solves the 2000-year-old problem of ruler-and-compass
construction of regular polygons. He also discovers the law of quadratic reciprocity.
1799: Proves the fundamental theorem of algebra in his doctoral dissertation using the
then-mysterious complex numbers with complete confidence.
1801: Gauss publishes his Disquisitiones Arithmeticae, in which he creates the modern rig-
orous approach to mathematics; predicts the exact location of the asteroid Ceres.
1807: Becomes professor of astronomy and the director of the new observatory at Göttingen.
1809: Publishes his second book, Theoria motus corporum coelestium, a major two-volume
treatise on the motion of celestial bodies and the bible of planetary astronomers for the next
100 years.
1812: Publishes Disquisitiones generales circa seriem infinitam, a rigorous treatment of
infinite series, and introduces the hypergeometric function for the first time, for which he
uses the notation F(α, β; γ; z); an essay on approximate integration.
1820–1830: Publishes over 70 papers, including Disquisitiones generales circa superficies
curvas, in which he creates the intrinsic differential geometry of general curved surfaces,
the forerunner of Riemannian geometry and the general theory of relativity. From the 1830s
on, Gauss was increasingly occupied with physics, and he enriched every branch of the
subject he touched. In the theory of surface tension, he developed the fundamental idea of
conservation of energy and solved the earliest problem in the calculus of variations. In op-
tics, he introduced the concept of the focal length of a system of lenses. He virtually created
the science of geomagnetism, and in collaboration with his friend and colleague Wilhelm
Weber he invented the electromagnetic telegraph. In 1839 Gauss published his fundamental
paper on the general theory of inverse square forces, which established potential theory as
a coherent branch of mathematics and in which he established the divergence theorem.
Gauss had many opportunities to leave Göttingen, but he refused all offers and remained
there for the rest of his life, living quietly and simply, traveling rarely, and working with
immense energy on a wide variety of problems in mathematics and its applications. Apart
from science and his family (he married twice and had six children, two of whom emigrated
to America), his main interests were history and world literature, international politics, and
public finance. He owned a large library of about 6000 volumes in many languages, including
Greek, Latin, English, French, Russian, Danish, and of course German. His acuteness in
handling his own financial affairs is shown by the fact that although he started with virtually
nothing, he left an estate over a hundred times as great as his average annual income
during the last half of his life. The foregoing list is the published portion of Gauss's total
achievement; the unpublished and private part is almost equally impressive. His scientific
diary, a little booklet of 19 pages, discovered in 1898, extends from 1796 to 1814 and
consists of 146 very concise statements of the results of his investigations, which often
occupied him for weeks or months. These ideas were so abundant and so frequent that he
physically did not have time to publish them. Some of the ideas recorded in this diary:
Cauchy Integral Formula: Gauss discovers it in 1811, 16 years before Cauchy.
Non-Euclidean Geometry: After failing to prove Euclid's fifth postulate at the age of 15,
Gauss came to the conclusion that the Euclidean form of geometry cannot be the only one
possible.
Elliptic Functions: Gauss had found many of the results of Abel and Jacobi (the two main
contributors to the subject) before these men were born. The facts became known partly
through Jacobi himself. His attention was caught by a cryptic passage in the Disquisitiones,
whose meaning can only be understood if one knows something about elliptic functions. He
visited Gauss on several occasions to verify his suspicions and tell him about his own most
recent discoveries, and each time Gauss pulled 30-year-old manuscripts out of his desk and
showed Jacobi what Jacobi had just shown him. After a week's visit with Gauss in 1840,
Jacobi wrote to his brother, "Mathematics would be in a very different position if practical
astronomy had not diverted this colossal genius from his glorious career."
A possible explanation for not publishing such important ideas is suggested by his
comments in a letter to Bolyai: "It is not knowledge but the act of learning, not possession
but the act of getting there, which grants the greatest enjoyment. When I have clarified and
exhausted a subject, then I turn away from it in order to go into darkness again." His was
the temperament of an explorer who is reluctant to take the time to write an account of his
last expedition when he could be starting another. As it was, Gauss wrote a great deal, but
to have published every fundamental discovery he made in a form satisfactory to himself
would have required several long lifetimes.
14.4 THE HYPERGEOMETRIC FUNCTION 417
A third relation can be obtained by making the substitution w = (1 - z)^(γ-α-β) u.
This leads to a hypergeometric equation for u with α₁ = γ - α, β₁ = γ - β, and
γ₁ = γ. Furthermore, w is analytic at z = 0, and w(0) = 1. We conclude that
w = F(α, β; γ; z). We therefore have the identity
F(α, β; γ; z) = (1 - z)^(γ-α-β) F(γ - α, γ - β; γ; z).  (14.25)
To obtain the canonical basis at z = 1, we make the substitution t = 1 - z, and
note that the result is again the HGDE, with α₁ = α, β₁ = β, and γ₁ = α + β - γ + 1.
It follows from Equation (14.24) that
w₃(z) ≡ F(α, β; α + β - γ + 1; 1 - z),
w₄(z) ≡ (1 - z)^(γ-α-β) F(γ - β, γ - α; γ - α - β + 1; 1 - z)  (14.26)
form a canonical basis of solutions to the HGDE at z = 1.
A symmetry of the hypergeometric function that is easily obtained from the
HGDE is
F(α, β; γ; z) = F(β, α; γ; z).  (14.27)
The six functions
F(α ± 1, β; γ; z),  F(α, β ± 1; γ; z),  F(α, β; γ ± 1; z)
are called hypergeometric functions contiguous to F(α, β; γ; z). The discussion
above showed how to obtain the basis of solutions at z = 1 from the regular
solution to the HGDE at z = 0, F(α, β; γ; z). We can show that the basis of solutions
at z = ∞ can also be obtained from the hypergeometric function.
Equation (14.16) suggests a function of the form
v(z) = z^r F(α₁, β₁; γ₁; 1/z) ≡ z^r w(1/z)  ⟹  w(z) = z^r v(1/z),  (14.28)
where r, α₁, β₁, and γ₁ are to be determined. Since w(z) is a solution of the HGDE,
v will satisfy the following DE (see Problem 14.15):
z(1 - z)v″ + [1 - α - β - 2r - (2 - γ - 2r)z]v′ - [r² - r + rγ - (1/z)(r + α)(r + β)]v = 0.  (14.29)
This reduces to the HGDE if r = -α or r = -β. For r = -α, the parameters
become α₁ = α, β₁ = 1 + α - γ, and γ₁ = α - β + 1. For r = -β, the parameters
are α₁ = β, β₁ = 1 + β - γ, and γ₁ = β - α + 1. Thus,
v₁(z) = z^(-α) F(α, 1 + α - γ; α - β + 1; 1/z),  (14.30)
v₂(z) = z^(-β) F(β, 1 + β - γ; β - α + 1; 1/z)  (14.31)
form a canonical basis of solutions for the HGDE that are valid about z = ∞.
As the preceding discussion suggests, it is possible to obtain many relations
among the hypergeometric functions with different parameters and independent
variables. In fact, the nineteenth-century mathematician Kummer showed that there
are 24 different (but linearly dependent, of course) solutions to the HGDE. These
solutions are collectively known as Kummer's solutions, and six of them were derived
above. Another important relation (shown in Problem 14.16) is that
z^(α-γ) (1 - z)^(γ-α-β) F(γ - α, 1 - α; 1 - α + β; 1/z)
also solves the HGDE.
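Identities like (14.25) are easy to spot-check numerically. The helper below is an illustrative sketch (not from the text): it sums the hypergeometric series truncated at 200 terms, which is ample for |z| < 1, and verifies (14.25) for one parameter set.

```python
def hyp2f1(a, b, c, z, terms=200):
    """Truncated hypergeometric series F(a, b; c; z); converges for |z| < 1."""
    total, term = 1.0, 1.0
    for k in range(terms):
        # ratio of successive series terms: (a+k)(b+k) / ((c+k)(k+1)) * z
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

a, b, c, z = 0.5, 1.2, 2.3, 0.4
lhs = hyp2f1(a, b, c, z)
rhs = (1 - z) ** (c - a - b) * hyp2f1(c - a, c - b, c, z)
print(abs(lhs - rhs) < 1e-10)  # → True
```

The same helper reproduces the elementary special cases mentioned below, e.g. F(1, 1; 2; -z) = ln(1 + z)/z.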
Many of the functions that occur in mathematical physics are related to the
hypergeometric function. Even some of the common elementary functions can be
expressed in terms of the hypergeometric function with appropriate parameters.
For example, when β = γ, we obtain
F(α, β; β; z) = Σ_{k=0}^∞ [Γ(α + k)/(Γ(α)Γ(k + 1))] z^k = (1 - z)^(-α).
Similarly, F(1/2, 1/2; 3/2; z²) = sin⁻¹z/z, and F(1, 1; 2; -z) = ln(1 + z)/z. However,
the real power of the hypergeometric function is that it encompasses almost
all of the nonelementary functions encountered in physics. Let us look briefly at a
few of these. Jacobi functions are solutions of the DE
(1 - x²) d²u/dx² + [β - α - (α + β + 2)x] du/dx + λ(λ + α + β + 1)u = 0.  (14.32)
Defining x = 1 - 2z changes this DE into the HGDE with parameters α₁ = -λ,
β₁ = λ + α + β + 1, and γ₁ = 1 + α. The solutions of Equation (14.32), called
the Jacobi functions of the first kind, are, with appropriate normalization,
P_λ^(α,β)(z) = [Γ(λ + α + 1)/(Γ(λ + 1)Γ(α + 1))] F(-λ, λ + α + β + 1; 1 + α; (1 - z)/2).
When λ = n, a nonnegative integer, the Jacobi function turns into a polynomial
of degree n with the following expansion:
P_n^(α,β)(z) = [Γ(n + α + 1)/Γ(n + α + β + 1)] Σ_{k=0}^n [Γ(n + α + β + k + 1)/(Γ(α + k + 1)Γ(k + 1)Γ(n - k + 1))] ((z - 1)/2)^k.
These are the Jacobi polynomials discussed in Chapter 7. In fact, the DE satisfied by
P_n^(α,β)(x) of Chapter 7 is identical to Equation (14.32). Note that the transformation
x = 1 - 2z translates the points z = 0 and z = 1 to the points x = 1 and x = -1,
respectively. Thus the regular singular points of the Jacobi functions of the first
kind are at ±1 and ∞.
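The polynomial expansion above is straightforward to implement. The sketch below is an illustrative helper (with the binomial factor written out via `math.comb`, an equivalent regrouping of the Gamma factors); it recovers the Legendre special case α = β = 0 discussed below.

```python
from math import comb, gamma

def jacobi_poly(n, a, b, z):
    """Jacobi polynomial P_n^(a,b)(z) from its hypergeometric expansion."""
    pref = gamma(n + a + 1) / (gamma(n + 1) * gamma(n + a + b + 1))
    total = sum(comb(n, k) * gamma(n + a + b + k + 1) / gamma(a + k + 1)
                * ((z - 1) / 2) ** k for k in range(n + 1))
    return pref * total

# a = b = 0 gives the Legendre polynomials, e.g. P_2(z) = (3z^2 - 1)/2:
print(round(jacobi_poly(2, 0, 0, 0.7), 12))  # → 0.235
```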
14.5 CONFLUENT HYPERGEOMETRIC FUNCTIONS 419
A second, linearly independent, solution of Equation (14.32) is obtained by
using (14.31). These are called the Jacobi functions of the second kind:
Q_λ^(α,β)(z) = [2^(λ+α+β) Γ(λ + α + 1)Γ(λ + β + 1)] / [Γ(2λ + α + β + 2)(z - 1)^(λ+α+1)(z + 1)^β]
    × F(λ + α + 1, λ + 1; 2λ + α + β + 2; 2/(1 - z)).  (14.33)
Gegenbauer functions, or ultraspherical functions, are special cases of Jacobi
functions for which α = β = μ - 1/2. They are defined by
C_λ^μ(z) = [Γ(λ + 2μ)/(Γ(λ + 1)Γ(2μ))] F(-λ, λ + 2μ; μ + 1/2; (1 - z)/2).  (14.34)
Note the change in the normalization constant. Linearly independent Gegenbauer
functions "of the second kind" can be obtained from the Jacobi functions of the
second kind by the substitution α = β = μ - 1/2. Another special case of the Jacobi
functions is obtained when α = β = 0. Those obtained from the Jacobi functions
of the first kind are called Legendre functions of the first kind:
P_λ(z) = P_λ^(0,0)(z) = C_λ^(1/2)(z) = F(-λ, λ + 1; 1; (1 - z)/2).  (14.35)
Legendre functions of the second kind are obtained from the Jacobi functions of
the second kind in a similar way:
Q_λ(z) = [2^λ Γ²(λ + 1)] / [Γ(2λ + 2)(z - 1)^(λ+1)] F(λ + 1, λ + 1; 2λ + 2; 2/(1 - z)).
Other functions derived from the Jacobi functions are obtained similarly (see Chapter 7).
14.5 Confluent Hypergeometric Functions
The transformation x = 1 - 2z translates the regular singular points of the HGDE
by a finite amount. Consequently, the new functions still have two regular singular
points, at ±1, in the complex plane. In some physical cases of importance,
only the origin, corresponding to r = 0 in spherical coordinates (typically the
location of the source of a central force), is the singular point. If we want to obtain
a differential equation consistent with such a case, we have to "push" the singular
point z = 1 to infinity. This can be achieved by making the substitution t = rz in
the HGDE and taking the limit r → ∞. The substitution yields
d²w/dt² + [γ/t + (1 - γ + α + β)/(t - r)] dw/dt + [αβ/(t(t - r))] w = 0.  (14.36)
If we blindly take the limit r → ∞ with α, β, and γ remaining finite, Equation
(14.36) reduces to ẅ + (γ/t)ẇ = 0, an elementary FODE in ẇ. To obtain a
nonelementary DE, we need to manipulate the parameters, to let some of them
tend to infinity. We want γ to remain finite, because otherwise the coefficient of
dw/dt will blow up. We therefore let β or α tend to infinity. The result will be
the same either way because α and β appear symmetrically in the equation. It is
customary to let β = r → ∞. In that case, Equation (14.36) becomes
d²w/dt² + (γ/t - 1) dw/dt - (α/t)w = 0.
Multiplying by t and changing the independent variable back to z yields
zw″(z) + (γ - z)w′(z) - αw(z) = 0.  (14.37)
This is called the confluent hypergeometric DE (CHGDE).
Since z = 0 is still a regular singular point of the CHGDE, we can obtain
expansions about that point. The characteristic exponents are 0 and 1 - γ, as before.
Thus, there is an analytic solution (corresponding to the characteristic exponent
0) to the CHGDE at the origin, which is called the confluent hypergeometric
function and denoted by Φ(α; γ; z). Since z = 0 is the only possible (finite)
singularity of the CHGDE, Φ(α; γ; z) is an entire function.
We can obtain the series expansion of Φ(α; γ; z) directly from Equation
(14.22) and the fact that Φ(α; γ; z) = lim_{β→∞} F(α, β; γ; z/β). The result is
Φ(α; γ; z) = [Γ(γ)/Γ(α)] Σ_{k=0}^∞ [Γ(α + k)/(Γ(k + 1)Γ(γ + k))] z^k.  (14.38)
This is called the confluent hypergeometric series. An argument similar to the
one given in the case of the hypergeometric function shows that

14.5.1. Box. The confluent hypergeometric function Φ(α; γ; z) reduces to
a polynomial when α is a negative integer.
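The termination stated in the box is visible directly in the series (14.38): the factor (α + k) in the term-to-term ratio vanishes once k reaches -α. A minimal illustrative sketch:

```python
from math import exp, isclose

def chg_phi(alpha, gam, z, terms=200):
    """Confluent hypergeometric series Phi(alpha; gamma; z), truncated."""
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (alpha + k) / ((gam + k) * (k + 1)) * z
        if term == 0.0:   # (alpha + k) hit zero: the series has terminated
            break
        total += term
    return total

# alpha = -2 gives a quadratic polynomial: Phi(-2; 1; z) = 1 - 2z + z^2/2
print(chg_phi(-2, 1, 2.0))  # → -1.0  (that is, 1 - 4 + 2)
# For alpha = gamma the series sums to e^z:
print(isclose(chg_phi(1.5, 1.5, 2.0), exp(2.0)))  # → True
```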
A second solution of the CHGDE can be obtained, as for the HGDE. If 1 - γ is
not an integer, then by taking the limit β → ∞ of Equation (14.24), we obtain the
second solution z^(1-γ) Φ(α - γ + 1; 2 - γ; z). Thus, any solution of the CHGDE can
be written as a linear combination of Φ(α; γ; z) and z^(1-γ) Φ(α - γ + 1; 2 - γ; z).
14.5.2. Example. The time-independent Schrödinger equation for a central potential, in
units in which ℏ = m = 1, is -(1/2)∇²ψ + V(r)ψ = Eψ. For the case of hydrogen-like
atoms, V(r) = -Ze²/r, where Z is the atomic number, and the equation reduces to
∇²ψ + (2E + 2Ze²/r)ψ = 0.  (14.39)
The radial part of this equation is given by Equation (12.14) with f(r) = 2E + 2Ze²/r.
Defining u = rR(r), we may write
d²u/dr² + (λ + a/r - b/r²)u = 0,
where λ = 2E, a = 2Ze², and b = l(l + 1). This equation can be further simplified by
defining r ≡ kz (k is an arbitrary constant to be determined later):
d²u/dz² + (λk² + ak/z - b/z²)u = 0.
Choosing λk² = -1/4 and introducing α ≡ ak = a/(2√(-λ)) yields
d²u/dz² + (-1/4 + α/z - b/z²)u = 0.
Equations of this form can be transformed into the CHGDE by making the substitution
u(z) = z^μ e^(-νz) f(z). It then follows that
d²f/dz² + (2μ/z - 2ν) df/dz + [-1/4 + μ(μ - 1)/z² - 2μν/z + α/z - b/z² + ν²] f = 0.
Choosing ν² = 1/4 and μ(μ - 1) = b reduces this equation to
f″ + (2μ/z - 2ν)f′ - [(2μν - α)/z] f = 0,
which is in the form of (14.37).
On physical grounds, we expect u(z) → 0 as z → ∞.⁵ Therefore, ν = 1/2. Similarly,
with μ(μ - 1) = b = l(l + 1), we obtain the two possibilities μ = -l and μ = l + 1.
Again on physical grounds, we demand that u(0) be finite (the wave function must not blow
up at r = 0). This implies⁶ that μ = l + 1. We thus obtain
f″ + [2(l + 1)/z - 1] f′ - [(l + 1 - α)/z] f = 0.
Multiplying by z gives zf″ + [2(l + 1) - z]f′ - (l + 1 - α)f = 0. Comparing this with
Equation (14.37) shows that f is proportional to Φ(l + 1 - α; 2l + 2; z). Thus, the solution
of (14.39) can be written as
u(z) = Cz^(l+1) e^(-z/2) Φ(l + 1 - α; 2l + 2; z).
An argument similar to that used in Problem 13.20 will reveal that the product
e^(-z/2) Φ(l + 1 - α; 2l + 2; z) will be infinite unless the power series representing Φ
terminates (becomes a polynomial). It follows from Box 14.5.1 that this will take place if
l + 1 - α = -N  (14.40)
⁵This is because the volume integral of |ψ|² over all space must be finite. The radial part of this integral is simply the integral
of r²R²(r) = u²(r). This latter integral will not be finite unless u(∞) = 0.
⁶Recall that μ is the exponent of z = r/k.
for some integer N ≥ 0. In that case we obtain the Laguerre polynomials
L_N^j ≡ [Γ(N + j + 1)/(Γ(N + 1)Γ(j + 1))] Φ(-N; j + 1; z),  where j = 2l + 1.
Condition (14.40) is the quantization rule for the energy levels of a hydrogen-like
atom. Writing everything in terms of the original parameters and defining n = N + l + 1
yields (after restoring all the m's and the ℏ's) the energy levels of a hydrogen-like atom:
E = -Z²me⁴/(2ℏ²n²) = -Z²(mc²/2)(α²/n²),
where α = e²/(ℏc) = 1/137 is the fine structure constant.
The radial wave functions can now be written as
R_(n,l)(r) = u_(n,l)(r)/r = Cr^l e^(-Zr/(na₀)) Φ(-n + l + 1; 2l + 2; 2Zr/(na₀)),
where a₀ = ℏ²/(me²) = 0.529 × 10⁻⁸ cm is the Bohr radius.  ■
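In the atomic units ℏ = m = e = 1 used in the example, the quantization rule (14.40) with α = n = N + l + 1 gives E_n = -Z²/(2n²). A trivial illustrative check (the conversion factor 27.211 eV per hartree is supplied here, not taken from the text):

```python
def hydrogen_energy(Z, n):
    """Bound-state energy E_n = -Z^2 / (2 n^2) in hartrees (hbar = m = e = 1)."""
    return -Z**2 / (2 * n**2)

print(hydrogen_energy(1, 1))           # → -0.5 hartree
print(hydrogen_energy(1, 1) * 27.211)  # about -13.6 eV, the hydrogen ground state
```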
Friedrich Wilhelm Bessel (1784-1846) showed no signs of
unusual academic ability in school, although he did show a
liking. for mathematics and physics. He left school intending
to become a merchant's apprentice, a desire that soon mate-
rialized with a seven-year unpaid apprenticeship with a large
mercantile firm in Bremen. The young Bessel proved so adept
at accounting and calculation that he was granted a small salary,
with raises, after only the first year. An interest in foreign trade
led Bessel to study geography and languages at night, astonish-
ingly learning to read and write English in only three months.
He also studied navigation in order to qualify as a cargo officer
aboard ship, but his innate curiosity soon compelled him to investigate astronomy at a more
fundamental level. Still serving his apprenticeship, Bessel learned to observe the positions
of stars with sufficient accuracy to determine the longitude of Bremen, checking his results
against professional astronomical journals. He then tackled the more formidable problem
of determining the orbit of Halley's comet from published observations. After seeing the
close agreement between Bessel's calculations and those of Halley, the German astronomer
Olbers encouraged Bessel to improve his already impressive work with more observations.
The improved calculations, an achievement tantamount to a modern doctoral dissertation,
were published with Olbers's recommendation. Bessel later received appointments with
increasing authority at observatories near Bremen and in Königsberg, the latter position
being accompanied by a professorship. (The title of doctor, required for the professorship,
was granted by the University of Göttingen on the recommendation of Gauss.)
Bessel provedhimselfan excellentobservational astronomer. His careful measurements
coupled with his mathematical aptitude allowed him to produce accurate positions for a
number of previously mapped stars, taking account of instrumental effects, atmospheric
refraction, and the position and motion of the observation site. In 1820 he determined the
position of the vernal equinox accurate to 0.01 second, in agreement with modern values.
His observation of the variation of the proper motion of the stars Sirius and Procyon led
him to posit the existence of nearby, large, low-luminosity stars called dark companions.
Between 1821 and 1833 he catalogued the positions of about 75,000 stars, publishing his
measurements in detail. One of his most important contributions to astronomy was the
determination of the distance to a star using parallax. This method uses triangulation, or the
determination of the apparent positions of a distant object viewed from two points a known
distance apart, in this case two diametrically opposed points of the Earth's orbit. The angle
subtended by the baseline of Earth's orbit, viewed from the star's perspective, is known
as the star's parallax. Before Bessel's measurement, stars were assumed to be so distant
that their parallaxeswere too small to measure, and it was further assumed that bright stars
(thought to be nearer) would have the largest parallax. Bessel correctly reasoned that stars
with large proper motions were more likely to be nearby ones and selected such a star, 61
Cygni, for his historic measurement. His measured parallax for that star differs by less than
8% from the currently accepted value.
Given such an impressive record in astronomy, it seems only fitting that the famous
functions that bear Bessel'sname grew out ofhis investigations ofperturbations in planetary
systems. He showed that such perturbations could be divided into two effects and treated
separately: the obvious direct attraction due to the perturbing planet and an indirect effect
caused by the sun's response to the perturber's force. The so-called Bessel functions then
appear as coefficients in the series treatment of the indirect perturbation. Although special
cases of Bessel functions were discovered by Bernoulli,Euler, and Lagrange, the systematic
treatment by Bessel clearly established his preeminence, a fitting tribute to the creator of
the most famous functions in mathematical physics.
14.5.1 Bessel Functions
The Bessel differential equation is usually written as
w″ + (1/z)w′ + (1 - ν²/z²)w = 0.  (14.41)
As in the example above, the substitution w = z^μ e^(-ηz) f(z) transforms (14.41)
into
d²f/dz² + [(2μ + 1)/z - 2η] df/dz + [(μ² - ν²)/z² - η(2μ + 1)/z + η² + 1] f = 0,
which, if we set μ = ν and η = i, reduces to
f″ + [(2ν + 1)/z - 2i] f′ - [(2ν + 1)i/z] f = 0.
Making the further substitution 2iz = t, and multiplying out by t, we obtain
t d²f/dt² + (2ν + 1 - t) df/dt - (ν + 1/2)f = 0,
which is in the form of (14.37) with α = ν + 1/2 and γ = 2ν + 1.
Thus, solutions of the Bessel equation, Equation (14.41), can be written as
constant multiples of z^ν e^(-iz) Φ(ν + 1/2; 2ν + 1; 2iz). With proper normalization, we
define the Bessel function of the first kind of order ν as
J_ν(z) = [1/Γ(ν + 1)] (z/2)^ν e^(-iz) Φ(ν + 1/2; 2ν + 1; 2iz).  (14.42)
Using Equation (14.38) and the expansion for e^(-iz), we can show that
J_ν(z) = (z/2)^ν Σ_{k=0}^∞ [(-1)^k/(k! Γ(ν + k + 1))] (z/2)^(2k).  (14.43)
The second linearly independent solution can be obtained as usual and is proportional to
z^(1-(2ν+1)) (z/2)^ν e^(-iz) Φ(ν + 1/2 - (2ν + 1) + 1; 2 - (2ν + 1); 2iz)
    = C (z/2)^(-ν) e^(-iz) Φ(-ν + 1/2; -2ν + 1; 2iz) = C J_(-ν)(z),
provided that 1 - γ = 1 - (2ν + 1) = -2ν is not an integer. When ν is an integer,
J_(-n)(z) = (-1)^n J_n(z) (see Problem 14.25). Thus, when ν is a noninteger, the most
general solution is of the form AJ_ν(z) + BJ_(-ν)(z).
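The series (14.43) is straightforward to sum. The sketch below (an illustrative helper, truncated at 60 terms) verifies the elementary form J_(1/2)(z) = √(2/(πz)) sin z that Problem 14.30 asks for.

```python
from math import gamma, isclose, pi, sin, sqrt

def bessel_j(v, z, terms=60):
    """J_v(z) from the series (14.43); assumes v + k + 1 avoids the poles of Gamma."""
    return (z / 2) ** v * sum((-1) ** k / (gamma(k + 1) * gamma(v + k + 1))
                              * (z / 2) ** (2 * k) for k in range(terms))

z = 1.7
print(isclose(bessel_j(0.5, z), sqrt(2 / (pi * z)) * sin(z)))  # → True
```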
How do we find a second linearly independent solution when ν is an integer
n? We first define
Y_ν(z) = [J_ν(z) cos νπ - J_(-ν)(z)] / sin νπ,  (14.44)
called the Bessel function of the second kind, or the Neumann function. For
noninteger ν this is simply a linear combination of the two linearly independent
solutions. For integer ν the function is indeterminate. Therefore, we use l'Hôpital's
rule and define
Y_n(z) ≡ lim_{ν→n} Y_ν(z) = (1/π) lim_{ν→n} [∂J_ν/∂ν - (-1)^n ∂J_(-ν)/∂ν].
Equation (14.43) yields
∂J_ν/∂ν = J_ν(z) ln(z/2) - (z/2)^ν Σ_{k=0}^∞ (-1)^k [ψ(ν + k + 1)/(k! Γ(ν + k + 1))] (z/2)^(2k),
where ψ(z) = (d/dz) ln Γ(z). Similarly,
∂J_(-ν)/∂ν = -J_(-ν)(z) ln(z/2) + (z/2)^(-ν) Σ_{k=0}^∞ (-1)^k [ψ(-ν + k + 1)/(k! Γ(-ν + k + 1))] (z/2)^(2k).
Substituting these expressions in the definition of Y_n(z) and using J_(-n)(z) =
(-1)^n J_n(z), we obtain
Y_n(z) = (2/π) J_n(z) ln(z/2) - (1/π)(z/2)^n Σ_{k=0}^∞ (-1)^k [ψ(n + k + 1)/(k! Γ(n + k + 1))] (z/2)^(2k)
    - (1/π)(-1)^n (z/2)^(-n) Σ_{k=0}^∞ (-1)^k [ψ(k - n + 1)/(k! Γ(k - n + 1))] (z/2)^(2k).  (14.45)
The natural log term is indicative of the solution suggested by Theorem 14.2.6.
Since Y_ν(z) is linearly independent of J_ν(z) for any ν, integer or noninteger, it is
convenient to consider {J_ν(z), Y_ν(z)} as a basis of solutions for the Bessel equation.
Another basis of solutions is defined as
H_ν^(1)(z) = J_ν(z) + iY_ν(z),  H_ν^(2)(z) = J_ν(z) - iY_ν(z),  (14.46)
which are called Bessel functions of the third kind, or Hankel functions.
Replacing z by iz in the Bessel equation yields
w″ + (1/z)w′ - (1 + ν²/z²)w = 0,
whose basis of solutions consists of multiples of J_ν(iz) and J_(-ν)(iz). Thus, the
modified Bessel functions of the first kind are defined as
I_ν(z) ≡ e^(-iπν/2) J_ν(iz) = (z/2)^ν Σ_{k=0}^∞ [1/(k! Γ(ν + k + 1))] (z/2)^(2k).
Similarly, the modified Bessel functions of the second kind are defined as
K_ν(z) = [π/(2 sin νπ)] [I_(-ν)(z) - I_ν(z)].
When ν is an integer n, I_n = I_(-n), and K_n is indeterminate. Thus, we define K_n(z)
as lim_{ν→n} K_ν(z). This gives
K_n(z) = [(-1)^n/2] lim_{ν→n} [∂I_(-ν)/∂ν - ∂I_ν/∂ν],
which has the power-series representation
K_n(z) = (-1)^(n+1) I_n(z) ln(z/2) + (1/2)(z/2)^(-n) Σ_{k=0}^(n-1) [(-1)^k (n - k - 1)!/k!] (z/2)^(2k)
    + [(-1)^n/2] (z/2)^n Σ_{k=0}^∞ [ψ(k + 1) + ψ(n + k + 1)]/(k! (n + k)!) (z/2)^(2k).
We can obtain a recurrence relation for solutions of the Bessel equation as
follows. If Z_ν(z) is a solution of order ν, then (see Problem 14.28)
Z_(ν+1) = C₁ z^ν (d/dz)[z^(-ν) Z_ν(z)]  and  Z_(ν-1) = C₂ z^(-ν) (d/dz)[z^ν Z_ν(z)].
If the constants are chosen in such a way that Z_ν, Z_(-ν), Z_(ν+1), and Z_(ν-1) satisfy
their appropriate series expansions, then C₁ = -1 and C₂ = 1. Carrying out the
differentiation in the equations for Z_(ν+1) and Z_(ν-1), we obtain
Z_(ν+1) = (ν/z)Z_ν - dZ_ν/dz,
Z_(ν-1) = (ν/z)Z_ν + dZ_ν/dz.  (14.47)
Adding these two equations yields the recursion relation
Z_(ν-1)(z) + Z_(ν+1)(z) = (2ν/z) Z_ν(z),  (14.48)
where Z_ν(z) can be any of the three kinds of Bessel functions.
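The recursion (14.48) is a handy numerical consistency check on any implementation of Bessel functions; here it is tested against a truncated power series for J_ν (an illustrative helper, not from the text).

```python
from math import gamma, isclose

def bessel_j(v, z, terms=60):
    """J_v(z) from its power series; valid here since v + k + 1 > 0 for all k."""
    return (z / 2) ** v * sum((-1) ** k / (gamma(k + 1) * gamma(v + k + 1))
                              * (z / 2) ** (2 * k) for k in range(terms))

v, z = 1.3, 0.9
lhs = bessel_j(v - 1, z) + bessel_j(v + 1, z)
print(isclose(lhs, (2 * v / z) * bessel_j(v, z)))  # → True
```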
14.6 Problems
14.1. Show that the solution of w′ + w/z² = 0 has an essential singularity at
z = 0.
14.2. Derive the recursion relation of Equation (14.7) and express it in terms of
the indicial polynomial, as in Equation (14.9).
14.3. Find the characteristic exponent associated with the solution of w″ +
p(z)w′ + q(z)w = 0 at an ordinary point [a point at which p(z) and q(z) have no
poles]. How many solutions can you find?
14.4. The Laplace equation in electrostatics, when separated in spherical coordinates,
yields a DE in the radial coordinate given by
(d/dx)(x² dy/dx) - n(n + 1)y = 0  for n ≥ 0.
Starting with an infinite series of the form (14.6), show that the two independent
solutions of this ODE are of the form x^n and x^(-n-1).
14.5. Find the indicial polynomial, characteristic exponents, and recursion relation
at both of the regular singular points of the Legendre equation,
w″ - [2z/(1 - z²)]w′ + [a/(1 - z²)]w = 0.
What is a₁, the coefficient of the Laurent expansion, for the point z = +1?
14.6. Show that the substitution z = 1/t transforms Equation (14.13) into Equation (14.14).
14.7. Obtain the indicial polynomial of Equation (14.14) for expansion about
t = 0.
14.8. Show that the Riemann DE represents the most general second-order Fuchsian
DE.
14.9. Derive the indicial equation for the Riemann DE.
14.10. Show that the transformation v(z) = z^λ(z - 1)^μ w(z) changes the pairs of
characteristic exponents (λ₁, λ₂), (μ₁, μ₂), and (ν₁, ν₂) for the Riemann DE to
(λ₁ + λ, λ₂ + λ), (μ₁ + μ, μ₂ + μ), and (ν₁ - λ - μ, ν₂ - λ - μ).
14.11. Go through the steps leading to Equations (14.24), (14.25), and (14.26).
14.12. Show that the complete elliptic integral of the first kind, defined as
K(z) = ∫₀^(π/2) dθ/√(1 - z² sin²θ),
can be expressed as (π/2)F(1/2, 1/2; 1; z²).
14.13. By differentiating the hypergeometric series, show that
dⁿ/dzⁿ F(α, β; γ; z) = [Γ(α + n)Γ(β + n)Γ(γ)/(Γ(α)Γ(β)Γ(γ + n))] F(α + n, β + n; γ + n; z).
14.14. Use direct substitution in the hypergeometric series to show that
F(1/2, 1/2; 3/2; z²) = (1/z) sin⁻¹ z,
F(1, 1; 2; -z) = (1/z) ln(1 + z),
F(-α, β; β; -z) = (1 + z)^α.
14.15. Show that the substitution v(z) = z^r w(1/z) [see Equation (14.28)] transforms
the HGDE into Equation (14.29).
14.16. Consider the function v(z) ≡ z^r (1 - z)^s F(α₁, β₁; γ₁; 1/z) and assume
that it is a solution of the HGDE. Find a relation among r, s, α₁, β₁, and γ₁ such that
v(z) is written in terms of three parameters rather than five. In particular, show
that one possibility is
v(z) = z^(α-γ) (1 - z)^(γ-α-β) F(γ - α, 1 - α; 1 + β - α; 1/z).
Find all such possibilities.
14.17. Show that the Jacobi functions are related to the hypergeometric functions.
14.18. Derive the expression for the Jacobi function of the second kind as given
in Equation (14.33).
14.19. Show that z = 00 is not a regular singular point of the CHGDE.
14.20. Derive the confluent hypergeometric series from the hypergeometric series.
14.21. Show that the Weber-Hermite equation, u″ + (ν + 1/2 - z²/4)u = 0, can be
transformed into the CHGDE. Hint: Make the substitution u(z) = exp(-z²/4)v(z).
14.22. The linear combination
ψ(α; γ; z) = [Γ(1 - γ)/Γ(α - γ + 1)] Φ(α; γ; z) + [Γ(γ - 1)/Γ(α)] z^(1-γ) Φ(α - γ + 1; 2 - γ; z)
is also a solution of the CHGDE. Show that the Hermite polynomials can be written as
H_n(z/√2) = 2^n ψ(-n/2; 1/2; z²/2).
14.23. Verify that the error function erf(z) = ∫₀^z e^(-t²) dt satisfies the relation
erf(z) = z Φ(1/2; 3/2; -z²).
14.24. Derive the series expansion of the Bessel function of the first kind from
that of the confluent hypergeometric series and the expansion of the exponential.
Check your answer by obtaining the same result by substituting the power series
directly in the Bessel DE.
14.25. Show that J_(-n)(z) = (-1)^n J_n(z). Hint: Let ν = -n in the expansion of
J_ν(z) and use Γ(m) = ∞ for m a nonpositive integer.
14.26. In a potential-free region, the radial part of the Schrödinger equation reduces to
d²R/dr² + (2/r) dR/dr + [λ - a/r²] R = 0.
Write the solutions of this DE in terms of Bessel functions. Hint: Substitute
R = u/√r. These solutions are called spherical Bessel functions.
14.27. Theorem 14.2.6 states that under certain conditions, linearly independent
solutions of a SOLDE at regular singular points exist even though the difference
between the characteristic exponents is an integer. An example is the case of
Bessel functions of half-odd-integer orders. Evaluate the Wronskian of the two
linearly independent solutions, J_ν and J_(-ν), of the Bessel equation and show that it
vanishes only if ν is an integer. This shows, in particular, that J_(n+1/2) and J_(-n-1/2)
are linearly independent. Hint: Consider the value of the Wronskian at z = 0, and
use the formula Γ(ν)Γ(1 - ν) = π/sin νπ.
14.28. Show that z^(±ν)(d/dz)[z^(∓ν) Z_ν(z)] is a solution of the Bessel equation of
order ν ± 1 if Z_ν is a solution of order ν.
14.29. Use the recursion relation of Equation (14.47) to prove that
[(1/z)(d/dz)]^m [z^ν Z_ν(z)] = z^(ν-m) Z_(ν-m)(z),
[(1/z)(d/dz)]^m [z^(-ν) Z_ν(z)] = (-1)^m z^(-ν-m) Z_(ν+m)(z).
14.30. Using the series expansion of the Bessel function, write J_(1/2)(z) and
J_(-1/2)(z) in terms of elementary functions. Hint: First show that Γ(k + 3/2) =
√π (2k + 1)!/(k! 2^(2k+1)).
14.31. From the results of the previous two problems, derive the relations
J_(-n-1/2)(z) = √(2/π) z^(n+1/2) [(1/z)(d/dz)]^n (cos z/z),
J_(n+1/2)(z) = √(2/π) z^(n+1/2) [-(1/z)(d/dz)]^n (sin z/z).
14.32. Obtain the following integral identities:
(a) ∫ z^(ν+1) J_ν(z) dz = z^(ν+1) J_(ν+1)(z).
(b) ∫ z^(-ν+1) J_ν(z) dz = -z^(-ν+1) J_(ν-1)(z).
(c) ∫ z^(μ+1) J_ν(z) dz = z^(μ+1) J_(ν+1)(z) + (μ - ν) z^μ J_ν(z)
    - (μ² - ν²) ∫ z^(μ-1) J_ν(z) dz;
and evaluate
(d) ∫ z³ J₀(z) dz.
Hint: For (c) write z^(μ+1) = z^(μ-ν) z^(ν+1) and use integration by parts.
14.33. Use Theorem 14.2.6 and the fact that J_n(z) is entire to show that for integer
n, a second solution to the Bessel equation exists and can be written as Y_n(z) =
J_n(z)[f_n(z) + c_n ln z], where f_n(z) is analytic about z = 0.
14.34. (a) Show that the Wronskian W(J_ν, Z; z) of J_ν and any other solution Z
of the Bessel equation satisfies the equation
(d/dz)[z W(J_ν, Z; z)] = 0.
(b) For some constant A, show that
(d/dz)[Z/J_ν] = W(z)/J_ν²(z) = A/(z J_ν²(z)).
(c) Show that the general second solution of the Bessel equation can be written as
Z(z) = J_ν(z) [B + A ∫ dz/(z J_ν²(z))].
14.35. Spherical Bessel functions are defined by
f_l(z) ≡ √(π/(2z)) Z_(l+1/2)(z).
Let f_l(z) denote a spherical Bessel function "of some kind." By direct differentiation
and substitution in the Bessel equation, show that
(a) (d/dz)[z^(l+1) f_l(z)] = z^(l+1) f_(l-1)(z),
(b) (d/dz)[z^(-l) f_l(z)] = -z^(-l) f_(l+1)(z).
(c) Combine the results of parts (a) and (b) to derive the recursion relations
f_(l-1)(z) + f_(l+1)(z) = [(2l + 1)/z] f_l(z),
l f_(l-1)(z) - (l + 1) f_(l+1)(z) = (2l + 1) df_l/dz.
14.36. Show that W(J_ν, Y_ν; z) = 2/(πz) and W(H_ν^(1), H_ν^(2); z) = 4/(iπz). Hint:
Use Problem 14.34.
14.37. Verify the following relations:
(a) Y_(n+1/2)(z) = (-1)^(n+1) J_(-n-1/2)(z),  Y_(-n-1/2)(z) = (-1)^n J_(n+1/2)(z).
(b) Y_(-ν)(z) = sin νπ J_ν(z) + cos νπ Y_ν(z) = [J_ν(z) - cos νπ J_(-ν)(z)]/sin νπ.
(c) Y_(-n)(z) = (-1)^n Y_n(z) in the limit ν → n in part (b).
14.38. Use the recurrence relation for the Bessel function to show that J₁(z) =
-J₀′(z).
14.39. Let u = J_ν(λz) and v = J_ν(μz). Multiply the Bessel DE for u by v/z and
that of v by u/z. Subtract the two equations to obtain
(λ² - μ²) z u v = (d/dz)[z(u dv/dz - v du/dz)].
(a) Write the above equation in terms of J_ν(λz) and J_ν(μz) and integrate both
sides with respect to z.
(b) Now divide both sides by λ² - μ² and take the limit as μ → λ. You will need
to use l'Hôpital's rule.
(c) Substitute for J_ν″(λz) from the Bessel DE and simplify to get
∫ z [J_ν(λz)]² dz = (z²/2) {[J_ν′(λz)]² + (1 - ν²/(λ²z²)) [J_ν(λz)]²}.
(d) Finally, let λ = x_(νn)/a, where x_(νn) is the nth root of J_ν, and use Equation (14.47)
to arrive at
∫₀^a z J_ν²(x_(νn) z/a) dz = (a²/2) J_(ν+1)²(x_(νn)).
14.40. The generating function for Bessel functions of integer order is exp[(z/2)(t - 1/t)].
To see this, rewrite the generating function as e^(zt/2) e^(-z/(2t)), expand both
factors, and write the product as powers of t^n. Now show that the coefficient
of t^n is simply J_n(z). Finally, use J_(-n)(z) = (-1)^n J_n(z) to derive the formula
exp[(z/2)(t - 1/t)] = Σ_(n=-∞)^∞ J_n(z) t^n.
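The expansion in Problem 14.40 can be confirmed numerically; the sketch below (illustrative) compares a truncated sum Σ J_n(z)t^n against the closed form of the generating function.

```python
from math import exp, gamma, isclose

def bessel_jn(n, z, terms=40):
    """Integer-order J_n(z) from the power series, using J_{-n} = (-1)^n J_n."""
    if n < 0:
        return (-1) ** n * bessel_jn(-n, z, terms)
    return (z / 2) ** n * sum((-1) ** k / (gamma(k + 1) * gamma(n + k + 1))
                              * (z / 2) ** (2 * k) for k in range(terms))

z, t = 1.1, 0.7
lhs = exp(z / 2 * (t - 1 / t))
rhs = sum(bessel_jn(n, z) * t ** n for n in range(-20, 21))  # |n| > 20 is negligible here
print(isclose(lhs, rhs))  # → True
```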
14.41. Make the substitutions z = βt^γ and w = t^α u to transform the Bessel DE
into t² d²u/dt² + (2α + 1)t du/dt + (β²γ²t^(2γ) + α² - ν²γ²)u = 0. Now show that
Airy's differential equation ü - tu = 0 has solutions of the form √t J_(1/3)((2/3)it^(3/2))
and √t J_(-1/3)((2/3)it^(3/2)).
14.42. Show that the general solution of d²w/dt² + [(e^(2/t) - ν²)/t⁴] w = 0 is
w = t[AJ_ν(e^(1/t)) + BY_ν(e^(1/t))].
14.43. Transform dw/dz + w² + z^m = 0 by making the substitution w =
(d/dz) ln v. Now make the further substitutions
v = u√z  and  t = [2/(m + 2)] z^(1+m/2)
to show that the new DE can be transformed into a Bessel equation of order
1/(m + 2).
14.44. Starting with the relation
exp[(x/2)(t - 1/t)] exp[(y/2)(t - 1/t)] = exp[((x + y)/2)(t - 1/t)]
and the fact that the exponential function is the generating function for J_n(z), prove
the "addition theorem" for Bessel functions:
J_n(x + y) = Σ_(k=-∞)^∞ J_k(x) J_(n-k)(y).
Additional Reading
1. Birkhoff, G. and Rota, G.-C. Ordinary Differential Equations, 3rd ed., Wiley,
1978. The first two sections of this chapter closely follow their presentation.
2. Dennery, P.and Krzywicki, A. Mathematicsfor Physicists, Harper and Row,
1967.
3. Watson, G. A Treatise on the Theory ofBessel Functions, 2nd ed., Cam-
bridge University Press, 1952. As the name suggests, the definitive text and
reference on Bessel functions.
15
Integral Transforms and Differential
Equations
The discussion in Chapter 14 introduced a general method of solving differential
equations by power series (also called the Frobenius method) which gives a
solution that converges within a circle of convergence. In general, this circle of
convergence may be small; however, the function represented by the power series
can be analytically continued using methods presented in Chapter 11.
This chapter, which is a bridge between differential equations and operators on Hilbert spaces (to be developed in the next part), introduces another method of solving DEs, which uses integral transforms and incorporates the analytic continuation automatically. The integral transform of a function v is another function u given by

u(z) = ∫_C K(z, t) v(t) dt,   (15.1)

kernel of integral transform
examples of integral transforms

where C is a convenient contour, and K(z, t), called the kernel of the integral transform, is an appropriate function of two complex variables.
15.0.1. Example. Let us consider some examples of integral transforms.
(a) The Fourier transform is familiar from the discussion of Chapter 8. The kernel is K(x, y) = e^{ixy}.
(b) The Laplace transform is used frequently in electrical engineering. Its kernel is K(x, y) = e^{−xy}.
(c) The Euler transform has the kernel K(x, y) = (x − y)^ν.
(d) The Mellin transform has the kernel K(x, y) = G(xy), where G is an arbitrary function. Most of the time K(x, y) is taken to be simply x^y.
(e) The Hankel transform has the kernel K(x, y) = y J_n(xy), where J_n is the nth-order Bessel function.
(f) A transform that is useful in connection with the Bessel equation has the kernel K(x, y) = (x/2)^ν e^{y − x²/(4y)}.

Strategy for solving DEs using integral transforms
The idea behind using integral transforms is to write the solution u(z) of a DE in z in terms of an integral such as Equation (15.1) and to choose v and the kernel in such a way as to render the DE more manageable. Let L_z be a differential operator (DO) in the variable z. We want to determine u(z) such that L_z[u] = 0, or equivalently, such that ∫_C L_z[K(z, t)] v(t) dt = 0. Suppose that we can find M_t, a DO in the variable t, such that L_z[K(z, t)] = M_t[K(z, t)]. Then the DE becomes ∫_C M_t[K(z, t)] v(t) dt = 0. If C has a and b as initial and final points (a and b may be equal), then the Lagrange identity [see Equation (13.24)] yields

0 = L_z[u] = ∫_a^b K(z, t) M_t†[v(t)] dt + Q[K, v]|_a^b,

where Q[K, v] is the "surface term." If v(t) and the contour C (or a and b) are chosen in such a way that

Q[K, v]|_a^b = 0   and   M_t†[v(t)] = 0,   (15.2)

the problem is solved. The trick is to find an M_t such that Equation (15.2) is easier to solve than the original equation, L_z[u] = 0. This in turn demands a clever choice of the kernel, K(z, t). This chapter discusses how to solve some common differential equations of mathematical physics using the general idea presented above.
15.1 Integral Representation of the Hypergeometric Function
Recall that for the hypergeometric function, the differential operator is

L_z = z(1 − z) d²/dz² + [γ − (α + β + 1)z] d/dz − αβ.

For such operators, whose coefficient functions are polynomials, the proper choice for K(z, t) is the Euler kernel, (z − t)^s. Applying L_z to this kernel and rearranging terms, we obtain

L_z[K(z, t)] = {z²[−s(s − 1) − s(α + β + 1) − αβ] + z[s(s − 1) + sγ + st(α + β + 1) + 2αβt] − γst − αβt²}(z − t)^{s−2}.   (15.3)
Note that except for a multiplicative constant, K(z, t) is symmetric in z and t. This suggests that the general form of M_t may be chosen to be the same as that of L_z except for the interchange of z and t. If we can manipulate the parameters in such a way that M_t becomes simple, then we have a chance of solving the problem. For instance, if M_t has the form of L_z with the constant term absent, then the hypergeometric DE effectively reduces to a FODE (in dv/dt). Let us exploit this possibility.
The general form of the M_t that we are interested in is

M_t = p₂(t) d²/dt² + p₁(t) d/dt,

i.e., with no p₀ term. By applying M_t to K(z, t) = (z − t)^s and setting the result equal to the RHS of Equation (15.3), we obtain

s(s − 1)p₂ − p₁sz + p₁st = z²[−s(s − 1) − s(α + β + 1) − αβ] + z[s(s − 1) + sγ + st(α + β + 1) + 2αβt] − γst − αβt²,

for which the coefficients of equal powers of z on both sides must be equal:

−s(s − 1) − s(α + β + 1) − αβ = 0 ⟹ s = −α or s = −β,
−p₁s = s(s − 1) + sγ + st(α + β + 1) + 2αβt,
s(s − 1)p₂ + p₁st = −γst − αβt².
If we choose s = −α (s = −β leads to an equivalent representation), the coefficient functions of M_t will be completely determined. In fact, the second equation gives p₁(t), and the third determines p₂(t). We finally obtain

p₁(t) = α + 1 − γ + t(β − α − 1),   (15.4)

and

M_t = (t − t²) d²/dt² + [α + 1 − γ + t(β − α − 1)] d/dt,   (15.5)

which, according to Equation (13.20), yields the following DE for the adjoint:

M_t†[v] = d²/dt² [(t − t²)v] − d/dt {[α − γ + 1 + t(β − α − 1)]v} = 0.   (15.6)
The solution to this equation is v(t) = C′ t^{α−γ}(t − 1)^{γ−β−1} (see Problem 15.5). We also need the surface term, Q[K, v], in the Lagrange identity (see Problem 15.6 for details): Q[K, v](t) = C′α t^{α−γ+1}(t − 1)^{γ−β}(z − t)^{−α−1}.

Finally, we need a specification of the contour. For different contours we will get different solutions. The contour chosen must, of course, have the property that Q[K, v] vanishes as a result of the integration. There are two possibilities: Either the contour is closed [a = b in (15.2)], or a ≠ b but Q[K, v] takes on the same value at a and at b.

Let us consider the second of these possibilities. Clearly, Q[K, v](t) vanishes at t = 1 if Re(γ) > Re(β). Also, as t → ∞,

Q[K, v](t) → (−1)^{−α−1} C′α t^{α−γ+1} t^{γ−β} t^{−α−1} = (−1)^{−α−1} C′α t^{−β},

which vanishes if Re(β) > 0. We thus take a = 1 and b = ∞, and assume that Re(γ) > Re(β) > 0. It then follows that

u(z) = ∫_a^b K(z, t) v(t) dt = C′ ∫_1^∞ (t − z)^{−α} t^{α−γ}(t − 1)^{γ−β−1} dt.

The constant C′ can be determined to be Γ(γ)/[Γ(β)Γ(γ − β)] (see Problem 15.7). Therefore,

u(z) ≡ F(α, β; γ; z) = [Γ(γ)/(Γ(β)Γ(γ − β))] ∫_1^∞ (t − z)^{−α} t^{α−γ}(t − 1)^{γ−β−1} dt.
Euler formula for the hypergeometric function

It is customary to change the variable of integration from t to 1/t. The resulting expression is called the Euler formula for the hypergeometric function:

F(α, β; γ; z) = [Γ(γ)/(Γ(β)Γ(γ − β))] ∫_0^1 (1 − tz)^{−α} t^{β−1}(1 − t)^{γ−β−1} dt.   (15.7)
Note that the term (1 − tz)^{−α} in the integral has two branch points in the z-plane, one at z = 1/t and the other at z = ∞. Therefore, we cut the z-plane from z₁ = 1/t, a point on the positive real axis, to z₂ = ∞. Since 0 ≤ t ≤ 1, z₁ is somewhere in the interval [1, ∞). To ensure that the cut is applicable for all values of t, we take z₁ = 1 and cut the plane along the positive real axis. It follows that Equation (15.7) is well behaved as long as

0 < arg(1 − z) < 2π.   (15.8)

We could choose a different contour, which, in general, would lead to a different solution. The following example illustrates one such choice.
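Equation (15.7) can be verified numerically for parameters with Re γ > Re β > 0. The sketch below is an illustration added here (with a, b, c standing for α, β, γ; the parameter values are arbitrary choices): it compares the hypergeometric series with the Euler integral evaluated by Simpson's rule.

```python
import math

def F_series(a, b, c, z, terms=80):
    # Hypergeometric series F(a,b;c;z) = sum_n (a)_n (b)_n / ((c)_n n!) z^n.
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return total

def simpson(f, lo, hi, n=2000):
    # Composite Simpson's rule (n must be even).
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

def F_euler(a, b, c, z):
    # Euler formula (15.7): Gamma(c)/(Gamma(b)Gamma(c-b)) *
    # int_0^1 (1 - t z)^(-a) t^(b-1) (1 - t)^(c-b-1) dt.
    pref = math.gamma(c) / (math.gamma(b) * math.gamma(c - b))
    return pref * simpson(lambda t: (1 - t * z) ** (-a)
                          * t ** (b - 1) * (1 - t) ** (c - b - 1), 0.0, 1.0)

print(F_series(0.5, 2.0, 4.0, 0.3), F_euler(0.5, 2.0, 4.0, 0.3))
```

The exponents b − 1 = 1 and c − b − 1 = 1 are chosen so the integrand vanishes smoothly at both endpoints.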
15.1.1. Example. First note that Q[K, v] vanishes at t = 0 and t = 1 as long as Re(γ) > Re(β) and Re(α) > Re(γ) − 1. Hence, we can choose the contour to start at t = 0 and end at t = 1. We then have

w(z) = C″ ∫_0^1 (t − z)^{−α} t^{α−γ}(1 − t)^{γ−β−1} dt.   (15.9)

To see the relation between w(z) and the hypergeometric function, expand (1 − t/z)^{−α} in the integral to get

w(z) = C″ z^{−α} Σ_{n=0}^∞ [Γ(α + n)/(Γ(α)Γ(n + 1))] (1/z)^n ∫_0^1 t^{α+n−γ}(1 − t)^{γ−β−1} dt.   (15.10)

Now evaluate the integral by changing t to 1/t and using Equations (11.19) and (11.17). This changes the integral to

∫_1^∞ t^{−α−n−1+β}(t − 1)^{γ−β−1} dt = Γ(α + n + 1 − γ)Γ(γ − β)/Γ(α + n + 1 − β).

Substituting this in Equation (15.10), we obtain

w(z) = [C″/Γ(α)] Γ(γ − β) z^{−α} Σ_{n=0}^∞ [Γ(α + n)Γ(α + n + 1 − γ)/(Γ(α + n + 1 − β)Γ(n + 1))] (1/z)^n
  = [C″/Γ(α)] Γ(γ − β) z^{−α} [Γ(α)Γ(α + 1 − γ)/Γ(α + 1 − β)] F(α, α − γ + 1; α − β + 1; 1/z),

where we have used the hypergeometric series of Chapter 14. Choosing

C″ = Γ(α + 1 − β)/[Γ(γ − β)Γ(α + 1 − γ)]

yields w(z) = z^{−α} F(α, α − γ + 1; α − β + 1; 1/z), which is one of the solutions of the hypergeometric DE [Equation (14.30)].
15.2 Integral Representation of the Confluent Hypergeometric Function
Having obtained the integral representation of the hypergeometric function, we can readily get the integral representation of the confluent hypergeometric function by taking the proper limit. It was shown in Chapter 14 that Φ(α, γ; z) = lim_{β→∞} F(α, β; γ; z/β). This suggests taking the limit of Equation (15.7). The presence of the gamma functions with β as their arguments complicates things, but on the other hand, the symmetry of the hypergeometric function can be utilized to our advantage. Thus, we may write

Φ(α, γ; z) = lim_{β→∞} F(α, β; γ; z/β) = lim_{β→∞} F(β, α; γ; z/β)
  = lim_{β→∞} [Γ(γ)/(Γ(α)Γ(γ − α))] ∫_0^1 (1 − tz/β)^{−β} t^{α−1}(1 − t)^{γ−α−1} dt
  = [Γ(γ)/(Γ(α)Γ(γ − α))] ∫_0^1 e^{zt} t^{α−1}(1 − t)^{γ−α−1} dt,   (15.11)

because the limit of the first term in the integrand is simply e^{tz}. Note that the condition Re(γ) > Re(α) > 0 must still hold here.
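Equation (15.11) admits the same kind of numerical sanity check as the Euler formula. The sketch below is an illustration added here (a and g stand for α and γ; the values are arbitrary choices satisfying Re γ > Re α > 0).

```python
import math

def phi_series(a, g, z, terms=120):
    # Confluent hypergeometric series: Phi(a,g;z) = sum_n (a)_n/((g)_n n!) z^n.
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= (a + n) / ((g + n) * (n + 1)) * z
    return total

def phi_integral(a, g, z, n=2000):
    # Equation (15.11): Gamma(g)/(Gamma(a)Gamma(g-a)) *
    # int_0^1 e^{z t} t^{a-1} (1-t)^{g-a-1} dt, via Simpson's rule.
    f = lambda t: math.exp(z * t) * t ** (a - 1) * (1 - t) ** (g - a - 1)
    h = 1.0 / n
    s = f(0.0) + f(1.0)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(i * h)
    pref = math.gamma(g) / (math.gamma(a) * math.gamma(g - a))
    return pref * s * h / 3

print(phi_series(2.0, 4.0, 1.5), phi_integral(2.0, 4.0, 1.5))
```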
Integral transforms are particularly useful in determining the asymptotic behavior of functions. We shall use them in deriving asymptotic formulas for Bessel functions later on, and Problem 15.10 derives the asymptotic formula for the confluent hypergeometric function.
15.3 Integral Representation of Bessel Functions
Choosing the kernel, the contour, and the function v(t) that lead to an integral representation of a function is an art, and the nineteenth century produced many masters of it. A particularly popular theme in such endeavors was the Bessel equation and Bessel functions. This section considers the integral representations of Bessel functions.
The most effective kernel for the Bessel DE is

K(z, t) = (z/2)^ν exp(t − z²/(4t)).

When the Bessel DO L_z ≡ d²/dz² + (1/z) d/dz + 1 − ν²/z² acts on K(z, t), it yields

L_z[K(z, t)] = (−(ν + 1)/t + 1 + z²/(4t²)) (z/2)^ν e^{t − z²/(4t)} = (d/dt − (ν + 1)/t) K(z, t).

Thus, M_t = d/dt − (ν + 1)/t, and Equation (13.20) gives

M_t†[v(t)] = −dv/dt − ((ν + 1)/t) v = 0,

whose solution, including the arbitrary constant of integration k, is v(t) = k t^{−ν−1}. When we substitute this solution and the kernel in the surface term of the Lagrange identity, Equation (13.24), we obtain

Q[K, v](t) = p₁ K(z, t) v(t) = k (z/2)^ν t^{−ν−1} e^{t − z²/(4t)}.
integral representation of Bessel function

Figure 15.1 The contour C in the t-plane used in evaluating J_ν(z).
A contour in the t-plane that ensures the vanishing of Q[K, v] for all values of ν starts at t = −∞, comes to the origin, orbits it on an arbitrary circle, and finally goes back to t = −∞ (see Figure 15.1). Such a contour is possible because of the factor e^t in the expression for Q[K, v]. We thus can write

J_ν(z) = k (z/2)^ν ∫_C t^{−ν−1} e^{t − z²/(4t)} dt.   (15.12)

Note that the integrand has a cut along the negative real axis due to the factor t^{−ν−1}. If ν is an integer, the cut shrinks to a pole at t = 0.

The constant k must be determined in such a way that the above expression for J_ν(z) agrees with the series representation obtained in Chapter 14. It can be shown (see Problem 15.11) that k = 1/(2πi). Thus, we have

J_ν(z) = [1/(2πi)] (z/2)^ν ∫_C t^{−ν−1} e^{t − z²/(4t)} dt.

It is more convenient to take the factor (z/2)^ν into the integral, introduce a new integration variable u = 2t/z, and rewrite the preceding equation as

J_ν(z) = [1/(2πi)] ∫_C u^{−ν−1} e^{(z/2)(u − 1/u)} du.   (15.13)

This result is valid as long as Re(zu) < 0 when u → −∞ on the negative real axis; that is, Re(z) must be positive for Equation (15.13) to work.

An interesting result can be obtained from Equation (15.13) when ν is an integer. In that case the only singularity will be at the origin, so the contour can be taken to be a circle about the origin. This yields

J_n(z) = [1/(2πi)] ∮_C u^{−n−1} e^{(z/2)(u − 1/u)} du,

which is the nth coefficient of the Laurent series expansion of exp[(z/2)(u − 1/u)] about the origin. We thus have this important result:

Bessel generating function

e^{(z/2)(t − 1/t)} = Σ_{n=−∞}^{∞} J_n(z) t^n.   (15.14)
The function exp[(z/2)(t − 1/t)] is therefore appropriately called the generating function for Bessel functions of integer order (see also Problem 14.40). Equation (15.14) can be useful in deriving relations for such Bessel functions, as the following example shows.
15.3.1. Example. Let us rewrite the LHS of (15.14) as e^{zt/2} e^{−z/(2t)}, expand the exponentials, and collect terms to obtain

e^{(z/2)(t − 1/t)} = e^{zt/2} e^{−z/(2t)} = Σ_{m=0}^∞ (1/m!)(zt/2)^m Σ_{n=0}^∞ (1/n!)(−z/(2t))^n
  = Σ_{m=0}^∞ Σ_{n=0}^∞ [(−1)^n/(m! n!)] (z/2)^{m+n} t^{m−n}.

If we let m − n = k, change the m summation to k, and note that k goes from −∞ to ∞, we get

e^{(z/2)(t − 1/t)} = Σ_{k=−∞}^∞ Σ_{n=0}^∞ [(−1)^n/((n + k)! n!)] (z/2)^{2n+k} t^k
  = Σ_{k=−∞}^∞ {(z/2)^k Σ_{n=0}^∞ [(−1)^n/(Γ(n + k + 1)Γ(n + 1))] (z/2)^{2n}} t^k.

Comparing this equation with Equation (15.14) yields the familiar expansion for the Bessel function:

J_k(z) = (z/2)^k Σ_{n=0}^∞ [(−1)^n/(Γ(n + k + 1)Γ(n + 1))] (z/2)^{2n}.

We can also obtain a recurrence relation for J_n(z). Differentiating both sides of Equation (15.14) with respect to t yields

(z/2)(1 + 1/t²) e^{(z/2)(t − 1/t)} = Σ_{n=−∞}^∞ n J_n(z) t^{n−1}.   (15.15)

Using Equation (15.14) on the LHS gives

Σ_{n=−∞}^∞ (z/2 + z/(2t²)) J_n(z) t^n = (z/2) Σ_{n=−∞}^∞ J_n(z) t^n + (z/2) Σ_{n=−∞}^∞ J_n(z) t^{n−2}
  = (z/2) Σ_{n=−∞}^∞ J_{n−1}(z) t^{n−1} + (z/2) Σ_{n=−∞}^∞ J_{n+1}(z) t^{n−1},   (15.16)

where we substituted n − 1 for n in the first sum and n + 1 for n in the second. Equating the coefficients of equal powers of t on the LHS and the RHS of Equations (15.15) and (15.16), we get

n J_n(z) = (z/2)[J_{n−1}(z) + J_{n+1}(z)],

which was obtained by a different method in Chapter 14 [see Eq. (14.48)].
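The recurrence just derived is an exact identity of the series, which a short numerical sketch (added here for illustration; not part of the text) confirms:

```python
import math

def J(n, z, terms=40):
    # Integer-order Bessel series from the example above:
    # J_n(z) = sum_m (-1)^m (z/2)^(2m+n) / (m! (m+n)!).
    return sum((-1) ** m / (math.factorial(m) * math.factorial(m + n))
               * (z / 2) ** (2 * m + n) for m in range(terms))

z, n = 1.3, 2
lhs = n * J(n, z)
rhs = (z / 2) * (J(n - 1, z) + J(n + 1, z))
print(lhs, rhs)  # agree to machine precision
```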
Figure 15.2 The contour C′ in the w-plane used in evaluating J_ν(z).
We can start with Equation (15.13) and obtain other integral representations of Bessel functions by making appropriate substitutions. For instance, we can let u = e^w and assume that the circle of the contour C has unit radius. The contour C′ in the w-plane is determined as follows. Write u = re^{iθ} and w ≡ x + iy,¹ so that re^{iθ} = e^x e^{iy}, yielding r = e^x and e^{iθ} = e^{iy}. Along the first part of C, θ = −π and r goes from ∞ to 1. Thus, along the corresponding part of C′, y = −π and x goes from ∞ to 0. On the circular part of C, r = 1 and θ goes from −π to +π. Thus, along the corresponding part of C′, x = 0 and y goes from −π to +π. Finally, on the last part of C′, y = π and x goes from 0 to ∞. Therefore, the contour C′ in the w-plane is as shown in Figure 15.2.

Substituting u = e^w in Equation (15.13) yields

J_ν(z) = [1/(2πi)] ∫_{C′} e^{z sinh w − νw} dw,   Re(z) > 0,   (15.17)

which can be transformed into (see Problem 15.12)

J_ν(z) = (1/π) ∫_0^π cos(νθ − z sin θ) dθ − (sin νπ/π) ∫_0^∞ e^{−νt − z sinh t} dt.   (15.18)
integral representation of Bessel functions of integer order

For the special case of integer ν, we obtain

J_n(z) = (1/π) ∫_0^π cos(nθ − z sin θ) dθ.

In particular,

J_0(z) = (1/π) ∫_0^π cos(z sin θ) dθ.
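Bessel's integral for integer order can be compared directly with the series representation. The following sketch (an illustration added here, not part of the text) evaluates the integral with Simpson's rule:

```python
import math

def J_series(n, z, terms=40):
    # Series representation of J_n(z) from Chapter 14.
    return sum((-1) ** m / (math.factorial(m) * math.factorial(m + n))
               * (z / 2) ** (2 * m + n) for m in range(terms))

def J_integral(n, z, steps=2000):
    # Bessel's integral: J_n(z) = (1/pi) int_0^pi cos(n*th - z*sin th) dth.
    h = math.pi / steps
    f = lambda th: math.cos(n * th - z * math.sin(th))
    s = f(0.0) + f(math.pi)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(i * h)
    return s * h / (3 * math.pi)

print(J_series(3, 2.0), J_integral(3, 2.0))
```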
We can use the integral representation for J_ν(z) to find the integral representations for Bessel functions of other kinds. For instance, to obtain the integral

¹Do not confuse x and y with the real and imaginary parts of z.
Figure 15.3 The contour C″ in the w-plane used in evaluating H_ν^{(1)}(z).
representation for the Neumann function Y_ν(z), we use Equation (14.44):

Y_ν(z) = (cot νπ) J_ν(z) − (1/sin νπ) J_{−ν}(z)
  = (cot νπ/π) ∫_0^π cos(νθ − z sin θ) dθ − (cos νπ/π) ∫_0^∞ e^{−νt − z sinh t} dt
   − [1/(π sin νπ)] ∫_0^π cos(νθ + z sin θ) dθ − (1/π) ∫_0^∞ e^{νt − z sinh t} dt,

with Re(z) > 0. Substitute π − θ for θ in the third integral on the RHS. Then insert the resulting integrals plus Equation (15.18) in H_ν^{(1)}(z) = J_ν(z) + iY_ν(z) to obtain an expression for H_ν^{(1)}(z) as a sum of real integrals, valid for Re(z) > 0. These integrals can easily be shown to result from integrating along the contour C″ of Figure 15.3. Thus, we have

H_ν^{(1)}(z) = [1/(iπ)] ∫_{C″} e^{z sinh w − νw} dw,   Re(z) > 0.

By changing i to −i, we can show that

H_ν^{(2)}(z) = −[1/(iπ)] ∫_{C‴} e^{z sinh w − νw} dw,   Re(z) > 0,

where C‴ is the mirror image of C″ about the real axis.
15.4 Asymptotic Behavior of Bessel Functions

As mentioned before, integral representations are particularly useful for determining the asymptotic behavior of functions. For Bessel functions we can consider two kinds of limits. Assuming that both ν and z = x are real, we can consider ν → ∞ or x → ∞. First, let us consider the behavior of J_ν(x) of large order. The appropriate method for calculating the asymptotic form is the method of steepest descent discussed in Chapter 11, for which ν takes the place of the large parameter α. We use Equation (15.17) because its integrand is simpler than that of Equation (15.13). The form of the integrand in Equation (15.17) may suggest f(w) = −w and g(w) = e^{z sinh w}. However, this choice does not allow setting f′(w) equal to zero. To proceed, therefore, we write the exponent as ν[(x/ν) sinh w − w], and conveniently introduce x/ν ≡ 1/cosh w₀, with w₀ a real number, which we take to be positive. Substituting this in the equation above, we can read off

f(w) = sinh w/cosh w₀ − w,   g(w) = 1.

The saddle point is obtained from df/dw = 0, or cosh w = cosh w₀. Thus, w = ±w₀ + 2inπ, for n = 0, 1, 2, .... Since the contour C′ lies in the right half-plane, we choose w₀ as the saddle point. The second derivative f″(w₀) is simply tanh w₀, which is real, making θ₂ = 0, and θ₁ = π/2 or 3π/2. The convention of Chapter 11 suggests taking θ₁ = π/2 (see Figure 15.4). The rest is a matter of substitution. We are interested in the approximation to w up to the third order in t: w − w₀ = b₁t + b₂t² + b₃t³. Using Equations (11.31), (11.37), and (11.38), we can easily find the three coefficients:

b₁ = √2 e^{iθ₁}/|f″(w₀)|^{1/2} = i√2/√(tanh w₀),

b₂ = [f‴(w₀)/(3|f″(w₀)|²)] e^{4iθ₁} = cosh² w₀/(3 sinh² w₀),

b₃ = {5[f‴(w₀)]²/(3[f″(w₀)]²) − f⁗(w₀)/f″(w₀)} √2 e^{3iθ₁}/(12|f″(w₀)|^{3/2})
  = −i [√2/(12(tanh w₀)^{3/2})] [(5/3) coth² w₀ − 1].

If we substitute the above in Equation (11.36), we obtain the following asymptotic formula, valid for ν → ∞:

J_ν(x) ≈ [e^{x(sinh w₀ − w₀ cosh w₀)}/(2πx sinh w₀)^{1/2}] [1 + (1/(8x sinh w₀))(1 − (5/3) coth² w₀) + ···],

where ν is related to w₀ via ν = x cosh w₀.
Figure 15.4 The contour C₀ in the w-plane used in evaluating J_ν(z) for large values of ν.
Let us now consider the asymptotic behavior for large x. It is convenient to consider the Hankel functions H_ν^{(1)}(x) and H_ν^{(2)}(x). The contours C″ and C‴ involve both the positive and the negative real axis; therefore, it is convenient, assuming that x > ν, to write ν = x cos β, so that

H_ν^{(1)}(x) = [1/(iπ)] ∫_{C″} e^{x(sinh w − w cos β)} dw.

The saddle points are given by the solutions to cosh w = cos β, which are w₀ = ±iβ. Choosing w₀ = +iβ, we note that the contour along which

Im(sinh w − w cos β) = Im(sinh w₀ − w₀ cos β)

is given by cosh u = [sin β + (v − β) cos β]/sin v, where w = u + iv. This contour is shown in Figure 15.5. The rest of the procedure is exactly the same as for J_ν(x) described above. In fact, to obtain the expansion for H_ν^{(1)}(x), we simply replace w₀ by iβ. The result is

H_ν^{(1)}(x) ≈ (2/(iπx sin β))^{1/2} e^{i(x sin β − νβ)} [1 − (i/(8x sin β))(1 + (5/3) cot² β) + ···].

When x is much larger than ν, β will be close to π/2, and we have

H_ν^{(1)}(x) ≈ √(2/(πx)) e^{i(x − νπ/2 − π/4)} (1 − i/(8x)),

which, with 1/x → 0, is what we obtained in Example 11.5.2.
Figure 15.5 The contour in the w-plane used in evaluating H_ν^{(1)}(z) in the limit of large values of x.
The other saddle point, at −iβ, gives the other Hankel function, with the asymptotic limit

H_ν^{(2)}(x) ≈ √(2/(πx)) e^{−i(x − νπ/2 − π/4)} (1 + i/(8x)).

We can now use the expressions for the asymptotic forms of the two Hankel functions to write the asymptotic forms of J_ν(x) and Y_ν(x) for large x:

J_ν(x) = ½[H_ν^{(1)}(x) + H_ν^{(2)}(x)]
  ≈ √(2/(πx)) [cos(x − νπ/2 − π/4) + (1/(8x)) sin(x − νπ/2 − π/4) + ···],

Y_ν(x) = [1/(2i)][H_ν^{(1)}(x) − H_ν^{(2)}(x)]
  ≈ √(2/(πx)) [sin(x − νπ/2 − π/4) − (1/(8x)) cos(x − νπ/2 − π/4) + ···].
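The quality of these large-x formulas is easy to probe numerically. The sketch below (an illustration added here, not part of the text) computes J₀(50) from its integral representation and compares it with the ν = 0 asymptotic form; the discrepancy is of the neglected 1/x² order.

```python
import math

def J0(z, steps=40000):
    # J_0(z) = (1/pi) int_0^pi cos(z sin th) d th, by Simpson's rule;
    # many steps are used because the integrand oscillates rapidly.
    h = math.pi / steps
    f = lambda th: math.cos(z * math.sin(th))
    s = f(0.0) + f(math.pi)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(i * h)
    return s * h / (3 * math.pi)

x = 50.0
phase = x - math.pi / 4  # nu = 0, so the phase is x - pi/4
asym = math.sqrt(2 / (math.pi * x)) * (
    math.cos(phase) + math.sin(phase) / (8 * x))
print(J0(x), asym)
```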
15.5 Problems
15.1. Use the change of variables k = ln t and ix = w − α (where k and x are the common variables used in Fourier transform equations) to show that the Fourier transform changes into a Mellin transform,

G(t) = [1/(2πi)] ∫_{α−i∞}^{α+i∞} F(w) t^{−w} dw.
15.2. The Laplace transform L[f] of a function f(t) is defined as

L[f](s) ≡ ∫_0^∞ e^{−st} f(t) dt.

Show that the Laplace transform of
(a) f(t) = 1 is 1/s, where s > 0.
(b) f(t) = cosh ωt is s/(s² − ω²), where s² > ω².
(c) f(t) = sinh ωt is ω/(s² − ω²), where s² > ω².
(d) f(t) = cos ωt is s/(s² + ω²), where s > 0.
(e) f(t) = sin ωt is ω/(s² + ω²), where s > 0.
(f) f(t) = e^{ωt} for t > 0 is 1/(s − ω), where s > ω.
(g) f(t) = t^n is Γ(n + 1)/s^{n+1}, where s > 0, n > −1.

15.3. Evaluate the integral

f(t) = ∫_0^∞ (sin ωt/ω) dω

by finding the Laplace transform and changing the order of integration. Express the result for both t > 0 and t < 0 in terms of the theta function. (You will need some results from Problem 15.2.)
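Entries such as (d) and (g) of Problem 15.2 can be checked by truncating the Laplace integral at a finite upper limit. The sketch below is an illustration added here (not part of the text); for s around 1 the neglected tail is exponentially small.

```python
import math

def laplace(f, s, T=60.0, steps=60000):
    # Truncated numerical Laplace transform int_0^T e^{-s t} f(t) dt,
    # evaluated with composite Simpson's rule (steps must be even).
    h = T / steps
    g = lambda t: math.exp(-s * t) * f(t)
    total = g(0.0) + g(T)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

s, w = 1.2, 2.0
print(laplace(lambda t: math.cos(w * t), s), s / (s**2 + w**2))   # part (d)
print(laplace(lambda t: t**1.5, s), math.gamma(2.5) / s**2.5)     # part (g)
```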
15.4. Show that the Laplace transform of the derivative of a function is given by L[F′](s) = sL[F](s) − F(0). Similarly, show that for the second derivative the transform is

L[F″](s) = s²L[F](s) − sF(0) − F′(0).

Use these results to solve the differential equation u″(t) + ω²u(t) = 0 subject to the boundary conditions u(0) = a, u′(0) = 0.
15.5. Solve the DE of Equation (15.5).
15.6. Calculate the surface term for the hypergeometric DE.
15.7. Determine the constant C′ in Equation (15.6), the solution to the hypergeometric DE. Hint: Expand (t − z)^{−α} inside the integral, use Equations (11.19) and (11.17), and compare the ensuing series with the hypergeometric series of Chapter 14.
15.8. Derive the Euler formula [Equation (15.7)].
15.9. Show that

F(α, β; γ; 1) = Γ(γ)Γ(γ − α − β)/[Γ(γ − α)Γ(γ − β)].   (15.19)

Hint: Use Equation (11.19). Equation (15.19) was obtained by Gauss using only hypergeometric series.
15.10. We determine the asymptotic behavior of Φ(α, γ; z) for z → ∞ in this problem. Break up the integral in Equation (15.11) into two parts, one from 0 to −∞ and the other from −∞ to 1. Substitute −t/z for t in the first integral, and 1 − t/z for t in the second. Assuming that z → ∞ along the positive real axis, show that the second integral will dominate, and that

Φ(α, γ; z) → [Γ(γ)/Γ(α)] z^{α−γ} e^z   as z → ∞.
15.11. In this problem, we determine the constant k of Equation (15.12).
(a) Write the contour integral of Equation (15.12) for each of the three pieces of the contour. Note that arg(t) = −π as t comes from −∞ and arg(t) = π as t goes to −∞. Obtain a real integral from 0 to ∞.
(b) Use the relation Γ(z)Γ(1 − z) = π/sin πz, obtained in Chapter 11, to show that

Γ(−z) = −π/[Γ(z + 1) sin πz].

(c) Expand the function exp(−z²/(4t)) in the integral of part (a), and show that the contour integral reduces to

−2i sin νπ Σ_{n=0}^∞ (z/2)^{2n} Γ(−n − ν)/Γ(n + 1).

(d) Use the result of part (c) in part (b), and compare the result with the series expansion of J_ν(z) in Chapter 14 to arrive finally at k = 1/(2πi).
15.12. By integrating along C₁, C₂, C₃, and C₄ of Figure 15.2, derive Equation (15.18).
15.13. By substituting t = exp(iθ) in Equation (15.14), show that

e^{iz sin θ} = J₀(z) + 2 Σ_{n=1}^∞ J_{2n}(z) cos(2nθ) + 2i Σ_{n=0}^∞ J_{2n+1}(z) sin[(2n + 1)θ].

In particular, show that

J₀(z) = (1/2π) ∫_0^{2π} e^{iz sin θ} dθ.
15.14. Derive the integral representations of H_ν^{(1)}(x) and H_ν^{(2)}(x) given in Section 15.3.
Additional Reading
1. Dennery, P. and Krzywicki, A. Mathematics for Physicists, Harper and Row, 1967.
2. Mathews, J. and Walker, R. Mathematical Methods of Physics, 2nd ed., Benjamin, 1970.
Part V
Operators on Hilbert Spaces
16
An Introduction to Operator Theory
The first two parts of the book dealt almost exclusively with algebraic techniques. The third and fourth parts were devoted to analytic methods. In this introductory chapter, we shall try to unite these two branches of mathematics to gain insight into the nature of some of the important equations in physics and their solutions. Let us start with a familiar problem.
16.1 From Abstract to Integral and Differential Operators
Let's say we want to solve an abstract vector-operator equation A|u⟩ = |v⟩ in an N-dimensional vector space V. To this end, we select a basis B for V, write the equation in matrix form, and solve the resulting system of N linear equations. This produces the components of the solution |u⟩ in B. If components in another basis B′ are desired, they can be obtained using the similarity transformation connecting the two bases (see Chapter 3).

There is a standard formal procedure for obtaining the matrix equation. It is convenient to choose an orthonormal basis B = {|e_i⟩}_{i=1}^N for V and refer all components to this basis. The procedure involves contracting both sides of the equation with ⟨e_i| and inserting 1 = Σ_{j=1}^N |e_j⟩⟨e_j| between A and |u⟩: Σ_{j=1}^N ⟨e_i|A|e_j⟩⟨e_j|u⟩ = ⟨e_i|v⟩ for i = 1, 2, ..., N, or

Σ_{j=1}^N A_{ij} u_j = v_i   for i = 1, 2, ..., N,   (16.1)
where A_{ij} ≡ ⟨e_i|A|e_j⟩, u_j ≡ ⟨e_j|u⟩, and v_i ≡ ⟨e_i|v⟩. Equation (16.1) is a system of N linear equations in the N unknowns {u_j}_{j=1}^N, which can be solved to obtain the solution(s) of the original equation in B.

A convenient basis is that in which A is represented by a diagonal matrix diag(λ₁, λ₂, ..., λ_N). Then the operator equation takes the simple form λ_i u_i = v_i, and the solution becomes immediate.
Let us now apply the procedure just described to infinite-dimensional vector spaces, in particular to the case of a continuous index. We want to find the solutions of K|u⟩ = |f⟩. Following the procedure used above, we obtain

⟨x| K (∫_a^b |y⟩ w(y) ⟨y| dy) |u⟩ = ∫_a^b ⟨x|K|y⟩ w(y) ⟨y|u⟩ dy = ⟨x|f⟩,

where we have used the results obtained in Chapter 6. Writing this in functional notation, we have

∫_a^b K(x, y) w(y) u(y) dy = f(x),   (16.2)

integral operators and kernels

which is the continuous analogue of Equation (16.1). Here (a, b) is the interval on which the functions are defined. We note that the indices have turned into continuous arguments, and the sum has turned into an integral. The operator K that leads to an equation such as (16.2) is called an integral operator (IO), and the "matrix element" K(x, y) is said to be its kernel.
The discussion of the discrete case mentioned the possibility of the operator A being diagonal in the given basis B. Let us do the same with (16.2); that is, noting that x and y are indices for K, let us assume that K(x, y) = 0 for x ≠ y. Such operators are called local operators. For local operators, the contribution to the integral comes only at the point where x = y (hence their name). If K(x, y) is finite at this point, and the functions w(y) and u(y) are well behaved there, the LHS of (16.2) will vanish, and we will get inconsistencies. To avoid this, we need to have

local operators

K(x, y) = { 0 if x ≠ y,  ∞ if x = y.

Thus, K(x, y) has the behavior of a delta function. Letting K(x, y) ≡ L(x)δ(x − y)/w(x) and substituting in Equation (16.2) yields L(x)u(x) = f(x).
In the discrete case, λ_i was merely an indexed number; its continuous analogue, L(x), may represent merely a function. However, the fact that x is a continuous variable (index) gives rise to other possibilities for L(x) that do not exist for the discrete case. For instance, L(x) could be a differential operator. The derivative, although defined by a limiting process involving neighboring points, is a local operator. Thus, we can speak of the derivative of a function at a point. For the discrete case, u_i can only "hop" from i to i + 1 and then back to i. Such a difference (as opposed to differential) process is not local; it involves not only i but also i + 1. The "point" i does not have an (infinitesimally close) neighbor.

This essential difference between discrete and continuous operators makes the latter far richer in possibilities for applications. In particular, if L(x) is considered a differential operator, the equation L(x)u(x) = f(x) leads directly to the fruitful area of differential equation theory.
16.2 Bounded Operators in Hilbert Spaces
The concept of an operator on a Hilbert space is extremely subtle. Even the elementary characteristics of operators, such as the operation of hermitian conjugation, cannot generally be defined on the whole Hilbert space.

In finite-dimensional vector spaces there is a one-to-one correspondence between operators and matrices. So, in some sense, the study of operators reduces to a study of matrices, which are collections of real or complex numbers. Although we have already noted an analogy between matrices and kernels, a whole new realm of questions arises when A_{ij} is replaced by K(x, y): questions about the continuity of K(x, y) in both its arguments, about the limit of K(x, y) as x and/or y approach the "end points" of the interval on which K is defined, about the boundedness and "compactness" of K, and so on. Such subtleties are not unexpected. After all, when we tried to generalize concepts of finite-dimensional vector spaces to infinite dimensions in Chapter 5, we encountered difficulties. There we were concerned about vectors only; the generalization of operators is even more complicated.
right-shift operator

16.2.1. Example. Recall that C^∞ is the set of sequences |a⟩ = {α_i}_{i=1}^∞, or of ∞-tuples (α₁, α₂, ...), that satisfy the convergence requirement Σ_{i=1}^∞ |α_i|² < ∞ (see Example 1.1.2). It is a Hilbert space with inner product defined by ⟨a|b⟩ = Σ_{i=1}^∞ α_i* β_i. The standard (orthonormal) basis for C^∞ is {|e_i⟩}_{i=1}^∞, where |e_i⟩ has all components equal to zero except the ith one, which is 1. Then one has |a⟩ = Σ_{i=1}^∞ α_i |e_i⟩.

One can introduce an operator T_r, called the right-shift operator, by

T_r|a⟩ = T_r(Σ_{j=1}^∞ α_j |e_j⟩) = Σ_{j=1}^∞ α_j |e_{j+1}⟩.

In other words, T_r transforms (α₁, α₂, ...) to (0, α₁, α₂, ...). It is straightforward to show that T_r is indeed a linear operator.
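On truncated sequences the right-shift operator is easy to exhibit concretely (an illustrative sketch added here, not part of the text). Note, for the discussion of boundedness below, that T_r preserves the norm:

```python
def T_r(a):
    # Right-shift on a (truncated) square-summable sequence:
    # (a1, a2, a3, ...) -> (0, a1, a2, a3, ...).
    return [0.0] + list(a)

a = [1.0, 0.5, 0.25]
b = T_r(a)
print(b)  # [0.0, 1.0, 0.5, 0.25]
# T_r preserves the l^2 norm, so it is bounded with operator norm 1.
print(sum(x * x for x in a), sum(x * x for x in b))
```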
The first step in our study of vector spaces of infinite dimensions was getting a handle on the convergence of infinite sums. This entailed defining a norm for vectors and a distance between them. In addition, we noted that the set of linear transformations L(V, W) was a vector space in its own right. Since operators are "vectors" in this space, the study of operators requires constructing a norm in L(V, W) when V and W are infinite-dimensional.
16.2.2. Definition. Let H₁ and H₂ be two Hilbert spaces with norms ‖·‖₁ and ‖·‖₂. For any T ∈ L(H₁, H₂), the number

max{ ‖Tx‖₂/‖x‖₁ : |x⟩ ≠ 0 }

operator norm

(if it exists) is called¹ the operator norm of T and is denoted by ‖T‖. A linear transformation whose norm is finite is called a bounded linear transformation. A bounded linear transformation from a Hilbert space to itself is called a bounded operator. The collection of all bounded linear transformations, which is a subset of L(H₁, H₂), will be denoted by B(H₁, H₂), and if H₁ = H₂ ≡ H, it will be denoted by B(H).

bounded operator
Note that ‖·‖₁ and ‖·‖₂ are the norms induced by the inner products of H₁ and H₂. Also note that by dividing by ‖x‖₁ we eliminate the possibility of dilating the norm ‖T‖ by choosing a "long" vector. By restricting the length of |x⟩, one can eliminate the necessity of dividing by the length. In fact, the norm can equivalently be defined as

‖T‖ = max{ ‖Tx‖₂ : ‖x‖₁ = 1 } = max{ ‖Tx‖₂ : ‖x‖₁ ≤ 1 }.   (16.3)

It is straightforward to show that the three definitions are equivalent and that they indeed define a norm.
16.2.3. Proposition. An operator T is bounded if and only if it maps vectors of finite norm to vectors of finite norm.

Proof. Clearly, if T is bounded, then ‖Tx‖₂ is finite. Conversely, if ‖Tx‖₂ is finite for all |x⟩ (of unit length), max{‖Tx‖₂ : ‖x‖₁ = 1} is also finite, and T is bounded. □

An immediate consequence of the definition is

‖Tx‖₂ ≤ ‖T‖ ‖x‖₁   for all |x⟩ ∈ H₁.   (16.4)

If we choose |x⟩ − |y⟩ instead of |x⟩, it follows from (16.4) that as |x⟩ approaches |y⟩, T|x⟩ approaches T|y⟩. This is the property that characterizes continuous functions:

bounded operators are continuous

16.2.4. Proposition. The bounded operator T ∈ B(H₁, H₂) is a continuous function from H₁ to H₂.
Another consequence of the definition is that
¹The precise definition uses "supremum" instead of "maximum." Rather than spending a lot of effort explaining the difference between the two concepts, we use the less precise, but more intuitively familiar, concept of "maximum."
16.2.5. Box. $\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$ is a vector subspace of $\mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$, and for $\mathcal{H}_1 = \mathcal{H}_2 = \mathcal{H}$, we have $\mathbf{1} \in \mathcal{B}(\mathcal{H})$ and $\|\mathbf{1}\| = 1$.
16.2.6. Example. We have seen that in an inner product space, one can associate a linear operator (linear functional) to every vector. Thus, associated with the vector $|x\rangle$ in a Hilbert space $\mathcal{H}$ is the linear operator $f_x : \mathcal{H} \to \mathbb{C}$ defined by $f_x(|y\rangle) \equiv \langle x| y\rangle$. We want to compare the operator norm of $f_x$ with the norm of $|x\rangle$. First note that by using the Schwarz inequality, we get
$$\|f_x\| = \max\left\{ \frac{|f_x(|y\rangle)|}{\|y\|} \;\Big|\; |y\rangle \neq 0 \right\} = \max\left\{ \frac{|\langle x| y\rangle|}{\|y\|} \;\Big|\; |y\rangle \neq 0 \right\} \le \|x\|.$$
On the other hand, from $\|x\|^2 = f_x(|x\rangle)$, we obtain
$$\|x\| = \frac{|f_x(|x\rangle)|}{\|x\|} \le \max\left\{ \frac{|f_x(|y\rangle)|}{\|y\|} \;\Big|\; |y\rangle \neq 0 \right\} = \|f_x\|.$$
These two inequalities imply that $\|f_x\| = \|x\|$. ∎
derivative operator is unbounded
16.2.7. Example. The derivative operator $D = d/dx$ is not a bounded operator on the Hilbert space² $\mathcal{L}^2(a, b)$ of square-integrable functions. With a function like $f(x) = \sqrt{x - a}$, one gets
$$\|f\| = \sqrt{\int_a^b (x - a)\, dx} = \frac{b - a}{\sqrt{2}},$$
while $df/dx = 1/(2\sqrt{x - a})$ gives $\|Df\|^2 = \frac{1}{4}\int_a^b dx/(x - a) = \infty$. We conclude that $\|D\| = \infty$. ∎
norm of a product is less than the product of norms
16.2.8. Example. Since $\mathcal{B}(\mathcal{H})$ is an algebra as well as a vector space, one may be interested in the relation between the product of operators and their norms. More specifically, one may want to know how $\|ST\|$ is related to $\|S\|$ and $\|T\|$. In this example we show that $\|ST\| \le \|S\|\,\|T\|$.
To do so, we use the definition of operator norm for the product $ST$:
$$\|ST\| = \max\left\{ \frac{\|STx\|}{\|x\|} \;\Big|\; |x\rangle \neq 0 \right\} = \max\left\{ \frac{\|STx\|}{\|Tx\|}\,\frac{\|Tx\|}{\|x\|} \;\Big|\; |x\rangle \neq 0 \neq T|x\rangle \right\}$$
$$\le \max\left\{ \frac{\|S(T|x\rangle)\|}{\|Tx\|} \;\Big|\; T|x\rangle \neq 0 \right\} \underbrace{\max\left\{ \frac{\|Tx\|}{\|x\|} \;\Big|\; |x\rangle \neq 0 \right\}}_{=\,\|T\|}. \tag{16.5}$$
²Here the two Hilbert spaces coincide, so that the derivative operator acts on a single Hilbert space.
Now note that the first term on the RHS does not scan all the vectors for maximality: It scans only the vectors in the image of $T$. If we include all vectors, we may obtain a larger number. Therefore,
$$\max\left\{ \frac{\|S(T|x\rangle)\|}{\|Tx\|} \;\Big|\; T|x\rangle \neq 0 \right\} \le \max\left\{ \frac{\|Sx\|}{\|x\|} \;\Big|\; |x\rangle \neq 0 \right\} = \|S\|,$$
and the desired inequality is established. A useful consequence of this result is $\|T^n\| \le \|T\|^n$, which we shall use frequently. ∎
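For matrices, both the submultiplicative inequality $\|ST\| \le \|S\|\,\|T\|$ and its consequence $\|T^n\| \le \|T\|^n$ can be checked directly. The random matrices below are illustrative stand-ins for the operators $S$ and $T$:

```python
import numpy as np

# Random matrices standing in for the bounded operators S and T.
rng = np.random.default_rng(1)
S = rng.standard_normal((4, 4))
T = rng.standard_normal((4, 4))

def op_norm(A):
    """Operator norm of a matrix: its largest singular value."""
    return np.linalg.norm(A, 2)

# ||ST|| <= ||S|| ||T||
lhs = op_norm(S @ T)
rhs = op_norm(S) * op_norm(T)
assert lhs <= rhs + 1e-12

# The consequence ||T^n|| <= ||T||^n used frequently later.
for n in range(1, 6):
    assert op_norm(np.linalg.matrix_power(T, n)) <= op_norm(T)**n + 1e-9
```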
We can put Equation (16.5) to immediate good use.
16.2.9. Proposition. Let $\mathcal{H}$ be a Hilbert space and $T \in \mathcal{B}(\mathcal{H})$. If $\|T\| < 1$, then $\mathbf{1} - T$ is invertible and $(\mathbf{1} - T)^{-1} = \sum_{n=0}^{\infty} T^n$.
Proof. First note that the series converges, because
$$\left\| \sum_{n=0}^{\infty} T^n \right\| \le \sum_{n=0}^{\infty} \|T^n\| \le \sum_{n=0}^{\infty} \|T\|^n = \frac{1}{1 - \|T\|},$$
and the sum has a finite norm. Furthermore,
$$(\mathbf{1} - T)\sum_{n=0}^{\infty} T^n = (\mathbf{1} - T)\left( \lim_{k\to\infty} \sum_{n=0}^{k} T^n \right) = \lim_{k\to\infty} (\mathbf{1} - T)\sum_{n=0}^{k} T^n = \lim_{k\to\infty} \left( \sum_{n=0}^{k} T^n - \sum_{n=0}^{k} T^{n+1} \right) = \lim_{k\to\infty} (\mathbf{1} - T^{k+1}) = \mathbf{1},$$
because $0 \le \lim_{k\to\infty} \|T^{k+1}\| \le \lim_{k\to\infty} \|T\|^{k+1} = 0$ for $\|T\| < 1$, and the vanishing of the norm implies the vanishing of the operator itself. One can similarly show that $\left(\sum_{n=0}^{\infty} T^n\right)(\mathbf{1} - T) = \mathbf{1}$. □
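In finite dimensions the geometric (Neumann) series of the proposition can be summed explicitly and compared with the matrix inverse. The matrix, its scaling, and the number of terms below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
T = 0.5 * A / np.linalg.norm(A, 2)   # scale so that ||T|| = 0.5 < 1

# Partial sums of the series  1 + T + T^2 + ...
partial = np.zeros_like(T)
power = np.eye(5)
for _ in range(200):
    partial = partial + power
    power = power @ T

# Compare with the inverse of (1 - T) computed directly.
inverse = np.linalg.inv(np.eye(5) - T)
err = np.linalg.norm(partial - inverse, 2)
assert err < 1e-12
```

The remainder after $k$ terms is bounded by $\|T\|^{k+1}/(1-\|T\|)$, so with $\|T\| = 1/2$ the truncation error is negligible long before 200 terms.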
A corollary of this proposition is that operators that are "close enough" to an
invertible operator are invertible (see Problem 16.1). Another corollary, whose
proof is left as a straightforward exercise, is the following:
16.2.10. Corollary. Let $T \in \mathcal{B}(\mathcal{H})$ and $\lambda$ a complex number such that $\|T\| < |\lambda|$. Then $T - \lambda\mathbf{1}$ is an invertible operator, and
$$(T - \lambda\mathbf{1})^{-1} = -\frac{1}{\lambda} \sum_{n=0}^{\infty} \left( \frac{T}{\lambda} \right)^n.$$
Adjoints play an important role in the study of operators. We recall that the adjoint of $T$ is defined by $\langle y| T |x\rangle^* = \langle x| T^\dagger |y\rangle$, or $\langle Tx| y\rangle = \langle x| T^\dagger y\rangle$. In the finite-dimensional case, we could calculate the matrix representation of the adjoint in a particular basis using this definition and generalize to all bases by similarity transformations. That is why we never raised the question of the existence of the adjoint of an operator. In the infinite-dimensional case, one must prove such an existence. We state the following theorem without proof:
T and T† have equal norms
16.2.11. Theorem. Let $T \in \mathcal{B}(\mathcal{H})$. Then the adjoint of $T$, defined by $\langle Tx| y\rangle = \langle x| T^\dagger y\rangle$, exists. Furthermore, $\|T\| = \|T^\dagger\|$.
Another useful theorem that we shall use later is the following.
16.2.12. Theorem. Let $\mathcal{N}(T)$ and $\mathcal{R}(T)$ denote the null space (kernel) and the range of $T \in \mathcal{B}(\mathcal{H})$. We have
$$\mathcal{N}(T^\dagger) = \mathcal{R}(T)^\perp \qquad \text{and} \qquad \mathcal{N}(T) = \mathcal{R}(T^\dagger)^\perp.$$
Proof. $|x\rangle$ is in $\mathcal{N}(T^\dagger)$ iff $T^\dagger|x\rangle = 0$ iff $\langle y| T^\dagger x\rangle = 0$ for all $|y\rangle \in \mathcal{H}$. This holds if and only if $\langle Ty| x\rangle = 0$ for all $|y\rangle \in \mathcal{H}$. This is equivalent to the statement that $|x\rangle$ is in $\mathcal{R}(T)^\perp$. This chain of argument proves that $\mathcal{N}(T^\dagger) = \mathcal{R}(T)^\perp$. The second part of the theorem follows from the fact that $(T^\dagger)^\dagger = T$. □
16.3 Spectra of Linear Operators
regular point of an operator
resolvent set and spectrum of an operator
every eigenvalue of an operator on a vector space of finite dimension is in its spectrum and vice versa
One of the most important results of the theory of finite-dimensional vector spaces is the spectral decomposition theorem developed in Chapter 4. The infinite-dimensional analogue of that theorem is far more encompassing and difficult to prove. It is beyond the scope of this book to develop all the machinery needed for a thorough discussion of the infinite-dimensional spectral theory. Instead, we shall present the central results, and occasionally introduce the reader to the peripheral arguments when they seem to have their own merits.
16.3.1. Definition. Let $T \in \mathcal{L}(\mathcal{H})$. A complex number $\lambda$ is called a regular point of $T$ if the operator $T - \lambda\mathbf{1}$ is bounded and invertible.³ The set of all regular points of $T$ is called the resolvent set of $T$, and is denoted by $\rho(T)$. The complement of $\rho(T)$ in the complex plane is called the spectrum of $T$ and is denoted by $\sigma(T)$.
Corollary 16.2.10 implies⁴ that if $T$ is bounded, then $\rho(T)$ is not empty, and that the spectrum of a bounded linear operator on a Hilbert space is a bounded set. In fact, an immediate consequence of the corollary is that $|\lambda| \le \|T\|$ for all $\lambda \in \sigma(T)$.
It is instructive to contrast the finite-dimensional case against the implications of the above definition. Recall that because of the dimension theorem, a linear operator on a finite-dimensional vector space $V$ is invertible if and only if it is either onto or one-to-one. Now, $\lambda \in \sigma(T)$ if and only if $T - \lambda\mathbf{1}$ is not invertible. For finite dimensions, this implies that⁵ $\ker(T - \lambda\mathbf{1}) \neq 0$. Thus, in finite dimensions,
³If $T$ is bounded, then $T - \lambda\mathbf{1}$ is automatically bounded.
⁴One can simply choose a $\lambda$ whose absolute value is greater than $\|T\|$.
⁵Note how critical finite-dimensionality is for this implication. In infinite dimensions, an operator can be one-to-one (thus having a zero kernel) without being onto.
not all points of σ(T) are eigenvalues
$\lambda \in \sigma(T)$ if and only if there is a vector $|a\rangle$ in $V$ such that $(T - \lambda\mathbf{1})|a\rangle = 0$. This is the combined definition of eigenvalue and eigenvector, and is the definition we will have to use to define eigenvalues in infinite dimensions. It follows that in the finite-dimensional case, $\sigma(T)$ coincides with the set of all eigenvalues of $T$. This is not true for infinite dimensions, as the following example shows.
16.3.2. Example. Consider the right-shift operator $T_r$ acting on $\mathbb{C}^\infty$. It is easy to see that $\|T_r a\| = \|a\|$ for all $|a\rangle$. This yields $\|T_r\| = 1$, so that any $\lambda$ that belongs to $\sigma(T_r)$ must be such that $|\lambda| \le 1$. We now show that the converse is also true, i.e., that if $|\lambda| \le 1$, then $\lambda \in \sigma(T_r)$. It is sufficient to show that if $0 < |\lambda| \le 1$, then $T_r - \lambda\mathbf{1}$ is not invertible. To establish this, we shall show that $T_r - \lambda\mathbf{1}$ is not onto.
Suppose that $T_r - \lambda\mathbf{1}$ is onto. Then there must be a vector $|a\rangle$ such that $(T_r - \lambda\mathbf{1})|a\rangle = |e_1\rangle$, where $|e_1\rangle$ is the first standard basis vector of $\mathbb{C}^\infty$. Equating components on both sides yields the recursion relations $a_1 = -1/\lambda$, and $a_{j-1} = \lambda a_j$ for all $j \ge 2$. One can readily solve this recursion relation to obtain $a_j = -1/\lambda^j$ for all $j$. This is a contradiction, because
$$\sum_{j=1}^{\infty} |a_j|^2 = \sum_{j=1}^{\infty} \frac{1}{|\lambda|^{2j}}$$
will not converge if $0 < |\lambda| \le 1$, i.e., $|a\rangle \notin \mathbb{C}^\infty$, and therefore $T_r - \lambda\mathbf{1}$ is not onto. We conclude that $\sigma(T_r) = \{\lambda \in \mathbb{C} \mid |\lambda| \le 1\}$. If we could generalize the result of the finite-dimensional case to $\mathbb{C}^\infty$, we would conclude that all complex numbers whose magnitude is at most 1 are eigenvalues of $T_r$. Quite to our surprise, the following argument shows that $T_r$ has no eigenvalues at all!
Suppose that $\lambda$ is an eigenvalue of $T_r$. Let $|a\rangle$ be any eigenvector for $\lambda$. Since $T_r$ preserves the length of a vector, we have $\langle a| a\rangle = \langle T_r a| T_r a\rangle = \langle \lambda a| \lambda a\rangle = |\lambda|^2 \langle a| a\rangle$. It follows that $|\lambda| = 1$. Now write $|a\rangle = \{a_j\}_{j=1}^{\infty}$ and let $a_m$ be the first nonzero term of this sequence. Then $0 = \langle T_r a| e_m\rangle = \langle \lambda a| e_m\rangle = \lambda^* a_m^*$. The first equality comes about because $T_r|a\rangle$ has its first nonzero term in the $(m+1)$st position. Since $\lambda \neq 0$, we must have $a_m = 0$, which contradicts the choice of this number. ∎
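The recursion in the example can be traced numerically on a truncated copy of $\mathbb{C}^\infty$. In the sketch below, the truncation size N and the sample value $\lambda = 0.8$ are arbitrary choices for illustration; the recursion does solve the truncated system, but its solution visibly fails to be square-summable:

```python
import numpy as np

lam = 0.8   # a sample point with 0 < |lam| <= 1
N = 50      # truncation of the infinite-dimensional space

# Solve (T_r - lam*1)|a> = |e_1> component by component, as in the text:
# a_1 = -1/lam and a_j = a_{j-1}/lam for j >= 2, i.e., a_j = -1/lam**j.
a = np.empty(N)
a[0] = -1.0 / lam
for j in range(1, N):
    a[j] = a[j - 1] / lam

# The truncated right-shift matrix has 1s on the subdiagonal.
Tr = np.diag(np.ones(N - 1), -1)
e1 = np.zeros(N)
e1[0] = 1.0
resid = (Tr - lam * np.eye(N)) @ a
assert np.allclose(resid, e1)        # the recursion solves the truncated system

# But |a_j| = |lam|**(-j) >= 1 for every j, so the partial sums of
# |a_j|**2 grow without bound: the would-be preimage is not in C^infinity.
partial_sums = np.cumsum(np.abs(a)**2)
assert np.all(np.abs(a) >= 1.0)
assert partial_sums[-1] > 50.0
```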
16.4 Compact Sets
This section deals with some technical concepts, and as such will be rather formal. The central concept of this section is compactness. Although we shall be using compactness sparingly in the sequel, the notion has sufficient application in higher analysis and algebra that it warrants an introductory exposure.
Let us start with the familiar case of the real line, and the intuitive notion of "compactness." Clearly, we do not want to call the entire real line "compact," because intuitively, it is not. The next candidate seems to be a "finite" interval. So, first consider the open interval $(a, b)$. Can we call it compact? Intuition says "yes," but the following argument shows that it would not be appropriate to call the open interval compact.
Consider the map $\theta : \mathbb{R} \to (a, b)$ given by $\theta(t) = \frac{b - a}{2}\tanh t + \frac{b + a}{2}$. The reader may check that this map is continuous and bijective. Thus, we can continuously map all of $\mathbb{R}$ in a one-to-one manner onto $(a, b)$. This makes $(a, b)$ "look" very much⁶ like $\mathbb{R}$. How can we modify the interval to make it compact? We do not want to alter its finiteness. So, the obvious thing to do is to add the end points. Thus, the interval $[a, b]$ seems to be a good candidate; and indeed it is.
The next step is to generalize the notion of a closed, finite interval and even-
tually come up with a definition that can be applied to all spaces. First we need
some terminology.
open ball
16.4.1. Definition. An open ball $B_r(x)$ of radius $r$ and center $|x\rangle$ in a normed vector space $V$ is the set of all vectors in $V$ whose distance from $|x\rangle$ is strictly less than $r$:
$$B_r(x) \equiv \{ |y\rangle \in V \mid \|y - x\| < r \}.$$
open round neighborhood
We call $B_r(x)$ an open round neighborhood of $|x\rangle$. This is a generalization of the open interval because
$$(a, b) = \left\{ y \in \mathbb{R} \;\Big|\; \left| y - \frac{a + b}{2} \right| < \frac{b - a}{2} \right\}.$$
16.4.2. Example. A prototype of finite-dimensional normed spaces is $\mathbb{R}^n$. An open ball of radius $r$ centered at $x$ is
$$B_r(x) = \{ y \in \mathbb{R}^n \mid (y_1 - x_1)^2 + (y_2 - x_2)^2 + \cdots + (y_n - x_n)^2 < r^2 \}.$$
Thus, all points inside a circle form an open ball in the $xy$-plane, and all interior points of a solid sphere form an open ball in space. ∎
bounded subset
16.4.3. Definition. A bounded subset of a normed vector space is a subset that can be enclosed in an open ball of finite radius.
For example, any region drawn on a piece of paper is a bounded subset of $\mathbb{R}^2$, and any "visible" part of our environment is a bounded subset of $\mathbb{R}^3$, because we can always find a big enough circle or sphere to enclose these subsets.
open subset
boundary point
closed subset and closure
16.4.4. Definition. A subset $\mathcal{O}$ of a normed vector space $V$ is called open if each of its points (vectors) has an open round neighborhood lying entirely in $\mathcal{O}$. A boundary point of $\mathcal{O}$ is a point (vector) in $V$ all of whose open round neighborhoods contain points inside and outside $\mathcal{O}$. A closed subset $\mathcal{C}$ of $V$ is a subset that contains all of its boundary points. The closure of a subset $S$ is the union of $S$ and all of its boundary points, and is denoted by $\bar{S}$.
For example, the boundary of a region drawn on paper consists of all its boundary points. A curve drawn on paper has nothing but boundary points. Every point is also its own boundary. A boundary is always a closed set. In particular, a point is a closed set. In general, an open set cannot contain any boundary points. A frequently used property of a closed set $\mathcal{C}$ is that a convergent sequence of points of $\mathcal{C}$ converges to a point in $\mathcal{C}$.
⁶In mathematical jargon one says that $(a, b)$ and $\mathbb{R}$ are homeomorphic.
dense subset
rational numbers are dense in the real numbers
ρ(T) is open, and σ(T) is closed and bounded in ℂ
16.4.5. Definition. A subset $\mathcal{W}$ of a normed vector space $V$ is dense in $V$ if the closure of $\mathcal{W}$ is the entire space $V$. Equivalently, $\mathcal{W}$ is dense if each vector in $V$ is infinitesimally close to at least one vector in $\mathcal{W}$. In other words, given any $|u\rangle \in V$ and any $\epsilon > 0$, there is a $|w\rangle \in \mathcal{W}$ such that $\|u - w\| < \epsilon$; i.e., any vector in $V$ can be approximated, with arbitrary accuracy, by a vector in $\mathcal{W}$.
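As a concrete instance of this definition, with $\mathcal{W} = \mathbb{Q}$ and $V = \mathbb{R}$, truncating a decimal expansion produces a rational number within any requested $\epsilon$. The sample number and accuracy below are arbitrary illustrative values:

```python
from fractions import Fraction
import math

x = math.pi      # an arbitrary real number (to machine precision)
eps = 1e-6       # the requested accuracy

# Keeping k decimal digits yields a rational within 10**(-k) of x,
# so for any eps we can pick k with 10**(-k) < eps.
k = 7
w = Fraction(int(x * 10**k), 10**k)
assert abs(float(w) - x) < eps
assert abs(float(w) - x) <= 10.0**(-k)
```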
A paradigm of dense spaces is the set of rational numbers in the normed vector space of real numbers. It is a well-known fact that any real number can be approximated by a rational number with arbitrary accuracy: The decimal (or binary) representation of real numbers is precisely such an approximation. An intuitive way of imagining denseness is that the (necessarily) infinite subset is equal to almost all of the set, and its members are scattered "densely" everywhere in the set. The embedding of the rational numbers in the set of real numbers, and how they densely populate that set, is a good mental picture of all dense subsets.
A useful property involving the concepts of closure and openness has to do with continuous maps between normed vector spaces. Let $f : \mathcal{H}_1 \to \mathcal{H}_2$ be a continuous map. Let $\mathcal{O}_2$ be an open set in $\mathcal{H}_2$. Let $f^{-1}(\mathcal{O}_2)$ denote the inverse image of $\mathcal{O}_2$, i.e., all points of $\mathcal{H}_1$ that are mapped to $\mathcal{O}_2$. Let $|x_1\rangle$ be a vector in $f^{-1}(\mathcal{O}_2)$, $|x_2\rangle = f(|x_1\rangle)$, and let $B_\epsilon(x_2)$ be a ball contained entirely in $\mathcal{O}_2$. Then $f^{-1}(B_\epsilon(x_2))$ contains $|x_1\rangle$ and lies entirely in $f^{-1}(\mathcal{O}_2)$. Because of the continuity of $f$, one can now construct an open ball centered at $|x_1\rangle$ lying entirely in $f^{-1}(B_\epsilon(x_2))$, and by inclusion, in $f^{-1}(\mathcal{O}_2)$. This shows that every point of $f^{-1}(\mathcal{O}_2)$ has a round open neighborhood lying entirely in $f^{-1}(\mathcal{O}_2)$. Thus, $f^{-1}(\mathcal{O}_2)$ is an open subset. One can similarly show the corresponding property for closed subsets. We can summarize this in the following:
16.4.6. Proposition. Let $f : \mathcal{H}_1 \to \mathcal{H}_2$ be continuous. Then the inverse image of an open (closed) subset of $\mathcal{H}_2$ is an open (closed) subset of $\mathcal{H}_1$.
Consider the resolvent set of a bounded operator $T$. We claim that this set is open in $\mathbb{C}$. To see this, note that if $\lambda \in \rho(T)$, then $T - \lambda\mathbf{1}$ is invertible. On the other hand, Problem 16.1 shows that operators close to an invertible operator are invertible. Thus, if we choose a sufficiently small positive number $\epsilon$ and consider all complex numbers $\mu$ within a distance $\epsilon$ from $\lambda$, then all operators of the form $T - \mu\mathbf{1}$ are invertible, i.e., $\mu \in \rho(T)$. Therefore, any $\lambda \in \rho(T)$ has an open round neighborhood in the complex plane all points of which are in the resolvent. This shows that the resolvent set is open. In particular, it cannot contain any boundary points. However, $\rho(T)$ and $\sigma(T)$ have to be separated by a common boundary.⁷ Since $\rho(T)$ cannot contain any boundary point, $\sigma(T)$ must carry the entire boundary. This shows that $\sigma(T)$ is a closed subset of $\mathbb{C}$. Recalling that $\sigma(T)$ is also bounded, we have the following result.
⁷The spectrum of a bounded operator need not occupy any "area" in the complex plane. It may consist of isolated points or line segments, etc., in which case the spectrum will constitute the entire boundary.
16.4.7. Proposition. For any $T \in \mathcal{B}(\mathcal{H})$, the set $\rho(T)$ is an open subset of $\mathbb{C}$ and $\sigma(T)$ is a closed, bounded subset of $\mathbb{C}$.
Let us go back to the notion of compactness. It turns out that the feature of the closed interval $[a, b]$ most appropriate for generalization is the behavior of infinite sequences of numbers lying in the interval. More specifically, let $\{a_n\}_{n=1}^{\infty}$ be a sequence of infinitely many real numbers all lying in the interval $[a, b]$. It is intuitively clear that since there is not enough room for these points to stay away from each other, they will have to crowd around a number of points in the interval. For example, a sequence in the interval $[-1, +1]$ whose terms alternate in sign while their absolute values approach $\frac{1}{2}$ crowds around the two points $-\frac{1}{2}$ and $+\frac{1}{2}$. In fact, the points with even $n$ accumulate around $+\frac{1}{2}$ and those with odd $n$ crowd around $-\frac{1}{2}$. It turns out that all closed intervals of $\mathbb{R}$ have this property, namely, all sequences crowd around some points. To see that open intervals do not share this property, consider the open interval $(0, 1)$. The sequence $\{\frac{1}{2n+1}\}_{n=1}^{\infty} = \{\frac{1}{3}, \frac{1}{5}, \dots\}$ clearly crowds only around zero, which is not a point of the interval. But we already know that open intervals are not compact.
compact subset
16.4.8. Definition. (Bolzano–Weierstrass property) A subset $X$ of a normed vector space is called compact if every (infinite) sequence in $X$ has a convergent subsequence.
The reason for the introduction of a subsequence in the definition is that a sequence may accumulate around many points. But no matter how many of these points there may exist, one can always obtain a convergent subsequence by choosing from among the points in the sequence. For instance, in the example above, one can choose the subsequence consisting of elements for which $n$ is even. This subsequence converges to the single point $+\frac{1}{2}$.
An important theorem in real analysis characterizes all compact sets in $\mathbb{R}^n$:⁸
16.4.9. Theorem. (BWHB theorem) A subset of $\mathbb{R}^n$ is compact if and only if it is closed and bounded.
σ(T) is compact
We showed earlier that the spectrum of a bounded linear operator is closed and bounded. Identifying $\mathbb{C}$ with $\mathbb{R}^2$, the BWHB theorem implies that
⁸BWHB stands for Bolzano, Weierstrass, Heine, and Borel. Bolzano and Weierstrass proved that any closed and bounded subset of $\mathbb{R}$ has the Bolzano–Weierstrass property. Heine and Borel abstracted the notion of compactness in terms of open sets, and showed that a closed bounded subset of $\mathbb{R}$ is compact. The BWHB theorem as applied to $\mathbb{R}$ is usually called the Heine–Borel theorem (although some authors call it the Bolzano–Weierstrass theorem). Since the Bolzano–Weierstrass property and compactness are equivalent, we have decided to choose BWHB as the name of our theorem.
16.4.10. Box. The spectrum of a bounded linear operator is a compact subset of $\mathbb{C}$.
criterion for finite-dimensionality
An immediate consequence of the BWHB theorem is that every bounded subset of $\mathbb{R}^n$ has a compact closure. Since $\mathbb{R}^n$ is a prototype of all finite-dimensional (normed) vector spaces, the same statement is true for all such vector spaces. What is interesting is that the statement indeed characterizes the normed space:
16.4.11. Theorem. A normed vector space is finite-dimensional if and only if every bounded subset has a compact closure.
This result can also be applied to subspaces of a normed vector space: A subspace $\mathcal{W}$ of a normed vector space $V$ is finite-dimensional if and only if every bounded subset of $\mathcal{W}$ has a compact closure in $\mathcal{W}$. A useful version of this property is stated in terms of sequences of points (vectors):
16.4.12. Theorem. A subspace $\mathcal{W}$ of a normed vector space $V$ is finite-dimensional if and only if every bounded sequence in $\mathcal{W}$ has a convergent subsequence in $\mathcal{W}$.
Karl Theodor Wilhelm Weierstrass (1815-1897) was both the greatest analyst and the world's foremost teacher of advanced mathematics of the last third of the nineteenth century. His career was also remarkable in another way, and a consolation to all "late starters," for he began the solid part of his professional life at the age of almost 40, when most mathematicians are long past their creative years.
His father sent him to the University of Bonn to qualify for the higher ranks of the Prussian civil service by studying law and commerce. But Karl had no interest in these subjects. He infuriated his father by rarely attending lectures, getting poor grades, and instead becoming a champion beer drinker. He did manage to become a superb fencer, but when he returned home, he had no degree.
In order to earn his living, he made a fresh start by teaching mathematics, physics, botany, German, penmanship, and gymnastics to the children of several small Prussian towns during the day. During the nights, however, he mingled with the intellectuals of the past, particularly the great Norwegian mathematician Abel. His remarkable research on Abelian functions was carried on for years without the knowledge of another living soul; he didn't discuss it with anyone at all, or submit it for publication in the mathematical journals of the day.
All this changed in 1854 when Weierstrass at last published an account of his research on Abelian functions. This paper caught the attention of an alert professor at the University of Königsberg, who persuaded his university to award Weierstrass an honorary doctor's degree. The Ministry of Education granted Weierstrass a year's leave of absence with pay
to continue his research, and the next year he was appointed to the University of Berlin, where he remained the rest of his life.
Weierstrass's great creative talents were evenly divided between his thinking and his teaching. The student notes of his lectures, and copies of these notes, and copies of copies, were passed from hand to hand throughout Europe and even America. Like Gauss, he was indifferent to fame, but unlike Gauss, he endeared himself to generations of students by the generosity with which he encouraged them to develop and publish, and receive credit for, ideas and theorems that he essentially originated himself. Among Weierstrass's students and followers were Cantor, Schwarz, Hölder, Mittag-Leffler, Sonja Kovalevskaya (Weierstrass's favorite student), Hilbert, Max Planck, Willard Gibbs, and many others.
In 1885 he published the famous theorem now called the Weierstrass approximation theorem (see Theorems 5.2.3 and 8.1.1), which was given a far-reaching generalization, with many applications, by the modern American mathematician M. H. Stone.
The quality that came to be known as "Weierstrassian rigor" was particularly visible in his contributions to the foundations of real analysis. He refused to accept any statement as "intuitively obvious," but instead demanded ironclad proof based on explicit properties of the real numbers. The careful reasoning required for these proofs was founded on a crucial property of the real numbers now known as the BWHB theorem.
We shall need the following proposition in our study of compact operators:
16.4.13. Proposition. Let $\mathcal{W}$ be a closed proper subspace of $\mathcal{H}$ and $\delta$ an arbitrary nonnegative number with $0 \le \delta < 1$. Then there exists a unit vector $|v_0\rangle \in \mathcal{H}$ such that
$$\|x - v_0\| \ge \delta \qquad \forall\, |x\rangle \in \mathcal{W}.$$
Proof. Choose a vector $|v\rangle$ in $\mathcal{H}$ but not in $\mathcal{W}$ and let
$$d = \min\{ \|v - x\| \mid |x\rangle \in \mathcal{W} \}.$$
We claim that $d > 0$. To show this, assume otherwise. Then for each (large) $n$ we could find a vector $|x_n\rangle \in \mathcal{W}$ whose distance from $|v\rangle$ is less than $1/n$, so that the sequence $\{|x_n\rangle\}$ would have $|v\rangle$ as a limit. Closure of $\mathcal{W}$ would then imply that $|v\rangle$ is in $\mathcal{W}$, a contradiction. So, $d > 0$.
Now, for any $|x\rangle, |x_0\rangle \in \mathcal{W}$, let
$$|u\rangle \equiv |x\rangle - \frac{|v\rangle - |x_0\rangle}{\|v - x_0\|} = \frac{\left( \|v - x_0\| \, |x\rangle + |x_0\rangle \right) - |v\rangle}{\|v - x_0\|},$$
and note that, since $\|v - x_0\| \, |x\rangle + |x_0\rangle$ belongs to $\mathcal{W}$, by the definition of $d$ the norm of the numerator is at least $d$. Therefore, $\|u\| \ge d/\|v - x_0\|$ for every $|x\rangle, |x_0\rangle \in \mathcal{W}$. If we choose $|x_0\rangle$ such that $\|v - x_0\| < d\delta^{-1}$, which is possible because $d\delta^{-1} > d$, then $\|u\| \ge \delta$ for all $|x\rangle \in \mathcal{W}$. Now let $|v_0\rangle = (|v\rangle - |x_0\rangle)/\|v - x_0\|$. □
16.5 Compact Operators
It is straightforward to show that if $X$ is a compact set in $\mathcal{H}_1$ and $f : \mathcal{H}_1 \to \mathcal{H}_2$ is continuous, then $f(X)$ (the image of $X$) is compact in $\mathcal{H}_2$. Since all bounded operators are continuous, we conclude that all bounded operators map compact subsets onto compact subsets. There is a special subset of $\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$ that deserves particular attention.
compact operator
16.5.1. Definition. An operator $K \in \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$ is called a compact operator if it maps a bounded subset of $\mathcal{H}_1$ onto a subset of $\mathcal{H}_2$ with compact closure.
Since we will be dealing with function spaces, and since it is easier to deal with sequences of functions than with subsets of the space of functions, we find it more useful to have a definition of compact operators in terms of sequences rather than subsets. Thus, instead of a bounded subset, we take a subset of it consisting of a (necessarily) bounded sequence. The image of this sequence will be a sequence in a compact set, which, by definition, must have a convergent subsequence. We therefore have the following:
16.5.2. Theorem. An operator $K \in \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$ is compact if and only if for any bounded sequence $\{|x_n\rangle\}$ in $\mathcal{H}_1$, the sequence $\{K|x_n\rangle\}$ has a convergent subsequence in $\mathcal{H}_2$.
product of two compact operators is compact
16.5.3. Example. Consider $\mathcal{B}(\mathcal{H})$, the set of bounded operators on the Hilbert space $\mathcal{H}$. If $K$ is a compact operator and $T$ a bounded operator, then $KT$ and $TK$ are compact. This is because $\{T|x_n\rangle \equiv |y_n\rangle\}$ is a bounded sequence if $\{|x_n\rangle\}$ is, and $\{K|y_n\rangle = KT|x_n\rangle\}$ has a convergent subsequence, because $K$ is compact. For the second part, use the first definition of the compact operator and note that $K$ maps bounded sets onto compact sets, which $T$ (being continuous) maps onto a compact set. As a special case of this property we note that the product of two compact operators is compact. Similarly, one can show that any linear combination of compact operators is compact. Thus, any polynomial of a compact operator is compact. In particular,
$$(\mathbf{1} - K)^n = \sum_{j=0}^{n} \frac{n!}{j!(n-j)!} (-K)^j = \mathbf{1} + \sum_{j=1}^{n} \frac{n!}{j!(n-j)!} (-K)^j \equiv \mathbf{1} - K_n,$$
where $K_n$ is a compact operator. ∎
finite rank operators
16.5.4. Definition. An operator $T \in \mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$ is called a finite rank operator if its range is finite-dimensional.
The following is clear from Theorem 16.4.12.
16.5.5. Proposition. A finite rank operator is compact.
linear transformations of finite-dimensional vector spaces are compact
In particular, every linear transformation of a finite-dimensional vector space is compact.
16.5.6. Theorem. If $\{K_n\} \in \mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$ are compact and $K \in \mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$ is such that $\|K - K_n\| \to 0$, then $K$ is compact.
Proof. Let $\{|x_m\rangle\}$ be a bounded sequence in $\mathcal{H}_1$. Let $\{K_1|x_{m_1}\rangle\}$ be the convergent subsequence guaranteed by the compactness of $K_1$. Now, $\{|x_{m_1}\rangle\}$ is a bounded sequence in $\mathcal{H}_1$. It therefore has a subsequence $\{|x_{m_2}\rangle\}$ such that $\{K_2|x_{m_2}\rangle\}$ is convergent. Note that $\{K_1|x_{m_2}\rangle\}$ is also convergent. Continuing this process, we construct the sequence of sequences
$$\{|x_{m_1}\rangle\},\ \{|x_{m_2}\rangle\},\ \{|x_{m_3}\rangle\},\ \dots,$$
where each sequence is a subsequence of all sequences preceding it. Furthermore, all the sequences $\{K_l|x_{m_k}\rangle\}$ for $l = 1, \dots, k$ are convergent. In particular, if we pick the diagonal sequence $\{|y_m\rangle\} \equiv \{|x_{m_m}\rangle\}$, then for any $l \in \mathbb{N}$, the sequence $\{K_l|y_m\rangle\}$ converges in $\mathcal{H}_2$. To show that $K$ is compact, we shall establish that $\{|y_m\rangle\}$ is the subsequence of $\{|x_m\rangle\}$ such that $\{K|y_m\rangle\}$ is convergent. Since $\mathcal{H}_2$ is complete, it is sufficient to show that $\{K|y_m\rangle\}$ is Cauchy. We use the so-called "$\epsilon/3$ trick." Write
$$K|y_m\rangle - K|y_n\rangle = K|y_m\rangle - K_l|y_m\rangle + K_l|y_m\rangle - K_l|y_n\rangle + K_l|y_n\rangle - K|y_n\rangle$$
and use the triangle inequality to obtain
$$\|K|y_m\rangle - K|y_n\rangle\| \le \|K - K_l\| \, \|y_m\| + \|K_l|y_m\rangle - K_l|y_n\rangle\| + \|K_l - K\| \, \|y_n\|.$$
By choosing $m$, $n$, and $l$ large enough, we can make each of the three terms on the RHS smaller than $\epsilon/3$: the first and the third ones because $K_l \to K$, the second one because $\{K_l|y_n\rangle\}$ is a convergent sequence. □
Recall that given an orthonormal basis $\{|e_i\rangle\}_{i=1}^{\infty}$, any operator $T$ on a Hilbert space $\mathcal{H}$ can be written as $\sum_{i,j=1}^{\infty} c_{ij} |e_i\rangle\langle e_j|$, where $c_{ij} = \langle e_i| T |e_j\rangle$. Now let $K$ be a compact operator and consider the finite rank operators
$$K_n \equiv \sum_{i,j=1}^{n} c_{ij} |e_i\rangle\langle e_j|.$$
Clearly, $\|K - K_n\| \to 0$. The hermitian adjoints $\{K_n^\dagger\}$ are also of finite rank (therefore, compact). Barring some convergence technicality, we see that $K^\dagger$, which is the limit of the sequence of these compact operators, is also compact.
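The convergence $\|K - K_n\| \to 0$ can be seen concretely for a model compact operator that happens to be diagonal in the chosen basis, with decaying entries $c_{ii} = 1/i$. This diagonal model is an illustrative choice, not taken from the text:

```python
import numpy as np

# A compact operator modeled by a diagonal matrix with entries 1/i;
# its finite-rank truncations K_n keep only the upper-left n-by-n block.
N = 200
K = np.diag(1.0 / np.arange(1, N + 1))

norms = []
for n in (5, 10, 50, 100):
    Kn = K.copy()
    Kn[n:, :] = 0.0    # zero out rows beyond n ...
    Kn[:, n:] = 0.0    # ... and columns beyond n
    norms.append(np.linalg.norm(K - Kn, 2))

# Here K - K_n is diagonal with largest entry 1/(n+1), so
# ||K - K_n|| = 1/(n+1), which tends to zero as n grows.
assert np.allclose(norms, [1/6, 1/11, 1/51, 1/101])
assert all(a > b for a, b in zip(norms, norms[1:]))
```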
K is compact iff K† is
16.5.7. Theorem. $K$ is a compact operator if and only if $K^\dagger$ is.
A particular type of operator occurs frequently in integral equation theory. These are called Hilbert-Schmidt operators and defined as follows:
Hilbert-Schmidt operators
16.5.8. Definition. Let $\mathcal{H}$ be a Hilbert space, and $\{|e_i\rangle\}_{i=1}^{\infty}$ an orthonormal basis. An operator $T \in \mathcal{L}(\mathcal{H})$ is called Hilbert-Schmidt if
$$\mathrm{tr}(T^\dagger T) \equiv \sum_{i=1}^{\infty} \langle e_i| T^\dagger T |e_i\rangle = \sum_{i=1}^{\infty} \langle Te_i| Te_i\rangle = \sum_{i=1}^{\infty} \|Te_i\|^2 < \infty.$$
Hilbert-Schmidt operators are compact
Hilbert-Schmidt kernel
16.5.9. Theorem. Hilbert-Schmidt operators are compact.
For a proof, see [Rich 78, pp. 242-246].
16.5.10. Example. It is time to give a concrete example of a compact (Hilbert-Schmidt) operator. For this, we return to Equation (16.2) with $w(y) = 1$, and assume that $|u\rangle \in \mathcal{L}^2(a, b)$. Suppose further that the function $K(x, y)$ is continuous on the closed rectangle $[a, b] \times [a, b]$ in the $xy$-plane (or $\mathbb{R}^2$). Under such conditions, $K(x, y)$ is called a Hilbert-Schmidt kernel. We now show that $K$ is compact. First note that due to the continuity of $K(x, y)$, $\int_a^b \int_a^b |K(x, y)|^2 \, dx \, dy < \infty$. Next, we calculate the trace of $K^\dagger K$. Let $\{|e_i\rangle\}_{i=1}^{\infty}$ be any orthonormal basis of $\mathcal{L}^2(a, b)$. Then
$$\mathrm{tr}(K^\dagger K) = \sum_{i=1}^{\infty} \langle e_i| K^\dagger K |e_i\rangle = \sum_{i=1}^{\infty} \iiint \langle e_i| x\rangle \langle x| K^\dagger |y\rangle \langle y| K |z\rangle \langle z| e_i\rangle \, dx\, dy\, dz$$
$$= \iiint \langle y| K |x\rangle^* \langle y| K |z\rangle \, \langle z| \Big( \underbrace{\textstyle\sum_{i=1}^{\infty} |e_i\rangle\langle e_i|}_{=\,\mathbf{1}} \Big) |x\rangle \, dx\, dy\, dz,$$
and since $\langle z| \mathbf{1} |x\rangle = \delta(x - z)$, the $z$ integration collapses:
$$\mathrm{tr}(K^\dagger K) = \iint |\langle y| K |x\rangle|^2 \, dx\, dy = \int_a^b \int_a^b |K(x, y)|^2 \, dx\, dy < \infty.$$
Thus $K$ is Hilbert-Schmidt and, by Theorem 16.5.9, compact. ∎
Bernard Bolzano (1781-1848) was a Czech philosopher, mathematician, and theologian who made significant contributions to both mathematics and the theory of knowledge. He entered the Philosophy Faculty of the University of Prague in 1796, studying philosophy and mathematics. He wrote, "My special pleasure in mathematics rested therefore particularly on its purely speculative parts, in other words I prized only that part of mathematics which was at the same time philosophy."
In the autumn of 1800 he began three years of theological study while he was preparing a doctoral thesis on geometry. He received his doctorate in 1804 for a thesis in which he gave his view of mathematics and what constitutes a correct mathematical proof. In the preface he wrote:
I could not be satisfied with a completely strict proof if it were not derived from concepts which the thesis to be proved contained, but rather made use of some fortuitous, alien, intermediate concept, which is always an erroneous transition to another kind.
Two days after receiving his doctorate, Bolzano was ordained a Roman Catholic priest. However, he came to realize that teaching and not ministering defined his true vocation. In the same year, Bolzano was appointed to the chair of philosophy and religion at the University of Prague. Because of his pacifist beliefs and his concern for economic justice, he was suspended from his position in 1819 after pressure from the Austrian government. Bolzano had not given up without a fight, but once he was suspended on a charge of heresy, he was put under house arrest and forbidden to publish.
Although some of his books had to be published outside Austria because of government censorship, he continued to write and to play an important role in the intellectual life of his country. Bolzano intended to write a series of papers on the foundations of mathematics. He wrote two, the first of which was published. Instead of publishing the second one he decided to "... make myself better known to the learned world by publishing some papers which, by their titles, would be more suited to arouse attention."
Pursuing this strategy he published Der binomische Lehrsatz ... (1816) and Rein analytischer Beweis ... (1817), which contain an attempt to free calculus from the concept of the infinitesimal. He is clear in his intention, stating in the preface of the first that the work is "a sample of a new way of developing analysis." The paper gives a proof of the intermediate value theorem with Bolzano's new approach, and in the work he defined what is now called a Cauchy sequence. The concept appears in Cauchy's work four years later, but it is unlikely that Cauchy had read Bolzano's work.
After 1817, Bolzano published no further mathematical works for many years. Between the late 1820s and the 1840s, he worked on a major work, Grössenlehre. This attempt to put the whole of mathematics on a logical foundation was published in parts, while Bolzano hoped that his students would finish and publish the complete work.
His work Paradoxien des Unendlichen, a study of paradoxes of the infinite, was published in 1851, three years after his death, by one of his students. The word "set" appears here for the first time. In this work Bolzano gives examples of 1-1 correspondences between the elements of an infinite set and the elements of a proper subset.
Bolzano's theories of mathematical infinity anticipated Georg Cantor's theory of infinite sets. It is also remarkable that he gave a function that is nowhere differentiable yet everywhere continuous.
16.6 Spectrum of Compact Operators
Our next task is to investigate the spectrum σ(K) of a compact operator K on a Hilbert space 𝓗. We are particularly interested in the set of eigenvalues and eigenvectors of compact operators. Recall that every eigenvalue of an operator on a vector space of finite dimension is in its spectrum, and that every point of the spectrum is an eigenvalue (see page 457). In general, the second statement is not true. In fact, we saw that the right-shift operator had no eigenvalue at all, yet its spectrum was the entire unit disk of the complex plane.
We first observe that 0 ∈ σ(K), because otherwise 0 ∈ ρ(K), which implies that K = K − 0·1 is invertible with inverse K⁻¹. The product of two compact operators (in fact, the product of a compact and a bounded operator) is compact (see Example 16.5.3). This yields a contradiction,⁹ because the unit operator cannot be compact: It maps a bounded sequence to itself, not to a sequence with a convergent subsequence.
16.6.1. Proposition. For any compact operator K ∈ 𝓑(𝓗) on an infinite-dimensional Hilbert space, we have 0 ∈ σ(K).

To proceed, we note that eigenvectors of K corresponding to the eigenvalue λ belong to the null space of K − λ1. So, let¹⁰

N_λ ≡ ker(K − λ1),        R_λ ≡ Range(K − λ1),
N_λ† ≡ ker(K† − λ*1),     R_λ† ≡ Range(K† − λ*1).
16.6.2. Theorem. N_λ and N_λ† are finite-dimensional subspaces of 𝓗. Furthermore, R_λ^⊥ = N_λ†.

Proof. We use Theorem 16.4.12. Let {|x_n⟩} be a bounded sequence in N_λ. Since K is compact, {K|x_n⟩ = λ|x_n⟩} has a convergent subsequence. So {|x_n⟩} has a convergent subsequence. This subsequence will converge to a vector in N_λ if the latter is closed. But this follows from Proposition 16.4.6, continuity of K − λ1, the fact that N_λ is the inverse image of the zero vector, and the fact that any single point of a space, such as the zero vector, is a closed subset. Finite-dimensionality of N_λ† follows from the compactness of K† and a similar argument as above.
To show the second statement, we observe that for any bounded operator T, we have¹¹ |u⟩ ∈ T(𝓗)^⊥ iff ⟨u|v⟩ = 0 for all |v⟩ ∈ T(𝓗) iff ⟨u|Tx⟩ = 0 for all |x⟩ ∈ 𝓗 iff ⟨T†u|x⟩ = 0 for all |x⟩ ∈ 𝓗 iff T†|u⟩ = 0 iff |u⟩ ∈ ker T†. This shows that T(𝓗)^⊥ = ker T†. The desired result is obtained by letting T = K − λ1 and noting that (W^⊥)^⊥ = W for any subspace W of a Hilbert space. □
We note that N_λ is the eigenspace of K corresponding to the eigenvalue λ. However, it may well happen that zero is the only number in σ(K). In the finite-dimensional case, this corresponds to the case where the matrix representation of the operator is not diagonalizable. In such a case, the standard procedure is to look at generalized eigenvectors. We do the same in the case of compact operators.
16.6.3. Definition. A vector |u⟩ is a generalized eigenvector of K of order m if (K − λ1)^(m−1)|u⟩ ≠ 0 but (K − λ1)^m|u⟩ = 0. The set of such vectors, i.e., the null space of (K − λ1)^m, will be denoted by N_λ^(m).

It is clear that

{0} = N_λ^(0) ⊂ N_λ ≡ N_λ^(1) ⊂ N_λ^(2) ⊂ ⋯ ⊂ N_λ^(m) ⊂ N_λ^(m+1) ⊂ ⋯    (16.6)
⁹Our conclusion is valid only in infinite dimensions. In finite dimensions, all operators, including 1, are compact.
¹⁰In what follows, we assume that λ ≠ 0.
¹¹Recall that T(𝓗) is the range of the operator T.
and each N_λ^(n) is a subspace of 𝓗. In general, a subspace with higher index is larger than those with lower index. If there happens to be an equality at one link of the above chain, then the equality continues all the way to the right ad infinitum. To see this, let p be the first integer for which the equality occurs, and let n > p be arbitrary. Suppose |u⟩ ∈ N_λ^(n+1). Then (K − λ1)^(p+1)[(K − λ1)^(n−p)|u⟩] = (K − λ1)^(n+1)|u⟩ = 0. It follows that (K − λ1)^(n−p)|u⟩ is in N_λ^(p+1). But N_λ^(p) = N_λ^(p+1). So

(K − λ1)^n|u⟩ = (K − λ1)^p[(K − λ1)^(n−p)|u⟩] = 0.

Thus every vector in N_λ^(n+1) is also in N_λ^(n). This fact and the above chain imply that N_λ^(n) = N_λ^(n+1) for all n > p.
16.6.4. Theorem. The subspace¹² N_λ^(n) is finite-dimensional for each n. Moreover, there is an integer p such that N_λ^(n) ≠ N_λ^(n+1) for n = 0, 1, 2, …, p − 1, but N_λ^(n) = N_λ^(n+1) for all n ≥ p.
Proof. For the first part, use the result of Example 16.5.3 to show that (K − λ1)^n = K_n + (−λ)^n 1, where K_n is compact. Now repeat the proof of Theorem 16.6.2 for K_n.
If the integer p exists, the second part of the theorem follows from the discussion preceding the theorem. To show the existence of p, suppose, to the contrary, that N_λ^(n) ≠ N_λ^(n+1) for every positive integer n. This means that for every n, we can find a (unit) vector |v_n⟩ ∈ N_λ^(n+1) that is not in N_λ^(n) and that by Proposition 16.4.13 has the property

‖ |v_n⟩ − |u⟩ ‖ ≥ 1/2   for all |u⟩ ∈ N_λ^(n).
We thus obtain a bounded sequence {|v_n⟩}. Let us apply K to this sequence. If j > l, then (|v_j⟩ − |v_l⟩) ∈ N_λ^(j+1) by the construction of |v_j⟩ and the fact that N_λ^(l+1) ⊂ N_λ^(j+1). Furthermore,

(K − λ1)|v_j⟩ ∈ N_λ^(j),

but

(K − λ1)|v_l⟩ ∈ N_λ^(l) ⊂ N_λ^(j)
¹²In infinite dimensions, the fact that linear combinations of vectors in a subset belong to the subset is not sufficient to make that subset into a subspace. The subset must also be closed. We normally leave out the rather technical proof of closure.
by the definition of N_λ^(j+1). Therefore, (K − λ1)(|v_j⟩ − |v_l⟩) ∈ N_λ^(j). Now note that

K|v_j⟩ − K|v_l⟩ = λ{ (1/λ)(K − λ1)|v_j⟩ − (1/λ)(K − λ1)|v_l⟩ + |v_j⟩ − |v_l⟩ },

where the first two terms in the braces belong to N_λ^(j). It follows from Proposition 16.4.13 that the norm of the vector in curly brackets is larger than 1/2. Hence, ‖K|v_j⟩ − K|v_l⟩‖ ≥ |λ|/2, i.e., since j and l are arbitrary, the sequence {K|v_n⟩} does not have a convergent subsequence. This contradicts the fact that K is compact. So, there must exist a p such that N_λ^(p) = N_λ^(p+1). □
We also need the range of various powers of K − λ1. Thus, let R_λ^(n) ≡ Range(K − λ1)^n. One can show that

𝓗 = R_λ^(0) ⊃ R_λ^(1) ⊃ ⋯ ⊃ R_λ^(n) ⊃ R_λ^(n+1) ⊃ ⋯
16.6.5. Theorem. Each R_λ^(n) is a subspace of 𝓗. Moreover, there is an integer q such that R_λ^(n) ≠ R_λ^(n+1) for n = 0, 1, …, q − 1, but R_λ^(n) = R_λ^(n+1) for all n ≥ q.

Proof. The proof is similar to that of Theorem 16.6.4. The only extra step needed is to show that R_λ^(n) is closed. We shall not reproduce this step. □
16.6.6. Theorem. Let q be the integer of Theorem 16.6.5. Then

1. 𝓗 = N_λ^(q) ⊕ R_λ^(q).

2. N_λ^(q) and R_λ^(q) are invariant subspaces of K.

3. The only vector in R_λ^(q) that K − λ1 maps to zero is the zero vector. In fact, when restricted to R_λ^(q), the operator K − λ1 is invertible.

Proof. (1) Recall that 𝓗 = N_λ^(q) ⊕ R_λ^(q) means that every vector of 𝓗 can be written as the sum of a vector in N_λ^(q) and a vector in R_λ^(q), and the only vector common to both subspaces is zero. We show the latter property first. In fact, we show that N_λ^(m) ∩ R_λ^(q) = 0 for any integer m. Suppose |x⟩ is in this intersection. For each n ≥ q, there must be a vector |x_n⟩ in 𝓗 such that |x⟩ = (K − λ1)^n|x_n⟩, because R_λ^(n) = R_λ^(q) for n ≥ q. If |x⟩ ≠ 0, then |x_n⟩ ∉ N_λ^(n) for each n. Now let r be the larger of the two integers (p, q), where p is the integer of Theorem 16.6.4. Then

|x_r⟩ ∉ N_λ^(r).    (16.7)
From

0 = (K − λ1)^m|x⟩  and  |x⟩ = (K − λ1)^r|x_r⟩,

it follows that (K − λ1)^(m+r)|x_r⟩ = 0, i.e., |x_r⟩ ∈ N_λ^(m+r). But N_λ^(m+r) = N_λ^(r), contradicting Equation (16.7). We conclude that |x⟩ must be zero. By the definition of R_λ^(q), for any vector |z⟩ in 𝓗, we have (K − λ1)^q|z⟩ ∈ R_λ^(q). Since R_λ^(q) = R_λ^(2q), there must be a vector |y⟩ ∈ 𝓗 such that (K − λ1)^q|z⟩ = (K − λ1)^(2q)|y⟩, or (K − λ1)^q[|z⟩ − (K − λ1)^q|y⟩] = 0. This shows that |z⟩ − (K − λ1)^q|y⟩ is in N_λ^(q). On the other hand,

|z⟩ = [|z⟩ − (K − λ1)^q|y⟩] + (K − λ1)^q|y⟩,

and the first part of the theorem is done.
(2) For the second part, we simply note that (K − λ1)N_λ^(k) ⊆ N_λ^(k−1) ⊆ N_λ^(k), and that

K(N_λ^(q)) = (K − λ1 + λ1)(N_λ^(q)) = (K − λ1)(N_λ^(q)) + λ1(N_λ^(q)) ⊂ N_λ^(q),

because both terms in the middle expression lie in N_λ^(q). Similarly,

K(R_λ^(q)) = (K − λ1 + λ1)(R_λ^(q)) = (K − λ1)(R_λ^(q)) + λ1(R_λ^(q)) ⊂ R_λ^(q),

because (K − λ1)(R_λ^(q)) ⊂ R_λ^(q+1) ⊂ R_λ^(q).
(3) Suppose |z⟩ ∈ R_λ^(q) and (K − λ1)|z⟩ = 0. Then |z⟩ = (K − λ1)^q|y⟩ for some |y⟩ in 𝓗, and 0 = (K − λ1)|z⟩ = (K − λ1)^(q+1)|y⟩, or |y⟩ ∈ N_λ^(q+1). From part (1), with m = q + 1, we conclude that |z⟩ = 0. It follows that K − λ1 is injective (or 1-1). We also have

(K − λ1)R_λ^(q) = (K − λ1)(K − λ1)^q(𝓗) = (K − λ1)^(q+1)(𝓗) = R_λ^(q+1) = R_λ^(q).

Therefore, when restricted to R_λ^(q), the operator K − λ1 is surjective (or onto) as well. Thus, (K − λ1) : R_λ^(q) → R_λ^(q) is bijective, and therefore has an inverse. □
16.6.7. Corollary. The two integers p and q introduced in Theorems 16.6.4 and
16.6.5 are equal.
Proof. The proof is left as a problem for the reader (see Problem 16.5). □
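In finite dimensions, the chains of Theorems 16.6.4 and 16.6.5 and the equality p = q can be observed directly. Here is a small numerical sketch (Python/NumPy; the matrix is our own hypothetical example, not from the text): it builds a matrix with a Jordan block of size 3 at λ = 2 and tracks the dimensions of the null spaces and ranges of (K − λ1)^n.

```python
import numpy as np

# A with a Jordan block of size 3 at lambda = 2: the null spaces of
# (A - 2*1)^n grow strictly for n = 0, 1, 2, 3 and then stabilize, and the
# ranges shrink correspondingly; both chains stabilize at p = q = 3.
lam = 2.0
A = np.diag([lam, lam, lam, 5.0, 7.0])
A[0, 1] = 1.0   # superdiagonal entries create generalized eigenvectors
A[1, 2] = 1.0   # of orders 2 and 3

B = A - lam * np.eye(5)
null_dims = []    # dim N_lambda^(n) = 5 - rank(B^n)
range_dims = []   # dim R_lambda^(n) = rank(B^n)
for n in range(6):
    r = np.linalg.matrix_rank(np.linalg.matrix_power(B, n))
    null_dims.append(5 - r)
    range_dims.append(r)

print(null_dims)    # [0, 1, 2, 3, 3, 3]  -> p = 3
print(range_dims)   # [5, 4, 3, 2, 2, 2]  -> q = 3
```

Both chains become stationary at the same index, illustrating Corollary 16.6.7.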
The next theorem characterizes the spectrum of a compact operator completely. In order to prove it, we need the following lemma.
16.6.8. Lemma. Let K_r : R_λ^(q) → R_λ^(q) be the restriction of K to R_λ^(q). Then:

1. Each nonzero point of σ(K) is an eigenvalue of K whose eigenspace is finite-dimensional.

2. σ(K_r) = σ(K).

3. Every infinite sequence in σ(K) converges to zero.

Proof. (1) If λ ≠ 0 is not an eigenvalue of K, the null space of K − λ1 is zero. This says that {0} = N_λ^(0) = N_λ^(1) = ⋯, i.e., p = q = 0. From Theorem 16.6.6, we conclude that 𝓗 = N_λ^(0) ⊕ R_λ^(0) = R_λ^(1). Therefore, K − λ1 is onto. Part (3) of Theorem 16.6.6 shows that K − λ1 is one-to-one. Thus, K − λ1 is invertible, and λ ∈ ρ(K). So λ ∉ σ(K). Finite-dimensionality of the eigenspace of a nonzero eigenvalue is the content of Theorem 16.6.2.
(2) Clearly σ(K_r) ⊆ σ(K). To show the reverse inclusion, first note that R_λ^(q) is infinite-dimensional, because N_λ^(q) has finite dimension. Thus by Proposition 16.6.1, 0 ∈ σ(K_r). Now let μ (nonzero and distinct from λ) be in σ(K). By part (1), μ is an eigenvalue of K, so there is a vector |u⟩ ∈ 𝓗 such that K|u⟩ = μ|u⟩. We also have (K − λ1)|u⟩ = (μ − λ)|u⟩, or (K − λ1)^q|u⟩ = (μ − λ)^q|u⟩. Thus, (μ − λ)^q|u⟩ (and, therefore, |u⟩) is in R_λ^(q). Therefore, we can restrict K to R_λ^(q), i.e., we can write K|u⟩ = μ|u⟩ as K_r|u⟩ = μ|u⟩, or (K_r − μ1)|u⟩ = 0. Hence, μ ∈ σ(K_r). We conclude that every point of σ(K) is a point of σ(K_r), and σ(K) ⊆ σ(K_r).
(3) Let λ be the limit of an infinite sequence in σ(K) = σ(K_r). If λ ≠ 0, K_r − λ1 will be invertible (Theorem 16.6.6, part 3), indicating that λ ∈ ρ(K_r). Since ρ(K_r) is open, we can find an open neighborhood of λ lying entirely in ρ(K_r). This contradicts the property of a limit of an infinite sequence, whereby any neighborhood of the limit contains (infinitely many) other points of the sequence. Therefore, we must conclude that no nonzero λ can be the limit of an infinite sequence in σ(K). □
16.6.9. Theorem. Let K be a compact operator on an infinite-dimensional Hilbert space 𝓗. Then

1. 0 ∈ σ(K).

2. Each nonzero point of σ(K) is an eigenvalue of K whose eigenspace is finite-dimensional.

3. σ(K) is either a finite set or a sequence that converges to zero.
Figure 16.1 The shaded area represents a convex subset of the vector space. It consists of vectors whose tips lie in the shaded region. It is clear that there is a (unique) vector belonging to the subset whose length is minimum.
Proof. (1) was proved in Proposition 16.6.1. (2) was shown in the lemma above. (3) Let σ_n(K) ≡ {λ ∈ σ(K) : |λ| ≥ 1/n}. Clearly, σ_n(K) must be a finite set, because otherwise the infinite set would constitute a sequence that, by compactness of σ_n(K), would have to have (at least) a limit point. By part (3) of the lemma, this limit must be zero, which is not included in σ_n(K). Let σ_1(K) = {λ_i}, i = 1, …, k, arranged in order of decreasing absolute value. Next, let λ_(k+1), λ_(k+2), … label the elements of σ_2(K) not accounted for in σ_1(K), again arranged in decreasing absolute value. If this process stops after a finite number of steps, σ(K) is finite. Otherwise, continue the process to construct a sequence whose limit by necessity is zero. □
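The content of this theorem can be watched numerically by discretizing a compact integral operator. The sketch below (Python/NumPy; the kernel min(x, t) and the grid are our own illustrative choices, not from the text) approximates the Hilbert-Schmidt operator on 𝓛²[0, 1] with kernel K(x, t) = min(x, t) by a matrix; its eigenvalues form a decreasing sequence accumulating only at zero, and for this particular kernel the exact eigenvalues 1/((n − 1/2)²π²) are known, which checks the discretization.

```python
import numpy as np

# Nystrom-type discretization: the matrix K(x_i, x_j)*dx approximates the
# integral operator (K f)(x) = integral of min(x, t) f(t) dt on [0, 1].
N = 400
x = (np.arange(N) + 0.5) / N          # midpoint grid on [0, 1]
dx = 1.0 / N
K = np.minimum.outer(x, x) * dx       # symmetric, so eigvalsh applies

eigs = np.sort(np.linalg.eigvalsh(K))[::-1]       # decreasing order
exact = 1.0 / ((np.arange(1, 6) - 0.5) ** 2 * np.pi ** 2)
print(eigs[:5])    # close to exact: [0.4053, 0.0450, 0.0162, 0.0083, 0.0050]
print(eigs[-1])    # tiny: the only accumulation point of the spectrum is 0
```

Every nonzero eigenvalue has a finite-dimensional eigenspace here trivially; the instructive part is the decay of the sequence toward zero.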
16.7 Spectral Theorem for Compact Operators
The finite-dimensional spectral decomposition theorem of Chapter 4 was based on the existence of eigenvalues, eigenspaces, and projection operators. Such existence was guaranteed by the existence of an inner product for any finite-dimensional vector space. The task of establishing spectral decomposition for infinite-dimensional vector spaces is complicated not only by the possibility of the absence of an inner product, but also by the questions of completeness, closure, and convergence. One can eliminate the first two hindrances by restricting oneself to a Hilbert space. However, even so, one has to deal with other complications of infinite dimensions. As an example, consider the relation V = W ⊕ W^⊥, which is trivially true for any subspace W in finite dimensions once an orthonormal basis is chosen. Recall that the procedure for establishing this relation is to complement a basis of W to produce a basis for the whole space. In an infinite-dimensional Hilbert space, we do not know a priori how to complement the basis of a subspace (which may be infinite-dimensional). Thus, one has to prove the existence of the orthogonal complement of a subspace. Without going into details, we sketch the proof. First, a definition:
16.7.1. Definition. A convex subset E of a vector space is a collection of vectors such that if |u⟩ and |v⟩ are in E, then |u⟩ − t(|u⟩ − |v⟩) is also in E for all 0 ≤ t ≤ 1.

Intuitively, any two points of a convex subset can be connected by a straight line segment lying entirely in the subset.
Let E be a convex subset (not a subspace) of a Hilbert space 𝓗. One can show that there exists a unique vector in E with minimal norm (see Figure 16.1). Now let M be a subspace of 𝓗. For an arbitrary vector |u⟩ in 𝓗, consider the subset E = |u⟩ − M, i.e., all vectors of the form |u⟩ − |m⟩ with |m⟩ ∈ M. Denote the unique vector of minimal norm of |u⟩ − M by |u⟩ − |Pu⟩, with |Pu⟩ ∈ M. One can show that |u⟩ − |Pu⟩ is orthogonal to M, i.e., (|u⟩ − |Pu⟩) ∈ M^⊥ (see Figure 16.2). Obviously, only the zero vector can be simultaneously in M and M^⊥. Furthermore, any vector |u⟩ in 𝓗 can be written as |u⟩ = |Pu⟩ + (|u⟩ − |Pu⟩) with |Pu⟩ ∈ M and (|u⟩ − |Pu⟩) ∈ M^⊥. This shows that 𝓗 = M ⊕ M^⊥. In words, a Hilbert space is the direct sum of any one of its subspaces and the orthogonal complement of that subspace. The vector |Pu⟩ so constructed is the projection of |u⟩ in M.
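The construction of |Pu⟩ can be imitated in finite dimensions, where the minimal-norm element of |u⟩ − M is found by least squares. A brief sketch (Python/NumPy; the subspace and vector are our own random choices, not from the text):

```python
import numpy as np

# M = column span of V.  Least squares minimizes ||u - V c||, so V c = Pu
# is the projection; the residual u - Pu is then orthogonal to M, and
# u = Pu + (u - Pu) realizes the decomposition H = M (+) M_perp.
rng = np.random.default_rng(0)
V = rng.standard_normal((6, 2))       # basis vectors spanning M
u = rng.standard_normal(6)

c, *_ = np.linalg.lstsq(V, u, rcond=None)
Pu = V @ c                            # projection of u onto M
residual = u - Pu                     # the minimal-norm element of u - M

print(V.T @ residual)                 # ~ 0: (u - Pu) is in M_perp
print(np.allclose(u, Pu + residual))  # True
```

The least-squares normal equations are exactly the statement that the residual is orthogonal to every basis vector of M.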
A projection operator P can be defined as a linear operator with the property that P² = P. One can then show the following.

16.7.2. Theorem. The kernel ker P of a projection operator is the orthogonal complement of the range P(𝓗) of P in 𝓗 iff P is hermitian.

This is the reason for demanding hermiticity of the projection operators in our treatment of the finite-dimensional case.
We now concentrate on the compact operators, and first look at hermitian compact operators. We need two lemmas:

16.7.3. Lemma. Let H ∈ 𝓑(𝓗) be a bounded hermitian operator on the Hilbert space 𝓗. Then ‖H‖ = max{ |⟨Hx|x⟩| : ‖x‖ = 1 }.

Proof. Let M denote the positive number on the RHS. From the definition of the norm of an operator, we easily obtain |⟨Hx|x⟩| ≤ ‖H‖ ‖x‖² = ‖H‖ for ‖x‖ = 1, or M ≤ ‖H‖. For the reverse inequality, see Problem 16.7. □
16.7.4. Lemma. Let K ∈ 𝓑(𝓗) be a hermitian compact operator. Then there is an eigenvalue λ of K such that |λ| = ‖K‖.

Proof. Let {|x_n⟩} be a sequence of unit vectors such that

lim |⟨Kx_n|x_n⟩| = ‖K‖.

This is always possible, as the following argument shows. Let ε be a small positive number. There must exist a unit vector |x_1⟩ ∈ 𝓗 such that

‖K‖ − ε < |⟨Kx_1|x_1⟩|,
Figure 16.2 The shaded area represents the subspace M of the vector space. The convex subset E consists of all vectors connecting points of M to the tip of |u⟩. It is clear that there is a (unique) vector belonging to E whose length is minimum. The figure shows that this vector is orthogonal to M.
because otherwise ‖K‖ − ε would be greater than or equal to the norm of the operator (see Lemma 16.7.3). Similarly, there must exist another (different) unit vector |x_2⟩ ∈ 𝓗 such that ‖K‖ − ε/2 < |⟨Kx_2|x_2⟩|. Continuing this way, we construct an infinite sequence of unit vectors {|x_n⟩} with the property ‖K‖ − ε/n < |⟨Kx_n|x_n⟩|. This construction clearly produces the desired sequence. Note that the argument holds for any hermitian bounded operator; compactness is not necessary.
Now define a_n ≡ ⟨Kx_n|x_n⟩ and let a = lim a_n, so that |a| = ‖K‖. Compactness of K implies that {K|x_n⟩} converges. Let |y⟩ ∈ 𝓗 be the limit of {K|x_n⟩}. Then ‖y‖ = lim ‖Kx_n‖ ≤ ‖K‖ ‖x_n‖ = ‖K‖. On the other hand,

0 ≤ ‖Kx_n − a x_n‖² = ‖Kx_n‖² − 2a⟨Kx_n|x_n⟩ + |a|².

Taking the limit and noting that a_n and a are real, we get

0 ≤ ‖y‖² − 2|a|² + |a|² = ‖y‖² − |a|².

It follows from these two inequalities that ‖y‖ = ‖K‖ and that lim |x_n⟩ = |y⟩/a. Furthermore,

(K − a1)(|y⟩/a) = (K − a1)(lim |x_n⟩) = lim (K − a1)|x_n⟩ = 0.

Therefore, a is an eigenvalue of K with eigenvector |y⟩/a. □
Let us order all the eigenvalues of Theorem 16.6.9 in decreasing absolute value. Let M_n denote the (finite-dimensional) eigenspace corresponding to eigenvalue λ_n, and P_n the projection to M_n. The eigenspaces are pairwise orthogonal and P_nP_m = 0 for m ≠ n. This follows in exact analogy with the finite-dimensional case.
First assume that K has only finitely many eigenvalues.
Let M "" Ml Ell Mz Ell ... Ell M,. = I:'i=l EIlMj, and let Mo be the orthogonal
complementofM. Sinceeacheigenspace is invariant underK, so is JvC, Therefore,
by Theorem 4.2.3-which holds for finite- as well as infinite-dimensional vector
spaces-and the fact that Kis henmitian, Mo is also invariant. Let Ko be the restric-
tion of Kto Mo. By Lemma 16.7.4, Ko has an eigenvalne Asnch that IAI = IIKolI.·
If A i" 0, it mnst be one of the eigenvalnes already accounted for, because any
eigenvalue of Ko is also an eigenvalue of K. This is impossible, because Mo is
orthogonal to all the eigenspaces. So, A = 0, or IAI = IIKoll = 0, or Ko = 0, i.e.,
Kacts as the zero operator on Mo.
Let Po be the orthogonal projection on Mo. Then JC = I:'i=o EIlMi- and we
have 1 = I:'i=o Pi- and for an arbitrary [x) E JC, we have
spectral theorem for
compact hermitian
operators
K[x) = K (~Pj IX)) = ~K(Pj Ix)) = tAj(Pj Ix)).
Itfollows that K = I:'i=l AjPj. Notice that the range ofKis I:'i=l EIlMj,which is
finite-dimensional. Thus, Khas finite rank. Barring some technical details, which
we shall notreproducehere, the case ofa compacthenmitian operatorwith infinitely
many eigenvalues goes through in the sanneway (see [DeVi 90, pp. 179-180]):
16.7.5. Theorem. (spectral theorem: compact hermitian operators) Let K be a compact hermitian operator on a Hilbert space 𝓗. Let {λ_j}, j = 1, …, N, be the distinct nonzero eigenvalues of K arranged in decreasing order of absolute values. For each j let M_j be the eigenspace of K corresponding to eigenvalue λ_j and P_j its projection operator, with the property P_iP_j = 0 for i ≠ j. Then:

1. If N < ∞, then K is an operator of finite rank, K = Σ_(j=1)^N λ_jP_j, and 𝓗 = M_0 ⊕ M_1 ⊕ ⋯ ⊕ M_N, or 1 = Σ_(j=0)^N P_j, where M_0 is infinite-dimensional.

2. If N = ∞, then λ_j → 0 as j → ∞, K = Σ_(j=1)^∞ λ_jP_j, and 𝓗 = M_0 ⊕ Σ_(j=1)^∞ ⊕M_j, or 1 = Σ_(j=0)^∞ P_j, where M_0 could be finite- or infinite-dimensional. Furthermore,

‖K − Σ_(j=1)^n λ_jP_j‖ = |λ_(n+1)| → 0  as n → ∞,

which shows that the infinite series above converges in the operator norm.
The eigenspaces of a compact hermitian operator are orthogonal and, by (2) of Theorem 16.7.5, span the entire space. By the Gram-Schmidt process, one can select an orthonormal basis for each eigenspace. We therefore have the following corollary.

16.7.6. Corollary. If K is a compact hermitian operator on a Hilbert space 𝓗, then the eigenvectors of K constitute an orthonormal basis for 𝓗.
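In finite dimensions, where every hermitian operator is compact, the statements of Theorem 16.7.5 and its corollary can be checked directly. A sketch (Python/NumPy; the matrix is our own random example, not from the text):

```python
import numpy as np

# For a real symmetric (hermitian) K, the eigenprojections P_j built from
# an orthonormal eigenbasis are mutually orthogonal, resolve the identity,
# and reassemble K as sum_j lambda_j P_j.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
K = (A + A.T) / 2                        # hermitian (real symmetric)

lam, U = np.linalg.eigh(K)               # columns of U: orthonormal eigenvectors
P = [np.outer(U[:, j], U[:, j]) for j in range(5)]

print(np.allclose(P[0] @ P[1], 0))                          # True: P_i P_j = 0
print(np.allclose(sum(P), np.eye(5)))                       # True: 1 = sum_j P_j
print(np.allclose(sum(l * p for l, p in zip(lam, P)), K))   # True: K = sum_j lam_j P_j
```

For a generic random matrix the eigenvalues are distinct, so each P_j is a rank-one projection onto a one-dimensional eigenspace.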
16.7.7. Theorem. Let K be a compact hermitian operator on a Hilbert space 𝓗 and let K = Σ_(j=1)^N λ_jP_j, where N could be infinite. A bounded linear operator on 𝓗 commutes with K if and only if it commutes with every P_j.

Proof. The "if" part is straightforward. So assume that the bounded operator T commutes with K. For |x⟩ ∈ M_j, we have (K − λ_j)T|x⟩ = T(K − λ_j)|x⟩ = 0. Similarly, (K − λ_j)T†|x⟩ = T†(K − λ_j)|x⟩ = 0, because 0 = [T, K]† = [T†, K]. These equations show that both T and T† leave M_j invariant. This means that M_j reduces T, and by Theorem 4.2.5, TP_j = P_jT. □
Next we prove the spectral theorem for a normal operator. Recall that any operator T can be written as T = T_r + iT_i, where T_r = ½(T + T†) and T_i = (1/2i)(T − T†) are hermitian, and since both T and T† are compact, T_r and T_i are compact as well. For normal operators, we have the extra condition that [T_r, T_i] = 0, which is equivalent to [T, T†] = 0. Let T_r = Σ_(j=1)^N λ_jP_j and T_i = Σ_(k=1)^M μ_kQ_k be the spectral decompositions of T_r and T_i. Using Theorem 16.7.7, it is straightforward to show that if [T_r, T_i] = 0, then [P_j, Q_k] = 0. Now, since 𝓗 = Σ_(j=0)^N ⊕M_j = Σ_(k=0)^M ⊕𝒩_k, where the M_j are the eigenspaces of T_r and the 𝒩_k those of T_i, we have, for any |x⟩ ∈ 𝓗,

T_r|x⟩ = (Σ_(j=1)^N λ_jP_j)(Σ_(k=0)^M Q_k|x⟩) = Σ_(j=1)^N Σ_(k=0)^M λ_jP_jQ_k|x⟩.

Similarly, T_i|x⟩ = T_i(Σ_(j=0)^N P_j|x⟩) = Σ_(k=1)^M Σ_(j=0)^N μ_kQ_kP_j|x⟩. Combining these two relations and noting that Q_kP_j = P_jQ_k gives

T|x⟩ = (T_r + iT_i)|x⟩ = Σ_(j=0)^N Σ_(k=0)^M (λ_j + iμ_k)P_jQ_k|x⟩.

The projection operators P_jQ_k project onto the intersection of M_j and 𝒩_k. Therefore, the M_j ∩ 𝒩_k are the eigenspaces of T. Only those terms in the sum for which M_j ∩ 𝒩_k ≠ 0 contribute. As before, we can order the eigenvalues according to their absolute values.
16.7.8. Theorem. (spectral theorem: compact normal operators) Let T be a compact normal operator on a Hilbert space 𝓗. Let {λ_j}, j = 1, …, N (where N can be ∞), be the distinct nonzero eigenvalues of T arranged in decreasing order of absolute values. For each n let M_n be the eigenspace of T corresponding to eigenvalue λ_n and P_n its projection operator, with the property P_mP_n = 0 for m ≠ n. Then:

1. If N < ∞, then T is an operator of finite rank, T = Σ_(j=1)^N λ_jP_j, and 𝓗 = M_0 ⊕ M_1 ⊕ ⋯ ⊕ M_N, or 1 = Σ_(j=0)^N P_j, where M_0 is infinite-dimensional.

2. If N = ∞, then λ_n → 0 as n → ∞, T = Σ_(n=1)^∞ λ_nP_n, and 𝓗 = M_0 ⊕ Σ_(n=1)^∞ ⊕M_n, or 1 = Σ_(j=0)^∞ P_j, where M_0 could be finite- or infinite-dimensional.
As in the case of a compact hermitian operator, by the Gram-Schmidt process, one can select an orthonormal basis for each eigenspace of a normal operator, in which case we have the following:

16.7.9. Corollary. If T is a compact normal operator on a Hilbert space 𝓗, then the eigenvectors of T constitute an orthonormal basis for 𝓗.
One can use Theorem 16.7.8 to write any function of a normal operator T as an expansion in terms of the projection operators of T. First we note that T^k has λ_n^k as its expansion coefficients. Next, we add various powers of T in the form of a polynomial and conclude that the expansion coefficients for a polynomial p(T) are p(λ_n). Finally, for any function f(T) we have

f(T) = Σ_(n=1)^∞ f(λ_n)P_n.    (16.8)
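Equation (16.8) is easy to verify in finite dimensions, where every normal operator is compact. The sketch below (Python/NumPy; the matrix and the function f are our own choices, not from the text) compares Σ_n f(λ_n)P_n with f(T) computed directly for a polynomial f:

```python
import numpy as np

# For a hermitian (hence normal) T with spectral projections P_n, the
# polynomial f(x) = x**3 + 2*x applied to T must equal sum_n f(lam_n) P_n.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
T = (A + A.T) / 2                      # hermitian example of a normal operator

lam, U = np.linalg.eigh(T)
f = lambda x: x**3 + 2 * x
f_T = sum(f(l) * np.outer(U[:, n], U[:, n]) for n, l in enumerate(lam))

print(np.allclose(f_T, T @ T @ T + 2 * T))   # True
```

The same construction defines non-polynomial functions of T (exponentials, resolvents, and so on) by applying f to the eigenvalues.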
Johann (John) von Neumann (1903-1957), the eldest of three sons of Max von Neumann, a well-to-do Jewish banker, was privately educated until he entered the gymnasium in 1914. His unusual mathematical abilities soon came to the attention of his teachers, who pointed out to his father that teaching him conventional school mathematics would be a waste of time; he was therefore tutored in mathematics under the guidance of university professors, and by the age of nineteen he was already recognized as a professional mathematician and had published his first paper.
Von Neumann was Privatdozent at Berlin from 1927 to 1929 and at Hamburg in 1929-1930, then went to Princeton University for three years; in 1933 he was invited to join the newly opened Institute for Advanced Study, of which he was the youngest permanent member at that time. At the outbreak of World War II, von Neumann was called upon to participate in various scientific projects related to the war effort: In particular, from 1943 he was a consultant on the construction of the atomic bomb at Los Alamos. After the war he retained his membership on numerous government boards and committees, and in 1954 he became a member of the Atomic Energy Commission. His health began to fail in 1955, and he died of cancer two years later.
It is only in comparison with the greatest mathematical geniuses of history that von Neumann's scope in pure mathematics may appear somewhat restricted; it was far beyond the range of most of his contemporaries, and his extraordinary work in applied mathematics, in which he certainly equals Gauss, Cauchy, or Poincaré, more than compensates for its limitations. Von Neumann's work in pure mathematics was accomplished between 1925 and 1940, when he seemed to be advancing at a breathless speed on all fronts of logic and analysis at once, not to speak of mathematical physics. The dominant theme in von Neumann's work is by far his work on the spectral theory of operators in Hilbert spaces. For twenty years he was the undisputed master in this area, which contains what is now considered his most profound and most original creation, the theory of rings of operators.
The first papers (1927) in which Hilbert space theory appears are those on the foundations of quantum mechanics. These investigations later led von Neumann to a systematic study of unbounded hermitian operators.
Von Neumann's most famous work in theoretical physics is his axiomatization of quantum mechanics. When he began work in that field in 1927, the methods used by its founders were hard to formulate in precise mathematical terms: "Operators" on "functions" were handled without much consideration of their domain of definition or their topological properties, and it was blithely assumed that such "operators," when self-adjoint, could always be "diagonalized" (as in the finite-dimensional case), at the expense of introducing Dirac delta functions as "eigenvectors." Von Neumann showed that mathematical rigor could be restored by taking as basic axioms the assumptions that the states of a physical system were points of a Hilbert space and that the measurable quantities were hermitian (generally unbounded) operators densely defined in that space.
After 1927 von Neumann also devoted much effort to more specific problems of quantum mechanics, such as the problem of measurement and the foundation of quantum statistics and quantum thermodynamics, proving in particular an ergodic theorem for quantum systems. All this work was developed and expanded in Mathematische Grundlagen der Quantenmechanik (1932), in which he also discussed the much-debated question of "causality" versus "indeterminacy" and concluded that no introduction of "hidden parameters" could keep the basic structure of quantum theory and restore "causality."
Von Neumann's uncommon grasp of applied mathematics, treated as a whole without divorcing theory from experimental realization, was nowhere more apparent than in his work on computers. He became interested in numerical computations in connection with the need for quick estimates and approximate results that developed with the technology used for the war effort (particularly the complex problems of hydrodynamics) and the completely new problems presented by the harnessing of nuclear energy, for which no ready-made theoretical solutions were available. Von Neumann's extraordinary ability for rapid mental calculation was legendary. The story is told of a friend who brought him a simple kinematics problem. Two trains, a certain given distance apart, move toward each other at a given speed. A fly, initially on the windshield of one of the trains, flies back and forth between them, again at a known constant speed. When the trains collide, how far has the fly traveled? One way to solve the problem is to add up all the successively smaller distances in each individual flight. (The easy way is to multiply the fly's speed by the time elapsed until the crash.) After a few seconds of thought, von Neumann quickly gave the correct answer.
"That's strange," remarked his friend. "Most people try to sum the infinite series."
"What's strange about that?" von Neumann replied. "That's what I did."
In closing this section, let us remark that the paradigm of compact operators, namely the Hilbert-Schmidt operator, is such because it is defined on the finite rectangle [a, b] × [a, b]. If this rectangle grows beyond limit, or equivalently, if the Hilbert space is 𝓛²(R_∞), where R_∞ is some infinite region of the real line, then the compactness property breaks down, as the following example illustrates.
16.7.10. Example. Consider the two kernels

K_1(x, t) = e^(−|x−t|)  and  K_2(x, t) = sin xt,

where the first one acts on 𝓛²(−∞, ∞) and the second one on 𝓛²(0, ∞). One can show (see Problem 16.8) that the first kernel has the eigenfunctions

e^(iαx),  α ∈ ℝ,

corresponding to the eigenvalues

λ = 2/(1 + α²),  α ∈ ℝ,

and that the second kernel has a family of eigenfunctions, one for each a > 0, all corresponding to the single eigenvalue √(π/2).
We see that in the first case, all real numbers between 0 and 2 are eigenvalues, rendering this set uncountable. In the second case, there are infinitely (in fact, uncountably) many eigenfunctions (one for each a) corresponding to the single eigenvalue √(π/2). Note, however, that in the first case the eigenfunctions, and in the second case the kernel, have infinite norms. ■
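The eigenvalue relation for the first kernel can be spot-checked numerically by truncating the infinite integral to a large finite interval (our own sketch; the values of a and x are arbitrary choices):

```python
import numpy as np

# Check that integral of exp(-|x - t|) * e^{i a t} dt over the real line
# equals (2 / (1 + a^2)) * e^{i a x}.  The kernel decays exponentially, so
# truncating to [-60, 60] costs only an error of order exp(-59).
a, x = 1.5, 0.7
t = np.arange(-60.0, 60.0, 0.001)
integrand = np.exp(-np.abs(x - t)) * np.exp(1j * a * t)
lhs = integrand.sum() * 0.001                    # Riemann sum for the integral
rhs = (2.0 / (1.0 + a**2)) * np.exp(1j * a * x)

print(abs(lhs - rhs) < 1e-4)   # True
```

The eigenfunction e^(iax) itself has infinite 𝓛² norm, which is exactly why these α-labeled "eigenvalues" fill a continuum rather than a countable set.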
16.8 Resolvents
The discussion of the preceding section showed that the spectrum of a normal compact operator is countable. Removing the compactness property will in general remove countability, as shown in Example 16.7.10. We have also seen that the right-shift operator, a bounded operator, has uncountably many points in its spectrum. We therefore expect that the sums in Theorem 16.7.8 should be replaced by integrals in the spectral decomposition theorem for (noncompact) bounded operators. We shall not discuss the spectral theorem for general operators. However, one special class of noncompact operators is essential for the treatment of Sturm-Liouville theory (to be studied in Chapters 18 and 19). For these operators, the concept of resolvent will be used, which we develop in this section. This concept also makes a connection between the countable (algebraic) and the uncountable (analytic) cases.
16.8.1. Definition. Let T be an operator and λ ∈ ρ(T). The operator R_λ(T) ≡ (T − λ1)⁻¹ is called the resolvent of T at λ.

There are two important properties of the resolvent that are useful in analyzing the spectrum of operators. Let us assume that λ, μ ∈ ρ(T), λ ≠ μ, and take the difference between their resolvents. Problem 16.9 shows how to obtain the following relation:

R_λ(T) − R_μ(T) = (λ − μ)R_λ(T)R_μ(T).    (16.9)
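Equation (16.9) is easy to test numerically in finite dimensions. The following sketch (Python/NumPy; the matrix and the two points are our own arbitrary choices, not from the text) checks the identity for a random matrix:

```python
import numpy as np

# First resolvent identity: R_lam - R_mu = (lam - mu) R_lam R_mu,
# with R_lam(T) = (T - lam*1)^{-1} and lam, mu in the resolvent set.
rng = np.random.default_rng(3)
T = rng.standard_normal((4, 4))
lam, mu = 2.5 + 1.0j, -0.7 + 0.3j      # off the (generically real) spectrum

R = lambda z: np.linalg.inv(T - z * np.eye(4))
lhs = R(lam) - R(mu)
rhs = (lam - mu) * R(lam) @ R(mu)

print(np.allclose(lhs, rhs))   # True
```

The identity also shows that the two resolvents commute, since the right-hand side is symmetric under exchanging λ and μ up to an overall sign.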
To obtain the second property of the resolvent, we formally (and indefinitely) differentiate R_λ(T) with respect to λ and evaluate the result at λ = μ. Dividing both sides of Equation (16.9) by λ − μ and taking the limit λ → μ gives dR_λ(T)/dλ = R_λ²(T). Differentiating both sides of this equation, we get 2R_λ³(T), and in general,

d^nR_λ(T)/dλ^n = n! R_λ^(n+1)(T)  ⟹  d^nR_λ(T)/dλ^n |_(λ=μ) = n! R_μ^(n+1)(T)   for n = 0, 1, …    (16.10)

Assuming that the Taylor series expansion exists, we may write

R_λ(T) = Σ_(n=0)^∞ (λ − μ)^n R_μ^(n+1)(T),    (16.11)

which is the second property of the resolvent.
We now look into the spectral decomposition from an analytical viewpoint. For convenience, we concentrate on the finite-dimensional case and let A be an arbitrary (not necessarily hermitian) N×N matrix. Let $\lambda$ be a complex number that is larger (in absolute value) than any of the eigenvalues of A; since all operators on finite-dimensional vector spaces are compact, Lemma 16.7.4 assures us that it suffices to take $|\lambda| > \|A\|$. It is then possible to expand $R_\lambda(A)$ in a convergent power series as follows:
$$R_\lambda(A) = (A - \lambda 1)^{-1} = -\frac{1}{\lambda} \sum_{n=0}^\infty \left(\frac{A}{\lambda}\right)^n.$$
This is the Laurent expansion of $R_\lambda(A)$. We can immediately read off the residue of $R_\lambda(A)$ (the coefficient of $1/\lambda$):
$$\operatorname{Res}[R_\lambda(A)] = -1 \implies -\frac{1}{2\pi i} \oint_\Gamma R_\lambda(A)\, d\lambda = 1,$$
where $\Gamma$ is a circle with its center at the origin and a radius large enough to encompass all the eigenvalues of A [see Figure 16.3(a)]. A similar argument shows that
$$-\frac{1}{2\pi i} \oint_\Gamma \lambda\, R_\lambda(A)\, d\lambda = A,$$
and in general,
$$-\frac{1}{2\pi i} \oint_\Gamma \lambda^n R_\lambda(A)\, d\lambda = A^n.$$
Using this and assuming that we can expand the function f(A) in a power series, we get
$$-\frac{1}{2\pi i} \oint_\Gamma f(\lambda)\, R_\lambda(A)\, d\lambda = f(A).$$
Writing this equation in the form
$$\frac{1}{2\pi i} \oint_\Gamma \frac{f(\lambda)}{\lambda 1 - A}\, d\lambda = f(A) \tag{16.12}$$
Figure 16.3 (a) The large circle encompassing all eigenvalues; (b) the deformed contour consisting of small circles orbiting the eigenvalues.
makes it recognizable as the generalization of the Cauchy integral formula to
operator-valued functions. To use any of the above integral formulas, we must
know the analytic behavior of $R_\lambda(A)$. From the formula for the inverse of a matrix given in Chapter 3, we have
$$[R_\lambda(A)]_{jk} = \frac{C_{jk}(\lambda)}{p(\lambda)},$$
where $C_{jk}(\lambda)$ is the cofactor of the $kj$th element of the matrix $A - \lambda 1$ and $p(\lambda)$ is the characteristic polynomial of A. Clearly, $C_{jk}(\lambda)$ is also a polynomial. Thus, $[R_\lambda(A)]_{jk}$ is a rational function of $\lambda$. It follows that $R_\lambda(A)$ has only poles as singularities (see Example 10.2.2). The poles are simply the zeros of the denominator, i.e., the eigenvalues of A. We can deform the contour $\Gamma$ in such a way that it consists of small circles $\gamma_j$ that encircle the isolated eigenvalues $\lambda_j$ [see Figure 16.3(b)].
Then, with f(A) = 1, Equation (16.12) yields
$$P_j \equiv -\frac{1}{2\pi i} \oint_{\gamma_j} R_\lambda(A)\, d\lambda. \tag{16.13}$$
It can be shown (see Example 16.8.2 below) that $\{P_j\}$ is a set of orthogonal projection operators. Thus, Equation (16.13) is a resolution of identity, as specified in the spectral decomposition theorem in Chapter 4.
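The contour-integral formula (16.13) can be tried out numerically by discretizing the circle $\gamma_j$ around each eigenvalue. The sketch below (the test matrix, contour radius, and grid size are ad hoc choices) checks idempotency, mutual orthogonality, and the resolution of identity:

```python
import numpy as np

# Eq. (16.13): P_j = -(1/(2*pi*i)) * contour integral of R_lambda(A) around
# lambda_j, discretized with the trapezoidal rule on a small circle.
A = np.array([[2.0, 1.0],
              [0.0, 5.0]])          # eigenvalues 2 and 5
I = np.eye(2)

def projection(A, lam_j, radius=0.5, n=2000):
    theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
    lam = lam_j + radius * np.exp(1j * theta)            # points on gamma_j
    dlam = 1j * radius * np.exp(1j * theta) * (2 * np.pi / n)
    R = sum(np.linalg.inv(A - l * I) * dl for l, dl in zip(lam, dlam))
    return -R / (2j * np.pi)

P1 = projection(A, 2.0)
P2 = projection(A, 5.0)
assert np.allclose(P1 @ P1, P1, atol=1e-8)                 # idempotent
assert np.allclose(P1 @ P2, np.zeros((2, 2)), atol=1e-8)   # orthogonal
assert np.allclose(P1 + P2, I, atol=1e-8)                  # sum to 1
```

The trapezoidal rule on a closed circular contour converges very rapidly for analytic integrands, which is why a plain uniform discretization suffices here.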
16.8.2. Example. We want to show that the $P_j$ are projection operators. First let $i = j$. Then¹²
$$P_j^2 = \left(-\frac{1}{2\pi i}\right)^2 \oint_{\gamma_j} R_\lambda(A)\, d\lambda \oint_{\gamma_j} R_\mu(A)\, d\mu.$$
Note that $\lambda$ need not be equal to $\mu$. In fact, we are free to choose $|\lambda - \lambda_j| > |\mu - \lambda_j|$, i.e., to let the circle corresponding to the $\lambda$ integration be outside that of the $\mu$ integration.¹³ We can then rewrite the above double integral as
$$P_j^2 = \left(-\frac{1}{2\pi i}\right)^2 \oint_{\gamma_j^{(\lambda)}} \oint_{\gamma_j^{(\mu)}} R_\lambda(A) R_\mu(A)\, d\lambda\, d\mu
= \left(-\frac{1}{2\pi i}\right)^2 \oint_{\gamma_j^{(\lambda)}} \oint_{\gamma_j^{(\mu)}} \frac{R_\lambda(A) - R_\mu(A)}{\lambda - \mu}\, d\lambda\, d\mu$$
$$= \left(-\frac{1}{2\pi i}\right)^2 \left\{ \oint_{\gamma_j^{(\lambda)}} R_\lambda(A)\, d\lambda \oint_{\gamma_j^{(\mu)}} \frac{d\mu}{\lambda - \mu} - \oint_{\gamma_j^{(\mu)}} R_\mu(A)\, d\mu \oint_{\gamma_j^{(\lambda)}} \frac{d\lambda}{\lambda - \mu} \right\},$$
where we used Equation (16.9) to go to the second line. Now note that
$$\oint_{\gamma_j^{(\mu)}} \frac{d\mu}{\lambda - \mu} = 0 \qquad\text{and}\qquad \oint_{\gamma_j^{(\lambda)}} \frac{d\lambda}{\lambda - \mu} = 2\pi i,$$
because $\lambda$ lies outside $\gamma_j^{(\mu)}$ and $\mu$ lies inside $\gamma_j^{(\lambda)}$. Hence,
$$P_j^2 = \left(-\frac{1}{2\pi i}\right)^2 \left\{ 0 - 2\pi i \oint_{\gamma_j^{(\mu)}} R_\mu(A)\, d\mu \right\} = -\frac{1}{2\pi i} \oint_{\gamma_j} R_\mu(A)\, d\mu = P_j.$$
The remaining part, namely $P_j P_k = 0$ for $k \neq j$, can be done similarly (see Problem 16.10). ∎
Now we let $f(\lambda) = \lambda$ in Equation (16.12), deform the contour as above, and write
$$A = \sum_{j=1}^r \left(\lambda_j P_j + D_j\right), \qquad D_j \equiv -\frac{1}{2\pi i} \oint_{\gamma_j} (\lambda - \lambda_j)\, R_\lambda(A)\, d\lambda. \tag{16.14}$$
¹²We have not discussed multiple integrals of complex functions. A rigorous study of such integrals involves the theory of functions of several complex variables, a subject we have to avoid due to lack of space. However, in the simple case at hand, the theory of real multiple integrals is an honest guide.
¹³This is possible because the poles are isolated.
It can be shown (see Problem 16.11) that
$$D_j^n = -\frac{1}{2\pi i} \oint_{\gamma_j} (\lambda - \lambda_j)^n R_\lambda(A)\, d\lambda.$$
In particular, since $R_\lambda(A)$ has only poles as singularities, there exists a positive integer m such that $D_j^m = 0$. We have not yet made any assumptions about A. If we assume that A is hermitian, for example, then $R_\lambda(A)$ will have simple poles (see Problem 16.12). It follows that $(\lambda - \lambda_j) R_\lambda(A)$ will be analytic at $\lambda_j$ for all $j = 1, 2, \ldots, r$, and $D_j = 0$ in Equation (16.14). We thus have
$$A = \sum_{j=1}^r \lambda_j P_j,$$
which is the spectral decomposition discussed in Chapter 4. Problem 16.13 shows that the $P_j$ are hermitian.
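For a hermitian matrix this decomposition is easy to exhibit concretely. The sketch below (an arbitrary 2×2 hermitian test matrix; `eigh` stands in for the contour integrals) builds the eigenprojections and checks the properties just stated:

```python
import numpy as np

# Spectral decomposition A = sum_j lambda_j P_j for a hermitian A,
# with P_j = |u_j><u_j| built from orthonormal eigenvectors.
A = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])       # hermitian; eigenvalues 1 and 4
vals, vecs = np.linalg.eigh(A)

projs = [np.outer(v, v.conj()) for v in vecs.T]     # P_j = |u_j><u_j|
recon = sum(l * P for l, P in zip(vals, projs))
assert np.allclose(recon, A)                        # A = sum lambda_j P_j
for P in projs:
    assert np.allclose(P, P.conj().T)               # each P_j hermitian
    assert np.allclose(P @ P, P)                    # each P_j idempotent
```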
16.8.3. Example. The most general 2×2 hermitian matrix is of the form
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{12}^* & a_{22} \end{pmatrix},$$
where $a_{11}$ and $a_{22}$ are real numbers. Thus,
$$\det(A - \lambda 1) = \lambda^2 - (a_{11} + a_{22})\lambda + a_{11} a_{22} - |a_{12}|^2,$$
which has roots
$$\lambda_1 = \tfrac{1}{2}\Big[a_{11} + a_{22} - \sqrt{(a_{11} - a_{22})^2 + 4|a_{12}|^2}\Big], \qquad
\lambda_2 = \tfrac{1}{2}\Big[a_{11} + a_{22} + \sqrt{(a_{11} - a_{22})^2 + 4|a_{12}|^2}\Big].$$
The inverse of $A - \lambda 1$ can immediately be written:
$$R_\lambda(A) = (A - \lambda 1)^{-1} = \frac{1}{\det(A - \lambda 1)} \begin{pmatrix} a_{22} - \lambda & -a_{12} \\ -a_{12}^* & a_{11} - \lambda \end{pmatrix}
= \frac{1}{(\lambda - \lambda_1)(\lambda - \lambda_2)} \begin{pmatrix} a_{22} - \lambda & -a_{12} \\ -a_{12}^* & a_{11} - \lambda \end{pmatrix}.$$
We want to verify that $R_\lambda(A)$ has only simple poles. Two cases arise:
1. If $\lambda_1 \neq \lambda_2$, then it is clear that $R_\lambda(A)$ has simple poles.
2. If $\lambda_1 = \lambda_2$, it appears that $R_\lambda(A)$ may have a pole of order 2. However, note that if $\lambda_1 = \lambda_2$, then the square roots in the above equations must vanish. This happens iff $a_{11} = a_{22} \equiv a$ and $a_{12} = 0$. It then follows that $\lambda_1 = \lambda_2 = a$, and
$$R_\lambda(A) = \frac{1}{(\lambda - a)^2} \begin{pmatrix} a - \lambda & 0 \\ 0 & a - \lambda \end{pmatrix} = \frac{1}{a - \lambda}\, 1.$$
This clearly shows that $R_\lambda(A)$ has only simple poles in this case. ∎
Jordan canonical
form
If A is not hermitian, $D_j \neq 0$; however, $D_j$ is nevertheless nilpotent. That is, $D_j^m = 0$ for some positive integer m. This property and Equation (16.14) can be used to show that A can be cast into a Jordan canonical form via a similarity transformation. That is, there exists an N×N matrix S such that
$$S A S^{-1} = J = \begin{pmatrix} J_1 & 0 & \cdots & 0 \\ 0 & J_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & J_m \end{pmatrix},$$
where $J_k$ is a matrix of the form
$$J_k = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda & 1 & \cdots & 0 & 0 \\ 0 & 0 & \lambda & \ddots & 0 & 0 \\ \vdots & & & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda \end{pmatrix},$$
in which $\lambda$ is one of the eigenvalues of A. Different $J_k$ may contain the same eigenvalue of A. For a discussion of the Jordan canonical form of a matrix, see [Birk 77], [Denn 67], or [Halm 58].
16.9 Problems
16.1. Suppose that S is a bounded operator, T an invertible operator, and that
$$\|S - T\| < \frac{1}{\|T^{-1}\|}.$$
Show that S is invertible. Hint: Show that $T^{-1}S$ is invertible. Thus, an operator that is "sufficiently close" to an invertible operator is invertible.
16.2. Let V and W be finite-dimensional vector spaces. Show that $T \in \mathcal{L}(V, W)$ is necessarily bounded.
16.3. Let $\mathcal{H}$ be a Hilbert space, and $T \in \mathcal{L}(\mathcal{H})$ an isometry, i.e., a linear operator that does not change the norm of any vector. Show that $\|T\| = 1$.
16.4. Show that (a) the unit operator is not compact, and that (b) the inverse of a compact operator cannot be bounded. Hint: For (b) use the results of Example 16.5.3.
16.5. Prove Corollary 16.6.7. Hint: Let $|x\rangle \in \mathcal{N}_\lambda^{(q+1)}$ and write it as $|x\rangle = |n\rangle + |r\rangle$ with $|n\rangle \in \mathcal{N}_\lambda^{(q)}$ and $|r\rangle \in \mathcal{R}_\lambda^{(q)}$. Apply $(K - \lambda 1)^{q+1}$ to $|r\rangle$, and invoke part (3) of Theorem 16.6.6 to show that $|r\rangle \in \mathcal{N}_\lambda^{(q)}$. Conclude that $|r\rangle = 0$, $\mathcal{N}_\lambda^{(q+1)} = \mathcal{N}_\lambda^{(q)}$, and $q \geq p$. To establish the reverse inequality, apply $(K - \lambda 1)^p$ to both sides of the direct sum of part (1) of Theorem 16.6.6, and notice that the LHS is $\mathcal{R}_\lambda^{(p)}$, the second term of the RHS is zero, and the first term is $\mathcal{R}_\lambda^{(q+p)}$. Now conclude that $p \geq q$.
16.6. Let $|u\rangle \in \mathcal{H}$ and let $\mathcal{M}$ be a subspace of $\mathcal{H}$. Show that the subset $E = |u\rangle - \mathcal{M}$ is convex. Show that E is not necessarily a subspace of $\mathcal{H}$.
16.7. Show that for any hermitian operator H, we have
$$4\langle Hx|y\rangle = \langle H(x+y)|x+y\rangle - \langle H(x-y)|x-y\rangle + i\big[\langle H(x+iy)|x+iy\rangle - \langle H(x-iy)|x-iy\rangle\big].$$
Now let $|x\rangle = \lambda|z\rangle$ and $|y\rangle = H|z\rangle/\lambda$, where $\lambda = (\|Hz\|/\|z\|)^{1/2}$, and show that
$$\|Hz\|^2 = \langle Hx|y\rangle \leq M \|z\| \|Hz\|,$$
where $M = \max\{|\langle Hz|z\rangle| / \|z\|^2\}$. Now conclude that $\|H\| \leq M$.
16.8. Show that the two kernels $K_1(x,t) = e^{-|x-t|}$ and $K_2(x,t) = \sin xt$, where the first one acts on $\mathcal{L}^2(-\infty, \infty)$ and the second one on $\mathcal{L}^2(0, \infty)$, have the two eigenfunctions
$$e^{iat}, \quad a \in \mathbb{R}, \qquad\text{and}\qquad e^{-at} + \sqrt{\frac{2}{\pi}}\, \frac{t}{a^2 + t^2}, \quad a > 0,$$
respectively, corresponding to the two eigenvalues
$$\lambda = \frac{2}{1 + a^2} \qquad\text{and}\qquad \lambda = \sqrt{\frac{\pi}{2}}.$$
16.9. Derive Equation (16.9). Hint: Multiply $R_\lambda(T)$ by $1 = R_\mu(T)(T - \mu 1)$ and $R_\mu(T)$ by $1 = R_\lambda(T)(T - \lambda 1)$.
16.10. Finish Example 16.8.2 by showing that $P_j P_k = 0$ for $k \neq j$.
16.11. Show that $D_j^n = -\frac{1}{2\pi i} \oint_{\gamma_j} (\lambda - \lambda_j)^n R_\lambda(A)\, d\lambda$. Hint: Use mathematical induction and the technique used in Example 16.8.2.
16.12. (a) Take the inner product of $|u\rangle = (A - \lambda 1)|v\rangle$ with $|v\rangle$ and show that for a hermitian A, $\operatorname{Im}\langle v|u\rangle = -(\operatorname{Im}\lambda)\|v\|^2$. Now use the Schwarz inequality to obtain
$$\|v\| \leq \frac{\|u\|}{|\operatorname{Im}\lambda|} \implies \big\|R_\lambda(A)|u\rangle\big\| \leq \frac{\|u\|}{|\operatorname{Im}\lambda|}.$$
(b) Use this result to show that
$$\big\|R_\lambda(A)|u\rangle\big\| \leq \frac{\|u\|}{|\lambda - \lambda_j|\, |\sin\theta|},$$
where $\theta$ is the angle that $\lambda - \lambda_j$ makes with the real axis and $\lambda$ is chosen to have an imaginary part. From this result conclude that $R_\lambda(A)$ has a simple pole when A is hermitian.
16.13. (a) Show that when A is hermitian, $[R_\lambda(A)]^\dagger = R_{\lambda^*}(A)$.
(b) Write $\lambda - \lambda_j = r_j e^{i\theta}$ in the definition of $P_j$ in Equation (16.13). Take the hermitian conjugate of both sides and use (a) to show that $P_j$ is hermitian. Hint: You will have to change the variable of integration a number of times.
Additional Readings
1. DeVito, C. Functional Analysis and Linear Operator Theory, Addison-
Wesley, 1990. Our treatment of compact operators follows this reference's
discussion.
2. Glimm, J. and Jaffe, A. Quantum Physics, 2nd ed., Springer-Verlag, 1987. One of the most mathematical treatments of the subject, and therefore a good introduction to operator theory (see the appendix to Part I).
3. Reed, M. and Simon, B. FourierAnalysis, Self-Adjointness, Academic Press,
1980.
4. Richtmyer, R. Principles of Advanced Mathematical Physics, Springer-
Verlag, 1978. Discusses resolvents in detail.
5. Zeidler, E. Applied Functional Analysis, Springer-Verlag, 1995.
17
Integral Equations
The beginning of Chapter 16 showed that to solve a vector-operator equation one transforms it into an equation involving a sum over a discrete index [the matrix equation of Equation (16.1)], or an equation involving an integral over a continuous index [Equation (16.2)]. The latter is called an integral equation, which we shall investigate here using the machinery of Chapter 16.
17.1 Classification
Volterra and
Fredholm equations
offirstand second
kind
Integral equations can be divided into two major groups. Those that have a variable
limit of integration are called Volterra equations; those that have constant limits
of integration are called Fredholm equations. If the unknown function appears
only inside the integral, the integral equation is said to be of the first kind. Integral
equations having the unknown function outside the integral as well as inside are
said to be of the second kind. The four kinds of equations can be written as
follows.
Volterra equation of the 1st kind: $\displaystyle\int_a^x K(x,t)\, u(t)\, dt = v(x)$,

Volterra equation of the 2nd kind: $\displaystyle u(x) = v(x) + \int_a^x K(x,t)\, u(t)\, dt$,

Fredholm equation of the 1st kind: $\displaystyle\int_a^b K(x,t)\, u(t)\, dt = v(x)$,

Fredholm equation of the 2nd kind: $\displaystyle u(x) = v(x) + \int_a^b K(x,t)\, u(t)\, dt$.
In all these equations, K (x, t) is called the kernel of the integral equation.
In the theory of integral equations of the second kind, one usually multiplies the integral by a nonzero complex number $\lambda$. Thus, the Fredholm equation of the second kind becomes
$$u(x) = v(x) + \lambda \int_a^b K(x,t)\, u(t)\, dt, \tag{17.1}$$
and for the Volterra equation of the second kind one obtains
$$u(x) = v(x) + \lambda \int_a^x K(x,t)\, u(t)\, dt. \tag{17.2}$$
characteristic value of an integral equation
A $\lambda$ that satisfies (17.1) with $v(x) = 0$ is called a characteristic value of the integral equation. In the abstract operator language both equations are written as
$$|u\rangle = |v\rangle + \lambda K|u\rangle \implies (K - \lambda^{-1} 1)|u\rangle = -\lambda^{-1}|v\rangle. \tag{17.3}$$
Thus $\lambda$ is a characteristic value for (17.1) if and only if $\lambda^{-1}$ is an eigenvalue of K. Recall that when the interval of integration (a, b) is finite, K(x,t) is called a Hilbert-Schmidt kernel. Example 16.5.10 showed that K is a compact operator, and by Theorem 16.6.9, the eigenvalues of K either form a finite set or a sequence that converges to zero.
17.1.1. Theorem. The characteristic values of a Fredholm equation of the second kind either form a finite set or a sequence of complex numbers increasing beyond limit in absolute value.
Our main task in this chapter is to study methods of solving integral equations of the second kind. We treat the Volterra equation first because it is easier to solve. Let us introduce the notation
$$K[u](x) \equiv \int_a^x K(x,t)\, u(t)\, dt \qquad\text{and}\qquad K^n[u](x) = K\big[K^{n-1}[u]\big](x), \tag{17.4}$$
whereby $K[u]$ denotes a function whose value at x is given by the integral on the RHS of the first equation in (17.4). One can show with little difficulty that the associated operator K is compact. Let $M = \max\{|K(x,t)| \mid a \leq t \leq x \leq b\}$ and note that
$$\big|\lambda K[u](x)\big| = \left|\lambda \int_a^x K(x,t)\, u(t)\, dt\right| \leq |\lambda|\, M\, \|u\|_\infty (x - a),$$
where $\|u\|_\infty \equiv \max\{|u(x)| \mid x \in (a, b)\}$. Using mathematical induction, one can show that (see Problem 17.1)
$$\big|(\lambda K)^n[u](x)\big| \leq |\lambda|^n M^n \|u\|_\infty \frac{(x-a)^n}{n!}. \tag{17.5}$$
Since $b \geq x$, we can replace x with b and still satisfy the inequality. Then the inequality of Equation (17.5) will hold for all x, and we can write it as an operator norm inequality: $\|(\lambda K)^n\| \leq |\lambda|^n M^n (b-a)^n / n!$. Therefore,
Volterra equation of the second kind has a unique solution and no nonzero characteristic value
and the series $\sum_{n=0}^\infty (\lambda K)^n$ converges for all $\lambda$. In fact, a direct calculation shows that the series converges to the inverse of $1 - \lambda K$. Thus, the latter is invertible and the spectrum of K has no nonzero points. We have just shown the following.
17.1.2. Theorem. The Volterra equation of the second kind has no nonzero characteristic value. In particular, the operator $1 - \lambda K$ is invertible, and the Volterra equation of the second kind always has a unique solution given by the convergent infinite series $u(x) = \sum_{j=0}^\infty \lambda^j \int_a^x K^j(x,t)\, v(t)\, dt$, where $K^j(x,t)$ is defined inductively in Equation (17.4).
Vito Volterra (1860-1940) was only 11 when he became interested in mathematics while reading Legendre's Geometry. At the age of 13 he began to study the three-body problem and made some progress. His family were extremely poor (his father had died when Vito was two years old), but after attending lectures at Florence he was able to proceed to Pisa in 1878. At Pisa he studied under Betti, graduating as a doctor of physics in 1882. His thesis on hydrodynamics included some results of Stokes, discovered later but independently by Volterra.
He became Professor of Mechanics at Pisa in 1883, and upon Betti's death, he occupied the chair of mathematical physics. After spending some time at Turin as the chair of mechanics, he was awarded the chair of mathematical physics at the University of Rome in 1900.
Volterra conceived the idea of a theory of functions that depend on a continuous set of values of another function in 1883. Hadamard was later to introduce the word "functional," which replaced Volterra's original terminology. In 1890 Volterra used his functional calculus to show that the theory of Hamilton and Jacobi for the integration of the differential equations of dynamics could be extended to other problems of mathematical physics.
His most famous work was done on integral equations. He began this study in 1884, and in 1896 he published several papers on what is now called the Volterra integral equation. He continued to study functional analysis applications to integral equations, producing a large number of papers on composition and permutable functions.
During the First World War Volterra joined the Air Force. He made many journeys to France and England to promote scientific collaboration. After the war he returned to the University of Rome, and his interests moved to mathematical biology. He studied the Verhulst equation and the logistic curve. He also wrote on predator-prey equations.
In 1922 Fascism seized Italy, and Volterra fought against it in the Italian Parliament. However, by 1930 the Parliament was abolished, and when Volterra refused to take an oath
$u(a) = c_1$, $u'(a) = c_2$.
of allegiance to the Fascist government in 1931, he was forced to leave the University of Rome. From the following year he lived mostly abroad, mainly in Paris, but also in Spain and other countries.
17.1.3. Example. Differential equations can be transformed into integral equations. For instance, consider the SOLDE
$$\frac{d^2u}{dx^2} + p_1(x)\frac{du}{dx} + p_0(x)\, u = r(x), \qquad u(a) = c_1, \quad u'(a) = c_2.$$
By integrating the DE once, we obtain
$$\frac{du}{dx} = -\int_a^x p_1(t)\, u'(t)\, dt - \int_a^x p_0(t)\, u(t)\, dt + \int_a^x r(t)\, dt + c_2.$$
Integrating the first integral by parts gives
$$u'(x) = -p_1(x)\, u(x) + \underbrace{\int_a^x [p_1'(t) - p_0(t)]\, u(t)\, dt}_{\equiv f(x)} + \underbrace{\int_a^x r(t)\, dt}_{\equiv g(x)} + p_1(a)\, c_1 + c_2.$$
Integrating once more yields
$$u(x) = -\int_a^x p_1(t)\, u(t)\, dt + \int_a^x f(s)\, ds + \int_a^x g(s)\, ds + (x-a)[p_1(a) c_1 + c_2] + c_1$$
$$= -\int_a^x p_1(t)\, u(t)\, dt + \int_a^x ds \int_a^s [p_1'(t) - p_0(t)]\, u(t)\, dt + \int_a^x ds \int_a^s r(t)\, dt + (x-a)[p_1(a) c_1 + c_2] + c_1$$
$$= \int_a^x \big\{(x-t)[p_1'(t) - p_0(t)] - p_1(t)\big\}\, u(t)\, dt + \int_a^x (x-t)\, r(t)\, dt + (x-a)[p_1(a) c_1 + c_2] + c_1, \tag{17.6}$$
where we have used the formula
$$\int_a^x ds \int_a^s f(t)\, dt = \int_a^x (x-t)\, f(t)\, dt,$$
which the reader may verify by interchanging the order of integration on the LHS. Equation (17.6) is a Volterra equation of the second kind with kernel
$$K(x,t) \equiv (x-t)[p_1'(t) - p_0(t)] - p_1(t)$$
and $v(x) \equiv \int_a^x (x-t)\, r(t)\, dt + (x-a)[p_1(a) c_1 + c_2] + c_1$. ∎

Neumann series solution
We now outline a systematic approach to obtaining the infinite series of Theorem 17.1.2, which also works for the Fredholm equation of the second kind, as we shall see in the next section. In the latter case, the series is guaranteed to converge only if $|\lambda| \|K\| < 1$. This approach has the advantage that in each successive step, we obtain a better approximation to the solution. Writing the equation as
$$|u\rangle = |v\rangle + \lambda K|u\rangle, \tag{17.7}$$
we can interpret it as follows. The difference between $|u\rangle$ and $|v\rangle$ is $\lambda K|u\rangle$. If $\lambda K$ were absent, the two vectors $|u\rangle$ and $|v\rangle$ would be equal. The effect of $\lambda K$ is to change $|u\rangle$ in such a way that when the result is added to $|v\rangle$, it gives $|u\rangle$. As our initial approximation, therefore, we take $|u\rangle$ to be equal to $|v\rangle$ and write $|u_0\rangle = |v\rangle$, where the index reminds us of the order (in this case zeroth, because $\lambda K$ is set to zero) of the approximation. To find a better approximation, we always substitute the latest approximation for $|u\rangle$ in the RHS of Equation (17.7). At this stage, we have $|u_1\rangle = |v\rangle + \lambda K|u_0\rangle = |v\rangle + \lambda K|v\rangle$. Still a better approximation is achieved if we substitute this expression in (17.7):
$$|u_2\rangle = |v\rangle + \lambda K|u_1\rangle = |v\rangle + \lambda K|v\rangle + \lambda^2 K^2|v\rangle.$$
The procedure is now clear. Once $|u_n\rangle$, the nth approximation, is obtained, we can get $|u_{n+1}\rangle$ by substituting in the RHS of (17.7).
Before continuing, let us write the above equations in integral form. In what follows, we shall concentrate on the Fredholm equation. To obtain the result for the Volterra equation, one simply replaces b, the upper limit of integration, with x. The first approximation can be obtained by substituting v(t) for u(t) on the RHS of Equation (17.1). This yields
$$u_1(x) = v(x) + \lambda \int_a^b K(x,t)\, v(t)\, dt.$$
Substituting this back in Equation (17.1) gives
$$u_2(x) = v(x) + \lambda \int_a^b ds\, K(x,s)\, u_1(s)
= v(x) + \lambda \int_a^b ds\, K(x,s)\, v(s) + \lambda^2 \int_a^b dt \left[\int_a^b K(x,s) K(s,t)\, ds\right] v(t)$$
$$= v(x) + \lambda \int_a^b dt\, K(x,t)\, v(t) + \lambda^2 \int_a^b dt\, K^2(x,t)\, v(t),$$
where $K^2(x,t) \equiv \int_a^b K(x,s) K(s,t)\, ds$. Similar expressions can be derived for $u_3(x)$, $u_4(x)$, and so forth. The integrals expressing various "powers" of K can be obtained using Dirac notation and vectors with continuous indices, as discussed
in Chapter 6. Thus, for instance,
$$K^3(x,t) \equiv \langle x| K \left(\int_a^b |s_1\rangle\langle s_1|\, ds_1\right) K \left(\int_a^b |s_2\rangle\langle s_2|\, ds_2\right) K |t\rangle.$$
In general,
$$|u_n\rangle = |v\rangle + \lambda K|v\rangle + \cdots + \lambda^n K^n|v\rangle = \sum_{j=0}^n (\lambda K)^j |v\rangle, \tag{17.8}$$
whose integral form is
$$u_n(x) = \sum_{j=0}^n \lambda^j \int_a^b K^j(x,t)\, v(t)\, dt. \tag{17.9}$$
Here $K^j(x,t)$ is defined inductively by
$$K^0(x,t) = \langle x| 1 |t\rangle = \langle x|t\rangle = \delta(x-t),$$
$$K^j(x,t) = \langle x| K K^{j-1} |t\rangle = \langle x| K \left(\int_a^b |s\rangle\langle s|\, ds\right) K^{j-1} |t\rangle = \int_a^b K(x,s)\, K^{j-1}(s,t)\, ds.$$
The limit of $u_n(x)$ as $n \to \infty$ gives
$$u(x) = \sum_{j=0}^\infty \lambda^j \int_a^b K^j(x,t)\, v(t)\, dt. \tag{17.10}$$
The convergence of this series, called the Neumann series, is always guaranteed for the Volterra equation. For the Fredholm equation, we need to impose the extra condition $|\lambda| \|K\| < 1$.
17.1.4. Example. As an example, let us find the solution of $u(x) = 1 + \lambda \int_0^x u(t)\, dt$, a Volterra equation of the second kind. Here, $v(x) = 1$ and $K(x,t) = 1$, and it is straightforward to calculate approximations to u(x):
$$u_0(x) = v(x) = 1, \qquad u_1(x) = 1 + \lambda \int_0^x K(x,t)\, u_0(t)\, dt = 1 + \lambda x,$$
$$u_2(x) = 1 + \lambda \int_0^x K(x,t)\, u_1(t)\, dt = 1 + \lambda \int_0^x (1 + \lambda t)\, dt = 1 + \lambda x + \frac{\lambda^2 x^2}{2}.$$
It is clear that the nth term will look like
$$u_n(x) = 1 + \lambda x + \frac{\lambda^2 x^2}{2} + \cdots + \frac{\lambda^n x^n}{n!} = \sum_{j=0}^n \frac{\lambda^j x^j}{j!}.$$
As $n \to \infty$, we obtain $u(x) = e^{\lambda x}$. By direct substitution, it is readily checked that this is indeed a solution of the original integral equation. ∎
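The successive-approximation scheme above is easy to run numerically. The following sketch (grid size, $\lambda = 0.7$, and iteration count are arbitrary choices) iterates $u_{n+1}(x) = 1 + \lambda \int_0^x u_n(t)\, dt$ with a cumulative trapezoidal rule and converges to $e^{\lambda x}$:

```python
import numpy as np

# Neumann iteration for the Volterra equation u(x) = 1 + lam * int_0^x u(t) dt.
# Each iteration adds (roughly) the next Taylor term of e^{lam x}.
lam = 0.7
x = np.linspace(0.0, 1.0, 201)
dx = x[1] - x[0]
u = np.ones_like(x)                                   # u_0(x) = v(x) = 1
for _ in range(25):
    # cumulative trapezoidal rule for int_0^x u(t) dt
    integral = np.concatenate(([0.0], np.cumsum((u[1:] + u[:-1]) / 2) * dx))
    u = 1.0 + lam * integral
assert np.allclose(u, np.exp(lam * x), atol=1e-4)     # converged to e^{lam x}
```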
17.2 Fredholm Integral Equations
We can use our knowledge of compact operators gained in the previous chapter to study Fredholm equations of the second kind. With $\lambda \neq 0$ a complex number, we consider the equation
$$(1 - \lambda K)|u\rangle = |v\rangle, \qquad\text{or}\qquad u(x) - \lambda K[u](x) = v(x), \tag{17.11}$$
where all functions are square-integrable on [a, b], and K(x,t), the Hilbert-Schmidt kernel, is square-integrable on the rectangle [a, b] × [a, b].
Using Proposition 16.2.9, we immediately see that Equation (17.11) has a unique solution if $|\lambda| \|K\| < 1$, and the solution is of the form
$$|u\rangle = (1 - \lambda K)^{-1}|v\rangle = \sum_{n=0}^\infty \lambda^n K^n |v\rangle, \tag{17.12}$$
or $u(x) = \sum_{n=0}^\infty \lambda^n K^n[v](x)$, where $K^n[v](x)$ is defined as in Equation (17.4) except that now b replaces x as the upper limit of integration.
17.2.1. Example. Consider the integral equation
$$u(x) - \int_0^1 K(x,t)\, u(t)\, dt = x, \qquad\text{where}\qquad K(x,t) = \begin{cases} x & \text{if } 0 \leq x < t, \\ t & \text{if } t < x \leq 1. \end{cases}$$
Here $\lambda = 1$; therefore, a Neumann series solution exists if $\|K\| < 1$. It is convenient to write K in terms of the theta function:¹
$$K(x,t) = x\,\theta(t-x) + t\,\theta(x-t). \tag{17.13}$$
This gives $|K(x,t)|^2 = x^2\theta(t-x) + t^2\theta(x-t)$, because $\theta^2(x-t) = \theta(x-t)$ and $\theta(x-t)\theta(t-x) = 0$. Thus, we have
$$\|K\|^2 = \int_0^1 dx \int_0^1 dt\, |K(x,t)|^2 = \int_0^1 dx \int_0^1 x^2\theta(t-x)\, dt + \int_0^1 dx \int_0^1 t^2\theta(x-t)\, dt$$
$$= \int_0^1 dt \int_0^t x^2\, dx + \int_0^1 dx \int_0^x t^2\, dt = \int_0^1 dt\, \frac{t^3}{3} + \int_0^1 dx\, \frac{x^3}{3} = \frac{1}{6}.$$
¹Recall that the theta function is defined to be 1 if its argument is positive, and 0 if it is negative.
Since this is less than 1, the Neumann series converges, and we have²
$$u(x) = \sum_{j=0}^\infty \lambda^j \int_a^b K^j(x,t)\, v(t)\, dt = \sum_{j=0}^\infty \int_0^1 K^j(x,t)\, t\, dt \equiv \sum_{j=0}^\infty f_j(x).$$
The first few terms are evaluated as follows:
$$f_0(x) = \int_0^1 K^0(x,t)\, t\, dt = \int_0^1 \delta(x-t)\, t\, dt = x,$$
$$f_1(x) = \int_0^1 K(x,t)\, t\, dt = \int_0^1 [x\theta(t-x) + t\theta(x-t)]\, t\, dt = x\int_x^1 t\, dt + \int_0^x t^2\, dt = \frac{x}{2} - \frac{x^3}{6}.$$
The next term is trickier than the first two because of the product of the theta functions. We first substitute Equation (17.13) in the integral for the second-order term and simplify:
$$f_2(x) = \int_0^1 K^2(x,t)\, t\, dt = \int_0^1 t\, dt \int_0^1 K(x,s) K(s,t)\, ds$$
$$= \int_0^1 t\, dt \int_0^1 [x\theta(s-x) + s\theta(x-s)][s\theta(t-s) + t\theta(s-t)]\, ds$$
$$= x\int_0^1 t\, dt \int_0^1 s\,\theta(s-x)\theta(t-s)\, ds + x\int_0^1 t^2 dt \int_0^1 \theta(s-x)\theta(s-t)\, ds$$
$$\quad + \int_0^1 t\, dt \int_0^1 s^2\theta(x-s)\theta(t-s)\, ds + \int_0^1 t^2 dt \int_0^1 s\,\theta(x-s)\theta(s-t)\, ds.$$
It is convenient to switch the order of integration at this point. This is because of the presence of $\theta(x-s)$ and $\theta(s-x)$, which do not involve t and are best integrated last. Thus, we have
$$f_2(x) = x\int_0^1 s\,\theta(s-x)\, ds \int_s^1 t\, dt + x\int_0^1 \theta(s-x)\, ds \int_0^s t^2\, dt
+ \int_0^1 s^2\theta(x-s)\, ds \int_s^1 t\, dt + \int_0^1 s\,\theta(x-s)\, ds \int_0^s t^2\, dt$$
$$= x\int_x^1 s\, ds \left(\frac{1}{2} - \frac{s^2}{2}\right) + x\int_x^1 ds\, \frac{s^3}{3} + \int_0^x s^2 ds \left(\frac{1}{2} - \frac{s^2}{2}\right) + \int_0^x s\, ds\, \frac{s^3}{3}$$
$$= \frac{5}{24}x - \frac{1}{12}x^3 + \frac{1}{120}x^5.$$
As a test of his/her knowledge of θ-function manipulation, the reader is urged to perform the integration in reverse order. Adding all the terms, we obtain an approximation for u(x) that is valid for $0 \leq x \leq 1$:
$$u(x) \approx f_0(x) + f_1(x) + f_2(x) = \frac{41}{24}x - \frac{1}{4}x^3 + \frac{1}{120}x^5. \qquad ∎$$
²Note that in this case (Fredholm equation), we can calculate the jth term in isolation. In the Volterra case, it was more natural to calculate the solution up to a given order.
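The iterated-kernel terms of Example 17.2.1 can be checked by direct quadrature. The sketch below (grid sizes are arbitrary) applies the operator $f \mapsto \int_0^1 K(x,t) f(t)\, dt$ numerically, using the fact that $K(x,t) = x\theta(t-x) + t\theta(x-t) = \min(x,t)$:

```python
import numpy as np

# Quadrature check of the first Neumann terms: f_{j+1}(x) = int_0^1 K(x,t) f_j(t) dt
# with K(x,t) = min(x, t) and f_0(t) = t.
t = np.linspace(0.0, 1.0, 20001)
dt = t[1] - t[0]
xs = np.linspace(0.0, 1.0, 11)

def apply_K(f_vals):
    # trapezoidal rule for int_0^1 min(x,t) f(t) dt at each x in xs
    return np.array([np.sum((np.minimum(x, t) * f_vals)[1:]
                            + (np.minimum(x, t) * f_vals)[:-1]) / 2 * dt
                     for x in xs])

f1 = apply_K(t)                      # K applied to f_0(t) = t
f2 = apply_K(t / 2 - t**3 / 6)       # K applied to f_1(t) = t/2 - t^3/6

assert np.allclose(f1, xs / 2 - xs**3 / 6, atol=1e-7)
assert np.allclose(f2, 5 * xs / 24 - xs**3 / 12 + xs**5 / 120, atol=1e-7)
```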
We have seen that the Volterra equation of the second kind has a unique solution which can be written as an infinite series (see Theorem 17.1.2). The case of the Fredholm equation of the second kind is more complicated because of the existence of eigenvalues. The general solution of Equation (17.11) is discussed in the following:
Fredholm alternative 17.2.2. Theorem. (Fredholm Alternative) Let K be a Hilbert-Schmidt operator and $\lambda$ a complex number. Then either
1. $\lambda$ is a regular value of Equation (17.11), that is, $\lambda^{-1}$ is a regular point of the operator K, in which case the equation has the unique solution $|u\rangle = (1 - \lambda K)^{-1}|v\rangle$, or
2. $\lambda$ is a characteristic value of Equation (17.11) ($\lambda^{-1}$ is an eigenvalue of the operator K), in which case the equation has a solution if and only if $|v\rangle$ is in the orthogonal complement of the (finite-dimensional) null space of $1 - \lambda^* K^\dagger$.
Proof. The first part is trivial if we recall that, by definition, regular points of K are those complex numbers $\mu$ that make the operator $K - \mu 1$ invertible.
For part (2), we first show that the null space of $1 - \lambda^* K^\dagger$ is finite-dimensional. We note that $1 - \lambda K$ is invertible if and only if its adjoint $1 - \lambda^* K^\dagger$ is invertible, and $\lambda \in \rho(K)$ iff $\lambda^* \in \rho(K^\dagger)$. Since the spectrum of an operator is composed of all points that are not regular, we conclude that $\lambda$ is in the spectrum of K if and only if $\lambda^*$ is in the spectrum of $K^\dagger$. For compact operators, all nonzero points of the spectrum are eigenvalues. Therefore, the nonzero points of the spectrum of $K^\dagger$, a compact operator by Theorem 16.5.7, are all eigenvalues of $K^\dagger$, and the null space of $1 - \lambda^* K^\dagger$ is finite-dimensional (Theorem 16.6.2). Next, we note that the equation itself requires that $|v\rangle$ be in the range of the operator $1 - \lambda K$, which, by Theorem 16.6.2, is the orthogonal complement of the null space of $1 - \lambda^* K^\dagger$. □
Erik Ivar Fredholm (1866-1927) was born in Stockholm, the son of a well-to-do merchant family. He received the best education possible and soon showed great promise in mathematics, leaning especially toward the applied mathematics of practical mechanics in a year of study at Stockholm's Polytechnic Institute. Fredholm finished his education at the University of Uppsala, obtaining his doctorate in 1898. He also studied at the University of Stockholm during this same period and eventually received an appointment to the faculty there. Fredholm remained there the rest of his professional life.
His first contribution to mathematics was contained in his doctoral thesis, in which he studied a first-order partial differential equation in three variables, a problem that arises in the deformation of anisotropic media. Several years later
he completed this work by finding the fundamental solution to a general elliptic partial differential equation with constant coefficients.
Fredholm is perhaps best known for his studies of the integral equation that bears his name. Such equations occur frequently in physics. Fredholm's genius led him to note the similarity between his equation and a relatively familiar matrix-vector equation, resulting in his identification of a quantity that plays the same role in his equation as the determinant plays in the matrix-vector equation. He thus obtained a method for determining the existence of a solution and later used an analogous expression to derive a solution to his equation akin to the Cramer's rule solution to the matrix-vector equation. He further showed that the solution could be expressed as a power series in a complex variable. This latter result was considered important enough that Poincaré assumed it without proof (in fact he was unable to prove it) in a study of related partial differential equations.
Fredholm then considered the homogeneous form of his equation. He showed that under certain conditions, the vector space of solutions is finite-dimensional. David Hilbert later extended Fredholm's work to a complete eigenvalue theory of the Fredholm equation, which ultimately led to the discovery of Hilbert spaces.
17.2.1 Hermitian Kernel
Of special interest are integral equations in which the kernel is hermitian, which occurs exactly when the operator is hermitian. Such a kernel has the property that³ $\langle x|K|t\rangle^* = \langle t|K|x\rangle$, or $[K(x,t)]^* = K(t,x)$. For such kernels we can use the spectral theorem for compact hermitian operators to find a series solution for the integral equation. First we recall that
$$K = \sum_{j=1}^N \lambda_j^{-1} P_j = \sum_{j=1}^N \lambda_j^{-1} \sum_k |u_j^{(k)}\rangle\langle u_j^{(k)}|,$$
where we have used $\lambda_j^{-1}$ to denote the eigenvalue of the operator⁴ and expanded the projection operator in terms of orthonormal basis vectors of the corresponding finite-dimensional eigenspace. Recall that N can be infinity. Instead of the double sum, we can sum once over all the basis vectors and write $K = \sum_{n=1}^\infty \lambda_n^{-1} |u_n\rangle\langle u_n|$. Here n counts all the orthonormal eigenvectors of the Hilbert space, and $\lambda_n^{-1}$ is the eigenvalue corresponding to the eigenvector $|u_n\rangle$. Therefore, $\lambda_n^{-1}$ may be repeated in the sum. The action of K on a vector $|u\rangle$ is given by
$$K|u\rangle = \sum_{n=1}^\infty \lambda_n^{-1} \langle u_n|u\rangle\, |u_n\rangle. \tag{17.14}$$
³Since we are dealing mainly with real functions, hermiticity of K implies the symmetry of K, i.e., K(x,t) = K(t,x).
⁴$\lambda_j$ is the characteristic value of the integral equation, or the inverse of the eigenvalue of the corresponding operator.
If the Hilbert space is $\mathcal{L}^2[a, b]$, we may be interested in the functional form of this equation. We obtain such a form by multiplying both sides by $\langle x|$:
$$K[u](x) \equiv \langle x|K|u\rangle = \sum_{n=1}^\infty \lambda_n^{-1} \langle u_n|u\rangle \langle x|u_n\rangle = \sum_{n=1}^\infty \lambda_n^{-1} \langle u_n|u\rangle\, u_n(x). \tag{17.15}$$
Hilbert-Schmidt theorem
That this series converges uniformly in the interval [a, b] is known as the Hilbert-Schmidt theorem.
17.2.3. Example. Let us solve $u(x) = x + \lambda \int_a^b K(x,t)\, u(t)\, dt$, where $K(x,t) \equiv xt$ is a symmetric (hermitian) kernel, by the Neumann series method. We note that
$$\|K\|^2 = \int_a^b \int_a^b |K(x,t)|^2\, dx\, dt = \int_a^b \int_a^b x^2 t^2\, dx\, dt = \left(\int_a^b x^2\, dx\right)^2 = \frac{1}{9}(b^3 - a^3)^2,$$
or
$$\|K\| = \int_a^b x^2\, dx = \frac{1}{3}(b^3 - a^3),$$
and the Neumann series converges if $|\lambda|(b^3 - a^3) < 3$. Assuming that this condition holds, we have
$$u(x) = x + \sum_{j=1}^\infty \lambda^j \int_a^b K^j(x,t)\, t\, dt.$$
The special form of the kernel allows us to calculate $K^j(x,t)$ directly:
$$K^j(x,t) = \int_a^b \cdots \int_a^b K(x, s_1) K(s_1, s_2) \cdots K(s_{j-1}, t)\, ds_1\, ds_2 \cdots ds_{j-1}
= \int_a^b \cdots \int_a^b x\, s_1^2 s_2^2 \cdots s_{j-1}^2\, t\, ds_1\, ds_2 \cdots ds_{j-1}$$
$$= xt \left(\int_a^b s^2\, ds\right)^{j-1} = xt\, \|K\|^{j-1}.$$
It follows that $\int_a^b K^j(x,t)\, t\, dt = x\|K\|^{j-1} \cdot \frac{1}{3}(b^3 - a^3) = x\|K\|^j$. Substituting this in the expression for u(x) yields
$$u(x) = x + \sum_{j=1}^\infty \lambda^j x \|K\|^j = x\left(1 + \lambda\|K\| \frac{1}{1 - \lambda\|K\|}\right) = \frac{x}{1 - \lambda\|K\|}.$$
Because of the simplicity of the kernel, we can solve the integral equation exactly. First we write
$$u(x) = x + \lambda \int_a^b x t\, u(t)\, dt = x + \lambda x \int_a^b t\, u(t)\, dt \equiv x(1 + \lambda A),$$
where $A = \int_a^b t\, u(t)\, dt$. Multiplying both sides by x and integrating, we obtain
$$A = \int_a^b x\, u(x)\, dx = (1 + \lambda A) \int_a^b x^2\, dx = (1 + \lambda A)\|K\| \implies A = \frac{\|K\|}{1 - \lambda\|K\|}.$$
Substituting this A in $u(x) = x(1 + \lambda A)$ gives
$$u(x) = x\left(1 + \frac{\lambda\|K\|}{1 - \lambda\|K\|}\right) = \frac{x}{1 - \lambda\|K\|}.$$
This solution is the same as the first one we obtained. However, no series was involved here, and therefore no assumption is necessary concerning $|\lambda|\|K\|$. ∎
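The closed form $u(x) = x/(1 - \lambda\|K\|)$ is easy to confirm by discretizing the Fredholm equation directly (a Nyström-type sketch; the interval, $\lambda$, and grid size are ad hoc choices):

```python
import numpy as np

# Discretized Fredholm equation for K(x,t) = x*t on [a,b]:
# (1 - lam*K) u = v with v(x) = x; exact solution u(x) = x/(1 - lam*||K||),
# where ||K|| = (b^3 - a^3)/3.
a, b, lam = 0.0, 1.0, 0.5
n = 401
x = np.linspace(a, b, n)
w = np.full(n, (b - a) / (n - 1))
w[0] = w[-1] = w[0] / 2                        # trapezoidal weights
Kmat = np.outer(x, x)                          # K(x_i, t_j) = x_i * t_j
u = np.linalg.solve(np.eye(n) - lam * Kmat * w, x)

normK = (b**3 - a**3) / 3
assert abs(lam) * normK < 1                    # Neumann convergence condition
assert np.allclose(u, x / (1 - lam * normK), atol=1e-4)
```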
If one can calculate the eigenvectors $|u_n\rangle$ and the eigenvalues $\lambda_n^{-1}$, then one can obtain a solution for the integral equation in terms of these eigenfunctions as follows: Substitute (17.14) in the Fredholm equation [Equation (17.3)] to get
$$|u\rangle = |v\rangle + \lambda \sum_{n=1}^\infty \lambda_n^{-1} \langle u_n|u\rangle\, |u_n\rangle. \tag{17.16}$$
Multiply both sides by $\langle u_m|$:
$$\langle u_m|u\rangle = \langle u_m|v\rangle + \lambda \sum_{n=1}^\infty \lambda_n^{-1} \langle u_n|u\rangle \langle u_m|u_n\rangle = \langle u_m|v\rangle + \lambda \lambda_m^{-1} \langle u_m|u\rangle, \tag{17.17}$$
or, if $\lambda$ is not one of the eigenvalues,
$$\langle u_m|u\rangle = \frac{\lambda_m \langle u_m|v\rangle}{\lambda_m - \lambda}.$$
Substituting this in Equation (17.16) gives
$$|u\rangle = |v\rangle + \lambda \sum_{n=1}^\infty \frac{\langle u_n|v\rangle}{\lambda_n - \lambda}\, |u_n\rangle, \tag{17.18}$$
and in the functional form,
$$u(x) = v(x) + \lambda \sum_{n=1}^\infty \frac{\langle u_n|v\rangle}{\lambda_n - \lambda}\, u_n(x). \tag{17.19}$$
In case $\lambda = \lambda_m$ for some m, the Fredholm alternative (Theorem 17.2.2) says that we will have a solution only if $|v\rangle$ is in the orthogonal complement of the null space of⁵ $1 - \lambda_m K$. Moreover, Equation (17.17) shows that $\langle u_m|u\rangle$, the expansion coefficients of the basis vectors of the eigenspace $\mathcal{M}_m$, cannot be specified. However, Equation (17.17) does determine the rest of the coefficients as before. In this case, the solution can be written as
$$|u\rangle = |v\rangle + \sum_{k=1}^r c_k |u_m^{(k)}\rangle + \lambda \sum_{\substack{n=1 \\ n \neq m}}^\infty \frac{\langle u_n|v\rangle}{\lambda_n - \lambda}\, |u_n\rangle, \tag{17.20}$$
where r is the (finite) dimension of $\mathcal{M}_m$, k labels the orthonormal basis $\{|u_m^{(k)}\rangle\}$ of $\mathcal{M}_m$, and $\{c_k\}_{k=1}^r$ are arbitrary constants. In functional form, this equation becomes
$$u(x) = v(x) + \sum_{k=1}^r c_k u_m^{(k)}(x) + \lambda \sum_{\substack{n=1 \\ n \neq m}}^\infty \frac{\langle u_n|v\rangle}{\lambda_n - \lambda}\, u_n(x). \tag{17.21}$$
⁵Remember that K is hermitian; therefore, $\lambda_m$ is real.
17.2.4. Example. We now give an example of the application of Equation (17.19). We want to solve $u(x) = 3\int_{-1}^{1} K(x,t)u(t)\,dt + x^2$, where
$$K(x,t) = \sum_{k=0}^{\infty} \frac{u_k(x)u_k(t)}{2^{k/2}}, \qquad u_k(x) = \sqrt{\frac{2k+1}{2}}\, P_k(x),$$
and $P_k(x)$ is a Legendre polynomial.

We first note that $\{u_k\}$ is an orthonormal set of functions, that $K(x,t)$ is real and symmetric (therefore, hermitian), and that
$$\int_{-1}^{1} dt \int_{-1}^{1} dx\, |K(x,t)|^2 = \int_{-1}^{1} dt \int_{-1}^{1} dx \sum_{k,l=0}^{\infty} \frac{u_k(x)u_k(t)}{2^{k/2}}\, \frac{u_l(x)u_l(t)}{2^{l/2}}$$
$$= \sum_{k,l=0}^{\infty} \frac{1}{2^{k/2}2^{l/2}} \underbrace{\int_{-1}^{1} u_k(x)u_l(x)\,dx}_{=\delta_{kl}} \underbrace{\int_{-1}^{1} u_k(t)u_l(t)\,dt}_{=\delta_{kl}} = \sum_{k=0}^{\infty} \frac{1}{2^k} = 2 < \infty.$$
Thus, $K(x,t)$ is a Hilbert-Schmidt kernel.

Now note that
$$\int_{-1}^{1} K(x,t)u_k(t)\,dt = \frac{u_k(x)}{2^{k/2}}.$$
This shows that $u_k$ is an eigenfunction of $K(x,t)$ with eigenvalue $1/2^{k/2}$. Since $3 \neq 2^{k/2}$ for any integer $k$, we can use Equation (17.19) to write
$$u(x) = x^2 + 3\sum_{k=0}^{\infty} \frac{\int_{-1}^{1} u_k(s)s^2\,ds}{2^{k/2} - 3}\, u_k(x).$$
But $\int_{-1}^{1} u_k(s)s^2\,ds = 0$ for $k \geq 3$. For $k \leq 2$, we use the first three Legendre polynomials to get
$$\int_{-1}^{1} u_0(s)s^2\,ds = \frac{\sqrt{2}}{3}, \qquad \int_{-1}^{1} u_1(s)s^2\,ds = 0, \qquad \int_{-1}^{1} u_2(s)s^2\,ds = \frac{2\sqrt{10}}{15}.$$
This gives $u(x) = \frac{1}{2} - 2x^2$. The reader is urged to substitute this solution in the original integral equation and verify that it works. ■

degenerate or separable kernel
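Problem 17.8 asks for precisely this verification; a numerical version (a sketch of ours, not from the text; the kernel is truncated at an arbitrary $k_{\max}$, which is harmless because $s^2$ has components only along $u_0$ and $u_2$) is:

```python
import numpy as np
from numpy.polynomial.legendre import legval, leggauss

def u_k(k, x):
    # orthonormal Legendre functions u_k(x) = sqrt((2k+1)/2) P_k(x)
    c = np.zeros(k + 1)
    c[k] = 1.0
    return np.sqrt((2 * k + 1) / 2.0) * legval(x, c)

def K(x, t, kmax=20):
    # kernel truncated at kmax; terms with k > 2 never couple to the quadratic u below
    return sum(u_k(k, x) * u_k(k, t) / 2 ** (k / 2.0) for k in range(kmax + 1))

u = lambda x: 0.5 - 2.0 * x**2           # the solution found above
nodes, weights = leggauss(50)            # Gauss-Legendre quadrature on [-1, 1]

for x in (-0.7, 0.0, 0.4):
    integral = np.sum(weights * K(x, nodes) * u(nodes))
    assert abs(u(x) - (x**2 + 3.0 * integral)) < 1e-10
print("u(x) = 1/2 - 2 x^2 satisfies the integral equation")
```

The Gauss quadrature is exact here since the integrand is a polynomial of modest degree.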
17.2.2 Degenerate Kernels

The preceding example involves the simplest kind of degenerate, or separable, kernels. A kernel is called degenerate, or separable, if it can be written as a finite sum of products of functions of one variable:
$$K(x,t) = \sum_{j=1}^{n} \phi_j(x)\psi_j(t), \qquad(17.22)$$
where $\phi_j$ and $\psi_j$ are assumed to be square-integrable. Substituting (17.22) in the Fredholm integral equation of the second kind, we obtain
$$u(x) - \lambda \sum_{j=1}^{n} \phi_j(x) \int_a^b \psi_j(t)u(t)\,dt = v(x).$$
If we define $\mu_j = \int_a^b \psi_j(t)u(t)\,dt$, the preceding equation becomes
$$u(x) - \lambda \sum_{j=1}^{n} \mu_j \phi_j(x) = v(x). \qquad(17.23)$$
Multiply this equation by $\psi_i^*(x)$ and integrate over $x$ to get
$$\mu_i - \lambda \sum_{j=1}^{n} \mu_j A_{ij} = v_i \qquad\text{for } i = 1, 2, \ldots, n, \qquad(17.24)$$
where $A_{ij} = \int_a^b \psi_i^*(t)\phi_j(t)\,dt$ and $v_i = \int_a^b \psi_i^*(t)v(t)\,dt$. With $\mu_i$, $v_i$, and $A_{ij}$ as components of column vectors $\mathbf{u}$, $\mathbf{v}$, and a matrix $\mathsf{A}$, we can write the above linear system of equations as
$$\mathbf{u} - \lambda\mathsf{A}\mathbf{u} = \mathbf{v}, \qquad\text{or}\qquad (\mathbf{1} - \lambda\mathsf{A})\mathbf{u} = \mathbf{v}. \qquad(17.25)$$
We can now determine the $\mu_i$ by solving the system of linear equations given by (17.24). Once the $\mu_i$ are determined, Equation (17.23) gives $u(x)$. Thus, for a degenerate kernel the Fredholm problem reduces to a system of linear equations.
17.2.5. Example. As a concrete example of an integral equation with degenerate kernel, we solve $u(x) - \lambda\int_0^1 (1 + xt)u(t)\,dt = x$ for two different values of $\lambda$. The kernel, $K(x,t) = 1 + xt$, is separable with $\phi_1(x) = 1$, $\psi_1(t) = 1$, $\phi_2(x) = x$, and $\psi_2(t) = t$. This gives the matrix
$$\mathsf{A} = \begin{pmatrix} 1 & \tfrac{1}{2} \\[2pt] \tfrac{1}{2} & \tfrac{1}{3} \end{pmatrix}.$$
For convenience, we define the matrix $\mathsf{B} \equiv \mathbf{1} - \lambda\mathsf{A}$.
(a) First assume that $\lambda = 1$. In that case $\mathsf{B}$ has a nonzero determinant. Thus, $\mathsf{B}^{-1}$ exists, and can be calculated to be
$$\mathsf{B}^{-1} = \begin{pmatrix} -\tfrac{8}{3} & -2 \\[2pt] -2 & 0 \end{pmatrix}.$$
With
$$v_1 = \int_0^1 \psi_1^*(t)v(t)\,dt = \int_0^1 t\,dt = \tfrac{1}{2} \qquad\text{and}\qquad v_2 = \int_0^1 \psi_2^*(t)v(t)\,dt = \int_0^1 t^2\,dt = \tfrac{1}{3},$$
we obtain
$$\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = \mathsf{B}^{-1}\mathbf{v} = \begin{pmatrix} -2 \\ -1 \end{pmatrix}.$$
Equation (17.23) then gives $u(x) = \mu_1\phi_1(x) + \mu_2\phi_2(x) + x = -2$.
(b) Now, for the purpose of illustrating the other alternative of Theorem 17.2.2, let us take $\lambda = 8 + 2\sqrt{13}$. Then
$$\mathsf{B} = \mathbf{1} - \lambda\mathsf{A} = -\begin{pmatrix} 7 + 2\sqrt{13} & 4 + \sqrt{13} \\[2pt] 4 + \sqrt{13} & (5 + 2\sqrt{13})/3 \end{pmatrix},$$
and $\det\mathsf{B} = 0$. This shows that $8 + 2\sqrt{13}$ is a characteristic value of the equation. We thus have a solution only if $v(x) = x$ is orthogonal to the null space of $\mathbf{1} - \lambda^*\mathsf{A}^\dagger = \mathsf{B}^\dagger$. To determine a basis for this null space, we have to find vectors $|z\rangle$ such that $\mathsf{B}^\dagger|z\rangle = 0$. Since $\lambda$ is real, and $\mathsf{B}$ is real and symmetric, $\mathsf{B}^\dagger = \mathsf{B}$, and we must solve
$$\begin{pmatrix} 7 + 2\sqrt{13} & 4 + \sqrt{13} \\[2pt] 4 + \sqrt{13} & (5 + 2\sqrt{13})/3 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = 0.$$
The solution to this equation is a multiple of $|z\rangle = \binom{3}{-2-\sqrt{13}}$. If the integral equation is to have a solution, the column vector $\mathbf{v}$ (whose corresponding ket we denote by $|v\rangle$) must be orthogonal to $|z\rangle$. But
$$\langle z|v\rangle = \begin{pmatrix} 3 & -2-\sqrt{13} \end{pmatrix} \begin{pmatrix} \tfrac{1}{2} \\[2pt] \tfrac{1}{3} \end{pmatrix} \neq 0.$$
Therefore, the integral equation has no solution. ■
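Both alternatives of this example can be reproduced with a few lines of linear algebra (a numerical sketch, not from the text; the matrices are exactly those computed above):

```python
import numpy as np

lam = 1.0
A = np.array([[1.0, 0.5],
              [0.5, 1.0 / 3.0]])    # A_ij = int_0^1 psi_i(t) phi_j(t) dt
v = np.array([0.5, 1.0 / 3.0])      # v_i  = int_0^1 psi_i(t) t dt

# part (a): solve (1 - lam A) mu = v, then u(x) = x + lam (mu_1 + mu_2 x)
mu = np.linalg.solve(np.eye(2) - lam * A, v)
u = lambda x: x + lam * (mu[0] + mu[1] * x)
assert all(abs(u(x) + 2.0) < 1e-12 for x in (0.0, 0.3, 1.0))

# part (b): at lam = 8 + 2 sqrt(13), B = 1 - lam A is singular
lam_c = 8.0 + 2.0 * np.sqrt(13.0)
print(np.linalg.det(np.eye(2) - lam_c * A))    # essentially zero
```

The constant solution $u(x) = -2$ and the singularity of $\mathsf{B}$ at the characteristic value both come out immediately.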
The reader may feel uneasy that the functions $\phi_j(x)$ and $\psi_j(t)$ appearing in a degenerate kernel are arbitrary to within a multiplicative function. After all, we can multiply $\phi_j(x)$ by a nonzero function, and divide $\psi_j(t)$ by the same function, and get the same kernel. Such a change clearly alters the matrices $\mathsf{A}$ and $\mathsf{B}$ and therefore seems likely to change the solution, $u(x)$. That this is not the case is demonstrated in Problem 17.2. In fact, it can be shown quite generally that the transformations described above do not change the solution.
As the alert reader may have noticed, we have been avoiding the problem of solving the eigenvalue (characteristic) problem for integral operators. Such a problem is nontrivial, and the analogue of the finite-dimensional case, where one works with determinants and characteristic polynomials, does not exist. An exception is a degenerate hermitian⁶ kernel, i.e., a kernel of the form $K(x,t) = \sum_{i=1}^{n} h_i(x)h_i^*(t)$. Substituting this in the characteristic-value equation
$$u(x) = \lambda \int_a^b K(x,t)u(t)\,dt,$$
we obtain $u(x) = \lambda \sum_{i=1}^{n} h_i(x) \int_a^b h_i^*(t)u(t)\,dt$. Defining $\mu_i \equiv \int_a^b h_i^*(t)u(t)\,dt$ and substituting it back in the equation gives
$$u(x) = \lambda \sum_{i=1}^{n} h_i(x)\mu_i. \qquad(17.26)$$
Multiplying this equation by $\lambda^{-1}h_k^*(x)$ and integrating over $x$ yields
$$\lambda^{-1}\mu_k = \sum_{i=1}^{n} m_{ki}\,\mu_i, \qquad\text{where } m_{ki} \equiv \int_a^b h_k^*(x)h_i(x)\,dx.$$
This is an eigenvalue equation for the hermitian $n\times n$ matrix $\mathsf{M}$ with elements $m_{ij}$, which, by the spectral theorem for hermitian operators, can be solved. In fact, the matrix need not be hermitian; as long as it is normal, the eigenvalue problem can be solved. Once the eigenvectors and the eigenvalues are found, we can substitute them in Equation (17.26) and obtain $u(x)$. We expect to find a finite number of eigenfunctions and eigenvalues. Our analysis of compact operators included such a case. That analysis also showed that the entire (infinite-dimensional) Hilbert space could be written as the direct sum of eigenspaces that are finite-dimensional for nonzero eigenvalues. Therefore, we expect the eigenspace corresponding to the zero eigenvalue (or infinite characteristic value) to be infinite-dimensional. The following example illustrates these points.
17.2.6. Example. Let us find the nonzero characteristic values and corresponding eigenfunctions of the kernel $K(x,t) = 1 + \sin(x+t)$ for $-\pi \leq x, t \leq \pi$.

⁶Actually, the problem of a degenerate kernel that leads to a normal matrix, as described below, can also be solved.

We are seeking functions $u$ and scalars $\lambda$ satisfying $u(x) = \lambda K[u](x)$, or
$$u(x) = \lambda \int_{-\pi}^{\pi} [1 + \sin(x+t)]u(t)\,dt. \qquad(17.27)$$
Expanding $\sin(x+t)$, we obtain
$$u(x) = \lambda \int_{-\pi}^{\pi} [1 + \sin x \cos t + \cos x \sin t]u(t)\,dt,$$
or
$$\lambda^{-1}u(x) = \mu_1 + \mu_2 \sin x + \mu_3 \cos x, \qquad(17.28)$$
where $\mu_1 = \int_{-\pi}^{\pi} u(t)\,dt$, $\mu_2 = \int_{-\pi}^{\pi} u(t)\cos t\,dt$, and $\mu_3 = \int_{-\pi}^{\pi} u(t)\sin t\,dt$. Integrate both sides of Equation (17.28) with respect to $x$ from $-\pi$ to $\pi$ to obtain $\lambda^{-1}\mu_1 = 2\pi\mu_1$. Similarly, multiplying by $\sin x$ and $\cos x$ and integrating yields
$$\lambda^{-1}\mu_3 = \pi\mu_2 \qquad\text{and}\qquad \lambda^{-1}\mu_2 = \pi\mu_3. \qquad(17.29)$$
If $\mu_1 \neq 0$, we get $\lambda^{-1} = 2\pi$, which, when substituted in (17.29), yields $\mu_2 = \mu_3 = 0$. We thus have, as a first solution, $\lambda_1^{-1} = 2\pi$ and $(\mu_1, \mu_2, \mu_3) = a(1, 0, 0)$, where $a$ is an arbitrary constant. Equation (17.28) now gives $\lambda_1^{-1}u_1(x) = \mu_1$, or $u_1(x) = c_1$, where $c_1$ is an arbitrary constant to be determined.

On the other hand, if $\mu_1 = 0$ and $\lambda^{-1} \neq 2\pi$, then Equation (17.29) yields $\lambda^{-1} = \pm\pi$ and $\mu_2 = \pm\mu_3$. For $\lambda^{-1} \equiv \lambda_+^{-1} = \pi$, Equation (17.28) gives
$$u(x) \equiv u_+(x) = c_+(\sin x + \cos x),$$
and for $\lambda^{-1} \equiv \lambda_-^{-1} = -\pi$, it yields $u(x) \equiv u_-(x) = c_-(\sin x - \cos x)$, where $c_\pm$ are arbitrary constants to be determined by normalization of eigenfunctions. The normalized eigenfunctions are
$$u_1 = \frac{1}{\sqrt{2\pi}}, \qquad u_\pm(x) = \frac{1}{\sqrt{2\pi}}(\sin x \pm \cos x).$$
Direct substitution in the original integral equation easily verifies that $u_1$, $u_+$, and $u_-$ are eigenfunctions of the integral equation with the eigenvalues calculated above.

Let us now consider the zero eigenvalue (or infinite characteristic value). Divide both sides of Equation (17.27) by $\lambda$ and take the limit $\lambda \to \infty$. Then the integral equation becomes
$$\int_{-\pi}^{\pi} [1 + \sin x \cos t + \cos x \sin t]u(t)\,dt = 0.$$
The solutions $u(t)$ to this equation would span the eigenspace corresponding to the zero eigenvalue, or infinite characteristic value. We pointed out above that this eigenspace is expected to be infinite-dimensional. This expectation is borne out once we note that all functions of the form $\sin nt$ or $\cos nt$ with $n \geq 2$ make the above integral zero; and there are infinitely many such functions. ■
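A hedged numerical cross-check (not in the text): discretizing the kernel with Gauss-Legendre quadrature, the Nystrom method, which is our choice here rather than a technique used by the book, reproduces the three nonzero operator eigenvalues $2\pi$, $\pi$, and $-\pi$ (the reciprocals of the characteristic values found above). The weights are positive, so the Nystrom matrix can be symmetrized before diagonalizing:

```python
import numpy as np

n = 200
nodes, weights = np.polynomial.legendre.leggauss(n)
x = np.pi * nodes                  # quadrature points mapped to [-pi, pi]
w = np.pi * weights

K = 1.0 + np.sin(x[:, None] + x[None, :])           # K(x, t) = 1 + sin(x + t)
S = np.sqrt(w)[:, None] * K * np.sqrt(w)[None, :]   # symmetrized Nystrom matrix

vals = np.linalg.eigvalsh(S)
big = np.sort(vals[np.abs(vals) > 1e-8])            # the nonzero eigenvalues
print(big)                                          # approximately [-pi, pi, 2 pi]
```

All the remaining eigenvalues are zero to machine precision, mirroring the infinite-dimensional null eigenspace of the kernel.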
17.3 Problems
17.1. Use mathematical induction to derive Equation (17.5).
17.2. Repeat part (a) of Example 17.2.5 using
$$\phi_2(x) = x, \qquad \psi_2(t) = t,$$
so that we still have $K(x,t) = \phi_1(x)\psi_1(t) + \phi_2(x)\psi_2(t)$.
17.3. Use the spectral theorem for compact hermitian operators to show that if the kernel of a Hilbert-Schmidt operator has a finite number of nonzero eigenvalues, then the kernel is separable. Hint: See the discussion at the beginning of Section 17.2.1.
17.4. Use the method of successive approximations to solve the Volterra equation $u(x) = \lambda\int_0^x u(t)\,dt$. Then derive a DE equivalent to the Volterra equation (make sure to include the initial condition), and solve it.
17.5. Regard the Fourier transform,
$$F[f](x) \equiv \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ixy} f(y)\,dy,$$
as an integral operator.
(a) Show that $F^2[f](x) = f(-x)$.
(b) Deduce, therefore, that the only eigenvalues of this operator are $\lambda = \pm 1, \pm i$.
(c) Let $f(x)$ be any even function of $x$. Show that an appropriate choice of $a$ can make $u = f + aF[f]$ an eigenfunction of $F$. (This shows that the eigenvalues of $F$ have infinite multiplicity.)
17.6. For what values of $\lambda$ does the following integral equation have a solution?
$$u(x) = \lambda \int_0^{\pi} \sin(x+t)u(t)\,dt + x.$$
What is that solution? Redo the problem using a Neumann series expansion. Under what condition is the series convergent?
17.7. It is possible to multiply the functions $\phi_j(x)$ by $r_j(x)$ and $\psi_j(t)$ by $1/r_j(t)$ and still get the same degenerate kernel, $K(x,t) = \sum_{j=1}^{n} \phi_j(x)\psi_j(t)$. Show that such arbitrariness, although affecting the matrices $\mathsf{A}$ and $\mathsf{B}$, does not change the solution of the Fredholm problem
$$u(x) - \lambda \int_a^b K(x,t)u(t)\,dt = f(x).$$
17.9. Solve $u(x) = \frac{1}{2}\int_{-1}^{1}(x+t)u(t)\,dt + x$.
17.10. Solve $u(x) = \lambda\int_0^1 xtu(t)\,dt + x$ using the Neumann series method. For what values of $\lambda$ is the series convergent? Now find the eigenvalues and eigenfunctions of the kernel and solve the problem using these eigenvalues and eigenfunctions.
17.11. Solve $u(x) = \lambda\int_0^{\infty} K(x,t)u(t)\,dt + x^{\alpha}$, where $\alpha$ is any real number except a negative integer, and $K(x,t) = e^{-(x+t)}$. For what values of $\lambda$ does the integral equation have a solution?
17.12. Solve the integral equations
(a) $u(x) = e^x + \lambda\int_0^1 xtu(t)\,dt$.
(b) $u(x) = \lambda\int_0^{\pi} \sin(x-t)u(t)\,dt$.
(c) $u(x) = x^2 + \int_0^1 xtu(t)\,dt$.
(d) $u(x) = x + \int_0^x u(t)\,dt$.
17.13. Solve the integral equation $u(x) = x + \lambda\int_0^1 (x+t)t\,u(t)\,dt$, keeping terms up to $\lambda^2$.
17.14. Solve the integral equation $u(x) = e^{-|x|} + \lambda\int_{-\infty}^{\infty} e^{-|x-t|}u(t)\,dt$, assuming that $u$ remains finite as $x \to \pm\infty$.
17.15. Solve the integral equation $u(x) = e^{-|x|} + \lambda\int_0^{\infty} u(t)\cos xt\,dt$, assuming that $u$ remains finite as $x \to \pm\infty$.
Additional Reading
1. DeVito, C. Functional Analysis and Linear Operator Theory, Addison-Wesley, 1990.
2. Jörgens, K. Linear Integral Operators, Pitman, 1982. Translated from the original German, this is a thorough (but formal) introduction to integral operators and equations.
3. Zeidler, E. Applied Functional Analysis, Springer-Verlag, 1995.
18
Sturm-Liouville Systems: Formalism

The linear operators discussed in the last two chapters were exclusively integral operators. Most applications of physical interest, however, involve differential operators (DO). Unfortunately, differential operators are unbounded. We noted that complications arise when one abandons the compactness property of the operator, e.g., sums turn into integrals and one loses one's grip over the eigenvalues of noncompact operators. The transition to unbounded operators further complicates matters. Fortunately, the formalism of one type of DOs that occur most frequently in physics can be studied in the context of compact operators. Such a study is our aim for this chapter.
18.1 Unbounded Operators with Compact Resolvent

domain of a linear operator

As was pointed out in Example 16.2.7, the derivative operator cannot be defined for all functions in $\mathcal{L}^2(a,b)$. This motivates the following:

18.1.1. Definition. Let $\mathcal{D}$ be a linear manifold¹ in the Hilbert space $\mathcal{H}$. A linear map $T : \mathcal{D} \to \mathcal{H}$ will be called a linear operator in² $\mathcal{H}$. $\mathcal{D}$ is called the domain of $T$ and often denoted by $\mathcal{D}(T)$.

18.1.2. Example. The domain of the derivative operator $D$, as an operator on $\mathcal{L}^2(a,b)$, cannot be the entire space. On the other hand, $D$ is defined on the linear manifold $\mathcal{M}$ in $\mathcal{L}^2(a,b)$ spanned by $\{e^{i2n\pi x/L}\}$ with $L = b - a$. As we saw in Chapter 8, $\mathcal{M}$ is dense

¹A linear manifold of an infinite-dimensional normed vector space $\mathcal{V}$ is a proper subset that is a vector space in its own right, but is not necessarily closed.
²As opposed to on $\mathcal{H}$.
(see Definition 16.4.5 and the discussion following it) in $\mathcal{L}^2(a,b)$. This is the essence of Fourier series: that every function in $\mathcal{L}^2(a,b)$ can be expanded in (i.e., approximated by) a Fourier series. It turns out that many unbounded operators on a Hilbert space share the same property, namely that their domains are dense in the Hilbert space.

Another important property of Fourier expansion is the fact that if the function is differentiable, then one can differentiate both sides, i.e., one can differentiate a Fourier expansion term by term if such an operation makes sense for the original function. Define the sequence $\{f_m\}$ by
$$f_m(x) = \sum_{n=-m}^{m} a_n e^{i2\pi nx/L}, \qquad a_n = \frac{1}{\sqrt{L}} \int_a^b f(x)e^{-i2\pi nx/L}\,dx.$$
Then we can state the property above as follows: Suppose $\{f_m\}$ is in $\mathcal{M}$. If $\lim f_m = f$ and $\lim f_m' = g$, then $f' = g$ and $f \in \mathcal{M}$. Many unbounded operators share this property. ■
closed operator

18.1.3. Definition. Let $\mathcal{D}$ be a linear manifold in the Hilbert space $\mathcal{H}$. Let $T : \mathcal{D} \to \mathcal{H}$ be a linear operator in $\mathcal{H}$. Suppose that for any sequence $\{|u_n\rangle\}$ in $\mathcal{D}$, both $\{|u_n\rangle\}$ and $\{T|u_n\rangle\}$ converge in $\mathcal{H}$, i.e.,
$$\lim |u_n\rangle = |u\rangle \qquad\text{and}\qquad \lim T|u_n\rangle = |v\rangle.$$
We say that $T$ is closed if $|u\rangle \in \mathcal{D}$ and $T|u\rangle = |v\rangle$.

Notice that we cannot demand that $|v\rangle$ be in $\mathcal{D}$ for a general operator. This, as we saw in the preceding example, will not be appropriate for unbounded operators. The restriction of the domain of an unbounded operator is necessitated by the fact that the action of the operator on a vector in the Hilbert space in general takes that vector out of the space. The following theorem (see [DeVi 90, pp. 251-252] for a proof) shows why this is necessary:

18.1.4. Theorem. A closed linear operator in $\mathcal{H}$ that is defined at every point of $\mathcal{H}$ (so that $\mathcal{D} = \mathcal{H}$) is bounded.

Thus, if we are interested in unbounded operators (for instance, differential operators), we have to restrict their domains. In particular, we have to accept the possibility of an operator whose adjoint has a different domain.³

difference between hermitian and self-adjoint operators

18.1.5. Definition. Let $T$ be a linear operator in $\mathcal{H}$. We shall say that $T$ is hermitian if $T^\dagger$ is an extension of $T$, i.e., $\mathcal{D}(T) \subset \mathcal{D}(T^\dagger)$ and $T^\dagger|u\rangle = T|u\rangle$ for all $|u\rangle \in \mathcal{D}(T)$. $T$ is called self-adjoint if $\mathcal{D}(T) = \mathcal{D}(T^\dagger)$.
operators with compact resolvent

As we shall see shortly, certain types of Sturm-Liouville operators, although unbounded, lend themselves to a study within the context of compact operators.

18.1.6. Definition. A hermitian linear operator $T$ in a Hilbert space $\mathcal{H}$ is said to have a compact resolvent if there is a $\mu \in \rho(T)$ for which the resolvent $R_\mu(T)$ is compact.

³This subtle difference between hermitian and self-adjoint is stated here merely to warn the reader and will be confined to the present discussion. The two qualifiers will be (ab)used interchangeably in the rest of the book.
An immediate consequence of this definition is that $R_\lambda(T)$ is compact for all $\lambda \in \rho(T)$. To see this, note that $R_\lambda(T)$ is bounded by Definition 16.3.1. Now use Equation (16.9) and write
$$R_\lambda(T) = \left[\mathbf{1} + (\lambda - \mu)R_\lambda(T)\right]R_\mu(T).$$
The RHS is a product of a bounded⁴ and a compact operator, and therefore must be compact. The compactness of the resolvent characterizes its spectrum by Theorem 16.7.8. As the following theorem shows, this in turn characterizes the spectrum of the operators with compact resolvent.

18.1.7. Theorem. Let $T$ be an operator with compact resolvent $R_\lambda(T)$, where $\lambda \in \rho(T)$. Then $0 \neq \mu \in \rho(R_\lambda(T))$ if and only if $(\lambda + 1/\mu) \in \rho(T)$. Similarly, $\mu \neq 0$ is an eigenvalue of $R_\lambda(T)$ if and only if $(\lambda + 1/\mu)$ is an eigenvalue of $T$. Furthermore, the eigenvectors of $R_\lambda(T)$ corresponding to $\mu$ coincide with those of $T$ corresponding to $(\lambda + 1/\mu)$.
Proof. The proof consists of a series of two-sided implications involving definitions. We give the proof of the first part, the second part being very similar:

$\mu \in \rho(R_\lambda(T))$ iff $R_\lambda(T) - \mu\mathbf{1}$ is invertible.
$R_\lambda(T) - \mu\mathbf{1}$ is invertible iff $(T - \lambda\mathbf{1})^{-1} - \mu\mathbf{1}$ is invertible.
$(T - \lambda\mathbf{1})^{-1} - \mu\mathbf{1}$ is invertible iff $\mathbf{1} - \mu(T - \lambda\mathbf{1})$ is invertible.
$\mathbf{1} - \mu(T - \lambda\mathbf{1})$ is invertible iff $\frac{1}{\mu}\mathbf{1} - T + \lambda\mathbf{1}$ is invertible.
$\left(\lambda + \frac{1}{\mu}\right)\mathbf{1} - T$ is invertible iff $\left(\lambda + \frac{1}{\mu}\right) \in \rho(T)$.

Comparing the LHS of the first line with the RHS of the last line, we obtain the first part of the theorem. □
A consequence of this theorem is that the eigenspaces of an (unbounded) operator with compact resolvent are finite-dimensional, i.e., such an operator has only finitely many eigenvectors corresponding to each of its eigenvalues. Moreover, arranging the eigenvalues $\mu_n$ of the resolvent in decreasing order (as done in Theorem 16.7.8), we conclude that the eigenvalues of $T$ can be arranged in a sequence in increasing order of their absolute values and the limit of this sequence is infinity.
18.1.8. Example. Consider the operator $T$ in $\mathcal{L}^2(0,1)$ defined by⁵ $Tf = -f''$ having the domain $\mathcal{D}(T) = \{f \in \mathcal{L}^2(0,1) \mid f'' \in \mathcal{L}^2(0,1),\ f(0) = f(1) = 0\}$. The reader may check that zero is not an eigenvalue of $T$. Therefore, we may choose $R_0(T) = T^{-1}$. We shall study a systematic way of finding inverses of some specific differential operators in the upcoming chapters on Green's functions. At this point, suffice it to say that $T^{-1}$ can be written as a Hilbert-Schmidt integral operator with kernel
$$K(x,t) = \begin{cases} x(1-t) & \text{if } 0 \leq x \leq t \leq 1, \\ (1-x)t & \text{if } 0 \leq t \leq x \leq 1. \end{cases}$$
Thus, if $Tf = g$, i.e., if $f'' = -g$, then $T^{-1}g = f$, or $f = K[g]$, i.e.,
$$f(x) = K[g](x) = \int_0^1 K(x,t)g(t)\,dt = \int_0^x (1-x)t\,g(t)\,dt + \int_x^1 x(1-t)g(t)\,dt.$$
It is readily verified that $K[g](0) = K[g](1) = 0$ and $f''(x) = K[g]''(x) = -g$.

We can now use Theorem 18.1.7 with $\lambda = 0$ to find all the eigenvalues of $T$: $\mu_n$ is an eigenvalue of $T$ if and only if $1/\mu_n$ is an eigenvalue of $T^{-1}$. These eigenvalues should have finite-dimensional eigenspaces, and we should be able to arrange them in increasing order of magnitude without bound. To verify this, we solve $f'' = -\mu f$, whose solutions are $\mu_n = n^2\pi^2$ and $f_n(x) = \sin n\pi x$. Note that there is only one eigenfunction corresponding to each eigenvalue. Therefore, the eigenspaces are finite- (one-)dimensional. ■

⁴The sum of two bounded operators is bounded.
⁵We shall depart from our convention here and shall not use the Dirac bra-ket notation although the use of abstract operators encourages its use. The reason is that in this example, we are dealing with functions, and it is more convenient to undress the functions from their Dirac clothing.
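The relation between the eigenvalues of $T$ and of $T^{-1}$ can be checked numerically; the sketch below (not from the text) discretizes the kernel on a midpoint grid and compares the largest eigenvalues of the resulting matrix with $1/(n^2\pi^2)$:

```python
import numpy as np

n = 1000
x = (np.arange(n) + 0.5) / n            # midpoint grid on (0, 1)
h = 1.0 / n
X, T = np.meshgrid(x, x, indexing="ij")
K = np.where(X <= T, X * (1.0 - T), (1.0 - X) * T)   # kernel of T^{-1}

vals = np.sort(np.linalg.eigvalsh(K * h))[::-1]      # discretized eigenvalues of T^{-1}
expected = 1.0 / (np.pi**2 * np.array([1.0, 4.0, 9.0]))
print(vals[:3], expected)               # the two lists agree to a few digits
```

The grid spacing controls the accuracy; with 1000 points the first few eigenvalues match $1/(n^2\pi^2)$ to roughly five significant figures.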
The example above is a special case of a large class of DOs occurring in mathematical physics. Recall from Theorem 13.5.4 that all linear second-order differential equations can be made self-adjoint. Moreover, Example 13.4.12 showed that any SOLDE can be transformed into a form in which the first-derivative term is absent. By dividing the DE by the coefficient of the second-derivative term if necessary, the study of the most general second-order linear differential operators boils down to that of the so-called Sturm-Liouville (S-L) operators

Sturm-Liouville operators

$$L_x \equiv \frac{d^2}{dx^2} - q(x), \qquad(18.1)$$
which are assumed to be self-adjoint. Differential operators are necessarily accompanied by boundary conditions that specify their domains. So, to be complete, let us assume that the DO in Equation (18.1) acts on the subset of $\mathcal{L}^2(a,b)$ consisting of functions $u$ that satisfy the following so-called separated boundary conditions:

separated boundary conditions

$$\alpha_1 u(a) + \beta_1 u'(a) = 0, \qquad \alpha_2 u(b) + \beta_2 u'(b) = 0, \qquad(18.2)$$
where $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$ are real constants with the property that the matrix of coefficients has no zero rows.

regular Sturm-Liouville systems

The collection of the DO and the boundary conditions above is called a regular Sturm-Liouville system.

We now show that the DO of a regular Sturm-Liouville system has compact resolvent. First observe that by adding $au$ (with $a$ an arbitrary number different from all eigenvalues of the DO) to both sides of the eigenvalue equation $u'' - qu = \lambda u$, we can assume⁶ that zero is not an eigenvalue of $L_x$. Next, suppose that
$u_1(x)$ and $u_2(x)$ are the two linearly independent solutions of the homogeneous DE satisfying the first and the second boundary conditions of Equation (18.2), respectively. The operator whose kernel is
$$K(x,t) = \begin{cases} -u_1(x)u_2(t)/W(a) & \text{if } a \leq x \leq t \leq b, \\ -u_1(t)u_2(x)/W(a) & \text{if } a \leq t \leq x \leq b, \end{cases}$$
in which $W$ is the Wronskian of the solutions, is a Hilbert-Schmidt operator and therefore compact. We now show that $K(x,t)$ is the resolvent $R_0(L_x) = L_x^{-1} \equiv K$ of our DO. To see this, write $L_x u = v$, and
$$u(x) = K[v](x) = -\frac{u_2(x)}{W(a)} \int_a^x u_1(t)v(t)\,dt - \frac{u_1(x)}{W(a)} \int_x^b u_2(t)v(t)\,dt.$$
Differentiating this once gives
$$u'(x) = -\frac{u_2'(x)}{W(a)} \int_a^x u_1(t)v(t)\,dt - \frac{u_1'(x)}{W(a)} \int_x^b u_2(t)v(t)\,dt,$$
and a second differentiation yields
$$u''(x) = -\frac{u_2''(x)}{W(a)} \int_a^x u_1(t)v(t)\,dt - \frac{u_1''(x)}{W(a)} \int_x^b u_2(t)v(t)\,dt + v(x).$$
The last equation follows from the fact that the Wronskian $u_1'u_2 - u_2'u_1$ is constant for a DE of the form $u'' - qu = 0$. By substituting $u_1'' = qu_1$ and $u_2'' = qu_2$ in the last equation, we verify that $u = K[v]$ is indeed a solution of the Sturm-Liouville system $L_x u = v$.
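As an illustration (not from the text), the construction can be tested for the concrete choice $q = 1$ on $[0,1]$ with Dirichlet conditions, where $u_1 = \sinh x$, $u_2 = \sinh(1-x)$, and $W = \sinh 1$; for $v = 1$ the exact solution of $u'' - u = 1$, $u(0) = u(1) = 0$ is $u(x) = \cosh(x - \tfrac12)/\cosh\tfrac12 - 1$:

```python
import numpy as np

# Homogeneous solutions of u'' - q u = 0 with q = 1 on [0, 1]:
u1 = np.sinh                         # u1 satisfies the left BC  u(0) = 0
u2 = lambda s: np.sinh(1.0 - s)      # u2 satisfies the right BC u(1) = 0
W = np.sinh(1.0)                     # Wronskian u1' u2 - u2' u1 (constant)

def K_apply(v, x, n=20000):
    # u(x) = -u2(x)/W int_0^x u1 v dt - u1(x)/W int_x^1 u2 v dt (midpoint rule)
    t = (np.arange(n) + 0.5) / n
    h = 1.0 / n
    left = np.sum(np.where(t < x, u1(t) * v(t), 0.0)) * h
    right = np.sum(np.where(t >= x, u2(t) * v(t), 0.0)) * h
    return -u2(x) / W * left - u1(x) / W * right

v = lambda t: np.ones_like(t)
exact = lambda x: np.cosh(x - 0.5) / np.cosh(0.5) - 1.0
for xx in (0.2, 0.5, 0.8):
    assert abs(K_apply(v, xx) - exact(xx)) < 1e-3
print("K[v] reproduces the exact solution")
```

The choice $q = 1$ and the right-hand side $v = 1$ are arbitrary; any $q$ for which the two homogeneous solutions are known works the same way.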
Next, we show that the eigensolutions of the S-L system are nondegenerate, i.e., the eigenspaces are one-dimensional. Suppose $f_1$ and $f_2$ are any two eigenfunctions corresponding to the same eigenvalue. Then both must satisfy the same DE and the same boundary conditions; in particular, we must have
$$\begin{aligned} \alpha_1 f_1(a) + \beta_1 f_1'(a) &= 0, \\ \alpha_1 f_2(a) + \beta_1 f_2'(a) &= 0 \end{aligned} \;\Longrightarrow\; \begin{pmatrix} f_1(a) & f_1'(a) \\ f_2(a) & f_2'(a) \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \beta_1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \qquad(18.3)$$
If $\alpha_1$ and $\beta_1$ are not both zero, the Wronskian (the determinant of the matrix above) must vanish. Therefore, the two functions must be linearly dependent.

Finally, recall that a Hilbert space on which a compact operator $K$ is defined can be written as a direct sum of the latter's eigenspaces. More specifically, $\mathcal{H} = \sum_{j=0}^{N} \oplus\, \mathcal{M}_j$, where each $\mathcal{M}_j$ is finite-dimensional for $j = 1, 2, \ldots$, and

⁶Although this will change $q$ (and the original operator), no information will be lost because the eigenvectors will be the same and all eigenvalues will be changed by $a$.
$N$ can be finite or infinite. If $N$ is finite, then $\mathcal{M}_0$, which can be considered as the eigenspace of zero eigenvalue,⁷ will be infinite-dimensional. If $\mathcal{M}_0$ is finite-dimensional (or absent), then $N$ must be infinite, and the eigenvectors of $K$ will span the entire space, i.e., they will form a complete orthogonal system. We now show that this holds for the regular Sturm-Liouville operator.
Jacques Charles François Sturm (1803-1855) made the first accurate determination of the velocity of sound in water in 1826, working with the Swiss engineer Daniel Colladon. He became a French citizen in 1833 and worked in Paris at the École Polytechnique, where he became a professor in 1838. In 1840 he succeeded Poisson in the chair of mechanics in the Faculté des Sciences, Paris.

The problems of determining the eigenvalues and eigenfunctions of an ordinary differential equation with boundary conditions and of expanding a given function in terms of an infinite series of the eigenfunctions, which date from about 1750, became more prominent as new coordinate systems were introduced and new classes of functions arose as the eigenfunctions of ordinary differential equations. Sturm and his friend Joseph Liouville decided to tackle the general problem for any second-order linear differential equation.

Sturm had been working since 1833 on problems of partial differential equations, primarily on the flow of heat in a bar of variable density, and hence was fully aware of the eigenvalue and eigenfunction problem. The mathematical ideas he applied to this problem are closely related to his investigations of the reality and distribution of the roots of algebraic equations. His ideas on differential equations, he says, came from the study of difference equations and a passage to the limit. Liouville, informed by Sturm of the problems he was working on, took up the same subject. The results of their joint work were published in several papers which are quite detailed.
Suppose that the above Hilbert-Schmidt operator $K$ has a zero eigenvalue. Then there must exist a nonzero function $v$ such that $K[v](x) = 0$, i.e.,
$$-\frac{u_2(x)}{W(a)} \int_a^x u_1(t)v(t)\,dt - \frac{u_1(x)}{W(a)} \int_x^b u_2(t)v(t)\,dt = 0 \qquad(18.4)$$
for all $x$. Differentiate this twice to get
$$-\frac{u_2''(x)}{W(a)} \int_a^x u_1(t)v(t)\,dt - \frac{u_1''(x)}{W(a)} \int_x^b u_2(t)v(t)\,dt + v(x) = 0.$$
Now substitute $u_1'' = qu_1$ and $u_2'' = qu_2$ in this equation and use Equation (18.4) to conclude that $v = 0$. This is impossible because no eigenvector can be zero.

⁷The reader recalls that when $K$ acts on $\mathcal{M}_0$, it yields zero.
Theorem for regular Sturm-Liouville systems
Hence, zero is not an eigenvalue of $K$, i.e., $\mathcal{M}_0 = \{0\}$. Since eigenvectors of $K = L_x^{-1}$ coincide with eigenvectors of $L_x$, and eigenvalues of $L_x$ are the reciprocals of the eigenvalues of $K$, we have the following result.

18.1.9. Theorem. A regular Sturm-Liouville system has a countable number of eigenvalues that can be arranged in an increasing sequence that has infinity as its limit. The eigenvectors of the Sturm-Liouville operator are nondegenerate and constitute a complete orthogonal set. Furthermore, the eigenfunction $u_n(x)$ corresponding to the eigenvalue $\lambda_n$ has exactly $n$ zeros in its interval of definition.

The last statement is not a result of operator theory, but can be derived using the theory of differential equations. We shall not present the details of its derivation. We need to emphasize that the boundary conditions are an integral part of S-L systems. Changing the boundary conditions so that, for example, they are no longer separated may destroy the regularity of the S-L system.
18.2 Sturm-Liouville Systems and SOLDEs

We are now ready to combine our discussion of the preceding section with the knowledge gained from our study of differential equations. We saw in Chapter 12 that the separation of PDEs normally results in expressions of the form
$$L[u] + \lambda u = 0, \qquad\text{or}\qquad p_2(x)\frac{d^2u}{dx^2} + p_1(x)\frac{du}{dx} + p_0(x)u + \lambda u = 0, \qquad(18.5)$$
where $u$ is a function of a single variable and $\lambda$ is, a priori, an arbitrary constant. This is an eigenvalue equation for the operator $L$, which is not, in general, self-adjoint. If we use Theorem 13.5.4 and multiply (18.5) by
$$w(x) = \frac{1}{p_2(x)} \exp\left[\int^x \frac{p_1(t)}{p_2(t)}\,dt\right],$$
it becomes self-adjoint for real $\lambda$, and can be written as
$$\frac{d}{dx}\left[p(x)\frac{du}{dx}\right] + [\lambda w(x) - q(x)]u = 0 \qquad(18.6)$$
with $p(x) = w(x)p_2(x)$ and $q(x) = -p_0(x)w(x)$. Equation (18.6) is the standard form of the S-L equation. However, it is not in the form studied in the previous section. To turn it into that form one changes both the independent and dependent variables via the so-called Liouville substitution:

Liouville substitution

$$u(x) = v(t)\left[p(x)w(x)\right]^{-1/4}, \qquad t = \int_a^x \sqrt{\frac{w(s)}{p(s)}}\,ds. \qquad(18.7)$$
It is then a matter of chain-rule differentiation to show that Equation (18.6) becomes
$$\frac{d^2v}{dt^2} + [\lambda - Q(t)]v = 0, \qquad(18.8)$$
where
$$Q(t) = \frac{q(x(t))}{w(x(t))} + \left[p(x(t))w(x(t))\right]^{-1/4} \frac{d^2}{dt^2}\left[(pw)^{1/4}\right].$$
Therefore, Theorem 18.1.9 still holds.
Joseph Liouville (1809-1882) was a highly respected professor at the Collège de France, in Paris, and the founder and editor of the Journal des Mathématiques Pures et Appliquées, a famous periodical that played an important role in French mathematical life through the latter part of the nineteenth century. His own remarkable achievements as a creative mathematician have only recently received the appreciation they deserve.

He was the first to solve a boundary value problem by solving an equivalent integral equation. His ingenious theory of fractional differentiation answered the long-standing question of what reasonable meaning can be assigned to the symbol $d^n y/dx^n$ when $n$ is not a positive integer. He discovered the fundamental result in complex analysis that a bounded entire function is necessarily a constant and used it as the basis for his own theory of elliptic functions. There is also a well-known Liouville theorem in Hamiltonian mechanics, which states that volume integrals are time-invariant in phase space. In collaboration with Sturm, he also investigated the eigenvalue problem of second-order differential equations.

The theory of transcendental numbers is another branch of mathematics that originated in Liouville's work. The irrationality of $\pi$ and $e$ (the fact that they are not solutions of any linear equations) had been proved in the eighteenth century by Lambert and Euler. In 1844 Liouville showed that $e$ is not a root of any quadratic equation with integral coefficients as well. This led him to conjecture that $e$ is transcendental, which means that it does not satisfy any polynomial equation with integral coefficients.
18.2.1. Example. The Liouville substitution [Equation (18.7)] transforms the Bessel DE $(xu')' + (k^2x - \nu^2/x)u = 0$ into
$$\frac{d^2v}{dt^2} + \left[k^2 - \frac{\nu^2 - 1/4}{t^2}\right]v = 0,$$
from which we can obtain an interesting result when $\nu = \frac{1}{2}$. In that case we have $\ddot{v} + k^2v = 0$, whose solutions are of the form $\cos kt$ and $\sin kt$. Noting that $u(x) = J_{1/2}(x)$, Equation (18.7) gives
$$J_{1/2}(kt) = A\,\frac{\sin kt}{\sqrt{t}} \qquad\text{or}\qquad J_{1/2}(kt) = B\,\frac{\cos kt}{\sqrt{t}},$$
and since $J_{1/2}(x)$ is analytic at $x = 0$, we must have $J_{1/2}(kt) = A\sin kt/\sqrt{t}$, which is the result obtained in Chapter 14. ■
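The closed form $J_{1/2}(x) = \sqrt{2/(\pi x)}\,\sin x$ (so $A = \sqrt{2/\pi}$ in the standard normalization) can be confirmed with SciPy's Bessel routines; this check is ours, not part of the text:

```python
import numpy as np
from scipy.special import jv

x = np.linspace(0.1, 10.0, 50)
closed_form = np.sqrt(2.0 / (np.pi * x)) * np.sin(x)  # A sin(x)/sqrt(x), A = sqrt(2/pi)
assert np.allclose(jv(0.5, x), closed_form)
print("J_1/2(x) matches sqrt(2/(pi x)) sin x")
```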
The appearance of $w$ is the result of our desire to render the differential operator self-adjoint. It also appears in another context. Recall the Lagrange identity for a self-adjoint differential operator $L$:
$$uL[v] - vL[u] = \frac{d}{dx}\left\{p(x)\left[u(x)v'(x) - v(x)u'(x)\right]\right\}. \qquad(18.9)$$
If we specialize this identity to the S-L equation of (18.6) with $u = u_1$ corresponding to the eigenvalue $\lambda_1$ and $v = u_2$ corresponding to the eigenvalue $\lambda_2$, we obtain for the LHS
$$u_1L[u_2] - u_2L[u_1] = u_1(-\lambda_2 w u_2) + u_2(\lambda_1 w u_1) = (\lambda_1 - \lambda_2)w u_1 u_2.$$
Integrating both sides of (18.9) then yields
$$(\lambda_1 - \lambda_2)\int_a^b w u_1 u_2\,dx = \left\{p(x)\left[u_1(x)u_2'(x) - u_2(x)u_1'(x)\right]\right\}_a^b. \qquad(18.10)$$
A desired property of the solutions of a self-adjoint DE is their orthogonality when they belong to different eigenvalues. This property will be satisfied if we assume an inner product integral with weight function $w(x)$, and if the RHS of Equation (18.10) vanishes. There are various boundary conditions that fulfill the latter requirement. For example, $u_1$ and $u_2$ could satisfy the boundary conditions of Equation (18.2).

periodic boundary conditions

Another set of appropriate boundary conditions (BC) is the periodic BC given by
$$u(a) = u(b) \qquad\text{and}\qquad u'(a) = u'(b). \qquad(18.11)$$
However, as the following example shows, the latter BCs do not lead to a regular S-L system.

18.2.2. Example. (a) The S-L system consisting of the S-L equation $d^2u/dt^2 + \omega^2 u = 0$ in the interval $[0,T]$ with the separated BCs $u(0) = 0$ and $u(T) = 0$ has the eigenfunctions $u_n(t) = \sin(n\pi t/T)$ with $n = 1, 2, \ldots$ and the eigenvalues $\lambda_n = \omega_n^2 = (n\pi/T)^2$ with $n = 1, 2, \ldots$.
(b) Let the S-L equation be the same as in part (a) but change the interval to $[-T, +T]$ and the BCs to periodic ones such as $u(-T) = u(T)$ and $u'(-T) = u'(T)$. The eigenvalues are the same as before, but the eigenfunctions are $1$, $\sin(n\pi t/T)$, and $\cos(n\pi t/T)$, where $n$ is a positive integer. Note that there is a degeneracy here in the sense that there are two linearly independent eigenfunctions having the same eigenvalue $(n\pi/T)^2$. By Theorem 18.1.9, the S-L system is not regular.
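Part (a) can be checked with a standard finite-difference discretization of $-d^2/dt^2$ (a sketch of ours, not the book's method); the choice $T = \pi$ makes the expected eigenvalues $1, 4, 9, 16, \ldots$:

```python
import numpy as np

T, n = np.pi, 1000
h = T / n
# interior-point finite-difference matrix for -d^2/dt^2 with u(0) = u(T) = 0
A = (np.diag(2.0 * np.ones(n - 1)) - np.diag(np.ones(n - 2), 1)
     - np.diag(np.ones(n - 2), -1)) / h**2
vals = np.sort(np.linalg.eigvalsh(A))[:4]
expected = (np.pi * np.arange(1, 5) / T) ** 2    # lambda_n = (n pi / T)^2
print(vals, expected)
```

Each eigenvalue is simple, in line with the nondegeneracy of the regular system; under periodic BCs the analogous matrix (circulant) would show the doubled eigenvalues of part (b).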
(c) The Bessel equation for a given fixed ν² is

u″ + (1/x)u′ + (k² − ν²/x²)u = 0, where a ≤ x ≤ b,    (18.12)
516 18. STURM-LIOUVILLE SYSTEMS: FORMALISM
and it can be turned into an S-L system if we multiply it by

w(x) = (1/p₂(x)) exp[∫ˣ (p₁(t)/p₂(t)) dt] = exp[∫ˣ dt/t] = x.
Then we can write

(d/dx)(x du/dx) + (k²x − ν²/x)u = 0,
which is in the form of Equation (18.6) with p = w = x, λ = k², and q(x) = ν²/x. If a > 0, we can obtain a regular S-L system by applying appropriate separated BCs. ■
singular S-L systems

A regular S-L system is too restrictive for applications where either a or b or both may be infinite or where either a or b may be a singular point of the S-L equation. A singular S-L system is one for which one or more of the following conditions hold:
1. The interval [a, b] stretches to infinity in either or both directions.
2. Either p or w vanishes at one or both end points a and b.
3. The function q(x) is not continuous in [a, b].
4. Any one of the functions p(x), q(x), and w(x) is singular at a or b.
Even though the conclusions concerning eigenvalues of a regular S-L system cannot be generalized to the singular S-L system, the orthogonality of eigenfunctions corresponding to different eigenvalues can, as long as the eigenfunctions are square-integrable with weight function w(x):
18.2.3. Box. The eigenfunctions of a singular S-L system are orthogonal if the RHS of (18.10) vanishes.
18.2.4. Example. Bessel functions Jν(x) are entire functions. Thus, they are square-integrable in the interval [0, b] for any finite positive b. For fixed ν the DE

r² d²u/dr² + r du/dr + (k²r² − ν²)u = 0

transforms into the Bessel equation x²u″ + xu′ + (x² − ν²)u = 0 if we make the substitution kr = x. Thus, the solution of the singular S-L equation (18.12) that is analytic at r = 0 and corresponds to the eigenvalue k² is u_k(r) = Jν(kr). For two different eigenvalues, k₁² and k₂², the eigenfunctions are orthogonal if the boundary term of (18.10) corresponding to Equation (18.12) vanishes, that is, if

{r[Jν(k₁r)Jν′(k₂r) − Jν(k₂r)Jν′(k₁r)]}₀ᵇ
vanishes, which will occur if and only if Jν(k₁b)Jν′(k₂b) − Jν(k₂b)Jν′(k₁b) = 0. A common choice is to take Jν(k₁b) = 0 = Jν(k₂b), that is, to take both k₁b and k₂b as (different) roots of the Bessel function of order ν. We thus have ∫₀ᵇ rJν(kᵢr)Jν(kⱼr) dr = 0 if kᵢ and kⱼ are different roots of Jν(kb) = 0. The Legendre equation
(d/dx)[(1 − x²) du/dx] + λu = 0, where −1 < x < 1,
is already self-adjoint. Thus, w(x) = 1, and p(x) = 1 − x². The eigenfunctions of this singular S-L system [singular because p(1) = p(−1) = 0] are regular at the end points x = ±1 and are the Legendre polynomials Pₙ(x) corresponding to λ = n(n + 1). The boundary term of (18.10) clearly vanishes at a = −1 and b = +1. Since the Pₙ(x) are square-integrable on [−1, +1], we obtain the familiar orthogonality relation: ∫₋₁⁺¹ Pₙ(x)Pₘ(x) dx = 0 if m ≠ n.
The Hermite DE is

u″ − 2xu′ + λu = 0.    (18.13)

It is transformed into an S-L system if we multiply it by w(x) = e^(−x²). The resulting S-L equation is

(d/dx)[e^(−x²) du/dx] + λe^(−x²)u = 0.    (18.14)
The boundary term corresponding to the two eigenfunctions u₁(x) and u₂(x) having the respective eigenvalues λ₁ and λ₂ ≠ λ₁ is

{e^(−x²)[u₁(x)u₂′(x) − u₂(x)u₁′(x)]}ₐᵇ.

This vanishes for arbitrary u₁ and u₂ (because they are Hermite polynomials) if a = −∞ and b = +∞.
The function u is an eigenfunction of (18.14) corresponding to the eigenvalue λ if and only if it is a solution of (18.13). Solutions of this DE corresponding to λ = 2n are the Hermite polynomials Hₙ(x) discussed in Chapter 7. We can therefore write ∫₋∞⁺∞ e^(−x²)Hₙ(x)Hₘ(x) dx = 0 if m ≠ n. This orthogonality relation was also derived in Chapter 7. ■
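The boxed criterion is easy to check numerically. In the Bessel case above, taking b = 1 and k₁, k₂ equal to the first two zeros of J₀ makes the boundary term of (18.10) vanish, so the weighted integral ∫₀¹ r J₀(k₁r)J₀(k₂r) dr should come out (numerically) zero, while the norm integral ∫₀¹ r J₀(k₁r)² dr does not. A minimal Python sketch, assuming the standard power series for J₀ and an ad hoc Simpson quadrature; the truncation orders are arbitrary choices adequate for this range of x:

```python
import math

def j0(x):
    """J0 via its power series: sum_k (-1)^k (x/2)^(2k) / (k!)^2."""
    term, total = 1.0, 1.0
    for k in range(1, 60):
        term *= -(x * x / 4.0) / (k * k)
        total += term
    return total

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return s * h / 3.0

k1, k2 = 2.404825557695773, 5.520078110286311  # first two zeros of J0
cross = simpson(lambda r: r * j0(k1 * r) * j0(k2 * r), 0.0, 1.0)
norm = simpson(lambda r: r * j0(k1 * r) ** 2, 0.0, 1.0)
print(cross, norm)  # cross ~ 0; norm = J1(k1)^2 / 2 ≈ 0.135
```

Replacing k₂ by any number that is not a zero of J₀ makes `cross` visibly nonzero, which is precisely the failure of the boundary term in (18.10).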
18.3 Other Properties of Sturm-Liouville Systems
The S-L problem is central to the solution of many DEs in mathematical physics.
In some cases the S-L equation has a direct bearing on the physics. For example,
the eigenvalue λ may correspond to the orbital angular momentum of an electron in an atom (see the treatment of spherical harmonics in Chapter 12) or to the energy levels of a particle in a potential (see Example 14.5.2). In many cases, then, it is worthwhile to gain some knowledge of the behavior of an S-L system in the limit of large λ (high angular momentum or high energy). Similarly, it is useful to understand the behavior of the solutions for large values of their arguments. We therefore devote this section to a discussion of the behavior of solutions of an S-L system in the limit of large eigenvalues and large independent variable.
18.3.1 Asymptotic Behavior for Large Eigenvalues
We assume that the S-L operator has the form given in Equation (18.1). This can always be done for an arbitrary second-order linear DE by multiplying it by a proper function (to make it self-adjoint) followed by a Liouville substitution. So, consider an S-L system of the following form:

u″ + [λ − q(x)]u ≡ u″ + Q(x)u = 0, where Q = λ − q,    (18.15)

with the separated BCs of (18.2). Let us assume that Q(x) > 0 for all x ∈ [a, b], that is, λ > q(x). This is reasonable, since we are interested in very large λ.
The study of the system of (18.15) and (18.2) is simplified if we make the Prüfer substitution:

Prüfer substitution

u(x) = R(x, λ)[Q(x, λ)]^(−1/4) sin φ(x, λ),  u′(x) = R(x, λ)[Q(x, λ)]^(1/4) cos φ(x, λ),    (18.16)
where R(x, λ) and φ(x, λ) are λ-dependent functions of x. This substitution transforms the S-L equation of (18.15) into a pair of equations (see Problem 18.3):

dφ/dx = √(λ − q(x)) − [q′/(4[λ − q(x)])] sin 2φ,
dR/dx = Rq′ cos 2φ / (4[λ − q(x)]).    (18.17)
The function R(x, λ) is assumed to be positive because any negativity of u can be transferred to the phase φ(x, λ). Also, R cannot be zero at any point of [a, b], because both u and u′ would vanish at that point, and, by Lemma 13.3.3, u(x) ≡ 0. Equation (18.17) is very useful in discussing the asymptotic behavior of solutions of S-L systems both when λ → ∞ and when x → ∞. Before we discuss such asymptotics, we need to make a digression.
It is often useful to have a notation for the behavior of a function f(x, λ) for large λ and all values of x. If the function remains bounded for all values of x as λ → ∞, we write f(x, λ) = O(1). Intuitively, this means that as λ gets larger and larger, the magnitude of the function f(x, λ) remains of order 1. In other words, for no value of x is lim_(λ→∞) f(x, λ) infinite. If λⁿf(x, λ) = O(1), then we can write f(x, λ) = O(1)/λⁿ. This means that as λ tends to infinity, f(x, λ) goes to zero as fast as 1/λⁿ does. Sometimes this is written as f(x, λ) = O(λ⁻ⁿ). Some properties of O(1) are as follows:

1. If a is a finite real number, then O(1) + a = O(1).
2. O(1) + O(1) = O(1), and O(1)O(1) = O(1).
3. For finite a and b, ∫ₐᵇ O(1) dx = O(1).
4. If r and s are real numbers with r ≤ s, then O(1)λʳ + O(1)λˢ = O(1)λˢ.
5. If g(x) is any bounded function of x, then a Taylor series expansion yields

[λ + g(x)]ʳ = λʳ[1 + g(x)/λ]ʳ
 = λʳ{1 + r g(x)/λ + [r(r − 1)/2][g(x)/λ]² + O(1)/λ³}
 = λʳ + r g(x)λ^(r−1) + O(1)λ^(r−2) = λʳ + O(1)λ^(r−1)
 = O(1)λʳ.
Returning to Equation (18.17) and expanding its RHSs using property 5, we obtain

dφ/dx = √λ + O(1)/√λ,  dR/dx = O(1)/λ.

Taylor series expansion of φ(x, λ) and R(x, λ) about x = a then yields

φ(x, λ) = φ(a, λ) + (x − a)√λ + O(1)/√λ,
R(x, λ) = R(a, λ) + O(1)/λ    (18.18)
for λ → ∞. These results are useful in determining the behavior of λₙ for large n. As an example, we use (18.2) and (18.16) to write

−α₁/β₁ = u′(a)/u(a) = R(a, λ)Q^(1/4)(a, λ) cos[φ(a, λ)] / {R(a, λ)Q^(−1/4)(a, λ) sin[φ(a, λ)]} = Q^(1/2)(a, λ) cot[φ(a, λ)],

where we have assumed that β₁ ≠ 0. If β₁ = 0, we can take the ratio β₁/α₁, which is finite because at least one of the two constants must be different from zero. Let A = −α₁/β₁ and write cot[φ(a, λ)] = A/√(λ − q(a)). Similarly, cot[φ(b, λ)] = B/√(λ − q(b)), where B = −α₂/β₂. Let us concentrate on the nth eigenvalue and write
write
-1 A
</!(a, An) = cot
"fAn - q(a)
A
q,(b, An) = coC
1
--r.;'==T,[i'
.JAn q(b)
For large λₙ the argument of cot⁻¹ is small. Therefore, we can expand the RHS in a Taylor series about zero:

cot⁻¹ ε = cot⁻¹(0) − ε + ··· = π/2 − ε + ··· = π/2 + O(1)/√λₙ

for ε = O(1)/√λₙ. It follows that
φ(a, λₙ) = π/2 + O(1)/√λₙ,
φ(b, λₙ) = π/2 + nπ + O(1)/√λₙ.    (18.19)
The term nπ appears in (18.19) because, by Theorem 18.1.9, the nth eigenfunction has n zeros between a and b. Since u = RQ^(−1/4) sin φ, this means that sin φ must go through n zeros as x goes from a to b. Thus, at x = b the phase φ must be nπ larger than at x = a.
Substituting x = b in the first equation of (18.18), with λ → λₙ, and using (18.19), we obtain

π/2 + nπ + O(1)/√λₙ = π/2 + O(1)/√λₙ + (b − a)√λₙ + O(1)/√λₙ,

or

(b − a)√λₙ = nπ + O(1)/√λₙ.    (18.20)
One consequence of this result is that limₙ→∞ nλₙ^(−1/2) = (b − a)/π. Thus √λₙ = Cₙn, where limₙ→∞ Cₙ = π/(b − a), and Equation (18.20) can be rewritten as

√λₙ = nπ/(b − a) + O(1)/(Cₙn) = nπ/(b − a) + O(1)/n.    (18.21)
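The estimate (18.21) is easy to test numerically. The sketch below, in pure Python, uses an arbitrarily chosen smooth potential q(x) = 4x on [0, 1] with Dirichlet conditions u(0) = u(1) = 0 (a limiting case of the separated BCs, for which the same nπ/(b − a) asymptotics hold). It locates the fifth eigenvalue of u″ + [λ − q(x)]u = 0 by shooting with RK4 and bisection and compares √λ₅ with 5π; the step count and the bracketing interval are ad hoc choices:

```python
import math

def u_at_b(lam, q, a=0.0, b=1.0, steps=1000):
    """Integrate u'' = (q(x) - lam) u with u(a)=0, u'(a)=1 by RK4; return u(b)."""
    h = (b - a) / steps
    x, u, v = a, 0.0, 1.0
    def deriv(x, u, v):
        return v, (q(x) - lam) * u
    for _ in range(steps):
        k1u, k1v = deriv(x, u, v)
        k2u, k2v = deriv(x + h/2, u + h/2*k1u, v + h/2*k1v)
        k3u, k3v = deriv(x + h/2, u + h/2*k2u, v + h/2*k2v)
        k4u, k4v = deriv(x + h, u + h*k3u, v + h*k3v)
        u += h/6 * (k1u + 2*k2u + 2*k3u + k4u)
        v += h/6 * (k1v + 2*k2v + 2*k3v + k4v)
        x += h
    return u

q = lambda x: 4.0 * x        # an arbitrary smooth potential
lo, hi = 200.0, 300.0        # bracket containing only the 5th eigenvalue
for _ in range(40):          # bisect on the sign change of u(1; lam)
    mid = 0.5 * (lo + hi)
    if u_at_b(lo, q) * u_at_b(mid, q) <= 0.0:
        hi = mid
    else:
        lo = mid
lam5 = 0.5 * (lo + hi)
print(math.sqrt(lam5), 5 * math.pi)  # nearly equal; the gap is O(1)/n
```

First-order perturbation theory predicts λ₅ ≈ (5π)² + ∫₀¹ q(x)·2sin²(5πx) dx = (5π)² + 2, which the shooting result reproduces.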
Equation (18.21) describes the asymptotic behavior of eigenvalues. The following theorem, stated without proof, describes the asymptotic behavior of eigenfunctions.

18.3.1. Theorem. Let {uₙ(x)}ₙ₌₀^∞ be the normalized eigenfunctions of the regular S-L system given by Equations (18.15) and (18.2) with β₁β₂ ≠ 0. Then, for n → ∞,

uₙ(x) = √(2/(b − a)) cos[nπ(x − a)/(b − a)] + O(1)/n.    (18.22)
18.3.2. Example. Let us derive an asymptotic formula for the Legendre polynomials Pₙ(x). We first make the Liouville substitution to transform the Legendre DE [(1 − x²)Pₙ′]′ + n(n + 1)Pₙ = 0 into

d²v/dt² + [λₙ − Q(t)]v = 0, where λₙ = n(n + 1).    (18.23)

asymptotic behavior of solutions of large order

Here p(x) = 1 − x² and w(x) = 1, so t = ∫ₓ¹ ds/√(1 − s²) = cos⁻¹ x, or x(t) = cos t, and

Pₙ(x(t)) = v(t)[1 − x²(t)]^(−1/4) = v(t)/√(sin t).
For large n we can neglect Q(t), make the approximation λₙ ≈ (n + ½)², and write v̈ + (n + ½)²v = 0, whose general solution is

v(t) = A cos[(n + ½)t + α],

where A and α are arbitrary constants. Substituting this solution in (18.23) yields Pₙ(cos t) = A cos[(n + ½)t + α]/√(sin t). To determine α we note that Pₙ(0) = 0 if n is odd. Thus, if we let t = π/2, the cosine term vanishes for odd n if α = −π/4. Thus, the general asymptotic formula for Legendre polynomials is

Pₙ(cos t) = [A/√(sin t)] cos[(n + ½)t − π/4]. ■
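The constant A left undetermined above can be fixed by matching the standard normalization Pₙ(1) = 1; the classical result (Laplace's formula) is A = √(2/(nπ)), up to corrections of order 1/n. Taking that value as given, a quick numerical comparison in Python, computing Pₙ by the Bonnet recurrence:

```python
import math

def legendre(n, x):
    """P_n(x) by Bonnet's recurrence: (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2*k + 1) * x * p - k * p_prev) / (k + 1)
    return p

n, t = 50, 1.0
exact = legendre(n, math.cos(t))
approx = math.sqrt(2.0 / (n * math.pi * math.sin(t))) \
    * math.cos((n + 0.5) * t - math.pi / 4)
print(exact, approx)  # agree to roughly one part in n
```

Away from the end points t = 0 and t = π the two values track each other closely; near the end points sin t → 0 and the approximation degrades, as the 1/√(sin t) factor suggests.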
18.3.2 Asymptotic Behavior for Large x
Liouville and Prüfer substitutions are useful in investigating the behavior of the solutions of S-L systems for large x as well. The general procedure is to transform the DE into the form of Equation (18.8) by the Liouville substitution; then make the Prüfer substitution of (18.16) to obtain two DEs in the form of (18.17). Solving Equation (18.17) when x → ∞ determines the behavior of φ and R and, subsequently, of u, the solution. Problem 18.4 illustrates this procedure for the Bessel functions. We simply quote the results:
Jν(x) = √(2/(πx)) cos[x − (ν + ½)π/2 + (ν² − ¼)/(2x)] + O(1)/x^(5/2),
Yν(x) = √(2/(πx)) sin[x − (ν + ½)π/2 + (ν² − ¼)/(2x)] + O(1)/x^(5/2).

These two relations easily yield the asymptotic expressions for the Hankel functions:

Hν⁽¹⁾(x) = Jν(x) + iYν(x) = √(2/(πx)) exp{i[x − (ν + ½)π/2 + (ν² − ¼)/(2x)]} + O(1)/x^(5/2),
Hν⁽²⁾(x) = Jν(x) − iYν(x) = √(2/(πx)) exp{−i[x − (ν + ½)π/2 + (ν² − ¼)/(2x)]} + O(1)/x^(5/2).
If the last term in the exponent (which vanishes as x → ∞) is ignored, the asymptotic expression for Hν⁽¹⁾(x) matches what was obtained in Chapter 15 using the method of steepest descent.
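For ν = 0 the first of the quoted formulas reads J₀(x) ≈ √(2/(πx)) cos(x − π/4 − 1/(8x)), and the 1/(2x) phase correction is already visible at moderate x. A Python check against the power series of J₀ (the series truncation is an ad hoc choice, adequate in double precision for x up to roughly 25):

```python
import math

def j0_series(x, terms=80):
    """J0 from its power series; fine for moderate x in double precision."""
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= -(x * x / 4.0) / (k * k)
        total += term
    return total

x = 20.0
asym = math.sqrt(2.0 / (math.pi * x)) * math.cos(x - math.pi / 4 - 1.0 / (8 * x))
print(j0_series(x), asym)  # both ≈ 0.1670; the difference is O(1)/x^{5/2}
```

Dropping the −1/(8x) term in the phase noticeably worsens the agreement at this x, which is the content of the (ν² − ¼)/(2x) correction.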
18.4 Problems
18.1. Show that the Liouville substitution transforms regular S-L systems into regular S-L systems and separated and periodic BCs into separated and periodic BCs, respectively.
18.2. Let u₁(x) and u₂(x) be transformed, respectively, into v₁(t) and v₂(t) by the Liouville substitution. Show that the inner product on [a, b] with weight function w(x) is transformed into the inner product on [0, c] with unit weight, where c = ∫ₐᵇ √(w/p) dx.
18.3. Derive Equation (18.17) from (18.15) using the Prüfer substitution.
18.4. (a) Show that the Liouville substitution transforms the Bessel DE into

d²v/dt² + [k² − (ν² − ¼)/t²]v = 0.
(b) Find the equations obtained from the Prüfer substitution, and show that for large x these equations reduce to

φ′ = k[1 − a/(2k²x²)] + O(1)/x³,
R′/R = O(1)/x³,

where a ≡ ν² − ¼.
(c) Integrate these equations from x to b > x and take the limit as b → ∞ to get

φ(x) = φ∞ + kx + a/(2kx) + O(1)/x²,
R(x) = R∞ + O(1)/x²,
where φ∞ ≡ lim_(b→∞)[φ(b) − kb] and R∞ ≡ lim_(b→∞) R(b).
(d) Substitute these and the appropriate expression for Q^(−1/4) in Equation (18.16) and show that

v(x) = (R∞/√k) cos[kx − kx∞ + (ν² − ¼)/(2kx)] + O(1)/x²,

where kx∞ ≡ π/2 − φ∞.
(e) Choose R∞ = √(2/π) for all solutions of the Bessel DE, and let

kx∞ = (ν + ½)π/2 and kx∞ = (ν + ½)π/2 + π/2

for the Bessel functions Jν(x) and the Neumann functions Yν(x), respectively, and find the asymptotic behavior of these two functions.
Additional Reading
1. Birkhoff, G. and Rota, G.-C. Ordinary Differential Equations, 3rd ed., Wiley, 1978. Has a good discussion of Sturm-Liouville differential equations and their asymptotic behavior.
2. Boccara, N. Functional Analysis, Academic Press, 1990. Discusses the Sturm-Liouville operators in the same spirit as presented in this chapter.
3. Hellwig, G. Differential Operators of Mathematical Physics, Addison-Wesley, 1967. An oldie, but goodie! It gives a readable account of the Sturm-Liouville systems.
19
Sturm-Liouville Systems: Examples
Chapter 12 showed how the solution of many PDEs can be written as the product of the solutions of the separated ODEs. These DEs are usually of Sturm-Liouville type. We saw this in the construction of spherical harmonics. In this chapter, consisting mainly of illustrative examples, we shall consider the use of other coordinate systems and construct solutions to DEs as infinite series expansions in terms of S-L eigenfunctions.
19.1 Expansions in Terms of Eigenfunctions

Central to the expansion of solutions in terms of S-L eigenfunctions is the question of their completeness. This completeness was established for a regular S-L system in Theorem 18.1.9.
We shall shortly state an analogous theorem (without proof) that establishes the completeness of the eigenfunctions of more general S-L systems. This theorem requires the following generalization of the separated and the periodic BCs:
R₁u ≡ α₁₁u(a) + α₁₂u′(a) + α₁₃u(b) + α₁₄u′(b) = 0,
R₂u ≡ α₂₁u(a) + α₂₂u′(a) + α₂₃u(b) + α₂₄u′(b) = 0,    (19.1)

where the αᵢⱼ are numbers such that the rank of the following matrix is 2:

a = ( α₁₁ α₁₂ α₁₃ α₁₄ )
    ( α₂₁ α₂₂ α₂₃ α₂₄ ).

The separated BCs correspond to the case for which α₁₁ = α₁, α₁₂ = β₁, α₂₃ = α₂, and α₂₄ = β₂, with all other αᵢⱼ zero. Similarly, the periodic BC is a special case for which α₁₁ = −α₁₃ = α₂₂ = −α₂₄ = 1, with all other αᵢⱼ zero. It is easy to
verify that the rank of the matrix a is 2 for these two special cases. Let

U = {u ∈ C²[a, b] | Rⱼu = 0 for j = 1, 2}    (19.2)

be a subspace of L²_w(a, b), and, to assure the vanishing of the RHS of the Lagrange identity, assume that the following equality holds:

p(b) det( α₁₁ α₁₂ ; α₂₁ α₂₂ ) = p(a) det( α₁₃ α₁₄ ; α₂₃ α₂₄ ).    (19.3)
We are now ready to consider the theorem (for a proof, see [Hell 67, Chapter 7]).

19.1.1. Theorem. The eigenfunctions {uₙ(x)}ₙ₌₁^∞ of an S-L system consisting of the S-L equation (pu′)′ + (λw − q)u = 0 and the BCs of (19.1) form a complete basis of the subspace U of L²_w(a, b) described in (19.2). The eigenvalues are real and countably infinite, and each one has a multiplicity of at most 2. They can be ordered according to size, λ₁ ≤ λ₂ ≤ ···, and their only limit point is +∞.
First note that Equation (19.3) contains both separated and periodic BCs as special cases (Problem 19.1). In the case of periodic BCs, we assume that p(a) = p(b). Thus, all the eigenfunctions discussed so far are covered by Theorem 19.1.1. Second, the orthogonality of eigenfunctions corresponding to different eigenvalues and the fact that there are infinitely many distinct eigenvalues assure the existence of infinitely many eigenfunctions. Third, the eigenfunctions form a basis of U and not the whole L²_w(a, b). Only those functions u ∈ L²_w(a, b) that satisfy the BCs in (19.1) are expandable in terms of uₙ(x). Finally, the last statement of Theorem 19.1.1 is a repetition of part of Theorem 18.1.9 but is included because the conditions under which Theorem 19.1.1 holds are more general than those applying to Theorem 18.1.9.
Part II discussed orthogonal functions in detail and showed how other functions can be expanded in terms of them. However, the procedure used in Part II was ad hoc from a logical standpoint. After all, the orthogonal polynomials were invented by nineteenth-century mathematical physicists who, in their struggle to solve the PDEs of physics using the separation of variables, came across various ODEs of the second order, all of which were recognized later as S-L systems. From a logical standpoint, therefore, this chapter should precede Part II. But the order of the chapters was based on clarity and ease of presentation and the fact that the machinery of differential equations was a prerequisite for such a discussion.
Theorem 19.1.1 is the important link between the algebraic and the analytic machinery of differential equation theory. This theorem puts at our disposal concrete mathematical functions that are calculable to any desired accuracy (on a computer, say) and can serve as basis functions for all the expansions described in Part II.
The remainder of this chapter is devoted to solving some PDEs of mathematical
physics using the separation of variables and Theorem 19.1.1.
Figure 19.1 A rectangular conducting box of which one face is held at the potential f(x, y) and the other faces are grounded.
19.2 Separation in Cartesian Coordinates

Problems most suitable for Cartesian coordinates have boundaries with rectangular symmetry, such as boxes or planes.

rectangular conducting box

19.2.1. Example. RECTANGULAR CONDUCTING BOX
Consider a rectangular conducting box with sides a, b, and c (see Figure 19.1). All faces are held at zero potential except the top face, whose potential is given by a function f(x, y). Let us find the potential at all points inside the box.
The relevant PDE for this situation is Laplace's equation, ∇²Φ = 0. Writing Φ(x, y, z) as a product of three functions, Φ(x, y, z) = X(x)Y(y)Z(z), yields three ODEs (see Problem 19.2):

d²X/dx² + λX = 0,  d²Y/dy² + μY = 0,  d²Z/dz² + νZ = 0,    (19.4)

where λ + μ + ν = 0. The vanishing of Φ at x = 0 and x = a means that

Φ(0, y, z) = X(0)Y(y)Z(z) = 0 ∀ y, z ⇒ X(0) = 0,
Φ(a, y, z) = X(a)Y(y)Z(z) = 0 ∀ y, z ⇒ X(a) = 0.

We thus obtain an S-L system, X″ + λX = 0, X(0) = 0 = X(a), whose BC is neither separated nor periodic, but satisfies (19.1) with α₁₁ = α₂₃ = 1 and all other αᵢⱼ zero. This S-L system has the eigenvalues and eigenfunctions

λₙ = (nπ/a)² and Xₙ(x) = sin(nπx/a) for n = 1, 2, ....
Similarly, the second equation in (19.4) leads to

μₘ = (mπ/b)² and Yₘ(y) = sin(mπy/b) for m = 1, 2, ....
On the other hand, the third equation in (19.4) does not lead to an S-L system, because the BC for the top of the box does not fit (19.1). This is as expected, because the "eigenvalue" ν is already determined by λ and μ. Nevertheless, we can find a solution for that equation. The substitution

γ²ₘₙ = (nπ/a)² + (mπ/b)²
changes the Z equation to Z″ − γ²ₘₙZ = 0, whose solution, consistent with Z(0) = 0, is Z(z) = Cₘₙ sinh(γₘₙz).
We note that X(x) and Y(y) are functions satisfying R₁X = 0 = R₂X. Thus, by Theorem 19.1.1, they can be written as linear combinations of Xₙ(x) and Yₘ(y): X(x) = Σₙ₌₁^∞ Aₙ sin(nπx/a) and Y(y) = Σₘ₌₁^∞ Bₘ sin(mπy/b). Consequently, the most general solution can be expressed as

Φ(x, y, z) = Σₙ₌₁^∞ Σₘ₌₁^∞ Aₘₙ sin(nπx/a) sin(mπy/b) sinh(γₘₙz),

where Aₘₙ = AₙBₘCₘₙ.
To specify Φ completely, we must determine the arbitrary constants Aₘₙ. This is done by imposing the remaining BC, Φ(x, y, c) = f(x, y), yielding the identity

f(x, y) = Σₙ₌₁^∞ Σₘ₌₁^∞ Aₘₙ sin(nπx/a) sin(mπy/b) sinh(γₘₙc)
        = Σₙ₌₁^∞ Σₘ₌₁^∞ Bₘₙ sin(nπx/a) sin(mπy/b),

where Bₘₙ ≡ Aₘₙ sinh(γₘₙc). This is a two-dimensional Fourier series (see Chapter 8) whose coefficients are given by

Bₘₙ = (4/ab) ∫₀ᵃ dx ∫₀ᵇ dy f(x, y) sin(nπx/a) sin(mπy/b). ■
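For a top face held at a constant potential V₀, the double integral factorizes: ∫₀ᵃ sin(nπx/a) dx = (a/nπ)[1 − (−1)ⁿ], so Bₘₙ = 16V₀/(π²mn) when m and n are both odd and zero otherwise. The Python sketch below checks this closed form against a brute-force midpoint-rule evaluation of the coefficient formula (the grid size N is an ad hoc accuracy choice, and a = b = V₀ = 1 is assumed for simplicity):

```python
import math

def bmn(f, m, n, a=1.0, b=1.0, N=200):
    """Midpoint-rule estimate of B_mn = (4/ab) ∫∫ f sin(nπx/a) sin(mπy/b) dx dy."""
    hx, hy, total = a / N, b / N, 0.0
    for i in range(N):
        x = (i + 0.5) * hx
        sx = math.sin(n * math.pi * x / a)
        for j in range(N):
            y = (j + 0.5) * hy
            total += f(x, y) * sx * math.sin(m * math.pi * y / b)
    return 4.0 * total * hx * hy / (a * b)

V0 = 1.0
b11 = bmn(lambda x, y: V0, 1, 1)
print(b11, 16 * V0 / math.pi**2)  # both ≈ 1.621
print(bmn(lambda x, y: V0, 2, 1))  # even index: ≈ 0
```

The same routine works for any boundary function f(x, y), which is the practical content of the two-dimensional Fourier series above.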
Pierre Simon de Laplace (1749-1827) was a French mathematician and theoretical astronomer who was so famous in his own time that he was known as the Newton of France. His main interests throughout his life were celestial mechanics, the theory of probability, and personal advancement.
At the age of 24 he was already deeply engaged in the
detailed application of Newton's law of gravitation to the
solar system as a whole, in which the planets and their satellites are not governed by the sun alone, but interact with one
another in a bewildering variety of ways. Even Newton had
been of the opinion that divine intervention would occasionally be needed to prevent this
complex mechanism from degenerating into chaos. Laplace decided to seek reassurance
elsewhere, and succeeded in proving that the ideal solar system of mathematics is a sta-
ble dynamical system that will endure unchanged for all time. This achievement was only
one of the long series of triumphs recorded in his monumental treatise Mécanique Céleste (published in five volumes from 1799 to 1825), which summed up the work on gravitation
of several generations of illustrious mathematicians. Unfortunately for his later reputation,
he omitted all reference to the discoveries of his predecessors and contemporaries, and left
it to be inferred that the ideas were entirely his own. Many anecdotes are associated with
this work. One of the best known describes the occasion on which Napoleon tried to get a
rise out of Laplace by protesting that he had written a huge book on the system of the world
without once mentioning God as the author of the universe. Laplace is supposed to have
replied, "Sire, I had no need of that hypothesis." The principal legacy of the Mecanique
Celeste to later generations lay in Laplace's wholesale development of potential theory,
with its far-reaching implications for a dozen different branches of physical science ranging from gravitation and fluid mechanics to electromagnetism and atomic physics. Even though
he lifted the idea of the potential from Lagrange without acknowledgment, he exploited it
so extensively that ever since his time the fundamental equation of potential theory has
been known as Laplace's equation. After the French Revolution, Laplace's political talents
and greed for position came to full flower. His compatriots speak ironically of his "suppleness" and "versatility" as a politician. What this really means is that each time there was a change of regime (and there were many), Laplace smoothly adapted himself by changing
his principles-back and forth between fervent republicanism and fawning royalism-and
each time he emerged with a betterjob and grander titles. He has been aptly compared with
the apocryphal Vicar of Bray in English literature, who was twice a Catholic and twice a
Protestant. The Vicar is said to have replied as follows to the charge of being a turncoat:
"Not so, neither, for if I changed my religion, I am sure I kept true to my principle, which
is to live and die the Vicar of Bray."
To balance his faults, Laplace was always generous in giving assistance and encourage-
ment to younger scientists. From time to time he helped forward in their careers such men
as the chemist Gay-Lussac, the traveler and naturalist Humboldt, the physicist Poisson, and, appropriately, the young Cauchy, who was destined to become one of the chief architects
of nineteenth century mathematics.
Laplace's equation describes not only electrostatics, but also heat transfer. When the transfer (diffusion) of heat takes place with the temperature being independent of time, the process is known as steady-state heat transfer. The diffusion equation, ∂T/∂t = a²∇²T, becomes Laplace's equation, ∇²T = 0, and the technique of the preceding example can be used. It is easy to see that the diffusion equation allows us to perform any linear transformation on T, such as T → αT + β, and still satisfy that equation. This implies that T can be measured in any scale, such as Kelvin, Celsius, and Fahrenheit.
steady-state heat-conducting plate
19.2.2. Example. STEADY-STATE HEAT-CONDUCTING PLATE
Let us consider a rectangular heat-conducting plate with sides of lengths a and b. Three of the sides are held at T = 0, and the fourth side has a temperature variation T = f(x) (see Figure 19.2). The flat faces are insulated, so they cannot lose heat to the surroundings. Assuming a steady-state heat transfer, let us calculate the variation of T over the plate. The problem is two-dimensional. The separation of variables T(x, y) = X(x)Y(y) leads to

d²X/dx² + λX = 0,  d²Y/dy² + μY = 0,  where λ + μ = 0.    (19.5)

The X equation and the BCs T(0, y) = T(a, y) = 0 form an S-L system whose eigenvalues and eigenfunctions are λₙ = (nπ/a)² and Xₙ(x) = sin(nπx/a) for n = 1, 2, .... Thus, according to Theorem 19.1.1, a general X(x) can be written as X(x) = Σₙ₌₁^∞ Aₙ sin(nπx/a).
Figure 19.2 A heat-conducting rectangular plate.
The Y equation, on the other hand, does not form an S-L system due to the fact that its "eigenvalue" is predetermined by the third equation in (19.5). Nevertheless, we can solve the equation Y″ − (nπ/a)²Y = 0 to obtain the general solution Y(y) = Ae^(nπy/a) + Be^(−nπy/a). Since T(x, 0) = 0 ∀ x, we must have Y(0) = 0. This implies that A + B = 0, which, in turn, reduces the solution to Y = A sinh(nπy/a). Thus, the most general solution, consistent with the three BCs T(0, y) = T(a, y) = T(x, 0) = 0, is

T(x, y) = Σₙ₌₁^∞ Bₙ sin(nπx/a) sinh(nπy/a).
The fourth BC gives a Fourier series,

f(x) = Σₙ₌₁^∞ [Bₙ sinh(nπb/a)] sin(nπx/a) ≡ Σₙ₌₁^∞ cₙ sin(nπx/a),

whose coefficients can be determined from

cₙ = Bₙ sinh(nπb/a) = (2/a) ∫₀ᵃ sin(nπx/a) f(x) dx.
In particular, if the fourth side is held at the constant temperature T₀, then

cₙ = (2T₀/a)(a/nπ)[1 − (−1)ⁿ] = 4T₀/(nπ) if n is odd, 0 if n is even,

and we obtain

T(x, y) = (4T₀/π) Σₖ₌₀^∞ [1/(2k + 1)] sin[(2k + 1)πx/a] sinh[(2k + 1)πy/a] / sinh[(2k + 1)πb/a].    (19.6)
If the temperature variation of the fourth side is of the form f(x) = T₀ sin(πx/a), then

cₙ = (2T₀/a) ∫₀ᵃ sin(nπx/a) sin(πx/a) dx = (2T₀/a)(a/2)δₙ,₁ = T₀δₙ,₁,
and Bₙ = cₙ/sinh(nπb/a) = [T₀/sinh(πb/a)]δₙ,₁, and we have

T(x, y) = T₀ sin(πx/a) sinh(πy/a) / sinh(πb/a).    (19.7)
conduction of heat in a rectangular plate
Only one term of the series survives in this case because the variation on the fourth side happens to be one of the harmonics of the expansion.
Note that the temperature variations given by (19.6) and (19.7) are independent of the material of the plate because we are dealing with a steady state. The conductivity of the material is a factor only in the process of heat transfer leading to the steady state. Once equilibrium has been reached, the distribution of temperature will be the same for all materials. ■
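The series (19.6) can be summed numerically to confirm that it actually reproduces the boundary data. The Python sketch below rewrites the sinh ratio with exponentials so that large-n terms do not overflow; the number of terms and the choice a = b = 1, T₀ = 100 are ad hoc:

```python
import math

def plate_T(x, y, a=1.0, b=1.0, T0=100.0, terms=400):
    """Partial sum of Eq. (19.6): T = T0 on the side y = b, T = 0 on the others."""
    total = 0.0
    for k in range(terms):
        n = 2 * k + 1
        # sinh(nπy/a)/sinh(nπb/a), written so every exponent is <= 0 (no overflow)
        ratio = math.exp(n * math.pi * (y - b) / a) \
            * (1.0 - math.exp(-2 * n * math.pi * y / a)) \
            / (1.0 - math.exp(-2 * n * math.pi * b / a))
        total += math.sin(n * math.pi * x / a) * ratio / n
    return 4.0 * T0 / math.pi * total

print(plate_T(0.5, 1.0), plate_T(0.5, 0.0))  # ≈ 100 and exactly 0
```

On the heated edge the partial sum converges only like 1/N (it is the Fourier square-wave series), while at interior points the exponentially decaying ratio makes convergence very fast.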
The preceding two examples concerned themselves with static situations. The remaining examples of this section are drawn from the (time-dependent) diffusion equation, the Schrödinger equation, and the wave equation.
19.2.3. Example. CONDUCTION OF HEAT IN A RECTANGULAR PLATE
Consider a rectangular heat-conducting plate with sides of length a and b all held at T = 0. Assume that at time t = 0 the temperature has a distribution function f(x, y). Let us find the variation of temperature for all points (x, y) at all times t > 0.
The diffusion equation for this problem is

∂T/∂t = k²∇²T = k²(∂²T/∂x² + ∂²T/∂y²).

A separation of variables, T(x, y, t) = X(x)Y(y)g(t), leads to three DEs:

d²X/dx² + λX = 0,  d²Y/dy² + μY = 0,  dg/dt + k²(λ + μ)g = 0.
The BCs T(0, y, t) = T(a, y, t) = T(x, 0, t) = T(x, b, t) = 0, together with the three ODEs, give rise to two S-L systems. The solutions to both of these are easily found:

λₙ = (nπ/a)² and Xₙ(x) = sin(nπx/a) for n = 1, 2, ...,
μₘ = (mπ/b)² and Yₘ(y) = sin(mπy/b) for m = 1, 2, ....
These give rise to the general solutions

X(x) = Σₙ₌₁^∞ Aₙ sin(nπx/a),  Y(y) = Σₘ₌₁^∞ Bₘ sin(mπy/b).

With γₘₙ ≡ k²π²(n²/a² + m²/b²), the solution to the g equation can be expressed as g(t) = Cₘₙe^(−γₘₙt). Putting everything together, we obtain

T(x, y, t) = Σₙ₌₁^∞ Σₘ₌₁^∞ Aₘₙe^(−γₘₙt) sin(nπx/a) sin(mπy/b),
quantum particle in a box
where Aₘₙ = AₙBₘCₘₙ is an arbitrary constant. To determine it, we impose the initial condition T(x, y, 0) = f(x, y). This yields

f(x, y) = Σₙ₌₁^∞ Σₘ₌₁^∞ Aₘₙ sin(nπx/a) sin(mπy/b),

which determines the coefficients Aₘₙ:

Aₘₙ = (4/ab) ∫₀ᵃ dx ∫₀ᵇ dy f(x, y) sin(nπx/a) sin(mπy/b). ■
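Each separated term of this series satisfies the diffusion equation exactly, which is easy to verify with finite differences. A small Python check on a single mode (the mode numbers, diffusion constant, and sample point are arbitrary choices):

```python
import math

a = b = 1.0
kdiff = 0.7                     # the diffusion constant k (arbitrary value)
n, m = 2, 3                     # an arbitrary mode
gamma = kdiff**2 * math.pi**2 * (n**2 / a**2 + m**2 / b**2)

def T(x, y, t):
    """One separated mode: exp(-gamma t) sin(n pi x / a) sin(m pi y / b)."""
    return math.exp(-gamma * t) * math.sin(n * math.pi * x / a) \
        * math.sin(m * math.pi * y / b)

x0, y0, t0, h = 0.3, 0.4, 0.1, 1e-4
dT_dt = (T(x0, y0, t0 + h) - T(x0, y0, t0 - h)) / (2 * h)
lap = (T(x0 + h, y0, t0) - 2 * T(x0, y0, t0) + T(x0 - h, y0, t0)) / h**2 \
    + (T(x0, y0 + h, t0) - 2 * T(x0, y0, t0) + T(x0, y0 - h, t0)) / h**2
print(dT_dt, kdiff**2 * lap)  # the two sides of the diffusion equation agree
```

By linearity the full double series inherits the same property term by term, which is why imposing the initial condition is all that remains.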
19.2.4. Example. QUANTUM PARTICLE IN A BOX
The behavior of an atomic particle of mass μ confined in a rectangular box with sides a, b, and c (an infinite three-dimensional potential well) is governed by the Schrödinger equation for a free particle,

iℏ ∂ψ/∂t = −(ℏ²/2μ)(∂²ψ/∂x² + ∂²ψ/∂y² + ∂²ψ/∂z²),

and the BC that ψ(x, y, z, t) vanishes at all sides of the box for all time.
A separation of variables, ψ(x, y, z, t) = X(x)Y(y)Z(z)T(t), yields the ODEs

d²X/dx² + λX = 0,  d²Y/dy² + σY = 0,  d²Z/dz² + νZ = 0,  dT/dt + iωT = 0,

where ω ≡ (ℏ/2μ)(λ + σ + ν).
The spatial equations, together with the BCs

ψ(0, y, z, t) = ψ(a, y, z, t) = 0 ⇒ X(0) = 0 = X(a),
ψ(x, 0, z, t) = ψ(x, b, z, t) = 0 ⇒ Y(0) = 0 = Y(b),
ψ(x, y, 0, t) = ψ(x, y, c, t) = 0 ⇒ Z(0) = 0 = Z(c),

lead to three S-L systems, whose solutions are easily found:

Xₙ(x) = sin(nπx/a),  λₙ = (nπ/a)²  for n = 1, 2, ...,
Yₘ(y) = sin(mπy/b),  σₘ = (mπ/b)²  for m = 1, 2, ...,
Zₗ(z) = sin(lπz/c),  νₗ = (lπ/c)²  for l = 1, 2, ....
The time equation, on the other hand, has a solution of the form T(t) = e^(−iωₗₘₙt), where

ωₗₘₙ = (ℏ/2μ)[(nπ/a)² + (mπ/b)² + (lπ/c)²].

The solution of the Schrödinger equation that is consistent with the BCs is therefore

ψ(x, y, z, t) = Σₗ,ₘ,ₙ₌₁^∞ Aₗₘₙe^(−iωₗₘₙt) sin(nπx/a) sin(mπy/b) sin(lπz/c).
The constants Aₗₘₙ are determined by the initial shape ψ(x, y, z, 0) of the wave function. The energy of the particle is

E = ℏωₗₘₙ = (ℏ²π²/2μ)(n²/a² + m²/b² + l²/c²).

Each set of three positive integers (n, m, l) represents a state of the particle. For a cube, a = b = c ≡ L, and the energy of the particle is

E = (ℏ²π²/2μL²)(n² + m² + l²) = (ℏ²π²/2μV^(2/3))(n² + m² + l²),    (19.8)

where V = L³ is the volume of the box. The ground state is (1, 1, 1), has energy E = 3ℏ²π²/2μV^(2/3), and is nondegenerate (only one state corresponds to this energy). However, the higher-level states are degenerate. For instance, the three distinct states (1, 1, 2), (1, 2, 1), and (2, 1, 1) all correspond to the same energy, E = 6ℏ²π²/2μV^(2/3). The degeneracy increases rapidly with larger values of n, m, and l.
Equation (19.8) can be written as

n² + m² + l² = R², where R² ≡ 2μEV^(2/3)/(ℏ²π²).

This looks like the equation of a sphere in nml-space. If R is large, the number of states contained within the sphere of radius R (the number of states with energy less than or equal to E) is simply the volume of the first octant¹ of the sphere. If N is the number of such states, we have

N = (1/8)(4πR³/3) = πR³/6.
density of states

Thus the density of states (the number of states per unit volume) is

n = N/V = (π/6)(2μ/(ℏ²π²))^(3/2) E^(3/2).    (19.9)

Fermi energy

This is an important formula in solid-state physics, because the energy E is (with minor modifications required by spin) the Fermi energy. If the Fermi energy is denoted by E_f, Equation (19.9) gives E_f = αn^(2/3), where α is some constant. ■
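The octant-volume estimate for N can be compared with a direct count of lattice points. In the Python sketch below, R = 100 is an arbitrary choice; the count falls a few percent below πR³/6 because of surface corrections, the expected O(1/R) relative error:

```python
import math

R = 100.0                       # sphere radius in nml-space (arbitrary choice)
r2 = R * R
count = 0
for n in range(1, int(R) + 1):
    for m in range(1, int(R) + 1):
        rem = r2 - n * n - m * m
        if rem >= 1.0:
            # number of l = 1, 2, ... with n^2 + m^2 + l^2 <= R^2
            count += int(math.sqrt(rem))
estimate = math.pi * R**3 / 6.0
print(count, estimate, count / estimate)  # ratio a few percent below 1
```

Doubling R halves the relative discrepancy, consistent with the surface-to-volume argument; for macroscopic R the volume estimate, and hence (19.9), becomes exact for all practical purposes.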
In the preceding examples the time variation is given by a first derivative. Thus, as far as time is concerned, we have a FODE. It follows that the initial specification of the physical quantity of interest (temperature T or Schrödinger wave function ψ) is sufficient to determine the solution uniquely.
A second kind of time-dependent PDE occurring in physics is the wave equation, which contains time derivatives of the second order. Thus, there are two arbitrary parameters in the general solution. To determine these, we expect two initial conditions. For example, if the wave is standing, as in a rope clamped at both
¹This is because n, m, and l are all positive.
ends, the boundary conditions are not sufficient to determine the wave function
uniquely. One also needs to specify the initial (transverse) velocity ofeach point of
the rope. For traveling waves, specification of the wave shape and velocity shape
is not as important as the mode of propagation. For instance, in the theory of wave guides, after the time variation is separated, a particular time variation, such as e^{iωt}, and a particular direction for the propagation of the wave, say the z-axis, are chosen. Thus, if u denotes a component of the electric or the magnetic field, we can write u(x, y, z, t) = ψ(x, y)e^{i(ωt±kz)}, where k is the wave number. The wave equation then reduces to

∂²ψ/∂x² + ∂²ψ/∂y² + (ω²/c² − k²)ψ = 0.

Introducing γ² = ω²/c² − k² and the transverse gradient ∇_t = (∂/∂x, ∂/∂y) and
writing the above equation in terms of the full vectors, we obtain
(∇_t² + γ²) {E; B} = 0,   where   {E; B} = {E(x, y); B(x, y)} e^{i(ωt±kz)}.   (19.10)
These are the basic equations used in the study of electromagnetic wave guides
andresonant cavities.
guided waves
Maxwell's equations in conjunction with Equation (19.10) give the transverse components (components perpendicular to the propagation direction) E_t and B_t in terms of the longitudinal components E_z and B_z (see [Lorr 88, Chapter 33]):
γ²E_t = ∇_t(∂E_z/∂z) − i(ω/c) ê_z × (∇_t B_z),
γ²B_t = ∇_t(∂B_z/∂z) + i(ω/c) ê_z × (∇_t E_z).   (19.11)

Three types of guided waves are usually studied.
1. Transverse magnetic (TM) waves have B_z = 0 everywhere. The BC on E demands that E_z vanish at the conducting walls of the guide.
2. Transverse electric (TE) waves have E_z = 0 everywhere. The BC on B requires that the normal directional derivative ∂B_z/∂n vanish at the walls.
3. Transverse electromagnetic (TEM) waves have B_z = 0 = E_z. For a nontrivial solution, Equation (19.11) demands that γ² = 0. This form resembles a free wave with no boundaries.
We will discuss the TM mode briefly (see any book on electromagnetic theory for further details). The basic equations in this mode are

(∇_t² + γ²)E_z = 0,   B_z = 0,
γ²E_t = ∇_t(∂E_z/∂z),   γ²B_t = i(ω/c) ê_z × (∇_t E_z).   (19.12)
rectangular wave guides
19.2.5. Example. RECTANGULAR WAVE GUIDES
For a wave guide with a rectangular cross section of sides a and b in the x and the y directions, respectively, we have

∂²E_z/∂x² + ∂²E_z/∂y² + γ²E_z = 0.

A separation of variables, E_z(x, y) = X(x)Y(y), leads to two S-L systems,

d²X/dx² + λX = 0,   X(0) = 0 = X(a),
d²Y/dy² + μY = 0,   Y(0) = 0 = Y(b),

where γ² = λ + μ. These equations have the solutions

X_n(x) = sin(nπx/a),   λ_n = (nπ/a)²   for n = 1, 2, ...,
Y_m(y) = sin(mπy/b),   μ_m = (mπ/b)²   for m = 1, 2, ....
The wave number is given by

k_mn² = ω²/c² − γ_mn² = ω²/c² − π²(n²/a² + m²/b²),

which has to be real if the wave is to propagate (an imaginary k leads to exponential decay or growth along the z-axis). Thus, there is a cutoff frequency,

ω_mn = cπ√(n²/a² + m²/b²)   for m, n ≥ 1,

below which the wave cannot propagate through the wave guide. It follows that for a TM wave the lowest frequency that can propagate along a rectangular wave guide is ω_11 = πc√(a² + b²)/(ab).
The most general solution for E_z is therefore

E_z = Σ_{m,n=1}^∞ A_mn sin(nπx/a) sin(mπy/b) e^{i(ωt±k_mn z)}.

The constants A_mn are arbitrary and can be determined from the initial shape of the wave, but that is not commonly done. Once E_z is found, the other components can be calculated using Equation (19.12). ■
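The cutoff formula is easy to tabulate numerically. A sketch, with hypothetical guide dimensions, using ω_mn = cπ√(n²/a² + m²/b²):

```python
import math

c = 2.998e8                 # speed of light (m/s)
a, b = 0.02286, 0.01016     # hypothetical rectangular cross section (m)

def tm_cutoff(n, m):
    """Cutoff angular frequency of the TM_nm mode (n, m >= 1)."""
    return c * math.pi * math.hypot(n / a, m / b)

# The lowest TM cutoff is omega_11 = pi*c*sqrt(a^2 + b^2)/(a*b)
for n, m in [(1, 1), (2, 1), (1, 2)]:
    print(f"TM_{n}{m}: f_c = {tm_cutoff(n, m) / (2 * math.pi * 1e9):.2f} GHz")
```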
Figure 19.3 A conducting cylindrical can whose top has a potential given by V(ρ, φ), with the rest of the surface grounded.
19.3 Separation in Cylindrical Coordinates
When the geometry of the boundaries is cylindrical, the appropriate coordinate
system is the cylindrical one. This usually leads to Bessel functions "of some
kind."
Before working specific examples of cylindrical geometry, let us consider a
question that has more general implications. We saw in the previous section that
separation of variables leads to ODEs in which certain constants (eigenvalues)
appear. Different choices of signs for these constants can lead to different functional forms of the general solution. For example, an equation such as d²x/dt² − kx = 0 can have exponential solutions if k > 0 or trigonometric solutions if k < 0. One cannot a priori assign a specific sign to k. Thus, the general form of the solution is indeterminate. However, once the boundary conditions are imposed, the unique solutions will emerge regardless of the initial functional form of the solutions (see [Hass 99] for a thorough discussion of this point).
19.3.1. Example. CONDUCTING CYLINDRICAL CAN
Consider a cylindrical conducting can of radius a and height h (see Figure 19.3). The potential varies at the top face as V(ρ, φ), while the lateral surface and the bottom face are grounded. Let us find the electrostatic potential at all points inside the can.
A separation of variables, Φ(ρ, φ, z) = R(ρ)S(φ)Z(z), transforms Laplace's equation into three ODEs:

d/dρ(ρ dR/dρ) + (k²ρ − m²/ρ)R = 0,
d²S/dφ² + m²S = 0,
d²Z/dz² − k²Z = 0,

where, in anticipation of the correct BCs, we have written the constants as k² and −m² with m an integer. The first of these is the Bessel equation, whose general solution can be written as R(ρ) = AJ_m(kρ) + BY_m(kρ). The second DE, when the extra condition of periodicity is imposed on the potential, has the general solution
S(φ) = C cos mφ + D sin mφ.
Finally, the third DE has a general solution of the form Z(z) = Ee^{kz} + Fe^{−kz}.
We note that none of the three ODEs leads to an S-L system of Theorem 19.1.1, because the BCs associated with them do not satisfy (19.1). However, we can still solve the problem by imposing the given BCs.
The fact that the potential must be finite everywhere inside the can (including at ρ = 0) forces B to vanish, because the Neumann function Y_m(kρ) is not defined at ρ = 0. On the other hand, we want Φ to vanish at ρ = a. This gives J_m(ka) = 0, which demands that ka be a root of the Bessel function of order m. Denoting by x_mn the nth zero of the Bessel function of order m, we have ka = x_mn, or k = x_mn/a for n = 1, 2, ....
Similarly, the vanishing of Φ at z = 0 implies that E = −F, so that (absorbing constants)

Z(z) = E sinh(x_mn z/a).
We can now multiply R, S, and Z and sum over all possible values of m and n, keeping in mind that negative values of m give terms that are linearly dependent on the corresponding positive values. The result is the so-called Fourier–Bessel series:

Fourier–Bessel series
Φ(ρ, φ, z) = Σ_{m=0}^∞ Σ_{n=1}^∞ J_m(x_mn ρ/a) sinh(x_mn z/a)(A_mn cos mφ + B_mn sin mφ),   (19.13)
where A_mn and B_mn are constants to be determined by the remaining BC. To find these constants we use the orthogonality of the trigonometric and Bessel functions. For z = h, Equation (19.13) reduces to

V(ρ, φ) = Σ_{m=0}^∞ Σ_{n=1}^∞ J_m(x_mn ρ/a) sinh(x_mn h/a)(A_mn cos mφ + B_mn sin mφ),

from which we obtain

A_mn = [2/(πa² J²_{m+1}(x_mn) sinh(x_mn h/a))] ∫₀^{2π} dφ ∫₀^a dρ ρ V(ρ, φ) J_m(x_mn ρ/a) cos mφ,

B_mn = [2/(πa² J²_{m+1}(x_mn) sinh(x_mn h/a))] ∫₀^{2π} dφ ∫₀^a dρ ρ V(ρ, φ) J_m(x_mn ρ/a) sin mφ

(for m = 0 the factor π is replaced by 2π, since ∫₀^{2π} cos² mφ dφ = 2π when m = 0), where we have used the following result, derived in Problem 14.39:

∫₀^a J_m(x_mn ρ/a) J_m(x_mn′ ρ/a) ρ dρ = (a²/2) J²_{m+1}(x_mn) δ_{nn′}.   (19.14)
For the special but important case of azimuthal symmetry, for which V is independent of φ, we obtain

A_mn = [2δ_{m0}/(a² J₁²(x_0n) sinh(x_0n h/a))] ∫₀^a dρ ρ V(ρ) J₀(x_0n ρ/a),   B_mn = 0.   ■
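These coefficient integrals are straightforward to evaluate with SciPy's Bessel routines. A sketch for the azimuthally symmetric case (the can dimensions and the constant top-face potential V(ρ) = V₀ are hypothetical choices):

```python
import numpy as np
from scipy.special import jv, jn_zeros
from scipy.integrate import quad

a, h, V0 = 1.0, 2.0, 100.0           # can radius, height, top potential (hypothetical)
V = lambda rho: V0                    # phi-independent potential of the top face

def A0n(n, zeros):
    """Coefficient A_0n of the Fourier-Bessel series, azimuthal case."""
    x0n = zeros[n - 1]
    integral, _ = quad(lambda r: r * V(r) * jv(0, x0n * r / a), 0.0, a, limit=200)
    return 2.0 * integral / (a**2 * jv(1, x0n)**2 * np.sinh(x0n * h / a))

def Phi(rho, z, nmax=30):
    """Potential inside the can, truncated at nmax terms of the series."""
    zeros = jn_zeros(0, nmax)         # first nmax zeros x_0n of J_0
    return sum(A0n(n, zeros) * jv(0, zeros[n - 1] * rho / a)
               * np.sinh(zeros[n - 1] * z / a) for n in range(1, nmax + 1))

print(Phi(0.5, 1.0))                  # an interior value; lies between 0 and V0
```

Away from the top face the terms fall off like e^{−x_0n(h−z)/a}, so the truncated sum converges very rapidly at interior points.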
circular heat-conducting plate
The reason we obtained discrete values for k was the demand that Φ vanish at ρ = a. If we let a → ∞, then k will be a continuous variable, and instead of a sum over k, we will obtain an integral. This is completely analogous to the transition from a Fourier series to a Fourier transform, but we will not pursue it further.
19.3.2. Example. CIRCULAR HEAT-CONDUCTING PLATE
Consider a circular heat-conducting plate of radius a whose temperature at time t = 0 has a distribution function f(ρ, φ). Let us find the variation of T for all points (ρ, φ) on the plate for time t > 0 when the edge is kept at T = 0.
This is a two-dimensional problem involving the heat equation,

∂T/∂t = k²∇²T = k²[(1/ρ)(∂/∂ρ)(ρ ∂T/∂ρ) + (1/ρ²)(∂²T/∂φ²)].

A separation of variables, T(ρ, φ, t) = R(ρ)S(φ)g(t), leads to the following ODEs:

dg/dt = k²λg,   d²S/dφ² + μS = 0,
d²R/dρ² + (1/ρ)(dR/dρ) − (μ/ρ² + λ)R = 0.
To obtain exponential decay rather than growth for the temperature, we demand that λ ≡ −b² < 0. To ensure periodicity (see the discussion at the beginning of this section), we must have μ = m², where m is an integer. To have finite T at ρ = 0, no Neumann function is to be present. This leads to the following solutions:

g(t) = Ae^{−k²b²t},   S(φ) = B cos mφ + C sin mφ,   R(ρ) = DJ_m(bρ).

If the temperature is to be zero at ρ = a, we must have J_m(ba) = 0, or b = x_mn/a. It follows that the general solution can be written as

T(ρ, φ, t) = Σ_{m=0}^∞ Σ_{n=1}^∞ e^{−k²(x_mn/a)²t} J_m(x_mn ρ/a)(A_mn cos mφ + B_mn sin mφ).

A_mn and B_mn can be determined as in the preceding example. ■
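Each (m, n) mode therefore decays exponentially with e-folding time τ_mn = 1/[k²(x_mn/a)²]; the slowest mode is (0, 1), since x_01 ≈ 2.405 is the smallest Bessel zero. A sketch with hypothetical plate parameters:

```python
from scipy.special import jn_zeros

a, k2 = 0.1, 1.1e-4      # plate radius (m) and diffusivity k^2 (m^2/s), hypothetical

def decay_time(m, n):
    """e-folding time of the (m, n) mode: tau = 1 / (k^2 (x_mn / a)^2)."""
    xmn = jn_zeros(m, n)[-1]
    return 1.0 / (k2 * (xmn / a) ** 2)

for m, n in [(0, 1), (1, 1), (0, 2)]:
    print(f"mode ({m},{n}): tau = {decay_time(m, n):.1f} s")
```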
cylindrical wave guide
19.3.3. Example. CYLINDRICAL WAVE GUIDE
For a TM wave propagating along the z-axis in a hollow circular conductor, we have [see Equation (19.12)]

(1/ρ)(∂/∂ρ)(ρ ∂E_z/∂ρ) + (1/ρ²)(∂²E_z/∂φ²) + γ²E_z = 0.
The separation E_z = R(ρ)S(φ) yields S(φ) = A cos mφ + B sin mφ and

d²R/dρ² + (1/ρ)(dR/dρ) + (γ² − m²/ρ²)R = 0.

The solution to this equation, which is regular at ρ = 0 and vanishes at ρ = a, is

R(ρ) = CJ_m(x_mn ρ/a)   and   γ = x_mn/a.

Recalling the definition of γ, we obtain

k_mn² = ω²/c² − x_mn²/a²,   B_z = 0.

This gives the cut-off frequency ω_mn = c x_mn/a.
The solution for the azimuthally symmetric case (m = 0) is

E_z(ρ, z, t) = Σ_{n=1}^∞ A_n J₀(x_0n ρ/a) e^{i(ωt±k_n z)}.   ■
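The circular-guide cutoffs come directly from the Bessel zeros. A minimal sketch (the guide radius is hypothetical):

```python
from scipy.special import jn_zeros

c = 2.998e8
a = 0.05                    # hypothetical guide radius (m)

def tm_cutoff(m, n):
    """omega_mn = c * x_mn / a for the TM_mn mode of a circular guide."""
    return c * jn_zeros(m, n)[-1] / a

print(tm_cutoff(0, 1))      # lowest TM cutoff, c * 2.4048... / a
```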
current distribution in a circular wire
There are many variations on the theme of Bessel functions. We have encountered three kinds of Bessel functions, as well as modified Bessel functions. Another variation encountered in applications leads to what are known as Kelvin functions, introduced in the following example.
19.3.4. Example. CURRENT DISTRIBUTION IN A CIRCULAR WIRE
Consider the flow of charges in an infinitely long wire with a circular cross section of radius a. We are interested in calculating the variation of the current density in the wire as a function of time and location. The relevant equation can be obtained by starting with Maxwell's equations for negligible charge density (∇·E = 0), Ohm's law (j = σE), the assumption of high electrical conductivity (|σE| ≫ |∂E/∂t|), and the usual procedure of obtaining the wave equation from Maxwell's equations. The result is

∇²j − (4πσ/c²)(∂j/∂t) = 0.

Moreover, we make the simplifying assumptions that the wire is along the z-axis and that there is no turbulence, so j is also along the z direction. We further assume that j is independent of φ and z, and that its time dependence is given by e^{−iωt}. Then we get

d²j/dρ² + (1/ρ)(dj/dρ) + ε²j = 0,   (19.15)

skin depth
where ε² = i4πσω/c² ≡ 2i/δ² and δ = c/√(2πσω) is called the skin depth.

Kelvin equation
The Kelvin equation is usually given as

d²w/dx² + (1/x)(dw/dx) − ik²w = 0.   (19.16)
19.3 SEPARATION IN CYLINDRICALCOORDINATES 539
If we substitute x = √i t/k, it becomes ẅ + ẇ/t + w = 0, which is a Bessel equation of order zero. If the solution is to be regular at x = 0, then the only choice is w(t) = J₀(t) = J₀(e^{−iπ/4}kx). This is the Kelvin function for Equation (19.16). It is usually written as

Kelvin function
J₀(e^{−iπ/4}kx) ≡ ber(kx) + i bei(kx),

where ber and bei stand for "Bessel real" and "Bessel imaginary," respectively. If we substitute z = e^{−iπ/4}kx in the expansion for J₀(z) and separate the real and the imaginary parts of the expansion, we obtain

ber(x) = 1 − (x/2)⁴/(2!)² + (x/2)⁸/(4!)² − ...,
bei(x) = (x/2)²/(1!)² − (x/2)⁶/(3!)² + (x/2)¹⁰/(5!)² − ....

Equation (19.15) is the complex conjugate of (19.16) with k² = 2/δ². Thus, its solution is

j(ρ) = AJ₀(e^{iπ/4}kρ) = A[ber(√2 ρ/δ) − i bei(√2 ρ/δ)].
We can compare the value of the current density at ρ with its value at the surface ρ = a:

|j(ρ)/j(a)| = {[ber²(√2 ρ/δ) + bei²(√2 ρ/δ)] / [ber²(√2 a/δ) + bei²(√2 a/δ)]}^{1/2}.
quantum particle in a cylindrical can
For low frequencies, δ is large, which implies that ρ/δ is small; thus, ber(√2 ρ/δ) ≈ 1 and bei(√2 ρ/δ) ≈ 0, and |j(ρ)/j(a)| ≈ 1; i.e., the current density is almost uniform. For higher frequencies the ratio of the current densities starts at a value less than 1 at ρ = 0 and increases to 1 at ρ = a. The starting value depends on the frequency. For very large frequencies the starting value is almost zero (see [Mati 80, pp 150–156]). ■
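The ber/bei series converge quickly and agree with SciPy's built-in Kelvin functions, which makes the skin-effect ratio easy to evaluate. A sketch (the wire radius and skin depths are arbitrary illustrative numbers):

```python
import math
from scipy.special import ber, bei

def ber_series(x, terms=12):
    """Partial sum of ber(x) = 1 - (x/2)^4/(2!)^2 + (x/2)^8/(4!)^2 - ..."""
    return sum((-1)**s * (x / 2)**(4 * s) / math.factorial(2 * s)**2
               for s in range(terms))

def bei_series(x, terms=12):
    """Partial sum of bei(x) = (x/2)^2/(1!)^2 - (x/2)^6/(3!)^2 + ..."""
    return sum((-1)**s * (x / 2)**(4 * s + 2) / math.factorial(2 * s + 1)**2
               for s in range(terms))

def current_ratio(rho, a, delta):
    """|j(rho)/j(a)| from the ber/bei form of the solution."""
    u, v = math.sqrt(2) * rho / delta, math.sqrt(2) * a / delta
    return math.sqrt((ber(u)**2 + bei(u)**2) / (ber(v)**2 + bei(v)**2))

print(current_ratio(0.0, 1.0, 10.0))   # large delta (low frequency): nearly 1
print(current_ratio(0.0, 1.0, 0.05))   # small delta (high frequency): nearly 0
```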
19.3.5. Example. QUANTUM PARTICLE IN A CYLINDRICAL CAN
Let us consider a quantum particle in a cylindrical can. For an atomic particle of mass μ confined in a cylindrical can of length L and radius a, the relevant Schrödinger equation is

i ∂ψ/∂t = −(ħ/2μ)[(1/ρ)(∂/∂ρ)(ρ ∂ψ/∂ρ) + (1/ρ²)(∂²ψ/∂φ²) + ∂²ψ/∂z²].

Let us solve this equation subject to the BC that ψ(ρ, φ, z, t) vanishes at the sides of the can.
A separation of variables, ψ(ρ, φ, z, t) = R(ρ)S(φ)Z(z)T(t), yields

dT/dt = −iωT,   d²Z/dz² + λZ = 0,   d²S/dφ² + m²S = 0,
d²R/dρ² + (1/ρ)(dR/dρ) + (2μω/ħ − λ − m²/ρ²)R = 0.   (19.17)
The Z equation, along with its BCs, constitutes an S-L system with solutions

Z(z) = sin(kπz/L)   for k = 1, 2, ....

If we let 2μω/ħ − (kπ/L)² ≡ b², then the last equation in (19.17) becomes

d²R/dρ² + (1/ρ)(dR/dρ) + (b² − m²/ρ²)R = 0,

and the solution that is well-behaved at ρ = 0 is J_m(bρ). Since R(a) = 0, we obtain the quantization condition b = x_mn/a for n = 1, 2, .... Thus, the energy eigenvalues are

E_kmn = ħω_kmn = (ħ²/2μ)[(x_mn/a)² + (kπ/L)²],

and the general solution can be written as

ψ(ρ, φ, z, t) = Σ_{m=0}^∞ Σ_{k,n=1}^∞ e^{−iω_kmn t} J_m(x_mn ρ/a) sin(kπz/L)(A_kmn cos mφ + B_kmn sin mφ).   ■
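The spectrum mixes the Bessel zeros with the familiar particle-in-a-box term. A sketch (the electron mass and nanometer-scale can are chosen purely for illustration):

```python
import math
from scipy.special import jn_zeros

hbar = 1.054571817e-34       # J s
mu = 9.1093837015e-31        # kg (electron, as an illustrative particle)
a, L = 1e-9, 2e-9            # hypothetical can radius and length (m)

def energy(k, m, n):
    """E_kmn = (hbar^2 / 2 mu) [ (x_mn / a)^2 + (k pi / L)^2 ]."""
    xmn = jn_zeros(m, n)[-1]
    return hbar**2 / (2 * mu) * ((xmn / a)**2 + (k * math.pi / L)**2)

print(energy(1, 0, 1) / 1.602176634e-19, "eV")   # ground state: k=1, m=0, n=1
```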
19.4 Separation in Spherical Coordinates

Recall that most PDEs encountered in physical applications can be separated, in spherical coordinates, into

L²Y(θ, φ) = l(l + 1)Y(θ, φ),
d²R/dr² + (2/r)(dR/dr) + [f(r) − l(l + 1)/r²]R = 0.   (19.18)
We discussed the first of these two equations in great detail in Chapter 12. In particular, we constructed the Y_lm(θ, φ) in such a way that they formed an orthonormal sequence. However, that construction was purely algebraic and did not say anything about the completeness of the Y_lm(θ, φ). With Theorem 19.1.1 at our disposal, we can separate the first equation of (19.18) into two ODEs by writing Y_lm(θ, φ) = P_lm(θ)S_m(φ). We obtain

d²S_m/dφ² + m²S_m = 0,
(d/dx)[(1 − x²)(dP_lm/dx)] + [l(l + 1) − m²/(1 − x²)]P_lm = 0,

where x = cos θ. These are both S-L systems satisfying the conditions of Theorem 19.1.1. Thus, the S_m are orthogonal among themselves and form a complete set for L²(0, 2π). Similarly, for any fixed m, the P_lm(x) form a complete orthogonal set for L²(−1, +1) (actually for the subset of L²(−1, +1) that satisfies the same
BC as the P_lm do at x = ±1). Thus, the products Y_lm(x, φ) = P_lm(x)S_m(φ) form a complete orthogonal sequence in the (Cartesian product) set [−1, +1] × [0, 2π], which, in terms of spherical angles, is the unit sphere, 0 ≤ θ ≤ π, 0 ≤ φ ≤ 2π.
Let us consider some specific examples of expansion in the spherical coordinate system, starting with the simplest case, Laplace's equation, for which f(r) = 0. The radial equation is therefore

d²R/dr² + (2/r)(dR/dr) − [l(l + 1)/r²]R = 0.
Multiplying by r², substituting r = e^t, and using the chain rule and the fact that dt/dr = 1/r leads to the following SOLDE with constant coefficients:

d²R/dt² + dR/dt − l(l + 1)R = 0.

This has a characteristic polynomial p(λ) = λ² + λ − l(l + 1) with roots λ₁ = l and λ₂ = −(l + 1). Thus, a general solution is of the form

R(t) = Ae^{λ₁t} + Be^{λ₂t} = A(e^t)^l + B(e^t)^{−l−1},

or, in terms of r, R(r) = Ar^l + Br^{−l−1}. Thus, the most general solution of Laplace's equation is
Φ(r, θ, φ) = Σ_{l=0}^∞ Σ_{m=−l}^{l} (A_lm r^l + B_lm r^{−l−1}) Y_lm(θ, φ).
For regions containing the origin, the finiteness of Φ implies that B_lm = 0. Denoting the potential in such regions by Φ_in, we obtain

Φ_in(r, θ, φ) = Σ_{l=0}^∞ Σ_{m=−l}^{l} A_lm r^l Y_lm(θ, φ).
Similarly, for regions including r = ∞, we have

Φ_out(r, θ, φ) = Σ_{l=0}^∞ Σ_{m=−l}^{l} B_lm r^{−l−1} Y_lm(θ, φ).
To determine A_lm and B_lm, we need to invoke appropriate BCs. In particular, for inside a sphere of radius a on which the potential is given by V(θ, φ), we have

V(θ, φ) = Φ_in(a, θ, φ) = Σ_{l=0}^∞ Σ_{m=−l}^{l} A_lm a^l Y_lm(θ, φ).
Multiplying by Y*_lm(θ, φ) and integrating over dΩ = sin θ dθ dφ, we obtain

A_lm = a^{−l} ∫∫ dΩ V(θ, φ) Y*_lm(θ, φ).

Similarly, for the potential outside the sphere,

B_lm = a^{l+1} ∫∫ dΩ V(θ, φ) Y*_lm(θ, φ).
In particular, if V is independent of φ, only the components for which m = 0 are nonzero, and we have

A_l0 = (2π/a^l) ∫₀^π sin θ V(θ) Y*_l0(θ) dθ = (2π/a^l) √((2l + 1)/4π) ∫₀^π sin θ V(θ) P_l(cos θ) dθ,

which yields

Φ_in(r, θ) = Σ_{l=0}^∞ A_l (r/a)^l P_l(cos θ),   where   A_l = ((2l + 1)/2) ∫₀^π sin θ V(θ) P_l(cos θ) dθ.

Similarly, for the exterior, Φ_out(r, θ) = Σ_{l=0}^∞ A_l (a/r)^{l+1} P_l(cos θ), with the same A_l.
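The Legendre coefficients A_l are one-dimensional integrals and are easy to compute. A sketch for the classic boundary potential V(θ) = +V₀ on the upper hemisphere and −V₀ on the lower (a hypothetical choice, made because its odd symmetry kills the even coefficients):

```python
import numpy as np
from scipy.special import eval_legendre
from scipy.integrate import quad

V0 = 1.0
V = lambda theta: V0 * np.sign(np.cos(theta))   # +V0 upper, -V0 lower hemisphere

def A(l):
    """A_l = (2l+1)/2 * Integral_0^pi sin(theta) V(theta) P_l(cos theta) dtheta."""
    integrand = lambda t: np.sin(t) * V(t) * eval_legendre(l, np.cos(t))
    val, _ = quad(integrand, 0.0, np.pi, points=[np.pi / 2])
    return (2 * l + 1) / 2.0 * val

def Phi_in(r_over_a, theta, lmax=25):
    """Interior potential: Phi_in = sum_l A_l (r/a)^l P_l(cos theta)."""
    return sum(A(l) * r_over_a**l * eval_legendre(l, np.cos(theta))
               for l in range(lmax + 1))

print(A(0), A(1), A(2))    # odd symmetry: only odd-l coefficients survive
```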
The next simplest case after Laplace's equation is that for which f(r) is a constant. The diffusion equation, the wave equation, and the Schrödinger equation for a free particle give rise to such a case once time is separated from the rest of the variables.

Helmholtz equation
The Helmholtz equation is

∇²ψ + k²ψ = 0,   (19.19)

and its radial part is

d²R/dr² + (2/r)(dR/dr) + [k² − l(l + 1)/r²]R = 0.   (19.20)

spherical Bessel functions
(This equation was discussed in Problems 14.26 and 14.35.) The solutions are spherical Bessel functions, generically denoted by the corresponding lowercase letter as z_l(x) and given by

z_l(x) = √(π/2x) Z_{l+1/2}(x),   (19.21)

where Z_ν(x) is a solution of the Bessel equation of order ν.
A general solution of (19.20) can therefore be written as

R(r) = Aj_l(kr) + Bn_l(kr).

If the origin is included in the region of interest, then we must set B = 0. For such a case, the solution to the Helmholtz equation is

ψ_k(r, θ, φ) = Σ_{l=0}^∞ Σ_{m=−l}^{l} A_lm j_l(kr) Y_lm(θ, φ).   (19.22)

particle in a hard sphere
The subscript k indicates that ψ is a solution of the Helmholtz equation with k² as its constant.
19.4.1. Example. PARTICLE IN A HARD SPHERE
The time-independent Schrödinger equation for a particle in a sphere of radius a is −(ħ²/2μ)∇²ψ = Eψ with the BC ψ(a, θ, φ) = 0. Here E is the energy of the particle and μ is its mass. We rewrite the Schrödinger equation as ∇²ψ + 2μEψ/ħ² = 0. With k² = 2μE/ħ², we can immediately write the radial solution

R_l(r) = Aj_l(kr) = Aj_l(√(2μE) r/ħ).

The vanishing of ψ at a implies that j_l(√(2μE) a/ħ) = 0, or

√(2μE) a/ħ = x_ln   for n = 1, 2, ...,

where x_ln is the nth zero of j_l(x), which is the same as the zero of J_{l+1/2}(x). Thus, the energy is quantized as

E_ln = ħ²x_ln²/(2μa²)   for l = 0, 1, ..., n = 1, 2, ....

The general solution to the Schrödinger equation is

ψ(r, θ, φ) = Σ_{n=1}^∞ Σ_{l=0}^∞ Σ_{m=−l}^{l} A_nlm j_l(x_ln r/a) Y_lm(θ, φ).   ■
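The energies require the zeros x_ln of the spherical Bessel functions; for l = 0, j₀(x) = sin x/x, so x_0n = nπ. A sketch that brackets the zeros numerically (natural units ħ = μ = a = 1, purely for illustration):

```python
import math
from scipy.special import spherical_jn
from scipy.optimize import brentq

def jl_zero(l, n):
    """nth positive zero x_ln of the spherical Bessel function j_l."""
    f = lambda x: spherical_jn(l, x)
    zeros, x0 = [], 0.1
    while len(zeros) < n:          # scan for sign changes, refine with brentq
        x1 = x0 + 0.1
        if f(x0) * f(x1) < 0:
            zeros.append(brentq(f, x0, x1))
        x0 = x1
    return zeros[-1]

hbar = mu = a = 1.0                # natural units, illustrative only
E = lambda l, n: hbar**2 * jl_zero(l, n)**2 / (2 * mu * a**2)

print(jl_zero(0, 1))               # x_01 = pi, since j_0(x) = sin(x)/x
```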
A particularly useful consequence of Equation (19.22) is the expansion of a plane wave in terms of spherical Bessel functions. It is easily verified that if k is a vector with k·k = k², then e^{ik·r} is a solution of the Helmholtz equation. Thus, e^{ik·r} can be expanded as in Equation (19.22). Assuming that k is along the z-axis, we get k·r = kr cos θ, which is independent of φ. Only the terms of Equation (19.22) for which m = 0 will survive in such a case, and we may write e^{ikr cos θ} = Σ_{l=0}^∞ A_l j_l(kr) P_l(cos θ). To find A_l, let u = cos θ, multiply both sides by P_n(u), and integrate from −1 to 1:

∫_{−1}^1 P_n(u)e^{ikru} du = Σ_{l=0}^∞ A_l j_l(kr) ∫_{−1}^1 P_n(u)P_l(u) du = A_n j_n(kr) (2/(2n + 1)).
Thus,

A_n j_n(kr) = ((2n + 1)/2) ∫_{−1}^1 P_n(u)e^{ikru} du
            = ((2n + 1)/2) Σ_{m=0}^∞ ((ikr)^m/m!) ∫_{−1}^1 P_n(u)u^m du.   (19.23)

expansion of e^{ik·r} in spherical harmonics
This equality holds for all values of kr. In particular, both sides should give the same result in the limit of small kr. From the definition of j_n(kr) and the expansion of J_n(kr), we obtain

j_n(kr) ≈ (√π/2)(kr/2)^n / Γ(n + 3/2).

On the other hand, the first nonvanishing term of the RHS of Equation (19.23) occurs when m = n. Equating these terms on both sides, we get

A_n (√π/2)(kr/2)^n (2^{2n+1}n!)/((2n + 1)!√π) = ((2n + 1)/2)(i^n(kr)^n/n!)(2^{n+1}(n!)²/(2n + 1)!),   (19.24)

where we have used

Γ(n + 3/2) = (2n + 1)!√π/(2^{2n+1}n!)   and   ∫_{−1}^1 P_n(u)u^n du = 2^{n+1}(n!)²/(2n + 1)!.

Equation (19.24) yields A_n = i^n(2n + 1).
With A_n thus calculated, we can now write

e^{ikr cos θ} = Σ_{l=0}^∞ (2l + 1)i^l j_l(kr) P_l(cos θ).   (19.25)
For an arbitrary direction of k, k·r = kr cos γ, where γ is the angle between k and r. Thus, we may write e^{ik·r} = Σ_{l=0}^∞ (2l + 1)i^l j_l(kr) P_l(cos γ), and using the addition theorem for spherical harmonics, we finally obtain

e^{ik·r} = 4π Σ_{l=0}^∞ Σ_{m=−l}^{l} i^l j_l(kr) Y*_lm(θ′, φ′) Y_lm(θ, φ),   (19.26)

where θ′ and φ′ are the spherical angles of k and θ and φ are those of r. Such a decomposition of plane waves into components with definite orbital angular momenta is extremely useful when working with scattering theory for waves and particles.
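Equation (19.25) is easy to verify numerically: truncating the sum at l_max somewhat larger than kr already reproduces the plane wave to machine precision, because j_l(kr) is tiny for l ≫ kr. A sketch:

```python
import numpy as np
from scipy.special import spherical_jn, eval_legendre

def plane_wave_series(kr, costheta, lmax=40):
    """Partial sum of e^{i kr cos theta} = sum_l (2l+1) i^l j_l(kr) P_l(cos theta)."""
    return sum((2 * l + 1) * 1j**l * spherical_jn(l, kr) * eval_legendre(l, costheta)
               for l in range(lmax + 1))

kr, costheta = 5.0, 0.3
lhs = np.exp(1j * kr * costheta)
rhs = plane_wave_series(kr, costheta)
print(abs(lhs - rhs))
```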
Figure 19.4 A semi-infinite heat-conducting plate.
19.5 Problems
19.1. Show that separated and periodic BCs are special cases of the equality in
Equation (19.3).
19.2. Derive Equation (19.4).
19.3. A semi-infinite heat-conducting plate of width b is extended along the positive x-axis with one corner at (0, 0) and the other at (0, b). The side of width b is held at temperature T₀, and the two long sides are held at T = 0 (see Figure 19.4). The two flat faces are insulated. Find the temperature variation of the plate, assuming equilibrium. Repeat the problem with the temperature of the short side held at each of the following:

(a) T = 0 for 0 < y < b/2 and T = T₀ for b/2 < y < b.
(b) (T₀/b)y, 0 ≤ y ≤ b.
(c) T₀ cos(πy/b), 0 ≤ y ≤ b.
(d) T₀ sin(πy/b), 0 ≤ y ≤ b.
19.4. Find a general solution for the electromagnetic wave propagation in a resonant cavity, a rectangular box of sides 0 ≤ x ≤ a, 0 ≤ y ≤ b, and 0 ≤ z ≤ d with perfectly conducting walls. Discuss the modes the cavity can accommodate.
19.5. The lateral faces of a cube are grounded, and its top and bottom faces are held at potentials f₁(x, y) and f₂(x, y), respectively.
(a) Find a general expression for the potential inside the cube.
(b) Find the potential if the top is held at V₀ volts and the bottom at −V₀ volts.
19.6. Find the potential inside a semi-infinite cylindrical conductor, closed at the nearby end, whose cross section is a square with sides of length a. All sides are grounded except the square side, which is held at the constant potential V₀.
19.7. Find the temperature distribution of a rectangular plate (see Figure 19.2)
with sides of lengths a and b if three sides are held at T = 0 and the fourth side
has a temperature variation given by
(a) (T₀/a)x, 0 ≤ x ≤ a.
(b) (T₀/a²)x(x − a), 0 ≤ x ≤ a.
(c) (T₀/a)|x − a/2|, 0 ≤ x ≤ a.
(d) T = 0, 0 ≤ x ≤ a.
19.8. Consider a thin heat-conducting bar of length b along the x-axis with one
end at x = 0 held at temperature T₀ and the other end at x = b held at temperature −T₀. The lateral surface of the bar is thermally insulated. Find the temperature distribution at all times if initially it is given by

(a) T(0, x) = −(2T₀/b)x + T₀, where 0 ≤ x ≤ b.
(b) T(0, x) = −(2T₀/b²)x² + T₀, where 0 ≤ x ≤ b.
(c) T(0, x) = −(T₀/b)x + T₀, where 0 ≤ x ≤ b.
(d) T(0, x) = T₀ cos(πx/b), where 0 ≤ x ≤ b.
Hint: The solution corresponding to the zero eigenvalue is essential and cannot be
excluded.
19.9. Determine T(x, y, t) for the rectangular plate of Example 19.2.3 if initially the lower left quarter is held at T₀ and the rest of the plate is held at T = 0.
19.10. All sides of the plate of Example 19.2.3 are held at T = 0. Find the temperature distribution for all time if the initial temperature distribution is given by

(a) T(x, y, 0) = T₀ if a/4 ≤ x ≤ 3a/4 and b/4 ≤ y ≤ 3b/4, and 0 otherwise.
(b) T(x, y, 0) = (T₀/ab)xy, where 0 ≤ x ≤ a and 0 ≤ y ≤ b.
(c) T(x, y, 0) = (T₀/a)x, where 0 ≤ x ≤ a and 0 ≤ y ≤ b.
19.11. Repeat Example 19.2.3 with the temperatures of the sides equal to T₁, T₂, T₃, and T₄. Hint: You must include solutions corresponding to the zero eigenvalue.

19.12. A string of length a is fixed at the left end, and the right end moves with displacement A sin ωt. Find ψ(x, t) and a consistent set of initial conditions for the displacement and the velocity.
19.13. Find the equation for a vibrating rectangular membrane with sides of lengths a and b rigidly fastened on all sides. For a = b, show that a given mode frequency may have more than one solution.
19.14. Repeat Example 19.3.1 if the can has semi-infinite length, the lateral surface is grounded, and:
(a) the base is held at the potential V(ρ, φ).
Specialize to the case where the potential of the base is given, in Cartesian coordinates, by
(b) V = (V₀/a)y.
(c) V = (V₀/a)x.
(d) V = (V₀/a²)xy.
Hint: Use the integral identity ∫ z^{ν+1}J_ν(z) dz = z^{ν+1}J_{ν+1}(z).
19.15. Find the steady-state temperature distribution T(ρ, φ, z) in a semi-infinite solid cylinder of radius a if the temperature distribution of the base is f(ρ, φ) and the lateral surface is held at T = 0.

19.16. Find the steady-state temperature distribution of a solid cylinder with a height and radius of 10, assuming that the base and the lateral surface are at T = 0 and the top is at T = 100.
19.17. The circumference of a flat circular plate of radius a, lying in the xy-plane, is held at T = 0. Find the temperature distribution for all time if the temperature distribution at t = 0 is given, in Cartesian coordinates, by
(a) (T₀/a)y.
(b) (T₀/a)x.
(c) (T₀/a²)xy.
(d) T₀.
19.18. Find the temperature of a circular conducting plate of radius a at all points of its surface for all time t > 0, assuming that its edge is held at T = 0 and initially its surface from the center to a/2 is in contact with a heat bath of temperature T₀.

19.19. Find the potential of a cylindrical conducting can of radius a and height h whose top is held at a constant potential V₀ while the rest is grounded.
19.20. Find the modes and the corresponding fields of a cylindrical resonant cavity of length L and radius a. Discuss the lowest TM mode.

19.21. Two identical long conducting half-cylindrical shells (cross sections are half-circles) of radius a are glued together in such a way that they are insulated from one another. One half-cylinder is held at potential V₀ and the other is grounded. Find the potential at any point inside the resulting cylinder. Hint: Separate Laplace's equation in two dimensions.
19.22. A linear charge distribution of uniform density λ extends along the z-axis from z = −b to z = b. Show that the electrostatic potential at any point r > b is given by

Φ(r, θ, φ) = 2λ Σ_{k=0}^∞ [(b/r)^{2k+1}/(2k + 1)] P_{2k}(cos θ).

Hint: Consider a point on the z-axis at a distance r > b from the origin. Solve the simple problem by integration and compare the result with the infinite series to obtain the unknown coefficients.
19.23. The upper half of a heat-conducting sphere of radius a has T = 100; the lower half is maintained at T = −100. The whole sphere is inside an infinitely large mass of heat-conducting material. Find the steady-state temperature distribution inside and outside the sphere.
19.24. Find the steady-state temperature distribution inside a sphere of radius a when the surface temperature is given by:
(a) T₀ cos²θ.
(b) T₀ cos³θ.
(c) T₀|cos θ|.
(d) T₀(cos θ − cos³θ).
(e) T₀ sin²θ.
(f) T₀ sin⁴θ.
19.25. Find the electrostatic potential both inside and outside a conducting sphere of radius a when the sphere is maintained at a potential given by
(a) V₀(cos θ − 3 sin²θ).
(c) V₀ cos θ for the upper hemisphere, 0 for the lower hemisphere.
19.26. Find the steady-state temperature distribution inside a solid hemisphere of radius a if the curved surface is held at T₀ and the flat surface at T = 0. Hint: Imagine completing the sphere and maintaining the lower hemisphere at a temperature such that the overall surface temperature distribution is an odd function about θ = π/2.

19.27. Find the steady-state temperature distribution in a spherical shell of inner radius R₁ and outer radius R₂ when the inner surface has a temperature T₁ and the outer surface a temperature T₂.
Additional Reading
1. Jackson, J. Classical Electrodynamics, 2nd ed., Wiley, 1975. The classic
textbook on electromagnetism with many examples and problems on the
solutions of Laplace's equation in different coordinate systems.
2. Mathews, J. and Walker, R. Mathematical Methods of Physics, 2nd ed.,
Benjamin, 1970.
3. Morse, P. and Feshbach, M. Methods of Theoretical Physics, McGraw-Hill,
1953.
Part VI
Green's Functions

20
Green's Functions in One Dimension
Our treatment of differential equations, with the exception of SOLDEs with con-
stant coefficients, did not consider inhomogeneous equations. At this point, how-
ever, we can put into use one of the most elegant pieces of machinery in higher
mathematics, Green's functions, to solve inhomogeneous differential equations.
This chapter addresses Green's functions in one dimension, that is, Green's
functions of ordinary differential equations. Consider the ODE L_x[u] = f(x), where L_x is a linear differential operator. In the abstract Dirac notation this can be formally written as L|u⟩ = |f⟩. If L has an inverse L⁻¹ ≡ G, the solution can be formally written as |u⟩ = L⁻¹|f⟩ = G|f⟩. Multiplying this by ⟨x| and inserting 1 = ∫ dy |y⟩ w(y) ⟨y| between G and |f⟩ gives

u(x) = ∫ dy G(x, y)w(y)f(y),   (20.1)
where the integration is over the range of definition of the functions involved. Once
we know G(x, y), Equation (20.1) gives the solution u(x) in an integral form. But
how do we find G(x, y)?
Sandwiching both sides of LG = 1 between ⟨x| and |y⟩ and using 1 = ∫ dx′ |x′⟩ w(x′) ⟨x′| between L and G yields ∫ dx′ L(x, x′)w(x′)G(x′, y) = ⟨x|y⟩ = δ(x − y)/w(x) if we use Equation (6.3). In particular, if L is a local differential operator (see Section 16.1), then L(x, x′) = [δ(x − x′)/w(x)]L_x, and we obtain

differential equation for Green's function
L_x G(x, y) = δ(x − y)/w(x),   or   L_x G(x, y) = δ(x − y),   (20.2)

where the second equation makes the frequently used assumption that w(x) = 1.

Green's function
G(x, y) is called the Green's function (GF) for the differential operator (DO) L_x.
As discussed in Chapters 16 and 18, L_x might not be defined for all functions on ℝ. Moreover, a complete specification of L_x requires some initial (or boundary) conditions. Therefore, we expect G(x, y) to depend on such initial conditions as well. We note that when L_x is applied to (20.1), we get

L_x u(x) = ∫ dy [L_x G(x, y)]w(y)f(y) = ∫ dy (δ(x − y)/w(x)) w(y)f(y) = f(x),

indicating that u(x) is indeed a solution of the original ODE. Equation (20.2), involving the generalized function δ(x − y) (or distribution in the language of Chapter 6), is meaningful only in the same context. Thus, we treat G(x, y) not as an ordinary function but as a distribution. Finally, (20.1) is assumed to hold for an arbitrary (well-behaved) function f.
20.1 Calculation of Some Green's Functions
This section presents some examples of calculating G(x, y) for very simple DOs.
Later we will see how to obtain Green's functions for a general second-order linear
differential operator. Although the complete specification of GFs requires bound-
ary conditions, we shall introduce unspecified constants in some of the examples
below, and calculate some indefinite GFs.
20.1.1. Example. Let us find the GF for the simplest DO, L_x = d/dx. We need to find a distribution such that its derivative is the Dirac delta function:¹ G′(x, y) = δ(x − y). In Chapter 6, we encountered such a distribution: the step function θ(x − y). Thus, G(x, y) = θ(x − y) + α(y), where α(y) is the "constant" of integration. ■
The example above did not include a boundary (or initial) condition. Let us
see how boundary conditions affect the resulting GF.
20.1.2. Example. Let us solve u'(x) = f(x), where x ∈ [0, ∞) and u(0) = 0. A general
solution of this DE is given by Equation (20.1) and the preceding example:

u(x) = ∫₀^∞ θ(x - y) f(y) dy + ∫₀^∞ α(y) f(y) dy.

The factor θ(x - y) in the first term on the RHS chops off the integral at x:

u(x) = ∫₀^x f(y) dy + ∫₀^∞ α(y) f(y) dy.

The BC gives 0 = u(0) = 0 + ∫₀^∞ α(y) f(y) dy. The only way that this can be satisfied for
arbitrary f(y) is for α(y) to be zero. Thus, G(x, y) = θ(x - y), and

u(x) = ∫₀^∞ θ(x - y) f(y) dy = ∫₀^x f(y) dy.

¹Here and elsewhere in this chapter, a prime over a GF indicates differentiation with respect to its first argument.
This is killing a fly with a sledgehammer! We could have obtained the result by a simple
integration. However, the roundabout way outlined here illustrates some important features
of GFs that will be discussed later. The BC introduced here is very special. What happens
if it is changed to u(0) = a? Problem 20.1 answers that. ■
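The result of this example is easy to check numerically. The sketch below (plain Python; the infinite upper limit is truncated to a finite ymax, and the test integrand f(y) = cos y is a choice made here, not one from the text) builds u(x) = ∫₀^∞ θ(x - y) f(y) dy by the midpoint rule and confirms that it equals ∫₀^x f(y) dy = sin x and vanishes at x = 0.

```python
import math

def theta(z):
    # Heaviside step function θ(z); the value at z = 0 is irrelevant here
    return 1.0 if z > 0 else 0.0

def u(x, f, ymax=5.0, n=20000):
    # u(x) = ∫_0^∞ θ(x - y) f(y) dy, midpoint rule with a truncated upper limit
    h = ymax / n
    return sum(theta(x - (i + 0.5) * h) * f((i + 0.5) * h) * h for i in range(n))

f = math.cos                                   # test choice: ∫_0^x cos y dy = sin x
print(u(0.0, f))                               # 0.0: the BC u(0) = 0 holds
print(abs(u(1.2, f) - math.sin(1.2)) < 1e-3)   # True: u(x) matches ∫_0^x f(y) dy
```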
20.1.3. Example. A more complicated DO is L_x = d²/dx². Let us find its indefinite GF.
To do so, we integrate G''(x, y) = δ(x - y) once with respect to x to obtain

(d/dx) G(x, y) = θ(x - y) + α(y).

A second integration yields

G(x, y) = ∫ dx θ(x - y) + x α(y) + η(y),

where α and η are arbitrary functions and the integral is an indefinite integral to be evaluated
next.
Let Ω(x, y) be the primitive of θ(x - y); that is,

dΩ/dx = θ(x - y) = { 1 if x > y,
                   { 0 if x < y.          (20.3)

The solution to this equation is

Ω(x, y) = { x + a(y) if x > y,
          { b(y)     if x < y.

Note that we have not defined Ω(x, y) at x = y. It will become clear below that Ω(x, y) is
continuous at x = y. It is convenient to write Ω(x, y) as

Ω(x, y) = [x + a(y)] θ(x - y) + b(y) θ(y - x).          (20.4)

To specify a(y) and b(y) further, we differentiate (20.4) and compare it with (20.3):

dΩ/dx = θ(x - y) + [x + a(y)] δ(x - y) - b(y) δ(x - y)
      = θ(x - y) + [x - b(y) + a(y)] δ(x - y),          (20.5)

where we have used

(d/dx) θ(x - y) = -(d/dx) θ(y - x) = δ(x - y).

For Equation (20.5) to agree with (20.3), we must have [x - b(y) + a(y)] δ(x - y) = 0,
which, upon integration over x, yields a(y) - b(y) = -y. Substituting this in the expression
for Ω(x, y) gives

Ω(x, y) = (x - y) θ(x - y) + b(y)[θ(x - y) + θ(y - x)].

But θ(x) + θ(-x) = 1; therefore, Ω(x, y) = (x - y) θ(x - y) + b(y). It follows, among
other things, that Ω(x, y) is continuous at x = y. We can now write

G(x, y) = (x - y) θ(x - y) + x α(y) + β(y),

where β(y) = η(y) + b(y). ■
The GF in the example above has two arbitrary functions, α(y) and β(y), which
are the result of underspecification of L_x: A full specification of L_x requires BCs,
as the following example shows.
20.1.4. Example. Let us calculate the GF of L_x[u] = u''(x) = f(x) subject to the BCs
u(a) = u(b) = 0, where [a, b] is the interval on which L_x is defined. Example 20.1.3 gives
us the (indefinite) GF for L_x. Using that, we can write

u(x) = ∫ₐᵇ (x - y) θ(x - y) f(y) dy + x ∫ₐᵇ α(y) f(y) dy + ∫ₐᵇ β(y) f(y) dy
     = ∫ₐˣ (x - y) f(y) dy + x ∫ₐᵇ α(y) f(y) dy + ∫ₐᵇ β(y) f(y) dy,

where a ≤ x and y ≤ b. Applying the BCs yields

0 = u(a) = a ∫ₐᵇ α(y) f(y) dy + ∫ₐᵇ β(y) f(y) dy,
0 = u(b) = ∫ₐᵇ (b - y) f(y) dy + b ∫ₐᵇ α(y) f(y) dy + ∫ₐᵇ β(y) f(y) dy.          (20.6)
From these two relations it is possible to determine α(y) and β(y): Substitute for the
last integral on the RHS of the second equation of (20.6) from the first equation and get
0 = ∫ₐᵇ [b - y + b α(y) - a α(y)] f(y) dy. Since this must hold for arbitrary f(y), we
conclude that

b - y + (b - a) α(y) = 0  ⇒  α(y) = -(b - y)/(b - a).

Substituting for α(y) in the first equation of (20.6) and noting that the result holds for
arbitrary f, we obtain β(y) = a(b - y)/(b - a). Insertion of α(y) and β(y) in the expression
for G(x, y) obtained in Example 20.1.3 gives

G(x, y) = (x - y) θ(x - y) + (x - a)(y - b)/(b - a).

It is striking that G(a, y) = (a - y) θ(a - y) = 0 (because a - y ≤ 0), and

G(b, y) = (b - y) θ(b - y) + (b - a)(y - b)/(b - a) = 0

because θ(b - y) = 1 for all y ≤ b [recall that x and y lie in the interval (a, b)]. These
two equations reveal the important fact that as a function of x, G(x, y) satisfies the same
(homogeneous) BCs as the solution of the DE. This is a general property that will be discussed
later. ■
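A numerical check of this GF is straightforward. The sketch below (the test setup, in particular the trial source f = 1, is a choice made here, not part of the text) integrates G(x, y) = (x - y)θ(x - y) + (x - a)(y - b)/(b - a) against f by the midpoint rule; for f = 1 the exact solution of u'' = 1 with u(a) = u(b) = 0 is (x - a)(x - b)/2.

```python
def G(x, y, a, b):
    # GF of d²/dx² with homogeneous Dirichlet BCs u(a) = u(b) = 0 (Example 20.1.4)
    step = 1.0 if x > y else 0.0
    return (x - y) * step + (x - a) * (y - b) / (b - a)

def u(x, f, a, b, n=4000):
    # u(x) = ∫_a^b G(x, y) f(y) dy, midpoint rule
    h = (b - a) / n
    return sum(G(x, a + (i + 0.5) * h, a, b) * f(a + (i + 0.5) * h) * h
               for i in range(n))

a, b = 0.0, 2.0
f = lambda y: 1.0                       # u'' = 1  ⇒  u(x) = (x - a)(x - b)/2
for x in (a, 0.7, b):
    exact = (x - a) * (x - b) / 2
    print(abs(u(x, f, a, b) - exact) < 1e-6)   # True at the endpoints and inside
```

Note how u(a) and u(b) come out as zero automatically: G itself vanishes at x = a and x = b, which is the "same homogeneous BCs" property stated at the end of the example.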
In all the preceding examples, the BCs were very simple. Specifically, the value
of the solution and/or its derivative at the boundary points was zero. What if the
BCs are not so simple? In particular, how can we handle a case where u(a) [or
u'(a)] and u(b) [or u'(b)] are nonzero?
Consider a general (second-order) differential operator L_x and the differential
equation L_x[u] = f(x) subject to the BCs u(a) = a₁ and u(b) = b₁. We claim that
we can reduce this system to the case where u(a) = u(b) = 0. Recall from Chapter
13 that the most general solution to such a DE is of the form u = u_h + u_i, where u_h,
the solution to the homogeneous equation, satisfies L_x[u_h] = 0 and contains the
arbitrary parameters inherent in solutions of differential equations. For instance,
if the linearly independent solutions are v and w, then u_h(x) = c₁ v(x) + c₂ w(x),
and u_i is any solution of the inhomogeneous DE.
If we demand that u_h(a) = a₁ and u_h(b) = b₁, then u_i satisfies the system

L_x[u_i] = f(x),   u_i(a) = u_i(b) = 0,

which is of the type discussed in the preceding examples. Since L_x is a SOLDO, we
can put all the machinery of Chapter 13 to work to obtain v(x), w(x), and therefore
u_h(x). The problem then reduces to a DE for which the BCs are homogeneous;
that is, the value of the solution and/or its derivative is zero at the boundary points.
20.1.5. Example. Let us assume that L_x = d²/dx². Calculation of u_h is trivial:

L_x[u_h] = 0  ⇒  d²u_h/dx² = 0  ⇒  u_h(x) = c₁ x + c₂.

To evaluate c₁ and c₂, we impose the BCs u_h(a) = a₁ and u_h(b) = b₁:

c₁ a + c₂ = a₁,
c₁ b + c₂ = b₁.

This gives c₁ = (b₁ - a₁)/(b - a) and c₂ = (a₁b - ab₁)/(b - a).
The inhomogeneous equation defines a problem identical to that of Example 20.1.4.
Thus, we can immediately write u_i(x) = ∫ₐᵇ G(x, y) f(y) dy, where G(x, y) is as given in
that example. Thus, the general solution is

u(x) = [(b₁ - a₁)/(b - a)] x + (a₁b - ab₁)/(b - a) + ∫ₐˣ (x - y) f(y) dy
       + [(x - a)/(b - a)] ∫ₐᵇ (y - b) f(y) dy. ■
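The split u = u_h + u_i of this example can be assembled directly in code. In the sketch below, the straight line u_h carries the inhomogeneous BCs and the Green's-function integral u_i carries the source; the boundary data a₁ = 3, b₁ = -2 and the source f = 1 are hypothetical test choices, for which the exact solution is u_h(x) + (x - a)(x - b)/2.

```python
def solve(x, f, a, b, a1, b1, n=4000):
    # u = u_h + u_i for L_x = d²/dx²: u_h = c1 x + c2 fits u(a) = a1, u(b) = b1;
    # u_i is the Dirichlet-GF integral of Example 20.1.4 (midpoint quadrature)
    c1 = (b1 - a1) / (b - a)
    c2 = (a1 * b - a * b1) / (b - a)
    h = (b - a) / n
    def G(xx, y):
        return (xx - y) * (1.0 if xx > y else 0.0) + (xx - a) * (y - b) / (b - a)
    ui = sum(G(x, a + (i + 0.5) * h) * f(a + (i + 0.5) * h) * h for i in range(n))
    return c1 * x + c2 + ui

a, b, a1, b1 = 0.0, 1.0, 3.0, -2.0      # hypothetical boundary data
f = lambda y: 1.0
print(abs(solve(a, f, a, b, a1, b1) - a1) < 1e-9)       # u(a) = a1
print(abs(solve(b, f, a, b, a1, b1) - b1) < 1e-9)       # u(b) = b1
print(abs(solve(0.5, f, a, b, a1, b1) - 0.375) < 1e-6)  # exact u(0.5) = 3 - 2.5 - 0.125
```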
Example 20.1.5 shows that an inhomogeneous DE with inhomogeneous BCs
can be separated into two DEs, one homogeneous with inhomogeneous BCs and
the other inhomogeneous with homogeneous BCs, the latter being appropriate for
the GF. Furthermore, all the foregoing examples indicate that solutions of DEs can
be succinctly written in terms of GFs that automatically incorporate the BCs as
long as the BCs are homogeneous. Can a GF also give the solution to a DE with
inhomogeneous BCs?
20.2 Formal Considerations
The discussion and examples of the preceding section hint at the power of Green's
functions. The elegance of such a function becomes apparent from the realization
that it contains all the information about the solutions of a DE for any type of BCs,
as we are about to show. Since GFs are inverses of DOs, let us briefly reexamine
the inverse of an operator, which is closely tied to its spectrum. The question as
to whether or not an operator A in a finite-dimensional vector space is invertible
is succinctly answered by the value of its determinant: A is invertible if and only
if det A ≠ 0. In fact, as we saw at the beginning of Chapter 16, one translates the
abstract operator equation A|u⟩ = |v⟩ into a matrix equation Au = v and reduces
the question to that of the inverse of a matrix. This matrix takes on an especially
simple form when A is diagonal, that is, when A_ij = λ_i δ_ij. For this special situation
we have

λ_i u_i = v_i   for i = 1, 2, ..., N (no sum over i).          (20.7)

This equation has a unique solution (for arbitrary v_i) if and only if λ_i ≠ 0 for all
i. In that case u_i = v_i/λ_i for i = 1, 2, ..., N. In particular, if v_i = 0 for all i,
that is, when Equation (20.7) is homogeneous, the unique solution is the trivial
solution. On the other hand, when some of the λ_i are zero, there may be no solution
to (20.7), but the homogeneous equation has a nontrivial solution (u_i need not be
zero). Recalling (from Chapter 3) that an operator is invertible if and only if none
of its eigenvalues is zero, we have the following:
20.2.1. Proposition. The operator A ∈ L(V) is invertible if and only if the homogeneous
equation A|u⟩ = 0 has no nontrivial solutions.
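In the diagonal case of Equation (20.7) this dichotomy is easy to exhibit in code. The sketch below (with made-up numbers) shows that a zero eigenvalue produces an inconsistent equation 0·u_i = v_i for generic v, while nonzero eigenvalues give the unique solution u_i = v_i/λ_i.

```python
def solve_diagonal(lam, v):
    # Solve λ_i u_i = v_i component by component (Eq. (20.7), diagonal operator)
    u = []
    for li, vi in zip(lam, v):
        if li == 0.0:
            if vi != 0.0:
                return None        # 0·u_i = v_i ≠ 0: no solution, A not invertible
            u.append(0.0)          # u_i is arbitrary: the homogeneous equation
                                   # has a nontrivial solution, so u is non-unique
        else:
            u.append(vi / li)
    return u

v = [4.0, 3.0, 5.0]
print(solve_diagonal([2.0, -1.0, 0.0], v))   # None: zero eigenvalue, no solution
print(solve_diagonal([2.0, -1.0, 3.0], v))   # [2.0, -3.0, 1.6666666666666667]
```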
In infinite-dimensional (Hilbert) spaces there is no determinant. How can we
tell whether or not an operator in a Hilbert space is invertible? The exploitation
of the connection between invertibility and eigenvalues has led to Proposition
20.2.1, which can be generalized to an operator acting on any vector space, finite
or infinite. Consider the equation A|u⟩ = 0 in a Hilbert space ℋ. In general,
neither the domain nor the range of A is the whole of ℋ. If A is invertible, then
the only solution to the equation A|u⟩ = 0 is |u⟩ = 0. Conversely, assuming that
the equation has no nontrivial solution implies that the null space of A consists of
only the zero vector. Thus,

A|u⟩ = A|v⟩  ⇒  A(|u⟩ - |v⟩) = 0  ⇒  |u⟩ = |v⟩.

This shows that A is injective (one-to-one), i.e., A is a bijective linear mapping from
the domain of A, D(A), onto the range of A. Therefore, A must have an inverse.
The foregoing discussion can be expressed as follows. If A|u⟩ = 0, then (by
the definition of eigenvectors) λ = 0 is an eigenvalue of A if and only if |u⟩ ≠ 0.
Thus, if A|u⟩ = 0 has no nontrivial solution, then zero cannot be an eigenvalue of
A. This can also be stated as follows:
20.2.2. Theorem. An operator A on a Hilbert space has an inverse if and only if
λ = 0 is not an eigenvalue of A.
Green's functions are inverses of differential operators. Therefore, it is important
to have a clear understanding of the DOs. An nth-order linear differential
operator (NOLDO) satisfies the following theorem (for a proof, see [Birk 78, Chapter
6]).
20.2.3. Theorem. Let

L_x = p_n(x) d^n/dx^n + p_{n-1}(x) d^{n-1}/dx^{n-1} + ⋯ + p_1(x) d/dx + p_0(x),          (20.8)

where p_n(x) ≠ 0 in [a, b]. Let x₀ ∈ [a, b], let {γ_k}_{k=1}^{n} be given numbers,
and f(x) a given piecewise continuous function on [a, b]. Then the initial value
problem (IVP)

L_x[u] = f for x ∈ [a, b],
u(x₀) = γ₁, u'(x₀) = γ₂, ..., u^(n-1)(x₀) = γ_n          (20.9)

has one and only one solution.
initial value problem
This is simply the existence and uniqueness theorem for a NOLDE. Equation
(20.9) is referred to as the IVP with data {f(x); γ₁, ..., γ_n}. This theorem is used
to define L_x. Part of that definition are the BCs that the solutions to L_x must satisfy.
A particularly important BC is the homogeneous one, in which γ₁ = γ₂ =
⋯ = γ_n = 0. In such a case it can be shown (see Problem 20.3) that the only
solution of the homogeneous DE L_x[u] = 0 is u ≡ 0. Theorem 20.2.2
then tells us that L_x is invertible; that is, there is a unique operator G such that
LG = 1. The "components" version of this last relation is part of the content of the
next theorem.
20.2.4. Theorem. The DO L_x of Equation (20.8) associated with the IVP with
data {f(x); 0, 0, ..., 0} is invertible; that is, there exists a function G(x, y) such
that

L_x G(x, y) = δ(x - y)/w(x).
The importance of homogeneous BCs can now be appreciated. Theorem 20.2.4
is the reason why we had to impose homogeneous BCs to obtain the GF in all the
examples of the previous section.
The BCs in (20.9) clearly are not the only ones that can be used. The most
general linear BCs encountered in differential operator theory are

R₁[u] ≡ α₁₁ u(a) + ⋯ + α₁ₙ u^(n-1)(a) + β₁₁ u(b) + ⋯ + β₁ₙ u^(n-1)(b) = γ₁,
R₂[u] ≡ α₂₁ u(a) + ⋯ + α₂ₙ u^(n-1)(a) + β₂₁ u(b) + ⋯ + β₂ₙ u^(n-1)(b) = γ₂,
⋮
Rₙ[u] ≡ αₙ₁ u(a) + ⋯ + αₙₙ u^(n-1)(a) + βₙ₁ u(b) + ⋯ + βₙₙ u^(n-1)(b) = γₙ.          (20.10)

The n row vectors {(α_i1, ..., α_in, β_i1, ..., β_in)}_{i=1}^{n} are assumed to be independent
(in particular, no row is identical to zero). We refer to R_i as boundary functionals
because for each (sufficiently smooth) function u, they give a number γ_i. The
boundary functionals and boundary value problem
DO of (20.8) and the BCs of (20.10) together form a boundary value problem
(BVP). The DE L_x[u] = f subject to the BCs of (20.10) is a BVP with data
{f(x); γ₁, ..., γₙ}.
We note that the R_i are linear; that is,

R_i[u + v] = R_i[u] + R_i[v]   and   R_i[αu] = α R_i[u].
completely homogeneous problem
Since L_x is also linear, we conclude that the superposition principle applies to
the system consisting of L_x[u] = f and the BCs of (20.10), which is sometimes
denoted by (L; R₁, ..., Rₙ). If u satisfies the BVP with data {f; γ₁, ..., γₙ} and
v satisfies the BVP with data {g; μ₁, ..., μₙ}, then αu + βv satisfies the BVP
with data {αf + βg; αγ₁ + βμ₁, ..., αγₙ + βμₙ}. It follows that if u and v both
satisfy the BVP with data {f; γ₁, ..., γₙ}, then u - v satisfies the BVP with data
{0; 0, 0, ..., 0}, which is called the completely homogeneous problem.
Unlike the IVP, the BVP with data {0; 0, 0, ..., 0} may have a nontrivial solution.
If the completely homogeneous problem has no nontrivial solution, then
the BVP with data {f; γ₁, ..., γₙ} has at most one solution (a solution exists for
any set of data). On the other hand, if the completely homogeneous problem has
nontrivial solutions, then the BVP with data {f; γ₁, ..., γₙ} either has no solutions
or has more than one solution (see [Stak 79, pp. 203–204]).
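A concrete instance of the second alternative (the operator and interval here are illustrative choices, not taken from the text): for L_x = d²/dx² + 1 on [0, π] with u(0) = u(π) = 0, the completely homogeneous problem has the nontrivial solution sin x, so any solution of L_x[u] = f can be shifted by a multiple of sin x without changing either the DE or the BCs. The sketch below checks this with finite differences.

```python
import math

def L(u, x, d=1e-4):
    # L[u] = u'' + u, with u'' approximated by a central difference
    return (u(x + d) - 2 * u(x) + u(x - d)) / d**2 + u(x)

u1 = lambda x: x * (math.pi - x)            # a solution of L[u] = f for some f
u2 = lambda x: u1(x) + 7.5 * math.sin(x)    # shifted by a homogeneous solution

x = 1.3
print(abs(L(u1, x) - L(u2, x)) < 1e-4)                      # same L[u] at x
print(abs(u2(0.0)) < 1e-12 and abs(u2(math.pi)) < 1e-12)    # same (zero) BCs
```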
Recall that when a differential (unbounded) operator L_x acts in a Hilbert space,
such as L²_w(a, b), it acts only on its domain. In the context of the present discussion,
this means that not all functions in L²_w(a, b) satisfy the BCs necessary for defining
L_x. Thus, the functions for which the operator is defined (those that satisfy the
BCs) form a subset of L²_w(a, b), which we called the domain of L_x and denoted
by D(L_x). From a formal standpoint it is important to distinguish among maps
that have different domains. For instance, the Hilbert–Schmidt integral operators,
which are defined on a finite interval, are compact, while those defined on the
entire real line are not.
adjoint of a differential operator
conjunct
20.2.5. Definition. Let L_x be the DO of Equation (20.8). Suppose there exists a
DO L_x†, with the property that

w[v*(L_x[u]) - u(L_x†[v])*] = (d/dx) Q[u, v*]   for u, v ∈ D(L_x) ∩ D(L_x†),

where Q[u, v*], called the conjunct of the functions u and v, depends on u, v, and
their derivatives of order up to n - 1. The DO L_x† is then called the formal adjoint
of L_x. If L_x† = L_x (without regard to the BCs imposed on their solutions), then L_x
is said to be formally self-adjoint. If D(L_x†) ⊃ D(L_x) and L_x† = L_x on D(L_x), then
L_x is said to be hermitian. If D(L_x†) = D(L_x) and L_x† = L_x, then L_x is said to be
self-adjoint.
generalized Green's identity
The relation given in the definition above involving the conjunct is a generalization
of the Lagrange identity and can also be written in integral form:
∫ₐᵇ dx w {v*(L_x[u])} - ∫ₐᵇ dx w {u(L_x†[v])*} = Q[u, v*]|ₐᵇ.          (20.11)

This form is sometimes called the generalized Green's identity.
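For L_x = d²/dx² (so w = 1, L_x† = L_x, and Q[u, v] = u'v - uv', anticipating Example 20.2.6) and real functions, the identity can be verified numerically. The test functions u = sin and v = exp below are choices made here for illustration.

```python
import math

# Check Eq. (20.11): ∫_a^b (v u'' - u v'') dx = Q[u, v]|_a^b with Q = u'v - uv'
a, b, n = 0.3, 1.7, 20000
u, up, upp = math.sin, math.cos, lambda x: -math.sin(x)
v, vp, vpp = math.exp, math.exp, math.exp

h = (b - a) / n
lhs = sum((v(x) * upp(x) - u(x) * vpp(x)) * h
          for x in (a + (i + 0.5) * h for i in range(n)))
Q = lambda x: up(x) * v(x) - u(x) * vp(x)      # the conjunct
rhs = Q(b) - Q(a)
print(abs(lhs - rhs) < 1e-6)                   # True: surface term matches
```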
George Green (1793–1841) was not appreciated in his lifetime.
His date of birth is unknown (however, it is known that he was
baptized on 14 July 1793), and no portrait of him survives. He left
school, after only one year's attendance, to work in his father's bakery.
When the father opened a windmill in Nottingham, the boy used
an upper room as a study in which he taught himself physics and
mathematics from library books. In 1828, when he was thirty-five
years old, he published his most important work, An Essay on the
Application of Mathematical Analysis to the Theory of Electricity
and Magnetism, at his own expense. In it Green apologized for any
shortcomings in the paper due to his minimal formal education or
the limited resources available to him, the latter being apparent in the few previous works he
cited. The introduction explained the importance Green placed on the "potential" function.
The body of the paper generalizes this idea to electricity and magnetism.
In addition to the physics of electricity and magnetism, Green's first paper also contained
the monumental mathematical contributions for which he is now famous: the relationship
between surface and volume integrals we now call Green's theorem, and the Green's function,
a ubiquitous solution to partial differential equations in almost every area of physics.
With little appreciation for the future impact of this work, one of Green's contemporaries
declared the publication "a complete failure." The "Essay," which received little notice
because of poor circulation, was saved by Lord Kelvin, who tracked it down in a German
journal.
When his father died in 1829, some of George's friends urged him to seek a college
education. After four years of self-study, during which he closed the gaps in his elementary
education, Green was admitted to Caius College of Cambridge University at the age of 40,
from which he graduated four years later after a disappointing performance on his final
examinations. Later, however, he was appointed Perse Fellow of Caius College. Two years
after his appointment he died, and his famous 1828 paper was republished, this time reaching
a much wider audience. This paper has been described as "the beginning of mathematical
physics in England."
He published only ten mathematical works. In 1833 he wrote three further papers. Two
on electricity were published by the Cambridge Philosophical Society. One on hydrodynamics
was published by the Royal Society of Edinburgh (of which he was a Fellow) in
1836. He also had two papers on hydrodynamics (in particular wave motion in canals), two
papers on reflection and refraction of light, and two papers on reflection and refraction of
sound published in Cambridge.
In 1923 the Green windmill was partially restored by a local businessman as a gesture of
tribute to Green. Einstein came to pay homage. Then a fire in 1947 destroyed the renovations.
Thirty years later the idea of a memorial was once again mooted, and sufficient money was
raised to purchase the mill and present it to the sympathetic Nottingham City Council. In
1980 the George Green Memorial Appeal was launched to secure $20,000 to get the sails
turning again and the machinery working once more. Today, Green's restored mill stands
as a mathematics museum in Nottingham.
20.2.1 Second-Order Linear DOs
Since second-order linear differential operators (SOLDOs) are sufficiently general
for most physical applications, we will concentrate on them. Because homogeneous
BCs are important in constructing Green's functions, let us first consider BCs of
the form

R₁[u] ≡ α₁₁ u(a) + α₁₂ u'(a) + β₁₁ u(b) + β₁₂ u'(b) = 0,
R₂[u] ≡ α₂₁ u(a) + α₂₂ u'(a) + β₂₁ u(b) + β₂₂ u'(b) = 0,          (20.12)

where it is assumed, as usual, that (α₁₁, α₁₂, β₁₁, β₁₂) and (α₂₁, α₂₂, β₂₁, β₂₂) are
linearly independent.
If we define the inner product as an integral with weight w, Equation (20.11)
can be formally written as

⟨v| L |u⟩ = ⟨u| L† |v⟩* + Q[u, v*]|ₐᵇ.
This would coincide with the usual definition of the adjoint if the surface term
vanishes, that is, if

Q[u, v*]|_{x=b} = Q[u, v*]|_{x=a}.

For this to happen, we need to impose BCs on v. To find these BCs, let us rewrite
Equation (20.12) in a more compact form. Linear independence of the two row
vectors of coefficients implies that the 2 × 4 matrix of coefficients has rank two.
This means that the 2 × 4 matrix has an invertible 2 × 2 submatrix. By rearranging
the terms in Equation (20.12) if necessary, we can assume that the second of the two
2 × 2 submatrices is invertible. The homogeneous BCs can then be conveniently
written as

R[u] = (R₁[u])  = (A  B) (u_a)  = A u_a + B u_b = 0,          (20.13)
       (R₂[u])           (u_b)

where

A ≡ (α₁₁  α₁₂),   B ≡ (β₁₁  β₁₂),   u_a ≡ (u(a) ),   u_b ≡ (u(b) ),          (20.14)
    (α₂₁  α₂₂)        (β₂₁  β₂₂)         (u'(a))          (u'(b))

and B is invertible.
The most general form of the conjunct for a SOLDO is

Q[u, v*](x) ≡ q₁₁(x) u(x) v*(x) + q₁₂(x) u(x) v'*(x)
              + q₂₁(x) u'(x) v*(x) + q₂₂(x) u'(x) v'*(x),

which can be written in matrix form as

Q[u, v*](x) = u_xᵗ Q_x v*_x,   where   Q_x = (q₁₁(x)  q₁₂(x)),          (20.15)
                                             (q₂₁(x)  q₂₂(x))

and u_x and v*_x have definitions similar to those of u_a and u_b above. The vanishing of the
surface term becomes

u_bᵗ Q_b v*_b = u_aᵗ Q_a v*_a.          (20.16)
We need to translate this equation into a condition on v* alone.² This is accomplished
by solving for two of the four quantities u(a), u'(a), u(b), and u'(b) in
terms of the other two, substituting the result in Equation (20.16), and setting the
coefficients of the other two equal to zero. Let us assume, as before, that the submatrix
B is invertible, i.e., that u(b) and u'(b) are expressible in terms of u(a) and
u'(a). Then u_b = -B⁻¹A u_a, or u_bᵗ = -u_aᵗ Aᵗ(Bᵗ)⁻¹, and we obtain

-u_aᵗ Aᵗ(Bᵗ)⁻¹ Q_b v*_b = u_aᵗ Q_a v*_a  ⇒  u_aᵗ [Aᵗ(Bᵗ)⁻¹ Q_b v*_b + Q_a v*_a] = 0,

and the condition on v* becomes

Aᵗ(Bᵗ)⁻¹ Q_b v*_b + Q_a v*_a = 0.          (20.17)
We see that all factors of u have disappeared, as they should. The expanded version
of the BCs on v* is written as

B₁[v*] ≡ α'₁₁ v*(a) + α'₁₂ v'*(a) + β'₁₁ v*(b) + β'₁₂ v'*(b) = 0,
B₂[v*] ≡ α'₂₁ v*(a) + α'₂₂ v'*(a) + β'₂₁ v*(b) + β'₂₂ v'*(b) = 0.          (20.18)
adjoint boundary conditions
These homogeneous BCs are said to be adjoint to those of (20.12). Because of the
difference between BCs and their adjoints, the domain of a differential operator
need not be the same as that of its adjoint.
20.2.6. Example. Let L_x = d²/dx² with the homogeneous BCs

R₁[u] = α u(a) - u'(a) = 0   and   R₂[u] = β u(b) - u'(b) = 0.          (20.19)

We want to calculate Q[u, v*] and the adjoint BCs for v. By repeated integration by parts
[or by using Equation (13.23)], we obtain Q[u, v*] = u'v* - uv'*. For the surface term to
vanish, we must have

u'(a)v*(a) - u(a)v'*(a) = u'(b)v*(b) - u(b)v'*(b).

²The boundary conditions on v* should not depend on the choice of u.
Substituting from (20.19) in this equation, we get

u(a)[α v*(a) - v'*(a)] = u(b)[β v*(b) - v'*(b)],

which holds for arbitrary u if and only if

B₁[v*] = α v*(a) - v'*(a) = 0   and   B₂[v*] = β v*(b) - v'*(b) = 0.          (20.20)

This is a special case, in which the adjoint BCs are the same as the original BCs (substitute
u for v* to see this).
To see that the original BCs and their adjoints need not be the same, we consider

R₁[u] = u'(a) - α u(b) = 0   and   R₂[u] = β u(a) - u'(b) = 0,          (20.21)

from which we obtain u(a)[β v*(b) + v'*(a)] = u(b)[α v*(a) + v'*(b)]. Thus,

B₁[v*] = α v*(a) + v'*(b) = 0   and   B₂[v*] = β v*(b) + v'*(a) = 0,          (20.22)
mixed and unmixed BCs
which is not the same as (20.21). Boundary conditions such as those in (20.19) and (20.20),
in which each equation contains the function and its derivative evaluated at the same point,
are called unmixed BCs. On the other hand, (20.21)
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf
Hassani_Mathematical_Physics_A_Modem_Int.pdf

  • 3. Preface "Ich kann es nun einmal nicht lassen, in diesem Drama von Mathematik und Physik, die sich im Dunkeln befruchten, aber von Angesicht zu Angesicht so gerne einander verkennen und verleugnen, die Rolle des (wie ich genügsam erfuhr, oft unerwünschten) Boten zu spielen." Hermann Weyl. It is said that mathematics is the language of Nature. If so, then physics is its poetry. Nature started to whisper into our ears when Egyptians and Babylonians were compelled to invent and use mathematics in their day-to-day activities. The faint geometric and arithmetical pidgin of over four thousand years ago, suitable for rudimentary conversations with nature as applied to simple landscaping, has turned into a sophisticated language in which the heart of matter is articulated. The interplay between mathematics and physics needs no emphasis. What may need to be emphasized is that mathematics is not merely a tool with which the presentation of physics is facilitated, but the only medium in which physics can survive. Just as language is the means by which humans can express their thoughts and without which they lose their unique identity, mathematics is the only language through which physics can express itself and without which it loses its identity. And just as language is perfected due to its constant usage, mathematics develops in the most dramatic way because of its usage in physics. The quotation by Weyl above, an approximation to whose translation is "In this drama of mathematics and physics, which fertilize each other in the dark but which prefer to deny and misconstrue each other face to face, I cannot, however, resist playing the role of a messenger, albeit, as I have abundantly learned, often an unwelcome one,"
  • 4. vi PREFACE is a perfect description of the natural intimacy between what mathematicians and physicists do, and the unnatural estrangement between the two camps. Some of the most beautiful mathematics has been motivated by physics (differential equations by Newtonian mechanics, differential geometry by general relativity, and operator theory by quantum mechanics), and some of the most fundamental physics has been expressed in the most beautiful poetry of mathematics (mechanics in symplectic geometry, and fundamental forces in Lie group theory). I do not want to give the impression that mathematics and physics cannot develop independently. On the contrary, it is precisely the independence of each discipline that reinforces not only itself, but the other discipline as well, just as the study of the grammar of a language improves its usage and vice versa. However, the most effective means by which the two camps can accomplish great success is through an intense dialogue. Fortunately, with the advent of gauge and string theories of particle physics, such a dialogue has been reestablished between physics and mathematics after a relatively long lull. Level and Philosophy of Presentation This is a book for physics students interested in the mathematics they use. It is also a book for mathematics students who wish to see some of the abstract ideas with which they are familiar come alive in an applied setting. The level of presentation is that of an advanced undergraduate or beginning graduate course (or sequence of courses) traditionally called "Mathematical Methods of Physics" or some variation thereof. 
Unlike most existing mathematical physics books intended for the same audience, which are usually lexicographic collections of facts about the diagonalization of matrices, tensor analysis, Legendre polynomials, contour integration, etc., with little emphasis on formal and systematic development of topics, this book attempts to strike a balance between formalism and application, between the abstract and the concrete. I have tried to include as much of the essential formalism as is necessary to render the book optimally coherent and self-contained. This entails stating and proving a large number of theorems, propositions, lemmas, and corollaries. The benefit of such an approach is that the student will recognize clearly both the power and the limitation of a mathematical idea used in physics. There is a tendency on the part of the novice to universalize the mathematical methods and ideas encountered in physics courses because the limitations of these methods and ideas are not clearly pointed out. There is a great deal of freedom in the topics and the level of presentation that instructors can choose from this book. My experience has shown that Parts I, II, III, Chapter 12, selected sections of Chapter 13, and selected sections or examples of Chapter 19 (or a large subset of all this) will be a reasonable course content for advanced undergraduates. If one adds Chapters 14 and 20, as well as selected topics from Chapters 21 and 22, one can design a course suitable for first-year graduate
  • 5. PREFACE vii students. By judicious choice of topics from Parts VII and VIII, the instructor can bring the content of the course to a more modern setting. Depending on the sophistication of the students, this can be done either in the first year or the second year of graduate school. Features To better understand theorems, propositions, and so forth, students need to see them in action. There are over 350 worked-out examples and over 850 problems (many with detailed hints) in this book, providing a vast arena in which students can watch the formalism unfold. The philosophy underlying this abundance can be summarized as "An example is worth a thousand words of explanation." Thus, whenever a statement is intrinsically vague or hard to grasp, worked-out examples and/or problems with hints are provided to clarify it. The inclusion of such a large number of examples is the means by which the balance between formalism and application has been achieved. However, although applications are essential in understanding mathematical physics, they are only one side of the coin. The theorems, propositions, lemmas, and corollaries, being highly condensed versions of knowledge, are equally important. A conspicuous feature of the book, which is not emphasized in other comparable books, is the attempt to exhibit, as much as it is useful and applicable, interrelationships among various topics covered. Thus, the underlying theme of a vector space (which, in my opinion, is the most primitive concept at this level of presentation) recurs throughout the book and alerts the reader to the connection between various seemingly unrelated topics. Another useful feature is the presentation of the historical setting in which men and women of mathematics and physics worked. I have gone against the trend of the "ahistoricism" of mathematicians and physicists by summarizing the life stories of the people behind the ideas. 
Many a time, the anecdotes and the historical circumstances in which a mathematical or physical idea takes form can go a long way toward helping us understand and appreciate the idea, especially if the interaction among, and the contributions of, all those having a share in the creation of the idea is pointed out, and the historical continuity of the development of the idea is emphasized. To facilitate reference to them, all mathematical statements (definitions, theorems, propositions, lemmas, corollaries, and examples) have been numbered consecutively within each section and are preceded by the section number. For example, 4.2.9 Definition indicates the ninth mathematical statement (which happens to be a definition) in Section 4.2. The end of a proof is marked by an empty square, and that of an example by a filled square, placed at the right margin of each. Finally, a comprehensive index, a large number of marginal notes, and many explanatory underbraced and overbraced comments in equations facilitate the use
  • 6. viii PREFACE and comprehension of the book. In this respect, the book is also useful as a reference. Organization and Topical Coverage Aside from Chapter 0, which is a collection of purely mathematical concepts, the book is divided into eight parts. Part I, consisting of the first four chapters, is devoted to a thorough study of finite-dimensional vector spaces and linear operators defined on them. As the unifying theme of the book, vector spaces demand careful analysis, and Part I provides this in the more accessible setting of finite dimension in a language that is conveniently generalized to the more relevant infinite dimensions, the subject of the next part. Following a brief discussion of the technical difficulties associated with infinity, Part II is devoted to the two main infinite-dimensional vector spaces of mathematical physics: the classical orthogonal polynomials, and Fourier series and transform. Complex variables appear in Part III. Chapter 9 deals with basic properties of complex functions, complex series, and their convergence. Chapter 10 discusses the calculus of residues and its application to the evaluation of definite integrals. Chapter 11 deals with more advanced topics such as multivalued functions, analytic continuation, and the method of steepest descent. Part IV treats mainly ordinary differential equations. Chapter 12 shows how ordinary differential equations of second order arise in physical problems, and Chapter 13 consists of a formal discussion of these differential equations as well as methods of solving them numerically. Chapter 14 brings in the power of complex analysis to a treatment of the hypergeometric differential equation. The last chapter of this part deals with the solution of differential equations using integral transforms. Part V starts with a formal chapter on the theory of operators and their spectral decomposition in Chapter 16. 
Chapter 17 focuses on a specific type of operator, namely the integral operators and their corresponding integral equations. The formalism and applications of Sturm-Liouville theory appear in Chapters 18 and 19, respectively. The entire Part VI is devoted to a discussion of Green's functions. Chapter 20 introduces these functions for ordinary differential equations, while Chapters 21 and 22 discuss the Green's functions in an m-dimensional Euclidean space. Some of the derivations in these last two chapters are new and, as far as I know, unavailable anywhere else. Parts VII and VIII contain a thorough discussion of Lie groups and their applications. The concept of group is introduced in Chapter 23. The theory of group representation, with an eye on its application in quantum mechanics, is discussed in the next chapter. Chapters 25 and 26 concentrate on tensor algebra and tensor analysis on manifolds. In Part VIII, the concepts of group and manifold are
  • 7. PREFACE ix brought together in the context of Lie groups. Chapter 27 discusses Lie groups and their algebras as well as their representations, with special emphasis on their application in physics. Chapter 28 is on differential geometry, including a brief introduction to general relativity. Lie's original motivation for constructing the groups that bear his name is discussed in Chapter 29 in the context of a systematic treatment of differential equations using their symmetry groups. The book ends in a chapter that blends many of the ideas developed throughout the previous parts in order to treat variational problems and their symmetries. It also provides a most fitting example of the claim made at the beginning of this preface and one of the most beautiful results of mathematical physics: Noether's theorem on the relation between symmetries and conservation laws. Acknowledgments It gives me great pleasure to thank all those who contributed to the making of this book. George Rutherford was kind enough to volunteer for the difficult task of condensing hundreds of pages of biography into tens of extremely informative pages. Without his help this unique and valuable feature of the book would have been next to impossible to achieve. I thank him wholeheartedly. Rainer Grobe and Qichang Su helped me with my rusty computational skills. (R. G. also helped me with my rusty German!) Many colleagues outside my department gave valuable comments and stimulating words of encouragement on the earlier version of the book. I would like to record my appreciation to Neil Rasband for reading part of the manuscript and commenting on it. Special thanks go to Tom von Foerster, senior editor of physics and mathematics at Springer-Verlag, not only for his patience and support, but also for the extreme care he took in reading the entire manuscript and giving me invaluable advice as a result. Needless to say, the ultimate responsibility for the content of the book rests on me. 
Last but not least, I thank my wife, Sarah, my son, Dane, and my daughter, Daisy, for the time taken away from them while I was writing the book, and for their support during the long and arduous writing process. Many excellent textbooks, too numerous to cite individually here, have influenced the writing of this book. The following, however, are noteworthy for both their excellence and the amount of their influence: Birkhoff, G., and G.-C. Rota, Ordinary Differential Equations, 3rd ed., New York, Wiley, 1978. Bishop, R., and S. Goldberg, Tensor Analysis on Manifolds, New York, Dover, 1980. Dennery, P., and A. Krzywicki, Mathematics for Physicists, New York, Harper & Row, 1967. Halmos, P., Finite-Dimensional Vector Spaces, 2nd ed., Princeton, Van Nostrand, 1958.
  • 8. x PREFACE Hamermesh, M., Group Theory and Its Application to Physical Problems, Dover, New York, 1989. Olver, P., Applications of Lie Groups to Differential Equations, New York, Springer-Verlag, 1986. Unless otherwise indicated, all biographical sketches have been taken from the following three sources: Gillispie, C., ed., Dictionary of Scientific Biography, Charles Scribner's, New York, 1970. Simmons, G., Calculus Gems, New York, McGraw-Hill, 1992. History of Mathematics archive at www-groups.dcs.st-and.ac.uk:80. I would greatly appreciate any comments and suggestions for improvements. Although extreme care was taken to correct all the misprints, the mere volume of the book makes it very likely that I have missed some (perhaps many) of them. I shall be most grateful to those readers kind enough to bring to my attention any remaining mistakes, typographical or otherwise. Please feel free to contact me. Sadri Hassani, Campus Box 4560, Department of Physics, Illinois State University, Normal, IL 61790-4560, USA; e-mail: hassani@entropy.phy.ilstu.edu. It is my pleasure to thank all those readers who pointed out typographical mistakes and suggested a few clarifying changes. With the exception of a couple that required substantial revision, I have incorporated all the corrections and suggestions in this second printing.
Note to the Reader

Mathematics and physics are like the game of chess (or, for that matter, like any game): you will learn only by "playing" them. No amount of reading about the game will make you a master. In this book you will find a large number of examples and problems. Go through as many examples as possible, and try to reproduce them. Pay particular attention to sentences like "The reader may check ..." or "It is straightforward to show ..." These are red flags warning you that for a good understanding of the material at hand, you need to provide the missing steps. The problems often fill in missing steps as well, and in this respect they are essential for a thorough understanding of the book. Do not get discouraged if you cannot solve a problem on your first attempt. If you start from the beginning and think about each problem hard enough, you will get to the solution, and you will see that the subsequent problems will not be as difficult.

The extensive index makes the specific topics you may be interested in learning about easily accessible. Often the marginal notes will help you easily locate the index entry you are after.

I have included a large collection of biographical sketches of mathematical physicists of the past. These are truly inspiring stories, and I encourage you to read them. They let you see that even under excruciating circumstances, the human mind can work miracles. You will discover how these remarkable individuals overcame the political, social, and economic conditions of their time to let us get a faint glimpse of the truth. They are our true heroes.
Contents

Preface
Note to the Reader
List of Symbols

0 Mathematical Preliminaries
   0.1 Sets
   0.2 Maps
   0.3 Metric Spaces
   0.4 Cardinality
   0.5 Mathematical Induction
   0.6 Problems

I Finite-Dimensional Vector Spaces

1 Vectors and Transformations
   1.1 Vector Spaces
   1.2 Inner Product
   1.3 Linear Transformations
   1.4 Algebras
   1.5 Problems

2 Operator Algebra
   2.1 The Algebra L(V)
   2.2 Derivatives of Functions of Operators
   2.3 Conjugation of Operators
   2.4 Hermitian and Unitary Operators
   2.5 Projection Operators
   2.6 Operators in Numerical Analysis
   2.7 Problems

3 Matrices: Operator Representations
   3.1 Matrices
   3.2 Operations on Matrices
   3.3 Orthonormal Bases
   3.4 Change of Basis and Similarity Transformation
   3.5 The Determinant
   3.6 The Trace
   3.7 Problems

4 Spectral Decomposition
   4.1 Direct Sums
   4.2 Invariant Subspaces
   4.3 Eigenvalues and Eigenvectors
   4.4 Spectral Decomposition
   4.5 Functions of Operators
   4.6 Polar Decomposition
   4.7 Real Vector Spaces
   4.8 Problems

II Infinite-Dimensional Vector Spaces

5 Hilbert Spaces
   5.1 The Question of Convergence
   5.2 The Space of Square-Integrable Functions
   5.3 Problems

6 Generalized Functions
   6.1 Continuous Index
   6.2 Generalized Functions
   6.3 Problems

7 Classical Orthogonal Polynomials
   7.1 General Properties
   7.2 Classification
   7.3 Recurrence Relations
   7.4 Examples of Classical Orthogonal Polynomials
   7.5 Expansion in Terms of Orthogonal Polynomials
   7.6 Generating Functions
   7.7 Problems

8 Fourier Analysis
   8.1 Fourier Series
   8.2 The Fourier Transform
   8.3 Problems

III Complex Analysis

9 Complex Calculus
   9.1 Complex Functions
   9.2 Analytic Functions
   9.3 Conformal Maps
   9.4 Integration of Complex Functions
   9.5 Derivatives as Integrals
   9.6 Taylor and Laurent Series
   9.7 Problems

10 Calculus of Residues
   10.1 Residues
   10.2 Classification of Isolated Singularities
   10.3 Evaluation of Definite Integrals
   10.4 Problems

11 Complex Analysis: Advanced Topics
   11.1 Meromorphic Functions
   11.2 Multivalued Functions
   11.3 Analytic Continuation
   11.4 The Gamma and Beta Functions
   11.5 Method of Steepest Descent
   11.6 Problems

IV Differential Equations

12 Separation of Variables in Spherical Coordinates
   12.1 PDEs of Mathematical Physics
   12.2 Separation of the Angular Part of the Laplacian
   12.3 Construction of Eigenvalues of L²
   12.4 Eigenvectors of L²: Spherical Harmonics
   12.5 Problems

13 Second-Order Linear Differential Equations
   13.1 General Properties of ODEs
   13.2 Existence and Uniqueness for First-Order DEs
   13.3 General Properties of SOLDEs
   13.4 The Wronskian
   13.5 Adjoint Differential Operators
   13.6 Power-Series Solutions of SOLDEs
   13.7 SOLDEs with Constant Coefficients
   13.8 The WKB Method
   13.9 Numerical Solutions of DEs
   13.10 Problems

14 Complex Analysis of SOLDEs
   14.1 Analytic Properties of Complex DEs
   14.2 Complex SOLDEs
   14.3 Fuchsian Differential Equations
   14.4 The Hypergeometric Function
   14.5 Confluent Hypergeometric Functions
   14.6 Problems

15 Integral Transforms and Differential Equations
   15.1 Integral Representation of the Hypergeometric Function
   15.2 Integral Representation of the Confluent Hypergeometric Function
   15.3 Integral Representation of Bessel Functions
   15.4 Asymptotic Behavior of Bessel Functions
   15.5 Problems

V Operators on Hilbert Spaces

16 An Introduction to Operator Theory
   16.1 From Abstract to Integral and Differential Operators
   16.2 Bounded Operators in Hilbert Spaces
   16.3 Spectra of Linear Operators
   16.4 Compact Sets
   16.5 Compact Operators
   16.6 Spectrum of Compact Operators
   16.7 Spectral Theorem for Compact Operators
   16.8 Resolvents
   16.9 Problems

17 Integral Equations
   17.1 Classification
   17.2 Fredholm Integral Equations
   17.3 Problems

18 Sturm-Liouville Systems: Formalism
   18.1 Unbounded Operators with Compact Resolvent
   18.2 Sturm-Liouville Systems and SOLDEs
   18.3 Other Properties of Sturm-Liouville Systems
   18.4 Problems

19 Sturm-Liouville Systems: Examples
   19.1 Expansions in Terms of Eigenfunctions
   19.2 Separation in Cartesian Coordinates
   19.3 Separation in Cylindrical Coordinates
   19.4 Separation in Spherical Coordinates
   19.5 Problems

VI Green's Functions

20 Green's Functions in One Dimension
   20.1 Calculation of Some Green's Functions
   20.2 Formal Considerations
   20.3 Green's Functions for SOLDOs
   20.4 Eigenfunction Expansion of Green's Functions
   20.5 Problems

21 Multidimensional Green's Functions: Formalism
   21.1 Properties of Partial Differential Equations
   21.2 Multidimensional GFs and Delta Functions
   21.3 Formal Development
   21.4 Integral Equations and GFs
   21.5 Perturbation Theory
   21.6 Problems

22 Multidimensional Green's Functions: Applications
   22.1 Elliptic Equations
   22.2 Parabolic Equations
   22.3 Hyperbolic Equations
   22.4 The Fourier Transform Technique
   22.5 The Eigenfunction Expansion Technique
   22.6 Problems

VII Groups and Manifolds

23 Group Theory
   23.1 Groups
   23.2 Subgroups
   23.3 Group Action
   23.4 The Symmetric Group Sₙ
   23.5 Problems

24 Group Representation Theory
   24.1 Definitions and Examples
   24.2 Orthogonality Properties
   24.3 Analysis of Representations
   24.4 Group Algebra
   24.5 Relationship of Characters to Those of a Subgroup
   24.6 Irreducible Basis Functions
   24.7 Tensor Product of Representations
   24.8 Representations of the Symmetric Group
   24.9 Problems

25 Algebra of Tensors
   25.1 Multilinear Mappings
   25.2 Symmetries of Tensors
   25.3 Exterior Algebra
   25.4 Inner Product Revisited
   25.5 The Hodge Star Operator
   25.6 Problems

26 Analysis of Tensors
   26.1 Differentiable Manifolds
   26.2 Curves and Tangent Vectors
   26.3 Differential of a Map
   26.4 Tensor Fields on Manifolds
   26.5 Exterior Calculus
   26.6 Symplectic Geometry
   26.7 Problems

VIII Lie Groups and Their Applications

27 Lie Groups and Lie Algebras
   27.1 Lie Groups and Their Algebras
   27.2 An Outline of Lie Algebra Theory
   27.3 Representation of Compact Lie Groups
   27.4 Representation of the General Linear Group
   27.5 Representation of Lie Algebras
   27.6 Problems

28 Differential Geometry
   28.1 Vector Fields and Curvature
   28.2 Riemannian Manifolds
   28.3 Covariant Derivative and Geodesics
   28.4 Isometries and Killing Vector Fields
   28.5 Geodesic Deviation and Curvature
   28.6 General Theory of Relativity
   28.7 Problems

29 Lie Groups and Differential Equations
   29.1 Symmetries of Algebraic Equations
   29.2 Symmetry Groups of Differential Equations
   29.3 The Central Theorems
   29.4 Application to Some Known PDEs
   29.5 Application to ODEs
   29.6 Problems

30 Calculus of Variations, Symmetries, and Conservation Laws
   30.1 The Calculus of Variations
   30.2 Symmetry Groups of Variational Problems
   30.3 Conservation Laws and Noether's Theorem
   30.4 Application to Classical Field Theory
   30.5 Problems

Bibliography

Index
List of Symbols

∈, (∉)         "belongs to" ("does not belong to")
ℤ              Set of integers
ℝ              Set of real numbers
ℝ⁺             Set of positive real numbers
ℂ              Set of complex numbers
ℕ              Set of nonnegative integers
ℚ              Set of rational numbers
∼A             Complement of the set A
A × B          Set of ordered pairs (a, b) with a ∈ A and b ∈ B
Aⁿ             {(a₁, a₂, ..., aₙ) | aᵢ ∈ A}
∪, (∩)         Union, (Intersection)
A ≡ B          A is equivalent to B
x ↦ f(x)       x is mapped to f(x) via the map f
∀              For all (values of)
∃              There exists (a value of)
[a]            Equivalence class to which a belongs
g ∘ f          Composition of maps f and g
iff            if and only if
Cᵏ(a, b)       Set of functions on (a, b) with continuous derivatives up to order k
ℂⁿ (or ℝⁿ)     Set of complex (or real) n-tuples
Pᶜ[t]          Set of polynomials in t with complex coefficients
Pʳ[t]          Set of polynomials in t with real coefficients
Pᶜₙ[t]         Set of polynomials with complex coefficients of degree n or less
C∞             Set of all complex sequences {aᵢ}ᵢ₌₁^∞ such that Σᵢ₌₁^∞ |aᵢ|² < ∞
⟨a|b⟩          Inner product of |a⟩ and |b⟩
‖a‖            Norm (length) of the vector |a⟩
L(V)           Set of endomorphisms (linear operators) on vector space V
[S, T]         Commutator of operators S and T
T†             Adjoint (hermitian conjugate) of operator T
Aᵗ             Transpose of matrix A
U ⊕ V          Direct sum of vector spaces U and V
δ(x − x₀)      Dirac delta function nonvanishing only at x = x₀
Res[f(z₀)]     Residue of f at point z₀
DE, ODE, PDE   Differential equation, Ordinary DE, Partial DE
SOLDE          Second-order linear (ordinary) differential equation
GL(V)          Set of all invertible operators on vector space V
GL(n, ℂ)       Set of all n × n complex matrices with nonzero determinant
SL(n, ℂ)       Set of all n × n complex matrices of unit determinant
τ₁ ⊗ τ₂        Tensor product of τ₁ and τ₂
A ∧ B          Exterior (wedge) product of skew-symmetric tensors A and B
Λᵖ(V)          Set of all skew-symmetric tensors of type (p, 0) on V
0 Mathematical Preliminaries

This introductory chapter gathers together some of the most basic tools and notions that are used throughout the book. It also introduces some common vocabulary and notations used in the modern mathematical physics literature. Readers familiar with such concepts as sets, maps, equivalence relations, and metric spaces may wish to skip this chapter.

0.1 Sets

Modern mathematics starts with the basic (and undefinable) concept of a set. We think of a set as a structureless family, or collection, of objects. We speak, for example, of the set of students in a college, of men in a city, of women working for a corporation, of vectors in space, of points in a plane, or of events in the continuum of space-time. Each member a of a set A is called an element of that set. This relation is denoted by a ∈ A (read "a is an element of A" or "a belongs to A"), and its negation by a ∉ A. Sometimes a is called a point of the set A to emphasize a geometric connotation.

A set is usually designated by enumeration of its elements between braces. For example, {2, 4, 6, 8} represents the set consisting of the first four even natural numbers; {0, ±1, ±2, ±3, ...} is the set of all integers; {1, x, x², x³, ...} is the set of all nonnegative powers of x; and {1, i, −1, −i} is the set of the four complex fourth roots of unity. In many cases, a set is defined by a (mathematical) statement that holds for all of its elements. Such a set is generally denoted by {x | P(x)} and read "the set of all x's such that P(x) is true." The foregoing examples of sets can be written alternatively as follows:

{n | n is even and 1 < n < 9}
{±n | n is a natural number}
{y | y = xⁿ and n is a natural number}
{z | z⁴ = 1 and z is a complex number}

In a frequently used shorthand notation, the last two sets can be abbreviated as {xⁿ | n ≥ 0 and n is an integer} and {z ∈ ℂ | z⁴ = 1}. Similarly, the unit circle can be denoted by {z | |z| = 1}, the closed interval [a, b] as {x | a ≤ x ≤ b}, the open interval (a, b) as {x | a < x < b}, and the set of all nonnegative powers of x as {xⁿ}ₙ₌₀^∞. This last notation will be used frequently in this book. A set with a single element is called a singleton.

If a ∈ A whenever a ∈ B, we say that B is a subset of A and write B ⊂ A or A ⊃ B. If B ⊂ A and A ⊂ B, then A = B. If B ⊂ A and A ≠ B, then B is called a proper subset of A. The set defined by {a | a ≠ a} is called the empty set and is denoted by ∅. Clearly, ∅ contains no elements and is a subset of any arbitrary set. The collection of all subsets (including ∅) of a set A is denoted by 2ᴬ. The reason for this notation is that the number of subsets of a set containing n elements is 2ⁿ (Problem 0.1).

If A and B are sets, their union, denoted by A ∪ B, is the set containing all elements that belong to A or B or both. The intersection of the sets A and B, denoted by A ∩ B, is the set containing all elements belonging to both A and B. If {Bα}α∈I is a collection of sets,¹ we denote their union by ∪αBα and their intersection by ∩αBα.

In any application of set theory there is an underlying universal set whose subsets are the objects of study. This universal set is usually clear from the context. For example, in the study of the properties of integers, the set of integers, denoted by ℤ, is the universal set. The set of reals, ℝ, is the universal set in real analysis, and the set of complex numbers, ℂ, is the universal set in complex analysis.
With a universal set X in mind, one can write X ∼ A instead of ∼A. The complement of a set A is denoted by ∼A and defined as

∼A ≡ {a | a ∉ A}.

The complement of B in A (or their difference) is

A ∼ B ≡ {a | a ∈ A and a ∉ B}.

From two given sets A and B, it is possible to form the Cartesian product of A and B, denoted by A × B, which is the set of ordered pairs (a, b), where a ∈ A and b ∈ B. This is expressed in set-theoretic notation as

A × B = {(a, b) | a ∈ A and b ∈ B}.

¹Here I is an index set (or a counting set) with its typical element denoted by α. In most cases, I is the set of (nonnegative) integers, but, in principle, it can be any set; for example, the set of real numbers.
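The set algebra introduced so far (union, intersection, complement, Cartesian product, and the power set 2ᴬ with its 2ⁿ elements) can be experimented with directly using Python's built-in set type. The following sketch is illustrative only and not part of the text; all names in it are ours.

```python
from itertools import combinations, product

A = {2, 4, 6, 8}                                   # the first four even natural numbers
B = {n for n in range(1, 10) if n % 2 == 0}        # {n | n is even and 1 < n < 9}
assert A == B                                      # two descriptions of the same set

X = set(range(10))                                 # a universal set for this example
complement_A = X - A                               # ~A relative to X
difference = A - {2, 4}                            # the complement of {2, 4} in A

# Cartesian product A x B as a set of ordered pairs (a, b)
AxB = set(product(A, B))
assert (2, 8) in AxB and len(AxB) == len(A) * len(B)

# The power set 2^A: all subsets of A, including the empty set
power_set = [set(c) for r in range(len(A) + 1) for c in combinations(A, r)]
assert len(power_set) == 2 ** len(A)               # 2^n subsets for n elements
```

Note that `product` returns ordered pairs, so `AxB` faithfully distinguishes (a, b) from (b, a), exactly as the definition of A × B requires.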
We can generalize this to an arbitrary number of sets. If A₁, A₂, ..., Aₙ are sets, then the Cartesian product of these sets is

A₁ × A₂ × ... × Aₙ = {(a₁, a₂, ..., aₙ) | aᵢ ∈ Aᵢ},

which is a set of ordered n-tuples. If A₁ = A₂ = ... = Aₙ = A, then we write Aⁿ instead of A × A × ... × A, and

Aⁿ = {(a₁, a₂, ..., aₙ) | aᵢ ∈ A}.

The most familiar example of a Cartesian product occurs when A = ℝ. Then ℝ² is the set of pairs (x₁, x₂) with x₁, x₂ ∈ ℝ. This is simply the set of points in the Euclidean plane. Similarly, ℝ³ is the set of triplets (x₁, x₂, x₃), or the points in space, and ℝⁿ = {(x₁, x₂, ..., xₙ) | xᵢ ∈ ℝ} is the set of real n-tuples.

0.1.1 Equivalence Relations

There are many instances in which the elements of a set are naturally grouped together. For example, all vector potentials that differ by the gradient of a scalar function can be grouped together because they all give the same magnetic field. Similarly, all quantum state functions (of unit "length") that differ by a multiplicative complex number of unit length can be grouped together because they all represent the same physical state. The abstraction of these ideas is summarized in the following definition.

0.1.1. Definition. Let A be a set. A relation on A is a comparison test between ordered pairs of elements of A. If the pair (a, b) ∈ A × A passes this test, we write a ▷ b and read "a is related to b." An equivalence relation on A is a relation that has the following properties:

a ▷ a   ∀a ∈ A   (reflexivity)
a ▷ b ⟹ b ▷ a   a, b ∈ A   (symmetry)
a ▷ b, b ▷ c ⟹ a ▷ c   a, b, c ∈ A   (transitivity)

When a ▷ b, we say that "a is equivalent to b." The set [a] = {b ∈ A | b ▷ a} of all elements that are equivalent to a is called the equivalence class of a.

The reader may verify the following property of equivalence relations.

0.1.2. Proposition.
If ▷ is an equivalence relation on A and a, b ∈ A, then either [a] ∩ [b] = ∅ or [a] = [b].

Therefore, a′ ∈ [a] implies that [a′] = [a]. In other words, any element of an equivalence class can be chosen to be a representative of that class. Because of the symmetry of equivalence relations, we sometimes denote them by ⋈.
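Proposition 0.1.2 says that two equivalence classes either coincide or are disjoint, so the classes partition the set. The following sketch (Python, our own illustration, not from the text) groups a finite set into its classes by comparing each element against one representative per class, which Proposition 0.1.2 guarantees is enough.

```python
def equivalence_classes(elements, related):
    """Partition `elements` into the classes of the equivalence relation `related`."""
    classes = []
    for a in elements:
        for cls in classes:
            # By Proposition 0.1.2, testing one representative decides membership.
            if related(a, next(iter(cls))):
                cls.add(a)
                break
        else:
            classes.append({a})        # a is not equivalent to anything seen: new class
    return classes

# Illustration: m related to n iff m - n is divisible by 3
k = 3
Z_window = range(-9, 10)               # a finite window into the integers
classes = equivalence_classes(Z_window, lambda m, n: (m - n) % k == 0)

assert len(classes) == k               # exactly k classes: [0], [1], ..., [k-1]
# Disjoint classes covering the whole set: a partition
assert sum(len(c) for c in classes) == len(list(Z_window))
```

The `for ... else` idiom runs the `else` branch only when no existing class accepted the element, which is exactly the "start a new class" case.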
0.1.3. Example. Let A be the set of human beings. Let a ▷ b be interpreted as "a is older than b." Then clearly, ▷ is a relation but not an equivalence relation. On the other hand, if we interpret a ▷ b as "a and b have the same paternal grandfather," then ▷ is an equivalence relation, as the reader may check. The equivalence class of a is the set of all grandchildren of a's paternal grandfather.

Let V be the set of vector potentials. Write A ▷ A′ if A − A′ = ∇f for some function f. The reader may verify that ▷ is an equivalence relation, and that [A] is the set of all vector potentials giving rise to the same magnetic field.

Let the underlying set be ℤ × (ℤ − {0}). Say "(a, b) is related to (c, d)" if ad = bc. Then this relation is an equivalence relation. Furthermore, [(a, b)] can be identified as the ratio a/b. ∎

0.1.4. Definition. Let A be a set and {Bα} a collection of subsets of A. We say that {Bα} is a partition of A, or {Bα} partitions A, if the Bα's are disjoint, i.e., have no element in common, and ∪αBα = A.

Now consider the collection {[a] | a ∈ A} of all equivalence classes of A. These classes are disjoint, and evidently their union covers all of A. Therefore, the collection of equivalence classes of A is a partition of A. This collection is denoted by A/⋈ and is called the quotient set of A under the equivalence relation ⋈.

0.1.5. Example. Let the underlying set be ℝ³. Define an equivalence relation on ℝ³ by saying that P₁ ∈ ℝ³ and P₂ ∈ ℝ³ are equivalent if they lie on the same line passing through the origin. Then ℝ³/⋈ is the set of all lines in space passing through the origin. If we choose the unit vector with positive third coordinate along a given line as the representative of that line, then ℝ³/⋈ can be identified with the upper unit hemisphere.² ℝ³/⋈ is called the projective space associated with ℝ³.

²Furthermore, we need to identify any two points on the edge of the hemisphere that lie on the same diameter.
On the set ℤ of integers define a relation by writing m ▷ n for m, n ∈ ℤ if m − n is divisible by k, where k is a fixed integer. Then ▷ is not only a relation, but an equivalence relation. In this case, we have

ℤ/▷ = {[0], [1], ..., [k − 1]},

as the reader is urged to verify.

For the equivalence relation defined on ℤ × (ℤ − {0}) in Example 0.1.3, the set (ℤ × (ℤ − {0}))/⋈ can be identified with ℚ, the set of rational numbers. ∎

0.2 Maps

To communicate between sets, one introduces the concept of a map. A map f from a set X to a set Y, denoted by f : X → Y, is a correspondence between elements of X and those of Y in which all the elements of X participate,
and each element of X corresponds to only one element of Y (see Figure 1). If y ∈ Y is the element that corresponds to x ∈ X via the map f, we write

y = f(x)   or   x ↦ f(x)

and call f(x) the image of x under f. Thus, by the definition of a map, x ∈ X can have only one image. The set X is called the domain, and Y the codomain or the target space. Two maps f : X → Y and g : X → Y are said to be equal if f(x) = g(x) for all x ∈ X.

Figure 1: The map f maps all of the set X onto a subset of Y. The shaded area in Y is f(X), the range of f.

0.2.1. Box. A map whose codomain is the set of real numbers ℝ or the set of complex numbers ℂ is commonly called a function.

A special map that applies to all sets A is id_A : A → A, called the identity map of A, and defined by id_A(a) = a for all a ∈ A.

The graph Γ_f of a map f : A → B is a subset of A × B defined by

Γ_f = {(a, f(a)) | a ∈ A} ⊂ A × B.

This general definition reduces to the ordinary graphs encountered in algebra and calculus, where A = B = ℝ and A × B is the xy-plane.

If A is a subset of X, we call f(A) = {f(x) | x ∈ A} the image of A. Similarly, if B ⊂ f(X), we call f⁻¹(B) = {x ∈ X | f(x) ∈ B} the inverse image, or preimage, of B. In words, f⁻¹(B) consists of all elements in X whose images are in B ⊂ Y. If B consists of a single element b, then f⁻¹(b) = {x ∈ X | f(x) = b} consists of all elements of X that are mapped to b. Note that it is possible for many points of X to have the same image in Y. The subset f(X) of the codomain of a map f is called the range of f (see Figure 1).
Figure 2: The composition of two maps is another map.

If f : X → Y and g : Y → W, then the mapping h : X → W given by h(x) = g(f(x)) is called the composition of f and g, and is denoted by h = g ∘ f (see Figure 2).³ It is easy to verify that

f ∘ id_X = f = id_Y ∘ f.

If f(x₁) = f(x₂) implies that x₁ = x₂, we call f injective, or one-to-one (denoted 1-1). For an injective map only one element of X corresponds to an element of Y. If f(X) = Y, the mapping is said to be surjective, or onto. A map that is both injective and surjective is said to be bijective, or to be a one-to-one correspondence. Two sets that are in one-to-one correspondence have, by definition, the same number of elements.

If f : X → Y is a bijection from X onto Y, then for each y ∈ Y there is one and only one element x in X for which f(x) = y. Thus, there is a mapping f⁻¹ : Y → X given by f⁻¹(y) = x, where x is the unique element such that f(x) = y. This mapping is called the inverse of f. The inverse of f is also identified as the map that satisfies f ∘ f⁻¹ = id_Y and f⁻¹ ∘ f = id_X. For example, one can easily verify that ln⁻¹ = exp and exp⁻¹ = ln, because ln(eˣ) = x and e^(ln x) = x.

Given a map f : X → Y, we can define a relation ⋈ on X by saying x₁ ⋈ x₂ if f(x₁) = f(x₂). The reader may check that this is in fact an equivalence relation. The equivalence classes are subsets of X all of whose elements map to the same point in Y. In fact, [x] = f⁻¹(f(x)). Corresponding to f, there is a map f̃ : X/⋈ → Y given by f̃([x]) = f(x). This map is injective because if f̃([x₁]) = f̃([x₂]), then f(x₁) = f(x₂), so x₁ and x₂ belong to the same equivalence class; therefore, [x₁] = [x₂]. It follows that f̃ : X/⋈ → f(X) is bijective.
If f and g are both bijections with inverses f⁻¹ and g⁻¹, respectively, then g ∘ f also has an inverse, and verifying that (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹ is straightforward.

³Note the importance of the order in which the composition is written. The reverse order may not even exist.
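For maps between finite sets, injectivity and surjectivity can be tested exhaustively, and the rule (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹ can be verified point by point. The sketch below (Python, our own illustration with invented names) does both.

```python
def is_injective(f, domain):
    """f is injective iff no two domain points share an image."""
    images = [f(x) for x in domain]
    return len(images) == len(set(images))

def is_surjective(f, domain, codomain):
    """f is surjective iff its range equals the codomain."""
    return {f(x) for x in domain} == set(codomain)

X = [-2, -1, 0, 1, 2]
square = lambda x: x * x
assert not is_injective(square, X)            # both x and -x map to x^2
assert is_injective(square, [0, 1, 2])        # shrinking the domain restores injectivity
assert is_surjective(square, X, [0, 1, 4])    # onto this particular codomain

# A bijection on a finite set, encoded as a dict, and its inverse
Y = [0, 1, 2, 3]
f = {0: 2, 1: 3, 2: 0, 3: 1}
g = {0: 1, 1: 2, 2: 3, 3: 0}
f_inv = {v: k for k, v in f.items()}          # invert by swapping keys and values
g_inv = {v: k for k, v in g.items()}
for y in Y:
    assert f_inv[g_inv[g[f[y]]]] == y         # (g o f)^-1 = f^-1 o g^-1
```

Encoding a finite map as a dict makes the inverse of a bijection a one-line dictionary comprehension; for a non-injective map the same comprehension would silently lose entries, which is itself a good illustration of why only bijections have inverses.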
0.2.2. Example. As an example of the preimage of a set, consider the sine and cosine functions. Then it should be clear that

sin⁻¹(0) = {nπ}ₙ₌₋∞^∞,   cos⁻¹(0) = {π/2 + mπ}ₘ₌₋∞^∞.

Similarly, sin⁻¹[0, ½] consists of all the intervals on the x-axis marked by heavy line segments in Figure 3, i.e., all the points whose sine lies between 0 and ½.

As examples of maps, we consider functions f : ℝ → ℝ studied in calculus. The two functions f : ℝ → ℝ and g : ℝ → (−1, +1) given, respectively, by f(x) = x³ and g(x) = tanh x are bijective. The latter function, by the way, shows that there are as many points in the whole real line as there are in the interval (−1, +1). If we denote the set of positive real numbers by ℝ⁺, then the function f : ℝ → ℝ⁺ given by f(x) = x² is surjective but not injective (both x and −x map to x²). The function g : ℝ⁺ → ℝ given by the same rule, g(x) = x², is injective but not surjective. On the other hand, h : ℝ⁺ → ℝ⁺, again given by h(x) = x², is bijective, but u : ℝ → ℝ given by the same rule is neither injective nor surjective. Note that injectivity and surjectivity depend on the domain and codomain, not just on the rule.

Let M_{n×n} denote the set of n × n real matrices. Define a function det : M_{n×n} → ℝ by det(A) = det A, where det A is the determinant of A for A ∈ M_{n×n}. This function is clearly surjective (why?) but not injective. The set of all matrices whose determinant is 1 is det⁻¹(1). Such matrices occur frequently in physical applications.

Another example of interest is f : ℂ → ℝ given by f(z) = |z|. This function is neither injective nor surjective. Here f⁻¹(1) is the unit circle, the circle of radius 1 in the complex plane.

The domain of a map can be a Cartesian product of a set, as in f : X × X → Y. Two specific cases are worthy of mention. The first is when Y = ℝ. An example of this case is the dot product on vectors. Thus, if X is the set of vectors in space, we can define f(a, b) = a · b.
The second case is when Y = X. Then f is called a binary operation on X, whereby an element in X is associated with two elements in X. For instance, let X = ℤ, the set of all integers; then the function f : ℤ × ℤ → ℤ defined by f(m, n) = mn is the binary operation of multiplication of integers. Similarly, g : ℝ × ℝ → ℝ given by g(x, y) = x + y is the binary operation of addition of real numbers.

0.3 Metric Spaces

Although sets are at the root of modern mathematics, they are only of formal and abstract interest by themselves. To make sets useful, it is necessary to introduce some structures on them. There are two general procedures for the implementation of such structures. These are the abstractions of the two major branches of mathematics: algebra and analysis.
Figure 3: The union of all the intervals on the x-axis marked by heavy line segments is sin⁻¹[0, ½].

We can turn a set into an algebraic structure by introducing a binary operation on it. For example, a vector space consists, among other things, of the binary operation of vector addition. A group is, among other things, a set together with the binary operation of "multiplication." There are many other examples of algebraic systems, and they constitute the rich subject of algebra.

When analysis, the other branch of mathematics, is abstracted using the concept of sets, it leads to topology, in which the concept of continuity plays a central role. This is also a rich subject with far-reaching implications and applications. We shall not go into any details of these two areas of mathematics. Although some algebraic systems will be discussed and the ideas of limit and continuity will be used in the sequel, this will be done in an intuitive fashion, by introducing and employing the concepts when they are needed. On the other hand, some general concepts will be introduced when they require minimum prerequisites. One of these is a metric space:

0.3.1. Definition. A metric space is a set X together with a real-valued function d : X × X → ℝ such that

(a) d(x, y) ≥ 0 ∀x, y, and d(x, y) = 0 iff x = y.
(b) d(x, y) = d(y, x).   (symmetry)
(c) d(x, y) ≤ d(x, z) + d(z, y).   (the triangle inequality)

It is worthwhile to point out that X is a completely arbitrary set and needs no other structure. In this respect Definition 0.3.1 is very broad and encompasses many different situations, as the following examples will show.
Before examining the examples, note that the function d defined above is the abstraction of the notion of distance: (a) says that the distance between any two points is always nonnegative and is zero only if the two points coincide; (b) says that the distance between two points does not change if the two points are interchanged; (c) states the known fact
that the sum of the lengths of two sides of a triangle is always greater than or equal to the length of the third side. Now consider these examples:

1. Let X = ℚ, the set of rational numbers, and define d(x, y) = |x − y|.

2. Let X = ℝ, and again define d(x, y) = |x − y|.

3. Let X consist of the points on the surface of a sphere. We can define two distance functions on X. Let d₁(P, Q) be the length of the chord joining P and Q on the sphere. We can also define another metric, d₂(P, Q), as the length of the arc of the great circle passing through points P and Q on the surface of the sphere. It is not hard to convince oneself that d₁ and d₂ satisfy all the properties of a metric function.

4. Let C⁰[a, b] denote the set of continuous real-valued functions on the closed interval [a, b]. We can define

d(f, g) = ∫ₐᵇ |f(x) − g(x)| dx   for f, g ∈ C⁰[a, b].

5. Let C_B[a, b] denote the set of bounded continuous real-valued functions on the closed interval [a, b]. We then define

d(f, g) = max_{x∈[a,b]} {|f(x) − g(x)|}   for f, g ∈ C_B[a, b].

This notation says: Take the absolute value of the difference of f and g at all x in the interval [a, b] and then pick the maximum of all these values.

The metric function creates a natural setting in which to test the "closeness" of points in a metric space. One occasion on which the idea of closeness becomes essential is in the study of a sequence. A sequence is a mapping s : ℕ → X from the set of natural numbers ℕ into the metric space X. Such a mapping associates with a positive integer n a point s(n) of the metric space X. It is customary to write sₙ (or xₙ to match the symbol X) instead of s(n) and to enumerate the values of the function by writing {xₙ}ₙ₌₁^∞.

Knowledge of the behavior of a sequence for large values of n is of fundamental importance. In particular, it is important to know whether a sequence approaches a finite value as n increases.

0.3.2. Box.
Suppose that for some x and for any positive real number ε, there exists a natural number N such that d(x_n, x) < ε whenever n > N. Then we say that the sequence {x_n}_{n=1}^∞ converges to x and write lim_{n→∞} d(x_n, x) = 0, or d(x_n, x) → 0, or simply x_n → x.

It may not be possible to test directly for the convergence of a given sequence, because this requires knowledge of the limit point x. However, it is possible to
do the next best thing: to see whether the points of the sequence get closer and closer as n gets larger and larger. A Cauchy sequence is a sequence for which lim_{m,n→∞} d(x_m, x_n) = 0, as shown in Figure 4. We can test directly whether or not a sequence is Cauchy. However, the fact that a sequence is Cauchy does not guarantee that it converges. For example, let the metric space be the set of rational numbers Q with the metric function d(x, y) = |x − y|, and consider the sequence {x_n}_{n=1}^∞ where x_n = Σ_{k=1}^n (−1)^{k+1}/k. It is clear that x_n is a rational number for any n. Also, to show that |x_m − x_n| → 0 is an exercise in calculus. Thus, the sequence is Cauchy. However, it is probably known to the reader that lim_{n→∞} x_n = ln 2, which is not a rational number.

Figure 4: The distance between the elements of a Cauchy sequence gets smaller and smaller.

A metric space in which every Cauchy sequence converges is called a complete metric space. Complete metric spaces play a crucial role in modern analysis. The preceding example shows that Q is not a complete metric space. However, if the limit points of all Cauchy sequences are added to Q, the resulting space becomes complete. This complete space is, of course, the real number system R. It turns out that any incomplete metric space can be "enlarged" to a complete metric space.

0.4 Cardinality

The process of counting is a one-to-one comparison of one set with another. If two sets are in one-to-one correspondence, they are said to have the same cardinality. Two sets with the same cardinality essentially have the same "number" of elements. The set F_n = {1, 2, ..., n} is finite and has cardinality n. Any set from which there is a bijection onto F_n is said to be finite with n elements.
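Returning to the Cauchy sequence of Section 0.3: the partial sums x_n = Σ_{k=1}^n (−1)^{k+1}/k are easy to examine numerically. A minimal sketch in plain Python (the function name is my own), using `fractions.Fraction` to emphasize that every x_n is an exact rational number even though the limit, ln 2, is not:

```python
from fractions import Fraction
from math import log

# Partial sums x_n = sum_{k=1}^{n} (-1)^(k+1)/k; each x_n is a rational number.
def x(n):
    return sum(Fraction((-1) ** (k + 1), k) for k in range(1, n + 1))

x_1000 = x(1000)                             # an exact rational number
print(abs(float(x_1000) - log(2)))           # small: the sequence approaches ln 2
print(abs(float(x(1000)) - float(x(990))))   # Cauchy: late terms are close together
```

Every x_n lives in Q, and the terms crowd together (the Cauchy property), yet the limit ln 2 lies outside Q; this is precisely the incompleteness of the rationals.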
Although some steps had been taken before him in the direction of a definitive theory of sets, the creator of the theory of sets is considered to be Georg Cantor (1845–1918), who was born in Russia of Danish-Jewish parentage but moved to Germany with his parents.
His father urged him to study engineering, and Cantor entered the University of Berlin in 1863 with that intention. There he came under the influence of Weierstrass and turned to pure mathematics. He became Privatdozent at Halle in 1869 and professor in 1879. When he was twenty-nine he published his first revolutionary paper on the theory of infinite sets in the Journal für Mathematik. Although some of its propositions were deemed faulty by the older mathematicians, its overall originality and brilliance attracted attention. He continued to publish papers on the theory of sets and on transfinite numbers until 1897.

One of Cantor's main concerns was to differentiate among infinite sets by "size" and, like Bolzano before him, he decided that one-to-one correspondence should be the basic principle. In his correspondence with Dedekind in 1873, Cantor posed the question of whether the set of real numbers can be put into one-to-one correspondence with the integers, and some weeks later he answered in the negative. He gave two proofs. The first is more complicated than the second, which is the one most often used today. In 1874 Cantor occupied himself with the equivalence of the points of a line and the points of R^n and sought to prove that a one-to-one correspondence between these two sets was impossible. Three years later he proved that there is such a correspondence. He wrote to Dedekind, "I see it but I do not believe it." He later showed that given any set, it is always possible to create a new set, the set of subsets of the given set, whose cardinal number is larger than that of the given set. If ℵ0 is the cardinal number of the given set, then the cardinal number of the set of its subsets is denoted by 2^ℵ0. Cantor proved that 2^ℵ0 = c, where c is the cardinal number of the continuum, i.e., of the set of real numbers. Cantor's work, which resolved age-old problems and reversed much previous thought, could hardly be expected to receive immediate acceptance.
His ideas on transfinite ordinal and cardinal numbers aroused the hostility of the powerful Leopold Kronecker, who attacked Cantor's theory savagely over more than a decade, repeatedly preventing Cantor from obtaining a more prominent appointment in Berlin. Though Kronecker died in 1891, his attacks left mathematicians suspicious of Cantor's work. Poincaré referred to set theory as an interesting "pathological case." He also predicted that "Later generations will regard [Cantor's] Mengenlehre as a disease from which one has recovered." At one time Cantor suffered a nervous breakdown, but resumed work in 1887. Many prominent mathematicians, however, were impressed by the uses to which the new theory had already been put in analysis, measure theory, and topology. Hilbert spread Cantor's ideas in Germany, and in 1926 said, "No one shall expel us from the paradise which Cantor created for us." He praised Cantor's transfinite arithmetic as "the most astonishing product of mathematical thought, one of the most beautiful realizations of human activity in the domain of the purely intelligible." Bertrand Russell described Cantor's work as "probably the greatest of which the age can boast." The subsequent utility of Cantor's work in formalizing mathematics, a movement largely led by Hilbert, seems at odds with Cantor's Platonic view that the greater importance of his work was in its implications for metaphysics and theology. That his work could be so seamlessly diverted from the goals intended by its creator is strong testimony to its objectivity and craftsmanship.
Now consider the set of natural numbers N = {1, 2, 3, ...}. If there exists a bijection between a set A and N, then A is said to be countably infinite. Some examples of countably infinite sets are the set of all integers, the set of even natural numbers, the set of odd natural numbers, the set of all prime numbers, and the set of energy levels of the bound states of a hydrogen atom. It may seem surprising that a subset (such as the set of all even numbers) can be put into one-to-one correspondence with the full set (the set of all natural numbers); however, this is a property shared by all infinite sets. In fact, sometimes infinite sets are defined as those sets that are in one-to-one correspondence with at least one of their proper subsets. It is also astonishing to discover that there are as many rational numbers as there are natural numbers. After all, there are infinitely many rational numbers just in the interval (0, 1), or between any two distinct real numbers.

Sets that are neither finite nor countably infinite are said to be uncountable. In some sense they are "more infinite" than any countable set. Examples of uncountable sets are the points in the interval (−1, +1), the real numbers, the points in a plane, and the points in space. It can be shown that these sets have the same cardinality: There are as many points in three-dimensional space, the whole universe, as there are in the interval (−1, +1) or in any other finite interval.

Cardinality is a very intricate mathematical notion with many surprising results. Consider the interval [0, 1]. Remove the open interval (1/3, 2/3) from its middle. This means that the points 1/3 and 2/3 will not be removed. From the remaining portion, [0, 1/3] ∪ [2/3, 1], remove the two middle thirds; the remaining portion will then be [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1] (see Figure 5). Do this indefinitely.
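The total length removed in this construction is easy to tally: stage n removes 2^(n−1) open intervals, each of length 3^(−n), and these lengths form a geometric series summing to 1. A quick sketch in plain Python (the function name is my own):

```python
# Length removed after the first `stages` dissections of [0, 1]:
# stage n removes 2**(n-1) open intervals, each of length 3**(-n).
def removed_length(stages):
    return sum(2 ** (n - 1) / 3 ** n for n in range(1, stages + 1))

# The series (1/3) + (2/9) + (4/27) + ... sums to 1, so the Cantor set
# has total "length" zero, yet uncountably many points remain.
print(removed_length(1))    # 1/3
print(removed_length(50))   # very close to 1
```

That a set of zero total length can still have the cardinality of the continuum is exactly the surprise discussed next.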
What is the cardinality of the remaining set, which is called the Cantor set? Intuitively we expect hardly anything to be left. We might persuade ourselves into accepting the fact that the number of points remaining is at most infinite but countable. The surprising fact is that the cardinality is that of the continuum! Thus, after removal of infinitely many middle thirds, the set that remains has as many points as the original set!

0.5 Mathematical Induction

Many a time it is desirable to make a mathematical statement that is true for all natural numbers. For example, we may want to establish a formula involving an integer parameter that will hold for all positive integers. One encounters this situation when, after experimenting with the first few positive integers, one recognizes a pattern and discovers a formula, and wants to make sure that the formula holds for all natural numbers. For this purpose, one uses mathematical induction. The essence of mathematical induction is stated as follows:
Figure 5: The Cantor set after one, two, three, and four "dissections."

0.5.1. Box. Suppose that there is associated with each natural number (positive integer) n a statement S_n. Then S_n is true for every positive integer provided the following two conditions hold:
1. S_1 is true.
2. If S_m is true for some given positive integer m, then S_{m+1} is also true.

We illustrate the use of mathematical induction by proving the binomial theorem:

$$(a+b)^m = \sum_{k=0}^{m} \binom{m}{k} a^{m-k} b^k, \qquad (1)$$

where we have used the notation $\binom{m}{k} \equiv \frac{m!}{k!(m-k)!}$. The mathematical statement S_m is Equation (1). We note that S_1 is trivially true. Now we assume that S_m is true and show that S_{m+1} is also true. This means starting with Equation (1) and showing that

$$(a+b)^{m+1} = \sum_{k=0}^{m+1} \binom{m+1}{k} a^{m+1-k} b^k. \qquad (2)$$
Then the induction principle ensures that the statement (equation) holds for all positive integers.

Multiply both sides of Equation (1) by a + b to obtain

$$(a+b)^{m+1} = \sum_{k=0}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{k=0}^{m} \binom{m}{k} a^{m-k} b^{k+1}.$$

Now separate the k = 0 term from the first sum and the k = m term from the second sum:

$$(a+b)^{m+1} = a^{m+1} + \sum_{k=1}^{m} \binom{m}{k} a^{m-k+1} b^k + \underbrace{\sum_{k=0}^{m-1} \binom{m}{k} a^{m-k} b^{k+1}}_{\text{let } k=j-1 \text{ in this sum}} + b^{m+1}.$$

The second sum in the last line becomes

$$\sum_{j=1}^{m} \binom{m}{j-1} a^{m-j+1} b^{j},$$

which involves j. Since this is a dummy index, we can substitute any symbol we please. The choice k is especially useful, because then we can unite the two summations. This gives

$$(a+b)^{m+1} = a^{m+1} + \sum_{k=1}^{m} \left[ \binom{m}{k} + \binom{m}{k-1} \right] a^{m+1-k} b^k + b^{m+1}.$$

If we now use

$$\binom{m}{k} + \binom{m}{k-1} = \binom{m+1}{k},$$

which the reader can easily verify, we finally obtain

$$(a+b)^{m+1} = \sum_{k=0}^{m+1} \binom{m+1}{k} a^{m+1-k} b^k,$$

which is Equation (2), i.e., the statement S_{m+1}.

Mathematical induction is also used in defining quantities involving integers. Such definitions are called inductive definitions. For example, an inductive definition is used in defining powers: a^1 = a and a^m = a^{m−1} a.

0.6 Problems

0.1. Show that the number of subsets of a set containing n elements is 2^n.
0.2. Let A, B, and C be sets in a universal set U. Show that
(a) A ⊂ B and B ⊂ C implies A ⊂ C.
(b) A ⊂ B iff A ∩ B = A iff A ∪ B = B.
(c) A ⊂ B and B ⊂ C implies (A ∪ B) ⊂ C.
(d) A ∪ B = (A − B) ∪ (A ∩ B) ∪ (B − A).
Hint: To show the equality of two sets, show that each set is a subset of the other.

0.3. For each n ∈ N, let I_n = {x | |x − 1| < n and |x + 1| > 1/n}. Find ∪_n I_n and ∩_n I_n.

0.4. Show that a′ ∈ [a] implies that [a′] = [a].

0.5. Can you define a binary operation of "multiplication" on the set of vectors in space? What about vectors in the plane?

0.6. Show that (f ∘ g)^{−1} = g^{−1} ∘ f^{−1} when f and g are both bijections.

0.7. Take any two open intervals (a, b) and (c, d), and show that there are as many points in the first as there are in the second, regardless of the size of the intervals. Hint: Find a (linear) algebraic relation between points of the two intervals.

0.8. Use mathematical induction to derive the Leibniz rule for differentiating a product:

$$\frac{d^n}{dx^n}(f \cdot g) = \sum_{k=0}^{n} \binom{n}{k} \frac{d^k f}{dx^k}\,\frac{d^{n-k} g}{dx^{n-k}}.$$

0.9. Use mathematical induction to derive the following results:

$$\sum_{k=0}^{n} r^k = \frac{r^{n+1}-1}{r-1}, \qquad \sum_{k=0}^{n} k = \frac{n(n+1)}{2}.$$

Additional Reading
1. Halmos, P. Naive Set Theory, Springer-Verlag, 1974. A classic text on intuitive (as opposed to axiomatic) set theory covering all the topics discussed in this chapter and much more.
2. Kelley, J. General Topology, Springer-Verlag, 1985. The introductory chapter of this classic reference is a detailed introduction to set theory and mappings.
3. Simmons, G. Introduction to Topology and Modern Analysis, Krieger, 1983. The first chapter of this book covers not only set theory and mappings, but also the Cantor set and the fact that integers are as abundant as rational numbers.
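The induction results of Section 0.5, including the key identity $\binom{m}{k}+\binom{m}{k-1}=\binom{m+1}{k}$ and the sums of Problem 0.9, lend themselves to a quick numerical spot-check. A sketch in plain Python using `math.comb` for the binomial coefficients:

```python
from math import comb

# Pascal's rule, the key step in the induction proof of the binomial theorem.
for m in range(1, 20):
    for k in range(1, m + 1):
        assert comb(m, k) + comb(m, k - 1) == comb(m + 1, k)

# The binomial theorem itself, checked at integer points.
a, b, m = 3, 5, 7
assert (a + b) ** m == sum(comb(m, k) * a ** (m - k) * b ** k for k in range(m + 1))

# The two sums of Problem 0.9 (for r != 1).
n, r = 10, 3
assert sum(r ** k for k in range(n + 1)) == (r ** (n + 1) - 1) // (r - 1)
assert sum(range(n + 1)) == n * (n + 1) // 2
```

Such checks do not replace an induction proof, of course, but they catch transcription errors in the formulas.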
1 Vectors and Transformations

Two- and three-dimensional vectors, undoubtedly familiar objects to the reader, can easily be generalized to higher dimensions. Representing vectors by their components, one can conceive of vectors having N components. This is the most immediate generalization of vectors in the plane and in space, and such vectors are called N-dimensional Cartesian vectors. Cartesian vectors are limited in two respects: Their components are real, and their dimensionality is finite. Some applications in physics require the removal of one or both of these limitations. It is therefore convenient to study vectors stripped of any dimensionality or reality of components. Such properties become consequences of more fundamental definitions. Although we will be concentrating on finite-dimensional vector spaces in this part of the book, many of the concepts and examples introduced here apply to infinite-dimensional spaces as well.

1.1 Vector Spaces

Let us begin with the definition of an abstract (complex) vector space.¹

1.1.1. Definition. A vector space V over C is a set of objects denoted by |a⟩, |b⟩, |z⟩, and so on, called vectors, with the following properties²:

1. To every pair of vectors |a⟩ and |b⟩ in V there corresponds a vector |a⟩ + |b⟩, also in V, called the sum of |a⟩ and |b⟩, such that
(a) |a⟩ + |b⟩ = |b⟩ + |a⟩,

¹Keep in mind that C is the set of complex numbers and R the set of reals.
²The bra, ⟨ |, and ket, | ⟩, notation for vectors, invented by Dirac, is very useful when dealing with complex vector spaces. However, it is somewhat clumsy for certain topics such as norms and metrics and will therefore be abandoned in those discussions.
(b) |a⟩ + (|b⟩ + |c⟩) = (|a⟩ + |b⟩) + |c⟩,
(c) There exists a unique vector |0⟩ ∈ V, called the zero vector, such that |a⟩ + |0⟩ = |a⟩ for every vector |a⟩,
(d) To every vector |a⟩ ∈ V there corresponds a unique vector −|a⟩ (also in V) such that |a⟩ + (−|a⟩) = |0⟩.

2. To every complex number³ α, also called a scalar, and every vector |a⟩ there corresponds a vector α|a⟩ in V such that
(a) α(β|a⟩) = (αβ)|a⟩,
(b) 1|a⟩ = |a⟩.

3. Multiplication involving vectors and scalars is distributive:
(a) α(|a⟩ + |b⟩) = α|a⟩ + α|b⟩,
(b) (α + β)|a⟩ = α|a⟩ + β|a⟩.

The vector space defined above is also called a complex vector space. It is possible to replace C with R, the set of real numbers, in which case the resulting space will be called a real vector space. Real and complex numbers are prototypes of a mathematical structure called a field. A field is a set of objects with two binary operations called addition and multiplication. Each operation distributes with respect to the other, and each operation has an identity. The identity for addition is denoted by 0 and is called the additive identity. The identity for multiplication is denoted by 1 and is called the multiplicative identity. Furthermore, every element has an additive inverse, and every element except the additive identity has a multiplicative inverse.

1.1.2. Example. SOME VECTOR SPACES
1. R is a vector space over the field of real numbers.
2. C is a vector space over the field of real numbers.
3. C is a vector space over the complex numbers.
4. Let V = R and let the field of scalars be C. This is not a vector space, because property 2 of Definition 1.1.1 is not satisfied: A complex number times a real number is not a real number and therefore does not belong to V.
5. The set of "arrows" in the plane (or in space) forms a vector space over R under the parallelogram law of addition of planar (or spatial) vectors.
³Complex numbers, particularly when they are treated as variables, are usually denoted by z, and we shall adhere to this convention in Part III. However, in the discussion of vector spaces, we have found it more convenient to use lowercase Greek letters to denote complex numbers as scalars.
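Item 4 of the example (R with complex scalars) fails for a purely mechanical reason: scalar multiplication must keep vectors inside the space. A minimal closure check in plain Python (the helper names are my own):

```python
# Scalar multiplication must stay inside the space. For V = R with
# complex scalars, alpha * v can leave R, so closure fails.
def closed_under_scalar_mult(vectors, scalars, in_space):
    return all(in_space(alpha * v) for alpha in scalars for v in vectors)

reals = [1.0, -2.5, 3.0]
is_real = lambda x: x.imag == 0   # Python ints/floats have .imag == 0

# R over R: closed, so this requirement of a vector space holds.
print(closed_under_scalar_mult(reals, [2.0, -0.5], is_real))    # True
# R over C: (1+2j) * 1.0 is not real, so R is not a vector space over C.
print(closed_under_scalar_mult(reals, [1 + 2j, 2.0], is_real))  # False
```

The same style of check applies to the other axioms, but closure is the one that fails here.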
6. Let P^c[t] be the set of all polynomials with complex coefficients in a variable t. Then P^c[t] is a vector space under the ordinary addition of polynomials and the multiplication of a polynomial by a complex number. In this case the zero vector is the zero polynomial.
7. For a given positive integer n, let P^c_n[t] be the set of all polynomials with complex coefficients of degree less than or equal to n. Again it is easy to verify that P^c_n[t] is a vector space under the usual addition of polynomials and their multiplication by complex scalars. In particular, the sum of two polynomials of degree less than or equal to n is also a polynomial of degree less than or equal to n, and multiplying a polynomial with complex coefficients by a complex number gives another polynomial of the same type. Here the zero polynomial is the zero vector.
8. The set P^r_n[t] of polynomials of degree less than or equal to n with real coefficients is a vector space over the reals, but it is not a vector space over the complex numbers.
9. Let C^n consist of all complex n-tuples such as |a⟩ = (α1, α2, ..., αn) and |b⟩ = (β1, β2, ..., βn). Let α be a complex number. Then we define
|a⟩ + |b⟩ = (α1 + β1, α2 + β2, ..., αn + βn),
α|a⟩ = (αα1, αα2, ..., ααn),
|0⟩ = (0, 0, ..., 0),
−|a⟩ = (−α1, −α2, ..., −αn).
It is easy to verify that C^n is a vector space over the complex numbers. It is called the n-dimensional complex coordinate space.
10. The set of all real n-tuples R^n is a vector space over the real numbers under operations similar to those of C^n. It is called the n-dimensional real coordinate space, or Cartesian n-space. It is not a vector space over the complex numbers.
11.
The set of all complex matrices with m rows and n columns, M_{m×n}, is a vector space under the usual addition of matrices and multiplication by complex numbers. The zero vector is the m × n matrix with all entries equal to zero.
12. Let C^∞ be the set of all complex sequences |a⟩ = {αi}_{i=1}^∞ such that Σ_{i=1}^∞ |αi|² < ∞. One can show that with addition and scalar multiplication defined componentwise, C^∞ is a vector space over the complex numbers.
13. The set of all complex-valued functions of a single real variable that are continuous in the real interval (a, b) is a vector space over the complex numbers.
14. The set C^n(a, b) of all real-valued functions of a single real variable on (a, b) that possess continuous derivatives of all orders up to n forms a vector space over the reals.
15. The set C^∞(a, b) of all real-valued functions on (a, b) of a single real variable that possess derivatives of all orders forms a vector space over the reals.

It is clear from the example above that the existence of a vector space depends as much on the nature of the vectors as on the nature of the scalars.

1.1.3. Definition. The vectors |a1⟩, |a2⟩, ..., |an⟩ are said to be linearly independent if for αi ∈ C, the relation Σ_{i=1}^n αi|ai⟩ = 0 implies αi = 0 for all i. The sum Σ_{i=1}^n αi|ai⟩ is called a linear combination of {|ai⟩}_{i=1}^n.
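For three vectors in C³, linear independence can be tested concretely: the vectors are linearly independent exactly when the determinant of the matrix they form is nonzero. A small sketch in plain Python (the helper names are my own):

```python
# 3x3 determinant by cofactor expansion; works for complex entries.
def det3(a, b, c):
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

# Nonzero determinant means only the trivial combination gives zero.
def linearly_independent(a, b, c):
    return det3(a, b, c) != 0

e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(linearly_independent(e1, e2, e3))           # True
print(linearly_independent(e1, e2, (2, 3j, 0)))   # False: a combination of e1 and e2
```

In the dependent case, (2, 3j, 0) = 2·e1 + 3i·e2, so a nontrivial linear combination vanishes.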
1.1.4. Definition. A subspace W of a vector space V is a nonempty subset of V with the property that if |a⟩, |b⟩ ∈ W, then α|a⟩ + β|b⟩ also belongs to W for all α, β ∈ C.

A subspace is a vector space in its own right. The reader may verify that the intersection of two subspaces is also a subspace.

1.1.5. Theorem. If S is any nonempty set of vectors in a vector space V, then the set W_S of all linear combinations of vectors in S is a subspace of V. We say that W_S is the span of S, or that S spans W_S, or that W_S is spanned by S. W_S is sometimes denoted by Span{S}.

The proof of Theorem 1.1.5 is left as Problem 1.8.

1.1.6. Definition. A basis of a vector space V is a set B of linearly independent vectors that spans all of V. A vector space that has a finite basis is called finite-dimensional; otherwise, it is infinite-dimensional.

We state the following without proof (see [Axle96, page 31]):

1.1.7. Theorem. All bases of a given finite-dimensional vector space have the same number of linearly independent vectors. This number is called the dimension of the vector space. A vector space of dimension N is sometimes denoted by V_N.

If |a⟩ is a vector in an N-dimensional vector space V and B = {|ai⟩}_{i=1}^N a basis in that space, then by the definition of a basis, there exists a unique (see Problem 1.4) set of scalars {α1, α2, ..., αN} such that |a⟩ = Σ_{i=1}^N αi|ai⟩. The set {αi}_{i=1}^N is called the components of |a⟩ with respect to the basis B.

1.1.8. Example. The following are subspaces of some of the vector spaces considered in Example 1.1.2.
• The "space" of real numbers is a subspace of C over the reals.
• R is not a subspace of C over the complex numbers, because, as explained in Example 1.1.2, R cannot be a vector space over the complex numbers.
• The set of all vectors along a given line going through the origin is a subspace of the space of arrows in the plane (or space) over R.
• P^c_n[t] is a subspace of P^c[t].
• C^{n−1} is a subspace of C^n when C^{n−1} is identified with all complex n-tuples with zero last entry. In general, C^m is a subspace of C^n for m < n when C^m is identified with all n-tuples whose last n − m elements are zero.
• M_{r×s} is a subspace of M_{m×n} for r ≤ m and s ≤ n. Here, we identify an r × s matrix with an m × n matrix whose last m − r rows and n − s columns are all zero.
• P^c_m[t] is a subspace of P^c_n[t] for m < n.
• P^r_m[t] is a subspace of P^r_n[t] for m < n. Note that both P^r_m[t] and P^r_n[t] are vector spaces over the reals only.
• R^m is a subspace of R^n for m < n. Therefore, R², the plane, is a subspace of R³, the Euclidean space. Also, R¹ ≡ R is a subspace of both the plane R² and the Euclidean space R³.

1.1.9. Example. The following are bases for the vector spaces given in Example 1.1.2.
• The number 1 is a basis for R, which is therefore one-dimensional.
• The numbers 1 and i = √−1 are basis vectors for the vector space C over R. Thus, this space is two-dimensional.
• The number 1 is a basis for C over C, and the space is one-dimensional. Note that although the vectors are the same as in the preceding item, changing the nature of the scalars changes the dimensionality of the space.
• The set {ex, ey, ez} of the unit vectors in the directions of the three axes forms a basis in space. The space is three-dimensional.
• A basis of P^c[t] can be formed by the monomials 1, t, t², .... It is clear that this space is infinite-dimensional.
• A basis of C^n is given by e1, e2, ..., en, where ej is an n-tuple that has a 1 at the jth position and zeros everywhere else. This basis is called the standard basis of C^n. Clearly, the space has n dimensions.
• A basis of M_{m×n} is given by e11, e12, ..., eij, ..., emn, where eij is the m × n matrix with zeros everywhere except at the intersection of the ith row and jth column, where it has a one.
• A set consisting of the monomials 1, t, t², ..., t^n forms a basis of P^c_n[t]. Thus, this space is (n + 1)-dimensional.
• The standard basis of C^n is a basis of R^n as well. It is also called the standard basis of R^n. Thus, R^n is n-dimensional.
• If we assume that a < 0 < b, then the set of monomials 1, x, x², ... forms a basis for C^∞(a, b), because, by Taylor's theorem, any function belonging to C^∞(a, b) can be expanded in an infinite power series about x = 0. Thus, this space is infinite-dimensional.
Given a space V with a basis B = {|ai⟩}_{i=1}^n, the span of any m vectors (m < n) of B is an m-dimensional subspace of V.

1.2 Inner Product

A vector space, as given by Definition 1.1.1, is too general and structureless to be of much physical interest. One useful structure introduced on a vector space is a scalar product. Recall that the scalar (dot) product of vectors in the plane or in space is a rule that associates with two vectors a and b a real number. This association, denoted symbolically by g : V × V → R, with g(a, b) = a · b, is symmetric: g(a, b) = g(b, a); is linear in the first (and by symmetry, in the second) factor:⁴

g(αa + βb, c) = αg(a, c) + βg(b, c)  or  (αa + βb) · c = αa · c + βb · c;

⁴A function that is linear in both of its arguments is called a bilinear function.
gives the "length" of a vector: |a|² = g(a, a) = a · a ≥ 0; and ensures that the only vector with zero length⁵ is the zero vector: g(a, a) = 0 if and only if a = 0.

We want to generalize these properties to abstract vector spaces for which the scalars are complex numbers. A verbatim generalization of the foregoing properties, however, leads to a contradiction. Using the linearity in both arguments and a nonzero |a⟩, we obtain

g(i|a⟩, i|a⟩) = i² g(|a⟩, |a⟩) = −g(|a⟩, |a⟩).   (1.1)

Either the right-hand side (RHS) or the left-hand side (LHS) of this equation must be negative! But this is inconsistent with the positivity of the "length" of a vector, which requires g(|a⟩, |a⟩) to be positive for all nonzero vectors, including i|a⟩. The source of the problem is the linearity in both arguments. If we can change this property in such a way that one of the i's in Equation (1.1) comes out complex-conjugated, the problem may go away. This requires linearity in one argument and complex-conjugate linearity in the other. Which argument is to be complex-conjugate linear is a matter of convention. We choose the first argument to be so.⁶ We thus have

g(α|a⟩ + β|b⟩, |c⟩) = α* g(|a⟩, |c⟩) + β* g(|b⟩, |c⟩),

where α* denotes the complex conjugate. Consistency then requires us to change the symmetry property as well. In fact, we must demand that g(|a⟩, |b⟩) = (g(|b⟩, |a⟩))*, from which the reality of g(|a⟩, |a⟩), a necessary condition for its positivity, follows immediately.

The question of the existence of an inner product on a vector space is a deep problem in higher analysis. Generally, if an inner product exists, there may be many ways to introduce one on a vector space. However, as we shall see in Section 1.2.4, a finite-dimensional vector space always has an inner product and this inner product is unique.⁷
So, for all practical purposes we can speak of the inner product on a finite-dimensional vector space, and as with the two- and three-dimensional cases, we can omit the letter g and use a notation that involves only the vectors. There are several such notations in use, but the one that will be employed in this book is the Dirac bra(c)ket notation, whereby g(|a⟩, |b⟩) is denoted by ⟨a|b⟩. Using this notation, we have

1.2.1. Definition. The inner product of two vectors, |a⟩ and |b⟩, in a vector space V is a complex number, ⟨a|b⟩ ∈ C, such that
1. ⟨a|b⟩ = ⟨b|a⟩*
2. ⟨a|(β|b⟩ + γ|c⟩) = β⟨a|b⟩ + γ⟨a|c⟩

⁵In our present discussion, we are avoiding situations in which a nonzero vector can have zero "length." Such occasions arise in relativity, and we shall discuss them in Part VII.
⁶In some books, particularly in the mathematical literature, the second argument is chosen to be linear.
⁷This uniqueness holds up to a certain equivalence of inner products that we shall not get into here.
3. ⟨a|a⟩ ≥ 0, and ⟨a|a⟩ = 0 if and only if |a⟩ = |0⟩.

The last relation is called the positive definite property of the inner product.⁸ A positive definite inner product is also called a Riemannian inner product; otherwise it is called pseudo-Riemannian.

Note that linearity in the first argument is absent, because, as explained earlier, it would be inconsistent with the first property, which expresses the "symmetry" of the inner product. Because of this extra operation of complex conjugation, the inner product on a complex vector space is not truly bilinear; it is commonly called sesquilinear.

A shorthand notation will be useful when dealing with the inner product of a linear combination of vectors.

1.2.2. Box. We write the RHS of the second equation in the definition above as ⟨a|βb + γc⟩. This has the advantage of treating a linear combination as a single vector. The second property then states that if the complex scalars happen to be in a ket, they "split out" unaffected:

⟨a|βb + γc⟩ = β⟨a|b⟩ + γ⟨a|c⟩.   (1.2)

On the other hand, if the complex scalars happen to be in the first factor (the bra), then they should be conjugated when they are "split out":

⟨βb + γc|a⟩ = β*⟨b|a⟩ + γ*⟨c|a⟩.   (1.3)

A vector space V on which an inner product is defined is called an inner product space. As mentioned above, all finite-dimensional vector spaces can be turned into inner product spaces.

1.2.3. Example. In this example we introduce some of the most common inner products. The reader is urged to verify that in all cases, we indeed have an inner product.
• Let |a⟩, |b⟩ ∈ C^n, with |a⟩ = (α1, α2, ..., αn) and |b⟩ = (β1, β2, ..., βn), and define an inner product on C^n as

⟨a|b⟩ = α1*β1 + α2*β2 + ··· + αn*βn = Σ_{i=1}^n αi*βi.

That this product satisfies all the required properties of an inner product is easily checked. For example, if |b⟩ = |a⟩, we obtain ⟨a|a⟩ = |α1|² + |α2|² + ··· + |αn|², which is clearly nonnegative.

⁸The positive definiteness must be relaxed in the space-time of relativity theory, in which nonzero vectors can have zero "length."
• Similarly, for |a⟩, |b⟩ ∈ R^n the same definition (without the complex conjugation) satisfies all the properties of an inner product.
• For |a⟩, |b⟩ ∈ C^∞ the natural inner product is defined as ⟨a|b⟩ = Σ_{i=1}^∞ αi*βi. The question of the convergence of this sum is the subject of Problem 1.16.
• Let x(t), y(t) ∈ P^c[t], the space of all polynomials in t with complex coefficients. Define

⟨x|y⟩ ≡ ∫_a^b w(t) x*(t) y(t) dt,   (1.4)

where a and b are real numbers, or infinity, for which the integral exists, and w(t) is a real-valued, continuous function that is always strictly positive in the interval (a, b). Then Equation (1.4) defines an inner product. Depending on the so-called weight function w(t), there can be many different inner products defined on the infinite-dimensional space P^c[t].
• Let f, g ∈ C(a, b) and define their inner product by

⟨f|g⟩ ≡ ∫_a^b w(x) f*(x) g(x) dx.

It is easily shown that ⟨f|g⟩ satisfies all the requirements of an inner product if, as in the previous case, the weight function w(x) is always positive in the interval (a, b). This is called the standard inner product on C(a, b).

1.2.1 Orthogonality

The vectors of analytic geometry and calculus are often expressed in terms of unit vectors along the axes, i.e., vectors that are of unit length and perpendicular to one another. Such vectors are also important in abstract inner product spaces.

1.2.4. Definition. Vectors |a⟩, |b⟩ ∈ V are orthogonal if ⟨a|b⟩ = 0. A normal vector, or normalized vector, |e⟩ is one for which ⟨e|e⟩ = 1. A basis B = {|ei⟩}_{i=1}^N in an N-dimensional vector space V is an orthonormal basis if

⟨ei|ej⟩ = δij ≡ 1 if i = j, and 0 if i ≠ j,

where δij, defined by the last equality, is called the Kronecker delta.

1.2.5. Example.
Here are examples of orthonormal bases:

• The standard basis of $\mathbb{R}^n$ (or $\mathbb{C}^n$),

$$|e_1\rangle = (1, 0, \ldots, 0),\quad |e_2\rangle = (0, 1, \ldots, 0),\quad \ldots,\quad |e_n\rangle = (0, 0, \ldots, 1), \qquad (1.5)$$

is orthonormal under the usual inner product of those spaces.
Figure 1.1 The essence of the Gram-Schmidt process is neatly illustrated in two dimensions. The figure depicts the stages of the construction of two orthonormal vectors.

• Let $|e_k\rangle = e^{ikx}/\sqrt{2\pi}$ be functions in $\mathcal{C}(0, 2\pi)$ with $w(x) = 1$. Then

$$\langle e_k|e_k\rangle = \frac{1}{2\pi}\int_0^{2\pi} e^{-ikx}e^{ikx}\,dx = 1,$$

and for $l \neq k$,

$$\langle e_l|e_k\rangle = \frac{1}{2\pi}\int_0^{2\pi} e^{-ilx}e^{ikx}\,dx = \frac{1}{2\pi}\int_0^{2\pi} e^{i(k-l)x}\,dx = 0.$$

Thus, $\langle e_l|e_k\rangle = \delta_{lk}$. ■

1.2.2 The Gram-Schmidt Process

It is always possible to convert any basis in $V$ into an orthonormal basis. A process by which this may be accomplished is called Gram-Schmidt orthonormalization. Consider a basis $B = \{|a_1\rangle, |a_2\rangle, \ldots, |a_N\rangle\}$. We intend to take linear combinations of the $|a_i\rangle$ in such a way that the resulting vectors are orthonormal. First, we let $|e_1\rangle = |a_1\rangle/\sqrt{\langle a_1|a_1\rangle}$ and note that $\langle e_1|e_1\rangle = 1$. If we subtract from $|a_2\rangle$ its projection along $|e_1\rangle$, we obtain a vector that is orthogonal to $|e_1\rangle$ (see Figure 1.1). Calling the resulting vector $|e_2'\rangle$, we have $|e_2'\rangle = |a_2\rangle - \langle e_1|a_2\rangle\,|e_1\rangle$, which can be written more symmetrically as $|e_2'\rangle = |a_2\rangle - |e_1\rangle\langle e_1|a_2\rangle$. Clearly, this vector is orthogonal to $|e_1\rangle$. In order to normalize $|e_2'\rangle$, we divide it by $\sqrt{\langle e_2'|e_2'\rangle}$. Then $|e_2\rangle \equiv |e_2'\rangle/\sqrt{\langle e_2'|e_2'\rangle}$ will be a normal vector orthogonal to $|e_1\rangle$. Subtracting from $|a_3\rangle$ its projections along the first and second unit vectors obtained so far will give the vector

$$|e_3'\rangle = |a_3\rangle - |e_1\rangle\langle e_1|a_3\rangle - |e_2\rangle\langle e_2|a_3\rangle = |a_3\rangle - \sum_{i=1}^{2}|e_i\rangle\langle e_i|a_3\rangle,$$
Figure 1.2 Once the orthonormal vectors in the plane of two vectors are obtained, the third orthonormal vector is easily constructed.

which is orthogonal to both $|e_1\rangle$ and $|e_2\rangle$ (see Figure 1.2):

$$\langle e_1|e_3'\rangle = \langle e_1|a_3\rangle - \underbrace{\langle e_1|e_1\rangle}_{=1}\langle e_1|a_3\rangle - \underbrace{\langle e_1|e_2\rangle}_{=0}\langle e_2|a_3\rangle = 0.$$

Similarly, $\langle e_2|e_3'\rangle = 0$.

Erhard Schmidt (1876-1959) obtained his doctorate under the supervision of David Hilbert. His main interest was in integral equations and Hilbert spaces. He is the "Schmidt" of the Gram-Schmidt orthogonalization process, which takes a basis of a space and constructs an orthonormal one from it. (Laplace had presented a special case of this process long before Gram or Schmidt.)

In 1908 Schmidt worked on infinitely many equations in infinitely many unknowns, introducing various geometric notations and terms that are still in use for describing spaces of functions. Schmidt's ideas were to lead to the geometry of Hilbert spaces. This was motivated by the study of integral equations (see Chapter 17) and an attempt at their abstraction. Earlier, Hilbert regarded a function as given by its Fourier coefficients. These satisfy the condition that $\sum_{n=1}^{\infty} a_n^2$ is finite. He introduced sequences of real numbers $\{x_n\}$ such that $\sum_{n=1}^{\infty} x_n^2$ is finite. Riesz and Fischer showed that there is a one-to-one correspondence between square-integrable functions and square-summable sequences of their Fourier coefficients. In 1907 Schmidt and Fréchet showed that a consistent theory could be obtained if the square-summable sequences were regarded as the coordinates of points in an infinite-dimensional space that is a generalization of $n$-dimensional Euclidean space. Thus functions can be regarded as points of a space, now called a Hilbert space.
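The construction of $|e_1\rangle$, $|e_2\rangle$, and $|e_3\rangle$ above translates directly into code. A minimal numerical sketch (the three sample vectors are arbitrary choices, not taken from the text):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors.

    At each stage, subtract from the next vector its projections
    <e_i|a> |e_i> along the unit vectors found so far, then normalize.
    """
    es = []
    for a in vectors:
        e = a.astype(complex)
        for prev in es:
            e = e - prev * np.vdot(prev, a)      # remove projection along |prev>
        e = e / np.sqrt(np.vdot(e, e).real)      # normalize the remainder
        es.append(e)
    return es

a1, a2, a3 = (np.array(v, dtype=complex) for v in ([1, 1, 0], [1, 0, 1], [0, 1, 1]))
e1, e2, e3 = gram_schmidt([a1, a2, a3])

# Check orthonormality: <e_i|e_j> = delta_ij
for i, u in enumerate((e1, e2, e3)):
    for j, v in enumerate((e1, e2, e3)):
        assert np.isclose(np.vdot(u, v), 1.0 if i == j else 0.0)
```

Note that each $|e_k\rangle$ is indeed a linear combination of the original vectors, as remarked in the text.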
In general, if we have calculated $m$ orthonormal vectors $|e_1\rangle, \ldots, |e_m\rangle$, with $m < N$, then we can find the next one using the following relations:

$$|e_{m+1}'\rangle = |a_{m+1}\rangle - \sum_{i=1}^{m}|e_i\rangle\langle e_i|a_{m+1}\rangle, \qquad |e_{m+1}\rangle = \frac{|e_{m+1}'\rangle}{\sqrt{\langle e_{m+1}'|e_{m+1}'\rangle}}. \qquad (1.6)$$

Even though we have been discussing finite-dimensional vector spaces, the process of Equation (1.6) can continue in infinite dimensions as well. The reader is asked to pay attention to the fact that, at each stage of the Gram-Schmidt process, one is taking linear combinations of the original vectors.

1.2.3 The Schwarz Inequality

Let us now consider an important inequality that is valid in both finite and infinite dimensions and whose restriction to two and three dimensions is equivalent to the fact that the cosine of the angle between two vectors is always less than or equal to one.

1.2.6. Theorem. For any pair of vectors $|a\rangle, |b\rangle$ in an inner product space $V$, the Schwarz inequality holds:

$$\langle a|a\rangle\langle b|b\rangle \geq |\langle a|b\rangle|^2.$$

Equality holds when $|a\rangle$ is proportional to $|b\rangle$.

Proof. Let $|c\rangle = |b\rangle - \left(\langle a|b\rangle/\langle a|a\rangle\right)|a\rangle$, and note that $\langle a|c\rangle = 0$. Write $|b\rangle = \left(\langle a|b\rangle/\langle a|a\rangle\right)|a\rangle + |c\rangle$ and take the inner product of $|b\rangle$ with itself:

$$\langle b|b\rangle = \frac{|\langle a|b\rangle|^2}{\langle a|a\rangle} + \langle c|c\rangle.$$

Since the last term is never negative, we have

$$\langle b|b\rangle \geq \frac{|\langle a|b\rangle|^2}{\langle a|a\rangle} \;\Longrightarrow\; \langle a|a\rangle\langle b|b\rangle \geq |\langle a|b\rangle|^2.$$

Equality holds iff $\langle c|c\rangle = 0$, or $|c\rangle = 0$. From the definition of $|c\rangle$, we conclude that $|a\rangle$ and $|b\rangle$ must be proportional. □

Notice the power of abstraction: We have derived the Schwarz inequality solely from the basic assumptions of inner product spaces, independent of the specific nature of the inner product. Therefore, we do not have to prove the Schwarz inequality every time we encounter a new inner product space.

Karl Herman Amandus Schwarz (1843-1921), the son of an architect, was born in what is now Sobięcin, Poland. After gymnasium, Schwarz studied chemistry in Berlin for a time
before switching to mathematics, receiving his doctorate in 1864. He was greatly influenced by the reigning mathematicians in Germany at the time, especially Kummer and Weierstrass. The lecture notes that Schwarz took while attending Weierstrass's lectures on the integral calculus still exist. Schwarz received an initial appointment at Halle and later appointments in Zurich and Göttingen before being named as Weierstrass's successor at Berlin in 1892. These later years, filled with students and lectures, were not Schwarz's most productive, but his early papers assure his place in mathematics history.

Schwarz's favorite tool was geometry, which he soon turned to the study of analysis. He conclusively proved some of Riemann's results that had been previously (and justifiably) challenged. The primary result in question was the assertion that every simply connected region in the plane could be conformally mapped onto a circular area. From this effort came several well-known results now associated with Schwarz's name, including the principle of reflection and Schwarz's lemma. He also worked on surfaces of minimal area, the branch of geometry beloved by all who dabble with soap bubbles.

Schwarz's most important work, for the occasion of Weierstrass's seventieth birthday, again dealt with minimal area, specifically whether a minimal surface yields a minimal area. Along the way, Schwarz demonstrated second variation in a multiple integral, constructed a function using successive approximation, and demonstrated the existence of a "least" eigenvalue for certain differential equations. This work also contained the most famous inequality in mathematics, which bears his name. Schwarz's success obviously stemmed from a matching of his aptitude and training to the mathematical problems of the day.
One of his traits, however, could be viewed as either positive or negative: his habit of treating all problems, whether trivial or monumental, with the same level of attention to detail. This might also at least partly explain the decline in productivity in Schwarz's later years. Schwarz had interests outside mathematics, although his marriage was a mathematical one, since he married Kummer's daughter. Outside mathematics he was the captain of the local voluntary fire brigade, and he assisted the stationmaster at the local railway station by closing the doors of the trains!

1.2.4 Length of a Vector

In dealing with objects such as directed line segments in the plane or in space, the intuitive idea of the length of a vector is used to define the dot product. However, sometimes it is more convenient to introduce the inner product first and then define the length, as we shall do now.

1.2.7. Definition. The norm, or length, of a vector $|a\rangle$ in an inner product space is denoted by $\|a\|$ and defined as $\|a\| \equiv \sqrt{\langle a|a\rangle}$. We use the notation $\|\alpha a + \beta b\|$ for the norm of the vector $\alpha|a\rangle + \beta|b\rangle$.

One can easily show that the norm has the following properties:
1. The norm of the zero vector is zero: $\|0\| = 0$.

2. $\|a\| \geq 0$, and $\|a\| = 0$ if and only if $|a\rangle = |0\rangle$.

3. $\|\alpha a\| = |\alpha|\,\|a\|$ for any^9 complex $\alpha$.

4. $\|a + b\| \leq \|a\| + \|b\|$. This property is called the triangle inequality.

A vector space on which a norm is defined is called a normed linear space. One can introduce the idea of the "distance" between two vectors in a normed linear space. The distance between $|a\rangle$ and $|b\rangle$, denoted by $d(a, b)$, is simply the norm of their difference: $d(a, b) \equiv \|a - b\|$. It can be readily shown that this has all the properties one expects of the distance (or metric) function introduced in Chapter 0. However, one does not need a normed space to define distance. For example, as explained in Chapter 0, one can define the distance between two points on the surface of a sphere, but the addition of two points on a sphere (a necessary operation for vector space structure) is not defined. Thus the points on a sphere form a metric space, but not a vector space.

Inner product spaces are automatically normed spaces, but the converse is not, in general, true: There are normed spaces, i.e., spaces satisfying properties 1-4 above, that cannot be promoted to inner product spaces. However, if the norm satisfies the parallelogram law,

$$\|a + b\|^2 + \|a - b\|^2 = 2\|a\|^2 + 2\|b\|^2, \qquad (1.7)$$

then one can define an inner product in terms of the norm (for a real vector space, $\langle a|b\rangle \equiv \tfrac{1}{4}\left(\|a + b\|^2 - \|a - b\|^2\right)$, with an analogous polarization identity in the complex case) and show that it is indeed an inner product. In fact, we have (see [Frie 82, pp. 203-204] for a proof) the following theorem.

1.2.8. Theorem. A normed linear space is an inner product space if and only if the norm satisfies the parallelogram law.

Now consider any $N$-dimensional vector space $V$. Choose a basis $\{|a_i\rangle\}_{i=1}^N$ in $V$, and for any vector $|a\rangle$ whose components are $\{\alpha_i\}_{i=1}^N$ in this basis, define

$$\|a\|^2 \equiv \sum_{i=1}^{N}|\alpha_i|^2.$$

The reader may check that this defines a norm, and that the norm satisfies the parallelogram law. From Theorem 1.2.8 we have the following:

1.2.9. Theorem.
Every finite-dimensional vector space can be turned into an inner product space.
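The parallelogram law of Theorem 1.2.8 can be checked numerically. The sketch below (with arbitrarily chosen test vectors) verifies the law for the Euclidean norm on $\mathbb{R}^2$ and exhibits its failure for the 1-norm, which therefore cannot come from any inner product:

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

def pl_defect(norm):
    """||a+b||^2 + ||a-b||^2 - 2||a||^2 - 2||b||^2; zero when the law holds here."""
    return norm(a + b)**2 + norm(a - b)**2 - 2 * norm(a)**2 - 2 * norm(b)**2

norm2 = lambda v: np.linalg.norm(v)      # Euclidean norm, comes from <a|b>
norm1 = lambda v: np.sum(np.abs(v))      # 1-norm

assert abs(pl_defect(norm2)) < 1e-12     # law holds for the Euclidean norm
assert pl_defect(norm1) == 4.0           # law fails here: 4 + 4 - 2 - 2 = 4
```

A single failing pair of vectors is enough to rule out an inner product, while the passing check is of course only evidence, not a proof.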
1.2.10. Example. Let the space be $\mathbb{C}^n$. The natural inner product of $\mathbb{C}^n$ gives rise to a norm, which, for the vector $|a\rangle = (\alpha_1, \alpha_2, \ldots, \alpha_n)$, is

$$\|a\| = \sqrt{\langle a|a\rangle} = \sqrt{\sum_{i=1}^{n}|\alpha_i|^2}.$$

This norm yields the following distance between $|a\rangle$ and $|b\rangle = (\beta_1, \beta_2, \ldots, \beta_n)$:

$$d(a, b) = \|a - b\| = \sqrt{\langle a - b|a - b\rangle} = \sqrt{\sum_{i=1}^{n}|\alpha_i - \beta_i|^2}.$$

One can define other norms, such as $\|a\|_1 \equiv \sum_{i=1}^{n}|\alpha_i|$, which has all the required properties of a norm, and leads to the distance

$$d_1(a, b) = \|a - b\|_1 = \sum_{i=1}^{n}|\alpha_i - \beta_i|.$$

Another norm defined on $\mathbb{C}^n$ is given by

$$\|a\|_p = \left(\sum_{i=1}^{n}|\alpha_i|^p\right)^{1/p},$$

where $p$ is a positive integer. It is proved in higher mathematical analysis that $\|\cdot\|_p$ has all the properties of a norm. (The nontrivial part of the proof is to verify the triangle inequality.) The associated distance is

$$d_p(a, b) = \|a - b\|_p = \left(\sum_{i=1}^{n}|\alpha_i - \beta_i|^p\right)^{1/p}.$$

The other two norms introduced above are special cases, for $p = 2$ and $p = 1$. ■

1.3 Linear Transformations

We have made progress in enriching vector spaces with structures such as norms and inner products. However, this enrichment, although important, will be of little value if it is imprisoned in a single vector space. We would like to give vector-space properties freedom of movement, so they can go from one space to another. The vehicle that carries these properties is a linear transformation, which is the subject of this section. However, first it is instructive to review the concept of a mapping (discussed in Chapter 0) by considering some examples relevant to the present discussion.

1.3.1. Example. The following are a few familiar examples of mappings.

^9 The first property follows from this by letting $\alpha = 0$.
1. Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x^2$.

2. Let $g : \mathbb{R}^2 \to \mathbb{R}$ be given by $g(x, y) = x^2 + y^2 - 4$.

3. Let $F : \mathbb{R}^2 \to \mathbb{C}$ be given by $F(x, y) = U(x, y) + iV(x, y)$, where $U : \mathbb{R}^2 \to \mathbb{R}$ and $V : \mathbb{R}^2 \to \mathbb{R}$.

4. Let $T : \mathbb{R} \to \mathbb{R}^2$ be given by $T(t) = (t + 3, 2t - 5)$.

5. Motion of a point particle in space can be considered as a mapping $M : [a, b] \to \mathbb{R}^3$, where $[a, b]$ is an interval of the real line. For each $t \in [a, b]$, we define $M(t) = (x(t), y(t), z(t))$, where $x(t)$, $y(t)$, and $z(t)$ are real-valued functions of $t$. If we identify $t$ with time, which is assumed to have a value in the interval $[a, b]$, then $M(t)$ describes the path of the particle as a function of time, and $a$ and $b$ are the beginning and the end of the motion, respectively. ■

Let us consider an arbitrary mapping $F : V \to W$ from a vector space $V$ to another vector space $W$. It is assumed that the two vector spaces are over the same scalars, say $\mathbb{C}$. Consider $|a\rangle$ and $|b\rangle$ in $V$ and $|x\rangle$ and $|y\rangle$ in $W$ such that $F(|a\rangle) = |x\rangle$ and $F(|b\rangle) = |y\rangle$. In general, $F$ does not preserve the vector space structure. That is, the image of a linear combination of vectors is not the same as the linear combination of the images:

$$F(\alpha|a\rangle + \beta|b\rangle) \neq \alpha F(|a\rangle) + \beta F(|b\rangle).$$

This is the case for all the mappings of Example 1.3.1, including the fourth item: being affine rather than linear, $T$ preserves the combination only when $\alpha + \beta = 1$. There are many applications in which the preservation of the vector space structure (preservation of the linear combination) is desired.

1.3.2. Definition. A linear transformation from the complex vector space $V$ to the complex vector space $W$ is a mapping $T : V \to W$ such that

$$T(\alpha|a\rangle + \beta|b\rangle) = \alpha T(|a\rangle) + \beta T(|b\rangle) \qquad \forall\,|a\rangle, |b\rangle \in V \text{ and } \alpha, \beta \in \mathbb{C}.$$

A linear transformation $T : V \to V$ is called an endomorphism of $V$ or a linear operator on $V$. The action of a linear transformation on a vector is written without the parentheses: $T(|a\rangle) \equiv T|a\rangle$. The same definition applies to real vector spaces.
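Definition 1.3.2 can be probed numerically: a map given by a matrix preserves linear combinations, while the affine map $T(t) = (t + 3, 2t - 5)$ of Example 1.3.1 does so only when $\alpha + \beta = 1$. A minimal sketch (the matrix, test points, and coefficients are arbitrary choices):

```python
import numpy as np

def preserves_combination(F, u, v, alpha, beta):
    """Does F(alpha*u + beta*v) equal alpha*F(u) + beta*F(v)?"""
    return np.allclose(F(alpha * u + beta * v),
                       alpha * np.asarray(F(u)) + beta * np.asarray(F(v)))

A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
matrix_map = lambda x: A @ x                         # a linear map R^2 -> R^2
T = lambda t: np.array([t + 3.0, 2.0 * t - 5.0])     # item 4 of Example 1.3.1

u2, v2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
assert preserves_combination(matrix_map, u2, v2, 2.0, -1.0)   # linear: always holds

assert not preserves_combination(T, 1.0, 2.0, 2.0, 1.0)       # alpha + beta != 1: fails
assert preserves_combination(T, 1.0, 2.0, 0.25, 0.75)         # alpha + beta == 1: agrees
```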
Note that the definition demands that both vector spaces have the same set of scalars: The same scalars multiply vectors in $V$ on the LHS and those in $W$ on the RHS. An immediate consequence of the definition is the following:

1.3.3. Box. Two linear transformations $T : V \to W$ and $U : V \to W$ are equal if and only if $T|a_i\rangle = U|a_i\rangle$ for all $|a_i\rangle$ in some basis of $V$. Thus, a linear transformation is uniquely determined by its action on some basis of its domain space.
The equality in this box is simply the set-theoretic equality of maps discussed in Chapter 0. An important example of a linear transformation occurs when the second vector space, $W$, happens to be the set of scalars, $\mathbb{C}$ or $\mathbb{R}$, in which case the linear transformation is called a linear functional.

The set of linear transformations from $V$ to $W$ is denoted by $\mathcal{L}(V, W)$, and this set happens to be a vector space. The zero transformation, $0$, is defined to take every vector in $V$ to the zero vector of $W$. The sum of two linear transformations $T$ and $U$ is the linear transformation $T + U$, whose action on a vector $|a\rangle \in V$ is defined to be $(T + U)|a\rangle \equiv T|a\rangle + U|a\rangle$. Similarly, define $\alpha T$ by $(\alpha T)|a\rangle \equiv \alpha(T|a\rangle) = \alpha T|a\rangle$. The set of endomorphisms of $V$ is denoted by $\mathcal{L}(V)$ rather than $\mathcal{L}(V, V)$. The set of linear functionals $\mathcal{L}(V, \mathbb{C})$, or $\mathcal{L}(V, \mathbb{R})$ if $V$ is a real vector space, is denoted by $V^*$ and is called the dual space of $V$.

1.3.4. Example. The following are some examples of linear operators in various vector spaces. The proofs of linearity are simple in all cases and are left as exercises for the reader.

1. Let $\{|a_1\rangle, |a_2\rangle, \ldots, |a_m\rangle\}$ be an arbitrary finite set of vectors in $V$, and $\{\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_m\}$ an arbitrary set of linear functionals on $V$. Let

$$A \equiv \sum_{k=1}^{m}|a_k\rangle\mathbf{f}_k \in \mathcal{L}(V)$$

be defined by $A|x\rangle = \sum_{k=1}^{m}|a_k\rangle\,\mathbf{f}_k(|x\rangle) = \sum_{k=1}^{m}\mathbf{f}_k(|x\rangle)\,|a_k\rangle$. Then $A$ is a linear operator on $V$.

2. Let $\pi$ be a permutation (shuffling) of the integers $\{1, 2, \ldots, n\}$. If $|x\rangle = (\eta_1, \eta_2, \ldots, \eta_n)$ is a vector in $\mathbb{C}^n$, we can write $A_\pi|x\rangle = (\eta_{\pi(1)}, \eta_{\pi(2)}, \ldots, \eta_{\pi(n)})$. Then $A_\pi$ is a linear operator.

3. For any $|x\rangle \in \mathcal{P}^c[t]$, with $x(t) = \sum_{k=0}^{n}\alpha_k t^k$, write $|y\rangle = D|x\rangle$, where $|y\rangle$ is defined as $y(t) = \sum_{k=1}^{n}k\alpha_k t^{k-1}$. Then $D$ is a linear operator, the derivative operator.
4. For every $|x\rangle \in \mathcal{P}^c[t]$, with $x(t) = \sum_{k=0}^{n}\alpha_k t^k$, write $|y\rangle = S|x\rangle$, where $|y\rangle \in \mathcal{P}^c[t]$ is defined as $y(t) = \sum_{k=0}^{n}[\alpha_k/(k+1)]\,t^{k+1}$. Then $S$ is a linear operator, the integration operator.

5. Define the operator $\mathrm{int} : \mathcal{C}^0(a, b) \to \mathbb{R}$ by $\mathrm{int}(f) = \int_a^b f(t)\,dt$. Then $\mathrm{int}$ is a linear functional on the vector space $\mathcal{C}^0(a, b)$.

6. Let $\mathcal{C}^n(a, b)$ be the set of real-valued functions defined in the interval $[a, b]$ whose first $n$ derivatives exist and are continuous. For any $|f\rangle \in \mathcal{C}^n(a, b)$ define $|u\rangle = G|f\rangle$, with $u(t) = g(t)f(t)$ and $g(t)$ a fixed function in $\mathcal{C}^n(a, b)$. Then $G$ is linear. In particular, the operation of multiplying by $t$, whose operator is denoted by $T$, is linear. ■
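The derivative and integration operators of items 3 and 4 act on the coefficient list $(\alpha_0, \ldots, \alpha_n)$ of a polynomial, which makes them easy to sketch in code (a small illustration, not from the text):

```python
import numpy as np

def D(c):
    """Derivative: x(t) = sum alpha_k t^k  ->  y(t) = sum k alpha_k t^(k-1)."""
    if len(c) <= 1:
        return np.zeros(1)
    return np.array([k * c[k] for k in range(1, len(c))])

def S(c):
    """Integration: x(t) = sum alpha_k t^k  ->  y(t) = sum alpha_k/(k+1) t^(k+1)."""
    return np.concatenate(([0.0], [c[k] / (k + 1) for k in range(len(c))]))

x = np.array([1.0, 2.0, 3.0])                    # x(t) = 1 + 2t + 3t^2
assert np.allclose(D(x), [2.0, 6.0])             # D|x>: 2 + 6t
assert np.allclose(S(x), [0.0, 1.0, 1.0, 1.0])   # S|x>: t + t^2 + t^3
assert np.allclose(D(S(x)), x)                   # D undoes S (but not vice versa)
```

The last assertion hints at a point taken up later: $DS$ is the identity on $\mathcal{P}^c[t]$, while $SD$ annihilates the constant term, so neither operator is invertible by the other on both sides.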
An immediate consequence of Definition 1.3.2 is that the image of the zero vector in $V$ is the zero vector in $W$. This is not true for a general mapping, but it is necessarily true for a linear mapping. As the zero vector of $V$ is mapped onto the zero vector of $W$, other vectors of $V$ may also be dragged along. In fact, we have the following theorem.

1.3.5. Theorem. The set of vectors in $V$ that are mapped onto the zero vector of $W$ under the linear transformation $T : V \to W$ form a subspace of $V$ called the kernel, or null space, of $T$ and denoted by $\ker T$.

Proof. The proof is left as an exercise. □

The dimension of $\ker T$ is also called the nullity of $T$. The proof of the following is also left as an exercise.

1.3.6. Theorem. The range $T(V)$ of a linear transformation $T : V \to W$ is a subspace of $W$. The dimension of $T(V)$ is called the rank of $T$.

1.3.7. Theorem. A linear transformation is 1-1 (injective) iff its kernel is zero.

Proof. The "only if" part is trivial. For the "if" part, suppose $T|a_1\rangle = T|a_2\rangle$; then linearity of $T$ implies that $T(|a_1\rangle - |a_2\rangle) = 0$. Since $\ker T = 0$, we must have $|a_1\rangle = |a_2\rangle$. □

Suppose we start with a basis of $\ker T$ and add enough linearly independent vectors to it to get a basis for $V$. Without loss of generality, let us assume that the first $n$ vectors in this basis form a basis of $\ker T$. So let $B = \{|a_1\rangle, |a_2\rangle, \ldots, |a_N\rangle\}$ be a basis for $V$ and $B' = \{|a_1\rangle, |a_2\rangle, \ldots, |a_n\rangle\}$ be a basis for $\ker T$. Here $N = \dim V$ and $n = \dim\ker T$. It is straightforward to show that $\{T|a_{n+1}\rangle, \ldots, T|a_N\rangle\}$ is a basis for $T(V)$. We therefore have the following result.

1.3.8. Theorem. Let $T : V \to W$ be a linear transformation. Then^10

$$\dim V = \dim\ker T + \dim T(V).$$

This theorem is called the dimension theorem. One of its consequences is that an injective endomorphism is automatically surjective, and vice versa:

1.3.9. Proposition.
An endomorphism of a finite-dimensional vector space is bijective if it is either injective or surjective.

The dimension theorem is obviously valid only for finite-dimensional vector spaces. In particular, neither surjectivity nor injectivity implies bijectivity for infinite-dimensional vector spaces.

^10 Recall that the dimension of a vector space depends on the scalars used in that space. Although we are dealing with two different vector spaces here, since they are both over the same set of scalars (complex or real), no confusion in the concept of dimension arises.
1.3.10. Example. Let us try to find the kernel of $T : \mathbb{R}^4 \to \mathbb{R}^3$ given by

$$T(x_1, x_2, x_3, x_4) = (2x_1 + x_2 + x_3 - x_4,\; x_1 + x_2 + 2x_3 + 2x_4,\; x_1 - x_3 - 3x_4).$$

A vector belongs to $\ker T$ iff

$$2x_1 + x_2 + x_3 - x_4 = 0, \qquad x_1 + x_2 + 2x_3 + 2x_4 = 0, \qquad x_1 - x_3 - 3x_4 = 0.$$

The "solution" to these equations is $x_1 = x_3 + 3x_4$ and $x_2 = -3x_3 - 5x_4$. Thus, to be in $\ker T$, a vector in $\mathbb{R}^4$ must be of the form

$$(x_3 + 3x_4,\; -3x_3 - 5x_4,\; x_3,\; x_4) = x_3(1, -3, 1, 0) + x_4(3, -5, 0, 1),$$

where $x_3$ and $x_4$ are arbitrary real numbers. It follows that $\ker T$ consists of vectors that can be written as linear combinations of the two linearly independent vectors $(1, -3, 1, 0)$ and $(3, -5, 0, 1)$. Therefore, $\dim\ker T = 2$. Theorem 1.3.8 then says that $\dim T(V) = 2$; that is, the range of $T$ is two-dimensional. This becomes clear when one notes that

$$T(x_1, x_2, x_3, x_4) = (2x_1 + x_2 + x_3 - x_4)(1, 0, 1) + (x_1 + x_2 + 2x_3 + 2x_4)(0, 1, -1),$$

and therefore $T(x_1, x_2, x_3, x_4)$, an arbitrary vector in the range of $T$, is a linear combination of only two linearly independent vectors, $(1, 0, 1)$ and $(0, 1, -1)$. ■

In many cases, two vector spaces may "look" different, while in reality they are very much the same. For example, the set of complex numbers $\mathbb{C}$ is a two-dimensional vector space over the reals, as is $\mathbb{R}^2$. Although we call the vectors of these two spaces by different names, they have very similar properties. This notion of "similarity" is made precise in the following definition.

1.3.11. Definition. A vector space $V$ is said to be isomorphic to another vector space $W$ if there exists a bijective linear mapping $T : V \to W$. Then $T$ is called an isomorphism.^11 A bijective linear map of $V$ onto itself is called an automorphism of $V$. The set of automorphisms of $V$ is denoted by $GL(V)$.

For all practical purposes, two isomorphic vector spaces are different manifestations of the "same" vector space. In the example discussed above, the correspondence $T : \mathbb{C} \to \mathbb{R}^2$, with $T(x + iy) = (x, y)$, establishes an isomorphism between the two vector spaces. It should be emphasized that only as vector spaces are $\mathbb{C}$ and $\mathbb{R}^2$ isomorphic. If we go beyond the vector space structures, the two sets are quite different. For example, $\mathbb{C}$ has a natural multiplication for its elements, but $\mathbb{R}^2$ does not.
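The kernel computation of Example 1.3.10 is easy to confirm numerically. The sketch below writes $T$ as a $3 \times 4$ matrix and checks both the claimed kernel vectors and the dimension theorem:

```python
import numpy as np

# Matrix of T : R^4 -> R^3 from Example 1.3.10; the rows are the three
# component functions 2x1+x2+x3-x4, x1+x2+2x3+2x4, x1-x3-3x4.
T = np.array([[2, 1, 1, -1],
              [1, 1, 2, 2],
              [1, 0, -1, -3]], dtype=float)

rank = np.linalg.matrix_rank(T)       # dim T(V)
nullity = T.shape[1] - rank           # dimension theorem: dim V = rank + nullity

assert rank == 2 and nullity == 2

# The two claimed kernel vectors are indeed annihilated by T:
for v in ([1, -3, 1, 0], [3, -5, 0, 1]):
    assert np.allclose(T @ np.array(v, dtype=float), 0)
```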
The following theorem gives a working criterion for isomorphism.

^11 The word "isomorphism," as we shall see, is used in conjunction with many algebraic structures. To distinguish them, qualifiers need to be used. In the present context, we speak of linear isomorphism. We shall use qualifiers when necessary. However, the context usually makes the meaning of isomorphism clear.
1.3.12. Theorem. A linear surjective map $T : V \to W$ is an isomorphism if and only if its nullity is zero.

Proof. The "only if" part is obvious. To prove the "if" part, assume that the nullity is zero. Then by Theorem 1.3.7, $T$ is 1-1. Since it is already surjective, $T$ must be bijective. □

1.3.13. Theorem. An isomorphism $T : V \to W$ carries linearly independent sets of vectors onto linearly independent sets of vectors.

Proof. Assume that $\{|a_i\rangle\}_{i=1}^m$ is a set of linearly independent vectors in $V$. To show that $\{T|a_i\rangle\}_{i=1}^m$ is linearly independent in $W$, assume that there exist $\alpha_1, \alpha_2, \ldots, \alpha_m$ such that $\sum_{i=1}^{m}\alpha_i T|a_i\rangle = |0\rangle$. Then the linearity of $T$ and Theorem 1.3.12 give $T\left(\sum_{i=1}^{m}\alpha_i|a_i\rangle\right) = |0\rangle$, or $\sum_{i=1}^{m}\alpha_i|a_i\rangle = |0\rangle$, and the linear independence of the $|a_i\rangle$ implies that $\alpha_i = 0$ for all $i$. Therefore, $\{T|a_i\rangle\}_{i=1}^m$ must be linearly independent. □

The following theorem shows that finite-dimensional vector spaces are severely limited in number:

1.3.14. Theorem. Two finite-dimensional vector spaces are isomorphic if and only if they have the same dimension.

Proof. Let $B_V = \{|a_i\rangle\}_{i=1}^N$ be a basis for $V$ and $B_W = \{|b_i\rangle\}_{i=1}^N$ a basis for $W$. Define the linear transformation $T|a_i\rangle = |b_i\rangle$, $i = 1, 2, \ldots, N$. The rest of the proof involves showing that $T$ is an isomorphism. We leave this as an exercise for the reader. □

A consequence of Theorem 1.3.14 is that all $N$-dimensional vector spaces over $\mathbb{R}$ are isomorphic to $\mathbb{R}^N$ and all complex $N$-dimensional vector spaces are isomorphic to $\mathbb{C}^N$. So, for all practical purposes, we have only two $N$-dimensional vector spaces, $\mathbb{R}^N$ and $\mathbb{C}^N$.

1.3.1 More on Linear Functionals

An example of isomorphism is that between a vector space and its dual space, which we discuss now. Consider an $N$-dimensional vector space with a basis $B = \{|a_1\rangle, |a_2\rangle, \ldots, |a_N\rangle\}$. For any given set of $N$ scalars, $\{\alpha_1, \alpha_2, \ldots, \alpha_N\}$, define the linear functional $\mathbf{f}_\alpha$ by $\mathbf{f}_\alpha|a_j\rangle = \alpha_j$.
When $\mathbf{f}_\alpha$ acts on any arbitrary vector $|b\rangle = \sum_{j=1}^{N}\beta_j|a_j\rangle$ in $V$, the result is

$$\mathbf{f}_\alpha|b\rangle = \sum_{j=1}^{N}\beta_j\,\mathbf{f}_\alpha|a_j\rangle = \sum_{j=1}^{N}\beta_j\alpha_j. \qquad (1.9)$$

This expression suggests that $|b\rangle$ can be represented as a column vector with entries $\beta_1, \beta_2, \ldots, \beta_N$ and $\mathbf{f}_\alpha$ as a row vector with entries $\alpha_1, \alpha_2, \ldots, \alpha_N$. Then $\mathbf{f}_\alpha|b\rangle$ is
merely the matrix product^12 of the row vector (on the left) and the column vector (on the right).

$\mathbf{f}_\alpha$ is uniquely determined by the set $\{\alpha_1, \alpha_2, \ldots, \alpha_N\}$. In other words, corresponding to every set of $N$ scalars there exists a unique linear functional. This leads us to a particular set of functionals, $\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_N$, corresponding, respectively, to the sets of scalars $\{1, 0, 0, \ldots, 0\}$, $\{0, 1, 0, \ldots, 0\}$, \ldots, $\{0, 0, 0, \ldots, 1\}$. This means that

$$\mathbf{f}_1|a_1\rangle = 1 \quad\text{and}\quad \mathbf{f}_1|a_j\rangle = 0 \quad\text{for } j \neq 1,$$
$$\mathbf{f}_2|a_2\rangle = 1 \quad\text{and}\quad \mathbf{f}_2|a_j\rangle = 0 \quad\text{for } j \neq 2,$$
$$\vdots$$
$$\mathbf{f}_N|a_N\rangle = 1 \quad\text{and}\quad \mathbf{f}_N|a_j\rangle = 0 \quad\text{for } j \neq N,$$

or that

$$\mathbf{f}_i|a_j\rangle = \delta_{ij}, \qquad (1.10)$$

where $\delta_{ij}$ is the Kronecker delta.

The functionals of Equation (1.10) form a basis of the dual space $V^*$. To show this, consider an arbitrary $\mathbf{g} \in V^*$, which is uniquely determined by its action on the vectors in a basis $B = \{|a_1\rangle, |a_2\rangle, \ldots, |a_N\rangle\}$. Let $\mathbf{g}|a_i\rangle = \gamma_i \in \mathbb{C}$. Then we claim that $\mathbf{g} = \sum_{i=1}^{N}\gamma_i\mathbf{f}_i$. In fact, consider an arbitrary vector $|a\rangle$ in $V$ with components $(\alpha_1, \alpha_2, \ldots, \alpha_N)$ with respect to $B$. Then, on the one hand, $\mathbf{g}|a\rangle = \mathbf{g}\left(\sum_{i=1}^{N}\alpha_i|a_i\rangle\right) = \sum_{i=1}^{N}\alpha_i\mathbf{g}|a_i\rangle = \sum_{i=1}^{N}\alpha_i\gamma_i$. On the other hand,

$$\left(\sum_{i=1}^{N}\gamma_i\mathbf{f}_i\right)|a\rangle = \left(\sum_{i=1}^{N}\gamma_i\mathbf{f}_i\right)\left(\sum_{j=1}^{N}\alpha_j|a_j\rangle\right) = \sum_{i=1}^{N}\gamma_i\sum_{j=1}^{N}\alpha_j\,\mathbf{f}_i|a_j\rangle = \sum_{i=1}^{N}\gamma_i\sum_{j=1}^{N}\alpha_j\delta_{ij} = \sum_{i=1}^{N}\gamma_i\alpha_i.$$

Since the actions of $\mathbf{g}$ and $\sum_{i=1}^{N}\gamma_i\mathbf{f}_i$ yield equal results for arbitrary $|a\rangle$, we conclude that $\mathbf{g} = \sum_{i=1}^{N}\gamma_i\mathbf{f}_i$, i.e., $\{\mathbf{f}_i\}_{i=1}^N$ span $V^*$. Thus, we have the following result.

1.3.15. Theorem. If $V$ is an $N$-dimensional vector space with a basis $B = \{|a_1\rangle, |a_2\rangle, \ldots, |a_N\rangle\}$, then there is a corresponding unique basis $B^* = \{\mathbf{f}_i\}_{i=1}^N$ in $V^*$ with the property that $\mathbf{f}_i|a_j\rangle = \delta_{ij}$.

By this theorem the dual space of an $N$-dimensional vector space is also $N$-dimensional, and thus isomorphic to it. The basis $B^*$ is called the dual basis of $B$. A corollary to Theorem 1.3.15 is that to every vector in $V$ there corresponds a

^12 Matrices will be taken up in Chapter 3. Here, we assume only a nodding familiarity with elementary matrix operations.
unique linear functional in $V^*$. This can be seen by noting that every vector $|a\rangle$ is uniquely determined by its components $(\alpha_1, \alpha_2, \ldots, \alpha_N)$ in a basis $B$. The unique linear functional $\mathbf{f}_a$ corresponding to $|a\rangle$, also called the dual of $|a\rangle$, is simply $\sum_{j=1}^{N}\alpha_j\mathbf{f}_j$, with $\mathbf{f}_j \in B^*$.

1.3.16. Definition. An annihilator of $|a\rangle \in V$ is a linear functional $\mathbf{f} \in V^*$ such that $\mathbf{f}|a\rangle = 0$. Let $W$ be a subspace of $V$. The set of linear functionals in $V^*$ that annihilate all vectors in $W$ is denoted by $W^0$.

The reader may check that $W^0$ is a subspace of $V^*$. Moreover, if we extend a basis $\{|a_i\rangle\}_{i=1}^k$ of $W$ to a basis $B = \{|a_i\rangle\}_{i=1}^N$ of $V$, then we can show that the functionals $\{\mathbf{f}_j\}_{j=k+1}^N$, chosen from the basis $B^* = \{\mathbf{f}_j\}_{j=1}^N$ dual to $B$, span $W^0$. It then follows that

$$\dim V = \dim W + \dim W^0. \qquad (1.11)$$

We shall have occasion to use annihilators later on when we discuss symplectic geometry.

We have "dualed" a vector, a basis, and a complete vector space. The only object remaining is a linear transformation.

1.3.17. Definition. Let $T : V \to W$ be a linear transformation. Define $T^* : W^* \to V^*$ by^13

$$[T^*(\mathbf{g})]|a\rangle = \mathbf{g}(T|a\rangle) \qquad \forall\,|a\rangle \in V,\ \mathbf{g} \in W^*.$$

$T^*$ is called the dual, or pull back, of $T$.

One can readily verify that $T^* \in \mathcal{L}(W^*, V^*)$, i.e., that $T^*$ is a linear transformation from $W^*$ to $V^*$. Some of the mapping properties of $T^*$ are tied to those of $T$. To see this we first consider the kernel of $T^*$. Clearly, $\mathbf{g}$ is in the kernel of $T^*$ if and only if $\mathbf{g}$ annihilates all vectors of the form $T|a\rangle$, i.e., all vectors in $T(V)$. It follows that $\mathbf{g}$ is in $T(V)^0$. In particular, if $T$ is surjective, $T(V) = W$, and $\mathbf{g}$ annihilates all vectors in $W$, i.e., it is the zero linear functional. We conclude that $\ker T^* = 0$, and therefore $T^*$ is injective. Similarly, one can show that if $T$ is injective, then $T^*$ is surjective. We summarize the discussion above:

1.3.18. Proposition. Let $T$ be a linear transformation and $T^*$ its pull back. Then $\ker T^* = T(V)^0$.
If $T$ is surjective (injective), then $T^*$ is injective (surjective). In particular, $T^*$ is an isomorphism if $T$ is.

It is useful to make a connection between the inner product and linear functionals. To do this, consider a basis $\{|a_1\rangle, |a_2\rangle, \ldots, |a_N\rangle\}$ and let $\alpha_j = \langle a|a_j\rangle$. As noted earlier, the set of scalars $\{\alpha_i\}_{i=1}^N$ defines a unique linear functional $\mathbf{f}_a$ such that $\mathbf{f}_a|a_i\rangle = \alpha_i$. Since $\langle a|a_j\rangle$ is also equal to $\alpha_j$, it is natural to identify $\mathbf{f}_a$ with

^13 Do not confuse this "*" with complex conjugation.
the symbol $\langle a|$ and write

$$T : \mathbf{f}_a \mapsto \langle a|,$$

where $T$ is the identification map. It is also convenient to introduce the notation^14

$$(|a\rangle)^\dagger \equiv \langle a|, \qquad (1.12)$$

where the symbol $\dagger$ means "dual, or dagger of." Now we ask: How does this dagger operation act on a linear combination of vectors? Let $|c\rangle = \alpha|a\rangle + \beta|b\rangle$ and take the inner product of $|c\rangle$ with an arbitrary vector $|x\rangle$ using linearity in the second factor:

$$\langle x|c\rangle = \alpha\langle x|a\rangle + \beta\langle x|b\rangle.$$

Now complex conjugate both sides and use the (sesqui)symmetry of the inner product:

$$(\text{LHS})^* = \langle x|c\rangle^* = \langle c|x\rangle,$$
$$(\text{RHS})^* = \alpha^*\langle x|a\rangle^* + \beta^*\langle x|b\rangle^* = \alpha^*\langle a|x\rangle + \beta^*\langle b|x\rangle = \left(\alpha^*\langle a| + \beta^*\langle b|\right)|x\rangle.$$

Since this is true for all $|x\rangle$, we must have $(|c\rangle)^\dagger \equiv \langle c| = \alpha^*\langle a| + \beta^*\langle b|$. Therefore, in a duality "operation" the complex scalars must be conjugated. So, we have

$$(\alpha|a\rangle + \beta|b\rangle)^\dagger = \alpha^*\langle a| + \beta^*\langle b|. \qquad (1.13)$$

Thus, unlike the association $|a\rangle \leftrightarrow \mathbf{f}_a$, which is linear, the association $|a\rangle \leftrightarrow \langle a|$ is not linear, but sesquilinear:

$$\alpha|a\rangle + \beta|b\rangle \;\mapsto\; \alpha^*\langle a| + \beta^*\langle b|.$$

It is convenient to represent $|a\rangle \in \mathbb{C}^n$ as a column vector

$$|a\rangle = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}.$$

Then the definition of the complex inner product suggests that the dual of $|a\rangle$ must be represented as a row vector with complex conjugate entries,

$$\langle a| = (\alpha_1^* \;\; \alpha_2^* \;\; \cdots \;\; \alpha_n^*),$$

and the inner product can be written as the (matrix) product

$$\langle a|b\rangle = (\alpha_1^* \;\; \alpha_2^* \;\; \cdots \;\; \alpha_n^*)\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{pmatrix} = \sum_{i=1}^{n}\alpha_i^*\beta_i. \qquad (1.14)$$

Compare (1.14) with the comments after (1.9). The complex conjugation in (1.14) is the result of the sesquilinearity of the association $|a\rangle \leftrightarrow \langle a|$.

^14 The significance of this notation will become clear in Section 2.3.
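The column/row representation and Equations (1.13)-(1.14) can be sketched in a few lines of numpy (the particular vectors and scalars are arbitrary):

```python
import numpy as np

# Kets as column vectors, bras as conjugate-transposed row vectors, and the
# inner product as their matrix product, as in Eq. (1.14).
ket_a = np.array([[1 + 1j],
                  [2 - 1j]])               # |a> as a 2x1 column
ket_b = np.array([[3j],
                  [1.0]])

bra_a = ket_a.conj().T                     # <a| = (|a>)^dagger, a 1x2 row

inner = (bra_a @ ket_b)[0, 0]              # <a|b> as a matrix product
assert np.isclose(inner, np.vdot(ket_a, ket_b))

# The dagger is sesquilinear, Eq. (1.13):
# (alpha|a> + beta|b>)^dagger = alpha* <a| + beta* <b|
alpha, beta = 2j, 1 - 1j
lhs = (alpha * ket_a + beta * ket_b).conj().T
rhs = np.conj(alpha) * bra_a + np.conj(beta) * ket_b.conj().T
assert np.allclose(lhs, rhs)
```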
1.3.19. Example. Let $U$ and $V$ be vector spaces with bases $B_U = \{|u_i\rangle\}_{i=1}^m$ and $B_V = \{|v_j\rangle\}_{j=1}^n$, respectively. Consider an $mn$-dimensional vector space $W$ whose basis $B_W$ is in one-to-one correspondence with the pairs $(|u_i\rangle, |v_j\rangle)$, and let $|u_iv_j\rangle$ be the vector corresponding to $(|u_i\rangle, |v_j\rangle)$. For $|u\rangle \in U$ with components $\{\alpha_i\}_{i=1}^m$ in $B_U$ and $|v\rangle \in V$ with components $\{\beta_j\}_{j=1}^n$ in $B_V$, define the vector $|u, v\rangle \in W$ whose components in $B_W$ are $\{\alpha_i\beta_j\}$. One can easily show that if $|u\rangle$, $|u'\rangle$, and $|u''\rangle$ are vectors in $U$ and $|u''\rangle = \alpha|u\rangle + \beta|u'\rangle$, then

$$|u'', v\rangle = \alpha|u, v\rangle + \beta|u', v\rangle.$$

The space $W$ thus defined is called the tensor product of $U$ and $V$ and denoted by $U \otimes V$. One can also define the tensor product of three or more vector spaces. Of special interest are tensor products of a vector space and its dual. The tensor product space $V_{r,s}$ of type $(r, s)$ is defined as follows:

$$V_{r,s} = \underbrace{V \otimes V \otimes \cdots \otimes V}_{r \text{ times}} \otimes \underbrace{V^* \otimes V^* \otimes \cdots \otimes V^*}_{s \text{ times}}.$$

We shall come back to this space in Chapter 25. ■

1.4 Algebras

In many physical applications, a vector space has a natural "product," the prime example being the vector space of matrices. It is therefore useful to consider vector spaces for which such a product exists.

1.4.1. Definition. An algebra $\mathcal{A}$ over $\mathbb{C}$ (or $\mathbb{R}$) is a vector space over $\mathbb{C}$ (or $\mathbb{R}$), together with a binary operation $\mu : V \times V \to V$, called multiplication, that satisfies^15

$$\mathbf{a}(\beta\mathbf{b} + \gamma\mathbf{c}) = \beta\,\mathbf{ab} + \gamma\,\mathbf{ac} \qquad \forall\,\mathbf{a}, \mathbf{b}, \mathbf{c} \in \mathcal{A},\ \forall\,\beta, \gamma \in \mathbb{C}\ (\text{or } \mathbb{R}),$$

with a similar relation for multiplication on the right. The dimension of the vector space is called the dimension of the algebra. The algebra is called associative if the product satisfies $\mathbf{a}(\mathbf{bc}) = (\mathbf{ab})\mathbf{c}$ and commutative if it satisfies $\mathbf{ab} = \mathbf{ba}$. An algebra with identity is an algebra that has an element $\mathbf{1}$ satisfying $\mathbf{a1} = \mathbf{1a} = \mathbf{a}$. An element $\mathbf{b}$ of an algebra with identity is said to be a left inverse of $\mathbf{a}$ if $\mathbf{ba} = \mathbf{1}$. Right inverse is defined similarly.
1.4.2. Example. Define the following product on ℝ²:

(x₁, x₂)(y₁, y₂) = (x₁y₁ − x₂y₂, x₁y₂ + x₂y₁).

The reader is urged to verify that this product turns ℝ² into a commutative algebra.

¹⁵We shall abandon the Dirac bra-and-ket notation in this section due to its clumsiness; instead we use boldface roman letters to denote vectors. It is customary to write ab for μ(a, b).
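The product of Example 1.4.2 is easy to test by machine. The short sketch below (sample values chosen arbitrarily) also checks a fact the book leaves implicit: under (x₁, x₂) ↔ x₁ + ix₂ this product is exactly complex multiplication, so the algebra is isomorphic to ℂ.

```python
# The product of Example 1.4.2 on R^2.
def mul(x, y):
    return (x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0])

x, y, z = (1.0, 2.0), (3.0, -1.0), (0.5, 4.0)

print(mul(x, y) == mul(y, x))                    # True: commutative
print(mul(mul(x, y), z) == mul(x, mul(y, z)))    # True: associative
# Under (x1, x2) <-> x1 + i x2 this is complex multiplication:
print(complex(*mul(x, y)) == complex(*x) * complex(*y))  # True
```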
Similarly, the vector (cross) product on ℝ³ turns it into a nonassociative, noncommutative algebra. The paradigm of all algebras is the matrix algebra, whose binary operation is ordinary multiplication of n × n matrices. This algebra is associative but not commutative.

All the examples above are finite-dimensional algebras. An example of an infinite-dimensional algebra is C^∞(a, b), the vector space of infinitely differentiable real-valued functions on a real interval (a, b). The multiplication is defined pointwise: If f ∈ C^∞(a, b) and g ∈ C^∞(a, b), then

fg(x) ≡ f(x)g(x)    ∀x ∈ (a, b).

This algebra is commutative and associative.

derivation of an algebra defined

The last item in the example above has a feature that turns out to be of great significance in all algebras, the product rule for differentiation.

1.4.3. Definition. A vector space endomorphism D: A → A is called a derivation on A if it has the additional property

D(ab) = [D(a)]b + a[D(b)].

1.4.4. Example. Let A be the set of n × n matrices. Define the binary operation, denoted by ∘, as

A ∘ B ≡ AB − BA,

where the RHS is ordinary matrix multiplication. The reader may check that A together with this operation becomes an algebra. Now let A be a fixed matrix, and define the linear transformation D_A(B) ≡ A ∘ B. Then we note that

D_A(B ∘ C) = A ∘ (B ∘ C) = A(B ∘ C) − (B ∘ C)A
           = ABC − ACB − BCA + CBA.

On the other hand,

(D_A B) ∘ C + B ∘ (D_A C) = (A ∘ B) ∘ C + B ∘ (A ∘ C)
  = (AB − BA) ∘ C + B ∘ (AC − CA)
  = (AB − BA)C − C(AB − BA) + B(AC − CA) − (AC − CA)B
  = ABC + CBA − BCA − ACB.

So, D_A is a derivation on A.

The linear transformations connecting vector spaces can be modified slightly to accommodate the binary operation of multiplication of the corresponding algebras:
1.4 ALGEBRAS 43

algebra homomorphism and isomorphism

1.4.5. Definition. Let A and B be algebras. A linear transformation T: A → B is called an algebra homomorphism if T(ab) = T(a)T(b). A bijective algebra homomorphism is called an algebra isomorphism.

1.4.6. Example. Let A be ℝ³, and B the set of 3 × 3 antisymmetric matrices of the form

        (  0   −x₃   x₂ )
M_x  =  (  x₃    0   −x₁ )
        ( −x₂   x₁    0  )

Then the map T: A → B defined by T(x₁, x₂, x₃) = M_x can be shown to be a linear isomorphism. Let the cross product be the binary operation on A, turning it into an algebra. For B, define the binary operation of Example 1.4.4. The reader may check that, with these operations, T is extended to an algebra isomorphism.

structure constants of an algebra

Given an algebra A and a basis B = {e_i}, i = 1, ..., N, for the underlying vector space, one can write

e_i e_j = Σ_{k=1}^{N} c_{ij}^k e_k.    (1.15)

The complex numbers c_{ij}^k, the components of the vector e_i e_j in the basis B, are called the structure constants of A. These constants determine the product of any two vectors once they are expressed in terms of the basis vectors of B. Conversely,

1.4.7. Box. Given any N-dimensional vector space V, one can turn it into an algebra by choosing a basis and a set of N³ numbers {c_{ij}^k} and defining the product of basis vectors by Equation (1.15).

1.4.8. Example. Consider the vector space of n × n matrices with its standard basis {e_ij}, i, j = 1, ..., n, where e_ij has a 1 at the ijth position and zeros everywhere else. This means that (e_ij)_{lk} = δ_{il}δ_{jk}, and

(e_ij e_kl)_{mn} = Σ_{r=1}^{n} (e_ij)_{mr}(e_kl)_{rn} = Σ_{r=1}^{n} δ_{im}δ_{jr}δ_{kr}δ_{ln} = δ_{im}δ_{jk}δ_{ln} = δ_{jk}(e_il)_{mn},

or

e_ij e_kl = δ_{jk} e_il.

The structure constants are c_{ij,kl}^{mn} = δ_{im}δ_{jk}δ_{ln}. Note that one needs a double index to label these constants.
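The multiplication rule of Example 1.4.8 can be verified by brute force over all index combinations; a minimal NumPy sketch (n = 3 here is my choice, the identity holds for any n):

```python
import numpy as np

n = 3

def e(i, j):
    """Standard basis matrix with a 1 in the ij-th slot (0-based indices here)."""
    m = np.zeros((n, n))
    m[i, j] = 1.0
    return m

# Check e_ij e_kl = delta_jk e_il over every index combination.
ok = all(
    np.array_equal(e(i, j) @ e(k, l), (1.0 if j == k else 0.0) * e(i, l))
    for i in range(n) for j in range(n) for k in range(n) for l in range(n)
)
print(ok)  # True
```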
1.4.9. Example. In the standard basis {e_i} of ℝ⁴, choose the structure constants as follows:

e₁² = −e₂² = −e₃² = −e₄² = e₁,
e₁e_i = e_i e₁ = e_i    for i = 2, 3, 4,
e_i e_j = Σ_k ε_{ijk} e_k    for i, j = 2, 3, 4, i ≠ j

algebra of quaternions

[ε_{ijk} is defined in Equation (3.19)]. The reader may verify that these relations turn ℝ⁴ into an associative, but noncommutative, algebra, called the algebra of quaternions and denoted by ℍ. In this context, e₁ is usually denoted by 1, and e₂, e₃, and e₄ by i, j, and k, respectively, and one writes q = x + iy + jz + kw for an element of ℍ. It then becomes evident that ℍ is a generalization of ℂ. In analogy with ℂ, x is called the real part of q, and (y, z, w) the pure part of q. Similarly, the conjugate of q is q* = x − iy − jz − kw.

Algebras have a surprisingly rich structure and are used extensively in many branches of mathematics and physics. We shall see their usefulness in our discussion of group theory in Part VII. To close this section, and to complete this introductory discussion of algebras, we cite a few more useful notions.

left, right, and two-sided ideals

1.4.10. Definition. Let A be an algebra. A subspace B of A is called a subalgebra of A if B contains the products of all its members. If B has the extra property that it contains ab for all a ∈ A and b ∈ B, then B is called a left ideal of A. A right ideal and a two-sided ideal are defined similarly.

It is clear from the definition that an ideal is automatically a subalgebra, and that

1.4.11. Box. No proper ideal of an algebra with identity can contain the identity element.

minimal ideal

In fact, no proper left (right) ideal can contain an element that has a left (right) inverse. An ideal can itself contain a proper (sub)ideal. If an ideal does not contain any proper subideal, it is called a minimal ideal.

1.4.12. Example. 
The vector space C⁰(a, b) of all continuous real-valued functions on an interval (a, b) is turned into a commutative algebra by pointwise multiplication: If f, g ∈ C⁰(a, b), then the product fg is defined by fg(x) ≡ f(x)g(x) for all x ∈ (a, b). The set of functions that vanish at a given fixed point c ∈ (a, b) constitutes an ideal in C⁰(a, b). Since the algebra is commutative, the ideal is two-sided.

ideals generated by an element of an algebra

One can easily construct left ideals for an algebra A: Take any element x ∈ A and consider the set

Ax ≡ {ax | a ∈ A}.
The reader may check that Ax is a left ideal. A right ideal can be constructed similarly. To construct a two-sided ideal, consider the set

AxA ≡ {axb | a, b ∈ A}.

These are all called ideals generated by x.

1.5 Problems

1.1. Let ℝ⁺ denote the set of positive real numbers. Define the "sum" of two elements of ℝ⁺ to be their usual product, and define scalar multiplication by elements of ℝ as being given by r · p = p^r, where r ∈ ℝ and p ∈ ℝ⁺. With these operations, show that ℝ⁺ is a vector space over ℝ.

1.2. Show that the intersection of two subspaces is also a subspace.

1.3. For each of the following subsets of ℝ³ determine whether it is a subspace of ℝ³:
(a) {(x, y, z) ∈ ℝ³ | x + y − 2z = 0};
(b) {(x, y, z) ∈ ℝ³ | x + y − 2z = 3};
(c) {(x, y, z) ∈ ℝ³ | xyz = 0}.

1.4. Prove that the components of a vector in a given basis are unique.

1.5. Show that the following vectors form a basis for ℂⁿ (or ℝⁿ):

|a₁⟩ = (1, 1, ..., 1, 1)^T,  |a₂⟩ = (1, 1, ..., 1, 0)^T,  ...,  |aₙ⟩ = (1, 0, ..., 0, 0)^T.

1.6. Let W be a subspace of ℝ⁵ defined by

W = {(x₁, ..., x₅) ∈ ℝ⁵ | x₁ = 3x₂ + x₃, x₂ = x₅, and x₄ = 2x₃}.

Find a basis for W.

1.7. Show that the inner product of any vector with |0⟩ is zero.

1.8. Prove Theorem 1.1.5.

1.9. Find a₀, b₀, b₁, c₀, c₁, and c₂ such that the polynomials a₀, b₀ + b₁t, and c₀ + c₁t + c₂t² are mutually orthonormal in the interval [0, 1]. The inner product is as defined for polynomials in Example 1.2.3 with w(t) = 1.
1.10. Given the linearly independent vectors x(t) = tⁿ, for n = 0, 1, 2, ... in P^c[t], use the Gram-Schmidt process to find the orthonormal polynomials e₀(t), e₁(t), and e₂(t)
(a) when the inner product is defined as ⟨x|y⟩ = ∫_{−1}^{1} x*(t)y(t) dt;
(b) when the inner product is defined with a nontrivial weight function:

⟨x|y⟩ = ∫_{−∞}^{∞} e^{−t²} x*(t)y(t) dt.

Hint: Use the following result:

∫_{−∞}^{∞} e^{−t²} tⁿ dt =  √π                              if n = 0,
                            0                               if n is odd,
                            [1·3·5···(n − 1)/2^{n/2}] √π    if n is even.

1.11. (a) Use the Gram-Schmidt process to find an orthonormal set of vectors out of (1, −1, 1), (−1, 0, 1), and (2, −1, 2).
(b) Are these three vectors linearly independent? If not, find a zero linear combination of them by using part (a).

1.12. (a) Use the Gram-Schmidt process to find an orthonormal set of vectors out of (1, −1, 2), (−2, 1, −1), and (−1, −1, 4).
(b) Are these three vectors linearly independent? If not, find a zero linear combination of them by using part (a).

1.13. Show that

1.14. Show that

∫_{−∞}^{∞} dx ∫_{−∞}^{∞} dy (x⁵ − x³ + 2x² − 2)(y⁵ − y³ + 2y² − 2) e^{−(x⁴ + y⁴)}
  ≤ ∫_{−∞}^{∞} dx ∫_{−∞}^{∞} dy (x⁴ − 2x² + 1)(y⁶ + 4y³ + 4) e^{−(x⁴ + y⁴)}.

Hint: Define an appropriate inner product and use the Schwarz inequality.

1.15. Show that for any set of n complex numbers α₁, α₂, ..., αₙ, we have

|α₁ + α₂ + ... + αₙ|² ≤ n(|α₁|² + |α₂|² + ... + |αₙ|²).

Hint: Apply the Schwarz inequality to (1, 1, ..., 1) and (α₁, α₂, ..., αₙ).
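Problems 1.10-1.12 all turn on the Gram-Schmidt process. A small NumPy sketch (the `gram_schmidt` helper is my own, not the book's) applied to the vectors of Problem 1.12 shows how dependence reveals itself: the projection residual of a dependent vector is numerically zero and the vector is dropped.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize a list of vectors, dropping any vector that is
    (numerically) linearly dependent on the earlier ones."""
    basis = []
    for v in vectors:
        w = v - sum(np.dot(e, v) * e for e in basis)
        if np.linalg.norm(w) > tol:
            basis.append(w / np.linalg.norm(w))
    return basis

# The vectors of Problem 1.12.
vs = [np.array([1., -1., 2.]), np.array([-2., 1., -1.]), np.array([-1., -1., 4.])]
basis = gram_schmidt(vs)
print(len(basis))  # 2: only two independent directions, so the set is dependent
```

Only two orthonormal vectors survive, which answers part (b): the three vectors are linearly dependent.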
1.16. Using the Schwarz inequality show that if {αᵢ} and {βᵢ}, i = 1, 2, ..., are in ℂ^∞, then Σ_{i=1}^{∞} αᵢ*βᵢ is convergent.

1.17. Show that T: ℝ² → ℝ³ given by T(x, y) = (x² + y², x + y, 2x − y) is not a linear mapping.

1.18. Verify that all the transformations of Example 1.3.4 are linear.

1.19. Let π be the permutation that takes (1, 2, 3) to (3, 1, 2). Find A_π|eᵢ⟩, i = 1, 2, 3, where {|eᵢ⟩}, i = 1, 2, 3, is the standard basis of ℝ³ (or ℂ³), and A_π is as defined in Example 1.3.4.

1.20. Show that if T ∈ L(ℂ, ℂ), then there exists α ∈ ℂ such that T|a⟩ = α|a⟩ for all |a⟩ ∈ ℂ.

1.21. Show that if {|aᵢ⟩}, i = 1, ..., n, spans V and T ∈ L(V, W) is surjective, then {T|aᵢ⟩}, i = 1, ..., n, spans W.

1.22. Give an example of a function f: ℝ² → ℝ such that f(α|a⟩) = αf(|a⟩) ∀α ∈ ℝ and |a⟩ ∈ ℝ², but f is not linear. Hint: Consider a homogeneous function of degree 1.

1.23. Show that the following transformations are linear:
(a) V is ℂ over the reals and C|z⟩ = |z*⟩. Is C linear if instead of real numbers, complex numbers are used as scalars?
(b) V is P[t] and T|x(t)⟩ = |x(t + 1)⟩ − |x(t)⟩.

1.24. Verify that the kernel of a transformation T: V → W is a subspace of V, and that T(V) is a subspace of W.

1.25. Let V and W be finite-dimensional vector spaces. Show that if T ∈ L(V, W) is surjective, then dim W ≤ dim V.

1.26. Suppose that V is finite-dimensional and T ∈ L(V, W) is not zero. Prove that there exists a subspace U of V such that ker T ∩ U = {0} and T(V) = T(U).

1.27. Show that W⁰ is a subspace of V* and dim V = dim W + dim W⁰.

1.28. Show that T and T* have the same rank. In particular, show that if T is injective, then T* is surjective. Hint: Use the dimension theorem for T and T* and Equation (1.11).
1.29. Show that (a) the product on ℝ² defined by

(x₁, x₂)(y₁, y₂) = (x₁y₁ − x₂y₂, x₁y₂ + x₂y₁)

turns ℝ² into an associative and commutative algebra, and (b) the cross product on ℝ³ turns it into a nonassociative, noncommutative algebra.

1.30. Fix a vector a ∈ ℝ³ and define the linear transformation D_a: ℝ³ → ℝ³ by D_a(b) = a × b. Show that D_a is a derivation of ℝ³ with the cross product as multiplication.

1.31. Show that the linear transformation of Example 1.4.6 is an isomorphism of the two algebras A and B.

1.32. Write down all the structure constants for the algebra of quaternions. Show that this algebra is associative.

1.33. Show that a quaternion is real iff it commutes with every quaternion and that it is pure iff its square is a nonpositive real number.

1.34. Let p and q be two quaternions. Show that
(a) (pq)* = q*p*.
(b) q ∈ ℝ iff q* = q, and q ∈ ℝ³ iff q* = −q.
(c) qq* = q*q is a nonnegative real number.

1.35. Show that no proper left (right) ideal of an algebra with identity can contain an element that has a left (right) inverse.

1.36. Let A be an algebra, and x ∈ A. Show that Ax is a left ideal, xA is a right ideal, and AxA is a two-sided ideal.

Additional Reading
1. Axler, S. Linear Algebra Done Right, Springer-Verlag, 1996. A small text packed with information. Lots of marginal notes and historical remarks.
2. Greub, W. Linear Algebra, 4th ed., Springer-Verlag, 1975. Chapter V has a good discussion of algebras and their properties.
3. Halmos, P. Finite Dimensional Vector Spaces, 2nd ed., Van Nostrand, 1958. Another great book by the master expositor.
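Problems 1.32-1.34 lend themselves to a quick numerical spot check. The sketch below hard-codes the Hamilton product that the structure constants of Example 1.4.9 generate (components ordered as (x, y, z, w) for q = x + iy + jz + kw):

```python
# Quaternion product from the structure constants of Example 1.4.9.
def qmul(p, q):
    x1, y1, z1, w1 = p   # p = x1 + i*y1 + j*z1 + k*w1
    x2, y2, z2, w2 = q
    return (x1*x2 - y1*y2 - z1*z2 - w1*w2,
            x1*y2 + y1*x2 + z1*w2 - w1*z2,
            x1*z2 - y1*w2 + z1*x2 + w1*y2,
            x1*w2 + y1*z2 - z1*y2 + w1*x2)

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

print(qmul(i, i))   # (-1, 0, 0, 0): i^2 = -1
print(qmul(i, j))   # (0, 0, 0, 1):  ij = k
print(qmul(j, i))   # (0, 0, 0, -1): ji = -k, so H is noncommutative
# Associativity spot check for Problem 1.32: (ij)k = i(jk)
print(qmul(qmul(i, j), k) == qmul(i, qmul(j, k)))  # True
```

Running the same comparison over all triples of basis elements would complete the associativity check, since products of basis vectors determine the whole algebra.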
2  Operator Algebra

Recall that a vector space in which one can multiply two vectors to obtain a third vector is called an algebra. In this chapter, we want to investigate the algebra of linear transformations. We have already established that the set of linear transformations L(V, W) from V to W is a vector space. Let us attempt to define a multiplication as well. The best candidate is the composition of linear transformations. If T: V → U and S: U → W are linear operators, then the composition S ∘ T: V → W is also a linear operator, as can easily be verified.

This product, however, is not defined on a single vector space, but is such that it takes an element in L(V, U) and another element in a second vector space L(U, W) to give an element in yet another vector space L(V, W). An algebra requires a single vector space. We can accomplish this by letting V = U = W. Then the three spaces of linear transformations collapse to the single space L(V, V), the set of endomorphisms of V, which we abbreviate as L(V) and to which T, S, ST ≡ S ∘ T, and TS ≡ T ∘ S belong. The space L(V) is the algebra of the linear operators on V.

2.1 Algebra of L(V)

Operator algebra encompasses various operations on, and relations among, operators. One of these relations is the equality of operators, which is intuitively obvious; nevertheless, we make it explicit in (see also Box 1.3.3) the following definition.

operator equality

2.1.1. Definition. Two linear operators T, U ∈ L(V) are equal if T|a⟩ = U|a⟩ for all |a⟩ ∈ V.

Because of the linearity of T and U, we have
50 2. OPERATOR ALGEBRA

2.1.2. Box. Two endomorphisms T, U ∈ L(V) are equal if T|aᵢ⟩ = U|aᵢ⟩ for all |aᵢ⟩ ∈ B, where B is a basis of V. Therefore, an endomorphism is uniquely determined by its action on the vectors of a basis.

The equality of operators can also be established by other, more convenient, methods when an inner product is defined on the vector space. The following two theorems contain the essence of these alternatives.

2.1.3. Theorem. An endomorphism T of an inner product space is 0 if and only if¹ ⟨b|T|a⟩ ≡ ⟨b|Ta⟩ = 0 for all |a⟩ and |b⟩.

Proof. Clearly, if T = 0 then ⟨b|T|a⟩ = 0. Conversely, if ⟨b|T|a⟩ = 0 for all |a⟩ and |b⟩, then, choosing |b⟩ = T|a⟩ ≡ |Ta⟩, we obtain

⟨Ta|Ta⟩ = 0  ∀|a⟩  ⟹  T|a⟩ = 0  ∀|a⟩  ⟹  T = 0

by positive definiteness of the inner product. □

2.1.4. Theorem. A linear operator T on an inner product space is 0 if and only if ⟨a|T|a⟩ = 0 for all |a⟩.

polarization identity

Proof. Obviously, if T = 0, then ⟨a|T|a⟩ = 0. Conversely, choose a vector α|a⟩ + β|b⟩, sandwich T between this vector and its dual, and rearrange terms to obtain what is known as the polarization identity:

α*β⟨a|T|b⟩ + αβ*⟨b|T|a⟩ = ⟨αa + βb|T|αa + βb⟩ − |α|²⟨a|T|a⟩ − |β|²⟨b|T|b⟩.

According to the assumption of the theorem, the RHS is zero. Thus, if we let α = β = 1 we obtain ⟨a|T|b⟩ + ⟨b|T|a⟩ = 0. Similarly, with α = 1 and β = i we get i⟨a|T|b⟩ − i⟨b|T|a⟩ = 0. These two equations give ⟨a|T|b⟩ = 0 for all |a⟩, |b⟩. By Theorem 2.1.3, T = 0. □

To show that two operators U and T are equal, one can either have them act on an arbitrary vector and show that they give the same result, or one verifies that U − T is the zero operator by means of one of the theorems above. Equivalently, one shows that ⟨a|T|b⟩ = ⟨a|U|b⟩ or ⟨a|T|a⟩ = ⟨a|U|a⟩ for all |a⟩, |b⟩.

In addition to the zero element, which is present in all algebras, L(V) has an identity element, 1, which satisfies the relation 1|a⟩ = |a⟩ for all |a⟩ ∈ V. 
With 1 in our possession, we can ask whether it is possible to find an operator T⁻¹ with the property that T⁻¹T = TT⁻¹ = 1. Generally speaking, only bijective mappings have inverses. Therefore, only automorphisms of a vector space are invertible.

¹It is convenient here to use the notation |Ta⟩ for T|a⟩. This would then allow us to write the dual of the vector as ⟨Ta|, emphasizing that it is indeed the bra associated with T|a⟩.
2.1.5. Example. Let the linear operator T: ℝ³ → ℝ³ be defined by

T(x₁, x₂, x₃) = (x₁ + x₂, x₂ + x₃, x₁ + x₃).

We want to see whether T is invertible and, if so, find its inverse. T has an inverse if and only if it is bijective. By the comments after Theorem 1.3.8 this is the case if and only if T is either surjective or injective. The latter is equivalent to ker T = {0}. But ker T is the set of all vectors satisfying T(x₁, x₂, x₃) = (0, 0, 0), or

x₁ + x₂ = 0,    x₂ + x₃ = 0,    x₁ + x₃ = 0.

The reader may check that the unique solution to these equations is x₁ = x₂ = x₃ = 0. Thus, the only vector belonging to ker T is the zero vector. Therefore, T has an inverse. To find T⁻¹, apply T⁻¹T = 1 to (x₁, x₂, x₃):

(x₁, x₂, x₃) = T⁻¹T(x₁, x₂, x₃) = T⁻¹(x₁ + x₂, x₂ + x₃, x₁ + x₃).

This equation demonstrates how T⁻¹ acts on vectors. To make this more apparent, we let x₁ + x₂ = x, x₂ + x₃ = y, x₁ + x₃ = z, solve for x₁, x₂, and x₃ in terms of x, y, and z, and substitute in the preceding equation to obtain

T⁻¹(x, y, z) = ½(x − y + z, x + y − z, −x + y + z).

Rewriting this equation in terms of x₁, x₂, and x₃ gives

T⁻¹(x₁, x₂, x₃) = ½(x₁ − x₂ + x₃, x₁ + x₂ − x₃, −x₁ + x₂ + x₃).

We can easily verify that T⁻¹T = 1 and that TT⁻¹ = 1.

The following theorem, whose proof is left as an exercise, describes some properties of the inverse operator.

2.1.6. Theorem. The inverse of a linear operator is unique. If T and S are two invertible linear operators, then TS is also invertible, and

(TS)⁻¹ = S⁻¹T⁻¹.

An endomorphism T: V → V is invertible if and only if it carries a basis of V onto another basis of V.

2.1.1 Polynomials of Operators

With products and sums of operators defined, we can construct polynomials of operators. We define powers of T inductively as T^m = TT^{m−1} = T^{m−1}T for all positive integers m ≥ 1. The consistency of this equation (for m = 1) demands that T⁰ = 1. It follows that polynomials such as

p(T) = α₀1 + α₁T + α₂T² + ... + αₙTⁿ

can be defined.
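Example 2.1.5 can be confirmed numerically by writing T as its matrix in the standard basis and comparing with the inverse found above (a plain NumPy check):

```python
import numpy as np

# Matrix of T in the standard basis: T(x1, x2, x3) = (x1+x2, x2+x3, x1+x3).
T = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])

# The inverse found in Example 2.1.5, including the overall factor 1/2.
Tinv = 0.5 * np.array([[ 1., -1.,  1.],
                       [ 1.,  1., -1.],
                       [-1.,  1.,  1.]])

print(np.allclose(Tinv @ T, np.eye(3)))       # True: T^{-1} T = 1
print(np.allclose(T @ Tinv, np.eye(3)))       # True: T T^{-1} = 1
print(np.allclose(np.linalg.inv(T), Tinv))    # True: matches numpy's inverse
```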
2.1.7. Example. Let T_θ: ℝ² → ℝ² be the linear operator that rotates vectors in the xy-plane through the angle θ, that is,

T_θ(x, y) = (x cos θ − y sin θ, x sin θ + y cos θ).

We are interested in powers of T_θ. With x' = x cos θ − y sin θ and y' = x sin θ + y cos θ,

T_θ²(x, y) = T_θ(x cos θ − y sin θ, x sin θ + y cos θ)
  = (x' cos θ − y' sin θ, x' sin θ + y' cos θ)
  = ((x cos θ − y sin θ) cos θ − (x sin θ + y cos θ) sin θ,
     (x cos θ − y sin θ) sin θ + (x sin θ + y cos θ) cos θ)
  = (x cos 2θ − y sin 2θ, x sin 2θ + y cos 2θ).

Thus, T_θ² rotates (x, y) by 2θ. Similarly, one can show that

T_θ³(x, y) = (x cos 3θ − y sin 3θ, x sin 3θ + y cos 3θ),

and in general,

T_θⁿ(x, y) = (x cos nθ − y sin nθ, x sin nθ + y cos nθ),

which shows that T_θⁿ is a rotation of (x, y) through the angle nθ, that is, T_θⁿ = T_{nθ}. This result could have been guessed because T_θⁿ is equivalent to rotating (x, y) n times, each time by an angle θ.

Negative powers of an invertible linear operator T are defined by T^{−m} = (T⁻¹)^m. The exponents of T satisfy the usual rules. In particular, for any two integers m and n (positive or negative), T^m T^n = T^{m+n} and (T^m)^n = T^{mn}. The first relation implies that the inverse of T^m is T^{−m}. One can further generalize the exponent to include fractions and ultimately all real numbers; but we need to wait until Chapter 4, in which we discuss the spectral decomposition theorem.

2.1.8. Example. Let us evaluate T_θ^{−n} for the operator of the previous example. First, let us find T_θ⁻¹ (see Figure 2.1). We are looking for an operator such that T_θ⁻¹T_θ(x, y) = (x, y), or

T_θ⁻¹(x cos θ − y sin θ, x sin θ + y cos θ) = (x, y).    (2.1)

We define x' = x cos θ − y sin θ and y' = x sin θ + y cos θ and solve x and y in terms of x' and y' to obtain x = x' cos θ + y' sin θ and y = −x' sin θ + y' cos θ. Substituting for x and y in Equation (2.1) yields

T_θ⁻¹(x', y') = (x' cos θ + y' sin θ, −x' sin θ + y' cos θ). 
Comparing this with the action of T_θ in the previous example, we discover that the only difference between the two operators is the sign of the sine term. We conclude that T_θ⁻¹ has the same effect as T_{−θ}. So we have T_θ⁻¹ = T_{−θ} and

T_θ^{−n} = (T_θ⁻¹)ⁿ = (T_{−θ})ⁿ = T_{−nθ}.
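Both conclusions of Examples 2.1.7 and 2.1.8 — T_θⁿ = T_{nθ} and T_θ⁻¹ = T_{−θ} — are one-liners to verify on the matrix of T_θ (NumPy sketch; θ and n are arbitrary sample values):

```python
import numpy as np

def rot(theta):
    """Matrix of T_theta in the standard basis of R^2."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta, n = 0.3, 7

# T_theta^n = T_{n theta}
print(np.allclose(np.linalg.matrix_power(rot(theta), n), rot(n * theta)))  # True
# T_theta^{-1} = T_{-theta}
print(np.allclose(np.linalg.inv(rot(theta)), rot(-theta)))                 # True
```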
Figure 2.1 The operator T_θ and its inverse as they act on a point in the plane.

It is instructive to verify that T_θ^{−n} T_θⁿ = 1. With x' = x cos nθ − y sin nθ and y' = x sin nθ + y cos nθ,

T_θ^{−n} T_θⁿ(x, y) = T_θ^{−n}(x cos nθ − y sin nθ, x sin nθ + y cos nθ)
  = (x' cos nθ + y' sin nθ, −x' sin nθ + y' cos nθ)
  = ((x cos nθ − y sin nθ) cos nθ + (x sin nθ + y cos nθ) sin nθ,
     −(x cos nθ − y sin nθ) sin nθ + (x sin nθ + y cos nθ) cos nθ)
  = (x(cos² nθ + sin² nθ), y(sin² nθ + cos² nθ)) = (x, y).

Similarly, we can show that T_θⁿ T_θ^{−n}(x, y) = (x, y).

One has to keep in mind that p(T) is not, in general, invertible, even if T is. In fact, the sum of two invertible operators is not necessarily invertible. For example, although T and −T are invertible, their sum, the zero operator, is not.

2.1.2 Functions of Operators

We can go one step beyond polynomials of operators and, via Taylor expansion, define functions of them. Consider an ordinary function f(x), which has the Taylor expansion

f(x) = Σ_{k=0}^{∞} [(x − x₀)^k / k!] (d^k f/dx^k)|_{x=x₀},

in which x₀ is a point where f(x) and all its derivatives are defined. To this function, there corresponds a function of the operator T, defined as

f(T) = Σ_{k=0}^{∞} (d^k f/dx^k)|_{x=x₀} (T − x₀1)^k / k!    (2.2)
Because this series is an infinite sum of operators, difficulties may arise concerning its convergence. However, as will be shown in Chapter 4, f(T) is always defined for finite-dimensional vector spaces. In fact, it is always a polynomial in T. For the time being, we shall think of f(T) as a formal infinite series. A simplification results when the function can be expanded about x = 0. In this case we obtain

f(T) = Σ_{k=0}^{∞} (d^k f/dx^k)|_{x=0} T^k / k!    (2.3)

A widely used function is the exponential, whose expansion is easily found to be

e^T ≡ exp(T) = Σ_{k=0}^{∞} T^k / k!    (2.4)

2.1.9. Example. Let us evaluate exp(αT) when T: ℝ² → ℝ² is given by

T(x, y) = (−y, x).

We can find a general formula for the action of Tⁿ on (x, y). Start with n = 2:

T²(x, y) = T(−y, x) = (−x, −y) = −(x, y) = −1(x, y).

Thus, T² = −1. From T and T² we can easily obtain higher powers of T. For example, T³ = T(T²) = −T, T⁴ = T²T² = 1, and in general,

T^{2n} = (−1)ⁿ 1,    T^{2n+1} = (−1)ⁿ T,    for n = 0, 1, 2, ....

Thus,

exp(αT) = Σ_{n odd} (αT)ⁿ/n! + Σ_{n even} (αT)ⁿ/n!
  = Σ_{k=0}^{∞} (αT)^{2k+1}/(2k+1)! + Σ_{k=0}^{∞} (αT)^{2k}/(2k)!
  = [Σ_{k=0}^{∞} (−1)^k α^{2k+1}/(2k+1)!] T + [Σ_{k=0}^{∞} (−1)^k α^{2k}/(2k)!] 1.

The two series are recognized as sin α and cos α, respectively. Therefore, we get

e^{αT} = T sin α + 1 cos α,

which shows that e^{αT} is a polynomial (of first degree) in T. The action of e^{αT} on (x, y) is given by

e^{αT}(x, y) = (sin α T + cos α 1)(x, y) = sin α T(x, y) + cos α 1(x, y)
  = (sin α)(−y, x) + (cos α)(x, y)
  = (x cos α − y sin α, x sin α + y cos α).
The reader will recognize the final expression as a rotation in the xy-plane through an angle α. Thus, we can think of e^{αT} as a rotation operator of angle α about the z-axis. In this context T is called the generator of the rotation.

2.1.3 Commutators

The result of multiplication of two operators depends on the order in which the operators appear. This means that if T, U ∈ L(V), then TU ∈ L(V) and UT ∈ L(V); however, in general UT ≠ TU. When this is the case, we say that U and T do not commute. The extent to which two operators fail to commute is given in the following definition.

commutator defined

2.1.10. Definition. The commutator [U, T] of the two operators U and T in L(V) is another operator in L(V), defined as

[U, T] ≡ UT − TU.

An immediate consequence of this definition is the following:

2.1.11. Proposition. For S, T, U ∈ L(V) and α, β ∈ ℂ (or ℝ), we have

[U, T] = −[T, U],                                 (antisymmetry)
[αU, βT] = αβ[U, T],                              (linearity)
[S, T + U] = [S, T] + [S, U],                     (linearity in the right entry)
[S + T, U] = [S, U] + [T, U],                     (linearity in the left entry)
[ST, U] = S[T, U] + [S, U]T,                      (right derivation property)
[S, TU] = [S, T]U + T[S, U],                      (left derivation property)
[[S, T], U] + [[U, S], T] + [[T, U], S] = 0.      (Jacobi identity)

Proof. In almost all cases the proof follows immediately from the definition. The only minor exceptions are the derivation properties. We prove the left derivation property:

[S, TU] = S(TU) − (TU)S = STU − TUS + TSU − TSU
        = (ST − TS)U + T(SU − US) = [S, T]U + T[S, U],

where the two TSU terms, which add to zero, have been inserted in the first line. The right derivation property is proved in exactly the same way. □

A useful consequence of the definition and Proposition 2.1.11 is

[T^m, T^n] = 0    for m, n = 0, ±1, ±2, ....

In particular, [A, 1] = 0 and [A, A⁻¹] = 0.
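Every identity in Proposition 2.1.11 can be sanity-checked on random matrices, since n × n matrices form a concrete L(V). A small NumPy sketch for the left derivation property and the Jacobi identity (random 3 × 3 matrices, seed chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
S, T, U = (rng.standard_normal((3, 3)) for _ in range(3))

def comm(X, Y):
    """Commutator [X, Y] = XY - YX."""
    return X @ Y - Y @ X

# Left derivation property: [S, TU] = [S, T]U + T[S, U]
print(np.allclose(comm(S, T @ U), comm(S, T) @ U + T @ comm(S, U)))  # True

# Jacobi identity: [[S,T],U] + [[U,S],T] + [[T,U],S] = 0
jac = comm(comm(S, T), U) + comm(comm(U, S), T) + comm(comm(T, U), S)
print(np.allclose(jac, 0))  # True
```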
2.2 Derivatives of Functions of Operators

a time-dependent operator does not commute with itself at different times

derivative of an operator

Up to this point we have been discussing the algebraic properties of operators, static objects that obey certain algebraic rules and fulfill the static needs of some applications. However, physical quantities are dynamic, and if we want operators to represent physical quantities, we must allow them to change with time. This dynamism is best illustrated in quantum mechanics, where physical observables are represented by operators.

Let us consider a mapping H: ℝ → L(V), which² takes in a real number and gives out a linear operator on the vector space V. We denote the image of t ∈ ℝ by H(t), which acts on the underlying vector space V. The physical meaning of this is that as t (usually time) varies, its image H(t) also varies. Therefore, for different values of t, we have different operators. In particular, [H(t), H(t')] ≠ 0 for t ≠ t'. A concrete example is an operator that is a linear combination of the operators D and T introduced in Example 1.3.4, with time-dependent scalars. To be specific, let H(t) = D cos ωt + T sin ωt, where ω is a constant. As time passes, H(t) changes its identity from D to T and back to D. Most of the time it has a hybrid identity! Since D and T do not commute, values of H(t) for different times do not necessarily commute.

Of particular interest are operators that can be written as exp H(t), where H(t) is a "simple" operator; i.e., the dependence of H(t) on t is simpler than the corresponding dependence of exp H(t). We have already encountered such a situation in Example 2.1.9, where it was shown that the operation of rotation around the z-axis could be written as exp αT, and the action of T on (x, y) was a great deal simpler than the corresponding action of exp αT. Such a state of affairs is very common in physics. 
In fact, it can be shown that many operators of physical interest can be written as a product of simpler operators, each being of the form exp αT. For example, we know from Euler's theorem in mechanics that an arbitrary rotation in three dimensions can be written as a product of three simpler rotations, each being a rotation through a so-called Euler angle about an axis.

2.2.1. Definition. For the mapping H: ℝ → L(V), we define the derivative as

dH/dt = lim_{Δt→0} [H(t + Δt) − H(t)] / Δt.

This derivative also belongs to L(V).

As long as we keep track of the order, practically all the rules of differentiation apply to operators. For example,

d/dt (UT) = (dU/dt)T + U(dT/dt).

²Strictly speaking, the domain of H must be an interval [a, b] of the real line, because H may not be defined for all of ℝ. However, for our purposes, such a fine distinction is not necessary.
We are not allowed to change the order of multiplication on the RHS, not even when both operators being multiplied are the same on the LHS. For instance, if we let U = T = H in the preceding equation, we obtain

d/dt (H²) = (dH/dt)H + H(dH/dt).

This is not, in general, equal to 2H(dH/dt).

2.2.2. Example. Let us find the derivative of exp(tH), where H is independent of t. Using Definition 2.2.1, we have

d/dt exp(tH) = lim_{Δt→0} {exp[(t + Δt)H] − exp(tH)} / Δt.

However, for infinitesimal Δt we have

exp[(t + Δt)H] − exp(tH) = e^{tH}e^{ΔtH} − e^{tH} = e^{tH}(1 + HΔt) − e^{tH} = e^{tH}HΔt.

Therefore,

d/dt exp(tH) = lim_{Δt→0} e^{tH}HΔt / Δt = e^{tH}H.

Since H and e^{tH} commute,³ we also have

d/dt exp(tH) = He^{tH}.

Note that in deriving the equation for the derivative of e^{tH}, we have used the relation e^{tH}e^{ΔtH} = e^{(t+Δt)H}. This may seem trivial, but it will be shown later that in general, e^S e^T ≠ e^{S+T}.

Now let us evaluate the derivative of a more general time-dependent operator, exp[H(t)]:

d/dt exp[H(t)] = lim_{Δt→0} {exp[H(t + Δt)] − exp[H(t)]} / Δt.

If H(t) possesses a derivative, we have, to the first order in Δt,

H(t + Δt) = H(t) + Δt (dH/dt),

and we can write exp[H(t + Δt)] = exp[H(t) + Δt dH/dt]. It is very tempting to factor out the exp[H(t)] and expand the remaining part. However, as we will see presently, this is not possible in general. As preparation, consider the following example, which concerns the integration of an operator.
evolution operator

2.2.3. Example. The Schrödinger equation

i ∂/∂t |ψ(t)⟩ = H|ψ(t)⟩

can be turned into an operator differential equation as follows. Define the so-called evolution operator U(t) by |ψ(t)⟩ = U(t)|ψ(0)⟩, and substitute in the Schrödinger equation to obtain

i ∂/∂t U(t)|ψ(0)⟩ = HU(t)|ψ(0)⟩.

Ignoring the arbitrary vector |ψ(0)⟩ results in a differential equation in U(t). For the purposes of this example, let us consider an operator differential equation of the form dU/dt = HU(t), where H is not dependent on t. We can find a solution to such an equation by repeated differentiation followed by Taylor series expansion. Thus,

d²U/dt² = H dU/dt = H[HU(t)] = H²U(t),
d³U/dt³ = d/dt [H²U(t)] = H² dU/dt = H³U(t).

In general, dⁿU/dtⁿ = HⁿU(t). Assuming that U(t) is well-defined at t = 0, the above relations say that all derivatives of U(t) are also well-defined at t = 0. Therefore, we can expand U(t) around t = 0 to obtain

U(t) = Σ_{n=0}^{∞} (tⁿ/n!) (dⁿU/dtⁿ)|_{t=0} = Σ_{n=0}^{∞} (tⁿ/n!) HⁿU(0)
     = [Σ_{n=0}^{∞} (tH)ⁿ/n!] U(0) = e^{tH}U(0).

Let us see under what conditions we have exp(S + T) = exp(S) exp(T). We consider only the case where the commutator of the two operators commutes with both of them: [T, [S, T]] = 0 = [S, [S, T]]. Now consider the operator U(t) = e^{tS}e^{tT}e^{−t(S+T)} and differentiate it using the result of Example 2.2.2 and the product rule for differentiation:

dU/dt = Se^{tS}e^{tT}e^{−t(S+T)} + e^{tS}Te^{tT}e^{−t(S+T)} − e^{tS}e^{tT}(S + T)e^{−t(S+T)}
      = Se^{tS}e^{tT}e^{−t(S+T)} − e^{tS}e^{tT}Se^{−t(S+T)}.    (2.5)

The three factors of U(t) are present in all terms; however, they are not always next to one another. We can switch the operators if we introduce a commutator. For instance, e^{tT}S = Se^{tT} + [e^{tT}, S]. It is left as a problem for the reader to show that if [S, T] commutes with S and T, then [e^{tT}, S] = −t[S, T]e^{tT}, and therefore, e^{tT}S = Se^{tT} − t[S, T]e^{tT}. Substituting this in Equation (2.5) and noting that e^{tS}S = Se^{tS} yields

dU/dt = t[S, T]U(t). 
The solution to this equation is

U(t) = exp((t²/2)[S, T])  ⟹  e^{tS}e^{tT}e^{−t(S+T)} = exp((t²/2)[S, T])

³This is a consequence of a more general result that if two operators commute, any pair of functions of those operators also commute (see Problem 2.14).
because U(0) = 1. We thus have the following:

Baker-Campbell-Hausdorff formula

2.2.4. Proposition. Let S, T ∈ L(V). If [S, [S, T]] = 0 = [T, [S, T]], then the Baker-Campbell-Hausdorff formula holds:

e^{tS}e^{tT} = e^{t(S+T)} e^{(t²/2)[S,T]}.    (2.6)

In particular, e^{tS}e^{tT} = e^{t(S+T)} if and only if [S, T] = 0. If t = 1, Equation (2.6) reduces to

e^S e^T e^{−(1/2)[S,T]} = e^{S+T}.    (2.7)

Now assume that both H(t) and its derivative commute with [H, dH/dt]. Letting S = H(t) and T = Δt dH/dt in (2.7), we obtain

e^{H(t+Δt)} = e^{H(t)+Δt dH/dt} = e^{H(t)} e^{Δt(dH/dt)} e^{−[H(t), Δt dH/dt]/2}.

For infinitesimal Δt, this yields

e^{H(t+Δt)} = e^{H(t)} (1 + Δt dH/dt)(1 − ½Δt[H(t), dH/dt])
            = e^{H(t)} {1 + Δt dH/dt − ½Δt[H(t), dH/dt]},

and we have

d/dt e^{H(t)} = e^H (dH/dt) − ½ e^H [H, dH/dt].

We can also write

e^{H(t+Δt)} = e^{H(t)+Δt dH/dt} = e^{Δt dH/dt + H(t)} = e^{Δt dH/dt} e^{H(t)} e^{−[Δt dH/dt, H(t)]/2},

which yields

d/dt e^{H(t)} = (dH/dt) e^H + ½ e^H [H, dH/dt].

Adding the above two expressions and dividing by 2 yields the following symmetric expression for the derivative:

d/dt e^{H(t)} = ½ [(dH/dt) e^H + e^H (dH/dt)] ≡ ½ {dH/dt, e^H},

anticommutator

where {S, T} ≡ ST + TS is called the anticommutator of the operators S and T. We, therefore, have the following proposition.
2.2.5. Proposition. Let H : ℝ → L(V) and assume that H and its derivative commute with [H, dH/dt]. Then

(d/dt) e^{H(t)} = (1/2){dH/dt, e^{H}}.

In particular, if [H, dH/dt] = 0, then

(d/dt) e^{H(t)} = (dH/dt) e^{H} = e^{H} (dH/dt).

A frequently encountered operator is F(t) = e^{tA} B e^{−tA}, where A and B are t-independent. It is straightforward to show that

dF/dt = [A, F(t)]  and  (d/dt)[A, F(t)] = [A, dF/dt].

Using these results, we can write

d²F/dt² = (d/dt)[A, F(t)] = [A, [A, F(t)]] ≡ A²[F(t)],

and in general dⁿF/dtⁿ = Aⁿ[F(t)], where Aⁿ[F(t)] is defined inductively as Aⁿ[F(t)] = [A, Aⁿ⁻¹[F(t)]], with A⁰[F(t)] ≡ F(t). For example,

A³[F(t)] = [A, A²[F(t)]] = [A, [A, A[F(t)]]] = [A, [A, [A, F(t)]]].

Evaluating F(t) and all its derivatives at t = 0 and substituting in the Taylor expansion about t = 0, we get

F(t) = Σ_{n=0}^{∞} (tⁿ/n!) (dⁿF/dtⁿ)|_{t=0} = Σ_{n=0}^{∞} (tⁿ/n!) Aⁿ[F(0)] = Σ_{n=0}^{∞} (tⁿ/n!) Aⁿ[B].

That is,

e^{tA} B e^{−tA} = Σ_{n=0}^{∞} (tⁿ/n!) Aⁿ[B] ≡ B + t[A, B] + (t²/2!)[A, [A, B]] + ⋯.

Sometimes this is written symbolically as e^{tA}[B], where the RHS is merely an abbreviation of the infinite sum in the middle. For t = 1 we obtain a widely used formula:

e^{A} B e^{−A} = e^{A}[B] = Σ_{n=0}^{∞} (1/n!) Aⁿ[B] ≡ B + [A, B] + (1/2!)[A, [A, B]] + ⋯.
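As an aside, this series lends itself to a quick numerical experiment. The sketch below (plain Python; all helper names are ours, not the text's) compares a truncated sum Σ (tⁿ/n!) Aⁿ[B] of nested commutators against the directly computed product e^{tA} B e^{−tA} for a pair of 2×2 matrices:

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(A, terms=40):
    # Taylor series exp(A) = sum_k A^k / k!; fine for these small matrices
    n = len(A)
    R = [[float(i == j) for j in range(n)] for i in range(n)]
    P = [r[:] for r in R]
    fact = 1.0
    for k in range(1, terms):
        P = matmul(P, A)
        fact *= k
        R = [[R[i][j] + P[i][j] / fact for j in range(n)] for i in range(n)]
    return R

def comm(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(len(A))] for i in range(len(A))]

A = [[0.0, 1.0], [-1.0, 0.0]]
B = [[1.0, 0.0], [0.0, -1.0]]
t = 0.3

# Direct computation of e^{tA} B e^{-tA}
direct = matmul(matmul(expm([[t * x for x in r] for r in A]), B),
                expm([[-t * x for x in r] for r in A]))

# Truncated series B + t[A,B] + (t^2/2!)[A,[A,B]] + ...
term, fact = B, 1.0
series = [r[:] for r in B]
for n in range(1, 25):
    term = comm(A, term)            # A^n[B] = [A, A^{n-1}[B]]
    fact *= n
    series = [[series[i][j] + t**n / fact * term[i][j] for j in range(2)]
              for i in range(2)]

err = max(abs(direct[i][j] - series[i][j]) for i in range(2) for j in range(2))
print(err < 1e-10)  # True: the truncated series matches the product
```

The nested commutators here grow at most like 2ⁿ, so the factorial in the denominator makes the truncation error negligible after a couple dozen terms.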
If A commutes with [A, B], then the infinite series truncates at the second term, and we have e^{tA} B e^{−tA} = B + t[A, B]. For instance, if A and B are replaced by D and T of Example 1.3.4, we get (see Problem 2.3)

e^{tD} T e^{−tD} = T + t[D, T] = T + t1.

The RHS shows that the operator T has been translated by an amount t (more precisely, by t times the unit operator). We therefore call exp(tD) the translation operator of T by t, and we call D the generator of translation. With a little modification, T and D become, respectively, the position and momentum operators in quantum mechanics. Thus,

2.2.6. Box. Momentum is the generator of translation in quantum mechanics.

But more of this later!

2.3 Conjugation of Operators

We have discussed the notion of the dual of a vector in conjunction with inner products. We now incorporate linear operators into this notion. Let |b⟩, |c⟩ ∈ V and assume that |c⟩ = T|b⟩. We know that there are linear functionals in the dual space V* that are associated with (|b⟩)† = ⟨b| and (|c⟩)† = ⟨c|. Is there a linear operator belonging to L(V*) that somehow corresponds to T? In other words, can we find a linear operator that relates ⟨b| and ⟨c| just as T relates |b⟩ and |c⟩? The answer comes in the following definition.

2.3.1. Definition. Let T ∈ L(V) and |a⟩, |b⟩ ∈ V. The adjoint, or hermitian conjugate, of T is denoted by T† and defined by

⟨a|T|b⟩* = ⟨b|T†|a⟩.   (2.8)

The LHS of Equation (2.8) can be written as ⟨a|c⟩* or ⟨c|a⟩, in which case we can identify

⟨c| = ⟨b|T†.   (2.9)

This equation is sometimes used as the definition of the hermitian conjugate. From Equation (2.8), the reader may easily verify that 1† = 1. Thus, using the unit operator for T, (2.9) justifies Equation (1.12). Some of the properties of conjugation are listed in the following theorem, whose proof is left as an exercise.
2.3.2. Theorem. Let U, T ∈ L(V) and α ∈ ℂ. Then

1. (U + T)† = U† + T†.
2. (UT)† = T†U†.
3. (αT)† = α*T†.
4. (T†)† = T.

The last identity holds for finite-dimensional vector spaces; it does not apply to infinite-dimensional vector spaces in general.

In previous examples dealing with linear operators T : ℝⁿ → ℝⁿ, an element of ℝⁿ was denoted by a row vector, such as (x, y) for ℝ² and (x, y, z) for ℝ³. There was no confusion, because we were operating only in V. However, since elements of both V and V* are required when discussing T, T*, and T†, it is helpful to make a distinction between them. We therefore resort to the convention introduced in Example 1.2.3 by which

2.3.3. Box. Kets are represented as column vectors and bras as row vectors.

2.3.4. Example. Let us find the hermitian conjugate of the operator T : ℂ³ → ℂ³ given (as a column vector) by

T(α₁, α₂, α₃) = ( ⋯ , ⋯ , α₁ − α₂ + iα₃ ).

Introduce |a⟩ = (α₁, α₂, α₃) and |b⟩ = (β₁, β₂, β₃) with dual vectors ⟨a| = (α₁* α₂* α₃*) and ⟨b| = (β₁* β₂* β₃*), respectively. We use Equation (2.8) to find T†:
Therefore, we obtain the matrix of T†.  ■

2.4 Hermitian and Unitary Operators

The process of conjugation of linear operators looks much like conjugation of complex numbers. Equation (2.8) alludes to this fact, and Theorem 2.3.2 provides further evidence. It is therefore natural to look for operators that are counterparts of real numbers. One can define complex conjugation for operators and thereby construct real operators. However, these real operators will not be interesting because, as it turns out, they completely ignore the complex character of the vector space. The following alternative definition makes use of hermitian conjugation, and the result will have much wider application than is allowed by a mere complex conjugation.

2.4.1. Definition. A linear operator H ∈ L(V) is called hermitian, or self-adjoint, if H† = H. Similarly, A ∈ L(V) is called anti-hermitian if A† = −A.

Charles Hermite (1822–1901), one of the most eminent French mathematicians of the nineteenth century, was particularly distinguished for the clean elegance and high artistic quality of his work. As a student, he courted disaster by neglecting his routine assigned work to study the classic masters of mathematics; and though he nearly failed his examinations, he became a first-rate creative mathematician while still in his early twenties. In 1870 he was appointed to a professorship at the Sorbonne, where he trained a whole generation of well-known French mathematicians, including Picard, Borel, and Poincaré. The character of his mind is suggested by a remark of Poincaré: "Talk with M. Hermite. He never evokes a concrete image, yet you soon perceive that the most abstract entities are to him like living creatures." He disliked geometry, but was strongly attracted to number theory and analysis, and his favorite subject was elliptic functions, where these two fields touch in many remarkable ways.
Earlier in the century the Norwegian genius Abel had proved that the general equation of the fifth degree cannot be solved by functions involving only rational operations and root extractions. One of Hermite's most surprising achievements (in 1858) was to show that this equation can be solved by elliptic functions. His 1873 proof of the transcendence of e was another high point of his career.⁴ If he had been willing to dig even deeper into this vein, he could probably have disposed of π as

⁴Transcendental numbers are those that are not roots of polynomials with integer coefficients.
  • 84. 64 2. OPERATOR ALGEBRA well,butapparently hehadhadenoughof a goodthing.Ashe wrotetoa friend, "I sballrisk nothing on anattempt to provethetranscendence of thenumber n, Ifothers undertake this enterprise, no onewill be happier than I attheir success,butbelieveme, my dear friend, it willnotfail to cost themsome efforts." As it turned out,Lindemann's proof nine yearslater restedon extendingHermite's method. Several ofhis purely mathematical discoveries hadunexpected applications many years later to mathematical physics.Forexample, the Hermitianforms and matrices that he in- ventedin connectionwith certain problems of number theoryturned out to be crucial for Heisenberg's 1925formulation of quantummechanics, andHermite polynomials (seeChap- ter7) areuseful in solving Schrodinger's waveequation. The following observations strengthen the above conjecture that conjugation of complex numbers and hermitian conjugation of operators are somehow related. expectation value 2.4.2.Definition. The expectation value (T)a ofan operator T in the "state" la) is a comp/ex number defined by (T)a = (aITla).. The complex conjugate of the expectation value is5 (T)* = (al T la)* = (al rt la). In words, r', the hermitian conjugate of T, has an expectation value that is the complex conjugate of the latter's expectationvalue. Inparticular, ifT is hermitian- is equal to its hermitian conjugate-its expectation value will be real. What is the analogue of the known fact that a complex number is the sum of a real number and a pure imaginary one? The decomposition shows that any operator can be written as a sum of a hermitian operator H = ~ (T +Tt) and an anti-hermitian operator A = ~ (T - Tt). We can go even further, because any anti-hermitian operator A can be written as A = i(-iA) in which -iA is hermitian: (-iA)t = (-i)*At = i(-A) = -iA. Denoting -iA by H', we write T = H + iH', where both H and H' are hermitian. 
This is the analogue of the decomposition z = x + iy, in which both x and y are real.

Clearly, we should expect some departures from a perfect correspondence. This is due to a lack of commutativity among operators. For instance, although the product of two real numbers is real, the product of two hermitian operators is not, in general, hermitian:

(HK)† = K†H† = KH ≠ HK.

⁵When no risk of confusion exists, it is common to drop the subscript "a" and write ⟨T⟩ for the expectation value of T.
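The decomposition T = H + iH′ is easy to verify numerically. In the sketch below (plain Python; helper names are ours), the adjoint of a matrix is its conjugate transpose, and we check that H = (T + T†)/2 and H′ = −i(T − T†)/2 are both hermitian, with H + iH′ reproducing T:

```python
def dagger(M):
    """Adjoint of a matrix: the complex-conjugate transpose."""
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]

def lincomb(c1, A, c2, B):
    n = len(A)
    return [[c1 * A[i][j] + c2 * B[i][j] for j in range(n)] for i in range(n)]

def is_hermitian(M, tol=1e-12):
    D = dagger(M)
    return all(abs(M[i][j] - D[i][j]) < tol
               for i in range(len(M)) for j in range(len(M)))

T = [[1 + 2j, 3 - 1j],
     [0.5j, -2 + 1j]]           # an arbitrary (non-hermitian) operator on C^2

H = lincomb(0.5, T, 0.5, dagger(T))       # hermitian part (T + T†)/2
Hp = lincomb(-0.5j, T, 0.5j, dagger(T))   # H' = -i(T - T†)/2, also hermitian
recon = lincomb(1, H, 1j, Hp)             # H + iH' should reproduce T

ok = (is_hermitian(H) and is_hermitian(Hp)
      and all(abs(T[i][j] - recon[i][j]) < 1e-12
              for i in range(2) for j in range(2)))
print(ok)  # True
```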
We have seen the relation between expectation values and conjugation properties of operators. The following theorem completely characterizes hermitian operators in terms of their expectation values:

2.4.3. Theorem. A linear transformation H on a complex inner product space is hermitian if and only if ⟨a|H|a⟩ is real for all |a⟩.

Proof. We have already pointed out that a hermitian operator has real expectation values. Conversely, assume that ⟨a|H|a⟩ is real for all |a⟩. Then

⟨a|H|a⟩ = ⟨a|H|a⟩* = ⟨a|H†|a⟩  ⟺  ⟨a|H − H†|a⟩ = 0  ∀|a⟩.

By Theorem 2.1.4 we must have H − H† = 0.  □

2.4.4. Example. In this example, we illustrate the result of the above theorem with 2 × 2 matrices. The matrix

H = ( 0  −i ; i  0 )

is hermitian⁶ and acts on ℂ². Let us take an arbitrary vector |a⟩ = (α₁, α₂) and evaluate ⟨a|H|a⟩. We have H|a⟩ = (−iα₂, iα₁). Therefore,

⟨a|H|a⟩ = (α₁* α₂*)(−iα₂, iα₁) = −iα₁*α₂ + iα₂*α₁ = iα₂*α₁ + (iα₂*α₁)* = 2Re(iα₂*α₁),

and ⟨a|H|a⟩ is real. For the most general 2 × 2 hermitian matrix

H = ( α  β ; β*  γ ),

where α and γ are real, we have H|a⟩ = (αα₁ + βα₂, β*α₁ + γα₂) and

⟨a|H|a⟩ = (α₁* α₂*)(αα₁ + βα₂, β*α₁ + γα₂) = α₁*(αα₁ + βα₂) + α₂*(β*α₁ + γα₂)
        = α|α₁|² + γ|α₂|² + α₁*βα₂ + α₂*β*α₁ = α|α₁|² + γ|α₂|² + 2Re(α₁*βα₂).

Again ⟨a|H|a⟩ is real.  ■

2.4.5. Definition. An operator A on an inner product space is called positive (written A ≥ 0) if A is hermitian and ⟨a|A|a⟩ ≥ 0 for all |a⟩.

⁶We assume that the reader has a casual familiarity with hermitian matrices. Think of an n × n matrix as a linear operator that acts on column vectors whose elements are components of vectors defined in the standard basis of ℂⁿ or ℝⁿ. A hermitian matrix then becomes a hermitian operator.
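The computation of Example 2.4.4 can be repeated numerically. In this small sketch (plain Python; names are ours), the imaginary part of ⟨a|H|a⟩ vanishes for both matrices of the example, as the theorem demands:

```python
def expectation(H, a):
    """<a|H|a> for a matrix H and a column vector a."""
    n = len(a)
    Ha = [sum(H[i][j] * a[j] for j in range(n)) for i in range(n)]
    return sum(a[i].conjugate() * Ha[i] for i in range(n))

a = [1 + 2j, -3 + 0.5j]                 # an arbitrary vector in C^2

H1 = [[0, -1j], [1j, 0]]                # the first matrix of Example 2.4.4
alpha, gamma, beta = 2.0, -1.0, 1 + 1j  # alpha, gamma real
H2 = [[alpha, beta], [beta.conjugate(), gamma]]   # general 2x2 hermitian matrix

results = []
for H in (H1, H2):
    ev = expectation(H, a)
    results.append(abs(ev.imag) < 1e-12)
print(results)  # [True, True]: both expectation values are real
```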
2.4.6. Example. An example of a positive operator is the square of a hermitian operator.⁷ We note that for any hermitian operator H and any vector |a⟩, we have

⟨a|H²|a⟩ = ⟨a|H†H|a⟩ = ⟨Ha|Ha⟩ ≥ 0

because of the positive definiteness of the inner product.  ■

An operator T satisfying the extra condition that ⟨a|T|a⟩ = 0 implies |a⟩ = 0 is called positive definite. From the discussion of the example above, we conclude that the square of an invertible hermitian operator is positive definite.

The reader may be familiar with two- and three-dimensional rigid rotations and the fact that they preserve distances and the scalar product. Can this be generalized to complex inner product spaces? Let |a⟩, |b⟩ ∈ V, and let U be an operator on V that preserves the scalar product; that is, given |b′⟩ = U|b⟩ and |a′⟩ = U|a⟩, then ⟨a′|b′⟩ = ⟨a|b⟩. This yields

⟨a′|b′⟩ = (⟨a|U†)(U|b⟩) = ⟨a|U†U|b⟩ = ⟨a|b⟩ = ⟨a|1|b⟩.

Since this is true for arbitrary |a⟩ and |b⟩, we obtain U†U = 1. In the next chapter, when we introduce the concept of the determinant of operators, we shall see that this relation implies that U and U† are both invertible,⁸ with each one being the inverse of the other.

2.4.7. Definition. Let V be a finite-dimensional inner product space. An operator U is called a unitary operator if U† = U⁻¹. Unitary operators preserve the inner product of V.

2.4.8. Example. The linear transformation T : ℂ³ → ℂ³ given by

T(α₁, α₂, α₃) = ( ⋯ , ⋯ , {α₁ − α₂ + α₃ + i(α₁ + α₂ + α₃)}/√6 )

is unitary. In fact, let |a⟩ = (α₁, α₂, α₃) and |b⟩ = (β₁, β₂, β₃) with dual vectors ⟨a| = (α₁* α₂* α₃*) and ⟨b| = (β₁* β₂* β₃*), respectively. We use Equation (2.8) and the procedure of Example 2.3.4 to find T†. The result is

T†(α₁, α₂, α₃) = ( α₁/√2 + α₂/√6 + α₃(1 − i)/√6 , iα₁/√2 − iα₂/√6 − α₃(1 + i)/√6 , ⋯ ),

⁷This is further evidence that hermitian operators are analogues of real numbers: the square of any real number is positive.
⁸This implication holds only for finite-dimensional vector spaces.
and we can verify that TT† = 1. Similarly, we can show that T†T = 1, and therefore that T is unitary.  ■

2.5 Projection Operators

We have already considered subspaces briefly. The significance of subspaces is that physics frequently takes place not inside the whole vector space, but in one of its subspaces. For instance, although motion generally takes place in a three-dimensional space, it may restrict itself to a plane either because of constraints or due to the nature of the force responsible for the motion. An example is planetary motion, which is confined to a plane because the force of gravity is central. Furthermore, the example of projectile motion teaches us that it is very convenient to "project" the motion onto the horizontal and vertical axes and to study these projections separately. It is, therefore, appropriate to ask how we can go from a full space to one of its subspaces in the context of linear operators.

Let us first consider a simple example. A point in the plane is designated by the coordinates (x, y). A subspace of the plane is the x-axis. Is there a linear operator,⁹ say Pₓ, that acts on such a point and somehow sends it into that subspace? Of course, there are many operators from ℝ² to ℝ². However, we are looking for a specific one. We want Pₓ to project the point onto the x-axis. Such an operator has to act on (x, y) and produce (x, 0): Pₓ(x, y) = (x, 0). Note that if the point already lies on the x-axis, Pₓ does not change it. In particular, if we apply Pₓ twice, we get the same result as if we apply it only once. And this is true for any point in the plane. Therefore, our operator must have the property Pₓ² = Pₓ. We can generalize the above discussion in the following definition.¹⁰

2.5.1. Definition. A hermitian operator P ∈ L(V) is called a projection operator if P² = P.
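For the ℝ² example above, Pₓ is the matrix with rows (1, 0) and (0, 0), and both defining properties of Definition 2.5.1 can be checked directly. A tiny sketch (plain Python; names are ours):

```python
Px = [[1.0, 0.0],
      [0.0, 0.0]]               # projection onto the x-axis

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Px2 = matmul(Px, Px)
idempotent = all(abs(Px2[i][j] - Px[i][j]) < 1e-15
                 for i in range(2) for j in range(2))       # Px^2 = Px
symmetric = all(Px[i][j] == Px[j][i]
                for i in range(2) for j in range(2))        # hermitian (real case)

x, y = 3.0, -2.0
projected = [Px[0][0] * x + Px[0][1] * y,
             Px[1][0] * x + Px[1][1] * y]
print(idempotent, symmetric, projected)  # True True [3.0, 0.0]
```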
From this definition it immediately follows that the only projection operator with an inverse is the identity operator. (Show this!)

Consider two projection operators P₁ and P₂. We want to investigate conditions under which P₁ + P₂ becomes a projection operator. By definition,

P₁ + P₂ = (P₁ + P₂)² = P₁² + P₁P₂ + P₂P₁ + P₂² = P₁ + P₁P₂ + P₂P₁ + P₂.

So P₁ + P₂ is a projection operator if and only if

P₁P₂ + P₂P₁ = 0.   (2.10)

⁹We want this operator to preserve the vector-space structure of the plane and the axis.
¹⁰It is sometimes useful to relax the condition of hermiticity. However, in this part of the book, we demand that P be hermitian.
Multiply this on the left by P₁ to get

P₁²P₂ + P₁P₂P₁ = 0  ⟹  P₁P₂ + P₁P₂P₁ = 0.

Now multiply the same equation on the right by P₁ to get

P₁P₂P₁ + P₂P₁² = 0  ⟹  P₁P₂P₁ + P₂P₁ = 0.

These last two equations yield

P₁P₂ = P₂P₁.   (2.11)

The solution to Equations (2.10) and (2.11) is P₁P₂ = P₂P₁ = 0. We therefore have the following result.

2.5.2. Proposition. Let P₁, P₂ ∈ L(V) be projection operators. Then P₁ + P₂ is a projection operator if and only if P₁P₂ = P₂P₁ = 0.

Projection operators satisfying this condition are called orthogonal projection operators. More generally, if there is a set {P_i}_{i=1}^{m} of projection operators satisfying

P_i P_j = P_i if i = j,  P_i P_j = 0 if i ≠ j,

then P = Σ_{i=1}^{m} P_i is also a projection operator.

Given a normal vector |e⟩, one can show easily that P = |e⟩⟨e| is a projection operator:

• P is hermitian: P† = (|e⟩⟨e|)† = (⟨e|)†(|e⟩)† = |e⟩⟨e|.
• P equals its square: P² = (|e⟩⟨e|)(|e⟩⟨e|) = |e⟩⟨e|e⟩⟨e| = |e⟩⟨e|, since ⟨e|e⟩ = 1.

In fact, we can take an orthonormal basis B = {|e_i⟩}_{i=1}^{N} and construct a set of projection operators {P_i = |e_i⟩⟨e_i|}_{i=1}^{N}. The operators P_i are mutually orthogonal. Thus, their sum Σ_{i=1}^{N} P_i is also a projection operator.

2.5.3. Proposition. Let B = {|e_i⟩}_{i=1}^{N} be an orthonormal basis for V_N. Then the set {P_i = |e_i⟩⟨e_i|}_{i=1}^{N} consists of mutually orthogonal projection operators, and Σ_{i=1}^{N} P_i = Σ_{i=1}^{N} |e_i⟩⟨e_i| = 1. This relation is called the completeness relation.

Proof. The mutual orthogonality of the P_i is an immediate consequence of the orthonormality of the |e_i⟩. To show the second part, consider an arbitrary vector |a⟩, written in terms of the |e_j⟩: |a⟩ = Σ_{j=1}^{N} α_j |e_j⟩. Apply P_i to both sides to obtain P_i|a⟩ = α_i|e_i⟩.
Therefore, we have

1|a⟩ = Σ_{i=1}^{N} α_i|e_i⟩ = Σ_{i=1}^{N} P_i|a⟩ = (Σ_{i=1}^{N} P_i)|a⟩.

Since this holds for an arbitrary |a⟩, the two operators must be equal.  □

If we choose only the first m < N vectors instead of the entire basis, then the projection operator P⁽ᵐ⁾ ≡ Σ_{i=1}^{m} |e_i⟩⟨e_i| projects arbitrary vectors into the subspace spanned by the first m basis vectors {|e_i⟩}_{i=1}^{m}. In other words, when P⁽ᵐ⁾ acts on any vector |a⟩ ∈ V, the result will be a linear combination of only the first m vectors. The simple proof of this fact is left as an exercise. These points are illustrated in the following example.

2.5.4. Example. Consider three orthonormal vectors {|e_i⟩}_{i=1}^{3} ∈ ℝ³ given by

|e₁⟩ = (1/√2)(1, 1, 0),  |e₂⟩ = (1/√6)(1, −1, 2),  |e₃⟩ = (1/√3)(1, −1, −1),

written as column vectors. The projection operators associated with each of these can be obtained by noting that ⟨e_i| is a row vector. Therefore,

P₁ = |e₁⟩⟨e₁| = (1/2) [ 1 1 0 ; 1 1 0 ; 0 0 0 ].

Similarly,

P₂ = (1/6) [ 1 −1 2 ; −1 1 −2 ; 2 −2 4 ]  and  P₃ = (1/3) [ 1 −1 −1 ; −1 1 1 ; −1 1 1 ].

Note that P_i projects onto the line along |e_i⟩. This can be tested by letting P_i act on an arbitrary vector and showing that the resulting vector is perpendicular to the other two vectors. For example, let P₂ act on an arbitrary column vector:

|a⟩ ≡ P₂(x, y, z) = (1/6) (x − y + 2z, −x + y − 2z, 2x − 2y + 4z).

We verify that |a⟩ is perpendicular to both |e₁⟩ and |e₃⟩:

⟨e₁|a⟩ = (1/√2)(1 1 0) · (1/6)(x − y + 2z, −x + y − 2z, 2x − 2y + 4z) = 0.
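The checks in this example can also be automated. The sketch below (plain Python; helper names are ours) builds P_i = |e_i⟩⟨e_i| for the three unit vectors and verifies idempotence, mutual orthogonality, and the completeness relation:

```python
import math

e1 = [x / math.sqrt(2) for x in (1, 1, 0)]
e2 = [x / math.sqrt(6) for x in (1, -1, 2)]
e3 = [x / math.sqrt(3) for x in (1, -1, -1)]

def outer(u, v):
    return [[u[i] * v[j] for j in range(3)] for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def close(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3)) < 1e-12

P = [outer(e, e) for e in (e1, e2, e3)]   # P_i = |e_i><e_i|
identity = [[float(i == j) for j in range(3)] for i in range(3)]
zero = [[0.0] * 3 for _ in range(3)]
total = [[sum(P[k][i][j] for k in range(3)) for j in range(3)] for i in range(3)]

ok_idem = all(close(matmul(P[i], P[i]), P[i]) for i in range(3))       # P_i^2 = P_i
ok_orth = all(close(matmul(P[i], P[j]), zero)
              for i in range(3) for j in range(3) if i != j)           # P_i P_j = 0
ok_complete = close(total, identity)                                   # sum P_i = 1
print(ok_idem, ok_orth, ok_complete)  # True True True
```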
Similarly, ⟨e₃|a⟩ = 0. So indeed, |a⟩ is along |e₂⟩.

We can find the operator that projects onto the plane formed by |e₁⟩ and |e₂⟩. This is

P₁ + P₂ = (1/3) [ 2 1 1 ; 1 2 −1 ; 1 −1 2 ].

When this operator acts on an arbitrary column vector, it produces a vector lying in the plane of |e₁⟩ and |e₂⟩, or perpendicular to |e₃⟩:

|b⟩ ≡ (P₁ + P₂)(x, y, z) = (1/3) (2x + y + z, x + 2y − z, x − y + 2z).

It is easy to show that ⟨e₃|b⟩ = 0. The operators that project onto the other two planes are obtained similarly. Finally, we verify easily that P₁ + P₂ + P₃ = 1.  ■

2.6 Operators in Numerical Analysis

In numerical calculations, limiting operations involving infinities and zeros are replaced with finite values. The most natural setting for the discussion of such operations is the ideas developed in this chapter. In this section, we shall assume that all operators are invertible, and (rather sloppily) manipulate them with no mathematical justification.

2.6.1 Finite-Difference Operators

In all numerical manipulations, a function is considered as a table with two columns. The first column lists the (discrete) values of the independent variable x_i, and the second column lists the value of the function f at x_i. We often write f_i for f(x_i). Three operators that are in use in numerical analysis are the forward difference operator Δ, the backward difference operator ∇ (not to be confused with the gradient), and the central difference operator δ. These are defined as follows:

Δf_i ≡ f_{i+1} − f_i,  ∇f_i ≡ f_i − f_{i−1},  δf_i ≡ f_{i+1/2} − f_{i−1/2}.   (2.12)

The last equation has only theoretical significance, because a half-step is not used in the tabulation of functions or in computer calculations. Typically, the data are equally spaced, so x_{i+1} − x_i = h is the same for all i. Then f_{i±1} = f(x_i ± h), and we define f_{i±1/2} ≡ f(x_i ± h/2).
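On tabulated data, these difference operators are one-liners. A minimal sketch (illustrative names, plain Python), which also notes the immediate relation Δf_i = ∇f_{i+1}:

```python
h = 0.1
f = [(i * h) ** 2 for i in range(8)]    # f(x) = x^2 tabulated at x_i = i*h

def fwd(i):   # forward difference, Delta f_i = f_{i+1} - f_i
    return f[i + 1] - f[i]

def bwd(i):   # backward difference, nabla f_i = f_i - f_{i-1}
    return f[i] - f[i - 1]

i = 3
same = abs(fwd(i) - bwd(i + 1)) < 1e-15
print(same)  # True: Delta f_i = nabla f_{i+1}
```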
We can define products (composition) of the three operators. In particular, Δ² is given by

Δ²f_i = Δ(f_{i+1} − f_i) = f_{i+2} − 2f_{i+1} + f_i.   (2.13)

Similarly, ∇²f_i = ∇(f_i − f_{i−1}) = f_i − 2f_{i−1} + f_{i−2}. We note that

δ²f_i = f_{i+1} − 2f_i + f_{i−1}.   (2.14)

This can be rewritten as

δ²f_i = f_{i+1} − f_i − (f_i − f_{i−1}) = (Δ − ∇)f_i  ⟹  δ² = Δ − ∇.

This shows that the three operators are related. It is convenient to introduce the shifting and averaging operators, respectively E and μ, as

Ef(x) = f(x + h),  μf(x) = (1/2)[f(x + h/2) + f(x − h/2)].   (2.15)

Note that for any positive integer n, Eⁿf(x) = f(x + nh). We generalize this to any real number a:

Eᵃf(x) = f(x + ah).   (2.16)

All the other finite-difference operators can be written in terms of E:

Δ = E − 1,  ∇ = 1 − E⁻¹,  δ = E^{1/2} − E^{−1/2},  μ = (1/2)(E^{1/2} + E^{−1/2}).   (2.17)

The first two equations of (2.17) can be rewritten as

E = 1 + Δ,  E = (1 − ∇)⁻¹.   (2.18)

We can obtain a useful formula for the shifting operator when it acts on polynomials of degree n or less. First note that 1 − ∇ⁿ⁺¹ = (1 − ∇)(1 + ∇ + ⋯ + ∇ⁿ). But ∇ⁿ⁺¹ annihilates all polynomials of degree n or less (see Problem 2.33). Therefore, for such polynomials, we have

1 = (1 − ∇)(1 + ∇ + ⋯ + ∇ⁿ),
which shows that E = (1 − ∇)⁻¹ = 1 + ∇ + ⋯ + ∇ⁿ. Now let n → ∞ and obtain

E = (1 − ∇)⁻¹ = Σ_{k=0}^{∞} ∇ᵏ   (2.19)

for polynomials of any degree and, by Taylor expansion, for any (well-behaved) function.

2.6.1. Example. Numerical interpolation illustrates the use of the formulas derived above. Suppose that we are given a table of the values of a function, and we want the value of the function for an x located between two entries. We cannot use the table directly, but we may use the following procedure. Assume that the values of the function f are given for x₁, x₂, …, x_i, …, and we are interested in the value of the function for x such that x_i < x < x_{i+1}. This corresponds to f_{i+r}, where 0 < r < 1. We have

f_{i+r} = Eʳf_i = (1 + Δ)ʳf_i = (1 + rΔ + (r(r − 1)/2)Δ² + ⋯)f_i.   (2.20)

In practice, the infinite sum is truncated after a finite number of terms. If only two terms are kept, we have

f_{i+r} ≈ (1 + rΔ)f_i = f_i + r(f_{i+1} − f_i) = (1 − r)f_i + rf_{i+1}.   (2.21)

In particular, for r = 1/2, Equation (2.21) yields f_{i+1/2} ≈ (1/2)(f_i + f_{i+1}), which states the reasonable result that the value at the midpoint is approximately equal to the average of the values at the endpoints.

If the third term of the series in Equation (2.20) is also retained, then

f_{i+r} ≈ [1 + rΔ + (r(r − 1)/2)Δ²]f_i = f_i + rΔf_i + (r(r − 1)/2)Δ²f_i
       = f_i + r(f_{i+1} − f_i) + (r(r − 1)/2)(f_{i+2} − 2f_{i+1} + f_i)   (2.22)
       = ((r − 1)(r − 2)/2)f_i + r(2 − r)f_{i+1} + (r(r − 1)/2)f_{i+2}.

For r = 1/2, that is, at the midpoint between x_i and x_{i+1}, Equation (2.22) yields

f_{i+1/2} ≈ (3/8)f_i + (3/4)f_{i+1} − (1/8)f_{i+2},

which turns out to be a better approximation than Equation (2.21). However, it involves not only the two points on either side of x but also a relatively distant point, x_{i+2}. If we were to retain terms up to Δᵏ for k > 2, then f_{i+r} would be given in terms of f_i, f_{i+1}, …, f_{i+k}, and the result would be more accurate than (2.22). Thus, the more information we have about the behavior of a function at distant points, the better we can approximate it at x ∈ (x_i, x_{i+1}).
The foregoing analysis was based on forward interpolation. We may want to use backward interpolation, where f_{i−r} is sought for 0 < r < 1. In such a case we use the backward difference operator:

f_{i−r} = (E⁻¹)ʳf_i = (1 − ∇)ʳf_i = (1 − r∇ + (r(r − 1)/2)∇² + ⋯)f_i.  ■
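The two- and three-term forward interpolation formulas (2.21) and (2.22) translate directly into code. The following sketch (our naming) applies them at r = 1/2 to tabulated values of sin x; the next example in the text carries out the same computation by hand:

```python
import math

def interp2(fi, fi1, r):
    """Two-term forward interpolation, Eq. (2.21)."""
    return (1 - r) * fi + r * fi1

def interp3(fi, fi1, fi2, r):
    """Three-term forward interpolation, Eq. (2.22)."""
    return ((r - 1) * (r - 2) / 2 * fi
            + r * (2 - r) * fi1
            + r * (r - 1) / 2 * fi2)

fi, fi1, fi2 = math.sin(0.1), math.sin(0.2), math.sin(0.3)
a2 = interp2(fi, fi1, 0.5)          # ~0.1492514
a3 = interp3(fi, fi1, fi2, 0.5)     # ~0.1494995
exact = math.sin(0.15)              # ~0.1494381
print(abs(a3 - exact) < abs(a2 - exact))  # True: (2.22) beats (2.21)
```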
2.6.2. Example. Let us check the conclusion made in Example 2.6.1 using a calculator and a specific function, say sin x. A calculator gives sin(0.1) = 0.0998334, sin(0.2) = 0.1986693, sin(0.3) = 0.2955202. Suppose that we want to find sin(0.15) by interpolation. Using Equation (2.21) with r = 1/2, we obtain

sin(0.15) ≈ (1/2)[sin(0.1) + sin(0.2)] = 0.1492514.

On the other hand, using (2.22) with r = 1/2 yields

sin(0.15) ≈ (3/8) sin(0.1) + (3/4) sin(0.2) − (1/8) sin(0.3) = 0.1494995.

The value of sin(0.15) obtained using a calculator is 0.1494381. It is clear that (2.22) gives a better estimate than (2.21).  ■

2.6.2 Differentiation and Integration Operators

The two most important operations of mathematical physics can be written in terms of the finite-difference operators. Define the differentiation operator D and the integration operator J by

Df(x) ≡ f′(x),  Jf(x) ≡ ∫_x^{x+h} f(t) dt.   (2.23)

Assuming that D⁻¹ exists, we note that f(x) = D⁻¹[f′(x)]. This shows that D⁻¹ is the operation of antidifferentiation: D⁻¹f(x) = F(x), where F is any primitive of f. On the other hand, ΔF(x) = F(x + h) − F(x) = Jf(x). These two equations and the fact that J and D commute (reader, verify!) show that

ΔD⁻¹ = J  ⟹  JD = DJ = Δ = E − 1.   (2.24)

Using the Taylor expansion f(x + h) = Σ_{n=0}^{∞} (hⁿ/n!) f⁽ⁿ⁾(x) = e^{hD}f(x), we can write E = e^{hD}, so that

hD = ln E = ln(1 + Δ) = Δ − Δ²/2 + Δ³/3 − ⋯.   (2.26)

2.6.3. Example. Let us calculate cos(0.1), considering cos x to be (d/dx)(sin x) and using the values given in Example 2.6.2. Using Equation (2.26) to second order, we get

Df_i ≈ (1/h)(Δ − (1/2)Δ²)f_i = (1/2h)(−f_{i+2} + 4f_{i+1} − 3f_i).
This gives

cos(0.1) ≈ (1/2h)[−sin(0.3) + 4 sin(0.2) − 3 sin(0.1)] = 5[−0.2955202 + 4(0.1986693) − 3(0.0998334)] = 0.998284.

In comparison, the value obtained directly from a calculator is 0.995004.  ■

The operator J is a special case of a more general operator defined by

J_a f(x) ≡ ∫_x^{x+ah} f(t) dt = F(x + ah) − F(x) = (Eᵃ − 1)F(x) = (Eᵃ − 1)D⁻¹f(x),

or

J_a = (Eᵃ − 1)D⁻¹ = h (Eᵃ − 1)/ln E = h ((1 + Δ)ᵃ − 1)/ln(1 + Δ),   (2.27)

where we used Equation (2.26).

2.6.3 Numerical Integration

Suppose that we are interested in the numerical value of ∫_a^b f(x) dx. Let x₀ ≡ a and x_N ≡ b, and divide the interval [a, b] into N equal parts, each of length h = (b − a)/N. The method commonly used in calculating integrals is to find the integral ∫_{x_i}^{x_i+ah} f(x) dx, where a is a suitable number, and then add all such integrals. More specifically,

I = ∫_{x₀}^{x₀+ah} f(x) dx + ∫_{x₀+ah}^{x₀+2ah} f(x) dx + ⋯ + ∫_{x₀+(M−1)ah}^{x₀+Mah} f(x) dx,

where M is a suitably chosen number. In fact, since x_N = x₀ + Nh, we have Ma = N. We next employ Equation (2.27) to get

I = J_a f₀ + J_a f_a + ⋯ + J_a f_{(M−1)a} = J_a (Σ_{k=0}^{M−1} f_{ka}),   (2.28)

where f_{ka} ≡ f(x₀ + kah). We thus need an expression for J_a to evaluate the integral. Such an expression can be derived elegantly, by noting that

∫₀^a Eˢ ds = Eˢ/ln E |₀^a = (Eᵃ − 1)/ln E = J_a/h,

so that [by Equation (2.27)]

J_a = h ∫₀^a (1 + Δ)ˢ ds,   (2.29)
where we expanded (1 + Δ)ˢ using the binomial infinite series. Equations (2.28) and (2.29) give the desired evaluation of the integral.

Let us make a few remarks before developing any commonly used rules of approximation. First, once h is set, the function can be evaluated only at x₀ + nh, where n is a positive integer. This means that f_n is given only for positive integers n. Thus, in the sum in (2.28), ka must be an integer. Since k is an integer, we conclude that a must be an integer. Second, since N = Ma for some integer M, we must choose N to be a multiple of a. Third, if we are to be able to evaluate J_a f_{(M−1)a} [the last term in (2.28)], J_a cannot have powers of Δ higher than a, because Δⁿ f_{(M−1)a} contains a term of the form f(x₀ + (M − 1)ah + nh) = f(x_N + (n − a)h), which for n > a gives f at a point beyond the upper limit. Thus, in the power-series expansion of J_a, we must make sure that no power of Δ beyond a is retained.

There are several specific J_a's commonly used in numerical integration. We will consider these next.

The trapezoidal rule sets a = 1. According to the remarks above, we therefore retain terms up to the first power in the expansion of J_a. Then (2.29) gives J₁ = J = h(1 + Δ/2). Substituting this in Equation (2.28), we obtain

I = h(1 + Δ/2)(Σ_{k=0}^{N−1} f_k) = h Σ_{k=0}^{N−1} [f_k + (1/2)(f_{k+1} − f_k)]
  = (h/2) Σ_{k=0}^{N−1} (f_k + f_{k+1}) = (h/2)(f₀ + 2f₁ + ⋯ + 2f_{N−1} + f_N).   (2.30)

Simpson's one-third rule sets a = 2. Thus, we have to retain all terms up to the Δ² term. However, for a = 2, the third power of Δ disappears in Equation (2.29), and we get an extra "power" of accuracy for free! Because of this, Simpson's one-third rule is popular for numerical integrations. Equation (2.29) yields J₂ = 2h(1 + Δ + Δ²/6).
Substituting this in (2.28) yields

I = (h/3) Σ_{k=0}^{N/2−1} (6 + 6Δ + Δ²) f_{2k} = (h/3) Σ_{k=0}^{N/2−1} (f_{2k+2} + 4f_{2k+1} + f_{2k})
  = (h/3)(f₀ + 4f₁ + 2f₂ + 4f₃ + ⋯ + 4f_{N−1} + f_N).   (2.31)

It is understood, of course, that N is an even integer. The factor 1/3 gives this method its name.

For Simpson's three-eighths rule, we set a = 3, retain terms up to Δ³, and use Equation (2.29) to obtain

J₃ = 3h(1 + (3/2)Δ + (3/4)Δ² + (1/8)Δ³) = (3h/8)(8 + 12Δ + 6Δ² + Δ³).
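The composite rules (2.30) and (2.31) are short to implement. A sketch (assumed function names, plain Python), applied to ∫₀¹ eˣ dx:

```python
import math

def trapezoid(f, a, b, N):
    """Composite trapezoidal rule, Eq. (2.30)."""
    h = (b - a) / N
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, N)))

def simpson13(f, a, b, N):
    """Composite Simpson one-third rule, Eq. (2.31); N must be even."""
    h = (b - a) / N
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, N))
    return h * s / 3

exact = math.e - 1                        # the integral of e^x on [0, 1]
trap = trapezoid(math.exp, 0.0, 1.0, 4)
simp = simpson13(math.exp, 0.0, 1.0, 4)
print(abs(simp - exact) < abs(trap - exact))  # True: Simpson is more accurate
```

With only four intervals, Simpson's rule already agrees with e − 1 to about five decimal places, while the trapezoidal estimate is off in the third.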
Substituting in (2.28), we get

I = (3h/8) Σ_{k=0}^{N/3−1} (8 + 12Δ + 6Δ² + Δ³) f_{3k} = (3h/8) Σ_{k=0}^{N/3−1} (f_{3k+3} + 3f_{3k+2} + 3f_{3k+1} + f_{3k}).   (2.32)

2.6.4. Example. Let us use Simpson's one-third rule with four intervals to evaluate the familiar integral I = ∫₀¹ eˣ dx. With h = 0.25 and N = 4, Equation (2.31) yields

I ≈ (0.25/3)(1 + 4e^{0.25} + 2e^{0.5} + 4e^{0.75} + e) = 1.71832.

This is very close to the "exact" result e − 1 ≈ 1.71828.  ■

2.7 Problems

2.1. Consider a linear operator T on a finite-dimensional vector space V. Show that there exists a polynomial P such that P(T) = 0. Hint: Take a basis B = {|a_i⟩}_{i=1}^{N} and consider the vectors {Tᵏ|a₁⟩}_{k=0}^{M} for large enough M, and conclude that there exists a polynomial P₁(T) such that P₁(T)|a₁⟩ = 0. Do the same for |a₂⟩, etc. Now take the product of all such polynomials.

2.2. Use mathematical induction to show that [A, Aᵐ] = 0.

2.3. For D and T defined in Example 1.3.4:
(a) Show that [D, T] = 1.
(b) Calculate the linear transformations D³T³ and T³D³.

2.4. Consider three linear operators L₁, L₂, and L₃ satisfying the commutation relations [L₁, L₂] = L₃, [L₃, L₁] = L₂, [L₂, L₃] = L₁, and define the new operators L± = L₁ ± iL₂.
(a) Show that the operator L² ≡ L₁² + L₂² + L₃² commutes with L_k, k = 1, 2, 3.
(b) Show that the set {L₊, L₋, L₃} is closed under commutation, i.e., the commutator of any two of them can be written as a linear combination of the set. Determine these commutators.
(c) Write L² in terms of L₊, L₋, and L₃.

2.5. Prove the rest of Proposition 2.1.11.

2.6. Show that if [[A, B], A] = 0, then for every positive integer k,

[Aᵏ, B] = kAᵏ⁻¹[A, B].

Hint: First prove the relation for low values of k; then use mathematical induction.
2.7. Show that for D and T defined in Example 1.3.4, [Dᵏ, T] = kDᵏ⁻¹ and [Tᵏ, D] = −kTᵏ⁻¹.

2.8. Evaluate the derivative of H⁻¹(t) in terms of the derivative of H(t).

2.9. Show that for any α, β ∈ ℝ and any H ∈ L(V), we have e^{αH} e^{βH} = e^{(α+β)H}.

2.10. Show that (U + T)(U − T) = U² − T² if and only if [U, T] = 0.

2.11. Prove that if A and B are hermitian, then i[A, B] is also hermitian.

2.12. Find the solution to the operator differential equation

dU/dt = tHU(t).

Hint: Make the change of variable y = t² and use the result of Example 2.2.3.

2.13. Verify that

(d/dt)H³ = (dH/dt)H² + H(dH/dt)H + H²(dH/dt).

2.14. Show that if A and B commute, and f and g are arbitrary functions, then f(A) and g(B) also commute.

2.15. Assuming that [[S, T], T] = 0 = [[S, T], S], show that

[S, exp(tT)] = t[S, T] exp(tT).

Hint: Expand the exponential and use Problem 2.6.

2.16. Prove that

exp(H₁ + H₂ + H₃) = exp(H₁) exp(H₂) exp(H₃) exp{−(1/2)([H₁, H₂] + [H₁, H₃] + [H₂, H₃])}

provided that H₁, H₂, and H₃ commute with all the commutators. What is the generalization to H₁ + H₂ + ⋯ + H_n?

2.17. Denoting the derivative of A(t) by Ȧ, show that

(d/dt)[A, B] = [Ȧ, B] + [A, Ḃ].

2.18. Prove Theorem 2.3.2. Hint: Use Equation (2.8) and Theorem 2.1.3.

2.19. Let A(t) ≡ exp(tH)A₀ exp(−tH), where H and A₀ are constant operators. Show that dA/dt = [H, A(t)]. What happens when H commutes with A(t)?
2.20. Let |f⟩, |g⟩ ∈ C(a, b) with the additional property that f(a) = g(a) = f(b) = g(b) = 0. Show that for such functions, the derivative operator D is anti-hermitian. The inner product is defined as usual: ⟨f|g⟩ ≡ ∫_a^b f*(t)g(t) dt.

2.21. In this problem, you will go through the steps of proving the rigorous statement of the Heisenberg uncertainty principle. Denote the expectation (average) value of an operator A in a state |ψ⟩ by A_avg. Thus, A_avg = ⟨A⟩ = ⟨ψ|A|ψ⟩. The uncertainty (deviation from the mean) in state |ψ⟩ of the operator A is given by

ΔA = √⟨(A − A_avg 1)²⟩ = √(⟨ψ|(A − A_avg 1)²|ψ⟩).

(a) Show that for any two hermitian operators A and B, we have

|⟨ψ|AB|ψ⟩|² ≤ ⟨ψ|A²|ψ⟩⟨ψ|B²|ψ⟩.

Hint: Apply the Schwarz inequality to an appropriate pair of vectors.
(b) Using the above and the triangle inequality for complex numbers, show that

|⟨ψ|[A, B]|ψ⟩|² ≤ 4⟨ψ|A²|ψ⟩⟨ψ|B²|ψ⟩.

(c) Define the operators A′ = A − α1, B′ = B − β1, where α and β are real numbers. Show that A′ and B′ are hermitian and [A′, B′] = [A, B].
(d) Now use all the results above to show the celebrated uncertainty relation

(ΔA)(ΔB) ≥ (1/2)|⟨ψ|[A, B]|ψ⟩|.

What does this reduce to for the position operator x and the momentum operator p if [x, p] = iℏ?

2.22. Show that U = exp A is unitary if and only if A is anti-hermitian.

2.23. Find T† for each of the following linear operators.
(a) T : ℝ² → ℝ² given by T(x, y) = ( ⋯ ).
(b) T : ℝ³ → ℝ³ given by T(x, y, z) = ( ⋯ , ⋯ , −x + 2y + 3z ).
(c) T : ℝ² → ℝ² given by

T(x, y) = (x cos θ − y sin θ, x sin θ + y cos θ),
where $\theta$ is a real number. What is $T^\dagger T$?

(d) $T : \mathbb{C}^2 \to \mathbb{C}^2$ given by
$$T\begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix} = \begin{pmatrix}\alpha_1 - i\alpha_2\\ i\alpha_1 + \alpha_2\end{pmatrix}.$$

(e) $T : \mathbb{C}^3 \to \mathbb{C}^3$ given by
$$T\begin{pmatrix}\alpha_1\\ \alpha_2\\ \alpha_3\end{pmatrix} = \begin{pmatrix}\ldots\\ \ldots\\ \ldots\end{pmatrix}.$$

2.24. Show that if $P$ is a (hermitian) projection operator, so are (a) $\mathbf{1} - P$ and (b) $U^\dagger P U$ for any unitary operator $U$.

2.25. For the vector $|a\rangle \in \mathbb{C}^4$:

(a) Find the associated projection matrix, $P_a$.
(b) Verify that $P_a$ does project an arbitrary vector in $\mathbb{C}^4$ along $|a\rangle$.
(c) Verify directly that the matrix $\mathbf{1} - P_a$ is also a projection operator.

2.26. Let $|a_1\rangle \equiv \mathbf{a}_1 = (1, 1, -1)$ and $|a_2\rangle \equiv \mathbf{a}_2 = (-2, 1, -1)$.

(a) Construct (in the form of a matrix) the projection operators $P_1$ and $P_2$ that project onto the directions of $|a_1\rangle$ and $|a_2\rangle$, respectively. Verify that they are indeed projection operators.
(b) Construct (in the form of a matrix) the operator $P = P_1 + P_2$ and verify directly that it is a projection operator.
(c) Let $P$ act on an arbitrary vector $(x, y, z)$. What is the dot product of the resulting vector with the vector $\mathbf{a}_1 \times \mathbf{a}_2$? What can you say about $P$ and your conclusion in (b)?

2.27. Let $P^{(m)} = \sum_{i=1}^m |e_i\rangle\langle e_i|$ be a projection operator constructed out of the first $m$ orthonormal vectors of the basis $B = \{|e_i\rangle\}_{i=1}^N$ of $V$. Show that $P^{(m)}$ projects into the subspace spanned by the first $m$ vectors in $B$.

2.28. What is the length of the projection of the vector $(3, 4, -4)$ onto a line whose parametric equation is $x = 2t + 1$, $y = -t + 3$, $z = t - 1$? Hint: Find a unit vector in the direction of the line and construct its projection operator.

2.29. The parametric equation of a line $L$ in a coordinate system with origin $O$ is
$$x = 2t + 1, \qquad y = t + 1, \qquad z = -2t + 2.$$
A point $P$ has coordinates $(3, -2, 1)$.

(a) Using the projection operators, find the length of the projection of $\overrightarrow{OP}$ on the line $L$.
(b) Find the vector whose beginning is $P$ and whose end is on the line $L$ and perpendicular to $L$.
(c) From this vector calculate the distance from $P$ to the line $L$.

2.30. Let the operator $U : \mathbb{C}^2 \to \mathbb{C}^2$ be given by
$$U\begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix} = \begin{pmatrix}\dfrac{\alpha_2}{\sqrt 2} - \dfrac{i\alpha_1}{\sqrt 2}\\[6pt] \dfrac{\alpha_1}{\sqrt 2} + \dfrac{\alpha_2}{\sqrt 2}\end{pmatrix}.$$
Is $U$ unitary?

2.31. Show that the product of two unitary operators is always unitary, but the product of two hermitian operators is hermitian if and only if they commute.

2.32. Let $S$ be an operator that is both unitary and hermitian. Show that

(a) $S$ is involutive (i.e., $S^2 = \mathbf{1}$).
(b) $S = P^+ - P^-$, where $P^+$ and $P^-$ are hermitian projection operators.

2.33. Show that when the forward difference operator $\Delta$ is applied to a polynomial, the degree of the polynomial is reduced by 1. (Hint: Consider $x^n$ first.) Then show that $\Delta^{n+1}$ annihilates all polynomials of degree $n$ or less.

2.34. Show that $\delta^{n+1}$ and $\nabla^{n+1}$ annihilate any polynomial of degree $n$.

2.35. Show that all of the finite-difference operators commute with one another.

2.36. Verify the identities $\nabla E = \delta E^{1/2} = \Delta$, $\nabla + \Delta = 2\mu\delta = E - E^{-1}$, and $E^{-1/2} = \mu - \delta/2$.

2.37. By writing everything in terms of $E$, show that $\delta^2 = \Delta - \nabla = \Delta\nabla$.

2.38. Write expressions for $E^{1/2}$, $\Delta$, $\nabla$, and $\mu$ in terms of $\delta$.

2.39. Show that
$$D = \frac{2}{h}\sinh^{-1}\frac{\delta}{2}.$$

2.40. Show that
$$D^2 = \frac{1}{h^2}\left(\Delta^2 - \Delta^3 + \frac{11}{12}\Delta^4 - \frac{5}{6}\Delta^5 - \cdots\right)$$
and derive Equation (2.27).
2.41. Find an expression for $\delta$ in powers of $\Delta$. Retain all terms up to the fourth power.

2.42. Show that for $a = 2$, the third power of $\Delta$ disappears in Equation (2.29).

2.43. Evaluate the following integrals numerically, using six subintervals with the trapezoidal rule, Simpson's one-third rule, and Simpson's three-eighths rule. Compare with the exact result when possible.

(a) $\int_1^5 x^3\,dx$. (b) $\int_0^2 e^{-x}\,dx$. (c) $\int_0^1 xe^x\cos x\,dx$. (d) $\int_1^4 \frac{\ln x}{x}\,dx$. (e) $\int_1^2 \frac{dx}{x}$. (f) $\int_0^1 e^x\sin x\,dx$. (g) $\int_{-1}^{1}\frac{dx}{1+x^2}$. (h) $\int_0^1 xe^x\,dx$. (i) $\int_0^1 e^x\tan x\,dx$.

Additional Reading

1. Axler, S. Linear Algebra Done Right, Springer-Verlag, 1996.
2. Greub, W. Linear Algebra, 4th ed., Springer-Verlag, 1975.
3. Hildebrand, F. Introduction to Numerical Analysis, 2nd ed., Dover, 1987. Uses operator techniques in numerical analysis. It has a detailed discussion of error analysis, a topic completely ignored in our text.
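The quadrature rules called for in Problem 2.43 can be sketched as follows (a minimal illustration, not the text's code; the function names are ours, and we try the rules on the cubic of part (a), for which both Simpson rules happen to be exact):

```python
import numpy as np

def trapezoid(f, a, b, n):
    # composite trapezoidal rule with n subintervals
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return h * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)

def simpson_13(f, a, b, n):
    # composite Simpson's one-third rule; n must be even
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return (h / 3) * (y[0] + 4 * y[1:-1:2].sum() + 2 * y[2:-2:2].sum() + y[-1])

def simpson_38(f, a, b, n):
    # composite Simpson's three-eighths rule; n must be a multiple of 3
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    weights = np.array([2.0 if i % 3 == 0 else 3.0 for i in range(n + 1)])
    weights[0] = weights[-1] = 1.0
    return (3 * h / 8) * (weights * y).sum()

f = lambda x: x**3
exact = (5.0**4 - 1.0**4) / 4          # = 156
print(trapezoid(f, 1, 5, 6), simpson_13(f, 1, 5, 6), simpson_38(f, 1, 5, 6))
```

Both Simpson rules reproduce the exact value 156 because their error terms involve the fourth derivative, which vanishes for a cubic; the trapezoidal rule does not.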
3 Matrices: Operator Representations

So far, our theoretical investigation has been dealing mostly with abstract vectors and abstract operators. As we have seen in examples and problems, concrete representations of vectors and operators are necessary in most applications. Such representations are obtained by choosing a basis and expressing all operations in terms of components of vectors and matrix representations of operators.

3.1 Matrices

Let us choose a basis $B_V = \{|a_i\rangle\}_{i=1}^N$ of a vector space $V_N$, and express an arbitrary vector $|x\rangle$ in this basis: $|x\rangle = \sum_{i=1}^N \xi_i|a_i\rangle$. We write

$$\mathbf{x} = \begin{pmatrix}\xi_1\\ \xi_2\\ \vdots\\ \xi_N\end{pmatrix} \qquad (3.1)$$

and say that the column vector $\mathbf{x}$ represents $|x\rangle$ in $B_V$. We can also have a linear transformation $A \in \mathcal{L}(V_N, W_M)$ act on the basis vectors in $B_V$ to give vectors in the $M$-dimensional vector space $W_M$: $|w_k\rangle = A|a_k\rangle$. The latter can be written as a linear combination of the basis vectors $B_W = \{|b_j\rangle\}_{j=1}^M$ in $W_M$:

$$|w_1\rangle = \sum_{j=1}^M \alpha_{j1}|b_j\rangle, \qquad |w_2\rangle = \sum_{j=1}^M \alpha_{j2}|b_j\rangle, \qquad \ldots, \qquad |w_N\rangle = \sum_{j=1}^M \alpha_{jN}|b_j\rangle.$$

Note that the components have an extra subscript to denote which of the $N$ vectors $\{|w_i\rangle\}_{i=1}^N$ they are representing. The components can be arranged in a column as
before to give a representation of the corresponding vectors:

$$\mathbf{w}_k = \begin{pmatrix}\alpha_{1k}\\ \alpha_{2k}\\ \vdots\\ \alpha_{Mk}\end{pmatrix}, \qquad k = 1, 2, \ldots, N.$$

The operator itself is determined by the collection of all these vectors, i.e., by a matrix. We write this as

$$\mathsf{A} = \begin{pmatrix}\alpha_{11} & \alpha_{12} & \cdots & \alpha_{1N}\\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2N}\\ \vdots & \vdots & & \vdots\\ \alpha_{M1} & \alpha_{M2} & \cdots & \alpha_{MN}\end{pmatrix} \qquad (3.2)$$

and call $\mathsf{A}$ the matrix representing $A$ in bases $B_V$ and $B_W$. This statement is also summarized symbolically as

$$A|a_i\rangle = \sum_{j=1}^M \alpha_{ji}|b_j\rangle, \qquad i = 1, 2, \ldots, N. \qquad (3.3)$$

We thus have the following rule:

3.1.1. Box. To find the matrix $\mathsf{A}$ representing $A$ in bases $B_V = \{|a_i\rangle\}_{i=1}^N$ and $B_W = \{|b_j\rangle\}_{j=1}^M$, express $A|a_i\rangle$ as a linear combination of the vectors in $B_W$. The components form the $i$th column of $\mathsf{A}$.

Now consider the vector $|y\rangle = A|x\rangle$ in $W_M$. This vector can be written in two ways: On the one hand, $|y\rangle = \sum_{j=1}^M \eta_j|b_j\rangle$. On the other hand,

$$|y\rangle = A|x\rangle = A\sum_{i=1}^N \xi_i|a_i\rangle = \sum_{i=1}^N \xi_i A|a_i\rangle = \sum_{i=1}^N \xi_i \sum_{j=1}^M \alpha_{ji}|b_j\rangle = \sum_{j=1}^M\Bigl(\sum_{i=1}^N \alpha_{ji}\xi_i\Bigr)|b_j\rangle.$$

Since $|y\rangle$ has a unique set of components in the basis $B_W$, we conclude that

$$\eta_j = \sum_{i=1}^N \alpha_{ji}\xi_i, \qquad j = 1, 2, \ldots, M. \qquad (3.4)$$
This is written as

$$\begin{pmatrix}\eta_1\\ \eta_2\\ \vdots\\ \eta_M\end{pmatrix} = \begin{pmatrix}\alpha_{11} & \alpha_{12} & \cdots & \alpha_{1N}\\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2N}\\ \vdots & \vdots & & \vdots\\ \alpha_{M1} & \alpha_{M2} & \cdots & \alpha_{MN}\end{pmatrix}\begin{pmatrix}\xi_1\\ \xi_2\\ \vdots\\ \xi_N\end{pmatrix}, \qquad (3.5)$$

in which the matrix multiplication rule is understood. This matrix equation is the representation of the operator equation $|y\rangle = A|x\rangle$ in the bases $B_V$ and $B_W$.

The construction above indicates that, once the bases are fixed in the two vector spaces, to every operator there corresponds a unique matrix. This uniqueness is the result of the uniqueness of the components of vectors in a basis. On the other hand, given an $M\times N$ matrix $\mathsf{A}$ with elements $\alpha_{ij}$, one can construct a unique linear operator $T_\mathsf{A}$ defined by its action on the basis vectors (see Box 1.3.3): $T_\mathsf{A}|a_i\rangle \equiv \sum_{j=1}^M \alpha_{ji}|b_j\rangle$. Thus, there is a one-to-one correspondence between operators and matrices. This correspondence is in fact a linear isomorphism:

3.1.2. Proposition. The two vector spaces $\mathcal{L}(V_N, W_M)$ and $\mathcal{M}_{M\times N}$ are isomorphic. An explicit isomorphism is established only when a basis is chosen for each vector space, in which case an operator is identified with its matrix representation.

Given the linear transformations $A : V_N \to W_M$ and $B : W_M \to U_K$, we can form the composite linear transformation $B\circ A : V_N \to U_K$. We can also choose bases $B_V = \{|a_i\rangle\}_{i=1}^N$, $B_W = \{|b_i\rangle\}_{i=1}^M$, $B_U = \{|c_i\rangle\}_{i=1}^K$ for $V$, $W$, and $U$, respectively. Then $A$, $B$, and $B\circ A$ will be represented by an $M\times N$, a $K\times M$, and a $K\times N$ matrix, respectively, the latter being the matrix product of the other two matrices.

Matrices are determined entirely by their elements. For this reason a matrix $\mathsf{A}$ whose elements are $\alpha_{11}, \alpha_{12}, \ldots$ is sometimes denoted by $(\alpha_{ij})$. Similarly, the elements of this matrix are denoted by $(\mathsf{A})_{ij}$. So, on the one hand, we have $(\alpha_{ij}) = \mathsf{A}$, and on the other hand $(\mathsf{A})_{ij} = \alpha_{ij}$. In the context of this notation, therefore, we can write

$$(\mathsf{A}+\mathsf{B})_{ij} = (\mathsf{A})_{ij} + (\mathsf{B})_{ij} \;\Rightarrow\; (\alpha_{ij} + \beta_{ij}) = (\alpha_{ij}) + (\beta_{ij}),$$
$$(\gamma\mathsf{A})_{ij} = \gamma(\mathsf{A})_{ij} \;\Rightarrow\; \gamma(\alpha_{ij}) = (\gamma\alpha_{ij}),$$
$$(\mathbf{0})_{ij} = 0, \qquad (\mathbf{1})_{ij} = \delta_{ij}.$$
A matrix as a representation of a linear operator is well-defined only in reference to a specific basis. A collection of rows and columns of numbers by itself has no operational meaning. When we manipulate matrices and attach meaning to them, we make an unannounced assumption regarding the basis: we have the standard basis of $\mathbb{C}^n$ (or $\mathbb{R}^n$) in mind. The following example should clarify this subtlety.
3.1.3. Example. Let us find the matrix representation of the linear operator $A \in \mathcal{L}(\mathbb{R}^3)$, given by

$$A\begin{pmatrix}x\\ y\\ z\end{pmatrix} = \begin{pmatrix}x - y + 2z\\ 3x - z\\ 2y + z\end{pmatrix} \qquad (3.6)$$

in the basis

$$B = \left\{\begin{pmatrix}1\\ 1\\ 0\end{pmatrix}, \begin{pmatrix}1\\ 0\\ 1\end{pmatrix}, \begin{pmatrix}0\\ 1\\ 1\end{pmatrix}\right\}.$$

There is a tendency to associate the matrix

$$\mathsf{A} = \begin{pmatrix}1 & -1 & 2\\ 3 & 0 & -1\\ 0 & 2 & 1\end{pmatrix}$$

with the operator $A$. The following discussion will show that this is false. To obtain the first column of the matrix representing $A$, we consider

$$A\begin{pmatrix}1\\ 1\\ 0\end{pmatrix} = \begin{pmatrix}0\\ 3\\ 2\end{pmatrix} = \tfrac{1}{2}\begin{pmatrix}1\\ 1\\ 0\end{pmatrix} - \tfrac{1}{2}\begin{pmatrix}1\\ 0\\ 1\end{pmatrix} + \tfrac{5}{2}\begin{pmatrix}0\\ 1\\ 1\end{pmatrix}.$$

So, by Box 3.1.1, the first column of the matrix is

$$\begin{pmatrix}\tfrac12\\ -\tfrac12\\ \tfrac52\end{pmatrix}.$$

The other two columns are obtained from

$$A\begin{pmatrix}1\\ 0\\ 1\end{pmatrix} = \begin{pmatrix}3\\ 2\\ 1\end{pmatrix} = 2\begin{pmatrix}1\\ 1\\ 0\end{pmatrix} + \begin{pmatrix}1\\ 0\\ 1\end{pmatrix} + 0\begin{pmatrix}0\\ 1\\ 1\end{pmatrix}, \qquad A\begin{pmatrix}0\\ 1\\ 1\end{pmatrix} = \begin{pmatrix}1\\ -1\\ 3\end{pmatrix} = -\tfrac32\begin{pmatrix}1\\ 1\\ 0\end{pmatrix} + \tfrac52\begin{pmatrix}1\\ 0\\ 1\end{pmatrix} + \tfrac12\begin{pmatrix}0\\ 1\\ 1\end{pmatrix},$$

giving the second and the third columns, respectively. The whole matrix is then

$$\mathsf{A}' = \begin{pmatrix}\tfrac12 & 2 & -\tfrac32\\ -\tfrac12 & 1 & \tfrac52\\ \tfrac52 & 0 & \tfrac12\end{pmatrix}.$$

As long as all vectors are represented by columns whose entries are expansion coefficients of the vectors in $B$, $A$ and $\mathsf{A}'$ are indistinguishable. However, the action of $\mathsf{A}$ on the column
vector $(x, y, z)^t$ will not yield the RHS of Equation (3.6)! Although this is not usually emphasized, the column vector on the LHS of Equation (3.6) is really the vector

$$x\begin{pmatrix}1\\ 0\\ 0\end{pmatrix} + y\begin{pmatrix}0\\ 1\\ 0\end{pmatrix} + z\begin{pmatrix}0\\ 0\\ 1\end{pmatrix},$$

which is an expansion in terms of the standard basis of $\mathbb{R}^3$ rather than in terms of $B$.

We can expand $A(x, y, z)^t$ in terms of $B$, yielding

$$A\begin{pmatrix}x\\ y\\ z\end{pmatrix} = \begin{pmatrix}x - y + 2z\\ 3x - z\\ 2y + z\end{pmatrix} = \bigl(2x - \tfrac32 y\bigr)\begin{pmatrix}1\\ 1\\ 0\end{pmatrix} + \bigl(-x + \tfrac12 y + 2z\bigr)\begin{pmatrix}1\\ 0\\ 1\end{pmatrix} + \bigl(x + \tfrac32 y - z\bigr)\begin{pmatrix}0\\ 1\\ 1\end{pmatrix}. \qquad (3.7)$$

This says that in the basis $B$ this vector has the representation given by the column of coefficients on the RHS of (3.7). Similarly, $(x, y, z)^t$ is represented by

$$\left(\begin{pmatrix}x\\ y\\ z\end{pmatrix}\right)_B = \begin{pmatrix}\tfrac12 x + \tfrac12 y - \tfrac12 z\\[2pt] \tfrac12 x - \tfrac12 y + \tfrac12 z\\[2pt] -\tfrac12 x + \tfrac12 y + \tfrac12 z\end{pmatrix}. \qquad (3.8)$$

Applying the matrix representing $A$ in $B$ to the RHS of (3.8) yields the coefficients on the RHS of (3.7), as it should. $\blacksquare$

Given any $M\times N$ matrix $\mathsf{A}$, an operator $T_\mathsf{A} \in \mathcal{L}(V_N, W_M)$ can be associated with $\mathsf{A}$, and one can construct the kernel and the range of $T_\mathsf{A}$. The rank of $T_\mathsf{A}$ is called the rank of $\mathsf{A}$. Since the rank of an operator is basis independent, this definition makes sense. Now suppose that we choose a basis for the kernel of $T_\mathsf{A}$ and extend it to a basis of $V$. Let $V_1$ denote the span of the remaining basis vectors. Similarly, we choose a basis for $T_\mathsf{A}(V)$ and extend it to a basis for $W$. In these two bases, the $M\times N$ matrix representing $T_\mathsf{A}$ will have all zeros except for an $r\times r$ submatrix, where $r$ is the rank of $T_\mathsf{A}$. The reader may verify that this submatrix has a nonzero determinant. In fact, the submatrix represents the isomorphism between $V_1$ and $T_\mathsf{A}(V)$, and, by its very construction, is the largest such matrix. Since the determinant of an operator is basis-independent, we have the following proposition:

3.1.4. Proposition. The rank of a matrix is the dimension of the largest (square) submatrix whose determinant is not zero.
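The procedure of Box 3.1.1 can be sketched numerically. The code below takes the operator of Equation (3.6) and, for illustration, a non-standard basis whose vectors we choose as $(1,1,0)$, $(1,0,1)$, $(0,1,1)$ (the particular basis is our assumption); each column of the representation is obtained by solving for the expansion coefficients of $A|a_i\rangle$ in that basis:

```python
import numpy as np

# The operator A of Eq. (3.6), written in the standard basis of R^3.
A_std = np.array([[1., -1.,  2.],
                  [3.,  0., -1.],
                  [0.,  2.,  1.]])

# A non-standard basis: its vectors are the columns of S.
S = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [0., 1., 1.]]).T

# Matrix of A in basis B (Box 3.1.1): column i holds the B-components
# of A|a_i>, i.e. the solution of S c = A a_i.
A_B = np.linalg.solve(S, A_std @ S)

# Consistency check: components transform the same way for any vector.
x = np.array([1., 2., 3.])
x_B = np.linalg.solve(S, x)      # components of x in B
y_B = A_B @ x_B                  # components of A x in B
assert np.allclose(S @ y_B, A_std @ x)
print(A_B)
```

The printed matrix is the representation of $A$ in the chosen basis, not the array of coefficients appearing in (3.6), which is the point of the example.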
3.2 Operations on Matrices

There are two basic operations that one can perform on a matrix to obtain a new one; these are transposition and complex conjugation. The transpose of an $M\times N$ matrix $\mathsf{A}$ is an $N\times M$ matrix $\mathsf{A}^t$ obtained by interchanging the rows and columns of $\mathsf{A}$:

$$(\mathsf{A}^t)_{ij} = (\mathsf{A})_{ji}. \qquad (3.9)$$

The following theorem, whose proof follows immediately from the definition of transpose, summarizes the important properties of the operation of transposition.

3.2.1. Theorem. Let $\mathsf{A}$ and $\mathsf{B}$ be two (square) matrices. Then

$$\text{(a) } (\mathsf{A}+\mathsf{B})^t = \mathsf{A}^t + \mathsf{B}^t, \qquad \text{(b) } (\mathsf{A}\mathsf{B})^t = \mathsf{B}^t\mathsf{A}^t, \qquad \text{(c) } (\mathsf{A}^t)^t = \mathsf{A}. \qquad (3.10)$$

Of special interest is a matrix that is identical to its transpose. Such matrices occur frequently in physics and are called symmetric matrices. Similarly, antisymmetric matrices are those satisfying $\mathsf{A}^t = -\mathsf{A}$. Any matrix $\mathsf{A}$ can be written as $\mathsf{A} = \frac12(\mathsf{A} + \mathsf{A}^t) + \frac12(\mathsf{A} - \mathsf{A}^t)$, where the first term is symmetric and the second is antisymmetric.

The elements of a symmetric matrix $\mathsf{A}$ satisfy the relation $\alpha_{ji} = (\mathsf{A}^t)_{ij} = (\mathsf{A})_{ij} = \alpha_{ij}$; i.e., the matrix is symmetric under reflection through the main diagonal. On the other hand, for an antisymmetric matrix we have $\alpha_{ji} = -\alpha_{ij}$. In particular, the diagonal elements of an antisymmetric matrix are all zero.

A (real) matrix satisfying $\mathsf{A}^t\mathsf{A} = \mathsf{A}\mathsf{A}^t = \mathbf{1}$ is called orthogonal.

Complex conjugation is an operation under which all elements of a matrix are complex conjugated. Denoting the complex conjugate of $\mathsf{A}$ by $\mathsf{A}^*$, we have $(\mathsf{A}^*)_{ij} = ((\mathsf{A})_{ij})^*$, or $(\alpha_{ij})^* \equiv (\alpha_{ij}^*)$. A matrix is real if and only if $\mathsf{A}^* = \mathsf{A}$. Clearly, $(\mathsf{A}^*)^* = \mathsf{A}$.

Under the combined operation of complex conjugation and transposition, the rows and columns of a matrix are interchanged and all of its elements are complex conjugated. This combined operation is called the adjoint operation, or hermitian conjugation, and is denoted by $\dagger$, as with operators.
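The symmetric/antisymmetric split $\mathsf{A} = \frac12(\mathsf{A}+\mathsf{A}^t) + \frac12(\mathsf{A}-\mathsf{A}^t)$ introduced above is easy to verify on any example (the matrix below is an arbitrary choice of ours):

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [4., 3., -1.],
              [2., 5., 7.]])

S = (A + A.T) / 2          # symmetric part
K = (A - A.T) / 2          # antisymmetric part

assert np.allclose(S, S.T)               # S^t = S
assert np.allclose(K, -K.T)              # K^t = -K
assert np.allclose(np.diag(K), 0)        # antisymmetric => zero diagonal
assert np.allclose(S + K, A)             # the two parts recover A
```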
Thus, we have

$$\mathsf{A}^\dagger = (\mathsf{A}^t)^* = (\mathsf{A}^*)^t, \qquad (\mathsf{A}^\dagger)_{ij} = (\mathsf{A})_{ji}^* \quad\text{or}\quad (\alpha_{ij})^\dagger = (\alpha_{ji}^*).$$

Two types of matrices are important enough to warrant a separate definition.

3.2.2. Definition. A hermitian matrix $\mathsf{H}$ satisfies $\mathsf{H}^\dagger = \mathsf{H}$, or, in terms of elements, $\eta_{ij}^* = \eta_{ji}$. A unitary matrix $\mathsf{U}$ satisfies $\mathsf{U}^\dagger\mathsf{U} = \mathsf{U}\mathsf{U}^\dagger = \mathbf{1}$, or, in terms of elements, $\sum_{k=1}^N \mu_{ik}\mu_{jk}^* = \sum_{k=1}^N \mu_{ki}^*\mu_{kj} = \delta_{ij}$.

Remarks: It follows immediately from this definition that

1. The diagonal elements of a hermitian matrix are real.
2. The $k$th column of a hermitian matrix is the complex conjugate of its $k$th row, and vice versa.
3. A real hermitian matrix is symmetric.
4. The rows of an $N\times N$ unitary matrix, when considered as vectors in $\mathbb{C}^N$, form an orthonormal set, as do the columns.
5. A real unitary matrix is orthogonal.

It is sometimes possible (and desirable) to transform a matrix into a form in which all of its off-diagonal elements are zero. Such a matrix is called a diagonal matrix. A diagonal matrix whose diagonal elements are $\{\lambda_k\}_{k=1}^N$ is denoted by $\operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$.

3.2.3. Example. In this example, we derive a useful identity for functions of a diagonal matrix. Let $\mathsf{D} = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ be a diagonal matrix, and $f(x)$ a function that has a Taylor series expansion $f(x) = \sum_{k=0}^\infty a_k x^k$. The same function of $\mathsf{D}$ can be written as

$$f(\mathsf{D}) = \sum_{k=0}^\infty a_k\mathsf{D}^k = \sum_{k=0}^\infty a_k\operatorname{diag}(\lambda_1^k, \ldots, \lambda_n^k) = \operatorname{diag}\Bigl(\sum_{k=0}^\infty a_k\lambda_1^k, \ldots, \sum_{k=0}^\infty a_k\lambda_n^k\Bigr) = \operatorname{diag}\bigl(f(\lambda_1), \ldots, f(\lambda_n)\bigr).$$

In words, the function of a diagonal matrix is equal to a diagonal matrix whose entries are the same function of the corresponding entries of the original matrix. In the above derivation, we used the following obvious properties of diagonal matrices:

$$a\operatorname{diag}(\lambda_1, \ldots, \lambda_n) = \operatorname{diag}(a\lambda_1, \ldots, a\lambda_n),$$
$$\operatorname{diag}(\lambda_1, \ldots, \lambda_n) + \operatorname{diag}(\omega_1, \ldots, \omega_n) = \operatorname{diag}(\lambda_1+\omega_1, \ldots, \lambda_n+\omega_n),$$
$$\operatorname{diag}(\lambda_1, \ldots, \lambda_n)\cdot\operatorname{diag}(\omega_1, \ldots, \omega_n) = \operatorname{diag}(\lambda_1\omega_1, \ldots, \lambda_n\omega_n). \quad\blacksquare$$

3.2.4. Example. (a) A prototypical symmetric matrix is that of the moment of inertia encountered in mechanics. The $ij$th element of this matrix is defined as $I_{ij} \equiv \iiint \rho(x_1, x_2, x_3)\,x_i x_j\,dV$, where $x_i$ is the $i$th Cartesian coordinate of a point in the distribution of mass described by the volume density $\rho(x_1, x_2, x_3)$. It is clear that $I_{ij} = I_{ji}$, or $\mathsf{I} = \mathsf{I}^t$. The moment of inertia matrix can be represented as

$$\mathsf{I} = \begin{pmatrix}I_{11} & I_{12} & I_{13}\\ I_{12} & I_{22} & I_{23}\\ I_{13} & I_{23} & I_{33}\end{pmatrix}.$$

It has six independent elements.

(b) An example of an antisymmetric matrix is the electromagnetic field tensor given by

$$\mathsf{F} = \begin{pmatrix}0 & B_3 & -B_2 & -E_1\\ -B_3 & 0 & B_1 & -E_2\\ B_2 & -B_1 & 0 & -E_3\\ E_1 & E_2 & E_3 & 0\end{pmatrix}.$$

(c) Examples of hermitian matrices are the $2\times 2$ Pauli spin matrices:
$$\sigma_1 = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix}0 & -i\\ i & 0\end{pmatrix}, \qquad \sigma_3 = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}.$$

(d) The most frequently encountered orthogonal matrices are rotations. One such matrix represents the rotation of a 3-dimensional rigid body in terms of Euler angles and is used in mechanics. Attaching a coordinate system to the body, a general rotation can be decomposed into a rotation of angle $\varphi$ about the $z$-axis, followed by a rotation of angle $\theta$ about the new $x$-axis, followed by a rotation of angle $\psi$ about the new $z$-axis. We simply exhibit this matrix in terms of these angles and leave it to the reader to show that it is indeed orthogonal.

$$\begin{pmatrix}\cos\psi\cos\varphi - \sin\psi\cos\theta\sin\varphi & \sin\psi\cos\varphi + \cos\psi\cos\theta\sin\varphi & \sin\theta\sin\varphi\\ -\cos\psi\sin\varphi - \sin\psi\cos\theta\cos\varphi & -\sin\psi\sin\varphi + \cos\psi\cos\theta\cos\varphi & \sin\theta\cos\varphi\\ \sin\psi\sin\theta & -\cos\psi\sin\theta & \cos\theta\end{pmatrix} \quad\blacksquare$$

3.3 Orthonormal Bases

Matrix representation is facilitated by choosing an orthonormal basis $B = \{|e_i\rangle\}_{i=1}^N$. The matrix elements of an operator $A$ can be found in such a basis by "multiplying" both sides of $A|e_i\rangle = \sum_{k=1}^N \alpha_{ki}|e_k\rangle$ on the left by $\langle e_j|$:

$$\langle e_j|A|e_i\rangle = \sum_{k=1}^N \alpha_{ki}\langle e_j|e_k\rangle = \sum_{k=1}^N \alpha_{ki}\delta_{jk} = \alpha_{ji}, \quad\text{or}\quad \alpha_{ij} = \langle e_i|A|e_j\rangle. \qquad (3.11)$$

We can also show that in an orthonormal basis, the $i$th component $\xi_i$ of a vector is found by multiplying it by $\langle e_i|$: $\xi_i = \langle e_i|x\rangle$. This expression for $\xi_i$ allows us to write the expansion of $|x\rangle$ as

$$|x\rangle = \sum_{j=1}^N |e_j\rangle\langle e_j|x\rangle \;\Rightarrow\; \mathbf{1} = \sum_{j=1}^N |e_j\rangle\langle e_j|, \qquad (3.12)$$

which is the same as in Proposition 2.5.3.

Let us now investigate the representation of the special operators discussed in Chapter 2 and find the connection between those operators and the matrices encountered in the last section. We begin by calculating the matrix representing the hermitian conjugate of an operator $T$. In an orthonormal basis, the elements of this matrix are given by Equation (3.11), $\tau_{ij} = \langle e_i|T|e_j\rangle$. Taking the complex conjugate of this equation, and using the definition of $T^\dagger$ given in Equation (2.8), we obtain $\tau_{ij}^* = \langle e_i|T|e_j\rangle^* = \langle e_j|T^\dagger|e_i\rangle$, or $(T^\dagger)_{ij} = \tau_{ji}^*$. This is precisely how the adjoint of a matrix was defined.
Note how crucially this conclusion depends on the orthonormality of the basis vectors. If the basis were not orthonormal, we could not use Equation (3.11), on which the conclusion is based. Therefore,

3.3.1. Box. Only in an orthonormal basis is the adjoint of an operator represented by the adjoint of the matrix representing that operator.

In particular, a hermitian operator is represented by a hermitian matrix only if an orthonormal basis is used. The following example illustrates this point.

3.3.2. Example. Consider the matrix representation of the hermitian operator $H$ in a general, not orthonormal, basis $B = \{|a_i\rangle\}_{i=1}^N$. The elements of the matrix corresponding to $H$ are given by

$$H|a_k\rangle = \sum_{j=1}^N \eta_{jk}|a_j\rangle \quad\text{or}\quad H|a_i\rangle = \sum_{j=1}^N \eta_{ji}|a_j\rangle. \qquad (3.13)$$

Taking the product of the first equation with $\langle a_i|$ and complex-conjugating the result gives $\langle a_i|H|a_k\rangle^* = \bigl(\sum_{j=1}^N \eta_{jk}\langle a_i|a_j\rangle\bigr)^* = \sum_{j=1}^N \eta_{jk}^*\langle a_j|a_i\rangle$. But by the definition of a hermitian operator, $\langle a_i|H|a_k\rangle^* = \langle a_k|H^\dagger|a_i\rangle = \langle a_k|H|a_i\rangle$. So we have $\langle a_k|H|a_i\rangle = \sum_{j=1}^N \eta_{jk}^*\langle a_j|a_i\rangle$. On the other hand, multiplying the second equation in (3.13) by $\langle a_k|$ gives $\langle a_k|H|a_i\rangle = \sum_{j=1}^N \eta_{ji}\langle a_k|a_j\rangle$. The only conclusion we can draw from this discussion is

$$\sum_{j=1}^N \eta_{jk}^*\langle a_j|a_i\rangle = \sum_{j=1}^N \eta_{ji}\langle a_k|a_j\rangle.$$

Because this equation does not say anything about each individual $\eta_{ij}$, we cannot conclude, in general, that $\eta_{ij}^* = \eta_{ji}$. However, if the $|a_i\rangle$'s are orthonormal, then $\langle a_j|a_i\rangle = \delta_{ji}$ and $\langle a_k|a_j\rangle = \delta_{kj}$, and we obtain $\sum_{j=1}^N \eta_{jk}^*\delta_{ji} = \sum_{j=1}^N \eta_{ji}\delta_{kj}$, or $\eta_{ik}^* = \eta_{ki}$, as expected of a hermitian matrix. $\blacksquare$

Similarly, we expect the matrices representing unitary operators to be unitary only if the basis is orthonormal. This is an immediate consequence of Equation (3.10), but we shall prove it in order to provide yet another example of how the completeness relation, Equation (3.12), is used. Since $UU^\dagger = \mathbf{1}$, we have $\langle e_i|UU^\dagger|e_j\rangle = \langle e_i|\mathbf{1}|e_j\rangle = \delta_{ij}$.
We insert the completeness relation $\mathbf{1} = \sum_{k=1}^N |e_k\rangle\langle e_k|$ between $U$ and $U^\dagger$ on the LHS:

$$\langle e_i|U\Bigl(\sum_{k=1}^N |e_k\rangle\langle e_k|\Bigr)U^\dagger|e_j\rangle = \sum_{k=1}^N \underbrace{\langle e_i|U|e_k\rangle}_{=\mu_{ik}}\,\underbrace{\langle e_k|U^\dagger|e_j\rangle}_{=\mu_{jk}^*} = \sum_{k=1}^N \mu_{ik}\mu_{jk}^* = \delta_{ij}.$$

This equation gives the first half of the requirement for a unitary matrix given in Definition 3.2.2. By redoing the calculation for $U^\dagger U$, we could obtain the second half of that requirement.
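Example 3.3.2 and the argument above can be mirrored numerically: in an orthonormal basis a hermitian (unitary) operator is represented by a hermitian (unitary) matrix, while in a skewed basis it generally is not. A minimal sketch (the particular operators and basis below are our own choices):

```python
import numpy as np

# A hermitian and a unitary operator, written in the standard
# (orthonormal) basis of C^2.
H = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])
U = np.array([[1, 1j],
              [1j, 1]]) / np.sqrt(2)
assert np.allclose(H, H.conj().T)                 # hermitian matrix
assert np.allclose(U.conj().T @ U, np.eye(2))     # unitary matrix

# Representations of the same operators in a non-orthonormal basis
# (basis vectors are the columns of S): A -> S^{-1} A S.
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])
H_B = np.linalg.solve(S, H @ S)
U_B = np.linalg.solve(S, U @ S)

# Neither representation retains the matrix property.
assert not np.allclose(H_B, H_B.conj().T)
assert not np.allclose(U_B.conj().T @ U_B, np.eye(2))
```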
3.4 Change of Basis and Similarity Transformation

It is often advantageous to describe a physical problem in a particular basis because it takes a simpler form there, but the general form of the result may still be of importance. In such cases the problem is solved in one basis, and the result is transformed to other bases. Let us investigate this point in some detail.

Given a basis $B = \{|a_i\rangle\}_{i=1}^N$, we can write an arbitrary vector $|a\rangle$ with components $\{\alpha_1, \alpha_2, \ldots, \alpha_N\}$ in $B$ as $|a\rangle = \sum_{i=1}^N \alpha_i|a_i\rangle$. Now suppose that we change the basis to $B' = \{|a_i'\rangle\}_{i=1}^N$. How are the components of $|a\rangle$ in $B'$ related to those in $B$? To answer this question, we write $|a_i\rangle$ in terms of the $B'$ vectors, $|a_i\rangle = \sum_{j=1}^N \rho_{ji}|a_j'\rangle$, and substitute for $|a_i\rangle$ in this expansion of $|a\rangle$, obtaining $|a\rangle = \sum_{i=1}^N \alpha_i\sum_{j=1}^N \rho_{ji}|a_j'\rangle = \sum_{i,j}\alpha_i\rho_{ji}|a_j'\rangle$. If we denote the $j$th component of $|a\rangle$ in $B'$ by $\alpha_j'$, then this equation tells us that

$$\alpha_j' = \sum_{i=1}^N \rho_{ji}\alpha_i \qquad\text{for } j = 1, 2, \ldots, N. \qquad (3.14)$$

If we use $\mathbf{a}'$, $\mathsf{R}$, and $\mathbf{a}$, respectively, to designate a column vector with elements $\alpha_i'$, an $N\times N$ matrix with elements $\rho_{ij}$, and a column vector with elements $\alpha_i$, then Equation (3.14) can be written in matrix form as

$$\begin{pmatrix}\alpha_1'\\ \alpha_2'\\ \vdots\\ \alpha_N'\end{pmatrix} = \begin{pmatrix}\rho_{11} & \rho_{12} & \cdots & \rho_{1N}\\ \rho_{21} & \rho_{22} & \cdots & \rho_{2N}\\ \vdots & \vdots & & \vdots\\ \rho_{N1} & \rho_{N2} & \cdots & \rho_{NN}\end{pmatrix}\begin{pmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_N\end{pmatrix} \quad\text{or}\quad \mathbf{a}' = \mathsf{R}\mathbf{a}. \qquad (3.15)$$

The matrix $\mathsf{R}$ is called the basis transformation matrix. It is invertible because it is a linear transformation that maps one basis onto another (see Theorem 2.1.6).

What happens to a matrix when we transform the basis? Consider the equation $|b\rangle = A|a\rangle$, where $|a\rangle$ and $|b\rangle$ have components $\{\alpha_i\}_{i=1}^N$ and $\{\beta_i\}_{i=1}^N$, respectively, in $B$. This equation has a corresponding matrix equation $\mathbf{b} = \mathsf{A}\mathbf{a}$. Now, if we change the basis, the components of $|a\rangle$ and $|b\rangle$ will change to those of $\mathbf{a}'$ and $\mathbf{b}'$, respectively. We seek a matrix $\mathsf{A}'$ such that $\mathbf{b}' = \mathsf{A}'\mathbf{a}'$. This matrix will clearly be the transform of $\mathsf{A}$. Using Equation (3.15), we write $\mathsf{R}\mathbf{b} = \mathsf{A}'\mathsf{R}\mathbf{a}$, or $\mathbf{b} = \mathsf{R}^{-1}\mathsf{A}'\mathsf{R}\mathbf{a}$.
Comparing this with $\mathbf{b} = \mathsf{A}\mathbf{a}$ and applying the fact that both equations hold for arbitrary $\mathbf{a}$ and $\mathbf{b}$, we conclude that

$$\mathsf{A} = \mathsf{R}^{-1}\mathsf{A}'\mathsf{R} \quad\text{or}\quad \mathsf{A}' = \mathsf{R}\mathsf{A}\mathsf{R}^{-1}. \qquad (3.16)$$

This is called a similarity transformation on $\mathsf{A}$, and $\mathsf{A}'$ is said to be similar to $\mathsf{A}$.

The transformation matrix $\mathsf{R}$ can easily be found for orthonormal bases $B = \{|e_i\rangle\}_{i=1}^N$ and $B' = \{|e_i'\rangle\}_{i=1}^N$. We have $|e_i\rangle = \sum_{k=1}^N \rho_{ki}|e_k'\rangle$. Multiplying this
equation by $\langle e_j'|$, we obtain

$$\langle e_j'|e_i\rangle = \sum_{k=1}^N \rho_{ki}\langle e_j'|e_k'\rangle = \sum_{k=1}^N \rho_{ki}\delta_{jk} = \rho_{ji}.$$

That is,

$$\rho_{ij} = \langle e_i'|e_j\rangle. \qquad (3.17)$$

3.4.1. Box. To find the $ij$th element of the matrix that changes the components of a vector in the orthonormal basis $B$ to those of the same vector in the orthonormal basis $B'$, take the $j$th ket in $B$ and multiply it by the $i$th bra in $B'$.

To find the $ij$th element of the matrix that changes $B'$ into $B$, we take the $j$th ket in $B'$ and multiply it by the $i$th bra in $B$: $\rho_{ij}' = \langle e_i|e_j'\rangle$. However, the matrix $\mathsf{R}'$ must be $\mathsf{R}^{-1}$, as can be seen from Equation (3.15). On the other hand, $(\rho_{ij}')^* = \langle e_i|e_j'\rangle^* = \langle e_j'|e_i\rangle = \rho_{ji}$, or

$$(\mathsf{R}^{-1})_{ij} = \rho_{ij}' = \rho_{ji}^* = (\mathsf{R}^\dagger)_{ij}. \qquad (3.18)$$

This shows that $\mathsf{R}$ is a unitary matrix and yields an important result.

3.4.2. Theorem. The matrix that transforms one orthonormal basis into another is necessarily unitary.

From Equations (3.17) and (3.18) we have $(\mathsf{R}^\dagger)_{ij} = \langle e_i|e_j'\rangle$. Thus,

3.4.3. Box. To obtain the $j$th column of $\mathsf{R}^\dagger$, we take the $j$th vector in the new basis and successively "multiply" it by $\langle e_i|$ for $i = 1, 2, \ldots, N$.

In particular, if the original basis is the standard basis of $\mathbb{C}^N$ and $|e_j'\rangle$ is represented by a column vector in that basis, then the $j$th column of $\mathsf{R}^\dagger$ is simply the vector $|e_j'\rangle$.

3.4.4. Example. In this example, we show that the similarity transform of a function of a matrix is the same function of the similarity transform of the matrix: $\mathsf{R}f(\mathsf{A})\mathsf{R}^{-1} = f(\mathsf{R}\mathsf{A}\mathsf{R}^{-1})$. The proof involves inserting $\mathbf{1} = \mathsf{R}^{-1}\mathsf{R}$ between factors of $\mathsf{A}$ in the Taylor series expansion of $f(\mathsf{A})$:

$$\mathsf{R}f(\mathsf{A})\mathsf{R}^{-1} = \mathsf{R}\Bigl(\sum_{k=0}^\infty a_k\mathsf{A}^k\Bigr)\mathsf{R}^{-1} = \sum_{k=0}^\infty a_k\mathsf{R}\mathsf{A}^k\mathsf{R}^{-1} = \sum_{k=0}^\infty a_k\underbrace{\mathsf{R}\mathsf{A}\mathsf{R}^{-1}\,\mathsf{R}\mathsf{A}\mathsf{R}^{-1}\cdots\mathsf{R}\mathsf{A}\mathsf{R}^{-1}}_{k\text{ times}} = \sum_{k=0}^\infty a_k(\mathsf{R}\mathsf{A}\mathsf{R}^{-1})^k = f(\mathsf{R}\mathsf{A}\mathsf{R}^{-1}). \quad\blacksquare$$
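Example 3.4.4 is easy to confirm numerically for a polynomial $f$ (an illustration, not a proof; the matrices below are random but fixed by a seed, and a generic random matrix is invertible):

```python
import numpy as np

def f(M):
    # f(x) = x^3 + 2x + 1, applied to a square matrix M
    return np.linalg.matrix_power(M, 3) + 2 * M + np.eye(M.shape[0])

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
R = rng.standard_normal((4, 4))
Rinv = np.linalg.inv(R)

lhs = R @ f(A) @ Rinv        # similarity transform of f(A)
rhs = f(R @ A @ Rinv)        # f of the similarity transform of A
assert np.allclose(lhs, rhs)
```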
3.5 The Determinant

An important concept associated with linear operators is the determinant. Determinants are usually defined in terms of matrix representations of operators in a particular basis. This may give the impression that determinants are basis dependent. However, we shall show that the value of the determinant of an operator is the same in all bases. In fact, it is possible to define determinants of operators without resort to a specific representation of the operator in terms of matrices (see Section 25.3.1).

Let us first introduce a permutation symbol $\epsilon_{i_1 i_2\ldots i_N}$, which will be used extensively in this chapter. It is defined by

$$\epsilon_{i_1\ldots i_k\ldots i_l\ldots i_N} = -\epsilon_{i_1\ldots i_l\ldots i_k\ldots i_N}, \qquad \epsilon_{12\ldots N} = 1. \qquad (3.19)$$

In other words, $\epsilon_{i_1 i_2\ldots i_N}$ is completely antisymmetric (or skew-symmetric) under interchange of any pair of its indices. We will use this permutation symbol to define determinants. An immediate consequence of the definition above is that $\epsilon_{i_1 i_2\ldots i_N}$ will be zero if any two of its indices are equal. Also note that $\epsilon_{i_1 i_2\ldots i_N}$ is $+1$ if $(i_1, i_2, \ldots, i_N)$ is an even permutation¹ (shuffling) of $(1, 2, \ldots, N)$, and $-1$ if it is an odd permutation of $(1, 2, \ldots, N)$.

3.5.1 Determinant of a Matrix

3.5.1. Definition. The determinant is a mapping, $\det: \mathcal{M}_{N\times N} \to \mathbb{C}$, given in terms of the elements $\alpha_{ij}$ of a matrix $\mathsf{A}$ by

$$\det\mathsf{A} = \sum_{i_1,\ldots,i_N=1}^N \epsilon_{i_1 i_2\ldots i_N}\,\alpha_{1 i_1}\cdots\alpha_{N i_N}.$$

Definition 3.5.1 gives $\det\mathsf{A}$ in terms of an expansion in rows, so the first entry is from the first row, the second from the second row, and so on. It is also possible to expand in terms of columns, as the following theorem shows.

3.5.2. Theorem. The determinant of a matrix $\mathsf{A}$ can be written as

$$\det\mathsf{A} = \sum_{i_1,\ldots,i_N=1}^N \epsilon_{i_1 i_2\ldots i_N}\,\alpha_{i_1 1}\cdots\alpha_{i_N N}.$$

Therefore, $\det\mathsf{A} = \det\mathsf{A}^t$.

¹An even permutation means an even number of exchanges of pairs of elements.
Thus, $(2, 3, 1)$ is an even permutation of $(1, 2, 3)$, while $(2, 1, 3)$ is an odd permutation. It can be shown (see Chapter 23) that the parity (evenness or oddness) of a permutation is well-defined; i.e., that although there may be many routes of reaching a permutation from a given (fiducial) permutation via exchanges of pairs of elements, all such routes require either even numbers of exchanges or odd numbers of exchanges.
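Definition 3.5.1 can be implemented directly, summing over all $N!$ permutations (practical only for small $N$; the function names are ours):

```python
import numpy as np
from itertools import permutations

def parity(perm):
    # Sign of a permutation of (0, 1, ..., n-1), via counting inversions:
    # each inversion corresponds to one pair exchange.
    inversions = sum(1 for i in range(len(perm))
                       for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_levi_civita(A):
    # Definition 3.5.1: det A = sum over permutations of
    # epsilon_{i1...iN} * a_{1 i1} * ... * a_{N iN}
    n = A.shape[0]
    return sum(parity(p) * np.prod([A[r, p[r]] for r in range(n)])
               for p in permutations(range(n)))

A = np.array([[1., -1., 2.],
              [3.,  0., -1.],
              [0.,  2., 1.]])
assert np.isclose(det_levi_civita(A), np.linalg.det(A))
```

Expanding along columns instead of rows (Theorem 3.5.2) amounts to replacing `A` by `A.T`, and indeed `det_levi_civita(A.T)` returns the same value.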
Proof. We shall go into some detail in the proof of only this theorem on determinants, to illustrate the manipulations involved in working with the $\epsilon$ symbol. We shall not reproduce such details for the other theorems.

In the equation of Definition 3.5.1, $i_1, i_2, \ldots, i_N$ are all different and form a permutation of $(1, 2, \ldots, N)$. So one of the $\alpha$'s must have 1 as its second index. We assume that it is the $j_1$th term; that is, $i_{j_1} = 1$, and we move this term all the way to the left to get $\alpha_{j_1 1}\alpha_{1 i_1}\cdots\alpha_{N i_N}$. Now we look for the entry with 2 as the second index and assume that it occurs at the $j_2$th position; that is, $i_{j_2} = 2$. We move this to the left, next to $\alpha_{j_1 1}$, and write $\alpha_{j_1 1}\alpha_{j_2 2}\alpha_{1 i_1}\cdots\alpha_{N i_N}$. We continue in this fashion until we get $\alpha_{j_1 1}\alpha_{j_2 2}\cdots\alpha_{j_N N}$. Since $j_1, j_2, \ldots, j_N$ is really a reshuffling of $i_1, i_2, \ldots, i_N$, the summation indices can be changed to $j_1, j_2, \ldots, j_N$, and we can write²

$$\det\mathsf{A} = \sum_{j_1,\ldots,j_N=1}^N \epsilon_{i_1 i_2\ldots i_N}\,\alpha_{j_1 1}\cdots\alpha_{j_N N}.$$

If we can show that $\epsilon_{i_1 i_2\ldots i_N} = \epsilon_{j_1 j_2\ldots j_N}$, we are done. In the equation of the theorem, the sequence of integers $(i_1, i_2, \ldots, i_N)$ is obtained by some shuffling of $(1, 2, \ldots, N)$. What we have done just now is to reshuffle $(i_1, i_2, \ldots, i_N)$ in reverse order to get back to $(1, 2, \ldots, N)$. Thus, if the shuffling in the equation of the theorem is even (odd), the reshuffling will also be even (odd). Thus, $\epsilon_{i_1 i_2\ldots i_N} = \epsilon_{j_1 j_2\ldots j_N}$, and we obtain the first part of the theorem:

$$\det\mathsf{A} = \sum_{j_1,\ldots,j_N=1}^N \epsilon_{j_1 j_2\ldots j_N}\,\alpha_{j_1 1}\cdots\alpha_{j_N N}.$$

For the second part, we simply note that the rows of $\mathsf{A}^t$ are columns of $\mathsf{A}$ and vice versa. $\square$

3.5.3. Theorem. Interchanging two rows (or two columns) of a matrix changes the sign of its determinant.

Proof. The proof is a simple exercise in permutation left for the reader. $\square$

An immediate consequence of this theorem is the following corollary.

3.5.4. Corollary. The determinant of a matrix with two equal rows (or two equal columns) is zero.
Therefore, one can add a multiple of a row (column) to another row (column) of a matrix without changing its determinant.

²The $\epsilon$ symbol in the sum is not independent of the $j$'s, although it appears without such indices. In reality, the $i$ indices are "functions" of the $j$ indices.
Since every term of the determinant in Definition 3.5.1 contains one and only one element from each row, we can write

$$\det\mathsf{A} = \alpha_{i1}A_{i1} + \alpha_{i2}A_{i2} + \cdots + \alpha_{iN}A_{iN} = \sum_{j=1}^N \alpha_{ij}A_{ij},$$

where $A_{ij}$ contains products of elements of the matrix $\mathsf{A}$ other than the element $\alpha_{ij}$. Since each element of a row or column occurs at most once in each term of the expansion, $A_{ij}$ cannot contain any element from the $i$th row or the $j$th column. The quantity $A_{ij}$ is called the cofactor of $\alpha_{ij}$, and the above expression is known as the (Laplace) expansion of $\det\mathsf{A}$ by its $i$th row. Clearly, there is a similar expansion by the $i$th column of the determinant, which is obtained by a similar argument using the equation of Theorem 3.5.2. We collect both results in the following equation:

$$\det\mathsf{A} = \sum_{j=1}^N \alpha_{ij}A_{ij} = \sum_{j=1}^N \alpha_{ji}A_{ji}. \qquad (3.20)$$

Vandermonde, Alexandre-Théophile, also known as Alexis, Abnit, and Charles-Auguste Vandermonde (1735-1796), had a father, a physician, who directed his sickly son toward a musical career. An acquaintanceship with Fontaine, however, so stimulated Vandermonde that in 1771 he was elected to the Académie des Sciences, to which he presented four mathematical papers (his total mathematical production) in 1771-1772. Later, Vandermonde wrote several papers on harmony, and it was said at that time that musicians considered Vandermonde to be a mathematician and that mathematicians viewed him as a musician. Vandermonde's membership in the Academy led to a paper on experiments with cold, made with Bezout and Lavoisier in 1776, and a paper on the manufacture of steel with Berthollet and Monge in 1786. Vandermonde became an ardent and active revolutionary, being such a close friend of Monge that he was termed "femme de Monge." He was a member of the Commune of Paris and the club of the Jacobins. In 1782 he was director of the Conservatoire des Arts et Métiers and in 1792, chief of the Bureau de l'Habillement des Armées.
He joined in the design of a course in political economy for the École Normale and in 1795 was named a member of the Institut National.

Vandermonde is best known for the theory of determinants. Lebesgue believed that the attribution of the determinant to Vandermonde was due to a misreading of his notation. Nevertheless, Vandermonde's fourth paper was the first to give a connected exposition of determinants, because he (1) defined a contemporary symbolism that was more complete, simple, and appropriate than that of Leibniz; (2) defined determinants as functions apart from the solution of linear equations presented by Cramer but also treated by Vandermonde; and (3) gave a number of properties of these functions, such as the number and signs of the terms and the effect of interchanging two consecutive indices (rows or columns), which he used to show that a determinant is zero if two rows or columns are identical.

Vandermonde's real and unrecognized claim to fame was lodged in his first paper, in which he approached the general problem of the solvability of algebraic equations through a study of functions invariant under permutations of the roots of the equations. Cauchy
assigned priority in this to Lagrange and Vandermonde. Vandermonde read his paper in November 1770, but he did not become a member of the Academy until 1771, and the paper was not published until 1774. Although Vandermonde's methods were close to those later developed by Abel and Galois for testing the solvability of equations, and although his treatment of the binomial equation $x^n - 1 = 0$ could easily have led to the anticipation of Gauss's results on constructible polygons, Vandermonde himself did not rigorously or completely establish his results, nor did he see the implications for geometry. Nevertheless, Kronecker dates the modern movement in algebra to Vandermonde's 1770 paper. Unfortunately, Vandermonde's spurt of enthusiasm and creativity, which in two years produced four insightful mathematical papers, at least two of which were of substantial importance, was quickly diverted by the exciting politics of the time and perhaps by poor health.

3.5.5. Proposition. If $i \ne k$, then $\sum_{j=1}^N \alpha_{ij}A_{kj} = 0 = \sum_{j=1}^N \alpha_{ji}A_{jk}$.

Proof. Consider the matrix $\mathsf{B}$ obtained from $\mathsf{A}$ by replacing row $k$ by row $i$ (row $i$ remains unchanged, of course). The matrix $\mathsf{B}$ has two equal rows, and its determinant is therefore zero. Now, if we expand $\det\mathsf{B}$ by its $k$th row according to Equation (3.20), we obtain $0 = \det\mathsf{B} = \sum_{j=1}^N \beta_{kj}B_{kj}$. But the elements of the $k$th row of $\mathsf{B}$ are the elements of the $i$th row of $\mathsf{A}$; that is, $\beta_{kj} = \alpha_{ij}$, and the cofactors of the $k$th row of $\mathsf{B}$ are the same as those of $\mathsf{A}$; that is, $B_{kj} = A_{kj}$. Thus, the first equation of the proposition is established. The second equation can be established using expansion by columns. $\square$

A minor of order $N-1$ of an $N\times N$ matrix $\mathsf{A}$ is the determinant of a matrix obtained by striking out one row and one column of $\mathsf{A}$. If we strike out the $i$th row and $j$th column of $\mathsf{A}$, then the minor is denoted by $M_{ij}$.

3.5.6. Theorem. $A_{ij} = (-1)^{i+j}M_{ij}$.
Proof. The proof involves separating $a_{11}$ from the rest of the terms in the expansion of the determinant. The unique coefficient of $a_{11}$ is $A_{11}$ by Equation (3.20). We can show that it is also $M_{11}$ by examining the $\epsilon$ expansion of the determinant and performing the first sum. This will establish the equality $A_{11} = M_{11}$. The general equality is obtained by performing enough interchanges of rows and columns of the matrix to bring $a_{ij}$ into the first-row first-column position, each exchange introducing a negative sign, thus the $(-1)^{i+j}$ factor. The details are left as an exercise. □

The combination of Equation (3.20) and Theorem 3.5.6 gives the familiar routine of evaluating the determinant of a matrix.

3.5.2 Determinants of Products of Matrices

One extremely useful property of determinants is expressed in the following theorem.
3.5 THE DETERMINANT 97

3.5.7. Theorem. $\det(AB) = (\det A)(\det B)$.

Proof. The proof consists in keeping track of index shuffling while rearranging the order of products of matrix elements. We shall leave the details as an exercise. □

3.5.8. Example. Let O and U denote, respectively, an orthogonal and a unitary $n \times n$ matrix; that is, $OO^t = O^tO = 1$ and $UU^\dagger = U^\dagger U = 1$. Taking the determinant of the first equation and using Theorems 3.5.2 and 3.5.7, we obtain $(\det O)(\det O^t) = (\det O)^2 = \det 1 = 1$. Therefore, for an orthogonal matrix, we get $\det O = \pm 1$. Orthogonal transformations preserve a real inner product. Among such transformations are the so-called inversions, which, in their simplest form, multiply a vector by −1. In three dimensions this corresponds to a reflection through the origin. The matrix associated with this operation is −1:

$$\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix},$$

which has a determinant of −1. This is a prototype of other, more complicated, orthogonal transformations whose determinants are −1. The other orthogonal transformations, whose determinants are +1, are of special interest because they correspond to rotations in three dimensions. The set of orthogonal transformations in n dimensions having determinant +1 is denoted by SO(n). These transformations are special because they have the mathematical structure of a (continuous) group, which finds application in many areas of advanced physics. We shall come back to the topic of group theory later in the book. We can obtain a similar result for unitary transformations. We take the determinant of both sides of $U^\dagger U = 1$:

$$\det(U^*)^t \det U = \det U^* \det U = (\det U)^* (\det U) = |\det U|^2 = 1.$$

Thus, we can generally write $\det U = e^{i\alpha}$, with $\alpha \in \mathbb{R}$. The set of those transformations with $\alpha = 0$ forms a group to which 1 belongs and that is denoted by SU(n). This group has found applications in the attempts at unifying the fundamental interactions. ■
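Theorem 3.5.7 and the determinant of an orthogonal matrix can be illustrated with a short numerical sketch (our own; the helpers `det3` and `matmul` and the sample matrices are not from the text):

```python
import math

def det3(M):
    """Determinant of a 3x3 matrix by first-row cofactor expansion."""
    a, b, c = M[0]
    d, e, f = M[1]
    g, h, i = M[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

A = [[1, 2, -1], [0, 1, -2], [2, 1, -1]]
B = [[2, 0, 1], [1, 3, 0], [0, 1, 1]]

# Theorem 3.5.7: det(AB) = (det A)(det B)
assert det3(matmul(A, B)) == det3(A) * det3(B)

# A rotation about the z-axis is orthogonal with determinant +1 ...
t = 0.3
R = [[math.cos(t), -math.sin(t), 0],
     [math.sin(t),  math.cos(t), 0],
     [0, 0, 1]]
assert abs(det3(R) - 1.0) < 1e-12

# ... while the inversion diag(-1, -1, -1) is orthogonal with determinant -1.
P = [[-1, 0, 0], [0, -1, 0], [0, 0, -1]]
assert det3(P) == -1
```

The rotation's determinant reduces to $\cos^2\theta + \sin^2\theta = 1$, as the expansion shows.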
3.5.3 Inverse of a Matrix

One of the most useful properties of the determinant is the simple criterion it gives for a matrix to be invertible. We are ready to investigate this criterion now. We combine Equation (3.20) and the content of Proposition 3.5.5 into a single equation,

$$\sum_{j=1}^N a_{ij} A_{kj} = (\det A)\,\delta_{ik} = \sum_{j=1}^N a_{ji} A_{jk}, \qquad (3.21)$$
and construct a matrix $C_A$ whose elements are the cofactors of the elements of the matrix A:

$$(C_A)_{ij} = A_{ij}. \qquad (3.22)$$

Then Equation (3.21) can be written as

$$\sum_{j=1}^N a_{ij}\,(C_A^t)_{jk} = (\det A)\,\delta_{ik} = \sum_{j=1}^N (C_A^t)_{kj}\,a_{ji},$$

or, in matrix form, as

$$A C_A^t = (\det A)\,1 = C_A^t A. \qquad (3.23)$$

inverse of a matrix

3.5.9. Theorem. The inverse of a matrix (if it exists) is unique. The matrix A has an inverse if and only if $\det A \neq 0$. Furthermore,

$$A^{-1} = \frac{C_A^t}{\det A}, \qquad (3.24)$$

where $C_A$ is the matrix of the cofactors of A.

Proof. Let B and C be inverses of A. Then

$$B = \underbrace{(CA)}_{=1} B = C \underbrace{(AB)}_{=1} = C.$$

For the second part, we note that if A has an inverse B, then $AB = 1 \Rightarrow \det A \det B = \det 1 = 1$, whence $\det A \neq 0$. Conversely, if $\det A \neq 0$, then dividing both sides of Equation (3.23) by $\det A$, we obtain the unique inverse (3.24) of A. □

The inverse of a 2 × 2 matrix is easily found:

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \qquad (3.25)$$

if $ad - bc \neq 0$. There is a more practical way of calculating the inverse of matrices. In the following discussion of this method, we shall confine ourselves simply to stating a couple of definitions and the main theorem, with no attempt at providing any proofs. The practical utility of the method will be illustrated by a detailed analysis of examples.
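The cofactor formula (3.24) translates directly into code. Below is a hedged pure-Python sketch (helper names are ours, not the book's); exact rational arithmetic via `fractions` avoids rounding in the division by det A:

```python
from fractions import Fraction

def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        sub = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(sub)
    return total

def inverse(A):
    """A^{-1} = C_A^t / det A  (Theorem 3.5.9)."""
    n, d = len(A), det(A)
    if d == 0:
        raise ValueError("matrix is not invertible (det A = 0)")
    inv = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sub = [[A[r][c] for c in range(n) if c != j]
                   for r in range(n) if r != i]
            # the transpose sends cofactor A_ij to entry (j, i) of the inverse
            inv[j][i] = Fraction((-1) ** (i + j) * det(sub), d)
    return inv

A = [[1, 2, -1], [0, 1, -2], [2, 1, -1]]
print(inverse(A)[0])   # first row: -1/5, -1/5, 3/5
```

This is the matrix of Example 3.5.13 below, and the result agrees with the row-reduction method shown there.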
elementary row operation

3.5.10. Definition. An elementary row operation on a matrix is one of the following: (a) interchange of two rows of the matrix, (b) multiplication of a row by a nonzero number, and (c) addition of a multiple of one row to another. Elementary column operations are defined analogously.

triangular, or row-echelon, form of a matrix

3.5.11. Definition. A matrix is in triangular, or row-echelon, form if it satisfies the following three conditions:
1. Any row consisting of only zeros is below any row that contains at least one nonzero element.
2. Going from left to right, the first nonzero entry of any row is to the left of the first nonzero entry of any lower row.
3. The first nonzero entry of each row is 1.

3.5.12. Theorem. For any invertible n × n matrix A,
• The n × 2n matrix (A|1) can be transformed into the n × 2n matrix (1|A⁻¹) by means of a finite number of elementary row operations.³
• If (A|1) is transformed into (1|B) by means of elementary row operations, then B = A⁻¹.

A systematic way of transforming (A|1) into (1|A⁻¹) is first to bring A into triangular form and then eliminate all nonzero elements of each column by elementary row operations.

3.5.13. Example. Let us evaluate the inverse of

$$A = \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & -2 \\ 2 & 1 & -1 \end{pmatrix}.$$

We start with

$$M = \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & -2 & 0 & 1 & 0 \\ 2 & 1 & -1 & 0 & 0 & 1 \end{array}\right)$$

and apply elementary row operations to M to bring the left half of it into triangular form. If we denote the kth row by (k) and the three operations of Definition 3.5.10, respectively, by $(k) \Leftrightarrow (j)$, $a(k)$, and $a(k) + (j)$, we get

$$M \xrightarrow{-2(1)+(3)} \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & -2 & 0 & 1 & 0 \\ 0 & -3 & 1 & -2 & 0 & 1 \end{array}\right) \xrightarrow{3(2)+(3)} \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & -2 & 0 & 1 & 0 \\ 0 & 0 & -5 & -2 & 3 & 1 \end{array}\right) \xrightarrow{-\frac{1}{5}(3)} \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & -2 & 0 & 1 & 0 \\ 0 & 0 & 1 & 2/5 & -3/5 & -1/5 \end{array}\right) \equiv M'.$$

³The matrix (A|1) denotes the n × 2n matrix obtained by juxtaposing the n × n unit matrix to the right of A. It can easily be shown that if A, B, and C are n × n matrices, then A(B|C) = (AB|AC).
The left half of M′ is in triangular form. However, we want all entries above any 1 in a column to be zero as well, i.e., we want the left-hand matrix to be 1. We can do this by appropriate use of type 3 elementary row operations:

$$M' \xrightarrow{-2(2)+(1)} \left(\begin{array}{ccc|ccc} 1 & 0 & 3 & 1 & -2 & 0 \\ 0 & 1 & -2 & 0 & 1 & 0 \\ 0 & 0 & 1 & 2/5 & -3/5 & -1/5 \end{array}\right) \xrightarrow[2(3)+(2)]{-3(3)+(1)} \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -1/5 & -1/5 & 3/5 \\ 0 & 1 & 0 & 4/5 & -1/5 & -2/5 \\ 0 & 0 & 1 & 2/5 & -3/5 & -1/5 \end{array}\right).$$

The right half of the resulting matrix is A⁻¹. ■

3.5.14. Example. It is instructive to start with a matrix that is not invertible and show that it is impossible to turn it into 1 by elementary row operations. Consider the matrix

$$B = \begin{pmatrix} 2 & -1 & 3 \\ 1 & -2 & 1 \\ -1 & 5 & 0 \end{pmatrix}.$$

Let us systematically bring it into triangular form:

$$M = \left(\begin{array}{ccc|ccc} 2 & -1 & 3 & 1 & 0 & 0 \\ 1 & -2 & 1 & 0 & 1 & 0 \\ -1 & 5 & 0 & 0 & 0 & 1 \end{array}\right) \xrightarrow{(1)\Leftrightarrow(2)} \left(\begin{array}{ccc|ccc} 1 & -2 & 1 & 0 & 1 & 0 \\ 2 & -1 & 3 & 1 & 0 & 0 \\ -1 & 5 & 0 & 0 & 0 & 1 \end{array}\right) \xrightarrow[(1)+(3)]{-2(1)+(2)} \left(\begin{array}{ccc|ccc} 1 & -2 & 1 & 0 & 1 & 0 \\ 0 & 3 & 1 & 1 & -2 & 0 \\ 0 & 3 & 1 & 0 & 1 & 1 \end{array}\right) \xrightarrow[\frac{1}{3}(2)]{-(2)+(3)} \left(\begin{array}{ccc|ccc} 1 & -2 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1/3 & 1/3 & -2/3 & 0 \\ 0 & 0 & 0 & -1 & 3 & 1 \end{array}\right).$$

The matrix B is now in triangular form, but its third row contains all zeros. There is no way we can bring this into the form of a unit matrix. We therefore conclude that B is not invertible. This is, of course, obvious, since it can easily be verified that B has a vanishing determinant. ■

We mentioned earlier that the determinant is a property of linear transformations, although they are defined in terms of matrices that represent them. We can now show this. First, note that taking the determinant of both sides of AA⁻¹ = 1,
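The procedure of Theorem 3.5.12 is mechanical enough to automate. The following is a sketch under our own naming (the function and its pivoting strategy are not from the text); it row-reduces (A|1) exactly as in Examples 3.5.13 and 3.5.14, and signals non-invertibility by returning `None` when a zero row appears:

```python
from fractions import Fraction

def gauss_jordan_inverse(A):
    """Row-reduce the augmented matrix (A|1) to (1|A^-1).
    Returns None when A is not invertible (a zero pivot column appears)."""
    n = len(A)
    # Build (A|1) with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a pivot row and swap it up: operation (k) <=> (j).
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None          # zero column below the diagonal: singular
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so its leading entry is 1: operation a(k).
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        # Eliminate every other entry in this column: operation a(k) + (j).
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

A = [[1, 2, -1], [0, 1, -2], [2, 1, -1]]
B = [[2, -1, 3], [1, -2, 1], [-1, 5, 0]]
print(gauss_jordan_inverse(A))   # rows -1/5,-1/5,3/5 | 4/5,-1/5,-2/5 | 2/5,-3/5,-1/5
print(gauss_jordan_inverse(B))   # None: B is singular, as in Example 3.5.14
```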
one obtains $\det(A^{-1}) = 1/\det A$. Now recall that the representations of an operator in two different bases are related via a similarity transformation. Thus, if A is represented by A in one basis and by A′ in another, then there exists an invertible matrix R such that A′ = RAR⁻¹. Taking the determinant of both sides, we get

$$\det A' = \det R \,\det A\,\frac{1}{\det R} = \det A.$$

Thus, the determinant is an intrinsic property of the operator, independent of the basis chosen in which to represent the operator.

3.6 The Trace

Another intrinsic quantity associated with an operator that is usually defined in terms of matrices is given in the following definition.

trace of a square matrix

3.6.1. Definition. Let A be an N × N matrix. The mapping $\mathrm{tr} : M_{N\times N} \to \mathbb{C}$ (or $\mathbb{R}$) given by $\mathrm{tr}\,A = \sum_{i=1}^N a_{ii}$ is called the trace of A.

3.6.2. Theorem. The trace is a linear mapping. Furthermore, $\mathrm{tr}\,A^t = \mathrm{tr}\,A$ and $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.

Proof. The linearity of the trace and the first identity follow directly from the definition. To prove the second identity of the theorem, we use the definitions of the trace and the matrix product:

$$\operatorname{tr}(AB) = \sum_{i=1}^N (AB)_{ii} = \sum_{i=1}^N \sum_{j=1}^N (A)_{ij}(B)_{ji} = \sum_{j=1}^N \left( \sum_{i=1}^N (B)_{ji}(A)_{ij} \right) = \sum_{j=1}^N (BA)_{jj} = \operatorname{tr}(BA). \quad \Box$$

connection between trace and determinant

3.6.3. Example. In this example, we show a very useful connection between the trace and the determinant that holds when a matrix is only infinitesimally different from the unit matrix. Let us calculate the determinant of $1 + \epsilon A$ to first order in $\epsilon$. Using the definition of determinant, we write

$$\det(1 + \epsilon A) = \sum_{i_1,\dots,i_n=1}^n \epsilon_{i_1\dots i_n} (\delta_{1 i_1} + \epsilon a_{1 i_1}) \cdots (\delta_{n i_n} + \epsilon a_{n i_n}) = \sum_{i_1,\dots,i_n=1}^n \epsilon_{i_1\dots i_n}\,\delta_{1 i_1}\cdots\delta_{n i_n} + \epsilon \sum_{k=1}^n \sum_{i_1,\dots,i_n=1}^n \epsilon_{i_1\dots i_n}\,\delta_{1 i_1}\cdots\hat{\delta}_{k i_k}\cdots\delta_{n i_n}\, a_{k i_k}.$$
The first sum is just the product of all the Kronecker deltas. In the second sum, $\hat{\delta}_{k i_k}$ means that in the product of the deltas, $\delta_{k i_k}$ is absent. This term is obtained by multiplying the second term of the kth parentheses by the first term of all the rest. Since we are interested only in the first power of $\epsilon$, we stop at this term. Now, the first sum is reduced to $\epsilon_{12\dots n} = 1$ after all the Kronecker deltas are summed over. For the second sum, we get

$$\epsilon \sum_{k=1}^n \sum_{i_1,\dots,i_n=1}^n \epsilon_{i_1\dots i_n}\,\delta_{1 i_1}\cdots\hat{\delta}_{k i_k}\cdots\delta_{n i_n}\, a_{k i_k} = \epsilon \sum_{k=1}^n \sum_{i_k=1}^n \epsilon_{1 2 \dots i_k \dots n}\, a_{k i_k} = \epsilon \sum_{k=1}^n \epsilon_{1 2 \dots k \dots n}\, a_{kk} = \epsilon \sum_{k=1}^n a_{kk} = \epsilon \operatorname{tr} A, \qquad (3.26)$$

where the last line follows from the fact that the only nonzero value of $\epsilon_{1 2 \dots i_k \dots n}$ is obtained when $i_k$ is equal to the missing index, i.e., k, in which case it will be 1. Thus, $\det(1 + \epsilon A) = 1 + \epsilon \operatorname{tr} A$. ■

Similar matrices have the same trace: If A′ = RAR⁻¹, then

$$\operatorname{tr} A' = \operatorname{tr}(RAR^{-1}) = \operatorname{tr}[R(AR^{-1})] = \operatorname{tr}[(AR^{-1})R] = \operatorname{tr}[A(R^{-1}R)] = \operatorname{tr}(A1) = \operatorname{tr} A.$$

The preceding discussion is summarized in the following proposition.

3.6.4. Proposition. To every operator $A \in \mathcal{L}(V)$ are associated two intrinsic numbers, det A and tr A, which are the determinant and trace of the matrix representation of the operator in any basis of V.

It follows from this proposition that the result of Example 3.6.3 can be written in terms of operators:

$$\det(1 + \epsilon A) = 1 + \epsilon \operatorname{tr} A. \qquad (3.27)$$

A particularly useful formula that can be derived from this equation is the derivative at t = 0 of an operator A(t) depending on a single variable with the property that A(0) = 1. To first order in t, we can write $A(t) = 1 + t\dot{A}(0)$, where a dot represents differentiation with respect to t. Substituting this in Equation (3.27) and differentiating with respect to t, we obtain the important result

$$\frac{d}{dt}\Big|_{t=0} \det(A(t)) = \operatorname{tr} \dot{A}(0). \qquad (3.28)$$

3.6.5. Example. We have seen that the determinant of a product of matrices is the product of the determinants. On the other hand, the trace of a sum of matrices is the sum of traces.
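Theorem 3.6.2 and Equation (3.27) are easy to probe numerically. A minimal sketch (our own helpers and sample matrices, not the book's):

```python
def det3(M):
    """Determinant of a 3x3 matrix by first-row cofactor expansion."""
    a, b, c = M[0]; d, e, f = M[1]; g, h, i = M[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[3, -1, 2], [0, 1, -2], [1, -3, -1]]
B = [[1, 2, -1], [0, 1, -2], [2, 1, -1]]

# Theorem 3.6.2: tr(AB) = tr(BA), even though AB != BA in general.
assert trace(matmul(A, B)) == trace(matmul(B, A))
assert matmul(A, B) != matmul(B, A)

# Equation (3.27): det(1 + eps*A) = 1 + eps*tr(A) + O(eps^2).
eps = 1e-6
one_plus = [[(1 if i == j else 0) + eps * A[i][j] for j in range(3)]
            for i in range(3)]
assert abs(det3(one_plus) - (1 + eps * trace(A))) < 1e-10
```

Here tr A = 3, and the discrepancy in the last assertion is of order $\epsilon^2 \approx 10^{-12}$, confirming that the correction is second order in $\epsilon$.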
When dealing with numbers, products and sums are related via the logarithm and exponential: $\alpha\beta = \exp\{\ln\alpha + \ln\beta\}$. A generalization of this relation exists for diagonalizable matrices. Let A be such a matrix, i.e., let D = RAR⁻¹ for some similarity transformation R and some diagonal matrix $D = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$. The determinant of a diagonal matrix is simply the product of its elements:
Taking the natural log of both sides and using the result of Example 3.2.3, we have

$$\ln(\det D) = \ln\lambda_1 + \ln\lambda_2 + \cdots + \ln\lambda_n = \operatorname{tr}(\ln D),$$

which can also be written as $\det D = \exp(\operatorname{tr}(\ln D))$. In terms of A, this reads $\det(RAR^{-1}) = \exp\{\operatorname{tr}(\ln(RAR^{-1}))\}$. Now invoke the invariance of determinant and trace under similarity transformation and the result of Example 3.4.4 to obtain

$$\det A = \exp\{\operatorname{tr}(R(\ln A)R^{-1})\} = \exp\{\operatorname{tr}(\ln A)\}. \qquad (3.29)$$

This is an important equation, which is sometimes used to define the determinant of operators in infinite-dimensional vector spaces. ■

Both the determinant and the trace are mappings from $M_{N\times N}$ to $\mathbb{C}$. The determinant is not a linear mapping, but the trace is; and this opens up the possibility of defining an inner product in the vector space of N × N matrices in terms of the trace:

3.6.6. Proposition. For any two matrices $A, B \in M_{N\times N}$, the mapping $g : M_{N\times N} \times M_{N\times N} \to \mathbb{C}$ defined by $g(A, B) = \operatorname{tr}(A^\dagger B)$ is a sesquilinear inner product.

Proof. The proof follows directly from the linearity of trace and the definition of hermitian conjugate. □

3.7 Problems

3.1. Show that if |c⟩ = |a⟩ + |b⟩, then in any basis the components of |c⟩ are equal to the sums of the corresponding components of |a⟩ and |b⟩. Also show that the elements of the matrix representing the sum of two operators are the sums of the elements of the matrices representing those two operators.

3.2. Show that the unit operator 1 is represented by the unit matrix in any basis.

3.3. The linear operator A : ℝ³ → ℝ² is given by

$$A\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2x + y - 3z \\ x + y - z \end{pmatrix}.$$

Construct the matrix representing A in the standard bases of ℝ³ and ℝ².

3.4. A linear transformation T : ℝ³ → ℝ³ is given. Find the matrix representation of T in
(a) the standard basis of ℝ³,
(b) the basis consisting of |a₁⟩ = (1, 1, 0), |a₂⟩ = (1, 0, −1), and |a₃⟩ = (0, 2, 3).

3.5. Show that the diagonal elements of an antisymmetric matrix are all zero.

3.6. Show that the number of independent real parameters for an N × N
1. (real) symmetric matrix is N(N + 1)/2,
2. (real) antisymmetric matrix is N(N − 1)/2,
3. (real) orthogonal matrix is N(N − 1)/2,
4. (complex) unitary matrix is N²,
5. (complex) hermitian matrix is N².

3.7. Show that an arbitrary orthogonal 2 × 2 matrix can be written in one of the following two forms:

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}.$$

The first is a pure rotation (its determinant is +1), and the second has determinant −1. The form of the choices is dictated by the assumption that the first entry of the matrix reduces to 1 when θ = 0.

3.8. Derive the formulas

$$\cos(\theta_1 + \theta_2) = \cos\theta_1 \cos\theta_2 - \sin\theta_1 \sin\theta_2, \qquad \sin(\theta_1 + \theta_2) = \sin\theta_1 \cos\theta_2 + \cos\theta_1 \sin\theta_2$$

by noting that the rotation of the angle θ₁ + θ₂ in the xy-plane is the product of two rotations. (See Problem 3.7.)

3.9. Prove that if a matrix M satisfies MM† = 0, then M = 0. Note that in general, M² = 0 does not imply that M is zero. Find a nonzero 2 × 2 matrix whose square is zero.

3.10. Construct the matrix representations of the derivative and multiplication-by-t operators. Choose {1, t, t², t³} as your basis of 𝒫₃[t] and {1, t, t², t³, t⁴} as your basis of 𝒫₄[t]. Use the matrix of D so obtained to find the first, second, third, fourth, and fifth derivatives of a general polynomial of degree 4.
3.11. Find the transformation matrix R that relates the (orthonormal) standard basis of ℂ³ to the orthonormal basis obtained from a given set of three vectors via the Gram–Schmidt process. Verify that R is unitary, as expected from Theorem 3.4.2.

3.12. If the matrix representation of an endomorphism T of ℂ² with respect to the standard basis is a given 2 × 2 matrix, what is its matrix representation with respect to another given basis of ℂ²?

3.13. If the matrix representation of an endomorphism T of ℂ³ with respect to the standard basis is a given 3 × 3 matrix, what is the representation of T with respect to the basis

$$\left\{ \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} \right\}?$$

3.14. Using Definition 3.5.1, calculate the determinant of a general 3 × 3 matrix and obtain the familiar expansion of such a determinant in terms of the first row of the matrix.

3.15. Prove Corollary 3.5.4.

3.16. Show that $\det(\alpha A) = \alpha^N \det A$ for an N × N matrix A and a complex number α.

3.17. Show that det 1 = 1 for any unit matrix.

3.18. Find a specific pair of matrices A and B such that det(A + B) ≠ det A + det B. Therefore, the determinant is not a linear mapping. Hint: Any pair of matrices will most likely work. In fact, the challenge is to find a pair such that det(A + B) = det A + det B.

3.19. Demonstrate Proposition 3.5.5 using an arbitrary 3 × 3 matrix and evaluating the sum explicitly.

3.20. Find the inverse of a given 3 × 3 matrix A with integer entries.
3.21. Show explicitly that det(AB) = det A det B for 2 × 2 matrices.

3.22. Given three N × N matrices A, B, and C such that AB = C with C invertible, show that both A and B must be invertible. Thus, any two operators A and B on a finite-dimensional vector space satisfying AB = 1 are invertible and each is the inverse of the other. Note: This is not true for infinite-dimensional vector spaces.

3.23. Show directly that the similarity transformation induced by R does not change the determinant or the trace of A, where

$$R = \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & -2 \\ 2 & 1 & -1 \end{pmatrix} \quad\text{and}\quad A = \begin{pmatrix} 3 & -1 & 2 \\ 0 & 1 & -2 \\ 1 & -3 & -1 \end{pmatrix}.$$

3.24. Find the matrix that transforms the standard basis of ℂ³ to a given set of three orthonormal vectors |a₁⟩, |a₂⟩, |a₃⟩ whose components involve ±1/√2, 0, −2/√6, and (±1 + i)/√6. Show that this matrix is unitary.

3.25. Consider the three operators L₁, L₂, and L₃ satisfying [L₁, L₂] = iL₃, [L₃, L₁] = iL₂, [L₂, L₃] = iL₁. Show that the trace of each of these operators is necessarily zero.

3.26. Show that in the expansion of the determinant given in Definition 3.5.1, no two elements of the same row or the same column can appear in each term of the sum.

3.27. Find inverses for the following matrices using both methods discussed in this chapter: three 3 × 3 matrices A, B, and C with small integer entries, and a 4 × 4 matrix D with entries drawn from 0, ±1/√2, and ±(1 ± i)/(2√2).

3.28. Let A be an operator on V. Show that if det A = 0, then there exists a nonzero vector |x⟩ ∈ V such that A|x⟩ = 0.
3.29. For which values of a are the given matrices A, B, C, and D (whose entries involve the parameter a) invertible? Find the inverses whenever possible.

3.30. Let $\{a_i\}_{i=1}^N$ be the set consisting of the N rows of an N × N matrix A, and assume that the $a_i$ are orthogonal to each other. Show that $|\det A| = \prod_{i=1}^N \|a_i\|$. Hint: Consider AA†. What would the result be if A were a unitary matrix?

3.31. Prove that a set of n homogeneous linear equations in n unknowns has a nontrivial solution if and only if the determinant of the matrix of coefficients is zero.

3.32. Use determinants to show that an antisymmetric matrix whose dimension is odd cannot have an inverse.

3.33. Show that tr(|a⟩⟨b|) = ⟨b|a⟩. Hint: Evaluate the trace in an orthonormal basis.

3.34. Show that if two invertible N × N matrices A and B anticommute (that is, AB + BA = 0), then (a) N must be even, and (b) tr A = tr B = 0.

3.35. Show that for a spatial rotation $R_{\hat{n}}(\theta)$ of an angle θ about an arbitrary axis $\hat{n}$, $\operatorname{tr} R_{\hat{n}}(\theta) = 1 + 2\cos\theta$.

3.36. Express the sum of the squares of elements of a matrix as a trace. Show that this sum is invariant under an orthogonal transformation of the matrix.

3.37. Let S and A be a symmetric and an antisymmetric matrix, respectively, and let M be a general matrix. Show that
(a) tr M = tr Mᵗ,
(b) tr(SA) = 0; in particular, tr A = 0,
(c) SA is antisymmetric if and only if [S, A] = 0,
(d) MSMᵗ is symmetric and MAMᵗ is antisymmetric,
(e) MHM† is hermitian if H is.
3.38. Find the trace of each of the following linear operators:
(a) T : ℝ³ → ℝ³ given by T(x, y, z) = (x + y − z, 2x + 3y − 2z, x − y).
(b) T : ℝ³ → ℝ³ given by T(x, y, z) = (y − z, x + 2y + z, z − y).
(c) T : ℂ⁴ → ℂ⁴ given by T(x, y, z, w) = (x + iy − z + iw, 2ix + 3y − 2iz − w, x − iy, z + iw).

3.39. Use Equation (3.29) to derive Equation (3.27).

3.40. Suppose that there are two operators A and B such that [A, B] = c1, where c is a constant. Show that the vector space in which such operators are defined cannot be finite-dimensional. Conclude that the position and momentum operators of quantum mechanics can be defined only in infinite dimensions.

Additional Reading
1. Axler, S. Linear Algebra Done Right, Springer-Verlag, 1996.
2. Birkhoff, G. and MacLane, S. Modern Algebra, 4th ed., Macmillan, 1977. Discusses matrices from the standpoint of row and column operations.
3. Greub, W. Linear Algebra, 4th ed., Springer-Verlag, 1975.
4 Spectral Decomposition

The last chapter discussed matrix representation of operators. It was pointed out there that such a representation is basis-dependent. In some bases, the operator may "look" quite complicated, while in others it may take a simple form. In a "special" basis, the operator may look the simplest: It may be a diagonal matrix. This chapter investigates conditions under which a basis exists in which the operator is represented by a diagonal matrix.

4.1 Direct Sums

sum of two subspaces defined

Sometimes it is possible, and convenient, to break up a vector space into special (disjoint) subspaces. For instance, it is convenient to decompose the motion of a projectile into its horizontal and vertical components. Similarly, the study of the motion of a particle in ℝ³ under the influence of a central force field is facilitated by decomposing the motion into its projections onto the direction of angular momentum and onto a plane perpendicular to the angular momentum. This corresponds to decomposing a vector in space into a vector, say, in the xy-plane and a vector along the z-axis. We can generalize this to any vector space, but first some notation: Let U and W be subspaces of a vector space V. Denote by U + W the collection of all vectors in V that can be written as a sum of two vectors, one in U and one in W. It is easy to show that U + W is a subspace of V.

4.1.1. Example. Let U be the xy-plane and W the yz-plane. These are both subspaces of ℝ³, and so is U + W. In fact, U + W = ℝ³, because given any vector (x, y, z) in ℝ³, we can write it as

$$(x, y, z) = \underbrace{(x, \tfrac{1}{2}y, 0)}_{\in\, U} + \underbrace{(0, \tfrac{1}{2}y, z)}_{\in\, W}.$$
This decomposition is not unique: We could also write $(x, y, z) = (x, \tfrac{1}{3}y, 0) + (0, \tfrac{2}{3}y, z)$, and a host of other relations. ■

4.1.2. Definition. Let U and W be subspaces of a vector space V such that V = U + W and the only vector common to both U and W is the zero vector. Then we say that V is the direct sum of U and W and write

direct sum defined

$$V = U \oplus W.$$

uniqueness of direct sum

4.1.3. Proposition. Let U and W be subspaces of V. Then V = U ⊕ W if and only if any vector in V can be written uniquely as a vector in U plus a vector in W.

Proof. Assume V = U ⊕ W, and let |v⟩ ∈ V be written as a sum of a vector in U and a vector in W in two different ways:

$$|v\rangle = |u\rangle + |w\rangle = |u'\rangle + |w'\rangle \iff |u\rangle - |u'\rangle = |w'\rangle - |w\rangle.$$

The LHS is in U. Since it is equal to the RHS, which is in W, it must be in W as well. Therefore, the LHS must equal zero, as must the RHS. Thus, |u⟩ = |u′⟩, |w′⟩ = |w⟩, and there is only one way that |v⟩ can be written as a sum of a vector in U and a vector in W. Conversely, if |a⟩ ∈ U and also |a⟩ ∈ W, then one can write

$$|a\rangle = \underbrace{|a\rangle}_{\in\, U} + \underbrace{|0\rangle}_{\in\, W} \quad\text{and}\quad |a\rangle = \underbrace{|0\rangle}_{\in\, U} + \underbrace{|a\rangle}_{\in\, W}.$$

Uniqueness of the decomposition of |a⟩ implies that |a⟩ = |0⟩. Therefore, the only vector common to both U and W is the zero vector. This implies that V = U ⊕ W. □

dimensions in a direct sum

4.1.4. Proposition. If V = U ⊕ W, then dim V = dim U + dim W.

Proof. Let $\{|u_i\rangle\}_{i=1}^m$ be a basis for U and $\{|w_i\rangle\}_{i=1}^k$ a basis for W. Then it is easily verified that $\{|u_1\rangle, |u_2\rangle, \dots, |u_m\rangle, |w_1\rangle, |w_2\rangle, \dots, |w_k\rangle\}$ is a basis for V. The details are left as an exercise. □

We can generalize the notion of the direct sum to more than two subspaces. For example, we can write ℝ³ = 𝒳 ⊕ 𝒴 ⊕ 𝒵, where 𝒳, 𝒴, and 𝒵 are the one-dimensional subspaces corresponding to the three axes.
Now assume that

$$V = U_1 \oplus U_2 \oplus \cdots \oplus U_r, \qquad (4.1)$$

i.e., V is the direct sum of r of its subspaces that have no common vectors among themselves except the zero vector and have the property that any vector in V can be written (uniquely) as a sum of vectors, one from each subspace. Define the linear operator $P_j$ by $P_j|u\rangle = |u_j\rangle$, where $|u\rangle = \sum_{j=1}^r |u_j\rangle$, $|u_j\rangle \in U_j$. Then it is readily
verified that $P_j^2 = P_j$ and $P_j P_k = 0$ for $j \neq k$. Thus, the $P_j$'s are (not necessarily hermitian) projection operators. Furthermore, for an arbitrary vector |u⟩, we have $|u\rangle = \sum_{j=1}^r P_j |u\rangle$ for all |u⟩ ∈ V. Since this is true for all vectors, we have the identity

$$1 = \sum_{j=1}^r P_j. \qquad (4.2)$$

orthogonal complement of a subspace

4.1.5. Definition. Let V be an inner product space. Let 𝓜 be any subspace of V. Denote by 𝓜⊥ the set of all vectors in V orthogonal to all the vectors in 𝓜. 𝓜⊥ (pronounced "em perp") is called the orthogonal complement of 𝓜.

4.1.6. Proposition. 𝓜⊥ is a subspace of V.

Proof. In fact, if |a⟩, |b⟩ ∈ 𝓜⊥, then for any vector |c⟩ ∈ 𝓜, we have

$$\langle c|\big(\alpha|a\rangle + \beta|b\rangle\big) = \alpha\underbrace{\langle c|a\rangle}_{=0} + \beta\underbrace{\langle c|b\rangle}_{=0} = 0.$$

So $\alpha|a\rangle + \beta|b\rangle \in \mathcal{M}^\perp$ for arbitrary $\alpha, \beta \in \mathbb{C}$ and |a⟩, |b⟩ ∈ 𝓜⊥. □

If V of Equation (4.1) is an inner product space, and the subspaces are mutually orthogonal, then for arbitrary |u⟩, |v⟩ ∈ V,

$$\langle u|\,P_j|v\rangle = \langle u|v_j\rangle = \sum_{k=1}^r \langle u_k|v_j\rangle = \langle u_j|v_j\rangle = \langle u_j|v\rangle,$$

which shows that $P_j$ is hermitian. In Chapter 2, we assumed the projection operators to be hermitian. Now we see that only in an inner product space (and only if the subspaces of a direct sum are orthogonal) do we recover the hermiticity of projection operators.

4.1.7. Example. Consider an orthonormal basis $B_\mathcal{M} = \{|e_i\rangle\}_{i=1}^m$ for 𝓜, and extend it to a basis $B = \{|e_i\rangle\}_{i=1}^N$ for V. Now construct a (hermitian) projection operator $P = \sum_{i=1}^m |e_i\rangle\langle e_i|$. This is the operator that projects an arbitrary vector in V onto the subspace 𝓜. It is straightforward to show that 1 − P is the projection operator that projects onto 𝓜⊥ (see Problem 4.2). An arbitrary vector |a⟩ ∈ V can be written as

$$|a\rangle = (P + 1 - P)|a\rangle = \underbrace{P|a\rangle}_{\in\,\mathcal{M}} + \underbrace{(1 - P)|a\rangle}_{\in\,\mathcal{M}^\perp}.$$

Furthermore, the only vector that can be in both 𝓜 and 𝓜⊥ is the zero vector, because it is the only vector orthogonal to itself. ■
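The algebra of the projectors $P_j$ can be made concrete in ℝ³. The following sketch (our own construction, not from the text) takes U to be the xy-plane and W the z-axis, which are orthogonal, and checks idempotency, mutual annihilation, completeness (Equation (4.2)), and hermiticity:

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# R^3 = U (+) W with U the xy-plane and W the z-axis; P1, P2 project onto them.
P1 = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
P2 = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
Z = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

assert matmul(P1, P1) == P1 and matmul(P2, P2) == P2     # P_j^2 = P_j
assert matmul(P1, P2) == Z and matmul(P2, P1) == Z       # P_j P_k = 0, j != k
assert [[P1[i][j] + P2[i][j] for j in range(3)]
        for i in range(3)] == I                          # Equation (4.2)
# Since U and W are orthogonal, each P_j is hermitian (here: real symmetric).
assert all(P1[i][j] == P1[j][i] for i in range(3) for j in range(3))
```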
From this example and the remarks immediately preceding it we may conclude the following:

4.1.8. Proposition. If V is an inner product space, then V = 𝓜 ⊕ 𝓜⊥ for any subspace 𝓜. Furthermore, the projection operators corresponding to 𝓜 and 𝓜⊥ are hermitian.

4.2 Invariant Subspaces

invariant subspace; reduction of an operator

This section explores the possibility of obtaining subspaces by means of the action of a linear operator on vectors of an N-dimensional vector space V. Let |a⟩ be any vector in V, and A a linear operator on V. The vectors |a⟩, A|a⟩, A²|a⟩, ..., A^N|a⟩ are linearly dependent (there are N + 1 of them!). Let $\mathcal{M} \equiv \mathrm{Span}\{A^k|a\rangle\}_{k=0}^N$. It follows that $m \equiv \dim\mathcal{M} \leq \dim V$, and 𝓜 has the property that for any vector |x⟩ ∈ 𝓜 the vector A|x⟩ also belongs to 𝓜 (show this!). In other words, no vector in 𝓜 "leaves" the subspace when acted on by A.

4.2.1. Definition. A subspace 𝓜 is an invariant subspace of the operator A if A transforms vectors of 𝓜 into vectors of 𝓜. This is written succinctly as A(𝓜) ⊂ 𝓜. We say that 𝓜 reduces A if both 𝓜 and 𝓜⊥ are invariant subspaces of A.

matrix representation of an operator in a subspace

Starting with a basis of 𝓜, we can extend it to a basis $B = \{|a_i\rangle\}_{i=1}^N$ of V whose first m vectors span 𝓜. The matrix representation of A in such a basis is given by the relation $A|a_i\rangle = \sum_{j=1}^N \alpha_{ji}|a_j\rangle$, $i = 1, 2, \dots, N$. If $i \leq m$, then $\alpha_{ji} = 0$ for $j > m$, because $A|a_i\rangle$ belongs to 𝓜 when $i \leq m$ and therefore can be written as a linear combination of only $\{|a_1\rangle, |a_2\rangle, \dots, |a_m\rangle\}$. Thus, the matrix representation of A in B will have the form

$$A = \begin{pmatrix} A_{11} & A_{12} \\ 0_{21} & A_{22} \end{pmatrix},$$

where $A_{11}$ is an m × m matrix, $A_{12}$ an m × (N − m) matrix, $0_{21}$ the (N − m) × m zero matrix, and $A_{22}$ an (N − m) × (N − m) matrix. We say that $A_{11}$ represents the operator A in the m-dimensional subspace 𝓜. It may also happen that the subspace spanned by the remaining basis vectors in B, namely $|a_{m+1}\rangle, |a_{m+2}\rangle, \dots$
$|a_N\rangle$, is also an invariant subspace of A. Then $A_{12}$ will be zero, and A will take a block diagonal form:¹

block diagonal matrix defined

$$A = \begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}.$$

reducible and irreducible matrices

A matrix representation of an operator that can be brought into this form by a suitable choice of basis is said to be reducible; otherwise, it is called irreducible.

¹From now on, we shall denote all zero matrices by the same symbol regardless of their dimensionality.
A reducible matrix A is denoted in two different ways:²

$$A = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix} \quad\text{or}\quad A = A_1 \oplus A_2. \qquad (4.3)$$

condition for invariance

For example, when 𝓜 reduces A and one chooses a basis the first m vectors of which are in 𝓜 and the remaining ones in 𝓜⊥, then A is reducible. We have seen on a number of occasions the significance of the hermitian conjugate of an operator (e.g., in relation to hermitian and unitary operators). The importance of this operator will be borne out further when we study the spectral theorem later in this chapter. Let us now investigate some properties of the adjoint of an operator in the context of invariant subspaces.

4.2.2. Lemma. A subspace 𝓜 of an inner product space V is invariant under the linear operator A if and only if 𝓜⊥ is invariant under A†.

Proof. The proof is left as a problem. □

An immediate consequence of the above lemma and the two identities (A†)† = A and (𝓜⊥)⊥ = 𝓜 is contained in the following theorem.

4.2.3. Theorem. A subspace of V reduces A if and only if it is invariant under both A and A†.

4.2.4. Lemma. Let 𝓜 be a subspace of V and P the hermitian projection operator onto 𝓜. Then 𝓜 is invariant under the linear operator A if and only if AP = PAP.

Proof. Suppose 𝓜 is invariant. Then for any |x⟩ in V, we have

$$P|x\rangle \in \mathcal{M} \;\Rightarrow\; AP|x\rangle \in \mathcal{M} \;\Rightarrow\; PAP|x\rangle = AP|x\rangle.$$

Since the last equality holds for arbitrary |x⟩, we have AP = PAP. Conversely, suppose AP = PAP. For any |y⟩ ∈ 𝓜, we have

$$P|y\rangle = |y\rangle \;\Rightarrow\; A|y\rangle = AP|y\rangle = \underbrace{PAP}_{=AP}|y\rangle = P(AP|y\rangle) \in \mathcal{M}.$$

Therefore, 𝓜 is invariant under A. □

4.2.5. Theorem. Let 𝓜 be a subspace of V, P the hermitian projection operator of V onto 𝓜, and A a linear operator on V. Then 𝓜 reduces A if and only if A and P commute.

²It is common to use a single subscript for submatrices of a block diagonal matrix, just as it is common to use a single subscript for entries of a diagonal matrix.
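Lemma 4.2.4 and Theorem 4.2.5 can be seen at work in a small example (our own, not the book's): with P projecting onto $\mathcal{M} = \mathrm{Span}\{e_1, e_2\}$ in ℝ³, a block diagonal matrix commutes with P, while a merely block upper triangular one satisfies AP = PAP but not [A, P] = 0:

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# P projects onto M = span{e1, e2} in R^3.
P = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]

# A is block diagonal: both M and its orthogonal complement are invariant,
# so M reduces A, and indeed [A, P] = 0 (Theorem 4.2.5).
A = [[1, 2, 0], [3, 4, 0], [0, 0, 5]]
assert matmul(A, P) == matmul(P, A)

# B is only block *upper* triangular: M is invariant (BP = PBP, Lemma 4.2.4),
# but M-perp is not, and B fails to commute with P.
B = [[1, 2, 7], [3, 4, 8], [0, 0, 5]]
assert matmul(B, P) == matmul(P, matmul(B, P))   # BP = PBP
assert matmul(B, P) != matmul(P, B)              # M does not reduce B
```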
Proof. Suppose 𝓜 reduces A. Then by Theorem 4.2.3, 𝓜 is invariant under both A and A†. Lemma 4.2.4 then implies

$$AP = PAP \quad\text{and}\quad A^\dagger P = PA^\dagger P. \qquad (4.4)$$

Taking the adjoint of the second equation yields $(A^\dagger P)^\dagger = (PA^\dagger P)^\dagger$, or PA = PAP. This equation together with the first equation of (4.4) yields PA = AP. Conversely, suppose that PA = AP. Then P²A = PAP, whence PA = PAP. Taking adjoints gives A†P = PA†P, because P is hermitian. By Lemma 4.2.4, 𝓜 is invariant under A†. Similarly, from PA = AP, we get PAP = AP², whence PAP = AP. Once again by Lemma 4.2.4, 𝓜 is invariant under A. By Theorem 4.2.3, 𝓜 reduces A. □

The main goal of the remaining part of this chapter is to prove that certain operators, e.g., hermitian operators, are diagonalizable, that is, that we can always find an (orthonormal) basis in which they are represented by a diagonal matrix.

4.3 Eigenvalues and Eigenvectors

eigenvalue and eigenvector

Let us begin by considering eigenvalues and eigenvectors, which are generalizations of familiar concepts in two and three dimensions. Consider the operation of rotation about the z-axis by an angle θ, denoted by $R_z(\theta)$. Such a rotation takes any vector (x, y) in the xy-plane to a new vector $(x\cos\theta - y\sin\theta,\; x\sin\theta + y\cos\theta)$. Thus, unless (x, y) = (0, 0) or θ is an integer multiple of 2π, the vector will change. Is there a nonzero vector that is so special (eigen, in German) that it does not change when acted on by $R_z(\theta)$? As long as we confine ourselves to two dimensions, the answer is no. But if we lift ourselves up from the two-dimensional xy-plane, we encounter many such vectors, all of which lie along the z-axis. The foregoing example can be generalized to any rotation (normally specified by Euler angles). In fact, the methods developed in this section can be used to show that a general rotation, given by Euler angles, always has an unchanged vector lying along the axis around which the rotation takes place.
This concept is further generalized in the following definition.

4.3.1. Definition. A scalar λ is an eigenvalue and a nonzero vector |a⟩ is an eigenvector of the linear transformation A ∈ L(V) if

A|a⟩ = λ|a⟩. (4.5)

4.3.2. Proposition. Add the zero vector to the set of all eigenvectors of A belonging to the same eigenvalue λ, and denote the span of the resulting set by M_λ. Then M_λ is a subspace of V, and every (nonzero) vector in M_λ is an eigenvector of A with eigenvalue λ.

Proof. The proof follows immediately from the above definition and the definition of a subspace. □
4.3 EIGENVALUES AND EIGENVECTORS 115

eigenspace; spectrum

4.3.3. Definition. The subspace M_λ is referred to as the eigenspace of A corresponding to the eigenvalue λ. Its dimension is called the geometric multiplicity of λ. An eigenvalue is called simple if its geometric multiplicity is 1. The set of eigenvalues of A is called the spectrum of A.

By their very construction, eigenspaces corresponding to different eigenvalues have no vectors in common except the zero vector. This can be demonstrated by noting that if |v⟩ ∈ M_λ ∩ M_μ for λ ≠ μ, then

0 = (A − λ1)|v⟩ = A|v⟩ − λ|v⟩ = μ|v⟩ − λ|v⟩ = (μ − λ)|v⟩,

and since μ − λ ≠ 0, it follows that |v⟩ = 0.

Let us rewrite Equation (4.5) as (A − λ1)|a⟩ = 0. This equation says that |a⟩ is an eigenvector of A if and only if |a⟩ belongs to the kernel of A − λ1. If the latter is invertible, then its kernel will consist of only the zero vector, which is not acceptable as a solution of Equation (4.5). Thus, if we are to obtain nontrivial solutions, A − λ1 must have no inverse. This is true if and only if

det(A − λ1) = 0. (4.6)

characteristic polynomial and characteristic roots of an operator

The determinant in Equation (4.6) is a polynomial in λ, called the characteristic polynomial of A. The roots of this polynomial are called characteristic roots and are simply the eigenvalues of A. Now, any polynomial of degree greater than or equal to 1 has at least one (complex) root, which yields the following theorem.

4.3.4. Theorem. Every operator on a finite-dimensional vector space over ℂ has at least one eigenvalue and therefore at least one eigenvector.

Let λ1, λ2, ..., λp be the distinct roots of the characteristic polynomial of A, and let λj occur mj times.³ Then

det(A − λ1) = (λ1 − λ)^m1 ··· (λp − λ)^mp = ∏_{j=1}^p (λj − λ)^mj. (4.7)

For λ = 0, this gives

det A = λ1^m1 λ2^m2 ··· λp^mp = ∏_{j=1}^p λj^mj. (4.8)

Equation (4.8) states that the determinant of an operator is the product of all its eigenvalues. In particular, if one of the eigenvalues is zero, then the operator is not invertible.

³mj is called the algebraic multiplicity of λj.
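Equation (4.8) is easy to confirm numerically: the determinant of a matrix equals the product of its eigenvalues, counted with algebraic multiplicity. A quick sketch with an arbitrary matrix:

```python
import numpy as np

# An arbitrary (non-symmetric) matrix
A = np.array([
    [2.0, 1.0, 0.0],
    [0.0, 3.0, 1.0],
    [1.0, 0.0, 1.0],
])

eigenvalues = np.linalg.eigvals(A)

# det A = product of the eigenvalues with multiplicities, Eq. (4.8)
assert np.isclose(np.prod(eigenvalues), np.linalg.det(A))
```

The same check works for any square matrix, since (4.8) is basis independent.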
116 4. SPECTRAL DECOMPOSITION

4.3.5. Example. Let us find the eigenvalues of a projection operator P. If |a⟩ is an eigenvector, then P|a⟩ = λ|a⟩. Applying P on both sides again, we obtain

P²|a⟩ = λP|a⟩ = λ(λ|a⟩) = λ²|a⟩.

But P² = P; thus, P|a⟩ = λ²|a⟩. It follows that λ²|a⟩ = λ|a⟩, or λ(λ − 1)|a⟩ = 0. Since |a⟩ ≠ 0, we must have λ(λ − 1) = 0, or λ = 0, 1. Thus, the only eigenvalues of a projection operator are 0 and 1. The presence of zero as an eigenvalue of P is an indication that P is not invertible. ∎

4.3.6. Example. To be able to see the difference between algebraic and geometric multiplicities, consider the matrix A = (1, 1; 0, 1), whose characteristic polynomial is (1 − λ)². Thus, the matrix has only one eigenvalue, λ = 1, with algebraic multiplicity m1 = 2. However, the most general vector |a⟩ satisfying (A − 1)|a⟩ = 0 is easily shown to be of the form (α; 0). This shows that M_{λ=1} is one-dimensional, i.e., the geometric multiplicity of λ is 1. ∎

diagonalizable operators

As mentioned at the beginning of this chapter, it is useful to represent an operator by as simple a matrix as possible. The simplest matrix is a diagonal matrix. This motivates the following definition:

4.3.7. Definition. A linear operator A on a vector space V is said to be diagonalizable if there is a basis for V all of whose vectors are eigenvectors of A.

4.3.8. Theorem. Let A be a diagonalizable operator on a vector space V with distinct eigenvalues {λj}_{j=1}^r. There are (not necessarily hermitian) projection operators Pj on V such that

(1) 1 = Σ_{j=1}^r Pj, (2) PiPj = 0 for i ≠ j, (3) A = Σ_{j=1}^r λjPj.
Proof. Let M_j denote the eigenspace corresponding to the eigenvalue λj. Since the eigenvectors span V and the only common vector of two different eigenspaces is the zero vector (see the comments after Definition 4.3.3), we have

V = M_1 ⊕ M_2 ⊕ ··· ⊕ M_r.

This immediately gives (1) and (2) if we use Equations (4.1) and (4.2), where Pj is the projection operator onto M_j. To prove (3), let |v⟩ be an arbitrary vector in V. Then |v⟩ can be written uniquely as a sum of vectors each coming from one eigenspace: |v⟩ = Σ_j |vj⟩ with |vj⟩ ∈ M_j. Therefore,

A|v⟩ = Σ_j A|vj⟩ = Σ_j λj|vj⟩ = (Σ_j λjPj)|v⟩.

Since this equality holds for all vectors |v⟩, (3) follows. □
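Theorem 4.3.8 can be illustrated numerically. If S holds the eigenvectors of a diagonalizable matrix in its columns, the (generally non-hermitian) projections P_j = S E_j S⁻¹, where E_j selects the columns belonging to λ_j, satisfy the three properties of the theorem. A sketch with an arbitrary diagonalizable but non-normal matrix:

```python
import numpy as np

# An arbitrary diagonalizable, non-normal matrix (eigenvalues 3 and 2)
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

lam, S = np.linalg.eig(A)      # columns of S are eigenvectors
S_inv = np.linalg.inv(S)

# P_j projects onto the j-th eigenspace along the other eigenspaces
projections = []
for j in range(len(lam)):
    E = np.zeros_like(A)
    E[j, j] = 1.0
    projections.append(S @ E @ S_inv)

P1, P2 = projections
assert np.allclose(P1 + P2, np.eye(2))          # (1) completeness
assert np.allclose(P1 @ P2, np.zeros((2, 2)))   # (2) orthogonality
assert np.allclose(lam[0]*P1 + lam[1]*P2, A)    # (3) spectral form
```

Because A is not normal, P1 and P2 here are not hermitian, exactly as the theorem allows.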
4.4 SPECTRAL DECOMPOSITION 117

4.4 Spectral Decomposition

This section derives one of the most powerful theorems in the theory of linear operators, the spectral decomposition theorem. We shall derive the theorem for operators that generalize hermitian and unitary operators.

normal operator defined

4.4.1. Definition. A normal operator is an operator on an inner product space that commutes with its adjoint.

An important consequence of this definition is that

‖A|x⟩‖ = ‖A†|x⟩‖ for all |x⟩ if and only if A is normal. (4.9)

4.4.2. Proposition. Let A be a normal operator on V. Then |x⟩ is an eigenvector of A with eigenvalue λ if and only if |x⟩ is an eigenvector of A† with eigenvalue λ*.

Proof. By Equation (4.9), the fact that (A − λ1)† = A† − λ*1, and the fact that A − λ1 is normal (reader, verify), we have ‖(A − λ1)x‖ = 0 if and only if ‖(A† − λ*1)x‖ = 0. Since it is only the zero vector that has zero norm, we get (A − λ1)|x⟩ = 0 if and only if (A† − λ*1)|x⟩ = 0. This proves the proposition. □

We obtain a useful consequence of this proposition by applying it to a hermitian operator H and a unitary operator⁴ U. In the first case, we get

λ|x⟩ = H|x⟩ = H†|x⟩ = λ*|x⟩ ⟹ λ = λ*.

Therefore, λ is real. In the second case, we write

|x⟩ = 1|x⟩ = U†U|x⟩ = U†(λ|x⟩) = λU†|x⟩ = λλ*|x⟩ ⟹ λλ* = 1.

Therefore, λ is unimodular (has absolute value equal to 1). We summarize the foregoing discussion:

4.4.3. Corollary. The eigenvalues of a hermitian operator are real. The eigenvalues of a unitary operator have unit absolute value.

4.4.4. Example. Let us find the eigenvalues and eigenvectors of the hermitian matrix H = (0, −i; i, 0). We have

det(H − λ1) = det(−λ, −i; i, −λ) = λ² − 1 = 0.

Thus, the eigenvalues, λ1 = 1 and λ2 = −1, are real, as expected.

⁴Obviously, both are normal operators.
118 4. SPECTRAL DECOMPOSITION

To find the eigenvectors, we write

(H − 1)|a⟩ = (−1, −i; i, −1)(α1; α2) = (−α1 − iα2; iα1 − α2) = 0,

or α2 = iα1, which gives

|a1⟩ = (α1; iα1) = α1(1; i),

where α1 is an arbitrary complex number. Also,

(H + 1)|a⟩ = (1, −i; i, 1)(β1; β2) = (β1 − iβ2; iβ1 + β2) = 0,

or β2 = −iβ1, which gives |a2⟩ = (β1; −iβ1) = β1(1; −i), where β1 is an arbitrary complex number.

Always normalize the eigenvectors!

It is desirable, in most situations, to orthonormalize the eigenvectors. In the present case, they are already orthogonal. This is a property shared by all eigenvectors of a hermitian (in fact, normal) operator, stated in the next theorem. We therefore need merely to normalize the eigenvectors:

1 = ⟨a1|a1⟩ = |α1|²(1, −i)(1; i) = 2|α1|²,

or |α1| = 1/√2 and α1 = e^{iφ}/√2 for some φ ∈ ℝ. A similar result is obtained for β1. The choice φ = 0 yields

|e1⟩ = (1/√2)(1; i) and |e2⟩ = (1/√2)(1; −i). ∎

The following theorem proves for all normal operators the orthogonality property of their eigenvectors illustrated in the example above for a simple hermitian operator.

4.4.5. Theorem. An eigenspace of a normal operator reduces that operator. Moreover, eigenspaces of a normal operator are mutually orthogonal.

Proof. The first part of the theorem is a trivial consequence of Proposition 4.4.2 and Theorem 4.2.3. To prove the second part, let |u⟩ ∈ M_λ and |v⟩ ∈ M_μ with λ ≠ μ. Then, using Proposition 4.4.2 once more, we obtain

λ⟨v|u⟩ = ⟨v|Au⟩ = ⟨A†v|u⟩ = ⟨μ*v|u⟩ = μ⟨v|u⟩.

It follows that (λ − μ)⟨v|u⟩ = 0, and since λ ≠ μ, ⟨v|u⟩ = 0. □

spectral decomposition theorem

4.4.6. Theorem. (Spectral Decomposition Theorem) Let A be a normal operator on a finite-dimensional complex inner product space V. Let λ1, λ2, ..., λr be its distinct eigenvalues. Then there exist nonzero (hermitian) projection operators P1, P2, ..., Pr such that

1. Σ_{i=1}^r Pi = 1,
2. PiPj = 0 for all i ≠ j,
4.4 SPECTRAL DECOMPOSITION 119

3. Σ_{i=1}^r λiPi = A.

Proof. Let Pi be the operator that projects onto the eigenspace Mi corresponding to the eigenvalue λi. By the comments after Proposition 4.1.6, these projection operators are hermitian. Because of Theorem 4.4.5, the only vector common to any two distinct eigenspaces is the zero vector. So it makes sense to talk about the direct sum of these eigenspaces. Let M = M1 ⊕ M2 ⊕ ··· ⊕ Mr and P = Σ_{i=1}^r Pi, where P is the orthogonal projection operator onto M. Since A commutes with every Pi (Theorem 4.2.5), it commutes with P. Hence, by Theorem 4.2.5, M reduces A; i.e., M⊥ is also invariant under A. Now regard the restriction of A to M⊥ as an operator in its own right on the finite-dimensional vector space M⊥. Theorem 4.3.4 now forces A to have at least one eigenvector in M⊥. But this is impossible, because all eigenvectors of A have been accounted for in its eigenspaces. The only resolution is for M⊥ to be zero. This gives

M = V and P = Σ_{i=1}^r Pi = 1.

The second equation follows from the first and Equations (4.1) and (4.2). The remaining part of the theorem follows from arguments similar to those used in the proof of Theorem 4.3.8. □

We can now establish the connection between the diagonalizability of a normal operator and the spectral theorem. In each subspace Mj we choose an orthonormal basis. The union of all these bases is clearly a basis for the whole space V. Let us label these basis vectors |e_j^i⟩, where the subscript indicates the subspace and the superscript indicates the particular vector in that subspace. Clearly, ⟨e_j^i|e_{j'}^{i'}⟩ = δ_{jj'}δ_{ii'}, and P_j = Σ_{i=1}^{mj} |e_j^i⟩⟨e_j^i|. Noting that P_k|e_j^i⟩ = δ_{kj}|e_j^i⟩, we can obtain the matrix elements of A in such a basis:

⟨e_j^i|A|e_k^l⟩ = ⟨e_j^i|(Σ_m λmPm)|e_k^l⟩ = λk⟨e_j^i|e_k^l⟩ = λk δ_{jk} δ_{il}.

Only the diagonal elements are nonzero. We note that for each subscript j we have mj orthonormal vectors |e_j^i⟩, where mj is the dimension of Mj. Thus, λj occurs mj times as a diagonal element. Therefore, in such an orthonormal basis, A will be represented by

diag(λ1, ..., λ1, λ2, ..., λ2, ..., λr, ..., λr),

in which λj appears mj times. Let us summarize the foregoing discussion:

4.4.7. Corollary. If A ∈ L(V) is normal, then V has an orthonormal basis consisting of eigenvectors of A. Therefore, a normal operator on a complex inner product space is diagonalizable.
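The construction in the proof (hermitian projections onto the eigenspaces that sum to the identity and reassemble the operator) can be sketched numerically for a hermitian matrix with a degenerate eigenvalue; the matrix below is an arbitrary illustration:

```python
import numpy as np

# A hermitian matrix with eigenvalues 1 (multiplicity 2) and 3
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])

lam, V = np.linalg.eigh(A)   # orthonormal eigenvectors in columns

# Group eigenvectors by eigenvalue and build hermitian projections P_i
projections = {}
for value, vec in zip(np.round(lam, 10), V.T):
    P = np.outer(vec, vec.conj())
    projections[value] = projections.get(value, 0) + P

# 1. completeness, 2. mutual orthogonality, 3. spectral decomposition
assert np.allclose(sum(projections.values()), np.eye(3))
values = list(projections)
assert np.allclose(projections[values[0]] @ projections[values[1]], 0)
assert np.allclose(sum(v * P for v, P in projections.items()), A)
```

The projection belonging to the degenerate eigenvalue has rank 2, matching the dimension of its eigenspace.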
120 4. SPECTRAL DECOMPOSITION

Computation of the largest and the smallest eigenvalues of a normal operator

Using this corollary, the reader may show the following:

4.4.8. Corollary. A hermitian operator is positive if and only if all its eigenvalues are positive.

4.4.9. Example. COMPUTATION OF LARGEST AND SMALLEST EIGENVALUES

There is an elegant technique that yields the largest and the smallest (in absolute value) eigenvalues of a normal operator A in a straightforward way if the eigenspaces of these eigenvalues are one-dimensional. For convenience, assume that the eigenvalues are labeled in order of decreasing absolute value: |λ1| > |λ2| > ··· > |λr| ≠ 0. Let {|ak⟩}_{k=1}^N be a basis of V consisting of eigenvectors of A, and |x⟩ = Σ_{k=1}^N ξk|ak⟩ an arbitrary vector in V. Then

A^m|x⟩ = Σ_{k=1}^N ξk A^m|ak⟩ = Σ_{k=1}^N ξk λk^m|ak⟩ = λ1^m [ξ1|a1⟩ + Σ_{k=2}^N ξk (λk/λ1)^m |ak⟩].

In the limit m → ∞, the summation in the brackets vanishes. Therefore,

A^m|x⟩ ≈ λ1^m ξ1|a1⟩ and ⟨y|A^m|x⟩ ≈ λ1^m ξ1⟨y|a1⟩

for any |y⟩ ∈ V. Taking the ratio of this equation and the corresponding one for m + 1, we obtain

lim_{m→∞} ⟨y|A^{m+1}|x⟩ / ⟨y|A^m|x⟩ = λ1.

Note how crucially this relation depends on the fact that λ1 is nondegenerate, i.e., that M1 is one-dimensional. By taking larger and larger values for m, we can obtain a better and better approximation to the largest eigenvalue.

Assuming that zero is not the smallest eigenvalue λr of A (and therefore not an eigenvalue at all), we can find the smallest eigenvalue by replacing A with A⁻¹ and λ1 with 1/λr. The details are left as an exercise for the reader. ∎

A hermitian matrix can be diagonalized by a unitary matrix.

Any given hermitian matrix H can be thought of as the representation of a hermitian operator in the standard orthonormal basis. We can find a unitary matrix U that transforms the standard basis to the orthonormal basis consisting of |ej⟩, the eigenvectors of the hermitian operator. The representation of the hermitian operator in the new basis is UHU†, as discussed in Section 3.3. However, the above argument showed that the new matrix is diagonal.
We therefore have the following result.

4.4.10. Corollary. A hermitian matrix can always be brought to diagonal form by means of a unitary transformation matrix.

4.4.11. Example. Let us consider the diagonalization of the hermitian matrix

H = ⎛ 0      0     −1+i   −1−i ⎞
    ⎜ 0      0     −1+i    1+i ⎟
    ⎜−1−i   −1−i    0      0   ⎟
    ⎝−1+i    1−i    0      0   ⎠

4.4 SPECTRAL DECOMPOSITION 121

The characteristic polynomial is det(H − λ1) = (λ + 2)²(λ − 2)². Thus, λ1 = −2 with multiplicity m1 = 2, and λ2 = 2 with multiplicity m2 = 2.

To find the eigenvectors, we first look at the matrix equation (H + 2·1)|a⟩ = 0, or

⎛ 2      0     −1+i   −1−i ⎞ ⎛α1⎞
⎜ 0      2     −1+i    1+i ⎟ ⎜α2⎟ = 0.
⎜−1−i   −1−i    2      0   ⎟ ⎜α3⎟
⎝−1+i    1−i    0      2   ⎠ ⎝α4⎠

This is a system of linear equations whose "solution" is

α3 = ½(1 + i)(α1 + α2), α4 = ½(1 − i)(α1 − α2).

We have two arbitrary parameters, so we expect two linearly independent solutions. For the two choices α1 = 2, α2 = 0 and α1 = 0, α2 = 2, we obtain, respectively,

|a1⟩ = (2; 0; 1+i; 1−i) and |a2⟩ = (0; 2; 1+i; −1+i),

which happen to be orthogonal. We simply normalize them to obtain

|e1⟩ = (1/(2√2))(2; 0; 1+i; 1−i) and |e2⟩ = (1/(2√2))(0; 2; 1+i; −1+i).

Similarly, the second eigenvalue equation, (H − 2·1)|a⟩ = 0, gives rise to the conditions α3 = −½(1 + i)(α1 + α2) and α4 = −½(1 − i)(α1 − α2), which produce the orthonormal vectors

|e3⟩ = (1/(2√2))(2; 0; −1−i; −1+i) and |e4⟩ = (1/(2√2))(0; 2; −1−i; 1−i).

The unitary matrix that diagonalizes H can be constructed from these column vectors using the remarks before Example 3.4.4, which imply that if we simply put the vectors |ei⟩ together as columns, the resulting matrix is U†:

U† = (1/(2√2)) ⎛2     0     2     0   ⎞
               ⎜0     2     0     2   ⎟
               ⎜1+i   1+i  −1−i  −1−i ⎟
               ⎝1−i  −1+i  −1+i   1−i ⎠
122 4. SPECTRAL DECOMPOSITION

and the unitary matrix will be

U = (U†)† = (1/(2√2)) ⎛2   0   1−i    1+i ⎞
                      ⎜0   2   1−i   −1−i ⎟
                      ⎜2   0  −1+i   −1−i ⎟
                      ⎝0   2  −1+i    1+i ⎠

We can easily check that U diagonalizes H, i.e., that UHU† is diagonal.

application of diagonalization in electromagnetism

4.4.12. Example. In some physical applications the ability to diagonalize matrices can be very useful. As a simple but illustrative example, let us consider the motion of a charged particle in a constant magnetic field pointing in the z direction. The equation of motion for such a particle is

m dv/dt = qv × B = q det(êx, êy, êz; vx, vy, vz; 0, 0, B),

which in component form becomes

dvx/dt = (qB/m)vy, dvy/dt = −(qB/m)vx, dvz/dt = 0.

Ignoring the uniform motion in the z direction, we need to solve the first two coupled equations, which in matrix form become

d/dt (vx; vy) = −iω(0, i; −i, 0)(vx; vy), (4.10)

where we have introduced a factor of i to render the matrix hermitian, and defined ω = qB/m. If the 2 × 2 matrix were diagonal, we would get two uncoupled equations, which we could solve easily. Diagonalizing the matrix involves finding a matrix R such that

R(0, i; −i, 0)R† = (γ1, 0; 0, γ2).

If we could do such a diagonalization, we would multiply (4.10) by R to get⁵

d/dt [R(vx; vy)] = −iωR(0, i; −i, 0)R†R(vx; vy),

which can be written as

d/dt (v'x; v'y) = −iω(γ1, 0; 0, γ2)(v'x; v'y), where (v'x; v'y) ≡ R(vx; vy).

⁵The fact that R is independent of t is crucial in this step. This fact, in turn, is a consequence of the independence from t of the original 2 × 2 matrix.

4.4 SPECTRAL DECOMPOSITION 123

We then would have a pair of uncoupled equations that have v'x = v'0x e^{−iγ1ωt} and v'y = v'0y e^{−iγ2ωt} as a solution set, in which v'0x and v'0y are integration constants.

To find R, we need the normalized eigenvectors of (0, i; −i, 0). But these are obtained in precisely the same fashion as in Example 4.4.4. There is, however, an arbitrariness in the solutions due to the choice in numbering the eigenvalues. If we choose the normalized eigenvectors

|e1⟩ = (1/√2)(i; 1), |e2⟩ = (1/√2)(−i; 1),

then from the comments at the end of Section 3.3, we get

R⁻¹ = R† = (1/√2)(i, −i; 1, 1), so that R = (R†)† = (1/√2)(−i, 1; i, 1),

with γ1 = 1 = −γ2. Having found R†, we can write

(vx; vy) = R†(v'x; v'y) = (1/√2)(i, −i; 1, 1)(v'0x e^{−iωt}; v'0y e^{iωt}). (4.11)

If the x and y components of the velocity at t = 0 are v0x and v0y, respectively, then

(v'0x; v'0y) = R(v0x; v0y) = (1/√2)(−iv0x + v0y; iv0x + v0y).

Substituting in (4.11), we obtain

(vx; vy) = ½(i, −i; 1, 1)((−iv0x + v0y)e^{−iωt}; (iv0x + v0y)e^{iωt}) = (v0x cos ωt + v0y sin ωt; −v0x sin ωt + v0y cos ωt).

This gives the velocity as a function of time. Antidifferentiating once with respect to time yields the position vector. ∎

simultaneous diagonalization defined

In many situations of physical interest, it is desirable to know whether two operators are simultaneously diagonalizable. For instance, if there exists a basis of the Hilbert space of a quantum-mechanical system consisting of simultaneous eigenvectors of two operators, then one can measure those two operators at the same time. In particular, they are not restricted by an uncertainty relation.

4.4.13. Definition. Two operators are said to be simultaneously diagonalizable if they can be written in terms of the same set of projection operators, as in Theorem 4.4.6.
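The closed-form velocity found in Example 4.4.12 is a clockwise rotation of the initial velocity, and it should satisfy the coupled equations dvx/dt = ωvy and dvy/dt = −ωvx. A small numerical check (ω and the initial data are arbitrary choices):

```python
import numpy as np

omega = 2.0                   # arbitrary value of qB/m
v0 = np.array([1.0, 0.5])     # arbitrary initial velocity (v0x, v0y)

def v(t):
    """Velocity from Example 4.4.12: a clockwise rotation of v0."""
    c, s = np.cos(omega * t), np.sin(omega * t)
    return np.array([c * v0[0] + s * v0[1],
                     -s * v0[0] + c * v0[1]])

# Verify dvx/dt = omega*vy and dvy/dt = -omega*vx by central differences
t, h = 0.8, 1e-6
dv = (v(t + h) - v(t - h)) / (2 * h)
ok1 = np.isclose(dv[0], omega * v(t)[1], atol=1e-5)
ok2 = np.isclose(dv[1], -omega * v(t)[0], atol=1e-5)
```

The finite-difference derivative matches the right-hand side of the equation of motion, confirming the solution obtained by diagonalization.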
124 4. SPECTRAL DECOMPOSITION

This definition is consistent with the matrix representation of the two operators, because if we take the orthonormal basis B = {|ej⟩} discussed right after Theorem 4.4.6, we obtain diagonal matrices for both operators. What are the conditions under which two operators can be simultaneously diagonalized? Clearly, a necessary condition is that the two operators commute. This is an immediate consequence of the orthogonality of the projection operators, which trivially implies PiPj = PjPi for all i and j. It is also apparent in the matrix representation of the operators: Any two diagonal matrices commute. What about sufficiency? Is the commutativity of the two operators sufficient for them to be simultaneously diagonalizable? To answer this question, we need the following lemma:

4.4.14. Lemma. An operator T commutes with a normal operator A if and only if T commutes with all the projection operators of A.

Proof. The "if" part is trivial. To prove the "only if" part, suppose AT = TA, and let |x⟩ be any vector in one of the eigenspaces of A, say Mj. Then we have A(T|x⟩) = T(A|x⟩) = T(λj|x⟩) = λj(T|x⟩); i.e., T|x⟩ is in Mj, or Mj is invariant under T. Since Mj is arbitrary, T leaves all eigenspaces invariant. In particular, it leaves M⊥j, the orthogonal complement of Mj (the direct sum of all the remaining eigenspaces), invariant. By Theorems 4.2.3 and 4.2.5, TPj = PjT; and this holds for all j. □

necessary and sufficient condition for simultaneous diagonalizability

4.4.15. Theorem. A necessary and sufficient condition for two normal operators A and B to be simultaneously diagonalizable is [A, B] = 0.

Proof. As claimed above, the "necessity" is trivial. To prove the "sufficiency," let A = Σ_{j=1}^r λjPj and B = Σ_{k=1}^s μkQk, where {λj} and {Pj} are the eigenvalues and projections of A, and {μk} and {Qk} are those of B. Assume [A, B] = 0. Then by Lemma 4.4.14, AQk = QkA. Since Qk commutes with A, it must commute with the latter's projection operators: PjQk = QkPj. To complete the proof, define Rjk ≡ PjQk, and note that

R†jk = (PjQk)† = Q†kP†j = QkPj = PjQk = Rjk,
(Rjk)² = (PjQk)² = PjQkPjQk = PjPjQkQk = PjQk = Rjk.

Therefore, the Rjk are hermitian projection operators. Furthermore,

Σ_{j=1}^r Rjk = Σ_{j=1}^r PjQk = (Σ_{j=1}^r Pj)Qk = Qk,

because Σ_{j=1}^r Pj = 1. Similarly, Σ_{k=1}^s Rjk = Σ_{k=1}^s PjQk = Pj Σ_{k=1}^s Qk = Pj. We can now write A and B as

A = Σ_{j=1}^r λjPj = Σ_{j=1}^r Σ_{k=1}^s λjRjk,
B = Σ_{k=1}^s μkQk = Σ_{k=1}^s Σ_{j=1}^r μkRjk.
4.5 FUNCTIONS OF OPERATORS 125

By definition, they are simultaneously diagonalizable. □

spectral decomposition of a Pauli spin matrix

4.4.16. Example. Let us find the spectral decomposition of the Pauli spin matrix

σ2 = (0, −i; i, 0).

The eigenvalues and eigenvectors have been found in Example 4.4.4. These are

λ1 = 1, |e1⟩ = (1/√2)(1; i) and λ2 = −1, |e2⟩ = (1/√2)(1; −i).

The subspaces M_{λj} are one-dimensional; therefore,

P1 = |e1⟩⟨e1| = ½(1; i)(1, −i) = ½(1, −i; i, 1),
P2 = |e2⟩⟨e2| = ½(1; −i)(1, i) = ½(1, i; −i, 1).

We can check that P1 + P2 = (1, 0; 0, 1) and

λ1P1 + λ2P2 = ½(1, −i; i, 1) − ½(1, i; −i, 1) = (0, −i; i, 0) = σ2. ∎

What restrictions are to be imposed on the most general operator T to make it diagonalizable? We saw in Chapter 2 that T can be written in terms of its so-called Cartesian components as T = H + iH', where both H and H' are hermitian and can therefore be decomposed according to Theorem 4.4.6. Can we conclude that T is also decomposable? No, because the projection operators used in the decomposition of H may not be the same as those used for H'. However, if H and H' are simultaneously diagonalizable, such that

H = Σ_{k=1}^r λkPk and H' = Σ_{k=1}^r λ'kPk, (4.12)

then T = Σ_{k=1}^r (λk + iλ'k)Pk. It follows that T has a spectral decomposition, and therefore is diagonalizable. Theorem 4.4.15 now implies that H and H' must commute. Since H = ½(T + T†) and H' = (1/2i)(T − T†), we have [H, H'] = 0 if and only if [T, T†] = 0; i.e., T is normal. We thus get back to the condition with which we started the whole discussion of spectral decomposition in this section.

4.5 Functions of Operators

Functions of transformations were discussed in Chapter 2. With the power of spectral decomposition at our disposal, we can draw many important conclusions about them.
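Theorem 4.4.15 is easy to probe numerically: two commuting hermitian matrices are diagonalized by one common unitary. In the sketch below, the matrices are arbitrary commuting examples built from a shared eigenbasis, and the eigenvectors of a generic combination A + 2B diagonalize both:

```python
import numpy as np

# Build two commuting hermitian matrices from a shared eigenbasis
M = np.array([[1.0, 2.0], [0.0, 1.0]])
U, _ = np.linalg.qr(M + 1j * np.eye(2))   # some unitary matrix
A = U @ np.diag([1.0, 3.0]) @ U.conj().T
B = U @ np.diag([5.0, 2.0]) @ U.conj().T

assert np.allclose(A @ B, B @ A)          # [A, B] = 0

# Eigenvectors of a generic combination diagonalize both A and B
_, V = np.linalg.eigh(A + 2.0 * B)
DA = V.conj().T @ A @ V
DB = V.conj().T @ B @ V
off = lambda D: D - np.diag(np.diag(D))
assert np.allclose(off(DA), 0) and np.allclose(off(DB), 0)
```

Using a combination such as A + 2B sidesteps the degeneracy issue: for a generic coefficient its spectrum is nondegenerate, so its eigenbasis is forced to be the common one.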
126 4. SPECTRAL DECOMPOSITION

First, we note that if T = Σ_{i=1}^r λiPi, then, because of the orthogonality of the Pi's,

T² = Σ_{i=1}^r λi²Pi, ..., Tⁿ = Σ_{i=1}^r λiⁿPi.

Thus, any polynomial p in T has a spectral decomposition given by p(T) = Σ_{i=1}^r p(λi)Pi. Generalizing this to functions expandable in power series gives

f(T) = Σ_{i=1}^r f(λi)Pi. (4.13)

4.5.1. Example. Let us investigate the spectral decomposition of the following unitary (actually orthogonal) matrix:

U = (cos θ, −sin θ; sin θ, cos θ).

We find the eigenvalues:

det(U − λ1) = det(cos θ − λ, −sin θ; sin θ, cos θ − λ) = λ² − 2λ cos θ + 1 = 0,

yielding λ1 = e^{−iθ} and λ2 = e^{iθ}. For λ1 we have (reader, provide the missing steps)

(cos θ − e^{−iθ}, −sin θ; sin θ, cos θ − e^{−iθ})(α1; α2) = 0 ⟹ α2 = iα1 ⟹ |e1⟩ = (1/√2)(1; i),

and for λ2,

(cos θ − e^{iθ}, −sin θ; sin θ, cos θ − e^{iθ})(α1; α2) = 0 ⟹ α2 = −iα1 ⟹ |e2⟩ = (1/√2)(1; −i).

We note that the M_{λj} are one-dimensional and spanned by the |ej⟩. Thus,

P1 = |e1⟩⟨e1| = ½(1, −i; i, 1), P2 = |e2⟩⟨e2| = ½(1, i; −i, 1).

Clearly, P1 + P2 = 1, and

e^{−iθ}P1 + e^{iθ}P2 = ½(e^{−iθ} + e^{iθ}, −ie^{−iθ} + ie^{iθ}; ie^{−iθ} − ie^{iθ}, e^{−iθ} + e^{iθ}) = (cos θ, −sin θ; sin θ, cos θ) = U. ∎

If we take the natural log of this equation and use Equation (4.13), we obtain

ln U = ln(e^{−iθ})P1 + ln(e^{iθ})P2 = −iθP1 + iθP2 = i(−θP1 + θP2) ≡ iH, (4.14)

where H ≡ −θP1 + θP2 is a hermitian operator because θ is real and P1 and P2 are hermitian. Inverting Equation (4.14) gives U = e^{iH}, where

H = θ(−P1 + P2) = θ(0, i; −i, 0).
4.5 FUNCTIONS OF OPERATORS 127

The square root of an operator is plagued by multivaluedness. In the real numbers, we have only two-valuedness!

The example above shows that the unitary 2 × 2 matrix U can be written as an exponential of an anti-hermitian operator. This is a general result. In fact, we have the following theorem, whose proof is left as an exercise for the reader (see Problem 4.22).

4.5.2. Theorem. A unitary operator U on a finite-dimensional complex inner product space can be written as U = e^{iH}, where H is hermitian. Furthermore, a unitary matrix can be brought to diagonal form by a unitary transformation matrix.

The last statement follows from Corollary 4.4.10 and the fact that

U f(H) U† = f(UHU†)

for any function f that can be expanded in a Taylor series.

A useful function of an operator is its square root. A natural way to define the square root of an operator A = Σ_{i=1}^r λiPi is √A = Σ_{i=1}^r (±√λi)Pi. This clearly gives many candidates for the root, because each term in the sum can have either the plus sign or the minus sign.

4.5.3. Definition. The positive square root of a positive operator A = Σ_{i=1}^r λiPi is √A = Σ_{i=1}^r √λi Pi.

The uniqueness of the spectral decomposition implies that the positive square root of a positive operator is unique.

4.5.4. Example. Let us evaluate √A where

A = (5, 3i; −3i, 5).

First, we have to spectrally decompose A. Its characteristic equation is λ² − 10λ + 16 = 0, with roots λ1 = 8 and λ2 = 2. Since both eigenvalues are positive and A is hermitian, we conclude that A is indeed positive (Corollary 4.4.8). We can also easily find its normalized eigenvectors:

|e1⟩ = (1/√2)(i; 1) and |e2⟩ = (1/√2)(−i; 1).

Thus,

P1 = |e1⟩⟨e1| = ½(1, i; −i, 1), P2 = |e2⟩⟨e2| = ½(1, −i; i, 1),

and

√A = √8 P1 + √2 P2 = (1/√2)(3, i; −i, 3).

We can easily check that (√A)² = A. ∎
128 4. SPECTRAL DECOMPOSITION

Intuitively, higher and higher powers of T, when acting on a few vectors of the space, eventually exhaust all vectors, and further increase in power will be a repetition of lower powers. This intuitive idea can be made more precise by looking at the projection operators. We have already seen that Tⁿ = Σ_{j=1}^r λjⁿPj, n = 1, 2, .... For various n's one can "solve" for Pj in terms of powers of T. Since there are only a finite number of Pj's, only a finite number of powers of T will suffice. In fact, we can explicitly construct the polynomial in T for Pj. If there is such a polynomial, by Equation (4.13) it must satisfy Pj = pj(T) = Σ_{k=1}^r pj(λk)Pk, where pj is some polynomial to be determined. By the orthogonality of the projection operators, pj(λk) must be zero unless k = j, in which case it must be 1. In other words, pj(λk) = δkj. Such a polynomial can be explicitly constructed:

pj(x) = ∏_{k=1, k≠j}^r (x − λk)/(λj − λk).

Therefore,

Pj = ∏_{k≠j} (T − λk·1)/(λj − λk), (4.15)

and we have the following result.

4.5.5. Proposition. Every function of a normal operator on a finite-dimensional vector space can be expressed as a polynomial. In fact, from Equations (4.13) and (4.15),

f(T) = Σ_{j=1}^r f(λj) ∏_{k≠j} (T − λk·1)/(λj − λk). (4.16)

4.5.6. Example. Let us write √A of the last example as a polynomial in A. We have

p1(A) = (A − λ2·1)/(λ1 − λ2) = (A − 2)/6, p2(A) = (A − λ1·1)/(λ2 − λ1) = −(A − 8)/6.

Substituting in Equation (4.16), we obtain

√A = √λ1 p1(A) + √λ2 p2(A) = (√8/6)(A − 2) − (√2/6)(A − 8) = (√2/6)A + (√8/3)·1.

The RHS is clearly a (first-degree) polynomial in A, and it is easy to verify that it is the matrix of √A obtained in the previous example. ∎
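Example 4.5.6 can be verified numerically: the first-degree polynomial (√2/6)A + (√8/3)·1 reproduces the positive square root of A obtained from the spectral decomposition. A short sketch:

```python
import numpy as np

A = np.array([[5.0, 3.0j],
              [-3.0j, 5.0]])

# Positive square root via the spectral decomposition (Definition 4.5.3)
lam, V = np.linalg.eigh(A)
sqrtA = V @ np.diag(np.sqrt(lam)) @ V.conj().T

# The same operator as a polynomial in A, Example 4.5.6
sqrtA_poly = (np.sqrt(2) / 6) * A + (np.sqrt(8) / 3) * np.eye(2)

assert np.allclose(sqrtA, sqrtA_poly)
assert np.allclose(sqrtA @ sqrtA, A)
```

Both constructions give (1/√2)(3, i; −i, 3), the matrix found in Example 4.5.4.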
4.6 POLAR DECOMPOSITION 129

4.6 Polar Decomposition

We have seen many similarities between operators and complex numbers. For instance, hermitian operators behave very much like the real numbers: they have real eigenvalues; their squares are positive; every operator can be written as H + iH', where both H and H' are hermitian; and so forth. Also, unitary operators can be written as e^{iH}, where H is hermitian. So unitary operators are the analogue of complex numbers of unit magnitude such as e^{iθ}. A general complex number can be written as re^{iθ}. Can we write an arbitrary operator in an analogous way? The following theorem provides the answer.

polar decomposition theorem

4.6.1. Theorem. (Polar Decomposition Theorem) An operator A on a finite-dimensional complex inner product space can be written as A = UR, where R is a (unique) positive operator and U a unitary operator. If A is invertible, then U is also unique.

Proof. We will prove the theorem for the case where the operator is invertible. The proof of the general case can be found in books on linear algebra (such as [Halm 58]). The reader may show that the operator A†A is positive. Therefore, it has a unique positive square root R. We let V = RA⁻¹, or VA = R. Then

VV† = RA⁻¹(RA⁻¹)† = RA⁻¹(A⁻¹)†R† = R(A†A)⁻¹R† = R(R²)⁻¹R† = R(R†R)⁻¹R† = RR⁻¹(R†)⁻¹R† = 1,

and similarly V†V = 1, so V is indeed a unitary operator. Now choose U = V† to get the desired decomposition.

To prove uniqueness, we note that UR = U'R' implies that R = U†U'R' and

R² = R†R = (U†U'R')†(U†U'R') = R'†U'†UU†U'R' = R'†R' = R'².

Since the positive transformation R² (or R'²) has only one positive square root, it follows that R = R'. If A is invertible, then so is R = U†A. Therefore,

UR = U'R ⟹ URR⁻¹ = U'RR⁻¹ ⟹ U = U',

and U is also unique. □

It is interesting to note that the positive definiteness of R and the nonuniqueness of U are the analogue of the positivity of r and the nonuniqueness of e^{iθ} in the polar representation of complex numbers: z = re^{iθ} = re^{i(θ+2nπ)} ∀n ∈ ℤ.

⁶It is important to pay attention to the order of the two operators: one decomposes A†A, not AA†.
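The construction used in the proof translates directly into a computation: form R as the positive square root of A†A, then set U = AR⁻¹. A sketch with an arbitrary invertible matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0 + 1.0j]])   # an arbitrary invertible matrix

# R = positive square root of A†A, via spectral decomposition
lam, V = np.linalg.eigh(A.conj().T @ A)
R = V @ np.diag(np.sqrt(lam)) @ V.conj().T

U = A @ np.linalg.inv(R)

assert np.allclose(U.conj().T @ U, np.eye(2))   # U is unitary
assert np.allclose(U @ R, A)                    # A = UR
```

Since A here is invertible, both U and R are unique, as the theorem asserts.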
130 4. SPECTRAL DECOMPOSITION

In practice, R is found by spectrally decomposing A†A and taking its positive square root.⁶ Once R is found, U can be calculated from the definition A = UR.

4.6.2. Example. Let us find the polar decomposition of

A = (−2i, √7; 0, 3).

We have

R² = A†A = (2i, 0; √7, 3)(−2i, √7; 0, 3) = (4, 2i√7; −2i√7, 16).

The eigenvalues and eigenvectors of R² are routinely found to be

λ1 = 18, λ2 = 2, |e1⟩ = (1/(2√2))(i; √7), |e2⟩ = (1/(2√2))(√7; i).

The projection matrices are

P1 = |e1⟩⟨e1| = (1/8)(1, i√7; −i√7, 7), P2 = |e2⟩⟨e2| = (1/8)(7, −i√7; i√7, 1).

Thus,

R = √18 P1 + √2 P2 = (√2/4)(5, i√7; −i√7, 11).

To find U, we note that det A is nonzero. Hence, A is invertible, which implies that R is also invertible. The inverse of R is

R⁻¹ = (√2/24)(11, −i√7; i√7, 5).

The unitary matrix is simply

U = AR⁻¹ = (1/24)(−15i√2, 3√14; 3i√14, 15√2).

It is left for the reader to verify that U is indeed unitary. ∎

4.7 Real Vector Spaces

The treatment so far in this chapter has focused on complex inner product spaces. The complex number system is far more complete than the real numbers. For example, in preparation for the proof of the spectral decomposition theorem, we used the existence of n roots of a polynomial of degree n over the complex field
4.7 REAL VECTOR SPACES 131

(this is the fundamental theorem of algebra). A polynomial over the reals, on the other hand, does not necessarily have all its roots in the real number system. It may therefore seem that vector spaces over the reals will not satisfy the useful theorems and results developed for complex spaces. However, through a process called complexification of a real vector space, in which an imaginary part is added to such a space, it is possible to prove (see, for example, [Halm 58]) practically all the results obtained for complex vector spaces. Only the results are given here.

4.7.1. Theorem. A real symmetric operator has a spectral decomposition as stated in Theorem 4.4.6.

This theorem is especially useful in applications of classical physics, which deal mostly with real vector spaces. A typical situation involves a vector that is related to another vector by a symmetric matrix. It is then convenient to find a coordinate system in which the two vectors are related in a simple manner. This involves diagonalizing the symmetric matrix by a rotation (a real orthogonal matrix). Theorem 4.7.1 reassures us that such a diagonalization is possible.

4.7.2. Example. For a system of N point particles constituting a rigid body, the total angular momentum L = Σ_{i=1}^N mi(ri × vi) is related to the angular frequency via

L = Σ_{i=1}^N mi[ri × (ω × ri)] = Σ_{i=1}^N mi[ω ri·ri − ri(ri·ω)],

or

(Lx; Ly; Lz) = (Ixx, Ixy, Ixz; Iyx, Iyy, Iyz; Izx, Izy, Izz)(ωx; ωy; ωz),

where

Ixx = Σ_{i=1}^N mi(ri² − xi²), Ixy = −Σ_{i=1}^N mi xi yi,
Iyy = Σ_{i=1}^N mi(ri² − yi²), Ixz = −Σ_{i=1}^N mi xi zi,
Izz = Σ_{i=1}^N mi(ri² − zi²), Iyz = −Σ_{i=1}^N mi yi zi,

with Ixy = Iyx, Ixz = Izx, and Iyz = Izy. The 3 × 3 matrix is denoted by I and is called the moment of inertia matrix. It is symmetric, and Theorem 4.7.1 permits its diagonalization by an orthogonal transformation (the counterpart of a unitary transformation in a real vector space). But an orthogonal transformation in three dimensions is merely a rotation of coordinates.⁷
Thus, Theorem 4.7.1 says that it is always possible to choose coordinate systems in which the moment of inertia matrix is diagonal. In such a coordinate system we have Lx = Ixxωx, Ly = Iyyωy, and Lz = Izzωz, simplifying the equations considerably.

Similarly, the kinetic energy of the rigid rotating body,

T = Σ_{i=1}^N ½mi vi² = Σ_{i=1}^N ½mi vi·(ω × ri) = Σ_{i=1}^N ½mi ω·(ri × vi) = ½ω·L = ½ωᵗIω,

⁷This is not entirely true! There are orthogonal transformations that are composed of a rotation followed by a reflection about the origin. See Example 3.5.8.
132 4. SPECTRAL DECOMPOSITION

which in general has off-diagonal terms involving Ixy and so forth, reduces to a simple form: T = ½Ixxωx² + ½Iyyωy² + ½Izzωz². ∎

4.7.3. Example. Another application of Theorem 4.7.1 is in the study of conic sections. The most general form of the equation of a conic section is

a1x² + a2y² + a3xy + a4x + a5y + a6 = 0,

where a1, ..., a6 are constants. If the coordinate axes coincide with the principal axes of the conic section, the xy term will be absent, and the equation of the conic section takes the familiar form. On geometrical grounds we have to be able to rotate the xy-coordinates to coincide with the principal axes. We shall do this using the ideas discussed in this chapter. First, we note that the general equation for a conic section can be written in matrix form as

(x y)(a1, a3/2; a3/2, a2)(x; y) + (a4 a5)(x; y) + a6 = 0.

The 2 × 2 matrix is symmetric and can therefore be diagonalized by means of an orthogonal matrix R. Then RᵗR = 1, and we can write

(x y)Rᵗ R(a1, a3/2; a3/2, a2)Rᵗ R(x; y) + (a4 a5)Rᵗ R(x; y) + a6 = 0.

Let

R(a1, a3/2; a3/2, a2)Rᵗ = (a'1, 0; 0, a'2), (x'; y') ≡ R(x; y), (a'4 a'5) ≡ (a4 a5)Rᵗ.

Then we get

(x' y')(a'1, 0; 0, a'2)(x'; y') + (a'4 a'5)(x'; y') + a6 = 0,

or

a'1x'² + a'2y'² + a'4x' + a'5y' + a6 = 0.

The cross term has disappeared. The orthogonal matrix R is simply a rotation. In fact, it rotates the original coordinate system to coincide with the principal axes of the conic section. ∎

4.7.4. Example. In this example we investigate conditions under which a multivariable function has a maximum or a minimum. A point a = (a1, a2, ..., an) ∈ ℝⁿ is a maximum (minimum) of a function f(x1, x2, ..., xn) ≡ f(r) if

∇f|_{xi=ai} = (∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn)|_{xi=ai} = 0
4.7 REAL VECTOR SPACES 133

and for small $x_i - a_i$, the difference $f(\mathbf{r}) - f(\mathbf{a})$ is negative (positive). To relate this difference to the topics of this section, write the Taylor expansion of the function around $\mathbf{a}$, keeping terms up to the second order:
$$f(\mathbf{r}) = f(\mathbf{a}) + \sum_{i=1}^{n}(x_i - a_i)\left(\frac{\partial f}{\partial x_i}\right)_{\mathbf{r}=\mathbf{a}} + \frac{1}{2}\sum_{i,j}(x_i - a_i)(x_j - a_j)\left(\frac{\partial^2 f}{\partial x_i\,\partial x_j}\right)_{\mathbf{r}=\mathbf{a}} + \cdots,$$
or, constructing a column vector $\boldsymbol{\delta}$ out of $\delta_i \equiv x_i - a_i$ and a symmetric matrix $D$ with elements $D_{ij}$ out of the second derivatives, we can write
$$f(\mathbf{r}) - f(\mathbf{a}) = \frac{1}{2}\boldsymbol{\delta}^t D\boldsymbol{\delta} + \cdots$$
because the first derivatives vanish. For $\mathbf{a}$ to be a minimum point of $f$, the RHS of the last equation must be positive for arbitrary $\boldsymbol{\delta}$. This means that $D$ must be a positive matrix.⁸ Thus, all its eigenvalues must be positive (Corollary 4.4.8). Similarly, we can show that for $\mathbf{a}$ to be a maximum point of $f$, $-D$ must be positive definite. This means that $D$ must have negative eigenvalues.

When we specialize the foregoing discussion to two dimensions, we obtain results that are familiar from calculus. For the function $f(x, y)$ to have a minimum, the eigenvalues of the matrix
$$\begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix}$$
must be positive. The characteristic polynomial
$$\det\begin{pmatrix} f_{xx} - \lambda & f_{xy} \\ f_{yx} & f_{yy} - \lambda \end{pmatrix} = 0$$
yields two eigenvalues:
$$\lambda_1 = \frac{f_{xx} + f_{yy} + \sqrt{(f_{xx} - f_{yy})^2 + 4f_{xy}^2}}{2}, \qquad \lambda_2 = \frac{f_{xx} + f_{yy} - \sqrt{(f_{xx} - f_{yy})^2 + 4f_{xy}^2}}{2}.$$
These eigenvalues will both be positive if $f_{xx} + f_{yy} > \sqrt{(f_{xx} - f_{yy})^2 + 4f_{xy}^2}$, and both negative if $f_{xx} + f_{yy} < -\sqrt{(f_{xx} - f_{yy})^2 + 4f_{xy}^2}$. Squaring these inequalities and simplifying yields
$$f_{xx}f_{yy} > f_{xy}^2,$$

⁸Note that $D$ is already symmetric—the real analogue of hermitian.
134 4. SPECTRAL DECOMPOSITION

which shows that $f_{xx}$ and $f_{yy}$ must have the same sign. If they are both positive (negative), we have a minimum (maximum). This is the familiar condition for the attainment of extrema by a function of two variables. ∎

Although the establishment of spectral decomposition for symmetric operators is fairly straightforward, the case of orthogonal operators (the counterpart of unitary operators in a real vector space) is more complicated. In fact, we have already seen in Example 4.5.1 that the eigenvalues of an orthogonal transformation in two dimensions are, in general, complex. This is in contrast to symmetric transformations.

Think of the orthogonal operator $O$ as a unitary operator.⁹ Since the absolute value of the eigenvalues of a unitary operator is 1, the only real possibilities are $\pm 1$. To find the other eigenvalues we note that as a unitary operator, $O$ can be written as $e^{A}$, where $A$ is anti-hermitian (see Problem 4.22). Since hermitian conjugation and transposition coincide for real vector spaces, we conclude that $A = -A^t$, and $A$ is antisymmetric. It is also real, because $O$ is.

Let us now consider the eigenvalues of $A$. If $\lambda$ is an eigenvalue of $A$ corresponding to the eigenvector $|a\rangle$, then $\langle a|A|a\rangle = \lambda\langle a|a\rangle$. Taking the complex conjugate of both sides gives $\langle a|A^\dagger|a\rangle = \lambda^*\langle a|a\rangle$; but $A^\dagger = A^t = -A$, because $A$ is real and antisymmetric. We therefore have $\langle a|A|a\rangle = -\lambda^*\langle a|a\rangle$, which gives $\lambda^* = -\lambda$. It follows that if we restrict $\lambda$ to be real, then it can only be zero; otherwise, it must be purely imaginary. Furthermore, the reader may verify that if $\lambda$ is an eigenvalue of $A$, so is $-\lambda$. Therefore, the diagonal form of $A$ looks like this:
$$A_{\text{diag}} = \operatorname{diag}(0, 0, \dots, 0, i\theta_1, -i\theta_1, i\theta_2, -i\theta_2, \dots, i\theta_k, -i\theta_k),$$
which gives $O$ the following diagonal form:
$$O_{\text{diag}} = \operatorname{diag}(e^0, e^0, \dots, e^0, e^{i\theta_1}, e^{-i\theta_1}, e^{i\theta_2}, e^{-i\theta_2}, \dots, e^{i\theta_k}, e^{-i\theta_k}),$$
with $\theta_1, \theta_2, \dots, \theta_k$ all real. It is clear that if $O$ has $-1$ as an eigenvalue, then some of the $\theta$'s must equal $\pm\pi$.
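The antisymmetric-generator picture above can be checked numerically in the simplest case, a single $2\times 2$ block; the angle $\theta$ below is an arbitrary illustrative choice, not a value from the text.

```python
import numpy as np

# A real antisymmetric generator A and its exponential O = e^A,
# for a single 2x2 block with an arbitrary illustrative angle.
theta = 0.6
A = np.array([[0.0, -theta],
              [theta, 0.0]])

# Eigenvalues of a real antisymmetric matrix are purely imaginary,
# and they come in the pair +-i*theta.
eigA = np.linalg.eigvals(A)
assert np.allclose(eigA.real, 0.0)
assert np.allclose(np.sort(eigA.imag), [-theta, theta])

# Sum the exponential series: e^A should be the rotation by theta.
S = np.zeros((2, 2))
term = np.eye(2)
for k in range(1, 30):
    S += term
    term = term @ A / k

O = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
assert np.allclose(S, O)                 # e^A equals the rotation
assert np.allclose(O.T @ O, np.eye(2))   # O is orthogonal
```

The eigenvalues of $O$ are then $e^{\pm i\theta}$, unit-modulus as the unitary argument requires.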
Separating the $\pi$'s from the rest of the $\theta$'s and putting all of the above arguments together, we get
$$O'_{\text{diag}} = \operatorname{diag}(\underbrace{1, 1, \dots, 1}_{N_+}, \underbrace{-1, -1, \dots, -1}_{N_-}, e^{i\theta_1}, e^{-i\theta_1}, e^{i\theta_2}, e^{-i\theta_2}, \dots, e^{i\theta_m}, e^{-i\theta_m}),$$
where $N_+ + N_- + 2m = \dim V$. Getting insight from Example 4.5.1, we can argue, admittedly in a nonrigorous way, that corresponding to each pair $e^{\pm i\theta_j}$ is a $2\times 2$ matrix of the form
$$R_2(\theta_j) \equiv \begin{pmatrix} \cos\theta_j & -\sin\theta_j \\ \sin\theta_j & \cos\theta_j \end{pmatrix}. \qquad (4.17)$$

⁹This can always be done by formally identifying transposition with hermitian conjugation, an identification that holds when the underlying field of numbers is real.
4.7 REAL VECTOR SPACES 135

We therefore have the following theorem (refer to [Halm 58] for a rigorous treatment).

4.7.5. Theorem. A real orthogonal operator on a real inner product space $V$ cannot, in general, be completely diagonalized. The closest it can get to a diagonal form is
$$O_{\text{diag}} = \operatorname{diag}(\underbrace{1, 1, \dots, 1}_{N_+}, \underbrace{-1, -1, \dots, -1}_{N_-}, R_2(\theta_1), R_2(\theta_2), \dots, R_2(\theta_m)),$$
where $N_+ + N_- + 2m = \dim V$ and $R_2(\theta_j)$ is as given in (4.17). Furthermore, the matrix that transforms an orthogonal matrix into the form above is itself an orthogonal matrix.

The last statement follows from Theorem 4.5.2 and the fact that an orthogonal matrix is the real analogue of a unitary matrix.

4.7.6. Example. An interesting application of Theorem 4.7.5 occurs in classical mechanics, where it is shown that the motion of a rigid body consists of a translation and a rotation. The rotation is represented by a $3\times 3$ orthogonal matrix. Theorem 4.7.5 states that by an appropriate choice of coordinate systems (i.e., by applying the same orthogonal transformation that diagonalizes the rotation matrix of the rigid body), one can "diagonalize" the $3\times 3$ orthogonal matrix. The "diagonal" form is
$$\begin{pmatrix} \pm 1 & 0 & 0 \\ 0 & \pm 1 & 0 \\ 0 & 0 & \pm 1 \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} \pm 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}.$$
Excluding the reflections (corresponding to $-1$'s) and the trivial identity rotation, we conclude that any rotation of a rigid body can be written as
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix},$$
which is a rotation through the angle $\theta$ about the (new) $x$-axis. ∎

Combining the rotation of the example above with the translations, we obtain the following theorem.

4.7.7. Theorem. (Euler) The general motion of a rigid body consists of the translation of one point of that body and a rotation about a single axis through that point.

Finally, we quote the polar decomposition for real inner product spaces.

4.7.8. Theorem. Any operator $A$ on a real inner product space can be written as $A = OR$, where $R$ is a (unique) symmetric positive operator and $O$ is orthogonal.
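Theorem 4.7.8 can be realized numerically: $R$ is the unique positive square root of $A^tA$, and $O = AR^{-1}$. The sketch below uses the matrix $A = \begin{pmatrix} 2 & 0 \\ 3 & -2 \end{pmatrix}$, read off from the product $A^tA$ displayed in the example that follows.

```python
import numpy as np

# Polar decomposition A = O R via the spectral theorem:
# A^t A = R^2 is symmetric positive, so R is its positive square root.
A = np.array([[2.0, 0.0],
              [3.0, -2.0]])

eigvals, V = np.linalg.eigh(A.T @ A)        # diagonalize A^t A
R = V @ np.diag(np.sqrt(eigvals)) @ V.T     # unique positive square root
O = A @ np.linalg.inv(R)

assert np.allclose(O @ R, A)                # A = O R
assert np.allclose(O.T @ O, np.eye(2))      # O is orthogonal
assert np.allclose(R, R.T)                  # R is symmetric
assert np.all(np.linalg.eigvalsh(R) > 0)    # and positive
```

Here `eigh` supplies the orthogonal eigenbasis, so taking square roots of the eigenvalues is exactly the spectral-decomposition construction used in the complex case.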
136 4. SPECTRAL DECOMPOSITION

4.7.9. Example. Let us decompose the following matrix into its polar form:
$$A = \begin{pmatrix} 2 & 0 \\ 3 & -2 \end{pmatrix}.$$
The procedure is the same as in the complex case. We have
$$R^2 = A^tA = \begin{pmatrix} 2 & 3 \\ 0 & -2 \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 3 & -2 \end{pmatrix} = \begin{pmatrix} 13 & -6 \\ -6 & 4 \end{pmatrix},$$
with eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 16$ and normalized eigenvectors
$$|e_1\rangle = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad\text{and}\quad |e_2\rangle = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ -1 \end{pmatrix}.$$
The projection operators are
$$P_1 = |e_1\rangle\langle e_1| = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}, \qquad P_2 = |e_2\rangle\langle e_2| = \frac{1}{5}\begin{pmatrix} 4 & -2 \\ -2 & 1 \end{pmatrix}.$$
Thus, we have
$$R = \sqrt{\lambda_1}\,P_1 + \sqrt{\lambda_2}\,P_2 = P_1 + 4P_2 = \frac{1}{5}\begin{pmatrix} 17 & -6 \\ -6 & 8 \end{pmatrix}.$$
We note that $A$ is invertible. Thus, $R$ is also invertible, and
$$R^{-1} = \frac{1}{20}\begin{pmatrix} 8 & 6 \\ 6 & 17 \end{pmatrix}.$$
This gives $O = AR^{-1}$, or
$$O = \begin{pmatrix} 2 & 0 \\ 3 & -2 \end{pmatrix}\cdot\frac{1}{20}\begin{pmatrix} 8 & 6 \\ 6 & 17 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 4 & 3 \\ 3 & -4 \end{pmatrix}.$$
It is readily verified that $O$ is indeed orthogonal. ∎

Our excursion through operator algebra and matrix theory has revealed to us the diversity of diagonalizable operators. Could it perhaps be that all operators are diagonalizable? In other words, given any operator, can we find a basis in which the matrix representing that operator is diagonal? The answer is, in general, no! (See Problem 4.27.) Discussion of this topic entails a treatment of the Hamilton–Cayley theorem and the Jordan canonical form of a matrix, in which the so-called generalized eigenvectors are introduced. A generalized eigenvector belongs to the kernel of $(A - \lambda 1)^m$ for some positive integer $m$. Then $\lambda$ is called a generalized eigenvalue. We shall not pursue this matter here. The interested reader can find such a discussion in books on linear algebra and matrix theory. We shall, however, see the application of this notion to special operators on infinite-dimensional vector spaces in Chapter 16. One result is worth mentioning at this point.

4.7.10. Proposition. If the roots of the characteristic polynomial of a matrix are all simple, then the matrix can be brought to diagonal form by a similarity (not necessarily a unitary) transformation.
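The contrast between Proposition 4.7.10 and the negative answer of Problem 4.27 can be seen numerically; both matrices below are illustrative choices, not taken from the text.

```python
import numpy as np

# A matrix whose characteristic polynomial has a repeated root
# (eigenvalue 0, twice) but which is NOT diagonalizable.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])

eigvals, eigvecs = np.linalg.eig(A)
assert np.allclose(eigvals, 0.0)

# The returned "eigenvectors" span only a one-dimensional space:
# there is no basis of eigenvectors, hence no diagonal form.
assert np.linalg.matrix_rank(eigvecs) == 1

# By contrast, simple (distinct) roots guarantee diagonalizability
# by a similarity transformation, as Proposition 4.7.10 states.
B = np.array([[0.0, 1.0],
              [2.0, 0.0]])             # eigenvalues +-sqrt(2), distinct
vals, vecs = np.linalg.eig(B)
assert np.linalg.matrix_rank(vecs) == 2
assert np.allclose(vecs @ np.diag(vals) @ np.linalg.inv(vecs), B)
```

Note that the similarity matrix for `B` is not orthogonal; simple roots promise diagonalizability, not an orthonormal eigenbasis.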
4.7 REAL VECTOR SPACES 137

4.7.11. Example. As a final example of the application of the results of this section, let us evaluate the $n$-fold integral
$$I_n = \int_{-\infty}^{\infty}dx_1\int_{-\infty}^{\infty}dx_2\cdots\int_{-\infty}^{\infty}dx_n\, e^{-\sum_{i,j=1}^{n}m_{ij}x_ix_j}, \qquad (4.18)$$
where the $m_{ij}$ are elements of a real, symmetric, positive definite matrix, say $M$. Because it is symmetric, $M$ can be diagonalized by an orthogonal matrix $R$ so that $RMR^t = D$ is a diagonal matrix whose diagonal entries are the eigenvalues, $\lambda_1, \lambda_2, \dots, \lambda_n$, of $M$, whose positive definiteness ensures that none of these eigenvalues is zero or negative. The exponent in (4.18) can be written as
$$\sum_{i,j=1}^{n}m_{ij}x_ix_j = \mathbf{x}^tM\mathbf{x} = \mathbf{x}^tR^tRMR^tR\mathbf{x} = \mathbf{x}'^tD\mathbf{x}' = \lambda_1x_1'^2 + \cdots + \lambda_nx_n'^2,$$
where $\mathbf{x}' = R\mathbf{x}$, or, in component form, $x_i' = \sum_{j=1}^{n}r_{ij}x_j$ for $i = 1, 2, \dots, n$. Similarly, since $\mathbf{x} = R^t\mathbf{x}'$, it follows that $x_i = \sum_{j=1}^{n}r_{ji}x_j'$ for $i = 1, 2, \dots, n$. The "volume element" $dx_1\cdots dx_n$ is related to the primed volume element as follows:
$$dx_1\cdots dx_n = \left|\frac{\partial(x_1, x_2, \dots, x_n)}{\partial(x_1', x_2', \dots, x_n')}\right|dx_1'\cdots dx_n' \equiv |\det J|\,dx_1'\cdots dx_n',$$
where $J$ is the Jacobian matrix whose $ij$th element is $\partial x_i/\partial x_j'$. But
$$\frac{\partial x_i}{\partial x_j'} = r_{ji} \quad\Longrightarrow\quad |\det J| = |\det R^t| = 1.$$
Therefore, in terms of $\mathbf{x}'$, the integral $I_n$ becomes
$$I_n = \int_{-\infty}^{\infty}dx_1'\int_{-\infty}^{\infty}dx_2'\cdots\int_{-\infty}^{\infty}dx_n'\, e^{-\lambda_1x_1'^2 - \lambda_2x_2'^2 - \cdots - \lambda_nx_n'^2}$$
$$= \left(\int_{-\infty}^{\infty}dx_1'\,e^{-\lambda_1x_1'^2}\right)\left(\int_{-\infty}^{\infty}dx_2'\,e^{-\lambda_2x_2'^2}\right)\cdots\left(\int_{-\infty}^{\infty}dx_n'\,e^{-\lambda_nx_n'^2}\right)$$
$$= \sqrt{\frac{\pi}{\lambda_1}}\sqrt{\frac{\pi}{\lambda_2}}\cdots\sqrt{\frac{\pi}{\lambda_n}} = \frac{\pi^{n/2}}{\sqrt{\lambda_1\lambda_2\cdots\lambda_n}} = \pi^{n/2}(\det M)^{-1/2},$$
because the determinant of a matrix is the product of its eigenvalues.

analytic definition of the determinant of a matrix

This result can be written as
$$\int_{-\infty}^{\infty}d^nx\,e^{-\mathbf{x}^tM\mathbf{x}} = \pi^{n/2}(\det M)^{-1/2} \quad\Longrightarrow\quad \det M = \frac{\pi^n}{\left(\int_{-\infty}^{\infty}d^nx\,e^{-\mathbf{x}^tM\mathbf{x}}\right)^2},$$
which gives an analytic definition of the determinant. ∎
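For $n = 2$ the result of Example 4.7.11 can be verified numerically. In the sketch below, $M$ is an illustrative positive definite matrix and the infinite domain is truncated to $[-8, 8]^2$, where the Gaussian is already negligible; both are implementation choices, not from the text.

```python
import numpy as np

# Check det M = pi^n / I_n^2 for n = 2.
M = np.array([[2.0, 0.5],
              [0.5, 1.0]])             # positive definite (det = 1.75)

x = np.linspace(-8.0, 8.0, 1601)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, x, indexing="ij")

# The quadratic form x^t M x, written out for the 2x2 case.
f = np.exp(-(M[0, 0] * X**2 + 2 * M[0, 1] * X * Y + M[1, 1] * Y**2))

I2 = f.sum() * dx * dx                 # two-fold Riemann sum

# I_2 = pi (det M)^{-1/2}, and the "analytic definition" of det M:
assert np.isclose(I2, np.pi / np.sqrt(np.linalg.det(M)), rtol=1e-4)
assert np.isclose(np.pi**2 / I2**2, np.linalg.det(M), rtol=1e-3)
```

Because the integrand is smooth and decays rapidly, the plain equal-weight sum on a uniform grid is already very accurate here.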
138 4. SPECTRAL DECOMPOSITION

4.8 Problems

4.1. Let $\mathcal{U}_1$ and $\mathcal{U}_2$ be subspaces of $V$. Show that
(a) $\dim(\mathcal{U}_1 + \mathcal{U}_2) = \dim\mathcal{U}_1 + \dim\mathcal{U}_2 - \dim(\mathcal{U}_1\cap\mathcal{U}_2)$. Hint: Extend a basis of $\mathcal{U}_1\cap\mathcal{U}_2$ to both $\mathcal{U}_1$ and $\mathcal{U}_2$.
(b) If $\mathcal{U}_1 + \mathcal{U}_2 = V$ and $\dim\mathcal{U}_1 + \dim\mathcal{U}_2 = \dim V$, then $V = \mathcal{U}_1\oplus\mathcal{U}_2$.
(c) If $\dim\mathcal{U}_1 + \dim\mathcal{U}_2 > \dim V$, then $\mathcal{U}_1\cap\mathcal{U}_2 \neq \{0\}$.

4.2. Let $P$ be the (hermitian) projection operator onto a subspace $\mathcal{M}$. Show that $1 - P$ projects onto $\mathcal{M}^\perp$. Hint: You need to show that $\langle m|P|a\rangle = \langle m|a\rangle$ for arbitrary $|a\rangle$ and $|m\rangle \in \mathcal{M}$; therefore, consider $\langle m|P|a\rangle^*$, and use the hermiticity of $P$.

4.3. Show that a subspace $\mathcal{M}$ of an inner product space $V$ is invariant under the linear operator $A$ if and only if $\mathcal{M}^\perp$ is invariant under $A^\dagger$.

4.4. Show that the intersection of two invariant subspaces of an operator is also an invariant subspace.

4.5. Let $\pi$ be a permutation of the integers $\{1, 2, \dots, n\}$. Find the spectrum of $A_\pi$, if for $|x\rangle = (\alpha_1, \alpha_2, \dots, \alpha_n) \in \mathbb{C}^n$, we define $A_\pi|x\rangle = (\alpha_{\pi(1)}, \dots, \alpha_{\pi(n)})$.

4.6. Show that (a) the coefficient of $\lambda^N$ in the characteristic polynomial is $(-1)^N$, where $N = \dim V$, and (b) the constant in the characteristic polynomial of an operator is its determinant.

4.7. Operators $A$ and $B$ satisfy the commutation relation $[A, B] = 1$. Let $|b\rangle$ be an eigenvector of $B$ with eigenvalue $\lambda$. Show that $e^{-\tau A}|b\rangle$ is also an eigenvector of $B$, but with eigenvalue $\lambda + \tau$. This is why $e^{-\tau A}$ is called the translation operator for $B$. Hint: First find $[B, e^{-\tau A}]$.

4.8. Find the eigenvalues of an involutive operator, that is, an operator $A$ with the property $A^2 = 1$.

4.9. Assume that $A$ and $A'$ are similar matrices. Show that they have the same eigenvalues.

4.10. In each of the following cases, determine the counterclockwise rotation of the $xy$-axes that brings the conic section into the standard form and determine the conic section.
(a) $11x^2 + 3y^2 + 6xy - 12 = 0$
(b) $5x^2 - 3y^2 + 6xy + 6 = 0$
(c) $2x^2 + 5y^2 - 4xy - 3 = 0$
(d) $6x^2 + 3y^2 - 4xy - 7 = 0$
(e) $2x^2 + 5y^2 - 4xy - 36 = 0$
4.8 PROBLEMS 139

4.11. Show that if $A$ is invertible, then the eigenvectors of $A^{-1}$ are the same as those of $A$ and the eigenvalues of $A^{-1}$ are the reciprocals of those of $A$.

4.12. Find all eigenvalues and eigenvectors of the following matrices:
Al = (~ ~) 81 = (~ ~) Cl= CI -2 ~I) 3 -4 -I A2= G 0 ~) 82= G I D C I I JJ I 0 C2 = : -I 0 I I A3= G I D83 = G I D ( I D I I C3 = : 0 0 I I

4.13. Show that a $2\times 2$ rotation matrix does not have a real eigenvalue (and, therefore, eigenvector) when the rotation angle is not an integer multiple of $\pi$. What is the physical interpretation of this?

4.14. Three equal point masses are located at $(a, a, 0)$, $(a, 0, a)$, and $(0, a, a)$. Find the moment of inertia matrix as well as its eigenvalues and the corresponding eigenvectors.

4.15. Consider $(\alpha_1, \alpha_2, \dots, \alpha_n) \in \mathbb{C}^n$ and define $E_{ij}$ as the operator that interchanges $\alpha_i$ and $\alpha_j$. Find the eigenvalues of this operator.

4.16. Find the eigenvalues and eigenvectors of the operator $-i\,d/dx$ acting in the vector space of differentiable functions $\mathcal{C}^1(-\infty, \infty)$.

4.17. Show that a hermitian operator is positive if and only if its eigenvalues are positive.

4.18. What are the spectral decompositions of $A^\dagger$, $A^{-1}$, and $AA^\dagger$ for an invertible normal operator $A$?

4.19. Consider the matrix
I +i) 3 .
(a) Find the eigenvalues and the orthonormal eigenvectors of $A$.
(b) Calculate the projection operators (matrices) $P_1$ and $P_2$ and verify that $\sum_i P_i = 1$ and $\sum_i \lambda_i P_i = A$.
(c) Find the matrices $\sqrt{A}$, $\sin(\pi A/6)$, and $\cos(\pi A/6)$.
(d) Is $A$ invertible? If so, find the eigenvalues and eigenvectors of $A^{-1}$.
4.20. Consider the matrix
A= (~i ~ ~}
(a) Find the eigenvalues of $A$. Hint: Try $\lambda = 3$ in the characteristic polynomial of $A$.
(b) For each $\lambda$, find a basis for $\mathcal{M}_\lambda$, the eigenspace associated with the eigenvalue $\lambda$.
(c) Use the Gram–Schmidt process to orthonormalize the above basis vectors.
(d) Calculate the projection operators (matrices) $P_i$ for each subspace and verify that $\sum_i P_i = 1$ and $\sum_i \lambda_i P_i = A$.
(e) Find the matrices $\sqrt{A}$, $\sin(\pi A/2)$, and $\cos(\pi A/2)$.
(f) Is $A$ invertible? If so, find the eigenvalues and eigenvectors of $A^{-1}$.

4.21. Show that if two hermitian matrices have the same set of eigenvalues, then they are unitarily related.

4.22. Prove that corresponding to every unitary operator $U$ acting on a finite-dimensional vector space, there is a hermitian operator $H$ such that $U = \exp(iH)$.

4.23. Find the polar decomposition of the following matrices:
( 2i 0) A="fi3' ( 41 B = 12i -12i) 34 ' ( 1 0 c= 0 1 I i

4.24. Show that an arbitrary matrix $A$ can be "diagonalized" as $D = UAV$, where $U$ is unitary and $D$ is a real diagonal matrix with only nonnegative eigenvalues. Hint: Consider $AA^\dagger$.

4.25. Show that (a) if $\lambda$ is an eigenvalue of an antisymmetric operator, then so is $-\lambda$, and (b) antisymmetric operators (matrices) of odd dimension cannot be invertible.

4.26. Find the unitary matrices that diagonalize the following hermitian matrices:
Al = C12 _i -1+ i) A2=C· ;), A3 = G-i) -1 ' -, o ' BI = (~I -1 ~) B2 = (~. 0 ~i) . 0 , , -1 -i -I -,
Warning! You may have to resort to numerical approximations for some of these.

4.27. For A = (b1), where $x \neq 0$, show that it is impossible to find an invertible $2\times 2$ matrix $R$ such that $RAR^{-1}$ is diagonal. (This shows that not all operators are diagonalizable.)
Additional Reading
1. Axler, S. Linear Algebra Done Right, Springer-Verlag, 1996. Concise but useful discussion of real and complex spectral theory.
2. DeVito, C. Functional Analysis and Linear Operator Theory, Addison-Wesley, 1990. Has a good discussion of spectral theory for finite and infinite dimensions.
3. Halmos, P. Finite Dimensional Vector Spaces, 2nd ed., Van Nostrand, 1958. Comprehensive treatment of real and complex spectral theory for operators on finite-dimensional vector spaces.
5 Hilbert Spaces

The basic concepts of finite-dimensional vector spaces introduced in Chapter 1 can readily be generalized to infinite dimensions. The definition of a vector space and the concepts of linear combination, linear independence, basis, subspace, span, and so forth all carry over to infinite dimensions. However, one thing is crucially different in the new situation, and this difference makes the study of infinite-dimensional vector spaces both richer and more nontrivial: In a finite-dimensional vector space we dealt with finite sums; in infinite dimensions we encounter infinite sums. Thus, we have to investigate the convergence of such sums.

5.1 The Question of Convergence

The intuitive notion of convergence acquired in calculus makes use of the idea of closeness. This, in turn, requires the notion of distance.¹ We considered such a notion in Chapter 1 in the context of a norm, and saw that the inner product had an associated norm. However, it is possible to introduce a norm on a vector space without an inner product. One such norm, applicable to $\mathbb{C}^n$ and $\mathbb{R}^n$, was
$$\|a\|_p = \left(\sum_{i=1}^{n}|\alpha_i|^p\right)^{1/p},$$
where $p$ is an integer. The "natural" norm, i.e., that induced on $\mathbb{C}^n$ (or $\mathbb{R}^n$) by the usual inner product, corresponds to $p = 2$. The distance between two points

¹It is possible to introduce the idea of closeness abstractly, without resort to the notion of distance, as is done in topology. However, distance, as applied in vector spaces, is as abstract as we want to get.
146 5. HILBERT SPACES

depends on the particular norm used. For example, consider the "point" (or vector) $|b\rangle = (0.1, 0.1, \dots, 0.1)$ in a 1000-dimensional space ($n = 1000$). One can easily check that the distance of this vector from the origin varies considerably with $p$: $\|b\|_1 = 100$, $\|b\|_2 = 3.16$, $\|b\|_{10} = 0.2$.

Closeness is a relative concept!

This variation may give the impression that there is no such thing as "closeness," and it all depends on how one defines the norm. This is not true, because closeness is a relative concept: One always compares distances. A norm with large $p$ shrinks all distances of a space, and a norm with small $p$ stretches them. Thus, although it is impossible (and meaningless) to say that "$|a\rangle$ is close to $|b\rangle$" because of the dependence of distance on $p$, one can always say "$|a\rangle$ is closer to $|b\rangle$ than $|c\rangle$ is to $|d\rangle$," regardless of the value of $p$.

Now that we have a way of telling whether vectors are close together or far apart, we can talk about limits and the convergence of sequences of vectors. Let us begin by recalling the definition of a Cauchy sequence.

Cauchy sequence defined

5.1.1. Definition. An infinite sequence of vectors $\{|a_i\rangle\}_{i=1}^{\infty}$ in a normed linear space $V$ is called a Cauchy sequence if $\lim_{i,j\to\infty}\|a_i - a_j\| = 0$.

A convergent sequence is necessarily Cauchy. This can be shown using the triangle inequality (see Problem 5.2). However, there may be Cauchy sequences in a given vector space that do not converge to any vector in that space (see the example below). Such a convergence requires additional properties of a vector space, summarized in the following definition.

complete vector space defined

5.1.2. Definition. A complete vector space $V$ is a normed linear space for which every Cauchy sequence of vectors in $V$ has a limit vector in $V$. In other words, if $\{|a_i\rangle\}_{i=1}^{\infty}$ is a Cauchy sequence, then there exists a vector $|a\rangle \in V$ such that $\lim_{i\to\infty}\|a_i - a\| = 0$.

5.1.3. Example.
1. $\mathbb{R}$ is complete with respect to the absolute-value norm $\|a\| = |a|$. In other words, every Cauchy sequence of real numbers has a limit in $\mathbb{R}$. This is proved in real analysis.
2. $\mathbb{C}$ is complete with respect to the norm $\|a\| = |a| = \sqrt{(\operatorname{Re}a)^2 + (\operatorname{Im}a)^2}$. Using $|a| \leq |\operatorname{Re}a| + |\operatorname{Im}a|$, one can show that the completeness of $\mathbb{C}$ follows from that of $\mathbb{R}$. Details are left as an exercise for the reader.
3. The set of rational numbers $\mathbb{Q}$ is not complete with respect to the absolute-value norm. In fact, $\{(1 + 1/k)^k\}_{k=1}^{\infty}$ is a sequence of rational numbers that is Cauchy but does not converge to a rational number; it converges to $e$, the base of the natural logarithm, which is known to be an irrational number. ∎

Let $\{|a_j\rangle\}_{j=1}^{\infty}$ be a Cauchy sequence of vectors in a finite-dimensional vector space $V_N$. Choose an orthonormal basis $\{|e_k\rangle\}_{k=1}^{N}$ in $V_N$ such that²

²Recall that one can always define an inner product on a finite-dimensional vector space. So, the existence of orthonormal bases is guaranteed.
all finite-dimensional vector spaces are complete

5.1 THE QUESTION OF CONVERGENCE 147

$|a_i\rangle = \sum_{k=1}^{N}\alpha_k^{(i)}|e_k\rangle$ and $|a_j\rangle = \sum_{k=1}^{N}\alpha_k^{(j)}|e_k\rangle$. Then
$$\|a_i - a_j\|^2 = \langle a_i - a_j|a_i - a_j\rangle = \left\|\sum_{k=1}^{N}\left(\alpha_k^{(i)} - \alpha_k^{(j)}\right)|e_k\rangle\right\|^2$$
$$= \sum_{k,l=1}^{N}(\alpha_k^{(i)} - \alpha_k^{(j)})^*(\alpha_l^{(i)} - \alpha_l^{(j)})\langle e_k|e_l\rangle = \sum_{k=1}^{N}\left|\alpha_k^{(i)} - \alpha_k^{(j)}\right|^2.$$
The LHS goes to zero, because the sequence is assumed Cauchy. Furthermore, all terms on the RHS are positive. Thus, they too must go to zero as $i, j \to \infty$. By the completeness of $\mathbb{C}$, there must exist $\alpha_k \in \mathbb{C}$ such that $\lim_{n\to\infty}\alpha_k^{(n)} = \alpha_k$ for $k = 1, 2, \dots, N$. Now consider $|a\rangle \in V_N$ given by $|a\rangle = \sum_{k=1}^{N}\alpha_k|e_k\rangle$. We claim that $|a\rangle$ is the limit of the above sequence of vectors in $V_N$. Indeed,
$$\lim_{i\to\infty}\|a_i - a\|^2 = \lim_{i\to\infty}\sum_{k=1}^{N}\left|\alpha_k^{(i)} - \alpha_k\right|^2 = 0.$$
We have proved the following:

5.1.4. Proposition. Every Cauchy sequence in a finite-dimensional inner product space over $\mathbb{C}$ (or $\mathbb{R}$) is convergent. In other words, every finite-dimensional complex (or real) inner product space is complete with respect to the norm induced by its inner product.

The next example shows how important the word "finite" is.

5.1.5. Example. Consider $\{f_k\}_{k=1}^{\infty}$, the infinite sequence of continuous functions defined in the interval $[-1, +1]$ by
$$f_k(x) = \begin{cases} 1 & \text{if } 1/k \leq x \leq 1, \\ (kx+1)/2 & \text{if } -1/k \leq x \leq 1/k, \\ 0 & \text{if } -1 \leq x \leq -1/k. \end{cases}$$
This sequence belongs to $\mathcal{C}^0(-1, 1)$, the inner product space of continuous functions with its usual inner product: $\langle f|g\rangle = \int_{-1}^{1}f^*(x)g(x)\,dx$. It is straightforward to verify that
$$\|f_k - f_j\|^2 = \int_{-1}^{1}|f_k(x) - f_j(x)|^2\,dx \xrightarrow[k,j\to\infty]{} 0.$$
Therefore, the sequence is Cauchy. However, the limit of this sequence is (see Figure 5.1)
$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{if } -1 < x < 0, \end{cases}$$
which is discontinuous at $x = 0$ and therefore does not belong to the space in which the original sequence lies. ∎
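The sequence of Example 5.1.5 can be examined numerically; in the sketch below, grid quadrature stands in for the exact integrals.

```python
import numpy as np

# The continuous ramps of Example 5.1.5 on [-1, 1]: f_k rises from
# 0 to 1 over the interval [-1/k, 1/k], steepening as k grows.
def f(k, x):
    return np.clip((k * x + 1) / 2, 0.0, 1.0)

x = np.linspace(-1, 1, 200001)
dx = x[1] - x[0]

def dist2(k, j):
    return np.sum((f(k, x) - f(j, x)) ** 2) * dx   # ||f_k - f_j||^2

# The mutual distances shrink: the sequence is Cauchy ...
assert dist2(200, 100) < dist2(20, 10) < dist2(4, 2)

# ... but the limit is the discontinuous step function, which lies
# OUTSIDE the space of continuous functions: ||f_k - f||^2 -> 0.
step = (x > 0).astype(float)
assert np.sum((f(400, x) - step) ** 2) * dx < 1e-2
```

The `clip` call reproduces the three-piece definition in one line: the ramp $(kx+1)/2$ is cut off at 0 and 1 exactly where the piecewise formula switches.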
148 5. HILBERT SPACES

Figure 5.1. The limit of the sequence of the continuous functions $f_k$ is a discontinuous function that is 1 for $x > 0$ and 0 for $x < 0$.

We see that infinite-dimensional vector spaces are not generally complete. It is a nontrivial task to show whether or not a given infinite-dimensional vector space is complete.

Any vector space (finite- or infinite-dimensional) contains all finite linear combinations of the form $\sum_{i=1}^{n}\alpha_i|a_i\rangle$ when it contains all the $|a_i\rangle$'s. This follows from the very definition of a vector space. However, the situation is different when $n$ goes to infinity. For the vector space to contain the infinite sum, firstly, the meaning of such a sum has to be clarified, i.e., a norm and an associated convergence criterion need to be put in place. Secondly, the vector space has to be complete with respect to that norm. A complete normed vector space is called a Banach space. We shall not deal with a general Banach space, but only with those spaces whose norms arise naturally from an inner product. This leads to the following definition:

Hilbert space defined

5.1.6. Definition. A complete inner product space, commonly denoted by $\mathcal{H}$, is called a Hilbert space.

Thus, all finite-dimensional real or complex vector spaces are Hilbert spaces. However, when we speak of a Hilbert space, we shall usually assume that it is infinite-dimensional.

It is convenient to use orthonormal vectors in studying Hilbert spaces. So, let us consider an infinite sequence $\{|e_i\rangle\}_{i=1}^{\infty}$ of orthonormal vectors all belonging to a Hilbert space $\mathcal{H}$. Next, take any vector $|f\rangle \in \mathcal{H}$, construct the complex numbers $f_i = \langle e_i|f\rangle$, and form the sequence of vectors³
$$|f_n\rangle = \sum_{i=1}^{n}f_i|e_i\rangle \quad\text{for } n = 1, 2, \dots \qquad (5.1)$$

³We can consider $|f_n\rangle$ as an "approximation" to $|f\rangle$, because both share the same components along the same set of orthonormal vectors. The sequence of orthonormal vectors acts very much as a basis. However, to be a basis, an extra condition must be met.
We shall discuss this condition shortly.
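The partial sums (5.1), and the bound their squared norms obey, can be illustrated numerically. The sketch below uses the standard orthonormal Fourier basis $e_n(x) = e^{inx}/\sqrt{2\pi}$ on $\mathcal{L}^2(-\pi, \pi)$ and the function $f(x) = x$ — a concrete basis and function chosen here for illustration, not taken from the text.

```python
import numpy as np

# Partial sums of f(x) = x in the orthonormal Fourier basis
# e_n(x) = exp(inx)/sqrt(2*pi) on L^2(-pi, pi).
x = np.linspace(-np.pi, np.pi, 40001)
dx = x[1] - x[0]
f = x
norm2 = np.sum(f**2) * dx                      # <f|f> = 2*pi^3/3

def coeff(n):                                  # f_n = <e_n|f>
    e_n = np.exp(1j * n * x) / np.sqrt(2 * np.pi)
    return np.sum(np.conj(e_n) * f) * dx

# Running totals of sum |f_i|^2 as more basis vectors are included
# (the n = 0 coefficient vanishes by symmetry and is omitted).
partial = np.cumsum([abs(coeff(n))**2 + abs(coeff(-n))**2
                     for n in range(1, 301)])

assert np.all(np.diff(partial) >= 0)           # monotone growth
assert partial[-1] <= norm2                    # never exceeds <f|f>
assert partial[-1] > 0.99 * norm2              # and approaches it
```

The first two assertions anticipate the Parseval/Bessel inequalities derived next; the last holds because this particular orthonormal sequence is in fact complete.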
5.1 THE QUESTION OF CONVERGENCE 149

For the pair of vectors $|f\rangle$ and $|f_n\rangle$, the Schwarz inequality gives
$$|\langle f|f_n\rangle|^2 \leq \langle f|f\rangle\langle f_n|f_n\rangle = \langle f|f\rangle\sum_{i=1}^{n}|f_i|^2, \qquad (5.2)$$
where Equation (5.1) has been used to evaluate $\langle f_n|f_n\rangle$. On the other hand, taking the inner product of (5.1) with $\langle f|$ yields
$$\langle f|f_n\rangle = \sum_{i=1}^{n}f_i\langle f|e_i\rangle = \sum_{i=1}^{n}f_if_i^* = \sum_{i=1}^{n}|f_i|^2.$$

Parseval inequality

Substitution of this in Equation (5.2) yields the Parseval inequality:
$$\sum_{i=1}^{n}|f_i|^2 \leq \langle f|f\rangle. \qquad (5.3)$$
This conclusion is true for arbitrarily large $n$ and can be stated as follows:

Bessel inequality

5.1.7. Proposition. Let $\{|e_i\rangle\}_{i=1}^{\infty}$ be an infinite set of orthonormal vectors in a Hilbert space $\mathcal{H}$. Let $|f\rangle \in \mathcal{H}$ and define complex numbers $f_i = \langle e_i|f\rangle$. Then the Bessel inequality holds: $\sum_{i=1}^{\infty}|f_i|^2 \leq \langle f|f\rangle$.

The Bessel inequality shows that the vector
$$\sum_{i=1}^{\infty}f_i|e_i\rangle \equiv \lim_{n\to\infty}\sum_{i=1}^{n}f_i|e_i\rangle$$
converges; that is, it has a finite norm. However, the inequality does not say whether the vector converges to $|f\rangle$. To make such a statement we need completeness:

complete orthonormal sequence of vectors

5.1.8. Definition. A sequence of orthonormal vectors $\{|e_i\rangle\}_{i=1}^{\infty}$ in a Hilbert space $\mathcal{H}$ is called complete if the only vector in $\mathcal{H}$ that is orthogonal to all the $|e_i\rangle$ is the zero vector.

This completeness property is the extra condition alluded to (in the footnote) above, and is what is required to make a basis.

5.1.9. Proposition. Let $\{|e_i\rangle\}_{i=1}^{\infty}$ be an orthonormal sequence in $\mathcal{H}$. Then the following statements are equivalent:
1. $\{|e_i\rangle\}_{i=1}^{\infty}$ is complete.
2. $|f\rangle = \sum_{i=1}^{\infty}|e_i\rangle\langle e_i|f\rangle \quad \forall\,|f\rangle \in \mathcal{H}$.
3. $\sum_{i=1}^{\infty}|e_i\rangle\langle e_i| = 1$.
4. $\langle f|g\rangle = \sum_{i=1}^{\infty}\langle f|e_i\rangle\langle e_i|g\rangle \quad \forall\,|f\rangle, |g\rangle \in \mathcal{H}$.
150 5. HILBERT SPACES

5. $\|f\|^2 = \sum_{i=1}^{\infty}|\langle e_i|f\rangle|^2 \quad \forall\,|f\rangle \in \mathcal{H}$.

Proof. We shall prove the implications $1 \Rightarrow 2 \Rightarrow 3 \Rightarrow 4 \Rightarrow 5 \Rightarrow 1$.
$1 \Rightarrow 2$: It is sufficient to show that the vector $|\psi\rangle \equiv |f\rangle - \sum_{i=1}^{\infty}|e_i\rangle\langle e_i|f\rangle$ is orthogonal to all the $|e_j\rangle$:
$$\langle e_j|\psi\rangle = \langle e_j|f\rangle - \sum_{i=1}^{\infty}\underbrace{\langle e_j|e_i\rangle}_{\delta_{ij}}\langle e_i|f\rangle = 0.$$
$2 \Rightarrow 3$: Since $|f\rangle = 1|f\rangle = \sum_{i=1}^{\infty}\left(|e_i\rangle\langle e_i|\right)|f\rangle$ is true for all $|f\rangle \in \mathcal{H}$, we must have $1 = \sum_{i=1}^{\infty}|e_i\rangle\langle e_i|$.
$3 \Rightarrow 4$: $\langle f|g\rangle = \langle f|1|g\rangle = \langle f|\left(\sum_{i=1}^{\infty}|e_i\rangle\langle e_i|\right)|g\rangle = \sum_{i=1}^{\infty}\langle f|e_i\rangle\langle e_i|g\rangle$.
$4 \Rightarrow 5$: Let $|g\rangle = |f\rangle$ in statement 4 and recall that $\langle f|e_i\rangle = \langle e_i|f\rangle^*$.
$5 \Rightarrow 1$: Let $|f\rangle$ be orthogonal to all the $|e_i\rangle$. Then all the terms in the sum are zero, implying that $\|f\|^2 = 0$, which in turn gives $|f\rangle = 0$, because only the zero vector has a zero norm. ∎

Parseval equality; generalized Fourier coefficients

The equality
$$\|f\|^2 = \langle f|f\rangle = \sum_{i=1}^{\infty}|\langle e_i|f\rangle|^2 = \sum_{i=1}^{\infty}|f_i|^2, \qquad f_i = \langle e_i|f\rangle, \qquad (5.4)$$
is called the Parseval equality, and the complex numbers $f_i$ are called generalized Fourier coefficients.

completeness relation

The relation
$$1 = \sum_{i=1}^{\infty}|e_i\rangle\langle e_i| \qquad (5.5)$$
is called the completeness relation.

basis for Hilbert spaces

5.1.10. Definition. A complete orthonormal sequence $\{|e_i\rangle\}_{i=1}^{\infty}$ in a Hilbert space $\mathcal{H}$ is called a basis of $\mathcal{H}$.

5.2 The Space of Square-Integrable Functions

Chapter 1 showed that the collection of all continuous functions defined on an interval $[a, b]$ forms a linear vector space. Example 5.1.5 showed that this space is not complete. Can we enlarge this space to make it complete? Since we are interested in an inner product as well, and since a natural inner product for functions is defined in terms of integrals, we want to make sure that our functions are integrable. However, integrability does not require continuity; it only requires piecewise continuity. In this section we shall discuss conditions under which the
5.2 THE SPACE OF SQUARE-INTEGRABLE FUNCTIONS 151

space of functions becomes complete. An important class of functions has already been mentioned in Chapter 1. These functions satisfy the inner product given by
$$\langle g|f\rangle = \int_a^b g^*(x)f(x)w(x)\,dx.$$
If $g(x) = f(x)$, we obtain
$$\langle f|f\rangle = \int_a^b |f(x)|^2w(x)\,dx. \qquad (5.6)$$

square-integrable functions

Functions for which such an integral is defined are said to be square-integrable.

David Hilbert (1862–1943), the greatest mathematician of this century, received his Ph.D. from the University of Königsberg and was a member of the staff there from 1886 to 1895. In 1895 he was appointed to the chair of mathematics at the University of Göttingen, where he continued to teach for the rest of his life.

Hilbert is one of that rare breed of late 19th-century mathematicians whose spectrum of expertise covered a wide range, with formal set theory at one end and mathematical physics at the other. He did superb work in geometry, algebraic geometry, algebraic number theory, integral equations, and operator theory. The seminal two-volume book Methoden der mathematischen Physik by R. Courant, still one of the best books on the subject, was greatly influenced by Hilbert.

Hilbert's work in geometry had the greatest influence in that area since Euclid. A systematic study of the axioms of Euclidean geometry led Hilbert to propose 21 such axioms, and he analyzed their significance. He published Grundlagen der Geometrie in 1899, putting geometry on a formal axiomatic foundation. His famous 23 Paris problems challenged (and still today challenge) mathematicians to solve fundamental questions.

It was late in his career that Hilbert turned to the subject for which he is most famous among physicists. A lecture by Erik Holmgren in 1901 on Fredholm's work on integral equations, which had already been published in Sweden, aroused Hilbert's interest in the subject.
David Hilbert, having established himself as the leading mathematician of his time by his work on algebraic numbers, algebraic invariants, and the foundations of geometry, now turned his attention to integral equations. He says that an investigation of the subject showed him that it was important for the theory of definite integrals, for the development of arbitrary functions in series (of special functions or trigonometric functions), for the theory of linear differential equations, for potential theory, and for the calculus of variations. He wrote a series of six papers from 1904 to 1910 and reproduced them in his book Grundzüge einer allgemeinen Theorie der linearen Integralgleichungen (1912). During the latter part of this work he applied integral equations to problems of mathematical physics.

It is said that Hilbert discovered the correct field equation for general relativity in 1915 (one year before Einstein) using the variational principle, but never claimed priority.

Hilbert claimed that he worked best out-of-doors. He accordingly attached an 18-foot blackboard to his neighbor's wall and built a covered walkway there so that he could work outside in any weather. He would intermittently interrupt his pacing and his blackboard
computations with a few turns around the rest of the yard on his bicycle, or he would pull some weeds, or do some garden trimming. Once, when a visitor called, the maid sent him to the backyard and advised that if the master wasn't readily visible at the blackboard to look for him up in one of the trees.

Highly gifted and highly versatile, David Hilbert radiated over mathematics a catching optimism and a stimulating vitality that can only be called "the spirit of Hilbert." Engraved on a stone marker set over Hilbert's grave in Göttingen are the master's own optimistic words: "Wir müssen wissen. Wir werden wissen." ("We must know. We shall know.")

The space of square-integrable functions over the interval $[a, b]$ is denoted by $\mathcal{L}^2_w(a, b)$. In this notation $L$ stands for Lebesgue, who generalized the notion of the ordinary Riemann integral to cases for which the integrand could be highly discontinuous; 2 stands for the power of $|f(x)|$ in the integral; $a$ and $b$ denote the limits of integration; and $w$ refers to the weight function (a strictly positive real-valued function). When $w(x) = 1$, we use the notation $\mathcal{L}^2(a, b)$. The significance of $\mathcal{L}^2_w(a, b)$ lies in the following theorem (for a proof, see [Reed 80, Chapter III]):

$\mathcal{L}^2_w(a, b)$ is complete

5.2.1. Theorem. (Riesz–Fischer theorem) The space $\mathcal{L}^2_w(a, b)$ is complete.

A complete infinite-dimensional inner product space was earlier defined to be a Hilbert space. The following theorem shows that the number of Hilbert spaces is severely restricted. (For a proof, see [Frie 82, p. 216].)

all Hilbert spaces are alike

5.2.2. Theorem. All infinite-dimensional complete inner product spaces are isomorphic to $\mathcal{L}^2_w(a, b)$.
$\mathcal{L}^2_w(a, b)$ is defined in terms of functions that satisfy Equation (5.6). Yet an inner product involves integrals of the form $\int_a^b g^*(x)f(x)w(x)\,dx$. Are such integrals well-defined and finite? Using the Schwarz inequality, which holds for any inner product space, finite or infinite, one can show that the integral is defined.

The isomorphism of Theorem 5.2.2 makes the Hilbert space more tangible, because it identifies the space with a space of functions, objects that are more familiar than abstract vectors. Nonetheless, a faceless function is very little improvement over an abstract vector. What is desirable is a set of concrete functions with which we can calculate. The following theorem provides such functions (for a proof, see [Simon 83, pp. 154–161]).

5.2.3. Theorem. (Stone–Weierstrass approximation theorem) The sequence of functions (monomials) $\{x^k\}$, where $k = 0, 1, 2, \dots$, forms a basis of $\mathcal{L}^2_w(a, b)$.

Thus, any function $f$ can be written as $f(x) = \sum_{k=0}^{\infty}a_kx^k$. Note that the $\{x^k\}$ are not orthonormal but are linearly independent. If we wish to obtain an orthonormal—or simply orthogonal—linear combination of these vectors, we can use the Gram–Schmidt process. The result will be certain polynomials, denoted by $C_n(x)$, that are orthogonal to one another and span $\mathcal{L}^2_w(a, b)$.
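As a concrete illustration, applying Gram–Schmidt to the monomials on $\mathcal{L}^2(-1, 1)$ with $w(x) = 1$ produces (multiples of) the Legendre polynomials. The sketch below uses discrete quadrature and compares against `numpy.polynomial.legendre`; the grid and tolerances are implementation choices, not from the text.

```python
import numpy as np
from numpy.polynomial import legendre as L

x = np.linspace(-1, 1, 20001)
dx = x[1] - x[0]

def inner(f, g):
    return np.sum(f * g) * dx          # <f|g> on L^2(-1, 1), w = 1

# Gram-Schmidt (without normalization) on the monomials 1, x, x^2, x^3,
# treated as sampled functions.
basis = []
for k in range(4):
    v = x**k
    for u in basis:
        v = v - inner(u, v) / inner(u, u) * u
    basis.append(v)

# Each resulting C_n is proportional to the Legendre polynomial P_n.
for n, Cn in enumerate(basis):
    Pn = L.legval(x, [0] * n + [1])    # P_n sampled on the grid
    ratio = Cn[-1] / Pn[-1]            # fix the scale at x = 1
    assert np.allclose(Cn, ratio * Pn, atol=1e-3)
```

For instance, the third vector comes out as a multiple of $x^2 - \tfrac{1}{3}$, i.e., of $P_2(x) = (3x^2 - 1)/2$.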
5.2 THE SPACE OF SQUARE-INTEGRABLE FUNCTIONS 153

Such orthogonal polynomials satisfy very useful recurrence relations, which we now derive. In the following discussion $p_{\le k}(x)$ denotes a generic polynomial of degree less than or equal to k. For example, $3x^5 - 4x^2 + 5$, $2x + 1$, $-2.4x^4 + 3x^3 - x^2 + 6$, and 2 are all denoted by $p_{\le 5}(x)$ or $p_{\le 8}(x)$ or $p_{\le 59}(x)$, because they all have degrees less than or equal to 5, 8, and 59. Since a polynomial of degree less than n can be written as a linear combination of $C_k(x)$ with $k < n$, we have the obvious property

$$\int_a^b C_n(x)\, p_{\le n-1}(x)\, w(x)\,dx = 0. \tag{5.7}$$

Let $k_m$ and $k'_m$ denote, respectively, the coefficients of $x^m$ and $x^{m-1}$ in $C_m(x)$, and let

$$h_m = \int_a^b [C_m(x)]^2 w(x)\,dx. \tag{5.8}$$

The polynomial $C_{n+1}(x) - (k_{n+1}/k_n)\,x\,C_n(x)$ has degree less than or equal to n, and therefore can be expanded as a linear combination of the $C_j(x)$:

$$C_{n+1}(x) - \frac{k_{n+1}}{k_n}\,x\,C_n(x) = \sum_{j=0}^{n} a_j C_j(x). \tag{5.9}$$

Take the inner product of both sides of this equation with $C_m(x)$:

$$\int_a^b C_{n+1}(x) C_m(x) w(x)\,dx - \frac{k_{n+1}}{k_n}\int_a^b x\,C_n(x) C_m(x) w(x)\,dx = \sum_{j=0}^{n} a_j \int_a^b C_j(x) C_m(x) w(x)\,dx.$$

The first integral on the LHS vanishes as long as $m \le n$; the second integral vanishes if $m \le n-2$ [if $m \le n-2$, then $x C_m(x)$ is a polynomial of degree at most $n-1$]. Thus, we have

$$\sum_{j=0}^{n} a_j \int_a^b C_j(x) C_m(x) w(x)\,dx = 0 \qquad\text{for } m \le n-2.$$

The integral in the sum is zero unless $j = m$, by orthogonality. Therefore, the sum reduces to

$$a_m \int_a^b [C_m(x)]^2 w(x)\,dx = 0 \qquad\text{for } m \le n-2.$$

Since the integral is nonzero, we conclude that $a_m = 0$ for $m = 0, 1, 2, \ldots, n-2$, and Equation (5.9) reduces to

$$C_{n+1}(x) - \frac{k_{n+1}}{k_n}\,x\,C_n(x) = a_{n-1} C_{n-1}(x) + a_n C_n(x). \tag{5.10}$$
154 5. HILBERT SPACES

It can be shown that if we define

$$\alpha_n = \frac{k_{n+1}}{k_n}, \qquad \beta_n = \alpha_n\left(\frac{k'_{n+1}}{k_{n+1}} - \frac{k'_n}{k_n}\right), \qquad \gamma_n = -\frac{\alpha_n}{\alpha_{n-1}}\,\frac{h_n}{h_{n-1}}, \tag{5.11}$$

then Equation (5.10) can be expressed as

a recurrence relation for orthogonal polynomials

$$C_{n+1}(x) = (\alpha_n x + \beta_n)\, C_n(x) + \gamma_n\, C_{n-1}(x), \tag{5.12}$$

or

$$x\, C_n(x) = \frac{1}{\alpha_n}\left[C_{n+1}(x) - \beta_n C_n(x) - \gamma_n C_{n-1}(x)\right]. \tag{5.13}$$

Other recurrence relations, involving higher powers of x, can be obtained from the one above. For example, a recurrence relation involving $x^2$ can be obtained by multiplying both sides of Equation (5.13) by x and expanding each term of the RHS using that same equation. The result will be

$$x^2 C_n = \frac{C_{n+2}}{\alpha_n\alpha_{n+1}} - \frac{1}{\alpha_n}\left(\frac{\beta_{n+1}}{\alpha_{n+1}} + \frac{\beta_n}{\alpha_n}\right)C_{n+1} + \left(\frac{\beta_n^2}{\alpha_n^2} - \frac{\gamma_{n+1}}{\alpha_n\alpha_{n+1}} - \frac{\gamma_n}{\alpha_{n-1}\alpha_n}\right)C_n + \frac{\gamma_n}{\alpha_n}\left(\frac{\beta_n}{\alpha_n} + \frac{\beta_{n-1}}{\alpha_{n-1}}\right)C_{n-1} + \frac{\gamma_n\gamma_{n-1}}{\alpha_{n-1}\alpha_n}\,C_{n-2}. \tag{5.14}$$

5.2.4. Example. As an application of the recurrence relations above, let us evaluate

$$I \equiv \int_a^b x\, C_m(x) C_n(x) w(x)\,dx.$$

Substituting (5.13) in the integral gives

$$I = \frac{1}{\alpha_n}\int_a^b C_m(x) C_{n+1}(x) w(x)\,dx - \frac{\beta_n}{\alpha_n}\int_a^b C_m(x) C_n(x) w(x)\,dx - \frac{\gamma_n}{\alpha_n}\int_a^b C_m(x) C_{n-1}(x) w(x)\,dx.$$

We now use the orthogonality relations among the $C_k(x)$ to obtain

$$I = \frac{1}{\alpha_n}\,\delta_{m,n+1}\int_a^b C_m^2(x) w(x)\,dx - \frac{\beta_n}{\alpha_n}\,\delta_{mn}\int_a^b C_m^2(x) w(x)\,dx - \frac{\gamma_n}{\alpha_n}\,\delta_{m,n-1}\int_a^b C_m^2(x) w(x)\,dx = \left(\frac{1}{\alpha_{m-1}}\,\delta_{m,n+1} - \frac{\beta_m}{\alpha_m}\,\delta_{mn} - \frac{\gamma_{m+1}}{\alpha_{m+1}}\,\delta_{m,n-1}\right) h_m,$$
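The three-term recurrence (5.12) can be run forward in a concrete case as a check. The Python sketch below (not from the text) uses the Legendre values $\alpha_n = (2n+1)/(n+1)$, $\beta_n = 0$, $\gamma_n = -n/(n+1)$, which follow from the standard Legendre normalization:

```python
from fractions import Fraction

def legendre(n_max):
    """Generate P_0 ... P_{n_max} as coefficient lists (index = power) from the
    recurrence C_{n+1} = (alpha_n x + beta_n) C_n + gamma_n C_{n-1}, using the
    Legendre coefficients alpha_n = (2n+1)/(n+1), beta_n = 0, gamma_n = -n/(n+1)."""
    size = n_max + 1
    P = [[Fraction(0)] * size for _ in range(size)]
    P[0][0] = Fraction(1)                  # P_0 = 1
    if n_max >= 1:
        P[1][1] = Fraction(1)              # P_1 = x
    for n in range(1, n_max):
        alpha = Fraction(2 * n + 1, n + 1)
        gamma = Fraction(-n, n + 1)
        nxt = [Fraction(0)] * size
        for i in range(size - 1):
            nxt[i + 1] += alpha * P[n][i]  # alpha_n * x * C_n shifts powers up
        for i in range(size):
            nxt[i] += gamma * P[n - 1][i]  # gamma_n * C_{n-1}
        P[n + 1] = nxt
    return P

P = legendre(3)
print([float(c) for c in P[2]])   # [-0.5, 0.0, 1.5]  -> P_2 = (3x^2 - 1)/2
print([float(c) for c in P[3]])   # [0.0, -1.5, 0.0, 2.5]  -> P_3 = (5x^3 - 3x)/2
```

Only the two previous polynomials are ever needed, which is why the recurrence is the standard way to evaluate orthogonal polynomials numerically.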
5.2 THE SPACE OF SQUARE-INTEGRABLE FUNCTIONS 155

or

$$I = \begin{cases} h_m/\alpha_{m-1} & \text{if } m = n+1,\\ -\beta_m h_m/\alpha_m & \text{if } m = n,\\ -\gamma_{m+1} h_m/\alpha_{m+1} & \text{if } m = n-1,\\ 0 & \text{otherwise.} \end{cases}$$

5.2.5. Example. Let us find the orthogonal polynomials forming a basis of $\mathcal{L}^2(-1, +1)$, which we denote by $P_k(x)$, where k is the degree of the polynomial. Let $P_0(x) = 1$. To find $P_1(x)$, write $P_1(x) = ax + b$, and determine a and b in such a way that $P_1(x)$ is orthogonal to $P_0(x)$:

$$0 = \int_{-1}^{1} P_1(x) P_0(x)\,dx = \int_{-1}^{1} (ax + b)\,dx = \tfrac{1}{2} a x^2\Big|_{-1}^{1} + 2b = 2b.$$

So one of the coefficients, b, is zero. To find the other one, we need some standardization procedure. We "standardize" $P_k(x)$ by requiring that $P_k(1) = 1$ for all k. For $k = 1$ this yields $a \times 1 = 1$, or $a = 1$, so that $P_1(x) = x$. We can calculate $P_2(x)$ similarly: Write $P_2(x) = ax^2 + bx + c$, impose the condition that it be orthogonal to both $P_1(x)$ and $P_0(x)$, and enforce the standardization procedure. All this will yield

$$0 = \int_{-1}^{1} P_2(x) P_0(x)\,dx = \tfrac{2}{3} a + 2c, \qquad 0 = \int_{-1}^{1} P_2(x) P_1(x)\,dx = \tfrac{2}{3} b,$$

and $P_2(1) = a + b + c = 1$. These three equations have the unique solution $a = 3/2$, $b = 0$, $c = -1/2$. Thus, $P_2(x) = \tfrac{1}{2}(3x^2 - 1)$. These are the first three Legendre polynomials, which are part of a larger group of polynomials to be discussed in Chapter 7.

5.2.1 Orthogonal Polynomials and Least Squares

The method of least squares is no doubt familiar to the reader. In the simplest procedure, one tries to find a linear function that most closely fits a set of data. By definition, "most closely" means that the sum of the squares of the differences between the data points and the corresponding values of the linear function is minimum. More generally, one seeks the best polynomial fit to the data.

We shall consider a related topic, namely least-squares fitting of a given function with polynomials. Suppose $f(x)$ is a function defined on (a, b). We want to find a polynomial that most closely approximates f. Write such a polynomial as $p(x) = \sum_{k=0}^{n} a_k x^k$, where the $a_k$'s are to be determined such that

$$S(a_0, a_1, \ldots, a_n) \equiv \int_a^b [f(x) - a_0 - a_1 x - \cdots - a_n x^n]^2\,dx$$

is a minimum. Differentiating S with respect to the $a_k$'s and setting the result equal to zero gives

$$0 = \frac{\partial S}{\partial a_j} = \int_a^b 2(-x^j)\left[f(x) - \sum_{k=0}^{n} a_k x^k\right]dx,$$
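Setting every $\partial S/\partial a_j$ to zero yields a linear system for the coefficients $a_k$ (the "normal equations"). The Python sketch below (illustrative, not from the text; the function name `lstsq_poly` and the choice of fitting $e^x$ on $(-1, 1)$ with a quadratic are mine) solves that system with Simpson's rule for the integrals and Gaussian elimination for the linear algebra:

```python
import math

def integrate(g, a, b, n=2000):
    """Composite Simpson's rule (n even)."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

def lstsq_poly(f, a, b, deg):
    """Minimize S(a_0, ..., a_n) = ∫ (f - Σ a_k x^k)^2 dx by solving the
    normal equations Σ_k a_k ∫ x^{j+k} dx = ∫ x^j f(x) dx for each j."""
    n = deg + 1
    A = [[integrate(lambda x, p=j + k: x ** p, a, b) for k in range(n)]
         for j in range(n)]
    rhs = [integrate(lambda x, p=j: (x ** p) * f(x), a, b) for j in range(n)]
    # solve the n x n system by Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, n):
            mfac = A[r][col] / A[col][col]
            for cc in range(col, n):
                A[r][cc] -= mfac * A[col][cc]
            rhs[r] -= mfac * rhs[col]
    coeffs = [0.0] * n
    for r in reversed(range(n)):
        coeffs[r] = (rhs[r] - sum(A[r][cc] * coeffs[cc]
                                  for cc in range(r + 1, n))) / A[r][r]
    return coeffs

coeffs = lstsq_poly(math.exp, -1.0, 1.0, 2)
print(coeffs)   # best quadratic fit, approximately 0.996 + 1.104 x + 0.537 x^2
```

Expanding f in the orthogonal polynomials $C_n$ instead of the monomials would diagonalize this system, which is one practical payoff of the orthogonality developed above.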
158 5. HILBERT SPACES

(a) $\|f \pm g\| = \|f\| + \|g\|$.
(b) $\|f + g\|^2 + \|f - g\|^2 = 2(\|f\| + \|g\|)^2$.
(c) Using parts (a), (b), and Theorem 1.2.8, show that $\mathcal{L}^1(\mathbb{R})$ is not an inner product space. This shows that not all norms arise from an inner product.

5.6. Use Equation (5.10) to derive Equation (5.12). Hint: To find $a_n$, equate the coefficients of $x^n$ on both sides of Equation (5.10). To find $a_{n-1}$, multiply both sides of Equation (5.10) by $C_{n-1}(x) w(x)$ and integrate, using the definitions of $k_n$, $k'_n$, and $h_n$.

5.7. Evaluate the integral $\int_a^b x^2 C_m(x) C_n(x) w(x)\,dx$.

Additional Reading
1. Boccara, N. Functional Analysis, Academic Press, 1990. An application-oriented book with many abstract topics related to Hilbert spaces (e.g., Lebesgue measure) explained for a physics audience.
2. DeVito, C. Functional Analysis and Linear Operator Theory, Addison-Wesley, 1990.
3. Reed, M., and Simon, B. Functional Analysis, Academic Press, 1980. Coauthored by a mathematical physicist (B.S.), this first volume of a four-volume encyclopedic treatise on functional analysis and Hilbert spaces has many examples and problems to help the reader comprehend the rather abstract presentation.
4. Zeidler, E. Applied Functional Analysis, Springer-Verlag, 1995. Another application-oriented book on Hilbert spaces suitable for a physics audience.
6 Generalized Functions

Once we allow the number of dimensions to be infinite, we open the door for numerous possibilities that are not present in the finite case. One such possibility arises because of the variety of infinities. We have encountered two types of infinity in Chapter 0, the countable infinity and the uncountable infinity. The paradigm of the former is the "number" of integers, and that of the latter is the "number" of real numbers. The nature of dimensionality of the vector space is reflected in the components of a general vector, which has a finite number of components in a finite-dimensional vector space, a countably infinite number of components in an infinite-dimensional vector space with a countable basis, and an uncountably infinite number of components in an infinite-dimensional vector space with no countable basis.

6.1 Continuous Index

To gain an understanding of the nature of, and differences between, the three types of vector spaces mentioned above, it is convenient to think of components as functions of a "counting set." Thus, the components $f_i$ of a vector $|f\rangle$ in an N-dimensional vector space can be thought of as values of a function f defined on the finite set $\{1, 2, \ldots, N\}$, and to emphasize such functional dependence, we write $f(i)$ instead of $f_i$. Similarly, the components $f_i$ of a vector $|f\rangle$ in a Hilbert space with the countable basis $B = \{|e_i\rangle\}_{i=1}^{\infty}$ can be thought of as values of a function $f : \mathbb{N} \to \mathbb{C}$, where $\mathbb{N}$ is the (infinite) set of natural numbers. The next step is to allow the counting set to be uncountable, i.e., a continuum such as the real numbers or an interval thereof. This leads to a "component" of the form $f(x)$ corresponding to a function $f : \mathbb{R} \to \mathbb{C}$. What about the vectors themselves? What sort of a basis gives rise to such components?
160 6. GENERALIZED FUNCTIONS

Because of the isomorphism of Theorem 5.2.2, we shall concentrate on $\mathcal{L}^2_w(a, b)$. In keeping with our earlier notation, let $\{|e_x\rangle\}_{x\in\mathbb{R}}$ denote the elements of an orthonormal set and interpret $f(x)$ as $\langle e_x | f\rangle$. The inner product of $\mathcal{L}^2_w(a, b)$ can now be written as

$$\langle g | f\rangle = \int_a^b g^*(x) f(x) w(x)\,dx = \int_a^b \langle g | e_x\rangle \langle e_x | f\rangle\, w(x)\,dx = \langle g |\left(\int_a^b |e_x\rangle\, w(x)\, \langle e_x |\, dx\right)| f\rangle.$$

The last line suggests writing

$$\int_a^b |e_x\rangle\, w(x)\, \langle e_x |\, dx = \mathbf{1}.$$

In the physics literature the "e" is ignored, and one writes $|x\rangle$ for $|e_x\rangle$. Hence, we obtain the completeness relation for a continuous index:

completeness relation for a continuous index

$$\int_a^b |x\rangle\, w(x)\, \langle x |\, dx = \mathbf{1}, \qquad\text{or}\qquad \int_a^b |x\rangle \langle x |\, dx = \mathbf{1}, \tag{6.1}$$

where in the second integral, w(x) is set equal to unity. We also have

$$|f\rangle = \left(\int_a^b |x\rangle\, w(x)\, \langle x |\, dx\right)|f\rangle = \int_a^b f(x) w(x)\, |x\rangle\, dx, \tag{6.2}$$

which shows how to expand a vector $|f\rangle$ in terms of the $|x\rangle$'s. Take the inner product of (6.2) with $\langle x'|$:

$$\langle x' | f\rangle = f(x') = \int_a^b f(x) w(x)\, \langle x' | x\rangle\, dx,$$

where x' is assumed to lie in the interval (a, b); otherwise $f(x') = 0$ by definition. This equation, which holds for arbitrary f, tells us immediately that $w(x)\langle x' | x\rangle$ is no ordinary function of x and x'. For instance, suppose $f(x') = 0$. Then the result of integration is always zero, regardless of the behavior of f at other points. Clearly, there is an infinitude of functions that vanish at x', yet all of them give the same integral! Pursuing this line of argument more quantitatively, one can show that $w(x)\langle x' | x\rangle = 0$ if $x \ne x'$, $w(x)\langle x | x\rangle = \infty$, $w(x)\langle x' | x\rangle$ is an even function of $x - x'$, and $\int_a^b w(x)\langle x' | x\rangle\,dx = 1$. The proof is left as a problem. The reader may recognize this as the Dirac delta function

Dirac delta function

$$\delta(x - x') = w(x)\langle x' | x\rangle, \tag{6.3}$$

which, for a function f defined on the interval (a, b), has the following property:¹

$$\int_a^b f(x)\delta(x - x')\,dx = \begin{cases} f(x') & \text{if } x' \in (a, b),\\ 0 & \text{if } x' \notin (a, b). \end{cases} \tag{6.4}$$

¹For an elementary discussion of the Dirac delta function with many examples of its application, see [Hass 99].
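The completeness relation (6.1) is the continuous analogue of the finite-dimensional statement $\sum_i |e_i\rangle\langle e_i| = \mathbf{1}$, and that finite version can be checked directly. A small Python illustration (not from the text) for an orthonormal basis of $\mathbb{R}^3$:

```python
import math

def outer(u, v):
    """|u><v| represented as an n x n matrix of products u_i v_j."""
    return [[ui * vj for vj in v] for ui in u]

# an orthonormal basis of R^3: the standard basis rotated by 0.7 rad in the xy-plane
c, s = math.cos(0.7), math.sin(0.7)
basis = [(c, s, 0.0), (-s, c, 0.0), (0.0, 0.0, 1.0)]

# accumulate the sum of projectors |e_i><e_i| over the whole basis
I = [[0.0] * 3 for _ in range(3)]
for e in basis:
    P = outer(e, e)
    I = [[I[i][j] + P[i][j] for j in range(3)] for i in range(3)]
print(I)   # the 3 x 3 identity matrix, up to floating-point rounding
```

Dropping any one basis vector from the sum leaves a rank-2 projector instead of the identity, which is the finite-dimensional face of "completeness."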
6.1 CONTINUOUS INDEX 161

Figure 6.1 The Gaussian bell-shaped curve approaches the Dirac delta function as the width of the curve approaches zero. The value of ε is 1 for the dashed curve, 0.25 for the heavy curve, and 0.05 for the light curve.

Written in the form $\langle x' | x\rangle = \delta(x - x')/w(x)$, Equation (6.3) is the generalization of the orthonormality relation of vectors to the case of a continuous index.

The Dirac delta function is anything but a "function." Nevertheless, there is a well-developed branch of mathematics, called generalized function theory or functional analysis, studying it and many other functions like it in a highly rigorous fashion. We shall only briefly explore this territory of mathematics in the next section. At this point we simply mention the fact that the Dirac delta function can be represented as the limit of certain sequences of ordinary functions. The following three examples illustrate some of these representations.

6.1.1. Example. Consider a Gaussian curve whose width approaches zero at the same time that its height approaches infinity in such a way that its area remains constant. In the infinite limit, we obtain the Dirac delta function. In fact, we have

$$\delta(x - x') = \lim_{\epsilon\to 0} \frac{1}{\sqrt{\pi\epsilon}}\, e^{-(x - x')^2/\epsilon}.$$

In the limit of $\epsilon \to 0$, the height of this Gaussian goes to infinity while its width goes to zero (see Figure 6.1). Furthermore, for any nonzero value of ε, we can easily verify that

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi\epsilon}}\, e^{-(x - x')^2/\epsilon}\,dx = 1.$$

This relation is independent of ε and therefore still holds in the limit $\epsilon \to 0$. The limit of the Gaussian behaves like the Dirac delta function.
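The defining property $\int \delta(x - x') f(x)\,dx = f(x')$ can be tested numerically for the Gaussian sequence. A Python sketch (illustrative; the test function $\cos$ and the point $x' = 0.3$ are my choices, not the text's):

```python
import math

def gauss(u, eps):
    """The approximating Gaussian (1/sqrt(pi*eps)) exp(-u^2/eps); unit area."""
    return math.exp(-u * u / eps) / math.sqrt(math.pi * eps)

def integrate(g, a, b, n=20000):
    """Composite Simpson's rule (n even)."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

f = math.cos        # a smooth test function (my choice)
xp = 0.3            # the point x'
for eps in (1.0, 0.1, 0.01, 0.001):
    val = integrate(lambda x: gauss(x - xp, eps) * f(x), xp - 5, xp + 5)
    print(eps, val)  # tends to f(x') = cos(0.3) ≈ 0.9553 as eps -> 0
```

The printed values show the smearing shrink with ε: the integral is a weighted average of f over a window of width $\sim\sqrt{\epsilon}$ around x'.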
6.1.2. Example. Consider the function $D_T(x - x')$ defined as

$$D_T(x - x') \equiv \frac{1}{2\pi}\int_{-T}^{T} e^{it(x - x')}\,dt.$$
162 6. GENERALIZED FUNCTIONS

Figure 6.2 The function $\sin Tx/(\pi x)$ also approaches the Dirac delta function as the width of the curve approaches zero. The value of T is 0.5 for the dashed curve, 2 for the heavy curve, and 15 for the light curve.

The integral is easily evaluated, with the result

$$D_T(x - x') = \frac{1}{2\pi}\,\frac{e^{it(x - x')}}{i(x - x')}\Bigg|_{-T}^{T} = \frac{\sin T(x - x')}{\pi(x - x')}.$$

The graph of $D_T(x - 0)$ as a function of x for various values of T is shown in Figure 6.2. Note that the width of the curve decreases as T increases. The area under the curve can be calculated:

$$\int_{-\infty}^{\infty} D_T(x - x')\,dx = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin T(x - x')}{x - x'}\,dx = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin y}{y}\,dy = 1. \tag{6.5}$$

Figure 6.2 shows that $D_T(x - x')$ becomes more and more like the Dirac delta function as T gets larger and larger. In fact, we have

$$\delta(x - x') = \lim_{T\to\infty}\frac{1}{\pi}\,\frac{\sin T(x - x')}{x - x'}.$$

To see this, we note that for any finite T we can write

$$D_T(x - x') = \frac{T}{\pi}\,\frac{\sin T(x - x')}{T(x - x')}.$$

Furthermore, for values of x that are very close to x', $T(x - x') \to 0$ and

$$\frac{\sin T(x - x')}{T(x - x')} \to 1.$$

Thus, for such values of x and x', we have $D_T(x - x') \approx T/\pi$, which is large when T is large. This is as expected of a delta function: $\delta(0) = \infty$. On the other hand, the width
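That $D_T$ reproduces $f(x')$ for large T can also be checked numerically, although the slowly decaying oscillatory tails make the convergence visibly slower than for the Gaussian. A Python sketch (not from the text; the test function and window are my choices):

```python
import math

def D_T(u, T):
    """D_T(u) = sin(T u) / (pi u), with the u -> 0 limit T/pi."""
    if abs(u) < 1e-12:
        return T / math.pi
    return math.sin(T * u) / (math.pi * u)

def smeared(f, xp, T, half_width=20.0, m=80000):
    """Midpoint-rule approximation of ∫ D_T(x - xp) f(x) dx on a finite window."""
    a = xp - half_width
    h = 2 * half_width / m
    total = 0.0
    for i in range(m):
        x = a + (i + 0.5) * h
        total += D_T(x - xp, T) * f(x)
    return total * h

xp = 0.3
for T in (5.0, 25.0, 125.0):
    print(T, smeared(math.cos, xp, T))  # tends to cos(0.3) ≈ 0.9553 as T grows
```

The fine grid (many points per oscillation period $2\pi/T$) matters here; an integration step comparable to the period would alias the oscillations away.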
6.1 CONTINUOUS INDEX 163

of $D_T(x - x')$ around x' is given, roughly, by the distance between the points at which $D_T(x - x')$ drops to zero: $T(x - x') = \pm\pi$, or $x - x' = \pm\pi/T$. This width is roughly $\Delta x = 2\pi/T$, which goes to zero as T grows. Again, this is as expected of the delta function.

The preceding example suggests another representation of the Dirac delta function:

$$\delta(x - x') = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{it(x - x')}\,dt. \tag{6.6}$$

6.1.3. Example. A third representation of the Dirac delta function involves the step function $\theta(x - x')$, which is defined as

step function or θ function

$$\theta(x - x') \equiv \begin{cases} 0 & \text{if } x < x',\\ 1 & \text{if } x > x' \end{cases} \tag{6.7}$$

and is discontinuous at $x = x'$. We can approximate this step function by many continuous functions, such as $T_\epsilon(x - x')$ defined by

$$T_\epsilon(x - x') \equiv \begin{cases} 0 & \text{if } x \le x' - \epsilon,\\ \dfrac{1}{2\epsilon}(x - x' + \epsilon) & \text{if } x' - \epsilon \le x \le x' + \epsilon,\\ 1 & \text{if } x \ge x' + \epsilon, \end{cases}$$

where ε is a small positive number, as shown in Figure 6.3. It is clear that

$$\theta(x - x') = \lim_{\epsilon\to 0} T_\epsilon(x - x').$$

Now let us consider the derivative of $T_\epsilon(x - x')$ with respect to x:

$$\frac{dT_\epsilon}{dx}(x - x') = \begin{cases} 0 & \text{if } x < x' - \epsilon,\\ \dfrac{1}{2\epsilon} & \text{if } x' - \epsilon < x < x' + \epsilon,\\ 0 & \text{if } x > x' + \epsilon. \end{cases}$$

We note that the derivative is not defined at $x = x' - \epsilon$ and $x = x' + \epsilon$, and that $dT_\epsilon/dx$ is zero everywhere except when x lies in the interval $(x' - \epsilon, x' + \epsilon)$, where it is equal to $1/(2\epsilon)$ and goes to infinity as $\epsilon \to 0$. Here again we see signs of the delta function. In fact, we also note that

$$\int_{-\infty}^{\infty}\left(\frac{dT_\epsilon}{dx}\right)dx = \int_{x'-\epsilon}^{x'+\epsilon}\left(\frac{dT_\epsilon}{dx}\right)dx = \int_{x'-\epsilon}^{x'+\epsilon}\frac{1}{2\epsilon}\,dx = 1.$$

It is not surprising, then, to find that $\lim_{\epsilon\to 0}\frac{dT_\epsilon}{dx}(x - x') = \delta(x - x')$. Assuming that the interchange of the order of differentiation and the limiting process is justified, we obtain the important identity

δ function as derivative of θ function

$$\frac{d}{dx}\theta(x - x') = \delta(x - x').$$
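The smeared step function gives the simplest numerical check of all: pairing $dT_\epsilon/dx$ with f is just the average of f over $(x' - \epsilon, x' + \epsilon)$, which visibly collapses to $f(x')$. A Python sketch (not from the text; the test function and x' are my choices):

```python
import math

def box_average(f, xp, eps, n=10000):
    """Pairing of dT_eps/dx with f: (1/(2*eps)) * integral of f over
    (xp - eps, xp + eps), computed with a midpoint rule."""
    h = 2 * eps / n
    s = sum(f(xp - eps + (i + 0.5) * h) for i in range(n))
    return s * h / (2 * eps)

xp = 1.0
for eps in (0.5, 0.05, 0.005):
    print(eps, box_average(math.sin, xp, eps))  # tends to sin(1.0) ≈ 0.8415
```

For this "box" sequence the exact value is $\sin(x')\,\sin(\epsilon)/\epsilon$, so the error shrinks like $\epsilon^2$, in line with the limit claimed in the text.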
164 6. GENERALIZED FUNCTIONS

Figure 6.3 The step function, or θ-function, shown in the figure has the Dirac delta function as its derivative.

Now that we have some understanding of one continuous index, we can generalize the results to several continuous indices. In the earlier discussion we looked at $f(x)$ as the xth component of some abstract vector $|f\rangle$. For functions of n variables, we can think of $f(x_1, \ldots, x_n)$ as the component of an abstract vector $|f\rangle$ along a basis vector $|x_1, \ldots, x_n\rangle$.² This basis is a direct generalization of one continuous index to n. Then $f(x_1, \ldots, x_n)$ is defined as $f(x_1, \ldots, x_n) = \langle x_1, \ldots, x_n | f\rangle$. If the region of integration is denoted by Ω, and we use the abbreviations $\mathbf{r}$ for $(x_1, \ldots, x_n)$, $f(\mathbf{r})$ for $f(x_1, \ldots, x_n)$, and $|\mathbf{r}\rangle$ for $|x_1, \ldots, x_n\rangle$, then we can write

$$|f\rangle = \int_\Omega d^n x\, f(\mathbf{r})\, w(\mathbf{r})\, |\mathbf{r}\rangle, \qquad \int_\Omega d^n x\, |\mathbf{r}\rangle\, w(\mathbf{r})\, \langle\mathbf{r}| = \mathbf{1},$$
$$f(\mathbf{r}') = \int_\Omega d^n x\, f(\mathbf{r})\, w(\mathbf{r})\, \langle\mathbf{r}' | \mathbf{r}\rangle, \qquad \langle\mathbf{r}' | \mathbf{r}\rangle\, w(\mathbf{r}) = \delta(\mathbf{r} - \mathbf{r}'), \tag{6.8}$$

where $d^n x$ is the "volume" element and Ω is the region of integration of interest. For instance, if the region of definition of the functions under consideration is the surface of the unit sphere, then [with $w(\mathbf{r}) = 1$] one gets

$$\int_0^{2\pi} d\varphi \int_0^{\pi} \sin\theta\, d\theta\, |\theta, \varphi\rangle \langle\theta, \varphi| = \mathbf{1}. \tag{6.9}$$

²Do not confuse this with an n-dimensional vector. In fact, the dimension is n-fold infinite: each $x_i$ counts one infinite set of numbers!
6.2 GENERALIZED FUNCTIONS 165

This will be used in our discussion of spherical harmonics in Chapter 12. An important identity involving the three-dimensional Dirac delta function comes from potential theory. This is (see [Hass 99] for a discussion of this equation)

$$\nabla^2\left(\frac{1}{|\mathbf{r} - \mathbf{r}'|}\right) = -4\pi\,\delta(\mathbf{r} - \mathbf{r}'). \tag{6.10}$$

6.2 Generalized Functions

Paul Adrien Maurice Dirac discovered the delta function in the late 1920s while investigating scattering problems in quantum mechanics. This "function" seemed to violate most properties of other functions known to mathematicians at the time. Furthermore, the derivative of the delta function, $\delta'(x - x')$, is such that for any ordinary function $f(x)$,

$$\int_{-\infty}^{\infty} f(x)\,\delta'(x - x')\,dx = -\int_{-\infty}^{\infty} f'(x)\,\delta(x - x')\,dx = -f'(x').$$

We can define $\delta'(x - x')$ by this relation. In addition, we can define the derivative of any function, including discontinuous functions, at any point (including points of discontinuity, where the usual definition of derivative fails) by this relation. That is, if $\varphi(x)$ is a "bad" function whose derivative is not defined at some point(s), and $f(x)$ is a "good" function, we can define the derivative of $\varphi(x)$ by

$$\int_{-\infty}^{\infty} f(x)\,\varphi'(x)\,dx \equiv -\int_{-\infty}^{\infty} f'(x)\,\varphi(x)\,dx.$$

The integral on the RHS is well-defined.

Functions such as the Dirac delta function and its derivatives of all orders are not functions in the traditional sense. What is common among all of them is that in most applications they appear inside an integral, and we saw in Chapter 1 that integration can be considered as a linear functional on the space of continuous functions. It is therefore natural to describe such functions in terms of linear functionals. This idea was picked up by Laurent Schwartz in the 1950s, who developed it into a new branch of mathematics called generalized functions, or distributions.
A distribution is a mathematical entity that appears inside an integral in conjunction with a well-behaved test function—which we assume to depend on n variables—such that the result of integration is a well-defined number. Depending on the type of test function used, different kinds of distributions can be defined. If we want to include the Dirac delta function and its derivatives of all orders, then the test functions must be infinitely differentiable, that is, they must be $\mathcal{C}^\infty$ functions on $\mathbb{R}^n$ (or $\mathbb{C}^n$). Moreover, in order for the theory of distributions to be mathematically feasible, all the test functions must vanish outside a finite "volume" of $\mathbb{R}^n$ (or $\mathbb{C}^n$).³ One common notation for such functions is $\mathcal{C}_F^\infty(\mathbb{R}^n)$ or $\mathcal{C}_F^\infty(\mathbb{C}^n)$

³Such functions are said to be of compact support.
166 6. GENERALIZED FUNCTIONS

generalized functions and distributions defined

(F stands for "finite"). The definitive property of distributions concerns the way they combine with test functions to give a number. The test functions used clearly form a vector space over $\mathbb{R}$ or $\mathbb{C}$. In this vector-space language, distributions are linear functionals. The linearity is a simple consequence of the properties of the integral. We therefore have the following definition of a distribution.

6.2.1. Definition. A distribution, or generalized function, is a continuous⁴ linear functional on the space $\mathcal{C}_F^\infty(\mathbb{R}^n)$ or $\mathcal{C}_F^\infty(\mathbb{C}^n)$. If $f \in \mathcal{C}_F^\infty$ and φ is a distribution, then

$$\varphi[f] = \int_{-\infty}^{\infty} \varphi(\mathbf{r})\, f(\mathbf{r})\, d^n x.$$

Another notation used in place of $\varphi[f]$ is $(\varphi, f)$. This is more appealing not only because φ is linear, in the sense that $\varphi[\alpha f + \beta g] = \alpha\varphi[f] + \beta\varphi[g]$, but also because the set of all such linear functionals forms a vector space; that is, the linear combination of the φ's is also defined. Thus, $(\varphi, f)$ suggests a mutual "democracy" for both f's and φ's.

We now have a shorthand way of writing integrals. For instance, if $\delta_a$ represents the Dirac delta function $\delta(x - a)$, with an integration over x understood, then $(\delta_a, f) = f(a)$. Similarly, $(\delta'_a, f) = -f'(a)$, and for linear combinations, $(\alpha\delta_a + \beta\delta'_a, f) = \alpha f(a) - \beta f'(a)$.

6.2.2. Example. An ordinary (continuous) function g can be thought of as a special case of a distribution. The linear functional $g : \mathcal{C}_F^\infty(\mathbb{R}) \to \mathbb{R}$ is simply defined by

$$(g, f) \equiv g[f] = \int_{-\infty}^{\infty} g(x) f(x)\,dx.$$

6.2.3. Example. An interesting application of distributions (generalized functions) occurs when the notion of density is generalized to include not only (smooth) volume densities, but also point-like, linear, and surface densities.
A point charge q located at $\mathbf{r}_0$ can be thought of as having a charge density $\rho(\mathbf{r}) = q\,\delta(\mathbf{r} - \mathbf{r}_0)$. In the language of linear functionals, we interpret ρ as a distribution, $\rho : \mathcal{C}_F^\infty(\mathbb{R}^3) \to \mathbb{R}$, which for an arbitrary function f gives

$$\rho[f] = (\rho, f) = q f(\mathbf{r}_0). \tag{6.11}$$

The delta function character of ρ can be detected from this equation by recalling that the LHS is

$$\rho[f] = \int \rho(\mathbf{r}) f(\mathbf{r})\,d^3x = \lim_{\Delta V_i \to 0}\sum_i \rho(\mathbf{r}_i) f(\mathbf{r}_i)\,\Delta V_i.$$

On the RHS of this equation, the only volume element that contributes is the one that contains the point $\mathbf{r}_0$; all the rest contribute zero. As $\Delta V_i \to 0$, the only way that the RHS can give a nonzero number is for $\rho(\mathbf{r}_0) f(\mathbf{r}_0)$ to be infinite. Since f is a well-behaved function, $\rho(\mathbf{r}_0)$ must be infinite, implying that $\rho(\mathbf{r})$ acts as a delta function. This shows that the definition of Equation (6.11) leads to a delta-function behavior for ρ. Similarly for linear and surface densities.

⁴See [Zeidler 95], pp. 27, 156-160, for a formal definition of the continuity of linear functionals.
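The functional point of view is easy to mimic in code: a distribution is literally a function that eats a test function and returns a number. A Python sketch (illustrative, not from the text; the names `delta`, `delta_prime`, `combine` are mine, and $f'(a)$ is approximated by a central difference):

```python
import math

EPS = 1e-6   # step for the numerical derivative used in delta_prime

def delta(a):
    """The functional (delta_a, f) = f(a)."""
    return lambda f: f(a)

def delta_prime(a):
    """(delta'_a, f) = -f'(a); here f'(a) is taken by a central difference."""
    return lambda f: -(f(a + EPS) - f(a - EPS)) / (2 * EPS)

def combine(alpha, phi, beta, psi):
    """The linear combination alpha*phi + beta*psi, itself a functional."""
    return lambda f: alpha * phi(f) + beta * psi(f)

# (2 delta_1 + 3 delta'_1, sin) = 2 sin(1) - 3 cos(1)
phi = combine(2.0, delta(1.0), 3.0, delta_prime(1.0))
print(phi(math.sin))
```

Note that linear combinations are formed without ever mentioning a pointwise value of the distribution, which is exactly the "vector space of functionals" described in Definition 6.2.1.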
"The amount of theoretical ground one has to cover before being able to solve problems of real practical value is rather large, but this circumstance is an inevitable consequence of the fundamental part played by transformation theory and is likely to become more pronounced in the theoretical physics of the future." P.A.M. Dirac (1930)

6.2 GENERALIZED FUNCTIONS 167

The example above and Problems 6.5 and 6.6 suggest that a distribution that confines an integral to a lower-dimensional space must have a delta function in its definition.

"Physical laws should have mathematical beauty." This statement was Dirac's response to the question of his philosophy of physics, posed to him in Moscow in 1955. He wrote it on a blackboard that is still preserved today.

Paul Adrien Maurice Dirac (1902-1984) was born in 1902 in Bristol, England, of a Swiss, French-speaking father and an English mother. His father, a taciturn man who refused to receive friends at home, enforced young Paul's silence by requiring that only French be spoken at the dinner table. Perhaps this explains Dirac's later disinclination toward collaboration and his general tendency to be a loner in most aspects of his life. The fundamental nature of his work made the involvement of students difficult, so perhaps Dirac's personality was well-suited to his extraordinary accomplishments.

Dirac went to Merchant Venturer's School, the public school where his father taught French, and while there displayed great mathematical abilities. Upon graduation, he followed in his older brother's footsteps and went to Bristol University to study electrical engineering. He was 19 when he graduated Bristol University in 1921. Unable to find a suitable engineering position due to the economic recession that gripped post-World War I England, Dirac accepted a fellowship to study mathematics at Bristol University.
This fellowship, together with a grant from the Department of Scientific and Industrial Research, made it possible for Dirac to go to Cambridge as a research student in 1923. At Cambridge Dirac was exposed to the experimental activities of the Cavendish Laboratory, and he became a member of the intellectual circle over which Rutherford and Fowler presided. He took his Ph.D. in 1926 and was elected in 1927 as a fellow. His appointment as university lecturer came in 1929. He assumed the Lucasian professorship following Joseph Larmor in 1932 and retired from it in 1969. Two years later he accepted a position at Florida State University where he lived out his remaining years. The FSU library now carries his name.

In the late 1920s the relentless march of ideas and discoveries had carried physics to a generally accepted relativistic theory of the electron. Dirac, however, was dissatisfied with the prevailing ideas and, somewhat in isolation, sought for a better formulation. By 1928 he succeeded in finding an equation, the Dirac equation, that accorded with his own ideas and also fit most of the established principles of the time. Ultimately, this equation, and the physical theory behind it, proved to be one of the great intellectual achievements of the period. It was particularly remarkable for the internal beauty of its mathematical structure, which not only clarified previously mysterious phenomena such as spin and the Fermi-Dirac statistics associated with it, but also predicted the existence of an electron-like particle of negative energy, the antielectron, or positron, and, more recently, it has come to play a role of great importance in modern mathematics, particularly in the interrelations between topology, geometry, and analysis. Heisenberg characterized the discovery of antimatter by Dirac as "the most decisive discovery in connection with the properties or the nature of elementary particles.... This discovery of particles and antiparticles by Dirac...
changed our whole outlook on atomic physics completely." One of the interesting implications of
168 6. GENERALIZED FUNCTIONS

his work that predicted the positron was the prediction of a magnetic monopole. Dirac won the Nobel Prize in 1933 for this work.

Dirac is not only one of the chief authors of quantum mechanics, but he is also the creator of quantum electrodynamics and one of the principal architects of quantum field theory. While studying the scattering theory of quantum particles, he invented the (Dirac) delta function; in his attempt at quantizing the general theory of relativity, he founded constrained Hamiltonian dynamics, which is one of the most active areas of theoretical physics research today. One of his greatest contributions is the invention of bra $\langle\,|$ and ket $|\,\rangle$.

While at Cambridge, Dirac did not accept many research students. Those who worked with him generally thought that he was a good supervisor, but one who did not spend much time with his students. A student needed to be extremely independent to work under Dirac. One such student was Dennis Sciama, who later became the supervisor of Stephen Hawking, the current holder of the Lucasian chair. Salam and Wigner, in their preface to the Festschrift that honors Dirac on his seventieth birthday and commemorates his contributions to quantum mechanics, succinctly assessed the man:

Dirac is one of the chief creators of quantum mechanics.... Posterity will rate Dirac as one of the greatest physicists of all time. The present generation values him as one of its greatest teachers.... On those privileged to know him, Dirac has left his mark... by his human greatness. He is modest, affectionate, and sets the highest possible standards of personal and scientific integrity. He is a legend in his own lifetime and rightly so.

(Taken from Schweber, S. S. "Some chapters for a history of quantum field theory: 1938-1952," in Relativity, Groups, and Topology II, vol. 2, B. S. DeWitt and R. Stora, eds., North-Holland, Amsterdam, 1984.)

We have seen that the delta function can be thought of as the limit of an ordinary function. This idea can be generalized.

6.2.4.
Definition. Let $\{\varphi_n(x)\}$ be a sequence of functions such that

$$\lim_{n\to\infty}\int_{-\infty}^{\infty}\varphi_n(x) f(x)\,dx$$

exists for all $f \in \mathcal{C}_F^\infty(\mathbb{R})$. Then the sequence is said to converge to the distribution φ, defined by

$$(\varphi, f) = \lim_{n\to\infty}\int_{-\infty}^{\infty}\varphi_n(x) f(x)\,dx \qquad \forall f.$$

This convergence is denoted by $\varphi_n \to \varphi$.

For example, it can be verified that

$$\frac{n}{\sqrt{\pi}}\,e^{-n^2x^2} \to \delta(x) \qquad\text{and}\qquad \frac{1 - \cos nx}{\pi n x^2} \to \delta(x),$$

and so on. The proofs are left as exercises.
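The second sequence above can be checked numerically in the sense of the definition: pair $\varphi_n$ with a fixed test function and watch the limit emerge. A Python sketch (not from the text; the decaying test function is my choice):

```python
import math

def phi_n(x, n):
    """phi_n(x) = (1 - cos(n x)) / (pi n x^2), with the x -> 0 limit n/(2 pi)."""
    if abs(x) < 1e-9:
        return n / (2 * math.pi)
    return (1 - math.cos(n * x)) / (math.pi * n * x * x)

def pair(n, f, a=-100.0, b=100.0, m=100000):
    """(phi_n, f) approximated by a midpoint rule on a finite window."""
    h = (b - a) / m
    total = 0.0
    for i in range(m):
        x = a + (i + 0.5) * h
        total += phi_n(x, n) * f(x)
    return total * h

f = lambda x: 1.0 / (1.0 + x * x)   # a smooth, decaying test function (my choice)
for n in (1, 10, 100):
    print(n, pair(n, f))            # approaches f(0) = 1 as n grows
```

The slow $1/x^2$ tails of this sequence mean a wide integration window is needed; that is typical of delta-converging sequences that are not compactly concentrated.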
6.3 PROBLEMS 169

derivative of a distribution

6.2.5. Definition. The derivative of a distribution φ is another distribution φ' defined by $(\varphi', f) = -(\varphi, f')$ for all $f \in \mathcal{C}_F^\infty$.

6.2.6. Example. We can combine the last two definitions to show that if the functions $\theta_n$ are defined as

$$\theta_n(x) \equiv \begin{cases} 0 & \text{if } x < -\tfrac{1}{n},\\ (nx + 1)/2 & \text{if } -\tfrac{1}{n} \le x \le \tfrac{1}{n},\\ 1 & \text{if } x \ge \tfrac{1}{n}, \end{cases}$$

then $\theta'_n(x) \to \delta(x)$. We write the definition of the derivative, $(\theta'_n, f) = -(\theta_n, f')$, in terms of integrals:

$$\int_{-\infty}^{\infty}\theta'_n(x) f(x)\,dx = -\int_{-\infty}^{\infty}\theta_n(x)\frac{df}{dx}\,dx = -\int_{-\infty}^{\infty}\theta_n(x)\,df$$
$$= -\left(\int_{-\infty}^{-1/n}\theta_n(x)\,df + \int_{-1/n}^{1/n}\theta_n(x)\,df + \int_{1/n}^{\infty}\theta_n(x)\,df\right)$$
$$= -\left(0 + \int_{-1/n}^{1/n}\frac{nx + 1}{2}\,df + \int_{1/n}^{\infty}df\right)$$
$$= -\frac{n}{2}\int_{-1/n}^{1/n}x\,df - \frac{1}{2}\int_{-1/n}^{1/n}df - \int_{1/n}^{\infty}df$$
$$= -\frac{n}{2}\left(x f(x)\Big|_{-1/n}^{1/n} - \int_{-1/n}^{1/n}f(x)\,dx\right) - \frac{1}{2}\left(f(1/n) - f(-1/n)\right) - f(\infty) + f(1/n).$$

For large n, we have $1/n \approx 0$ and $f(\pm 1/n) \approx f(0)$. Thus,

$$\int_{-\infty}^{\infty}\theta'_n(x) f(x)\,dx \approx -\frac{n}{2}\left(\frac{1}{n}f\Big(\frac{1}{n}\Big) + \frac{1}{n}f\Big(-\frac{1}{n}\Big) - \frac{2}{n}f(0)\right) + f(0) \approx f(0).$$

The approximation becomes equality in the limit $n \to \infty$. Thus,

$$\lim_{n\to\infty}\int_{-\infty}^{\infty}\theta'_n(x) f(x)\,dx = f(0) = (\delta_0, f) \quad\Longrightarrow\quad \theta'_n \to \delta.$$

Note that $f(\infty) = 0$ because of the assumption that all functions must vanish outside a finite volume.

6.3 Problems

6.1. Write a density function for two point charges $q_1$ and $q_2$ located at $\mathbf{r} = \mathbf{r}_1$ and $\mathbf{r} = \mathbf{r}_2$, respectively.

6.2. Write a density function for four point charges $q_1 = q$, $q_2 = -q$, $q_3 = q$, and $q_4 = -q$, located at the corners of a square of side 2a, lying in the xy-plane, whose center is at the origin and whose first corner is at (a, a).
170 6. GENERALIZED FUNCTIONS

6.3. Show that $\delta(f(x)) = \dfrac{1}{|f'(x_0)|}\,\delta(x - x_0)$, where $x_0$ is a root of f and x is confined to values close to $x_0$. Hint: Make a change of variable to $y = f(x)$.

6.4. Show that

$$\delta(f(x)) = \sum_k \frac{1}{|f'(x_k)|}\,\delta(x - x_k),$$

where the $x_k$'s are all the roots of f in the interval on which f is defined.

6.5. Define the distribution $\rho : \mathcal{C}^\infty(\mathbb{R}^3) \to \mathbb{R}$ by

$$(\rho, f) = \iint_S \sigma(\mathbf{r}) f(\mathbf{r})\,da(\mathbf{r}),$$

where $\sigma(\mathbf{r})$ is a smooth function on a smooth surface S in $\mathbb{R}^3$. Show that $\rho(\mathbf{r})$ is zero if $\mathbf{r}$ is not on S and infinite if $\mathbf{r}$ is on S.

6.6. Define the distribution $\rho : \mathcal{C}^\infty(\mathbb{R}^3) \to \mathbb{R}$ by

$$(\rho, f) = \int_C \lambda(\mathbf{r}) f(\mathbf{r})\,dl(\mathbf{r}),$$

where $\lambda(\mathbf{r})$ is a smooth function on a smooth curve C in $\mathbb{R}^3$. Show that $\rho(\mathbf{r})$ is zero if $\mathbf{r}$ is not on C and infinite if $\mathbf{r}$ is on C.

6.7. Express the three-dimensional Dirac delta function as a product of three one-dimensional delta functions involving the coordinates in
(a) cylindrical coordinates,
(b) spherical coordinates,
(c) general curvilinear coordinates.
Hint: The Dirac delta function in $\mathbb{R}^3$ satisfies $\iiint \delta(\mathbf{r})\,d^3x = 1$.

6.8. Show that $\int_{-\infty}^{\infty}\delta'(x) f(x)\,dx = -f'(0)$, where $\delta'(x) \equiv \frac{d}{dx}\delta(x)$.

6.9. Evaluate the following integrals:
(a) $\int_{-\infty}^{\infty}\delta(x^2 - 5x + 6)(3x^2 - 7x + 2)\,dx$.
(b) $\int_{-\infty}^{\infty}\delta(x^2 - \pi^2)\cos x\,dx$.
(c) $\int_{0.5}^{\infty}\delta(\sin \pi x)\left(\tfrac{1}{2}\right)^x dx$.
(d) $\int_{-\infty}^{\infty}\delta(e^{-x})\ln x\,dx$.
Hint: Use the result of Problem 6.4.

6.10. Consider $|x|$ as a generalized function and find its derivative.
6.3 PROBLEMS 171

6.11. Let $\psi \in \mathcal{C}^\infty(\mathbb{R}^n)$ be a smooth function on $\mathbb{R}^n$, and let φ be a distribution. Show that ψφ is also a distribution. What is the natural definition for ψφ? What is $(\psi\varphi)'$, the derivative of ψφ?

6.12. Show that each of the following sequences of functions approaches δ(x) in the sense of Definition 6.2.4.
(a) $\dfrac{n}{\sqrt{\pi}}\,e^{-n^2x^2}$.
(b) $\dfrac{1 - \cos nx}{\pi n x^2}$.
(d) $\dfrac{\sin nx}{\pi x}$.
Hint: Approximate $\varphi_n(x)$ for large n and $x \approx 0$, and then evaluate the appropriate integral.

6.13. Show that $\tfrac{1}{2}(1 + \tanh nx) \to \theta(x)$ as $n \to \infty$.

6.14. Show that $x\delta'(x) = -\delta(x)$.

Additional Reading
1. Hassani, S. Mathematical Methods, Springer-Verlag, 2000. An elementary treatment of the Dirac delta function with many examples drawn from mechanics and electromagnetism.
2. Rudin, W. Functional Analysis, McGraw-Hill, 1991. Part II of this mathematical but (for those with a strong undergraduate mathematics background) very readable book is devoted to the theory of distributions.
3. Reed, M., and Simon, B. Functional Analysis, Academic Press, 1980.
7 Classical Orthogonal Polynomials

The last example of Chapter 5 discussed only one of the many types of the so-called classical orthogonal polynomials. Historically, these polynomials were discovered as solutions to differential equations arising in various physical problems. Such polynomials can be produced by starting with $1, x, x^2, \ldots$ and employing the Gram-Schmidt process. However, there is a more elegant, albeit less general, approach that simultaneously studies most polynomials of interest to physicists. We will employ this approach.¹

7.1 General Properties

Most relevant properties of the polynomials of interest are contained in

7.1.1. Theorem. Consider the functions

$$F_n(x) = \frac{1}{w(x)}\frac{d^n}{dx^n}(w s^n) \qquad \text{for } n = 0, 1, 2, \ldots, \tag{7.1}$$

where

1. $F_1(x)$ is a first-degree polynomial in x,
2. $s(x)$ is a polynomial in x of degree less than or equal to 2 with only real roots,
3. $w(x)$ is a strictly positive function, integrable in the interval (a, b), that satisfies the boundary conditions $w(a)s(a) = 0 = w(b)s(b)$.

¹This approach is due to F. G. Tricomi [Tric 55]. See also [Denn 67].
Then F_n(x) is a polynomial of degree n in x and is orthogonal, on the interval (a, b) with weight w(x), to any polynomial P_k(x) of degree k < n, i.e.,

∫_a^b P_k(x) F_n(x) w(x) dx = 0   for k < n.

These polynomials are collectively called classical orthogonal polynomials.

Before proving the theorem, we need two lemmas.²

7.1.2. Lemma. The following identity holds:

d^m/dx^m (w sⁿ P_{≤k}) = w s^{n−m} P_{≤k+m}   for m ≤ n.

Proof. See Problem 7.1. □

7.1.3. Lemma. All the derivatives d^m/dx^m (w sⁿ) vanish at x = a and x = b, for all values of m < n.

Proof. Set k = 0 in the identity of the previous lemma and let P_{≤0} = 1. Then we have d^m/dx^m (w sⁿ) = w s^{n−m} P_{≤m}. The RHS vanishes at x = a and x = b due to the third condition stated in the theorem. □

Proof of the theorem. We prove the orthogonality first. The proof involves multiple use of integration by parts:

∫_a^b P_k(x) F_n(x) w(x) dx = ∫_a^b P_k(x) (1/w) [dⁿ/dxⁿ (w sⁿ)] w dx
  = ∫_a^b P_k(x) d/dx [d^{n−1}/dx^{n−1} (w sⁿ)] dx
  = P_k(x) d^{n−1}/dx^{n−1} (w sⁿ) |_a^b − ∫_a^b (dP_k/dx) d^{n−1}/dx^{n−1} (w sⁿ) dx,

where the boundary term is zero by Lemma 7.1.3. This shows that each integration by parts transfers one differentiation from w sⁿ to P_k and introduces a minus sign. Thus, after k integrations by parts, we get

∫_a^b P_k(x) F_n(x) w(x) dx = (−1)^k ∫_a^b (d^k P_k/dx^k) d^{n−k}/dx^{n−k} (w sⁿ) dx
  = C ∫_a^b d/dx [d^{n−k−1}/dx^{n−k−1} (w sⁿ)] dx = C d^{n−k−1}/dx^{n−k−1} (w sⁿ) |_a^b = 0,

²Recall that P_{≤k} denotes a generic polynomial of degree less than or equal to k.
where we have used the fact that the kth derivative of a polynomial of degree k is a constant. Note that n − k − 1 ≥ 0 because k < n, so that the last line of the equation is well-defined.

To prove the first part of the theorem, we use Lemma 7.1.2 with k = 0 and m = n to get dⁿ/dxⁿ (w sⁿ) = w P_{≤n}, or F_n(x) = (1/w) dⁿ/dxⁿ (w sⁿ) = P_{≤n}. To prove that F_n(x) is a polynomial of degree precisely equal to n, we write F_n(x) = P_{≤n−1}(x) + k_n xⁿ, multiply both sides by w(x)F_n(x), and integrate over (a, b):

∫_a^b [F_n(x)]² w(x) dx = ∫_a^b P_{≤n−1} F_n(x) w(x) dx + k_n ∫_a^b xⁿ F_n(x) w(x) dx.

The LHS is a positive quantity because both w(x) and [F_n(x)]² are positive, and the first integral on the RHS vanishes by the first part of the proof. Therefore, the second term on the RHS cannot be zero. In particular, k_n ≠ 0, and F_n(x) is of degree n. □

It is customary to introduce a normalization constant in the definition of F_n(x), and write

F_n(x) = (1/(K_n w)) dⁿ/dxⁿ (w sⁿ).   (7.2)

This equation is called the generalized Rodriguez formula. For historical reasons, different polynomial functions are normalized differently, which is why K_n is introduced here. From Theorem 7.1.1 it is clear that the sequence F₀(x), F₁(x), F₂(x), ... of polynomials forms an orthogonal set of polynomials on [a, b] with weight function w(x).

All the varieties of classical orthogonal polynomials were discovered as solutions of differential equations. Here, we give a single generic differential equation satisfied by all the F_n's. The proof is outlined in Problem 7.4.

7.1.4. Proposition. Let k₁ be the coefficient of x in F₁(x) and σ₂ the coefficient of x² in s(x).
Then the orthogonal polynomials F_n satisfy the differential equation³

(w s F_n')' = λ_n w F_n,   where λ_n = K₁k₁ n + σ₂ n(n − 1).

We shall study the differential equation above in the context of the Sturm-Liouville problem (see Chapters 18 and 19), which is an eigenvalue problem involving differential operators.

³A prime is a symbol for derivative with respect to x.
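The orthogonality asserted by Theorem 7.1.1 is easy to check numerically. A minimal sketch (Python with NumPy) for the Legendre case w(x) = 1 on (−1, 1), generating P_n from the three-term recurrence derived later in this chapter (Equation (7.20)) and integrating with Gauss-Legendre quadrature, which is exact for the polynomial integrands involved:

```python
import numpy as np

def legendre(n, x):
    """P_n(x) via the recurrence (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}."""
    p0, p1 = np.ones_like(x), x
    if n == 0:
        return p0
    for k in range(1, n):
        p0, p1 = p1, ((2*k + 1)*x*p1 - k*p0) / (k + 1)
    return p1

nodes, weights = np.polynomial.legendre.leggauss(20)  # exact through degree 39

# <P_m, P_n> over [-1,1]: zero for m != n, and 2/(2n+1) for m = n
for m in range(6):
    for n in range(6):
        inner = np.sum(weights * legendre(m, nodes) * legendre(n, nodes))
        expected = 2.0/(2*n + 1) if m == n else 0.0
        assert abs(inner - expected) < 1e-12
print("orthogonality verified")
```

The value 2/(2n+1) of the diagonal inner product is the normalization h_n computed for Legendre polynomials later in Section 7.4.3.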
7.2 Classification

Let us now investigate the consequences of various choices of s(x). We start with F₁(x), and note that it satisfies Equation (7.2) with n = 1:

F₁(x) = (1/(K₁ w)) d/dx (w s),   or   d/dx (w s) = K₁ F₁ (w s)/s,   (7.3)

which can be integrated to yield w s = A exp(∫ K₁F₁(x)/s dx), where A is a constant. On the other hand, being a polynomial of degree 1, F₁(x) can be written as F₁(x) = k₁x + k₁'. It follows that

w(x)s(x) = A exp( ∫ K₁(k₁x + k₁')/s dx ),   with   w(a)s(a) = 0 = w(b)s(b).

Next we look at the three choices for s(x): a constant, a polynomial of degree 1, and a polynomial of degree 2.

For a constant s(x), Equation (7.3) can be easily integrated:

w(x)s(x) = A exp( ∫ K₁(k₁x + k₁')/s₁ dx ) = A exp( ∫ (2αx + β) dx ) = A e^{αx² + βx + c} = B e^{αx² + βx},

where s₁ denotes the constant value of s. The interval (a, b) is determined by w(a)s(a) = 0 = w(b)s(b), which yields B e^{αa² + βa} = 0 = B e^{αb² + βb}. The only way that this equality can hold is for a and b to be infinite. Since a < b, we must take a = −∞ and b = +∞, in which case α < 0. With y = √|α| (x + β/(2α)) and choosing B = s₁ exp(β²/(4α)), we obtain w(y) = exp(−y²). We also take the constant s to be 1. This is always possible by a proper choice of constants such as B.

If the degree of s is 1, then s(x) = σ₁x + σ₀ and

w(x)s(x) = B(σ₁x + σ₀)^ρ e^{γx},

where γ = K₁k₁/σ₁, ρ = K₁k₁'/σ₁ − K₁k₁σ₀/σ₁², and B is A modified by the constant of integration. The last equation above must satisfy the boundary conditions at a and b: B(σ₁a + σ₀)^ρ e^{γa} = 0 = B(σ₁b + σ₀)^ρ e^{γb}, which give a = −σ₀/σ₁, ρ > 0, γ < 0, and b = +∞. With appropriate redefinition of variables and parameters, we can write w(y) = y^ν e^{−y}, ν > −1, and s(x) = x, a = 0, b = +∞.
Table 7.1 Special cases of Jacobi polynomials

  μ        ν        w(x)              Polynomial
  0        0        1                 Legendre, P_n(x)
  λ − ½    λ − ½    (1 − x²)^{λ−1/2}  Gegenbauer, C_n^λ(x), λ > −½
  −½       −½       (1 − x²)^{−1/2}   Chebyshev of the first kind, T_n(x)
  ½        ½        (1 − x²)^{1/2}    Chebyshev of the second kind, U_n(x)

Similarly, we can obtain the weight function and the interval of integration for the case when s(x) is of degree 2. This result, as well as the results obtained above, are collected in the following proposition.

7.2.1. Proposition. If the conditions of Theorem 7.1.1 prevail, then
(a) For s(x) of degree zero we get w(x) = e^{−x²} with s(x) = 1, a = −∞, and b = +∞. The resulting polynomials are called Hermite polynomials and are denoted by H_n(x).
(b) For s(x) of degree 1, we obtain w(x) = x^ν e^{−x} with ν > −1, s(x) = x, a = 0, and b = +∞. The resulting polynomials are called Laguerre polynomials and are denoted by L_n^ν(x).
(c) For s(x) of degree 2, we get w(x) = (1 + x)^μ (1 − x)^ν with μ, ν > −1, s(x) = 1 − x², a = −1, and b = +1. The resulting polynomials are called Jacobi polynomials and are denoted by P_n^{μ,ν}(x).

Jacobi polynomials are themselves divided into other subcategories depending on the values of μ and ν. The most common and widely used of these are collected in Table 7.1. Note that the definition of each of the preceding polynomials involves a "standardization," which boils down to a particular choice of K_n in the generalized Rodriguez formula.

7.3 Recurrence Relations

Besides the recurrence relations obtained in Section 5.2, we can use the differential equation of Proposition 7.1.4 to construct new recurrence relations involving derivatives. These relations apply only to classical orthogonal polynomials, and not to general ones. We start with Equation (5.12),

F_{n+1}(x) = (α_n x + β_n) F_n(x) + γ_n F_{n−1}(x),   (7.4)
differentiate both sides twice, and substitute for the second derivative from the differential equation of Proposition 7.1.4. This will yield

2wsα_n F_n' + [α_n d(ws)/dx + wλ_n(α_n x + β_n)] F_n − wλ_{n+1} F_{n+1} + wγ_n λ_{n−1} F_{n−1} = 0.   (7.5)

Karl Gustav Jacob Jacobi (1804-1851) was the second son born to a well-to-do Jewish banking family in Potsdam. An obviously bright young man, Jacobi was soon moved to the highest class in spite of his youth and remained at the gymnasium for four years only because he could not enter the university until he was sixteen. He excelled at the University of Berlin in all the classical subjects as well as mathematical studies, the topic he soon chose as his career. He passed the examination to become a secondary school teacher, then later the examination that allowed university teaching, and joined the faculty at Berlin at the age of twenty. Since promotion there appeared unlikely, he moved in 1826 to the University of Königsberg in search of a more permanent position. He was known as a lively and creative lecturer who often injected his latest research topics into the lectures. He began what is now a common practice at most universities, the research seminar, for the most advanced students and his faculty collaborators. The Jacobi "school," together with the influence of Bessel and Neumann (also at Königsberg), sparked a renewal of mathematical excellence in Germany.

In 1843 Jacobi fell gravely ill with diabetes. After seeing his condition, Dirichlet, with the help of von Humboldt, secured a donation to enable Jacobi to spend several months in Italy, a therapy recommended by his doctor. The friendly atmosphere and healthful climate there soon improved his condition. Jacobi was later given royal permission to move from Königsberg to Berlin so that his health would not be affected by the harsh winters in the former location.
A salary bonus given to Jacobi to offset the higher cost of living in the capital was revoked after he made some politically sensitive remarks in an impromptu speech. A permanent position at Berlin was also refused, and the reduced salary and lack of security caused considerable hardship for Jacobi and his family. Only after he accepted a position in Vienna did the Prussian government recognize the desirability of keeping the distinguished mathematician within its borders, offering him special concessions that, together with his love for his homeland, convinced Jacobi to stay. In 1851 Jacobi died after contracting both influenza and smallpox.

Jacobi's mathematical reputation began largely with his heated competition with Abel in the study of elliptic functions. Legendre, formerly the star of such studies, wrote Jacobi of his happiness at having "lived long enough to witness these magnanimous contests between two young athletes equally strong." Although Jacobi and Abel could reasonably be considered contemporary researchers who arrived at many of the same results independently, Jacobi suggested the names "Abelian functions" and "Abelian theorem" in a review he wrote for Crelle's Journal. Jacobi also extended his discoveries in elliptic functions to number theory and the theory of integration. He also worked in other areas of number theory, such as the theory of quadratic forms and the representation of integers as sums of squares and cubes. He
presented the well-known Jacobian, or functional determinant, in 1841. To physicists, Jacobi is probably best known for his work in dynamics with the form introduced by Hamilton. Although elegant and quite general, Hamiltonian dynamics did not lend itself to easy solution of many practical problems in mechanics. In the spirit of Lagrange, Poisson, and others, Jacobi investigated transformations of Hamilton's equations that preserved their canonical nature (loosely speaking, that preserved the Poisson brackets in each representation). After much work and a little simplification, the resulting equations of motion, now known as Hamilton-Jacobi equations, allowed Jacobi to solve several important problems in ordinary and celestial mechanics. Clebsch and later Helmholtz amplified their use in other areas of physics.

We can get another recurrence relation involving derivatives by substituting (7.4) in (7.5) and simplifying:

2wsα_n F_n' + [α_n d(ws)/dx + w(λ_n − λ_{n+1})(α_n x + β_n)] F_n + wγ_n(λ_{n−1} − λ_{n+1}) F_{n−1} = 0.   (7.6)

Two other recurrence relations can be obtained by differentiating Equations (7.6) and (7.5), respectively, and using the differential equation for F_n. Now solve the first equation so obtained for γ_n (d/dx)(wF_{n−1}) and substitute the result in the second equation. After simplification, the result will be

2wα_nλ_n F_n + d/dx {[α_n d(ws)/dx + w(λ_n − λ_{n−1})(α_n x + β_n)] F_n} + (λ_{n−1} − λ_{n+1}) d/dx (wF_{n+1}) = 0.   (7.7)

Finally, we record one more useful recurrence relation:

A_n(x) F_n − λ_{n+1}(α_n x + β_n) (dw/dx) F_{n+1} + γ_n λ_{n−1}(α_n x + β_n) (dw/dx) F_{n−1} + B_n(x) F_{n+1}' + γ_n D_n(x) F_{n−1}' = 0,   (7.8)

where

A_n(x) = (α_n x + β_n) [2wα_nλ_n + α_n d²(ws)/dx² + λ_n(α_n x + β_n) dw/dx] − α_n d(ws)/dx,
B_n(x) = α_n d(ws)/dx − w(α_n x + β_n)(λ_{n+1} − λ_n),
D_n(x) = w(α_n x + β_n)(λ_{n−1} − λ_n) − α_n d(ws)/dx.

Details of the derivation of this relation are left for the reader. All these recurrence relations seem to be very complicated. However, complexity is the price we pay
for generality. When we work with specific orthogonal polynomials, the equations simplify considerably. For instance, for Hermite and Legendre polynomials Equation (7.6) yields, respectively, the useful recurrence relations

H_n' = 2n H_{n−1}

and

(1 − x²) P_n' + n x P_n − n P_{n−1} = 0.   (7.9)

Also, applying Equation (7.7) to Legendre polynomials gives

P_{n+1}' − x P_n' − (n + 1) P_n = 0,   (7.10)

and Equation (7.8) yields

P_{n+1}' − P_{n−1}' − (2n + 1) P_n = 0.   (7.11)

It is possible to find many more recurrence relations by manipulating the existing recurrence relations. Before studying specific orthogonal polynomials, let us pause for a moment to appreciate the generality and elegance of the foregoing discussion. With a few assumptions and a single defining equation we have severely restricted the choice of the weight function and with it the choice of the interval (a, b). We have nevertheless exhausted the list of the so-called classical orthogonal polynomials.

7.4 Examples of Classical Orthogonal Polynomials

We now construct the specific polynomials used frequently in physics. We have seen that the four parameters K_n, k_n, k_n', and h_n determine all the properties of the polynomials. Once K_n is fixed by some standardization, we can determine all the other parameters: k_n and k_n' will be given by the generalized Rodriguez formula, and h_n can be calculated as

h_n = ∫_a^b F_n²(x) w(x) dx = ∫_a^b (k_n xⁿ + ...) F_n(x) w(x) dx
  = k_n ∫_a^b w xⁿ (1/(K_n w)) dⁿ/dxⁿ (w sⁿ) dx = (k_n/K_n) ∫_a^b xⁿ d/dx [d^{n−1}/dx^{n−1} (w sⁿ)] dx
  = (k_n/K_n) xⁿ d^{n−1}/dx^{n−1} (w sⁿ) |_a^b − (k_n/K_n) ∫_a^b d(xⁿ)/dx d^{n−1}/dx^{n−1} (w sⁿ) dx.

The first term of the last line is zero by Lemma 7.1.3. It is clear that each integration by parts introduces a minus sign and shifts one differentiation from w sⁿ to xⁿ. Thus, after n integrations by parts and noting that d⁰/dx⁰ (w sⁿ) = w sⁿ and dⁿ/dxⁿ (xⁿ) = n!, we obtain

h_n = ((−1)ⁿ k_n n! / K_n) ∫_a^b w sⁿ dx.   (7.12)
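Identities such as (7.11) can be spot-checked numerically. A sketch (Python, assuming NumPy's `numpy.polynomial.legendre` module, whose Legendre polynomials match the standardization used here):

```python
import numpy as np
from numpy.polynomial import legendre as L

def Pn(n, x):
    """P_n(x) via NumPy's Legendre coefficient representation."""
    c = np.zeros(n + 1); c[n] = 1.0
    return L.legval(x, c)

def dPn(n, x):
    """Derivative P_n'(x), differentiating the coefficient array."""
    c = np.zeros(n + 1); c[n] = 1.0
    return L.legval(x, L.legder(c))

# Equation (7.11):  P'_{n+1} - P'_{n-1} - (2n+1) P_n = 0
for n in range(1, 6):
    for x in (-0.9, 0.1, 0.75):
        assert abs(dPn(n+1, x) - dPn(n-1, x) - (2*n + 1)*Pn(n, x)) < 1e-10

# Equation (7.10):  P'_{n+1} - x P'_n - (n+1) P_n = 0
for n in range(5):
    for x in (-0.5, 0.3):
        assert abs(dPn(n+1, x) - x*dPn(n, x) - (n + 1)*Pn(n, x)) < 1e-10
```

Both checks are exact up to rounding, since every quantity involved is a polynomial evaluated in floating point.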
7.4.1 Hermite Polynomials

The Hermite polynomials are standardized such that K_n = (−1)ⁿ. Thus, the generalized Rodriguez formula (7.2) and Proposition 7.2.1 give

H_n(x) = (−1)ⁿ e^{x²} dⁿ/dxⁿ (e^{−x²}).   (7.13)

It is clear that each time e^{−x²} is differentiated, a factor of −2x is introduced. The highest power of x is obtained when we differentiate e^{−x²} n times. This yields

(−1)ⁿ e^{x²} (−2x)ⁿ e^{−x²} = 2ⁿ xⁿ   ⟹   k_n = 2ⁿ.

To obtain k_n', we find it helpful to see whether the polynomial is even or odd. We substitute −x for x in Equation (7.13) and get H_n(−x) = (−1)ⁿ H_n(x), which shows that if n is even (odd), H_n is an even (odd) polynomial, i.e., it can have only even (odd) powers of x. In either case, the next-highest power of x in H_n(x) is not n − 1 but n − 2. Thus, the coefficient of x^{n−1} is zero for H_n(x), and we have k_n' = 0. For h_n, we use (7.12) to obtain h_n = √π 2ⁿ n!.

Next we calculate the recurrence relation of Equation (5.12). We can readily calculate the constants needed: α_n = 2, β_n = 0, γ_n = −2n. Then substitute these in Equation (5.12) to obtain

H_{n+1}(x) = 2x H_n(x) − 2n H_{n−1}(x).   (7.14)

Other recurrence relations can be obtained similarly. Finally, the differential equation of H_n(x) is obtained by first noting that K₁ = −1, σ₂ = 0, and F₁(x) = 2x ⟹ k₁ = 2. All of this gives λ_n = −2n, which can be used in the equation of Proposition 7.1.4 to get

d²H_n/dx² − 2x dH_n/dx + 2n H_n = 0.   (7.15)

7.4.2 Laguerre Polynomials

For Laguerre polynomials, the standardization is K_n = n!. Thus, the generalized Rodriguez formula (7.2) and Proposition 7.2.1 give

L_n^ν(x) = (1/(n! x^ν e^{−x})) dⁿ/dxⁿ (x^{n+ν} e^{−x}) = (1/n!) x^{−ν} e^x dⁿ/dxⁿ (x^{n+ν} e^{−x}).   (7.16)

To find k_n we note that differentiating e^{−x} does not introduce any new powers of x but only a factor of −1. Thus, the highest power of x is obtained by leaving x^{n+ν} alone and differentiating e^{−x} n times. This gives

(1/n!) x^{−ν} e^x x^{n+ν} (−1)ⁿ e^{−x} = ((−1)ⁿ/n!) xⁿ,
so that k_n = (−1)ⁿ/n!.
We may try to check the evenness or oddness of L_n^ν(x); however, this will not be helpful, because changing x to −x distorts the RHS of Equation (7.16). In fact, k_n' ≠ 0 in this case, and it can be calculated by noticing that the next-highest power of x is obtained by adding the first derivative of x^{n+ν} n times and multiplying the result by (−1)^{n−1}, which comes from differentiating e^{−x}. We obtain

(1/n!) x^{−ν} e^x [(−1)^{n−1} n (n + ν) x^{n+ν−1} e^{−x}] = ((−1)^{n−1}(n + ν)/(n − 1)!) x^{n−1},

and therefore k_n' = (−1)^{n−1}(n + ν)/(n − 1)!.

Finally, for h_n we get

h_n = ((−1)ⁿ [(−1)ⁿ/n!] n! / n!) ∫_0^∞ x^ν e^{−x} xⁿ dx = (1/n!) ∫_0^∞ x^{n+ν} e^{−x} dx.

If ν is not an integer (and it need not be), the integral on the RHS cannot be evaluated by elementary methods. In fact, this integral occurs so frequently in mathematical applications that it is given a special name, the gamma function. A detailed discussion of this function can be found in Chapter 11. At this point, we simply note that

Γ(n + 1) = n!   for n ∈ ℕ,   (7.17)

and write h_n as

h_n = Γ(n + ν + 1)/n! = Γ(n + ν + 1)/Γ(n + 1).

The relevant parameters for the recurrence relation can be easily calculated:

α_n = −1/(n + 1),   β_n = (2n + ν + 1)/(n + 1),   γ_n = −(n + ν)/(n + 1).

Substituting these in Equation (5.12) and simplifying yields

(n + 1) L_{n+1}^ν = (2n + ν + 1 − x) L_n^ν − (n + ν) L_{n−1}^ν.

With k₁ = −1 and σ₂ = 0, we get λ_n = −n, and the differential equation of Proposition 7.1.4 becomes

x d²L_n^ν/dx² + (ν + 1 − x) dL_n^ν/dx + n L_n^ν = 0.   (7.18)
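The Hermite recurrence (7.14) and the Laguerre recurrence above can be checked against low-order polynomials computed directly; the closed forms H₂ = 4x² − 2, H₃ = 8x³ − 12x, and L₂^ν = x²/2 − (ν+2)x + (ν+1)(ν+2)/2 used below are standard and follow from the Rodriguez formulas (7.13) and (7.16). A minimal sketch (Python):

```python
def hermite(n, x):
    """H_n(x) from H_{n+1} = 2x H_n - 2n H_{n-1}, with H_0 = 1, H_1 = 2x."""
    h0, h1 = 1.0, 2.0*x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, 2*x*h1 - 2*k*h0
    return h1

def laguerre(n, v, x):
    """L^v_n(x) from (n+1)L^v_{n+1} = (2n+v+1-x)L^v_n - (n+v)L^v_{n-1}."""
    l0, l1 = 1.0, v + 1.0 - x
    if n == 0:
        return l0
    for k in range(1, n):
        l0, l1 = l1, ((2*k + v + 1 - x)*l1 - (k + v)*l0) / (k + 1)
    return l1

for x in (-1.3, 0.0, 0.4, 2.0):
    assert abs(hermite(2, x) - (4*x*x - 2)) < 1e-9
    assert abs(hermite(3, x) - (8*x**3 - 12*x)) < 1e-9
    # parity H_n(-x) = (-1)^n H_n(x), which is why k'_n = 0
    assert abs(hermite(5, -x) + hermite(5, x)) < 1e-9

for v in (0.0, 0.5, 3.0):
    for x in (0.0, 1.0, 2.5):
        assert abs(laguerre(2, v, x) - (x*x/2 - (v+2)*x + (v+1)*(v+2)/2)) < 1e-9
```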
7.4.3 Legendre Polynomials

Instead of discussing the Jacobi polynomials as a whole, we will discuss a special case of them, the Legendre polynomials P_n(x), which are more widely used in physics. With μ = 0 = ν, corresponding to the Legendre polynomials, the weight function for the Jacobi polynomials reduces to w(x) = 1. The standardization is K_n = (−1)ⁿ 2ⁿ n!. Thus, the generalized Rodriguez formula reads

P_n(x) = ((−1)ⁿ/(2ⁿ n!)) dⁿ/dxⁿ [(1 − x²)ⁿ].   (7.19)

To find k_n, we expand the expression in square brackets using the binomial theorem and take the nth derivative of the highest power of x. This yields

k_n xⁿ = ((−1)ⁿ/(2ⁿ n!)) dⁿ/dxⁿ [(−x²)ⁿ] = (1/(2ⁿ n!)) dⁿ/dxⁿ (x^{2n}) = (1/(2ⁿ n!)) 2n(2n − 1)(2n − 2) ··· (n + 1) xⁿ.

After some algebra (see Problem 7.7), we get k_n = 2ⁿ Γ(n + ½)/[n! Γ(½)].

Adrien-Marie Legendre (1752-1833) came from a well-to-do Parisian family and received an excellent education in science and mathematics. His university work was advanced enough that his mentor used many of Legendre's essays in a treatise on mechanics. A man of modest fortune until the revolution, Legendre was able to devote himself to study and research without recourse to an academic position. In 1782 he won the prize of the Berlin Academy for calculating the trajectories of cannonballs taking air resistance into account. This essay brought him to the attention of Lagrange and helped pave the way to acceptance in French scientific circles, notably the Academy of Sciences, to which Legendre submitted numerous papers. In July 1784 he submitted a paper on planetary orbits that contained the now-famous Legendre polynomials, mentioning that Lagrange had been able to "present a more complete theory" in a recent paper by using Legendre's results. In the years that followed, Legendre concentrated his efforts in number theory, celestial mechanics, and the theory of elliptic functions.
In addition, he was a prolific calculator, producing large tables of the values of special functions, and he also authored an elementary textbook that remained in use for many decades. In 1824 Legendre refused to vote for the government's candidate for the Institut National. Because of this, his pension was stopped, and he died in poverty and in pain at the age of 80 after several years of failing health. Legendre produced a large number of useful ideas but did not always develop them in the most rigorous manner, claiming to hold the priority for an idea if he had presented
merely a reasonable argument for it. Gauss, with whom he had several quarrels over priority, considered rigorous proof the standard of ownership. To Legendre's credit, however, he was an enthusiastic supporter of his young rivals Abel and Jacobi and gave their work considerable attention in his writings. Especially in the theory of elliptic functions, the area of competition with Abel and Jacobi, Legendre is considered more of a trailblazer than a great builder. Hermite wrote that Legendre "is considered the founder of the theory of elliptic functions" and "greatly smoothed the way for his successors," but notes that the recognition of the double periodicity of the inverse function, which allowed the great progress of others, was missing from Legendre's work.

Legendre also contributed to practical efforts in science and mathematics. He and two of his contemporaries were assigned in 1787 to a panel conducting geodetic work in cooperation with the observatories at Paris and Greenwich. Four years later the same panel members were appointed as the Academy's commissioners to undertake the measurements and calculations necessary to determine the length of the standard meter. Legendre's seemingly tireless skill at calculating produced large tables of the values of trigonometric and elliptic functions, logarithms, and solutions to various special equations.

In his famous textbook Elements de geometrie (1794) he gave a simple proof that π is irrational and conjectured that it is not the root of any algebraic equation of finite degree with rational coefficients. The textbook was somewhat dogmatic in its presentation of ordinary Euclidean thought and includes none of the non-Euclidean ideas beginning to be formed around that time. It was Legendre who first gave a rigorous proof of the theorem (assuming all of Euclid's postulates, of course) that the sum of the angles of a triangle is "equal to two right angles."
Very little of his research in this area was of memorable quality. The same could possibly be argued for the balance of his writing, but one must acknowledge the very fruitful ideas he left behind in number theory and elliptic functions and, of course, the introduction of Legendre polynomials and the important Legendre transformation used both in thermodynamics and Hamiltonian mechanics.

To find k_n', we look at the evenness or oddness of the polynomials. By an investigation of the Rodriguez formula, as in our study of Hermite polynomials, we note that P_n(−x) = (−1)ⁿ P_n(x), which tells us that P_n(x) is either even or odd. In either case, x will not have an (n − 1)st power. Therefore, k_n' = 0.

We now calculate h_n as given by (7.12):

h_n = ((−1)ⁿ k_n n!/K_n) ∫_{−1}^{1} (1 − x²)ⁿ dx = (Γ(n + ½)/(n! Γ(½))) ∫_{−1}^{1} (1 − x²)ⁿ dx.

The integral can be evaluated by repeated integration by parts (see Problem 7.8). Substituting the result in the expression above yields h_n = 2/(2n + 1).

We need α_n, β_n, and γ_n for the recurrence relation:

α_n = k_{n+1}/k_n = [2^{n+1} Γ(n + 1 + ½)/((n + 1)! Γ(½))] [n! Γ(½)/(2ⁿ Γ(n + ½))] = (2n + 1)/(n + 1),

where we used the relation Γ(n + 1 + ½) = (n + ½) Γ(n + ½). We also have β_n = 0 (because k_n' = 0 = k_{n+1}') and γ_n = −n/(n + 1). Therefore, the recurrence relation
is

(n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) − n P_{n−1}(x).   (7.20)

Now we use K₁ = −2, P₁(x) = x ⟹ k₁ = 1, and σ₂ = −1 to obtain λ_n = −n(n + 1), which yields the following differential equation:

d/dx [(1 − x²) dP_n/dx] = −n(n + 1) P_n.   (7.21)

This can also be expressed as

(1 − x²) d²P_n/dx² − 2x dP_n/dx + n(n + 1) P_n = 0.   (7.22)

7.4.4 Other Classical Orthogonal Polynomials

The rest of the classical orthogonal polynomials can be constructed similarly. For the sake of completeness, we merely quote the results.

Jacobi Polynomials, P_n^{μ,ν}(x)

Standardization: K_n = (−2)ⁿ n!

Constants:
k_n = 2^{−n} Γ(2n + μ + ν + 1)/[n! Γ(n + μ + ν + 1)],
k_n' = [n(ν − μ)/(2n + μ + ν)] k_n,
h_n = 2^{μ+ν+1} Γ(n + μ + 1) Γ(n + ν + 1)/[n! (2n + μ + ν + 1) Γ(n + μ + ν + 1)]

Rodriguez Formula (as follows from Equation (7.2)):
P_n^{μ,ν}(x) = ((−1)ⁿ/(2ⁿ n!)) (1 + x)^{−μ} (1 − x)^{−ν} dⁿ/dxⁿ [(1 + x)^{n+μ} (1 − x)^{n+ν}]

Differential Equation:
(1 − x²) d²P_n^{μ,ν}/dx² + [μ − ν − (μ + ν + 2)x] dP_n^{μ,ν}/dx + n(n + μ + ν + 1) P_n^{μ,ν} = 0

A Recurrence Relation:
2(n + 1)(n + μ + ν + 1)(2n + μ + ν) P_{n+1}^{μ,ν}
  = (2n + μ + ν + 1)[(2n + μ + ν)(2n + μ + ν + 2)x + ν² − μ²] P_n^{μ,ν}
  − 2(n + μ)(n + ν)(2n + μ + ν + 2) P_{n−1}^{μ,ν}
Gegenbauer Polynomials, C_n^λ(x)

Standardization: K_n = (−2)ⁿ n! Γ(n + λ + ½) Γ(2λ)/[Γ(n + 2λ) Γ(λ + ½)]

Constants:
k_n = 2ⁿ Γ(n + λ)/[n! Γ(λ)],   k_n' = 0,
h_n = √π Γ(n + 2λ) Γ(λ + ½)/[n! (n + λ) Γ(2λ) Γ(λ)]

Rodriguez Formula (as follows from Equation (7.2)):
C_n^λ(x) = [(−1)ⁿ Γ(n + 2λ) Γ(λ + ½)/(2ⁿ n! Γ(n + λ + ½) Γ(2λ))] (1 − x²)^{−λ+1/2} dⁿ/dxⁿ [(1 − x²)^{n+λ−1/2}]

Differential Equation:
(1 − x²) d²C_n^λ/dx² − (2λ + 1)x dC_n^λ/dx + n(n + 2λ) C_n^λ = 0

A Recurrence Relation: (n + 1) C_{n+1}^λ = 2(n + λ) x C_n^λ − (n + 2λ − 1) C_{n−1}^λ

Chebyshev Polynomials of the First Kind, T_n(x)

Standardization: K_n = (−1)ⁿ (2n)!/(2ⁿ n!)

Constants: k_n = 2^{n−1} for n ≥ 1,   k_n' = 0,   h_n = π/2 for n ≠ 0, h₀ = π

Rodriguez Formula: T_n(x) = [(−1)ⁿ 2ⁿ n!/(2n)!] (1 − x²)^{1/2} dⁿ/dxⁿ [(1 − x²)^{n−1/2}]

Differential Equation: (1 − x²) d²T_n/dx² − x dT_n/dx + n² T_n = 0

A Recurrence Relation: T_{n+1} = 2x T_n − T_{n−1}

Chebyshev Polynomials of the Second Kind, U_n(x)

Standardization: K_n = (−1)ⁿ (2n + 1)!/(2ⁿ (n + 1)!)

Constants: k_n = 2ⁿ,   k_n' = 0,   h_n = π/2
Rodriguez Formula:
U_n(x) = [(−1)ⁿ 2ⁿ (n + 1)!/(2n + 1)!] (1 − x²)^{−1/2} dⁿ/dxⁿ [(1 − x²)^{n+1/2}]

Differential Equation: (1 − x²) d²U_n/dx² − 3x dU_n/dx + n(n + 2) U_n = 0

A Recurrence Relation: U_{n+1} = 2x U_n − U_{n−1}

7.5 Expansion in Terms of Orthogonal Polynomials

Having studied the different classical orthogonal polynomials, we can now use them to write an arbitrary function f ∈ L²_w(a, b) as a series of these polynomials. If we denote a complete set of orthogonal (not necessarily classical) polynomials by |C_k⟩ and the given function by |f⟩, we may write

|f⟩ = Σ_{k=0}^∞ a_k |C_k⟩,   (7.23)

where a_k is found by multiplying both sides of the equation by ⟨C_j| and using the orthogonality of the |C_k⟩'s:

⟨C_j|f⟩ = Σ_{k=0}^∞ a_k ⟨C_j|C_k⟩ = a_j ⟨C_j|C_j⟩   ⟹   a_j = ⟨C_j|f⟩/⟨C_j|C_j⟩.   (7.24)

This is written in function form as

a_j = ∫_a^b C_j*(x) f(x) w(x) dx / ∫_a^b |C_j(x)|² w(x) dx.   (7.25)

We can also "derive" the functional form of Equation (7.23) by multiplying both of its sides by ⟨x| and using the fact that ⟨x|f⟩ = f(x) and ⟨x|C_k⟩ = C_k(x). The result will be

f(x) = Σ_{k=0}^∞ a_k C_k(x).   (7.26)

7.5.1. Example. The solution of Laplace's equation in spherically symmetric electrostatic problems that are independent of the azimuthal angle is given by

Φ(r, θ) = Σ_{k=0}^∞ (b_k/r^{k+1} + c_k r^k) P_k(cos θ).   (7.27)

Consider two conducting hemispheres of radius a separated by a small insulating gap at the equator. The upper hemisphere is held at potential V₀ and the lower one at −V₀, as
shown in Figure 7.1. We want to find the potential at points outside the resulting sphere. Since the potential must vanish at infinity, we expect the second term in Equation (7.27) to be absent, i.e., c_k = 0 for all k. To find b_k, substitute a for r in (7.27) and let cos θ ≡ x. Then,

Φ(a, x) = Σ_{k=0}^∞ (b_k/a^{k+1}) P_k(x),

where

Φ(a, x) = −V₀ if −1 < x < 0,   +V₀ if 0 < x < 1.

From Equation (7.25), we have

b_k/a^{k+1} = ((2k + 1)/2) ∫_{−1}^{1} P_k(x) Φ(a, x) dx = ((2k + 1)/2) [ −V₀ ∫_{−1}^{0} P_k(x) dx + V₀ ∫_0^1 P_k(x) dx ].

To proceed, we rewrite the first integral:

∫_{−1}^{0} P_k(x) dx = −∫_{1}^{0} P_k(−y) dy = ∫_0^1 P_k(−y) dy = (−1)^k ∫_0^1 P_k(x) dx,

where we made use of the parity property of P_k(x). Therefore,

b_k/a^{k+1} = ((2k + 1)/2) V₀ [1 − (−1)^k] ∫_0^1 P_k(x) dx.

It is now clear that only odd polynomials contribute to the expansion. Using the result of Problem 7.26, we get

b_{2m+1} = (4m + 3) a^{2m+2} V₀ (−1)^m (2m)!/[2^{2m+1} m! (m + 1)!].

Note that Φ(a, x) is an odd function; that is, Φ(a, −x) = −Φ(a, x), as is evident from its definition. Thus, only odd polynomials appear in the expansion of Φ(a, x) to preserve this property. Having found the coefficients, we can write the potential:

Φ(r, θ) = V₀ Σ_{m=0}^∞ (−1)^m [(4m + 3)(2m)!/(2^{2m+1} m! (m + 1)!)] (a/r)^{2m+2} P_{2m+1}(cos θ).  ■
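The closed form of b_{2m+1} rests on the value of ∫₀¹ P_{2m+1}(x) dx, which can be checked against direct quadrature. A sketch (Python with NumPy; the integrand is smooth on [0, 1], so Gauss-Legendre quadrature is exact here):

```python
import math
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

nodes, weights = leggauss(40)
x = 0.5*(nodes + 1.0)           # map quadrature from [-1,1] to [0,1]
w = 0.5*weights

for m in range(5):
    k = 2*m + 1
    c = np.zeros(k + 1); c[k] = 1.0
    integral = np.sum(w * legval(x, c))            # = integral_0^1 P_{2m+1}(x) dx
    closed = (-1)**m * math.factorial(2*m) / (
        2**(2*m + 1) * math.factorial(m) * math.factorial(m + 1))
    assert abs(integral - closed) < 1e-12
print("coefficient formula verified")
```

For m = 0 this reproduces ∫₀¹ x dx = 1/2, and for m = 1 it gives ∫₀¹ P₃(x) dx = −1/8, in agreement with the factor multiplying (4m + 3) a^{2m+2} V₀ in b_{2m+1}.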
Figure 7.1 The voltage is +V₀ for the upper hemisphere, where 0 ≤ θ < π/2, or where 0 < cos θ ≤ 1. It is −V₀ for the lower hemisphere, where π/2 < θ ≤ π, or where −1 ≤ cos θ < 0.

The place where Legendre polynomials appear most naturally is, as mentioned above, in the solution of Laplace's equation in spherical coordinates. After the partial differential equation is transformed into three ordinary differential equations using the method of the separation of variables, the differential equation corresponding to the polar angle θ gives rise to solutions of which Legendre polynomials are special cases. This differential equation simplifies to the Legendre differential equation if the substitution x = cos θ is made; in that case, the solutions will be Legendre polynomials in x, or in cos θ. That is why the argument of P_k(x) is restricted to the interval [−1, +1].

7.5.2. Example. We can expand the Dirac delta function in terms of Legendre polynomials. We write

δ(x) = Σ_{n=0}^∞ a_n P_n(x),

where

a_n = ((2n + 1)/2) ∫_{−1}^{1} P_n(x) δ(x) dx = ((2n + 1)/2) P_n(0).

For odd n this will give zero, because P_n(x) is an odd polynomial. To evaluate P_n(0) for even n, we use the recurrence relation (7.20) for x = 0:

n P_n(0) = −(n − 1) P_{n−2}(0),   or   P_n(0) = −((n − 1)/n) P_{n−2}(0).

Iterating this m times, we obtain

P_n(0) = (−1)^m [(n − 1)(n − 3) ··· (n − 2m + 1)]/[n(n − 2)(n − 4) ··· (n − 2m + 2)] P_{n−2m}(0).

For n = 2m, this yields

P_{2m}(0) = (−1)^m [(2m − 1)(2m − 3) ··· 3 · 1]/[2m(2m − 2) ··· 4 · 2] P₀(0).

Now we "fill the gaps" in the numerator by multiplying it (and the denominator, of course) by the
denominator. This yields

P_{2m}(0) = (−1)^m [2m(2m − 1)(2m − 2) ··· 3 · 2 · 1]/[2m(2m − 2) ··· 4 · 2]² = (−1)^m (2m)!/[2^m m!]² = (−1)^m (2m)!/(2^{2m} (m!)²),

because P₀(x) = 1. Thus, we can write

δ(x) = Σ_{m=0}^∞ ((4m + 1)/2) (−1)^m [(2m)!/(2^{2m} (m!)²)] P_{2m}(x).   (7.28)

We can also derive this expansion as follows. For any complete set of orthonormal vectors {|f_k⟩}, we have

δ(x − x') = w(x) ⟨x|x'⟩ = w(x) ⟨x|1|x'⟩ = w(x) ⟨x| (Σ_k |f_k⟩⟨f_k|) |x'⟩ = w(x) Σ_k f_k(x') f_k(x).

Legendre polynomials are not orthonormal; but we can make them so by dividing P_k(x) by h_k^{1/2} = √(2/(2k + 1)). Then, noting that w(x) = 1, we obtain

δ(x − x') = Σ_{k=0}^∞ [P_k(x')/√(2/(2k + 1))] [P_k(x)/√(2/(2k + 1))] = Σ_{k=0}^∞ ((2k + 1)/2) P_k(x') P_k(x).

For x' = 0 we get δ(x) = Σ_{k=0}^∞ ((2k + 1)/2) P_k(0) P_k(x), which agrees with the previous result. ■

7.6 Generating Functions

It is possible to generate all orthogonal polynomials of a certain kind from a single function of two variables g(x, t) by repeated differentiation of that function. Such a function is called a generating function. This generating function is assumed to be expandable in the form

g(x, t) = Σ_{n=0}^∞ a_n tⁿ F_n(x),

so that the nth derivative of g(x, t) with respect to t evaluated at t = 0 gives F_n(x) to within a multiplicative constant. The constant a_n is introduced for convenience. Clearly, for g(x, t) to be useful, it must be in closed form. The derivation of such a function for general F_n(x) is nontrivial, and we shall not attempt to derive such a general generating function, as we did, for instance, for the general Rodriguez formula. Instead, we simply quote these functions in Table 7.2, and leave the derivation of the generating functions of Hermite and Legendre polynomials as Problems 7.14 and 7.20. For the derivation of the Laguerre generating function, see [Hassani, 2000] pp. 606-607.
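For the Hermite case (g(x, t) = exp(−t² + 2xt) with a_n = 1/n!, as quoted in Table 7.2), the expansion can be verified numerically by summing the series at a small value of t, where it converges very rapidly. A minimal sketch (Python):

```python
import math

def hermite(n, x):
    """H_n(x) from the recurrence H_{n+1} = 2x H_n - 2n H_{n-1}."""
    h0, h1 = 1.0, 2.0*x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, 2*x*h1 - 2*k*h0
    return h1

# sum of H_n(x) t^n / n! should reproduce exp(-t^2 + 2xt)
x, t = 0.7, 0.1
series = sum(hermite(n, x) * t**n / math.factorial(n) for n in range(20))
assert abs(series - math.exp(-t*t + 2*x*t)) < 1e-12
```

The choice x = 0.7, t = 0.1 is arbitrary; twenty terms already agree with the closed form to machine precision because the generating function is entire in t.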
Table 7.2 Generating functions for selected polynomials

  Polynomial                      Generating function                      a_n
  Hermite, H_n(x)                 exp(−t² + 2xt)                           1/n!
  Laguerre, L_n^ν(x)              exp[−xt/(1 − t)]/(1 − t)^{ν+1}           1
  Chebyshev (1st kind), T_n(x)    (1 − t²)(t² − 2xt + 1)^{−1}              2 for n ≠ 0, a₀ = 1
  Chebyshev (2nd kind), U_n(x)    (t² − 2xt + 1)^{−1}                      1

7.7 Problems

7.1. Let n = 1 in Equation (7.1) and solve for s dw/dx. Now substitute this in the derivative of w sⁿ P_{≤k} and show that the derivative is equal to w s^{n−1} P_{≤k+1}. Repeat this process m times to prove Lemma 7.1.2.

7.2. Find w(x), a, and b for the case of the classical orthogonal polynomials in which s(x) is of second degree.

7.3. Integrate by parts twice and use Lemma 7.1.2 to show that

∫_a^b F_m (w s F_n')' dx = 0   for m < n.

7.4. (a) Using Lemma 7.1.2 conclude that (w s F_n')'/w is a polynomial of degree less than or equal to n.
(b) Write (w s F_n')'/w as a linear combination of the F_i(x), and use their orthogonality and Problem 7.3 to show that the linear combination collapses to a single term.
(c) Multiply both sides of the differential equation so obtained by F_n and integrate. The RHS becomes h_n λ_n. For the LHS, carry out the differentiation and note that (ws)'/w = K₁F₁. Now show that K₁F₁F_n' + sF_n'' is a polynomial of degree n, and that the LHS of the differential equation yields {K₁k₁n + σ₂ n(n − 1)} h_n. Now find λ_n.

7.5. Derive the recurrence relation of Equation (7.8). Hint: Differentiate Equation (7.5) and substitute for F_n'' from the differential equation. Now multiply the resulting equation by α_n x + β_n and substitute for (α_n x + β_n) F_n' from one of the earlier recurrence relations.
  • 209. ifn=2m. ifn is odd, 7.7 PROBLEMS 191 7.6. Using only the orthogonality of Hermite polynomials i:e-X'Hm(x)Hn(x)dx = .j1iZnn!8 mn (and the fact that they are polynomials) generate the first three of them. 7.7. Show that for Legendre polynomials, kn = Znr(n + !)/[n!r(~)]. Hint: Multiply and divide the expression given in the book by n!; take a factor of Z out of allterms in the nmnerator; the even terms yield a factor ofn!, and the odd terms give a gannna function. 7.8. Using integration by parts several times, show that 11(1 _ x2)"dx = zmn(n - I)··· (n - m + I) IIx2m(l _ x2)n-mdx. -I . 3 . 5 . 7 ... (Zm - 1) -I Now show that I~I(1 - x2)ndx = Zr(!)n!/[(Zn + 1)r(n + !)]. 7.9. Use the generalized Rodriguez formula for Hermite polynomials and integra- tion by parts to expand x2k and x2k+I in terms of Hermite polynomials. 7.10. Use the recurrence relation for Hermite polynomials to show that 1 00 , -00xe-x Hm(x)Hn(x)dx = .j1i2n-In! [8m•n - 1 +2(n + 1)8m•n+l]' What happens when m = n? 7.11. Apply the general forma1ism of the recurrence relations given in the book to Hermite polynomials to find the following: Hn + H~_I - 2xHn_ 1 = O. 7.12. Show that 1':"00x2e-x ' H;(x) dx = .j1i2n(n + !)n! 7.13. Use a recurrence relations for Hermite polynomials to show that Hn(O) = {O (_I)m (2~)! 7.14. Differentiate the expansion of g(x, t) for Hermite polynomials with respect to x (treating t as a constant) and choose an such that nan = an-I to obtain a differential equation for g. Solve this differential equation. To determine the "constant" ofintegrationuse theresultofProblem7.13 to showthatg(O, t) = e-/'.
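The closed form that Problem 7.13 asks for, $H_n(0) = 0$ for odd $n$ and $H_{2m}(0) = (-1)^m (2m)!/m!$, is easy to confirm numerically; at $x = 0$ the standard Hermite recurrence (assumed here, not derived in this excerpt) collapses to $H_{n+1}(0) = -2n\,H_{n-1}(0)$:

```python
import math

def hermite(n, x):
    # standard recurrence (assumed): H_{n+1} = 2x H_n - 2n H_{n-1}
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h

for m in range(6):
    predicted = (-1) ** m * math.factorial(2 * m) / math.factorial(m)
    print(m, hermite(2 * m, 0.0), predicted, hermite(2 * m + 1, 0.0))
```

The values at $x = 0$ are integer-valued, so the comparison is exact in floating point.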
  • 210. 192 7. CLASSICAL ORTHOGONAL POLYNOMIALS 7.15. Use the expansion of the generating function for Hermite polynontials to obtain Then integrate both sides over x and use the orthogonality ofthe Herntite polyno- ntials to get 00 (sz)" fOO 2 L:-,-Z e-X H;(x)dx = ./iieZ". n~O (n.) -00 Deduce from this the normalization constant hn of Hn(x). 7.16. Using the recurrence relation of Equation (7.14) repeatedly, show that fOO k _x2 {O X e Hm(x)Hm+n(x)dx = rz: -00 '171: 2m(m +k)! if n > k, if n = k. 7.17. Given that Po(x) = I and PI (x) = x, (a) use (7.20) repeatedly to show that Pn(l) = 1. (b) Using the same equation, find Pz(x), P3(X), and P4(X). 7.18. Apply the general formalism of the recurrence relations given in the book to find the following two relations for Legendre polynontials: , , (a) nPn - xP; + Pn-1 = O. (b) (I - xZ)P~ - nPn-l +nXPn = O. 7.19. Show that J~l xnPn(x) dx = 2n+1(n!)z j(2n +I)!. Hint: Use the definition of hn and kn and the fact that Pn is orthogonal to any polynontial of degree lower thann. 7.20. Differentiate the expansion ofg(x, t) for Legendre polynontials, and choose Un = I. For P~, you will substitute two different expressions to get two equations. First use Equation (7.11) with n + I replaced by n, to obtain d 00 (I - tZ)~ +tg = 2 L:ntnPn-l +2t. dx n=2 As an alternative, use Equation (7.10) to substilnte for P~ and get dg 00 (I-xt)- = L:ntnPn_l +t. dx n=Z Combine the last two equations to get (lz - 2xt +I)g' = tg. Solve this differential equation and detemtine the constant of integration by evaluating g (x, 0).
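The generating function that Problem 7.20 leads to, $g(x,t) = (1 - 2xt + t^2)^{-1/2}$, can be verified directly against the series $\sum_n t^n P_n(x)$. A sketch assuming Bonnet's recurrence $(n+1)P_{n+1} = (2n+1)x P_n - n P_{n-1}$, the standard Legendre recurrence (quoted, not taken from this excerpt):

```python
import math

def legendre(n, x):
    # Bonnet recurrence (assumed): (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

x, t = 0.5, 0.4
series = sum(t**n * legendre(n, x) for n in range(200))
closed = 1.0 / math.sqrt(1 - 2 * x * t + t * t)
print(series, closed)
```

Since $|P_n(x)| \le 1$ on $[-1, 1]$, the tail beyond 200 terms is negligible for $|t| < 1$; the same routine also reproduces $P_n(1) = 1$ from Problem 7.17(a).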
  • 211. 7.7 PROBLEMS 193 7.21. Use the generating function for Legendre polynontials to show that Pn (I) = I, Pn(-I) = (_I)n, Pn(O) = 0 for odd n, and P~(l) = n(n + 1)/2. 7.22. Both electrostatic and gravitational potential energies depend on the qnantity I/lr - r'], where r' is the position of the sonrce (charge or mass) and r is the observation point. (a) Let r lie along the z-axis, and nse spherical coordinates and the definition of generating functions to show that _1_ = ~ f(r<)n Pn(cose), [r - r'] r-; n=O r-; where r«r» is the smaller (larger) of rand r', and eis the polar angle. (b) The electrostatic or gravitational potential energy <!>(r) is given by <!>(r) = kll! p(r ') d3x' , [r - r'] where k is a constant and p(r') is the (charge or mass) density function. Use the result of part (a) to show that if the density depends only on r', and not on any angle (i.e., p is spherically symmetric), then <!>(r) reduces to the potential energy of a point charge at the origin for r > r', (c) What is <!>(r)-in the form of an integral-for r < a for a spherically sym- metric density that extends from origin to a? (d) Show that E (or g) is given by [kQ(r)lr2]er where Q(r) is the charge (or mass) enclosed in a sphere of radius r, 7.23. Use the generating function for Legendre polynontials and their orthogonal- ity to derive the relation 11 dx 2 = ft 2n1 1 P;(x)dx. _11-2xt+t n=O -1 Integrate the LHS, expand the result in powers of t, and compare these powers on both sides to obtain the normalization constant hn• 7.24. Evaluate the following integrals using the expansion of the generating func- tion for Legendre polynontials. 10 " (a cos s +b) sine de (a) . o -v'a2+2abcose+b2 7.25. Differentiate the expansion of the Legendre polynontial generating function with respect to x and manipulate the resulting expression to obtain 00 00 (I - 2xt +t 2 ) :~:::>n P~(x) = t ~::>n Pn(x). n=O n=O
  • 212. if k is even, 194 7. CLASSICAL ORTHOGONAL POLYNOMIALS Equate equal powers of t ou both sides to derive the recurrence relation , , , Pn+1 + Pn-1 - 2xPn - Pn = O. 7.26. Show that [1 Pk(X) dx = {8 kO io (_I)(k-l'/2(k_l)! S",==:fu2-" if k is odd. 2k(y )!( &¥ )! Hint: For even k, extend the region of integration to (-I, I) and use the orthogo- nality property. For odd k, note that dk- 1 2 k 1 dxk- 1 (I-x) 10 gives zero for the upper limit (by Lemma 7.1.3). For the lower limit, expand the expression using the binomial theorem, and carry out the differentiation, keeping in mind that only one term of the expansion contributes. 7.27. Showthatg(x, r) = g(-x, -t)forbothHermiteandLegendrepolynomiais. Now expand g(x, t) and g(-x, -t) and compare the coefficients of t" to obtain parity relations the parity relations for these polynomials: and Pn(-x) = (_I)n P,,(x). 7.28. Derive the orthogonality of Legendre polynomials directly from the differ- ential equation they satisfy. 7.29. Expand Ixl in the interval (-I, +1) in terms of Legendre polynomials. Hint: Use the result ofProblem 7.26. 7.30. Apply the general formalism of the recurrence relations given in the book to find the following two relations for Laguerre polynomials: v v dL~ (a) nLn - (n +v)Ln_1 - x dx = O. (b) (n + I)L~+1 - (2n +v + I - x)L~ +(n +v)L~_1 = O. 7.31. From the generating function for Laguerre polynomials given in Table 7.2 deduce that L~(O) = r(n +v + I)/[n!r(v + I)]. 7.32. Let Ln sa L2. Now differentiate both sides of e-xt/(l-t) 00 g(x,t)= I-t = LtnL,,(x) o with respect to x and compare powers of t to obtain L~(0) = -n and L~(0) = ~n(n - I). Hint: Differentiate 1/(1 - t) = I:;;:o t" to get an expression for (I - t)-2.
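Problem 7.31's value $L_n^{\nu}(0) = \Gamma(n+\nu+1)/[n!\,\Gamma(\nu+1)]$ can be cross-checked against the recurrence of Problem 7.30(b) evaluated at $x = 0$. A sketch (the starting value $L_1^{\nu}(0) = \nu + 1$ is the $n = 1$ case of the same formula):

```python
import math

def laguerre_at_zero(n, nu):
    # Problem 7.30(b) at x = 0: (k+1) L_{k+1} = (2k + nu + 1) L_k - (k + nu) L_{k-1}
    l_prev, l = 1.0, nu + 1.0
    if n == 0:
        return l_prev
    for k in range(1, n):
        l_prev, l = l, ((2 * k + nu + 1) * l - (k + nu) * l_prev) / (k + 1)
    return l

nu = 2.5
for n in range(6):
    predicted = math.gamma(n + nu + 1) / (math.factorial(n) * math.gamma(nu + 1))
    print(n, laguerre_at_zero(n, nu), predicted)
```

For $\nu = 0$ this reduces to the familiar $L_n(0) = 1$.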
  • 213. 7.7 PROBLEMS 195 7.33. Expand $e^{-kx}$ as a series of Laguerre polynomials $L_n^{\nu}(x)$. Find the coefficients by using (a) the orthogonality of $L_n^{\nu}(x)$ and (b) the generating function. 7.34. Derive the recurrence relations given in the book for Jacobi, Gegenbauer, and Chebyshev polynomials. 7.35. Show that $T_n(-x) = (-1)^n T_n(x)$ and $U_n(-x) = (-1)^n U_n(x)$. Hint: Use $g(x,t) = g(-x,-t)$. 7.36. Show that $T_n(1) = 1$, $U_n(1) = n+1$, $T_n(-1) = (-1)^n$, $U_n(-1) = (-1)^n(n+1)$, $T_{2m}(0) = (-1)^m = U_{2m}(0)$, and $T_{2m+1}(0) = 0 = U_{2m+1}(0)$. Additional Reading 1. Dennery, P. and Krzywicki, A. Mathematics for Physicists, Harper and Row, 1967. Treats the classical orthogonal polynomials in the spirit of this chapter. 2. Tricomi, F. Vorlesungen über Orthogonalreihen, Springer, 1955. The original unified treatment of the classical orthogonal polynomials.
  • 214. 8 Fourier Analysis The single most recurring theme of mathematical physics is Fourier analysis. It shows up, for example, in classical mechanics and the analysis of normal modes, in electromagnetic theory and the frequency analysis of waves, in noise considerations and thermal physics, in quantum theory and the transformation between momentum and coordinate representations, and in relativistic quantum field theory and the creation and annihilation operator formalism. 8.1 Fourier Series One way to begin the study of Fourier series and transforms is to invoke a generalization of the Stone-Weierstrass approximation theorem (Theorem 5.2.3), which established the completeness of the monomials $x^k$. The generalization of Theorem 5.2.3 permits us to find another set of orthogonal functions in terms of which we can expand an arbitrary function. This generalization involves polynomials in more than one variable. (For a proof of this theorem, see Simmons [Simm 83, pp. 160-161].) generalized Stone-Weierstrass theorem 8.1.1. Theorem. (generalized Stone-Weierstrass theorem) Suppose that $f(x_1, x_2, \ldots, x_n)$ is continuous in the domain $\{a_i \le x_i \le b_i\}_{i=1}^n$. Then it can be expanded in terms of the monomials $x_1^{k_1} x_2^{k_2} \cdots x_n^{k_n}$, where the $k_i$ are nonnegative integers. Now let us consider functions that are periodic and investigate their expansion in terms of elementary periodic functions. We use the generalized Stone-Weierstrass theorem with two variables, $x$ and $y$. A function $g(x,y)$ can be written as $g(x,y) = \sum_{k,m=0}^{\infty} a_{km} x^k y^m$. In this equation, $x$ and $y$ can be considered as coordinates in the $xy$-plane, which in turn can be written in terms of polar coordinates
  • 215. 8.1 FOURIER SERIES 197 $r$ and $\theta$. In that case, we obtain $$f(r, \theta) \equiv g(r\cos\theta, r\sin\theta) = \sum_{k,m=0}^{\infty} a_{km}\, r^{k+m} \cos^k\theta\, \sin^m\theta.$$ In particular, if we let $r = 1$, we obtain a function of $\theta$ alone, which upon substitution of complex exponentials for $\sin\theta$ and $\cos\theta$ becomes $$f(\theta) = \sum_{n=-\infty}^{\infty} b_n e^{in\theta}, \tag{8.1}$$ where $b_n$ is a constant that depends on the $a_{km}$. The RHS of (8.1) is periodic with period $2\pi$; thus, it is especially suitable for periodic functions $f(\theta)$ that satisfy the periodicity condition $f(\theta - \pi) = f(\theta + \pi)$. We can also write Equation (8.1) as $$f(\theta) = b_0 + \sum_{n=1}^{\infty} \big(b_n e^{in\theta} + b_{-n} e^{-in\theta}\big) = b_0 + \sum_{n=1}^{\infty} \big[\underbrace{(b_n + b_{-n})}_{\equiv A_n} \cos n\theta + \underbrace{i(b_n - b_{-n})}_{\equiv B_n} \sin n\theta\big] = b_0 + \sum_{n=1}^{\infty} (A_n \cos n\theta + B_n \sin n\theta). \tag{8.2}$$ If $f(\theta)$ is real, then $b_0$, $A_n$, and $B_n$ are also real. Equation (8.1) or (8.2) is called the Fourier series expansion of $f(\theta)$. Let us now concentrate on the elementary periodic functions $e^{in\theta}$. We define $\{|e_n\rangle\}_{n=-\infty}^{\infty}$ such that their "$\theta$th components" are given by $$\langle\theta|e_n\rangle = \frac{1}{\sqrt{2\pi}}\, e^{in\theta},$$ where $\theta \in (-\pi, \pi)$. These functions, or ket vectors, which belong to $\mathcal{L}^2(-\pi, \pi)$, are orthonormal, as can be easily verified. It can also be shown that they are complete. In fact, for functions that are continuous on $(-\pi, \pi)$, this is a result of the generalized Stone-Weierstrass theorem. It turns out, however, that $\{|e_n\rangle\}_{n=-\infty}^{\infty}$ is also a complete orthonormal sequence for piecewise continuous functions on $(-\pi, \pi)$.¹ Therefore, any periodic piecewise continuous function of $\theta$ can be expressed as a linear combination of these orthonormal vectors. Thus if $|f\rangle \in \mathcal{L}^2(-\pi, \pi)$, then $$|f\rangle = \sum_{n=-\infty}^{\infty} f_n\, |e_n\rangle, \qquad \text{where } f_n = \langle e_n|f\rangle. \tag{8.3}$$ ¹A piecewise continuous function on a finite interval is one that has a finite number of discontinuities in its interval of definition.
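The orthonormality of the basis functions $\langle\theta|e_n\rangle = e^{in\theta}/\sqrt{2\pi}$ is easy to confirm numerically. A sketch: with equally spaced sample points over a full period, the midpoint rule reproduces these particular integrals essentially exactly, since the sampled exponentials cancel term by term.

```python
import cmath, math

def inner(m, n, samples=1000):
    # <e_m|e_n> = (1/2π) ∫_{-π}^{π} e^{-imθ} e^{inθ} dθ, midpoint rule
    h = 2 * math.pi / samples
    total = sum(cmath.exp(1j * (n - m) * (-math.pi + (j + 0.5) * h))
                for j in range(samples))
    return total * h / (2 * math.pi)

print(inner(3, 3))   # should be 1 (up to rounding)
print(inner(1, 4))   # should vanish
```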
  • 216. 198 8. FOURIER ANALYSIS We can write this as a functional relation if we take the Othcomponent of both sides: (01 f) = '£':::-00 fn (01 en), or Fourier series expansion: anguiar expression f(O) = _1_ f fnein9 .[iii n=-oo with fn given by (8.4) fundamental cell ofa periodic function "The profound study ofnature isthe most fruitful source of mathematical discoveries." Joseph Fourier fn = (enI1 If) = (enl (j~ 10)(01 dO) If) = L:(enl 0) (01 f) so = -1-1"e-in9f(O)dO. (8.5) .[iii -" It is important to note that even though f (0) may be defined only for -:n: :0: o:0: :n:, Eqnation (8.4) extendsthe domain of definitionof f(O) to all the intervals (2k - I):n: :0: 0 :0: (2k + I):n: for all k E Z. Thns, if a function is to be represented by Equation (8.4) withoutany specificationof the interval of definition,it must be periodic in O.For such functions, the interval of their definition can be translated by a factor of2:n:. Thus, f(O) with -:n: :0: 0 :0: :n: is equivalentto f(O - 2m:n:) with 2m:n: -:n: :0: 0 :0: 2m:n: +:n:; both will give the same Fourier series expansion.We shall defineperiodic functions in their fundamental cell such as (-:n:, :n:). JosephFourier (1768-1830) didvery wellasayoung student of mathematics buthad set his heart on becoming an army officer. Denied a commission because he was the son of a tailor, he went to a Benedictine school with thehope that he couldcontinue studying mathematics atitsseminary in Paris. TheFrench Revolution changed thoseplans andsetthestage formany of the personal circumstancesofFourier'slateryears, duein part to his courageous defenseof some of its victims, anaction that led to his arrest in 1794.He wasreleased later that year, andhe enrolled as a student in theEcoleNonnale, whichopened andclosedwithinayear. Hisperformancethere, however, wasenoughtoeamhimaposition asassistantlecturer (under Lagrange and Monge) in theEcole Polytechnique. 
He was an excellent mathematical physicist, was a friend of Napoleon (so far as such people have friends), and accompanied him in 1798 to Egypt, where Fourier held various diplomatic and administrative posts while also conducting research. Napoleon took note of his accomplishments and, on Fourier's return to France in 1801, appointed him prefect of the district of Isère, in southeastern France, and in this capacity he built the first real road from Grenoble to Turin. He also befriended the boy Champollion, who later deciphered the Rosetta stone as the first long step toward understanding the hieroglyphic writing of the ancient Egyptians. Like other scientists of his time, Fourier took up the flow of heat. The flow was of interest as a practical problem in the handling of metals in industry and as a scientific problem in attempts to determine the temperature in the interior of the earth, the variation
  • 217. Fourier series expansion: general expression 8.1 FOURIER SERIES 199 of that temperature with time, and other such questions. He submitted a basic paper on heat conduction to the Academy of Sciences of Paris in 1807. The paper was judged by Lagrange, Laplace, and Legendre. The paper was not published, mainly due to the objections of Lagrange, who had earlier rejected the use of trigonometric series. But the Academy did wish to encourage Fourier to develop his ideas, and so made the problem of the propagation of heat the subject of a grand prize to be awarded in 1812. Fourier submitted a revised paper in 1811, which was judged by the men already mentioned and others. It won the prize but was criticized for its lack of rigor and so was not published at that time in the Mémoires of the Academy. He developed a mastery of clear notation, some of which is still in use today. (The modern integral sign and the placement of the limits of integration near its top and bottom were introduced by Fourier.) It was also his habit to maintain close association between mathematical relations and physically measurable quantities, especially in limiting or asymptotic cases, even performing some of the experiments himself. He was one of the first to begin full incorporation of physical constants into his equations, and made considerable strides toward the modern ideas of units and dimensional analysis. Fourier continued to work on the subject of heat and, in 1822, published one of the classics of mathematics, Théorie Analytique de la Chaleur, in which he made extensive use of the series that now bear his name and incorporated the first part of his 1811 paper practically without change. Two years later he became secretary of the Academy and was able to have his 1811 paper published in its original form in the Mémoires.
Fourier series were of profound significance in connection with the evolution of the concept of a function, the rigorous theory of definite integrals, and the development of Hilbert spaces. Fourier claimed that "arbitrary" graphs can be represented by trigonometric series and should therefore be treated as legitimate functions, and it came as a shock to many that he turned out to be right. The classical definition of the definite integral due to Riemann was first given in his fundamental paper of 1854 on the subject of Fourier series. Hilbert thought of a function as represented by an infinite sequence, the Fourier coefficients of the function. Fourier himself is one of the fortunate few: his name has become rooted in all civilized languages as an adjective that is well-known to physical scientists and mathematicians in every part of the world. Functions are not always defined on $(-\pi, \pi)$. Let us consider a function $F(x)$ that is defined on $(a, b)$ and is periodic with period $L = b - a$. We define a new variable $\theta \equiv (2\pi/L)(x - a - L/2)$, and note that $f(\theta) \equiv F\big((L/2\pi)\theta + a + L/2\big)$ is periodic with period $2\pi$, because $$f(\theta \pm \pi) = F\Big(\frac{L}{2\pi}(\theta \pm \pi) + a + \frac{L}{2}\Big) = F\Big(x \pm \frac{L}{2}\Big)$$ and $F(x + L/2) = F(x - L/2)$. It follows that we can expand the latter as in Equation (8.4). Using that equation, but writing $\theta$ in terms of $x$, we obtain
  • 218. (8.6) 200 8. FOURIER ANALYSIS F(X)=F(~8+a+£)=_I- f !n exp [in 2:n: (x-a-£)] . 2:n: 2../iii n=-oo L 2 = _1_ f Fne2mrixjL, 4 n=-oo where we have introduced- Fn sa '/L/2:n:!ne-i(2nn/L)(a+L/2). Using Equation (8.5), we can write t; = [Le-i(2nn/L)(a+L/2)_I_ In e-inO !(8)d8 V2;. ../iii -n = 4 e-i(2nn/L)(a+L/2) la + L e-i(2nn/L)(x-a-L/2)F(x) 2:n: dx ~ a L = _1_1 b e-i(2nn/L)xF(x)dx. (8.7) 4a The functions exp(2:n:inx/L)/4 are easily seen to be orthonormal as mem- bers of .c}(a, b). We can introduce {len)}~1 with the "xth component" giveu by (xl en) = (1/4)eZrrinx/L. Then the reader may check that Equations (8.6) and (8.7) can be written as IF) = L~-oo Fn len) with Fn = (nl F). 8.1.2.Example. In the study of electricalcircuits,periodicvoltage signalsof different square wave voltage shapes are encountered. Anexample is a squarewavevoltageof height Uo."duration" T, and "rest dnration" T [seeFignre 8.I(a)]. The potential as a function of time Vet) can be expanded as a Fourier series. Theinterval is (0, 2T) because that is one wholecycleof the potential variation. Wetherefore use Equation (8.6) andwrite I 00 Vet) = -- L Vn,e2n1fitj2T, ..tiTn=-oo I Jo2T . where Vn = -_ e-2n,nt/2TV(t)dt. ..tiT 0 Theproblem is tofindVn . Thisis easilydoneby substituting V t _ {Uo if 0 s t ::; T, ( ) - 0 if T s t s 2T in the last integral: Vn = Uo fT e-innt/T dt = Uo (_~) [(_I)n _ I] where n i' 0 ..tiT 1 0 ..tiT 1n1f _{O ifn is even andn =1= 0, - ..tiTuo if . odd . n IS . 1n1f 2TheFn aredefined suchthatwhattheymultiply in theexpansion areorthonormal in theinterval (a, b).
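The final series of Example 8.1.2 can be evaluated directly. A sketch with $T = 1$ and $U_0 = 1$; note that at the discontinuity $t = T$ every sine term vanishes, so the truncated series returns $U_0/2$, the average of the two one-sided values:

```python
import math

def square_wave_series(t, T=1.0, U0=1.0, kmax=2000):
    # V(t) = U0 { 1/2 + (2/π) Σ_{k≥0} sin[(2k+1)πt/T] / (2k+1) }
    s = sum(math.sin((2 * k + 1) * math.pi * t / T) / (2 * k + 1)
            for k in range(kmax))
    return U0 * (0.5 + 2.0 / math.pi * s)

print(square_wave_series(0.5))   # inside the pulse: approaches U0
print(square_wave_series(1.0))   # at the discontinuity: the average U0/2
print(square_wave_series(1.5))   # in the rest interval: approaches 0
```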
  • 219. 8.1 FOURIER SERIES 201 (b) (a) 2T T -T u0 "- ..••....+........... :::::::::::t::::::::::: + !::::::::::::1::::::::::: ••••••·••• l •••••··•···· .••..•••••••..:............. ••••••••••••(10............ . ~ . o :~:~:~:~:r:::::~:~:~I::::::::::::L:::::::::I:~:~:~:~::~:L:~:~:~:~:l:::::::::::I:::::.:.~ -.::.- --.-: : ; --:;;.: : U o .:10............ .. ~ . ,~l~'.~~H~;f~~~:~·~~~~~ -2T Figure 8.1 (a) The periodic square wave potential. (b) Various approximations to the Fourier series of the square-wave potential. Thedashed plotis that of thefirst term of the series, the thick grey plot keeps 3 terms, and the solid plot IS terms. I 1 2T I IT ~ For n = 0, we obtain Vo = "'"' V(t) dt = "'"' Uodt = Uo -. Therefore, ..,2T 0 ..,2T 0 2 we canwrite V(t) = _1_ [uo f'f+ ,fiTUo ( I: !eimrt/ T + f: !e'nm/T)] ,.j2f V"2 Z:Tr n=-oo n n=l n n odd n odd = Uo{!+ ~ If: _Ie-in"t/T + f: !einm/T]} 2 l1r n=l -n n=l n odd n odd = U {! ~ ~ _1_ in ([2k+ 111ft)} o 2+1ft;Q2k+l s T . Figure 8.1(b)showsthegraphicalrepresentationoftheabove sumwhenonlyafinite number of tenusarepresent. .. sawtooth voltage 8.1.3. Example. Another frequently nsed voltage is the sawtooth voltage [see Fig- ure 8.2(a)]. The equatiou for V(t) with period T is Vet) = UotlT for 0 ~ t ~ T, anditsFourier representation is I 00 V(t) = - L Vne2n1titjT, Jf n~-oo where Vn = _1_ rT e-2"int/TV(t) dt. Jf io
  • 220. 202 B. FOURIER ANALYSIS Substitutingfor V(t) in the integral above yields Vn = _1_ {T e-21rintjTUo!.... dt = UoT-3j2 {T e-21rint/Tt dt ..ti 10 T 10 = ti r-3/2(---J!--e-2Jrint/TIT +;!.- (T e-21rint/Tdt) o -12mr 0 l2mr 10 ::..::.-~-- ~o 3/2 ( T 2) uo..ti = UOT- -.- = --.-- where n t"0, -12m/: z2n:tr 1 leT 1 leT t Vo = - V(t) dt = f'i' UO- dt = !Uo..ti. ..ti 0 -rt 0 T Thus, V(t) = _1_ [lUo..ti _ uo..ti (t ~ei2n"'/T+ f: ~ei2n"t/T)] ...tf 2 z2rr n=-oo n n=l n = Uo {~- 2. f: ~Sin(2mrt)}. 2 JCn=ln T Figure 8.2(b)showsthegraphical representation of theabove series keeping thefirst few ternM. • The foregoing examples indicate an important fact about Fourier series. At points of discontinuity (for example, t = T in the preceding two examples), the value of the functiou is not defined, but the Fourier series expansion assigns it a value-the average of the two values on the right and left ofthe discontinuity. For instance, when we substitute t = T in the series of Example 8.1.3, all the sine terms vanish and we obtain V(T) = Uo/2, the average of Uo (on the left) and 0 (on the rigbt). We express this as V(T) = ![V(T - 0)+ V(T +0)] sa !lim [V(T - E)+ V(T +E)]. .....0 This is a general property of Fourier series. In fact, the main theorem of Fourier series, which follows, incorporates this property. (For a proof of this theorem, see [Cour 62].) 8.1.4. Theorem. TheFourierseries ofafunction f(O) that ispiecewise continuous in the interval (-Jr, Jr) converges to ![f(O +0) + f(O - 0)] for - x < 0 < it; ![f(Jr) + f(-Jr)] for 0 = ±Jr. Although we used exponential functions to find the Fourier expansion of the two examples above, it is more convenient to start with the trigonometric series
  • 221. B.l FOURIER SERIES 203 (b) time Figure8.2 (a)Theperiodic saw-toothpotential. (b)Various approximations totheFourier seriesof the sawtooth potential. The dashed plot is thatof the first term of the series,the thick grey plot keeps 3 terms, and the solid plot 15 terms. when the expansion of a real function is songht. Equation (8.2) already gives such an expansion. All we need to do now is find expressions for An and Bn. From the definitions of An and the relation between bn and In we get 1 An =bn +b-n = !'C(fn + I-n) v2re 1 I f" 1 f" = - ( - e-ine1(0) so + - eine1(0) dO) v'2ir v'2ir -" v'2ir -" 1 f" 1 L = - [e- ine +eineJ/(O)dO = - cosnOI(O) dO. 211: -1t tt -11: Similarly, I L Bn = - sinnOI(O) dO, :n: _" 1 I f" I bo = !'Clo = -2 1(0) se es -2 Ao. V 2n 1!-1C (8.8) (8.9) So, for a function 1(0) defined in (-re, :n:), the Fourier trigonometric series is as in Equation (8.2) with the coefficients given by Equations (8.8) and (8.9). For a function F(x), defined on (a, b), the trigonometric series becomes 1 ~ ( Znnx . 2n:n:x) F(x) = -AD + L.. Ancos-- + Bn sm-- , 2 n=1 L L (8.10)
  • 222. (8.11) 204 8. FOURIER ANALYSIS where An = ~ t cos Cn;X)F(x) dx ; En = ~ (b sin (2nrrx) F(x) dx. Lin L A convenient rule to remember is that for even (odd) functions-which are necessarily defined on a symmetric interval around the origin-only cosine (sine) terms appear in the Fourier expansion. 8.1.5. Example. An alternating current is tnmed into a direct currentby starting with a signal of the form V(t) ex: Isin llltI.i.e., a harmonic function thatis nevernegative, as shownin Figure 8.3(a).Thenby proper electronics, one smooths outthe "bumps" so that the output signal is verynearly a direct voltage.Let us Fourier-analyze the above signal. Since Vet) is even for -lC -c tot < 1C, we expect only cosine tenus to be present. Ifforthe timebeingwe use e instead of cot,we canwriteIsinBI = !Ao +L~l Ancosn6, where 1 1" 2 10" An = - IsinOlcosnOdO=- sinOcosnOdO ]'(-Jr nO 21o"t 2 [(-I)n+ 1] = - ~[sin(n + 1)0 - sin(n -I)O]dO = --2-- 11:0 n-l1! _{-~(+) for e even and e ee D, - 1r n -1 o forn odd, and Ao = (1/rr)J':." Isin0l dO = 4/rr. Theexpansionthen yields . 2 4 ~ cos2kwt Ismwtl=--- L., 2 ' rr tt k~l 4k - 1 where in the sumwe substituted 2k for n, andOJt forB. Figure 8.3(b) showsthe graph of the seriesabove when only the first few termsarekept. II It is useful In have a representation of the Dirac delta function in terms of the present orthonormal basis of Fourier expansion. First we note that we can represent the delta function in terms of a series in any set of orthonormal functions (see Problem 8.23): 8(x - x') = L fn(x)t,;'(x')w(x). (8.12) n Next we use the basis of the Fourier expansion for which w(x) = 1. We then obtain 00 8(x -x') = L n=-oo e21r:inx/L e-2ninx'/L ~ Ee2nin(x-x')/L . n=-oo
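Example 8.1.5's expansion of $|\sin\omega t|$ converges fairly slowly (the coefficients fall off only as $1/k^2$), but a direct partial sum confirms it. A sketch, writing $\theta = \omega t$:

```python
import math

def rect_sine_series(theta, kmax=20000):
    # |sin θ| = 2/π - (4/π) Σ_{k≥1} cos(2kθ) / (4k² - 1)
    s = sum(math.cos(2 * k * theta) / (4 * k * k - 1) for k in range(1, kmax))
    return 2.0 / math.pi - 4.0 / math.pi * s

for theta in (0.3, 1.0, 2.5):
    print(theta, rect_sine_series(theta), abs(math.sin(theta)))
```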
  • 223. 8.1 FOURIER SERIES 205 (a) (b) 31C/ro 21C/ro 1C/ro time a oO:"""~.L.....~..L.,..~u.....~..........~-'-'-~ ..........~.........~........" -rc/ro a Figure8.3 (a) The periodic "monopolar" sine potential. (b) Various approximations to theFourier seriesofthe"monopolar' sinepotential. Thedashedplotis that of thefirst term of the series, the thick grey plot keeps 3 terms, and the solid plot 15 terms. 8.1.1 The Gibbs Phenomenon The plot of the Fourier series expansions in Figures 8.1(b) and 8.2(b) exhibit a feature that is common to all such expansions: At the discontinuity of the periodic function, the truncated Fourier series overestimates the actual function. This is Gibbs phenomenon called the Gibbs phenomenon, and is the subject of this subsection. Let us approximate the infiuite series with a fiuite sum. Then 1 ~ . 0 1 ~ . 0 1 fo2n" .0' fN(O) = -- L.. fn eln = -- L.. eln - - e-In f(O') dO' v'2rr n=- N v'2rr n=- N ' v'2rr 0 1 (2n N = 27C In dO' f(O') L ein(O-O'), o n=-N where we substituted Equation (8.5) in the sum and, without loss of generality, changed the interval of integration from (-:n:, IT) to (0, 2:n:). Problem 8.2 shows that the sum in the last equation is ~ in(O-O') _ sin[(N + ~)(O - 0')] L.. e - 1 n=-N sin['i(O - 0')] It follows that fN(O) =...!:.- r:s«f(O') sin[(N + ~)(O - 0')] 2" Jo sin[~(O - 0')]
  • 224. 206 6. FOURIER ANALYSIS I i: sin[(N + l)tP] I j2n~0 = - dtPf(tP+0) . 1 2 sa - d4>f(tP +O)S(tP)· 2n: -0 s1O(2tP) 2n: -0 (8.13) ~ =SCq,) We wantto investigate the behavior of fN at a discontinuity of f. By translating the limits of integration if uecessary, we can assume that the discontinuity of f occurs at a point a such that 0 01 a 01 2n:. Let us denote the jump at this discontinuity for the function itself by !'>f, and for its finite Foutier sum by !'>fN: !'>f sa f(a + f) - f(a - f), maximum overshoot inGibbs phenomenon calculated Then, we have !'>fN I c: I j2n-u+< =- dtPf(tP +a + f)S(tP) - - dtPf(tP +a - f)S(tP) 2n -a-e 211: -a+€ = _I {j-u+<dtPf(tP +a + f)S(tP) +t:dtPf(tP +a + f)S(tP)} 2rr -a-e -a+€ I {j2n-u-< c: } - - dtPf(tP +a - f)S(tP) + dtPf(tP +a - f)S(tP) 2n -a+€ 2rr-a-e I {j-u+< {2n-u+< } = 2n: -u-< dtPf(tP +a + f)S(tP) - J2n-u-< dtPf(tP +a - f)S(tP) I i: + - dtP[f(tP +a +f) - f(tP +a - f)]S(tP) 2n -a+e The first two integrals give zero because ofthe small ranges ofintegration and the continuity ofthe integrands in those intervals. The integrand ofthe third integral is almost zero for all values ofthe range ofintegration except when tP '" O.Hence, we can confine the integration to the small interval (-8, +8) for which the difference in the square brackets is simply ts], It now follows that !'>f j~ sin[(N + !)tP] !'>f fo~ sin[(N + !)tP] !'>fN(8) '" -2 . 1 dtP '" - 1 dtP, it _~ s1O(2tP) n: 0 2tP wherewe have emphasized the dependence of fN on 8 and approximated the sine in the denominator by its argument, a good approximation due to the smallness of tP. The reader may find the plot of the integrand in Fignre 6.2, where it is shown clearly that the major contribution to the integral comes from the interval [0, n:1(N+!)],where n:I(N +!) is the first zero ofthe integrand. 
Furthermore, it is clear that ifthe upper limit is larger than n:I(N +!),the result ofthe integral will decrease, because in each interval oflength 2n:,the area below the horizontal axis is larger than that above. Therefore, if we are interested in the maximum overshoot
  • 225. 8.1 FOURIER SERIES 207 of the finite sum, we must set the upper limit equal to $\pi/(N + \frac{1}{2})$. It follows firstly that the maximum overshoot of the finite sum occurs at $\pi/(N + \frac{1}{2}) \approx \pi/N$ to the right of the discontinuity. Secondly, the amount of the maximum overshoot is $$(\Delta f_N)_{\max} \approx \frac{2\Delta f}{\pi} \int_0^{\pi/(N+\frac{1}{2})} \frac{\sin[(N+\frac{1}{2})\phi]}{\phi}\, d\phi = \frac{2\Delta f}{\pi} \int_0^{\pi} \frac{\sin x}{x}\, dx \approx 1.179\, \Delta f. \tag{8.14}$$ Thus 8.1.6. Box. (Gibbs phenomenon) The finite (large-N) sum approximation of the discontinuous function overshoots the function itself at a discontinuity by about 18 percent. 8.1.2 Fourier Series in Higher Dimensions It is instructive to generalize the Fourier series to more than one dimension. This generalization is especially useful in crystallography and solid-state physics, which deal with three-dimensional periodic structures. To generalize to $N$ dimensions, we first consider a special case in which an $N$-dimensional periodic function is a product of $N$ one-dimensional periodic functions. That is, we take the $N$ functions $$f^{(j)}(x_j) = \frac{1}{\sqrt{L_j}} \sum_{k=-\infty}^{\infty} f_k^{(j)} e^{2ik\pi x_j/L_j}, \qquad j = 1, 2, \ldots, N,$$ and multiply them on both sides to obtain $$F(\mathbf{r}) = \frac{1}{\sqrt{V}} \sum_{\mathbf{k}} F_{\mathbf{k}}\, e^{i\mathbf{g}_{\mathbf{k}}\cdot\mathbf{r}}, \tag{8.15}$$ where we have used the following new notations: $F(\mathbf{r}) \equiv f^{(1)}(x_1) f^{(2)}(x_2) \cdots f^{(N)}(x_N)$, $\mathbf{k} \equiv (k_1, k_2, \ldots, k_N)$, $\mathbf{g}_{\mathbf{k}} = 2\pi(k_1/L_1, \ldots, k_N/L_N)$, $V = L_1 L_2 \cdots L_N$, $F_{\mathbf{k}} \equiv f_{k_1} \cdots f_{k_N}$, $\mathbf{r} = (x_1, x_2, \ldots, x_N)$. We take Equation (8.15) as the definition of the Fourier series for any periodic function of $N$ variables (not just the product of $N$ functions of a single variable). However, application of (8.15) requires some clarification. In one dimension, the
  • 226. 208 8. FOURIER ANALYSIS shape ofthe smallest region ofperiodicity is unique. I! is simply a liue segment of length L, for example. In two and more dimensious, however, such regions may havea variety of shapes. Forinstance, in two dimensions, theycanberectangles, peutagons, hexagons, and so forth. Thus, we let V in Equation (8.15) stand for a primitive cell of the N-dimensional lattice. This cell is important iu solid-state Wigner-Seitz cell physics, and (iu three dimeusions) is called the Wigner-Seitz cell. I! is customary to absorb the factor 1!.,fV iuto Fk, and write F(r) = L Fkeigk" k (8.16) where the integral is over a siugle Wigner-Seitz cell. Recall that F(r) is a periodic function ofr. This means that when r is changed by R, where R is a vector describiug the boundaries of a cell, then we should get the same function: F(r +R) = F(r). When substituted iu (8.16), this yields F(r +R) =Lk Fkeig•.(,+R) =Lk eig•.R Fkeig•." which is equal to F(r) if (8.17) In three dimensions R = mI8t + m28z + m3a3, where mI, mz. and m3 are iutegers and 31, 3Z, and 33 are crystal axes, which are not generally orthogonal. On the other hand, gk = nlbl +nzhz +n3b3, where nl, nz, and n3are iutegers, reciprocallallice and bl, bz, and b3 are the reclprocal lattice vectors defined by vectors The reader may verify that bi . 3j = 2:n:8ij. Thus ~.R = (tnibi) . (tmj3j) = ~nimjbi'3j 1=1 1=1 I,l 3 = 2:n: Lmjnj = 2:n:(iuteger), j=l and Equation (8.17) is satisfied. 8.2 The FourierTransform The Fourier series representation of F (x) is valid for the entire realliue as long as F(x) is periodic. However, most functions encountered in physical applications are defined in some iuterval (a, b) withoutrepetition beyond that iuterval. I!wonld be useful ifwe conld also expand such functions iu some form ofFourier "series." One way to do this is to star! with the periodic series and then let the period go to iufinity while extending the domain of the definition of the function. As a
  • 227. 8.2 THE FOURIER TRANSFORM 209 f(x) a (a) L )1001 b=a+L a-L a (b) a+L a+2L (8.18) (8.19) Figure 8.4 (a) The function we want to represent. (b) The Fourier series representation of thefunction. specific case, suppose we are iuterested in representiug a function f(x) that is defined only for the interval (a, b) and is assigned the value zero everywhere else [see Figure 8.4(a)]. To begiu with, we might try the Fourier series representation, but this will produce a repetition of our function. This situation is depicted iu Figure 8.4(b). Next we may try a function gA (x) defined in the interval (a - A/2, b +A/2), where A is an arbitrary positive number: { o if a - A/2 < x < a, gA(X)= f(x) if a-c x -c b, o if b < x < b + A/2. This function, which is depicted iu Figure 8.5, has the Fourier series representation I 00 gA(X) = L gA,ne2hmxI(L+Al, .,jL +A n=-oo where I lb + A/2 . gA.n = e-2l1rnxI(L+AlgA(X)dx. .,jL +A a-A/2 We have managed to separate various copies of the original periodic function by A. It should be clear that if A ---> 00, we can completely isolate the function
and stop the repetition. Let us investigate the behavior of Equations (8.18) and (8.19) as $A$ grows without bound. First, we notice that the quantity $k_n$ defined by $k_n \equiv 2n\pi/(L+A)$ and appearing in the exponent becomes almost continuous. In other words, as $n$ changes by one unit, $k_n$ changes only slightly. This suggests that the terms in the sum in Equation (8.18) can be lumped together in intervals of width $\Delta n_j$, giving

$$g_A(x) \approx \frac{1}{\sqrt{L+A}} \sum_j \hat g_A(k_j)\, e^{ik_j x}\, \Delta n_j,$$

where $k_j \equiv 2n_j\pi/(L+A)$ and $\hat g_A(k_j) \equiv g_{A,n_j}$. Substituting $\Delta n_j = [(L+A)/2\pi]\,\Delta k_j$ in the above sum, we obtain

$$g_A(x) \approx \frac{1}{\sqrt{2\pi}} \sum_j \tilde g_A(k_j)\, e^{ik_j x}\, \Delta k_j,$$

where we introduced $\tilde g_A(k_j)$ defined by $\tilde g_A(k_j) \equiv \sqrt{(L+A)/2\pi}\; \hat g_A(k_j)$. It is now clear that the preceding sum approaches an integral in the limit that $A \to \infty$. In the same limit, $g_A(x) \to f(x)$, and we have

$$f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \tilde f(k)\, e^{ikx}\, dk, \qquad (8.20)$$

where

$$\tilde f(k) \equiv \lim_{A\to\infty} \tilde g_A(k_j) = \lim_{A\to\infty} \sqrt{\frac{L+A}{2\pi}}\, \hat g_A(k_j) = \lim_{A\to\infty} \frac{1}{\sqrt{2\pi}} \int_{a-A/2}^{b+A/2} e^{-ik_j x}\, g_A(x)\, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx. \qquad (8.21)$$

Equations (8.20) and (8.21) are called the Fourier integral transforms of $\tilde f(k)$ and $f(x)$, respectively.

8.2.1. Example. Let us evaluate the Fourier transform of the function defined by

$$f(x) = \begin{cases} b & \text{if } |x| < a,\\ 0 & \text{if } |x| > a\end{cases}$$

(see Figure 8.6). From (8.21) we have

$$\tilde f(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx = \frac{b}{\sqrt{2\pi}} \int_{-a}^{a} e^{-ikx}\, dx = \frac{2ab}{\sqrt{2\pi}} \left(\frac{\sin ka}{ka}\right),$$

which is the function encountered (and depicted) in Example 6.1.2.
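The transform in Example 8.2.1 is easy to check by direct numerical integration. The following sketch assumes NumPy (not part of the text), and the values of $a$ and $b$ are arbitrary choices; it approximates the integral in (8.21) by the trapezoidal rule and compares the result with $2ab\sin(ka)/(\sqrt{2\pi}\,ka)$:

```python
import numpy as np

def trapz(y, x):
    """Trapezoidal rule, written out to avoid depending on a NumPy version."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

# Square bump of Example 8.2.1; a and b are arbitrary choices for the check.
a, b = 1.0, 2.0
x = np.linspace(-a, a, 20001)        # f vanishes outside |x| < a
f = np.full_like(x, b)

def transform(k):
    # f~(k) = (1/sqrt(2 pi)) * Int f(x) exp(-i k x) dx, Eq. (8.21)
    return trapz(f * np.exp(-1j * k * x), x) / np.sqrt(2 * np.pi)

for k in (0.5, 1.0, 3.0):
    exact = (2 * a * b / np.sqrt(2 * np.pi)) * np.sin(k * a) / (k * a)
    assert abs(transform(k) - exact) < 1e-6
```

Since $f$ is real and even, the computed transform also comes out real (its imaginary part integrates away by symmetry), in agreement with the closed form.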
Figure 8.5 By introducing the parameter $A$, we have managed to separate the copies of the function.

Let us discuss this result in detail. First, note that if $a \to \infty$, the function $f(x)$ becomes a constant function over the entire real line, and we get

$$\tilde f(k) = \frac{2b}{\sqrt{2\pi}} \lim_{a\to\infty} \frac{\sin ka}{k} = \frac{2b}{\sqrt{2\pi}}\, \pi\delta(k)$$

by the result of Example 6.1.2. This is the Fourier transform of an everywhere-constant function (see Problem 8.12). Next, let $b \to \infty$ and $a \to 0$ in such a way that $2ab$, which is the area under $f(x)$, is 1. Then $f(x)$ will approach the delta function, and $\tilde f(k)$ becomes

$$\tilde f(k) = \lim_{b\to\infty}\frac{2ab}{\sqrt{2\pi}}\, \frac{\sin ka}{ka} = \frac{1}{\sqrt{2\pi}} \lim_{a\to 0} \frac{\sin ka}{ka} = \frac{1}{\sqrt{2\pi}}.$$

So the Fourier transform of the delta function is the constant $1/\sqrt{2\pi}$. Finally, we note that the width of $f(x)$ is $\Delta x = 2a$, and the width of $\tilde f(k)$ is roughly the distance, on the $k$-axis, between its first two roots, $k_+$ and $k_-$, on either side of $k = 0$: $\Delta k = k_+ - k_- = 2\pi/a$. Thus, increasing the width of $f(x)$ results in a decrease in the width of $\tilde f(k)$. In other words, when the function is wide, its Fourier transform is narrow. In the limit of infinite width (a constant function), we get infinite sharpness (the delta function). The last two statements are very general. In fact, it can be shown that $\Delta x\, \Delta k \geq 1$ for any function $f(x)$. When both sides of this inequality are multiplied by the (reduced) Planck constant $\hbar \equiv h/(2\pi)$, the result is the celebrated Heisenberg uncertainty relation:³

$$\Delta x\, \Delta p \geq \hbar,$$

where $p = \hbar k$ is the momentum of the particle. Having obtained the transform of $f(x)$, we can write

$$f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{2b}{\sqrt{2\pi}}\, \frac{\sin ka}{k}\, e^{ikx}\, dk = \frac{b}{\pi} \int_{-\infty}^{\infty} \frac{\sin ka}{k}\, e^{ikx}\, dk.$$

³In the context of the uncertainty relation, the width of the function (the so-called wave packet) measures the uncertainty in the position $x$ of a quantum mechanical particle. Similarly, the width of the Fourier transform measures the uncertainty in $k$, which is related to momentum $p$ via $p = \hbar k$.
Figure 8.6 The square "bump" function.

8.2.2. Example. Let us evaluate the Fourier transform of a Gaussian $g(x) = ae^{-bx^2}$ with $a, b > 0$:

$$\tilde g(k) = \frac{a}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-b(x^2 + ikx/b)}\, dx = \frac{a e^{-k^2/4b}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-b(x + ik/2b)^2}\, dx.$$

To evaluate this integral rigorously, we would have to use techniques developed in complex analysis, which are not introduced until Chapter 10 (see Example 10.3.8). However, we can ignore the fact that the exponent is complex, substitute $y = x + ik/(2b)$, and write

$$\int_{-\infty}^{\infty} e^{-b[x + ik/(2b)]^2}\, dx = \int_{-\infty}^{\infty} e^{-by^2}\, dy = \sqrt{\frac{\pi}{b}}.$$

Thus, we have $\tilde g(k) = \dfrac{a}{\sqrt{2b}}\, e^{-k^2/(4b)}$, which is also a Gaussian. We note again that the width of $g(x)$, which is proportional to $1/\sqrt{b}$, is in inverse relation to the width of $\tilde g(k)$, which is proportional to $\sqrt{b}$. We thus have $\Delta x\, \Delta k \sim 1$.

Equations (8.20) and (8.21) are reciprocals of one another. However, it is not obvious that they are consistent. In other words, if we substitute (8.20) in the RHS of (8.21), do we get an identity? Let's try this:

$$\tilde f(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, e^{-ikx} \left[\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \tilde f(k')\, e^{ik'x}\, dk'\right] = \frac{1}{2\pi} \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} \tilde f(k')\, e^{i(k'-k)x}\, dk'.$$

We now change the order of the two integrations:

$$\tilde f(k) = \int_{-\infty}^{\infty} dk'\, \tilde f(k') \left[\frac{1}{2\pi} \int_{-\infty}^{\infty} dx\, e^{i(k'-k)x}\right].$$

But the expression in the square brackets is the delta function (see Example 6.1.2). Thus, we have $\tilde f(k) = \int_{-\infty}^{\infty} dk'\, \tilde f(k')\, \delta(k'-k)$, which is an identity.
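The Gaussian transform of Example 8.2.2 can also be checked by direct numerical integration. A sketch assuming NumPy, with $a$ and $b$ chosen arbitrarily:

```python
import numpy as np

def trapz(y, x):
    """Trapezoidal rule, written out to avoid depending on a NumPy version."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

# Gaussian of Example 8.2.2; a and b are arbitrary choices for the check.
a, b = 2.0, 0.5
x = np.linspace(-20.0, 20.0, 40001)   # wide enough that g is negligible at the ends
g = a * np.exp(-b * x**2)

for k in (0.0, 1.0, 2.5):
    gk = trapz(g * np.exp(-1j * k * x), x) / np.sqrt(2 * np.pi)
    exact = a / np.sqrt(2 * b) * np.exp(-k**2 / (4 * b))
    assert abs(gk - exact) < 1e-8
```

Because the integrand is smooth and decays to zero at the endpoints, the trapezoidal rule is extremely accurate here, so the agreement with $(a/\sqrt{2b})\,e^{-k^2/4b}$ is essentially to machine precision.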
As in the case of Fourier series, Equations (8.20) and (8.21) are valid even if $f$ and $\tilde f$ are piecewise continuous. In that case the Fourier transforms are written as

$$\tfrac{1}{2}[f(x+0) + f(x-0)] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \tilde f(k)\, e^{ikx}\, dk,$$
$$\tfrac{1}{2}[\tilde f(k+0) + \tilde f(k-0)] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx, \qquad (8.22)$$

where each zero on the LHS is an $\epsilon$ that has gone to its limit.

It is useful to generalize the Fourier transform equations to more than one dimension. The generalization is straightforward:

$$f(\mathbf r) = \frac{1}{(2\pi)^{n/2}} \int d^n k\, e^{i\mathbf k\cdot\mathbf r}\, \tilde f(\mathbf k), \qquad \tilde f(\mathbf k) = \frac{1}{(2\pi)^{n/2}} \int d^n x\, f(\mathbf r)\, e^{-i\mathbf k\cdot\mathbf r}. \qquad (8.23)$$

Let us now use the abstract notation of Chapter 6 to get more insight into the preceding results. In the language of Chapter 6, Equation (8.20) can be written as

$$\langle x| f\rangle = \int_{-\infty}^{\infty} \langle x| k\rangle \langle k| \tilde f\rangle\, dk, \qquad (8.24)$$

where we have defined

$$\langle x| k\rangle = \frac{1}{\sqrt{2\pi}}\, e^{ikx}. \qquad (8.25)$$

Equation (8.24) suggests the identification $|f\rangle \equiv |\tilde f\rangle$ as well as the identity

$$\mathbf 1 = \int_{-\infty}^{\infty} |k\rangle \langle k|\, dk, \qquad (8.26)$$

which is the same as (6.1). Equation (6.3) yields

$$\langle k| k'\rangle = \delta(k - k'), \qquad (8.27)$$

which upon the insertion of a unit operator gives an integral representation of the delta function:

$$\delta(k - k') = \langle k|\mathbf 1|k'\rangle = \langle k|\left(\int_{-\infty}^{\infty} |x\rangle\langle x|\, dx\right)|k'\rangle = \int_{-\infty}^{\infty} \langle k| x\rangle \langle x| k'\rangle\, dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} dx\, e^{i(k'-k)x}.$$

Obviously, we can also write $\delta(x - x') = [1/(2\pi)] \int_{-\infty}^{\infty} dk\, e^{i(x-x')k}$.
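This integral representation of the delta function can be illustrated numerically: truncating the $k$-integral at $|k| < K$ gives the kernel $\sin(Ku)/(\pi u)$, which behaves more and more like $\delta(u)$ as $K$ grows. A sketch assuming NumPy; the test function and the cutoff values $K$ are arbitrary choices:

```python
import numpy as np

def trapz(y, x):
    """Trapezoidal rule, written out to avoid depending on a NumPy version."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

# Truncating (1/2pi) Int dk exp(i k u) at |k| < K gives sin(K u)/(pi u).
x = np.linspace(-30.0, 30.0, 120001)
f = np.exp(-x**2)                      # a smooth test function
x0 = 0.7                               # point at which to try to recover f

def smeared(K):
    # sin(K u)/(pi u) written via np.sinc, which handles u = 0 safely
    kernel = (K / np.pi) * np.sinc(K * (x0 - x) / np.pi)
    return trapz(f * kernel, x)

errs = [abs(smeared(K) - np.exp(-x0**2)) for K in (2.0, 8.0, 32.0)]
assert errs[0] > errs[1] > errs[2]     # smearing converges toward f(x0)
assert errs[2] < 1e-3
```

The error at each cutoff is just the tail $\int_{|k|>K}\tilde f(k)e^{ikx_0}dk$ of the inversion integral, which for a Gaussian shrinks very rapidly with $K$.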
If more than one dimension is involved, we use

$$\delta(\mathbf k - \mathbf k') = \frac{1}{(2\pi)^n} \int d^n x\, e^{i(\mathbf k - \mathbf k')\cdot\mathbf r}, \qquad \delta(\mathbf r - \mathbf r') = \frac{1}{(2\pi)^n} \int d^n k\, e^{i(\mathbf r - \mathbf r')\cdot\mathbf k}, \qquad (8.28)$$

with the inner product relations

$$\langle \mathbf r| \mathbf k\rangle = \frac{1}{(2\pi)^{n/2}}\, e^{i\mathbf k\cdot\mathbf r}, \qquad \langle \mathbf k| \mathbf r\rangle = \frac{1}{(2\pi)^{n/2}}\, e^{-i\mathbf k\cdot\mathbf r}. \qquad (8.29)$$

Equations (8.28) and (8.29) and the identification $|f\rangle \equiv |\tilde f\rangle$ exhibit a striking resemblance between $|\mathbf r\rangle$ and $|\mathbf k\rangle$. In fact, any given abstract vector $|f\rangle$ can be expressed either in terms of its $\mathbf r$ representation, $\langle \mathbf r| f\rangle = f(\mathbf r)$, or in terms of its $\mathbf k$ representation, $\langle \mathbf k| f\rangle = \tilde f(\mathbf k)$. These two representations are completely equivalent, and there is a one-to-one correspondence between the two, given by Equation (8.23). The representation that is used in practice is dictated by the physical application. In quantum mechanics, for instance, most of the time the $\mathbf r$ representation, corresponding to the position, is used, because then the operator equations turn into differential equations that are normally linear and easier to solve than the corresponding equations in the $\mathbf k$ representation, which is related to the momentum.

8.2.3. Example. In this example we evaluate the Fourier transform of the Coulomb potential $V(r)$ of a point charge $q$: $V(r) = q/r$. The Fourier transform is important in scattering experiments with atoms, molecules, and solids. As we shall see in the following, the Fourier transform of $V(r)$ is not defined. However, if we work with the Yukawa potential,

$$V_\alpha(r) = \frac{q e^{-\alpha r}}{r}, \qquad \alpha > 0,$$

the Fourier transform will be well-defined, and we can take the limit $\alpha \to 0$ to recover the Coulomb potential. Thus, we seek the Fourier transform of $V_\alpha(r)$. We are working in three dimensions and therefore may write

$$\tilde V_\alpha(\mathbf k) = \frac{1}{(2\pi)^{3/2}} \int d^3 x\, e^{-i\mathbf k\cdot\mathbf r}\, \frac{q e^{-\alpha r}}{r}.$$

It is clear from the presence of $r$ that spherical coordinates are appropriate. We are free to pick any direction as the $z$-axis. A simplifying choice in this case is the direction of $\mathbf k$. So, we let $\mathbf k = |\mathbf k|\hat{\mathbf e}_z = k\hat{\mathbf e}_z$, or $\mathbf k\cdot\mathbf r = kr\cos\theta$, where $\theta$ is the polar angle in spherical coordinates.
Now we have

$$\tilde V_\alpha(\mathbf k) = \frac{q}{(2\pi)^{3/2}} \int_0^\infty r^2\, dr \int_0^\pi \sin\theta\, d\theta \int_0^{2\pi} d\varphi\; \frac{e^{-ikr\cos\theta}\, e^{-\alpha r}}{r}.$$

The $\varphi$ integration is trivial and gives $2\pi$. The $\theta$ integration is done next:

$$\int_0^\pi \sin\theta\, e^{-ikr\cos\theta}\, d\theta = \int_{-1}^{1} e^{-ikru}\, du = \frac{1}{ikr}\left(e^{ikr} - e^{-ikr}\right).$$
We thus have

$$\tilde V_\alpha(\mathbf k) = \frac{q(2\pi)}{(2\pi)^{3/2}} \int_0^\infty dr\, r^2\, e^{-\alpha r}\, \frac{1}{r}\, \frac{1}{ikr}\left(e^{ikr} - e^{-ikr}\right) = \frac{q}{(2\pi)^{1/2}}\, \frac{1}{ik} \int_0^\infty dr \left[e^{(-\alpha + ik)r} - e^{-(\alpha + ik)r}\right] = \frac{q}{(2\pi)^{1/2}}\, \frac{1}{ik} \left(\left.\frac{e^{(-\alpha + ik)r}}{-\alpha + ik}\right|_0^\infty + \left.\frac{e^{-(\alpha + ik)r}}{\alpha + ik}\right|_0^\infty\right).$$

Note how the factor $e^{-\alpha r}$ has tamed the divergent behavior of the exponential at $r \to \infty$. This was the reason for introducing it in the first place. Simplifying the last expression yields

$$\tilde V_\alpha(\mathbf k) = \frac{2q}{\sqrt{2\pi}}\, \frac{1}{k^2 + \alpha^2}.$$

The parameter $\alpha$ is a measure of the range of the potential: the larger $\alpha$ is, the smaller the range. In fact, it was in response to the short range of nuclear forces that Yukawa introduced $\alpha$. For electromagnetism, where the range is infinite, $\alpha$ becomes zero and $V_\alpha(r)$ reduces to $V(r)$. Thus, the Fourier transform of the Coulomb potential is

$$\tilde V_{\text{Coul}}(\mathbf k) = \frac{2q}{\sqrt{2\pi}}\, \frac{1}{k^2}.$$

If a charge distribution is involved, the Fourier transform will be different.

8.2.4. Example. The example above deals with the electrostatic potential of a point charge. Let us now consider the case where the charge is distributed over a finite volume. Then the potential is

$$V(\mathbf r) = \int \frac{q\rho(\mathbf r')}{|\mathbf r' - \mathbf r|}\, d^3 x' \equiv q \int \frac{\rho(\mathbf r')}{|\mathbf r' - \mathbf r|}\, d^3 x',$$

where $q\rho(\mathbf r')$ is the charge density at $\mathbf r'$, and we have used a single integral sign because $d^3 x'$ already indicates the number of integrations to be performed. Note that we have normalized $\rho(\mathbf r')$ so that its integral over the volume is 1. Figure 8.7 shows the geometry of the situation. Making a change of variables, $\mathbf R \equiv \mathbf r' - \mathbf r$, or $\mathbf r' = \mathbf R + \mathbf r$, and $d^3 x' = d^3 X$, with $\mathbf R = (X, Y, Z)$, we get

$$\tilde V(\mathbf k) = \frac{q}{(2\pi)^{3/2}} \int d^3 x\, e^{-i\mathbf k\cdot\mathbf r} \int \frac{\rho(\mathbf R + \mathbf r)}{R}\, d^3 X. \qquad (8.30)$$

To evaluate Equation (8.30), we substitute for $\rho(\mathbf R + \mathbf r)$ in terms of its Fourier transform,

$$\rho(\mathbf R + \mathbf r) = \frac{1}{(2\pi)^{3/2}} \int d^3 k'\, \tilde\rho(\mathbf k')\, e^{i\mathbf k'\cdot(\mathbf R + \mathbf r)}. \qquad (8.31)$$

Combining (8.30) and (8.31), we obtain

$$\tilde V(\mathbf k) = \frac{q}{(2\pi)^3} \int d^3 x\, d^3 X\, d^3 k'\, \frac{e^{i\mathbf k'\cdot\mathbf R}}{R}\, \tilde\rho(\mathbf k')\, e^{i\mathbf r\cdot(\mathbf k' - \mathbf k)} = q \int d^3 X\, d^3 k'\, \frac{e^{i\mathbf k'\cdot\mathbf R}}{R}\, \tilde\rho(\mathbf k') \underbrace{\left(\frac{1}{(2\pi)^3} \int d^3 x\, e^{i\mathbf r\cdot(\mathbf k' - \mathbf k)}\right)}_{\delta(\mathbf k' - \mathbf k)}. \qquad (8.32)$$
Figure 8.7 The Fourier transform of the potential of a continuous charge distribution at $P$ is calculated using this geometry.

What is nice about this result is that the contribution of the charge distribution, $\tilde\rho(\mathbf k)$, has been completely factored out. The integral, aside from a constant and a change in the sign of $\mathbf k$, is simply the Fourier transform of the Coulomb potential of a point charge obtained in the previous example. We can therefore write Equation (8.32) as

$$\tilde V(\mathbf k) = (2\pi)^{3/2}\, \tilde\rho(\mathbf k)\, \tilde V_{\text{Coul}}(-\mathbf k) = \frac{4\pi q\, \tilde\rho(\mathbf k)}{|\mathbf k|^2}.$$

This equation is important in analyzing the structure of atomic particles. The Fourier transform $\tilde V(\mathbf k)$ is directly measurable in scattering experiments. In a typical experiment a (charged) target is probed with a charged point particle (electron). If the analysis of the scattering data shows a deviation from $1/k^2$ in the behavior of $\tilde V(\mathbf k)$, then it can be concluded that the target particle has a charge distribution. More specifically, a plot of $k^2\tilde V(\mathbf k)$ versus $k$ gives the variation of $\tilde\rho(\mathbf k)$, the form factor, with $k$. If the resulting graph is a constant, then $\tilde\rho(\mathbf k)$ is a constant, and the target is a point particle [$\tilde\rho(\mathbf k)$ is a constant for point particles, where $\rho(\mathbf r') \propto \delta(\mathbf r - \mathbf r')$]. If there is any deviation from a constant function, $\tilde\rho(\mathbf k)$ must have a dependence on $\mathbf k$, and correspondingly, the target particle must have a charge distribution.

The above discussion, when generalized to four-dimensional relativistic space-time, was the basis for a strong argument in favor of the existence of point-like particles (quarks) inside a proton in 1968, when the results of the scattering of high-energy electrons off protons at the Stanford Linear Accelerator Center revealed a deviation from a constant form factor.

8.2.1 Fourier Transforms and Derivatives

The Fourier transform is very useful for solving differential equations. This is because the derivative operator in $\mathbf r$ space turns into ordinary multiplication in $\mathbf k$
space. For example, if we differentiate $f(\mathbf r)$ in Equation (8.23) with respect to $x_j$, we obtain

$$\frac{\partial}{\partial x_j} f(\mathbf r) = \frac{1}{(2\pi)^{n/2}} \int d^n k\, \frac{\partial}{\partial x_j}\, e^{i(k_1 x_1 + \cdots + k_j x_j + \cdots + k_n x_n)}\, \tilde f(\mathbf k) = \frac{1}{(2\pi)^{n/2}} \int d^n k\, (ik_j)\, e^{i\mathbf k\cdot\mathbf r}\, \tilde f(\mathbf k).$$

That is, every time we differentiate with respect to any component of $\mathbf r$, the corresponding component of $\mathbf k$ "comes down." Thus, the $n$-dimensional gradient is

$$\nabla f(\mathbf r) = \frac{1}{(2\pi)^{n/2}} \int d^n k\, (i\mathbf k)\, e^{i\mathbf k\cdot\mathbf r}\, \tilde f(\mathbf k),$$

and the $n$-dimensional Laplacian is

$$\nabla^2 f(\mathbf r) = \frac{1}{(2\pi)^{n/2}} \int d^n k\, (-k^2)\, e^{i\mathbf k\cdot\mathbf r}\, \tilde f(\mathbf k).$$

We shall use Fourier transforms extensively in solving differential equations later in the book. Here, we can illustrate the above points with a simple example. Consider the ordinary second-order differential equation

$$c_2 \frac{d^2 y}{dx^2} + c_1 \frac{dy}{dx} + c_0 y = f(x),$$

where $c_0$, $c_1$, and $c_2$ are constants. We can "solve" this equation by simply substituting the following in it:

$$y(x) = \frac{1}{\sqrt{2\pi}} \int dk\, \tilde y(k)\, e^{ikx}, \qquad \frac{dy}{dx} = \frac{1}{\sqrt{2\pi}} \int dk\, \tilde y(k)\, (ik)\, e^{ikx},$$
$$\frac{d^2 y}{dx^2} = -\frac{1}{\sqrt{2\pi}} \int dk\, \tilde y(k)\, k^2\, e^{ikx}, \qquad f(x) = \frac{1}{\sqrt{2\pi}} \int dk\, \tilde f(k)\, e^{ikx}.$$

This gives

$$\frac{1}{\sqrt{2\pi}} \int dk\, \tilde y(k)\left(-c_2 k^2 + ic_1 k + c_0\right) e^{ikx} = \frac{1}{\sqrt{2\pi}} \int dk\, \tilde f(k)\, e^{ikx}.$$

Equating the coefficients of $e^{ikx}$ on both sides, we obtain

$$\tilde y(k) = \frac{\tilde f(k)}{-c_2 k^2 + ic_1 k + c_0}.$$

If we know $\tilde f(k)$ [which can be obtained from $f(x)$], we can calculate $y(x)$ by Fourier-transforming $\tilde y(k)$. The resulting integrals are not generally easy to evaluate. In some cases the methods of complex analysis may be helpful; in others numerical integration may be the last resort. However, the real power of the Fourier transform lies in the formal analysis of differential equations.

8.2.2 The Discrete Fourier Transform

The preceding remarks alluded to the power of the Fourier transform in solving certain differential equations. If such a solution is combined with numerical techniques, the integrals must be replaced by sums. This is particularly true if our
function is given by a table rather than a mathematical relation, a common feature of numerical analysis. So suppose that we are given a set of measurements performed in equal time intervals of $\Delta t$. Suppose that the overall period in which these measurements are done is $T$, so that $N\Delta t = T$. We are seeking a Fourier transform of this finite set of data. First we write

$$\tilde f(\omega) = \frac{1}{\sqrt{2\pi}} \sum_{n=0}^{N-1} f(n\Delta t)\, e^{-i\omega\, n\Delta t}\left(\frac{T}{N}\right),$$

or, discretizing the frequency as well and writing $\omega_m = m\Delta\omega$, with $\Delta\omega$ to be determined later, we have

$$\tilde f(m\Delta\omega) = \frac{1}{\sqrt{2\pi}} \sum_{n=0}^{N-1} f(n\Delta t)\, e^{-i(m\Delta\omega)\, n\Delta t}\left(\frac{T}{N}\right). \qquad (8.33)$$

Since the Fourier transform is given in terms of a finite sum, let us explore the idea of writing the inverse transform also as a sum. So, multiply both sides of the above equation by $[e^{i(m\Delta\omega)k\Delta t}/\sqrt{2\pi}]\,\Delta\omega$ and sum over $m$:

$$\frac{1}{\sqrt{2\pi}} \sum_{m=0}^{N-1} \tilde f(m\Delta\omega)\, e^{i(m\Delta\omega)k\Delta t}\, \Delta\omega = \frac{T\Delta\omega}{2\pi N} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} f(n\Delta t)\, e^{im\Delta\omega\Delta t(k-n)} = \frac{T\Delta\omega}{2\pi N} \sum_{n=0}^{N-1} f(n\Delta t) \sum_{m=0}^{N-1} e^{im\Delta\omega\Delta t(k-n)}.$$

Problem 8.2 shows that

$$\sum_{m=0}^{N-1} e^{im\Delta\omega\Delta t(k-n)} = \begin{cases} N & \text{if } k = n,\\[4pt] \dfrac{e^{iN\Delta\omega\Delta t(k-n)} - 1}{e^{i\Delta\omega\Delta t(k-n)} - 1} & \text{if } k \neq n.\end{cases}$$

We want the sum to vanish when $k \neq n$. This suggests demanding that $N\Delta\omega\Delta t(k-n)$ be an integer multiple of $2\pi$. Since $\Delta\omega$ and $\Delta t$ are to be independent of this (arbitrary) integer (as well as of $k$ and $n$), we must write

$$N\Delta\omega\Delta t(k-n) = 2\pi(k-n) \;\Longrightarrow\; N\Delta\omega\, \frac{T}{N} = 2\pi \;\Longrightarrow\; \Delta\omega = \frac{2\pi}{T}.$$

With this choice, we have the following discrete Fourier transforms:

$$\tilde f(\omega_m) = \sum_{n=0}^{N-1} f(n\Delta t)\, e^{-2\pi i m n/N}, \qquad f(n\Delta t) = \frac{1}{N} \sum_{m=0}^{N-1} \tilde f(\omega_m)\, e^{2\pi i m n/N}, \qquad \omega_j \equiv \frac{2\pi j}{T}, \qquad (8.34)$$
where we have redefined the new $\tilde f$ to be $\sqrt{2\pi}\,N/T$ times the old $\tilde f$.

Discrete Fourier transforms are used extensively in numerical calculation of problems in which ordinary Fourier transforms are used. For instance, if a differential equation lends itself to a solution via the Fourier transform as discussed before, then discrete Fourier transforms will give a procedure for finding the solution numerically. Similarly, the frequency analysis of signals is nicely handled by discrete Fourier transforms.

It turns out that discrete Fourier analysis is very intensive computationally. Its present status as a popular tool in computational physics is due primarily to a very efficient method of calculation known as the fast Fourier transform. In a typical Fourier transform, one has to perform a sum of $N$ terms for every point. Since there are $N$ points to transform, the total computational time will be of order $N^2$. In the fast Fourier transform, one takes $N$ to be even and divides the sum into two other sums, one over the even terms and one over the odd terms. Then the computation time will be of order $2 \times (N/2)^2$, or half the original calculation. Similarly, if $N/2$ is even, one can further divide the odd and even sums by two and obtain a computation time of $4 \times (N/4)^2$, or a quarter of the original calculation. In general, if $N = 2^k$, then by dividing the sums consecutively, we end up with $N$ transforms to be performed after $k$ steps. So, the computation time will be $kN = N\log_2 N$. For $N = 128$, the computation time will be $128\log_2 128 = 896$ as opposed to $128^2 = 16{,}384$, a reduction by a factor of over 18. The fast Fourier transform is indeed fast!

8.2.3 The Fourier Transform of a Distribution

Although one can define the Fourier transform of a distribution in exact analogy to an ordinary function, sometimes it is convenient to define the Fourier transform of the distribution as a linear functional.
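Before moving on, the operation counts quoted above are easy to illustrate with a short computation. The following sketch assumes NumPy; it compares a naive $O(N^2)$ evaluation of the sum in (8.34) against the library's built-in FFT (which uses the same sign convention), for the $N = 128$ of the example:

```python
import numpy as np

def naive_dft(f):
    """Direct O(N^2) evaluation of Eq. (8.34):
    f~_m = sum_n f_n exp(-2 pi i m n / N)."""
    N = len(f)
    n = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(n, n) / N) @ f   # N^2 matrix entries

rng = np.random.default_rng(0)
f = rng.standard_normal(128)
# numpy's FFT uses the same convention, so the two must agree.
assert np.allclose(naive_dft(f), np.fft.fft(f))

# Operation counts for N = 128, as in the text:
assert 128**2 == 16384 and 128 * np.log2(128) == 896
```

The naive version builds all $N^2$ matrix entries explicitly, which is exactly the cost the fast Fourier transform avoids.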
Let us ignore the distinction between the two variables $x$ and $k$, and simply define the Fourier transform of a function $f : \mathbb{R} \to \mathbb{R}$ as

$$\tilde f(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-iut}\, dt.$$

Now we consider two functions, $f$ and $g$, and note that

$$(\tilde f, g) \equiv \int_{-\infty}^{\infty} \tilde f(u)\, g(u)\, du = \int_{-\infty}^{\infty} g(u) \left[\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-iut}\, dt\right] du = \int_{-\infty}^{\infty} f(t) \left[\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(u)\, e^{-iut}\, du\right] dt = \int_{-\infty}^{\infty} f(t)\, \tilde g(t)\, dt = (f, \tilde g).$$

The following definition is motivated by the last equation.
8.2.5. Definition. Let $\varphi$ be a distribution and let $f$ be a $\mathcal{C}^\infty$ function whose Fourier transform $\tilde f$ exists and is also a $\mathcal{C}^\infty$ function. Then we define the Fourier transform $\tilde\varphi$ of $\varphi$ to be the distribution given by $(\tilde\varphi, f) = (\varphi, \tilde f)$.

8.2.6. Example. The Fourier transform of $\delta(x)$ is given by

$$(\tilde\delta, f) = (\delta, \tilde f) = \tilde f(0) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, dt = \int_{-\infty}^{\infty} \left(\frac{1}{\sqrt{2\pi}}\right) f(t)\, dt = \left(\frac{1}{\sqrt{2\pi}}, f\right).$$

Thus, $\tilde\delta = 1/\sqrt{2\pi}$, as expected. The Fourier transform of $\delta(x - x') \equiv \delta_{x'}(x)$ is given by

$$(\tilde\delta_{x'}, f) = (\delta_{x'}, \tilde f) = \tilde f(x') = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-ix't}\, dt = \int_{-\infty}^{\infty} \left(\frac{1}{\sqrt{2\pi}}\, e^{-ix't}\right) f(t)\, dt.$$

Thus, if $\varphi(x) = \delta(x - x')$, then $\tilde\varphi(t) = (1/\sqrt{2\pi})\, e^{-ix't}$.

8.3 Problems

8.1. Consider the function $f(\theta) = \sum_{n=-\infty}^{\infty} \delta(\theta - 2n\pi)$.
(a) Show that $f$ is periodic of period $2\pi$.
(b) What is the Fourier series expansion for $f(\theta)$?

8.2. Break the sum $\sum_{n=-N}^{N} e^{in(\theta - \theta')}$ into $\sum_{n=-N}^{-1} + 1 + \sum_{n=1}^{N}$. Use the geometric sum formula

$$\sum_{n=0}^{N} a r^n = a\, \frac{r^{N+1} - 1}{r - 1}$$

to obtain

$$\sum_{n=1}^{N} e^{in(\theta - \theta')} = e^{i(\theta - \theta')}\, \frac{e^{iN(\theta - \theta')} - 1}{e^{i(\theta - \theta')} - 1} = e^{\frac{i}{2}(N+1)(\theta - \theta')}\, \frac{\sin[\frac{1}{2}N(\theta - \theta')]}{\sin[\frac{1}{2}(\theta - \theta')]}.$$

By changing $n$ to $-n$, or equivalently $(\theta - \theta')$ to $-(\theta - \theta')$, find a similar sum from $-N$ to $-1$. Now put everything together and use the trigonometric identity $2\cos\alpha\sin\beta = \sin(\alpha + \beta) - \sin(\alpha - \beta)$ to show that

$$\sum_{n=-N}^{N} e^{in(\theta - \theta')} = \frac{\sin[(N + \frac{1}{2})(\theta - \theta')]}{\sin[\frac{1}{2}(\theta - \theta')]}.$$
8.3. Find the Fourier series expansion of the periodic function defined on its fundamental cell as

$$f(\theta) = \begin{cases} -\tfrac{1}{2}(\pi + \theta) & \text{if } -\pi \leq \theta < 0,\\ \tfrac{1}{2}(\pi - \theta) & \text{if } 0 < \theta \leq \pi.\end{cases}$$

8.4. Show that $A_n$ and $B_n$ in Equation (8.2) are real when $f(\theta)$ is real.

8.5. Find the Fourier series expansion of the periodic function $f(\theta)$ defined on its fundamental cell, $(-\pi, \pi)$, as $f(\theta) = \cos a\theta$,
(a) when $a$ is an integer;
(b) when $a$ is not an integer.

8.6. Find the Fourier series expansion of the periodic function defined on its fundamental cell, $(-\pi, \pi)$, as $f(\theta) = \theta$.

8.7. Consider the periodic function that is defined on its fundamental cell, $(-a, a)$, as $f(x) = |x|$.
(a) Find its Fourier series expansion.
(b) Evaluate both sides of the expansion at $x = 0$, and show that

$$\frac{\pi^2}{8} = \sum_{k=0}^{\infty} \frac{1}{(2k+1)^2}.$$

(c) Show that the infinite series gives the same result as the function when both are evaluated at $x = a$.

8.8. Let $f(x) = x$ be a periodic function defined over the interval $(0, 2a)$. Find the Fourier series expansion of $f$.

8.9. Show that the piecewise parabolic "approximation" to $a^2\sin(\pi x/a)$ in the interval $(-a, a)$ given by the function

$$f(x) = \begin{cases} 4x(a + x) & \text{if } -a \leq x \leq 0,\\ 4x(a - x) & \text{if } 0 \leq x \leq a\end{cases}$$

has the Fourier series expansion

$$f(x) = \frac{32a^2}{\pi^3} \sum_{n=0}^{\infty} \frac{1}{(2n+1)^3} \sin\frac{(2n+1)\pi x}{a}.$$

Plot $f(x)$, $a^2\sin(\pi x/a)$, and the series expansion (up to 20 terms) for $a = 1$ between $-1$ and $+1$ on the same graph.

8.10. Find the Fourier series expansion of $f(\theta) = \theta^2$ for $|\theta| < \pi$. Then show that

$$\frac{\pi^2}{6} = \sum_{n=1}^{\infty} \frac{1}{n^2} \qquad \text{and} \qquad \frac{\pi^2}{12} = -\sum_{n=1}^{\infty} \frac{(-1)^n}{n^2}.$$
8.11. Find the Fourier series expansion of

$$f(t) = \begin{cases} \sin\omega t & \text{if } 0 \leq t \leq \pi/\omega,\\ 0 & \text{if } -\pi/\omega \leq t \leq 0.\end{cases}$$

8.12. What is the Fourier transform of (a) the constant function $f(x) = C$, and (b) the Dirac delta function $\delta(x)$?

8.13. Show that (a) if $g(x)$ is real, then $\tilde g^*(k) = \tilde g(-k)$, and (b) if $g(x)$ is even (odd), then $\tilde g(k)$ is also even (odd).

8.14. Let $g_c(x)$ stand for the single function that is nonzero only on a subinterval of the fundamental cell $(a, a + L)$. Define the function $g(x)$ as

$$g(x) = \sum_{j=-\infty}^{\infty} g_c(x - jL).$$

(a) Show that $g(x)$ is periodic with period $L$.
(b) Find its Fourier transform $\tilde g(k)$, and verify that

$$\tilde g(k) = 2\pi\, \tilde g_c(k) \sum_{m=-\infty}^{\infty} \delta(kL - 2m\pi).$$

(c) Find the (inverse) transform of $\tilde g(k)$, and show that it is the Fourier series of $g_c(x)$.

8.15. Evaluate the Fourier transform of

$$g(x) = \begin{cases} b - b|x|/a & \text{if } |x| < a,\\ 0 & \text{if } |x| > a.\end{cases}$$

8.16. Let $f(\theta)$ be a periodic function given by $f(\theta) = \sum_{n=-\infty}^{\infty} a_n e^{in\theta}$. Find its Fourier transform $\tilde f(t)$.

8.17. Let

$$f(t) = \begin{cases} \sin\omega_0 t & \text{if } |t| < T,\\ 0 & \text{if } |t| > T.\end{cases}$$

Show that

$$\tilde f(\omega) = \frac{1}{i\sqrt{2\pi}} \left\{\frac{\sin[(\omega - \omega_0)T]}{\omega - \omega_0} - \frac{\sin[(\omega + \omega_0)T]}{\omega + \omega_0}\right\}.$$

Verify the uncertainty relation $\Delta\omega\, \Delta t \approx 4\pi$.
8.18. If $f(x) = g(x + a)$, show that $\tilde f(k) = e^{iak}\, \tilde g(k)$.

8.19. For $a > 0$ find the Fourier transform of $f(x) = e^{-a|x|}$. Is $\tilde f(k)$ symmetric? Is it real? Verify the uncertainty relations.

8.20. The displacement of a damped harmonic oscillator is given by

$$f(t) = \begin{cases} A e^{-\alpha t} e^{i\omega_0 t} & \text{if } t > 0,\\ 0 & \text{if } t < 0.\end{cases}$$

Find $\tilde f(\omega)$ and show that the frequency distribution $|\tilde f(\omega)|^2$ is given by

$$|\tilde f(\omega)|^2 = \frac{A^2}{2\pi}\, \frac{1}{(\omega - \omega_0)^2 + \alpha^2}.$$

8.21. Prove the convolution theorem:

$$\int_{-\infty}^{\infty} f(x)\, g(y - x)\, dx = \int_{-\infty}^{\infty} \tilde f(k)\, \tilde g(k)\, e^{iky}\, dk.$$

What will this give when $y = 0$?

8.22. Prove Parseval's relation for Fourier transforms:

$$\int_{-\infty}^{\infty} f(x)\, g^*(x)\, dx = \int_{-\infty}^{\infty} \tilde f(k)\, \tilde g^*(k)\, dk.$$

In particular, the norm of a function (with weight function equal to 1) is invariant under Fourier transform.

8.23. Use the completeness relation $\mathbf 1 = \sum_n |n\rangle\langle n|$ and sandwich it between $|x\rangle$ and $\langle x'|$ to find an expression for the Dirac delta function in terms of an infinite series of orthonormal functions.

8.24. Use a Fourier transform in three dimensions to find a solution of the Poisson equation: $\nabla^2\Phi(\mathbf r) = -4\pi\rho(\mathbf r)$.

8.25. For $\varphi(x) = \delta(x - x')$, find $\tilde\varphi(y)$.

8.26. Show that $\tilde{\tilde f}(t) = f(-t)$.

8.27. The Fourier transform of a distribution $\varphi$ is given by

$$\tilde\varphi(t) = \sum_{n=0}^{\infty} \frac{1}{n!}\, \delta'(t - n).$$

What is $\varphi(x)$? Hint: Use $\tilde{\tilde\varphi}(x) = \varphi(-x)$.

8.28. For $f(x) = \sum_{k=0}^{n} a_k x^k$, show that

$$\tilde f(u) = \sqrt{2\pi} \sum_{k=0}^{n} i^k a_k\, \delta^{(k)}(u),$$

where $\delta^{(k)}$ denotes the $k$th derivative of the delta function.
Additional Reading

1. Courant, R. and Hilbert, D. Methods of Mathematical Physics, vol. 1, Interscience, 1962. The classic book by two masters. This is a very readable book written specifically for physicists. Its treatment of Fourier series and transforms is very clear.
2. DeVries, P. A First Course in Computational Physics, Wiley, 1994. A good discussion of the fast Fourier transform including some illustrative computer programs.
3. Reed, M., and Simon, B. Fourier Analysis, Self-Adjointness, Academic Press, 1980. Second volume of a four-volume series. A comprehensive exposition of Fourier analysis with emphasis on operator theory.
4. Richtmyer, R. Principles of Advanced Mathematical Physics, Springer-Verlag, 1978. A two-volume book on mathematical physics written in a formal style, but very useful due to its comprehensiveness and the large number of examples drawn from physics. Chapter 4 discusses Fourier analysis and distributions.
Part III

Complex Analysis
9

Complex Calculus

Complex analysis, just like real analysis, deals with questions of continuity, convergence of series, differentiation, integration, and so forth. The reader is assumed to have been exposed to the algebra of complex numbers.

9.1 Complex Functions

A complex function is a map $f : \mathbb{C} \to \mathbb{C}$, and we write $f(z) = w$, where both $z$ and $w$ are complex numbers.¹ The map $f$ can be geometrically thought of as a correspondence between two complex planes, the $z$-plane and the $w$-plane. The $w$-plane has a real axis and an imaginary axis, which we can call $u$ and $v$, respectively. Both $u$ and $v$ are real functions of the coordinates of $z$, i.e., $x$ and $y$. Therefore, we may write

$$f(z) = u(x, y) + iv(x, y). \qquad (9.1)$$

This equation gives a unique point $(u, v)$ in the $w$-plane for each point $(x, y)$ in the $z$-plane (see Figure 9.1). Under $f$, regions of the $z$-plane are mapped onto regions of the $w$-plane. For instance, a curve in the $z$-plane may be mapped into a curve in the $w$-plane. The following example illustrates this point.

9.1.1. Example. Let us investigate the behavior of a couple of elementary complex functions. In particular, we shall look at the way a line $y = mx$ in the $z$-plane is mapped into curves in the $w$-plane.

¹Strictly speaking, we should write $f : S \to \mathbb{C}$, where $S$ is a subset of the complex plane. The reason is that most functions are not defined for the entire set of complex numbers, so that the domain of such functions is not necessarily $\mathbb{C}$. We shall specify the domain only when it is absolutely necessary. Otherwise, we use the generic notation $f : \mathbb{C} \to \mathbb{C}$, even though $f$ is defined only on a subset of $\mathbb{C}$.
Figure 9.1 A map from the $z$-plane to the $w$-plane.

(a) For $w = f(z) = z^2$, we have $w = (x + iy)^2 = x^2 - y^2 + 2ixy$, with $u(x, y) = x^2 - y^2$ and $v(x, y) = 2xy$. For $y = mx$, i.e., for a line in the $z$-plane with slope $m$, these equations yield $u = (1 - m^2)x^2$ and $v = 2mx^2$. Eliminating $x$ in these equations, we find $v = [2m/(1 - m^2)]u$. This is a line passing through the origin of the $w$-plane [see Figure 9.2(a)]. Note that the angle the image line makes with the real axis of the $w$-plane is twice the angle the original line makes with the $x$-axis. (Show this!)

(b) The function $w = f(z) = e^z = e^{x+iy}$ gives $u(x, y) = e^x\cos y$ and $v(x, y) = e^x\sin y$. Substituting $y = mx$, we obtain $u = e^x\cos mx$ and $v = e^x\sin mx$. Unlike part (a), we cannot eliminate $x$ to find $v$ as an explicit function of $u$. Nevertheless, the last pair of equations are parametric equations of a curve, which we can plot in a $uv$-plane, as shown in Figure 9.2(b).

Limits of complex functions are defined in terms of absolute values. Thus, $\lim_{z\to a} f(z) = w_0$ means that given any real number $\epsilon > 0$, we can find a corresponding real number $\delta > 0$ such that $|f(z) - w_0| < \epsilon$ whenever $|z - a| < \delta$. Similarly, we say that a function $f$ is continuous at $z = a$ if $\lim_{z\to a} f(z) = f(a)$.

9.2 Analytic Functions

The derivative of a complex function is defined as usual:

9.2.1. Definition. Let $f : \mathbb{C} \to \mathbb{C}$ be a complex function. The derivative of $f$ at $z_0$ is

$$\left.\frac{df}{dz}\right|_{z_0} = \lim_{\Delta z\to 0} \frac{f(z_0 + \Delta z) - f(z_0)}{\Delta z},$$

provided that the limit exists and is independent of $\Delta z$.
Figure 9.2 (a) The map $z^2$ takes a line with slope angle $\alpha$ and maps it to a line with twice the angle in the $w$-plane. (b) The map $e^z$ takes the same line and maps it to a spiral in the $w$-plane.

In this definition, "independent of $\Delta z$" means independent of $\Delta x$ and $\Delta y$ (the components of $\Delta z$) and, therefore, independent of the direction of approach to $z_0$. The restrictions of this definition apply to the real case as well. For instance, the derivative of $f(x) = |x|$ at $x = 0$ does not exist² because it approaches $+1$ from the right and $-1$ from the left.

It can easily be shown that all the formal rules of differentiation that apply to the real case also apply to the complex case. For example, if $f$ and $g$ are differentiable, then $f \pm g$, $fg$, and (as long as $g$ is not zero) $f/g$ are also differentiable, and their derivatives are given by the usual rules of differentiation.

9.2.2. Example. Let us examine the derivative of $f(z) = x^2 + 2iy^2$ at $z = 1 + i$:

$$\left.\frac{df}{dz}\right|_{z=1+i} = \lim_{\Delta z\to 0} \frac{f(1 + i + \Delta z) - f(1 + i)}{\Delta z} = \lim_{\substack{\Delta x\to 0\\ \Delta y\to 0}} \frac{(1 + \Delta x)^2 + 2i(1 + \Delta y)^2 - 1 - 2i}{\Delta x + i\Delta y} = \lim_{\substack{\Delta x\to 0\\ \Delta y\to 0}} \frac{2\Delta x + 4i\Delta y + (\Delta x)^2 + 2i(\Delta y)^2}{\Delta x + i\Delta y}.$$

Let us approach $z = 1 + i$ along the line $y - 1 = m(x - 1)$. Then $\Delta y = m\Delta x$, and the limit yields

$$\left.\frac{df}{dz}\right|_{z=1+i} = \lim_{\Delta x\to 0} \frac{2\Delta x + 4im\Delta x + (\Delta x)^2 + 2im^2(\Delta x)^2}{\Delta x + im\Delta x} = \frac{2 + 4im}{1 + im}.$$

²One can rephrase this and say that the derivative exists, but not in terms of ordinary functions; rather, in terms of generalized functions, in this case $\theta(x)$, discussed in Chapter 6.
It follows that we get infinitely many values for the derivative depending on the value we assign to $m$, i.e., depending on the direction along which we approach $1 + i$. Thus, the derivative does not exist at $z = 1 + i$.

It is clear from the definition that differentiability puts a severe restriction on $f(z)$, because it requires the limit to be the same for all paths going through $z_0$. Furthermore, differentiability is a local property: to test whether or not a function $f(z)$ is differentiable at $z_0$, we move away from $z_0$ by a small amount $\Delta z$ and check the existence of the limit in Definition 9.2.1.

What are the conditions under which a complex function is differentiable? For $f(z) = u(x, y) + iv(x, y)$, Definition 9.2.1 yields

$$\left.\frac{df}{dz}\right|_{z_0} = \lim_{\substack{\Delta x\to 0\\ \Delta y\to 0}} \left\{\frac{u(x_0 + \Delta x, y_0 + \Delta y) - u(x_0, y_0)}{\Delta x + i\Delta y} + i\,\frac{v(x_0 + \Delta x, y_0 + \Delta y) - v(x_0, y_0)}{\Delta x + i\Delta y}\right\}.$$

If this limit is to exist for all paths, it must exist for the two particular paths on which $\Delta y = 0$ (parallel to the $x$-axis) and $\Delta x = 0$ (parallel to the $y$-axis). For the first path we get

$$\left.\frac{df}{dz}\right|_{z_0} = \lim_{\Delta x\to 0} \frac{u(x_0 + \Delta x, y_0) - u(x_0, y_0)}{\Delta x} + i\lim_{\Delta x\to 0} \frac{v(x_0 + \Delta x, y_0) - v(x_0, y_0)}{\Delta x} = \left.\frac{\partial u}{\partial x}\right|_{(x_0, y_0)} + i\left.\frac{\partial v}{\partial x}\right|_{(x_0, y_0)}.$$

For the second path ($\Delta x = 0$), we obtain

$$\left.\frac{df}{dz}\right|_{z_0} = \lim_{\Delta y\to 0} \frac{u(x_0, y_0 + \Delta y) - u(x_0, y_0)}{i\Delta y} + i\lim_{\Delta y\to 0} \frac{v(x_0, y_0 + \Delta y) - v(x_0, y_0)}{i\Delta y} = -i\left.\frac{\partial u}{\partial y}\right|_{(x_0, y_0)} + \left.\frac{\partial v}{\partial y}\right|_{(x_0, y_0)}.$$

If $f$ is to be differentiable at $z_0$, the derivatives along the two paths must be equal. Equating the real and imaginary parts of both sides of this equation and ignoring the subscript $z_0$ ($x_0$, $y_0$, or $z_0$ is arbitrary), we obtain

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \qquad \text{and} \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \qquad (9.2)$$

These two conditions, which are necessary for the differentiability of $f$, are called the Cauchy-Riemann
conditions. An alternative way of writing the Cauchy-Riemann (C-R) conditions is obtained by making the substitution³ $x = \frac{1}{2}(z + z^*)$ and $y = \frac{1}{2i}(z - z^*)$ in $u(x, y)$

³We use $z^*$ to indicate the complex conjugate of $z$. Occasionally we may use $\bar z$.
and $v(x, y)$, using the chain rule to write Equation (9.2) in terms of $z$ and $z^*$, substituting the results in

$$\frac{\partial f}{\partial z^*} = \frac{\partial u}{\partial z^*} + i\frac{\partial v}{\partial z^*},$$

and showing that Equation (9.2) is equivalent to the single equation $\partial f/\partial z^* = 0$. This equation says that

9.2.3. Box. If $f$ is to be differentiable, it must be independent of $z^*$.

If the derivative of $f$ exists, the arguments leading to Equation (9.2) imply that the derivative can be expressed as

$$\frac{df}{dz} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} - i\frac{\partial u}{\partial y}. \qquad (9.3)$$

The C-R conditions assure us that these two equations are equivalent. The following example illustrates the differentiability of complex functions.

9.2.4. Example. Let us determine whether or not the following functions are differentiable:

(a) We have already established that $f(z) = x^2 + 2iy^2$ is not differentiable at $z = 1 + i$. We can now show that it has no derivative at any point of the complex plane (except at points where $2x = 4y$). This is easily seen by noting that $u = x^2$ and $v = 2y^2$, and that in general $\partial u/\partial x = 2x \neq \partial v/\partial y = 4y$, so the first Cauchy-Riemann condition is not satisfied. The second C-R condition is satisfied, but that is not enough. We can also write $f(z)$ in terms of $z$ and $z^*$:

$$f(z) = \left[\tfrac{1}{2}(z + z^*)\right]^2 + 2i\left[\tfrac{1}{2i}(z - z^*)\right]^2 = \tfrac{1}{4}(1 - 2i)(z^2 + z^{*2}) + \tfrac{1}{2}(1 + 2i)zz^*.$$

$f(z)$ has an explicit dependence on $z^*$. Therefore, it is not differentiable.

(b) Now consider $f(z) = x^2 - y^2 + 2ixy$, for which $u = x^2 - y^2$ and $v = 2xy$. The C-R conditions become $\partial u/\partial x = 2x = \partial v/\partial y$ and $\partial u/\partial y = -2y = -\partial v/\partial x$. Thus, $f(z)$ may be differentiable. Recall that the C-R conditions are only necessary conditions; we have not shown (but we will, shortly) that they are also sufficient. To check the dependence of $f$ on $z^*$, substitute $x = (z + z^*)/2$ and $y = (z - z^*)/(2i)$ in $u$ and $v$ to show that $f(z) = z^2$, and thus there is no $z^*$ dependence.

(c) Let $u(x, y) = e^x\cos y$ and $v(x, y) = e^x\sin y$. Then $\partial u/\partial x = e^x\cos y = \partial v/\partial y$ and $\partial u/\partial y = -e^x\sin y = -\partial v/\partial x$, and the C-R conditions are satisfied.
Also, f(z) = eˣ cos y + ieˣ sin y = eˣ(cos y + i sin y) = eˣe^{iy} = e^{x+iy} = e^z, and there is no z* dependence.

The requirement of differentiability is very restrictive: The derivative must exist along infinitely many paths. On the other hand, the C-R conditions seem deceptively mild: They are derived for only two paths. Nevertheless, the two paths are, in fact, true representatives of all paths; that is, the C-R conditions are not only necessary, but also sufficient:
9.2.5. Theorem. The function f(z) = u(x, y) + iv(x, y) is differentiable in a region of the complex plane if and only if the Cauchy-Riemann conditions,

∂u/∂x = ∂v/∂y and ∂u/∂y = −∂v/∂x

(or, equivalently, ∂f/∂z* = 0), are satisfied and all first partial derivatives of u and v are continuous in that region. In that case

df/dz = ∂u/∂x + i ∂v/∂x = ∂v/∂y − i ∂u/∂y.

Proof. We have already shown the "only if" part. To show the "if" part, note that if the derivative exists at all, it must equal (9.3). Thus, we have to show that

lim_{Δz→0} [f(z + Δz) − f(z)]/Δz = ∂u/∂x + i ∂v/∂x,

or, equivalently, that

|[f(z + Δz) − f(z)]/Δz − (∂u/∂x + i ∂v/∂x)| < ε whenever |Δz| < δ.

By definition,

f(z + Δz) − f(z) = u(x + Δx, y + Δy) + iv(x + Δx, y + Δy) − u(x, y) − iv(x, y).

Since u and v have continuous first partial derivatives, we can write

u(x + Δx, y + Δy) = u(x, y) + (∂u/∂x)Δx + (∂u/∂y)Δy + ε₁Δx + δ₁Δy,
v(x + Δx, y + Δy) = v(x, y) + (∂v/∂x)Δx + (∂v/∂y)Δy + ε₂Δx + δ₂Δy,

where ε₁, ε₂, δ₁, and δ₂ are real numbers that approach zero as Δx and Δy approach zero. Using these expressions, we can write

f(z + Δz) − f(z) = (∂u/∂x + i ∂v/∂x)Δx + i(−i ∂u/∂y + ∂v/∂y)Δy + (ε₁ + iε₂)Δx + (δ₁ + iδ₂)Δy
= (∂u/∂x + i ∂v/∂x)(Δx + iΔy) + εΔx + δΔy,

where ε ≡ ε₁ + iε₂, δ ≡ δ₁ + iδ₂, and we used the C-R conditions in the last step. Dividing both sides by Δz = Δx + iΔy, we get

[f(z + Δz) − f(z)]/Δz − (∂u/∂x + i ∂v/∂x) = ε Δx/Δz + δ Δy/Δz.
By the triangle inequality, |RHS| ≤ |ε₁ + iε₂| + |δ₁ + iδ₂|. This follows from the fact that |Δx|/|Δz| and |Δy|/|Δz| are each at most 1. The ε and δ terms can be made as small as desired by making Δz small enough. We have thus established that when the C-R conditions hold, the function f is differentiable. □

Augustin-Louis Cauchy (1789–1857) was one of the most influential French mathematicians of the nineteenth century. He began his career as a military engineer, but when his health broke down in 1813 he followed his natural inclination and devoted himself wholly to mathematics.

In mathematical productivity Cauchy was surpassed only by Euler, and his collected works fill 27 fat volumes. He made substantial contributions to number theory and determinants; is considered to be the originator of the theory of finite groups; and did extensive work in astronomy, mechanics, optics, and the theory of elasticity.

His greatest achievements, however, lay in the field of analysis. Together with his contemporaries Gauss and Abel, he was a pioneer in the rigorous treatment of limits, continuous functions, derivatives, integrals, and infinite series. Several of the basic tests for the convergence of series are associated with his name. He also provided the first existence proof for solutions of differential equations, gave the first proof of the convergence of a Taylor series, and was the first to feel the need for a careful study of the convergence behavior of Fourier series (see Chapter 8). However, his most important work was in the theory of functions of a complex variable, which in essence he created and which has continued to be one of the dominant branches of both pure and applied mathematics. In this field, Cauchy's integral theorem and Cauchy's integral formula are fundamental tools without which modern analysis could hardly exist (see Chapter 9).
Unfortunately, his personality did not harmonize with the fruitful power of his mind. He was an arrogant royalist in politics and a self-righteous, preaching, pious believer in religion, all this in an age of republican skepticism, and most of his fellow scientists disliked him and considered him a smug hypocrite. It might be fairer to put first things first and describe him as a great mathematician who happened also to be a sincere but narrow-minded bigot.

9.2.6. Definition. A function f : ℂ → ℂ is called analytic at z₀ if it is differentiable at z₀ and at all other points in some neighborhood of z₀. A point at which f is analytic is called a regular point of f. A point at which f is not analytic is called a singular point or a singularity of f. A function for which all points in ℂ are regular is called an entire function.

9.2.7. Example. DERIVATIVES OF SOME FUNCTIONS
(a) f(z) = z. Here u = x and v = y; the C-R conditions are easily shown to hold, and for any z, we have df/dz = ∂u/∂x + i ∂v/∂x = 1. Therefore, the derivative exists at all points of the complex
plane.
(b) f(z) = z². Here u = x² − y² and v = 2xy; the C-R conditions hold, and for all points z of the complex plane, we have df/dz = ∂u/∂x + i ∂v/∂x = 2x + 2iy = 2z. Therefore, f(z) is differentiable at all points.
(c) f(z) = zⁿ for n ≥ 1. We can use mathematical induction and the fact that the product of two entire functions is an entire function to show that d/dz(zⁿ) = nz^{n−1}.
(d) f(z) = a₀ + a₁z + ⋯ + a_{n−1}z^{n−1} + a_n zⁿ, where the aᵢ are arbitrary constants. That f(z) is entire follows directly from part (c) and the fact that the sum of two entire functions is entire.
(e) f(z) = 1/z. The derivative can be found to be f′(z) = −1/z², which does not exist for z = 0. Thus, z = 0 is a singularity of f(z). However, any other point is a regular point of f.
(f) f(z) = |z|². Using the definition of the derivative, we obtain

Δf/Δz = (|z + Δz|² − |z|²)/Δz = [(z + Δz)(z* + Δz*) − zz*]/Δz = z* + Δz* + z Δz*/Δz.

For z = 0, Δf/Δz = Δz*, which goes to zero as Δz → 0. Therefore, df/dz = 0 at z = 0.⁴ However, if z ≠ 0, the limit of Δf/Δz will depend on how z is approached. Thus, df/dz does not exist if z ≠ 0. This shows that |z|² is differentiable only at z = 0 and nowhere else in its neighborhood. It also shows that even if the real (here, u = x² + y²) and imaginary (here, v = 0) parts of a complex function have continuous partial derivatives of all orders at a point, the function may not be differentiable there.
(g) f(z) = 1/sin z. This gives df/dz = −cos z/sin² z. Thus, f has infinitely many (isolated) singular points at z = ±nπ for n = 0, 1, 2, ....

9.2.8. Example. THE COMPLEX EXPONENTIAL FUNCTION
In this example, we find the (unique) function f : ℂ → ℂ that has the following three properties: (a) f is single-valued and analytic for all z, (b) df/dz = f(z), and (c) f(z₁ + z₂) = f(z₁)f(z₂).

Property (b) shows that if f(z) is well behaved, then df/dz is also well behaved.
In particular, if f(z) is defined for all values of z, then f must be entire. For z₁ = 0 = z₂, property (c) yields f(0) = [f(0)]² ⟹ f(0) = 1 or f(0) = 0. On the other hand,

df/dz = lim_{Δz→0} [f(z + Δz) − f(z)]/Δz = lim_{Δz→0} [f(z)f(Δz) − f(z)]/Δz = f(z) lim_{Δz→0} [f(Δz) − 1]/Δz.

Property (b) now implies that

lim_{Δz→0} [f(Δz) − 1]/Δz = 1 ⟹ f′(0) = 1 and f(0) = 1.

⁴Although the derivative of |z|² exists at z = 0, it is not analytic there (or anywhere else). To be analytic at a point, a function must have derivatives at all points in some neighborhood of the given point.
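The direction dependence noted in Example 9.2.7(f) for |z|² is easy to see numerically. In the sketch below (not from the text; the helper name is mine), the difference quotient Δf/Δz is evaluated along different approach directions e^{iθ}:

```python
import cmath

def diff_quotient(f, z, theta, h=1e-8):
    """Difference quotient of f at z along the direction exp(i*theta)."""
    dz = h * cmath.exp(1j * theta)
    return (f(z + dz) - f(z)) / dz

f = lambda z: abs(z) ** 2

# At z = 0 the quotient tends to 0 along every direction: df/dz = 0 there.
q0 = max(abs(diff_quotient(f, 0, th)) for th in (0.0, 0.7, 2.1))
print(q0)   # of order h, i.e. tiny

# At z = 1 the limit depends on the direction, so f'(1) does not exist.
q_real = diff_quotient(f, 1, 0.0)            # along the real axis: -> 2
q_imag = diff_quotient(f, 1, cmath.pi / 2)   # along the imaginary axis: -> 0
print(q_real, q_imag)
```

This matches the formula Δf/Δz = z* + Δz* + z Δz*/Δz of the example: along e^{iθ} the limit is z* + z e^{−2iθ}, which is direction-independent only at z = 0.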
The first implication follows from the definition of the derivative, and the second from the fact that the only other choice, namely f(0) = 0, would yield −∞ for the limit. Now, we write f(z) = u(x, y) + iv(x, y), for which property (b) becomes

∂u/∂x + i ∂v/∂x = u + iv ⟹ ∂u/∂x = u, ∂v/∂x = v.

These equations have the most general solution u(x, y) = a(y)eˣ and v(x, y) = b(y)eˣ, where a(y) and b(y) are the "constants" of integration. The Cauchy-Riemann conditions now yield a(y) = db/dy and da/dy = −b(y), whose most general solution is a(y) = A cos y + B sin y, b(y) = A sin y − B cos y. On the other hand, f(0) = 1 yields u(0, 0) = 1 and v(0, 0) = 0, implying that a(0) = 1, b(0) = 0, or A = 1, B = 0. We therefore conclude that

f(z) = a(y)eˣ + ib(y)eˣ = eˣ(cos y + i sin y) = eˣe^{iy} = e^z.

Both eˣ and e^{iy} are well defined in the entire complex plane. Hence, e^z is defined and differentiable over all of ℂ; therefore, it is entire.

Example 9.2.7 shows that any polynomial in z is entire. Example 9.2.8 shows that the exponential function e^z is also entire. Therefore, any product and/or sum of polynomials and e^z will also be entire. We can build other entire functions. For instance, e^{iz} and e^{−iz} are entire functions; therefore, the trigonometric functions, defined as

sin z = (e^{iz} − e^{−iz})/(2i) and cos z = (e^{iz} + e^{−iz})/2,   (9.4)

are also entire functions. Problem 9.5 shows that sin z and cos z have only real zeros. The hyperbolic functions can be defined similarly:

sinh z = (e^z − e^{−z})/2 and cosh z = (e^z + e^{−z})/2.   (9.5)

Although the sum and the product of entire functions are entire, the ratio, in general, is not. For instance, if f(z) and g(z) are polynomials of degrees m and n, respectively, then for n > 0, the ratio f(z)/g(z) is not entire, because at the zeros of g(z), which always exist and which we assume are not zeros of f(z), the derivative is not defined.

The functions u(x, y) and v(x, y) of an analytic function have an interesting property that the following example investigates.

9.2.9. Example.
The family of curves u(x, y) = constant is perpendicular to the family of curves v(x, y) = constant at each point of the complex plane where f(z) = u + iv is analytic. This can easily be seen by looking at the normals to the curves. The normal to the curve u(x, y) = constant is simply ∇u = (∂u/∂x, ∂u/∂y). Similarly, the normal to the curve v(x, y) = constant is ∇v = (∂v/∂x, ∂v/∂y). Taking the dot product of these two normals, we obtain

(∇u)·(∇v) = (∂u/∂x)(∂v/∂x) + (∂u/∂y)(∂v/∂y) = (∂u/∂x)(−∂u/∂y) + (∂u/∂y)(∂u/∂x) = 0

by the C-R conditions.
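The orthogonality just derived can be confirmed numerically. The sketch below (helper names are mine, not the text's) takes f(z) = e^z, whose real and imaginary parts were found in Example 9.2.4(c), and checks that ∇u·∇v vanishes at several points:

```python
import cmath, math

def grad(g, x, y, h=1e-6):
    """Central-difference gradient of a scalar function g(x, y)."""
    return ((g(x + h, y) - g(x - h, y)) / (2 * h),
            (g(x, y + h) - g(x, y - h)) / (2 * h))

u = lambda x, y: math.exp(x) * math.cos(y)   # Re e^z
v = lambda x, y: math.exp(x) * math.sin(y)   # Im e^z

# Sanity check: u + iv really is the complex exponential.
err = abs((u(1.0, 2.0) + 1j * v(1.0, 2.0)) - cmath.exp(1 + 2j))
print(err)

dots = []
for (x, y) in [(0.0, 0.0), (1.0, 2.0), (-0.5, 0.3)]:
    ux, uy = grad(u, x, y)
    vx, vy = grad(v, x, y)
    dots.append(ux * vx + uy * vy)
print(dots)   # all numerically ~0: the two families of curves are orthogonal
```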
9.3 Conformal Maps

The real and imaginary parts of an analytic function separately satisfy the two-dimensional Laplace's equation:

∂²u/∂x² + ∂²u/∂y² = 0, ∂²v/∂x² + ∂²v/∂y² = 0.   (9.6)

This can easily be verified from the C-R conditions. Laplace's equation in three dimensions,

∂²Φ/∂x² + ∂²Φ/∂y² + ∂²Φ/∂z² = 0,

describes the electrostatic potential Φ in a charge-free region of space. In a typical electrostatic problem the potential Φ is given at certain boundaries (usually conducting surfaces), and its value at every point in space is sought. There are numerous techniques for solving such problems, and some of them will be discussed later in the book. However, some of these problems have a certain degree of symmetry that reduces them to two-dimensional problems. In such cases, the theory of analytic functions can be extremely helpful.

The symmetry mentioned above is cylindrical symmetry, where the potential is known a priori to be independent of the z-coordinate (the axis of symmetry). This situation occurs when conductors are cylinders and, if there are charge distributions in certain regions of space, the densities are z-independent. In such cases, ∂Φ/∂z = 0, and the problem reduces to a two-dimensional one.

Functions satisfying Laplace's equation are called harmonic functions. Thus, the electrostatic potential is a three-dimensional harmonic function, and the potential for a cylindrically symmetric charge distribution and boundary condition is a two-dimensional harmonic function. Since the real and the imaginary parts of a complex analytic function are also harmonic, techniques of complex analysis are sometimes useful in solving electrostatic problems with cylindrical symmetry.⁵

To illustrate the connection between electrostatics and complex analysis, consider a long straight filament with a constant linear charge density λ.
It is shown in introductory electromagnetism that the potential Φ (disregarding the arbitrary constant that determines the reference potential) is given, in cylindrical coordinates, by

Φ = 2λ ln ρ = 2λ ln[(x² + y²)^{1/2}] = 2λ ln |z|.

Since Φ satisfies Laplace's equation, we conclude that Φ could be the real part of an analytic function w(z), which we call the complex potential. Example 9.2.9, plus the fact that the curves u = Φ = constant are circles, imply that the constant-v curves are rays, i.e., v ∝ φ. Choosing the constant of proportionality as 2λ, we obtain

w(z) = 2λ ln ρ + 2iλφ = 2λ ln(ρe^{iφ}) = 2λ ln z.

⁵We use electrostatics because it is more familiar to physics students. Engineering students are familiar with steady-state heat transfer as well, which also involves Laplace's equation and therefore is amenable to this technique.
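The complex potential of a filament lends itself to a quick numerical sketch (Gaussian units, as in the text; the helper name and sample charges below are mine). Superposing displaced copies of w(z) = 2λ ln z anticipates the multi-filament formula developed next:

```python
import cmath

def complex_potential(z, charges):
    """w(z) = 2 * sum_k lambda_k * ln(z - z_k) for line charges lambda_k at z_k."""
    return 2 * sum(lam * cmath.log(z - zk) for lam, zk in charges)

# One filament at the origin: Phi = Re w is constant on a circle rho = |z|.
w1 = complex_potential(2 + 0j, [(1.0, 0j)])
w2 = complex_potential(2j, [(1.0, 0j)])
print(w1.real, w2.real)   # equal: both points lie on the circle rho = 2

# Two opposite filaments at +/-1: Phi vanishes on the perpendicular bisector.
w = complex_potential(0.5j, [(1.0, 1 + 0j), (-1.0, -1 + 0j)])
print(w.real)
```

The imaginary part w.imag is the flux function; its level curves are the field lines, orthogonal to the equipotentials by Example 9.2.9.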
It is useful to know the complex potential of more than one filament of charge. To find such a potential we must first find w(z) for a line charge when it is displaced from the origin. If the line is located at z₀ = x₀ + iy₀, then it is easy to show that w(z) = 2λ ln(z − z₀). If there are n line charges located at z₁, z₂, ..., z_n, then

w(z) = 2 Σ_{k=1}^{n} λ_k ln(z − z_k).   (9.7)

The function w(z) can be used directly to solve a number of electrostatic problems involving simple charge distributions and conductor arrangements. Some of these are illustrated in problems at the end of this chapter.

Instead of treating w(z) as a complex potential, let us look at it as a map from the z-plane (or xy-plane) to the w-plane (or uv-plane). In particular, the equipotential curves (circles) are mapped onto lines parallel to the v-axis in the w-plane. This is so because equipotential curves are defined by u = constant. Similarly, the constant-v curves are mapped onto horizontal lines in the w-plane. This is an enormous simplification of the geometry. Straight lines, especially when they are parallel to axes, are by far simpler geometrical objects than circles,⁶ especially if the circles are not centered at the origin.

So let us consider two complex "worlds." One is represented by the xy-plane and denoted by z. The other, the "prime world," is represented⁷ by z′, and its real and imaginary parts by x′ and y′. We start in z, where we need to find a physical quantity such as the electrostatic potential Φ(x, y). If the problem is too complicated in the z-world, we transfer it to the z′-world, in which it may be easily solvable; we solve the problem there (in terms of x′ and y′) and then transfer back to the z-world (x and y). The mapping that relates z and z′ must be cleverly chosen. Otherwise, there is no guarantee that the problem will simplify.

Two conditions are necessary for the above strategy to work.
First, the differential equation describing the physics must not get more complicated with the transfer to z′. Since Laplace's equation is already of the simplest type, the z′-world must also respect Laplace's equation. Second, and more importantly, the mapping must preserve the angles between curves. This is necessary because we want the equipotential curves and the field lines to be perpendicular in both worlds. A mapping that preserves the angle between two curves at a given point is called a conformal mapping. We already have such mappings at our disposal, as the following proposition shows.

9.3.1. Proposition. Let γ₁ and γ₂ be curves in the complex z-plane that intersect at a point z₀ at an angle α. Let f : ℂ → ℂ be a mapping given by f(z) = z′ = x′ + iy′ that is analytic at z₀. Let γ₁′ and γ₂′ be the images of γ₁ and γ₂ under this mapping, which intersect at an angle α′. Then:

⁶This statement is valid only in Cartesian coordinates. But these are precisely the coordinates we are using in this discussion.
⁷We are using z′ instead of w, and (x′, y′) instead of (u, v).
(a) α′ = α, that is, the mapping f is conformal, if (dz′/dz)|_{z₀} ≠ 0.
(b) If f is harmonic in (x, y), it is also harmonic in (x′, y′).

Proof. We sketch the proof of the first part. The details, as well as the proof of the second part, involve partial differentiation and the chain rule and are left for the reader. The angle between the two curves is obtained by taking the inner product of the two unit vectors tangent to the curves at z₀. A small displacement along γᵢ can be written as ê_x Δxᵢ + ê_y Δyᵢ, and the unit vectors as

êᵢ = (ê_x Δxᵢ + ê_y Δyᵢ)/√((Δxᵢ)² + (Δyᵢ)²) for i = 1, 2.

Therefore,

ê₁·ê₂ = (Δx₁Δx₂ + Δy₁Δy₂)/[√((Δx₁)² + (Δy₁)²) √((Δx₂)² + (Δy₂)²)].

Similarly, in the prime plane, we have

ê₁′·ê₂′ = (Δx₁′Δx₂′ + Δy₁′Δy₂′)/[√((Δx₁′)² + (Δy₁′)²) √((Δx₂′)² + (Δy₂′)²)],

where x′ = u(x, y) and y′ = v(x, y), and u and v are the real and imaginary parts of the analytic function f. Using the relations

Δxᵢ′ = (∂u/∂x)Δxᵢ + (∂u/∂y)Δyᵢ, Δyᵢ′ = (∂v/∂x)Δxᵢ + (∂v/∂y)Δyᵢ, i = 1, 2,

and the Cauchy-Riemann conditions, the reader may verify that ê₁′·ê₂′ = ê₁·ê₂. □

The following are some examples of conformal mappings.

(a) z′ = z + a, where a is an arbitrary complex constant. This is simply a translation of the z-plane.
(b) z′ = bz, where b is an arbitrary complex constant. This is a dilation whereby distances are dilated by a factor |b|. A graph in the z-plane is mapped onto a similar (congruent) graph in the z′-plane that will be reduced (|b| < 1) or enlarged (|b| > 1) by a factor of |b|.
(c) z′ = 1/z. This is called an inversion. Example 9.3.2 will show that under such a mapping, circles are mapped onto circles or straight lines.
(d) Combining the preceding three transformations yields the general mapping

z′ = (az + b)/(cz + d),   (9.8)

which is conformal if cz + d ≠ 0 and dz′/dz ≠ 0. The latter condition is equivalent to ad − bc ≠ 0.
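Proposition 9.3.1(a) can be illustrated numerically. In the sketch below (my own construction, not the text's), two straight lines cross z₀ at a known angle; mapping them through the analytic function f(z) = z², whose derivative is nonzero at z₀, leaves the angle between their tangents unchanged:

```python
import cmath

def tangent_angle(curve, t0, h=1e-6):
    """Angle of the tangent vector of a parametrized curve at t0 (central difference)."""
    return cmath.phase(curve(t0 + h) - curve(t0 - h))

z0 = 1 + 1j
g1 = lambda t: z0 + t * cmath.exp(1j * 0.3)   # line through z0 at angle 0.3
g2 = lambda t: z0 + t * cmath.exp(1j * 1.1)   # line through z0 at angle 1.1

f = lambda z: z * z                           # analytic, f'(z0) = 2*z0 != 0

angle_before = tangent_angle(g2, 0.0) - tangent_angle(g1, 0.0)
angle_after = (tangent_angle(lambda t: f(g2(t)), 0.0)
               - tangent_angle(lambda t: f(g1(t)), 0.0))
print(angle_before, angle_after)   # both ~0.8: the angle is preserved
```

Each tangent direction gets rotated by the phase of f′(z₀), so the difference of the two angles, i.e. the angle between the curves, survives the mapping.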
9.3.2. Example. A circle of radius r whose center is at a in the z-plane is described by the equation |z − a| = r. When transforming to the z′-plane under inversion, this equation becomes |1/z′ − a| = r, or |1 − az′| = r|z′|. Squaring both sides and simplifying yields

(r² − |a|²)|z′|² + 2 Re(az′) − 1 = 0.

In terms of Cartesian coordinates, this becomes

(r² − |a|²)(x′² + y′²) + 2(a_r x′ − a_i y′) − 1 = 0,   (9.9)

where a ≡ a_r + ia_i. We now consider two cases:

1. r ≠ |a|: Divide by r² − |a|² and complete the squares to get

(x′ + a_r/(r² − |a|²))² + (y′ − a_i/(r² − |a|²))² − (a_r² + a_i²)/(r² − |a|²)² = 1/(r² − |a|²),

or, defining a_r′ ≡ −a_r/(r² − |a|²), a_i′ ≡ a_i/(r² − |a|²), and r′ ≡ r/|r² − |a|²|, we have (x′ − a_r′)² + (y′ − a_i′)² = r′², which can also be written as

|z′ − a′| = r′, where a′ = a_r′ + ia_i′ = a*/(|a|² − r²).

This is a circle in the z′-plane with center at a′ and radius r′.

2. r = |a|: Then Equation (9.9) reduces to 2(a_r x′ − a_i y′) = 1, which is the equation of a line.

If we use the transformation z′ = 1/(z − c) instead of z′ = 1/z, then |z − a| = r becomes |1/z′ − (a − c)| = r, and all the above analysis will go through exactly as before, except that a is replaced by a − c.

Mappings of the form given in Equation (9.8) are called homographic transformations. A useful property of such transformations is that they can map an infinite region of the z-plane onto a finite region of the z′-plane. In fact, points with very large values of z are mapped onto a neighborhood of the point z′ = a/c. Of course, this argument goes both ways: Equation (9.8) also maps a neighborhood of −d/c in the z-plane onto large regions of the z′-plane. The usefulness of homographic transformations is illustrated in the following example.

9.3.3. Example. Consider two cylindrical conductors of equal radius r, held at potentials u₁ and u₂, respectively, whose centers are D units of length apart.
Choose the x- and y-axes such that the centers of the cylinders are located on the x-axis at distances a₁ and a₂ from the origin, as shown in Figure 9.3. Let us find the electrostatic potential produced by such a configuration in the xy-plane. We know from elementary electrostatics that the problem becomes very simple if the two cylinders are concentric (and, of course, of different radii). Thus, we try to map the two circles onto two concentric circles in the z′-plane such that the infinite region outside the two circles in the z-plane gets mapped onto the finite annular region between the two concentric circles in the z′-plane. We then (easily) find the potential in the z′-plane and transfer it back to the z-plane. The most general mapping that may be able to do the job is that given by Equation (9.8). However, it turns out that we do not have to be this general. In fact, the special case
Figure 9.3. In the z-plane, we see two equal cylinders whose centers are separated.

z′ = 1/(z − c), in which c is a real constant, will be sufficient. So z = (1/z′) + c, and the circles |z − a_k| = r for k = 1, 2 will be mapped onto the circles |z′ − a_k′| = r_k′, where (by Example 9.3.2) a_k′ = (a_k − c)/[(a_k − c)² − r²] and r_k′ = r/|(a_k − c)² − r²|.

Can we arrange the parameters so that the circles in the z′-plane are concentric, i.e., that a₁′ = a₂′? The answer is yes. We set a₁′ = a₂′ and solve for a₂ in terms of a₁. The result is either the trivial solution a₂ = a₁, or a₂ = c − r²/(a₁ − c). If we place the origin of the z-plane at the center of the first cylinder, then a₁ = 0 and a₂ = D = c + r²/c. We can also find a₁′ and a₂′: a₁′ = a₂′ ≡ a′ = −c/(c² − r²), and the geometry of the problem is as shown in Figure 9.4. For such a geometry the potential at a point in the annular region is given by

Φ′ = A ln ρ + B = A ln|z′ − a′| + B,

where A and B are real constants determined by the conditions Φ′(r₁′) = u₁ and Φ′(r₂′) = u₂, which yield

A = (u₁ − u₂)/ln(r₁′/r₂′) and B = u₁ − A ln r₁′.

The potential Φ′ is the real part of the complex function⁸ F(z′) = A ln(z′ − a′) + B, which is analytic except at z′ = a′, a point lying outside the region of interest. We can now go back to the z-plane by substituting z′ = 1/(z − c) to obtain

G(z) = A ln(1/(z − c) − a′) + B,

⁸Writing z = |z|e^{iθ}, we note that ln z = ln|z| + iθ, so that the real part of a complex log function is the log of the absolute value.
Figure 9.4. In the z′-plane, we see two concentric unequal cylinders.

whose real part is the potential in the z-plane:

Φ(x, y) = Re[G(z)] = A ln|(1 + a′c − a′z)/(z − c)| + B
= A ln[|(1 + a′c − a′x) − ia′y| / |(x − c) + iy|] + B
= (A/2) ln[((1 + a′c − a′x)² + a′²y²)/((x − c)² + y²)] + B.

This is the potential we want.

9.4 Integration of Complex Functions

The derivative of a complex function is an important concept and, as the previous section demonstrated, provides a powerful tool in physical applications. The concept of integration is even more important. In fact, we will see in the next section that derivatives can be written in terms of integrals. We will study integrals of complex functions in detail in this section.

The definite integral of a complex function is defined in analogy to that of a real function:

∫_{α₁}^{α₂} f(z) dz = lim_{N→∞, Δzᵢ→0} Σ_{i=1}^{N} f(zᵢ)Δzᵢ,
where Δzᵢ is a small segment, situated at zᵢ, of the curve that connects the complex number α₁ to the complex number α₂ in the z-plane. Since there are infinitely many ways of connecting α₁ to α₂, it is possible to obtain different values for the integral for different paths.

One encounters a similar situation when one tries to evaluate the line integral of a vector field. In fact, we can turn the integral of a complex function into a line integral as follows. We substitute f(z) = u + iv and dz = dx + i dy in the integral to obtain

∫_{α₁}^{α₂} f(z) dz = ∫_{α₁}^{α₂} (u dx − v dy) + i ∫_{α₁}^{α₂} (v dx + u dy).

If we define the two-dimensional vectors A₁ ≡ (u, −v) and A₂ ≡ (v, u), then we get

∫_{α₁}^{α₂} f(z) dz = ∫_{α₁}^{α₂} A₁·dr + i ∫_{α₁}^{α₂} A₂·dr.

It follows from Stokes' theorem (or Green's theorem, since the vectors lie in a plane) that the integral of f is path-independent only if both A₁ and A₂ have vanishing curls. This in turn follows if and only if u and v satisfy the C-R conditions, and this is exactly what is needed for f(z) to be analytic. Path-independence of a line integral of a vector A is equivalent to the vanishing of the integral along a closed path, and the latter is equivalent to the vanishing of ∇ × A at every point of the region bordered by the closed path.

The preceding discussion is encapsulated in an important theorem, which we shall state shortly. First, however, it is worthwhile to become familiar with some terminology used frequently in complex analysis.

1. A curve is a map γ : [a, b] → ℂ from the real interval into the complex plane given by γ(t) = γ_r(t) + iγ_i(t), where a ≤ t ≤ b, and γ_r and γ_i are the real and imaginary parts of γ; γ(a) is called the initial point of the curve and γ(b) its final point.

2. A simple arc, or a Jordan arc, is a curve that does not cross itself, i.e., γ is injective (or one-to-one), so that γ(t₁) ≠ γ(t₂) when t₁ ≠ t₂.

3. A path is a finite collection {γ₁, γ₂, ...
, γₙ} of simple arcs such that the initial point of γ_{k+1} coincides with the final point of γ_k.

4. A smooth arc is a curve for which dγ/dt = dγ_r/dt + i dγ_i/dt exists and is nonzero for t ∈ [a, b].

5. A contour is a path whose arcs are smooth. When the initial point of γ₁ coincides with the final point of γₙ, the contour is said to be a simple closed contour.

9.4.1. Theorem. (Cauchy-Goursat theorem) Let f : ℂ → ℂ be analytic on a simple closed contour C and at all points inside C. Then

∮_C f(z) dz = 0.
Figure 9.5. The three different paths of integration corresponding to the integrals I₁, I₁′, I₂, and I₂′.

9.4.2. Example. EXAMPLES OF DEFINITE INTEGRALS

(a) Let us evaluate the integral I₁ = ∫_{γ₁} z dz, where γ₁ is the straight line drawn from the origin to the point (1, 2) (see Figure 9.5). Along such a line, y = 2x and, using t for x, γ₁(t) = t + 2it, where 0 ≤ t ≤ 1; so

I₁ = ∫_{γ₁} z dz = ∫₀¹ (t + 2it)(dt + 2i dt) = ∫₀¹ (−3t dt + 4it dt) = −3/2 + 2i.

For a different path γ₂, along which y = 2x², we get γ₂(t) = t + 2it², where 0 ≤ t ≤ 1, and

I₁′ = ∫_{γ₂} z dz = ∫₀¹ (t + 2it²)(dt + 4it dt) = −3/2 + 2i.

Therefore, I₁ = I₁′. This is what is expected from the Cauchy-Goursat theorem, because the function f(z) = z is analytic on the two paths and in the region bounded by them.

(b) To find I₂ ≡ ∫_{γ₁} z² dz with γ₁ as in part (a), substitute for z in terms of t:

I₂ = ∫₀¹ (t + 2it)²(dt + 2i dt) = (1 + 2i)³ ∫₀¹ t² dt = −11/3 − (2/3)i.

Next we compare I₂ with I₂′ = ∫_{γ₃} z² dz, where γ₃ is as shown in Figure 9.5. This path can be described by

γ₃(t) = t for 0 ≤ t ≤ 1, γ₃(t) = 1 + i(t − 1) for 1 ≤ t ≤ 3.

Therefore,

I₂′ = ∫₀¹ t² dt + ∫₁³ [1 + i(t − 1)]²(i dt) = 1/3 − 4 − (2/3)i = −11/3 − (2/3)i,
Figure 9.6. The two semicircular paths for calculating I₃ and I₃′.

which is identical to I₂, once again because the function is analytic on γ₁ and γ₃ as well as in the region bounded by them.

(c) Now consider I₃ ≡ ∫_{γ₄} dz/z, where γ₄ is the upper semicircle of unit radius, as shown in Figure 9.6. A parametric equation for γ₄ can be given in terms of θ:

γ₄(θ) = e^{iθ}, dz = ie^{iθ} dθ, 0 ≤ θ ≤ π.

Thus, we obtain

I₃ = ∫_{γ₄} dz/z = ∫₀^π (ie^{iθ}/e^{iθ}) dθ = iπ.

On the other hand, along the lower semicircle γ₄′,

I₃′ = ∫_{γ₄′} dz/z = ∫_{2π}^{π} (ie^{iθ}/e^{iθ}) dθ = −iπ.

Here the two integrals are not equal. From γ₄ and γ₄′ we can construct a counterclockwise simple closed contour C, along which the integral of f(z) = 1/z becomes ∮_C dz/z = I₃ − I₃′ = 2iπ. That the integral is not zero is a consequence of the fact that 1/z is not analytic at all points of the region bounded by the closed contour C.

The Cauchy-Goursat theorem applies to more complicated regions. When a region contains points at which f(z) is not analytic, those points can be avoided by redefining the region and the contour. Such a procedure requires an agreement on the direction we will take.

Convention. When integrating along a closed contour, we agree to move along the contour in such a way that the enclosed region lies to our left. An integration that follows this convention is called integration in the positive sense. Integration performed in the opposite direction acquires a minus sign.
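The integrals of Example 9.4.2 can be reproduced by brute-force numerical summation of f(zᵢ)Δzᵢ over a discretized path. This is only a sketch (the helper name `contour_integral` and the midpoint rule are my choices, not the text's):

```python
import cmath, math

def contour_integral(f, curve, n=20000):
    """Approximate the contour integral of f along curve(t), t in [0, 1]."""
    total = 0j
    for k in range(n):
        z0, z1 = curve(k / n), curve((k + 1) / n)
        total += f((z0 + z1) / 2) * (z1 - z0)   # midpoint rule on each chord
    return total

# (a) f(z) = z along y = 2x and along y = 2x^2: same answer, -3/2 + 2i.
line = lambda t: t + 2j * t
parabola = lambda t: t + 2j * t * t
I1 = contour_integral(lambda z: z, line)
I1p = contour_integral(lambda z: z, parabola)
print(I1, I1p)

# (c) f(z) = 1/z around the full unit circle: 2*pi*i, since 1/z is not
# analytic at the enclosed point z = 0.
circle = lambda t: cmath.exp(2j * math.pi * t)
I_circle = contour_integral(lambda z: 1 / z, circle)
print(I_circle)
```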
For a simple closed contour, movement in the counterclockwise direction yields integration in the positive sense. However, as the contour becomes more complicated, this conclusion breaks down. Figure 9.7 shows a complicated path enclosing a region (shaded) in which the integrand is analytic. Note that it is possible to traverse a portion of the region twice in opposite directions without affecting the integral, which may be a sum of integrals for different pieces of the contour. Also note that the "eyes" and the "mouth" are traversed clockwise! This is necessary because of the convention above. A region such as that shown in Figure 9.7, in which holes are "punched out," is called multiply connected. In contrast, a simply connected region is one in which every simple closed contour encloses only points of the region.

Figure 9.7. A complicated contour can be broken up into simpler ones. Note that the boundaries of the "eyes" and the "mouth" are forced to be traversed in the (negative) clockwise direction.

One important consequence of the Cauchy-Goursat theorem is the following:

9.4.3. Theorem. (Cauchy integral formula) Let f be analytic on and within a simple closed contour C integrated in the positive sense. Let z₀ be any interior point of C. Then

f(z₀) = (1/2πi) ∮_C f(z)/(z − z₀) dz.

To prove the Cauchy integral formula (CIF), we need the following lemma.

9.4.4. Lemma. (Darboux inequality) Suppose f : ℂ → ℂ is continuous and bounded on a path γ, i.e., there exists a positive number M such that |f(z)| ≤ M
for all values z ∈ γ. Then

|∫_γ f(z) dz| ≤ M L_γ,

where L_γ is the length of the path of integration.

Proof.

|∫_γ f(z) dz| = |lim_{N→∞, Δzᵢ→0} Σ_{i=1}^{N} f(zᵢ)Δzᵢ| ≤ lim Σ_{i=1}^{N} |f(zᵢ)Δzᵢ| = lim Σ_{i=1}^{N} |f(zᵢ)| |Δzᵢ| ≤ M lim Σ_{i=1}^{N} |Δzᵢ| = M L_γ.

The first inequality follows from the triangle inequality, the second from the boundedness of f, and the last equality follows from the definition of the length of a path. □

Now we are ready to prove the Cauchy integral formula.

Proof of CIF. Consider the shaded region in Figure 9.8, which is bounded by C, by γ₀ (a circle of arbitrarily small radius δ centered at z₀), and by L₁ and L₂, two straight line segments infinitesimally close to one another (we can, in fact, assume that L₁ and L₂ are right on top of one another; however, they are separated in the figure for clarity). Let us use C′ to denote the union of all these curves. Since f(z)/(z − z₀) is analytic everywhere on the contour C′ and inside the shaded region, we can write

0 = (1/2πi) ∮_{C′} f(z)/(z − z₀) dz   (9.10)
= (1/2πi) [∮_C f(z)/(z − z₀) dz + ∮_{γ₀} f(z)/(z − z₀) dz + ∫_{L₁} f(z)/(z − z₀) dz + ∫_{L₂} f(z)/(z − z₀) dz].

The contributions from L₁ and L₂ cancel because they are integrals along the same line segment in opposite directions. Let us evaluate the contribution from the infinitesimal circle γ₀. First we note that because f(z) is continuous (differentiability implies continuity), we can write

|(f(z) − f(z₀))/(z − z₀)| = |f(z) − f(z₀)|/|z − z₀| = |f(z) − f(z₀)|/δ < ε/δ
Figure 9.8. The integrand is analytic within and on the boundary of the shaded region. It is always possible to construct contours that exclude all singular points.

for z ∈ γ₀, where ε is a small positive number. We now apply the Darboux inequality and write

|∮_{γ₀} (f(z) − f(z₀))/(z − z₀) dz| < (ε/δ) 2πδ = 2πε.

This means that the integral goes to zero as δ → 0, or

∮_{γ₀} f(z)/(z − z₀) dz = ∮_{γ₀} f(z₀)/(z − z₀) dz = f(z₀) ∮_{γ₀} dz/(z − z₀).

We can easily calculate the integral on the RHS by noting that z − z₀ = δe^{iφ} and that γ₀ has a clockwise direction:

∮_{γ₀} dz/(z − z₀) = ∫₀^{2π} (−iδe^{iφ} dφ)/(δe^{iφ}) = −2πi ⟹ ∮_{γ₀} f(z)/(z − z₀) dz = −2πi f(z₀).

Substituting this in (9.10) yields the desired result. □

9.4.5. Example. We can use the CIF to evaluate the integrals

I₁ = ∮_{C₁} z² dz/[(z² + 3)²(z − i)], I₂ = ∮_{C₂} (z² − 1) dz/[(z − 1/2)(z² − 4)³], I₃ = ∮_{C₃} e^{z/2} dz/[(z² − 20)⁴(z − iπ)],
where C1, C2, and C3 are circles centered at the origin with radii r1 = 3/2, r2 = 1, and r3 = 4.

For I1 we note that f(z) = z²/(z² + 3)² is analytic within and on C1, and z0 = i lies in the interior of C1. Thus,

I1 = ∮_{C1} f(z) dz/(z − i) = 2πi f(i) = 2πi i²/(i² + 3)² = −iπ/2.

Similarly, f(z) = (z² − 1)/(z² − 4)³ for the integral I2 is analytic on and within C2, and z0 = 1/2 is an interior point of C2. Thus, the CIF gives

I2 = ∮_{C2} f(z) dz/(z − 1/2) = 2πi f(1/2) = 32πi/1125.

For the last integral, f(z) = e^{z/2}/(z² − 20)⁴, and the interior point is z0 = iπ:

I3 = ∮_{C3} f(z) dz/(z − iπ) = 2πi f(iπ) = 2πi e^{iπ/2}/[(iπ)² − 20]⁴ = −2π/(π² + 20)⁴.  ■

Explanation of why the Cauchy integral formula works: The Cauchy integral formula gives the value of an analytic function at every point inside a simple closed contour when it is given the value of the function only at points on the contour. It seems as though an analytic function is not free to change inside a region once its value is fixed on the contour enclosing that region.

There is an analogous situation in electrostatics: The specification of the potential at the boundaries, such as the surfaces of conductors, automatically determines the potential at any other point in the region of space bounded by the conductors. This is the content of the uniqueness theorem used in electrostatic boundary value problems. However, the electrostatic potential Φ is bound by another condition, Laplace's equation; and the combination of Laplace's equation and the boundary conditions furnishes the uniqueness of Φ. Similarly, the real and imaginary parts of an analytic function separately satisfy Laplace's equation in two dimensions! Thus, it should come as no surprise that the value of an analytic function on a boundary (contour) determines the function at all points inside the boundary.

9.5 Derivatives as Integrals

The Cauchy integral formula is a very powerful tool for working with analytic functions. One of the applications of this formula is in evaluating the derivatives of such functions.
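The CIF of Example 9.4.5(a) can be checked numerically. The Python sketch below (an illustration, not part of the text) discretizes the contour C1 (radius 3/2 about the origin) and verifies that the contour integral of f(z)/(z − i), with f(z) = z²/(z² + 3)², agrees with 2πi f(i) = −iπ/2.

```python
import numpy as np

# Numerical sketch of the CIF for Example 9.4.5(a): the contour integral of
# f(z)/(z - i) over C1 (|z| = 3/2) should equal 2*pi*i*f(i) = -i*pi/2,
# where f(z) = z^2/(z^2 + 3)^2. Singularities of f at +/- i*sqrt(3) lie
# outside C1, and the pole z = i lies inside, as required by the CIF.
f = lambda z: z**2 / (z**2 + 3)**2

N = 4096
theta = np.linspace(0.0, 2*np.pi, N, endpoint=False)
z = 1.5 * np.exp(1j * theta)          # points on the contour C1
dz = 1.5j * np.exp(1j * theta)        # dz/dtheta

# Periodic trapezoid (Riemann) sum of the contour integral
integral = np.sum(f(z) / (z - 1j) * dz) * (2*np.pi / N)

expected = 2j * np.pi * f(1j)         # CIF prediction: 2*pi*i*f(z0)
assert abs(integral - expected) < 1e-8
assert abs(expected - (-1j * np.pi / 2)) < 1e-12
```

For a smooth periodic integrand the equally spaced sum converges extremely fast, so a few thousand points already reproduce the exact value to near machine precision.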
It is convenient to change the dummy integration variable to ξ and write the CIF as

f(z) = (1/2πi) ∮_C f(ξ)/(ξ − z) dξ,                                        (9.11)

where C is a simple closed contour in the ξ-plane and z is a point within C. As preparation for defining the derivative of an analytic function, we need the following result.
9.5 DERIVATIVES AS INTEGRALS 249

9.5.1. Proposition. Let γ be any path (a contour, for example) and g a continuous function on that path. The function f(z) defined by

f(z) = (1/2πi) ∫_γ g(ξ)/(ξ − z) dξ

is analytic at every point z ∉ γ.

Proof. The proof follows immediately from differentiation of the integral:

df/dz = (1/2πi) (d/dz) ∫_γ g(ξ)/(ξ − z) dξ = (1/2πi) ∫_γ g(ξ) (d/dz)[1/(ξ − z)] dξ = (1/2πi) ∫_γ g(ξ)/(ξ − z)² dξ.

This is defined for all values of z not on γ.⁹ Thus, f(z) is analytic there. □

We can generalize the formula above to the nth derivative and obtain

dⁿf/dzⁿ = (n!/2πi) ∫_γ g(ξ)/(ξ − z)^{n+1} dξ.

Applying this result to an analytic function expressed by Equation (9.11), we obtain the following important theorem, which gives the derivative of an analytic function in terms of an integral.

9.5.2. Theorem. The derivatives of all orders of an analytic function f(z) exist in the domain of analyticity of the function and are themselves analytic in that domain. The nth derivative of f(z) is given by

f^{(n)}(z) = dⁿf/dzⁿ = (n!/2πi) ∮_C f(ξ)/(ξ − z)^{n+1} dξ.                  (9.12)

9.5.3. Example. Let us apply Equation (9.12) directly to some simple functions. In all cases, we will assume that the contour is a circle of radius r centered at z.

(a) Let f(z) = K, a constant. Then, for n = 1 we have

df/dz = (1/2πi) ∮_C K dξ/(ξ − z)².

Since ξ is on the circle C centered at z, we have ξ − z = re^{iθ} and dξ = ire^{iθ} dθ. So

df/dz = (1/2πi) ∫_0^{2π} K ire^{iθ} dθ/(re^{iθ})² = (K/2πr) ∫_0^{2π} e^{−iθ} dθ = 0.

(b) Given f(z) = z, its first derivative will be

df/dz = (1/2πi) ∮_C ξ dξ/(ξ − z)² = (1/2πi) ∫_0^{2π} (z + re^{iθ}) ire^{iθ} dθ/(re^{iθ})²
      = (1/2π) [(z/r) ∫_0^{2π} e^{−iθ} dθ + ∫_0^{2π} dθ] = (1/2π)(0 + 2π) = 1.

⁹The interchange of differentiation and integration requires justification. Such an interchange can be done if the integral has some restrictive properties. We shall not concern ourselves with such details. In fact, one can achieve the same result by using the definition of derivatives and the usual properties of integrals.
(c) Given f(z) = z², for the first derivative Equation (9.12) yields

df/dz = (1/2πi) ∮_C ξ² dξ/(ξ − z)² = (1/2πi) ∫_0^{2π} (z + re^{iθ})² ire^{iθ} dθ/(re^{iθ})²
      = (1/2π) ∫_0^{2π} [z² + (re^{iθ})² + 2zre^{iθ}] (re^{iθ})^{−1} dθ
      = (1/2π) [(z²/r) ∫_0^{2π} e^{−iθ} dθ + r ∫_0^{2π} e^{iθ} dθ + 2z ∫_0^{2π} dθ] = 2z.

It can be shown that, in general, (d/dz) z^m = m z^{m−1}. The proof is left as Problem 9.30. ■

The CIF is a central formula in complex analysis, and we shall see its significance in much of the later development of complex analysis. For now, let us demonstrate its usefulness in proving a couple of important properties of analytic functions.

9.5.4. Proposition. The absolute value of an analytic function f(z) cannot have a local maximum within the region of analyticity of the function.

Proof. Let S ⊂ ℂ be the region of analyticity of f. Suppose z0 ∈ S were a local maximum. Then we could find a circle γ0 of small enough radius δ, centered at z0, such that |f(z0)| > |f(z)| for all z on γ0. We now show that this cannot happen. Using the CIF, and noting that z − z0 = δe^{iθ}, we have

|f(z0)| = |(1/2πi) ∮_{γ0} f(z)/(z − z0) dz| = (1/2π) |∫_0^{2π} f(z) iδe^{iθ} dθ/(δe^{iθ})|
        ≤ (1/2π) ∫_0^{2π} |f(z)| dθ ≤ (1/2π) ∫_0^{2π} M dθ = M,

where M is the maximum value of |f(z)| for z ∈ γ0. This inequality says that there is at least one point z on the circle γ0 (the point at which the maximum of |f(z)| is attained) such that |f(z0)| ≤ |f(z)|. This contradicts our assumption. Therefore, there can be no local maximum within S. □

9.5.5. Proposition. A bounded entire function is necessarily a constant.

Proof. We show that the derivative of such a function is zero. Consider

df/dz = (1/2πi) ∮_C f(ξ) dξ/(ξ − z)².

Since f is an entire function, the closed contour C can be chosen to be a very large circle of radius R with center at z. Taking the absolute value of both sides yields
|df/dz| ≤ (1/2π) ∮_C |f(ξ)| |dξ|/|ξ − z|² ≤ (1/2π)(M/R²)(2πR) = M/R,

where M is the maximum of the function in the complex plane. Now, as R → ∞, the derivative goes to zero, and the function must be a constant. □

Proposition 9.5.5 is a very powerful statement about analytic functions. There are many interesting and nontrivial real functions that are bounded and have derivatives of all orders on the entire real line. For instance, e^{−x²} is such a function. No such freedom exists for complex analytic functions. Any nontrivial analytic function is either not bounded (goes to infinity somewhere on the complex plane) or not entire (it is not analytic at some point(s) of the complex plane).

A consequence of Proposition 9.5.5 is the fundamental theorem of algebra, which states that any polynomial of degree n ≥ 1 has n roots (some of which may be repeated). In other words, the polynomial p(x) = a0 + a1x + ⋯ + a_n xⁿ for n ≥ 1 can be factored completely as p(x) = c(x − z1)(x − z2)⋯(x − z_n), where c is a constant and the z_i are, in general, complex numbers.

To see how Proposition 9.5.5 implies the fundamental theorem of algebra, we let f(z) = 1/p(z) and assume the contrary, i.e., that p(z) is never zero for any (finite) z ∈ ℂ. Then f(z) is bounded and analytic for all z ∈ ℂ, and Proposition 9.5.5 says that f(z) is a constant. This is obviously wrong. Thus, there must be at least one z, say z = z1, for which p(z) is zero. So we can factor out (z − z1) from p(z) and write p(z) = (z − z1)q(z), where q(z) is of degree n − 1. Applying the above argument to q(z), we have p(z) = (z − z1)(z − z2)r(z), where r(z) is of degree n − 2. Continuing in this way, we can factor p(z) into linear factors. The last polynomial will be a constant (a polynomial of degree zero), which we have denoted by c.

The primitive (indefinite integral) of an analytic function can be defined using definite integrals, just as in the real case.
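The integral formula of Theorem 9.5.2 lends itself to a direct numerical check. The Python sketch below (an illustration, not part of the text; the test point z0 is chosen arbitrarily) computes the second derivative of f = exp from the contour integral f″(z) = (2!/2πi) ∮ f(ξ)/(ξ − z)³ dξ and compares it with the known answer e^z.

```python
import numpy as np
from math import factorial

# Numerical sketch of Theorem 9.5.2: f^(n)(z0) = n!/(2*pi*i) * contour
# integral of f(zeta)/(zeta - z0)^(n+1). We take f = exp, whose nth
# derivative is exp itself, n = 2, and an arbitrary point z0 (assumed value).
z0, n, r = 0.3 + 0.2j, 2, 1.0

N = 4096
theta = np.linspace(0.0, 2*np.pi, N, endpoint=False)
zeta = z0 + r * np.exp(1j * theta)     # circle of radius r about z0
dzeta = 1j * r * np.exp(1j * theta)    # dzeta/dtheta

contour = np.sum(np.exp(zeta) / (zeta - z0)**(n + 1) * dzeta) * (2*np.pi / N)
deriv = factorial(n) / (2j * np.pi) * contour

assert abs(deriv - np.exp(z0)) < 1e-10   # f''(z0) = e^{z0}
```

The same loop with growing radius R and a bounded f would show |df/dz| ≤ M/R shrinking, which is the numerical shadow of the Liouville argument in Proposition 9.5.5.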
Let f : ℂ → ℂ be analytic in a region S of the complex plane. Let z0 and z be two points in S, and define¹⁰

F(z) ≡ ∫_{z0}^{z} f(ξ) dξ.

We can show that F(z) is the primitive of f(z) by showing that

lim_{Δz→0} |[F(z + Δz) − F(z)]/Δz − f(z)| = 0.

We leave the details as a problem for the reader.

9.5.6. Proposition. Let f : ℂ → ℂ be analytic in a region S of ℂ. Then at every point z ∈ S, there exists an analytic function F : ℂ → ℂ such that dF/dz = f(z).

¹⁰Note that the integral is path-independent due to the analyticity of f. Thus, F is well-defined.
In the sketch of the proof of Proposition 9.5.6, we used only the continuity of f and the fact that the integral was well-defined. These two conditions are sufficient to establish the analyticity of F and f, since the latter is the derivative of the former. The following theorem, due to Morera, states this fact and is the converse of the Cauchy–Goursat theorem.

9.5.7. Theorem. (Morera's theorem) Let a function f : ℂ → ℂ be continuous in a simply connected region S. If for each simple closed contour C in S we have ∮_C f(ξ) dξ = 0, then f is analytic throughout S.

9.6 Taylor and Laurent Series

The expansion of functions in terms of polynomials or monomials is important in calculus and was emphasized in the analysis of Chapter 5. We now apply this concept to analytic functions.

9.6.1 Properties of Series

The reader is assumed to have some familiarity with complex series. Nevertheless, we state (without proof) the most important properties of complex series before discussing Taylor and Laurent series.

A complex series is said to converge absolutely if the real series Σ_{k=0}^∞ |z_k| = Σ_{k=0}^∞ √(x_k² + y_k²) converges. Clearly, absolute convergence implies convergence.

9.6.1. Proposition. If the power series Σ_{k=0}^∞ a_k(z − z0)^k converges for z1 ≠ z0, then it converges absolutely for every value of z such that |z − z0| < |z1 − z0|. Similarly, if the power series Σ_{k=0}^∞ b_k/(z − z0)^k converges for z2 ≠ z0, then it converges absolutely for every value of z such that |z − z0| > |z2 − z0|.

A geometric interpretation of this proposition is that if a power series with positive powers converges for a point at a distance r1 from z0, then it converges for all interior points of the circle whose center is z0 and whose radius is r1.
Similarly, if a power series with negative powers converges for a point at a distance r2 from z0, then it converges for all exterior points of the circle whose center is z0 and whose radius is r2 (see Figure 9.9). Generally speaking, positive powers are used for points inside a circle and negative powers for points outside it.

The largest circle about z0 such that the first power series of Proposition 9.6.1 converges is called the circle of convergence of the power series. The proposition implies that the series cannot converge at any point outside the circle of convergence. (Why?)

In determining the convergence of a power series

S(z) ≡ Σ_{n=0}^∞ a_n(z − z0)ⁿ,                                             (9.13)
Figure 9.9 (a) Power series with positive exponents converge for the interior points of a circle. (b) Power series with negative exponents converge for the exterior points of a circle.

we look at the behavior of the sequence of partial sums

S_N(z) ≡ Σ_{n=0}^N a_n(z − z0)ⁿ.

Convergence of (9.13) implies that for any ε > 0, there exists an integer N_ε such that |S(z) − S_N(z)| < ε whenever N > N_ε. In general, the integer N_ε may depend on z; that is, for different values of z, we may be forced to pick different N_ε's. When N_ε is independent of z, we say that the convergence is uniform.

9.6.2. Theorem. The power series S(z) = Σ_{n=0}^∞ a_n(z − z0)ⁿ is uniformly convergent for all points within its circle of convergence and represents a function that is analytic there.

By substituting the reciprocal of (z − z0) in the power series, we can show that if Σ_{k=0}^∞ b_k/(z − z0)^k is convergent in the annulus r2 < |z − z0| < r1, then it is uniformly convergent for all z in that annulus.

9.6.3. Theorem. A convergent power series can be differentiated and integrated term by term; that is, if S(z) = Σ_{n=0}^∞ a_n(z − z0)ⁿ, then

dS(z)/dz = Σ_{n=1}^∞ n a_n (z − z0)^{n−1},  ∫_γ S(z) dz = Σ_{n=0}^∞ a_n ∫_γ (z − z0)ⁿ dz

for any path γ lying in the circle of convergence of the power series.
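Term-by-term differentiation is easy to check numerically. The short Python sketch below (an illustration, not part of the text; the test point is an assumed value) differentiates the geometric series Σ zⁿ = 1/(1 − z) term by term inside its circle of convergence |z| < 1 and compares the result with the derivative 1/(1 − z)².

```python
# Sketch of Theorem 9.6.3: inside the circle of convergence, a power series
# may be differentiated term by term. For S(z) = sum z^n = 1/(1-z), |z| < 1,
# the term-by-term derivative sum n*z^(n-1) should converge to 1/(1-z)^2.
z = 0.3 + 0.4j                          # |z| = 0.5, well inside |z| < 1
N = 200                                 # truncation; tail is negligible here
S_prime = sum(n * z**(n - 1) for n in range(1, N))

assert abs(S_prime - 1 / (1 - z)**2) < 1e-12
```

Closer to the boundary |z| = 1 the same truncation would need far more terms, which is the practical face of the convergence statements above.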
9.6.2 Taylor and Laurent Series

We now state and prove the two main theorems of this section. A Taylor series consists of terms with only positive powers. A Laurent series allows for negative powers as well.

9.6.4. Theorem. (Taylor series) Let f be analytic throughout the interior of a circle C0 having radius r0 and centered at z0. Then at each point z inside C0,

f(z) = f(z0) + f′(z0)(z − z0) + ⋯ = Σ_{n=0}^∞ [f^{(n)}(z0)/n!] (z − z0)ⁿ.

That is, the power series converges to f(z) when |z − z0| < r0.

Proof. From the CIF and the fact that z is inside C0, we have

f(z) = (1/2πi) ∮_{C0} f(ξ)/(ξ − z) dξ.                                     (9.14)

On the other hand,

1/(ξ − z) = 1/(ξ − z0 + z0 − z) = 1/[(ξ − z0)(1 − (z − z0)/(ξ − z0))] = [1/(ξ − z0)] Σ_{n=0}^∞ [(z − z0)/(ξ − z0)]ⁿ.

The last equality follows from the fact that |(z − z0)/(ξ − z0)| < 1 (because z is inside the circle C0 and ξ is on it) and from the sum of a geometric series. Substituting in the CIF and using Theorem 9.5.2, we obtain the result. □

For z0 = 0 we obtain the Maclaurin series:

f(z) = f(0) + f′(0)z + ⋯ = Σ_{n=0}^∞ [f^{(n)}(0)/n!] zⁿ.

The Taylor expansion requires analyticity of the function at all points interior to the circle C0. On many occasions there may be a point inside C0 at which the function is not analytic. The Laurent series accommodates such cases.

9.6.5. Theorem. (Laurent series) Let C1 and C2 be circles of radii r1 and r2, both centered at z0 in the z-plane, with r1 > r2. Let f : ℂ → ℂ be analytic on C1 and C2 and throughout S, the annular region between the two circles. Then, at each point z ∈ S, f(z) is given by

f(z) = Σ_{n=−∞}^∞ a_n(z − z0)ⁿ,  where  a_n = (1/2πi) ∮_C f(ξ)/(ξ − z0)^{n+1} dξ

and C is any contour within S that encircles z0.
Figure 9.10 The annular region within and on whose contour the expanded function is analytic.

Proof. Let γ be a small closed contour in S enclosing z, as shown in Figure 9.10. For the composite contour C′ the Cauchy–Goursat theorem gives

0 = ∮_{C′} f(ξ)/(ξ − z) dξ = ∮_{C1} f(ξ)/(ξ − z) dξ − ∮_{C2} f(ξ)/(ξ − z) dξ − ∮_γ f(ξ)/(ξ − z) dξ,    (9.15)

where the γ and C2 integrations are negative because their interior lies to our right as we traverse them. The γ integral is simply 2πi f(z) by the CIF. Thus, we obtain

2πi f(z) = ∮_{C1} f(ξ)/(ξ − z) dξ − ∮_{C2} f(ξ)/(ξ − z) dξ.

Now we use the same trick we used in deriving the Taylor expansion. Since z is located in the annular region, r2 < |z − z0| < r1. We have to keep this in mind when expanding the fractions. In particular, for ξ ∈ C1 we want the ξ term in the denominator, and for ξ ∈ C2 we want it in the numerator. Substituting such expansions in Equation (9.15) yields

2πi f(z) = Σ_{n=0}^∞ (z − z0)ⁿ ∮_{C1} f(ξ) dξ/(ξ − z0)^{n+1} + Σ_{n=0}^∞ [1/(z − z0)^{n+1}] ∮_{C2} f(ξ)(ξ − z0)ⁿ dξ.    (9.16)

Now we consider an arbitrary contour C in S that encircles z0. Figure 9.11 shows a region bounded by a contour composed of C1 and C. In this region
f(ξ)/(ξ − z0)^{n+1} is analytic (because ξ can never equal z0). Thus, the integral over the composite contour must vanish by the Cauchy–Goursat theorem. It follows that the integral over C1 is equal to that over C. A similar argument shows that the C2 integral can also be replaced by an integral over C. We let n + 1 = −m in the second sum of Equation (9.16) to transform it into

Σ_{m=−∞}^{−1} (z − z0)^m ∮_C f(ξ)(ξ − z0)^{−m−1} dξ = Σ_{m=−∞}^{−1} (z − z0)^m ∮_C f(ξ) dξ/(ξ − z0)^{m+1}.

Changing the dummy index back to n and substituting the result in Equation (9.16) yields

2πi f(z) = Σ_{n=0}^∞ (z − z0)ⁿ ∮_C f(ξ) dξ/(ξ − z0)^{n+1} + Σ_{n=−∞}^{−1} (z − z0)ⁿ ∮_C f(ξ) dξ/(ξ − z0)^{n+1}.

We can now combine the sums and divide both sides by 2πi to get the desired expansion. □

The Laurent expansion is convergent as long as r2 < |z − z0| < r1. In particular, if r2 = 0, and if the function is analytic throughout the interior of the larger circle, then a_n will be zero for n = −1, −2, … because f(ξ)/(ξ − z0)^{n+1} will be analytic for negative n, and the integral will be zero by the Cauchy–Goursat theorem. Thus, only positive powers of (z − z0) will be present in the series, and we will recover the Taylor series, as we should.

It is clear that we can expand C1 and shrink C2 until we encounter a point at which f is no longer analytic. This is obvious from the construction of the proof, in which only the analyticity in the annular region is important, not its size. Thus, we can include all the possible analytic points by expanding C1 and shrinking C2.

9.6.6. Example. Let us expand some functions in terms of series. For an entire function there is no point in the entire complex plane at which it is not analytic. Thus, only positive powers of (z − z0) will be present, and we will have a Taylor expansion that is valid for all values of z.

(a) Let us expand e^z around z0 = 0.
The nth derivative of e^z is e^z. Thus, f^{(n)}(0) = 1, and the Taylor (Maclaurin) expansion gives

e^z = Σ_{n=0}^∞ [f^{(n)}(0)/n!] zⁿ = Σ_{n=0}^∞ zⁿ/n!.

(b) The Maclaurin series for sin z is obtained by noting that

(dⁿ/dzⁿ) sin z |_{z=0} = 0 if n is even, and (−1)^{(n−1)/2} if n is odd,

and substituting this in the Maclaurin expansion:

sin z = Σ_{n odd} (−1)^{(n−1)/2} zⁿ/n! = Σ_{k=0}^∞ (−1)^k z^{2k+1}/(2k + 1)!.
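Both Maclaurin series can be sanity-checked numerically. The Python sketch below (an illustration, not part of the text; the test point is an assumed value) sums truncated series for e^z and sin z at a complex point and compares them with library values; since both functions are entire, any point works.

```python
import numpy as np
from math import factorial

# Numerical sketch of Example 9.6.6(a),(b): truncated Maclaurin series of
# e^z and sin z converge at any complex z, since both functions are entire.
z = 1.1 - 0.7j                          # arbitrary (assumed) test point

exp_partial = sum(z**n / factorial(n) for n in range(40))
sin_partial = sum((-1)**k * z**(2*k + 1) / factorial(2*k + 1)
                  for k in range(20))

assert abs(exp_partial - np.exp(z)) < 1e-12
assert abs(sin_partial - np.sin(z)) < 1e-12
```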
Figure 9.11 The arbitrary contour in the annular region used in the Laurent expansion.

Similarly, we can obtain

cos z = Σ_{k=0}^∞ (−1)^k z^{2k}/(2k)!,  sinh z = Σ_{k=0}^∞ z^{2k+1}/(2k + 1)!,  cosh z = Σ_{k=0}^∞ z^{2k}/(2k)!.

(c) The function 1/(1 + z) is not entire, so the region of its convergence is limited. Let us find the Maclaurin expansion of this function. The function is analytic within all circles of radii r < 1. At r = 1 we encounter a singularity, the point z = −1. Thus, the series converges for all points z for which |z| < 1.¹¹ For such points we have

f^{(n)}(0) = (dⁿ/dzⁿ)[(1 + z)^{−1}] |_{z=0} = (−1)ⁿ n!.

Thus,

1/(1 + z) = Σ_{n=0}^∞ [f^{(n)}(0)/n!] zⁿ = Σ_{n=0}^∞ (−1)ⁿ zⁿ.  ■

Taylor and Laurent series allow us to express an analytic function as a power series. For a Taylor series of f(z), the expansion is routine because the coefficient of its nth term is simply f^{(n)}(z0)/n!, where z0 is the center of the circle of convergence. When a Laurent series is applicable, however, the nth coefficient is not, in general, easy to evaluate. Usually it can be found by inspection and certain manipulations of other known series. But if we use such an intuitive approach

¹¹As remarked before, the series diverges for all points outside the circle |z| = 1. This does not mean that the function cannot be represented by a series for points outside the circle. On the contrary, we shall see shortly that Laurent series, with negative powers of z − z0, are designed precisely for such a purpose.
to determine the coefficients, can we be sure that we have obtained the correct Laurent series? The following theorem answers this question.

9.6.7. Theorem. If the series Σ_{n=−∞}^∞ a_n(z − z0)ⁿ converges to f(z) at all points in some annular region about z0, then it is the unique Laurent series expansion of f(z) in that region.

Proof. Multiply both sides of f(z) = Σ_{n=−∞}^∞ a_n(z − z0)ⁿ by 1/[2πi(z − z0)^{k+1}], integrate the result along a contour C in the annular region, and use the easily verifiable fact that

(1/2πi) ∮_C dz/(z − z0)^{k−n+1} = δ_{kn}

to obtain

(1/2πi) ∮_C f(z)/(z − z0)^{k+1} dz = a_k.

Thus, the coefficient in the power series of f is precisely the coefficient in the Laurent series, and the two must be identical. □

We will look at some examples that illustrate the abstract ideas developed in the preceding collection of theorems and propositions. However, we can consider a much broader range of examples if we know the arithmetic of power series: convergent power series can be added, subtracted, and multiplied. The following theorem giving arithmetical manipulations with power series is not difficult to prove (see [Chur 74]).

9.6.8. Theorem. Let the two power series f(z) = Σ_{n=−∞}^∞ a_n(z − z0)ⁿ and g(z) = Σ_{n=−∞}^∞ b_n(z − z0)ⁿ be convergent within some annular region r2 < |z − z0| < r1. Then the sum Σ_{n=−∞}^∞ (a_n + b_n)(z − z0)ⁿ converges to f(z) + g(z), and the product

Σ_{n=−∞}^∞ Σ_{m=−∞}^∞ a_n b_m (z − z0)^{m+n} ≡ Σ_{k=−∞}^∞ c_k(z − z0)^k

converges to f(z)g(z) for z interior to the annular region. Furthermore, if g(z) ≠ 0 in some neighborhood of z0, then the series obtained by long division of Σ_{n=−∞}^∞ a_n(z − z0)ⁿ by Σ_{m=−∞}^∞ b_m(z − z0)^m converges to f(z)/g(z) in that neighborhood.

This theorem, in essence, says that converging power series can be manipulated as though they were finite sums (polynomials). Such manipulations are extremely useful when dealing with Taylor and Laurent expansions in which the straightforward calculation of coefficients may be tedious.
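The product rule of Theorem 9.6.8 is just the Cauchy product of the coefficient sequences. The Python sketch below (an illustration, not part of the text) multiplies the Maclaurin coefficients of e^z with themselves and checks that the result matches the known coefficients 2^k/k! of e^{2z}.

```python
from math import factorial

# Sketch of Theorem 9.6.8: convergent power series multiply like
# polynomials. The Cauchy product of the coefficients of e^z with
# themselves should reproduce the coefficients of e^z * e^z = e^{2z}.
N = 20
a = [1 / factorial(n) for n in range(N)]            # coefficients of e^z

# c_k = sum_{m=0}^{k} a_m * a_{k-m}  (Cauchy product)
c = [sum(a[m] * a[k - m] for m in range(k + 1)) for k in range(N)]

for k in range(N):
    assert abs(c[k] - 2**k / factorial(k)) < 1e-12  # coefficients of e^{2z}
```

The identity Σ_m 1/[m!(k − m)!] = 2^k/k! behind the check is the binomial theorem, so the numerics confirm what the algebra guarantees.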
The following examples illustrate the power of infinite-series arithmetic.
9.6.9. Example. To expand the function f(z) = (2 + 3z)/(z² + z³) in a Laurent series about z = 0, rewrite it as

f(z) = (1/z²)(2 + 3z)/(1 + z) = (1/z²)[3 − 1/(1 + z)] = (1/z²)[3 − Σ_{n=0}^∞ (−1)ⁿ zⁿ]
     = (1/z²)(3 − 1 + z − z² + z³ − ⋯) = 2/z² + 1/z − 1 + z − z² + ⋯.

This series converges for 0 < |z| < 1. We note that negative powers of z are also present.¹² Using the notation of Theorem 9.6.5, we have a_{−2} = 2, a_{−1} = 1, a_n = 0 for n ≤ −3, and a_n = (−1)^{n+1} for n ≥ 0.  ■

9.6.10. Example. The function f(z) = 1/(4z − z²) is the ratio of two entire functions. Therefore, by Theorem 9.6.8, it is analytic everywhere except at the zeros of its denominator, z = 0 and z = 4. For the annular region (here r2 of Theorem 9.6.5 is zero) 0 < |z| < 4, we expand f(z) in a Laurent series around z = 0. Instead of actually calculating a_n, we first note that

f(z) = (1/4z) · 1/(1 − z/4).

The second factor can be expanded in a geometric series because |z/4| < 1:

1/(1 − z/4) = Σ_{n=0}^∞ (z/4)ⁿ = Σ_{n=0}^∞ 4^{−n} zⁿ.

Dividing this by 4z, and noting that z = 0 is the only zero of 4z and is excluded from the annular region, we obtain the expansion

f(z) = Σ_{n=0}^∞ 4^{−n−1} z^{n−1}.

Although we derived this series using manipulations of other series, the uniqueness of series representations assures us that this is the Laurent series for the indicated region.

How can we represent f(z) in the region for which |z| > 4? This region is exterior to the circle |z| = 4, so we expect negative powers of z. To find the Laurent expansion we write

f(z) = −(1/z²) · 1/(1 − 4/z)

and note that |4/z| < 1 for points exterior to the larger circle. The second factor can be written as a geometric series:

1/(1 − 4/z) = Σ_{n=0}^∞ (4/z)ⁿ = Σ_{n=0}^∞ 4ⁿ z^{−n}.

¹²This is a reflection of the fact that the function is not analytic inside the entire circle |z| = 1; it blows up at z = 0.
Dividing by −z², which is nonzero in the region exterior to the larger circle, yields

f(z) = −Σ_{n=0}^∞ 4ⁿ z^{−n−2}.  ■

9.6.11. Example. The function f(z) = z/[(z − 1)(z − 2)] has a Taylor expansion around the origin for |z| < 1. To find this expansion, we write¹³

f(z) = −1/(z − 1) + 2/(z − 2) = 1/(1 − z) − 1/(1 − z/2).

Expanding both fractions in geometric series (both |z| and |z/2| are less than 1), we obtain f(z) = Σ_{n=0}^∞ zⁿ − Σ_{n=0}^∞ (z/2)ⁿ. Adding the two series, using Theorem 9.6.8, yields

f(z) = Σ_{n=0}^∞ (1 − 2^{−n}) zⁿ

for |z| < 1. This is the unique Taylor expansion of f(z) within the circle |z| = 1.

For 1 < |z| < 2 we have a Laurent series. To obtain this series, write

f(z) = −(1/z) · 1/(1 − 1/z) − 1/(1 − z/2).

Since both fractions on the RHS converge in the annular region (|1/z| < 1, |z/2| < 1), we get

f(z) = −(1/z) Σ_{n=0}^∞ (1/z)ⁿ − Σ_{n=0}^∞ (z/2)ⁿ = −Σ_{n=0}^∞ z^{−n−1} − Σ_{n=0}^∞ 2^{−n} zⁿ
     = −Σ_{n=−∞}^{−1} zⁿ − Σ_{n=0}^∞ 2^{−n} zⁿ = Σ_{n=−∞}^∞ a_n zⁿ,

where a_n = −1 for n < 0 and a_n = −2^{−n} for n ≥ 0. This is the unique Laurent expansion of f(z) in the given region.

Finally, for |z| > 2 we have only negative powers of z. We obtain the expansion in this region by rewriting f(z) as follows:

f(z) = −(1/z)/(1 − 1/z) + (2/z)/(1 − 2/z).

Expanding the fractions yields

f(z) = −Σ_{n=0}^∞ z^{−n−1} + Σ_{n=0}^∞ 2^{n+1} z^{−n−1} = Σ_{n=0}^∞ (2^{n+1} − 1) z^{−n−1}.

This is again the unique expansion of f(z) in the region |z| > 2.  ■

¹³We could, of course, evaluate the derivatives of all orders of the function at z = 0 and use Maclaurin's formula. However, the present method gives the same result much more quickly.
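The coefficient formula of Theorem 9.6.5 gives an independent check on series found by manipulation. The Python sketch below (an illustration, not part of the text) evaluates a_n = (1/2πi) ∮_C f(ξ) ξ^{−n−1} dξ numerically for f(z) = z/[(z − 1)(z − 2)] on a contour inside the annulus 1 < |z| < 2 and recovers the coefficients a_n = −1 (n < 0) and a_n = −2^{−n} (n ≥ 0) derived in Example 9.6.11.

```python
import numpy as np

# Numerical sketch of the Laurent coefficients of Example 9.6.11 in the
# annulus 1 < |z| < 2: a_n = (1/2*pi*i) * contour integral of
# f(zeta) / zeta^(n+1) over a contour encircling z0 = 0 inside the annulus.
f = lambda z: z / ((z - 1) * (z - 2))

N = 4096
theta = np.linspace(0.0, 2*np.pi, N, endpoint=False)
zeta = 1.5 * np.exp(1j * theta)        # contour |z| = 3/2 inside the annulus
dzeta = 1j * zeta                      # dzeta/dtheta for zeta = r e^{i theta}

def laurent_coeff(n):
    vals = f(zeta) / zeta**(n + 1) * dzeta
    return np.sum(vals) * (2*np.pi / N) / (2j * np.pi)

for n in (-3, -2, -1):                 # negative powers: a_n = -1
    assert abs(laurent_coeff(n) - (-1)) < 1e-8
for n in (0, 1, 2, 3):                 # nonnegative powers: a_n = -2^{-n}
    assert abs(laurent_coeff(n) - (-2.0**(-n))) < 1e-8
```

The same routine with a contour of radius less than 1, or greater than 2, would reproduce the Taylor coefficients 1 − 2^{−n} and the exterior coefficients 2^{n+1} − 1 for the other two regions of the example.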
9.6.12. Example. Define f(z) as

f(z) = (1 − cos z)/z² for z ≠ 0, and f(0) = 1/2.

We can show that f(z) is an entire function. Since 1 − cos z and z² are entire functions, their ratio is analytic everywhere except at the zeros of its denominator. The only such zero is z = 0. Thus, Theorem 9.6.8 implies that f(z) is analytic everywhere except possibly at z = 0. To see the behavior of f(z) at z = 0, we look at its Maclaurin series:

1 − cos z = 1 − Σ_{n=0}^∞ (−1)ⁿ z^{2n}/(2n)!,

which implies that

(1 − cos z)/z² = 1/2 − z²/4! + z⁴/6! − ⋯.

The expansion on the RHS shows that the value of the series at z = 0 is 1/2, which, by definition, is f(0). Thus, the series converges for all z, and Theorem 9.6.2 says that f(z) is entire.  ■

A Laurent series can give information about the integral of a function around a closed contour in whose interior the function may not be analytic. In fact, the coefficient of the first negative power in a Laurent series is given by

a_{−1} = (1/2πi) ∮_C f(ξ) dξ.                                              (9.17)

Thus, to find the integral of a (nonanalytic) function around a closed contour surrounding z0, we write the Laurent series for the function and read off the coefficient of the 1/(z − z0) term.

9.6.13. Example. As an illustration of this idea, let us evaluate the integral I = ∮_C dz/[z²(z − 2)], where C is the circle of radius 1 centered at the origin. The function is analytic in the annular region 0 < |z| < 2. We can therefore expand it as a Laurent series about z = 0 in that region:

1/[z²(z − 2)] = −(1/2z²) · 1/(1 − z/2) = −(1/2z²) Σ_{n=0}^∞ (z/2)ⁿ = −1/(2z²) − 1/(4z) − 1/8 − ⋯.

Thus, a_{−1} = −1/4, and ∮_C dz/[z²(z − 2)] = 2πi a_{−1} = −iπ/2. A direct evaluation of the integral is nontrivial. In fact, we will see later that to find certain integrals, it is advantageous to cast them in the form of a contour integral and use either Equation (9.17) or a related equation.  ■
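Equation (9.17) is easy to confirm by brute force. The Python sketch below (an illustration, not part of the text) evaluates the contour integral of Example 9.6.13 directly on the unit circle and compares it with 2πi a_{−1} = −iπ/2.

```python
import numpy as np

# Numerical sketch of Example 9.6.13: the integral of 1/[z^2 (z - 2)]
# around the unit circle should equal 2*pi*i * a_{-1} = -i*pi/2,
# where a_{-1} = -1/4 is read off the Laurent series about z = 0.
N = 4096
theta = np.linspace(0.0, 2*np.pi, N, endpoint=False)
z = np.exp(1j * theta)                 # unit circle, counterclockwise
dz = 1j * z                            # dz/dtheta

integral = np.sum(dz / (z**2 * (z - 2))) * (2*np.pi / N)

assert abs(integral - (-1j * np.pi / 2)) < 1e-10
```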
Let f : ℂ → ℂ be analytic at z0. Then by definition, there exists a neighborhood of z0 in which f is analytic. In particular, we can find a circle |z − z0| = r > 0 in whose interior f has a Taylor expansion.

9.6.14. Definition. Let

f(z) = Σ_{n=0}^∞ [f^{(n)}(z0)/n!] (z − z0)ⁿ ≡ Σ_{n=0}^∞ a_n(z − z0)ⁿ.

Then f is said to have a zero of order k at z0 if f^{(n)}(z0) = 0 for n = 0, 1, …, k − 1 but f^{(k)}(z0) ≠ 0.

In that case f(z) = (z − z0)^k Σ_{n=0}^∞ a_{k+n}(z − z0)ⁿ, where a_k ≠ 0 and |z − z0| < r. We define g(z) as

g(z) = Σ_{n=0}^∞ a_{k+n}(z − z0)ⁿ,  |z − z0| < r,

and note that g(z0) = a_k ≠ 0. Convergence of the series on the RHS implies that g(z) is continuous at z0. Consequently, for each ε > 0, there exists δ such that |g(z) − a_k| < ε whenever |z − z0| < δ. If we choose ε = |a_k|/2, then, for some δ0 > 0, |g(z) − a_k| < |a_k|/2 whenever |z − z0| < δ0. Thus, as long as z is inside the circle |z − z0| < δ0, g(z) cannot vanish (because if it did, the first inequality would imply that |a_k| < |a_k|/2). We therefore have the following result.

9.6.15. Theorem. Let f : ℂ → ℂ be analytic at z0 and f(z0) = 0. Then there exists a neighborhood of z0 throughout which f has no other zeros unless f is identically zero there. Thus, the zeros of an analytic function are isolated.

When k = 1, we say that z0 is a simple zero of f. To find the order of the zero of a function at a point, we differentiate the function, evaluate the derivative at that point, and continue the process until we obtain a nonzero value for the derivative.

9.6.16. Example. (a) The zeros of cos z, which are z = (2k + 1)π/2, are all simple, because

(d/dz) cos z |_{z=(2k+1)π/2} = −sin[(2k + 1)π/2] ≠ 0.

(b) To find the order of the zero of f(z) = e^z − 1 − z − z²/2 at z = 0, we differentiate f(z) and evaluate f′(0):

f′(0) = (e^z − 1 − z)_{z=0} = 0.

Differentiating again gives f″(0) = (e^z − 1)_{z=0} = 0. Differentiating once more yields f‴(0) = (e^z)_{z=0} = 1.
Thus, the zero is of order 3.  ■
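The order of a zero can also be found numerically from the Taylor coefficients a_n = (1/2πi) ∮ f(ξ)/ξ^{n+1} dξ: the order is the index of the first nonvanishing coefficient. The Python sketch below (an illustration, not part of the text) applies this to the function of Example 9.6.16(b).

```python
import numpy as np

# Numerical sketch of Example 9.6.16(b): the order of the zero of
# f(z) = e^z - 1 - z - z^2/2 at z = 0 is the smallest n with a_n != 0,
# where a_n are Taylor coefficients computed as contour integrals.
f = lambda z: np.exp(z) - 1 - z - z**2 / 2

N = 4096
theta = np.linspace(0.0, 2*np.pi, N, endpoint=False)
zeta = np.exp(1j * theta)              # unit circle about z0 = 0
coeff = lambda n: (np.sum(f(zeta) / zeta**(n + 1) * 1j * zeta)
                   * (2*np.pi / N) / (2j * np.pi))

order = next(n for n in range(10) if abs(coeff(n)) > 1e-8)
assert order == 3                      # matches the differentiation argument
assert abs(coeff(3) - 1/6) < 1e-10     # a_3 = f'''(0)/3! = 1/6
```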
9.7 Problems

9.1. Show that the function w = 1/z maps the straight line y = a in the z-plane onto a circle in the w-plane with radius 1/(2|a|) and center (0, −1/(2a)).

9.2. (a) Using the chain rule, find ∂f/∂z* and ∂f/∂z in terms of partial derivatives with respect to x and y.
(b) Evaluate ∂f/∂z* and ∂f/∂z assuming that the Cauchy–Riemann conditions hold.

9.3. Show that when z is represented by polar coordinates, the derivative of a function f(z) can be written as

df/dz = e^{−iθ}(∂U/∂r + i ∂V/∂r),

where U and V are the real and imaginary parts of f(z) written in polar coordinates. What are the C–R conditions in polar coordinates? Hint: Start with the C–R conditions in Cartesian coordinates and apply the chain rule to them using x = r cos θ and y = r sin θ.

9.4. Show that d/dz(ln z) = 1/z. Hint: Find u(x, y) and v(x, y) for ln z and differentiate them.

9.5. Show that sin z and cos z have only real roots.

9.6. Show that (a) the sum and the product of two entire functions are entire, and (b) the ratio of two entire functions is analytic everywhere except at the zeros of the denominator.

9.7. Given that u = 2A ln[(x² + y²)^{1/2}], show that v = 2A tan⁻¹(y/x), where u and v are the real and imaginary parts of an analytic function w(z).

9.8. If w(z) is any complex potential, show that its (complex) derivative gives the components of the electric field.

9.9. (a) Show that the flux through an element of area da of the lateral surface of a cylinder (with arbitrary cross section) is dφ = dz(|E| ds), where ds is an arc length along the equipotential surface.
(b) Prove that |E| = |dw/dz| = ∂v/∂s, where v is the imaginary part of the complex potential, and s is the parameter describing the length along the equipotential curves.
(c) Combine (a) and (b) to get

flux per unit z-length = φ/(z2 − z1) = v(P2) − v(P1)

for any two points P1 and P2 on the cross-sectional curve of the lateral surface. Conclude that the total flux per unit z-length through a cylinder (with arbitrary
cross section) is [v], the total change in v as one goes around the curve.
(d) Using Gauss's law, show that the capacitance per unit length for the capacitor consisting of the two conductors with potentials U1 and U2 is

c ≡ (charge per unit length)/(potential difference) = [v]/(4π|U2 − U1|).

9.10. Using Equation (9.7),
(a) find the equipotential curves (curves of constant u) and curves of constant v for two line charges of equal magnitude and opposite signs located at y = a and y = −a in the xy-plane.
(b) Show that

z = a[sin(v/2λ) + i sinh(u/2λ)]/[cosh(u/2λ) − cos(v/2λ)]

by solving Equation (9.7) for z and simplifying.
(c) Show that the equipotential curves are circles in the xy-plane of radii a/sinh(u/2λ) with centers at (0, a coth(u/2λ)), and that the curves of constant v are circles of radii a/sin(v/2λ) with centers at (a cot(v/2λ), 0).

9.11. In this problem, you will find the capacitance per unit length of two cylindrical conductors of radii R1 and R2 the distance between whose centers is D, by looking for two line charge densities +λ and −λ such that the two cylinders are two of the equipotential surfaces. From Problem 9.10, we have

R_i = a/sinh(u_i/2λ),  y_i = a coth(u_i/2λ),  i = 1, 2,

where y1 and y2 are the locations of the centers of the two conductors on the y-axis (which we assume to connect the two centers).
(a) Show that D = |y1 − y2| = |R1 cosh(u1/2λ) − R2 cosh(u2/2λ)|.
(b) Square both sides and use cosh(a − b) = cosh a cosh b − sinh a sinh b and the expressions for the R's and the y's given above to obtain

cosh[(u1 − u2)/2λ] = |R1² + R2² − D²|/(2R1R2).

(c) Now find the capacitance per unit length. Consider the special case of two concentric cylinders.
(d) Find the capacitance per unit length of a cylinder and a plane, by letting one of the radii, say R1, go to infinity while h ≡ R1 − D remains fixed.
9.7 PROBLEMS 265

9.12. Use Equations (9.4) and (9.5) to establish the following identities.
(a) Re(sin z) = sin x cosh y,  Im(sin z) = cos x sinh y.
(b) Re(cos z) = cos x cosh y,  Im(cos z) = −sin x sinh y.
(c) Re(sinh z) = sinh x cos y,  Im(sinh z) = cosh x sin y.
(d) Re(cosh z) = cosh x cos y,  Im(cosh z) = sinh x sin y.
(e) |sin z|² = sin²x + sinh²y,  |cos z|² = cos²x + sinh²y.
(f) |sinh z|² = sinh²x + sin²y,  |cosh z|² = sinh²x + cos²y.

9.13. Find all the zeros of sinh z and cosh z.

9.14. Verify the following hyperbolic identities.
(a) cosh²z − sinh²z = 1.
(b) cosh(z₁ + z₂) = cosh z₁ cosh z₂ + sinh z₁ sinh z₂.
(c) sinh(z₁ + z₂) = sinh z₁ cosh z₂ + cosh z₁ sinh z₂.
(d) cosh 2z = cosh²z + sinh²z,  sinh 2z = 2 sinh z cosh z.
(e) tanh(z₁ + z₂) = (tanh z₁ + tanh z₂)/(1 + tanh z₁ tanh z₂).

9.15. Show that
(a) tanh(z/2) = (sinh x + i sin y)/(cosh x + cos y).  (b) coth(z/2) = (sinh x − i sin y)/(cosh x − cos y).

9.16. Find all values of z such that (a) e^z = −3. (b) e^z = 1 + i√3. (c) e^(2z−1) = 1.

9.17. Show that |e^(−z)| < 1 if and only if Re(z) > 0.

9.18. Show that both the real and the imaginary parts of an analytic function are harmonic.

9.19. Show that each of the following functions, call each one u(x, y), is harmonic, and find the function's harmonic partner, v(x, y), such that u(x, y) + iv(x, y) is analytic.
(a) x³ − 3xy².  (b) eˣ cos y.  (d) e^(−2y) cos 2x.  (e) e^(y²−x²) cos 2xy.
(f) eˣ(x cos y − y sin y) + 2 sinh y sin x + x³ − 3xy² + y.
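Problem 9.19 lends itself to a quick numerical spot-check. For part (a), u = x³ − 3xy² is the real part of z³, so v = 3x²y − y³ is the natural candidate for the harmonic partner. A minimal sketch, assuming Python with only the standard library (the helper names u and v are ours, not the book's):

```python
import math

def u(x, y):  # given harmonic function, Problem 9.19(a)
    return x**3 - 3 * x * y**2

def v(x, y):  # candidate harmonic partner: the imaginary part of z^3
    return 3 * x**2 * y - y**3

# u + iv should be the analytic function z^3
for (x, y) in [(0.3, -1.2), (2.0, 0.7), (-1.1, 1.4)]:
    w = complex(x, y) ** 3
    assert abs(w.real - u(x, y)) < 1e-10
    assert abs(w.imag - v(x, y)) < 1e-10

# u should also satisfy Laplace's equation; check with a five-point stencil
h = 1e-4
x, y = 0.8, -0.6
lap = (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h) - 4 * u(x, y)) / h**2
assert abs(lap) < 1e-4
```

The five-point Laplacian is exact for cubic polynomials up to rounding, so the tolerance can be kept tight.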
266 9. COMPLEX CALCULUS

9.20. Prove the following identities.
(a) cos⁻¹z = −i ln(z ± √(z² − 1)).  (b) sin⁻¹z = −i ln[iz ± √(1 − z²)].
(c) tan⁻¹z = (1/2i) ln((i − z)/(i + z)).  (d) cosh⁻¹z = ln(z ± √(z² − 1)).
(e) sinh⁻¹z = ln(z ± √(z² + 1)).  (f) tanh⁻¹z = (1/2) ln((1 + z)/(1 − z)).

9.21. Find the curve defined by each of the following equations.
(a) z = 1 − it, 0 ≤ t ≤ 2.  (b) z = t + it², −∞ < t < ∞.
(c) z = a(cos t + i sin t), π/2 ≤ t ≤ 3π/2.  (d) z = t + i/t, −∞ < t < 0.

9.22. Provide the details of the proof of part (a) of Proposition 9.3.1. Prove part (b) by showing that if f(z) = z′ = x′ + iy′ is analytic and ∂²φ/∂x² + ∂²φ/∂y² = 0, then ∂²φ/∂x′² + ∂²φ/∂y′² = 0.

9.23. Let f(t) = u(t) + iv(t) be a (piecewise) continuous complex-valued function of a real variable t defined in the interval a ≤ t ≤ b. Show that if F(t) = U(t) + iV(t) is a function such that dF/dt = f(t), then

∫ₐᵇ f(t) dt = F(b) − F(a).

This is the fundamental theorem of calculus for complex variables.

9.24. Find the value of the integral ∫_C [(z + 2)/z] dz, where C is (a) the semicircle z = 2e^(iθ), for 0 ≤ θ ≤ π; (b) the semicircle z = 2e^(iθ), for π ≤ θ ≤ 2π; and (c) the circle z = 2e^(iθ), for −π ≤ θ ≤ π.

9.25. Evaluate the integral ∫_γ dz/(z − 1 − i) where γ is (a) the line joining z₁ = 2i and z₂ = 3, and (b) the broken path from z₁ to the origin and from there to z₂.

9.26. Evaluate the integral ∫_C z^m (z*)^n dz, where m and n are integers and C is the circle |z| = 1 taken counterclockwise.

9.27. Let C be the boundary of the square with vertices at the points z = 0, z = 1, z = 1 + i, and z = i with counterclockwise direction. Evaluate ∮_C (5z + 2) dz and
  • 285. 9.7 PROBLEMS 267 9.28. Let Cl be a simple closed contour. Deform Cl into anew contour C2io such a way that C1 does not encounter aoy singularity of ao aoalytic function f io the process. Show that J f(z)dz = J f(z)dz. Tel i. That is, the contour cao always be deformed iota simpler shapes (such as a circle) aod the iotegral evaluated. 9.29. Use the result of the previous problem to show that J dz. = 2,,; aod J (z - I - i)m-1dz = 0 for m = ±I, ±2, ... fCZ- I - 1 fc when C is the boundary of a square with vertices at z = 0, z = 2, z = 2 +2;, aod z = 2;, taken counterclockwise. 9.30. Use Equation (9.12) aod the bioomial expaosion to show that d dz (z'") = mzm-l. 9.31. Evaluate '.fc dZ/(Z2 - 1) where C is the circle lzl = 3 iotegrated io the positive sense. Hint: Deform C iota a contour C' that bypasses the singularities of the iotegraod. 9.32. Show that when f is aoalytic witbio aod on a simple closed contour C aod zo is noton C, then J f'(z) dz = J f(z) dz . fc z - zo fc (z - ZO)2 9.33. Let C be the boundary of a square whose sides lie along the lioes x = ±3 aod y = ±3. For the positive sense of integration, evaluate each of the following integrals. J eZ J cosz d (b) fc Z(Z2 + 10) dz: (e) fc (z _ ~)(Z2 -10) Z. i coshz i cosz (e) -4- dz. (f) -3 dz: c z c z i eZ i cosz (h) dz: (i) --.- dz: c (z - ;,,)2 C Z +in (k) J sinhz dz: (I) J eoshz dz; fc (z - ;,,/2)2 fc (z - ;,,/2)2 for - 3 < a < 3. (n) J Z2 dz: fc (z - 2)(Z2 - 10) (a) J e- z dz: fc z - i,,/2 i sinhz (d) -4-dz. c z i cosz d (g) z c (z - ;,,/2)2 . (j) J 2 e Z dz: fc z -5z +4 i taoz (m) dz c (z - a)2
  • 286. 268 9. COMPLEX CALCULUS 9.34. Let C be the circle lz - i1 = 3 integrated in the positive sense. Find the valne of each of the following integrals. (a) J: e Z dz Y c Z2 +rr2 • J: dz (d) :rc (z2 +9)2 . J: sinhz (b) Yc (Z2 +rr2j2 dz, J: coshz (e) Y c (Z2 +rr2) 3 dz: J: dz (c) :rc Z2 +9' (f) J: Z2 - 3z +4 dz, Y c z2 - 4z + 3 9.35. Show that Legendre polynomials (for [x] < I) can be represented as (_I)n i (1- z2)n Pn(x) = +1 dz; 2n(2rri) c (z - x)" where C is the unit circle around the origin. 9.36. Let I be analytic within and on the circle Yo given by [z - zol = Yo and integrated in the positive sense. Show that Cauehy's inequalityholds: nlM 1I(n) (zo)1 :::: ~, Yo where M is the maximum value of I/(z)1 on Yo. 9.37. Expand sinhz in a Taylor series about the point z = in. 9.38. What is the largest circle within which the Maclaurin series for tanh Z con- verges to tanhz? 9.39. Find the (unique) Laurentexpansion of each of the following functions about the origin for its entire region of analyticity. (d) sinhz-z Z4 (h) (Z2 ~ 1)2' I (e) --'z2'--(-'-I---z-'--). z2 -4 (g) ---Z--9' z - I (f) ---Z--I' z - I (a) . (z - 2)(z - 3) I (e) (I _ z)3' (i) _z_. z-I 9.40. Show that the following functions are entire. { e2Z_ 1 2 (a) I(z) = -----;:- - z for Z f= O, 2 for z = O. { ~ (b) I(z) = I z for z f= 0, for z = O. { cosz (e) I(z) = Z2 - rr2 /4 -I/rr for z f= ±rr/2, for z = ±rr/2.
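A numerical sanity check is possible for Problem 9.40. The OCR of the piecewise definitions is partly illegible, so the following readings are assumptions: part (a) as f(z) = (e^(2z) − 1 − 2z)/z² with f(0) = 2, and part (c) as f(z) = cos z/(z² − π²/4) with f(±π/2) = −1/π. Each quotient should approach the assigned value near its (removable) singularity:

```python
import cmath, math

# assumed reading of 9.40(a): (e^{2z} - 1 - 2z)/z^2 -> 2 as z -> 0
for eps in (1e-3, 1e-4, 1e-5):
    z = complex(eps, eps)
    val = (cmath.exp(2 * z) - 1 - 2 * z) / z**2
    assert abs(val - 2) < 10 * abs(z)   # leading error is O(|z|)

# assumed reading of 9.40(c): cos z/(z^2 - pi^2/4) -> -1/pi as z -> pi/2
z = math.pi / 2 + 1e-5
val = cmath.cos(z) / (z**2 - math.pi**2 / 4)
assert abs(val - (-1 / math.pi)) < 1e-3
```

If the limits agreed with the assigned values at every singular point, setting those values makes each function entire, which is what the problem asks to show.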
  • 287. 9.7 PROBLEMS 269 9.41. Let I be analytic at zo and I (zo) = I'(zo) = ... = I(k)(eo) = O. Show that the following function is analytic at zo: 1 I(z) (z - ZO)k+l g(z) = flk+ll(zo) (k + I)! for z i' zo. for z = zoo 9.42. Obtain the first few nonzero terms of the Laurent series expansion of each of the following functions about the origin. Also find the integral of the function along a small simple closed contour encircling the origin. I (a) -.-. sm z Z4 (e) 3 . 6z+z -6sinhz (b) I 1- cosz I (f) -2-'-' Z smz (c) z I-coshz I (g) eZ - I' (d) --,-- z - sinz Additional Reading I. Churchill, R. and Verhey, R. Complex Variables and Applications, 3rd ed., McGraw-Hill, 1974. An introductory text on complex variables with many examples and exercises. 2. Lang, S. Complex Analysis, 2nd ed., Springer-Verlag, 1985. An extremely well-written book by a master expositor. Although the book has a formal tone, the clarity of exposition and the use of many examples make this book very readable.
10 Calculus of Residues

One of the most powerful tools made available by complex analysis is the theory of residues, which makes possible the routine evaluation of certain definite integrals that are impossible to calculate otherwise. The derivation, application, and analysis of this tool constitute the main focus of this chapter. In the preceding chapter we saw examples in which integrals were related to expansion coefficients of Laurent series. Here we will develop a systematic way of evaluating both real and complex integrals.

10.1 Residues

Recall that a singular point z₀ of f : ℂ → ℂ is a point at which f fails to be analytic. If, in addition, there is some neighborhood of z₀ in which f is analytic at every point (except of course at z₀ itself), then z₀ is called an isolated singularity of f. Almost all the singularities we have encountered so far have been isolated singularities. However, we will see later, when discussing multivalued functions, that singularities that are not isolated do exist.

Let z₀ be an isolated singularity of f. Then there exists an r > 0 such that within the "annular" region 0 < |z − z₀| < r, the function f has the Laurent expansion

f(z) = Σ_{n=−∞}^{∞} aₙ(z − z₀)ⁿ = Σ_{n=0}^{∞} aₙ(z − z₀)ⁿ + b₁/(z − z₀) + b₂/(z − z₀)² + ⋯,

where

aₙ = (1/2πi) ∮_C f(ξ) dξ/(ξ − z₀)^(n+1)

and

bₙ = (1/2πi) ∮_C f(ξ)(ξ − z₀)^(n−1) dξ.
10.1 RESIDUES 271

In particular,

b₁ = (1/2πi) ∮_C f(ξ) dξ,   (10.1)

where C is any simple closed contour around z₀, traversed in the positive sense, on and interior to which f is analytic except at the point z₀ itself. The complex number b₁, which is essentially the integral of f(z) along the contour, is called the residue of f at the isolated singular point z₀. It is important to note that the residue is independent of the contour C as long as z₀ is the only isolated singular point within C.

Pierre Alphonse Laurent (1813-1854) graduated from the École Polytechnique near the top of his class and became a second lieutenant in the engineering corps. On his return from the war in Algeria, he took part in the effort to improve the port at Le Havre, spending six years there directing various parts of the project. Laurent's superior officers admired the breadth of his practical experience and the good judgment it afforded the young engineer. During this period he wrote his first scientific paper, on the calculus of variations, and submitted it to the French Academy of Sciences for the grand prix in mathematics. Unfortunately the competition had already closed (although the judges had not yet declared a winner), and Laurent's submission was not successful. However, the paper so impressed Cauchy that he recommended its publication, also without success. The paper for which Laurent is most well known suffered a similar fate. In it he described a more general form of a theorem earlier proven by Cauchy for the power series expansion of a function. Laurent realized that one could generalize this result to hold in any annular region between two singular or discontinuous points by using both positive and negative powers in the series, thus allowing treatment of regions beyond the first singular or discontinuous point. Again, Cauchy argued for the paper's publication without success. The passage of time provided a more just reward, however, and the use of Laurent series became a fundamental tool in complex analysis.
Laurent later worked in the theory of light waves and contended with Cauchy over the interpretation of the differential equations the latter had formulated to explain the behavior of light. Little came of his work in this area, however, and Laurent died at the age of forty-two, a captain serving on the committee on fortifications in Paris. His widow pressed to have two more of his papers read to the Academy, only one of which was published.

We use the notation Res[f(z₀)] to denote the residue of f at the isolated singular point z₀. Equation (10.1) can then be written as

∮_C f(z) dz = 2πi Res[f(z₀)].

What if there are several isolated singular points within the simple closed contour C? The following theorem provides the answer.

10.1.1. Theorem. (the residue theorem) Let C be a positively oriented simple
272 10. CALCULUS OF RESIDUES

Figure 10.1 Singularities are avoided by going around them.

closed contour within and on which a function f is analytic except at a finite number of isolated singular points z₁, z₂, …, z_m interior to C. Then

∮_C f(z) dz = 2πi Σ_{k=1}^{m} Res[f(z_k)].   (10.2)

Proof. Let C_k be the positively traversed circle around z_k. Then Figure 10.1 and the Cauchy-Goursat theorem yield

0 = ∮_{C′} f(z) dz = −Σ_k ∮_{C_k} f(z) dz (circles) + ∮ f(z) dz (parallel lines) + ∮_C f(z) dz,

where C′ is the union of all the contours, and the minus sign on the first integral is due to the fact that the interiors of all circles lie to our right as we traverse their boundaries. The contributions of the parallel lines cancel out, and we obtain

∮_C f(z) dz = Σ_{k=1}^{m} ∮_{C_k} f(z) dz = Σ_{k=1}^{m} 2πi Res[f(z_k)],

where in the last step the definition of residue at z_k has been used. □

10.1.2. Example. Let us evaluate the integral ∮_C (2z − 3) dz/[z(z − 1)] where C is the circle |z| = 2. There are two isolated singularities in C, z₁ = 0 and z₂ = 1. To find Res[f(z₁)], we expand around the origin:

(2z − 3)/[z(z − 1)] = 3/z − 1/(z − 1) = 3/z + 1/(1 − z) = 3/z + 1 + z + ⋯   for |z| < 1.
  • 291. 10.2 CLASSIFICATION OF ISOLATED SINGULARITIES 273 This givesRes[!(z,)] = 3. Similarly, expandingaroundz = I gives 2z-3 = 3 __ I_= __ I_+ 3I:(_I)n(Z_I)n, z(z - I) z - I + I z - I z - I k=O whichyieldsRes[!(z2)] = -I. Thus, 1. 2z - 3 dz = 2"iIRes[!(z,)] + Res[f(Z2)]} = 2"i(3 - I) = 4"i. Yc z(z - I) 10.2 Classificationof Isolated Singularities III Let [ : iC -> iChave an isolated singularity at zoo Then there exist a real number r > 0 and an annular region 0 < [z - zol < r such that T can be represented by theLaurent series (10.3) principal partofa function removable singular point poles defined simple pole essential singularity The second sum in Equation (10.3), involving negative powers of (z - zo), is called the principal part of [ at zoo We can use the ptincipal part to distinguish three types of isolated singularities. The behavior ofthe function near the isolated singularity is fundamentally different in each case. I. If bn = 0 for all n 2: I, zo is called a removable singular point of f .In this case, the Laurent series contains only nonnegative powers of (z - zo), and setting [(zo) = ao makes the function analytic at zoo For example, the function fez) = (e' - 1 - Z)/z2, which is indeterminate at z =0, becomes entire if we set [(0) =1 ,becauseits Lanrentseries fez) = ~+~+:~+... hasno negative power. 2. If bn = 0 for all n > m and bm # 0, zn is called a pole of order m. In this case, the expansion takes the form ~ n bi »: [(z) = L.., an(z - zn) + -- +...+ -:---=-:-:::- n~O Z - zo (z - zo)" for 0 < lz - zn] < r. Inparticular, if m = 1, ZO is called a simple pole. 3. If the principal part of [ at zo has an infinite number of nonzero terms, the point zo is called an essential singularity. A prototype of functions that have essential singularities is
  • 292. 274 10. CALCULUSOFRESIOUES which has an essential singularity at z = aand a residue of I there. To see how strange suchfunctions are, we let a be any realnumber, and consider z = I/Ona +2n,,0 for n =0, ±1, ±2, .... For such a z we have e1/ z = elna+2mri = ae2nni = a. Inparticular, asn ---+ 00, Z gets arbitrarily close to the origin. Thus, in an arbitrarily small neighborhood ofthe origin, there are infinitely many points at which the function exp(l/z) takes on an arbitrary value a. In other words, as z ---> 0, the function gets arbitrarily close to any real number! This result holds for all functions with essential singularities. 10.2.1. Example. ORDER OF POLES (a) The function(zz - 3z+5)/(z - I) hasa Laurentseriesaroundz = I containingonly 1hreetenns: (zZ -3z+5)/(z -I) =-I +(z-I)+3/(z -I). Thus,it has a simplepole atz = 1,witha residue of 3. (b) The function sinz/z6 has a Laurentseries sinz 1 00 z2n+l 1 l I z 7 = z6 ~(_I)n (2n + 1)1 = z5 - 6z3 + (5!)z - 7! + ... about z = O. Theprincipal part hasthree terms. Thepole, at z = 0, is of order 5, and the functionhas a residue of 1/120 at z = O. (c) The function (zZ - 5z + 6)/(z - 2) has aremovahle singularityat z =2, because zZ - 5z +6 = (z - 2)(z - 3) = z _ 3 = -I + (z -2) z-2 z-2 andbn = afor all n. III 10.2.2. Example. SINGULARITIES OF A RATIONAL FUNCTION rational function In this example we showthat a function whose only singularities in the entire complex plane arepolesmusthe a rational function,1 i.e., theratio of two polynomials. Letf be suchafunction and let {zj lJ=l beitspolessuchthat Zj isof orderml :Expand thefunction about zt in a Laurent series Afunction whose only singularities in the entire complex plane are poles isa rational function. bj bmj ~ k Pt(z) fez) = - - + ...+ ( )mj + L" ak(Z - Zt) == ( )mj + 81(Z), Z - Zl Z - Zl k=O Z - Zl where Pj (z) is a polynomialof degree mj - I in z and 81 is analytic at Zj. It should be clearthat theremaining poles of f are in gl. 
So, expand gl about Z2 in a Laurent series. A similar argumentas aboveyields81(z) = Pz(z)/(z - Zz)m2 +8Z(Z) where P2(Z) is a polynomial ofdegree m2 - I in z andg2 is analytic atzt and Z2. Continuing inthismanner, we get where g hasnopoles.Sinceallpolesof f havebeenisolatedin thesum,g mustbe analytic everywhere in C, i.e., anentire function. Now substitute 1/ t for z, take the limit t --+ 0, lWe assume that thepointatinfinity is nota poleof the function, i.e., thatf(l/z) does nothavea pole at theorigin.
10.3 EVALUATION OF DEFINITE INTEGRALS 275

and note that, since the degree of P_i is m_i − 1, all the terms in the preceding equation go to zero except possibly g(1/t). Moreover, lim_{t→0} g(1/t) ≠ ∞, because, by assumption, the point at infinity is not a pole of f. Thus, g is a bounded entire function. By Proposition 9.5.5, g must be a constant. Taking a common denominator for all the terms yields a ratio of two polynomials.

The type of isolated singularity that is most important in applications is of the second type, poles. For a function that has a pole of order m at z₀, the calculation of residues is routine. Such a calculation, in turn, enables us to evaluate many integrals effortlessly. How do we calculate the residue of a function f having a pole of order m at z₀? It is clear that if f has a pole of order m, then g : ℂ → ℂ defined by g(z) ≡ (z − z₀)^m f(z) is analytic at z₀. Thus, for any simple closed contour C that contains z₀ but no other singular point of f, we have

Res[f(z₀)] = (1/2πi) ∮_C f(z) dz = (1/2πi) ∮_C g(z) dz/(z − z₀)^m = g^(m−1)(z₀)/(m − 1)!.

In terms of f this yields²

Res[f(z₀)] = [1/(m − 1)!] lim_{z→z₀} d^(m−1)/dz^(m−1) [(z − z₀)^m f(z)].   (10.4)

For the special, but important, case of a simple pole, we obtain

Res[f(z₀)] = lim_{z→z₀} [(z − z₀) f(z)].   (10.5)

10.3 Evaluation of Definite Integrals

The most widespread application of residues occurs in the evaluation of real definite integrals. It is possible to "complexify" certain real definite integrals and relate them to contour integrations in the complex plane. We will discuss this method shortly; however, we first need a lemma.

10.3.1. Lemma. (Jordan's lemma) Let C_R be a semicircle of radius R in the upper half of the complex plane (UHP), centered at the origin. Let f be a function that tends uniformly to zero faster than 1/|z| for arg(z) ∈ [0, π] as |z| → ∞. Let α be a nonnegative real number. Then

I_R ≡ ∫_{C_R} e^{iαz} f(z) dz → 0   as R → ∞.

²The limit is taken because in many cases the mere substitution of z₀ may result in an indeterminate form.
276 10. CALCULUS OF RESIDUES

Proof. For z ∈ C_R we write z = Re^{iθ}, dz = iRe^{iθ} dθ, and

iαz = iα(R cos θ + iR sin θ) = iαR cos θ − αR sin θ,

and substitute in the absolute value of the integral to show that

|I_R| ≤ ∫₀^π e^{−αR sin θ} R|f(Re^{iθ})| dθ.

By assumption, R|f(Re^{iθ})| < ε(R) independent of θ, where ε(R) is an arbitrary positive number that tends to zero as R → ∞. By breaking up the interval of integration into two equal pieces and changing θ to π − θ in the second integral, one can show that

|I_R| < 2ε(R) ∫₀^{π/2} e^{−αR sin θ} dθ.

Furthermore, sin θ ≥ 2θ/π for 0 ≤ θ ≤ π/2 (see Figure 10.2 for a "proof"). Thus,

|I_R| < 2ε(R) ∫₀^{π/2} e^{−(2αR/π)θ} dθ = [πε(R)/(αR)](1 − e^{−αR}),

which goes to zero as R gets larger and larger. □

Note that Jordan's lemma applies for α = 0 as well, because (1 − e^{−αR}) → αR as α → 0. If α < 0, the lemma is still valid if the semicircle C_R is taken in the lower half of the complex plane (LHP) and f(z) goes to zero uniformly for π ≤ arg(z) ≤ 2π.

We are now in a position to apply the residue theorem to the evaluation of definite integrals. The three types of integrals most commonly encountered are discussed separately below. In all cases we assume that Jordan's lemma holds.

10.3.1 Integrals of Rational Functions

The first type of integral we can evaluate using the residue theorem is of the form

I₁ = ∫_{−∞}^{∞} p(x)/q(x) dx,

where p(x) and q(x) are real polynomials, and q(x) ≠ 0 for any real x. We can then write

I₁ = lim_{R→∞} ∫_{−R}^{R} p(x)/q(x) dx = lim_{R→∞} ∫_{C_x} p(z)/q(z) dz,

where C_x is the (open) contour lying on the real axis from −R to +R. Assuming that Jordan's lemma holds, we can close that contour by adding to it the semicircle
10.3 EVALUATION OF DEFINITE INTEGRALS 277

Figure 10.2 The "proof" of sin θ ≥ 2θ/π for 0 ≤ θ ≤ π/2. The line is the graph of y = 2θ/π; the curve is that of y = sin θ.

of radius R [see Figure 10.3(a)]. This will not affect the value of the integral, because in the limit R → ∞, the contribution of the integral of the semicircle tends to zero. We close the contour in the UHP if q(z) has at least one zero there. We then get

I₁ = lim_{R→∞} ∮_C p(z)/q(z) dz = 2πi Σ_{j=1}^{k} Res[p(z_j)/q(z_j)],

where C is the closed contour composed of the interval (−R, R) and the semicircle C_R, and {z_j}_{j=1}^{k} are the zeros of q(z) in the UHP. We may instead close the contour in the LHP,³ in which case

I₁ = −2πi Σ_{j=1}^{m} Res[p(z_j)/q(z_j)],

where {z_j}_{j=1}^{m} are the zeros of q(z) in the LHP. The minus sign indicates that in the LHP we (are forced to) integrate clockwise.

10.3.2. Example. Let us evaluate the integral I = ∫₀^∞ x² dx/[(x² + 1)(x² + 9)]. Since the integrand is even, we can extend the interval of integration to all real numbers (and divide the result by 2). It is shown below that Jordan's lemma holds. Therefore, we write the contour integral corresponding to I:

2I = ∫_{−∞}^{∞} x² dx/[(x² + 1)(x² + 9)] = ∮_C z² dz/[(z² + 1)(z² + 9)],

³Provided that Jordan's lemma holds there.
  • 296. 278 10. CALCULUS OF RESIDUES where C is as shownin Figure 1O.3(a). Note that the contour is traversed in the positive sense.Thisis alwaystrue fortheUHP. Thesingularities of thefunction in theUHPare the simple poles i and3i corresponding to the simplezerosof thedenominator. Theresidues atthesepoles are 2 1 Res[f(i)] = lim (z _ i) z - - z....t (z - i)(z +i)(z2 +9) 16i ' Res[f(3i)] = lim (z _ 3i) z2 = 3 z....3' (z2 + l)(z - 3i)(z +3i) 16i' Thus,we obtain It is instructive to obtain thesameresults usingtheLHP. In thiscase,thecontour is as showninFigure 10.3(b) andis taken clockwise,so we haveto introduce aminussign.The singular points areatz = -i and z = - 3i. Thesearesimplepolesatwhichtheresidues of thefunction are 2 Res[f(-O] = lim (z + O-:----::--,----,-z--,---,,---:::- z....-, (z - i)(z +i)(z2 +9) 16i' 2 Res[f(-3i)] = lim (z +3i) z = z....-3' (z2 +1)(z - 3i)(z +3i) Therefore, 3 16i To show thatJordan's lemma applies to this integral, we have only to establish that limR-4oo Rlf(Rei8)1 = O. In the case at hand, a = 0 becausethere is no exponential function intheintegrand. Thus, whichclearlygoes to zeroas R --+ 00. 10.3.3. Example. Let us uow consider a slightly more complicated integral: 1 00 x2dx -00 (x2 + 1)(x2 +4)2' 11II which in the UHP turns into Pc z2dz/[(z2 + l)(z2 +4)2]. The poles in the UHP are at z = i andz = 2i. The former is a simplepole, andthe latter is a pole of order 2. Thus,
  • 297. 10.3 EVALUATION OF DEFINITE INTEGRALS 279 3i -R (a) -i -3i (b) R Figure10.3 (a)Thelargesemicircleis cbosenin the UHP. (b) Notebowthe directionof contour integration is forced tobe clockwisewhenthesemicircle is chosen in theLHP. usingEquations(l0.5) and (10.4),we obtain Res[f(i)l= lim(z-i) z2 = __ 1_, z....i (z - i)(z +i)(z2 +4)2 18i Res[f(2i)] = _1_ lim .'!- [(Z _2i)2 z2 ] (2 - 1)1 z....2' dz (z2+ I)(z + 2i)2(z - 2i)2 d [ z2 ] 5 = ,~, dz (z2+ I)(z +2i)2 = 72i' and foo ~~.::x_2=dx:;.-_~ _ 2"i ( __1_ + _5_) _z. -00 (x2+1)(x2+4)2 - 18i 72i - 36' Closingthecontour in theLHP wouldyieldthesameresult. 10.3.2 Products of Rational and Trigonometric Functions III The second type of integral we can evaluate using the residue theorem is of the . form 1 00 pIx) --cosaxdx -00 q(x) or 1 00 pIx) . --smaxdx, -00 q(x) where a is a real number, pIx) and q(x) are real polynomials in x, and q(x) has no real zeros. These integrals are the real and imaginary parts of I 100 pIx) 'ax d 2= --e x. -00 q(X)
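Results of this type are easy to spot-check by computing each residue directly as a small contour integral, Equation (10.1), which works for poles of any order. A minimal sketch for Examples 10.3.2 and 10.3.3, assuming Python with only the standard library (the helper name residue is ours):

```python
import cmath, math

def residue(f, z0, radius=0.25, n=4096):
    """Estimate Res[f, z0] as (1/2pi i) times the integral of f over a small
    circle around z0, sampled uniformly (trapezoid rule on a periodic integrand)."""
    total = 0 + 0j
    for k in range(n):
        t = 2 * math.pi * k / n
        e = cmath.exp(1j * t)
        total += f(z0 + radius * e) * 1j * radius * e
    return total * (2 * math.pi / n) / (2j * math.pi)

f1 = lambda z: z**2 / ((z**2 + 1) * (z**2 + 9))      # Example 10.3.2
f2 = lambda z: z**2 / ((z**2 + 1) * (z**2 + 4)**2)   # Example 10.3.3 (double pole at 2i)

line1 = 2j * math.pi * (residue(f1, 1j) + residue(f1, 3j))  # full-line integral, pi/4
line2 = 2j * math.pi * (residue(f2, 1j) + residue(f2, 2j))  # full-line integral, pi/36
```

Halving line1 recovers the original half-line integral of Example 10.3.2, π/8; line2 reproduces the π/36 of Example 10.3.3.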
  • 298. 280 10. CALCULUS DFRESIDUES The presence of eiax dictates the choice ofthe half-plane: Ifa "=: 0, we choose the UHP; otherwise, we choose the LHP. We must, of course, have enough powers of x in the denominator to render Rjp(Reie)/q(ReiB)1 uniformly convergent to zero. 10.3.4. Example. Let us evaluate!-""oo[cos ax/(x2+ IP] dx wherea 'i O. Thisintegral is the real part of the integral h = J~oo eiax dx/ (x2 + 1)2. When a > 0, we close in the UHP as advised by Jordan's lemma. Then we proceed as for integrals of rational functions. Thus, we have i eiaz h = 2 2 dz = 2rri Res[f(i)] C (z + I) fora> 0 because there is only one pole (of order 2) in the UHP at z = i, We next calculate the residue: Res[f(i)] = lim .'!..- [(z - i)2 e~az 2J z-->i dz (z - i) (z + i) . d [ e iaz J . [CZ + i)iae iaz - 2e iaZ J e- a = Inn - - - - = Inn = -(I +a). z-+i dz (z +i)2 z-vt (z +i)3 4i Substituting this in the expression for 12. we obtain h = Ie-a(l +a) for a > O. When a < 0, we have to close the contour in the LHP, where the pole of order 2 is at z = -i and the contour is taken clockwise. Thus, we get i eiaz h = 2 2 dz = -2rri Res[t(-i)] C (z + I) For the residue we obtain for a < O. d [ e iaz J e a Res[f(-i)] = lim - (z +i)2 2 2 = --(1- a), z-->-i dz (z - i) (z + i) 4i and the expression for h becomes h = ~ea(l - a) for a < O.We can combine the two results and write 100 cosax 7f I I 2 2 dx =Re(l2) = lz = -(I + lal)e- a . -00 (x + I) 2 10.3.5. Example. As anotherexample, let us evaluate III 100 x sin ax ---dx -00 x4 + 4 where a 'i O. This is the imaginary part of the integral h = J~oo xeiax dx/(x4 +4), which, in terms of z and for the closed contour in the UHP (when a > 0), becomes i zeiaz m tx> -4--dz=2rriLRes[f(Zj)] C z +4 j~l for a> O. (10.6)
  • 299. 10.3 EVALUATION OF DEFINITE INTEGRALS 281 The singularities are determined by the zeros of the denominator: z4 + 4 = 0, or Z = 1 ± i, -1 ± i, Of these four simplepoles only two, 1 + i and-1 + i, arein theUHP. We now calculate theresidues: zeiaz Res[f(1 + i)] = lim (z - I - i) . . . . z-->1+; (z - I -,)(z - I + ,)(z + I-,)(z + 1+,) (1 +i)e ia(l+ i) eiae-a (2i)(2)(2 +2i) 8i zeiaz Res[f(-I + ill = lim (z + I - i) . . . . z-->-1+; (z + I -,)(z + I + ,)(z - I-,)(z - 1+,) (-1 +Oia(- l+ i) e-iae-a (2i)(-2)(-2 +2i) 8i Substituting in Equatiou(10.6),we obtain e-a. . x 12= 2rci""""8i(ela _e-W ) = ize-asina. Thus, 1 00 x sinax 1r -4-- dx = lm(h) = _e-a sina -00 x +4 2 for a> O. (10.7) Fora < 0, we couldclose thecontour in the LHP. Butthere is aneasierway of getting to theauswer. Wenote that -a > 0, audEquation(10.7)yields 100 xsinax 1 00 xsin[(-a)x] 1r (a). n --dx=- dx=--e-- sm(-a)=-easina. -00 x4 + 4 -00 x4 + 4 2 2 Wecancollect thetwocases in 1 00 x sin-ax 1C I I ---dx = _e- a sina. -00 x4 + 4 2 10.3.3 Functions of Trigonometric Functions The third type of integral we can evaluate usingthe residue theorem involves only trigonometric functions and is of the form fo2n F(sinO, cos 0) ao. where F is some (typically rational) function ofits arguments. Since 0 varies from oto 217:,we can consider it an argument ofa point z on the unit circle centered at the origin. Then z = eW and e-w = I/z, and we can substitute cos 0 = (z +l/z)/2, sinO = (z - l/z)/(2i), and dll = dz/(iz) in the original integral, to obtain J F (Z - I/z, z+ I/Z) ~z. Jc 2, 2 IZ This integral can often be evaluated using the method of residues.
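Before moving on, the two results just derived (Examples 10.3.4 and 10.3.5) can be spot-checked by direct numerical quadrature; both integrands decay fast enough for a truncated interval to suffice. A sketch for a = 1, assuming Python with only the standard library (the helper simpson is ours):

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule (n must be even)."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3

a = 1.0
# Example 10.3.4: integral of cos(ax)/(x^2+1)^2 over the real line = (pi/2)(1+|a|)e^{-|a|}
val1 = simpson(lambda x: math.cos(a * x) / (x * x + 1) ** 2, -50.0, 50.0, 100000)
exact1 = 0.5 * math.pi * (1 + a) * math.exp(-a)

# Example 10.3.5: integral of x sin(ax)/(x^4+4) over the real line = (pi/2) e^{-|a|} sin a
val2 = simpson(lambda x: x * math.sin(a * x) / (x**4 + 4), -100.0, 100.0, 200000)
exact2 = 0.5 * math.pi * math.exp(-a) * math.sin(a)
```

The truncation tails are bounded by the 1/x⁴ and 1/x³ falloff of the integrands, well below the tolerances asserted below.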
  • 300. 282 10. CALCULUS OF RESIDUES a -I-~ Z2 = and a 10.3.6. Example. Let us evaluate the integral frdO/(1 + acosO) where faJ < 1. Substituting for cos () and de in terms of z, we obtain 1. dz/iz 21. dz f c 1+ a[(z2 + 1)/(2z)] = i fc 2z + az2 + a' where C is the unit circle centered at the origin. The singularities of the integrand are the zerosof its denominator: -l+~ ai = For lal< 1it is clear that Z2 will lie outside the unit circle C; therefore, it does not contribute to the integral. But zt lies inside, and we obtain fc 2z + ::2 + a = 2:n:i Res[f(Zt)]· The residue of the simple pole at zt can be calculated: Res[!(ZI)] = lim (z - Zl) ( ~( ) ~ (_1_) z-e-zt a Z - zr Z - Z2 a zt - Z2 I ( a) I =;; 2v'I-a2 = 2v'I-a2' It follows that [2" dO 2 1. dz 2 ( I ) 2:n: 10 1+{lcose=irc2z+az2+a=i'bri 2-!1-a2 = Jl-aZ' II 10.3.7. Example. As anotherexample, let us consider the integral where a> 1. ] _ {" dO - 10 (a + cos0)2 Since cos eis an even function of e. we may write where a > 1. 11" dO ] - -2: _,,(a+cosO)2 This integration is over a complete cycle around the origin, and we can make the usual substitution: ]=~1. dz/iz =~1. zdz . 2 f c [a + (z2 + 1)/2z]2 i fc (z2 +2az + 1)2 The denominator has the"roots zt = -a + JaZ - 1 and Z2 = -a - via2 - 1, which are both of order 2. The second root is outside the unit circle because a > 1. Also, it is easily verified that for all a > 1, zr is inside the unit circle. Since zt is a pole of order 2, we have Res[f(ZI)] = lim '!...- [(z _ ZI)2 z ] z-->z[ dz (z - ZI)2(z - Z2)2 -lim'!...-[ z ]_ I - z-e-zr dz (z - Z2)2 - (Zl - Z2)2 We thos obtain] = ~2:n:i Res[!(ZI)] = 2:n:a 3/2' , (a - I)
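Both trigonometric integrals just evaluated can be confirmed with a plain trapezoid rule, which converges spectrally fast on smooth periodic integrands. A sketch (helper and parameter names are ours; b plays the role of the book's a in Example 10.3.7):

```python
import math

def period_integral(f, n=20000):
    """Trapezoid rule over one full period [0, 2pi); for a smooth periodic
    integrand this is accurate to machine precision."""
    h = 2 * math.pi / n
    return h * sum(f(k * h) for k in range(n))

a = 0.5   # |a| < 1, Example 10.3.6
val1 = period_integral(lambda t: 1.0 / (1.0 + a * math.cos(t)))
exact1 = 2 * math.pi / math.sqrt(1 - a * a)            # 2pi/sqrt(1-a^2)

b = 2.0   # corresponds to a > 1 in Example 10.3.7
val2 = period_integral(lambda t: 1.0 / (b + math.cos(t)) ** 2)
exact2 = 2 * math.pi * b / (b * b - 1) ** 1.5          # 2pi a/(a^2-1)^{3/2}
```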
10.3 EVALUATION OF DEFINITE INTEGRALS 283

10.3.4 Some Other Integrals

The three types of definite integrals discussed above do not exhaust all possible applications of the residue theorem. There are other integrals that do not fit into any of the foregoing three categories but are still manageable. As the next two examples demonstrate, an ingenious choice of contours allows evaluation of other types of integrals.

10.3.8. Example. Let us evaluate the Gaussian integral

I = ∫_{−∞}^{∞} e^{iax−bx²} dx   where a, b ∈ ℝ, b > 0.

Completing squares in the exponent, we have

I = ∫_{−∞}^{∞} e^{−b[x−ia/(2b)]²−a²/(4b)} dx = e^{−a²/(4b)} lim_{R→∞} ∫_{−R}^{R} e^{−b[x−ia/(2b)]²} dx.

If we change the variable of integration to z = x − ia/(2b), we obtain

I = e^{−a²/(4b)} lim_{R→∞} ∫_{−R−ia/(2b)}^{R−ia/(2b)} e^{−bz²} dz.

Let us now define I_R:

I_R ≡ ∫_{−R−ia/(2b)}^{R−ia/(2b)} e^{−bz²} dz.

This is an integral along a straight line C₁ that is parallel to the x-axis (see Figure 10.4). We close the contour as shown and note that e^{−bz²} is analytic throughout the interior of the closed contour (it is an entire function!). Thus, the contour integral must vanish by the Cauchy-Goursat theorem. So we obtain

I_R + ∫_{C₃} e^{−bz²} dz + ∫_R^{−R} e^{−bx²} dx + ∫_{C₄} e^{−bz²} dz = 0.

Along C₃, z = R + iy and

∫_{C₃} e^{−bz²} dz = ∫_{−a/(2b)}^{0} e^{−b(R+iy)²} i dy = ie^{−bR²} ∫_{−a/(2b)}^{0} e^{by²−2ibRy} dy,

which clearly tends to zero as R → ∞. We get a similar result for the integral along C₄. Therefore, we have

I_R = ∫_{−R}^{R} e^{−bx²} dx   ⟹   lim_{R→∞} I_R = ∫_{−∞}^{∞} e^{−bx²} dx = √(π/b).

Finally, we get I = e^{−a²/(4b)} √(π/b).
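The Gaussian result is easy to check numerically: e^{iax−bx²} decays so rapidly that truncating the real line at |x| = 10 is harmless for b = 1, and the trapezoid rule is essentially exact for such smooth, decaying integrands. A sketch (the helper name gauss_osc is ours):

```python
import cmath, math

def gauss_osc(a, b, L=10.0, n=4000):
    """Trapezoid approximation of the integral of exp(iax - bx^2) over [-L, L]."""
    h = 2 * L / n
    total = 0.5 * (cmath.exp(-1j * a * L - b * L * L) + cmath.exp(1j * a * L - b * L * L))
    for k in range(1, n):
        x = -L + k * h
        total += cmath.exp(1j * a * x - b * x * x)
    return total * h

a, b = 1.0, 1.0
val = gauss_osc(a, b)
exact = math.sqrt(math.pi / b) * math.exp(-a * a / (4 * b))
```

Although the integrand is complex, the exact answer is real and positive, as the formula e^{−a²/(4b)}√(π/b) requires.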
284 10. CALCULUS OF RESIDUES

Figure 10.4 The contour for the evaluation of the Gaussian integral.

10.3.9. Example. Let us evaluate I = ∫₀^∞ dx/(x³ + 1). If the integrand were even, we could extend the lower limit of integration to −∞ and close the contour in the UHP. Since this is not the case, we need to use a different trick. To get a hint as to how to close the contour, we study the singularities of the integrand. These are simply the roots of the denominator: z³ = −1, or z_n = e^{i(2n+1)π/3} with n = 0, 1, 2. These, as well as a contour that has only z₀ as an interior point, are shown in Figure 10.5. We thus have

I + ∫_{C_R} dz/(z³ + 1) + ∫_{C₂} dz/(z³ + 1) = 2πi Res[f(z₀)].   (10.8)

The C_R integral vanishes, as usual. Along C₂, z = re^{iα} with constant α, so that dz = e^{iα} dr and

∫_{C₂} dz/(z³ + 1) = ∫_∞^0 e^{iα} dr/(r³e^{3iα} + 1) = −e^{iα} ∫₀^∞ dr/(r³e^{3iα} + 1).

In particular, if we choose 3α = 2π, we obtain

∫_{C₂} dz/(z³ + 1) = −e^{i2π/3} ∫₀^∞ dr/(r³ + 1) = −e^{i2π/3} I.

Substituting this in Equation (10.8) gives

(1 − e^{i2π/3}) I = 2πi Res[f(z₀)]   ⟹   I = 2πi Res[f(z₀)]/(1 − e^{i2π/3}).

On the other hand,

Res[f(z₀)] = lim_{z→z₀} (z − z₀)/[(z − z₀)(z − z₁)(z − z₂)] = 1/[(z₀ − z₁)(z₀ − z₂)] = 1/[(e^{iπ/3} − e^{iπ})(e^{iπ/3} − e^{i5π/3})].

These last two equations yield

I = [2πi/(1 − e^{i2π/3})] · 1/[(e^{iπ/3} − e^{iπ})(e^{iπ/3} − e^{i5π/3})] = 2π/(3√3).
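The value 2π/(3√3) can be confirmed by quadrature. Since the integrand falls off like 1/x³, one can integrate the finite part on [0, T] and add the analytic tail estimate ∫_T^∞ x⁻³(1 − x⁻³ + ⋯) dx ≈ 1/(2T²). A sketch (helper name ours):

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule (n must be even)."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3

T = 50.0
val = simpson(lambda x: 1.0 / (x**3 + 1.0), 0.0, T, 100000) + 1.0 / (2 * T * T)
exact = 2 * math.pi / (3 * math.sqrt(3.0))
```

The neglected next tail term is of order 1/(5T⁵), about 10⁻⁹ here, so the agreement is tight.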
10.3 EVALUATION OF DEFINITE INTEGRALS 285

Figure 10.5 The contour is chosen so that only one of the poles lies inside.

10.3.5 Principal Value of an Integral

So far we have discussed only integrals of functions that have no singularities on the contour. Let us now investigate the consequences of the presence of singular points on the contour. Consider the integral

∫_{−∞}^{∞} f(x)/(x − x₀) dx,   (10.9)

where x₀ is a real number and f is analytic at x₀. To avoid x₀, which causes the integrand to diverge, we bypass it by indenting the contour as shown in Figure 10.6 and denoting the new contour by C_u. The contour C₀ is simply a semicircle of radius ε. For the contour C_u, we have

∫_{C_u} f(z)/(z − x₀) dz = ∫_{−∞}^{x₀−ε} f(x)/(x − x₀) dx + ∫_{x₀+ε}^{∞} f(x)/(x − x₀) dx + ∫_{C₀} f(z)/(z − x₀) dz.

In the limit ε → 0, the sum of the first two terms on the RHS, when it exists, defines the principal value of the integral in Equation (10.9):

P ∫_{−∞}^{∞} f(x)/(x − x₀) dx = lim_{ε→0} [∫_{−∞}^{x₀−ε} f(x)/(x − x₀) dx + ∫_{x₀+ε}^{∞} f(x)/(x − x₀) dx].

The integral over the semicircle is calculated by noting that z − x₀ = εe^{iθ} and dz = iεe^{iθ} dθ: ∫_{C₀} f(z) dz/(z − x₀) = −iπ f(x₀). Therefore,

∫_{C_u} f(z)/(z − x₀) dz = P ∫_{−∞}^{∞} f(x)/(x − x₀) dx − iπ f(x₀).   (10.10)
Figure 10.6 The contour $C_u$ avoids $x_0$.

On the other hand, if $C_0$ is taken below the singularity, on a contour $C_d$, say, we obtain
$$\int_{C_d}\frac{f(z)}{z-x_0}\,dz = P\int_{-\infty}^{\infty}\frac{f(x)}{x-x_0}\,dx + i\pi f(x_0). \tag{10.11}$$
We see that the contour integral depends on how the singular point $x_0$ is avoided. However, the principal value, if it exists, is unique. To calculate this principal value we close the contour by adding a large semicircle to it as before, assuming that the contribution from this semicircle goes to zero by Jordan's lemma. The contours $C_u$ and $C_d$ are replaced by a closed contour, and the value of the integral will be given by the residue theorem. We therefore have
$$P\int_{-\infty}^{\infty}\frac{f(x)}{x-x_0}\,dx = \pm i\pi f(x_0) + 2\pi i\sum_{j=1}^{k}\mathrm{Res}\left[\frac{f(z_j)}{z_j-x_0}\right],$$
where the plus sign corresponds to placing the infinitesimal semicircle in the UHP, as shown in Figure 10.6, and the minus sign corresponds to the other choice.

10.3.10. Example. Let us use the principal-value method to evaluate the integral
$$I = \int_0^\infty \frac{\sin x}{x}\,dx = \frac12\int_{-\infty}^{\infty}\frac{\sin x}{x}\,dx.$$
It appears that $x=0$ is a singular point of the integrand; in reality, however, it is only a removable singularity, as can be verified by the Taylor expansion of $\sin x/x$. To make use of the principal-value method, we write
$$I = \frac12\,\mathrm{Im}\left(\int_{-\infty}^{\infty}\frac{e^{ix}}{x}\,dx\right) = \frac12\,\mathrm{Im}\left(P\int_{-\infty}^{\infty}\frac{e^{ix}}{x}\,dx\right).$$
We now use Equation (10.11) with the small circle in the UHP, noting that there are no singularities for $e^{ix}/x$ there. This yields
$$P\int_{-\infty}^{\infty}\frac{e^{ix}}{x}\,dx = i\pi e^{i0} = i\pi.$$
Therefore,
$$\int_0^\infty\frac{\sin x}{x}\,dx = \frac12\,\mathrm{Im}(i\pi) = \frac{\pi}{2}.$$
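The value $\pi/2$ can be corroborated numerically (an illustrative addition, assuming `scipy` is available) through the sine integral $\mathrm{Si}(x) = \int_0^x \sin t/t\,dt$, which tends to $\pi/2$ as $x\to\infty$:

```python
# The sine integral Si(x) = ∫₀ˣ sin t / t dt approaches π/2 for large x,
# in agreement with Example 10.3.10.
import numpy as np
from scipy.special import sici

si_val, ci_val = sici(1e8)  # sici returns (Si(x), Ci(x))
print(si_val, np.pi / 2)
```

The residual oscillation of $\mathrm{Si}(x)$ about $\pi/2$ decays like $\cos x/x$, so the agreement at $x=10^8$ is already excellent.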
Figure 10.7 The equivalent contour obtained by "stretching" $C_u$, the contour of Figure 10.6.

The principal value of an integral can be written more compactly if we deform the contour $C_u$ by stretching it into that shown in Figure 10.7. For small enough $\epsilon$, such a deformation will not change the number of singularities within the infinite closed contour. Thus, the LHS of Equation (10.10) will have limits of integration $-\infty+i\epsilon$ and $+\infty+i\epsilon$. If we change the variable of integration to $\xi = z - i\epsilon$, this integral becomes
$$\int_{-\infty}^{\infty}\frac{f(\xi+i\epsilon)}{\xi+i\epsilon-x_0}\,d\xi = \int_{-\infty}^{\infty}\frac{f(\xi)}{\xi-x_0+i\epsilon}\,d\xi = \int_{-\infty}^{\infty}\frac{f(z)}{z-x_0+i\epsilon}\,dz, \tag{10.12}$$
where in the last step we changed the dummy integration variable back to $z$. Note that since $f$ is assumed to be continuous at all points on the contour, $f(\xi+i\epsilon)\to f(\xi)$ for small $\epsilon$. The last integral of Equation (10.12) shows that there is no singularity on the new $x$-axis; we have pushed the singularity down to $x_0-i\epsilon$. In other words, we have given the singularity on the $x$-axis a small negative imaginary part. We can thus rewrite Equation (10.10) as
$$P\int_{-\infty}^{\infty}\frac{f(x)}{x-x_0}\,dx = i\pi f(x_0) + \int_{-\infty}^{\infty}\frac{f(x)}{x-x_0+i\epsilon}\,dx,$$
where $x$ is used instead of $z$ in the last integral because we are indeed integrating along the new $x$-axis, assuming that no other singularities are present in the UHP. A similar argument, this time for the LHP, introduces a minus sign for the first term on the RHS and for the $\epsilon$ term in the denominator. Therefore,
$$P\int_{-\infty}^{\infty}\frac{f(x)}{x-x_0}\,dx = \pm i\pi f(x_0) + \int_{-\infty}^{\infty}\frac{f(x)}{x-x_0\pm i\epsilon}\,dx, \tag{10.13}$$
where the plus (minus) sign refers to the UHP (LHP). This result is sometimes abbreviated as
$$\frac{1}{x-x_0\pm i\epsilon} = P\frac{1}{x-x_0} \mp i\pi\,\delta(x-x_0). \tag{10.14}$$
10.3.11. Example. Let us use residues to evaluate the function
$$f(k) = \frac{1}{2\pi i}\int_{-\infty}^{\infty}\frac{e^{ikx}\,dx}{x-i\epsilon},\qquad \epsilon>0,$$
an integral representation of the $\theta$ (step) function. We have to close the contour by adding a large semicircle. Whether we do this in the UHP or the LHP is dictated by the sign of $k$. If $k>0$, we close in the UHP. Thus,
$$f(k) = \frac{1}{2\pi i}\oint_C\frac{e^{ikz}\,dz}{z-i\epsilon} = \mathrm{Res}\left[\frac{e^{ikz}}{z-i\epsilon}\right]_{z=i\epsilon} = \lim_{z\to i\epsilon}\left[(z-i\epsilon)\frac{e^{ikz}}{z-i\epsilon}\right] = e^{-k\epsilon}\ \xrightarrow{\ \epsilon\to0\ }\ 1.$$
On the other hand, if $k<0$, we must close in the LHP, in which the integrand is analytic. Thus, by the Cauchy-Goursat theorem, the integral vanishes. Therefore, we have
$$f(k) = \begin{cases} 1 & \text{if } k>0, \\ 0 & \text{if } k<0. \end{cases}$$
This is precisely the definition of the theta function (or step function). Thus, we have obtained an integral representation of that function:
$$\theta(x) = \frac{1}{2\pi i}\int_{-\infty}^{\infty}\frac{e^{ixt}}{t-i\epsilon}\,dt.$$

Now suppose that there are two singular points on the real axis, at $x_1$ and $x_2$. Let us avoid $x_1$ and $x_2$ by making little semicircles, as before, letting both semicircles be in the UHP (see Figure 10.8). Without writing the integrands, we can represent the contour integral schematically, and the principal value of the integral is naturally defined to be the sum of all integrals having $\epsilon$ in their limits. The contribution from the small semicircle $C_1$ can be calculated by substituting $z-x_1 = \epsilon e^{i\theta}$ in the integral:
$$\int_{C_1}\frac{f(z)\,dz}{(z-x_1)(z-x_2)} = \int_\pi^0\frac{f(x_1+\epsilon e^{i\theta})\,i\epsilon e^{i\theta}\,d\theta}{\epsilon e^{i\theta}(x_1+\epsilon e^{i\theta}-x_2)} = -i\pi\,\frac{f(x_1)}{x_1-x_2},$$
with a similar result for $C_2$. Putting everything together, we get
$$P\int_{-\infty}^{\infty}\frac{f(x)}{(x-x_1)(x-x_2)}\,dx - i\pi\,\frac{f(x_2)-f(x_1)}{x_2-x_1} = 2\pi i\sum\mathrm{Res}.$$
If we include the case where both $C_1$ and $C_2$ are in the LHP, we get
$$P\int_{-\infty}^{\infty}\frac{f(x)}{(x-x_1)(x-x_2)}\,dx = \pm i\pi\,\frac{f(x_2)-f(x_1)}{x_2-x_1} + 2\pi i\sum\mathrm{Res}, \tag{10.15}$$
Figure 10.8 One of the four choices of contours for evaluating the principal value of the integral when there are two poles on the real axis.

where the plus sign is for the case where $C_1$ and $C_2$ are in the UHP and the minus sign for the case where both are in the LHP. We can also obtain the result for the case where the two singularities coincide by taking the limit $x_1\to x_2$. Then the RHS of the last equation becomes a derivative, and we obtain
$$P\int_{-\infty}^{\infty}\frac{f(x)}{(x-x_0)^2}\,dx = \pm i\pi f'(x_0) + 2\pi i\sum\mathrm{Res}.$$

10.3.12. Example. An expression encountered in the study of Green's functions or propagators (which we shall discuss later in the book) is
$$\int_{-\infty}^{\infty}\frac{e^{itx}\,dx}{x^2-k^2},$$
where $k$ and $t$ are real constants. We want to calculate the principal value of this integral. We use Equation (10.15) and note that for $t>0$, we need to close the contour in the UHP, where there are no poles:
$$P\int_{-\infty}^{\infty}\frac{e^{itx}\,dx}{x^2-k^2} = P\int_{-\infty}^{\infty}\frac{e^{itx}\,dx}{(x-k)(x+k)} = i\pi\,\frac{e^{ikt}-e^{-ikt}}{2k} = -\pi\,\frac{\sin kt}{k}.$$
When $t<0$, we have to close the contour in the LHP, where again there are no poles:
$$P\int_{-\infty}^{\infty}\frac{e^{itx}\,dx}{x^2-k^2} = P\int_{-\infty}^{\infty}\frac{e^{itx}\,dx}{(x-k)(x+k)} = -i\pi\,\frac{e^{ikt}-e^{-ikt}}{2k} = \pi\,\frac{\sin kt}{k}.$$
The two results above can be combined into a single relation:
$$P\int_{-\infty}^{\infty}\frac{e^{itx}\,dx}{x^2-k^2} = -\pi\,\frac{\sin k|t|}{k}.$$
10.4 Problems

10.1. Evaluate each of the following integrals, for all of which $C$ is the circle $|z|=3$.
(a) $\oint_C \dfrac{4z-3}{z(z-2)}\,dz$  (b) $\oint_C \dfrac{e^z\,dz}{z(z-i\pi)}$  (c) $\oint_C \dfrac{\cos z}{z(z-\pi)}\,dz$
(d) $\oint_C \dfrac{z^2+1}{z(z-1)}\,dz$  (e) $\oint_C \dfrac{\cosh z}{z^2+\pi^2}\,dz$  (f) $\oint_C \dfrac{1-\cos z}{z}\,dz$
(g) $\oint_C \dfrac{\sinh z}{z^4}\,dz$  (h) $\oint_C z\cos\!\left(\dfrac1z\right)dz$  (i) $\oint_C \dfrac{dz}{z^3(z+5)}$
(j) $\oint_C \tan z\,dz$  (k) $\oint_C \dfrac{dz}{\sinh 2z}$  (l) $\oint_C z\,dz$
(m) $\oint_C \dfrac{dz}{z^2\sin z}$  (n) $\oint_C \dfrac{e^z\,dz}{(z-1)(z-2)}$

10.2. Let $h(z)$ be analytic and have a simple zero at $z=z_0$, and let $g(z)$ be analytic there. Let $f(z)=g(z)/h(z)$, and show that
$$\mathrm{Res}[f(z_0)] = \frac{g(z_0)}{h'(z_0)}.$$

10.3. Find the residue of $f(z)=1/\cos z$ at each of its poles.

10.4. Evaluate the integral $\int_{-\infty}^{\infty} dx/[(x^2+1)(x^2+4)]$ by closing the contour (a) in the UHP and (b) in the LHP.

10.5. Evaluate the following integrals, in which $a$ and $b$ are nonzero real constants.
Figure 10.9 The contour used in Problem 10.8: a rectangle with vertices at $-R$, $+R$, $R+ib$, and $-R+ib$.

10.6. Evaluate each of the following integrals by turning it into a contour integral around a unit circle.
(a) $\displaystyle\int_0^{2\pi}\frac{d\theta}{5+4\sin\theta}$
(b) $\displaystyle\int_0^{2\pi}\frac{d\theta}{a+\cos\theta}$ where $a>1$.
(c) $\displaystyle\int_0^{2\pi}\frac{d\theta}{1+\sin^2\theta}$
(d) $\displaystyle\int_0^{2\pi}\frac{d\theta}{(a+b\cos^2\theta)^2}$ where $a,b>0$.
(e) $\displaystyle\int_0^{2\pi}\frac{\cos^2 3\theta}{5-4\cos 2\theta}\,d\theta$
(f) $\displaystyle\int_0^{\pi}\frac{d\phi}{1-2a\cos\phi+a^2}$ where $a\neq\pm1$.
(g) $\displaystyle\int_0^{\pi}\frac{\cos^2 3\phi\,d\phi}{1-2a\cos\phi+a^2}$ where $a\neq\pm1$.
(h) $\displaystyle\int_0^{\pi}\frac{\cos 2\phi\,d\phi}{1-2a\cos\phi+a^2}$ where $a\neq\pm1$.
(i) $\displaystyle\int_0^{\pi}\tan(x+ia)\,dx$ where $a\in\mathbb{R}$.
(j) $\displaystyle\int_0^{\pi}e^{\cos\phi}\cos(n\phi-\sin\phi)\,d\phi$ where $n\in\mathbb{Z}$.

10.7. Evaluate the integral $I = \int_{-\infty}^{\infty} e^{ax}\,dx/(1+e^x)$ for $0<a<1$. Hint: Choose a closed (long) rectangle that encloses only one of the zeros of the denominator. Show that the contributions of the short sides of the rectangle are zero.

10.8. Derive the integration formula
$$\int_0^\infty e^{-x^2}\cos(2bx)\,dx = \frac{\sqrt\pi}{2}\,e^{-b^2}\qquad\text{where } b\neq0,$$
by integrating the function $e^{-z^2}$ around the rectangular path shown in Figure 10.9.

10.9. Use the result of Example 10.3.11 to show that $\theta'(k) = \delta(k)$.
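The formula stated in Problem 10.8 can be verified numerically before attempting the contour derivation (an illustrative addition; the value of $b$ is an arbitrary choice and `scipy` is assumed available):

```python
# Numerically verify Problem 10.8: ∫₀^∞ e^(−x²) cos(2bx) dx = (√π/2) e^(−b²).
import numpy as np
from scipy.integrate import quad

b = 1.3
numeric, _ = quad(lambda x: np.exp(-x**2) * np.cos(2 * b * x), 0, np.inf)
exact = np.sqrt(np.pi) / 2 * np.exp(-b**2)
print(numeric, exact)
```

The Gaussian envelope makes the oscillatory integrand decay fast, so the quadrature agrees with the closed form to near machine precision.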
10.10. Find the principal values of the following integrals.
(a) $\displaystyle\int_{-\infty}^{\infty}\frac{\sin x\,dx}{(x^2+4)(x-1)}$
(b) $\displaystyle\int_{-\infty}^{\infty}\frac{\cos ax}{1+x^3}\,dx$
(c) $\displaystyle\int_{-\infty}^{\infty}\frac{x\cos x}{x^2-5x+6}\,dx$
(d) $\displaystyle\int_{-\infty}^{\infty}\frac{1-\cos x}{x^2}\,dx$

10.11. Evaluate the following integrals.
(a) $\displaystyle\int_0^\infty\frac{x^2-b^2}{x^2+b^2}\left(\frac{\sin ax}{x}\right)dx$
(b) $\displaystyle\int_0^\infty\frac{\sin ax}{x(x^2+b^2)}\,dx$ where $a\neq0$.
(c) $\displaystyle\int_0^\infty\frac{\sin ax}{x(x^2+b^2)^2}\,dx$
(d) $\displaystyle\int_0^\infty\frac{\cos 2ax-\cos 2bx}{x^2}\,dx$
(e) $\displaystyle\int_0^\infty\frac{\sin^2 x\,dx}{x^2}$
(f) $\displaystyle\int_0^\infty\frac{\sin^3 x\,dx}{x^3}$

Additional Reading
1. Dennery, P. and Krzywicki, A. Mathematics for Physicists, Harper and Row, 1967. Includes a detailed discussion of complex analysis encompassing applications of conformal mappings and the residue theorem to physical problems.
2. Mathews, J. and Walker, R. Mathematical Methods of Physics, 2nd ed., Benjamin, 1970. A "practical" guide to mathematical physics, including a long discussion of "how to evaluate integrals" and the use of the residue theorem.
11 Complex Analysis: Advanced Topics

The subject of complex analysis is an extremely rich and powerful area of mathematics. We have already seen some of this richness and power in the previous chapter. This chapter concludes our discussion of complex analysis by introducing some other topics with varying degrees of importance.

11.1 Meromorphic Functions

Complex functions that have only simple poles as their singularities are numerous in applications and are called meromorphic functions. In this section, we derive an important result for such functions. Assume that $f(z)$ has simple poles at $\{z_j\}_{j=1}^N$, where $N$ could be infinity. Then, if $z\neq z_j$ for all $j$, the residue theorem yields¹
$$\frac{1}{2\pi i}\oint_{C_n}\frac{f(\xi)}{\xi-z}\,d\xi = f(z) + \sum_{j=1}^{n}\mathrm{Res}\left[\frac{f(\xi)}{\xi-z}\right]_{\xi=z_j},$$
where $C_n$ is a circle containing the first $n$ poles, and it is assumed that the poles are arranged in order of increasing absolute values. Since the poles of $f$ are assumed to be simple, we have
$$\mathrm{Res}\left[\frac{f(\xi)}{\xi-z}\right]_{\xi=z_j} = \lim_{\xi\to z_j}(\xi-z_j)\frac{f(\xi)}{\xi-z} = \frac{1}{z_j-z}\lim_{\xi\to z_j}\left[(\xi-z_j)f(\xi)\right] = \frac{r_j}{z_j-z},$$

¹Note that the residue of $f(\xi)/(\xi-z)$ at $\xi=z$ is simply $f(z)$.
where $r_j$ is, by definition, the residue of $f(\xi)$ at $\xi=z_j$. Substituting in the preceding equation gives
$$f(z) = \frac{1}{2\pi i}\oint_{C_n}\frac{f(\xi)}{\xi-z}\,d\xi - \sum_{j=1}^{n}\frac{r_j}{z_j-z}.$$
Taking the difference between this and the same equation evaluated at $z=0$ (assumed to be none of the poles),² we can write
$$f(z)-f(0) = \frac{z}{2\pi i}\oint_{C_n}\frac{f(\xi)}{\xi(\xi-z)}\,d\xi + \sum_{j=1}^{n}r_j\left(\frac{1}{z-z_j}+\frac{1}{z_j}\right). \tag{11.1}$$
If $|f(\xi)|$ approaches a finite value as $|\xi|\to\infty$, the integral vanishes for an infinite circle (which includes all poles now), and we obtain what is called the Mittag-Leffler expansion of the meromorphic function $f$:
$$f(z) = f(0) + \sum_{j=1}^{\infty}r_j\left(\frac{1}{z-z_j}+\frac{1}{z_j}\right).$$
Now we let $g$ be an entire function with simple zeros. We claim that (a) $(dg/dz)/g(z)$ is a meromorphic function that is bounded for all values of $z$, and (b) its residues are all unity. To see this, note that $g$ is of the form³
$$g(z) = (z-z_1)(z-z_2)\cdots(z-z_N)\,f(z),$$
where $z_1,\dots,z_N$ are all the zeros of $g$, and $f$ is an analytic function that does not vanish anywhere in the complex plane. It is now easy to see that
$$\frac{g'(z)}{g(z)} = \sum_{j=1}^{N}\frac{1}{z-z_j} + \frac{f'(z)}{f(z)}.$$
This expression has both properties (a) and (b) mentioned above. Furthermore, the last term is an entire function that is bounded for all $z$. Therefore, it must be a constant by Proposition 9.5.5. This derivation also verifies Equation (11.1), which in the case at hand can be written as
$$\frac{d}{dz}\ln g(z) = \frac{g'(z)}{g(z)} = \left.\frac{d}{dz}\ln g(z)\right|_{z=0} + \sum_{j=1}^{N}\left(\frac{1}{z-z_j}+\frac{1}{z_j}\right),$$
whose solution is readily found to be
$$g(z) = g(0)\,e^{cz}\prod_{j=1}^{N}\left(1-\frac{z}{z_j}\right)e^{z/z_j},\qquad c\equiv\frac{(dg/dz)|_{z=0}}{g(0)}, \tag{11.2}$$
and it is assumed that $z_j\neq0$ for all $j$.

²This is not a restrictive assumption, because we can always move our coordinate system so that the origin avoids all poles.
³One can "prove" this by factoring the simple zeros one by one, writing $g(z)=(z-z_1)f_1(z)$ and noting that $g(z_2)=0$, with $z_2\neq z_1$, implies that $f_1(z)=(z-z_2)f_2(z)$, etc.
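A classic instance of a product expansion of the type (11.2) is obtained (not derived in this section; we assume the standard result) by applying it to $g(z)=\sin z/z$, whose zeros $\pm n\pi$ pair up so that the exponential factors cancel, leaving $\sin z = z\prod_{n=1}^\infty\left(1-z^2/n^2\pi^2\right)$. A truncated product already reproduces $\sin z$ well:

```python
# Truncated product expansion: sin z ≈ z Π_{n=1}^{N} (1 − z²/(n²π²)).
import numpy as np

z = 1.3
N = 20000
n = np.arange(1, N + 1)
product = z * np.prod(1 - z**2 / (n**2 * np.pi**2))
print(product, np.sin(z))
```

The truncation error of the product falls off only like $1/N$, which is why a fairly large $N$ is needed here.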
Figure 11.1 (a) The angle $\theta_0$ changes by $2\pi$ as $z_0$ makes a complete circuit around $C$. (b) The angle $\theta_0$ returns to its original value when $z_0$ completes the circuit.

11.2 Multivalued Functions

The arbitrariness, up to a multiple of $2\pi$, of the angle $\theta = \arg(z)$ in $z = re^{i\theta}$ leads to functions that can take different values at the same point. Consider, for example, the function $f(z)=\sqrt z$. Writing $z$ in polar coordinates, we obtain
$$f(z) = f(r,\theta) = (re^{i\theta})^{1/2} = \sqrt r\,e^{i\theta/2}.$$
This shows that for the same $z = (r,\theta) = (r,\theta+2\pi)$, we get two different values, $f(r,\theta)$ and $f(r,\theta+2\pi) = -f(r,\theta)$. This may be disturbing at first. After all, the definition of a function (mapping) ensures that for any point in the domain a unique image is obtained. Here two different images are obtained for the same $z$. Riemann found a cure for this complex "double vision" by introducing what are now called Riemann sheets. We will discuss these briefly below, but first let us take a closer look at a prototype of multivalued functions.

Consider the natural log function, $\ln z$. For $z = re^{i\theta}$ this is defined as
$$\ln z = \ln r + i\theta = \ln|z| + i\arg(z),$$
where $\arg(z)$ is defined only to within a multiple of $2\pi$; that is, $\arg(z) = \theta+2n\pi$ for $n=0,\pm1,\pm2,\dots$. We can see the peculiar nature of the logarithmic function by considering a closed path around the point $z=0$, as shown in Figure 11.1(a). Starting at $z_0$, we move counterclockwise, noticing the constant increase in the angle $\theta_0$, until we reach the initial point in the $z$-plane. However, the angle is then $\theta_0+2\pi$. Thus, the process of moving around the origin has changed the value of the log function by $2\pi i$:
$$(\ln z_0)_{\text{final}} - (\ln z_0)_{\text{initial}} = 2\pi i.$$
Note that in this process $z_0$ does not change, because $z_0 = r_0e^{i\theta_0} = r_0e^{i(\theta_0+2\pi)}$.

11.2.1. Definition. A branch point of a function $f:\mathbb{C}\to\mathbb{C}$ is a complex number
$z_0$ with the property that $f(r_0,\theta_0)\neq f(r_0,\theta_0+2\pi)$ for any closed curve $C$ encircling $z_0$. Here $(r_0,\theta_0)$ are the polar coordinates of $z_0$.

Victor-Alexandre Puiseux (1820-1883) was the first to take up the subject of multivalued functions. In 1850 Puiseux published a celebrated paper on complex algebraic functions given by $f(u,z)=0$, $f$ a polynomial in $u$ and $z$. He first made clear the distinction between poles and branch points that Cauchy had barely perceived, and introduced the notion of an essential singular point, to which Weierstrass independently had called attention. Though Cauchy, in the 1846 paper, did consider the variation of simple multivalued functions along paths that enclosed branch points, Puiseux clarified this subject too. Puiseux also showed that the development of a function of $z$ about a branch point $z=a$ must involve fractional powers of $z-a$. He then improved on Cauchy's theorem on the expansion of a function in a Maclaurin series. By his significant investigations of many-valued functions and their branch points in the complex plane, and by his initial work on integrals of such functions, Puiseux brought Cauchy's pioneering work in function theory to the end of what might be called the first stage. The difficulties in the theory of multiple-valued functions and integrals of such functions were still to be overcome. Cauchy did write other papers on the integrals of multivalued functions in which he attempted to follow up on Puiseux's work; and though he introduced the notion of branch cuts (lignes d'arrêt), he was still confused about the distinction between poles and branch points. This subject of algebraic functions and their integrals was to be pursued by Riemann.

Puiseux was a keen mountaineer and was the first to scale the Alpine peak that is now named after him.

Thus, $z=0$ is a branch point of the logarithmic function. Studying the behavior of $\ln(1/z) = -\ln z$ around $z=0$ will reveal that the point "at infinity" is also a branch point of $\ln z$.
Figure 11.1(b) shows that any other point of the complex plane, such as $z'$, cannot be a branch point, because $\theta_0$ does not change when $C'$ is traversed completely.

11.2.1 Riemann Surfaces

The idea of a Riemann surface begins with the removal of all points that lie on the line (or any other curve) joining two branch points. For $\ln z$ this means the removal of all points lying on a curve that starts at $z=0$ and extends all the way to infinity. Such a curve is called a branch cut, or simply a cut. Let us concentrate on $\ln z$ and take the cut to be along the negative half of the real axis. Let us also define the functions
$$f_n(z) = f_n(r,\theta) = \ln r + i(\theta+2n\pi)\qquad\text{for } -\pi<\theta<\pi;\ r>0;\ n=0,\pm1,\dots,$$
so $f_n(z)$ takes on the same values for $-\pi<\theta<\pi$ that $\ln z$ takes in the range $(2n-1)\pi<\theta<(2n+1)\pi$. We have replaced the multivalued logarithmic function by a series of different functions that are analytic in the cut $z$-plane. This process of cutting the $z$-plane and then defining a sequence of functions eliminates the contradiction caused by the existence of branch points, since we are no longer allowed to completely encircle a branch point. A complete circulation involves crossing the cut, which, in turn, violates the domain of definition of $f_n(z)$.

We have made good progress. We have replaced the (nonanalytic) multivalued function $\ln z$ with a series of analytic (in their domain of definition) functions $f_n(z)$. However, there is a problem left: $f_n(z)$ has a discontinuity at the cut. In fact, just above the cut $f_n(r,\pi-\epsilon) = \ln r + i(\pi-\epsilon+2n\pi)$ with $\epsilon>0$, and just below it $f_n(r,-\pi+\epsilon) = \ln r + i(-\pi+\epsilon+2n\pi)$, so that
$$\lim_{\epsilon\to0}\left[f_n(r,\pi-\epsilon) - f_n(r,-\pi+\epsilon)\right] = 2\pi i.$$
To cure this we make the observation that the value of $f_n(z)$ just above the cut is the same as the value of $f_{n+1}(z)$ just below the cut. This suggests the following geometrical construction, due to Riemann: Superpose an infinite series of cut complex planes one on top of the other, each plane corresponding to a different value of $n$. The adjacent planes are connected along the cut such that the upper lip of the cut in the $(n-1)$th plane is connected to the lower lip of the cut in the $n$th plane. All planes contain the two branch points. That is, the branch points appear as "hinges" at which all the planes are joined. With this geometrical construction, if we cross the cut, we end up on a different plane adjacent to the previous one (Figure 11.2). The geometric surface thus constructed is called a Riemann surface; each plane is called a Riemann sheet and is denoted by $R_j$, for $j=0,\pm1,\pm2,\dots$.
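The two observations above, the $2\pi i$ jump across the cut on a single sheet and the smooth join between adjacent sheets, can be made concrete with a few lines of code (an illustrative sketch; the particular values of $r$ and $\epsilon$ are arbitrary):

```python
# Branches of the logarithm: f_n(r, θ) = ln r + i(θ + 2nπ), with −π < θ < π.
import numpy as np

f = lambda r, theta, n: np.log(r) + 1j * (theta + 2 * n * np.pi)

r, eps = 2.0, 1e-9
above_R0 = f(r, np.pi - eps, 0)    # just above the cut on sheet R_0
below_R1 = f(r, -np.pi + eps, 1)   # just below the cut on sheet R_1
jump = f(r, np.pi - eps, 0) - f(r, -np.pi + eps, 0)

print(abs(above_R0 - below_R1))  # ≈ 0: adjacent sheets join smoothly
print(jump)                      # ≈ 2πi: discontinuity across the cut on one sheet
```

This is exactly the pasting rule of Riemann's construction: walking across the cut on $R_0$ deposits you on $R_1$ with no discontinuity in the function's value.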
A single-valued function defined on a Riemann sheet is called a branch of the original multivalued function. We have achieved the following: From a multivalued function we have constructed a sequence of single-valued functions, each defined in a single complex plane; from this sequence of functions we have constructed a single complex function defined on a single Riemann surface. Thus, the logarithmic function is analytic throughout the Riemann surface except at the branch points, which are simply the function's singular points.

It is now easy to see the geometrical significance of branch points. A complete cycle around a branch point takes us to another Riemann sheet, where the function takes on a different form. On the other hand, a complete cycle around an ordinary point either never crosses the cut, or if it does, it will cross it back to the original sheet.

Let us now briefly consider two of the more common multivalued functions and their Riemann surfaces.

11.2.2. Example. THE FUNCTION $f(z) = z^{1/n}$
The only branch points for the function $f(z) = z^{1/n}$ are $z=0$ and the point at infinity.
Figure 11.2 A few sheets of the Riemann surface of the logarithmic function. The path $C$ encircling the origin $O$ ends up on the lower sheet.

Defining
$$f_k(z) \equiv r^{1/n}e^{i(\theta+2k\pi)/n}\qquad\text{for } k=0,1,\dots,n-1 \text{ and } 0<\theta<2\pi,$$
and following the same procedure as for the logarithmic function, we see that there must be $n$ Riemann sheets, labeled $R_0, R_1,\dots,R_{n-1}$, in the Riemann surface. The lower edge of $R_{n-1}$ is pasted to the upper edge of $R_0$ along the cut, which is taken to be along the positive real axis. The Riemann surface for $n=2$ is shown in Figure 11.3. It is clear that for any noninteger value of $\alpha$ the function $f(z)=z^\alpha$ has a branch point at $z=0$ and another at the point at infinity. For irrational $\alpha$ the number of Riemann sheets is infinite.

11.2.3. Example. THE FUNCTION $f(z) = (z^2-1)^{1/2}$
The branch points for the function $f(z) = (z^2-1)^{1/2}$ are at $z_1=+1$ and $z_2=-1$ (see Figure 11.4). Writing $z-1 = r_1e^{i\theta_1}$ and $z+1 = r_2e^{i\theta_2}$, we have
$$f(z) = (r_1e^{i\theta_1})^{1/2}(r_2e^{i\theta_2})^{1/2} = \sqrt{r_1r_2}\,e^{i(\theta_1+\theta_2)/2}.$$
The cut is along the real axis from $z=-1$ to $z=+1$. There are two Riemann sheets in the Riemann surface. Clearly, only cycles of $2\pi$ involving one branch point will cross the cut and therefore end up on a different sheet. Any closed curve that has both $z_1$ and $z_2$ as interior points will remain entirely on the original sheet.

The notion of branch cuts can be used to evaluate certain integrals that do not fit into the three categories discussed in Chapter 10. The basic idea is to circumvent the cut by constructing a contour that is infinitesimally close to the cut and circles around branch points.
Figure 11.3 The Riemann surface for $f(z) = z^{1/2}$.

Figure 11.4 The cut for the function $f(z) = (z^2-1)^{1/2}$ is from $z_1$ to $z_2$. Paths that circle only one of the points cross the cut and end up on the other sheet.

11.2.4. Example. To evaluate the integral $I = \int_0^\infty x^a\,dx/(x^2+1)$ for $|a|<1$, consider the complex integral $I' = \oint_C z^a\,dz/(z^2+1)$, where $C$ is as shown in Figure 11.5 and the cut is taken along the positive real axis. To evaluate the contribution from $C_R$ and $C_r$, we let $\rho$ stand for either $r$ or $R$. Then we have
$$I_\rho = \left|\int_{C_\rho}\frac{z^a\,dz}{z^2+1}\right| \le \frac{2\pi\rho^{a+1}}{|\rho^2-1|}.$$
It is clear that since $|a|<1$, $I_\rho\to0$ as $\rho\to0$ or $\rho\to\infty$.
Figure 11.5 The contour for the evaluation of the integrals of Examples 11.2.4 and 11.2.5.

The contributions from $L_1$ and $L_2$ do not cancel one another because the value of the function changes above and below the cut. To evaluate these two integrals we have to choose a branch of the function. Let us choose that branch on which $z^a = |z|^ae^{ia\theta}$ for $0<\theta<2\pi$. Along $L_1$, $\theta\approx0$, or $z^a = x^a$, and along $L_2$, $\theta\approx2\pi$, or $z^a = (xe^{2\pi i})^a$. Thus,
$$\oint_C\frac{z^a}{z^2+1}\,dz = \int_0^\infty\frac{x^a}{x^2+1}\,dx + \int_\infty^0\frac{x^ae^{2\pi ia}}{(xe^{2\pi i})^2+1}\,dx = (1-e^{2\pi ia})\int_0^\infty\frac{x^a}{x^2+1}\,dx. \tag{11.3}$$
The LHS of this equation can be obtained using the residue theorem. There are two simple poles, at $z=+i$ and $z=-i$, with residues $\mathrm{Res}[f(i)] = e^{ia\pi/2}/2i$ and $\mathrm{Res}[f(-i)] = -e^{i3a\pi/2}/2i$. Thus,
$$\oint_C\frac{z^a}{z^2+1}\,dz = 2\pi i\left(\frac{e^{ia\pi/2}}{2i}-\frac{e^{i3a\pi/2}}{2i}\right) = \pi\left(e^{ia\pi/2}-e^{i3a\pi/2}\right).$$
Combining this with Equation (11.3), we obtain
$$\int_0^\infty\frac{x^a}{x^2+1}\,dx = \frac{\pi\left(e^{ia\pi/2}-e^{i3a\pi/2}\right)}{1-e^{2\pi ia}} = \frac{\pi}{2}\sec\frac{a\pi}{2}.$$
If we had chosen a different branch of the function, both the LHS and the RHS of Equation (11.3) would have been different, but the final result would still have been the same.
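A numerical check of the closed form $(\pi/2)\sec(a\pi/2)$ is straightforward (an illustrative addition; $a=0.4$ is an arbitrary choice and `scipy` is assumed available). To avoid the slowly decaying tail $\sim x^{a-2}$, the piece on $[1,\infty)$ is mapped to $(0,1]$ by $x\to1/x$, which here happens to reproduce the integrand with $a\to-a$:

```python
# Check Example 11.2.4: ∫₀^∞ x^a/(x²+1) dx = (π/2) sec(aπ/2) for |a| < 1.
import numpy as np
from scipy.integrate import quad

a = 0.4
head = quad(lambda x: x**a / (1 + x**2), 0, 1)[0]
tail = quad(lambda u: u**(-a) / (1 + u**2), 0, 1)[0]  # x → 1/u on [1, ∞)
numeric = head + tail
exact = np.pi / (2 * np.cos(a * np.pi / 2))
print(numeric, exact)
```

The substitution trades an infinite range for a mild, integrable endpoint singularity at $u=0$, which adaptive quadrature handles accurately.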
Figure 11.6 The contour for the evaluation of the integral of Example 11.2.6.

11.2.5. Example. Here is another integral involving a branch cut:
$$I = \int_0^\infty\frac{x^{-a}}{x+1}\,dx\qquad\text{for } 0<a<1.$$
To evaluate this integral we use the zeroth branch of the function and the contour of the previous example (Figure 11.5). Thus, writing $z = \rho e^{i\theta}$, we have
$$2\pi i\,\mathrm{Res}[f(-1)] = \oint_C\frac{z^{-a}}{z+1}\,dz = \int_0^\infty\frac{\rho^{-a}}{\rho+1}\,d\rho + \int_{C_R}\frac{z^{-a}}{z+1}\,dz + \int_\infty^0\frac{(\rho e^{2i\pi})^{-a}}{\rho e^{2i\pi}+1}\,e^{2i\pi}\,d\rho + \int_{C_r}\frac{z^{-a}}{z+1}\,dz. \tag{11.4}$$
The contributions from both circles vanish by the same argument used in the previous example. On the other hand, $\mathrm{Res}[f(-1)] = (-1)^{-a}$. For the branch we are using, $-1 = e^{i\pi}$. Thus, $\mathrm{Res}[f(-1)] = e^{-ia\pi}$. The RHS of Equation (11.4) yields
$$\int_0^\infty\frac{\rho^{-a}}{\rho+1}\,d\rho - e^{-2i\pi a}\int_0^\infty\frac{\rho^{-a}}{\rho+1}\,d\rho = (1-e^{-2i\pi a})\,I.$$
It follows from (11.4) that $(1-e^{-2i\pi a})I = 2\pi ie^{-i\pi a}$, or
$$\int_0^\infty\frac{x^{-a}}{x+1}\,dx = \frac{\pi}{\sin a\pi}\qquad\text{for } 0<a<1.$$

11.2.6. Example. Let us evaluate $I = \int_0^\infty \ln x\,dx/(x^2+a^2)$ with $a>0$. We choose the zeroth branch of the logarithmic function, in which $-\pi<\theta<\pi$, and use the contour of Figure 11.6. For $L_1$, $z = \rho e^{i\pi}$ (note that $\rho>0$), and for $L_2$, $z = \rho$. Thus, we have
$$2\pi i\,\mathrm{Res}[f(ia)] = \oint_C\frac{\ln z}{z^2+a^2}\,dz = \int_\infty^\epsilon\frac{\ln(\rho e^{i\pi})}{(\rho e^{i\pi})^2+a^2}\,e^{i\pi}\,d\rho + \int_{C_\epsilon}\frac{\ln z}{z^2+a^2}\,dz + \int_\epsilon^\infty\frac{\ln\rho}{\rho^2+a^2}\,d\rho + \int_{C_R}\frac{\ln z}{z^2+a^2}\,dz, \tag{11.5}$$
where $z=ia$ is the only singularity, a simple pole, in the UHP. Now we note that
$$\int_\infty^\epsilon\frac{\ln(\rho e^{i\pi})}{(\rho e^{i\pi})^2+a^2}\,e^{i\pi}\,d\rho = \int_\epsilon^\infty\frac{\ln\rho+i\pi}{\rho^2+a^2}\,d\rho = \int_\epsilon^\infty\frac{\ln\rho}{\rho^2+a^2}\,d\rho + i\pi\int_\epsilon^\infty\frac{d\rho}{\rho^2+a^2}.$$
The contributions from the circles tend to zero. On the other hand,
$$\mathrm{Res}[f(ia)] = \lim_{z\to ia}(z-ia)\frac{\ln z}{(z-ia)(z+ia)} = \frac{\ln(ia)}{2ia} = \frac{1}{2ia}\left(\ln a + i\frac{\pi}{2}\right).$$
Substituting the last two results in Equation (11.5), we obtain
$$\frac{\pi}{a}\left(\ln a + i\frac{\pi}{2}\right) = 2\int_\epsilon^\infty\frac{\ln\rho}{\rho^2+a^2}\,d\rho + i\pi\int_\epsilon^\infty\frac{d\rho}{\rho^2+a^2}.$$
It can also easily be shown that $\int_0^\infty d\rho/(\rho^2+a^2) = \pi/(2a)$. Thus, in the limit $\epsilon\to0$, we get $I = \frac{\pi}{2a}\ln a$. The sign of $a$ is irrelevant because it appears as a square in the integral. Thus, we can write
$$\int_0^\infty\frac{\ln x}{x^2+a^2}\,dx = \frac{\pi}{2a}\ln|a|,\qquad a\neq0.$$

11.3 Analytic Continuation

Analytic functions have certain unique properties, some of which we have already noted. For instance, the Cauchy integral formula gives the value of an analytic function inside a simple closed contour once its value on the contour is known. We have also seen that we can deform the contours of integration as long as we do not encounter any singularities of the function. Combining these two properties and assuming that $f:\mathbb{C}\to\mathbb{C}$ is analytic within a region $S\subset\mathbb{C}$, we can ask the following question: Is it possible to extend $f$ beyond $S$? We shall see in this section that the answer is yes in many cases of interest.⁴ First consider the following theorem ("equal on a piece, equal all over").

11.3.1. Theorem. Let $f_1, f_2:\mathbb{C}\to\mathbb{C}$ be analytic in a region $S$. If $f_1 = f_2$ in a neighborhood of a point $z\in S$, or for a segment of a curve in $S$, then $f_1 = f_2$ for all $z\in S$.

Proof. Let $g = f_1-f_2$, and $U = \{z\in S \mid g(z)=0\}$. Then $U$ is a subset of $S$ that includes the neighborhood of $z$ (or the line segment) in which $f_1=f_2$. If $U$ is the entire region $S$, we are done. Otherwise, $U$ has a boundary beyond which $g(z)\neq0$.
Since all points within the boundary satisfy $g(z)=0$, and since $g$ is continuous (more than that, it is analytic) on $S$, $g$ must vanish also on the boundary. But the boundary points are not isolated: Any small circle around any one of them includes points of $U$ as well as points outside $U$. Thus, $g$ must vanish on a neighborhood of any boundary point, implying that $g$ vanishes for some points outside $U$. This contradicts our assumption. Thus, $U$ must include the entire region $S$. $\Box$

⁴Provided that $S$ is not discrete (countable). (See [Lang85, p. 91].)
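Before moving on, the closed forms obtained in Examples 11.2.5 and 11.2.6 can be sanity-checked numerically (an illustrative addition; the parameter values are arbitrary and `scipy` is assumed available). For the first integral, the tail on $[1,\infty)$ is mapped to $(0,1]$ by $x\to1/u$, giving the integrand $u^{a-1}/(u+1)$:

```python
# Check Example 11.2.5: ∫₀^∞ x^(−a)/(x+1) dx = π/sin(aπ),   0 < a < 1
# and  Example 11.2.6: ∫₀^∞ ln x/(x² + a²) dx = (π/2a) ln a,  a > 0.
import numpy as np
from scipy.integrate import quad

a = 0.3
head = quad(lambda x: x**(-a) / (x + 1), 0, 1)[0]
tail = quad(lambda u: u**(a - 1) / (u + 1), 0, 1)[0]  # substitution x = 1/u
i1 = head + tail
exact1 = np.pi / np.sin(a * np.pi)

b = 2.5
i2 = quad(lambda x: np.log(x) / (x**2 + b**2), 0, np.inf)[0]
exact2 = np.pi / (2 * b) * np.log(b)

print(i1, exact1)
print(i2, exact2)
```

Both integrands have integrable endpoint singularities (algebraic and logarithmic, respectively), which the adaptive routine resolves without difficulty.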
A consequence of this theorem is the following corollary.

11.3.2. Corollary. The behavior of a function that is analytic in a region $S\subset\mathbb{C}$ is completely determined by its behavior in a (small) neighborhood of an arbitrary point in that region.

This process of determining the behavior of an analytic function outside the region in which it was originally defined is called analytic continuation. Although there are infinitely many ways of analytically continuing beyond regions of definition, the values of all functions obtained as a result of diverse continuations are the same at any given point. This follows from Theorem 11.3.1. Let $f_1, f_2:\mathbb{C}\to\mathbb{C}$ be analytic in regions $S_1$ and $S_2$, respectively. Suppose that $f_1$ and $f_2$ have different functional forms in their respective regions of analyticity. If there is an overlap between $S_1$ and $S_2$ and if $f_1 = f_2$ within that overlap, then the (unique) analytic continuation of $f_1$ into $S_2$ must be $f_2$, and vice versa. In fact, we may regard $f_1$ and $f_2$ as a single function $f:\mathbb{C}\to\mathbb{C}$ such that
$$f(z) = \begin{cases} f_1(z) & \text{when } z\in S_1, \\ f_2(z) & \text{when } z\in S_2. \end{cases}$$
Clearly, $f$ is analytic for the combined region $S = S_1\cup S_2$. We then say that $f_1$ and $f_2$ are analytic continuations of one another.

11.3.3. Example. Let us consider the function $f_1(z) = \sum_{n=0}^\infty z^n$, which is analytic for $|z|<1$. We have seen that it converges to $1/(1-z)$ for $|z|<1$. Thus, we have $f_1(z) = 1/(1-z)$ when $|z|<1$, and $f_1$ is not defined for $|z|>1$. Now let us consider a second function,
$$f_2(z) = \sum_{n=0}^\infty\left(\frac{2}{3}\right)^{n+1}\left(z+\frac12\right)^n,$$
which converges for $|z+\tfrac12|<\tfrac32$. To see what it converges to, we note that $f_2(z) = \tfrac23\sum_{n=0}^\infty\left[\tfrac23\left(z+\tfrac12\right)\right]^n$. Thus,
$$f_2(z) = \frac{\tfrac23}{1-\tfrac23\left(z+\tfrac12\right)} = \frac{1}{1-z}\qquad\text{when } \left|z+\tfrac12\right|<\tfrac32.$$
We observe that although $f_1(z)$ and $f_2(z)$ have different series representations in the two overlapping regions (see Figure 11.7), they represent the same function, $f(z) = 1/(1-z)$.
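This agreement is easy to confirm numerically at a point in the overlap of the two disks (an illustrative sketch; the test point is an arbitrary choice):

```python
# Compare the two series of Example 11.3.3 at a point in the overlap region.
import numpy as np

z = -0.4 + 0.2j          # satisfies both |z| < 1 and |z + 1/2| < 3/2
n = np.arange(200)
f1 = np.sum(z**n)                               # Σ zⁿ
f2 = np.sum((2/3)**(n + 1) * (z + 0.5)**n)      # Σ (2/3)^{n+1} (z + 1/2)ⁿ
exact = 1 / (1 - z)
print(f1, f2, exact)
```

Both partial sums agree with $1/(1-z)$ to machine precision, since the geometric ratios at this point are well inside the unit disk.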
We can therefore write
$$f(z) = \begin{cases} f_1(z) & \text{when } |z|<1, \\ f_2(z) & \text{when } \left|z+\tfrac12\right|<\tfrac32, \end{cases}$$
and $f_1$ and $f_2$ are analytic continuations of one another. In fact, $f(z) = 1/(1-z)$ is the analytic continuation of both $f_1$ and $f_2$ for all of $\mathbb{C}$ except $z=1$. Figure 11.7 shows $S_i$, the region of definition of $f_i$, for $i=1,2$.

11.3.4. Example. The function $f_1(z) = \int_0^\infty e^{-zt}\,dt$ exists only if $\mathrm{Re}(z)>0$, in which case $f_1(z) = 1/z$. Its region of definition $S_1$ is shown in Figure 11.8 and is simply the right half-plane.
Figure 11.7 The function defined in the smaller circle is continued analytically into the larger circle.

Figure 11.8 The functions $f_1$ and $f_2$ are analytic continuations of each other: $f_1$ analytically continues $f_2$ into the right half-plane, and $f_2$ analytically continues $f_1$ into the semicircle in the left half-plane.

Now we define $f_2$ by a geometric series:
$$f_2(z) = i\sum_{n=0}^\infty\left(\frac{z+i}{i}\right)^n\qquad\text{where } |z+i|<1.$$
This series converges, within its circle of convergence $S_2$, to
$$i\cdot\frac{1}{1-(z+i)/i} = \frac1z.$$
Figure 11.9 (a) Regions $S_1$ and $S_2$ separated by the boundary $B$ and the contour $C$. (b) The contour $C$ splits up into $C_1$ and $C_2$.

Thus, we have
$$\frac1z = \begin{cases} f_1(z) & \text{when } z\in S_1, \\ f_2(z) & \text{when } z\in S_2. \end{cases}$$
The two functions are analytic continuations of one another, and $f(z) = 1/z$ is the analytic continuation of both $f_1$ and $f_2$ for all $z\in\mathbb{C}$ except $z=0$.

11.3.1 The Schwarz Reflection Principle

A result that is useful in some physical applications is referred to as a dispersion relation. To derive such a relation we need to know the behavior of analytic functions on either side of the real axis. This is found using the Schwarz reflection principle, for which we need the following result.

11.3.5. Proposition. Let $f_i$ be analytic throughout $S_i$, where $i=1,2$. Let $B$ be the boundary between $S_1$ and $S_2$ (Figure 11.9) and assume that $f_1$ and $f_2$ are continuous on $B$ and coincide there. Then the two functions are analytic continuations of one another and together they define a (unique) function
$$f(z) = \begin{cases} f_1(z) & \text{when } z\in S_1\cup B, \\ f_2(z) & \text{when } z\in S_2\cup B, \end{cases}$$
which is analytic throughout the entire region $S_1\cup S_2\cup B$.
Proof. The proof consists in showing that the function integrates to zero along any closed curve in $S_1\cup S_2\cup B$. Once this is done, one can use Morera's theorem to conclude analyticity. The case when the closed curve is entirely in either $S_1$ or $S_2$ is trivial. When the curve is partially in $S_1$ and partially in $S_2$ the proof becomes only slightly more complicated, because one has to split up the contour $C$ into $C_1$ and $C_2$ of Figure 11.9(b). The details are left as an exercise. $\Box$

11.3.6. Theorem. (Schwarz reflection principle) Let $f$ be a function that is analytic in a region $S$ that has a segment of the real axis as part of its boundary $B$. If $f(z)$ is real whenever $z$ is real, then the analytic continuation $g$ of $f$ into $S^*$ (the mirror image of $S$ with respect to the real axis) exists and is given by
$$g(z) = f^*(z^*)\qquad\text{where } z\in S^*.$$

Proof. First, we show that $g$ is analytic in $S^*$. Let
$$f(z) \equiv u(x,y)+iv(x,y),\qquad g(z) \equiv U(x,y)+iV(x,y).$$
Then $f(z^*) = f(x,-y) = u(x,-y)+iv(x,-y)$ and $g(z) = f^*(z^*)$ imply that $U(x,y) = u(x,-y)$ and $V(x,y) = -v(x,-y)$. Therefore,
$$\frac{\partial U}{\partial x} = \frac{\partial u}{\partial x} = \frac{\partial v}{\partial(-y)} = \frac{\partial V}{\partial y},\qquad
\frac{\partial U}{\partial y} = -\frac{\partial u}{\partial(-y)} = \frac{\partial v}{\partial x} = -\frac{\partial V}{\partial x}.$$
These are the Cauchy-Riemann conditions for $g(z)$. Thus, $g$ is analytic. Next, we note that $f(x,0) = g(x,0)$, implying that $f$ and $g$ agree on the real axis. Proposition 11.3.5 then implies that $f$ and $g$ are analytic continuations of one another. $\Box$

It follows from this theorem that there exists an analytic function $h$ such that
$$h(z) = \begin{cases} f(z) & \text{when } z\in S, \\ g(z) & \text{when } z\in S^*. \end{cases}$$
We note that $h(z^*) = g(z^*) = f^*(z) = h^*(z)$.

11.3.2 Dispersion Relations

Let $f$ be analytic throughout the complex plane except at a cut along the real axis extending from $x_0$ to infinity. For a point $z$ not on the $x$-axis, the Cauchy integral formula gives $f(z) = (2\pi i)^{-1}\oint_C f(\xi)\,d\xi/(\xi-z)$, where $C$ is the contour shown in Figure 11.10. We assume that $f$ drops to zero fast enough that the contribution
Figure 11.10 The contour used for dispersion relations.

from the large circle tends to zero. The reader may show that the contribution from the small half-circle around x0 also vanishes. Then

f(z) = (1/2πi) [∫_{x0+iε}^{∞+iε} f(ξ)/(ξ − z) dξ − ∫_{x0−iε}^{∞−iε} f(ξ)/(ξ − z) dξ]
     = (1/2πi) [∫_{x0}^{∞} f(x + iε)/(x − z + iε) dx − ∫_{x0}^{∞} f(x − iε)/(x − z − iε) dx].

Since z is not on the real axis, we can ignore the iε terms in the denominators, so that

f(z) = (1/2πi) ∫_{x0}^{∞} [f(x + iε) − f(x − iε)] dx/(x − z).

The Schwarz reflection principle in the form f*(z) = f(z*) can now be used to yield

f(x + iε) − f(x − iε) = f(x + iε) − f*(x + iε) = 2i Im[f(x + iε)].

The final result is

f(z) = (1/π) ∫_{x0}^{∞} Im[f(x + iε)]/(x − z) dx.

This is one form of a dispersion relation. It expresses the value of a function at any point of the cut complex plane in terms of an integral of the imaginary part of the function on the upper edge of the cut.

When there are no residues in the UHP, we can obtain other forms of dispersion relations by equating the real and imaginary parts of Equation (10.11). The result
is

Re[f(x0)] = ±(1/π) P ∫_{−∞}^{∞} Im[f(x)]/(x − x0) dx,    (11.6)
Im[f(x0)] = ∓(1/π) P ∫_{−∞}^{∞} Re[f(x)]/(x − x0) dx,    (11.7)

where the upper (lower) sign corresponds to placing the small semicircle around x0 in the UHP (LHP). The real and imaginary parts of f, as related by Equation (11.6), are sometimes said to be the Hilbert transform of one another.

In some applications the imaginary part of f is an odd function of its argument. Then the first equation in (11.6) can be written as

Re[f(x0)] = ±(2/π) P ∫_0^∞ x Im[f(x)]/(x² − x0²) dx.

To arrive at dispersion relations, the following condition must hold:

lim_{R→∞} R |f(Re^{iθ})| = 0,

where R is the radius of the large semicircle in the UHP (or LHP). If f does not satisfy this prerequisite, it is still possible to obtain a dispersion relation, called a dispersion relation with one subtraction. This can be done by introducing an extra factor of x in the denominator of the integrand. We start with Equation (10.15), confining ourselves to the UHP and assuming that there are no poles there, so that the sum over residues is dropped:

[f(x2) − f(x1)]/(x2 − x1) = (1/iπ) P ∫_{−∞}^{∞} f(x) dx/[(x − x1)(x − x2)].

The reader may check that by equating the real and imaginary parts on both sides, letting x1 = 0 and x2 = x0, and changing x to −x in the first half of the interval of integration, we obtain

Re[f(x0)]/x0 = Re[f(0)]/x0 + (1/π) [P ∫_0^∞ Im[f(−x)]/[x(x + x0)] dx + P ∫_0^∞ Im[f(x)]/[x(x − x0)] dx].

For the case where Im[f(−x)] = −Im[f(x)], this equation yields

Re[f(x0)] = Re[f(0)] + (2x0²/π) P ∫_0^∞ Im[f(x)]/[x(x² − x0²)] dx.

11.3.7. Example. In optics, it has been shown that the imaginary part of the forward-scattering light amplitude with frequency ω is related, by the so-called optical theorem, to the total cross section for the absorption of light of that frequency:

Im[f(ω)] = (ω/4π) σ_tot(ω).
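Equation (11.6) can be illustrated numerically. The sketch below (an illustration, not from the text) uses the test function f(z) = 1/(z + i), which is analytic in the UHP and falls off at infinity; on the real axis Re f = x/(x² + 1) and Im f = −1/(x² + 1). The principal value is handled by placing a midpoint grid symmetrically about x0, so the 1/(x − x0) singularity cancels between paired points; the grid parameters are arbitrary choices.

```python
import numpy as np

# Principal-value Hilbert transform via a grid symmetric about x0:
# pairing x0 + s with x0 - s makes the integrand finite as s -> 0.
def hilbert_pv(im_f, x0, half_width=2000.0, h=0.01):
    k = np.arange(int(half_width / h))
    offsets = (k + 0.5) * h                       # midpoint offsets s > 0
    x_r, x_l = x0 + offsets, x0 - offsets
    integrand = im_f(x_r) / (x_r - x0) + im_f(x_l) / (x_l - x0)
    return integrand.sum() * h / np.pi

im_f = lambda x: -1.0 / (x**2 + 1.0)   # Im f on the real axis
re_f = lambda x: x / (x**2 + 1.0)      # Re f on the real axis

# Upper-sign form of Eq. (11.6): Re f(x0) = (1/pi) P int Im f(x)/(x-x0) dx
for x0 in (0.0, 1.0, -2.5):
    assert abs(hilbert_pv(im_f, x0) - re_f(x0)) < 1e-3
```

The agreement to three decimals at several values of x0 is a direct numerical check that the real and imaginary parts of a UHP-analytic function are Hilbert transforms of one another.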
Substituting this in the subtracted dispersion relation above yields

Re[f(ω0)] = Re[f(0)] + (ω0²/2π²) P ∫_0^∞ σ_tot(ω)/(ω² − ω0²) dω.    (11.8)

Thus, the real part of the (coherent) forward scattering of light, that is, the real part of the index of refraction, can be computed from Equation (11.8) by either measuring or calculating σ_tot(ω), the simpler quantity describing the absorption of light in the medium. Equation (11.8) is the original Kramers-Kronig relation. ■

11.4 The Gamma and Beta Functions

We have already encountered the gamma function. In this section, we derive some useful relations involving the gamma function and the closely related beta function. The gamma function is a generalization of the factorial function (which is defined only for positive integers) to the system of complex numbers. By differentiating the integral I(α) ≡ ∫_0^∞ e^{−αt} dt = 1/α with respect to α repeatedly and setting α = 1 at the end, we get ∫_0^∞ t^n e^{−t} dt = n!. This fact motivates the generalization

Γ(z) ≡ ∫_0^∞ t^{z−1} e^{−t} dt   for Re(z) > 0,    (11.9)

where Γ is called the gamma (or factorial) function. It is also called Euler's integral of the second kind. It is clear from its definition that

Γ(n + 1) = n!    (11.10)

if n is a positive integer. The restriction Re(z) > 0 assures the convergence of the integral. An immediate consequence of Equation (11.9) is obtained by integrating it by parts:

Γ(z + 1) = zΓ(z).    (11.11)

This also leads to Equation (11.10) by iteration. Another consequence is the analyticity of Γ(z). Differentiating Equation (11.11) with respect to z, we obtain

dΓ(z + 1)/dz = Γ(z) + z dΓ(z)/dz.

Thus, dΓ(z)/dz exists and is finite if and only if dΓ(z + 1)/dz is finite (recall that z ≠ 0). The procedure for showing the latter is outlined in Problem 11.16. Therefore, Γ(z) is analytic whenever Γ(z + 1) is. To see the singularities of Γ(z), we note that

Γ(z + n) = z(z + 1)(z + 2)···(z + n − 1)Γ(z),
or

Γ(z) = Γ(z + n)/[z(z + 1)(z + 2)···(z + n − 1)].    (11.12)

The numerator is analytic as long as Re(z + n) > 0, or Re(z) > −n. Thus, for Re(z) > −n, the singularities of Γ(z) are the poles at z = 0, −1, −2, ..., −n + 1. Since n is arbitrary, we conclude that

11.4.1. Box. Γ(z) is analytic at all z ∈ ℂ except at z = 0, −1, −2, ..., where Γ(z) has simple poles.

A useful result is obtained by setting z = ½ in Equation (11.9):

Γ(½) = √π.    (11.13)

This can be obtained by making the substitution u = √t in the integral.

We can derive an expression for the logarithmic derivative of the gamma function that involves an infinite series. To do so, we use Equation (11.2), noting that 1/Γ(z + 1) is an entire function with simple zeros at {−k}_{k=1}^∞. Equation (11.2) gives

1/Γ(z + 1) = e^{γz} ∏_{k=1}^∞ (1 + z/k) e^{−z/k},

where γ is a constant to be determined. Using Equation (11.11), we obtain

1/Γ(z) = z e^{γz} ∏_{k=1}^∞ (1 + z/k) e^{−z/k}.    (11.14)

To determine γ, let z = 1 in Equation (11.14) and evaluate the resulting product numerically. The result is γ = 0.57721566..., the so-called Euler-Mascheroni constant. Differentiating the logarithm of both sides of Equation (11.14), we obtain

(d/dz) ln[Γ(z)] = −1/z − γ + Σ_{k=1}^∞ (1/k − 1/(z + k)).    (11.15)

Other properties of the gamma function are derivable from the results presented here. Those derivations are left as problems. The beta function, or Euler's integral of the first kind, is defined for complex numbers a and b as follows:

B(a, b) ≡ ∫_0^1 t^{a−1} (1 − t)^{b−1} dt,    (11.16)

where Re(a), Re(b) > 0.
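Setting z = 1 in Equation (11.14) and taking logarithms gives γ = Σ_{k≥1} [1/k − ln(1 + 1/k)], which can be summed numerically. The sketch below (an illustration, not from the text) does this with a crude tail correction; the truncation point N is an arbitrary choice.

```python
import math

# gamma = sum_{k>=1} [1/k - ln(1 + 1/k)]; each term behaves like 1/(2k^2),
# so the tail beyond N is approximately 1/(2(N+1)), which we add back.
N = 1_000_000
partial = sum(1.0 / k - math.log1p(1.0 / k) for k in range(1, N + 1))
gamma_est = partial + 1.0 / (2 * (N + 1))   # leading tail correction

assert abs(gamma_est - 0.5772156649) < 1e-6
```

Without the tail correction the partial sum is off by about 1/(2N), which shows how slowly the defining series converges.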
By changing t to 1/t, we can also write

B(a, b) = ∫_1^∞ t^{−a−b} (t − 1)^{b−1} dt.    (11.17)

Since 0 ≤ t ≤ 1 in Equation (11.16), we can define θ by t = sin²θ. This gives

B(a, b) = 2 ∫_0^{π/2} sin^{2a−1}θ cos^{2b−1}θ dθ.    (11.18)

This relation can be used to establish a connection between the gamma and beta functions. We note that

Γ(a) = ∫_0^∞ t^{a−1} e^{−t} dt = 2 ∫_0^∞ x^{2a−1} e^{−x²} dx,

where in the last step we changed the variable to x = √t. Multiply Γ(a) by Γ(b) and express the resulting double integral in terms of polar coordinates to obtain Γ(a)Γ(b) = Γ(a + b)B(a, b), or

B(a, b) = B(b, a) = Γ(a)Γ(b)/Γ(a + b).    (11.19)

Let us now establish the following useful relation:

Γ(z)Γ(1 − z) = π/sin πz   for 0 < Re(z) < 1.    (11.20)

With a = z and b = 1 − z, and using u = tan θ, Equations (11.18) and (11.19) give

Γ(z)Γ(1 − z) = B(z, 1 − z) = 2 ∫_0^∞ u^{2z−1}/(u² + 1) du.

Using the result obtained in Example 11.2.4, we immediately get Equation (11.20), valid for 0 < Re(z) < 1. By analytic continuation we then generalize Equation (11.20) to values of z for which both sides are analytic.

11.4.2. Example. As an illustration of the use of Equation (11.20), let us show that Γ(z) can also be written as

1/Γ(z) = (1/2πi) ∫_C (e^t/t^z) dt,    (11.21)

where C is the contour shown in Figure 11.11. From Equations (11.9) and (11.20) it follows that

1/Γ(z) = (sin πz/π) Γ(1 − z) = (sin πz/π) ∫_0^∞ e^{−r} r^{−z} dr = [(e^{iπz} − e^{−iπz})/2πi] ∫_0^∞ (e^{−r}/r^z) dr.
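Equations (11.13), (11.19), and (11.20) are easy to spot-check with the standard-library gamma function. The sketch below (an illustration, not from the text) does so at a few sample points.

```python
import math

# Eq. (11.19): B(a,b) = Gamma(a)Gamma(b)/Gamma(a+b), symmetric in a and b.
def beta(a, b):
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

assert abs(beta(2.3, 0.7) - beta(0.7, 2.3)) < 1e-12

# Eq. (11.20): Gamma(z)Gamma(1-z) = pi/sin(pi z) for 0 < z < 1.
for z in (0.1, 0.3, 0.5):
    lhs = math.gamma(z) * math.gamma(1 - z)
    assert abs(lhs - math.pi / math.sin(math.pi * z)) < 1e-9

# At z = 1/2 the reflection formula reproduces Eq. (11.13): Gamma(1/2)^2 = pi.
assert abs(math.gamma(0.5) - math.sqrt(math.pi)) < 1e-12
```

Note that B(1/2, 1/2) = Γ(1/2)²/Γ(1) = π, tying the three relations together in one number.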
Figure 11.11 The contour C used in evaluating the reciprocal gamma function.

The contour integral of Equation (11.21) can be evaluated by noting that above the real axis t = re^{iπ} = −r, below it t = re^{−iπ} = −r, and, as the reader may check, that the contribution from the small circle at the origin is zero; so

∫_C (e^t/t^z) dt = ∫_0^∞ [e^{−r}/(re^{iπ})^z](−dr) + ∫_∞^0 [e^{−r}/(re^{−iπ})^z](−dr)
               = −e^{−iπz} ∫_0^∞ (e^{−r}/r^z) dr + e^{iπz} ∫_0^∞ (e^{−r}/r^z) dr.

Comparison with the last equation above yields the desired result. ■

Another useful relation can be obtained by combining Equations (11.11) and (11.20):

Γ(z)Γ(1 − z) = Γ(z)(−z)Γ(−z) = π/sin πz.

Thus,

Γ(z)Γ(−z) = −π/(z sin πz).    (11.22)

Once we know Γ(x) for positive values of real x, we can use Equation (11.22) to find Γ(x) for x < 0. Thus, for instance, Γ(½) = √π gives Γ(−½) = −2√π. Equation (11.22) also shows that the gamma function has simple poles wherever z is a negative integer.

11.5 Method of Steepest Descent

It is shown in statistical mechanics ([Hill 87, pp. 150-152]) that the partition function, which generates all the thermodynamical quantities, can be written as a contour integral. Debye found a very elegant technique of approximating this contour integral, which we investigate in this section. Consider the integral

I(α) ≡ ∫_C e^{αf(z)} g(z) dz,    (11.23)

where |α| is large and f and g are analytic in some region of ℂ containing the contour C. Since this integral occurs frequently in physical applications, it would
be helpful if we could find a general approximation for it that is applicable for all f and g. The fact that |α| is large will be of great help. By redefining f(z), if necessary, we can assume that α = |α|e^{i arg(α)} is real and positive [absorb e^{i arg(α)} into the function f(z) if need be].

The exponent of the integrand can be written as

αf(z) = αu(x, y) + iαv(x, y).

Since α is large and positive, we expect the exponential to be the largest at the maximum of u(x, y). Thus, if we deform the contour so that it passes through a point z0 at which u(x, y) is maximum, the contribution to the integral may come mostly from the neighborhood of z0. This opens up the possibility of expanding the exponent about z0 and keeping the lowest terms in the expansion, which is what we are after.

There is one catch, however. Because of the largeness of α, the imaginary part of αf in the exponent will oscillate violently as v(x, y) changes even by a small amount. This oscillation can make the contribution of the real part of f(z0) negligibly small and render the whole procedure useless. Thus, we want to tame the variation of exp[iv(x, y)] by making v(x, y) vary as slowly as possible. A necessary condition is for the derivative of v to vanish at z0. This and the fact that the real part is to have a maximum at z0 lead to

∂u/∂x + i ∂v/∂x = (df/dz)|_{z0} = 0.    (11.24)

However, we do not stop here but demand that the imaginary part of f be constant along the deformed contour: Im[f(z)] = Im[f(z0)], or v(x, y) = v(x0, y0).

Equation (11.24) and the Cauchy-Riemann conditions imply that ∂u/∂x = 0 = ∂u/∂y at z0. Thus, it might appear that z0 is a maximum (or minimum) of the surface described by the function u(x, y). This is not true: For the surface to have a maximum (minimum), both second derivatives, ∂²u/∂x² and ∂²u/∂y², must be negative (positive). But that is impossible because u(x, y) is harmonic; the sum of these two derivatives is zero.
Recall that a point at which the derivatives vanish but that is neither a maximum nor a minimum is called a saddle point. That is why the procedure described below is sometimes called the saddle point approximation.

We are interested in values of z close to z0. So let us expand f(z) in a Taylor series about z0, use Equation (11.24), and keep terms only up to the second, to obtain

f(z) = f(z0) + ½(z − z0)² f''(z0).    (11.25)

Let us assume that f''(z0) ≠ 0, and define

z − z0 = r1 e^{iθ1},   ½ f''(z0) = r2 e^{iθ2},    (11.26)

and substitute in the above expansion to obtain

f(z) − f(z0) = r1² r2 e^{i(2θ1+θ2)},    (11.27)
Figure 11.12 A segment of the contour C0 in the vicinity of z0. The lines mentioned in the text are small segments of the contour C0 centered at z0.

or

Re[f(z) − f(z0)] = r1² r2 cos(2θ1 + θ2),
Im[f(z) − f(z0)] = r1² r2 sin(2θ1 + θ2).    (11.28)

The constancy of Im[f(z)] implies that sin(2θ1 + θ2) = 0, or 2θ1 + θ2 = nπ. Thus, for θ1 = −θ2/2 + nπ/2, where n = 0, 1, 2, 3, the imaginary part of f is constant. The angle θ2 is determined by the second equation in (11.26). Once we determine n, the path of saddle-point integration will be specified. To get insight into this specification, consider z − z0 = r1 e^{i(−θ2/2+nπ/2)}, and eliminate r1 from its real and imaginary parts to obtain

y − y0 = [tan(nπ/2 − θ2/2)](x − x0).

This is the equation of a line passing through z0 = (x0, y0) and making an angle of θ1 = (nπ − θ2)/2 with the real axis. For n = 0, 2 we get one line, and for n = 1, 3 we get another that is perpendicular to the first (see Figure 11.12). It is to be emphasized that along both these lines the imaginary part of f(z) remains constant. To choose the correct line, we need to look at the real part of the function. Also note that these "lines" are small segments of (or tangents to) the deformed contour at z0.

We are looking for directions along which Re(f) goes through a relative maximum at z0. In fact, we are after a path on which the function decreases maximally. This occurs when Re[f(z)] − Re[f(z0)] takes the largest negative value. Equation (11.28) determines such a path: It is that path on which cos(2θ1 + θ2) = −1, that is, when n = 1, 3. There is only one such path in the region of interest, and the procedure is uniquely determined. Because the descent from the maximum value at
z0 is maximum along such a path, this procedure is called the method of steepest descent.

Now that we have determined the contour, let us approximate the integral. Substituting 2θ1 + θ2 = π, 3π in Equation (11.27), we get

f(z) − f(z0) = −r1² r2 ≡ −t² = ½(z − z0)² f''(z0).    (11.29)

Using this in Equation (11.23) yields

I(α) ≈ ∫_{C0} e^{α[f(z0)−t²]} g(z) dz = e^{αf(z0)} ∫_{C0} e^{−αt²} g(z) dz,    (11.30)

where C0 is the deformed contour passing through z0. To proceed, we need to solve for z in terms of t. From Equation (11.29) we have

(z − z0)² = −2t²/f''(z0) = −(t²/r2) e^{−iθ2}.

Therefore, |z − z0| = |t|/√r2, or z − z0 = (|t|/√r2) e^{iθ1}, by the first equation of (11.26). Let us agree that for t > 0, the point z on the contour will move in the direction that makes an angle of 0 ≤ θ1 < π, and that t < 0 corresponds to the opposite direction. This convention removes the remaining ambiguity of the angle θ1,⁵ and gives

z = z0 + (t/√r2) e^{iθ1}.    (11.31)

Using the Taylor expansion of g(z) about z0, we can write

g(z) dz = {Σ_{n=0}^∞ [t^n/(r2^{n/2} n!)] e^{inθ1} g^{(n)}(z0)} (e^{iθ1}/√r2) dt
        = Σ_{n=0}^∞ [t^n e^{i(n+1)θ1}/(r2^{(n+1)/2} n!)] g^{(n)}(z0) dt,

and substituting this in Equation (11.30) yields

I(α) ≈ e^{αf(z0)} ∫_{C0} e^{−αt²} {Σ_{n=0}^∞ [t^n e^{i(n+1)θ1}/(r2^{(n+1)/2} n!)] g^{(n)}(z0)} dt
     = e^{αf(z0)} Σ_{n=0}^∞ [e^{i(n+1)θ1}/(r2^{(n+1)/2} n!)] g^{(n)}(z0) ∫_{−∞}^∞ e^{−αt²} t^n dt.    (11.32)

⁵The angle θ1 is still ambiguous by π, because n can be 1 or 3. However, by a suitable sign convention described below, we can remove this ambiguity.
The extension of the integral limits to infinity does not alter the result significantly, because α is assumed large and positive. The integral in the sum is zero for odd n. When n is even, we make the substitution u = αt² and show that ∫_{−∞}^∞ e^{−αt²} t^n dt = α^{−(n+1)/2} Γ[(n + 1)/2]. With n = 2k, and using r2 = |f''(z0)|/2, the sum becomes

I(α) ≈ e^{αf(z0)} Σ_{k=0}^∞ [2^{k+1/2} e^{i(2k+1)θ1}/(|f''(z0)|^{k+1/2} (2k)!)] g^{(2k)}(z0) Γ(k + ½) α^{−k−1/2}.    (11.33)

This is called the asymptotic expansion of I(α). In most applications, only the first term of the above series is retained, giving

I(α) ≈ e^{αf(z0)} √(2π/α) [e^{iθ1} g(z0)/√|f''(z0)|].    (11.34)

11.5.1. Example. Let us approximate the integral

I(α) ≡ Γ(α + 1) = ∫_0^∞ e^{−z} z^α dz,

where α is a positive real number. First, we must rewrite the integral in the form of Equation (11.23). We can do this by noting that z^α = e^{α ln z}. Thus, we have

I(α) = ∫_0^∞ e^{α ln z − z} dz = ∫_0^∞ e^{α(ln z − z/α)} dz,

and we identify f(z) = ln z − z/α and g(z) = 1. The saddle point is found from f'(z) = 0, or z0 = α. Furthermore, from

½ f''(z0) = −1/(2α²) = [1/(2α²)] e^{iπ}  and  2θ1 + θ2 = π, 3π,

as well as the condition 0 ≤ θ1 < π, we conclude that θ1 = 0. Substitution in Equation (11.34) yields

Γ(α + 1) ≈ e^{αf(z0)} √(2π/α) (1/√(1/α²)) = √(2πα) e^{α(ln α − 1)} = √(2π) e^{−α} α^{α+1/2},

called the Stirling approximation. ■

11.5.2. Example. The Hankel function of the first kind is defined as

H_ν^{(1)}(α) ≡ (1/iπ) ∫_C e^{(α/2)(z−1/z)} dz/z^{ν+1},

where C is the contour shown in Figure 11.13. We want to find the asymptotic expansion of this function, choosing the branch of the function in which −π < θ < π. We identify f(z) = ½(z − 1/z) and g(z) = z^{−ν−1}. Next, the stationary points of f are calculated:

df/dz = ½ + 1/(2z²) = 0  ⟹  z0 = ±i.
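The quality of the saddle-point estimate of Example 11.5.1 is easy to check numerically. The sketch below (an illustration, not from the text) compares the Stirling formula with the standard-library gamma function; the bound 1/(10α) used in the assertion is an empirical choice reflecting the known 1/(12α) leading correction.

```python
import math

# Stirling (saddle-point) estimate: Gamma(a+1) ~ sqrt(2 pi a) a**a e**(-a).
def stirling(alpha):
    return math.sqrt(2 * math.pi * alpha) * alpha**alpha * math.exp(-alpha)

# The relative error decays like 1/(12 alpha) as alpha grows.
for alpha in (5.0, 20.0, 50.0):
    rel_err = abs(stirling(alpha) / math.gamma(alpha + 1) - 1.0)
    assert rel_err < 1.0 / (10 * alpha)
```

Already at α = 10 the estimate is within about one percent of Γ(11) = 10! = 3628800, which is why the single-term formula (11.34) suffices in most applications.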
Figure 11.13 The contour for the evaluation of the Hankel function of the first kind.

The contour of integration suggests the saddle point z0 = +i. The second derivative evaluated at the saddle point gives f''(z0) = −1/z0³ = −i = e^{−iπ/2}, or θ2 = −π/2. This, and the convention 0 ≤ θ1 < π, force us to choose θ1 = 3π/4. Substituting this in Equation (11.34) and noting that f(i) = i and |f''(z0)| = 1, we obtain

H_ν^{(1)}(α) ≈ (1/iπ) e^{αi} √(2π/α) e^{i3π/4} i^{−ν−1} = √(2/πα) e^{i(α − νπ/2 − π/4)},

where we have used i^{−ν−1} = e^{−i(ν+1)π/2}. ■

Although Equation (11.34) is adequate for most applications, we shall have occasion to demand a better approximation. One may try to keep higher-order terms of Equation (11.33), but that infinite sum is in reality inconsistent. The reason is that in the product g(z) dz, we kept only the first power of t in the expansion of z. To restore consistency, let us expand z(t) as well. Suppose

z − z0 = Σ_{m=1}^∞ b_m t^m  ⟹  dz = Σ_{m=0}^∞ (m + 1) b_{m+1} t^m dt,

so that

g(z) dz = Σ_{n=0}^∞ [t^n e^{inθ1}/(r2^{n/2} n!)] g^{(n)}(z0) Σ_{m=0}^∞ (m + 1) b_{m+1} t^m dt
        = Σ_{m,n=0}^∞ [e^{inθ1}/(r2^{n/2} n!)] (m + 1) b_{m+1} g^{(n)}(z0) t^{m+n} dt.

Now introduce l = m + n and note that the summation over n goes up to l. This gives
Substituting this in Equation (11.30) and changing the contour integration into the integral from −∞ to ∞ as before yields

I(α) ≈ e^{αf(z0)} Σ_{k=0}^∞ a_{2k} α^{−k−1/2} Γ(k + ½),
a_{2k} = Σ_{n=0}^{2k} [e^{inθ1}/(r2^{n/2} n!)] (2k − n + 1) b_{2k−n+1} g^{(n)}(z0).    (11.36)

The only thing left to do is to evaluate the b_m. We shall not give a general formula for these coefficients. Instead, we shall calculate the first three of them. This should reveal to the reader the general method of approximating them to any order. We have already calculated b1 in Equation (11.31). To calculate b2, keep the next-highest term in the expansion of both z and t². Thus write

z − z0 = b1 t + b2 t²,
t² = −½ f''(z0)(z − z0)² − (1/6) f'''(z0)(z − z0)³.

Now substitute the first equation in the second and equate the coefficients of equal powers of t on both sides. The second power of t gives nothing new: It merely reaffirms the value of b1. The coefficient of the third power of t is −b1 b2 f''(z0) − (1/6) b1³ f'''(z0). Setting this equal to zero gives

b2 = −b1² f'''(z0)/[6 f''(z0)] = f'''(z0) e^{4iθ1}/[3|f''(z0)|²],    (11.37)

where we substituted for b1 from Equation (11.31) and used 2θ1 + θ2 = π. To calculate b3, keep one more term in the expansion of both z and t² to obtain

z − z0 = b1 t + b2 t² + b3 t³

and

t² = −½ f''(z0)(z − z0)² − (1/6) f'''(z0)(z − z0)³ − (1/24) f^{(iv)}(z0)(z − z0)⁴.

Once again substitute the first equation in the second and equate the coefficients of equal powers of t on both sides. The second and third powers of t give nothing new. Setting the coefficient of the fourth power of t equal to zero yields

b3 = b1³ {5[f'''(z0)]²/(72[f''(z0)]²) − f^{(iv)}(z0)/(24 f''(z0))}
   = [√2 e^{3iθ1}/(12|f''(z0)|^{3/2})] {5[f'''(z0)]²/(3[f''(z0)]²) − f^{(iv)}(z0)/f''(z0)}.    (11.38)
Figure 11.14 Contour used for Problem 11.4.

11.6 Problems

11.1. Derive Equation (11.2) from its logarithmic derivative.

11.2. Show that the point at infinity is not a branch point for f(z) = (z² − 1)^{1/2}.

11.3. Find the following integral, for which 0 ≠ a ∈ ℝ:

∫_0^∞ (ln x)²/(x² + a²) dx.

11.4. Use the contour in Figure 11.14 to evaluate the following integrals.

(a) ∫_0^∞ [sin ax/sinh x] dx    (b) ∫_0^∞ [x cos ax/sinh x] dx

11.5. Show that ∫_0^π f(sin θ) dθ = 2 ∫_0^{π/2} f(sin θ) dθ for an arbitrary function f defined in the interval [−1, +1].

11.6. Find the principal value of the integral ∫_{−∞}^∞ x sin x dx/(x² − x0²) and evaluate

I = ∫_{−∞}^∞ x sin x dx/[(x − x0 ± iε)(x + x0 ± iε)]

for the four possible choices of signs.

11.7. Use analytic continuation, the analyticity of the exponential, hyperbolic, and trigonometric functions, and the analogous identities for real z to prove the following identities.

(a) e^z = cosh z + sinh z.   (b) cosh²z − sinh²z = 1.   (c) sin 2z = 2 sin z cos z.
11.8. Show that the function 1/z² represents the analytic continuation into the domain ℂ − {0} (all the complex plane minus the origin) of the function defined by Σ_{n=0}^∞ (n + 1)(z + 1)^n, where |z + 1| < 1.

11.9. Find the analytic continuation into ℂ − {i, −i} (all the complex plane except i and −i) of f(z) = ∫_0^∞ e^{−zt} sin t dt, where Re(z) > 0.

11.10. Expand f(z) = Σ_{n=0}^∞ z^n (defined in its circle of convergence) in a Taylor series about z = a. For what values of a does this expansion permit the function f(z) to be continued analytically?

11.11. The two power series

f1(z) = Σ_{n=1}^∞ z^n/n  and  f2(z) = iπ + Σ_{n=1}^∞ (−1)^n (z − 2)^n/n

have no common domain of convergence. Show that they are nevertheless analytic continuations of one another.

11.12. Prove that the functions defined by the two series

1 + az + a²z² + ···  and  1/(1 − z) − (1 − a)z/(1 − z)² + (1 − a)²z²/(1 − z)³ − ···

are analytic continuations of one another.

11.13. Show that the function f1(z) = 1/(z² + 1), where z ≠ ±i, is the analytic continuation into ℂ − {i, −i} of the function f2(z) = Σ_{n=0}^∞ (−1)^n z^{2n}, where |z| < 1.

11.14. Find the analytic continuation into ℂ − {0} of the function

f(z) = ∫_0^∞ t e^{−zt} dt,  where Re(z) > 0.

11.15. Show that the integral in Equation (11.9) converges. Hint: First show that |Γ(z + 1)| ≤ ∫_0^∞ t^x e^{−t} dt, where x = Re(z). Now show that this integral is finite by comparing it with ∫_0^∞ t^n e^{−t} dt = n! for some integer n > 0, and conclude that Γ(z) is finite.

11.16. Show that dΓ(z + 1)/dz exists and is finite by establishing the following:
(a) |ln t| < t + 1/t for t > 0. Hint: For t ≥ 1, show that t − ln t is a monotonically increasing function. For t < 1, make the substitution t = 1/s.
(b) Use the result from part (a) in the integral for dΓ(z + 1)/dz to show that |dΓ(z + 1)/dz| is finite. Hint: Differentiate inside the integral.
11.17. Derive Equation (11.11) from Equation (11.9).

11.18. Show that Γ(½) = √π, and that

(2k − 1)!! ≡ (2k − 1)(2k − 3)···5·3·1 = (2^k/√π) Γ((2k + 1)/2).

11.19. Show that Γ(z) = ∫_0^1 [ln(1/t)]^{z−1} dt, with Re(z) > 0.

11.20. Derive the identity ∫_0^∞ e^{−x^a} dx = Γ[(a + 1)/a].

11.21. Consider the function f(z) = (1 + z)^α.
(a) Show that d^n f/dz^n|_{z=0} = Γ(α + 1)/Γ(α − n + 1), and use it to derive the relation

(α choose n) ≡ α!/[n!(α − n)!] = Γ(α + 1)/[n! Γ(α − n + 1)].

(b) Show that for general complex numbers a and b we can formally write

(a + b)^α = Σ_{n=0}^∞ (α choose n) a^n b^{α−n}.

(c) Show that if α is a positive integer m, the series in part (b) truncates at n = m.

11.22. Prove that the residue of Γ(z) at z = −k is r_k = (−1)^k/k!. Hint: Use Equation (11.12).

11.23. Derive the following relation for z = x + iy.

11.24. Using the definition of B(a, b), Equation (11.16), show that B(a, b) = B(b, a).

11.25. Integrate Equation (11.21) by parts and derive Equation (11.11).

11.26. For positive integers n, show that Γ(½ − n)Γ(½ + n) = (−1)^n π.

11.27. Show that
(a) B(a, b) = B(a + 1, b) + B(a, b + 1).
(b) B(a, b + 1) = [b/(a + b)] B(a, b).
(c) B(a, b)B(a + b, c) = B(b, c)B(a, b + c).
Figure 11.15 The contour for the evaluation of the Hankel function of the second kind.

11.28. Verify that

∫_{−1}^{1} (1 + t)^a (1 − t)^b dt = 2^{a+b+1} B(a + 1, b + 1).

11.29. Show that the volume of the solid formed by the surface z = x^a y^b, the xy-, yz-, and xz-planes, and the plane parallel to the z-axis and going through the points (0, y0) and (x0, 0) is

[x0^{a+1} y0^{b+1}/(a + b + 2)] B(a + 1, b + 1).

11.30. Derive the relation obtained by the substitution t = tanh²x in Equation (11.16), where −1 < a < b.

11.31. The Hankel function of the second kind is defined by a contour integral of the same form as in Example 11.5.2, where C is now the contour shown in Figure 11.15. Find the asymptotic expansion of this function.

11.32. Find the asymptotic dependence of the modified Bessel function of the first kind, defined as

I_ν(α) ≡ (1/2πi) ∮_C e^{(α/2)(z+1/z)} dz/z^{ν+1},

where C starts at −∞, approaches the origin and circles it, and goes back to −∞. Thus the negative real axis is excluded from the domain of analyticity.
11.33. Find the asymptotic dependence of the modified Bessel function of the second kind:

K_ν(α) ≡ ½ ∫_C e^{−(α/2)(z+1/z)} dz/z^{ν+1},

where C starts at ∞, approaches the origin and circles it, and goes back to ∞. Thus the positive real axis is excluded from the domain of analyticity.

Additional Reading
1. Dennery, P. and Krzywicki, A. Mathematics for Physicists, Harper and Row, 1967.
2. Lang, S. Complex Analysis, 2nd ed., Springer-Verlag, 1985. Contains a very lucid discussion of analytic continuation.
12 Separation of Variables in Spherical Coordinates

The laws of physics are almost exclusively written in the form of differential equations (DEs). In (point) particle mechanics there is only one independent variable, leading to ordinary differential equations (ODEs). In other areas of physics, in which extended objects such as fields are studied, variations with respect to position are also important. Partial derivatives with respect to coordinate variables show up in the differential equations, which are therefore called partial differential equations (PDEs). We list the most common PDEs of mathematical physics in the following.

12.1 PDEs of Mathematical Physics

In electrostatics, where time-independent scalar fields such as potentials and vector fields such as electrostatic fields are studied, the law is described by Poisson's equation,

∇²Φ(r) = −4πρ(r).    (12.1)

In vacuum, where ρ(r) = 0, Equation (12.1) reduces to Laplace's equation,

∇²Φ(r) = 0.    (12.2)

Many electrostatic problems involve conductors held at constant potentials and situated in vacuum. In the space between such conducting surfaces, the electrostatic potential obeys Equation (12.2).

The most simplified version of the heat equation is

∂T/∂t = a²∇²T,    (12.3)
where T is the temperature and a is a constant characterizing the medium in which heat is flowing.

One of the most frequently recurring PDEs encountered in mathematical physics is the wave equation,

∇²Ψ − (1/c²) ∂²Ψ/∂t² = 0.    (12.4)

This equation (or its simplification to lower dimensions) is applied to the vibration of strings and drums; the propagation of sound in gases, solids, and liquids; the propagation of disturbances in plasmas; and the propagation of electromagnetic waves.

The Schrödinger equation, describing nonrelativistic quantum phenomena, is

−(ℏ²/2m)∇²Ψ + V(r)Ψ = iℏ ∂Ψ/∂t,    (12.5)

where m is the mass of a subatomic particle, ℏ is Planck's constant (divided by 2π), V is the potential energy of the particle, and |Ψ(r, t)|² is the probability density of finding the particle at r at time t.

A relativistic generalization of the Schrödinger equation for a free particle of mass m is the Klein-Gordon equation, which, in terms of the natural units (ℏ = 1 = c), reduces to

∇²φ − m²φ = ∂²φ/∂t².    (12.6)

Equations (12.3)-(12.6) have partial derivatives with respect to time. As a first step toward solving these PDEs, and as an introduction to similar techniques used in the solution of PDEs not involving time,¹ let us separate the time variable. We will denote the functions in all four equations by the generic symbol Ψ(r, t). The basic idea is to separate the r and t dependence into factors:

Ψ(r, t) ≡ R(r)T(t).

This factorization permits us to separate the two operations of space differentiation and time differentiation. Let L stand for all spatial derivative operators and write all the relevant equations either as LΨ = ∂Ψ/∂t or as LΨ = ∂²Ψ/∂t². With this notation and the above separation, we have

L(RT) = T(LR) = { R dT/dt,
                  R d²T/dt².

¹See [Hass99] for a thorough discussion of separation in Cartesian and cylindrical coordinates.
Chapter 19 of this book also contains examples of solutions to some second-order linear DEs resulting from such separation.
Dividing both sides by RT, we obtain

(1/R) LR = (1/T) dT/dt   [or (1/T) d²T/dt²].    (12.7)

Now comes the crucial step in the process of separation of variables. The LHS of Equation (12.7) is a function of position alone, and the RHS is a function of time alone. Since r and t are independent variables, the only way that (12.7) can hold is for both sides to be constant, say α:²

(1/R) LR = α  ⟹  LR = αR,
(1/T) dT/dt = α  ⟹  dT/dt = αT.

We have reduced the original time-dependent PDE to an ODE,

dT/dt = αT  or  d²T/dt² = αT,    (12.8)

and a PDE involving only the position variables, (L − α)R = 0. The most general form of L − α arising from Equations (12.3)-(12.6) is L − α ≡ ∇² + f(r). Therefore, Equations (12.3)-(12.6) are equivalent to (12.8) and

∇²R + f(r)R = 0.    (12.9)

To include Poisson's equation, we replace the zero on the RHS by g(r) ≡ −4πρ(r), obtaining ∇²R + f(r)R = g(r). With the exception of Poisson's equation (an inhomogeneous PDE), in all the foregoing equations the term on the RHS is zero.³ We will restrict ourselves to this so-called homogeneous case and rewrite (12.9) as

∇²Ψ(r) + f(r)Ψ(r) = 0.    (12.10)

Depending on the geometry of the problem, Equation (12.10) is further separated into ODEs, each involving a single coordinate of a suitable coordinate system. We shall see examples of all major coordinate systems (Cartesian, cylindrical, and

²In most cases, α is chosen to be real. In the case of the Schrödinger equation, it is more convenient to choose α to be purely imaginary so that the i in the definition of L can be compensated. In all cases, the precise nature of α is determined by boundary conditions.
³Techniques for solving inhomogeneous PDEs are discussed in Chapters 21 and 22.
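The separation ansatz can be checked numerically on the simplest case. The sketch below (an illustration, not from the text) takes the 1-D heat equation ∂T/∂t = a²∂²T/∂x² and verifies by finite differences that the separated product ψ(x, t) = sin(kx) e^{−a²k²t}, which corresponds to the choice α = −a²k², satisfies the PDE at a sample point; the numbers a, k, x0, t0 are arbitrary choices.

```python
import math

# Separated solution of the 1-D heat equation: R(x) = sin(k x),
# T(t) = exp(-a^2 k^2 t), i.e. the separation constant is alpha = -a^2 k^2.
a, k = 0.8, 3.0
psi = lambda x, t: math.sin(k * x) * math.exp(-a * a * k * k * t)

# Centered finite differences approximate the two sides of the PDE.
x0, t0, h = 0.4, 0.2, 1e-4
d_dt  = (psi(x0, t0 + h) - psi(x0, t0 - h)) / (2 * h)
d2_dx = (psi(x0 + h, t0) - 2 * psi(x0, t0) + psi(x0 - h, t0)) / h**2

assert abs(d_dt - a * a * d2_dx) < 1e-4   # dT/dt = a^2 d^2T/dx^2 holds
```

Any other choice of k works as well; superposing such products over allowed values of k is how boundary conditions are eventually satisfied.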
spherical) in Chapter 19. For the rest of this chapter, we shall concentrate on some general aspects of the spherical coordinates.

Jean Le Rond d'Alembert (1717-1783) was the illegitimate son of a famous salon hostess of eighteenth-century Paris and a cavalry officer. Abandoned by his mother, d'Alembert was raised by a foster family and later educated by the arrangement of his father at a nearby church-sponsored school, in which he received instruction in the classics and above-average instruction in mathematics. After studying law and medicine, he finally chose to pursue a career in mathematics. In the 1740s he joined the ranks of the philosophes, a growing group of deistic and materialistic thinkers and writers who actively questioned the social and intellectual standards of the day. He traveled little (he left France only once, to visit the court of Frederick the Great), preferring instead the company of his friends in the salons, among whom he was well known for his wit and laughter.

D'Alembert turned his mathematical and philosophical talents to many of the outstanding scientific problems of the day, with mixed success. Perhaps his most famous scientific work, entitled Traité de dynamique, shows his appreciation that a revolution was taking place in the science of mechanics: the formalization of the principles stated by Newton into a rigorous mathematical framework. The philosophy to which d'Alembert subscribed, however, refused to acknowledge the primacy of a concept as unclear and arbitrary as "force," introducing a certain awkwardness to his treatment and perhaps causing him to overlook the important principle of conservation of energy.
Later, d'Alembert produced a treatise on fluid mechanics (the priority of which is still debated by historians), a paper dealing with vibrating strings (in which the wave equation makes its first appearance in physics), and a skillful treatment of celestial mechanics. D'Alembert is also credited with use of the first partial differential equation as well as the first solution to such an equation using separation of variables. (One should be careful interpreting "first": many of d'Alembert's predecessors and contemporaries gave similar, though less satisfactory, treatments of these milestones.) Perhaps his most well-known contribution to mathematics (at least among students) is the ratio test for the convergence of infinite series. Much of the work for which d'Alembert is remembered occurred outside mathematical physics. He was chosen as the science editor of the Encyclopédie, and his lengthy Discours Préliminaire in that volume is considered one of the defining documents of the Enlightenment. Other works included writings on law, religion, and music. Since d'Alembert's final years were not especially happy ones, perhaps this account of his life should end with a glimpse at the humanity his philosophy often gave his work. Like many of his contemporaries, he considered the problem of calculating the relative risk associated with the new practice of smallpox inoculation, which in rare cases caused the disease it was designed to prevent. Although not very successful in the mathematical sense, he was careful to point out that the probability of accidental infection, however slight or elegantly derived, would be small consolation to a father whose child died from the
inoculation. It is greatly to his credit that d'Alembert did not believe such considerations irrelevant to the problem.

12.2 Separation of the Angular Part of the Laplacian

With Cartesian and cylindrical variables, the boundary conditions are important in determining the nature of the solutions of the ODE obtained from the PDE. In almost all applications, however, the angular part of the spherical variables can be separated and studied very generally. This is because the angular part of the Laplacian in the spherical coordinate system is closely related to the operation of rotation and the angular momentum, which are independent of any particular situation.

The separation of the angular part in spherical coordinates can be done in a fashion exactly analogous to the separation of time by writing $\Psi$ as a product of three functions, each depending on only one of the variables. However, we will follow an approach that is used in quantum mechanical treatments of angular momentum. This approach, which is based on the operator algebra of Chapter 2 and is extremely powerful and elegant, gives solutions for the angular part in closed form.

angular momentum operator

Define the vector operator $\mathbf{p}$ as $\mathbf{p} = -i\nabla$, so that its $j$th Cartesian component is $p_j = -i\,\partial/\partial x_j$ for $j = 1, 2, 3$. In quantum mechanics $\mathbf{p}$ (multiplied by $\hbar$) is the momentum operator. It is easy to verify that⁴

$$[x_j, p_k] = i\delta_{jk} \quad\text{and}\quad [x_j, x_k] = 0 = [p_j, p_k].$$

We can also define the angular momentum operator as $\mathbf{L} = \mathbf{r}\times\mathbf{p}$. This is expressed in components as $L_i = (\mathbf{r}\times\mathbf{p})_i = \epsilon_{ijk}x_j p_k$ for $i = 1, 2, 3$, where Einstein's summation convention (summing over repeated indices) is utilized.⁵

commutation relations between components of angular momentum operator

Using the commutation relations above, we obtain

$$[L_j, L_k] = i\epsilon_{jkl}L_l.$$
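The commutation relation just stated can be checked directly by letting the operators act on a generic function. The following sketch (not part of the text; it assumes sympy is available) realizes each $L_j$ as the differential operator $-i\epsilon_{jkl}x_k\,\partial/\partial x_l$ and verifies $[L_x, L_y] = iL_z$:

```python
# Verify [Lx, Ly] f = i Lz f for an arbitrary smooth function f(x, y, z).
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.Function('f')(x, y, z)

Lx = lambda g: -sp.I * (y * sp.diff(g, z) - z * sp.diff(g, y))
Ly = lambda g: -sp.I * (z * sp.diff(g, x) - x * sp.diff(g, z))
Lz = lambda g: -sp.I * (x * sp.diff(g, y) - y * sp.diff(g, x))

# All second-derivative terms cancel, leaving exactly i Lz f.
commutator = sp.simplify(Lx(Ly(f)) - Ly(Lx(f)) - sp.I * Lz(f))
print(commutator)  # 0
```

The other two component relations follow by cyclic permutation of the indices.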
We will see shortly that $\mathbf{L}$ can be written solely in terms of the angles $\theta$ and $\varphi$. Moreover, there is one factor of $\mathbf{p}$ in the definition of $\mathbf{L}$, so if we square $\mathbf{L}$, we will get two factors of $\mathbf{p}$, and a Laplacian may emerge in the expression for $\mathbf{L}\cdot\mathbf{L}$. In this manner, we may be able to write $\nabla^2$ in terms of $L^2$, which depends only on

⁴These operators act on the space of functions possessing enough "nice" properties to render the space suitable. The operator $x_j$ simply multiplies functions, while $p_j$ differentiates them.
⁵It is assumed that the reader is familiar with vector algebra using indices and such objects as $\delta_{ij}$ and $\epsilon_{ijk}$. For an introductory treatment, sufficient for our present discussion, see [Hass 99]. A more advanced treatment of these objects (tensors) can be found in Part VII of this book.
angles. Let us try this:

$$L^2 = \mathbf{L}\cdot\mathbf{L} = \sum_{i=1}^{3} L_i L_i = \epsilon_{ijk}x_j p_k\,\epsilon_{imn}x_m p_n = \epsilon_{ijk}\epsilon_{imn}\,x_j p_k x_m p_n$$
$$= (\delta_{jm}\delta_{kn} - \delta_{jn}\delta_{km})\,x_j p_k x_m p_n = x_j p_k x_j p_k - x_j p_k x_k p_j.$$

We need to write this expression in such a way that factors with the same index are next to each other, to give a dot product. We must also try, when possible, to keep the $\mathbf{p}$ factors to the right so that they can operate on functions without intervention from the $\mathbf{x}$ factors. We do this using the commutation relations between the $x$'s and the $p$'s:

$$L^2 = x_j(x_j p_k - i\delta_{kj})p_k - (p_k x_j + i\delta_{kj})x_k p_j$$
$$= x_j x_j p_k p_k - i x_j p_j - p_k x_k x_j p_j - i x_j p_j$$
$$= x_j x_j p_k p_k - 2i x_j p_j - (x_k p_k - i\delta_{kk})x_j p_j.$$

Recalling that $\delta_{kk} = \sum_{k=1}^{3}\delta_{kk} = 3$ and $x_j x_j = \sum_{j=1}^{3}x_j x_j = \mathbf{r}\cdot\mathbf{r} = r^2$, etc., we can write

$$L^2 = r^2\,\mathbf{p}\cdot\mathbf{p} + i\,\mathbf{r}\cdot\mathbf{p} - (\mathbf{r}\cdot\mathbf{p})(\mathbf{r}\cdot\mathbf{p}),$$

which, if we make the substitution $\mathbf{p} = -i\nabla$, yields

$$\nabla^2 = -r^{-2}L^2 + r^{-2}(\mathbf{r}\cdot\nabla)(\mathbf{r}\cdot\nabla) + r^{-2}\,\mathbf{r}\cdot\nabla. \tag{12.11}$$

Letting both sides act on the function $\Psi(r, \theta, \varphi)$, we get

$$\nabla^2\Psi = -\frac{1}{r^2}L^2\Psi + \frac{1}{r^2}(\mathbf{r}\cdot\nabla)(\mathbf{r}\cdot\nabla)\Psi + \frac{1}{r^2}\,\mathbf{r}\cdot\nabla\Psi.$$

Laplacian separated into angular and radial parts

But we note that $\mathbf{r}\cdot\nabla = r\hat{\mathbf{e}}_r\cdot\nabla = r\,\partial/\partial r$. We thus get the final form of $\nabla^2\Psi$ in spherical coordinates:

$$\nabla^2\Psi = -\frac{1}{r^2}L^2\Psi + \frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial\Psi}{\partial r}\right) + \frac{1}{r}\frac{\partial\Psi}{\partial r}. \tag{12.12}$$

It is important to note that Equation (12.11) is a general relation that holds in all coordinate systems. Although all the manipulations leading to it were done in Cartesian coordinates, since it is written in vector notation, there is no indication in the final form that it was derived using specific coordinates. Equation (12.12) is the spherical version of (12.11) and is the version we shall use.

We will first make the simplifying assumption that in Equation (12.10), the master equation, $f(\mathbf{r})$ is a function of $r$ only. Equation (12.10) then becomes

$$-\frac{1}{r^2}L^2\Psi + \frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial\Psi}{\partial r}\right) + \frac{1}{r}\frac{\partial\Psi}{\partial r} + f(r)\Psi = 0.$$
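Since (12.11) is coordinate-independent, it can be spot-checked entirely in Cartesian coordinates, where $L^2 = L_x^2 + L_y^2 + L_z^2$ and $\mathbf{r}\cdot\nabla = x\,\partial_x + y\,\partial_y + z\,\partial_z$. The following sketch (not part of the text; the test function and the use of sympy are my own choices) does exactly that:

```python
# Check Eq. (12.11): Laplacian = (-L^2 + (r.grad)(r.grad) + r.grad) / r^2,
# applied to an arbitrary polynomial test function in Cartesian coordinates.
import sympy as sp

x, y, z = sp.symbols('x y z')
psi = x**2 * y + z**3 + x * y * z      # arbitrary smooth test function

Lx = lambda g: -sp.I * (y * sp.diff(g, z) - z * sp.diff(g, y))
Ly = lambda g: -sp.I * (z * sp.diff(g, x) - x * sp.diff(g, z))
Lz = lambda g: -sp.I * (x * sp.diff(g, y) - y * sp.diff(g, x))
rdel = lambda g: x * sp.diff(g, x) + y * sp.diff(g, y) + z * sp.diff(g, z)

r2 = x**2 + y**2 + z**2
L2psi = Lx(Lx(psi)) + Ly(Ly(psi)) + Lz(Lz(psi))
lhs = sp.diff(psi, x, 2) + sp.diff(psi, y, 2) + sp.diff(psi, z, 2)
rhs = (-L2psi + rdel(rdel(psi)) + rdel(psi)) / r2
check = sp.simplify(lhs - rhs)
print(check)  # 0
```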
Assuming, for the time being, that $L^2$ depends only on $\theta$ and $\varphi$, and separating $\Psi$ into a product of two functions, $\Psi(r, \theta, \varphi) = R(r)Y(\theta, \varphi)$, we can rewrite this
equation as

$$-\frac{1}{r^2}L^2(RY) + \frac{1}{r}\frac{\partial}{\partial r}\left[r\frac{\partial}{\partial r}(RY)\right] + \frac{1}{r}\frac{\partial}{\partial r}(RY) + f(r)RY = 0.$$

Dividing by $RY$ and multiplying by $r^2$ yields

$$\underbrace{-\frac{1}{Y}L^2(Y)}_{-\alpha} + \underbrace{\frac{r}{R}\frac{d}{dr}\left(r\frac{dR}{dr}\right) + \frac{r}{R}\frac{dR}{dr} + r^2 f(r)}_{+\alpha} = 0,$$

or

$$L^2 Y = \alpha Y \tag{12.13}$$

and

$$\frac{d^2R}{dr^2} + \frac{2}{r}\frac{dR}{dr} + \left[f(r) - \frac{\alpha}{r^2}\right]R = 0. \tag{12.14}$$

We will concentrate on the angular part, Equation (12.13), leaving the radial part to the general discussion of ODEs. The rest of this subsection will focus on showing that $L_1 \equiv L_x$, $L_2 \equiv L_y$, and $L_3 \equiv L_z$ are independent of $r$.

Since $L_i$ is an operator, we can study its action on an arbitrary function $f$. Thus, $L_i f = -i\epsilon_{ijk}x_j\partial_k f \equiv -i\epsilon_{ijk}x_j\,\partial f/\partial x_k$. We can express the Cartesian $x_j$ in terms of $r$, $\theta$, and $\varphi$, and use the chain rule to express $\partial f/\partial x_k$ in terms of spherical coordinates. This will give us $L_i f$ expressed in terms of $r$, $\theta$, and $\varphi$. It will then emerge that $r$ is absent in the final expression.

Let us start with $x = r\sin\theta\cos\varphi$, $y = r\sin\theta\sin\varphi$, $z = r\cos\theta$, and their inverses, $r = (x^2+y^2+z^2)^{1/2}$, $\cos\theta = z/r$, $\tan\varphi = y/x$, and express the Cartesian derivatives in terms of spherical coordinates using the chain rule. The first such derivative is

$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial r}\frac{\partial r}{\partial x} + \frac{\partial f}{\partial\theta}\frac{\partial\theta}{\partial x} + \frac{\partial f}{\partial\varphi}\frac{\partial\varphi}{\partial x}. \tag{12.15}$$

The derivative of one coordinate system with respect to the other can be easily calculated. For example, $\partial r/\partial x = x/r = \sin\theta\cos\varphi$, and differentiating both sides of the equation $\cos\theta = z/r$, we obtain

$$-\sin\theta\,\frac{\partial\theta}{\partial x} = -\frac{z}{r^2}\frac{\partial r}{\partial x} = -\frac{zx}{r^3} = -\frac{\cos\theta\sin\theta\cos\varphi}{r} \;\Rightarrow\; \frac{\partial\theta}{\partial x} = \frac{\cos\theta\cos\varphi}{r}.$$

Finally, differentiating both sides of $\tan\varphi = y/x$ with respect to $x$ yields $\partial\varphi/\partial x = -\sin\varphi/(r\sin\theta)$. Using these expressions in Equation (12.15), we get

$$\frac{\partial f}{\partial x} = \sin\theta\cos\varphi\,\frac{\partial f}{\partial r} + \frac{\cos\theta\cos\varphi}{r}\frac{\partial f}{\partial\theta} - \frac{\sin\varphi}{r\sin\theta}\frac{\partial f}{\partial\varphi}.$$
In exactly the same way, we obtain

$$\frac{\partial f}{\partial y} = \sin\theta\sin\varphi\,\frac{\partial f}{\partial r} + \frac{\cos\theta\sin\varphi}{r}\frac{\partial f}{\partial\theta} + \frac{\cos\varphi}{r\sin\theta}\frac{\partial f}{\partial\varphi},$$
$$\frac{\partial f}{\partial z} = \cos\theta\,\frac{\partial f}{\partial r} - \frac{\sin\theta}{r}\frac{\partial f}{\partial\theta}.$$

We can now calculate $L_x$ by letting it act on an arbitrary function and expressing all Cartesian coordinates and derivatives in terms of spherical coordinates. The result is

$$L_x f = -iy\frac{\partial f}{\partial z} + iz\frac{\partial f}{\partial y} = i\left(\sin\varphi\,\frac{\partial}{\partial\theta} + \cot\theta\cos\varphi\,\frac{\partial}{\partial\varphi}\right)f,$$

Cartesian components of angular momentum operator expressed in spherical coordinates

or

$$L_x = i\left(\sin\varphi\,\frac{\partial}{\partial\theta} + \cot\theta\cos\varphi\,\frac{\partial}{\partial\varphi}\right). \tag{12.16}$$

Analogous arguments yield

$$L_y = i\left(-\cos\varphi\,\frac{\partial}{\partial\theta} + \cot\theta\sin\varphi\,\frac{\partial}{\partial\varphi}\right), \qquad L_z = -i\frac{\partial}{\partial\varphi}. \tag{12.17}$$

angular momentum squared as differential operator in θ and φ

It is left as a problem for the reader to show that by adding the squares of the components of the angular momentum operator, one obtains

$$L^2 = -\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial}{\partial\theta}\right) - \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\varphi^2}. \tag{12.18}$$

Substitution in Equation (12.12) yields the familiar expression for the Laplacian in spherical coordinates.

12.3 Construction of Eigenvalues of L²

Now that we have $L^2$ in terms of $\theta$ and $\varphi$, we could substitute in Equation (12.13), separate the $\theta$ and $\varphi$ dependence, and solve the corresponding ODEs. However, there is a much more elegant way of solving this problem algebraically, because Equation (12.13) is simply an eigenvalue equation for $L^2$. In this section, we will find the eigenvalues of $L^2$. The next section will evaluate the eigenvectors of $L^2$.

Let us consider $L^2$ as an abstract operator and write (12.13) as

$$L^2\,|Y\rangle = \alpha\,|Y\rangle,$$

where $|Y\rangle$ is an abstract vector whose $(\theta, \varphi)$th component can be calculated later. Since $L^2$ is a differential operator, it does not have a (finite-dimensional) matrix
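The differential operator (12.18) can be spot-checked by applying it to a known angular function. The sketch below (not from the text; it assumes sympy is available and uses a function proportional to the spherical harmonic $Y_{21}$, anticipating Section 12.4) recovers the eigenvalue $l(l+1) = 6$:

```python
# Apply the operator of Eq. (12.18) to e^{i phi} sin(theta) cos(theta),
# which is proportional to Y_21, and confirm the eigenvalue l(l+1) = 6.
import sympy as sp

theta, phi = sp.symbols('theta phi')
Y21 = sp.exp(sp.I * phi) * sp.sin(theta) * sp.cos(theta)

L2Y = (-sp.diff(sp.sin(theta) * sp.diff(Y21, theta), theta) / sp.sin(theta)
       - sp.diff(Y21, phi, 2) / sp.sin(theta)**2)

residual = sp.simplify(L2Y - 6 * Y21)
print(residual)  # 0, so L^2 Y21 = 6 Y21
```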
representation. Thus, the determinantal procedure for calculating eigenvalues and eigenfunctions will not work here, and we have to find another way.

The equation above specifies an eigenvalue, $\alpha$, and an eigenvector, $|Y\rangle$. There may be more than one $|Y\rangle$ corresponding to the same $\alpha$. To distinguish among these so-called degenerate eigenvectors, we choose a second operator, say $L_3 \in \{L_i\}$, that commutes with $L^2$. This allows us to select a basis in which both $L^2$ and $L_3$ are diagonal, or, equivalently, a basis whose vectors are simultaneous eigenvectors of both $L^2$ and $L_3$. This is possible by Theorem 4.4.15 and the fact that both $L^2$ and $L_3$ are hermitian operators in the space of square-integrable functions. (The proof is left as a problem.) In general, we would want to continue adding operators until we obtained a maximal set of commuting operators which could label the eigenvectors. In this case, $L^2$ and $L_3$ exhaust the set.⁶

Using the more common subscripts $x$, $y$, and $z$ instead of 1, 2, 3 and attaching labels to the eigenvectors, we have

$$L^2\,|Y_{\alpha\beta}\rangle = \alpha\,|Y_{\alpha\beta}\rangle, \qquad L_z\,|Y_{\alpha\beta}\rangle = \beta\,|Y_{\alpha\beta}\rangle. \tag{12.19}$$

The hermiticity of $L^2$ and $L_z$ implies the reality of $\alpha$ and $\beta$. Next we need to determine the possible values for $\alpha$ and $\beta$. Define two new operators $L_+ \equiv L_x + iL_y$ and $L_- \equiv L_x - iL_y$. It is then easily verified that

$$[L^2, L_\pm] = 0 \quad\text{and}\quad [L_z, L_\pm] = \pm L_\pm. \tag{12.20}$$

angular momentum raising and lowering operators

The first equation implies that $L_\pm$ are invariant operators when acting in the subspace corresponding to the eigenvalue $\alpha$; that is, $L_\pm\,|Y_{\alpha\beta}\rangle$ are eigenvectors of $L^2$ with the same eigenvalue $\alpha$:

$$L^2(L_\pm\,|Y_{\alpha\beta}\rangle) = L_\pm(L^2\,|Y_{\alpha\beta}\rangle) = \alpha L_\pm\,|Y_{\alpha\beta}\rangle.$$

The second equation in (12.20) yields

$$L_z(L_+\,|Y_{\alpha\beta}\rangle) = (L_z L_+)\,|Y_{\alpha\beta}\rangle = (L_+ L_z + L_+)\,|Y_{\alpha\beta}\rangle = L_+ L_z\,|Y_{\alpha\beta}\rangle + L_+\,|Y_{\alpha\beta}\rangle$$
$$= \beta L_+\,|Y_{\alpha\beta}\rangle + L_+\,|Y_{\alpha\beta}\rangle = (\beta + 1)L_+\,|Y_{\alpha\beta}\rangle.$$

This indicates that $L_+\,|Y_{\alpha\beta}\rangle$ has one more unit of the $L_z$ eigenvalue than $|Y_{\alpha\beta}\rangle$ does. In other words, $L_+$ raises the eigenvalue of $L_z$ by one unit. That is why $L_+$ is called a raising operator.
Similarly, $L_-$ is called a lowering operator because $L_z(L_-\,|Y_{\alpha\beta}\rangle) = (\beta - 1)L_-\,|Y_{\alpha\beta}\rangle$.

We can summarize the above discussion as

$$L_\pm\,|Y_{\alpha\beta}\rangle = C_\pm\,|Y_{\alpha,\beta\pm 1}\rangle,$$

⁶We could just as well have chosen $L^2$ and any other component as our maximal set. However, $L^2$ and $L_3$ is the universally accepted choice.
where $C_\pm$ are constants to be determined by a suitable normalization.

There are restrictions on (and relations between) $\alpha$ and $\beta$. First note that as $L^2$ is a sum of squares of hermitian operators, it must be a positive operator; that is, $\langle a|\,L^2\,|a\rangle \geq 0$ for all $|a\rangle$. In particular,

$$0 \leq \langle Y_{\alpha\beta}|\,L^2\,|Y_{\alpha\beta}\rangle = \alpha\,\langle Y_{\alpha\beta}|Y_{\alpha\beta}\rangle = \alpha\,\|Y_{\alpha\beta}\|^2.$$

Therefore, $\alpha \geq 0$. Next, one can readily show that

$$L^2 = L_+L_- + L_z^2 - L_z = L_-L_+ + L_z^2 + L_z. \tag{12.21}$$

Sandwiching both sides of the first equality between $|Y_{\alpha\beta}\rangle$ and $\langle Y_{\alpha\beta}|$ yields

$$\langle Y_{\alpha\beta}|\,L^2\,|Y_{\alpha\beta}\rangle = \langle Y_{\alpha\beta}|\,L_+L_-\,|Y_{\alpha\beta}\rangle + \langle Y_{\alpha\beta}|\,L_z^2\,|Y_{\alpha\beta}\rangle - \langle Y_{\alpha\beta}|\,L_z\,|Y_{\alpha\beta}\rangle,$$

with an analogous expression involving $L_-L_+$. Using the fact that $L_+ = (L_-)^\dagger$, we get

$$\alpha\,\|Y_{\alpha\beta}\|^2 = \langle Y_{\alpha\beta}|\,L_+L_-\,|Y_{\alpha\beta}\rangle + \beta^2\|Y_{\alpha\beta}\|^2 - \beta\|Y_{\alpha\beta}\|^2 = \|L_-\,|Y_{\alpha\beta}\rangle\|^2 + (\beta^2 - \beta)\|Y_{\alpha\beta}\|^2$$
$$= \langle Y_{\alpha\beta}|\,L_-L_+\,|Y_{\alpha\beta}\rangle + \beta^2\|Y_{\alpha\beta}\|^2 + \beta\|Y_{\alpha\beta}\|^2 = \|L_+\,|Y_{\alpha\beta}\rangle\|^2 + (\beta^2 + \beta)\|Y_{\alpha\beta}\|^2. \tag{12.22}$$

Because of the positivity of norms, this yields $\alpha \geq \beta^2 - \beta$ and $\alpha \geq \beta^2 + \beta$. Adding these two inequalities gives

$$2\alpha \geq 2\beta^2 \;\Rightarrow\; -\sqrt{\alpha} \leq \beta \leq \sqrt{\alpha}.$$

It follows that the values of $\beta$ are bounded. That is, there exist a maximum $\beta$, denoted by $\beta_+$, and a minimum $\beta$, denoted by $\beta_-$, beyond which there are no more values of $\beta$. This can happen only if

$$L_+\,|Y_{\alpha,\beta_+}\rangle = 0 \quad\text{and}\quad L_-\,|Y_{\alpha,\beta_-}\rangle = 0,$$

because if $L_\pm\,|Y_{\alpha,\beta_\pm}\rangle$ are not zero, then they must have values of $\beta$ corresponding to $\beta_\pm \pm 1$, which are not allowed. Using $\beta_+$ for $\beta$ in Equation (12.22) yields $(\alpha - \beta_+^2 - \beta_+)\|Y_{\alpha,\beta_+}\|^2 = 0$. By definition $|Y_{\alpha,\beta_+}\rangle \neq 0$ (otherwise $\beta_+ - 1$ would be the maximum). Thus, we obtain $\alpha = \beta_+^2 + \beta_+$. An analogous procedure using $\beta_-$ for $\beta$ yields $\alpha = \beta_-^2 - \beta_-$. We solve these two equations for $\beta_+$ and $\beta_-$:

$$\beta_+ = \tfrac{1}{2}\left(-1 \pm \sqrt{1+4\alpha}\right), \qquad \beta_- = \tfrac{1}{2}\left(1 \pm \sqrt{1+4\alpha}\right).$$

Since $\beta_+ \geq \beta_-$ and $\sqrt{1+4\alpha} \geq 1$, we must choose

$$\beta_+ = \tfrac{1}{2}\left(-1 + \sqrt{1+4\alpha}\right) = -\beta_-.$$
eigenvalues of L² and Lz given

Starting with $|Y_{\alpha,\beta_+}\rangle$, we can apply $L_-$ to it repeatedly. In each step we decrease the value of $\beta$ by one unit. There must be a limit to the number of vectors obtained in this way, because $\beta$ has a minimum. Therefore, there must exist a nonnegative integer $k$ such that

$$(L_-)^{k+1}\,|Y_{\alpha,\beta_+}\rangle = L_-\left((L_-)^k\,|Y_{\alpha,\beta_+}\rangle\right) = 0.$$

Thus, $(L_-)^k\,|Y_{\alpha,\beta_+}\rangle$ must be proportional to $|Y_{\alpha,\beta_-}\rangle$. In particular, since $(L_-)^k\,|Y_{\alpha,\beta_+}\rangle$ has a $\beta$ value equal to $\beta_+ - k$, we have $\beta_- = \beta_+ - k$. Now, using $\beta_- = -\beta_+$ (derived above) yields the important result

$$\beta_+ = \frac{k}{2} \equiv j \quad\text{for } k \in \mathbb{N}, \qquad\text{or}\qquad \alpha = j(j+1),$$

since $\alpha = \beta_+^2 + \beta_+$. This result is important enough to be stated as a theorem.

12.3.1. Theorem. The eigenvectors of $L^2$, denoted by $|Y_{jm}\rangle$, satisfy the eigenvalue relations

$$L^2\,|Y_{jm}\rangle = j(j+1)\,|Y_{jm}\rangle, \qquad L_z\,|Y_{jm}\rangle = m\,|Y_{jm}\rangle,$$

where $j$ is a nonnegative integer or half-integer, and $m$ can take a value in the set $\{-j, -j+1, \ldots, j-1, j\}$ of $2j+1$ numbers.

Let us briefly consider the normalization of the eigenvectors. We already know that the $|Y_{jm}\rangle$, being eigenvectors of the hermitian operators $L^2$ and $L_z$, are orthogonal. We also demand that they be of unit norm; that is,

$$\langle Y_{jm}|Y_{j'm'}\rangle = \delta_{jj'}\delta_{mm'}. \tag{12.23}$$

This will determine the constants $C_\pm$ introduced earlier. Let us consider $C_+$ first, which is defined by $L_+\,|Y_{jm}\rangle = C_+\,|Y_{j,m+1}\rangle$. The hermitian conjugate of this equation is $\langle Y_{jm}|\,L_- = C_+^*\,\langle Y_{j,m+1}|$. We contract these two equations to get

$$\langle Y_{jm}|\,L_-L_+\,|Y_{jm}\rangle = |C_+|^2\,\langle Y_{j,m+1}|Y_{j,m+1}\rangle.$$

Then we use the second relation in Equation (12.21), Theorem 12.3.1, and (12.23) to obtain

$$j(j+1) - m(m+1) = |C_+|^2 \;\Rightarrow\; |C_+| = \sqrt{j(j+1) - m(m+1)}.$$

Adopting the convention that the argument (phase) of the complex number $C_+$ is zero (and therefore that $C_+$ is real), we get $C_+ = \sqrt{j(j+1) - m(m+1)}$. Similarly, $C_- = \sqrt{j(j+1) - m(m-1)}$. Thus, we get

$$L_+\,|Y_{jm}\rangle = \sqrt{j(j+1) - m(m+1)}\,|Y_{j,m+1}\rangle, \qquad L_-\,|Y_{jm}\rangle = \sqrt{j(j+1) - m(m-1)}\,|Y_{j,m-1}\rangle. \tag{12.24}$$
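The ladder algebra above can be made concrete with finite matrices. The sketch below (not part of the text; the choice $j = 3/2$ is arbitrary, and numpy is assumed) builds $L_z$ and $L_\pm$ in the basis $\{|Y_{jm}\rangle\}$ using Eq. (12.24), then confirms that $L^2 = L_x^2 + L_y^2 + L_z^2$ is $j(j+1)$ times the identity, exactly as Theorem 12.3.1 requires:

```python
import numpy as np

# Basis ordered as m = j, j-1, ..., -j; here j = 3/2, so 2j+1 = 4 states.
j = 1.5
ms = np.arange(j, -j - 1, -1)
dim = len(ms)

Lz = np.diag(ms)
# Eq. (12.24): L+ |Y_jm> = sqrt(j(j+1) - m(m+1)) |Y_j,m+1>,
# so L+ has those square roots just above the diagonal.
Lp = np.diag(np.sqrt(j * (j + 1) - ms[1:] * (ms[1:] + 1)), 1)
Lm = Lp.conj().T                      # L- = (L+)^dagger
Lx = (Lp + Lm) / 2
Ly = (Lp - Lm) / (2 * 1j)

L2 = Lx @ Lx + Ly @ Ly + Lz @ Lz
print(np.diag(L2).real)               # each entry equals j(j+1) = 3.75
```

The same matrices also satisfy $[L_x, L_y] = iL_z$ and $[L_z, L_+] = L_+$, tying the abstract relations (12.20) to an explicit representation.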
12.3.2. Example. Let us find an expression for $|Y_{lm}\rangle$ by repeatedly applying $L_-$ to $|Y_{ll}\rangle$. The action of $L_-$ is completely described by Equation (12.24). For the first power of $L_-$, we obtain

$$L_-\,|Y_{ll}\rangle = \sqrt{l(l+1) - l(l-1)}\,|Y_{l,l-1}\rangle = \sqrt{2l}\,|Y_{l,l-1}\rangle.$$

We apply $L_-$ once more:

$$(L_-)^2\,|Y_{ll}\rangle = \sqrt{2l}\,L_-\,|Y_{l,l-1}\rangle = \sqrt{2l}\,\sqrt{l(l+1) - (l-1)(l-2)}\,|Y_{l,l-2}\rangle$$
$$= \sqrt{2l}\,\sqrt{2(2l-1)}\,|Y_{l,l-2}\rangle = \sqrt{2(2l)(2l-1)}\,|Y_{l,l-2}\rangle.$$

Applying $L_-$ a third time yields

$$(L_-)^3\,|Y_{ll}\rangle = \sqrt{2(2l)(2l-1)}\,L_-\,|Y_{l,l-2}\rangle = \sqrt{2(2l)(2l-1)}\,\sqrt{6(l-1)}\,|Y_{l,l-3}\rangle = \sqrt{3!(2l)(2l-1)(2l-2)}\,|Y_{l,l-3}\rangle.$$

The pattern suggests the following formula for a general power $k$:

$$(L_-)^k\,|Y_{ll}\rangle = \sqrt{k!(2l)(2l-1)\cdots(2l-k+1)}\,|Y_{l,l-k}\rangle,$$

or $(L_-)^k\,|Y_{ll}\rangle = \sqrt{k!(2l)!/(2l-k)!}\,|Y_{l,l-k}\rangle$. If we set $l - k = m$ and solve for $|Y_{l,m}\rangle$, we get

$$|Y_{l,m}\rangle = \sqrt{\frac{(l+m)!}{(l-m)!(2l)!}}\,(L_-)^{l-m}\,|Y_{ll}\rangle. \;\blacksquare$$

The discussion in this section is the standard treatment of angular momentum in quantum mechanics. In the context of quantum mechanics, Theorem 12.3.1 states the far-reaching physical result that particles can have integer or half-integer spin. Such a conclusion is tied to the rotation group in three dimensions, which, in turn, is an example of a Lie group, or a continuous group of transformations. We shall come back to a study of groups later. It is worth noting that it was the study of differential equations that led the Norwegian mathematician Sophus Lie to the investigation of their symmetries and the development of the beautiful branch of mathematics and theoretical physics that bears his name. Thus, the existence of a connection between group theory (rotation, angular momentum) and the differential equation we are trying to solve should not come as a surprise.

12.4 Eigenvectors of L²: Spherical Harmonics

The treatment in the preceding section took place in an abstract vector space. Let us go back to the function space and represent the operators and vectors in terms of $\theta$ and $\varphi$. First, let us consider $L_z$ in the form of a differential operator, $L_z = -i\,\partial/\partial\varphi$.
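The pattern guessed in Example 12.3.2 can be tested numerically by multiplying the $C_-$ factors of Eq. (12.24) step by step and comparing with the closed form $\sqrt{k!(2l)!/(2l-k)!}$. This is a sketch, not part of the text, and $l = 4$ is an arbitrary sample value:

```python
# Check the coefficient of (L_-)^k |Y_ll> in Example 12.3.2.
from math import factorial, isclose, prod, sqrt

def ladder_product(l, k):
    """Product of C- = sqrt(l(l+1) - m(m-1)) for m = l, l-1, ..., l-k+1."""
    return prod(sqrt(l * (l + 1) - m * (m - 1)) for m in range(l, l - k, -1))

l = 4
ok = all(
    isclose(ladder_product(l, k),
            sqrt(factorial(k) * factorial(2 * l) / factorial(2 * l - k)))
    for k in range(0, 2 * l + 1)
)
print(ok)  # True: the pattern holds for every k from 0 to 2l
```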
The eigenvalue equation for $L_z$ becomes

$$-i\frac{\partial}{\partial\varphi}Y_{jm}(\theta, \varphi) = m\,Y_{jm}(\theta, \varphi).$$
We write $Y_{jm}(\theta, \varphi) = P_{jm}(\theta)Q_{jm}(\varphi)$ and substitute in the above equation to obtain the ODE for $\varphi$, $dQ_{jm}/d\varphi = im\,Q_{jm}$, which has a solution of the form $Q_{jm}(\varphi) = C_{jm}e^{im\varphi}$, where $C_{jm}$ is a constant. Absorbing this constant into $P_{jm}$, we can write $Y_{jm}(\theta, \varphi) = P_{jm}(\theta)e^{im\varphi}$.

In classical physics the value of functions must be the same at $\varphi$ as at $\varphi + 2\pi$. This condition restricts the values of $m$ to integers. In quantum mechanics, on the other hand, it is the absolute values of functions that are physically measurable quantities, and therefore $m$ can also be a half-integer.

12.4.1. Box. From now on, we shall assume that $m$ is an integer and denote the eigenvectors of $L^2$ by $Y_{lm}(\theta, \varphi)$, in which $l$ is a nonnegative integer.

Our task is to find an analytic expression for $Y_{lm}(\theta, \varphi)$. We need differential expressions for $L_\pm$. These can easily be obtained from the expressions for $L_x$ and $L_y$ given in Equations (12.16) and (12.17). (The straightforward manipulations are left as a problem.) We thus have

$$L_\pm = e^{\pm i\varphi}\left(\pm\frac{\partial}{\partial\theta} + i\cot\theta\,\frac{\partial}{\partial\varphi}\right). \tag{12.25}$$

Since $l$ is the highest value of $m$, when $L_+$ acts on $Y_{ll}(\theta, \varphi) = P_{ll}(\theta)e^{il\varphi}$ the result must be zero. This leads to the differential equation

$$\left(\frac{\partial}{\partial\theta} + i\cot\theta\,\frac{\partial}{\partial\varphi}\right)\left[P_{ll}(\theta)e^{il\varphi}\right] = 0 \;\Rightarrow\; \left(\frac{d}{d\theta} - l\cot\theta\right)P_{ll}(\theta) = 0.$$

The solution to this differential equation is readily found to be $P_{ll}(\theta) = C_l(\sin\theta)^l$. The constant is subscripted because each $P_{ll}$ may lead to a different constant of integration. We can now write

$$Y_{ll}(\theta, \varphi) = C_l(\sin\theta)^l e^{il\varphi}.$$

With $Y_{ll}(\theta, \varphi)$ at our disposal, we can obtain any $Y_{lm}(\theta, \varphi)$ by repeated application of $L_-$. In principle, the result of Example 12.3.2 gives all the (abstract) eigenvectors. In practice, however, it is helpful to have a closed form (in terms of derivatives) for just the $\theta$ part of $Y_{lm}(\theta, \varphi)$. So, let us apply $L_-$, as given in Equation (12.25), to $Y_{ll}(\theta, \varphi)$:
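The ODE $(d/d\theta - l\cot\theta)P_{ll} = 0$ is separable, and a computer algebra system reproduces the stated solution $P_{ll}(\theta) = C_l(\sin\theta)^l$. A sketch (not part of the text; sympy is assumed and $l = 3$ is a sample value):

```python
# Solve dP/dtheta - l*cot(theta)*P = 0 and confirm P = C * sin(theta)^l.
import sympy as sp

theta = sp.Symbol('theta')
P = sp.Function('P')
l = 3

sol = sp.dsolve(
    sp.Eq(P(theta).diff(theta) - l * sp.cot(theta) * P(theta), 0), P(theta))
# residual = 0 confirms the returned solution satisfies the ODE;
# ratio constant confirms it is proportional to sin(theta)^3.
residual = sp.simplify(sol.rhs.diff(theta) - l * sp.cot(theta) * sol.rhs)
ratio = sp.simplify(sol.rhs / sp.sin(theta)**l)
print(sol, residual)
```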
$$L_-Y_{ll} = e^{-i\varphi}\left(-\frac{\partial}{\partial\theta} + i\cot\theta\,\frac{\partial}{\partial\varphi}\right)\left[P_{ll}(\theta)e^{il\varphi}\right] = e^{-i\varphi}\left[-\frac{d}{d\theta} + i\cot\theta\,(il)\right]\left[P_{ll}(\theta)e^{il\varphi}\right]$$
$$= (-1)\,e^{i(l-1)\varphi}\left(\frac{d}{d\theta} + l\cot\theta\right)P_{ll}(\theta).$$
It can be shown that for $n$ a positive integer,

$$\left(\frac{d}{d\theta} + n\cot\theta\right)f(\theta) = \frac{1}{\sin^n\theta}\frac{d}{d\theta}\left[\sin^n\theta\,f(\theta)\right]. \tag{12.26}$$

Using this result yields

$$L_-Y_{ll} = \sqrt{2l}\,Y_{l,l-1} = \sqrt{2l}\,e^{i(l-1)\varphi}P_{l,l-1}(\theta) = (-1)\,e^{i(l-1)\varphi}\frac{1}{\sin^l\theta}\frac{d}{d\theta}\left[\sin^l\theta\,(C_l\sin^l\theta)\right]$$
$$= (-1)\,C_l\,\frac{e^{i(l-1)\varphi}}{\sin^l\theta}\frac{d}{d\theta}\left(\sin^{2l}\theta\right). \tag{12.27}$$

We apply $L_-$ to (12.27), and use Equation (12.26) with $n = l - 1$, to obtain $(L_-)^2 Y_{ll}$. With a little more effort one can detect a pattern and obtain

$$(L_-)^k Y_{ll} = C_l\,\frac{e^{i(l-k)\varphi}}{(1-u^2)^{(l-k)/2}}\frac{d^k}{du^k}\left[(1-u^2)^l\right], \qquad u = \cos\theta.$$

If we let $k = l - m$ and make use of the result obtained in Example 12.3.2, we obtain

$$Y_{lm}(\theta, \varphi) = \sqrt{\frac{(l+m)!}{(l-m)!(2l)!}}\;C_l\,\frac{e^{im\varphi}}{(1-u^2)^{m/2}}\frac{d^{l-m}}{du^{l-m}}\left[(1-u^2)^l\right].$$

To specify $Y_{lm}(\theta, \varphi)$ completely, we need to evaluate $C_l$. Since $C_l$ does not depend on $m$, we set $m = 0$ in the above expression, obtaining

$$Y_{l0}(u, \varphi) = \frac{1}{\sqrt{(2l)!}}\,C_l\,\frac{d^l}{du^l}\left[(1-u^2)^l\right].$$

The RHS looks very much like the Legendre polynomials of Chapter 7. In fact,

$$Y_{l0}(u, \varphi) = \frac{1}{\sqrt{(2l)!}}\,(-1)^l 2^l l!\;C_l\,P_l(u) \equiv A_l P_l(u). \tag{12.28}$$
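The identity (12.26), on which the whole descent rests, is a one-line product-rule computation; the sketch below (not part of the text; sympy assumed) verifies it for symbolic $n$ and arbitrary $f$:

```python
# Verify (d/dtheta + n*cot(theta)) f = sin^{-n} d/dtheta (sin^n f).
import sympy as sp

theta = sp.Symbol('theta')
n = sp.Symbol('n', positive=True, integer=True)
f = sp.Function('f')(theta)

lhs = f.diff(theta) + n * sp.cot(theta) * f
rhs = sp.diff(sp.sin(theta)**n * f, theta) / sp.sin(theta)**n
residual = sp.simplify(lhs - rhs)
print(residual)  # 0
```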
Therefore, the normalization of $Y_{l0}$ and the Legendre polynomials $P_l$ determines $C_l$. We now use Equation (6.9) to obtain the integral form of the orthonormality relation for $Y_{lm}$:

$$\int_0^{2\pi}d\varphi\int_0^{\pi}Y^*_{l'm'}(\theta, \varphi)\,Y_{lm}(\theta, \varphi)\sin\theta\,d\theta = \delta_{ll'}\delta_{mm'}, \tag{12.29}$$

which in terms of $u = \cos\theta$ becomes

$$\int_0^{2\pi}d\varphi\int_{-1}^{1}Y^*_{l'm'}(u, \varphi)\,Y_{lm}(u, \varphi)\,du = \delta_{ll'}\delta_{mm'}. \tag{12.30}$$

Problem 12.15 shows that using (12.29) one gets $A_l = \sqrt{(2l+1)/(4\pi)}$. Therefore, Equation (12.28) yields not only the value of $C_l$, but also the useful relation

$$Y_{l0}(u, \varphi) = \sqrt{\frac{2l+1}{4\pi}}\,P_l(u). \tag{12.31}$$

spherical harmonics

Substituting the value of $C_l$ thus obtained, we finally get

$$Y_{lm}(\theta, \varphi) = (-1)^l\sqrt{\frac{2l+1}{4\pi}\cdot\frac{(l+m)!}{(l-m)!}}\;\frac{1}{2^l l!}\;\frac{e^{im\varphi}}{(1-u^2)^{m/2}}\frac{d^{l-m}}{du^{l-m}}\left[(1-u^2)^l\right], \tag{12.32}$$

where $u = \cos\theta$. These functions, the eigenfunctions of $L^2$ and $L_z$, are called spherical harmonics. They occur frequently in those physical applications for which the Laplacian is expressed in terms of spherical coordinates. One can immediately read off the $\theta$ part of the spherical harmonics:

$$P_{lm}(u) = (-1)^l\sqrt{\frac{2l+1}{4\pi}\cdot\frac{(l+m)!}{(l-m)!}}\;\frac{1}{2^l l!}\;\frac{1}{(1-u^2)^{m/2}}\frac{d^{l-m}}{du^{l-m}}\left[(1-u^2)^l\right].$$

However, this is not the version used in the literature. For historical reasons the associated Legendre functions $P_l^m(u)$ are used. These are defined by

associated Legendre functions

$$P_l^m(u) = (-1)^m\sqrt{\frac{4\pi}{2l+1}\cdot\frac{(l+m)!}{(l-m)!}}\;P_{lm}(u) = (-1)^{l+m}\,\frac{(l+m)!}{(l-m)!}\,\frac{(1-u^2)^{-m/2}}{2^l l!}\frac{d^{l-m}}{du^{l-m}}\left[(1-u^2)^l\right]. \tag{12.33}$$

Thus,

$$Y_{lm}(\theta, \varphi) = (-1)^m\sqrt{\frac{2l+1}{4\pi}\cdot\frac{(l-m)!}{(l+m)!}}\;P_l^m(\cos\theta)\,e^{im\varphi}. \tag{12.34}$$
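The orthonormality (12.30) can be verified numerically. The sketch below (not part of the text) assumes numpy and scipy are available; note that scipy's `lpmv` follows the Condon-Shortley convention, i.e. it equals $(-1)^m$ times the $P_l^m$ of Eq. (12.33), so that factor and the explicit $(-1)^m$ in (12.34) cancel:

```python
# Numerical check of Eq. (12.30): Gauss-Legendre in u = cos(theta),
# uniform (trapezoid) quadrature in phi.
import numpy as np
from math import factorial, pi
from scipy.special import lpmv

def Y(l, m, u, phi):
    if m < 0:  # Eq. (12.36): Y_{l,-m} = (-1)^m Y_{lm}^*
        return (-1) ** m * np.conj(Y(l, -m, u, phi))
    norm = np.sqrt((2*l + 1) / (4*pi) * factorial(l - m) / factorial(l + m))
    return norm * lpmv(m, l, u) * np.exp(1j * m * phi)

u, w = np.polynomial.legendre.leggauss(50)
phi = np.linspace(0, 2 * np.pi, 200, endpoint=False)
U, PHI = np.meshgrid(u, phi, indexing='ij')
W = w[:, None] * (2 * np.pi / len(phi))

def inner(l1, m1, l2, m2):
    return np.sum(np.conj(Y(l1, m1, U, PHI)) * Y(l2, m2, U, PHI) * W)

print(round(inner(2, 1, 2, 1).real, 8))   # 1.0 (normalized)
print(round(abs(inner(3, 1, 2, 1)), 8))   # 0.0 (orthogonal)
```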
We generated the spherical harmonics starting with $Y_{ll}(\theta, \varphi)$ and applying the lowering operator $L_-$. We could have started with $Y_{l,-l}(\theta, \varphi)$ instead, and applied the raising operator $L_+$. The latter procedure is identical to the former; nevertheless, we outline it below because of some important relations that emerge along the way. We first note that

$$|Y_{l,-m}\rangle = \sqrt{\frac{(l+m)!}{(l-m)!(2l)!}}\,(L_+)^{l-m}\,|Y_{l,-l}\rangle. \tag{12.35}$$

(This can be obtained following the steps of Example 12.3.2.) Next, we use $L_-\,|Y_{l,-l}\rangle = 0$ in differential form to obtain

$$\left(\frac{d}{d\theta} - l\cot\theta\right)P_{l,-l}(\theta) = 0,$$

which has the same form as the differential equation for $P_{ll}$. Thus, the solution is $P_{l,-l}(\theta) = C_l'(\sin\theta)^l$, and

$$Y_{l,-l}(\theta, \varphi) = P_{l,-l}(\theta)e^{-il\varphi} = C_l'(\sin\theta)^l e^{-il\varphi}.$$

Applying $L_+$ repeatedly yields

$$(L_+)^k Y_{l,-l}(u, \varphi) = C_l'\,\frac{(-1)^k e^{-i(l-k)\varphi}}{(1-u^2)^{(l-k)/2}}\frac{d^k}{du^k}\left[(1-u^2)^l\right],$$

where $u = \cos\theta$. Substituting $k = l - m$ and using Equation (12.35) gives

$$Y_{l,-m}(u, \varphi) = (-1)^{l-m}\sqrt{\frac{(l+m)!}{(l-m)!(2l)!}}\;C_l'\,\frac{e^{-im\varphi}}{(1-u^2)^{m/2}}\frac{d^{l-m}}{du^{l-m}}\left[(1-u^2)^l\right].$$

The constant $C_l'$ can be determined as before. In fact, for $m = 0$ we get exactly the same result as before, so we expect $C_l'$ to be identical to $C_l$. Comparison with Equation (12.32) yields

$$Y_{l,-m}(\theta, \varphi) = (-1)^m\,Y^*_{lm}(\theta, \varphi), \tag{12.36}$$

and using the definition $Y_{l,-m}(\theta, \varphi) = P_{l,-m}(\theta)e^{-im\varphi}$ and the first part of Equation (12.33), we obtain

$$P_l^{-m}(\theta) = (-1)^m\,\frac{(l-m)!}{(l+m)!}\,P_l^m(\theta). \tag{12.37}$$
The first few spherical harmonics with positive $m$ are given below. Those with negative $m$ can be obtained using Equation (12.36).

For $l = 0$:
$$Y_{00} = \frac{1}{\sqrt{4\pi}}.$$

For $l = 1$:
$$Y_{10} = \sqrt{\frac{3}{4\pi}}\cos\theta, \qquad Y_{11} = -\sqrt{\frac{3}{8\pi}}\,e^{i\varphi}\sin\theta.$$

For $l = 2$:
$$Y_{20} = \sqrt{\frac{5}{16\pi}}\,(3\cos^2\theta - 1), \qquad Y_{21} = -\sqrt{\frac{15}{8\pi}}\,e^{i\varphi}\sin\theta\cos\theta, \qquad Y_{22} = \sqrt{\frac{15}{32\pi}}\,e^{2i\varphi}\sin^2\theta.$$

For $l = 3$:
$$Y_{30} = \sqrt{\frac{7}{16\pi}}\,(5\cos^3\theta - 3\cos\theta), \qquad Y_{31} = -\sqrt{\frac{21}{64\pi}}\,e^{i\varphi}\sin\theta\,(5\cos^2\theta - 1),$$
$$Y_{32} = \sqrt{\frac{105}{32\pi}}\,e^{2i\varphi}\sin^2\theta\cos\theta, \qquad Y_{33} = -\sqrt{\frac{35}{64\pi}}\,e^{3i\varphi}\sin^3\theta.$$

From Equations (12.13), (12.18), and (12.34) and the fact that $\alpha = l(l+1)$ for some nonnegative integer $l$, we obtain

$$-\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial Y_{lm}}{\partial\theta}\right) - \frac{1}{\sin^2\theta}\frac{\partial^2 Y_{lm}}{\partial\varphi^2} = l(l+1)\,Y_{lm},$$

which gives

$$\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\,\frac{dP_l^m}{d\theta}\right) - \frac{m^2}{\sin^2\theta}\,P_l^m + l(l+1)\,P_l^m = 0.$$

As before, we let $u = \cos\theta$ to obtain

$$\frac{d}{du}\left[(1-u^2)\frac{dP_l^m}{du}\right] + \left[l(l+1) - \frac{m^2}{1-u^2}\right]P_l^m = 0. \tag{12.38}$$

associated Legendre differential equation

This is called the associated Legendre differential equation. Its solutions, the associated Legendre functions, are given in closed form in Equation (12.33). For $m = 0$, Equation (12.38) reduces to the Legendre differential equation whose solutions, again given by Equation (12.33) with $m = 0$, are the Legendre polynomials encountered in Chapter 7. When $m = 0$, the spherical harmonics become $\varphi$-independent. This corresponds to a physical situation in which there is an explicit azimuthal symmetry. In such cases (when it is obvious that the physical property in question does not depend on $\varphi$) a Legendre polynomial, depending only on $\cos\theta$, will multiply the radial function.
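Two of the tabulated entries can be spot-checked against the general formula (12.34). The sketch below (not part of the text) assumes sympy is available; sympy's `assoc_legendre` carries the Condon-Shortley phase, i.e. it plays the role of $(-1)^m P_l^m$ in (12.34), so the phases combine and no extra sign is needed:

```python
# Compare the tabulated Y_21 and Y_33 with Eq. (12.34) at a sample point.
import sympy as sp

theta, phi = sp.symbols('theta phi')

def Y(l, m):
    norm = sp.sqrt((2*l + 1) / (4*sp.pi)
                   * sp.factorial(l - m) / sp.factorial(l + m))
    return norm * sp.assoc_legendre(l, m, sp.cos(theta)) * sp.exp(sp.I*m*phi)

Y21_table = -sp.sqrt(15 / (8*sp.pi)) * sp.exp(sp.I*phi) \
            * sp.sin(theta) * sp.cos(theta)
Y33_table = -sp.sqrt(35 / (64*sp.pi)) * sp.exp(3*sp.I*phi) * sp.sin(theta)**3

pt = {theta: 0.7, phi: 1.3}   # arbitrary sample point
d1 = complex((Y(2, 1) - Y21_table).subs(pt).evalf())
d2 = complex((Y(3, 3) - Y33_table).subs(pt).evalf())
print(abs(d1) < 1e-12, abs(d2) < 1e-12)  # True True
```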
12.4.1 Expansion of Angular Functions

The orthonormality of spherical harmonics can be utilized to expand functions of $\theta$ and $\varphi$ in terms of them. The fact that these functions are complete will be discussed in a general way in the context of Sturm-Liouville theory. Assuming completeness for now, we write

$$f(\theta, \varphi) = \begin{cases}\displaystyle\sum_{l=0}^{\infty}\sum_{m=-l}^{l}a_{lm}Y_{lm}(\theta, \varphi) & \text{if } l \text{ is not fixed,}\\[2mm] \displaystyle\sum_{m=-l}^{l}a_{lm}Y_{lm}(\theta, \varphi) & \text{if } l \text{ is fixed,}\end{cases} \tag{12.39}$$

where we have included the case where it is known a priori that $f(\theta, \varphi)$ has a given fixed $l$ value. To find $a_{lm}$, we multiply both sides by $Y^*_{lm}(\theta, \varphi)$ and integrate over the solid angle. The result, obtained by using the orthonormality relation, is

$$a_{lm} = \iint d\Omega\,f(\theta, \varphi)\,Y^*_{lm}(\theta, \varphi), \tag{12.40}$$

where $d\Omega \equiv \sin\theta\,d\theta\,d\varphi$ is the element of solid angle. A useful special case of this formula is

$$a_{l0}^{(f)} = \iint d\Omega\,f(\theta, \varphi)\,Y^*_{l0}(\theta, \varphi) = \sqrt{\frac{2l+1}{4\pi}}\iint d\Omega\,f(\theta, \varphi)\,P_l(\cos\theta), \tag{12.41}$$

where we have introduced an extra superscript to emphasize the relation of the expansion coefficients with the function being expanded. Another useful relation is obtained when we let $\theta = 0$ in Equation (12.39):

$$f(\theta, \varphi)\Big|_{\theta=0} = \begin{cases}\displaystyle\sum_{l=0}^{\infty}\sum_{m=-l}^{l}a_{lm}Y_{lm}(\theta, \varphi)\Big|_{\theta=0} & \text{if } l \text{ is not fixed,}\\[2mm] \displaystyle\sum_{m=-l}^{l}a_{lm}Y_{lm}(\theta, \varphi)\Big|_{\theta=0} & \text{if } l \text{ is fixed.}\end{cases}$$

From Equations (12.33) and (12.34) one can show that

$$Y_{lm}(\theta, \varphi)\Big|_{\theta=0} = \sqrt{\frac{2l+1}{4\pi}}\,\delta_{m0}.$$

Therefore,

$$f(\theta, \varphi)\Big|_{\theta=0} = \begin{cases}\displaystyle\sum_{l=0}^{\infty}a_{l0}^{(f)}\sqrt{\frac{2l+1}{4\pi}} & \text{if } l \text{ is not fixed,}\\[2mm] a_{l0}^{(f)}\sqrt{\dfrac{2l+1}{4\pi}} & \text{if } l \text{ is fixed.}\end{cases} \tag{12.42}$$
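A concrete instance of the expansion (12.39)–(12.40): since $\cos^2\theta = \tfrac{1}{3}\bigl(2P_2(\cos\theta) + 1\bigr)$, only $a_{00}$ and $a_{20}$ survive, with $a_{00} = 2\sqrt{\pi}/3$ and $a_{20} = \tfrac{4}{15}\sqrt{5\pi}$. The sketch below (not part of the text; numpy and scipy assumed, with `lpmv` as in the earlier orthonormality check) computes all $a_{lm}$ up to $l = 2$ by quadrature:

```python
# Expand f(theta, phi) = cos^2(theta) in spherical harmonics via Eq. (12.40).
import numpy as np
from math import factorial, pi
from scipy.special import lpmv

def Y(l, m, u, phi):
    if m < 0:  # Eq. (12.36)
        return (-1) ** m * np.conj(Y(l, -m, u, phi))
    norm = np.sqrt((2*l + 1) / (4*pi) * factorial(l - m) / factorial(l + m))
    return norm * lpmv(m, l, u) * np.exp(1j * m * phi)

u, w = np.polynomial.legendre.leggauss(40)      # u = cos(theta)
phi = np.linspace(0, 2 * np.pi, 100, endpoint=False)
U, PHI = np.meshgrid(u, phi, indexing='ij')
W = w[:, None] * (2 * np.pi / len(phi))

f = U ** 2                                       # cos^2(theta)
a = {(l, m): np.sum(f * np.conj(Y(l, m, U, PHI)) * W)
     for l in range(3) for m in range(-l, l + 1)}

print(abs(a[0, 0] - 2 * np.sqrt(pi) / 3) < 1e-10,
      abs(a[2, 0] - 4 * np.sqrt(5 * pi) / 15) < 1e-10)  # True True
```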
[Figure 12.1: The unit vectors $\hat{\mathbf{e}}_r$ and $\hat{\mathbf{e}}_{r'}$, with their spherical angles and the angle $\gamma$ between them.]

12.4.2 Addition Theorem for Spherical Harmonics

An important consequence of the expansion in terms of $Y_{lm}$ is called the addition theorem for spherical harmonics. Consider two unit vectors $\hat{\mathbf{e}}_r$ and $\hat{\mathbf{e}}_{r'}$, making spherical angles $(\theta, \varphi)$ and $(\theta', \varphi')$, respectively, as shown in Figure 12.1. Let $\gamma$ be the angle between the two vectors. The addition theorem states that

addition theorem for spherical harmonics

$$P_l(\cos\gamma) = \frac{4\pi}{2l+1}\sum_{m=-l}^{l}Y^*_{lm}(\theta', \varphi')\,Y_{lm}(\theta, \varphi). \tag{12.43}$$

We shall not give a proof of this theorem here and refer the reader to an elegant proof on page 866 which uses the representation theory of groups.

The addition theorem is particularly useful in the expansion of the frequently occurring expression $1/|\mathbf{r} - \mathbf{r}'|$. For definiteness we assume $|\mathbf{r}'| \equiv r' < |\mathbf{r}| \equiv r$. Then, introducing $t = r'/r$, we have

$$\frac{1}{|\mathbf{r} - \mathbf{r}'|} = \frac{1}{(r^2 + r'^2 - 2rr'\cos\gamma)^{1/2}} = \frac{1}{r}\left(1 + t^2 - 2t\cos\gamma\right)^{-1/2}.$$

Recalling the generating function for Legendre polynomials from Chapter 7 and using the addition theorem, we get

$$\frac{1}{|\mathbf{r} - \mathbf{r}'|} = \frac{1}{r}\sum_{l=0}^{\infty}t^l P_l(\cos\gamma) = \sum_{l=0}^{\infty}\frac{r'^l}{r^{l+1}}\,\frac{4\pi}{2l+1}\sum_{m=-l}^{l}Y^*_{lm}(\theta', \varphi')\,Y_{lm}(\theta, \varphi).$$
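The addition theorem (12.43) lends itself to a direct numerical check: pick two directions, compute $\cos\gamma$ from the dot product of the unit vectors, and compare the two sides. A sketch (not part of the text; numpy and scipy assumed, sample angles arbitrary):

```python
# Numerical check of the addition theorem (12.43) for l = 3.
import numpy as np
from math import cos, factorial, pi, sin
from scipy.special import eval_legendre, lpmv

def Y(l, m, theta, phi):
    if m < 0:  # Eq. (12.36)
        return (-1) ** m * np.conj(Y(l, -m, theta, phi))
    norm = np.sqrt((2*l + 1) / (4*pi) * factorial(l - m) / factorial(l + m))
    return norm * lpmv(m, l, cos(theta)) * np.exp(1j * m * phi)

theta, phi = 0.6, 2.1
thetap, phip = 1.4, 0.5
# cos(gamma) from the dot product of the two unit vectors
cosg = sin(theta) * sin(thetap) * cos(phi - phip) + cos(theta) * cos(thetap)

l = 3
rhs = 4*pi / (2*l + 1) * sum(np.conj(Y(l, m, thetap, phip))
                             * Y(l, m, theta, phi)
                             for m in range(-l, l + 1))
print(np.isclose(rhs.real, eval_legendre(l, cosg)), abs(rhs.imag) < 1e-12)
```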
expansion of 1/|r − r′| in spherical coordinates

It is clear that if $r < r'$, we should expand in terms of the ratio $r/r'$. It is therefore customary to use $r_<$ to denote the smaller and $r_>$ to denote the larger of the two radii $r$ and $r'$. Then the above equation is written as

$$\frac{1}{|\mathbf{r} - \mathbf{r}'|} = \sum_{l=0}^{\infty}\frac{r_<^l}{r_>^{l+1}}\,\frac{4\pi}{2l+1}\sum_{m=-l}^{l}Y^*_{lm}(\theta', \varphi')\,Y_{lm}(\theta, \varphi). \tag{12.44}$$

This equation is used frequently in the study of Coulomb-like potentials.

12.5 Problems

12.1. By applying the operator $[x_j, p_k]$ to an arbitrary function $f(\mathbf{r})$, show that $[x_j, p_k] = i\delta_{jk}$.

12.2. Use the defining relation $L_i = \epsilon_{ijk}x_j p_k$ to show that $x_j p_k - x_k p_j = \epsilon_{ijk}L_i$. In both of these expressions a sum over the repeated indices is understood.

12.3. For the angular momentum operator $L_i = \epsilon_{ijk}x_j p_k$, show that the commutation relation $[L_j, L_k] = i\epsilon_{jkl}L_l$ holds.

12.4. Evaluate $\partial f/\partial y$ and $\partial f/\partial z$ in spherical coordinates and find $L_y$ and $L_z$ in terms of spherical coordinates.

12.5. Obtain an expression for $L^2$ in terms of $\theta$ and $\varphi$, and substitute the result in Equation (12.12) to obtain the Laplacian in spherical coordinates.

12.6. Show that $L^2 = L_+L_- + L_z^2 - L_z$ and $L^2 = L_-L_+ + L_z^2 + L_z$.

12.7. Show that $L^2$, $L_x$, $L_y$, and $L_z$ are hermitian operators in the space of square-integrable functions.

12.8. Verify the following commutation relations:
$$[L^2, L_\pm] = 0, \qquad [L_z, L_\pm] = \pm L_\pm.$$

12.9. Show that $L_-\,|Y_{\alpha\beta}\rangle$ has $\beta - 1$ as its eigenvalue for $L_z$, and that $|Y_{\alpha,\beta_\pm}\rangle$ cannot be zero.

12.10. Show that if the $|Y_{jm}\rangle$ are normalized to unity, then with proper choice of phase, $L_-\,|Y_{jm}\rangle = \sqrt{j(j+1) - m(m-1)}\,|Y_{j,m-1}\rangle$.

12.11. Derive Equation (12.35).
12.12. Starting with $L_x$ and $L_y$, derive the following expression for $L_\pm$:
$$L_\pm = e^{\pm i\varphi}\left(\pm\frac{\partial}{\partial\theta} + i\cot\theta\,\frac{\partial}{\partial\varphi}\right).$$

12.13. Integrate $dP/d\theta - l\cot\theta\,P = 0$ to find $P(\theta)$.

12.14. Verify the following differential identity:
$$\left(\frac{d}{d\theta} + n\cot\theta\right)f(\theta) = \frac{1}{\sin^n\theta}\frac{d}{d\theta}\left[\sin^n\theta\,f(\theta)\right].$$

12.15. Let $l = l'$ and $m = m' = 0$ in Equation (12.30), and substitute for $Y_{l0}$ from Equation (12.28) to obtain $A_l = \sqrt{(2l+1)/(4\pi)}$.

12.16. Show that
$$(L_+)^k Y_{l,-l}(u, \varphi) = C_l'\,\frac{(-1)^k e^{-i(l-k)\varphi}}{(1-u^2)^{(l-k)/2}}\frac{d^k}{du^k}\left[(1-u^2)^l\right].$$

12.17. Derive the relations $Y_{l,-m}(\theta, \varphi) = (-1)^m Y^*_{lm}(\theta, \varphi)$ and
$$P_l^{-m}(\theta) = (-1)^m\,\frac{(l-m)!}{(l+m)!}\,P_l^m(\theta).$$

12.18. Show that $\sum_{m=-l}^{l}|Y_{lm}(\theta, \varphi)|^2 = (2l+1)/(4\pi)$. Verify this explicitly for $l = 1$ and $l = 2$.

12.19. Show that the addition theorem for spherical harmonics can be written as
$$P_l(\cos\gamma) = P_l(\cos\theta)P_l(\cos\theta') + 2\sum_{m=1}^{l}\frac{(l-m)!}{(l+m)!}\,P_l^m(\cos\theta)\,P_l^m(\cos\theta')\cos[m(\varphi - \varphi')].$$

Additional Reading

1. Morse, P. and Feshbach, M. Methods of Theoretical Physics, McGraw-Hill, 1953. A two-volume classic including a long discussion of the separation of variables in many (sometimes exotic) coordinate systems.

2. The angular momentum eigenvalues and eigenfunctions are discussed in most books on quantum mechanics. See, e.g., Messiah, A. Quantum Mechanics, volume II, Wiley, 1966.
13 Second-Order Linear Differential Equations

The discussion of Chapter 12 has clearly singled out ODEs, especially those of second order, as objects requiring special attention because most common PDEs of mathematical physics can be separated into ODEs (of second order). This is really an oversimplification of the situation. Many PDEs of physics, both at the fundamental theoretical level (as in the general theory of relativity) and from a practical standpoint (weather forecasting), are nonlinear, and the method of the separation of variables does not work. Since no general analytic solutions for such nonlinear systems have been found, we shall confine ourselves to the linear systems, especially those that admit a separated solution.

With the exception of the infinite power series, no systematic method of solving DEs existed during the first half of the nineteenth century. The majority of solutions were completely ad hoc and obtained by trial and error, causing frustration and anxiety among mathematicians. It was to overcome this frustration that Sophus Lie, motivated by the newly developed concept of group, took up the systematic study of DEs in the second half of the nineteenth century. This study not only gave a handle on the disarrayed area of DEs, but also gave birth to one of the most beautiful and fundamental branches of mathematical physics, Lie group theory. We shall come back to a thorough treatment of this theory in Parts VII and VIII.

Our main task in this chapter is to study second-order linear differential equations (SOLDEs). However, to understand SOLDEs, we need some basic understanding of differential equations in general. The next section outlines some essential properties of general DEs. Section 2 is a very brief introduction to first-order DEs, and the remainder of the chapter deals with SOLDEs.
13.1 General Properties of ODEs

The most general ODE can be expressed as
$$F\left(x, y, \frac{dy}{dx}, \frac{d^2y}{dx^2}, \ldots, \frac{d^ny}{dx^n}\right) = 0, \qquad (13.1)$$
in which $F : \mathbb{R}^{n+2} \to \mathbb{R}$ is a real-valued function of $n+2$ real variables. When $F$ depends explicitly and nontrivially on $d^ny/dx^n$, Equation (13.1) is called an nth-order ODE. An ODE is said to be linear if the part of the function $F$ that includes $y$ and all its derivatives is linear in $y$. The most general nth-order linear ODE is
$$p_0(x)y + p_1(x)\frac{dy}{dx} + \cdots + p_n(x)\frac{d^ny}{dx^n} = q(x) \quad \text{for } p_n(x) \neq 0, \qquad (13.2)$$
where $\{p_i\}_{i=0}^n$ and $q$ are functions of the independent variable $x$. Equation (13.2) is said to be homogeneous if $q = 0$; otherwise, it is said to be inhomogeneous and $q(x)$ is called the inhomogeneous term. It is customary, and convenient, to define a linear differential operator $\mathbf{L}$ by¹
$$\mathbf{L} \equiv p_0(x) + p_1(x)\frac{d}{dx} + \cdots + p_n(x)\frac{d^n}{dx^n}, \qquad (13.3)$$
and write Equation (13.2) as
$$\mathbf{L}[y] = q(x). \qquad (13.4)$$
A solution of Equation (13.2) or (13.4) is a single-variable function $f : \mathbb{R} \to \mathbb{R}$ such that $F(x, f(x), f'(x), \ldots, f^{(n)}(x)) = 0$, or $\mathbf{L}[f] = q(x)$, for all $x$ in the domain of definition of $f$.

The solution of a differential equation may not exist if we put too many restrictions on it. For instance, if we demand that $f : \mathbb{R} \to \mathbb{R}$ be differentiable too many times, we may not be able to find a solution, as the following example shows.

13.1.1. Example. The most general solution of $dy/dx = |x|$ that vanishes at $x = 0$ is
$$f(x) = \begin{cases} \tfrac{1}{2}x^2 & \text{if } x \geq 0, \\ -\tfrac{1}{2}x^2 & \text{if } x \leq 0. \end{cases}$$
This function is continuous and has first derivative $f'(x) = |x|$, which is also continuous at $x = 0$. However, if we demand that its second derivative also be continuous at $x = 0$, we cannot find a solution, because
$$f''(x) = \begin{cases} +1 & \text{if } x > 0, \\ -1 & \text{if } x < 0. \end{cases}$$

¹Do not confuse this linear differential operator with the angular momentum (vector) operator $\mathbf{L}$.
If we want $f''(x)$ to exist at $x = 0$, then we have to expand the notion of a function to include distributions, or generalized functions. ■

Overrestricting a solution for a differential equation results in its absence, but underrestricting it allows multiple solutions. To strike a balance between these two extremes, we agree to make a solution as many times differentiable as plausible and to satisfy certain initial conditions. For an nth-order DE such initial conditions are commonly equivalent (but not restricted) to a specification of the function and of its first $n-1$ derivatives. This sort of specification is made feasible by the following theorem.

13.1.2. Theorem. (implicit function theorem) Let $G : \mathbb{R}^{n+1} \to \mathbb{R}$, given by $G(x_1, x_2, \ldots, x_{n+1}) \in \mathbb{R}$, have continuous partial derivatives up to the kth order in some neighborhood of a point $P_0 = (r_1, r_2, \ldots, r_{n+1})$ in $\mathbb{R}^{n+1}$. Let $(\partial G/\partial x_{n+1})|_{P_0} \neq 0$. Then there exists a unique function $F : \mathbb{R}^n \to \mathbb{R}$ that is continuously differentiable $k$ times at (some smaller) neighborhood of $P_0$ such that $x_{n+1} = F(x_1, x_2, \ldots, x_n)$ for all points $P = (x_1, x_2, \ldots, x_{n+1})$ in a neighborhood of $P_0$ and
$$G(x_1, x_2, \ldots, x_n, F(x_1, x_2, \ldots, x_n)) = 0.$$

Theorem 13.1.2 simply asserts that under certain (mild) conditions we can "solve" for one of the independent variables in $G(x_1, x_2, \ldots, x_{n+1}) = 0$ in terms of the others. A proof of this theorem is usually given in advanced calculus books. Application of this theorem to Equation (13.1) leads to
$$\frac{d^ny}{dx^n} = F\left(x, y, \frac{dy}{dx}, \frac{d^2y}{dx^2}, \ldots, \frac{d^{n-1}y}{dx^{n-1}}\right),$$
provided that $G$ satisfies the conditions of the theorem. If we know the solution $y = f(x)$ and its derivatives up to order $n-1$, we can evaluate its nth derivative using this equation. In addition, we can calculate the derivatives of all orders (assuming they exist) by differentiating this equation. This allows us to expand the solution in a Taylor series.
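The piecewise solution of Example 13.1.1 and the jump in its second derivative can be checked symbolically. The sketch below uses Python's sympy; the variable names are ours:

```python
import sympy as sp

x = sp.symbols('x', real=True)
# Example 13.1.1: the solution of dy/dx = |x| that vanishes at x = 0
f = sp.Piecewise((x**2/2, x >= 0), (-x**2/2, True))
fp = sp.diff(f, x)                      # first derivative: equals |x|
fpp = sp.diff(f, x, 2)                  # second derivative: jumps at x = 0
print(fp.subs(x, 3), fp.subs(x, -3))    # both equal 3
print(fpp.subs(x, 1), fpp.subs(x, -1))  # +1 and -1: no continuous f''
```

The two-sided evaluation makes the discontinuity of $f''$ at the origin explicit.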
Thus, for solutions that have derivatives of all orders, knowledge of the value of a solution and its first $n-1$ derivatives at a point $x_0$ determines that solution at a neighboring point $x$.

We shall not study the general ODE of Equation (13.1) or even its simpler linear version (13.2). We will only briefly study ODEs of the first order in the next section, and then concentrate on linear ODEs of the second order for the rest of this chapter.

13.2 Existence and Uniqueness for First-Order DEs

A general first-order DE (FODE) is of the form $G(x, y, y') = 0$. We can find $y'$ (the derivative of $y$) in terms of a function of $x$ and $y$ if the function $G(x_1, x_2, x_3)$
is differentiable with respect to its third argument and $\partial G/\partial x_3 \neq 0$. In that case we have
$$y' \equiv \frac{dy}{dx} = F(x, y), \qquad (13.5)$$
which is said to be a normal FODE. If $F(x, y)$ is a linear function of $y$, then Equation (13.5) becomes a first-order linear DE (FOLDE), which can generally be written as
$$p_1(x)\frac{dy}{dx} + p_0(x)y = q(x). \qquad (13.6)$$
It can be shown that the general FOLDE has an explicit solution (see [Hass99]):

13.2.1. Theorem. Any first-order linear DE of the form $p_1(x)y' + p_0(x)y = q(x)$, in which $p_0$, $p_1$, and $q$ are continuous functions in some interval $(a, b)$, has a general solution
$$y(x) = \frac{1}{\mu(x)p_1(x)}\left[C + \int_{x_1}^{x}\mu(t)q(t)\,dt\right], \qquad (13.7)$$
where $C$ is an arbitrary constant and
$$\mu(x) = \frac{1}{p_1(x)}\exp\left[\int_{x_0}^{x}\frac{p_0(t)}{p_1(t)}\,dt\right], \qquad (13.8)$$
where $x_0$ and $x_1$ are arbitrary points in the interval $(a, b)$.

No such explicit solution exists for nonlinear first-order DEs. Nevertheless, it is reassuring to know that a solution of such a DE always exists, and under some mild conditions, this solution is unique. We summarize some of the ideas involved in the proof of the existence and uniqueness of the solutions to FODEs. (For proofs, see the excellent book by Birkhoff and Rota [Birk78].) We first state an existence theorem due to Peano:

13.2.2. Theorem. (Peano existence theorem) If the function $F(x, y)$ is continuous for the points on and within the rectangle defined by $|y - c| \leq K$ and $|x - a| \leq N$, and if $|F(x, y)| \leq M$ there, then the differential equation $y' = F(x, y)$ has at least one solution, $y = f(x)$, defined for $|x - a| \leq \min(N, K/M)$ and satisfying the initial condition $f(a) = c$.

This theorem guarantees only the existence of solutions. To ensure uniqueness, the function $F$ needs to have some additional properties. An important property is stated in the following definition.

13.2.3. Definition.
A function $F(x, y)$ satisfies a Lipschitz condition in a domain $D \subset \mathbb{R}^2$ if for some finite constant $L$ (the Lipschitz constant), it satisfies the inequality
$$|F(x, y_1) - F(x, y_2)| \leq L|y_1 - y_2|$$
for all points $(x, y_1)$ and $(x, y_2)$ in $D$.
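The closed-form solution of Theorem 13.2.1, Equations (13.7) and (13.8), can be exercised on a concrete FOLDE. The equation $xy' + 2y = x^2$ below is our own illustrative choice, sketched with sympy:

```python
import sympy as sp

x, C = sp.symbols('x C', positive=True)
p1, p0, q = x, sp.Integer(2), x**2        # the FOLDE x*y' + 2*y = x**2
# Eq. (13.8): mu = (1/p1) exp(int p0/p1 dx); integration constants dropped
mu = sp.exp(sp.integrate(p0/p1, x))/p1    # = x
# Eq. (13.7): y = [C + int mu*q dx] / (mu*p1)
y = (C + sp.integrate(mu*q, x))/(mu*p1)   # = C/x**2 + x**2/4
residual = sp.simplify(p1*sp.diff(y, x) + p0*y - q)
print(residual)                           # 0: the formula solves the DE
```

The residual vanishes identically in the arbitrary constant $C$, confirming that (13.7) is the general solution.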
13.2.4. Theorem. (uniqueness theorem) Let $f(x)$ and $g(x)$ be any two solutions of the FODE $y' = F(x, y)$ in a domain $D$, where $F$ satisfies a Lipschitz condition with Lipschitz constant $L$. Then
$$|f(x) - g(x)| \leq e^{L|x-a|}\,|f(a) - g(a)|.$$
In particular, the FODE has at most one solution curve passing through the point $(a, c) \in D$.

The final conclusion of this theorem is an easy consequence of the assumed Lipschitz condition on $F$ and the requirement $f(a) = g(a) = c$. The theorem says that if there is a solution $y = f(x)$ to the DE $y' = F(x, y)$ satisfying $f(a) = c$, then it is the solution.

The requirements of the Peano existence theorem are too broad to yield solutions that have some nice properties. For instance, the interval of definition of the solutions may depend on their initial values. The following example illustrates this point.

13.2.5. Example. Consider the DE $dy/dx = e^y$. The general solution of this DE can be obtained by direct integration:
$$e^{-y}\,dy = dx \implies -e^{-y} = x + C.$$
If $y = b$ when $x = 0$, then $C = -e^{-b}$, and
$$e^{-y} = -x + e^{-b} \implies y = -\ln(e^{-b} - x).$$
Thus, the solution is defined for $-\infty < x < e^{-b}$; i.e., the interval of definition of a solution changes with its initial value. ■

To avoid situations illustrated in the example above, one demands not just the continuity of $F$ (as does the Peano existence theorem) but a Lipschitz condition for it. Then one ensures not only the existence, but also the uniqueness:

13.2.6. Theorem. (local existence and uniqueness theorem) Suppose that the function $F(x, y)$ is defined and continuous in the rectangle $|y - c| \leq K$, $|x - a| \leq N$ and satisfies a Lipschitz condition there. Let $M = \max|F(x, y)|$ in this rectangle. Then the differential equation $y' = F(x, y)$ has a unique solution $y = f(x)$ satisfying $f(a) = c$ and defined on the interval $|x - a| \leq \min(N, K/M)$.
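The blow-up in Example 13.2.5 is easy to confirm with sympy: the claimed solution satisfies the DE and the initial condition, and diverges as $x \to e^{-b}$ from the left.

```python
import sympy as sp

x, b = sp.symbols('x b', real=True)
y = -sp.log(sp.exp(-b) - x)             # solution of y' = e**y with y(0) = b
assert sp.simplify(sp.diff(y, x) - sp.exp(y)) == 0   # satisfies the DE
assert sp.simplify(y.subs(x, 0) - b) == 0            # satisfies y(0) = b
# the solution exists only for x < exp(-b): it diverges at that endpoint
print(sp.limit(y, x, sp.exp(-b), '-'))               # oo
```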
13.3 General Properties of SOLDEs

The most general SOLDE is
$$p_2(x)\frac{d^2y}{dx^2} + p_1(x)\frac{dy}{dx} + p_0(x)y = p_3(x). \qquad (13.9)$$
Dividing by $p_2(x)$ and writing $p$ for $p_1/p_2$, $q$ for $p_0/p_2$, and $r$ for $p_3/p_2$ reduces this to the normal form
$$\frac{d^2y}{dx^2} + p(x)\frac{dy}{dx} + q(x)y = r(x). \qquad (13.10)$$
Equation (13.10) is equivalent to (13.9) if $p_2(x) \neq 0$. The points at which $p_2(x)$ vanishes are called the singular points of the differential equation.

There is a crucial difference between the singular points of linear differential equations and those of nonlinear differential equations. For a nonlinear differential equation such as $(x^2 - y)y' = x^2 + y^2$, the curve $y = x^2$ is the collection of singular points. This makes it impossible to construct solutions $y = f(x)$ that are defined on an interval $I = [a, b]$ of the x-axis, because for any $x \in I$, there is a $y$ for which the differential equation is undefined. Linear differential equations do not have this problem, because the coefficients of the derivatives are functions of $x$ only. Therefore, all the singular "curves" are vertical. Thus, we have the following definition.

13.3.1. Definition. The normal form of a SOLDE, Equation (13.10), is regular on an interval $[a, b]$ of the x-axis if $p(x)$, $q(x)$, and $r(x)$ are continuous on $[a, b]$. A solution of a normal SOLDE is a twice-differentiable function $y = f(x)$ that satisfies the SOLDE at every point of $[a, b]$.

It is clear that any function that satisfies Equation (13.10), or Equation (13.9), must necessarily be twice differentiable, and that is all that is demanded of the solutions. Any higher-order differentiability requirement may be too restrictive, as was pointed out in Example 13.1.1. Most solutions to a normal SOLDE, however, automatically have derivatives of order higher than two.

We write Equation (13.9) in the operator form as
$$\mathbf{L}[y] = p_3, \quad \text{where } \mathbf{L} \equiv p_2\frac{d^2}{dx^2} + p_1\frac{d}{dx} + p_0. \qquad (13.11)$$
It is clear that $\mathbf{L}$ is a linear operator because $d/dx$ is linear, as are all powers of it. Thus, for constants $\alpha$ and $\beta$,
$$\mathbf{L}[\alpha y_1 + \beta y_2] = \alpha\mathbf{L}[y_1] + \beta\mathbf{L}[y_2].$$
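The linearity claim can be verified once and for all with symbolic functions. A sympy sketch, with $\mathbf{L}$ taken in normal form:

```python
import sympy as sp

x, alpha, beta = sp.symbols('x alpha beta')
y1, y2 = sp.Function('y1')(x), sp.Function('y2')(x)
p, q = sp.Function('p')(x), sp.Function('q')(x)

def L(y):
    # normal-form second-order operator: y'' + p y' + q y
    return sp.diff(y, x, 2) + p*sp.diff(y, x) + q*y

residual = sp.expand(L(alpha*y1 + beta*y2) - alpha*L(y1) - beta*L(y2))
print(residual)   # 0: L[a y1 + b y2] = a L[y1] + b L[y2]
```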
In particular, if $y_1$ and $y_2$ are two solutions of Equation (13.11), then $\mathbf{L}[y_1 - y_2] = 0$. That is, the difference between any two solutions of a SOLDE is a solution of the homogeneous equation obtained by setting $p_3 = 0$.²

An immediate consequence of the linearity of $\mathbf{L}$ is the following:

13.3.2. Lemma. If $\mathbf{L}[u] = r(x)$, $\mathbf{L}[v] = s(x)$, $\alpha$ and $\beta$ are constants, and $w = \alpha u + \beta v$, then $\mathbf{L}[w] = \alpha r(x) + \beta s(x)$.

The proof of this lemma is trivial, but the result describes the fundamental property of linear operators: When $r = s = 0$, that is, in dealing with homogeneous

²This conclusion is, of course, not limited to the SOLDE; it holds for all linear DEs.
equations, the lemma says that any linear combination of solutions of the homogeneous SOLDE (HSOLDE) is also a solution. This is called the superposition principle.

Based on physical intuition, we expect to be able to predict the behavior of a physical system if we know the differential equation obeyed by that system, and, equally importantly, the initial data. A prediction is not a prediction unless it is unique.³ This expectation for linear equations is borne out in the language of mathematics in the form of an existence theorem and a uniqueness theorem. We consider the latter next. But first, we need a lemma.

13.3.3. Lemma. The only solution $g(x)$ of the homogeneous equation $y'' + py' + qy = 0$ defined on the interval $[a, b]$ that satisfies $g(a) = 0 = g'(a)$ is the trivial solution $g = 0$.

Proof. Introduce the nonnegative function $u(x) \equiv [g(x)]^2 + [g'(x)]^2$ and differentiate it to get
$$u'(x) = 2gg' + 2g'g'' = 2g'(g + g'') = 2g'(g - pg' - qg) = -2p(g')^2 + 2(1-q)gg'.$$
Since $(g \pm g')^2 \geq 0$, it follows that $2|gg'| \leq g^2 + g'^2$. Thus,
$$2(1-q)gg' \leq 2|(1-q)gg'| = 2|1-q|\,|gg'| \leq |1-q|(g^2 + g'^2) \leq (1 + |q|)(g^2 + g'^2),$$
and therefore,
$$u'(x) \leq |u'(x)| = |-2pg'^2 + 2(1-q)gg'| \leq 2|p|g'^2 + (1 + |q|)(g^2 + g'^2) = [1 + |q(x)|]g^2 + [1 + |q(x)| + 2|p(x)|]g'^2.$$
Now let $K = 1 + \max[|q(x)| + 2|p(x)|]$, where the maximum is taken over $[a, b]$. Then we obtain
$$u'(x) \leq K(g^2 + g'^2) = Ku(x) \quad \forall x \in [a, b].$$
Using the result of Problem 13.1 yields $u(x) \leq u(a)e^{K(x-a)}$ for all $x \in [a, b]$. This equation, plus $u(a) = 0$, as well as the fact that $u(x) \geq 0$, imply that $u(x) = g^2(x) + g'^2(x) = 0$. It follows that $g(x) = 0 = g'(x)$ for all $x \in [a, b]$. □

13.3.4. Theorem. (uniqueness theorem) If $p$ and $q$ are continuous on $[a, b]$, then

³Physical intuition also tells us that if the initial conditions are changed by an infinitesimal amount, then the solutions will be changed infinitesimally.
Thus, the solutions of linear differential equations are said to be continuous functions of the initial conditions. Nonlinear differential equations can have completely different solutions for two initial conditions that are infinitesimally close. Since initial conditions cannot be specified with mathematical precision in practice, nonlinear differential equations lead to unpredictable solutions, or chaos. This subject has received much attention in recent years. For an elementary discussion of chaos see [Hass99, Chapter 15].
at most one solution $y = f(x)$ of Equation (13.10) can satisfy the initial conditions $f(a) = c_1$ and $f'(a) = c_2$, where $c_1$ and $c_2$ are arbitrary constants.

Proof. Let $f_1$ and $f_2$ be two solutions satisfying the given initial conditions. Then their difference, $g \equiv f_1 - f_2$, satisfies the homogeneous equation [with $r(x) = 0$]. The initial conditions that $g(x)$ satisfies are clearly $g(a) = 0 = g'(a)$. By Lemma 13.3.3, $g = 0$, or $f_1 = f_2$. □

Theorem 13.3.4 can be applied to any homogeneous SOLDE to find the latter's most general solution. In particular, let $f_1(x)$ and $f_2(x)$ be any two solutions of
$$y'' + p(x)y' + q(x)y = 0 \qquad (13.12)$$
defined on the interval $[a, b]$. Assume that the two vectors $\mathbf{v}_1 = (f_1(a), f_1'(a))$ and $\mathbf{v}_2 = (f_2(a), f_2'(a))$ in $\mathbb{R}^2$ are linearly independent.⁴ Let $g(x)$ be another solution. The vector $(g(a), g'(a))$ can be written as a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$, giving the two equations
$$g(a) = c_1f_1(a) + c_2f_2(a), \qquad g'(a) = c_1f_1'(a) + c_2f_2'(a).$$
Now consider the function $u(x) \equiv g(x) - c_1f_1(x) - c_2f_2(x)$, which satisfies Equation (13.12) and the initial conditions $u(a) = u'(a) = 0$. By Lemma 13.3.3, we must have $u(x) = 0$, or $g(x) = c_1f_1(x) + c_2f_2(x)$. We have proved the following:

13.3.5. Theorem. Let $f_1$ and $f_2$ be two solutions of the HSOLDE
$$y'' + py' + qy = 0,$$
where $p$ and $q$ are continuous functions defined on the interval $[a, b]$. If $(f_1(a), f_1'(a))$ and $(f_2(a), f_2'(a))$ are linearly independent vectors in $\mathbb{R}^2$, then every solution $g(x)$ of this HSOLDE is equal to some linear combination $g(x) = c_1f_1(x) + c_2f_2(x)$ of $f_1$ and $f_2$ with constant coefficients $c_1$ and $c_2$.

13.4 The Wronskian

The two solutions $f_1(x)$ and $f_2(x)$ in Theorem 13.3.5 have the property that any other solution $g(x)$ can be expressed as a linear combination of them. We call $f_1$ and $f_2$ a basis of solutions of the HSOLDE. To form a basis of solutions, $f_1$

⁴If they are not, then one must choose a different initial point for the interval.
and $f_2$ must be linearly independent. The linear dependence or independence of a number of functions $\{f_i\}_{i=1}^n : [a, b] \to \mathbb{R}$ is a concept that must hold for all $x \in [a, b]$. Thus, if $\{\alpha_i\}_{i=1}^n \in \mathbb{R}$ can be found such that
$$\alpha_1f_1(x_0) + \alpha_2f_2(x_0) + \cdots + \alpha_nf_n(x_0) = 0$$
for some $x_0 \in [a, b]$, it does not mean that the $f$'s are linearly dependent. Linear dependence requires that the equality hold for all $x \in [a, b]$. In fact, we must write
$$\alpha_1f_1 + \alpha_2f_2 + \cdots + \alpha_nf_n = 0,$$
where $0$ is the zero function.

13.4.1. Definition. The Wronskian of any two differentiable functions $f_1(x)$ and $f_2(x)$ is
$$W(f_1, f_2; x) = f_1(x)f_2'(x) - f_2(x)f_1'(x) = \det\begin{pmatrix} f_1(x) & f_1'(x) \\ f_2(x) & f_2'(x) \end{pmatrix}.$$

13.4.2. Proposition. The Wronskian of any two solutions of Equation (13.12) satisfies
$$W(f_1, f_2; x) = W(f_1, f_2; c)\exp\left[-\int_c^x p(t)\,dt\right],$$
where $c$ is any number in the interval $[a, b]$.

Proof. Differentiating both sides of the definition of the Wronskian and substituting from Equation (13.12) yields a FOLDE for $W(f_1, f_2; x)$, which can be easily solved. The details are left as a problem. □

An important consequence of Proposition 13.4.2 is that the Wronskian of any two solutions of Equation (13.12) does not change sign in $[a, b]$. In particular, if the Wronskian vanishes at one point in $[a, b]$, it vanishes at all points in $[a, b]$.

The real importance of the Wronskian is contained in the following theorem.

13.4.3. Theorem. Two differentiable functions $f_1$ and $f_2$, which are nonzero in the interval $[a, b]$, are linearly dependent if and only if their Wronskian vanishes.

Proof. If $f_1$ and $f_2$ are linearly dependent, then one is a multiple of the other, and the Wronskian is readily seen to vanish. Conversely, assume that the Wronskian is zero. Then
$$f_1(x)f_2'(x) - f_2(x)f_1'(x) = 0 \implies f_1\,df_2 = f_2\,df_1 \implies \frac{df_2}{f_2} = \frac{df_1}{f_1} \implies f_2 = cf_1,$$
and the two functions are linearly dependent. □
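Proposition 13.4.2 (Abel's formula) can be checked on a concrete equation. The pair $e^x$, $e^{-2x}$, which solves $y'' + y' - 2y = 0$, is our own illustrative choice, sketched with sympy:

```python
import sympy as sp

x, c = sp.symbols('x c', real=True)
# y'' + p y' + q y = 0 with p = 1, q = -2 has solutions e**x and e**(-2x)
p = sp.Integer(1)
f1, f2 = sp.exp(x), sp.exp(-2*x)
W = f1*sp.diff(f2, x) - f2*sp.diff(f1, x)            # Wronskian = -3 e**(-x)
abel = W.subs(x, c)*sp.exp(-sp.integrate(p, (x, c, x)))
print(sp.simplify(W - abel))                          # 0
```

Note that $W$ never vanishes here, consistent with the remark that the Wronskian of two independent solutions keeps a fixed sign.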
Josef Hoene de Wronski (1778-1853) was born Josef Hoene, but he adopted the name Wronski around 1810, just after he married. He had moved to France and become a French citizen in 1800, and moved to Paris in 1810, the same year he published his first memoir on the foundations of mathematics, which received less than favorable reviews from Lacroix and Lagrange. His other interests included the design of caterpillar vehicles to compete with the railways. However, they were never manufactured. Wronski was interested mainly in applying philosophy to mathematics, the philosophy taking precedence over rigorous mathematical proofs. He criticised Lagrange's use of infinite series and introduced his own ideas for series expansions of a function. The coefficients in this series are determinants now known as Wronskians [so named by Thomas Muir (1844-1934), a Glasgow High School science master who became an authority on determinants by devoting most of his life to writing a five-volume treatise on the history of determinants]. For many years Wronski's work was dismissed as rubbish. However, a closer examination of the work in more recent times shows that although some is wrong and he has an incredibly high opinion of himself and his ideas, there are also some mathematical insights of great depth and brilliance hidden within the papers.

13.4.4. Example. Let $f_1(x) = x$ and $f_2(x) = |x|$ for $x \in [-1, 1]$. These two functions are linearly independent in the given interval, because $\alpha_1 x + \alpha_2|x| = 0$ for all $x$ if and only if $\alpha_1 = \alpha_2 = 0$. The Wronskian, on the other hand, vanishes for all $x \in [-1, +1]$:
$$W(f_1, f_2; x) = x\frac{d|x|}{dx} - |x|\frac{dx}{dx} = x\frac{d|x|}{dx} - |x| = \begin{cases} x - x = 0 & \text{if } x > 0, \\ -x - (-x) = 0 & \text{if } x < 0. \end{cases}$$
Thus, it is possible for two functions to have a vanishing Wronskian without being linearly dependent. However, as we showed in the proof of the theorem above, if the functions are nonzero and differentiable in their interval of definition, then they are linearly dependent if their Wronskian vanishes. ■
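Example 13.4.4's claim, a vanishing Wronskian for the independent pair $x$ and $|x|$, can be confirmed pointwise with sympy:

```python
import sympy as sp

x = sp.symbols('x', real=True)
f1, f2 = x, sp.Abs(x)
W = f1*sp.diff(f2, x) - f2*sp.diff(f1, x)   # = x*sign(x) - |x|
# W vanishes on both sides of the origin, yet x and |x| are independent
print(sp.simplify(W.subs(x, 2)), sp.simplify(W.subs(x, -3)))   # 0 0
```

Theorem 13.4.3 is not contradicted: both functions vanish at $x = 0$, so its hypothesis fails on $[-1, 1]$.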
13.4.5. Example. The Wronskian can be generalized to $n$ functions. The Wronskian of the functions $f_1, f_2, \ldots, f_n$ is
$$W(f_1, f_2, \ldots, f_n; x) = \det\begin{pmatrix} f_1(x) & f_1'(x) & \cdots & f_1^{(n-1)}(x) \\ f_2(x) & f_2'(x) & \cdots & f_2^{(n-1)}(x) \\ \vdots & \vdots & & \vdots \\ f_n(x) & f_n'(x) & \cdots & f_n^{(n-1)}(x) \end{pmatrix}.$$
If the functions are linearly dependent, then $W(f_1, f_2, \ldots, f_n; x) = 0$. For instance, it is clear that $e^x$, $e^{-x}$, and $\sinh x$ are linearly dependent. Thus, we expect
$$W(e^x, e^{-x}, \sinh x; x) = \det\begin{pmatrix} e^x & e^x & e^x \\ e^{-x} & -e^{-x} & e^{-x} \\ \sinh x & \cosh x & \sinh x \end{pmatrix}$$
to vanish, as is easily seen (the first and last columns are the same). ■

13.4.1 A Second Solution to the HSOLDE

If we know one solution to Equation (13.12), say $f_1$, then by differentiating both sides of
$$f_1(x)f_2'(x) - f_2(x)f_1'(x) = W(x) = W(c)e^{-\int_c^x p(t)\,dt},$$
dividing the result by $f_1^2$, and noting that the LHS will be the derivative of $f_2/f_1$, we can solve for $f_2$ in terms of $f_1$. The result is
$$f_2(x) = f_1(x)\left\{C + K\int_a^x \frac{1}{f_1^2(s)}\exp\left[-\int_c^s p(t)\,dt\right]ds\right\}, \qquad (13.13)$$
where $K \equiv W(c)$ is another arbitrary (nonzero) constant; we do not have to know $W(x)$ (this would require knowledge of $f_2$, which we are trying to calculate!) to obtain $W(c)$. In fact, the reader is urged to check directly that $f_2(x)$ satisfies the DE of (13.12) for arbitrary $C$ and $K$. Whenever possible, and convenient, it is customary to set $C = 0$, because its presence simply gives a term that is proportional to the known solution $f_1(x)$.

13.4.6. Example. (a) A solution to the SOLDE $y'' - k^2y = 0$ is $e^{kx}$. To find a second solution, we let $C = 0$ and $K = 1$ in Equation (13.13). Since $p(x) = 0$, we have
$$f_2(x) = e^{kx}\left(0 + \int_a^x \frac{ds}{e^{2ks}}\right) = -\frac{1}{2k}e^{-kx} + \frac{e^{-2ka}}{2k}e^{kx},$$
which, ignoring the second term (which is proportional to the first solution), leads directly to the choice of $e^{-kx}$ as a second solution.
(b) The differential equation $y'' + k^2y = 0$ has $\sin kx$ as a solution. With $C = 0$, $a = \pi/(2k)$, and $K = 1$, we get
$$f_2(x) = \sin kx\left(0 + \int_{\pi/2k}^x \frac{ds}{\sin^2 ks}\right) = -\sin kx\,\frac{\cot ks}{k}\bigg|_{\pi/2k}^x = -\frac{\cos kx}{k},$$
which, up to the irrelevant constant factor $-1/k$, gives $\cos kx$ as a second solution.
(c) For the solutions in part (a),
$$W(x) = \det\begin{pmatrix} e^{kx} & ke^{kx} \\ e^{-kx} & -ke^{-kx} \end{pmatrix} = -2k,$$
and for those in part (b),
$$W(x) = \det\begin{pmatrix} \sin kx & k\cos kx \\ \cos kx & -k\sin kx \end{pmatrix} = -k.$$
Both Wronskians are constant. In general, the Wronskian of any two linearly independent solutions of $y'' + q(x)y = 0$ is constant. ■

Most special functions used in mathematical physics are solutions of SOLDEs. The behavior of these functions at certain special points is determined by the physics of the particular problem. In most situations physical expectation leads to a preference for one particular solution over the other. For example, although there are two linearly independent solutions to the Legendre DE
$$\frac{d}{dx}\left[(1 - x^2)\frac{dy}{dx}\right] + n(n+1)y = 0,$$
the solution that is most frequently encountered is the Legendre polynomial $P_n(x)$ discussed in Chapter 7. The other solution can be obtained by solving the Legendre equation or by using Equation (13.13), as done in the following example.

13.4.7. Example. The Legendre equation can be reexpressed as
$$\frac{d^2y}{dx^2} - \frac{2x}{1-x^2}\frac{dy}{dx} + \frac{n(n+1)}{1-x^2}y = 0.$$
This is an HSOLDE with
$$p(x) = -\frac{2x}{1-x^2} \quad\text{and}\quad q(x) = \frac{n(n+1)}{1-x^2}.$$
One solution of this HSOLDE is the well-known Legendre polynomial $P_n(x)$. Using this as our input and employing Equation (13.13), we can generate another set of solutions. Let $Q_n(x)$ stand for the linearly independent "partner" of $P_n(x)$. Then, setting $C = 0 = c$ in Equation (13.13) yields
$$Q_n(x) = KP_n(x)\int_a^x \frac{1}{P_n^2(s)}\exp\left[\int_0^s \frac{2t}{1-t^2}\,dt\right]ds = A_nP_n(x)\int_a^x \frac{ds}{(1-s^2)P_n^2(s)},$$
where $A_n$ is an arbitrary constant determined by standardization, and $a$ is an arbitrary point in the interval $[-1, +1]$. For instance, for $n = 0$, we have $P_0 = 1$, and we obtain
$$Q_0(x) = A_0\int_a^x \frac{ds}{1-s^2} = A_0\left[\frac{1}{2}\ln\left|\frac{1+x}{1-x}\right| - \frac{1}{2}\ln\left|\frac{1+a}{1-a}\right|\right].$$
The standard form of $Q_0(x)$ is obtained by setting $A_0 = 1$ and $a = 0$:
$$Q_0(x) = \frac{1}{2}\ln\left|\frac{1+x}{1-x}\right| \quad\text{for } |x| < 1.$$
Similarly, since $P_1(x) = x$,
$$Q_1(x) = A_1x\int_a^x \frac{ds}{s^2(1-s^2)} = Ax + Bx\ln\left|\frac{1+x}{1-x}\right| + C \quad\text{for } |x| < 1.$$
Here standardization gives $A = 0$, $B = \frac{1}{2}$, and $C = -1$. Thus,
$$Q_1(x) = \frac{x}{2}\ln\left|\frac{1+x}{1-x}\right| - 1. \qquad\blacksquare$$
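The closed forms for $Q_0$ and $Q_1$ obtained in Example 13.4.7 can be pushed back through the Legendre equation as a consistency check (a sympy sketch):

```python
import sympy as sp

x = sp.symbols('x')
Q0 = sp.log((1 + x)/(1 - x))/2          # standard Q_0 for |x| < 1
Q1 = x*sp.log((1 + x)/(1 - x))/2 - 1    # standard Q_1 for |x| < 1

def legendre(y, n):
    # the Legendre operator d/dx[(1 - x^2) y'] + n(n+1) y
    return sp.diff((1 - x**2)*sp.diff(y, x), x) + n*(n + 1)*y

print(sp.simplify(legendre(Q0, 0)), sp.simplify(legendre(Q1, 1)))   # 0 0
```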
13.4.2 The General Solution to an ISOLDE

Inhomogeneous SOLDEs (ISOLDEs) can be most elegantly discussed in terms of Green's functions, the subject of Chapter 20, which automatically incorporate the boundary conditions. However, the most general solution of an ISOLDE, with no boundary specification, can be discussed at this point.

Let $g(x)$ be a particular solution of
$$\mathbf{L}[y] = y'' + py' + qy = r(x) \qquad (13.14)$$
and let $h(x)$ be any other solution of this equation. Then $h(x) - g(x)$ satisfies Equation (13.12) and can be written as a linear combination of a basis of solutions $f_1(x)$ and $f_2(x)$, leading to the following equation:
$$h(x) = c_1f_1(x) + c_2f_2(x) + g(x). \qquad (13.15)$$
Thus, if we have a particular solution of the ISOLDE of Equation (13.14) and two basis solutions of the HSOLDE, then the most general solution of (13.14) can be expressed as the sum of a linear combination of the two basis solutions and the particular solution.

We know how to find a second solution to the HSOLDE once we know one solution. We now show that knowing one such solution will also allow us to find a particular solution to the ISOLDE. The method we use is called the method of variation of constants. This method can also be used to find a second solution to the HSOLDE.

Let $f_1$ and $f_2$ be the two (known) solutions of the HSOLDE and $g(x)$ the sought-after solution to Equation (13.14). Write $g$ as $g(x) = f_1(x)v(x)$ and substitute it in (13.14) to get a SOLDE for $v(x)$:
$$v'' + \left(p + \frac{2f_1'}{f_1}\right)v' = \frac{r}{f_1}.$$
This is a first-order linear DE in $v'$, which has a solution of the form
$$v' = \frac{W(x)}{f_1^2(x)}\left[C + \int_a^x \frac{f_1(t)r(t)}{W(t)}\,dt\right],$$
where $W(x)$ is the (known) Wronskian of Equation (13.14). Substituting
$$W(x) = f_1(x)f_2'(x) - f_2(x)f_1'(x) = f_1^2(x)\frac{d}{dx}\left(\frac{f_2}{f_1}\right)$$
in the above expression for $v'$ and setting $C = 0$ (we are interested in a particular solution), we get
$$\frac{dv}{dx} = \frac{d}{dx}\left(\frac{f_2}{f_1}\right)\int_a^x \frac{f_1(t)r(t)}{W(t)}\,dt = \frac{d}{dx}\left[\frac{f_2(x)}{f_1(x)}\int_a^x \frac{f_1(t)r(t)}{W(t)}\,dt\right] - \frac{f_2(x)}{f_1(x)}\,\frac{d}{dx}\int_a^x \frac{f_1(t)r(t)}{W(t)}\,dt,$$
where the derivative in the last term is simply $f_1(x)r(x)/W(x)$.
Integration then gives
$$v(x) = \frac{f_2(x)}{f_1(x)}\int_a^x \frac{f_1(t)r(t)}{W(t)}\,dt - \int_a^x \frac{f_2(t)r(t)}{W(t)}\,dt.$$
This leads to the particular solution
$$g(x) = f_1(x)v(x) = f_2(x)\int_a^x \frac{f_1(t)r(t)}{W(t)}\,dt - f_1(x)\int_a^x \frac{f_2(t)r(t)}{W(t)}\,dt. \qquad (13.16)$$
We have just proved the following result.

13.4.8. Proposition. Given a single solution $f_1(x)$ of the homogeneous equation corresponding to an ISOLDE, one can use Equation (13.13) to find a second solution $f_2(x)$ of the homogeneous equation and Equation (13.16) to find a particular solution $g(x)$. The most general solution $h$ will then be
$$h(x) = c_1f_1(x) + c_2f_2(x) + g(x).$$

13.4.3 Separation and Comparison Theorems

The Wronskian can be used to derive some properties of the graphs of solutions of HSOLDEs. One such property concerns the relative position of the zeros of two linearly independent solutions of an HSOLDE.

13.4.9. Theorem. (the separation theorem) The zeros of two linearly independent solutions of an HSOLDE occur alternately.

Proof. Let $f_1(x)$ and $f_2(x)$ be two independent solutions of Equation (13.12). We have to show that a zero of $f_1$ exists between any two zeros of $f_2$. The linear independence of $f_1$ and $f_2$ implies that $W(f_1, f_2; x) \neq 0$ for any $x \in [a, b]$. Let $x_i \in [a, b]$ be a zero of $f_2$. Then
$$0 \neq W(f_1, f_2; x_i) = f_1(x_i)f_2'(x_i) - f_2(x_i)f_1'(x_i) = f_1(x_i)f_2'(x_i).$$
Thus, $f_1(x_i) \neq 0$ and $f_2'(x_i) \neq 0$. Suppose that $x_1$ and $x_2$, where $x_2 > x_1$, are two successive zeros of $f_2$. Since $f_2$ is continuous in $[a, b]$ and $f_2'(x_1) \neq 0$, $f_2$ has to be either increasing [$f_2'(x_1) > 0$] or decreasing [$f_2'(x_1) < 0$] at $x_1$. For $f_2$ to be zero at $x_2$, the next point, $f_2'(x_2)$ must have the opposite sign from $f_2'(x_1)$ (see Figure 13.1). We proved earlier that the sign of the Wronskian does not change in $[a, b]$ (see Proposition 13.4.2 and comments after it). The above equation then says that $f_1(x_1)$ and $f_1(x_2)$ also have opposite signs. The continuity of $f_1$ then implies that $f_1$ must cross the x-axis somewhere between $x_1$ and $x_2$.
A similar argument shows that there exists one zero of $f_2$ between any two zeros of $f_1$. □
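The particular-solution formula (13.16) derived above can be tested on the simple ISOLDE $y'' + y = x$, with the basis $\sin x$, $\cos x$ of the homogeneous equation. The lower limit $a = 0$ is an arbitrary choice in this sympy sketch:

```python
import sympy as sp

x, t = sp.symbols('x t')
r = x                                     # inhomogeneous term of y'' + y = x
f1, f2 = sp.sin(x), sp.cos(x)             # basis of the homogeneous equation
W = sp.simplify(f1*sp.diff(f2, x) - f2*sp.diff(f1, x))   # = -1
# Eq. (13.16) with a = 0
g = (f2*sp.integrate((f1*r/W).subs(x, t), (t, 0, x))
     - f1*sp.integrate((f2*r/W).subs(x, t), (t, 0, x)))
print(sp.simplify(sp.diff(g, x, 2) + g - r))             # 0
```

Here $g(x) = x - \sin x$; the $-\sin x$ piece is itself a homogeneous solution and could be absorbed into $c_1f_1 + c_2f_2$ in (13.15).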
Figure 13.1. If $f_2'(x_1) > 0 > f_2'(x_2)$, then (assuming that the Wronskian is positive) $f_1(x_1) > 0 > f_1(x_2)$.

13.4.10. Example. Two linearly independent solutions of $y'' + y = 0$ are $\sin x$ and $\cos x$. The separation theorem suggests that the zeros of $\sin x$ and $\cos x$ must alternate, a fact known from elementary trigonometry: The zeros of $\cos x$ occur at odd multiples of $\pi/2$, and those of $\sin x$ occur at even multiples of $\pi/2$. ■

A second useful result is known as the comparison theorem (for a proof, see [Birk78, p. 38]).

13.4.11. Theorem. (the comparison theorem) Let $f$ and $g$ be nontrivial solutions of $u'' + p(x)u = 0$ and $v'' + q(x)v = 0$, respectively, where $p(x) \geq q(x)$ for all $x \in [a, b]$. Then $f$ vanishes at least once between any two zeros of $g$, unless $p = q$ and $f$ is a constant multiple of $g$.

The form of the differential equations used in the comparison theorem is not restrictive, because any HSOLDE can be cast in this form, as the following example shows.

13.4.12. Example. We show that $y'' + p(x)y' + q(x)y = 0$ can be cast in the form $u'' + S(x)u = 0$ by an appropriate functional transformation. Define $w(x)$ by $y = wu$, and substitute in the HSOLDE to obtain
$$(u'w + w'u)' + p(u'w + w'u) + quw = 0,$$
or
$$wu'' + (2w' + pw)u' + (qw + pw' + w'')u = 0. \qquad (13.17)$$
If we demand that the coefficient of $u'$ be zero, we obtain the DE $2w' + pw = 0$, whose solution is
$$w(x) = c\exp\left[-\frac{1}{2}\int^x p(t)\,dt\right].$$
Dividing (13.17) by this $w$ and substituting for $w$ yields $u'' + S(x)u = 0$, where
$$S(x) = q + p\frac{w'}{w} + \frac{w''}{w} = q - \frac{1}{4}p^2 - \frac{1}{2}p'. \qquad\blacksquare$$

A useful special case of the comparison theorem is given as the following corollary, whose straightforward but instructive proof is left as a problem.

13.4.13. Corollary. If $q(x) \leq 0$ for all $x \in [a, b]$, then no nontrivial solution of the differential equation $v'' + q(x)v = 0$ can have more than one zero.

13.4.14. Example. It should be clear from the preceding discussion that the oscillations of the solutions of $v'' + q(x)v = 0$ are mostly determined by the sign and magnitude of $q(x)$. For $q(x) \leq 0$ there is no oscillation; that is, there is no solution that changes sign more than once. Now suppose that $q(x) \geq k^2 > 0$ for some real $k$. Then, by Theorem 13.4.11, any solution of $v'' + q(x)v = 0$ must have at least one zero between any two successive zeros of the solution $\sin kx$ of $u'' + k^2u = 0$. This means that any solution of $v'' + q(x)v = 0$ has a zero in any interval of length $\pi/k$ if $q(x) \geq k^2 > 0$.

Let us apply this to the Bessel DE,
$$y'' + \frac{1}{x}y' + \left(1 - \frac{n^2}{x^2}\right)y = 0.$$
We can eliminate the $y'$ term by substituting $v/\sqrt{x}$ for $y$.⁵ This transforms the Bessel DE into
$$v'' + \left(1 - \frac{4n^2 - 1}{4x^2}\right)v = 0.$$
We compare this, for $n = 0$, with $u'' + u = 0$, which has a solution $u = \sin x$, and conclude that each interval of length $\pi$ of the positive x-axis contains at least one zero of any solution of order zero ($n = 0$) of the Bessel equation. Thus, in particular, the zeroth Bessel function, denoted by $J_0(x)$, has a zero in each interval of length $\pi$ of the x-axis. On the other hand, for $4n^2 - 1 > 0$, or $n > \frac{1}{2}$, we have $1 > 1 - (4n^2 - 1)/(4x^2)$. This implies that $\sin x$ has at least one zero between any two successive zeros of the Bessel functions of order greater than $\frac{1}{2}$.
It follows that such a Bessel function can have at most one zero between any two successive zeros of $\sin x$ (or in each interval of length $\pi$ on the positive x-axis). ■

13.4.15. Example. Let us apply Corollary 13.4.13 to $v'' - v = 0$, in which $q(x) = -1 < 0$. According to the corollary, the most general solution, $c_1e^x + c_2e^{-x}$, can have at most one zero. Indeed,
$$c_1e^x + c_2e^{-x} = 0 \implies x = \frac{1}{2}\ln\left(-\frac{c_2}{c_1}\right),$$
and this (real) $x$ (if it exists) is the only possible solution, as predicted by the corollary. ■

⁵Because of the square root in the denominator, the range of $x$ will have to be restricted to positive values.
13.5 Adjoint Differential Operators

We discussed adjoint operators in detail in the context of finite-dimensional vector spaces in Chapter 2. In particular, the importance of self-adjoint, or hermitian, operators was clearly spelled out by the spectral decomposition theorem of Chapter 4. A consequence of that theorem is the completeness of the eigenvectors of a hermitian operator, the fact that an arbitrary vector can be expressed as a linear combination of the (orthonormal) eigenvectors of a hermitian operator. Self-adjoint differential operators are equally important because their "eigenfunctions" also form complete orthogonal sets, as we shall see later. This section will generalize the concept of the adjoint to the case of a differential operator (of second degree).

13.5.1. Definition. The HSOLDE
$$\mathbf{L}[y] \equiv p_2(x)y'' + p_1(x)y' + p_0(x)y = 0 \qquad (13.18)$$
is said to be exact if
$$p_2(x)f'' + p_1(x)f' + p_0(x)f = \frac{d}{dx}\left[A(x)f' + B(x)f\right] \qquad (13.19)$$
for all $f \in \mathcal{C}^2[a, b]$ and for some $A, B \in \mathcal{C}^1[a, b]$. An integrating factor for $\mathbf{L}[y]$ is a function $\mu(x)$ such that $\mu(x)\mathbf{L}[y]$ is exact.

If an integrating factor exists, then Equation (13.18) reduces to
$$\frac{d}{dx}\left[A(x)y' + B(x)y\right] = 0 \implies A(x)y' + B(x)y = C,$$
a FOLDE with a constant inhomogeneous term. Even the ISOLDE corresponding to Equation (13.18) can be solved, because
$$\mu(x)\mathbf{L}[y] = \mu(x)r(x) \implies \frac{d}{dx}\left[A(x)y' + B(x)y\right] = \mu(x)r(x) \implies A(x)y' + B(x)y = \int^x \mu(t)r(t)\,dt,$$
which is a general FOLDE. Thus, the existence of an integrating factor completely solves a SOLDE. It is therefore important to know whether or not a SOLDE admits an integrating factor. First let us give a criterion for the exactness of a SOLDE.

13.5.2. Proposition. The SOLDE of Equation (13.18) is exact if and only if $p_2'' - p_1' + p_0 = 0$.

Proof. If the SOLDE is exact, then Equation (13.19) holds for all $f$, implying that $p_2 = A$, $p_1 = A' + B$, and $p_0 = B'$. It follows that $p_2'' = A''$ and $p_1' = A'' + B'$, which in turn give $p_2'' - p_1' + p_0 = 0$.
Conversely, if p_2'' − p_1' + p_0 = 0, then, substituting p_0 = p_1' − p_2'' in the LHS of Equation (13.18), we obtain

p_2 y'' + p_1 y' + (p_1' − p_2'')y = (d/dx)[p_2 y' + (p_1 − p_2')y],

and the DE is exact. □

A general SOLDE is clearly not exact. Can we make it exact by multiplying it by an integrating factor, as we did with a FOLDE? The following proposition contains the answer.

13.5.3. Proposition. A function μ is an integrating factor of the SOLDE of Equation (13.18) if and only if it is a solution of the HSOLDE

M[μ] ≡ (p_2 μ)'' − (p_1 μ)' + p_0 μ = 0.   (13.20)

Proof. This is an immediate consequence of Proposition 13.5.2. □

We can expand Equation (13.20) to obtain the equivalent equation

p_2 μ'' + (2p_2' − p_1)μ' + (p_2'' − p_1' + p_0)μ = 0.   (13.21)

The operator M given by

M ≡ p_2 (d²/dx²) + (2p_2' − p_1)(d/dx) + (p_2'' − p_1' + p_0)   (13.22)

is called the adjoint of the operator L and denoted by M ≡ L†. The reason for the use of the word "adjoint" will be made clear below.

Proposition 13.5.3 confirms the existence of an integrating factor. However, the latter can be obtained only by solving Equation (13.21), which is at least as difficult as solving the original differential equation! In contrast, the integrating factor for a FOLDE can be obtained by a mere integration [see Equation (13.8)]. Although integrating factors for SOLDEs are not as useful as their counterparts for FOLDEs, they can facilitate the study of SOLDEs. Let us first note that the adjoint of the adjoint of a differential operator is the original operator: (L†)† = L (see Problem 13.11). This suggests that if v is an integrating factor of L[u], then u will be an integrating factor of M[v] ≡ L†[v]. In particular, multiplying the first one by v and the second one by u and subtracting the results, we obtain [see Equations (13.18) and (13.20)]

vL[u] − uM[v] = (vp_2)u'' − u(p_2 v)'' + (vp_1)u' + u(p_1 v)',

which can be simplified to

vL[u] − uM[v] = (d/dx)[p_2 v u' − (p_2 v)'u + p_1 u v].   (13.23)
Integrating this from a to b yields

∫_a^b (vL[u] − uM[v]) dx = [p_2 v u' − (p_2 v)'u + p_1 u v]_a^b.   (13.24)

Equations (13.23) and (13.24) are called the Lagrange identities. Equation (13.24) embodies the reason for calling M the adjoint of L: If we consider u and v as abstract vectors |u⟩ and |v⟩, and L and M as operators in a Hilbert space with the inner product ⟨u|v⟩ = ∫_a^b u*(x)v(x) dx, then Equation (13.24) can be written as

⟨v| L |u⟩ − ⟨u| M |v⟩ = ⟨u| L† |v⟩* − ⟨u| M |v⟩ = [p_2 v u' − (p_2 v)'u + p_1 u v]_a^b.

If the RHS is zero, then ⟨u| L† |v⟩* = ⟨u| M |v⟩ for all |u⟩, |v⟩, and since all these operators and functions are real, L† = M.

As in the case of finite-dimensional vector spaces, a self-adjoint differential operator merits special consideration. For M[v] ≡ L†[v] to be equal to L[v], we must have [see Equations (13.18) and (13.21)] 2p_2' − p_1 = p_1 and p_2'' − p_1' + p_0 = p_0. The first equation gives p_2' = p_1, which also solves the second equation. If this condition holds, then we can write Equation (13.18) as L[y] = p_2 y'' + p_2' y' + p_0 y, or

L[y] = (d/dx)[p_2(x) (dy/dx)] + p_0(x)y = 0.

Can we make all SOLDEs self-adjoint? Let us multiply both sides of Equation (13.18) by a function h(x), to be determined later. We get the new DE

h(x)p_2(x)y'' + h(x)p_1(x)y' + h(x)p_0(x)y = 0,

which we desire to be self-adjoint. This will be accomplished if we choose h(x) such that hp_1 = (hp_2)', or p_2 h' + h(p_2' − p_1) = 0, which can be readily integrated to give

h(x) = (1/p_2) exp[∫^x (p_1(t)/p_2(t)) dt].

We have just proved the following:

13.5.4. Theorem. (all SOLDEs can be made self-adjoint) The SOLDE of Equation (13.18) is self-adjoint if and only if p_2' = p_1, in which case the DE has the form

(d/dx)[p_2(x) (dy/dx)] + p_0(x)y = 0.

If it is not self-adjoint, it can be made so by multiplying it through by

h(x) = (1/p_2) exp[∫^x (p_1(t)/p_2(t)) dt].
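The recipe of Theorem 13.5.4 is mechanical enough to automate. The following sketch with SymPy (our own illustration; the Hermite equation y'' − 2xy' + λy = 0 is used here as a test case, not as an example from this section) computes h(x) and checks that the rescaled coefficients satisfy the self-adjointness condition (hp_2)' = hp_1:

```python
import sympy as sp

x = sp.symbols('x')

# Hermite equation y'' - 2x y' + lam*y = 0, written as p2*y'' + p1*y' + p0*y = 0
p2, p1 = sp.Integer(1), -2*x

# Theorem 13.5.4: h = (1/p2) * exp( integral of p1/p2 ) makes the SOLDE self-adjoint
h = sp.exp(sp.integrate(p1 / p2, x)) / p2
print(h)  # exp(-x**2)

# self-adjointness condition for the rescaled equation: (h*p2)' = h*p1
assert sp.simplify(sp.diff(h * p2, x) - h * p1) == 0
```

The resulting self-adjoint form, (e^{−x²} y')' + λ e^{−x²} y = 0, is the one under which the Hermite polynomials of Chapter 7 are orthogonal with weight e^{−x²}.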
13.5.5. Example. (a) The Legendre equation in normal form,

y'' − (2x/(1 − x²))y' + (λ/(1 − x²))y = 0,

is not self-adjoint. However, we get a self-adjoint version if we multiply through by h(x) = 1 − x²:

(1 − x²)y'' − 2xy' + λy = 0, or [(1 − x²)y']' + λy = 0.

(b) Similarly, the normal form of the Bessel equation,

y'' + (1/x)y' + (1 − n²/x²)y = 0,

is not self-adjoint, but multiplying through by h(x) = x yields

(d/dx)(x (dy/dx)) + (x − n²/x)y = 0,

which is clearly self-adjoint.

13.6 Power-Series Solutions of SOLDEs

Analysis is one of the richest branches of mathematics, focusing on the endless variety of objects we call functions. The simplest kind of function is a polynomial, which is obtained by performing the simple algebraic operations of addition and multiplication on the independent variable x. Next in complexity are the trigonometric functions, which are obtained by taking ratios of geometric objects. If we demand a simplistic, intuitive approach to functions, the list ends there. It was only with the advent of derivatives, integrals, and differential equations that a vastly richer variety of functions exploded into existence in the eighteenth and nineteenth centuries. For instance, e^x, nonexistent before the invention of calculus, can be thought of as the function that solves dy/dx = y. Although the definition of a function in terms of DEs and integrals seems a bit artificial, for most applications it is the only way to define a function. For instance, the error function, used in statistics, is defined as

erf(x) ≡ (1/√π) ∫_{−∞}^x e^{−t²} dt.

Such a function cannot be expressed in terms of elementary functions. Similarly, functions (of x) such as

∫_x^∞ (sin t)/t dt
and so on are encountered frequently in applications. None of these functions can be expressed in terms of other well-known functions.

An effective way of studying such functions is to study the differential equations they satisfy. In fact, the majority of functions encountered in mathematical physics obey the HSOLDE of Equation (13.18), in which the p_i(x) are elementary functions, mostly ratios of polynomials (of degree at most 2). Of course, to specify functions completely, appropriate boundary conditions are necessary. For instance, the error function mentioned above satisfies the HSOLDE y'' + 2xy' = 0 with the boundary conditions y(0) = 1/2 and y'(0) = 1/√π.

The natural tendency to resist the idea of a function as a solution of a SOLDE is mostly due to the abstract nature of differential equations. After all, it is easier to imagine constructing functions by simple multiplications or with simple geometric figures that have been around for centuries. The following beautiful example (see [Birk 78, pp. 85–87]) should overcome this resistance and convince the skeptic that differential equations contain all the information about a function.

13.6.1. Example. We can show that the solutions to y'' + y = 0 have all the properties we expect of sin x and cos x. Let us denote the two linearly independent solutions of this equation by C(x) and S(x). To specify these functions completely, we set C(0) = S'(0) = 1 and C'(0) = S(0) = 0. We claim that this information is enough to identify C(x) and S(x) as cos x and sin x, respectively.

First, let us show that the solutions exist and are well-behaved functions. With C(0) and C'(0) given, the equation y'' + y = 0 can generate all derivatives of C(x) at zero: C''(0) = −C(0) = −1, C'''(0) = −C'(0) = 0, C⁽⁴⁾(0) = −C''(0) = +1, and, in general,

C⁽ⁿ⁾(0) = 0 if n is odd, (−1)^k if n = 2k,

where k = 0, 1, 2, .... Thus, the Taylor expansion of C(x) is

C(x) = Σ_{k=0}^∞ (−1)^k x^{2k}/(2k)!.   (13.25)
Similarly,

S(x) = Σ_{k=0}^∞ (−1)^k x^{2k+1}/(2k+1)!.   (13.26)

This example illustrates that all information about sine and cosine is hidden in their differential equation. A simple ratio test on the series representation of C(x) yields

lim_{k→∞} |a_{k+1}/a_k| = lim_{k→∞} x²/((2k+2)(2k+1)) = 0,

which shows that the series for C(x) converges for all values of x. Similarly, the series for S(x) is also convergent. Thus, we are dealing with well-defined, finite-valued functions. Let us now enumerate and prove some properties of C(x) and S(x).

(a) C'(x) = −S(x). We prove this relation by differentiating C''(x) + C(x) = 0 and writing the result as
[C'(x)]'' + C'(x) = 0, to make evident the fact that C'(x) is also a solution. Since C'(0) = 0 and [C'(0)]' = C''(0) = −1, and since −S(x) satisfies the same initial conditions, the uniqueness theorem implies that C'(x) = −S(x). Similarly, S'(x) = C(x).

(b) C²(x) + S²(x) = 1. Since the p(x) term is absent from the SOLDE, Proposition 13.4.2 implies that the Wronskian of C(x) and S(x) is constant. On the other hand,

W(C, S; x) = C(x)S'(x) − C'(x)S(x) = C²(x) + S²(x) = W(C, S; 0) = C²(0) + S²(0) = 1.

(c) S(a + x) = S(a)C(x) + C(a)S(x). The use of the chain rule easily shows that S(a + x) is a solution of the equation y'' + y = 0. Thus, it can be written as a linear combination of C(x) and S(x) [which are linearly independent because their Wronskian is nonzero by (b)]:

S(a + x) = A S(x) + B C(x).   (13.27)

This is a functional identity, which for x = 0 gives S(a) = BC(0) = B. If we differentiate both sides of Equation (13.27), we get C(a + x) = A S'(x) + B C'(x) = A C(x) − B S(x), which for x = 0 gives C(a) = A. Substituting the values of A and B in Equation (13.27) yields the desired identity. A similar argument leads to

C(a + x) = C(a)C(x) − S(a)S(x).

(d) Periodicity of C(x) and S(x). Let x_0 be the smallest positive real number such that S(x_0) = C(x_0). Then property (b) implies that C(x_0) = S(x_0) = 1/√2. On the other hand,

S(x_0 + x) = S(x_0)C(x) + C(x_0)S(x) = C(x_0)C(x) + S(x_0)S(x) = C(x_0)C(x) − S(x_0)S(−x) = C(x_0 − x).

The third equality follows because, by Equation (13.26), S(x) is an odd function of x. This is true for all x; in particular, for x = x_0 it yields S(2x_0) = C(0) = 1, and by property (b), C(2x_0) = 0. Using property (c) once more, we get

S(2x_0 + x) = S(2x_0)C(x) + C(2x_0)S(x) = C(x),
C(2x_0 + x) = C(2x_0)C(x) − S(2x_0)S(x) = −S(x).

Substituting x = 2x_0 yields S(4x_0) = C(2x_0) = 0 and C(4x_0) = −S(2x_0) = −1. Continuing in this manner, we can easily obtain

S(8x_0 + x) = S(x),  C(8x_0 + x) = C(x),

which prove the periodicity of S(x) and C(x) and show that their period is 8x_0.
It is even possible to determine x_0. This determination is left as a problem, but the result is

x_0 = ∫_0^{1/√2} dt/√(1 − t²).

A numerical calculation will show that this is π/4.
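The construction of C(x) and S(x) in Example 13.6.1 can be carried out numerically: substituting a power series in y'' + y = 0 gives the coefficient recursion (n+1)(n+2)c_{n+2} = −c_n, and the two sets of initial conditions generate the two series. A short sketch (plain Python; the function names and the 40-term truncation are our own choices for illustration):

```python
import math

def series_solution(c0, c1, terms=40):
    """Coefficients for y'' + y = 0: (n+1)(n+2) c_{n+2} = -c_n."""
    c = [0.0] * terms
    c[0], c[1] = c0, c1
    for n in range(terms - 2):
        c[n + 2] = -c[n] / ((n + 1) * (n + 2))
    return c

def evaluate(c, x):
    return sum(cn * x**n for n, cn in enumerate(c))

C = lambda x: evaluate(series_solution(1.0, 0.0), x)  # C(0) = 1, C'(0) = 0
S = lambda x: evaluate(series_solution(0.0, 1.0), x)  # S(0) = 0, S'(0) = 1

for x in (0.5, 1.0, 2.0):
    assert abs(C(x) - math.cos(x)) < 1e-12       # C is cos
    assert abs(S(x) - math.sin(x)) < 1e-12       # S is sin
    assert abs(C(x)**2 + S(x)**2 - 1.0) < 1e-12  # property (b)
print("all identities verified")
```

The recursion reproduces the Taylor coefficients of (13.25) and (13.26) exactly, so the numerical agreement with cos and sin is limited only by truncation and rounding.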
13.6.1 Frobenius Method of Undetermined Coefficients

A proper treatment of SOLDEs requires the medium of complex analysis and will be undertaken in the next chapter. At this point, however, we are seeking a formal infinite-series solution to the SOLDE

y'' + p(x)y' + q(x)y = 0,

where p(x) and q(x) are real and analytic. This means that p(x) and q(x) can be represented by convergent power series in some interval (a, b). [The interesting case where p(x) and q(x) may have singularities will be treated in the context of complex solutions.]

The general procedure is to write the expansions⁶

p(x) = Σ_{k=0}^∞ a_k x^k,  q(x) = Σ_{k=0}^∞ b_k x^k,  y = Σ_{k=0}^∞ c_k x^k   (13.28)

for the coefficient functions p and q and the solution y, substitute them in the SOLDE, and equate the coefficient of each power of x to zero. For this purpose, we need expansions for the derivatives of y:

y' = Σ_{k=1}^∞ k c_k x^{k−1} = Σ_{k=0}^∞ (k+1)c_{k+1} x^k,
y'' = Σ_{k=1}^∞ (k+1)k c_{k+1} x^{k−1} = Σ_{k=0}^∞ (k+2)(k+1)c_{k+2} x^k.

Thus

p(x)y' = Σ_{k=0}^∞ Σ_{m=0}^∞ a_m x^m (k+1)c_{k+1} x^k = Σ_{k,m} (k+1)a_m c_{k+1} x^{k+m}.

Let k + m ≡ n and sum over n. Then the other sum, say over m, cannot exceed n. Thus,

p(x)y' = Σ_{n=0}^∞ Σ_{m=0}^n (n−m+1)a_m c_{n−m+1} x^n.

Similarly, q(x)y = Σ_{n=0}^∞ Σ_{m=0}^n b_m c_{n−m} x^n. Substituting these sums and the series for y'' in the SOLDE, we obtain

Σ_{n=0}^∞ { (n+1)(n+2)c_{n+2} + Σ_{m=0}^n [(n−m+1)a_m c_{n−m+1} + b_m c_{n−m}] } x^n = 0.

6. Here we are expanding about the origin. If such an expansion is impossible or inconvenient, one can expand about another point, say x_0. One would then replace all powers of x in all expressions below with powers of x − x_0. These expansions assume that p, q, and y have no singularity at x = 0. In general, this assumption is not valid, and a different approach, in which the whole series is multiplied by a (not necessarily positive integer) power of x, ought to be taken. Details are provided in Chapter 14.
For this to be true for all x, the coefficient of each power of x must vanish:

(n+1)(n+2)c_{n+2} = −Σ_{m=0}^n [(n−m+1)a_m c_{n−m+1} + b_m c_{n−m}]  for n ≥ 0,

or, equivalently,

n(n+1)c_{n+1} = −Σ_{m=0}^{n−1} [(n−m)a_m c_{n−m} + b_m c_{n−m−1}]  for n ≥ 1.   (13.29)

If we know c_0 and c_1 (for instance, from boundary conditions), we can uniquely determine c_n for n ≥ 2 from Equation (13.29). This, in turn, gives a unique power-series expansion for y, and we have the following theorem.

13.6.2. Theorem. (the SOLDE existence theorem) For any SOLDE of the form y'' + p(x)y' + q(x)y = 0 with analytic coefficient functions given by the first two equations of (13.28), there exists a unique power series, given by the third equation of (13.28), that formally satisfies the SOLDE for each choice of c_0 and c_1.

This theorem merely states the existence of a formal power series and says nothing about its convergence. The following example demonstrates that convergence is not guaranteed.

13.6.3. Example. The formal power-series solution for x²y' − y + x = 0 can be obtained by letting y = Σ_{n=0}^∞ c_n x^n. Then y' = Σ_{n=0}^∞ (n+1)c_{n+1} x^n, and substitution in the DE gives Σ_{n=0}^∞ (n+1)c_{n+1} x^{n+2} − Σ_{n=0}^∞ c_n x^n + x = 0, or

Σ_{n=0}^∞ (n+1)c_{n+1} x^{n+2} − c_0 − c_1 x − Σ_{n=2}^∞ c_n x^n + x = 0.

We see that c_0 = 0, c_1 = 1, and (n+1)c_{n+1} = c_{n+2} for n ≥ 0. Thus, we have the recursion relation n c_n = c_{n+1} for n ≥ 1, whose unique solution is c_n = (n−1)!, which generates the following solution of the DE:

y = x + x² + (2!)x³ + (3!)x⁴ + ... + (n−1)! x^n + ....

This series is not convergent for any nonzero x.

As we shall see later, for normal SOLDEs the power series of y in Equation (13.28) converges to an analytic function. The SOLDE solved in the preceding example is not normal.

13.6.4. Example. As an application of Theorem 13.6.2, let us consider the Legendre equation in its normal form

y'' − (2x/(1 − x²))y' + (λ/(1 − x²))y = 0.
For |x| < 1 both p and q are analytic, and

p(x) = −2x Σ_{m=0}^∞ (x²)^m = Σ_{m=0}^∞ (−2)x^{2m+1},
q(x) = λ Σ_{m=0}^∞ (x²)^m = Σ_{m=0}^∞ λ x^{2m}.

Thus, the coefficients of Equation (13.28) are

a_m = 0 if m is even, −2 if m is odd;  b_m = λ if m is even, 0 if m is odd.

We want to substitute for a_m and b_m in Equation (13.29) to find c_{n+1}. It is convenient to consider two cases: when n is odd and when n is even. For n = 2r + 1, Equation (13.29), after some algebra, yields

(2r+1)(2r+2)c_{2r+2} = Σ_{m=0}^r (4r − 4m − λ)c_{2(r−m)}.   (13.30)

With r → r + 1, this becomes

(2r+3)(2r+4)c_{2r+4} = Σ_{m=0}^{r+1} (4r + 4 − 4m − λ)c_{2(r+1−m)}
= (4r + 4 − λ)c_{2(r+1)} + Σ_{m=1}^{r+1} (4r + 4 − 4m − λ)c_{2(r+1−m)}
= (4r + 4 − λ)c_{2r+2} + Σ_{m=0}^r (4r − 4m − λ)c_{2(r−m)}
= (4r + 4 − λ)c_{2r+2} + (2r+1)(2r+2)c_{2r+2}
= [−λ + (2r+3)(2r+2)]c_{2r+2},

where in going from the second equality to the third we changed the dummy index, and in going from the third equality to the fourth we used Equation (13.30). Now we let 2r + 2 ≡ k to obtain

(k+1)(k+2)c_{k+2} = [k(k+1) − λ]c_k, or c_{k+2} = [k(k+1) − λ]/[(k+1)(k+2)] c_k  for even k.

It is not difficult to show that, starting with n = 2r, the case of even n, we obtain this same equation for odd k. Thus, we can write

c_{n+2} = [n(n+1) − λ]/[(n+1)(n+2)] c_n.   (13.31)

For arbitrary c_0 and c_1, we obtain two independent solutions, one of which has only even powers of x, and the other only odd powers. The generalized ratio test (see [Hass 99, Chapter 5]) shows that the series is divergent for x = ±1 unless λ = l(l + 1) for some positive
integer l. In that case the infinite series becomes a polynomial, the Legendre polynomial encountered in Chapter 7.

Equation (13.31) could have been obtained by substituting Equation (13.28) directly into the Legendre equation. The roundabout way to (13.31) taken here shows the generality of Equation (13.29). With specific differential equations it is generally better to substitute (13.28) directly.

13.6.5. Example. We studied Hermite polynomials in Chapter 7 in the context of classical orthogonal polynomials. Let us see how they arise in physics. The one-dimensional time-independent Schrödinger equation for a particle of mass m in a potential V(x) is

−(ħ²/2m)(d²ψ/dx²) + V(x)ψ = Eψ,

where E is the total energy of the particle. For a harmonic oscillator, V(x) = (1/2)kx² ≡ (1/2)mω²x², and

ψ'' − (m²ω²/ħ²)x²ψ + (2m/ħ²)Eψ = 0.

Substituting ψ(x) = H(x) exp(−mωx²/2ħ) and then making the change of variable x = √(ħ/mω) y yields

H'' − 2yH' + λH = 0, where λ = 2E/(ħω) − 1.   (13.32)

This is the Hermite differential equation in normal form. We assume the expansion H(y) = Σ_{n=0}^∞ c_n y^n, which yields

H'(y) = Σ_{n=1}^∞ n c_n y^{n−1} = Σ_{n=0}^∞ (n+1)c_{n+1} y^n,
H''(y) = Σ_{n=1}^∞ n(n+1)c_{n+1} y^{n−1} = Σ_{n=0}^∞ (n+1)(n+2)c_{n+2} y^n.

Substituting in Equation (13.32) gives

Σ_{n=0}^∞ [(n+1)(n+2)c_{n+2} + λc_n] y^n − 2 Σ_{n=0}^∞ (n+1)c_{n+1} y^{n+1} = 0,

or

2c_2 + λc_0 + Σ_{n=0}^∞ [(n+2)(n+3)c_{n+3} + λc_{n+1} − 2(n+1)c_{n+1}] y^{n+1} = 0.

Setting the coefficients of powers of y equal to zero, we obtain

c_2 = −(λ/2)c_0,  c_{n+3} = [2(n+1) − λ]/[(n+2)(n+3)] c_{n+1}  for n ≥ 0,
or, replacing n with n − 1,

c_{n+2} = (2n − λ)/[(n+1)(n+2)] c_n  for n ≥ 1.   (13.33)

The ratio test easily shows that the series is convergent for all values of y.

Thus, the infinite series whose coefficients obey the recursive relation in Equation (13.33) converges for all y. However, on physical grounds, i.e., the demand that lim_{x→∞} ψ(x) = 0, the series must be truncated. This happens only if λ = 2l for some integer l (see Problem 13.20 and [Hass 99, Chapter 13]), and in that case we obtain a polynomial, the Hermite polynomial of order l. A consequence of such a truncation is the quantization of harmonic oscillator energy:

2l = λ = 2E/(ħω) − 1 ⟹ E = (l + 1/2)ħω.

Two solutions are generated from Equation (13.33), one including only even powers and the other only odd powers. These are clearly linearly independent. Thus, knowledge of c_0 and c_1 determines the general solution of the HSOLDE of (13.32).

The preceding two examples show how certain special functions used in mathematical physics are obtained in an analytic way, by solving a differential equation. We saw in Chapter 12 how to obtain spherical harmonics and Legendre polynomials by algebraic methods. It is instructive to solve the harmonic oscillator problem using algebraic methods, as the following example demonstrates.

13.6.6. Example. The Hamiltonian of a one-dimensional harmonic oscillator is

H = p²/2m + (1/2)mω²x²,

where p = −iħ d/dx is the momentum operator. Let us find the eigenvectors and eigenvalues of H. We define the operators

a = √(mω/2ħ) x + (i/√(2mωħ)) p  and  a† = √(mω/2ħ) x − (i/√(2mωħ)) p.

Using the commutation relation [x, p] = iħ1, we can show that

H = ħω a†a + (1/2)ħω 1.   (13.34)
Furthermore, one can readily show that

[H, a] = −ħω a,  [H, a†] = ħω a†.   (13.35)

Let |ψ_E⟩ be the eigenvector corresponding to the eigenvalue E: H|ψ_E⟩ = E|ψ_E⟩, and note that Equation (13.35) gives

Ha|ψ_E⟩ = (aH − ħωa)|ψ_E⟩ = (E − ħω)a|ψ_E⟩  and  Ha†|ψ_E⟩ = (E + ħω)a†|ψ_E⟩.

Thus, a|ψ_E⟩ is an eigenvector of H with eigenvalue E − ħω, and a†|ψ_E⟩ is an eigenvector with eigenvalue E + ħω. That is why a† and a are called the raising and lowering (or creation and annihilation) operators, respectively.
By applying a repeatedly, we obtain states of lower and lower energies. But there is a limit to this, because H is a positive operator: It cannot have a negative eigenvalue. Thus, there must exist a ground state, |ψ_0⟩, such that a|ψ_0⟩ = 0. The energy of this ground state (or the eigenvalue corresponding to |ψ_0⟩) can be obtained:⁷

H|ψ_0⟩ = (ħω a†a + (1/2)ħω)|ψ_0⟩ = (1/2)ħω|ψ_0⟩.

Repeated application of the raising operator yields both higher-level states and eigenvalues. We thus define |ψ_n⟩ by

|ψ_n⟩ = (1/c_n)(a†)^n |ψ_0⟩,   (13.36)

where c_n is a normalizing constant. The energy of |ψ_n⟩ is n units higher than the ground state's, or

E_n = (n + 1/2)ħω,

which is what we obtained in the preceding example. To find c_n, we demand orthonormality of the |ψ_n⟩. Taking the inner product of (13.36) with itself, we can show (see Problem 13.21) that |c_n|² = n|c_{n−1}|², or |c_n|² = n!|c_0|², which for |c_0| = 1 and real c_n yields c_n = √(n!). It follows, then, that

|ψ_n⟩ = (1/√(n!))(a†)^n |ψ_0⟩.   (13.37)

In terms of functions and derivative operators, a|ψ_0⟩ = 0 gives

(√(mω/2ħ) x + √(ħ/2mω) (d/dx)) ψ_0(x) = 0,

with the solution ψ_0(x) = c exp(−mωx²/2ħ). Normalizing ψ_0(x) gives

1 = ⟨ψ_0|ψ_0⟩ = c² ∫_{−∞}^∞ exp(−mωx²/ħ) dx = c² (πħ/mω)^{1/2}.

Thus, c = (mω/πħ)^{1/4}. We can now write Equation (13.37) in terms of differential operators:

ψ_n(x) = (1/√(n!)) (mω/πħ)^{1/4} (√(mω/2ħ) x − √(ħ/2mω) (d/dx))^n e^{−mωx²/(2ħ)}.

Defining a new variable y = √(mω/ħ) x transforms this equation into

ψ_n = (mω/πħ)^{1/4} (1/√(2^n n!)) (y − d/dy)^n e^{−y²/2}.

7. From here on, the unit operator 1 will not be shown explicitly.
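The ladder-operator algebra of Example 13.6.6 can be checked with small matrices. The sketch below (our own illustration; the truncation of the number basis at a hypothetical dimension N = 8 is not from the text) builds a in the basis {|n⟩} via a|n⟩ = √n |n−1⟩ and confirms the spectrum E_n = (n + 1/2)ħω:

```python
import numpy as np

N = 8  # truncated Fock-space dimension (our own illustrative cutoff)
# annihilation operator in the number basis: a|n> = sqrt(n) |n-1>
a = np.diag(np.sqrt(np.arange(1.0, N)), k=1)
adag = a.T
hbar = omega = 1.0  # units in which hbar = omega = 1

H = hbar * omega * (adag @ a + 0.5 * np.eye(N))
print(np.diag(H))  # E_n = (n + 1/2) hbar*omega: 0.5, 1.5, 2.5, ...

# [a, a†] = 1 holds exactly, except at the truncation edge
comm = a @ adag - adag @ a
print(np.allclose(comm[:-1, :-1], np.eye(N - 1)))  # True
```

The defect of the commutator in the last row and column is an artifact of cutting the infinite-dimensional Fock space off at N states; it disappears in the limit N → ∞.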
From this, the relation between Hermite polynomials, and the solutions of the one-dimensional harmonic oscillator as given in the previous example, we can obtain a general formula for H_n(x). In particular, if we note that (see Problem 13.21)

e^{y²/2} (y − d/dy) e^{−y²/2} = −e^{y²} (d/dy) e^{−y²}

and, in general,

e^{y²/2} (y − d/dy)^n e^{−y²/2} = (−1)^n e^{y²} (d^n/dy^n) e^{−y²},

we recover the generalized Rodriguez formula of Chapter 7.

To end this section, we simply quote the following important theorem (for a proof, see [Birk 78, p. 95]):

13.6.7. Theorem. For any choice of c_0 and c_1, the radius of convergence of any power-series solution y = Σ_{k=0}^∞ c_k x^k for the normal HSOLDE

y'' + p(x)y' + q(x)y = 0,

whose coefficients satisfy the recursion relation of (13.29), is at least as large as the smaller of the two radii of convergence of the two series for p(x) and q(x).

In particular, if p(x) and q(x) are analytic in an interval around x = 0, then the solution of the normal HSOLDE is also analytic in a neighborhood of x = 0.

13.7 SOLDEs with Constant Coefficients

The solution to a SOLDE with constant coefficients can always be found in closed form. In fact, we can treat an nth-order linear differential equation (NOLDE) with constant coefficients with no extra effort. This brief section outlines the procedure for solving such an equation. For details, the reader is referred to any elementary book on differential equations (see also [Hass 99]). The most general nth-order linear differential equation with constant coefficients can be written as

L[y] ≡ y^(n) + a_{n−1}y^(n−1) + ... + a_1 y' + a_0 y = r(x).   (13.38)

The corresponding homogeneous NOLDE (HNOLDE) is obtained by setting r(x) = 0. Let us consider such a homogeneous case first.
The solution to the homogeneous NOLDE

L[y] ≡ y^(n) + a_{n−1}y^(n−1) + ... + a_1 y' + a_0 y = 0   (13.39)

can be found by making the exponential substitution y = e^{λx}, which results in the equation

L[e^{λx}] = (λ^n + a_{n−1}λ^{n−1} + ... + a_1 λ + a_0)e^{λx} = 0.

This equation will hold only if λ is a root of the characteristic polynomial

p(λ) ≡ λ^n + a_{n−1}λ^{n−1} + ... + a_1 λ + a_0,
which, by the fundamental theorem of algebra, can be written as

p(λ) = Π_{j=1}^m (λ − λ_j)^{k_j}.   (13.40)

The λ_j are the distinct (complex) roots of p(λ) and have multiplicities k_j.

13.7.1. Theorem. Let {λ_j}_{j=1}^m be the roots of the characteristic polynomial of the real HNOLDE of Equation (13.39), and let the respective roots have multiplicities {k_j}_{j=1}^m. Then the functions

x^r e^{λ_j x},  r = 0, 1, ..., k_j − 1,  j = 1, ..., m,

are a basis of solutions of Equation (13.39).

When a λ is complex, one can write its corresponding solution in terms of trigonometric functions.

13.7.2. Example. An equation that is used in both mechanics and circuit theory is

y'' + ay' + by = 0  for a, b > 0.   (13.41)

Its characteristic polynomial is p(λ) = λ² + aλ + b, which has the roots

λ_1 = (1/2)(−a + √(a² − 4b))  and  λ_2 = (1/2)(−a − √(a² − 4b)).

We can distinguish three different possible motions, depending on the relative sizes of a and b.

(a) a² > 4b (overdamped): Here we have two distinct simple roots. The multiplicities are both one (k_1 = k_2 = 1); therefore, the power of x for both solutions is zero (r_1 = r_2 = 0). Let γ ≡ (1/2)√(a² − 4b). Then the most general solution is

y(t) = e^{−at/2}(c_1 e^{γt} + c_2 e^{−γt}).

Since a > 2γ, this solution starts at y = c_1 + c_2 at t = 0 and continuously decreases; so, as t → ∞, y(t) → 0.

(b) a² = 4b (critically damped): In this case we have one multiple root of order 2 (k_1 = 2); therefore, the power of x can be zero or 1 (r_1 = 0, 1). Thus, the general solution is

y(t) = c_1 t e^{−at/2} + c_0 e^{−at/2}.

This solution starts at y(0) = c_0 at t = 0, reaches a maximum (or minimum) at t = 2/a − c_0/c_1, and subsequently decays (grows) exponentially to zero.

(c) a² < 4b (underdamped): Once more, we have two distinct simple roots. The multiplicities are both one (k_1 = k_2 = 1); therefore, the power of x for both solutions is zero
(r_1 = r_2 = 0). Let ω ≡ (1/2)√(4b − a²). Then λ_1 = −a/2 + iω and λ_2 = λ_1*. The roots are complex, and the most general solution is thus of the form

y(t) = e^{−at/2}(c_1 cos ωt + c_2 sin ωt) = A e^{−at/2} cos(ωt + α).

The solution is a harmonic variation with a decaying amplitude A exp(−at/2). Note that if a = 0, the amplitude does not decay. That is why a is called the damping factor (or the damping constant).

These equations describe either a mechanical system oscillating (with no external force) in a viscous (dissipative) fluid, or an electrical circuit consisting of a resistance R, an inductance L, and a capacitance C. For RLC circuits, a = R/L and b = 1/(LC). Thus, the damping factor depends on the relative magnitudes of R and L. On the other hand, the frequency depends on all three elements. In particular, for R ≥ 2√(L/C) the circuit does not oscillate.

A physical system whose behavior in the absence of a driving force is described by a HNOLDE will obey an inhomogeneous NOLDE in the presence of the driving force. This driving force is simply the inhomogeneous term of the NOLDE. The best way to solve such an inhomogeneous NOLDE in its most general form is by using Fourier transforms and Green's functions, as we will do in Chapter 20. For the particular, but important, case in which the inhomogeneous term is a product of polynomials and exponentials, the solution can be found in closed form.

13.7.3. Theorem. The NOLDE L[y] = e^{λx}S(x), where S(x) is a polynomial, has the particular solution e^{λx}q(x), where q(x) is also a polynomial. The degree of q(x) equals that of S(x), unless λ = λ_j, a root of the characteristic polynomial of L, in which case the degree of q(x) exceeds that of S(x) by k_j, the multiplicity of λ_j.

Once we know the form of the particular solution of the NOLDE, we can find the coefficients in the polynomial of the solution by substituting in the NOLDE and matching the powers on both sides.

13.7.4. Example.
Let us find the most general solutions for the following two differential equations, subject to the boundary conditions y(0) = 0 and y'(0) = 1.

(a) The first DE we want to consider is

y'' + y = xe^x.   (13.42)

The characteristic polynomial is λ² + 1, whose roots are λ_1 = i and λ_2 = −i. Thus, a basis of solutions is {cos x, sin x}. To find the particular solution, we note that λ (the coefficient of x in the exponential part of the inhomogeneous term) is 1, which is neither of the roots λ_1 and λ_2. Thus, the particular solution is of the form q(x)e^x, where q(x) = Ax + B is
of degree 1 [the same degree as that of S(x) = x]. We now substitute u = (Ax + B)e^x in Equation (13.42) to obtain the relation

2Axe^x + (2A + 2B)e^x = xe^x.

Matching the coefficients, we have

2A = 1 and 2A + 2B = 0 ⟹ A = 1/2 = −B.

Thus, the most general solution is

y = c_1 cos x + c_2 sin x + (1/2)(x − 1)e^x.

Imposing the given boundary conditions yields 0 = y(0) = c_1 − 1/2 and 1 = y'(0) = c_2. Thus,

y = (1/2) cos x + sin x + (1/2)(x − 1)e^x

is the unique solution.

(b) The next DE we want to consider is

y'' − y = xe^x.   (13.43)

Here p(λ) = λ² − 1, and the roots are λ_1 = 1 and λ_2 = −1. A basis of solutions is {e^x, e^{−x}}. To find a particular solution, we note that S(x) = x and λ = 1 = λ_1. Theorem 13.7.3 then implies that q(x) must be of degree 2, because λ_1 is a simple root, i.e., k_1 = 1. We therefore try

q(x) = Ax² + Bx + C ⟹ u = (Ax² + Bx + C)e^x.

Taking the derivatives and substituting in Equation (13.43) yields two equations,

4A = 1 and A + B = 0,

whose solution is A = −B = 1/4. Note that C is not determined, because Ce^x is a solution of the homogeneous DE corresponding to Equation (13.43), so when L is applied to u, it eliminates the term Ce^x. Another way of looking at the situation is to note that the most general solution to (13.43) is of the form

y = c_1 e^x + c_2 e^{−x} + ((1/4)x² − (1/4)x + C)e^x.

The term Ce^x can be absorbed into c_1 e^x. We therefore set C = 0, apply the boundary conditions, and find the unique solution.
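Solutions found by matching coefficients are easy to verify numerically. The sketch below (plain Python; the finite-difference step h and all names are our own choices for illustration) checks that the unique solution of part (a) satisfies both the DE and the boundary conditions:

```python
import math

# unique solution of part (a): y = (1/2) cos x + sin x + (1/2)(x - 1) e^x
y = lambda x: 0.5*math.cos(x) + math.sin(x) + 0.5*(x - 1.0)*math.exp(x)

def d2(f, x, h=1e-5):
    """Central finite-difference estimate of f''(x) (step h is our choice)."""
    return (f(x + h) - 2.0*f(x) + f(x - h)) / (h * h)

assert abs(y(0.0)) < 1e-12                               # y(0) = 0
assert abs((y(1e-6) - y(-1e-6)) / 2e-6 - 1.0) < 1e-4     # y'(0) = 1
for x in (-1.0, 0.0, 0.7, 2.0):
    assert abs(d2(y, x) + y(x) - x*math.exp(x)) < 1e-4   # y'' + y = x e^x
print("boundary conditions and DE verified")
```

The residual of the DE vanishes up to finite-difference error, confirming the coefficient matching above.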
13.8 The WKB Method

In this section, we treat the somewhat specialized method, due to Wentzel, Kramers, and Brillouin, of obtaining an approximate solution to a particular type of second-order DE arising from the Schrödinger equation in one dimension. Suppose we are interested in finding approximate solutions of the DE

d²y/dx² + q(x)y = 0   (13.44)

in which q varies "slowly" with respect to x, in the sense discussed below. If q varies infinitely slowly, i.e., if it is a constant, the solution to Equation (13.44) is simply an imaginary exponential (or trigonometric function). So, let us define φ(x) by y = e^{iφ(x)} and rewrite the DE as

(φ')² − iφ'' − q = 0.   (13.45)

Assuming that φ'' is small (compared to q), so that y does not oscillate too rapidly, we can find an approximate solution to the DE:

φ' ≈ ±√q ⟹ φ ≈ ± ∫ √(q(x)) dx.   (13.46)

The condition of validity of our assumption is obtained by differentiating (13.46):

|φ''| ≈ (1/2)|q'/√q| ≪ |q|.

It follows from Equation (13.46) and the definition of φ that 1/√q is approximately 1/(2π) times one "wavelength" of the solution y. Therefore, the approximation is valid if the change in q in one wavelength is small compared to |q|.

The approximation can be improved by inserting the derivative of (13.46) in the DE and solving for a new φ:

(φ')² ≈ q ± iq'/(2√q) ⟹ φ' ≈ ±(q ± iq'/(2√q))^{1/2} ≈ ±√q + iq'/(4q),

or

φ ≈ ± ∫ √(q(x)) dx + (i/4) ln q.

The two choices give rise to two different solutions, a linear combination of which gives the most general solution. Thus,

y ≈ (1/q^{1/4}(x)) { C_1 exp[i ∫ √q dx] + C_2 exp[−i ∫ √q dx] }.   (13.47)
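The quality of the WKB formula (13.47) can be tested against a direct numerical integration of (13.44). In the sketch below (our own illustration; the choice q(x) = (1 + εx)² with ε = 0.01 is hypothetical, picked because ∫√q dx = x + εx²/2 is elementary), a Runge-Kutta integration of the DE is compared with the real WKB solution q^{−1/4} cos(∫√q dx):

```python
import math

eps = 0.01
q   = lambda x: (1 + eps * x) ** 2          # slowly varying coefficient
wkb = lambda x: q(x) ** -0.25 * math.cos(x + eps * x * x / 2)

def rk4(x0, y0, v0, x1, n=20000):
    """Integrate y'' + q(x) y = 0 with classical 4th-order Runge-Kutta."""
    h, x, y, v = (x1 - x0) / n, x0, y0, v0
    f = lambda x, y, v: (v, -q(x) * y)
    for _ in range(n):
        k1 = f(x, y, v)
        k2 = f(x + h/2, y + h/2*k1[0], v + h/2*k1[1])
        k3 = f(x + h/2, y + h/2*k2[0], v + h/2*k2[1])
        k4 = f(x + h,   y + h*k3[0],   v + h*k3[1])
        y += h/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        v += h/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
        x += h
    return y

# start from the WKB values at x = 0: y(0) = 1, y'(0) = -eps/2
for x1 in (5.0, 10.0, 20.0):
    print(x1, abs(rk4(0.0, 1.0, -eps/2, x1) - wkb(x1)))  # errors stay small
```

Because q'/q ~ ε here, the condition of validity holds everywhere on the interval, and the discrepancy remains of order ε² even after many oscillations.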
Equation (13.47) gives an approximate solution to (13.44) in any region in which the condition of validity holds. The method fails if q changes too rapidly or if it is zero at some point of the region. The latter is a serious difficulty, since we often wish to join a solution in a region in which q(x) > 0 to one in a region in which q(x) < 0. There is a general procedure for deriving the so-called connection formulas relating the constants C_1 and C_2 of the two solutions on either side of the point where q(x) = 0. We shall not go into the details of such a derivation, as it is not particularly illuminating.⁸ We simply quote a particular result that is useful in applications.

Suppose that q passes through zero at x_0, is positive to the right of x_0, and satisfies the condition of validity in regions both to the right and to the left of x_0. Furthermore, assume that the solution of the DE decreases exponentially to the left of x_0. Under such conditions, the solution to the left will be of the form

(1/(−q(x))^{1/4}) exp[− ∫_x^{x_0} √(−q(x)) dx],   (13.48)

while to the right, we have

(2/(q(x))^{1/4}) cos[∫_{x_0}^x √(q(x)) dx − π/4].   (13.49)

A similar procedure gives connection formulas for the case where q is positive on the left and negative on the right of x_0.

13.8.1. Example. Consider the Schrödinger equation in one dimension,

d²ψ/dx² + (2m/ħ²)[E − V(x)]ψ = 0,

where V(x) is a potential well meeting the horizontal line of constant E at x = a and x = b, so that

q(x) = (2m/ħ²)[E − V(x)]  is > 0 if a < x < b, and < 0 if x < a or x > b.

The solution that is bounded to the left of a must be exponentially decaying. Therefore, in the interval (a, b) the approximate solution, as given by Equation (13.49), is

ψ(x) ≈ (A/(E − V)^{1/4}) cos(∫_a^x √((2m/ħ²)[E − V(x)]) dx − π/4),

where A is some arbitrary constant. The solution that is bounded to the right of b must also be exponentially decaying. Hence, the solution for a < x < b is

8. The interested reader is referred to the book by Mathews and Walker, pp. 27–37.
$$\psi(x) \approx \frac{B}{(E - V)^{1/4}}\cos\left(\int_x^b\sqrt{\frac{2m}{\hbar^2}\left[E - V(x)\right]}\,dx - \frac{\pi}{4}\right).$$

Since these two expressions give the same function in the same region, they must be equal. Thus, A = B, and, more importantly,

$$\cos\left(\int_a^x\sqrt{\frac{2m}{\hbar^2}[E - V(x)]}\,dx - \frac{\pi}{4}\right) = \cos\left(\int_x^b\sqrt{\frac{2m}{\hbar^2}[E - V(x)]}\,dx - \frac{\pi}{4}\right),$$

or

$$\int_a^b\sqrt{2m[E - V(x)]}\,dx = \left(n + \tfrac{1}{2}\right)\pi\hbar.$$

This is essentially the Bohr-Sommerfeld quantization condition of pre-1925 quantum mechanics. ∎

13.8.1 Classical Limit of the Schrödinger Equation

As long as we are approximating solutions of second-order DEs that arise naturally from the Schrödinger equation, it is instructive to look at another approximation to the Schrödinger equation, its classical limit, in which the Planck constant goes to zero. The idea is to note that since ψ(r, t) is a complex function, one can write it as

$$\psi(\mathbf{r}, t) = A(\mathbf{r}, t)\exp\left[\frac{i}{\hbar}S(\mathbf{r}, t)\right], \tag{13.50}$$

where A(r, t) and S(r, t) are real-valued functions. Substituting (13.50) in the Schrödinger equation and separating the real and the imaginary parts yields

$$\frac{\partial S}{\partial t} + \frac{(\nabla S)^2}{2m} + V = \frac{\hbar^2}{2m}\frac{\nabla^2 A}{A}, \qquad m\frac{\partial A}{\partial t} + \nabla A\cdot\nabla S + \frac{A}{2}\nabla^2 S = 0. \tag{13.51}$$

These two equations are completely equivalent to the Schrödinger equation. The second equation has a direct physical interpretation. Define

$$\rho(\mathbf{r}, t) \equiv A^2(\mathbf{r}, t) = |\psi(\mathbf{r}, t)|^2 \quad\text{and}\quad \mathbf{J}(\mathbf{r}, t) \equiv A^2(\mathbf{r}, t)\underbrace{\frac{\nabla S}{m}}_{=\mathbf{v}} = \rho\mathbf{v}, \tag{13.52}$$

multiply the second equation in (13.51) by 2A/m, and note that it then can be written as

$$\frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{J} = 0, \tag{13.53}$$

which is the continuity equation for probability. The fact that J is indeed the probability current density is left for Problem 13.30.
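As a small numerical aside (a sketch of mine, not from the text): for the harmonic-oscillator well V(x) = ½mω²x², the action integral on the left of the Bohr-Sommerfeld condition can be evaluated to check that E = (n + ½)ħω satisfies the condition exactly. The choice of units ħ = m = ω = 1 is an assumption of the sketch.

```python
import math

def action_integral(E, n_steps=20_000):
    """Integrate sqrt(2m[E - V(x)]) between the turning points of
    V(x) = x**2 / 2, in units hbar = m = omega = 1.
    The substitution x = a*sin(theta) with a = sqrt(2E) removes the
    square-root singularity at the turning points x = -a, +a."""
    a = math.sqrt(2.0 * E)
    total = 0.0
    dtheta = math.pi / n_steps          # theta runs over (-pi/2, pi/2)
    for i in range(n_steps):
        theta = -math.pi / 2 + (i + 0.5) * dtheta
        x = a * math.sin(theta)
        # integrand sqrt(2E - x^2) times dx/dtheta = a*cos(theta)
        total += math.sqrt(2.0 * E - x * x) * a * math.cos(theta) * dtheta
    return total

for n in range(4):
    E = n + 0.5                          # candidate WKB eigenvalue
    lhs = action_integral(E)
    rhs = (n + 0.5) * math.pi            # (n + 1/2) * pi * hbar
    print(n, lhs, rhs)                   # the two columns agree
```

For this particular potential the WKB quantization happens to reproduce the exact spectrum, which is why the agreement is to machine-level accuracy rather than merely approximate.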
The Schrödinger equation describes a classical statistical mixture when ħ → 0.

The first equation of (13.51) gives an interesting result when ħ → 0, because in this limit the RHS of the equation will be zero, and we get

$$\frac{\partial S}{\partial t} + \frac{1}{2}mv^2 + V = 0.$$

Taking the gradient of this equation, we obtain

$$\left(\frac{\partial}{\partial t} + \mathbf{v}\cdot\nabla\right)m\mathbf{v} + \nabla V = 0,$$

which is the equation of motion of a classical fluid with velocity field v = ∇S/m. We thus have the following:

13.8.2. Proposition. In the classical limit, the solution of the Schrödinger equation describes a fluid (statistical mixture) of noninteracting classical particles of mass m subject to the potential V(r). The density and the current density of this fluid are, respectively, the probability density ρ = |ψ|² and the probability current density J of the quantum particle.

13.9 Numerical Solutions of DEs

The majority of differential equations encountered in physics do not have known analytic solutions. One therefore resorts to numerical solutions. There are a variety of methods having various degrees of simplicity of use and accuracy. This section considers a few representatives applicable to the solution of ODEs. We make frequent use of techniques developed in Section 2.6. Therefore, the reader is urged to consult that section as needed.

Any normal differential equation of nth order,

$$\frac{d^nx}{dt^n} = F(x, \dot{x}, \ldots, x^{(n-1)}, t),$$

can be reduced to a system of n first-order differential equations by defining x₁ = x, x₂ = ẋ, ..., xₙ = x^{(n-1)}. This gives the system

$$\dot{x}_1 = x_2, \quad \dot{x}_2 = x_3, \quad \ldots, \quad \dot{x}_n = F(x_1, x_2, \ldots, x_n, t).$$

We restrict ourselves to a FODE of the form ẋ = f(x, t) in which f is a well-behaved function of two real variables. At the end of the section, we briefly outline a technique for solving second-order DEs.

Two general types of problems are encountered in applications. An initial value problem (IVP) gives x(t) at an initial time t₀ and asks for the value of x at other times.
The second type, the boundary value problem (BVP), applies only to differential equations of order higher than first. A second-order BVP specifies the value of x(t) and/or ẋ(t) at one or more points and asks for x or ẋ at other values of t. We shall consider only IVPs.
13.9.1 Using the Backward Difference Operator

Let us consider the IVP

$$\dot{x} = f(x, t), \qquad x(t_0) = x_0. \tag{13.54}$$

The problem is to find {xₖ = x(t₀ + kh)}, k = 1, 2, ..., given (13.54). Let us begin by integrating (13.54) between tₙ and tₙ + h:

$$x(t_n + h) - x(t_n) = \int_{t_n}^{t_n + h}\dot{x}(t)\,dt.$$

Changing the variable of integration to s = (t − tₙ)/h and using the shift operator E introduced in Section 2.6 yields

$$x_{n+1} - x_n = h\int_0^1\dot{x}(t_n + sh)\,ds = h\int_0^1\left[\mathsf{E}^s\dot{x}(t_n)\right]ds. \tag{13.55}$$

Since a typical situation involves calculating x_{n+1} from the values of x(t) and ẋ(t) at preceding steps, we want an expression in which the RHS of Equation (13.55) contains such preceding terms. This suggests expressing E in terms of the backward difference operator ∇. It will also be useful to replace the lower limit of integration by −p, where p is a number to be chosen later for convenience. Thus, Equation (13.55) becomes

$$x_{n+1} = x_{n-p} + h\left[\int_{-p}^1(1 - \nabla)^{-s}\,ds\right]\dot{x}_n = x_{n-p} + h\left[\int_{-p}^1\sum_{k=0}^\infty\frac{\Gamma(-s+1)}{k!\,\Gamma(-s-k+1)}(-\nabla)^k\,ds\right]\dot{x}_n = x_{n-p} + h\left(\sum_{k=0}^\infty a_k^{(p)}\nabla^k\right)\dot{x}_n, \tag{13.56}$$

where

$$a_k^{(p)} = \frac{(-1)^k}{k!}\int_{-p}^1\frac{\Gamma(-s+1)}{\Gamma(-s-k+1)}\,ds = \frac{1}{k!}\int_{-p}^1 s(s+1)\cdots(s+k-1)\,ds. \tag{13.57}$$

Keeping the first few terms for p = 0, we obtain the useful formula

$$x_{n+1} \approx x_n + h\left(1 + \tfrac{1}{2}\nabla + \tfrac{5}{12}\nabla^2 + \tfrac{3}{8}\nabla^3 + \tfrac{251}{720}\nabla^4 + \cdots\right)\dot{x}_n. \tag{13.58}$$

Due to the presence of ∇ in Equation (13.58), finding the value of x(t) at t_{n+1} requires a knowledge of x(t) and ẋ(t) at points t₀, t₁, ..., tₙ. Because of this,
formulas of open and closed type

Equation (13.58) is called a formula of open type. In contrast, in formulas of closed type, the RHS contains values at t_{n+1} as well. We can obtain a formula of closed type by changing E^s ẋₙ to its equivalent form, E^{s−1} ẋ_{n+1}. The result is

$$x_{n+1} = x_{n-p} + h\sum_{k=0}^\infty b_k^{(p)}\nabla^k\dot{x}_{n+1}, \tag{13.59}$$

where

$$b_k^{(p)} \equiv \frac{(-1)^k}{k!}\int_{-p}^1\frac{\Gamma(-s+2)}{\Gamma(-s-k+2)}\,ds.$$

Keeping the first few terms for p = 0, we obtain

$$x_{n+1} \approx x_n + h\left(1 - \tfrac{1}{2}\nabla - \tfrac{1}{12}\nabla^2 - \tfrac{1}{24}\nabla^3 - \tfrac{19}{720}\nabla^4 - \tfrac{3}{160}\nabla^5 - \cdots\right)\dot{x}_{n+1}, \tag{13.60}$$

which involves evaluation at t_{n+1} on the RHS.

For p = 1 (p = 3), Equation (13.56) [(13.59)] results in an expansion in powers of ∇ in which the coefficient of ∇^p (∇^{p+2}) is zero. Thus, retaining terms up to the (p − 1)st [(p + 1)st] power of ∇ automatically gives us an accuracy of h^p (h^{p+2}). This is the advantage of using nonzero values of p and the reason we considered such cases. The reason for the use of formulas of the closed type is the smallness of the error involved.

All the formulas derived in this section involve powers of ∇ operating on ẋₙ or ẋ_{n+1}. This means that to find x_{n+1}, we must know the values of ẋₖ for k ≤ n + 1. However, ẋ = f(x, t), or ẋₖ = f(xₖ, tₖ), implies that knowledge of ẋₖ requires knowledge of xₖ. Therefore, to find x_{n+1}, we must know not only the values of ẋ but also the values of x(t) at tₖ for k ≤ n + 1. In particular, we cannot start with n = 0 because we would get negative indices for x due to the high powers of ∇. This means that the first few values of xₖ must be obtained using a different method. One common method of starting the solution is to use a Taylor series expansion:

$$x_k = x(t_0 + kh) = x_0 + h\dot{x}_0 k + \frac{h^2\ddot{x}_0}{2}k^2 + \cdots, \tag{13.61}$$

where

$$\dot{x}_0 = f(x_0, t_0), \qquad \ddot{x}_0 = \left.\frac{\partial f}{\partial x}\right|_{x_0, t_0}\dot{x}_0 + \left.\frac{\partial f}{\partial t}\right|_{x_0, t_0}, \quad\ldots$$

For the general case, it is clear that the derivatives required for the RHS of Equation (13.61) involve very complicated expressions.
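The p = 0 coefficients of the open and closed formulas both come from elementary polynomial integrals of the type in (13.57), so they can be generated in exact arithmetic. The following is a sketch of mine using Python's fractions module; the helper names are my own.

```python
from fractions import Fraction
from math import factorial

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def coeff(k, shift):
    """(1/k!) * Integral_0^1 (s+shift)(s+shift+1)...(s+shift+k-1) ds, exactly.
    shift = 0 gives the open coefficients a_k^{(0)};
    shift = -1 gives the closed coefficients b_k^{(0)}."""
    p = [Fraction(1)]                                 # empty product = 1
    for j in range(k):
        p = poly_mul(p, [Fraction(shift + j), Fraction(1)])   # factor (s + shift + j)
    integral = sum(c / (n + 1) for n, c in enumerate(p))      # term-by-term on [0, 1]
    return integral / factorial(k)

open_coeffs = [coeff(k, 0) for k in range(5)]
closed_coeffs = [coeff(k, -1) for k in range(5)]
print([str(c) for c in open_coeffs])    # ['1', '1/2', '5/12', '3/8', '251/720']
print([str(c) for c in closed_coeffs])  # ['1', '-1/2', '-1/12', '-1/24', '-19/720']
```

The two printed lists are exactly the coefficient sequences of the open and closed p = 0 formulas quoted in the text.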
The following example illustrates the procedure for a specific case.
13.9.1. Example. Let us solve the IVP ẋ + x + eᵗx² = 0 with x(0) = 1. We can obtain a Taylor series expansion for x by noting that

$$\dot{x} = -x - e^t x^2, \qquad \ddot{x} = -\dot{x} - e^t x^2 - 2e^t x\dot{x}, \qquad \dddot{x} = -\ddot{x} - e^t x^2 - 4e^t x\dot{x} - 2e^t\dot{x}^2 - 2e^t x\ddot{x}.$$

Continuing in this way, we can obtain derivatives of all orders. Substituting x₀ = 1 and keeping terms up to the fifth order, we obtain

$$\dot{x}_0 = -2, \quad \ddot{x}_0 = 5, \quad \dddot{x}_0 = -16, \quad \left.\frac{d^4x}{dt^4}\right|_{t=0} = 65, \quad \left.\frac{d^5x}{dt^5}\right|_{t=0} = -326.$$

Substituting these values in a Taylor series expansion with h = 0.1 yields

$$x_k = 1 - 0.2k + 0.025k^2 - 0.0027k^3 + (2.7\times 10^{-4})k^4 - (2.7\times 10^{-5})k^5 + \cdots.$$

Thus, x₁ = 0.82254, x₂ = 0.68186, and x₃ = 0.56741. The corresponding values of ẋ can be calculated using the DE. We simply quote the result: ẋ₁ = −1.57026, ẋ₂ = −1.24973, ẋ₃ = −1.00200. ∎

Once the starting values are obtained, either a formula of open type or one of closed type is used to find the next x value. Only formulas of open type will be discussed here. However, as mentioned earlier, the accuracy of closed-type formulas is better. The price one pays for having ẋ_{n+1} on the RHS is that using closed-type formulas requires estimating ẋ_{n+1}. This estimate is then substituted in the RHS, and an improved estimate is found. The process is continued until no further improvement in the estimate is achieved.

The use of open-type formulas involves simple substitution of the known quantities ẋ₀, ẋ₁, ..., ẋₙ on the RHS to obtain x_{n+1}. The master equation (for p = 0) is (13.58). The number of powers of ∇ that are retained gives rise to different methods. For instance, when no power is retained, the method is called Euler's method, for which we use x_{n+1} ≈ xₙ + hẋₙ. A more commonly used method is Adam's method, for which all powers of ∇ up to and including the third are retained. We then have

$$x_{n+1} \approx x_n + h\left(1 + \tfrac{1}{2}\nabla + \tfrac{5}{12}\nabla^2 + \tfrac{3}{8}\nabla^3\right)\dot{x}_n,$$

or, in terms of values of ẋ,

$$x_{n+1} \approx x_n + \frac{h}{24}\left(55\dot{x}_n - 59\dot{x}_{n-1} + 37\dot{x}_{n-2} - 9\dot{x}_{n-3}\right). \tag{13.62}$$

Recall that ẋₖ = f(xₖ, tₖ).
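The three tabulated starting values can be checked by evaluating the truncated Taylor polynomial directly (a small sketch of mine, using the rounded coefficients exactly as printed above):

```python
def taylor_start(k):
    """Fifth-order Taylor polynomial of Example 13.9.1 evaluated at t = 0.1*k,
    with the rounded coefficients as printed in the text (h is already
    absorbed into the coefficients)."""
    c = [1.0, -0.2, 0.025, -0.0027, 2.7e-4, -2.7e-5]
    return sum(cj * k**j for j, cj in enumerate(c))

for k in (1, 2, 3):
    print(k, round(taylor_start(k), 5))
# 1 0.82254
# 2 0.68186
# 3 0.56741
```

The printed values reproduce x₁, x₂, x₃ of the example to all five quoted decimals.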
Thus, if we know the values ẋₙ, ẋ_{n−1}, ẋ_{n−2}, and ẋ_{n−3}, we can obtain x_{n+1}.

13.9.2. Example. Knowing x₀, x₁, x₂, and x₃, we can use Equation (13.62) to calculate x₄ for Example 13.9.1:

$$x_4 \approx x_3 + \frac{0.1}{24}\left(55\dot{x}_3 - 59\dot{x}_2 + 37\dot{x}_1 - 9\dot{x}_0\right) = 0.47793.$$
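The computation above is a direct transcription into code (a sketch of mine): take the four Taylor starting values, evaluate the slopes ẋₖ = f(xₖ, tₖ) from the DE, and apply the four-step formula.

```python
import math

def f(x, t):
    # the DE of Example 13.9.1: x' = -x - exp(t) * x**2
    return -x - math.exp(t) * x * x

h = 0.1
xs = [1.0, 0.82254, 0.68186, 0.56741]     # Taylor-series starting values
ts = [0.0, 0.1, 0.2, 0.3]
fs = [f(x, t) for x, t in zip(xs, ts)]    # matches the quoted xdot values

# Adam's method, Eq. (13.62)
x4 = xs[3] + h / 24 * (55 * fs[3] - 59 * fs[2] + 37 * fs[1] - 9 * fs[0])
print(x4)   # about 0.47793, as in the text
```

Recomputing the slopes in full precision rather than using the five-decimal tabulated ones changes the result only in the sixth decimal place.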
With x₄ at our disposal, we can evaluate ẋ₄ = −x₄ − x₄²e^{t₄}, and substitute it in to find x₅, and so on. ∎

A crucial fact about such methods is that every value obtained is in error by some amount, and using such values to obtain new values propagates the error. Thus, error can accumulate rapidly and make approximations worse at each step. Discussion of error propagation and error analysis (topics that we have not, and shall not, cover) is common in the literature (see, for example, [Hild 87, pp. 267-268]).

13.9.2 The Runge-Kutta Method

The FODE of Equation (13.54) leads to a unique Taylor series,

$$x(t_0 + h) = x_0 + h\dot{x}_0 + \frac{h^2}{2}\ddot{x}_0 + \cdots, \tag{13.63}$$

where ẋ₀, ẍ₀, and all the rest of the derivatives can be evaluated by differentiating ẋ = f(x, t). Thus, theoretically, the Taylor series gives the solution (for t₀ + h; but t₀ + 2h, t₀ + 3h, and so on can be obtained similarly). However, in practice, the Taylor series converges slowly, and the accuracy involved is not high. Thus one resorts to other methods of solution such as described earlier.

Another method, known as the Runge-Kutta method, replaces the Taylor series with

$$x_{n+1} = x_n + h\left[a_0 f(x_n, t_n) + \sum_{j=1}^p a_j f(x_n + b_j h,\, t_n + \mu_j h)\right], \tag{13.64}$$

where a₀ and {aⱼ, bⱼ, μⱼ}, j = 1, ..., p, are constants chosen such that if the RHS of (13.64) were expanded in powers of the spacing h, the coefficients of a certain number of the leading terms would agree with the corresponding expansion coefficients of the RHS of (13.63). It is customary to express the b's as linear combinations of preceding values of f:

$$hb_i = \sum_{r=0}^{i-1}\lambda_{ir}k_r, \qquad i = 1, 2, \ldots, p.$$
The kᵣ are recursively defined as

$$k_0 = hf(x_n, t_n), \qquad k_r = hf(x_n + b_r h,\, t_n + \mu_r h).$$

Then Equation (13.64) gives x_{n+1} = xₙ + Σᵣ₌₀ᵖ aᵣkᵣ. The (nontrivial) task now is to determine the parameters aᵣ, μᵣ, and λᵢⱼ.

Carl David Tolmé Runge (1856-1927), after returning from a six-month vacation in Italy, enrolled at the University of Munich to study literature. However, after six weeks of the course he changed to mathematics and physics. Runge attended courses with Max Planck, and they became close friends. In 1877 both went to Berlin, but Runge turned to pure mathematics after attending Weierstrass's lectures. His doctoral dissertation (1880) dealt with differential geometry. After taking a secondary-school teachers certification test, he returned to Berlin, where he was influenced by Kronecker. Runge then worked on a procedure for the numerical solution of algebraic equations in which the roots were expressed as infinite series of rational functions of the coefficients. In the area of numerical analysis, he is credited with an efficient method of solving differential equations numerically, work he did with Martin Kutta.

Runge published little at that stage, but after visiting Mittag-Leffler in Stockholm in September 1884 he produced a large number of papers in Mittag-Leffler's journal Acta Mathematica. In 1886, Runge obtained a chair at Hanover and remained there for 18 years. Within a year Runge had moved away from pure mathematics to study the wavelengths of the spectral lines of elements other than hydrogen. He did a great deal of experimental work and published a great quantity of results, including a separation of the spectral lines of helium into two spectral series. In 1904 Klein persuaded Göttingen to offer Runge a chair of applied mathematics, a post that Runge held until he retired in 1925. Runge was always a fit and active man, and on his 70th birthday he entertained his grandchildren by doing handstands.
However, a few months later he had a heart attack and died.

In general, the determination of these constants is extremely tedious. Let us consider the very simple case where p = 1, and let λ ≡ λ₁₀ and μ ≡ μ₁. Then we obtain

$$x_{n+1} = x_n + a_0 k_0 + a_1 k_1, \quad\text{where}\quad k_0 = hf(x_n, t_n) \quad\text{and}\quad k_1 = hf(x_n + \lambda k_0,\, t_n + \mu h). \tag{13.65}$$

Taylor-expanding k₁, a function of two variables, gives⁹

$$k_1 = hf + h^2(\mu f_t + \lambda f f_x) + \frac{h^3}{2}\left(\mu^2 f_{tt} + 2\lambda\mu f f_{xt} + \lambda^2 f^2 f_{xx}\right) + O(h^4),$$

⁹The symbol O(hᵐ) means that all terms of order hᵐ and higher have been neglected.
where f_t ≡ ∂f/∂t, etc. Substituting this in the first equation of (13.65), we get

$$x_{n+1} = x_n + h(a_0 + a_1)f + h^2 a_1(\mu f_t + \lambda f f_x) + \frac{h^3}{2}a_1\left(\mu^2 f_{tt} + 2\lambda\mu f f_{xt} + \lambda^2 f^2 f_{xx}\right) + O(h^4). \tag{13.66}$$

On the other hand, with

$$\dot{x} = f, \qquad \ddot{x} = \frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial t} = f f_x + f_t, \qquad \dddot{x} = f_{tt} + 2f f_{xt} + f^2 f_{xx} + f_x(f f_x + f_t),$$

Equation (13.63) gives

$$x_{n+1} = x_n + hf + \frac{h^2}{2}(f f_x + f_t) + \frac{h^3}{6}\left[f_{tt} + 2f f_{xt} + f^2 f_{xx} + f_x(f f_x + f_t)\right] + O(h^4). \tag{13.67}$$

If we demand that (13.66) and (13.67) agree up to the h² term (we cannot demand agreement for h³ or higher because of overspecification), then we must have

$$a_0 + a_1 = 1, \qquad a_1\mu = \tfrac{1}{2}, \qquad a_1\lambda = \tfrac{1}{2}.$$

There are only three equations for four unknowns. Therefore, there will be an arbitrary parameter β in terms of which the unknowns can be written:

$$a_0 = 1 - \beta, \qquad a_1 = \beta, \qquad \mu = \lambda = \frac{1}{2\beta}.$$

Substituting these values in Equation (13.65) gives

$$x_{n+1} = x_n + h\left[(1-\beta)f(x_n, t_n) + \beta f\!\left(x_n + \frac{h}{2\beta}f(x_n, t_n),\; t_n + \frac{h}{2\beta}\right)\right] + O(h^3).$$

This formula becomes useful if we let β = ½. Then tₙ + h/(2β) = tₙ + h = t_{n+1}, which makes evaluation of the second term in square brackets convenient. For β = ½, we have

$$x_{n+1} = x_n + \frac{h}{2}\left[f(x_n, t_n) + f(x_n + hf,\, t_{n+1})\right] + O(h^3). \tag{13.68}$$

What is nice about this equation is that it needs no starting up! We can plug in the known quantities tₙ, t_{n+1}, and xₙ on the RHS and find x_{n+1} starting with n = 0. However, the result is not very accurate, and we cannot make it any more accurate by demanding agreement for higher powers of h, because, as mentioned earlier, such a demand overspecifies the unknowns.
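Equation (13.68), the β = ½ choice, is only a few lines of code. As a sketch of mine (not from the text), applying it to the IVP of Example 13.9.1 shows the expected modest second-order accuracy:

```python
import math

def f(x, t):
    return -x - math.exp(t) * x * x      # DE of Example 13.9.1

def step_1368(x, t, h):
    """One step of Eq. (13.68): average the slope at t_n with the slope
    at t_{n+1} evaluated at the Euler predictor x_n + h*f(x_n, t_n)."""
    return x + h / 2 * (f(x, t) + f(x + h * f(x, t), t + h))

x, t, h = 1.0, 0.0, 0.1
for _ in range(4):                        # integrate out to t = 0.4
    x = step_1368(x, t, h)
    t += h

exact = math.exp(-t) / (1 + t)            # analytic solution of the IVP
print(x, exact)                           # agree to a few parts in 10^3
```

The analytic solution used for comparison is the one quoted in Example 13.9.3; the few-per-mille discrepancy is consistent with the O(h³) per-step error of (13.68).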
Martin Wilhelm Kutta (1867-1944) lost his parents when he was still a child, and together with his brother went to his uncle in Breslau to go to gymnasium. He attended the University of Breslau from 1885-1890, and the University of Munich from 1891-1894, concentrating mainly on mathematics, but he was also interested in languages, music, and art. Although he completed the certification for teaching mathematics and physics in 1894, he did not start teaching immediately. Instead, he assisted von Dyck at the Technische Hochschule München until 1897 (and then again from 1899 to 1903). From 1898 to 1899 he studied at Cambridge, and a year later, he finished his Ph.D. at the University of Munich. In 1902, he completed his habilitation in pure and applied mathematics at the Technische Hochschule München, where he became professor of applied mathematics five years later. In 1909 he accepted an offer from the University of Vienna, but a year later he went to the Technische Hochschule Aachen as a professor. From 1912 until his retirement in 1935 he worked at the Technische Hochschule Stuttgart.

Kutta's name is well known not only to physicists, applied mathematicians, and engineers, but also to specialists in aerospace science and fluid mechanics. The first group use the Runge-Kutta method, developed at the beginning of the twentieth century, to obtain numerical solutions to ordinary differential equations. The second group use the Kutta-Zhukovskii formula for the theoretical description of the buoyancy of a body immersed in a nonturbulent moving fluid. Kutta's work on the application of conformal mapping to the study of airplane wings was later applied to the flight of birds, and further developed by L. Prandtl in the theory of wings.
Kutta obtained the motivation for his first scientific publication from Boltzmann and others (including a historian of mathematics) when working on the theoretical determination of the heat exchanged between two concentric cylinders kept at constant temperatures. By applying the conformal mapping technique, Kutta managed to obtain numerical values for the heat conductivity of air that agreed well with the experimental values of the time. Three of Kutta's publications dealt with the history of mathematics, for which he profited greatly because of his knowledge of the Arabic language.

One of the most important tasks of applied mathematics is to approximate numerically the initial value problem of ODEs whose solutions cannot be found in closed form. After Euler (1770) had already expressed the basic idea, Runge (1895) and Heun (1900) wrote down the appropriate formulas. Kutta's contribution was to considerably increase the accuracy, and allow for a larger selection of the parameters involved. After accepting a professorship in Stuttgart in 1912, Kutta devoted all his time to teaching. He was very much in demand as a teacher, and it is said that his lectures were so good that even engineering students took an interest in mathematics. (Taken from W. Schulz, "Martin Wilhelm Kutta," Neue Deutsche Biographie 13, Berlin, (1952-) 348-350.)

Formulas that give more accurate results can be obtained by retaining terms beyond p = 1. Thus, for p = 2, if we write x_{n+1} = xₙ + Σᵣ₌₀² aᵣkᵣ, there will be eight unknowns (three a's, three λᵢⱼ's, and two μ's), and the demand for agreement between the Taylor expansion and the expansion of f up to h³ will yield only six equations. Therefore, there will be two arbitrary parameters whose specification
results in various formulas. The details of this kind of algebraic derivation are very messy, so we will merely consider two specific formulas. One such formula, due to Kutta, is

$$x_{n+1} = x_n + \tfrac{1}{6}(k_0 + 4k_1 + k_2) + O(h^4), \tag{13.69}$$

where

$$k_0 = hf(x_n, t_n), \qquad k_1 = hf\!\left(x_n + \tfrac{1}{2}k_0,\, t_n + \tfrac{1}{2}h\right), \qquad k_2 = hf(x_n + 2k_1 - k_0,\, t_n + h).$$

A second formula, due to Heun, has the form x_{n+1} = xₙ + ¼(k₀ + 3k₂) + O(h⁴), where

$$k_0 = hf(x_n, t_n), \qquad k_1 = hf\!\left(x_n + \tfrac{1}{3}k_0,\, t_n + \tfrac{1}{3}h\right), \qquad k_2 = hf\!\left(x_n + \tfrac{2}{3}k_1,\, t_n + \tfrac{2}{3}h\right).$$

These two formulas are of about the same order of accuracy.

13.9.3. Example. Let us solve the DE of Example 13.9.1 using the Runge-Kutta method. With t₀ = 0, x₀ = 1, h = 0.1, and n = 0, Equation (13.69) gives k₀ = −0.2, k₁ = −0.17515, k₂ = −0.16476, so that

$$x_1 = 1 + \tfrac{1}{6}\left[-0.2 + 4(-0.17515) - 0.16476\right] = 0.82244.$$

This x₁, h = 0.1, and t₁ = t₀ + h = 0.1 yield the following new values: k₀ = −0.15700, k₁ = −0.13870, k₂ = −0.13040, which in turn give

$$x_2 = 0.82244 + \tfrac{1}{6}\left[-0.15700 + 4(-0.13870) - 0.13040\right] = 0.68207.$$

We similarly obtain x₃ = 0.56964 and x₄ = 0.47858. On the other hand, solving the FODE analytically gives the exact result x(t) = e⁻ᵗ/(1 + t). Table 13.1 compares the values obtained here, those obtained using Adam's method, and the exact values to five decimal places. It is clear that the Runge-Kutta method is more accurate than the methods discussed earlier. ∎

The accuracy of the Runge-Kutta method and the fact that it requires no startup procedure make it one of the most popular methods for solving differential equations. The Runge-Kutta method can be made more accurate by using higher values of p. For instance, a formula that is used for p = 3 is

$$x_{n+1} = x_n + \tfrac{1}{6}(k_0 + 2k_1 + 2k_2 + k_3) + O(h^5), \tag{13.70}$$

where

$$k_0 = hf(x_n, t_n), \qquad k_1 = hf\!\left(x_n + \tfrac{1}{2}k_0,\, t_n + \tfrac{1}{2}h\right), \qquad k_2 = hf\!\left(x_n + \tfrac{1}{2}k_1,\, t_n + \tfrac{1}{2}h\right), \qquad k_3 = hf(x_n + k_2,\, t_n + h).$$
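The numbers of Example 13.9.3 can be reproduced with a short loop. This is a sketch of mine implementing the third-order formula (13.69) for the same IVP:

```python
import math

def f(x, t):
    return -x - math.exp(t) * x * x      # DE of Example 13.9.1

def kutta3_step(x, t, h):
    """One step of Kutta's third-order formula, Eq. (13.69)."""
    k0 = h * f(x, t)
    k1 = h * f(x + k0 / 2, t + h / 2)
    k2 = h * f(x + 2 * k1 - k0, t + h)
    return x + (k0 + 4 * k1 + k2) / 6

x, t, h = 1.0, 0.0, 0.1
values = []
for _ in range(4):
    x = kutta3_step(x, t, h)
    t += h
    values.append(round(x, 5))
print(values)   # [0.82244, 0.68207, 0.56964, 0.47858], as in the example
```

The four printed values match the x₁ through x₄ of the example, and they can be compared against the analytic solution e⁻ᵗ/(1 + t) exactly as Table 13.1 does.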
  t      Analytical   Runge-Kutta   Adam's method
  0.0    1            1             1
  0.1    0.82258      0.82244       0.82254
  0.2    0.68228      0.68207       0.68186
  0.3    0.56986      0.56964       0.56741
  0.4    0.47880      0.47858       0.47793

Table 13.1 Solutions to the differential equation of Example 13.9.1 obtained in three different ways.

13.9.3 Higher-Order Equations

Any nth-order differential equation is equivalent to n first-order differential equations in n + 1 variables. Thus, for instance, the most general SODE, F(ẍ, ẋ, x, t) = 0, can be reduced to two FODEs by solving for ẍ to obtain ẍ = G(ẋ, x, t), and defining ẋ = u to get the system of equations

$$\dot{u} = G(u, x, t), \qquad \dot{x} = u.$$

These two equations are completely equivalent to the original SODE. Thus, it is appropriate to discuss numerical solutions of systems of FODEs in several variables. The discussion here will be limited to systems consisting of two equations. The generalization to several equations is not difficult.

Consider the IVP of the following system of equations:

$$\dot{x} = f(x, u, t), \quad x(t_0) = x_0; \qquad \dot{u} = g(x, u, t), \quad u(t_0) = u_0. \tag{13.71}$$
Using an obvious generalization of Equation (13.70), we can write

$$x_{n+1} = x_n + \tfrac{1}{6}(k_0 + 2k_1 + 2k_2 + k_3) + O(h^5), \qquad u_{n+1} = u_n + \tfrac{1}{6}(m_0 + 2m_1 + 2m_2 + m_3) + O(h^5), \tag{13.72}$$

where

$$\begin{aligned}
k_0 &= hf(x_n, u_n, t_n), & k_1 &= hf\!\left(x_n + \tfrac{1}{2}k_0,\, u_n + \tfrac{1}{2}m_0,\, t_n + \tfrac{1}{2}h\right),\\
k_2 &= hf\!\left(x_n + \tfrac{1}{2}k_1,\, u_n + \tfrac{1}{2}m_1,\, t_n + \tfrac{1}{2}h\right), & k_3 &= hf(x_n + k_2,\, u_n + m_2,\, t_n + h),
\end{aligned}$$

and

$$\begin{aligned}
m_0 &= hg(x_n, u_n, t_n), & m_1 &= hg\!\left(x_n + \tfrac{1}{2}k_0,\, u_n + \tfrac{1}{2}m_0,\, t_n + \tfrac{1}{2}h\right),\\
m_2 &= hg\!\left(x_n + \tfrac{1}{2}k_1,\, u_n + \tfrac{1}{2}m_1,\, t_n + \tfrac{1}{2}h\right), & m_3 &= hg(x_n + k_2,\, u_n + m_2,\, t_n + h).
\end{aligned}$$
These formulas are more general than needed for a SODE, since, as mentioned above, such a SODE is equivalent to the simpler system in which f(x, u, t) ≡ u. Therefore, Equation (13.72) specializes to

$$k_0 = hu_n = h\dot{x}_n, \qquad k_1 = h\left(u_n + \tfrac{1}{2}m_0\right) = h\dot{x}_n + \tfrac{1}{2}hm_0, \qquad k_2 = h\dot{x}_n + \tfrac{1}{2}hm_1, \qquad k_3 = h\dot{x}_n + hm_2,$$

and

$$x_{n+1} = x_n + h\dot{x}_n + \frac{h}{6}(m_0 + m_1 + m_2) + O(h^5), \qquad \dot{x}_{n+1} = \dot{x}_n + \tfrac{1}{6}(m_0 + 2m_1 + 2m_2 + m_3) + O(h^5), \tag{13.73}$$

where

$$\begin{aligned}
m_0 &= hg(x_n, \dot{x}_n, t_n), & m_1 &= hg\!\left(x_n + \tfrac{1}{2}h\dot{x}_n,\, \dot{x}_n + \tfrac{1}{2}m_0,\, t_n + \tfrac{1}{2}h\right),\\
m_2 &= hg\!\left(x_n + \tfrac{1}{2}h\dot{x}_n + \tfrac{1}{4}hm_0,\, \dot{x}_n + \tfrac{1}{2}m_1,\, t_n + \tfrac{1}{2}h\right), & m_3 &= hg\!\left(x_n + h\dot{x}_n + \tfrac{1}{2}hm_1,\, \dot{x}_n + m_2,\, t_n + h\right).
\end{aligned}$$

13.9.4. Example. The IVP ẍ + x = 0, x(0) = 0, ẋ(0) = 1 clearly has the analytic solution x(t) = sin t. Nevertheless, let us use Equation (13.73) to illustrate the Runge-Kutta method and compare the result with the exact solution. For this problem g(x, ẋ, t) = −x. Therefore, we can easily calculate the m's:

$$m_0 = -hx_n, \qquad m_1 = -h\left(x_n + \tfrac{1}{2}h\dot{x}_n\right), \qquad m_2 = -h\left(x_n + \tfrac{1}{2}h\dot{x}_n - \tfrac{1}{4}h^2 x_n\right), \qquad m_3 = -h\left[x_n + h\dot{x}_n - \tfrac{1}{2}h^2\left(x_n + \tfrac{1}{2}h\dot{x}_n\right)\right].$$

These lead to the following expressions for x_{n+1} and ẋ_{n+1}:

$$x_{n+1} = x_n + h\dot{x}_n - \frac{h^2}{6}\left(3x_n + h\dot{x}_n - \tfrac{1}{4}h^2 x_n\right), \qquad \dot{x}_{n+1} = \dot{x}_n - \frac{h}{6}\left[6x_n + 3h\dot{x}_n - h^2\left(x_n + \tfrac{1}{4}h\dot{x}_n\right)\right].$$

Starting with x₀ = 0 and ẋ₀ = 1, we can generate x₁, x₂, and so on by using the last two equations successively. The results for 10 values of x with h = 0.1 are given to five significant figures in Table 13.2. Note that up to x₅ there is complete agreement with the exact result. ∎

The Runge-Kutta method lends itself readily to use in computer programs. Because Equation (13.73) does not require any startups, it can be used directly to generate solutions to any IVP involving a SODE.

Another, more direct, method of solving higher-order differential equations is to substitute D = −(1/h) ln(1 − ∇) for the derivative operator in the differential equation, expand in terms of ∇, and keep an appropriate number of terms. Problem 13.33 illustrates this point for a linear SODE.
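The recursion of the example is a short loop in code. The following sketch of mine uses the general system form (13.72) with f(x, u, t) = u and g = −x, which is algebraically identical to (13.73):

```python
import math

def rk4_system_step(x, u, t, h, f, g):
    """One step of Eq. (13.72) for the system x' = f(x,u,t), u' = g(x,u,t)."""
    k0 = h * f(x, u, t)
    m0 = h * g(x, u, t)
    k1 = h * f(x + k0 / 2, u + m0 / 2, t + h / 2)
    m1 = h * g(x + k0 / 2, u + m0 / 2, t + h / 2)
    k2 = h * f(x + k1 / 2, u + m1 / 2, t + h / 2)
    m2 = h * g(x + k1 / 2, u + m1 / 2, t + h / 2)
    k3 = h * f(x + k2, u + m2, t + h)
    m3 = h * g(x + k2, u + m2, t + h)
    return (x + (k0 + 2 * k1 + 2 * k2 + k3) / 6,
            u + (m0 + 2 * m1 + 2 * m2 + m3) / 6)

f = lambda x, u, t: u        # x' = u, the SODE specialization
g = lambda x, u, t: -x       # u' = -x, i.e. x'' + x = 0

x, u, t, h = 0.0, 1.0, 0.0, 0.1
for _ in range(10):
    x, u = rk4_system_step(x, u, t, h, f, g)
    t += h
print(round(x, 5), round(math.sin(1.0), 5))   # both print 0.84147
```

Carrying full double precision, the computed x(1.0) agrees with sin 1 to about seven figures; the last-digit drift visible in Table 13.2 presumably reflects the five-significant-figure arithmetic used to produce that table.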
  t      Runge-Kutta   sin t
  0.1    0.09983       0.09983
  0.2    0.19867       0.19867
  0.3    0.29552       0.29552
  0.4    0.38942       0.38942
  0.5    0.47943       0.47943
  0.6    0.56466       0.56464
  0.7    0.64425       0.64422
  0.8    0.71741       0.71736
  0.9    0.78342       0.78333
  1.0    0.84161       0.84147

Table 13.2 Comparison of the Runge-Kutta and exact solutions to the second-order DE of Example 13.9.4.

13.10 Problems

13.1. Let u(x) be a differentiable function satisfying the differential inequality u'(x) ≤ Ku(x) for x ∈ [a, b], where K is a constant. Show that u(x) ≤ u(a)e^{K(x−a)}. Hint: Multiply both sides of the inequality by e^{−Kx}, and show that the result can be written as the derivative of a nonincreasing function. Then use the fact that a ≤ x to get the final result.

13.2. Prove Proposition 13.4.2.

13.3. Let f and g be two differentiable functions that are linearly dependent. Show that their Wronskian vanishes.

13.4. Show that if (f₁, f₁′) and (f₂, f₂′) are linearly dependent at one point, then f₁ and f₂ are linearly dependent at all x ∈ [a, b]. Here f₁ and f₂ are solutions of the DE of (13.12). Hint: Derive the identity

$$W(f_1, f_2; x_2) = W(f_1, f_2; x_1)\exp\left\{-\int_{x_1}^{x_2}p(t)\,dt\right\}.$$

13.5. Show that the solutions to the SOLDE y″ + q(x)y = 0 have a constant Wronskian.

13.6. Find (in terms of an integral) Gₙ(x), the linearly independent "partner" of the Hermite polynomial Hₙ(x). Specialize this to n = 0, 1. Is it possible to find G₀(x) and G₁(x) in terms of elementary functions?

13.7. Let f₁, f₂, and f₃ be any three solutions of y″ + py′ + qy = 0. Show that the (generalized 3 × 3) Wronskian of these solutions is zero. Thus, any three solutions of the HSOLDE are linearly dependent.
13.8. For the HSOLDE y″ + py′ + qy = 0, show that

$$p = \frac{f_2 f_1'' - f_1 f_2''}{W(f_1, f_2)} \qquad\text{and}\qquad q = \frac{f_1' f_2'' - f_2' f_1''}{W(f_1, f_2)}.$$

Thus, knowing two solutions of an HSOLDE allows us to reconstruct the DE.

13.9. Let f₁, f₂, and f₃ be three solutions of the third-order linear differential equation y‴ + p₂(x)y″ + p₁(x)y′ + p₀(x)y = 0. Derive a FODE satisfied by the (generalized 3 × 3) Wronskian of these solutions.

13.10. Prove Corollary 13.4.13. Hint: Consider the solution u = 1 of the DE u″ = 0 and apply Theorem 13.4.11.

13.11. Show that the adjoint of M given in Equation (13.21) is the original L.

13.12. Show that if u(x) and v(x) are solutions of the self-adjoint DE (pu′)′ + qu = 0, then Abel's identity, p(uv′ − vu′) = constant, holds.

13.13. Reduce each DE to self-adjoint form.
(a) x²y″ + xy′ + y = 0.  (b) y″ + y′ tan x = 0.

13.14. Reduce the self-adjoint DE (py′)′ + qy = 0 to u″ + S(x)u = 0 by an appropriate change of the dependent variable. What is S(x)? Apply this reduction to the Legendre DE for Pₙ(x), and show that

$$S(x) = \frac{1 + n(n+1) - n(n+1)x^2}{(1 - x^2)^2}.$$

Now use this result to show that every solution of the Legendre equation has at least (2n + 1)/π zeros on (−1, +1).

13.15. Substitute v = y′/y in the homogeneous SOLDE y″ + p(x)y′ + q(x)y = 0 and:
(a) Show that it turns into v′ + v² + p(x)v + q(x) = 0, which is a first-order nonlinear equation called the Riccati equation. Would the same substitution work if the DE were inhomogeneous?
(b) Show that by an appropriate transformation, the Riccati equation can be directly cast in the form u′ + u² + S(x) = 0.

13.16. For the function S(x) defined in Example 13.6.1, let S⁻¹(x) be the inverse, i.e., S⁻¹(S(x)) = x. Show that

$$\frac{d}{dx}\left[S^{-1}(x)\right] = \frac{1}{\sqrt{1 - x^2}},$$
and given that S⁻¹(0) = 0, conclude that

$$S^{-1}(x) = \int_0^x\frac{dt}{\sqrt{1 - t^2}}.$$

13.17. Define sinh x and cosh x as the solutions of y″ = y satisfying the boundary conditions y(0) = 0, y′(0) = 1 and y(0) = 1, y′(0) = 0, respectively. Using Example 13.6.1 as a guide, show that
(a) cosh²x − sinh²x = 1.  (b) cosh(−x) = cosh x.
(c) sinh(−x) = −sinh x.  (d) sinh(a + x) = sinh a cosh x + cosh a sinh x.

13.18. (a) Derive Equation (13.30) of Example 13.6.4.
(b) Derive Equation (13.31) of Example 13.6.4 by direct substitution.
(c) Let λ = l(l + 1) in Example 13.6.4 and calculate the Legendre polynomials P_l(x) for l = 0, 1, 2, 3, subject to the condition P_l(1) = 1.

13.19. Use Equation (13.33) of Example 13.6.5 to generate the first three Hermite polynomials. Use the normalization to determine the arbitrary constant.

13.20. The function defined by f(x) = Σ_{n=0}^∞ cₙxⁿ, where

$$c_{n+2} = \frac{2n - \lambda}{(n+1)(n+2)}c_n,$$

can be written as f(x) = c₀g(x) + c₁h(x), where g is even and h is odd in x. Show that f(x) goes to infinity at least as fast as e^{x²} does, i.e., lim_{x→∞} f(x)e^{−x²} ≠ 0. Hint: Consider g(x) and h(x) separately and show that

$$g(x) = \sum_{n=0}^\infty b_n x^{2n}, \qquad\text{where}\qquad b_{n+1} = \frac{4n - \lambda}{(2n+1)(2n+2)}b_n.$$

Then concentrate on the ratio g(x)/e^{x²}, where g and e^{x²} are approximated by polynomials of very high degrees. Take the limit of this ratio as x → ∞, and use recursion relations for g and e^{x²}. The odd case follows similarly.

13.21. Refer to Example 13.6.6 for this problem.
(a) Derive the commutation relation [a, a†] = 1.
(b) Show that the Hamiltonian can be written as given in Equation (13.34).
(c) Derive the commutation relation [a, (a†)ⁿ] = n(a†)ⁿ⁻¹.
(d) Take the inner product of Equation (13.36) with itself and use (c) to show that |cₙ|² = n|c_{n−1}|². From this, conclude that |cₙ|² = n!|c₀|².
(e) For any function f(y), show that

$$\left(y - \frac{d}{dy}\right)\left(e^{y^2/2}f\right) = -e^{y^2/2}\frac{df}{dy}.$$

Apply (y − d/dy) repeatedly to both sides of the above equation to obtain

$$\left(y - \frac{d}{dy}\right)^n\left(e^{y^2/2}f\right) = (-1)^n e^{y^2/2}\frac{d^n f}{dy^n}.$$

(f) Choose an appropriate f(y) in part (e) and show that

$$e^{y^2/2}\left(y - \frac{d}{dy}\right)^n e^{-y^2/2} = (-1)^n e^{y^2}\frac{d^n}{dy^n}\left(e^{-y^2}\right).$$

13.22. Solve Airy's DE, y″ + xy = 0, by the power-series method. Show that the radius of convergence for both independent solutions is infinite. Use the comparison theorem to show that for x > 0 these solutions have infinitely many zeros, but for x < 0 they can have at most one zero.

13.23. Show that the functions xʳe^{λx}, where r = 0, 1, 2, ..., k, are linearly independent. Hint: Starting with (D − λ)ᵏ, apply powers of D − λ to a linear combination of xʳe^{λx} for all possible r's.

13.24. Find a basis of real solutions for each DE.
(a) y″ + 5y′ + 6y = 0.  (b) y‴ + 6y″ + 12y′ + 8y = 0.
(c) d⁴y/dx⁴ = y.  (d) d⁴y/dx⁴ = −y.

13.25. Solve the following initial value problems.
(a) d⁴y/dx⁴ = y,  y(0) = y′(0) = y‴(0) = 0, y″(0) = 1.
(b) d⁴y/dx⁴ + d²y/dx² = 0,  y(0) = y″(0) = y‴(0) = 0, y′(0) = 1.
(c) d⁴y/dx⁴ = 0,  y(0) = y′(0) = y″(0) = 0, y‴(0) = 2.

13.26. Solve y″ − 2y′ + y = xeˣ subject to the initial conditions y(0) = 0, y′(0) = 1.
13.27. Find the general solution of each equation.
(a) y″ = xeˣ.  (b) y″ − 4y′ + 4y = x².
(c) y″ + y = sin x sin 2x.  (d) y″ − y = (1 + e⁻ˣ)².
(e) y″ − y = eˣ sin 2x.  (f) y⁽⁶⁾ − y⁽⁴⁾ = x².
(g) y″ − 4y′ + 4y = eˣ + xe²ˣ.  (h) y″ + y = e²ˣ.

13.28. Consider the Euler equation, xⁿy⁽ⁿ⁾ + a_{n−1}x^{n−1}y^{(n−1)} + ··· + a₁xy′ + a₀y = r(x). Substitute x = eᵗ and show that such a substitution reduces this to a DE with constant coefficients. In particular, solve x²y″ − 4xy′ + 6y = x.

13.29. (a) Show that the substitution (13.50) reduces the Schrödinger equation to (13.51).
(b) From the second equation of (13.51), derive the continuity equation for probability.

13.30. Show that the usual definition of probability current density,

$$\mathbf{J} = \mathrm{Re}\left[\psi^*\frac{\hbar}{im}\nabla\psi\right],$$

reduces to that in Equation (13.52) if we use (13.50).

13.31. Write a computer program that solves the following differential equations by (a) Adam's method [Equation (13.62)] and (b) the Runge-Kutta method [Equation (13.70)].

ẋ = t − x²,  x(0) = 1;    ẋ = t + sin x,  x(0) = π/2;
ẋ = e^{−xt},  x(0) = 1;    ẋ = sin xt,  x(0) = 1;
ẋ = x²t² + 1,  x(0) = 1.

13.32. Solve the following IVPs numerically, with h = 0.1. Find the first ten values of x.
(a) ẍ + 0.2ẋ² + 10x = 20t,  x(0) = 0, ẋ(0) = 0.
(b) ẍ + 4x = t²,  x(0) = 1, ẋ(0) = 0.
(c) ẍ + ẋ + x = 0,  x(0) = 2, ẋ(0) = 0.
(d) tẍ + ẋ + xt = 0,  x(0) = 1, ẋ(0) = 0.
(e) ẍ + ẋ + x² = t,  x(0) = 1, ẋ(0) = 0.
(f) ẍ + xt = 0,  x(0) = 0, ẋ(0) = 1.
(g) ẍ + sin x = 0,  x(0) = π/2, ẋ(0) = 0.
14
Complex Analysis of SOLDEs

We have familiarized ourselves with some useful techniques for finding solutions to differential equations. One powerful method that leads to formal solutions is power series. We also stated Theorem 13.6.7, which guarantees the convergence of the power-series solution within a circle whose size is at least as large as the smallest of the circles of convergence of the coefficient functions. Thus, the convergence of the solution is related to the convergence of the coefficient functions. What about the nature of the convergence, or the analyticity of the solution? Is it related to the analyticity of the coefficient functions? If so, how? Are the singular points of the coefficients also singular points of the solution? Is the nature of the singularities the same? This chapter answers some of these questions.

Analyticity is best handled in the complex plane. An important reason for this is the property of analytic continuation discussed in Chapter 11. The differential equation du/dx = u^2 has a solution u = -1/x for all x except x = 0. Thus, we have to "puncture" the real line by removing x = 0 from it. Then we have two solutions, because the domain of definition of u = -1/x is not connected on the real line (technically, the definition of a function includes its domain as well as the rule for going from the domain to the range). In addition, if we confine ourselves to the real line, there is no way that we can connect the x > 0 region to the x < 0 region. However, in the complex plane the same equation, dw/dz = w^2, has the complex solution w = -1/z, which is analytic everywhere except at z = 0. Puncturing the complex plane does not destroy the connectivity of the region of definition of w. Thus, the solution in the x > 0 region can be analytically continued to the solution in the x < 0 region by going around the origin.
The aim of this chapter is to investigate the analytic properties of the solutions of some well-known SOLDEs in mathematical physics. We begin with a result from differential equation theory (for a proof, see [Birk 78, p. 223]).
14.1 ANALYTIC PROPERTIES OF COMPLEX DES 401

continuation principle

14.0.1. Proposition. (continuation principle) The function obtained by analytic continuation of any solution of an analytic differential equation along any path in the complex plane is a solution of the analytic continuation of the differential equation along the same path.

An analytic differential equation is one with analytic coefficient functions. This proposition makes it possible to find a solution in one region of the complex plane and then continue it analytically. The following example shows how the singularities of the coefficient functions affect the behavior of the solution.

14.0.2. Example. Let us consider the FOLDE w' - (γ/z)w = 0 for γ ∈ ℝ. The coefficient function p(z) = -γ/z has a simple pole at z = 0. The solution to the FOLDE is easily found to be w = z^γ. Thus, depending on whether γ is a nonnegative integer, a negative integer -m, or a noninteger, the solution has a regular point, a pole of order m, or a branch point at z = 0, respectively. ∎

This example shows that the singularities of the solution depend on the parameters of the differential equation.

14.1 Analytic Properties of Complex DEs

To prepare for discussing the analytic properties of the solutions of SOLDEs, let us consider some general properties of differential equations from a complex analytical point of view.

14.1.1 Complex FOLDEs

In the homogeneous FOLDE

dw/dz + p(z)w = 0,   (14.1)

p(z) is assumed to have only isolated singular points. It follows that p(z) can be expanded about a point z₀ (which may be a singularity of p(z)) as a Laurent series in some annular region r₁ < |z - z₀| < r₂:

p(z) = Σ_{n=-∞}^{∞} aₙ(z - z₀)ⁿ   where r₁ < |z - z₀| < r₂.

The solution to Equation (14.1), as given in Theorem 13.2.1 with q = 0, is

w(z) = exp[-∫ p(z) dz]
     = C exp[-a₋₁ ∫ dz/(z - z₀) - Σ_{n=0}^{∞} aₙ ∫ (z - z₀)ⁿ dz - Σ_{n=2}^{∞} a₋ₙ ∫ (z - z₀)^{-n} dz]
     = C exp[-a₋₁ ln(z - z₀) - Σ_{n=0}^{∞} (aₙ/(n+1))(z - z₀)^{n+1} + Σ_{n=1}^{∞} (a₋ₙ₋₁/n)(z - z₀)^{-n}].
402 14. COMPLEX ANALYSIS OF SOLDES

We can write this solution as

w(z) = C(z - z₀)^α g(z),   (14.2)

where α ≡ -a₋₁ and g(z) is an analytic single-valued function in the annular region r₁ < |z - z₀| < r₂, because g(z) is the exponential of an analytic function. For the special case in which p has a simple pole, i.e., when a₋ₙ = 0 for all n ≥ 2, the second sum in the exponent will be absent, and g will be analytic even at z₀. In fact, g(z₀) = 1, and choosing C = 1, we can write

w(z) = (z - z₀)^α g(z),   g(z₀) = 1.   (14.3)

The singularity of the coefficient function of an FOLDE determines the singularity of the solution.

Depending on the nature of the singularity of p(z) at z₀, the solutions given by Equation (14.2) have different classifications. For instance, if p(z) has a removable singularity (if a₋ₙ = 0 for all n ≥ 1), the solution is Cg(z), which is analytic. In this case, we say that the FOLDE [Equation (14.1)] has a removable singularity at z₀. If p(z) has a simple pole at z₀ (if a₋₁ ≠ 0 and a₋ₙ = 0 for all n ≥ 2), then in general, the solution has a branch point at z₀. In this case we say that the FOLDE has a regular singular point. Finally, if p(z) has a pole of order m > 1, then the solution will have an essential singularity (see Problem 14.1). In this case the FOLDE is said to have an irregular singular point.

To arrive at the solution given by Equation (14.2), we had to solve the FOLDE. Since higher-order differential equations are not as easily solved, it is desirable to obtain such a solution through other considerations. The following example sets the stage for this endeavor.

14.1.1. Example. An FOLDE has a unique solution, to within a multiplicative constant, given by Theorem 13.2.1. Thus, given a solution w(z), any other solution must be of the form Cw(z). Let z₀ be a singularity of p(z), and let z - z₀ = re^{iθ}. Start at a point z and circle z₀ so that θ → θ + 2π. Even though p(z) may have a simple pole at z₀, the solution may have a branch point there.
This is clear from the general solution, where α may be a noninteger. Thus, w̃(z) ≡ w(z₀ + re^{i(θ+2π)}) may be different from w(z). To discover this branch point without solving the DE, invoke Proposition 14.0.1 and conclude that w̃(z) is also a solution to the FOLDE. Thus, w̃(z) can be different from w(z) by at most a multiplicative constant: w̃(z) = Cw(z). Define the complex number α by C = e^{2πiα}. Then the function g(z) ≡ (z - z₀)^{-α} w(z) is single-valued around z₀. In fact,

g(z₀ + re^{i(θ+2π)}) = [re^{i(θ+2π)}]^{-α} w(z₀ + re^{i(θ+2π)})
                     = (z - z₀)^{-α} e^{-2πiα} e^{2πiα} w(z) = (z - z₀)^{-α} w(z) = g(z).

This argument shows that a solution w(z) of the FOLDE of Equation (14.1) can be written as w(z) = (z - z₀)^α g(z), where g(z) is single-valued. ∎
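The multivaluedness exploited in this example is easy to exhibit numerically. The sketch below (plain Python; the function name is our own) continues z^α one full counterclockwise turn around z₀ = 0 by letting the argument grow continuously, and compares the result with the starting value; the continued value is multiplied by e^{2πiα}, which equals 1 only when α is an integer:

```python
import cmath
import math

def continue_power(alpha, r=1.0, theta=0.0, turns=1):
    """Value of z^alpha at z = r*e^{i*theta} after analytic continuation
    through `turns` full counterclockwise circles around z = 0.
    The argument is tracked continuously instead of being reduced mod 2*pi."""
    total_angle = theta + 2 * math.pi * turns
    return cmath.exp(alpha * (math.log(r) + 1j * total_angle))

w0 = continue_power(0.5, turns=0)  # starting value at z = 1
w1 = continue_power(0.5, turns=1)  # after one turn: multiplied by e^{i*pi} = -1
```

For α = 1/2 the ratio w1/w0 is -1 (a branch point), while for integer α (including the negative integers of Example 14.0.2, where the solution merely has a pole) the continuation returns to its starting value, in agreement with the classification above.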
14.1 ANALYTIC PROPERTIES OF COMPLEX DES 403

14.1.2 The Circuit Matrix

The method used in Example 14.1.1 can be generalized to obtain a similar result for the NOLDE

L[w] = dⁿw/dzⁿ + p_{n-1}(z) d^{n-1}w/dz^{n-1} + ··· + p₁(z) dw/dz + p₀(z)w = 0,   (14.4)

where all the pᵢ(z) are analytic in r₁ < |z - z₀| < r₂. Let {w_j(z)}_{j=1}^{n} be a basis of solutions of Equation (14.4), and let z - z₀ = re^{iθ}. Start at z and analytically continue the functions w_j(z) one complete turn to θ + 2π. Let w̃_j(z) ≡ w_j(z₀ + re^{i(θ+2π)}). Then, by a generalization of Proposition 14.0.1, {w̃_j(z)}_{j=1}^{n} are not only solutions, but they are linearly independent (because they are the w_j's evaluated at a different point). Therefore, they also form a basis of solutions. On the other hand, each w̃_j(z) can be expressed as a linear combination of the w_j(z). Thus, w̃_j(z) = w_j(z₀ + re^{i(θ+2π)}) = Σ_{k=1}^{n} a_{jk}w_k(z). The

circuit matrix

matrix A = (a_{jk}), called the circuit matrix of the NOLDE, is invertible, because it transforms one basis into another. Therefore, it has only nonzero eigenvalues. We let λ be one such eigenvalue, and choose the column vector C, with entries {c_j}_{j=1}^{n}, to be the corresponding eigenvector of the transpose of A (note that A and A^t have the same set of eigenvalues). At least one such eigenvector always exists, because the characteristic polynomial of A^t has at least one root. Now we let w(z) = Σ_{j=1}^{n} c_jw_j(z). Clearly, this w(z) is a solution of (14.4), and

w̃(z) ≡ w(z₀ + re^{i(θ+2π)}) = Σ_{j=1}^{n} c_jw_j(z₀ + re^{i(θ+2π)})
     = Σ_{j=1}^{n} c_j Σ_{k=1}^{n} a_{jk}w_k(z) = Σ_{j,k} (A^t)_{kj}c_jw_k(z) = Σ_{k=1}^{n} λc_kw_k(z) = λw(z).

If we define α by λ = e^{2πiα}, then w(z₀ + re^{i(θ+2π)}) = e^{2πiα}w(z). Now we write f(z) ≡ (z - z₀)^{-α}w(z). Following the argument used in Example 14.1.1, we get f(z₀ + re^{i(θ+2π)}) = f(z); that is, f(z) is single-valued around z₀. We thus have the following theorem.

14.1.2. Theorem.
Any homogeneous NOLDE with coefficient functions analytic in r₁ < |z - z₀| < r₂ admits a solution of the form

w(z) = (z - z₀)^α f(z),

where f(z) is single-valued around z₀ in r₁ < |z - z₀| < r₂.

An isolated singular point z₀ near which an analytic function w(z) can be written as w(z) = (z - z₀)^α f(z), where f(z) is single-valued and analytic in the punctured neighborhood of z₀, is called a simple branch point of w(z). The arguments leading to Theorem 14.1.2 imply that a solution with a simple branch
404 14. COMPLEX ANALYSIS OF SOLDES

point exists if and only if the vector C whose components appear in w(z) is an eigenvector of A^t, the transpose of the circuit matrix. Thus, there are as many solutions with simple branch points as there are linearly independent eigenvectors of A^t.

14.2 Complex SOLDEs

canonical basis of the SOLDE

Let us now consider the SOLDE w'' + p(z)w' + q(z)w = 0. Given two linearly independent solutions w₁(z) and w₂(z), we form the 2 × 2 circuit matrix A and try to diagonalize it. There are three possible outcomes:

1. The matrix A is diagonalizable, and we can find two eigenvectors, F(z) and G(z), corresponding, respectively, to two distinct eigenvalues, λ₁ and λ₂. This means that F(z₀ + re^{i(θ+2π)}) = λ₁F(z) and G(z₀ + re^{i(θ+2π)}) = λ₂G(z). Defining λ₁ = e^{2πiα} and λ₂ = e^{2πiβ}, we get F(z) = (z - z₀)^α f(z) and G(z) = (z - z₀)^β g(z), as Theorem 14.1.2 suggests. The set {F(z), G(z)} is called a canonical basis of the SOLDE.

2. The matrix A is diagonalizable, and the two eigenvalues are the same. In this case both F(z) and G(z) have the same constant α: F(z) = (z - z₀)^α f(z) and G(z) = (z - z₀)^α g(z).

3. We cannot find two eigenvectors. This corresponds to the case where A is not diagonalizable. However, we can always find one eigenvector, so A has only one eigenvalue, λ. We let w₁(z) be the solution of the form (z - z₀)^α f(z), where f(z) is single-valued and λ = e^{2πiα}. The existence of such a solution is guaranteed by Theorem 14.1.2. Let w₂(z) be any other linearly independent solution (Theorem 13.3.5 ensures the existence of such a second solution). Then

w̃₁(z) = λw₁(z)   and   w̃₂(z) = aw₁(z) + bw₂(z),

and the circuit matrix will be

A = ( λ  0
      a  b ),

which has eigenvalues λ and b. Since A is assumed to have only one eigenvalue (otherwise we would have the first outcome again), we must have b = λ. This reduces A to

A = ( λ  0
      a  λ ),   where a ≠ 0.

The condition a ≠ 0 is necessary to distinguish this case from the second outcome. Now we analytically continue h(z) ≡ w₂(z)/w₁(z) one whole turn around z₀, obtaining
14.2 COMPLEX SOLDES 405

h̃(z) ≡ h(z₀ + re^{i(θ+2π)}) = w̃₂(z)/w̃₁(z) = [aw₁(z) + λw₂(z)]/[λw₁(z)] = h(z) + a/λ.

It then follows that the function¹

g₁(z) ≡ h(z) - (a/(2πiλ)) ln(z - z₀)

is single-valued in r₁ < |z - z₀| < r₂. If we redefine g₁(z) and w₂(z) as (2πiλ/a)g₁(z) and (2πiλ/a)w₂(z), respectively, we have the following:

14.2.1. Theorem. If p(z) and q(z) are analytic in the annular region r₁ < |z - z₀| < r₂, then the SOLDE w'' + p(z)w' + q(z)w = 0 admits a basis of solutions {w₁, w₂} in the neighborhood of the singular point z₀, where either

w₁(z) = (z - z₀)^α f(z),   w₂(z) = (z - z₀)^β g(z),

or, in exceptional cases (when the circuit matrix is not diagonalizable),

w₁(z) = (z - z₀)^α f(z),   w₂(z) = w₁(z) ln(z - z₀) + (z - z₀)^α g₁(z).

The functions f(z), g(z), and g₁(z) are analytic and single-valued in the annular region.

This theorem allows us to factor out the branch point z₀ from the rest of the solution. However, even though f(z), g(z), and g₁(z) are analytic in the annular region r₁ < |z - z₀| < r₂, they may very well have poles of arbitrary orders at z₀. Can we also factor out the poles? In general, we cannot; however, under special circumstances, described in the following definition, we can.

regular singular point of a SOLDE defined

14.2.2. Definition. A SOLDE of the form w'' + p(z)w' + q(z)w = 0 that is analytic in 0 < |z - z₀| < r has a regular singular point at z₀ if p(z) has at worst a simple pole and q(z) has at worst a pole of order 2 there.

In a neighborhood of a regular singular point z₀, the coefficient functions p(z) and q(z) have the power-series expansions

p(z) = a₋₁/(z - z₀) + Σ_{k=0}^{∞} a_k(z - z₀)^k,
q(z) = b₋₂/(z - z₀)² + b₋₁/(z - z₀) + Σ_{k=0}^{∞} b_k(z - z₀)^k.

Multiplying both sides of the first equation by z - z₀ and the second by (z - z₀)², and introducing P(z) ≡ (z - z₀)p(z), Q(z) ≡ (z - z₀)²q(z), we obtain

P(z) = Σ_{k=0}^{∞} a_{k-1}(z - z₀)^k,   Q(z) = Σ_{k=0}^{∞} b_{k-2}(z - z₀)^k.

¹Recall that ln(z - z₀) increases by 2πi for each turn around z₀.
406 14. COMPLEX ANALYSIS OF SOLDES

It is also convenient to multiply the SOLDE by (z - z₀)² and write it as

(z - z₀)²w'' + (z - z₀)P(z)w' + Q(z)w = 0.   (14.5)

Inspired by the discussion leading to Theorem 14.2.1, we write

w(z) = (z - z₀)^ν Σ_{k=0}^{∞} C_k(z - z₀)^k,   C₀ = 1,   (14.6)

where we have chosen the arbitrary multiplicative constant in such a way that C₀ = 1. Substitute this in Equation (14.5), and change the dummy variables so that all sums start at 0, to obtain

Σ_{n=0}^{∞} { (n + ν)(n + ν - 1)C_n + Σ_{k=0}^{n} [(k + ν)a_{n-k-1} + b_{n-k-2}]C_k } (z - z₀)^{n+ν} = 0,

which results in the recursion relation

(n + ν)(n + ν - 1)C_n = -Σ_{k=0}^{n} [(k + ν)a_{n-k-1} + b_{n-k-2}]C_k.   (14.7)

indicial equation, indicial polynomial, characteristic exponents

For n = 0, this leads to what is known as the indicial equation for the exponent ν:

I(ν) ≡ ν(ν - 1) + a₋₁ν + b₋₂ = 0.   (14.8)

The roots of this equation are called the characteristic exponents of z₀, and I(ν) is called its indicial polynomial. In terms of this polynomial, (14.7) can be expressed as

I(n + ν)C_n = -Σ_{k=0}^{n-1} [(k + ν)a_{n-k-1} + b_{n-k-2}]C_k   for n = 1, 2, ....   (14.9)

Equation (14.8) determines what values of ν are possible, and Equation (14.9) gives C₁, C₂, C₃, ..., which in turn determine w(z). Special care must be taken if the indicial polynomial vanishes at n + ν for some positive integer n, that is, if n + ν, in addition to ν, is a root of the indicial polynomial: I(n + ν) = 0 = I(ν). If ν₁ and ν₂ are characteristic exponents of the indicial equation and Re(ν₁) > Re(ν₂), then a solution for ν₁ always exists. A solution for ν₂ also exists if ν₁ - ν₂ ≠ n for any (positive) integer n. In particular, if z₀ is an ordinary point [a point at which both p(z) and q(z) are analytic], then only one solution is determined by (14.9). (Why?) The foregoing discussion is summarized in the following:
14.2 COMPLEX SOLDES 407

14.2.3. Theorem. If the differential equation w'' + p(z)w' + q(z)w = 0 has a regular singular point at z = z₀, then at least one power series of the form of (14.6) formally solves the equation. If ν₁ and ν₂ are the characteristic exponents of z₀, then there are two linearly independent formal solutions unless ν₁ - ν₂ is an integer.

14.2.4. Example. Let us consider some familiar differential equations.
(a) The Bessel equation is

w'' + (1/z)w' + (1 - α²/z²)w = 0.

In this case, the origin is a regular singular point, a₋₁ = 1, and b₋₂ = -α². Thus, the indicial equation is ν(ν - 1) + ν - α² = 0, and its solutions are ν₁ = α and ν₂ = -α. Therefore, there are two linearly independent solutions to the Bessel equation unless ν₁ - ν₂ = 2α is an integer, i.e., unless α is either an integer or a half-integer.

(b) For the Coulomb potential f(r) = β/r, the most general radial equation [Equation (12.14)] reduces to

w'' + (2/z)w' + (β/z - a/z²)w = 0.

The point z = 0 is a regular singular point at which a₋₁ = 2 and b₋₂ = -a. The indicial polynomial is I(ν) = ν² + ν - a, with characteristic exponents ν₁ = -½ + ½√(1 + 4a) and ν₂ = -½ - ½√(1 + 4a). There are two independent solutions unless ν₁ - ν₂ = √(1 + 4a) is an integer. In practice, a = l(l + 1), where l is some integer; so ν₁ - ν₂ = 2l + 1, and only one solution is obtained.

(c) The hypergeometric differential equation is

w'' + {[γ - (α + β + 1)z]/[z(1 - z)]}w' - {αβ/[z(1 - z)]}w = 0.

A substantial number of functions in mathematical physics are solutions of this remarkable equation, with appropriate values for α, β, and γ. The regular singular points² are z = 0 and z = 1. At z = 0, a₋₁ = γ and b₋₂ = 0. The indicial polynomial is I(ν) = ν(ν + γ - 1), whose roots are ν₁ = 0 and ν₂ = 1 - γ. Unless γ is an integer, we have two formal solutions. ∎

It is shown in differential equation theory [Birk 78, pp.
240-242] that as long as ν₁ - ν₂ is not an integer, the series solution of Theorem 14.2.3 is convergent in a neighborhood of z₀. What happens when ν₁ - ν₂ is an integer? First, as a convenience, we translate the coordinate axes so that the point z₀ coincides with the origin. This will save us some writing, because instead of powers of z - z₀, we will have powers of z. Next we let ν₁ = ν₂ + n with n a positive integer. Then, since it is impossible to encounter any new zero of the indicial polynomial beyond

²The coefficient of w need not have a pole of order 2. Its pole can be of order one as well.
408 14. COMPLEX ANALYSIS OF SOLDES

ν₁, the recursion relation, Equation (14.9), will be valid for all values of n, and we obtain a solution:

w₁(z) = z^{ν₁}f(z) = z^{ν₁}(1 + Σ_{k=1}^{∞} C_k z^k),

which is convergent in the region 0 < |z| < r for some r > 0. To investigate the nature and the possibility of the second solution, write the recursion relations of Equation (14.9) for the smaller characteristic root ν₂:

I(ν₂ + 1)C₁ = -(ν₂a₀ + b₋₁)C₀ ≡ ρ₁I(ν₂ + 1)C₀   ⟹ C₁ = ρ₁,
I(ν₂ + 2)C₂ = -(ν₂a₁ + b₀)C₀ - [(ν₂ + 1)a₀ + b₋₁]C₁   ⟹ C₂ ≡ ρ₂,
  ⋮
I(ν₂ + n - 1)C_{n-1} ≡ ρ_{n-1}I(ν₂ + n - 1)C₀   ⟹ C_{n-1} = ρ_{n-1},
I(ν₂ + n)C_n = I(ν₁)C_n = ρ_nC₀   ⟹ 0 = ρ_n,   (14.10)

where in each step we have used the result of the previous step, in which C_k is given as a multiple of C₀ = 1. Here, the ρ's are constants depending (possibly in a very complicated way) on the a_k's and b_k's. Theorem 14.2.3 guarantees two power-series solutions only when ν₁ - ν₂ is not an integer. When ν₁ - ν₂ is an integer, Equation (14.10) shows that a necessary condition for a second power-series solution to exist is that ρ_n = 0. Therefore, when ρ_n ≠ 0, we have to resort to other means of obtaining the second solution. Let us define the second solution as

w₂(z) ≡ w₁(z)h(z) = z^{ν₁}f(z)h(z)   (14.11)

and substitute in the SOLDE to obtain a FOLDE in h', namely h'' + (p + 2w₁'/w₁)h' = 0, or, by substituting w₁'/w₁ = ν₁/z + f'/f, the equivalent FOLDE

h'' + (2ν₁/z + 2f'/f + p)h' = 0.   (14.12)

14.2.5. Lemma. The coefficient of h' in Equation (14.12) has a residue of n + 1.

Proof. Recall that the residue of a function is the coefficient of z^{-1} in the Laurent expansion of the function (about z = 0). Let us denote this residue for the coefficient of h' by A₋₁. Since f(0) = 1, the ratio f'/f is analytic at z = 0. Thus, the simple pole at z = 0 comes from the other two terms. Substituting the Laurent expansion of p(z) gives

2ν₁/z + p = 2ν₁/z + a₋₁/z + a₀ + a₁z + ··· .
14.2 COMPLEX SOLDES 409

This shows that A₋₁ = 2ν₁ + a₋₁. On the other hand, comparing the two versions of the indicial polynomial, ν² + (a₋₁ - 1)ν + b₋₂ and (ν - ν₁)(ν - ν₂) = ν² - (ν₁ + ν₂)ν + ν₁ν₂, gives ν₁ + ν₂ = -(a₋₁ - 1), or 2ν₁ - n = -(a₋₁ - 1). Therefore, A₋₁ = 2ν₁ + a₋₁ = n + 1. ∎

14.2.6. Theorem. Suppose that the characteristic exponents of a SOLDE with a regular singular point at z = 0 are ν₁ and ν₂. Consider three cases:

1. ν₁ - ν₂ is not an integer.
2. ν₂ = ν₁ - n, where n is a nonnegative integer, and ρ_n, as defined in Equation (14.10), vanishes.
3. ν₂ = ν₁ - n, where n is a nonnegative integer, and ρ_n, as defined in Equation (14.10), does not vanish.

Then, in the first two cases, there exists a basis of solutions {w₁, w₂} of the form

w_i(z) = z^{ν_i}(1 + Σ_{k=1}^{∞} C_k^{(i)} z^k),   i = 1, 2,

and in the third case, the basis of solutions takes the form

w₁(z) = z^{ν₁}(1 + Σ_{k=1}^{∞} a_k z^k),
w₂(z) = z^{ν₂}(1 + Σ_{k=1}^{∞} b_k z^k) + Cw₁(z) ln z,

where the power series are convergent in a neighborhood of z = 0.

Proof. The first two cases have been shown before. For the third case, we use Lemma 14.2.5 and write

2ν₁/z + 2f'/f + p = (n + 1)/z + Σ_{k=0}^{∞} c_k z^k,

and the solution for the FOLDE in h' will be [see Equation (14.3) and the discussion preceding it]

h'(z) = z^{-n-1}(1 + Σ_{k=1}^{∞} b_k z^k).

For n = 0, i.e., when the indicial polynomial has a double root, this yields h'(z) = 1/z + Σ_{k=1}^{∞} b_k z^{k-1}, or h(z) = ln z + g₁(z), where g₁ is analytic in a neighborhood of z = 0. For n ≠ 0, we have h'(z) = b_n/z + Σ_{k≠n} b_k z^{k-n-1} and, by integration,

h(z) = b_n ln z + Σ_{k≠n} [b_k/(k - n)] z^{k-n} = b_n ln z + z^{-n}g₂(z),
410 14. COMPLEX ANALYSIS OF SOLDES

where g₂ is analytic in a neighborhood of z = 0. Substituting h in Equation (14.11) and recalling that ν₂ = ν₁ - n, we obtain the desired results of the theorem. ∎

14.3 Fuchsian Differential Equations

In many cases of physical interest, the behavior of the solution of a SOLDE at infinity is important. For instance, bound-state solutions of the Schrodinger equation describing the probability amplitudes of particles in quantum mechanics must tend to zero as the distance from the center of the binding force increases. We have seen that the behavior of a solution is determined by the behavior of the coefficient functions. To determine the behavior at infinity, we substitute z = 1/t in the SOLDE

d²w/dz² + p(z) dw/dz + q(z)w = 0   (14.13)

and obtain

d²v/dt² + [2/t - (1/t²)r(t)] dv/dt + (1/t⁴)s(t)v = 0,   (14.14)

where v(t) = w(1/t), r(t) = p(1/t), and s(t) = q(1/t). Clearly, as z → ∞, t → 0. Thus, we are interested in the behavior of (14.14) at t = 0. We assume that both r(t) and s(t) are analytic at t = 0. Equation (14.14) shows, however, that the solution v(t) may still have singularities at t = 0 because of the extra terms appearing in the coefficient functions. We assume that infinity is a regular singular point of (14.13), by which we mean that t = 0 is a regular singular point of (14.14). Therefore, in the Taylor expansions of r(t) and s(t), the first (constant) term of r(t) and the first two terms of s(t) must be zero. Thus, we write

r(t) = a₁t + a₂t² + ··· = Σ_{k=1}^{∞} a_k t^k,
s(t) = b₂t² + b₃t³ + ··· = Σ_{k=2}^{∞} b_k t^k.

By their definitions, these two equations imply that p(z) and q(z), for large values of |z|, must have expressions of the form

p(z) = a₁/z + a₂/z² + ··· ,   q(z) = b₂/z² + b₃/z³ + ··· .   (14.15)
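Condition (14.15) lends itself to a quick numerical test: zp(z) and z²q(z) must settle to the finite limits a₁ and b₂ as |z| grows. The sketch below is only a crude check, and the function name, sample radii, and tolerance are our own choices. It applies the test to the hypergeometric equation of Example 14.2.4(c) with α = 1, β = 2, γ = 1/2, which turns out to be regular at infinity, and to the Bessel equation, whose point at infinity is irregular:

```python
def infinity_is_regular_singular(p, q, radius=1e6, tol=1e-3):
    """Crude test of Eq. (14.15): z*p(z) and z**2*q(z) should stabilize
    to finite limits (a1 and b2) as the sample radius grows."""
    zp = [R * p(R) for R in (radius, 100 * radius)]
    zq = [R * R * q(R) for R in (radius, 100 * radius)]
    settled = lambda v: abs(v[0] - v[1]) < tol * (1 + abs(v[1]))
    return settled(zp) and settled(zq)

# Hypergeometric equation with alpha = 1, beta = 2, gamma = 1/2:
#   p = [gamma - (alpha+beta+1)z]/[z(1-z)],  q = -alpha*beta/[z(1-z)]
hyp = infinity_is_regular_singular(
    lambda z: (0.5 - 4 * z) / (z * (1 - z)),
    lambda z: -2 / (z * (1 - z)))
# Bessel equation (alpha = 2): z**2*q(z) = z**2 - 4 grows without bound
bes = infinity_is_regular_singular(lambda z: 1 / z, lambda z: 1 - 4 / z ** 2)
```

Here zp(z) → α + β + 1 = 4 and z²q(z) → αβ = 2 for the hypergeometric case, so the test returns True, while the Bessel case fails because z²q(z) diverges.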
14.3 FUCHSIAN DIFFERENTIAL EQUATIONS 411

When infinity is a regular singular point of Equation (14.13), or, equivalently, when the origin is a regular singular point of (14.14), it follows from Theorem 14.2.6 that there exists at least one solution of the form v₁(t) = t^α(1 + Σ_{k=1}^{∞} c_k t^k) or, in terms of z,

w₁(z) = z^{-α}(1 + Σ_{k=1}^{∞} c_k/z^k).   (14.16)

Here α is a characteristic exponent at t = 0 of (14.14), whose indicial polynomial is easily found to be

α(α - 1) + (2 - a₁)α + b₂ = 0.   (14.17)

Fuchsian DE

14.3.1. Definition. A homogeneous differential equation with single-valued analytic coefficient functions is called a Fuchsian differential equation (FDE) if it has only regular singular points in the extended complex plane, i.e., the complex plane including the point at infinity.

It turns out that a particular kind of FDE describes a large class of nonelementary functions encountered in mathematical physics. Therefore, it is instructive to classify various kinds of FDEs. A fact that is used in such a classification is that complex functions whose only singularities in the extended complex plane are poles are rational functions, i.e., ratios of polynomials (see Example 10.2.2). We thus expect FDEs to have only rational functions as coefficients.

A second-order Fuchsian DE with two regular singular points leads to uninteresting solutions!

Consider the case where the equation has at most two regular singular points, at z₁ and z₂. We introduce a new variable ξ(z) = (z - z₁)/(z - z₂). The regular singular points at z₁ and z₂ are mapped onto the points ξ₁ = ξ(z₁) = 0 and ξ₂ = ξ(z₂) = ∞, respectively, in the extended ξ-plane. Equation (14.13) becomes

d²u/dξ² + Φ(ξ) du/dξ + Θ(ξ)u = 0,

where u, Φ, and Θ are functions of ξ obtained when z is expressed in terms of ξ in w(z), p(z), and q(z), respectively. From Equation (14.15) and the fact that ξ = 0 is at most a simple pole of Φ(ξ), we obtain Φ(ξ) = a₁/ξ. Similarly, Θ(ξ) = b₂/ξ². Thus, a SOFDE with two regular singular points is equivalent to the DE w'' + (a₁/z)w' + (b₂/z²)w = 0. Multiplying both sides by z², we obtain z²w'' + a₁zw' + b₂w = 0, which is the second-order Euler differential equation. A general nth-order Euler differential equation is equivalent to a NOLDE with constant coefficients (see Problem 13.28). Thus, a second-order Fuchsian DE (SOFDE) with two regular singular points is equivalent to a SOLDE with constant coefficients and produces nothing new.

A second-order Fuchsian DE with three regular singular points leads to interesting solutions!

The simplest SOFDE whose solutions may include nonelementary functions is therefore one having three regular singular points, at say z₁, z₂, and z₃. By the transformation

ξ(z) = [(z - z₁)(z₃ - z₂)] / [(z - z₂)(z₃ - z₁)]
From Equation (14.15) and the fact that ~ = 0 is at most a simple pole of <I>(~), we obtain <I>(~) = al/~. Similarly, e(~) = b2/~2. Thus, a SOFDE with two regular singular points is equivaleut to the DE wIt + (al/Z)w' + (b2/Z2)W = O. Mnltiplying both sides by Z2, we obtain Z2w" + alzw' + b2W = 0, which is the second-order Euler differential equation. A general nth-orderEuler differential equation is equivalent to a NOLDE with constant coefficients (see Problem 13.28). Thus, a second order Fuchsian DE (SOFDE) with two regular singular points is eqnivalent to a SOLDE with constant coefficients and produces nothing new. The simplest SOFDE whose solutions may include nonelementary functions is therefore one having three regular singnlar points, at say ZI, Z2, and Z3. By the transformation ~(z) = (z - ZI)(Z3 - Z2) (z - Z2)(Z3 - ZI)
  • 430. 412 14. COMPLEX ANALYSIS OF SOLOES (14.18) Riemann differential equation we can map Zl, Z2, and Z3 onto ~I = 0, ~2 = 00, and ~3 = 1. Thus, we assume that the three regular singular points are at Z = 0, z = I, and z = 00. It can be' shown [see Problem (14.8)] that the most geuera! p(z) and q(z) are Al BI A2 B2 A3 p(z) = - + --1 and q(z) = 2: + ( 1)2 ( I) z z- z z- zz- We thus have the following theorem. 14.3.2. Theorem. The most general second order Fuchsian DE with three regular singular points can be transformed into the form /I (AI BI) I [A2 B2 A3] 0 w+-+--w+-+ - w=, z Z - I z2 (z - 1)2 z(z - I) where AI, A2, A3, BI, and B2 are constants. This equation is called the Riemann differential equation. We can write the Riemaruo DE in terms of pairs of characteristic exponents, (;,1,1.2), (1-'1,1-'2), and (VI, V2), belonging to the singular points 0, 1, and 00, respectively. The indicia! equations are easily found to be 1. 2 + (AI - 1)1.+ A2 = 0, 1-'2 +(BI - 1)1-' +B2 = 0, v 2 +(I - Al - BI)v +A2 +B2 - A3 = O. By writing the indicia! equations as (A - 1.1)(1. - 1.2) = 0, and so forth and comparing coefficients, we can find the following relations: Al = I - Al - 1.2, BI = I - 1-'1 - 1-'2, AI+BI =VI+V2+1, A2 = 1.11.2, B2 = 1-'11-'2, A2 + B2 - A 3 = VIV2. These equations lead easily to the Riemann identity Al +1.2+1-'1 +1-'2 +VI +V2 = 1. (14.19) (14.20) Substituting these results in (14.18) gives the following result. 14.3.3. Theorem. A second order Fuchsian DE with three regular singuiarpoints in the extended complex plane is equivalent to the Riemann DE, ( 1 - 1.1 - 1.2 1- "1- "2) wI! + + t'" r- w' Z z-I [ 1.11.2 1-'11-'2 VIV2 - 1.11.2 -1-'11-'2] + --+ + w-O Z2 (z - 1)2 z(z - I) - , which is uniquely determined by the pairs of characteristic exponents at each singularpoint. The characteristic exponents satisfy the Riemann identity, Equation (14.19).
  • 431. 14.4 THE HYPERGEOMETRIC FUNCTION 413 The uniqueness of the Riemann DE allows us to derive identities for solutions and reduce the independent parameters of Equation (14.20) from five to three. We first note that if w(z) is a solution of the Riemann DE corresponding to (AI, A2), (J.'I, J.'2), and (VI, 112), then the function u(z) = z},(z - I)JLw(z) has branch points at z = 0, 1,00 [because w(z) does]; therefore, it is a solution of the Riemann DE. Its pairs of characteristic exponents are (see Problem 14.10) (AI +A,A2+A), In particular, if we let A =-AI and J.' = -J.'1, then the pairs reduce to (0, J.'2 - J.'I), Defining a sa VI + Al + J.L!, fJ es V2 + Al + J.'I, and y == I - A2 + AI, and using (14.19), we can write the pairs as (O,I-y), (0, y - a - fJ), (a, fJ), which yield the third version of the Riemann DE w" + (1:'. + 1- Y + a + fJ) w' + afJ w = O. z z - I z(z - 1) hypergeometric DE This important equation is commonly written in the equivalent form z(1 - z)w" + [y - (1 + a + fJ)z]w' - afJw = 0 (14.21) where ao = I. and is called the hypergeometric differential equation (HGDE). We will study this equation next. 14.4 The Hypergeometric Function The two characteristic exponents of Equation (14.21) at z = 0 are 0 and I - y. It follows from Theorem 14.2.6 that there exists an analytic solution (correspond- ing to the characteristic exponent 0) at z = O. Let us denote this solution, the hypergeomelric hypergeometric function, by F(a, fJ; y; z) and write function 00 F(a, fJ; y; z) = I>kZk k~O Substituting in the DE, we obtain the recurrence relation (a +k)(fJ +k) ak+I = (k + I)(y +k) ak for k 2: O.
  • 432. (14.23) 414 14. COMPLEX ANALYSIS OF SOLOES Thesecoefficients canbedetermined successivelyif y isneither zeronoranegative hypergeometric integer: series . . _ ~ a(a + 1)··· (a + k - 1)f:l(f:l + 1)· .. (f:l + k - 1) k F(a,f:l,y,z)-l+t;r k!y(y+1) ... (y+k-1) z r(y) ~ r(a + k)r(f:l + k) k = r(a)r(f:l) 6 r(k + l)r(y + k) z . (14.22) The series in (14.22) is called the hypergeometric series, because it is the gener- alization of F(l, f:l; f:l; z), which is simply the geometric series. We note immediately from (14.22) that 14.4.1. Box. The hypergeometric series becomes a polynomial if either a or f:l is a negative integer. Tbisis because for k < lal (ork < If:ll)bothr(a+k) [orr(f:l+k)]and I'(o) [or r(f:l)] have poles that cancel each other. However, r(a + k) [or r(f:l + k)] becomes finitefor k > lal (or k > If:ll), and the pole in I'(o) [or r(f:l)] makes the denominator infinite. Therefore, all terms of the series (14.22) beyond k = la I(or k = If:lll will be zero. Many of the properties of the hypergeometric function can be obtained directly from the HGDE, Equation (14.21). For instance, differentiating the HGDE and letting v = w', we obtain z(l - z)v" + [y + 1 - (a + f:l + 3)zlv' - (a + 1)(f:l+ l)v = 0, which shows that F'(a, f:l; y; z) = CF(a + 1, f:l + 1; y + 1; z), The constant C can be determined by differentiating Equation (14.22), setting z = 0 in the result.' and noting that F(a + I, f:l + 1; y + 1; 0) = 1. Then we obtain af:l F'(a,f:l; y; z) = -F(a + I, f:l + 1; y + 1; z). y Now assume that y '" 1, and make the substitution w = Zl-Yu in the HGDE to obtain" z(l-z)u" +[Yl - (al +f:ll + l)z]u' -alf:llu =0, where al =a -Y + 1, f:ll = f:l- Y + I, and Yl = 2 - y. Thus, u = F(a - y + I, f:l- Y + 1; 2 - y; z), 3Notethat thehypergeometric function evaluates to 1 atZ = 0 regardless of its parameters. 4Inthefollowingdiscussion, aI, Ih, andYl will represent theparameters of thenew DE satisfied by thenewfunction defined in terms of theold.
  • 433. 14.4 THE HYPERGEOMETRIC FUNCTION 415 and u is therefore analytic at z = D. This leads to an interesting resnIt. Provided that y is not an integer, the two functions Wt(z) == F(a, fJ; y; z), W2(Z) == zt-y F(a - y + 1, fJ - Y + I; 2 - y; z) (14.24) form a canonical basis of solutions to the HGDE at Z = D. This follows from Theorem 14.2.6 and the fact that (D, I - y) are a pair of (different) characteristic exponents at z = D. Johann CarlFriedrich Gauss (1777-1855)wasthegreatestof all mathematiciansandperhapsthemostrichlygifted geniusof whom there is any record. He was bornin the city ofBrunswick in northern Germany, His exceptional skill with numbers was clear at a very early age, and in later life he joked that he knew how to count before he could talk. It is said that Goethe wrote anddirectedlittleplaysfor a puppettheaterwhen he was 6 and that Mozart composed his first childish minuets when he was 5, but Gauss corrected an error in his father's payroll accounts at the age of 3. At the age of seven, when he started elemen- tary school, his teacher was amazed when Gauss summed the integers from 1 to 100 instantly by spotting that the som was 50 pairs of numbers each pair summing to 1Oi. His long professional life is so filled withaccomplishments that it is impossible to give a full account of them in the short space available here. All we can do is simply give a chronology of his almost uncountable discoveries. 1792-1794: Gauss reads the works of Newton, Euler, and Lagrange; discovers the prime number theorem (at the age of 14 or 15); invents the method of least squares; conceives the Gaussian law of distribution in the theory of probability. 1795:(only 18years old!) Proves tbataregularpolygonwithn sides is constructible (by ruler and compass) ifand only ifn is the productofa powerof2 and distinct prime numbers ofthe form Pk = 2 2k +1, and completely solves the 200D-year old problem ofruler-and-compass construction of regular polygons. 
He also discovers the law of quadratic reciprocity.

1799: Proves the fundamental theorem of algebra in his doctoral dissertation, using the then-mysterious complex numbers with complete confidence.

1801: Gauss publishes his Disquisitiones Arithmeticae, in which he creates the modern rigorous approach to mathematics; predicts the exact location of the asteroid Ceres.

1807: Becomes professor of astronomy and the director of the new observatory at Göttingen.

1809: Publishes his second book, Theoria motus corporum coelestium, a major two-volume treatise on the motion of celestial bodies and the bible of planetary astronomers for the next 100 years.

1812: Publishes Disquisitiones generales circa seriem infinitam, a rigorous treatment of infinite series, in which he introduces the hypergeometric function for the first time, using the notation F(α, β; γ; z); also an essay on approximate integration.

1820-1830: Publishes over 70 papers, including Disquisitiones generales circa superficies curvas, in which he creates the intrinsic differential geometry of general curved surfaces,
the forerunner of Riemannian geometry and the general theory of relativity.

From the 1830s on, Gauss was increasingly occupied with physics, and he enriched every branch of the subject he touched. In the theory of surface tension, he developed the fundamental idea of conservation of energy and solved the earliest problem in the calculus of variations. In optics, he introduced the concept of the focal length of a system of lenses. He virtually created the science of geomagnetism, and in collaboration with his friend and colleague Wilhelm Weber he invented the electromagnetic telegraph. In 1839 Gauss published his fundamental paper on the general theory of inverse square forces, which established potential theory as a coherent branch of mathematics and in which he established the divergence theorem.

Gauss had many opportunities to leave Göttingen, but he refused all offers and remained there for the rest of his life, living quietly and simply, traveling rarely, and working with immense energy on a wide variety of problems in mathematics and its applications. Apart from science and his family (he married twice and had six children, two of whom emigrated to America), his main interests were history and world literature, international politics, and public finance. He owned a large library of about 6000 volumes in many languages, including Greek, Latin, English, French, Russian, Danish, and of course German. His acuteness in handling his own financial affairs is shown by the fact that although he started with virtually nothing, he left an estate over a hundred times as great as his average annual income during the last half of his life.

The foregoing list is the published portion of Gauss's total achievement; the unpublished and private part is almost equally impressive.
His scientific diary, a little booklet of 19 pages, discovered in 1898, extends from 1796 to 1814 and consists of 146 very concise statements of the results of his investigations, which often occupied him for weeks or months. These ideas were so abundant and so frequent that he physically did not have time to publish them. Some of the ideas recorded in this diary:

Cauchy Integral Formula: Gauss discovers it in 1811, 16 years before Cauchy.

Non-Euclidean Geometry: After failing to prove Euclid's fifth postulate at the age of 15, Gauss came to the conclusion that the Euclidean form of geometry cannot be the only one possible.

Elliptic Functions: Gauss had found many of the results of Abel and Jacobi (the two main contributors to the subject) before these men were born. The facts became known partly through Jacobi himself. His attention was caught by a cryptic passage in the Disquisitiones, whose meaning can only be understood if one knows something about elliptic functions. He visited Gauss on several occasions to verify his suspicions and tell him about his own most recent discoveries, and each time Gauss pulled 30-year-old manuscripts out of his desk and showed Jacobi what Jacobi had just shown him. After a week's visit with Gauss in 1840, Jacobi wrote to his brother, "Mathematics would be in a very different position if practical astronomy had not diverted this colossal genius from his glorious career."

A possible explanation for not publishing such important ideas is suggested by his comments in a letter to Bolyai: "It is not knowledge but the act of learning, not possession but the act of getting there, which grants the greatest enjoyment. When I have clarified and exhausted a subject, then I turn away from it in order to go into darkness again." His was the temperament of an explorer who is reluctant to take the time to write an account of his last expedition when he could be starting another.
As it was, Gauss wrote a great deal, but to have published every fundamental discovery he made in a form satisfactory to himself would have required several long lifetimes.
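Returning to the mathematics, the defining expansion (14.22) is easy to exercise numerically. The short Python sketch below is illustrative only (the truncated-series helper `F` is ours, not part of the text; it sums the series term by term, which converges for |z| < 1). It checks the geometric-series case β = γ, two elementary special cases discussed later in this section, the polynomial truncation of Box 14.4.1, and the identity (14.25) derived next:

```python
from math import log, asin, isclose

def F(a, b, c, z, terms=400):
    """Truncated hypergeometric series (14.22); successive terms are
    built from the coefficient ratio (a+k)(b+k) / ((c+k)(k+1)) * z."""
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

z = 0.5
# beta = gamma collapses the series to the binomial series (1 - z)^(-alpha)
assert isclose(F(0.3, 2.5, 2.5, z), (1 - z) ** -0.3, rel_tol=1e-12)
# elementary cases: F(1,1;2;z) = -ln(1-z)/z and F(1/2,1/2;3/2;z^2) = arcsin(z)/z
assert isclose(F(1, 1, 2, z), -log(1 - z) / z, rel_tol=1e-12)
assert isclose(F(0.5, 0.5, 1.5, z * z), asin(z) / z, rel_tol=1e-12)
# Box 14.4.1: a negative-integer parameter truncates the series to a polynomial
assert isclose(F(-2, 1, 1, z), (1 - z) ** 2, rel_tol=1e-12)
# Euler's identity (14.25): F(a,b;c;z) = (1-z)^(c-a-b) F(c-a, c-b; c; z)
assert isclose(F(0.7, 1.2, 2.3, 0.4),
               (1 - 0.4) ** 0.4 * F(1.6, 1.1, 2.3, 0.4), rel_tol=1e-10)
```

Such brute-force summation is adequate only well inside the unit disk; the analytic continuations discussed below (the bases at z = 1 and z = ∞) are what one uses elsewhere.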
A third relation can be obtained by making the substitution w = (1 − z)^{γ−α−β} u. This leads to a hypergeometric equation for u with α₁ = γ − α, β₁ = γ − β, and γ₁ = γ. Furthermore, w is analytic at z = 0, and w(0) = 1. We conclude that w = F(α, β; γ; z). We therefore have the identity

F(α, β; γ; z) = (1 − z)^{γ−α−β} F(γ − α, γ − β; γ; z).   (14.25)

To obtain the canonical basis at z = 1, we make the substitution t = 1 − z, and note that the result is again the HGDE, with α₁ = α, β₁ = β, and γ₁ = α + β − γ + 1. It follows from Equation (14.24) that

w₃(z) ≡ F(α, β; α + β − γ + 1; 1 − z),
w₄(z) ≡ (1 − z)^{γ−α−β} F(γ − β, γ − α; γ − α − β + 1; 1 − z)   (14.26)

form a canonical basis of solutions to the HGDE at z = 1.

A symmetry of the hypergeometric function that is easily obtained from the HGDE is

F(α, β; γ; z) = F(β, α; γ; z).   (14.27)

The six functions

F(α ± 1, β; γ; z),  F(α, β ± 1; γ; z),  F(α, β; γ ± 1; z)

are called hypergeometric functions contiguous to F(α, β; γ; z).

The discussion above showed how to obtain the basis of solutions at z = 1 from the regular solution of the HGDE at z = 0, F(α, β; γ; z). We can show that the basis of solutions at z = ∞ can also be obtained from the hypergeometric function. Equation (14.16) suggests a function of the form

v(z) = z^r F(α₁, β₁; γ₁; 1/z) ≡ z^r w(1/z)  ⟹  w(z) = z^r v(1/z),   (14.28)

where r, α₁, β₁, and γ₁ are to be determined. Since w(z) is a solution of the HGDE, v will satisfy the following DE (see Problem 14.15):

z(1 − z)v″ + [1 − α − β − 2r − (2 − γ − 2r)z]v′ − [r² − r + rγ − (1/z)(r + α)(r + β)]v = 0.   (14.29)

This reduces to the HGDE if r = −α or r = −β. For r = −α, the parameters become α₁ = α, β₁ = 1 + α − γ, and γ₁ = α − β + 1. For r = −β, the parameters are α₁ = β, β₁ = 1 + β − γ, and γ₁ = β − α + 1. Thus,

v₁(z) = z^{−α} F(α, 1 + α − γ; α − β + 1; 1/z),
v₂(z) = z^{−β} F(β, 1 + β − γ; β − α + 1; 1/z)   (14.30)
form a canonical basis of solutions for the HGDE that are valid about z = ∞.

As the preceding discussion suggests, it is possible to obtain many relations among the hypergeometric functions with different parameters and independent variables. In fact, the nineteenth-century mathematician Kummer showed that there are 24 different (but linearly dependent, of course) solutions to the HGDE. These are collectively known as Kummer's solutions, and six of them were derived above. Another important relation (shown in Problem 14.16) is that

z^{α−γ} (1 − z)^{γ−α−β} F(γ − α, 1 − α; 1 − α + β; 1/z)   (14.31)

also solves the HGDE.

Many of the functions that occur in mathematical physics are related to the hypergeometric function. Even some of the common elementary functions can be expressed in terms of the hypergeometric function with appropriate parameters. For example, when β = γ, we obtain

F(α, β; β; z) = Σ_{k=0}^∞ [Γ(α+k)/(Γ(α)Γ(k+1))] z^k = (1 − z)^{−α}.

Similarly, F(½, ½; 3/2; z²) = sin⁻¹z/z, and F(1, 1; 2; −z) = ln(1+z)/z. However, the real power of the hypergeometric function is that it encompasses almost all of the nonelementary functions encountered in physics. Let us look briefly at a few of these.

Jacobi functions are solutions of the DE

(1 − x²) d²u/dx² + [β − α − (α + β + 2)x] du/dx + λ(λ + α + β + 1)u = 0.   (14.32)

Defining x = 1 − 2z changes this DE into the HGDE with parameters α₁ = −λ, β₁ = λ + α + β + 1, and γ₁ = 1 + α. The solutions of Equation (14.32), called the Jacobi functions of the first kind, are, with appropriate normalization,

P_λ^{(α,β)}(z) = [Γ(λ + α + 1)/(Γ(λ + 1)Γ(α + 1))] F(−λ, λ + α + β + 1; 1 + α; (1 − z)/2).

When λ
= n, a nonnegative integer, the Jacobi function turns into a polynomial of degree n with the following expansion:

P_n^{(α,β)}(z) = [Γ(n + α + 1)/(Γ(n + 1)Γ(n + α + β + 1))] Σ_{k=0}^n C(n,k) [Γ(n + α + β + k + 1)/Γ(α + k + 1)] ((z − 1)/2)^k.

These are the Jacobi polynomials discussed in Chapter 7. In fact, the DE satisfied by P_n^{(α,β)}(x) of Chapter 7 is identical to Equation (14.32). Note that the transformation x = 1 − 2z translates the points z = 0 and z = 1 to the points x = 1 and x = −1, respectively. Thus the regular singular points of the Jacobi functions of the first kind are at ±1 and ∞.
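The hypergeometric representation of the Jacobi polynomials can be exercised numerically. In the Python sketch below (the helper names are ours, not from the text; the series terminates because the first hypergeometric parameter is −n), the α = β = 0 case reproduces the Legendre polynomial P₂(x) = (3x² − 1)/2, and the terminating expansion in powers of (z − 1)/2 is checked against the hypergeometric form for generic parameters:

```python
from math import comb, gamma, isclose

def jacobi_P(n, a, b, z):
    """P_n^{(a,b)}(z) via the hypergeometric representation:
    Gamma(n+a+1)/(n! Gamma(a+1)) * F(-n, n+a+b+1; 1+a; (1-z)/2)."""
    t = (1 - z) / 2
    total, term = 1.0, 1.0
    for k in range(n):  # series terminates after the k = n term
        term *= (-n + k) * (n + a + b + 1 + k) / ((1 + a + k) * (k + 1)) * t
        total += term
    return gamma(n + a + 1) / (gamma(n + 1) * gamma(a + 1)) * total

def jacobi_P_expansion(n, a, b, z):
    """Terminating expansion in powers of (z-1)/2 with the binomial factor."""
    pref = gamma(n + a + 1) / (gamma(n + 1) * gamma(n + a + b + 1))
    return pref * sum(comb(n, k) * gamma(n + a + b + k + 1) / gamma(a + k + 1)
                      * ((z - 1) / 2) ** k for k in range(n + 1))

x = 0.3
# alpha = beta = 0 reproduces the Legendre polynomial P_2(x) = (3x^2 - 1)/2
assert isclose(jacobi_P(2, 0, 0, x), (3 * x * x - 1) / 2, rel_tol=1e-12)
# the two representations agree for generic parameters
assert isclose(jacobi_P(3, 0.5, 1.5, x), jacobi_P_expansion(3, 0.5, 1.5, x),
               rel_tol=1e-10)
```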
A second, linearly independent, solution of Equation (14.32) is obtained by using (14.31). These are called the Jacobi functions of the second kind:

Q_λ^{(α,β)}(z) = [2^{λ+α+β} Γ(λ + α + 1)Γ(λ + β + 1)] / [Γ(2λ + α + β + 2)(z − 1)^{λ+α+1}(z + 1)^β] F(λ + α + 1, λ + 1; 2λ + α + β + 2; 2/(1 − z)).   (14.33)

Gegenbauer functions, or ultraspherical functions, are special cases of Jacobi functions for which α = β = μ − ½. They are defined by

C_λ^μ(z) = [Γ(λ + 2μ)/(Γ(λ + 1)Γ(2μ))] F(−λ, λ + 2μ; μ + ½; (1 − z)/2).   (14.34)

Note the change in the normalization constant. Linearly independent Gegenbauer functions "of the second kind" can be obtained from the Jacobi functions of the second kind by the substitution α = β = μ − ½.

Another special case of the Jacobi functions is obtained when α = β = 0. Those obtained from the Jacobi functions of the first kind are called Legendre functions of the first kind:

P_λ(z) = P_λ^{(0,0)}(z) = C_λ^{1/2}(z) = F(−λ, λ + 1; 1; (1 − z)/2).   (14.35)

Legendre functions of the second kind are obtained from the Jacobi functions of the second kind in a similar way:

Q_λ(z) = [2^λ Γ²(λ + 1)] / [Γ(2λ + 2)(z − 1)^{λ+1}] F(λ + 1, λ + 1; 2λ + 2; 2/(1 − z)).

Other functions derived from the Jacobi functions are obtained similarly (see Chapter 7).

14.5 Confluent Hypergeometric Functions

The transformation x = 1 − 2z translates the regular singular points of the HGDE by a finite amount. Consequently, the new functions still have two regular singular points, ±1, in the complex plane. In some physical cases of importance, only the origin, corresponding to r = 0 in spherical coordinates (typically the location of the source of a central force), is the singular point. If we want to obtain a differential equation consistent with such a case, we have to "push" the singular point z = 1 to infinity.
This can be achieved by making the substitution t = rz in the HGDE and taking the limit r → ∞. The substitution yields

d²w/dt² + [γ/t + (1 − γ + α + β)/(t − r)] dw/dt + [αβ/(t(t − r))] w = 0.   (14.36)
If we blindly take the limit r → ∞ with α, β, and γ remaining finite, Equation (14.36) reduces to ẅ + (γ/t)ẇ = 0, an elementary FODE in ẇ. To obtain a nonelementary DE, we need to manipulate the parameters, to let some of them tend to infinity. We want γ to remain finite, because otherwise the coefficient of dw/dt will blow up. We therefore let β or α tend to infinity. The result will be the same either way, because α and β appear symmetrically in the equation. It is customary to let β = r → ∞. In that case, Equation (14.36) becomes

d²w/dt² + (γ/t − 1) dw/dt − (α/t)w = 0.

Multiplying by t and changing the independent variable back to z yields

zw″(z) + (γ − z)w′(z) − αw(z) = 0.   (14.37)

This is called the confluent hypergeometric DE (CHGDE). Since z = 0 is still a regular singular point of the CHGDE, we can obtain expansions about that point. The characteristic exponents are 0 and 1 − γ, as before. Thus, there is an analytic solution (corresponding to the characteristic exponent 0) to the CHGDE at the origin, which is called the confluent hypergeometric function and denoted by Φ(α; γ; z). Since z = 0 is the only possible (finite) singularity of the CHGDE, Φ(α; γ; z) is an entire function. We can obtain the series expansion of Φ(α; γ; z) directly from Equation (14.22) and the fact that Φ(α; γ; z) = lim_{β→∞} F(α, β; γ; z/β). The result is

Φ(α; γ; z) = [Γ(γ)/Γ(α)] Σ_{k=0}^∞ [Γ(α + k)/(Γ(k + 1)Γ(γ + k))] z^k.   (14.38)

This is called the confluent hypergeometric series. An argument similar to the one given in the case of the hypergeometric function shows that

14.5.1. Box. The confluent hypergeometric function Φ(α; γ; z) reduces to a polynomial when α is a negative integer.

A second solution of the CHGDE can be obtained, as for the HGDE.
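As a numerical illustration of the series (14.38) and of Box 14.5.1 (a Python sketch with ad hoc helpers, not part of the text), one can verify the elementary case Φ(α; α; z) = e^z, watch the confluence limit F(α, β; γ; z/β) → Φ(α; γ; z) emerge for large β, and see the series terminate when α is a negative integer:

```python
from math import exp, isclose

def Phi(a, c, z, terms=200):
    # truncated confluent hypergeometric series (14.38)
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

def F(a, b, c, z, terms=400):
    # truncated Gauss hypergeometric series, |z| < 1
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

# Phi(a; a; z) = e^z: all parameter ratios cancel term by term
assert isclose(Phi(1.3, 1.3, 0.7), exp(0.7), rel_tol=1e-12)

# confluence: F(a, beta; c; z/beta) approaches Phi(a; c; z) as beta grows
beta = 1.0e6
assert isclose(F(0.8, beta, 2.1, 0.5 / beta), Phi(0.8, 2.1, 0.5), rel_tol=1e-5)

# Box 14.5.1: Phi(-2; 2; z) terminates, giving the polynomial 1 - z + z^2/6
zz = 1.7
assert isclose(Phi(-2, 2, zz), 1 - zz + zz * zz / 6, rel_tol=1e-12)
```

The termination property is exactly what produces the Laguerre polynomials and the hydrogen energy quantization in Example 14.5.2 below.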
If 1 − γ is not an integer, then by taking the limit β → ∞ of Equation (14.24), we obtain the second solution z^{1−γ} Φ(α − γ + 1; 2 − γ; z). Thus, any solution of the CHGDE can be written as a linear combination of Φ(α; γ; z) and z^{1−γ} Φ(α − γ + 1; 2 − γ; z).

14.5.2. Example. The time-independent Schrödinger equation for a central potential, in units in which ℏ = m = 1, is

−½∇²ψ + V(r)ψ = Eψ.

For the case of hydrogen-like atoms, V(r) = −Ze²/r, where Z is the atomic number, and the equation reduces to

∇²ψ + (2E + 2Ze²/r)ψ = 0.
The radial part of this equation is given by Equation (12.14) with f(r) = 2E + 2Ze²/r. Defining u = rR(r), we may write

d²u/dr² + (λ + a/r − b/r²)u = 0,   (14.39)

where λ = 2E, a = 2Ze², and b = l(l + 1). This equation can be further simplified by defining r ≡ kz (k is an arbitrary constant to be determined later):

d²u/dz² + (λk² + ak/z − b/z²)u = 0.

Choosing λk² = −¼ and introducing α ≡ a/(2√(−λ)) yields

d²u/dz² + (−¼ + α/z − b/z²)u = 0.

Equations of this form can be transformed into the CHGDE by making the substitution u(z) = z^μ e^{−νz} f(z). It then follows that

d²f/dz² + (2μ/z − 2ν) df/dz + [−¼ + μ(μ − 1)/z² − 2μν/z + α/z − b/z² + ν²] f = 0.

Choosing ν² = ¼ and μ(μ − 1) = b reduces this equation to

f″ + (2μ/z − 2ν) f′ − [(2μν − α)/z] f = 0,

which is in the form of (14.37). On physical grounds, we expect u(z) → 0 as z → ∞.⁵ Therefore, ν = ½. Similarly, with μ(μ − 1) = b = l(l + 1), we obtain the two possibilities μ = −l and μ = l + 1. Again on physical grounds, we demand that u(0) be finite (the wave function must not blow up at r = 0). This implies⁶ that μ = l + 1. We thus obtain

f″ + [2(l + 1)/z − 1] f′ − [(l + 1 − α)/z] f = 0.

Multiplying by z gives

zf″ + [2(l + 1) − z] f′ − (l + 1 − α) f = 0.

Comparing this with Equation (14.37) shows that f is proportional to Φ(l + 1 − α; 2l + 2; z). Thus, the solution of (14.39) can be written as

u(z) = C z^{l+1} e^{−z/2} Φ(l + 1 − α; 2l + 2; z).

An argument similar to that used in Problem 13.20 will reveal that the product e^{−z/2} Φ(l + 1 − α; 2l + 2; z) will be infinite unless the power series representing Φ terminates (becomes a polynomial). It follows from Box 14.5.1 that this will take place if

l + 1 − α = −N   (14.40)

⁵This is because the volume integral of |ψ|² over all space must be finite. The radial part of this integral is simply the integral of r²R²(r) = u²(r). This latter integral will not be finite unless u(∞) = 0.
⁶Recall that μ is the exponent of z = r/k.
for some integer N ≥ 0. In that case we obtain the Laguerre polynomials

L_N^j(z) = [Γ(N + j + 1)/(Γ(N + 1)Γ(j + 1))] Φ(−N; j + 1; z),

where j = 2l + 1. Condition (14.40) is the quantization rule for the energy levels of a hydrogen-like atom. Writing everything in terms of the original parameters and defining n = N + l + 1 yields, after restoring all the m's and the ℏ's, the energy levels of a hydrogen-like atom:

E = −Z²me⁴/(2ℏ²n²) = −(Z²/2) mc² (α²/n²),

where α = e²/(ℏc) = 1/137 is the fine-structure constant. The radial wave functions can now be written as

R_{n,l}(r) = u_{n,l}(r)/r = C r^l e^{−Zr/(na₀)} Φ(−n + l + 1; 2l + 2; 2Zr/(na₀)),

where a₀ = ℏ²/(me²) = 0.529 × 10⁻⁸ cm is the Bohr radius. ■

Friedrich Wilhelm Bessel (1784-1846) showed no signs of unusual academic ability in school, although he did show a liking for mathematics and physics. He left school intending to become a merchant's apprentice, a desire that soon materialized with a seven-year unpaid apprenticeship with a large mercantile firm in Bremen. The young Bessel proved so adept at accounting and calculation that he was granted a small salary, with raises, after only the first year. An interest in foreign trade led Bessel to study geography and languages at night, astonishingly learning to read and write English in only three months. He also studied navigation in order to qualify as a cargo officer aboard ship, but his innate curiosity soon compelled him to investigate astronomy at a more fundamental level. Still serving his apprenticeship, Bessel learned to observe the positions of stars with sufficient accuracy to determine the longitude of Bremen, checking his results against professional astronomical journals. He then tackled the more formidable problem of determining the orbit of Halley's comet from published observations.
After seeing the close agreement between Bessel's calculations and those of Halley, the German astronomer Olbers encouraged Bessel to improve his already impressive work with more observations. The improved calculations, an achievement tantamount to a modern doctoral dissertation, were published with Olbers's recommendation. Bessel later received appointments with increasing authority at observatories near Bremen and in Königsberg, the latter position being accompanied by a professorship. (The title of doctor, required for the professorship, was granted by the University of Göttingen on the recommendation of Gauss.) Bessel proved himself an excellent observational astronomer. His careful measurements coupled with his mathematical aptitude allowed him to produce accurate positions for a number of previously mapped stars, taking account of instrumental effects, atmospheric refraction, and the position and motion of the observation site. In 1820 he determined the position of the vernal equinox accurate to 0.01 second, in agreement with modern values.
His observation of the variation of the proper motion of the stars Sirius and Procyon led him to posit the existence of nearby, large, low-luminosity stars called dark companions. Between 1821 and 1833 he catalogued the positions of about 75,000 stars, publishing his measurements in detail. One of his most important contributions to astronomy was the determination of the distance to a star using parallax. This method uses triangulation, or the determination of the apparent positions of a distant object viewed from two points a known distance apart, in this case two diametrically opposed points of the Earth's orbit. The angle subtended by the baseline of Earth's orbit, viewed from the star's perspective, is known as the star's parallax. Before Bessel's measurement, stars were assumed to be so distant that their parallaxes were too small to measure, and it was further assumed that bright stars (thought to be nearer) would have the largest parallax. Bessel correctly reasoned that stars with large proper motions were more likely to be nearby ones and selected such a star, 61 Cygni, for his historic measurement. His measured parallax for that star differs by less than 8% from the currently accepted value. Given such an impressive record in astronomy, it seems only fitting that the famous functions that bear Bessel's name grew out of his investigations of perturbations in planetary systems. He showed that such perturbations could be divided into two effects and treated separately: the obvious direct attraction due to the perturbing planet and an indirect effect caused by the sun's response to the perturber's force. The so-called Bessel functions then appear as coefficients in the series treatment of the indirect perturbation.
Although special cases of Bessel functions were discovered by Bernoulli, Euler, and Lagrange, the systematic treatment by Bessel clearly established his preeminence, a fitting tribute to the creator of the most famous functions in mathematical physics.

14.5.1 Bessel Functions

The Bessel differential equation is usually written as

w″ + (1/z)w′ + (1 − ν²/z²)w = 0.   (14.41)

As in the example above, the substitution w = z^μ e^{−ξz} f(z) transforms (14.41) into

d²f/dz² + [(2μ + 1)/z − 2ξ] df/dz + [(μ² − ν²)/z² − ξ(2μ + 1)/z + ξ² + 1] f = 0,

which, if we set μ = ν and ξ = i, reduces to

f″ + [(2ν + 1)/z − 2i] f′ − [(2ν + 1)i/z] f = 0.

Making the further substitution 2iz = t, and multiplying out by t, we obtain

t d²f/dt² + (2ν + 1 − t) df/dt − (ν + ½) f = 0,
which is in the form of (14.37) with α = ν + ½ and γ = 2ν + 1. Thus, solutions of the Bessel equation, Equation (14.41), can be written as constant multiples of z^ν e^{−iz} Φ(ν + ½; 2ν + 1; 2iz). With proper normalization, we define the Bessel function of the first kind of order ν as

J_ν(z) = [1/Γ(ν + 1)] (z/2)^ν e^{−iz} Φ(ν + ½; 2ν + 1; 2iz).   (14.42)

Using Equation (14.38) and the expansion for e^{−iz}, we can show that

J_ν(z) = (z/2)^ν Σ_{k=0}^∞ [(−1)^k/(k! Γ(ν + k + 1))] (z/2)^{2k}.   (14.43)

The second linearly independent solution can be obtained as usual and is proportional to

z^{1−(2ν+1)} (z/2)^ν e^{−iz} Φ(ν + ½ − (2ν + 1) + 1; 2 − (2ν + 1); 2iz) = C (z/2)^{−ν} e^{−iz} Φ(−ν + ½; −2ν + 1; 2iz) = C J_{−ν}(z),

provided that 1 − γ = 1 − (2ν + 1) = −2ν is not an integer. When ν is an integer n, J_{−n}(z) = (−1)^n J_n(z) (see Problem 14.25). Thus, when ν is a noninteger, the most general solution is of the form A J_ν(z) + B J_{−ν}(z).

How do we find a second linearly independent solution when ν is an integer n? We first define

Y_ν(z) = [J_ν(z) cos νπ − J_{−ν}(z)]/sin νπ,   (14.44)

called the Bessel function of the second kind, or the Neumann function. For noninteger ν this is simply a linear combination of the two linearly independent solutions. For integer ν the function is indeterminate. Therefore, we use l'Hôpital's rule and define

Y_n(z) ≡ lim_{ν→n} Y_ν(z) = (1/π) lim_{ν→n} [∂J_ν/∂ν − (−1)^n ∂J_{−ν}/∂ν].

Equation (14.43) yields

∂J_ν/∂ν = J_ν(z) ln(z/2) − (z/2)^ν Σ_{k=0}^∞ [(−1)^k ψ(ν + k + 1)/(k! Γ(ν + k + 1))] (z/2)^{2k},

where ψ(z) = (d/dz) ln Γ(z). Similarly,

∂J_{−ν}/∂ν = −J_{−ν}(z) ln(z/2) + (z/2)^{−ν} Σ_{k=0}^∞ [(−1)^k ψ(−ν + k + 1)/(k! Γ(−ν + k + 1))] (z/2)^{2k}.
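The series (14.43) can be summed directly in a few lines. The Python sketch below (the helper `J` is ours, not part of the text) checks the half-odd-integer orders J_{±1/2}, which reduce to elementary functions (compare Problems 14.30 and 14.31), and the three-term recursion J_{ν−1}(z) + J_{ν+1}(z) = (2ν/z)J_ν(z) derived at the end of this section:

```python
from math import gamma, isclose, sqrt, pi, sin, cos

def J(v, z, terms=60):
    """Bessel function of the first kind, summed from the series (14.43)."""
    return sum((-1) ** k / (gamma(k + 1) * gamma(v + k + 1))
               * (z / 2) ** (2 * k + v)
               for k in range(terms))

z = 2.3
# half-odd-integer orders reduce to elementary functions
assert isclose(J(0.5, z), sqrt(2 / (pi * z)) * sin(z), rel_tol=1e-12)
assert isclose(J(-0.5, z), sqrt(2 / (pi * z)) * cos(z), rel_tol=1e-12)

# three-term recursion: J_{v-1}(z) + J_{v+1}(z) = (2v/z) J_v(z)
v = 1.4
assert isclose(J(v - 1, z) + J(v + 1, z), (2 * v / z) * J(v, z), rel_tol=1e-10)
```

The series converges for all finite z, so this naive summation is adequate for moderate |z|; for large arguments one would switch to asymptotic forms instead.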
Substituting these expressions in the definition of Y_n(z) and using J_{−n}(z) = (−1)^n J_n(z), we obtain

Y_n(z) = (2/π) J_n(z) ln(z/2) − (1/π)(z/2)^n Σ_{k=0}^∞ (−1)^k [ψ(n + k + 1)/(k! Γ(n + k + 1))] (z/2)^{2k}
       − ((−1)^n/π)(z/2)^{−n} Σ_{k=0}^∞ (−1)^k [ψ(k − n + 1)/(k! Γ(k − n + 1))] (z/2)^{2k}.

The natural log term is indicative of the solution suggested by Theorem 14.2.6. Since Y_ν(z) is linearly independent of J_ν(z) for any ν, integer or noninteger, it is convenient to consider {J_ν(z), Y_ν(z)} as a basis of solutions for the Bessel equation. Another basis of solutions is defined as

H_ν^{(1)}(z) ≡ J_ν(z) + iY_ν(z),   (14.45)
H_ν^{(2)}(z) ≡ J_ν(z) − iY_ν(z),   (14.46)

which are called Bessel functions of the third kind, or Hankel functions.

Replacing z by iz in the Bessel equation yields

w″ + (1/z)w′ − (1 + ν²/z²)w = 0,

whose basis of solutions consists of multiples of J_ν(iz) and J_{−ν}(iz). Thus, the modified Bessel functions of the first kind are defined as

I_ν(z) ≡ e^{−iπν/2} J_ν(iz) = (z/2)^ν Σ_{k=0}^∞ [1/(k! Γ(ν + k + 1))] (z/2)^{2k}.

Similarly, the modified Bessel functions of the second kind are defined as

K_ν(z) ≡ (π/2) [I_{−ν}(z) − I_ν(z)]/sin νπ.

When ν is an integer n, I_n = I_{−n}, and K_n is indeterminate. Thus, we define K_n(z) as lim_{ν→n} K_ν(z). This gives

K_n(z) = ((−1)^n/2) lim_{ν→n} [∂I_{−ν}/∂ν − ∂I_ν/∂ν],

which has a corresponding power-series representation.
We can obtain a recurrence relation for solutions of the Bessel equation as follows. If Z_ν(z) is a solution of order ν, then (see Problem 14.28)

Z_{ν+1} = C₁ z^ν (d/dz)[z^{−ν} Z_ν(z)]

and

Z_{ν−1} = C₂ z^{−ν} (d/dz)[z^ν Z_ν(z)].

If the constants are chosen in such a way that Z_ν, Z_{−ν}, Z_{ν+1}, and Z_{ν−1} satisfy their appropriate series expansions, then C₁ = −1 and C₂ = 1. Carrying out the differentiation in the equations for Z_{ν+1} and Z_{ν−1}, we obtain

Z_{ν+1} = (ν/z) Z_ν − dZ_ν/dz,
Z_{ν−1} = (ν/z) Z_ν + dZ_ν/dz.   (14.47)

Adding these two equations yields the recursion relation

Z_{ν−1}(z) + Z_{ν+1}(z) = (2ν/z) Z_ν(z),   (14.48)

where Z_ν(z) can be any of the three kinds of Bessel functions.

14.6 Problems

14.1. Show that the solution of w′ + w/z² = 0 has an essential singularity at z = 0.

14.2. Derive the recursion relation of Equation (14.7) and express it in terms of the indicial polynomial, as in Equation (14.9).

14.3. Find the characteristic exponent associated with the solution of w″ + p(z)w′ + q(z)w = 0 at an ordinary point [a point at which p(z) and q(z) have no poles]. How many solutions can you find?

14.4. The Laplace equation in electrostatics, when separated in spherical coordinates, yields a DE in the radial coordinate given by

(d/dx)(x² dy/dx) − n(n + 1)y = 0   for n ≥ 0.

Starting with an infinite series of the form (14.6), show that the two independent solutions of this DE are of the form x^n and x^{−n−1}.

14.5. Find the indicial polynomial, characteristic exponents, and recursion relation at both of the regular singular points of the Legendre equation,

w″ − [2z/(1 − z²)]w′ + [a/(1 − z²)]w = 0.

What is a₁, the coefficient of the Laurent expansion, for the point z = +1?
14.6. Show that the substitution z = 1/t transforms Equation (14.13) into Equation (14.14).

14.7. Obtain the indicial polynomial of Equation (14.14) for expansion about t = 0.

14.8. Show that the Riemann DE represents the most general second-order Fuchsian DE.

14.9. Derive the indicial equation for the Riemann DE.

14.10. Show that the transformation v(z) = z^λ (z − 1)^μ w(z) changes the pairs of characteristic exponents (λ₁, λ₂), (μ₁, μ₂), and (ν₁, ν₂) for the Riemann DE to (λ₁ + λ, λ₂ + λ), (μ₁ + μ, μ₂ + μ), and (ν₁ − λ − μ, ν₂ − λ − μ).

14.11. Go through the steps leading to Equations (14.24), (14.25), and (14.26).

14.12. Show that the complete elliptic integral of the first kind, defined as

K(z) = ∫₀^{π/2} dθ/√(1 − z² sin²θ),

can be expressed as (π/2)F(½, ½; 1; z²).

14.13. By differentiating the hypergeometric series, show that

dⁿ/dzⁿ F(α, β; γ; z) = [Γ(α + n)Γ(β + n)Γ(γ)/(Γ(α)Γ(β)Γ(γ + n))] F(α + n, β + n; γ + n; z).

14.14. Use direct substitution in the hypergeometric series to show that

F(½, ½; 3/2; z²) = (1/z) sin⁻¹ z,
F(−α, β; β; −z) = (1 + z)^α,
F(1, 1; 2; −z) = (1/z) ln(1 + z).

14.15. Show that the substitution v(z) = z^r w(1/z) [see Equation (14.28)] transforms the HGDE into Equation (14.29).

14.16. Consider the function v(z) ≡ z^r (1 − z)^s F(α₁, β₁; γ₁; 1/z) and assume that it is a solution of the HGDE. Find a relation among r, s, α₁, β₁, and γ₁ such that v(z) is written in terms of three parameters rather than five. In particular, show that one possibility is

v(z) = z^{α−γ}(1 − z)^{γ−α−β} F(γ − α, 1 − α; 1 + β − α; 1/z).

Find all such possibilities.
14.17. Show that the Jacobi functions are related to the hypergeometric functions.

14.18. Derive the expression for the Jacobi function of the second kind as given in Equation (14.33).

14.19. Show that z = ∞ is not a regular singular point of the CHGDE.

14.20. Derive the confluent hypergeometric series from the hypergeometric series.

14.21. Show that the Weber-Hermite equation, u″ + (ν + ½ − ¼z²)u = 0, can be transformed into the CHGDE. Hint: Make the substitution u(z) = exp(−¼z²)v(z).

14.22. The linear combination

ψ(α; γ; z) = [Γ(1 − γ)/Γ(α − γ + 1)] Φ(α; γ; z) + [Γ(γ − 1)/Γ(α)] z^{1−γ} Φ(α − γ + 1; 2 − γ; z)

is also a solution of the CHGDE. Show that the Hermite polynomials can be written as

H_n(z/√2) = 2^n ψ(−n/2; ½; z²/2).

14.23. Verify that the error function erf(z) = ∫₀^z e^{−t²} dt satisfies the relation erf(z) = z Φ(½; 3/2; −z²).

14.24. Derive the series expansion of the Bessel function of the first kind from that of the confluent hypergeometric series and the expansion of the exponential. Check your answer by obtaining the same result by substituting the power series directly in the Bessel DE.

14.25. Show that J_{−n}(z) = (−1)^n J_n(z). Hint: Let ν = −n in the expansion of J_ν(z) and use Γ(m) = ∞ for a nonpositive integer m.

14.26. In a potential-free region, the radial part of the Schrödinger equation reduces to

d²R/dr² + (2/r) dR/dr + [λ − a/r²]R = 0.

Write the solutions of this DE in terms of Bessel functions. Hint: Substitute R = u/√r. These solutions are called spherical Bessel functions.

14.27. Theorem 14.2.6 states that under certain conditions, linearly independent solutions of the SOLDE at regular singular points exist even though the difference between the characteristic exponents is an integer. An example is the case of Bessel functions of half-odd-integer orders.
Evaluate the Wronskian of the two linearly independent solutions, J_ν and J_{−ν}, of the Bessel equation and show that it vanishes only if ν is an integer. This shows, in particular, that J_{n+1/2} and J_{−n−1/2} are linearly independent. Hint: Consider the value of the Wronskian at z = 0, and use the formula Γ(ν)Γ(1 − ν) = π/sin νπ.
14.28. Show that z^{±ν}(d/dz)[z^{∓ν} Z_ν(z)] is a solution of the Bessel equation of order ν ± 1 if Z_ν is a solution of order ν.

14.29. Use the recursion relation of Equation (14.47) to prove that

[(1/z)(d/dz)]^m [z^ν Z_ν(z)] = z^{ν−m} Z_{ν−m}(z),
[(1/z)(d/dz)]^m [z^{−ν} Z_ν(z)] = (−1)^m z^{−ν−m} Z_{ν+m}(z).

14.30. Using the series expansion of the Bessel function, write J_{1/2}(z) and J_{−1/2}(z) in terms of elementary functions. Hint: First show that Γ(k + 3/2) = √π (2k + 1)!/(k! 2^{2k+1}).

14.31. From the results of the previous two problems, derive the relations

J_{−n−1/2}(z) = √(2/π) z^{n+1/2} [(1/z)(d/dz)]^n (cos z/z),
J_{n+1/2}(z) = √(2/π) z^{n+1/2} [−(1/z)(d/dz)]^n (sin z/z).

14.32. Obtain the following integral identities:

(a) ∫ z^{ν+1} J_ν(z) dz = z^{ν+1} J_{ν+1}(z).
(b) ∫ z^{−ν+1} J_ν(z) dz = −z^{−ν+1} J_{ν−1}(z).
(c) ∫ z^{μ+1} J_ν(z) dz = z^{μ+1} J_{ν+1}(z) + (μ − ν) z^μ J_ν(z) − (μ² − ν²) ∫ z^{μ−1} J_ν(z) dz,

and evaluate (d) ∫ z³ J₀(z) dz. Hint: For (c) write z^{μ+1} = z^{μ−ν} z^{ν+1} and use integration by parts.

14.33. Use Theorem 14.2.6 and the fact that J_n(z) is entire to show that for integer n, a second solution to the Bessel equation exists and can be written as Y_n(z) = J_n(z)[f_n(z) + c_n ln z], where f_n(z) is analytic about z = 0.

14.34. (a) Show that the Wronskian W(J_ν, Z; z) of J_ν and any other solution Z of the Bessel equation satisfies the equation

(d/dz)[z W(J_ν, Z; z)] = 0.
(b) For some constant A, show that

(d/dz)[Z/J_ν] = W(z)/J_ν²(z) = A/(z J_ν²(z)).

(c) Show that the general second solution of the Bessel equation can be written as

Z(z) = J_ν(z)[B + A ∫ dz/(z J_ν²(z))].

14.35. Spherical Bessel functions are defined by

f_l(z) ≡ √(π/(2z)) Z_{l+1/2}(z).

Let f_l(z) denote a spherical Bessel function "of some kind." By direct differentiation and substitution in the Bessel equation, show that

(a) (d/dz)[z^{l+1} f_l(z)] = z^{l+1} f_{l−1}(z),
(b) (d/dz)[z^{−l} f_l(z)] = −z^{−l} f_{l+1}(z).

(c) Combine the results of parts (a) and (b) to derive the recursion relations

f_{l−1}(z) + f_{l+1}(z) = [(2l + 1)/z] f_l(z),
l f_{l−1}(z) − (l + 1) f_{l+1}(z) = (2l + 1) df_l/dz.

14.36. Show that

W(J_ν, Y_ν; z) = 2/(πz),  W(H_ν^{(1)}, H_ν^{(2)}; z) = 4/(iπz).

Hint: Use Problem 14.34.

14.37. Verify the following relations:

(a) Y_{n+1/2}(z) = (−1)^{n+1} J_{−n−1/2}(z),  Y_{−n−1/2}(z) = (−1)^n J_{n+1/2}(z).
(b) Y_{−ν}(z) = sin νπ J_ν(z) + cos νπ Y_ν(z) = [J_ν(z) − cos νπ J_{−ν}(z)]/sin νπ.
(c) Y_{−n}(z) = (−1)^n Y_n(z) in the limit ν → n in part (b).

14.38. Use the recurrence relation for the Bessel function to show that J₁(z) = −J₀′(z).

14.39. Let u = J_ν(λz) and v = J_ν(μz). Multiply the Bessel DE for u by v/z and that of v by u/z. Subtract the two equations to obtain

(λ² − μ²) z u v = (d/dz)[z(u dv/dz − v du/dz)].

(a) Write the above equation in terms of J_ν(λz) and J_ν(μz) and integrate both sides with respect to z.
(b) Now divide both sides by λ² − μ² and take the limit as μ → λ. You will need to use L'Hôpital's rule.
(c) Substitute for J_ν″(λz) from the Bessel DE and simplify to get
∫ z [J_ν(λz)]² dz = (z²/2) {[J_ν′(λz)]² + (1 − ν²/(λ²z²)) [J_ν(λz)]²}.
(d) Finally, let λ = x_{νn}/a, where x_{νn} is the nth root of J_ν, and use Equation (14.47) to arrive at
∫_0^a z J_ν²(x_{νn} z/a) dz = (a²/2) J_{ν+1}²(x_{νn}).

14.40. The generating function for Bessel functions of integer order is exp[(z/2)(t − 1/t)]. To see this, rewrite the generating function as e^{zt/2} e^{−z/(2t)}, expand both factors, and write the product as powers of t. Now show that the coefficient of t^n is simply J_n(z). Finally, use J_{−n}(z) = (−1)^n J_n(z) to derive the formula exp[(z/2)(t − 1/t)] = Σ_{n=−∞}^∞ J_n(z) t^n.

14.41. Make the substitutions z = βt^γ and w = t^α u to transform the Bessel DE into
t² d²u/dt² + (2α + 1) t du/dt + (β²γ²t^{2γ} + α² − ν²γ²) u = 0.
Now show that Airy's differential equation ü − tu = 0 has solutions of the form J_{1/3}((2/3) i t^{3/2}) and J_{−1/3}((2/3) i t^{3/2}).

14.42. Show that the general solution of
d²w/dt² + ((e^{2/t} − ν²)/t⁴) w = 0
is w = t[A J_ν(e^{1/t}) + B Y_ν(e^{1/t})].

14.43. Transform dw/dz + w² + z^m = 0 by making the substitution w = (d/dz) ln v. Now make the further substitutions
v = u√z and t = (2/(m+2)) z^{1+(1/2)m}
to show that the new DE can be transformed into a Bessel equation of order 1/(m + 2).

14.44. Starting with the relation
exp[(x/2)(t − 1/t)] exp[(y/2)(t − 1/t)] = exp[((x + y)/2)(t − 1/t)]
and the fact that the exponential function is the generating function for J_n(z), prove the "addition theorem" for Bessel functions:
J_n(x + y) = Σ_{k=−∞}^∞ J_k(x) J_{n−k}(y).
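Several of these identities lend themselves to quick numerical spot checks. The sketch below is a minimal illustration in plain Python, using only the power series for J_ν from Chapter 14 (the helper name `bessel_j` and the truncation limits are our own choices, not part of the text); it verifies the half-integer relation of Problem 14.31 for n = 0 and the addition theorem of Problem 14.44:

```python
import math

def bessel_j(v, z, terms=40):
    # Power-series definition of J_v(z) from Chapter 14;
    # negative integer orders are handled via J_{-n} = (-1)^n J_n.
    if v < 0 and v == int(v):
        return (-1) ** int(-v) * bessel_j(-v, z, terms)
    return sum((-1) ** k / (math.gamma(k + 1) * math.gamma(v + k + 1))
               * (z / 2) ** (2 * k + v) for k in range(terms))

z = 1.7
# Problem 14.31 with n = 0: J_{1/2}(z) = sqrt(2/(pi z)) sin z
assert abs(bessel_j(0.5, z) - math.sqrt(2 / (math.pi * z)) * math.sin(z)) < 1e-12

# Problem 14.44: J_n(x + y) = sum_k J_k(x) J_{n-k}(y), truncated at |k| = 25
x, y, n = 0.8, 1.3, 2
rhs = sum(bessel_j(k, x) * bessel_j(n - k, y) for k in range(-25, 26))
assert abs(bessel_j(n, x + y) - rhs) < 1e-12
```

The truncation at |k| = 25 suffices because J_k(x) decays factorially in k for fixed argument.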
Additional Reading
1. Birkhoff, G. and Rota, G.-C. Ordinary Differential Equations, 3rd ed., Wiley, 1978. The first two sections of this chapter closely follow their presentation.
2. Dennery, P. and Krzywicki, A. Mathematics for Physicists, Harper and Row, 1967.
3. Watson, G. A Treatise on the Theory of Bessel Functions, 2nd ed., Cambridge University Press, 1952. As the name suggests, the definitive text and reference on Bessel functions.
15  Integral Transforms and Differential Equations

The discussion in Chapter 14 introduced a general method of solving differential equations by power series (also called the Frobenius method), which gives a solution that converges within a circle of convergence. In general, this circle of convergence may be small; however, the function represented by the power series can be analytically continued using methods presented in Chapter 11.

This chapter, which is a bridge between differential equations and operators on Hilbert spaces (to be developed in the next part), introduces another method of solving DEs, one that uses integral transforms and incorporates the analytic continuation automatically. The integral transform of a function v is another function u given by
u(z) = ∫_C K(z, t) v(t) dt,   (15.1)
where C is a convenient contour and K(z, t), called the kernel of the integral transform, is an appropriate function of two complex variables.

15.0.1. Example. Let us consider some examples of integral transforms.
(a) The Fourier transform is familiar from the discussion of Chapter 8. The kernel is K(x, y) = e^{ixy}.
(b) The Laplace transform is used frequently in electrical engineering. Its kernel is K(x, y) = e^{−xy}.
(c) The Euler transform has the kernel K(x, y) = (x − y)^ν.
434 15. INTEGRAL TRANSFORMS AND DIFFERENTIAL EQUATIONS

(d) The Mellin transform has the kernel K(x, y) = G(x^y), where G is an arbitrary function. Most of the time K(x, y) is taken to be simply x^y.
(e) The Hankel transform has the kernel K(x, y) = y J_n(xy), where J_n is the nth-order Bessel function.
(f) A transform that is useful in connection with the Bessel equation has the kernel K(x, y) = (x/2)^ν e^{y − x²/(4y)}.

The idea behind using an integral transform is to write the solution u(z) of a DE in z in terms of an integral such as Equation (15.1) and choose v and the kernel in such a way as to render the DE more manageable. Let L_z be a differential operator (DO) in the variable z. We want to determine u(z) such that L_z[u] = 0, or equivalently, such that ∫_C L_z[K(z, t)] v(t) dt = 0. Suppose that we can find M_t, a DO in the variable t, such that L_z[K(z, t)] = M_t[K(z, t)]. Then the DE becomes ∫_C {M_t[K(z, t)]} v(t) dt = 0. If C has a and b as initial and final points (a and b may be equal), then the Lagrange identity [see Equation (13.24)] yields
0 = L_z[u] = ∫_a^b K(z, t) M_t†[v(t)] dt + Q[K, v]|_a^b,
where Q[K, v] is the "surface term." If v(t) and the contour C (or a and b) are chosen in such a way that
Q[K, v]|_a^b = 0 and M_t†[v(t)] = 0,   (15.2)
the problem is solved. The trick is to find an M_t such that Equation (15.2) is easier to solve than the original equation, L_z[u] = 0. This in turn demands a clever choice of the kernel K(z, t). This chapter discusses how to solve some common differential equations of mathematical physics using the general idea presented above.

15.1 Integral Representation of the Hypergeometric Function

Recall that for the hypergeometric function, the differential operator is
L_z = z(1 − z) d²/dz² + [γ − (α + β + 1)z] d/dz − αβ.
For such operators, whose coefficient functions are polynomials, the proper choice for K(z, t) is the Euler kernel, (z − t)^s. Applying L_z to this kernel and
rearranging terms, we obtain
L_z[K(z, t)] = {z²[−s(s−1) − s(α+β+1) − αβ] + z[s(s−1) + sγ + st(α+β+1) + 2αβt] − γst − αβt²}(z − t)^{s−2}.   (15.3)
Note that except for a multiplicative constant, K(z, t) is symmetric in z and t. This suggests that the general form of M_t may be chosen to be the same as that of L_z except for the interchange of z and t. If we can manipulate the parameters in such a way that M_t becomes simple, then we have a chance of solving the problem. For instance, if M_t has the form of L_z with the constant term absent, then the hypergeometric DE effectively reduces to a FODE (in dv/dt). Let us exploit this possibility.
The general form of the M_t that we are interested in is
M_t = p_2(t) d²/dt² + p_1(t) d/dt,
i.e., with no p_0 term. By applying M_t to K(z, t) = (z − t)^s and setting the result equal to the RHS of Equation (15.3), we obtain
s(s−1)p_2 − p_1 s z + p_1 s t = z²[−s(s−1) − s(α+β+1) − αβ] + z[s(s−1) + sγ + st(α+β+1) + 2αβt] − γst − αβt²,
for which the coefficients of equal powers of z on both sides must be equal:
−s(s−1) − s(α+β+1) − αβ = 0 ⟹ s = −α or s = −β,
−p_1 s = s(s−1) + sγ + st(α+β+1) + 2αβt,
s(s−1)p_2 + p_1 s t = −γst − αβt².
If we choose s = −α (s = −β leads to an equivalent representation), the coefficient functions of M_t will be completely determined. In fact, the second equation gives p_1(t), and the third determines p_2(t). We finally obtain
p_1(t) = α + 1 − γ + t(β − α − 1),   p_2(t) = t − t²,
and
M_t = (t − t²) d²/dt² + [α + 1 − γ + t(β − α − 1)] d/dt,   (15.4)
which, according to Equation (13.20), yields the following DE for the adjoint:
M_t†[v] = d²/dt²[(t − t²)v] − d/dt{[α − γ + 1 + t(β − α − 1)]v} = 0.   (15.5)
The solution to this equation is v(t) = C t^{α−γ}(t − 1)^{γ−β−1} (see Problem 15.5). We also need the surface term Q[K, v] in the Lagrange identity (see Problem 15.6 for details):
Q[K, v](t) = Cα t^{α−γ+1}(t − 1)^{γ−β}(z − t)^{−α−1}.
Finally, we need a specification of the contour. For different contours we will get different solutions. The contour chosen must, of course, have the property that Q[K, v] vanishes as a result of the integration. There are two possibilities: Either the contour is closed [a = b in (15.2)], or a ≠ b but Q[K, v] takes on the same value at a and at b. Let us consider the second of these possibilities. Clearly, Q[K, v](t) vanishes at t = 1 if Re(γ) > Re(β). Also, as t → ∞,
Q[K, v](t) → (−1)^{−α−1} Cα t^{α−γ+1} t^{γ−β} t^{−α−1} = (−1)^{−α−1} Cα t^{−β},
which vanishes if Re(β) > 0. We thus take a = 1 and b = ∞, and assume that Re(γ) > Re(β) > 0. It then follows that
u(z) = ∫_a^b K(z, t) v(t) dt = C′ ∫_1^∞ (t − z)^{−α} t^{α−γ}(t − 1)^{γ−β−1} dt.
The constant C′ can be determined to be Γ(γ)/[Γ(β)Γ(γ − β)] (see Problem 15.7). Therefore,
u(z) ≡ F(α, β; γ; z) = (Γ(γ)/[Γ(β)Γ(γ − β)]) ∫_1^∞ (t − z)^{−α} t^{α−γ}(t − 1)^{γ−β−1} dt.   (15.6)
It is customary to change the variable of integration from t to 1/t. The resulting expression is called the Euler formula for the hypergeometric function:
F(α, β; γ; z) = (Γ(γ)/[Γ(β)Γ(γ − β)]) ∫_0^1 (1 − tz)^{−α} t^{β−1}(1 − t)^{γ−β−1} dt.   (15.7)
Note that the term (1 − tz)^{−α} in the integral has two branch points in the z-plane, one at z = 1/t and the other at z = ∞. Therefore, we cut the z-plane from z_1 = 1/t, a point on the positive real axis, to z_2 = ∞. Since 0 ≤ t ≤ 1, z_1 is somewhere in the interval [1, ∞). To ensure that the cut is applicable for all values of t, we take z_1 = 1 and cut the plane along the positive real axis. It follows that Equation (15.7) is well behaved as long as
0 < arg(1 − z) < 2π.   (15.8)
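Euler's formula (15.7) lends itself to a direct numerical check against the hypergeometric series of Chapter 14. The sketch below is a minimal illustration in plain Python (the helper names, Simpson's-rule quadrature, and the parameter values, chosen so that Re γ > Re β > 0 and |z| < 1, are our own, not from the text):

```python
import math

def hyp_series(a, b, c, z, terms=80):
    # F(a, b; c; z) = sum_n (a)_n (b)_n / ((c)_n n!) z^n
    s, term = 0.0, 1.0
    for n in range(terms):
        s += term
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return s

def hyp_euler(a, b, c, z, N=2000):
    # Composite Simpson's rule on Euler's integral (15.7); needs c > b > 0.
    f = lambda t: t ** (b - 1) * (1 - t) ** (c - b - 1) * (1 - t * z) ** (-a)
    h = 1.0 / N
    s = f(0.0) + f(1.0)
    s += 4 * sum(f((2 * i - 1) * h) for i in range(1, N // 2 + 1))
    s += 2 * sum(f(2 * i * h) for i in range(1, N // 2))
    integral = s * h / 3
    return math.gamma(c) / (math.gamma(b) * math.gamma(c - b)) * integral

a, b, c, z = 0.5, 2.0, 4.0, 0.3
assert abs(hyp_series(a, b, c, z) - hyp_euler(a, b, c, z)) < 1e-9
```

With integer exponents b − 1 and c − b − 1 the integrand is smooth on [0, 1], so plain Simpson quadrature already reproduces the series to high accuracy.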
We could choose a different contour, which, in general, would lead to a different solution. The following example illustrates one such choice.

15.1.1. Example. First note that Q[K, v] vanishes at t = 0 and t = 1 as long as Re(γ) > Re(β) and Re(α) > Re(γ) − 1. Hence, we can choose the contour to start at t = 0 and end at t = 1. We then have
w(z) = C″ z^{−α} ∫_0^1 t^{α−γ}(1 − t)^{γ−β−1}(1 − t/z)^{−α} dt.   (15.9)
To see the relation between w(z) and the hypergeometric function, expand (1 − t/z)^{−α} in the integral to get
w(z) = C″ z^{−α} Σ_{n=0}^∞ [Γ(α+n)/(Γ(α)Γ(n+1))] (1/z)^n ∫_0^1 t^{α+n−γ}(1 − t)^{γ−β−1} dt.   (15.10)
Now evaluate the integral by changing t to 1/t and using Equations (11.19) and (11.17). This changes the integral to
∫_1^∞ t^{−α−n−1+β}(t − 1)^{γ−β−1} dt = Γ(α+n+1−γ)Γ(γ−β)/Γ(α+n+1−β).
Substituting this in Equation (15.10), we obtain
w(z) = (C″/Γ(α)) Γ(γ−β) z^{−α} Σ_{n=0}^∞ [Γ(α+n)Γ(α+n+1−γ)/(Γ(α+n+1−β)Γ(n+1))] (1/z)^n
= (C″/Γ(α)) Γ(γ−β) z^{−α} [Γ(α)Γ(α+1−γ)/Γ(α+1−β)] F(α, α−γ+1; α−β+1; 1/z),
where we have used the hypergeometric series of Chapter 14. Choosing
C″ = Γ(α+1−β)/[Γ(γ−β)Γ(α+1−γ)]
yields w(z) = z^{−α} F(α, α−γ+1; α−β+1; 1/z), which is one of the solutions of the hypergeometric DE [Equation (14.30)].

15.2 Integral Representation of the Confluent Hypergeometric Function

Having obtained the integral representation of the hypergeometric function, we can readily get the integral representation of the confluent hypergeometric function by taking the proper limit. It was shown in Chapter 14 that Φ(α, γ; z) = lim_{β→∞} F(α, β; γ; z/β). This suggests taking the limit of Equation (15.7). The presence of the gamma functions with β as their arguments complicates things, but on the other hand, the symmetry of the hypergeometric function can be utilized
to our advantage. Thus, we may write
Φ(α, γ; z) = lim_{β→∞} F(α, β; γ; z/β) = lim_{β→∞} F(β, α; γ; z/β)
= lim_{β→∞} (Γ(γ)/[Γ(α)Γ(γ−α)]) ∫_0^1 (1 − tz/β)^{−β} t^{α−1}(1 − t)^{γ−α−1} dt,
so that
Φ(α, γ; z) = (Γ(γ)/[Γ(α)Γ(γ−α)]) ∫_0^1 e^{zt} t^{α−1}(1 − t)^{γ−α−1} dt,   (15.11)
because the limit of the first term in the integral is simply e^{tz}. Note that the condition Re(γ) > Re(α) > 0 must still hold here.
Integral transforms are particularly useful in determining the asymptotic behavior of functions. We shall use them in deriving asymptotic formulas for Bessel functions later on, and Problem 15.10 derives the asymptotic formula for the confluent hypergeometric function.

15.3 Integral Representation of Bessel Functions

Choosing the kernel, the contour, and the function v(t) that lead to an integral representation of a function is an art, and the nineteenth century produced many masters of it. A particularly popular theme in such endeavors was the Bessel equation and Bessel functions. This section considers the integral representations of Bessel functions. The most effective kernel for the Bessel DE is
K(z, t) = (z/2)^ν exp(t − z²/(4t)).
When the Bessel DO L_z ≡ d²/dz² + (1/z) d/dz + (1 − ν²/z²) acts on K(z, t), it yields
L_z K(z, t) = (−(ν+1)/t + 1 + z²/(4t²)) (z/2)^ν e^{t − z²/(4t)} = (d/dt − (ν+1)/t) K(z, t).
Thus, M_t = d/dt − (ν+1)/t, and Equation (13.20) gives
M_t†[v(t)] = −dv/dt − ((ν+1)/t) v = 0,
whose solution, including the arbitrary constant of integration k, is v(t) = k t^{−ν−1}. When we substitute this solution and the kernel in the surface term of the Lagrange identity, Equation (13.24), we obtain
Q[K, v](t) = p_1 K(z, t)v(t) = k (z/2)^ν t^{−ν−1} e^{t − z²/(4t)}.
Figure 15.1 The contour C in the t-plane used in evaluating J_ν(z).

A contour in the t-plane that ensures the vanishing of Q[K, v] for all values of ν starts at t = −∞, comes to the origin, orbits it on an arbitrary circle, and finally goes back to t = −∞ (see Figure 15.1). Such a contour is possible because of the factor e^t in the expression for Q[K, v]. We thus can write
J_ν(z) = k (z/2)^ν ∫_C t^{−ν−1} e^{t − z²/(4t)} dt.   (15.12)
Note that the integrand has a cut along the negative real axis due to the factor t^{−ν−1}. If ν is an integer, the cut shrinks to a pole at t = 0. The constant k must be determined in such a way that the above expression for J_ν(z) agrees with the series representation obtained in Chapter 14. It can be shown (see Problem 15.11) that k = 1/(2πi). Thus, we have
J_ν(z) = (1/(2πi)) (z/2)^ν ∫_C t^{−ν−1} e^{t − z²/(4t)} dt.
It is more convenient to take the factor (z/2)^ν into the integral, introduce a new integration variable u = 2t/z, and rewrite the preceding equation as
J_ν(z) = (1/(2πi)) ∫_C u^{−ν−1} e^{(z/2)(u − 1/u)} du.   (15.13)
This result is valid as long as Re(zu) < 0 when u → −∞ on the negative real axis; that is, Re(z) must be positive for Equation (15.13) to work.
An interesting result can be obtained from Equation (15.13) when ν is an integer. In that case the only singularity will be at the origin, so the contour can be taken to be a circle about the origin. This yields
J_n(z) = (1/(2πi)) ∮_C u^{−n−1} e^{(z/2)(u − 1/u)} du,
which is the nth coefficient of the Laurent series expansion of exp[(z/2)(u − 1/u)] about the origin. We thus have this important result:
e^{(z/2)(t − 1/t)} = Σ_{n=−∞}^∞ J_n(z) t^n.   (15.14)
The function exp[(z/2)(t − 1/t)] is therefore appropriately called the generating function for Bessel functions of integer order (see also Problem 14.40). Equation (15.14) can be useful in deriving relations for such Bessel functions, as the following example shows.

15.3.1. Example. Let us rewrite the LHS of (15.14) as e^{zt/2} e^{−z/(2t)}, expand the exponentials, and collect terms to obtain
e^{(z/2)(t − 1/t)} = e^{zt/2} e^{−z/(2t)} = Σ_{m=0}^∞ (1/m!) (z/2)^m t^m Σ_{n=0}^∞ (1/n!) (−z/2)^n t^{−n}
= Σ_{m=0}^∞ Σ_{n=0}^∞ [(−1)^n/(m! n!)] (z/2)^{m+n} t^{m−n}.
If we let m − n = k, change the m summation to k, and note that k goes from −∞ to ∞, we get
e^{(z/2)(t − 1/t)} = Σ_{k=−∞}^∞ Σ_{n=0}^∞ [(−1)^n/((n+k)! n!)] (z/2)^{2n+k} t^k
= Σ_{k=−∞}^∞ [(z/2)^k Σ_{n=0}^∞ ((−1)^n/(Γ(n+k+1)Γ(n+1))) (z/2)^{2n}] t^k.
Comparing this equation with Equation (15.14) yields the familiar expansion for the Bessel function:
J_k(z) = (z/2)^k Σ_{n=0}^∞ [(−1)^n/(Γ(n+k+1)Γ(n+1))] (z/2)^{2n}.
We can also obtain a recurrence relation for J_n(z). Differentiating both sides of Equation (15.14) with respect to t yields
(z/2)(1 + 1/t²) e^{(z/2)(t − 1/t)} = Σ_{n=−∞}^∞ n J_n(z) t^{n−1}.   (15.15)
Using Equation (15.14) on the LHS gives
(z/2) Σ_{n=−∞}^∞ J_n(z) t^n + (z/2) Σ_{n=−∞}^∞ J_n(z) t^{n−2}
= (z/2) Σ_{n=−∞}^∞ J_{n−1}(z) t^{n−1} + (z/2) Σ_{n=−∞}^∞ J_{n+1}(z) t^{n−1},   (15.16)
where we substituted n − 1 for n in the first sum and n + 1 for n in the second. Equating the coefficients of equal powers of t on the LHS and the RHS of Equations (15.15) and (15.16), we get
n J_n(z) = (z/2)[J_{n−1}(z) + J_{n+1}(z)],
which was obtained by a different method in Chapter 14 [see Eq. (14.48)].
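Both results of this example are easy to confirm numerically. The sketch below is a minimal plain-Python illustration (the helper name `jn`, the sample point, and the truncation limits are our own choices); it sums the series of Equation (15.14) at a sample point and checks the recurrence n J_n(z) = (z/2)[J_{n−1}(z) + J_{n+1}(z)]:

```python
import math

def jn(n, z, terms=40):
    # Integer-order series from the example; J_{-n} = (-1)^n J_n.
    if n < 0:
        return (-1) ** (-n) * jn(-n, z, terms)
    return sum((-1) ** k / (math.factorial(k) * math.factorial(k + n))
               * (z / 2) ** (2 * k + n) for k in range(terms))

z, t = 1.1, 0.7
# Generating function (15.14), truncated at |n| = 30
lhs = math.exp((z / 2) * (t - 1 / t))
rhs = sum(jn(n, z) * t ** n for n in range(-30, 31))
assert abs(lhs - rhs) < 1e-12

# Recurrence [Eq. (14.48)]: n J_n(z) = (z/2)[J_{n-1}(z) + J_{n+1}(z)]
n = 3
assert abs(n * jn(n, z) - (z / 2) * (jn(n - 1, z) + jn(n + 1, z))) < 1e-14
```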
Figure 15.2 The contour C′ in the w-plane used in evaluating J_ν(z).

We can start with Equation (15.13) and obtain other integral representations of Bessel functions by making appropriate substitutions. For instance, we can let u = e^w and assume that the circle of the contour C has unit radius. The contour C′ in the w-plane is determined as follows. Write u = re^{iθ} and w ≡ x + iy, so that¹ re^{iθ} = e^x e^{iy}, yielding r = e^x and e^{iθ} = e^{iy}. Along the first part of C, θ = −π and r goes from ∞ to 1. Thus, along the corresponding part of C′, y = −π and x goes from ∞ to 0. On the circular part of C, r = 1 and θ goes from −π to +π. Thus, along the corresponding part of C′, x = 0 and y goes from −π to +π. Finally, on the last part of C′, y = π and x goes from 0 to ∞. Therefore, the contour C′ in the w-plane is as shown in Figure 15.2. Substituting u = e^w in Equation (15.13) yields
J_ν(z) = (1/(2πi)) ∫_{C′} e^{z sinh w − νw} dw,   Re(z) > 0,   (15.17)
which can be transformed into (see Problem 15.12)
J_ν(z) = (1/π) ∫_0^π cos(νθ − z sin θ) dθ − (sin νπ/π) ∫_0^∞ e^{−νt − z sinh t} dt.   (15.18)
For the special case of integer ν, we obtain
J_n(z) = (1/π) ∫_0^π cos(nθ − z sin θ) dθ.
In particular,
J_0(z) = (1/π) ∫_0^π cos(z sin θ) dθ.
We can use the integral representation for J_ν(z) to find the integral representation for Bessel functions of other kinds. For instance, to obtain the integral

¹Do not confuse x and y with the real and imaginary parts of z.
Figure 15.3 The contour C″ in the w-plane used in evaluating H_ν^{(1)}(z).

representation for the Neumann function Y_ν(z), we use Equation (14.44):
Y_ν(z) = (cot νπ) J_ν(z) − (1/sin νπ) J_{−ν}(z)
= (cot νπ/π) ∫_0^π cos(νθ − z sin θ) dθ − (cos νπ/π) ∫_0^∞ e^{−νt − z sinh t} dt
− (1/(π sin νπ)) ∫_0^π cos(νθ + z sin θ) dθ − (1/π) ∫_0^∞ e^{νt − z sinh t} dt,
with Re(z) > 0. Substitute π − θ for θ in the third integral on the RHS. Then insert the resulting integrals plus Equation (15.18) in H_ν^{(1)}(z) = J_ν(z) + iY_ν(z) to obtain
H_ν^{(1)}(z) = (1/π) ∫_0^π e^{i(z sin θ − νθ)} dθ − (i/π) ∫_0^∞ [e^{νt} + e^{−νt − iνπ}] e^{−z sinh t} dt,   Re(z) > 0.
These integrals can easily be shown to result from integrating along the contour C″ of Figure 15.3. Thus, we have
H_ν^{(1)}(z) = (1/(iπ)) ∫_{C″} e^{z sinh w − νw} dw,   Re(z) > 0.
By changing t to −t, we can show that
H_ν^{(2)}(z) = −(1/(iπ)) ∫_{C‴} e^{z sinh w − νw} dw,   Re(z) > 0,
where C‴ is the mirror image of C″ about the real axis.
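For integer ν, Equation (15.18) has no exponential tail, and the remaining integral is easy to evaluate numerically. The sketch below is a minimal plain-Python check (the Simpson's-rule quadrature, helper names, and test points are our own choices) comparing the integral representation with the power series:

```python
import math

def jn_series(n, z, terms=40):
    # Power series for integer-order J_n(z)
    return sum((-1) ** k / (math.factorial(k) * math.factorial(k + n))
               * (z / 2) ** (2 * k + n) for k in range(terms))

def jn_integral(n, z, N=4000):
    # Simpson's rule on J_n(z) = (1/pi) int_0^pi cos(n*theta - z*sin(theta)) d(theta)
    f = lambda th: math.cos(n * th - z * math.sin(th))
    h = math.pi / N
    s = f(0.0) + f(math.pi)
    s += 4 * sum(f((2 * i - 1) * h) for i in range(1, N // 2 + 1))
    s += 2 * sum(f(2 * i * h) for i in range(1, N // 2))
    return s * h / (3 * math.pi)

for n, z in [(0, 0.5), (1, 2.0), (4, 3.3)]:
    assert abs(jn_series(n, z) - jn_integral(n, z)) < 1e-9
```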
15.4 Asymptotic Behavior of Bessel Functions

As mentioned before, integral representations are particularly useful for determining the asymptotic behavior of functions. For Bessel functions we can consider two kinds of limits. Assuming that both ν and z = x are real, we can consider ν → ∞ or x → ∞. First, let us consider the behavior of J_ν(x) of large order. The appropriate method for calculating the asymptotic form is the method of steepest descent discussed in Chapter 11, for which ν takes the place of the large parameter α. We use Equation (15.17) because its integrand is simpler than that of Equation (15.13). The form of the integrand in Equation (15.17) may suggest f(w) = −w and g(w) = e^{x sinh w}. However, this choice does not allow setting f′(w) equal to zero. To proceed, therefore, we write the exponent as ν((x/ν) sinh w − w), and conveniently introduce x/ν ≡ 1/cosh w_0, with w_0 a real number, which we take to be positive. Substituting this in the equation above, we can read off
f(w) = sinh w/cosh w_0 − w,   g(w) = 1.
The saddle point is obtained from df/dw = 0, or cosh w = cosh w_0. Thus, w = ±w_0 + 2inπ, for n = 0, 1, 2, .... Since the contour C′ lies in the right half-plane, we choose w_0 as the saddle point. The second derivative f″(w_0) is simply tanh w_0, which is real, making θ_2 = 0, and θ_1 = π/2 or 3π/2. The convention of Chapter 11 suggests taking θ_1 = π/2 (see Figure 15.4). The rest is a matter of substitution. We are interested in the approximation to w up to the third order in t: w − w_0 = b_1 t + b_2 t² + b_3 t³. Using Equations (11.31), (11.37), and (11.38), we can easily find the three coefficients:
b_1 = √2 e^{iθ_1}/|f″(w_0)|^{1/2} = i√2/√(tanh w_0),
b_2 = (f‴(w_0)/(3|f″(w_0)|²)) e^{4iθ_1} = cosh² w_0/(3 sinh² w_0),
b_3 = {5[f‴(w_0)]²/(3[f″(w_0)]²) − f⁽⁴⁾(w_0)/f″(w_0)} √2 e^{3iθ_1}/(12|f″(w_0)|^{3/2})
= −(i√2/(12 (tanh w_0)^{3/2})) ((5/3) coth² w_0 − 1).
If we substitute the above in Equation (11.36), we obtain the following asymptotic formula valid for ν → ∞:
J_ν(x) ≈ (e^{x(sinh w_0 − w_0 cosh w_0)}/(2πx sinh w_0)^{1/2}) [1 + (1/(8x sinh w_0))(1 − (5/3) coth² w_0) + ...],
where ν is related to w_0 via ν = x cosh w_0.
Figure 15.4 The contour C_0 in the w-plane used in evaluating J_ν(z) for large values of ν.

Let us now consider the asymptotic behavior for large x. It is convenient to consider the Hankel functions H_ν^{(1)}(x) and H_ν^{(2)}(x). The contours C″ and C‴ involve both the positive and the negative real axis; therefore, it is convenient, assuming that x > ν, to write ν = x cos β, so that
f(w) = sinh w − w cos β,   g(w) = 1.
The saddle points are given by the solutions to cosh w = cos β, which are w_0 = ±iβ. Choosing w_0 = +iβ, we note that the contour along which
Im(sinh w − w cos β) = Im(sinh w_0 − w_0 cos β)
is given by cosh u = [sin β + (v − β) cos β]/sin v. This contour is shown in Figure 15.5. The rest of the procedure is exactly the same as for J_ν(x) described above. In fact, to obtain the expansion for H_ν^{(1)}(x), we simply replace w_0 by iβ. The result is
H_ν^{(1)}(x) ≈ (2/(πix sin β))^{1/2} e^{i(x sin β − νβ)} [1 + (i/(8x sin β))(1 + (5/3) cot² β) + ...].
When x is much larger than ν, β will be close to π/2, and we have
H_ν^{(1)}(x) ≈ √(2/(πx)) e^{i(x − νπ/2 − π/4)} (1 + i/(8x)),
which, with 1/x → 0, is what we obtained in Example 11.5.2.
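The quality of the limiting form is easy to gauge numerically: its real part gives the familiar estimate J_ν(x) ≈ √(2/(πx)) cos(x − νπ/2 − π/4). The sketch below is a minimal plain-Python check (Simpson's rule on the integral representation (15.18) for integer ν supplies the "exact" value; the helper name, node count, test point, and tolerance are our own choices):

```python
import math

def jn_exact(n, x, N=40000):
    # Simpson's rule on J_n(x) = (1/pi) int_0^pi cos(n*t - x*sin t) dt
    f = lambda t: math.cos(n * t - x * math.sin(t))
    h = math.pi / N
    s = f(0.0) + f(math.pi)
    s += 4 * sum(f((2 * i - 1) * h) for i in range(1, N // 2 + 1))
    s += 2 * sum(f(2 * i * h) for i in range(1, N // 2))
    return s * h / (3 * math.pi)

n, x = 1, 50.0
asym = math.sqrt(2 / (math.pi * x)) * math.cos(x - n * math.pi / 2 - math.pi / 4)
# The neglected correction is O(1/x), so agreement to a few parts in 10^3 is expected.
assert abs(jn_exact(n, x) - asym) < 2e-3
```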
Figure 15.5 The contour in the w-plane used in evaluating H_ν^{(1)}(z) in the limit of large values of x.

The other saddle point, at −iβ, gives the other Hankel function, with the asymptotic limit
H_ν^{(2)}(x) ≈ √(2/(πx)) e^{−i(x − νπ/2 − π/4)} (1 − i/(8x)).
We can now use the expressions for the asymptotic forms of the two Hankel functions to write the asymptotic forms of J_ν(x) and Y_ν(x) for large x:
J_ν(x) = (1/2)[H_ν^{(1)}(x) + H_ν^{(2)}(x)]
≈ √(2/(πx)) [cos(x − νπ/2 − π/4) − (1/(8x)) sin(x − νπ/2 − π/4) + ...],
Y_ν(x) = (1/(2i))[H_ν^{(1)}(x) − H_ν^{(2)}(x)]
≈ √(2/(πx)) [sin(x − νπ/2 − π/4) + (1/(8x)) cos(x − νπ/2 − π/4) + ...].

15.5 Problems

15.1. Use the change of variables k = ln t and ix = w − a (where k and x are the common variables used in Fourier transform equations) to show that the Fourier transform changes into a Mellin transform,
G(t) = (1/(2πi)) ∫_{−i∞+a}^{i∞+a} F(w) t^{−w} dw.

15.2. The Laplace transform L[f] of a function f(t) is defined as
L[f](s) ≡ ∫_0^∞ e^{−st} f(t) dt.
Show that the Laplace transform of
(a) f(t) = 1 is 1/s, where s > 0;
(b) f(t) = cosh ωt is s/(s² − ω²), where s² > ω²;
(c) f(t) = sinh ωt is ω/(s² − ω²), where s² > ω²;
(d) f(t) = cos ωt is s/(s² + ω²), where s > 0;
(e) f(t) = sin ωt is ω/(s² + ω²), where s > 0;
(f) f(t) = e^{ωt} for t > 0 is 1/(s − ω), where s > ω;
(g) f(t) = t^n is Γ(n + 1)/s^{n+1}, where s > 0, n > −1.

15.3. Evaluate the integral
f(t) = ∫_0^∞ (sin ωt/ω) dω
by finding the Laplace transform and changing the order of integration. Express the result for both t > 0 and t < 0 in terms of the theta function. (You will need some results from Problem 15.2.)

15.4. Show that the Laplace transform of the derivative of a function is given by
L[F′](s) = sL[F](s) − F(0).
Similarly, show that for the second derivative the transform is
L[F″](s) = s²L[F](s) − sF(0) − F′(0).
Use these results to solve the differential equation
u″(t) + ω²u(t) = 0
subject to the boundary conditions u(0) = a, u′(0) = 0.

15.5. Solve the DE of Equation (15.5).

15.6. Calculate the surface term for the hypergeometric DE.

15.7. Determine the constant C′ in Equation (15.6), the solution to the hypergeometric DE. Hint: Expand (t − z)^{−α} inside the integral, use Equations (11.19) and (11.17), and compare the ensuing series with the hypergeometric series of Chapter 14.

15.8. Derive the Euler formula [Equation (15.7)].

15.9. Show that
F(α, β; γ; 1) = Γ(γ)Γ(γ − α − β)/[Γ(γ − α)Γ(γ − β)].   (15.19)
Hint: Use Equation (11.19). Equation (15.19) was obtained by Gauss using only hypergeometric series.

15.10. We determine the asymptotic behavior of Φ(α, γ; z) for z → ∞ in this problem. Break up the integral in Equation (15.11) into two parts, one from 0 to −∞ and the other from −∞ to 1. Substitute −t/z for t in the first integral, and 1 − t/z for t in the second. Assuming that z → ∞ along the positive real axis, show that the second integral will dominate, and that
Φ(α, γ; z) → (Γ(γ)/Γ(α)) z^{α−γ} e^z   as z → ∞.

15.11. In this problem, we determine the constant k of Equation (15.12).
(a) Write the contour integral of Equation (15.12) for each of the three pieces of the contour. Note that arg(t) = −π as t comes from −∞ and arg(t) = π as t goes to −∞. Obtain a real integral from 0 to ∞.
(b) Use the relation Γ(z)Γ(1 − z) = π/sin πz, obtained in Chapter 11, to show that
Γ(−z) = −π/[Γ(z + 1) sin πz].
(c) Expand the function exp(−z²/(4t)) in the integral of part (a), and show that the contour integral reduces to
−2i sin νπ Σ_{n=0}^∞ (z/2)^{2n} Γ(−n − ν)/Γ(n + 1).
(d) Use the result of part (c) in part (b), and compare the result with the series expansion of J_ν(z) in Chapter 14 to arrive finally at k = 1/(2πi).

15.12. By integrating along C_1, C_2, C_3, and C_4 of Figure 15.2, derive Equation (15.18).

15.13. By substituting t = exp(iθ) in Equation (15.14), show that
e^{iz sin θ} = J_0(z) + 2 Σ_{n=1}^∞ J_{2n}(z) cos(2nθ) + 2i Σ_{n=0}^∞ J_{2n+1}(z) sin[(2n + 1)θ].
In particular, show that
J_0(z) = (1/(2π)) ∫_0^{2π} e^{iz sin θ} dθ.

15.14. Derive the integral representations of H_ν^{(1)}(x) and H_ν^{(2)}(x) given in Section 15.3.
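Two of these problems can be spot-checked numerically. The sketch below is a minimal plain-Python illustration (helper names, quadrature choices, truncation points, and parameter values are our own); it verifies the transform pair of Problem 15.2(d), L[cos ωt] = s/(s² + ω²), by direct quadrature, and Gauss's formula (15.19) by summing the hypergeometric series, which converges at z = 1 when Re(γ − α − β) > 0:

```python
import math

def laplace(f, s, T=25.0, N=20000):
    # Simpson's rule for int_0^T exp(-s t) f(t) dt; exp(-s T) must be negligible.
    g = lambda t: math.exp(-s * t) * f(t)
    h = T / N
    total = g(0.0) + g(T)
    total += 4 * sum(g((2 * i - 1) * h) for i in range(1, N // 2 + 1))
    total += 2 * sum(g(2 * i * h) for i in range(1, N // 2))
    return total * h / 3

s, w = 2.0, 3.0
assert abs(laplace(lambda t: math.cos(w * t), s) - s / (s**2 + w**2)) < 1e-8

def hyp_at_1(a, b, c, terms=2000):
    # Hypergeometric series summed at z = 1 (requires c - a - b > 0)
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= (a + n) * (b + n) / ((c + n) * (n + 1))
    return total

a, b, c = 0.5, 0.5, 4.0
gauss = math.gamma(c) * math.gamma(c - a - b) / (math.gamma(c - a) * math.gamma(c - b))
assert abs(hyp_at_1(a, b, c) - gauss) < 1e-7
```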
Additional Reading
1. Dennery, P. and Krzywicki, A. Mathematics for Physicists, Harper and Row, 1967.
2. Mathews, J. and Walker, R. Mathematical Methods of Physics, 2nd ed., Benjamin, 1970.
Part V

Operators on Hilbert Spaces
16  An Introduction to Operator Theory

The first two parts of the book dealt almost exclusively with algebraic techniques. The third and fourth parts were devoted to analytic methods. In this introductory chapter, we shall try to unite these two branches of mathematics to gain insight into the nature of some of the important equations in physics and their solutions. Let us start with a familiar problem.

16.1 From Abstract to Integral and Differential Operators

Let's say we want to solve an abstract vector-operator equation A|u⟩ = |v⟩ in an N-dimensional vector space V. To this end, we select a basis B = {|u_i⟩}_{i=1}^N, write the equation in matrix form, and solve the resulting system of N linear equations. This produces the components of the solution |u⟩ in B. If components in another basis B′ are desired, they can be obtained using the similarity transformation connecting the two bases (see Chapter 3).
There is a standard formal procedure for obtaining the matrix equation. It is convenient to choose an orthonormal basis B = {|e_i⟩}_{i=1}^N for V and refer all components to this basis. The procedure involves contracting both sides of the equation with ⟨e_i| and inserting 1 = Σ_{j=1}^N |e_j⟩⟨e_j| between A and |u⟩: Σ_{j=1}^N ⟨e_i|A|e_j⟩⟨e_j|u⟩ = ⟨e_i|v⟩ for i = 1, 2, ..., N, or
Σ_{j=1}^N A_{ij} u_j = v_i   for i = 1, 2, ..., N,   (16.1)
452 16. AN INTRODUCTION TO OPERATOR THEORY

where A_{ij} ≡ ⟨e_i|A|e_j⟩, u_j ≡ ⟨e_j|u⟩, and v_i ≡ ⟨e_i|v⟩. Equation (16.1) is a system of N linear equations in N unknowns {u_j}_{j=1}^N, which can be solved to obtain the solution(s) of the original equation in B. A convenient basis is that in which A is represented by a diagonal matrix diag(λ_1, λ_2, ..., λ_N). Then the operator equation takes the simple form λ_i u_i = v_i, and the solution becomes immediate.
Let us now apply the procedure just described to infinite-dimensional vector spaces, in particular, for the case of a continuous index. We want to find the solutions of K|u⟩ = |f⟩. Following the procedure used above, we obtain
⟨x|K (∫_a^b |y⟩ w(y) ⟨y| dy) |u⟩ = ∫_a^b ⟨x|K|y⟩ w(y) ⟨y|u⟩ dy = ⟨x|f⟩,
where we have used the results obtained in Chapter 6. Writing this in functional notation, we have
∫_a^b K(x, y) w(y) u(y) dy = f(x),   (16.2)
which is the continuous analogue of Equation (16.1). Here (a, b) is the interval on which the functions are defined. We note that the indices have turned into continuous arguments, and the sum has turned into an integral. The operator K that leads to an equation such as (16.2) is called an integral operator (IO), and the "matrix element" K(x, y) is said to be its kernel.
The discussion of the discrete case mentioned the possibility of the operator A being diagonal in the given basis B. Let us do the same with (16.2); that is, noting that x and y are indices for K, let us assume that K(x, y) = 0 for x ≠ y. Such operators are called local operators. For local operators, the contribution to the integral comes only at the point where x = y (hence their name). If K(x, y) is finite at this point, and the functions w(y) and u(y) are well behaved there, the LHS of (16.2) will vanish, and we will get inconsistencies. To avoid this, we need to have
K(x, y) = 0 if x ≠ y,   K(x, y) = ∞ if x = y.
Thus, K(x, y) has the behavior of a delta function.
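The passage from (16.1) to (16.2) can also be run in reverse on a computer: discretizing the integral with quadrature weights w_j turns the operator equation K|u⟩ = |f⟩ back into a finite linear system Σ_j K(x_i, y_j) w_j u_j = f(x_i). The sketch below is a minimal plain-Python illustration (the Gaussian-elimination helper, the kernel, and the grid size are arbitrary choices of ours); it builds f from a known u on a trapezoidal grid and recovers u by solving the system:

```python
import math

def solve(A, b):
    # Gaussian elimination with partial pivoting (A and b are modified in place).
    n = len(b)
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, n):
            m = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= m * A[k][c]
            b[r] -= m * b[k]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (b[k] - sum(A[k][c] * x[c] for c in range(k + 1, n))) / A[k][k]
    return x

N = 21
xs = [i / (N - 1) for i in range(N)]                         # grid on [0, 1]
w = [0.5 / (N - 1) if i in (0, N - 1) else 1.0 / (N - 1)
     for i in range(N)]                                      # trapezoidal weights
K = lambda x, y: math.exp(-abs(x - y))                       # an arbitrary kernel
u = [x * x for x in xs]                                      # "known" solution
A = [[K(xi, yj) * wj for yj, wj in zip(xs, w)] for xi in xs]
f = [sum(A[i][j] * u[j] for j in range(N)) for i in range(N)]
u_rec = solve([row[:] for row in A], f[:])
assert max(abs(p - q) for p, q in zip(u_rec, u)) < 1e-8
```

Since f is built with the same quadrature, the recovery is exact up to roundoff; solving the continuum equation itself would additionally require controlling the discretization error.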
Letting K(x, y) ≡ L(x)δ(x − y)/w(x) and substituting in Equation (16.2) yields L(x)u(x) = f(x). In the discrete case, λ_i was merely an indexed number; its continuous analogue, L(x), may represent merely a function. However, the fact that x is a continuous variable (index) gives rise to other possibilities for L(x) that do not exist for the discrete case. For instance, L(x) could be a differential operator. The derivative, although defined by a limiting process involving neighboring points, is a local operator. Thus, we can speak of the derivative of a function at a point. For the
16.2 BOUNDED OPERATORS IN HILBERT SPACES 453

discrete case, u_i can only "hop" from i to i + 1 and then back to i. Such a difference (as opposed to differential) process is not local; it involves not only i but also i + 1. The "point" i does not have an (infinitesimally close) neighbor. This essential difference between discrete and continuous operators makes the latter far richer in possibilities for applications. In particular, if L(x) is considered a differential operator, the equation L(x)u(x) = f(x) leads directly to the fruitful area of differential equation theory.

16.2 Bounded Operators in Hilbert Spaces

The concept of an operator on a Hilbert space is extremely subtle. Even the elementary characteristics of operators, such as the operation of hermitian conjugation, cannot generally be defined on the whole Hilbert space.
In finite-dimensional vector spaces there is a one-to-one correspondence between operators and matrices. So, in some sense, the study of operators reduces to a study of matrices, which are collections of real or complex numbers. Although we have already noted an analogy between matrices and kernels, a whole new realm of questions arises when A_{ij} is replaced by K(x, y): questions about the continuity of K(x, y) in both its arguments, about the limit of K(x, y) as x and/or y approach the "end points" of the interval on which K is defined, about the boundedness and "compactness" of K, and so on. Such subtleties are not unexpected. After all, when we tried to generalize concepts of finite-dimensional vector spaces to infinite dimensions in Chapter 5, we encountered difficulties. There we were concerned about vectors only; the generalization of operators is even more complicated.

16.2.1. Example. Recall that C^∞ is the set of sequences |a⟩ = {α_i}_{i=1}^∞, or of ∞-tuples (α_1, α_2, ...), that satisfy the convergence requirement Σ_{i=1}^∞ |α_i|² < ∞ (see Example 1.1.2).
It is a Hilbert space with inner product defined by ⟨a|b⟩ = Σ_{i=1}^∞ α_i* β_i. The standard (orthonormal) basis for C^∞ is {|e_i⟩}_{i=1}^∞, where |e_i⟩ has all components equal to zero except the ith one, which is 1. Then one has |a⟩ = Σ_{i=1}^∞ α_i |e_i⟩. One can introduce an operator T_r, called the right-shift operator, by
T_r|a⟩ = T_r(Σ_{j=1}^∞ α_j |e_j⟩) = Σ_{j=1}^∞ α_j |e_{j+1}⟩.
In other words, T_r transforms (α_1, α_2, ...) to (0, α_1, α_2, ...). It is straightforward to show that T_r is indeed a linear operator.

The first step in our study of vector spaces of infinite dimensions was getting a handle on the convergence of infinite sums. This entailed defining a norm for vectors and a distance between them. In addition, we noted that the set of linear transformations L(V, W) was a vector space in its own right. Since operators are "vectors" in this space, the study of operators requires constructing a norm in L(V, W) when V and W are infinite-dimensional.
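A finite truncation exhibits the essential feature of T_r: it shifts components without changing their values, so it preserves the l² norm (and, in the language of the next section, is therefore bounded, with norm 1). A minimal sketch in plain Python (the sample sequence and truncation length are arbitrary choices of ours):

```python
import math

def right_shift(a):
    # (a1, a2, ...) -> (0, a1, a2, ...)
    return [0.0] + list(a)

def l2_norm(a):
    return math.sqrt(sum(abs(x) ** 2 for x in a))

a = [1.0 / (i + 1) for i in range(1000)]   # truncation of the sequence {1/i}
assert abs(l2_norm(right_shift(a)) - l2_norm(a)) < 1e-12
```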
16. AN INTRODUCTION TO OPERATOR THEORY

16.2.2. Definition. Let H_1 and H_2 be two Hilbert spaces with norms ‖·‖_1 and ‖·‖_2. For any T ∈ L(H_1, H_2), the number

max{ ‖Tx‖_2 / ‖x‖_1 | |x⟩ ≠ 0 }

(if it exists) is called¹ the operator norm of T and is denoted by ‖T‖. A linear transformation whose norm is finite is called a bounded linear transformation. A bounded linear transformation from a Hilbert space to itself is called a bounded operator. The collection of all bounded linear transformations, which is a subset of L(H_1, H_2), will be denoted by B(H_1, H_2), and if H_1 = H_2 ≡ H, it will be denoted by B(H).

Note that ‖·‖_1 and ‖·‖_2 are the norms induced by the inner products of H_1 and H_2. Also note that by dividing by ‖x‖_1 we eliminate the possibility of dilating the norm of T by choosing a "long" vector. By restricting the length of |x⟩, one can eliminate the necessity of dividing by the length. In fact, the norm can equivalently be defined as

‖T‖ = max{ ‖Tx‖_2 | ‖x‖_1 = 1 } = max{ ‖Tx‖_2 | ‖x‖_1 ≤ 1 }.   (16.3)

It is straightforward to show that the three definitions are equivalent and that they indeed define a norm.

16.2.3. Proposition. An operator T is bounded if and only if it maps vectors of finite norm to vectors of finite norm.

Proof. Clearly, if T is bounded, then T|x⟩ has finite norm. Conversely, if ‖Tx‖_2 is finite for all |x⟩ (of unit length), then max{ ‖Tx‖_2 | ‖x‖_1 = 1 } is also finite, and T is bounded. □

An immediate consequence of the definition is

‖Tx‖_2 ≤ ‖T‖ ‖x‖_1   ∀ |x⟩ ∈ H_1.   (16.4)

If we choose |x⟩ − |y⟩ instead of |x⟩, it will follow from (16.4) that as |x⟩ approaches |y⟩, T|x⟩ approaches T|y⟩. This is the property that characterizes continuous functions:

16.2.4. Proposition. The bounded operator T ∈ B(H_1, H_2) is a continuous function from H_1 to H_2.
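Definition 16.2.2 can be probed numerically for a matrix acting on a finite-dimensional space (a sketch under our own choice of matrix; sampling the unit circle only estimates the maximum in Equation (16.3)):

```python
import math

def apply(T, x):
    """Matrix-vector product, standing in for T|x>."""
    return [sum(T[i][j] * x[j] for j in range(len(x))) for i in range(len(T))]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

T = [[3.0, 0.0],
     [0.0, 2.0]]

# ||T|| = max{ ||Tx|| : ||x|| = 1 }, estimated by sampling unit vectors.
est = max(norm(apply(T, [math.cos(t), math.sin(t)]))
          for t in (2 * math.pi * k / 10000 for k in range(10000)))
print(est)   # 3.0 for this diagonal matrix: the largest |diagonal entry|
```

For a diagonal matrix the maximum is attained at a basis vector, so the sampled estimate here is exact.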
Another consequence of the definition is the following:

¹The precise definition uses "supremum" instead of "maximum." Rather than spending a lot of effort explaining the difference between the two concepts, we use the less precise, but more intuitively familiar, concept of "maximum."
16.2.5. Box. B(H_1, H_2) is a vector subspace of L(H_1, H_2), and for H_1 = H_2 = H, we have 1 ∈ B(H) and ‖1‖ = 1.

16.2.6. Example. We have seen that in an inner product space, one can associate a linear operator (a linear functional) with every vector. Thus, associated with the vector |x⟩ in a Hilbert space H is the linear operator f_x : H → ℂ defined by f_x(|y⟩) ≡ ⟨x|y⟩. We want to compare the operator norm of f_x with the norm of |x⟩. First note that by using the Schwarz inequality, we get

‖f_x‖ = max{ |f_x(|y⟩)| / ‖y‖ | |y⟩ ≠ 0 } = max{ |⟨x|y⟩| / ‖y‖ | |y⟩ ≠ 0 } ≤ ‖x‖.

On the other hand, from ‖x‖² = f_x(|x⟩) we obtain

‖x‖ = |f_x(|x⟩)| / ‖x‖ ≤ max{ |f_x(|y⟩)| / ‖y‖ | |y⟩ ≠ 0 } = ‖f_x‖.

These two inequalities imply that ‖f_x‖ = ‖x‖.

16.2.7. Example. The derivative operator D = d/dx is not a bounded operator on the Hilbert space² L²(a, b) of square-integrable functions. With a function like f(x) = √(x − a), one gets

‖f‖ = (b − a)/√2,

while df/dx = 1/(2√(x − a)) gives ‖Df‖² = (1/4) ∫_a^b dx/(x − a) = ∞. We conclude that ‖D‖ = ∞.

16.2.8. Example. Since L(H) is an algebra as well as a vector space, one may be interested in the relation between the product of operators and their norms. More specifically, one may want to know how ‖ST‖ is related to ‖S‖ and ‖T‖. In this example we show that ‖ST‖ ≤ ‖S‖ ‖T‖. To do so, we use the definition of the operator norm for the product ST:

‖ST‖ = max{ ‖STx‖ / ‖x‖ | |x⟩ ≠ 0 }
     = max{ (‖STx‖ / ‖Tx‖)(‖Tx‖ / ‖x‖) | |x⟩ ≠ 0 ≠ T|x⟩ }
     ≤ max{ ‖S(T|x⟩)‖ / ‖Tx‖ | T|x⟩ ≠ 0 } · max{ ‖Tx‖ / ‖x‖ | |x⟩ ≠ 0 },   (16.5)

where the last factor is just ‖T‖.

²Here the two Hilbert spaces coincide, so that the derivative operator acts on a single Hilbert space.
Now note that the first term on the RHS does not scan all the vectors for maximality: It scans only the vectors in the image of T. If we include all vectors, we may obtain a larger number. Therefore,

max{ ‖S(T|x⟩)‖ / ‖Tx‖ | T|x⟩ ≠ 0 } ≤ max{ ‖Sx‖ / ‖x‖ | |x⟩ ≠ 0 } = ‖S‖,

and the desired inequality is established. A useful consequence of this result is ‖Tⁿ‖ ≤ ‖T‖ⁿ, which we shall use frequently.

We can put Equation (16.5) to immediate good use.

16.2.9. Proposition. Let H be a Hilbert space and T ∈ B(H). If ‖T‖ < 1, then 1 − T is invertible and (1 − T)⁻¹ = Σ_{n=0}^∞ Tⁿ.

Proof. First note that the series converges, because

‖Σ_{n=0}^∞ Tⁿ‖ ≤ Σ_{n=0}^∞ ‖Tⁿ‖ ≤ Σ_{n=0}^∞ ‖T‖ⁿ = 1/(1 − ‖T‖),

and the sum has a finite norm. Furthermore,

(1 − T) Σ_{n=0}^∞ Tⁿ = (1 − T) lim_{k→∞} Σ_{n=0}^k Tⁿ = lim_{k→∞} (1 − T) Σ_{n=0}^k Tⁿ
                    = lim_{k→∞} (Σ_{n=0}^k Tⁿ − Σ_{n=0}^k Tⁿ⁺¹) = lim_{k→∞} (1 − T^{k+1}) = 1,

because 0 ≤ lim_{k→∞} ‖T^{k+1}‖ ≤ lim_{k→∞} ‖T‖^{k+1} = 0 for ‖T‖ < 1, and the vanishing of the norm implies the vanishing of the operator itself. One can similarly show that (Σ_{n=0}^∞ Tⁿ)(1 − T) = 1. □

A corollary of this proposition is that operators that are "close enough" to an invertible operator are invertible (see Problem 16.1). Another corollary, whose proof is left as a straightforward exercise, is the following:

16.2.10. Corollary. Let T ∈ B(H) and λ a complex number such that ‖T‖ < |λ|. Then T − λ1 is an invertible operator, and

(T − λ1)⁻¹ = −(1/λ) Σ_{n=0}^∞ (T/λ)ⁿ.

Adjoints play an important role in the study of operators. We recall that the adjoint of T is defined by ⟨y|T|x⟩* = ⟨x|T†|y⟩, or ⟨Tx|y⟩ = ⟨x|T†y⟩. In the finite-dimensional case, we could calculate the matrix representation of the adjoint in a particular basis using this definition and generalize to all bases by similarity transformations. That is why we never raised the question of the existence of the adjoint of an operator.
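Before moving on, Proposition 16.2.9 can be checked numerically for a small matrix with ‖T‖ < 1 (a sketch with a matrix of our own choosing; the partial sums of the geometric series are multiplied against 1 − T):

```python
def matmul(A, B):
    """Product of two square matrices (lists of rows)."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

T = [[0.2, 0.1],
     [0.0, 0.3]]              # entries small enough that ||T|| < 1

# Partial sums S = 1 + T + T^2 + ... of the series in Proposition 16.2.9.
S = identity(2)
P = identity(2)
for _ in range(60):
    P = matmul(P, T)          # P runs through the powers T, T^2, ...
    S = [[S[i][j] + P[i][j] for j in range(2)] for i in range(2)]

# (1 - T) S should be (numerically) the identity.
one_minus_T = [[(1.0 if i == j else 0.0) - T[i][j] for j in range(2)]
               for i in range(2)]
check = matmul(one_minus_T, S)
print(check)   # approximately [[1, 0], [0, 1]]
```

The truncation error is governed by ‖T‖^{61}, which is negligible here, in line with the estimate ‖Tⁿ‖ ≤ ‖T‖ⁿ.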
In the infinite-dimensional case, one must prove the existence of the adjoint. We state the following theorem without proof:
16.2.11. Theorem. Let T ∈ B(H). Then the adjoint of T, defined by ⟨Tx|y⟩ = ⟨x|T†y⟩, exists. Furthermore, ‖T‖ = ‖T†‖.

Another useful theorem that we shall use later is the following.

16.2.12. Theorem. Let N(T) and R(T) denote the null space (kernel) and the range of T ∈ B(H). We have N(T†) = R(T)^⊥ and N(T) = R(T†)^⊥.

Proof. |x⟩ is in N(T†) iff T†|x⟩ = 0 iff ⟨y|T†x⟩ = 0 for all |y⟩ ∈ H. This holds if and only if ⟨Ty|x⟩ = 0 for all |y⟩ ∈ H. This is equivalent to the statement that |x⟩ is in R(T)^⊥. This chain of argument proves that N(T†) = R(T)^⊥. The second part of the theorem follows from the fact that (T†)† = T. □

16.3 Spectra of Linear Operators

One of the most important results of the theory of finite-dimensional vector spaces is the spectral decomposition theorem developed in Chapter 4. The infinite-dimensional analogue of that theorem is far more encompassing and difficult to prove. It is beyond the scope of this book to develop all the machinery needed for a thorough discussion of the infinite-dimensional spectral theory. Instead, we shall present the central results, and occasionally introduce the reader to the peripheral arguments when they seem to have their own merits.

16.3.1. Definition. Let T ∈ L(H). A complex number λ is called a regular point of T if the operator T − λ1 is bounded and invertible.³ The set of all regular points of T is called the resolvent set of T, and is denoted by ρ(T). The complement of ρ(T) in the complex plane is called the spectrum of T and is denoted by σ(T).

Corollary 16.2.10 implies⁴ that if T is bounded, then ρ(T) is not empty, and that the spectrum of a bounded linear operator on a Hilbert space is a bounded set. In fact, an immediate consequence of the corollary is that |λ| ≤ ‖T‖ for all λ ∈ σ(T).
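For a matrix, the finite-dimensional stand-in for a bounded operator (the matrix below is our own example), the bound |λ| ≤ ‖T‖ for λ ∈ σ(T) can be verified directly; here we dominate the operator norm by the Frobenius norm, which is easier to compute:

```python
import cmath
import math

T = [[0.0, 2.0],
     [-1.0, 1.0]]

# Eigenvalues of a 2x2 matrix from lambda^2 - tr(T) lambda + det(T) = 0.
tr = T[0][0] + T[1][1]
det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
disc = cmath.sqrt(tr * tr - 4 * det)
eigs = [(tr + disc) / 2, (tr - disc) / 2]

# The Frobenius norm dominates the operator norm, so it also bounds sigma(T).
frob = math.sqrt(sum(T[i][j] ** 2 for i in range(2) for j in range(2)))
print(all(abs(lam) <= frob for lam in eigs))   # True: spectrum inside the disk
```

Here the eigenvalues are the complex pair (1 ± i√7)/2, of modulus √2, comfortably inside the disk of radius √6.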
It is instructive to contrast the finite-dimensional case with the implications of the above definition. Recall that because of the dimension theorem, a linear operator on a finite-dimensional vector space V is invertible if and only if it is either onto or one-to-one. Now, λ ∈ σ(T) if and only if T − λ1 is not invertible. For finite dimensions, this implies that⁵ ker(T − λ1) ≠ 0. Thus, in finite dimensions,

³If T is bounded, then T − λ1 is automatically bounded.
⁴One can simply choose a λ whose absolute value is greater than ‖T‖.
⁵Note how critical finite-dimensionality is for this implication. In infinite dimensions, an operator can be one-to-one (thus having a zero kernel) without being onto.
λ ∈ σ(T) if and only if there is a vector |a⟩ in V such that (T − λ1)|a⟩ = 0. This is the combined definition of eigenvalue and eigenvector, and it is the definition we will have to use to define eigenvalues in infinite dimensions. It follows that in the finite-dimensional case, σ(T) coincides with the set of all eigenvalues of T. This is not true in infinite dimensions, as the following example shows.

16.3.2. Example. Consider the right-shift operator T_r acting on C^∞. It is easy to see that ‖T_r a‖ = ‖a‖ for all |a⟩. This yields ‖T_r‖ = 1, so that any λ that belongs to σ(T_r) must be such that |λ| ≤ 1. We now show that the converse is also true, i.e., that if |λ| ≤ 1, then λ ∈ σ(T_r). It is sufficient to show that if 0 < |λ| ≤ 1, then T_r − λ1 is not invertible. To establish this, we shall show that T_r − λ1 is not onto.

Suppose that T_r − λ1 is onto. Then there must be a vector |a⟩ such that (T_r − λ1)|a⟩ = |e_1⟩, where |e_1⟩ is the first standard basis vector of C^∞. Equating components on both sides yields the recursion relations α_1 = −1/λ and α_{j−1} = λ α_j for all j ≥ 2. One can readily solve this recursion to obtain α_j = −1/λ^j for all j. This is a contradiction, because

Σ_{j=1}^∞ |α_j|² = Σ_{j=1}^∞ 1/|λ|^{2j}

will not converge if 0 < |λ| ≤ 1, i.e., |a⟩ ∉ C^∞, and therefore T_r − λ1 is not onto. Since λ = 0 also belongs to σ(T_r) (T_r itself is not onto: |e_1⟩ has no preimage), we conclude that σ(T_r) = {λ ∈ ℂ | |λ| ≤ 1}.

If we could generalize the result of the finite-dimensional case to C^∞, we would conclude that all complex numbers whose magnitude is at most 1 are eigenvalues of T_r. Quite to our surprise, the following argument shows that T_r has no eigenvalues at all! Suppose that λ is an eigenvalue of T_r, and let |a⟩ be any eigenvector for λ. Since T_r preserves the length of a vector, we have

⟨a|a⟩ = ⟨T_r a|T_r a⟩ = ⟨λa|λa⟩ = |λ|² ⟨a|a⟩.

It follows that |λ| = 1. Now write |a⟩ = {α_j}_{j=1}^∞ and let α_m be the first nonzero term of this sequence.
Then

0 = ⟨e_m|T_r a⟩ = ⟨e_m|λa⟩ = λ α_m.

The first equality comes about because T_r|a⟩ has its first nonzero term in the (m + 1)st position. Since λ ≠ 0, we must have α_m = 0, which contradicts the choice of this number.

16.4 Compact Sets

This section deals with some technical concepts, and as such will be rather formal. The central concept of this section is compactness. Although we shall be using compactness sparingly in the sequel, the notion has sufficient application in higher analysis and algebra that it warrants an introductory exposure.

Let us start with the familiar case of the real line, and the intuitive notion of "compactness." Clearly, we do not want to call the entire real line "compact," because intuitively, it is not. The next candidate seems to be a "finite" interval. So, first consider the open interval (a, b). Can we call it compact? Intuition says "yes," but the following argument shows that it would not be appropriate to call the open interval compact. Consider the map θ : ℝ → (a, b) given by θ(t) = ((b − a)/2) tanh t + (b + a)/2. The reader may check that this map is continuous and bijective. Thus, we can continuously
map all of ℝ in a one-to-one manner onto (a, b). This makes (a, b) "look" very much⁶ like ℝ.

How can we modify the interval to make it compact? We do not want to alter its finiteness. So, the obvious thing to do is to add the end points. Thus, the interval [a, b] seems to be a good candidate; and indeed it is.

The next step is to generalize the notion of a closed, finite interval and eventually come up with a definition that can be applied to all spaces. First we need some terminology.

16.4.1. Definition. An open ball B_r(x) of radius r and center |x⟩ in a normed vector space V is the set of all vectors in V whose distance from |x⟩ is strictly less than r:

B_r(x) ≡ { |y⟩ ∈ V | ‖y − x‖ < r }.

We call B_r(x) an open round neighborhood of |x⟩. This is a generalization of the open interval, because

(a, b) = { y ∈ ℝ | |y − (a + b)/2| < (b − a)/2 }.

16.4.2. Example. A prototype of finite-dimensional normed spaces is ℝⁿ. An open ball of radius r centered at x is

B_r(x) = { y ∈ ℝⁿ | (y_1 − x_1)² + (y_2 − x_2)² + ··· + (y_n − x_n)² < r² }.

Thus, all points inside a circle form an open ball in the xy-plane, and all interior points of a solid sphere form an open ball in space.

16.4.3. Definition. A bounded subset of a normed vector space is a subset that can be enclosed in an open ball of finite radius.

For example, any region drawn on a piece of paper is a bounded subset of ℝ², and any "visible" part of our environment is a bounded subset of ℝ³, because we can always find a big enough circle or sphere to enclose these subsets.

16.4.4. Definition. A subset O of a normed vector space V is called open if each of its points (vectors) has an open round neighborhood lying entirely in O. A boundary point of O is a point (vector) in V all of whose open round neighborhoods contain points inside and outside O. A closed subset C of V is a subset that contains all of its boundary points.
The closure of a subset S is the union of S and all of its boundary points, and is denoted by S̄.

For example, the boundary of a region drawn on paper consists of all its boundary points. A curve drawn on paper has nothing but boundary points. Every point is also its own boundary. A boundary is always a closed set. In particular, a point is a closed set. In general, an open set cannot contain any boundary points. A frequently used property of a closed set C is that a convergent sequence of points of C converges to a point in C.

⁶In mathematical jargon one says that (a, b) and ℝ are homeomorphic.
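The closed-set property just mentioned, and the "crowding" of sequences that this section turns to shortly, can be illustrated with a concrete sequence (the sequence is our own choice, not the book's): x_n = (−1)ⁿ(1/2 + 1/n) lies in the closed interval [−1, 1], its even-indexed terms converge to +1/2, its odd-indexed terms converge to −1/2, and both limits again lie in [−1, 1].

```python
def x(n):
    """A bounded sequence with two accumulation points, +1/2 and -1/2."""
    return (-1) ** n * (0.5 + 1.0 / n)

terms = [x(n) for n in range(2, 2001)]
print(all(-1.0 <= t <= 1.0 for t in terms))   # the sequence stays in [-1, 1]

evens = [x(n) for n in range(2, 2001, 2)]     # a convergent subsequence
odds = [x(n) for n in range(3, 2001, 2)]      # another convergent subsequence
print(abs(evens[-1] - 0.5) < 1e-3)            # even terms approach +1/2
print(abs(odds[-1] + 0.5) < 1e-3)             # odd terms approach -1/2
```

The full sequence does not converge, but each subsequence does, and its limit stays inside the closed set; this is exactly the behavior that the Bolzano-Weierstrass property formalizes below.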
16.4.5. Definition. A subset W of a normed vector space V is dense in V if the closure of W is the entire space V. Equivalently, W is dense if each vector in V is infinitesimally close to at least one vector in W. In other words, given any |u⟩ ∈ V and any ε > 0, there is a |w⟩ ∈ W such that ‖u − w‖ < ε; i.e., any vector in V can be approximated, with arbitrary accuracy, by a vector in W.

A paradigm of dense subsets is the set of rational numbers in the normed vector space of real numbers. It is a well-known fact that any real number can be approximated by a rational number with arbitrary accuracy: The decimal (or binary) representation of real numbers is precisely such an approximation. An intuitive way of imagining denseness is that the (necessarily) infinite subset is equal to almost all of the set, and its members are scattered "densely" everywhere in the set. The embedding of the rational numbers in the set of real numbers, and how they densely populate that set, is a good mental picture of all dense subsets.

A useful property involving the concepts of closure and openness has to do with continuous maps between normed vector spaces. Let f : H_1 → H_2 be a continuous map, and let O_2 be an open set in H_2. Let f⁻¹(O_2) denote the inverse image of O_2, i.e., all points of H_1 that are mapped to O_2. Let |x_1⟩ be a vector in f⁻¹(O_2), |x_2⟩ = f(|x_1⟩), and let B_ε(x_2) be a ball contained entirely in O_2. Then f⁻¹(B_ε(x_2)) contains |x_1⟩ and lies entirely in f⁻¹(O_2). Because of the continuity of f, one can now construct an open ball centered at |x_1⟩ lying entirely in f⁻¹(B_ε(x_2)), and, by inclusion, in f⁻¹(O_2). This shows that every point of f⁻¹(O_2) has a round open neighborhood lying entirely in f⁻¹(O_2). Thus, f⁻¹(O_2) is an open subset. One can similarly show the corresponding property for closed subsets.
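The density of the rationals in ℝ can be made concrete with the standard library's `fractions` module (a sketch; the target number and tolerances are arbitrary choices of ours): for every ε there is a rational within ε of the target.

```python
import math
from fractions import Fraction

x = math.pi                      # any real number (to double precision)
for eps in (1e-2, 1e-4, 1e-6):
    # Among fractions with denominator <= N the nearest one to x is at
    # most 1/N away, so choosing N > 1/eps guarantees an eps-approximation.
    q = Fraction(x).limit_denominator(int(1 / eps) + 1)
    print(q, abs(float(q) - x) < eps)
```

For ε = 0.01 this recovers the familiar approximation 22/7, and tighter tolerances produce the classical convergents such as 355/113.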
We can summarize the preceding argument about inverse images in the following:

16.4.6. Proposition. Let f : H_1 → H_2 be continuous. Then the inverse image of an open (closed) subset of H_2 is an open (closed) subset of H_1.

Consider the resolvent set of a bounded operator T. We claim that this set is open in ℂ. To see this, note that if λ ∈ ρ(T), then T − λ1 is invertible. On the other hand, Problem 16.1 shows that operators close to an invertible operator are invertible. Thus, if we choose a sufficiently small positive number ε and consider all complex numbers μ within a distance ε of λ, then all operators of the form T − μ1 are invertible, i.e., μ ∈ ρ(T). Therefore, any λ ∈ ρ(T) has an open round neighborhood in the complex plane all points of which are in the resolvent set. This shows that the resolvent set is open. In particular, it cannot contain any boundary points. However, ρ(T) and σ(T) have to be separated by a common boundary.⁷ Since ρ(T) cannot contain any boundary point, σ(T) must carry the entire boundary. This shows that σ(T) is a closed subset of ℂ. Recalling that σ(T) is also bounded, we have the following result.

⁷The spectrum of a bounded operator need not occupy any "area" in the complex plane. It may consist of isolated points or line segments, etc., in which case the spectrum will constitute the entire boundary.
16.4.7. Proposition. For any T ∈ B(H), the set ρ(T) is an open subset of ℂ and σ(T) is a closed, bounded subset of ℂ.

Let us go back to the notion of compactness. It turns out that the feature of the closed interval [a, b] most appropriate for generalization is the behavior of infinite sequences of numbers lying in the interval. More specifically, let {a_n}_{n=1}^∞ be a sequence of infinitely many real numbers all lying in the interval [a, b]. It is intuitively clear that since there is not enough room for these points to stay away from each other, they will have to crowd around a number of points in the interval. For example, one can construct a sequence in the interval [−1, +1] that crowds around the two points −1/2 and +1/2, the points with even n accumulating around +1/2 and those with odd n crowding around −1/2. It turns out that all closed intervals of ℝ have this property, namely, all sequences crowd around some points. To see that open intervals do not share this property, consider the open interval (0, 1). The sequence {1/(2n+1)}_{n=1}^∞ = {1/3, 1/5, ...} clearly crowds only around zero, which is not a point of the interval. But we already know that open intervals are not compact.

16.4.8. Definition. (Bolzano-Weierstrass property) A subset X of a normed vector space is called compact if every (infinite) sequence in X has a convergent subsequence.

The reason for the introduction of a subsequence in the definition is that a sequence may have many points around which it crowds. But no matter how many of these points there may exist, one can always obtain a convergent subsequence by choosing from among the points in the sequence. For instance, in the example above, one can choose the subsequence consisting of elements for which n is even. This subsequence converges to the single point +1/2.

An important theorem in real analysis characterizes all compact sets in ℝⁿ:⁸

16.4.9. Theorem. (BWHB theorem) A subset of ℝⁿ is compact if and only if it is closed and bounded.
We showed earlier that the spectrum of a bounded linear operator is closed and bounded. Identifying ℂ with ℝ², the BWHB theorem implies that

⁸BWHB stands for Bolzano, Weierstrass, Heine, and Borel. Bolzano and Weierstrass proved that any closed and bounded subset of ℝ has the Bolzano-Weierstrass property. Heine and Borel abstracted the notion of compactness in terms of open sets, and showed that a closed bounded subset of ℝ is compact. The BWHB theorem as applied to ℝ is usually called the Heine-Borel theorem (although some authors call it the Bolzano-Weierstrass theorem). Since the Bolzano-Weierstrass property and compactness are equivalent, we have decided to choose BWHB as the name of our theorem.
16.4.10. Box. The spectrum of a bounded linear operator is a compact subset of ℂ.

An immediate consequence of the BWHB theorem is that every bounded subset of ℝⁿ has a compact closure. Since ℝⁿ is a prototype of all finite-dimensional (normed) vector spaces, the same statement is true for all such vector spaces. What is interesting is that the statement indeed characterizes the normed space:

16.4.11. Theorem. A normed vector space is finite-dimensional if and only if every bounded subset has a compact closure.

This result can also be applied to subspaces of a normed vector space: A subspace W of a normed vector space V is finite-dimensional if and only if every bounded subset of W has a compact closure in W. A useful version of this property is stated in terms of sequences of points (vectors):

16.4.12. Theorem. A subspace W of a normed vector space V is finite-dimensional if and only if every bounded sequence in W has a convergent subsequence in W.

Karl Theodor Wilhelm Weierstrass (1815-1897) was both the greatest analyst and the world's foremost teacher of advanced mathematics of the last third of the nineteenth century. His career was also remarkable in another way, and a consolation to all "late starters," for he began the solid part of his professional life at the age of almost 40, when most mathematicians are long past their creative years.

His father sent him to the University of Bonn to qualify for the higher ranks of the Prussian civil service by studying law and commerce. But Karl had no interest in these subjects. He infuriated his father by rarely attending lectures, getting poor grades, and, instead, becoming a champion beer drinker. He did manage to become a superb fencer, but when he returned home, he had no degree. In order to earn his living, he made a fresh start by teaching mathematics, physics, botany, German, penmanship, and gymnastics to the children of several small Prussian towns during the day.
During the nights, however, he mingled with the intellectuals of the past, particularly the great Norwegian mathematician Abel. His remarkable research on Abelian functions was carried on for years without the knowledge of another living soul; he didn't discuss it with anyone at all, or submit it for publication in the mathematical journals of the day.

All this changed in 1854 when Weierstrass at last published an account of his research on Abelian functions. This paper caught the attention of an alert professor at the University of Königsberg, who persuaded his university to award Weierstrass an honorary doctor's degree. The Ministry of Education granted Weierstrass a year's leave of absence with pay
to continue his research, and the next year he was appointed to the University of Berlin, where he remained the rest of his life.

Weierstrass's great creative talents were evenly divided between his thinking and his teaching. The student notes of his lectures, and copies of these notes, and copies of copies, were passed from hand to hand throughout Europe and even America. Like Gauss, he was indifferent to fame, but unlike Gauss, he endeared himself to generations of students by the generosity with which he encouraged them to develop and publish, and receive credit for, ideas and theorems that he essentially originated himself. Among Weierstrass's students and followers were Cantor, Schwarz, Hölder, Mittag-Leffler, Sonja Kovalevskaya (Weierstrass's favorite student), Hilbert, Max Planck, Willard Gibbs, and many others. In 1885 he published the famous theorem now called the Weierstrass approximation theorem (see Theorems 5.2.3 and 8.1.1), which was given a far-reaching generalization, with many applications, by the modern American mathematician M. H. Stone.

The quality that came to be known as "Weierstrassian rigor" was particularly visible in his contributions to the foundations of real analysis. He refused to accept any statement as "intuitively obvious," but instead demanded ironclad proof based on explicit properties of the real numbers. The careful reasoning required for these proofs was founded on a crucial property of the real numbers now known as the BWHB theorem.

We shall need the following proposition in our study of compact operators:

16.4.13. Proposition. Let W be a closed proper subspace of a normed vector space X and δ an arbitrary nonnegative number with 0 ≤ δ < 1. Then there exists a unit vector |v_0⟩ ∈ X such that

‖x − v_0‖ ≥ δ   ∀ |x⟩ ∈ W.

Proof. Choose a vector |v⟩ in X but not in W, and let d = min{ ‖v − x‖ | |x⟩ ∈ W }. We claim that d > 0. To show this, assume otherwise.
Then we could find vectors |x_n⟩ in W whose distance from |v⟩ is less than 1/n, so that the sequence {|x_n⟩} would have |v⟩ as a limit. Closure of W would then imply that |v⟩ is in W, a contradiction. So, d > 0. Now, for any |x⟩, |x_0⟩ ∈ W, let

|u⟩ ≡ |x⟩ − (|v⟩ − |x_0⟩)/‖v − x_0‖ = (‖v − x_0‖ |x⟩ + |x_0⟩ − |v⟩)/‖v − x_0‖,

and note that since ‖v − x_0‖ |x⟩ + |x_0⟩ belongs to W, by the definition of d the norm of the numerator is at least d. Therefore, ‖u‖ ≥ d/‖v − x_0‖ for every |x⟩, |x_0⟩ ∈ W. If we choose |x_0⟩ such that ‖v − x_0‖ < dδ⁻¹, which is possible because dδ⁻¹ > d, then ‖u‖ ≥ δ for all |x⟩ ∈ W. Now let |v_0⟩ = (|v⟩ − |x_0⟩)/‖v − x_0‖. □
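In a finite-dimensional inner product space the proposition is easy to realize concretely (a sketch with a subspace of our own choosing): for W = span{(1, 0, 0)} in ℝ³, the unit vector |v_0⟩ = (0, 1, 0) satisfies ‖x − v_0‖ ≥ 1 ≥ δ for every |x⟩ ∈ W, since W happens to be orthogonal to |v_0⟩.

```python
import math

def dist_to_v0(t):
    """Distance from x = t*(1, 0, 0) in W to v0 = (0, 1, 0)."""
    return math.sqrt(t * t + 1.0)

# ||x - v0||^2 = t^2 + 1 >= 1, so the bound of Proposition 16.4.13 holds
# here for every delta < 1, with equality approached at t = 0.
print(all(dist_to_v0(t) >= 1.0 for t in [-10 + k * 0.5 for k in range(41)]))  # True
```

In infinite dimensions no such orthogonal complement argument is available in a general normed space, which is why the proposition needs the more careful proof above.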
16.5 Compact Operators

It is straightforward to show that if X is a compact set in H_1 and f : H_1 → H_2 is continuous, then f(X) (the image of X) is compact in H_2. Since all bounded operators are continuous, we conclude that all bounded operators map compact subsets onto compact subsets. There is a special subset of B(H_1, H_2) that deserves particular attention.

16.5.1. Definition. An operator K ∈ B(H_1, H_2) is called a compact operator if it maps a bounded subset of H_1 onto a subset of H_2 with compact closure.

Since we will be dealing with function spaces, and since it is easier to deal with sequences of functions than with subsets of the space of functions, we find it more useful to have a definition of compact operators in terms of sequences rather than subsets. Thus, instead of a bounded subset, we take a subset of it consisting of a (necessarily) bounded sequence. The image of this sequence will be a sequence in a compact set, which, by definition, must have a convergent subsequence. We therefore have the following:

16.5.2. Theorem. An operator K ∈ B(H_1, H_2) is compact if and only if for any bounded sequence {|x_n⟩} in H_1, the sequence {K|x_n⟩} has a convergent subsequence in H_2.

16.5.3. Example. Consider B(H), the set of bounded operators on the Hilbert space H. If K is a compact operator and T a bounded operator, then KT and TK are compact. This is because {T|x_n⟩ ≡ |y_n⟩} is a bounded sequence if {|x_n⟩} is, and {K|y_n⟩ = KT|x_n⟩} has a convergent subsequence, because K is compact. For the second part, use the first definition of a compact operator and note that K maps bounded sets onto compact sets, which T (being continuous) maps onto a compact set. As a special case of this property we note that the product of two compact operators is compact. Similarly, one can show that any linear combination of compact operators is compact. Thus, any polynomial of a compact operator is compact.
In particular,

(1 − K)ⁿ = Σ_{j=0}^n [n!/(j!(n − j)!)] (−K)^j = 1 + Σ_{j=1}^n [n!/(j!(n − j)!)] (−K)^j ≡ 1 − K_n,

where K_n is a compact operator.

16.5.4. Definition. An operator T ∈ L(H_1, H_2) is called a finite rank operator if its range is finite-dimensional.

The following is clear from Theorem 16.4.12:

16.5.5. Proposition. A finite rank operator is compact.

In particular, every linear transformation of a finite-dimensional vector space is compact.
16.5.6. Theorem. If {K_n} ∈ L(H_1, H_2) are compact and K ∈ L(H_1, H_2) is such that ‖K − K_n‖ → 0, then K is compact.

Proof. Let {|x_m⟩} be a bounded sequence in H_1. Let {K_1|x_{m_1}⟩} be the convergent subsequence guaranteed by the compactness of K_1. Now, {|x_{m_1}⟩} is a bounded sequence in H_1. It therefore has a subsequence {|x_{m_2}⟩} such that {K_2|x_{m_2}⟩} is convergent. Note that {K_1|x_{m_2}⟩} is also convergent. Continuing this process, we construct a sequence of sequences, each of which is a subsequence of all the sequences preceding it, and such that the sequences {K_l|x_{m_k}⟩} for l = 1, ..., k are all convergent. In particular, if we pick the diagonal sequence {|y_m⟩} ≡ {|x_{m_m}⟩}, then for any l ∈ ℕ, the sequence {K_l|y_m⟩} converges in H_2.

To show that K is compact, we shall establish that {|y_m⟩} is a subsequence of {|x_m⟩} such that {K|y_m⟩} is convergent. Since H_2 is complete, it is sufficient to show that {K|y_m⟩} is Cauchy. We use the so-called "ε/3 trick." Write

K|y_m⟩ − K|y_n⟩ = K|y_m⟩ − K_l|y_m⟩ + K_l|y_m⟩ − K_l|y_n⟩ + K_l|y_n⟩ − K|y_n⟩

and use the triangle inequality to obtain

‖K|y_m⟩ − K|y_n⟩‖ ≤ ‖(K − K_l)|y_m⟩‖ + ‖K_l|y_m⟩ − K_l|y_n⟩‖ + ‖(K_l − K)|y_n⟩‖.

By choosing m, n, and l large enough, we can make each of the three terms on the RHS smaller than ε/3: the first and the third ones because K_l → K (and the ‖y_m‖ are bounded), the second one because {K_l|y_n⟩} is a convergent sequence. □

Recall that given an orthonormal basis {|e_i⟩}_{i=1}^∞, any operator T on a Hilbert space H can be written as Σ_{i,j=1}^∞ c_ij |e_i⟩⟨e_j|, where c_ij = ⟨e_i|T|e_j⟩. Now let K be a compact operator and consider the finite rank operators

K_n ≡ Σ_{i,j=1}^n c_ij |e_i⟩⟨e_j|.

Clearly, ‖K − K_n‖ → 0. The hermitian adjoints {K_n†} are also of finite rank (therefore, compact). Barring some convergence technicality, we see that K†, which is the limit of the sequence of these compact operators, is also compact.

16.5.7. Theorem. K is a compact operator if and only if K† is.
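Theorem 16.5.6 and the finite rank truncations K_n can be illustrated with a diagonal operator (our own example) whose diagonal entries 1/i decay to zero; for diagonal operators the operator norm is the largest surviving |entry|, so ‖K − K_n‖ = 1/(n + 1) → 0:

```python
# Diagonal operator K|e_i> = (1/i)|e_i>, represented by its diagonal,
# truncated to a large but finite basis (the truncation is an artifact
# of working numerically, not part of the theorem).
N = 1000
K = [1.0 / i for i in range(1, N + 1)]

def rank_n(n):
    """Finite rank truncation K_n: keep only the first n diagonal entries."""
    return K[:n] + [0.0] * (N - n)

# For diagonal operators, ||K - K_n|| = max_i |K_ii - (K_n)_ii| = 1/(n+1).
for n in (1, 10, 100):
    diff = max(abs(k - kn) for k, kn in zip(K, rank_n(n)))
    print(n, diff)   # diff equals 1/(n+1), tending to 0
```

Since each K_n has finite rank (hence is compact by Proposition 16.5.5), the norm limit K is compact; a diagonal with entries that do not decay, such as the identity, admits no such approximation.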
A particular type of operator occurs frequently in integral equation theory. These operators are called Hilbert-Schmidt operators and are defined as follows:

16.5.8. Definition. Let H be a Hilbert space, and {|e_i⟩}_{i=1}^∞ an orthonormal basis. An operator T ∈ L(H) is called Hilbert-Schmidt if
tr(T†T) ≡ Σ_{i=1}^∞ ⟨e_i|T†T|e_i⟩ = Σ_{i=1}^∞ ⟨Te_i|Te_i⟩ = Σ_{i=1}^∞ ‖Te_i‖² < ∞.

16.5.9. Theorem. Hilbert-Schmidt operators are compact.

For a proof, see [Rich 78, pp. 242-246].

16.5.10. Example. It is time to give a concrete example of a compact (Hilbert-Schmidt) operator. For this, we return to Equation (16.2) with w(y) = 1, and assume that |u⟩ ∈ L²(a, b). Suppose further that the function K(x, y) is continuous on the closed rectangle [a, b] × [a, b] in the xy-plane (or ℝ²). Under such conditions, K(x, y) is called a Hilbert-Schmidt kernel. We now show that K is compact. First note that due to the continuity of K(x, y), we have ∫_a^b ∫_a^b |K(x, y)|² dx dy < ∞. Next, we calculate the trace of K†K. Let {|e_i⟩}_{i=1}^∞ be any orthonormal basis of L²(a, b). Then

tr K†K = Σ_{i=1}^∞ ⟨e_i|K†K|e_i⟩ = Σ_{i=1}^∞ ∫∫∫ ⟨e_i|x⟩ ⟨x|K†|y⟩ ⟨y|K|z⟩ ⟨z|e_i⟩ dx dy dz
       = ∫∫∫ ⟨y|K|x⟩* ⟨y|K|z⟩ Σ_{i=1}^∞ ⟨z|e_i⟩⟨e_i|x⟩ dx dy dz
       = ∫∫∫ ⟨y|K|x⟩* ⟨y|K|z⟩ δ(x − z) dx dy dz = ∫∫ |⟨y|K|x⟩|² dx dy < ∞,

where we used Σ_{i=1}^∞ |e_i⟩⟨e_i| = 1 and ⟨z|x⟩ = δ(x − z). Therefore, K is Hilbert-Schmidt, and by Theorem 16.5.9 it is compact.

Bernard Bolzano (1781-1848) was a Czech philosopher, mathematician, and theologian who made significant contributions to both mathematics and the theory of knowledge. He entered the Philosophy Faculty of the University of Prague in 1796, studying philosophy and mathematics. He wrote, "My special pleasure in mathematics rested therefore particularly on its purely speculative parts, in other words I prized only that part of mathematics which was at the same time philosophy." In the autumn of 1800 he began three years of theological study while he was preparing a doctoral thesis on geometry.
He received his doctorate in 1804 for a thesis in which he gave his view of mathematics and what constitutes a correct mathematical proof. In the preface he wrote:

I could not be satisfied with a completely strict proof if it were not derived from concepts which the thesis to be proved contained, but rather made use of some fortuitous, alien, intermediate concept, which is always an erroneous transition to another kind.
Two days after receiving his doctorate Bolzano was ordained a Roman Catholic priest. However, he came to realize that teaching and not ministering defined his true vocation. In the same year, Bolzano was appointed to the chair of philosophy and religion at the University of Prague. Because of his pacifist beliefs and his concern for economic justice, he was suspended from his position in 1819 after pressure from the Austrian government. Bolzano had not given up without a fight, but once he was suspended on a charge of heresy he was put under house arrest and forbidden to publish. Although some of his books had to be published outside Austria because of government censorship, he continued to write and to play an important role in the intellectual life of his country. Bolzano intended to write a series of papers on the foundations of mathematics. He wrote two, the first of which was published. Instead of publishing the second one he decided to ". . . make myself better known to the learned world by publishing some papers which, by their titles, would be more suited to arouse attention." Pursuing this strategy he published Der binomische Lehrsatz . . . (1816) and Rein analytischer Beweis . . . (1817), which contain an attempt to free calculus from the concept of the infinitesimal. He is clear in his intention, stating in the preface of the first that the work is "a sample of a new way of developing analysis." The paper gives a proof of the intermediate value theorem with Bolzano's new approach, and in the work he defined what is now called a Cauchy sequence. The concept appears in Cauchy's work four years later, but it is unlikely that Cauchy had read Bolzano's work. After 1817, Bolzano published no further mathematical works for many years. Between the late 1820s and the 1840s, he worked on a major work, Grössenlehre.
This attempt to put the whole of mathematics on a logical foundation was published in parts, while Bolzano hoped that his students would finish and publish the complete work. His work Paradoxien des Unendlichen, a study of paradoxes of the infinite, was published in 1851, three years after his death, by one of his students. The word "set" appears here for the first time. In this work Bolzano gives examples of 1–1 correspondences between the elements of an infinite set and the elements of a proper subset. Bolzano's theories of mathematical infinity anticipated Georg Cantor's theory of infinite sets. It is also remarkable that he gave a function that is nowhere differentiable yet everywhere continuous.

16.6 Spectrum of Compact Operators

Our next task is to investigate the spectrum σ(K) of a compact operator K on a Hilbert space ℋ. We are particularly interested in the set of eigenvalues and eigenvectors of compact operators. Recall that every eigenvalue of an operator on a vector space of finite dimension is in its spectrum, and that every point of the spectrum is an eigenvalue (see page 457). In general, the second statement is not true. In fact, we saw that the right-shift operator had no eigenvalue at all, yet its spectrum was the entire unit disk of the complex plane. We first observe that 0 ∈ σ(K), because otherwise 0 ∈ ρ(K), which implies that K = K − 0·1 is invertible with inverse K⁻¹. The product of two compact operators (in fact, the product of a compact and a bounded operator) is compact (see
Example 16.5.3). This yields a contradiction,¹⁰ because the unit operator cannot be compact: It maps a bounded sequence to itself, not to a sequence with a convergent subsequence.

16.6.1. Proposition. For any compact operator K ∈ 𝔅(ℋ) on an infinite-dimensional Hilbert space, we have 0 ∈ σ(K).

To proceed, we note that eigenvectors of K corresponding to the eigenvalue λ belong to the null space of K − λ1. So, let

N_λ ≡ ker(K − λ1),   N_λ† ≡ ker(K† − λ*1),   R_λ ≡ Range(K − λ1),   R_λ† ≡ Range(K† − λ*1).

16.6.2. Theorem. N_λ and N_λ† are finite-dimensional subspaces of ℋ. Furthermore, R_λ⊥ = N_λ†.

Proof. We use Theorem 16.4.12. Let {|x_n⟩} be a bounded sequence in N_λ. Since K is compact, {K|x_n⟩} = {λ|x_n⟩} has a convergent subsequence. So {|x_n⟩} has a convergent subsequence. This subsequence will converge to a vector in N_λ if the latter is closed. But this follows from Proposition 16.4.6, continuity of K − λ1, the fact that N_λ is the inverse image of the zero vector, and the fact that any single point of a space, such as the zero vector, is a closed subset. Finite-dimensionality of N_λ† follows from the compactness of K† and a similar argument. To show the second statement, we observe that for any bounded operator T, we have¹¹ |u⟩ ∈ T(ℋ)⊥ iff ⟨u|v⟩ = 0 for all |v⟩ ∈ T(ℋ) iff ⟨u|Tx⟩ = 0 for all |x⟩ ∈ ℋ iff ⟨T†u|x⟩ = 0 for all |x⟩ ∈ ℋ iff T†|u⟩ = 0 iff |u⟩ ∈ ker T†. This shows that T(ℋ)⊥ = ker T†. The desired result is obtained by letting T = K − λ1 and noting that (W⊥)⊥ = W for any subspace W of a Hilbert space.  □

We note that N_λ is the eigenspace of K corresponding to the eigenvalue λ. However, it may well happen that zero is the only number in σ(K). In the finite-dimensional case, this corresponds to the case where the matrix representation of the operator is not diagonalizable. In such a case, the standard procedure is to look at generalized eigenvectors.
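In finite dimensions, the passage to generalized eigenvectors can be made concrete. The sketch below (my own example, not from the text) takes a matrix with a 3 × 3 Jordan block at λ = 2 and counts dim ker(A − λ1)ⁿ by rank; the dimensions grow strictly until they stabilize, which is exactly the behavior formalized for compact operators in the theorems that follow:

```python
import numpy as np

# Finite-dimensional illustration (my own example, not from the text):
# for a matrix A with a 3x3 Jordan block at eigenvalue 2, the dimension
# of ker(A - 2I)^n grows strictly until n = p = 3 and is constant after,
# mirroring the chain N^(0) ⊂ N^(1) ⊂ ... of generalized eigenspaces.
lam = 2.0
A = np.array([[2, 1, 0, 0],
              [0, 2, 1, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 5]], dtype=float)
B = A - lam * np.eye(4)

dims = []
for n_ in range(6):
    Bn = np.linalg.matrix_power(B, n_)
    dims.append(4 - np.linalg.matrix_rank(Bn))  # dim ker(A - λI)^n

print(dims)   # [0, 1, 2, 3, 3, 3]: strict growth up to p = 3, then constant
```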
We do the same in the case of compact operators.

16.6.3. Definition. A vector |u⟩ is a generalized eigenvector of K of order m if (K − λ1)^{m−1}|u⟩ ≠ 0 but (K − λ1)^m |u⟩ = 0. The set of such vectors, i.e., the null space of (K − λ1)^m, will be denoted by N_λ^(m).

It is clear that

{0} = N_λ^(0) ⊂ N_λ ≡ N_λ^(1) ⊂ N_λ^(2) ⊂ ⋯ ⊂ N_λ^(m) ⊂ N_λ^(m+1) ⊂ ⋯   (16.6)

¹⁰Our conclusion is valid only in infinite dimensions. In finite dimensions, all operators, including 1, are compact. In what follows, we assume that ℋ is infinite-dimensional.
¹¹Recall that T(ℋ) is the range of the operator T.
and each N_λ^(n) is a subspace of ℋ. In general, a subspace with higher index is larger than those with lower index. If there happens to be an equality at one link of the above chain, then the equality continues all the way to the right ad infinitum. To see this, let p be the first integer for which the equality occurs, and let n > p be arbitrary. Suppose |u⟩ ∈ N_λ^(n+1). Then

(K − λ1)^{p+1} [ (K − λ1)^{n−p} |u⟩ ] = (K − λ1)^{n+1} |u⟩ = 0.

It follows that (K − λ1)^{n−p}|u⟩ is in N_λ^(p+1). But N_λ^(p) = N_λ^(p+1). So

(K − λ1)^n |u⟩ = (K − λ1)^p [ (K − λ1)^{n−p} |u⟩ ] = 0.

Thus every vector in N_λ^(n+1) is also in N_λ^(n). This fact and the above chain imply that N_λ^(n) = N_λ^(n+1) for all n > p.

16.6.4. Theorem. The subspace¹² N_λ^(n) is finite-dimensional for each n. Moreover, there is an integer p such that N_λ^(n) ≠ N_λ^(n+1) for n = 0, 1, 2, …, p − 1, but N_λ^(n) = N_λ^(n+1) for all n ≥ p.

Proof. For the first part, use the result of Example 16.5.3 to show that (K − λ1)^n = K_n + (−λ)^n 1, where K_n is compact. Now repeat the proof of Theorem 16.6.2 for K_n. If the integer p exists, the second part of the theorem follows from the discussion preceding the theorem. To show the existence of p, suppose, to the contrary, that N_λ^(n) ≠ N_λ^(n+1) for every positive integer n. This means that for every n, we can find a (unit) vector |v_n⟩ ∈ N_λ^(n+1) that is not in N_λ^(n), and that by Proposition 16.4.13 has the property

‖ |v_n⟩ − |u⟩ ‖ ≥ ½  for all |u⟩ ∈ N_λ^(n).

We thus obtain a bounded sequence {|v_n⟩}. Let us apply K to this sequence. If j > l, then (|v_j⟩ − |v_l⟩) ∈ N_λ^(j+1) by the construction of |v_j⟩ and the fact that N_λ^(l+1) ⊂ N_λ^(j+1). Furthermore,

(K − λ1)^{j+1} (|v_j⟩ − |v_l⟩) = 0

¹²For infinite dimensions, the fact that linear combinations of a subset belong to the subset is not sufficient to make that subset into a subspace. The subset must also be closed. We normally leave out the rather technical proof of closure.
by the definition of N_λ^(j+1). Therefore, (K − λ1)(|v_j⟩ − |v_l⟩) ∈ N_λ^(j). Now note that

K|v_j⟩ − K|v_l⟩ = λ { (1/λ)(K − λ1)|v_j⟩ − (1/λ)(K − λ1)|v_l⟩ + |v_j⟩ − |v_l⟩ },

where the first two vectors in the curly brackets belong to N_λ^(j). It follows from Proposition 16.4.13 that the norm of the vector in curly brackets is larger than ½. Hence, ‖K|v_j⟩ − K|v_l⟩‖ ≥ |λ|/2; i.e., since j and l are arbitrary, the sequence {K|v_n⟩} does not have a convergent subsequence. This contradicts the fact that K is compact. So, there must exist a p such that N_λ^(p) = N_λ^(p+1).  □

We also need the range of various powers of K − λ1. Thus, let R_λ^(n) ≡ Range(K − λ1)^n. One can show that

ℋ = R_λ^(0) ⊇ R_λ^(1) ⊇ ⋯ ⊇ R_λ^(n) ⊇ R_λ^(n+1) ⊇ ⋯ .

16.6.5. Theorem. Each R_λ^(n) is a subspace of ℋ. Moreover, there is an integer q such that R_λ^(n) ≠ R_λ^(n+1) for n = 0, 1, …, q − 1, but R_λ^(n) = R_λ^(n+1) for all n ≥ q.

Proof. The proof is similar to that of Theorem 16.6.4. The only extra step needed is to show that R_λ^(n) is closed. We shall not reproduce this step.  □

16.6.6. Theorem. Let q be the integer of Theorem 16.6.5. Then:
1. ℋ = N_λ^(q) ⊕ R_λ^(q).
2. N_λ^(q) and R_λ^(q) are invariant subspaces of K.
3. The only vector in R_λ^(q) that K − λ1 maps to zero is the zero vector. In fact, when restricted to R_λ^(q), the operator K − λ1 is invertible.

Proof. (1) Recall that ℋ = N_λ^(q) ⊕ R_λ^(q) means that every vector of ℋ can be written as the sum of a vector in N_λ^(q) and a vector in R_λ^(q), and the only vector common to both subspaces is zero. We show the latter property first. In fact, we show that N_λ^(m) ∩ R_λ^(q) = 0 for any integer m. Suppose |x⟩ is in this intersection. For each n ≥ q, there must be a vector |x_n⟩ in ℋ such that |x⟩ = (K − λ1)^n |x_n⟩, because R_λ^(n) = R_λ^(q) for n ≥ q. If |x⟩ ≠ 0, then |x_n⟩ ∉ N_λ^(n) for each n. Now let r be the larger of the two integers (p, q), where p is the integer of Theorem 16.6.4. Then

|x_r⟩ ∉ N_λ^(r).   (16.7)
From

0 = (K − λ1)^m |x⟩ = (K − λ1)^{m+r} |x_r⟩,

it follows that |x_r⟩ ∈ N_λ^(m+r). But N_λ^(m+r) = N_λ^(r) = N_λ^(p), contradicting Equation (16.7). We conclude that |x⟩ must be zero. By the definition of R_λ^(q), for any vector |z⟩ in ℋ, we have (K − λ1)^q |z⟩ ∈ R_λ^(q). Since R_λ^(q) = R_λ^(2q), there must be a vector |y⟩ ∈ ℋ such that (K − λ1)^q |z⟩ = (K − λ1)^{2q} |y⟩, or (K − λ1)^q [ |z⟩ − (K − λ1)^q |y⟩ ] = 0. This shows that |z⟩ − (K − λ1)^q |y⟩ is in N_λ^(q). On the other hand,

|z⟩ = [ |z⟩ − (K − λ1)^q |y⟩ ] + (K − λ1)^q |y⟩,

and the first part of the theorem is done.

(2) For the second part, we simply note that (K − λ1)N_λ^(k) ⊆ N_λ^(k−1) ⊆ N_λ^(k), and that

K(N_λ^(q)) = (K − λ1 + λ1)(N_λ^(q)) = (K − λ1)(N_λ^(q)) + λ1(N_λ^(q)) ⊆ N_λ^(q),

since both terms on the right lie in N_λ^(q). Similarly,

K(R_λ^(q)) = (K − λ1 + λ1)(R_λ^(q)) = (K − λ1)(R_λ^(q)) + λ1(R_λ^(q)) ⊆ R_λ^(q),

because (K − λ1)(R_λ^(q)) ⊆ R_λ^(q+1) ⊆ R_λ^(q).

(3) Suppose |z⟩ ∈ R_λ^(q) and (K − λ1)|z⟩ = 0. Then |z⟩ = (K − λ1)^q |y⟩ for some |y⟩ in ℋ, and 0 = (K − λ1)|z⟩ = (K − λ1)^{q+1} |y⟩, or |y⟩ ∈ N_λ^(q+1). From part (1), with m = q + 1, we conclude that |z⟩ = 0. It follows that K − λ1 is injective (or 1–1). We also have (K − λ1)R_λ^(q) = (K − λ1)(K − λ1)^q(ℋ) = (K − λ1)^{q+1}(ℋ) = R_λ^(q+1) = R_λ^(q). Therefore, when restricted to R_λ^(q), the operator K − λ1 is surjective (or onto) as well. Thus, (K − λ1) : R_λ^(q) → R_λ^(q) is bijective, and therefore has an inverse.  □

16.6.7. Corollary. The two integers p and q introduced in Theorems 16.6.4 and 16.6.5 are equal.
Proof. The proof is left as a problem for the reader (see Problem 16.5).  □

The next theorem characterizes the spectrum of a compact operator completely. In order to prove it, we need the following lemma.

16.6.8. Lemma. Let K_r : R_λ^(q) → R_λ^(q) be the restriction of K to R_λ^(q). Then:
1. Each nonzero point of σ(K) is an eigenvalue of K whose eigenspace is finite-dimensional.
2. σ(K_r) = σ(K).
3. Every infinite sequence in σ(K) converges to zero.

Proof. (1) If λ ≠ 0 is not an eigenvalue of K, the null space of K − λ1 is zero. This says that {0} = N_λ^(0) = N_λ^(1) = ⋯, i.e., p = q = 0. From Theorem 16.6.6, we conclude that ℋ = N_λ^(0) ⊕ R_λ^(0) = R_λ^(1). Therefore, K − λ1 is onto. Part (3) of Theorem 16.6.6 shows that K − λ1 is one-to-one. Thus, K − λ1 is invertible, and λ ∈ ρ(K). So, λ ∉ σ(K).

(2) Clearly σ(K_r) ⊆ σ(K). To show the reverse inclusion, first note that R_λ^(q) is infinite-dimensional because N_λ^(q) has finite dimension. Thus by Proposition 16.6.1, 0 ∈ σ(K_r). Now let μ, nonzero and distinct from λ, be in σ(K). By part (1), μ is an eigenvalue of K, so there is a vector |u⟩ ∈ ℋ such that K|u⟩ = μ|u⟩. We also have (K − λ1)|u⟩ = (μ − λ)|u⟩, or (K − λ1)^q |u⟩ = (μ − λ)^q |u⟩. Thus, (μ − λ)^q |u⟩ (and, therefore, |u⟩) is in R_λ^(q). Therefore, we can restrict K to R_λ^(q); i.e., we can write K|u⟩ = μ|u⟩ as K_r|u⟩ = μ|u⟩, or (K_r − μ1)|u⟩ = 0. Hence, μ ∈ σ(K_r). We conclude that every point of σ(K) is a point of σ(K_r), and σ(K) ⊆ σ(K_r).

(3) Let λ be the limit of an infinite sequence in σ(K) = σ(K_r). If λ ≠ 0, K_r − λ1 will be invertible (Theorem 16.6.6, part 3), indicating that λ ∈ ρ(K_r). Since ρ(K_r) is open, we can find an open round neighborhood of λ entirely in ρ(K_r). This contradicts the property of a limit of an infinite sequence whereby any neighborhood of the limit contains (infinitely many) other points of the sequence.
Therefore, we must conclude that no nonzero λ can be the limit of an infinite sequence in σ(K).  □

16.6.9. Theorem. Let K be a compact operator on an infinite-dimensional Hilbert space ℋ. Then:
1. 0 ∈ σ(K).
2. Each nonzero point of σ(K) is an eigenvalue of K whose eigenspace is finite-dimensional.
3. σ(K) is either a finite set or a sequence that converges to zero.
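Part 3 of Theorem 16.6.9 can be watched numerically. In this sketch (my own discretization, not from the text) the hermitian kernel K(x, y) = min(x, y) on L²(0, 1), whose exact eigenvalues are 1/((k − ½)²π²), is discretized by a midpoint rule; the computed eigenvalues match the exact ones and form a sequence of finite-multiplicity eigenvalues converging to zero:

```python
import numpy as np

# Illustrative check (my discretization, not from the text): the kernel
# K(x, y) = min(x, y) on L^2(0, 1) defines a compact hermitian operator
# with exact eigenvalues 1/((k - 1/2)^2 π^2), k = 1, 2, ...  -> 0.
n = 1000
x = (np.arange(n) + 0.5) / n
A = np.minimum(x[:, None], x[None, :]) / n    # midpoint-rule discretization

evals = np.sort(np.linalg.eigvalsh(A))[::-1]  # descending
exact = [1.0 / ((k - 0.5)**2 * np.pi**2) for k in (1, 2, 3)]

print(evals[:3])   # ≈ [0.4053, 0.0450, 0.0162]
print(exact)       # agrees to several digits; the tail of evals tends to 0
```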
Figure 16.1 The shaded area represents a convex subset of the vector space. It consists of vectors whose tips lie in the shaded region. It is clear that there is a (unique) vector belonging to the subset whose length is minimum.

Proof. (1) was proved in Proposition 16.6.1. (2) was shown in the lemma above.
(3) Let σ_n(K) ≡ {λ ∈ σ(K) : |λ| ≥ 1/n}. Clearly, σ_n(K) must be a finite set, because otherwise the infinite set would constitute a sequence that by compactness of σ_n(K) would have to have (at least) a limit point. By part (3) of Lemma 16.6.8, this limit must be zero, which is not included in σ_n(K). Let σ_1(K) = {λ_i}_{i=1}^k, arranged in order of decreasing absolute value. Next, let λ_{k+1}, λ_{k+2}, … label the elements of σ_2(K) not accounted for in σ_1(K), again arranged in decreasing absolute value. If this process stops after a finite number of steps, σ(K) is finite. Otherwise, continue the process to construct a sequence whose limit by necessity is zero.  □

16.7 Spectral Theorem for Compact Operators

The finite-dimensional spectral decomposition theorem of Chapter 4 was based on the existence of eigenvalues, eigenspaces, and projection operators. Such existence was guaranteed by the existence of an inner product for any finite-dimensional vector space. The task of establishing spectral decomposition for infinite-dimensional vector spaces is complicated not only by the possibility of the absence of an inner product, but also by the questions of completeness, closure, and convergence. One can eliminate the first two hindrances by restricting oneself to a Hilbert space. However, even so, one has to deal with other complications of infinite dimensions. As an example, consider the relation V = W ⊕ W⊥, which is trivially true for any subspace W in finite dimensions once an orthonormal basis is chosen. Recall that the procedure for establishing this relation is to complement a basis of W to produce a basis for the whole space.
In an infinite-dimensional Hilbert space, we do not know a priori how to complement the basis of a subspace (which may be infinite-dimensional). Thus, one has to prove the existence of the orthogonal complement of a subspace. Without going into details, we sketch the proof. First a definition:
16.7.1. Definition. A convex subset E of a vector space is a collection of vectors such that if |u⟩ and |v⟩ are in E, then |u⟩ − t(|u⟩ − |v⟩) is also in E for all 0 ≤ t ≤ 1.

Intuitively, any two points of a convex subset can be connected by a straight line segment lying entirely in the subset. Let E be a convex subset (not a subspace) of a Hilbert space ℋ. One can show that there exists a unique vector in E with minimal norm (see Figure 16.1). Now let M be a subspace of ℋ. For an arbitrary vector |u⟩ in ℋ, consider the subset E = |u⟩ − M, i.e., all vectors of the form |u⟩ − |m⟩ with |m⟩ ∈ M. Denote the unique vector of minimal norm of |u⟩ − M by |u⟩ − |Pu⟩ with |Pu⟩ ∈ M. One can show that |u⟩ − |Pu⟩ is orthogonal to M, i.e., (|u⟩ − |Pu⟩) ∈ M⊥ (see Figure 16.2). Obviously, only the zero vector can be simultaneously in M and M⊥. Furthermore, any vector |u⟩ in ℋ can be written as |u⟩ = |Pu⟩ + (|u⟩ − |Pu⟩) with |Pu⟩ ∈ M and (|u⟩ − |Pu⟩) ∈ M⊥. This shows that ℋ = M ⊕ M⊥. In words, a Hilbert space is the direct sum of any one of its subspaces and the orthogonal complement of that subspace. The vector |Pu⟩ so constructed is the projection of |u⟩ in M. A projection operator P can be defined as a linear operator with the property that P² = P. One can then show the following.

16.7.2. Theorem. The kernel ker P of a projection operator P is the orthogonal complement of the range P(ℋ) of P in ℋ iff P is hermitian.

This is the reason for demanding hermiticity of the projection operators in our treatment of the finite-dimensional case. We now concentrate on the compact operators, and first look at hermitian compact operators. We need two lemmas.

16.7.3. Lemma. Let H ∈ 𝔅(ℋ) be a bounded hermitian operator on the Hilbert space ℋ. Then

‖H‖ = max{ |⟨Hx|x⟩| : ‖x‖ = 1 }.

Proof. Let M denote the positive number on the RHS.
From the definition of the norm of an operator, we easily obtain |⟨Hx|x⟩| ≤ ‖H‖ ‖x‖² = ‖H‖, or M ≤ ‖H‖. For the reverse inequality, see Problem 16.7.  □

16.7.4. Lemma. Let K ∈ 𝔅(ℋ) be a hermitian compact operator. Then there is an eigenvalue λ of K such that |λ| = ‖K‖.

Proof. Let {|x_n⟩} be a sequence of unit vectors such that

lim_{n→∞} |⟨Kx_n|x_n⟩| = ‖K‖.

This is always possible, as the following argument shows. Let ε be a small positive number. There must exist a unit vector |x_1⟩ ∈ ℋ such that ‖K‖ − ε ≤ |⟨Kx_1|x_1⟩|,
Figure 16.2 The shaded area represents the subspace M of the vector space. The convex subset E consists of all vectors connecting points of M to the tip of |u⟩. It is clear that there is a (unique) vector belonging to E whose length is minimum. The figure shows that this vector is orthogonal to M.

because otherwise, ‖K‖ − ε would be greater than or equal to the norm of the operator (see Lemma 16.7.3). Similarly, there must exist another (different) unit vector |x_2⟩ ∈ ℋ such that ‖K‖ − ε/2 ≤ |⟨Kx_2|x_2⟩|. Continuing this way, we construct an infinite sequence of unit vectors {|x_n⟩} with the property ‖K‖ − ε/n ≤ |⟨Kx_n|x_n⟩|. This construction clearly produces the desired sequence. Note that the argument holds for any hermitian bounded operator; compactness is not necessary. Now define a_n ≡ ⟨Kx_n|x_n⟩ and let a = lim a_n, so that |a| = ‖K‖. Compactness of K implies that {K|x_n⟩} has a convergent subsequence; passing to this subsequence, let |y⟩ ∈ ℋ be the limit of {K|x_n⟩}. Then

‖y‖ = lim ‖Kx_n‖ ≤ ‖K‖ ‖x_n‖ = ‖K‖.

On the other hand,

0 ≤ ‖Kx_n − a x_n‖² = ‖Kx_n‖² − 2a ⟨Kx_n|x_n⟩ + a².

Taking the limit and noting that a_n and a are real, we get 0 ≤ ‖y‖² − 2a² + a² = ‖y‖² − a². It follows from these two inequalities that ‖y‖ = ‖K‖ and that lim |x_n⟩ = |y⟩/a. Furthermore,

(K − a1)(|y⟩/a) = (K − a1)(lim |x_n⟩) = lim (K − a1)|x_n⟩ = 0.

Therefore, a is an eigenvalue of K with eigenvector |y⟩/a.  □

Let us order all the eigenvalues of Theorem 16.6.9 in decreasing absolute value. Let M_n denote the (finite-dimensional) eigenspace corresponding to eigenvalue λ_n, and P_n the projection to M_n. The eigenspaces are pairwise orthogonal and P_n P_m = 0 for m ≠ n. This follows in exact analogy with the finite-dimensional case. First assume that K has only finitely many eigenvalues,
Let M ≡ M_1 ⊕ M_2 ⊕ ⋯ ⊕ M_r = Σ_{j=1}^r ⊕M_j, and let M_0 be the orthogonal complement of M. Since each eigenspace is invariant under K, so is M. Therefore, by Theorem 4.2.3 (which holds for finite- as well as infinite-dimensional vector spaces) and the fact that K is hermitian, M_0 is also invariant. Let K_0 be the restriction of K to M_0. By Lemma 16.7.4, K_0 has an eigenvalue λ such that |λ| = ‖K_0‖. If λ ≠ 0, it must be one of the eigenvalues already accounted for, because any eigenvalue of K_0 is also an eigenvalue of K. This is impossible, because M_0 is orthogonal to all the eigenspaces. So, λ = 0, or |λ| = ‖K_0‖ = 0, or K_0 = 0; i.e., K acts as the zero operator on M_0. Let P_0 be the orthogonal projection on M_0. Then ℋ = Σ_{j=0}^r ⊕M_j and 1 = Σ_{j=0}^r P_j, and for an arbitrary |x⟩ ∈ ℋ, we have

K|x⟩ = K( Σ_{j=0}^r P_j |x⟩ ) = Σ_{j=0}^r K(P_j |x⟩) = Σ_{j=1}^r λ_j (P_j |x⟩).

It follows that K = Σ_{j=1}^r λ_j P_j. Notice that the range of K is Σ_{j=1}^r ⊕M_j, which is finite-dimensional. Thus, K has finite rank. Barring some technical details, which we shall not reproduce here, the case of a compact hermitian operator with infinitely many eigenvalues goes through in the same way (see [DeVi 90, pp. 179–180]):

16.7.5. Theorem. (spectral theorem: compact hermitian operators) Let K be a compact hermitian operator on a Hilbert space ℋ. Let {λ_j}_{j=1}^N be the distinct nonzero eigenvalues of K arranged in decreasing order of absolute value. For each j, let M_j be the eigenspace of K corresponding to eigenvalue λ_j and P_j its projection operator, with the property P_i P_j = 0 for i ≠ j. Then:
1. If N < ∞, then K is an operator of finite rank, K = Σ_{j=1}^N λ_j P_j, and ℋ = M_0 ⊕ M_1 ⊕ ⋯ ⊕ M_N, or 1 = Σ_{j=0}^N P_j, where M_0 is infinite-dimensional.
2. If N = ∞, then λ_j → 0 as j → ∞, K = Σ_{j=1}^∞ λ_j P_j, and ℋ = M_0 ⊕ Σ_{j=1}^∞ ⊕M_j, or 1 = Σ_{j=0}^∞ P_j, where M_0 could be finite- or infinite-dimensional.

Furthermore,

‖ K − Σ_{j=1}^n λ_j P_j ‖ = |λ_{n+1}| → 0 as n → ∞,

which shows that the infinite series above converges in operator norm.

The eigenspaces of a compact hermitian operator are orthogonal and, by (2) of Theorem 16.7.5, span the entire space. By the Gram–Schmidt process, one can select an orthonormal basis for each eigenspace. We therefore have the following corollary.

16.7.6. Corollary. If K is a compact hermitian operator on a Hilbert space ℋ, then the eigenvectors of K constitute an orthonormal basis for ℋ.
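A finite-dimensional sketch of Theorem 16.7.5 (my own illustration, not from the text): build a hermitian matrix with prescribed eigenvalues λ_j = 1/j, recover K = Σ_j λ_j P_j from its spectral projections, and observe that the partial sums converge in operator norm with error |λ_{n+1}|:

```python
import numpy as np

# Illustrative finite-dimensional model (my own, not from the text):
# K = Q diag(λ) Q^T is hermitian with eigenvalues λ_j = 1/j and
# one-dimensional eigenspaces spanned by the columns of Q.
rng = np.random.default_rng(0)
N = 8
lams = 1.0 / np.arange(1, N + 1)              # distinct, decreasing eigenvalues
Q, _ = np.linalg.qr(rng.normal(size=(N, N)))  # random orthonormal eigenbasis
K = Q @ np.diag(lams) @ Q.T

# Projection onto the j-th eigenspace: P_j = |q_j><q_j|.
P = [np.outer(Q[:, j], Q[:, j]) for j in range(N)]

K_rec = sum(l * Pj for l, Pj in zip(lams, P))
print(np.allclose(K, K_rec))                  # True: K = Σ λ_j P_j

n = 3
err = np.linalg.norm(K - sum(lams[j] * P[j] for j in range(n)), ord=2)
print(err, lams[n])                           # operator-norm error equals λ_4
```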
Furthermore, which shows that the infinite series above converges for an operator norm. The eigenspaces of a compact hermitian operator are orthogonal and, by (2) of Theorem 16.7.5, span the entire space. By the Grann-Schrnidt process, one can select an orthonormal basis for each eigenspace. We therefore have the following corollary. 16.7.6. Corollary. If K is a compact hermitian operator on a Hilbert space JC, then the eigenvectors ofK constitute an orthonormal basis for JC.
  • 495. spectral theorem for compact normal operators 16.7 SPECTRAL THEOREM FOR COMPACT OPERATORS 477 16.7.7. Theorem. Let Kbe a compact hermitian operator on a Hilbert space Jf and let K = '£7~1 AjPj, where N could be infinite. A bounded linear operator on Jf commutes with Kifand only if it commutes with every Pi- Proof The "if" part is straightforward. So assume that the bounded operator T commutes with K. For [x) E Mj,wehave(K-Aj)Tlx) = T(K-Aj) Ix) =0. Similarly, (K - Aj)Tt [x) = Tt(K - Aj) Ix) = 0, because 0 = [T, K]t = rr'. K]. These equatious show that both T aud rt leave M j invariaut. This meaus that M j reduces T, aud by Theorem 4.2.5, TPj = PjT. 0 Next we prove the spectral theorem for a uormal operator. Recall that auy operatorTcaubewrittenasT=T,+iTiwhereT, = !(T+Tt)audTi = lICT-Tt) are henntitiau, aud since both T aud rt are compact, T, aud Ti are compact as well. For normal operators, we have the extra condition that [T" Til = [T,Tt] = O. Let T, = '£7~1 AjPj aud r, = ,£f=ll-'kQk be the spectral decompositions err, aud Ti. Using Theorem 16.7.7, it is strailfhtforward to show that if [T" Til = 0 then [Pj,Qk] = O.Now,sinceJf = '£j~oEllMj = '£~oEll:Nk.whereMj are the eigenspaces ofT, aud:Nk those ofTi, we have, for auy Ix) E Jf, T, [x) = (~AjPj) (~Qk IX)) = ~~AjPjQk Ix). Sintilarly, r, [x) = Ti('£7~oPj Ix) = '£f=1 ,£7=0I-'kQkPj [x), Combining these two relations aud noting that QkPj = Pj Qk gives N N T [x) = (T, +iTi) [x) = L L(Aj +il-'k)PjQk [z}. j~Ok~O The projection operators Pj Qk project onto the intersection of M j aud :Nk.There- fore, M j n :Nk are the eigenspaces of T. Only those terms in the sum for which M j n:Nk f= 0 contribute. As before, we cau order the eigenvalues according to their absolute values. 16.7.8. Theorem. (spectral theorem: compact normal operators) LetT be a com- pact normal operator on a Hilbert space Jf. Let {Aj 17=1 (where N can be 00) be the distinct nonzero eigenvalues ofT arranged in decreasing order ofabsolute values. 
For each n, let M_n be the eigenspace of T corresponding to eigenvalue λ_n and P_n its projection operator, with the property P_m P_n = 0 for m ≠ n. Then:
1. If N < ∞, then T is an operator of finite rank, T = Σ_{j=1}^N λ_j P_j, and ℋ = M_0 ⊕ M_1 ⊕ ⋯ ⊕ M_N, or 1 = Σ_{j=0}^N P_j, where M_0 is infinite-dimensional.
2. If N = ∞, then λ_n → 0 as n → ∞, T = Σ_{n=1}^∞ λ_n P_n, and ℋ = M_0 ⊕ Σ_{n=1}^∞ ⊕M_n, or 1 = Σ_{n=0}^∞ P_n, where M_0 could be finite- or infinite-dimensional.
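Theorem 16.7.8 can be illustrated in finite dimensions, where every normal matrix is trivially compact. The sketch below (my own, not from the text) synthesizes a normal matrix T = Σ λ_n P_n with complex eigenvalues and checks that a function of T acts through its eigenvalues, taking f = exp and comparing against the Taylor series of e^T:

```python
import numpy as np

# Illustrative finite-dimensional model (my own, not from the text):
# T = Q diag(μ) Q† with unitary Q is normal, and a function of T acts
# eigenvalue-wise on the spectral projections P_n = |q_n><q_n|.
rng = np.random.default_rng(1)
N = 6
mus = rng.normal(size=N) + 1j * rng.normal(size=N)    # complex eigenvalues
Q, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
T = Q @ np.diag(mus) @ Q.conj().T                     # normal: [T, T†] = 0

P = [np.outer(Q[:, n], Q[:, n].conj()) for n in range(N)]
expT = sum(np.exp(m) * Pn for m, Pn in zip(mus, P))   # Σ f(λ_n) P_n, f = exp

# Cross-check against the Taylor series Σ T^k / k!.
series = np.zeros_like(T)
term = np.eye(N, dtype=complex)
for k in range(1, 60):
    series = series + term
    term = term @ T / k

print(np.allclose(expT, series))                      # True
print(np.allclose(T @ T.conj().T, T.conj().T @ T))    # normality holds
```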
As in the case of a compact hermitian operator, by the Gram–Schmidt process one can select an orthonormal basis for each eigenspace of a normal operator, in which case we have the following:

16.7.9. Corollary. If T is a compact normal operator on a Hilbert space ℋ, then the eigenvectors of T constitute an orthonormal basis for ℋ.

One can use Theorem 16.7.8 to write any function of a normal operator T as an expansion in terms of the projection operators of T. First we note that T^k has λ_n^k as its expansion coefficients. Next, we add various powers of T in the form of a polynomial and conclude that the expansion coefficients for a polynomial p(T) are p(λ_n). Finally, for any function f(T) we have

f(T) = Σ_{n=1}^∞ f(λ_n) P_n.   (16.8)

Johann (John) von Neumann (1903–1957), the eldest of three sons of Max von Neumann, a well-to-do Jewish banker, was privately educated until he entered the gymnasium in 1914. His unusual mathematical abilities soon came to the attention of his teachers, who pointed out to his father that teaching him conventional school mathematics would be a waste of time; he was therefore tutored in mathematics under the guidance of university professors, and by the age of nineteen he was already recognized as a professional mathematician and had published his first paper. Von Neumann was Privatdozent at Berlin from 1927 to 1929 and at Hamburg in 1929–1930, then went to Princeton University for three years; in 1933 he was invited to join the newly opened Institute for Advanced Study, of which he was the youngest permanent member at that time. At the outbreak of World War II, von Neumann was called upon to participate in various scientific projects related to the war effort: In particular, from 1943 he was a consultant on the construction of the atomic bomb at Los Alamos.
After the war he retained his membership on numerous government boards and committees, and in 1954 he became a member of the Atomic Energy Commission. His health began to fail in 1955, and he died of cancer two years later. It is only in comparison with the greatest mathematical geniuses of history that von Neumann's scope in pure mathematics may appear somewhat restricted; it was far beyond the range of most of his contemporaries, and his extraordinary work in applied mathematics, in which he certainly equals Gauss, Cauchy, or Poincaré, more than compensates for its limitations. Von Neumann's work in pure mathematics was accomplished between 1925 and 1940, when he seemed to be advancing at a breathless speed on all fronts of logic and analysis at once, not to speak of mathematical physics. The dominant theme in von Neumann's work is by far his work on the spectral theory of operators in Hilbert spaces. For twenty years he was the undisputed master in this area, which contains what is now considered his most profound and most original creation, the theory of rings of operators.
The first papers (1927) in which Hilbert space theory appears are those on the foundations of quantum mechanics. These investigations later led von Neumann to a systematic study of unbounded hermitian operators. Von Neumann's most famous work in theoretical physics is his axiomatization of quantum mechanics. When he began work in that field in 1927, the methods used by its founders were hard to formulate in precise mathematical terms: "Operators" on "functions" were handled without much consideration of their domain of definition or their topological properties, and it was blithely assumed that such "operators," when self-adjoint, could always be "diagonalized" (as in the finite-dimensional case), at the expense of introducing Dirac delta functions as "eigenvectors." Von Neumann showed that mathematical rigor could be restored by taking as basic axioms the assumptions that the states of a physical system were points of a Hilbert space and that the measurable quantities were hermitian (generally unbounded) operators densely defined in that space. After 1927 von Neumann also devoted much effort to more specific problems of quantum mechanics, such as the problem of measurement and the foundation of quantum statistics and quantum thermodynamics, proving in particular an ergodic theorem for quantum systems. All this work was developed and expanded in Mathematische Grundlagen der Quantenmechanik (1932), in which he also discussed the much-debated question of "causality" versus "indeterminacy" and concluded that no introduction of "hidden parameters" could keep the basic structure of quantum theory and restore "causality." Von Neumann's uncommon grasp of applied mathematics, treated as a whole without divorcing theory from experimental realization, was nowhere more apparent than in his work on computers.
He became interested in numerical computations in connection with the need for quick estimates and approximate results that developed with the technology used for the war effort, particularly the complex problems of hydrodynamics, and the completely new problems presented by the harnessing of nuclear energy, for which no ready-made theoretical solutions were available. Von Neumann's extraordinary ability for rapid mental calculation was legendary. The story is told of a friend who brought him a simple kinematics problem. Two trains, a certain given distance apart, move toward each other at a given speed. A fly, initially on the windshield of one of the trains, flies back and forth between them, again at a known constant speed. When the trains collide, how far has the fly traveled? One way to solve the problem is to add up all the successively smaller distances in each individual flight. (The easy way is to multiply the fly's speed by the time elapsed until the crash.) After a few seconds of thought, von Neumann quickly gave the correct answer. "That's strange," remarked his friend. "Most people try to sum the infinite series." "What's strange about that?" von Neumann replied. "That's what I did."

In closing this section, let us remark that the paradigm of compact operators, namely the Hilbert–Schmidt operator, is such because it is defined on the finite rectangle [a, b] × [a, b]. If this rectangle grows beyond limit, or equivalently, if the Hilbert space is L²(ℝ∞), where ℝ∞ is some infinite region of the real line, then the compactness property breaks down, as the following example illustrates.

16.7.10. Example. Consider the two kernels

K₁(x, t) = e^{−|x−t|}   and   K₂(x, t) = sin xt,
where the first one acts on L²(−∞, ∞) and the second one on L²(0, ∞). One can show (see Problem 16.8) that these two kernels have, respectively, the eigenfunctions

e^{iαt}, α ∈ ℝ,   and   e^{−at} + √(2/π) t/(a² + t²), a > 0,

corresponding to the two eigenvalues

λ = 2/(1 + α²), α ∈ ℝ,   and   λ = √(π/2).

We see that in the first case, all real numbers between 0 and 2 are eigenvalues, rendering this set uncountable. In the second case, there are infinitely (in fact, uncountably) many eigenvectors (one for each a) corresponding to the single eigenvalue √(π/2). Note, however, that in the first case the eigenfunctions and in the second case the kernel have infinite norms.

16.8 Resolvents

The discussion of the preceding section showed that the spectrum of a normal compact operator is countable. Removing the compactness property in general will remove countability, as shown in Example 16.7.10. We have also seen that the right-shift operator, a bounded operator, has uncountably many points in its spectrum. We therefore expect that the sums in Theorem 16.7.8 should be replaced by integrals in the spectral decomposition theorem for (noncompact) bounded operators. We shall not discuss the spectral theorem for general operators. However, one special class of noncompact operators is essential for the treatment of Sturm–Liouville theory (to be studied in Chapters 18 and 19). For these operators, the concept of the resolvent will be used, which we develop in this section. This concept also makes a connection between the countable (algebraic) and the uncountable (analytic) cases.

16.8.1. Definition. Let T be an operator and λ ∈ ρ(T). The operator R_λ(T) ≡ (T − λ1)⁻¹ is called the resolvent of T at λ.

There are two important properties of the resolvent that are useful in analyzing the spectrum of operators. Let us assume that λ, μ ∈ ρ(T), λ ≠ μ, and take the difference between their resolvents.
Problem 16.9 shows how to obtain the following relation:

R_λ(T) - R_μ(T) = (λ - μ) R_λ(T) R_μ(T).    (16.9)

To obtain the second property of the resolvent, we formally (and indefinitely) differentiate R_λ(T) with respect to λ and evaluate the result at λ = μ:

d/dλ R_λ(T) = R_λ²(T).
Differentiating both sides of this equation, we get d²/dλ² R_λ(T) = 2R_λ³(T), and in general,

dⁿ/dλⁿ R_λ(T) = n! R_λ^{n+1}(T)  ⇒  dⁿ/dλⁿ R_λ(T) |_{λ=μ} = n! R_μ^{n+1}(T)  for n = 0, 1, ...    (16.10)

Assuming that the Taylor series expansion exists, we may write

R_λ(T) = Σ_{n=0}^∞ (λ - μ)ⁿ R_μ^{n+1}(T),    (16.11)

which is the second property of the resolvent.

We now look into the spectral decomposition from an analytical viewpoint. For convenience, we concentrate on the finite-dimensional case and let A be an arbitrary (not necessarily hermitian) N x N matrix. Let λ be a complex number that is larger (in absolute value) than any of the eigenvalues of A. Since all operators on finite-dimensional vector spaces are compact, Lemma 16.7.4 assures us that |λ| > ||A||, and it is then possible to expand R_λ(A) in a convergent power series as follows:

R_λ(A) = (A - λ1)^{-1} = -(1/λ) Σ_{n=0}^∞ (A/λ)ⁿ.

This is the Laurent expansion of R_λ(A). We can immediately read off the residue of R_λ(A) (the coefficient of 1/λ):

Res[R_λ(A)] = -1  ⇒  -(1/2πi) ∮_Γ R_λ(A) dλ = 1,

where Γ is a circle with its center at the origin and a radius large enough to encompass all the eigenvalues of A [see Figure 16.3(a)]. A similar argument shows that

-(1/2πi) ∮_Γ λ R_λ(A) dλ = A,

and in general,

-(1/2πi) ∮_Γ λⁿ R_λ(A) dλ = Aⁿ.

Using this and assuming that we can expand the function f(λ) in a power series, we get

-(1/2πi) ∮_Γ f(λ) R_λ(A) dλ = f(A).

Writing this equation in the form

(1/2πi) ∮_Γ f(λ)/(λ1 - A) dλ = f(A)    (16.12)
Figure 16.3 (a) The large circle encompassing all eigenvalues; (b) the deformed contour consisting of small circles orbiting the eigenvalues.

makes it recognizable as the generalization of the Cauchy integral formula to operator-valued functions.

To use any of the above integral formulas, we must know the analytic behavior of R_λ(A). From the formula of the inverse of a matrix given in Chapter 3, we have

[R_λ(A)]_{jk} = C_{kj}(λ)/p(λ),

where C_{kj}(λ) is the cofactor of the kjth element of the matrix A - λ1 and p(λ) is the characteristic polynomial of A. Clearly, C_{kj}(λ) is also a polynomial. Thus, [R_λ(A)]_{jk} is a rational function of λ. It follows that R_λ(A) has only poles as singularities (see Example 10.2.2). The poles are simply the zeros of the denominator, i.e., the eigenvalues of A. We can deform the contour Γ in such a way that it consists of small circles γ_j that encircle the isolated eigenvalues λ_j [see Figure 16.3(b)]. Then, with f(A) = 1, Equation (16.12) yields

P_j ≡ -(1/2πi) ∮_{γ_j} R_λ(A) dλ.    (16.13)

It can be shown (see Example 16.8.2 below) that {P_j} is a set of orthogonal projection operators. Thus, Equation (16.13) is a resolution of identity, as specified in the spectral decomposition theorem in Chapter 4.
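This resolution of identity is easy to probe numerically. The sketch below (my addition, not from the text, with an arbitrarily chosen symmetric 3 x 3 matrix) approximates the contour integral of Equation (16.13) by a discrete sum over a small circle around each eigenvalue, then checks that each P_j is idempotent and that the P_j sum to the identity.

```python
import numpy as np

# A sketch verifying Eq. (16.13) for a hypothetical 3x3 symmetric matrix:
# P_j = -(1/2*pi*i) * contour integral of R_lambda(A) around eigenvalue lambda_j.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
eigvals = np.linalg.eigvalsh(A)          # well-separated eigenvalues

def contour_projection(A, lam_j, radius=0.3, n=2000):
    """Approximate -(1/2*pi*i) * oint (A - lam*1)^(-1) dlam on a small circle."""
    I = np.eye(A.shape[0])
    P = np.zeros(A.shape, dtype=complex)
    for th in np.linspace(0.0, 2.0 * np.pi, n, endpoint=False):
        lam = lam_j + radius * np.exp(1j * th)
        dlam = 1j * radius * np.exp(1j * th) * (2.0 * np.pi / n)
        P += np.linalg.inv(A - lam * I) * dlam
    return -P / (2.0j * np.pi)

Ps = [contour_projection(A, lj) for lj in eigvals]
for P in Ps:
    assert np.allclose(P @ P, P, atol=1e-8)          # each P_j is idempotent
assert np.allclose(sum(Ps), np.eye(3), atol=1e-8)    # resolution of identity
```

The trapezoidal rule on a closed circle converges very rapidly for integrands with isolated poles, so a few thousand sample points suffice here.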
16.8.2. Example. We want to show that the P_j are projection operators. First let i = j. Then12

P_j² = (-1/2πi)² ∮_{γ_j^{(λ)}} R_λ(A) dλ ∮_{γ_j^{(μ)}} R_μ(A) dμ.

Note that λ need not be equal to μ. In fact, we are free to choose |λ - λ_j| > |μ - λ_j|, i.e., let the circle corresponding to the λ integration be outside that of the μ integration.13 We can then rewrite the above double integral as

P_j² = (-1/2πi)² ∮_{γ_j^{(λ)}} ∮_{γ_j^{(μ)}} R_λ(A) R_μ(A) dλ dμ
     = (-1/2πi)² ∮_{γ_j^{(λ)}} ∮_{γ_j^{(μ)}} [R_λ(A) - R_μ(A)]/(λ - μ) dλ dμ
     = (-1/2πi)² { ∮_{γ_j^{(λ)}} R_λ(A) dλ ∮_{γ_j^{(μ)}} dμ/(λ - μ) - ∮_{γ_j^{(μ)}} R_μ(A) dμ ∮_{γ_j^{(λ)}} dλ/(λ - μ) },

where we used Equation (16.9) to go to the second line. Now note that

∮_{γ_j^{(μ)}} dμ/(λ - μ) = 0  and  ∮_{γ_j^{(λ)}} dλ/(λ - μ) = 2πi,

because λ lies outside γ_j^{(μ)} and μ lies inside γ_j^{(λ)}. Hence,

P_j² = (-1/2πi)² {0 - 2πi ∮_{γ_j} R_μ(A) dμ} = -(1/2πi) ∮_{γ_j} R_μ(A) dμ = P_j.

The remaining part, namely P_j P_k = 0 for k ≠ j, can be done similarly (see Problem 16.10).

Now we let f(λ) = λ in Equation (16.12), deform the contour as above, and write

A = Σ_j (λ_j P_j + D_j).    (16.14)

12 We have not discussed multiple integrals of complex functions. A rigorous study of such integrals involves the theory of functions of several complex variables, a subject we have to avoid due to lack of space. However, in the simple case at hand, the theory of real multiple integrals is an honest guide.
13 This is possible because the poles are isolated.
It can be shown (see Problem 16.11) that

D_j^n = -(1/2πi) ∮_{γ_j} (λ - λ_j)ⁿ R_λ(A) dλ.

In particular, since R_λ(A) has only poles as singularities, there exists a positive integer m such that D_j^m = 0. We have not yet made any assumptions about A. If we assume that A is hermitian, for example, then R_λ(A) will have simple poles (see Problem 16.12). It follows that (λ - λ_j)R_λ(A) will be analytic at λ_j for all j = 1, 2, ..., r, and D_j = 0 in Equation (16.14). We thus have

A = Σ_{j=1}^r λ_j P_j,

which is the spectral decomposition discussed in Chapter 4. Problem 16.13 shows that the P_j are hermitian.

16.8.3. Example. The most general 2 x 2 hermitian matrix is of the form

A = (a11   a12
     a12*  a22),

where a11 and a22 are real numbers. Thus,

det(A - λ1) = λ² - (a11 + a22)λ + a11 a22 - |a12|²,

which has roots

λ1 = ½[a11 + a22 - √((a11 - a22)² + 4|a12|²)],
λ2 = ½[a11 + a22 + √((a11 - a22)² + 4|a12|²)].

The inverse of A - λ1 can immediately be written:

R_λ(A) = (A - λ1)^{-1} = [1/det(A - λ1)] (a22 - λ   -a12
                                          -a12*     a11 - λ)
       = [1/((λ - λ1)(λ - λ2))] (a22 - λ   -a12
                                 -a12*     a11 - λ).

We want to verify that R_λ(A) has only simple poles. Two cases arise:

1. If λ1 ≠ λ2, then it is clear that R_λ(A) has simple poles.
2. If λ1 = λ2, it appears that R_λ(A) may have a pole of order 2. However, note that if λ1 = λ2, then the square roots in the above equations must vanish. This happens iff a11 = a22 ≡ a and a12 = 0. It then follows that λ1 = λ2 = a, and

R_λ(A) = [1/(λ - a)²] (a - λ   0
                       0       a - λ).

This clearly shows that R_λ(A) has only simple poles in this case.
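Both the resolvent identity (16.9) and the eigenvalue formulas of Example 16.8.3 can be checked numerically; the following sketch (my addition, with hypothetical matrix entries) does so for a sample 2 x 2 hermitian matrix.

```python
import numpy as np

# A sketch checking Eq. (16.9), R_lam - R_mu = (lam - mu) R_lam R_mu,
# and the roots lambda_1, lambda_2 of Example 16.8.3, for assumed entries.
A = np.array([[1.0,        2.0 - 1.0j],
              [2.0 + 1.0j, 3.0       ]])   # a11, a22 real; a21 = conj(a12)
I = np.eye(2)

def R(lam):
    """The resolvent R_lambda(A) = (A - lambda*1)^(-1)."""
    return np.linalg.inv(A - lam * I)

lam, mu = 0.5 + 1.0j, -2.0 + 0.3j          # arbitrary regular points
lhs = R(lam) - R(mu)
rhs = (lam - mu) * (R(lam) @ R(mu))
assert np.allclose(lhs, rhs)               # first resolvent identity

# The poles of R_lambda(A) are the two roots from Example 16.8.3:
disc = np.sqrt((A[0, 0].real - A[1, 1].real)**2 + 4 * abs(A[0, 1])**2)
lam1 = 0.5 * (A[0, 0].real + A[1, 1].real - disc)
lam2 = 0.5 * (A[0, 0].real + A[1, 1].real + disc)
assert np.allclose(np.linalg.eigvalsh(A), [lam1, lam2])
```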
Jordan canonical form

If A is not hermitian, D_j ≠ 0; however, D_j is nevertheless nilpotent. That is, D_j^m = 0 for some positive integer m. This property and Equation (16.14) can be used to show that A can be cast into a Jordan canonical form via a similarity transformation. That is, there exists an N x N matrix S such that

SAS^{-1} = J = (J1  0   0  ...
                0   J2  0  ...
                ...        ),

where J_k is a matrix of the form

J_k = (λ  1  0  ...  0
       0  λ  1  ...  0
       ...
       0  0  ...  λ  1
       0  0  ...  0  λ),

in which λ is one of the eigenvalues of A. Different J_k may contain the same eigenvalue of A. For a discussion of the Jordan canonical form of a matrix, see [Birk 77], [Denn 67], or [Halm 58].

16.9 Problems

16.1. Suppose that S is a bounded operator, T an invertible operator, and that ||S - T|| < 1/||T^{-1}||. Show that S is invertible. Hint: Show that T^{-1}S is invertible. Thus, an operator that is "sufficiently close" to an invertible operator is invertible.

16.2. Let V and W be finite-dimensional vector spaces. Show that T ∈ L(V, W) is necessarily bounded.

16.3. Let H be a Hilbert space, and T ∈ L(H) an isometry, i.e., a linear operator that does not change the norm of any vector. Show that ||T|| = 1.

16.4. Show that (a) the unit operator is not compact, and that (b) the inverse of a compact operator cannot be bounded. Hint: For (b) use the results of Example 16.5.3.

16.5. Prove Corollary 16.6.7. Hint: Let |x⟩ ∈ N_λ^{(q+1)} and write it as |x⟩ = |n⟩ + |r⟩ with |n⟩ ∈ N_λ^{(q)} and |r⟩ ∈ R_λ^{(q)}. Apply (K - λ1)^{q+1} to |r⟩, and invoke part (3) of Theorem 16.6.6 to show that |r⟩ ∈ N_λ^{(q)}. Conclude that |r⟩ = 0, N_λ^{(q+1)} = N_λ^{(q)},
and q ≥ p. To establish the reverse inequality, apply (K - λ1)^p to both sides of the direct sum of part (1) of Theorem 16.6.6, and notice that the LHS is R_λ^{(p)}, the second term of the RHS is zero, and the first term is R_λ^{(q+p)}. Now conclude that p ≥ q.

16.6. Let |u⟩ ∈ H and let M be a subspace of H. Show that the subset E = |u⟩ - M is convex. Show that E is not necessarily a subspace of H.

16.7. Show that for any hermitian operator H, we have

4⟨Hx|y⟩ = ⟨H(x + y)|x + y⟩ - ⟨H(x - y)|x - y⟩ + i[⟨H(x + iy)|x + iy⟩ - ⟨H(x - iy)|x - iy⟩].

Now let |x⟩ = λ|z⟩ and |y⟩ = H|z⟩/λ, where λ = (||Hz||/||z||)^{1/2}, and show that

||Hz||² = ⟨Hx|y⟩ ≤ M ||z|| ||Hz||,

where M = max{|⟨Hz|z⟩|/||z||²}. Now conclude that ||H|| ≤ M.

16.8. Show that the two kernels K1(x, t) = e^{-|x-t|} and K2(x, t) = sin xt, where the first one acts on L2(-∞, ∞) and the second one on L2(0, ∞), have the two eigenfunctions

e^{iαt}, α ∈ R,  and  e^{-at} + √(2/π) t/(a² + t²), a > 0,

respectively, corresponding to the two eigenvalues

λ = 2/(1 + α²), α ∈ R,  and  √(π/2).

16.9. Derive Equation (16.9). Hint: Multiply R_λ(T) by 1 = R_μ(T)(T - μ1) and R_μ(T) by 1 = R_λ(T)(T - λ1).

16.10. Finish Example 16.8.2 by showing that P_j P_k = 0 for k ≠ j.

16.11. Show that D_j^n = -(1/2πi) ∮_{γ_j} (λ - λ_j)ⁿ R_λ(A) dλ. Hint: Use mathematical induction and the technique used in Example 16.8.2.

16.12. (a) Take the inner product of |u⟩ = (A - λ1)|v⟩ with |v⟩ and show that for a hermitian A, Im⟨v|u⟩ = -(Im λ)||v||². Now use the Schwarz inequality to obtain

||v|| ≤ ||u||/|Im λ|  ⇒  ||R_λ(A)|u⟩|| ≤ ||u||/|Im λ|.

(b) Use this result to show that
where θ is the angle that λ - λ_j makes with the real axis and λ is chosen to have an imaginary part. From this result conclude that R_λ(A) has a simple pole when A is hermitian.

16.13. (a) Show that when A is hermitian, [R_λ(A)]† = R_{λ*}(A).
(b) Write λ - λ_j = r_j e^{iθ} in the definition of P_j in Equation (16.13). Take the hermitian conjugate of both sides and use (a) to show that P_j is hermitian. Hint: You will have to change the variable of integration a number of times.

Additional Readings

1. DeVito, C. Functional Analysis and Linear Operator Theory, Addison-Wesley, 1990. Our treatment of compact operators follows this reference's discussion.
2. Glimm, J. and Jaffe, A. Quantum Physics, 2nd ed., Springer-Verlag, 1987. One of the most mathematical treatments of the subject, and therefore a good introduction to operator theory (see the appendix to Part I).
3. Reed, M. and Simon, B. Fourier Analysis, Self-Adjointness, Academic Press, 1980.
4. Richtmyer, R. Principles of Advanced Mathematical Physics, Springer-Verlag, 1978. Discusses resolvents in detail.
5. Zeidler, E. Applied Functional Analysis, Springer-Verlag, 1995.
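The resolvent bound of Problem 16.12 can also be probed numerically. The sketch below (my addition; the matrix and test vectors are arbitrary random choices) checks that ||R_λ(A)|u⟩|| ≤ ||u||/|Im λ| for a hermitian A and a non-real λ.

```python
import numpy as np

# A numerical probe of Problem 16.12 (a sketch): for hermitian A and a
# non-real lambda, ||R_lambda(A) u|| <= ||u|| / |Im lambda|.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B + B.conj().T                       # hermitian by construction
lam = 1.3 + 0.7j                         # Im(lambda) != 0: a regular point
R = np.linalg.inv(A - lam * np.eye(4))   # the resolvent R_lambda(A)

ok = True
for _ in range(100):
    u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    ok = ok and (np.linalg.norm(R @ u)
                 <= np.linalg.norm(u) / abs(lam.imag) + 1e-12)
```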
17 Integral Equations

The beginning of Chapter 16 showed that to solve a vector-operator equation one transforms it into an equation involving a sum over a discrete index [the matrix equation of Equation (16.1)], or an equation involving an integral over a continuous index [Equation (16.2)]. The latter is called an integral equation, which we shall investigate here using the machinery of Chapter 16.

17.1 Classification

Volterra and Fredholm equations of first and second kind

Integral equations can be divided into two major groups. Those that have a variable limit of integration are called Volterra equations; those that have constant limits of integration are called Fredholm equations. If the unknown function appears only inside the integral, the integral equation is said to be of the first kind. Integral equations having the unknown function outside the integral as well as inside are said to be of the second kind. The four kinds of equations can be written as follows:

Volterra equation of the 1st kind:   ∫_a^x K(x, t)u(t) dt = v(x),
Volterra equation of the 2nd kind:   u(x) = v(x) + ∫_a^x K(x, t)u(t) dt,
Fredholm equation of the 1st kind:   ∫_a^b K(x, t)u(t) dt = v(x),
Fredholm equation of the 2nd kind:   u(x) = v(x) + ∫_a^b K(x, t)u(t) dt.

In all these equations, K(x, t) is called the kernel of the integral equation.
In the theory of integral equations of the second kind, one usually multiplies the integral by a nonzero complex number λ. Thus, the Fredholm equation of the second kind becomes

u(x) = v(x) + λ ∫_a^b K(x, t)u(t) dt,    (17.1)

and for the Volterra equation of the second kind one obtains

u(x) = v(x) + λ ∫_a^x K(x, t)u(t) dt.    (17.2)

characteristic value of an integral equation

A λ that satisfies (17.2) with v(x) = 0 is called a characteristic value of the integral equation. In the abstract operator language both equations are written as

|u⟩ = |v⟩ + λK|u⟩  ⇒  (K - λ^{-1}1)|u⟩ = -λ^{-1}|v⟩.    (17.3)

Thus λ is a characteristic value for (17.1) if and only if λ^{-1} is an eigenvalue of K. Recall that when the interval of integration (a, b) is finite, K(x, t) is called a Hilbert-Schmidt kernel. Example 16.5.10 showed that K is a compact operator, and by Theorem 16.6.9, the eigenvalues of K either form a finite set or a sequence that converges to zero.

17.1.1. Theorem. The characteristic values of a Fredholm equation of the second kind either form a finite set or a sequence of complex numbers increasing beyond limit in absolute value.

Our main task in this chapter is to study methods of solving integral equations of the second kind. We treat the Volterra equation first because it is easier to solve. Let us introduce the notation

K[u](x) ≡ ∫_a^x K(x, t)u(t) dt  and  Kⁿ[u](x) = K[K^{n-1}[u]](x),    (17.4)

whereby K[u] denotes a function whose value at x is given by the integral on the RHS of the first equation in (17.4). One can show with little difficulty that the associated operator K is compact. Let M = max{|K(x, t)| : a ≤ t ≤ x ≤ b} and note that

|λK[u](x)| = |λ ∫_a^x K(x, t)u(t) dt| ≤ |λ| |M| ||u||_∞ (x - a),

where ||u||_∞ ≡ max{|u(x)| : x ∈ (a, b)}. Using mathematical induction, one can show that (see Problem 17.1)

|(λK)ⁿ[u](x)| ≤ |λ|ⁿ |M|ⁿ ||u||_∞ (x - a)ⁿ/n!.    (17.5)
Since b ≥ x, we can replace x with b and still satisfy the inequality. Then the inequality of Equation (17.5) will hold for all x, and we can write the equation as an operator norm inequality:

||(λK)ⁿ|| ≤ |λ|ⁿ |M|ⁿ (b - a)ⁿ/n!.

Therefore, the series Σ_{n=0}^∞ (λK)ⁿ converges for all λ. In fact, a direct calculation shows that the series converges to the inverse of 1 - λK. Thus, the latter is invertible and the spectrum of K has no nonzero points. We have just shown the following.

Volterra equation of the second kind has a unique solution and no nonzero characteristic value

17.1.2. Theorem. The Volterra equation of the second kind has no nonzero characteristic value. In particular, the operator 1 - λK is invertible, and the Volterra equation of the second kind always has a unique solution given by the convergent infinite series u(x) = Σ_{j=0}^∞ λ^j ∫_a^x K^j(x, t)v(t) dt, where K^j(x, t) is defined inductively in Equation (17.4).

Vito Volterra (1860-1940) was only 11 when he became interested in mathematics while reading Legendre's Geometry. At the age of 13 he began to study the three-body problem and made some progress. His family were extremely poor (his father had died when Vito was two years old), but after attending lectures at Florence he was able to proceed to Pisa in 1878. At Pisa he studied under Betti, graduating as a doctor of physics in 1882. His thesis on hydrodynamics included some results of Stokes, discovered later but independently by Volterra. He became Professor of Mechanics at Pisa in 1883, and upon Betti's death, he occupied the chair of mathematical physics. After spending some time at Turin as the chair of mechanics, he was awarded the chair of mathematical physics at the University of Rome in 1900. Volterra conceived the idea of a theory of functions that depend on a continuous set of values of another function in 1883. Hadamard was later to introduce the word "functional," which replaced Volterra's original terminology.
In 1890 Volterra used his functional calculus to show that the theory of Hamilton and Jacobi for the integration of the differential equations of dynamics could be extended to other problems of mathematical physics. His most famous work was done on integral equations. He began this study in 1884, and in 1896 he published several papers on what is now called the Volterra integral equation. He continued to study functional analysis applications to integral equations, producing a large number of papers on composition and permutable functions.

During the First World War Volterra joined the Air Force. He made many journeys to France and England to promote scientific collaboration. After the war he returned to the University of Rome, and his interests moved to mathematical biology. He studied the Verhulst equation and the logistic curve. He also wrote on predator-prey equations. In 1922 Fascism seized Italy, and Volterra fought against it in the Italian Parliament. However, by 1930 the Parliament was abolished, and when Volterra refused to take an oath
of allegiance to the Fascist government in 1931, he was forced to leave the University of Rome. From the following year he lived mostly abroad, mainly in Paris, but also in Spain and other countries.

17.1.3. Example. Differential equations can be transformed into integral equations. For instance, consider the SOLDE

d²u/dx² + p1(x) du/dx + p0(x)u = r(x),  u(a) = c1, u'(a) = c2.

By integrating the DE once, we obtain

du/dx = -∫_a^x p1(t)u'(t) dt - ∫_a^x p0(t)u(t) dt + ∫_a^x r(t) dt + c2.

Integrating the first integral by parts gives

u'(x) = -p1(x)u(x) + ∫_a^x [p1'(t) - p0(t)]u(t) dt + ∫_a^x r(t) dt + p1(a)c1 + c2,

where we abbreviate f(x) ≡ ∫_a^x [p1'(t) - p0(t)]u(t) dt and g(x) ≡ ∫_a^x r(t) dt. Integrating once more yields

u(x) = -∫_a^x p1(t)u(t) dt + ∫_a^x f(s) ds + ∫_a^x g(s) ds + (x - a)[p1(a)c1 + c2] + c1
     = -∫_a^x p1(t)u(t) dt + ∫_a^x ds ∫_a^s [p1'(t) - p0(t)]u(t) dt + ∫_a^x ds ∫_a^s r(t) dt + (x - a)[p1(a)c1 + c2] + c1
     = ∫_a^x {(x - t)[p1'(t) - p0(t)] - p1(t)} u(t) dt + ∫_a^x (x - t)r(t) dt + (x - a)[p1(a)c1 + c2] + c1,    (17.6)

where we have used the formula

∫_a^x ds ∫_a^s f(t) dt = ∫_a^x (x - t)f(t) dt,

which the reader may verify by interchanging the order of integration on the LHS. Equation (17.6) is a Volterra equation of the second kind with kernel K(x, t) ≡ (x - t)[p1'(t) - p0(t)] - p1(t) and v(x) ≡ ∫_a^x (x - t)r(t) dt + (x - a)[p1(a)c1 + c2] + c1.

Neumann series solution

We now outline a systematic approach to obtaining the infinite series of Theorem 17.1.2, which also works for the Fredholm equation of the second kind, as we shall see in the next section. In the latter case, the series is guaranteed to converge
only if |λ| ||K|| < 1. This approach has the advantage that in each successive step, we obtain a better approximation to the solution. Writing the equation as

|u⟩ = |v⟩ + λK|u⟩,    (17.7)

we can interpret it as follows. The difference between |u⟩ and |v⟩ is λK|u⟩. If λK were absent, the two vectors |u⟩ and |v⟩ would be equal. The effect of λK is to change |u⟩ in such a way that when the result is added to |v⟩, it gives |u⟩. As our initial approximation, therefore, we take |u⟩ to be equal to |v⟩ and write |u0⟩ = |v⟩, where the index reminds us of the order (in this case zeroth, because λK = 0) of the approximation. To find a better approximation, we always substitute the latest approximation for |u⟩ in the RHS of Equation (17.7). At this stage, we have

|u1⟩ = |v⟩ + λK|u0⟩ = |v⟩ + λK|v⟩.

Still a better approximation is achieved if we substitute this expression in (17.7):

|u2⟩ = |v⟩ + λK|u1⟩ = |v⟩ + λK|v⟩ + λ²K²|v⟩.

The procedure is now clear. Once |u_n⟩, the nth approximation, is obtained, we can get |u_{n+1}⟩ by substituting in the RHS of (17.7).

Before continuing, let us write the above equations in integral form. In what follows, we shall concentrate on the Fredholm equation. To obtain the result for the Volterra equation, one simply replaces b, the upper limit of integration, with x. The first approximation can be obtained by substituting v(t) for u(t) on the RHS of Equation (17.1). This yields

u1(x) = v(x) + λ ∫_a^b K(x, t)v(t) dt.

Substituting this back in Equation (17.1) gives

u2(x) = v(x) + λ ∫_a^b ds K(x, s)u1(s)
      = v(x) + λ ∫_a^b ds K(x, s)v(s) + λ² ∫_a^b dt [∫_a^b K(x, s)K(s, t) ds] v(t)
      = v(x) + λ ∫_a^b dt K(x, t)v(t) + λ² ∫_a^b dt K²(x, t)v(t),

where K²(x, t) ≡ ∫_a^b K(x, s)K(s, t) ds. Similar expressions can be derived for u3(x), u4(x), and so forth. The integrals expressing various "powers" of K can be obtained using Dirac notation and vectors with continuous indices, as discussed
in Chapter 6. Thus, for instance,

K³(x, t) ≡ ⟨x| K (∫_a^b |s1⟩⟨s1| ds1) K (∫_a^b |s2⟩⟨s2| ds2) K |t⟩.

The nth approximation is therefore

|u_n⟩ = |v⟩ + λK|v⟩ + ... + λⁿKⁿ|v⟩ = Σ_{j=0}^n (λK)^j |v⟩,    (17.8)

whose integral form is

u_n(x) = Σ_{j=0}^n λ^j ∫_a^b K^j(x, t)v(t) dt.    (17.9)

Here K^j(x, t) is defined inductively by

K⁰(x, t) = ⟨x|1|t⟩ = ⟨x|t⟩ = δ(x - t),
K^j(x, t) = ⟨x|KK^{j-1}|t⟩ = ⟨x|K (∫_a^b |s⟩⟨s| ds) K^{j-1}|t⟩ = ∫_a^b K(x, s)K^{j-1}(s, t) ds.

The limit of u_n(x) as n → ∞ gives

u(x) = Σ_{j=0}^∞ λ^j ∫_a^b K^j(x, t)v(t) dt.    (17.10)

The convergence of this series, called the Neumann series, is always guaranteed for the Volterra equation. For the Fredholm equation, we need to impose the extra condition |λ| ||K|| < 1.

17.1.4. Example. As an example, let us find the solution of u(x) = 1 + λ ∫_0^x u(t) dt, a Volterra equation of the second kind. Here, v(x) = 1 and K(x, t) = 1, and it is straightforward to calculate approximations to u(x):

u0(x) = v(x) = 1,
u1(x) = 1 + λ ∫_0^x K(x, t)u0(t) dt = 1 + λx,
u2(x) = 1 + λ ∫_0^x K(x, t)u1(t) dt = 1 + λ ∫_0^x (1 + λt) dt = 1 + λx + λ²x²/2.
It is clear that the nth term will look like

u_n(x) = 1 + λx + λ²x²/2 + ... + λⁿxⁿ/n! = Σ_{j=0}^n λ^j x^j / j!.

As n → ∞, we obtain u(x) = e^{λx}. By direct substitution, it is readily checked that this is indeed a solution of the original integral equation.

17.2 Fredholm Integral Equations

We can use our knowledge of compact operators gained in the previous chapter to study Fredholm equations of the second kind. With λ ≠ 0 a complex number, we consider the characteristic equation

(1 - λK)|u⟩ = |v⟩,  or  u(x) - λK[u](x) = v(x),    (17.11)

where all functions are square-integrable on [a, b], and K(x, t), the Hilbert-Schmidt kernel, is square-integrable on the rectangle [a, b] x [a, b]. Using Proposition 16.2.9, we immediately see that Equation (17.11) has a unique solution if |λ| ||K|| < 1, and the solution is of the form

|u⟩ = (1 - λK)^{-1}|v⟩ = Σ_{n=0}^∞ λⁿKⁿ|v⟩,    (17.12)

or u(x) = Σ_{n=0}^∞ λⁿKⁿ[v](x), where Kⁿ[v](x) is defined as in Equation (17.4) except that now b replaces x as the upper limit of integration.

17.2.1. Example. Consider the integral equation

u(x) - ∫_0^1 K(x, t)u(t) dt = x,  where  K(x, t) = { x if 0 ≤ x < t,
                                                     t if t < x ≤ 1.

Here λ = 1; therefore, a Neumann series solution exists if ||K|| < 1. It is convenient to write K in terms of the theta function:1

K(x, t) = xθ(t - x) + tθ(x - t).    (17.13)

This gives |K(x, t)|² = x²θ(t - x) + t²θ(x - t), because θ²(x - t) = θ(x - t) and θ(x - t)θ(t - x) = 0. Thus, we have

||K||² = ∫_0^1 dx ∫_0^1 dt |K(x, t)|² = ∫_0^1 dx ∫_0^1 x²θ(t - x) dt + ∫_0^1 dx ∫_0^1 t²θ(x - t) dt
       = ∫_0^1 dt ∫_0^t x² dx + ∫_0^1 dx ∫_0^x t² dt = ∫_0^1 (t³/3) dt + ∫_0^1 (x³/3) dx = 1/6.

1 Recall that the theta function is defined to be 1 if its argument is positive, and 0 if it is negative.
Since this is less than 1, the Neumann series converges, and we have2

u(x) = Σ_{j=0}^∞ ∫_0^1 K^j(x, t) t dt ≡ Σ_{j=0}^∞ f_j(x).

The first few terms are evaluated as follows:

f0(x) = ∫_0^1 K⁰(x, t) t dt = ∫_0^1 δ(x - t) t dt = x,

f1(x) = ∫_0^1 K(x, t) t dt = ∫_0^1 [xθ(t - x) + tθ(x - t)] t dt = x ∫_x^1 t dt + ∫_0^x t² dt = x/2 - x³/6.

The next term is trickier than the first two because of the product of the theta functions. We first substitute Equation (17.13) in the integral for the second-order term, and simplify:

f2(x) = ∫_0^1 K²(x, t) t dt = ∫_0^1 t dt ∫_0^1 K(x, s)K(s, t) ds
      = ∫_0^1 t dt ∫_0^1 [xθ(s - x) + sθ(x - s)][sθ(t - s) + tθ(s - t)] ds
      = x ∫_0^1 t dt ∫_0^1 s θ(s - x)θ(t - s) ds + x ∫_0^1 t² dt ∫_0^1 θ(s - x)θ(s - t) ds
        + ∫_0^1 t dt ∫_0^1 s² θ(x - s)θ(t - s) ds + ∫_0^1 t² dt ∫_0^1 s θ(x - s)θ(s - t) ds.

It is convenient to switch the order of integration at this point. This is because of the presence of θ(x - s) and θ(s - x), which do not involve t and are best integrated last. Thus, we have

f2(x) = x ∫_0^1 s θ(s - x) ds ∫_s^1 t dt + x ∫_0^1 θ(s - x) ds ∫_0^s t² dt + ∫_0^1 s² θ(x - s) ds ∫_s^1 t dt + ∫_0^1 s θ(x - s) ds ∫_0^s t² dt
      = x ∫_x^1 s (1/2 - s²/2) ds + x ∫_x^1 (s³/3) ds + ∫_0^x s² (1/2 - s²/2) ds + ∫_0^x s (s³/3) ds
      = 5x/24 - x³/12 + x⁵/120.

As a test of his/her knowledge of θ-function manipulation, the reader is urged to perform the integration in reverse order. Adding all the terms, we obtain an approximation for u(x) that is valid for 0 ≤ x ≤ 1:

u(x) ≈ f0(x) + f1(x) + f2(x) = 41x/24 - x³/4 + x⁵/120.

2 Note that in this case (Fredholm equation), we can calculate the jth term in isolation. In the Volterra case, it was more natural to calculate the solution up to a given order.
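This example can be reproduced numerically. The sketch below (my addition) discretizes K(x, t) = xθ(t - x) + tθ(x - t) = min(x, t) on a grid and iterates u_{n+1} = v + Ku_n to convergence. One can also check (by differentiating the integral equation twice, which gives u'' = -u with u(0) = 0, u'(1) = 1) that the exact solution is u(x) = sin x / cos 1, and the converged iterate matches it, while the three-term partial sum f0 + f1 + f2 is already within about 0.1 of it.

```python
import numpy as np

# A sketch: Neumann iteration for u(x) = x + integral_0^1 min(x,t) u(t) dt.
n = 1001
x = np.linspace(0.0, 1.0, n)
w = np.full(n, 1.0 / (n - 1)); w[0] *= 0.5; w[-1] *= 0.5   # trapezoid weights
X, T = np.meshgrid(x, x, indexing='ij')
K = np.minimum(X, T)            # K(x,t) = x theta(t-x) + t theta(x-t)
v = x.copy()

u = v.copy()
for _ in range(80):             # ||K|| = 1/sqrt(6) < 1, so the iteration converges
    u = v + K @ (w * u)

exact = np.sin(x) / np.cos(1.0)                   # solves u'' = -u, u(0)=0, u'(1)=1
partial = x + (x / 2 - x**3 / 6) + (5 * x / 24 - x**3 / 12 + x**5 / 120)

assert np.max(np.abs(u - exact)) < 1e-4           # iteration reaches the solution
assert np.max(np.abs(u - partial)) < 0.12         # three terms are a fair estimate
```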
We have seen that the Volterra equation of the second kind has a unique solution, which can be written as an infinite series (see Theorem 17.1.2). The case of the Fredholm equation of the second kind is more complicated because of the existence of eigenvalues. The general solution of Equation (17.11) is discussed in the following:

Fredholm alternative

17.2.2. Theorem. (Fredholm Alternative) Let K be a Hilbert-Schmidt operator and λ a complex number. Then either

1. λ is a regular value of Equation (17.11), i.e., λ^{-1} is a regular point of the operator K, in which case the equation has the unique solution |u⟩ = (1 - λK)^{-1}|v⟩, or
2. λ is a characteristic value of Equation (17.11) (λ^{-1} is an eigenvalue of the operator K), in which case the equation has a solution if and only if |v⟩ is in the orthogonal complement of the (finite-dimensional) null space of 1 - λ*K†.

Proof. The first part is trivial if we recall that by definition, regular points of K are those complex numbers μ which make the operator K - μ1 invertible. For part (2), we first show that the null space of 1 - λ*K† is finite-dimensional. We note that 1 - λK is invertible if and only if its adjoint 1 - λ*K† is invertible, and λ ∈ ρ(K) iff λ* ∈ ρ(K†). Since the spectrum of an operator is composed of all points that are not regular, we conclude that λ is in the spectrum of K if and only if λ* is in the spectrum of K†. For compact operators, all nonzero points of the spectrum are eigenvalues. Therefore, the nonzero points of the spectrum of K†, a compact operator by Theorem 16.5.7, are all eigenvalues of K†, and the null space of 1 - λ*K† is finite-dimensional (Theorem 16.6.2). Next, we note that the equation itself requires that |v⟩ be in the range of the operator 1 - λK, which, by Theorem 16.6.2, is the orthogonal complement of the null space of 1 - λ*K†. □

Erik Ivar Fredholm (1866-1927) was born in Stockholm, the son of a well-to-do merchant family.
He received the best education possible and soon showed great promise in mathematics, leaning especially toward the applied mathematics of practical mechanics in a year of study at Stockholm's Polytechnic Institute. Fredholm finished his education at the University of Uppsala, obtaining his doctorate in 1898. He also studied at the University of Stockholm during this same period and eventually received an appointment to the faculty there. Fredholm remained there the rest of his professional life. His first contribution to mathematics was contained in his doctoral thesis, in which he studied a first-order partial differential equation in three variables, a problem that arises in the deformation of anisotropic media. Several years later
he completed this work by finding the fundamental solution to a general elliptic partial differential equation with constant coefficients.

Fredholm is perhaps best known for his studies of the integral equation that bears his name. Such equations occur frequently in physics. Fredholm's genius led him to note the similarity between his equation and a relatively familiar matrix-vector equation, resulting in his identification of a quantity that plays the same role in his equation as the determinant plays in the matrix-vector equation. He thus obtained a method for determining the existence of a solution and later used an analogous expression to derive a solution to his equation akin to the Cramer's rule solution to the matrix-vector equation. He further showed that the solution could be expressed as a power series in a complex variable. This latter result was considered important enough that Poincaré assumed it without proof (in fact he was unable to prove it) in a study of related partial differential equations. Fredholm then considered the homogeneous form of his equation. He showed that under certain conditions, the vector space of solutions is finite-dimensional. David Hilbert later extended Fredholm's work to a complete eigenvalue theory of the Fredholm equation, which ultimately led to the discovery of Hilbert spaces.

17.2.1 Hermitian Kernel

Of special interest are integral equations in which the kernel is hermitian, which occurs exactly when the operator is hermitian. Such a kernel has the property that3 ⟨x|K|t⟩* = ⟨t|K|x⟩, or [K(x, t)]* = K(t, x). For such kernels we can use the spectral theorem for compact hermitian operators to find a series solution for the integral equation. First we recall that

K = Σ_{j=1}^N λ_j^{-1} Σ_k |u_j^{(k)}⟩⟨u_j^{(k)}|,

where we have used λ_j^{-1} to denote the eigenvalue of the operator4 and expanded the projection operator in terms of orthonormal basis vectors of the corresponding finite-dimensional eigenspace. Recall that N can be infinity.
Instead of the double sum, we can sum once over all the basis vectors and write K = Σ_{n=1}^∞ λ_n^{-1} |u_n⟩⟨u_n|. Here n counts all the orthonormal eigenvectors of the Hilbert space, and λ_n^{-1} is the eigenvalue corresponding to the eigenvector |u_n⟩. Therefore, λ_n^{-1} may be repeated in the sum. The action of K on a vector |u⟩ is given by

K|u⟩ = Σ_{n=1}^∞ λ_n^{-1} ⟨u_n|u⟩ |u_n⟩.    (17.14)

3 Since we are dealing mainly with real functions, hermiticity of K implies the symmetry of K, i.e., K(x, t) = K(t, x).
4 λ_j is the characteristic value of the integral equation, or the inverse of the eigenvalue of the corresponding operator.
If the Hilbert space is L²[a, b], we may be interested in the functional form of this equation. We obtain such a form by multiplying both sides by ⟨x|:

K[u](x) ≡ ⟨x|K|u⟩ = Σ_{n=1}^∞ λ_n^{-1} ⟨u_n|u⟩ ⟨x|u_n⟩ = Σ_{n=1}^∞ λ_n^{-1} ⟨u_n|u⟩ u_n(x).    (17.15)

Hilbert-Schmidt theorem

That this series converges uniformly in the interval [a, b] is known as the Hilbert-Schmidt theorem.

17.2.3. Example. Let us solve u(x) = x + λ ∫_a^b K(x, t)u(t) dt, where K(x, t) ≡ xt is a symmetric (hermitian) kernel, by the Neumann series method. We note that

||K||² = ∫_a^b ∫_a^b |K(x, t)|² dx dt = ∫_a^b ∫_a^b x²t² dx dt = (∫_a^b x² dx)² = (1/9)(b³ - a³)²,

or ||K|| = ∫_a^b x² dx = (1/3)(b³ - a³), and the Neumann series converges if |λ|(b³ - a³) < 3. Assuming that this condition holds, we have

u(x) = x + Σ_{j=1}^∞ λ^j ∫_a^b K^j(x, t) t dt.

The special form of the kernel allows us to calculate K^j(x, t) directly:

K^j(x, t) = ∫_a^b ... ∫_a^b K(x, s1)K(s1, s2) ... K(s_{j-1}, t) ds1 ds2 ... ds_{j-1}
          = ∫_a^b ... ∫_a^b x s1² s2² ... s_{j-1}² t ds1 ds2 ... ds_{j-1} = xt (∫_a^b s² ds)^{j-1} = xt ||K||^{j-1}.

It follows that ∫_a^b K^j(x, t) t dt = x ||K||^{j-1} ∫_a^b t² dt = x ||K||^j. Substituting this in the expression for u(x) yields

u(x) = x + Σ_{j=1}^∞ λ^j x ||K||^j = x (1 + λ||K|| Σ_{j=1}^∞ λ^{j-1} ||K||^{j-1}) = x (1 + λ||K||/(1 - λ||K||)) = x/(1 - λ||K||).

Because of the simplicity of the kernel, we can solve the integral equation exactly. First we write

u(x) = x + λ ∫_a^b xt u(t) dt = x + λx ∫_a^b t u(t) dt ≡ x(1 + λA),
where A = ∫_a^b t u(t) dt. Multiplying both sides by x and integrating, we obtain

A = ∫_a^b x u(x) dx = (1 + λA) ∫_a^b x² dx = (1 + λA) ||K||  ⇒  A = ||K||/(1 - λ||K||).

Substituting A in u(x) = x(1 + λA) gives

u(x) = x (1 + λ||K||/(1 - λ||K||)) = x/(1 - λ||K||).

This solution is the same as the first one we obtained. However, no series was involved here, and therefore no assumption is necessary concerning |λ| ||K||.

If one can calculate the eigenvectors |u_n⟩ and the eigenvalues λ_n^{-1}, then one can obtain a solution for the integral equation in terms of these eigenfunctions as follows: Substitute (17.14) in the Fredholm equation [Equation (17.3)] to get

|u⟩ = |v⟩ + λ Σ_{n=1}^∞ λ_n^{-1} ⟨u_n|u⟩ |u_n⟩.    (17.16)

Multiply both sides by ⟨u_m|:

⟨u_m|u⟩ = ⟨u_m|v⟩ + λ Σ_{n=1}^∞ λ_n^{-1} ⟨u_n|u⟩ ⟨u_m|u_n⟩ = ⟨u_m|v⟩ + λ λ_m^{-1} ⟨u_m|u⟩,    (17.17)

or, if λ is not one of the eigenvalues,

⟨u_m|u⟩ = λ_m ⟨u_m|v⟩/(λ_m - λ).

Substituting this in Equation (17.16) gives

|u⟩ = |v⟩ + λ Σ_{n=1}^∞ [⟨u_n|v⟩/(λ_n - λ)] |u_n⟩,    (17.18)

and in the functional form,

u(x) = v(x) + λ Σ_{n=1}^∞ [⟨u_n|v⟩/(λ_n - λ)] u_n(x).    (17.19)

In case λ = λ_m for some m, the Fredholm alternative (Theorem 17.2.2) says that we will have a solution only if |v⟩ is in the orthogonal complement of the null space of5 1 - λ_m K. Moreover, Equation (17.17) shows that ⟨u_m|u⟩, the expansion coefficients of the basis vectors of the eigenspace M_m, cannot be specified.

5 Remember that K is hermitian; therefore, λ_m is real.
However, Equation (17.17) does determine the rest of the coefficients as before. In this case, the solution can be written as

$$|u\rangle = |v\rangle + \sum_{k=1}^r c_k|u_m^{(k)}\rangle + \lambda\sum_{\substack{n=1\\ n\neq m}}^\infty \frac{\langle u_n|v\rangle}{\lambda_n - \lambda}\,|u_n\rangle, \quad (17.20)$$

where $r$ is the (finite) dimension of $\mathcal{M}_m$, $k$ labels the orthonormal basis $\{|u_m^{(k)}\rangle\}$ of $\mathcal{M}_m$, and $\{c_k\}_{k=1}^r$ are arbitrary constants. In functional form, this equation becomes

$$u(x) = v(x) + \sum_{k=1}^r c_k u_m^{(k)}(x) + \lambda\sum_{\substack{n=1\\ n\neq m}}^\infty \frac{\langle u_n|v\rangle}{\lambda_n - \lambda}\,u_n(x). \quad (17.21)$$

17.2.4. Example. We now give an example of the application of Equation (17.19). We want to solve $u(x) = 3\int_{-1}^1 K(x,t)u(t)\,dt + x^2$, where

$$K(x,t) = \sum_{k=0}^\infty \frac{u_k(x)u_k(t)}{2^{k/2}}, \qquad u_k(x) = \sqrt{\frac{2k+1}{2}}\,P_k(x),$$

and $P_k(x)$ is a Legendre polynomial. We first note that $\{u_k\}$ is an orthonormal set of functions, that $K(x,t)$ is real and symmetric (therefore, hermitian), and that

$$\int_{-1}^1 dt\int_{-1}^1 dx\,|K(x,t)|^2 = \sum_{k,l=0}^\infty \frac{1}{2^{k/2}2^{l/2}}\underbrace{\int_{-1}^1 u_k(x)u_l(x)\,dx}_{=\delta_{kl}}\underbrace{\int_{-1}^1 u_k(t)u_l(t)\,dt}_{=\delta_{kl}} = \sum_{k=0}^\infty \frac{1}{2^k}\,\delta_{kk} = \sum_{k=0}^\infty \frac{1}{2^k} = 2 < \infty.$$

Thus, $K(x,t)$ is a Hilbert–Schmidt kernel. Now note that

$$\int_{-1}^1 K(x,t)u_k(t)\,dt = \sum_{l=0}^\infty \frac{u_l(x)}{2^{l/2}}\int_{-1}^1 u_l(t)u_k(t)\,dt = \frac{u_k(x)}{2^{k/2}}.$$

This shows that $u_k$ is an eigenfunction of $K(x,t)$ with eigenvalue $1/2^{k/2}$. Since $3 \neq 2^{k/2}$ for any integer $k$, we can use Equation (17.19) to write

$$u(x) = x^2 + 3\sum_{k=0}^\infty \frac{\int_{-1}^1 u_k(s)s^2\,ds}{2^{k/2} - 3}\,u_k(x).$$
But $\int_{-1}^1 u_k(s)s^2\,ds = 0$ for $k \geq 3$. For $k \leq 2$, we use the first three Legendre polynomials to get

$$\int_{-1}^1 u_0(s)s^2\,ds = \frac{\sqrt{2}}{3}, \qquad \int_{-1}^1 u_1(s)s^2\,ds = 0, \qquad \int_{-1}^1 u_2(s)s^2\,ds = \frac{4}{15}\sqrt{\frac{5}{2}}.$$

This gives $u(x) = \tfrac{1}{2} - 2x^2$. The reader is urged to substitute this solution in the original integral equation and verify that it works.

17.2.2 Degenerate Kernels

degenerate or separable kernel

The preceding example involves the simplest kind of degenerate, or separable, kernels. A kernel is called degenerate, or separable, if it can be written as a finite sum of products of functions of one variable:

$$K(x,t) = \sum_{j=1}^n \phi_j(x)\psi_j(t), \quad (17.22)$$

where $\phi_j$ and $\psi_j$ are assumed to be square-integrable. Substituting (17.22) in the Fredholm integral equation of the second kind, we obtain

$$u(x) - \lambda\sum_{j=1}^n \phi_j(x)\int_a^b \psi_j(t)u(t)\,dt = v(x).$$

If we define $\mu_j \equiv \int_a^b \psi_j(t)u(t)\,dt$, the preceding equation becomes

$$u(x) - \lambda\sum_{j=1}^n \mu_j\phi_j(x) = v(x). \quad (17.23)$$

Multiply this equation by $\psi_i^*(x)$ and integrate over $x$ to get

$$\mu_i - \lambda\sum_{j=1}^n \mu_j A_{ij} = v_i \quad\text{for } i = 1,2,\ldots,n, \quad (17.24)$$

where $A_{ij} = \int_a^b \psi_i^*(t)\phi_j(t)\,dt$ and $v_i = \int_a^b \psi_i^*(t)v(t)\,dt$. With $\mu_i$, $v_i$, and $A_{ij}$ as components of column vectors $\mathbf{u}$, $\mathbf{v}$, and a matrix $\mathsf{A}$, we can write the above linear system of equations as

$$\mathbf{u} - \lambda\mathsf{A}\mathbf{u} = \mathbf{v}, \quad\text{or}\quad (\mathbf{1} - \lambda\mathsf{A})\mathbf{u} = \mathbf{v}. \quad (17.25)$$

We can now determine the $\mu_i$ by solving the system of linear equations given by (17.24). Once the $\mu_i$ are determined, Equation (17.23) gives $u(x)$. Thus, for a degenerate kernel the Fredholm problem reduces to a system of linear equations.
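The solution of Example 17.2.4 above lends itself to a direct numerical check. Since $u(x) = \tfrac12 - 2x^2$ is a quadratic polynomial, the terms with $k \geq 3$ in the kernel are orthogonal to it, so truncating the sum at $k = 2$ introduces no error; the grid of sample points below is an arbitrary choice.

```python
import numpy as np
from numpy.polynomial import legendre

# Check Example 17.2.4: u(x) = 1/2 - 2x^2 should satisfy
#   u(x) = 3 * ∫_{-1}^{1} K(x,t) u(t) dt + x^2,
# with K(x,t) = sum_k u_k(x) u_k(t) / 2^(k/2) and u_k = sqrt((2k+1)/2) P_k.
# Truncating K at k = 2 is exact here, because u has degree 2.
t, w = legendre.leggauss(20)          # quadrature exact for these polynomials
x = np.linspace(-1, 1, 7)

def u_k(k, s):                        # normalized Legendre function u_k(s)
    c = np.zeros(k + 1); c[k] = 1.0
    return np.sqrt((2 * k + 1) / 2) * legendre.legval(s, c)

u = lambda s: 0.5 - 2 * s**2
Ku = sum(u_k(k, x) / 2**(k / 2) * np.sum(w * u_k(k, t) * u(t)) for k in range(3))
print(np.max(np.abs(u(x) - (3 * Ku + x**2))))   # residual at roundoff level
```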
17.2.5. Example. As a concrete example of an integral equation with degenerate kernel, we solve $u(x) - \lambda\int_0^1 (1+xt)u(t)\,dt = x$ for two different values of $\lambda$. The kernel, $K(x,t) = 1 + xt$, is separable, with $\phi_1(x) = 1$, $\psi_1(t) = 1$, $\phi_2(x) = x$, and $\psi_2(t) = t$. This gives the matrix

$$\mathsf{A} = \begin{pmatrix} 1 & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{3} \end{pmatrix}.$$

For convenience, we define the matrix $\mathsf{B} \equiv \mathbf{1} - \lambda\mathsf{A}$.

(a) First assume that $\lambda = 1$. In that case $\mathsf{B}$ has a nonzero determinant. Thus, $\mathsf{B}^{-1}$ exists, and can be calculated to be

$$\mathsf{B}^{-1} = \begin{pmatrix} -\tfrac{8}{3} & -2 \\ -2 & 0 \end{pmatrix}.$$

With $v_1 = \int_0^1 \psi_1^*(t)v(t)\,dt = \int_0^1 t\,dt = \tfrac{1}{2}$ and $v_2 = \int_0^1 \psi_2^*(t)v(t)\,dt = \int_0^1 t^2\,dt = \tfrac{1}{3}$, we obtain

$$\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = \mathsf{B}^{-1}\mathbf{v} = \begin{pmatrix} -2 \\ -1 \end{pmatrix}.$$

Equation (17.23) then gives $u(x) = \mu_1\phi_1(x) + \mu_2\phi_2(x) + x = -2$.

(b) Now, for the purpose of illustrating the other alternative of Theorem 17.2.2, let us take $\lambda = 8 + 2\sqrt{13}$. Then

$$\mathsf{B} = \mathbf{1} - \lambda\mathsf{A} = -\begin{pmatrix} 7+2\sqrt{13} & 4+\sqrt{13} \\ 4+\sqrt{13} & (5+2\sqrt{13})/3 \end{pmatrix},$$

and $\det\mathsf{B} = 0$. This shows that $8+2\sqrt{13}$ is a characteristic value of the equation. We thus have a solution only if $v(x) = x$ is orthogonal to the null space of $\mathbf{1} - \lambda^*\mathsf{A}^\dagger = \mathsf{B}^\dagger$. To determine a basis for this null space, we have to find vectors $|z\rangle$ such that $\mathsf{B}^\dagger|z\rangle = 0$. Since $\lambda$ is real, and $\mathsf{B}$ is real and symmetric, $\mathsf{B}^\dagger = \mathsf{B}$, and we must solve

$$\begin{pmatrix} 7+2\sqrt{13} & 4+\sqrt{13} \\ 4+\sqrt{13} & (5+2\sqrt{13})/3 \end{pmatrix}\begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix} = 0.$$

The solution to this equation is a multiple of $|z\rangle = \begin{pmatrix} 3 \\ -2-\sqrt{13} \end{pmatrix}$. If the integral equation is to have a solution, the column vector $\mathbf{v}$ (whose corresponding ket we denote by $|v\rangle$) must be orthogonal to $|z\rangle$. But

$$\langle z|v\rangle = \begin{pmatrix} 3 & -2-\sqrt{13} \end{pmatrix}\begin{pmatrix} \tfrac{1}{2} \\ \tfrac{1}{3} \end{pmatrix} \neq 0.$$

Therefore, the integral equation has no solution.
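Part (a) of Example 17.2.5 reduces to a two-by-two linear system, which the following sketch solves and verifies numerically:

```python
import numpy as np

# Reduce u(x) - ∫_0^1 (1 + x t) u(t) dt = x  (lambda = 1, Example 17.2.5a)
# to the 2x2 system (1 - lam*A) mu = v, with phi = (1, x) and psi = (1, t).
lam = 1.0
A = np.array([[1.0, 1/2],          # A_ij = ∫_0^1 psi_i(t) phi_j(t) dt
              [1/2, 1/3]])
v = np.array([1/2, 1/3])           # v_i = ∫_0^1 psi_i(t) * t dt
mu = np.linalg.solve(np.eye(2) - lam * A, v)
print(mu)                          # approximately [-2, -1]

# Then u(x) = lam*(mu[0]*1 + mu[1]*x) + x = -2, a constant:
x = np.linspace(0, 1, 5)
u = lam * (mu[0] + mu[1] * x) + x
print(np.allclose(u, -2.0))        # True; check: ∫_0^1 (1+xt)(-2) dt + x = -2
```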
The reader may feel uneasy that the functions $\phi_j(x)$ and $\psi_j(t)$ appearing in a degenerate kernel are arbitrary to within a multiplicative function. After all, we can multiply $\phi_j(x)$ by a nonzero function, and divide $\psi_j(t)$ by the same function, and get the same kernel. Such a change clearly alters the matrices $\mathsf{A}$ and $\mathsf{B}$ and therefore seems likely to change the solution, $u(x)$. That this is not the case is demonstrated in Problem 17.2. In fact, it can be shown quite generally that the transformations described above do not change the solution.

As the alert reader may have noticed, we have been avoiding the problem of solving the eigenvalue (characteristic) problem for integral operators. Such a problem is nontrivial, and the analogue of the finite-dimensional case, where one works with determinants and characteristic polynomials, does not exist. An exception is a degenerate hermitian⁶ kernel, i.e., a kernel of the form $K(x,t) = \sum_{i=1}^n h_i(x)h_i^*(t)$. Substituting this in the characteristic-value equation $u(x) = \lambda\int_a^b K(x,t)u(t)\,dt$, we obtain $u(x) = \lambda\sum_{i=1}^n h_i(x)\int_a^b h_i^*(t)u(t)\,dt$. Defining $\mu_i \equiv \int_a^b h_i^*(t)u(t)\,dt$ and substituting it back in the equation gives

$$u(x) = \lambda\sum_{i=1}^n h_i(x)\mu_i. \quad (17.26)$$

Multiplying this equation by $\lambda^{-1}h_j^*(x)$ and integrating over $x$ yields

$$\lambda^{-1}\mu_j = \sum_{i=1}^n m_{ji}\mu_i, \qquad m_{ji} \equiv \int_a^b h_j^*(x)h_i(x)\,dx.$$

This is an eigenvalue equation for the hermitian $n\times n$ matrix $\mathsf{M}$ with elements $m_{ij}$, which, by the spectral theorem for hermitian operators, can be solved. In fact, the matrix need not be hermitian; as long as it is normal, the eigenvalue problem can be solved. Once the eigenvectors and the eigenvalues are found, we can substitute them in Equation (17.26) and obtain $u(x)$. We expect to find a finite number of eigenfunctions and eigenvalues. Our analysis of compact operators included such a case. That analysis also showed that the entire (infinite-dimensional) Hilbert space could be written as the direct sum of eigenspaces that are finite-dimensional for nonzero eigenvalues.
Therefore, we expect the eigenspace corresponding to the zero eigenvalue (or infinite characteristic value) to be infinite-dimensional. The following example illustrates these points.

17.2.6. Example. Let us find the nonzero characteristic values and corresponding eigenfunctions of the kernel

$$K(x,t) = 1 + \sin(x+t) \quad\text{for } -\pi \leq x, t \leq \pi.$$

⁶Actually, the problem of a degenerate kernel that leads to a normal matrix, as described below, can also be solved.
We are seeking functions $u$ and scalars $\lambda$ satisfying $u(x) = \lambda K[u](x)$, or

$$u(x) = \lambda\int_{-\pi}^{\pi}[1 + \sin(x+t)]u(t)\,dt. \quad (17.27)$$

Expanding $\sin(x+t)$, we obtain

$$u(x) = \lambda\int_{-\pi}^{\pi}[1 + \sin x\cos t + \cos x\sin t]u(t)\,dt,$$

or

$$\lambda^{-1}u(x) = \mu_1 + \mu_2\sin x + \mu_3\cos x, \quad (17.28)$$

where $\mu_1 = \int_{-\pi}^{\pi}u(t)\,dt$, $\mu_2 = \int_{-\pi}^{\pi}u(t)\cos t\,dt$, and $\mu_3 = \int_{-\pi}^{\pi}u(t)\sin t\,dt$. Integrate both sides of Equation (17.28) with respect to $x$ from $-\pi$ to $\pi$ to obtain $\lambda^{-1}\mu_1 = 2\pi\mu_1$. Similarly, multiplying by $\sin x$ and $\cos x$ and integrating yields

$$\lambda^{-1}\mu_3 = \pi\mu_2 \quad\text{and}\quad \lambda^{-1}\mu_2 = \pi\mu_3. \quad (17.29)$$

If $\mu_1 \neq 0$, we get $\lambda^{-1} = 2\pi$, which, when substituted in (17.29), yields $\mu_2 = \mu_3 = 0$. We thus have, as a first solution, $\lambda_1^{-1} = 2\pi$. Equation (17.28) now gives $\lambda_1^{-1}u_1(x) = \mu_1$, or $u_1(x) = c_1$, where $c_1$ is an arbitrary constant to be determined. On the other hand, if $\mu_1 = 0$, then $\lambda^{-1} \neq 2\pi$. Then Equation (17.29) yields $\lambda^{-1} = \pm\pi$ and $\mu_2 = \pm\mu_3$. For $\lambda^{-1} \equiv \lambda_+^{-1} = \pi$, Equation (17.28) gives $u(x) \equiv u_+(x) = c_+(\sin x + \cos x)$, and for $\lambda^{-1} \equiv \lambda_-^{-1} = -\pi$, it yields $u(x) \equiv u_-(x) = c_-(\sin x - \cos x)$, where $c_\pm$ are arbitrary constants to be determined by normalization of the eigenfunctions. The normalized eigenfunctions are

$$u_1 = \frac{1}{\sqrt{2\pi}}, \qquad u_\pm(x) = \frac{1}{\sqrt{2\pi}}(\sin x \pm \cos x).$$

Direct substitution in the original integral equation easily verifies that $u_1$, $u_+$, and $u_-$ are eigenfunctions of the integral equation with the eigenvalues calculated above.

Let us now consider the zero eigenvalue (or infinite characteristic value). Divide both sides of Equation (17.27) by $\lambda$ and take the limit $\lambda \to \infty$. Then the integral equation becomes

$$\int_{-\pi}^{\pi}[1 + \sin x\cos t + \cos x\sin t]u(t)\,dt = 0.$$

The solutions $u(t)$ to this equation would span the eigenspace corresponding to the zero eigenvalue, or infinite characteristic value. We pointed out above that this eigenspace is expected to be infinite-dimensional. This
expectation is borne out once we note that all functions of the form $\sin nt$ or $\cos nt$ with $n \geq 2$ make the above integral zero; and there are infinitely many such functions.
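Example 17.2.6 can also be checked numerically by the Nyström method (discretizing the integral operator on a quadrature grid); the grid size and midpoint rule below are implementation choices. The three nonzero eigenvalues $2\pi$, $\pi$, $-\pi$ (the reciprocals of the characteristic values found above) emerge, and all remaining eigenvalues vanish, reflecting the infinite-dimensional null space.

```python
import numpy as np

# Nystrom discretization of K[u](x) = ∫_{-pi}^{pi} [1 + sin(x+t)] u(t) dt:
# on an n-point uniform grid with equal weights h, the matrix
# M_ij = K(x_i, x_j) * h inherits the operator's eigenvalues.
n = 400
x = -np.pi + (2 * np.pi) * (np.arange(n) + 0.5) / n   # midpoint-rule nodes
h = 2 * np.pi / n
M = (1 + np.sin(x[:, None] + x[None, :])) * h
ev = np.linalg.eigvalsh(M)                            # kernel is symmetric
big = np.sort(ev[np.abs(ev) > 1e-8])
print(big)            # approximately [-pi, pi, 2*pi]; all other eigenvalues ~ 0
```

The midpoint rule integrates trigonometric polynomials of low degree exactly on a full period, which is why the three nonzero eigenvalues come out essentially to machine precision.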
17.3 Problems

17.1. Use mathematical induction to derive Equation (17.5).

17.2. Repeat part (a) of Example 17.2.5 using $\phi_2(x) = x$, $\psi_2(t) = t$, so that we still have $K(x,t) = \phi_1(x)\psi_1(t) + \phi_2(x)\psi_2(t)$.

17.3. Use the spectral theorem for compact hermitian operators to show that if the kernel of a Hilbert–Schmidt operator has a finite number of nonzero eigenvalues, then the kernel is separable. Hint: See the discussion at the beginning of Section 17.2.1.

17.4. Use the method of successive approximations to solve the Volterra equation $u(x) = \lambda\int_0^x u(t)\,dt$. Then derive a DE equivalent to the Volterra equation (make sure to include the initial condition), and solve it.

17.5. Regard the Fourier transform,

$$\mathcal{F}[f](x) \equiv \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{ixy}f(y)\,dy,$$

as an integral operator.
(a) Show that $\mathcal{F}^2[f](x) = f(-x)$.
(b) Deduce, therefore, that the only eigenvalues of this operator are $\lambda = \pm 1, \pm i$.
(c) Let $f(x)$ be any even function of $x$. Show that an appropriate choice of $\alpha$ can make $u = f + \alpha\mathcal{F}[f]$ an eigenfunction of $\mathcal{F}$. (This shows that the eigenvalues of $\mathcal{F}$ have infinite multiplicity.)

17.6. For what values of $\lambda$ does the following integral equation have a solution?

$$u(x) = \lambda\int_0^{\pi}\sin(x+t)u(t)\,dt + x.$$

What is that solution? Redo the problem using a Neumann series expansion. Under what condition is the series convergent?

17.7. It is possible to multiply the functions $\phi_j(x)$ by $r_j(x)$ and $\psi_j(t)$ by $1/r_j(t)$ and still get the same degenerate kernel, $K(x,t) = \sum_{j=1}^n \phi_j(x)\psi_j(t)$. Show that such arbitrariness, although affecting the matrices $\mathsf{A}$ and $\mathsf{B}$, does not change the solution of the Fredholm problem $u(x) - \lambda\int_a^b K(x,t)u(t)\,dt = f(x)$.

17.8. Show, by direct substitution, that the solution found in Example 17.2.4 does satisfy its integral equation.
17.9. Solve $u(x) = \tfrac{1}{2}\int_{-1}^1 (x+t)u(t)\,dt + x$.

17.10. Solve $u(x) = \lambda\int_0^1 xt\,u(t)\,dt + x$ using the Neumann series method. For what values of $\lambda$ is the series convergent? Now find the eigenvalues and eigenfunctions of the kernel and solve the problem using these eigenvalues and eigenfunctions.

17.11. Solve $u(x) = \lambda\int_0^\infty K(x,t)u(t)\,dt + x^\alpha$, where $\alpha$ is any real number except a negative integer, and $K(x,t) = e^{-(x+t)}$. For what values of $\lambda$ does the integral equation have a solution?

17.12. Solve the integral equations
(a) $u(x) = e^x + \lambda\int_0^1 xt\,u(t)\,dt$. (b) $u(x) = \lambda\int_0^{\pi}\sin(x-t)u(t)\,dt$.
(c) $u(x) = x^2 + \int_0^1 xt\,u(t)\,dt$. (d) $u(x) = x + \int_0^x u(t)\,dt$.

17.13. Solve the integral equation $u(x) = x + \lambda\int_0^1 (x+t)t\,u(t)\,dt$, keeping terms up to $\lambda^2$.

17.14. Solve the integral equation $u(x) = e^{-|x|} + \lambda\int_{-\infty}^{\infty} e^{-|x-t|}u(t)\,dt$, assuming that $u$ remains finite as $x \to \pm\infty$.

17.15. Solve the integral equation $u(x) = e^{-|x|} + \lambda\int_0^{\infty} u(t)\cos xt\,dt$, assuming that $u$ remains finite as $x \to \pm\infty$.

Additional Reading

1. DeVito, C. Functional Analysis and Linear Operator Theory, Addison-Wesley, 1990.
2. Jörgens, K. Linear Integral Operators, Pitman, 1982. Translated from its original German, this is a thorough (but formal) introduction to integral operators and equations.
3. Zeidler, E. Applied Functional Analysis, Springer-Verlag, 1995.
18 Sturm–Liouville Systems: Formalism

The linear operators discussed in the last two chapters were exclusively integral operators. Most applications of physical interest, however, involve differential operators (DOs). Unfortunately, differential operators are unbounded. We noted that complications arise when one abandons the compactness property of the operator, e.g., sums turn into integrals and one loses one's grip over the eigenvalues of noncompact operators. The transition to unbounded operators further complicates matters. Fortunately, the formalism of one type of DO that occurs most frequently in physics can be studied in the context of compact operators. Such a study is our aim for this chapter.

18.1 Unbounded Operators with Compact Resolvent

As was pointed out in Example 16.2.7, the derivative operator cannot be defined for all functions in $\mathcal{L}^2(a,b)$. This motivates the following:

domain of a linear operator

18.1.1. Definition. Let $\mathcal{D}$ be a linear manifold¹ in the Hilbert space $\mathcal{H}$. A linear map $\mathsf{T}: \mathcal{D} \to \mathcal{H}$ will be called a linear operator in² $\mathcal{H}$. $\mathcal{D}$ is called the domain of $\mathsf{T}$ and often denoted by $\mathcal{D}(\mathsf{T})$.

18.1.2. Example. The domain of the derivative operator $\mathsf{D}$, as an operator on $\mathcal{L}^2(a,b)$, cannot be the entire space. On the other hand, $\mathsf{D}$ is defined on the linear manifold $\mathcal{M}$ in $\mathcal{L}^2(a,b)$ spanned by $\{e^{i2n\pi x/L}\}$ with $L = b - a$. As we saw in Chapter 8, $\mathcal{M}$ is dense

¹A linear manifold of an infinite-dimensional normed vector space $\mathcal{V}$ is a proper subset that is a vector space in its own right, but is not necessarily closed.
²As opposed to on $\mathcal{H}$.
(see Definition 16.4.5 and the discussion following it) in $\mathcal{L}^2(a,b)$. This is the essence of Fourier series: that every function in $\mathcal{L}^2(a,b)$ can be expanded in (i.e., approximated by) a Fourier series. It turns out that many unbounded operators on a Hilbert space share the same property, namely that their domains are dense in the Hilbert space.

Another important property of Fourier expansion is the fact that if the function is differentiable, then one can differentiate both sides, i.e., one can differentiate a Fourier expansion term by term if such an operation makes sense for the original function. Define the sequence $\{f_m\}$ by

$$f_m(x) = \sum_{n=-m}^{m} a_n e^{i2\pi nx/L}, \qquad a_n = \frac{1}{\sqrt{L}}\int_a^b f(x)e^{-i2\pi nx/L}\,dx.$$

Then we can state the property above as follows: Suppose $\{f_m\}$ is in $\mathcal{M}$. If $\lim f_m = f$ and $\lim f_m' = g$, then $f' = g$ and $f \in \mathcal{M}$. Many unbounded operators share this property.

closed operator

18.1.3. Definition. Let $\mathcal{D}$ be a linear manifold in the Hilbert space $\mathcal{H}$. Let $\mathsf{T}: \mathcal{D} \to \mathcal{H}$ be a linear operator in $\mathcal{H}$. Suppose that for any sequence $\{|u_n\rangle\}$ in $\mathcal{D}$, both $\{|u_n\rangle\}$ and $\{\mathsf{T}|u_n\rangle\}$ converge in $\mathcal{H}$, i.e.,

$$\lim |u_n\rangle = |u\rangle \quad\text{and}\quad \lim \mathsf{T}|u_n\rangle = |v\rangle.$$

We say that $\mathsf{T}$ is closed if $|u\rangle \in \mathcal{D}$ and $\mathsf{T}|u\rangle = |v\rangle$.

Notice that we cannot demand that $|u\rangle$ be in $\mathcal{D}$ for a general operator. This, as we saw in the preceding example, will not be appropriate for unbounded operators. The restriction of the domain of an unbounded operator is necessitated by the fact that the action of the operator on a vector in the Hilbert space in general takes that vector out of the space. The following theorem (see [DeVi 90, pp. 251–252] for a proof) shows why this is necessary:

18.1.4. Theorem. A closed linear operator in $\mathcal{H}$ that is defined at every point of $\mathcal{H}$ (so that $\mathcal{D} = \mathcal{H}$) is bounded.

Thus, if we are interested in unbounded operators (for instance, differential operators), we have to restrict their domains.
In particular, we have to accept the possibility of an operator whose adjoint has a different domain.³

difference between hermitian and self-adjoint operators

18.1.5. Definition. Let $\mathsf{T}$ be a linear operator in $\mathcal{H}$. We shall say that $\mathsf{T}$ is hermitian if $\mathsf{T}^\dagger$ is an extension of $\mathsf{T}$, i.e., $\mathcal{D}(\mathsf{T}) \subset \mathcal{D}(\mathsf{T}^\dagger)$ and $\mathsf{T}^\dagger|u\rangle = \mathsf{T}|u\rangle$ for all $|u\rangle \in \mathcal{D}(\mathsf{T})$. $\mathsf{T}$ is called self-adjoint if $\mathcal{D}(\mathsf{T}) = \mathcal{D}(\mathsf{T}^\dagger)$.

As we shall see shortly, certain types of Sturm–Liouville operators, although unbounded, lend themselves to a study within the context of compact operators.

operators with compact resolvent

18.1.6. Definition. A hermitian linear operator $\mathsf{T}$ in a Hilbert space $\mathcal{H}$ is said to have a compact resolvent if there is a $\mu \in \rho(\mathsf{T})$ for which the resolvent $\mathsf{R}_\mu(\mathsf{T})$ is compact.

³This subtle difference between hermitian and self-adjoint is stated here merely to warn the reader and will be confined to the present discussion. The two qualifiers will be (ab)used interchangeably in the rest of the book.
An immediate consequence of this definition is that $\mathsf{R}_\lambda(\mathsf{T})$ is compact for all $\lambda \in \rho(\mathsf{T})$. To see this, note that $\mathsf{R}_\lambda(\mathsf{T})$ is bounded by Definition 16.3.1. Now use Equation (16.9) and write

$$\mathsf{R}_\lambda(\mathsf{T}) = [\mathbf{1} + (\lambda - \mu)\mathsf{R}_\lambda(\mathsf{T})]\mathsf{R}_\mu(\mathsf{T}).$$

The RHS is a product of a bounded⁴ and a compact operator, and therefore must be compact. The compactness of the resolvent characterizes its spectrum by Theorem 16.7.8. As the following theorem shows, this in turn characterizes the spectrum of operators with compact resolvent.

18.1.7. Theorem. Let $\mathsf{T}$ be an operator with compact resolvent $\mathsf{R}_\lambda(\mathsf{T})$, where $\lambda \in \rho(\mathsf{T})$. Then $0 \neq \mu \in \rho(\mathsf{R}_\lambda(\mathsf{T}))$ if and only if $\lambda + 1/\mu \in \rho(\mathsf{T})$. Similarly, $\mu \neq 0$ is an eigenvalue of $\mathsf{R}_\lambda(\mathsf{T})$ if and only if $\lambda + 1/\mu$ is an eigenvalue of $\mathsf{T}$. Furthermore, the eigenvectors of $\mathsf{R}_\lambda(\mathsf{T})$ corresponding to $\mu$ coincide with those of $\mathsf{T}$ corresponding to $\lambda + 1/\mu$.

Proof. The proof consists of a series of two-sided implications involving definitions. We give the proof of the first part, the second part being very similar:

$\mu \in \rho(\mathsf{R}_\lambda(\mathsf{T}))$ iff $\mathsf{R}_\lambda(\mathsf{T}) - \mu\mathbf{1}$ is invertible.
$\mathsf{R}_\lambda(\mathsf{T}) - \mu\mathbf{1}$ is invertible iff $(\mathsf{T} - \lambda\mathbf{1})^{-1} - \mu\mathbf{1}$ is invertible.
$(\mathsf{T} - \lambda\mathbf{1})^{-1} - \mu\mathbf{1}$ is invertible iff $\mathbf{1} - \mu(\mathsf{T} - \lambda\mathbf{1})$ is invertible.
$\mathbf{1} - \mu(\mathsf{T} - \lambda\mathbf{1})$ is invertible iff $\tfrac{1}{\mu}\mathbf{1} - \mathsf{T} + \lambda\mathbf{1}$ is invertible.
$\left(\lambda + \tfrac{1}{\mu}\right)\mathbf{1} - \mathsf{T}$ is invertible iff $\lambda + \tfrac{1}{\mu} \in \rho(\mathsf{T})$.

Comparing the LHS of the first line with the RHS of the last line, we obtain the first part of the theorem. □

A consequence of this theorem is that the eigenspaces of an (unbounded) operator with compact resolvent are finite-dimensional, i.e., such an operator has only finitely many eigenvectors corresponding to each of its eigenvalues. Moreover, arranging the eigenvalues $\mu_n$ of the resolvent in decreasing order (as done in Theorem 16.7.8), we conclude that the eigenvalues of $\mathsf{T}$ can be arranged in a sequence in increasing order of their absolute values, and the limit of this sequence is infinity.

18.1.8. Example.
Consider the operator $\mathsf{T}$ in $\mathcal{L}^2(0,1)$ defined by⁵ $\mathsf{T}f = -f''$, having the domain

$$\mathcal{D}(\mathsf{T}) = \{f \in \mathcal{L}^2(0,1) \mid f'' \in \mathcal{L}^2(0,1),\ f(0) = f(1) = 0\}.$$

The reader may

⁴The sum of two bounded operators is bounded.
⁵We shall depart from our convention here and shall not use the Dirac bra-ket notation, although the use of abstract operators encourages its use. The reason is that in this example we are dealing with functions, and it is more convenient to undress the functions from their Dirac clothing.
check that zero is not an eigenvalue of $\mathsf{T}$. Therefore, we may choose $\mathsf{R}_0(\mathsf{T}) = \mathsf{T}^{-1}$. We shall study a systematic way of finding inverses of some specific differential operators in the upcoming chapters on Green's functions. At this point, suffice it to say that $\mathsf{T}^{-1}$ can be written as a Hilbert–Schmidt integral operator with kernel

$$K(x,t) = \begin{cases} x(1-t) & \text{if } 0 \leq x \leq t \leq 1, \\ (1-x)t & \text{if } 0 \leq t \leq x \leq 1. \end{cases}$$

Thus, if $\mathsf{T}f = g$, i.e., if $f'' = -g$, then $\mathsf{T}^{-1}g = f$, or $f = K[g]$, i.e.,

$$f(x) = K[g](x) = \int_0^1 K(x,t)g(t)\,dt = \int_0^x (1-x)t\,g(t)\,dt + \int_x^1 x(1-t)g(t)\,dt.$$

It is readily verified that $K[g](0) = K[g](1) = 0$ and $f''(x) = K[g]''(x) = -g$. We can now use Theorem 18.1.7 with $\lambda = 0$ to find all the eigenvalues of $\mathsf{T}$: $\mu_n$ is an eigenvalue of $\mathsf{T}$ if and only if $1/\mu_n$ is an eigenvalue of $\mathsf{T}^{-1}$. These eigenvalues should have finite-dimensional eigenspaces, and we should be able to arrange them in increasing order of magnitude without bound. To verify this, we solve $f'' = -\mu f$, whose solutions are $\mu_n = n^2\pi^2$ and $f_n(x) = \sin n\pi x$. Note that there is only one eigenfunction corresponding to each eigenvalue. Therefore, the eigenspaces are finite- (one-)dimensional.

The example above is a special case of a large class of DOs occurring in mathematical physics. Recall from Theorem 13.5.4 that all linear second-order differential equations can be made self-adjoint. Moreover, Example 13.4.12 showed that any SOLDE can be transformed into a form in which the first-derivative term is absent. By dividing the DE by the coefficient of the second-derivative term if necessary, the study of the most general second-order linear differential operators boils down to that of the so-called Sturm–Liouville (S–L) operators, which are assumed to be self-adjoint. Differential operators are necessarily accompanied by boundary conditions that specify their domains.
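Example 18.1.8 can be illustrated with a standard finite-difference discretization of $\mathsf{T}f = -f''$ (the grid size below is an arbitrary choice): the lowest eigenvalues approach $n^2\pi^2$, arranged in increasing order, each with a one-dimensional eigenspace.

```python
import numpy as np

# Finite-difference check of Example 18.1.8: T f = -f'' on [0, 1] with
# f(0) = f(1) = 0 has eigenvalues mu_n = n^2 pi^2, eigenfunctions sin(n pi x).
n = 400
h = 1.0 / n
# -f'' ≈ (-f_{j-1} + 2 f_j - f_{j+1}) / h^2 on the n-1 interior nodes
main = 2.0 * np.ones(n - 1) / h**2
off = -1.0 * np.ones(n - 2) / h**2
T = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
mu = np.sort(np.linalg.eigvalsh(T))[:4]
print(mu / np.pi**2)      # approximately [1, 4, 9, 16], i.e. n^2 pi^2 in increasing order
```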
So, to be complete, let us assume that the DO in Equation (18.1) acts on the subset of $\mathcal{L}^2(a,b)$ consisting of functions $u$ that satisfy the following so-called separated boundary conditions:

Sturm–Liouville operators; separated boundary conditions

$$\mathsf{L}_x \equiv \frac{d^2}{dx^2} - q(x), \quad (18.1)$$

$$\alpha_1 u(a) + \beta_1 u'(a) = 0, \qquad \alpha_2 u(b) + \beta_2 u'(b) = 0, \quad (18.2)$$

where $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$ are real constants with the property that the matrix of coefficients has no zero rows.

regular Sturm–Liouville systems

The collection of the DO and the boundary conditions above is called a regular Sturm–Liouville system. We now show that the DO of a regular Sturm–Liouville system has compact resolvent. First observe that by adding $\alpha u$ (with $\alpha$ an arbitrary number different from all eigenvalues of the DO) to both sides of the eigenvalue equation $u'' -$
$qu = \lambda u$, we can assume⁶ that zero is not an eigenvalue of $\mathsf{L}_x$. Next, suppose that $u_1(x)$ and $u_2(x)$ are the two linearly independent solutions of the homogeneous DE satisfying the first and the second boundary conditions of Equation (18.2), respectively. The operator whose kernel is

$$K(x,t) = \begin{cases} -u_1(x)u_2(t)/W(a) & \text{if } a \leq x \leq t \leq b, \\ -u_1(t)u_2(x)/W(a) & \text{if } a \leq t \leq x \leq b, \end{cases}$$

in which $W$ is the Wronskian of the solutions, is a Hilbert–Schmidt operator and therefore compact. We now show that $K(x,t)$ is the resolvent $\mathsf{R}_0(\mathsf{L}_x) = \mathsf{L}_x^{-1} \equiv \mathsf{K}$ of our DO. To see this, write $\mathsf{L}_x u = v$, and

$$u(x) = K[v](x) = -\frac{u_2(x)}{W(a)}\int_a^x u_1(t)v(t)\,dt - \frac{u_1(x)}{W(a)}\int_x^b u_2(t)v(t)\,dt.$$

Differentiating this once gives

$$u'(x) = -\frac{u_2'(x)}{W(a)}\int_a^x u_1(t)v(t)\,dt - \frac{u_1'(x)}{W(a)}\int_x^b u_2(t)v(t)\,dt,$$

and a second differentiation yields

$$u''(x) = -\frac{u_2''(x)}{W(a)}\int_a^x u_1(t)v(t)\,dt - \frac{u_1''(x)}{W(a)}\int_x^b u_2(t)v(t)\,dt + v(x).$$

The last equation follows from the fact that the Wronskian $u_1'u_2 - u_2'u_1$ is constant for a DE of the form $u'' - qu = 0$. By substituting $u_1'' = qu_1$ and $u_2'' = qu_2$ in the last equation, we verify that $u = K[v]$ is indeed a solution of the Sturm–Liouville system $\mathsf{L}_x u = v$.

Next, we show that the eigensolutions of the S–L system are nondegenerate, i.e., the eigenspaces are one-dimensional. Suppose $f_1$ and $f_2$ are any two eigenfunctions corresponding to the same eigenvalue. Then both must satisfy the same DE and the same boundary conditions; in particular, we must have

$$\begin{aligned} \alpha_1 f_1(a) + \beta_1 f_1'(a) &= 0, \\ \alpha_1 f_2(a) + \beta_1 f_2'(a) &= 0 \end{aligned} \quad\Longrightarrow\quad \begin{pmatrix} f_1(a) & f_1'(a) \\ f_2(a) & f_2'(a) \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \beta_1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \quad (18.3)$$

If $\alpha_1$ and $\beta_1$ are not both zero, the Wronskian (the determinant of the matrix above) must vanish. Therefore, the two functions must be linearly dependent. Finally, recall that a Hilbert space on which a compact operator $\mathsf{K}$ is defined can be written as a direct sum of the latter's eigenspaces. More specifically, $\mathcal{H} = \sum_{j=0}^N \oplus\,\mathcal{M}_j$, where each $\mathcal{M}_j$ is finite-dimensional for $j = 1, 2, \ldots$
, and

⁶Although this will change $q$ (and the original operator), no information will be lost, because the eigenvectors will be the same and all eigenvalues will be changed by $\alpha$.
$N$ can be finite or infinite. If $N$ is finite, then $\mathcal{M}_0$, which can be considered as the eigenspace of the zero eigenvalue,⁷ will be infinite-dimensional. If $\mathcal{M}_0$ is finite-dimensional (or absent), then $N$ must be infinite, and the eigenvectors of $\mathsf{K}$ will span the entire space, i.e., they will form a complete orthogonal system. We now show that this holds for the regular Sturm–Liouville operator.

Jacques Charles François Sturm (1803–1855) made the first accurate determination of the velocity of sound in water in 1826, working with the Swiss engineer Daniel Colladon. He became a French citizen in 1833 and worked in Paris at the École Polytechnique, where he became a professor in 1838. In 1840 he succeeded Poisson in the chair of mechanics in the Faculté des Sciences, Paris. The problems of determining the eigenvalues and eigenfunctions of an ordinary differential equation with boundary conditions, and of expanding a given function in terms of an infinite series of the eigenfunctions, which date from about 1750, became more prominent as new coordinate systems were introduced and new classes of functions arose as the eigenfunctions of ordinary differential equations. Sturm and his friend Joseph Liouville decided to tackle the general problem for any second-order linear differential equation.

Sturm had been working since 1833 on problems of partial differential equations, primarily on the flow of heat in a bar of variable density, and hence was fully aware of the eigenvalue and eigenfunction problem. The mathematical ideas he applied to this problem are closely related to his investigations of the reality and distribution of the roots of algebraic equations. His ideas on differential equations, he says, came from the study of difference equations and a passage to the limit. Liouville, informed by Sturm of the problems he was working on, took up the same subject. The results of their joint work were published in several papers which are quite detailed.
Suppose that the above Hilbert–Schmidt operator $\mathsf{K}$ has a zero eigenvalue. Then there must exist a nonzero function $v$ such that $K[v](x) = 0$, i.e.,

$$-\frac{u_2(x)}{W(a)}\int_a^x u_1(t)v(t)\,dt - \frac{u_1(x)}{W(a)}\int_x^b u_2(t)v(t)\,dt = 0 \quad (18.4)$$

for all $x$. Differentiate this twice to get

$$-\frac{u_2''(x)}{W(a)}\int_a^x u_1(t)v(t)\,dt - \frac{u_1''(x)}{W(a)}\int_x^b u_2(t)v(t)\,dt + v(x) = 0.$$

Now substitute $u_1'' = qu_1$ and $u_2'' = qu_2$ in this equation and use Equation (18.4) to conclude that $v = 0$. This is impossible, because no eigenvector can be zero.

⁷The reader recalls that when $\mathsf{K}$ acts on $\mathcal{M}_0$, it yields zero.
Hence, zero is not an eigenvalue of $\mathsf{K}$, i.e., $\mathcal{M}_0 = \{0\}$. Since eigenvectors of $\mathsf{K} = \mathsf{L}_x^{-1}$ coincide with eigenvectors of $\mathsf{L}_x$, and eigenvalues of $\mathsf{L}_x$ are the reciprocals of the eigenvalues of $\mathsf{K}$, we have the following result.

theorem for regular Sturm–Liouville systems

18.1.9. Theorem. A regular Sturm–Liouville system has a countable number of eigenvalues that can be arranged in an increasing sequence that has infinity as its limit. The eigenvectors of the Sturm–Liouville operator are nondegenerate and constitute a complete orthogonal set. Furthermore, the eigenfunction $u_n(x)$ corresponding to the eigenvalue $\lambda_n$ has exactly $n$ zeros in its interval of definition.

The last statement is not a result of operator theory, but can be derived using the theory of differential equations. We shall not present the details of its derivation. We need to emphasize that the boundary conditions are an integral part of S–L systems. Changing the boundary conditions so that, for example, they are no longer separated may destroy the regularity of the S–L system.

18.2 Sturm–Liouville Systems and SOLDEs

We are now ready to combine our discussion of the preceding section with the knowledge gained from our study of differential equations. We saw in Chapter 12 that the separation of PDEs normally results in expressions of the form $\mathsf{L}[u] + \lambda u = 0$, or

$$p_2(x)\frac{d^2u}{dx^2} + p_1(x)\frac{du}{dx} + p_0(x)u + \lambda u = 0, \quad (18.5)$$

where $u$ is a function of a single variable and $\lambda$ is, a priori, an arbitrary constant. This is an eigenvalue equation for the operator $\mathsf{L}$, which is not, in general, self-adjoint. If we use Theorem 13.5.4 and multiply (18.5) by

$$w(x) = \frac{1}{p_2(x)}\exp\left[\int^x \frac{p_1(t)}{p_2(t)}\,dt\right],$$

it becomes self-adjoint for real $\lambda$, and can be written as

$$\frac{d}{dx}\left[p(x)\frac{du}{dx}\right] + [\lambda w(x) - q(x)]u = 0, \quad (18.6)$$

with $p(x) = w(x)p_2(x)$ and $q(x) = -p_0(x)w(x)$. Equation (18.6) is the standard form of the S–L equation. However, it is not in the form studied in the previous section.
To turn it into that form, one changes both the independent and dependent variables via the so-called Liouville substitution:

$$u(x) = v(t)\,[p(x)w(x)]^{-1/4}, \qquad t = \int_a^x \sqrt{\frac{w(s)}{p(s)}}\,ds. \quad (18.7)$$
It is then a matter of chain-rule differentiation to show that Equation (18.6) becomes

$$\frac{d^2v}{dt^2} + [\lambda - Q(t)]v = 0, \quad (18.8)$$

where

$$Q(t) = \frac{q(x(t))}{w(x(t))} + [p(x(t))w(x(t))]^{-1/4}\,\frac{d^2}{dt^2}\left[(pw)^{1/4}\right].$$

Therefore, Theorem 18.1.9 still holds.

Joseph Liouville (1809–1882) was a highly respected professor at the Collège de France, in Paris, and the founder and editor of the Journal des Mathématiques Pures et Appliquées, a famous periodical that played an important role in French mathematical life through the latter part of the nineteenth century. His own remarkable achievements as a creative mathematician have only recently received the appreciation they deserve. He was the first to solve a boundary value problem by solving an equivalent integral equation. His ingenious theory of fractional differentiation answered the long-standing question of what reasonable meaning can be assigned to the symbol $d^n y/dx^n$ when $n$ is not a positive integer. He discovered the fundamental result in complex analysis that a bounded entire function is necessarily a constant, and used it as the basis for his own theory of elliptic functions. There is also a well-known Liouville theorem in Hamiltonian mechanics, which states that volume integrals are time-invariant in phase space. In collaboration with Sturm, he also investigated the eigenvalue problem of second-order differential equations.

The theory of transcendental numbers is another branch of mathematics that originated in Liouville's work. The irrationality of $\pi$ and $e$ (the fact that they are not solutions of any linear equations) had been proved in the eighteenth century by Lambert and Euler. In 1844 Liouville showed that $e$ is not a root of any quadratic equation with integral coefficients as well. This led him to conjecture that $e$ is transcendental, which means that it does not satisfy any polynomial equation with integral coefficients.

18.2.1. Example.
The Liouville substitution [Equation (18.7)] transforms the Bessel DE $(xu')' + (k^2x - \nu^2/x)u = 0$ into

$$\frac{d^2v}{dt^2} + \left[k^2 - \frac{\nu^2 - 1/4}{t^2}\right]v = 0,$$

from which we can obtain an interesting result when $\nu = \tfrac{1}{2}$. In that case we have $\ddot{v} + k^2v = 0$, whose solutions are of the form $\cos kt$ and $\sin kt$. Noting that $u(x) = J_{1/2}(kx)$, Equation (18.7) gives

$$J_{1/2}(kt) = A\,\frac{\sin kt}{\sqrt{t}} \qquad\text{or}\qquad J_{1/2}(kt) = B\,\frac{\cos kt}{\sqrt{t}},$$
and since $J_{1/2}(x)$ is finite at $x = 0$, we must have $J_{1/2}(kt) = A\sin kt/\sqrt{t}$, which is the result obtained in Chapter 14.

The appearance of $w$ is the result of our desire to render the differential operator self-adjoint. It also appears in another context. Recall the Lagrange identity for a self-adjoint differential operator $\mathsf{L}$:

$$u\mathsf{L}[v] - v\mathsf{L}[u] = \frac{d}{dx}\{p(x)[u(x)v'(x) - v(x)u'(x)]\}. \quad (18.9)$$

If we specialize this identity to the S–L equation of (18.6) with $u = u_1$ corresponding to the eigenvalue $\lambda_1$ and $v = u_2$ corresponding to the eigenvalue $\lambda_2$, we obtain for the LHS

$$u_1\mathsf{L}[u_2] - u_2\mathsf{L}[u_1] = u_1(-\lambda_2wu_2) + u_2(\lambda_1wu_1) = (\lambda_1 - \lambda_2)wu_1u_2.$$

Integrating both sides of (18.9) then yields

$$(\lambda_1 - \lambda_2)\int_a^b wu_1u_2\,dx = \{p(x)[u_1(x)u_2'(x) - u_2(x)u_1'(x)]\}_a^b. \quad (18.10)$$

A desired property of the solutions of a self-adjoint DE is their orthogonality when they belong to different eigenvalues. This property will be satisfied if we assume an inner product integral with weight function $w(x)$, and if the RHS of Equation (18.10) vanishes. There are various boundary conditions that fulfill the latter requirement. For example, $u_1$ and $u_2$ could satisfy the boundary conditions of Equation (18.2).

periodic boundary conditions

Another set of appropriate boundary conditions (BCs) is the periodic BCs given by

$$u(a) = u(b) \quad\text{and}\quad u'(a) = u'(b). \quad (18.11)$$

However, as the following example shows, the latter BCs do not lead to a regular S–L system.

18.2.2. Example. (a) The S–L system consisting of the S–L equation $d^2u/dt^2 + \omega^2u = 0$ in the interval $[0,T]$ with the separated BCs $u(0) = 0$ and $u(T) = 0$ has the eigenfunctions $u_n(t) = \sin\frac{n\pi}{T}t$ with $n = 1, 2, \ldots$ and the eigenvalues $\lambda_n = \omega_n^2 = (n\pi/T)^2$ with $n = 1, 2, \ldots$.

(b) Let the S–L equation be the same as in part (a), but change the interval to $[-T, +T]$ and the BCs to periodic ones such as $u(-T) = u(T)$ and $u'(-T) = u'(T)$. The eigenvalues are the same as before, but the eigenfunctions are $1$, $\sin(n\pi t/T)$, and $\cos(n\pi t/T)$, where $n$ is a positive integer.
Note that there is a degeneracy here, in the sense that there are two linearly independent eigenfunctions having the same eigenvalue $(n\pi/T)^2$. By Theorem 18.1.9, the S-L system is not regular.
(c) The Bessel equation for a given fixed $\nu^2$ is
$$u'' + \frac{1}{x}u' + \left(k^2 - \frac{\nu^2}{x^2}\right)u = 0, \qquad\text{where } a \le x \le b,$$
and it can be turned into an S-L system if we multiply it by
$$w(x) = \frac{1}{p_2(x)}\exp\left[\int^x \frac{p_1(t)}{p_2(t)}\,dt\right] = \exp\left[\int^x \frac{dt}{t}\right] = x.$$
Then we can write
$$\frac{d}{dx}\left(x\frac{du}{dx}\right) + \left(k^2x - \frac{\nu^2}{x}\right)u = 0, \tag{18.12}$$
which is in the form of Equation (18.6) with $p = w = x$, $\lambda = k^2$, and $q(x) = \nu^2/x$. If $a > 0$, we can obtain a regular S-L system by applying appropriate separated BCs. ■

A regular S-L system is too restrictive for applications where either $a$ or $b$ or both may be infinite, or where either $a$ or $b$ may be a singular point of the S-L equation. A singular S-L system is one for which one or more of the following conditions hold:

1. The interval $[a, b]$ stretches to infinity in either or both directions.
2. Either $p$ or $w$ vanishes at one or both end points $a$ and $b$.
3. The function $q(x)$ is not continuous in $[a, b]$.
4. Any one of the functions $p(x)$, $q(x)$, and $w(x)$ is singular at $a$ or $b$.

Even though the conclusions concerning eigenvalues of a regular S-L system cannot be generalized to the singular S-L system, the orthogonality of eigenfunctions corresponding to different eigenvalues can, as long as the eigenfunctions are square-integrable with weight function $w(x)$:

18.2.3. Box. The eigenfunctions of a singular S-L system are orthogonal if the RHS of (18.10) vanishes.

18.2.4. Example. Bessel functions $J_\nu(x)$ are entire functions. Thus, they are square-integrable in the interval $[0, b]$ for any finite positive $b$. For fixed $\nu$ the DE
$$r^2\frac{d^2u}{dr^2} + r\frac{du}{dr} + (k^2r^2 - \nu^2)u = 0$$
transforms into the Bessel equation $x^2u'' + xu' + (x^2 - \nu^2)u = 0$ if we make the substitution $kr = x$. Thus, the solution of the singular S-L equation (18.12) that is analytic at $r = 0$ and corresponds to the eigenvalue $k^2$ is $u_k(r) = J_\nu(kr)$. For two different eigenvalues, $k_1^2$ and $k_2^2$, the eigenfunctions are orthogonal if the boundary term of (18.10) corresponding to Equation (18.12) vanishes, that is, if $\{r[J_\nu(k_1r)J_\nu'(k_2r) - J_\nu(k_2r)J_\nu'(k_1r)]\}_0^b$
vanishes, which will occur if and only if $J_\nu(k_1b)J_\nu'(k_2b) - J_\nu(k_2b)J_\nu'(k_1b) = 0$. A common choice is to take $J_\nu(k_1b) = 0 = J_\nu(k_2b)$, that is, to take both $k_1b$ and $k_2b$ as (different) roots of the Bessel function of order $\nu$. We thus have $\int_0^b rJ_\nu(k_ir)J_\nu(k_jr)\,dr = 0$ if $k_i$ and $k_j$ are different roots of $J_\nu(kb) = 0$.

The Legendre equation
$$\frac{d}{dx}\left[(1 - x^2)\frac{du}{dx}\right] + \lambda u = 0, \qquad\text{where } -1 < x < 1,$$
is already self-adjoint. Thus, $w(x) = 1$ and $p(x) = 1 - x^2$. The eigenfunctions of this singular S-L system [singular because $p(1) = p(-1) = 0$] are regular at the end points $x = \pm 1$ and are the Legendre polynomials $P_n(x)$ corresponding to $\lambda = n(n + 1)$. The boundary term of (18.10) clearly vanishes at $a = -1$ and $b = +1$. Since the $P_n(x)$ are square-integrable on $[-1, +1]$, we obtain the familiar orthogonality relation: $\int_{-1}^{+1} P_n(x)P_m(x)\,dx = 0$ if $m \ne n$.

The Hermite DE is
$$u'' - 2xu' + \lambda u = 0. \tag{18.13}$$
It is transformed into an S-L system if we multiply it by $w(x) = e^{-x^2}$. The resulting S-L equation is
$$\frac{d}{dx}\left[e^{-x^2}\frac{du}{dx}\right] + \lambda e^{-x^2}u = 0. \tag{18.14}$$
The boundary term corresponding to the two eigenfunctions $u_1(x)$ and $u_2(x)$ having the respective eigenvalues $\lambda_1$ and $\lambda_2 \ne \lambda_1$ is $\{e^{-x^2}[u_1(x)u_2'(x) - u_2(x)u_1'(x)]\}_a^b$. This vanishes for arbitrary $u_1$ and $u_2$ (because they are Hermite polynomials) if $a = -\infty$ and $b = +\infty$. The function $u$ is an eigenfunction of (18.14) corresponding to the eigenvalue $\lambda$ if and only if it is a solution of (18.13). Solutions of this DE corresponding to $\lambda = 2n$ are the Hermite polynomials $H_n(x)$ discussed in Chapter 7. We can therefore write $\int_{-\infty}^{+\infty} e^{-x^2}H_n(x)H_m(x)\,dx = 0$ if $m \ne n$. This orthogonality relation was also derived in Chapter 7. ■

18.3 Other Properties of Sturm-Liouville Systems

The S-L problem is central to the solution of many DEs in mathematical physics. In some cases the S-L equation has a direct bearing on the physics.
For example, the eigenvalue $\lambda$ may correspond to the orbital angular momentum of an electron in an atom (see the treatment of spherical harmonics in Chapter 12) or to the energy levels of a particle in a potential (see Example 14.5.2). In many cases, then, it is worthwhile to gain some knowledge of the behavior of an S-L system in the limit of large $\lambda$, that is, high angular momentum or high energy. Similarly, it is useful to understand the behavior of the solutions for large values of their arguments. We therefore devote this section to a discussion of the behavior of solutions of an S-L system in the limit of large eigenvalues and large independent variable.
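Before turning to asymptotics: orthogonality relations like the Legendre one of the preceding section are easy to test numerically. The sketch below (an illustration, not part of the text) generates $P_n(x)$ from Bonnet's recurrence and integrates with a composite Simpson rule; the value $\int_{-1}^{1}P_n^2\,dx = 2/(2n+1)$, used as a cross-check, is the standard normalization.

```python
import math

def legendre(n, x):
    """P_n(x) via Bonnet's recurrence: (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3

# different eigenvalues -> orthogonal; same index -> 2/(2n+1)
print(simpson(lambda x: legendre(3, x) * legendre(5, x), -1.0, 1.0))  # ~ 0
print(simpson(lambda x: legendre(4, x) ** 2, -1.0, 1.0))              # ~ 2/9
```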
18.3.1 Asymptotic Behavior for Large Eigenvalues

We assume that the S-L operator has the form given in Equation (18.1). This can always be done for an arbitrary second-order linear DE by multiplying it by a proper function (to make it self-adjoint) followed by a Liouville substitution. So, consider an S-L system of the following form:
$$u'' + [\lambda - q(x)]u \equiv u'' + Q(x)u = 0, \qquad\text{where } Q = \lambda - q, \tag{18.15}$$
with separated BCs of (18.2). Let us assume that $Q(x) > 0$ for all $x \in [a, b]$, that is, $\lambda > q(x)$. This is reasonable, since we are interested in very large $\lambda$. The study of the system of (18.15) and (18.2) is simplified if we make the Prüfer substitution
$$u'(x) = R(x, \lambda)\,Q^{1/4}\cos\phi(x, \lambda), \qquad u(x) = R(x, \lambda)\,Q^{-1/4}\sin\phi(x, \lambda), \tag{18.16}$$
where $R(x, \lambda)$ and $\phi(x, \lambda)$ are $\lambda$-dependent functions of $x$. This substitution transforms the S-L equation of (18.15) into a pair of equations (see Problem 18.3):
$$\frac{d\phi}{dx} = \sqrt{\lambda - q(x)} - \frac{q'}{4[\lambda - q(x)]}\sin 2\phi, \qquad \frac{dR}{dx} = \frac{Rq'}{4[\lambda - q(x)]}\cos 2\phi. \tag{18.17}$$
The function $R(x, \lambda)$ is assumed to be positive, because any negativity of $u$ can be transferred to the phase $\phi(x, \lambda)$. Also, $R$ cannot be zero at any point of $[a, b]$, because then both $u$ and $u'$ would vanish at that point, and, by Lemma 13.3.3, $u(x) \equiv 0$. Equation (18.17) is very useful in discussing the asymptotic behavior of solutions of S-L systems both when $\lambda \to \infty$ and when $x \to \infty$. Before we discuss such asymptotics, we need to make a digression.

It is often useful to have a notation for the behavior of a function $f(x, \lambda)$ for large $\lambda$ and all values of $x$. If the function remains bounded for all values of $x$ as $\lambda \to \infty$, we write $f(x, \lambda) = O(1)$. Intuitively, this means that as $\lambda$ gets larger and larger, the magnitude of the function $f(x, \lambda)$ remains of order 1. In other words, for no value of $x$ is $\lim_{\lambda\to\infty} f(x, \lambda)$ infinite. If $\lambda^n f(x, \lambda) = O(1)$, then we can write $f(x, \lambda) = O(1)/\lambda^n$. This means that as $\lambda$ tends to infinity, $f(x, \lambda)$ goes to zero as fast as $1/\lambda^n$ does. Sometimes this is written as $f(x, \lambda) = O(\lambda^{-n})$.
Some properties of $O(1)$ are as follows:

1. If $a$ is a finite real number, then $O(1) + a = O(1)$.
2. $O(1) + O(1) = O(1)$, and $O(1)O(1) = O(1)$.
3. For finite $a$ and $b$, $\int_a^b O(1)\,dx = O(1)$.
4. If $r$ and $s$ are real numbers with $r \le s$, then $O(1)\lambda^r + O(1)\lambda^s = O(1)\lambda^s$.
5. If $g(x)$ is any bounded function of $x$, then a Taylor series expansion yields
$$[\lambda + g(x)]^r = \lambda^r\left[1 + \frac{g(x)}{\lambda}\right]^r = \lambda^r\left\{1 + r\frac{g(x)}{\lambda} + \frac{r(r - 1)}{2}\left[\frac{g(x)}{\lambda}\right]^2 + \frac{O(1)}{\lambda^3}\right\}$$
$$= \lambda^r + rg(x)\lambda^{r-1} + O(1)\lambda^{r-2} = \lambda^r + O(1)\lambda^{r-1} = O(1)\lambda^r.$$

Returning to Equation (18.17) and expanding its RHSs using property 5, we obtain
$$\frac{d\phi}{dx} = \sqrt{\lambda} + \frac{O(1)}{\sqrt{\lambda}}, \qquad \frac{dR}{dx} = \frac{O(1)}{\lambda}.$$
A Taylor series expansion of $\phi(x, \lambda)$ and $R(x, \lambda)$ about $x = a$ then yields
$$\phi(x, \lambda) = \phi(a, \lambda) + (x - a)\sqrt{\lambda} + \frac{O(1)}{\sqrt{\lambda}}, \qquad R(x, \lambda) = R(a, \lambda) + \frac{O(1)}{\lambda} \tag{18.18}$$
for $\lambda \to \infty$. These results are useful in determining the behavior of $\lambda_n$ for large $n$. As an example, we use (18.2) and (18.16) to write
$$-\frac{\alpha_1}{\beta_1} = \frac{u'(a)}{u(a)} = \frac{R(a, \lambda)Q^{1/4}(a, \lambda)\cos[\phi(a, \lambda)]}{R(a, \lambda)Q^{-1/4}(a, \lambda)\sin[\phi(a, \lambda)]} = Q^{1/2}(a, \lambda)\cot[\phi(a, \lambda)],$$
where we have assumed that $\beta_1 \ne 0$. If $\beta_1 = 0$, we can take the ratio $\beta_1/\alpha_1$, which is finite because at least one of the two constants must be different from zero. Let $A = -\alpha_1/\beta_1$ and write $\cot[\phi(a, \lambda)] = A/\sqrt{\lambda - q(a)}$. Similarly, $\cot[\phi(b, \lambda)] = B/\sqrt{\lambda - q(b)}$, where $B = -\alpha_2/\beta_2$. Let us concentrate on the $n$th eigenvalue and write
$$\phi(a, \lambda_n) = \cot^{-1}\frac{A}{\sqrt{\lambda_n - q(a)}}, \qquad \phi(b, \lambda_n) = \cot^{-1}\frac{B}{\sqrt{\lambda_n - q(b)}}.$$
For large $\lambda_n$ the argument of $\cot^{-1}$ is small. Therefore, we can expand the RHS in a Taylor series about zero:
$$\cot^{-1}\varepsilon = \cot^{-1}(0) - \varepsilon + \cdots = \frac{\pi}{2} - \varepsilon + \cdots = \frac{\pi}{2} + \frac{O(1)}{\sqrt{\lambda_n}}$$
for $\varepsilon = O(1)/\sqrt{\lambda_n}$. It follows that
$$\phi(a, \lambda_n) = \frac{\pi}{2} + \frac{O(1)}{\sqrt{\lambda_n}}, \qquad \phi(b, \lambda_n) = \frac{\pi}{2} + n\pi + \frac{O(1)}{\sqrt{\lambda_n}}. \tag{18.19}$$
The term $n\pi$ appears in (18.19) because, by Theorem 18.1.9, the $n$th eigenfunction has $n$ zeros between $a$ and $b$. Since $u = RQ^{-1/4}\sin\phi$, this means that $\sin\phi$ must go through $n$ zeros as $x$ goes from $a$ to $b$. Thus, at $x = b$ the phase $\phi$ must be $n\pi$ larger than at $x = a$. Substituting $x = b$ in the first equation of (18.18), with $\lambda \to \lambda_n$, and using (18.19), we obtain
$$\frac{\pi}{2} + n\pi + \frac{O(1)}{\sqrt{\lambda_n}} = \frac{\pi}{2} + \frac{O(1)}{\sqrt{\lambda_n}} + (b - a)\sqrt{\lambda_n} + \frac{O(1)}{\sqrt{\lambda_n}},$$
or
$$(b - a)\sqrt{\lambda_n} = n\pi + \frac{O(1)}{\sqrt{\lambda_n}}. \tag{18.20}$$
One consequence of this result is that $\lim_{n\to\infty} n\lambda_n^{-1/2} = (b - a)/\pi$. Thus $\sqrt{\lambda_n} = c_nn$, where $\lim_{n\to\infty} c_n = \pi/(b - a)$, and Equation (18.20) can be rewritten as
$$\sqrt{\lambda_n} = \frac{n\pi}{b - a} + \frac{O(1)}{c_nn} = \frac{n\pi}{b - a} + \frac{O(1)}{n}. \tag{18.21}$$
This equation describes the asymptotic behavior of eigenvalues. The following theorem, stated without proof, describes the asymptotic behavior of eigenfunctions.

18.3.1. Theorem. Let $\{u_n(x)\}_{n=0}^\infty$ be the normalized eigenfunctions of the regular S-L system given by Equations (18.15) and (18.2) with $\beta_1\beta_2 \ne 0$. Then, for $n \to \infty$,
$$u_n(x) = \sqrt{\frac{2}{b - a}}\cos\frac{n\pi(x - a)}{b - a} + \frac{O(1)}{n}. \tag{18.22}$$

18.3.2. Example. Let us derive an asymptotic formula for the Legendre polynomials $P_n(x)$. We first make the Liouville substitution to transform the Legendre DE $[(1 - x^2)P_n']' + n(n + 1)P_n = 0$ into
$$\frac{d^2v}{dt^2} + [\lambda_n - Q(t)]v = 0, \qquad\text{where } \lambda_n = n(n + 1). \tag{18.23}$$
Here $p(x) = 1 - x^2$ and $w(x) = 1$, so $t = \int^x ds/\sqrt{1 - s^2} = \cos^{-1}x$, or $x(t) = \cos t$, and
$$P_n(x(t)) = v(t)[1 - x^2(t)]^{-1/4} = \frac{v(t)}{\sqrt{\sin t}}.$$
For large $n$ we can neglect $Q(t)$, make the approximation $\lambda_n \approx (n + \tfrac12)^2$, and write $\ddot v + (n + \tfrac12)^2v = 0$, whose general solution is $v(t) = A\cos[(n + \tfrac12)t + \alpha]$, where $A$ and $\alpha$ are arbitrary constants. Substituting this solution in the expression for $P_n$ above yields $P_n(\cos t) = A\cos[(n + \tfrac12)t + \alpha]/\sqrt{\sin t}$. To determine $\alpha$ we note that $P_n(0) = 0$ if $n$ is odd. Thus, if we let $t = \pi/2$, the cosine term vanishes for odd $n$ if $\alpha = -\pi/4$. Thus, the general asymptotic formula for Legendre polynomials is
$$P_n(\cos t) = \frac{A}{\sqrt{\sin t}}\cos\left[\left(n + \frac12\right)t - \frac{\pi}{4}\right]. \qquad\blacksquare$$

18.3.2 Asymptotic Behavior for Large x

The Liouville and Prüfer substitutions are useful in investigating the behavior of the solutions of S-L systems for large $x$ as well. The general procedure is to transform the DE into the form of Equation (18.8) by the Liouville substitution, then make the Prüfer substitution of (18.16) to obtain two DEs in the form of (18.17). Solving Equation (18.17) when $x \to \infty$ determines the behavior of $\phi$ and $R$ and, subsequently, of $u$, the solution. Problem 18.4 illustrates this procedure for the Bessel functions. We simply quote the results:
$$J_\nu(x) = \sqrt{\frac{2}{\pi x}}\cos\left[x - \left(\nu + \frac12\right)\frac{\pi}{2} + \frac{\nu^2 - 1/4}{2x}\right] + \frac{O(1)}{x^{5/2}},$$
$$Y_\nu(x) = \sqrt{\frac{2}{\pi x}}\sin\left[x - \left(\nu + \frac12\right)\frac{\pi}{2} + \frac{\nu^2 - 1/4}{2x}\right] + \frac{O(1)}{x^{5/2}}.$$
These two relations easily yield the asymptotic expressions for the Hankel functions:
$$H_\nu^{(1)}(x) = J_\nu(x) + iY_\nu(x) = \sqrt{\frac{2}{\pi x}}\exp\left\{i\left[x - \left(\nu + \frac12\right)\frac{\pi}{2} + \frac{\nu^2 - 1/4}{2x}\right]\right\} + \frac{O(1)}{x^{5/2}},$$
$$H_\nu^{(2)}(x) = J_\nu(x) - iY_\nu(x) = \sqrt{\frac{2}{\pi x}}\exp\left\{-i\left[x - \left(\nu + \frac12\right)\frac{\pi}{2} + \frac{\nu^2 - 1/4}{2x}\right]\right\} + \frac{O(1)}{x^{5/2}}.$$
If the last term in the exponent, which vanishes as $x \to \infty$, is ignored, the asymptotic expression for $H_\nu^{(1)}(x)$ matches what was obtained in Chapter 15 using the method of steepest descent.
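Both asymptotic formulas of this section can be tested numerically. In the sketch below (illustrative, not from the text), $P_n$ comes from Bonnet's recurrence, and $J_0$ from Bessel's integral representation $J_n(x) = \frac{1}{\pi}\int_0^\pi\cos(n\theta - x\sin\theta)\,d\theta$ (valid for integer $n$); the amplitude $A = \sqrt{2/(n\pi)}$ in the Legendre formula is the standard normalization, which the text leaves undetermined.

```python
import math

def legendre(n, x):
    """P_n(x) by Bonnet's recurrence (n >= 1)."""
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def j_exact(n, x, steps=4000):
    """J_n(x), integer n, from Bessel's integral via composite Simpson."""
    h = math.pi / steps
    f = lambda t: math.cos(n * t - x * math.sin(t))
    s = f(0.0) + f(math.pi) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, steps))
    return s * h / (3 * math.pi)

# Legendre asymptotics: P_n(cos t) ~ sqrt(2/(n pi sin t)) cos((n + 1/2) t - pi/4)
n, t = 50, 1.0
p_asym = math.sqrt(2 / (n * math.pi * math.sin(t))) * math.cos((n + 0.5) * t - math.pi / 4)
print(legendre(n, math.cos(t)), p_asym)   # close for large n

# Bessel asymptotics with the (nu^2 - 1/4)/(2x) phase correction, for nu = 0
x = 30.0
j_asym = math.sqrt(2 / (math.pi * x)) * math.cos(x - math.pi / 4 - 0.25 / (2 * x))
print(j_exact(0, x), j_asym)              # close for large x
```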
18.4 Problems

18.1. Show that the Liouville substitution transforms regular S-L systems into regular S-L systems and separated and periodic BCs into separated and periodic BCs, respectively.

18.2. Let $u_1(x)$ and $u_2(x)$ be transformed, respectively, into $v_1(t)$ and $v_2(t)$ by the Liouville substitution. Show that the inner product on $[a, b]$ with weight function $w(x)$ is transformed into the inner product on $[0, c]$ with unit weight, where $c = \int_a^b\sqrt{w/p}\,dx$.

18.3. Derive Equation (18.17) from (18.15) using the Prüfer substitution.

18.4. (a) Show that the Liouville substitution transforms the Bessel DE into
$$\frac{d^2v}{dt^2} + \left(k^2 - \frac{\nu^2 - 1/4}{t^2}\right)v = 0.$$
(b) Find the equations obtained from the Prüfer substitution, and show that for large $x$ these equations reduce to
$$\phi' = k\left(1 - \frac{a}{2k^2x^2}\right) + \frac{O(1)}{x^3}, \qquad R' = R\,\frac{O(1)}{x^3},$$
where $a = \nu^2 - \tfrac14$.
(c) Integrate these equations from $x$ to $b > x$ and take the limit as $b \to \infty$ to get
$$\phi(x) = \phi_\infty + kx + \frac{a}{2kx} + \frac{O(1)}{x^2}, \qquad R(x) = R_\infty + \frac{O(1)}{x^2},$$
where $\phi_\infty = \lim_{b\to\infty}[\phi(b) - kb]$ and $R_\infty = \lim_{b\to\infty}R(b)$.
(d) Substitute these and the appropriate expression for $Q^{-1/4}$ in Equation (18.16) and show that
$$v(x) = \frac{R_\infty}{\sqrt{k}}\cos\left(kx - kx_\infty + \frac{\nu^2 - 1/4}{2kx}\right) + \frac{O(1)}{x^2},$$
where $kx_\infty \equiv \pi/2 - \phi_\infty$.
(e) Choose $R_\infty = \sqrt{2/\pi}$ for all solutions of the Bessel DE, and let $kx_\infty = (\nu + \tfrac12)\pi/2$ and $kx_\infty = (\nu + \tfrac12)\pi/2 + \pi/2$ for the Bessel functions $J_\nu(x)$ and the Neumann functions $Y_\nu(x)$, respectively, and find the asymptotic behavior of these two functions.
Additional Reading

1. Birkhoff, G. and Rota, G.-C. Ordinary Differential Equations, 3rd ed., Wiley, 1978. Has a good discussion of Sturm-Liouville differential equations and their asymptotic behavior.
2. Boccara, N. Functional Analysis, Academic Press, 1990. Discusses the Sturm-Liouville operators in the same spirit as presented in this chapter.
3. Hellwig, G. Differential Operators of Mathematical Physics, Addison-Wesley, 1967. An oldie, but goodie! It gives a readable account of Sturm-Liouville systems.
19 Sturm-Liouville Systems: Examples

Chapter 12 showed how the solution of many PDEs can be written as the product of the solutions of the separated ODEs. These DEs are usually of Sturm-Liouville type. We saw this in the construction of spherical harmonics. In this chapter, consisting mainly of illustrative examples, we shall consider the use of other coordinate systems and construct solutions to DEs as infinite series expansions in terms of S-L eigenfunctions.

19.1 Expansions in Terms of Eigenfunctions

Central to the expansion of solutions in terms of S-L eigenfunctions is the question of their completeness. This completeness was established for a regular S-L system in Theorem 18.1.9. We shall shortly state an analogous theorem (without proof) that establishes the completeness of the eigenfunctions of more general S-L systems. This theorem requires the following generalization of the separated and the periodic BCs:
$$R_1u \equiv \alpha_{11}u(a) + \alpha_{12}u'(a) + \alpha_{13}u(b) + \alpha_{14}u'(b) = 0,$$
$$R_2u \equiv \alpha_{21}u(a) + \alpha_{22}u'(a) + \alpha_{23}u(b) + \alpha_{24}u'(b) = 0, \tag{19.1}$$
where the $\alpha_{ij}$ are numbers such that the rank of the following matrix is 2:
$$\alpha = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \alpha_{13} & \alpha_{14} \\ \alpha_{21} & \alpha_{22} & \alpha_{23} & \alpha_{24} \end{pmatrix}.$$
The separated BCs correspond to the case for which $\alpha_{11} = \alpha_1$, $\alpha_{12} = \beta_1$, $\alpha_{23} = \alpha_2$, and $\alpha_{24} = \beta_2$, with all other $\alpha_{ij}$ zero. Similarly, the periodic BC is a special case for which $\alpha_{11} = -\alpha_{13} = \alpha_{22} = -\alpha_{24} = 1$, with all other $\alpha_{ij}$ zero. It is easy to
verify that the rank of the matrix $\alpha$ is 2 for these two special cases. Let
$$\mathcal{U} = \{u \in \mathcal{C}^2[a, b] \mid R_ju = 0 \text{ for } j = 1, 2\} \tag{19.2}$$
be a subspace of $\mathcal{L}^2_w(a, b)$, and, to assure the vanishing of the RHS of the Lagrange identity, assume that the following equality holds:
$$p(b)\det\begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{pmatrix} = p(a)\det\begin{pmatrix} \alpha_{13} & \alpha_{14} \\ \alpha_{23} & \alpha_{24} \end{pmatrix}. \tag{19.3}$$
We are now ready to consider the theorem (for a proof, see [Hell 67, Chapter 7]).

19.1.1. Theorem. The eigenfunctions $\{u_n(x)\}_{n=1}^\infty$ of an S-L system consisting of the S-L equation $(pu')' + (\lambda w - q)u = 0$ and the BCs of (19.1) form a complete basis of the subspace $\mathcal{U}$ of $\mathcal{L}^2_w(a, b)$ described in (19.2). The eigenvalues are real and countably infinite, and each one has a multiplicity of at most 2. They can be ordered according to size, $\lambda_1 \le \lambda_2 \le \cdots$, and their only limit point is $+\infty$.

First note that Equation (19.3) contains both separated and periodic BCs as special cases (Problem 19.1). In the case of periodic BCs, we assume that $p(a) = p(b)$. Thus, all the eigenfunctions discussed so far are covered by Theorem 19.1.1. Second, the orthogonality of eigenfunctions corresponding to different eigenvalues and the fact that there are infinitely many distinct eigenvalues assure the existence of infinitely many eigenfunctions. Third, the eigenfunctions form a basis of $\mathcal{U}$ and not the whole $\mathcal{L}^2_w(a, b)$: only those functions $u \in \mathcal{L}^2_w(a, b)$ that satisfy the BCs in (19.1) are expandable in terms of the $u_n(x)$. Finally, the last statement of Theorem 19.1.1 is a repetition of part of Theorem 18.1.9, but it is included because the conditions under which Theorem 19.1.1 holds are more general than those applying to Theorem 18.1.9.

Part II discussed orthogonal functions in detail and showed how other functions can be expanded in terms of them. However, the procedure used in Part II was ad hoc from a logical standpoint.
After all, the orthogonal polynomials were invented by nineteenth-century mathematical physicists who, in their struggle to solve the PDEs of physics using the separation of variables, came across various ODEs of the second order, all of which were recognized later as S-L systems. From a logical standpoint, therefore, this chapter should precede Part II. But the order of the chapters was based on clarity and ease of presentation and the fact that the machinery of differential equations was a prerequisite for such a discussion.

Theorem 19.1.1 is the important link between the algebraic and the analytic machinery of differential equation theory. This theorem puts at our disposal concrete mathematical functions that are calculable to any desired accuracy (on a computer, say) and can serve as basis functions for all the expansions described in Part II. The remainder of this chapter is devoted to solving some PDEs of mathematical physics using the separation of variables and Theorem 19.1.1.
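As a quick illustration (not from the text), the rank condition on the matrix $\alpha$ and the equality (19.3) can be verified directly for the separated and periodic special cases; the numerical constants below are hypothetical placeholders.

```python
def rank_2x4(M):
    """Rank of a 2x4 matrix: it is 2 iff some 2x2 minor is nonzero."""
    if all(v == 0 for row in M for v in row):
        return 0
    for i in range(4):
        for j in range(i + 1, 4):
            if M[0][i] * M[1][j] - M[0][j] * M[1][i] != 0:
                return 2
    return 1

def condition_19_3(M, p_a, p_b):
    """Eq. (19.3): p(b) det(columns 1,2) == p(a) det(columns 3,4)."""
    d_left = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    d_right = M[0][2] * M[1][3] - M[0][3] * M[1][2]
    return p_b * d_left == p_a * d_right

separated = [[1, 2, 0, 0], [0, 0, 3, 4]]   # alpha1=1, beta1=2, alpha2=3, beta2=4 (placeholders)
periodic = [[1, 0, -1, 0], [0, 1, 0, -1]]

print(rank_2x4(separated), rank_2x4(periodic))  # 2 2
print(condition_19_3(separated, 5, 7))          # True: both 2x2 minors vanish
print(condition_19_3(periodic, 5, 5))           # True only when p(a) = p(b)
print(condition_19_3(periodic, 5, 7))           # False
```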
Figure 19.1. A rectangular conducting box of which one face is held at the potential $f(x, y)$ and the other faces are grounded.

19.2 Separation in Cartesian Coordinates

Problems most suitable for Cartesian coordinates have boundaries with rectangular symmetry, such as boxes or planes.

19.2.1. Example. RECTANGULAR CONDUCTING BOX
Consider a rectangular conducting box with sides $a$, $b$, and $c$ (see Figure 19.1). All faces are held at zero potential except the top face, whose potential is given by a function $f(x, y)$. Let us find the potential at all points inside the box. The relevant PDE for this situation is Laplace's equation, $\nabla^2\Phi = 0$. Writing $\Phi(x, y, z)$ as a product of three functions, $\Phi(x, y, z) = X(x)Y(y)Z(z)$, yields three ODEs (see Problem 19.2):
$$\frac{d^2X}{dx^2} + \lambda X = 0, \qquad \frac{d^2Y}{dy^2} + \mu Y = 0, \qquad \frac{d^2Z}{dz^2} + \nu Z = 0, \tag{19.4}$$
where $\lambda + \mu + \nu = 0$. The vanishing of $\Phi$ at $x = 0$ and $x = a$ means that
$$\Phi(0, y, z) = X(0)Y(y)Z(z) = 0 \;\;\forall\, y, z \;\Rightarrow\; X(0) = 0,$$
$$\Phi(a, y, z) = X(a)Y(y)Z(z) = 0 \;\;\forall\, y, z \;\Rightarrow\; X(a) = 0.$$
We thus obtain an S-L system, $X'' + \lambda X = 0$, $X(0) = 0 = X(a)$, whose BC is neither separated nor periodic, but satisfies (19.1) with $\alpha_{11} = \alpha_{23} = 1$ and all other $\alpha_{ij}$ zero. This S-L system has the eigenvalues and eigenfunctions
$$\lambda_n = \left(\frac{n\pi}{a}\right)^2 \quad\text{and}\quad X_n(x) = \sin\left(\frac{n\pi}{a}x\right) \qquad\text{for } n = 1, 2, \ldots.$$
Similarly, the second equation in (19.4) leads to
$$\mu_m = \left(\frac{m\pi}{b}\right)^2 \quad\text{and}\quad Y_m(y) = \sin\left(\frac{m\pi}{b}y\right) \qquad\text{for } m = 1, 2, \ldots.$$
On the other hand, the third equation in (19.4) does not lead to an S-L system, because the BC for the top of the box does not fit (19.1). This is as expected, because the "eigenvalue" $\nu$ is already determined by $\lambda$ and $\mu$. Nevertheless, we can find a solution for that equation. The substitution
$$\gamma_{mn}^2 = \left(\frac{n\pi}{a}\right)^2 + \left(\frac{m\pi}{b}\right)^2$$
changes the $Z$ equation to $Z'' - \gamma_{mn}^2Z = 0$, whose solution, consistent with $Z(0) = 0$, is $Z(z) = C_{mn}\sinh(\gamma_{mn}z)$. We note that $X(x)$ and $Y(y)$ are functions satisfying $R_1X = 0 = R_2X$. Thus, by Theorem 19.1.1, they can be written as linear combinations of the $X_n(x)$ and $Y_m(y)$: $X(x) = \sum_{n=1}^\infty A_n\sin(n\pi x/a)$ and $Y(y) = \sum_{m=1}^\infty B_m\sin(m\pi y/b)$. Consequently, the most general solution can be expressed as
$$\Phi(x, y, z) = \sum_{n=1}^\infty\sum_{m=1}^\infty A_{mn}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right)\sinh(\gamma_{mn}z),$$
where $A_{mn} = A_nB_mC_{mn}$. To specify $\Phi$ completely, we must determine the arbitrary constants $A_{mn}$. This is done by imposing the remaining BC, $\Phi(x, y, c) = f(x, y)$, yielding the identity
$$f(x, y) = \sum_{n=1}^\infty\sum_{m=1}^\infty A_{mn}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right)\sinh(\gamma_{mn}c) \equiv \sum_{n=1}^\infty\sum_{m=1}^\infty B_{mn}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right),$$
where $B_{mn} \equiv A_{mn}\sinh(\gamma_{mn}c)$. This is a two-dimensional Fourier series (see Chapter 8) whose coefficients are given by
$$B_{mn} = \frac{4}{ab}\int_0^a dx\int_0^b dy\, f(x, y)\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right). \qquad\blacksquare$$

Pierre Simon de Laplace (1749-1827) was a French mathematician and theoretical astronomer who was so famous in his own time that he was known as the Newton of France. His main interests throughout his life were celestial mechanics, the theory of probability, and personal advancement. At the age of 24 he was already deeply engaged in the detailed application of Newton's law of gravitation to the solar system as a whole, in which the planets and their satellites are not governed by the sun alone, but interact with one another in a bewildering variety of ways. Even Newton had been of the opinion that divine intervention would occasionally be needed to prevent this complex mechanism from degenerating into chaos. Laplace decided to seek reassurance elsewhere, and succeeded in proving that the ideal solar system of mathematics is a stable dynamical system that will endure unchanged for all time.
This achievement was only one of the long series of triumphs recorded in his monumental treatise Mécanique Céleste (published in five volumes from 1799 to 1825), which summed up the work on gravitation of several generations of illustrious mathematicians. Unfortunately for his later reputation, he omitted all reference to the discoveries of his predecessors and contemporaries, and left it to be inferred that the ideas were entirely his own. Many anecdotes are associated with this work. One of the best known describes the occasion on which Napoleon tried to get a
rise out of Laplace by protesting that he had written a huge book on the system of the world without once mentioning God as the author of the universe. Laplace is supposed to have replied, "Sire, I had no need of that hypothesis." The principal legacy of the Mécanique Céleste to later generations lay in Laplace's wholesale development of potential theory, with its far-reaching implications for a dozen different branches of physical science ranging from gravitation and fluid mechanics to electromagnetism and atomic physics. Even though he lifted the idea of the potential from Lagrange without acknowledgment, he exploited it so extensively that ever since his time the fundamental equation of potential theory has been known as Laplace's equation. After the French Revolution, Laplace's political talents and greed for position came to full flower. His compatriots speak ironically of his "suppleness" and "versatility" as a politician. What this really means is that each time there was a change of regime (and there were many), Laplace smoothly adapted himself by changing his principles, back and forth between fervent republicanism and fawning royalism, and each time he emerged with a better job and grander titles. He has been aptly compared with the apocryphal Vicar of Bray in English literature, who was twice a Catholic and twice a Protestant. The Vicar is said to have replied as follows to the charge of being a turncoat: "Not so, neither, for if I changed my religion, I am sure I kept true to my principle, which is to live and die the Vicar of Bray."
Laplace's equation describes not only electrostatics, but also heat transfer. When the transfer (diffusion) of heat takes place with the temperature being in- dependent of time, the process is known as steady-state heat transfer. The dif- fusion equation, aT fat = a2 V 2T , becomes Laplace's equation, V 2T = 0, and the technique of the preceding example can be used. It is easy to see that the diffusion equation allows us to perform any linear transformation on T, such as T -> «T +P, and still satisfy that equation. This implies that T can be measured in any scale such as Kelvin, Celsius, and Fahrenheit. steady-state haat-conducting plate haat-conducting plata: steady state 19.2.2. Example. STEADY-STATE HEAT-CONDUCTING PLATE Let us consider a rectangular heat-conducting plate with sides of lengths a and b. Three of the sides are held at T = 0, and the fourth side has a temperaturevariationT = f (x) (see Figure 19.2). The flat faces are insulated. so they cannot lose heat to the surroundings. Assuming a steady-state heat transfer, let us calculate the variation of T over the plate. The problem is two-dimensional. The separation of variables leads to where A+ f.' =O. (19.5) TheXequationandtheBCsT(O, y) = I'(a, y) = oformanS-Lsystemwhoseeigenvalues and eigenfunctions are An = (mr/ap and Xn(x) = sin(mrx/a) forn = 1,2, .... Thus.ac- cordingtoTheorem19.1.1,ageneralX(x) canbewrittenasX(x) =L:~t An sin(nrrxja).
  • 547. 19.2 SEPARATION IN CARTESIAN COOROINATES 529 T=f(x) o· ---x 301 o· f-oE---- a ------;~ Figure 19.2 A heat-conducting rectangular plate. TheY equation, on theother hand, doesnotformanS-L systemdueto thefactthat its "eigenvalue" ispredeterminedbythethirdequationin (19.5).Nevertheless, wecansolvethe equation Y" -(mr/apr = Otoobtainthegeneral solutionY(y) = Aen:Jry/a +Be-mry/a. Since T'[x, 0) =0 'I x, we must have Y(O) = O. This implies thatA+B = 0, which.tnturn, reduces the solution to Y = A sinh(mryja). Thus, the most general solution, consistent with the three Bes T(O, y) = T(a, y) = T(x, 0) = 0, is 00 ri», y) = X(x)Y(y) = L s; sin errx) sinh erry). n=l a a Thefourth Be gives a Fourier series, ~ [ . (nrr)]. (nrr) ~ . (nrr ) f(x) = L- s; smh --;;b sm --;;x sa L- en sin --;;x , 11=1 n=l whose coefficientscanbe determined from ( nrr ) 2 fa (nrr) c, =s; sinh --;;b =;;10 sin --;;x f(x) dx. In particular, if thefourth sideis heldattheconstant temperature To. then 2T. { 4TO c; = --.!l. (.!'-) [1- (_1)n] = nrr a nH 0 if n is odd, if n is even, and we obtain T(x, y) = 4To f 1 sin[(2k + 1)rrx/alsinh[(2k + I)rry/al. n k~O 2k + I sinh[(2k + I)rrb/a] (19.6) Ifthetemperature variation of thefourth sideis of theform f (x) = Tosin(nx/a). then 2To Ina (nrrx) (rrx) 2To (a) en = - sin -- sin - dx = - - 8n 1 = To8n 1 a 0 a- a a 2 ' ,
  • 548. 530 19. STURM-LIOUVILLE SYSTEMS: EXAMPLES and Bn = en/ sinh(mrb/a) = [To/ sinh(mrb/a)]8n," and we have T(x ) =To sin(rrx/a) sinh(rry/a). , y 0 sinh(rrb/a) (19.7) conduction 01 heal in a rectangular plate Onlyone term of the seriessurvives in this case becausethe variation on the fourth side happens tohe oneof theharmonics of theexpansion. ' Note that the temperature variations given by (19,6) and (19.7) are independent of thematerial of theplatebecausewe are dealingwith a steadystate. The conductivity of the material is a factor only in the process of heat transfer, leading to the steady state. Onceequilibrium hasbeenreached, thedistribution of temperature will bethesameforall materials. II The preceding two examples concerned themselves with static sitoations. The remaining examples of this section are drawn from the (time-dependent) diffusion equation, the Schrodinger equation, and the wave equation. 19.2.3. Example. CONDUCTION OF HEAT IN A RECTANGULAR PLATE Considera rectangularbeat-conductingplate with sides oflength a and b all held at T = O. Assumethat attimet = 0 thetemperature has a distribution function I(x, y). Let us find thevariation of temperature forall points(x, y) atall timest > O. Thediffusion equation forthisproblem is aT =k2V2T = k2 (a 2; + a 2;). at ax ay A separation of variables, T(x, y, t) = X(x)Y(y)g(t), leads to three DEs: d2 y -2 +iLY=O, dy The BCs T(O, y, t) = T(a, y, r) = T(x, 0, t) = T(x, b, r) = 0, together with the three ODEs,giveriseto two S-L systems.The solutions to bothof theseare easilyfound: _ (mr)2 An- - a _ (mrr)2 iLm- b and and Xn(X) = sin (n;x) Ym(y) = siu erry) for n = 1,2, ... , for m = 1,2, .... Thesegiveriseto thegeneral solutions 00 X(x) =~::>n sin ex), n=I With Ymn == k 2Jr2(n 2/a2 + m2/h2), the solutionto the g equation can be expressed as g(t) = Cmne-Ymnt. Putting everything together, we obtain 00 00 (nrr) (mrr) T(x, y, t) = E L Amne-Ymnt sin --;;x sin bY , n=lm=l
  • 549. quantum particle ina box 19.2 SEPARATION IN CARTESIAN COOROINATES 531 where Amn = AnBmCmn is an arbitrary constant. To determine it, we imposetheinitial condition T(x, y, 0) = f(x, y). This yields 00 00 f(x,y) = L L Amn sin C:x ) sin C"y), n=lm=l which determines the coefficientsAmn: Amn = a: fa" dx t dyf(x, y) sin C:x) sin C"y). .. 19.2.4. Example. QUANTUM PARTICLEIN A BOX Thebehavior of anatomic particle of massJl- confined in arectangular box withsidesa, b, andc (aninfinite three-dimensionalpotential well) is governedbytheSchrodingerequation fora freeparticle, ina1/l = _ n 2 (a 2 1/1 + a 2 1/1 + a 2 1/1 ) , at 2/L ax2 ai az2 and the Bethat 1/I(x,y, Z, t) vanishes at all sides of the boxfar all time. A separation of vanabies 1/1(x, y, z, t) = X(x)Y(y)Z(z)T(t) yields the ODEs d2X -2 +).X=O, dx dT -+iwT=O, dt where d2y -2 +uY=O, dy n ni ea -().+u+v). 2/L d2Z -2 +vX=O, dz The spatial equations, togetherwiththe Bes 1/1(0, y, Z, t) = 1/I(a, y, Z, t) =0 1/1(x, 0, z,t) = 1/I(x,b, z,t) = 0 1/1(x, y, 0, t) = 1/I(x,y, c, t) = 0 =} X(O) = 0 = X(a), =} YeO) = 0 = Y(b), =} Z(O) = 0 = Z(c), leadto threeS-L systems, whose solutionsareeasily found: Xn(X) = sin (n;x), Ym(y) = sin (m b " y), ZI(Z) = sin ez), (n ")2 An = a ' (m ")2 Um = b ' VI = C:)2 for n = 1,2, ... , for m = 1,2, ...• for I = 1,2, .... Thetimeequation, on theother hand, basa solutionof theform where Wimn = 2: [C)2+ C")2+ en· The solution of theSchrodinger equation that is consistent withtheBes is therefore 1fr(x, y, Z, r) = f: Almne-iWlmnt sin (n;x) sin (mb 7r Y) sin c:z). 1,m,n=1
  • 550. (19.8) (19.9) 532 19. STURM-LIOUVILLE SYSTEMS: EXAMPLES The constants Almn are determined by the initial shape, 1/1(x, y, Z, 0) of the wave function. The energy of the particle is /i2,,2 (n2 m 2 1 2 ) E=nwlmn = - - -2+2+2 . 2/L a b c Each set of three positive integers (n, m,l) represents a state of the particle. For a cube, a = b = c == L, and the energy of the particle is /i2,,2 /i2,,2 E = --2(n 2 +m 2 +z2) = ------Z-/3 (n 2 +m 2 +z2) 2/LL 2/LY where V = L 3 is the volume of the box. The ground stale is (1,1,1), has energy E = 3lt2rr2/2f.l-V2/3,andisnondegenerate(onlyone statecorrespondstothis energy).However, thehigher-levelstatesaredegenerate. Forinstance,thethreedistinctstates(I, 1,2), (I, 2, I), and (2,1,1) all correspondto the same energy, E = 6/i2,,2/2/Ly2/3. The degeneracy increases rapidly with larger values ofn, m, and I. Equation(19.8)canbe writtenas 2/LEy2/3 where R 2 = h,2 1r2 This looks like the equation of a sphere in the nml-space.1f R is large, the number ofstates contained within the sphere ofradius R (the number ofstates with energy less than or equal to E) is simply the volume of the first octant! of the sphere. If N is the number of such states, we have density ofstates Thus the density of states (the number of states per unit volume) is _ N _::. (~)3/2 3/2 n - y - 6 /i2,,2 E. This is an important formula in solid-state physics, because the energy E is (with minor Fermi energy modifications required by spin) the Fermi energy. If the Fermi energy is denoted by Ef' Equation (19.9) gives Ef = an2j3 where a is some constant. l1li In the preceding examples the time variation is given by a first derivative. Thns, as far as time is concerned, we have a FODE. Itfollows that the initial specification of the physical quantity of interest (temperature T or Schrodinger wave functiou 1/1) is sufficient to determine the solution uniquely. 
A second kind of time-dependent PDE occurring in physics is the wave equation, which contains time derivatives of the second order. Thus, there are two arbitrary parameters in the general solution. To determine these, we expect two initial conditions. For example, if the wave is standing, as in a rope clamped at both
¹This is because $n$, $m$, and $l$ are all positive.
• 551. 19.2 SEPARATION IN CARTESIAN COORDINATES 533
ends, the boundary conditions are not sufficient to determine the wave function uniquely. One also needs to specify the initial (transverse) velocity of each point of the rope. For traveling waves, specification of the wave shape and velocity shape is not as important as the mode of propagation. For instance, in the theory of wave guides, after the time variation is separated, a particular time variation, such as $e^{+i\omega t}$, and a particular direction for the propagation of the wave, say the $z$-axis, are chosen. Thus, if $u$ denotes a component of the electric or the magnetic field, we can write
$$u(x, y, z, t) = \psi(x, y)\, e^{i(\omega t \pm kz)},$$
where $k$ is the wave number. The wave equation then reduces to
$$\frac{\partial^2 \psi}{\partial x^2} + \frac{\partial^2 \psi}{\partial y^2} + \left( \frac{\omega^2}{c^2} - k^2 \right)\psi = 0.$$
Introducing $\gamma^2 = \omega^2/c^2 - k^2$ and the transverse gradient $\nabla_t = (\partial/\partial x, \partial/\partial y)$ and writing the above equation in terms of the full vectors, we obtain
$$(\nabla_t^2 + \gamma^2) \begin{Bmatrix} \mathbf{E} \\ \mathbf{B} \end{Bmatrix} = 0, \qquad \text{where } \begin{Bmatrix} \mathbf{E} \\ \mathbf{B} \end{Bmatrix} = \begin{Bmatrix} \mathbf{E}(x, y) \\ \mathbf{B}(x, y) \end{Bmatrix} e^{i(\omega t \pm kz)}. \tag{19.10}$$
These are the basic equations used in the study of electromagnetic wave guides and resonant cavities. Maxwell's equations in conjunction with Equation (19.10) give the transverse components (components perpendicular to the propagation direction) $\mathbf{E}_t$ and $\mathbf{B}_t$ in terms of the longitudinal components $E_z$ and $B_z$ (see [Lorr 88, Chapter 33]):
$$\gamma^2 \mathbf{E}_t = \nabla_t\left(\frac{\partial E_z}{\partial z}\right) - i\frac{\omega}{c}\,\hat{\mathbf{e}}_z \times (\nabla_t B_z),$$
$$\gamma^2 \mathbf{B}_t = \nabla_t\left(\frac{\partial B_z}{\partial z}\right) + i\frac{\omega}{c}\,\hat{\mathbf{e}}_z \times (\nabla_t E_z). \tag{19.11}$$
Three types of guided waves are usually studied.
1. Transverse magnetic (TM) waves have $B_z = 0$ everywhere. The BC on $\mathbf{E}$ demands that $E_z$ vanish at the conducting walls of the guide.
2. Transverse electric (TE) waves have $E_z = 0$ everywhere. The BC on $\mathbf{B}$ requires that the normal directional derivative of $B_z$ vanish at the walls.
3. Transverse electromagnetic (TEM) waves have $B_z = 0 = E_z$. For a nontrivial solution, Equation (19.11) demands that $\gamma^2 = 0$. This form resembles a free wave with no boundaries.
• 552. 534 19. STURM-LIOUVILLE SYSTEMS: EXAMPLES
We will discuss the TM mode briefly (see any book on electromagnetic theory for further details). The basic equations in this mode are
$$(\nabla_t^2 + \gamma^2) E_z = 0, \qquad B_z = 0, \qquad \gamma^2 \mathbf{E}_t = \nabla_t\left(\frac{\partial E_z}{\partial z}\right), \qquad \gamma^2 \mathbf{B}_t = i\frac{\omega}{c}\,\hat{\mathbf{e}}_z \times (\nabla_t E_z). \tag{19.12}$$
19.2.5. Example. RECTANGULAR WAVE GUIDES
For a wave guide with a rectangular cross section of sides $a$ and $b$ in the $x$ and the $y$ directions, respectively, we have
$$\frac{\partial^2 E_z}{\partial x^2} + \frac{\partial^2 E_z}{\partial y^2} + \gamma^2 E_z = 0.$$
A separation of variables, $E_z(x, y) = X(x)Y(y)$, leads to two S-L systems,
$$\frac{d^2 X}{dx^2} + \lambda X = 0, \quad X(0) = 0 = X(a), \qquad \frac{d^2 Y}{dy^2} + \mu Y = 0, \quad Y(0) = 0 = Y(b),$$
where $\gamma^2 = \lambda + \mu$. These equations have the solutions
$$X_n(x) = \sin\left(\frac{n\pi x}{a}\right), \quad \lambda_n = \left(\frac{n\pi}{a}\right)^2 \quad \text{for } n = 1, 2, \ldots,$$
$$Y_m(y) = \sin\left(\frac{m\pi y}{b}\right), \quad \mu_m = \left(\frac{m\pi}{b}\right)^2 \quad \text{for } m = 1, 2, \ldots.$$
The wave number is given by
$$k_{mn}^2 = \frac{\omega^2}{c^2} - \pi^2\left(\frac{n^2}{a^2} + \frac{m^2}{b^2}\right),$$
which has to be real if the wave is to propagate (an imaginary $k$ leads to exponential decay or growth along the $z$-axis). Thus, there is a cutoff frequency,
$$\omega_{mn} = c\pi \sqrt{\frac{n^2}{a^2} + \frac{m^2}{b^2}} \qquad \text{for } m, n \geq 1,$$
below which the wave cannot propagate through the wave guide. It follows that for a TM wave the lowest frequency that can propagate along a rectangular wave guide is $\omega_{11} = \pi c\sqrt{a^2 + b^2}/(ab)$. The most general solution for $E_z$ is therefore
$$E_z = \sum_{m,n=1}^{\infty} A_{mn} \sin\left(\frac{n\pi x}{a}\right) \sin\left(\frac{m\pi y}{b}\right) e^{i(\omega t \pm k_{mn} z)}.$$
The constants $A_{mn}$ are arbitrary and can be determined from the initial shape of the wave, but that is not commonly done. Once $E_z$ is found, the other components can be calculated using Equation (19.12).
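As a quick illustration of the cutoff condition (a hedged Python sketch, not from the text; the guide dimensions and the choice $c = 1$ are arbitrary), one can tabulate $\omega_{mn}$ and test whether $k_{mn}$ is real:

```python
import math

c = 1.0  # speed of light in illustrative units

def cutoff(n, m, a, b):
    """TM-mode cutoff frequency omega_mn of a rectangular guide (m, n >= 1)."""
    return c * math.pi * math.sqrt((n / a) ** 2 + (m / b) ** 2)

a, b = 2.0, 1.0
# Lowest TM mode: n = m = 1, i.e. omega_11 = c*pi*sqrt(a^2 + b^2)/(a*b).
w11 = cutoff(1, 1, a, b)
assert abs(w11 - c * math.pi * math.sqrt(a ** 2 + b ** 2) / (a * b)) < 1e-12

def k(n, m, omega, a, b):
    """Propagation constant k_mn; None signals an evanescent (cut-off) wave."""
    k2 = (omega / c) ** 2 - (cutoff(n, m, a, b) / c) ** 2
    return math.sqrt(k2) if k2 >= 0 else None

print(w11, k(1, 1, 1.2 * w11, a, b), k(1, 1, 0.8 * w11, a, b))
```

Above the cutoff the wave propagates ($k_{mn}$ real); below it the mode decays exponentially along $z$.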
• 553. 19.3 SEPARATION IN CYLINDRICAL COORDINATES 535
Figure 19.3 A conducting cylindrical can whose top has a potential given by $V(\rho, \varphi)$, with the rest of the surface grounded.
19.3 Separation in Cylindrical Coordinates
When the geometry of the boundaries is cylindrical, the appropriate coordinate system is the cylindrical one. This usually leads to Bessel functions "of some kind." Before working specific examples of cylindrical geometry, let us consider a question that has more general implications. We saw in the previous section that separation of variables leads to ODEs in which certain constants (eigenvalues) appear. Different choices of signs for these constants can lead to different functional forms of the general solution. For example, an equation such as $d^2x/dt^2 - kx = 0$ can have exponential solutions if $k > 0$ or trigonometric solutions if $k < 0$. One cannot a priori assign a specific sign to $k$. Thus, the general form of the solution is indeterminate. However, once the boundary conditions are imposed, the unique solutions will emerge regardless of the initial functional form of the solutions (see [Hass 99] for a thorough discussion of this point).
19.3.1. Example. CONDUCTING CYLINDRICAL CAN
Consider a cylindrical conducting can of radius $a$ and height $h$ (see Figure 19.3). The potential varies at the top face as $V(\rho, \varphi)$, while the lateral surface and the bottom face are grounded. Let us find the electrostatic potential at all points inside the can. A separation of variables transforms Laplace's equation into three ODEs, the first of which is
$$\frac{d}{d\rho}\left(\rho \frac{dR}{d\rho}\right) + \left(k^2 \rho - \frac{m^2}{\rho}\right) R = 0,$$
• 554. 536 19. STURM-LIOUVILLE SYSTEMS: EXAMPLES
where, in anticipation of the correct BCs, we have written the constants as $k^2$ and $-m^2$, with $m$ an integer. The first of these is the Bessel equation, whose general solution can be written as
$$R(\rho) = A J_m(k\rho) + B Y_m(k\rho).$$
The second DE, when the extra condition of periodicity is imposed on the potential, has the general solution
$$S(\varphi) = C \cos m\varphi + D \sin m\varphi.$$
Finally, the third DE has a general solution of the form
$$Z(z) = E e^{kz} + F e^{-kz}.$$
We note that none of the three ODEs lead to an S-L system of Theorem 19.1.1, because the BCs associated with them do not satisfy (19.1). However, we can still solve the problem by imposing the given BCs. The fact that the potential must be finite everywhere inside the can (including at $\rho = 0$) forces $B$ to vanish, because the Neumann function $Y_m(k\rho)$ is not defined at $\rho = 0$. On the other hand, we want $\Phi$ to vanish at $\rho = a$. This gives $J_m(ka) = 0$, which demands that $ka$ be a root of the Bessel function of order $m$. Denoting by $x_{mn}$ the $n$th zero of the Bessel function of order $m$, we have $ka = x_{mn}$, or $k = x_{mn}/a$ for $n = 1, 2, \ldots$. Similarly, the vanishing of $\Phi$ at $z = 0$ implies that $E = -F$, that is,
$$Z(z) = E \sinh\left(\frac{x_{mn} z}{a}\right).$$
We can now multiply $R$, $S$, and $Z$ and sum over all possible values of $m$ and $n$, keeping in mind that negative values of $m$ give terms that are linearly dependent on the corresponding positive values. The result is the so-called Fourier-Bessel series:
$$\Phi(\rho, \varphi, z) = \sum_{m=0}^{\infty} \sum_{n=1}^{\infty} J_m\left(\frac{x_{mn}}{a}\rho\right) \sinh\left(\frac{x_{mn}}{a}z\right) (A_{mn} \cos m\varphi + B_{mn} \sin m\varphi), \tag{19.13}$$
where $A_{mn}$ and $B_{mn}$ are constants to be determined by the remaining BC. To find these constants we use the orthogonality of the trigonometric and Bessel functions.
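Computing the zeros $x_{mn}$ needed in the Fourier-Bessel series is straightforward. A pure-Python sketch (illustrative only; in practice one would use a library routine such as `scipy.special.jn_zeros`) sums the power series of $J_m$ and bisects for a sign change:

```python
import math

def J(m, x, terms=60):
    """Bessel function of integer order m from its power series:
    J_m(x) = sum_k (-1)^k (x/2)^(m+2k) / (k! (m+k)!)."""
    s = 0.0
    for k in range(terms):
        s += ((-1) ** k * (x / 2) ** (m + 2 * k)
              / (math.factorial(k) * math.factorial(m + k)))
    return s

def zero(m, lo, hi, tol=1e-12):
    """Bisect for a zero of J_m bracketed by a sign change on [lo, hi]."""
    flo = J(m, lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if flo * J(m, mid) <= 0:
            hi = mid
        else:
            lo, flo = mid, J(m, mid)
    return 0.5 * (lo + hi)

x01 = zero(0, 2.0, 3.0)   # first zero of J_0, about 2.4048
x11 = zero(1, 3.0, 4.5)   # first zero of J_1, about 3.8317
print(x01, x11)
```

These are the values $x_{01}$ and $x_{11}$ that fix $k = x_{mn}/a$ in the series (19.13).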
For $z = h$, Equation (19.13) reduces to
$$V(\rho, \varphi) = \sum_{m=0}^{\infty} \sum_{n=1}^{\infty} J_m\left(\frac{x_{mn}}{a}\rho\right) \sinh\left(\frac{x_{mn} h}{a}\right) (A_{mn} \cos m\varphi + B_{mn} \sin m\varphi),$$
from which we obtain (for $m \geq 1$; for $m = 0$ the factor $\pi$ is replaced by $2\pi$)
$$A_{mn} = \frac{2}{\pi a^2 J_{m+1}^2(x_{mn}) \sinh(x_{mn} h/a)} \int_0^{2\pi} d\varphi \int_0^a d\rho\, \rho\, V(\rho, \varphi)\, J_m\left(\frac{x_{mn}}{a}\rho\right) \cos m\varphi,$$
$$B_{mn} = \frac{2}{\pi a^2 J_{m+1}^2(x_{mn}) \sinh(x_{mn} h/a)} \int_0^{2\pi} d\varphi \int_0^a d\rho\, \rho\, V(\rho, \varphi)\, J_m\left(\frac{x_{mn}}{a}\rho\right) \sin m\varphi,$$
where we have used the following result, derived in Problem 14.39:
$$\int_0^a \rho\, J_m^2\left(\frac{x_{mn}}{a}\rho\right) d\rho = \frac{a^2}{2}\, J_{m+1}^2(x_{mn}). \tag{19.14}$$
• 555. 19.3 SEPARATION IN CYLINDRICAL COORDINATES 537
For the special but important case of azimuthal symmetry, for which $V$ is independent of $\varphi$, we obtain
$$A_{mn} = \delta_{m0}\, \frac{2}{a^2 J_1^2(x_{0n}) \sinh(x_{0n} h/a)} \int_0^a d\rho\, \rho\, V(\rho)\, J_0\left(\frac{x_{0n}}{a}\rho\right), \qquad B_{mn} = 0.$$
The reason we obtained discrete values for $k$ was the demand that $\Phi$ vanish at $\rho = a$. If we let $a \to \infty$, then $k$ will be a continuous variable, and instead of a sum over $k$, we will obtain an integral. This is completely analogous to the transition from a Fourier series to a Fourier transform, but we will not pursue it further.
19.3.2. Example. CIRCULAR HEAT-CONDUCTING PLATE
Consider a circular heat-conducting plate of radius $a$ whose temperature at time $t = 0$ has a distribution function $f(\rho, \varphi)$. Let us find the variation of $T$ for all points $(\rho, \varphi)$ on the plate for time $t > 0$ when the edge is kept at $T = 0$. This is a two-dimensional problem involving the heat equation,
$$\frac{\partial T}{\partial t} = k^2 \nabla^2 T = k^2 \left[ \frac{1}{\rho} \frac{\partial}{\partial \rho}\left(\rho \frac{\partial T}{\partial \rho}\right) + \frac{1}{\rho^2} \frac{\partial^2 T}{\partial \varphi^2} \right].$$
A separation of variables, $T(\rho, \varphi, t) = R(\rho)S(\varphi)g(t)$, leads to the following ODEs:
$$\frac{dg}{dt} = k^2 \lambda g, \qquad \frac{d^2 S}{d\varphi^2} + \mu S = 0, \qquad \frac{d^2 R}{d\rho^2} + \frac{1}{\rho} \frac{dR}{d\rho} - \left(\frac{\mu}{\rho^2} + \lambda\right) R = 0.$$
To obtain exponential decay rather than growth for the temperature, we demand that $\lambda \equiv -b^2 < 0$. To ensure periodicity (see the discussion at the beginning of this section), we must have $\mu = m^2$, where $m$ is an integer. To have finite $T$ at $\rho = 0$, no Neumann function is to be present. This leads to the following solutions:
$$g(t) = A e^{-k^2 b^2 t}, \qquad S(\varphi) = B \cos m\varphi + C \sin m\varphi, \qquad R(\rho) = D J_m(b\rho).$$
If the temperature is to be zero at $\rho = a$, we must have $J_m(ba) = 0$, or $b = x_{mn}/a$. It follows that the general solution can be written as
$$T(\rho, \varphi, t) = \sum_{m=0}^{\infty} \sum_{n=1}^{\infty} e^{-k^2 (x_{mn}/a)^2 t}\, J_m\left(\frac{x_{mn}}{a}\rho\right) (A_{mn} \cos m\varphi + B_{mn} \sin m\varphi).$$
$A_{mn}$ and $B_{mn}$ can be determined as in the preceding example.
19.3.3. Example. CYLINDRICAL WAVE GUIDE
For a TM wave propagating along the $z$-axis in a hollow circular conductor, we have [see Equation (19.12)]
$$\frac{1}{\rho} \frac{\partial}{\partial \rho}\left(\rho \frac{\partial E_z}{\partial \rho}\right) + \frac{1}{\rho^2} \frac{\partial^2 E_z}{\partial \varphi^2} + \gamma^2 E_z = 0.$$
• 556. 538 19. STURM-LIOUVILLE SYSTEMS: EXAMPLES
The separation $E_z = R(\rho)S(\varphi)$ yields $S(\varphi) = A \cos m\varphi + B \sin m\varphi$ and
$$\frac{d^2 R}{d\rho^2} + \frac{1}{\rho} \frac{dR}{d\rho} + \left(\gamma^2 - \frac{m^2}{\rho^2}\right) R = 0.$$
The solution to this equation, which is regular at $\rho = 0$ and vanishes at $\rho = a$, is
$$R(\rho) = C J_m\left(\frac{x_{mn}}{a}\rho\right) \qquad \text{and} \qquad \gamma = \frac{x_{mn}}{a}.$$
Recalling the definition of $\gamma$ ($\gamma^2 = \omega^2/c^2 - k^2$), we obtain
$$k_{mn}^2 = \frac{\omega^2}{c^2} - \left(\frac{x_{mn}}{a}\right)^2.$$
This gives the cutoff frequency $\omega_{mn} = c\, x_{mn}/a$. The solution for the azimuthally symmetric case ($m = 0$) is
$$E_z(\rho, z, t) = \sum_{n=1}^{\infty} A_n J_0\left(\frac{x_{0n}}{a}\rho\right) e^{i(\omega t \pm k_n z)}.$$
There are many variations on the theme of Bessel functions. We have encountered three kinds of Bessel functions, as well as modified Bessel functions. Another variation encountered in applications leads to what are known as Kelvin functions, introduced in the following example.
19.3.4. Example. CURRENT DISTRIBUTION IN A CIRCULAR WIRE
Consider the flow of charges in an infinitely long wire with a circular cross section of radius $a$. We are interested in calculating the variation of the current density in the wire as a function of time and location. The relevant equation can be obtained by starting with Maxwell's equations for negligible charge density ($\nabla \cdot \mathbf{E} = 0$), Ohm's law ($\mathbf{j} = \sigma\mathbf{E}$), the assumption of high electrical conductivity ($|\sigma\mathbf{E}| \gg |\partial\mathbf{E}/\partial t|$), and the usual procedure of obtaining the wave equation from Maxwell's equations. The result is
$$\nabla^2 \mathbf{j} - \frac{4\pi\sigma}{c^2} \frac{\partial \mathbf{j}}{\partial t} = 0.$$
Moreover, we make the simplifying assumptions that the wire is along the $z$-axis and that there is no turbulence, so $\mathbf{j}$ is also along the $z$ direction. We further assume that $j$ is independent of $\varphi$ and $z$, and that its time dependence is given by $e^{-i\omega t}$. Then we get
$$\frac{d^2 j}{d\rho^2} + \frac{1}{\rho} \frac{dj}{d\rho} + \kappa^2 j = 0, \qquad \text{where } \kappa^2 = \frac{4\pi i \sigma\omega}{c^2} \equiv \frac{2i}{\delta^2}, \tag{19.15}$$
and $\delta = c/\sqrt{2\pi\sigma\omega}$ is called the skin depth. The Kelvin equation is usually given as
$$\frac{d^2 w}{dx^2} + \frac{1}{x} \frac{dw}{dx} - i k^2 w = 0. \tag{19.16}$$
• 557. 19.3 SEPARATION IN CYLINDRICAL COORDINATES 539
If we substitute $t = e^{-i\pi/4} kx$, it becomes $\ddot{w} + \dot{w}/t + w = 0$, which is a Bessel equation of order zero. If the solution is to be regular at $x = 0$, then the only choice is $w = J_0(t) = J_0(e^{-i\pi/4} kx)$. This is the Kelvin function for Equation (19.16). It is usually written as
$$J_0(e^{-i\pi/4} kx) \equiv \operatorname{ber}(kx) + i \operatorname{bei}(kx),$$
where ber and bei stand for "Bessel real" and "Bessel imaginary," respectively. If we substitute $z = e^{-i\pi/4} kx$ in the expansion for $J_0(z)$ and separate the real and the imaginary parts of the expansion, we obtain
$$\operatorname{ber}(x) = 1 - \frac{(x/2)^4}{(2!)^2} + \frac{(x/2)^8}{(4!)^2} - \cdots,$$
$$\operatorname{bei}(x) = \frac{(x/2)^2}{(1!)^2} - \frac{(x/2)^6}{(3!)^2} + \frac{(x/2)^{10}}{(5!)^2} - \cdots.$$
Equation (19.15) is the complex conjugate of (19.16) with $k^2 = 2/\delta^2$. Thus, its solution is
$$j(\rho) = A J_0(e^{i\pi/4} k\rho) = A\left[ \operatorname{ber}\left(\frac{\sqrt{2}}{\delta}\rho\right) - i \operatorname{bei}\left(\frac{\sqrt{2}}{\delta}\rho\right) \right].$$
We can compare the value of the current density at $\rho$ with its value at the surface $\rho = a$:
$$\left| \frac{j(\rho)}{j(a)} \right| = \left[ \frac{\operatorname{ber}^2\!\left(\frac{\sqrt{2}}{\delta}\rho\right) + \operatorname{bei}^2\!\left(\frac{\sqrt{2}}{\delta}\rho\right)}{\operatorname{ber}^2\!\left(\frac{\sqrt{2}}{\delta}a\right) + \operatorname{bei}^2\!\left(\frac{\sqrt{2}}{\delta}a\right)} \right]^{1/2}.$$
For low frequencies, $\delta$ is large, which implies that $\rho/\delta$ is small; thus $\operatorname{ber}(\sqrt{2}\rho/\delta) \approx 1$ and $\operatorname{bei}(\sqrt{2}\rho/\delta) \approx 0$, and $|j(\rho)/j(a)| \approx 1$; i.e., the current density is almost uniform. For higher frequencies the ratio of the current densities starts at a value less than 1 at $\rho = 0$ and increases to 1 at $\rho = a$. The starting value depends on the frequency. For very large frequencies the starting value is almost zero (see [Mati 80, pp. 150-156]).
19.3.5. Example. QUANTUM PARTICLE IN A CYLINDRICAL CAN
Let us consider a quantum particle in a cylindrical can. For an atomic particle of mass $\mu$ confined in a cylindrical can of length $L$ and radius $a$, the relevant Schrödinger equation is
$$i\hbar \frac{\partial \psi}{\partial t} = -\frac{\hbar^2}{2\mu} \left[ \frac{1}{\rho} \frac{\partial}{\partial \rho}\left(\rho \frac{\partial \psi}{\partial \rho}\right) + \frac{1}{\rho^2} \frac{\partial^2 \psi}{\partial \varphi^2} + \frac{\partial^2 \psi}{\partial z^2} \right].$$
Let us solve this equation subject to the BC that $\psi(\rho, \varphi, z, t)$ vanishes at the sides of the can.
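The series for ber and bei are easy to evaluate directly. The following Python sketch (illustrative only; the wire radius and the skin-depth values are arbitrary choices, not from the text) reproduces the low- and high-frequency behavior of $|j(\rho)/j(a)|$ described in the example:

```python
import math

def ber(x, terms=40):
    # ber(x) = 1 - (x/2)^4/(2!)^2 + (x/2)^8/(4!)^2 - ...
    return sum((-1) ** k * (x / 2) ** (4 * k) / math.factorial(2 * k) ** 2
               for k in range(terms))

def bei(x, terms=40):
    # bei(x) = (x/2)^2/(1!)^2 - (x/2)^6/(3!)^2 + (x/2)^10/(5!)^2 - ...
    return sum((-1) ** k * (x / 2) ** (4 * k + 2)
               / math.factorial(2 * k + 1) ** 2
               for k in range(terms))

def ratio(rho, a, delta):
    """|j(rho)/j(a)| for a wire of radius a and skin depth delta."""
    num = ber(math.sqrt(2) * rho / delta) ** 2 + bei(math.sqrt(2) * rho / delta) ** 2
    den = ber(math.sqrt(2) * a / delta) ** 2 + bei(math.sqrt(2) * a / delta) ** 2
    return math.sqrt(num / den)

a = 1.0
r_low = ratio(0.0, a, delta=10.0)   # low frequency (large delta): nearly 1
r_high = ratio(0.0, a, delta=0.2)   # high frequency (small delta): well below 1
print(r_low, r_high)
```

The output shows a nearly uniform current density at low frequency and a strongly surface-concentrated one at high frequency, as the text asserts.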
A separation of variables, $\psi(\rho, \varphi, z, t) = R(\rho)S(\varphi)Z(z)T(t)$, yields
$$\frac{dT}{dt} = -i\omega T, \qquad \frac{d^2 Z}{dz^2} + \lambda Z = 0, \qquad \frac{d^2 S}{d\varphi^2} + m^2 S = 0,$$
$$\frac{d^2 R}{d\rho^2} + \frac{1}{\rho} \frac{dR}{d\rho} + \left(\frac{2\mu\omega}{\hbar} - \lambda - \frac{m^2}{\rho^2}\right) R = 0. \tag{19.17}$$
• 558. 540 19. STURM-LIOUVILLE SYSTEMS: EXAMPLES
The $Z$ equation, along with its BCs, constitutes an S-L system with solutions
$$Z(z) = \sin\left(\frac{k\pi z}{L}\right) \qquad \text{for } k = 1, 2, \ldots.$$
If we let $2\mu\omega/\hbar - (k\pi/L)^2 \equiv b^2$, then the last equation in (19.17) becomes
$$\frac{d^2 R}{d\rho^2} + \frac{1}{\rho} \frac{dR}{d\rho} + \left(b^2 - \frac{m^2}{\rho^2}\right) R = 0,$$
and the solution that is well-behaved at $\rho = 0$ is $J_m(b\rho)$. Since $R(a) = 0$, we obtain the quantization condition $b = x_{mn}/a$ for $n = 1, 2, \ldots$. Thus, the energy eigenvalues are
$$E_{kmn} = \hbar\omega_{kmn} = \frac{\hbar^2}{2\mu} \left[ \left(\frac{k\pi}{L}\right)^2 + \left(\frac{x_{mn}}{a}\right)^2 \right],$$
and the general solution can be written as
$$\psi(\rho, \varphi, z, t) = \sum_{m=0}^{\infty} \sum_{k,n=1}^{\infty} e^{-i\omega_{kmn} t}\, J_m\left(\frac{x_{mn}}{a}\rho\right) \sin\left(\frac{k\pi}{L}z\right) (A_{kmn} \cos m\varphi + B_{kmn} \sin m\varphi).$$
19.4 Separation in Spherical Coordinates
Recall that most PDEs encountered in physical applications can be separated, in spherical coordinates, into
$$\mathbf{L}^2 Y(\theta, \varphi) = l(l+1)\, Y(\theta, \varphi), \qquad \frac{d^2 R}{dr^2} + \frac{2}{r} \frac{dR}{dr} + \left[ f(r) - \frac{l(l+1)}{r^2} \right] R = 0. \tag{19.18}$$
We discussed the first of these two equations in great detail in Chapter 12. In particular, we constructed $Y_{lm}(\theta, \varphi)$ in such a way that they formed an orthonormal sequence. However, that construction was purely algebraic and did not say anything about the completeness of the $Y_{lm}(\theta, \varphi)$. With Theorem 19.1.1 at our disposal, we can separate the first equation of (19.18) into two ODEs by writing $Y_{lm}(\theta, \varphi) = P_{lm}(\theta) S_m(\varphi)$. We obtain
$$\frac{d^2 S_m}{d\varphi^2} + m^2 S_m = 0, \qquad \frac{d}{dx}\left[ (1 - x^2) \frac{dP_{lm}}{dx} \right] + \left[ l(l+1) - \frac{m^2}{1 - x^2} \right] P_{lm} = 0,$$
where $x = \cos\theta$. These are both S-L systems satisfying the conditions of Theorem 19.1.1. Thus, the $S_m$ are orthogonal among themselves and form a complete set for $\mathcal{L}^2(0, 2\pi)$. Similarly, for any fixed $m$, the $P_{lm}(x)$ form a complete orthogonal set for $\mathcal{L}^2(-1, +1)$ (actually for the subset of $\mathcal{L}^2(-1, +1)$ that satisfies the same
• 559. 19.4 SEPARATION IN SPHERICAL COORDINATES 541
BC as the $P_{lm}$ do at $x = \pm 1$). Thus, the products $Y_{lm}(x, \varphi) = P_{lm}(x) S_m(\varphi)$ form a complete orthogonal sequence in the (Cartesian product) set $[-1, +1] \times [0, 2\pi]$, which, in terms of spherical angles, is the unit sphere, $0 \leq \theta \leq \pi$, $0 \leq \varphi \leq 2\pi$.
Let us consider some specific examples of expansion in the spherical coordinate system, starting with the simplest case, Laplace's equation, for which $f(r) = 0$. The radial equation is therefore
$$\frac{d^2 R}{dr^2} + \frac{2}{r} \frac{dR}{dr} - \frac{l(l+1)}{r^2} R = 0.$$
Multiplying by $r^2$, substituting $r = e^t$, and using the chain rule and the fact that $dt/dr = 1/r$ leads to the following SOLDE with constant coefficients:
$$\frac{d^2 R}{dt^2} + \frac{dR}{dt} - l(l+1) R = 0.$$
This has a characteristic polynomial $p(\lambda) = \lambda^2 + \lambda - l(l+1)$ with roots $\lambda_1 = l$ and $\lambda_2 = -(l+1)$. Thus, a general solution is of the form
$$R(t) = A e^{\lambda_1 t} + B e^{\lambda_2 t} = A (e^t)^l + B (e^t)^{-l-1},$$
or, in terms of $r$, $R(r) = A r^l + B r^{-l-1}$. Thus, the most general solution of Laplace's equation is
$$\Phi(r, \theta, \varphi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} (A_{lm} r^l + B_{lm} r^{-l-1})\, Y_{lm}(\theta, \varphi).$$
For regions containing the origin, the finiteness of $\Phi$ implies that $B_{lm} = 0$. Denoting the potential in such regions by $\Phi_{\text{in}}$, we obtain
$$\Phi_{\text{in}}(r, \theta, \varphi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_{lm} r^l\, Y_{lm}(\theta, \varphi).$$
Similarly, for regions including $r = \infty$, we have
$$\Phi_{\text{out}}(r, \theta, \varphi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} B_{lm} r^{-l-1}\, Y_{lm}(\theta, \varphi).$$
To determine $A_{lm}$ and $B_{lm}$, we need to invoke appropriate BCs. In particular, for inside a sphere of radius $a$ on which the potential is given by $V(\theta, \varphi)$, we have
$$V(\theta, \varphi) = \Phi_{\text{in}}(a, \theta, \varphi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_{lm} a^l\, Y_{lm}(\theta, \varphi).$$
• 560. 542 19. STURM-LIOUVILLE SYSTEMS: EXAMPLES
Multiplying by $Y_{lm}^*(\theta, \varphi)$ and integrating over $d\Omega = \sin\theta\, d\theta\, d\varphi$, we obtain
$$A_{lm} = a^{-l} \iint d\Omega\, V(\theta, \varphi)\, Y_{lm}^*(\theta, \varphi).$$
Similarly, for the potential outside the sphere,
$$B_{lm} = a^{l+1} \iint d\Omega\, V(\theta, \varphi)\, Y_{lm}^*(\theta, \varphi).$$
In particular, if $V$ is independent of $\varphi$, only the components for which $m = 0$ are nonzero, and we have
$$A_{l0} = \frac{2\pi}{a^l} \int_0^{\pi} \sin\theta\, V(\theta)\, Y_{l0}(\theta)\, d\theta = \frac{2\pi}{a^l} \sqrt{\frac{2l+1}{4\pi}} \int_0^{\pi} \sin\theta\, V(\theta)\, P_l(\cos\theta)\, d\theta,$$
which yields
$$\Phi_{\text{in}}(r, \theta) = \sum_{l=0}^{\infty} A_l \left(\frac{r}{a}\right)^l P_l(\cos\theta), \qquad \text{where } A_l = \frac{2l+1}{2} \int_0^{\pi} \sin\theta\, V(\theta)\, P_l(\cos\theta)\, d\theta.$$
Similarly,
$$\Phi_{\text{out}}(r, \theta) = \sum_{l=0}^{\infty} A_l \left(\frac{a}{r}\right)^{l+1} P_l(\cos\theta).$$
The next simplest case after Laplace's equation is that for which $f(r)$ is a constant. The diffusion equation, the wave equation, and the Schrödinger equation for a free particle give rise to such a case once time is separated from the rest of the variables. The Helmholtz equation is
$$\nabla^2 \psi + k^2 \psi = 0, \tag{19.19}$$
and its radial part is
$$\frac{d^2 R}{dr^2} + \frac{2}{r} \frac{dR}{dr} + \left[ k^2 - \frac{l(l+1)}{r^2} \right] R = 0. \tag{19.20}$$
(This equation was discussed in Problems 14.26 and 14.35.) The solutions are spherical Bessel functions, generically denoted by the corresponding lowercase letter as $z_l(x)$ and given by
$$z_l(x) = \sqrt{\frac{\pi}{2x}}\, Z_{l+1/2}(x), \tag{19.21}$$
• 561. 19.4 SEPARATION IN SPHERICAL COORDINATES 543
where $Z_\nu(x)$ is a solution of the Bessel equation of order $\nu$. A general solution of (19.20) can therefore be written as $R_l(r) = A j_l(kr) + B y_l(kr)$. If the origin is included in the region of interest, then we must set $B = 0$. For such a case, the solution to the Helmholtz equation is
$$\psi_k(r, \theta, \varphi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_{lm}\, j_l(kr)\, Y_{lm}(\theta, \varphi). \tag{19.22}$$
The subscript $k$ indicates that $\psi$ is a solution of the Helmholtz equation with $k^2$ as its constant.
19.4.1. Example. PARTICLE IN A HARD SPHERE
The time-independent Schrödinger equation for a particle in a sphere of radius $a$ is
$$-\frac{\hbar^2}{2\mu} \nabla^2 \psi = E\psi \qquad \text{with the BC } \psi(a, \theta, \varphi) = 0.$$
Here $E$ is the energy of the particle and $\mu$ is its mass. We rewrite the Schrödinger equation as $\nabla^2 \psi + 2\mu E\psi/\hbar^2 = 0$. With $k^2 = 2\mu E/\hbar^2$, we can immediately write the radial solution
$$R_l(r) = A\, j_l(kr) = A\, j_l\!\left(\sqrt{2\mu E}\, r/\hbar\right).$$
The vanishing of $\psi$ at $a$ implies that $j_l(\sqrt{2\mu E}\, a/\hbar) = 0$, or
$$\frac{\sqrt{2\mu E}\, a}{\hbar} = x_{ln} \qquad \text{for } n = 1, 2, \ldots,$$
where $x_{ln}$ is the $n$th zero of $j_l(x)$, which is the same as the zero of $J_{l+1/2}(x)$. Thus, the energy is quantized as
$$E_{ln} = \frac{\hbar^2 x_{ln}^2}{2\mu a^2} \qquad \text{for } l = 0, 1, \ldots, \quad n = 1, 2, \ldots.$$
The general solution to the Schrödinger equation is
$$\psi(r, \theta, \varphi) = \sum_{n=1}^{\infty} \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_{nlm}\, j_l\!\left(x_{ln}\frac{r}{a}\right) Y_{lm}(\theta, \varphi).$$
A particularly useful consequence of Equation (19.22) is the expansion of a plane wave in terms of spherical Bessel functions. It is easily verified that if $\mathbf{k}$ is a vector with $\mathbf{k} \cdot \mathbf{k} = k^2$, then $e^{i\mathbf{k}\cdot\mathbf{r}}$ is a solution of the Helmholtz equation. Thus, $e^{i\mathbf{k}\cdot\mathbf{r}}$ can be expanded as in Equation (19.22). Assuming that $\mathbf{k}$ is along the $z$-axis, we get $\mathbf{k} \cdot \mathbf{r} = kr\cos\theta$, which is independent of $\varphi$. Only the terms of Equation (19.22) for which $m = 0$ will survive in such a case, and we may write $e^{ikr\cos\theta} = \sum_{l=0}^{\infty} A_l\, j_l(kr)\, P_l(\cos\theta)$. To find $A_l$, let $u = \cos\theta$, multiply both sides by $P_n(u)$, and integrate from $-1$ to $1$:
$$\int_{-1}^{1} P_n(u)\, e^{ikru}\, du = \sum_{l=0}^{\infty} A_l\, j_l(kr) \int_{-1}^{1} P_n(u) P_l(u)\, du = A_n\, j_n(kr)\, \frac{2}{2n+1}.$$
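For the lowest few $l$ the spherical Bessel functions have closed forms ($j_0(x) = \sin x/x$, $j_1(x) = \sin x/x^2 - \cos x/x$), so the quantized energies $E_{ln}$ can be found with a simple root search. A Python sketch (the units $\hbar = \mu = a = 1$ are an arbitrary illustrative choice):

```python
import math

def j0(x):
    return math.sin(x) / x

def j1(x):
    return math.sin(x) / x ** 2 - math.cos(x) / x

def zero(f, lo, hi, tol=1e-12):
    """Bisect for a root of f bracketed by a sign change on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# First zeros x_ln of j_l; the spectrum is E_ln = hbar^2 x_ln^2 / (2 mu a^2).
x01 = zero(j0, 3.0, 4.0)   # first zero of j_0 is exactly pi
x11 = zero(j1, 4.0, 5.0)   # first zero of j_1, about 4.4934
hbar = mu = a = 1.0        # illustrative units
E01 = hbar ** 2 * x01 ** 2 / (2 * mu * a ** 2)
E11 = hbar ** 2 * x11 ** 2 / (2 * mu * a ** 2)
print(x01, x11, E01, E11)
```

Note that the $l = 0$ zeros are simply $n\pi$, so the $s$-wave spectrum matches the one-dimensional infinite well.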
• 562. 544 19. STURM-LIOUVILLE SYSTEMS: EXAMPLES
Thus
$$A_n\, j_n(kr) = \frac{2n+1}{2} \int_{-1}^{1} P_n(u)\, e^{ikru}\, du = \frac{2n+1}{2} \sum_{m=0}^{\infty} \frac{(ikr)^m}{m!} \int_{-1}^{1} P_n(u)\, u^m\, du. \tag{19.23}$$
This equality holds for all values of $kr$. In particular, both sides should give the same result in the limit of small $kr$. From the definition of $j_n(kr)$ and the expansion of $J_n(kr)$, we obtain
$$j_n(kr) \approx \frac{\sqrt{\pi}\,(kr)^n}{2^{n+1}\,\Gamma(n + 3/2)}.$$
On the other hand, the first nonvanishing term of the RHS of Equation (19.23) occurs when $m = n$. Equating these terms on both sides, we get
$$A_n\, \frac{\sqrt{\pi}\,(kr)^n}{2^{n+1}\,\Gamma(n + 3/2)} = \frac{2n+1}{2}\, \frac{i^n (kr)^n}{n!}\, \frac{2^{n+1}(n!)^2}{(2n+1)!}, \tag{19.24}$$
where we have used
$$\Gamma\!\left(n + \frac{3}{2}\right) = \frac{(2n+1)!\,\sqrt{\pi}}{2^{2n+1}\, n!} \qquad \text{and} \qquad \int_{-1}^{1} P_n(u)\, u^n\, du = \frac{2^{n+1}(n!)^2}{(2n+1)!}.$$
Equation (19.24) yields $A_n = i^n(2n+1)$. With $A_n$ thus calculated, we can now write
$$e^{ikr\cos\theta} = \sum_{l=0}^{\infty} (2l+1)\, i^l\, j_l(kr)\, P_l(\cos\theta). \tag{19.25}$$
For an arbitrary direction of $\mathbf{k}$, $\mathbf{k} \cdot \mathbf{r} = kr\cos\gamma$, where $\gamma$ is the angle between $\mathbf{k}$ and $\mathbf{r}$. Thus, we may write $e^{i\mathbf{k}\cdot\mathbf{r}} = \sum_{l=0}^{\infty} (2l+1)\, i^l\, j_l(kr)\, P_l(\cos\gamma)$, and using the addition theorem for spherical harmonics, we finally obtain
$$e^{i\mathbf{k}\cdot\mathbf{r}} = 4\pi \sum_{l=0}^{\infty} \sum_{m=-l}^{l} i^l\, j_l(kr)\, Y_{lm}^*(\theta', \varphi')\, Y_{lm}(\theta, \varphi), \tag{19.26}$$
where $\theta'$ and $\varphi'$ are the spherical angles of $\mathbf{k}$, and $\theta$ and $\varphi$ are those of $\mathbf{r}$. Such a decomposition of plane waves into components with definite orbital angular momenta is extremely useful when working with scattering theory for waves and particles.
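The expansion (19.25) can be checked numerically: a truncated sum over $l$ should reproduce $e^{ikru}$. A Python sketch (the values $kr = 2$, $u = 0.3$, and the truncation at $l = 10$ are arbitrary illustrative choices):

```python
import cmath
import math

def j(l, x):
    # Closed forms for j_0, j_1, then the upward recurrence
    # j_{l+1}(x) = (2l+1)/x j_l(x) - j_{l-1}(x).
    if l == 0:
        return math.sin(x) / x
    if l == 1:
        return math.sin(x) / x ** 2 - math.cos(x) / x
    return (2 * l - 1) / x * j(l - 1, x) - j(l - 2, x)

def P(l, u):
    # Legendre polynomials via the Bonnet recurrence.
    p0, p1 = 1.0, u
    if l == 0:
        return p0
    for n in range(1, l):
        p0, p1 = p1, ((2 * n + 1) * u * p1 - n * p0) / (n + 1)
    return p1

def partial_sum(kr, u, lmax):
    """Truncated version of e^{ikru} = sum_l (2l+1) i^l j_l(kr) P_l(u)."""
    return sum((2 * l + 1) * (1j) ** l * j(l, kr) * P(l, u)
               for l in range(lmax + 1))

kr, u = 2.0, 0.3
exact = cmath.exp(1j * kr * u)
approx = partial_sum(kr, u, 10)
print(abs(approx - exact))  # small truncation error
```

For small $kr$ the series converges very quickly, which is why only a few partial waves are needed in low-energy scattering.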
• 563. 19.5 PROBLEMS 545
Figure 19.4 A semi-infinite heat-conducting plate.
19.5 Problems
19.1. Show that separated and periodic BCs are special cases of the equality in Equation (19.3).
19.2. Derive Equation (19.4).
19.3. A semi-infinite heat-conducting plate of width $b$ is extended along the positive $x$-axis with one corner at $(0, 0)$ and the other at $(0, b)$. The side of width $b$ is held at temperature $T_0$, and the two long sides are held at $T = 0$ (see Figure 19.4). The two flat faces are insulated. Find the temperature variation of the plate, assuming equilibrium. Repeat the problem with the temperature of the short side held at each of the following:
(a) $T = \begin{cases} 0 & \text{if } 0 < y < b/2, \\ T_0 & \text{if } b/2 < y < b. \end{cases}$
(b) $\dfrac{T_0}{b}\, y$, $\quad 0 \leq y \leq b$.
(c) $T_0 \cos\left(\dfrac{\pi}{b}\, y\right)$, $\quad 0 \leq y \leq b$.
(d) $T_0 \sin\left(\dfrac{\pi}{b}\, y\right)$, $\quad 0 \leq y \leq b$.
19.4. Find a general solution for the electromagnetic wave propagation in a resonant cavity, a rectangular box of sides $0 \leq x \leq a$, $0 \leq y \leq b$, and $0 \leq z \leq d$ with perfectly conducting walls. Discuss the modes the cavity can accommodate.
19.5. The lateral faces of a cube are grounded, and its top and bottom faces are held at potentials $f_1(x, y)$ and $f_2(x, y)$, respectively.
(a) Find a general expression for the potential inside the cube.
(b) Find the potential if the top is held at $V_0$ volts and the bottom at $-V_0$ volts.
19.6. Find the potential inside a semi-infinite cylindrical conductor, closed at the nearby end, whose cross section is a square with sides of length $a$. All sides are grounded except the square side, which is held at the constant potential $V_0$.
• 564. 546 19. STURM-LIOUVILLE SYSTEMS: EXAMPLES
19.7. Find the temperature distribution of a rectangular plate (see Figure 19.2) with sides of lengths $a$ and $b$ if three sides are held at $T = 0$ and the fourth side has a temperature variation given by:
(a) $\dfrac{T_0}{a}\, x$, $\quad 0 \leq x \leq a$.
(b) $\dfrac{T_0}{a^2}\, x(x - a)$, $\quad 0 \leq x \leq a$.
(c) $\dfrac{T_0}{a} \left| x - \dfrac{a}{2} \right|$, $\quad 0 \leq x \leq a$.
(d) $T = T_0$, $\quad 0 \leq x \leq a$.
19.8. Consider a thin heat-conducting bar of length $b$ along the $x$-axis with one end at $x = 0$ held at temperature $T_0$ and the other end at $x = b$ held at temperature $-T_0$. The lateral surface of the bar is thermally insulated. Find the temperature distribution at all times if initially it is given by:
(a) $T(0, x) = -\dfrac{2T_0}{b}\, x + T_0$, where $0 \leq x \leq b$.
(b) $T(0, x) = -\dfrac{2T_0}{b^2}\, x^2 + T_0$, where $0 \leq x \leq b$.
(c) $T(0, x) = -\dfrac{T_0}{b}\, x + T_0$, where $0 \leq x \leq b$.
(d) $T(0, x) = T_0 \cos\left(\dfrac{\pi}{b}\, x\right)$, where $0 \leq x \leq b$.
Hint: The solution corresponding to the zero eigenvalue is essential and cannot be excluded.
19.9. Determine $T(x, y, t)$ for the rectangular plate of Example 19.2.3 if initially the lower left quarter is held at $T_0$ and the rest of the plate is held at $T = 0$.
19.10. All sides of the plate of Example 19.2.3 are held at $T = 0$. Find the temperature distribution for all time if the initial temperature distribution is given by:
(a) $T(x, y, 0) = \begin{cases} T_0 & \text{if } \frac{a}{4} \leq x \leq \frac{3a}{4} \text{ and } \frac{b}{4} \leq y \leq \frac{3b}{4}, \\ 0 & \text{otherwise}. \end{cases}$
(b) $T(x, y, 0) = \dfrac{T_0}{ab}\, xy$, where $0 \leq x \leq a$ and $0 \leq y \leq b$.
(c) $T(x, y, 0) = \dfrac{T_0}{a}\, x$, where $0 \leq x \leq a$ and $0 \leq y \leq b$.
19.11. Repeat Example 19.2.3 with the temperatures of the sides equal to $T_1$, $T_2$, $T_3$, and $T_4$. Hint: You must include solutions corresponding to the zero eigenvalue.
19.12. A string of length $a$ is fixed at the left end, and the right end moves with displacement $A \sin\omega t$. Find $\psi(x, t)$ and a consistent set of initial conditions for the displacement and the velocity.
• 565. 19.5 PROBLEMS 547
19.13. Find the equation for a vibrating rectangular membrane with sides of lengths $a$ and $b$ rigidly fastened on all sides. For $a = b$, show that a given mode frequency may have more than one solution.
19.14. Repeat Example 19.3.1 if the can has semi-infinite length, the lateral surface is grounded, and:
(a) the base is held at the potential $V(\rho, \varphi)$. Specialize to the case where the potential of the base is given, in Cartesian coordinates, by:
(b) $V = \dfrac{V_0}{a}\, y$. (c) $V = \dfrac{V_0}{a}\, x$. (d) $V = \dfrac{V_0}{a^2}\, xy$.
Hint: Use the integral identity $\int z^{\nu+1} J_\nu(z)\, dz = z^{\nu+1} J_{\nu+1}(z)$.
19.15. Find the steady-state temperature distribution $T(\rho, \varphi, z)$ in a semi-infinite solid cylinder of radius $a$ if the temperature distribution of the base is $f(\rho, \varphi)$ and the lateral surface is held at $T = 0$.
19.16. Find the steady-state temperature distribution of a solid cylinder with a height and radius of 10, assuming that the base and the lateral surface are at $T = 0$ and the top is at $T = 100$.
19.17. The circumference of a flat circular plate of radius $a$, lying in the $xy$-plane, is held at $T = 0$. Find the temperature distribution for all time if the temperature distribution at $t = 0$ is given, in Cartesian coordinates, by:
(a) $\dfrac{T_0}{a}\, y$. (b) $\dfrac{T_0}{a}\, x$. (c) $\dfrac{T_0}{a^2}\, xy$. (d) $T_0$.
19.18. Find the temperature of a circular conducting plate of radius $a$ at all points of its surface for all time $t > 0$, assuming that its edge is held at $T = 0$ and initially its surface from the center to $a/2$ is in contact with a heat bath of temperature $T_0$.
19.19. Find the potential of a cylindrical conducting can of radius $a$ and height $h$ whose top is held at a constant potential $V_0$ while the rest is grounded.
19.20. Find the modes and the corresponding fields of a cylindrical resonant cavity of length $L$ and radius $a$. Discuss the lowest TM mode.
19.21.
Two identical long conducting half-cylindrical shells (cross sections are half-circles) of radius $a$ are glued together in such a way that they are insulated from one another. One half-cylinder is held at potential $V_0$ and the other is grounded. Find the potential at any point inside the resulting cylinder. Hint: Separate Laplace's equation in two dimensions.
• 566. 548 19. STURM-LIOUVILLE SYSTEMS: EXAMPLES
19.22. A linear charge distribution of uniform density $\lambda$ extends along the $z$-axis from $z = -b$ to $z = b$. Show that the electrostatic potential at any point $r > b$ is given by
$$\Phi(r, \theta, \varphi) = 2\lambda \sum_{k=0}^{\infty} \frac{(b/r)^{2k+1}}{2k+1}\, P_{2k}(\cos\theta).$$
Hint: Consider a point on the $z$-axis at a distance $r > b$ from the origin. Solve the simple problem by integration and compare the result with the infinite series to obtain the unknown coefficients.
19.23. The upper half of a heat-conducting sphere of radius $a$ has $T = 100$; the lower half is maintained at $T = -100$. The whole sphere is inside an infinitely large mass of heat-conducting material. Find the steady-state temperature distribution inside and outside the sphere.
19.24. Find the steady-state temperature distribution inside a sphere of radius $a$ when the surface temperature is given by:
(a) $T_0 \cos^2\theta$. (b) $T_0 \cos^4\theta$. (c) $T_0 |\cos\theta|$.
(d) $T_0 (\cos\theta - \cos^3\theta)$. (e) $T_0 \sin^2\theta$. (f) $T_0 \sin^4\theta$.
19.25. Find the electrostatic potential both inside and outside a conducting sphere of radius $a$ when the sphere is maintained at a potential given by:
(a) $V_0 (\cos\theta - 3\sin^2\theta)$.
(c) $\begin{cases} V_0 \cos\theta & \text{for the upper hemisphere}, \\ 0 & \text{for the lower hemisphere}. \end{cases}$
19.26. Find the steady-state temperature distribution inside a solid hemisphere of radius $a$ if the curved surface is held at $T_0$ and the flat surface at $T = 0$. Hint: Imagine completing the sphere and maintaining the lower hemisphere at a temperature such that the overall surface temperature distribution is an odd function about $\theta = \pi/2$.
19.27. Find the steady-state temperature distribution in a spherical shell of inner radius $R_1$ and outer radius $R_2$ when the inner surface has a temperature $T_1$ and the outer surface a temperature $T_2$.
Additional Reading
1. Jackson, J. Classical Electrodynamics, 2nd ed., Wiley, 1975. The classic textbook on electromagnetism with many examples and problems on the solutions of Laplace's equation in different coordinate systems.
• 567. 19.5 PROBLEMS 549
2. Mathews, J. and Walker, R. Mathematical Methods of Physics, 2nd ed., Benjamin, 1970.
3. Morse, P. and Feshbach, M. Methods of Theoretical Physics, McGraw-Hill, 1953.
• 569. Part VI Green's Functions
• 571. 20 Green's Functions in One Dimension
Our treatment of differential equations, with the exception of SOLDEs with constant coefficients, did not consider inhomogeneous equations. At this point, however, we can put into use one of the most elegant pieces of machinery in higher mathematics, Green's functions, to solve inhomogeneous differential equations. This chapter addresses Green's functions in one dimension, that is, Green's functions of ordinary differential equations.
Consider the ODE $\mathbf{L}_x[u] = f(x)$, where $\mathbf{L}_x$ is a linear differential operator. In the abstract Dirac notation this can be formally written as $\mathbf{L}\,|u\rangle = |f\rangle$. If $\mathbf{L}$ has an inverse $\mathbf{L}^{-1} \equiv \mathbf{G}$, the solution can be formally written as $|u\rangle = \mathbf{L}^{-1}\,|f\rangle = \mathbf{G}\,|f\rangle$. Multiplying this by $\langle x|$ and inserting $\mathbf{1} = \int dy\, |y\rangle\, w(y)\, \langle y|$ between $\mathbf{G}$ and $|f\rangle$ gives
$$u(x) = \int dy\, G(x, y)\, w(y)\, f(y), \tag{20.1}$$
where the integration is over the range of definition of the functions involved. Once we know $G(x, y)$, Equation (20.1) gives the solution $u(x)$ in an integral form. But how do we find $G(x, y)$? Sandwiching both sides of $\mathbf{L}\mathbf{G} = \mathbf{1}$ between $\langle x|$ and $|y\rangle$ and using $\mathbf{1} = \int dx'\, |x'\rangle\, w(x')\, \langle x'|$ between $\mathbf{L}$ and $\mathbf{G}$ yields
$$\int dx'\, L(x, x')\, w(x')\, G(x', y) = \langle x| y\rangle = \frac{\delta(x - y)}{w(x)}$$
if we use Equation (6.3). In particular, if $\mathbf{L}$ is a local differential operator (see Section 16.1), then $L(x, x') = [\delta(x - x')/w(x)]\,\mathbf{L}_x$, and we obtain
$$\mathbf{L}_x G(x, y) = \frac{\delta(x - y)}{w(x)} \qquad \text{or} \qquad \mathbf{L}_x G(x, y) = \delta(x - y), \tag{20.2}$$
where the second equation makes the frequently used assumption that $w(x) = 1$. $G(x, y)$ is called the Green's function (GF) for the differential operator (DO) $\mathbf{L}_x$.
• 572. 554 20. GREEN'S FUNCTIONS IN ONE DIMENSION
As discussed in Chapters 16 and 18, $\mathbf{L}_x$ might not be defined for all functions on $\mathbb{R}$. Moreover, a complete specification of $\mathbf{L}_x$ requires some initial (or boundary) conditions. Therefore, we expect $G(x, y)$ to depend on such initial conditions as well. We note that when $\mathbf{L}_x$ is applied to (20.1), we get
$$\mathbf{L}_x u(x) = \int dy\, [\mathbf{L}_x G(x, y)]\, w(y)\, f(y) = \int dy\, \frac{\delta(x - y)}{w(x)}\, w(y)\, f(y) = f(x),$$
indicating that $u(x)$ is indeed a solution of the original ODE. Equation (20.2), involving the generalized function $\delta(x - y)$ (or distribution in the language of Chapter 6), is meaningful only in the same context. Thus, we treat $G(x, y)$ not as an ordinary function but as a distribution. Finally, (20.1) is assumed to hold for an arbitrary (well-behaved) function $f$.
20.1 Calculation of Some Green's Functions
This section presents some examples of calculating $G(x, y)$ for very simple DOs. Later we will see how to obtain Green's functions for a general second-order linear differential operator. Although the complete specification of GFs requires boundary conditions, we shall introduce unspecified constants in some of the examples below, and calculate some indefinite GFs.
20.1.1. Example. Let us find the GF for the simplest DO, $\mathbf{L}_x = d/dx$. We need to find a distribution such that its derivative is the Dirac delta function:¹ $G'(x, y) = \delta(x - y)$. In Chapter 6, we encountered such a distribution, the step function $\theta(x - y)$. Thus,
$$G(x, y) = \theta(x - y) + \alpha(y),$$
where $\alpha(y)$ is the "constant" of integration.
The example above did not include a boundary (or initial) condition. Let us see how boundary conditions affect the resulting GF.
20.1.2. Example. Let us solve $u'(x) = f(x)$, where $x \in [0, \infty)$ and $u(0) = 0$. A general solution of this DE is given by Equation (20.1) and the preceding example:
$$u(x) = \int_0^{\infty} \theta(x - y)\, f(y)\, dy + \int_0^{\infty} \alpha(y)\, f(y)\, dy.$$
The factor $\theta(x - y)$ in the first term on the RHS chops off the integral at $x$:
$$u(x) = \int_0^x f(y)\, dy + \int_0^{\infty} \alpha(y)\, f(y)\, dy.$$
The BC gives $0 = u(0) = 0 + \int_0^{\infty} \alpha(y)\, f(y)\, dy$.
The only way that this can be satisfied for arbitrary f(y) is for α(y) to be zero. Thus, G(x, y) = θ(x − y), and

u(x) = ∫₀^∞ θ(x − y) f(y) dy = ∫₀^x f(y) dy. ■

¹Here and elsewhere in this chapter, a prime over a GF indicates differentiation with respect to its first argument.
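The result of Example 20.1.2 is easy to verify numerically. In this sketch f(y) = 3y² is a hypothetical test integrand, chosen because the exact answer ∫₀ˣ 3y² dy = x³ is known:

```python
import numpy as np

# G(x, y) = theta(x - y) is the GF of d/dx with u(0) = 0;
# the factor theta(x - y) chops the integral off at y = x.
y = np.linspace(0.0, 2.0, 4001)
dy = y[1] - y[0]
f = 3 * y**2              # hypothetical test function; exact u(x) = x**3

def u(x):
    theta = (y <= x).astype(float)
    return np.sum(theta * f) * dy    # Riemann sum for int theta(x-y) f(y) dy

print(u(0.0))             # the BC u(0) = 0 is built into the GF
print(u(1.5))             # approximately 1.5**3 = 3.375
```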
This is killing a fly with a sledgehammer! We could have obtained the result by a simple integration. However, the roundabout way outlined here illustrates some important features of GFs that will be discussed later. The BC introduced here is very special. What happens if it is changed to u(0) = a? Problem 20.1 answers that. ■

20.1.3. Example. A more complicated DO is L_x = d²/dx². Let us find its indefinite GF. To do so, we integrate G''(x, y) = δ(x − y) once with respect to x to obtain

G'(x, y) = θ(x − y) + α(y).

A second integration yields

G(x, y) = ∫ dx θ(x − y) + x α(y) + η(y),

where α and η are arbitrary functions and the integral is an indefinite integral to be evaluated next. Let Ω(x, y) be the primitive of θ(x − y); that is,

dΩ/dx = θ(x − y) = { 1 if x > y,  0 if x < y.

The solution to this equation is

Ω(x, y) = { x + a(y) if x > y,  b(y) if x < y.    (20.3)

Note that we have not defined Ω(x, y) at x = y. It will become clear below that Ω(x, y) is continuous at x = y. It is convenient to write Ω(x, y) as

Ω(x, y) = [x + a(y)] θ(x − y) + b(y) θ(y − x).    (20.4)

To specify a(y) and b(y) further, we differentiate (20.4) and compare it with (20.3):

dΩ/dx = θ(x − y) + [x + a(y)] δ(x − y) − b(y) δ(x − y)
      = θ(x − y) + [x − b(y) + a(y)] δ(x − y),    (20.5)

where we have used

(d/dx) θ(x − y) = −(d/dx) θ(y − x) = δ(x − y).

For Equation (20.5) to agree with (20.3), we must have [x − b(y) + a(y)] δ(x − y) = 0, which, upon integration over x, yields a(y) − b(y) = −y. Substituting this in the expression for Ω(x, y) gives

Ω(x, y) = (x − y) θ(x − y) + b(y)[θ(x − y) + θ(y − x)].

But θ(x) + θ(−x) = 1; therefore, Ω(x, y) = (x − y) θ(x − y) + b(y). It follows, among other things, that Ω(x, y) is continuous at x = y. We can now write

G(x, y) = (x − y) θ(x − y) + x α(y) + β(y),

where β(y) = η(y) + b(y). ■

The GF in the example above has two arbitrary functions, α(y) and β(y), which are the result of underspecification of L_x: A full specification of L_x requires BCs, as the following example shows.
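Before imposing BCs, the indefinite GF just obtained can be sanity-checked numerically. Dropping the arbitrary α, β pieces, u(x) = ∫ (x − y) θ(x − y) f(y) dy should satisfy u'' = f. Here f(y) = y² is a hypothetical test function: the integral then gives u(x) = x⁴/12, whose second derivative is indeed x².

```python
import numpy as np

# u(x) = int_0^x (x - y) f(y) dy from the indefinite GF (x - y) theta(x - y).
y = np.linspace(0.0, 1.0, 4001)
dy = y[1] - y[0]
f = y**2                  # hypothetical test function; exact u(x) = x**4 / 12

def u(x):
    return np.sum((x - y) * (y <= x) * f) * dy

print(abs(u(0.8) - 0.8**4 / 12))   # small quadrature error
```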
20.1.4. Example. Let us calculate the GF of L_x[u] = u''(x) = f(x) subject to the BCs u(a) = u(b) = 0, where [a, b] is the interval on which L_x is defined. Example 20.1.3 gives us the (indefinite) GF for L_x. Using that, we can write

u(x) = ∫ₐᵇ (x − y) θ(x − y) f(y) dy + x ∫ₐᵇ α(y) f(y) dy + ∫ₐᵇ β(y) f(y) dy
     = ∫ₐˣ (x − y) f(y) dy + x ∫ₐᵇ α(y) f(y) dy + ∫ₐᵇ β(y) f(y) dy.

Applying the BCs yields

0 = u(a) = a ∫ₐᵇ α(y) f(y) dy + ∫ₐᵇ β(y) f(y) dy,
0 = u(b) = ∫ₐᵇ (b − y) f(y) dy + b ∫ₐᵇ α(y) f(y) dy + ∫ₐᵇ β(y) f(y) dy,    (20.6)

where a ≤ x, y ≤ b. From these two relations it is possible to determine α(y) and β(y): Substitute for the last integral on the RHS of the second equation of (20.6) from the first equation to get

0 = ∫ₐᵇ [b − y + b α(y) − a α(y)] f(y) dy.

Since this must hold for arbitrary f(y), we conclude that

b − y + (b − a) α(y) = 0  ⟹  α(y) = −(b − y)/(b − a).

Substituting for α(y) in the first equation of (20.6) and noting that the result holds for arbitrary f, we obtain β(y) = a(b − y)/(b − a). Insertion of α(y) and β(y) in the expression for G(x, y) obtained in Example 20.1.3 gives

G(x, y) = (x − y) θ(x − y) + (x − a)(y − b)/(b − a).

It is striking that G(a, y) = (a − y) θ(a − y) = 0 (because a − y ≤ 0), and

G(b, y) = (b − y) θ(b − y) + (b − a)(y − b)/(b − a) = 0

because θ(b − y) = 1 for all y ≤ b [recall that x and y lie in the interval (a, b)]. These two equations reveal the important fact that as a function of x, G(x, y) satisfies the same (homogeneous) BCs as the solution of the DE. This is a general property that will be discussed later. ■

In all the preceding examples, the BCs were very simple. Specifically, the value of the solution and/or its derivative at the boundary points was zero. What if the BCs are not so simple? In particular, how can we handle a case where u(a) [or u'(a)] and u(b) [or u'(b)] are nonzero? Consider a general (second-order) differential operator L_x and the differential equation L_x[u] = f(x) subject to the BCs u(a) = a₁ and u(b) = b₁.
We claim that we can reduce this system to the case where u(a) = u(b) = 0. Recall from Chapter 13 that the most general solution to such a DE is of the form u = u_h + u_i, where u_h,
the solution to the homogeneous equation, satisfies L_x[u_h] = 0 and contains the arbitrary parameters inherent in solutions of differential equations—for instance, if the linearly independent solutions are v and w, then u_h(x) = C₁v(x) + C₂w(x)—and u_i is any solution of the inhomogeneous DE. If we demand that u_h(a) = a₁ and u_h(b) = b₁, then u_i satisfies the system

L_x[u_i] = f(x),    u_i(a) = u_i(b) = 0,

which is of the type discussed in the preceding examples. Since L_x is a SOLDO, we can put all the machinery of Chapter 13 to work to obtain v(x), w(x), and therefore u_h(x). The problem then reduces to a DE for which the BCs are homogeneous; that is, the value of the solution and/or its derivative is zero at the boundary points.

20.1.5. Example. Let us assume that L_x = d²/dx². Calculation of u_h is trivial:

L_x[u_h] = 0  ⟹  d²u_h/dx² = 0  ⟹  u_h(x) = C₁x + C₂.

To evaluate C₁ and C₂, we impose the BCs u_h(a) = a₁ and u_h(b) = b₁: C₁a + C₂ = a₁ and C₁b + C₂ = b₁. This gives C₁ = (b₁ − a₁)/(b − a) and C₂ = (a₁b − ab₁)/(b − a). The inhomogeneous equation defines a problem identical to that of Example 20.1.4. Thus, we can immediately write

u_i(x) = ∫ₐᵇ G(x, y) f(y) dy,

where G(x, y) is as given in that example. Thus, the general solution is

u(x) = [(b₁ − a₁)/(b − a)] x + (a₁b − ab₁)/(b − a) + ∫ₐˣ (x − y) f(y) dy + [(x − a)/(b − a)] ∫ₐᵇ (y − b) f(y) dy. ■

Example 20.1.5 shows that an inhomogeneous DE with inhomogeneous BCs can be separated into two DEs, one homogeneous with inhomogeneous BCs and the other inhomogeneous with homogeneous BCs, the latter being appropriate for the GF. Furthermore, all the foregoing examples indicate that solutions of DEs can be succinctly written in terms of GFs that automatically incorporate the BCs as long as the BCs are homogeneous. Can a GF also give the solution to a DE with inhomogeneous BCs?

20.2 Formal Considerations

The discussion and examples of the preceding section hint at the power of Green's functions.
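Returning for a moment to Example 20.1.5, its closed-form solution can be checked numerically. The values a, b, a₁, b₁ and the choice f = 1 below are hypothetical test data; the exact solution of u'' = 1 with u(0) = 2, u(1) = −1 is u(x) = x²/2 − 7x/2 + 2.

```python
import numpy as np

# Example 20.1.5 formula: u = u_h + int_a^x (x-y) f dy
#                            + (x-a)/(b-a) * int_a^b (y-b) f dy.
a, b, a1, b1 = 0.0, 1.0, 2.0, -1.0     # hypothetical test data
f = lambda t: np.ones_like(t)          # u'' = 1

y = np.linspace(a, b, 4001)
dy = y[1] - y[0]

def u(x):
    homog = (b1 - a1) / (b - a) * x + (a1 * b - a * b1) / (b - a)
    part1 = np.sum((x - y) * (y <= x) * f(y)) * dy
    part2 = (x - a) / (b - a) * np.sum((y - b) * f(y)) * dy
    return homog + part1 + part2

print(u(a), u(b))    # reproduces the inhomogeneous BCs a1 and b1
print(abs(u(0.4) - (0.4**2 / 2 - 3.5 * 0.4 + 2)))   # matches exact solution
```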
The elegance of such a function becomes apparent from the realization that it contains all the information about the solutions of a DE for any type of BCs, as we are about to show. Since GFs are inverses of DOs, let us briefly reexamine the inverse of an operator, which is closely tied to its spectrum. The question as
to whether or not an operator A in a finite-dimensional vector space is invertible is succinctly answered by the value of its determinant: A is invertible if and only if det A ≠ 0. In fact, as we saw at the beginning of Chapter 16, one translates the abstract operator equation A|u⟩ = |v⟩ into a matrix equation Au = v and reduces the question to that of the inverse of a matrix. This matrix takes on an especially simple form when A is diagonal, that is, when A_ij = λᵢδ_ij. For this special situation we have

λᵢuᵢ = vᵢ  for i = 1, 2, …, N (no sum over i).    (20.7)

This equation has a unique solution (for arbitrary vᵢ) if and only if λᵢ ≠ 0 for all i. In that case uᵢ = vᵢ/λᵢ for i = 1, 2, …, N. In particular, if vᵢ = 0 for all i, that is, when Equation (20.7) is homogeneous, the unique solution is the trivial solution. On the other hand, when some of the λᵢ are zero, there may be no solution to (20.7), but the homogeneous equation has a nontrivial solution (uᵢ need not be zero). Recalling (from Chapter 3) that an operator is invertible if and only if none of its eigenvalues is zero, we have the following:

20.2.1. Proposition. The operator A ∈ L(V) is invertible if and only if the homogeneous equation A|u⟩ = 0 has no nontrivial solutions.

In infinite-dimensional (Hilbert) spaces there is no determinant. How can we tell whether or not an operator in a Hilbert space is invertible? The exploitation of the connection between invertibility and eigenvalues has led to Proposition 20.2.1, which can be generalized to an operator acting on any vector space, finite or infinite. Consider the equation A|u⟩ = 0 in a Hilbert space H. In general, neither the domain nor the range of A is the whole of H. If A is invertible, then the only solution to the equation A|u⟩ = 0 is |u⟩ = 0.
Conversely, assuming that the equation has no nontrivial solution implies that the null space of A consists of only the zero vector. Thus, A|u⟩ = A|v⟩ implies A(|u⟩ − |v⟩) = 0, whence |u⟩ = |v⟩. This shows that A is injective (one-to-one), i.e., A is a bijective linear mapping from the domain of A, D(A), onto the range of A. Therefore, A must have an inverse.

The foregoing discussion can be expressed as follows. If A|u⟩ = 0, then (by the definition of eigenvectors) λ = 0 is an eigenvalue of A if and only if |u⟩ ≠ 0. Thus, if A|u⟩ = 0 has no nontrivial solution, then zero cannot be an eigenvalue of A. This can also be stated as follows:

20.2.2. Theorem. An operator A on a Hilbert space has an inverse if and only if λ = 0 is not an eigenvalue of A.

Green's functions are inverses of differential operators. Therefore, it is important to have a clear understanding of DOs. An nth-order linear differential operator (NOLDO) satisfies the following theorem (for a proof, see [Birk 78, Chapter 6]).
20.2.3. Theorem. Let

L_x = p_n(x) dⁿ/dxⁿ + p_{n−1}(x) dⁿ⁻¹/dxⁿ⁻¹ + ⋯ + p₁(x) d/dx + p₀(x),    (20.8)

where p_n(x) ≠ 0 in [a, b]. Let x₀ ∈ [a, b], let {γ_k}ⁿ_{k=1} be given numbers, and let f(x) be a given piecewise continuous function on [a, b]. Then the initial value problem (IVP)

L_x[u] = f  for x ∈ [a, b],  u(x₀) = γ₁, u'(x₀) = γ₂, …, u⁽ⁿ⁻¹⁾(x₀) = γₙ    (20.9)

has one and only one solution.

This is simply the existence and uniqueness theorem for a NOLDE. Equation (20.9) is referred to as the IVP with data {f(x); γ₁, …, γₙ}. This theorem is used to define L_x. Part of that definition are the BCs that the solutions to L_x must satisfy. A particularly important BC is the homogeneous one, in which γ₁ = γ₂ = ⋯ = γₙ = 0. In such a case it can be shown (see Problem 20.3) that the only solution of the homogeneous DE L_x[u] = 0 is u ≡ 0. Theorem 20.2.2 then tells us that L_x is invertible; that is, there is a unique operator G such that LG = 1. The "components" version of this last relation is part of the content of the next theorem.

20.2.4. Theorem. The DO L_x of Equation (20.8) associated with the IVP with data {f(x); 0, 0, …, 0} is invertible; that is, there exists a function G(x, y) such that

L_x G(x, y) = δ(x − y)/w(x).

The importance of homogeneous BCs can now be appreciated. Theorem 20.2.4 is the reason why we had to impose homogeneous BCs to obtain the GF in all the examples of the previous section.

The BCs in (20.9) clearly are not the only ones that can be used. The most general linear BCs encountered in differential operator theory are

R₁[u] ≡ α₁₁u(a) + ⋯ + α₁ₙu⁽ⁿ⁻¹⁾(a) + β₁₁u(b) + ⋯ + β₁ₙu⁽ⁿ⁻¹⁾(b) = γ₁,
R₂[u] ≡ α₂₁u(a) + ⋯ + α₂ₙu⁽ⁿ⁻¹⁾(a) + β₂₁u(b) + ⋯ + β₂ₙu⁽ⁿ⁻¹⁾(b) = γ₂,    (20.10)
  ⋮
Rₙ[u] ≡ αₙ₁u(a) + ⋯ + αₙₙu⁽ⁿ⁻¹⁾(a) + βₙ₁u(b) + ⋯ + βₙₙu⁽ⁿ⁻¹⁾(b) = γₙ.

The n row vectors {(αᵢ₁, …, αᵢₙ, βᵢ₁, …, βᵢₙ)}ⁿᵢ₌₁ are assumed to be linearly independent (in particular, no row is identically zero).
We refer to the Rᵢ as boundary functionals, because for each (sufficiently smooth) function u they give a number γᵢ. The
DO of (20.8) and the BCs of (20.10) together form a boundary value problem (BVP). The DE L_x[u] = f subject to the BCs of (20.10) is a BVP with data {f(x); γ₁, …, γₙ}. We note that the Rᵢ are linear; that is,

Rᵢ[u + v] = Rᵢ[u] + Rᵢ[v]  and  Rᵢ[αu] = αRᵢ[u].

Since L_x is also linear, we conclude that the superposition principle applies to the system consisting of L_x[u] = f and the BCs of (20.10), which is sometimes denoted by (L; R₁, …, Rₙ). If u satisfies the BVP with data {f; γ₁, …, γₙ} and v satisfies the BVP with data {g; μ₁, …, μₙ}, then αu + βv satisfies the BVP with data {αf + βg; αγ₁ + βμ₁, …, αγₙ + βμₙ}. It follows that if u and v both satisfy the BVP with data {f; γ₁, …, γₙ}, then u − v satisfies the BVP with data {0; 0, 0, …, 0}, which is called the completely homogeneous problem.

Unlike the IVP, the BVP with data {0; 0, 0, …, 0} may have a nontrivial solution. If the completely homogeneous problem has no nontrivial solution, then the BVP with data {f; γ₁, …, γₙ} has at most one solution (a solution exists for any set of data). On the other hand, if the completely homogeneous problem has nontrivial solutions, then the BVP with data {f; γ₁, …, γₙ} either has no solutions or has more than one solution (see [Stak 79, pp. 203–204]).

Recall that when a differential (unbounded) operator L_x acts in a Hilbert space, such as L²_w(a, b), it acts only on its domain. In the context of the present discussion, this means that not all functions in L²_w(a, b) satisfy the BCs necessary for defining L_x. Thus, the functions for which the operator is defined (those that satisfy the BCs) form a subset of L²_w(a, b), which we called the domain of L_x and denoted by D(L_x). From a formal standpoint it is important to distinguish among maps that have different domains.
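The contrast between the two cases can be seen in a finite-difference sketch (the operator d²/dx² + 1 and the grid size are illustrative choices): the completely homogeneous problem u'' + u = 0, u(0) = u(L) = 0 has the nontrivial solution sin x when L = π, and correspondingly the discretized operator is nearly singular on [0, π] but comfortably invertible on [0, 1].

```python
import numpy as np

# Smallest singular value of the discretized BVP operator d^2/dx^2 + 1
# with Dirichlet BCs u(0) = u(L) = 0 (interior points only).
def min_singular_value(L, n=400):
    h = L / (n + 1)
    A = (np.diag(-2 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1)) / h**2 + np.eye(n)
    return np.linalg.svd(A, compute_uv=False)[-1]

print(min_singular_value(np.pi))   # nearly zero: sin(x) solves the homogeneous BVP
print(min_singular_value(1.0))     # order 1: completely homogeneous problem is trivial
```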
For instance, the Hilbert–Schmidt integral operators, which are defined on a finite interval, are compact, while those defined on the entire real line are not.

20.2.5. Definition. Let L_x be the DO of Equation (20.8). Suppose there exists a DO L_x†, called the formal adjoint of L_x, with the property that

w[v*(L_x[u]) − u(L_x†[v])*] = (d/dx) Q[u, v*]  for u, v ∈ D(L_x) ∩ D(L_x†),

where Q[u, v*], called the conjunct of the functions u and v, depends on u, v, and their derivatives of order up to n − 1. If L_x† = L_x (without regard to the BCs imposed on their solutions), then L_x is said to be formally self-adjoint. If D(L_x†) ⊇ D(L_x) and L_x† = L_x on D(L_x), then L_x is said to be hermitian. If D(L_x†) = D(L_x) and L_x† = L_x, then L_x is said to be self-adjoint.

The relation given in the definition above involving the conjunct is a generalization of the Lagrange identity and can also be written in integral form:
∫ₐᵇ dx w {v*(L_x[u])} − ∫ₐᵇ dx w {u(L_x†[v])*} = Q[u, v*]|ₐᵇ.    (20.11)

This form is sometimes called the generalized Green's identity.

George Green (1793?–1841) was not appreciated in his lifetime. His date of birth is unknown (however, it is known that he was baptized on 14 July 1793), and no portrait of him survives. He left school, after only one year's attendance, to work in his father's bakery. When the father opened a windmill in Nottingham, the boy used an upper room as a study in which he taught himself physics and mathematics from library books.

In 1828, when he was thirty-five years old, he published his most important work, An Essay on the Application of Mathematical Analysis to the Theory of Electricity and Magnetism, at his own expense. In it Green apologized for any shortcomings in the paper due to his minimal formal education or the limited resources available to him, the latter being apparent in the few previous works he cited. The introduction explained the importance Green placed on the "potential" function. The body of the paper generalizes this idea to electricity and magnetism. In addition to the physics of electricity and magnetism, Green's first paper also contained the monumental mathematical contributions for which he is now famous: the relationship between surface and volume integrals we now call Green's theorem, and the Green's function, a ubiquitous solution to partial differential equations in almost every area of physics.

With little appreciation for the future impact of this work, one of Green's contemporaries declared the publication "a complete failure." The "Essay," which received little notice because of poor circulation, was saved by Lord Kelvin, who tracked it down in a German journal. When his father died in 1829, some of George's friends urged him to seek a college education.
After four years of self-study, during which he closed the gaps in his elementary education, Green was admitted to Caius College of Cambridge University at the age of 40, from which he graduated four years later after a disappointing performance on his final examinations. Later, however, he was appointed Perse Fellow of Caius College. Two years after his appointment he died, and his famous 1828 paper was republished, this time reaching a much wider audience. This paper has been described as "the beginning of mathematical physics in England."

He published only ten mathematical works. In 1833 he wrote three further papers. Two on electricity were published by the Cambridge Philosophical Society. One on hydrodynamics was published by the Royal Society of Edinburgh (of which he was a Fellow) in 1836. He also had two papers on hydrodynamics (in particular wave motion in canals), two papers on reflection and refraction of light, and two papers on reflection and refraction of sound published in Cambridge.

In 1923 the Green windmill was partially restored by a local businessman as a gesture of tribute to Green. Einstein came to pay homage. Then a fire in 1947 destroyed the renovations. Thirty years later the idea of a memorial was once again mooted, and sufficient money was raised to purchase the mill and present it to the sympathetic Nottingham City Council. In 1980 the George Green Memorial Appeal was launched to secure $20,000 to get the sails
turning again and the machinery working once more. Today, Green's restored mill stands as a mathematics museum in Nottingham.

20.2.1 Second-Order Linear DOs

Since second-order linear differential operators (SOLDOs) are sufficiently general for most physical applications, we will concentrate on them. Because homogeneous BCs are important in constructing Green's functions, let us first consider BCs of the form

R₁[u] ≡ α₁₁u(a) + α₁₂u'(a) + β₁₁u(b) + β₁₂u'(b) = 0,
R₂[u] ≡ α₂₁u(a) + α₂₂u'(a) + β₂₁u(b) + β₂₂u'(b) = 0,    (20.12)

where it is assumed, as usual, that (α₁₁, α₁₂, β₁₁, β₁₂) and (α₂₁, α₂₂, β₂₁, β₂₂) are linearly independent. If we define the inner product as an integral with weight w, Equation (20.11) can be formally written as

⟨v| L |u⟩ = ⟨u| L† |v⟩* + Q[u, v*]|ₐᵇ.

This would coincide with the usual definition of the adjoint if the surface term vanishes, that is, if

Q[u, v*]|_{x=b} = Q[u, v*]|_{x=a}.

For this to happen, we need to impose BCs on v. To find these BCs, let us rewrite Equation (20.12) in a more compact form. Linear independence of the two row vectors of coefficients implies that the 2 × 4 matrix of coefficients has rank two. This means that the 2 × 4 matrix has an invertible 2 × 2 submatrix. By rearranging the terms in Equation (20.12) if necessary, we can assume that the second of the two 2 × 2 submatrices is invertible. The homogeneous BCs can then be conveniently written as

R[u] = (R₁[u], R₂[u])ᵗ = (A B)(u_a, u_b)ᵗ = A u_a + B u_b = 0,    (20.13)

where

u_a ≡ (u(a), u'(a))ᵗ,  u_b ≡ (u(b), u'(b))ᵗ,  B ≡ (β₁₁ β₁₂; β₂₁ β₂₂),    (20.14)

and B is invertible.
The most general form of the conjunct for a SOLDO is

Q[u, v*](x) ≡ q₁₁(x)u(x)v*(x) + q₁₂(x)u(x)v'*(x) + q₂₁(x)u'(x)v*(x) + q₂₂(x)u'(x)v'*(x),

which can be written in matrix form as

Q[u, v*](x) = u_xᵗ Q_x v*_x,  where  Q_x = (q₁₁(x) q₁₂(x); q₂₁(x) q₂₂(x)),    (20.15)

and u_x and v*_x have definitions similar to those of u_a and u_b above. The vanishing of the surface term becomes

u_bᵗ Q_b v*_b = u_aᵗ Q_a v*_a.    (20.16)

We need to translate this equation into a condition on v* alone.² This is accomplished by solving for two of the four quantities u(a), u'(a), u(b), and u'(b) in terms of the other two, substituting the result in Equation (20.16), and setting the coefficients of the other two equal to zero. Let us assume, as before, that the submatrix B is invertible, i.e., u(b) and u'(b) are expressible in terms of u(a) and u'(a). Then u_b = −B⁻¹A u_a, or u_bᵗ = −u_aᵗ Aᵗ(Bᵗ)⁻¹, and we obtain

−u_aᵗ Aᵗ(Bᵗ)⁻¹ Q_b v*_b = u_aᵗ Q_a v*_a  ⟹  u_aᵗ [Aᵗ(Bᵗ)⁻¹ Q_b v*_b + Q_a v*_a] = 0,

and the condition on v* becomes

Aᵗ(Bᵗ)⁻¹ Q_b v*_b + Q_a v*_a = 0.    (20.17)

We see that all factors of u have disappeared, as they should. The expanded version of the BCs on v*, with coefficients determined by (20.17), is written as

B₁[v*] ≡ α'₁₁v*(a) + α'₁₂v'*(a) + β'₁₁v*(b) + β'₁₂v'*(b) = 0,
B₂[v*] ≡ α'₂₁v*(a) + α'₂₂v'*(a) + β'₂₁v*(b) + β'₂₂v'*(b) = 0.    (20.18)

These homogeneous BCs are said to be adjoint to those of (20.12). Because of the difference between BCs and their adjoints, the domain of a differential operator need not be the same as that of its adjoint.

20.2.6. Example. Let L_x = d²/dx² with the homogeneous BCs

R₁[u] = αu(a) − u'(a) = 0  and  R₂[u] = βu(b) − u'(b) = 0.    (20.19)

We want to calculate Q[u, v*] and the adjoint BCs for v. By repeated integration by parts [or by using Equation (13.23)], we obtain Q[u, v*] = u'v* − uv'*. For the surface term to vanish, we must have

u'(a)v*(a) − u(a)v'*(a) = u'(b)v*(b) − u(b)v'*(b).

²The boundary conditions on v* should not depend on the choice of u.
Substituting from (20.19) in this equation, we get

u(a)[αv*(a) − v'*(a)] = u(b)[βv*(b) − v'*(b)],

which holds for arbitrary u if and only if

B₁[v*] = αv*(a) − v'*(a) = 0  and  B₂[v*] = βv*(b) − v'*(b) = 0.    (20.20)

This is a special case, in which the adjoint BCs are the same as the original BCs (substitute u for v* to see this). To see that the original BCs and their adjoints need not be the same, we consider

R₁[u] = u'(a) − αu(b) = 0  and  R₂[u] = βu(a) − u'(b) = 0,    (20.21)

from which we obtain u(a)[βv*(b) + v'*(a)] = u(b)[αv*(a) + v'*(b)]. Thus,

B₁[v*] = αv*(a) + v'*(b) = 0  and  B₂[v*] = βv*(b) + v'*(a) = 0,    (20.22)

which is not the same as (20.21). Boundary conditions such as those in (20.19) and (20.20), in which each equation contains the function and its derivative evaluated at the same point, are called unmixed BCs. On the other hand, (20.21)