Classical Mechanics

   Joel A. Shapiro

    April 21, 2003

Copyright © 1994, 1997 by Joel A. Shapiro
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted in any form or by any
means, electronic, mechanical, photocopying, or otherwise, without the
prior written permission of the author.

    This is a preliminary version of the book, not to be considered a
fully published edition. While some of the material, particularly the
first four chapters, is close to readiness for a first edition, chapters 6
and 7 need more work, and chapter 8 is incomplete. The appendices
are random selections not yet reorganized. There are also as yet few
exercises for the later chapters. The first edition will have an adequate
set of exercises for each chapter.

   The author welcomes corrections, comments, and criticism.
Contents

1 Particle Kinematics                                                 1
  1.1 Introduction                                                    1
  1.2 Single Particle Kinematics                                      4
      1.2.1 Motion in configuration space                             4
      1.2.2 Conserved Quantities                                      6
  1.3 Systems of Particles                                            9
      1.3.1 External and internal forces                             10
      1.3.2 Constraints                                              14
      1.3.3 Generalized Coordinates for Unconstrained Systems        17
      1.3.4 Kinetic energy in generalized coordinates                19
  1.4 Phase Space                                                    21
      1.4.1 Dynamical Systems                                        22
      1.4.2 Phase Space Flows                                        27

2 Lagrange’s and Hamilton’s Equations                                37
  2.1 Lagrangian Mechanics                                           37
      2.1.1 Derivation for unconstrained systems                     38
      2.1.2 Lagrangian for Constrained Systems                       41
      2.1.3 Hamilton’s Principle                                     46
      2.1.4 Examples of functional variation                         48
      2.1.5 Conserved Quantities                                     50
      2.1.6 Hamilton’s Equations                                     53
      2.1.7 Velocity-dependent forces                                55

3 Two Body Central Forces                                            65
  3.1 Reduction to a one dimensional problem                         65
      3.1.1 Reduction to a one-body problem                          66
      3.1.2 Reduction to one dimension                               67
  3.2 Integrating the motion                                         69
      3.2.1 The Kepler problem                                       70
      3.2.2 Nearly Circular Orbits                                   74
  3.3 The Laplace-Runge-Lenz Vector                                  77
  3.4 The virial theorem                                             78
  3.5 Rutherford Scattering                                          79

4 Rigid Body Motion                                                  85
  4.1 Configuration space for a rigid body                           85
      4.1.1 Orthogonal Transformations                               87
      4.1.2 Groups                                                   91
  4.2 Kinematics in a rotating coordinate system                     94
  4.3 The moment of inertia tensor                                   98
      4.3.1 Motion about a fixed point                               98
      4.3.2 More General Motion                                     100
  4.4 Dynamics                                                      107
      4.4.1 Euler’s Equations                                       107
      4.4.2 Euler angles                                            113
      4.4.3 The symmetric top                                       117

5 Small Oscillations                                                127
  5.1 Small oscillations about stable equilibrium                   127
      5.1.1 Molecular Vibrations                                    130
      5.1.2 An Alternative Approach                                 137
  5.2 Other interactions                                            137
  5.3 String dynamics                                               138
  5.4 Field theory                                                  143

6 Hamilton’s Equations                                              147
  6.1 Legendre transforms                                           147
  6.2 Variations on phase curves                                    152
  6.3 Canonical transformations                                     153
  6.4 Poisson Brackets                                              155
  6.5 Higher Differential Forms                                     160
  6.6 The natural symplectic 2-form                                 169
      6.6.1 Generating Functions                                    172
  6.7 Hamilton–Jacobi Theory                                        181
  6.8 Action-Angle Variables                                        185

7 Perturbation Theory                                               189
  7.1 Integrable systems                                            189
  7.2 Canonical Perturbation Theory                                 194
      7.2.1 Time Dependent Perturbation Theory                      196
  7.3 Adiabatic Invariants                                          198
      7.3.1 Introduction                                            198
      7.3.2 For a time-independent Hamiltonian                      198
      7.3.3 Slow time variation in H(q, p, t)                       200
      7.3.4 Systems with Many Degrees of Freedom                    206
      7.3.5 Formal Perturbative Treatment                           209
  7.4 Rapidly Varying Perturbations                                 211
  7.5 New approach                                                  216

8 Field Theory                                                      219
  8.1 Noether’s Theorem                                             225

A εijk and cross products                                           229
  A.1 Vector Operations                                             229
      A.1.1 δij and εijk                                            229

B The gradient operator                                             233

C Gradient in Spherical Coordinates                                 237
Chapter 1

Particle Kinematics

1.1      Introduction
Classical mechanics, narrowly defined, is the investigation of the motion
of systems of particles in Euclidean three-dimensional space, under the
influence of specified force laws, with the motion’s evolution determined
by Newton’s second law, a second order differential equation. That
is, given certain laws determining physical forces, and some boundary
conditions on the positions of the particles at some particular times, the
problem is to determine the positions of all the particles at all times.
We will be discussing motions under specific fundamental laws of great
physical importance, such as Coulomb’s law for the electrostatic force
between charged particles. We will also discuss laws which are less
fundamental, because the motion under them can be solved explicitly,
allowing them to serve as very useful models for approximations to more
complicated physical situations, or as a testbed for examining concepts
in an explicitly evaluatable situation. Techniques suitable for broad
classes of force laws will also be developed.
    The formalism of Newtonian classical mechanics, together with in-
vestigations into the appropriate force laws, provided the basic frame-
work for physics from the time of Newton until the beginning of this
century. The systems considered had a wide range of complexity. One
might consider a single particle on which the Earth’s gravity acts. But
one could also consider systems as the limit of an infinite number of


very small particles, with displacements smoothly varying in space,
which gives rise to the continuum limit. One example of this is the
consideration of transverse waves on a stretched string, in which every
point on the string has an associated degree of freedom, its transverse
displacement.
    The scope of classical mechanics was broadened in the 19th century,
in order to consider electromagnetism. Here the degrees of freedom
were not just the positions in space of charged particles, but also other
quantities, distributed throughout space, such as the electric field
at each point. This expansion in the type of degrees of freedom has
continued, and now in fundamental physics one considers many degrees
of freedom which correspond to no spatial motion, but one can still
discuss the classical mechanics of such systems.
    As a fundamental framework for physics, classical mechanics gave
way on several fronts to more sophisticated concepts in the early 1900’s.
Most dramatically, quantum mechanics has changed our focus from spe-
cific solutions for the dynamical degrees of freedom as a function of time
to the wave function, which determines the probabilities that a system
have particular values of these degrees of freedom. Special relativity
not only produced a variation of the Galilean invariance implicit in
Newton’s laws, but also is, at a fundamental level, at odds with the
basic ingredient of classical mechanics — that one particle can exert
a force on another, depending only on their simultaneous but different
positions. Finally general relativity brought out the narrowness of the
assumption that the coordinates of a particle are in a Euclidean space,
indicating instead not only that on the largest scales these coordinates
describe a curved manifold rather than a flat space, but also that this
geometry is itself a dynamical field.
    Indeed, most of 20th century physics goes beyond classical Newto-
nian mechanics in one way or another. As many readers of this book
expect to become physicists working at the cutting edge of physics re-
search, and therefore will need to go beyond classical mechanics, we
begin with a few words of justification for investing effort in under-
standing classical mechanics.
    First of all, classical mechanics is still very useful in itself, and not
just for engineers. Consider the problems (scientific — not political)
that NASA faces if it wants to land a rocket on a planet. This requires

an accuracy of predicting the position of both planet and rocket far
beyond what one gets assuming Kepler’s laws, which is the motion one
predicts by treating the planet as a point particle influenced only by
the Newtonian gravitational field of the Sun, also treated as a point
particle. NASA must consider other effects, and either demonstrate
that they are ignorable or include them into the calculations. These
include

   • multipole moments of the sun

   • forces due to other planets

   • effects of corrections to Newtonian gravity due to general relativ-
     ity

   • friction due to the solar wind and gas in the solar system

Learning how to estimate or incorporate such effects is not trivial.
    Secondly, classical mechanics is not a dead field of research — in
fact, in the last two decades there has been a great deal of interest in
“dynamical systems”. Attention has shifted from calculation of the or-
bit over fixed intervals of time to questions of the long-term stability of
the motion. New ways of looking at dynamical behavior have emerged,
such as chaos and fractal systems.
    Thirdly, the fundamental concepts of classical mechanics provide the
conceptual framework of quantum mechanics. For example, although
the Hamiltonian and Lagrangian were developed as sophisticated tech-
niques for performing classical mechanics calculations, they provide the
basic dynamical objects of quantum mechanics and quantum field the-
ory respectively. One view of classical mechanics is as a steepest path
approximation to the path integral which describes quantum mechan-
ics. This integral over paths is of a classical quantity depending on the
“action” of the motion.
    So classical mechanics is worth learning well, and we might as well
jump right in.

1.2      Single Particle Kinematics
We start with the simplest kind of system, a single unconstrained par-
ticle, free to move in three dimensional space, under the influence of a
force F .


1.2.1     Motion in configuration space
The motion of the particle is described by a function which gives its
position as a function of time. These positions are points in Euclidean
space. Euclidean space is similar to a vector space, except that there
is no special point which is fixed as the origin. It does have a met-
ric, that is, a notion of distance between any two points, D(A, B). It
also has the concept of a displacement A − B from one point B in the
Euclidean space to another, A. These displacements do form a vector
space, and for a three-dimensional Euclidean space, the vectors form
a three-dimensional real vector space R³, which can be given an orthonormal
basis such that the distance between A and B is given by
D(A, B) = (Σ_{i=1}^{3} [(A − B)_i]²)^{1/2}. Because the mathematics of vector spaces
is so useful, we often convert our Euclidean space to a vector space
by choosing a particular point as the origin. Each particle’s position
is then equated to the displacement of that position from the origin,
so that it is described by a position vector r relative to this origin.
But the origin has no physical significance unless it has been chosen
in some physically meaningful way. In general the multiplication of a
position vector by a scalar is as meaningless physically as saying that
42nd street is three times 14th street. The cartesian components of
the vector r, with respect to some fixed though arbitrary coordinate
system, are called the coordinates, cartesian coordinates in this case.
We shall find that we often (even usually) prefer to change to other sets
of coordinates, such as polar or spherical coordinates, but for the time
being we stick to cartesian coordinates.
    The motion of the particle is the function r(t) of time. Certainly
one of the central questions of classical mechanics is to determine, given
the physical properties of a system and some initial conditions, what
the subsequent motion is. The required “physical properties” is a spec-
ification of the force, F . The beginnings of modern classical mechanics

came with the realization, early in the 17th century, that the physics, or
dynamics, enters into the motion (or kinematics) through the force and its
effect on the acceleration, and not through any direct effect of dynamics
on the position or velocity of the particle.
    Most likely the force will depend on the position of the particle, say
for a particle in the gravitational field of a fixed (heavy) source at the
origin, for which

                         F (r) = −(GMm/r³) r.                        (1.1)
But the force might also depend explicitly on time. For example, for
the motion of a spaceship near the Earth, we might assume that the
force is given by sum of the Newtonian gravitational forces of the Sun,
Moon and Earth. Each of these forces depends on the positions of the
corresponding heavenly body, which varies with time. The assumption
here is that the motion of these bodies is independent of the position of
the light spaceship. We assume someone else has already performed the
nontrivial problem of finding the positions of these bodies as functions
of time. Given that, we can write down the force the spaceship feels at
time t if it happens to be at position r,

               F (r, t) = −GmMS (r − RS (t))/|r − RS (t)|³
                          − GmME (r − RE (t))/|r − RE (t)|³
                          − GmMM (r − RM (t))/|r − RM (t)|³ .

Finally, the force might depend on the velocity of the particle, as for
example for the Lorentz force on a charged particle in electric and
magnetic fields

                F (r, v, t) = q E(r, t) + q v × B(r, t).             (1.2)

   However the force is determined, it determines the motion of the
particle through the second order differential equation known as New-
ton’s Second Law
                       F (r, v, t) = ma = m d²r/dt² .

As this is a second order differential equation, the solution depends in
general on two arbitrary (3-vector) parameters, which we might choose
to be the initial position and velocity, r(0) and v(0).
    For a given physical situation and a given set of initial conditions
for the particle, Newton’s laws determine the motion r(t), which is
a curve in configuration space parameterized by time t, known as
the trajectory in configuration space. If we consider the curve itself,
independent of how it depends on time, this is called the orbit of the
particle. For example, the orbit of a planet, in the approximation that
it feels only the field of a fixed sun, is an ellipse. That word does not
imply any information about the time dependence or parameterization
of the curve.
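Because Newton’s Second Law is second order, giving r(0) and v(0) determines the whole trajectory. As a minimal numerical sketch (not from the text; all values are illustrative), one can step a projectile in uniform gravity forward in time and compare with the closed-form trajectory r(t) = r(0) + v(0)t + ½gt²:

```python
# Illustrative sketch: integrate F = m d^2r/dt^2 for a projectile in
# uniform gravity, then compare with the analytic trajectory.
g = (0.0, -9.8)              # uniform gravitational acceleration
r = [0.0, 0.0]               # initial position r(0)
v = [3.0, 4.0]               # initial velocity v(0)
dt, steps = 1.0e-4, 10_000   # integrate from t = 0 to t = 1

for _ in range(steps):       # semi-implicit Euler: update v, then r
    v = [v[i] + g[i] * dt for i in range(2)]
    r = [r[i] + v[i] * dt for i in range(2)]

t = dt * steps               # analytic r(t) = r(0) + v(0) t + (1/2) g t^2
exact = [3.0 * t, 4.0 * t - 0.5 * 9.8 * t * t]
print(r, exact)              # agreement to roughly the step size dt
```

The two arbitrary 3-vector (here 2-vector) parameters of the general solution appear as the two lines initializing `r` and `v`.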

1.2.2     Conserved Quantities
While we tend to think of Newtonian mechanics as centered on New-
ton’s Second Law in the form F = ma, he actually started with the
observation that in the absence of a force, there was uniform motion.
We would now say that under these circumstances the momentum
p(t) is conserved, dp/dt = 0. In his second law, Newton stated the
effect of a force as producing a rate of change of momentum, which we
would write as
                              F = dp/dt,
rather than as producing an acceleration F = ma. In focusing on
the concept of momentum, Newton emphasized one of the fundamen-
tal quantities of physics, useful beyond Newtonian mechanics, in both
relativity and quantum mechanics¹. Only after using the classical relation
of momentum to velocity, p = mv, and the assumption that m is
constant, do we find the familiar F = ma.
    One of the principal tools in understanding the motion of many
systems is isolating those quantities which do not change with time. A
conserved quantity is a function of the positions and momenta, and
perhaps explicitly of time as well, Q(r, p, t), which remains unchanged
when evaluated along the actual motion, dQ(r(t), p(t), t)/dt = 0. A
   ¹ The relationship of momentum to velocity is changed in these extensions,
however.

function depending on the positions, momenta, and time is said to be
a function on extended phase space². When time is not included, the
space is called phase space. In this language, a conserved quantity is a
function on extended phase space with a vanishing total time derivative
along any path which describes the motion of the system.
    A single particle with no forces acting on it provides a very simple
example. As Newton tells us, ṗ = dp/dt = F = 0, so the momentum
is conserved. There are three more conserved quantities Q(r, p, t) :=
r(t) − tp(t)/m, which have a time rate of change dQ/dt = ṙ − p/m − tṗ/m =
0. These six independent conserved quantities are as many as one could
have for a system with a six dimensional phase space, and they com-
pletely solve for the motion. Of course this was a very simple system
to solve. We now consider a particle under the influence of a force.
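These conserved quantities are easy to exhibit numerically. The sketch below (illustrative one-dimensional values, not from the text) evaluates Q = r − tp/m along the uniform motion of a free particle and finds it constant, equal to the initial position:

```python
# Sketch: for a free particle, p is constant and Q = r - t p / m
# is also conserved; Q simply recovers the initial position r(0).
m, p = 1.5, 9.0                  # mass and (constant) momentum, 1-D
r0 = 2.0                         # initial position

def r_of_t(t):
    # uniform motion: r(t) = r(0) + (p / m) t
    return r0 + (p / m) * t

Q_values = [r_of_t(t) - t * p / m for t in (0.0, 1.0, 5.0, 10.0)]
print(Q_values)                  # every entry equals r0 = 2.0
```

Together with the three components of p, the three components of Q give the six constants that fix a point in the six dimensional phase space.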

Energy
Consider a particle under the influence of an external force F . In gen-
eral, the momentum will not be conserved, although if any cartesian
component of the force vanishes along the motion, that component of
the momentum will be conserved. Also the kinetic energy, defined as
T = ½ mv², will not in general be conserved, because

                        dT /dt = m v̇ · v = F · v.
As the particle moves from the point ri to the point rf the total change
in the kinetic energy is the work done by the force F ,

                        ∆T = ∫_{ri}^{rf} F · dr.
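The work-energy relation can be checked numerically. The sketch below (a one-dimensional spring force F = −kx with made-up parameters, not an example from the text) accumulates F · dr along an integrated trajectory and compares it with the change in kinetic energy:

```python
# Sketch: check ∆T = ∫ F · dr along the motion for F = -k x in 1-D.
m, k = 2.0, 5.0
x, v = 1.0, 0.0                # start at rest, displaced from equilibrium
dt = 1.0e-5
T0 = 0.5 * m * v * v           # initial kinetic energy
W = 0.0                        # accumulated work ∫ F · dr

for _ in range(100_000):       # integrate from t = 0 to t = 1
    F = -k * x
    v_new = v + (F / m) * dt
    x_new = x + v_new * dt
    W += F * (x_new - x)       # F · dr contribution of this step
    x, v = x_new, v_new

dT = 0.5 * m * v * v - T0
print(W, dT)                   # equal up to the integration error
```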


If the force law F (r, p, t) applicable to the particle is independent of
time and velocity, then the work done will not depend on how quickly
the particle moved along the path from ri to rf . If in addition the
work done is independent of the path taken between these points, so it
depends only on the endpoints, then the force is called a conservative
   ² Phase space is discussed further in section 1.4.

force and we associate with it potential energy

                 U (r) = U (r0 ) + ∫_{r}^{r0} F (r′ ) · dr′ ,
where r0 is some arbitrary reference position and U(r0 ) is an arbitrarily
chosen reference energy, which has no physical significance in ordinary
mechanics. U(r) represents the potential the force has for doing work
on the particle if the particle is at position r.
    The condition for the path integral to be independent of the path
is that it gives the same results along any two coterminous paths Γ1
and Γ2 , or alternatively that it give zero when evaluated along any
closed path such as Γ = Γ1 − Γ2 , the path consisting of following Γ1
and then taking Γ2 backwards to the starting point. By Stokes’ Theorem,
this line integral is equivalent to an integral over any surface S
bounded by Γ,

                     ∮_Γ F · dr = ∫_S (∇ × F ) · dS.

[Figure: independence of path, Γ1 = Γ2 , is equivalent to vanishing of
the path integral over closed paths Γ, which is in turn equivalent to
the vanishing of the curl on the surface whose boundary is Γ.]

Thus the requirement that the integral of F · dr vanish around any
closed path is equivalent to the requirement that the curl of F vanish
everywhere in space.
    By considering an infinitesimal path from r to r + ∆r, we see that

                 U (r + ∆r) − U (r) = −F · ∆r,    or
                 F (r) = −∇U (r).
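The relation F = −∇U is easy to test numerically. As a sketch (using the gravitational force of eq. (1.1), whose potential is U(r) = −GMm/r, with an arbitrary illustrative value of GMm), central differences on U reproduce the force components:

```python
import math

GMm = 3.7                              # arbitrary illustrative constant

def U(x, y, z):
    # potential energy for the force of eq. (1.1): U(r) = -GMm / r
    return -GMm / math.sqrt(x * x + y * y + z * z)

def F_exact(x, y, z):
    # F(r) = -GMm r / r^3
    r3 = (x * x + y * y + z * z) ** 1.5
    return (-GMm * x / r3, -GMm * y / r3, -GMm * z / r3)

def F_from_U(x, y, z, h=1e-6):
    # F = -grad U, approximated by central differences
    return (-(U(x + h, y, z) - U(x - h, y, z)) / (2 * h),
            -(U(x, y + h, z) - U(x, y - h, z)) / (2 * h),
            -(U(x, y, z + h) - U(x, y, z - h)) / (2 * h))

print(F_exact(1.0, 2.0, 2.0))
print(F_from_U(1.0, 2.0, 2.0))         # the components agree closely
```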
    The value of the concept of potential energy is that it enables finding
a conserved quantity, the total energy, in situations in which all forces
are conservative. Then the total energy E = T + U changes at a rate
               dE/dt = dT /dt + (dr/dt) · ∇U = F · v − v · F = 0.

The total energy can also be used in systems with both conservative
and nonconservative forces, giving a quantity whose rate of change is
determined by the work done only by the nonconservative forces. One
example of this usefulness is in the discussion of a slightly damped
harmonic oscillator driven by a periodic force near resonance. Then the
amplitude of steady-state motion is determined by a balance between
the average power input by the driving force and the average power
dissipated by friction, the two nonconservative forces in the problem,
without needing to worry about the work done by the spring.
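This power balance can be made concrete. In steady state x(t) = A cos(ωt − φ), with A = F0/√((k − mω²)² + (bω)²) and tan φ = bω/(k − mω²); the sketch below (standard results for m ẍ = −kx − bẋ + F0 cos ωt, with illustrative parameter values) checks that the average power input by the drive equals the average power dissipated by friction:

```python
import math

# Sketch: m x'' = -k x - b x' + F0 cos(w t); steady-state power balance.
m, k, b = 1.0, 4.0, 0.1                 # mass, spring, (small) damping
F0, w = 0.5, 1.9                        # drive amplitude and frequency

Z = math.hypot(k - m * w * w, b * w)    # response denominator
A = F0 / Z                              # steady-state amplitude
phi = math.atan2(b * w, k - m * w * w)  # phase lag of x behind the drive

P_in = 0.5 * F0 * A * w * math.sin(phi)  # average power from the drive
P_out = 0.5 * b * (A * w) ** 2           # average power lost to friction
print(A, P_in, P_out)                    # P_in equals P_out
```

The spring force drops out of the balance, just as the text asserts: its average power over a cycle of the steady-state motion is zero.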

Angular momentum
Another quantity which is often useful because it may be conserved is
the angular momentum. The definition requires a reference point in the
Euclidean space, say r0 . Then a particle at position r with momentum
p has an angular momentum about r0 given by L = (r − r0 ) × p.
Very often we take the reference point r0 to be the same as the point we
have chosen as the origin in converting the Euclidian space to a vector
space, so r0 = 0, and

        L = r × p,
        dL/dt = (dr/dt) × p + r × (dp/dt) = (1/m) p × p + r × F = 0 + τ = τ ,
where we have defined the torque about r0 as τ = (r − r0 ) × F in
general, and τ = r × F when our reference point r0 is at the origin.
   We see that if the torque τ (t) vanishes (at all times) the angular
momentum is conserved. This can happen not only if the force is zero,
but also if the force always points to the reference point. This is the
case in a central force problem such as motion of a planet about the
sun.
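This conservation is easy to exhibit numerically. The sketch below (made-up units, a simple symplectic integrator; not an example from the text) follows a planar orbit in the inverse-square force of eq. (1.1) and checks that Lz = (r × p)z stays constant:

```python
# Sketch: angular momentum conservation in a central inverse-square force.
GM, m = 1.0, 1.0                  # illustrative units with GM = m = 1
r = [1.0, 0.0]                    # planar orbit: position in the x-y plane
v = [0.0, 1.2]                    # initial velocity
dt = 1.0e-4

def L_z(r, v):
    # z-component of L = r x p with p = m v
    return m * (r[0] * v[1] - r[1] * v[0])

L0 = L_z(r, v)
for _ in range(20_000):           # semi-implicit Euler steps
    d3 = (r[0] ** 2 + r[1] ** 2) ** 1.5
    v = [v[i] - GM * r[i] / d3 * dt for i in range(2)]
    r = [r[i] + v[i] * dt for i in range(2)]

print(L0, L_z(r, v))              # unchanged: the force exerts no torque
```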


1.3      Systems of Particles
So far we have talked about a system consisting of only a single particle,
possibly influenced by external forces. Consider now a system of n
particles with positions ri , i = 1, . . . , n, in flat space. The configuration

of the system then has 3n coordinates (configuration space is R3n ), and
the phase space has 6n coordinates {ri , pi }.

1.3.1     External and internal forces
Let Fi be the total force acting on particle i. It is the sum of the forces
produced by each of the other particles and that due to any external
force. Let Fji be the force particle j exerts on particle i and let FiE be
the external force on particle i. Using Newton’s second law on particle
i, we have
                     Fi = FiE + Σj Fji = ṗi = mi v̇i ,

where mi is the mass of the i’th particle. Here we are assuming forces
have identifiable causes, which is the real meaning of Newton’s sec-
ond law, and that the causes are either individual particles or external
forces. Thus we are assuming there are no “three-body” forces which
are not simply the sum of “two-body” forces that one object exerts on
another.
    Define the center of mass and total mass
                 R = Σ mi ri / Σ mi ,        M = Σ mi .
Then if we define the total momentum
                                               d                       dR
              P =     pi =       mi vi =                mi ri = M         ,
                                               dt                      dt
we have
              dP/dt = Ṗ = Σ ṗi = Σi Fi = Σi FiE + Σij Fji .

Let us define F E = Σi FiE to be the total external force. If Newton’s
Third Law holds,

                 Fji = −Fij ,    so    Σij Fij = 0,    and

                               Ṗ = F E .                               (1.3)

Thus the internal forces cancel in pairs in their effect on the total mo-
mentum, which changes only in response to the total external force. As
an obvious but very important consequence³, the total momentum of an
isolated system is conserved.
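A short sketch (two particles coupled by a spring-like internal force, with made-up values; not an example from the text) illustrates the cancellation: because Fji = −Fij, the total momentum does not change even as the individual momenta do:

```python
# Sketch: internal forces obeying the Third Law leave P = p1 + p2 fixed.
m = [1.0, 3.0]                               # two masses
r = [[0.0, 0.0], [1.0, 0.0]]                 # positions in the plane
v = [[0.2, 0.5], [-0.1, 0.0]]                # velocities
k, dt = 2.0, 1.0e-3                          # internal coupling, time step

def total_P():
    return [m[0] * v[0][i] + m[1] * v[1][i] for i in range(2)]

P0 = total_P()
for _ in range(5_000):
    # force on particle 0 from particle 1; the reaction is its negative
    F = [k * (r[1][i] - r[0][i]) for i in range(2)]
    for i in range(2):
        v[0][i] += (F[i] / m[0]) * dt
        v[1][i] -= (F[i] / m[1]) * dt
        r[0][i] += v[0][i] * dt
        r[1][i] += v[1][i] * dt

print(P0, total_P())                         # total momentum is unchanged
```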
    The total angular momentum is also just a sum over the individual
particles, in this case of the individual angular momenta:

                        L = Σi Li = Σi ri × pi .

Its rate of change with time is

   dL/dt = L̇ = Σi vi × pi + Σi ri × Fi = 0 + Σi ri × FiE + Σij ri × Fji .

The total external torque is naturally defined as

                            τ = Σi ri × FiE ,

   ³ There are situations and ways of describing them in which the law of action
and reaction seems not to hold. For example, a current i1 flowing through a wire
segment ds1 contributes, according to the law of Biot and Savart, a magnetic field
dB = µ0 i1 ds1 × r/4π|r|³ at a point r away from the current element. If a current
i2 flows through a segment of wire ds2 at that point, it feels a force

                  F12 = (µ0 /4π) i1 i2 ds2 × (ds1 × r)/|r|³

due to element 1. On the other hand F21 is given by the same expression with ds1
and ds2 interchanged and the sign of r reversed, so

            F12 + F21 = (µ0 i1 i2 /4π|r|³) [ds1 (ds2 · r) − ds2 (ds1 · r)] ,

which is not generally zero.
  One should not despair for the validity of momentum conservation. The Law
of Biot and Savart only holds for time-independent current distributions. Unless
the currents form closed loops, there will be a charge buildup and Coulomb forces
need to be considered. If the loops are closed, the total momentum will involve
integrals over the two closed loops, for which    F12 + F21 can be shown to vanish.
More generally, even the sum of the momenta of the current elements is not the
whole story, because there is momentum in the electromagnetic field, which will be
changing in the time-dependent situation.
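To make the footnote's claim concrete, here is a small numerical check (a Python sketch of our own; the element directions, currents, and separation are arbitrary choices) showing that F12 + F21 need not vanish for a pair of isolated current elements:

```python
# Hedged illustration: Biot-Savart forces between two isolated current
# elements, showing F12 + F21 is not zero.  All numbers are arbitrary.

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

mu0_over_4pi = 1e-7      # mu0/4pi in SI units
i1 = i2 = 1.0            # currents (amperes)

ds1 = (1e-3, 0.0, 0.0)   # element 1 along x
ds2 = (0.0, 1e-3, 0.0)   # element 2 along y
r = (1.0, 0.0, 0.0)      # element 2 sits 1 m from element 1, along ds1
r3 = sum(c*c for c in r) ** 1.5

# F12: force on element 2 due to element 1
F12 = tuple(mu0_over_4pi*i1*i2*c/r3 for c in cross(ds2, cross(ds1, r)))
# F21: same expression with ds1 <-> ds2 and r -> -r
mr = tuple(-c for c in r)
F21 = tuple(mu0_over_4pi*i1*i2*c/r3 for c in cross(ds1, cross(ds2, mr)))

total = tuple(a + b for a, b in zip(F12, F21))
print(F12, F21, total)   # total is nonzero
```

Here F12 vanishes (the elements lie along their separation) while F21 does not, exactly the asymmetry the footnote describes.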
12                          CHAPTER 1. PARTICLE KINEMATICS

so we might ask if the last term vanishes due to the Third Law, which
permits us to rewrite Fji = ½ (Fji − Fij ). Then the last term becomes

                   Σij ri × Fji = ½ Σij ri × Fji − ½ Σij ri × Fij
                                = ½ Σij ri × Fji − ½ Σij rj × Fji
                                = ½ Σij (ri − rj ) × Fji .

This is not automatically zero, but vanishes if one assumes a stronger
form of the Third Law, namely that the action and reaction forces be-
tween two particles act along the line of separation of the particles.
If the force law is independent of velocity and rotationally and trans-
lationally symmetric, there is no other direction for it to point. For
spinning particles and magnetic forces the argument is not so simple
— in fact electromagnetic forces between moving charged particles are
really only correctly viewed in a context in which the system includes
not only the particles but also the fields themselves. For such a system,
in general the total energy, momentum, and angular momentum of the
particles alone will not be conserved, because the fields can carry all
of these quantities. But properly defining the energy, momentum, and
angular momentum of the electromagnetic fields, and including them in
the totals, will result in quantities conserved as a result of symmetries
of the underlying physics. This is further discussed in section 8.1.
    Making the assumption that the strong form of Newton’s Third Law
holds, we have shown that

                                τ = dL/dt .                              (1.4)
    The conservation laws are very useful because they permit algebraic
solution for part of the velocity. Taking a single particle as an example,
if E = ½ mv² + U(r) is conserved, the speed |v(t)| is determined at all
times (as a function of r) by one arbitrary constant E. Similarly if
L is conserved, the components of v which are perpendicular to r are
determined in terms of the fixed constant L. With both conserved, v
is completely determined except for the sign of the radial component.
Examples of the usefulness of conserved quantities are everywhere, and
will be particularly clear when we consider the two body central force
problem later. But first we continue our discussion of general systems
of particles.
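As a concrete (if artificial) illustration of this algebraic determination, here is a short Python sketch; the mass, potential, and the values of E and L are our own arbitrary choices:

```python
import math

# Sketch: with E = (1/2) m v^2 + U(r) conserved, |v| follows from r alone;
# with L conserved, so does the component of v perpendicular to r.
m = 2.0
U = lambda r: -1.0 / r      # an attractive 1/r potential (arbitrary choice)
E = -0.25                   # conserved total energy
L = 1.0                     # conserved angular momentum magnitude

r = 1.5
speed = math.sqrt(2.0 * (E - U(r)) / m)   # |v| at this radius
v_perp = L / (m * r)                      # component of v perpendicular to r
# the radial component is then fixed up to its sign:
v_rad = math.sqrt(speed**2 - v_perp**2)
print(speed, v_perp, v_rad)
```

Only the sign of v_rad remains undetermined, as stated in the text.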
    As we mentioned earlier, the total angular momentum depends on
the point of evaluation, that is, the origin of the coordinate system
used. We now show that it consists of two contributions, the angular
momentum about the center of mass and the angular momentum of
a fictitious point object located at the center of mass. Let r′i be the
position of the i’th particle with respect to the center of mass, so r′i =
ri − R. Then

           L = Σi mi ri × vi = Σi mi (r′i + R) × (ṙ′i + Ṙ)
             = Σi mi r′i × ṙ′i + Σi mi r′i × Ṙ
                  + R × Σi mi ṙ′i + M R × Ṙ
             = Σi r′i × p′i + R × P.


Here we have noted that Σi mi r′i = 0, and also its derivative Σi mi v′i =
0. We have defined p′i = mi v′i , the momentum in the center of mass
reference frame. The first term of the final form is the sum of the
angular momenta of the particles about their center of mass, while the
second term is the angular momentum the system would have if it were
collapsed to a point at the center of mass.
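This decomposition is easy to verify numerically. The following Python sketch (with an arbitrary made-up two-particle system) checks that the total angular momentum equals the sum of the two terms:

```python
def cross(a, b):
    return [a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0]]

masses = [1.0, 3.0]                                  # arbitrary values
rs = [[1.0, 0.0, 0.0], [0.0, 2.0, 1.0]]              # positions
vs = [[0.0, 1.0, 0.0], [1.0, 0.0, 2.0]]              # velocities

M = sum(masses)
R = [sum(m*r[i] for m, r in zip(masses, rs))/M for i in range(3)]
V = [sum(m*v[i] for m, v in zip(masses, vs))/M for i in range(3)]
P = [M*V[i] for i in range(3)]

# total angular momentum, computed directly
L_tot = [0.0, 0.0, 0.0]
for m, r, v in zip(masses, rs, vs):
    c = cross(r, [m*vi for vi in v])
    L_tot = [a+b for a, b in zip(L_tot, c)]

# angular momentum about the center of mass, plus the CM-point term
L_cm = [0.0, 0.0, 0.0]
for m, r, v in zip(masses, rs, vs):
    rp = [r[i]-R[i] for i in range(3)]
    pp = [m*(v[i]-V[i]) for i in range(3)]
    L_cm = [a+b for a, b in zip(L_cm, cross(rp, pp))]
L_decomp = [a+b for a, b in zip(L_cm, cross(R, P))]

print(L_tot, L_decomp)   # the two agree
```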
    What about the total energy? The kinetic energy

           T = ½ Σi mi vi² = ½ Σi mi (v′i + V) · (v′i + V)
             = ½ Σi mi v′i² + ½ MV²,                                  (1.5)

where the cross term vanishes, once again, because Σi mi v′i = 0. Thus
the kinetic energy of the system can also be viewed as the sum of the
kinetic energies of the constituents about the center of mass, plus the
kinetic energy the system would have if it were collapsed to a particle
at the center of mass.
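The same kind of numerical check works for the kinetic energy; this Python sketch (with an arbitrary two-particle system of our own) verifies eq. (1.5):

```python
masses = [1.0, 3.0]                                  # arbitrary values
vs = [[0.0, 1.0, 0.0], [1.0, 0.0, 2.0]]              # velocities

M = sum(masses)
V = [sum(m*v[i] for m, v in zip(masses, vs))/M for i in range(3)]

T_direct = sum(0.5*m*sum(c*c for c in v) for m, v in zip(masses, vs))
T_internal = sum(0.5*m*sum((v[i]-V[i])**2 for i in range(3))
                 for m, v in zip(masses, vs))
T_cm = 0.5*M*sum(c*c for c in V)

print(T_direct, T_internal + T_cm)   # equal
```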
    If the forces on the system are due to potentials, the total energy
will be conserved, but this includes not only the potential due to the
external forces but also that due to interparticle forces, Uij (ri , rj ).
In general this contribution will not be zero or even constant with
time, and the internal potential energy will need to be considered. One
exception to this is the case of a rigid body.

1.3.2     Constraints
A rigid body is defined as a system of n particles for which all the
interparticle distances are constrained to fixed constants, |ri − rj | = cij ,
and the interparticle potentials are functions only of these interparticle
distances. As these distances do not vary, neither does the internal
potential energy. These interparticle forces cannot do work, and the
internal potential energy may be ignored.
    The rigid body is an example of a constrained system, in which the
general 3n degrees of freedom are restricted by some forces of constraint
which place conditions on the coordinates ri , perhaps in conjunction
with their momenta. In such descriptions we do not wish to consider
or specify the forces themselves, but only their (approximate) effect.
The forces are assumed to be whatever is necessary to have that ef-
fect. It is generally assumed, as in the case with the rigid body, that
the constraint forces do no work under displacements allowed by the
constraints. We will consider this point in more detail later.
    If the constraints can be phrased so that they are on the coordinates
and time only, as Φi (r1 , ...rn , t) = 0, i = 1, . . . , k, they are known as
holonomic constraints. These constraints determine hypersurfaces
in configuration space to which all motion of the system is confined.
In general this hypersurface forms a 3n − k dimensional manifold. We
might describe the configuration point on this manifold in terms of
3n − k generalized coordinates, qj , j = 1, . . . , 3n − k, so that the 3n − k
variables qj , together with the k constraint conditions Φi ({ri }) = 0,
determine the ri = ri (q1 , . . . , q3n−k , t).
    The constrained subspace of configuration space need not be a flat
space. Consider, for example, a mass on one end of a rigid light rod


of length L, the other end of which is fixed to be at the origin r = 0,
though the rod is completely free to rotate. Clearly the possible values
of the cartesian coordinates r of the position of the mass satisfy the
constraint |r| = L, so r lies on the surface of a sphere of radius L. We
might choose as generalized coordinates the standard spherical angles
θ and φ. Thus the constrained subspace is two dimensional but not
flat; rather it is the surface of a sphere, which mathematicians call S².
It is natural to reexpress the dynamics in terms of θ and φ.

[Figure: Generalized coordinates (θ, φ) for a particle constrained to lie
on a sphere.]


    The use of generalized (non-cartesian) coordinates is not just for
constrained systems. The motion of a particle in a central force field
about the origin, with a potential U(r) = U(|r|), is far more naturally
described in terms of spherical coordinates r, θ, and φ than in terms of
x, y, and z.


    Before we pursue a discussion of generalized coordinates, it must be
pointed out that not all constraints are holonomic. The standard ex-
ample is a disk of radius R, which rolls on a fixed horizontal plane. It is
constrained to always remain vertical, and also to roll without slipping
on the plane. As coordinates we can choose the x and y of the center of
the disk, which are also the x and y of the contact point, together with
the angle a fixed line on the disk makes with the downward direction,
φ, and the angle the axis of the disk makes with the x axis, θ.
    As the disk rolls through an angle dφ, the point of contact moves
a distance Rdφ in a direction depending on θ,

           Rdφ sin θ = dx
           Rdφ cos θ = dy.

Dividing by dt, we get two constraints involving the positions and
velocities,

           Φ1 := Rφ̇ sin θ − ẋ = 0
           Φ2 := Rφ̇ cos θ − ẏ = 0.

[Figure: A vertical disk free to roll on a plane. A fixed line on the disk
makes an angle of φ with respect to the vertical, and the axis of the
disk makes an angle θ with the x-axis. The long curved path is the
trajectory of the contact point. The three small paths are alternate
trajectories illustrating that x, y, and φ can each be changed without
any net change in the other coordinates.]

The fact that these involve velocities does not automatically make
them nonholonomic. In the simpler one-dimensional problem in which
the disk is confined to the yz plane, rolling along x = 0 (θ = 0), we
would have only the coordinates φ and y, with the rolling constraint
Rφ̇ − ẏ = 0. But this constraint can be integrated, Rφ(t) − y(t) = c,
for some constant c, so that it becomes a constraint among just the
coordinates, and is holonomic. This cannot be done with the two-
dimensional problem. We can see that there is no constraint among
the four coordinates themselves because each of them can be changed
by a motion which leaves the others unchanged. Rotating θ without
moving the other coordinates is straightforward. By rolling the disk
along each of the three small paths shown to the right of the disk, we
can change one of the variables x, y, or φ, respectively, with no net
change in the other coordinates. Thus all values of the coordinates⁴
can be achieved in this fashion.
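The nonintegrability can also be seen numerically. In this Python sketch (our own construction), we take the disk around a closed circuit: φ and θ return to their initial values, yet x and y do not, so no relation among the coordinates alone can hold:

```python
import math

R = 1.0
x = y = 0.0

def roll(dphi, theta, steps=1000):
    # integrate dx = R dphi sin(theta), dy = R dphi cos(theta)
    global x, y
    h = dphi / steps
    for _ in range(steps):
        x += R * h * math.sin(theta)
        y += R * h * math.cos(theta)

roll(+1.0, theta=0.0)            # roll forward with the axis at theta = 0
# pivot the axis to theta = pi/2: no rolling, so x and y are unchanged
roll(-1.0, theta=math.pi/2)      # roll back through the same angle
# pivot the axis back to theta = 0

# phi and theta have returned to their starting values, but:
print(x, y)   # approximately (-1, 1): a net displacement
```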

   ⁴Thus the configuration space is x ∈ R, y ∈ R, θ ∈ [0, 2π) and φ ∈ [0, 2π),
    There are other, less interesting, nonholonomic constraints given by
inequalities rather than constraint equations. A bug sliding down a
bowling ball obeys the constraint |r| ≥ R. Such problems are solved by
considering the constraint with an equality (|r| = R), but restricting
the region of validity of the solution by an inequality on the constraint
force (N ≥ 0), and then supplementing with the unconstrained problem
once the bug leaves the surface.
    In quantum field theory, anholonomic constraints which are func-
tions of the positions and momenta are further subdivided into first
and second class constraints ` la Dirac, with the first class constraints
                               a
leading to local gauge invariance, as in Quantum Electrodynamics or
Yang-Mills theory. But this is heading far afield.

1.3.3      Generalized Coordinates for Unconstrained
           Systems
Before we get further into constrained systems and D’Alembert’s Prin-
ciple, we will discuss the formulation of a conservative unconstrained
system in generalized coordinates. Thus we wish to use 3n general-
ized coordinates qj , which, together with time, determine all of the 3n
cartesian coordinates ri :
                              ri = ri (q1 , ..., q3n , t).
Notice that this is a relationship between different descriptions of the
same point in configuration space, and the functions ri ({q}, t) are in-
dependent of the motion of any particle. We are assuming that the ri
and the qj are each a complete set of coordinates for the space, so the
q’s are also functions of the {ri } and t:
                              qj = qj (r1 , ..., rn , t).
The t dependence permits there to be an explicit dependence of this
relation on time, as we would have, for example, in relating a rotating
coordinate system to an inertial cartesian one.
or, if we allow more carefully for the continuity as θ and φ go through 2π, the
more accurate statement is that configuration space is R² × (S¹)², where S¹ is the
circumference of a circle, θ ∈ [0, 2π], with the requirement that θ = 0 is equivalent
to θ = 2π.
    Let us change the cartesian coordinate notation slightly, with {xk }
the 3n cartesian coordinates of the n 3-vectors ri , deemphasizing the
division of these coordinates into triplets.
    A small change in the coordinates of a particle in configuration
space, whether an actual change over a small time interval dt or a
“virtual” change between where a particle is and where it might have
been under slightly altered circumstances, can be described by a set of
δxk or by a set of δqj . If we are talking about a virtual change at the
same time, these are related by the chain rule
      δxk = Σj (∂xk/∂qj ) δqj ,    δqj = Σk (∂qj/∂xk ) δxk ,    (for δt = 0).    (1.6)

For the actual motion through time, or any variation where δt is not
assumed to be zero, we need the more general form,
      δxk = Σj (∂xk/∂qj ) δqj + (∂xk/∂t) δt,    δqj = Σk (∂qj/∂xk ) δxk + (∂qj/∂t) δt.   (1.7)
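As a sanity check on (1.6), the two Jacobian matrices appearing there are inverses of one another. A small Python sketch for plane polar coordinates (the sample point is an arbitrary choice of ours):

```python
import math

r, th = 1.3, 0.4                       # an arbitrary point

J = [[math.cos(th), -r*math.sin(th)],  # dx_k/dq_j for x1 = r cos(th),
     [math.sin(th),  r*math.cos(th)]]  #             x2 = r sin(th)

x1, x2 = r*math.cos(th), r*math.sin(th)
K = [[x1/r,      x2/r],                # dq_j/dx_k for r = sqrt(x1^2+x2^2),
     [-x2/r**2,  x1/r**2]]             #             th = atan2(x2, x1)

prod = [[sum(K[i][k]*J[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
print(prod)   # the identity matrix, to rounding
```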
    A virtual displacement, with δt = 0, is the kind of variation we need
to find the forces described by a potential. Thus the force is
     Fk = − ∂U({x})/∂xk = − Σj (∂U({x({q})})/∂qj ) (∂qj/∂xk ) = Σj (∂qj/∂xk ) Qj ,   (1.8)

where
                Qj := Σk Fk (∂xk/∂qj ) = − ∂U({x({q})})/∂qj                  (1.9)
is known as the generalized force. We may think of Ũ(q, t) :=
U(x(q), t) as a potential in the generalized coordinates {q}. Note that
if the coordinate transformation is time-dependent, it is possible that
a time-independent potential U(x) will lead to a time-dependent po-
tential Ũ(q, t), and a system with forces described by a time-dependent
potential is not conservative.
    The definition in (1.9) of the generalized force Qj holds even if the
cartesian force is not described by a potential.
    The qk do not necessarily have units of distance. For example,
one qk might be an angle, as in polar or spherical coordinates. The
corresponding component of the generalized force will have the units of
energy and we might consider it a torque rather than a force.
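For instance, in plane polar coordinates the θ-component of the generalized force is exactly the z-component of the torque about the origin. A Python sketch (the force components and the point are arbitrary choices of ours):

```python
import math

r, th = 2.0, 0.7            # an arbitrary point, x = r cos(th), y = r sin(th)
Fx, Fy = 0.3, -1.1          # arbitrary cartesian force components

x, y = r*math.cos(th), r*math.sin(th)

# Q_j = sum_k F_k dx_k/dq_j, with (q1, q2) = (r, theta)
Q_r = Fx*math.cos(th) + Fy*math.sin(th)            # radial component of F
Q_th = Fx*(-r*math.sin(th)) + Fy*(r*math.cos(th))  # has units of energy

print(Q_th, x*Fy - y*Fx)    # Q_theta equals the torque about the origin
```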
1.3.4        Kinetic energy in generalized coordinates
We have seen that, under the right circumstances, the potential energy
can be thought of as a function of the generalized coordinates qk , and
the generalized forces Qk are given by the potential just as for ordinary
cartesian coordinates and their forces. Now we examine the kinetic
energy
                       T = ½ Σi mi ṙi² = ½ Σj mj ẋj²
where the 3n values mj are not really independent, as each parti-
cle has the same mass in all three dimensions in ordinary Newtonian
mechanics⁵. Now
                                                             
        ẋj = lim∆t→0 ∆xj/∆t = lim∆t→0 [ Σk (∂xj/∂qk )|q,t (∆qk/∆t) + (∂xj/∂t)|q ] ,


where |q,t means that t and the q’s other than qk are held fixed. The
last term is due to the possibility that the coordinates xi (q1 , ..., q3n , t)
may vary with time even for fixed values of qk . So the chain rule is
giving us
                  ẋj = dxj/dt = Σk (∂xj/∂qk )|q,t q̇k + (∂xj/∂t)|q .          (1.10)
    Plugging this into the kinetic energy, we see that

  T = ½ Σj,k,ℓ mj (∂xj/∂qk )(∂xj/∂qℓ ) q̇k q̇ℓ + Σj,k mj (∂xj/∂qk ) q̇k (∂xj/∂t)|q
         + ½ Σj mj [(∂xj/∂t)|q ]² .                                      (1.11)
    What is the interpretation of these terms? Only the first term arises
if the relation between x and q is time independent. The second and
third terms are the sources of the ṙ · (ω × r) and (ω × r)² terms in the
kinetic energy when we consider rotating coordinate systems⁶.


   ⁵But in an anisotropic crystal, the effective mass of a particle might in fact be
different in different directions.
   ⁶This will be fully developed in section 4.2.
    Let’s work a simple example: we will consider a two dimensional
system using polar coordinates with θ measured from a direction
rotating at angular velocity ω. Thus the angle the radius vector to an
arbitrary point (r, θ) makes with the inertial x1 -axis is θ + ωt, and the
relations are

           x1 = r cos(θ + ωt),
           x2 = r sin(θ + ωt),

with inverse relations

           r = √(x1² + x2²),
           θ = sin⁻¹(x2 /r) − ωt.

[Figure: Rotating polar coordinates related to inertial cartesian
coordinates.]

So ẋ1 = ṙ cos(θ + ωt) − θ̇r sin(θ + ωt) − ωr sin(θ + ωt), where the last
term is from ∂xj/∂t, and ẋ2 = ṙ sin(θ+ωt) + θ̇r cos(θ+ωt) + ωr cos(θ+ωt).
In the square, things get a bit simpler, Σi ẋi² = ṙ² + r²(ω + θ̇)².
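The algebra of this example is easy to spot-check numerically; in this Python sketch all the values are arbitrary choices of ours:

```python
import math

r, th, rdot, thdot, w, t = 1.7, 0.3, 0.5, -0.8, 2.0, 0.9  # arbitrary

a = th + w*t
x1dot = rdot*math.cos(a) - thdot*r*math.sin(a) - w*r*math.sin(a)
x2dot = rdot*math.sin(a) + thdot*r*math.cos(a) + w*r*math.cos(a)

lhs = x1dot**2 + x2dot**2
rhs = rdot**2 + r**2*(w + thdot)**2
print(lhs, rhs)   # equal, to rounding
```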
     We see that the form of the kinetic energy in terms of the generalized
coordinates and their velocities is much more complicated than it is
in cartesian inertial coordinates, where it is coordinate independent,
and a simple diagonal quadratic form in the velocities. In generalized
coordinates, it is quadratic but not homogeneous⁷ in the velocities, and
with an arbitrary dependence on the coordinates. In general, even if the
coordinate transformation is time independent, the form of the kinetic
energy is still coordinate dependent and, while a purely quadratic form
in the velocities, it is not necessarily diagonal. In this time-independent
situation, we have
        T = ½ Σkℓ Mkℓ q̇k q̇ℓ ,    with    Mkℓ = Σj mj (∂xj/∂qk )(∂xj/∂qℓ ) ,      (1.12)

where Mkℓ is known as the mass matrix, and is always symmetric but
not necessarily diagonal or coordinate independent.
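The formal definition in (1.12) lends itself directly to computation. This Python sketch (our own construction; the mass and sample point are arbitrary) builds the mass matrix for plane polar coordinates from a finite-difference Jacobian, recovering diag(m, mr²):

```python
import math

def coords(q):                # x({q}): plane polar -> cartesian
    r, th = q
    return [r*math.cos(th), r*math.sin(th)]

def jacobian(f, q, h=1e-6):   # central-difference dx_j/dq_k
    fq = f(q)
    J = [[0.0]*len(q) for _ in fq]
    for k in range(len(q)):
        qp = list(q); qp[k] += h
        qm = list(q); qm[k] -= h
        fp, fm = f(qp), f(qm)
        for j in range(len(fq)):
            J[j][k] = (fp[j] - fm[j]) / (2*h)
    return J

mass = 2.0
q = [1.5, 0.6]                # (r, theta), arbitrary
J = jacobian(coords, q)
M = [[mass*sum(J[j][k]*J[j][l] for j in range(2)) for l in range(2)]
     for k in range(2)]
print(M)   # approximately [[2.0, 0.0], [0.0, 4.5]], i.e. diag(m, m r^2)
```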
   ⁷It involves quadratic and lower order terms in the velocities, not just quadratic
ones.
   The mass matrix is independent of the ∂xj /∂t terms, and we can
understand the results we just obtained for it in our two-dimensional
example above,

             M11 = m,          M12 = M21 = 0,           M22 = mr² ,

by considering the case without rotation, ω = 0. We can also derive
this expression for the kinetic energy in nonrotating polar coordinates
by expressing the velocity vector v = ṙ êr + rθ̇ êθ in terms of unit vectors
in the radial and tangential directions respectively. The coefficients
of these unit vectors can be understood graphically with geometric
arguments. This leads more quickly to v² = ṙ² + r²θ̇², T = ½mṙ² +
½mr²θ̇², and the mass matrix follows. Similar geometric arguments
are usually used to find the form of the kinetic energy in spherical
coordinates, but the formal approach of (1.12) enables us to find the
form even in situations where the geometry is difficult to picture.
    It is important to keep in mind that when we view T as a function of
coordinates and velocities, these are independent arguments evaluated
at a particular moment of time. Thus we can ask independently how T
varies as we change xi or as we change ẋi , each time holding the other
variable fixed. Thus the kinetic energy is not a function on the 3n-
dimensional configuration space, but on a larger, 6n-dimensional space⁸
with a point specifying both the coordinates {qi } and the velocities {q̇i }.


1.4       Phase Space
If the trajectory of the system in configuration space, r(t), is known, the
velocity as a function of time, v(t), is also determined. As the mass of the
particle is simply a physical constant, the momentum p = mv contains
the same information as the velocity. Viewed as functions of time, this
gives nothing beyond the information in the trajectory. But at any
given time, r and p provide a complete set of initial conditions, while r
alone does not. We define phase space as the set of possible positions
   ⁸This space is called the tangent bundle to configuration space. For cartesian
coordinates it is almost identical to phase space, which is in general the “cotangent
bundle” to configuration space.
and momenta for the system at some instant. Equivalently, it is the set
of possible initial conditions, or the set of possible motions obeying the
equations of motion. For a single particle in cartesian coordinates, the
six coordinates of phase space are the three components of r and the
three components of p. At any instant of time, the system is represented
by a point in this space, called the phase point, and that point moves
with time according to the physical laws of the system. These laws are
embodied in the force function, which we now consider as a function of
p rather than v, in addition to r and t. We may write these equations
as
                              dr/dt = p/m ,
                              dp/dt = F (r, p, t).
Note that these are first order equations, which means that the mo-
tion of the point representing the system in phase space is completely
determined⁹ by where the phase point is. This is to be distinguished
from the trajectory in configuration space, where in order to know the
trajectory you must have not only an initial point (position) but also
an initial velocity.

1.4.1      Dynamical Systems
We have spoken of the coordinates of phase space for a single par-
ticle as r and p, but from a mathematical point of view these to-
gether give the coordinates of the phase point in phase space. We
might describe these coordinates in terms of a six dimensional vector
η = (r1 , r2 , r3 , p1 , p2 , p3 ). The physical laws determine at each point
a velocity function for the phase point as it moves through phase
space,
                              dη/dt = V (η, t),                        (1.13)
which gives the velocity at which the phase point representing the sys-
tem moves through phase space. Only half of this velocity is the ordi-
   ⁹We will assume throughout that the force function is a well defined continuous
function of its arguments.
nary velocity, while the other half represents the rapidity with which the
momentum is changing, i.e. the force. The path traced by the phase
point as it travels through phase space is called the phase curve.
    For a system of n particles in three dimensions, the complete set of
initial conditions requires 3n spatial coordinates and 3n momenta, so
phase space is 6n dimensional. While this certainly makes visualization
difficult, the large dimensionality is no hindrance for formal develop-
ments. Also, it is sometimes possible to focus on particular dimensions,
or to make generalizations of ideas familiar in two and three dimensions.
For example, in discussing integrable systems (7.1), we will find that
the motion of the phase point is confined to a 3n-dimensional torus, a
generalization of one and two dimensional tori, which are circles and
the surface of a donut respectively.
    Thus for a system composed of a finite number of particles, the
dynamics is determined by the first order ordinary differential equation
(1.13), formally a very simple equation. All of the complication of the
physical situation is hidden in the large dimensionality of the dependent
variable η and in the functional dependence of the velocity function
V (η, t) on it.
    There are other systems besides Newtonian mechanics which are
controlled by equation (1.13), with a suitable velocity function. Collec-
tively these are known as dynamical systems. For example, individ-
uals of an asexual mutually hostile species might have a fixed birth rate
b and a death rate proportional to the population, so the population
would obey the logistic equation¹⁰ dp/dt = bp − cp², a dynamical
system with a one-dimensional space for its dependent variable. The
populations of three competing species could be described by eq. (1.13)
with η in three dimensions.
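A few lines of Python suffice to watch such a system relax to its fixed point (b, c, the initial population, and the step size are arbitrary illustrative choices of ours):

```python
b, c = 1.0, 0.01        # birth rate and quadratic death-rate coefficient
p, dt = 5.0, 0.01       # initial population and time step

for _ in range(3000):   # integrate dp/dt = b p - c p^2 out to t = 30
    p += (b*p - c*p*p) * dt

print(p)   # close to the equilibrium b/c = 100, where dp/dt = 0
```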
    The dimensionality d of η in (1.13) is called the order of the dy-
namical system. A d’th order differential equation in one independent
variable may always be recast as a first order differential equation in d
variables, so it is one example of a d’th order dynamical system. The
space of these dependent variables is called the phase space of the dy-
namical system. Newtonian systems always give rise to an even-order
  ¹⁰This is not to be confused with the simpler logistic map, which is a recursion
relation with the same form but with solutions displaying a very different behavior.
system, because each spatial coordinate is paired with a momentum.
For n particles unconstrained in D dimensions, the order of the dy-
namical system is d = 2nD. Even for constrained Newtonian systems,
there is always a pairing of coordinates and momenta, which gives a
restricting structure, called the symplectic structure¹¹, on phase space.
    If the force function does not depend explicitly on time, we say the
system is autonomous. The velocity function has no explicit depen-
dence on time, V = V (η), and is a time-independent vector field on
phase space, which we can indicate by arrows just as we might the
electric field in ordinary space. This gives a visual indication of the
motion of the system’s point. For example, consider a damped har-
monic oscillator with F = −kx − αp, for which the velocity function
is
                    (dx/dt , dp/dt) = (p/m , −kx − αp) .
A plot of this field for the undamped (α = 0) and damped oscillators
is shown in Figure 1.1.

[Figure 1.1: Velocity field for undamped and damped harmonic oscil-
lators, and one possible phase curve for each system through phase
space.]

The velocity field is everywhere tangent to any
possible path, one of which is shown for each case. Note that qualitative
features of the motion can be seen from the velocity field without any
solving of the differential equations; it is clear that in the damped case
the path of the system must spiral in toward the origin.
    The paths taken by possible physical motions through the phase
space of an autonomous system have an important property. Because
  ¹¹This will be discussed in sections (6.3) and (6.6).
the rate and direction with which the phase point moves away from
a given point of phase space is completely determined by the velocity
function at that point, if the system ever returns to a point it must
move away from that point exactly as it did the last time. That is,
if the system at time T returns to a point in phase space that it occu-
pied at time t = 0, then its subsequent motion must be just as it was,
so η(T + t) = η(t), and the motion is periodic with period T . This
almost implies that the phase curve the object takes through phase
space must be nonintersecting12 .
    In the non-autonomous case, where the velocity field is time depen-
dent, it may be preferable to think in terms of extended phase space, a
6n + 1 dimensional space with coordinates (η, t). The velocity field can
be extended to this space by giving each vector a last component of 1,
as dt/dt = 1. Then the motion of the system is relentlessly upwards in
this direction, though still complex in the others. For the undamped
one-dimensional harmonic oscillator, the path is a helix in the three
dimensional extended phase space.
    Most of this book is devoted to finding analytic methods for ex-
ploring the motion of a system. In several cases we will be able to
find exact analytic solutions, but it should be noted that these exactly
solvable problems, while very important, cover only a small set of real
problems. It is therefore important to have methods other than search-
ing for analytic solutions to deal with dynamical systems. Phase space
provides one method for finding qualitative information about the so-
lutions. Another approach is numerical. Newton’s Law, and more
generally the equation (1.13) for a dynamical system, is a set of ordi-
nary differential equations for the evolution of the system’s position in
phase space. Thus it is always subject to numerical solution given an
initial configuration, at least up until such point that some singularity
in the velocity function is reached. One primitive technique which will
work for all such systems is to choose a small time interval of length
∆t, and use dη/dt at the beginning of each interval to approximate ∆η
during this interval. This gives a new approximate value for η at the

  12
    An exception can occur at an unstable equilibrium point, where the velocity
function vanishes. The motion can just end at such a point, and several possible
phase curves can terminate at that point.
26                              CHAPTER 1. PARTICLE KINEMATICS

end of this interval, which may then be taken as the beginning of the
next.13
    As an example, we show the meat of a calculation for the damped
harmonic oscillator, in Fortran. This same technique will work even
with a very complicated situation. One need only add lines for all the
components of the position and momentum, and change the force law
appropriately.

        do i = 1,n
           dx = (p/m) * dt
           dp = -(k*x+alpha*p)*dt
           x = x + dx
           p = p + dp
           t = t + dt
           write (*,*) t, x, p
        enddo

        Integrating the motion, for a damped harmonic oscillator.

    This is not to say that numerical solution is a good way to solve
this problem. An analytical solution, if it can be found, is almost
always preferable, because

     • It is far more likely to provide insight into the qualitative features
       of the motion.

     • Numerical solutions must be done separately for each value of the
       parameters (k, m, α) and each value of the initial conditions (x0
       and p0 ).

     • Numerical solutions have subtle numerical problems in that they
       are only exact as ∆t → 0, and only if the computations are done
       exactly. Sometimes uncontrolled approximate solutions lead to
       surprisingly large errors.
  13
     This is a very unsophisticated method. The errors made in each step for ∆r
and ∆p are typically O((∆t)²). As any calculation of the evolution from time t0
to tf will involve a number ([tf − t0 ]/∆t) of time steps which grows inversely with
∆t, the cumulative error can be expected to be O(∆t). In principle, therefore, we
can approach exact results for a finite time evolution by taking smaller and smaller
time steps, but in practice there are other considerations, such as computer time and
roundoff errors, which argue strongly in favor of using more sophisticated numerical
techniques, with errors of higher order in ∆t. These can be found in any text on
numerical methods.

Nonetheless, numerical solutions are often the only way to handle a
real problem, and there has been extensive development of techniques
for efficiently and accurately handling the problem, which is essentially
one of solving a system of first order ordinary differential equations.
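The O(∆t) scaling of the cumulative error can be checked directly. The following sketch does so in Python rather than in the Fortran used above, with illustrative values of k, m, and α (not taken from the text): it integrates the damped oscillator to a fixed time with two step sizes and compares each answer against a much finer integration standing in for the exact solution.

```python
def euler_integrate(x, p, m, k, alpha, dt, n_steps):
    """Forward-Euler integration of the damped oscillator:
    dx/dt = p/m, dp/dt = -(k*x + alpha*p)."""
    for _ in range(n_steps):
        dx = (p / m) * dt
        dp = -(k * x + alpha * p) * dt
        x += dx
        p += dp
    return x, p

m, k, alpha = 1.0, 1.0, 0.1     # illustrative values, not from the text
x0, p0, T = 1.0, 0.0, 1.0

# A very fine step stands in for the exact solution.
x_ref, _ = euler_integrate(x0, p0, m, k, alpha, T / 100000, 100000)

err1 = abs(euler_integrate(x0, p0, m, k, alpha, T / 100, 100)[0] - x_ref)
err2 = abs(euler_integrate(x0, p0, m, k, alpha, T / 200, 200)[0] - x_ref)
print(err1 / err2)   # close to 2: halving dt roughly halves the error
```

Halving ∆t roughly halving the accumulated error is exactly what first-order, O(∆t), convergence means in practice.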


1.4.2     Phase Space Flows
As we just saw, Newton’s equations for a system of particles can be
cast in the form of a set of first order ordinary differential equations
in time on phase space, with the motion in phase space described by
the velocity field. This could be more generally discussed as a d’th
order dynamical system, with a phase point representing the system
in a d-dimensional phase space, moving with time t along the velocity
field, sweeping out a path in phase space called the phase curve. The
phase point η(t) is also called the state of the system at time t. Many
qualitative features of the motion can be stated in terms of the phase
curve.

Fixed Points
There may be points ηk , known as fixed points, at which the velocity
function vanishes, V (ηk ) = 0. This is a point of equilibrium for the
system, for if the system is at a fixed point at one moment, η(t0 ) = ηk ,
it remains at that point. At other points, the system does not stay
put, but there may be sets of states which flow into each other, such
as the elliptical orbit for the undamped harmonic oscillator. These are
called invariant sets of states. In a first order dynamical system14 ,
the fixed points divide the line into intervals which are invariant sets.
    Even though a first-order system is smaller than any Newtonian sys-
tem, it is worthwhile discussing briefly the phase flow there. We have
been assuming the velocity function is a smooth function — generically
its zeros will be first order, and near the fixed point η0 we will have
V (η) ≈ c(η − η0 ). If the constant c < 0, dη/dt will have the oppo-
site sign from η − η0 , and the system will flow towards the fixed point,
 14
   Note that this is not a one-dimensional Newtonian system, which is a two
dimensional η = (x, p) dynamical system.

which is therefore called stable. On the other hand, if c > 0, the dis-
placement η − η0 will grow with time, and the fixed point is unstable.
Of course there are other possibilities: if V(η) = cη², the fixed point
η = 0 is stable from the left and unstable from the right. But this kind
of situation is somewhat artificial, and such a system is structurally
unstable. What that means is that if the velocity field is perturbed
by a small smooth variation V(η) → V(η) + εw(η), for some bounded
smooth function w, the fixed point at η = 0 is likely to either disap-
pear or split into two fixed points, whereas the fixed points discussed
earlier will simply be shifted by order ε in position and will retain their
stability or instability. Thus the simple zero in the velocity function is
structurally stable. Note that structural stability is quite a different
notion from stability of the fixed point.
     In this discussion of stability in first order dynamical systems, we
see that generically the stable fixed points occur where the velocity
function decreases through zero, while the unstable points are where it
increases through zero. Thus generically the fixed points will alternate
in stability, dividing the phase line into open intervals which are each
invariant sets of states, with the points in a given interval flowing either
to the left or to the right, but never leaving the open interval. The state
never reaches the stable fixed point because the time t = ∫ dη/V(η) ≈
(1/c) ∫ dη/(η − η0) diverges. On the other hand, in the case V(η) = cη²,
a system starting at η0 at t = 0 has a motion given by η = (η0⁻¹ − ct)⁻¹,
which runs off to infinity as t → 1/η0c. Thus the solution terminates
at t = 1/η0c, and makes no sense thereafter. This form of solution is
called terminating motion.
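The claimed solution can be verified directly; a short Python check (with illustrative values c = 1, η0 = 1, chosen here for concreteness) confirms that η(t) = (η0⁻¹ − ct)⁻¹ satisfies dη/dt = cη² and runs away as t approaches 1/η0c.

```python
c, eta0 = 1.0, 1.0            # illustrative values (not from the text)
t_crit = 1.0 / (eta0 * c)     # the time at which the motion terminates

def eta(t):
    """The claimed solution of d(eta)/dt = c * eta**2 with eta(0) = eta0."""
    return 1.0 / (1.0 / eta0 - c * t)

# Check d(eta)/dt = c * eta**2 at a sample time, by central difference.
t, h = 0.3, 1e-6
deriv = (eta(t + h) - eta(t - h)) / (2 * h)
print(abs(deriv - c * eta(t) ** 2) < 1e-4)   # True

# The solution runs off to infinity as t approaches t_crit.
print(eta(0.999999 * t_crit))                # on the order of 10**6
```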
     For higher order dynamical systems, the d equations Vi (η) = 0
required for a fixed point will generically determine the d variables
ηj , so the generic form of the velocity field near a fixed point η0 is
Vi (η) = Σj Mij (ηj − η0j ) with a nonsingular matrix M. The stability
of the flow will be determined by this d-dimensional square matrix M.
Generically the eigenvalue equation, a d’th order polynomial in λ, will
have d distinct solutions. Because M is a real matrix, the eigenvalues
must either be real or come in complex conjugate pairs. For the real
case, whether the eigenvalue is positive or negative determines the in-
stability or stability of the flow along the direction of the eigenvector.
For a pair of complex conjugate eigenvalues λ = u + iv and λ∗ = u − iv,

with eigenvectors e and e* respectively, we may describe the flow in the
plane δη = η − η0 = x(e + e*) + iy(e − e*), so

    δη̇ = M · δη = x(λe + λ*e*) + iy(λe − λ*e*)
               = (ux − vy)(e + e*) + i(vx + uy)(e − e*),

so

    ( ẋ )   ( u  −v ) ( x )             x = A e^{ut} cos(vt + φ)
    ( ẏ ) = ( v   u ) ( y ) ,    or     y = A e^{ut} sin(vt + φ).
Thus we see that the motion spirals in towards the fixed point if u is
negative, and spirals away from the fixed point if u is positive. Stability
in these directions is determined by the sign of the real part of the
eigenvalue.
    In general, then, stability in each subspace around the fixed point η0
depends on the sign of the real part of the eigenvalue. If all the real parts
are negative, the system will flow from anywhere in some neighborhood
of η0 towards the fixed point, so limt→∞ η(t) = η0 provided we start
in that neighborhood. Then η0 is an attractor and is a strongly
stable fixed point. On the other hand, if some of the eigenvalues
have positive real parts, there are unstable directions. Starting from
a generic point in any neighborhood of η0 , the motion will eventually
flow out along an unstable direction, and the fixed point is considered
unstable, although there may be subspaces along which the flow may
be into η0 . An example is the line x = y in the hyperbolic fixed
point case shown in Figure 1.2.
    Some examples of two dimensional flows in the neighborhood of a
generic fixed point are shown in Figure 1.2. Note that none of these
describe the fixed point of the undamped harmonic oscillator of Figure
1.1. We have discussed generic situations as if the velocity field were
chosen arbitrarily from the set of all smooth vector functions, but in
fact Newtonian mechanics imposes constraints on the velocity fields in
many situations, in particular if there are conserved quantities.
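For a 2×2 matrix the eigenvalues follow from the trace and determinant, λ = (tr M ± √((tr M)² − 4 det M))/2, so the classification of a planar fixed point can be sketched in a few lines. The Python sketch below (an added check, not part of the text) reproduces the qualitative character of the four linear systems shown in Figure 1.2, computing eigenvalues from the matrices as printed.

```python
import cmath

def classify(a, b, c, d):
    """Classify the fixed point of xdot = a*x + b*y, ydot = c*x + d*y
    from the eigenvalues of M = [[a, b], [c, d]]."""
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    l1 = (tr + cmath.sqrt(disc)) / 2
    l2 = (tr - cmath.sqrt(disc)) / 2
    if disc < 0:                       # complex conjugate pair
        return "stable spiral" if tr < 0 else "unstable spiral"
    if l1.real > 0 and l2.real > 0:
        return "unstable node"
    if l1.real < 0 and l2.real < 0:
        return "stable node"
    return "hyperbolic (saddle)"

# The four example systems of Figure 1.2.
print(classify(-1, 1, -2, -1))    # stable spiral
print(classify(-3, -1, -1, -3))   # stable node
print(classify(3, 1, 1, 3))       # unstable node
print(classify(-1, -3, -3, -1))   # hyperbolic (saddle)
```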

    ẋ = −x + y,    ẏ = −2x − y:   strongly stable spiral point, λ = −1 ± √2 i.
    ẋ = −3x − y,   ẏ = −x − 3y:   strongly stable fixed point, λ = −1, −2.
    ẋ = 3x + y,    ẏ = x + 3y:    unstable fixed point, λ = 1, 2.
    ẋ = −x − 3y,   ẏ = −3x − y:   hyperbolic fixed point, λ = −2, 1.

Figure 1.2: Four generic fixed points for a second order dynamical
system.

Effect of conserved quantities on the flow

If the system has a conserved quantity Q(q, p) which is a function on
phase space only, and not of time, the flow in phase space is consider-
ably changed. This is because the equation Q(q, p) = K gives a set
of subsurfaces or contours in phase space, and the system is confined
to stay on whichever contour it is on initially. Unless this conserved
quantity is a trivial function, i.e. constant, in the vicinity of a fixed
point, it is not possible for all points to flow into the fixed point, and
thus it is not strongly stable. In the terms of our generic discussion,
the gradient of Q gives a direction orthogonal to the image of M, so
there is a zero eigenvalue and we are not in the generic situation we
discussed.
    For the case of a single particle in a potential, the total energy
E = p2 /2m + U(r) is conserved, and so the motion of the system
is confined to one surface of a given energy. As p/m is part of the
velocity function, a fixed point must have p = 0. The vanishing of
the other half of the velocity field gives ∇U(r0) = 0, which is the
condition for a stationary point of the potential energy, and for the
force to vanish. If this point is a maximum or a saddle of U, the
motion along a descending path will be unstable. If the fixed point
is a minimum of the potential, the region E(r, p) < E(r0, 0) + ε, for

sufficiently small ε, gives a neighborhood around η0 = (r0, 0) to which
the motion is confined if it starts within this region. Such a fixed point is
called stable15 , but it is not strongly stable, as the flow does not settle
down to η0 . This is the situation we saw for the undamped harmonic
oscillator. For that situation F = −kx, so the potential energy may be
taken to be

    U(x) = ∫_x^0 (−kx) dx = (1/2) kx²,

and so the total energy E = p²/2m + (1/2)kx² is conserved. The curves
of constant E in phase space are ellipses, and each motion orbits the
appropriate ellipse, as shown in Fig. 1.1 for the undamped oscillator.
This contrasts to the case of the damped oscillator, for which there is
no conserved energy, and for which the origin is a strongly stable fixed
point.
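The contrast between the two situations can also be seen numerically, and it illustrates why phase-space reasoning is a useful check on numerics. In the Python sketch below (illustrative values m = k = 1, not from the text), the exact undamped solution stays on its ellipse of constant E = p²/2m + kx²/2, while the forward-Euler approximation of section 1.4.1 slowly spirals outward: a short computation shows each Euler step multiplies E by (1 + (k/m)∆t²).

```python
import math

m, k = 1.0, 1.0             # illustrative values (not from the text)
w = math.sqrt(k / m)        # angular frequency
A = 1.0                     # amplitude

def energy(x, p):
    return p * p / (2 * m) + 0.5 * k * x * x

# Exact solution x = A cos(wt), p = -m w A sin(wt): E is constant.
energies = [energy(A * math.cos(w * t), -m * w * A * math.sin(w * t))
            for t in (0.0, 0.7, 1.9, 5.3)]
print(max(energies) - min(energies))   # ~ 0, at machine precision

# Forward Euler on the same oscillator: the energy grows every step.
x, p, dt = A, 0.0, 0.01
E0 = energy(x, p)
for _ in range(1000):
    x, p = x + (p / m) * dt, p - k * x * dt   # both updates use old (x, p)
print(energy(x, p) > E0)   # True: the Euler orbit spirals outward
```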




  15
     A fixed point is stable if it is in arbitrarily small neighborhoods, each with the
property that if the system is in that neighborhood at one time, it remains in it at
all later times.

    As an example of a conservative system with both stable and
unstable fixed points, consider a particle in one dimension with a cubic
potential U(x) = ax² − bx³, as shown in Fig. 1.3. There is a stable
equilibrium at xs = 0 and an unstable one at xu = 2a/3b. Each has
an associated fixed point in phase space, an elliptic fixed point ηs =
(xs, 0) and a hyperbolic fixed point ηu = (xu, 0). The velocity field in
phase space and several possible orbits are shown. Near the stable
equilibrium, the trajectories are approximately ellipses, as they were
for the harmonic oscillator, but for larger energies they begin to feel
the asymmetry of the potential, and the orbits become egg-shaped.

Figure 1.3: Motion in a cubic potential.

    If the system has total energy precisely U(xu), the contour line
crosses itself. This contour actually consists of three separate orbits.
One starts at t → −∞ at x = xu, completes one trip through the
potential well, and returns as t → +∞ to x = xu. The other two are
orbits which go from x = xu to x = ∞, one incoming and one outgoing.
For E > U(xu), all the orbits start and end at x = +∞. Note that
generically the orbits deform continuously as the energy varies, but at
E = U(xu) this is not the case — the character of the orbit changes as
E passes through U(xu). An orbit with this critical value of the energy
is called a separatrix, as it separates regions in phase space where the
orbits have different qualitative characteristics.
    Quite generally hyperbolic fixed points are at the ends of separa-
trices. In our case the contour E = U(xu) consists of four invariant sets

of states, one of which is the point ηu itself, and the other three are
the orbits which are the disconnected pieces left of the contour after
removing ηu .
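The two equilibria of the cubic potential are easy to confirm. A short Python check (with illustrative values a = b = 1, not from the text) finds the zeros of U′(x) = 2ax − 3bx² and classifies them by the sign of U″(x) = 2a − 6bx; it also evaluates the critical energy U(xu) of the separatrix.

```python
a, b = 1.0, 1.0           # illustrative values (not from the text)

def U(x):   return a * x**2 - b * x**3
def dU(x):  return 2 * a * x - 3 * b * x**2    # U'(x)
def d2U(x): return 2 * a - 6 * b * x           # U''(x)

x_s = 0.0                 # stable equilibrium
x_u = 2 * a / (3 * b)     # unstable equilibrium

# Both are stationary points of U, i.e. fixed points with p = 0.
print(abs(dU(x_s)) < 1e-12, abs(dU(x_u)) < 1e-12)   # True True
# A minimum of U (stable) vs. a maximum of U (unstable):
print(d2U(x_s) > 0, d2U(x_u) < 0)                   # True True
# Critical energy of the separatrix: U(x_u) = 4a^3 / (27 b^2).
print(abs(U(x_u) - 4 * a**3 / (27 * b**2)) < 1e-12) # True
```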

                                 Exercises
 1.1 (a) Find the potential energy function U (r) for a particle in the grav-
itational field of the Earth, for which the force law is F (r) = −GME mr/r 3 .
(b) Find the escape velocity from the Earth, that is, the minimum velocity
a particle near the surface can have for which it is possible that the particle
will eventually coast to arbitrarily large distances without being acted upon
by any force other than gravity. The Earth has a mass of 6.0 × 1024 kg and
a radius of 6.4 × 106 m. Newton’s gravitational constant is 6.67 × 10−11 N ·
m2 /kg2 .
 1.2 In the discussion of a system of particles, it is important that the
particles included in the system remain the same. There are some situations
in which we wish to focus our attention on a set of particles which changes
with time, such as a rocket ship which is emitting gas continuously. The
equation of motion for such a problem may be derived by considering an
infinitesimal time interval, [t, t + ∆t], and choosing the system to be the
rocket with the fuel still in it at time t, so that at time t + ∆t the system
consists of the rocket with its remaining fuel and also the small amount of
fuel emitted during the infinitesimal time interval.
Let M (t) be the mass of the rocket and remaining fuel at time t, assume that
the fuel is emitted with velocity u with respect to the rocket, and call the
velocity of the rocket v(t) in an inertial coordinate system. If the external
force on the rocket is F (t) and the external force on the infinitesimal amount
of exhaust is infinitesimal, the fact that F (t) is the rate of change of the total
momentum gives the equation of motion for the rocket.
(a) Show that this equation is

    M dv/dt = F(t) + u dM/dt.
(b) Suppose the rocket is in a constant gravitational field F = −Mg êz for
the period during which it is burning fuel, and that it is fired straight up
with constant exhaust velocity (u = −u êz), starting from rest. Find v(t) in
terms of t and M(t).
(c) Find the maximum fraction of the initial mass of the rocket which can
escape the Earth’s gravitational field if u = 2000m/s.

 1.3 For a particle in two dimensions, we might use polar coordinates (r, θ)
and use basis unit vectors êr and êθ in the radial and tangent directions
respectively to describe more general vectors. Because this pair of unit
vectors differ from point to point, the êr and êθ along the trajectory of a
moving particle are themselves changing with time.
(a) Show that

    d êr/dt = θ̇ êθ,        d êθ/dt = −θ̇ êr.

(b) Thus show that the derivative of r = r êr is

    v = ṙ êr + r θ̇ êθ,

which verifies the discussion of Sec. (1.3.4).
(c) Show that the derivative of the velocity is

    a = dv/dt = (r̈ − r θ̇²) êr + (r θ̈ + 2 ṙ θ̇) êθ.

(d) Thus Newton's Law says the radial and tangential components of the
force are Fr = êr · F = m(r̈ − r θ̇²), Fθ = êθ · F = m(r θ̈ + 2 ṙ θ̇). Show
that the generalized forces are Qr = Fr and Qθ = rFθ.

 1.4 Analyze the errors in the integration of Newton's Laws in the sim-
ple Euler's approach described in section 1.4.1, where we approximated
the change for x and p in each time interval ∆t between ti and ti+1 by
ẋ(t) ≈ ẋ(ti), ṗ(t) ≈ F(x(ti), v(ti)). Assuming F to be differentiable, show
that the error which accumulates in a finite time interval T is of order (∆t)¹.


 1.5 Write a simple program to integrate the equation of the harmonic os-
cillator through one period of oscillation, using Euler’s method with a step
size ∆t. Do this for several ∆t, and see whether the error accumulated in
one period meets the expectations of problem 1.4.

 1.6 Describe the one dimensional phase space for the logistic equation
ṗ = bp − cp², with b > 0, c > 0. Give the fixed points, the invariant sets of
states, and describe the flow on each of the invariant sets.

 1.7 Consider a pendulum consisting of a mass at the end of a massless rod
of length L, the other end of which is fixed but free to rotate. Ignore one of
the horizontal directions, and describe the dynamics in terms of the angle θ

between the rod and the downwards direction, without making a small angle
approximation.
(a) Find the generalized force Qθ and find the conserved quantity on phase
space.
(b) Give a sketch of the velocity function, including all the regions of phase
space. Show all fixed points, seperatrices, and describe all the invariant sets
of states. [Note: the variable θ is defined only modulo 2π, so the phase
space is the Cartesian product of an interval of length 2π in θ with the real
line for pθ . This can be plotted on a strip, with the understanding that the
left and right edges are identified. To avoid having important points on the
boundary, it would be well to plot this with θ ∈ [−π/2, 3π/2].]
Chapter 2

Lagrange’s and Hamilton’s
Equations

In this chapter, we consider two reformulations of Newtonian mechan-
ics, the Lagrangian and the Hamiltonian formalism. The first is natu-
rally associated with configuration space, extended by time, while the
latter is the natural description for working in phase space.
    Lagrange developed his approach in 1764 in a study of the libra-
tion of the moon, but it is best thought of as a general method of
treating dynamics in terms of generalized coordinates for configuration
space. It so transcends its origin that the Lagrangian is considered the
fundamental object which describes a quantum field theory.
    Hamilton’s approach arose in 1835 in his unification of the language
of optics and mechanics. It too had a usefulness far beyond its origin,
and the Hamiltonian is now most familiar as the operator in quantum
mechanics which determines the evolution in time of the wave function.


2.1     Lagrangian Mechanics
We begin by deriving Lagrange’s equation as a simple change of co-
ordinates in an unconstrained system, one which is evolving according
to Newton’s laws with force laws given by some potential. Lagrangian
mechanics is also and especially useful in the presence of constraints,
so we will then extend the formalism to this more general situation.

                                  37
38 CHAPTER 2. LAGRANGE’S AND HAMILTON’S EQUATIONS

2.1.1      Derivation for unconstrained systems
For a collection of particles with conservative forces described by a
potential, we have in inertial cartesian coordinates

    m ẍi = Fi.

The left hand side of this equation is determined by the kinetic energy
function as the time derivative of the momentum pi = ∂T/∂ẋi, while
the right hand side is a derivative of the potential energy, −∂U/∂xi. As
T is independent of xi and U is independent of ẋi in these coordinates,
we can write both sides in terms of the Lagrangian L = T − U, which
is then a function of both the coordinates and their velocities. Thus we
have established

    d/dt (∂L/∂ẋi) − ∂L/∂xi = 0,

which, once we generalize it to arbitrary coordinates, will be known as
Lagrange's equation. This particular combination of T(ṙ) with U(r)
to get the more complicated L(r, ṙ) seems an artificial construction for
the inertial cartesian coordinates, but it has the advantage of preserving
the form of Lagrange's equations for any set of generalized coordinates.
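As a quick illustration of this coordinate independence (a worked example supplied here, consistent with the result of Exercise 1.3 of Chapter 1), take a single particle in a plane described by polar coordinates, so that the same Lagrangian becomes

```latex
L = \frac{1}{2}m\left(\dot r^2 + r^2\dot\theta^2\right) - U(r),
\qquad
\frac{d}{dt}\frac{\partial L}{\partial \dot r}
  - \frac{\partial L}{\partial r}
  = m\ddot r - m r\dot\theta^2 + \frac{dU}{dr} = 0,
\qquad
\frac{d}{dt}\frac{\partial L}{\partial \dot\theta}
  - \frac{\partial L}{\partial \theta}
  = \frac{d}{dt}\left(m r^2\dot\theta\right) = 0.
```

The first equation reproduces the radial equation m(r̈ − rθ̇²) = −dU/dr, and the second expresses conservation of the angular momentum mr²θ̇, results obtained in cartesian coordinates only with considerably more effort.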
    As we did in section 1.3.3, we assume we have a set of generalized
coordinates {qj } which parameterize all of coordinate space, so that
each point may be described by the {qj } or by the {xi }, i, j ∈ [1, N],
and thus each set may be thought of as a function of the other, and
time:
                qj = qj (x1 , ...xN , t)    xi = xi (q1 , ...qN , t).     (2.1)
We may consider L as a function1 of the generalized coordinates qj and
q̇j, and ask whether the same expression in these coordinates,

    d/dt (∂L/∂q̇j) − ∂L/∂qj,
   1
     Of course we are not saying that L(x, ẋ, t) is the same function of its coor-
dinates as L(q, q̇, t), but rather that these are two functions which agree at the
corresponding physical points. More precisely, we are defining a new function
L̃(q, q̇, t) = L(x(q, t), ẋ(q, q̇, t), t), but we are being physicists and neglecting the
tilde. We are treating the Lagrangian here as a scalar under coordinate transfor-
mations, in the sense used in general relativity, that its value at a given physical
point is unchanged by changing the coordinate system used to define that point.
2.1. LAGRANGIAN MECHANICS                                                          39

also vanishes. The chain rule tells us

    ∂L/∂ẋj = Σk (∂L/∂qk)(∂qk/∂ẋj) + Σk (∂L/∂q̇k)(∂q̇k/∂ẋj).        (2.2)

The first term vanishes because qk depends only on the coordinates xk
and t, but not on the ẋk. From the inverse relation to (1.10),

    q̇j = Σi (∂qj/∂xi) ẋi + ∂qj/∂t,                                (2.3)

we have

    ∂q̇j/∂ẋi = ∂qj/∂xi.

Using this in (2.2),

    ∂L/∂ẋi = Σj (∂L/∂q̇j)(∂qj/∂xi).                                (2.4)

    Lagrange’s equation involves the time derivative of this. Here what
is meant is not a partial derivative ∂/∂t, holding the point in configu-
ration space fixed, but rather the derivative along the path which the
system takes as it moves through configuration space. It is called the
stream derivative, a name which comes from fluid mechanics, where
it gives the rate at which some property defined throughout the fluid,
f (r, t), changes for a fixed element of fluid as the fluid as a whole flows.
We write it as a total derivative to indicate that we are following the
motion rather than evaluating the rate of change at a fixed point in
space, as the partial derivative does.
    For any function f(x, t) of extended configuration space, this total
time derivative is

    df/dt = Σj (∂f/∂xj) ẋj + ∂f/∂t.                               (2.5)

Using Leibnitz' rule on (2.4) and using (2.5) in the second term, we
find

    d/dt (∂L/∂ẋi) = Σj [d/dt (∂L/∂q̇j)] (∂qj/∂xi)
                  + Σj (∂L/∂q̇j) Σk [(∂²qj/∂xi∂xk) ẋk + ∂²qj/∂xi∂t]. (2.6)

On the other hand, the chain rule also tells us

    ∂L/∂xi = Σj (∂L/∂qj)(∂qj/∂xi) + Σj (∂L/∂q̇j)(∂q̇j/∂xi),

where the last term does not necessarily vanish, as q̇j in general depends
on both the coordinates and velocities. In fact, from (2.3),

    ∂q̇j/∂xi = Σk (∂²qj/∂xi∂xk) ẋk + ∂²qj/∂xi∂t,

so

    ∂L/∂xi = Σj (∂L/∂qj)(∂qj/∂xi)
           + Σj (∂L/∂q̇j) Σk [(∂²qj/∂xi∂xk) ẋk + ∂²qj/∂xi∂t].      (2.7)

Lagrange's equation in cartesian coordinates says (2.6) and (2.7) are
equal, and in subtracting them the second terms cancel2, so

    0 = Σj [d/dt (∂L/∂q̇j) − ∂L/∂qj] (∂qj/∂xi).

The matrix ∂qj /∂xi is nonsingular, as it has ∂xi /∂qj as its inverse, so
we have derived Lagrange’s Equation in generalized coordinates:
    d/dt (∂L/∂q̇j) − ∂L/∂qj = 0.
    Thus we see that Lagrange’s equations are form invariant under
changes of the generalized coordinates used to describe the configura-
tion of the system. It is primarily for this reason that this particular
and peculiar combination of kinetic and potential energy is useful. Note
that we implicitly assume the Lagrangian itself transforms like a scalar,
in that its value at a given physical point of configuration space is in-
dependent of the choice of generalized coordinates that describe the
point. The change of coordinates itself (2.1) is called a point trans-
formation.
   ² This is why we chose the particular combination we did for the Lagrangian,
rather than L = T − αU for some α ≠ 1. Had we done so, Lagrange’s equation
in cartesian coordinates would have been α d(∂L/∂ẋ_j)/dt − ∂L/∂x_j = 0, and in
the subtraction of (2.7) from α×(2.6), the terms proportional to ∂L/∂q̇_i (without
a time derivative) would not have cancelled.
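The form invariance just derived can be checked numerically. The following sketch (ours, not from the text) integrates the Lagrange equations of a free particle in polar coordinates and verifies that the resulting motion is the same straight line that cartesian coordinates give directly.

```python
import math

# Free particle in a plane: L = (1/2) m (rdot^2 + r^2 thetadot^2) in polar
# coordinates. Lagrange's equations give
#   rddot = r thetadot^2,  d/dt(r^2 thetadot) = 0  =>  thetaddot = -2 rdot thetadot / r.
def deriv(s):
    r, rdot, th, thdot = s
    return (rdot, r * thdot ** 2, thdot, -2.0 * rdot * thdot / r)

def rk4_step(s, dt):
    add = lambda s, k, f: tuple(a + f * b for a, b in zip(s, k))
    k1 = deriv(s)
    k2 = deriv(add(s, k1, dt / 2))
    k3 = deriv(add(s, k2, dt / 2))
    k4 = deriv(add(s, k3, dt))
    return tuple(a + dt / 6 * (b + 2 * c + 2 * d + e)
                 for a, b, c, d, e in zip(s, k1, k2, k3, k4))

# r = 1, rdot = 0, theta = 0, thetadot = 1 corresponds to the cartesian
# initial conditions x = 1, y = 0, vx = 0, vy = 1, whose motion is the
# straight line x(t) = 1, y(t) = t.
s, dt = (1.0, 0.0, 0.0, 1.0), 1e-3
for _ in range(2000):            # integrate to t = 2
    s = rk4_step(s, dt)
x, y = s[0] * math.cos(s[2]), s[0] * math.sin(s[2])
print(x, y)                      # very nearly (1, 2)
```

The straight-line answer emerges from the polar-coordinate equations alone, which is the point-transformation invariance in action.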
2.1. LAGRANGIAN MECHANICS                                                     41

2.1.2      Lagrangian for Constrained Systems
We now wish to generalize our discussion to include constraints. At
the same time we will also consider possibly nonconservative forces.
As we mentioned in section 1.3.2, we often have a system with internal
forces whose effect is better understood than the forces themselves, with
which we may not be concerned. We will assume the constraints are
holonomic, expressible as k real functions Φα (r1 , ..., rn , t) = 0, which
are somehow enforced by constraint forces FiC on the particle i. There
may also be other forces, which we will call FiD and will treat as having
a dynamical effect. These are given by known functions of the config-
uration and time, possibly but not necessarily in terms of a potential.
    This distinction will seem artificial without examples, so it would
be well to keep these two in mind. In each of these cases the full
configuration space is ℝ³, but the constraints restrict the motion to an
allowed subspace of extended configuration space.

   1. In section 1.3.2 we discussed a mass on a light rigid rod, the other
      end of which is fixed at the origin. Thus the mass is constrained
      to have |r| = L, and the allowed subspace of configuration space
      is the surface of a sphere, independent of time. The rod exerts the
      constraint force to avoid compression or expansion. The natural
      assumption to make is that the force is in the radial direction, and
      therefore has no component in the direction of allowed motions,
      the tangential directions. That is, for all allowed displacements,
      δr, we have F C · δr = 0, and the constraint force does no work.

   2. Consider a bead free to slide without friction on the spoke of a ro-
      tating bicycle wheel³, rotating about a fixed axis at fixed angular
      velocity ω. That is, for the polar angle θ of inertial coordinates,
      Φ := θ − ωt = 0 is a constraint⁴, but the r coordinate is uncon-
      strained. Here the allowed subspace is not time independent, but
      is a helical sort of structure in extended configuration space. We
      expect the force exerted by the spoke on the bead to be in the êθ
   ³ Unlike a real bicycle wheel, we are assuming here that the spoke is directly
along a radius of the circle, pointing directly to the axle.
   ⁴ There is also a constraint z = 0.
42 CHAPTER 2. LAGRANGE’S AND HAMILTON’S EQUATIONS

      direction. This is again perpendicular to any virtual displace-
      ment, by which we mean an allowed change in configuration at a
      fixed time. It is important to distinguish this virtual displacement
      from a small segment of the trajectory of the particle. In this case
      a virtual displacement is a change in r without a change in θ, and
      is perpendicular to êθ . So again, the “net virtual work” of the
      constraint forces is zero. It is important to note that this
      does not mean that the net real work is zero. In a small time
      interval, the displacement ∆r includes a component rω∆t in the
      tangential direction, and the force of constraint does do work!

    We will assume that the constraint forces in general satisfy this
restriction that no net virtual work is done by the forces of constraint
for any possible virtual displacement. Newton’s law tells us that ṗ_i =
F_i = F_i^C + F_i^D . We can multiply by an arbitrary virtual displacement

    \sum_i \left(F_i^D - \dot p_i\right) \cdot \delta r_i = -\sum_i F_i^C \cdot \delta r_i = 0,


where the first equality would be true even if δri did not satisfy the
constraints, but the second requires δri to be an allowed virtual dis-
placement. Thus
    \sum_i \left(F_i^D - \dot p_i\right) \cdot \delta r_i = 0,        (2.8)

which is known as D’Alembert’s Principle. This gives an equation
which determines the motion on the constrained subspace and does not
involve the unspecified forces of constraint F C . We drop the super-
script D from now on.
    Suppose we know generalized coordinates q1 , . . . , qN which parame-
terize the constrained subspace, which means ri = ri (q1 , . . . , qN , t), for
i = 1, . . . , n, are known functions and the N q’s are independent. There
are N = 3n − k of these independent coordinates, where k is the num-
ber of holonomic constraints. Then ∂ri /∂qj is no longer an invertible,
or even square, matrix, but we still have

    \Delta r_i = \sum_j \frac{\partial r_i}{\partial q_j}\,\Delta q_j + \frac{\partial r_i}{\partial t}\,\Delta t.

For the velocity of the particle, divide this by ∆t, giving
    v_i = \sum_j \frac{\partial r_i}{\partial q_j}\,\dot q_j + \frac{\partial r_i}{\partial t},        (2.9)

but for a virtual displacement ∆t = 0 we have

    \delta r_i = \sum_j \frac{\partial r_i}{\partial q_j}\,\delta q_j.
Differentiating (2.9) we note that

    \frac{\partial v_i}{\partial \dot q_j} = \frac{\partial r_i}{\partial q_j},        (2.10)

and also

    \frac{\partial v_i}{\partial q_j} = \sum_k \frac{\partial^2 r_i}{\partial q_j\,\partial q_k}\,\dot q_k + \frac{\partial^2 r_i}{\partial q_j\,\partial t} = \frac{d}{dt}\frac{\partial r_i}{\partial q_j},        (2.11)
where the last equality comes from applying (2.5), with coordinates qj
rather than xj , to f = ∂ri /∂qj . The first term in the equation (2.8)
stating D’Alembert’s principle is
    \sum_i F_i \cdot \delta r_i = \sum_j \sum_i F_i \cdot \frac{\partial r_i}{\partial q_j}\,\delta q_j = \sum_j Q_j\,\delta q_j.

The generalized force Qj has the same form as in the unconstrained
case, as given by (1.9), but there are only as many of them as there are
unconstrained degrees of freedom.
   The second term involves
    \sum_i \dot p_i \cdot \delta r_i = \sum_i \frac{dp_i}{dt} \cdot \sum_j \frac{\partial r_i}{\partial q_j}\,\delta q_j
        = \sum_j \frac{d}{dt}\left(\sum_i p_i \cdot \frac{\partial r_i}{\partial q_j}\right)\delta q_j - \sum_{ij} p_i \cdot \left(\frac{d}{dt}\frac{\partial r_i}{\partial q_j}\right)\delta q_j
        = \sum_j \frac{d}{dt}\left(\sum_i p_i \cdot \frac{\partial v_i}{\partial \dot q_j}\right)\delta q_j - \sum_{ij} p_i \cdot \frac{\partial v_i}{\partial q_j}\,\delta q_j
        = \sum_j \left[\frac{d}{dt}\left(\sum_i m_i v_i \cdot \frac{\partial v_i}{\partial \dot q_j}\right) - \sum_i m_i v_i \cdot \frac{\partial v_i}{\partial q_j}\right]\delta q_j
        = \sum_j \left[\frac{d}{dt}\frac{\partial T}{\partial \dot q_j} - \frac{\partial T}{\partial q_j}\right]\delta q_j,

where we used (2.10) and (2.11) to get the third line. Plugging in the
expressions we have found for the two terms in D’Alembert’s Principle,

    \sum_j \left[\frac{d}{dt}\frac{\partial T}{\partial \dot q_j} - \frac{\partial T}{\partial q_j} - Q_j\right]\delta q_j = 0.

We assumed we had a holonomic system and the q’s were all indepen-
dent, so this equation holds for arbitrary virtual displacements δqj , and
therefore
    \frac{d}{dt}\frac{\partial T}{\partial \dot q_j} - \frac{\partial T}{\partial q_j} - Q_j = 0.        (2.12)
    Now let us restrict ourselves to forces given by a potential, with
F_i = -\nabla_i U(\{r\}, t), or

    Q_j = -\sum_i \frac{\partial r_i}{\partial q_j} \cdot \nabla_i U = -\left.\frac{\partial \tilde U(\{q\}, t)}{\partial q_j}\right|_t.

Notice that Qj depends only on the value of U on the constrained
surface. Also, U is independent of the q̇i ’s, so

    \frac{d}{dt}\frac{\partial T}{\partial \dot q_j} - \frac{\partial T}{\partial q_j} + \frac{\partial U}{\partial q_j} = 0 = \frac{d}{dt}\frac{\partial (T - U)}{\partial \dot q_j} - \frac{\partial (T - U)}{\partial q_j},
or
    \frac{d}{dt}\frac{\partial L}{\partial \dot q_j} - \frac{\partial L}{\partial q_j} = 0.        (2.13)
This is Lagrange’s equation, which we have now derived in the more
general context of constrained systems.

Some examples of the use of Lagrangians
Atwood’s machine consists of two blocks of mass m1 and m2 attached
by an inextensible cord which suspends them from a pulley of moment
of inertia I with frictionless bearings. The kinetic energy is
    T = \frac12 m_1 \dot x^2 + \frac12 m_2 \dot x^2 + \frac12 I\omega^2
    U = m_1 g x + m_2 g(K - x) = (m_1 - m_2) g x + \text{const}

where we have used the fact that the sum of the heights of the masses
is a constant K. We assume the cord does not slip on the pulley, so
the angular velocity of the pulley is ω = ẋ/r, and

    L = \frac12\left(m_1 + m_2 + I/r^2\right)\dot x^2 + (m_2 - m_1) g x,
                 2
and Lagrange’s equation gives

    \frac{d}{dt}\frac{\partial L}{\partial \dot x} - \frac{\partial L}{\partial x} = 0 = \left(m_1 + m_2 + I/r^2\right)\ddot x - (m_2 - m_1) g.
Notice that we set up our system in terms of only one degree of freedom,
the height of the first mass. This one degree of freedom parameterizes
the line which is the allowed subspace of the unconstrained configura-
tion space, a three dimensional space which also has directions corre-
sponding to the angle of the pulley and the height of the second mass.
The constraints restrict these three variables because the string has a
fixed length and does not slip on the pulley. Note that this formalism
has permitted us to solve the problem without solving for the forces of
constraint, which in this case are the tensions in the cord on either side
of the pulley.
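The constant acceleration implied by this equation is easy to encode and sanity-check; the function below is a hypothetical helper of ours, not the author’s.

```python
def atwood_accel(m1, m2, I, r, g=9.8):
    """xddot from (m1 + m2 + I/r^2) xddot = (m2 - m1) g, where x is the
    height of mass 1 and the cord does not slip on the pulley."""
    return (m2 - m1) * g / (m1 + m2 + I / r ** 2)

# Equal masses balance, and a massless pulley (I = 0) recovers the
# elementary result (m2 - m1) g / (m1 + m2).
print(atwood_accel(1.0, 1.0, 0.5, 0.1))   # 0.0
print(atwood_accel(1.0, 2.0, 0.0, 0.1))   # 9.8 / 3
```

A nonzero moment of inertia only enlarges the effective inertia in the denominator, slowing the acceleration, which matches the equation of motion above.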
    As a second example, reconsider the bead on the spoke of a rotating
bicycle wheel. In section (1.3.4) we saw that the kinetic energy is
T = ½mṙ² + ½mr²ω². If there are no forces other than the constraint
forces, U(r, θ) ≡ 0, and the Lagrangian is

    L = \frac12 m\dot r^2 + \frac12 m r^2\omega^2.
The equation of motion for the one degree of freedom is easy enough:

    \frac{d}{dt}\frac{\partial L}{\partial \dot r} = m\ddot r = \frac{\partial L}{\partial r} = m r\omega^2,
which looks like a harmonic oscillator with a negative spring constant,
so the solution is a real exponential instead of oscillating,

                          r(t) = Ae−ωt + Beωt .

The velocity-independent term in T acts just like a potential would,
and can in fact be considered the potential for the centrifugal force.

But we see that the total energy T is not conserved but blows up as
t → ∞, T ∼ mB²ω²e^{2ωt}. This is because the force of constraint, while
it does no virtual work, does do real work.
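As a sketch (ours, with arbitrary parameter values), one can integrate m r̈ = m ω² r numerically and compare against the exponential solution, with A and B fixed by the initial conditions r(0) and ṙ(0):

```python
import math

m, omega = 1.0, 2.0
r0, rdot0 = 1.0, 0.0          # bead released at rest (in r) at r = 1

# Exact solution r(t) = A e^{-omega t} + B e^{omega t} with
# A = (r0 - rdot0/omega)/2 and B = (r0 + rdot0/omega)/2.
A = 0.5 * (r0 - rdot0 / omega)
B = 0.5 * (r0 + rdot0 / omega)

def exact(t):
    return A * math.exp(-omega * t) + B * math.exp(omega * t)

# RK4 integration of rddot = omega^2 r.
r, rdot, dt = r0, rdot0, 1e-3
for _ in range(1000):          # integrate to t = 1
    k1 = (rdot, omega ** 2 * r)
    k2 = (rdot + dt / 2 * k1[1], omega ** 2 * (r + dt / 2 * k1[0]))
    k3 = (rdot + dt / 2 * k2[1], omega ** 2 * (r + dt / 2 * k2[0]))
    k4 = (rdot + dt * k3[1], omega ** 2 * (r + dt * k3[0]))
    r += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    rdot += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])

print(r, exact(1.0))   # both close to cosh(2), since here A = B = 1/2
```

The growing exponential dominates quickly, which is the runaway behavior described in the text.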
    Finally, let us consider the mass on the end of the gimballed rod.
The allowed subspace is the surface of a sphere, which can be parame-
terized by an azimuthal angle φ and the polar angle with the upwards
direction, θ, in terms of which

            z = ℓ cos θ,    x = ℓ sin θ cos φ,    y = ℓ sin θ sin φ,

and T = ½mℓ²(θ̇² + sin²θ φ̇²). With an arbitrary potential U(θ, φ), the
Lagrangian becomes

    L = \frac12 m\ell^2\left(\dot\theta^2 + \sin^2\theta\,\dot\phi^2\right) - U(\theta, \phi).
                     2
From the two independent variables θ, φ there are two Lagrange equa-
tions of motion,
    m\ell^2\ddot\theta = -\frac{\partial U}{\partial\theta} + \frac12 m\ell^2\sin(2\theta)\,\dot\phi^2,        (2.14)

    \frac{d}{dt}\left(m\ell^2\sin^2\theta\,\dot\phi\right) = -\frac{\partial U}{\partial\phi}.        (2.15)
               dt                 ∂φ
Notice that this is a dynamical system with two coordinates, similar
to ordinary mechanics in two dimensions, except that the mass matrix,
while diagonal, is coordinate dependent, and the space on which motion
occurs is not an infinite flat plane, but a curved two dimensional surface,
that of a sphere. These two distinctions are connected—the coordinates
enter the mass matrix because it is impossible to describe a curved space
with unconstrained cartesian coordinates.
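Equations (2.14) and (2.15) are straightforward to integrate numerically. The sketch below is ours, with the illustrative choices U ≡ 0 and m = ℓ = 1; by (2.15) the quantity sin²θ φ̇ should then be constant, as should the kinetic energy, and the integration confirms both.

```python
import math

# (2.14)-(2.15) with U = 0 and m = l = 1 (illustrative choice):
#   thetaddot = (1/2) sin(2 theta) phidot^2
#   d/dt(sin^2 theta phidot) = 0  =>  phiddot = -2 (cos th / sin th) thetadot phidot
def deriv(s):
    th, thd, ph, phd = s
    return (thd,
            0.5 * math.sin(2 * th) * phd ** 2,
            phd,
            -2.0 * thd * phd * math.cos(th) / math.sin(th))

def rk4(s, dt):
    add = lambda s, k, f: tuple(a + f * b for a, b in zip(s, k))
    k1 = deriv(s)
    k2 = deriv(add(s, k1, dt / 2))
    k3 = deriv(add(s, k2, dt / 2))
    k4 = deriv(add(s, k3, dt))
    return tuple(a + dt / 6 * (b + 2 * c + 2 * d + e)
                 for a, b, c, d, e in zip(s, k1, k2, k3, k4))

def P_phi(s):    # momentum conjugate to phi, conserved by (2.15) when U = 0
    return math.sin(s[0]) ** 2 * s[3]

def energy(s):   # kinetic energy (in units of m l^2), conserved since U = 0
    return 0.5 * (s[1] ** 2 + math.sin(s[0]) ** 2 * s[3] ** 2)

s = (1.0, 0.3, 0.0, 0.8)       # (theta, thetadot, phi, phidot)
p0, e0 = P_phi(s), energy(s)
for _ in range(2000):          # integrate to t = 2
    s = rk4(s, 1e-3)
print(P_phi(s) - p0, energy(s) - e0)   # both very nearly zero
```

The coordinate-dependent mass matrix shows up in the sin²θ factor multiplying φ̇ in both conserved quantities.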

2.1.3     Hamilton’s Principle
The configuration of a system at any moment is specified by the value
of the generalized coordinates qj (t), and the space coordinatized by
these q1 , . . . , qN is the configuration space. The time evolution of the
system is given by the trajectory, or motion of the point in configuration
space as a function of time, which can be specified by the functions qi (t).

   One can imagine the system taking many paths, whether they obey
Newton’s Laws or not. We consider only paths for which the qi (t) are
differentiable. Along any such path, we define the action as
    I = \int_{t_1}^{t_2} L(q(t), \dot q(t), t)\,dt.        (2.16)

The action depends on the starting and ending points q(t1 ) and q(t2 ),
but beyond that, the value of the action depends on the path, unlike the
work done by a conservative force on a point moving in ordinary space.
In fact, it is exactly this dependence on the path which makes this
concept useful — Hamilton’s principle states that the actual motion of
the particle from q(t1 ) = qi to q(t2 ) = qf is along a path q(t) for which
the action is stationary. That means that for any small deviation of the
path from the actual one, keeping the initial and final configurations
fixed, the variation of the action vanishes to first order in the deviation.
     To find out where a differentiable function of one variable has a
stationary point, we differentiate and solve the equation found by set-
ting the derivative to zero. If we have a differentiable function f of
several variables x_i , the first-order variation of the function is ∆f =
Σ_i (x_i − x_{0i}) ∂f /∂x_i |_{x_0} , so unless ∂f /∂x_i |_{x_0} = 0 for all i, there is some
variation of the {x_i } which causes a first order variation of f , and then
x_0 is not a stationary point.
     But our action is a functional, a function of functions, which rep-
resent an infinite number of variables, even for a path in only one
dimension. Intuitively, at each time q(t) is a separate variable, though
varying q at only one point makes q̇ hard to interpret. A rigorous math-
ematician might want to describe the path q(t) on t ∈ [0, 1] in terms of
Fourier series, for which q(t) = q₀ + q₁t + Σ_{n=1}^∞ a_n sin(nπt). Then the
functional I(f ) given by
functional I(f ) given by

    I = \int f(q(t), \dot q(t), t)\,dt

becomes a function of the infinitely many variables q0 , q1 , a1 , . . .. The
endpoints fix q0 and q1 , but the stationary condition gives an infinite
number of equations ∂I/∂an = 0.
    It is not really necessary to be so rigorous, however. Under a change
q(t) → q(t) + δq(t), the derivative will vary by δq̇ = d δq(t)/dt, and the

functional I will vary by

    \delta I = \int \left(\frac{\partial f}{\partial q}\,\delta q + \frac{\partial f}{\partial \dot q}\,\delta\dot q\right) dt
             = \left.\frac{\partial f}{\partial \dot q}\,\delta q\right|_i^f + \int \left(\frac{\partial f}{\partial q} - \frac{d}{dt}\frac{\partial f}{\partial \dot q}\right)\delta q\,dt,

where we integrated the second term by parts. The boundary terms
each have a factor of δq at the initial or final point, which vanish because
Hamilton tells us to hold the qi and qf fixed, and therefore the functional
is stationary if and only if

    \frac{\partial f}{\partial q} - \frac{d}{dt}\frac{\partial f}{\partial \dot q} = 0 \quad\text{for } t \in (t_i, t_f)        (2.17)

We see that if f is the Lagrangian, we get exactly Lagrange’s equation.
The above derivation is essentially unaltered if we have many degrees
of freedom qi instead of just one.

2.1.4     Examples of functional variation
In this section we will work through some examples of functional vari-
ations both in the context of the action and for other examples not
directly related to mechanics.

The falling particle
As a first example of functional variation, consider a particle thrown
up in a uniform gravitational field at t = 0, which lands at the same
spot at t = T . The Lagrangian is L = ½m(ẋ² + ẏ² + ż²) − mgz, and
the boundary conditions are x(t) = y(t) = z(t) = 0 at t = 0 and
t = T . Elementary mechanics tells us the solution to this problem is
x(t) = y(t) ≡ 0, z(t) = v₀t − ½gt² with v₀ = ½gT . Let us evaluate the
action for any other path, writing z(t) in terms of its deviation from
the suspected solution,

    z(t) = \Delta z(t) + \frac12 g T t - \frac12 g t^2.

We make no assumptions about this path other than that it is differ-
entiable and meets the boundary conditions x = y = ∆z = 0 at t = 0
and at t = T . The action is
                                                                                 
                T                       2
                    1  2        d∆z                    d∆z 1 2
I =                   m x + y2 +
                        ˙   ˙               + g(T − 2t)     + g (T − 2t)2 
            0       2             dt                     dt  4
                       1
                −mg∆z − mg 2 t(T − t) dt.
                       2

The fourth term can be integrated by parts,

    \int_0^T \frac12 m g (T - 2t)\,\frac{d\Delta z}{dt}\,dt = \left.\frac12 m g (T - 2t)\,\Delta z\right|_0^T + \int_0^T m g\,\Delta z(t)\,dt.


The boundary term vanishes because ∆z = 0 where it is evaluated, and
the other term cancels the sixth term in I, so

    I = \int_0^T \frac12 m g^2 \left[\frac14 (T - 2t)^2 - t(T - t)\right] dt + \int_0^T \frac12 m \left[\dot x^2 + \dot y^2 + \left(\frac{d\Delta z}{dt}\right)^2\right] dt.

The first integral is independent of the path, so the minimum action
requires the second integral to be as small as possible. But it is an
integral of a non-negative quantity, so its minimum is zero, requiring
ẋ = ẏ = d∆z/dt = 0. As x = y = ∆z = 0 at t = 0, this tells us
x = y = ∆z = 0 at all times, and the path which minimizes the action
is the one we expect from elementary mechanics.
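This stationarity can also be seen numerically. The sketch below (ours) evaluates the action by the trapezoidal rule for the suspected solution and for a perturbed path with the same endpoints; the perturbed action comes out larger, by the purely second-order amount mε²π²/4T for this particular perturbation.

```python
import math

m, g, T = 1.0, 9.8, 2.0
v0 = 0.5 * g * T
N = 4000
dt = T / N

def action(z, zdot):
    """Trapezoidal estimate of I = integral_0^T [(1/2) m zdot^2 - m g z] dt."""
    vals = [0.5 * m * zdot(i * dt) ** 2 - m * g * z(i * dt) for i in range(N + 1)]
    return dt * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

z_true = lambda t: v0 * t - 0.5 * g * t * t
zd_true = lambda t: v0 - g * t

eps = 0.1   # perturbation amplitude; sin(pi t / T) vanishes at t = 0 and t = T
z_pert = lambda t: z_true(t) + eps * math.sin(math.pi * t / T)
zd_pert = lambda t: zd_true(t) + eps * (math.pi / T) * math.cos(math.pi * t / T)

I_true, I_pert = action(z_true, zd_true), action(z_pert, zd_pert)
print(I_pert - I_true)   # positive, close to m eps^2 pi^2 / (4 T)
```

The first-order (cross) terms cancel just as in the integration by parts above, so the difference is entirely the non-negative quadratic piece.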


Is the shortest path a straight line?

The calculus of variations occurs in other contexts, some of which are
more intuitive. The classic example is to find the shortest path between
two points in the plane. The length of a path y(x) from (x1 , y1 ) to

(x₂, y₂) is given⁵ by

    \ell = \int_{x_1}^{x_2} ds = \int_{x_1}^{x_2} \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx.
We see that length is playing the role of the action, and x is playing the
role of t. Using ẏ to represent dy/dx, we have the integrand f(y, ẏ, x) =
√(1 + ẏ²), and ∂f /∂y = 0, so Eq. 2.17 gives

    \frac{d}{dx}\frac{\partial f}{\partial \dot y} = \frac{d}{dx}\frac{\dot y}{\sqrt{1 + \dot y^2}} = 0,    so ẏ = const,
and the path is a straight line.
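As a numerical sanity check (ours), one can approximate path lengths by summing chords and compare the straight line with a perturbed path between the same endpoints:

```python
import math

def path_length(y, x1=0.0, x2=1.0, n=2000):
    """Polyline approximation to the length of the curve y(x) from x1 to x2."""
    total, prev = 0.0, (x1, y(x1))
    for i in range(1, n + 1):
        x = x1 + (x2 - x1) * i / n
        total += math.hypot(x - prev[0], y(x) - prev[1])
        prev = (x, y(x))
    return total

# Straight line from (0, 0) to (1, 1) versus a perturbed path with the
# same endpoints.
straight = path_length(lambda x: x)
wiggly = path_length(lambda x: x + 0.2 * math.sin(math.pi * x))
print(straight, wiggly)   # straight is sqrt(2); wiggly is longer
```

Any perturbation with the same endpoints lengthens the path, consistent with ẏ = const picking out the straight line.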

2.1.5      Conserved Quantities
Ignorable Coordinates
If the Lagrangian does not depend on one coordinate, say qk , then we
say it is an ignorable coordinate. Of course, we still want to solve
for it, as its derivative may still enter the Lagrangian and affect the
evolution of other coordinates. By Lagrange’s equation
    \frac{d}{dt}\frac{\partial L}{\partial \dot q_k} = \frac{\partial L}{\partial q_k} = 0,
so if in general we define

    P_k := \frac{\partial L}{\partial \dot q_k}
as the generalized momentum, then in the case that L is indepen-
dent of qk , Pk is conserved, dPk /dt = 0.

Linear Momentum As a very elementary example, consider a par-
ticle under a force given by a potential which depends only on y and z,
but not x. Then
    L = \frac12 m\left(\dot x^2 + \dot y^2 + \dot z^2\right) - U(y, z)
   ⁵ Here we are assuming the path is monotone in x, without moving somewhere
to the left and somewhere to the right. To prove that the straight line is shorter
than other paths which might not obey this restriction, do Exercise 2.2.

is independent of x, x is an ignorable coordinate and

    P_x = \frac{\partial L}{\partial \dot x} = m\dot x

is conserved. This is no surprise, of course, because the force is F =
−∇U and Fx = −∂U/∂x = 0.
    Note that, using the definition of the generalized momenta
    P_k = \frac{\partial L}{\partial \dot q_k},
Lagrange’s equation can be written as
    \frac{d}{dt} P_k = \frac{\partial L}{\partial q_k} = \frac{\partial T}{\partial q_k} - \frac{\partial U}{\partial q_k}.
Only the last term enters the definition of the generalized force, so if
the kinetic energy depends on the coordinates, as will often be the case,
it is not true that dPk /dt = Qk . In that sense we might say that the
generalized momentum and the generalized force have not been defined
consistently.

Angular Momentum As a second example of a system with an
ignorable coordinate, consider an axially symmetric system described
with inertial polar coordinates (r, θ, z), with z along the symmetry axis.
Extending the form of the kinetic energy we found in sec (1.3.4) to
include the z coordinate, we have T = ½mṙ² + ½mr²θ̇² + ½mż². The
potential is independent of θ, because otherwise the system would not
be symmetric about the z-axis, so the Lagrangian
    L = \frac12 m\dot r^2 + \frac12 m r^2\dot\theta^2 + \frac12 m\dot z^2 - U(r, z)
does not depend on θ, which is therefore an ignorable coordinate, and
    P_\theta := \frac{\partial L}{\partial\dot\theta} = m r^2\dot\theta = \text{constant}.
We see that the conserved momentum Pθ is in fact the z-component of
the angular momentum, and is conserved because the axially symmetric
potential can exert no torque in the z-direction:
    \tau_z = -\left(r \times \nabla U\right)_z = -r\left(\nabla U\right)_\theta = -\frac{\partial U}{\partial\theta} = 0.

    Finally, consider a particle in a spherically symmetric potential in
spherical coordinates. In section (3.1.2) we will show that the kinetic
energy in spherical coordinates is T = ½mṙ² + ½mr²θ̇² + ½mr² sin²θ φ̇²,
so the Lagrangian with a spherically symmetric potential is

    L = \frac12 m\dot r^2 + \frac12 m r^2\dot\theta^2 + \frac12 m r^2\sin^2\theta\,\dot\phi^2 - U(r).
                 2      2         2
Again, φ is an ignorable coordinate and the conjugate momentum Pφ
is conserved. Note, however, that even though the potential is inde-
pendent of θ as well, θ does appear undifferentiated in the Lagrangian,
and it is not an ignorable coordinate, nor is Pθ conserved⁶.

Energy Conservation
We may ask what happens to the Lagrangian along the path of the
motion.
    \frac{dL}{dt} = \sum_i \frac{\partial L}{\partial q_i}\frac{dq_i}{dt} + \sum_i \frac{\partial L}{\partial \dot q_i}\frac{d\dot q_i}{dt} + \frac{\partial L}{\partial t}.
In the first term the first factor is
    \frac{d}{dt}\frac{\partial L}{\partial \dot q_i}
by the equations of motion, so
    \frac{dL}{dt} = \frac{d}{dt}\left(\sum_i \frac{\partial L}{\partial \dot q_i}\,\dot q_i\right) + \frac{\partial L}{\partial t}.
We expect energy conservation when the potential is time invariant and
there is no time dependence in the constraints, i.e. when ∂L/∂t = 0,
so we rewrite this in terms of
    H(q, \dot q, t) = \sum_i \dot q_i \frac{\partial L}{\partial \dot q_i} - L = \sum_i \dot q_i P_i - L
   ⁶ It seems curious that we are finding straightforwardly one of the components
of the conserved momentum, but not the other two, Ly and Lx , which are also
conserved. The fact that not all of these emerge as conjugates to ignorable coordi-
nates is related to the fact that the components of the angular momentum do not
commute in quantum mechanics. This will be discussed further in section (6.6.1).

Then for the actual motion of the system,
    \frac{dH}{dt} = -\frac{\partial L}{\partial t}.
If ∂L/∂t = 0, H is conserved.
    H is essentially the Hamiltonian, although strictly speaking that
name is reserved for the function H(q, p, t) on extended phase space
rather than the function with arguments (q, q̇, t). What is H physically?
In the case of Newtonian mechanics with a potential function, L is
a quadratic function of the velocities q̇i . If we write the Lagrangian
L = L2 + L1 + L0 as a sum of pieces purely quadratic, purely linear,
and independent of the velocities respectively, then

    \sum_k \dot q_k \frac{\partial}{\partial \dot q_k}

is an operator which multiplies each term by its order in velocities,

    \sum_k \dot q_k \frac{\partial L_i}{\partial \dot q_k} = i L_i, \qquad \sum_k \dot q_k \frac{\partial L}{\partial \dot q_k} = 2L_2 + L_1,

and

    H = L_2 - L_0.
For a system of particles described by their cartesian coordinates, L2
is just the kinetic energy T , while L0 is the negative of the potential
energy L0 = −U, so H = T + U is the ordinary energy. As we shall see
later, however, there are constrained systems in which the Hamiltonian
is conserved but is not the ordinary energy.
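The bead on the rotating spoke provides exactly such a system: there L2 = ½mṙ² and L0 = ½mω²r², so H = ½mṙ² − ½mω²r² is conserved even though T = L2 + L0 blows up. A quick check along the exact solution (ours; the values of A and B are arbitrary):

```python
import math

m, omega = 1.0, 2.0
A, B = 0.3, 0.5   # coefficients in the exact solution r = A e^{-wt} + B e^{wt}

def r(t):
    return A * math.exp(-omega * t) + B * math.exp(omega * t)

def rdot(t):
    return omega * (-A * math.exp(-omega * t) + B * math.exp(omega * t))

def T(t):   # kinetic energy, including the rotational part: grows unboundedly
    return 0.5 * m * rdot(t) ** 2 + 0.5 * m * r(t) ** 2 * omega ** 2

def H(t):   # H = L2 - L0 = (1/2) m rdot^2 - (1/2) m omega^2 r^2
    return 0.5 * m * rdot(t) ** 2 - 0.5 * m * r(t) ** 2 * omega ** 2

print(H(0.0), H(1.5))   # constant, equal to -2 m A B omega^2
print(T(0.0), T(2.0))   # kinetic energy grows with time
```

Algebraically the cross terms cancel in H, leaving the constant −2mABω², while they add in T, which is the real work done by the constraint force.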

2.1.6      Hamilton’s Equations
We have written the Lagrangian as a function of qi , q̇i , and t, so it is a
function of N + N + 1 variables. For a free particle we can write the
kinetic energy either as ½mẋ² or as p²/2m. More generally, we can⁷
reexpress the dynamics in terms of the 2N + 1 variables qk , Pk , and t.
   ⁷ In field theory there arise situations in which the set of functions Pk (qi , q̇i )
cannot be inverted to give functions q̇i = q̇i (qj , Pj ). This gives rise to local gauge
invariance, and will be discussed in Chapter 8, but until then we will assume that
the phase space (q, p), or cotangent bundle, is equivalent to the tangent bundle,
i.e. the space of (q, q̇).
54 CHAPTER 2. LAGRANGE’S AND HAMILTON’S EQUATIONS

   The motion of the system sweeps out a path in the space $(q, \dot q, t)$ or
a path in $(q, P, t)$. Along this line, the variation of $L$ is
$$dL = \sum_k \left(\frac{\partial L}{\partial \dot q_k}\,d\dot q_k
       + \frac{\partial L}{\partial q_k}\,dq_k\right) + \frac{\partial L}{\partial t}\,dt
     = \sum_k \left(P_k\,d\dot q_k + \dot P_k\,dq_k\right) + \frac{\partial L}{\partial t}\,dt,$$
where for the first term we used the definition of the generalized mo-
mentum and in the second we have used the equations of motion $\dot P_k =
\partial L/\partial q_k$. Then examining the change in the Hamiltonian $H = \sum_k P_k \dot q_k -
L$ along this actual motion,
$$dH = \sum_k (P_k\,d\dot q_k + \dot q_k\,dP_k) - dL
     = \sum_k \left(\dot q_k\,dP_k - \dot P_k\,dq_k\right) - \frac{\partial L}{\partial t}\,dt.$$
If we think of $\dot q_k$ and $H$ as functions of $q$ and $P$, and think of $H$ as a
function of $q$, $P$, and $t$, we see that the physical motion obeys
$$\dot q_k = \left.\frac{\partial H}{\partial P_k}\right|_{q,t}, \qquad
  \dot P_k = -\left.\frac{\partial H}{\partial q_k}\right|_{P,t}, \qquad
  \left.\frac{\partial H}{\partial t}\right|_{q,P} = -\left.\frac{\partial L}{\partial t}\right|_{q,\dot q}.$$
The first two constitute Hamilton's equations of motion, which are
first order equations for the motion of the point representing the system
in phase space.
    Let's work out a simple example, the one dimensional harmonic
oscillator. Here the kinetic energy is $T = \frac12 m\dot x^2$, the potential energy
is $U = \frac12 kx^2$, so $L = \frac12 m\dot x^2 - \frac12 kx^2$; the only generalized momentum is
$P = \partial L/\partial\dot x = m\dot x$, and the Hamiltonian is $H = P\dot x - L = P^2/m -
(P^2/2m - \frac12 kx^2) = P^2/2m + \frac12 kx^2$. Note this is just the sum of the
kinetic and potential energies, or the total energy.
    Hamilton's equations give
$$\dot x = \left.\frac{\partial H}{\partial P}\right|_x = \frac{P}{m}, \qquad
  \dot P = -\left.\frac{\partial H}{\partial x}\right|_P = -kx = F.$$
2.1. LAGRANGIAN MECHANICS                                                        55

These two equations verify the usual connection of the momentum and
velocity and give Newton’s second law.
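These two first-order equations are simple enough to check numerically. The sketch below is plain Python; the values m = 1, k = 4, and the initial conditions are arbitrary choices for illustration, not taken from the text. It steps $\dot x = P/m$, $\dot P = -kx$ with the symplectic Euler method and watches the Hamiltonian $H = P^2/2m + \frac12 kx^2$ along the way.

```python
import math

def sho_hamilton(x0, p0, m, k, dt, steps):
    """Integrate Hamilton's equations xdot = P/m, Pdot = -k x
    with the symplectic Euler method; return final state and energies."""
    x, p = x0, p0
    energies = []
    for _ in range(steps):
        p -= k * x * dt          # Pdot = -dH/dx = -k x
        x += (p / m) * dt        # xdot =  dH/dP =  P/m
        energies.append(p * p / (2 * m) + 0.5 * k * x * x)
    return x, p, energies

# hypothetical values: m = 1, k = 4, so the angular frequency is omega = 2
xf, pf, E = sho_hamilton(x0=1.0, p0=0.0, m=1.0, k=4.0, dt=1e-4, steps=10000)
```

With these numbers the exact motion is $x(t) = \cos 2t$, so after the integrated time $t = 1$ the position should sit near $\cos 2$, and the recorded energies should all stay near the initial value $\frac12 k x_0^2 = 2$.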
    The identification of $H$ with the total energy is more general than
our particular example. If $T$ is purely quadratic in velocities, we can
write $T = \frac12 \sum_{ij} M_{ij}\dot q_i\dot q_j$ in terms of a symmetric mass matrix $M_{ij}$. If
in addition $U$ is independent of velocities,
$$L = \frac12 \sum_{ij} M_{ij}\dot q_i\dot q_j - U(q),$$
$$P_k = \frac{\partial L}{\partial \dot q_k} = \sum_i M_{ki}\dot q_i,$$
which as a matrix equation in an $n$-dimensional space is $P = M\cdot\dot q$.
Assuming $M$ is invertible,$^8$ we also have $\dot q = M^{-1}\cdot P$, so
$$\begin{aligned}
H &= P^T\cdot\dot q - L \\
  &= P^T\cdot M^{-1}\cdot P - \left(\tfrac12\,\dot q^T\cdot M\cdot\dot q - U(q)\right) \\
  &= P^T\cdot M^{-1}\cdot P - \tfrac12\,P^T\cdot M^{-1}\cdot M\cdot M^{-1}\cdot P + U(q) \\
  &= \tfrac12\,P^T\cdot M^{-1}\cdot P + U(q) = T + U,
\end{aligned}$$
so we see that the Hamiltonian is indeed the total energy under these
circumstances.
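This chain of matrix identities is easy to spot-check numerically. The sketch below is plain Python; the mass matrix, velocities, and potential value are made-up numbers, not from the text. It builds $P = M\cdot\dot q$ for a two-dimensional example and confirms that $\frac12 P^T M^{-1} P + U$ reproduces $T + U$.

```python
def mat2_inv(M):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quadform(A, u, v):
    """u^T A v for 2-vectors u, v and a 2x2 matrix A."""
    return sum(u[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

# hypothetical symmetric, invertible mass matrix and a potential value
M = [[2.0, 1.0], [1.0, 3.0]]
qdot = [0.7, -0.4]
U = 1.3                                      # U(q) at the chosen point

T = 0.5 * quadform(M, qdot, qdot)            # T = (1/2) qdot^T M qdot
P = [sum(M[i][j] * qdot[j] for j in range(2)) for i in range(2)]  # P = M qdot
Minv = mat2_inv(M)
H = 0.5 * quadform(Minv, P, P) + U           # H = (1/2) P^T M^{-1} P + U
```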

2.1.7      Velocity-dependent forces
We have concentrated thus far on Newtonian mechanics with a potential
given as a function of coordinates only. As the potential is a piece of
the Lagrangian, which may depend on velocities as well, we should
also entertain the possibility of velocity-dependent potentials. Only by
considering such a potential can we possibly find velocity-dependent
forces, and one of the most important force laws in physics is of that
form. This is the Lorentz force$^9$ on a particle of charge $q$ in the presence
of electromagnetic fields $\vec E(\vec r, t)$ and $\vec B(\vec r, t)$,
$$\vec F = q\left(\vec E + \frac{\vec v}{c}\times\vec B\right). \tag{2.18}$$

   $^8$If $M$ were not invertible, there would be a linear combination of velocities
which does not affect the Lagrangian. The degree of freedom corresponding to this
combination would have a Lagrange equation without time derivatives, so it would
be a constraint equation rather than an equation of motion. But we are assuming
that the $q$'s are a set of independent generalized coordinates that have already been
pruned of all constraints.

If the motion of a charged particle is described by Lagrangian mechanics
with a potential $U(\vec r, \vec v, t)$, Lagrange's equation says
$$0 = \frac{d}{dt}\frac{\partial L}{\partial v_i} - \frac{\partial L}{\partial r_i}
    = m\ddot r_i - \frac{d}{dt}\frac{\partial U}{\partial v_i} + \frac{\partial U}{\partial r_i},
  \qquad\text{so}\qquad
  F_i = \frac{d}{dt}\frac{\partial U}{\partial v_i} - \frac{\partial U}{\partial r_i}.$$
We want a force linear in $\vec v$ and proportional to $q$, so let us try
$$U = q\left(\phi(\vec r, t) + \vec v\cdot\vec C(\vec r, t)\right).$$
Then we need to have
$$\vec E + \frac{\vec v}{c}\times\vec B
  = \frac{d}{dt}\vec C - \vec\nabla\phi - \sum_j v_j\vec\nabla C_j. \tag{2.19}$$
The first term is a stream derivative evaluated at the time-dependent
position of the particle, so, as in Eq. (2.5),
$$\frac{d}{dt}\vec C = \frac{\partial\vec C}{\partial t} + \sum_j v_j\frac{\partial\vec C}{\partial x_j}.$$
The last term looks like the last term of (2.19), except that the indices
on the derivative operator and on $\vec C$ have been reversed. This suggests
that these two terms combine to form a cross product. Indeed, noting
(B.10) that
$$\vec v\times\left(\vec\nabla\times\vec C\right)
  = \sum_j v_j\vec\nabla C_j - \sum_j v_j\frac{\partial\vec C}{\partial x_j},$$
   $^9$We have used Gaussian units here, but those who prefer S.I. units (rationalized
MKS) can simply set $c = 1$.

we see that (2.19) becomes
$$\vec E + \frac{\vec v}{c}\times\vec B
  = \frac{\partial\vec C}{\partial t} - \vec\nabla\phi
    - \vec v\times\left(\vec\nabla\times\vec C\right).$$
    We have successfully generated the term linear in $\vec v$ if we can show
that there exists a vector field $\vec C(\vec r, t)$ such that $\vec B = -c\,\vec\nabla\times\vec C$. A curl
is always divergenceless, so this requires $\vec\nabla\cdot\vec B = 0$, but this is indeed
one of Maxwell's equations, and it ensures$^{10}$ there exists a vector field
$\vec A$, known as the magnetic vector potential, such that $\vec B = \vec\nabla\times\vec A$.
Thus with $\vec C = -\vec A/c$, we need only to find a $\phi$ such that
$$\vec E = -\vec\nabla\phi - \frac1c\frac{\partial\vec A}{\partial t}.$$
Once again, one of Maxwell's laws,
$$\vec\nabla\times\vec E + \frac1c\frac{\partial\vec B}{\partial t} = 0,$$
guarantees the existence of $\phi$, the electrostatic potential, because
after inserting $\vec B = \vec\nabla\times\vec A$, this is a statement that $\vec E + (1/c)\partial\vec A/\partial t$ has
no curl, and is the gradient of something.
    Thus we see that the Lagrangian which describes the motion of
a charged particle in an electromagnetic field is given by a velocity-
dependent potential
$$U(\vec r, \vec v) = q\left(\phi(\vec r, t) - (\vec v/c)\cdot\vec A(\vec r, t)\right).$$
Note, however, that this Lagrangian describes only the motion of the
charged particle, and not the dynamics of the field itself.
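As a numerical illustration of the force law (2.18) itself (not of the Lagrangian machinery), the sketch below integrates $m\,d\vec v/dt = q(\vec E + \vec v\times\vec B/c)$ in Gaussian units with a fourth-order Runge-Kutta step; all parameter values are made up for the test. In a pure magnetic field the force is perpendicular to $\vec v$, so the speed should be conserved while the velocity rotates at the cyclotron frequency $qB/mc$.

```python
import math

def cross(u, v):
    return [u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0]]

def lorentz_accel(v, q, m, c, E, B):
    """a = (q/m)(E + v x B / c), Gaussian units."""
    vxB = cross(v, B)
    return [(q / m) * (E[i] + vxB[i] / c) for i in range(3)]

def step_rk4(v, dt, acc):
    """One RK4 step for dv/dt = acc(v)."""
    k1 = acc(v)
    k2 = acc([v[i] + 0.5 * dt * k1[i] for i in range(3)])
    k3 = acc([v[i] + 0.5 * dt * k2[i] for i in range(3)])
    k4 = acc([v[i] + dt * k3[i] for i in range(3)])
    return [v[i] + dt * (k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) / 6 for i in range(3)]

# hypothetical numbers: q = m = c = B0 = 1, so the cyclotron frequency is 1
q, m, c = 1.0, 1.0, 1.0
E, B = [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]
acc = lambda v: lorentz_accel(v, q, m, c, E, B)

v = [1.0, 0.0, 0.0]
dt = 1e-3
steps = int(round(2 * math.pi / dt))   # integrate one cyclotron period
for _ in range(steps):
    v = step_rk4(v, dt, acc)
speed = math.sqrt(sum(vi * vi for vi in v))
```

After one full period the velocity should return close to its starting value, with the speed unchanged to numerical accuracy.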

Arbitrariness in the Lagrangian In this discussion of finding the
Lagrangian to describe the Lorentz force, we used the lemma that guar-
anteed that the divergenceless magnetic field $\vec B$ can be written in terms
of some magnetic vector potential $\vec A$, with $\vec B = \vec\nabla\times\vec A$. But $\vec A$ is not
uniquely specified by $\vec B$; in fact, if a change is made, $\vec A \to \vec A + \vec\nabla\lambda(\vec r, t)$,
$\vec B$ is unchanged because the curl of a gradient vanishes. The electric
field $\vec E$ will be changed by $-(1/c)\,\partial\,\delta\vec A/\partial t$, however, unless we also make
a change in the electrostatic potential, $\phi \to \phi - (1/c)\partial\lambda/\partial t$. If we do,
we have completely unchanged electromagnetic fields, which is where
the physics lies. This change in the potentials,
$$\vec A \to \vec A + \vec\nabla\lambda(\vec r, t), \qquad
  \phi \to \phi - \frac1c\frac{\partial\lambda}{\partial t}, \tag{2.20}$$
is known as a gauge transformation, and the invariance of the physics
under this change is known as gauge invariance. Under this change,
the potential $U$ and the Lagrangian are not unchanged,
$$L \to L - q\left(\delta\phi - \frac{\vec v}{c}\cdot\delta\vec A\right)
   = L + \frac{q}{c}\left(\frac{\partial\lambda}{\partial t} + \vec v\cdot\vec\nabla\lambda(\vec r, t)\right)
   = L + \frac{q}{c}\frac{d\lambda}{dt}.$$

   $^{10}$This is but one of many consequences of the Poincaré lemma, discussed in
section 6.5 (well, it should be). The particular forms we are using here state that
if $\vec\nabla\cdot\vec B = 0$ and $\vec\nabla\times\vec F = 0$ in all of $\mathbb{R}^3$, then there exist a scalar function $\phi$ and a
vector field $\vec A$ such that $\vec B = \vec\nabla\times\vec A$ and $\vec F = \vec\nabla\phi$.

    We have here an example which points out that there is not a unique
Lagrangian which describes a given physical problem, and the ambigu-
ity is more than just the arbitrary constant we always knew was involved
in the potential energy. This ambiguity is quite general, not depending
on the gauge transformations of Maxwell fields. In general, if
$$L^{(2)}(q_j, \dot q_j, t) = L^{(1)}(q_j, \dot q_j, t) + \frac{d}{dt} f(q_j, t), \tag{2.21}$$
then $L^{(1)}$ and $L^{(2)}$ give the same equations of motion, and therefore the
same physics, for $q_j(t)$. While this can be easily checked by evaluating
the Lagrange equations, it is best understood in terms of the variation
of the action. For any path $q_j(t)$ between $q_{jI}$ at $t = t_I$ and $q_{jF}$ at $t = t_F$,
the two actions are related by
$$S^{(2)} = \int_{t_I}^{t_F}\left[L^{(1)}(q_j, \dot q_j, t) + \frac{d}{dt} f(q_j, t)\right] dt
         = S^{(1)} + f(q_{jF}, t_F) - f(q_{jI}, t_I).$$
The variation of path that one makes to find the stationary action does
not change the endpoints $q_{jF}$ and $q_{jI}$, so the difference $S^{(2)} - S^{(1)}$ is a
constant independent of the trajectory, and a stationary trajectory for
$S^{(2)}$ is clearly stationary for $S^{(1)}$ as well.
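This endpoint property can be checked numerically. In the sketch below (plain Python; the oscillator Lagrangian, the choice $f(q, t) = q^2$, and the two trial paths are all hypothetical), the discretized actions of $L^{(1)}$ and $L^{(2)} = L^{(1)} + df/dt$ are computed along two different paths with the same endpoints; the difference $S^{(2)} - S^{(1)}$ comes out the same for both, namely $f(q_F, t_F) - f(q_I, t_I)$.

```python
import math

def action(L, q, qdot, t0=0.0, t1=1.0, n=4000):
    """Trapezoidal approximation to S = integral of L(q, qdot, t) dt."""
    h = (t1 - t0) / n
    s = 0.0
    for i in range(n + 1):
        t = t0 + i * h
        w = 0.5 if i in (0, n) else 1.0
        s += w * L(q(t), qdot(t), t) * h
    return s

# hypothetical example: L1 is a harmonic oscillator, f(q, t) = q^2
L1 = lambda q, v, t: 0.5 * v * v - 0.5 * q * q
f  = lambda q, t: q * q
L2 = lambda q, v, t: L1(q, v, t) + 2.0 * q * v    # L1 + df/dt, since df/dt = 2 q qdot

qI, qF = 0.3, 1.1
paths = [
    # straight-line path and a path with a sin(pi t) bump (same endpoints)
    (lambda t: qI + (qF - qI) * t,
     lambda t: qF - qI),
    (lambda t: qI + (qF - qI) * t + math.sin(math.pi * t),
     lambda t: qF - qI + math.pi * math.cos(math.pi * t)),
]
diffs = [action(L2, q, v) - action(L1, q, v) for q, v in paths]
expected = f(qF, 1.0) - f(qI, 0.0)   # 1.21 - 0.09 = 1.12
```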
     The conjugate momenta are affected by the change in Lagrangian,
however, because $L^{(2)} = L^{(1)} + \sum_j \dot q_j\,\partial f/\partial q_j + \partial f/\partial t$, so
$$p_j^{(2)} = \frac{\partial L^{(2)}}{\partial \dot q_j} = p_j^{(1)} + \frac{\partial f}{\partial q_j}.$$

    This ambiguity is not usually mentioned in elementary mechanics,
because if we restrict our attention to Lagrangians consisting of canon-
ical kinetic energy and potentials which are velocity-independent, a
change (2.21) to a Lagrangian $L^{(1)}$ of this type will produce an $L^{(2)}$
which is not of this type, unless $f$ is independent of position $q$, in which
case it leaves the momenta unchanged.

Dissipation Another familiar force which is velocity dependent is
friction. Even the "constant" sliding friction met with in elementary
courses depends on the direction, if not the magnitude, of the velocity.
Friction in a viscous medium is often taken to be a force proportional
to the velocity, $\vec F = -\alpha\vec v$. We saw above that a potential linear in
velocities produces a force perpendicular to $\vec v$, and a term higher order
in velocities will contribute to the acceleration. This situation cannot
be handled by Lagrange's equations. An extension to the Lagrange formal-
ism, involving Rayleigh's dissipation function, is discussed in Ref. [4].



                                 Exercises

 2.1 (Galilean relativity): Sally is sitting in a railroad car observing a
system of particles, using a Cartesian coordinate system so that the par-
ticles are at positions $\vec r_i^{(S)}(t)$, and move under the influence of a potential
$U^{(S)}(\{\vec r_i^{(S)}\})$. Thomas is in another railroad car, moving with constant ve-
locity $\vec u$ with respect to Sally, and so he describes the position of each particle
as $\vec r_i^{(T)}(t) = \vec r_i^{(S)}(t) - \vec u t$. Each takes the kinetic energy to be of the standard
form in his system, i.e. $T^{(S)} = \frac12\sum m_i\bigl(\dot{\vec r}_i^{(S)}\bigr)^2$ and $T^{(T)} = \frac12\sum m_i\bigl(\dot{\vec r}_i^{(T)}\bigr)^2$.

(a) Show that if Thomas assumes the potential function $U^{(T)}(\vec r^{(T)})$ to be
the same as Sally's at the same physical points,
$$U^{(T)}(\vec r^{(T)}) = U^{(S)}(\vec r^{(T)} + \vec u t), \tag{2.22}$$
then the equations of motion derived by Sally and Thomas describe the
same physics. That is, if $\vec r_i^{(S)}(t)$ is a solution of Sally's equations, $\vec r_i^{(T)}(t) =
\vec r_i^{(S)}(t) - \vec u t$ is a solution of Thomas'.
(b) Show that if $U^{(S)}(\{\vec r_i\})$ is a function only of the displacements of one
particle from another, $\{\vec r_i - \vec r_j\}$, then $U^{(T)}$ is the same function of its argu-
ments as $U^{(S)}$, $U^{(T)}(\{\vec r_i\}) = U^{(S)}(\{\vec r_i\})$. This is a different statement than
Eq. (2.22), which states that they agree at the same physical configuration.
Show it will not generally be true if $U^{(S)}$ is not restricted to depend only on
the differences in positions.
(c) If it is true that $U^{(S)}(\vec r) = U^{(T)}(\vec r)$, show that Sally and Thomas de-
rive the same equations of motion, which we call “form invariance” of the
equations.
(d) Show that nonetheless Sally and Thomas disagree on the energy of a
particular physical motion, and relate the difference to the total momentum.
Which of these quantities are conserved?

 2.2 In order to show that the shortest path in two dimensional Euclidean
space is a straight line without making the assumption that $\Delta x$ does not
change sign along the path, we can consider using a parameter $\lambda$ and de-
scribing the path by two functions $x(\lambda)$ and $y(\lambda)$, say with $\lambda \in [0, 1]$. Then
$$\ell = \int_0^1 d\lambda\,\sqrt{\dot x^2(\lambda) + \dot y^2(\lambda)},$$
where $\dot x$ means $dx/d\lambda$. This is of the form of a variational integral with
two variables. Show that the variational equations do not determine the
functions $x(\lambda)$ and $y(\lambda)$, but do determine that the path is a straight line.
Show that the pair of functions $(x(\lambda), y(\lambda))$ gives the same action as another
pair $(\tilde x(\lambda), \tilde y(\lambda))$, where $\tilde x(\lambda) = x(t(\lambda))$ and $\tilde y(\lambda) = y(t(\lambda))$, where $t(\lambda)$ is
any monotone function mapping $[0, 1]$ onto itself. Explain why this equality
of the lengths is obvious in terms of alternate parameterizations of the path.
[In field theory, this is an example of a local gauge invariance, and plays a
major role in string theory.]

 2.3 Consider a circular hoop of radius $R$ rotating about a vertical diameter
at a fixed angular velocity $\Omega$. On the hoop there is a bead of mass $m$, which
slides without friction on the hoop. The only external force is gravity. Derive
the Lagrangian and the Lagrange equation using the polar angle $\theta$ as the
unconstrained generalized coordinate. Find a conserved quantity, and find
the equilibrium points, for which $\dot\theta = 0$. Find the condition on $\Omega$ such that
there is an equilibrium point away from the axis.


 2.4 Early steam engines had a feedback device, called a governor, to au-
tomatically control the speed. The engine rotated a vertical shaft with an
angular velocity $\Omega$ proportional to its speed. On opposite sides of this shaft,
two hinged rods each held a metal weight, which was attached to another
such rod hinged to a sliding collar, as shown.

[Figure: Governor for a steam engine. Two rods of length $L$, each carrying a
ball of mass $m_1$, hang from the rotating shaft ($\Omega$); two more rods of length $L$
connect the balls to a collar of mass $m_2$ that slides on the shaft.]

As the shaft rotates faster, the balls move outwards, the collar rises and
uncovers a hole, releasing some steam. Assume all hinges are frictionless, the
rods massless, and each ball has mass $m_1$ and the collar has mass $m_2$.
(a) Write the Lagrangian in terms of the generalized coordinate $\theta$.
(b) Find the equilibrium angle $\theta$ as a function of the shaft angular velocity
$\Omega$. Tell whether the equilibrium is stable or not.


 2.5 A cylinder of radius $R$ is held horizontally in a fixed position, and
a smaller uniform cylindrical disk of radius $a$ is placed on top of the first
cylinder, and is released from rest. There is a coefficient of static friction
$\mu_s$ and a coefficient of kinetic friction $\mu_k < \mu_s$ for the contact between the
cylinders. As the equilibrium at the top is unstable, the top cylinder will
begin to roll on the bottom cylinder.

[Figure: A small cylinder of radius $a$ rolling on a fixed larger cylinder of
radius $R$; the angle $\theta$ locates the point of contact.]

(a) If $\mu_s$ is sufficiently large, the small disk will roll until it separates from
the fixed cylinder. Find the angle $\theta$ at which the separation occurs, and find
the minimum value of $\mu_s$ for which this situation holds.
(b) If $\mu_s$ is less than the minimum value found above, what happens
differently, and at what angle $\theta$ does this different behavior begin?

 2.6 (a) Show that if $\Phi(q_1, ..., q_n, t)$ is an arbitrary differentiable function on
extended configuration space, and $L^{(1)}(\{q_i\}, \{\dot q_j\}, t)$ and $L^{(2)}(\{q_i\}, \{\dot q_j\}, t)$
are two Lagrangians which differ by the total time derivative of $\Phi$,
$$L^{(1)}(\{q_i\}, \{\dot q_j\}, t) = L^{(2)}(\{q_i\}, \{\dot q_j\}, t) + \frac{d}{dt}\Phi(q_1, ..., q_n, t),$$
show by explicit calculations that the equations of motion determined by
$L^{(1)}$ are the same as the equations of motion determined by $L^{(2)}$.
(b) What is the relationship between the momenta $p_i^{(1)}$ and $p_i^{(2)}$ determined
by these two Lagrangians respectively?

 2.7 A particle of mass m lies on a frictionless horizontal table with a tiny
hole in it. An inextensible massless string attached to m goes through the
hole and is connected to another particle of mass M , which moves vertically
only. Give a full set of generalized unconstrained coordinates and write the
Lagrangian in terms of these. Assume the string remains taut at all times,
that the motions in question never have either particle reaching the hole,
and that there is no friction as the string slides at the hole.
Are there ignorable coordinates? Reduce the problem to a single second
order differential equation.

 2.8 Consider a mass $m$ on the end of a massless rigid rod of length $\ell$, the
other end of which is free to rotate about a fixed point. This is a spherical
pendulum. Find the Lagrangian and the equations of motion.

 2.9 (a) Find a differential equation for $\theta(\phi)$ for the shortest path on the
surface of a sphere between two arbitrary points on that surface, by mini-
mizing the length of the path, assuming it to be monotone in $\phi$.
(b) By geometrical argument (that it must be a great circle) argue that
the path should satisfy
$$\cos(\phi - \phi_0) = K\cot\theta,$$
and show that this is indeed the solution of the differential equation you
derived.

 2.10 (a): Find the canonical momenta for a charged particle moving in an
electromagnetic field and also under the influence of a non-electromagnetic
force described by a potential $U(\vec r)$.
(b): If the electromagnetic field is a constant magnetic field $\vec B = B_0\hat e_z$, with
no electric field and with $U(\vec r) = 0$, what conserved quantities are there?
Chapter 3

Two Body Central Forces

Consider two particles of masses m1 and m2 , with the only forces those
of their mutual interaction, which we assume is given by a potential
which is a function only of the distance between them, U(|r1 − r2 |). In
a mathematical sense this is a very strong restriction, but it applies very
nicely to many physical situations. The classical case is the motion of a
planet around the Sun, ignoring the effects mentioned at the beginning
of the book. But it also applies to electrostatic forces and to many
effective representations of nonrelativistic interparticle forces.




3.1      Reduction to a one dimensional problem

Our original problem has six degrees of freedom, but because of the
symmetries in the problem, many of these can be simply separated
and solved for, reducing the problem to a mathematically equivalent
problem of a single particle moving in one dimension. First we reduce
it to a one-body problem, and then we reduce the dimensionality.


3.1.1    Reduction to a one-body problem
As there are no external forces, we expect the center of mass coordinate
to be in uniform motion, and it behoves us to use
$$\vec R = \frac{m_1\vec r_1 + m_2\vec r_2}{m_1 + m_2}$$
as three of our generalized coordinates. For the other three, we first
use the cartesian components of the relative coordinate
$$\vec r := \vec r_2 - \vec r_1,$$
although we will soon change to spherical coordinates for this vector.
In terms of $\vec R$ and $\vec r$, the particle positions are
$$\vec r_1 = \vec R - \frac{m_2}{M}\vec r, \qquad
  \vec r_2 = \vec R + \frac{m_1}{M}\vec r, \qquad\text{where } M = m_1 + m_2.$$
The kinetic energy is
$$\begin{aligned}
T &= \frac12 m_1\dot{\vec r}_1^{\,2} + \frac12 m_2\dot{\vec r}_2^{\,2} \\
  &= \frac12 m_1\left(\dot{\vec R} - \frac{m_2}{M}\dot{\vec r}\right)^2
   + \frac12 m_2\left(\dot{\vec R} + \frac{m_1}{M}\dot{\vec r}\right)^2 \\
  &= \frac12 (m_1 + m_2)\dot{\vec R}^2 + \frac12\frac{m_1 m_2}{M}\dot{\vec r}^{\,2} \\
  &= \frac12 M\dot{\vec R}^2 + \frac12\mu\dot{\vec r}^{\,2},
\end{aligned}$$
where
$$\mu := \frac{m_1 m_2}{m_1 + m_2}$$
is called the reduced mass. Thus the kinetic energy is transformed to
the form for two effective particles of mass $M$ and $\mu$, which is neither
simpler nor more complicated than it was in the original variables.
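As a quick sanity check on the definition (plain Python, with made-up masses): for equal masses $\mu = m/2$, and when one partner is far heavier than the other, $\mu$ approaches the lighter mass.

```python
def reduced_mass(m1, m2):
    """mu = m1 m2 / (m1 + m2), the reduced mass of a two-body system."""
    return m1 * m2 / (m1 + m2)

mu_equal = reduced_mass(2.0, 2.0)     # equal masses m = 2: mu = m/2 = 1.0
mu_heavy = reduced_mass(1.0, 1.0e9)   # very heavy partner: mu -> m1 = 1.0
```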
    For the potential energy, however, the new variables are to be pre-
ferred, for $U(|\vec r_1 - \vec r_2|) = U(|\vec r|)$ is independent of $\vec R$, whose three compo-
nents are therefore ignorable coordinates, and their conjugate momenta
$$\left(P_{\mathrm{cm}}\right)_i = \frac{\partial(T - U)}{\partial\dot R_i} = M\dot R_i$$

are conserved. This reduces half of the motion to triviality, leaving
an effective one-body problem with $T = \frac12\mu\dot{\vec r}^{\,2}$, and the given potential
$U(\vec r)$.
   We have not yet made use of the fact that $U$ only depends on the
magnitude of $\vec r$. In fact, the above reduction applies to any two-body
system without external forces, as long as Newton's Third Law holds.

3.1.2     Reduction to one dimension
In the problem under discussion, however, there is the additional re-
striction that the potential depends only on the magnitude of $\vec r$, that is,
on the distance between the two particles, and not on the direction of
$\vec r$. Thus we now convert from cartesian to spherical coordinates $(r, \theta, \phi)$
for $\vec r$. In terms of the cartesian coordinates $(x, y, z)$,
$$\begin{array}{ll}
r = (x^2 + y^2 + z^2)^{1/2}, & x = r\sin\theta\cos\phi, \\
\theta = \cos^{-1}(z/r), & y = r\sin\theta\sin\phi, \\
\phi = \tan^{-1}(y/x), & z = r\cos\theta.
\end{array}$$
Plugging into the kinetic energy is messy but eventually reduces to a
rather simple form
$$\begin{aligned}
T &= \frac12\mu\left(\dot x_1^2 + \dot x_2^2 + \dot x_3^2\right) \\
  &= \frac12\mu\left[\left(\dot r\sin\theta\cos\phi + \dot\theta r\cos\theta\cos\phi - \dot\phi r\sin\theta\sin\phi\right)^2\right. \\
  &\qquad + \left(\dot r\sin\theta\sin\phi + \dot\theta r\cos\theta\sin\phi + \dot\phi r\sin\theta\cos\phi\right)^2 \\
  &\qquad \left. + \left(\dot r\cos\theta - \dot\theta r\sin\theta\right)^2\right] \\
  &= \frac12\mu\left(\dot r^2 + r^2\dot\theta^2 + r^2\sin^2\theta\,\dot\phi^2\right)
\end{aligned} \tag{3.1}$$
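Equation (3.1) can be verified numerically without redoing the algebra. The sketch below (plain Python; the trajectories $r(t)$, $\theta(t)$, $\phi(t)$ and the value of $\mu$ are arbitrary test functions, not from the text) differentiates the Cartesian position by central differences and compares $\frac12\mu v^2$ with the spherical-coordinate expression.

```python
import math

def cart(r, th, ph):
    """Cartesian position from spherical coordinates (r, theta, phi)."""
    return (r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th))

# hypothetical smooth trajectories for r(t), theta(t), phi(t)
r  = lambda t: 1.0 + 0.3 * math.sin(t)
th = lambda t: 1.0 + 0.2 * math.cos(t)
ph = lambda t: 0.5 * t

mu, t0, h = 1.7, 0.8, 1e-6

# Cartesian speed squared by central differences of the position
p_plus  = cart(r(t0 + h), th(t0 + h), ph(t0 + h))
p_minus = cart(r(t0 - h), th(t0 - h), ph(t0 - h))
v2 = sum(((a - b) / (2 * h)) ** 2 for a, b in zip(p_plus, p_minus))
T_cart = 0.5 * mu * v2

# Eq. (3.1), using the analytic derivatives of the chosen trajectories
rdot, thdot, phdot = 0.3 * math.cos(t0), -0.2 * math.sin(t0), 0.5
T_sph = 0.5 * mu * (rdot ** 2 + r(t0) ** 2 * thdot ** 2
                    + r(t0) ** 2 * math.sin(th(t0)) ** 2 * phdot ** 2)
```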
Notice that in spherical coordinates $T$ is a function of $r$ and $\theta$ as well as
$\dot r$, $\dot\theta$, and $\dot\phi$, but it is not a function of $\phi$, which is therefore an ignorable
coordinate, and
$$P_\phi = \frac{\partial L}{\partial\dot\phi} = \mu r^2\sin^2\theta\,\dot\phi = \text{constant}.$$
Note that $r\sin\theta$ is the distance of the particle from the $z$-axis, so $P_\phi$
is just the $z$-component of the angular momentum, $L_z$. Of course all

of $\vec L = \vec r\times\vec p$ is conserved, because in our effective one body problem
there is no torque about the origin. Thus $\vec L$ is a constant$^1$, and the
motion must remain in a plane perpendicular to $\vec L$ and passing through
the origin, as a consequence of the fact that $\vec r \perp \vec L$. It simplifies things
if we choose our coordinates so that $\vec L$ is in the $z$-direction. Then
$\theta = \pi/2$, $\dot\theta = 0$, $L = \mu r^2\dot\phi$. The $r$ equation of motion is then
$$\mu\ddot r - \mu r\dot\phi^2 + dU/dr = 0 = \mu\ddot r - \frac{L^2}{\mu r^3} + dU/dr.$$
This is the one-dimensional motion of a body in an effective potential
$$U_{\mathrm{eff}}(r) = U(r) + \frac{L^2}{2\mu r^2}.$$
Thus we have reduced a two-body three-dimensional problem to one
with a single degree of freedom, without any additional complication
except the addition of a centrifugal barrier term $L^2/2\mu r^2$ to the
potential.
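For the Kepler potential $U(r) = -k/r$, for example, the effective potential has a single minimum at the circular-orbit radius $r = L^2/\mu k$, which even a crude numerical scan recovers. The sketch below is plain Python with made-up values of $k$, $L$, and $\mu$.

```python
# hypothetical parameters for an attractive Kepler potential U(r) = -k/r
k, L, mu = 1.0, 0.9, 1.0

def U_eff(r):
    """Effective potential: U(r) plus the centrifugal barrier."""
    return -k / r + L * L / (2.0 * mu * r * r)

# crude numerical minimisation by scanning r in (0, 50]
rs = [0.01 * i for i in range(1, 5001)]
r_min = min(rs, key=U_eff)
r_circ = L * L / (mu * k)          # analytic circular-orbit radius
```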
    Before we proceed, a comment may be useful in retrospect about
the reduction in variables in going from the three dimensional one-body
problem to a one dimensional problem. Here we reduced the phase
space from six variables to two, in a problem which had four conserved
quantities, $\vec L$ and $H$. But we have not yet used the conservation of $H$
in this reduction; we have only used the three conserved quantities $\vec L$.
Where have these dimensions gone? From $\vec L$ conservation, by choosing
our axes with $\vec L \parallel \hat z$, the two constraints $L_x = 0$ and $L_y = 0$ (with
$L_z \neq 0$) do imply $z = p_z = 0$, thereby eliminating two of the coordinates
of phase space. The conservation of $L_z$, however, is a consequence of an
ignorable coordinate $\phi$, with conserved conjugate momentum $P_\phi = L_z$.
In this case, not only is the corresponding momentum restricted to a
constant value, eliminating one dimension of variation in phase space,
but the corresponding coordinate, $\phi$, while not fixed, drops out of con-
sideration because it does not appear in the remaining one dimensional
problem. This is generally true for an ignorable coordinate: the cor-
responding momentum becomes a time-constant parameter, and the
coordinate disappears from the remaining problem.

    $^1$If $\vec L = 0$, $\vec p$ and $\vec r$ are in the same direction, to which the motion is then
confined. In this case it is more appropriate to use Cartesian coordinates with this
direction as $x$, reducing the problem to a one-dimensional problem with potential
$U(x) = U(r = |x|)$. In the rest of this chapter we assume $\vec L \neq 0$.


3.2     Integrating the motion
We can simplify the problem even more by using the one conservation
law left, that of energy. Because the energy of the effective motion is a
constant,
                      1
                   E = µr 2 + Ueff = constant
                        ˙
                      2
we can immediately solve for
                                                1/2
                  dr    2
                     =±   (E − Ueff (r))               .
                  dt    µ

This can be inverted and integrated over r, to give
                                        dr
                  t = t0 ±                                ,        (3.2)
                                  2 (E − Ueff (r)) /µ

which is the inverse function of the solution to the radial motion prob-
lem r(t). We can also find the orbit because

                       dφ/dr = φ̇/(dr/dt) = (L/µr²)(dt/dr)
so
               φ = φ0 ± L ∫_{r0}^{r} dr /( r² √( 2µ(E − Ueff (r)) ) ) .    (3.3)

The sign ambiguity from the square root is only because r may be
increasing or decreasing, but time, and usually φ/L, are always in-
creasing.
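As a concrete check of this reduction to quadratures, Eq. (3.2) can be evaluated numerically between two radii lying safely inside the allowed region. The sketch below uses illustrative parameter values (µ, K, L, E are hypothetical choices, not taken from the text) for the Kepler potential U(r) = −K/r:

```python
import math

# Illustrative (hypothetical) parameters for a bound Kepler-type orbit
mu, K, L, E = 1.0, 1.0, 0.9, -0.3      # E < 0, so the motion is bounded

def U_eff(r):
    # effective potential U(r) + L^2/(2 mu r^2), with U(r) = -K/r
    return -K / r + L**2 / (2 * mu * r**2)

def radial_time(r1, r2, steps=100000):
    # Eq. (3.2): t - t0 = integral of dr / sqrt(2 (E - U_eff(r)) / mu),
    # evaluated by the midpoint rule so no sample lands on a turning point
    h = (r2 - r1) / steps
    t = 0.0
    for i in range(steps):
        r = r1 + (i + 0.5) * h
        t += h / math.sqrt(2 * (E - U_eff(r)) / mu)
    return t
```

With these values the turning points sit near r ≈ 0.47 and r ≈ 2.86, so a call such as `radial_time(0.6, 2.5)` returns a finite elapsed time; Eq. (3.3) can be integrated the same way with the integrand L/(r²√(2µ(E − Ueff ))).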
    Qualitative features of the motion are largely determined by the
range over which the argument of the square root is positive, as for
70                 CHAPTER 3. TWO BODY CENTRAL FORCES

other values of r we would have imaginary velocities. Thus the motion
is restricted to this allowed region. Unless L = 0 or the potential
U(r) is very strongly attractive for small r, the centrifugal barrier will
dominate, so Ueff → +∞ as r → 0, and there must be a smallest radius
rp > 0
for which E ≥ Ueff . Generically the force will not vanish there, so
E − Ueff ≈ c(r − rp ) for r ≈ rp , and the integrals in (3.2) and (3.3)
are convergent. Thus an incoming orbit reaches r = rp at a finite time
and finite angle, and the motion then continues with r increasing and
the ± signs reversed. The radius rp is called a turning point of the
motion. If there is also a maximum value of r for which the velocity
is real, it is also a turning point, and an outgoing orbit will reach this
maximum and then r will start to decrease, confining the orbit to the
allowed values of r.
    If there are both minimum and maximum values, this interpretation
of Eq. (3.3) gives φ as a multiple valued function of r, with an “inverse”
r(φ) which is a periodic function of φ. But there is no particular reason
for this period to be the geometrically natural periodicity 2π of φ, so
that different values of r may be expected in successive passes through
the same angle in the plane of the motion. There would need to be
something very special about the attractive potential for the period
to turn out to be just 2π, but indeed that is the case for Newtonian
gravity.
    We have reduced the problem of the motion to doing integrals. In
general that is all we can do explicitly, but in some cases we can do the
integral analytically, and two of these special cases are very important
physically.



3.2.1     The Kepler problem

Consider first the force of Newtonian gravity, or equivalently the Coulomb
attraction of unlike charged particles. The force F (r) = −K/r 2 has a
potential
                               U (r) = −K/r .
Then the φ integral is

              φ = φ0 ± ∫ (L/µr²) dr [ 2E/µ + 2K/µr − L²/µ²r² ]^(−1/2)
                = φ0 ± ∫ du /√( γ + αu − u² )                          (3.4)

where we have made the variable substitution u = 1/r which simplifies
the form, and have introduced abbreviations γ = 2µE/L² , α = 2Kµ/L² .
    As dφ/dr must be real the motion will clearly be confined to re-
gions for which the argument of the square root is nonnegative, and
the motion in r will reverse at the turning points where the argument
vanishes. The argument is clearly negative as u → ∞, which is r = 0.
We have assumed L ≠ 0, so the angular momentum barrier dominates
over the Coulomb attraction, and always prevents the particle from
reaching the origin. Thus there is always at least one turning point,
umax , corresponding to the minimum distance rmin. Then the argument
of the square root must factor into [−(u − umax )(u − umin)], although
if umin is negative it is not really the minimum u, which can never
get past zero. The integral (3.4) can be done² with the substitution
sin²β = (umax − u)/(umax − umin ). This shows φ = φ0 ± 2β, where φ0
is the angle at r = rmin , u = umax . Then

                        u ≡ 1/r = A cos(φ − φ0 ) + B
where A and B are constants which could be followed from our sequence
of substitutions, but are better evaluated in terms of the conserved
quantities E and L directly. φ = φ0 corresponds to the minimum r,
r = rp , the point of closest approach, or perigee³ , so rp = (A + B)^(−1) ,
and A > 0. Let θ = φ − φ0 be the angle from this minimum, with the x
   ²Of course it can also be done by looking in a good table of integrals. For
example, see 2.261(c) of Gradshtein and Ryzhik[5].
   ³Perigee is the correct word if the heavier of the two is the Earth, perihelion
if it is the sun, periastron for some other star. Pericenter is also used, but not as
generally as it ought to be.
axis along θ = 0. Then

     1/r = A cos θ + B = (1/rp )[ 1 − (e/(1 + e))(1 − cos θ) ] = (1/rp )(1 + e cos θ)/(1 + e)

where e = A/B.
   What is this orbit? Clearly rp just sets the scale of the whole orbit.
From rp (1 + e) = r + er cos θ = r + ex, if we subtract ex and square,
we get rp²(1 + e)² − 2rp (1 + e)ex + e²x² = r² = x² + y² , which is clearly
quadratic in x and y. It is therefore a conic section,

            y² + (1 − e²)x² + 2e(1 + e)xrp − (1 + e)²rp² = 0.



The nature of the curve depends on the coefficient of x2 . For

     • |e| < 1, the coefficient is > 0, and we have an ellipse.

     • e = ±1, the coefficient vanishes and y² = ax + b is a parabola.

     • |e| > 1, the coefficient is < 0, and we have a hyperbola.
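The three cases can be encoded in a small classifier keyed on the sign of the x² coefficient; this is purely an illustrative sketch:

```python
def conic_type(e):
    # the coefficient of x^2 in the conic is (1 - e^2):
    # positive -> ellipse, zero -> parabola, negative -> hyperbola
    coeff = 1 - e * e
    if coeff > 0:
        return "ellipse"
    if coeff == 0:
        return "parabola"
    return "hyperbola"
```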

    All of these are possible motions. The bound orbits are ellipses,
which describe planetary motion and also the motion of comets. But
objects which have enough energy to escape from the sun, such as
Voyager 2, are in hyperbolic orbit, or in the dividing case where the
total energy is exactly zero, a parabolic orbit. Then as time goes to ∞,
φ goes to a finite value, φ → π for a parabola, or some constant less
than π for a hyperbolic orbit.
    Let us return to the elliptic case. The closest approach, or perigee,
is r = rp , while the furthest apart the objects get is at θ = π, r =
ra = rp (1 + e)/(1 − e), which is called the apogee or aphelion. e is the
eccentricity of the ellipse. An ellipse is a circle stretched uniformly in
one direction; the diameter in that direction becomes the major axis
of the ellipse, while the perpendicular diameter becomes the minor
axis.
One half the length of the major axis is the semi-major axis and is
denoted by a. Then

        a = (1/2)( rp + rp (1 + e)/(1 − e) ) = rp /(1 − e),

so

        rp = (1 − e)a,      ra = (1 + e)a.

Notice that the center of the ellipse is ea away from the Sun.

[Figure: properties of an ellipse. The large dots are the foci; the
eccentricity is e and a is the semi-major axis.]

    Kepler tells us not only that the orbit is an ellipse, but also that the
sun is at one focus. To verify that, note the other focus of an ellipse
is symmetrically located, at (−2ea, 0), and work out the sum of the
distances of any point on the ellipse from the two foci. This will verify
that d + r = 2a is a constant, showing that the orbit is indeed an ellipse
with the sun at one focus.
    How are a and e related to the total energy E and the angular
momentum L? At apogee and perigee, dr/dφ vanishes, and so does ṙ,
so E = U(r) + L2 /2µr 2 = −K/r + L2 /2µr 2 , which holds at r = rp =
a(1−e) and at r = ra = a(1+e). Thus Ea2 (1±e)2 +Ka(1±e)−L2 /2µ =
0. These two equations are easily solved for a and e in terms of the
constants of the motion E and L
                    a = −K/2E ,           e² = 1 + 2EL²/µK² .
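These relations are easy to verify numerically: given hypothetical values of E and L for a bound orbit, a and e follow, and the energy recomputed at the turning point rp = a(1 − e) must reproduce E. A minimal sketch:

```python
import math

def orbit_parameters(E, L, mu, K):
    # a = -K/(2E) and e = sqrt(1 + 2 E L^2/(mu K^2)), for a bound orbit E < 0
    a = -K / (2 * E)
    e = math.sqrt(1 + 2 * E * L**2 / (mu * K**2))
    return a, e

# illustrative values (not from the text)
mu, K, E, L = 1.0, 1.0, -0.3, 0.9
a, e = orbit_parameters(E, L, mu, K)
rp = a * (1 - e)                        # perigee, a turning point (rdot = 0)
E_check = -K / rp + L**2 / (2 * mu * rp**2)
```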
    As expected for a bound orbit, we have found r as a periodic func-
tion of φ, but it is surprising that the period is the natural period 2π.
In other words, as the planet makes its revolutions around the sun,
its perihelion is always in the same direction. That didn’t have to be
the case — one could imagine that each time around, the minimum
distance occurred at a slightly different (or very different) angle. Such
an effect is called the precession of the perihelion. We will discuss
this for nearly circular orbits in other potentials in section (3.2.2).
    What about Kepler’s Third Law? The area of a triangle with r as
one edge and the displacement during a small time interval δr = vδt is
δA = (1/2)|r × v|δt = |r × p|δt/2µ, so the area swept out per unit time is

                               dA/dt = L/2µ ,
which is constant. The area of an ellipse made by stretching a circle is
stretched by the same amount, so A is π times the semimajor axis times
the semiminor axis. The endpoint of the semiminor axis is a away from
each focus, so it is a√(1 − e²) from the center, and

     A = πa² √(1 − e²) = πa² √( −2EL²/µK² ) = πa² (L/K)√(−2E/µ) .
Recall that for bound orbits E < 0, so A is real. The period is just the
area swept out in one revolution divided by the rate it is swept out, or
             T = πa² (L/K)√(−2E/µ) · (2µ/L)
               = (2πa²/K)√(−2µE) = (π/2) K (2µ)^(1/2) (−E)^(−3/2)        (3.5)
               = (2πa²/K)√(µK/a) = 2π a^(3/2) K^(−1/2) µ^(1/2) ,         (3.6)
independent of L. The fact that T and a depend only on E and not on
L is another fascinating manifestation of the very subtle symmetries of
the Kepler/Coulomb problem.
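Equation (3.6) is Kepler's Third Law, T² ∝ a³. A one-line sketch (the parameter names are illustrative):

```python
import math

def kepler_period(a, mu, K):
    # Eq. (3.6): T = 2 pi a^(3/2) K^(-1/2) mu^(1/2), independent of L
    return 2 * math.pi * a**1.5 * math.sqrt(mu / K)
```

Quadrupling a multiplies T by 8, as T² ∝ a³ requires.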

3.2.2    Nearly Circular Orbits
For a general central potential we cannot find an analytic form for the
motion, which involves solving the effective one-dimensional problem
with Ueff (r) = U (r) + L²/2µr² . If Ueff (r) has a minimum at r = a, one
solution is certainly a circular orbit of radius a. The minimum requires
dUeff (r)/dr = 0 = −F (r) − L²/µr³ , so

                              F (a) = −L²/µa³ .
We may also ask about trajectories which differ only slightly from this
orbit, for which |r − a| is small. Expanding Ueff (r) in a Taylor series
about a,
                       Ueff (r) = Ueff (a) + (1/2)(r − a)² k,
where
                     k = d²Ueff /dr² |a = −dF/dr + 3L²/µa⁴ = −( dF/dr + 3F/a ) .

For r = a to be a minimum and the nearly circular orbits to be stable,
the second derivative and k must be positive, and therefore dF/dr +
3F/a < 0. As always when we treat a problem as small deviations from a
stable equilibrium⁴ we have harmonic oscillator motion, with a period
Tosc = 2π√(µ/k).
    As a simple class of examples, consider the case where the force law
depends on r with a simple power, F = −cr^n . Then k = (n + 3)c a^(n−1) ,
which is positive and the orbit stable only if n > −3. For gravity,
n = −2, c = K, k = K/a³ , and

                                 Tosc = 2π√( µa³/K ),
agreeing with what we derived for the more general motion, not re-
stricted to small deviations from circularity. But for more general n,
we find

                             Tosc = 2π√( µa^(1−n) /(c(n + 3)) ) .

The period of revolution Trev can be calculated for the circular orbit,
as
                     L = µa²θ̇ = µa² (2π/Trev ) = √( µa³ |F (a)| ),
   ⁴This statement has an exception if the second derivative vanishes, k = 0.
so

                          Trev = 2π√( µa/|F (a)| ),

which for the power law case is


                          Trev = 2π√( µa^(1−n) /c ) .

Thus the two periods Tosc and Trev are not equal unless n = −2, as
in the gravitational case. Let us define the apsidal angle ψ as the
angle between an apogee and the next perigee. It is therefore ψ =
πTosc /Trev = π/√(3 + n). For the gravitational case ψ = π, the apogee
and perigee are on opposite sides of the orbit. For a two- or three-
dimensional harmonic oscillator F (r) = −kr we have n = 1, ψ = π/2,
and now an orbit contains two apogees and two perigees, and is again
an ellipse, but now with the center-of-force at the center of the ellipse
rather than at one focus.
    Note that if ψ/π is not rational, the orbit never closes, while if
ψ/π = p/q, the orbit will close after q revolutions, having reached p
apogees and perigees. The orbit will then be closed, but unless q = 1
it will be self-intersecting. This exact closure is also only true in the
small deviation approximation; more generally, Bertrand’s Theorem
states that only for the n = −2 and n = 1 cases are the generic orbits
closed.
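The apsidal angle and the closure condition can be explored numerically; the rational-ratio test below (with an arbitrary denominator cutoff) is only a sketch of the idea:

```python
import math
from fractions import Fraction

def apsidal_angle(n):
    # psi = pi / sqrt(3 + n) for near-circular orbits under F = -c r^n (n > -3)
    return math.pi / math.sqrt(3 + n)

def orbit_closes(n, max_denominator=64):
    # the near-circular orbit closes iff psi/pi is a rational number p/q
    ratio = apsidal_angle(n) / math.pi
    approx = Fraction(ratio).limit_denominator(max_denominator)
    return abs(ratio - float(approx)) < 1e-12
```

The gravitational (n = −2) and harmonic (n = 1) cases give ψ/π = 1 and 1/2; a force law such as n = 0 gives the irrational 1/√3, and the near-circular orbit never closes.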
    In the treatment of planetary motion, the precession of the peri-
helion is the angle through which the perihelion slowly moves, so it is
2ψ − 2π per orbit. We have seen that it is zero for the pure inverse-square force
law. There is actually some precession of the planets, due mostly to
perturbative effects of the other planets, but also in part due to correc-
tions to Newtonian mechanics found from Einstein’s theory of general
relativity. In the late nineteenth century discrepancies in the preces-
sion of Mercury’s orbit remained unexplained, and the resolution by
Einstein was one of the important initial successes of general relativity.
3.3. THE LAPLACE-RUNGE-LENZ VECTOR                                               77

3.3       The Laplace-Runge-Lenz Vector
The remarkable simplicity of the motion for the Kepler and harmonic
oscillator central force problems is in each case connected with a hidden
symmetry. We now explore this for the Kepler problem.
    For any central force problem F = ṗ = f (r)êr we have a conserved
angular momentum L = m(r × ṙ), for L̇ = mṙ × ṙ + (f (r)/r)r × r = 0.
The motion is therefore confined to a plane perpendicular to L, and the
vector p × L is always in the plane of motion, as are r and p. Consider
the evolution of p × L with time⁵:

       d(p × L)/dt = ṗ × L = F × L = mf (r) êr × (r × ṙ)
                   = mf (r)[ r (êr · ṙ) − ṙ (êr · r) ] = mf (r)(ṙr − rṙ).

On the other hand, the time variation of the unit vector êr = r/r is

       dêr /dt = d(r/r)/dt = ṙ/r − (ṙ/r²)r = −(ṙr − rṙ)/r² .
For the Kepler case, where f (r) = −K/r² , these are proportional to
each other with a constant ratio, so we can combine them to form a
conserved quantity A = p × L − mK êr , called⁶ the Laplace-Runge-
Lenz vector, dA/dt = 0.
    While we have just found three conserved quantities in addition to
the conserved energy and the three conserved components of L, these
cannot all be independent. Indeed we have already noted that A lies
in the plane of motion and is perpendicular to L, so A · L = 0. If we
dot A into the position vector,
   A · r = r · (p × (r × p)) − mKr = (r × p)² − mKr = L² − mKr,
so if θ is the angle between A and r, we have Ar cos θ + mKr = L² , or

                        1/r = (mK/L²)( 1 + (A/mK) cos θ ),
   ⁵Some hints: A × (B × C) = B(A · C) − C(A · B), and êr · ṙ = (1/r)r · ṙ =
(1/2r)d(r²)/dt = ṙ. The first equation, known as the bac-cab equation, is shown
in Appendix A.
   ⁶by Goldstein, at least. While others often use only the last two names, Laplace
clearly has priority.
which is an elegant way of deriving the formula we found previously
by integration, with A = mKe. Note θ = 0 is the perigee, so A is a
constant vector pointing towards the perigee.
    We also see that the magnitude of A is given in terms of e, which we
have previously related to L and E, so A² = m²K² + 2mEL² is a further
relation among the seven conserved quantities, showing that only five
are independent. There could not be more than five independent con-
served functions depending analytically on the six variables of phase
space (for the relative motion only), for otherwise the point represent-
ing the system in phase space would be unable to move. In fact, the
five independent conserved quantities on the six-dimensional phase
space confine a generic invariant set of states, or orbit, to
a one dimensional subspace. For power laws other than n = −2 and
n = 1, as the orbits do not close, they are dense in a two dimensional
region of phase space, indicating that there cannot be more than four
independent conserved analytic functions on phase space. So we see
the connection between the existence of the conserved A in the Kepler
case and the fact that the orbits are closed.
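The conservation of A, and its failure for other force laws, can be checked numerically by integrating µr̈ = −K r/r³ in the orbital plane and evaluating the in-plane components of A = p × L − µK êr along the way. The fixed-step RK4 integrator and parameter values below are an illustrative sketch, not part of the text:

```python
import math

mu, K = 1.0, 1.0                       # illustrative reduced mass and coupling

def deriv(s):
    # state s = (x, y, vx, vy); mu * rddot = -K r / r^3
    x, y, vx, vy = s
    r3 = (x * x + y * y) ** 1.5
    return (vx, vy, -K * x / (mu * r3), -K * y / (mu * r3))

def rk4_step(s, h):
    # one classical fourth-order Runge-Kutta step
    k1 = deriv(s)
    k2 = deriv(tuple(si + 0.5 * h * ki for si, ki in zip(s, k1)))
    k3 = deriv(tuple(si + 0.5 * h * ki for si, ki in zip(s, k2)))
    k4 = deriv(tuple(si + h * ki for si, ki in zip(s, k3)))
    return tuple(si + h * (a + 2 * b + 2 * c + d) / 6.0
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

def lrl_vector(s):
    # in-plane components of A = p x L - mu K e_r, with L = Lz zhat
    x, y, vx, vy = s
    r = math.hypot(x, y)
    px, py = mu * vx, mu * vy
    Lz = x * py - y * px
    return (py * Lz - mu * K * x / r, -px * Lz - mu * K * y / r)

s = (1.0, 0.0, 0.0, 1.2)               # bound orbit: E = 0.72 - 1.0 < 0
A0 = lrl_vector(s)
for _ in range(20000):                 # integrate through several revolutions
    s = rk4_step(s, 0.005)
A1 = lrl_vector(s)
```

Here A1 matches A0 to roughly the integrator's accuracy; repeating the experiment with a non-inverse-square force (say f(r) ∝ −1/r³) would show A drifting from step to step.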


3.4     The virial theorem
Consider a system of particles and the quantity G = Σi pi · ri . Then
the rate at which this changes is

                        dG/dt = Σi Fi · ri + 2T.
If the system returns to a region in phase space where it had been, after
some time, G returns to what it was, and the average value of dG/dt
vanishes,
                   ⟨dG/dt⟩ = ⟨Σi Fi · ri ⟩ + 2⟨T ⟩ = 0.
This average will also be zero if the region stays in some bounded part
of phase space for which G can only take bounded values, and the
averaging time is taken to infinity. This is appropriate for a system in
thermal equilibrium, for example.
3.5. RUTHERFORD SCATTERING                                                 79

    Consider a gas of particles which interact only with the fixed walls
of the container, so that the force acts only on the surface, and the sum
becomes an integral over dF = −pdA, where p is the uniform pressure
and dA is an outward pointing vector representing a small piece of the
surface of the volume. Then

             Σi Fi · ri = −∮δV p r · dA = −p ∫V ∇ · r dV = −3pV

so 2⟨T ⟩ = 3pV .
   A very different application occurs for a power law central force
between pairs of particles, say for a potential U(ri , rj ) = a|ri − rj |n+1 .
Then this action and reaction contribute Fij ·rj + Fji ·ri = Fji ·(ri −rj ) =
−(n + 1)a|ri − rj |n+1 = −(n + 1)U(ri , rj ). So summing over all the
particles and using 2⟨T ⟩ = −⟨Σ F · r⟩, we have

                            ⟨T ⟩ = ((n + 1)/2) ⟨U ⟩ .

For Kepler, n = −2, so ⟨T ⟩ = −(1/2)⟨U ⟩ = −⟨T + U ⟩ = −E must hold for
closed orbits or for large systems of particles which remain bound and
uncollapsed. It is not true, of course, for unbound systems which have
E > 0.
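For a circular Kepler orbit the time averages are trivial, since T and U are constant along the orbit, so the virial relation can be checked directly. The numbers below are illustrative:

```python
import math

mu, K, a = 1.0, 1.0, 2.0          # illustrative reduced mass, coupling, radius

# circular orbit: mu v^2 / a = K / a^2  =>  v = sqrt(K / (mu a))
v = math.sqrt(K / (mu * a))
T = 0.5 * mu * v * v              # kinetic energy
U = -K / a                        # potential energy
E = T + U

# virial theorem for n = -2: <T> = -(1/2)<U>, hence E = -<T>
```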
    The fact that the average value of the kinetic energy in a bound sys-
tem gives a measure of the potential energy is the basis of the measure-
ments of the missing mass, or dark matter, in galaxies and in clusters of
galaxies. This remains a useful tool despite the fact that a multiparticle
gravitationally bound system can generally throw off some particles by
bringing others closer together, so that, strictly speaking, G does not
return to its original value or remain bounded.



3.5      Rutherford Scattering
We have discussed the 1/r potential in terms of Newtonian gravity,
but of course it is equally applicable to Coulomb’s law of electrostatic
forces. The force between nonrelativistic charges Q and q is given⁷ by

                               F = (1/4πε0 ) (Qq/r³) r,

and the potential energy is U (r) = −K/r with K = −Qq/4πε0 .

Unlike gravity, the force is not always attractive (K > 0), and for like
sign charges we have K < 0, and therefore U and the total energy are
always positive, and there are no bound motions. Whatever the relative
signs, we are going to consider scattering here, and therefore positive
energy solutions with the initial state of finite speed v0 and r → ∞.
Thus the relative motion is a hyperbola, with

         r = rp (1 + e)/(1 + e cos φ),
         e = ±√( 1 + 2EL²/µK² ).

This starts and ends with r → ∞, at φ → ±α = ± cos⁻¹(−1/e), and
the angle θ through which the velocity changes is called the scattering
angle. For simplicity we will consider the repulsive case, with e < 0 so
that α < π/2.

[Figure: Rutherford scattering. An α particle approaches a heavy nu-
cleus with an impact parameter b, scattering through an angle θ. The
cross sectional area dσ of the incident beam is scattered through angles
∈ [θ, θ + dθ].]

   ⁷Here we use S. I. or rationalized MKS units. For Gaussian units drop the 4πε0 ,
or for Heaviside-Lorentz units drop only the ε0 .
We see that θ = π − 2α, so

   tan(θ/2) = cot α = cos α/√(1 − cos²α) = |e|⁻¹/√(1 − |e|⁻²) = 1/√(e² − 1) = √( µK²/2EL² ).

We have K = Qq/4πε0 . We need to evaluate E and L. At r = ∞,
U → 0, E = (1/2)µv0² , L = µbv0 , where b is the impact parameter, the
distance by which the asymptotic line of the initial motion misses the
scattering center. Thus

                 tan(θ/2) = K √( µ/( µv0² (µbv0 )² ) ) = K/(µbv0²) .       (3.7)
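Eq. (3.7) gives the scattering angle as a function of the impact parameter; a small wrapper (with illustrative arguments) makes the monotonic dependence explicit:

```python
import math

def scattering_angle(b, v0, mu, K):
    # Eq. (3.7): tan(theta/2) = K / (mu b v0^2)
    return 2.0 * math.atan(K / (mu * b * v0**2))
```

θ → π as b → 0 (head-on backscattering) and θ → 0 as b → ∞.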

    The scattering angle therefore depends on b, the perpendicular dis-
placement from the axis parallel to the beam through the nucleus. Par-
ticles passing through a given area will be scattered through a given
angle, with a fixed angle θ corresponding to a circle centered on the
axis, having radius b(θ) given by (3.7). The area of the beam dσ in an
annular ring of impact parameters ∈ [b, b + db] is dσ = 2πbdb. To relate
db to dθ, we differentiate the scattering equation for fixed v0 ,
                       (1/2) sec²(θ/2) dθ = −( K/µv0² b² ) db,


    dσ/dθ = 2πb · µv0² b² /( 2K cos²(θ/2) ) = πµv0² b³ /( K cos²(θ/2) )
          = ( πµv0² /( K cos²(θ/2) ) ) (K/µv0²)³ ( cos³(θ/2)/sin³(θ/2) )
          = π (K/µv0²)² cos(θ/2)/sin³(θ/2)
          = (π/2)(K/µv0²)² sin θ/ sin⁴(θ/2) .

(The last expression is useful because sin θdθ is the “natural measure”
for θ, in the sense that integrating over volume in spherical coordinates
is d³V = r² dr sin θ dθ dφ.)
    How do we measure dσ/dθ? There is a beam of N particles shot
at random impact parameters onto a foil with n scattering centers per
unit area, and we confine the beam to an area A. Each particle will be
significantly scattered only by the scattering center to which it comes
closest, if the foil is thin enough. The number of incident particles per
unit area is N/A, and the number of scatterers being bombarded is nA,
so the number which get scattered through an angle ∈ [θ, θ + dθ] is
                     (N/A) × nA × (dσ/dθ) dθ = N n (dσ/dθ) dθ.
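The thin-foil counting argument is just a product of three factors, with the beam area A cancelling; as a trivial sketch with hypothetical names:

```python
def scattered_into(N, n, dsigma_dtheta, dtheta):
    # (N/A) incident particles per unit area, n*A scatterers, each presenting
    # dsigma/dtheta * dtheta of cross-sectional area: the A's cancel
    return N * n * dsigma_dtheta * dtheta
```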
    We have used the cylindrical symmetry of this problem to ignore the
φ dependence of the scattering. More generally, the scattering would
not be uniform in φ, so that the area of beam scattered into a given
region of (θ,φ) would be
                            dσ = (dσ/dΩ) sin θ dθ dφ,
where dσ/dΩ is called the differential cross section. For Rutherford
scattering we have
                        dσ/dΩ = (1/4)(K/µv0²)² csc⁴(θ/2).
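As a consistency check, the closed form for dσ/dΩ can be compared against the generic relation dσ/dΩ = (b/sin θ)|db/dθ| with b(θ) obtained by inverting (3.7). The parameter values here are illustrative:

```python
import math

mu, v0, K = 1.0, 1.0, 1.0          # illustrative beam parameters

def b_of_theta(theta):
    # invert Eq. (3.7): b = (K / (mu v0^2)) cot(theta/2)
    return K / (mu * v0**2 * math.tan(theta / 2))

def dsigma_domega(theta):
    # closed form: (1/4) (K / (mu v0^2))^2 csc^4(theta/2)
    return 0.25 * (K / (mu * v0**2))**2 / math.sin(theta / 2)**4

def dsigma_domega_numeric(theta, h=1e-6):
    # generic relation dsigma/dOmega = (b / sin theta) |db/dtheta|
    b = b_of_theta(theta)
    dbdtheta = (b_of_theta(theta + h) - b_of_theta(theta - h)) / (2 * h)
    return b * abs(dbdtheta) / math.sin(theta)
```

The two agree to the accuracy of the finite-difference derivative, for any scattering angle away from 0 and π.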

Scattering in other potentials
We see that the cross section depends on the angle through which the in-
cident particle is scattered for a given impact parameter. In Rutherford
scattering θ increases monotonically as b decreases, which is possible
only because the force is “hard”, and a particle aimed right at the cen-
ter will turn around rather than plowing through. This was a surprise
to Rutherford, for the concurrent model of the nucleus, Thomson’s
plum pudding model, had the nuclear charge spread out over some
atomic-sized spherical region, and the Coulomb potential would have
decreased once the alpha particle entered this region. So sufficiently en-
ergetic alpha particles aimed at the center should have passed through
undeflected instead of scattered backwards. In fact, of course, the nu-
cleus does have a finite size, and this is still true, but at a much smaller
distance, and therefore a much larger energy.
    If the scattering angle θ(b) does run smoothly from 0 at b = 0
to 0 at b → ∞, as shown, then there is an extremal value for which
dθ/db|b0 = 0, and for θ < θ(b0 ), dσ/dθ can get contributions from
several different b’s,


                  dσ/dΩ = Σi (bi / sin θ) |db/dθ|i .


It also means that the cross section becomes infinite as θ → θ(b0 ),
and vanishes above that value of θ. This effect is known as rainbow
scattering, and is the cause of rainbows, because the scattering for a
given color light off a water droplet is very strongly peaked at the
maximum angle of scattering.

[Figure: θ(b) rising from 0 at b = 0 to a maximum θ(b0 ) at b = b0 ,
then falling back to 0 as b → ∞.]
    Another unusual effect occurs when θ(b) becomes 0 or π for some
nonzero value of b, with db/dθ finite. Then dσ/dΩ blows up due to the
sin θ in the denominator, even though the integral ∫ (dσ/dΩ) sin θ dθ dφ
is perfectly finite. This effect is called glory scattering, and can be
seen around the shadow of a plane on the clouds below.




                              Exercises



 3.1 Consider a spherical droplet of water in the sunlight. A ray of light
with impact parameter b is refracted, so by Snell’s Law n sin β = sin α. It is
then internally reflected once and refracted again on the way out.
(a) Express the scattering angle θ in terms of α and β.
(b) Find the scattering cross section dσ/dΩ as a function of θ, α and
β (which is implicitly a function of θ from (a) and Snell’s Law).
(c) The smallest value of θ is called the rainbow scattering angle. Why?
Find it numerically to first order in δ if the index of refraction is
n = 1.333 + δ.
(d) The visual spectrum runs from violet, where n = 1.343, to red,
where n = 1.331. Find the angular radius of the rainbow’s circle, and
the angular width of the rainbow, and tell whether the red or blue is
on the outside.

[Figure: one way light can scatter from a spherical raindrop.]

 3.2 Consider a particle constrained to move on the surface described in
cylindrical coordinates by z = αr³ , subject to a constant gravitational force
F = −mgêz . Find the Lagrangian, two conserved quantities, and reduce the
problem to a one dimensional problem. What is the condition for circular
motion at constant r?

 3.3 From the general expression for φ as an integral over r, applied to a
three dimensional symmetrical harmonic oscillator V (r) = (1/2)kr² , integrate
the equation, and show that the motion is an ellipse, with the center of
force at the center of the ellipse. Consider the three complex quantities
Qi = pi − i√(km) ri , and show that each has a very simple equation of motion,
as a consequence of which the nine quantities Q∗i Qk are conserved. Identify
as many as possible of these with previously known conserved quantities.

 3.4 Show that if a particle under the influence of a central force has an
orbit which is a circle passing through the point of attraction, then the
force is a power law with |F| ∝ r⁻⁵. Assuming the potential is defined so
that U(∞) = 0, show that for this particular orbit E = 0, find the period,
and by expressing ẋ, ẏ and the speed as a function of the angle measured
from the center of the circle, and its derivative, show that ẋ, ẏ and the
speed all go to infinity as the particle passes through the center of force.
Chapter 4

Rigid Body Motion

In this chapter we develop the dynamics of a rigid body, one in which all
interparticle distances are fixed by internal forces of constraint. This is,
of course, an idealization which ignores elastic and plastic deformations
to which any real body is susceptible, but it is an excellent approxi-
mation for many situations, and vastly simplifies the dynamics of the
very large number of constituent particles of which any macroscopic
body is made. In fact, it reduces the problem to one with six degrees
of freedom. While the ensuing motion can still be quite complex, it is
tractable. In the process we will be dealing with a configuration space
which is a group, and is not a Euclidean space. Degrees of freedom
which lie on a group manifold rather than Euclidean space arise often
in applications in quantum mechanics and quantum field theory, in ad-
dition to the classical problems we will consider such as gyroscopes and
tops.




4.1      Configuration space for a rigid body
A macroscopic body is made up of a very large number of atoms. De-
scribing the motion of such a system without some simplifications is
clearly impossible. Many objects of interest, however, are very well
approximated by the assumption that the distances between the atoms


in the body are fixed¹,

                         |rα − rβ | = cαβ = constant.                          (4.1)

This constitutes a set of holonomic constraints, but not independent
ones, as we have here ½ n(n − 1) constraints on 3n coordinates. Rather
than trying to solve the constraints, we can understand what are the
generalized coordinates by recognizing that the possible motions which
leave the interparticle lengths fixed are combinations of

     • translations of the body as a whole, rα → rα + C,

     • rotations of the body about some fixed, or “marked”, point.

We will need to discuss how to represent the latter part of the con-
figuration, (including what a rotation is), and how to reexpress the
kinetic and potential energies in terms of this configuration space and
its velocities.
    The first part of the configuration, describing the translation, can
be specified by giving the coordinates of the marked point fixed in
the body, R(t). Often, but not always, we will choose this marked
point to be the center of mass R(t) of the body. In order to discuss
other points which are part of the body, we will use an orthonormal
coordinate system fixed in the body, known as the body coordinates,
with the origin at the fixed point R. The constraints mean that the
position of each particle of the body has fixed coordinates in terms of
this coordinate system. Thus the dynamical configuration of the body
is completely specified by giving the orientation of these coordinate
axes in addition to R. This orientation needs to be described relative
to a fixed inertial coordinate system, or inertial coordinates, with
orthonormal basis ei .ˆ
    Let the three orthogonal unit vectors defining the body coordinates
be ei , for i = 1, 2, 3. Then the position of any particle α in the body
    ˆ
which has coordinates bαi in the body coordinate system is at the po-
sition rα = R + i bαi ei .ˆ    In order to know its components in the
     1
    In this chapter we will use Greek letters as subscripts to represent the different
particles within the body, reserving Latin subscripts to represent the three spatial
directions.
inertial frame r_α = Σ_i r_αi ê_i we need to know the coordinates of the
three vectors ê_i′ in terms of the inertial coordinates,

                            ê_i′ = Σ_j A_ij ê_j.                       (4.2)

The nine quantities A_ij, together with the three components of R =
Σ_i R_i ê_i, specify the position of every particle,

                            r_αi = R_i + Σ_j b_αj A_ji,



and the configuration of the system is completely specified by Ri (t) and
Aij (t).
    The nine real quantities in the matrix A_ij are not independent, for
the basis vectors ê_i′ of the body-fixed coordinate system are orthonor-
mal,

    ê_i′ · ê_k′ = δ_ik = Σ_jℓ A_ij A_kℓ ê_j · ê_ℓ = Σ_jℓ A_ij A_kℓ δ_jℓ
                       = Σ_j A_ij A_kj,

or in matrix language, AAᵀ = 𝟙. Such a matrix, whose transpose is equal
to its inverse, is called orthogonal, and is a transformation of basis
vectors which preserves orthonormality of the basis vectors. Because
they play such an important role in the study of rigid body motion, we
need to explore the properties of orthogonal transformations in some
detail.

4.1.1     Orthogonal Transformations
There are two ways of thinking about an orthogonal transformation
A and its action on an orthonormal basis, (Eq. 4.2). One way is to
consider that {ê_i} and {ê_i′} are simply different basis vectors used to
describe the same physical vectors in the same vector space. A vector
V is the same vector whether it is expanded in one basis V = Σ_j V_j ê_j
or the other V = Σ_i V_i′ ê_i′. Thus

               V = Σ_j V_j ê_j = Σ_i V_i′ ê_i′ = Σ_ij V_i′ A_ij ê_j,
and we may conclude from the fact that the ê_j are linearly independent
that V_j = Σ_i V_i′ A_ij, or in matrix notation that V = Aᵀ V ′. Because A
is orthogonal, multiplying by A (from the left) gives V ′ = AV , or

                            V_i′ = Σ_j A_ij V_j.                       (4.3)

Thus A is to be viewed as a rule for giving the primed basis vectors in
terms of the unprimed ones (4.2), and also for giving the components
of a vector in the primed coordinate system in terms of the components
in the unprimed one (4.3). This picture of the role of A is called the
passive interpretation.

     One may also use matrices to represent a real physical transfor-
mation of an object or quantity. In particular, Eq. 4.2 gives A the
interpretation of an operator that rotates each of the coordinate basis
vectors ê_1, ê_2, ê_3 into the corresponding new vector ê_1′, ê_2′, or ê_3′.
For a real rotation of the physical system, all the vectors describing the
objects are changed by the rotation into new vectors V → V^(R), physi-
cally different from the original vector, but having the same coordinates
in the primed basis as V has in the unprimed basis. This is called the
active interpretation of the transformation. Both active and passive
views of the transformation apply here, and this can easily lead to con-
fusion. The transformation A(t) is the physical transformation which
rotated the body from some standard orientation, in which the body
axes ê_i′ were parallel to the “lab frame” axes ê_i, to the configuration
of the body at time t. But it also gives the relation of the components
of the same position vectors (at time t) expressed in body fixed and lab
frame coordinates.

    If we first consider rotations in two dimensions, it is clear that they
are generally described by the counterclockwise angle θ through which
the basis is rotated,




        ê_1′ =   cos θ ê_1 + sin θ ê_2
        ê_2′ = − sin θ ê_1 + cos θ ê_2

corresponding to the matrix

                    (  cos θ    sin θ )
             A =    ( − sin θ   cos θ ) .                  (4.4)

[Figure: the original basis (ê_1, ê_2) and the rotated basis (ê_1′, ê_2′),
related by a counterclockwise rotation through the angle θ.]
Clearly taking the transpose simply changes the sign of θ, which is just
what is necessary to produce the inverse transformation. Thus each two
dimensional rotation is an orthogonal transformation. The orthogonal-
ity equation A · A−1 = 1 has four matrix elements. It is straightforward
to show that these four equations on the four elements of A deter-
mine A to be of the form (4.4) except that the sign of the bottom row is
undetermined. For example, the transformation ê_1′ = ê_1, ê_2′ = −ê_2 is
orthogonal but is not a rotation. Let us call this transformation P. Thus
any two-dimensional orthogonal matrix is a rotation or is P followed
by a rotation. The set of all real orthogonal matrices in two dimensions
is called O(2), and the subset consisting of rotations is called SO(2).
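These statements about O(2) are easy to spot-check numerically. The following is a minimal sketch (using NumPy, not part of the text; the helper name `rot2` is mine) verifying that the matrix of Eq. (4.4) is orthogonal, that composition of rotations adds the angles, and that P has determinant −1:

```python
import numpy as np

def rot2(theta):
    """The 2x2 rotation matrix of Eq. (4.4)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

A = rot2(0.3)
# Orthogonality: the transpose is the inverse, A A^T = 1.
assert np.allclose(A @ A.T, np.eye(2))
# Taking the transpose changes the sign of theta.
assert np.allclose(A.T, rot2(-0.3))
# Composition of two-dimensional rotations adds the angles.
assert np.allclose(rot2(0.3) @ rot2(0.5), rot2(0.8))
# P is orthogonal but is not a rotation: its determinant is -1.
P = np.array([[1.0, 0.0], [0.0, -1.0]])
assert np.allclose(P @ P.T, np.eye(2))
assert np.isclose(np.linalg.det(P), -1.0)
```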
     In three dimensions we need to take some care with what we mean
by a rotation. On the one hand, we might mean that the transformation
has some fixed axis and is a rotation through some angle about that
axis. Let us call that a rotation about an axis. On the other hand,
we might mean all transformations we can produce by a sequence of
rotations about various axes. Let us define rotation in this sense.
Clearly if we consider the rotation R which rotates the basis {ê} into
the basis {ê′}, and if we have another rotation R′ which rotates {ê′}
into {ê″}, then the transformation which first does R and then does
R′, called the composition of them, R″ = R′ ◦ R, is also a rotation
in this latter sense. As ê_i″ = Σ_j R′_ij ê_j′ = Σ_jk R′_ij R_jk ê_k, we
see that R″_ik = Σ_j R′_ij R_jk and ê_i″ = Σ_k R″_ik ê_k. Thus the
composition R″ = R′R is given by matrix multiplication. In two
dimensions, straightforward evaluation will verify that if R and R′ are
of the form (4.4) with angles θ and θ′ respectively, the product R″ is of
the same form with angle θ″ = θ + θ′. Thus all rotations are rotations
about an axis there. Rotations in

[Figure 4.1: The results of applying the two rotations H and V to a
book depend on which is done first. Thus rotations do not commute.
Here we are looking down at a book which is originally lying face up on
a table. V is a rotation about the vertical z-axis, and H is a rotation
about a fixed axis pointing to the right, each through 90°.]


three dimensions are a bit more complex, because they can take place
in different directions as well as through different angles. We can still
represent the composition of rotations with matrix multiplication, now
of 3 × 3 matrices. In general, matrices do not commute, AB ≠ BA,
and this is indeed reflected in the fact that the effect of performing
two rotations depends on the order in which they are done. A graphic
illustration is worth trying. Let V be the process of rotating an object
through 90° about the vertical z-axis, and H be a rotation through 90°
about the x-axis, which goes off to our right. If we start with the
book lying face up facing us on the table, and first apply V and then H,
we wind up with the binding down and the front of the book facing us.
If, however, we start from the same position but apply first H and then
V , we wind up with the book standing upright on the table with the
binding towards us. Clearly the operations H and V do not commute.
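The book-turning experiment can be repeated with explicit matrices. In the sketch below (not part of the text), V and H are active rotations through 90° about the z- and x-axes; the two products differ, though each is still orthogonal with unit determinant:

```python
import numpy as np

# Active rotation through 90 degrees about the vertical z-axis...
V = np.array([[0., -1., 0.],
              [1.,  0., 0.],
              [0.,  0., 1.]])
# ...and through 90 degrees about the x-axis pointing to our right.
H = np.array([[1., 0.,  0.],
              [0., 0., -1.],
              [0., 1.,  0.]])

# Performing V then H differs from H then V: rotations do not commute.
assert not np.allclose(H @ V, V @ H)
# Each composition is nonetheless a rotation: orthogonal, determinant +1.
for R in (H @ V, V @ H):
    assert np.allclose(R @ R.T, np.eye(3))
    assert np.isclose(np.linalg.det(R), 1.0)
```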

    It is clear that any composition of rotations must be orthogonal,
as any set of orthonormal basis vectors will remain orthonormal under
each transformation. It is also clear that there is a three dimensional
version of P , say ê_1′ = ê_1, ê_2′ = ê_2, ê_3′ = −ê_3, which is orthogonal
but not a composition of rotations, for it changes a right-handed
coordinate system (with ê_1 × ê_2 = ê_3) to a left handed one, while
rotations preserve the handedness. It is straightforward to show that
any composition of orthogonal matrices is orthogonal, for if AAᵀ = 𝟙
and BBᵀ = 𝟙 and C = AB, then CCᵀ = AB(AB)ᵀ = ABBᵀAᵀ = A𝟙Aᵀ = 𝟙,
and C is orthogonal as well. So the rotations are a subset of the set
O(N ) of orthogonal matrices.

4.1.2     Groups
This set of orthogonal matrices is a group, which means that the set
O(N) satisfies the following requirements, which we state for a general
set G.
    A set G of elements A, B, C, ... together with a group multiplica-
tion rule (⋆) for combining two of them, is a group if

   • Given any two elements A and B in the group, the product A ⋆ B
     is also in the group. One then says that the set G is closed
     under ⋆. In our case the group multiplication is ordinary matrix
     multiplication, the group consists of all 3 × 3 orthogonal real
     matrices, and we have just shown that it is closed.

   • The product rule is associative; for every A, B, C ∈ G, we have
     A ⋆ (B ⋆ C) = (A ⋆ B) ⋆ C. For matrix multiplication this is
     simply due to the commutativity of finite sums, Σ_i Σ_j = Σ_j Σ_i.

   • There is an element e in G, called the identity, such that for every
     element A ∈ G, eA = Ae = A. In our case e is the unit matrix
     𝟙, with 𝟙_ij = δ_ij.

   • Every element A ∈ G has an element A⁻¹ ∈ G such that AA⁻¹ =
     A⁻¹A = e. This element is called the inverse of A, and in the
     case of orthogonal matrices it is the inverse matrix, which always
     exists, because for orthogonal matrices the inverse is the transpose.
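The four axioms can be spot-checked numerically for O(3). A minimal sketch (assuming NumPy; the helper `random_orthogonal`, built from a QR decomposition, is mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal():
    # The Q factor of a QR decomposition is an orthogonal matrix.
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return q

A, B, C = random_orthogonal(), random_orthogonal(), random_orthogonal()
I3 = np.eye(3)
# Closure: the product of two orthogonal matrices is orthogonal.
assert np.allclose((A @ B) @ (A @ B).T, I3)
# Associativity of the group multiplication.
assert np.allclose(A @ (B @ C), (A @ B) @ C)
# Identity element.
assert np.allclose(A @ I3, A) and np.allclose(I3 @ A, A)
# Inverse: for an orthogonal matrix the inverse is the transpose.
assert np.allclose(A @ A.T, I3) and np.allclose(A.T @ A, I3)
```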

    While the constraints (4.1) would permit A(t) to be any orthogonal
matrix, the nature of Newtonian mechanics requires it to vary con-
tinuously in time. If the system starts with A = 𝟙, there must be
a continuous path in the space of orthogonal matrices to the config-
uration A(t) at any later time. But the set of matrices O(3) is not
connected in this fashion: there is no path from A = 𝟙 to A = P . To
see that this is true, we look at the determinant of A. From AAᵀ = 𝟙
we see that det(AAᵀ) = 1 = det(A) det(Aᵀ) = (det A)², so det A = ±1
for all orthogonal matrices A. But the determinant varies continuously
as the matrix does, so no continuous variation of the matrix can lead
to a jump in its determinant. Thus the matrices which represent rota-
tions have unit determinant, det A = +1, and are called unimodular.
    The set of all unimodular orthogonal matrices in N dimensions is
called SO(N). It is a subset of O(N), the set of all orthogonal ma-
trices in N dimensions. Clearly all rotations are in this subset. The
subset is closed under multiplication, and the identity and the inverses
of elements in SO(N) are also in SO(N), for their determinants are
clearly 1. Thus SO(N) is a subgroup of O(N). It is actually the set
of rotations, but we shall prove this statement only for the case N = 3,
which is the immediately relevant one. Simultaneously we will show
that every rotation in three dimensions is a rotation about an axis. We
have already proven it for N = 2. We now show that every A ∈ SO(3)
has one vector it leaves unchanged or invariant, so that it is effectively a
rotation in the plane perpendicular to this direction, or in other words
a rotation about the axis it leaves invariant. The fact that every uni-
modular orthogonal matrix in three dimensions is a rotation about an
axis is known as Euler’s Theorem. To show that it is true, we note
that if A is orthogonal and has determinant 1,

      det((A − 𝟙)Aᵀ) = det(𝟙 − Aᵀ) = det(𝟙 − A)
                     = det(A − 𝟙) det(A) = det(−(𝟙 − A)) = (−1)³ det(𝟙 − A)
                     = − det(𝟙 − A),

so det(𝟙 − A) = 0 and 𝟙 − A is a singular matrix. Then there exists
a vector ω which is annihilated by it, (𝟙 − A)ω = 0, or Aω = ω, and
ω is invariant under A. Of course this determines only the direction of
ω, and only up to sign. If we choose a new coordinate system in which
the z̃-axis points along ω, we see that the elements Ã_i3 = (0, 0, 1), and
orthogonality gives Σ_j Ã²_3j = 1 = Ã²_33, so Ã_31 = Ã_32 = 0. Thus Ã
is of the form

                               (           0 )
                         Ã =   (     B     0 )
                               (  0    0   1 )

where B is an orthogonal unimodular 2 × 2 matrix, which is therefore a
rotation about the z̃-axis through some angle ω, which we may choose
to be in the range ω ∈ (−π, π]. It is natural to define the vector ω,
whose direction only was determined above, to be ω = ω ê_z̃. Thus we see
that the set of orthogonal unimodular matrices is the set of rotations,
and elements of this set may be specified by a vector² of length ≤ π.
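Euler's theorem lends itself to a numerical illustration: the invariant axis is the eigenvector of A with eigenvalue 1, and the angle follows from tr A = 1 + 2 cos ω. A sketch (the function name `axis_angle` is mine, not the text's):

```python
import numpy as np

def axis_angle(A):
    """Invariant axis and rotation angle of a matrix A in SO(3)."""
    # det(1 - A) = 0, so eigenvalue 1 exists; its eigenvector is the axis.
    w, v = np.linalg.eig(A)
    axis = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    # The trace fixes the angle, since tr A = 1 + 2 cos(omega).
    omega = np.arccos(np.clip((np.trace(A) - 1.0) / 2.0, -1.0, 1.0))
    return axis, omega

# A rotation about the z-axis through 0.4 radians, in the form (4.4).
c, s = np.cos(0.4), np.sin(0.4)
A = np.array([[c, s, 0.], [-s, c, 0.], [0., 0., 1.]])
axis, omega = axis_angle(A)
assert np.allclose(A @ axis, axis)   # the axis is left invariant by A
assert np.isclose(omega, 0.4)
```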
    Thus we see that the rotation which determines the orientation of a
rigid body can be described by the three degrees of freedom ω. Together
with the translational coordinates R, this parameterizes the configura-
tion space of the rigid body, which is six dimensional. It is important to
recognize that this is not motion in a flat six dimensional configuration
space, however. For example, the configurations with ω = (0, 0, π − ε)
and ω = (0, 0, −π + ε) approach each other as ε → 0, so that motion
need not even be continuous in ω. The composition of rotations is
by multiplication of the matrices, not by addition of the ω’s. There
are other ways of describing the configuration space, two of which are
known as Euler angles and Cayley-Klein parameters, but none of these
make describing the space very intuitive. For some purposes we do
not need all of the complications involved in describing finite rotations,
but only what is necessary to describe infinitesimal changes between
the configuration at time t and at time t + ∆t. We will discuss these
applications first. Later, when we do need to discuss the configuration
in section 4.4.2, we will define Euler angles.

   ²More precisely, we choose ω along one of the two opposite directions left invari-
ant by A, so that the angle of rotation is non-negative and ≤ π. This specifies
a point in or on the surface of a three dimensional ball of radius π, but in the case
when the angle is exactly π the two diametrically opposed points both describe the
same rotation. Mathematicians say that the space of SO(3) is three-dimensional
real projective space P₃(ℝ).

4.2      Kinematics in a rotating coordinate
         system
We have seen that the rotations form a group. Let us describe the
configuration of the body coordinate system by the position R(t) of a
given point and the rotation matrix A(t) : ê_i → ê_i′ which transforms the
canonical fixed basis (inertial frame) into the body basis. A given par-
ticle of the body is fixed in the body coordinates, but this, of course, is
ticle of the body is fixed in the body coordinates, but this, of course, is
not an inertial coordinate system, but a rotating and possibly accelerat-
ing one. We need to discuss the transformation of kinematics between
these two frames. While our current interest is in rigid bodies, we will
first derive a general formula for rotating (and accelerating) coordinate
systems.
    Suppose a particle has coordinates b(t) = Σ_i b_i(t) ê_i′(t) in the body
system. We are not assuming at the moment that the particle is part
of the rigid body, in which case the b_i(t) would be independent of
time. In the inertial coordinates the particle has its position given by
r(t) = R(t) + b(t), but the coordinates of b(t) are different in the space
and body coordinates. Thus

        r_i(t) = R_i(t) + b_i(t) = R_i(t) + Σ_j (A⁻¹(t))_ij b_j(t).


The velocity is v = Σ_i ṙ_i ê_i, because the ê_i are inertial and therefore
considered stationary, so

  v = Ṙ + Σ_ij ( (d/dt) A⁻¹(t) )_ij b_j(t) ê_i + Σ_ij (A⁻¹(t))_ij (db_j(t)/dt) ê_i,

and not Ṙ + Σ_i (db_i/dt) ê_i′, because the ê_i′ are themselves changing
with time. We might define a “body time derivative”

                (ḃ)_b := ( (d/dt) b )_b := Σ_i (db_i/dt) ê_i′,

but it is not the velocity of the particle α, even with respect to R(t), in
the sense that physically a vector is basis independent, and its derivative
requires a notion of which basis vectors are considered time independent
(inertial) and which are not. Converting the inertial evaluation to the
body frame requires the velocity to include the dA⁻¹/dt term as well
as the (ḃ)_b term.
    What is the meaning of this extra term

                V = Σ_ij ( (d/dt) A⁻¹(t) )_ij b_j(t) ê_i ?


The derivative is, of course,

    V = lim_{Δt→0} (1/Δt) Σ_ij ( A⁻¹(t + Δt)_ij − A⁻¹(t)_ij ) b_j(t) ê_i.

This expression has coordinates in the body frame with basis vectors
from the inertial frame. It is better to describe it in terms of the body
coordinates and body basis vectors by inserting
ê_i = Σ_k (A⁻¹(t))_ik ê_k′(t) = Σ_k A_ki(t) ê_k′(t). Then we have

    V = Σ_kj ê_k′ lim_{Δt→0} (1/Δt) ( A(t)A⁻¹(t + Δt) − A(t)A⁻¹(t) )_kj b_j(t).
          kj


The second term is easy enough to understand, as A(t)A⁻¹(t) = 𝟙,
so the full second term is just b expressed in the body frame. The
interpretation of the first term is suggested by its matrix form: A⁻¹(t +
Δt) maps the body basis at t + Δt to the inertial frame, and A(t) maps
this to the body basis at t. So together this is the infinitesimal rotation
ê_i′(t + Δt) → ê_i′(t). This transformation must be close to an identity,
as Δt → 0. Let us expand it:

            B := A(t)A⁻¹(t + Δt) = 𝟙 − Ω Δt + O(Δt)².               (4.5)

Here Ω is a matrix which has fixed (finite) elements as Δt → 0, and
is called the generator of the rotation. Note B⁻¹ = 𝟙 + Ω Δt to the
order we are working, while the transpose Bᵀ = 𝟙 − Ωᵀ Δt, so because
we know B is orthogonal we must have that Ω is antisymmetric,
Ω = −Ωᵀ, Ω_ij = −Ω_ji.

   Subtracting 𝟙 from both sides of (4.5) and taking the limit shows
that the matrix

        Ω(t) = −A(t) · ((d/dt) A⁻¹(t)) = ((d/dt) A(t)) · A⁻¹(t),

where the latter equality follows from differentiating A · A⁻¹ = 𝟙.
The antisymmetric matrix Ω is effectively a vector. Define
ω_k = ½ Σ_ij ε_kij Ω_ij. Then the ω_k also determine the Ω_ij:

    Σ_k ε_ijk ω_k = ½ Σ_kℓm ε_ijk ε_kℓm Ω_ℓm
                  = ½ Σ_ℓm (δ_iℓ δ_jm − δ_im δ_jℓ) Ω_ℓm
                  = ½ (Ω_ij − Ω_ji) = Ω_ij,

so ω_k and Ω_ij are essentially the same thing.
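This correspondence can be made concrete with a numerical sketch (not from the text; the helper `omega_vector` and the example generator are mine). It recovers ω_k from an antisymmetric Ω and checks that −Ω acting on b reproduces ω × b:

```python
import numpy as np

def omega_vector(Omega):
    """omega_k = (1/2) sum_ij eps_kij Omega_ij, written out by components."""
    return 0.5 * np.array([Omega[1, 2] - Omega[2, 1],
                           Omega[2, 0] - Omega[0, 2],
                           Omega[0, 1] - Omega[1, 0]])

# Generator for a frame spinning about z at 2 rad/s: Omega = (dA/dt) A^{-1}
# with A(t) the matrix of Eq. (4.4) embedded in three dimensions, at t = 0.
Omega = np.array([[0.,  2., 0.],
                  [-2., 0., 0.],
                  [0.,  0., 0.]])
omega = omega_vector(Omega)
assert np.allclose(omega, [0., 0., 2.])
# Antisymmetric Omega and the vector omega carry the same information:
# -Omega b equals omega x b for any b.
b = np.array([1.0, 2.0, 3.0])
assert np.allclose(-Omega @ b, np.cross(omega, b))
```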
We have still not answered the question, “what is V?”

    V = Σ_kj ê_k′ lim_{Δt→0} (1/Δt) [B − 𝟙]_kj b_j = − Σ_kj ê_k′ Ω_kj b_j
      = − Σ_kjℓ ê_k′ ε_kjℓ ω_ℓ b_j = ω × b,

where ω = Σ_ℓ ω_ℓ ê_ℓ′. Thus we see that

                        v = Ṙ + ω × b + (ḃ)_b,                       (4.6)
and the second term, coming from V, represents the motion due to the
rotating coordinate system.
    When differentiating a true vector, which is independent of the ori-
gin of the coordinate system, rather than a position, the first term in
(4.6) is absent, so in general for a vector C,

                 dC/dt = (dC/dt)_b + ω × C.                          (4.7)

The velocity v is a vector, as are Ṙ and ḃ, the latter because it is the
difference of two positions. The angular velocity ω is also a vector, and
its derivative is particularly simple, because

          ω̇ = dω/dt = (dω/dt)_b + ω × ω = (dω/dt)_b.                (4.8)

    Another way to understand (4.7) is as a simple application of Leib-
nitz’ rule to C = Σ_i C_i ê_i′, noting that

    (d/dt) ê_i′(t) = Σ_j (dA_ij(t)/dt) ê_j = Σ_j (Ω A)_ij ê_j = Σ_k Ω_ik ê_k′,

which means that the second term from Leibnitz is

    Σ_i C_i (d/dt) ê_i′(t) = Σ_ik C_i Ω_ik ê_k′ = Σ_ijk C_i ε_ikj ω_j ê_k′
                           = ω × C,

as given in (4.7). This shows that even the peculiar object (ḃ)_b obeys
(4.7).
    Applying this to the velocity itself (4.6), we find the acceleration

  a = (d/dt) v = R̈ + (dω/dt) × b + ω × (db/dt) + (d/dt)(ḃ)_b
    = R̈ + ω̇ × b + ω × ( (db/dt)_b + ω × b ) + ( (d²b/dt²)_b + ω × (db/dt)_b )
    = R̈ + (d²b/dt²)_b + 2 ω × (db/dt)_b + ω̇ × b + ω × (ω × b).

This is a general relation between any orthonormal coordinate system
and an inertial one, and in general can be used to describe physics in
noninertial coordinates, regardless of whether that coordinate system is
imbedded in a rigid body. The full force on the particle is F = ma, but
if we use r′, v′, and a′ to represent b, (db/dt)_b and (d²b/dt²)_b
respectively, we have an expression for the apparent force

      ma′ = F − mR̈ − 2mω × v′ − mω̇ × r′ − mω × (ω × r′).

The additions to the real force are the pseudoforce for an accelerating
reference frame −mR̈, the Coriolis force −2mω × v′, an unnamed force
involving the angular acceleration of the coordinate system −mω̇ × r′,
and the centrifugal force −mω × (ω × r′) respectively.
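As an illustration, the sketch below (with made-up numbers of my own, assuming ω̇ = 0 and R̈ = 0 so only the Coriolis and centrifugal terms act) evaluates these pseudoforces for a particle moving eastward at the equator of a frame rotating like the Earth:

```python
import numpy as np

m = 1.0
omega = np.array([0.0, 0.0, 7.292e-5])   # Earth's rotation rate, rad/s
r = np.array([6.4e6, 0.0, 0.0])          # on the equator, x pointing outward
v = np.array([0.0, 300.0, 0.0])          # moving east at 300 m/s

coriolis = -2.0 * m * np.cross(omega, v)
centrifugal = -m * np.cross(omega, np.cross(omega, r))
# The centrifugal force points outward, along +x here...
assert centrifugal[0] > 0
# ...and eastward motion at the equator gives an outward Coriolis force too.
assert coriolis[0] > 0
```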

4.3      The moment of inertia tensor
Let us return to a rigid body, where the particles are constrained to
keep the distances between them constant. Then the coordinates b_αi in
the body frame are independent of time, and

                          v_α = Ṙ + ω × b_α,

so the individual momenta and the total momentum are

                   p_α = m_α V + m_α ω × b_α,
                   P   = M V + ω × Σ_α m_α b_α
                       = M V + M ω × B,

where B is the center of mass position relative to the marked point R.

4.3.1     Motion about a fixed point
Angular Momentum
We next evaluate the total angular momentum, $L = \sum_\alpha r_\alpha\times p_\alpha$. We
will first consider the special case in which the body is rotating about
the origin, so R ≡ 0, and then we will return to the general case. As
pα = mα ω × bα already involves a cross product, we will find a triple
product, and will use the reduction formula³

$$ A\times(B\times C) = B\,(A\cdot C) - C\,(A\cdot B). $$

Thus

$$ L = \sum_\alpha m_\alpha\, b_\alpha\times(\omega\times b_\alpha) \qquad (4.9) $$
$$ \;\; = \omega\sum_\alpha m_\alpha b_\alpha^2 - \sum_\alpha m_\alpha\, b_\alpha\,(b_\alpha\cdot\omega). \qquad (4.10) $$

We see that, in general, L need not be parallel to the angular velocity ω,
but it is always linear in ω. Thus it is possible to generalize the equation
   ³This formula is colloquially known as the bac-cab formula. It is proven in
Appendix A.

L = Iω of elementary physics courses, but we need to generalize I from
a multiplicative number to a linear operator which maps vectors into
vectors, not necessarily in the same direction. In component language
this linear operation is clearly in the form $L_i = \sum_j I_{ij}\omega_j$, so I is a 3 × 3
matrix. Rewriting (4.10), we have

$$ L_i = \omega_i\sum_\alpha m_\alpha b_\alpha^2 - \sum_\alpha m_\alpha\, b_{\alpha i}\,(b_\alpha\cdot\omega)
      = \sum_j\left(\sum_\alpha m_\alpha\left(b_\alpha^2\,\delta_{ij} - b_{\alpha i} b_{\alpha j}\right)\right)\omega_j
      \equiv \sum_j I_{ij}\,\omega_j, $$

where

$$ I_{ij} = \sum_\alpha m_\alpha\left(b_\alpha^2\,\delta_{ij} - b_{\alpha i} b_{\alpha j}\right) $$

is the inertia tensor about the fixed point R. In matrix form, we now
have (4.10) as

$$ L = I\cdot\omega, \qquad (4.11) $$

where I · ω means a vector with components $(I\cdot\omega)_i = \sum_j I_{ij}\omega_j$.
    If we consider the rigid body in the continuum limit, the sum over
particles becomes an integral over space times the density of matter,

$$ I_{ij} = \int d^3b\,\rho(b)\left(b^2\,\delta_{ij} - b_i b_j\right). \qquad (4.12) $$

Kinetic energy

For a body rotating about a fixed point R,

$$ T = \frac12\sum_\alpha m_\alpha v_\alpha^2 = \frac12\sum_\alpha m_\alpha\,(\omega\times b_\alpha)\cdot(\omega\times b_\alpha). $$

From the general 3-dimensional identity⁴

$$ (A\times B)\cdot(C\times D) = (A\cdot C)(B\cdot D) - (A\cdot D)(B\cdot C), $$

   ⁴See Appendix A for a hint on how to derive this.

we have

$$ T = \frac12\sum_\alpha m_\alpha\left(\omega^2 b_\alpha^2 - (\omega\cdot b_\alpha)^2\right)
     = \frac12\sum_{ij}\omega_i\omega_j\sum_\alpha m_\alpha\left(b_\alpha^2\,\delta_{ij} - b_{\alpha i} b_{\alpha j}\right)
     = \frac12\sum_{ij}\omega_i I_{ij}\omega_j, \qquad (4.13) $$

or

$$ T = \frac12\,\omega\cdot I\cdot\omega. $$

Noting that $\sum_j I_{ij}\omega_j = L_i$, we have $T = \frac12\,\omega\cdot L$ for a rigid body rotating about
the origin, with L measured from that origin.
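These relations are easy to verify numerically. The sketch below (the masses, positions, and ω are arbitrary test values) builds Iij from the definition and checks that Iω reproduces the directly summed L = Σα mα bα × (ω × bα), and that ½ω·L equals the directly summed kinetic energy:

```python
# Sketch: check L_i = sum_j I_ij w_j and T = (1/2) w . L for point masses
# rotating about the origin. Masses, positions, and omega are arbitrary.

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

def inertia_tensor(masses, positions):
    """I_ij = sum_a m_a (b_a^2 delta_ij - b_ai b_aj)."""
    I = [[0.0]*3 for _ in range(3)]
    for m, b in zip(masses, positions):
        b2 = dot(b, b)
        for i in range(3):
            for j in range(3):
                I[i][j] += m*((b2 if i == j else 0.0) - b[i]*b[j])
    return I

masses = [1.0, 2.0, 3.0]
positions = [[1.0, 0.0, 0.5], [0.0, 1.0, -1.0], [0.5, 0.5, 0.5]]
omega = [0.3, -0.2, 0.9]

I = inertia_tensor(masses, positions)
L = [dot(row, omega) for row in I]           # L = I . w

# Direct sums: L = sum_a m_a b_a x (w x b_a), T = (1/2) sum_a m_a |w x b_a|^2
L_direct = [0.0, 0.0, 0.0]
T_direct = 0.0
for m, b in zip(masses, positions):
    v = cross(omega, b)                      # velocity of this particle
    T_direct += 0.5*m*dot(v, v)
    for i, c in enumerate(cross(b, v)):
        L_direct[i] += m*c

print(L, L_direct)                           # the two agree
print(0.5*dot(omega, L), T_direct)           # so do the energies
```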


4.3.2            More General Motion
When the marked point R is not fixed in space, there is nothing special
about it, and we might ask whether it would be better to evaluate the
moment of inertia about some other point. Working in the body-fixed
coordinates, we may consider a given point b′ and evaluate the moment
of inertia about that point, rather than about the origin. This means
bα is replaced by bα − b′, so

$$ I^{(b')}_{ij} = \sum_\alpha m_\alpha\left((b_\alpha - b')^2\,\delta_{ij} - (b_{\alpha i} - b'_i)(b_{\alpha j} - b'_j)\right) $$
$$ \;\; = I^{(0)}_{ij} + M\left((-2\,b'\cdot B + b'^2)\,\delta_{ij} + B_i b'_j + b'_i B_j - b'_i b'_j\right), \qquad (4.14) $$

where we recall that B is the position of the center of mass with respect
to R, the origin of the body fixed coordinates. Subtracting the moment
of inertia about the center of mass, given by (4.14) with b′ → B, we
have

$$ I^{(b')}_{ij} - I^{(B)}_{ij} = M\left((-2\,b'\cdot B + b'^2 + B^2)\,\delta_{ij} + B_i b'_j + b'_i B_j - b'_i b'_j - B_i B_j\right) $$
$$ \;\; = M\left((b' - B)^2\,\delta_{ij} - (b'_i - B_i)(b'_j - B_j)\right). \qquad (4.15) $$

Note the difference is independent of the origin of the coordinate system,
depending only on the vector $\breve b = b' - B$.
    A possible axis of rotation can be specified by a point b′ through
which it passes, together with a unit vector $\hat n$ in the direction of the
axis⁵. The moment of inertia about the axis $(b', \hat n)$ is defined as
$\hat n\cdot I^{(b')}\cdot\hat n$. If we compare this to the moment about a parallel axis
through the center of mass, we see that

$$ \hat n\cdot I^{(b')}\cdot\hat n - \hat n\cdot I^{(\mathrm{cm})}\cdot\hat n
   = M\left(\breve b^2\,\hat n^2 - (\breve b\cdot\hat n)^2\right)
   = M\,(\hat n\times\breve b)^2 = M\,\breve b_\perp^2, \qquad (4.16) $$

where $\breve b_\perp$ is the projection of the vector, from the center of mass to b′,
onto the plane perpendicular to the axis. Thus the moment of inertia
about any axis is the moment of inertia about a parallel axis through
the center of mass, plus $M\ell^2$, where $\ell = |\breve b_\perp|$ is the distance between
these two axes. This is known as the parallel axis theorem.
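A quick numerical check of the parallel axis theorem, using an arbitrary made-up mass distribution, axis direction, and axis point (none of these values are from the text):

```python
# Sketch: verify n.I(b').n = n.I(cm).n + M * |b_perp|^2 for point masses.
# The mass distribution, axis point bp, and direction n are arbitrary.

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

def inertia_about(point, masses, positions):
    """Inertia tensor about `point`: shift each b_a by -point in the formula."""
    I = [[0.0]*3 for _ in range(3)]
    for m, b in zip(masses, positions):
        d = [b[k] - point[k] for k in range(3)]
        d2 = dot(d, d)
        for i in range(3):
            for j in range(3):
                I[i][j] += m*((d2 if i == j else 0.0) - d[i]*d[j])
    return I

masses = [1.0, 2.0, 1.5]
positions = [[1.0, 0.0, 0.0], [0.0, 2.0, 1.0], [-1.0, 1.0, -0.5]]
M = sum(masses)
B = [sum(m*b[k] for m, b in zip(masses, positions))/M for k in range(3)]

n = [0.0, 0.0, 1.0]                      # unit vector along the axis
bp = [2.0, -1.0, 0.5]                    # a point the axis passes through

I_bp = inertia_about(bp, masses, positions)
I_cm = inertia_about(B, masses, positions)

moment = lambda I: dot(n, [dot(row, n) for row in I])   # n . I . n
d = [bp[k] - B[k] for k in range(3)]
d_perp2 = dot(d, d) - dot(d, n)**2       # squared distance between the axes

print(moment(I_bp), moment(I_cm) + M*d_perp2)   # equal, per the theorem
```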
    The general motion of a rigid body involves both a rotation and a
translation of a given point R. Then

$$ \dot r_\alpha = V + \omega\times b_\alpha, \qquad (4.17) $$

where V and ω may be functions of time, but they are the same for all
particles α. Then the angular momentum about the origin is

$$ L = \sum_\alpha m_\alpha\, r_\alpha\times\dot r_\alpha
     = \sum_\alpha m_\alpha\, r_\alpha\times V + \sum_\alpha m_\alpha\,(R + b_\alpha)\times(\omega\times b_\alpha) $$
$$ \;\; = M\tilde R\times V + I^{(0)}\cdot\omega + M R\times(\omega\times B), \qquad (4.18) $$
where the inertia tensor $I^{(0)}$ is still measured about R, even though
that is not a fixed point. Recall that $\tilde R$ is the laboratory position of the
center of mass, while B is its position in the body-fixed system. The
kinetic energy is now

$$ T = \sum_\alpha\frac12 m_\alpha\,\dot r_\alpha^2
     = \frac12\sum_\alpha m_\alpha\,(V + \omega\times b_\alpha)\cdot(V + \omega\times b_\alpha) $$

   ⁵Actually, this gives more information than is needed to specify an axis, as b′ and
b′′ specify the same axis if b′ − b′′ ∝ $\hat n$. In the expression for the moment of inertia
about the axis, (4.16), we see that the component of b′ parallel to $\hat n$ does not affect
the result.

$$ = \frac12\sum_\alpha m_\alpha V^2 + V\cdot\left(\omega\times\sum_\alpha m_\alpha b_\alpha\right) + \frac12\sum_\alpha m_\alpha\,(\omega\times b_\alpha)^2 $$
$$ = \frac12 MV^2 + MV\cdot(\omega\times B) + \frac12\,\omega\cdot I^{(0)}\cdot\omega \qquad (4.19) $$
and again the inertia tensor is calculated about the arbitrary point R.
We will see that it makes more sense to use the center of mass.

Simplification Using the Center of Mass

As each $\dot r_\alpha = V + \omega\times b_\alpha$, the center of mass velocity is given by

$$ M\tilde V = \sum_\alpha m_\alpha\,\dot r_\alpha = \sum_\alpha m_\alpha\,(V + \omega\times b_\alpha) = M\,(V + \omega\times B), \qquad (4.20) $$

so $\frac12 M\tilde V^2 = \frac12 MV^2 + MV\cdot(\omega\times B) + \frac12 M(\omega\times B)^2$. Comparing with
(4.19), we see that

$$ T = \frac12 M\tilde V^2 - \frac12 M(\omega\times B)^2 + \frac12\,\omega\cdot I^{(0)}\cdot\omega. $$

The last two terms can be written in terms of the inertia tensor about
the center of mass. From (4.15) with b′ = 0,

$$ I^{(\mathrm{cm})}_{ij} = I^{(0)}_{ij} - MB^2\,\delta_{ij} + MB_i B_j. $$

Using the formula for $(A\times B)\cdot(C\times D)$ again,

$$ T = \frac12 M\tilde V^2 - \frac12 M\left(\omega^2 B^2 - (\omega\cdot B)^2\right) + \frac12\,\omega\cdot I^{(0)}\cdot\omega $$
$$ \;\; = \frac12 M\tilde V^2 + \frac12\,\omega\cdot I^{(\mathrm{cm})}\cdot\omega. \qquad (4.21) $$

A similar expression holds for the angular momentum. Inserting $V = \tilde V - \omega\times B$
into (4.18),

$$ L = M\tilde R\times(\tilde V - \omega\times B) + I^{(0)}\cdot\omega + MR\times(\omega\times B) $$
$$ \;\; = M\tilde R\times\tilde V - M(\tilde R - R)\times(\omega\times B) + I^{(0)}\cdot\omega $$
$$ \;\; = M\tilde R\times\tilde V - MB\times(\omega\times B) + I^{(0)}\cdot\omega $$
$$ \;\; = M\tilde R\times\tilde V - M\omega B^2 + MB\,(\omega\cdot B) + I^{(0)}\cdot\omega $$
$$ \;\; = M\tilde R\times\tilde V + I^{(\mathrm{cm})}\cdot\omega. \qquad (4.22) $$

These two decompositions, (4.21) and (4.22), have a reasonable interpretation:
the total angular momentum is the angular momentum
about the center of mass, plus the angular momentum that a point
particle of mass M and position $\tilde R(t)$ would have. Similarly, the total
kinetic energy is the rotational kinetic energy of the body rotating
about its center of mass, plus the kinetic energy of the fictitious point
particle at the center of mass.
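The kinetic-energy decomposition (4.21) can be verified by summing directly over particles. In the sketch below (the body-frame positions, V, and ω are arbitrary made-up values), the particle-by-particle kinetic energy matches ½MṼ² + ½ω·I^(cm)·ω:

```python
# Sketch: verify decomposition (4.21), T = (1/2) M Vtilde^2 + (1/2) w.I(cm).w,
# by direct summation over particles. All numerical inputs are arbitrary.

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

def inertia_about(point, masses, positions):
    """I_ij about `point`, from the point-mass definition."""
    I = [[0.0]*3 for _ in range(3)]
    for m, b in zip(masses, positions):
        d = [b[k] - point[k] for k in range(3)]
        d2 = dot(d, d)
        for i in range(3):
            for j in range(3):
                I[i][j] += m*((d2 if i == j else 0.0) - d[i]*d[j])
    return I

masses = [1.0, 2.0, 3.0, 0.5]
b = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0], [-1.0, 0.5, 0.0], [0.5, -2.0, 1.0]]
V = [0.2, -0.1, 0.4]                    # velocity of the marked point
omega = [0.3, 0.7, -0.5]

M = sum(masses)
B = [sum(m*p[k] for m, p in zip(masses, b))/M for k in range(3)]

# Direct kinetic energy: T = (1/2) sum_a m_a |V + w x b_a|^2
T_direct = 0.0
for m, p in zip(masses, b):
    wxb = cross(omega, p)
    v = [V[k] + wxb[k] for k in range(3)]
    T_direct += 0.5*m*dot(v, v)

# Decomposition: center of mass velocity plus rotation about the c.m.
Vt = [V[k] + cross(omega, B)[k] for k in range(3)]
I_cm = inertia_about(B, masses, b)
T_decomp = 0.5*M*dot(Vt, Vt) + 0.5*dot(omega, [dot(row, omega) for row in I_cm])
print(T_direct, T_decomp)               # the two expressions agree
```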
    Note that if we go back to the situation where the marked point
R is stationary at the origin of the lab coordinates, V = 0, L = I · ω,
and $T = \frac12\,\omega\cdot I\cdot\omega = \frac12\,\omega\cdot L$.
    The angular momentum in Eqs. (4.18) and (4.22) is the angular momentum
measured about the origin of the lab coordinates, $L = \sum_\alpha m_\alpha r_\alpha\times v_\alpha$.
It is useful to consider the angular momentum as measured about
the center of mass,

$$ L_{\mathrm{cm}} = \sum_\alpha m_\alpha\,(r_\alpha - \tilde R)\times(v_\alpha - \tilde V) = L - M\tilde R\times\tilde V, \qquad (4.23) $$

so we see that the angular momentum, measured about the center of
mass, is just $I^{(\mathrm{cm})}\cdot\omega$.
   The parallel axis theorem is also of the form of a decomposition.
The inertia tensor about a given point r given by (4.15) is

$$ I^{(r)}_{ij} = I^{(\mathrm{cm})}_{ij} + M\left((r - \tilde R)^2\,\delta_{ij} - (r_i - \tilde R_i)(r_j - \tilde R_j)\right). $$

This is, once again, the sum of the quantity, here the inertia tensor, of
the body about the center of mass, plus the value a particle of mass M
at the center of mass $\tilde R$ would have, evaluated about r.
    There is another theorem about moments of inertia, though much
less general — it only applies to a planar object — let’s say in the xy
plane, so that zα ≈ 0 for all the particles constituting the body. As

                Izz =           mα x2 + yα
                                    α
                                         2
                            α
                                    2    2                     2
                Ixx =           mα yα + zα =               mα yα
                            α                          α

                Iyy =           mα   x2
                                      α   +    2
                                              zα   =       mα x2 ,
                                                               α
                            α                          α

we see that Izz = Ixx + Iyy , the moment of inertia about an axis per-
pendicular to the body is the sum of the moments about two perpen-
dicular axes within the body, through the same point. This is known
as the perpendicular axis theorem. As an example of its usefulness
we calculate the moments for a thin uniform ring lying on the circle
x2 + y 2 = R2 , z = 0, about the origin. As every particle of the ring
has the same distance R from the z-axis, the moment of inertia Izz is
simply MR². As Ixx = Iyy by symmetry, and as the two must add up
to Izz, we have, by a simple indirect calculation, $I_{xx} = \frac12 MR^2$.
    The parallel axis theorem (4.16) is also a useful calculational tool.
Consider the moment of inertia of the ring about an axis parallel to
its axis of symmetry but through a point on the ring. About the axis
of symmetry, $I_{zz} = MR^2$, and $\breve b_\perp = R$, so about a point on the ring,
$I'_{zz} = 2MR^2$. If instead, we want the moment about a tangent to the
ring, $I'_{xx} = I^{(\mathrm{cm})}_{xx} + MR^2 = \frac12 MR^2 + MR^2 = 3MR^2/2$. Of course for
$I'_{yy}$ the $\breve b_\perp = 0$, so $I'_{yy} = \frac12 MR^2$, and we may verify that $I'_{zz} = I'_{xx} + I'_{yy}$
about this point as well.
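These ring results are easy to reproduce with a discretized ring of N equal point masses (N and the ring parameters below are arbitrary choices, not from the text):

```python
# Sketch: perpendicular and parallel axis theorems for a uniform ring,
# modeled as N equal point masses on x^2 + y^2 = R^2, z = 0.
import math

N, R, M = 12, 2.0, 3.0
m = M / N
pts = [(R*math.cos(2*math.pi*k/N), R*math.sin(2*math.pi*k/N), 0.0)
       for k in range(N)]

Izz = sum(m*(x*x + y*y) for x, y, z in pts)          # = M R^2
Ixx = sum(m*(y*y + z*z) for x, y, z in pts)          # = M R^2 / 2
Iyy = sum(m*(x*x + z*z) for x, y, z in pts)

# Perpendicular axis theorem: Izz = Ixx + Iyy.
print(Izz, Ixx + Iyy)

# Parallel axis: about a vertical axis through the point (R, 0, 0) on the
# ring, b_perp = R, so the moment is Izz + M R^2 = 2 M R^2.
Izz_ring_point = sum(m*((x - R)**2 + y*y) for x, y, z in pts)
print(Izz_ring_point, 2*M*R*R)
```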

Principal axes
If an object has an axial symmetry about z, we may use cylindrical polar
coordinates (ρ, θ, z). Then its density µ(ρ, θ, z) must be independent of
θ, and

$$ I_{ij} = \int dz\int\rho\,d\rho\int d\theta\;\mu(\rho,z)\left((\rho^2 + z^2)\,\delta_{ij} - r_i r_j\right), $$
so
$$ I_{xz} = \int dz\int\rho\,d\rho\;\mu(\rho,z)\int d\theta\,(-z\rho\cos\theta) = 0, $$
$$ I_{xy} = \int dz\int\rho\,d\rho\;\mu(\rho,z)\int d\theta\,(-\rho^2\sin\theta\cos\theta) = 0, $$
$$ I_{xx} = \int dz\int\rho\,d\rho\;\mu(\rho,z)\int d\theta\,(\rho^2 + z^2 - \rho^2\cos^2\theta), $$
$$ I_{yy} = \int dz\int\rho\,d\rho\;\mu(\rho,z)\int d\theta\,(\rho^2 + z^2 - \rho^2\sin^2\theta) = I_{xx}. $$

Thus the inertia tensor is diagonal and has two equal elements,

$$ I_{ij} = \begin{pmatrix} I_{xx} & 0 & 0 \\ 0 & I_{xx} & 0 \\ 0 & 0 & I_{zz} \end{pmatrix}. $$

    In general, an object need not have an axis of symmetry, and even
a diagonal inertia tensor need not have two equal “eigenvalues”. Even
if a body has no symmetry, however, there is always a choice of axes, a
coordinate system, such that in this system the inertia tensor is diagonal.
This is because Iij is always a real symmetric tensor, and any such
tensor can be brought to diagonal form by an orthogonal similarity
transformation⁶:

$$ I = O\,I_D\,O^{-1}, \qquad I_D = \begin{pmatrix} I_1 & 0 & 0 \\ 0 & I_2 & 0 \\ 0 & 0 & I_3 \end{pmatrix}. \qquad (4.24) $$

An orthogonal matrix O is either a rotation or a rotation times P , and
the P ’s can be commuted through ID without changing its form, so
there is a rotation R which brings the inertia tensor into diagonal form.
The axes of this new coordinate system are known as the principal
axes.

Tire balancing
Consider a rigid body rotating on an axle, and therefore about a fixed
axis. What total force and torque will the axle exert? First, $\dot R = \omega\times R$,
so

$$ \ddot R = \dot\omega\times R + \omega\times\dot R
           = \dot\omega\times R + \omega\times(\omega\times R)
           = \dot\omega\times R + \omega\,(\omega\cdot R) - R\,\omega^2. $$

If the axis is fixed, ω and $\dot\omega$ are in the same direction, so the first term
is perpendicular to the other two. If we want the total force to be zero⁷,
$\ddot R = 0$, so

$$ R\cdot\ddot R = 0 = 0 + (\omega\cdot R)^2 - R^2\omega^2. $$
Thus the angle between ω and R is 0 or π, and the center of mass must
lie on the axis of rotation. This is the condition of static balance if the
axis of rotation is horizontal in a gravitational field. Consider a car
   ⁶This should be proven in any linear algebra course. For example, see [1], Theorem 6 in Section 6.3.
   ⁷Here we are ignoring any constant force compensating the force exerted by the
road which is holding the car up!

tire: to be stable at rest at any angle, R must lie on the axis or there
will be a gravitational torque about the axis, causing rotation in the
absence of friction. If the tire is not statically balanced, this force will
rotate rapidly with the tire, leading to vibrations of the car.
                                                                        ˙
    Even if the net force is 0, there might be a torque. τ = L =
d(I · ω)/dt. If I · ω is not parallel to ω it will rotate with the wheel,
         ˙
and so L will rapidly oscillate. This is also not good for your axle. If,
however, ω is parallel to one of the principal axes, I · ω is parallel to
ω, so if ω is constant, so is L, and τ = 0. The process of placing small
weights around the tire to cause one of the principal axes to be aligned
with the axle is called dynamical balancing.
    Every rigid body has its principal axes; the problem of finding them
and the moments of inertia about them, given the inertia tensor Iij in
some coordinate system, is a mathematical question of finding a rotation
R and “eigenvalues” I₁, I₂, I₃ (not components of a vector) such that
equation (4.24) holds, with R in place of O. The vector $v_1 = R\begin{pmatrix}1\\0\\0\end{pmatrix}$ is
then an eigenvector, for

$$ I\cdot v_1 = R\,I_D\,R^{-1} R\begin{pmatrix}1\\0\\0\end{pmatrix}
             = R\,I_D\begin{pmatrix}1\\0\\0\end{pmatrix}
             = I_1\,R\begin{pmatrix}1\\0\\0\end{pmatrix} = I_1 v_1. $$
Similarly $I\cdot v_2 = I_2 v_2$ and $I\cdot v_3 = I_3 v_3$, where v₂ and v₃ are defined the
same way, starting with $\hat e_2$ and $\hat e_3$ instead of $\hat e_1$. Note that, in general,
I acts simply as a multiplier only for multiples of these three vectors
individually, and not for sums of them. On a more general vector I will
change the direction as well as the length.
    Note that the Iᵢ are all ≥ 0, for given any vector n,

$$ n\cdot I\cdot n = \sum_\alpha m_\alpha\left(r_\alpha^2 n^2 - (r_\alpha\cdot n)^2\right)
                  = \sum_\alpha m_\alpha r_\alpha^2 n^2\left(1 - \cos^2\theta_\alpha\right) \ge 0, $$

so all the eigenvalues must be ≥ 0. It will be equal to zero only if all
massive points of the body are in the ±n directions, in which case the
rigid body must be a thin line.
    Finding the eigenvalues Iᵢ is easier than finding the rotation R.
Consider the matrix $I - \lambda\mathbb{1}$, which has the same eigenvectors as I,
but with eigenvalues Iᵢ − λ. Then if λ is one of the eigenvalues of I,
this matrix will annihilate vᵢ, so $I - \lambda\mathbb{1}$ is a singular matrix with zero
determinant. Thus the equation $\det(I - \lambda\mathbb{1}) = 0$, which is a cubic
equation in λ, gives as its roots the eigenvalues of I.
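The construction can be checked numerically. Below (an arbitrary rotation angle and made-up principal moments, chosen only for illustration), I is assembled as R I_D R⁻¹; the columns of R are then eigenvectors, and det(I − λ𝟙) vanishes at each λ = Iᵢ:

```python
# Sketch: build I = R I_D R^T from made-up principal moments and a rotation
# about z, then check the eigenvectors and the vanishing determinant.
import math

def matmul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k]*v[k] for k in range(3)) for i in range(3)]

def det3(A):
    return (A[0][0]*(A[1][1]*A[2][2] - A[1][2]*A[2][1])
          - A[0][1]*(A[1][0]*A[2][2] - A[1][2]*A[2][0])
          + A[0][2]*(A[1][0]*A[2][1] - A[1][1]*A[2][0]))

th = 0.3
c, s = math.cos(th), math.sin(th)
Rot = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]   # orthogonal: R^-1 = R^T
RotT = [list(col) for col in zip(*Rot)]
moments = [1.0, 2.0, 5.0]                             # made-up I1, I2, I3
ID = [[moments[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
I = matmul(matmul(Rot, ID), RotT)

for k, Ik in enumerate(moments):
    vk = [Rot[i][k] for i in range(3)]                # k-th column of R
    Iv = matvec(I, vk)
    assert all(abs(Iv[i] - Ik*vk[i]) < 1e-12 for i in range(3))
    shifted = [[I[i][j] - (Ik if i == j else 0.0) for j in range(3)]
               for i in range(3)]
    assert abs(det3(shifted)) < 1e-12                 # det(I - Ik 1) = 0
print("principal axes and moments verified")
```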


4.4     Dynamics
4.4.1     Euler’s Equations
So far, we have been working in an inertial coordinate system O. In
complicated situations this is rather unnatural; it is more natural to
use a coordinate system O′ fixed in the rigid body. In such a coordinate
system, the vector one gets by differentiating the coefficients of a vector
$b = \sum_i b'_i\hat e'_i$ differs from the inertial derivative $\dot b$ as given in Eq. (4.7). For
the time derivative of the angular momentum, we have

$$ \tau = \frac{dL}{dt} = \left(\frac{dL}{dt}\right)_b + \omega\times L
        = \sum_{ij}\frac{d(I_{ij}\omega_j)}{dt}\,\hat e_i + \omega\times(I\cdot\omega), $$

where we have either a system rotating about a fixed point R, with τ,
L, and Iij all evaluated about that fixed point, or we are working about
the center of mass, with τ, L, and Iij all evaluated about the center
of mass, even if it is in motion. Now in the O′ frame, all the masses
are at fixed positions, so Iij is constant, and the first term is simply
$I\cdot(d\omega/dt)_b$, which by (4.8) is simply $I\cdot\dot\omega$. Thus we have (in the body
coordinate system)

$$ \tau = I\cdot\dot\omega + \omega\times(I\cdot\omega). \qquad (4.25) $$

We showed that there is always a choice of cartesian coordinates mounted
on the body along the principal axes. For the rest of this section we
will use this body-fixed coordinate system, so we will drop the primes.
    The torque not only determines the rate of change of the angular
momentum, but also does work in the system. For a system rotating

about a fixed point, we see from the expression (4.13), $T = \frac12\,\omega\cdot I\cdot\omega$,
that

$$ \frac{dT}{dt} = \frac12\,\dot\omega\cdot I\cdot\omega + \frac12\,\omega\cdot\dot I\cdot\omega + \frac12\,\omega\cdot I\cdot\dot\omega. $$

The first and last terms are equal because the inertia tensor is symmetric,
Iij = Iji, and the middle term vanishes in the body-fixed coordinate
system because all particle positions are fixed. Thus $dT/dt = \omega\cdot I\cdot\dot\omega = \omega\cdot\dot L = \omega\cdot\tau$.
Thus the kinetic energy changes due to the work done
by the external torque. Therefore, of course, if there is no torque the
kinetic energy is constant.
    We will write out explicitly the components of Eq. (4.25). In evaluating
τ₁, we need the first component of the second term,

$$ \left[(\omega_1, \omega_2, \omega_3)\times(I_1\omega_1, I_2\omega_2, I_3\omega_3)\right]_1 = (I_3 - I_2)\,\omega_2\omega_3. $$

Inserting this and the similar expressions for the other components into
Eq. (4.25), we get Euler’s equations

$$ \tau_1 = I_1\dot\omega_1 + (I_3 - I_2)\,\omega_2\omega_3, $$
$$ \tau_2 = I_2\dot\omega_2 + (I_1 - I_3)\,\omega_1\omega_3, \qquad (4.26) $$
$$ \tau_3 = I_3\dot\omega_3 + (I_2 - I_1)\,\omega_1\omega_2. $$

Using these equations we can address several situations of increasing
difficulty.
     First, let us ask under what circumstances the angular velocity will
be fixed in the absence of a torque. As $\tau = \dot\omega = 0$, from the 1-component
equation we conclude that (I2 − I3)ω2ω3 = 0. Then either the moments
are equal (I2 = I3) or one of the two components ω2 or ω3 must vanish.
Similarly, if I1 = I2, either ω1 or ω2 vanishes. So the only way
more than one component of ω can be nonzero is if two or more of the
principal moments are equal. In this case, the principal axes are not
uniquely determined. For example, if I1 = I2 ≠ I3, the third axis is
unambiguously required as one of the principal axes, but any direction
in the (12)-plane will serve as the second principal axis. In this case we
see that $\tau = \dot\omega = 0$ implies either that ω is along the z-axis (ω1 = ω2 = 0) or
that it lies in the (12)-plane (ω3 = 0). In any case, the angular velocity is

constant in the absence of torques only if it lies along a principal axis
of the body.
    As our next example, consider an axially symmetric body with no
external forces or torques acting on it. Then $\dot R$ is a constant, and we
will choose to work in an inertial frame where R is fixed at the origin.
Choosing our body-fixed coordinates with z along the axis of symmetry,
our axes are principal ones and I1 = I2 , so we have
$$ I_1\dot\omega_1 = (I_1 - I_3)\,\omega_2\omega_3, $$
$$ I_1\dot\omega_2 = (I_3 - I_1)\,\omega_1\omega_3, $$
$$ I_3\dot\omega_3 = (I_1 - I_2)\,\omega_1\omega_2 = 0. $$

We see that ω₃ is a constant. Let $\Omega = \omega_3(I_3 - I_1)/I_1$. Then we see that

$$ \dot\omega_1 = -\Omega\,\omega_2, \qquad \dot\omega_2 = \Omega\,\omega_1. $$

Differentiating the first and plugging into the second, we find

$$ \ddot\omega_1 = -\Omega\,\dot\omega_2 = -\Omega^2\omega_1, $$

which is just the harmonic oscillator equation. So $\omega_1 = A\cos(\Omega t + \phi)$
with some arbitrary amplitude A and constant phase φ, and $\omega_2 = -\dot\omega_1/\Omega = A\sin(\Omega t + \phi)$.
We see that, in the body-fixed frame, the
angular velocity rotates about the axis of symmetry in a circle, with
arbitrary radius A, and a period 2π/Ω. The angular velocity vector ω
is therefore sweeping out a cone, called the body cone of precession,
with a half-angle $\phi_b = \tan^{-1}(A/\omega_3)$. Note the length of ω is fixed.
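This precession can be confirmed by integrating Euler's equations numerically. The sketch below (made-up moments with I₁ = I₂, and a simple fixed-step RK4 integrator) checks that ω₃ stays constant and that ω₁ follows A cos Ωt:

```python
# Sketch: integrate the torque-free Euler equations for a symmetric body
# (I1 = I2) and compare with the analytic precession w1 = A cos(Omega t).
import math

def euler_rhs(w, I1, I2, I3):
    """Torque-free Euler equations (4.26), solved for the omega-dots."""
    return [(I2 - I3)/I1 * w[1]*w[2],
            (I3 - I1)/I2 * w[2]*w[0],
            (I1 - I2)/I3 * w[0]*w[1]]

def rk4_step(w, dt, I1, I2, I3):
    k1 = euler_rhs(w, I1, I2, I3)
    k2 = euler_rhs([w[i] + 0.5*dt*k1[i] for i in range(3)], I1, I2, I3)
    k3 = euler_rhs([w[i] + 0.5*dt*k2[i] for i in range(3)], I1, I2, I3)
    k4 = euler_rhs([w[i] + dt*k3[i] for i in range(3)], I1, I2, I3)
    return [w[i] + dt/6.0*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) for i in range(3)]

I1 = I2 = 1.0
I3 = 2.0
A, w3 = 0.5, 1.0
Omega = w3*(I3 - I1)/I1                 # precession rate in the body frame

w = [A, 0.0, w3]                        # phase phi = 0
dt, t_end = 0.001, 1.0
t = 0.0
while t < t_end - 0.5*dt:
    w = rk4_step(w, dt, I1, I2, I3)
    t += dt

print(w[0], A*math.cos(Omega*t))        # numerics match the analytic solution
print(w[2])                             # w3 is unchanged
```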
     What is happening in the lab frame? The kinetic energy $\frac12\,\omega\cdot L$ is
constant, as is the vector L itself. As the length of a vector is frame
independent, |ω| is fixed as well. Therefore the angle between them,
called the lab angle, is constant,

$$ \cos\phi_L = \frac{\omega\cdot L}{|\omega||L|} = \frac{2T}{|\omega||L|} = \text{constant}. $$

Thus ω rotates about L in a cone, called the laboratory cone.
    Note that φb is the angle between ω and the z-axis of the body,
while φL is the angle between ω and L, so they are not the same angle
in two different coordinate systems.

    The situation is a bit hard to picture. In the body frame it is hard
to visualize ω, although that is the negative of the angular velocity of
the universe in that system. In the lab frame the body is instantaneously
rotating about the axis ω, but this axis is not fixed in the body. At any
instant, the points on this line are not moving, and we may think of the
body rolling without slipping on the lab cone, with ω the momentary
line of contact. Thus the body cone rolls on the lab cone without
slipping.

The Poinsot construction
This idea has an extension to the more general case where the body
has no symmetry. The motion in this case can be quite complex, both
for analytic solution, because Euler’s equations are nonlinear, and to
visualize, because the body is rotating and bobbing around in a com-
plicated fashion. But as we are assuming there are no external forces
or torques, the kinetic energy and total angular momentum vectors are
constant, and this will help us understand the motion. To do so we
construct an abstract object called the inertia ellipsoid. Working in
the body frame, consider that the equation

$$ 2T = \sum_{ij}\omega_i I_{ij}\omega_j = f(\omega) $$

is a quadratic equation for ω, with constant coefficients, which therefore
determines an ellipsoid⁸ in the space of possible values of ω. This is
called the inertia ellipsoid⁹. It is fixed in the body, and so if we were
to scale it by some constant to change units from angular velocity to
position, we could think of it as a fixed ellipsoid in the body itself,
centered at the center of mass. At every moment the instantaneous
value of ω must lie on this ellipsoid, so ω(t) sweeps out a path on this
ellipsoid called the polhode.
   ⁸We assume the body is not a thin line, so that I is a positive definite matrix (all
its eigenvalues are strictly > 0), so the surface defined by this equation is bounded.
   ⁹Exactly which quantity forms the inertia ellipsoid varies by author. Goldstein
scales ω by a constant $1/\sqrt{2T}$ to form an object ρ whose ellipsoid he calls the inertia
ellipsoid. Landau and Lifshitz discuss an ellipsoid of L values but don’t give it a
name. They then call the corresponding path swept out by ω the polhode, as we do.

    If we go to the lab frame, we see this ellipsoid fixed in and moving
with the body. The instantaneous value of ω still lies on it. In ad-
dition, the component of ω in the (fixed) L direction is fixed, and as
the center of mass is fixed, the point corresponding to ω lies in a plane
perpendicular to L a fixed distance from the center of mass, known as
the invariant plane. Finally we note that the normal to the surface of
the ellipsoid f(ω) = 2T is parallel to $\nabla f = 2I\cdot\omega = 2L$, so the ellipsoid
of inertia is tangent to the invariant plane at the point ω(t). The path
that ω(t) sweeps out on the invariant plane is called the herpolhode.
At this particular moment, the point corresponding to ω in the body
is not moving, so the inertia ellipsoid is rolling, not slipping, on the
invariant plane.
    In general, if there is no special symmetry, the inertia ellipsoid will
not be axially symmetric, so that in order to roll on the fixed plane and
keep its center at a fixed point, it will need to bob up and down. But
in the special case with axial symmetry, the inertia ellipsoid will also
have this symmetry, so it can roll about a circle, with its symmetry
axis at a fixed angle relative to the invariant plane. In the body frame,
ω3 is fixed and the polhode moves on a circle of radius A = ω sin φb .
In the lab frame, ω rotates about L, so it sweeps out a circle of radius
ω sin φL in the invariant plane. One circle is rolling on the other, and
the polhode rotates about its circle at the rate Ω in the body frame, so
the angular rate at which the herpolhode rotates about L, $\Omega_L$, is

$$ \Omega_L = \Omega\,\frac{\text{circumference of polhode circle}}{\text{circumference of herpolhode circle}}
            = \frac{I_3 - I_1}{I_1}\,\omega_3\,\frac{\sin\phi_b}{\sin\phi_L}. $$

Stability of rotation about an axis
We have seen that the motion of an isolated rigid body is simple only if
the angular velocity is along one of the principal axes, and can be very
complex otherwise. However, it is worth considering what happens
if ω is very nearly, but not exactly, along one of the principal axes,
say z. Then we may write $\omega = \omega_3\hat e_3 + \epsilon$ in the body coordinates,
and assume $\epsilon_3 = 0$ and the other components are small. We treat
Euler’s equations to first order in the small quantity ε. To this order,
$\dot\omega_3 = (I_1 - I_2)\,\epsilon_1\epsilon_2/I_3 \approx 0$, so ω₃ may be considered a constant. The
112                             CHAPTER 4. RIGID BODY MOTION

other two equations give

$$ \dot\omega_1 = \dot\epsilon_1 = \frac{I_2 - I_3}{I_1}\,\epsilon_2\,\omega_3, $$
$$ \dot\omega_2 = \dot\epsilon_2 = \frac{I_3 - I_1}{I_2}\,\epsilon_1\,\omega_3, $$

so

$$ \ddot\epsilon_1 = \frac{(I_2 - I_3)(I_3 - I_1)}{I_1 I_2}\,\omega_3^2\,\epsilon_1. $$
What happens to ε(t) depends on the sign of the coefficient, or the sign
of (I2 − I3 )(I3 − I1 ). If it is negative, ε1 oscillates, and indeed ε rotates
about z just as we found for the symmetric top. This will be the case
if I3 is either the largest or the smallest eigenvalue. If, however, it is
the middle eigenvalue, the coefficient will be positive, and the equation
is solved by exponentials, one damping out and one growing. Unless
the initial conditions are perfectly fixed, the growing piece will have
a nonzero coefficient and will blow up. Thus a rotation about the
intermediate principal axis is unstable, while motion about the axes
with the largest and smallest moments is stable. For the case where
two of the moments are equal, the motion will be stable about the third
axis, and slightly unstable (ε will grow linearly instead of exponentially
with time) about the others.
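The two behaviors are easy to exhibit numerically by integrating Euler's equations with a slightly misaligned initial ω; a minimal sketch (the moments of inertia, spin rate, and step size are illustrative choices, not values from the text):

```python
import numpy as np

def euler_rhs(w, I):
    """Euler's equations for a free rigid body, in principal-axis body coordinates."""
    I1, I2, I3 = I
    return np.array([(I2 - I3) * w[1] * w[2] / I1,
                     (I3 - I1) * w[2] * w[0] / I2,
                     (I1 - I2) * w[0] * w[1] / I3])

def integrate(w0, I, dt=1e-3, steps=1000):
    """Advance the angular velocity with classical fourth-order Runge-Kutta."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        k1 = euler_rhs(w, I)
        k2 = euler_rhs(w + 0.5 * dt * k1, I)
        k3 = euler_rhs(w + 0.5 * dt * k2, I)
        k4 = euler_rhs(w + dt * k3, I)
        w += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return w

I = (1.0, 2.0, 3.0)  # illustrative principal moments, I1 < I2 < I3
# spin nearly along the axis of largest moment: the perturbation only oscillates
w_stable = integrate([1e-4, 1e-4, 10.0], I)
# spin nearly along the intermediate axis: the perturbation grows exponentially
w_unstable = integrate([1e-4, 10.0, 1e-4], I)
```

With these values the off-axis components of `w_stable` remain at the 10⁻⁴ scale of the initial perturbation, while those of `w_unstable` grow by roughly two orders of magnitude over the same interval.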
    An interesting way of understanding this stability or instability of
rotation close to a principal axis involves another ellipsoid we can de-
fine for the free rigid body, an ellipsoid of possible angular momentum
values. Of course in the inertial coordinates L is constant, but in body-
fixed language the coordinates vary with time, though the length of L
is still constant. In addition, the conservation of kinetic energy

                              2T = L · I −1 · L

(where I −1 is the inverse of the moment of inertia matrix) gives a
quadratic equation for the three components of L, just as we had for ω
and the ellipsoid of inertia. The path of L(t) on this ellipsoid is on the
intersection of the ellipsoid with a sphere of radius |L|, for the length is
fixed.
4.4. DYNAMICS                                                           113

    If ω is near the principal axis with the largest moment of inertia,
L lies near the major axis of the ellipsoid. The sphere then nearly cir-
cumscribes the ellipsoid, so the intersection consists only of two small
loops surrounding each end of the major axis. Similarly if ω is near the
axis with the smallest moment, the sphere is nearly inscribed in the
ellipsoid, and again the possible values of L lie close to either end of
the minor axis. Thus the subsequent motion is confined to one of these
small loops. But if ω starts near the intermediate principal axis, L does
likewise, and the intersection consists of two loops which extend from
near one end of the intermediate axis to near the other, and the possible
continuous motion of L is not confined to a small region of the ellipsoid.
    Because the rotation of the Earth flattens it at the poles, the Earth
is approximately an oblate ellipsoid, with I3 greater than I1 = I2 by
about one part in 300. As ω3 is 2π per sidereal day, if ω is not perfectly
aligned with the axis, it will precess about the symmetry axis once
every 10 months. This Chandler wobble is not of much significance,
however, because the body angle φb ≈ 10−6 .
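The ten-month figure is just the body-frame precession rate Ω = [(I3 − I1)/I1] ω3 of a free symmetric top, applied to the Earth; a quick check of the arithmetic, using the one-part-in-300 flattening quoted above:

```python
from math import pi

# body-frame precession rate of a free symmetric top: Omega = ((I3 - I1)/I1) * omega3
flattening = 1.0 / 300.0        # (I3 - I1)/I1 for the Earth, as quoted in the text
omega3 = 2.0 * pi               # spin: 2*pi radians per sidereal day
Omega = flattening * omega3     # precession rate, radians per sidereal day
period_days = 2.0 * pi / Omega  # = 300 sidereal days
period_months = period_days / 30.0  # roughly 10 months
```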

4.4.2     Euler angles
Up to this point we have managed to describe the motion of a rigid
body without specifying its coordinates. This is not possible for most
problems with external forces, for which the torque will generally de-
pend on the orientation of the body. It is time to face up to the problem
of using three generalized coordinates to describe the orientation.
    In section 4.1.1 we described the orientation of a rigid body in terms
of a rotation through a finite angle in a given direction, specified by ω.
This does not give a simple parameterization of the matrix A, and
it is more common to use an alternate description known as Euler
angles. Here we describe the rotation A as a composition of three
simpler rotations about specified coordinates, so that we are making a
sequence of changes of coordinates

              Rz (φ)               Ry1 (θ)               Rz2 (ψ)
     (x, y, z) −→ (x1 , y1 , z1 ) −→ (x2 , y2 , z2 ) −→ (x′ , y′ , z′ ).

We have chosen three specific directions about which to make the three
rotations, namely the original z-axis, the next y-axis, y1 , and then the
new z-axis, which is both z2 and z . This choice is not universal, but is
the one generally used in quantum mechanics. Many of the standard
classical mechanics texts10 take the second rotation to be about the x1 -
axis instead of y1 , but quantum mechanics texts11 avoid this because
the action of Ry on a spinor is real, while the action of Rx is not. While
this does not concern us here, we prefer to be compatible with quantum
mechanics discussions.




Figure 4.2: The Euler angles as rotations through φ, θ, ψ, about the z,
y1 , and z2 axes sequentially

   This procedure is pictured in Figure 4.2. To see that any rotation
can be written in this form, and to determine the range of the angles, we
first discuss what fixes the y1 axis. Notice that the rotation about the
z-axis leaves z unaffected, so z1 = z. Similarly, the last rotation leaves
  10 See [2], [4], [6], [7], [8] and [12].
  11 For example [9] and [13].
the z2 axis unchanged, so it is also the z′ axis. The planes orthogonal to
these axes are also left invariant12 . These planes, the xy-plane and the
x′ y′ -plane respectively, intersect in a line called the line of nodes13 .
These planes are also the x1 y1 and x2 y2 planes respectively, and as
the second rotation Ry1 (θ) must map the first into the second plane,
we see that y1 , which is unaffected by Ry1 , must be along the line of
nodes. We choose between the two possible orientations of y1 to keep
the necessary θ angle in [0, π]. The angles φ and ψ are then chosen
in [0, 2π) as necessary to map y → y1 and y1 → y′ respectively.
    While the rotation about the z-axis leaves z unaffected, it rotates
the x and y components by the matrix (4.4). Thus in three dimensions,
a rotation about the z axis is represented by

                        [  cos φ   sin φ   0 ]
             Rz (φ) =   [ − sin φ  cos φ   0 ] .                  (4.27)
                        [    0       0     1 ]

Similarly a rotation through an angle θ about the current y axis is

                        [ cos θ   0   − sin θ ]
             Ry (θ) =   [   0     1      0    ] .                 (4.28)
                        [ sin θ   0    cos θ  ]
The reader needs to assure himself, by thinking of the rotations as active
transformations, that the action of the matrix Ry after having applied
Rz produces a rotation about the y1 -axis, not the original y-axis.
    The full rotation A = Rz (ψ)·Ry (θ)·Rz (φ) can then be found simply
by matrix multiplication:

               [  cos ψ   sin ψ   0 ] [ cos θ   0   − sin θ ] [  cos φ   sin φ   0 ]
  A(φ, θ, ψ) = [ − sin ψ  cos ψ   0 ] [   0     1      0    ] [ − sin φ  cos φ   0 ]
               [    0       0     1 ] [ sin θ   0    cos θ  ] [    0       0     1 ]
  12 although the points in the planes are rotated by 4.4.
  13 The case where the xy and x′ y′ planes are identical, rather than intersecting in
a line, is exceptional, corresponding to θ = 0 or θ = π. Then the two rotations about
the z-axis add or subtract, and many choices for the Euler angles (φ, ψ) will give the
same full rotation.
  [ − sin φ sin ψ + cos θ cos φ cos ψ   cos φ sin ψ + cos θ sin φ cos ψ   − sin θ cos ψ ]
= [ − sin φ cos ψ − cos θ cos φ sin ψ   cos φ cos ψ − cos θ sin φ sin ψ     sin θ sin ψ ] .  (4.29)
  [            sin θ cos φ                       sin θ sin φ                   cos θ    ]
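The product in (4.29) is easy to spot-check numerically; a sketch comparing the explicit entries against the product of the three rotation matrices at arbitrary sample angles:

```python
import numpy as np

def Rz(a):
    """Rotation (4.27) about the z axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

def Ry(a):
    """Rotation (4.28) about the y axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]])

def A_explicit(phi, theta, psi):
    """The entries of (4.29), written out."""
    cf, sf = np.cos(phi), np.sin(phi)
    ct, st = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(psi), np.sin(psi)
    return np.array([
        [-sf * sp + ct * cf * cp,  cf * sp + ct * sf * cp, -st * cp],
        [-sf * cp - ct * cf * sp,  cf * cp - ct * sf * sp,  st * sp],
        [ st * cf,                 st * sf,                 ct     ]])

phi, theta, psi = 0.7, 1.1, -0.4   # arbitrary sample angles
A = Rz(psi) @ Ry(theta) @ Rz(phi)  # agrees with A_explicit; orthogonal, det = 1
```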

   We need to reexpress the kinetic energy in terms of the Euler angles
and their time derivatives. From the discussion of section 4.2, we have

                        Ω = −A(t) · dA−1 (t)/dt.

The inverse matrix is simply the transpose, so finding Ω can be done by
straightforward differentiation and matrix multiplication14 . The result
is

    [             0                    ψ̇ + φ̇ cos θ           −θ̇ cos ψ − φ̇ sin θ sin ψ ]
Ω = [      −ψ̇ − φ̇ cos θ                     0                 θ̇ sin ψ − φ̇ sin θ cos ψ ] .  (4.30)
    [ θ̇ cos ψ + φ̇ sin θ sin ψ   −θ̇ sin ψ + φ̇ sin θ cos ψ                0             ]

Note Ω is antisymmetric as expected, so it can be recast into the axial
vector ω:

              ω1 = Ω23 = θ̇ sin ψ − φ̇ sin θ cos ψ,
              ω2 = Ω31 = θ̇ cos ψ + φ̇ sin θ sin ψ,                 (4.31)
              ω3 = Ω12 = ψ̇ + φ̇ cos θ.
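Equations (4.30–4.31) can be checked against the definition Ω = −A·dA−1/dt by finite differences; a sketch (the angle trajectories below are arbitrary smooth test functions):

```python
import numpy as np

def Rz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

def Ry(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]])

phid, thetad, psid = 0.3, 0.2, -1.1                     # constant test rates
angles = lambda t: (0.3 * t, 0.8 + 0.2 * t, -1.1 * t)   # (phi, theta, psi)

def A(t):
    phi, theta, psi = angles(t)
    return Rz(psi) @ Ry(theta) @ Rz(phi)

t0, h = 0.5, 1e-6
# Omega = -A * d(A^{-1})/dt, using A^{-1} = A^T and a central difference
Omega = -A(t0) @ (A(t0 + h).T - A(t0 - h).T) / (2 * h)

phi, theta, psi = angles(t0)
w1 = thetad * np.sin(psi) - phid * np.sin(theta) * np.cos(psi)
w2 = thetad * np.cos(psi) + phid * np.sin(theta) * np.sin(psi)
w3 = psid + phid * np.cos(theta)
# Omega is antisymmetric, with (Omega_23, Omega_31, Omega_12) = (w1, w2, w3)
```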

This expression for ω gives the necessary velocities for the kinetic energy
term (4.19 or 4.21) in the Lagrangian, which becomes

    L = ½M V ² + M V · ω × B + ½ ω · I(R̃) · ω − U (R, θ, ψ, φ),      (4.32)

or

    L = ½M V ² + ½ ω · I(cm) · ω − U (R, θ, ψ, φ),                    (4.33)

with ω = Σi ωi êi given by (4.31).
  14 Verifying the above expression for A and the following one for Ω is a good
application for a student having access to a good symbolic algebra computer
program. Both Mathematica and Maple handle the problem nicely.
4.4.3     The symmetric top
Now let us consider an example with external forces which constrain
one point of a symmetrical top to be stationary. Then we choose this
to be the fixed point, at the origin R̃ = 0, and we choose the body-fixed
z′-axis to be along the axis of symmetry. Of course the center of mass
is on this axis, so R = (0, 0, ℓ) in body-fixed coordinates. We will set
up the motion by writing the Lagrangian from the forms for the kinetic
and potential energy, due entirely to the gravitational field15 .

    T = ½(ω1² + ω2²)I1 + ½ω3² I3
      = ½(φ̇² sin²θ + θ̇²)I1 + ½(φ̇ cos θ + ψ̇)² I3 ,                 (4.34)

    U = Mg z_cm = Mg ℓ (A−1 )zz = Mg ℓ cos θ.                         (4.35)
So L = T − U is independent of φ and ψ, and the corresponding momenta

    pφ = φ̇ sin²θ I1 + (φ̇ cos θ + ψ̇) cos θ I3
       = φ̇ sin²θ I1 + cos θ ω3 I3 ,
    pψ = (φ̇ cos θ + ψ̇) I3 = ω3 I3

are constants of the motion. Let us use the parameters a = pψ /I1 and
b = pφ /I1 , which are more convenient, to parameterize the motion,
instead of pφ , pψ , or even ω3 , which is also a constant of the motion
and might seem physically a more natural choice. A third constant of
the motion is the energy,

    E = T + U = ½I1 (θ̇² + φ̇² sin²θ) + ½ω3² I3 + Mg ℓ cos θ.
Solving for φ̇ from pφ = I1 b = φ̇ sin²θ I1 + I1 a cos θ,

    φ̇ = (b − a cos θ)/sin²θ,                                         (4.36)

    ψ̇ = ω3 − φ̇ cos θ = I1 a/I3 − [(b − a cos θ)/sin²θ] cos θ,        (4.37)
  15 As we did in discussing Euler’s equations, we drop the primes on ωi and on
Iij even though we are evaluating these components in the body-fixed coordinate
system. The coordinate z, however, is still a lab coordinate, with êz pointing
upward.
Then E becomes

    E = ½I1 θ̇² + U′(θ) + ½I3 ω3² ,

where

    U′(θ) := ½I1 (b − a cos θ)²/sin²θ + Mg ℓ cos θ.

The term ½I3 ω3² is an ignorable constant, so we consider E′ := E − ½I3 ω3²

as the third constant of the motion, and we now have a one dimensional
problem for θ(t), with a first integral of the motion. Once we solve for
θ(t), we can plug back in to find φ̇ and ψ̇.
    Substitute u = cos θ, u̇ = − sin θ θ̇, so

    E′ = I1 u̇²/[2(1 − u²)] + ½I1 (b − au)²/(1 − u²) + Mg ℓ u,

or

    u̇² = (1 − u²)(α − βu) − (b − au)² =: f (u),                      (4.38)

with α = 2E′/I1 , β = 2Mg ℓ/I1 .
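Given concrete values of a, b, α, and β, the turning points of the nutation are simply roots of this cubic; a numerical sketch (the parameter values are illustrative, not from the text):

```python
import numpy as np

# f(u) = (1 - u^2)(alpha - beta*u) - (b - a*u)^2, expanded in powers of u:
#   f(u) = beta*u^3 - (alpha + a^2)*u^2 + (2*a*b - beta)*u + (alpha - b^2)
a, b = 1.0, 0.5          # a = p_psi/I1, b = p_phi/I1 (illustrative values)
alpha, beta = 1.0, 1.0   # alpha = 2E'/I1, beta = 2Mg l/I1 (illustrative values)

coeffs = [beta, -(alpha + a**2), 2 * a * b - beta, alpha - b**2]
uX, uN, uU = np.sort(np.roots(coeffs).real)  # all three roots are real here

# uX = cos(theta_max) and uN = cos(theta_min) lie in (-1, 1); uU > 1 is unphysical
theta_max, theta_min = np.arccos(uX), np.arccos(uN)
```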
f (u) is a cubic with a positive u³ term, and is negative at u = ±1,
where the first term vanishes, and which are also the limits of the
physical range of values of u. If there are to be any allowed values
for u̇², f (u) must be nonnegative somewhere in u ∈ [−1, 1], so f must
look very much like what is shown.

[Figure: sketch of the cubic f (u), negative at u = ±1 and positive
between its two roots cos θmax and cos θmin in (−1, 1).]
To visualize what is happening, note that a point on the symmetry
axis moves on a sphere, with θ and φ representing the usual spherical
coordinates, as can be seen by examining what A−1 does to (0, 0, z′). So
as θ moves back and forth between θmin and θmax , the top is wobbling
closer and further from the vertical, called nutation. At the same
time, the symmetry axis is precessing, rotating about the vertical

Figure 4.3: Possible loci for a point on the symmetry axis of the top,
in three cases labeled θ = 52◦ , θ = 44◦ , and θ = θmin . The axis
nutates between θmin = 50◦ and θmax = 60◦ .

axis, at a rate φ̇ which is not constant but a function of θ (Eq. 4.36).
Qualitatively we may distinguish three kinds of motion, depending on
the values of φ̇ at the turning points in θ. These in turn depend on the
initial conditions and the parameters of the top, expressed in a, b, and
θmin , θmax . If the value of u = cos θ at which φ̇ vanishes is within the
range of nutation, then the precession will be in different directions at
θmin and θmax , and the motion is as in Fig. 4.3a. On the other hand,
if θ = cos−1 (b/a) ∉ [θmin , θmax ], the precession will always be in the
same direction, although it will speed up and slow down. We then get
a motion as in Fig. 4.3b. Finally, it is possible that cos θmin = b/a, so
that the precession stops at the top of the nutation, as in Fig. 4.3c.
This special case is of interest, because if the top’s axis is held still at
an angle to the vertical, and then released, this is the motion we will get.


                              Exercises
 4.1 Prove the following properties of matrix algebra:
(a) Matrix multiplication is associative: A · (B · C) = (A · B) · C.
(b) (A·B)T = B T ·AT , where AT is the transpose of A, that is (AT )ij := Aji .
(c) If A−1 and B −1 exist, (A · B)−1 = B −1 · A−1 .
(d) The complex conjugate of a matrix, (A∗ )ij = (Aij )∗ , is the matrix with
every element complex conjugated. The hermitean conjugate A† is the
transpose of that, A† := (A∗ )T = (AT )∗ , with (A† )ij := (Aji )∗ . Show that
(A · B)∗ = A∗ · B ∗ and (A · B)† = B † · A† .


 4.2 In section (4.1) we considered reexpressing a vector V = Σi Vi êi in
terms of new orthogonal basis vectors. If the new vectors are ê′i = Σj Aij êj ,
we can also write êi = Σj Aji ê′j , because AT = A−1 for an orthogonal
transformation.
Consider now using a new basis ê′i which are not orthonormal. Then we must
choose which of the two above expressions to generalize. Let êi = Σj Aji ê′j ,
and find the expressions for (a) ê′j in terms of êi ; (b) V ′i in terms of Vj ;
and (c) Vi in terms of V ′j . Then show (d) that if a linear transformation T
which maps vectors V → W is given in the êi basis by a matrix Bij , in that
Wi = Σj Bij Vj , then the same transformation T in the ê′i basis is given by
C = A · B · A−1 . This transformation of matrices, B → C = A · B · A−1 , for
an arbitrary invertible matrix A, is called a similarity transformation.

 4.3 Two matrices B and C are called similar if there exists an invertible
matrix A such that C = A · B · A−1 , and this transformation of B into C
is called a similarity transformation, as in the last problem. Show that, if
B and C are similar, (a) Tr B = Tr C; (b) det B = det C; (c) B and C
have the same eigenvalues; (d) If A is orthogonal and B is symmetric (or
antisymmetric), then C is symmetric (or antisymmetric).

 4.4 From the fact that A · A−1 = 1 for any invertible matrix, show that if
A(t) is a differentiable matrix-valued function of time,

                        Ȧ · A−1 = −A · (dA−1 /dt).



 4.5 Show that a counterclockwise rotation through an angle θ about an
axis in the direction of a unit vector n̂ passing through the origin is given
by the matrix

           Aij = δij cos θ + ni nj (1 − cos θ) − εijk nk sin θ.
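As a numerical sanity check on this formula (not a proof), one can verify that the matrix is orthogonal with unit determinant and leaves n̂ fixed; a sketch:

```python
import numpy as np

def rotation_matrix(n, theta):
    """A_ij = delta_ij cos(theta) + n_i n_j (1 - cos(theta)) - eps_ijk n_k sin(theta)."""
    n = np.asarray(n, dtype=float)
    eps = np.zeros((3, 3, 3))  # Levi-Civita symbol
    eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0
    eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0
    return (np.eye(3) * np.cos(theta)
            + np.outer(n, n) * (1.0 - np.cos(theta))
            - np.einsum('ijk,k->ij', eps, n) * np.sin(theta))

n = np.array([1.0, 2.0, 2.0]) / 3.0  # a unit vector
A = rotation_matrix(n, 0.9)
# A is orthogonal with det A = 1, and the axis n is left fixed
```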
 4.6 Consider a rigid body in the shape of a right circular cone of height
h and a base which is a circle of radius R, made of matter with a uniform
density ρ.
a) Find the position of the center of mass. Be sure to specify with respect
to what.
b) Find the moment of inertia tensor in some suitable, well specified
coordinate system about the center of mass.
c) Initially the cone is spinning about its symmetry axis, which is in the
z direction, with angular velocity ω0 , and with no external forces or
torques acting on it. At time t = 0 it is hit with a momentary laser pulse
which imparts an impulse P in the x direction at the apex of the cone,
as shown.
Describe the subsequent force-free motion, including, as a function of time,
the angular velocity, angular momentum, and the position of the apex, in any
inertial coordinate system you choose, provided you spell out the relation to
the initial inertial coordinate system.


 4.7 We defined the general rotation as A = Rz (ψ)·Ry (θ)·Rz (φ). Work out
the full expression for A(φ, θ, ψ), and verify the last expression in (4.29). [For
this and exercise 4.8, you might want to use a computer algebra program
such as Mathematica or Maple, if one is available.]

 4.8 Find the expression for ω in terms of φ, θ, ψ, φ̇, θ̇, ψ̇. [This can be done
simply with computer algebra programs. If you want to do this by hand, you
might find it easier to use the product form A = R3 R2 R1 , and the rather
simpler expressions for Ṙ · RT . You will still need to bring the result (for
Ṙ1 · R1T , for example) through the other rotations, which is somewhat messy.]




 4.9 A diamond shaped object is shown in top, front, and side views. It is
an octahedron, with 8 triangular flat faces. It is made of solid aluminum of
uniform density, with a total mass M . The dimensions, as shown, satisfy
h > b > a.
(a) Find the moment of inertia tensor about the center of mass, clearly
specifying the coordinate system chosen.
(b) About which lines can a stable spinning motion, with fixed ω, take
place, assuming no external forces act on the body?
 4.10 From the expression (4.38) for u̇² = f (u) in the motion of the symmetric
top, we can derive a function for the time t(u) as an indefinite integral

                        t(u) = ∫ᵘ f −1/2 (z) dz.

For values which are physically realizable, the function f has two (generically
distinct) roots, uX ≤ uN in the interval u ∈ [−1, 1], and one root uU ∈
[1, ∞), which does not correspond to a physical value of θ. The integrand
is then generically an analytic function of z with square root branch points
at uN , uX , uU , and ∞, which we can represent on a cut Riemann sheet with
cuts on the real axis, [−∞, uX ] and [uN , uU ], and f (u) > 0 for u ∈ (uX , uN ).
Taking t = 0 at the time the top is at the bottom of a wobble, θ = θmax , u =
uX , we can find the time at which it first reaches another u ∈ [uX , uN ] by
integrating along the real axis. But we could also use any other path in
the upper half plane, as the integral of a complex function is independent of
deformations of the path through regions where the function is analytic.
(a) Extend this definition to a function t(u) defined for Im u ≥ 0, with u
not on a cut, and show that the image of this function is a rectangle in the
complex t plane, and identify the pre-images of the sides. Call the width
T /2 and the height τ /2.
(b) Extend this function to the lower half of the same Riemann sheet by
allowing contour integrals passing through [uX , uN ], and show that this ex-
tends the image in t to the rectangle (0, T /2) × (−iτ /2, iτ /2).
(c) If the contour passes through the cut (−∞, uX ] onto the second Riemann
sheet, the integrand has the opposite sign from what it would have at the
corresponding point of the first sheet. Show that if the path passes onto the
second sheet in this way and reaches the point u, the value t1 (u) thus obtained
is t1 (u) = −t0 (u), where t0 (u) is the value obtained in (a) or (b) for the
same u on the first Riemann sheet.
(d) Show that passing to the second Riemann sheet by going through the
cut [uN , uU ] instead, produces a t2 (u) = t1 + T .
(e) Show that evaluating the integral along two contours, Γ1 and Γ2 , which
differ only by Γ1 circling the [uN , uU ] cut clockwise once more than Γ2 does,
gives t1 = t2 + iτ .
(f) Show that any value of t can be reached by some path, by circling the
[uN , uU ] cut as many times as necessary, and also by passing downwards
through it and upwards through the [−∞, uX ] cut as often as necessary
(perhaps reversed).
(g) Argue that this means the function u(t) is an analytic function from
the complex t plane into the u complex plane, analytic except at the points
t = nT + i(m + ½)τ , where u(t) has double poles. Note this function is
doubly periodic, with u(t) = u(t + nT + imτ ).
(h) Show that the function is then given by u = β ℘(t − iτ /2) + c, where c
is a constant, β is the constant from (4.38), and

   ℘(z) = 1/z² + Σ_{(m,n)∈ℤ², (m,n)≠(0,0)} [ 1/(z − nT − miτ )² − 1/(nT + miτ )² ]

is the Weierstrass ℘-function.
(i) Show that ℘ satisfies the differential equation

                        (℘′)² = 4℘³ − g2 ℘ − g3 ,

where

   g2 = Σ_{(m,n)≠(0,0)} (mT + inτ )−4 ,    g3 = Σ_{(m,n)≠(0,0)} (mT + inτ )−6 .


[Note that the Weierstrass function is defined more generally, using param-
eters ω1 = T /2, ω2 = iτ /2, with the ω’s permitted to be arbitrary complex
numbers with differing phases.]

 4.11 As a rotation about the origin maps the unit sphere into itself, one
way to describe rotations is as a subset of maps f : S² → S² of the (surface of
the) unit sphere into itself. Those which correspond to rotations are clearly
one-to-one, continuous, and preserve the angle between any two paths which
intersect at a point. This is called a conformal map. In addition, rotations
preserve the distances between points. In this problem we show how to
describe such mappings, and therefore give a representation for the rotations
in three dimensions.
(a) Let N be the north pole (0, 0, 1) of the unit sphere Σ = {(x, y, z), x² +
y² + z² = 1}. Define the map from the rest of the sphere s : Σ − {N } → R²
given by a stereographic projection, which maps each point on the unit
sphere, other than the north pole, into the point (u, v) in the equatorial
plane (x, y, 0) by giving the intersection with this plane of the straight line
which joins the point (x, y, z) ∈ Σ to the north pole. Find (u, v) as a function
of (x, y, z), and show that the lengths of infinitesimal paths in the vicinity
of a point are scaled by a factor 1/(1 − z) independent of direction, and
therefore that the map s preserves the angles between intersecting curves
(i.e. is conformal).
(b) Show that the map f : (u, v) → (u′ , v′ ) which results from first applying
s−1 , then a rotation, and then s, is a conformal map from R² into R², except
for the pre-image of the point which gets mapped into the north pole by the
rotation.
By a general theorem of complex variables, any such map is analytic, so
f : u + iv → u′ + iv′ is an analytic function except at the point ξ0 = u0 + iv0
which is mapped to infinity, and ξ0 is a simple pole of f . Show that f (ξ) =
(aξ + b)/(ξ − ξ0 ), for some complex a and b. This is the set of complex
Mobius transformations, which are usually rewritten as

                        f (ξ) = (αξ + β)/(γξ + δ),

where α, β, γ, δ are complex constants. An overall complex scale change does
not affect f , so the scale of these four complex constants is generally fixed
by imposing a normalizing condition αδ − βγ = 1.
(c) Show that composition of Mobius transformations, f ′′ = f ′ ◦ f : ξ → ξ′ → ξ′′,
is given by matrix multiplication,

              [ α′′  β′′ ]   [ α′  β′ ]   [ α  β ]
              [ γ′′  δ′′ ] = [ γ′  δ′ ] · [ γ  δ ] .
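The composition rule in part (c) is easy to spot-check numerically; a sketch with arbitrary sample coefficients:

```python
import numpy as np

def mobius(M, xi):
    """Apply the Mobius transformation with coefficient matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    return (a * xi + b) / (c * xi + d)

# two arbitrary invertible complex coefficient matrices (sample values)
M1 = np.array([[1 + 2j, 0.5], [1j, 2.0]])
M2 = np.array([[0.3, -1j], [2 + 1j, 1.0]])

xi = 0.7 - 0.2j
composed = mobius(M2, mobius(M1, xi))  # f'' = f' o f applied to xi
product = mobius(M2 @ M1, xi)          # transformation of the matrix product
# composed == product, as the exercise asserts
```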

(d) Not every mapping s−1 ◦ f ◦ s is a rotation, for rotations need to preserve
distances as well. We saw that an infinitesimal distance dℓ on Σ is mapped
by s to a distance |dξ| = dℓ/(1 − z). Argue that the condition that f : ξ → ξ̃
correspond to a rotation is that dℓ̃ ≡ (1 − z̃)|df /dξ||dξ| = dℓ. Express
this change of scale in terms of ξ and ξ̃ rather than z and z̃, and find
the conditions on α, β, γ, δ that insure this is true for all ξ. Together with
the normalizing condition, show that this requires the matrix for f to be a
unitary matrix with determinant 1, so that the set of rotations corresponds to
the group SU (2). The matrix elements are called Cayley-Klein parameters,
and the real and imaginary parts of them are called the Euler parameters.
Chapter 5

Small Oscillations

5.1        Small oscillations about stable equilibrium
Consider a situation with N unconstrained generalized coordinates qi
described by a mass matrix Mij and a potential U({qi }), and suppose
that U has a local minimum at some point in configuration space, qi =
qi0 . Then this point is a stable equilibrium point, for the generalized
force at that point is zero, and if the system is placed nearly at rest near
that point, it will not have enough energy to move far away from that
point. We may study the behavior of such motions by expanding the
potential1 in Taylor’s series expansion in the deviations ηi = qi − qi0 ,

   U (q1 , . . . , qN ) = U (qi0 ) + Σi (∂U/∂qi )|0 ηi + ½ Σij (∂²U/∂qi ∂qj )|0 ηi ηj + ... .

The constant U (qi0 ) is of no interest, as only changes in potential mat-
ter, so we may as well set it to zero. In the second term, − ∂U/∂qi |0
is the generalized force at the equilibrium point, so it is zero. Thus
the leading term in the expansion is the quadratic one, and we may
approximate

   U ({qi }) = ½ Σij Aij ηi ηj ,   with Aij = (∂²U/∂qi ∂qj )|0 .        (5.1)

  1 assumed to have continuous second derivatives.


Note that A is a constant symmetric real matrix.
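The matrix A is just the Hessian of U at the equilibrium point, and can be computed numerically; a minimal sketch with an assumed two-coordinate potential (illustrative, not from the text):

```python
import numpy as np

def U(q):
    """Assumed sample potential with a stationary point at q = 0 (not from the text)."""
    x, y = q
    return x**2 + x * y + 2 * y**2 + x**3 * y

def hessian(f, q0, h=1e-5):
    """Central-difference Hessian A_ij = d^2 f / dq_i dq_j at q0."""
    q0 = np.asarray(q0, dtype=float)
    n = len(q0)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                p = q0.copy()
                p[i] += di
                p[j] += dj
                return f(p)
            A[i, j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return A

A = hessian(U, [0.0, 0.0])
# A is symmetric and matches the analytic Hessian [[2, 1], [1, 4]] at the origin
```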
    The kinetic energy T = ½ Σ Mij η̇i η̇j is already second order in the
small variations from equilibrium, so we may evaluate Mij , which in
general can depend on the coordinates qi , at the equilibrium point, ig-
noring any higher order changes. Thus Mij is a constant. Both the
kinetic and potential energies are then quadratic forms in the displace-
ment η, which we think of as a vector in N -dimensional space, and we
can write the energies in matrix form

               T = ½ η̇ T · M · η̇ ,       U = ½ η T · A · η.           (5.2)
A and M are real symmetric matrices, and because any displacement
corresponds to positive kinetic and nonnegative potential energies, they
are positive (semi)definite matrices, meaning that all their eigenvalues
are greater than zero, except that A may also have eigenvalues equal
to zero (these are directions in which the stability is neutral to lowest
order, but may be determined by higher order terms in the displace-
ment).
Lagrange’s equation of motion
\[
0 = \frac{d}{dt}\frac{\partial L}{\partial\dot\eta_i} - \frac{\partial L}{\partial\eta_i}
= \frac{d}{dt}\,M\cdot\dot\eta + A\cdot\eta
= M\cdot\ddot\eta + A\cdot\eta \tag{5.3}
\]

is not necessarily diagonal in the coordinate η. We shall use the fact
that any real symmetric matrix can be diagonalized by a similarity
transformation with an orthogonal matrix to reduce the problem to a
set of independent harmonic oscillators. While both M and A can be
diagonalized by an orthogonal transformation, they can not necessarily
be diagonalized by the same one, so our procedure will be in steps:

  1. Diagonalize M with an orthogonal transformation O1 , transform-
     ing the coordinates to a new set x = O1 · η.

  2. Scale the x coordinates to reduce the mass matrix to the identity
     matrix. The new coordinates will be called y.

  3. Diagonalize the new potential energy matrix with another orthogonal
     matrix O2 , giving the final set of coordinates, ξ = O2 · y. Note
     this transformation leaves the kinetic energy matrix diagonal because
     the identity matrix is unaffected by similarity transformations.
The ξ are normal modes, modes of oscillation which are independent
in the sense that they do not affect each other.
    Let us do this in more detail. We are starting with the coordinates
η and the real symmetric matrices A and M, and we want to solve the
equations $M\cdot\ddot\eta + A\cdot\eta = 0$. In our first step, we use the matrix $O_1$,
which linear algebra guarantees exists, that makes $m = O_1\cdot M\cdot O_1^{-1}$
diagonal. Note $O_1$ is time-independent, so defining $x_i = \sum_j O_{1\,ij}\,\eta_j$ also
gives $\dot x_i = \sum_j O_{1\,ij}\,\dot\eta_j$, and
\[
\begin{aligned}
T &= \frac{1}{2}\dot\eta^T\cdot M\cdot\dot\eta\\
  &= \frac{1}{2}\dot\eta^T\cdot O_1^{-1}\cdot m\cdot O_1\cdot\dot\eta\\
  &= \frac{1}{2}\dot\eta^T\cdot O_1^{T}\cdot m\cdot (O_1\cdot\dot\eta)\\
  &= \frac{1}{2}(O_1\cdot\dot\eta)^T\cdot m\cdot (O_1\cdot\dot\eta)\\
  &= \frac{1}{2}\dot x^T\cdot m\cdot\dot x.
\end{aligned}
\]
                                                                −1
Similarly the potential energy becomes U = 1 xT · O1 · A · O1 · x. We
                                                  2
know that the matrix m is diagonal, and the diagonal elements mii
are all strictly positive. To begin the second step, define the diagonal
                √
matrix Sij = mii δij and new coordinates yi = Sii xi = j Sij xj , or
y = S · x. Now m = S 2 = S T · S, so T = 1 xT · m · x = 1 xT · S T · S · x =
                                             2
                                                ˙         ˙ 2
                                                              ˙          ˙
1        T             1 T
2
  (S · x) · S · x = 2 y · y. In terms of y, the potential energy is
       ˙         ˙       ˙    ˙
      1 T
U = 2 y · B · y, where
                    B = S −1 · O1 · A · O1 · S −1
                                         −1


is still a symmetric matrix.
    Finally, let $O_2$ be an orthogonal matrix which diagonalizes B, so
$C = O_2\cdot B\cdot O_2^{-1}$ is diagonal, and let $\xi = O_2\cdot y$. Just as in the first
step,
\[
U = \frac{1}{2}\xi^T\cdot O_2\cdot B\cdot O_2^{-1}\cdot\xi = \frac{1}{2}\xi^T\cdot C\cdot\xi,
\]

while the kinetic energy
\[
T = \frac{1}{2}\dot y^T\cdot\dot y = \frac{1}{2}\dot y^T\cdot O_2^T\cdot O_2\cdot\dot y
= \frac{1}{2}\dot\xi^T\cdot\dot\xi
\]
is still diagonal. Because the potential energy must still be nonnegative,
all the diagonal elements $C_{ii}$ are nonnegative, and we will call them
$\omega_i := \sqrt{C_{ii}}$. Then
\[
T = \frac{1}{2}\sum_j \dot\xi_j^2, \qquad
U = \frac{1}{2}\sum_j \omega_j^2\xi_j^2, \qquad
\ddot\xi_j + \omega_j^2\xi_j = 0,
\]

so we have N independent harmonic oscillators with the solutions

\[
\xi_j = \mathrm{Re}\, a_j e^{i\omega_j t},
\]

with some arbitrary complex numbers aj .
   To find what the solution looks like in terms of the original coordi-
nates $q_i$, we need to undo all these transformations. As $\xi = O_2\cdot y =
O_2\cdot S\cdot x = O_2\cdot S\cdot O_1\cdot\eta$, we have
\[
q = q_0 + O_1^{-1}\cdot S^{-1}\cdot O_2^{-1}\cdot\xi.
\]


    We have completely solved this very general problem in small os-
cillations, at least in the sense that we have reduced it to a solvable
problem of diagonalizing symmetric real matrices. What we have done
may appear abstract and formal and devoid of physical insight, but it
is a general algorithm which will work on a very wide class of problems
of small oscillations about equilibrium. In fact, because diagonalizing
matrices is something for which computer programs are available, this
is even a practical method for solving such systems, even if there are
dozens of interacting particles.
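The three-step procedure described above can indeed be run directly on a computer. The following is a minimal sketch using NumPy; the function name `normal_modes` and the two-mass test case are illustrative choices of mine, not from the text:

```python
import numpy as np

def normal_modes(M, A):
    """Normal-mode frequencies and shapes for T = (1/2) eta'^T M eta',
    U = (1/2) eta^T A eta, following the text's three steps."""
    # Step 1: diagonalize M. Columns of V are its eigenvectors, so
    # O1 = V^T makes O1 . M . O1^{-1} diagonal.
    m_diag, V = np.linalg.eigh(M)
    O1 = V.T
    # Step 2: scale so the mass matrix becomes the identity.
    S_inv = np.diag(1.0 / np.sqrt(m_diag))
    B = S_inv @ O1 @ A @ O1.T @ S_inv      # still symmetric
    # Step 3: diagonalize B; its eigenvalues are the squared frequencies.
    omega2, W = np.linalg.eigh(B)
    # Undo the transformations: eta = O1^{-1} . S^{-1} . O2^{-1} . xi.
    modes = O1.T @ S_inv @ W
    return np.sqrt(np.clip(omega2, 0.0, None)), modes

# Sanity check: two equal masses m tied to walls and to each other by
# springs (k, k', k); expect frequencies sqrt(k/m) and sqrt((k + 2k')/m).
m, k, kp = 1.0, 1.0, 0.5
M = np.diag([m, m])
A = np.array([[k + kp, -kp], [-kp, k + kp]])
omegas, modes = normal_modes(M, A)
```

The columns of `modes` give the displacement patterns of each normal mode in the original coordinates.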

5.1.1     Molecular Vibrations
Consider a molecule made up of n atoms. We need to choose the right
level of description to understand low energy excitations. We do not
want to describe the molecule in terms of quarks, gluons, and leptons.
Nor do we need to consider all the electronic motion, which is gov-
erned by quantum mechanics. The description we will use, called the
Born-Oppenheimer approximation, is to model the nuclei as clas-
sical particles. The electrons, which are much lighter, move around
much more quickly and cannot be treated classically; we assume that
for any given configuration of the nuclei, the electrons will almost in-
stantaneously find a quantum-mechanical ground state, which will have
an energy which depends on the current positions of the nuclei. This
is then a potential energy when considering the nuclear motion. The
nuclei themselves will be considered point particles, and we will ignore
internal quantum-mechanical degrees of freedom such as nuclear spins.
So we are considering n point particles moving in three dimensions,
with some potential about which we know only qualitative features.
There are 3n degrees of freedom. Of these, 3 are the center of mass
motion, which, as there are no external forces, is simply motion at
constant velocity. Some of the degrees of freedom describe rotational
modes, i.e. motions that the molecule could have as a rigid body. For
a generic molecule this would be three degrees of freedom, but if the
equilibrium configuration of the molecule is linear, rotation about that
line is not a degree of freedom, and so only two of the degrees of freedom
are rotations in that case. The remaining degrees of freedom, 3n − 6
for noncollinear and 3n − 5 for collinear molecules, are vibrations.




  Figure 5.1: Some simple molecules ($O_2$, $CO_2$, $H_2O$) in their equilibrium positions.

   For a collinear molecule, it makes sense to divide the vibrations into
transverse and longitudinal ones. Considering motion in one dimension
only, the nuclei have n degrees of freedom, one of which is a center-of-
mass motion, leaving n − 1 longitudinal vibrations. So the remaining
(3n−5)−(n−1) = 2(n−2) vibrational degrees of freedom are transverse
vibrational modes. There are no such modes for a diatomic molecule.

Example: CO2
Consider first the CO2 molecule. As it is a molecule, there must be a
position of stable equilibrium, and empirically we know it to be collinear
and symmetric, which one might have guessed. We will first consider
only collinear motions of the molecule. If the oxygens have coordinates
q1 and q2 , and the carbon q3 , the potential depends on q1 −q3 and q2 −q3
in the same way, so the equilibrium positions have q2 −q3 = −(q1 −q3 ) =
b. Assuming no direct force between the two oxygen atoms, the one
dimensional motion may be described near equilibrium by
\[
\begin{aligned}
U &= \frac{1}{2}k(q_3 - q_1 - b)^2 + \frac{1}{2}k(q_2 - q_3 - b)^2,\\
T &= \frac{1}{2}m_O\dot q_1^2 + \frac{1}{2}m_O\dot q_2^2 + \frac{1}{2}m_C\dot q_3^2.
\end{aligned}
\]
We gave our formal solution in terms of displacements from the equilib-
rium position, but we now have a situation in which there is no single
equilibrium position, as the problem is translationally invariant, and
while equilibrium has constraints on the differences of q’s, there is no
constraint on the center of mass. We can treat this in two different
ways:
  1. Explicitly fix the center of mass, eliminating one of the degrees
     of freedom.
  2. Pick an equilibrium position arbitrarily. While the deviation of
     the center-of-mass position from equilibrium is not confined
     to small excursions, the quadratic approximation is still exact.
    First we follow the first method. We can always work in a frame
where the center of mass is at rest, at the origin. Then mO (q1 + q2 ) +
mC q3 = 0 is a constraint, which we must eliminate. We can do so by
dropping q3 as an independent degree of freedom, and we have, in terms
of the two displacements from equilibrium η1 = q1 + b and η2 = q2 − b,
q3 = −(η1 + η2 )mO /mC , and
\[
\begin{aligned}
T &= \frac{1}{2}m_O(\dot\eta_1^2 + \dot\eta_2^2) + \frac{1}{2}m_C\dot\eta_3^2
   = \frac{1}{2}m_O\left[\dot\eta_1^2 + \dot\eta_2^2 + \frac{m_O}{m_C}(\dot\eta_1 + \dot\eta_2)^2\right]\\
  &= \frac{1}{2}\frac{m_O^2}{m_C}\,(\,\dot\eta_1\ \ \dot\eta_2\,)
     \begin{pmatrix} 1 + m_C/m_O & 1\\ 1 & 1 + m_C/m_O \end{pmatrix}
     \begin{pmatrix} \dot\eta_1\\ \dot\eta_2 \end{pmatrix}.
\end{aligned}
\]

Now T is not diagonal, or more precisely M isn’t. We must find the
orthogonal matrix $O_1$ such that $O_1\cdot M\cdot O_1^{-1}$ is diagonal. We may
assume it to be a rotation, which can only be
\[
O = \begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{pmatrix}
\]
for some value of θ. It is worthwhile to derive a formula for diagonalizing
a general real symmetric 2 × 2 matrix and then plug in our particular
form. Let
\[
M = \begin{pmatrix} a & b\\ b & d \end{pmatrix}, \qquad\text{and}\qquad
O = \begin{pmatrix} c & -s\\ s & c \end{pmatrix},
\]
where we have abbreviated s = sin θ, c = cos θ. We will require the
matrix element $m_{12} = (O\cdot M\cdot O^{-1})_{12} = 0$, because m is diagonal. This
determines θ:
\[
O\cdot M\cdot O^{-1}
= \begin{pmatrix} c & -s\\ s & c \end{pmatrix}
  \begin{pmatrix} a & b\\ b & d \end{pmatrix}
  \begin{pmatrix} c & s\\ -s & c \end{pmatrix}
= \begin{pmatrix} c & -s\\ s & c \end{pmatrix}
  \begin{pmatrix} \cdot & as + bc\\ \cdot & bs + cd \end{pmatrix}
= \begin{pmatrix} \cdot & acs + bc^2 - bs^2 - scd\\ \cdot & \cdot \end{pmatrix},
\]
where we have placed a · in place of matrix elements we don’t need to
calculate. Thus the condition on θ is
\[
(a - d)\sin\theta\cos\theta + b(\cos^2\theta - \sin^2\theta)
= 0
= \frac{1}{2}(a - d)\sin 2\theta + b\cos 2\theta,
\]
or
\[
\tan 2\theta = \frac{-2b}{a - d}.
\]
Notice this determines 2θ only modulo π, and therefore θ modulo 90◦ ,
which ought to be expected, as a rotation through 90◦ only interchanges
axes and reverses directions, both of which leave a diagonal matrix
diagonal.
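The tan 2θ formula is easy to verify numerically. A sketch (the helper name and the sample matrix entries are arbitrary choices of mine):

```python
import numpy as np

def diagonalizing_rotation(a, b, d):
    """Rotation angle that zeroes the off-diagonal element of
    [[a, b], [b, d]] under O . M . O^{-1}, from tan 2θ = -2b/(a - d)."""
    theta = 0.5 * np.arctan2(-2.0 * b, a - d)  # arctan2 also handles a = d
    c, s = np.cos(theta), np.sin(theta)
    return theta, np.array([[c, -s], [s, c]])

a, b, d = 3.0, 1.0, 1.0
theta, O = diagonalizing_rotation(a, b, d)
M = np.array([[a, b], [b, d]])
m = O @ M @ O.T          # O^{-1} = O^T for a rotation; m should be diagonal
```

Note that for a = d, `arctan2` returns 2θ = ±π/2, reproducing the θ = π/4 (mod 90°) case used below.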
   In our case a = d, so tan 2θ = ∞, and θ = π/4. As $x = O_1\cdot\eta$,
\[
\begin{pmatrix} x_1\\ x_2 \end{pmatrix}
= \begin{pmatrix} \cos\pi/4 & -\sin\pi/4\\ \sin\pi/4 & \cos\pi/4 \end{pmatrix}
  \begin{pmatrix} \eta_1\\ \eta_2 \end{pmatrix}
= \frac{1}{\sqrt{2}}\begin{pmatrix} \eta_1 - \eta_2\\ \eta_1 + \eta_2 \end{pmatrix},
\]
and inversely
\[
\begin{pmatrix} \eta_1\\ \eta_2 \end{pmatrix}
= \frac{1}{\sqrt{2}}\begin{pmatrix} x_1 + x_2\\ -x_1 + x_2 \end{pmatrix}.
\]

Then
\[
\begin{aligned}
T &= \frac{1}{2}m_O\left[\frac{(\dot x_1 + \dot x_2)^2}{2} + \frac{(\dot x_1 - \dot x_2)^2}{2}
     + \frac{m_O}{m_C}\left(\sqrt{2}\,\dot x_2\right)^2\right]\\
  &= \frac{1}{2}m_O\dot x_1^2 + \frac{1}{2}m_O\left(1 + \frac{2m_O}{m_C}\right)\dot x_2^2,\\
U &= \frac{1}{2}k(q_3 - q_1 - b)^2 + \frac{1}{2}k(q_2 - q_3 - b)^2\\
  &= \frac{1}{2}k\left[\left(\eta_1 + \frac{m_O}{m_C}(\eta_1 + \eta_2)\right)^2
     + \left(\eta_2 + \frac{m_O}{m_C}(\eta_1 + \eta_2)\right)^2\right]\\
  &= \frac{1}{2}k\left[\eta_1^2 + \eta_2^2 + \frac{2m_O^2}{m_C^2}(\eta_1 + \eta_2)^2
     + \frac{2m_O}{m_C}(\eta_1 + \eta_2)^2\right]\\
  &= \frac{1}{2}k\left[x_1^2 + x_2^2 + \frac{4m_O}{m_C^2}(m_O + m_C)\,x_2^2\right]\\
  &= \frac{1}{2}k x_1^2 + \frac{1}{2}k\left(\frac{m_C + 2m_O}{m_C}\right)^2 x_2^2.
\end{aligned}
\]
Thus U is already diagonal and we don’t need to go through steps 2 and
3, the scaling and second orthogonalization, except to note that if we
skip the scaling the angular frequencies are given by $\omega_i^2 =$ (coefficient
in U)/(coefficient in T). Thus we have one normal mode, $x_1$, with
$\omega_1 = \sqrt{k/m_O}$, with $x_2 = 0$, $\eta_1 = -\eta_2$, $q_3 = 0$, in which the two oxygens
vibrate in and out together, symmetrically about the carbon, which
doesn’t move. We also have another mode, $x_2$, with
\[
\omega_2 = \sqrt{\frac{k(m_C + 2m_O)^2/m_C^2}{m_O(1 + 2m_O/m_C)}}
= \sqrt{\frac{k(m_C + 2m_O)}{m_O m_C}},
\]
with x1 = 0, η1 = η2 , in which the two oxygens move right or left
together, with the carbon moving in the opposite direction.
   We have successfully solved for the longitudinal vibrations by elimi-
nating one of the degrees of freedom. Let us now try the second method,
in which we choose an arbitrary equilibrium position q1 = −b, q2 = b,
q3 = 0. Then
\[
\begin{aligned}
T &= \frac{1}{2}m_O(\dot\eta_1^2 + \dot\eta_2^2) + \frac{1}{2}m_C\dot\eta_3^2,\\
U &= \frac{1}{2}k\left[(\eta_1 - \eta_3)^2 + (\eta_2 - \eta_3)^2\right].
\end{aligned}
\]
T is already diagonal, so $O_1 = \mathbb{1}$, $x = \eta$. In the second step $S$ is the
diagonal matrix with $S_{11} = S_{22} = \sqrt{m_O}$, $S_{33} = \sqrt{m_C}$, and $y_i = \sqrt{m_O}\,\eta_i$
for $i = 1, 2$, and $y_3 = \sqrt{m_C}\,\eta_3$. Then
\[
\begin{aligned}
U &= \frac{1}{2}k\left[\left(\frac{y_1}{\sqrt{m_O}} - \frac{y_3}{\sqrt{m_C}}\right)^2
   + \left(\frac{y_2}{\sqrt{m_O}} - \frac{y_3}{\sqrt{m_C}}\right)^2\right]\\
  &= \frac{1}{2}\frac{k}{m_O m_C}\left[m_C y_1^2 + m_C y_2^2 + 2m_O y_3^2
   - 2\sqrt{m_O m_C}\,(y_1 + y_2)y_3\right].
\end{aligned}
\]

Thus the matrix B is
\[
B = \frac{k}{m_O m_C}\begin{pmatrix}
m_C & 0 & -\sqrt{m_O m_C}\\
0 & m_C & -\sqrt{m_O m_C}\\
-\sqrt{m_O m_C} & -\sqrt{m_O m_C} & 2m_O
\end{pmatrix},
\]
                                                       √      √     √
which is singular, as it annihilates the vector y T = ( mO , mO , mC ),
which corresponds to η T = (1, 1, 1), i.e. all the nuclei are moving by the
same amount, or the molecule is translating rigidly. Thus this vector
corresponds to a zero eigenvalue of U, and a harmonic oscillation of
zero frequency. This is free motion2 , ξ = ξ0 + vt. The other two modes
can be found by diagonalizing the matrix, and will be as we found by
the other method.
    ² To see that linear motion is a limiting case of harmonic motion as ω → 0, we
need to choose the complex coefficient to be a function of ω, $A(\omega) = x_0 - iv_0/\omega$, with
$x_0$ and $v_0$ real. Then $x(t) = \lim_{\omega\to 0}\mathrm{Re}\,A(\omega)e^{i\omega t}
= x_0 + v_0\lim_{\omega\to 0}\sin(\omega t)/\omega = x_0 + v_0 t$.
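The second method lends itself to a direct numeric check. A sketch (the masses and k below are arbitrary illustrative values, not physical constants): diagonalizing B should yield one zero mode plus the two frequencies found above.

```python
import numpy as np

mO, mC, k = 16.0, 12.0, 1.0   # illustrative values only

# U = (k/2)[(eta1 - eta3)^2 + (eta2 - eta3)^2], T diagonal in eta.
M = np.diag([mO, mO, mC])
A = k * np.array([[ 1.0,  0.0, -1.0],
                  [ 0.0,  1.0, -1.0],
                  [-1.0, -1.0,  2.0]])
S_inv = np.diag(1.0 / np.sqrt(np.diag(M)))
B = S_inv @ A @ S_inv          # the matrix B of the text
omegas = np.sqrt(np.clip(np.linalg.eigvalsh(B), 0.0, None))
# Expect 0 (rigid translation), sqrt(k/mO),
# and sqrt(k (mC + 2 mO) / (mO mC)).
```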
Transverse motion
What about the transverse motion? Consider the equilibrium position
of the molecule to lie in the x direction, and consider small deviations
in the z direction. The kinetic energy
\[
T = \frac{1}{2}m_O\dot z_1^2 + \frac{1}{2}m_O\dot z_2^2 + \frac{1}{2}m_C\dot z_3^2
\]
is already diagonal, just as for the longitudinal modes in the second
method. Any potential energy must be due to a resistance to bending,
so to second order,
\[
U \propto (\psi - \theta)^2 \sim (\tan\psi - \tan\theta)^2
= \left[(z_2 - z_3)/b + (z_1 - z_3)/b\right]^2
= b^{-2}(z_1 + z_2 - 2z_3)^2.
\]
[Figure: the molecule bent out of line, with transverse displacements z1, z2, z3 and the two bonds of length b making angles θ and ψ with the axis.]
Note that the potential energy is proportional to the square of a single
linear combination of the displacements, or to the square of one
component (with respect to a particular direction) of the displacement.
Therefore there is no contribution of the two orthogonal directions, and
there are two zero modes, or two degrees of freedom with no restoring
force. One of these is the center of mass motion, z1 = z2 = z3 , and
the other is the third direction in the abstract space of possible dis-
placements, z T = (1, −1, 0), with z1 = −z2 , z3 = 0, which we see is a
rotation. Thus there remains only one true transverse vibrational mode
in the z direction, and also one in the y direction, which together with
the two longitudinal ones we found earlier, make up the 3n − 5 = 4
vibrational modes we expected for a collinear molecule.
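The zero-mode counting for the transverse problem can also be checked numerically. A sketch with arbitrary illustrative constants (`stiff` stands in for the unspecified bending coefficient, with b⁻² absorbed):

```python
import numpy as np

mO, mC, stiff = 16.0, 12.0, 1.0        # illustrative values only

# U proportional to (z1 + z2 - 2 z3)^2: a rank-one potential matrix.
v = np.array([1.0, 1.0, -2.0])
A = stiff * np.outer(v, v)
M = np.diag([mO, mO, mC])
S_inv = np.diag(1.0 / np.sqrt(np.diag(M)))
omega2 = np.linalg.eigvalsh(S_inv @ A @ S_inv)   # ascending order
# Two zero eigenvalues: translation (1, 1, 1) and rotation (1, -1, 0).
# One positive eigenvalue: the single true transverse vibration.
```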
    You might ask whether these oscillations we have discussed are in
any way observable. Quantum mechanically, a harmonic oscillator can
only be in states with excitation energy $E = n\hbar\omega$, where $n \in \mathbb{Z}$ is an
integer and $2\pi\hbar$ is Planck’s constant. When molecules are in an excited
state, they can emit a photon while changing to a lower energy state.
The energy of the photon, which is the amount lost by the molecule,
is proportional to the frequency, $\Delta E = 2\pi\hbar f$, so by measuring the
wavelength of the emitted light, we can determine the vibrational frequencies
of the molecules. So the calculations we have done, and many
others for which we have built the apparatus, are in fact very practical
tools for molecular physics.

5.1.2     An Alternative Approach
The step by step diagonalization we just gave is not the easiest approach
to solving the linear differential equation (5.3). Solutions to linear
differential equations are subject to superposition, and equations with
coefficients independent of time are simplified by Fourier transform, so
we can express the N-dimensional vector of functions $\eta_i(t)$ as
\[
\eta_i(t) = \int_{-\infty}^{\infty} d\omega\, f_i(\omega)\, e^{-i\omega t}.
\]
Then the Lagrange equations become
\[
\int_{-\infty}^{\infty} d\omega \left(A_{ij} - \omega^2 M_{ij}\right) f_j(\omega)\, e^{-i\omega t} = 0
\quad\text{for all } t.
\]
But the $e^{-i\omega t}$ are linearly independent functions of $t \in \mathbb{R}$, so
\[
\left(A_{ij} - \omega^2 M_{ij}\right) f_j(\omega) = 0.
\]
This implies $f_j(\omega) = 0$ except when the matrix $A_{ij} - \omega^2 M_{ij}$ is singular,
$\det(A_{ij} - \omega^2 M_{ij}) = 0$, which gives a discrete set of angular frequencies
$\omega_1 \ldots \omega_N$, and for each an eigenvector $f_j$.
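The condition det(A − ω²M) = 0 is a generalized symmetric eigenvalue problem, which numerical libraries solve directly. A sketch using SciPy (the coupled-oscillator example and its constants are mine, not from the text):

```python
import numpy as np
from scipy.linalg import eigh

# Solve A . f = omega^2 M . f for a pair of coupled oscillators:
# two masses m on springs k to the walls, coupled to each other by kc.
m, k, kc = 1.0, 4.0, 1.0               # illustrative values only
M = np.diag([m, m])
A = np.array([[k + kc, -kc], [-kc, k + kc]])
omega2, f = eigh(A, M)                 # generalized eigenproblem
omegas = np.sqrt(omega2)
# Expect sqrt(k/m) = 2 and sqrt((k + 2 kc)/m) = sqrt(6).
```

`scipy.linalg.eigh(A, M)` requires M positive definite, which a physical mass matrix is; the columns of `f` are the eigenvectors $f_j$ of the text.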


5.2      Other interactions
In our treatment we assumed a Lagrangian formulation with a kinetic
term purely quadratic in $\dot q$, together with a velocity independent potential.
There is a wider scope of small oscillation problems which might include
dissipative forces like friction, or external time-dependent forces,
or perhaps terms in the Lagrangian linear in the velocities. An example
of the latter occurs in rotating reference frames, from the Coriolis
force, and is important in the question of whether there is a gravitation-
ally stable location for small objects caught between the Earth and the
moon at the “L5” point. Each of these complications introduces terms,
even in the linear approximation to the equations of motion, which can-
not be diagonalized away, because there is not significant freedom of
diagonalization left, in general, after having simplified T and U. Thus
the approach of section 5.1 does not generalize well, but the approach
of section 5.1.2 can be applied.


5.3      String dynamics
In this section we consider two closely related problems, transverse os-
cillations of a stretched loaded string, and of a stretched heavy string.
The latter is a limiting case of the former. This will provide an in-
troduction to field theory, in which the dynamical degrees of freedom
are not a discrete set but are defined at each point in space. In Chap-
ter 8 we will discuss more interesting and involved cases such as the
electromagnetic field, where at each point in space we have E and B
as degrees of freedom, though not without constraints.
    The loaded string we will consider is a light string under tension τ
stretched between two fixed points a distance $\ell$ apart, say at x = 0 and
$x = \ell$. On the string, at points x = a, 2a, 3a, . . . , na, are fixed n particles
each of mass m, with the first and last a distance a away from the
fixed ends. Thus $\ell = (n + 1)a$. We will consider only small transverse
motion of these masses, using yi as the transverse displacement of the
i’th mass, which is at x = ia. We assume all excursions from the equilibrium
positions yi = 0 are small, and in particular that the difference in
successive displacements $y_{i+1} - y_i \ll a$. Thus we are assuming that the
angle made by each segment of the string, $\theta_i = \tan^{-1}[(y_{i+1} - y_i)/a] \ll 1$.
Working to first order in the θ’s in the equations of motion, and sec-
ond order for the Lagrangian, we see that restricting our attention to
transverse motions and requiring no horizontal motion forces taking the
tension τ to be constant along the string. The transverse force on the
i’th mass is thus
\[
F_i = \tau\frac{y_{i+1} - y_i}{a} + \tau\frac{y_{i-1} - y_i}{a}
= \frac{\tau}{a}(y_{i+1} - 2y_i + y_{i-1}).
\]
The potential energy $U(y_1, \ldots, y_n)$ then satisfies
\[
\frac{\partial U}{\partial y_i} = -\frac{\tau}{a}(y_{i+1} - 2y_i + y_{i-1}),
\]
so
\[
\begin{aligned}
U(y_1, \ldots, y_i, \ldots, y_n)
 &= \int_0^{y_i} dy_i\,\frac{\tau}{a}(2y_i - y_{i+1} - y_{i-1})
    + F(y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n)\\
 &= \frac{\tau}{a}\left[y_i^2 - (y_{i+1} + y_{i-1})y_i\right]
    + F(y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n)\\
 &= \frac{\tau}{2a}\left[(y_{i+1} - y_i)^2 + (y_i - y_{i-1})^2\right]
    + F'(y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n)\\
 &= \sum_{i=0}^{n}\frac{\tau}{2a}(y_{i+1} - y_i)^2 + \text{constant}.
\end{aligned}
\]
The F and F′ are unspecified functions of all the yj ’s except yi. In the
last expression we satisfied the condition for all i, and we have used
the convenient definition y0 = yn+1 = 0. We can and will drop the
arbitrary constant.
    The kinetic energy is simply $T = \frac{1}{2}m\sum_1^n \dot y_i^2$, so the mass matrix is
already proportional to the identity matrix and we do not need to go
through the first two steps of our general process. The potential energy
$U = \frac{1}{2}y^T\cdot A\cdot y$ has a non-diagonal n × n matrix
                                                                         
\[
A = -\frac{\tau}{a}\begin{pmatrix}
-2 & 1 & 0 & 0 & \cdots & 0 & 0\\
1 & -2 & 1 & 0 & \cdots & 0 & 0\\
0 & 1 & -2 & 1 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & 0 & 0 & \cdots & -2 & 1\\
0 & 0 & 0 & 0 & \cdots & 1 & -2
\end{pmatrix}.
\]

Diagonalizing even a 3 × 3 matrix is work, so an n × n matrix might
seem out of the question, without some hints from the physics of the
situation. In this case the hint comes in a roundabout fashion — we will
first consider a limit in which n → ∞, the continuum limit, which
leads to an interesting physical situation in its own right.
    Suppose we consider the loaded string problem in the limit that the
spacing a becomes very small, but the number of masses becomes
large, keeping the total length $\ell$ of the string fixed. If at the same
time we adjust the individual masses so that the mass per unit length,
ρ, is fixed, our bumpy string gets smoothed out in the limit, and we
might expect that in this limit we reproduce the physical problem of
transverse modes of a uniformly dense stretched string, like a violin
string. Thus we wish to consider the limit
\[
a \to 0, \qquad n \to \infty, \qquad \ell = (n + 1)a \text{ fixed}, \qquad m \to 0, \qquad \rho = m/a \text{ fixed}.
\]
It is natural to think of the degrees of freedom as associated with the
label x rather than i, so we redefine the dynamical functions {yj (t)}
as y(x, t), with y(ja, t) = yj (t). While this only defines the function at
discrete points in x, these are closely spaced for small a and become
dense as a → 0. We will assume that the function y(x) is twice differ-
entiable in the continuum limit, though we shall see that this is not the
case for all possible motions of the discrete system.
    What happens to the kinetic and potential energies in this limit?
For the kinetic energy,
\[
T = \frac{1}{2}m\sum_i \dot y_i^2
= \frac{1}{2}\rho\sum_i a\,\dot y^2(x_i)
= \frac{1}{2}\rho\sum_i \Delta x\,\dot y^2(x_i)
\to \frac{1}{2}\rho\int_0^{\ell} dx\,\dot y^2(x),
\]

where the next to last expression is just the definition of a Riemann
integral. For the potential energy,
\[
U = \frac{\tau}{2a}\sum_i (y_{i+1} - y_i)^2
= \frac{\tau}{2}\sum_i \Delta x\left(\frac{y_{i+1} - y_i}{\Delta x}\right)^2
\to \frac{\tau}{2}\int_0^{\ell} dx\left(\frac{\partial y}{\partial x}\right)^2.
\]
The equation of motion for $y_i$ is
\[
m\ddot y_i = \frac{\partial L}{\partial y_i} = -\frac{\partial U}{\partial y_i}
= \frac{\tau}{a}\left[(y_{i+1} - y_i) - (y_i - y_{i-1})\right],
\]
or
\[
\rho a\,\ddot y(x) = \frac{\tau}{a}\left(\left[y(x+a) - y(x)\right] - \left[y(x) - y(x-a)\right]\right).
\]
We need to be careful about taking the limit
\[
\frac{y(x+a) - y(x)}{a} \to \frac{\partial y}{\partial x}
\]
because we are subtracting two such expressions evaluated at nearby
points, and because we will need to divide by a again to get an equation
between finite quantities. Thus we note that
\[
\frac{y(x+a) - y(x)}{a} = \left.\frac{\partial y}{\partial x}\right|_{x+a/2} + O(a^2),
\]
so
\[
\rho\ddot y(x) = \frac{\tau}{a}\left[\frac{y(x+a) - y(x)}{a} - \frac{y(x) - y(x-a)}{a}\right]
\approx \frac{\tau}{a}\left[\left.\frac{\partial y}{\partial x}\right|_{x+a/2}
- \left.\frac{\partial y}{\partial x}\right|_{x-a/2}\right]
\to \tau\,\frac{\partial^2 y}{\partial x^2},
\]

and we wind up with the wave equation for transverse waves on a
massive string
\[
\frac{\partial^2 y}{\partial t^2} - c^2\frac{\partial^2 y}{\partial x^2} = 0,
\qquad\text{where}\qquad
c = \sqrt{\frac{\tau}{\rho}}.
\]
    Solving this wave equation is very simple. For the fixed boundary conditions $y(x) = 0$ at $x = 0$ and $x = \ell$, the solution is a Fourier expansion
\[
y(x, t) = \sum_{p=1}^{\infty} \mathrm{Re}\,B_p\, e^{ick_p t}\sin k_p x,
\]
where $k_p = p\pi/\ell$. Each $p$ represents one normal mode, and there are
an infinite number as we would expect because in the continuum limit
there are an infinite number of degrees of freedom.
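As a quick sanity check (a sketch with made-up values of $\tau$, $\rho$, and $\ell$; none of the numbers come from the text), a single real term of this expansion should satisfy the wave equation and the boundary condition, which finite differences confirm:

```python
import math

tau, rho, ell = 2.0, 0.5, 1.0      # made-up values, for illustration only
c = math.sqrt(tau / rho)
p, B = 3, 1.0
kp = p * math.pi / ell

def y(x, t):
    # one real term of the expansion: Re[B e^{i c kp t}] sin(kp x)
    return B * math.cos(c * kp * t) * math.sin(kp * x)

# second derivatives by central differences
h = 1e-4
x0, t0 = 0.3, 0.7
y_tt = (y(x0, t0 + h) - 2 * y(x0, t0) + y(x0, t0 - h)) / h**2
y_xx = (y(x0 + h, t0) - 2 * y(x0, t0) + y(x0 - h, t0)) / h**2

# wave-equation residual, which should vanish up to finite-difference error
residual = y_tt - c**2 * y_xx
print(residual)
```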
    We have certainly not shown that y(x) = B sin kx is a normal mode
for the problem with finite n, but it is worth checking it out. This
corresponds to a mode with yj = B sin kaj, on which we apply the
matrix A
\[
\begin{aligned}
(A\cdot y)_i = \sum_j A_{ij}y_j &= -\frac{\tau}{a}\bigl(y_{i+1} - 2y_i + y_{i-1}\bigr)\\
&= -\frac{\tau}{a}B\bigl(\sin(kai + ka) - 2\sin(kai) + \sin(kai - ka)\bigr)\\
&= -\frac{\tau}{a}B\bigl(\sin(kai)\cos(ka) + \cos(kai)\sin(ka) - 2\sin(kai)\\
&\qquad\qquad + \sin(kai)\cos(ka) - \cos(kai)\sin(ka)\bigr)\\
&= \frac{\tau}{a}B\bigl(2 - 2\cos(ka)\bigr)\sin(kai)\\
&= \frac{2\tau}{a}\bigl(1 - \cos(ka)\bigr)\,y_i.
\end{aligned}
\]
So we see that it is a normal mode, although the frequency of oscillation
\[
\omega = \sqrt{\frac{2\tau}{am}\bigl(1 - \cos(ka)\bigr)} = 2\sqrt{\frac{\tau}{\rho}}\,\frac{\sin(ka/2)}{a}
\]
differs from $k\sqrt{\tau/\rho}$ except in the limit $a \to 0$ for fixed $k$.
    The $k$'s which index the normal modes are restricted by the fixed ends to the discrete set $k = p\pi/\ell = p\pi/(n+1)a$, but this is still too many ($\infty$) for a system with a finite number of degrees of freedom. The resolution of this paradox is that not all different $k$'s correspond to different modes. For example, if $p' = p + 2m(n+1)$ for some integer $m$, then $k' = k + 2\pi m/a$, and $\sin(k'aj) = \sin(kaj + 2mj\pi) = \sin(kaj)$, so $k$ and $k'$ represent the same normal mode. Also, if $p' = 2(n+1) - p$, $k' = (2\pi/a) - k$, $\sin(k'aj) = \sin(2\pi j - kaj) = -\sin(kaj)$, so $k$ and $k'$ represent the same normal mode, with opposite phase. Finally $p = n + 1$, $k = \pi/a$ gives $y_j = B\sin(kaj) = 0$ for all $j$ and is not a normal mode. This leaves as independent only $p = 1, \ldots, n$, the right number of normal modes for a system with $n$ degrees of freedom.
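The mode counting above can be confirmed numerically. This sketch (with arbitrary values of $\tau$, $m$, and $a$, and $n = 8$, all our own choices) applies the matrix $A$ to $y_j = \sin(kaj)$ for each independent $p$, checking both the eigenvalue $(2\tau/a)(1 - \cos ka)$ and the frequency formula:

```python
import math

tau, m, a = 1.3, 0.7, 0.2    # arbitrary illustrative values
n = 8                        # number of masses on the loaded string

def apply_A(y):
    # (A.y)_i = -(tau/a)(y_{i+1} - 2 y_i + y_{i-1}), with fixed ends y_0 = y_{n+1} = 0
    yy = [0.0] + y + [0.0]
    return [-(tau / a) * (yy[i + 1] - 2 * yy[i] + yy[i - 1])
            for i in range(1, n + 1)]

omegas = []
for p in range(1, n + 1):
    ka = p * math.pi / (n + 1)
    y = [math.sin(ka * j) for j in range(1, n + 1)]
    Ay = apply_A(y)
    lam = (2 * tau / a) * (1 - math.cos(ka))
    # eigenvector check: A.y = lam y, componentwise
    assert all(abs(Ay[j] - lam * y[j]) < 1e-12 for j in range(n))
    # frequency from m omega^2 = lam, against the closed form
    w = math.sqrt(lam / m)
    w_formula = 2 * math.sqrt(tau / (m * a)) * math.sin(p * math.pi / (2 * (n + 1)))
    assert abs(w - w_formula) < 1e-12
    omegas.append(w)
```

There are exactly $n$ independent modes, with frequencies increasing in $p$, as the text argues.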
   The angular frequency of the $p$'th normal mode
\[
\omega_p = 2\sqrt{\frac{\tau}{ma}}\,\sin\frac{p\pi}{2(n+1)}
\]
is plotted in Fig. 5.3. For fixed values of $p$ and $\rho$, as $n \to \infty$,
\[
\omega_p = 2\sqrt{\frac{\tau}{\rho}}\,\frac{1}{a}\sin\frac{pa\pi}{2\ell} \to 2\sqrt{\frac{\tau}{\rho}}\,\frac{p\pi}{2\ell} = ck_p,
\]
as we have in the continuum limit. But if we consider modes with a fixed ratio of $p/n$ as $n \to \infty$, we do not have a smooth limit $y(x)$, and such modes are not appropriate for the continuum limit. In the physics of crystals, the former kind of modes are known as acoustic modes, while the latter modes, in particular those for $n - p$ fixed, which depend on the discrete nature of the crystal, are called optical modes.

[Fig. 5.3: Frequencies of oscillation of the loaded string.]


5.4     Field theory
We saw in the last section that the kinetic and potential energies in
the continuum limit can be written as integrals over x of densities, and
so we may also write the Lagrangian as the integral of a Lagrangian
density L(x),
                                                                                                     
\[
L = T - U = \int_0^\ell dx\,\mathcal{L}(x), \qquad
\mathcal{L}(x) = \frac{1}{2}\rho\,\dot y^2(x, t) - \frac{1}{2}\tau\left(\frac{\partial y(x, t)}{\partial x}\right)^2.
\]

This Lagrangian, however, will not be of much use until we figure out what is meant by varying it with respect to each dynamical degree of freedom or its corresponding velocity. In the discrete case we have the canonical momenta $P_i = \partial L/\partial\dot y_i$, where the derivative requires holding all $\dot y_j$ fixed, for $j \neq i$, as well as all $y_k$ fixed. This extracts one term from the sum $\frac{1}{2}\rho\sum_i a\,\dot y_i^2$, and this would appear to vanish in the limit $a \to 0$. Instead, we define the canonical momentum as a density, $P_i \to aP(x = ia)$, so
\[
P(x = ia) = \lim_{a\to 0}\frac{1}{a}\frac{\partial}{\partial\dot y_i}\sum_i a\,\mathcal{L}\bigl(y(x), \dot y(x), x\bigr)\Big|_{x=ai}.
\]
                                     ˙    i
We may think of the last part of this limit,
\[
\lim_{a\to 0}\sum_i a\,\mathcal{L}\bigl(y(x), \dot y(x), x\bigr)\Big|_{x=ai} = \int dx\,\mathcal{L}\bigl(y(x), \dot y(x), x\bigr),
\]
if we also define a limiting operation
\[
\lim_{a\to 0}\frac{1}{a}\frac{\partial}{\partial\dot y_i} \to \frac{\delta}{\delta\dot y(x)},
\]
and similarly for $\lim_{a\to 0}\frac{1}{a}\frac{\partial}{\partial y_i}$, which act on functionals of $y(x)$ and $\dot y(x)$ by
\[
\frac{\delta y(x_1)}{\delta y(x_2)} = \delta(x_1 - x_2), \qquad
\frac{\delta\dot y(x_1)}{\delta y(x_2)} = \frac{\delta y(x_1)}{\delta\dot y(x_2)} = 0, \qquad
\frac{\delta\dot y(x_1)}{\delta\dot y(x_2)} = \delta(x_1 - x_2).
\]
Here $\delta(x' - x)$ is the Dirac delta function, defined by its integral,
\[
\int_{x_1}^{x_2} f(x')\,\delta(x' - x)\,dx' = f(x)
\]
for any function $f(x')$, provided $x \in (x_1, x_2)$. Thus
\[
P(x) = \frac{\delta}{\delta\dot y(x)}\int_0^\ell dx'\,\frac{1}{2}\rho\,\dot y^2(x', t) = \int_0^\ell dx'\,\rho\,\dot y(x', t)\,\delta(x' - x) = \rho\,\dot y(x, t).
\]

We also need to evaluate
\[
\frac{\delta}{\delta y(x)}L = \frac{\delta}{\delta y(x)}\int_0^\ell dx'\,\frac{-\tau}{2}\left.\left(\frac{\partial y}{\partial x}\right)^2\right|_{x=x'}.
\]

For this we need
\[
\frac{\delta}{\delta y(x)}\frac{\partial y(x')}{\partial x'} = \frac{\partial}{\partial x'}\delta(x' - x) := \delta'(x' - x),
\]
which is again defined by its integral,
\[
\begin{aligned}
\int_{x_1}^{x_2} f(x')\,\delta'(x' - x)\,dx' &= \int_{x_1}^{x_2} f(x')\frac{\partial}{\partial x'}\delta(x' - x)\,dx'\\
&= f(x')\,\delta(x' - x)\Big|_{x_1}^{x_2} - \int_{x_1}^{x_2} dx'\,\frac{\partial f}{\partial x'}\delta(x' - x)\\
&= -\frac{\partial f}{\partial x}(x),
\end{aligned}
\]
where after integration by parts the surface term is dropped because $\delta(x' - x) = 0$ for $x' \neq x$, which it is for $x' = x_1, x_2$ if $x \in (x_1, x_2)$. Thus
\[
\frac{\delta}{\delta y(x)}L = -\int_0^\ell dx'\,\tau\frac{\partial y}{\partial x'}(x')\,\delta'(x' - x) = \tau\frac{\partial^2 y}{\partial x^2},
\]

and Lagrange's equations give the wave equation
\[
\rho\ddot y(x, t) - \tau\frac{\partial^2 y}{\partial x^2} = 0.
\]




                                Exercises

 5.1 Three springs connect two masses to each other and to immobile walls,
as shown. Find the normal modes and frequencies of oscillation, assuming
the system remains along the line shown.


[Figure: a wall, a spring of constant k and length a, a mass m, a spring of constant 2k and length 2a, a second mass m, and a spring of constant k and length a attached to the far wall.]

 5.2 Consider the motion, in a vertical plane, of a double pendulum consisting of two masses attached to each other and to a fixed point by inextensible
strings of length L. The upper mass has mass m1 and the lower mass m2 .
This is all in a laboratory with the ordinary gravitational forces near the
surface of the Earth.



a) Set up the Lagrangian for the motion, assuming the strings stay taut.
b) Simplify the system under the approximation that the motion involves only small deviations from equilibrium. Put the problem in matrix form appropriate for the procedure discussed in class.
c) Find the frequencies of the normal modes of oscillation. [Hint: following exactly the steps given in class will be complex, but the analogous procedure reversing the order of U and T will work easily.]

[Figure: the double pendulum, with a string of length L from the fixed point to m₁ and another of length L from m₁ to m₂.]

 5.3 (a) Show that if three mutually gravitating point masses are at the
vertices of an equilateral triangle which is rotating about an axis normal
to the plane of the triangle and through the center of mass, at a suitable
angular velocity ω, this motion satisfies the equations of motion. Thus this
configuration is an equilibrium in the rotating coordinate system. Do not
assume the masses are equal.
(b) Suppose that two stars of masses M1 and M2 are rotating in circular
orbits about their common center of mass. Consider a small mass m which
is approximately in the equilibrium position described above (which is known
as the L5 point). The mass is small enough that you can ignore its effect on
the two stars. Analyze the motion, considering specifically the stability of
the equilibrium point as a function of the ratio of the masses of the stars.
Chapter 6

Hamilton’s Equations

We discussed the generalized momenta

\[
p_i = \frac{\partial L(q, \dot q, t)}{\partial\dot q_i},
\]

and how the canonical variables $\{q_i, p_j\}$ describe phase space. One can use phase space rather than $\{q_i, \dot q_j\}$ to describe the state of a system
at any moment. In this chapter we will explore the tools which stem
from this phase space approach to dynamics.


6.1     Legendre transforms
The important object for determining the motion of a system using the Lagrangian approach is not the Lagrangian itself but its variation, under arbitrary changes in the variables $q$ and $\dot q$, treated as independent variables. It is the vanishing of the variation of the action under such variations which determines the dynamical equations. In the phase space approach, we want to change variables $\dot q \to p$, where the $p_i$ are part of the gradient of the Lagrangian with respect to the velocities.
This is an example of a general procedure called the Legendre trans-
formation. We will discuss it in terms of the mathematical concept of
a differential form.
   Because it is the variation of L which is important, we need to focus
our attention on the differential dL rather than on L itself. We first


want to give a formal definition of the differential, which we will do first
for a function f (x1 , ..., xn ) of n variables, although for the Lagrangian
we will later subdivide these into coordinates and velocities. We will
take the space in which x takes values to be some general space we call
M, which might be ordinary Euclidean space but might be something
else, like the surface of a sphere1 . Given a function f of n independent
variables xi , the differential is
\[
df = \sum_{i=1}^n \frac{\partial f}{\partial x_i}\,dx_i. \tag{6.1}
\]

What does that mean? As an approximate statement, this can be
regarded as saying
\[
df \approx \Delta f \equiv f(x_i + \Delta x_i) - f(x_i) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}\,\Delta x_i + \mathcal{O}(\Delta x_i\,\Delta x_j),
\]

with some statement about the ∆xi being small, followed by the drop-
ping of the “order (∆x)2 ” terms. Notice that df is a function not only
of the point x ∈ M, but also of the small displacements ∆xi . A very
useful mathematical language emerges if we formalize the definition of
df , extending its definition to arbitrary ∆xi , even when the ∆xi are
not small. Of course, for large ∆xi they can no longer be thought
of as the difference of two positions in M and df no longer has the
meaning of the difference of two values of f . Our formal df is now
defined as a linear function of these ∆xi variables, which we therefore
consider to be a vector v lying in an n-dimensional vector space Rn .
Thus df : M × Rn → R is a real-valued function with two arguments,
one in M and one in a vector space. The dxi which appear in (6.1)
can be thought of as operators acting on this vector space argument to
extract the i th component, and the action of df on the argument (x, v)
is df (x, v) = i (∂f /∂xi )vi .
     This differential is a special case of a 1-form, as is each of the oper-
ators dxi . All n of these dxi form a basis of 1-forms, which are more
generally
\[
\omega = \sum_i \omega_i(x)\,dx_i.
\]
   ¹Mathematically, M is a manifold, but we will not carefully define that here. The precise definition is available in Ref. [11].
6.1. LEGENDRE TRANSFORMS                                                                      149

If there exists an ordinary function f (x) such that ω = df , then ω is
said to be an exact 1-form.
    Consider $L(q_i, v_j, t)$, where $v_i = \dot q_i$. At a given time we consider $q$ and $v$ as independent variables. The differential of $L$ on the space of coordinates and velocities, at a fixed time, is
\[
dL = \sum_i \frac{\partial L}{\partial q_i}\,dq_i + \sum_i \frac{\partial L}{\partial v_i}\,dv_i = \sum_i \frac{\partial L}{\partial q_i}\,dq_i + \sum_i p_i\,dv_i.
\]

If we wish to describe physics in phase space $(q_i, p_i)$, we are making a change of variables from $v_i$ to the gradient with respect to these variables, $p_i = \partial L/\partial v_i$, where we focus now on the variables being transformed and ignore the fixed $q_i$ variables. So $dL = \sum_i p_i\,dv_i$, and the $p_i$ are functions of the $v_j$ determined by the function $L(v_i)$. Is there a function $g(p_i)$ which reverses the roles of $v$ and $p$, for which $dg = \sum_i v_i\,dp_i$? If we can invert the functions $p(v)$, we can define $g(p_i) = \sum_i v_i p_i - L(v_i(p_j))$, which has a differential
\[
dg = \sum_i dv_i\,p_i + \sum_i v_i\,dp_i - dL = \sum_i dv_i\,p_i + \sum_i v_i\,dp_i - \sum_i p_i\,dv_i = \sum_i v_i\,dp_i,
\]

as requested, and which also determines the relationship between $v$ and $p$,
\[
v_i = \frac{\partial g}{\partial p_i} = v_i(p_j),
\]
giving the inverse relation to $p_k(v_j)$. This particular form of changing variables is called a Legendre transformation. In the case of interest here, the function $g$ is called $H(q_i, p_j, t)$, the Hamiltonian,
\[
H = \sum_i \dot q_i p_i - L. \tag{6.2}
\]
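As a concrete illustration of the Legendre transformation (6.2), the sketch below carries it out symbolically for a one-dimensional $L = \frac12 mv^2 - U(q)$; the example and symbol names are ours, and sympy is assumed available:

```python
import sympy as sp

m, q, v, p = sp.symbols('m q v p', positive=True)
U = sp.Function('U')         # arbitrary potential

L = m * v**2 / 2 - U(q)
p_of_v = sp.diff(L, v)                      # p = dL/dv = m v
v_of_p = sp.solve(sp.Eq(p, p_of_v), v)[0]   # invert to get v(p) = p/m
H = sp.simplify(p * v_of_p - L.subs(v, v_of_p))

# the familiar result: H = p^2/(2m) + U(q)
assert sp.simplify(H - (p**2 / (2 * m) + U(q))) == 0
```

The same steps apply whenever the functions $p(v)$ can be inverted.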

   Other examples of Legendre transformations occur in thermody-
namics. The energy change of a gas in a variable container with heat
flow is sometimes written

\[
dE = \bar{d}Q - p\,dV,
\]
where $\bar{d}Q$ is not an exact differential, and the heat $Q$ is not a well defined system variable. Instead one defines the entropy and temperature by $\bar{d}Q = T\,dS$, and the entropy $S$ is a well defined property of the gas.
Thus the state of the gas can be described by the two variables S and
V , and changes involve an energy change

                           dE = T dS − pdV.

We see that the temperature is T = ∂E/∂S|V . If we wish to find
quantities appropriate for describing the gas as a function of T rather
than S, we define the free energy F by −F = T S−E so dF = −SdT −
pdV , and we treat F as a function F (T, V ). Alternatively, to use the
pressure p rather than V , we define the enthalpy X(p, S) = V p + E,
dX = V dp+T dS. To make both changes, and use (T, p) to describe the
state of the gas, we use the Gibbs free energy G(T, p) = X − T S =
E + V p − T S, dG = V dp − S dT.
    Most Lagrangians we encounter have the decomposition $L = L_2 + L_1 + L_0$ into terms quadratic, linear, and independent of velocities, as considered in 2.1.5. Then the momenta are linear in velocities, $p_i = \sum_j M_{ij}\dot q_j + a_i$, or in matrix form $p = M\cdot\dot q + a$, which has the inverse relation $\dot q = M^{-1}\cdot(p - a)$. As $H = L_2 - L_0$, $H = \frac{1}{2}(p - a)\cdot M^{-1}\cdot(p - a) - L_0$. As an example, consider spherical coordinates, in which the kinetic energy is
\[
T = \frac{m}{2}\left(\dot r^2 + r^2\dot\theta^2 + r^2\sin^2\theta\,\dot\phi^2\right) = \frac{1}{2m}\left(p_r^2 + \frac{p_\theta^2}{r^2} + \frac{p_\phi^2}{r^2\sin^2\theta}\right).
\]

Note that $p_\theta \neq \vec p\cdot\hat e_\theta$; in fact it doesn't even have the same units.
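The equality of the velocity and momentum forms of $T$ is easy to verify symbolically (a sketch assuming sympy; the variable names are ours):

```python
import sympy as sp

m, r, th = sp.symbols('m r theta', positive=True)
rd, thd, phd = sp.symbols('rdot thetadot phidot')

# kinetic energy in spherical coordinates
T = m / 2 * (rd**2 + r**2 * thd**2 + r**2 * sp.sin(th)**2 * phd**2)

# canonical momenta p_r, p_theta, p_phi from p_i = dT/d(velocity)
pr, pth, pph = sp.diff(T, rd), sp.diff(T, thd), sp.diff(T, phd)

# the momentum form quoted in the text
T_momenta = (pr**2 + pth**2 / r**2 + pph**2 / (r**2 * sp.sin(th)**2)) / (2 * m)
assert sp.simplify(T - T_momenta) == 0
```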
   The equations of motion in Hamiltonian form,

\[
\dot q_k = \left.\frac{\partial H}{\partial p_k}\right|_{q,t}, \qquad
\dot p_k = -\left.\frac{\partial H}{\partial q_k}\right|_{p,t},
\]
are almost symmetric in their treatment of $q$ and $p$. If we define a $2N$ dimensional coordinate $\eta$ for phase space,
\[
\eta_i = q_i, \qquad \eta_{N+i} = p_i, \qquad \text{for } 1 \le i \le N,
\]
we can write Hamilton's equations in terms of a particular matrix $J$,
\[
\dot\eta_j = J_{jk}\,\frac{\partial H}{\partial\eta_k}, \qquad \text{where }
J = \begin{pmatrix} 0 & \mathbb{1}_{N\times N} \\ -\mathbb{1}_{N\times N} & 0 \end{pmatrix}.
\]
$J$ is like a multidimensional version of the $i\sigma_y$ which we meet in quantum-mechanical descriptions of spin-$\frac12$ particles. It is real, antisymmetric, and because $J^2 = -\mathbb{1}$, it is orthogonal. Mathematicians would say that $J$ describes the complex structure on phase space.
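The stated properties of $J$ can be confirmed numerically in a few lines (a sketch with numpy; $N = 3$ is an arbitrary choice of ours):

```python
import numpy as np

N = 3
I = np.eye(N)
Z = np.zeros((N, N))
J = np.block([[Z, I], [-I, Z]])     # the 2N x 2N matrix of Hamilton's equations

assert np.array_equal(J.T, -J)                   # real and antisymmetric
assert np.array_equal(J @ J, -np.eye(2 * N))     # J^2 = -1
assert np.array_equal(J @ J.T, np.eye(2 * N))    # hence orthogonal
```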
   For a given physical problem there is no unique set of generalized coordinates which describe it. Then transforming to the Hamiltonian may give different objects. A nice example is given in Goldstein, a mass on a spring attached to a "fixed point" which is on a truck moving at uniform velocity $v_T$ relative to the Earth. If we use the Earth coordinate $x$ to describe the mass, the equilibrium position of the spring is moving in time, $x_{\rm eq} = v_T t$, ignoring a negligible initial position. Thus $U = \frac12 k(x - v_T t)^2$, while $T = \frac12 m\dot x^2$ as usual, and $L = \frac12 m\dot x^2 - \frac12 k(x - v_T t)^2$, $p = m\dot x$, $H = p^2/2m + \frac12 k(x - v_T t)^2$. The equations of motion $\dot p = m\ddot x = -\partial H/\partial x = -k(x - v_T t)$, of course, show that $H$ is not conserved:
\[
\frac{dH}{dt} = \frac{p}{m}\frac{dp}{dt} + k(\dot x - v_T)(x - v_T t) = -\frac{kp}{m}(x - v_T t) + \left(\frac{kp}{m} - kv_T\right)(x - v_T t) = -kv_T(x - v_T t) \neq 0.
\]
Alternatively, $dH/dt = -\partial L/\partial t = -kv_T(x - v_T t) \neq 0$. This is not surprising; the spring exerts a force on the truck and the truck is doing work to keep the fixed point moving at constant velocity.
   On the other hand, if we use the truck coordinate $x' = x - v_T t$, we may describe the motion in this frame with $T = \frac12 m\dot x'^2$, $U = \frac12 kx'^2$, $L' = \frac12 m\dot x'^2 - \frac12 kx'^2$, giving the correct equations of motion $p' = m\dot x'$, $\dot p' = m\ddot x' = -\partial L'/\partial x' = -kx'$. With this set of coordinates, the Hamiltonian is $H' = \dot x'p' - L' = p'^2/2m + \frac12 kx'^2$, which is conserved. From the correspondence between the two sets of variables, $x' = x - v_T t$ and $p' = p - mv_T$, we see that the Hamiltonians at corresponding points in phase space differ, $H(x, p) - H'(x', p') = (p^2 - p'^2)/2m = v_T p - \frac12 mv_T^2 \neq 0$.
6.2      Variations on phase curves
In applying Hamilton's Principle to derive Lagrange's Equations, we considered variations in which $\delta q_i(t)$ was arbitrary except at the initial and final times, but the velocities were fixed in terms of these, $\delta\dot q_i(t) = (d/dt)\,\delta q_i(t)$. In discussing dynamics in terms of phase space, this is not
the most natural variation, because this means that the momenta are
not varied independently. Here we will show that Hamilton’s equations
follow from a modified Hamilton’s Principle, in which the momenta are
freely varied.
    We write the action in terms of the Hamiltonian,
\[
I = \int_{t_i}^{t_f}\left(\sum_i p_i\dot q_i - H(q_j, p_j, t)\right)dt,
\]
and consider its variation under arbitrary variation of the path in phase space, $(q_i(t), p_i(t))$. The $\dot q_i(t)$ is still $dq_i/dt$, but the momentum is varied free of any connection to $\dot q_i$. Then
\[
\delta I = \int_{t_i}^{t_f}\left[\sum_i \delta p_i\left(\dot q_i - \frac{\partial H}{\partial p_i}\right) - \sum_i \delta q_i\left(\dot p_i + \frac{\partial H}{\partial q_i}\right)\right]dt + \sum_i p_i\,\delta q_i\Big|_{t_i}^{t_f},
\]
where we have integrated the $\sum_i p_i\,d\delta q_i/dt$ term by parts. Note that in order to relate stationarity of the action to Hamilton's Equations of Motion, it is necessary only to constrain the $q_i(t)$ at the initial and final times, without imposing any limitations on the variation of $p_i(t)$, either at the endpoints, as we did for $q_i(t)$, or in the interior $(t_i, t_f)$, where we had previously related $p_i$ and $\dot q_j$. The relation between $\dot q_i$ and $p_j$ emerges instead among the equations of motion.
    The $\dot q_i$ seems a bit out of place in a variational principle over phase space, and indeed we can rewrite the action integral as an integral of a 1-form over a path in extended phase space,
\[
I = \int\left(\sum_i p_i\,dq_i - H(q, p, t)\,dt\right).
\]

We will see, in section 6.6, that the first term of the integrand leads to
a very important form on phase space, and that the whole integrand is
an important 1-form on extended phase space.
6.3     Canonical transformations
We have seen that it is often useful to switch from the original set of
coordinates in which a problem appeared to a different set in which
the problem became simpler. We switched from cartesian to center-of-
mass spherical coordinates to discuss planetary motion, for example,
or from the Earth frame to the truck frame in the example in which
we found how Hamiltonians depend on coordinate choices. In all these
cases we considered a change of coordinates q → Q, where each Qi is
a function of all the qj and possibly time, but not of the momenta or
velocities. This is called a point transformation. But we have seen
that we can work in phase space where coordinates and momenta enter
together in similar ways, and we might ask ourselves what happens if we
make a change of variables on phase space, to new variables Qi (q, p, t),
Pi (q, p, t). We should not expect the Hamiltonian to be the same either
in form or in value, as we saw even for point transformations, but there
must be a new Hamiltonian K(Q, P, t) from which we can derive the
correct equations of motion,
\[
\dot Q_i = \frac{\partial K}{\partial P_i}, \qquad
\dot P_i = -\frac{\partial K}{\partial Q_i}.
\]
The analog of $\eta$ for our new variables will be called $\zeta$, so
\[
\zeta = \begin{pmatrix} Q \\ P \end{pmatrix}, \qquad
\dot\zeta = J\cdot\frac{\partial K}{\partial\zeta}.
\]
If this exists, we say the new variables (Q, P ) are canonical variables
and the transformation (q, p) → (Q, P ) is a canonical transforma-
tion.
    These new Hamiltonian equations are related to the old ones, $\dot\eta = J\cdot\partial H/\partial\eta$, by the function which gives the new coordinates and momenta in terms of the old, $\zeta = \zeta(\eta, t)$. Then
\[
\dot\zeta_i = \frac{d\zeta_i}{dt} = \sum_j\frac{\partial\zeta_i}{\partial\eta_j}\,\dot\eta_j + \frac{\partial\zeta_i}{\partial t}.
\]

Let us write the Jacobian matrix Mij := ∂ζi /∂ηj . In general, M will
not be a constant but a function on phase space. The above relation
for the velocities now reads
\[
\dot\zeta = M\cdot\dot\eta + \left.\frac{\partial\zeta}{\partial t}\right|_\eta.
\]

The gradients in phase space are also related,
\[
\left.\frac{\partial}{\partial\eta_i}\right|_{t,\eta} = \sum_j\left.\frac{\partial\zeta_j}{\partial\eta_i}\right|_{t,\eta}\left.\frac{\partial}{\partial\zeta_j}\right|_{t,\zeta}, \qquad\text{or}\qquad \nabla_\eta = M^T\cdot\nabla_\zeta.
\]

Thus we have
\[
\dot\zeta = M\cdot\dot\eta + \frac{\partial\zeta}{\partial t} = M\cdot J\cdot\nabla_\eta H + \frac{\partial\zeta}{\partial t} = M\cdot J\cdot M^T\cdot\nabla_\zeta H + \frac{\partial\zeta}{\partial t} = J\cdot\nabla_\zeta K.
\]
   Let us first consider a canonical transformation which does not depend on time, so $\partial\zeta/\partial t|_\eta = 0$. We see that we can choose the new Hamiltonian to be the same as the old, $K = H$, and get correct mechanics, if
\[
M\cdot J\cdot M^T = J. \tag{6.3}
\]
We will require this condition even when $\zeta$ does depend on $t$, but then we need to revisit the question of finding $K$.
    The condition (6.3) on $M$ is similar to, and a generalization of, the condition for orthogonality of a matrix, $OO^T = \mathbb{1}$, which is of the same form with $J$ replaced by $\mathbb{1}$. Another example of this kind of relation in physics occurs in special relativity, where a Lorentz transformation $L_{\mu\nu}$ gives the relation between two coordinates, $x'_\mu = \sum_\nu L_{\mu\nu}x_\nu$, with $x_\nu$ a four dimensional vector with $x_4 = ct$. Then the condition which makes $L$ a Lorentz transformation is
\[
L\cdot g\cdot L^T = g, \qquad\text{with } g = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & -1 \end{pmatrix}.
\]
The matrix $g$ in relativity is known as the indefinite metric, and the condition on $L$ is known as pseudo-orthogonality. In our main discussion, however, $J$ is not a metric, as it is antisymmetric rather than symmetric, and the word which describes $M$ is symplectic.
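A minimal numerical check of the symplectic condition (6.3), for one degree of freedom: take the point transformation $Q = q^3$ (our own example, not one from the text); for $(Q, P)$ to be canonical the momentum must transform as $P = p/(3q^2)$, and the Jacobian $M$ then satisfies $M\cdot J\cdot M^T = J$:

```python
import numpy as np

q, p = 1.3, -0.4                       # an arbitrary phase-space point
J = np.array([[0.0, 1.0], [-1.0, 0.0]])

# Jacobian M = d(Q, P)/d(q, p) for Q = q^3, P = p/(3 q^2)
M = np.array([
    [3 * q**2,                0.0           ],   # dQ/dq, dQ/dp
    [-2 * p / (3 * q**3),     1 / (3 * q**2)],   # dP/dq, dP/dp
])

assert np.allclose(M @ J @ M.T, J)     # the symplectic condition (6.3)
```

For a single degree of freedom the condition reduces to $\det M = 1$, which is why this particular choice of $P$ works.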
    Just as for orthogonal transformations, symplectic transformations can be divided into those which can be generated by infinitesimal transformations (which are connected to the identity) and those which cannot. Consider a transformation $M$ which is almost the identity, $M_{ij} = \delta_{ij} + \epsilon G_{ij}$, or $M = \mathbb{1} + \epsilon G$, where $\epsilon$ is considered some infinitesimal parameter while $G$ is a finite matrix. As $M$ is symplectic, $(\mathbb{1} + \epsilon G)\cdot J\cdot(\mathbb{1} + \epsilon G^T) = J$, which tells us that to lowest order in $\epsilon$, $GJ + JG^T = 0$. Comparing this to the condition for the generator of an infinitesimal rotation, $\Omega = -\Omega^T$, we see that it is similar except for the appearance of $J$ on opposite sides, changing orthogonality to symplecticity. The new variables under such a canonical transformation are $\zeta = \eta + \epsilon G\cdot\eta$.
    One important example of an infinitesimal canonical transformation is the one which relates time dependent transformations at different times. Suppose $\eta \to \zeta(\eta, t)$ is a canonical transformation which depends on time. One particular one is $\eta \to \zeta_0 = \zeta(\eta, t_0)$ for some particular time, so $\zeta_0 \to \zeta(\eta, t)$ is also a canonical transformation, and for $t = t_0 + \Delta t \approx t_0$ it will be nearly the identity if $\zeta(\eta, t)$ is differentiable.
    Notice that the relationship ensuring Hamilton's equations exist,
\[
M \cdot J \cdot M^T \cdot \nabla_\zeta H + \frac{\partial \zeta}{\partial t} = J \cdot \nabla_\zeta K,
\]
with the symplectic condition $M \cdot J \cdot M^T = J$, implies $\nabla_\zeta (K - H) =
-J \cdot \partial \zeta / \partial t$, so $K$ differs from $H$ here. This discussion holds as long as
$M$ is symplectic, even if it is not an infinitesimal transformation.


6.4      Poisson Brackets
Suppose I have some function f (q, p, t) on phase space and I want to
ask how f changes as the system evolves with time. Then

\begin{align*}
\frac{df}{dt} &= \sum_i \frac{\partial f}{\partial q_i}\dot q_i + \sum_i \frac{\partial f}{\partial p_i}\dot p_i + \frac{\partial f}{\partial t} \\
&= \sum_i \frac{\partial f}{\partial q_i}\frac{\partial H}{\partial p_i} - \sum_i \frac{\partial f}{\partial p_i}\frac{\partial H}{\partial q_i} + \frac{\partial f}{\partial t}.
\end{align*}

The structure of the first two terms is that of a Poisson bracket, a
bilinear operation of functions on phase space defined by
\[
[u, v] := \sum_i \frac{\partial u}{\partial q_i}\frac{\partial v}{\partial p_i} - \sum_i \frac{\partial u}{\partial p_i}\frac{\partial v}{\partial q_i}. \tag{6.4}
\]

The Poisson bracket is a fundamental property of the phase space. In
symplectic language,

\[
[u, v] = \sum_{ij} \frac{\partial u}{\partial \eta_i} J_{ij} \frac{\partial v}{\partial \eta_j} = (\nabla_\eta u)^T \cdot J \cdot \nabla_\eta v. \tag{6.5}
\]

If we describe the system in terms of a different set of canonical variables
ζ, we should still find the function f (t) changing at the same rate. We
may think of u and v as functions of ζ as easily as of η, and we may
ask whether $[u, v]_\zeta$ is the same as $[u, v]_\eta$. Using $\nabla_\eta = M^T \cdot \nabla_\zeta$, we have
\begin{align*}
[u, v]_\eta &= \left( M^T \cdot \nabla_\zeta u \right)^T \cdot J \cdot M^T \nabla_\zeta v = (\nabla_\zeta u)^T \cdot M \cdot J \cdot M^T \, \nabla_\zeta v \\
&= (\nabla_\zeta u)^T \cdot J \, \nabla_\zeta v = [u, v]_\zeta,
\end{align*}

so we see that the Poisson bracket is independent of the coordinatization
used to describe phase space, as long as it is canonical.
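The invariance just shown can also be tested numerically. A sketch (not part of the original text; the functions and the sample point are arbitrary choices) using the simple canonical transformation $Q = p$, $P = -q$, for which $M = J$ and $M \cdot J \cdot M^T = J$:

```python
# Check that the Poisson bracket is unchanged under the canonical
# transformation Q = p, P = -q, using central finite differences.

def poisson(u, v, a, b, h=1e-5):
    du_da = (u(a + h, b) - u(a - h, b)) / (2 * h)
    du_db = (u(a, b + h) - u(a, b - h)) / (2 * h)
    dv_da = (v(a + h, b) - v(a - h, b)) / (2 * h)
    dv_db = (v(a, b + h) - v(a, b - h)) / (2 * h)
    return du_da * dv_db - du_db * dv_da

u = lambda q, p: q**2 * p + p**3
v = lambda q, p: q * p - q**2

# the same functions written in the new coordinates: q = -P, p = Q
u_new = lambda Q, P: u(-P, Q)
v_new = lambda Q, P: v(-P, Q)

q0, p0 = 0.6, -1.3
Q0, P0 = p0, -q0
assert abs(poisson(u, v, q0, p0) - poisson(u_new, v_new, Q0, P0)) < 1e-6
```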
     The Poisson bracket plays such an important role in classical me-
chanics, and an even more important role in quantum mechanics, that
it is worthwhile to discuss some of its abstract properties. First of all,
from the definition it is obvious that it is antisymmetric:

                                   [u, v] = −[v, u].                                        (6.6)

It is a linear operator on each function over constant linear combina-
tions, but it satisfies a Leibnitz rule for non-constant multiples,

                          [uv, w] = [u, w]v + u[v, w],                                      (6.7)

which follows immediately from the definition, using Leibnitz’ rule on
the partial derivatives. A very special relation is the Jacobi identity,

                 [u, [v, w]] + [v, [w, u]] + [w, [u, v]] = 0.                               (6.8)

We need to prove that this is true. To simplify the presentation, we
introduce some abbreviated notation. We use a subscript ,i to indicate
partial derivative with respect to ηi , so u,i means ∂u/∂ηi , and u,i,j means
∂(∂u/∂ηi )/∂ηj . We will assume all our functions on phase space are
suitably differentiable, so u,i,j = u,j,i. We will also use the summation
convention, that any index which appears twice in a term is assumed
to be summed over.² Then $[v, w] = v_{,i} J_{ij} w_{,j}$, and
\begin{align*}
[u, [v, w]] &= [u, v_{,i} J_{ij} w_{,j}] \\
&= [u, v_{,i}] J_{ij} w_{,j} + v_{,i} J_{ij} [u, w_{,j}] \\
&= u_{,k} J_{k\ell} v_{,i,\ell} J_{ij} w_{,j} + v_{,i} J_{ij} u_{,k} J_{k\ell} w_{,j,\ell}.
\end{align*}
In the Jacobi identity, there are two other terms like this, one with the
substitution $u \to v \to w \to u$ and the other with $u \to w \to v \to u$,
giving a sum of six terms. The only ones involving second derivatives
of $v$ are the first term above and the one found from applying $u \to
w \to v \to u$ to the second, $u_{,i} J_{ij} w_{,k} J_{k\ell} v_{,j,\ell}$. The indices are all dummy
indices, summed over, so their names can be changed, by $i \to k \to j \to
\ell \to i$, converting this term to $u_{,k} J_{k\ell} w_{,j} J_{ji} v_{,\ell,i}$. Adding the original term
$u_{,k} J_{k\ell} v_{,i,\ell} J_{ij} w_{,j}$, and using $v_{,\ell,i} = v_{,i,\ell}$, gives $u_{,k} J_{k\ell} w_{,j} (J_{ji} + J_{ij}) v_{,\ell,i} = 0$
because $J$ is antisymmetric. Thus the terms in the Jacobi identity
involving second derivatives of $v$ vanish, but the same argument applies
in pairs to the other terms, involving second derivatives of $u$ or of $w$,
so they all vanish, and the Jacobi identity is proven.
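The identity can also be spot-checked numerically. A sketch (not part of the original text; the three polynomial functions and the phase-space point are arbitrary choices), nesting a finite-difference version of the bracket:

```python
# Numerical spot-check of the Jacobi identity (6.8) at one phase-space point.

def poisson(u, v, q, p, h=1e-5):
    du_dq = (u(q + h, p) - u(q - h, p)) / (2 * h)
    du_dp = (u(q, p + h) - u(q, p - h)) / (2 * h)
    dv_dq = (v(q + h, p) - v(q - h, p)) / (2 * h)
    dv_dp = (v(q, p + h) - v(q, p - h)) / (2 * h)
    return du_dq * dv_dp - du_dp * dv_dq

u = lambda q, p: q**2 * p
v = lambda q, p: p**2 + q
w = lambda q, p: q * p

def bracket(a, b):
    # wrap [a, b] as a function of (q, p) so brackets can be nested
    return lambda q, p: poisson(a, b, q, p)

q0, p0 = 0.7, 1.1
jac = (poisson(u, bracket(v, w), q0, p0)
       + poisson(v, bracket(w, u), q0, p0)
       + poisson(w, bracket(u, v), q0, p0))
assert abs(jac) < 1e-5
```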
     This argument can be made more elegantly if we recognize that
for each function f on phase space, we may view [f, ·] as a differential
operator on functions g on phase space, mapping g → [f, g]. Calling
this operator $D_f$, we see that
\[
D_f = \sum_{ij} \frac{\partial f}{\partial \eta_i} J_{ij} \frac{\partial}{\partial \eta_j},
\]
which is of the general form that a differential operator has,
\[
D_f = \sum_j f_j \frac{\partial}{\partial \eta_j},
\]
   2
     This convention of understood summation was invented by Einstein, who called
it the “greatest contribution of my life”.

where $f_j$ are an arbitrary set of functions on phase space. For the
Poisson bracket, the functions $f_j$ are linear combinations of the $f_{,j}$,
but $f_j \neq f_{,j}$. With this interpretation, $[f, g] = D_f g$, and $[h, [f, g]] =
D_h D_f g$. Thus

\begin{align*}
[h, [f, g]] + [f, [g, h]] &= [h, [f, g]] - [f, [h, g]] = D_h D_f g - D_f D_h g \\
&= (D_h D_f - D_f D_h) g,
\end{align*}

and we see that this combination of Poisson brackets involves the com-
mutator of differential operators. But such a commutator is always a
linear differential operator itself,

\begin{align*}
D_h D_g &= \sum_{ij} h_i \frac{\partial}{\partial \eta_i} g_j \frac{\partial}{\partial \eta_j} = \sum_{ij} h_i \frac{\partial g_j}{\partial \eta_i} \frac{\partial}{\partial \eta_j} + \sum_{ij} h_i g_j \frac{\partial^2}{\partial \eta_i \partial \eta_j} \\
D_g D_h &= \sum_{ij} g_j \frac{\partial}{\partial \eta_j} h_i \frac{\partial}{\partial \eta_i} = \sum_{ij} g_j \frac{\partial h_i}{\partial \eta_j} \frac{\partial}{\partial \eta_i} + \sum_{ij} h_i g_j \frac{\partial^2}{\partial \eta_i \partial \eta_j},
\end{align*}


so in the commutator, the second derivative terms cancel, and

\begin{align*}
D_h D_g - D_g D_h &= \sum_{ij} h_i \frac{\partial g_j}{\partial \eta_i} \frac{\partial}{\partial \eta_j} - \sum_{ij} g_j \frac{\partial h_i}{\partial \eta_j} \frac{\partial}{\partial \eta_i} \\
&= \sum_{ij} \left( h_i \frac{\partial g_j}{\partial \eta_i} - g_i \frac{\partial h_j}{\partial \eta_i} \right) \frac{\partial}{\partial \eta_j}.
\end{align*}

This is just another first order differential operator, so there are no
second derivatives of f left in the left side of the Jacobi identity. In
fact, the identity tells us that this combination is

\[
D_h D_g - D_g D_h = D_{[h,g]}. \tag{6.9}
\]

    An antisymmetric product which obeys the Jacobi identity is what
makes a Lie algebra. Lie algebras are the infinitesimal generators of
Lie groups, or continuous groups, one example of which is the group
of rotations SO(3) which we have already considered. Notice that the
“product” here is not associative, $[u, [v, w]] \neq [[u, v], w]$. In fact, the
difference $[u, [v, w]] - [[u, v], w] = [u, [v, w]] + [w, [u, v]] = -[v, [w, u]]$ by

the Jacobi identity, so the Jacobi identity replaces the law of associa-
tivity in a Lie algebra.
    Recall that the rate at which a function on phase space, evaluated
on the system as it evolves, changes with time is

\[
\frac{df}{dt} = -[H, f] + \frac{\partial f}{\partial t}, \tag{6.10}
\]
where H is the Hamiltonian. The function [f, g] on phase space also
evolves that way, of course, so

\begin{align*}
\frac{d[f, g]}{dt} &= -[H, [f, g]] + \frac{\partial [f, g]}{\partial t} \\
&= [f, [g, H]] + [g, [H, f]] + \left[ \frac{\partial f}{\partial t}, g \right] + \left[ f, \frac{\partial g}{\partial t} \right] \\
&= \left[ f, -[H, g] + \frac{\partial g}{\partial t} \right] + \left[ g, [H, f] - \frac{\partial f}{\partial t} \right] \\
&= \left[ f, \frac{dg}{dt} \right] - \left[ g, \frac{df}{dt} \right].
\end{align*}

If f and g are conserved quantities, df /dt = dg/dt = 0, and we have
the important consequence that d[f, g]/dt = 0. This proves Poisson’s
theorem: The Poisson bracket of two conserved quantities is a con-
served quantity.
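A classic illustration of Poisson's theorem uses the components of angular momentum, $L_i = \sum_{jk} \epsilon_{ijk} q_j p_k$, which satisfy $[L_1, L_2] = L_3$: if $L_1$ and $L_2$ are conserved, the bracket makes $L_3$ conserved as well. A numerical sketch of the bracket relation (not part of the original text; the phase-space point is an arbitrary choice):

```python
# Finite-difference Poisson bracket for several degrees of freedom,
# applied to the angular momentum components: [L1, L2] = L3.

def poisson(u, v, q, p, h=1e-5):
    total = 0.0
    for i in range(len(q)):
        qp = list(q); qm = list(q); qp[i] += h; qm[i] -= h
        pp = list(p); pm = list(p); pp[i] += h; pm[i] -= h
        du_dq = (u(qp, p) - u(qm, p)) / (2 * h)
        du_dp = (u(q, pp) - u(q, pm)) / (2 * h)
        dv_dq = (v(qp, p) - v(qm, p)) / (2 * h)
        dv_dp = (v(q, pp) - v(q, pm)) / (2 * h)
        total += du_dq * dv_dp - du_dp * dv_dq
    return total

L1 = lambda q, p: q[1] * p[2] - q[2] * p[1]
L2 = lambda q, p: q[2] * p[0] - q[0] * p[2]
L3 = lambda q, p: q[0] * p[1] - q[1] * p[0]

q0, p0 = [0.3, -1.0, 0.8], [1.2, 0.5, -0.7]
assert abs(poisson(L1, L2, q0, p0) - L3(q0, p0)) < 1e-6
```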
    We will now show an important theorem, known as Liouville’s
theorem, that the volume of a region of phase space is invariant under
canonical transformations. This is not a volume in ordinary space,
but a $2n$ dimensional volume, given by integrating the volume element
$\prod_{i=1}^{2n} d\eta_i$ in the old coordinates, and by
\[
\prod_{i=1}^{2n} d\zeta_i = \left| \det \frac{\partial \zeta_i}{\partial \eta_j} \right| \prod_{i=1}^{2n} d\eta_i = |\det M| \prod_{i=1}^{2n} d\eta_i
\]

in the new, where we have used the fact that the change of variables
requires a Jacobian in the volume element. But because $J = M \cdot J \cdot M^T$,
$\det J = \det M \, \det J \, \det M^T = (\det M)^2 \det J$, and $J$ is nonsingular, so
$\det M = \pm 1$, and the volume element is unchanged.
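For the harmonic oscillator, $H = (p^2 + q^2)/2$, the time-$t$ evolution of $(q, p)$ is a linear map, so both $\det M = 1$ and the symplectic condition can be checked directly (a sketch, not part of the original text; the value of $t$ is an arbitrary choice):

```python
# For H = (p^2 + q^2)/2 the flow is q(t) = q0 cos t + p0 sin t,
# p(t) = -q0 sin t + p0 cos t, i.e. the rotation matrix M(t) below.
import math

t = 0.9
M = [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]
J = [[0.0, 1.0], [-1.0, 0.0]]

# phase-space area is preserved: det M = 1
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
assert abs(det - 1.0) < 1e-12

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# M is symplectic: M.J.M^T = J
MJMt = matmul(matmul(M, J), [list(r) for r in zip(*M)])
assert all(abs(MJMt[i][j] - J[i][j]) < 1e-12 for i in range(2) for j in range(2))
```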

    In statistical mechanics, we generally do not know the actual state
of a system, but know something about the probability that the system
is in a particular region of phase space. As the transformation which
maps possible values of η(t1 ) to the values into which they will evolve
at time t2 is a canonical transformation, this means that the volume of
a region in phase space does not change with time, although the region
itself changes. Thus the probability density, specifying the likelihood
that the system is near a particular point of phase space, is invariant
as we move along with the system.




6.5      Higher Differential Forms
In section 6.1 we discussed a reinterpretation of the differential df as an
example of a more general differential 1-form, a map ω : M × Rn → R.
We saw that the {dxi } provide a basis for these forms, so the general
1-form can be written as $\omega = \sum_i \omega_i(x)\, dx_i$. The differential $df$ gave an
example. We defined an exact 1-form as one which is a differential of
some well-defined function f . What is the condition for a 1-form to be
exact? If $\omega = \sum_i \omega_i\, dx_i$ is $df$, then $\omega_i = \partial f / \partial x_i = f_{,i}$, and
\[
\omega_{i,j} = \frac{\partial \omega_i}{\partial x_j} = \frac{\partial^2 f}{\partial x_i \, \partial x_j} = \frac{\partial^2 f}{\partial x_j \, \partial x_i} = \omega_{j,i}.
\]


Thus one necessary condition for ω to be exact is that the combination
ωj,i − ωi,j = 0. We will define a 2-form to be the set of these objects
which must vanish. In fact, we define a differential k-form to be a
map

\[
\omega^{(k)} : M \times \underbrace{\mathbb{R}^n \times \cdots \times \mathbb{R}^n}_{k \text{ times}} \to \mathbb{R},
\]


which is linear in its action on each of the Rn and totally antisymmetric
in its action on the k copies, and is a smooth function of x ∈ M. At a

given point, a basis of the $k$-forms is³
\[
dx_{i_1} \wedge dx_{i_2} \wedge \cdots \wedge dx_{i_k} := \sum_{P \in S_k} (-1)^P dx_{i_{P1}} \otimes dx_{i_{P2}} \otimes \cdots \otimes dx_{i_{Pk}}.
\]


For example, in three dimensions there are three independent 2-forms
at a point, dx1 ∧ dx2 , dx1 ∧ dx3 , and dx2 ∧ dx3 , where dx1 ∧ dx2 =
dx1 ⊗ dx2 − dx2 ⊗ dx1 , which means that, acting on u and v, dx1 ∧
dx2 (u, v) = u1 v2 − u2 v1 . The product ∧ is called the wedge product
or exterior product, and can be extended to act between k1 - and
k2 -forms so that it becomes an associative distributive product. Note
that this definition of a k-form agrees, for k = 1, with our previous
definition, and for k = 0 tells us a 0-form is simply a function on M.
The general expression for a k-form is

\[
\omega^{(k)} = \sum_{i_1 < \cdots < i_k} \omega_{i_1 \ldots i_k}(x)\, dx_{i_1} \wedge \cdots \wedge dx_{i_k}.
\]


     Let us consider some examples in three dimensional Euclidean space
$E^3$, where there is a correspondence we can make between vectors and
1- and 2-forms. In this discussion we will not be considering how the
objects change under changes in the coordinates of E 3 , to which we will
return later.

 k = 0: As always, 0-forms are simply functions, f (x), x ∈ E 3 .

 k = 1: A 1-form $\omega = \sum_i \omega_i\, dx_i$ can be thought of as, or associated with, a
     vector field $A(x) = \sum_i \omega_i(x)\, \hat e_i$. Note that if $\omega = df$, $\omega_i = \partial f / \partial x_i$,
     so $A = \nabla f$.

 k = 2: A general two form is a sum over the three independent wedge
     products with independent functions B12 (x), B13 (x), B23 (x). Let
   3
    Some explanation of the mathematical symbols might be in order here. Sk is the
group of permutations on k objects, and (−1)P is the sign of the permutation P ,
which is plus or minus one if the permutation can be built from an even or an odd
number, respectively, of transpositions of two of the elements. The tensor product
⊗ of two linear operators into a field is a linear operator which acts on the product
space, or in other words a bilinear operator with two arguments. Here $dx_i \otimes dx_j$ is
an operator on $\mathbb{R}^n \times \mathbb{R}^n$ which maps the pair of vectors $(u, v)$ to $u_i v_j$.

        us extend the definition of $B_{ij}$ to make it an antisymmetric ma-
        trix, so
\[
B = \sum_{i<j} B_{ij}\, dx_i \wedge dx_j = \sum_{i,j} B_{ij}\, dx_i \otimes dx_j.
\]
        As we did for the angular velocity matrix $\Omega$ in (4.2), we can
        condense the information in the antisymmetric matrix $B_{ij}$ into
        a vector field $B = \sum_i B_i \hat e_i$, with $B_{ij} = \sum_k \epsilon_{ijk} B_k$. Note that this
        step requires that we are working in $E^3$ rather than some other
        dimension. Thus $B = \sum_{ijk} \epsilon_{ijk} B_k\, dx_i \otimes dx_j$.

 k = 3: There is only one basis 3-form available in three dimensions,
     $dx_1 \wedge dx_2 \wedge dx_3$. Any other 3-form is proportional to this one,
     and in particular $dx_i \wedge dx_j \wedge dx_k = \epsilon_{ijk}\, dx_1 \wedge dx_2 \wedge dx_3$. The most
     general 3-form $C$ is simply specified by an ordinary function $C(x)$,
     which multiplies $dx_1 \wedge dx_2 \wedge dx_3$.
    Having established, in three dimensions, a correspondence between
vectors and 1- and 2-forms, and between functions and 0- and 3-forms,
we can ask to what the wedge product corresponds in terms of these
vectors. If $A$ and $C$ are two vectors corresponding to the 1-forms $A =
\sum_i A_i\, dx_i$ and $C = \sum_i C_i\, dx_i$, and if $B = A \wedge C$, then
\[
B = \sum_{ij} A_i C_j\, dx_i \wedge dx_j = \sum_{ij} (A_i C_j - A_j C_i)\, dx_i \otimes dx_j = \sum_{ij} B_{ij}\, dx_i \otimes dx_j,
\]
so $B_{ij} = A_i C_j - A_j C_i$, and
\[
B_k = \frac{1}{2} \epsilon_{kij} B_{ij} = \frac{1}{2} \epsilon_{kij} A_i C_j - \frac{1}{2} \epsilon_{kij} A_j C_i = \epsilon_{kij} A_i C_j,
\]
so
\[
B = A \times C,
\]
and the wedge product of two 1-forms is the cross product of their
vectors.
    If $A$ is a 1-form and $B$ is a 2-form, the wedge product $C = A \wedge B =
C(x)\, dx_1 \wedge dx_2 \wedge dx_3$ is given by
\begin{align*}
C = A \wedge B &= \sum_i \sum_{j<k} A_i B_{jk}\, dx_i \wedge dx_j \wedge dx_k
 = \sum_i \sum_{j<k} A_i B_{jk}\, \epsilon_{ijk}\, dx_1 \wedge dx_2 \wedge dx_3 \\
&= \frac{1}{2} \sum_{ijk\ell} A_i\, \epsilon_{jk\ell} B_\ell\, \epsilon_{ijk}\, dx_1 \wedge dx_2 \wedge dx_3
 \qquad (\text{the summand is symmetric under } j \leftrightarrow k) \\
&= \sum_{i\ell} A_i B_\ell\, \delta_{i\ell}\, dx_1 \wedge dx_2 \wedge dx_3
 = A \cdot B\, dx_1 \wedge dx_2 \wedge dx_3,
\end{align*}
so we see that the wedge product of a 1-form and a 2-form gives the
dot product of their vectors.
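Both correspondences reduce to identities for the Levi-Civita symbol and can be verified directly (a sketch, not part of the original text; the component values are arbitrary choices):

```python
# Verify: (1) B_k = (1/2) eps_kij (A_i C_j - A_j C_i) is the cross product;
# (2) the coefficient of dx1^dx2^dx3 in A^B is the dot product A.B.

def eps(i, j, k):
    # Levi-Civita symbol for indices 0, 1, 2
    return (i - j) * (j - k) * (k - i) // 2

A = [1.0, -2.0, 0.5]
C = [0.3, 0.7, -1.0]

# wedge of two 1-forms: B_ij = A_i C_j - A_j C_i, then B_k = (1/2) eps_kij B_ij
Bij = [[A[i] * C[j] - A[j] * C[i] for j in range(3)] for i in range(3)]
Bvec = [0.5 * sum(eps(k, i, j) * Bij[i][j] for i in range(3) for j in range(3))
        for k in range(3)]
cross = [A[1]*C[2] - A[2]*C[1], A[2]*C[0] - A[0]*C[2], A[0]*C[1] - A[1]*C[0]]
assert all(abs(Bvec[k] - cross[k]) < 1e-12 for k in range(3))

# wedge of a 1-form with a 2-form built from the vector B
B = [0.9, -0.4, 2.0]
Bjk = [[sum(eps(j, k, l) * B[l] for l in range(3)) for k in range(3)]
       for j in range(3)]
coeff = 0.5 * sum(A[i] * Bjk[j][k] * eps(i, j, k)
                  for i in range(3) for j in range(3) for k in range(3))
dot = sum(A[i] * B[i] for i in range(3))
assert abs(coeff - dot) < 1e-12
```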

The exterior derivative
We defined the differential of a function f , which we now call a 0-
form, giving a 1-form df = f,i dxi . Now we want to generalize the
notion of differential so that d can act on k-forms for arbitrary k. This
generalized differential

                              d : k-forms → (k + 1)-forms

is called the exterior derivative. It is defined to be linear and to act
on one term in the sum over basis elements by

\begin{align*}
d \left( f_{i_1 \ldots i_k}(x)\, dx_{i_1} \wedge \cdots \wedge dx_{i_k} \right) &= \left( df_{i_1 \ldots i_k}(x) \right) \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} \\
&= \sum_j f_{i_1 \ldots i_k, j}\, dx_j \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k}.
\end{align*}

   Clearly some examples are called for, so let us look again at three
dimensional Euclidean space.
k = 0: For a 0-form $f$, $df = \sum_i f_{,i}\, dx_i$, as we defined earlier. In terms
    of vectors, $df \sim \nabla f$.

k = 1: For a 1-form $\omega = \sum_i \omega_i\, dx_i$, $d\omega = \sum_i d\omega_i \wedge dx_i = \sum_{ij} \omega_{i,j}\, dx_j \wedge
    dx_i = \sum_{ij} (\omega_{j,i} - \omega_{i,j})\, dx_i \otimes dx_j$, corresponding to a two form with
    $B_{ij} = \omega_{j,i} - \omega_{i,j}$. These $B_{ij}$ are exactly the things which must
    vanish if $\omega$ is to be exact. In three dimensional Euclidean space,
    we have a vector $B$ with components $B_k = \frac{1}{2} \epsilon_{kij} (\omega_{j,i} - \omega_{i,j}) =
    \epsilon_{kij} \partial_i \omega_j = (\nabla \times \omega)_k$, so here the exterior derivative of a 1-form
    gives a curl, $B = \nabla \times \omega$.

k = 2: On a two form $B = \sum_{i<j} B_{ij}\, dx_i \wedge dx_j$, the exterior derivative
    gives a 3-form $C = dB = \sum_k \sum_{i<j} B_{ij,k}\, dx_k \wedge dx_i \wedge dx_j$. In three-
    dimensional Euclidean space, this reduces to
\[
C = \sum_k \sum_{i<j} \left( \partial_k \epsilon_{ij\ell} B_\ell \right) \epsilon_{kij}\, dx_1 \wedge dx_2 \wedge dx_3 = \sum_k \partial_k B_k\, dx_1 \wedge dx_2 \wedge dx_3,
\]
    so $C(x) = \nabla \cdot B$, and the exterior derivative on a 2-form gives the
    divergence of the corresponding vector.

k = 3: If C is a 3-form, dC is a 4-form. In three dimensions there
    cannot be any 4-forms, so dC = 0 for all such forms.
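The three vector-calculus readings of $d$ can be spot-checked with central finite differences (a sketch, not part of the original text; the sample function, vector field, and point are arbitrary choices); the vanishing of curl grad $f$ and div curl $A$ anticipates the result $d^2 = 0$ proved below:

```python
# Finite-difference gradient, curl, and divergence in E^3, and checks that
# curl(grad f) = 0 and div(curl A) = 0 at a sample point.

def grad(f, x, h=1e-4):
    g = []
    for i in range(3):
        xp = list(x); xm = list(x); xp[i] += h; xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def curl(A, x, h=1e-4):
    def d(i, j):   # dA_j / dx_i
        xp = list(x); xm = list(x); xp[i] += h; xm[i] -= h
        return (A(xp)[j] - A(xm)[j]) / (2 * h)
    return [d(1, 2) - d(2, 1), d(2, 0) - d(0, 2), d(0, 1) - d(1, 0)]

def div(A, x, h=1e-4):
    total = 0.0
    for i in range(3):
        xp = list(x); xm = list(x); xp[i] += h; xm[i] -= h
        total += (A(xp)[i] - A(xm)[i]) / (2 * h)
    return total

f = lambda x: x[0]**2 * x[1] + x[2]**3
A = lambda x: [x[1] * x[2], x[0]**2, x[0] * x[1] * x[2]]
x0 = [0.4, -0.8, 1.1]

assert all(abs(c) < 1e-6 for c in curl(lambda x: grad(f, x), x0))
assert abs(div(lambda x: curl(A, x), x0)) < 1e-5
```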

We can summarize the action of the exterior derivative in three dimen-
sions in this diagram:
\[
f \xrightarrow{\ d\ } \omega^{(1)} \sim A \xrightarrow{\ d\ } \omega^{(2)} \sim B \xrightarrow{\ d\ } \omega^{(3)},
\]
where the three maps correspond to $\nabla f$, $\nabla \times A$, and $\nabla \cdot B$, respectively.

    Now that we have $d$ operating on all $k$-forms, we can ask what
happens if we apply it twice. Looking first in three dimensions, on a 0-
form we get $d^2 f = dA$ for $A \sim \nabla f$, and $dA \sim \nabla \times A$, so $d^2 f \sim \nabla \times \nabla f$.
But the curl of a gradient is zero, so $d^2 = 0$ in this case. On a one form
$d^2 A = dB$, $B \sim \nabla \times A$ and $dB \sim \nabla \cdot B = \nabla \cdot (\nabla \times A)$. Now we have
the divergence of a curl, which is also zero. For higher forms in three
dimensions we can only get zero because the degree of the form would
be greater than three. Thus we have a strong hint that $d^2$ might vanish
in general. To verify this, we apply $d^2$ to $\omega^{(k)} = \sum \omega_{i_1 \ldots i_k}\, dx_{i_1} \wedge \cdots \wedge dx_{i_k}$.
Then

\begin{align*}
d\omega &= \sum_j \sum_{i_1 < i_2 < \cdots < i_k} \left( \partial_j \omega_{i_1 \ldots i_k} \right) dx_j \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} \\
d(d\omega) &= \sum_{\ell j} \sum_{i_1 < i_2 < \cdots < i_k} \underbrace{\left( \partial_\ell \partial_j \omega_{i_1 \ldots i_k} \right)}_{\text{symmetric}} \underbrace{dx_\ell \wedge dx_j}_{\text{antisymmetric}} \wedge\, dx_{i_1} \wedge \cdots \wedge dx_{i_k} = 0.
\end{align*}

This is a very important result. A k-form which is the exterior deriva-
tive of some (k − 1)-form is called exact, while a k-form whose exterior

derivative vanishes is called closed, and we have just proven that all
exact k-forms are closed.
    The converse is a more subtle question. In general, there are k-
forms which are closed but not exact, given by harmonic functions
on the manifold M, which form what is known as the cohomology of
M. This has to do with global properties of the space, however, and
locally every closed form can be written as an exact one.⁴ The precisely
stated theorem, known as Poincaré's Lemma, is that if $\omega$ is a closed
$k$-form on a coordinate neighborhood $U$ of a manifold $M$, and if $U$
is contractible to a point, then ω is exact on U. We will ignore the
possibility of global obstructions and assume that we can write closed
k-forms in terms of an exterior derivative acting on a (k − 1)-form.
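The global obstruction mentioned in footnote 4 can be made quantitative (a sketch, not part of the original text): integrating $\omega = (-y\,dx + x\,dy)/r^2$ around the unit circle gives $2\pi$, which would have to vanish if $\omega$ were globally the differential of a function.

```python
# Loop integral of the closed 1-form (-y dx + x dy)/r^2 around the unit
# circle; on the circle the integrand reduces to d(theta), so the answer
# is 2*pi rather than 0.
import math

N = 100000
total = 0.0
for n in range(N):
    t = 2 * math.pi * (n + 0.5) / N
    x, y = math.cos(t), math.sin(t)
    dx, dy = -math.sin(t), math.cos(t)      # dx/dt, dy/dt
    r2 = x * x + y * y
    total += (-y * dx + x * dy) / r2 * (2 * math.pi / N)

assert abs(total - 2 * math.pi) < 1e-6
```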

Coordinate independence of k-forms
We have introduced forms in a way which makes them appear depen-
dent on the coordinates xi used to describe the space M. This is not
what we want at all.⁵ We want to be able to describe physical quantities
that have intrinsic meaning independent of a coordinate system.
If we are presented with another set of coordinates yj describing the
same physical space, the points in this space set up a mapping, ideally
an isomorphism, from one coordinate space to the other, y = y(x). If
a function represents a physical field independent of coordinates, the
actual function f (x) used with the x coordinates must be replaced by
   ⁴An example may be useful. In two dimensions, the 1-form $\omega = -y\,r^{-2}\, dx +
x\,r^{-2}\, dy$ satisfies $d\omega = 0$ wherever it is well defined, but it is not well defined at the
origin. Locally, we can write ω = dθ, where θ is the polar coordinate. But θ is
not, strictly speaking, a function on the plane, even on the plane with the origin
removed, because it is not single-valued. It is a well defined function on the plane
with a half axis removed, which leaves a simply-connected region, a region with no
holes. In fact, this is the general condition for the exactness of a 1-form — a closed
1-form on a simply connected manifold is exact.
   5
     Indeed, most mathematical texts will first define an abstract notion of a vector
in the tangent space as a directional derivative operator, specified by equivalence
classes of parameterized paths on M. Then 1-forms are defined as duals to these
vectors. In the first step any coordinatization of $M$ is tied to the corresponding
basis of the vector space $\mathbb{R}^n$. While this provides an elegant coordinate-independent
way of defining the forms, the abstract nature of this definition of vectors can be
unsettling to a physicist.

another function $\tilde f(y)$ when using the $y$ coordinates. That they both de-
scribe the physical value at a given physical point requires $f(x) = \tilde f(y)$
when $y = y(x)$, or more precisely⁶ $f(x) = \tilde f(y(x))$. This associated
function and coordinate system is called a scalar field.
    If we think of the differential $df$ as the change in $f$ corresponding
to an infinitesimal change $dx$, then clearly $d\tilde f$ is the same thing in
different coordinates, provided we understand the $dy_i$ to represent the
same physical displacement as $dx$ does. That means
\[
dy_k = \sum_j \frac{\partial y_k}{\partial x_j}\, dx_j.
\]

As $f(x) = \tilde f(y(x))$ and $\tilde f(y) = f(x(y))$, the chain rule gives
\[
\frac{\partial f}{\partial x_i} = \sum_j \frac{\partial \tilde f}{\partial y_j} \frac{\partial y_j}{\partial x_i}, \qquad
\frac{\partial \tilde f}{\partial y_j} = \sum_i \frac{\partial f}{\partial x_i} \frac{\partial x_i}{\partial y_j},
\]
so
\begin{align*}
d\tilde f &= \sum_k \frac{\partial \tilde f}{\partial y_k}\, dy_k = \sum_{ijk} \frac{\partial f}{\partial x_i} \frac{\partial x_i}{\partial y_k} \frac{\partial y_k}{\partial x_j}\, dx_j \\
&= \sum_{ij} \frac{\partial f}{\partial x_i}\, \delta_{ij}\, dx_j = \sum_i f_{,i}\, dx_i = df.
\end{align*}


We impose this transformation law in general on the coefficients in our
$k$-forms, to make the $k$-form invariant, which means that the coefficients
are covariant,
\begin{align*}
\tilde\omega_j &= \sum_i \frac{\partial x_i}{\partial y_j}\, \omega_i, \\
\tilde\omega_{j_1 \ldots j_k} &= \sum_{i_1, i_2, \ldots, i_k} \prod_{\ell=1}^{k} \frac{\partial x_{i_\ell}}{\partial y_{j_\ell}}\, \omega_{i_1 \ldots i_k}.
\end{align*}

   ⁶More elegantly, giving the map $x \to y$ the name $\phi$, so $y = \phi(x)$, we can state
the relation as $f = \tilde f \circ \phi$.

Integration of k-forms
Suppose we have a $k$-dimensional smooth “surface” $S$ in $M$, parame-
terized by coordinates $(u_1, \cdots, u_k)$. We define the integral of a $k$-form
\[
\omega^{(k)} = \sum_{i_1 < \cdots < i_k} \omega_{i_1 \ldots i_k}\, dx_{i_1} \wedge \cdots \wedge dx_{i_k}
\]
over $S$ by
\[
\int_S \omega^{(k)} = \sum_{i_1, i_2, \ldots, i_k} \int \omega_{i_1 \ldots i_k}(x(u)) \prod_{\ell=1}^{k} \frac{\partial x_{i_\ell}}{\partial u_\ell}\, du_1\, du_2 \cdots du_k.
\]

    We had better give some examples. For $k = 1$, the “surface” is
actually a path $\Gamma : u \to x(u)$, and
\[
\int_\Gamma \sum_i \omega_i\, dx_i = \int_{u_{\min}}^{u_{\max}} \sum_i \omega_i(x(u)) \frac{\partial x_i}{\partial u}\, du,
\]
which seems obvious. In vector notation this is $\int_\Gamma A \cdot d\vec r$, the path
integral of the vector $A$.
    For $k = 2$,
\[
\int_S \omega^{(2)} = \sum_{ij} \int B_{ij} \frac{\partial x_i}{\partial u} \frac{\partial x_j}{\partial v}\, du\, dv.
\]
In three dimensions, the parallelogram which is the image of the rect-
angle [u, u+du]×[v, v +dv] has edges (∂x/∂u)du and (∂x/∂v)dv, which
has an area equal to the magnitude of

                                                     ∂x ∂x
                                  “dS” =               ×   dudv
                                                     ∂u ∂v

and a normal in the direction of “$dS$”. Writing $B_{ij}$ in terms of the
corresponding vector $B$, $B_{ij} = \epsilon_{ijk} B_k$, so
\[
\int_S \omega^{(2)} = \int_S \sum_{ijk} \epsilon_{ijk} B_k \left( \frac{\partial x}{\partial u} \right)_i \left( \frac{\partial x}{\partial v} \right)_j du\, dv
= \int_S \sum_k B_k \left( \frac{\partial x}{\partial u} \times \frac{\partial x}{\partial v} \right)_k du\, dv = \int_S B \cdot dS,
\]

so $\int_S \omega^{(2)}$ gives the flux of $B$ through the surface.
    Similarly for $k = 3$ in three dimensions,
\[
\sum_{ijk} \epsilon_{ijk} \left( \frac{\partial x}{\partial u} \right)_i \left( \frac{\partial x}{\partial v} \right)_j \left( \frac{\partial x}{\partial w} \right)_k du\, dv\, dw
\]
is the volume of the parallelepiped which is the image of $[u, u + du] \times
[v, v + dv] \times [w, w + dw]$. As $\omega_{ijk} = \omega_{123}\, \epsilon_{ijk}$, this is exactly what appears:
\[
\int \omega^{(3)} = \sum_{ijk} \int \epsilon_{ijk}\, \omega_{123} \frac{\partial x_i}{\partial u} \frac{\partial x_j}{\partial v} \frac{\partial x_k}{\partial w}\, du\, dv\, dw = \int \omega_{123}(x)\, dV.
\]
    Notice that we have only defined the integration of k-forms over
submanifolds of dimension k, not over other-dimensional submanifolds.
These are the only integrals which have coordinate invariant meanings.
    We state⁷ a marvelous theorem, special cases of which you have seen
often before, known as Stokes’ Theorem. Let C be a k-dimensional
submanifold of M, with ∂C its boundary. Let ω be a (k − 1)-form.
Then Stokes' theorem says
\[
\int_C d\omega = \int_{\partial C} \omega. \tag{6.11}
\]

    This elegant jewel is actually familiar in several contexts in three
dimensions. If $k = 2$, $C$ is a surface, usually called $S$, bounded by a
closed path $\Gamma = \partial S$. If $\omega$ is a 1-form associated with $A$, then $\int_\Gamma \omega =
\int_\Gamma A \cdot d\vec\ell$. $d\omega$ is the 2-form $\sim \nabla \times A$, and $\int_S d\omega = \int_S (\nabla \times A) \cdot dS$, so
we see that this Stokes' theorem includes the one we first learned by
that name. But it also includes other possibilities. We can try $k = 3$,
where $C = V$ is a volume with surface $S = \partial V$. Then if $\omega \sim B$ is a
two form, $\int_S \omega = \int_S B \cdot dS$, while $d\omega \sim \nabla \cdot B$, so $\int_V d\omega = \int_V \nabla \cdot B\, dV$,
so here Stokes' general theorem gives Gauss's theorem. Finally, we
could consider $k = 1$, $C = \Gamma$, which has a boundary $\partial C$ consisting
of two points, say $A$ and $B$. Our 0-form $\omega = f$ is a function, and
Stokes' theorem gives⁸ $\int_\Gamma df = f(B) - f(A)$, the “fundamental theorem
of calculus”.
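A numerical spot-check of (6.11) in its Green's-theorem guise (a sketch, not part of the original text; the form $\omega = x^2 y\, dy$ and the unit square are arbitrary choices): here $d\omega = 2xy\, dx \wedge dy$, and both sides of the theorem come out to $1/2$.

```python
# Midpoint-rule check of Stokes' theorem for k = 2 in the plane.
N = 200
h = 1.0 / N

# left side of (6.11): integral of d(omega) = 2 x y dx ^ dy over the square
lhs = sum(2 * ((i + 0.5) * h) * ((j + 0.5) * h)
          for i in range(N) for j in range(N)) * h * h

# right side: integral of omega = x^2 y dy around the boundary, counter-
# clockwise; only the right edge (x = 1, y from 0 to 1) contributes, since
# dy = 0 on the top and bottom edges and x = 0 on the left edge
rhs = sum(1.0 ** 2 * ((j + 0.5) * h) for j in range(N)) * h

assert abs(lhs - 0.5) < 1e-9
assert abs(rhs - 0.5) < 1e-9
```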
     7
      For a proof and for a more precise explanation of its meaning, we refer the reader
to the mathematical literature. In particular [10] and [3] are advanced calculus texts
which give elementary discussions in Euclidean 3-dimensional space. A more general
treatment is (possibly???) given in [11].
    8
      Note that there is a direction associated with the boundary, which is induced

6.6       The natural symplectic 2-form
We now turn our attention back to phase space, with a set of canonical
coordinates $(q_i, p_i)$. Using these coordinates we can define a particular
1-form $\omega_1 = \sum_i p_i\, dq_i$. For a point transformation $Q_i = Q_i(q_1, \ldots, q_n, t)$
we may use the same Lagrangian, reexpressed in the new variables, of
course. Here the $Q_i$ are independent of the velocities $\dot q_j$, so on phase
space⁹ $dQ_i = \sum_j (\partial Q_i / \partial q_j)\, dq_j$. The new velocities are given by

\[
\dot Q_i = \sum_j \frac{\partial Q_i}{\partial q_j} \dot q_j + \frac{\partial Q_i}{\partial t}.
\]

Thus the old canonical momenta,
\[
p_i = \left. \frac{\partial L(q, \dot q, t)}{\partial \dot q_i} \right|_{q,t}
    = \sum_j \left. \frac{\partial L(Q, \dot Q, t)}{\partial \dot Q_j} \right|_{q,t} \left. \frac{\partial \dot Q_j}{\partial \dot q_i} \right|_{q,t}
    = \sum_j P_j \frac{\partial Q_j}{\partial q_i}.
\]

Thus the form ω1 may be written
                                               ∂Qj
                     ω1 =                 Pj       dqi =          Pj dQj ,
                                  i   j        ∂qi           j

so the form of $\omega_1$ is invariant under point transformations. This is too
limited, however, for our current goals of considering general canonical
transformations on phase space, under which $\omega_1$ will not be invariant.
However, its exterior derivative
$$\omega_2 := d\omega_1 = \sum_i dp_i \wedge dq_i$$

is invariant under all canonical transformations, as we shall show momentarily.
This makes it special, the natural symplectic structure on phase space.

   ^8 (continued) by a direction associated with $C$ itself. This gives an ambiguity in what we have
stated, for example how the direction of an open surface induces a direction on the
closed loop which bounds it. Changing this direction would clearly reverse the sign
of $\oint \vec A\cdot d\vec\ell$. We have not worried about this ambiguity, but we cannot avoid noticing
the appearance of the sign in this last example.
   ^9 We have not included a term $(\partial Q_i/\partial t)\,dt$ which would be necessary if we were
considering a form in the $2n+1$ dimensional extended phase space which includes time
as one of its coordinates.

170                            CHAPTER 6. HAMILTON'S EQUATIONS

We can reexpress $\omega_2$ in terms of our combined coordinate notation $\eta_i$, because

$$-\sum_{i<j} J_{ij}\,d\eta_i \wedge d\eta_j = -\sum_i dq_i \wedge dp_i = \sum_i dp_i \wedge dq_i = \omega_2.$$
    We must now show that the natural symplectic structure is indeed
form invariant under canonical transformation. Thus if $Q_i, P_i$ are a
new set of canonical coordinates, combined into $\zeta_j$, we expect the
corresponding object formed from them, $\omega_2' = -\sum_{ij} J_{ij}\,d\zeta_i \otimes d\zeta_j$, to reduce
to the same 2-form, $\omega_2$. We first note that
$$d\zeta_i = \sum_j \frac{\partial\zeta_i}{\partial\eta_j}\,d\eta_j = \sum_j M_{ij}\,d\eta_j,$$


with the same Jacobian matrix $M$ we met in (6.3). Thus
$$\omega_2' = -\sum_{ij} J_{ij}\,d\zeta_i \otimes d\zeta_j = -\sum_{ij} J_{ij} \sum_k M_{ik}\,d\eta_k \otimes \sum_\ell M_{j\ell}\,d\eta_\ell = -\sum_{k\ell}\left(M^T\cdot J\cdot M\right)_{k\ell} d\eta_k \otimes d\eta_\ell.$$


Things will work out if we can show $M^T\cdot J\cdot M = J$, whereas what we
know for canonical transformations from Eq. (6.3) is that $M\cdot J\cdot M^T = J$.
We also know $M$ is invertible and that $J^2 = -1$, so if we multiply
this equation from the left by $-J\cdot M^{-1}$ and from the right by $J\cdot M$,
we learn that
$$-J\cdot M^{-1}\cdot M\cdot J\cdot M^T\cdot J\cdot M = -J\cdot M^{-1}\cdot J\cdot J\cdot M = J\cdot M^{-1}\cdot M = J,$$
while at the same time the left-hand side is
$$-J\cdot M^{-1}\cdot M\cdot J\cdot M^T\cdot J\cdot M = -J\cdot J\cdot M^T\cdot J\cdot M = M^T\cdot J\cdot M,$$

which is what we wanted to prove. Thus we have shown that the 2-form
ω2 is form-invariant under canonical transformations, and deserves its
name.
     One important property of the 2-form $\omega_2$ on phase space is that
it is non-degenerate; there is no vector $v$ such that $\omega_2(\cdot, v) = 0$, which
follows simply from the fact that the matrix $J_{ij}$ is non-singular.
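Both conditions are easy to test numerically. The following sketch (an illustration, not part of the text; the particular matrix $A$ and the point-transformation block form of $M$ are assumptions) builds a symplectic Jacobian and checks $M\cdot J\cdot M^T = J$, the equivalent condition $M^T\cdot J\cdot M = J$, and the non-singularity of $J$:

```python
import numpy as np

n = 3
Z = np.zeros((n, n))
J = np.block([[Z, np.eye(n)], [-np.eye(n), Z]])  # the matrix J_ij above

# A Jacobian of point-transformation type: Q = A q forces P = (A^T)^{-1} p,
# giving M = diag(A, (A^T)^{-1}), which satisfies M.J.M^T = J by construction.
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n)) + 4 * np.eye(n)  # generically invertible
M = np.block([[A, Z], [Z, np.linalg.inv(A).T]])

print(np.allclose(M @ J @ M.T, J))        # condition (6.3): True
print(np.allclose(M.T @ J @ M, J))        # the equivalent condition shown above: True
print(np.linalg.matrix_rank(J) == 2 * n)  # J non-singular, so omega_2 is non-degenerate
```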

Extended phase space
One way of looking at the evolution of a system is in phase space, where
a given system corresponds to a point moving with time, and the general
equations of motion correspond to a velocity field. Another way is to
consider extended phase space, a $2n+1$ dimensional space with
coordinates $(q_i, p_i, t)$, for which a system's motion is a path, monotone
in $t$. By the modified Hamilton's principle, the path of a system in this
space is an extremum of the action $I = \int_{t_i}^{t_f}\left(\sum_i p_i\,dq_i - H(q, p, t)\,dt\right)$, which
is the integral of the one-form
$$\omega_3 = \sum_i p_i\,dq_i - H(q, p, t)\,dt.$$

The exterior derivative of this form involves the symplectic structure,
ω2 , as dω3 = ω2 − dH ∧ dt. The 2-form ω2 on phase space is non-
degenerate, and every vector in phase space is also in extended phase
space. On such a vector, on which dt gives zero, the extra term gives
only something in the dt direction, so there are still no vectors in this
subspace which are annihilated by dω3 . Thus there is at most one di-
rection in extended phase space which is annihilated by dω3 . But any
2-form in an odd number of dimensions must annihilate some vector,
because in a given basis it corresponds to an antisymmetric matrix $B_{ij}$,
and in an odd number of dimensions $\det B = \det B^T = \det(-B) =
(-1)^{2n+1}\det B = -\det B$, so $\det B = 0$ and the matrix is singular,
is the tangent to the path the system takes through extended phase
space.
    One way to see this is to simply work out what $d\omega_3$ is and apply it
to the vector $\xi$, which is proportional to $v = (\dot q_i, \dot p_i, 1)$. So we wish to
show $d\omega_3(\cdot, v) = 0$. Evaluating
$$\sum dp_i \wedge dq_i(\cdot, v) = \sum dp_i\,dq_i(v) - \sum dq_i\,dp_i(v) = \sum \dot q_i\,dp_i - \sum \dot p_i\,dq_i,$$
$$dH \wedge dt(\cdot, v) = dH\,dt(v) - dt\,dH(v) = \left(\sum \frac{\partial H}{\partial q_i}dq_i + \sum \frac{\partial H}{\partial p_i}dp_i + \frac{\partial H}{\partial t}dt\right)\cdot 1 - dt\left(\sum \dot q_i\frac{\partial H}{\partial q_i} + \sum \dot p_i\frac{\partial H}{\partial p_i} + \frac{\partial H}{\partial t}\right)$$
$$= \sum \frac{\partial H}{\partial q_i}dq_i + \sum \frac{\partial H}{\partial p_i}dp_i - dt\left(\sum \dot q_i\frac{\partial H}{\partial q_i} + \sum \dot p_i\frac{\partial H}{\partial p_i}\right),$$
so
$$d\omega_3(\cdot, v) = \sum\left(\dot q_i - \frac{\partial H}{\partial p_i}\right)dp_i - \sum\left(\dot p_i + \frac{\partial H}{\partial q_i}\right)dq_i + \left(\sum \dot q_i\frac{\partial H}{\partial q_i} + \sum \dot p_i\frac{\partial H}{\partial p_i}\right)dt = 0,$$
where the vanishing is due to the Hamilton equations of motion.
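The same cancellation can be checked symbolically. A short sympy sketch (an illustration, assuming one degree of freedom and an arbitrary Hamiltonian $H(q, p, t)$; the variable names are hypothetical) evaluates $d\omega_3(u, v)$ on an arbitrary vector $u$ in extended phase space:

```python
import sympy as sp

q, p, t, uq, up, ut = sp.symbols('q p t u_q u_p u_t')
H = sp.Function('H')(q, p, t)
Hq, Hp, Ht = H.diff(q), H.diff(p), H.diff(t)

v = (Hp, -Hq, 1)   # tangent vector (qdot, pdot, 1) via Hamilton's equations
u = (uq, up, ut)   # an arbitrary vector in extended phase space

def dH(w):
    # the 1-form dH applied to a vector w = (w_q, w_p, w_t)
    return Hq*w[0] + Hp*w[1] + Ht*w[2]

# d(omega_3)(u, v) = (dp ^ dq)(u, v) - (dH ^ dt)(u, v)
val = (u[1]*v[0] - u[0]*v[1]) - (dH(u)*1 - u[2]*dH(v))
print(sp.simplify(val))   # -> 0
```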
    There is a more abstract way of understanding why dω3 (·, v) van-
ishes, from the modified Hamilton’s principle, which states that if the
path taken were infinitesimally varied from the physical path, there
would be no change in the action. But this change is the integral of ω3
along a loop, forwards in time along the first trajectory and backwards
along the second. From Stokes’ theorem this means the integral of dω3
over a surface connecting these two paths vanishes. But this surface is
a sum over infinitesimal parallelograms one side of which is v ∆t and
the other side of which10 is (δq(t), δp(t), 0). As this latter vector is an
arbitrary function of t, each parallelogram must independently give 0,
so that its contribution to the integral, dω3 ((δq, δp, 0), v)∆t = 0. In
addition, dω3(v, v) = 0, of course, so dω3 (·, v) vanishes on a complete
basis of vectors and is therefore zero.

6.6.1      Generating Functions
Consider a canonical transformation $(q, p) \to (Q, P)$, and the two 1-forms
$\omega_1 = \sum_i p_i\,dq_i$ and $\omega_1' = \sum_i P_i\,dQ_i$. We have mentioned that the
difference of these will not vanish in general, but the exterior derivative
of this difference, $d(\omega_1 - \omega_1') = \omega_2 - \omega_2' = 0$, so $\omega_1 - \omega_1'$ is a closed 1-form.
Thus it is exact^11, and there must be a function $F$ on phase space
such that $\omega_1 - \omega_1' = dF$. We call $F$ the generating function of the
^10 It is slightly more elegant to consider the path parameterized independently of
time, and consider arbitrary variations $(\delta q, \delta p, \delta t)$, because the integral involved in
the action, being the integral of a 1-form, is independent of the parameterization.
With this approach we find immediately that $d\omega_3(\cdot, v)$ vanishes on all vectors.
^11 We are assuming phase space is simply connected, or else we are ignoring any
complications which might ensue from $F$ not being globally well defined.

canonical transformation. If the transformation (q, p) → (Q, P ) is
such that the old q’s alone, without information about the old p’s, do
not impose any restrictions on the new Q’s, then the dq and dQ are
independent, and we can use q and Q to parameterize phase space12 .
Then knowledge of the function F (q, Q) determines the transformation,
as
$$p_i = \left.\frac{\partial F}{\partial q_i}\right|_Q, \qquad -P_i = \left.\frac{\partial F}{\partial Q_i}\right|_q.$$
    If the canonical transformation depends on time, the function F will
also depend on time. Now if we consider the motion in extended phase
space, we know the phase trajectory that the system takes through
extended phase space is determined by Hamilton’s equations, which
could be written in any set of canonical coordinates, so in particular
there is some Hamiltonian $K(Q, P, t)$ such that the tangent to the phase
trajectory, $v$, is annihilated by $d\omega_3'$, where $\omega_3' = \sum P_i\,dQ_i - K(Q, P, t)\,dt$.
Now in general knowing that two 2-forms both annihilate the same
vector would not be sufficient to identify them, but in this case we also
know that restricting $d\omega_3$ and $d\omega_3'$ to their action on the $dt = 0$ subspace
gives the same 2-form $\omega_2$. That is to say, if $u$ and $u'$ are two vectors
with time components zero, we know that $(d\omega_3 - d\omega_3')(u, u') = 0$. Any
vector can be expressed as a multiple of $v$ and some vector $u$ with time
component zero, and as both $d\omega_3$ and $d\omega_3'$ annihilate $v$, we see that
$d\omega_3 - d\omega_3'$ vanishes on all pairs of vectors, and is therefore zero. Thus
$\omega_3 - \omega_3'$ is a closed 1-form, which must be at least locally exact, and
indeed $\omega_3 - \omega_3' = dF$, where $F$ is the generating function we found
above^13. Thus $dF = \sum p_i\,dq_i - \sum P_i\,dQ_i + (K - H)\,dt$, or
$$K = H + \frac{\partial F}{\partial t}.$$
    The function F (q, Q, t) is what Goldstein calls F1 . The existence
of F as a function on extended phase space holds even if the Q and q
^12 Note that this is the opposite extreme from a point transformation, which is a
canonical transformation for which the $Q$'s depend only on the $q$'s, independent of
the $p$'s.
^13 From its definition in that context, we found that in phase space, $dF = \omega_1 - \omega_1'$,
which is the part of $\omega_3 - \omega_3'$ not in the time direction. Thus if $\omega_3 - \omega_3' = dF'$ for
some other function $F'$, we know $dF' - dF = (K' - K)\,dt$ for some new Hamiltonian
function $K'(Q, P, t)$, so this corresponds to an ambiguity in $K$.

are not independent, but in this case F will need to be expressed as a
function of other coordinates. Suppose the new P ’s and the old q’s are
independent, so we can write $F(q, P, t)$. Then define $F_2 = \sum Q_i P_i + F$.
Then
$$dF_2 = \sum Q_i\,dP_i + \sum P_i\,dQ_i + \sum p_i\,dq_i - \sum P_i\,dQ_i + (K - H)\,dt = \sum Q_i\,dP_i + \sum p_i\,dq_i + (K - H)\,dt,$$
so
$$Q_i = \frac{\partial F_2}{\partial P_i}, \qquad p_i = \frac{\partial F_2}{\partial q_i}, \qquad K(Q, P, t) = H(q, p, t) + \frac{\partial F_2}{\partial t}.$$
   The generating function can be a function of old momenta rather
than the old coordinates. Making one choice for the old coordinates
and one for the new, there are four kinds of generating functions as
described by Goldstein. Let us consider some examples. The function
$F_1 = \sum_i q_i Q_i$ generates an interchange of $p$ and $q$,
$$Q_i = p_i, \qquad P_i = -q_i,$$
which leaves the Hamiltonian unchanged. We saw this clearly leaves
the form of Hamilton's equations unchanged. An interesting generator
of the second type is $F_2 = \sum_i \lambda_i q_i P_i$, which gives $Q_i = \lambda_i q_i$, $P_i = \lambda_i^{-1} p_i$,
a simple change in scale of the coordinates with a corresponding inverse
scale change in momenta to allow $[Q_i, P_j] = \delta_{ij}$ to remain unchanged.
This also doesn't change $H$. For $\lambda = 1$, this is the identity transformation,
for which $F = 0$, of course.
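Both examples can be verified mechanically. A sympy sketch (an illustration, assuming one degree of freedom; the names are hypothetical) checks that each transformation preserves the canonical bracket $[Q, P] = 1$:

```python
import sympy as sp

q, p, lam = sp.symbols('q p lambda', positive=True)

def pb(f, g):
    # Poisson bracket [f, g] in one degree of freedom
    return f.diff(q)*g.diff(p) - f.diff(p)*g.diff(q)

# F1 = q*Q gives p = dF1/dq = Q and P = -dF1/dQ = -q, the interchange above
Q1, P1 = p, -q
print(sp.simplify(pb(Q1, P1)))   # [Q, P] = 1: canonical

# F2 = lambda*q*P gives Q = dF2/dP = lambda*q and p = dF2/dq = lambda*P
Q2, P2 = lam*q, p/lam
print(sp.simplify(pb(Q2, P2)))   # [Q, P] = 1 again
```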
    Placing point transformations in this language provides another ex-
ample. For a point transformation, $Q_i = f_i(q_1, \ldots, q_n, t)$, which is what
one gets with a generating function
$$F_2 = \sum_i f_i(q_1, \ldots, q_n, t)\,P_i.$$
Note that
$$p_i = \frac{\partial F_2}{\partial q_i} = \sum_j \frac{\partial f_j}{\partial q_i}\,P_j$$
is at any point $q$ a linear transformation of the momenta, required to
preserve the canonical Poisson bracket, but this transformation is $q$

dependent, so while Q is a function of q and t only, independent of p,
P (q, p, t) will in general have a nontrivial dependence on coordinates
as well as a linear dependence on the old momenta.
    For a harmonic oscillator, a simple scaling gives
$$H = \frac{p^2}{2m} + \frac{k}{2}q^2 = \frac12\sqrt{k/m}\left(P^2 + Q^2\right),$$
where $Q = (km)^{1/4}\,q$, $P = (km)^{-1/4}\,p$. In this form, thinking of phase
space as just some two-dimensional space, we seem to be encouraged
to consider a new, polar, coordinate system with $\theta = \tan^{-1}(Q/P)$ as
the new coordinate, and we might hope to have the radial coordinate
related to the new momentum (written $\mathcal P$ to distinguish it from $P$),
$\mathcal P = -\partial F_1/\partial\theta$. As $P = \partial F_1/\partial Q$ is also
$Q\cot\theta$, we can take $F_1 = \frac12 Q^2\cot\theta$, so $\mathcal P = -\frac12 Q^2(-\csc^2\theta) = \frac12 Q^2(1 +
P^2/Q^2) = \frac12(Q^2 + P^2) = H/\omega$. Note as $F_1$ is not time dependent, $K =
H$ and is independent of $\theta$, which is therefore an ignorable coordinate,
so its conjugate momentum $\mathcal P$ is conserved. Of course $\mathcal P$ differs from the
conserved Hamiltonian $H$ only by the factor $\omega = \sqrt{k/m}$, so this is not
unexpected. With $H$ now linear in the new momentum $\mathcal P$, the conjugate
coordinate $\theta$ grows linearly with time at the fixed rate $\dot\theta = \partial H/\partial\mathcal P = \omega$.
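The step from $F_1 = \frac12 Q^2\cot\theta$ to the new momentum can be checked symbolically; this sympy sketch (an illustration, using the same $F_1$) confirms that $-\partial F_1/\partial\theta = \frac12 Q^2(1 + \cot^2\theta)$, which is $\frac12(Q^2 + P^2)$ once $\cot\theta = P/Q$ is substituted back:

```python
import sympy as sp

Q, theta = sp.symbols('Q theta', positive=True)
F1 = sp.Rational(1, 2) * Q**2 * sp.cot(theta)

P_old = sp.diff(F1, Q)        # old momentum P = dF1/dQ = Q*cot(theta)
P_new = -sp.diff(F1, theta)   # new momentum  = -dF1/dtheta

# Compare with (1/2) Q^2 (1 + cot^2 theta) = (1/2)(Q^2 + P^2) via cot(theta) = P/Q
expr = sp.simplify(P_new - sp.Rational(1, 2)*Q**2*(1 + sp.cot(theta)**2))
print(expr)   # -> 0
```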

Infinitesimal generators, redux
Let us return to the infinitesimal canonical transformation
$$\zeta_i = \eta_i + g_i(\eta_j).$$
$M_{ij} = \partial\zeta_i/\partial\eta_j = \delta_{ij} + \partial g_i/\partial\eta_j$ needs to be symplectic, and so $G_{ij} =
\partial g_i/\partial\eta_j$ satisfies the appropriate condition for the generator of a symplectic
matrix, $G\cdot J = -J\cdot G^T$. For the generator of the canonical
transformation, we need a perturbation of the generator for the identity
transformation, which can't be in $F_1$ form (as $(q, Q)$ are not independent),
but is easily done in $F_2$ form, $F_2(q, P) = \sum_i q_i P_i + G(q, P, t)$,
with $p_i = \partial F_2/\partial q_i = P_i + \partial G/\partial q_i$, $Q_i = \partial F_2/\partial P_i = q_i + \partial G/\partial P_i$, or
$$\zeta = \begin{pmatrix} Q_i \\ P_i \end{pmatrix} = \begin{pmatrix} q_i \\ p_i \end{pmatrix} + \begin{pmatrix} 0 & \mathbb 1 \\ -\mathbb 1 & 0 \end{pmatrix}\begin{pmatrix} \partial G/\partial q_i \\ \partial G/\partial p_i \end{pmatrix} = \eta + J\cdot\nabla G,$$
where we have ignored higher order terms in inverting the $q \to Q$
relation and in replacing $\partial G/\partial Q_i$ with $\partial G/\partial q_i$.

    The change due to the infinitesimal transformation may be written
in terms of a Poisson bracket with the coordinates themselves:
$$\delta\eta = \zeta - \eta = J\cdot\nabla G = [\eta, G].$$
In the case of an infinitesimal transformation due to time evolution, the
small parameter can be taken to be $\Delta t$, and $\delta\eta = \Delta t\,\dot\eta = \Delta t\,[\eta, H]$, so we
see that the Hamiltonian acts as the generator of time translations, in
the sense that it maps the coordinate $\eta$ of a system in phase space into
the coordinates the system will have, due to its equations of motion, at
a slightly later time.
    This last example encourages us to find another interpretation of
canonical transformations. Up to now we have viewed the transforma-
tion as a change of variables describing an unchanged physical situa-
tion, just as the passive view of a rotation is to view it as a change in
the description of an unchanged physical point in terms of a rotated
set of coordinates. But rotations are also used to describe changes in
the physical situation with regards to a fixed coordinate system14 , and
similarly in the case of motion through phase space, it is natural to
think of the canonical transformation generated by the Hamiltonian as
describing the actual motion of a system through phase space rather
than as a change in coordinates. More generally, we may view a canon-
ical transformation as a diffeomorphism15 of phase space onto itself,
g : M → M with g(q, p) = (Q, P ).
    For an infinitesimal canonical transformation, this active interpre-
tation gives us a small displacement δη = [η, G] for every point η in
phase space, so we can view G and its associated infinitesimal canon-
ical transformation as producing a flow on phase space. $G$ also builds
a finite transformation by repeated application, so that we get a sequence
of canonical transformations $g^\lambda$ parameterized by $\lambda = n\Delta\lambda$.
This sequence maps an initial $\eta_0$ into a sequence of points $g^\lambda\eta_0$, each
generated from the previous one by the infinitesimal transformation
$\Delta\lambda\,G$, so $g^{\lambda+\Delta\lambda}\eta_0 - g^\lambda\eta_0 = \Delta\lambda\,[g^\lambda\eta_0, G]$. In the limit $\Delta\lambda \to 0$, with
^14 We leave to Mach and others the question of whether this distinction is real.
^15 An isomorphism $g : M \to N$ is a 1-1 map with an image including all of $N$
(onto), which is therefore invertible to form $g^{-1} : N \to M$. A diffeomorphism is an
isomorphism $g$ for which both $g$ and $g^{-1}$ are differentiable.

$n$ allowed to grow so that we consider a finite range of $\lambda$, we have a
one (continuous) parameter family of transformations $g^\lambda : M \to M$,
satisfying the differential equation
$$\frac{dg^\lambda(\eta)}{d\lambda} = \left[g^\lambda\eta, G\right].$$
This differential equation defines a phase flow on phase space. If $G$ is
not a function of $\lambda$, this has the form of a differential equation solved
by an exponential,
$$g^\lambda(\eta) = e^{\lambda[\cdot, G]}\,\eta,$$
which means
$$g^\lambda(\eta) = \eta + \lambda[\eta, G] + \frac12\lambda^2[[\eta, G], G] + \cdots.$$

    In the case that the generating function is the Hamiltonian, G = H,
this phase flow gives the evolution through time, λ is t, and the velocity
field on phase space is given by [η, G]. If the Hamiltonian is time
independent, the velocity field is fixed, and the solution is formally an
exponential.
    Let me review changes due to a generating function. In the passive
picture, we view $\eta$ and $\zeta = \eta + \delta\eta$ as alternative coordinatizations of
the same physical point in phase space. Let us call this point $A$ when
expressed in terms of the $\eta$ coordinates and $A'$ in terms of $\zeta$. For an
infinitesimal generator $F_2 = \sum_i q_i P_i + G$, $\delta\eta = J\cdot\nabla G = [\eta, G]$. A
physical scalar defined by a function $u(\eta)$ changes its functional form
to $\tilde u$, but not its value at a given physical point, so $\tilde u(A') = u(A)$. For
the Hamiltonian, there is a change in value as well, for $\tilde H$ or $\tilde K$ is not
the same as $H$, even at the corresponding point,
$$\tilde K(A') = H(A) + \frac{\partial F_2}{\partial t} = H(A) + \frac{\partial G}{\partial t}.$$
   Now consider an active view. Here a canonical transformation is
thought of as moving the point in phase space, and at the same time
changing the functions $u \to \tilde u$, $H \to \tilde K$, where we are focusing on the
form of these functions, on how they depend on their arguments. We
think of $\zeta$ as representing a different point $B$ of phase space, although
the coordinates $\eta(B)$ are the same as $\zeta(A')$. We ask how $\tilde u$ and $\tilde K$
differ from $u$ and $H$ at $B$. At the cost of differing from Goldstein by
an overall sign, let
$$\Delta u = \tilde u(B) - u(B) = u(A) - u(A') = -\delta\eta_i\,\frac{\partial u}{\partial\eta_i} = -\sum_i [\eta_i, G]\,\frac{\partial u}{\partial\eta_i} = -[u, G],$$
$$\Delta H = \tilde K(B) - H(B) = H(A) + \frac{\partial G}{\partial t} - H(A') = \frac{\partial G}{\partial t} - [H, G] = \frac{dG}{dt}.$$
    Note that if the generator of the transformation is a conserved quan-
tity, the Hamiltonian is unchanged, in that it is the same function after
the transformation as it was before. That is, the Hamiltonian is form
invariant.
    We have seen that conserved quantities are generators of symmetries
of the problem, transformations which can be made without changing
the Hamiltonian. We saw that the symmetry generators form a closed
algebra under Poisson bracket, and that finite symmetry transforma-
tions result from exponentiating the generators. Let us discuss the
more common conserved quantities in detail, showing how they gen-
erate symmetries. We have already seen that ignorable coordinates
lead to conservation of the corresponding momentum. Now the reverse
comes if we assume one of the momenta, say pI , is conserved. Then
from our discussion we know that the generator G = pI will generate
canonical transformations which are symmetries of the system. Those
transformations are
$$\delta q_j = [q_j, p_I] = \delta_{jI}, \qquad \delta p_j = [p_j, p_I] = 0.$$

Thus the transformation just changes the one coordinate qI and leaves
all the other coordinates and all momenta unchanged. In other words,
it is a translation of qI . As the Hamiltonian is unchanged, it must be
independent of qI , and qI is an ignorable coordinate.

    Second, consider the angular momentum component $\vec\omega\cdot\vec L = \epsilon_{ijk}\,\omega_i r_j p_k$
for a point particle with $q = \vec r$. As a generator, $\vec\omega\cdot\vec L$ produces changes
$$\delta r_\ell = [r_\ell, \epsilon_{ijk}\,\omega_i r_j p_k] = \epsilon_{ijk}\,\omega_i r_j\,[r_\ell, p_k] = \epsilon_{ijk}\,\omega_i r_j\,\delta_{\ell k} = \epsilon_{ij\ell}\,\omega_i r_j = (\vec\omega\times\vec r)_\ell,$$
which is how the point moves under a rotation about the axis $\vec\omega$. The
momentum also changes,
$$\delta p_\ell = [p_\ell, \epsilon_{ijk}\,\omega_i r_j p_k] = \epsilon_{ijk}\,\omega_i p_k\,[p_\ell, r_j] = \epsilon_{ijk}\,\omega_i p_k\,(-\delta_{\ell j}) = -\epsilon_{i\ell k}\,\omega_i p_k = (\vec\omega\times\vec p)_\ell,$$
so $\vec p$ also rotates.
    By Poisson’s theorem, the set of constants of the motion is closed
under Poisson bracket, and given two such generators, the bracket is
also a symmetry, so the symmetries form a Lie algebra under Poisson
bracket. For a free particle, p and L are both symmetries, and we have
just seen that [p , Li ] = ik pk , a linear combination of symmetries,
while of course [pi , pj ] = 0 generates the identity transformation and is
in the algebra. What about [Li , Lj ]? As Li = ik rk p ,
          [Li , Lj ] =        [    rk p , Lj ]
                                  ik
                     =         ik rk [p , Lj ] + ik [rk , Lj ]p
                     =        − ik rk j m pm + ik jmk rm p
                     =        (δij δkm − δim δjk ) rk pm − (δij δm − δim δj ) rm p
                     =        (δia δjb − δib δja ) ra pb
                     =         kij kab ra pb = ijk Lk .                            (6.12)

We see that we get back the third component of L, so we do not get
a new kind of conserved quantity, but instead we see that the algebra
closes on the space spanned by the momenta and angular momenta. We
also note that it is impossible to have two components of $\vec L$ conserved
without the third component also being conserved. Note also that $\vec\omega\cdot\vec L$
does a rotation the same way on the three vectors $\vec r$, $\vec p$, and $\vec L$. Indeed
it will do so on any vector composed from $\vec r$ and $\vec p$, rotating all of the
physical system^16.
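The brackets above are easy to verify with a symbolic computation; this sympy sketch (an illustration, with a hand-rolled Poisson bracket; the variable names are hypothetical) checks the cyclic relations $[L_i, L_j] = \epsilon_{ijk}\,L_k$:

```python
import sympy as sp

r = sp.symbols('r1 r2 r3')
p = sp.symbols('p1 p2 p3')

def pb(f, g):
    # Poisson bracket [f, g] = sum_i df/dr_i dg/dp_i - df/dp_i dg/dr_i
    return sum(sp.diff(f, r[i])*sp.diff(g, p[i]) - sp.diff(f, p[i])*sp.diff(g, r[i])
               for i in range(3))

L1 = r[1]*p[2] - r[2]*p[1]   # components of L = r x p
L2 = r[2]*p[0] - r[0]*p[2]
L3 = r[0]*p[1] - r[1]*p[0]

print(sp.simplify(pb(L1, L2) - L3))   # -> 0, i.e. [L1, L2] = L3
print(sp.simplify(pb(L2, L3) - L1))   # -> 0
print(sp.simplify(pb(L3, L1) - L2))   # -> 0
```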
^16 If there is some rotationally non-invariant property of a particle which is not

    The above algebraic artifice is peculiar to three dimensions; in other
dimensions $d \neq 3$ there is no $\epsilon$-symbol to make a vector out of $L$, but the
angular momentum can always be treated as an antisymmetric tensor,
$L_{ij} = x_i p_j - x_j p_i$. There are $D(D-1)/2$ components, and the Lie
algebra again closes
$$[L_{ij}, L_{k\ell}] = \delta_{ik}\,L_{j\ell} - \delta_{jk}\,L_{i\ell} + \delta_{j\ell}\,L_{ik} - \delta_{i\ell}\,L_{jk}.$$

    We have related conserved quantities to generators of infinitesimal
canonical transformations, but these infinitesimals can be integrated
to produce finite transformations as well. Suppose we consider a
parameterized set of canonical transformations $\eta \to \zeta(\alpha)$, as a sequence
of transformations generated by $\delta\alpha\,G$ acting repeatedly, so that
$$\zeta(\alpha + \delta\alpha) = \zeta(\alpha) + \delta\alpha\,[\zeta(\alpha), G], \qquad\text{or}\qquad \frac{d\zeta}{d\alpha} = [\zeta, G].$$
The right side is linear in $\zeta$, so the solution of this differential equation
is, at least formally,
$$\zeta(\alpha) = e^{\alpha[\cdot, G]}\,\zeta(0) = \left(1 + \alpha[\cdot, G] + \frac12\alpha^2[[\cdot, G], G] + \cdots\right)\zeta(0) = \zeta(0) + \alpha[\zeta(0), G] + \frac12\alpha^2[[\zeta(0), G], G] + \cdots.$$
In this fashion, any Lie algebra, and in particular the Lie algebra
formed by the Poisson brackets of generators of symmetry transfor-
mations, can be exponentiated to form a continuous group, called a
Lie Group. In the case of angular momentum, the three components
form a three-dimensional Lie algebra, and the exponentials of these a
three-dimensional Lie group which is SO(3), the rotation group.

^16 (continued) built out of $\vec r$ and $\vec p$, it will not be suitably rotated by $\vec L = \vec r \times \vec p$, in which case $\vec L$ is
not the full angular momentum but only the orbital angular momentum. The
generator of a rotation of all of the physics, the full angular momentum $\vec J$, is then
the sum of $\vec L$ and another piece, called the intrinsic spin of the particle.
6.7. HAMILTON–JACOBI THEORY                                          181

6.7     Hamilton–Jacobi Theory
We have mentioned the time dependent canonical transformation that
maps the coordinates of a system at a given fixed time t0 into their
values at a later time $t$. Now let us consider the reverse transformation,
mapping $(q(t), p(t)) \to (Q = q_0, P = p_0)$. But then $\dot Q = 0$, $\dot P = 0$, and
the Hamiltonian which generates these trivial equations of motion is
$K = 0$. We denote by $S(q, P, t)$ the generating function of type 2 which
generates this transformation. It satisfies
$$K = H(q, p, t) + \frac{\partial S}{\partial t} = 0, \qquad\text{with } p_i = \frac{\partial S}{\partial q_i},$$

so S is determined by the differential equation

$$H\!\left(q, \frac{\partial S}{\partial q}, t\right) + \frac{\partial S}{\partial t} = 0, \qquad (6.13)$$

which we can think of as a partial differential equation in $n+1$ variables
$q, t$, thinking of $P$ as fixed and understood. If $H$ is independent of
time, we can solve by separating the $t$ from the $q$ dependence: we may
write $S(q, P, t) = W(q, P) - \alpha t$, where $\alpha$ is the separation constant,
independent of $q$ and $t$, but not necessarily of $P$. We get a time-independent
equation
$$H\!\left(q, \frac{\partial W}{\partial q}\right) = \alpha. \qquad (6.14)$$
The function S is known as Hamilton’s principal function, while the
function W is called Hamilton’s characteristic function, and the
equations (6.13) and (6.14) are both known as the Hamilton-Jacobi
equation. They are still partial differential equations in many variables,
but under some circumstances further separation of variables may be
possible. We consider first a system with one degree of freedom, with
a conserved $H$, which we will sometimes specify even further to the
particular case of a harmonic oscillator. Then we treat a separable
system with two degrees of freedom.
   We are looking for new coordinates (Q, P ) which are time inde-
pendent, and have the differential equation for Hamilton’s principal
function $S(q, P, t)$:
$$H\!\left(q, \frac{\partial S}{\partial q}\right) + \frac{\partial S}{\partial t} = 0.$$

For a harmonic oscillator with $H = p^2/2m + \frac12 kq^2$, this equation is
$$\left(\frac{\partial S}{\partial q}\right)^2 + kmq^2 + 2m\frac{\partial S}{\partial t} = 0. \qquad (6.15)$$

We can certainly find a separated solution of the form $S = W(q, P) -
\alpha(P)\,t$, where the first two terms of (6.15) are independent of $t$. Then
we have an ordinary differential equation,
$$\left(\frac{dW}{dq}\right)^2 = 2m\alpha - kmq^2,$$
which can be easily integrated
$$W = \int_0^q \sqrt{2m\alpha - kmq'^2}\;dq' + f(\alpha) = f(\alpha) + \frac{\alpha}{\omega}\left(\theta + \frac12\sin 2\theta\right), \qquad (6.16)$$
where we have made a substitution $\sin\theta = q\sqrt{k/2\alpha}$, and made explicit
note that the constant (in $q$) of integration, $f(\alpha)$, may depend on $\alpha$. For
other Hamiltonians, we will still have the solution to the partial differential
equation for $S$ given by separation of variables $S = W(q, P) - \alpha t$,
because $H$ was assumed time-independent, but the integral for $W$ may
not be doable analytically.
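The result (6.16) can be checked symbolically: parameterizing by $\theta$ and using the chain rule, $(dW/dq)^2$ should reproduce $2m\alpha - kmq^2$. A sympy sketch (an illustration; $f(\alpha)$ is dropped since it does not affect $dW/dq$):

```python
import sympy as sp

theta, alpha, k, m = sp.symbols('theta alpha k m', positive=True)
omega = sp.sqrt(k/m)

q = sp.sqrt(2*alpha/k) * sp.sin(theta)            # from sin(theta) = q*sqrt(k/2alpha)
W = (alpha/omega) * (theta + sp.sin(2*theta)/2)   # (6.16), without f(alpha)

dWdq = sp.diff(W, theta) / sp.diff(q, theta)      # chain rule in the parameter theta
lhs = sp.simplify(dWdq**2)
rhs = sp.simplify(2*m*alpha - k*m*q**2)
print(sp.simplify(lhs - rhs))   # -> 0
```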
    As $S$ is a type 2 generating function,
$$p = \frac{\partial F_2}{\partial q} = \frac{\partial W}{\partial q}.$$
For our harmonic oscillator, this gives
$$p = \frac{\partial W}{\partial\theta}\bigg/\frac{\partial q}{\partial\theta} = \frac{(\alpha/\omega)(1 + \cos 2\theta)}{\sqrt{2\alpha/k}\,\cos\theta} = \sqrt{2\alpha m}\,\cos\theta.$$

Plugging into the Hamiltonian, we have

                      H = α(cos2 θ + sin2 θ) = α,

which will always be the case when (6.14) holds.
   We have not spelled out what our new momentum P is, except
that it is conserved, and we can take it to be α. (α = ωR in terms of
our previous discussion of the harmonic oscillator.) The new coordinate
Q = ∂S/∂P = ∂W/∂α|q −t. But Q is, by hypothesis, time independent,
so

                            ∂W/∂α = t + Q.

For the harmonic oscillator calculation (6.16),

  f ′(α) + (1/ω)(θ + ½ sin 2θ) + (α/ω) (∂θ/∂α)|q (1 + cos 2θ) = f ′(α) + θ/ω = t + Q.


Recall sin θ = q√(k/2α), so

              (∂θ/∂α)|q = −q√(k/2α) / (2α cos θ) = −(1/2α) tan θ,

and θ = ωt + δ, for δ some constant.
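This result is easy to check numerically. The sketch below (with illustrative values m = k = 1, α = 0.7, δ = 0.3, none taken from the text) verifies by finite differences that the trajectory q(t) = √(2α/k) sin(ωt + δ), p(t) = √(2αm) cos(ωt + δ) implied by θ = ωt + δ satisfies Hamilton's equations and conserves H = α:

```python
import math

m, k = 1.0, 1.0            # illustrative parameters (not from the text)
alpha = 0.7                # the conserved energy H = alpha
omega = math.sqrt(k / m)
delta = 0.3                # arbitrary phase constant

def q(t):
    # q = sqrt(2 alpha / k) sin(theta), with theta = omega t + delta
    return math.sqrt(2 * alpha / k) * math.sin(omega * t + delta)

def p(t):
    # p = sqrt(2 alpha m) cos(theta), from p = dW/dq
    return math.sqrt(2 * alpha * m) * math.cos(omega * t + delta)

h = 1e-6
for t in (0.0, 0.5, 2.0):
    qdot = (q(t + h) - q(t - h)) / (2 * h)
    pdot = (p(t + h) - p(t - h)) / (2 * h)
    assert abs(qdot - p(t) / m) < 1e-6      # dq/dt =  dH/dp = p/m
    assert abs(pdot + k * q(t)) < 1e-6      # dp/dt = -dH/dq = -k q
    energy = p(t) ** 2 / (2 * m) + 0.5 * k * q(t) ** 2
    assert abs(energy - alpha) < 1e-12      # H = alpha along the motion
```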
    As an example of a nontrivial problem with two degrees of free-
dom which is nonetheless separable and therefore solvable using the
Hamilton-Jacobi method, we consider the motion of a particle of mass
m attracted by Newtonian gravity to two equal masses fixed in space.
For simplicity we consider only motion in a plane containing the two
masses, which we take to be at (±c, 0) in cartesian coordinates x, y. If
r1 and r2 are the distances from the particle to the two fixed masses
respectively, the gravitational potential is U = −K(1/r1 + 1/r2 ), while
the kinetic energy is simple in terms of x and y, T = ½m(ẋ² + ẏ²). The
relation between these is, of course,

                        r1² = (x + c)² + y²
                        r2² = (x − c)² + y²

Considering both the kinetic and potential energies, the problem
will not separate either in terms of (x, y) or in terms of (r1 , r2 ),
but it does separate in terms of elliptical coordinates

                              ξ = r1 + r2
                              η = r1 − r2
From r1² − r2² = 4cx = ξη we find a fairly simple expression ẋ =
(ξ̇η + ξ η̇)/4c. The expression for y is more difficult, but can be found
from observing that ½(r1² + r2²) = x² + y² + c² = (ξ² + η²)/4, so

      y² = (ξ² + η²)/4 − (ξη/4c)² − c² = (ξ² − 4c²)(4c² − η²)/16c²,
or
                    y = (1/4c) √(ξ² − 4c²) √(4c² − η²),
and
     ẏ = (1/4c) [ ξ̇ ξ √((4c² − η²)/(ξ² − 4c²)) − η̇ η √((ξ² − 4c²)/(4c² − η²)) ].
Squaring, adding in the x contribution, and simplifying then shows that

          T = (m/8) [ (ξ² − η²)/(4c² − η²) η̇² + (ξ² − η²)/(ξ² − 4c²) ξ̇² ].

Note that there are no crossed terms ∝ ξ̇ η̇, a manifestation of the
orthogonality of the curvilinear coordinates ξ and η. The potential
energy becomes

     U = −K (1/r1 + 1/r2 ) = −K ( 2/(ξ + η) + 2/(ξ − η) ) = −4Kξ/(ξ² − η²).

In terms of the new coordinates ξ and η and their conjugate momenta,
we see that

       H = (2/m) [ pξ² (ξ² − 4c²) + pη² (4c² − η²) − 2mKξ ] / (ξ² − η²).

Then the Hamilton-Jacobi equation for Hamilton’s characteristic func-
tion is

  (2/m) [ (ξ² − 4c²)(∂W/∂ξ)² + (4c² − η²)(∂W/∂η)² − 2mKξ ] / (ξ² − η²) = α,

or

          (ξ² − 4c²)(∂W/∂ξ)² − 2mKξ − ½mαξ²
                     + (4c² − η²)(∂W/∂η)² + ½mαη² = 0.
The first line depends only on ξ, and the second only on η, so they
must each be constant, with W (ξ, η) = Wξ (ξ) + Wη (η), and
           (ξ² − 4c²) (dWξ (ξ)/dξ)² − 2mKξ − ½mαξ² = β,
           (4c² − η²) (dWη (η)/dη)² + ½mαη² = −β.
These are now reduced to integrals for Wi , which can in fact be inte-
grated to give an explicit expression in terms of elliptic integrals.
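The change of coordinates above can be checked numerically. The sketch below (with illustrative choices m = c = K = 1 and an arbitrary sample point and velocity, none from the text) compares T and U computed in cartesian and in elliptical coordinates:

```python
import math

m, c, K = 1.0, 1.0, 1.0          # illustrative values, not from the text
x, y = 0.3, 0.8                  # sample position (away from the two centers)
xdot, ydot = 0.2, -0.5           # sample velocity

r1 = math.hypot(x + c, y)
r2 = math.hypot(x - c, y)
xi, eta = r1 + r2, r1 - r2       # elliptical coordinates

# r1 r1dot = (x+c) xdot + y ydot, from r1^2 = (x+c)^2 + y^2, and similarly r2
r1dot = ((x + c) * xdot + y * ydot) / r1
r2dot = ((x - c) * xdot + y * ydot) / r2
xidot, etadot = r1dot + r2dot, r1dot - r2dot

T_cart = 0.5 * m * (xdot ** 2 + ydot ** 2)
T_ell = (m / 8) * ((xi ** 2 - eta ** 2) / (4 * c ** 2 - eta ** 2) * etadot ** 2
                   + (xi ** 2 - eta ** 2) / (xi ** 2 - 4 * c ** 2) * xidot ** 2)
U_cart = -K * (1 / r1 + 1 / r2)
U_ell = -4 * K * xi / (xi ** 2 - eta ** 2)

assert abs(T_cart - T_ell) < 1e-9    # kinetic energies agree
assert abs(U_cart - U_ell) < 1e-9    # potential energies agree
```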


6.8       Action-Angle Variables
Consider again a general system with one degree of freedom and a con-
served Hamiltonian. Suppose the system undergoes periodic behavior,
with p(t) and q̇(t) periodic with period τ . We don’t require q itself to
be periodic, as it might be an angular variable which might not return
to the same value when the system returns to the same physical point,
as, for example, the angle which describes a rotation.

   If we define an integral over one full period,

                     J(t) = (1/2π) ∫t^{t+τ} p dq,

it will be time independent. As p = ∂W/∂q = p(q, α), the inte-
gral can be defined without reference to time, just as the integral
2πJ = ∮ p dq over one orbit of q, holding α fixed. Then J becomes
a function of α alone, and if we assume this function to be invert-
ible, H = α = α(J). We can take J to be our canonical momentum
P . Using Hamilton’s Principal Function S as the generator, we find
Q = ∂S/∂J = ∂W (q, J)/∂J − (dα/dJ)t. Alternatively, we might use
Hamilton’s Characteristic Function W by itself as the generator, to de-
fine the conjugate variable φ = ∂W (q, J)/∂J, which is simply related to
Q = φ − (dα/dJ)t. Note that φ and Q are both canonically conjugate
to J, differing at any instant only by a function of J. As the Hamilton-
Jacobi Q is time independent, we see that φ̇ = dα/dJ = dH/dJ = ω(J),
which is a constant, because while it is a function of J, J is a constant
in time. We could also derive φ̇ from Hamilton’s equations considering
W as a generator: W is time independent, therefore the new
Hamiltonian is unchanged, and the equation of motion for φ is simply
φ̇ = ∂H/∂J. Either way, we see that φ = ωt + δ. The coordinates (J, φ)
are called action-angle variables. Consider the change in φ during
one cycle.
  ∆φ = ∮ (∂φ/∂q) dq = ∮ (∂/∂q)(∂W/∂J) dq = (d/dJ) ∮ p dq = (d/dJ) 2πJ = 2π.

Thus we see that in one period τ , ∆φ = 2π = ωτ , so ω = 2π/τ .
   For our harmonic oscillator, of course,

        2πJ = ∮ p dq = √(2αm) √(2α/k) ∫0^{2π} cos²θ dθ = 2απ / √(k/m),

so J is just a constant 1/√(k/m) times the old canonical momentum α,
and thus its conjugate φ = √(k/m) Q = √(k/m)(t + β), so ω = √(k/m)
as we expect. The important thing here is that ∆φ = 2π, even if the
problem itself is not solvable.
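A numerical sketch (with illustrative values m = 1, k = 2, α = 0.9, none from the text) confirming that ∮ p dq, computed along the orbit q = √(2α/k) sin θ, p = √(2αm) cos θ, equals 2πα/√(k/m), i.e. J = α/ω:

```python
import math

m, k, alpha = 1.0, 2.0, 0.9       # illustrative values, not from the text
omega = math.sqrt(k / m)

# Parametrize one orbit by theta in [0, 2pi) and accumulate p dq = p (dq/dtheta) dtheta
N = 10_000
total = 0.0
for i in range(N):
    theta = 2 * math.pi * (i + 0.5) / N
    p = math.sqrt(2 * alpha * m) * math.cos(theta)
    dq_dtheta = math.sqrt(2 * alpha / k) * math.cos(theta)
    total += p * dq_dtheta * (2 * math.pi / N)

J = total / (2 * math.pi)
assert abs(J - alpha / omega) < 1e-9     # J = alpha / omega
```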

                                 Exercises
 6.1 In Exercise 2.6, we discussed the connection between two Lagrangians,
L1 and L2 , which differed by a total time derivative of a function on extended
configuration space,

        L1 ({qi }, {q̇j }, t) = L2 ({qi }, {q̇j }, t) + (d/dt) Φ(q1 , ..., qn , t).

You found that these gave the same equations of motion, but differing mo-
menta pi (1) and pi (2) . Find the relationship between the two Hamiltonians,
H1 and H2 , and show that these lead to equivalent equations of motion.

 6.2 A uniform static magnetic field can be described by a static vector
potential A = ½ B × r. A particle of mass m and charge q moves under the
influence of this field.
(a) Find the Hamiltonian, using inertial cartesian coordinates.
(b) Find the Hamiltonian, using coordinates of a rotating system with an-
gular velocity ω = −qB/2mc.

 6.3 Consider a symmetric top with one point on the symmetry axis fixed
in space, as we did at the end of chapter 4. Write the Hamiltonian for the
top. Noting the cyclic (ignorable) coordinates, explain how this becomes an
effective one-dimensional system.

 6.4 (a) Show that a particle under a central force with an attractive po-
tential inversely proportional to the distance squared has a conserved quan-
tity D = ½ r · p − Ht.
(b) Show that the infinitesimal transformation generated by D scales r
and p by opposite infinitesimal amounts, Q = (1 + ε/2) r, P = (1 − ε/2) p, or
for a finite transformation Q = λ r, P = λ−1 p. Show that if we describe
the motion in terms of a scaled time T = λ2 t, the equations of motion are
invariant under this combined transformation (r, p, t) → (Q, P , T ).

 6.5 We saw that the Poisson bracket associates with every differentiable
function f on phase space a differential operator Df := [f, ·] which acts on
functions g on phase space by Df g = [f, g]. We also saw that every differ-
ential operator is associated with a vector, which in a particular coordinate
system has components fi , where
                            Df = Σi fi ∂/∂ηi .

A 1-form acts on such a vector by
                                dxj (Df ) = fj .
Show that for the natural symplectic structure ω2 , acting on the differential
operator coming from the Poisson bracket as its first argument,
                               ω2 (Df , ·) = df,
which indicates the connection between ω2 and the Poisson bracket.
 6.6 Give a complete discussion of the relation of forms in cartesian co-
ordinates in four dimensions to functions, vector fields, and antisymmetric
matrix (tensor) fields, and what wedge products and exterior derivatives of
the forms correspond to in each case. This discussion should parallel what
is done in my book, Pages 148-150, for three dimensions. [Note that two
different antisymmetric tensors, Bµν and B̃µν = ½ Σρσ εµνρσ Bρσ , can be re-
lated to the same 2-form, in differing fashions. They are related to each
other with the four dimensional εjkℓm , which you will need to define, and
are called duals of each other. Using one fashion, the two different 2-forms
associated with these two matrices are also called duals.]
(b) Let Fµν be a 4×4 matrix defined over a four dimensional space (x, y, z, ict),
with matrix elements Fjk = εjkℓ Bℓ , for j, k each 1, 2, 3, and F4j = iEj =
−Fj4 . Show that the statement that F corresponds, by one of the two
fashions, to a closed 2-form F, constitutes two of Maxwell’s equations, and
explain how this implies that 2-form is the exterior derivative of a 1-form,
and what that 1-form is in terms of electromagnetic theory described in
3-dimensional language.
(c) Find the 3-form associated with the exterior derivative of the 2-form dual
to F, and show that it is associated with the 4-vector charge current density
J = (j, icρ), where j is the usual current density and ρ the usual charge
density.
6.7 Consider the following differential forms:
      A = y dx + x dy + dz
      B = y 2 dx + x2 dy + dz
      C = xy(y − x) dx ∧ dy + y(y − 1) dx ∧ dz + x(x − 1) dy ∧ dz
      D = 2(x − y) dx ∧ dy ∧ dz
      E = 2(x − y) dx ∧ dy
Find as many relations as you can, expressible without coordinates, among
these forms. Consider using the exterior derivative and the wedge product.
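One mechanical way to hunt for such relations is to compare components numerically. The sketch below (illustrative, not the exercise's intended solution method; it uses finite-difference exterior derivatives and checks a few candidate identities, such as dB = E and dC = D, which were verified by hand) shows the idea:

```python
# Componentwise numerical check of candidate relations among the forms.
# 1-forms are triples of coefficients (dx, dy, dz); 2-forms are triples
# (dx^dy, dx^dz, dy^dz); 3-forms a single coefficient of dx^dy^dz.
A = (lambda x, y, z: y,   lambda x, y, z: x,   lambda x, y, z: 1.0)
B = (lambda x, y, z: y*y, lambda x, y, z: x*x, lambda x, y, z: 1.0)
C = (lambda x, y, z: x*y*(y - x), lambda x, y, z: y*(y - 1), lambda x, y, z: x*(x - 1))
D = lambda x, y, z: 2*(x - y)
E = (lambda x, y, z: 2*(x - y), lambda x, y, z: 0.0, lambda x, y, z: 0.0)

h = 1e-5
def dd(f, i, x, y, z):
    # central-difference partial derivative of f with respect to coordinate i
    lo, hi = [x, y, z], [x, y, z]
    lo[i] -= h; hi[i] += h
    return (f(*hi) - f(*lo)) / (2*h)

def ext_d1(w, x, y, z):
    # exterior derivative of a 1-form: components (dx^dy, dx^dz, dy^dz)
    wx, wy, wz = w
    return (dd(wy, 0, x, y, z) - dd(wx, 1, x, y, z),
            dd(wz, 0, x, y, z) - dd(wx, 2, x, y, z),
            dd(wz, 1, x, y, z) - dd(wy, 2, x, y, z))

def ext_d2(w, x, y, z):
    # exterior derivative of a 2-form: the single dx^dy^dz component
    wxy, wxz, wyz = w
    return dd(wyz, 0, x, y, z) - dd(wxz, 1, x, y, z) + dd(wxy, 2, x, y, z)

def wedge11(u, v, x, y, z):
    # wedge product of two 1-forms, evaluated at a point
    ux, uy, uz = (f(x, y, z) for f in u)
    vx, vy, vz = (f(x, y, z) for f in v)
    return (ux*vy - uy*vx, ux*vz - uz*vx, uy*vz - uz*vy)

for (x, y, z) in [(0.2, 0.7, -0.4), (1.3, -0.6, 0.5)]:
    assert all(abs(a - b(x, y, z)) < 1e-6
               for a, b in zip(ext_d1(B, x, y, z), E))      # dB = E
    assert all(abs(a) < 1e-6 for a in ext_d1(A, x, y, z))   # dA = 0 (A is closed)
    assert abs(ext_d2(C, x, y, z) - D(x, y, z)) < 1e-6      # dC = D
    assert all(abs(a + b(x, y, z)) < 1e-6
               for a, b in zip(wedge11(A, B, x, y, z), C))  # A wedge B = -C
```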
Chapter 7

Perturbation Theory

The class of problems in classical mechanics which are amenable to ex-
act solution is quite limited, but many interesting physical problems
differ from such a solvable problem by corrections which may be con-
sidered small. One example is planetary motion, which can be treated
as a perturbation on a problem in which the planets do not interact
with each other, and the forces with the Sun are purely Newtonian
forces between point particles. Another example occurs if we wish to
find the first corrections to the linear small oscillation approximation
to motion about a stable equilibrium point. The best starting point
is an integrable system, for which we can find sufficient integrals of
the motion to give the problem a simple solution in terms of action-
angle variables as the canonical coordinates on phase space. One then
phrases the full problem in such a way that the perturbations due to
the extra interactions beyond the integrable forces are kept as small as
possible. We first examine the solvable starting point.


7.1     Integrable systems
An integral of the motion for a hamiltonian system is a function F
on phase space M for which the Poisson bracket with H vanishes,
[F, H] = 0. More generally, a set of functions on phase space is said
to be in involution if all their pairwise Poisson brackets vanish. The
systems we shall consider are integrable systems in the sense that


there exists one integral of the motion for each degree of freedom, and
these are in involution and independent. Thus on the 2n-dimensional
manifold of phase space, there are n functions Fi for which [Fi , Fj ] = 0,
and the Fi are independent, so the dFi are linearly independent at each
point η ∈ M. We will assume the first of these is the Hamiltonian.
As each of the Fi is a conserved quantity, the motion of the system
is confined to a submanifold of phase space determined by the initial
values of these invariants fi = Fi (q(0), p(0)):
                 Mf = {η : Fi (η) = fi for i = 1, . . . , n}.
The differential operators DFi = [Fi , ·] correspond to vectors tangent
to the manifold Mf , because acting on each of the Fj functions DFi
vanishes, as the F ’s are in involution. These differential operators also
commute with one another, because as we saw in (6.9),
                    DFi DFj − DFj DFi = D[Fi ,Fj ] = 0.
    They are also linearly independent, for if Σi αi DFi = 0, then
Σi αi DFi ηj = 0 = [Σi αi Fi , ηj ], which means that Σi αi Fi is a
constant on phase space,
and that would contradict the assumed independence of the Fi . Thus
the DFi are n commuting independent differential operators correspond-
ing to the generators Fi of an Abelian group of displacements on Mf .
A given reference point η0 ∈ M is mapped by the canonical transfor-
mation generator Σi ti Fi into some other point g t (η0 ) ∈ Mf . If the
manifold Mf is compact, there must be many values of t for which
g t (η0 ) = η0 . These elements form an abelian subgroup, and therefore
a lattice in Rn . It has n independent lattice vectors, and a unit cell
which is in 1-1 correspondence with Mf . Let these basis vectors be
e1 , . . . , en . These are the edges of the unit cell in Rn , the interior of
which is the set of linear combinations Σi ai ei where each ai ∈ [0, 1). We
therefore have a diffeomorphism between this unit cell and Mf , which
induces coordinates on Mf . Because these are periodic, we scale the
ai to new coordinates φi = 2πai , so each point of Mf is labelled by
φ, given by the t = Σk φk ek /2π for which g t (η0 ) = η. Notice each φi
is a coordinate on a circle, with φi = 0 representing the same point
as φi = 2π, so the manifold Mf is diffeomorphic to an n dimensional
torus T n = (S 1 )n .

   Under an infinitesimal generator Σi δti Fi , a point of Mf is translated
by δη = Σi δti [η, Fi ]. This is true for any choice of the coordinates η, in
particular it can be applied to the φj , so

                         δφj = Σi δti [φj , Fi ],

where we have already expressed

                         δt = Σk δφk ek /2π.

We see that the Poisson bracket is the inverse of the matrix Aji given
by the j’th coordinate of the i’th basis vector,

       Aji = (1/2π) (ei )j ,     δt = A · δφ,     [φj , Fi ] = (A−1 )ji .


As the Hamiltonian H = F1 corresponds to the generator with t =
(1, 0, . . . , 0), an infinitesimal time translation generated by δtH pro-
duces a change δφi = (A−1 )i1 δt = ωi δt, for some vector ω which is
determined by the ei . Note that the periodicities ei may depend on the
values of the integrals of the motion, so ω does as well, and we have

                            dφ/dt = ω(f ).

    The angle variables φ are not conjugate to the integrals of the mo-
tion Fi , but rather to combinations of them,
                         Ii = (1/2π) ei (f ) · F ,

for then

    [φj , Ii ] = (1/2π) Σk (ei (f ))k [φj , Fk ] = Σk Aki (A−1 )jk = δij .

These Ii are the action variables, which are functions of the original set
Fj of integrals of the motion, and therefore are themselves integrals of
the motion. In action-angle variables the motion is very simple, with I

constant and φ̇ = ω = constant. This is called conditionally periodic
motion, and the ωi are called the frequencies. If all the ratios of the
ωi ’s are rational, the motion will be truly periodic, with a period the
least common multiple of the individual periods 2π/ωi . More generally,
there may be some relations

                              Σi ki ωi = 0

for integer values ki . Each of these is called a relation among the
frequencies. If there are no such relations the frequencies are said to
be independent frequencies.
    In the space of possible values of ω, the subspace of values for which
the frequencies are independent is surely dense. In fact, most such
points have independent frequencies. We should be able to say then
that most of the invariant tori Mf have independent frequencies if the
mapping ω(f ) is one-to-one. This condition is
         det (∂ω/∂f ) ≠ 0,    or equivalently    det (∂ω/∂I ) ≠ 0.

When this condition holds the system is called a nondegenerate sys-
tem. As ωi = ∂H/∂Ii , this condition can also be written as
det (∂ 2 H/∂Ii ∂Ij ) ≠ 0.
    Consider a function g on Mf . We define two averages of this func-
tion. One is the time average we get starting at a particular point φ0
and averaging over an infinitely long time,

              ⟨g⟩t (φ0 ) = lim (1/T ) ∫0^T g(φ0 + ωt) dt.
                           T →∞

We may also define the average over phase space, that is, over all values
of φ describing the submanifold Mf ,

       ⟨g⟩Mf = (2π)−n ∫0^{2π} · · · ∫0^{2π} g(φ) dφ1 . . . dφn ,

where we have used the simple measure dφ1 . . . dφn on the space Mf .
Then an important theorem states that, if the frequencies are inde-
pendent, and g is a continuous function on Mf , the time and space

averages of g are the same. Note any such function g can be expanded
in a Fourier series, g(φ) = Σk∈Zn gk e^{ik·φ} , with ⟨g⟩Mf = g0 , while

   ⟨g⟩t = lim (1/T ) ∫0^T Σk gk e^{ik·φ0 + ik·ωt} dt
          T →∞
        = g0 + Σk≠0 gk e^{ik·φ0} lim (1/T ) ∫0^T e^{ik·ωt} dt = g0 ,
                                 T →∞

because
     lim (1/T ) ∫0^T e^{ik·ωt} dt = lim (1/T ) (e^{ik·ωT} − 1)/(ik · ω) = 0,
     T →∞                           T →∞
as long as the denominator does not vanish. It is this requirement, that
k · ω ≠ 0 for all nonzero k ∈ Zn , which requires the frequencies to be
independent.
    As an important corollary of this theorem, when it holds the tra-
jectory is dense in Mf , and uniformly distributed, in the sense that
the time spent in each specified volume of Mf is proportional to that
volume, independent of the position or shape of that volume.
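This equality of averages is easy to see numerically. The sketch below (all parameters illustrative, not from the text) time-averages g(φ) = cos(φ1 − φ2 ) along a trajectory on T² with incommensurate frequencies, and contrasts the commensurate case ω1 = ω2 , where the time average retains its dependence on φ0 :

```python
import math

def time_average(omega, phi0, T=2000.0, dt=0.01):
    # Riemann-sum time average of g(phi) = cos(phi1 - phi2) along phi0 + omega t
    n = int(T / dt)
    s = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        s += math.cos((phi0[0] + omega[0] * t) - (phi0[1] + omega[1] * t))
    return s / n

phi0 = (0.3, 0.7)

# Incommensurate frequencies: time average -> space average = g_0 = 0
avg_irrational = time_average((1.0, math.sqrt(2)), phi0)
assert abs(avg_irrational) < 0.01

# Commensurate case omega1 = omega2: phi1 - phi2 is frozen, average depends on phi0
avg_rational = time_average((1.0, 1.0), phi0)
assert abs(avg_rational - math.cos(phi0[0] - phi0[1])) < 1e-6
```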
    If instead of independence we have relations among the frequencies,
these relations, each given by a k ∈ Zn , form a subgroup of Zn (an
additive group of translations by integers along each of the axes). Each
such k gives a constant of the motion, k · φ. Each independent rela-
tion among the frequencies therefore restricts the dimensionality of the
motion by an additional dimension, so if the subgroup is generated by
r such independent relations, the motion is restricted to a manifold of
reduced dimension n − r, and the motion on this reduced torus T n−r is
conditionally periodic with n−r independent frequencies. The theorem
and corollaries just discussed then apply to this reduced invariant torus,
but not to the whole n-dimensional torus with which we started. In
particular, ⟨g⟩t (φ0 ) can depend on φ0 as it varies from one submanifold
T n−r to another, but not along paths on the same submanifold.
    If the system is nondegenerate, for typical I the ωi ’s will have no
relations and the invariant torus will be densely filled by the motion of
the system. Therefore the invariant tori are uniquely defined, although
the choice of action and angle variables is not. In the degenerate case
the motion of the system does not fill the n dimensional invariant torus,

so it need not be uniquely defined. This is what happens, for example,
for the two dimensional harmonic oscillator or for the Kepler problem.


7.2     Canonical Perturbation Theory
We now consider a problem with a conserved Hamiltonian which is in
some sense approximated by an integrable system with n degrees of
freedom. This integrable system is described with a Hamiltonian H (0) ,
and we assume we have described it in terms of its action variables
Ii (0) and angle variables φi (0) . This system is called the unperturbed
system, and the Hamiltonian is, of course, independent of the angle
variables, H (0) (I (0) , φ(0) ) = H (0) (I (0) ).
      The action-angle variables of the unperturbed system are a canon-
ical set of variables for the phase space, which is still the same phase
space for the full system. We write the Hamiltonian of the full system
as
         H(I (0) , φ(0) ) = H (0) (I (0) ) + ε H1 (I (0) , φ(0) ).       (7.1)

We have included the parameter ε so that we may regard the terms
in H1 as fixed in strength relative to each other, and still consider a
series expansion in ε, which gives an overall scale to the smallness of
the perturbation.
    We might imagine that if the perturbation is small, there are some
new action-angle variables Ii and φi for the full system, which differ
by order ε from the unperturbed coordinates. These are new canonical
coordinates, and may be generated by a generating function (of type
2),
             F (I, φ(0) ) = Σi φi (0) Ii + ε F1 (I, φ(0) ) + ....

This is a time-independent canonical transformation, so the full Hamil-
tonian is the same function on phase-space whether the unperturbed or
full action-angle variables are used, but has a different functional form,

                      H̃(I, φ) = H(I (0) , φ(0) ).                       (7.2)

Note that the phase space itself is described periodically by the coor-
dinates φ(0) , so the Hamiltonian perturbation H1 and the generating

function F1 are periodic functions (with period 2π) in these variables.
Thus we can expand them in Fourier series:

          H1 (I (0) , φ(0) ) = Σk H1k (I (0) ) e^{ik·φ(0)} ,             (7.3)
          F1 (I, φ(0) )      = Σk F1k (I) e^{ik·φ(0)} ,                  (7.4)


where the sum is over all n-tuples of integers k ∈ Zn . The zeros of the
new angles are arbitrary for each I, so we may choose F10 (I) = 0.
   The unperturbed action variables, on which H0 depends, are the old
momenta given by Ii (0) = ∂F/∂φi (0) = Ii + ε ∂F1 /∂φi (0) + ..., so to first
order

  H0 (I (0) ) = H0 (I) + ε Σj (∂H0 /∂Ij ) (∂F1 /∂φj (0) ) + ...
             = H0 (I) + ε Σj ωj (0) Σk ikj F1k (I) e^{ik·φ(0)} + ...,    (7.5)

where we have noted that ∂H0 /∂Ij (0) = ωj (0) , the frequencies of the
unperturbed problem. Thus

 H̃(I, φ) = H(I (0) , φ(0) ) = H (0) (I (0) ) + ε Σk H1k (I (0) ) e^{ik·φ(0)}

          = H0 (I) + ε Σk ( Σj ikj ωj (0) F1k (I) + H1k (I (0) ) ) e^{ik·φ(0)} .


The I are the action variables of the full Hamiltonian, so H̃(I, φ) is
in fact independent of φ. In the sum over Fourier modes on the right
hand side, the φ(0) dependence of the terms in parentheses due to the
difference of I (0) from I is higher order in ε, so the coefficients
of e^{ik·φ(0)} may be considered constants in φ(0) and therefore must
vanish for k ≠ 0. Thus the generating function is given in terms of the
Hamiltonian perturbation

                  F1k = i H1k / (k · ω (0) (I)),     k ≠ 0.              (7.6)

     We see that there may well be a problem in finding new action vari-
ables if there is a relation among the frequencies. If the unperturbed
system is not degenerate, “most” invariant tori will have no relation
among the frequencies. For these values, the extension of the proce-
dure we have described to a full power series expansion in ε may be
able to generate new action-angle variables, showing that the system
is still integrable. That this is true for sufficiently small perturbations
and “sufficiently irrational” ωJ (0) is the conclusion of the famous KAM
theorem.
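The small denominators in (7.6) can be made concrete with a toy example (all numbers illustrative, not from the text): take a single Fourier mode H1 = cos(φ1 − 2φ2 ), so the only nonzero coefficients have k = ±(1, −2), and watch F1k grow as ω approaches the resonance ω1 = 2ω2 :

```python
# Toy illustration of the small-denominator problem in F1k = i H1k / (k . omega).
H1k = 0.5                      # coefficient of the single mode k = (1, -2)
k = (1, -2)

def F1k_coeff(omega):
    kdotw = k[0] * omega[0] + k[1] * omega[1]
    return 1j * H1k / kdotw    # first-order generating-function coefficient

far = abs(F1k_coeff((1.0, 0.3)))     # k . omega = 0.4, well away from resonance
near = abs(F1k_coeff((1.0, 0.499)))  # k . omega = 0.002, nearly resonant

assert abs(far - H1k / 0.4) < 1e-12
assert near > 100 * far              # the coefficient blows up as k . omega -> 0
```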
    What happens if there is a relation among the frequencies? Consider
a two degree of freedom system with pω1 (0) + qω2 (0) = 0, with p and
q relatively prime. Then the Euclidean algorithm shows us there are
integers m and n such that pm + qn = 1. Instead of our initial variables
φi (0) ∈ [0, 2π] to describe the torus, we can use the linear combinations

             ( ψ1 )   ( p    q ) ( φ1 (0) )         ( φ1 (0) )
             (    ) = (        ) (        )  = B · (          ) .
             ( ψ2 )   ( n   −m ) ( φ2 (0) )         ( φ2 (0) )
Then ψ1 and ψ2 are equally good choices for the angle variables of the
unperturbed system, as ψi ∈ [0, 2π] is a good coordinate system on the
torus. The corresponding action variables are Ii′ = Σj (B −1 )ji Ij , and the
corresponding new frequencies are

        ωi′ = ∂H/∂Ii′ = Σj (∂H/∂Ij )(∂Ij /∂Ii′ ) = Σj Bij ωj (0) ,

and so in particular ω1′ = pω1 (0) + qω2 (0) = 0 on the chosen invariant
torus. This conclusion is also obvious from the equations of motion
ψ̇i = ωi′ .
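The construction of B can be checked mechanically. The sketch below (with illustrative numbers p = 3, q = 5, and resonant frequencies ω(0) = (5, −3), none from the text) uses the extended Euclidean algorithm to supply m and n, confirms B is invertible over the integers, and shows the new first frequency vanishes on the resonant torus:

```python
def ext_euclid(a, b):
    # returns (g, s, t) with a*s + b*t = g = gcd(a, b)
    if b == 0:
        return a, 1, 0
    g, s, t = ext_euclid(b, a % b)
    return g, t, s - (a // b) * t

p, q = 3, 5                      # relatively prime integers (illustrative)
w = (5.0, -3.0)                  # resonant frequencies: p*w1 + q*w2 = 0

g, m, n = ext_euclid(p, q)       # p*m + q*n = 1
assert g == 1 and p * m + q * n == 1

B = [[p, q], [n, -m]]            # the change of angle variables
detB = B[0][0] * B[1][1] - B[0][1] * B[1][0]
assert detB == -(p * m + q * n) == -1    # B is unimodular (invertible over Z)

w_new = (B[0][0] * w[0] + B[0][1] * w[1],
         B[1][0] * w[0] + B[1][1] * w[1])
assert w_new[0] == 0.0           # psi_1 is frozen on the resonant torus
```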
    In the unperturbed problem, on our initial invariant torus, ψ1 is a
constant of the motion, so in the perturbed system we might expect
it to vary slowly with respect to ψ2 . Then it is appropriate to use the
adiabatic approximation.

7.2.1     Time Dependent Perturbation Theory
Consider a problem for which the Hamiltonian is approximately that
of an exactly solvable problem. For example, let’s take the pendulum,

L = 1 m 2 θ2 − mg (1 − cos θ), pθ = m 2 θ, H = p2 /2m 2 + mg (1 −
      2
           ˙                                ˙
                                                     θ
          2      2    1     2
cos θ) ≈ pθ /2m + 2 mg θ , which is approximately given by an har-
monic oscillator if the excursions are not too big. More generally
\[
H(q, p, t) = H_0(q, p, t) + \epsilon H_I(q, p, t),
\]
where εH_I(q, p, t) is considered a small "interaction" Hamiltonian. We
assume we know Hamilton's principal function S_0(q, P, t) for the un-
perturbed problem, which gives a canonical transformation (q, p) →
(Q, P), and in the limit ε → 0, Q̇ = Ṗ = 0. For the full problem,
\[
K(Q, P, t) = H_0 + \epsilon H_I + \frac{\partial S_0}{\partial t} = \epsilon H_I,
\]
and is small. Expressing H_I in terms of the new variables (Q, P), we
have that
\[
\dot Q = \epsilon \frac{\partial H_I}{\partial P}, \qquad
\dot P = -\epsilon \frac{\partial H_I}{\partial Q},
\]
and these are slowly varying because ε is small. In symplectic form,
with ζ^T = (Q, P), we have, of course,
\[
\dot\zeta = \epsilon\, J \cdot \nabla H_I(\zeta). \tag{7.7}
\]
This differential equation can be solved perturbatively. If we assume
an expansion
\[
\zeta(t) = \zeta_0(t) + \epsilon\,\zeta_1(t) + \epsilon^2 \zeta_2(t) + \ldots,
\]
ζ̇_n on the left of (7.7) can be determined from only lower order terms in
ζ on the right hand side, so we can recursively find higher and higher
order terms in ε. This is a good expansion for small ε for fixed t, but
as we are making an error of some order, say ε^m, in ζ̇, this is O(ε^m t) for
ζ(t). Thus for calculating the long time behavior of the motion, this
method is unlikely to work in the sense that any finite order calculation
cannot be expected to be good for t → ∞. Even though H and H_0
differ only slightly, and so acting on any given η they will produce only
slightly different rates of change, as time goes on there is nothing to
prevent these differences from building up. In a periodic motion, for
example, the perturbation is likely to make a change ∆τ of order ε in
the period τ of the motion, so at a time t ∼ τ²/2∆τ later, the systems
will be at opposite sides of their orbits, not close together at all.
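The linear growth of this error with t can be seen in the simplest possible setting: an oscillator whose frequency is shifted by a factor (1 + ε). A numeric sketch (the value of ε and the comparison times are arbitrary choices for illustration):

```python
import math

eps, w = 1e-3, 1.0
# "exact" solution q(t) = cos((1+eps) w t); zeroth-order solution cos(w t)
def err(t):
    return abs(math.cos((1 + eps) * w * t) - math.cos(w * t))

# at fixed t the error is O(eps)...
assert err(1.0) < 2e-3
# ...but at t ~ pi/(eps w) the accumulated phase error is pi, and the two
# systems sit on opposite sides of the orbit: the error is of order one
assert err(math.pi / (eps * w)) > 1.0
```

No finite-order expansion in ε can repair this, since the phase mismatch εωt eventually becomes large no matter how small ε is.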
198                       CHAPTER 7. PERTURBATION THEORY

7.3      Adiabatic Invariants
7.3.1     Introduction
We are going to discuss the evolution of a system which is, at every
instant, given by an integrable Hamiltonian, but for which the param-
eters of that Hamiltonian are slowly varying functions of time. We will
find that this leads to an approximation in which the actions are time
invariant. We begin with a qualitative discussion, and then we discuss
a formal perturbative expansion.
    First we will consider a system with one degree of freedom described
by a Hamiltonian H(q, p, t) which has a slow time dependence. Let
us call TV the time scale over which the Hamiltonian has significant
variation (for fixed q, p). For a short time interval ≪ TV, such a system
could be approximated by the Hamiltonian H0 (q, p) = H(q, p, t0 ), where
t0 is a fixed time within that interval. Any perturbative solution based
on this approximation may be good during this time interval, but if
extended to times comparable to the time scale TV over which H(q, p, t)
varies, the perturbative solution will break down. We wish to show,
however, that if the motion is bound and the period of the motion
determined by H0 is much less than the time scale of variations TV , the
action is very nearly conserved, even for evolution over a time interval
comparable to TV . We say that the action is an adiabatic invariant.

7.3.2     For a time-independent Hamiltonian
In the absence of any explicit time dependence, a Hamiltonian is con-
served. The motion is restricted to lie on a particular contour H(q, p) =
α, for all times. For bound solutions to the equations of motion, the
solutions are periodic closed orbits in phase space. We will call this
contour Γ, and the period of the motion τ . Let us parameterize the
contour with the action-angle variable φ. We take an arbitrary point
on Γ to be φ = 0, and call that point (q(0), p(0)). Every other point is
determined by Γ(φ) = (q(φτ/2π), p(φτ/2π)), so the complete orbit is
given by Γ(φ), φ ∈ [0, 2π). The action is defined as
\[
J = \frac{1}{2\pi} \oint p\, dq. \tag{7.8}
\]
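As a numerical check of (7.8) (an aside, not part of the text's development): for the harmonic oscillator orbit q = A cos φ, p = −mωA sin φ, the loop integral gives J = ½mωA² = E/ω. A sketch, with arbitrary illustrative values of m, ω, A:

```python
import math

m, w, A = 1.3, 2.0, 0.7          # arbitrary mass, frequency, amplitude
N = 20000
# parameterize one period of the orbit, q = A cos(phi), p = -m w A sin(phi),
# and accumulate (1/2pi) \oint p dq numerically
J = 0.0
for i in range(N):
    phi0 = 2 * math.pi * i / N
    phi1 = 2 * math.pi * (i + 1) / N
    p_mid = -m * w * A * math.sin(0.5 * (phi0 + phi1))
    J += p_mid * (A * math.cos(phi1) - A * math.cos(phi0))
J /= 2 * math.pi
E = 0.5 * m * w**2 * A**2        # energy on this contour
assert abs(J - E / w) < 1e-6     # action = E/omega for the oscillator
assert abs(J - 0.5 * m * w * A**2) < 1e-6
```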
7.3. ADIABATIC INVARIANTS                                                    199

This may be considered as an integral along one cycle in extended phase
space, 2πJ(t) = ∫_t^{t+τ} p(t′) q̇(t′) dt′. Because p(t) and q(t) are
periodic with period τ, J is independent of time t. But J can also be
thought of as an integral in phase space itself, 2πJ = ∮_Γ p dq, of a one
form ω_1 = p dq along the closed path Γ(φ), φ ∈ [0, 2π], which is the
orbit in question. By Stokes' Theorem,
\[
\int_S d\omega = \oint_{\delta S} \omega,
\]
true for any n-form ω and region S of a manifold, we have
2πJ = ∫_A dp ∧ dq, where A is the area bounded by Γ.

[Fig. 1. The orbit of an autonomous system in phase space.]




    In extended phase space {q, p, t}, if we start at time t = 0 with any
point (q, p) on Γ, the trajectory swept out by the equations of motion,
(q(t), p(t), t), will lie on the surface of a cylinder with base A extended
in the time direction. Let Γ_t be the embedding of Γ into the time slice
at t, which is the intersection of the cylinder with that time slice. The
surface of the cylinder can also be viewed as the set of all the dynamical
trajectories which start on Γ at t = 0. In other words, if T_φ(t) is the
trajectory of the system which starts at Γ(φ) at t = 0, the set of T_φ(t)
for φ ∈ [0, 2π], t ∈ [0, T], sweeps out the same surface as Γ_t, t ∈ [0, T].
Because this is an autonomous system, the value of the action J is the
same, regardless of whether it is evaluated along Γ_t, for any t, or
evaluated along one period for any of the trajectories starting on Γ_0.
If we terminate the evolution at time T, the end of the cylinder, Γ_T, is
the same orbit of the motion, in phase space, as was Γ_0.

[Fig. 2. The surface in extended phase space, generated by the ensemble
of systems which start at time t = 0 on the orbit Γ shown in Fig. 1. One
such trajectory is shown, labelled ℑ, and also shown is one of the Γ_t.]

7.3.3      Slow time variation in H(q, p, t)
Now consider a time dependent Hamiltonian H(q, p, t). For a short
interval of time near t_0, if we assume the time variation of H is slowly
varying, the autonomous Hamiltonian H(q, p, t_0) will provide an ap-
proximation, one that has conserved energy and bound orbits given by
contours of that energy. Consider extended phase space, and a closed
path Γ_0(φ) in the t = 0 plane which is a contour of H(q, p, 0), just as we
had in the time-independent case. For each point φ on this path, con-
struct the trajectory T_φ(t) evolving from Γ(φ) under the influence of the
full Hamiltonian H(q, p, t), up until some fixed final time t = T. This
collection of trajectories will sweep out a curved surface Σ_1 with bound-
ary Γ_0 at t = 0 and another we call Γ_T at time t = T. Because the
Hamiltonian does change with time, these Γ_t, the intersections of Σ_1
with the planes at various times t, are not congruent. Let Σ_0 and Σ_T be
the regions of the t = 0 and t = T planes bounded by Γ_0 and Γ_T
respectively, oriented so that their normals go forward in time.

[Fig. 3. The motion of a harmonic oscillator with time-varying spring
constant k ∝ (1 − εt)^4, with ε = 0.01. Note that the horn is not tipping
downwards, but the surface ends flat against the t = 65 plane.]
    This constructs a region which is a deformation of the cylinder1 that
we had in the case where H was independent of time. If the variation
of H is slow on a time scale of T, the path Γ_T will not differ much
from Γ_0, so it will be nearly an orbit and the action defined by ∮ p dq
around Γ_T will be nearly that around Γ_0. We shall show something
much stronger: that if the time dependence of H is a slow variation
compared with the approximate period of the motion, then each Γ_t is
nearly an orbit and the action on that path, J̃(t) = ∮_{Γ_t} p dq, is
constant, even if the Hamiltonian varies considerably over time T.
   1
    Of course it is possible that after some time, which must be on a time scale of
order TV rather than the much shorter cycle time τ , the trajectories might intersect,
which would require the system to reach a critical point in phase space. We assume
that our final time T is before the system reaches a critical point.

    The Σ's form a closed surface, which is Σ_1 + Σ_T − Σ_0, where we have
taken the orientation of Σ_1 to point outwards, and made up for the
inward-pointing direction of Σ_0 with a negative sign. Call the volume
enclosed by this closed surface V.
    We will first show that the actions J̃(0) and J̃(T) defined on the
ends of the cylinder are the same. Again from Stokes' theorem, they
are
\[
\tilde J(0) = \oint_{\Gamma_0} p\,dq = \int_{\Sigma_0} dp \wedge dq
\qquad\text{and}\qquad
\tilde J(T) = \int_{\Sigma_T} dp \wedge dq
\]
respectively. Each of these surfaces has no component in the t direction,
so we may also evaluate J̃(t) = ∫_{Σ_t} ω_2, where
\[
\omega_2 = dp \wedge dq - dH \wedge dt. \tag{7.9}
\]

Clearly ω2 is closed, dω2 = 0, as ω2 is a sum of wedge products of closed
forms.
    As H is a function on extended phase space, dH = (∂H/∂p) dp +
(∂H/∂q) dq + (∂H/∂t) dt, and thus
\[
\omega_2 = dp \wedge dq - \frac{\partial H}{\partial p}\, dp \wedge dt
         - \frac{\partial H}{\partial q}\, dq \wedge dt
 = \left( dp + \frac{\partial H}{\partial q}\, dt \right)
   \wedge \left( dq - \frac{\partial H}{\partial p}\, dt \right), \tag{7.10}
\]

where we have used the antisymmetry of the wedge product, dq ∧ dt =
−dt ∧ dq, and dt ∧ dt = 0.
    Now the interesting thing about this rewriting of the action in terms
of the new form (7.10) of ω_2 is that ω_2 is now a product of two 1-forms,
\[
\omega_2 = \omega_a \wedge \omega_b, \qquad\text{where}\quad
\omega_a = dp + \frac{\partial H}{\partial q}\, dt, \quad
\omega_b = dq - \frac{\partial H}{\partial p}\, dt,
\]
and each of ω_a and ω_b vanishes along any trajectory of the motion,
along which Hamilton's equations require
\[
\frac{dp}{dt} = -\frac{\partial H}{\partial q}, \qquad
\frac{dq}{dt} = \frac{\partial H}{\partial p}.
\]

As a consequence, ω_2 vanishes at any point when evaluated on a surface
which contains a physical trajectory, so in particular ω_2 vanishes over
the surface Σ_1 generated by the trajectories. Because ω_2 is closed,
\[
\int_{\Sigma_1 + \Sigma_T - \Sigma_0} \omega_2 = \int_V d\omega_2 = 0,
\]
where the first equality is due to Gauss' law, one form of the generalized
Stokes' theorem. Then we have
\[
\tilde J(T) = \int_{\Sigma_T} \omega_2 = \int_{\Sigma_0} \omega_2 = \tilde J(0).
\]

    What we have shown here for the area in phase space enclosed by an
orbit holds equally well for any area in phase space. If A is a region in
phase space, and if we define B as that region in phase space in which
systems will lie at time t = T if the system was in A at time t = 0, then
∫_A dp ∧ dq = ∫_B dp ∧ dq. For systems with n > 1 degrees of freedom,
we may consider a set of n forms (dp ∧ dq)^j, j = 1...n, which are all
conserved under dynamical evolution. In particular, (dp ∧ dq)^n tells us
the hypervolume in phase space is preserved under evolution according
to Hamilton's equations of motion. This is known as Liouville's theorem,
though the n invariants (dp ∧ dq)^j are known as Poincaré invariants.
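Liouville's theorem is easy to verify numerically: evolve a loop of initial conditions with a symplectic integrator and compare the enclosed phase-space area before and after. A sketch for the pendulum of the earlier example (the step size, loop radius, and evolution time are arbitrary illustrative choices):

```python
import math

def leapfrog(q, p, dt, steps):
    """Symplectic (leapfrog) evolution for the pendulum H = p^2/2 + (1 - cos q)."""
    for _ in range(steps):
        p -= 0.5 * dt * math.sin(q)
        q += dt * p
        p -= 0.5 * dt * math.sin(q)
    return q, p

def area(points):
    """Shoelace area of the closed polygon through the given (q, p) points."""
    a = 0.0
    for (q0, p0), (q1, p1) in zip(points, points[1:] + points[:1]):
        a += q0 * p1 - q1 * p0
    return 0.5 * abs(a)

# a small circle of initial conditions around (q, p) = (0.5, 0)
N = 400
loop0 = [(0.5 + 0.1 * math.cos(2 * math.pi * k / N),
          0.1 * math.sin(2 * math.pi * k / N)) for k in range(N)]
loopT = [leapfrog(q, p, 0.01, 500) for q, p in loop0]   # evolve to t = 5
A0, AT = area(loop0), area(loopT)
assert abs(AT - A0) / A0 < 1e-3   # phase-space area is preserved
```

A leapfrog step is itself a symplectic map, so the area is preserved by the numerics as well as by the exact flow; the small residual comes only from approximating the deformed loop by a polygon.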
    While we have shown that the integral ∮ p dq is conserved when
evaluated over an initial contour in phase space at time t = 0, and then
compared to its integral over the path at time t = T given by the time
evolution of the ensembles which started on the first path, neither of
these integrals is exactly an action.
    In fact, for a time-varying system the action is not really well de-
fined, because actions are defined only for periodic motion. For the
one dimensional harmonic oscillator (with varying spring constant) of
Fig. 3, a reasonable substitute definition is to define J for each "period"
from one passing to the right through the symmetry point, q = 0, to
the next such crossing. The trajectory of a single such system as it
moves through phase space is shown in Fig. 4. The integrals ∮ p(t) dq(t)
over time intervals between successive forward crossings of q = 0 are
shown for the first and last such intervals. While these appear to have
roughly the same area, what we have shown is that the integrals over
the curves Γ_t are the same. In Fig. 5 we show Γ_t for t at the beginning
of the first and fifth "periods", together with the actual motion through
those periods. The deviations are of order ετ and not of εT, and so are
negligible as long as the approximate period is small compared to
T_V ∼ 1/ε.

[Fig. 4. The trajectory in phase space of the system in Fig. 3. The
"actions" during two "orbits" are shown by shading. In the adiabatic
approximation the areas are equal.]

[Fig. 5. The differences between the actual trajectories (thick lines)
during the first and fifth oscillations, and the ensembles Γ_t at the
moments of the beginnings of those periods. The areas enclosed by the
latter two curves are strictly equal, as we have shown. The figure
indicates the differences between each of those curves and the actual
trajectories.]
    Another way we can define an action in our time-varying problem is
to write an expression for the action on extended phase space, J(q, p, t_0),
given by the action at that value of (q, p) for a system with Hamilto-
nian fixed at the time in question, H_{t_0}(q, p) := H(q, p, t_0). This is an
ordinary harmonic oscillator with ω = √(k(t_0)/m). For an autonomous
harmonic oscillator the area of the elliptical orbit is
\[
2\pi J = \pi p_{\max} q_{\max} = \pi m\omega q_{\max}^2,
\]
while the energy is
\[
\frac{p^2}{2m} + \frac{m\omega^2}{2}\, q^2 = E = \frac{m\omega^2}{2}\, q_{\max}^2,
\]
so we can write an expression for the action as a function on extended
phase space,
\[
J = \tfrac12 m\omega q_{\max}^2 = E/\omega
  = \frac{p^2}{2m\omega(t)} + \frac{m\omega(t)}{2}\, q^2.
\]
With this definition, we can assign a value for the action to the system
at each time, which in the autonomous case agrees with the standard
action.

    From this discussion, we see that if the Hamiltonian varies slowly
on the time scale of an oscillation of the system, the action will remain
fairly close to J̃_t, which is conserved. Thus the action is an adiabatic
invariant, conserved in the limit that τ/T_V → 0.
    To see how this works in a particular example, consider the harmonic
oscillator with a time-varying spring constant, which we have chosen to
be k(t) = k_0(1 − εt)^4. With ε = 0.01, in units given by the initial ω,
the evolution is shown from time 0 to time 65. During this time the
spring constant becomes over 66 times weaker, and the natural frequency
decreases by a factor of more than eight, as does the energy, but the
action remains quite close to its original value, even though the adiabatic
approximation is clearly badly violated by a spring constant which
changes by a factor of more than six during the last oscillation.

[Fig. 6. The change in angular frequency, energy, and action for the
time-varying spring-constant harmonic oscillator, with k(t) ∝ (1 − εt)^4,
with ε = ω(0)/100.]
    We see that the failure of the action to be exactly conserved is due
to the discrepancy between the action evaluated on the actual path of
a single system and the action evaluated on the curve representing the
evolution, after a given time, of an ensemble of systems all of which
began at time t = 0 on a path in phase space which would have been
their paths had the system been autonomous.
    This might tempt us to consider a different problem, in which the
time dependence of the Hamiltonian varies only during a fixed time
interval, t ∈ [0, T], but is constant before t = 0 and after T. If we look
at the motion during an oscillation before t = 0, the system’s trajectory

projects exactly onto Γ_0, so the initial action J = J̃(0). If we consider
a full oscillation beginning after time T , the actual trajectory is again a
contour of energy in phase space. Does this mean the action is exactly
conserved?
    There must be something wrong with this argument, because the
constancy of J̃(t) did not depend on assumptions of slow variation of
the Hamiltonian. Thus it should apply to the pumped swing, and claim
that it is impossible to increase the energy of the oscillation by periodic
changes in the spring constant. But we know that is not correct.
Examining this case will point out the flawed assumption in the argument.
In Fig. 7, we show the surface generated by time evolution of an
ensemble of systems initially on an energy contour for a harmonic
oscillator. Starting at time 0, the spring constant is modulated by 10%
at a frequency twice the natural frequency, for four natural periods.
Thereafter the Hamiltonian is the same as it was before t = 0, and each
system's path in phase space continues as a circle in phase space (in
the units shown), but the ensemble of systems forms a very elongated
figure, rather than a circle.

[Fig. 7. The surface Σ_1 for a harmonic oscillator with a spring constant
which varies, for the interval t ∈ [0, 8π], as k(t) = k(0)(1 + 0.1 sin 2t).]
    What has happened is that some of the systems in the ensemble have
gained energy from the pumping of the spring constant, while others
have lost energy. Thus there has been no conservation of the action
for individual systems, but rather there is some (vaguely understood)
average action which is unchanged.
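This spreading of the ensemble is easy to exhibit numerically: integrate two members of the ensemble of Fig. 7 which start a quarter period apart on the same orbit, with k(t) = 1 + 0.1 sin 2t, and compare their final energies. A sketch, in units with m = 1 and initial energy ½ (the thresholds in the assertions are our own):

```python
import math

def evolve(q, p, T=8 * math.pi, dt=1e-3):
    """Leapfrog for H = p^2/2 + k(t) q^2/2 with k(t) = 1 + 0.1 sin 2t."""
    k = lambda t: 1.0 + 0.1 * math.sin(2 * t)
    t = 0.0
    while t < T:
        p -= 0.5 * dt * k(t) * q
        q += dt * p
        p -= 0.5 * dt * k(t + dt) * q
        t += dt
    return q, p

def energy(q, p):            # k has returned to 1 at t = 8 pi
    return 0.5 * p * p + 0.5 * q * q

# two members of the ensemble, a quarter period apart on the same orbit
Ea = energy(*evolve(1.0, 0.0))   # this phase is pumped up
Eb = energy(*evolve(0.0, 1.0))   # this phase is pumped down
assert Ea > 0.6                  # gained energy (started with E = 1/2)
assert Eb < 0.4                  # lost energy
```

The sign of the energy change depends on the phase of the oscillation relative to the pump, which is exactly why the individual actions are not conserved here.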
    Thus we see what is physically the crucial point in the adiabatic
expansion: if all the systems in the ensemble experience the perturba-
tion in the same way, because the time variation of the Hamiltonian is
slow compared to the time it takes for each system in the ensemble to
occupy the initial position (in phase space) of every other system, then
each system will have its action conserved.

7.3.4    Systems with Many Degrees of Freedom
In the discussion above we considered as our starting point an au-
tonomous system with one degree of freedom. As the Hamiltonian
is a conserved function on phase space, this is an integrable system.
For systems with n > 1 degrees of freedom, we wish to again start
with an integrable system. Such systems have n invariant "integrals
of the motion in involution", and their phase space can be described
in terms of n action variables J_i and corresponding coordinates φ_i.
Phase space is periodic in each of the φ_i with period 2π, and the
submanifold M_f of phase space which has a given set {f_i} of values
for the J_i is an n-dimensional torus. As the J_i are conserved, the
motion is confined to M_f, and indeed the equations of motion are very
simple, dφ_i/dt = ν_i (constant). M_f is known as an invariant torus.
    In the one variable case we related the action to the 1-form p dq. On
the invariant torus, the actions are constants and so it is trivially true
that J_i = ∮ J_i dφ_i/2π, where the integral is ∫_0^{2π} dφ_i with the
other φ's held fixed. This might lead one to think about n 1-forms
without a sum, but it is more profitable to recognize that the single
1-form ω_1 = Σ_i J_i dφ_i alone contains all of the information we need.
First note that, restricted to M_f, dJ_i vanishes, so ω_1 is closed, and
its integral is a topological invariant, that is, unchanged under
continuous deformations of the path. We can take a set

[Fig. 8. For an integrable system with two degrees of freedom, the
motion is confined to a 2-torus, and the trajectories are uniform motion
in each of the angles, with independent frequencies. The two actions J_1
and J_2 may be considered as integrals of the single 1-form ω_1 = Σ_i J_i dφ_i
over two independent cycles Γ_1 and Γ_2 as shown.]

of paths, or cycles, Γ_i, each winding around the torus only in the φ_i
direction, and we then have J_i = (1/2π) ∮_{Γ_i} ω_1. The answer is
completely independent of where the path Γ_i is drawn on M_f, as long as
its topology is unchanged. Thus the action can be thought of as a function
on the simplicial homology H_1 of M_f. The actions can also be expressed
as an integral over a surface Σ_i bounded by the Γ_i,
\[
J_i = \frac{1}{2\pi} \int_{\Sigma_i} \sum_j dJ_j \wedge d\phi_j.
\]
Notice that this surface does not lie on the invariant torus but cuts
across it. This formulation has two advantages. First, Σ_j dp_j ∧ dq_j is
invariant under arbitrary canonical transformations, so Σ_j dJ_j ∧ dφ_j is
just one way to write it. Secondly, on a surface of constant t, such as
Σ_i, it is identical to the fundamental form
\[
\omega_2 = \sum_{i=1}^n dp_i \wedge dq_i - dH \wedge dt,
\]
the generalization to several degrees of freedom of the form we used to
show the invariance of the integral under time evolution in the single
degree of freedom case.


    Now suppose that our system is subject to some time-dependent
perturbation, but that at all times its Hamiltonian remains close to an
integrable system, though that system might have parameters which
vary with time. Let's also assume that after time T the Hamiltonian
again becomes an autonomous integrable system, though perhaps with
parameters different from what it had at t = 0.

    Consider the evolution in time, under the full Hamiltonian, of each
system which at t = 0 was at some point φ_0 on the invariant torus M_f
of the original unperturbed system. Follow each such system until time
T. We assume that none of these systems reaches a critical point during
this evolution. The region in phase space thus varies continuously, and
at the fixed later time T, it still will be topologically an n-torus, which
we will call B. The image of each of the cycles Γ_i will be a cycle Γ̃_i
on B, and together these images will be a basis of the homology H_1 of
B. Let Σ̃_i be surfaces within the t = T hyperplane bounded by Γ̃_i.
Define J̃_i to be the integral on Σ̃_i of ω_2, so
J̃_i = (1/2π) ∫_{Σ̃_i} Σ_j dp_j ∧ dq_j, where we can drop the dH ∧ dt
term on a constant t surface, as dt = 0. We can now repeat the argument
from the one-degree-of-freedom case to show that the integrals J̃_i = J_i,
again because ω_2 is a closed 2-form which vanishes on the surface of
evolution, so that its integrals on the end-caps are the same.

[Fig. 9. Time evolution of the invariant torus, and each of two of the
cycles on it.]
    Now we have assumed that the system is again integrable at t = T,
so there are new actions J_i′, and new invariant tori
\[
M_g = \{ (q, p) \mid J_i'(q, p) = g_i \}.
\]
Each initial system which started at φ_0 winds up on some new invariant
torus with actions g(φ_0).
    If the variation of the Hamiltonian is sufficiently slow and smoothly
varying on phase space, and if the unperturbed motion is sufficiently
ergodic that each system samples the full invariant torus on a time scale
short compared to the variation time of the Hamiltonian, then each
initial system φ_0 may be expected to wind up with the same values of
the perturbed actions, so g is independent of φ_0. That means that the
torus B is, to some good approximation, one of the invariant tori M_g,

that the cycles of B are cycles of M_g, and therefore that J_i′ = J̃_i = J_i,
and each of the actions is an adiabatic invariant.

7.3.5     Formal Perturbative Treatment
Consider a system with Hamiltonian H(q, p, λ), where λ is a set of
parameters, which is integrable for each constant value of λ within
some domain of interest. Now suppose our "real" system is described
by the same Hamiltonian, but with λ(t) a given slowly varying function
of time. Although the full Hamiltonian is not conserved, we will show
that the action variables are approximately so.
    For each fixed value of λ, there is a generating function of type 1 to
the corresponding action-angle variables:

                       F_1(q, φ, λ) : (q, p) → (φ, I).

This is a time-independent transformation, so the Hamiltonian may be
written as H(I(q, p), λ), independent of the angle variable. This
constant-λ Hamiltonian has equations of motion φ̇_i = ∂H/∂I_i = ω_i(λ),
İ_i = 0. But in the case where λ is a function of time, the transformation
F_1 is not a time-independent one, so the correct Hamiltonian is not just
the reexpressed Hamiltonian but has an additional term,
\[
K(\phi, I, \lambda) = H(I, \lambda)
  + \sum_n \frac{\partial F_1}{\partial \lambda_n} \frac{d\lambda_n}{dt},
\]
where the second term is the expansion of ∂F_1/∂t by the chain rule.
The equations of motion involve differentiating K with respect to one
of the variables (φ_j, I_j) holding the others, and time, fixed. While these
are not the usual variables (q, φ) for F_1, they are coordinates of phase
space, so F_1 can be expressed in terms of (φ_j, I_j), and as shown in (??),
it is periodic in the φ_j. The equations of motion are then
\[
\dot\phi_i = \omega_i(\lambda)
  + \sum_n \frac{\partial^2 F_1}{\partial\lambda_n \partial I_i}\,\dot\lambda_n,
\qquad
\dot I_i = -\sum_n \frac{\partial^2 F_1}{\partial\lambda_n \partial\phi_i}\,\dot\lambda_n,
\]

where all the partial derivatives are with respect to the variables φ, I, λ.
We first note that if the parameters λ are slowly varying, the λ̇ₙ's in
the equations of motion make the deviations from the unperturbed
system small, of first order in ε/τ = λ̇/λ, where τ is a typical time
for oscillation of the system. But in fact the constancy of the action
is better than that, because the expression for İⱼ is predominantly an
oscillatory term with zero mean. This is most easily analyzed when the
unperturbed system is truly periodic, with period τ. Then during one
period t ∈ [0, τ], λ̇(t) ≈ λ̇(0) + tλ̈. Assuming λ(t) varies smoothly on
a time scale τ/ε, λ̈ ∼ λ O(ε²/τ²), so if we are willing to drop terms of
order ε², we may treat λ̇ as a constant. We can then also evaluate F₁
on the orbit of the unperturbed system, as that differs from the true
orbit by order ε, and the resulting value is multiplied by λ̇, which is
already of order ε/τ, and the result is to be integrated over a period τ.
Then we may write the change of Iⱼ over one period as
\[
\Delta I_j \approx \sum_n \dot\lambda_n \int_0^\tau \frac{\partial}{\partial\phi_j}\frac{\partial F_1}{\partial\lambda_n}\,dt.
\]

But F₁ is a well defined single-valued function on the invariant manifold,
and so are its derivatives with respect to λₙ, so we may replace the time
integral by an integral over the orbit,
\[
\Delta I_j \approx \sum_n \dot\lambda_n \frac{\tau}{L}\int_L \frac{\partial}{\partial\phi_j}\frac{\partial F_1}{\partial\lambda_n}\,d\phi_j = 0,
\]
where L is the length of the orbit, and we have used the fact that for
the unperturbed system dφⱼ/dt is constant.
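The vanishing of this integral rests only on periodicity: the integral of the derivative of any single-valued periodic function over a full period is zero. A quick numerical illustration (the sample periodic function below is an arbitrary choice, not from the text):

```python
import math

def dFdphi(phi):
    """Derivative of the 2*pi-periodic sample F(phi) = sin(phi) + 0.3*cos(2*phi)."""
    return math.cos(phi) - 0.6 * math.sin(2 * phi)

# Left-endpoint Riemann sum of dF/dphi over one full period [0, 2*pi):
# for a periodic integrand this coincides with the trapezoid rule and
# converges rapidly to the exact value F(2*pi) - F(0) = 0.
n = 100000
h = 2 * math.pi / n
integral = sum(dFdphi(k * h) for k in range(n)) * h
```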
    Thus the action variables have oscillations of order ε, but these
variations do not grow with time. Over a time t, ΔI = O(ε) + tO(ε²/τ),
and is therefore conserved up to order ε even for times as large as τ/ε,
corresponding to many natural periods, and also corresponding to the
time scale on which the Hamiltonian is varying significantly.
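This conservation can be checked numerically on the simplest example, a harmonic oscillator whose frequency is ramped slowly. (A rough sketch, not from the text; the linear ramp, step size, and tolerances are illustrative choices.) The energy grows roughly in proportion to the frequency, while the action I = E/ω stays nearly constant:

```python
def simulate(T=400.0, dt=0.005):
    """Integrate x'' = -omega(t)^2 x with RK4 while omega ramps slowly from 1 to 2."""
    def omega(t):
        return 1.0 + t / T                    # slow ramp over total time T

    def deriv(t, x, v):
        return v, -omega(t) ** 2 * x

    x, v, t = 1.0, 0.0, 0.0
    energies, actions = [], []
    for _ in range(int(T / dt)):
        k1x, k1v = deriv(t, x, v)
        k2x, k2v = deriv(t + dt / 2, x + dt / 2 * k1x, v + dt / 2 * k1v)
        k3x, k3v = deriv(t + dt / 2, x + dt / 2 * k2x, v + dt / 2 * k2v)
        k4x, k4v = deriv(t + dt, x + dt * k3x, v + dt * k3v)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += dt
        E = 0.5 * v * v + 0.5 * omega(t) ** 2 * x * x
        energies.append(E)
        actions.append(E / omega(t))          # adiabatic invariant I = E/omega

    return energies, actions

energies, actions = simulate()
```

Here the ramp time T plays the role of τ/ε: the energy changes by a factor of about 2, but the action returns values within a fraction of a percent of its initial one.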
    This form of perturbation, corresponding to variation of constants
on a time scale slow compared to the natural frequencies of the un-
perturbed system, is known as an adiabatic variation, and a quan-
tity conserved to order ε over times comparable to the variation it-
self is called an adiabatic invariant. Classic examples include ideal
gases in a slowly varying container, a pendulum of slowly varying
length, and the motion of a rapidly moving charged particle in a strong
but slowly varying magnetic field. It is interesting to note that in
Bohr-Sommerfeld quantization in the old quantum mechanics, used be-
fore the Schrödinger equation clarified such issues, the quantization of
bound states was related to quantization of the action. For example,
in Bohr theory the electrons are in states with action nh, with n a
positive integer and h Planck's constant. Because these values are pre-
served under adiabatic perturbation, it is possible that an adiabatic
perturbation of a quantum mechanical system maintains the system in
the initial quantum mechanical state, and indeed this can be shown,
with the full quantum theory, to be the case in general. An important
application is cooling by adiabatic demagnetization. Here atoms with a
magnetic moment are placed in a strong magnetic field and reach equi-
librium according to the Boltzmann distribution for their polarizations.
If the magnetic field is adiabatically reduced, the separation energies
of the various polarization states are reduced proportionally. As the
distribution of polarization states remains the same for the adiabatic
change, it now fits a Boltzmann distribution for a temperature reduced
proportionally to the field, so the atoms have been cooled.
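The scaling argument can be stated in one line (the numbers below are illustrative assumptions, not from the text): for levels with energies ±μB, the Boltzmann weights at field B and temperature T are identical to those at field B/10 and temperature T/10, so an unchanged population distribution after the field is reduced is exactly a Boltzmann distribution at a proportionally reduced temperature:

```python
import math

def boltzmann(energies, kT):
    """Normalized Boltzmann populations for the given level energies."""
    weights = [math.exp(-E / kT) for E in energies]
    Z = sum(weights)                  # partition function
    return [w / Z for w in weights]

levels = [-1.0, +1.0]                 # energies -mu*B and +mu*B, arbitrary units
before = boltzmann(levels, kT=0.5)    # field B, temperature T
after = boltzmann([E / 10 for E in levels], kT=0.05)  # field B/10, temperature T/10
```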


7.4     Rapidly Varying Perturbations
At the other extreme from adiabatic perturbations, we may ask what
happens to a system if we add a perturbative potential which oscillates
rapidly with respect to the natural frequencies of the unperturbed sys-
tem. If the forces are of the same magnitude as those of the unper-
turbed system, we would expect that the coordinates and momenta
would change little during the short time of one external oscillation,
and that the effects of the force might be little more than adding jitter
to the unperturbed motion. Consider the case that the external force
is a pure sinusoidal oscillation,
\[
H(q, p) = H_0(q, p) + U(q)\sin\omega t,
\]
and let us write the resulting motion as
\[
q(t) = \bar q(t) + \xi(t),
\]
\[
p(t) = \bar p(t) + \eta(t),
\]

where we subtract out the average smoothly varying functions q̄ and p̄,
leaving the rapidly oscillating pieces ξ and η, which have natural time
scales of 2π/ω. Thus ξ̈, ωξ̇, ω²ξ, η̇, and ωη should all remain finite as ω
gets large with all the parameters of H₀ and U(q) fixed. The equations
of motion are

\[
\begin{aligned}
\dot{\bar q}_j + \dot\xi_j ={}& \frac{\partial H_0}{\partial p_j}(q, p)\\
={}& \frac{\partial H_0}{\partial p_j}(\bar q, \bar p)
 + \sum_k \xi_k \frac{\partial^2 H_0}{\partial p_j\,\partial q_k}(\bar q, \bar p)
 + \sum_k \eta_k \frac{\partial^2 H_0}{\partial p_j\,\partial p_k}(\bar q, \bar p)\\
&+ \frac12 \sum_{k\ell} \eta_k\eta_\ell \frac{\partial^3 H_0}{\partial p_j\,\partial p_k\,\partial p_\ell}(\bar q, \bar p) + O(\omega^{-3}),\\
\dot{\bar p}_j + \dot\eta_j ={}& -\frac{\partial H_0}{\partial q_j}(q, p) - \frac{\partial U}{\partial q_j}\sin\omega t\\
={}& -\frac{\partial H_0}{\partial q_j}(\bar q, \bar p)
 - \sum_k \xi_k \frac{\partial^2 H_0}{\partial q_j\,\partial q_k}(\bar q, \bar p)
 - \sum_k \eta_k \frac{\partial^2 H_0}{\partial q_j\,\partial p_k}(\bar q, \bar p)\\
&- \frac12 \sum_{k\ell} \eta_k\eta_\ell \frac{\partial^3 H_0}{\partial q_j\,\partial p_k\,\partial p_\ell}(\bar q, \bar p)
 - \sin\omega t\,\frac{\partial U}{\partial q_j}(\bar q)\\
&- \sum_k \xi_k \sin\omega t\,\frac{\partial^2 U}{\partial q_j\,\partial q_k}(\bar q) + O(\omega^{-3}).
\end{aligned}
\tag{7.11}
\]

Averaging over one period, ignoring the changes in the slowly varying
functions² of q̄ and p̄, making use of the assumption that the averages
of ξ and of η vanish, and dropping terms of order ω⁻³, we have
\[
\begin{aligned}
\dot{\bar q}_j ={}& \frac{\partial H_0}{\partial p_j}(\bar q, \bar p)
 + \frac12 \sum_{k\ell} \left\langle\eta_k\eta_\ell\right\rangle \frac{\partial^3 H_0}{\partial p_j\,\partial p_k\,\partial p_\ell}(\bar q, \bar p),\\
\dot{\bar p}_j ={}& -\frac{\partial H_0}{\partial q_j}(\bar q, \bar p)
 - \sum_k \left\langle\xi_k \sin\omega t\right\rangle \frac{\partial^2 U}{\partial q_j\,\partial q_k}(\bar q)
 - \frac12 \sum_{k\ell} \left\langle\eta_k\eta_\ell\right\rangle \frac{\partial^3 H_0}{\partial q_j\,\partial p_k\,\partial p_\ell}(\bar q, \bar p).
\end{aligned}
\tag{7.12}
\]
    ² The careful reader will note that the argument is not really valid, because
we have variations in the coefficient of η of order ω⁻¹ and in the coefficient of
sin ωt of order ω⁻². A valid argument concludes first that Eqs. (7.12) are correct
through order ω⁻¹, which is then enough to get Eqs. (7.13) to the order stated,
and hence (7.14) and (7.15), with the assumption that any additional terms are
rapidly oscillating. If we then average (7.11) over a period centered at 2πn/ω, the
expressions which we claimed vanished do, except that the averages
\[
\left\langle\dot\eta_j\right\rangle = -\left\langle\sin\omega t\,\frac{\partial U}{\partial q_j}\right\rangle = -\frac{\partial}{\partial t}\frac{\partial U}{\partial q_j},
\]
cancelling the inaccuracies of our argument.
Plugging these equations back into (7.11) to evaluate ξ̇ and η̇ to lowest
order gives
\[
\begin{aligned}
\dot\xi_j ={}& \sum_k \eta_k \frac{\partial^2 H_0}{\partial\bar p_k\,\partial\bar p_j} + O(\omega^{-2}),\\
\dot\eta_j ={}& -\sin\omega t\,\frac{\partial U}{\partial\bar q_j} + O(\omega^{-2}).
\end{aligned}
\tag{7.13}
\]
Integrating first for η,
\[
\eta_j(t) = \frac{1}{\omega}\frac{\partial U(\bar q)}{\partial\bar q_j}\cos\omega t
 - \frac{1}{\omega^2}\sin\omega t\,\frac{\partial}{\partial t}\frac{\partial U}{\partial\bar q_j} + O(\omega^{-3}).
\tag{7.14}
\]
Then integrating for ξ gives
\[
\xi_j(t) = \frac{1}{\omega^2}\sum_k \frac{\partial U}{\partial\bar q_k}\frac{\partial^2 H_0}{\partial\bar p_k\,\partial\bar p_j}\sin\omega t + O(\omega^{-3}),
\tag{7.15}
\]
where the extra accuracy is from integrating only over times of order
ω⁻¹. Now the mean values can be evaluated:
\[
\begin{aligned}
\left\langle\eta_k\eta_\ell\right\rangle ={}& \frac{1}{2\omega^2}\frac{\partial U}{\partial\bar q_k}\frac{\partial U}{\partial\bar q_\ell},\\
\left\langle\xi_k \sin\omega t\right\rangle ={}& \frac{1}{2\omega^2}\sum_\ell \frac{\partial U}{\partial\bar q_\ell}\frac{\partial^2 H_0}{\partial\bar p_\ell\,\partial\bar p_k}.
\end{aligned}
\]
Inserting these into the equations of motion (7.12) gives exactly the
Hamilton equations which come from the mean motion Hamiltonian
\[
K(\bar q, \bar p) = H_0(\bar q, \bar p)
 + \frac{1}{4\omega^2}\sum_{k\ell} \frac{\partial U}{\partial\bar q_k}\frac{\partial U}{\partial\bar q_\ell}\frac{\partial^2 H_0}{\partial\bar p_k\,\partial\bar p_\ell}.
\tag{7.16}
\]
    We see that the mean motion is perturbed only by terms of order
ω⁻²τ⁻², where τ is a typical time for the unperturbed Hamiltonian, so
the perturbation is small, even though the original perturbing potential
is not small at generic instants of time.
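Equation (7.16) can be tested numerically. In the sketch below, the choice H₀ = p²/2, U(q) = A cos q and all numerical values are illustrative assumptions, not from the text. With ∂²H₀/∂p² = 1, the mean motion Hamiltonian is K = p̄²/2 + (A sin q̄)²/4ω², and after subtracting the O(1/ω) jitter (7.14) from the initial momentum, the trajectory of the full, rapidly driven system should track the one generated by K:

```python
import math

A, w = 20.0, 300.0                    # drive strength and fast frequency (illustrative)

def rk4_step(f, t, y, dt):
    """One classical Runge-Kutta step for y' = f(t, y), with y a list."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, [a + dt / 2 * b for a, b in zip(y, k1)])
    k3 = f(t + dt / 2, [a + dt / 2 * b for a, b in zip(y, k2)])
    k4 = f(t + dt, [a + dt * b for a, b in zip(y, k3)])
    return [a + dt / 6 * (b + 2 * c + 2 * d + e)
            for a, b, c, d, e in zip(y, k1, k2, k3, k4)]

def full(t, y):                       # H = p^2/2 + A cos(q) sin(w t)
    q, p = y
    return [p, A * math.sin(q) * math.sin(w * t)]

def mean(t, y):                       # K = p^2/2 + (A sin q)^2 / (4 w^2)
    q, p = y
    return [p, -(A * A / (4 * w * w)) * math.sin(2 * q)]

q0 = 0.3
# From (7.14): eta(0) = (1/w)(dU/dq)cos(0) = -A sin(q0)/w, so p_bar(0) = p(0) - eta(0)
y_full = [q0, 0.0]
y_mean = [q0, A * math.sin(q0) / w]

dt, t, dev = 0.0005, 0.0, 0.0
while t < 40.0:
    y_full = rk4_step(full, t, y_full, dt)
    y_mean = rk4_step(mean, t, y_mean, dt)
    t += dt
    dev = max(dev, abs(y_full[0] - y_mean[0]))
```

Although the mean force vanishes, the effective potential (A sin q̄)²/4ω² confines the slow motion, and the two trajectories agree up to the small residual jitter.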
    The careful reader will be bothered by the contributions of slowly
varying terms multiplied by a single ηₖ or by sin ωt, for which the
average over a period will vanish to order ω⁻¹ but not necessarily to
order ω⁻². Thus the corrections to the motion of q̄ and p̄ clearly vanish
to order ω⁻¹, which is enough to establish the equations for ξ(t) and
η(t). But there are still ambiguities of order ω⁻² in ηₖ and contributions
of that order from the sin ωt U term.
    The problem arises from the ambiguity in defining the average mo-
tions by subtracting off the oscillations. Given the function p(t) with
the assurance that its derivative is order 1 as ω → ∞, we might try to
make this subtraction by defining
\[
\bar p(t) := \frac{\omega}{2\pi}\int_{t-\pi/\omega}^{t+\pi/\omega} p(t')\,dt',
\]
and the rapidly oscillating part η(t) = p(t) − p̄(t). But we have not
completely eliminated η̄, for over the cycle centered at t,
\[
\bar\eta := \frac{\omega}{2\pi}\int_{t-\pi/\omega}^{t+\pi/\omega} \eta(t')\,dt'
 = \bar p(t) - \left(\frac{\omega}{2\pi}\right)^2 \int_{t-\pi/\omega}^{t+\pi/\omega} dt' \int_{t'-\pi/\omega}^{t'+\pi/\omega} p(t'')\,dt''.
\]

In the last term we interchange orders of integration,
\[
-\left(\frac{2\pi}{\omega}\right)^2 \left(\bar\eta - \bar p(t)\right)
 = \int_{t-2\pi/\omega}^{t+2\pi/\omega} dt'\, p(t') \int_0^{2\pi/\omega - |t'-t|} du
 = \int_{t-2\pi/\omega}^{t+2\pi/\omega} dt'\, p(t') \left(\frac{2\pi}{\omega} - |t'-t|\right).
\]
    So what! If I could assume p had a reasonable power series expansion
I could evaluate this, but only the first derivative is known to stay
bounded as ω → ∞. In fact, p̄ is probably better defined with a
smooth smearing function, say a Gaussian of width ω⁻¹ᐟ² or so.
    Another approach would be to relax the zero condition, take the
expressions for ξ(t) and η(t) to be exact (as they can be considered
arbitrary subtractions), and then ask whether the q̄ and p̄ which satisfy
the equations given by K solve the original equations through second
order. But the answer is no, because there is a term ∝ cos²ωt from the
ηₖηℓ term in (7.8). Perhaps we could add a higher order, higher frequency
term to ηₖ?
    Let us simplify to one degree of freedom and one parameter, and
write
\[
\begin{aligned}
\eta(t) ={}& \frac{a_1}{\omega}e^{i\omega t} + \frac{a_2}{\omega^2}e^{2i\omega t},\\
\xi(t) ={}& \frac{b_1}{\omega^2}e^{i\omega t} + \frac{b_2}{\omega^3}e^{2i\omega t},
\end{aligned}
\]
so to order ω⁻² inclusive,
\[
\begin{aligned}
\dot\eta ={}& \left(i a_1 + \frac{\dot a_1}{\omega}\right)e^{i\omega t}
 + \left(\frac{2i a_2}{\omega} + \frac{\dot a_2}{\omega^2}\right)e^{2i\omega t},\\
\dot\xi ={}& \left(\frac{i b_1}{\omega} + \frac{\dot b_1}{\omega^2}\right)e^{i\omega t}
 + \frac{2i b_2}{\omega^2}e^{2i\omega t}.
\end{aligned}
\]
The equations of motion are
\[
\begin{aligned}
\dot{\bar q} + \dot\xi ={}& \frac{\partial H_0}{\partial p}
 + \xi\frac{\partial^2 H_0}{\partial p\,\partial q}
 + \eta\frac{\partial^2 H_0}{\partial p^2}
 + \frac12\eta^2\frac{\partial^3 H_0}{\partial p^3} + O(\omega^{-3}),\\
\dot{\bar p} + \dot\eta ={}& -\frac{\partial H_0}{\partial q}
 - \xi\frac{\partial^2 H_0}{\partial q^2}
 - \eta\frac{\partial^2 H_0}{\partial q\,\partial p}
 - \frac12\eta^2\frac{\partial^3 H_0}{\partial q\,\partial p^2}
 - \sin\omega t\,\frac{\partial U}{\partial q}
 - \xi\sin\omega t\,\frac{\partial^2 U}{\partial q^2}.
\end{aligned}
\]
Assuming all rapidly oscillating terms are in the right place,
\[
\begin{aligned}
\left(\frac{i b_1}{\omega} + \frac{\dot b_1}{\omega^2}\right)e^{i\omega t} + \frac{2i b_2}{\omega^2}e^{2i\omega t}
 ={}& \left(\frac{b_1}{\omega^2}e^{i\omega t} + \frac{b_2}{\omega^3}e^{2i\omega t}\right)\frac{\partial^2 H_0}{\partial p\,\partial q}\\
&+ \left(\frac{a_1}{\omega}e^{i\omega t} + \frac{a_2}{\omega^2}e^{2i\omega t}\right)\frac{\partial^2 H_0}{\partial p^2}
 + \frac12\left(\frac{a_1}{\omega}\right)^2 e^{2i\omega t}\frac{\partial^3 H_0}{\partial p^3},\\
\left(i a_1 + \frac{\dot a_1}{\omega}\right)e^{i\omega t} + \left(\frac{2i a_2}{\omega} + \frac{\dot a_2}{\omega^2}\right)e^{2i\omega t}
 ={}& -\frac{b_1}{\omega^2}e^{i\omega t}\frac{\partial^2 H_0}{\partial q^2}
 - \left(\frac{a_1}{\omega}e^{i\omega t} + \frac{a_2}{\omega^2}e^{2i\omega t}\right)\frac{\partial^2 H_0}{\partial q\,\partial p}\\
&- \frac12\left(\frac{a_1}{\omega}\right)^2 e^{2i\omega t}\frac{\partial^3 H_0}{\partial q\,\partial p^2}
 - \sin\omega t\,\frac{\partial U}{\partial q}
 - \frac{b_1}{\omega^2}e^{i\omega t}\sin\omega t\,\frac{\partial^2 U}{\partial q^2}.
\end{aligned}
\]
    This seems to say a₁ is order ω⁻¹, so neither η nor ξ has
corrections of order ω⁻², although their derivatives do. Let us try
another approach.


7.5     New Approach
Let
\[
\begin{aligned}
\xi(t) ={}& \frac{1}{\omega^2}\frac{\partial U}{\partial q}\frac{\partial^2 H}{\partial p^2}\sin\omega t + \frac{1}{\omega^2}\tilde\xi,\\
\eta(t) ={}& \frac{1}{\omega}\frac{\partial U}{\partial q}\cos\omega t + \frac{1}{\omega^2}\tilde\eta,
\end{aligned}
\]
writing ξ̃ and η̃ for the residual pieces, and assume q̄ and p̄ obey
Hamilton's equations with K.
    Then (7.8) says
\[
\begin{aligned}
\frac{1}{4\omega^2}\frac{\partial}{\partial p}\!\left[\left(\frac{\partial U}{\partial q}\right)^2\frac{\partial^2 H}{\partial p^2}\right]
 + \frac{1}{\omega}\frac{\partial U}{\partial q}\frac{\partial^2 H}{\partial p^2}\cos\omega t
 + \frac{1}{\omega^2}\sin\omega t\,\frac{d}{dt}\!\left(\frac{\partial U}{\partial q}\frac{\partial^2 H}{\partial p^2}\right)
 + \frac{1}{\omega^2}\dot{\tilde\xi}\hspace{2em}&\\
 = \frac{1}{\omega^2}\frac{\partial U}{\partial q}\frac{\partial^2 H}{\partial p^2}\frac{\partial^2 H}{\partial q\,\partial p}\sin\omega t
 + \frac{1}{\omega^2}\tilde\xi\,\frac{\partial^2 H}{\partial q\,\partial p}
 + \frac{1}{\omega}\frac{\partial U}{\partial q}\frac{\partial^2 H}{\partial p^2}\cos\omega t&\\
 + \frac{1}{\omega^2}\tilde\eta\,\frac{\partial^2 H}{\partial p^2}
 + \frac{1}{2\omega^2}\left(\frac{\partial U}{\partial q}\right)^2\cos^2\omega t\,\frac{\partial^3 H}{\partial p^3},&
\end{aligned}
\]
\[
\begin{aligned}
-\frac{1}{4\omega^2}\frac{\partial}{\partial q}\!\left[\left(\frac{\partial U}{\partial q}\right)^2\frac{\partial^2 H}{\partial p^2}\right]
 + \frac{1}{\omega}\frac{d}{dt}\!\left(\frac{\partial U}{\partial q}\right)\cos\omega t
 - \frac{\partial U}{\partial q}\sin\omega t
 + \frac{1}{\omega^2}\dot{\tilde\eta}\hspace{2em}&\\
 = -\frac{1}{\omega^2}\frac{\partial U}{\partial q}\frac{\partial^2 H}{\partial p^2}\frac{\partial^2 H}{\partial q^2}\sin\omega t
 - \frac{1}{\omega^2}\tilde\xi\,\frac{\partial^2 H}{\partial q^2}
 - \frac{1}{\omega}\frac{\partial U}{\partial q}\frac{\partial^2 H}{\partial q\,\partial p}\cos\omega t&\\
 - \frac{1}{\omega^2}\tilde\eta\,\frac{\partial^2 H}{\partial q\,\partial p}
 - \frac{1}{2\omega^2}\left(\frac{\partial U}{\partial q}\right)^2\cos^2\omega t\,\frac{\partial^3 H}{\partial q\,\partial p^2}
 - \frac{\partial U}{\partial q}\sin\omega t&\\
 - \frac{1}{\omega^2}\frac{\partial U}{\partial q}\frac{\partial^2 H}{\partial p^2}\sin^2\omega t\,\frac{\partial^2 U}{\partial q^2}
 - \frac{1}{\omega^2}\tilde\xi\,\sin\omega t\,\frac{\partial^2 U}{\partial q^2}.&
\end{aligned}
\]
    Cancel the obvious terms and use d(∂U/∂q)/dt = (∂²U/∂q²)(∂H/∂p) +
O(ω⁻²), to get
\[
\begin{aligned}
\frac{1}{4\omega^2}\left(\frac{\partial U}{\partial q}\right)^2\frac{\partial^3 H}{\partial p^3}
 + \frac{1}{\omega^2}\sin\omega t\,\frac{d}{dt}\!\left(\frac{\partial U}{\partial q}\frac{\partial^2 H}{\partial p^2}\right)
 + \frac{1}{\omega^2}\dot{\tilde\xi}\hspace{2em}&\\
 = \frac{1}{\omega^2}\frac{\partial U}{\partial q}\frac{\partial^2 H}{\partial p^2}\frac{\partial^2 H}{\partial q\,\partial p}\sin\omega t
 + \frac{1}{\omega^2}\tilde\xi\,\frac{\partial^2 H}{\partial q\,\partial p}
 + \frac{1}{\omega^2}\tilde\eta\,\frac{\partial^2 H}{\partial p^2}
 + \frac{1}{2\omega^2}\left(\frac{\partial U}{\partial q}\right)^2\cos^2\omega t\,\frac{\partial^3 H}{\partial p^3},&
\end{aligned}
\]
\[
\begin{aligned}
-\frac{1}{2\omega^2}\frac{\partial U}{\partial q}\frac{\partial^2 U}{\partial q^2}\frac{\partial^2 H}{\partial p^2}
 - \frac{1}{4\omega^2}\left(\frac{\partial U}{\partial q}\right)^2\frac{\partial^3 H}{\partial q\,\partial p^2}
 + \frac{1}{\omega}\frac{\partial^2 U}{\partial q^2}\frac{\partial H}{\partial p}\cos\omega t
 + \frac{1}{\omega^2}\dot{\tilde\eta}\hspace{2em}&\\
 = -\frac{1}{\omega^2}\frac{\partial U}{\partial q}\frac{\partial^2 H}{\partial p^2}\frac{\partial^2 H}{\partial q^2}\sin\omega t
 - \frac{1}{\omega^2}\tilde\xi\,\frac{\partial^2 H}{\partial q^2}
 - \frac{1}{\omega}\frac{\partial U}{\partial q}\frac{\partial^2 H}{\partial q\,\partial p}\cos\omega t&\\
 - \frac{1}{\omega^2}\tilde\eta\,\frac{\partial^2 H}{\partial q\,\partial p}
 - \frac{1}{2\omega^2}\left(\frac{\partial U}{\partial q}\right)^2\cos^2\omega t\,\frac{\partial^3 H}{\partial q\,\partial p^2}&\\
 - \frac{1}{\omega^2}\frac{\partial U}{\partial q}\frac{\partial^2 H}{\partial p^2}\sin^2\omega t\,\frac{\partial^2 U}{\partial q^2}
 - \frac{1}{\omega^2}\tilde\xi\,\sin\omega t\,\frac{\partial^2 U}{\partial q^2}.&
\end{aligned}
\]
    Now bring the first terms on the left to the other side and use cos 2ωt =
2cos²ωt − 1 = −(2sin²ωt − 1), to get
\[
\begin{aligned}
\frac{1}{\omega^2}\sin\omega t\,\frac{d}{dt}\!\left(\frac{\partial U}{\partial q}\frac{\partial^2 H}{\partial p^2}\right)
 + \frac{1}{\omega^2}\dot{\tilde\xi}
 ={}& \frac{1}{\omega^2}\frac{\partial U}{\partial q}\frac{\partial^2 H}{\partial p^2}\frac{\partial^2 H}{\partial q\,\partial p}\sin\omega t
 + \frac{1}{\omega^2}\tilde\xi\,\frac{\partial^2 H}{\partial q\,\partial p}
 + \frac{1}{\omega^2}\tilde\eta\,\frac{\partial^2 H}{\partial p^2}\\
&+ \frac{1}{4\omega^2}\left(\frac{\partial U}{\partial q}\right)^2\frac{\partial^3 H}{\partial p^3}\cos 2\omega t,
\end{aligned}
\]
\[
\begin{aligned}
\frac{1}{\omega^2}\dot{\tilde\eta}
 ={}& -\frac{1}{\omega^2}\frac{\partial U}{\partial q}\frac{\partial^2 H}{\partial p^2}\frac{\partial^2 H}{\partial q^2}\sin\omega t
 - \frac{1}{\omega^2}\tilde\xi\,\frac{\partial^2 H}{\partial q^2}
 - \frac{1}{\omega}\cos\omega t\,\frac{\partial}{\partial q}\!\left(\frac{\partial U}{\partial q}\frac{\partial H}{\partial p}\right)\\
&- \frac{1}{\omega^2}\tilde\eta\,\frac{\partial^2 H}{\partial q\,\partial p}
 - \frac{1}{4\omega^2}\left(\frac{\partial U}{\partial q}\right)^2\frac{\partial^3 H}{\partial q\,\partial p^2}\cos 2\omega t\\
&+ \frac{1}{2\omega^2}\frac{\partial U}{\partial q}\frac{\partial^2 H}{\partial p^2}\frac{\partial^2 U}{\partial q^2}\cos 2\omega t
 - \frac{1}{\omega^2}\tilde\xi\,\sin\omega t\,\frac{\partial^2 U}{\partial q^2}.
\end{aligned}
\]
    Note that there is a term of higher order in the η̃̇ expression, so
\[
\frac{1}{\omega^2}\tilde\eta = -\frac{1}{\omega^2}\sin\omega t\,\frac{\partial}{\partial q}\!\left(\frac{\partial U}{\partial q}\frac{\partial H}{\partial p}\right) + O(\omega^{-3}).
\]
All the other terms are consistent with an O(ω⁻³) rapidly oscillating
contribution.
                              Exercises
 7.1 Consider the harmonic oscillator H = p²/2m + ½mω²q² as a pertur-
bation on a free particle H₀ = p²/2m. Find Hamilton's Principal Function
S(q, P) which generates the transformation of the unperturbed hamiltonian
to Q, P the initial position and momentum. From this, find the Hamiltonian
K(Q, P, t) for the full harmonic oscillator, and thus equations of motion for
Q and P. Solve these iteratively, assuming P(0) = 0, through fourth order
in ω. Express q and p to this order, and compare to the exact solution for
a harmonic oscillator.

 7.2 Consider the Kepler problem in two dimensions. That is, a particle of
(reduced) mass μ moves in two dimensions under the influence of a potential
\[
U(x, y) = -\frac{K}{\sqrt{x^2 + y^2}}.
\]
This is an integrable system, with two integrals of the motion which are in
involution. In answering this problem you are expected to make use of the
explicit solutions we found for the Kepler problem.
a) What are the two integrals of the motion, F₁ and F₂, in more familiar
terms and in terms of explicit functions on phase space?
b) Show that F1 and F2 are in involution.
c) Pick an appropriate η0 ∈ Mf , and explain how the coordinates t are
related to the phase space coordinates η = gt (η0 ). This discussion may
be somewhat qualitative, assuming we both know the explicit solutions of
Chapter 3, but it should be clearly stated.
d) Find the vectors ei which describe the unit cell, and give the relation
between the angle variables φi and the usual coordinates η. One of these
should be explicit, while the other may be described qualitatively.
e) Comment on whether there are relations among the frequencies and
whether this is a degenerate system.
Chapter 8

Field Theory

In section 5.4 we considered the continuum limit of a chain of point
masses on a stretched string. We had a situation in which the poten-
tial energy had interaction terms for particle A which depended only
on the relative displacements of particles in the neighborhood of A.
If we take our coordinates to be displacements from equilibrium, and
consider only motions for which the displacement η = η(x, y, z, t) be-
comes differentiable in the continuum limit, then the leading term in
the potential energy is quadratic in the first derivatives with respect to
the spatial coordinates. For our points on a string at tension τ, with
mass density ρ, we found
\[
\begin{aligned}
T ={}& \frac12 \rho \int_0^L \dot y^2(x)\,dx,\\
U ={}& \frac{\tau}{2} \int_0^L \left(\frac{\partial y}{\partial x}\right)^2 dx,
\end{aligned}
\]
so we can write the Lagrangian as an integral of a Lagrangian density
ℒ(y, ẏ, y′, x, t). Actually for our string we had no y or x or t dependence,
because we ignored gravity Ug = ∫ρg y(x, t) dx, and had a homogeneous
string whose properties were time independent. In general, however,
such dependence is quite possible. For a three dimensional object, such
as the equations for the displacement of the atoms in a crystal, we might
have fields η, the three components of the displacement of a particle,
as a function of the three coordinates (x, y, z) determining the particle,

as well as time. Thus the generalized coordinates are the functions
ηi (x, y, z, t), and the Lagrangian density will depend on these, their
gradients, their time derivatives, as well as possibly on x, y, z, t. Thus
\[
\mathcal{L} = \mathcal{L}\!\left(\eta_i, \frac{\partial\eta_i}{\partial x}, \frac{\partial\eta_i}{\partial y}, \frac{\partial\eta_i}{\partial z}, \frac{\partial\eta_i}{\partial t}, x, y, z, t\right)
\]
and
\[
L = \int dx\,dy\,dz\,\mathcal{L},\qquad
I = \int dx\,dy\,dz\,dt\,\mathcal{L}.
\]

    The actual motion of the system will be given by a particular set
of functions ηi(x, y, z, t), which are functions over the volume in ques-
tion and of t ∈ [tI, tf]. The function will be determined by the laws
of dynamics of the system, together with boundary conditions which
depend on the initial configuration ηi(x, y, z, tI) and perhaps a final
configuration. Generally there are some boundary conditions on the
spatial boundaries as well. For example, our stretched string required
y = 0 at x = 0 and x = L.
    Before taking the continuum limit we saw that the configuration of
the system at a given t was a point in a large N-dimensional configura-
tion space, and the motion of the system is a path Γ(t) in this space. In
the continuum limit N → ∞, so we might think of the path as a path
in an infinite dimensional space. But we can also think of this path as
a mapping t → η(·, ·, ·, t) of time into the (infinite dimensional) space
of functions on ordinary space.
    Hamilton's principle says that the actual path is an extremum of
the action. If we consider small variations δηi(x, y, z, t) which vanish
on the boundaries, then
\[
\delta I = \int dx\,dy\,dz\,dt\,\delta\mathcal{L} = 0.
\]

Note that what is varied here are the functions ηi , not the coordinates
(x, y, z, t). x, y, z do not represent the position of some atom — they
represent a label which tells us which atom it is that we are talking
about. They may well be the equilibrium position of that atom, but
they are independent of the motion. It is the ηi which are the dynamical
degrees of freedom, specifying the configuration of the system.
    The variation
\[
\begin{aligned}
\delta\mathcal{L}\!\left(\eta_i, \frac{\partial\eta_i}{\partial x}, \frac{\partial\eta_i}{\partial y}, \frac{\partial\eta_i}{\partial z}, \frac{\partial\eta_i}{\partial t}, x, y, z, t\right)
 ={}& \frac{\partial\mathcal{L}}{\partial\eta}\delta\eta
 + \frac{\partial\mathcal{L}}{\partial(\partial\eta/\partial x)}\delta\frac{\partial\eta}{\partial x}
 + \frac{\partial\mathcal{L}}{\partial(\partial\eta/\partial y)}\delta\frac{\partial\eta}{\partial y}\\
&+ \frac{\partial\mathcal{L}}{\partial(\partial\eta/\partial z)}\delta\frac{\partial\eta}{\partial z}
 + \frac{\partial\mathcal{L}}{\partial(\partial\eta/\partial t)}\delta\frac{\partial\eta}{\partial t}.
\end{aligned}
\]

Notice there is no variation of x, y, z, and t, as we discussed.
    The notation is getting awkward, so we need to reintroduce the
notation A,i = ∂A/∂ri. In fact, we see that ∂/∂t enters in the same
way as ∂/∂x, so we will set x₀ = t and write
\[
\partial_\mu := \frac{\partial}{\partial x_\mu} = \left(\frac{\partial}{\partial t}, \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right),
\]
for μ = 0, 1, 2, 3, and write η,μ := ∂μη. If there are several fields ηi, then
∂μηi = ηi,μ. The comma represents the beginning of differentiation, so
we must not use one to separate different ordinary indices.
    In this notation, we have
\[
\delta\mathcal{L} = \sum_i \frac{\partial\mathcal{L}}{\partial\eta_i}\delta\eta_i
 + \sum_i \sum_{\mu=0}^{3} \frac{\partial\mathcal{L}}{\partial\eta_{i,\mu}}\delta\eta_{i,\mu},
\]
and
\[
\delta I = \int \left[\sum_i \frac{\partial\mathcal{L}}{\partial\eta_i}\delta\eta_i
 + \sum_i \sum_{\mu=0}^{3} \frac{\partial\mathcal{L}}{\partial\eta_{i,\mu}}\delta\eta_{i,\mu}\right] d^4x,
\]
where d⁴x = dx dy dz dt. Except for the first term, we integrate by
parts,
\[
\delta I = \int \sum_i \left[\frac{\partial\mathcal{L}}{\partial\eta_i}
 - \sum_{\mu=0}^{3} \partial_\mu \frac{\partial\mathcal{L}}{\partial\eta_{i,\mu}}\right] \delta\eta_i\, d^4x,
\]
where we have thrown away the boundary terms which involve δηi eval-
uated on the boundary, which we assumed to be zero. Inside the region
of integration, the δηi are independent, so requiring δI = 0 for all
functions δηi (xµ ) implies

\[
\partial_\mu \frac{\partial\mathcal{L}}{\partial\eta_{i,\mu}} - \frac{\partial\mathcal{L}}{\partial\eta_i} = 0.
\tag{8.1}
\]

    We have written the equations of motion (which are now partial
differential equations rather than coupled ordinary differential equations)
in a form which looks like we are dealing with a relativistic problem,
because t and the spatial coordinates enter in the same way. We
have not made any assumption of relativity, however, and our problem
will not be relativistically invariant unless the Lagrangian density is
invariant under Lorentz transformations (as well as translations).
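As a check of (8.1) on the example that opened the chapter, take the string Lagrangian density ℒ = ½ρẏ² − ½τ(∂y/∂x)²; the Euler-Lagrange equation then reproduces the familiar wave equation:

```latex
\frac{\partial\mathcal{L}}{\partial y_{,t}} = \rho\,\dot y, \qquad
\frac{\partial\mathcal{L}}{\partial y_{,x}} = -\tau\,\frac{\partial y}{\partial x}, \qquad
\frac{\partial\mathcal{L}}{\partial y} = 0,
\quad\Longrightarrow\quad
\rho\,\frac{\partial^2 y}{\partial t^2} - \tau\,\frac{\partial^2 y}{\partial x^2} = 0.
```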
    Now consider how the Lagrangian changes from one point in space-
time to another, including the variation of the fields, assuming the fields
obey the equations of motion. Then the total derivative for a variation
of xµ is

\[
\frac{dL}{dx_\mu} = \left.\frac{\partial L}{\partial x_\mu}\right|_\eta + \sum_i \frac{\partial L}{\partial \eta_i}\,\eta_{i,\mu} + \sum_{i,\nu} \frac{\partial L}{\partial \eta_{i,\nu}}\,\eta_{i,\nu,\mu}.
\]

Plugging the equations of motion into the second term,

\[
\frac{dL}{dx_\mu} = \frac{\partial L}{\partial x_\mu} + \sum_{i,\nu} \partial_\nu\!\left(\frac{\partial L}{\partial \eta_{i,\nu}}\right)\eta_{i,\mu} + \sum_{i,\nu} \frac{\partial L}{\partial \eta_{i,\nu}}\,\eta_{i,\mu,\nu}
 = \frac{\partial L}{\partial x_\mu} + \sum_{i,\nu} \partial_\nu\!\left(\frac{\partial L}{\partial \eta_{i,\nu}}\,\eta_{i,\mu}\right).
\]

Thus
\[
\partial_\nu T_{\mu\nu} = -\frac{\partial L}{\partial x_\mu}, \tag{8.2}
\]
where the stress-energy tensor Tµν is defined by

\[
T_{\mu\nu}(x) = \sum_i \frac{\partial L}{\partial \eta_{i,\nu}}\,\eta_{i,\mu} - L\,\delta_{\mu\nu}. \tag{8.3}
\]

Note that if the Lagrangian density has no explicit dependence on
the coordinates $x_\mu$, the stress-energy tensor satisfies $\partial_\nu T_{\mu\nu} = 0$,
which is a continuity equation.
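This conservation law can be checked directly for a simple case. The sketch below (again assuming the free scalar-field density in 1+1 dimensions, with indices 0 = t, 1 = x) builds $T_{\mu\nu}$ from (8.3) on a solution of the wave equation and verifies that each divergence vanishes:

```python
import sympy as sp

t, x = sp.symbols('t x')
eta = sp.sin(x - t) + sp.cos(x + t) / 2   # a solution of the wave equation

eta_t, eta_x = sp.diff(eta, t), sp.diff(eta, x)
L = (eta_t**2 - eta_x**2) / 2             # assumed scalar-field density

# (8.3): T_{mu nu} = (dL/d eta_{,nu}) eta_{,mu} - L delta_{mu nu}
coord = {0: t, 1: x}
dL = {0: eta_t, 1: -eta_x}                # dL/d(eta_{,t}), dL/d(eta_{,x})
T = {(mu, nu): dL[nu] * sp.diff(eta, coord[mu]) - (L if mu == nu else 0)
     for mu in (0, 1) for nu in (0, 1)}

# with no explicit x-dependence, (8.2) says sum_nu d_nu T_{mu nu} = 0
divs = [sp.simplify(sum(sp.diff(T[(mu, nu)], coord[nu]) for nu in (0, 1)))
        for mu in (0, 1)]
print(divs)  # [0, 0]
```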
     In the dynamics of discrete systems we defined the Hamiltonian as
$H = \sum_i p_i \dot q_i - L(q, p, t)$. Considering the continuum as a limit,
$L = \int d^3x\, \mathcal{L}$ is the limit of $\sum_{ijk} \Delta x\,\Delta y\,\Delta z\, \mathcal{L}_{ijk}$, where $\mathcal{L}_{ijk}$
depends on $q_{ijk}$ and a few of its neighbors, and also on $\dot q_{ijk}$. The
conjugate momentum $p_{ijk} = \partial L/\partial \dot q_{ijk} = \Delta x\,\Delta y\,\Delta z\, \partial \mathcal{L}_{ijk}/\partial \dot q_{ijk}$,
which would vanish in the continuum limit, so instead we define
\[
\pi(x, y, z) = p_{ijk}/\Delta x\,\Delta y\,\Delta z = \partial \mathcal{L}_{ijk}/\partial \dot q_{ijk} = \delta L/\delta \dot q(x, y, z).
\]

The Hamiltonian
\[
H = \sum p_{ijk}\,\dot q_{ijk} - L = \sum \Delta x\,\Delta y\,\Delta z\,\pi(x, y, z)\,\dot q(x, y, z) - L
  = \int d^3x \left( \pi(r)\,\dot q(r) - \mathcal{L} \right) = \int d^3x\, \mathcal{H},
\]
where the Hamiltonian density is defined by
\[
\mathcal{H}(r) = \pi(r)\,\dot q(r) - \mathcal{L}(r).
\]

Of course if there are several fields $q_i$ at each point,
\[
\mathcal{H}(r) = \sum_i \pi_i(r)\,\dot q_i(r) - \mathcal{L}(r),
\]
where
\[
\pi_i(r) = \frac{\delta L}{\delta \dot q_i(r)}.
\]
Notice that the Hamiltonian density is exactly T00 , one component of
the stress-energy tensor.
    Consider the case where $\mathcal{L}$ does not depend explicitly on $(\vec x, t)$, so
\[
\sum_{\nu=0}^{3} \partial_\nu T_{\mu\nu} = 0,
\]
or
\[
\frac{\partial}{\partial t} T_{\mu 0} + \sum_{i=1}^{3} \partial_i T_{\mu i} = 0.
\]

This is a continuity equation, similar to the equation from fluid me-
chanics, $\partial\rho/\partial t + \vec\nabla \cdot (\rho \vec v) = 0$, which expresses the conservation of
mass. That equation has the interpretation that the change in the
mass contained in some volume is equal to the flux into the volume,
because $\rho\vec v$ is the flow of mass past a unit surface area. In the current
case, we have four conservation equations, indexed by $\mu$. Each of these
can be integrated over space to tell us about the rate of change of the
"charge" $Q_\mu(t) = \int d^3x\, T_{\mu 0}(\vec x, t)$,
\[
\frac{d}{dt} Q_\mu(t) = -\int d^3x \sum_i \frac{\partial}{\partial x_i} T_{\mu i}(\vec x, t).
\]
We see that this is the integral of the divergence of a vector current
(Jµ )i = Tµi , which by Gauss’ law becomes a surface integral of the flux
of Jµ out of the volume of our system. We have been sloppy about our
boundary conditions, but in many cases it is reasonable to assume there
is no flux out of the volume. In this case the right hand side vanishes,
and we find four conserved quantities
\[
Q_\mu(t) = \text{constant}.
\]
   For µ = 0 we saw that T00 is the energy density, so Q0 is the total
energy.
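A numerical illustration (not from the text): discretizing the 1+1-dimensional wave equation on a periodic grid, the total energy $Q_0 = \int T_{00}\, dx$ stays constant up to integration error. The grid sizes, time step, and initial pulse below are arbitrary choices.

```python
import numpy as np

# grid, time step and initial pulse are arbitrary illustrative choices
N, dx, dt = 200, 0.05, 0.02
x = np.arange(N) * dx
eta = np.exp(-(x - N * dx / 2) ** 2)      # smooth initial field
pi = np.zeros(N)                          # conjugate momentum density

def laplacian(f):
    return (np.roll(f, -1) - 2 * f + np.roll(f, 1)) / dx**2

def energy(f, p):
    # Q_0 = sum over cells of (pi^2 + eta_x^2)/2 * dx, forward difference
    f_x = (np.roll(f, -1) - f) / dx
    return 0.5 * np.sum(p**2 + f_x**2) * dx

E0 = energy(eta, pi)
for _ in range(500):                      # velocity-Verlet integration
    pi += 0.5 * dt * laplacian(eta)
    eta += dt * pi
    pi += 0.5 * dt * laplacian(eta)

drift = abs(energy(eta, pi) - E0) / E0
print(drift < 1e-2)  # True: the "charge" Q_0 is conserved to good accuracy
```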

Cyclic coordinates
In discrete mechanics, when $L$ was independent of a coordinate $q_i$, even
though it depended on $\dot q_i$, we called the coordinate cyclic or ignorable,
and found a conserved momentum conjugate to it. For fields in general,
$L(\eta, \dot\eta, \nabla\eta)$ depends on spatial derivatives of $\eta$ as well, and we may ask
whether we need to require absence of dependence on $\nabla\eta$ for a coordi-
nate to be cyclic. Independence of both $\eta$ and $\nabla\eta$ implies independence
of an infinite number of discrete coordinates, the values of $\eta(r)$ at ev-
ery point $r$, which is too restrictive a condition for our discussion. We
will call a coordinate field $\eta_i$ cyclic if $L$ does not depend directly on $\eta_i$,
although it may depend on its derivatives $\dot\eta_i$ and $\nabla\eta_i$.
    The Lagrange equation then states
\[
\sum_\mu \partial_\mu \frac{\delta L}{\delta \eta_{i,\mu}} = 0, \qquad \text{or} \qquad \frac{d}{dt}\pi_i + \sum_j \partial_j \frac{\delta L}{\delta \eta_{i,j}} = 0.
\]

If we integrate this equation over all space, and define

\[
\Pi_i(t) = \int \pi_i(r)\, d^3r,
\]

then the derivative dΠ/dt involves the integral of a divergence, which
by Gauss’ law is a surface term
\[
\frac{d\Pi_i(t)}{dt} = -\oint \frac{\delta L}{\delta \eta_{i,j}}\, (dS)_j.
\]
Assuming the spatial boundary conditions are such that we may ignore
this boundary term, we see that Πi (t) is a constant of the motion.


8.1      Noether’s Theorem
We want to discuss the relationship between symmetries and conserved
quantities which is known as Noether's theorem. It concerns in-
finitesimal transformations of the degrees of freedom $\eta_i(x_\mu)$ which may
relate these to degrees of freedom at a changed point. That is, the
new field $\eta_i'(x')$ is related to $\eta_i(x)$ rather than to $\eta_i(x')$, where
$x_\mu \to x_\mu' = x_\mu + \delta x_\mu$ is some infinitesimal transformation of the coordinates rather
than of the degrees of freedom. For a scalar field, like temperature,
under a rotation, we would define the new field

\[
\eta'(x') = \eta(x),
\]

but more generally the field may also change, in a way that may depend
on other fields,

\[
\eta_i'(x') = \eta_i(x) + \delta\eta_i(x; \eta_k(x)).
\]

This is what you would expect for a vector field E under rotations,
because the new Ex gets a component from the old Ey .
    The Lagrangian is a given function of the old fields $L(\eta_i, \eta_{i,\mu}, x_\mu)$.
If we substitute in the values of $\eta(x)$ in terms of $\eta'(x')$ we get a new
function $L'$, defined by
\[
L'(\eta_i', \eta_{i,\mu}', x_\mu') = L(\eta_i, \eta_{i,\mu}, x_\mu).
\]

   The symmetries we wish to discuss are transformations of this type
under which the form of the Lagrangian density does not change, so
that L is the same functional form as L, or
\[
L'(\eta_i', \eta_{i,\mu}', x_\mu') = L(\eta_i', \eta_{i,\mu}', x_\mu').
\]
    In considering the action, we integrate the Lagrangian density over
a region of space-time between two spatial slices corresponding to an
initial time and a final time. We may, however, consider an arbitrary
region of spacetime Ω ⊂ R4 . The corresponding four dimensional vol-
ume in the transformed coordinates is the region x ∈ Ω . The action
for a given field configuration η

\[
S(\eta) = \int_\Omega L(\eta, \eta_{,\mu}, x)\, d^4x
\]
differs from $S'(\eta') = \int_{\Omega'} L'(\eta', \eta'_{,\mu}, x')\, d^4x'$ only by the Jacobian, as a
change of variables gives
\[
S'(\eta') = \int_\Omega \left| \frac{\partial x'}{\partial x} \right| L(\eta, \eta_{,\mu}, x)\, d^4x.
\]
The Jacobian is
\[
\det\left( \delta_{\mu\nu} + \partial_\nu\, \delta x_\mu \right) = 1 + \mathrm{Tr}\,\frac{\partial\, \delta x_\mu}{\partial x_\nu} = 1 + \partial_\mu\, \delta x_\mu.
\]
It makes little sense to assume the Lagrangian density is invariant unless
the volume element is as well, so we will require the Jacobian to be
identically 1, or ∂µ δxµ = 0. So then δS = 0 for the symmetries we wish
to discuss.
    We can also consider $S'(\eta')$ as an integral over $x$, as this is just a
dummy variable,

\[
S'(\eta') = \int_\Omega L\!\left( \eta'(x), \eta'_{,\mu}(x), x \right) d^4x.
\]
                              Ω

This differs from $S(\eta)$ by $S'(\eta') - S(\eta) = \delta_1 S + \delta_2 S$, because
  1. the Lagrangian is evaluated with the field $\eta'$ rather than $\eta$, pro-
     ducing a change
\[
\delta_1 S = \int \left( \frac{\delta L}{\delta \eta_i}\,\bar\delta\eta_i + \frac{\delta L}{\delta \eta_{i,\mu}}\,\bar\delta\eta_{i,\mu} \right) d^4x,
\]

     where
\[
\bar\delta\eta_i(x) := \eta_i'(x) - \eta_i(x) = \eta_i'(x) - \eta_i'(x') + \delta\eta_i(x) = \delta\eta_i(x) - \eta_{i,\mu}\,\delta x_\mu.
\]

  2. Change in the region of integration, $\Omega'$ rather than $\Omega$,
\[
\delta_2 S = \left( \int_{\Omega'} - \int_\Omega \right) L(\eta, \eta_{,\mu}, x)\, d^4x.
\]


If we define $d\Sigma_\mu$ to be an element of the three dimensional surface
$\Sigma = \partial\Omega$ of $\Omega$, with outward-pointing normal in the direction of $d\Sigma_\mu$,
the difference in the regions of integration may be written as an integral
over the surface,
\[
\left( \int_{\Omega'} - \int_\Omega \right) d^4x = \int_\Sigma \delta x_\mu \cdot d\Sigma_\mu.
\]

Thus
\[
\delta_2 S = \int_{\partial\Omega} L\, \delta x_\mu \cdot d\Sigma_\mu = \int_\Omega \partial_\mu \left( L\, \delta x_\mu \right) d^4x \tag{8.4}
\]

by Gauss’ Law (in four dimensions).
    As $\bar\delta$ is a difference of two functions at the same values of $x$, this
operator commutes with partial differentiation, so $\bar\delta\eta_{i,\mu} = \partial_\mu \bar\delta\eta_i$. Using
this in the second term of δ1 S and the equations of motion in the first,
we have
\[
\begin{aligned}
\delta_1 S &= \int d^4x \sum_i \left[ \partial_\mu\!\left( \frac{\partial L}{\partial \eta_{i,\mu}} \right) \bar\delta\eta_i + \frac{\partial L}{\partial \eta_{i,\mu}}\, \partial_\mu \bar\delta\eta_i \right] \\
&= \int_\Omega d^4x\, \sum_i \partial_\mu \left( \frac{\partial L}{\partial \eta_{i,\mu}}\, \bar\delta\eta_i \right) \\
&= \int_\Omega d^4x\, \sum_i \partial_\mu \left( \frac{\partial L}{\partial \eta_{i,\mu}}\, \delta\eta_i - \frac{\partial L}{\partial \eta_{i,\mu}}\, \eta_{i,\nu}\, \delta x_\nu \right).
\end{aligned}
\]

Then δ1 S + δ2 S = 0 is a condition in the form

                                       d4 x ∂µ Jµ = 0,                         (8.5)
                                  Ω

which holds for arbitrary volumes Ω. Thus we have a conservation
equation
\[
\partial_\mu J_\mu = 0.
\]
The infinitesimal variations may be thought of as proportional to an
infinitesimal parameter $\epsilon$, which is often in fact a component of a four-
vector. The variations in $x_\mu$ and $\eta_i$ are then
\[
\delta x_\mu = \epsilon\, \frac{dx_\mu}{d\epsilon}, \qquad \delta\eta_i = \epsilon\, \frac{d\eta_i}{d\epsilon},
\]
so if $\delta_1 S + \delta_2 S = 0$ is $-\epsilon$ times (8.5),
\[
\begin{aligned}
J_\mu &= -\sum_i \frac{\partial L}{\partial \eta_{i,\mu}}\, \frac{d\eta_i}{d\epsilon} + \sum_i \frac{\partial L}{\partial \eta_{i,\mu}}\, \eta_{i,\nu}\, \frac{dx_\nu}{d\epsilon} - L\, \frac{dx_\mu}{d\epsilon} \\
&= -\sum_i \frac{\partial L}{\partial \eta_{i,\mu}}\, \frac{d\eta_i}{d\epsilon} + T_{\nu\mu}\, \frac{dx_\nu}{d\epsilon}.
\end{aligned} \tag{8.6}
\]


                              Exercises
 8.1 The Lagrangian density for the electromagnetic field in vacuum may
be written
\[
L = \frac{1}{2}\left( E^2 - B^2 \right),
\]
where the dynamical degrees of freedom are not E and B, but rather A and
φ, where

                           B =        ×A
                                   1 ˙
                           E = − φ− A
                                   c

a) Find the canonical momenta, and comment on what seems unusual about
one of the answers.
b) Find the Lagrange Equations for the system. Relate to known equations
for the electromagnetic field.
Appendix A

$\epsilon_{ijk}$ and cross products

A.1      Vector Operations
A.1.1     $\delta_{ij}$ and $\epsilon_{ijk}$

These are some notes on the use of the antisymmetric symbol $\epsilon_{ijk}$ for
expressing cross products. This is an extremely powerful tool for manip-
ulating cross products and their generalizations in higher dimensions,
and although many low level courses avoid the use of $\epsilon$, I think this is
a mistake and I want you to become proficient with it.
   In a cartesian coordinate system a vector $\vec V$ has components $V_i$ along
each of the three orthonormal basis vectors $\hat e_i$, or $\vec V = \sum_i V_i \hat e_i$. The dot
product of two vectors, $\vec A \cdot \vec B$, is bilinear and can therefore be written
as
\[
\begin{aligned}
\vec A \cdot \vec B &= \Big( \sum_i A_i \hat e_i \Big) \cdot \Big( \sum_j B_j \hat e_j \Big) && \text{(A.1)} \\
&= \sum_i \sum_j A_i B_j\, \hat e_i \cdot \hat e_j && \text{(A.2)} \\
&= \sum_i \sum_j A_i B_j\, \delta_{ij}, && \text{(A.3)}
\end{aligned}
\]


where the Kronecker delta $\delta_{ij}$ is defined to be 1 if $i = j$ and 0 otherwise.
As the basis vectors $\hat e_k$ are orthonormal, i.e. orthogonal to each other
and of unit length, we have $\hat e_i \cdot \hat e_j = \delta_{ij}$.


    Doing a sum over an index $j$ of an expression involving a $\delta_{ij}$ is
very simple, because the only term in the sum which contributes is
the one with $j = i$. Thus $\sum_j F(i,j)\,\delta_{ij} = F(i,i)$, which is to say, one
just replaces $j$ with $i$ in all the other factors, and drops the $\delta_{ij}$ and the
summation over $j$. So we have $\vec A \cdot \vec B = \sum_i A_i B_i$, the standard expression
for the dot product.¹
    We now consider the cross product of two vectors, $\vec A \times \vec B$, which
is also a bilinear expression, so we must have $\vec A \times \vec B = (\sum_i A_i \hat e_i) \times
(\sum_j B_j \hat e_j) = \sum_i \sum_j A_i B_j (\hat e_i \times \hat e_j)$. The cross product $\hat e_i \times \hat e_j$ is a vector,
which can therefore be written as $\vec V = \sum_k V_k \hat e_k$. But the vector result
depends also on the two input vectors, so the coefficients $V_k$ really
depend on $i$ and $j$ as well. Define them to be $\epsilon_{kij}$, so
\[
\hat e_i \times \hat e_j = \sum_k \epsilon_{kij}\, \hat e_k.
\]
It is easy to evaluate the 27 coefficients $\epsilon_{kij}$, because the cross product
of two orthogonal unit vectors is a unit vector orthogonal to both of
them. Thus $\hat e_1 \times \hat e_2 = \hat e_3$, so $\epsilon_{312} = 1$ and $\epsilon_{k12} = 0$ if $k = 1$ or $2$. Applying
the same argument to $\hat e_2 \times \hat e_3$ and $\hat e_3 \times \hat e_1$, and using the antisymmetry
of the cross product, $\vec A \times \vec B = -\vec B \times \vec A$, we see that
\[
\epsilon_{123} = \epsilon_{231} = \epsilon_{312} = 1; \qquad \epsilon_{132} = \epsilon_{213} = \epsilon_{321} = -1,
\]
and $\epsilon_{ijk} = 0$ for all other values of the indices, i.e. $\epsilon_{ijk} = 0$ whenever any
two of the indices are equal. Note that $\epsilon$ changes sign not only when the
last two indices are interchanged (a consequence of the antisymmetry of
the cross product), but whenever any two of its indices are interchanged.
Thus $\epsilon_{ijk}$ is zero unless $(1, 2, 3) \to (i, j, k)$ is a permutation, and is equal
to the sign of the permutation if it exists.
    Now that we have an expression for $\hat e_i \times \hat e_j$, we can evaluate
\[
\vec A \times \vec B = \sum_i \sum_j A_i B_j\, (\hat e_i \times \hat e_j) = \sum_{ijk} \epsilon_{kij} A_i B_j\, \hat e_k. \tag{A.4}
\]

   Much of the usefulness of expressing cross products in terms of $\epsilon$'s
comes from the identity
\[
\sum_k \epsilon_{kij}\,\epsilon_{k\ell m} = \delta_{i\ell}\,\delta_{jm} - \delta_{im}\,\delta_{j\ell}, \tag{A.5}
\]
  ¹Note that this only holds because we have expressed our vectors in terms of
orthonormal basis vectors.

which can be shown as follows. To get a contribution to the sum, $k$
must be different from the unequal indices $i$ and $j$, and also different
from $\ell$ and $m$. Thus we get 0 unless the pair $(i, j)$ and the pair $(\ell, m)$
are the same pair of different indices. There are only two ways that
can happen, as given by the two terms, and we only need to verify the
coefficients. If $i = \ell$ and $j = m$, the two $\epsilon$'s are equal and the square
is 1, so the first term has the proper coefficient of 1. The second term
differs by one transposition of two indices on one epsilon, so it must
have the opposite sign.
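The identity (A.5) can also be checked by brute force; the sketch below builds $\epsilon_{ijk}$ from permutation signs and compares both sides numerically:

```python
import numpy as np
from itertools import permutations

# build eps[i,j,k] as the sign of the permutation (0,1,2) -> (i,j,k)
eps = np.zeros((3, 3, 3))
for p in permutations(range(3)):
    inv = sum(p[a] > p[b] for a in range(3) for b in range(a + 1, 3))
    eps[p] = (-1) ** inv

delta = np.eye(3)
# (A.5): sum_k eps_{kij} eps_{klm} = delta_il delta_jm - delta_im delta_jl
lhs = np.einsum('kij,klm->ijlm', eps, eps)
rhs = (np.einsum('il,jm->ijlm', delta, delta)
       - np.einsum('im,jl->ijlm', delta, delta))
print(np.allclose(lhs, rhs))  # True
```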
    We now turn to some applications. Let us first evaluate
\[
\vec A \cdot (\vec B \times \vec C) = \sum_i A_i \sum_{jk} \epsilon_{ijk} B_j C_k = \sum_{ijk} \epsilon_{ijk} A_i B_j C_k. \tag{A.6}
\]


Note that $\vec A \cdot (\vec B \times \vec C)$ is, up to sign, the volume of the parallelopiped
formed by the vectors $\vec A$, $\vec B$, and $\vec C$. From the fact that the $\epsilon$ changes
sign under transpositions of any two indices, we see that the same is
true for transposing the vectors, so that
\[
\vec A \cdot (\vec B \times \vec C) = -\vec A \cdot (\vec C \times \vec B) = \vec B \cdot (\vec C \times \vec A) = -\vec B \cdot (\vec A \times \vec C)
 = \vec C \cdot (\vec A \times \vec B) = -\vec C \cdot (\vec B \times \vec A).
\]

   Now consider $\vec V = \vec A \times (\vec B \times \vec C)$. Using our formulas,
\[
\vec V = \sum_{ijk} \epsilon_{kij}\,\hat e_k A_i (\vec B \times \vec C)_j = \sum_{ijk} \epsilon_{kij}\,\hat e_k A_i \sum_{lm} \epsilon_{jlm} B_l C_m.
\]
Notice that the sum on $j$ involves only the two epsilons, and we can use
\[
\sum_j \epsilon_{kij}\,\epsilon_{jlm} = \sum_j \epsilon_{jki}\,\epsilon_{jlm} = \delta_{kl}\,\delta_{im} - \delta_{km}\,\delta_{il}.
\]
Thus
\[
\begin{aligned}
V_k &= \sum_{ilm} \Big( \sum_j \epsilon_{kij}\,\epsilon_{jlm} \Big) A_i B_l C_m = \sum_{ilm} (\delta_{kl}\,\delta_{im} - \delta_{km}\,\delta_{il}) A_i B_l C_m \\
&= \sum_{ilm} \delta_{kl}\,\delta_{im} A_i B_l C_m - \sum_{ilm} \delta_{km}\,\delta_{il} A_i B_l C_m \\
&= \sum_i A_i B_k C_i - \sum_i A_i B_i C_k = (\vec A \cdot \vec C)\, B_k - (\vec A \cdot \vec B)\, C_k,
\end{aligned}
\]
so
\[
\vec A \times (\vec B \times \vec C) = \vec B\, (\vec A \cdot \vec C) - \vec C\, (\vec A \cdot \vec B). \tag{A.7}
\]
This is sometimes known as the bac-cab formula.
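A quick numerical spot-check of the bac-cab formula (A.7) with random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = rng.standard_normal((3, 3))       # three random 3-vectors

lhs = np.cross(A, np.cross(B, C))
rhs = B * np.dot(A, C) - C * np.dot(A, B)   # "bac minus cab"
print(np.allclose(lhs, rhs))  # True
```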
       Exercise: Using (A.5) for the manipulation of cross products,
   show that
\[
(\vec A \times \vec B) \cdot (\vec C \times \vec D) = (\vec A \cdot \vec C)(\vec B \cdot \vec D) - (\vec A \cdot \vec D)(\vec B \cdot \vec C).
\]

    The determinant of a matrix can be defined using the $\epsilon$ symbol. For
a $3 \times 3$ matrix $A$,
\[
\det A = \sum_{ijk} \epsilon_{ijk} A_{1i} A_{2j} A_{3k} = \sum_{ijk} \epsilon_{ijk} A_{i1} A_{j2} A_{k3}.
\]
From the second definition, we see that the determinant is the volume
of the parallelopiped formed from the images under the linear map $A$
of the three unit vectors $\hat e_i$, as
\[
(A\hat e_1) \cdot \left( (A\hat e_2) \times (A\hat e_3) \right) = \det A.
\]

    In higher dimensions, the cross product is not a vector, but there
is a generalization of $\epsilon$ which remains very useful. In an $n$-dimensional
space, $\epsilon_{i_1 i_2 \ldots i_n}$ has $n$ indices and is defined as the sign of the permuta-
tion $(1, 2, \ldots, n) \to (i_1, i_2, \ldots, i_n)$, if the indices are all unequal, and zero
otherwise. The analog of (A.5) has $(n-1)!$ terms from all the permu-
tations of the unsummed indices on the second $\epsilon$. The determinant of
an $n \times n$ matrix is defined as
\[
\det A = \sum_{i_1, \ldots, i_n} \epsilon_{i_1 i_2 \ldots i_n} \prod_{p=1}^{n} A_{p, i_p}.
\]
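The permutation-sum definition of the determinant can be compared against a library routine; the helper names below are my own:

```python
import numpy as np
from itertools import permutations

def det_eps(A):
    # det A = sum_{i_1...i_n} eps_{i_1...i_n} prod_p A[p, i_p]
    n = A.shape[0]
    total = 0.0
    for p in permutations(range(n)):
        inv = sum(p[a] > p[b] for a in range(n) for b in range(a + 1, n))
        total += (-1) ** inv * np.prod([A[row, p[row]] for row in range(n)])
    return total

A = np.random.default_rng(1).standard_normal((4, 4))
print(np.isclose(det_eps(A), np.linalg.det(A)))  # True
```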
Appendix B

The gradient operator

We can define the gradient operator
\[
\vec\nabla = \sum_i \hat e_i\, \frac{\partial}{\partial x_i}. \tag{B.1}
\]

While this looks like an ordinary vector, the coefficients are not num-
bers Vi but are operators, which do not commute with functions of
the coordinates xi . We can still write out the components straightfor-
wardly, but we must be careful to keep the order of the operators and
the fields correct.
   The gradient of a scalar field $\Phi(r)$ is simply evaluated by distributing
the gradient operator
\[
\vec\nabla\Phi = \Big( \sum_i \hat e_i\, \frac{\partial}{\partial x_i} \Big) \Phi(r) = \sum_i \hat e_i\, \frac{\partial \Phi}{\partial x_i}. \tag{B.2}
\]

Because the individual components obey the Leibnitz rule
$\frac{\partial (AB)}{\partial x_i} = \frac{\partial A}{\partial x_i} B + A \frac{\partial B}{\partial x_i}$,
so does the gradient, so if $A$ and $B$ are scalar fields,
\[
\vec\nabla (AB) = (\vec\nabla A)\, B + A\, \vec\nabla B. \tag{B.3}
\]

    The general application of the gradient operator $\vec\nabla$ to a vector $\vec A$
gives an object with coefficients with two indices, a tensor. Some parts
of this tensor, however, can be simplified. The first (which is the trace

of the tensor) is called the divergence of the vector, written and defined
by
\[
\vec\nabla \cdot \vec A = \Big( \sum_i \hat e_i\, \frac{\partial}{\partial x_i} \Big) \cdot \Big( \sum_j \hat e_j A_j \Big) = \sum_{ij} \hat e_i \cdot \hat e_j\, \frac{\partial A_j}{\partial x_i} = \sum_{ij} \delta_{ij}\, \frac{\partial A_j}{\partial x_i} = \sum_i \frac{\partial A_i}{\partial x_i}. \tag{B.4}
\]
In asking about Leibnitz' rule, we must remember to apply the diver-
gence operator only to vectors. One possibility is to apply it to the
vector $\vec V = \Phi\vec A$, with components $V_i = \Phi A_i$. Thus
\[
\vec\nabla \cdot (\Phi\vec A) = \sum_i \frac{\partial(\Phi A_i)}{\partial x_i} = \sum_i \frac{\partial\Phi}{\partial x_i}\, A_i + \Phi \sum_i \frac{\partial A_i}{\partial x_i} = (\vec\nabla\Phi) \cdot \vec A + \Phi\, \vec\nabla \cdot \vec A. \tag{B.5}
\]
We could also apply the divergence to the cross product of two vectors,
\[
\vec\nabla \cdot (\vec A \times \vec B) = \sum_i \frac{\partial(\vec A \times \vec B)_i}{\partial x_i} = \sum_i \frac{\partial\big(\sum_{jk} \epsilon_{ijk} A_j B_k\big)}{\partial x_i} = \sum_{ijk} \epsilon_{ijk}\, \frac{\partial(A_j B_k)}{\partial x_i}
 = \sum_{ijk} \epsilon_{ijk}\, \frac{\partial A_j}{\partial x_i}\, B_k + \sum_{ijk} \epsilon_{ijk}\, A_j\, \frac{\partial B_k}{\partial x_i}. \tag{B.6}
\]

This is expressible in terms of the curls of $\vec A$ and $\vec B$.
   The curl is like a cross product with the first vector replaced by the
differential operator, so we may write the $i$'th component as
\[
(\vec\nabla \times \vec A)_i = \sum_{jk} \epsilon_{ijk}\, \frac{\partial}{\partial x_j} A_k. \tag{B.7}
\]

We see that the last expression in (B.6) is
\[
\sum_k \Big( \sum_{ij} \epsilon_{kij}\, \frac{\partial A_j}{\partial x_i} \Big) B_k - \sum_j A_j \sum_{ik} \epsilon_{jik}\, \frac{\partial B_k}{\partial x_i} = (\vec\nabla \times \vec A) \cdot \vec B - \vec A \cdot (\vec\nabla \times \vec B), \tag{B.8}
\]
where the sign which changed did so due to the transpositions in the
indices on the $\epsilon$, which we have done in order to put things in the form
of the definition of the curl. Thus
\[
\vec\nabla \cdot (\vec A \times \vec B) = (\vec\nabla \times \vec A) \cdot \vec B - \vec A \cdot (\vec\nabla \times \vec B). \tag{B.9}
\]
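The identity (B.9) holds for any smooth fields; here is a sympy check with arbitrarily chosen test fields:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
r = (x, y, z)
A = sp.Matrix([y * z, x**2, sp.sin(x * y)])   # arbitrary test fields
B = sp.Matrix([z**2, x * y, y + z])

def div(V):
    return sum(sp.diff(V[i], r[i]) for i in range(3))

def curl(V):
    return sp.Matrix([sp.diff(V[2], y) - sp.diff(V[1], z),
                      sp.diff(V[0], z) - sp.diff(V[2], x),
                      sp.diff(V[1], x) - sp.diff(V[0], y)])

lhs = div(A.cross(B))
rhs = curl(A).dot(B) - A.dot(curl(B))
print(sp.simplify(lhs - rhs))  # 0
```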

   Vector algebra identities apply to the curl as to any ordinary vector,
except that one must be careful not to change, by reordering, what the
differential operators act on. In particular, Eq. (A.7) is
\[
\vec A \times (\vec\nabla \times \vec B) = \sum_i A_i\, \vec\nabla B_i - \sum_i A_i\, \frac{\partial \vec B}{\partial x_i}. \tag{B.10}
\]
Appendix C

Gradient in Spherical
Coordinates

The transformation between Cartesian and spherical coordinates is
given by
\[
\begin{aligned}
r &= (x^2 + y^2 + z^2)^{1/2}, & x &= r\sin\theta\cos\phi, \\
\theta &= \cos^{-1}(z/r), & y &= r\sin\theta\sin\phi, \\
\phi &= \tan^{-1}(y/x), & z &= r\cos\theta.
\end{aligned}
\]
    The basis vectors $\{\hat e_r, \hat e_\theta, \hat e_\phi\}$ at the point $(r, \theta, \phi)$ are given in terms
of the cartesian basis vectors by
\[
\begin{aligned}
\hat e_r &= \sin\theta\cos\phi\, \hat e_x + \sin\theta\sin\phi\, \hat e_y + \cos\theta\, \hat e_z, \\
\hat e_\theta &= \cos\theta\cos\phi\, \hat e_x + \cos\theta\sin\phi\, \hat e_y - \sin\theta\, \hat e_z, \\
\hat e_\phi &= -\sin\phi\, \hat e_x + \cos\phi\, \hat e_y.
\end{aligned}
\]

By the chain rule, if we have two sets of coordinates, say $s_i$ and $c_i$,
and we know the form of a function $f(s_i)$ and the dependence of $s_i$ on
$c_j$, we can find
$\left.\frac{\partial f}{\partial c_i}\right|_c = \sum_j \left.\frac{\partial f}{\partial s_j}\right|_s \left.\frac{\partial s_j}{\partial c_i}\right|_c$,
where $|_s$ means hold the other $s$'s
fixed while varying $s_j$. In our case, the $s_j$ are the spherical coordinates
$r, \theta, \phi$, while the $c_i$ are $x, y, z$.
     Thus
\[
\begin{aligned}
\vec\nabla f = {} & \left( \left.\frac{\partial f}{\partial r}\right|_{\theta\phi} \left.\frac{\partial r}{\partial x}\right|_{yz} + \left.\frac{\partial f}{\partial \theta}\right|_{r\phi} \left.\frac{\partial \theta}{\partial x}\right|_{yz} + \left.\frac{\partial f}{\partial \phi}\right|_{r\theta} \left.\frac{\partial \phi}{\partial x}\right|_{yz} \right) \hat e_x \\
& + \left( \left.\frac{\partial f}{\partial r}\right|_{\theta\phi} \left.\frac{\partial r}{\partial y}\right|_{xz} + \left.\frac{\partial f}{\partial \theta}\right|_{r\phi} \left.\frac{\partial \theta}{\partial y}\right|_{xz} + \left.\frac{\partial f}{\partial \phi}\right|_{r\theta} \left.\frac{\partial \phi}{\partial y}\right|_{xz} \right) \hat e_y \\
& + \left( \left.\frac{\partial f}{\partial r}\right|_{\theta\phi} \left.\frac{\partial r}{\partial z}\right|_{xy} + \left.\frac{\partial f}{\partial \theta}\right|_{r\phi} \left.\frac{\partial \theta}{\partial z}\right|_{xy} + \left.\frac{\partial f}{\partial \phi}\right|_{r\theta} \left.\frac{\partial \phi}{\partial z}\right|_{xy} \right) \hat e_z
\end{aligned} \tag{C.1}
\]

We will need all the partial derivatives $\frac{\partial s_j}{\partial c_i}$. From $r^2 = x^2 + y^2 + z^2$ we
see that
\[
\left.\frac{\partial r}{\partial x}\right|_{yz} = \frac{x}{r}, \qquad \left.\frac{\partial r}{\partial y}\right|_{xz} = \frac{y}{r}, \qquad \left.\frac{\partial r}{\partial z}\right|_{xy} = \frac{z}{r}.
\]
From $\cos\theta = z/r = z/\sqrt{x^2 + y^2 + z^2}$,
\[
-\sin\theta \left.\frac{\partial \theta}{\partial x}\right|_{yz} = \frac{-zx}{(x^2 + y^2 + z^2)^{3/2}} = \frac{-r^2 \cos\theta\, \sin\theta\cos\phi}{r^3},
\]
so
\[
\left.\frac{\partial \theta}{\partial x}\right|_{yz} = \frac{\cos\theta\cos\phi}{r}.
\]
Similarly,
\[
\left.\frac{\partial \theta}{\partial y}\right|_{xz} = \frac{\cos\theta\sin\phi}{r}.
\]
There is an extra term when differentiating w.r.t. $z$, from the numera-
tor, so
\[
-\sin\theta \left.\frac{\partial \theta}{\partial z}\right|_{xy} = \frac{1}{r} - \frac{z^2}{r^3} = \frac{1 - \cos^2\theta}{r} = r^{-1}\sin^2\theta,
\]
so
\[
\left.\frac{\partial \theta}{\partial z}\right|_{xy} = -r^{-1}\sin\theta.
\]

Finally, the derivatives of $\phi$ can easily be found from differentiating
$\tan\phi = y/x$. Using differentials,
\[
\sec^2\phi\, d\phi = \frac{dy}{x} - \frac{y\, dx}{x^2} = \frac{dy}{r\sin\theta\cos\phi} - \frac{dx\, \sin\theta\sin\phi}{r\sin^2\theta\cos^2\phi},
\]

so
        ∂φ             1 sin φ            ∂φ              1 cos φ           ∂φ
                  =−                                  =                               = 0.
        ∂x   yz
                       r sin θ            ∂y     xz
                                                          r sin θ           ∂z   xy
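The nine partial derivatives collected above can be spot-checked numerically. The sketch below (not part of the original text; the sample point and all function names are my own choices) compares each analytic form against a central finite difference of the coordinate transformation.

```python
import math

def spherical(x, y, z):
    """Spherical coordinates (r, theta, phi) of a Cartesian point."""
    r = math.sqrt(x * x + y * y + z * z)
    return r, math.acos(z / r), math.atan2(y, x)

def num_partial(f, p, i, h=1e-6):
    """Central finite-difference approximation to df/dx_i at p."""
    q = list(p); q[i] += h
    s = list(p); s[i] -= h
    return (f(*q) - f(*s)) / (2 * h)

p = (0.3, 0.4, 0.5)            # arbitrary test point
r, th, ph = spherical(*p)
x, y, z = p

# analytic forms derived in the text, keyed by (coordinate, Cartesian index)
analytic = {
    ('r', 0): x / r,  ('r', 1): y / r,  ('r', 2): z / r,
    ('theta', 0): math.cos(th) * math.cos(ph) / r,
    ('theta', 1): math.cos(th) * math.sin(ph) / r,
    ('theta', 2): -math.sin(th) / r,
    ('phi', 0): -math.sin(ph) / (r * math.sin(th)),
    ('phi', 1):  math.cos(ph) / (r * math.sin(th)),
    ('phi', 2): 0.0,
}
idx = {'r': 0, 'theta': 1, 'phi': 2}
for (name, i), val in analytic.items():
    approx = num_partial(lambda *q: spherical(*q)[idx[name]], p, i)
    assert abs(approx - val) < 1e-6, (name, i, approx, val)
```

All nine comparisons agree to well within the finite-difference tolerance, confirming the table of derivatives.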

   Now we are ready to plug this all into (C.1). Grouping together the
terms involving each of the three partial derivatives, we find
$$\begin{aligned}
\nabla f&=\left.\frac{\partial f}{\partial r}\right|_{\theta\phi}
\left(\frac{x}{r}\hat e_x+\frac{y}{r}\hat e_y+\frac{z}{r}\hat e_z\right)\\
&\quad+\left.\frac{\partial f}{\partial\theta}\right|_{r\phi}
\left(\frac{\cos\theta\cos\phi}{r}\hat e_x+\frac{\cos\theta\sin\phi}{r}\hat e_y
-\frac{\sin\theta}{r}\hat e_z\right)\\
&\quad+\left.\frac{\partial f}{\partial\phi}\right|_{r\theta}
\left(-\frac{1}{r}\frac{\sin\phi}{\sin\theta}\hat e_x
+\frac{1}{r}\frac{\cos\phi}{\sin\theta}\hat e_y\right)\\
&=\left.\frac{\partial f}{\partial r}\right|_{\theta\phi}\hat e_r
+\frac{1}{r}\left.\frac{\partial f}{\partial\theta}\right|_{r\phi}\hat e_\theta
+\frac{1}{r\sin\theta}\left.\frac{\partial f}{\partial\phi}\right|_{r\theta}\hat e_\phi.
\end{aligned}$$
Thus we have derived the form for the gradient in spherical coordinates.
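As a final sanity check, one can compare this spherical-coordinate gradient against a numerically computed Cartesian gradient projected onto the spherical unit vectors. In the sketch below, the sample field $f = xy + z^3$, the test point, and all names are my own illustrative choices, not from the text.

```python
import math

def f_cart(x, y, z):
    # sample scalar field (arbitrary choice)
    return x * y + z**3

def f_sph(r, th, ph):
    # the same field re-expressed in spherical coordinates
    x = r * math.sin(th) * math.cos(ph)
    y = r * math.sin(th) * math.sin(ph)
    z = r * math.cos(th)
    return x * y + z**3

def partial(f, args, i, h=1e-6):
    a = list(args); a[i] += h
    b = list(args); b[i] -= h
    return (f(*a) - f(*b)) / (2 * h)

# a test point and its spherical coordinates
x, y, z = 0.3, 0.4, 0.5
r = math.sqrt(x * x + y * y + z * z)
th = math.acos(z / r)
ph = math.atan2(y, x)

# gradient components along (e_r, e_theta, e_phi) from the formula just derived
g_r  = partial(f_sph, (r, th, ph), 0)
g_th = partial(f_sph, (r, th, ph), 1) / r
g_ph = partial(f_sph, (r, th, ph), 2) / (r * math.sin(th))

# Cartesian gradient, projected onto the spherical unit vectors
gx = partial(f_cart, (x, y, z), 0)
gy = partial(f_cart, (x, y, z), 1)
gz = partial(f_cart, (x, y, z), 2)
e_r  = (math.sin(th)*math.cos(ph), math.sin(th)*math.sin(ph), math.cos(th))
e_th = (math.cos(th)*math.cos(ph), math.cos(th)*math.sin(ph), -math.sin(th))
e_ph = (-math.sin(ph), math.cos(ph), 0.0)
dot = lambda u, v: sum(a * b for a, b in zip(u, v))

assert abs(g_r  - dot((gx, gy, gz), e_r))  < 1e-6
assert abs(g_th - dot((gx, gy, gz), e_th)) < 1e-6
assert abs(g_ph - dot((gx, gy, gz), e_ph)) < 1e-6
```

The three components agree, as they must, since $\hat e_r$, $\hat e_\theta$, $\hat e_\phi$ form an orthonormal basis.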
Bibliography

 [1] Howard Anton. Elementary Linear Algebra. John Wiley, New
     York, 1973. QA251.A57 ISBN 0-471-03247-6.

 [2] V. I. Arnol'd. Mathematical Methods of Classical Mechanics. Springer-
     Verlag, New York, 1984. QA805.A6813.

 [3] R. Creighton Buck. Advanced Calculus. McGraw-Hill, 1956.

 [4] Herbert Goldstein. Classical Mechanics. Addison-Wesley, Reading,
     Massachusetts, second edition, 1980. QA805.G6.

 [5] I. S. Gradshtein and I. M. Ryzhik. Table of integrals, series, and
     products. Academic Press, New York, 1965. QA55.R943.

 [6] L. D. Landau and E. M. Lifshitz. Mechanics. Pergamon Press, Oxford,
     2nd edition, 1969. QA805.L283/1976.

 [7] Jerry B. Marion and Stephen T. Thornton. Classical Dynam-
     ics. Harcourt Brace Jovanovich, San Diego, 3rd edition, 1988.
     QA845.M38/1988.

 [8] R. A. Matzner and L. C. Shepley. Classical Mechanics. Prentice Hall,
     Englewood Cliffs, NJ, 1991. QC125.2.M37 ISBN 0-13-137076-6.

 [9] Morris Edgar Rose. Elementary Theory of Angular Momentum.
     Wiley, New York, 1957. QC174.1.R7.

[10] Walter Rudin. Principles of Mathematical Analysis. McGraw-Hill,
     New York, 1953.


[11] M. Spivak. Differential Geometry, volume 1. Publish or Perish,
     Inc., 1970.

[12] Keith R. Symon. Mechanics. Addison-Wesley, Reading, Mas-
     sachusetts, 3rd edition, 1971. QC125.S98/1971 ISBN 0-201-07392-7.

[13] Eugene Wigner. Group Theory and Its Applications to Quantum
     Mechanics of Atomic Spectra. Academic Press, New York, 1959.
Index

O(N), 91
1-forms, 148

acoustic modes, 143
action, 47
action-angle, 186
active, 88
adiabatic invariant, 210
angular momentum, 9
antisymmetric, 95
apogee, 72
apsidal angle, 76
associative, 91
attractor, 29
autonomous, 24
bac-cab, 77, 98, 232
body cone, 109
body coordinates, 86
Born-Oppenheimer, 131
canonical transformation, 153
canonical variables, 153
center of mass, 10
centrifugal barrier, 68
Chandler wobble, 113
closed, 165
closed under, 91
complex structure on phase space, 151
composition, 89
conditionally periodic motion, 192
configuration space, 6, 46
conformal, 124
conservative force, 8
conserved, 6
conserved quantity, 6
continuum limit, 139
cotangent bundle, 21
D'Alembert's Principle, 42
diffeomorphism, 176
differential cross section, 82
differential k-form, 160
Dirac delta function, 144
dynamical balancing, 106
dynamical systems, 23
eccentricity, 72
electrostatic potential, 57
elliptic fixed point, 32
enthalpy, 150
Euler's equations, 108
Euler's Theorem, 92
exact, 149, 164
extended phase space, 7, 171
exterior derivative, 163
exterior product, 161
fixed points, 27
form invariant, 178
free energy, 150
functional, 47
gauge invariance, 58
gauge transformation, 58
generalized force, 18
generalized momentum, 50
generating function of the canonical transformation, 173
generator, 95
Gibbs free energy, 150
glory scattering, 83
group, 91
group multiplication, 91
Hamilton's characteristic function, 181
Hamilton's equations of motion, 54
Hamilton's principal function, 181
Hamilton-Jacobi, 181
Hamiltonian, 53, 149
Hamiltonian density, 223
hermitean conjugate, 119
herpolhode, 111
holonomic constraints, 14
hyperbolic fixed point, 29
ignorable coordinate, 50
impact parameter, 81
independent frequencies, 192
inertia ellipsoid, 110
inertia tensor, 99
inertial coordinates, 86
integrable system, 189
intrinsic spin, 180
invariant plane, 111
invariant sets of states, 27
inverse, 87
involution, 189
Jacobi identity, 156
kinetic energy, 7
Kronecker delta, 87
lab angle, 109
Lagrangian, 38
Lagrangian density, 143, 219
Laplace-Runge-Lenz vector, 77
Legendre transformation, 149
Levi-Civita, 96
line of nodes, 115
Liouville's theorem, 159
logistic equation, 23
magnetic vector potential, 57
major axis, 72
mass matrix, 55
mean motion Hamiltonian, 213
minor axis, 72
moment of inertia, 101
momentum, 6
natural symplectic structure, 169
Noether's theorem, 225
non-degenerate, 170
nondegenerate system, 192
normal modes, 129
nutation, 118
oblate, 113
optical modes, 143
orbit, 6
orbital angular momentum, 180
order of the dynamical system, 23
orthogonal, 87
parallel axis theorem, 101
passive, 88
perigee, 72
period, 25
periodic, 25
perpendicular axis theorem, 104
phase curve, 23, 27
phase point, 22, 27
phase space, 7, 21
phase trajectory, 173
Poincaré's Lemma, 165
point transformation, 40, 153
Poisson bracket, 156
Poisson's theorem, 159
polhode, 110
potential energy, 8
precessing, 118
precession of the perihelion, 73
principal axes, 105
rainbow scattering, 83
reduced mass, 66
relation among the frequencies, 192
rotation, 89
rotation about an axis, 89
scattering angle, 80
semi-major axis, 73
separatrix, 32
sign of the permutation, 161
similar, 120
similarity transformation, 120
stable, 28, 31
Stokes' Theorem, 168
stream derivative, 39
stress-energy, 222
strongly stable, 29
structurally stable, 28
subgroup, 92
summation convention, 157
symplectic, 154
symplectic structure, 24
terminating motion, 28
torque, 9
total external force, 10
total mass, 10
total momentum, 10
trajectory, 6
transpose, 87, 119
turning point, 70
turning points, 71
unimodular, 92
unperturbed system, 194
unstable, 29
velocity function, 22
vibrations, 131
virtual displacement, 42
wedge product, 161
work, 7

More Related Content

PPTX
Classical Mechanics-MSc
PPTX
Classical mechanics introduction
PPTX
Raman effect
PPT
Introduction to Solid State Physics.ppt
PPT
Time Independent Perturbation Theory, 1st order correction, 2nd order correction
PPTX
Solid state physics lec 1
PDF
MCQ on Planck constant.pdf
PPT
Classical Statistics and Quantum Statistics
Classical Mechanics-MSc
Classical mechanics introduction
Raman effect
Introduction to Solid State Physics.ppt
Time Independent Perturbation Theory, 1st order correction, 2nd order correction
Solid state physics lec 1
MCQ on Planck constant.pdf
Classical Statistics and Quantum Statistics

What's hot (20)

PPTX
Crystal Structure
PPT
Statistical mechanics
PPT
5 introduction to quantum mechanics
PPT
ODP
LORENTZ TRANSFORMATION
PPTX
Introduction to quantum mechanics and schrodinger equation
PPT
PDF
Lagrangian mechanics
PDF
Interference of Light, Diffraction of Light
PPT
Center of mass
PPTX
Tight binding
PPTX
Lecture7
PPTX
Diffraction-Fraunhofer Diffraction
PDF
UCSD NANO106 - 06 - Plane and Space Groups
PPTX
Engineering Physics - Crystal structure - Dr. Victor Vedanayakam.S
PPT
Quantum mechanics
PPTX
Zeeman Effect
PPTX
Packing Factor
PDF
Particle Properties of Waves
PPTX
Pertemuan 7 vibrational properties-lattice
Crystal Structure
Statistical mechanics
5 introduction to quantum mechanics
LORENTZ TRANSFORMATION
Introduction to quantum mechanics and schrodinger equation
Lagrangian mechanics
Interference of Light, Diffraction of Light
Center of mass
Tight binding
Lecture7
Diffraction-Fraunhofer Diffraction
UCSD NANO106 - 06 - Plane and Space Groups
Engineering Physics - Crystal structure - Dr. Victor Vedanayakam.S
Quantum mechanics
Zeeman Effect
Packing Factor
Particle Properties of Waves
Pertemuan 7 vibrational properties-lattice
Ad

Viewers also liked (20)

PPTX
Quick run through on classical mechancis and quantum mechanics
PPTX
Branches of Physics
PDF
Lagrange
PDF
History of Physics (Classical Mechanics)
PPTX
Simple harmonic oscillator - Classical Mechanics
PPTX
Solution Manuals of Physics Textbooks
PDF
Classical mechanics
PDF
Beams
PDF
Solution 3 3
PDF
Classical inference in/for physics
PPTX
Crazy for Study - Access Online Academic Help
PDF
Goldstein solution chapter 8 (2, 20,26,35)
PDF
Pathria statistical mechanics
PDF
Goldstein Solution chapter 6
PDF
Euler lagrange equation
PPTX
PhysicS FORCE AND LAWS OF MOTION PPT FOR SCHOOL PROJECT
PDF
Classical Physics
PPTX
De Alembert’s Principle and Generalized Force, a technical discourse on Class...
PDF
Goldstein solution 4.21
PPTX
Ludwig Boltzmann-moartea termică a universului
Quick run through on classical mechancis and quantum mechanics
Branches of Physics
Lagrange
History of Physics (Classical Mechanics)
Simple harmonic oscillator - Classical Mechanics
Solution Manuals of Physics Textbooks
Classical mechanics
Beams
Solution 3 3
Classical inference in/for physics
Crazy for Study - Access Online Academic Help
Goldstein solution chapter 8 (2, 20,26,35)
Pathria statistical mechanics
Goldstein Solution chapter 6
Euler lagrange equation
PhysicS FORCE AND LAWS OF MOTION PPT FOR SCHOOL PROJECT
Classical Physics
De Alembert’s Principle and Generalized Force, a technical discourse on Class...
Goldstein solution 4.21
Ludwig Boltzmann-moartea termică a universului
Ad

Similar to Classical mechanics (20)

PDF
Quantum Mechanics: Lecture notes
PDF
AN INTRODUCTION TO LAGRANGIAN MECHANICS
PDF
General physics
PDF
Richard fitzpatrick
PDF
genral physis
PDF
SGC 2015 - Mathematical Sciences Extension Studies
PDF
I do like cfd vol 1 2ed_v2p2
PDF
Fundamentals of computational_fluid_dynamics_-_h._lomax__t._pulliam__d._zingg
PDF
taylor-2005-classical-mechanics.pdf
PDF
Firk essential physics [yale 2000] 4 ah
PDF
On Nonlinear Vibrations of Systems with Many Degrees of Freedom rosenberg 196...
PDF
Essentialphysics1
PDF
Differential Equations for Engineers
PDF
Lagrange_Lagrange_Lagrange_Lagrange Lagrange Lagrange
PDF
jmaruski_1
PDF
Structure And Interpretation Of Classical Mechanics Gerald Jay Sussman
PDF
Clarkson r., mc keon d.g.c. quantum field theory (u.waterloo
PDF
Lecture notes on planetary sciences and orbit determination
PDF
Free high-school-science-texts-physics
Quantum Mechanics: Lecture notes
AN INTRODUCTION TO LAGRANGIAN MECHANICS
General physics
Richard fitzpatrick
genral physis
SGC 2015 - Mathematical Sciences Extension Studies
I do like cfd vol 1 2ed_v2p2
Fundamentals of computational_fluid_dynamics_-_h._lomax__t._pulliam__d._zingg
taylor-2005-classical-mechanics.pdf
Firk essential physics [yale 2000] 4 ah
On Nonlinear Vibrations of Systems with Many Degrees of Freedom rosenberg 196...
Essentialphysics1
Differential Equations for Engineers
Lagrange_Lagrange_Lagrange_Lagrange Lagrange Lagrange
jmaruski_1
Structure And Interpretation Of Classical Mechanics Gerald Jay Sussman
Clarkson r., mc keon d.g.c. quantum field theory (u.waterloo
Lecture notes on planetary sciences and orbit determination
Free high-school-science-texts-physics

Recently uploaded (20)

PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
NewMind AI Weekly Chronicles - August'25-Week II
PDF
Network Security Unit 5.pdf for BCA BBA.
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Machine learning based COVID-19 study performance prediction
PDF
August Patch Tuesday
PPTX
1. Introduction to Computer Programming.pptx
PDF
Heart disease approach using modified random forest and particle swarm optimi...
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PPTX
TechTalks-8-2019-Service-Management-ITIL-Refresh-ITIL-4-Framework-Supports-Ou...
PDF
Unlocking AI with Model Context Protocol (MCP)
PPTX
Tartificialntelligence_presentation.pptx
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PPTX
A Presentation on Artificial Intelligence
PPTX
Machine Learning_overview_presentation.pptx
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Mushroom cultivation and it's methods.pdf
PDF
Empathic Computing: Creating Shared Understanding
Mobile App Security Testing_ A Comprehensive Guide.pdf
NewMind AI Weekly Chronicles - August'25-Week II
Network Security Unit 5.pdf for BCA BBA.
Reach Out and Touch Someone: Haptics and Empathic Computing
Machine learning based COVID-19 study performance prediction
August Patch Tuesday
1. Introduction to Computer Programming.pptx
Heart disease approach using modified random forest and particle swarm optimi...
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
TechTalks-8-2019-Service-Management-ITIL-Refresh-ITIL-4-Framework-Supports-Ou...
Unlocking AI with Model Context Protocol (MCP)
Tartificialntelligence_presentation.pptx
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Diabetes mellitus diagnosis method based random forest with bat algorithm
A Presentation on Artificial Intelligence
Machine Learning_overview_presentation.pptx
Spectral efficient network and resource selection model in 5G networks
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Mushroom cultivation and it's methods.pdf
Empathic Computing: Creating Shared Understanding

Classical mechanics

  • 1. Classical Mechanics Joel A. Shapiro April 21, 2003
  • 2. i Copyright C 1994, 1997 by Joel A. Shapiro All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the author. This is a preliminary version of the book, not to be considered a fully published edition. While some of the material, particularly the first four chapters, is close to readiness for a first edition, chapters 6 and 7 need more work, and chapter 8 is incomplete. The appendices are random selections not yet reorganized. There are also as yet few exercises for the later chapters. The first edition will have an adequate set of exercises for each chapter. The author welcomes corrections, comments, and criticism.
  • 3. ii
  • 4. Contents 1 Particle Kinematics 1 1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Single Particle Kinematics . . . . . . . . . . . . . . . . . 4 1.2.1 Motion in configuration space . . . . . . . . . . . 4 1.2.2 Conserved Quantities . . . . . . . . . . . . . . . . 6 1.3 Systems of Particles . . . . . . . . . . . . . . . . . . . . . 9 1.3.1 External and internal forces . . . . . . . . . . . . 10 1.3.2 Constraints . . . . . . . . . . . . . . . . . . . . . 14 1.3.3 Generalized Coordinates for Unconstrained Sys- tems . . . . . . . . . . . . . . . . . . . . . . . . . 17 1.3.4 Kinetic energy in generalized coordinates . . . . . 19 1.4 Phase Space . . . . . . . . . . . . . . . . . . . . . . . . . 21 1.4.1 Dynamical Systems . . . . . . . . . . . . . . . . . 22 1.4.2 Phase Space Flows . . . . . . . . . . . . . . . . . 27 2 Lagrange’s and Hamilton’s Equations 37 2.1 Lagrangian Mechanics . . . . . . . . . . . . . . . . . . . 37 2.1.1 Derivation for unconstrained systems . . . . . . . 38 2.1.2 Lagrangian for Constrained Systems . . . . . . . 41 2.1.3 Hamilton’s Principle . . . . . . . . . . . . . . . . 46 2.1.4 Examples of functional variation . . . . . . . . . . 48 2.1.5 Conserved Quantities . . . . . . . . . . . . . . . . 50 2.1.6 Hamilton’s Equations . . . . . . . . . . . . . . . . 53 2.1.7 Velocity-dependent forces . . . . . . . . . . . . . 55 3 Two Body Central Forces 65 3.1 Reduction to a one dimensional problem . . . . . . . . . 65 iii
  • 5. iv CONTENTS 3.1.1 Reduction to a one-body problem . . . . . . . . . 66 3.1.2 Reduction to one dimension . . . . . . . . . . . . 67 3.2 Integrating the motion . . . . . . . . . . . . . . . . . . . 69 3.2.1 The Kepler problem . . . . . . . . . . . . . . . . 70 3.2.2 Nearly Circular Orbits . . . . . . . . . . . . . . . 74 3.3 The Laplace-Runge-Lenz Vector . . . . . . . . . . . . . . 77 3.4 The virial theorem . . . . . . . . . . . . . . . . . . . . . 78 3.5 Rutherford Scattering . . . . . . . . . . . . . . . . . . . . 79 4 Rigid Body Motion 85 4.1 Configuration space for a rigid body . . . . . . . . . . . . 85 4.1.1 Orthogonal Transformations . . . . . . . . . . . . 87 4.1.2 Groups . . . . . . . . . . . . . . . . . . . . . . . . 91 4.2 Kinematics in a rotating coordinate system . . . . . . . . 94 4.3 The moment of inertia tensor . . . . . . . . . . . . . . . 98 4.3.1 Motion about a fixed point . . . . . . . . . . . . . 98 4.3.2 More General Motion . . . . . . . . . . . . . . . . 100 4.4 Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . 107 4.4.1 Euler’s Equations . . . . . . . . . . . . . . . . . . 107 4.4.2 Euler angles . . . . . . . . . . . . . . . . . . . . . 113 4.4.3 The symmetric top . . . . . . . . . . . . . . . . . 117 5 Small Oscillations 127 5.1 Small oscillations about stable equilibrium . . . . . . . . 127 5.1.1 Molecular Vibrations . . . . . . . . . . . . . . . . 130 5.1.2 An Alternative Approach . . . . . . . . . . . . . . 137 5.2 Other interactions . . . . . . . . . . . . . . . . . . . . . . 137 5.3 String dynamics . . . . . . . . . . . . . . . . . . . . . . . 138 5.4 Field theory . . . . . . . . . . . . . . . . . . . . . . . . . 143 6 Hamilton’s Equations 147 6.1 Legendre transforms . . . . . . . . . . . . . . . . . . . . 147 6.2 Variations on phase curves . . . . . . . . . . . . . . . . . 152 6.3 Canonical transformations . . . . . . . . . . . . . . . . . 153 6.4 Poisson Brackets . . . . . . . . . . . . . . . . . . . . . . 
155 6.5 Higher Differential Forms . . . . . . . . . . . . . . . . . . 160 6.6 The natural symplectic 2-form . . . . . . . . . . . . . . . 169
  • 6. CONTENTS v 6.6.1 Generating Functions . . . . . . . . . . . . . . . . 172 6.7 Hamilton–Jacobi Theory . . . . . . . . . . . . . . . . . . 181 6.8 Action-Angle Variables . . . . . . . . . . . . . . . . . . . 185 7 Perturbation Theory 189 7.1 Integrable systems . . . . . . . . . . . . . . . . . . . . . 189 7.2 Canonical Perturbation Theory . . . . . . . . . . . . . . 194 7.2.1 Time Dependent Perturbation Theory . . . . . . 196 7.3 Adiabatic Invariants . . . . . . . . . . . . . . . . . . . . 198 7.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . 198 7.3.2 For a time-independent Hamiltonian . . . . . . . 198 7.3.3 Slow time variation in H(q, p, t) . . . . . . . . . . 200 7.3.4 Systems with Many Degrees of Freedom . . . . . 206 7.3.5 Formal Perturbative Treatment . . . . . . . . . . 209 7.4 Rapidly Varying Perturbations . . . . . . . . . . . . . . . 211 7.5 New approach . . . . . . . . . . . . . . . . . . . . . . . . 216 8 Field Theory 219 8.1 Noether’s Theorem . . . . . . . . . . . . . . . . . . . . . 225 A ijk and cross products 229 A.1 Vector Operations . . . . . . . . . . . . . . . . . . . . . . 229 A.1.1 δij and ijk . . . . . . . . . . . . . . . . . . . . . . 229 B The gradient operator 233 C Gradient in Spherical Coordinates 237
  • 7. vi CONTENTS
  • 8. Chapter 1 Particle Kinematics 1.1 Introduction Classical mechanics, narrowly defined, is the investigation of the motion of systems of particles in Euclidean three-dimensional space, under the influence of specified force laws, with the motion’s evolution determined by Newton’s second law, a second order differential equation. That is, given certain laws determining physical forces, and some boundary conditions on the positions of the particles at some particular times, the problem is to determine the positions of all the particles at all times. We will be discussing motions under specific fundamental laws of great physical importance, such as Coulomb’s law for the electrostatic force between charged particles. We will also discuss laws which are less fundamental, because the motion under them can be solved explicitly, allowing them to serve as very useful models for approximations to more complicated physical situations, or as a testbed for examining concepts in an explicitly evaluatable situation. Techniques suitable for broad classes of force laws will also be developed. The formalism of Newtonian classical mechanics, together with in- vestigations into the appropriate force laws, provided the basic frame- work for physics from the time of Newton until the beginning of this century. The systems considered had a wide range of complexity. One might consider a single particle on which the Earth’s gravity acts. But one could also consider systems as the limit of an infinite number of 1
  • 9. 2 CHAPTER 1. PARTICLE KINEMATICS very small particles, with displacements smoothly varying in space, which gives rise to the continuum limit. One example of this is the consideration of transverse waves on a stretched string, in which every point on the string has an associated degree of freedom, its transverse displacement. The scope of classical mechanics was broadened in the 19th century, in order to consider electromagnetism. Here the degrees of freedom were not just the positions in space of charged particles, but also other quantities, distributed throughout space, such as the the electric field at each point. This expansion in the type of degrees of freedom has continued, and now in fundamental physics one considers many degrees of freedom which correspond to no spatial motion, but one can still discuss the classical mechanics of such systems. As a fundamental framework for physics, classical mechanics gave way on several fronts to more sophisticated concepts in the early 1900’s. Most dramatically, quantum mechanics has changed our focus from spe- cific solutions for the dynamical degrees of freedom as a function of time to the wave function, which determines the probabilities that a system have particular values of these degrees of freedom. Special relativity not only produced a variation of the Galilean invariance implicit in Newton’s laws, but also is, at a fundamental level, at odds with the basic ingredient of classical mechanics — that one particle can exert a force on another, depending only on their simultaneous but different positions. Finally general relativity brought out the narrowness of the assumption that the coordinates of a particle are in a Euclidean space, indicating instead not only that on the largest scales these coordinates describe a curved manifold rather than a flat space, but also that this geometry is itself a dynamical field. 
Indeed, most of 20th century physics goes beyond classical Newto- nian mechanics in one way or another. As many readers of this book expect to become physicists working at the cutting edge of physics re- search, and therefore will need to go beyond classical mechanics, we begin with a few words of justification for investing effort in under- standing classical mechanics. First of all, classical mechanics is still very useful in itself, and not just for engineers. Consider the problems (scientific — not political) that NASA faces if it wants to land a rocket on a planet. This requires
  • 10. 1.1. INTRODUCTION 3 an accuracy of predicting the position of both planet and rocket far beyond what one gets assuming Kepler’s laws, which is the motion one predicts by treating the planet as a point particle influenced only by the Newtonian gravitational field of the Sun, also treated as a point particle. NASA must consider other effects, and either demonstrate that they are ignorable or include them into the calculations. These include • multipole moments of the sun • forces due to other planets • effects of corrections to Newtonian gravity due to general relativ- ity • friction due to the solar wind and gas in the solar system Learning how to estimate or incorporate such effects is not trivial. Secondly, classical mechanics is not a dead field of research — in fact, in the last two decades there has been a great deal of interest in “dynamical systems”. Attention has shifted from calculation of the or- bit over fixed intervals of time to questions of the long-term stability of the motion. New ways of looking at dynamical behavior have emerged, such as chaos and fractal systems. Thirdly, the fundamental concepts of classical mechanics provide the conceptual framework of quantum mechanics. For example, although the Hamiltonian and Lagrangian were developed as sophisticated tech- niques for performing classical mechanics calculations, they provide the basic dynamical objects of quantum mechanics and quantum field the- ory respectively. One view of classical mechanics is as a steepest path approximation to the path integral which describes quantum mechan- ics. This integral over paths is of a classical quantity depending on the “action” of the motion. So classical mechanics is worth learning well, and we might as well jump right in.
  • 11. 4 CHAPTER 1. PARTICLE KINEMATICS 1.2 Single Particle Kinematics We start with the simplest kind of system, a single unconstrained par- ticle, free to move in three dimensional space, under the influence of a force F . 1.2.1 Motion in configuration space The motion of the particle is described by a function which gives its position as a function of time. These positions are points in Euclidean space. Euclidean space is similar to a vector space, except that there is no special point which is fixed as the origin. It does have a met- ric, that is, a notion of distance between any two points, D(A, B). It also has the concept of a displacement A − B from one point B in the Euclidean space to another, A. These displacements do form a vector space, and for a three-dimensional Euclidean space, the vectors form a three-dimensional real vector space R3 , which can be given an or- thonormal basis such that the distance between A and B is given by D(A, B) = 3 [(A − B)i ]2 . Because the mathematics of vector spaces i=1 is so useful, we often convert our Euclidean space to a vector space by choosing a particular point as the origin. Each particle’s position is then equated to the displacement of that position from the origin, so that it is described by a position vector r relative to this origin. But the origin has no physical significance unless it has been choosen in some physically meaningful way. In general the multiplication of a position vector by a scalar is as meaningless physically as saying that 42nd street is three times 14th street. The cartesian components of the vector r, with respect to some fixed though arbitrary coordinate system, are called the coordinates, cartesian coordinates in this case. We shall find that we often (even usually) prefer to change to other sets of coordinates, such as polar or spherical coordinates, but for the time being we stick to cartesian coordinates. The motion of the particle is the function r(t) of time. 
Certainly one of the central questions of classical mechanics is to determine, given the physical properties of a system and some initial conditions, what the subsequent motion is. The required “physical properties” is a spec- ification of the force, F . The beginnings of modern classical mechanics
  • 12. 1.2. SINGLE PARTICLE KINEMATICS 5 was the realization at early in the 17th century that the physics, or dy- namics, enters into the motion (or kinematics) through the force and its effect on the acceleration, and not through any direct effect of dynamics on the position or velocity of the particle. Most likely the force will depend on the position of the particle, say for a particle in the gravitational field of a fixed (heavy) source at the origin, for which GMm F (r) = − r. (1.1) r3 But the force might also depend explicitly on time. For example, for the motion of a spaceship near the Earth, we might assume that the force is given by sum of the Newtonian gravitational forces of the Sun, Moon and Earth. Each of these forces depends on the positions of the corresponding heavenly body, which varies with time. The assumption here is that the motion of these bodies is independent of the position of the light spaceship. We assume someone else has already performed the nontrivial problem of finding the positions of these bodies as functions of time. Given that, we can write down the force the spaceship feels at time t if it happens to be at position r, r − RS (t) r − RE (t) F (r, t) = −GmMS − GmME |r − RS (t)|3 |r − RE (t)|3 r − RM (t) −GmMM . |r − RM (t)|3 Finally, the force might depend on the velocity of the particle, as for example for the Lorentz force on a charged particle in electric and magnetic fields F (r, v, t) = q E(r, t) + q v × B(r, t). (1.2) However the force is determined, it determines the motion of the particle through the second order differential equation known as New- ton’s Second Law d2 r F (r, v, t) = ma = m 2 . dt
As this is a second-order differential equation, the solution depends in general on two arbitrary (3-vector) parameters, which we might choose to be the initial position and velocity, r(0) and v(0).

For a given physical situation and a given set of initial conditions for the particle, Newton's laws determine the motion r(t), which is a curve in configuration space parameterized by time t, known as the trajectory in configuration space. If we consider the curve itself, independent of how it depends on time, this is called the orbit of the particle. For example, the orbit of a planet, in the approximation that it feels only the field of a fixed sun, is an ellipse. That word does not imply any information about the time dependence or parameterization of the curve.

1.2.2 Conserved Quantities

While we tend to think of Newtonian mechanics as centered on Newton's Second Law in the form F = ma, he actually started with the observation that in the absence of a force, there was uniform motion. We would now say that under these circumstances the momentum p(t) is conserved, dp/dt = 0.

In his second law, Newton stated the effect of a force as producing a rate of change of momentum, which we would write as F = dp/dt, rather than as producing an acceleration F = ma. In focusing on the concept of momentum, Newton emphasized one of the fundamental quantities of physics, useful beyond Newtonian mechanics, in both relativity and quantum mechanics.[1] Only after using the classical relation of momentum to velocity, p = mv, and the assumption that m is constant, do we find the familiar F = ma.

One of the principal tools in understanding the motion of many systems is isolating those quantities which do not change with time. A conserved quantity is a function of the positions and momenta, and perhaps explicitly of time as well, Q(r, p, t), which remains unchanged when evaluated along the actual motion, dQ(r(t), p(t), t)/dt = 0.
[1] The relationship of momentum to velocity is changed in these extensions, however.

A
function depending on the positions, momenta, and time is said to be a function on extended phase space.[2] When time is not included, the space is called phase space. In this language, a conserved quantity is a function on extended phase space with a vanishing total time derivative along any path which describes the motion of the system.

A single particle with no forces acting on it provides a very simple example. As Newton tells us, ṗ = dp/dt = F = 0, so the momentum is conserved. There are three more conserved quantities Q(r, p, t) := r(t) − t p(t)/m, which have a time rate of change dQ/dt = ṙ − p/m − t ṗ/m = 0. These six independent conserved quantities are as many as one could have for a system with a six-dimensional phase space, and they completely solve for the motion. Of course this was a very simple system to solve. We now consider a particle under the influence of a force.

Energy

Consider a particle under the influence of an external force F. In general, the momentum will not be conserved, although if any cartesian component of the force vanishes along the motion, that component of the momentum will be conserved. Also the kinetic energy, defined as T = ½mv², will not in general be conserved, because
\[ \frac{dT}{dt} = m v\cdot\dot v = F\cdot v. \]
As the particle moves from the point r_i to the point r_f the total change in the kinetic energy is the work done by the force F,
\[ \Delta T = \int_{r_i}^{r_f} F\cdot dr. \]
If the force law F(r, p, t) applicable to the particle is independent of time and velocity, then the work done will not depend on how quickly the particle moved along the path from r_i to r_f. If in addition the work done is independent of the path taken between these points, so it depends only on the endpoints, then the force is called a conservative

[2] Phase space is discussed further in section 1.4.
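The three extra conserved quantities Q = r − tp/m for the free particle can be checked directly: along the motion r(t) = r₀ + v₀t with p = mv₀, the combination stays equal to r₀ at all times. A one-component sketch (the numerical values and names are my own, chosen only for illustration):

```python
m = 2.0
r0, v0 = 3.0, 1.5           # assumed initial position and velocity (one component)

def Q(t):
    """Conserved combination Q = r - t p / m along the free-particle motion."""
    r = r0 + v0 * t          # uniform motion, no force
    p = m * v0               # constant momentum
    return r - t * p / m

vals = [Q(t) for t in (0.0, 1.0, 5.0)]
```

Every entry of `vals` equals r₀ = 3.0, confirming dQ/dt = ṙ − p/m − t ṗ/m = 0 for this motion.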
force, and we associate with it a potential energy
\[ U(r) = U(r_0) + \int_{r}^{r_0} F(r')\cdot dr', \]
where r_0 is some arbitrary reference position and U(r_0) is an arbitrarily chosen reference energy, which has no physical significance in ordinary mechanics. U(r) represents the potential the force has for doing work on the particle if the particle is at position r.

The condition for the path integral to be independent of the path is that it gives the same results along any two coterminous paths Γ₁ and Γ₂, or alternatively that it give zero when evaluated along any closed path such as Γ = Γ₁ − Γ₂, the path consisting of following Γ₁ and then taking Γ₂ backwards to the starting point. By Stokes' Theorem, this line integral is equivalent to an integral over any surface S bounded by Γ,
\[ \oint_{\Gamma} F\cdot dr = \int_S (\nabla\times F)\cdot dS. \]

[Figure: two paths Γ₁ and Γ₂ from r_i to r_f. Independence of path, Γ₁ = Γ₂, is equivalent to vanishing of the path integral over the closed path Γ, which is in turn equivalent to the vanishing of the curl on the surface whose boundary is Γ.]

Thus the requirement that the integral of F · dr vanish around any closed path is equivalent to the requirement that the curl of F vanish everywhere in space.

By considering an infinitesimal path from r to r + Δr, we see that U(r + Δr) − U(r) = −F · Δr, or
\[ F(r) = -\nabla U(r). \]
The value of the concept of potential energy is that it enables finding a conserved quantity, the total energy, in situations in which all forces are conservative. Then the total energy E = T + U changes at a rate
\[ \frac{dE}{dt} = \frac{dT}{dt} + \frac{dr}{dt}\cdot\nabla U = F\cdot v - v\cdot F = 0. \]
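The relations F = −∇U and dE/dt = 0 can be illustrated numerically. The sketch below is my own choice of example, not the text's: a one-dimensional harmonic potential U = ½x², with the force recovered from U by a central difference and the motion advanced by velocity Verlet. The total energy E = T + U should stay essentially constant.

```python
def U(x):
    return 0.5 * x * x              # harmonic potential, spring constant k = 1

def F(x, h=1e-6):
    # F = -dU/dx, approximated by a central difference
    return -(U(x + h) - U(x - h)) / (2 * h)

m, dt = 1.0, 1e-3
x, v = 1.0, 0.0
E0 = 0.5 * m * v * v + U(x)         # initial total energy = 0.5
for _ in range(10000):              # velocity-Verlet integration to t = 10
    v += 0.5 * dt * F(x) / m
    x += dt * v
    v += 0.5 * dt * F(x) / m
E1 = 0.5 * m * v * v + U(x)
```

The energy drift |E1 − E0| is tiny (set by dt² and floating-point noise in the difference quotient), illustrating that a force derived from a time-independent potential conserves E = T + U.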
The total energy can also be used in systems with both conservative and nonconservative forces, giving a quantity whose rate of change is determined by the work done only by the nonconservative forces. One example of this usefulness is in the discussion of a slightly damped harmonic oscillator driven by a periodic force near resonance. Then the amplitude of steady-state motion is determined by a balance between the average power input by the driving force and the average power dissipated by friction, the two nonconservative forces in the problem, without needing to worry about the work done by the spring.

Angular momentum

Another quantity which is often useful because it may be conserved is the angular momentum. The definition requires a reference point in the Euclidean space, say r₀. Then a particle at position r with momentum p has an angular momentum about r₀ given by L = (r − r₀) × p. Very often we take the reference point r₀ to be the same as the point we have chosen as the origin in converting the Euclidean space to a vector space, so r₀ = 0, and L = r × p,
\[ \frac{dL}{dt} = \frac{dr}{dt}\times p + r\times\frac{dp}{dt} = \frac{1}{m}\, p\times p + r\times F = 0 + \tau = \tau, \]
where we have defined the torque about r₀ as τ = (r − r₀) × F in general, and τ = r × F when our reference point r₀ is at the origin.

We see that if the torque τ(t) vanishes (at all times) the angular momentum is conserved. This can happen not only if the force is zero, but also if the force always points to the reference point. This is the case in a central force problem such as motion of a planet about the sun.

1.3 Systems of Particles

So far we have talked about a system consisting of only a single particle, possibly influenced by external forces. Consider now a system of n particles with positions r_i, i = 1, . . . , n, in flat space. The configuration
of the system then has 3n coordinates (configuration space is R³ⁿ), and the phase space has 6n coordinates {r_i, p_i}.

1.3.1 External and internal forces

Let F_i be the total force acting on particle i. It is the sum of the forces produced by each of the other particles and that due to any external force. Let F_{ji} be the force particle j exerts on particle i and let F_i^E be the external force on particle i. Using Newton's second law on particle i, we have
\[ F_i = F_i^E + \sum_j F_{ji} = \dot p_i = m_i \dot v_i, \]
where m_i is the mass of the i'th particle. Here we are assuming forces have identifiable causes, which is the real meaning of Newton's second law, and that the causes are either individual particles or external forces. Thus we are assuming there are no "three-body" forces which are not simply the sum of "two-body" forces that one object exerts on another.

Define the center of mass and total mass
\[ R = \frac{\sum m_i r_i}{\sum m_i}, \qquad M = \sum m_i. \]
Then if we define the total momentum
\[ P = \sum p_i = \sum m_i v_i = \frac{d}{dt}\sum m_i r_i = M\frac{dR}{dt}, \]
we have
\[ \frac{dP}{dt} = \dot P = \sum \dot p_i = \sum_i F_i = \sum_i F_i^E + \sum_{ij} F_{ji}. \]
Let us define F^E = Σ_i F_i^E to be the total external force. If Newton's Third Law holds, F_{ji} = −F_{ij}, so Σ_{ij} F_{ij} = 0, and
\[ \dot P = F^E. \tag{1.3} \]
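The definitions of the center of mass R, total mass M, and total momentum P are easy to check on a small example. The following sketch uses made-up masses, positions, and velocities for three particles in two dimensions; nothing here is from the text beyond the formulas themselves.

```python
# Three particles in the plane (assumed, illustrative data)
masses     = [1.0, 2.0, 3.0]
positions  = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
velocities = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]

M = sum(masses)                                            # total mass
# Center of mass R = (sum m_i r_i) / M, component by component
R = tuple(sum(m * r[i] for m, r in zip(masses, positions)) / M
          for i in range(2))
# Total momentum P = sum m_i v_i
P = tuple(sum(m * v[i] for m, v in zip(masses, velocities))
          for i in range(2))
```

For this data M = 6, R = (1/3, 1), and P = (−2, 2); by Eq. (1.3), P would change only in response to the total external force, whatever the internal forces between these particles are.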
Thus the internal forces cancel in pairs in their effect on the total momentum, which changes only in response to the total external force. As an obvious but very important consequence[3] the total momentum of an isolated system is conserved.

The total angular momentum is also just a sum over the individual particles, in this case of the individual angular momenta:
\[ L = \sum L_i = \sum r_i \times p_i. \]
Its rate of change with time is
\[ \frac{dL}{dt} = \dot L = \sum_i v_i \times p_i + \sum_i r_i \times F_i = 0 + \sum_i r_i \times F_i^E + \sum_{ij} r_i \times F_{ji}. \]
The total external torque is naturally defined as
\[ \tau = \sum_i r_i \times F_i^E, \]

[3] There are situations and ways of describing them in which the law of action and reaction seems not to hold. For example, a current i₁ flowing through a wire segment ds₁ contributes, according to the law of Biot and Savart, a magnetic field dB = μ₀ i₁ ds₁ × r / 4π|r|³ at a point r away from the current element. If a current i₂ flows through a segment of wire ds₂ at that point, it feels a force
\[ F_{12} = \frac{\mu_0}{4\pi}\, i_1 i_2\, \frac{ds_2 \times (ds_1 \times r)}{|r|^3} \]
due to element 1. On the other hand F₂₁ is given by the same expression with ds₁ and ds₂ interchanged and the sign of r reversed, so
\[ F_{12} + F_{21} = \frac{\mu_0}{4\pi}\, \frac{i_1 i_2}{|r|^3}\left[ ds_1 (ds_2 \cdot r) - ds_2 (ds_1 \cdot r) \right], \]
which is not generally zero. One should not despair for the validity of momentum conservation. The Law of Biot and Savart only holds for time-independent current distributions. Unless the currents form closed loops, there will be a charge buildup and Coulomb forces need to be considered. If the loops are closed, the total momentum will involve integrals over the two closed loops, for which F₁₂ + F₂₁ can be shown to vanish. More generally, even the sum of the momenta of the current elements is not the whole story, because there is momentum in the electromagnetic field, which will be changing in the time-dependent situation.
so we might ask if the last term vanishes due to the Third Law, which permits us to rewrite F_{ji} = ½(F_{ji} − F_{ij}). Then the last term becomes
\[ \sum_{ij} r_i \times F_{ji} = \frac{1}{2}\sum_{ij} r_i \times F_{ji} - \frac{1}{2}\sum_{ij} r_i \times F_{ij} = \frac{1}{2}\sum_{ij} r_i \times F_{ji} - \frac{1}{2}\sum_{ij} r_j \times F_{ji} = \frac{1}{2}\sum_{ij} (r_i - r_j) \times F_{ji}. \]
This is not automatically zero, but vanishes if one assumes a stronger form of the Third Law, namely that the action and reaction forces between two particles act along the line of separation of the particles. If the force law is independent of velocity and rotationally and translationally symmetric, there is no other direction for it to point. For spinning particles and magnetic forces the argument is not so simple; in fact electromagnetic forces between moving charged particles are really only correctly viewed in a context in which the system includes not only the particles but also the fields themselves. For such a system, in general the total energy, momentum, and angular momentum of the particles alone will not be conserved, because the fields can carry all of these quantities. But properly defining the energy, momentum, and angular momentum of the electromagnetic fields, and including them in the totals, will result in quantities conserved as a result of symmetries of the underlying physics. This is further discussed in section 8.1.

Making the assumption that the strong form of Newton's Third Law holds, we have shown that
\[ \tau = \frac{dL}{dt}. \tag{1.4} \]
The conservation laws are very useful because they permit algebraic solution for part of the velocity. Taking a single particle as an example, if E = ½mv² + U(r) is conserved, the speed |v(t)| is determined at all times (as a function of r) by one arbitrary constant E. Similarly if L is conserved, the components of v which are perpendicular to r are determined in terms of the fixed constant L. With both conserved, v
is completely determined except for the sign of the radial component. Examples of the usefulness of conserved quantities are everywhere, and will be particularly clear when we consider the two-body central force problem later. But first we continue our discussion of general systems of particles.

As we mentioned earlier, the total angular momentum depends on the point of evaluation, that is, the origin of the coordinate system used. We now show that it consists of two contributions, the angular momentum about the center of mass and the angular momentum of a fictitious point object located at the center of mass. Let r′_i be the position of the i'th particle with respect to the center of mass, so r′_i = r_i − R. Then
\[ L = \sum_i m_i r_i \times v_i = \sum_i m_i \left(r'_i + R\right) \times \left(\dot r'_i + \dot R\right) = \sum_i m_i r'_i \times \dot r'_i + \sum_i m_i r'_i \times \dot R + R \times \sum_i m_i \dot r'_i + M R \times \dot R = \sum_i r'_i \times p'_i + R \times P. \]
Here we have noted that Σ m_i r′_i = 0, and also its derivative Σ m_i v′_i = 0. We have defined p′_i = m_i v′_i, the momentum in the center of mass reference frame. The first term of the final form is the sum of the angular momenta of the particles about their center of mass, while the second term is the angular momentum the system would have if it were collapsed to a point at the center of mass.

What about the total energy? The kinetic energy
\[ T = \frac{1}{2}\sum m_i v_i^2 = \frac{1}{2}\sum m_i \left(v'_i + V\right)\cdot\left(v'_i + V\right) = \frac{1}{2}\sum m_i v_i'^2 + \frac{1}{2} M V^2, \tag{1.5} \]
where V = Ṙ is the velocity of the center of mass, and the cross term vanishes, once again, because Σ m_i v′_i = 0. Thus the kinetic energy of the system can also be viewed as the sum of the kinetic energies of the constituents about the center of mass, plus the
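The decomposition L = Σ r′_i × p′_i + R × P can be verified numerically. The sketch below is my own illustration with invented data: two particles in a plane, where each cross product reduces to its z-component.

```python
def cross_z(a, b):
    """z-component of the cross product of two planar vectors."""
    return a[0] * b[1] - a[1] * b[0]

# Two particles in the plane (assumed, illustrative data)
masses = [1.0, 3.0]
rs = [(1.0, 0.0), (0.0, 2.0)]
vs = [(0.0, 1.0), (1.0, 0.0)]

M = sum(masses)
R = tuple(sum(m * r[i] for m, r in zip(masses, rs)) / M for i in range(2))
V = tuple(sum(m * v[i] for m, v in zip(masses, vs)) / M for i in range(2))
P = (M * V[0], M * V[1])

# Total L about the origin
L_total = sum(m * cross_z(r, v) for m, r, v in zip(masses, rs, vs))
# Angular momentum about the center of mass, using r' = r - R, v' = v - V
L_cm = sum(m * cross_z((r[0] - R[0], r[1] - R[1]),
                       (v[0] - V[0], v[1] - V[1]))
           for m, r, v in zip(masses, rs, vs))
# Angular momentum of a point mass M at the center of mass
L_orbital = cross_z(R, P)
```

For this data L_total = −5, splitting into L_cm = −0.75 about the center of mass plus L_orbital = −4.25 for the collapsed system, as the derivation above requires.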
kinetic energy the system would have if it were collapsed to a particle at the center of mass.

If the forces on the system are due to potentials, the total energy will be conserved, but this includes not only the potential due to the external forces but also that due to interparticle forces, U_{ij}(r_i, r_j). In general this contribution will not be zero or even constant with time, and the internal potential energy will need to be considered. One exception to this is the case of a rigid body.

1.3.2 Constraints

A rigid body is defined as a system of n particles for which all the interparticle distances are constrained to fixed constants, |r_i − r_j| = c_{ij}, and the interparticle potentials are functions only of these interparticle distances. As these distances do not vary, neither does the internal potential energy. These interparticle forces cannot do work, and the internal potential energy may be ignored.

The rigid body is an example of a constrained system, in which the general 3n degrees of freedom are restricted by some forces of constraint which place conditions on the coordinates r_i, perhaps in conjunction with their momenta. In such descriptions we do not wish to consider or specify the forces themselves, but only their (approximate) effect. The forces are assumed to be whatever is necessary to have that effect. It is generally assumed, as in the case with the rigid body, that the constraint forces do no work under displacements allowed by the constraints. We will consider this point in more detail later.

If the constraints can be phrased so that they are on the coordinates and time only, as Φ_i(r₁, . . . , r_n, t) = 0, i = 1, . . . , k, they are known as holonomic constraints. These constraints determine hypersurfaces in configuration space to which all motion of the system is confined. In general this hypersurface forms a (3n − k)-dimensional manifold.
We might describe the configuration point on this manifold in terms of 3n − k generalized coordinates, q_j, j = 1, . . . , 3n − k, so that the 3n − k variables q_j, together with the k constraint conditions Φ_i({r_i}) = 0, determine the
\[ r_i = r_i(q_1, \ldots, q_{3n-k}, t). \]
The constrained subspace of configuration space need not be a flat space. Consider, for example, a mass on one end of a rigid light rod
of length L, the other end of which is fixed to be at the origin r = 0, though the rod is completely free to rotate. Clearly the possible values of the cartesian coordinates r of the position of the mass satisfy the constraint |r| = L, so r lies on the surface of a sphere of radius L. We might choose as generalized coordinates the standard spherical angles θ and φ. Thus the constrained subspace is two-dimensional but not flat; rather it is the surface of a sphere, which mathematicians call S². It is natural to reexpress the dynamics in terms of θ and φ.

[Figure: generalized coordinates (θ, φ) for a particle constrained to lie on a sphere.]

The use of generalized (non-cartesian) coordinates is not just for constrained systems. The motion of a particle in a central force field about the origin, with a potential U(r) = U(|r|), is far more naturally described in terms of spherical coordinates r, θ, and φ than in terms of x, y, and z.

Before we pursue a discussion of generalized coordinates, it must be pointed out that not all constraints are holonomic. The standard example is a disk of radius R, which rolls on a fixed horizontal plane. It is constrained to always remain vertical, and also to roll without slipping on the plane. As coordinates we can choose the x and y of the center of the disk, which are also the x and y of the contact point, together with the angle a fixed line on the disk makes with the downward direction, φ, and the angle the axis of the disk makes with the x axis, θ.
As the disk rolls through an angle dφ, the point of contact moves a distance R dφ in a direction depending on θ,
\[ R\,d\phi\,\sin\theta = dx, \qquad R\,d\phi\,\cos\theta = dy. \]
Dividing by dt, we get two constraints involving the positions and velocities,
\[ \Phi_1 := R\dot\phi\sin\theta - \dot x = 0, \qquad \Phi_2 := R\dot\phi\cos\theta - \dot y = 0. \]
The fact that these involve velocities does not automatically make them nonholonomic. In the simpler one-dimensional problem in which the disk is confined to the yz plane, rolling along x = 0 (θ = 0), we would have only the coordinates φ and y, with the rolling constraint Rφ̇ − ẏ = 0. But this constraint can be integrated, Rφ(t) − y(t) = c, for some constant c, so that it becomes a constraint among just the coordinates, and is holonomic. This cannot be done with the two-dimensional problem. We can see that there is no constraint among the four coordinates themselves because each of them can be changed by a motion which leaves the others unchanged. Rotating θ without moving the other coordinates is straightforward. By rolling the disk along each of the three small paths shown to the right of the disk, we can change one of the variables x, y, or φ, respectively, with no net change in the other coordinates. Thus all values of the coordinates[4] can be achieved in this fashion.

[Figure: a vertical disk free to roll on a plane. A fixed line on the disk makes an angle of φ with respect to the vertical, and the axis of the disk makes an angle θ with the x-axis. The long curved path is the trajectory of the contact point. The three small paths are alternate trajectories illustrating that x, y, and φ can each be changed without any net change in the other coordinates.]

[4] Thus the configuration space is x ∈ R, y ∈ R, θ ∈ [0, 2π) and φ ∈ [0, 2π),
There are other, less interesting, nonholonomic constraints given by inequalities rather than constraint equations. A bug sliding down a bowling ball obeys the constraint |r| ≥ R. Such problems are solved by considering the constraint with an equality (|r| = R), but restricting the region of validity of the solution by an inequality on the constraint force (N ≥ 0), and then supplementing with the unconstrained problem once the bug leaves the surface.

In quantum field theory, anholonomic constraints which are functions of the positions and momenta are further subdivided into first and second class constraints à la Dirac, with the first class constraints leading to local gauge invariance, as in Quantum Electrodynamics or Yang-Mills theory. But this is heading far afield.

1.3.3 Generalized Coordinates for Unconstrained Systems

Before we get further into constrained systems and D'Alembert's Principle, we will discuss the formulation of a conservative unconstrained system in generalized coordinates. Thus we wish to use 3n generalized coordinates q_j, which, together with time, determine all of the 3n cartesian coordinates r_i:
\[ r_i = r_i(q_1, \ldots, q_{3n}, t). \]
Notice that this is a relationship between different descriptions of the same point in configuration space, and the functions r_i({q}, t) are independent of the motion of any particle. We are assuming that the r_i and the q_j are each a complete set of coordinates for the space, so the q's are also functions of the {r_i} and t:
\[ q_j = q_j(r_1, \ldots, r_n, t). \]
The t dependence permits there to be an explicit dependence of this relation on time, as we would have, for example, in relating a rotating coordinate system to an inertial cartesian one.
or, if we allow more carefully for the continuity as θ and φ go through 2π, the more accurate statement is that configuration space is R² × (S¹)², where S¹ is the circumference of a circle, θ ∈ [0, 2π], with the requirement that θ = 0 is equivalent to θ = 2π.
Let us change the cartesian coordinate notation slightly, with {x_k} the 3n cartesian coordinates of the n 3-vectors r_i, deemphasizing the division of these coordinates into triplets.

A small change in the coordinates of a particle in configuration space, whether an actual change over a small time interval dt or a "virtual" change between where a particle is and where it might have been under slightly altered circumstances, can be described by a set of δx_k or by a set of δq_j. If we are talking about a virtual change at the same time, these are related by the chain rule
\[ \delta x_k = \sum_j \frac{\partial x_k}{\partial q_j}\,\delta q_j, \qquad \delta q_j = \sum_k \frac{\partial q_j}{\partial x_k}\,\delta x_k, \qquad (\text{for } \delta t = 0). \tag{1.6} \]
For the actual motion through time, or any variation where δt is not assumed to be zero, we need the more general form,
\[ \delta x_k = \sum_j \frac{\partial x_k}{\partial q_j}\,\delta q_j + \frac{\partial x_k}{\partial t}\,\delta t, \qquad \delta q_j = \sum_k \frac{\partial q_j}{\partial x_k}\,\delta x_k + \frac{\partial q_j}{\partial t}\,\delta t. \tag{1.7} \]
A virtual displacement, with δt = 0, is the kind of variation we need to find the forces described by a potential. Thus the force is
\[ F_k = -\frac{\partial U(\{x\})}{\partial x_k} = -\sum_j \frac{\partial U(\{x(\{q\})\})}{\partial q_j}\,\frac{\partial q_j}{\partial x_k} = \sum_j Q_j\, \frac{\partial q_j}{\partial x_k}, \tag{1.8} \]
where
\[ Q_j := \sum_k F_k\, \frac{\partial x_k}{\partial q_j} = -\frac{\partial U(\{x(\{q\})\})}{\partial q_j} \tag{1.9} \]
is known as the generalized force. We may think of Ũ(q, t) := U(x(q), t) as a potential in the generalized coordinates {q}. Note that if the coordinate transformation is time-dependent, it is possible that a time-independent potential U(x) will lead to a time-dependent potential Ũ(q, t), and a system with forces described by a time-dependent potential is not conservative.

The definition in (1.9) of the generalized force Q_j holds even if the cartesian force is not described by a potential.

The q_k do not necessarily have units of distance. For example, one q_k might be an angle, as in polar or spherical coordinates. The corresponding component of the generalized force will have the units of energy and we might consider it a torque rather than a force.
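Equation (1.9) can be evaluated numerically for a concrete coordinate change. The sketch below is my own example, not the text's: plane polar coordinates q = (r, θ), with the Jacobian ∂x_k/∂q_j approximated by central differences. For a purely tangential cartesian force, the θ-component of the generalized force should come out as the torque rF_t, in units of energy, as noted above.

```python
import math

def x_of_q(q):
    """Cartesian coordinates from plane polar coordinates q = (r, theta)."""
    r, th = q
    return [r * math.cos(th), r * math.sin(th)]

def generalized_force(F, q, h=1e-6):
    """Q_j = sum_k F_k dx_k/dq_j, Eq. (1.9), with central-difference derivatives."""
    Q = []
    for j in range(len(q)):
        qp = list(q); qp[j] += h
        qm = list(q); qm[j] -= h
        xp, xm = x_of_q(qp), x_of_q(qm)
        Q.append(sum(Fk * (a - b) / (2 * h) for Fk, a, b in zip(F, xp, xm)))
    return Q

# A unit tangential force at r = 2, theta = 0: the point is x = (2, 0), F = (0, 1).
Q = generalized_force([0.0, 1.0], [2.0, 0.0])
```

Here Q[0] (the radial generalized force) vanishes and Q[1] equals rF_t = 2, the torque about the origin.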
1.3.4 Kinetic energy in generalized coordinates

We have seen that, under the right circumstances, the potential energy can be thought of as a function of the generalized coordinates q_k, and the generalized forces Q_k are given by the potential just as for ordinary cartesian coordinates and their forces. Now we examine the kinetic energy
\[ T = \frac{1}{2}\sum_i m_i \dot r_i^2 = \frac{1}{2}\sum_j m_j \dot x_j^2, \]
where the 3n values m_j are not really independent, as each particle has the same mass in all three dimensions in ordinary Newtonian mechanics.[5] Now
\[ \dot x_j = \lim_{\Delta t\to 0} \frac{\Delta x_j}{\Delta t} = \lim_{\Delta t\to 0}\left( \sum_k \left.\frac{\partial x_j}{\partial q_k}\right|_{q,t} \frac{\Delta q_k}{\Delta t} + \left.\frac{\partial x_j}{\partial t}\right|_{q} \right), \]
where |_{q,t} means that t and the q's other than q_k are held fixed. The last term is due to the possibility that the coordinates x_j(q₁, . . . , q_{3n}, t) may vary with time even for fixed values of q_k. So the chain rule is giving us
\[ \dot x_j = \frac{dx_j}{dt} = \sum_k \left.\frac{\partial x_j}{\partial q_k}\right|_{q,t} \dot q_k + \left.\frac{\partial x_j}{\partial t}\right|_{q}. \tag{1.10} \]
Plugging this into the kinetic energy, we see that
\[ T = \frac{1}{2}\sum_{j,k,\ell} m_j \frac{\partial x_j}{\partial q_k}\frac{\partial x_j}{\partial q_\ell}\,\dot q_k \dot q_\ell \;+\; \sum_{j,k} m_j \frac{\partial x_j}{\partial q_k}\,\dot q_k \left.\frac{\partial x_j}{\partial t}\right|_{q} \;+\; \frac{1}{2}\sum_j m_j \left(\left.\frac{\partial x_j}{\partial t}\right|_{q}\right)^2. \tag{1.11} \]
What is the interpretation of these terms? Only the first term arises if the relation between x and q is time independent. The second and third terms are the sources of the ṙ · (ω × r) and (ω × r)² terms in the kinetic energy when we consider rotating coordinate systems.[6]

[5] But in an anisotropic crystal, the effective mass of a particle might in fact be different in different directions.
[6] This will be fully developed in section 4.2.
Let's work a simple example: we will consider a two-dimensional system using polar coordinates with θ measured from a direction rotating at angular velocity ω. Thus the angle the radius vector to an arbitrary point (r, θ) makes with the inertial x₁-axis is θ + ωt, and the relations are
\[ x_1 = r\cos(\theta + \omega t), \qquad x_2 = r\sin(\theta + \omega t), \]
with inverse relations
\[ r = \sqrt{x_1^2 + x_2^2}, \qquad \theta = \sin^{-1}(x_2/r) - \omega t. \]

[Figure: rotating polar coordinates related to inertial cartesian coordinates.]

So ẋ₁ = ṙ cos(θ + ωt) − θ̇r sin(θ + ωt) − ωr sin(θ + ωt), where the last term is from ∂x_j/∂t, and ẋ₂ = ṙ sin(θ + ωt) + θ̇r cos(θ + ωt) + ωr cos(θ + ωt). In the square, things get a bit simpler,
\[ \sum_i \dot x_i^2 = \dot r^2 + r^2\left(\omega + \dot\theta\right)^2. \]
We see that the form of the kinetic energy in terms of the generalized coordinates and their velocities is much more complicated than it is in cartesian inertial coordinates, where it is coordinate independent, and a simple diagonal quadratic form in the velocities. In generalized coordinates, it is quadratic but not homogeneous[7] in the velocities, and with an arbitrary dependence on the coordinates. In general, even if the coordinate transformation is time independent, the form of the kinetic energy is still coordinate dependent and, while a purely quadratic form in the velocities, it is not necessarily diagonal. In this time-independent situation, we have
\[ T = \frac{1}{2}\sum_{k\ell} M_{k\ell}\,\dot q_k \dot q_\ell, \qquad \text{with} \quad M_{k\ell} = \sum_j m_j \frac{\partial x_j}{\partial q_k}\frac{\partial x_j}{\partial q_\ell}, \tag{1.12} \]
where M_{kℓ} is known as the mass matrix, and is always symmetric but not necessarily diagonal or coordinate independent.

[7] It involves quadratic and lower order terms in the velocities, not just quadratic ones.
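The mass matrix of Eq. (1.12) can be computed directly from the coordinate map. The sketch below is my own illustration for nonrotating polar coordinates, where we expect M = diag(m, mr²); the function names and the finite-difference Jacobian are assumptions for the sketch, not the text's formal approach.

```python
import math

def x_of_q(q):
    """Cartesian coordinates from plane polar coordinates q = (r, theta)."""
    r, th = q
    return [r * math.cos(th), r * math.sin(th)]

def mass_matrix(q, m=1.0, h=1e-6):
    """M_kl = sum_j m_j (dx_j/dq_k)(dx_j/dq_l), Eq. (1.12),
    with the Jacobian approximated by central differences."""
    n = len(q)
    J = []                                   # J[k][j] = dx_j / dq_k
    for k in range(n):
        qp = list(q); qp[k] += h
        qm = list(q); qm[k] -= h
        xp, xm = x_of_q(qp), x_of_q(qm)
        J.append([(a - b) / (2 * h) for a, b in zip(xp, xm)])
    return [[m * sum(J[k][j] * J[l][j] for j in range(n))
             for l in range(n)] for k in range(n)]

Mmat = mass_matrix([2.0, 0.3])               # evaluate at r = 2, theta = 0.3
```

The result is diagonal with M₁₁ = m = 1 and M₂₂ = mr² = 4, matching the geometric derivation in the text; the same routine works unchanged for coordinate maps whose geometry is harder to picture.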
The mass matrix is independent of the ∂x_j/∂t terms, and we can understand the results we just obtained for it in our two-dimensional example above,
\[ M_{11} = m, \qquad M_{12} = M_{21} = 0, \qquad M_{22} = mr^2, \]
by considering the case without rotation, ω = 0. We can also derive this expression for the kinetic energy in nonrotating polar coordinates by expressing the velocity vector v = ṙê_r + rθ̇ê_θ in terms of unit vectors in the radial and tangential directions respectively. The coefficients of these unit vectors can be understood graphically with geometric arguments. This leads more quickly to v² = (ṙ)² + r²(θ̇)², T = ½mṙ² + ½mr²θ̇², and the mass matrix follows. Similar geometric arguments are usually used to find the form of the kinetic energy in spherical coordinates, but the formal approach of (1.12) enables us to find the form even in situations where the geometry is difficult to picture.

It is important to keep in mind that when we view T as a function of coordinates and velocities, these are independent arguments evaluated at a particular moment of time. Thus we can ask independently how T varies as we change x_i or as we change ẋ_i, each time holding the other variable fixed. Thus the kinetic energy is not a function on the 3n-dimensional configuration space, but on a larger, 6n-dimensional space[8] with a point specifying both the coordinates {q_i} and the velocities {q̇_i}.

1.4 Phase Space

If the trajectory of the system in configuration space, r(t), is known, the velocity as a function of time, v(t), is also determined. As the mass of the particle is simply a physical constant, the momentum p = mv contains the same information as the velocity. Viewed as functions of time, this gives nothing beyond the information in the trajectory. But at any given time, r and p provide a complete set of initial conditions, while r alone does not.
[8] This space is called the tangent bundle to configuration space. For cartesian coordinates it is almost identical to phase space, which is in general the "cotangent bundle" to configuration space.

We define phase space as the set of possible positions
and momenta for the system at some instant. Equivalently, it is the set of possible initial conditions, or the set of possible motions obeying the equations of motion. For a single particle in cartesian coordinates, the six coordinates of phase space are the three components of r and the three components of p. At any instant of time, the system is represented by a point in this space, called the phase point, and that point moves with time according to the physical laws of the system. These laws are embodied in the force function, which we now consider as a function of p rather than v, in addition to r and t. We may write these equations as
\[ \frac{dr}{dt} = \frac{p}{m}, \qquad \frac{dp}{dt} = F(r, p, t). \]
Note that these are first-order equations, which means that the motion of the point representing the system in phase space is completely determined[9] by where the phase point is. This is to be distinguished from the trajectory in configuration space, where in order to know the trajectory you must have not only an initial point (position) but also an initial velocity.

1.4.1 Dynamical Systems

We have spoken of the coordinates of phase space for a single particle as r and p, but from a mathematical point of view these together give the coordinates of the phase point in phase space. We might describe these coordinates in terms of a six-dimensional vector η = (r₁, r₂, r₃, p₁, p₂, p₃). The physical laws determine at each point a velocity function for the phase point as it moves through phase space,
\[ \frac{d\eta}{dt} = V(\eta, t), \tag{1.13} \]
which gives the velocity at which the phase point representing the system moves through phase space. Only half of this velocity is the ordinary velocity, while the other half represents the rapidity with which the momentum is changing, i.e. the force. The path traced by the phase point as it travels through phase space is called the phase curve.

For a system of n particles in three dimensions, the complete set of initial conditions requires 3n spatial coordinates and 3n momenta, so phase space is 6n-dimensional. While this certainly makes visualization difficult, the large dimensionality is no hindrance for formal developments. Also, it is sometimes possible to focus on particular dimensions, or to make generalizations of ideas familiar in two and three dimensions. For example, in discussing integrable systems (7.1), we will find that the motion of the phase point is confined to a 3n-dimensional torus, a generalization of one- and two-dimensional tori, which are circles and the surface of a donut respectively.

Thus for a system composed of a finite number of particles, the dynamics is determined by the first-order ordinary differential equation (1.13), formally a very simple equation. All of the complication of the physical situation is hidden in the large dimensionality of the dependent variable η and in the functional dependence of the velocity function V(η, t) on it.

There are other systems besides Newtonian mechanics which are controlled by equation (1.13), with a suitable velocity function. Collectively these are known as dynamical systems. For example, individuals of an asexual mutually hostile species might have a fixed birth rate b and a death rate proportional to the population, so the population would obey the logistic equation[10]
\[ \frac{dp}{dt} = bp - cp^2, \]
a dynamical system with a one-dimensional space for its dependent variable. The populations of three competing species could be described by eq. (1.13) with η in three dimensions. The dimensionality d of η in (1.13) is called the order of the dynamical system.

[9] We will assume throughout that the force function is a well defined continuous function of its arguments.
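The first-order flow dη/dt = V(η, t) of Eq. (1.13) is straightforward to integrate step by step, since knowing the phase point alone determines the motion. A sketch for the one-dimensional harmonic oscillator, with m = k = 1, simple forward-Euler steps, and invented names:

```python
def phase_flow(eta, t, m=1.0, k=1.0):
    """Velocity function V(eta, t) for a 1-D harmonic oscillator:
    eta = (x, p), d(eta)/dt = (p/m, -k x)."""
    x, p = eta
    return (p / m, -k * x)

def euler_path(eta0, dt, steps):
    """Trace the phase curve by repeatedly stepping along V(eta, t)."""
    eta, path = eta0, [eta0]
    for i in range(steps):
        vx, vp = phase_flow(eta, i * dt)
        eta = (eta[0] + dt * vx, eta[1] + dt * vp)
        path.append(eta)
    return path

path = euler_path((1.0, 0.0), 0.001, 1000)
```

The returned `path` is a discretized phase curve. Forward Euler slightly inflates the conserved quantity x² + p² each step (by a factor 1 + dt² here), so the computed curve spirals very slowly outward; better integrators or smaller steps shrink this artifact.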
A d'th order differential equation in one independent variable may always be recast as a system of d first order differential equations in d variables, so it is one example of a d'th order dynamical system. The space of these dependent variables is called the phase space of the dynamical system. Newtonian systems always give rise to an even-order

^10 This is not to be confused with the simpler logistic map, which is a recursion relation with the same form but with solutions displaying a very different behavior.
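The logistic equation just mentioned makes a convenient first-order dynamical system to explore numerically. Setting bp − cp² = 0 shows its nontrivial fixed point is p = b/c, toward which any positive initial population flows. Here is a minimal sketch using crude Euler stepping; the values b = 2, c = 0.5, and the step size are arbitrary illustrative choices, not from the text.

```python
# Sketch: Euler integration of the logistic equation dp/dt = b*p - c*p**2.
# The parameter values are illustrative choices, not from the text.
def logistic_flow(p0, b, c, dt, steps):
    p = p0
    for _ in range(steps):
        p += (b * p - c * p * p) * dt   # dp = V(p) * dt
    return p

# Any positive initial population flows toward the fixed point p = b/c = 4.
b, c = 2.0, 0.5
print(logistic_flow(0.1, b, c, 1e-3, 20000))   # approaches 4 from below
print(logistic_flow(9.0, b, c, 1e-3, 20000))   # approaches 4 from above
```

Both runs converge to the same fixed point, one from each side, which is the one-dimensional flow pattern discussed for first order systems later in this section.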
system, because each spatial coordinate is paired with a momentum. For n particles unconstrained in D dimensions, the order of the dynamical system is d = 2nD. Even for constrained Newtonian systems, there is always a pairing of coordinates and momenta, which gives a restricting structure, called the symplectic structure^11, on phase space.

If the force function does not depend explicitly on time, we say the system is autonomous. The velocity function has no explicit dependence on time, V = V(η), and is a time-independent vector field on phase space, which we can indicate by arrows just as we might the electric field in ordinary space. This gives a visual indication of the motion of the system's point. For example, consider a damped harmonic oscillator with F = −kx − αp, for which the velocity function is
$$\left(\frac{dx}{dt}, \frac{dp}{dt}\right) = \left(\frac{p}{m},\ -kx - \alpha p\right).$$
A plot of this field for the undamped (α = 0) and damped oscillators is shown in Figure 1.1.

[Figure 1.1: Velocity field for undamped and damped harmonic oscillators, and one possible phase curve for each system through phase space.]

The velocity field is everywhere tangent to any possible path, one of which is shown for each case. Note that qualitative features of the motion can be seen from the velocity field without any solving of the differential equations; it is clear that in the damped case the path of the system must spiral in toward the origin.

The paths taken by possible physical motions through the phase space of an autonomous system have an important property. Because

^11 This will be discussed in sections (6.3) and (6.6).
  • 32. 1.4. PHASE SPACE 25 the rate and direction with which the phase point moves away from a given point of phase space is completely determined by the velocity function at that point, if the system ever returns to a point it must move away from that point exactly as it did the last time. That is, if the system at time T returns to a point in phase space that it was at at time t = 0, then its subsequent motion must be just as it was, so η(T + t) = η(t), and the motion is periodic with period T . This almost implies that the phase curve the object takes through phase space must be nonintersecting12 . In the non-autonomous case, where the velocity field is time depen- dent, it may be preferable to think in terms of extended phase space, a 6n + 1 dimensional space with coordinates (η, t). The velocity field can be extended to this space by giving each vector a last component of 1, as dt/dt = 1. Then the motion of the system is relentlessly upwards in this direction, though still complex in the others. For the undamped one-dimensional harmonic oscillator, the path is a helix in the three dimensional extended phase space. Most of this book is devoted to finding analytic methods for ex- ploring the motion of a system. In several cases we will be able to find exact analytic solutions, but it should be noted that these exactly solvable problems, while very important, cover only a small set of real problems. It is therefore important to have methods other than search- ing for analytic solutions to deal with dynamical systems. Phase space provides one method for finding qualitative information about the so- lutions. Another approach is numerical. Newton’s Law, and more generally the equation (1.13) for a dynamical system, is a set of ordi- nary differential equations for the evolution of the system’s position in phase space. 
Thus it is always subject to numerical solution given an initial configuration, at least up until such point that some singularity in the velocity function is reached. One primitive technique which will work for all such systems is to choose a small time interval of length ∆t, and use dη/dt at the beginning of each interval to approximate ∆η during this interval. This gives a new approximate value for η at the 12 An exception can occur at an unstable equilibrium point, where the velocity function vanishes. The motion can just end at such a point, and several possible phase curves can terminate at that point.
end of this interval, which may then be taken as the beginning of the next.^13

As an example, we show the meat of a calculation for the damped harmonic oscillator, in Fortran:

    do i = 1,n
      dx = (p/m) * dt
      dp = -(k*x+alpha*p)*dt
      x = x + dx
      p = p + dp
      t = t + dt
      write *, t, x, p
    enddo

Integrating the motion, for a damped harmonic oscillator.

This same technique will work even with a very complicated situation. One need only add lines for all the components of the position and momentum, and change the force law appropriately.

This is not to say that numerical solution is a good way to solve this problem. An analytical solution, if it can be found, is almost always preferable, because

• It is far more likely to provide insight into the qualitative features of the motion.

• Numerical solutions must be done separately for each value of the parameters (k, m, α) and each value of the initial conditions (x0 and p0).

• Numerical solutions have subtle numerical problems in that they are only exact as ∆t → 0, and only if the computations are done exactly. Sometimes uncontrolled approximate solutions lead to surprisingly large errors.

^13 This is a very unsophisticated method. The errors made in each step for ∆r and ∆p are typically O(∆t)². As any calculation of the evolution from time t0 to tf will involve a number ([tf − t0]/∆t) of time steps which grows inversely to ∆t, the cumulative error can be expected to be O(∆t). In principle therefore we can approach exact results for a finite time evolution by taking smaller and smaller time steps, but in practice there are other considerations, such as computer time and roundoff errors, which argue strongly in favor of using more sophisticated numerical techniques, with errors of higher order in ∆t. These can be found in any text on numerical methods.
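The same loop transcribes directly into other languages. Here is a Python version, a sketch in which the parameter values and initial conditions are arbitrary illustrative choices; it integrates up to t = 10 and, as expected for the damped oscillator, the final state has lost most of its initial energy.

```python
# Euler integration of the damped harmonic oscillator, mirroring the
# Fortran loop in the text. Parameters and initial conditions are
# illustrative choices, not from the text.
m, k, alpha = 1.0, 1.0, 0.1
x, p, t = 1.0, 0.0, 0.0
dt, n = 1e-4, 100000          # integrate up to t = 10

for i in range(n):
    dx = (p / m) * dt
    dp = -(k * x + alpha * p) * dt
    x += dx
    p += dp
    t += dt

print(t, x, p)   # the damped motion has decayed toward the origin
```

Printing inside the loop, as the Fortran version does, would trace out the inward spiral of Figure 1.1.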
  • 34. 1.4. PHASE SPACE 27 Nonetheless, numerical solutions are often the only way to handle a real problem, and there has been extensive development of techniques for efficiently and accurately handling the problem, which is essentially one of solving a system of first order ordinary differential equations. 1.4.2 Phase Space Flows As we just saw, Newton’s equations for a system of particles can be cast in the form of a set of first order ordinary differential equations in time on phase space, with the motion in phase space described by the velocity field. This could be more generally discussed as a d’th order dynamical system, with a phase point representing the system in a d-dimensional phase space, moving with time t along the velocity field, sweeping out a path in phase space called the phase curve. The phase point η(t) is also called the state of the system at time t. Many qualitative features of the motion can be stated in terms of the phase curve. Fixed Points There may be points ηk , known as fixed points, at which the velocity function vanishes, V (ηk ) = 0. This is a point of equilibrium for the system, for if the system is at a fixed point at one moment, η(t0 ) = ηk , it remains at that point. At other points, the system does not stay put, but there may be sets of states which flow into each other, such as the elliptical orbit for the undamped harmonic oscillator. These are called invariant sets of states. In a first order dynamical system14 , the fixed points divide the line into intervals which are invariant sets. Even though a first-order system is smaller than any Newtonian sys- tem, it is worthwhile discussing briefly the phase flow there. We have been assuming the velocity function is a smooth function — generically its zeros will be first order, and near the fixed point η0 we will have V (η) ≈ c(η − η0 ). 
If the constant c < 0, dη/dt will have the opposite sign from η − η0, and the system will flow towards the fixed point,

^14 Note that this is not a one-dimensional Newtonian system, which is a two dimensional η = (x, p) dynamical system.
which is therefore called stable. On the other hand, if c > 0, the displacement η − η0 will grow with time, and the fixed point is unstable. Of course there are other possibilities: if V(η) = cη², the fixed point η = 0 is stable from the left and unstable from the right. But this kind of situation is somewhat artificial, and such a system is structurally unstable. What that means is that if the velocity field is perturbed by a small smooth variation V(η) → V(η) + εw(η), for some bounded smooth function w, the fixed point at η = 0 is likely to either disappear or split into two fixed points, whereas the fixed points discussed earlier will simply be shifted by order ε in position and will retain their stability or instability. Thus the simple zero in the velocity function is structurally stable. Note that structural stability is quite a different notion from stability of the fixed point.

In this discussion of stability in first order dynamical systems, we see that generically the stable fixed points occur where the velocity function decreases through zero, while the unstable points are where it increases through zero. Thus generically the fixed points will alternate in stability, dividing the phase line into open intervals which are each invariant sets of states, with the points in a given interval flowing either to the left or to the right, but never leaving the open interval. The state never reaches the stable fixed point because the time $t = \int d\eta/V(\eta) \approx (1/c)\int d\eta/(\eta - \eta_0)$ diverges. On the other hand, in the case V(η) = cη², a system starting at η0 at t = 0 has a motion given by $\eta = (\eta_0^{-1} - ct)^{-1}$, which runs off to infinity as t → 1/η0c. Thus the solution terminates at t = 1/η0c, and makes no sense thereafter. This form of solution is called terminating motion.
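The terminating-motion solution can be checked against a direct integration of the flow. The sketch below compares the exact form η(t) = (η0⁻¹ − ct)⁻¹ with an Euler integration of dη/dt = cη², stopping well before the blow-up time 1/(cη0); the values c = 1, η0 = 0.5, and the step size are arbitrary illustrative choices.

```python
# Sketch: terminating motion for V(eta) = c*eta**2, compared with a
# small Euler integration. Values of c, eta0, dt are illustrative.
c, eta0 = 1.0, 0.5

def exact(t):
    # eta(t) = (eta0**-1 - c t)**-1, which diverges at t = 1/(eta0*c) = 2
    return 1.0 / (1.0 / eta0 - c * t)

eta, t, dt = eta0, 0.0, 1e-5
for _ in range(100000):          # integrate up to t = 1.0
    eta += c * eta * eta * dt
    t += dt

print(eta, exact(1.0))   # both close to 1.0, far from the blow-up at t = 2
```

Pushing the integration toward t = 2 would show η growing without bound, the numerical signature of the motion terminating.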
For higher order dynamical systems, the d equations Vi(η) = 0 required for a fixed point will generically determine the d variables ηj, so the generic form of the velocity field near a fixed point η0 is $V_i(\eta) = \sum_j M_{ij}(\eta_j - \eta_{0j})$ with a nonsingular matrix M. The stability of the flow will be determined by this d-dimensional square matrix M. Generically the eigenvalue equation, a d'th order polynomial in λ, will have d distinct solutions. Because M is a real matrix, the eigenvalues must either be real or come in complex conjugate pairs. For the real case, whether the eigenvalue is positive or negative determines the instability or stability of the flow along the direction of the eigenvector. For a pair of complex conjugate eigenvalues λ = u + iv and λ* = u − iv,
with eigenvectors e and e* respectively, we may describe the flow in the plane δη = η − η0 = x(e + e*) + iy(e − e*), so
$$\dot\eta = M \cdot \delta\eta = x(\lambda e + \lambda^* e^*) + iy(\lambda e - \lambda^* e^*) = (ux - vy)(e + e^*) + i(vx + uy)(e - e^*),$$
so
$$\begin{pmatrix}\dot x\\ \dot y\end{pmatrix} = \begin{pmatrix}u & -v\\ v & u\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}, \quad\text{or}\quad \begin{aligned}x &= Ae^{ut}\cos(vt+\phi)\\ y &= Ae^{ut}\sin(vt+\phi).\end{aligned}$$
Thus we see that the motion spirals in towards the fixed point if u is negative, and spirals away from the fixed point if u is positive. Stability in these directions is determined by the sign of the real part of the eigenvalue.

In general, then, stability in each subspace around the fixed point η0 depends on the sign of the real part of the eigenvalue. If all the real parts are negative, the system will flow from anywhere in some neighborhood of η0 towards the fixed point, so lim_{t→∞} η(t) = η0 provided we start in that neighborhood. Then η0 is an attractor and is a strongly stable fixed point. On the other hand, if some of the eigenvalues have positive real parts, there are unstable directions. Starting from a generic point in any neighborhood of η0, the motion will eventually flow out along an unstable direction, and the fixed point is considered unstable, although there may be subspaces along which the flow may be into η0. An example is the line x = y in the hyperbolic fixed point case shown in Figure 1.2.

Some examples of two dimensional flows in the neighborhood of a generic fixed point are shown in Figure 1.2. Note that none of these describe the fixed point of the undamped harmonic oscillator of Figure 1.1.

We have discussed generic situations as if the velocity field were chosen arbitrarily from the set of all smooth vector functions, but in fact Newtonian mechanics imposes constraints on the velocity fields in many situations, in particular if there are conserved quantities.
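This eigenvalue test is easy to carry out concretely. Linearizing the damped oscillator field (dx/dt, dp/dt) = (p/m, −kx − αp) about its fixed point at the origin gives the matrix M = [[0, 1/m], [−k, −α]]; the sketch below (with arbitrary illustrative parameter values) computes its eigenvalues and confirms that for α > 0 they form a complex pair with negative real part, the inward spiral of Figure 1.1.

```python
import cmath

# Sketch: classifying a fixed point from the eigenvalues of the
# linearization M in V_i = sum_j M_ij (eta_j - eta_0j). Here M comes
# from the damped oscillator field about the origin; parameter values
# are illustrative choices, not from the text.
def eigenvalues_2x2(a, b, c, d):
    """Roots of lambda**2 - (a+d)*lambda + (a*d - b*c) = 0."""
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

m, k, alpha = 1.0, 1.0, 0.2
l1, l2 = eigenvalues_2x2(0.0, 1.0 / m, -k, -alpha)
print(l1, l2)                            # complex pair, real part -alpha/2
print(all(l.real < 0 for l in (l1, l2))) # negative real parts: stable spiral
```

Setting alpha = 0 instead gives purely imaginary eigenvalues, which is exactly the non-generic oscillator case the text notes is absent from Figure 1.2.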
Effect of conserved quantities on the flow

If the system has a conserved quantity Q(q, p) which is a function on phase space only, and not of time, the flow in phase space is considerably changed. This is because the equation Q(q, p) = K gives a set
[Figure 1.2: Four generic fixed points for a second order dynamical system: (a) ẋ = −x + y, ẏ = −2x − y, a strongly stable spiral point, λ = −1 ± √2 i; (b) ẋ = −3x − y, ẏ = −x − 3y, a strongly stable fixed point, λ = −1, −2; (c) ẋ = 3x + y, ẏ = x + 3y, an unstable fixed point, λ = 1, 2; (d) ẋ = −x − 3y, ẏ = −3x − y, a hyperbolic fixed point, λ = −2, 1.]

of subsurfaces or contours in phase space, and the system is confined to stay on whichever contour it is on initially. Unless this conserved quantity is a trivial function, i.e. constant, in the vicinity of a fixed point, it is not possible for all points to flow into the fixed point, and thus it is not strongly stable. In the terms of our generic discussion, the gradient of Q gives a direction orthogonal to the image of M, so there is a zero eigenvalue and we are not in the generic situation we discussed.

For the case of a single particle in a potential, the total energy E = p²/2m + U(r) is conserved, and so the motion of the system is confined to one surface of a given energy. As p/m is part of the velocity function, a fixed point must have p = 0. The vanishing of the other half of the velocity field gives ∇U(r0) = 0, which is the condition for a stationary point of the potential energy, and for the force to vanish. If this point is a maximum or a saddle of U, the motion along a descending path will be unstable. If the fixed point is a minimum of the potential, the region E(r, p) < E(r0, 0) + ε, for
sufficiently small ε, gives a neighborhood around η0 = (r0, 0) to which the motion is confined if it starts within this region. Such a fixed point is called stable^15, but it is not strongly stable, as the flow does not settle down to η0. This is the situation we saw for the undamped harmonic oscillator. For that situation F = −kx, so the potential energy may be taken to be
$$U(x) = \int_x^0 (-kx')\,dx' = \frac{1}{2}kx^2,$$
and so the total energy E = p²/2m + ½kx² is conserved. The curves of constant E in phase space are ellipses, and each motion orbits the appropriate ellipse, as shown in Fig. 1.1 for the undamped oscillator. This contrasts to the case of the damped oscillator, for which there is no conserved energy, and for which the origin is a strongly stable fixed point.

^15 A fixed point is stable if it is in arbitrarily small neighborhoods, each with the property that if the system is in that neighborhood at one time, it remains in it at all later times.
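The contrast between the two oscillators can be read off directly from the velocity function: differentiating E along the flow gives dE/dt = kx ẋ + (p/m) ṗ, which vanishes identically when α = 0 and equals −αp²/m otherwise. A minimal sketch (the numerical values of m, k, α and the sample point are arbitrary choices, not from the text):

```python
# Sketch: rate of change of E = p**2/2m + k*x**2/2 along the flow
# V(eta) = (p/m, -k*x - alpha*p). Parameter values are illustrative.
def velocity_field(x, p, m=1.0, k=1.0, alpha=0.0):
    return p / m, -k * x - alpha * p

def dE_dt(x, p, m=1.0, k=1.0, alpha=0.0):
    dx, dp = velocity_field(x, p, m, k, alpha)
    return k * x * dx + (p / m) * dp   # chain rule on E(x, p)

# Undamped: the field is tangent to the ellipses of constant E.
print(dE_dt(0.7, -1.3, alpha=0.0))   # exactly zero
# Damped: dE/dt = -alpha*p**2/m < 0, so the phase curve spirals inward.
print(dE_dt(0.7, -1.3, alpha=0.2))
```

The sign of dE/dt is the whole story: zero means closed energy contours, strictly negative means every orbit crosses the contours inward toward the strongly stable origin.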
As an example of a conservative system with both stable and unstable fixed points, consider a particle in one dimension with a cubic potential U(x) = ax² − bx³, as shown in Fig. 1.3. There is a stable equilibrium at xs = 0 and an unstable one at xu = 2a/3b. Each has an associated fixed point in phase space, an elliptic fixed point ηs = (xs, 0) and a hyperbolic fixed point ηu = (xu, 0).

[Figure 1.3: Motion in a cubic potential. The upper panel plots U(x); the lower panel shows the velocity field in phase space and several possible orbits.]

Near the stable equilibrium, the trajectories are approximately ellipses, as they were for the harmonic oscillator, but for larger energies they begin to feel the asymmetry of the potential, and the orbits become egg-shaped. If the system has total energy precisely U(xu), the contour line crosses itself. This contour actually consists of three separate orbits. One starts at t → −∞ at x = xu, completes one trip through the potential well, and returns as t → +∞ to x = xu. The other two are orbits which go from x = xu to x = ∞, one incoming and one outgoing. For E > U(xu), all the orbits start and end at x = +∞. Note that generically the orbits deform continuously as the energy varies, but at E = U(xu) this is not the case: the character of the orbit changes as E passes through U(xu). An orbit with this critical value of the energy is called a separatrix, as it separates regions in phase space where the orbits have different qualitative characteristics.

Quite generally hyperbolic fixed points are at the ends of separatrices. In our case the contour E = U(xu) consists of four invariant sets
of states, one of which is the point ηu itself, and the other three are the orbits which are the disconnected pieces left of the contour after removing ηu.

Exercises

1.1 (a) Find the potential energy function U(r) for a particle in the gravitational field of the Earth, for which the force law is F(r) = −G ME m r/r³.
(b) Find the escape velocity from the Earth, that is, the minimum velocity a particle near the surface can have for which it is possible that the particle will eventually coast to arbitrarily large distances without being acted upon by any force other than gravity. The Earth has a mass of 6.0 × 10²⁴ kg and a radius of 6.4 × 10⁶ m. Newton's gravitational constant is 6.67 × 10⁻¹¹ N·m²/kg².

1.2 In the discussion of a system of particles, it is important that the particles included in the system remain the same. There are some situations in which we wish to focus our attention on a set of particles which changes with time, such as a rocket ship which is emitting gas continuously. The equation of motion for such a problem may be derived by considering an infinitesimal time interval, [t, t + ∆t], and choosing the system to be the rocket with the fuel still in it at time t, so that at time t + ∆t the system consists of the rocket with its remaining fuel and also the small amount of fuel emitted during the infinitesimal time interval. Let M(t) be the mass of the rocket and remaining fuel at time t, assume that the fuel is emitted with velocity u with respect to the rocket, and call the velocity of the rocket v(t) in an inertial coordinate system. If the external force on the rocket is F(t) and the external force on the infinitesimal amount of exhaust is infinitesimal, the fact that F(t) is the rate of change of the total momentum gives the equation of motion for the rocket.
(a) Show that this equation is
$$M\frac{dv}{dt} = F(t) + u\frac{dM}{dt}.$$
(b) Suppose the rocket is in a constant gravitational field F = −Mg êz for the period during which it is burning fuel, and that it is fired straight up with constant exhaust velocity (u = −u êz), starting from rest. Find v(t) in terms of t and M(t).
(c) Find the maximum fraction of the initial mass of the rocket which can escape the Earth's gravitational field if u = 2000 m/s.
1.3 For a particle in two dimensions, we might use polar coordinates (r, θ) and use basis unit vectors êr and êθ in the radial and tangent directions respectively to describe more general vectors. Because this pair of unit vectors differ from point to point, the êr and êθ along the trajectory of a moving particle are themselves changing with time.
(a) Show that
$$\frac{d}{dt}\hat e_r = \dot\theta\,\hat e_\theta, \qquad \frac{d}{dt}\hat e_\theta = -\dot\theta\,\hat e_r.$$
(b) Thus show that the derivative of r = r êr is
$$\vec v = \dot r\,\hat e_r + r\dot\theta\,\hat e_\theta,$$
which verifies the discussion of Sec. (1.3.4).
(c) Show that the derivative of the velocity is
$$\vec a = \frac{d}{dt}\vec v = (\ddot r - r\dot\theta^2)\hat e_r + (r\ddot\theta + 2\dot r\dot\theta)\hat e_\theta.$$
(d) Thus Newton's Law says the radial and tangential components of the force are $F_r = \hat e_r\cdot\vec F = m(\ddot r - r\dot\theta^2)$, $F_\theta = \hat e_\theta\cdot\vec F = m(r\ddot\theta + 2\dot r\dot\theta)$. Show that the generalized forces are Qr = Fr and Qθ = rFθ.

1.4 Analyze the errors in the integration of Newton's Laws in the simple Euler's approach described in section 1.4.1, where we approximated the change for x and p in each time interval ∆t between ti and ti+1 by $\dot x(t) \approx \dot x(t_i)$, $\dot p(t) \approx F(x(t_i), v(t_i))$. Assuming F to be differentiable, show that the error which accumulates in a finite time interval T is of order (∆t)¹.

1.5 Write a simple program to integrate the equation of the harmonic oscillator through one period of oscillation, using Euler's method with a step size ∆t. Do this for several ∆t, and see whether the error accumulated in one period meets the expectations of problem 1.4.

1.6 Describe the one dimensional phase space for the logistic equation ṗ = bp − cp², with b > 0, c > 0. Give the fixed points, the invariant sets of states, and describe the flow on each of the invariant sets.

1.7 Consider a pendulum consisting of a mass at the end of a massless rod of length L, the other end of which is fixed but free to rotate. Ignore one of the horizontal directions, and describe the dynamics in terms of the angle θ
between the rod and the downwards direction, without making a small angle approximation.
(a) Find the generalized force Qθ and find the conserved quantity on phase space.
(b) Give a sketch of the velocity function, including all the regions of phase space. Show all fixed points, separatrices, and describe all the invariant sets of states. [Note: the variable θ is defined only modulo 2π, so the phase space is the Cartesian product of an interval of length 2π in θ with the real line for pθ. This can be plotted on a strip, with the understanding that the left and right edges are identified. To avoid having important points on the boundary, it would be well to plot this with θ ∈ [−π/2, 3π/2].]
  • 44. Chapter 2 Lagrange’s and Hamilton’s Equations In this chapter, we consider two reformulations of Newtonian mechan- ics, the Lagrangian and the Hamiltonian formalism. The first is natu- rally associated with configuration space, extended by time, while the latter is the natural description for working in phase space. Lagrange developed his approach in 1764 in a study of the libra- tion of the moon, but it is best thought of as a general method of treating dynamics in terms of generalized coordinates for configuration space. It so transcends its origin that the Lagrangian is considered the fundamental object which describes a quantum field theory. Hamilton’s approach arose in 1835 in his unification of the language of optics and mechanics. It too had a usefulness far beyond its origin, and the Hamiltonian is now most familiar as the operator in quantum mechanics which determines the evolution in time of the wave function. 2.1 Lagrangian Mechanics We begin by deriving Lagrange’s equation as a simple change of co- ordinates in an unconstrained system, one which is evolving according to Newton’s laws with force laws given by some potential. Lagrangian mechanics is also and especially useful in the presence of constraints, so we will then extend the formalism to this more general situation. 37
2.1.1 Derivation for unconstrained systems

For a collection of particles with conservative forces described by a potential, we have in inertial cartesian coordinates
$$m\ddot x_i = F_i.$$
The left hand side of this equation is determined by the kinetic energy function as the time derivative of the momentum $p_i = \partial T/\partial\dot x_i$, while the right hand side is a derivative of the potential energy, $-\partial U/\partial x_i$. As T is independent of xi and U is independent of ẋi in these coordinates, we can write both sides in terms of the Lagrangian L = T − U, which is then a function of both the coordinates and their velocities. Thus we have established
$$\frac{d}{dt}\frac{\partial L}{\partial\dot x_i} - \frac{\partial L}{\partial x_i} = 0,$$
which, once we generalize it to arbitrary coordinates, will be known as Lagrange's equation. This particular combination of T(ṙ) with U(r) to get the more complicated L(r, ṙ) seems an artificial construction for the inertial cartesian coordinates, but it has the advantage of preserving the form of Lagrange's equations for any set of generalized coordinates.

As we did in section 1.3.3, we assume we have a set of generalized coordinates {qj} which parameterize all of coordinate space, so that each point may be described by the {qj} or by the {xi}, i, j ∈ [1, N], and thus each set may be thought of as a function of the other, and time:
$$q_j = q_j(x_1, ..., x_N, t) \qquad x_i = x_i(q_1, ..., q_N, t). \qquad (2.1)$$
We may consider L as a function^1 of the generalized coordinates qj and q̇j, and ask whether the same expression in these coordinates
$$\frac{d}{dt}\frac{\partial L}{\partial\dot q_j} - \frac{\partial L}{\partial q_j}$$

^1 Of course we are not saying that L(x, ẋ, t) is the same function of its coordinates as L(q, q̇, t), but rather that these are two functions which agree at the corresponding physical points. More precisely, we are defining a new function L̃(q, q̇, t) = L(x(q, t), ẋ(q, q̇, t), t), but we are being physicists and neglecting the tilde.
We are treating the Lagrangian here as a scalar under coordinate transfor- mations, in the sense used in general relativity, that its value at a given physical point is unchanged by changing the coordinate system used to define that point.
also vanishes. The chain rule tells us
$$\frac{\partial L}{\partial\dot x_j} = \sum_k \frac{\partial L}{\partial q_k}\frac{\partial q_k}{\partial\dot x_j} + \sum_k \frac{\partial L}{\partial\dot q_k}\frac{\partial\dot q_k}{\partial\dot x_j}. \qquad (2.2)$$
The first term vanishes because qk depends only on the coordinates xk and t, but not on the ẋk. From the inverse relation to (1.10),
$$\dot q_j = \sum_i \frac{\partial q_j}{\partial x_i}\dot x_i + \frac{\partial q_j}{\partial t}, \qquad (2.3)$$
we have
$$\frac{\partial\dot q_j}{\partial\dot x_i} = \frac{\partial q_j}{\partial x_i}.$$
Using this in (2.2),
$$\frac{\partial L}{\partial\dot x_i} = \sum_j \frac{\partial L}{\partial\dot q_j}\frac{\partial q_j}{\partial x_i}. \qquad (2.4)$$
Lagrange's equation involves the time derivative of this. Here what is meant is not a partial derivative ∂/∂t, holding the point in configuration space fixed, but rather the derivative along the path which the system takes as it moves through configuration space. It is called the stream derivative, a name which comes from fluid mechanics, where it gives the rate at which some property defined throughout the fluid, f(r, t), changes for a fixed element of fluid as the fluid as a whole flows. We write it as a total derivative to indicate that we are following the motion rather than evaluating the rate of change at a fixed point in space, as the partial derivative does. For any function f(x, t) of extended configuration space, this total time derivative is
$$\frac{df}{dt} = \sum_j \frac{\partial f}{\partial x_j}\dot x_j + \frac{\partial f}{\partial t}. \qquad (2.5)$$
Using Leibnitz' rule on (2.4) and using (2.5) in the second term, we find
$$\frac{d}{dt}\frac{\partial L}{\partial\dot x_i} = \sum_j \left(\frac{d}{dt}\frac{\partial L}{\partial\dot q_j}\right)\frac{\partial q_j}{\partial x_i} + \sum_j \frac{\partial L}{\partial\dot q_j}\left(\sum_k \frac{\partial^2 q_j}{\partial x_i\partial x_k}\dot x_k + \frac{\partial^2 q_j}{\partial x_i\partial t}\right). \qquad (2.6)$$
On the other hand, the chain rule also tells us
$$\frac{\partial L}{\partial x_i} = \sum_j \frac{\partial L}{\partial q_j}\frac{\partial q_j}{\partial x_i} + \sum_j \frac{\partial L}{\partial\dot q_j}\frac{\partial\dot q_j}{\partial x_i},$$
where the last term does not necessarily vanish, as q̇j in general depends on both the coordinates and velocities. In fact, from (2.3),
$$\frac{\partial\dot q_j}{\partial x_i} = \sum_k \frac{\partial^2 q_j}{\partial x_i\partial x_k}\dot x_k + \frac{\partial^2 q_j}{\partial x_i\partial t},$$
so
$$\frac{\partial L}{\partial x_i} = \sum_j \frac{\partial L}{\partial q_j}\frac{\partial q_j}{\partial x_i} + \sum_j \frac{\partial L}{\partial\dot q_j}\left(\sum_k \frac{\partial^2 q_j}{\partial x_i\partial x_k}\dot x_k + \frac{\partial^2 q_j}{\partial x_i\partial t}\right). \qquad (2.7)$$
Lagrange's equation in cartesian coordinates says (2.6) and (2.7) are equal, and in subtracting them the second terms cancel^2, so
$$0 = \sum_j \left(\frac{d}{dt}\frac{\partial L}{\partial\dot q_j} - \frac{\partial L}{\partial q_j}\right)\frac{\partial q_j}{\partial x_i}.$$
The matrix ∂qj/∂xi is nonsingular, as it has ∂xi/∂qj as its inverse, so we have derived Lagrange's Equation in generalized coordinates:
$$\frac{d}{dt}\frac{\partial L}{\partial\dot q_j} - \frac{\partial L}{\partial q_j} = 0.$$
Thus we see that Lagrange's equations are form invariant under changes of the generalized coordinates used to describe the configuration of the system. It is primarily for this reason that this particular and peculiar combination of kinetic and potential energy is useful. Note that we implicitly assume the Lagrangian itself transformed like a scalar, in that its value at a given physical point of configuration space is independent of the choice of generalized coordinates that describe the point. The change of coordinates itself (2.1) is called a point transformation.

^2 This is why we chose the particular combination we did for the Lagrangian, rather than L = T − αU for some α ≠ 1. Had we done so, Lagrange's equation in cartesian coordinates would have been α d(∂L/∂ẋj)/dt − ∂L/∂xj = 0, and in the subtraction of (2.7) from α×(2.6), the terms proportional to ∂L/∂q̇i (without a time derivative) would not have cancelled.
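Lagrange's equation is easy to spot-check numerically. For L = mẋ²/2 − kx²/2 the known solution x(t) = A cos(ωt), with ω² = k/m, should make d(∂L/∂ẋ)/dt − ∂L/∂x vanish at every t. The sketch below (not from the text; all parameter values are illustrative choices) evaluates that expression with a central finite difference.

```python
import math

# Sketch: numerical check that x(t) = A*cos(omega*t) satisfies
# d/dt(dL/dxdot) - dL/dx = 0 for L = m*xdot**2/2 - k*x**2/2.
# Parameter values are illustrative, not from the text.
m, k, A = 2.0, 8.0, 1.5
omega = math.sqrt(k / m)

def x(t):    return A * math.cos(omega * t)
def xdot(t): return -A * omega * math.sin(omega * t)

def euler_lagrange(t, h=1e-5):
    # d/dt (dL/dxdot) = m * d(xdot)/dt, by central difference
    dp_dt = m * (xdot(t + h) - xdot(t - h)) / (2 * h)
    dL_dx = -k * x(t)
    return dp_dt - dL_dx

print(euler_lagrange(0.3))   # close to zero, as it must be
```

The residual is of order h², the truncation error of the central difference, not an exact zero; shrinking h drives it down until roundoff takes over.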
2.1.2 Lagrangian for Constrained Systems

We now wish to generalize our discussion to include constraints. At the same time we will also consider possibly nonconservative forces. As we mentioned in section 1.3.2, we often have a system with internal forces whose effect is better understood than the forces themselves, with which we may not be concerned. We will assume the constraints are holonomic, expressible as k real functions Φα(r1, ..., rn, t) = 0, which are somehow enforced by constraint forces FiC on the particle i. There may also be other forces, which we will call FiD and will treat as having a dynamical effect. These are given by known functions of the configuration and time, possibly but not necessarily in terms of a potential. This distinction will seem artificial without examples, so it would be well to keep these two in mind. In each of these cases the full configuration space is R³, but the constraints restrict the motion to an allowed subspace of extended configuration space.

1. In section 1.3.2 we discussed a mass on a light rigid rod, the other end of which is fixed at the origin. Thus the mass is constrained to have |r| = L, and the allowed subspace of configuration space is the surface of a sphere, independent of time. The rod exerts the constraint force to avoid compression or expansion. The natural assumption to make is that the force is in the radial direction, and therefore has no component in the direction of allowed motions, the tangential directions. That is, for all allowed displacements, δr, we have F^C · δr = 0, and the constraint force does no work.

2. Consider a bead free to slide without friction on the spoke of a rotating bicycle wheel^3, rotating about a fixed axis at fixed angular velocity ω. That is, for the polar angle θ of inertial coordinates, Φ := θ − ωt = 0 is a constraint^4, but the r coordinate is unconstrained.
Here the allowed subspace is not time independent, but is a helical sort of structure in extended configuration space. We expect the force exerted by the spoke on the bead to be in the êθ

^3 Unlike a real bicycle wheel, we are assuming here that the spoke is directly along a radius of the circle, pointing directly to the axle.
^4 There is also a constraint z = 0.
direction. This is again perpendicular to any virtual displacement, by which we mean an allowed change in configuration at a fixed time. It is important to distinguish this virtual displacement from a small segment of the trajectory of the particle. In this case a virtual displacement is a change in r without a change in θ, and is perpendicular to êθ. So again, we have the "net virtual work" of the constraint forces is zero. It is important to note that this does not mean that the net real work is zero. In a small time interval, the displacement ∆r includes a component rω∆t in the tangential direction, and the force of constraint does do work!

We will assume that the constraint forces in general satisfy this restriction that no net virtual work is done by the forces of constraint for any possible virtual displacement. Newton's law tells us that ṗi = Fi = FiC + FiD. We can multiply by an arbitrary virtual displacement
$$\sum_i \left(\vec F_i^D - \dot{\vec p}_i\right)\cdot\delta\vec r_i = -\sum_i \vec F_i^C\cdot\delta\vec r_i = 0,$$
where the first equality would be true even if δri did not satisfy the constraints, but the second requires δri to be an allowed virtual displacement. Thus
$$\sum_i \left(\vec F_i^D - \dot{\vec p}_i\right)\cdot\delta\vec r_i = 0, \qquad (2.8)$$
which is known as D'Alembert's Principle. This gives an equation which determines the motion on the constrained subspace and does not involve the unspecified forces of constraint F^C. We drop the superscript D from now on.

Suppose we know generalized coordinates q1, ..., qN which parameterize the constrained subspace, which means ri = ri(q1, ..., qN, t), for i = 1, ..., n, are known functions and the N q's are independent. There are N = 3n − k of these independent coordinates, where k is the number of holonomic constraints. Then ∂ri/∂qj is no longer an invertible, or even square, matrix, but we still have
$$\Delta\vec r_i = \sum_j \frac{\partial\vec r_i}{\partial q_j}\Delta q_j + \frac{\partial\vec r_i}{\partial t}\Delta t.$$
2.1. LAGRANGIAN MECHANICS

For the velocity of the particle, divide this by $\Delta t$, giving

    v_i = \sum_j \frac{\partial r_i}{\partial q_j} \dot q_j + \frac{\partial r_i}{\partial t},        (2.9)

but for a virtual displacement $\Delta t = 0$ we have

    \delta r_i = \sum_j \frac{\partial r_i}{\partial q_j} \delta q_j.

Differentiating (2.9) we note that

    \frac{\partial v_i}{\partial \dot q_j} = \frac{\partial r_i}{\partial q_j},        (2.10)

and also

    \frac{\partial v_i}{\partial q_j} = \sum_k \frac{\partial^2 r_i}{\partial q_j \partial q_k} \dot q_k + \frac{\partial^2 r_i}{\partial q_j \partial t} = \frac{d}{dt} \frac{\partial r_i}{\partial q_j},        (2.11)

where the last equality comes from applying (2.5), with coordinates $q_j$ rather than $x_j$, to $f = \partial r_i/\partial q_j$. The first term in the equation (2.8) stating D'Alembert's principle is

    \sum_i F_i \cdot \delta r_i = \sum_j \sum_i F_i \cdot \frac{\partial r_i}{\partial q_j}\, \delta q_j = \sum_j Q_j\, \delta q_j.

The generalized force $Q_j$ has the same form as in the unconstrained case, as given by (1.9), but there are only as many of them as there are unconstrained degrees of freedom. The second term involves

    \sum_i \dot p_i \cdot \delta r_i = \sum_i \frac{dp_i}{dt} \cdot \sum_j \frac{\partial r_i}{\partial q_j}\, \delta q_j
        = \sum_j \frac{d}{dt} \left( \sum_i p_i \cdot \frac{\partial r_i}{\partial q_j} \right) \delta q_j - \sum_{ij} p_i \cdot \frac{d}{dt} \left( \frac{\partial r_i}{\partial q_j} \right) \delta q_j
        = \sum_j \frac{d}{dt} \left( \sum_i p_i \cdot \frac{\partial v_i}{\partial \dot q_j} \right) \delta q_j - \sum_{ij} p_i \cdot \frac{\partial v_i}{\partial q_j}\, \delta q_j
        = \sum_j \left( \frac{d}{dt} \sum_i m_i v_i \cdot \frac{\partial v_i}{\partial \dot q_j} - \sum_i m_i v_i \cdot \frac{\partial v_i}{\partial q_j} \right) \delta q_j
        = \sum_j \left( \frac{d}{dt} \frac{\partial T}{\partial \dot q_j} - \frac{\partial T}{\partial q_j} \right) \delta q_j,
where we used (2.10) and (2.11) to get the third line. Plugging in the expressions we have found for the two terms in D'Alembert's Principle,

    \sum_j \left( \frac{d}{dt}\frac{\partial T}{\partial \dot q_j} - \frac{\partial T}{\partial q_j} - Q_j \right) \delta q_j = 0.

We assumed we had a holonomic system and the $q$'s were all independent, so this equation holds for arbitrary virtual displacements $\delta q_j$, and therefore

    \frac{d}{dt}\frac{\partial T}{\partial \dot q_j} - \frac{\partial T}{\partial q_j} - Q_j = 0.        (2.12)

Now let us restrict ourselves to forces given by a potential, with $F_i = -\nabla_i U(\{r\}, t)$, or

    Q_j = -\sum_i \frac{\partial r_i}{\partial q_j} \cdot \nabla_i U = -\left. \frac{\partial \tilde U(\{q\}, t)}{\partial q_j} \right|_t.

Notice that $Q_j$ depends only on the value of $U$ on the constrained surface. Also, $U$ is independent of the $\dot q_i$'s, so

    \frac{d}{dt}\frac{\partial T}{\partial \dot q_j} - \frac{\partial T}{\partial q_j} + \frac{\partial U}{\partial q_j} = 0 = \frac{d}{dt}\frac{\partial (T-U)}{\partial \dot q_j} - \frac{\partial (T-U)}{\partial q_j},

or

    \frac{d}{dt}\frac{\partial L}{\partial \dot q_j} - \frac{\partial L}{\partial q_j} = 0.        (2.13)

This is Lagrange's equation, which we have now derived in the more general context of constrained systems.

Some examples of the use of Lagrangians

Atwood's machine consists of two blocks of mass $m_1$ and $m_2$ attached by an inextensible cord which suspends them from a pulley of moment of inertia $I$ with frictionless bearings. The kinetic and potential energies are

    T = \frac12 m_1 \dot x^2 + \frac12 m_2 \dot x^2 + \frac12 I \omega^2,
    U = m_1 g x + m_2 g (K - x) = (m_1 - m_2) g x + \text{const},
where we have used the fact that the sum of the heights of the masses is a constant $K$. We assume the cord does not slip on the pulley, so the angular velocity of the pulley is $\omega = \dot x / r$, and

    L = \frac12 \left( m_1 + m_2 + I/r^2 \right) \dot x^2 + (m_2 - m_1) g x,

and Lagrange's equation gives

    \frac{d}{dt}\frac{\partial L}{\partial \dot x} - \frac{\partial L}{\partial x} = 0 = \left( m_1 + m_2 + I/r^2 \right) \ddot x - (m_2 - m_1) g.

Notice that we set up our system in terms of only one degree of freedom, the height of the first mass. This one degree of freedom parameterizes the line which is the allowed subspace of the unconstrained configuration space, a three dimensional space which also has directions corresponding to the angle of the pulley and the height of the second mass. The constraints restrict these three variables because the string has a fixed length and does not slip on the pulley. Note that this formalism has permitted us to solve the problem without solving for the forces of constraint, which in this case are the tensions in the cord on either side of the pulley.

As a second example, reconsider the bead on the spoke of a rotating bicycle wheel. In section (1.3.4) we saw that the kinetic energy is $T = \frac12 m \dot r^2 + \frac12 m r^2 \omega^2$. If there are no forces other than the constraint forces, $U(r, \theta) \equiv 0$, and the Lagrangian is

    L = \frac12 m \dot r^2 + \frac12 m r^2 \omega^2.

The equation of motion for the one degree of freedom is easy enough:

    \frac{d}{dt}\frac{\partial L}{\partial \dot r} = m \ddot r = \frac{\partial L}{\partial r} = m r \omega^2,

which looks like a harmonic oscillator with a negative spring constant, so the solution is a real exponential instead of oscillating,

    r(t) = A e^{-\omega t} + B e^{\omega t}.

The velocity-independent term in $T$ acts just like a potential would, and can in fact be considered the potential for the centrifugal force.
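The runaway solution above is easy to check numerically. The following sketch is not from the text: the values $\omega = 2$, $r(0) = 1$, $\dot r(0) = 0$, and the step size are arbitrary choices for illustration. It integrates $\ddot r = \omega^2 r$ with a velocity-Verlet step and compares the result with $A e^{-\omega t} + B e^{\omega t}$, with $A$ and $B$ fixed by the initial conditions.

```python
import math

# Bead on a spoke of a wheel rotating at fixed omega: r'' = omega^2 * r.
# All numbers below are illustrative choices, not values from the text.
omega = 2.0
r, v = 1.0, 0.0                    # r(0), rdot(0)
A = 0.5 * (r - v / omega)          # coefficients of r(t) = A e^{-wt} + B e^{wt}
B = 0.5 * (r + v / omega)

dt, T = 1e-4, 1.0
t = 0.0
a = omega**2 * r
for _ in range(int(round(T / dt))):
    # velocity-Verlet step for the second-order equation r'' = omega^2 r
    r += v * dt + 0.5 * a * dt**2
    a_new = omega**2 * r
    v += 0.5 * (a + a_new) * dt
    a = a_new
    t += dt

exact = A * math.exp(-omega * t) + B * math.exp(omega * t)
rel_err = abs(r - exact) / exact
```

With these initial conditions the exact value at $t = 1$ is $\cosh\omega \approx 3.76$: the bead runs away exponentially, just as the closed-form solution says.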
But we see that the total energy $T$ is not conserved but blows up as $t \to \infty$, $T \sim m B^2 \omega^2 e^{2\omega t}$. This is because the force of constraint, while it does no virtual work, does do real work.

Finally, let us consider the mass on the end of the gimballed rod. The allowed subspace is the surface of a sphere, which can be parameterized by an azimuthal angle $\phi$ and the polar angle with the upwards direction, $\theta$, in terms of which

    z = \ell \cos\theta, \qquad x = \ell \sin\theta \cos\phi, \qquad y = \ell \sin\theta \sin\phi,

and $T = \frac12 m \ell^2 (\dot\theta^2 + \sin^2\theta\, \dot\phi^2)$. With an arbitrary potential $U(\theta, \phi)$, the Lagrangian becomes

    L = \frac12 m \ell^2 \left( \dot\theta^2 + \sin^2\theta\, \dot\phi^2 \right) - U(\theta, \phi).

From the two independent variables $\theta$, $\phi$ there are two Lagrange equations of motion,

    m \ell^2 \ddot\theta = -\frac{\partial U}{\partial \theta} + \frac12 m \ell^2 \sin(2\theta)\, \dot\phi^2,        (2.14)

    \frac{d}{dt}\left( m \ell^2 \sin^2\theta\, \dot\phi \right) = -\frac{\partial U}{\partial \phi}.        (2.15)

Notice that this is a dynamical system with two coordinates, similar to ordinary mechanics in two dimensions, except that the mass matrix, while diagonal, is coordinate dependent, and the space on which motion occurs is not an infinite flat plane, but a curved two dimensional surface, that of a sphere. These two distinctions are connected: the coordinates enter the mass matrix because it is impossible to describe a curved space with unconstrained cartesian coordinates.

2.1.3 Hamilton's Principle

The configuration of a system at any moment is specified by the value of the generalized coordinates $q_j(t)$, and the space coordinatized by these $q_1, \ldots, q_N$ is the configuration space. The time evolution of the system is given by the trajectory, or motion of the point in configuration space as a function of time, which can be specified by the functions $q_i(t)$.
One can imagine the system taking many paths, whether they obey Newton's Laws or not. We consider only paths for which the $q_i(t)$ are differentiable. Along any such path, we define the action as

    I = \int_{t_1}^{t_2} L(q(t), \dot q(t), t)\, dt.        (2.16)

The action depends on the starting and ending points $q(t_1)$ and $q(t_2)$, but beyond that, the value of the action depends on the path, unlike the work done by a conservative force on a point moving in ordinary space. In fact, it is exactly this dependence on the path which makes this concept useful: Hamilton's principle states that the actual motion of the particle from $q(t_1) = q_i$ to $q(t_2) = q_f$ is along a path $q(t)$ for which the action is stationary. That means that for any small deviation of the path from the actual one, keeping the initial and final configurations fixed, the variation of the action vanishes to first order in the deviation.

To find out where a differentiable function of one variable has a stationary point, we differentiate and solve the equation found by setting the derivative to zero. If we have a differentiable function $f$ of several variables $x_i$, the first-order variation of the function is $\Delta f = \sum_i (x_i - x_{0i})\, \partial f/\partial x_i|_{x_0}$, so unless $\partial f/\partial x_i|_{x_0} = 0$ for all $i$, there is some variation of the $\{x_i\}$ which causes a first order variation of $f$, and then $x_0$ is not a stationary point.

But our action is a functional, a function of functions, which represent an infinite number of variables, even for a path in only one dimension. Intuitively, at each time $q(t)$ is a separate variable, though varying $q$ at only one point makes $\dot q$ hard to interpret. A rigorous mathematician might want to describe the path $q(t)$ on $t \in [0, 1]$ in terms of Fourier series, for which $q(t) = q_0 + q_1 t + \sum_{n=1} a_n \sin(n\pi t)$. Then the functional $I(f)$ given by

    I = \int f(q(t), \dot q(t), t)\, dt

becomes a function of the infinitely many variables $q_0, q_1, a_1, \ldots$.
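A small numerical aside can make this concrete (a sketch, not part of the text: the free-particle Lagrangian $L = \frac12 \dot q^2$, the truncation to three Fourier modes, and the step sizes are all arbitrary choices). Truncating the series turns the action into an ordinary function of $(a_1, a_2, a_3)$, and a finite-difference gradient shows it is stationary at the straight-line path $a_n = 0$:

```python
import math

def action(a, q1=1.0, N=2000):
    """Midpoint-rule approximation to I = integral of (1/2) qdot^2 over [0,1]
    for the path q(t) = q0 + q1*t + sum_n a_n sin(n*pi*t)."""
    dt = 1.0 / N
    total = 0.0
    for k in range(N):
        t = (k + 0.5) * dt
        qdot = q1 + sum(an * (n + 1) * math.pi * math.cos((n + 1) * math.pi * t)
                        for n, an in enumerate(a))
        total += 0.5 * qdot**2 * dt
    return total

# finite-difference gradient dI/da_n at the straight-line path a = (0, 0, 0)
h = 1e-6
straight = [0.0, 0.0, 0.0]
grad = []
for n in range(3):
    up = list(straight); up[n] = h
    dn = list(straight); dn[n] = -h
    grad.append((action(up) - action(dn)) / (2 * h))
```

The gradient comes out numerically zero, and any nonzero $a_n$ raises the action, so the straight-line path is the stationary (here minimal) one, in agreement with Hamilton's principle.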
The endpoints fix $q_0$ and $q_1$, but the stationary condition gives an infinite number of equations $\partial I/\partial a_n = 0$.

It is not really necessary to be so rigorous, however. Under a change $q(t) \to q(t) + \delta q(t)$, the derivative will vary by $\delta \dot q = d\, \delta q(t)/dt$, and the functional $I$ will vary by

    \delta I = \int \left( \frac{\partial f}{\partial q} \delta q + \frac{\partial f}{\partial \dot q} \delta \dot q \right) dt
             = \left. \frac{\partial f}{\partial \dot q} \delta q \right|_i^f + \int \left( \frac{\partial f}{\partial q} - \frac{d}{dt} \frac{\partial f}{\partial \dot q} \right) \delta q\, dt,

where we integrated the second term by parts. The boundary terms each have a factor of $\delta q$ at the initial or final point, which vanish because Hamilton tells us to hold the $q_i$ and $q_f$ fixed, and therefore the functional is stationary if and only if

    \frac{\partial f}{\partial q} - \frac{d}{dt} \frac{\partial f}{\partial \dot q} = 0 \quad \text{for } t \in (t_i, t_f).        (2.17)

We see that if $f$ is the Lagrangian, we get exactly Lagrange's equation. The above derivation is essentially unaltered if we have many degrees of freedom $q_i$ instead of just one.

2.1.4 Examples of functional variation

In this section we will work through some examples of functional variations, both in the context of the action and for other examples not directly related to mechanics.

The falling particle

As a first example of functional variation, consider a particle thrown up in a uniform gravitational field at $t = 0$, which lands at the same spot at $t = T$. The Lagrangian is $L = \frac12 m (\dot x^2 + \dot y^2 + \dot z^2) - mgz$, and the boundary conditions are $x(t) = y(t) = z(t) = 0$ at $t = 0$ and $t = T$. Elementary mechanics tells us the solution to this problem is $x(t) = y(t) \equiv 0$, $z(t) = v_0 t - \frac12 g t^2$ with $v_0 = \frac12 g T$. Let us evaluate the action for any other path, writing $z(t)$ in terms of its deviation from the suspected solution,

    z(t) = \Delta z(t) + \frac12 g T t - \frac12 g t^2.
We make no assumptions about this path other than that it is differentiable and meets the boundary conditions $x = y = \Delta z = 0$ at $t = 0$ and at $t = T$. The action is

    I = \int_0^T \left\{ \frac12 m \left[ \dot x^2 + \dot y^2 + \left( \frac{d\Delta z}{dt} \right)^2 + g (T - 2t) \frac{d\Delta z}{dt} + \frac14 g^2 (T - 2t)^2 \right] - mg \Delta z - \frac12 m g^2 t (T - t) \right\} dt.

The fourth term can be integrated by parts,

    \int_0^T \frac12 m g (T - 2t) \frac{d\Delta z}{dt}\, dt = \left. \frac12 m g (T - 2t) \Delta z \right|_0^T + \int_0^T m g\, \Delta z(t)\, dt.

The boundary term vanishes because $\Delta z = 0$ where it is evaluated, and the other term cancels the sixth term in $I$, so

    I = \int_0^T \frac12 m g^2 \left[ \frac14 (T - 2t)^2 - t (T - t) \right] dt + \int_0^T \frac12 m \left[ \dot x^2 + \dot y^2 + \left( \frac{d\Delta z}{dt} \right)^2 \right] dt.

The first integral is independent of the path, so the minimum action requires the second integral to be as small as possible. But it is an integral of a non-negative quantity, so its minimum is zero, requiring $\dot x = \dot y = d\Delta z/dt = 0$. As $x = y = \Delta z = 0$ at $t = 0$, this tells us $x = y = \Delta z = 0$ at all times, and the path which minimizes the action is the one we expect from elementary mechanics.

Is the shortest path a straight line?

The calculus of variations occurs in other contexts, some of which are more intuitive. The classic example is to find the shortest path between two points in the plane. The length of a path $y(x)$ from $(x_1, y_1)$ to $(x_2, y_2)$ is given⁵ by

    \ell = \int ds = \int_{x_1}^{x_2} \sqrt{1 + \left( \frac{dy}{dx} \right)^2}\, dx.

We see that length is playing the role of the action, and $x$ is playing the role of $t$. Using $\dot y$ to represent $dy/dx$, we have the integrand $f(y, \dot y, x) = \sqrt{1 + \dot y^2}$, and $\partial f/\partial y = 0$, so Eq. 2.17 gives

    \frac{d}{dx} \frac{\partial f}{\partial \dot y} = \frac{d}{dx} \frac{\dot y}{\sqrt{1 + \dot y^2}} = 0, \quad \text{so } \dot y = \text{const},

and the path is a straight line.

⁵Here we are assuming the path is monotone in $x$, without moving somewhere to the left and somewhere to the right. To prove that the straight line is shorter than other paths which might not obey this restriction, do Exercise 2.2.

2.1.5 Conserved Quantities

Ignorable Coordinates

If the Lagrangian does not depend on one coordinate, say $q_k$, then we say it is an ignorable coordinate. Of course, we still want to solve for it, as its derivative may still enter the Lagrangian and affect the evolution of other coordinates. By Lagrange's equation

    \frac{d}{dt} \frac{\partial L}{\partial \dot q_k} = \frac{\partial L}{\partial q_k} = 0,

so if in general we define

    P_k := \frac{\partial L}{\partial \dot q_k}

as the generalized momentum, then in the case that $L$ is independent of $q_k$, $P_k$ is conserved, $dP_k/dt = 0$.

Linear Momentum. As a very elementary example, consider a particle under a force given by a potential which depends only on $y$ and $z$, but not $x$. Then

    L = \frac12 m \left( \dot x^2 + \dot y^2 + \dot z^2 \right) - U(y, z)

is independent of $x$; $x$ is an ignorable coordinate and

    P_x = \frac{\partial L}{\partial \dot x} = m \dot x

is conserved. This is no surprise, of course, because the force is $F = -\nabla U$ and $F_x = -\partial U/\partial x = 0$.

Note that, using the definition of the generalized momenta $P_k = \partial L/\partial \dot q_k$, Lagrange's equation can be written as

    \frac{d}{dt} P_k = \frac{\partial L}{\partial q_k} = \frac{\partial T}{\partial q_k} - \frac{\partial U}{\partial q_k}.

Only the last term enters the definition of the generalized force, so if the kinetic energy depends on the coordinates, as will often be the case, it is not true that $dP_k/dt = Q_k$. In that sense we might say that the generalized momentum and the generalized force have not been defined consistently.

Angular Momentum. As a second example of a system with an ignorable coordinate, consider an axially symmetric system described with inertial polar coordinates $(r, \theta, z)$, with $z$ along the symmetry axis. Extending the form of the kinetic energy we found in sec. (1.3.4) to include the $z$ coordinate, we have $T = \frac12 m \dot r^2 + \frac12 m r^2 \dot\theta^2 + \frac12 m \dot z^2$. The potential is independent of $\theta$, because otherwise the system would not be symmetric about the $z$-axis, so the Lagrangian

    L = \frac12 m \dot r^2 + \frac12 m r^2 \dot\theta^2 + \frac12 m \dot z^2 - U(r, z)

does not depend on $\theta$, which is therefore an ignorable coordinate, and

    P_\theta := \frac{\partial L}{\partial \dot\theta} = m r^2 \dot\theta = \text{constant}.

We see that the conserved momentum $P_\theta$ is in fact the $z$-component of the angular momentum, and is conserved because the axially symmetric potential can exert no torque in the $z$-direction:

    \tau_z = -\left( r \times \nabla U \right)_z = -r \left( \nabla U \right)_\theta = -r \cdot \frac{1}{r} \frac{\partial U}{\partial \theta} = 0.
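A quick numerical illustration (a sketch with arbitrarily chosen parameters, not from the text): integrating planar motion in the axially symmetric potential $U = \frac12 k (x^2 + y^2)$ and monitoring $P_\theta = m r^2 \dot\theta = m(x \dot y - y \dot x)$ shows it stays constant along the trajectory.

```python
# Planar motion in the axially symmetric potential U = (1/2) k (x^2 + y^2).
# All parameter values are arbitrary illustrations, not from the text.
m, k = 1.0, 3.0
x, y = 1.0, 0.0
vx, vy = 0.2, 1.5

def acc(x, y):
    # acceleration from F = -grad U, a central force
    return -k / m * x, -k / m * y

Lz0 = m * (x * vy - y * vx)        # P_theta = m r^2 thetadot at t = 0
dt = 1e-4
ax, ay = acc(x, y)
for _ in range(20000):             # velocity-Verlet integration to t = 2
    x += vx * dt + 0.5 * ax * dt**2
    y += vy * dt + 0.5 * ay * dt**2
    ax2, ay2 = acc(x, y)
    vx += 0.5 * (ax + ax2) * dt
    vy += 0.5 * (ay + ay2) * dt
    ax, ay = ax2, ay2

Lz = m * (x * vy - y * vx)         # P_theta at the end of the run
```

The orbit itself is only approximated by the integrator, but for a central force the Verlet update respects the rotational symmetry, so $L_z$ matches $L_{z0}$ essentially to machine precision.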
Finally, consider a particle in a spherically symmetric potential in spherical coordinates. In section (3.1.2) we will show that the kinetic energy in spherical coordinates is $T = \frac12 m \dot r^2 + \frac12 m r^2 \dot\theta^2 + \frac12 m r^2 \sin^2\theta\, \dot\phi^2$, so the Lagrangian with a spherically symmetric potential is

    L = \frac12 m \dot r^2 + \frac12 m r^2 \dot\theta^2 + \frac12 m r^2 \sin^2\theta\, \dot\phi^2 - U(r).

Again, $\phi$ is an ignorable coordinate and the conjugate momentum $P_\phi$ is conserved. Note, however, that even though the potential is independent of $\theta$ as well, $\theta$ does appear undifferentiated in the Lagrangian, and it is not an ignorable coordinate, nor is $P_\theta$ conserved⁶.

Energy Conservation

We may ask what happens to the Lagrangian along the path of the motion:

    \frac{dL}{dt} = \sum_i \frac{\partial L}{\partial q_i} \frac{dq_i}{dt} + \sum_i \frac{\partial L}{\partial \dot q_i} \frac{d\dot q_i}{dt} + \frac{\partial L}{\partial t}.

In the first term the first factor is $\frac{d}{dt} \frac{\partial L}{\partial \dot q_i}$ by the equations of motion, so

    \frac{dL}{dt} = \frac{d}{dt} \left( \sum_i \dot q_i \frac{\partial L}{\partial \dot q_i} \right) + \frac{\partial L}{\partial t}.

We expect energy conservation when the potential is time invariant and there is no time dependence in the constraints, i.e. when $\partial L/\partial t = 0$, so we rewrite this in terms of

    H(q, \dot q, t) = \sum_i \dot q_i \frac{\partial L}{\partial \dot q_i} - L = \sum_i \dot q_i P_i - L.

⁶It seems curious that we are finding straightforwardly one of the components of the conserved momentum, but not the other two, $L_y$ and $L_x$, which are also conserved. The fact that not all of these emerge as conjugates to ignorable coordinates is related to the fact that the components of the angular momentum do not commute in quantum mechanics. This will be discussed further in section (6.6.1).
Then for the actual motion of the system,

    \frac{dH}{dt} = -\frac{\partial L}{\partial t}.

If $\partial L/\partial t = 0$, $H$ is conserved.

$H$ is essentially the Hamiltonian, although strictly speaking that name is reserved for the function $H(q, p, t)$ on extended phase space rather than the function with arguments $(q, \dot q, t)$. What is $H$ physically? In the case of Newtonian mechanics with a potential function, $L$ is a quadratic function of the velocities $\dot q_i$. If we write the Lagrangian $L = L_2 + L_1 + L_0$ as a sum of pieces purely quadratic, purely linear, and independent of the velocities respectively, then

    \sum_i \dot q_i \frac{\partial}{\partial \dot q_i}

is an operator which multiplies each term by its order in velocities,

    \sum_i \dot q_i \frac{\partial L_n}{\partial \dot q_i} = n L_n, \qquad \sum_i \dot q_i \frac{\partial L}{\partial \dot q_i} = 2 L_2 + L_1,

and $H = L_2 - L_0$. For a system of particles described by their cartesian coordinates, $L_2$ is just the kinetic energy $T$, while $L_0$ is the negative of the potential energy, $L_0 = -U$, so $H = T + U$ is the ordinary energy. As we shall see later, however, there are constrained systems in which the Hamiltonian is conserved but is not the ordinary energy.

2.1.6 Hamilton's Equations

We have written the Lagrangian as a function of $q_i$, $\dot q_i$, and $t$, so it is a function of $N + N + 1$ variables. For a free particle we can write the kinetic energy either as $\frac12 m \dot x^2$ or as $p^2/2m$. More generally, we can⁷ reexpress the dynamics in terms of the $2N + 1$ variables $q_k$, $P_k$, and $t$.

⁷In field theory there arise situations in which the set of functions $P_k(q_i, \dot q_i)$ cannot be inverted to give functions $\dot q_i = \dot q_i(q_j, P_j)$. This gives rise to local gauge invariance, and will be discussed in Chapter 8, but until then we will assume that the phase space $(q, p)$, or cotangent bundle, is equivalent to the tangent bundle, i.e. the space of $(q, \dot q)$.
The motion of the system sweeps out a path in the space $(q, \dot q, t)$ or a path in $(q, P, t)$. Along this line, the variation of $L$ is

    dL = \sum_k \left( \frac{\partial L}{\partial \dot q_k}\, d\dot q_k + \frac{\partial L}{\partial q_k}\, dq_k \right) + \frac{\partial L}{\partial t}\, dt
       = \sum_k \left( P_k\, d\dot q_k + \dot P_k\, dq_k \right) + \frac{\partial L}{\partial t}\, dt,

where for the first term we used the definition of the generalized momentum and in the second we have used the equations of motion $\dot P_k = \partial L/\partial q_k$. Then examining the change in the Hamiltonian $H = \sum_k P_k \dot q_k - L$ along this actual motion,

    dH = \sum_k \left( P_k\, d\dot q_k + \dot q_k\, dP_k \right) - dL
       = \sum_k \left( \dot q_k\, dP_k - \dot P_k\, dq_k \right) - \frac{\partial L}{\partial t}\, dt.

If we think of $\dot q_k$ and $H$ as functions of $q$ and $P$, and think of $H$ as a function of $q$, $P$, and $t$, we see that the physical motion obeys

    \dot q_k = \left. \frac{\partial H}{\partial P_k} \right|_{q,t}, \qquad \dot P_k = -\left. \frac{\partial H}{\partial q_k} \right|_{P,t}, \qquad \left. \frac{\partial H}{\partial t} \right|_{q,P} = -\left. \frac{\partial L}{\partial t} \right|_{q,\dot q}.

The first two constitute Hamilton's equations of motion, which are first order equations for the motion of the point representing the system in phase space.

Let's work out a simple example, the one dimensional harmonic oscillator. Here the kinetic energy is $T = \frac12 m \dot x^2$, the potential energy is $U = \frac12 k x^2$, so $L = \frac12 m \dot x^2 - \frac12 k x^2$, the only generalized momentum is $P = \partial L/\partial \dot x = m \dot x$, and the Hamiltonian is

    H = P \dot x - L = P^2/m - \left( P^2/2m - \tfrac12 k x^2 \right) = P^2/2m + \tfrac12 k x^2.

Note this is just the sum of the kinetic and potential energies, or the total energy. Hamilton's equations give

    \dot x = \left. \frac{\partial H}{\partial P} \right|_x = \frac{P}{m}, \qquad \dot P = -\left. \frac{\partial H}{\partial x} \right|_P = -kx = F.
These two equations verify the usual connection of the momentum and velocity and give Newton's second law.

The identification of $H$ with the total energy is more general than our particular example. If $T$ is purely quadratic in velocities, we can write $T = \frac12 \sum_{ij} M_{ij} \dot q_i \dot q_j$ in terms of a symmetric mass matrix $M_{ij}$. If in addition $U$ is independent of velocities,

    L = \frac12 \sum_{ij} M_{ij} \dot q_i \dot q_j - U(q),
    P_k = \frac{\partial L}{\partial \dot q_k} = \sum_i M_{ki} \dot q_i,

which as a matrix equation in an $n$-dimensional space is $P = M \cdot \dot q$. Assuming $M$ is invertible,⁸ we also have $\dot q = M^{-1} \cdot P$, so

    H = P^T \cdot \dot q - L
      = P^T \cdot M^{-1} \cdot P - \left( \frac12 \dot q^T \cdot M \cdot \dot q - U(q) \right)
      = P^T \cdot M^{-1} \cdot P - \frac12 P^T \cdot M^{-1} \cdot M \cdot M^{-1} \cdot P + U(q)
      = \frac12 P^T \cdot M^{-1} \cdot P + U(q) = T + U,

so we see that the Hamiltonian is indeed the total energy under these circumstances.

⁸If $M$ were not invertible, there would be a linear combination of velocities which does not affect the Lagrangian. The degree of freedom corresponding to this combination would have a Lagrange equation without time derivatives, so it would be a constraint equation rather than an equation of motion. But we are assuming that the $q$'s are a set of independent generalized coordinates that have already been pruned of all constraints.
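Hamilton's equations for the oscillator are easy to integrate numerically. The sketch below is illustrative only (the values $m = 2$, $k = 8$ and the step size are arbitrary choices, not from the text); it uses a semi-implicit (symplectic) Euler step, which keeps $H = P^2/2m + \frac12 k x^2$ nearly constant, and compares $x(t)$ with the exact solution $\cos(\omega t)$ for $x(0) = 1$, $P(0) = 0$.

```python
import math

# One-dimensional harmonic oscillator via Hamilton's equations:
#   xdot = dH/dP = P/m,   Pdot = -dH/dx = -k x.
# Parameter values are arbitrary illustrations (omega = sqrt(k/m) = 2).
m, k = 2.0, 8.0
x, P = 1.0, 0.0
H0 = P**2 / (2 * m) + 0.5 * k * x**2

dt = 1e-4
t = 0.0
for _ in range(10000):             # integrate to t = 1
    # semi-implicit (symplectic) Euler: update momentum first, then position
    P += -k * x * dt
    x += P / m * dt
    t += dt

H1 = P**2 / (2 * m) + 0.5 * k * x**2
omega = math.sqrt(k / m)
x_exact = math.cos(omega * t)      # exact solution for x(0)=1, P(0)=0
```

Because the update is symplectic, the energy stays within a bounded $O(\Delta t)$ band of its initial value instead of drifting, which is why such integrators are the standard choice for Hamiltonian systems.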
2.1.7 Velocity-dependent forces

We have concentrated thus far on Newtonian mechanics with a potential given as a function of coordinates only. As the potential is a piece of the Lagrangian, which may depend on velocities as well, we should also entertain the possibility of velocity-dependent potentials. Only by considering such a potential can we possibly find velocity-dependent forces, and one of the most important force laws in physics is of that form. This is the Lorentz force⁹ on a particle of charge $q$ in the presence of electromagnetic fields $E(r, t)$ and $B(r, t)$,

    F = q \left( E + \frac{v}{c} \times B \right).        (2.18)

If the motion of a charged particle is described by Lagrangian mechanics with a potential $U(r, v, t)$, Lagrange's equation says

    0 = \frac{d}{dt} \frac{\partial L}{\partial v_i} - \frac{\partial L}{\partial r_i} = m \ddot r_i - \frac{d}{dt} \frac{\partial U}{\partial v_i} + \frac{\partial U}{\partial r_i}, \quad \text{so} \quad F_i = \frac{d}{dt} \frac{\partial U}{\partial v_i} - \frac{\partial U}{\partial r_i}.

We want a force linear in $v$ and proportional to $q$, so let us try $U = q\left( \phi(r, t) + v \cdot C(r, t) \right)$. Then we need to have

    E + \frac{v}{c} \times B = \frac{d}{dt} C - \nabla \phi - \nabla \left( \sum_j v_j C_j \right).        (2.19)

The first term is a stream derivative evaluated at the time-dependent position of the particle, so, as in Eq. (2.5),

    \frac{d}{dt} C = \frac{\partial C}{\partial t} + \sum_j v_j \frac{\partial C}{\partial x_j}.

The last term looks like the last term of (2.19), except that the indices on the derivative operator and on $C$ have been reversed. This suggests that these two terms combine to form a cross product. Indeed, noting (B.10) that

    v \times \left( \nabla \times C \right) = \nabla \left( \sum_j v_j C_j \right) - \sum_j v_j \frac{\partial C}{\partial x_j},

⁹We have used Gaussian units here, but those who prefer S.I. units (rationalized MKS) can simply set $c = 1$.
we see that (2.19) becomes

    E + \frac{v}{c} \times B = \frac{\partial C}{\partial t} - \nabla \phi - \nabla \left( \sum_j v_j C_j \right) + \sum_j v_j \frac{\partial C}{\partial x_j} = \frac{\partial C}{\partial t} - \nabla \phi - v \times \left( \nabla \times C \right).

We have successfully generated the term linear in $v$ if we can show that there exists a vector field $C(r, t)$ such that $B = -c \nabla \times C$. A curl is always divergenceless, so this requires $\nabla \cdot B = 0$, but this is indeed one of Maxwell's equations, and it ensures¹⁰ there exists a vector field $A$, known as the magnetic vector potential, such that $B = \nabla \times A$. Thus with $C = -A/c$, we need only to find a $\phi$ such that

    E = -\nabla \phi - \frac{1}{c} \frac{\partial A}{\partial t}.

Once again, one of Maxwell's laws,

    \nabla \times E + \frac{1}{c} \frac{\partial B}{\partial t} = 0,

guarantees the existence of $\phi$, the electrostatic potential, because after inserting $B = \nabla \times A$, this is a statement that $E + (1/c)\,\partial A/\partial t$ has no curl, and is the gradient of something. Thus we see that the Lagrangian which describes the motion of a charged particle in an electromagnetic field is given by a velocity-dependent potential

    U(r, v) = q \left( \phi(r, t) - \frac{v}{c} \cdot A(r, t) \right).

Note, however, that this Lagrangian describes only the motion of the charged particle, and not the dynamics of the field itself.

¹⁰This is but one of many consequences of the Poincaré lemma, discussed in section 6.5 (well, it should be). The particular forms we are using here state that if $\nabla \cdot B = 0$ and $\nabla \times F = 0$ in all of $\mathbb{R}^3$, then there exist a scalar function $\phi$ and a vector field $A$ such that $B = \nabla \times A$ and $F = \nabla \phi$.

Arbitrariness in the Lagrangian. In this discussion of finding the Lagrangian to describe the Lorentz force, we used the lemma that guaranteed that the divergenceless magnetic field $B$ can be written in terms of some magnetic vector potential $A$, with $B = \nabla \times A$. But $A$ is not uniquely specified by $B$; in fact, if a change is made, $A \to A + \nabla \lambda(r, t)$, $B$ is unchanged because the curl of a gradient vanishes. The electric field $E$ will be changed by $-(1/c)\,\partial \nabla \lambda/\partial t$, however, unless we also make a change in the electrostatic potential, $\phi \to \phi - (1/c)\,\partial \lambda/\partial t$. If we do, we have completely unchanged electromagnetic fields, which is where the physics lies. This change in the potentials,

    A \to A + \nabla \lambda(r, t), \qquad \phi \to \phi - \frac{1}{c} \frac{\partial \lambda}{\partial t},        (2.20)

is known as a gauge transformation, and the invariance of the physics under this change is known as gauge invariance. Under this change, the potential $U$ and the Lagrangian are not unchanged:

    L \to L - q \left( \delta\phi - \frac{v}{c} \cdot \delta A \right) = L + \frac{q}{c} \left( \frac{\partial \lambda}{\partial t} + v \cdot \nabla \lambda(r, t) \right) = L + \frac{q}{c} \frac{d\lambda}{dt}.

We have here an example which points out that there is not a unique Lagrangian which describes a given physical problem, and the ambiguity is more than just the arbitrary constant we always knew was involved in the potential energy. This ambiguity is quite general, not depending on the gauge transformations of Maxwell fields. In general, if

    L^{(2)}(q_j, \dot q_j, t) = L^{(1)}(q_j, \dot q_j, t) + \frac{d}{dt} f(q_j, t),        (2.21)

then $L^{(1)}$ and $L^{(2)}$ give the same equations of motion, and therefore the same physics, for $q_j(t)$. While this can be easily checked by evaluating the Lagrange equations, it is best understood in terms of the variation of the action. For any path $q_j(t)$ between $q_{jI}$ at $t = t_I$ to $q_{jF}$ at $t = t_F$, the two actions are related by

    S^{(2)} = \int_{t_I}^{t_F} \left[ L^{(1)}(q_j, \dot q_j, t) + \frac{d}{dt} f(q_j, t) \right] dt = S^{(1)} + f(q_{jF}, t_F) - f(q_{jI}, t_I).

The variation of path that one makes to find the stationary action does not change the endpoints $q_{jF}$ and $q_{jI}$, so the difference $S^{(2)} - S^{(1)}$ is a constant independent of the trajectory, and a stationary trajectory for $S^{(2)}$ is clearly stationary for $S^{(1)}$ as well.

The conjugate momenta are affected by the change in Lagrangian, however, because $L^{(2)} = L^{(1)} + \sum_j \dot q_j\, \partial f/\partial q_j + \partial f/\partial t$, so

    p_j^{(2)} = \frac{\partial L^{(2)}}{\partial \dot q_j} = p_j^{(1)} + \frac{\partial f}{\partial q_j}.

This ambiguity is not usually mentioned in elementary mechanics, because if we restrict our attention to Lagrangians consisting of canonical kinetic energy and potentials which are velocity-independent, a change (2.21) to a Lagrangian $L^{(1)}$ of this type will produce an $L^{(2)}$ which is not of this type, unless $f$ is independent of position $q$ and leaves the momenta unchanged.

Dissipation. Another familiar force which is velocity dependent is friction. Even the "constant" sliding friction met with in elementary courses depends on the direction, if not the magnitude, of the velocity. Friction in a viscous medium is often taken to be a force proportional to the velocity, $F = -\alpha v$. We saw above that a potential linear in velocities produces a force perpendicular to $v$, and a term higher order in velocities will contribute to the acceleration. This situation cannot be handled by Lagrange's equations. An extension to the Lagrange formalism, involving Rayleigh's dissipation function, is discussed in Ref. [4].

Exercises

2.1 (Galilean relativity): Sally is sitting in a railroad car observing a system of particles, using a Cartesian coordinate system so that the particles are at positions $r_i^{(S)}(t)$, and move under the influence of a potential $U^{(S)}(\{r_i^{(S)}\})$. Thomas is in another railroad car, moving with constant velocity $u$ with respect to Sally, and so he describes the position of each particle as $r_i^{(T)}(t) = r_i^{(S)}(t) - ut$. Each takes the kinetic energy to be of the standard form in his system, i.e. $T^{(S)} = \frac12 \sum m_i \left( \dot r_i^{(S)} \right)^2$ and $T^{(T)} = \frac12 \sum m_i \left( \dot r_i^{(T)} \right)^2$.
(a) Show that if Thomas assumes the potential function $U^{(T)}(r^{(T)})$ to be the same as Sally's at the same physical points,

    U^{(T)}(r^{(T)}) = U^{(S)}(r^{(T)} + ut),        (2.22)

then the equations of motion derived by Sally and Thomas describe the same physics. That is, if $r_i^{(S)}(t)$ is a solution of Sally's equations, $r_i^{(T)}(t) = r_i^{(S)}(t) - ut$ is a solution of Thomas'.
(b) Show that if $U^{(S)}(\{r_i\})$ is a function only of the displacements of one particle from another, $\{r_i - r_j\}$, then $U^{(T)}$ is the same function of its arguments as $U^{(S)}$, $U^{(T)}(\{r_i\}) = U^{(S)}(\{r_i\})$. This is a different statement than Eq. 2.22, which states that they agree at the same physical configuration. Show it will not generally be true if $U^{(S)}$ is not restricted to depend only on the differences in positions.
(c) If it is true that $U^{(S)}(r) = U^{(T)}(r)$, show that Sally and Thomas derive the same equations of motion, which we call "form invariance" of the equations.
(d) Show that nonetheless Sally and Thomas disagree on the energy of a particular physical motion, and relate the difference to the total momentum. Which of these quantities are conserved?

2.2 In order to show that the shortest path in two dimensional Euclidean space is a straight line without making the assumption that $\Delta x$ does not change sign along the path, we can consider using a parameter $\lambda$ and describing the path by two functions $x(\lambda)$ and $y(\lambda)$, say with $\lambda \in [0, 1]$. Then

    \ell = \int_0^1 d\lambda\, \sqrt{\dot x^2(\lambda) + \dot y^2(\lambda)},

where $\dot x$ means $dx/d\lambda$. This is of the form of a variational integral with two variables. Show that the variational equations do not determine the functions $x(\lambda)$ and $y(\lambda)$, but do determine that the path is a straight line. Show that the pair of functions $(x(\lambda), y(\lambda))$ gives the same action as another pair $(\tilde x(\lambda), \tilde y(\lambda))$, where $\tilde x(\lambda) = x(t(\lambda))$ and $\tilde y(\lambda) = y(t(\lambda))$, where $t(\lambda)$ is any monotone function mapping $[0, 1]$ onto itself.
Explain why this equality of the lengths is obvious in terms of alternate parameterizations of the path. [In field theory, this is an example of a local gauge invariance, and plays a major role in string theory.]

2.3 Consider a circular hoop of radius $R$ rotating about a vertical diameter at a fixed angular velocity $\Omega$. On the hoop there is a bead of mass $m$, which slides without friction on the hoop. The only external force is gravity. Derive the Lagrangian and the Lagrange equation using the polar angle $\theta$ as the unconstrained generalized coordinate. Find a conserved quantity, and find the equilibrium points, for which $\dot\theta = 0$. Find the condition on $\Omega$ such that there is an equilibrium point away from the axis.

2.4 Early steam engines had a feedback device, called a governor, to automatically control the speed. The engine rotated a vertical shaft with an angular velocity $\Omega$ proportional to its speed. On opposite sides of this shaft, two hinged rods each held a metal weight, which was attached to another such rod hinged to a sliding collar, as shown. As the shaft rotates faster, the balls move outwards, the collar rises and uncovers a hole, releasing some steam. Assume all hinges are frictionless, the rods massless, and each ball has mass $m_1$ and the collar has mass $m_2$.

[Figure: governor for a steam engine; each rod has length $L$.]

(a) Write the Lagrangian in terms of the generalized coordinate $\theta$.
(b) Find the equilibrium angle $\theta$ as a function of the shaft angular velocity $\Omega$. Tell whether the equilibrium is stable or not.

2.5 A cylinder of radius $R$ is held horizontally in a fixed position, and a smaller uniform cylindrical disk of radius $a$ is placed on top of the first cylinder, and is released from rest. There is a coefficient of static friction $\mu_s$ and a coefficient of kinetic friction $\mu_k < \mu_s$ for the contact between the cylinders. As the equilibrium at the top is unstable, the top cylinder will begin to roll on the bottom cylinder.
[Figure: a small cylinder of radius $a$ rolling on a fixed larger cylinder of radius $R$; the angle $\theta$ is measured from the top.]

(a) If $\mu_s$ is sufficiently large, the small disk will roll until it separates from the fixed cylinder. Find the angle $\theta$ at which the separation occurs, and find the minimum value of $\mu_s$ for which this situation holds.
(b) If $\mu_s$ is less than the minimum value found above, what happens differently, and at what angle $\theta$ does this different behavior begin?

2.6 (a) Show that if $\Phi(q_1, \ldots, q_n, t)$ is an arbitrary differentiable function on extended configuration space, and $L^{(1)}(\{q_i\}, \{\dot q_j\}, t)$ and $L^{(2)}(\{q_i\}, \{\dot q_j\}, t)$ are two Lagrangians which differ by the total time derivative of $\Phi$,

    L^{(1)}(\{q_i\}, \{\dot q_j\}, t) = L^{(2)}(\{q_i\}, \{\dot q_j\}, t) + \frac{d}{dt} \Phi(q_1, \ldots, q_n, t),

show by explicit calculations that the equations of motion determined by $L^{(1)}$ are the same as the equations of motion determined by $L^{(2)}$.
(b) What is the relationship between the momenta $p_i^{(1)}$ and $p_i^{(2)}$ determined by these two Lagrangians respectively?

2.7 A particle of mass $m$ lies on a frictionless horizontal table with a tiny hole in it. An inextensible massless string attached to $m$ goes through the hole and is connected to another particle of mass $M$, which moves vertically only. Give a full set of generalized unconstrained coordinates and write the Lagrangian in terms of these. Assume the string remains taut at all times and that the motions in question never have either particle reaching the hole, and there is no friction of the string sliding at the hole. Are there ignorable coordinates? Reduce the problem to a single second order differential equation.

2.8 Consider a mass $m$ on the end of a massless rigid rod of length $\ell$, the other end of which is free to rotate about a fixed point. This is a spherical pendulum. Find the Lagrangian and the equations of motion.
2.9 (a) Find a differential equation for θ(φ) for the shortest path on the surface of a sphere between two arbitrary points on that surface, by minimizing the length of the path, assuming it to be monotone in φ.
(b) By geometrical argument (that it must be a great circle) argue that the path should satisfy

$$\cos(\phi - \phi_0) = K \cot\theta,$$

and show that this is indeed the solution of the differential equation you derived.

2.10 (a) Find the canonical momenta for a charged particle moving in an electromagnetic field and also under the influence of a non-electromagnetic force described by a potential U(r).
(b) If the electromagnetic field is a constant magnetic field $\vec B = B_0 \hat e_z$, with no electric field and with U(r) = 0, what conserved quantities are there?
Chapter 3
Two Body Central Forces

Consider two particles of masses $m_1$ and $m_2$, with the only forces those of their mutual interaction, which we assume is given by a potential which is a function only of the distance between them, $U(|\vec r_1 - \vec r_2|)$. In a mathematical sense this is a very strong restriction, but it applies very nicely to many physical situations. The classical case is the motion of a planet around the Sun, ignoring the effects mentioned at the beginning of the book. But it also applies to electrostatic forces and to many effective representations of nonrelativistic interparticle forces.

3.1 Reduction to a one dimensional problem

Our original problem has six degrees of freedom, but because of the symmetries in the problem, many of these can be simply separated and solved for, reducing the problem to a mathematically equivalent problem of a single particle moving in one dimension. First we reduce it to a one-body problem, and then we reduce the dimensionality.
3.1.1 Reduction to a one-body problem

As there are no external forces, we expect the center of mass coordinate to be in uniform motion, and it behooves us to use

$$\vec R = \frac{m_1 \vec r_1 + m_2 \vec r_2}{m_1 + m_2}$$

as three of our generalized coordinates. For the other three, we first use the cartesian components of the relative coordinate $\vec r := \vec r_2 - \vec r_1$, although we will soon change to spherical coordinates for this vector. In terms of $\vec R$ and $\vec r$, the particle positions are

$$\vec r_1 = \vec R - \frac{m_2}{M}\vec r, \qquad \vec r_2 = \vec R + \frac{m_1}{M}\vec r, \qquad\text{where } M = m_1 + m_2.$$

The kinetic energy is

$$T = \tfrac12 m_1 \dot{\vec r}_1^{\,2} + \tfrac12 m_2 \dot{\vec r}_2^{\,2}
= \tfrac12 m_1\left(\dot{\vec R} - \frac{m_2}{M}\dot{\vec r}\right)^2 + \tfrac12 m_2\left(\dot{\vec R} + \frac{m_1}{M}\dot{\vec r}\right)^2
= \tfrac12 (m_1 + m_2)\dot{\vec R}^2 + \tfrac12 \frac{m_1 m_2}{M}\dot{\vec r}^2
= \tfrac12 M \dot{\vec R}^2 + \tfrac12 \mu \dot{\vec r}^2,$$

where

$$\mu := \frac{m_1 m_2}{m_1 + m_2}$$

is called the reduced mass. Thus the kinetic energy is transformed to the form for two effective particles of mass M and µ, which is neither simpler nor more complicated than it was in the original variables.

For the potential energy, however, the new variables are to be preferred, for $U(|\vec r_1 - \vec r_2|) = U(|\vec r|)$ is independent of $\vec R$, whose three components are therefore ignorable coordinates, and their conjugate momenta

$$P_{\mathrm{cm}\,i} = \frac{\partial(T - U)}{\partial \dot R_i} = M\dot R_i$$
are conserved. This reduces half of the motion to triviality, leaving an effective one-body problem with $T = \tfrac12\mu\dot{\vec r}^2$ and the given potential U(r).

We have not yet made use of the fact that U only depends on the magnitude of $\vec r$. In fact, the above reduction applies to any two-body system without external forces, as long as Newton's Third Law holds.

3.1.2 Reduction to one dimension

In the problem under discussion, however, there is the additional restriction that the potential depends only on the magnitude of $\vec r$, that is, on the distance between the two particles, and not on the direction of $\vec r$. Thus we now convert from cartesian to spherical coordinates (r, θ, φ) for $\vec r$. In terms of the cartesian coordinates (x, y, z),

$$\begin{aligned}
r &= (x^2 + y^2 + z^2)^{1/2}, & x &= r\sin\theta\cos\phi,\\
\theta &= \cos^{-1}(z/r), & y &= r\sin\theta\sin\phi,\\
\phi &= \tan^{-1}(y/x), & z &= r\cos\theta.
\end{aligned}$$

Plugging into the kinetic energy is messy but eventually reduces to a rather simple form:

$$\begin{aligned}
T &= \tfrac12\mu\left(\dot x_1^2 + \dot x_2^2 + \dot x_3^2\right)\\
&= \tfrac12\mu\Big[(\dot r\sin\theta\cos\phi + \dot\theta r\cos\theta\cos\phi - \dot\phi r\sin\theta\sin\phi)^2\\
&\qquad\quad + (\dot r\sin\theta\sin\phi + \dot\theta r\cos\theta\sin\phi + \dot\phi r\sin\theta\cos\phi)^2\\
&\qquad\quad + (\dot r\cos\theta - \dot\theta r\sin\theta)^2\Big]\\
&= \tfrac12\mu\left(\dot r^2 + r^2\dot\theta^2 + r^2\sin^2\theta\,\dot\phi^2\right). \qquad (3.1)
\end{aligned}$$

Notice that in spherical coordinates T is a function of r and θ as well as $\dot r$, $\dot\theta$, and $\dot\phi$, but it is not a function of φ, which is therefore an ignorable coordinate, and

$$P_\phi = \frac{\partial L}{\partial\dot\phi} = \mu r^2\sin^2\theta\,\dot\phi = \text{constant}.$$

Note that $r\sin\theta$ is the distance of the particle from the z-axis, so $P_\phi$ is just the z-component of the angular momentum, $L_z$. Of course all
of $\vec L = \vec r \times \vec p$ is conserved, because in our effective one-body problem there is no torque about the origin. Thus $\vec L$ is a constant¹, and the motion must remain in a plane perpendicular to $\vec L$ and passing through the origin, as a consequence of the fact that $\vec r \perp \vec L$. It simplifies things if we choose our coordinates so that $\vec L$ is in the z-direction. Then $\theta = \pi/2$, $\dot\theta = 0$, $L = \mu r^2\dot\phi$. The r equation of motion is then

$$\mu\ddot r - \mu r\dot\phi^2 + \frac{dU}{dr} = 0 = \mu\ddot r - \frac{L^2}{\mu r^3} + \frac{dU}{dr}.$$

This is the one-dimensional motion of a body in an effective potential

$$U_{\rm eff}(r) = U(r) + \frac{L^2}{2\mu r^2}.$$

Thus we have reduced a two-body three-dimensional problem to one with a single degree of freedom, without any additional complication except the addition of a centrifugal barrier term $L^2/2\mu r^2$ to the potential.

Before we proceed, a comment may be useful in retrospect about the reduction in variables in going from the three dimensional one-body problem to a one dimensional problem. Here we reduced the phase space from six variables to two, in a problem which had four conserved quantities, $\vec L$ and H. But we have not yet used the conservation of H in this reduction; we have only used the three conserved quantities $\vec L$. Where have these dimensions gone? From $\vec L$ conservation, by choosing our axes with $\vec L \parallel z$, the two constraints $L_x = 0$ and $L_y = 0$ (with $L_z \neq 0$) do imply $z = p_z = 0$, thereby eliminating two of the coordinates of phase space. The conservation of $L_z$, however, is a consequence of an ignorable coordinate φ, with conserved conjugate momentum $P_\phi = L_z$. In this case, not only is the corresponding momentum restricted to a constant value, eliminating one dimension of variation in phase space, but the corresponding coordinate, φ, while not fixed, drops out of consideration because it does not appear in the remaining one dimensional

¹If $\vec L = 0$, $\vec p$ and $\vec r$ are in the same direction, to which the motion is then confined.
In this case it is more appropriate to use Cartesian coordinates with this direction as x, reducing the problem to a one-dimensional problem with potential U(x) = U(r = |x|). In the rest of this chapter we assume $\vec L \neq 0$.
problem. This is generally true for an ignorable coordinate: the corresponding momentum becomes a time-constant parameter, and the coordinate disappears from the remaining problem.

3.2 Integrating the motion

We can simplify the problem even more by using the one conservation law left, that of energy. Because the energy of the effective motion is a constant,

$$E = \tfrac12\mu\dot r^2 + U_{\rm eff}(r) = \text{constant},$$

we can immediately solve for

$$\frac{dr}{dt} = \pm\left(\frac{2}{\mu}\left(E - U_{\rm eff}(r)\right)\right)^{1/2}.$$

This can be inverted and integrated over r, to give

$$t = t_0 \pm \int \frac{dr}{\sqrt{2\left(E - U_{\rm eff}(r)\right)/\mu}}, \qquad (3.2)$$

which is the inverse function of the solution to the radial motion problem r(t). We can also find the orbit, because

$$\frac{d\phi}{dr} = \frac{\dot\phi}{dr/dt} = \frac{L}{\mu r^2}\,\frac{dt}{dr},$$

so

$$\phi = \phi_0 \pm L\int_{r_0}^{r} \frac{dr}{r^2\sqrt{2\mu\left(E - U_{\rm eff}(r)\right)}}. \qquad (3.3)$$

The sign ambiguity from the square root is only because r may be increasing or decreasing, but time, and usually φ/L, are always increasing. Qualitative features of the motion are largely determined by the range over which the argument of the square root is positive, as for
other values of r we would have imaginary velocities. Thus the motion is restricted to this allowed region.

Unless L = 0 or the potential U(r) is very strongly attractive for small r, the centrifugal barrier will dominate, so $U_{\rm eff} \to +\infty$ as $r \to 0$, and there must be a smallest radius $r_p > 0$ for which $E \geq U_{\rm eff}$. Generically the force will not vanish there, so $E - U_{\rm eff} \approx c(r - r_p)$ for $r \approx r_p$, and the integrals in (3.2) and (3.3) are convergent. Thus an incoming orbit reaches $r = r_p$ at a finite time and finite angle, and the motion then continues with r increasing and the ± signs reversed. The radius $r_p$ is called a turning point of the motion. If there is also a maximum value of r for which the velocity is real, it is also a turning point, and an outgoing orbit will reach this maximum and then r will start to decrease, confining the orbit to the allowed values of r.

If there are both minimum and maximum values, this interpretation of Eq. (3.3) gives φ as a multiple valued function of r, with an "inverse" r(φ) which is a periodic function of φ. But there is no particular reason for this period to be the geometrically natural periodicity 2π of φ, so that different values of r may be expected in successive passes through the same angle in the plane of the motion. There would need to be something very special about the attractive potential for the period to turn out to be just 2π, but indeed that is the case for Newtonian gravity.

We have reduced the problem of the motion to doing integrals. In general that is all we can do explicitly, but in some cases we can do the integral analytically, and two of these special cases are very important physically.

3.2.1 The Kepler problem

Consider first the force of Newtonian gravity, or equivalently the Coulomb attraction of unlike charged particles. The force $F(r) = -K/r^2$ has a potential

$$U(r) = -\frac{K}{r}.$$
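As a numerical illustration (not from the text) of the orbit integral (3.3) and of how the turning-point singularities are handled, the following sketch evaluates φ from perigee to apogee for the gravitational potential, with assumed values µ = K = 1, a = 1, e = 0.5 (so E = −1/2, L² = 3/4, r_p = 1/2, r_a = 3/2). The substitution $r = r_p + (r_a - r_p)\sin^2 x$ tames the inverse-square-root singularities at both ends, and the midpoint rule never evaluates the integrand exactly at a turning point. For gravity the result should be π, the apsidal angle.

```python
import math

# Illustrative Kepler parameters (mu = K = 1, assumed for this sketch):
# a = 1, e = 0.5  =>  E = -0.5, L^2 = 0.75, turning points rp = 0.5, ra = 1.5.
mu, K, E, L = 1.0, 1.0, -0.5, math.sqrt(0.75)
rp, ra = 0.5, 1.5

def Ueff(r):
    # effective potential U(r) + L^2 / (2 mu r^2) with U = -K/r
    return -K / r + L**2 / (2 * mu * r**2)

# phi integral, Eq. (3.3), with r = rp + (ra - rp) sin^2(x); the midpoint
# rule avoids the endpoints, where the raw integrand diverges.
N = 20000
h = (math.pi / 2) / N
phi = 0.0
for i in range(N):
    x = (i + 0.5) * h
    r = rp + (ra - rp) * math.sin(x) ** 2
    dr_dx = (ra - rp) * math.sin(2 * x)
    phi += L * dr_dx / (r**2 * math.sqrt(2 * mu * (E - Ueff(r)))) * h

print(abs(phi - math.pi) < 1e-4)   # angle from perigee to apogee is pi
```

The same quadrature works for any $U_{\rm eff}$ whose turning points are simple zeros of $E - U_{\rm eff}$.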
Then the φ integral is

$$\phi = \phi_0 \pm \int \frac{L}{\mu r^2}\left[\frac{2}{\mu}\left(E + \frac{K}{r} - \frac{L^2}{2\mu r^2}\right)\right]^{-1/2} dr
= \phi_0 \pm \int \frac{du}{\sqrt{\gamma + \alpha u - u^2}}, \qquad (3.4)$$

where we have made the variable substitution u = 1/r, which simplifies the form, and have introduced the abbreviations $\gamma = 2\mu E/L^2$, $\alpha = 2K\mu/L^2$.

As dφ/dr must be real, the motion will clearly be confined to regions for which the argument of the square root is nonnegative, and the motion in r will reverse at the turning points where the argument vanishes. The argument is clearly negative as u → ∞, which is r = 0. We have assumed L ≠ 0, so the angular momentum barrier dominates over the Coulomb attraction, and always prevents the particle from reaching the origin. Thus there is always at least one turning point, $u_{\max}$, corresponding to the minimum distance $r_{\min}$. Then the argument of the square root must factor into $[-(u - u_{\max})(u - u_{\min})]$, although if $u_{\min}$ is negative it is not really the minimum u, which can never get past zero. The integral (3.4) can be done² with the substitution $\sin^2\beta = (u_{\max} - u)/(u_{\max} - u_{\min})$. This shows $\phi = \phi_0 \pm 2\beta$, where $\phi_0$ is the angle at $r = r_{\min}$, $u = u_{\max}$. Then

$$u \equiv \frac1r = A\cos(\phi - \phi_0) + B,$$

where A and B are constants which could be followed from our sequence of substitutions, but are better evaluated in terms of the conserved quantities E and L directly. $\phi = \phi_0$ corresponds to the minimum r, $r = r_p$, the point of closest approach, or perigee³, so $r_p^{-1} = A + B$, and A > 0. Let $\theta = \phi - \phi_0$ be the angle from this minimum, with the x

²Of course it can also be done by looking in a good table of integrals. For example, see 2.261(c) of Gradshtein and Ryzhik[5].
³Perigee is the correct word if the heavier of the two is the Earth, perihelion if it is the sun, periastron for some other star. Pericenter is also used, but not as generally as it ought to be.
axis along θ = 0. Then

$$\frac1r = A\cos\theta + B = \frac{1}{r_p}\left(1 - \frac{e}{1+e}(1 - \cos\theta)\right) = \frac{1}{r_p}\,\frac{1 + e\cos\theta}{1+e},$$

where e = A/B. What is this orbit? Clearly $r_p$ just sets the scale of the whole orbit. From $r_p(1+e) = r + er\cos\theta = r + ex$, if we subtract ex and square, we get

$$r_p^2 + 2r_p e(r_p - x) + e^2(r_p - x)^2 = r^2 = x^2 + y^2,$$

which is clearly quadratic in x and y. It is therefore a conic section,

$$y^2 + (1 - e^2)x^2 + 2e(1+e)x\,r_p - (1+e)^2 r_p^2 = 0.$$

The nature of the curve depends on the coefficient of x². For
• |e| < 1, the coefficient is > 0, and we have an ellipse.
• e = ±1, the coefficient vanishes and y² = ax + b is a parabola.
• |e| > 1, the coefficient is < 0, and we have a hyperbola.

All of these are possible motions. The bound orbits are ellipses, which describe planetary motion and also the motion of comets. But objects which have enough energy to escape from the sun, such as Voyager 2, are in hyperbolic orbit, or in the dividing case where the total energy is exactly zero, a parabolic orbit. Then as time goes to ∞, φ goes to a finite value, φ → π for a parabola, or some constant less than π for a hyperbolic orbit.

Let us return to the elliptic case. The closest approach, or perigee, is $r = r_p$, while the furthest apart the objects get is at θ = π, $r = r_a = r_p(1+e)/(1-e)$, which is called the apogee or aphelion. e is the eccentricity of the ellipse. An ellipse is a circle stretched uniformly in one direction; the diameter in that direction becomes the major axis of the ellipse, while the perpendicular diameter becomes the minor axis.
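The algebra above can be sanity-checked numerically: points generated from the orbit formula $r(\theta) = r_p(1+e)/(1+e\cos\theta)$ should satisfy the conic-section equation exactly, for all three kinds of conic. This is an illustrative sketch (values of $r_p$ and e are assumed, not from the text):

```python
import math

# Verify that r(theta) = rp (1+e) / (1 + e cos(theta)) satisfies
#   y^2 + (1-e^2) x^2 + 2 e (1+e) x rp - (1+e)^2 rp^2 = 0
# for elliptic (e<1), parabolic (e=1), and hyperbolic (e>1) cases.
rp = 0.8
results = []
for e in (0.5, 1.0, 1.6):
    ok = True
    for k in range(1, 12):
        theta = 0.25 * k
        if 1 + e * math.cos(theta) <= 0.1:
            continue                      # stay on the physical branch
        r = rp * (1 + e) / (1 + e * math.cos(theta))
        x, y = r * math.cos(theta), r * math.sin(theta)
        conic = (y**2 + (1 - e**2) * x**2
                 + 2 * e * (1 + e) * x * rp - (1 + e)**2 * rp**2)
        ok = ok and abs(conic) < 1e-9 * max(1.0, r * r)
    results.append(ok)
print(all(results))
```

The check works because $r = r_p(1+e) - ex$ on the orbit, so the conic expression collapses to $r^2 - (x^2 + y^2) = 0$ identically.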
One half the length of the major axis is the semi-major axis and is denoted by a. Then

$$a = \tfrac12\left(r_p + \frac{1+e}{1-e}\,r_p\right) = \frac{r_p}{1-e},$$

so

$$r_p = (1-e)a, \qquad r_a = (1+e)a.$$

Notice that the center of the ellipse is ea away from the Sun.

[Figure: properties of an ellipse. The large dots are the foci; the eccentricity is e and a is the semi-major axis.]

Kepler tells us not only that the orbit is an ellipse, but also that the sun is at one focus. To verify that, note the other focus of an ellipse is symmetrically located, at (−2ea, 0), and work out the sum of the distances of any point on the ellipse from the two foci. This will verify that d + r = 2a is a constant, showing that the orbit is indeed an ellipse with the sun at one focus.

How are a and e related to the total energy E and the angular momentum L? At apogee and perigee, dr/dφ vanishes, and so does $\dot r$, so $E = U(r) + L^2/2\mu r^2 = -K/r + L^2/2\mu r^2$, which holds at $r = r_p = a(1-e)$ and at $r = r_a = a(1+e)$. Thus

$$Ea^2(1\pm e)^2 + Ka(1\pm e) - L^2/2\mu = 0.$$

These two equations are easily solved for a and e in terms of the constants of the motion E and L:

$$a = -\frac{K}{2E}, \qquad e^2 = 1 + \frac{2EL^2}{\mu K^2}.$$

As expected for a bound orbit, we have found r as a periodic function of φ, but it is surprising that the period is the natural period 2π. In other words, as the planet makes its revolutions around the sun, its perihelion is always in the same direction. That didn't have to be the case; one could imagine that each time around, the minimum distance occurred at a slightly different (or very different) angle. Such an effect is called the precession of the perihelion. We will discuss this for nearly circular orbits in other potentials in section (3.2.2).

What about Kepler's Third Law? The area of a triangle with $\vec r$ as one edge and the displacement during a small time interval $\delta\vec r = \vec v\,\delta t$ is
$\delta A = \tfrac12|\vec r \times \vec v|\,\delta t = |\vec r \times \vec p\,|\,\delta t/2\mu$, so the area swept out per unit time is

$$\frac{dA}{dt} = \frac{L}{2\mu},$$

which is constant. The area of an ellipse made by stretching a circle is stretched by the same amount, so A is π times the semimajor axis times the semiminor axis. The endpoint of the semiminor axis is a away from each focus, so it is $a\sqrt{1 - e^2}$ from the center, and

$$A = \pi a^2\sqrt{1 - e^2} = \pi a^2\sqrt{1 - \left(1 + \frac{2EL^2}{\mu K^2}\right)} = \pi a^2\,\frac{L}{K}\sqrt{\frac{-2E}{\mu}}.$$

Recall that for bound orbits E < 0, so A is real. The period is just the area swept out in one revolution divided by the rate it is swept out, or

$$T = \pi a^2\,\frac{L}{K}\sqrt{\frac{-2E}{\mu}}\,\frac{2\mu}{L} = \frac{2\pi a^2}{K}\sqrt{-2\mu E} = \frac{\pi}{2}K(2\mu)^{1/2}(-E)^{-3/2} \qquad (3.5)$$
$$\phantom{T} = \frac{2\pi a^2}{K}\sqrt{\mu K/a} = 2\pi a^{3/2}K^{-1/2}\mu^{1/2}, \qquad (3.6)$$

independent of L. The fact that T and a depend only on E and not on L is another fascinating manifestation of the very subtle symmetries of the Kepler/Coulomb problem.

3.2.2 Nearly Circular Orbits

For a general central potential we cannot find an analytic form for the motion, which involves solving the effective one-dimensional problem with $U_{\rm eff}(r) = U(r) + L^2/2\mu r^2$. If $U_{\rm eff}(r)$ has a minimum at r = a, one solution is certainly a circular orbit of radius a. The minimum requires $dU_{\rm eff}(r)/dr = 0 = -F(r) - L^2/\mu r^3$, so

$$F(a) = -\frac{L^2}{\mu a^3}.$$
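The relations just derived fit together consistently, which can be checked numerically. The sketch below (illustrative values of µ, K, E, L, not from the text) confirms that $a = -K/2E$ is the mean of the two turning points of $U_{\rm eff}$, that $r_p = (1-e)a$ and $r_a = (1+e)a$ with $e^2 = 1 + 2EL^2/\mu K^2$, and that the two forms (3.5) and (3.6) of the period agree:

```python
import math

# Illustrative constants of the motion for a bound Kepler orbit.
mu, K, E, L = 2.0, 3.0, -0.4, 1.7

# Turning points solve E = -K/r + L^2/(2 mu r^2), i.e.
#   E r^2 + K r - L^2/(2 mu) = 0.
disc = math.sqrt(K**2 + 2 * E * L**2 / mu)
r1 = (-K + disc) / (2 * E)
r2 = (-K - disc) / (2 * E)
rp, ra = min(r1, r2), max(r1, r2)

a = -K / (2 * E)
e = math.sqrt(1 + 2 * E * L**2 / (mu * K**2))

ok_a = abs((rp + ra) / 2 - a) < 1e-12        # a is the mean of the apsides
ok_e = abs(rp - (1 - e) * a) < 1e-12 and abs(ra - (1 + e) * a) < 1e-12

T35 = 0.5 * math.pi * K * math.sqrt(2 * mu) * (-E) ** -1.5   # Eq. (3.5)
T36 = 2 * math.pi * a ** 1.5 * math.sqrt(mu / K)             # Eq. (3.6)
ok_T = abs(T35 - T36) < 1e-12 * T36

print(ok_a, ok_e, ok_T)
```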
We may also ask about trajectories which differ only slightly from this orbit, for which |r − a| is small. Expanding $U_{\rm eff}(r)$ in a Taylor series about a,

$$U_{\rm eff}(r) = U_{\rm eff}(a) + \tfrac12(r - a)^2 k,$$

where

$$k = \left.\frac{d^2 U_{\rm eff}}{dr^2}\right|_a = \left.-\frac{dF}{dr}\right|_a + \frac{3L^2}{\mu a^4} = -\left(\frac{dF}{dr} + \frac{3F}{a}\right)_a.$$

For r = a to be a minimum and the nearly circular orbits to be stable, the second derivative and k must be positive, and therefore $F' + 3F/a < 0$. As always when we treat a problem as small deviations from a stable equilibrium⁴, we have harmonic oscillator motion, with a period $T_{\rm osc} = 2\pi\sqrt{\mu/k}$.

As a simple class of examples, consider the case where the force law depends on r with a simple power, $F = -cr^n$. Then $k = (n+3)ca^{n-1}$, which is positive, and the orbit stable, only if n > −3. For gravity, n = −2, c = K, k = K/a³, and

$$T_{\rm osc} = 2\pi\sqrt{\frac{\mu a^3}{K}},$$

agreeing with what we derived for the more general motion, not restricted to small deviations from circularity. But for more general n, we find

$$T_{\rm osc} = 2\pi\sqrt{\frac{\mu a^{1-n}}{c(n+3)}}.$$

The period of revolution $T_{\rm rev}$ can be calculated for the circular orbit, as

$$L = \mu a^2\dot\theta = \mu a^2\frac{2\pi}{T_{\rm rev}} = \sqrt{\mu a^3|F(a)|},$$

⁴This statement has an exception if the second derivative vanishes, k = 0.
so

$$T_{\rm rev} = 2\pi\sqrt{\frac{\mu a}{|F(a)|}},$$

which for the power law case is

$$T_{\rm rev} = 2\pi\sqrt{\frac{\mu a^{1-n}}{c}}.$$

Thus the two periods $T_{\rm osc}$ and $T_{\rm rev}$ are not equal unless n = −2, as in the gravitational case. Let us define the apsidal angle ψ as the angle between an apogee and the next perigee. It is therefore $\psi = \pi T_{\rm osc}/T_{\rm rev} = \pi/\sqrt{3+n}$. For the gravitational case ψ = π, and the apogee and perigee are on opposite sides of the orbit. For a two- or three-dimensional harmonic oscillator F(r) = −kr we have n = 1, $\psi = \tfrac12\pi$, and now an orbit contains two apogees and two perigees, and is again an ellipse, but now with the center-of-force at the center of the ellipse rather than at one focus. Note that if ψ/π is not rational, the orbit never closes, while if ψ/π = p/q, the orbit will close after q revolutions, having reached p apogees and perigees. The orbit will then be closed, but unless q = 1 it will be self-intersecting. This exact closure is also only true in the small deviation approximation; more generally, Bertrand's Theorem states that only for the n = −2 and n = 1 cases are the generic orbits closed.

In the treatment of planetary motion, the precession of the perihelion is the angle through which the perihelion slowly moves, so it is 2ψ − 2π per orbit. We have seen that it is zero for the pure inverse square force law. There is actually some precession of the planets, due mostly to perturbative effects of the other planets, but also in part due to corrections to Newtonian mechanics found from Einstein's theory of general relativity. In the late nineteenth century discrepancies in the precession of Mercury's orbit remained unexplained, and the resolution by Einstein was one of the important initial successes of general relativity.
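The chain of formulas for $T_{\rm osc}$, $T_{\rm rev}$, and ψ can be verified directly. This sketch (illustrative values of µ, c, a, not from the text) builds L from the circular-orbit condition $F(a) = -L^2/\mu a^3$, k from the stability analysis, and checks $\psi = \pi T_{\rm osc}/T_{\rm rev} = \pi/\sqrt{3+n}$ for several exponents:

```python
import math

# For F(r) = -c r^n, around a circular orbit of radius a:
#   T_osc = 2 pi sqrt(mu a^(1-n) / (c (n+3))),  T_rev = 2 pi sqrt(mu a^(1-n) / c),
#   psi   = pi T_osc / T_rev = pi / sqrt(3 + n).
mu, c, a = 1.0, 2.0, 1.3

checks = []
for n in (-2.0, 1.0, 0.5):
    L = math.sqrt(mu * c * a ** (n + 3))     # from F(a) = -L^2 / (mu a^3)
    k = (n + 3) * c * a ** (n - 1)           # curvature of U_eff at r = a
    T_osc = 2 * math.pi * math.sqrt(mu / k)
    T_rev = 2 * math.pi * mu * a**2 / L      # from L = mu a^2 (2 pi / T_rev)
    psi = math.pi * T_osc / T_rev
    checks.append(abs(psi - math.pi / math.sqrt(3 + n)) < 1e-12)
print(all(checks))
```

For n = −2 the apsidal angle is π (gravity) and for n = 1 it is π/2 (harmonic oscillator), matching the closed orbits singled out by Bertrand's Theorem.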
3.3 The Laplace-Runge-Lenz Vector

The remarkable simplicity of the motion for the Kepler and harmonic oscillator central force problems is in each case connected with a hidden symmetry. We now explore this for the Kepler problem.

For any central force problem $\vec F = \dot{\vec p} = f(r)\hat e_r$ we have a conserved angular momentum $\vec L = m(\vec r \times \dot{\vec r})$, for $\dot{\vec L} = m\dot{\vec r} \times \dot{\vec r} + (f(r)/r)\vec r \times \vec r = 0$. The motion is therefore confined to a plane perpendicular to $\vec L$, and the vector $\vec p \times \vec L$ is always in the plane of motion, as are $\vec r$ and $\vec p$. Consider the evolution of $\vec p \times \vec L$ with time⁵:

$$\frac{d}{dt}\left(\vec p \times \vec L\right) = \dot{\vec p} \times \vec L = \vec F \times \vec L = m f(r)\,\hat e_r \times (\vec r \times \dot{\vec r})
= m f(r)\left(\vec r\,(\hat e_r \cdot \dot{\vec r}\,) - \dot{\vec r}\,(\hat e_r \cdot \vec r\,)\right) = m f(r)(\dot r\,\vec r - r\,\dot{\vec r}\,).$$

On the other hand, the time variation of the unit vector $\hat e_r = \vec r/r$ is

$$\frac{d}{dt}\hat e_r = \frac{d}{dt}\frac{\vec r}{r} = \frac{\dot{\vec r}}{r} - \frac{\dot r\,\vec r}{r^2} = -\frac{\dot r\,\vec r - r\,\dot{\vec r}}{r^2}.$$

For the Kepler case, where $f(r) = -K/r^2$, these are proportional to each other with a constant ratio, so we can combine them to form a conserved quantity $\vec A = \vec p \times \vec L - mK\hat e_r$, called⁶ the Laplace-Runge-Lenz vector, $d\vec A/dt = 0$.

While we have just found three conserved quantities in addition to the conserved energy and the three conserved components of $\vec L$, these cannot all be independent. Indeed we have already noted that $\vec A$ lies in the plane of motion and is perpendicular to $\vec L$, so $\vec A \cdot \vec L = 0$. If we dot $\vec A$ into the position vector,

$$\vec A \cdot \vec r = \vec r \cdot \left(\vec p \times (\vec r \times \vec p)\right) - mkr = (\vec r \times \vec p)^2 - mkr = L^2 - mkr,$$

so if θ is the angle between $\vec A$ and $\vec r$, we have $Ar\cos\theta + mkr = L^2$, or

$$\frac1r = \frac{mk}{L^2}\left(1 + \frac{A}{mk}\cos\theta\right),$$

⁵Some hints: $\vec A \times (\vec B \times \vec C) = \vec B(\vec A \cdot \vec C) - \vec C(\vec A \cdot \vec B)$, and $\hat e_r \cdot \dot{\vec r} = (1/r)\vec r \cdot \dot{\vec r} = (1/2r)\,d(r^2)/dt = \dot r$. The first equation, known as the bac-cab equation, is shown in Appendix A.
⁶by Goldstein, at least. While others often use only the last two names, Laplace clearly has priority.
which is an elegant way of deriving the formula we found previously by integration, with A = mke. Note θ = 0 is the perigee, so $\vec A$ is a constant vector pointing towards the perigee.

We also see that the magnitude of $\vec A$ is given in terms of e, which we have previously related to L and E, so

$$A^2 = m^2k^2 + 2mEL^2$$

is a further relation among the seven conserved quantities, showing that only five are independent. There could not be more than five independent conserved functions depending analytically on the six variables of phase space (for the relative motion only), for otherwise the point representing the system in phase space would be unable to move. In fact, the five independent conserved quantities on the six dimensional phase space confine a generic invariant set of states, or orbit, to a one dimensional subspace. For power laws other than n = −2 and n = 1, as the orbits do not close, they are dense in a two dimensional region of phase space, indicating that there cannot be more than four independent conserved analytic functions on phase space. So we see the connection between the existence of the conserved $\vec A$ in the Kepler case and the fact that the orbits are closed.

3.4 The virial theorem

Consider a system of particles and the quantity $G = \sum_i \vec p_i \cdot \vec r_i$. Then the rate at which this changes is

$$\frac{dG}{dt} = \sum \vec F_i \cdot \vec r_i + 2T.$$

If the system returns to a region in phase space where it had been, after some time, G returns to what it was, and the average value of dG/dt vanishes,

$$\left\langle\frac{dG}{dt}\right\rangle = \left\langle\sum \vec F_i \cdot \vec r_i\right\rangle + 2\langle T\rangle = 0.$$

This average will also be zero if the region stays in some bounded part of phase space for which G can only take bounded values, and the averaging time is taken to infinity. This is appropriate for a system in thermal equilibrium, for example.
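The vanishing of the average, $2\langle T\rangle = -\langle\sum \vec F_i \cdot \vec r_i\rangle$, is easy to watch explicitly for a one-dimensional harmonic oscillator, averaging over one period of the exact solution (a sketch with assumed unit mass and spring constant, not from the text):

```python
import math

# Virial check for x(t) = cos t with m = k = 1, so F = -x and T = v^2/2.
# Averaged over one period, <F x> + 2 <T> should vanish.
N = 100000
Favg = Tavg = 0.0
for i in range(N):
    t = (i + 0.5) * 2 * math.pi / N     # midpoints over one full period
    x, v = math.cos(t), -math.sin(t)
    Favg += (-x) * x / N                # F . r  for F = -x
    Tavg += 0.5 * v * v / N
print(abs(Favg + 2 * Tavg) < 1e-9)
```

Here $\langle x^2\rangle = \langle v^2\rangle = 1/2$ over a period, so both sides equal 1/2 and the combination cancels.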
Consider a gas of particles which interact only with the fixed walls of the container, so that the force acts only on the surface, and the sum becomes an integral over $d\vec F = -p\,d\vec A$, where p is the uniform pressure and $d\vec A$ is an outward pointing vector representing a small piece of the surface of the volume. Then

$$\left\langle\sum \vec F_i \cdot \vec r_i\right\rangle = -\oint_{\delta V} p\,\vec r \cdot d\vec A = -p\int_V \vec\nabla \cdot \vec r\,dV = -3pV,$$

so $2\langle T\rangle = 3pV$.

A very different application occurs for a power law central force between pairs of particles, say for a potential $U(\vec r_i, \vec r_j) = a|\vec r_i - \vec r_j|^{n+1}$. Then this action and reaction contribute

$$\vec F_{ij} \cdot \vec r_j + \vec F_{ji} \cdot \vec r_i = \vec F_{ji} \cdot (\vec r_i - \vec r_j) = -(n+1)a|\vec r_i - \vec r_j|^{n+1} = -(n+1)U(\vec r_i, \vec r_j).$$

So summing over all the particles and using $2\langle T\rangle = -\langle\sum \vec F \cdot \vec r\rangle$, we have

$$\langle T\rangle = \frac{n+1}{2}\langle U\rangle.$$

For Kepler, n = −2, so $\langle T\rangle = -\tfrac12\langle U\rangle = -\langle T + U\rangle = -E$ must hold for closed orbits or for large systems of particles which remain bound and uncollapsed. It is not true, of course, for unbound systems which have E > 0.

The fact that the average value of the kinetic energy in a bound system gives a measure of the potential energy is the basis of the measurements of the missing mass, or dark matter, in galaxies and in clusters of galaxies. This remains a useful tool despite the fact that a multiparticle gravitationally bound system can generally throw off some particles by bringing others closer together, so that, strictly speaking, G does not return to its original value or remain bounded.

3.5 Rutherford Scattering

We have discussed the 1/r potential in terms of Newtonian gravity, but of course it is equally applicable to Coulomb's law of electrostatic
forces. The force between nonrelativistic charges Q and q is given⁷ by

$$\vec F = \frac{1}{4\pi\epsilon_0}\,\frac{Qq}{r^3}\,\vec r,$$

and the potential energy is U(r) = −K/r with $K = -Qq/4\pi\epsilon_0$.

Unlike gravity, the force is not always attractive (K > 0); for like sign charges we have K < 0, and therefore U and the total energy are always positive, and there are no bound motions. Whatever the relative signs, we are going to consider scattering here, and therefore positive energy solutions with the initial state of finite speed $v_0$ and r → ∞. Thus the relative motion is a hyperbola, with

$$r = r_p\,\frac{1+e}{1 + e\cos\phi}, \qquad e = \pm\sqrt{1 + \frac{2EL^2}{\mu K^2}}.$$

This starts and ends with r → ∞, at $\phi \to \pm\alpha = \pm\cos^{-1}(-1/e)$, and the angle θ through which the velocity changes is called the scattering angle. For simplicity we will consider the repulsive case, with e < 0 so that α < π/2.

[Figure: Rutherford scattering. An α particle approaches a heavy nucleus with an impact parameter b, scattering through an angle θ. The cross sectional area dσ of the incident beam is scattered through angles ∈ [θ, θ + dθ].]

⁷Here we use S.I. or rationalized MKS units. For Gaussian units drop the $4\pi\epsilon_0$, or for Heaviside-Lorentz units drop only the $\epsilon_0$.
We see that θ = π − 2α, so

$$\tan\frac\theta2 = \cot\alpha = \frac{\cos\alpha}{\sqrt{1 - \cos^2\alpha}} = \frac{|e|^{-1}}{\sqrt{1 - |e|^{-2}}} = \frac{1}{\sqrt{e^2 - 1}} = \sqrt{\frac{\mu K^2}{2EL^2}}.$$

We have $K = Qq/4\pi\epsilon_0$. We need to evaluate E and L. At r = ∞, U → 0, so $E = \tfrac12\mu v_0^2$ and $L = \mu b v_0$, where b is the impact parameter, the distance by which the asymptotic line of the initial motion misses the scattering center. Thus

$$\tan\frac\theta2 = K\sqrt{\frac{\mu}{\mu v_0^2\,(\mu b v_0)^2}} = \frac{K}{\mu b v_0^2}. \qquad (3.7)$$

The scattering angle therefore depends on b, the perpendicular displacement from the axis parallel to the beam through the nucleus. Particles passing through a given area will be scattered through a given angle, with a fixed angle θ corresponding to a circle centered on the axis, having radius b(θ) given by (3.7). The area of the beam dσ in an annular ring of impact parameters ∈ [b, b + db] is dσ = 2πb|db|. To relate db to dθ, we differentiate the scattering equation for fixed $v_0$:

$$\frac12\sec^2\frac\theta2\,d\theta = \frac{-K}{\mu v_0^2 b^2}\,db,$$

$$\frac{d\sigma}{d\theta} = 2\pi b\,\frac{\mu v_0^2 b^2}{2K\cos^2(\theta/2)} = \frac{\pi\mu v_0^2 b^3}{K\cos^2(\theta/2)}
= \pi\left(\frac{K}{\mu v_0^2}\right)^2\frac{\cos\theta/2}{\sin^3\theta/2}
= \frac{\pi}{2}\left(\frac{K}{\mu v_0^2}\right)^2\frac{\sin\theta}{\sin^4\theta/2}.$$

(The last expression is useful because $\sin\theta\,d\theta$ is the "natural measure" for θ, in the sense that integrating over volume in spherical coordinates is $d^3V = r^2\,dr\sin\theta\,d\theta\,d\phi$.)

How do we measure dσ/dθ? There is a beam of N particles shot at random impact parameters onto a foil with n scattering centers per
unit area, and we confine the beam to an area A. Each particle will be significantly scattered only by the scattering center to which it comes closest, if the foil is thin enough. The number of incident particles per unit area is N/A, and the number of scatterers being bombarded is nA, so the number which get scattered through an angle ∈ [θ, θ + dθ] is

$$\frac NA \times nA \times \frac{d\sigma}{d\theta}\,d\theta = Nn\,\frac{d\sigma}{d\theta}\,d\theta.$$

We have used the cylindrical symmetry of this problem to ignore the φ dependence of the scattering. More generally, the scattering would not be uniform in φ, so that the area of beam scattered into a given region of (θ, φ) would be

$$d\sigma = \frac{d\sigma}{d\Omega}\sin\theta\,d\theta\,d\phi,$$

where dσ/dΩ is called the differential cross section. For Rutherford scattering we have

$$\frac{d\sigma}{d\Omega} = \frac14\left(\frac{K}{\mu v_0^2}\right)^2\csc^4\frac\theta2.$$

Scattering in other potentials

We see that the cross section depends on the angle through which the incident particle is scattered for a given impact parameter. In Rutherford scattering θ increases monotonically as b decreases, which is possible only because the force is "hard", and a particle aimed right at the center will turn around rather than plowing through. This was a surprise to Rutherford, for the concurrent model of the nucleus, Thomson's plum pudding model, had the nuclear charge spread out over some atomic-sized spherical region, and the Coulomb potential would have decreased once the alpha particle entered this region. So sufficiently energetic alpha particles aimed at the center should have passed through undeflected instead of scattered backwards. In fact, of course, the nucleus does have a finite size, and this is still true, but at a much smaller distance, and therefore a much larger energy.

If the scattering angle θ(b) does run smoothly from 0 at b = 0 to 0 at b → ∞, as shown, then there is an extremal value for which
$d\theta/db|_{b_0} = 0$, and for $\theta < \theta(b_0)$, dσ/dθ can get contributions from several different b's,

$$\frac{d\sigma}{d\Omega} = \sum_i \frac{b_i}{\sin\theta}\left|\frac{db}{d\theta}\right|_i.$$

It also means that the cross section becomes infinite as $\theta \to \theta(b_0)$, and vanishes above that value of θ. This effect is known as rainbow scattering, and is the cause of rainbows, because the scattering for a given color light off a water droplet is very strongly peaked at the maximum angle of scattering.

[Figure: a scattering angle θ(b) rising from 0 at b = 0 to a maximum θ(b₀) and falling back to 0 as b → ∞.]

Another unusual effect occurs when θ(b) becomes 0 or π for some nonzero value of b, with db/dθ finite. Then dσ/dΩ blows up due to the sin θ in the denominator, even though the integral $\int(d\sigma/d\Omega)\sin\theta\,d\theta\,d\phi$ is perfectly finite. This effect is called glory scattering, and can be seen around the shadow of a plane on the clouds below.

Exercises

3.1 Consider a spherical droplet of water in the sunlight. A ray of light with impact parameter b is refracted, so by Snell's Law $n\sin\beta = \sin\alpha$. It is then internally reflected once and refracted again on the way out.
(a) Express the scattering angle θ in terms of α and β.
(b) Find the scattering cross section dσ/dΩ as a function of θ, α and β (which is implicitly a function of θ from (a) and Snell's Law).
(c) The smallest value of θ is called the rainbow scattering angle. Why? Find it numerically to first order in δ if the index of refraction is n = 1.333 + δ.
(d) The visual spectrum runs from violet, where n = 1.343, to red, where n = 1.331. Find the angular radius of the rainbow's circle, and the angular width of the rainbow, and tell whether the red or blue is on the outside.
[Figure: one way light can scatter from a spherical raindrop, with incidence angle α, refraction angle β, and impact parameter b.]

3.2 Consider a particle constrained to move on the surface described in cylindrical coordinates by $z = \alpha r^3$, subject to a constant gravitational force $\vec F = -mg\hat e_z$. Find the Lagrangian, two conserved quantities, and reduce the problem to a one dimensional problem. What is the condition for circular motion at constant r?

3.3 From the general expression for φ as an integral over r, applied to a three dimensional symmetrical harmonic oscillator $V(r) = \tfrac12 kr^2$, integrate the equation, and show that the motion is an ellipse, with the center of force at the center of the ellipse. Consider the three complex quantities $Q_i = p_i - i\sqrt{km}\,r_i$, and show that each has a very simple equation of motion, as a consequence of which the nine quantities $Q_i^*Q_k$ are conserved. Identify as many as possible of these with previously known conserved quantities.

3.4 Show that if a particle under the influence of a central force has an orbit which is a circle passing through the point of attraction, then the force is a power law with $|F| \propto r^{-5}$.
Assuming the potential is defined so that U(∞) = 0, show that for this particular orbit E = 0. Find the period, and, by expressing $\dot x$, $\dot y$ and the speed as a function of the angle measured from the center of the circle and its derivative, show that $\dot x$, $\dot y$ and the speed all go to infinity as the particle passes through the center of force.
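The Rutherford relations of section 3.5 can also be cross-checked numerically. The sketch below (illustrative values, with µ = v₀ = 1 assumed) inverts Eq. (3.7) for b(θ) and verifies that the closed-form cross section agrees with the general prescription $d\sigma/d\Omega = (b/\sin\theta)\,|db/d\theta|$, using a finite-difference derivative:

```python
import math

# Rutherford scattering: tan(theta/2) = K / (mu b v0^2), and
# dsigma/dOmega = (K / (2 mu v0^2))^2 csc^4(theta/2).
mu, v0, K = 1.0, 1.0, 0.3

def theta_of_b(b):
    return 2 * math.atan(K / (mu * b * v0**2))

def b_of_theta(theta):
    return K / (mu * v0**2 * math.tan(theta / 2))   # inverse of Eq. (3.7)

def dsigma_dOmega(theta):
    return (K / (2 * mu * v0**2))**2 / math.sin(theta / 2)**4

# Compare against (b / sin theta) |db/dtheta| with a central difference.
theta = theta_of_b(1.7)
eps = 1e-6
db_dtheta = (b_of_theta(theta + eps) - b_of_theta(theta - eps)) / (2 * eps)
numeric = b_of_theta(theta) / math.sin(theta) * abs(db_dtheta)
print(abs(numeric - dsigma_dOmega(theta)) < 1e-8)
```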
Chapter 4

Rigid Body Motion

In this chapter we develop the dynamics of a rigid body, one in which all interparticle distances are fixed by internal forces of constraint. This is, of course, an idealization which ignores elastic and plastic deformations to which any real body is susceptible, but it is an excellent approximation for many situations, and vastly simplifies the dynamics of the very large number of constituent particles of which any macroscopic body is made. In fact, it reduces the problem to one with six degrees of freedom. While the ensuing motion can still be quite complex, it is tractable. In the process we will be dealing with a configuration space which is a group, and is not a Euclidean space. Degrees of freedom which lie on a group manifold rather than Euclidean space arise often in applications in quantum mechanics and quantum field theory, in addition to the classical problems we will consider such as gyroscopes and tops.

4.1 Configuration space for a rigid body

A macroscopic body is made up of a very large number of atoms. Describing the motion of such a system without some simplifications is clearly impossible. Many objects of interest, however, are very well approximated by the assumption that the distances between the atoms
in the body are fixed¹,
\[ |r_\alpha - r_\beta| = c_{\alpha\beta} = \text{constant}. \tag{4.1} \]
This constitutes a set of holonomic constraints, but not independent ones, as we have here $\frac12 n(n-1)$ constraints on $3n$ coordinates. Rather than trying to solve the constraints, we can understand what are the generalized coordinates by recognizing that the possible motions which leave the interparticle lengths fixed are combinations of

• translations of the body as a whole, $r_\alpha \to r_\alpha + C$,

• rotations of the body about some fixed, or "marked", point.

We will need to discuss how to represent the latter part of the configuration, (including what a rotation is), and how to reexpress the kinetic and potential energies in terms of this configuration space and its velocities.

The first part of the configuration, describing the translation, can be specified by giving the coordinates of the marked point fixed in the body, $R(t)$. Often, but not always, we will choose this marked point to be the center of mass $R(t)$ of the body. In order to discuss other points which are part of the body, we will use an orthonormal coordinate system fixed in the body, known as the body coordinates, with the origin at the fixed point $R$. The constraints mean that the position of each particle of the body has fixed coordinates in terms of this coordinate system. Thus the dynamical configuration of the body is completely specified by giving the orientation of these coordinate axes in addition to $R$. This orientation needs to be described relative to a fixed inertial coordinate system, or inertial coordinates, with orthonormal basis $\hat e_i$. Let the three orthogonal unit vectors defining the body coordinates be $\hat e'_i$, for $i = 1, 2, 3$.
Then the position of any particle $\alpha$ in the body which has coordinates $b_{\alpha i}$ in the body coordinate system is at the position $r_\alpha = R + \sum_i b_{\alpha i}\,\hat e'_i$. In order to know its components in the

¹In this chapter we will use Greek letters as subscripts to represent the different particles within the body, reserving Latin subscripts to represent the three spatial directions.
inertial frame $r_\alpha = \sum_i r_{\alpha i}\,\hat e_i$ we need to know the coordinates of the three vectors $\hat e'_i$ in terms of the inertial coordinates,
\[ \hat e'_i = \sum_j A_{ij}\,\hat e_j. \tag{4.2} \]
The nine quantities $A_{ij}$, together with the three components of $R = \sum_i R_i \hat e_i$, specify the position of every particle,
\[ r_{\alpha i} = R_i + \sum_j b_{\alpha j} A_{ji}, \]
and the configuration of the system is completely specified by $R_i(t)$ and $A_{ij}(t)$.

The nine real quantities in the matrix $A_{ij}$ are not independent, for the basis vectors $\hat e'_i$ of the body-fixed coordinate system are orthonormal,
\[ \hat e'_i \cdot \hat e'_k = \delta_{ik} = \sum_{j\ell} A_{ij} A_{k\ell}\, \hat e_j \cdot \hat e_\ell = \sum_{j\ell} A_{ij} A_{k\ell}\, \delta_{j\ell} = \sum_j A_{ij} A_{kj}, \]
or in matrix language $AA^T = \mathbb{1}$. Such a matrix, whose transpose is equal to its inverse, is called orthogonal, and is a transformation of basis vectors which preserves orthonormality of the basis vectors. Because they play such an important role in the study of rigid body motion, we need to explore the properties of orthogonal transformations in some detail.

4.1.1 Orthogonal Transformations

There are two ways of thinking about an orthogonal transformation $A$ and its action on an orthonormal basis, (Eq. 4.2). One way is to consider that $\{\hat e_i\}$ and $\{\hat e'_i\}$ are simply different basis vectors used to describe the same physical vectors in the same vector space. A vector $V$ is the same vector whether it is expanded in one basis $V = \sum_j V_j \hat e_j$ or the other $V = \sum_i V'_i \hat e'_i$. Thus
\[ V = \sum_j V_j \hat e_j = \sum_i V'_i \hat e'_i = \sum_{ij} V'_i A_{ij} \hat e_j, \]
and we may conclude from the fact that the $\hat e_j$ are linearly independent that $V_j = \sum_i V'_i A_{ij}$, or in matrix notation that $V = A^T V'$. Because $A$ is orthogonal, multiplying by $A$ (from the left) gives $V' = AV$, or
\[ V'_i = \sum_j A_{ij} V_j. \tag{4.3} \]
Thus $A$ is to be viewed as a rule for giving the primed basis vectors in terms of the unprimed ones (4.2), and also for giving the components of a vector in the primed coordinate system in terms of the components in the unprimed one (4.3). This picture of the role of $A$ is called the passive interpretation.

One may also use matrices to represent a real physical transformation of an object or quantity. In particular, Eq. 4.2 gives $A$ the interpretation of an operator that rotates each of the coordinate basis vectors $\hat e_1, \hat e_2, \hat e_3$ into the corresponding new vector $\hat e'_1, \hat e'_2$, or $\hat e'_3$. For real rotation of the physical system, all the vectors describing the objects are changed by the rotation into new vectors $V \to V^{(R)}$, physically different from the original vector, but having the same coordinates in the primed basis as $V$ has in the unprimed basis. This is called the active interpretation of the transformation.

Both active and passive views of the transformation apply here, and this can easily lead to confusion. The transformation $A(t)$ is the physical transformation which rotated the body from some standard orientation, in which the body axes $\hat e'_i$ were parallel to the "lab frame" axes $\hat e_i$, to the configuration of the body at time $t$. But it also gives the relation of the components of the same position vectors (at time $t$) expressed in body fixed and lab frame coordinates.

If we first consider rotations in two dimensions, it is clear that they are generally described by the counterclockwise angle $\theta$ through which the basis is rotated,
\[ \hat e'_1 = \cos\theta\,\hat e_1 + \sin\theta\,\hat e_2, \qquad \hat e'_2 = -\sin\theta\,\hat e_1 + \cos\theta\,\hat e_2, \]
corresponding to the matrix
\[ A = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}. \tag{4.4} \]
[Figure: the rotated basis vectors $\hat e'_1$ and $\hat e'_2$, each making the angle $\theta$ with $\hat e_1$ and $\hat e_2$ respectively.]

Clearly taking the transpose simply changes the sign of $\theta$, which is just what is necessary to produce the inverse transformation. Thus each two dimensional rotation is an orthogonal transformation. The orthogonality equation $A \cdot A^{-1} = \mathbb{1}$ has four matrix elements. It is straightforward to show that these four equations on the four elements of $A$ determine $A$ to be of the form (4.4) except that the sign of the bottom row is undetermined. For example, the transformation $\hat e'_1 = \hat e_1$, $\hat e'_2 = -\hat e_2$ is orthogonal but is not a rotation. Let us call this transformation $P$. Thus any two-dimensional orthogonal matrix is a rotation or is $P$ followed by a rotation. The set of all real orthogonal matrices in two dimensions is called $O(2)$, and the subset consisting of rotations is called $SO(2)$.

In three dimensions we need to take some care with what we mean by a rotation. On the one hand, we might mean that the transformation has some fixed axis and is a rotation through some angle about that axis. Let us call that a rotation about an axis. On the other hand, we might mean all transformations we can produce by a sequence of rotations about various axes. Let us define rotation in this sense. Clearly if we consider the rotation $R$ which rotates the basis $\{\hat e\}$ into the basis $\{\hat e'\}$, and if we have another rotation $R'$ which rotates $\{\hat e'\}$ into $\{\hat e''\}$, then the transformation which first does $R$ and then does $R'$, called the composition of them, $\breve R = R' \circ R$, is also a rotation in this latter sense. As $\hat e''_i = \sum_j R'_{ij} \hat e'_j = \sum_{jk} R'_{ij} R_{jk} \hat e_k$, we see that $\breve R_{ik} = \sum_j R'_{ij} R_{jk}$ and $\hat e''_i = \sum_k \breve R_{ik} \hat e_k$. Thus the composition $\breve R = R'R$ is given by matrix multiplication.
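The properties of the matrix (4.4) asserted here (orthogonality, the transpose as inverse, unit determinant, and composition by addition of angles) are easy to verify numerically. A quick sketch using NumPy, not part of the text:

```python
import numpy as np

def rotation_2d(theta):
    """Passive 2D rotation matrix, in the convention of Eq. (4.4)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

A = rotation_2d(0.7)
assert np.allclose(A @ A.T, np.eye(2))        # A A^T = 1: A is orthogonal
assert np.allclose(A.T, rotation_2d(-0.7))    # transposing flips the sign of theta
assert np.isclose(np.linalg.det(A), 1.0)      # det = +1: A is in SO(2), not just O(2)

# Composition adds the angles: R(theta') R(theta) = R(theta + theta')
assert np.allclose(rotation_2d(0.4) @ rotation_2d(0.7), rotation_2d(1.1))

# The reflection P is orthogonal but has det = -1, so it is not a rotation.
P = np.array([[1.0, 0.0], [0.0, -1.0]])
assert np.allclose(P @ P.T, np.eye(2))
assert np.isclose(np.linalg.det(P), -1.0)
```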
In two dimensions, straightforward evaluation will verify that if $R$ and $R'$ are of the form (4.4) with angles $\theta$ and $\theta'$ respectively, the product $\breve R$ is of the same form with angle $\breve\theta = \theta + \theta'$. Thus all rotations are rotations about an axis there. Rotations in
three dimensions are a bit more complex, because they can take place in different directions as well as through different angles. We can still represent the composition of rotations with matrix multiplication, now of $3\times 3$ matrices. In general, matrices do not commute, $AB \neq BA$, and this is indeed reflected in the fact that the effect of performing two rotations depends on the order in which they are done. A graphic illustration is worth trying. Let $V$ be the process of rotating an object through $90°$ about the vertical $z$-axis, and $H$ be a rotation through $90°$ about the $x$-axis, which goes off to our right. If we start with the book lying face up facing us on the table, and first apply $V$ and then $H$, we wind up with the binding down and the front of the book facing us. If, however, we start from the same position but apply first $H$ and then $V$, we wind up with the book standing upright on the table with the binding towards us. Clearly the operations $H$ and $V$ do not commute.

Figure 4.1: The results of applying the two rotations $H$ and $V$ to a book depend on which is done first. Thus rotations do not commute. Here we are looking down at a book which is originally lying face up on a table. $V$ is a rotation about the vertical $z$-axis, and $H$ is a rotation about a fixed axis pointing to the right, each through $90°$.
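The non-commutativity of $H$ and $V$ can also be seen with explicit matrices. A small NumPy sketch (the matrix forms below are the standard active rotations about the $z$- and $x$-axes, an assumed convention rather than one taken from the text):

```python
import numpy as np

def Rz(t):   # active rotation about the vertical z-axis (V in the text)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def Rx(t):   # active rotation about the horizontal x-axis (H in the text)
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

V, H = Rz(np.pi / 2), Rx(np.pi / 2)
assert not np.allclose(H @ V, V @ H)   # the order of the two rotations matters
for Mrot in (H @ V, V @ H):            # but either order is still a rotation:
    assert np.allclose(Mrot @ Mrot.T, np.eye(3))
    assert np.isclose(np.linalg.det(Mrot), 1.0)
```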
It is clear that any composition of rotations must be orthogonal, as any set of orthonormal basis vectors will remain orthonormal under each transformation. It is also clear that there is a three dimensional version of $P$, say $\hat e'_1 = \hat e_1$, $\hat e'_2 = \hat e_2$, $\hat e'_3 = -\hat e_3$, which is orthogonal but not a composition of rotations, for it changes a right-handed coordinate system (with $\hat e_1 \times \hat e_2 = \hat e_3$) to a left handed one, while rotations preserve the handedness. It is straightforward to show that any composition of orthogonal matrices is orthogonal, for if $AA^T = \mathbb{1}$ and $BB^T = \mathbb{1}$ and $C = AB$, then $CC^T = AB(AB)^T = ABB^TA^T = A\,\mathbb{1}\,A^T = \mathbb{1}$, and $C$ is orthogonal as well. So the rotations are a subset of the set $O(N)$ of orthogonal matrices.

4.1.2 Groups

This set of orthogonal matrices is a group, which means that the set $O(N)$ satisfies the following requirements, which we state for a general set $G$. A set $G$ of elements $A, B, C, \ldots$ together with a group multiplication rule ($\diamond$) for combining two of them, is a group if

• Given any two elements $A$ and $B$ in the group, the product $A \diamond B$ is also in the group. One then says that the set $G$ is closed under $\diamond$. In our case the group multiplication is ordinary matrix multiplication, the group consists of all $3\times 3$ orthogonal real matrices, and we have just shown that it is closed.

• The product rule is associative; for every $A, B, C \in G$, we have $A \diamond (B \diamond C) = (A \diamond B) \diamond C$. For matrix multiplication this is simply due to the commutativity of finite sums, $\sum_i \sum_j = \sum_j \sum_i$.

• There is an element $e$ in $G$, called the identity, such that for every element $A \in G$, $eA = Ae = A$. In our case $e$ is the unit matrix $\mathbb{1}$, $\mathbb{1}_{ij} = \delta_{ij}$.

• Every element $A \in G$ has an element $A^{-1} \in G$ such that $AA^{-1} = A^{-1}A = e$.
This element is called the inverse of $A$; in the case of orthogonal matrices it always exists, because the inverse is the transpose, which exists for any matrix.
While the constraints (4.1) would permit $A(t)$ to be any orthogonal matrix, the nature of Newtonian mechanics requires it to vary continuously in time. If the system starts with $A = \mathbb{1}$, there must be a continuous path in the space of orthogonal matrices to the configuration $A(t)$ at any later time. But the set of matrices $O(3)$ is not connected in this fashion: there is no path from $A = \mathbb{1}$ to $A = P$. To see that this is true, we look at the determinant of $A$. From $AA^T = \mathbb{1}$ we see that $\det(AA^T) = 1 = \det(A)\det(A^T) = (\det A)^2$, so $\det A = \pm 1$ for all orthogonal matrices $A$. But the determinant varies continuously as the matrix does, so no continuous variation of the matrix can lead to a jump in its determinant. Thus the matrices which represent rotations have unit determinant, $\det A = +1$, and are called unimodular.

The set of all unimodular orthogonal matrices in $N$ dimensions is called $SO(N)$. It is a subset of $O(N)$, the set of all orthogonal matrices in $N$ dimensions. Clearly all rotations are in this subset. The subset is closed under multiplication, and the identity and the inverses of elements in $SO(N)$ are also in $SO(N)$, for their determinants are clearly 1. Thus $SO(N)$ is a subgroup of $O(N)$. It is actually the set of rotations, but we shall prove this statement only for the case $N = 3$, which is the immediately relevant one. Simultaneously we will show that every rotation in three dimensions is a rotation about an axis. We have already proven it for $N = 2$. We now show that every $A \in SO(3)$ has one vector it leaves unchanged or invariant, so that it is effectively a rotation in the plane perpendicular to this direction, or in other words a rotation about the axis it leaves invariant. The fact that every unimodular orthogonal matrix in three dimensions is a rotation about an axis is known as Euler's Theorem.
To show that it is true, we note that if $A$ is orthogonal and has determinant 1,
\[ \det(A - \mathbb{1}) = \det(A - \mathbb{1})\det(A^T) = \det\big((A - \mathbb{1})A^T\big) = \det(\mathbb{1} - A^T) = \det(\mathbb{1} - A), \]
while also
\[ \det(A - \mathbb{1}) = \det(-(\mathbb{1} - A)) = (-1)^3 \det(\mathbb{1} - A) = -\det(\mathbb{1} - A), \]
so $\det(\mathbb{1} - A) = 0$ and $\mathbb{1} - A$ is a singular matrix. Then there exists a vector $\omega$ which is annihilated by it, $(\mathbb{1} - A)\omega = 0$, or $A\omega = \omega$, and $\omega$ is invariant under $A$. Of course this determines only the direction of $\omega$, and only up to sign. If we choose a new coordinate system in which
the $\tilde z$-axis points along $\omega$, we see that the elements $\tilde A_{i3} = (0, 0, 1)$, and orthogonality gives $\sum_j \tilde A_{3j}^2 = 1 = \tilde A_{33}^2$, so $\tilde A_{31} = \tilde A_{32} = 0$. Thus $\tilde A$ is of the block form
\[ \tilde A = \begin{pmatrix} B & 0 \\ 0 & 1 \end{pmatrix}, \]
where $B$ is an orthogonal unimodular $2\times 2$ matrix, which is therefore a rotation about the $z$-axis through some angle $\omega$, which we may choose to be in the range $\omega \in (-\pi, \pi]$. It is natural to define the vector $\vec\omega$, whose direction only was determined above, to be $\vec\omega = \omega\,\hat e_{\tilde z}$. Thus we see that the set of orthogonal unimodular matrices is the set of rotations, and elements of this set may be specified by a vector² of length $\le \pi$.

Thus we see that the rotation which determines the orientation of a rigid body can be described by the three degrees of freedom $\vec\omega$. Together with the translational coordinates $R$, this parameterizes the configuration space of the rigid body, which is six dimensional. It is important to recognize that this is not motion in a flat six dimensional configuration space, however. For example, the configurations with $\vec\omega = (0, 0, \pi - \epsilon)$ and $\vec\omega = (0, 0, -\pi + \epsilon)$ approach each other as $\epsilon \to 0$, so that motion need not even be continuous in $\vec\omega$. The composition of rotations is by multiplication of the matrices, not by addition of the $\vec\omega$'s. There are other ways of describing the configuration space, two of which are known as Euler angles and Cayley-Klein parameters, but none of these make describing the space very intuitive. For some purposes we do not need all of the complications involved in describing finite rotations, but only what is necessary to describe infinitesimal changes between the configuration at time $t$ and at time $t + \Delta t$. We will discuss these applications first. Later, when we do need to discuss the configuration in section 4.4.2, we will define Euler angles.

²More precisely, we choose $\vec\omega$ along one of the two opposite directions left invariant by $A$, so that the angle of rotation is non-negative and $\le \pi$.
This specifies a point in or on the surface of a three dimensional ball of radius $\pi$, but in the case when the angle is exactly $\pi$ the two diametrically opposed points both describe the same rotation. Mathematicians say that the space of $SO(3)$ is three-dimensional real projective space $P_3(\mathbb{R})$.
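Euler's theorem can also be illustrated numerically: any composition of rotations leaves one direction invariant, recoverable as the eigenvector of $A$ with eigenvalue 1. A NumPy sketch, not part of the text (the particular composed rotation is an arbitrary choice):

```python
import numpy as np

def Rz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def Rx(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

A = Rz(0.9) @ Rx(-0.4) @ Rz(1.7)       # some composition of rotations
assert np.isclose(np.linalg.det(A), 1.0)
# Euler's theorem: det(1 - A) = 0, so some direction is left invariant.
assert abs(np.linalg.det(np.eye(3) - A)) < 1e-12
# The invariant axis is the eigenvector of A with eigenvalue 1.
evals, evecs = np.linalg.eig(A)
k = np.argmin(np.abs(evals - 1.0))
axis = np.real(evecs[:, k])
assert np.allclose(A @ axis, axis)     # A omega = omega
```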
4.2 Kinematics in a rotating coordinate system

We have seen that the rotations form a group. Let us describe the configuration of the body coordinate system by the position $R(t)$ of a given point and the rotation matrix $A(t): \hat e_i \to \hat e'_i$ which transforms the canonical fixed basis (inertial frame) into the body basis. A given particle of the body is fixed in the body coordinates, but this, of course, is not an inertial coordinate system, but a rotating and possibly accelerating one. We need to discuss the transformation of kinematics between these two frames. While our current interest is in rigid bodies, we will first derive a general formula for rotating (and accelerating) coordinate systems.

Suppose a particle has coordinates $b(t) = \sum_i b'_i(t)\,\hat e'_i(t)$ in the body system. We are not assuming at the moment that the particle is part of the rigid body, in which case the $b'_i(t)$ would be independent of time. In the inertial coordinates the particle has its position given by $r(t) = R(t) + b(t)$, but the coordinates of $b(t)$ are different in the space and body coordinates. Thus
\[ r_i(t) = R_i(t) + b_i(t) = R_i(t) + \sum_j A^{-1}(t)_{ij}\, b'_j(t). \]
The velocity is $v = \sum_i \dot r_i\,\hat e_i$, because the $\hat e_i$ are inertial and therefore considered stationary, so
\[ v = \dot R + \sum_{ij} \left[ \left( \frac{d}{dt} A^{-1}(t)_{ij} \right) b'_j(t) + A^{-1}(t)_{ij}\, \frac{db'_j(t)}{dt} \right] \hat e_i, \]
and not $\dot R + \sum_i (db'_i/dt)\,\hat e'_i$, because the $\hat e'_i$ are themselves changing with time. We might define a "body time derivative"
\[ \left(\dot b\right)_b := \left( \frac{d}{dt} b \right)_b := \sum_i \frac{db'_i}{dt}\,\hat e'_i, \]
but it is not the velocity of the particle $\alpha$, even with respect to $R(t)$, in the sense that physically a vector is basis independent, and its derivative
requires a notion of which basis vectors are considered time independent (inertial) and which are not. Converting the inertial evaluation to the body frame requires the velocity to include the $dA^{-1}/dt$ term as well as the $(\dot b)_b$ term.

What is the meaning of this extra term
\[ \mathcal{V} = \sum_{ij} \left( \frac{d}{dt} A^{-1}(t)_{ij} \right) b'_j(t)\,\hat e_i\ ? \]
The derivative is, of course,
\[ \mathcal{V} = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \sum_{ij} \left[ A^{-1}(t + \Delta t)_{ij} - A^{-1}(t)_{ij} \right] b'_j(t)\,\hat e_i. \]
This expression has coordinates in the body frame with basis vectors from the inertial frame. It is better to describe it in terms of the body coordinates and body basis vectors by inserting $\hat e_i = \sum_k \left(A^{-1}(t)\right)_{ik} \hat e'_k(t) = \sum_k A_{ki}(t)\,\hat e'_k(t)$. Then we have
\[ \mathcal{V} = \sum_{kj} \hat e'_k \lim_{\Delta t \to 0} \frac{1}{\Delta t} \left[ A(t)A^{-1}(t + \Delta t) - A(t)A^{-1}(t) \right]_{kj} b'_j(t). \]
The second term is easy enough to understand, as $A(t)A^{-1}(t) = \mathbb{1}$, so the full second term is just $b'$ expressed in the body frame. The interpretation of the first term is suggested by its matrix form: $A^{-1}(t + \Delta t)$ maps the body basis at $t + \Delta t$ to the inertial frame, and $A(t)$ maps this to the body basis at $t$. So together this is the infinitesimal rotation $\hat e'_i(t + \Delta t) \to \hat e'_i(t)$. This transformation must be close to an identity, as $\Delta t \to 0$. Let us expand it:
\[ B := A(t)A^{-1}(t + \Delta t) = \mathbb{1} - \Omega'\,\Delta t + O(\Delta t)^2. \tag{4.5} \]
Here $\Omega'$ is a matrix which has fixed (finite) elements as $\Delta t \to 0$, and is called the generator of the rotation. Note $B^{-1} = \mathbb{1} + \Omega'\,\Delta t$ to the order we are working, while the transpose $B^T = \mathbb{1} - \Omega'^T \Delta t$, so because we know $B$ is orthogonal we must have that $\Omega'$ is antisymmetric, $\Omega' = -\Omega'^T$, $\Omega'_{ij} = -\Omega'_{ji}$.
Subtracting $\mathbb{1}$ from both sides of (4.5) and taking the limit shows that the matrix
\[ \Omega'(t) = -A(t)\cdot \frac{d}{dt} A^{-1}(t) = \left( \frac{d}{dt} A(t) \right) \cdot A^{-1}(t), \]
where the latter equality follows from differentiating $A \cdot A^{-1} = \mathbb{1}$.

The antisymmetric matrix $\Omega'$ is effectively a vector. Define $\omega_k = \frac12 \sum_{ij} \epsilon_{kij}\,\Omega'_{ij}$. Then the $\omega_k$ also determine the $\Omega'_{ij}$:
\[ \sum_k \epsilon_{ijk}\,\omega_k = \frac12 \sum_{k\ell m} \epsilon_{ijk}\,\epsilon_{k\ell m}\,\Omega'_{\ell m} = \frac12 \sum_{\ell m} \left( \delta_{i\ell}\delta_{jm} - \delta_{im}\delta_{j\ell} \right) \Omega'_{\ell m} = \frac12 \left( \Omega'_{ij} - \Omega'_{ji} \right) = \Omega'_{ij}, \]
so $\omega_k$ and $\Omega'_{ij}$ are essentially the same thing.

We have still not answered the question, "what is $\mathcal{V}$?"
\[ \mathcal{V} = \sum_{kj} \hat e'_k \lim_{\Delta t \to 0} \frac{1}{\Delta t} \left[ B - \mathbb{1} \right]_{kj} b'_j = -\sum_{kj} \hat e'_k\,\Omega'_{kj}\,b'_j = -\sum_{kj\ell} \hat e'_k\,\epsilon_{kj\ell}\,\omega_\ell\,b'_j = \omega \times b, \]
where $\omega = \sum_\ell \omega_\ell\,\hat e'_\ell$. Thus we see that
\[ v = \dot R + \omega \times b + (\dot b)_b, \tag{4.6} \]
and the second term, coming from $\mathcal{V}$, represents the motion due to the rotating coordinate system.

When differentiating a true vector, which is independent of the origin of the coordinate system, rather than a position, the first term in (4.6) is absent, so in general for a vector $C$,
\[ \frac{dC}{dt} = \left( \frac{dC}{dt} \right)_b + \omega \times C. \tag{4.7} \]
The velocity $v$ is a vector, as are $\dot R$ and $b$, the latter because it is the difference of two positions. The angular velocity $\omega$ is also a vector, and its derivative is particularly simple, because
\[ \dot\omega = \frac{d\omega}{dt} = \left( \frac{d\omega}{dt} \right)_b + \omega \times \omega = \left( \frac{d\omega}{dt} \right)_b. \tag{4.8} \]
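The correspondence between the antisymmetric generator and the vector $\omega$, and the identification $\mathcal V = \omega \times b$, can be checked component by component. A NumPy sketch, not part of the text (the sample vectors are arbitrary):

```python
import numpy as np

def omega_to_Omega(w):
    """Build the antisymmetric matrix Omega_ij = sum_k eps_ijk w_k from w."""
    wx, wy, wz = w
    return np.array([[0.0, wz, -wy],
                     [-wz, 0.0, wx],
                     [wy, -wx, 0.0]])

w = np.array([0.3, -1.2, 0.5])
b = np.array([1.0, 2.0, -0.7])
Om = omega_to_Omega(w)
assert np.allclose(Om, -Om.T)                 # the generator is antisymmetric
assert np.allclose(-Om @ b, np.cross(w, b))   # -sum_j Omega_kj b_j = (w x b)_k

# Recover w_k = (1/2) sum_ij eps_kij Omega_ij using the Levi-Civita tensor.
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k] = 1.0
    eps[j, i, k] = -1.0
assert np.allclose(0.5 * np.einsum('kij,ij->k', eps, Om), w)
```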
Another way to understand (4.7) is as a simple application of Leibniz' rule to $C = \sum_i C'_i\,\hat e'_i$, noting that
\[ \frac{d}{dt}\,\hat e'_i(t) = \sum_j \frac{dA_{ij}(t)}{dt}\,\hat e_j = \sum_j (\Omega' A)_{ij}\,\hat e_j = \sum_k \Omega'_{ik}\,\hat e'_k, \]
which means that the second term from Leibniz is
\[ \sum_i C'_i\,\frac{d}{dt}\,\hat e'_i(t) = \sum_{ik} C'_i\,\Omega'_{ik}\,\hat e'_k = \sum_{ijk} C'_i\,\epsilon_{ikj}\,\omega_j\,\hat e'_k = \omega \times C, \]
as given in (4.7). This shows that even the peculiar object $(\dot b)_b$ obeys (4.7).

Applying this to the velocity itself (4.6), we find the acceleration
\begin{align*}
a = \frac{dv}{dt} &= \ddot R + \frac{d\omega}{dt} \times b + \omega \times \frac{db}{dt} + \frac{d}{dt}(\dot b)_b \\
&= \ddot R + \dot\omega \times b + \omega \times \left[ \left( \frac{db}{dt} \right)_b + \omega \times b \right] + \left( \frac{d^2 b}{dt^2} \right)_b + \omega \times \left( \frac{db}{dt} \right)_b \\
&= \ddot R + \left( \frac{d^2 b}{dt^2} \right)_b + 2\,\omega \times \left( \frac{db}{dt} \right)_b + \dot\omega \times b + \omega \times (\omega \times b).
\end{align*}
This is a general relation between any orthonormal coordinate system and an inertial one, and in general can be used to describe physics in noninertial coordinates, regardless of whether that coordinate system is imbedded in a rigid body. The full force on the particle is $F = ma$, but if we use $r$, $v$, and $a$ to represent $b$, $(db/dt)_b$ and $(d^2b/dt^2)_b$ respectively, we have an expression for the apparent force
\[ ma = F - m\ddot R - 2m\,\omega \times v - m\,\dot\omega \times r - m\,\omega \times (\omega \times r). \]
The additions to the real force are the pseudoforce for an accelerating reference frame $-m\ddot R$, the Coriolis force $-2m\,\omega \times v$, an unnamed force involving the angular acceleration of the coordinate system $-m\,\dot\omega \times r$, and the centrifugal force $-m\,\omega \times (\omega \times r)$ respectively.
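Both (4.6) and the acceleration formula above can be verified against finite differences for an explicit rotating frame. In the sketch below (NumPy, not part of the text) the rotation is about the $z$-axis, so $\omega$ has the same components in both frames; the particular functions $\phi(t)$, $R(t)$ and $b'(t)$ are arbitrary choices made for the test:

```python
import numpy as np

def A(t):          # passive rotation about z by phi(t), as in Eq. (4.4)
    phi = 0.3 * t + 0.1 * t**2
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])

def bp(t):         # body-frame coordinates of a (non-rigid) particle
    return np.array([1 + 0.2 * t, 0.5 * t, -0.3])

def R(t):          # position of the marked point
    return np.array([0.1 * t**2, 0.0, 0.4 * t])

def r(t):          # inertial position: r = R + A^{-1} b'
    return R(t) + A(t).T @ bp(t)

t, h = 1.3, 1e-4
w = np.array([0, 0, 0.3 + 0.2 * t])      # omega = phidot z-hat
wdot = np.array([0, 0, 0.2])             # omegadot = phiddot z-hat
b = A(t).T @ bp(t)                       # b in inertial components
bdot_b = A(t).T @ np.array([0.2, 0.5, 0.0])   # (db/dt)_b, inertial components
# (d^2 b/dt^2)_b vanishes here since b'(t) is linear in t.

v_num = (r(t + h) - r(t - h)) / (2 * h)            # central difference
v_formula = np.array([0.2 * t, 0, 0.4]) + np.cross(w, b) + bdot_b   # Eq. (4.6)
assert np.allclose(v_num, v_formula, atol=1e-6)

a_num = (r(t + h) - 2 * r(t) + r(t - h)) / h**2    # second difference
a_formula = (np.array([0.2, 0, 0]) + 2 * np.cross(w, bdot_b)
             + np.cross(wdot, b) + np.cross(w, np.cross(w, b)))
assert np.allclose(a_num, a_formula, atol=1e-3)
```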
4.3 The moment of inertia tensor

Let us return to a rigid body, where the particles are constrained to keep the distances between them constant. Then the coordinates $b'_{\alpha i}$ in the body frame are independent of time, and
\[ v_\alpha = \dot R + \omega \times b_\alpha, \]
so the individual momenta and the total momentum are
\begin{align*}
p_\alpha &= m_\alpha V + m_\alpha\,\omega \times b_\alpha, \\
P &= MV + \omega \times \sum_\alpha m_\alpha b_\alpha = MV + M\omega \times B,
\end{align*}
where $B$ is the center of mass position relative to the marked point $R$.

4.3.1 Motion about a fixed point

Angular Momentum

We next evaluate the total angular momentum, $L = \sum_\alpha r_\alpha \times p_\alpha$. We will first consider the special case in which the body is rotating about the origin, so $R \equiv 0$, and then we will return to the general case. As $p_\alpha = m_\alpha\,\omega \times b_\alpha$ already involves a cross product, we will find a triple product, and will use the reduction formula³
\[ A \times (B \times C) = B\,(A \cdot C) - C\,(A \cdot B). \]
Thus
\begin{align}
L &= \sum_\alpha m_\alpha\,b_\alpha \times (\omega \times b_\alpha) \tag{4.9} \\
&= \omega \sum_\alpha m_\alpha b_\alpha^2 - \sum_\alpha m_\alpha\,b_\alpha\,(b_\alpha \cdot \omega). \tag{4.10}
\end{align}
We see that, in general, $L$ need not be parallel to the angular velocity $\omega$, but it is always linear in $\omega$. Thus it is possible to generalize the equation

³This formula is colloquially known as the bac-cab formula. It is proven in Appendix A.
$L = I\omega$ of elementary physics courses, but we need to generalize $I$ from a multiplicative number to a linear operator which maps vectors into vectors, not necessarily in the same direction. In component language this linear operation is clearly in the form $L_i = \sum_j I_{ij}\,\omega_j$, so $I$ is a $3\times 3$ matrix. Rewriting (4.10), we have
\begin{align*}
L_i &= \omega_i \sum_\alpha m_\alpha b_\alpha^2 - \sum_\alpha m_\alpha\,b_{\alpha i}\,(b_\alpha \cdot \omega) \\
&= \sum_j \left[ \sum_\alpha m_\alpha \left( b_\alpha^2\,\delta_{ij} - b_{\alpha i} b_{\alpha j} \right) \right] \omega_j \equiv \sum_j I_{ij}\,\omega_j,
\end{align*}
where
\[ I_{ij} = \sum_\alpha m_\alpha \left( b_\alpha^2\,\delta_{ij} - b_{\alpha i} b_{\alpha j} \right) \]
is the inertia tensor about the fixed point $R$. In matrix form, we now have (4.10) as
\[ L = I \cdot \omega, \tag{4.11} \]
where $I \cdot \omega$ means a vector with components $(I \cdot \omega)_i = \sum_j I_{ij}\,\omega_j$.

If we consider the rigid body in the continuum limit, the sum over particles becomes an integral over space times the density of matter,
\[ I_{ij} = \int d^3b\,\rho(b) \left( b^2\,\delta_{ij} - b_i b_j \right). \tag{4.12} \]

Kinetic energy

For a body rotating about a fixed point $R$,
\[ T = \frac12 \sum_\alpha m_\alpha v_\alpha^2 = \frac12 \sum_\alpha m_\alpha\,(\omega \times b_\alpha) \cdot (\omega \times b_\alpha). \]
From the general 3-dimensional identity⁴
\[ (A \times B) \cdot (C \times D) = (A \cdot C)(B \cdot D) - (A \cdot D)(B \cdot C), \]

⁴See Appendix A for a hint on how to derive this.
we have
\begin{align}
T &= \frac12 \sum_\alpha m_\alpha \left[ \omega^2 b_\alpha^2 - (\omega \cdot b_\alpha)^2 \right] \nonumber \\
&= \frac12 \sum_{ij} \omega_i\,\omega_j \sum_\alpha m_\alpha \left( b_\alpha^2\,\delta_{ij} - b_{\alpha i} b_{\alpha j} \right) = \frac12 \sum_{ij} \omega_i\,I_{ij}\,\omega_j, \tag{4.13}
\end{align}
or
\[ T = \frac12\,\omega \cdot I \cdot \omega. \]
Noting that $\sum_j I_{ij}\,\omega_j = L_i$, $T = \frac12\,\omega \cdot L$ for a rigid body rotating about the origin, with $L$ measured from that origin.

4.3.2 More General Motion

When the marked point $R$ is not fixed in space, there is nothing special about it, and we might ask whether it would be better to evaluate the moment of inertia about some other point. Working in the body-fixed coordinates, we may consider a given point $b'$ and evaluate the moment of inertia about that point, rather than about the origin. This means $b_\alpha$ is replaced by $b_\alpha - b'$, so
\begin{align}
I^{(b')}_{ij} &= \sum_\alpha m_\alpha \left[ (b_\alpha - b')^2\,\delta_{ij} - (b_{\alpha i} - b'_i)(b_{\alpha j} - b'_j) \right] \nonumber \\
&= I^{(0)}_{ij} + M \left[ \left( -2\,b' \cdot B + b'^2 \right) \delta_{ij} + B_i b'_j + b'_i B_j - b'_i b'_j \right], \tag{4.14}
\end{align}
where we recall that $B$ is the position of the center of mass with respect to $R$, the origin of the body fixed coordinates. Subtracting the moment of inertia about the center of mass, given by (4.14) with $b' \to B$, we have
\begin{align}
I^{(b')}_{ij} - I^{(B)}_{ij} &= M \left[ \left( -2\,b' \cdot B + b'^2 + B^2 \right) \delta_{ij} + B_i b'_j + b'_i B_j - b'_i b'_j - B_i B_j \right] \nonumber \\
&= M \left[ (b' - B)^2\,\delta_{ij} - (b'_i - B_i)(b'_j - B_j) \right]. \tag{4.15}
\end{align}
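The definitions above are straightforward to spot-check numerically: the tensor $I_{ij}$ built from point masses should reproduce both $L = I\cdot\omega$ of (4.11) and $T = \frac12\,\omega\cdot I\cdot\omega$ of (4.13) computed directly from the sums over particles. A NumPy sketch, not part of the text (the masses, positions, and $\omega$ are arbitrary samples):

```python
import numpy as np

def inertia_tensor(masses, positions):
    """I_ij = sum_a m_a (b_a^2 delta_ij - b_ai b_aj), about the origin."""
    I = np.zeros((3, 3))
    for ma, ba in zip(masses, positions):
        I += ma * (ba @ ba * np.eye(3) - np.outer(ba, ba))
    return I

m = np.array([1.0, 2.0, 1.5])
b = np.array([[1.0, 0.0, 0.5], [-0.3, 1.1, 0.0], [0.2, -0.4, 0.9]])
w = np.array([0.4, -0.2, 1.0])
I = inertia_tensor(m, b)
assert np.allclose(I, I.T)                     # the inertia tensor is symmetric
L_direct = sum(ma * np.cross(ba, np.cross(w, ba)) for ma, ba in zip(m, b))
assert np.allclose(I @ w, L_direct)            # Eq. (4.11): L = I . w
T_direct = 0.5 * sum(ma * np.cross(w, ba) @ np.cross(w, ba) for ma, ba in zip(m, b))
assert np.isclose(0.5 * w @ I @ w, T_direct)   # Eq. (4.13)
```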
Note the difference is independent of the origin of the coordinate system, depending only on the vector $\breve b = b' - B$.

A possible axis of rotation can be specified by a point $b'$ through which it passes, together with a unit vector $\hat n$ in the direction of the axis⁵. The moment of inertia about the axis $(b', \hat n)$ is defined as $\hat n \cdot I^{(b')} \cdot \hat n$. If we compare this to the moment about a parallel axis through the center of mass, we see that
\begin{align}
\hat n \cdot I^{(b')} \cdot \hat n - \hat n \cdot I^{(\mathrm{cm})} \cdot \hat n &= M \left[ \breve b^2\,\hat n^2 - (\breve b \cdot \hat n)^2 \right] \nonumber \\
&= M\,(\hat n \times \breve b)^2 = M\,\breve b_\perp^2, \tag{4.16}
\end{align}
where $\breve b_\perp$ is the projection of the vector, from the center of mass to $b'$, onto the plane perpendicular to the axis. Thus the moment of inertia about any axis is the moment of inertia about a parallel axis through the center of mass, plus $M\ell^2$, where $\ell = \breve b_\perp$ is the distance between these two axes. This is known as the parallel axis theorem.

The general motion of a rigid body involves both a rotation and a translation of a given point $R$. Then
\[ \dot r_\alpha = V + \omega \times b_\alpha, \tag{4.17} \]
where $V$ and $\omega$ may be functions of time, but they are the same for all particles $\alpha$. Then the angular momentum about the origin is
\begin{align}
L &= \sum_\alpha m_\alpha\,r_\alpha \times \dot r_\alpha = \sum_\alpha m_\alpha\,r_\alpha \times V + \sum_\alpha m_\alpha\,(R + b_\alpha) \times (\omega \times b_\alpha) \nonumber \\
&= M \tilde R \times V + I^{(0)} \cdot \omega + M R \times (\omega \times B), \tag{4.18}
\end{align}
where the inertia tensor $I^{(0)}$ is still measured about $R$, even though that is not a fixed point. Recall that $\tilde R$ is the laboratory position of the center of mass, while $B$ is its position in the body-fixed system. The kinetic energy is now
\[ T = \frac12 \sum_\alpha m_\alpha\,\dot r_\alpha^2 = \frac12 \sum_\alpha m_\alpha \left( V + \omega \times b_\alpha \right) \cdot \left( V + \omega \times b_\alpha \right) \]

⁵Actually, this gives more information than is needed to specify an axis, as $b'$ and $b''$ specify the same axis if $b' - b'' \propto \hat n$. In the expression for the moment of inertia about the axis, (4.16), we see that the component of $b'$ parallel to $\hat n$ does not affect the result.
\begin{align}
&= \frac12 M V^2 + V \cdot \left( \omega \times \sum_\alpha m_\alpha b_\alpha \right) + \frac12 \sum_\alpha m_\alpha\,(\omega \times b_\alpha)^2 \nonumber \\
&= \frac12 M V^2 + M V \cdot (\omega \times B) + \frac12\,\omega \cdot I^{(0)} \cdot \omega, \tag{4.19}
\end{align}
and again the inertia tensor is calculated about the arbitrary point $R$. We will see that it makes more sense to use the center of mass.

Simplification Using the Center of Mass

As each $\dot r_\alpha = V + \omega \times b_\alpha$, the center of mass velocity is given by
\[ M \tilde V = \sum_\alpha m_\alpha\,\dot r_\alpha = \sum_\alpha m_\alpha\,(V + \omega \times b_\alpha) = M\,(V + \omega \times B), \tag{4.20} \]
so $\frac12 M \tilde V^2 = \frac12 M V^2 + M V \cdot (\omega \times B) + \frac12 M (\omega \times B)^2$. Comparing with (4.19), we see that
\[ T = \frac12 M \tilde V^2 - \frac12 M (\omega \times B)^2 + \frac12\,\omega \cdot I^{(0)} \cdot \omega. \]
The last two terms can be written in terms of the inertia tensor about the center of mass. From (4.15) with $b' = 0$,
\[ I^{(\mathrm{cm})}_{ij} = I^{(0)}_{ij} - M B^2\,\delta_{ij} + M B_i B_j. \]
Using the formula for $(A \times B) \cdot (C \times D)$ again,
\begin{align}
T &= \frac12 M \tilde V^2 - \frac12 M \left[ \omega^2 B^2 - (\omega \cdot B)^2 \right] + \frac12\,\omega \cdot I^{(0)} \cdot \omega \nonumber \\
&= \frac12 M \tilde V^2 + \frac12\,\omega \cdot I^{(\mathrm{cm})} \cdot \omega. \tag{4.21}
\end{align}
A similar expression holds for the angular momentum. Inserting $V = \tilde V - \omega \times B$ into (4.18),
\begin{align}
L &= M \tilde R \times \left( \tilde V - \omega \times B \right) + I^{(0)} \cdot \omega + M R \times (\omega \times B) \nonumber \\
&= M \tilde R \times \tilde V - M (\tilde R - R) \times (\omega \times B) + I^{(0)} \cdot \omega \nonumber \\
&= M \tilde R \times \tilde V - M B \times (\omega \times B) + I^{(0)} \cdot \omega \nonumber \\
&= M \tilde R \times \tilde V - M \omega B^2 + M B\,(\omega \cdot B) + I^{(0)} \cdot \omega \nonumber \\
&= M \tilde R \times \tilde V + I^{(\mathrm{cm})} \cdot \omega. \tag{4.22}
\end{align}
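The decompositions (4.21) and (4.22) can be verified for a random collection of point masses, taking the body axes aligned with the lab axes at the instant considered. A NumPy sketch, not part of the text (all sample values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
m = rng.uniform(0.5, 2.0, size=5)
b = rng.normal(size=(5, 3))          # positions relative to the marked point R
M = m.sum()
R = np.array([2.0, -1.0, 0.5])       # marked point, lab frame
V = np.array([0.1, 0.3, -0.2])       # its velocity
w = np.array([0.7, -0.4, 1.1])       # angular velocity
r = R + b                            # lab positions (body axes aligned with lab)
v = V + np.cross(w, b)               # Eq. (4.17)
B = (m[:, None] * b).sum(axis=0) / M # center of mass relative to R
R_cm, V_cm = R + B, V + np.cross(w, B)   # Eq. (4.20)

def inertia(masses, pos):
    return sum(ma * (pa @ pa * np.eye(3) - np.outer(pa, pa))
               for ma, pa in zip(masses, pos))

I_cm = inertia(m, b - B)
L = sum(ma * np.cross(ra, va) for ma, ra, va in zip(m, r, v))
T = 0.5 * sum(ma * va @ va for ma, va in zip(m, v))
assert np.allclose(L, M * np.cross(R_cm, V_cm) + I_cm @ w)        # Eq. (4.22)
assert np.isclose(T, 0.5 * M * V_cm @ V_cm + 0.5 * w @ I_cm @ w)  # Eq. (4.21)
```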
These two decompositions, (4.21) and (4.22), have a reasonable interpretation: the total angular momentum is the angular momentum about the center of mass, plus the angular momentum that a point particle of mass $M$ and position $\tilde R(t)$ would have. Similarly, the total kinetic energy is the rotational kinetic energy of the body rotating about its center of mass, plus the kinetic energy of the fictitious point particle at the center of mass.

Note that if we go back to the situation where the marked point $R$ is stationary at the origin of the lab coordinates, $V = 0$, $L = I \cdot \omega$, and $T = \frac12\,\omega \cdot I \cdot \omega = \frac12\,\omega \cdot L$.

The angular momentum in Eqs. (4.18) and (4.22) is the angular momentum measured about the origin of the lab coordinates, $L = \sum_\alpha m_\alpha\,r_\alpha \times v_\alpha$. It is useful to consider the angular momentum as measured about the center of mass,
\[ L_{\mathrm{cm}} = \sum_\alpha m_\alpha \left( r_\alpha - \tilde R \right) \times \left( v_\alpha - \tilde V \right) = L - M \tilde R \times \tilde V, \tag{4.23} \]
so we see that the angular momentum, measured about the center of mass, is just $I^{(\mathrm{cm})} \cdot \omega$.

The parallel axis theorem is also of the form of a decomposition. The inertia tensor about a given point $r'$ given by (4.15) is
\[ I^{(r')}_{ij} = I^{(\mathrm{cm})}_{ij} + M \left[ \left( r' - \tilde R \right)^2 \delta_{ij} - \left( r'_i - \tilde R_i \right) \left( r'_j - \tilde R_j \right) \right]. \]
This is, once again, the sum of the quantity, here the inertia tensor, of the body about the center of mass, plus the value a particle of mass $M$ at the center of mass $\tilde R$ would have, evaluated about $r'$.

There is another theorem about moments of inertia, though one much less general, as it only applies to a planar object, let's say in the $xy$ plane, so that $z_\alpha \approx 0$ for all the particles constituting the body. As
\begin{align*}
I_{zz} &= \sum_\alpha m_\alpha \left( x_\alpha^2 + y_\alpha^2 \right), \\
I_{xx} &= \sum_\alpha m_\alpha \left( y_\alpha^2 + z_\alpha^2 \right) = \sum_\alpha m_\alpha\,y_\alpha^2, \\
I_{yy} &= \sum_\alpha m_\alpha \left( x_\alpha^2 + z_\alpha^2 \right) = \sum_\alpha m_\alpha\,x_\alpha^2,
\end{align*}
we see that $I_{zz} = I_{xx} + I_{yy}$: the moment of inertia about an axis perpendicular to the body is the sum of the moments about two perpendicular axes within the body, through the same point. This is known as the perpendicular axis theorem. As an example of its usefulness we calculate the moments for a thin uniform ring lying on the circle $x^2 + y^2 = R^2$, $z = 0$, about the origin. As every particle of the ring has the same distance $R$ from the $z$-axis, the moment of inertia $I_{zz}$ is simply $MR^2$. As $I_{xx} = I_{yy}$ by symmetry, and as the two must add up to $I_{zz}$, we have, by a simple indirect calculation, $I_{xx} = \frac12 MR^2$.

The parallel axis theorem (4.16) is also a useful calculational tool. Consider the moment of inertia of the ring about an axis parallel to its axis of symmetry but through a point on the ring. About the axis of symmetry, $I_{zz} = MR^2$, and $\breve b_\perp = R$, so about a point on the ring, $I_{zz} = 2MR^2$. If instead, we want the moment about a tangent to the ring, $I_{xx} = I_{xx}^{(\mathrm{cm})} + MR^2 = \frac12 MR^2 + MR^2 = 3MR^2/2$. Of course for $I_{yy}$ about that point $\breve b_\perp = 0$, so $I_{yy} = \frac12 MR^2$, and we may verify that $I_{zz} = I_{xx} + I_{yy}$ about this point as well.

Principal axes

If an object has an axial symmetry about $z$, we may use cylindrical polar coordinates $(\rho, \theta, z)$. Then its density $\mu(\rho, \theta, z)$ must be independent of $\theta$, and
\[ I_{ij} = \int dz \int \rho\,d\rho \int d\theta\,\mu(\rho, z) \left[ (\rho^2 + z^2)\,\delta_{ij} - r_i r_j \right], \]
so
\begin{align*}
I_{xz} &= \int dz \int \rho\,d\rho\,\mu(\rho, z) \int d\theta\,(-z\rho\cos\theta) = 0, \\
I_{xy} &= \int dz \int \rho\,d\rho\,\mu(\rho, z) \int d\theta\,(-\rho^2\sin\theta\cos\theta) = 0, \\
I_{xx} &= \int dz \int \rho\,d\rho\,\mu(\rho, z) \int d\theta\,(\rho^2 + z^2 - \rho^2\cos^2\theta), \\
I_{yy} &= \int dz \int \rho\,d\rho\,\mu(\rho, z) \int d\theta\,(\rho^2 + z^2 - \rho^2\sin^2\theta) = I_{xx}.
\end{align*}
Thus the inertia tensor is diagonal and has two equal elements,
\[ I = \begin{pmatrix} I_{xx} & 0 & 0 \\ 0 & I_{xx} & 0 \\ 0 & 0 & I_{zz} \end{pmatrix}. \]
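The ring moments computed above, together with the parallel and perpendicular axis theorems, can be checked by discretizing the ring into many equal point masses. A NumPy sketch, not part of the text:

```python
import numpy as np

# Discretize a thin uniform ring of mass M and radius R in the z = 0 plane.
M, R, N = 2.0, 1.5, 10000
theta = np.linspace(0, 2 * np.pi, N, endpoint=False)
pts = np.stack([R * np.cos(theta), R * np.sin(theta), np.zeros(N)], axis=1)
m = np.full(N, M / N)

def I_axis(point, n):
    """Moment of inertia about the axis through `point` along unit vector n."""
    d = pts - point
    d_perp = d - np.outer(d @ n, n)     # component perpendicular to the axis
    return float((m * (d_perp**2).sum(axis=1)).sum())

x, y, z = np.eye(3)
origin = np.zeros(3)
p = np.array([R, 0.0, 0.0])             # a point on the ring
assert np.isclose(I_axis(origin, z), M * R**2)          # I_zz = M R^2
assert np.isclose(I_axis(origin, x), 0.5 * M * R**2)    # I_xx = M R^2 / 2
assert np.isclose(I_axis(p, z), 2 * M * R**2)           # parallel axis
assert np.isclose(I_axis(p, y), 1.5 * M * R**2)         # tangent: 3 M R^2 / 2
assert np.isclose(I_axis(p, x), 0.5 * M * R**2)         # so I_zz = I_xx + I_yy here too
```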
In general, an object need not have an axis of symmetry, and even a diagonal inertia tensor need not have two equal "eigenvalues". Even if a body has no symmetry, however, there is always a choice of axes, a coordinate system, such that in this system the inertia tensor is diagonal. This is because $I_{ij}$ is always a real symmetric tensor, and any such tensor can be brought to diagonal form by an orthogonal similarity transformation [6]:
$$I = O\,I_D\,O^{-1}, \qquad I_D = \begin{pmatrix} I_1 & 0 & 0 \\ 0 & I_2 & 0 \\ 0 & 0 & I_3 \end{pmatrix}. \eqno(4.24)$$
An orthogonal matrix $O$ is either a rotation or a rotation times $P$, and the $P$'s can be commuted through $I_D$ without changing its form, so there is a rotation $R$ which brings the inertia tensor into diagonal form. The axes of this new coordinate system are known as the principal axes.

Tire balancing

Consider a rigid body rotating on an axle, and therefore about a fixed axis. What total force and torque will the axle exert? First, $\dot R = \omega \times R$, so
$$\ddot R = \dot\omega \times R + \omega \times \dot R = \dot\omega \times R + \omega \times (\omega \times R) = \dot\omega \times R + \omega(\omega \cdot R) - R\,\omega^2.$$
If the axis is fixed, $\omega$ and $\dot\omega$ are in the same direction, so the first term is perpendicular to the other two. If we want the total force to be zero [7], $\ddot R = 0$, so
$$R \cdot \ddot R = 0 = 0 + (\omega \cdot R)^2 - R^2\omega^2.$$
Thus the angle between $\omega$ and $R$ is $0$ or $\pi$, and the center of mass must lie on the axis of rotation. This is the condition of static balance if the axis of rotation is horizontal in a gravitational field.

[6] This should be proven in any linear algebra course. For example, see [1], Theorem 6 in Section 6.3.
[7] Here we are ignoring any constant force compensating the force exerted by the road which is holding the car up!

Consider a car
tire: to be stable at rest at any angle, $R$ must lie on the axis, or there will be a gravitational torque about the axis, causing rotation in the absence of friction. If the tire is not statically balanced, this force will rotate rapidly with the tire, leading to vibrations of the car.

Even if the net force is 0, there might be a torque, $\tau = \dot L = d(I \cdot \omega)/dt$. If $I \cdot \omega$ is not parallel to $\omega$ it will rotate with the wheel, and so $\dot L$ will rapidly oscillate. This is also not good for your axle. If, however, $\omega$ is parallel to one of the principal axes, $I \cdot \omega$ is parallel to $\omega$, so if $\omega$ is constant, so is $L$, and $\tau = 0$. The process of placing small weights around the tire to cause one of the principal axes to be aligned with the axle is called dynamical balancing.

Every rigid body has its principal axes; the problem of finding them and the moments of inertia about them, given the inertia tensor $I_{ij}$ in some coordinate system, is a mathematical question of finding a rotation $R$ and "eigenvalues" $I_1, I_2, I_3$ (not components of a vector) such that equation 4.24 holds, with $R$ in place of $O$. The vector $v_1 = R\begin{pmatrix}1\\0\\0\end{pmatrix}$ is then an eigenvector, for
$$I \cdot v_1 = R\,I_D\,R^{-1}\,R\begin{pmatrix}1\\0\\0\end{pmatrix} = R\,I_D\begin{pmatrix}1\\0\\0\end{pmatrix} = I_1\,R\begin{pmatrix}1\\0\\0\end{pmatrix} = I_1 v_1.$$
Similarly $I \cdot v_2 = I_2 v_2$ and $I \cdot v_3 = I_3 v_3$, where $v_2$ and $v_3$ are defined the same way, starting with $\hat e_2$ and $\hat e_3$ instead of $\hat e_1$. Note that, in general, $I$ acts simply as a multiplier only for multiples of these three vectors individually, and not for sums of them. On a more general vector $I$ will change the direction as well as the length.

Note that the $I_i$ are all $\geq 0$, for given any vector $n$,
$$n \cdot I \cdot n = \sum_\alpha m_\alpha\left[r_\alpha^2 n^2 - (r_\alpha \cdot n)^2\right] = \sum_\alpha m_\alpha r_\alpha^2 n^2 \left(1 - \cos^2\theta_\alpha\right) \geq 0,$$
so all the eigenvalues must be $\geq 0$. An eigenvalue can equal zero only if all massive points of the body lie in the $\pm n$ directions, in which case the rigid body must be a thin line.
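Numerically, finding the principal axes is exactly the symmetric eigenvalue problem. A minimal sketch (the numerical entries of $I$ below are arbitrary illustrative values, not from the text):

```python
import numpy as np

# An arbitrary real symmetric matrix standing in for an inertia tensor
# expressed in some non-principal body coordinate system.
I = np.array([[3.0, -1.0, 0.5],
              [-1.0, 4.0, 0.2],
              [0.5, 0.2, 5.0]])

# eigh diagonalizes a real symmetric matrix: I = O I_D O^T, O orthogonal.
moments, O = np.linalg.eigh(I)   # principal moments (ascending), axes as columns
I_D = np.diag(moments)

assert np.allclose(O @ I_D @ O.T, I)      # Eq. (4.24), with O^{-1} = O^T
assert np.allclose(O.T @ O, np.eye(3))    # the principal axes are orthonormal
assert np.all(moments >= 0)               # this example is positive definite
# Each column of O is an eigenvector: I . v_i = I_i v_i
for i in range(3):
    assert np.allclose(I @ O[:, i], moments[i] * O[:, i])
```

If `det(O)` comes out $-1$, one column can be flipped in sign to obtain a proper rotation, as the text's remark about commuting $P$ through $I_D$ suggests.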
Finding the eigenvalues $I_i$ is easier than finding the rotation $R$. Consider the matrix $I - \lambda \mathbb{1}$, which has the same eigenvectors as $I$, but with eigenvalues $I_i - \lambda$. Then if $\lambda$ is one of the eigenvalues $I_i$, this matrix will annihilate $v_i$, so $I - \lambda \mathbb{1}$ is a singular matrix with zero determinant. Thus the equation $\det(I - \lambda \mathbb{1}) = 0$, which is a cubic equation in $\lambda$, gives as its roots the eigenvalues of $I$.

4.4 Dynamics

4.4.1 Euler's Equations

So far, we have been working in an inertial coordinate system $\mathcal{O}$. In complicated situations this is rather unnatural; it is more natural to use a coordinate system $\mathcal{O}'$ fixed in the rigid body. In such a coordinate system, the vector one gets by differentiating the coefficients of a vector $b = \sum_i b'_i \hat e'_i$ differs from the inertial derivative $\dot b$ as given in Eq. 4.7. For the time derivative of the angular momentum, we have
$$\tau = \frac{dL}{dt} = \left(\frac{dL}{dt}\right)_b + \omega \times L = \sum_{ij} \frac{d(I_{ij}\omega_j)}{dt}\,\hat e_i + \omega \times (I \cdot \omega),$$
where we have either a system rotating about a fixed point $R$, with $\tau$, $L$, and $I_{ij}$ all evaluated about that fixed point, or we are working about the center of mass, with $\tau$, $L$, and $I_{ij}$ all evaluated about the center of mass, even if it is in motion. Now in the $\mathcal{O}'$ frame, all the masses are at fixed positions, so $I_{ij}$ is constant, and the first term is simply $I \cdot (d\omega/dt)_b$, which by (4.8) is simply $I \cdot \dot\omega$. Thus we have (in the body coordinate system)
$$\tau = I \cdot \dot\omega + \omega \times (I \cdot \omega). \eqno(4.25)$$
We showed that there is always a choice of cartesian coordinates mounted on the body along the principal axes. For the rest of this section we will use this body-fixed coordinate system, so we will drop the primes.

The torque not only determines the rate of change of the angular momentum, but also does work in the system. For a system rotating
about a fixed point, we see from the expression (4.13), $T = \frac{1}{2}\omega \cdot I \cdot \omega$, that
$$\frac{dT}{dt} = \frac{1}{2}\dot\omega \cdot I \cdot \omega + \frac{1}{2}\omega \cdot \dot I \cdot \omega + \frac{1}{2}\omega \cdot I \cdot \dot\omega.$$
The first and last terms are equal because the inertia tensor is symmetric, $I_{ij} = I_{ji}$, and the middle term vanishes in the body-fixed coordinate system because all particle positions are fixed. Thus $dT/dt = \omega \cdot I \cdot \dot\omega = \omega \cdot \dot L = \omega \cdot \tau$. Thus the kinetic energy changes due to the work done by the external torque. Therefore, of course, if there is no torque the kinetic energy is constant.

We will write out explicitly the components of Eq. 4.25. In evaluating $\tau_1$, we need the first component of the second term,
$$\left[(\omega_1, \omega_2, \omega_3) \times (I_1\omega_1, I_2\omega_2, I_3\omega_3)\right]_1 = (I_3 - I_2)\,\omega_2\omega_3.$$
Inserting this and the similar expressions for the other components into Eq. (4.25), we get Euler's equations
$$\tau_1 = I_1\dot\omega_1 + (I_3 - I_2)\,\omega_2\omega_3,$$
$$\tau_2 = I_2\dot\omega_2 + (I_1 - I_3)\,\omega_1\omega_3, \eqno(4.26)$$
$$\tau_3 = I_3\dot\omega_3 + (I_2 - I_1)\,\omega_1\omega_2.$$
Using these equations we can address several situations of increasing difficulty.

First, let us ask under what circumstances the angular velocity will be fixed in the absence of a torque. As $\tau = \dot\omega = 0$, from the 1-component equation we conclude that $(I_2 - I_3)\,\omega_2\omega_3 = 0$. Then either the moments are equal ($I_2 = I_3$) or one of the two components $\omega_2$ or $\omega_3$ must vanish. Similarly, if $I_1 = I_2$, either $\omega_1$ or $\omega_2$ vanishes. So the only way more than one component of $\omega$ can be nonzero is if two or more of the principal moments are equal. In this case, the principal axes are not uniquely determined. For example, if $I_1 = I_2 \neq I_3$, the third axis is unambiguously required as one of the principal axes, but any direction in the (12)-plane will serve as the second principal axis. In this case we see that $\tau = \dot\omega = 0$ implies either $\omega$ is along the $z$-axis ($\omega_1 = \omega_2 = 0$) or it lies in the (12)-plane ($\omega_3 = 0$). In any case, the angular velocity is
constant in the absence of torques only if it lies along a principal axis of the body.

As our next example, consider an axially symmetric body with no external forces or torques acting on it. Then $R$ is a constant, and we will choose to work in an inertial frame where $R$ is fixed at the origin. Choosing our body-fixed coordinates with $z$ along the axis of symmetry, our axes are principal ones and $I_1 = I_2$, so we have
$$I_1\dot\omega_1 = (I_1 - I_3)\,\omega_2\omega_3,$$
$$I_1\dot\omega_2 = (I_3 - I_1)\,\omega_1\omega_3,$$
$$I_3\dot\omega_3 = (I_1 - I_2)\,\omega_1\omega_2 = 0.$$
We see that $\omega_3$ is a constant. Let $\Omega = \omega_3(I_3 - I_1)/I_1$. Then we see that
$$\dot\omega_1 = -\Omega\,\omega_2, \qquad \dot\omega_2 = \Omega\,\omega_1.$$
Differentiating the first and plugging into the second, we find
$$\ddot\omega_1 = -\Omega\,\dot\omega_2 = -\Omega^2\omega_1,$$
which is just the harmonic oscillator equation. So $\omega_1 = A\cos(\Omega t + \phi)$ with some arbitrary amplitude $A$ and constant phase $\phi$, and $\omega_2 = -\dot\omega_1/\Omega = A\sin(\Omega t + \phi)$. We see that, in the body-fixed frame, the angular velocity rotates about the axis of symmetry in a circle, with arbitrary radius $A$, and a period $2\pi/\Omega$. The angular velocity vector $\omega$ is therefore sweeping out a cone, called the body cone of precession, with a half-angle $\phi_b = \tan^{-1}(A/\omega_3)$. Note the length of $\omega$ is fixed.

What is happening in the lab frame? The kinetic energy $\frac{1}{2}\omega \cdot L$ is constant, as is the vector $L$ itself. As the length of a vector is frame independent, $|\omega|$ is fixed as well. Therefore the angle between them, called the lab angle, is constant,
$$\cos\phi_L = \frac{\omega \cdot L}{|\omega||L|} = \frac{2T}{|\omega||L|} = \text{constant}.$$
Thus $\omega$ rotates about $L$ in a cone, called the laboratory cone. Note that $\phi_b$ is the angle between $\omega$ and the $z$-axis of the body, while $\phi_L$ is the angle between $\omega$ and $L$, so they are not the same angle in two different coordinate systems.
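The analytic solution above can be checked against a direct numerical integration of Euler's equations (4.26) with $\tau = 0$. This is a sketch with arbitrary illustrative values of the moments and initial conditions; the fixed-step RK4 integrator is our own, not from the text:

```python
import numpy as np

# Axially symmetric free body: I1 = I2 != I3. Analytic solution (phase = 0):
# omega_1 = A cos(Omega t), omega_2 = A sin(Omega t), omega_3 constant.
I1, I3 = 2.0, 3.0          # arbitrary principal moments, with I2 = I1
A, w3 = 0.1, 5.0           # transverse amplitude and spin about symmetry axis
Omega = w3 * (I3 - I1) / I1

def wdot(w):
    # Euler's equations (4.26) with tau = 0 and I2 = I1
    return np.array([(I1 - I3) * w[1] * w[2] / I1,
                     (I3 - I1) * w[0] * w[2] / I1,
                     0.0])

w, dt = np.array([A, 0.0, w3]), 1e-4
for _ in range(10000):     # integrate to t = 1 with classical RK4
    k1 = wdot(w); k2 = wdot(w + dt/2*k1)
    k3 = wdot(w + dt/2*k2); k4 = wdot(w + dt*k3)
    w = w + dt/6*(k1 + 2*k2 + 2*k3 + k4)

t = 1.0
assert np.isclose(w[0], A * np.cos(Omega * t), atol=1e-6)
assert np.isclose(w[1], A * np.sin(Omega * t), atol=1e-6)
assert np.isclose(w[2], w3)     # omega_3 is a constant of the motion
```

The transverse component indeed precesses about the symmetry axis at the rate $\Omega = \omega_3 (I_3 - I_1)/I_1$, here $\Omega = 2.5$ in these units.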
The situation is a bit hard to picture. In the body frame it is hard to visualize $\omega$, although that is the negative of the angular velocity of the universe in that system. In the lab frame the body is instantaneously rotating about the axis $\omega$, but this axis is not fixed in the body. At any instant, the points on this line are not moving, and we may think of the body rolling without slipping on the lab cone, with $\omega$ the momentary line of contact. Thus the body cone rolls on the lab cone without slipping.

The Poinsot construction

This idea has an extension to the more general case where the body has no symmetry. The motion in this case can be quite complex, both for analytic solution, because Euler's equations are nonlinear, and to visualize, because the body is rotating and bobbing around in a complicated fashion. But as we are assuming there are no external forces or torques, the kinetic energy and total angular momentum vectors are constant, and this will help us understand the motion. To do so we construct an abstract object called the inertia ellipsoid. Working in the body frame, consider that the equation
$$2T = \sum_{ij} \omega_i I_{ij} \omega_j = f(\omega)$$
is a quadratic equation for $\omega$, with constant coefficients, which therefore determines an ellipsoid [8] in the space of possible values of $\omega$. This is called the inertia ellipsoid [9]. It is fixed in the body, and so if we were to scale it by some constant to change units from angular velocity to position, we could think of it as a fixed ellipsoid in the body itself, centered at the center of mass. At every moment the instantaneous value of $\omega$ must lie on this ellipsoid, so $\omega(t)$ sweeps out a path on this ellipsoid called the polhode.

[8] We assume the body is not a thin line, so that $I$ is a positive definite matrix (all its eigenvalues are strictly $> 0$), so the surface defined by this equation is bounded.
[9] Exactly which quantity forms the inertia ellipsoid varies by author. Goldstein scales $\omega$ by a constant $1/\sqrt{2T}$ to form an object $\rho$ whose ellipsoid he calls the inertia ellipsoid. Landau and Lifshitz discuss an ellipsoid of $L$ values but don't give it a name. They then call the corresponding path swept out by $\omega$ the polhode, as we do.
If we go to the lab frame, we see this ellipsoid fixed in and moving with the body. The instantaneous value of $\omega$ still lies on it. In addition, the component of $\omega$ in the (fixed) $L$ direction is fixed, and as the center of mass is fixed, the point corresponding to $\omega$ lies in a plane perpendicular to $L$, a fixed distance from the center of mass, known as the invariant plane. Finally we note that the normal to the surface of the ellipsoid $f(\omega) = 2T$ is parallel to $\nabla f = 2I \cdot \omega = 2L$, so the ellipsoid of inertia is tangent to the invariant plane at the point $\omega(t)$. The path that $\omega(t)$ sweeps out on the invariant plane is called the herpolhode. At this particular moment, the point corresponding to $\omega$ in the body is not moving, so the inertia ellipsoid is rolling, not slipping, on the invariant plane.

In general, if there is no special symmetry, the inertia ellipsoid will not be axially symmetric, so that in order to roll on the fixed plane and keep its center at a fixed point, it will need to bob up and down. But in the special case with axial symmetry, the inertia ellipsoid will also have this symmetry, so it can roll about a circle, with its symmetry axis at a fixed angle relative to the invariant plane. In the body frame, $\omega_3$ is fixed and the polhode moves on a circle of radius $A = \omega\sin\phi_b$. In the lab frame, $\omega$ rotates about $L$, so it sweeps out a circle of radius $\omega\sin\phi_L$ in the invariant plane. One circle is rolling on the other, and the polhode rotates about its circle at the rate $\Omega$ in the body frame, so the angular rate at which the herpolhode rotates about $L$, $\Omega_L$, is
$$\Omega_L = \Omega\,\frac{\text{circumference of polhode circle}}{\text{circumference of herpolhode circle}} = \omega_3\,\frac{I_3 - I_1}{I_1}\,\frac{\sin\phi_b}{\sin\phi_L}.$$

Stability of rotation about an axis

We have seen that the motion of an isolated rigid body is simple only if the angular velocity is along one of the principal axes, and can be very complex otherwise.
However, it is worth considering what happens if $\omega$ is very nearly, but not exactly, along one of the principal axes, say $z$. Then we may write $\omega = \omega_3\hat e_3 + \epsilon$ in the body coordinates, and assume $\epsilon_3 = 0$ and the other components are small. We treat Euler's equations to first order in the small quantity $\epsilon$. To this order, $\dot\omega_3 = (I_1 - I_2)\,\epsilon_1\epsilon_2/I_3 \approx 0$, so $\omega_3$ may be considered a constant. The other two equations give
$$\dot\omega_1 = \dot\epsilon_1 = \frac{I_2 - I_3}{I_1}\,\epsilon_2\,\omega_3,$$
$$\dot\omega_2 = \dot\epsilon_2 = \frac{I_3 - I_1}{I_2}\,\epsilon_1\,\omega_3,$$
so
$$\ddot\epsilon_1 = \frac{(I_2 - I_3)(I_3 - I_1)}{I_1 I_2}\,\omega_3^2\,\epsilon_1.$$
What happens to $\epsilon(t)$ depends on the sign of the coefficient, that is, the sign of $(I_2 - I_3)(I_3 - I_1)$. If it is negative, $\epsilon_1$ oscillates, and indeed rotates about $z$ just as we found for the symmetric top. This will be the case if $I_3$ is either the largest or the smallest eigenvalue. If, however, it is the middle eigenvalue, the constant will be positive, and the equation is solved by exponentials, one damping out and one growing. Unless the initial conditions are perfectly fixed, the growing piece will have a nonzero coefficient and will blow up. Thus a rotation about the intermediate principal axis is unstable, while motion about the axes with the largest and smallest moments is stable. For the case where two of the moments are equal, the motion will be stable about the third, and slightly unstable ($\epsilon$ will grow linearly instead of exponentially with time) about the others.

An interesting way of understanding this stability or instability of rotation close to a principal axis involves another ellipsoid we can define for the free rigid body, an ellipsoid of possible angular momentum values. Of course in the inertial coordinates $L$ is constant, but in body-fixed language the coordinates vary with time, though the length of $L$ is still constant. In addition, the conservation of kinetic energy,
$$2T = L \cdot I^{-1} \cdot L$$
(where $I^{-1}$ is the inverse of the moment of inertia matrix), gives a quadratic equation for the three components of $L$, just as we had for $\omega$ and the ellipsoid of inertia. The path of $L(t)$ on this ellipsoid lies on the intersection of the ellipsoid with a sphere of radius $|L|$, for the length is fixed.
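The stability result can be illustrated numerically. In this sketch (arbitrary moments $I_1 < I_2 < I_3$ and a hand-rolled RK4 step, not from the text) we spin a free asymmetric body almost exactly about each principal axis and watch whether the small transverse components stay small:

```python
import numpy as np

I = np.array([1.0, 2.0, 3.0])   # arbitrary distinct principal moments

def wdot(w):
    # Free Euler equations (4.26), tau = 0
    return np.array([(I[1]-I[2]) * w[1]*w[2] / I[0],
                     (I[2]-I[0]) * w[2]*w[0] / I[1],
                     (I[0]-I[1]) * w[0]*w[1] / I[2]])

def max_transverse(axis, t_end=20.0, dt=1e-3):
    """Largest off-axis |omega_i| seen while spinning nearly about `axis`."""
    w = np.full(3, 1e-4); w[axis] = 1.0
    worst = 0.0
    for _ in range(int(t_end / dt)):
        k1 = wdot(w); k2 = wdot(w + dt/2*k1)
        k3 = wdot(w + dt/2*k2); k4 = wdot(w + dt*k3)
        w = w + dt/6*(k1 + 2*k2 + 2*k3 + k4)
        worst = max(worst, max(abs(w[i]) for i in range(3) if i != axis))
    return worst

# Rotation about the axes of smallest and largest moment is stable:
assert max_transverse(0) < 1e-2
assert max_transverse(2) < 1e-2
# ...but about the intermediate axis the perturbation grows to order one:
assert max_transverse(1) > 0.1
```

For the intermediate axis the linearized growth rate here is $\omega_3\sqrt{(I_2-I_3)(I_3-I_1)/I_1 I_2}$ in the notation above, so a $10^{-4}$ perturbation reaches order one well within the integration time.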
If $\omega$ is near the principal axis with the largest moment of inertia, $L$ lies near the major axis of the ellipsoid. The sphere is nearly circumscribing the ellipsoid, so the intersection consists only of two small loops surrounding each end of the major axis. Similarly, if $\omega$ is near the axis with the smallest moment, the sphere is nearly inscribed in the ellipsoid, and again the possible values of $L$ lie close to either end of the minor axis. Thus the subsequent motion is confined to one of these small loops. But if $\omega$ starts near the intermediate principal axis, $L$ does likewise, and the intersection consists of two loops which extend from near one end to near the other of the intermediate axis, and the possible continuous motion of $L$ is not confined to a small region of the ellipsoid.

Because the rotation of the Earth flattens the poles, the Earth is approximately an oblate ellipsoid, with $I_3$ greater than $I_1 = I_2$ by about one part in 300. As $\omega_3$ is $2\pi$ per sidereal day, if $\omega$ is not perfectly aligned with the axis, it will precess about the symmetry axis once every 10 months. This Chandler wobble is not of much significance, however, because the body angle $\phi_b \approx 10^{-6}$.

4.4.2 Euler angles

Up to this point we have managed to describe the motion of a rigid body without specifying its coordinates. This is not possible for most problems with external forces, for which the torque will generally depend on the orientation of the body. It is time to face up to the problem of using three generalized coordinates to describe the orientation.

In section 4.1.1 we described the orientation of a rigid body in terms of a rotation through a finite angle in a given direction, specified by $\omega$. This does not give a simple parameterization of the matrix $A$, and it is more common to use an alternate description known as Euler angles.
Here we describe the rotation $A$ as a composition of three simpler rotations about specified coordinate axes, so that we are making a sequence of changes of coordinates
$$(x, y, z) \xrightarrow{\;R_z(\phi)\;} (x_1, y_1, z_1) \xrightarrow{\;R_{y_1}(\theta)\;} (x_2, y_2, z_2) \xrightarrow{\;R_{z_2}(\psi)\;} (x', y', z').$$
We have chosen three specific directions about which to make the three rotations, namely the original $z$-axis, the next $y$-axis, $y_1$, and then the new $z$-axis, which is both $z_2$ and $z'$. This choice is not universal, but is the one generally used in quantum mechanics. Many of the standard classical mechanics texts [10] take the second rotation to be about the $x_1$-axis instead of $y_1$, but quantum mechanics texts [11] avoid this because the action of $R_y$ on a spinor is real, while the action of $R_x$ is not. While this does not concern us here, we prefer to be compatible with quantum mechanics discussions.

[Figure 4.2: The Euler angles as rotations through $\phi$, $\theta$, $\psi$, about the $z$, $y_1$, and $z_2$ axes sequentially. The figure shows the $x, y, z$ and $x', y', z'$ axes, the three angles, and the line of nodes.]

This procedure is pictured in Figure 4.2. To see that any rotation can be written in this form, and to determine the range of the angles, we first discuss what fixes the $y_1$ axis. Notice that the rotation about the $z$-axis leaves $z$ unaffected, so $z_1 = z$. Similarly, the last rotation leaves

[10] See [2], [4], [6], [7], [8] and [12].
[11] For example [9] and [13].
the $z_2$ axis unchanged, so it is also the $z'$ axis. The planes orthogonal to these axes are also left invariant [12]. These planes, the $xy$-plane and the $x'y'$-plane respectively, intersect in a line called the line of nodes [13]. These planes are also the $x_1y_1$ and $x_2y_2$ planes respectively, and as the second rotation $R_{y_1}(\theta)$ must map the first into the second plane, we see that $y_1$, which is unaffected by $R_{y_1}$, must be along the line of nodes. We choose between the two possible orientations of $y_1$ to keep the necessary $\theta$ angle in $[0, \pi]$. The angles $\phi$ and $\psi$ are then chosen $\in [0, 2\pi)$ as necessary to map $y \to y_1$ and $y_1 \to y'$ respectively.

While the rotation about the $z$-axis leaves $z$ unaffected, it rotates the $x$ and $y$ components by the matrix (4.4). Thus in three dimensions, a rotation about the $z$-axis is represented by
$$R_z(\phi) = \begin{pmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}. \eqno(4.27)$$
Similarly a rotation through an angle $\theta$ about the current $y$-axis has the form
$$R_y(\theta) = \begin{pmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{pmatrix}. \eqno(4.28)$$
The reader needs to assure himself, by thinking of the rotations as active transformations, that the action of the matrix $R_y$ after having applied $R_z$ produces a rotation about the $y_1$-axis, not the original $y$-axis.

The full rotation $A = R_z(\psi) \cdot R_y(\theta) \cdot R_z(\phi)$ can then be found simply by matrix multiplication:
$$A(\phi, \theta, \psi) = \begin{pmatrix} \cos\psi & \sin\psi & 0 \\ -\sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{pmatrix}\begin{pmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$= \begin{pmatrix} -\sin\phi\sin\psi + \cos\theta\cos\phi\cos\psi & \cos\phi\sin\psi + \cos\theta\sin\phi\cos\psi & -\sin\theta\cos\psi \\ -\sin\phi\cos\psi - \cos\theta\cos\phi\sin\psi & \cos\phi\cos\psi - \cos\theta\sin\phi\sin\psi & \sin\theta\sin\psi \\ \sin\theta\cos\phi & \sin\theta\sin\phi & \cos\theta \end{pmatrix}. \eqno(4.29)$$

[12] although the points in the planes are rotated, as in (4.4).
[13] The case where the $xy$ and $x'y'$ planes are identical, rather than intersecting in a line, is exceptional, corresponding to $\theta = 0$ or $\theta = \pi$. Then the two rotations about the $z$-axis add or subtract, and many choices for the Euler angles $(\phi, \psi)$ will give the same full rotation.

We need to reexpress the kinetic energy in terms of the Euler angles and their time derivatives. From the discussion of section 4.2, we have
$$\Omega = -A(t) \cdot \frac{d}{dt}A^{-1}(t).$$
The inverse matrix is simply the transpose, so finding $\Omega$ can be done by straightforward differentiation and matrix multiplication [14]. The result is
$$\Omega = \begin{pmatrix} 0 & \dot\psi + \dot\phi\cos\theta & -\dot\theta\cos\psi - \dot\phi\sin\theta\sin\psi \\ -\dot\psi - \dot\phi\cos\theta & 0 & \dot\theta\sin\psi - \dot\phi\sin\theta\cos\psi \\ \dot\theta\cos\psi + \dot\phi\sin\theta\sin\psi & -\dot\theta\sin\psi + \dot\phi\sin\theta\cos\psi & 0 \end{pmatrix}. \eqno(4.30)$$
Note $\Omega$ is antisymmetric as expected, so it can be recast into the axial vector $\omega$:
$$\omega_1 = \Omega_{23} = \dot\theta\sin\psi - \dot\phi\sin\theta\cos\psi,$$
$$\omega_2 = \Omega_{31} = \dot\theta\cos\psi + \dot\phi\sin\theta\sin\psi, \eqno(4.31)$$
$$\omega_3 = \Omega_{12} = \dot\psi + \dot\phi\cos\theta.$$
This expression for $\omega$ gives the necessary velocities for the kinetic energy term (4.19 or 4.21) in the Lagrangian, which becomes
$$L = \frac{1}{2}MV^2 + MV \cdot \omega \times B + \frac{1}{2}\,\omega \cdot I^{(\tilde R)} \cdot \omega - U(R, \theta, \psi, \phi), \eqno(4.32)$$
or
$$L = \frac{1}{2}MV^2 + \frac{1}{2}\,\omega \cdot I^{\rm(cm)} \cdot \omega - U(R, \theta, \psi, \phi), \eqno(4.33)$$
with $\omega = \sum_i \omega_i \hat e_i$ given by (4.31).

[14] Verifying the above expression for $A$ and the following one for $\Omega$ is a good application for a student having access to a good symbolic algebra computer program. Both Mathematica and Maple handle the problem nicely.
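The matrix product (4.29) is easy to spot-check numerically. A minimal sketch, building $R_z$ and $R_y$ exactly as in (4.27) and (4.28) (the test angles are arbitrary):

```python
import numpy as np

def Rz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])   # Eq. (4.27)

def Ry(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]])   # Eq. (4.28)

def A(phi, theta, psi):
    return Rz(psi) @ Ry(theta) @ Rz(phi)                  # Eq. (4.29)

phi, theta, psi = 0.3, 0.7, 1.1     # arbitrary test angles
M = A(phi, theta, psi)

assert np.allclose(M @ M.T, np.eye(3))       # orthogonal
assert np.isclose(np.linalg.det(M), 1.0)     # a proper rotation
# Spot-check entries against the explicit matrix in (4.29):
assert np.isclose(M[2, 2], np.cos(theta))
assert np.isclose(M[2, 0], np.sin(theta) * np.cos(phi))
assert np.isclose(M[0, 2], -np.sin(theta) * np.cos(psi))
```

The same three functions can be differentiated numerically to check the antisymmetric matrix (4.30), as footnote 14 suggests doing symbolically.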
4.4.3 The symmetric top

Now let us consider an example with external forces which constrain one point of a symmetrical top to be stationary. We choose this to be the fixed point, at the origin $\tilde R = 0$, and we choose the body-fixed $z'$-axis to be along the axis of symmetry. Of course the center of mass is on this axis, so $R = (0, 0, \ell)$ in body-fixed coordinates. We will set up the motion by writing the Lagrangian from the forms for the kinetic and potential energy, due entirely to the gravitational field [15]:
$$T = \frac{1}{2}\left(\omega_1^2 + \omega_2^2\right)I_1 + \frac{1}{2}\omega_3^2 I_3 = \frac{1}{2}\left(\dot\phi^2\sin^2\theta + \dot\theta^2\right)I_1 + \frac{1}{2}\left(\dot\phi\cos\theta + \dot\psi\right)^2 I_3, \eqno(4.34)$$
$$U = Mg z_{\rm cm} = Mg\ell\left(A^{-1}\right)_{zz} = Mg\ell\cos\theta. \eqno(4.35)$$
So $L = T - U$ is independent of $\phi$ and $\psi$, and the corresponding momenta
$$p_\phi = \dot\phi\sin^2\theta\,I_1 + \left(\dot\phi\cos\theta + \dot\psi\right)\cos\theta\,I_3 = \dot\phi\sin^2\theta\,I_1 + \cos\theta\,\omega_3 I_3,$$
$$p_\psi = \left(\dot\phi\cos\theta + \dot\psi\right)I_3 = \omega_3 I_3$$
are constants of the motion. Let us use the parameters $a = p_\psi/I_1$ and $b = p_\phi/I_1$, which are more convenient, to parameterize the motion, instead of $p_\phi$, $p_\psi$, or even $\omega_3$, which is also a constant of the motion and might seem physically a more natural choice. A third constant of the motion is the energy,
$$E = T + U = \frac{1}{2}I_1\left(\dot\theta^2 + \dot\phi^2\sin^2\theta\right) + \frac{1}{2}\omega_3^2 I_3 + Mg\ell\cos\theta.$$
Solving for $\dot\phi$ from $p_\phi = I_1 b = \dot\phi\sin^2\theta\,I_1 + I_1 a\cos\theta$,
$$\dot\phi = \frac{b - a\cos\theta}{\sin^2\theta}, \eqno(4.36)$$
$$\dot\psi = \omega_3 - \dot\phi\cos\theta = \frac{I_1 a}{I_3} - \frac{b - a\cos\theta}{\sin^2\theta}\cos\theta. \eqno(4.37)$$

[15] As we did in discussing Euler's equations, we drop the primes on $\omega_i$ and on $I_{ij}$ even though we are evaluating these components in the body-fixed coordinate system. The coordinate $z$, however, is still a lab coordinate, with $\hat e_z$ pointing upward.
Then $E$ becomes
$$E = \frac{1}{2}I_1\dot\theta^2 + U'(\theta) + \frac{1}{2}I_3\omega_3^2,$$
where
$$U'(\theta) := \frac{1}{2}I_1\,\frac{(b - a\cos\theta)^2}{\sin^2\theta} + Mg\ell\cos\theta.$$
The term $\frac{1}{2}I_3\omega_3^2$ is an ignorable constant, so we consider $E' := E - \frac{1}{2}I_3\omega_3^2$ as the third constant of the motion, and we now have a one dimensional problem for $\theta(t)$, with a first integral of the motion. Once we solve for $\theta(t)$, we can plug back in to find $\dot\phi$ and $\dot\psi$.

Substitute $u = \cos\theta$, $\dot u = -\sin\theta\,\dot\theta$, so
$$E' = \frac{I_1\dot u^2}{2(1 - u^2)} + \frac{1}{2}I_1\,\frac{(b - au)^2}{1 - u^2} + Mg\ell u,$$
or
$$\dot u^2 = (1 - u^2)(\alpha - \beta u) - (b - au)^2 =: f(u), \eqno(4.38)$$
with $\alpha = 2E'/I_1$, $\beta = 2Mg\ell/I_1$.

$f(u)$ is a cubic with a positive $u^3$ term, and is negative at $u = \pm 1$, where the first term vanishes; these points are also the limits of the physical range of values of $u$. If there are to be any allowed values for $\dot u^2$, $f(u)$ must be nonnegative somewhere in $u \in [-1, 1]$, so $f$ must have two roots, $\cos\theta_{\max} \leq \cos\theta_{\min}$, in that interval, with $f \geq 0$ between them.

To visualize what is happening, note that a point on the symmetry axis moves on a sphere, with $\theta$ and $\phi$ representing the usual spherical coordinates, as can be seen by examining what $A^{-1}$ does to $(0, 0, z')$. So as $\theta$ moves back and forth between $\theta_{\min}$ and $\theta_{\max}$, the top is wobbling closer to and further from the vertical, called nutation. At the same time, the symmetry axis is precessing, rotating about the vertical
axis, at a rate $\dot\phi$ which is not constant but a function of $\theta$ (Eq. 4.36).

[Figure 4.3: Possible loci for a point on the symmetry axis of the top, for three kinds of motion. The axis nutates between $\theta_{\min} = 50^\circ$ and $\theta_{\max} = 60^\circ$, with $\theta' = 52^\circ$, $\theta' = 44^\circ$, and $\theta' = \theta_{\min}$ in the three panels respectively.]

Qualitatively we may distinguish three kinds of motion, depending on the values of $\dot\phi$ at the turning points in $\theta$. These in turn depend on the initial conditions and the parameters of the top, expressed in $a$, $b$, and $\theta_{\min}$, $\theta_{\max}$. If the value of $u = \cos\theta$ at which $\dot\phi$ vanishes is within the range of nutation, then the precession will be in different directions at $\theta_{\min}$ and $\theta_{\max}$, and the motion is as in Fig. 4.3a. On the other hand, if $\theta' = \cos^{-1}(b/a) \notin [\theta_{\min}, \theta_{\max}]$, the precession will always be in the same direction, although it will speed up and slow down. We then get a motion as in Fig. 4.3b. Finally, it is possible that $\cos\theta_{\min} = b/a$, so that the precession stops at the top of the wobble, as in Fig. 4.3c. This special case is of interest, because if the top's axis is held still at an angle to the vertical, and then released, this is the motion we will get.

Exercises

4.1 Prove the following properties of matrix algebra:
(a) Matrix multiplication is associative: $A \cdot (B \cdot C) = (A \cdot B) \cdot C$.
(b) $(A \cdot B)^T = B^T \cdot A^T$, where $A^T$ is the transpose of $A$, that is $(A^T)_{ij} := A_{ji}$.
(c) If $A^{-1}$ and $B^{-1}$ exist, $(A \cdot B)^{-1} = B^{-1} \cdot A^{-1}$.
(d) The complex conjugate of a matrix, $(A^*)_{ij} = A_{ij}^*$, is the matrix with every element complex conjugated. The hermitean conjugate $A^\dagger$ is the transpose of that, $A^\dagger := (A^*)^T = (A^T)^*$, with $(A^\dagger)_{ij} := A^*_{ji}$. Show that $(A \cdot B)^* = A^* \cdot B^*$ and $(A \cdot B)^\dagger = B^\dagger \cdot A^\dagger$.

4.2 In section (4.1) we considered reexpressing a vector $V = \sum_i V_i \hat e_i$ in terms of new orthogonal basis vectors. If the new vectors are $\hat e'_i = \sum_j A_{ij} \hat e_j$, we can also write $\hat e_i = \sum_j A_{ji} \hat e'_j$, because $A^T = A^{-1}$ for an orthogonal transformation.
Consider now using a new basis $\hat e'_i$ which are not orthonormal. Then we must choose which of the two above expressions to generalize. Let $\hat e_i = \sum_j A_{ji} \hat e'_j$, and find the expressions for (a) $\hat e'_j$ in terms of $\hat e_i$; (b) $V'_i$ in terms of $V_j$; and (c) $V_i$ in terms of $V'_j$. Then show (d) that if a linear transformation $T$ which maps vectors $V \to W$ is given in the $\hat e_i$ basis by a matrix $B_{ij}$, in that $W_i = \sum_j B_{ij} V_j$, then the same transformation $T$ in the $\hat e'_i$ basis is given by $C = A \cdot B \cdot A^{-1}$. This transformation of matrices, $B \to C = A \cdot B \cdot A^{-1}$, for an arbitrary invertible matrix $A$, is called a similarity transformation.

4.3 Two matrices $B$ and $C$ are called similar if there exists an invertible matrix $A$ such that $C = A \cdot B \cdot A^{-1}$; this transformation of $B$ into $C$ is called a similarity transformation, as in the last problem. Show that, if $B$ and $C$ are similar,
(a) $\mathrm{Tr}\,B = \mathrm{Tr}\,C$;
(b) $\det B = \det C$;
(c) $B$ and $C$ have the same eigenvalues;
(d) if $A$ is orthogonal and $B$ is symmetric (or antisymmetric), then $C$ is symmetric (or antisymmetric).

4.4 From the fact that $A \cdot A^{-1} = \mathbb{1}$ for any invertible matrix, show that if $A(t)$ is a differentiable matrix-valued function of time,
$$\dot A\,A^{-1} = -A\,\frac{dA^{-1}}{dt}.$$

4.5 Show that a counterclockwise rotation through an angle $\theta$ about an axis in the direction of a unit vector $\hat n$ passing through the origin is given by the matrix
$$A_{ij} = \delta_{ij}\cos\theta + n_i n_j(1 - \cos\theta) - \epsilon_{ijk}\,n_k\sin\theta.$$
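As a numerical sanity check on the axis-angle formula of exercise 4.5 (this is a sketch, not a proof; the test axis and angle are arbitrary):

```python
import numpy as np

eps = np.zeros((3, 3, 3))   # Levi-Civita symbol
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1

def axis_angle(n, theta):
    """A_ij = delta_ij cos(theta) + n_i n_j (1 - cos) - eps_ijk n_k sin(theta)."""
    n = np.asarray(n, dtype=float); n /= np.linalg.norm(n)
    return (np.eye(3) * np.cos(theta) + np.outer(n, n) * (1 - np.cos(theta))
            - np.einsum('ijk,k->ij', eps, n) * np.sin(theta))

theta = 0.4
# About z this gives the active counterclockwise rotation of a vector
# (the transpose of (4.27), which instead rotates the coordinate axes):
assert np.allclose(axis_angle([0, 0, 1], theta),
                   [[np.cos(theta), -np.sin(theta), 0],
                    [np.sin(theta),  np.cos(theta), 0],
                    [0, 0, 1]])

# For a generic axis: orthogonal, leaves n fixed, and rotates a vector
# perpendicular to n through the angle theta.
n = np.array([1.0, 2.0, 2.0]) / 3.0
A = axis_angle(n, theta)
v = np.array([2.0, -1.0, 0.0])           # perpendicular to n
assert np.allclose(A @ A.T, np.eye(3))
assert np.allclose(A @ n, n)
assert np.isclose(np.dot(A @ v, v), np.cos(theta) * np.dot(v, v))
```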
4.6 Consider a rigid body in the shape of a right circular cone of height $h$ and a base which is a circle of radius $R$, made of matter with a uniform density $\rho$.
(a) Find the position of the center of mass. Be sure to specify with respect to what.
(b) Find the moment of inertia tensor in some suitable, well specified coordinate system about the center of mass.
(c) Initially the cone is spinning about its symmetry axis, which is in the $z$ direction, with angular velocity $\omega_0$, and with no external forces or torques acting on it. At time $t = 0$ it is hit with a momentary laser pulse which imparts an impulse $P$ in the $x$ direction at the apex of the cone, as shown. Describe the subsequent force-free motion, including, as a function of time, the angular velocity, angular momentum, and the position of the apex, in any inertial coordinate system you choose, provided you spell out the relation to the initial inertial coordinate system.

[The figure shows the cone with height $h$ along the $z$-axis, base radius $R$, and the impulse $P$ applied in the $x$ direction at the apex.]

4.7 We defined the general rotation as $A = R_z(\psi) \cdot R_y(\theta) \cdot R_z(\phi)$. Work out the full expression for $A(\phi, \theta, \psi)$, and verify the last expression in (4.29). [For this and exercise 4.8, you might want to use a computer algebra program such as Mathematica or Maple, if one is available.]

4.8 Find the expression for $\omega$ in terms of $\phi$, $\theta$, $\psi$, $\dot\phi$, $\dot\theta$, $\dot\psi$. [This can be done simply with computer algebra programs. If you want to do this by hand, you might find it easier to use the product form $A = R_3 R_2 R_1$, and the rather simpler expressions for $\dot R\,R^T$. You will still need to bring the result (for $\dot R_1 R_1^T$, for example) through the other rotations, which is somewhat messy.]

4.9 A diamond shaped object is shown in top, front, and side views. It is an octahedron, with 8 triangular flat faces.
It is made of solid aluminum of uniform density, with a total mass $M$. The dimensions, as shown, satisfy $h > b > a$.
(a) Find the moment of inertia tensor about the center of mass, clearly specifying the coordinate system chosen.
(b) About which lines can a stable spinning motion, with fixed $\omega$, take place, assuming no external forces act on the body?

[The figure shows the octahedron in top, front, and side views, with vertices labeled $A, A', B, B', C, C'$ and dimensions $a$, $b$, $h$.]

4.10 From the expression (4.38) for $u = \cos\theta$ for the motion of the symmetric top, we can derive a function for the time $t(u)$ as an indefinite integral
$$t(u) = \int^u f^{-1/2}(z)\,dz.$$
For values which are physically realizable, the function $f$ has two (generically distinct) roots, $u_X \leq u_N$, in the interval $u \in [-1, 1]$, and one root $u_U \in [1, \infty)$, which does not correspond to a physical value of $\theta$. The integrand is then generically an analytic function of $z$ with square root branch points at $u_N$, $u_X$, $u_U$, and $\infty$, which we can represent on a cut Riemann sheet with cuts on the real axis, $[-\infty, u_X]$ and $[u_N, u_U]$, and $f(u) > 0$ for $u \in (u_X, u_N)$. Taking $t = 0$ at the time the top is at the bottom of a wobble, $\theta = \theta_{\max}$, $u = u_X$, we can find the time at which it first reaches another $u \in [u_X, u_N]$ by integrating along the real axis. But we could also use any other path in the upper half plane, as the integral of a complex function is independent of deformations of the path through regions where the function is analytic.
(a) Extend this definition to a function $t(u)$ defined for $\mathrm{Im}\,u \geq 0$, with $u$ not on a cut, and show that the image of this function is a rectangle in the complex $t$ plane, and identify the pre-images of the sides. Call the width $T/2$ and the height $\tau/2$.
(b) Extend this function to the lower half of the same Riemann sheet by allowing contour integrals passing through $[u_X, u_N]$, and show that this extends the image in $t$ to the rectangle $(0, T/2) \times (-i\tau/2, i\tau/2)$.
(c) If the coutour passes through the cut (−∞, uX ] onto the second Riemann sheet, the integrand has the opposite sign from what it would have at the
corresponding point of the first sheet. Show that if the path takes this route onto the second sheet and reaches the point u, the value t1(u) thus obtained is t1(u) = −t0(u), where t0(u) is the value obtained in (a) or (b) for the same u on the first Riemann sheet.
(d) Show that passing to the second Riemann sheet by going through the cut [uN, uU] instead produces t2(u) = t1 + T.
(e) Show that evaluating the integral along two contours, Γ1 and Γ2, which differ only by Γ1 circling the [uN, uU] cut clockwise once more than Γ2 does, gives t1 = t2 + iτ.
(f) Show that any value of t can be reached by some path, by circling the [uN, uU] cut as many times as necessary, and also by passing downwards through it and upwards through the [−∞, uX] cut as often as necessary (perhaps reversed).
(g) Argue that this means the function u(t) is an analytic function from the complex t plane into the complex u plane, analytic except at the points t = nT + i(m + 1/2)τ, where u(t) has double poles. Note this function is doubly periodic, with u(t) = u(t + nT + imτ).
(h) Show that the function is then given by u = β℘(t − iτ/2) + c, where c is a constant, β is the constant from (4.38), and
$$ \wp(z) = \frac{1}{z^2} + \sum_{\substack{m,n\in\mathbb{Z}\\ (m,n)\neq(0,0)}} \left[\frac{1}{(z - nT - mi\tau)^2} - \frac{1}{(nT + mi\tau)^2}\right] $$
is the Weierstrass ℘-function.
(i) Show that ℘ satisfies the differential equation
$$ \wp'^2 = 4\wp^3 - g_2\wp - g_3, $$
where
$$ g_2 = \sum_{\substack{m,n\in\mathbb{Z}\\ (m,n)\neq(0,0)}} (mT + in\tau)^{-4}, \qquad g_3 = \sum_{\substack{m,n\in\mathbb{Z}\\ (m,n)\neq(0,0)}} (mT + in\tau)^{-6}. $$
[Note that the Weierstrass function is defined more generally, using parameters ω1 = T/2, ω2 = iτ/2, with the ω's permitted to be arbitrary complex numbers with differing phases.]

4.11 As a rotation about the origin maps the unit sphere into itself, one way to describe rotations is as a subset of maps f : S² → S² of the (surface of the) unit sphere into itself. Those which correspond to rotations are clearly
  • 131. 124 CHAPTER 4. RIGID BODY MOTION one-to-one, continuous, and preserve the angle between any two paths which intersect at a point. This is called a conformal map. In addition, rotations preserve the distances between points. In this problem we show how to describe such mappings, and therefore give a representation for the rotations in three dimensions. (a) Let N be the north pole (0, 0, 1) of the unit sphere Σ = {(x, y, z), x2 + y 2 + z 2 = 1}. Define the map from the rest of the sphere s : Σ − {N } → R2 given by a stereographic projection, which maps each point on the unit sphere, other than the north pole, into the point (u, v) in the equatorial plane (x, y, 0) by giving the intersection with this plane of the straight line which joins the point (x, y, z) ∈ Σ to the north pole. Find (u, v) as a function of (x, y, z), and show that the lengths of infinitesimal paths in the vicinity of a point are scaled by a factor 1/(1 − z) independent of direction, and therefore that the map s preserves the angles between intersecting curves (i.e. is conformal). (b) Show that the map f ((u, v)) → (u , v ) which results from first applying s−1 , then a rotation, and then s, is a conformal map from R2 into R2 , except for the pre-image of the point which gets mapped into the north pole by the rotation. By a general theorem of complex variables, any such map is analytic, so f : u + iv → u + iv is an analytic function except at the point ξ0 = u0 + iv0 which is mapped to infinity, and ξ0 is a simple pole of f . Show that f (ξ) = (aξ + b)/(ξ − ξ0 ), for some complex a and b. This is the set of complex Mobius transformations, which are usually rewritten as αξ + β f (ξ) = , γξ + δ where α, β, γ, δ are complex constants. An overall complex scale change does not affect f , so the scale of these four complex constants is generally fixed by imposing a normalizing condition αδ − βγ = 1. 
(c) Show that composition of Möbius transformations f′′ = f′ ◦ f : ξ → ξ′ → ξ′′ is given by matrix multiplication,
$$ \begin{pmatrix} \alpha'' & \beta'' \\ \gamma'' & \delta'' \end{pmatrix} = \begin{pmatrix} \alpha' & \beta' \\ \gamma' & \delta' \end{pmatrix} \cdot \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}. $$
(d) Not every mapping s⁻¹ ◦ f ◦ s is a rotation, for rotations need to preserve distances as well. We saw that an infinitesimal distance dℓ on Σ is mapped by s to a distance |dξ| = dℓ/(1 − z). Argue that the condition that f : ξ → ξ̃
correspond to a rotation is that dℓ̃ ≡ (1 − z̃)|df/dξ||dξ| = dℓ. Express this change of scale in terms of ξ and ξ̃ rather than z and z̃, and find the conditions on α, β, γ, δ that insure this is true for all ξ. Together with the normalizing condition, show that this requires the matrix for f to be a unitary matrix with determinant 1, so that the set of rotations corresponds to the group SU(2). The matrix elements are called Cayley–Klein parameters, and the real and imaginary parts of them are called the Euler parameters.
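Part (a) of problem 4.11 can also be explored numerically. The sketch below (Python with NumPy, not part of the text) checks that the stereographic image of a small step taken at a generic point of the sphere is stretched by the factor 1/(1 − z), whichever tangent direction is chosen:

```python
import numpy as np

def stereo(p):
    # stereographic projection from the north pole onto the equatorial plane
    x, y, z = p
    return np.array([x, y]) / (1.0 - z)

theta, phi = 1.1, 0.7                 # a generic point, away from the poles
p = np.array([np.sin(theta) * np.cos(phi),
              np.sin(theta) * np.sin(phi),
              np.cos(theta)])

# two orthonormal tangent directions at p (along and across the meridian)
t1 = np.array([np.cos(theta) * np.cos(phi), np.cos(theta) * np.sin(phi), -np.sin(theta)])
t2 = np.array([-np.sin(phi), np.cos(phi), 0.0])

eps = 1e-6
for t in (t1, t2):
    q = p + eps * t
    q = q / np.linalg.norm(q)         # step back onto the sphere
    scale = np.linalg.norm(stereo(q) - stereo(p)) / eps
    assert abs(scale - 1.0 / (1.0 - p[2])) < 1e-4   # the 1/(1 - z) factor
```

The two directions give the same scale factor, which is the numerical content of the map being conformal.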
Chapter 5

Small Oscillations

5.1 Small oscillations about stable equilibrium

Consider a situation with N unconstrained generalized coordinates qi described by a mass matrix Mij and a potential U({qi}), and suppose that U has a local minimum at some point in configuration space, qi = qi0. Then this point is a stable equilibrium point, for the generalized force at that point is zero, and if the system is placed nearly at rest near that point, it will not have enough energy to move far away from that point. We may study the behavior of such motions by expanding the potential¹ in a Taylor series in the deviations ηi = qi − qi0,
$$ U(q_1,\ldots,q_N) = U(q_{i0}) + \sum_i \left.\frac{\partial U}{\partial q_i}\right|_0 \eta_i + \frac12 \sum_{ij} \left.\frac{\partial^2 U}{\partial q_i\,\partial q_j}\right|_0 \eta_i\eta_j + \ldots. $$
The constant U(qi0) is of no interest, as only changes in potential matter, so we may as well set it to zero. In the second term, −∂U/∂qi|0 is the generalized force at the equilibrium point, so it is zero. Thus the leading term in the expansion is the quadratic one, and we may approximate
$$ U(\{q_i\}) = \frac12 \sum_{ij} A_{ij}\eta_i\eta_j, \quad\text{with}\quad A_{ij} = \left.\frac{\partial^2 U}{\partial q_i\,\partial q_j}\right|_0. \tag{5.1} $$

¹assumed to have continuous second derivatives.
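When the potential is not available in closed form, the matrix of (5.1) can be estimated by central differences. A sketch (not part of the text; the two-coordinate potential is purely illustrative):

```python
import numpy as np

def hessian(U, q0, h=1e-4):
    """Central-difference estimate of A_ij = d²U/dq_i dq_j at q0, as in (5.1)."""
    N = len(q0)
    A = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            def shifted(si, sj):
                q = np.array(q0, dtype=float)
                q[i] += si * h
                q[j] += sj * h
                return U(q)
            A[i, j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
    return A

# illustrative potential: coordinate 1 held by spring k1, coupled by k2 to coordinate 2
k1, k2 = 2.0, 3.0
U = lambda q: 0.5 * k1 * q[0]**2 + 0.5 * k2 * (q[1] - q[0])**2
A = hessian(U, np.zeros(2))
assert np.allclose(A, [[k1 + k2, -k2], [-k2, k2]], atol=1e-5)
```

For a quadratic potential the central-difference formula is exact up to rounding, so the assertion recovers the analytic Hessian.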
Note that A is a constant symmetric real matrix.
The kinetic energy T = ½ Σ Mij η̇i η̇j is already second order in the small variations from equilibrium, so we may evaluate Mij, which in general can depend on the coordinates qi, at the equilibrium point, ignoring any higher order changes. Thus Mij is a constant. Thus both the kinetic and potential energies are quadratic forms in the displacement η, which we think of as a vector in N-dimensional space. Thus we can write the energies in matrix form
$$ T = \frac12\,\dot\eta^T\cdot M\cdot\dot\eta, \qquad U = \frac12\,\eta^T\cdot A\cdot\eta. \tag{5.2} $$
A and M are real symmetric matrices, and because any displacement corresponds to positive kinetic and nonnegative potential energies, they are positive (semi)definite matrices, meaning that all their eigenvalues are greater than zero, except that A may also have eigenvalues equal to zero (these are directions in which the stability is neutral to lowest order, but may be determined by higher order terms in the displacement).
Lagrange's equation of motion
$$ 0 = \frac{d}{dt}\frac{\partial L}{\partial\dot\eta_i} - \frac{\partial L}{\partial\eta_i} = \frac{d}{dt}\left(M\cdot\dot\eta\right) + A\cdot\eta = M\cdot\ddot\eta + A\cdot\eta \tag{5.3} $$
is not necessarily diagonal in the coordinate η. We shall use the fact that any real symmetric matrix can be diagonalized by a similarity transformation with an orthogonal matrix to reduce the problem to a set of independent harmonic oscillators. While both M and A can be diagonalized by an orthogonal transformation, they cannot necessarily be diagonalized by the same one, so our procedure will be in steps:
1. Diagonalize M with an orthogonal transformation O1, transforming the coordinates to a new set x = O1 · η.
2. Scale the x coordinates to reduce the mass matrix to the identity matrix. The new coordinates will be called y.
3. Diagonalize the new potential energy matrix with another orthogonal matrix O2, giving the final set of coordinates, ξ = O2 · y. Note
this transformation leaves the kinetic energy matrix diagonal because the identity matrix is unaffected by similarity transformations.
The ξ are normal modes, modes of oscillation which are independent in the sense that they do not affect each other.
Let us do this in more detail. We are starting with the coordinates η and the real symmetric matrices A and M, and we want to solve the equations M·η̈ + A·η = 0. In our first step, we use the matrix O1, which linear algebra guarantees exists, that makes m = O1·M·O1⁻¹ diagonal. Note O1 is time-independent, so defining xi = Σj (O1)ij ηj also gives ẋi = Σj (O1)ij η̇j, and
$$ T = \frac12\,\dot\eta^T\cdot M\cdot\dot\eta = \frac12\,\dot\eta^T\cdot O_1^{-1}\cdot m\cdot O_1\cdot\dot\eta = \frac12\,\dot\eta^T\cdot O_1^T\cdot m\cdot(O_1\cdot\dot\eta) = \frac12\,(O_1\cdot\dot\eta)^T\cdot m\cdot(O_1\cdot\dot\eta) = \frac12\,\dot x^T\cdot m\cdot\dot x. $$
Similarly the potential energy becomes U = ½ xᵀ·O1·A·O1⁻¹·x. We know that the matrix m is diagonal, and the diagonal elements mii are all strictly positive. To begin the second step, define the diagonal matrix Sij = √mii δij and new coordinates yi = Sii xi = Σj Sij xj, or y = S·x. Now m = S² = Sᵀ·S, so T = ½ ẋᵀ·m·ẋ = ½ ẋᵀ·Sᵀ·S·ẋ = ½ (S·ẋ)ᵀ·(S·ẋ) = ½ ẏᵀ·ẏ. In terms of y, the potential energy is U = ½ yᵀ·B·y, where
$$ B = S^{-1}\cdot O_1\cdot A\cdot O_1^{-1}\cdot S^{-1} $$
is still a symmetric matrix.
Finally, let O2 be an orthogonal matrix which diagonalizes B, so C = O2·B·O2⁻¹ is diagonal, and let ξ = O2·y. Just as in the first step,
$$ U = \frac12\,\xi^T\cdot O_2\cdot B\cdot O_2^{-1}\cdot\xi = \frac12\,\xi^T\cdot C\cdot\xi, $$
while the kinetic energy
$$ T = \frac12\,\dot y^T\cdot\dot y = \frac12\,\dot y^T\cdot O_2^T\cdot O_2\cdot\dot y = \frac12\,\dot\xi^T\cdot\dot\xi $$
is still diagonal. Because the potential energy must still be nonnegative, all the diagonal elements Cii are nonnegative, and we will call them ωi := √Cii. Then
$$ T = \frac12\sum_j\dot\xi_j^2, \qquad U = \frac12\sum_j\omega_j^2\xi_j^2, \qquad \ddot\xi_j + \omega_j^2\xi_j = 0, $$
so we have N independent harmonic oscillators with the solutions
$$ \xi_j = \mathrm{Re}\;a_j e^{i\omega_j t}, $$
with some arbitrary complex numbers aj.
To find what the solution looks like in terms of the original coordinates qi, we need to undo all these transformations. As ξ = O2·y = O2·S·x = O2·S·O1·η, we have
$$ q = q_0 + O_1^{-1}\cdot S^{-1}\cdot O_2^{-1}\cdot\xi. $$
We have completely solved this very general problem in small oscillations, at least in the sense that we have reduced it to a solvable problem of diagonalizing symmetric real matrices. What we have done may appear abstract and formal and devoid of physical insight, but it is a general algorithm which will work on a very wide class of problems of small oscillations about equilibrium. In fact, because diagonalizing matrices is something for which computer programs are available, this is even a practical method for solving such systems, even if there are dozens of interacting particles.

5.1.1 Molecular Vibrations

Consider a molecule made up of n atoms. We need to choose the right level of description to understand low energy excitations. We do not want to describe the molecule in terms of quarks, gluons, and leptons. Nor do we need to consider all the electronic motion, which is governed by quantum mechanics. The description we will use, called the
  • 138. 5.1. SMALL OSCILLATIONS ABOUT STABLE EQUILIBRIUM131 Born-Oppenheimer approximation, is to model the nuclei as clas- sical particles. The electrons, which are much lighter, move around much more quickly and cannot be treated classically; we assume that for any given configuration of the nuclei, the electrons will almost in- stantaneously find a quantum-mechanical ground state, which will have an energy which depends on the current positions of the nuclei. This is then a potential energy when considering the nuclear motion. The nuclei themselves will be considered point particles, and we will ignore internal quantum-mechanical degrees of freedom such as nuclear spins. So we are considering n point particles moving in three dimensions, with some potential about which we know only qualitative features. There are 3n degrees of freedom. Of these, 3 are the center of mass motion, which, as there are no external forces, is simply motion at constant velocity. Some of the degrees of freedom describe rotational modes, i.e. motions that the molecule could have as a rigid body. For a generic molecule this would be three degrees of freedom, but if the equilibrium configuration of the molecule is linear, rotation about that line is not a degree of freedom, and so only two of the degrees of freedom are rotations in that case. The remaining degrees of freedom, 3n − 6 for noncollinear and 3n − 5 for collinear molecules, are vibrations. O2 CO 2 H O 2 Figure 5.1: Some simple molecules in their equilibrium positions. For a collinear molecule, it makes sense to divide the vibrations into transverse and longitudinal ones. Considering motion in one dimension only, the nuclei have n degrees of freedom, one of which is a center-of- mass motion, leaving n − 1 longitudinal vibrations. So the remaining (3n−5)−(n−1) = 2(n−2) vibrational degrees of freedom are transverse
vibrational modes. There are no such modes for a diatomic molecule.

Example: CO2

Consider first the CO2 molecule. As it is a molecule, there must be a position of stable equilibrium, and empirically we know it to be collinear and symmetric, which one might have guessed. We will first consider only collinear motions of the molecule. If the oxygens have coordinates q1 and q2, and the carbon q3, the potential depends on q1 − q3 and q2 − q3 in the same way, so the equilibrium positions have q2 − q3 = −(q1 − q3) = b. Assuming no direct force between the two oxygen atoms, the one dimensional motion may be described near equilibrium by
$$ U = \frac12 k(q_3 - q_1 - b)^2 + \frac12 k(q_2 - q_3 - b)^2, $$
$$ T = \frac12 m_O\dot q_1^2 + \frac12 m_O\dot q_2^2 + \frac12 m_C\dot q_3^2. $$
We gave our formal solution in terms of displacements from the equilibrium position, but we now have a situation in which there is no single equilibrium position, as the problem is translationally invariant, and while equilibrium has constraints on the differences of q's, there is no constraint on the center of mass. We can treat this in two different ways:
1. Explicitly fix the center of mass, eliminating one of the degrees of freedom.
2. Pick arbitrarily an equilibrium position. While the deviation of the center-of-mass position from the equilibrium is not confined to small excursions, the quadratic approximation is still exact.
First we follow the first method. We can always work in a frame where the center of mass is at rest, at the origin. Then mO(q1 + q2) + mC q3 = 0 is a constraint, which we must eliminate. We can do so by dropping q3 as an independent degree of freedom, and we have, in terms of the two displacements from equilibrium η1 = q1 + b and η2 = q2 − b, q3 = −(η1 + η2)mO/mC, and
$$ T = \frac12 m_O(\dot\eta_1^2 + \dot\eta_2^2) + \frac12 m_C\dot\eta_3^2 = \frac12 m_O\left[\dot\eta_1^2 + \dot\eta_2^2 + \frac{m_O}{m_C}(\dot\eta_1 + \dot\eta_2)^2\right] $$
$$ = \frac12\,\frac{m_O^2}{m_C}\,(\dot\eta_1\ \ \dot\eta_2)\begin{pmatrix} 1 + m_C/m_O & 1 \\ 1 & 1 + m_C/m_O \end{pmatrix}\begin{pmatrix}\dot\eta_1\\ \dot\eta_2\end{pmatrix}. $$
Now T is not diagonal, or more precisely M isn't. We must find the orthogonal matrix O1 such that O1·M·O1⁻¹ is diagonal. We may assume it to be a rotation, which can only be
$$ O = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix} $$
for some value of θ. It is worthwhile to derive a formula for diagonalizing a general real symmetric 2 × 2 matrix and then plug in our particular form. Let
$$ M = \begin{pmatrix} a & b\\ b & d\end{pmatrix}, \qquad O = \begin{pmatrix} c & -s\\ s & c\end{pmatrix}, $$
where we have abbreviated s = sin θ, c = cos θ. We will require the matrix element m12 = (O·M·O⁻¹)12 = 0, because m is diagonal. This determines θ:
$$ O\cdot M\cdot O^{-1} = \begin{pmatrix} c & -s\\ s & c\end{pmatrix}\begin{pmatrix} a & b\\ b & d\end{pmatrix}\begin{pmatrix} c & s\\ -s & c\end{pmatrix} = \begin{pmatrix} c & -s\\ s & c\end{pmatrix}\begin{pmatrix} \cdot & as + bc\\ \cdot & bs + cd\end{pmatrix} = \begin{pmatrix} \cdot & acs + bc^2 - bs^2 - scd\\ \cdot & \cdot\end{pmatrix}, $$
where we have placed a · in place of matrix elements we don't need to calculate. Thus the condition on θ is
$$ (a - d)\sin\theta\cos\theta + b(\cos^2\theta - \sin^2\theta) = 0 = \tfrac12(a - d)\sin 2\theta + b\cos 2\theta, $$
or
$$ \tan 2\theta = \frac{-2b}{a - d}. $$
Notice this determines 2θ only modulo π, and therefore θ modulo 90°, which ought to be expected, as a rotation through 90° only interchanges axes and reverses directions, both of which leave a diagonal matrix diagonal.
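The angle condition can be verified in a few lines; this snippet (not part of the text) uses arctan2, which also handles the a = d case, where θ = π/4:

```python
import numpy as np

def diag_angle(a, b, d):
    # solves (a-d) sinθ cosθ + b(cos²θ - sin²θ) = 0, i.e. tan 2θ = -2b/(a-d)
    return 0.5 * np.arctan2(-2.0 * b, a - d)

a, b, d = 1.3, 0.8, -0.4
theta = diag_angle(a, b, d)
c, s = np.cos(theta), np.sin(theta)
O = np.array([[c, -s], [s, c]])
M = np.array([[a, b], [b, d]])
m = O @ M @ O.T                      # O⁻¹ = Oᵀ for a rotation
assert abs(m[0, 1]) < 1e-12          # off-diagonal element vanishes
```

Any branch of 2θ modulo π would do, as the text notes; arctan2 simply fixes one.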
In our case a = d, so tan 2θ = ∞, and θ = π/4. As x = O1·η,
$$ \begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}\cos\pi/4 & -\sin\pi/4\\ \sin\pi/4 & \cos\pi/4\end{pmatrix}\begin{pmatrix}\eta_1\\\eta_2\end{pmatrix} = \frac{1}{\sqrt2}\begin{pmatrix}\eta_1 - \eta_2\\ \eta_1 + \eta_2\end{pmatrix}, $$
and inversely
$$ \begin{pmatrix}\eta_1\\\eta_2\end{pmatrix} = \frac{1}{\sqrt2}\begin{pmatrix}x_1 + x_2\\ -x_1 + x_2\end{pmatrix}. $$
Then
$$ T = \frac12 m_O\left[\frac{(\dot x_1 + \dot x_2)^2}{2} + \frac{(\dot x_1 - \dot x_2)^2}{2} + \frac{m_O}{m_C}\left(\sqrt2\,\dot x_2\right)^2\right] = \frac12 m_O\dot x_1^2 + \frac12 m_O\left(1 + \frac{2m_O}{m_C}\right)\dot x_2^2, $$
$$ U = \frac12 k(q_3 - q_1 - b)^2 + \frac12 k(q_2 - q_3 - b)^2 = \frac12 k\left[\left(\eta_1 + \frac{m_O}{m_C}(\eta_1 + \eta_2)\right)^2 + \left(\eta_2 + \frac{m_O}{m_C}(\eta_1 + \eta_2)\right)^2\right] $$
$$ = \frac12 k\left[\eta_1^2 + \eta_2^2 + \frac{2m_O}{m_C}(\eta_1 + \eta_2)^2 + \frac{2m_O^2}{m_C^2}(\eta_1 + \eta_2)^2\right] = \frac12 k\left[x_1^2 + x_2^2 + \frac{4m_O}{m_C^2}(m_O + m_C)x_2^2\right] = \frac12 kx_1^2 + \frac12 k\left(\frac{m_C + 2m_O}{m_C}\right)^2 x_2^2. $$
Thus U is already diagonal and we don't need to go through steps 2 and 3, the scaling and second orthogonalization, except to note that if we skip the scaling the angular frequencies are given by ωi² = (coefficient in U)/(coefficient in T). Thus we have one normal mode, x1, with ω1 = √(k/mO), with x2 = 0, η1 = −η2, q3 = 0, in which the two oxygens vibrate in and out together, symmetrically about the carbon, which doesn't move. We also have another mode, x2, with
$$ \omega_2 = \sqrt{\frac{k(m_C + 2m_O)^2/m_C^2}{m_O(1 + 2m_O/m_C)}} = \sqrt{\frac{k(m_C + 2m_O)}{m_O m_C}}, $$
with x1 = 0, η1 = η2, in which the two oxygens move right or left together, with the carbon moving in the opposite direction.
We have successfully solved for the longitudinal vibrations by eliminating one of the degrees of freedom. Let us now try the second method, in which we choose an arbitrary equilibrium position q1 = −b, q2 = b, q3 = 0. Then
$$ T = \frac12 m_O(\dot\eta_1^2 + \dot\eta_2^2) + \frac12 m_C\dot\eta_3^2, \qquad U = \frac12 k\left[(\eta_1 - \eta_3)^2 + (\eta_2 - \eta_3)^2\right]. $$
T is already diagonal, so O1 = 1I (the identity), x = η. In the second step S is the diagonal matrix with S11 = S22 = √mO, S33 = √mC, and yi = √mO ηi for i = 1, 2, and y3 = √mC η3. Then
$$ U = \frac12 k\left[\left(\frac{y_1}{\sqrt{m_O}} - \frac{y_3}{\sqrt{m_C}}\right)^2 + \left(\frac{y_2}{\sqrt{m_O}} - \frac{y_3}{\sqrt{m_C}}\right)^2\right] = \frac12\,\frac{k}{m_O m_C}\left[m_C y_1^2 + m_C y_2^2 + 2m_O y_3^2 - 2\sqrt{m_O m_C}\,(y_1 + y_2)y_3\right]. $$
Thus the matrix B is
$$ B = \frac{k}{m_O m_C}\begin{pmatrix} m_C & 0 & -\sqrt{m_O m_C}\\ 0 & m_C & -\sqrt{m_O m_C}\\ -\sqrt{m_O m_C} & -\sqrt{m_O m_C} & 2m_O\end{pmatrix}, $$
which is singular, as it annihilates the vector yᵀ = (√mO, √mO, √mC), which corresponds to ηᵀ = (1, 1, 1), i.e. all the nuclei moving by the same amount, or the molecule translating rigidly. Thus this vector corresponds to a zero eigenvalue of U, and a harmonic oscillation of zero frequency. This is free motion², ξ = ξ0 + vt. The other two modes can be found by diagonalizing the matrix, and will be as we found by the other method.

²To see that linear motion is a limiting case of harmonic motion as ω → 0, we need to choose the complex coefficient to be a function of ω, A(ω) = x0 − iv0/ω, with x0 and v0 real. Then x(t) = lim_{ω→0} Re A(ω)e^{iωt} = x0 + v0 lim_{ω→0} sin(ωt)/ω = x0 + v0 t.
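The diagonalization the text leaves to the reader is easy to carry out numerically. The sketch below (Python with NumPy, not part of the text; masses are illustrative values in arbitrary units) runs the three-step procedure of section 5.1 on the second-method matrices for CO2 and recovers the zero mode together with the two frequencies found by the first method:

```python
import numpy as np

def normal_modes(M, A):
    """The text's three steps for M·η̈ + A·η = 0: returns the frequencies ω_i."""
    m, V = np.linalg.eigh(M)               # step 1: diagonalize M (here O1 = Vᵀ)
    Sinv = np.diag(1.0 / np.sqrt(m))       # step 2: rescale so T = ½ ẏᵀ·ẏ
    B = Sinv @ V.T @ A @ V @ Sinv          # B = S⁻¹·O1·A·O1⁻¹·S⁻¹
    w2 = np.linalg.eigvalsh(B)             # step 3: eigenvalues of B are ω²
    return np.sqrt(np.abs(w2))

mO, mC, k = 16.0, 12.0, 1.0                # illustrative values
M = np.diag([mO, mO, mC])
A = k * np.array([[1.0, 0.0, -1.0],        # from U = ½k[(η1-η3)² + (η2-η3)²]
                  [0.0, 1.0, -1.0],
                  [-1.0, -1.0, 2.0]])
omega = normal_modes(M, A)

assert omega[0] < 1e-6                     # zero mode: rigid translation
expected = sorted([np.sqrt(k / mO), np.sqrt(k * (mC + 2 * mO) / (mO * mC))])
assert np.allclose(np.sort(omega[1:]), expected)
```

The same routine applies to any small-oscillation problem once M and A are in hand, which is the practical point made at the end of section 5.1.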
Transverse motion

What about the transverse motion? Consider the equilibrium position of the molecule to lie in the x direction, and consider small deviations in the z direction. The kinetic energy
$$ T = \frac12 m_O\dot z_1^2 + \frac12 m_O\dot z_2^2 + \frac12 m_C\dot z_3^2 $$
is already diagonal, just as for the longitudinal modes in the second method. Any potential energy must be due to a resistance to bending, so to second order,
$$ U \propto (\psi - \theta)^2 \sim (\tan\psi - \tan\theta)^2 = \left[\frac{z_2 - z_3}{b} + \frac{z_1 - z_3}{b}\right]^2 = b^{-2}(z_1 + z_2 - 2z_3)^2. $$
[Figure: the molecule bent out of line, showing the transverse displacements z1, z2, z3 and the angles θ and ψ made by the two bonds, each of length b.]
Note that the potential energy is proportional to the square of a single linear combination of the displacements, or to the square of one component (with respect to a particular direction) of the displacement. Therefore there is no contribution of the two orthogonal directions, and there are two zero modes, or two degrees of freedom with no restoring force. One of these is the center of mass motion, z1 = z2 = z3, and the other is the third direction in the abstract space of possible displacements, zᵀ = (1, −1, 0), with z1 = −z2, z3 = 0, which we see is a rotation. Thus there remains only one true transverse vibrational mode in the z direction, and also one in the y direction, which together with the two longitudinal ones we found earlier make up the 4 vibrational modes we expected from the general formula 3n − 5 for a collinear molecule.
You might ask whether these oscillations we have discussed are in any way observable. Quantum mechanically, a harmonic oscillator can only be in states with excitation energy E = nℏω, where n ∈ Z is an integer and 2πℏ is Planck's constant. When molecules are in an excited state, they can emit a photon while changing to a lower energy state.
The energy of the photon, which is the amount lost by the molecule, is proportional to the frequency, ∆E = 2πℏf, so by measuring the wavelength of the emitted light, we can determine the vibrational frequencies of the molecules. So the calculations we have done, and many
others for which we have built the apparatus, are in fact very practical tools for molecular physics.

5.1.2 An Alternative Approach

The step by step diagonalization we just gave is not the easiest approach to solving the linear differential equation (5.3). Solutions to linear differential equations are subject to superposition, and equations with coefficients independent of time are simplified by Fourier transform, so we can express the N dimensional vector of functions ηi(t) as
$$ \eta_i(t) = \int_{-\infty}^{\infty} d\omega\, f_i(\omega)e^{-i\omega t}. $$
Then the Lagrange equations become
$$ \int_{-\infty}^{\infty} d\omega\,\sum_j\left(A_{ij} - \omega^2 M_{ij}\right)f_j(\omega)e^{-i\omega t} = 0 \quad\text{for all } t. $$
But the e^{−iωt} are linearly independent functions of t ∈ ℝ, so
$$ \sum_j\left(A_{ij} - \omega^2 M_{ij}\right)f_j(\omega) = 0. $$
This implies fj(ω) = 0 except when the matrix Aij − ω²Mij is singular, det(Aij − ω²Mij) = 0, which gives a discrete set of angular frequencies ω1 . . . ωN, and for each an eigenvector fj.

5.2 Other interactions

In our treatment we assumed a Lagrangian formulation with a kinetic term purely quadratic in q̇, together with a velocity independent potential. There is a wider scope of small oscillation problems which might include dissipative forces like friction, or external time-dependent forces, or perhaps terms in the Lagrangian linear in the velocities. An example of the latter occurs in rotating reference frames, from the Coriolis force, and is important in the question of whether there is a gravitationally stable location for small objects caught between the Earth and the Moon at the "L5" point. Each of these complications introduces terms,
even in the linear approximation to the equations of motion, which cannot be diagonalized away, because there is not significant freedom of diagonalization left, in general, after having simplified T and U. Thus the approach of section 5.1 does not generalize well, but the approach of section 5.1.2 can be applied.

5.3 String dynamics

In this section we consider two closely related problems, transverse oscillations of a stretched loaded string, and of a stretched heavy string. The latter is a limiting case of the former. This will provide an introduction to field theory, in which the dynamical degrees of freedom are not a discrete set but are defined at each point in space. In Chapter 8 we will discuss more interesting and involved cases such as the electromagnetic field, where at each point in space we have E and B as degrees of freedom, though not without constraints.
The loaded string we will consider is a light string under tension τ stretched between two fixed points a distance ℓ apart, say at x = 0 and x = ℓ. On the string, at points x = a, 2a, 3a, . . . , na, are fixed n particles each of mass m, with the first and last a distance a away from the fixed ends. Thus ℓ = (n + 1)a. We will consider only small transverse motion of these masses, using yi as the transverse displacement of the i'th mass, which is at x = ia. We assume all excursions from the equilibrium positions yi = 0 are small, and in particular that the difference in successive displacements yi+1 − yi ≪ a. Thus we are assuming that the angle made by each segment of the string, θi = tan⁻¹[(yi+1 − yi)/a] ≪ 1. Working to first order in the θ's in the equations of motion, and second order for the Lagrangian, we see that restricting our attention to transverse motions and requiring no horizontal motion forces taking the tension τ to be constant along the string.
The transverse force on the i'th mass is thus
$$ F_i = \tau\,\frac{y_{i+1} - y_i}{a} + \tau\,\frac{y_{i-1} - y_i}{a} = \frac{\tau}{a}\left(y_{i+1} - 2y_i + y_{i-1}\right). $$
The potential energy U(y1, . . . , yn) then satisfies
$$ \frac{\partial U}{\partial y_i} = -\frac{\tau}{a}\left(y_{i+1} - 2y_i + y_{i-1}\right) $$
so
$$ U(y_1,\ldots,y_i,\ldots,y_n) = \int_0^{y_i}\frac{\tau}{a}\left(2y_i' - y_{i+1} - y_{i-1}\right)dy_i' + F(y_1,\ldots,y_{i-1},y_{i+1},\ldots,y_n) $$
$$ = \frac{\tau}{a}\left[y_i^2 - (y_{i+1} + y_{i-1})y_i\right] + F(y_1,\ldots,y_{i-1},y_{i+1},\ldots,y_n) $$
$$ = \frac{\tau}{2a}\left[(y_{i+1} - y_i)^2 + (y_i - y_{i-1})^2\right] + \tilde F(y_1,\ldots,y_{i-1},y_{i+1},\ldots,y_n) $$
$$ = \sum_{i=0}^{n}\frac{\tau}{2a}(y_{i+1} - y_i)^2 + \text{constant}. $$
The F and F̃ are unspecified functions of all the yj's except yi. In the last expression we satisfied the condition for all i, and we have used the convenient definition y0 = yn+1 = 0. We can and will drop the arbitrary constant.
The kinetic energy is simply T = ½ m Σ₁ⁿ ẏi², so the mass matrix is already proportional to the identity matrix and we do not need to go through the first two steps of our general process. The potential energy U = ½ yᵀ·A·y has a non-diagonal n × n matrix
$$ A = -\frac{\tau}{a}\begin{pmatrix} -2 & 1 & 0 & 0 & \cdots & 0 & 0\\ 1 & -2 & 1 & 0 & \cdots & 0 & 0\\ 0 & 1 & -2 & 1 & \cdots & 0 & 0\\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & 0 & 0 & \cdots & -2 & 1\\ 0 & 0 & 0 & 0 & \cdots & 1 & -2\end{pmatrix}. $$
Diagonalizing even a 3 × 3 matrix is work, so an n × n matrix might seem out of the question, without some hints from the physics of the situation. In this case the hint comes in a roundabout fashion: we will first consider a limit in which n → ∞, the continuum limit, which leads to an interesting physical situation in its own right.
Suppose we consider the loaded string problem in the limit that the spacing a becomes very small, but the number n of masses becomes large, keeping the total length ℓ of the string fixed. If at the same time we adjust the individual masses so that the mass per unit length, ρ, is fixed, our bumpy string gets smoothed out in the limit, and we
  • 147. 140 CHAPTER 5. SMALL OSCILLATIONS might expect that in this limit we reproduce the physical problem of transverse modes of a uniformly dense stretched string, like a violin string. Thus we wish to consider the limit a → 0, n → ∞, = (n + 1)a fixed, m → 0, ρ = m/a fixed. It is natural to think of the degrees of freedom as associated with the label x rather than i, so we redefine the dynamical functions {yj (t)} as y(x, t), with y(ja, t) = yj (t). While this only defines the function at discrete points in x, these are closely spaced for small a and become dense as a → 0. We will assume that the function y(x) is twice differ- entiable in the continuum limit, though we shall see that this is not the case for all possible motions of the discrete system. What happens to the kinetic and potential energies in this limit? For the kinetic energy, 1 1 1 1 T = m ˙2 yi = ρ ay 2 (xi ) = ρ ˙ ∆xy 2 (xi ) → ρ ˙ dx y 2(x), ˙ 2 i 2 i 2 i 2 0 where the next to last expression is just the definition of a Riemann integral. For the potential energy, 2 2 τ τ yi+1 − yi τ ∂y U= (yi+1 − yi )2 = ∆x → dx . 2a i 2 i ∆x 2 0 ∂x The equation of motion for yi is ∂L ∂U τ m¨i = y =− = [(yi+1 − yi) − (yi − yi−1 )], ∂yi ∂yi a or τ ρa¨(x) = y ([y(x + a) − y(x)] − [y(x) − y( x − a)]). a We need to be careful about taking the limit y(x + a) − y(x) ∂y → a ∂x because we are subtracting two such expressions evaluated at nearby points, and because we will need to divide by a again to get an equation between finite quantities. Thus we note that y(x + a) − y(x) ∂y = + O(a2 ), a ∂x x+a/2
so
$$ \rho\ddot y(x) = \frac{\tau}{a}\left[\frac{y(x + a) - y(x)}{a} - \frac{y(x) - y(x - a)}{a}\right] \approx \frac{\tau}{a}\left[\left.\frac{\partial y}{\partial x}\right|_{x+a/2} - \left.\frac{\partial y}{\partial x}\right|_{x-a/2}\right] \to \tau\,\frac{\partial^2 y}{\partial x^2}, $$
and we wind up with the wave equation for transverse waves on a massive string
$$ \frac{\partial^2 y}{\partial t^2} - c^2\,\frac{\partial^2 y}{\partial x^2} = 0, \qquad\text{where}\quad c = \sqrt{\frac{\tau}{\rho}}. $$
Solving this wave equation is very simple. For the fixed boundary conditions y(x) = 0 at x = 0 and x = ℓ, the solution is a Fourier expansion
$$ y(x,t) = \sum_{p=1}^{\infty}\mathrm{Re}\;B_p e^{ick_pt}\sin k_px, \qquad\text{where } k_p = p\pi/\ell. $$
Each p represents one normal mode, and there are an infinite number as we would expect because in the continuum limit there are an infinite number of degrees of freedom. We have certainly not shown that y(x) = B sin kx is a normal mode for the problem with finite n, but it is worth checking it out. This corresponds to a mode with yj = B sin kaj, on which we apply the matrix A:
$$ (A\cdot y)_i = \sum_j A_{ij}y_j = -\frac{\tau}{a}\left(y_{i+1} - 2y_i + y_{i-1}\right) = -\frac{\tau}{a}B\left[\sin(kai + ka) - 2\sin(kai) + \sin(kai - ka)\right] $$
$$ = -\frac{\tau}{a}B\left[\sin(kai)\cos(ka) + \cos(kai)\sin(ka) - 2\sin(kai) + \sin(kai)\cos(ka) - \cos(kai)\sin(ka)\right] = \frac{\tau}{a}B\left(2 - 2\cos(ka)\right)\sin(kai) = \frac{2\tau}{a}\left(1 - \cos(ka)\right)y_i. $$
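The eigenvalue just found, (2τ/a)(1 − cos ka), can be confirmed against a direct numerical diagonalization of A for a modest n (a sketch, not part of the text; the allowed k = pπ/((n + 1)a) are those imposed by the fixed ends):

```python
import numpy as np

n, a, tau = 8, 1.0, 1.0               # eight beads, arbitrary units

# A = -(τ/a)·tridiag(1, -2, 1) for the n transverse displacements
A = (tau / a) * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))

# eigenvalues should be (2τ/a)(1 - cos ka) for k = pπ/((n+1)a), p = 1, …, n
p = np.arange(1, n + 1)
lam = (2.0 * tau / a) * (1.0 - np.cos(p * np.pi / (n + 1)))
assert np.allclose(np.sort(np.linalg.eigvalsh(A)), np.sort(lam))
```

Dividing by the mass m and taking square roots turns these eigenvalues into the mode frequencies of the loaded string.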
So we see that it is a normal mode, although the frequency of oscillation
$$ \omega = \sqrt{\frac{2\tau}{am}\left(1 - \cos(ka)\right)} = 2\sqrt{\frac{\tau}{\rho}}\,\frac{\sin(ka/2)}{a} $$
differs from k√(τ/ρ) except in the limit a → 0 for fixed k.
The k's which index the normal modes are restricted by the fixed ends to the discrete set k = pπ/ℓ = pπ/((n + 1)a), but this is still too many (∞) for a system with a finite number of degrees of freedom. The resolution of this paradox is that not all different k's correspond to different modes. For example, if p′ = p + 2m(n + 1) for some integer m, then k′ = k + 2πm/a, and sin(k′aj) = sin(kaj + 2mπj) = sin(kaj), so k and k′ represent the same normal mode. Also, if p′ = 2(n + 1) − p, k′ = (2π/a) − k, sin(k′aj) = sin(2πj − kaj) = −sin(kaj), so k′ and k represent the same normal mode, with opposite phase. Finally p = n + 1, k = π/a gives yj = B sin(kaj) = 0 for all j and is not a normal mode. This leaves as independent only p = 1, . . . , n, the right number of normal modes for a system with n degrees of freedom.
The angular frequency of the p'th normal mode
$$ \omega_p = 2\sqrt{\frac{\tau}{ma}}\,\sin\frac{p\pi}{2(n + 1)} $$
is plotted in Fig. 5.3. For fixed values of p and ρ, as n → ∞,
$$ \omega_p = 2\sqrt{\frac{\tau}{\rho a^2}}\,\sin\frac{pa\pi}{2\ell} \to 2\sqrt{\frac{\tau}{\rho}}\,\frac{p\pi}{2\ell} = ck_p, $$
as we have in the continuum limit. But if we consider modes with a fixed ratio of p/n as n → ∞, we do not have a smooth limit y(x), and such modes are not appropriate for the continuum limit. In the physics of crystals, the former kind of modes are known as acoustic modes, while the latter modes, in particular those for n − p fixed, which depend on the discrete nature of the crystal, are called optical modes.

[Fig. 5.3. Frequencies of oscillation of the loaded string.]

5.4 Field theory

We saw in the last section that the kinetic and potential energies in the continuum limit can be written as integrals over x of densities, and so we may also write the Lagrangian as the integral of a Lagrangian density L(x),
$$ L = T - U = \int_0^{\ell} dx\,\mathcal{L}(x), \qquad \mathcal{L}(x) = \frac12\rho\,\dot y^2(x,t) - \frac12\tau\left(\frac{\partial y(x,t)}{\partial x}\right)^2. $$
This Lagrangian, however, will not be of much use until we figure out what is meant by varying it with respect to each dynamical degree of freedom or its corresponding velocity. In the discrete case we have the canonical momenta Pi = ∂L/∂ẏi, where the derivative requires holding all yj fixed, for j ≠ i, as well as all ẏk fixed. This extracts one term from the sum ½ρ Σ a ẏi², and this would appear to vanish in the limit a → 0. Instead, we define the canonical momentum as a density, Pi → aP(x = ia), so
$$ P(x = ia) = \lim_{a\to0}\frac{1}{a}\,\frac{\partial}{\partial\dot y_i}\,\sum_i a\,\mathcal{L}\big(y(x),\dot y(x),x\big)\Big|_{x=ai}. $$
  • 151. 144 CHAPTER 5. SMALL OSCILLATIONS We may think of the last part of this limit, lim a L(y(x), y(x), x)|x=ai = ˙ dx L(y(x), y(x), x), ˙ a→0 i if we also define a limiting operation 1 ∂ δ lim → , a→0 a ∂ yi ˙ δ y(x) ˙ 1 ∂ and similarly for a ∂yi , which act on functionals of y(x) and y(x) by ˙ δy(x1) δ y(x1 ) ˙ δy(x1 ) δ y(x1 ) ˙ = δ(x1 − x2 ), = = 0, = δ(x1 − x2 ). δy(x2) δy(x2 ) δ y(x2 ) ˙ δ y(x2 ) ˙ Here δ(x − x) is the Dirac delta function, defined by its integral, x2 f (x )δ(x − x)dx = f (x) x1 for any function f (x), provided x ∈ (x1 , x2 ). Thus δ 1 P (x) = dx ρy 2 (x , t) = ˙ dx ρy(x , t)δ(x − x) = ρy(x, t). ˙ ˙ δ y(x) ˙ 0 2 0 We also need to evaluate 2 δ δ −τ ∂y L= dx . δy(x) δy(x) 0 2 ∂x x=x For this we need δ ∂y(x ) ∂ = δ(x − x) := δ (x − x), δy(x) ∂x ∂x which is again defined by its integral, x2 x2 ∂ f (x )δ (x − x)dx = f (x ) δ(x − x)dx x1 x1 ∂x x2 ∂f x = f (x )δ(x − x)|x2 − 1 dx δ(x − x) x1 ∂x ∂f = (x), ∂x
where after integration by parts the surface term is dropped because $\delta(x' - x) = 0$ for $x' \neq x$, which it is at $x' = x_1, x_2$ if $x \in (x_1, x_2)$. Thus
$$\frac{\delta}{\delta y(x)} L = -\int_0^L dx'\, \tau\, \frac{\partial y}{\partial x'}\,\delta'(x' - x) = \tau\, \frac{\partial^2 y}{\partial x^2},$$
and Lagrange's equations give the wave equation
$$\rho\, \ddot y(x, t) - \tau\, \frac{\partial^2 y}{\partial x^2} = 0.$$

Exercises

5.1 Three springs connect two masses to each other and to immobile walls, as shown. Find the normal modes and frequencies of oscillation, assuming the system remains along the line shown.

[Figure: two masses $m$ in a line, joined to each other by a spring $2k$ of length $2a$ and each joined to a wall by a spring $k$ of length $a$.]

5.2 Consider the motion, in a vertical plane, of a double pendulum consisting of two masses attached to each other and to a fixed point by inextensible strings of length $L$. The upper mass has mass $m_1$ and the lower mass $m_2$. This is all in a laboratory with the ordinary gravitational forces near the surface of the Earth.
[Figure: the double pendulum, with a string of length $L$ from the pivot to $m_1$ and another of length $L$ from $m_1$ to $m_2$.]

a) Set up the Lagrangian for the motion, assuming the strings stay taut.

b) Simplify the system under the approximation that the motion involves only small deviations from equilibrium. Put the problem in matrix form appropriate for the procedure discussed in class.

c) Find the frequencies of the normal modes of oscillation. [Hint: following exactly the steps given in class will be complex, but the analogous procedure reversing the order of $U$ and $T$ will work easily.]

5.3 (a) Show that if three mutually gravitating point masses are at the vertices of an equilateral triangle which is rotating about an axis normal to the plane of the triangle and through the center of mass, at a suitable angular velocity $\omega$, this motion satisfies the equations of motion. Thus this configuration is an equilibrium in the rotating coordinate system. Do not assume the masses are equal.

(b) Suppose that two stars of masses $M_1$ and $M_2$ are rotating in circular orbits about their common center of mass. Consider a small mass $m$ which is approximately in the equilibrium position described above (which is known as the $L_5$ point). The mass is small enough that you can ignore its effect on the two stars. Analyze the motion, considering specifically the stability of the equilibrium point as a function of the ratio of the masses of the stars.
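Small-oscillation problems of the kind posed in exercise 5.1 can be checked numerically. A sketch (Python with numpy, hypothetical values of $m$ and $k$): writing the linearized equations as $M\ddot x = -K x$ with the springs $k$, $2k$, $k$ of the figure, the squared frequencies are the eigenvalues of $M^{-1}K$, namely $k/m$ for the symmetric mode (the middle spring is never stretched) and $5k/m$ for the antisymmetric mode.

```python
import numpy as np

m, k = 2.0, 3.0                      # hypothetical values
Mmat = np.diag([m, m])               # kinetic-energy (mass) matrix
Kmat = np.array([[3*k, -2*k],        # potential-energy matrix for the
                 [-2*k, 3*k]])       # spring chain k, 2k, k

# omega^2 are the generalized eigenvalues of K v = omega^2 M v
omega2 = np.sort(np.linalg.eigvals(np.linalg.inv(Mmat) @ Kmat).real)
print(omega2)   # -> [k/m, 5k/m] = [1.5, 7.5] for these values
```

The corresponding eigenvectors are $(1,1)$ and $(1,-1)$, the symmetric and antisymmetric displacements.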
Chapter 6

Hamilton's Equations

We discussed the generalized momenta
$$p_i = \frac{\partial L(q, \dot q, t)}{\partial \dot q_i},$$
and how the canonical variables $\{q_i, p_j\}$ describe phase space. One can use phase space rather than $\{q_i, \dot q_j\}$ to describe the state of a system at any moment. In this chapter we will explore the tools which stem from this phase space approach to dynamics.

6.1 Legendre transforms

The important object for determining the motion of a system using the Lagrangian approach is not the Lagrangian itself but its variation, under arbitrary changes in the variables $q$ and $\dot q$, treated as independent variables. It is the vanishing of the variation of the action under such variations which determines the dynamical equations. In the phase space approach, we want to change variables $\dot q \to p$, where the $p_i$ are part of the gradient of the Lagrangian with respect to the velocities. This is an example of a general procedure called the Legendre transformation. We will discuss it in terms of the mathematical concept of a differential form.

Because it is the variation of $L$ which is important, we need to focus our attention on the differential $dL$ rather than on $L$ itself. We first
want to give a formal definition of the differential, which we will do first for a function $f(x_1, \ldots, x_n)$ of $n$ variables, although for the Lagrangian we will later subdivide these into coordinates and velocities. We will take the space in which $x$ takes values to be some general space we call $\mathcal{M}$, which might be ordinary Euclidean space but might be something else, like the surface of a sphere¹. Given a function $f$ of $n$ independent variables $x_i$, the differential is
$$df = \sum_{i=1}^n \frac{\partial f}{\partial x_i}\, dx_i. \qquad (6.1)$$
What does that mean? As an approximate statement, this can be regarded as saying
$$df \approx \Delta f \equiv f(x_i + \Delta x_i) - f(x_i) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}\, \Delta x_i + \mathcal{O}(\Delta x_i \Delta x_j),$$
with some statement about the $\Delta x_i$ being small, followed by the dropping of the "order $(\Delta x)^2$" terms. Notice that $df$ is a function not only of the point $x \in \mathcal{M}$, but also of the small displacements $\Delta x_i$. A very useful mathematical language emerges if we formalize the definition of $df$, extending its definition to arbitrary $\Delta x_i$, even when the $\Delta x_i$ are not small. Of course, for large $\Delta x_i$ they can no longer be thought of as the difference of two positions in $\mathcal{M}$ and $df$ no longer has the meaning of the difference of two values of $f$. Our formal $df$ is now defined as a linear function of these $\Delta x_i$ variables, which we therefore consider to be a vector $v$ lying in an $n$-dimensional vector space $\mathbb{R}^n$. Thus $df : \mathcal{M} \times \mathbb{R}^n \to \mathbb{R}$ is a real-valued function with two arguments, one in $\mathcal{M}$ and one in a vector space. The $dx_i$ which appear in (6.1) can be thought of as operators acting on this vector space argument to extract the $i$-th component, and the action of $df$ on the argument $(x, v)$ is $df(x, v) = \sum_i (\partial f/\partial x_i)\, v_i$.

This differential is a special case of a 1-form, as is each of the operators $dx_i$. All $n$ of these $dx_i$ form a basis of 1-forms, which are more generally
$$\omega = \sum_i \omega_i(x)\, dx_i.$$

¹Mathematically, $\mathcal{M}$ is a manifold, but we will not carefully define that here. The precise definition is available in Ref. [11].
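The statement that $df(x,\cdot)$ is the linear part of $\Delta f$ is easy to make concrete numerically. A sketch (plain Python, with a hypothetical function and displacement): for $f(x_1,x_2) = x_1^2 \sin x_2$, applying $df(x,v) = \sum_i (\partial f/\partial x_i)\,v_i$ to a small displacement reproduces $f(x+\Delta x) - f(x)$ up to terms of order $(\Delta x)^2$.

```python
import math

def f(x1, x2):
    # a hypothetical function on M = R^2
    return x1**2 * math.sin(x2)

def df(x1, x2, v1, v2):
    # df(x, v) = sum_i (partial f / partial x_i) v_i
    return 2 * x1 * math.sin(x2) * v1 + x1**2 * math.cos(x2) * v2

x = (1.3, 0.4)
dx = (1e-4, -2e-4)
exact = f(x[0] + dx[0], x[1] + dx[1]) - f(x[0], x[1])
linear = df(*x, *dx)
print(abs(exact - linear))   # of order |dx|^2, far smaller than |exact|
```

For large displacements the two quantities no longer agree, but $df(x,v)$ remains a perfectly well-defined linear function of $v$.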
If there exists an ordinary function $f(x)$ such that $\omega = df$, then $\omega$ is said to be an exact 1-form.

Consider $L(q_i, v_j, t)$, where $v_i = \dot q_i$. At a given time we consider $q$ and $v$ as independent variables. The differential of $L$ on the space of coordinates and velocities, at a fixed time, is
$$dL = \sum_i \frac{\partial L}{\partial q_i}\, dq_i + \sum_i \frac{\partial L}{\partial v_i}\, dv_i = \sum_i \frac{\partial L}{\partial q_i}\, dq_i + \sum_i p_i\, dv_i.$$
If we wish to describe physics in phase space $(q_i, p_i)$, we are making a change of variables from $v_i$ to the gradient with respect to these variables, $p_i = \partial L/\partial v_i$, where we focus now on the variables being transformed and ignore the fixed $q_i$ variables. So $dL = \sum_i p_i\, dv_i$, and the $p_i$ are functions of the $v_j$ determined by the function $L(v_i)$. Is there a function $g(p_i)$ which reverses the roles of $v$ and $p$, for which $dg = \sum_i v_i\, dp_i$? If we can invert the functions $p(v)$, we can define $g(p_i) = \sum_i v_i p_i - L(v_i(p_j))$, which has a differential
$$dg = \sum_i dv_i\, p_i + \sum_i v_i\, dp_i - dL = \sum_i dv_i\, p_i + \sum_i v_i\, dp_i - \sum_i p_i\, dv_i = \sum_i v_i\, dp_i$$
as requested, and which also determines the relationship between $v$ and $p$,
$$v_i = \frac{\partial g}{\partial p_i} = v_i(p_j),$$
giving the inverse relation to $p_k(v_\ell)$. This particular form of changing variables is called a Legendre transformation. In the case of interest here, the function $g$ is called $H(q_i, p_j, t)$, the Hamiltonian,
$$H = \sum_i \dot q_i p_i - L. \qquad (6.2)$$

Other examples of Legendre transformations occur in thermodynamics. The energy change of a gas in a variable container with heat flow is sometimes written
$$dE = đQ - p\, dV,$$
where $đQ$ is not an exact differential, and the heat $Q$ is not a well defined system variable. Instead one defines the entropy and temperature by $đQ = T\, dS$, and the entropy $S$ is a well defined property of the gas. Thus the state of the gas can be described by the two variables $S$ and $V$, and changes involve an energy change
$$dE = T\, dS - p\, dV.$$
We see that the temperature is $T = \partial E/\partial S|_V$. If we wish to find quantities appropriate for describing the gas as a function of $T$ rather than $S$, we define the free energy $F$ by $-F = TS - E$, so $dF = -S\, dT - p\, dV$, and we treat $F$ as a function $F(T, V)$. Alternatively, to use the pressure $p$ rather than $V$, we define the enthalpy $X(p, S) = Vp + E$, $dX = V\, dp + T\, dS$. To make both changes, and use $(T, p)$ to describe the state of the gas, we use the Gibbs free energy $G(T, p) = X - TS = E + Vp - TS$, $dG = V\, dp - S\, dT$.

Most Lagrangians we encounter have the decomposition $L = L_2 + L_1 + L_0$ into terms quadratic, linear, and independent of velocities, as considered in 2.1.5. Then the momenta are linear in velocities, $p_i = \sum_j M_{ij}\dot q_j + a_i$, or in matrix form $p = M \cdot \dot q + a$, which has the inverse relation $\dot q = M^{-1} \cdot (p - a)$. As $H = L_2 - L_0$,
$$H = \tfrac{1}{2}(p - a) \cdot M^{-1} \cdot (p - a) - L_0.$$
As an example, consider spherical coordinates, in which the kinetic energy is
$$T = \frac{m}{2}\left(\dot r^2 + r^2 \dot\theta^2 + r^2 \sin^2\theta\, \dot\phi^2\right) = \frac{1}{2m}\left(p_r^2 + \frac{p_\theta^2}{r^2} + \frac{p_\phi^2}{r^2 \sin^2\theta}\right).$$
Note that $p_\theta \neq \vec p \cdot \hat e_\theta$; in fact, it doesn't even have the same units.

The equations of motion in Hamiltonian form,
$$\dot q_k = \frac{\partial H}{\partial p_k}\bigg|_{q,t}, \qquad \dot p_k = -\frac{\partial H}{\partial q_k}\bigg|_{p,t},$$
are almost symmetric in their treatment of $q$ and $p$. If we define a $2N$ dimensional coordinate $\eta$ for phase space,
$$\eta_i = q_i, \quad \eta_{N+i} = p_i, \qquad \text{for } 1 \le i \le N,$$
we can write Hamilton's equations in terms of a particular matrix $J$,
$$\dot\eta_j = \sum_k J_{jk}\, \frac{\partial H}{\partial \eta_k}, \qquad \text{where } J = \begin{pmatrix} 0 & \mathbb{1}_{N\times N} \\ -\mathbb{1}_{N\times N} & 0 \end{pmatrix}.$$
$J$ is like a multidimensional version of the $i\sigma_y$ which we meet in quantum-mechanical descriptions of spin $1/2$ particles. It is real, antisymmetric, and because $J^2 = -\mathbb{1}$ it is orthogonal. Mathematicians would say that $J$ describes the complex structure on phase space.

For a given physical problem there is no unique set of generalized coordinates which describe it. Then transforming to the Hamiltonian may give different objects. A nice example is given in Goldstein, a mass on a spring attached to a "fixed point" which is on a truck moving at uniform velocity $v_T$, relative to the Earth. If we use the Earth coordinate $x$ to describe the mass, the equilibrium position of the spring is moving in time, $x_{\rm eq} = v_T t$, ignoring a negligible initial position. Thus $U = \frac12 k(x - v_T t)^2$, while $T = \frac12 m\dot x^2$ as usual, and
$$L = \tfrac12 m\dot x^2 - \tfrac12 k(x - v_T t)^2, \qquad p = m\dot x, \qquad H = \frac{p^2}{2m} + \tfrac12 k(x - v_T t)^2.$$
The equations of motion $\dot p = m\ddot x = -\partial H/\partial x = -k(x - v_T t)$, of course, show that $H$ is not conserved,
$$\frac{dH}{dt} = \frac{p}{m}\frac{dp}{dt} + k(\dot x - v_T)(x - v_T t) = -\frac{kp}{m}(x - v_T t) + \left(\frac{kp}{m} - kv_T\right)(x - v_T t) = -kv_T(x - v_T t) \neq 0.$$
Alternatively, $dH/dt = -\partial L/\partial t = -kv_T(x - v_T t) \neq 0$. This is not surprising; the spring exerts a force on the truck and the truck is doing work to keep the fixed point moving at constant velocity.

On the other hand, if we use the truck coordinate $x' = x - v_T t$, we may describe the motion in this frame with $T = \frac12 m\dot x'^2$, $U = \frac12 kx'^2$, $L' = \frac12 m\dot x'^2 - \frac12 kx'^2$, giving the correct equations of motion $p' = m\dot x'$, $\dot p' = m\ddot x' = -\partial L'/\partial x' = -kx'$. With this set of coordinates, the Hamiltonian is $H' = \dot x' p' - L' = p'^2/2m + \frac12 kx'^2$, which is conserved. From the correspondence between the two sets of variables, $x' = x - v_T t$ and $p' = p - mv_T$, we see that the Hamiltonians at corresponding points in phase space differ,
$$H(x, p) - H'(x', p') = \frac{p^2 - p'^2}{2m} = v_T p - \tfrac12 m v_T^2 \neq 0.$$
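The two Hamiltonians of the truck example can be compared numerically. A sketch (plain Python, hypothetical parameter values, using a standard fourth-order Runge-Kutta step): integrating $\dot x = p/m$, $\dot p = -k(x - v_T t)$ and monitoring both $H(x,p,t)$ and $H'(x',p')$ shows the latter (nearly) constant while the former oscillates.

```python
m, k, vT = 1.0, 4.0, 0.3             # hypothetical values

def deriv(t, x, p):
    # Hamilton's equations in the Earth coordinates
    return p / m, -k * (x - vT * t)

def H(t, x, p):                      # Earth-frame Hamiltonian, not conserved
    return p * p / (2 * m) + 0.5 * k * (x - vT * t) ** 2

def Hprime(t, x, p):                 # truck-frame Hamiltonian, conserved
    xp, pp = x - vT * t, p - m * vT
    return pp * pp / (2 * m) + 0.5 * k * xp ** 2

def rk4(t, x, p, dt):
    k1 = deriv(t, x, p)
    k2 = deriv(t + dt/2, x + dt/2 * k1[0], p + dt/2 * k1[1])
    k3 = deriv(t + dt/2, x + dt/2 * k2[0], p + dt/2 * k2[1])
    k4 = deriv(t + dt, x + dt * k3[0], p + dt * k3[1])
    return (x + dt/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            p + dt/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

t, x, p, dt = 0.0, 1.0, 0.0, 1e-3
Hs, Hps = [H(t, x, p)], [Hprime(t, x, p)]
for _ in range(5000):
    x, p = rk4(t, x, p, dt)
    t += dt
    Hs.append(H(t, x, p)); Hps.append(Hprime(t, x, p))
print(max(Hps) - min(Hps))   # tiny: H' is conserved (up to integration error)
print(max(Hs) - min(Hs))     # order one: H is not conserved
```

The drift in $H$ matches $dH/dt = -kv_T(x - v_T t)$, the power the truck must supply.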
6.2 Variations on phase curves

In applying Hamilton's Principle to derive Lagrange's Equations, we considered variations in which $\delta q_i(t)$ was arbitrary except at the initial and final times, but the velocities were fixed in terms of these, $\delta \dot q_i(t) = (d/dt)\delta q_i(t)$. In discussing dynamics in terms of phase space, this is not the most natural variation, because this means that the momenta are not varied independently. Here we will show that Hamilton's equations follow from a modified Hamilton's Principle, in which the momenta are freely varied.

We write the action in terms of the Hamiltonian,
$$I = \int_{t_i}^{t_f} \left[\sum_i p_i \dot q_i - H(q_j, p_j, t)\right] dt,$$
and consider its variation under arbitrary variation of the path in phase space, $(q_i(t), p_i(t))$. The $\dot q_i(t)$ is still $dq_i/dt$, but the momentum is varied free of any connection to $\dot q_i$. Then
$$\delta I = \int_{t_i}^{t_f} \left[\sum_i \delta p_i \left(\dot q_i - \frac{\partial H}{\partial p_i}\right) - \sum_i \delta q_i \left(\dot p_i + \frac{\partial H}{\partial q_i}\right)\right] dt + \sum_i p_i\, \delta q_i \Big|_{t_i}^{t_f},$$
where we have integrated the $\sum_i p_i\, d\delta q_i/dt$ term by parts. Note that in order to relate stationarity of the action to Hamilton's equations of motion, it is necessary only to constrain the $q_i(t)$ at the initial and final times, without imposing any limitations on the variation of $p_i(t)$, either at the endpoints, as we did for $q_i(t)$, or in the interior $(t_i, t_f)$, where we had previously related $p_i$ and $\dot q_j$. The relation between $\dot q_i$ and $p_j$ emerges instead among the equations of motion.

The $\dot q_i$ seems a bit out of place in a variational principle over phase space, and indeed we can rewrite the action integral as an integral of a 1-form over a path in extended phase space,
$$I = \int \left[\sum_i p_i\, dq_i - H(q, p, t)\, dt\right].$$
We will see, in section 6.6, that the first term of the integrand leads to a very important form on phase space, and that the whole integrand is an important 1-form on extended phase space.
6.3 Canonical transformations

We have seen that it is often useful to switch from the original set of coordinates in which a problem appeared to a different set in which the problem became simpler. We switched from cartesian to center-of-mass spherical coordinates to discuss planetary motion, for example, or from the Earth frame to the truck frame in the example in which we found how Hamiltonians depend on coordinate choices. In all these cases we considered a change of coordinates $q \to Q$, where each $Q_i$ is a function of all the $q_j$ and possibly time, but not of the momenta or velocities. This is called a point transformation. But we have seen that we can work in phase space where coordinates and momenta enter together in similar ways, and we might ask ourselves what happens if we make a change of variables on phase space, to new variables $Q_i(q, p, t)$, $P_i(q, p, t)$. We should not expect the Hamiltonian to be the same either in form or in value, as we saw even for point transformations, but there must be a new Hamiltonian $K(Q, P, t)$ from which we can derive the correct equations of motion,
$$\dot Q_i = \frac{\partial K}{\partial P_i}, \qquad \dot P_i = -\frac{\partial K}{\partial Q_i}.$$
The analog of $\eta$ for our new variables will be called $\zeta$, so
$$\zeta = \begin{pmatrix} Q \\ P \end{pmatrix}, \qquad \dot\zeta = J \cdot \frac{\partial K}{\partial \zeta}.$$
If this exists, we say the new variables $(Q, P)$ are canonical variables and the transformation $(q, p) \to (Q, P)$ is a canonical transformation.

These new Hamiltonian equations are related to the old ones, $\dot\eta = J \cdot \partial H/\partial \eta$, by the function which gives the new coordinates and momenta in terms of the old, $\zeta = \zeta(\eta, t)$. Then
$$\dot\zeta_i = \frac{d\zeta_i}{dt} = \sum_j \frac{\partial \zeta_i}{\partial \eta_j}\, \dot\eta_j + \frac{\partial \zeta_i}{\partial t}.$$
Let us write the Jacobian matrix $M_{ij} := \partial \zeta_i/\partial \eta_j$. In general, $M$ will not be a constant but a function on phase space. The above relation
for the velocities now reads
$$\dot\zeta = M \cdot \dot\eta + \frac{\partial \zeta}{\partial t}\bigg|_\eta.$$
The gradients in phase space are also related,
$$\frac{\partial}{\partial \eta_i}\bigg|_{t,\eta} = \sum_j \frac{\partial \zeta_j}{\partial \eta_i}\bigg|_{t,\eta}\, \frac{\partial}{\partial \zeta_j}\bigg|_{t,\zeta}, \qquad \text{or} \qquad \nabla_\eta = M^T \cdot \nabla_\zeta.$$
Thus we have
$$\dot\zeta = M \cdot \dot\eta + \frac{\partial \zeta}{\partial t} = M \cdot J \cdot \nabla_\eta H + \frac{\partial \zeta}{\partial t} = M \cdot J \cdot M^T \cdot \nabla_\zeta H + \frac{\partial \zeta}{\partial t} = J \cdot \nabla_\zeta K.$$
Let us first consider a canonical transformation which does not depend on time, so $\partial \zeta/\partial t|_\eta = 0$. We see that we can choose the new Hamiltonian to be the same as the old, $K = H$, and get correct mechanics, if
$$M \cdot J \cdot M^T = J. \qquad (6.3)$$
We will require this condition even when $\zeta$ does depend on $t$, but then we need to revisit the question of finding $K$.

The condition (6.3) on $M$ is similar to, and a generalization of, the condition for orthogonality of a matrix, $OO^T = \mathbb{1}$, which is of the same form with $J$ replaced by $\mathbb{1}$. Another example of this kind of relation in physics occurs in special relativity, where a Lorentz transformation $L_{\mu\nu}$ gives the relation between two coordinates, $x'_\mu = \sum_\nu L_{\mu\nu} x_\nu$, with $x_\nu$ a four dimensional vector with $x_4 = ct$. Then the condition which makes $L$ a Lorentz transformation is
$$L \cdot g \cdot L^T = g, \qquad \text{with } g = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}.$$
The matrix $g$ in relativity is known as the indefinite metric, and the condition on $L$ is known as pseudo-orthogonality. In our main discussion, however, $J$ is not a metric, as it is antisymmetric rather than symmetric, and the word which describes $M$ is symplectic.
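Condition (6.3) is easy to test numerically for a candidate transformation. A sketch (Python with numpy, one degree of freedom, with a hypothetical transformation chosen for illustration): for $Q = \ln q$, $P = qp$, the Jacobian $M$ satisfies $M \cdot J \cdot M^T = J$ at every point, so the transformation is canonical; rescaling $P$ spoils the condition.

```python
import numpy as np

J = np.array([[0.0, 1.0], [-1.0, 0.0]])   # the matrix J for one degree of freedom

def jacobian(q, p):
    # Jacobian M of the hypothetical transformation Q = ln q, P = q p
    return np.array([[1.0 / q, 0.0],
                     [p,       q  ]])

q, p = 1.7, -0.4                           # an arbitrary phase-space point
M = jacobian(q, p)
print(np.allclose(M @ J @ M.T, J))         # True: the transformation is canonical

# A modified transformation, e.g. P = 2 q p, fails the symplectic condition:
M_bad = np.array([[1.0 / q, 0.0], [2 * p, 2 * q]])
print(np.allclose(M_bad @ J @ M_bad.T, J)) # False
```

For a single degree of freedom, $M \cdot J \cdot M^T = J$ reduces to $\det M = 1$, which is why the factor of 2 breaks it.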
Just as for orthogonal transformations, symplectic transformations can be divided into those which can be generated by infinitesimal transformations (which are connected to the identity) and those which can not. Consider a transformation $M$ which is almost the identity, $M_{ij} = \delta_{ij} + \epsilon G_{ij}$, or $M = \mathbb{1} + \epsilon G$, where $\epsilon$ is considered some infinitesimal parameter while $G$ is a finite matrix. As $M$ is symplectic, $(\mathbb{1} + \epsilon G) \cdot J \cdot (\mathbb{1} + \epsilon G^T) = J$, which tells us that to lowest order in $\epsilon$,
$$GJ + JG^T = 0.$$
Comparing this to the condition for the generator of an infinitesimal rotation, $\Omega = -\Omega^T$, we see that it is similar except for the appearance of $J$ on opposite sides, changing orthogonality to symplecticity. The new variables under such a canonical transformation are $\zeta = \eta + \epsilon G \cdot \eta$.

One important example of an infinitesimal canonical transformation is the one which relates a time dependent transformation at different times. Suppose $\eta \to \zeta(\eta, t)$ is a canonical transformation which depends on time. One particular one is $\eta \to \zeta_0 = \zeta(\eta, t_0)$ for some particular time, so $\zeta_0 \to \zeta(\eta, t)$ is also a canonical transformation, and for $t = t_0 + \Delta t \approx t_0$ it will be nearly the identity if $\zeta(\eta, t)$ is differentiable.

Notice that the relationship ensuring Hamilton's equations exist,
$$M \cdot J \cdot M^T \cdot \nabla_\zeta H + \frac{\partial \zeta}{\partial t} = J \cdot \nabla_\zeta K,$$
with the symplectic condition $M \cdot J \cdot M^T = J$, implies $\nabla_\zeta (K - H) = -J \cdot \partial \zeta/\partial t$, so $K$ differs from $H$ here. This discussion holds as long as $M$ is symplectic, even if it is not an infinitesimal transformation.

6.4 Poisson Brackets

Suppose I have some function $f(q, p, t)$ on phase space and I want to ask how $f$ changes as the system evolves with time. Then
$$\frac{df}{dt} = \sum_i \frac{\partial f}{\partial q_i}\,\dot q_i + \sum_i \frac{\partial f}{\partial p_i}\,\dot p_i + \frac{\partial f}{\partial t} = \sum_i \frac{\partial f}{\partial q_i}\frac{\partial H}{\partial p_i} - \sum_i \frac{\partial f}{\partial p_i}\frac{\partial H}{\partial q_i} + \frac{\partial f}{\partial t}.$$
The structure of the first two terms is that of a Poisson bracket, a bilinear operation of functions on phase space defined by
$$[u, v] := \sum_i \frac{\partial u}{\partial q_i}\frac{\partial v}{\partial p_i} - \sum_i \frac{\partial u}{\partial p_i}\frac{\partial v}{\partial q_i}. \qquad (6.4)$$
The Poisson bracket is a fundamental property of the phase space. In symplectic language,
$$[u, v] = \sum_{i,j} \frac{\partial u}{\partial \eta_i}\, J_{ij}\, \frac{\partial v}{\partial \eta_j} = (\nabla_\eta u)^T \cdot J \cdot \nabla_\eta v. \qquad (6.5)$$
If we describe the system in terms of a different set of canonical variables $\zeta$, we should still find the function $f(t)$ changing at the same rate. We may think of $u$ and $v$ as functions of $\zeta$ as easily as of $\eta$, and we may ask whether $[u, v]_\zeta$ is the same as $[u, v]_\eta$. Using $\nabla_\eta = M^T \cdot \nabla_\zeta$, we have
$$[u, v]_\eta = \left(M^T \cdot \nabla_\zeta u\right)^T \cdot J \cdot M^T\, \nabla_\zeta v = (\nabla_\zeta u)^T \cdot M \cdot J \cdot M^T\, \nabla_\zeta v = (\nabla_\zeta u)^T \cdot J\, \nabla_\zeta v = [u, v]_\zeta,$$
so we see that the Poisson bracket is independent of the coordinatization used to describe phase space, as long as it is canonical.

The Poisson bracket plays such an important role in classical mechanics, and an even more important role in quantum mechanics, that it is worthwhile to discuss some of its abstract properties. First of all, from the definition it is obvious that it is antisymmetric:
$$[u, v] = -[v, u]. \qquad (6.6)$$
It is a linear operator on each function over constant linear combinations, but it satisfies a Leibniz rule for non-constant multiples,
$$[uv, w] = [u, w]v + u[v, w], \qquad (6.7)$$
which follows immediately from the definition, using Leibniz' rule on the partial derivatives. A very special relation is the Jacobi identity,
$$[u, [v, w]] + [v, [w, u]] + [w, [u, v]] = 0. \qquad (6.8)$$
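Before proving the identity in general, it can be verified directly for explicit functions. A sketch (Python, assuming the sympy library is available), with one degree of freedom and arbitrarily chosen phase-space functions:

```python
import sympy as sp

q, p = sp.symbols('q p')

def pb(u, v):
    # Poisson bracket [u, v] for one degree of freedom, as in eq. (6.4)
    return sp.diff(u, q) * sp.diff(v, p) - sp.diff(u, p) * sp.diff(v, q)

# Three arbitrary phase-space functions (hypothetical choices)
u = q**2 * p
v = sp.sin(q) + p**3
w = q * p + q**3

jacobi = pb(u, pb(v, w)) + pb(v, pb(w, u)) + pb(w, pb(u, v))
print(sp.simplify(jacobi))   # -> 0
```

Any other smooth choices of $u$, $v$, $w$ would do equally well; the cancellation is identical, not accidental.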
We need to prove that this is true. To simplify the presentation, we introduce some abbreviated notation. We use a subscript $_{,i}$ to indicate partial derivative with respect to $\eta_i$, so $u_{,i}$ means $\partial u/\partial \eta_i$, and $u_{,i,j}$ means $\partial(\partial u/\partial \eta_i)/\partial \eta_j$. We will assume all our functions on phase space are suitably differentiable, so $u_{,i,j} = u_{,j,i}$. We will also use the summation convention, that any index which appears twice in a term is assumed to be summed over². Then $[v, w] = v_{,i} J_{ij} w_{,j}$, and
$$[u, [v, w]] = [u, v_{,i} J_{ij} w_{,j}] = [u, v_{,i}] J_{ij} w_{,j} + v_{,i} J_{ij} [u, w_{,j}] = u_{,k} J_{k\ell}\, v_{,i,\ell}\, J_{ij} w_{,j} + v_{,i} J_{ij}\, u_{,k} J_{k\ell}\, w_{,j,\ell}.$$
In the Jacobi identity, there are two other terms like this, one with the substitution $u \to v \to w \to u$ and the other with $u \to w \to v \to u$, giving a sum of six terms. The only ones involving second derivatives of $v$ are the first term above and the one found from applying $u \to w \to v \to u$ to the second, $u_{,i} J_{ij}\, w_{,k} J_{k\ell}\, v_{,j,\ell}$. The indices are all dummy indices, summed over, so their names can be changed, by $i \to k \to j \to \ell \to i$, converting this term to $u_{,k} J_{k\ell}\, w_{,j} J_{ji}\, v_{,\ell,i}$. Adding the original term $u_{,k} J_{k\ell}\, v_{,i,\ell}\, J_{ij} w_{,j}$, and using $v_{,\ell,i} = v_{,i,\ell}$, gives $u_{,k} J_{k\ell}\, w_{,j} (J_{ji} + J_{ij})\, v_{,\ell,i} = 0$ because $J$ is antisymmetric. Thus the terms in the Jacobi identity involving second derivatives of $v$ vanish, but the same argument applies in pairs to the other terms, involving second derivatives of $u$ or of $w$, so they all vanish, and the Jacobi identity is proven.

This argument can be made more elegantly if we recognize that for each function $f$ on phase space, we may view $[f, \cdot]$ as a differential operator on functions $g$ on phase space, mapping $g \to [f, g]$. Calling this operator $D_f$, we see that
$$D_f = \sum_j \left(\sum_i \frac{\partial f}{\partial \eta_i}\, J_{ij}\right) \frac{\partial}{\partial \eta_j},$$
which is of the general form that a differential operator has,
$$D_f = \sum_j f_j\, \frac{\partial}{\partial \eta_j},$$

²This convention of understood summation was invented by Einstein, who called it the "greatest contribution of my life".
where the $f_j$ are an arbitrary set of functions on phase space. For the Poisson bracket, the functions $f_j$ are linear combinations of the $f_{,j}$, but $f_j \neq f_{,j}$. With this interpretation, $[f, g] = D_f g$, and $[h, [f, g]] = D_h D_f g$. Thus
$$[h, [f, g]] + [f, [g, h]] = [h, [f, g]] - [f, [h, g]] = D_h D_f g - D_f D_h g = (D_h D_f - D_f D_h)\, g,$$
and we see that this combination of Poisson brackets involves the commutator of differential operators. But such a commutator is always a linear differential operator itself,
$$D_h D_g = \sum_{ij} h_i \frac{\partial}{\partial \eta_i}\, g_j \frac{\partial}{\partial \eta_j} = \sum_{ij} h_i \frac{\partial g_j}{\partial \eta_i} \frac{\partial}{\partial \eta_j} + \sum_{ij} h_i g_j \frac{\partial^2}{\partial \eta_i \partial \eta_j},$$
$$D_g D_h = \sum_{ij} g_j \frac{\partial}{\partial \eta_j}\, h_i \frac{\partial}{\partial \eta_i} = \sum_{ij} g_j \frac{\partial h_i}{\partial \eta_j} \frac{\partial}{\partial \eta_i} + \sum_{ij} h_i g_j \frac{\partial^2}{\partial \eta_i \partial \eta_j},$$
so in the commutator, the second derivative terms cancel, and
$$D_h D_g - D_g D_h = \sum_{ij} h_i \frac{\partial g_j}{\partial \eta_i} \frac{\partial}{\partial \eta_j} - \sum_{ij} g_j \frac{\partial h_i}{\partial \eta_j} \frac{\partial}{\partial \eta_i} = \sum_{ij} \left(h_i \frac{\partial g_j}{\partial \eta_i} - g_i \frac{\partial h_j}{\partial \eta_i}\right) \frac{\partial}{\partial \eta_j}.$$
This is just another first order differential operator, so there are no second derivatives of $g$ left in this part of the Jacobi identity. In fact, the identity tells us that this combination is
$$D_h D_g - D_g D_h = D_{[h,g]}. \qquad (6.9)$$
An antisymmetric product which obeys the Jacobi identity is what makes a Lie algebra. Lie algebras are the infinitesimal generators of Lie groups, or continuous groups, one example of which is the group of rotations $SO(3)$ which we have already considered. Notice that the "product" here is not associative, $[u, [v, w]] \neq [[u, v], w]$. In fact, the difference $[u, [v, w]] - [[u, v], w] = [u, [v, w]] + [w, [u, v]] = -[v, [w, u]]$ by
the Jacobi identity, so the Jacobi identity replaces the law of associativity in a Lie algebra.

Recall that the rate at which a function on phase space, evaluated on the system as it evolves, changes with time is
$$\frac{df}{dt} = -[H, f] + \frac{\partial f}{\partial t}, \qquad (6.10)$$
where $H$ is the Hamiltonian. The function $[f, g]$ on phase space also evolves that way, of course, so
$$\frac{d[f, g]}{dt} = -[H, [f, g]] + \frac{\partial [f, g]}{\partial t} = [f, [g, H]] + [g, [H, f]] + \left[\frac{\partial f}{\partial t}, g\right] + \left[f, \frac{\partial g}{\partial t}\right] = \left[f, -[H, g] + \frac{\partial g}{\partial t}\right] + \left[g, [H, f] - \frac{\partial f}{\partial t}\right] = \left[f, \frac{dg}{dt}\right] - \left[g, \frac{df}{dt}\right].$$
If $f$ and $g$ are conserved quantities, $df/dt = dg/dt = 0$, and we have the important consequence that $d[f, g]/dt = 0$. This proves Poisson's theorem: The Poisson bracket of two conserved quantities is a conserved quantity.

We will now show an important theorem, known as Liouville's theorem, that the volume of a region of phase space is invariant under canonical transformations. This is not a volume in ordinary space, but a $2n$ dimensional volume, given by integrating the volume element $\prod_{i=1}^{2n} d\eta_i$ in the old coordinates, and by
$$\prod_{i=1}^{2n} d\zeta_i = \left|\det \frac{\partial \zeta_i}{\partial \eta_j}\right| \prod_{i=1}^{2n} d\eta_i = |\det M| \prod_{i=1}^{2n} d\eta_i$$
in the new, where we have used the fact that the change of variables requires a Jacobian in the volume element. But because $J = M \cdot J \cdot M^T$, $\det J = \det M\, \det J\, \det M^T = (\det M)^2 \det J$, and $J$ is nonsingular, so $\det M = \pm 1$, and the volume element is unchanged.
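Since time evolution under Hamilton's equations is itself a canonical transformation, the Jacobian of the time-$T$ flow map should have unit determinant. A sketch (Python with numpy, a pendulum-like system with hypothetical parameters, integrated with a leapfrog step, which is itself exactly area-preserving): the determinant of the Jacobian of $\eta(0) \to \eta(T)$, estimated by central differences, comes out equal to 1.

```python
import numpy as np

m, g_over_l = 1.0, 2.0                  # hypothetical pendulum parameters

def flow(theta, p, dt=1e-3, steps=2000):
    # Leapfrog integration of H = p^2/2m + (g/l)(1 - cos theta)
    for _ in range(steps):
        p -= 0.5 * dt * g_over_l * np.sin(theta)
        theta += dt * p / m
        p -= 0.5 * dt * g_over_l * np.sin(theta)
    return theta, p

eta0 = np.array([0.7, 0.3])             # initial (theta, p)
h = 1e-6
M = np.empty((2, 2))                    # Jacobian of the time-T flow map
for j in range(2):
    plus, minus = eta0.copy(), eta0.copy()
    plus[j] += h
    minus[j] -= h
    M[:, j] = (np.array(flow(*plus)) - np.array(flow(*minus))) / (2 * h)
print(np.linalg.det(M))                 # -> 1 to good accuracy
```

An initially small phase-space rectangle is sheared and rotated by the flow, but its area is preserved, which is the content of Liouville's theorem for one degree of freedom.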
In statistical mechanics, we generally do not know the actual state of a system, but know something about the probability that the system is in a particular region of phase space. As the transformation which maps possible values of $\eta(t_1)$ to the values into which they will evolve at time $t_2$ is a canonical transformation, this means that the volume of a region in phase space does not change with time, although the region itself changes. Thus the probability density, specifying the likelihood that the system is near a particular point of phase space, is invariant as we move along with the system.

6.5 Higher Differential Forms

In section 6.1 we discussed a reinterpretation of the differential $df$ as an example of a more general differential 1-form, a map $\omega : \mathcal{M} \times \mathbb{R}^n \to \mathbb{R}$. We saw that the $\{dx_i\}$ provide a basis for these forms, so the general 1-form can be written as $\omega = \sum_i \omega_i(x)\, dx_i$. The differential $df$ gave an example. We defined an exact 1-form as one which is a differential of some well-defined function $f$. What is the condition for a 1-form to be exact? If $\omega = \sum \omega_i\, dx_i$ is $df$, then $\omega_i = \partial f/\partial x_i = f_{,i}$, and
$$\omega_{i,j} = \frac{\partial \omega_i}{\partial x_j} = \frac{\partial^2 f}{\partial x_i\, \partial x_j} = \frac{\partial^2 f}{\partial x_j\, \partial x_i} = \omega_{j,i}.$$
Thus one necessary condition for $\omega$ to be exact is that the combination $\omega_{j,i} - \omega_{i,j} = 0$. We will define a 2-form to be the set of these objects which must vanish. In fact, we define a differential $k$-form to be a map
$$\omega^{(k)} : \mathcal{M} \times \underbrace{\mathbb{R}^n \times \cdots \times \mathbb{R}^n}_{k \text{ times}} \to \mathbb{R}$$
which is linear in its action on each of the $\mathbb{R}^n$ and totally antisymmetric in its action on the $k$ copies, and is a smooth function of $x \in \mathcal{M}$. At a
given point, a basis of the $k$-forms is³
$$dx_{i_1} \wedge dx_{i_2} \wedge \cdots \wedge dx_{i_k} := \sum_{P \in S_k} (-1)^P\, dx_{i_{P1}} \otimes dx_{i_{P2}} \otimes \cdots \otimes dx_{i_{Pk}}.$$
For example, in three dimensions there are three independent 2-forms at a point, $dx_1 \wedge dx_2$, $dx_1 \wedge dx_3$, and $dx_2 \wedge dx_3$, where $dx_1 \wedge dx_2 = dx_1 \otimes dx_2 - dx_2 \otimes dx_1$, which means that, acting on $u$ and $v$, $dx_1 \wedge dx_2(u, v) = u_1 v_2 - u_2 v_1$. The product $\wedge$ is called the wedge product or exterior product, and can be extended to act between $k_1$- and $k_2$-forms so that it becomes an associative distributive product. Note that this definition of a $k$-form agrees, for $k = 1$, with our previous definition, and for $k = 0$ tells us a 0-form is simply a function on $\mathcal{M}$. The general expression for a $k$-form is
$$\omega^{(k)} = \sum_{i_1 < \cdots < i_k} \omega_{i_1 \ldots i_k}(x)\, dx_{i_1} \wedge \cdots \wedge dx_{i_k}.$$
Let us consider some examples in three dimensional Euclidean space $E^3$, where there is a correspondence we can make between vectors and 1- and 2-forms. In this discussion we will not be considering how the objects change under changes in the coordinates of $E^3$, to which we will return later.

$k = 0$: As always, 0-forms are simply functions, $f(x)$, $x \in E^3$.

$k = 1$: A 1-form $\omega = \sum \omega_i\, dx_i$ can be thought of as, or associated with, a vector field $\vec A(x) = \sum \omega_i(x)\, \hat e_i$. Note that if $\omega = df$, $\omega_i = \partial f/\partial x_i$, so $\vec A = \vec\nabla f$.

$k = 2$: A general two form is a sum over the three independent wedge products with independent functions $B_{12}(x)$, $B_{13}(x)$, $B_{23}(x)$. Let

³Some explanation of the mathematical symbols might be in order here. $S_k$ is the group of permutations on $k$ objects, and $(-1)^P$ is the sign of the permutation $P$, which is plus or minus one if the permutation can be built from an even or an odd number, respectively, of transpositions of two of the elements. The tensor product $\otimes$ of two linear operators into a field is a linear operator which acts on the product space, or in other words a bilinear operator with two arguments. Here $dx_i \otimes dx_j$ is an operator on $\mathbb{R}^n \times \mathbb{R}^n$ which maps the pair of vectors $(u, v)$ to $u_i v_j$.
us extend the definition of $B_{ij}$ to make it an antisymmetric matrix, so
$$B = \sum_{i<j} B_{ij}\, dx_i \wedge dx_j = \sum_{i,j} B_{ij}\, dx_i \otimes dx_j.$$
As we did for the angular velocity matrix $\Omega$ in (4.2), we can condense the information in the antisymmetric matrix $B_{ij}$ into a vector field $\vec B = \sum B_i\, \hat e_i$, with $B_{ij} = \sum_k \epsilon_{ijk} B_k$. Note that this step requires that we are working in $E^3$ rather than some other dimension. Thus $B = \sum_{ijk} \epsilon_{ijk}\, B_k\, dx_i \otimes dx_j$.

$k = 3$: There is only one basis 3-form available in three dimensions, $dx_1 \wedge dx_2 \wedge dx_3$. Any other 3-form is proportional to this one, and in particular $dx_i \wedge dx_j \wedge dx_k = \epsilon_{ijk}\, dx_1 \wedge dx_2 \wedge dx_3$. The most general 3-form $C$ is simply specified by an ordinary function $C(x)$, which multiplies $dx_1 \wedge dx_2 \wedge dx_3$.

Having established, in three dimensions, a correspondence between vectors and 1- and 2-forms, and between functions and 0- and 3-forms, we can ask to what the wedge product corresponds in terms of these vectors. If $\vec A$ and $\vec C$ are two vectors corresponding to the 1-forms $A = \sum A_i\, dx_i$ and $C = \sum C_i\, dx_i$, and if $B = A \wedge C$, then
$$B = \sum_{ij} A_i C_j\, dx_i \wedge dx_j = \sum_{ij} (A_i C_j - A_j C_i)\, dx_i \otimes dx_j = \sum_{ij} B_{ij}\, dx_i \otimes dx_j,$$
so $B_{ij} = A_i C_j - A_j C_i$, and
$$B_k = \frac12 \sum_{ij} \epsilon_{kij} B_{ij} = \frac12 \sum_{ij} \epsilon_{kij} A_i C_j - \frac12 \sum_{ij} \epsilon_{kij} A_j C_i = \sum_{ij} \epsilon_{kij} A_i C_j,$$
so $\vec B = \vec A \times \vec C$, and the wedge product of two 1-forms is the cross product of their vectors.

If $A$ is a 1-form and $B$ is a 2-form, the wedge product $C = A \wedge B = C(x)\, dx_1 \wedge dx_2 \wedge dx_3$ is given by
$$C = A \wedge B = \sum_i \sum_{j<k} A_i B_{jk}\, dx_i \wedge dx_j \wedge dx_k = \sum_i \sum_{j<k} A_i B_{jk}\, \epsilon_{ijk}\, dx_1 \wedge dx_2 \wedge dx_3$$
$$= \frac12 \sum_i \sum_{jk} A_i B_{jk}\, \epsilon_{ijk}\, dx_1 \wedge dx_2 \wedge dx_3 = \sum_{i\ell} A_i B_\ell\, \delta_{i\ell}\, dx_1 \wedge dx_2 \wedge dx_3 = \vec A \cdot \vec B\; dx_1 \wedge dx_2 \wedge dx_3,$$
where we used the fact that $B_{jk}\epsilon_{ijk}$ is symmetric under $j \leftrightarrow k$, and that $\frac12 \sum_{jk} \epsilon_{ijk}\epsilon_{jk\ell} = \delta_{i\ell}$. So we see that the wedge product of a 1-form and a 2-form gives the dot product of their vectors.

The exterior derivative

We defined the differential of a function $f$, which we now call a 0-form, giving a 1-form $df = \sum f_{,i}\, dx_i$. Now we want to generalize the notion of differential so that $d$ can act on $k$-forms for arbitrary $k$. This generalized differential
$$d : k\text{-forms} \to (k+1)\text{-forms}$$
is called the exterior derivative. It is defined to be linear and to act on one term in the sum over basis elements by
$$d\left(f_{i_1 \ldots i_k}(x)\, dx_{i_1} \wedge \cdots \wedge dx_{i_k}\right) = \left(df_{i_1 \ldots i_k}(x)\right) \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} = \sum_j f_{i_1 \ldots i_k, j}\, dx_j \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k}.$$
Clearly some examples are called for, so let us look again at three dimensional Euclidean space.

$k = 0$: For a 0-form $f$, $df = \sum f_{,i}\, dx_i$, as we defined earlier. In terms of vectors, $df \sim \vec\nabla f$.

$k = 1$: For a 1-form $\omega = \sum \omega_i\, dx_i$,
$$d\omega = \sum_i d\omega_i \wedge dx_i = \sum_{ij} \omega_{i,j}\, dx_j \wedge dx_i = \sum_{ij} (\omega_{j,i} - \omega_{i,j})\, dx_i \otimes dx_j,$$
corresponding to a two form with $B_{ij} = \omega_{j,i} - \omega_{i,j}$. These $B_{ij}$ are exactly the things which must vanish if $\omega$ is to be exact. In three dimensional Euclidean space, we have a vector $\vec B$ with components $B_k = \frac12 \sum \epsilon_{kij}(\omega_{j,i} - \omega_{i,j}) = \sum \epsilon_{kij}\, \partial_i \omega_j = (\vec\nabla \times \vec\omega)_k$, so here the exterior derivative of a 1-form gives a curl, $\vec B = \vec\nabla \times \vec\omega$.
$k = 2$: On a two form $B = \sum_{i<j} B_{ij}\, dx_i \wedge dx_j$, the exterior derivative gives a 3-form $C = dB = \sum_k \sum_{i<j} B_{ij,k}\, dx_k \wedge dx_i \wedge dx_j$. In three-dimensional Euclidean space, this reduces to
$$C = \sum_k \sum_{i<j} (\partial_k B_{ij})\, \epsilon_{kij}\, dx_1 \wedge dx_2 \wedge dx_3 = \sum_k \partial_k B_k\, dx_1 \wedge dx_2 \wedge dx_3,$$
so $C(x) = \vec\nabla \cdot \vec B$, and the exterior derivative on a 2-form gives the divergence of the corresponding vector.

$k = 3$: If $C$ is a 3-form, $dC$ is a 4-form. In three dimensions there cannot be any 4-forms, so $dC = 0$ for all such forms.

We can summarize the action of the exterior derivative in three dimensions in this diagram:
$$f \;\stackrel{d}{\longrightarrow}\; \omega^{(1)} \sim \vec A \;\stackrel{d}{\longrightarrow}\; \omega^{(2)} \sim \vec B \;\stackrel{d}{\longrightarrow}\; \omega^{(3)},$$
where the three maps correspond to $\vec\nabla f$, $\vec\nabla \times \vec A$, and $\vec\nabla \cdot \vec B$ respectively.

Now that we have $d$ operating on all $k$-forms, we can ask what happens if we apply it twice. Looking first in three dimensions, on a 0-form we get $d^2 f = dA$ for $A \sim \vec\nabla f$, and $dA \sim \vec\nabla \times \vec A$, so $d^2 f \sim \vec\nabla \times \vec\nabla f$. But the curl of a gradient is zero, so $d^2 = 0$ in this case. On a one form, $d^2 A = dB$, $\vec B \sim \vec\nabla \times \vec A$, and $dB \sim \vec\nabla \cdot \vec B = \vec\nabla \cdot (\vec\nabla \times \vec A)$. Now we have the divergence of a curl, which is also zero. For higher forms in three dimensions we can only get zero because the degree of the form would be greater than three. Thus we have a strong hint that $d^2$ might vanish in general. To verify this, we apply $d^2$ to $\omega^{(k)} = \sum \omega_{i_1 \ldots i_k}\, dx_{i_1} \wedge \cdots \wedge dx_{i_k}$. Then
$$d\omega = \sum_j \sum_{i_1 < \cdots < i_k} (\partial_j \omega_{i_1 \ldots i_k})\, dx_j \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k},$$
$$d(d\omega) = \sum_{\ell j} \sum_{i_1 < \cdots < i_k} \underbrace{(\partial_\ell \partial_j \omega_{i_1 \ldots i_k})}_{\text{symmetric}}\, \underbrace{dx_\ell \wedge dx_j}_{\text{antisymmetric}} \wedge\, dx_{i_1} \wedge \cdots \wedge dx_{i_k} = 0.$$
This is a very important result. A $k$-form which is the exterior derivative of some $(k-1)$-form is called exact, while a $k$-form whose exterior
derivative vanishes is called closed, and we have just proven that all exact k-forms are closed.
   The converse is a more subtle question. In general, there are k-forms which are closed but not exact, given by harmonic functions on the manifold M, which form what is known as the cohomology of M. This has to do with global properties of the space, however, and locally every closed form can be written as an exact one.4 The precisely stated theorem, known as Poincaré's Lemma, is that if ω is a closed k-form on a coordinate neighborhood U of a manifold M, and if U is contractible to a point, then ω is exact on U. We will ignore the possibility of global obstructions and assume that we can write closed k-forms in terms of an exterior derivative acting on a (k − 1)-form.

Coordinate independence of k-forms

We have introduced forms in a way which makes them appear dependent on the coordinates x_i used to describe the space M. This is not what we want at all.5 We want to be able to describe physical quantities that have intrinsic meaning independent of a coordinate system. If we are presented with another set of coordinates y_j describing the same physical space, the points in this space set up a mapping, ideally an isomorphism, from one coordinate space to the other, y = y(x). If a function represents a physical field independent of coordinates, the actual function f(x) used with the x coordinates must be replaced by

   4 An example may be useful. In two dimensions, the 1-form ω = −y r⁻² dx + x r⁻² dy satisfies dω = 0 wherever it is well defined, but it is not well defined at the origin. Locally, we can write ω = dθ, where θ is the polar coordinate. But θ is not, strictly speaking, a function on the plane, even on the plane with the origin removed, because it is not single-valued. It is a well defined function on the plane with a half axis removed, which leaves a simply-connected region, a region with no holes.
In fact, this is the general condition for the exactness of a 1-form: a closed 1-form on a simply connected manifold is exact.
   5 Indeed, most mathematical texts will first define an abstract notion of a vector in the tangent space as a directional derivative operator, specified by equivalence classes of parameterized paths on M. Then 1-forms are defined as duals to these vectors. In the first step any coordinatization of M is tied to the corresponding basis of the vector space Rⁿ. While this provides an elegant coordinate-independent way of defining the forms, the abstract nature of this definition of vectors can be unsettling to a physicist.
another function f̃(y) when using the y coordinates. That they both describe the physical value at a given physical point requires f(x) = f̃(y) when y = y(x), or more precisely6 f(x) = f̃(y(x)). This associated function and coordinate system is called a scalar field.
   If we think of the differential df as the change in f corresponding to an infinitesimal change dx, then clearly df̃ is the same thing in different coordinates, provided we understand the dy_i to represent the same physical displacement as dx does. That means

      dy_k = Σ_j (∂y_k/∂x_j) dx_j.

As f(x) = f̃(y(x)) and f̃(y) = f(x(y)), the chain rule gives

      ∂f/∂x_i = Σ_j (∂f̃/∂y_j)(∂y_j/∂x_i),    ∂f̃/∂y_j = Σ_i (∂f/∂x_i)(∂x_i/∂y_j),

so

      df̃ = Σ_k (∂f̃/∂y_k) dy_k = Σ_{ijk} (∂f/∂x_i)(∂x_i/∂y_k)(∂y_k/∂x_j) dx_j
         = Σ_{ij} (∂f/∂x_i) δ_{ij} dx_j = Σ_i f_{,i} dx_i = df.

We impose this transformation law in general on the coefficients in our k-forms, to make the k-form invariant, which means that the coefficients are covariant,

      ω̃_j = Σ_i (∂x_i/∂y_j) ω_i,
      ω̃_{j1...jk} = Σ_{i1,i2,...,ik=1}^{n} ( Π_{l=1}^{k} ∂x_{il}/∂y_{jl} ) ω_{i1...ik}.

   6 More elegantly, giving the map x → y the name φ, so y = φ(x), we can state the relation as f̃ = f ◦ φ.
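As an aside (not part of the text), this covariant transformation law for the coefficients of a 1-form is just the chain rule, and can be checked with a short sympy sketch using the polar map x = r cos θ, y = r sin θ as the y-coordinates; the scalar field chosen is arbitrary.

```python
# Sketch, not from the text: check the covariant transformation law for
# the coefficients of a 1-form, using the polar map as the y-coordinates.
import sympy as sp

x, y, r, th = sp.symbols('x y r theta')
f = x ** 2 * y + sp.sin(x)                     # an arbitrary scalar field

polar = {x: r * sp.cos(th), y: r * sp.sin(th)}
ftilde = f.subs(polar)                          # same field in (r, theta)

ok = True
for yj in (r, th):
    lhs = sp.diff(ftilde, yj)                   # coefficient of dy_j in d(ftilde)
    rhs = sum(sp.diff(polar[xi], yj) * sp.diff(f, xi).subs(polar)
              for xi in (x, y))                 # sum_i (dx_i/dy_j) f_{,i}
    ok = ok and sp.simplify(lhs - rhs) == 0
print(ok)
```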
Integration of k-forms

Suppose we have a k-dimensional smooth "surface" S in M, parameterized by coordinates (u_1, ..., u_k). We define the integral of a k-form

      ω^(k) = Σ_{i1<...<ik} ω_{i1...ik} dx_{i1} ∧ ··· ∧ dx_{ik}

over S by

      ∫_S ω^(k) = Σ_{i1,i2,...,ik=1}^{n} ∫ ω_{i1...ik}(x(u)) ( Π_{l=1}^{k} ∂x_{il}/∂u_l ) du_1 du_2 ··· du_k.

We had better give some examples. For k = 1, the "surface" is actually a path Γ : u → x(u), and

      ∫_Γ Σ_i ω_i dx_i = ∫_{u_min}^{u_max} Σ_i ω_i(x(u)) (∂x_i/∂u) du,

which seems obvious. In vector notation this is ∫_Γ A · dr, the path integral of the vector A.
   For k = 2,

      ∫_S ω^(2) = Σ_{ij} ∫_S B_{ij} (∂x_i/∂u)(∂x_j/∂v) du dv.

In three dimensions, the parallelogram which is the image of the rectangle [u, u+du] × [v, v+dv] has edges (∂x/∂u)du and (∂x/∂v)dv, which has an area equal to the magnitude of

      "dS" = (∂x/∂u × ∂x/∂v) du dv

and a normal in the direction of "dS". Writing B_{ij} in terms of the corresponding vector B, B_{ij} = ε_{ijk} B_k, so

      ∫_S ω^(2) = Σ ∫_S ε_{ijk} B_k (∂x/∂u)_i (∂x/∂v)_j du dv
               = ∫_S B_k (∂x/∂u × ∂x/∂v)_k du dv = ∫_S B · dS,
so ∫ ω^(2) gives the flux of B through the surface.
   Similarly for k = 3 in three dimensions,

      Σ ε_{ijk} (∂x/∂u)_i (∂x/∂v)_j (∂x/∂w)_k du dv dw

is the volume of the parallelopiped which is the image of [u, u+du] × [v, v+dv] × [w, w+dw]. As ω_{ijk} = ω_{123} ε_{ijk}, this is exactly what appears:

      ∫ ω^(3) = Σ ∫ ε_{ijk} ω_{123} (∂x_i/∂u)(∂x_j/∂v)(∂x_k/∂w) du dv dw = ∫ ω_{123}(x) dV.

Notice that we have only defined the integration of k-forms over submanifolds of dimension k, not over other-dimensional submanifolds. These are the only integrals which have coordinate invariant meanings.
   We state7 a marvelous theorem, special cases of which you have seen often before, known as Stokes' Theorem. Let C be a k-dimensional submanifold of M, with ∂C its boundary. Let ω be a (k − 1)-form. Then Stokes' theorem says

      ∫_C dω = ∫_∂C ω.     (6.11)

This elegant jewel is actually familiar in several contexts in three dimensions. If k = 2, C is a surface, usually called S, bounded by a closed path Γ = ∂S. If ω is a 1-form associated with A, then ∫_Γ ω = ∫_Γ A · dℓ. dω is the 2-form ∼ ∇ × A, and ∫_S dω = ∫_S (∇ × A) · dS, so we see that this Stokes' theorem includes the one we first learned by that name. But it also includes other possibilities. We can try k = 3, where C = V is a volume with surface S = ∂V. Then if ω ∼ B is a two form, ∫_S ω = ∫_S B · dS, while dω ∼ ∇ · B, so ∫_V dω = ∫ ∇ · B dV, so here Stokes' general theorem gives Gauss's theorem. Finally, we could consider k = 1, C = Γ, which has a boundary ∂C consisting of two points, say A and B. Our 0-form ω = f is a function, and Stokes' theorem gives8 ∫_Γ df = f(B) − f(A), the "fundamental theorem of calculus".

   7 For a proof and for a more precise explanation of its meaning, we refer the reader to the mathematical literature. In particular [10] and [3] are advanced calculus texts which give elementary discussions in Euclidean 3-dimensional space. A more general treatment is (possibly???) given in [11].
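As an aside (not part of the text), the k = 2 case of (6.11) is easy to check numerically. The sketch below picks the arbitrary field A = (−y², x, 0), whose curl has z-component 1 + 2y, and compares the circulation around the unit circle with the flux of ∇ × A through the unit disk; both should come out close to π.

```python
# Sketch, not from the text: numerical check of int_S (curl A).dS =
# oint_{boundary S} A.dl for the unit disk and A = (-y^2, x, 0).
import numpy as np

# boundary integral over the unit circle, midpoint rule
n = 200000
t = (np.arange(n) + 0.5) * 2 * np.pi / n
xb, yb = np.cos(t), np.sin(t)
dxdt, dydt = -np.sin(t), np.cos(t)
line = np.sum(-yb ** 2 * dxdt + xb * dydt) * 2 * np.pi / n

# flux of curl A through the disk, midpoint rule in polar coordinates
nr, nth = 400, 400
rr = (np.arange(nr) + 0.5) / nr
tt = (np.arange(nth) + 0.5) * 2 * np.pi / nth
R, TH = np.meshgrid(rr, tt, indexing='ij')
flux = np.sum((1 + 2 * R * np.sin(TH)) * R) * (1 / nr) * (2 * np.pi / nth)

print(line, flux)     # both close to pi
```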
8 Note that there is a direction associated with the boundary, which is induced
6.6 The natural symplectic 2-form

We now turn our attention back to phase space, with a set of canonical coordinates (q_i, p_i). Using these coordinates we can define a particular 1-form ω_1 = Σ_i p_i dq_i. For a point transformation Q_i = Q_i(q_1, ..., q_n, t) we may use the same Lagrangian, reexpressed in the new variables, of course. Here the Q_i are independent of the velocities q̇_j, so on phase space9 dQ_i = Σ_j (∂Q_i/∂q_j) dq_j. The new velocities are given by

      Q̇_i = Σ_j (∂Q_i/∂q_j) q̇_j + ∂Q_i/∂t.

Thus the old canonical momenta,

      p_i = ∂L(q, q̇, t)/∂q̇_i |_{q,t} = Σ_j ∂L(Q, Q̇, t)/∂Q̇_j |_{q,t} · ∂Q̇_j/∂q̇_i |_{q,t} = Σ_j P_j ∂Q_j/∂q_i.

Thus the form ω_1 may be written

      ω_1 = Σ_i Σ_j P_j (∂Q_j/∂q_i) dq_i = Σ_j P_j dQ_j,

so the form of ω_1 is invariant under point transformations. This is too limited, however, for our current goals of considering general canonical transformations on phase space, under which ω_1 will not be invariant. However, its exterior derivative

      ω_2 := dω_1 = Σ_i dp_i ∧ dq_i

is invariant under all canonical transformations, as we shall show momentarily. This makes it special, the natural symplectic structure

by a direction associated with C itself. This gives an ambiguity in what we have stated, for example how the direction of an open surface induces a direction on the closed loop which bounds it. Changing this direction would clearly reverse the sign of ∫ A · dℓ. We have not worried about this ambiguity, but we cannot avoid noticing the appearance of the sign in this last example.
   9 We have not included a term (∂Q_i/∂t) dt which would be necessary if we were considering a form in the 2n + 1 dimensional extended phase space which includes time as one of its coordinates.
on phase space. We can reexpress ω_2 in terms of our combined coordinate notation η_i, because

      −Σ_{i<j} J_{ij} dη_i ∧ dη_j = −Σ_i dq_i ∧ dp_i = Σ_i dp_i ∧ dq_i = ω_2.

We must now show that the natural symplectic structure is indeed form invariant under canonical transformation. Thus if Q_i, P_i are a new set of canonical coordinates, combined into ζ_j, we expect the corresponding object formed from them, ω_2' = −Σ_{ij} J_{ij} dζ_i ⊗ dζ_j, to reduce to the same 2-form, ω_2. We first note that

      dζ_i = Σ_j (∂ζ_i/∂η_j) dη_j = Σ_j M_{ij} dη_j,

with the same Jacobian matrix M we met in (6.3). Thus

      ω_2' = −Σ_{ij} J_{ij} dζ_i ⊗ dζ_j = −Σ_{ij} Σ_{kl} J_{ij} M_{ik} dη_k ⊗ M_{jl} dη_l
           = −Σ_{kl} (M^T · J · M)_{kl} dη_k ⊗ dη_l.

Things will work out if we can show M^T · J · M = J, whereas what we know for canonical transformations from Eq. (6.3) is that M · J · M^T = J. We also know M is invertible and that J² = −1, so if we multiply this equation from the left by −J · M⁻¹ and from the right by J · M, we learn that

      −J · M⁻¹ · (M · J · M^T) · J · M = −J · M⁻¹ · J · J · M = J · M⁻¹ · M = J,

while simplifying M⁻¹ · M first gives the same quantity as

      −J · J · M^T · J · M = M^T · J · M,

which is what we wanted to prove. Thus we have shown that the 2-form ω_2 is form-invariant under canonical transformations, and deserves its name.
   One important property of the 2-form ω_2 on phase space is that it is non-degenerate; there is no vector v such that ω(·, v) = 0, which follows simply from the fact that the matrix J_{ij} is non-singular.
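As an aside (not part of the text), the step from M · J · M^T = J to M^T · J · M = J can be checked numerically. The sketch below assembles a symplectic M for n = 2 from two elementary symplectic blocks (a shear built from a symmetric S, and a point-transformation block built from an invertible A; both constructions are standard but chosen here for illustration) and confirms that both conditions hold.

```python
# Sketch, not from the text: for a symplectic M (satisfying M J M^T = J),
# verify numerically that M^T J M = J as well.
import numpy as np

n = 2
I = np.eye(n)
Z = np.zeros((n, n))
J = np.block([[Z, I], [-I, Z]])

rng = np.random.default_rng(0)
S = rng.normal(size=(n, n)); S = S + S.T            # symmetric
A = rng.normal(size=(n, n)) + 3 * I                  # invertible

shear = np.block([[I, S], [Z, I]])                   # symplectic
point = np.block([[A, Z], [Z, np.linalg.inv(A).T]])  # symplectic
M = shear @ point                                    # product is symplectic

print(np.allclose(M @ J @ M.T, J), np.allclose(M.T @ J @ M, J))
```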
Extended phase space

One way of looking at the evolution of a system is in phase space, where a given system corresponds to a point moving with time, and the general equations of motion correspond to a velocity field. Another way is to consider extended phase space, a 2n + 1 dimensional space with coordinates (q_i, p_i, t), for which a system's motion is a path, monotone in t. By the modified Hamilton's principle, the path of a system in this space is an extremum of the action I = ∫_{t_i}^{t_f} (Σ p_i dq_i − H(q, p, t) dt), which is the integral of the one-form

      ω_3 = Σ p_i dq_i − H(q, p, t) dt.

The exterior derivative of this form involves the symplectic structure, ω_2, as dω_3 = ω_2 − dH ∧ dt. The 2-form ω_2 on phase space is non-degenerate, and every vector in phase space is also in extended phase space. On such a vector, on which dt gives zero, the extra term gives only something in the dt direction, so there are still no vectors in this subspace which are annihilated by dω_3. Thus there is at most one direction in extended phase space which is annihilated by dω_3. But any 2-form in an odd number of dimensions must annihilate some vector, because in a given basis it corresponds to an antisymmetric matrix B_{ij}, and in an odd number of dimensions det B = det B^T = det(−B) = (−1)^{2n+1} det B = −det B, so det B = 0 and the matrix is singular, annihilating some vector ξ. In fact, for dω_3 this annihilated vector ξ is the tangent to the path the system takes through extended phase space.
   One way to see this is to simply work out what dω_3 is and apply it to the vector ξ, which is proportional to v = (q̇_i, ṗ_i, 1). So we wish to show dω_3(·, v) = 0. Evaluating

      Σ dp_i ∧ dq_i (·, v) = Σ dp_i dq_i(v) − Σ dq_i dp_i(v) = Σ dp_i q̇_i − Σ dq_i ṗ_i

      dH ∧ dt (·, v) = dH dt(v) − dt dH(v)
                     = ( Σ_i (∂H/∂q_i) dq_i + (∂H/∂p_i) dp_i + (∂H/∂t) dt ) · 1
                       − dt ( Σ_i (∂H/∂q_i) q̇_i + (∂H/∂p_i) ṗ_i + ∂H/∂t )
                     = Σ_i (∂H/∂q_i) dq_i + (∂H/∂p_i) dp_i − dt Σ_i ( (∂H/∂q_i) q̇_i + (∂H/∂p_i) ṗ_i )

      dω_3(·, v) = Σ_i (q̇_i − ∂H/∂p_i) dp_i − Σ_i (ṗ_i + ∂H/∂q_i) dq_i
                   + Σ_i ( (∂H/∂q_i) q̇_i + (∂H/∂p_i) ṗ_i ) dt
                 = 0,

where the vanishing is due to the Hamilton equations of motion.
   There is a more abstract way of understanding why dω_3(·, v) vanishes, from the modified Hamilton's principle, which states that if the path taken were infinitesimally varied from the physical path, there would be no change in the action. But this change is the integral of ω_3 along a loop, forwards in time along the first trajectory and backwards along the second. From Stokes' theorem this means the integral of dω_3 over a surface connecting these two paths vanishes. But this surface is a sum over infinitesimal parallelograms one side of which is v∆t and the other side of which10 is (δq(t), δp(t), 0). As this latter vector is an arbitrary function of t, each parallelogram must independently give 0, so that its contribution to the integral, dω_3((δq, δp, 0), v)∆t = 0. In addition, dω_3(v, v) = 0, of course, so dω_3(·, v) vanishes on a complete basis of vectors and is therefore zero.

6.6.1 Generating Functions

Consider a canonical transformation (q, p) → (Q, P), and the two 1-forms ω_1 = Σ_i p_i dq_i and ω_1' = Σ_i P_i dQ_i. We have mentioned that the difference of these will not vanish in general, but the exterior derivative of this difference, d(ω_1 − ω_1') = ω_2 − ω_2' = 0, so ω_1 − ω_1' is a closed 1-form. Thus it is exact11, and there must be a function F on phase space such that ω_1 − ω_1' = dF. We call F the generating function of the

   10 It is slightly more elegant to consider the path parameterized independently of time, and consider arbitrary variations (δq, δp, δt), because the integral involved in the action, being the integral of a 1-form, is independent of the parameterization. With this approach we find immediately that dω_3(·, v) vanishes on all vectors.
11 We are assuming phase space is simply connected, or else we are ignoring any complications which might ensue from F not being globally well defined.
canonical transformation.
   If the transformation (q, p) → (Q, P) is such that the old q's alone, without information about the old p's, do not impose any restrictions on the new Q's, then the dq and dQ are independent, and we can use q and Q to parameterize phase space12. Then knowledge of the function F(q, Q) determines the transformation, as

      p_i = ∂F/∂q_i |_Q,    −P_i = ∂F/∂Q_i |_q.

If the canonical transformation depends on time, the function F will also depend on time. Now if we consider the motion in extended phase space, we know the phase trajectory that the system takes through extended phase space is determined by Hamilton's equations, which could be written in any set of canonical coordinates, so in particular there is some Hamiltonian K(Q, P, t) such that the tangent to the phase trajectory, v, is annihilated by dω_3', where ω_3' = Σ P_i dQ_i − K(Q, P, t) dt. Now in general knowing that two 2-forms both annihilate the same vector would not be sufficient to identify them, but in this case we also know that restricting dω_3 and dω_3' to their action on the dt = 0 subspace gives the same 2-form ω_2. That is to say, if u and u' are two vectors with time components zero, we know that (dω_3 − dω_3')(u, u') = 0. Any vector can be expressed as a multiple of v and some vector u with time component zero, and as both dω_3 and dω_3' annihilate v, we see that dω_3 − dω_3' vanishes on all pairs of vectors, and is therefore zero. Thus ω_3 − ω_3' is a closed 1-form, which must be at least locally exact, and indeed ω_3 − ω_3' = dF, where F is the generating function we found above13. Thus dF = Σ p dq − Σ P dQ + (K − H) dt, or

      K = H + ∂F/∂t.

The function F(q, Q, t) is what Goldstein calls F_1. The existence of F as a function on extended phase space holds even if the Q and q

   12 Note that this is the opposite extreme from a point transformation, which is a canonical transformation for which the Q's depend only on the q's, independent of the p's.
   13 From its definition in that context, we found that in phase space, dF = ω_1 − ω_1', which is the part of ω_3 − ω_3' not in the time direction. Thus if ω_3 − ω_3' = dF' for some other function F', we know dF' − dF = (K' − K) dt for some new Hamiltonian function K'(Q, P, t), so this corresponds to an ambiguity in K.
are not independent, but in this case F will need to be expressed as a function of other coordinates. Suppose the new P's and the old q's are independent, so we can write F(q, P, t). Then define F_2 = Σ Q_i P_i + F. Then

      dF_2 = Σ Q_i dP_i + Σ P_i dQ_i + Σ p_i dq_i − Σ P_i dQ_i + (K − H) dt
           = Σ Q_i dP_i + Σ p_i dq_i + (K − H) dt,

so

      Q_i = ∂F_2/∂P_i,    p_i = ∂F_2/∂q_i,    K(Q, P, t) = H(q, p, t) + ∂F_2/∂t.

The generating function can be a function of old momenta rather than the old coordinates. Making one choice for the old coordinates and one for the new, there are four kinds of generating functions as described by Goldstein. Let us consider some examples. The function F_1 = Σ_i q_i Q_i generates an interchange of p and q,

      Q_i = p_i,    P_i = −q_i,

which leaves the Hamiltonian unchanged. We saw this clearly leaves the form of Hamilton's equations unchanged. An interesting generator of the second type is F_2 = Σ_i λ_i q_i P_i, which gives

      Q_i = λ_i q_i,    P_i = λ_i⁻¹ p_i,

a simple change in scale of the coordinates with a corresponding inverse scale change in momenta to allow [Q_i, P_j] = δ_{ij} to remain unchanged. This also doesn't change H. For λ_i = 1, this is the identity transformation, for which F = 0, of course.
   Placing point transformations in this language provides another example. For a point transformation, Q_i = f_i(q_1, ..., q_n, t), which is what one gets with a generating function

      F_2 = Σ_i f_i(q_1, ..., q_n, t) P_i.

Note that

      p_i = ∂F_2/∂q_i = Σ_j P_j ∂f_j/∂q_i

is at any point q a linear transformation of the momenta, required to preserve the canonical Poisson bracket, but this transformation is q-dependent, so while Q is a function of q and t only, independent of p, P(q, p, t) will in general have a nontrivial dependence on coordinates as well as a linear dependence on the old momenta.
   For a harmonic oscillator, a simple scaling gives

      H = p²/2m + ½ k q² = ½ √(k/m) (P² + Q²),

where Q = (km)^{1/4} q, P = (km)^{−1/4} p. In this form, thinking of phase space as just some two-dimensional space, we seem to be encouraged to consider a new, polar, coordinate system with θ = tan⁻¹(Q/P) as the new coordinate, and we might hope to have the radial coordinate related to the new momentum, P_θ = −∂F_1/∂θ. As ∂F_1/∂Q must be the old momentum P = Q cot θ, we can take F_1 = ½ Q² cot θ, so

      P_θ = −½ Q² (−csc² θ) = ½ Q² (1 + P²/Q²) = ½ (Q² + P²) = H/ω.

Note as F_1 is not time dependent, K = H and is independent of θ, which is therefore an ignorable coordinate, so its conjugate momentum P_θ is conserved. Of course P_θ differs from the conserved Hamiltonian H only by the factor ω = √(k/m), so this is not unexpected. With H now linear in the new momentum P_θ, the conjugate coordinate θ grows linearly with time at the fixed rate θ̇ = ∂H/∂P_θ = ω.

Infinitesimal generators, redux

Let us return to the infinitesimal canonical transformation

      ζ_i = η_i + ε g_i(η_j).

M_{ij} = ∂ζ_i/∂η_j = δ_{ij} + ε ∂g_i/∂η_j needs to be symplectic, and so G_{ij} = ∂g_i/∂η_j satisfies the appropriate condition for the generator of a symplectic matrix, G · J = −J · G^T. For the generator of the canonical transformation, we need a perturbation of the generator for the identity transformation, which can't be in F_1 form (as (q, Q) are not independent), but is easily done in F_2 form, F_2(q, P) = Σ_i q_i P_i + ε G(q, P, t), with p_i = ∂F_2/∂q_i = P_i + ε ∂G/∂q_i, Q_i = ∂F_2/∂P_i = q_i + ε ∂G/∂P_i, or

      ζ = (Q_i)   (q_i)       (  0   1I ) (∂G/∂q_i)
          (P_i) = (p_i) + ε   (−1I   0 ) (∂G/∂p_i)   = η + ε J · ∇G,

where we have ignored higher order terms in ε in inverting the q → Q relation and in replacing ∂G/∂Q_i with ∂G/∂q_i.
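As an aside (not part of the text), the two generating-function examples above, the interchange F_1 = Σ q_i Q_i and the scaling F_2 = Σ λ_i q_i P_i, can be recovered and checked with sympy for a single degree of freedom: the code below confirms that each resulting map preserves the canonical Poisson bracket, [Q, P]_{q,p} = 1.

```python
# Sketch, not from the text: the transformations generated by
# F1 = q Q (interchange) and F2 = lambda q P (scaling), for one degree
# of freedom, both preserve the canonical Poisson bracket.
import sympy as sp

q, p, lam = sp.symbols('q p lambda', positive=True)

def pb(A, B):          # Poisson bracket in the single (q, p) pair
    return sp.diff(A, q) * sp.diff(B, p) - sp.diff(A, p) * sp.diff(B, q)

# F1(q, Q) = q Q gives p = dF1/dq = Q and P = -dF1/dQ = -q:
Q1, P1 = p, -q
# F2(q, P) = lambda q P gives Q = dF2/dP = lambda q and p = dF2/dq = lambda P:
Q2, P2 = lam * q, p / lam

print(pb(Q1, P1), sp.simplify(pb(Q2, P2)))   # both equal 1
```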
The change due to the infinitesimal transformation may be written in terms of Poisson bracket with the coordinates themselves:

      δη = ζ − η = ε J · ∇G = ε [η, G].

In the case of an infinitesimal transformation due to time evolution, the small parameter can be taken to be ∆t, and δη = ∆t η̇ = ∆t [η, H], so we see that the Hamiltonian acts as the generator of time translations, in the sense that it maps the coordinate η of a system in phase space into the coordinates the system will have, due to its equations of motion, at a slightly later time.
   This last example encourages us to find another interpretation of canonical transformations. Up to now we have viewed the transformation as a change of variables describing an unchanged physical situation, just as the passive view of a rotation is to view it as a change in the description of an unchanged physical point in terms of a rotated set of coordinates. But rotations are also used to describe changes in the physical situation with regards to a fixed coordinate system14, and similarly in the case of motion through phase space, it is natural to think of the canonical transformation generated by the Hamiltonian as describing the actual motion of a system through phase space rather than as a change in coordinates. More generally, we may view a canonical transformation as a diffeomorphism15 of phase space onto itself, g : M → M with g(q, p) = (Q, P).
   For an infinitesimal canonical transformation, this active interpretation gives us a small displacement δη = ε[η, G] for every point η in phase space, so we can view G and its associated infinitesimal canonical transformation as producing a flow on phase space. G also builds a finite transformation by repeated application, so that we get a sequence of canonical transformations g^λ parameterized by λ = n∆λ.
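As an aside (not part of the text), for the harmonic oscillator with unit mass and frequency, H = (p² + q²)/2, we have [q, H] = p and [p, H] = −q, so the repeated-bracket construction can be carried out numerically: the sketch below sums the exponential series e^{t[·,H]} term by term and compares it with the exact evolution, a rotation of phase space.

```python
# Sketch, not from the text: build the finite time-evolution map for
# H = (p^2 + q^2)/2 by summing the series eta + t[eta,H] + t^2/2 [[eta,H],H] + ...
# where one application of [., H] sends (q, p) -> (p, -q).
import math

def flow_series(q0, p0, t, terms=30):
    q, p = q0, p0          # current repeated bracket, starting with eta itself
    Q = P = 0.0
    fact = 1.0             # t^n / n!
    for n in range(terms):
        Q += fact * q
        P += fact * p
        q, p = p, -q       # apply [., H] once more
        fact *= t / (n + 1)
    return Q, P

t = 1.3
Q, P = flow_series(1.0, 0.0, t)
exact = (math.cos(t), -math.sin(t))     # exact solution for (q0, p0) = (1, 0)
print(Q, P, exact)
```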
This sequence maps an initial η_0 into a sequence of points g^λ η_0, each generated from the previous one by the infinitesimal transformation ∆λ G, so g^{λ+∆λ} η_0 − g^λ η_0 = ∆λ [g^λ η_0, G]. In the limit ∆λ → 0, with
   14 We leave to Mach and others the question of whether this distinction is real.
   15 An isomorphism g : M → N is a 1-1 map with an image including all of N (onto), which is therefore invertible to form g⁻¹ : N → M. A diffeomorphism is an isomorphism g for which both g and g⁻¹ are differentiable.
n allowed to grow so that we consider a finite range of λ, we have a one (continuous) parameter family of transformations g^λ : M → M, satisfying the differential equation

      dg^λ(η)/dλ = [g^λ η, G].

This differential equation defines a phase flow on phase space. If G is not a function of λ, this has the form of a differential equation solved by an exponential, g^λ(η) = e^{λ[·,G]} η, which means

      g^λ(η) = η + λ[η, G] + ½ λ² [[η, G], G] + ....

In the case that the generating function is the Hamiltonian, G = H, this phase flow gives the evolution through time, λ is t, and the velocity field on phase space is given by [η, G]. If the Hamiltonian is time independent, the velocity field is fixed, and the solution is formally an exponential.
   Let me review changes due to a generating function. In the passive picture, we view η and ζ = η + δη as alternative coordinatizations of the same physical point in phase space. Let us call this point A when expressed in terms of the η coordinates and A' in terms of ζ. For an infinitesimal generator F_2 = Σ_i q_i P_i + εG, δη = εJ∇G = ε[η, G]. A physical scalar defined by a function u(η) changes its functional form to ũ, but not its value at a given physical point, so ũ(A') = u(A). For the Hamiltonian, there is a change in value as well, for H̃ or K is not the same as H, even at the corresponding point,

      K(A') = H(A) + ∂F_2/∂t = H(A) + ε ∂G/∂t.

   Now consider an active view. Here a canonical transformation is thought of as moving the point in phase space, and at the same time changing the functions u → ũ, H → K, where we are focusing on the form of these functions, on how they depend on their arguments. We
think of ζ as representing a different point B of phase space, although the coordinates η(B) are the same as ζ(A'). We ask how ũ and K differ from u and H at B. At the cost of differing from Goldstein by an overall sign, let

      ∆u = ũ(B) − u(B) = u(A) − u(A') = −Σ_i δη_i ∂u/∂η_i = −ε Σ_i [η_i, G] ∂u/∂η_i
         = −ε [u, G]

      ∆H = K(B) − H(B) = H(A) + ε ∂G/∂t − H(A') = ε ∂G/∂t − ε [H, G] = ε dG/dt.

Note that if the generator of the transformation is a conserved quantity, the Hamiltonian is unchanged, in that it is the same function after the transformation as it was before. That is, the Hamiltonian is form invariant.
   We have seen that conserved quantities are generators of symmetries of the problem, transformations which can be made without changing the Hamiltonian. We saw that the symmetry generators form a closed algebra under Poisson bracket, and that finite symmetry transformations result from exponentiating the generators. Let us discuss the more common conserved quantities in detail, showing how they generate symmetries. We have already seen that ignorable coordinates lead to conservation of the corresponding momentum. Now the reverse comes if we assume one of the momenta, say p_I, is conserved. Then from our discussion we know that the generator G = p_I will generate canonical transformations which are symmetries of the system. Those transformations are

      δq_j = ε [q_j, p_I] = ε δ_{jI},    δp_j = ε [p_j, p_I] = 0.

Thus the transformation just changes the one coordinate q_I and leaves all the other coordinates and all momenta unchanged. In other words, it is a translation of q_I. As the Hamiltonian is unchanged, it must be independent of q_I, and q_I is an ignorable coordinate.
   Second, consider the angular momentum component ω · L = Σ ε_{ijk} ω_i r_j p_k for a point particle with q = r. As a generator, ω · L produces changes

      δr_l = [r_l, Σ ε_{ijk} ω_i r_j p_k] = Σ ε_{ijk} ω_i r_j [r_l, p_k] = Σ ε_{ijk} ω_i r_j δ_{lk}
           = Σ ε_{ijl} ω_i r_j = (ω × r)_l,

which is how the point moves under a rotation about the axis ω. The momentum also changes,

      δp_l = [p_l, Σ ε_{ijk} ω_i r_j p_k] = Σ ε_{ijk} ω_i p_k [p_l, r_j] = Σ ε_{ijk} ω_i p_k (−δ_{lj})
           = −Σ ε_{ilk} ω_i p_k = (ω × p)_l,

so p also rotates.
   By Poisson's theorem, the set of constants of the motion is closed under Poisson bracket, and given two such generators, the bracket is also a symmetry, so the symmetries form a Lie algebra under Poisson bracket. For a free particle, p and L are both symmetries, and we have just seen that [p_l, L_i] = ε_{lik} p_k, a linear combination of symmetries, while of course [p_i, p_j] = 0 generates the identity transformation and is in the algebra. What about [L_i, L_j]? As L_i = Σ ε_{ikl} r_k p_l,

      [L_i, L_j] = Σ ε_{ikl} [r_k p_l, L_j]
                 = Σ ε_{ikl} r_k [p_l, L_j] + Σ ε_{ikl} [r_k, L_j] p_l
                 = −Σ ε_{ikl} ε_{jlm} r_k p_m + Σ ε_{ikl} ε_{jmk} r_m p_l
                 = Σ (δ_{ij} δ_{km} − δ_{im} δ_{kj}) r_k p_m − Σ (δ_{ij} δ_{lm} − δ_{im} δ_{lj}) r_m p_l
                 = Σ (δ_{ia} δ_{jb} − δ_{ib} δ_{ja}) r_a p_b = Σ ε_{kij} ε_{kab} r_a p_b
                 = Σ ε_{ijk} L_k.      (6.12)

We see that we get back the third component of L, so we do not get a new kind of conserved quantity, but instead we see that the algebra closes on the space spanned by the momenta and angular momenta. We also note that it is impossible to have two components of L conserved without the third component also being conserved. Note also that ω · L does a rotation the same way on the three vectors r, p, and L. Indeed it will do so on any vector composed from r and p, rotating all of the physical system16.

   16 If there is some rotationally non-invariant property of a particle which is not
   The above algebraic artifice is peculiar to three dimensions; in other dimensions d ≠ 3 there is no ε-symbol to make a vector out of L, but the angular momentum can always be treated as an antisymmetric tensor, L_{ij} = x_i p_j − x_j p_i. There are D(D−1)/2 components, and the Lie algebra again closes

      [L_{ij}, L_{kl}] = δ_{jk} L_{il} − δ_{ik} L_{jl} − δ_{jl} L_{ik} + δ_{il} L_{jk}.

   We have related conserved quantities to generators of infinitesimal canonical transformations, but these infinitesimals can be integrated to produce finite transformations as well. Suppose we consider a parameterized set of canonical transformations η → ζ(α), as a sequence of transformations generated by δα G acting repeatedly, so that

      ζ(α + δα) = ζ(α) + δα [ζ(α), G],    or    dζ/dα = [ζ, G].

The right side is linear in ζ, so the solution of this differential equation is, at least formally,

      ζ(α) = e^{α[·,G]} ζ(0) = (1 + α[·, G] + ½ α² [[·, G], G] + ...) ζ(0)
           = ζ(0) + α[ζ(0), G] + ½ α² [[ζ(0), G], G] + ....

In this fashion, any Lie algebra, and in particular the Lie algebra formed by the Poisson brackets of generators of symmetry transformations, can be exponentiated to form a continuous group, called a Lie Group. In the case of angular momentum, the three components form a three-dimensional Lie algebra, and the exponentials of these form a three-dimensional Lie group which is SO(3), the rotation group.

built out of r and p, it will not be suitably rotated by L = r × p, in which case L is not the full angular momentum but only the orbital angular momentum. The generator of a rotation of all of the physics, the full angular momentum J, is then the sum of L and another piece, called the intrinsic spin of the particle.
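As an aside (not part of the text), the closure relation (6.12) is easy to verify with sympy by computing the canonical Poisson brackets of the components of L = r × p directly:

```python
# Sketch, not from the text: verify [L_i, L_j] = eps_{ijk} L_k under the
# canonical Poisson bracket, with L = r x p.
import sympy as sp
from itertools import product

r = sp.symbols('x y z')
p = sp.symbols('px py pz')

def pb(A, B):
    return sum(sp.diff(A, r[i]) * sp.diff(B, p[i])
               - sp.diff(A, p[i]) * sp.diff(B, r[i]) for i in range(3))

L = [r[1]*p[2] - r[2]*p[1],
     r[2]*p[0] - r[0]*p[2],
     r[0]*p[1] - r[1]*p[0]]

eps = lambda i, j, k: (i - j) * (j - k) * (k - i) // 2   # Levi-Civita symbol

ok = all(sp.expand(pb(L[i], L[j])
                   - sum(eps(i, j, k) * L[k] for k in range(3))) == 0
         for i, j in product(range(3), repeat=2))
print(ok)
```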
6.7 Hamilton–Jacobi Theory

We have mentioned the time dependent canonical transformation that maps the coordinates of a system at a given fixed time t_0 into their values at a later time t. Now let us consider the reverse transformation, mapping (q(t), p(t)) → (Q = q_0, P = p_0). But then Q̇ = 0, Ṗ = 0, and the Hamiltonian which generates these trivial equations of motion is K = 0. We denote by S(q, P, t) the generating function of type 2 which generates this transformation. It satisfies

      K = H(q, p, t) + ∂S/∂t = 0,    with p_i = ∂S/∂q_i,

so S is determined by the differential equation

      H(q, ∂S/∂q, t) + ∂S/∂t = 0,     (6.13)

which we can think of as a partial differential equation in n + 1 variables q, t, thinking of P as fixed and understood. If H is independent of time, we can solve by separating the t from the q dependence: we may write S(q, P, t) = W(q, P) − αt, where α is the separation constant independent of q and t, but not necessarily of P. We get a time-independent equation

      H(q, ∂W/∂q) = α.     (6.14)

The function S is known as Hamilton's principal function, while the function W is called Hamilton's characteristic function, and the equations (6.13) and (6.14) are both known as the Hamilton-Jacobi equation. They are still partial differential equations in many variables, but under some circumstances further separation of variables may be possible. We consider first a system with one degree of freedom, with a conserved H, which we will sometimes specify even further to the particular case of a harmonic oscillator. Then we treat a separable system with two degrees of freedom.
   We are looking for new coordinates (Q, P) which are time independent, and have the differential equation for Hamilton's principal
function S(q, P, t):

      H(q, ∂S/∂q) + ∂S/∂t = 0.

For a harmonic oscillator with H = p²/2m + ½kq², this equation is

      (∂S/∂q)² + kmq² + 2m ∂S/∂t = 0.     (6.15)

We can certainly find a separated solution of the form S = W(q, P) − α(P)t, where the first two terms of (6.15) are independent of t. Then we have an ordinary differential equation,

      (dW/dq)² = 2mα − kmq²,

which can be easily integrated:

      W = ∫₀^q √(2mα − kmq²) dq + f(α) = f(α) + (α/ω)(θ + ½ sin 2θ),     (6.16)

where we have made a substitution sin θ = q √(k/2α) with ω = √(k/m), and made explicit note that the constant (in q) of integration, f(α), may depend on α. For other Hamiltonians, we will still have the solution to the partial differential equation for S given by separation of variables S = W(q, P) − αt, because H was assumed time-independent, but the integral for W may not be doable analytically.
   As S is a type 2 generating function,

      p = ∂F_2/∂q = ∂W/∂q.

For our harmonic oscillator, this gives

      p = (∂W/∂θ)/(∂q/∂θ) = [(α/ω)(1 + cos 2θ)] / [√(2α/k) cos θ] = √(2αm) cos θ.
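As an aside (not part of the text), the integrated characteristic function (6.16) can be sanity-checked with sympy: with the substitution sin θ = q √(k/2α), the claim p = dW/dq = √(2αm) cos θ is equivalent to dW/dθ = p · dq/dθ, which is verified below, along with H = α.

```python
# Sketch, not from the text: sympy check of (6.16) and of the momentum
# p = sqrt(2 alpha m) cos(theta) for the harmonic oscillator.
import sympy as sp

m, k, alpha = sp.symbols('m k alpha', positive=True)
theta = sp.symbols('theta')
omega = sp.sqrt(k / m)

q = sp.sqrt(2 * alpha / k) * sp.sin(theta)          # the substitution in (6.16)
W = (alpha / omega) * (theta + sp.sin(2 * theta) / 2)
p = sp.sqrt(2 * alpha * m) * sp.cos(theta)          # claimed momentum

residual = sp.simplify(sp.diff(W, theta) - p * sp.diff(q, theta))
H = p ** 2 / (2 * m) + k * q ** 2 / 2
print(residual, sp.simplify(H - alpha))             # both zero
```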
Plugging into the Hamiltonian, we have H = α(cos² θ + sin² θ) = α, which will always be the case when (6.14) holds. We have not spelled out what our new momentum P is, except that it is conserved, and we can take it to be α. (α = ωR in terms of our previous discussion of the harmonic oscillator.) The new coordinate Q = ∂S/∂P = ∂W/∂α|_q − t. But Q is, by hypothesis, time independent, so

      ∂W/∂α = t + Q.

For the harmonic oscillator calculation (6.16),

      f'(α) + (1/ω)(θ + ½ sin 2θ) + (α/ω)(1 + cos 2θ) ∂θ/∂α|_q = f'(α) + θ/ω = t + Q.

Recall sin θ = q √(k/2α), so

      ∂θ/∂α|_q = (−q/2α) √(k/2α) (1/cos θ) = −(1/2α) tan θ,

and θ = ωt + δ, for δ some constant.
   As an example of a nontrivial problem with two degrees of freedom which is nonetheless separable and therefore solvable using the Hamilton-Jacobi method, we consider the motion of a particle of mass m attracted by Newtonian gravity to two equal masses fixed in space. For simplicity we consider only motion in a plane containing the two masses, which we take to be at (±c, 0) in cartesian coordinates x, y. If r_1 and r_2 are the distances from the particle to the two fixed masses respectively, the gravitational potential is U = −K(r_1⁻¹ + r_2⁻¹), while the kinetic energy is simple in terms of x and y, T = ½ m(ẋ² + ẏ²). The relation between these is, of course,
$$r_1^2 = (x + c)^2 + y^2, \qquad r_2^2 = (x - c)^2 + y^2.$$
[Figure: the particle at $(x, y)$, a distance $r_1$ and $r_2$ from the two fixed masses at $(\pm c, 0)$.]

Considering both the kinetic and potential energies, the problem will not separate either in terms of $(x, y)$ or in terms of $(r_1, r_2)$, but it does separate in terms of elliptical coordinates
$$\xi = r_1 + r_2, \qquad \eta = r_1 - r_2.$$
From $r_1^2 - r_2^2 = 4cx = \xi\eta$ we find a fairly simple expression $\dot x = (\dot\xi\eta + \xi\dot\eta)/4c$. The expression for $y$ is more difficult, but can be found from observing that $\frac{1}{2}(r_1^2 + r_2^2) = x^2 + y^2 + c^2 = (\xi^2 + \eta^2)/4$, so
$$y^2 = \frac{\xi^2 + \eta^2}{4} - \left(\frac{\xi\eta}{4c}\right)^2 - c^2 = \frac{(\xi^2 - 4c^2)(4c^2 - \eta^2)}{16c^2},$$
or
$$y = \frac{1}{4c}\sqrt{\xi^2 - 4c^2}\sqrt{4c^2 - \eta^2},$$
and
$$\dot y = \frac{1}{4c}\left(\dot\xi\,\xi\sqrt{\frac{4c^2 - \eta^2}{\xi^2 - 4c^2}} - \dot\eta\,\eta\sqrt{\frac{\xi^2 - 4c^2}{4c^2 - \eta^2}}\right).$$
Squaring, adding in the $x$ contribution, and simplifying then shows that
$$T = \frac{m}{8}\left(\frac{\xi^2 - \eta^2}{4c^2 - \eta^2}\,\dot\eta^2 + \frac{\xi^2 - \eta^2}{\xi^2 - 4c^2}\,\dot\xi^2\right).$$
Note that there are no crossed terms $\propto \dot\xi\dot\eta$, a manifestation of the orthogonality of the curvilinear coordinates $\xi$ and $\eta$. The potential energy becomes
$$U = -K\left(\frac{1}{r_1} + \frac{1}{r_2}\right) = -K\left(\frac{2}{\xi + \eta} + \frac{2}{\xi - \eta}\right) = \frac{-4K\xi}{\xi^2 - \eta^2}.$$
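The absence of cross terms can be spot-checked numerically. This sketch (Python; illustrative only, with arbitrary values satisfying $\xi > 2c$ and $|\eta| < 2c$) compares $\frac{1}{2}m(\dot x^2 + \dot y^2)$, computed by the chain rule, with the elliptical-coordinate expression for $T$ above:

```python
import math

# Illustrative spot-check (not from the text): the cartesian kinetic energy
# equals the elliptical-coordinate form, with no xi_dot*eta_dot cross term.
m, c = 1.0, 0.5
xi, eta = 1.7, 0.3            # a point with xi > 2c and |eta| < 2c
xid, etad = 0.4, -0.2         # arbitrary generalized velocities
x_dot = (xid * eta + xi * etad) / (4 * c)
y = math.sqrt((xi**2 - 4 * c**2) * (4 * c**2 - eta**2)) / (4 * c)
# dy/dt by the chain rule, using sqrt((xi^2-4c^2)(4c^2-eta^2)) = 4*c*y
y_dot = (xid * xi * (4 * c**2 - eta**2) - etad * eta * (xi**2 - 4 * c**2)) / (16 * c**2 * y)
T_cart = 0.5 * m * (x_dot**2 + y_dot**2)
T_ell = (m / 8) * ((xi**2 - eta**2) / (4 * c**2 - eta**2) * etad**2
                   + (xi**2 - eta**2) / (xi**2 - 4 * c**2) * xid**2)
assert abs(T_cart - T_ell) < 1e-12
```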
In terms of the new coordinates $\xi$ and $\eta$ and their conjugate momenta, we see that
$$H = \frac{2/m}{\xi^2 - \eta^2}\left[p_\xi^2(\xi^2 - 4c^2) + p_\eta^2(4c^2 - \eta^2) - 2mK\xi\right].$$
Then the Hamilton-Jacobi equation for Hamilton's characteristic function is
$$\frac{2/m}{\xi^2 - \eta^2}\left[\left(\frac{\partial W}{\partial\xi}\right)^2(\xi^2 - 4c^2) + \left(\frac{\partial W}{\partial\eta}\right)^2(4c^2 - \eta^2) - 2mK\xi\right] = \alpha,$$
or
$$(\xi^2 - 4c^2)\left(\frac{\partial W}{\partial\xi}\right)^2 - 2mK\xi - \frac{1}{2}m\alpha\xi^2$$
$$\qquad + (4c^2 - \eta^2)\left(\frac{\partial W}{\partial\eta}\right)^2 + \frac{1}{2}\alpha m\eta^2 = 0.$$
The first line depends only on $\xi$, and the second only on $\eta$, so they must each be constant, with $W(\xi, \eta) = W_\xi(\xi) + W_\eta(\eta)$, and
$$(\xi^2 - 4c^2)\left(\frac{dW_\xi(\xi)}{d\xi}\right)^2 - 2mK\xi - \frac{1}{2}\alpha m\xi^2 = \beta,$$
$$(4c^2 - \eta^2)\left(\frac{dW_\eta(\eta)}{d\eta}\right)^2 + \frac{1}{2}\alpha m\eta^2 = -\beta.$$
These are now reduced to integrals for $W_i$, which can in fact be integrated to give an explicit expression in terms of elliptic integrals.

6.8 Action-Angle Variables

Consider again a general system with one degree of freedom and a conserved Hamiltonian. Suppose the system undergoes periodic behavior, with $p(t)$ and $q(t)$ periodic with period $\tau$. We don't require $q$ itself to be periodic, as it might be an angular variable which might not return to the same value when the system returns to the same physical point, as, for example, the angle which describes a rotation.
If we define an integral over one full period,
$$J(t) = \frac{1}{2\pi}\int_t^{t+\tau} p\,dq,$$
it will be time independent. As $p = \partial W/\partial q = p(q, \alpha)$, the integral can be defined without reference to time, just as the integral $2\pi J = \oint p\,dq$ over one orbit of $q$, holding $\alpha$ fixed. Then $J$ becomes a function of $\alpha$ alone, and if we assume this function to be invertible, $H = \alpha = \alpha(J)$. We can take $J$ to be our canonical momentum $P$. Using Hamilton's Principal Function $S$ as the generator, we find $Q = \partial S/\partial J = \partial W(q, J)/\partial J - (d\alpha/dJ)\,t$. Alternatively, we might use Hamilton's Characteristic Function $W$ by itself as the generator, to define the conjugate variable $\phi = \partial W(q, J)/\partial J$, which is simply related to $Q = \phi - (d\alpha/dJ)\,t$. Note that $\phi$ and $Q$ are both canonically conjugate to $J$, differing at any instant only by a function of $J$. As the Hamilton-Jacobi $Q$ is time independent, we see that $\dot\phi = d\alpha/dJ = dH/dJ = \omega(J)$, which is a constant, because while it is a function of $J$, $J$ is a constant in time. We could also derive $\dot\phi$ from Hamilton's equations considering $W$ as a generator, for $W$ is time independent, therefore the new Hamiltonian is unchanged, and the equation of motion for $\phi$ is simply $\dot\phi = \partial H/\partial J$. Either way, we see that $\phi = \omega t + \delta$. The coordinates $(J, \phi)$ are called action-angle variables. Consider the change in $\phi$ during one cycle:
$$\Delta\phi = \oint\frac{\partial\phi}{\partial q}\,dq = \oint\frac{\partial}{\partial q}\frac{\partial W}{\partial J}\,dq = \frac{d}{dJ}\oint p\,dq = \frac{d}{dJ}(2\pi J) = 2\pi.$$
Thus we see that in one period $\tau$, $\Delta\phi = 2\pi = \omega\tau$, so $\omega = 2\pi/\tau$. For our harmonic oscillator, of course,
$$2\pi J = \oint p\,dq = \int_0^{2\pi}\sqrt{2\alpha m}\,\cos\theta\,\sqrt{\frac{2\alpha}{k}}\,\cos\theta\,d\theta = \frac{2\alpha\pi}{\sqrt{k/m}},$$
so $J$ is just a constant $1/\sqrt{k/m}$ times the old canonical momentum $\alpha$, and thus its conjugate is $\phi = \sqrt{k/m}\,(t + \beta)$, so $\omega = \sqrt{k/m}$ as we expect. The important thing here is that $\Delta\phi = 2\pi$, even if the problem itself is not solvable.
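As a numerical illustration (a Python sketch, not from the text), one can evaluate $2\pi J = \oint p\,dq$ for the oscillator directly as an area integral and recover $J = \alpha/\omega$ with $\omega = \sqrt{k/m}$:

```python
import math

# Illustrative sketch: J = (1/2pi) * (closed integral of p dq) for
# H = p^2/2m + k*q^2/2 at energy alpha should equal alpha/omega.
m, k, alpha = 2.0, 3.0, 1.5   # arbitrary values
omega = math.sqrt(k / m)
qmax = math.sqrt(2 * alpha / k)
N = 200_000
area = 0.0                    # integral of p dq over the upper branch
for i in range(N):
    q = -qmax + 2 * qmax * (i + 0.5) / N
    area += math.sqrt(max(0.0, 2 * m * alpha - m * k * q * q)) * (2 * qmax / N)
J = 2 * area / (2 * math.pi)  # both branches of the closed orbit
assert abs(J - alpha / omega) < 1e-4
```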
Exercises

6.1 In Exercise 2.6, we discussed the connection between two Lagrangians, $L_1$ and $L_2$, which differed by a total time derivative of a function on extended configuration space,
$$L_1\left(\{q_i\}, \{\dot q_j\}, t\right) = L_2\left(\{q_i\}, \{\dot q_j\}, t\right) + \frac{d}{dt}\Phi(q_1, \ldots, q_n, t).$$
You found that these gave the same equations of motion, but differing momenta $p_i^{(1)}$ and $p_i^{(2)}$. Find the relationship between the two Hamiltonians, $H_1$ and $H_2$, and show that these lead to equivalent equations of motion.

6.2 A uniform static magnetic field can be described by a static vector potential $\vec A = \frac{1}{2}\vec B\times\vec r$. A particle of mass $m$ and charge $q$ moves under the influence of this field.
(a) Find the Hamiltonian, using inertial cartesian coordinates.
(b) Find the Hamiltonian, using coordinates of a rotating system with angular velocity $\vec\omega = -q\vec B/2mc$.

6.3 Consider a symmetric top with one point on the symmetry axis fixed in space, as we did at the end of chapter 4. Write the Hamiltonian for the top. Noting the cyclic (ignorable) coordinates, explain how this becomes an effective one-dimensional system.

6.4 (a) Show that a particle under a central force with an attractive potential inversely proportional to the distance squared has a conserved quantity $D = \frac{1}{2}\vec r\cdot\vec p - Ht$.
(b) Show that the infinitesimal transformation generated by $D$ scales $\vec r$ and $\vec p$ by opposite infinitesimal amounts, $\vec Q = (1 + \epsilon/2)\vec r$, $\vec P = (1 - \epsilon/2)\vec p$, or for a finite transformation $\vec Q = \lambda\vec r$, $\vec P = \lambda^{-1}\vec p$. Show that if we describe the motion in terms of a scaled time $T = \lambda^2 t$, the equations of motion are invariant under this combined transformation $(\vec r, \vec p, t) \to (\vec Q, \vec P, T)$.

6.5 We saw that the Poisson bracket associates with every differentiable function $f$ on phase space a differential operator $D_f := [f, \cdot]$ which acts on functions $g$ on phase space by $D_f g = [f, g]$.
We also saw that every differential operator is associated with a vector, which in a particular coordinate system has components $f_i$, where
$$D_f = \sum_i f_i\frac{\partial}{\partial\eta_i}.$$
A 1-form acts on such a vector by $dx_j(D_f) = f_j$. Show that for the natural symplectic structure $\omega_2$, acting on the differential operator coming from the Poisson bracket as its first argument,
$$\omega_2(D_f, \cdot) = df,$$
which indicates the connection between $\omega_2$ and the Poisson bracket.

6.6 Give a complete discussion of the relation of forms in cartesian coordinates in four dimensions to functions, vector fields, and antisymmetric matrix (tensor) fields, and what wedge products and exterior derivatives of the forms correspond to in each case. This discussion should parallel what is done in my book, Pages 148-150, for three dimensions. [Note that two different antisymmetric tensors, $B_{\mu\nu}$ and $\tilde B_{\mu\nu} = \frac{1}{2}\sum_{\rho\sigma}\epsilon_{\mu\nu\rho\sigma}B_{\rho\sigma}$, can be related to the same 2-form, in differing fashions. They are related to each other with the four dimensional $\epsilon_{jk\ell m}$, which you will need to define, and are called duals of each other. Using one fashion, the two different 2-forms associated with these two matrices are also called duals.]
(b) Let $F_{\mu\nu}$ be a $4\times4$ matrix defined over a four dimensional space $(x, y, z, ict)$, with matrix elements $F_{jk} = \sum_\ell\epsilon_{jk\ell}B_\ell$, for $j, k, \ell$ each 1, 2, 3, and $F_{4j} = iE_j = -F_{j4}$. Show that the statement that $F$ corresponds, by one of the two fashions, to a closed 2-form $\mathbf F$, constitutes two of Maxwell's equations, and explain how this implies that the 2-form is the exterior derivative of a 1-form, and what that 1-form is in terms of electromagnetic theory described in 3-dimensional language.
(c) Find the 3-form associated with the exterior derivative of the 2-form dual to $\mathbf F$, and show that it is associated with the 4-vector charge current density $J = (\vec\jmath, ic\rho)$, where $\vec\jmath$ is the usual current density and $\rho$ the usual charge density.
6.7 Consider the following differential forms:
$$A = y\,dx + x\,dy + dz,$$
$$B = y^2\,dx + x^2\,dy + dz,$$
$$C = xy(y - x)\,dx\wedge dy + y(y - 1)\,dx\wedge dz + x(x - 1)\,dy\wedge dz,$$
$$D = 2(x - y)\,dx\wedge dy\wedge dz,$$
$$E = 2(x - y)\,dx\wedge dy.$$
Find as many relations as you can, expressible without coordinates, among these forms. Consider using the exterior derivative and the wedge product.
Chapter 7

Perturbation Theory

The class of problems in classical mechanics which are amenable to exact solution is quite limited, but many interesting physical problems differ from such a solvable problem by corrections which may be considered small. One example is planetary motion, which can be treated as a perturbation on a problem in which the planets do not interact with each other, and the forces with the Sun are purely Newtonian forces between point particles. Another example occurs if we wish to find the first corrections to the linear small oscillation approximation to motion about a stable equilibrium point. The best starting point is an integrable system, for which we can find sufficient integrals of the motion to give the problem a simple solution in terms of action-angle variables as the canonical coordinates on phase space. One then phrases the full problem in such a way that the perturbations due to the extra interactions beyond the integrable forces are kept as small as possible. We first examine the solvable starting point.

7.1 Integrable systems

An integral of the motion for a hamiltonian system is a function $F$ on phase space $\mathcal M$ for which the Poisson bracket with $H$ vanishes, $[F, H] = 0$. More generally, a set of functions on phase space is said to be in involution if all their pairwise Poisson brackets vanish. The systems we shall consider are integrable systems in the sense that
there exists one integral of the motion for each degree of freedom, and these are in involution and independent. Thus on the $2n$-dimensional manifold of phase space, there are $n$ functions $F_i$ for which $[F_i, F_j] = 0$, and the $F_i$ are independent, so the $dF_i$ are linearly independent at each point $\eta \in \mathcal M$. We will assume the first of these is the Hamiltonian. As each of the $F_i$ is a conserved quantity, the motion of the system is confined to a submanifold of phase space determined by the initial values of these invariants $f_i = F_i(q(0), p(0))$:
$$M_f = \{\eta : F_i(\eta) = f_i \text{ for } i = 1, \ldots, n\}.$$
The differential operators $D_{F_i} = [F_i, \cdot]$ correspond to vectors tangent to the manifold $M_f$, because acting on each of the $F_j$ functions $D_{F_i}$ vanishes, as the $F$'s are in involution. These differential operators also commute with one another, because as we saw in (6.9),
$$D_{F_i}D_{F_j} - D_{F_j}D_{F_i} = D_{[F_i, F_j]} = 0.$$
They are also linearly independent, for if $\sum_i\alpha_i D_{F_i} = 0$, then $\sum_i\alpha_i D_{F_i}\eta_j = 0 = [\sum_i\alpha_i F_i, \eta_j]$, which means that $\sum_i\alpha_i F_i$ is a constant on phase space, and that would contradict the assumed independence of the $F_i$. Thus the $D_{F_i}$ are $n$ commuting independent differential operators corresponding to the generators $F_i$ of an Abelian group of displacements on $M_f$. A given reference point $\eta_0 \in \mathcal M$ is mapped by the canonical transformation generator $\sum_i t_i F_i$ into some other point $g^t(\eta_0) \in M_f$. If the manifold $M_f$ is compact, there must be many values of $t$ for which $g^t(\eta_0) = \eta_0$. These elements form an abelian subgroup, and therefore a lattice in $\mathbb R^n$. It has $n$ independent lattice vectors, and a unit cell which is in 1-1 correspondence with $M_f$. Let these basis vectors be $e_1, \ldots, e_n$. These are the edges of the unit cell in $\mathbb R^n$, the interior of which is the set of linear combinations $\sum_i a_i e_i$ where each of the $a_i \in [0, 1)$. We therefore have a diffeomorphism between this unit cell and $M_f$, which induces coordinates on $M_f$.
Because these are periodic, we scale the $a_i$ to new coordinates $\phi_i = 2\pi a_i$, so each point of $M_f$ is labelled by $\phi$, given by the $t = \sum_k\phi_k e_k/2\pi$ for which $g^t(\eta_0) = \eta$. Notice each $\phi_i$ is a coordinate on a circle, with $\phi_i = 0$ representing the same point as $\phi_i = 2\pi$, so the manifold $M_f$ is diffeomorphic to an $n$-dimensional torus $T^n = (S^1)^n$.
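As a concrete (and standard) example of integrals in involution, the two-dimensional isotropic oscillator has its Hamiltonian and angular momentum as independent integrals with $[H, L_z] = 0$. The sketch below (Python, not part of the text) checks this with a finite-difference Poisson bracket at an arbitrary phase-space point:

```python
# Illustrative sketch (not from the text): H and Lz for the 2-D isotropic
# oscillator are in involution, [H, Lz] = 0, checked by central differences.
def H(x, y, px, py):
    return 0.5 * (px * px + py * py) + 0.5 * (x * x + y * y)

def Lz(x, y, px, py):
    return x * py - y * px

def poisson(f, g, pt, h=1e-5):
    """[f, g] = df/dx dg/dpx - df/dpx dg/dx + df/dy dg/dpy - df/dpy dg/dy."""
    def d(fn, i):
        up, dn = list(pt), list(pt)
        up[i] += h
        dn[i] -= h
        return (fn(*up) - fn(*dn)) / (2 * h)
    return (d(f, 0) * d(g, 2) - d(f, 2) * d(g, 0)
            + d(f, 1) * d(g, 3) - d(f, 3) * d(g, 1))

pt = (0.3, -1.1, 0.7, 0.4)    # arbitrary point (x, y, px, py)
bracket = poisson(H, Lz, pt)
assert abs(bracket) < 1e-8
```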
Under an infinitesimal generator $\sum_i\delta t_i F_i$, a point of $M_f$ is translated by $\delta\eta = \sum_i\delta t_i[\eta, F_i]$. This is true for any choice of the coordinates $\eta$; in particular it can be applied to the $\phi_j$, so
$$\delta\phi_j = \sum_i\delta t_i[\phi_j, F_i],$$
where we have already expressed $\delta t = \sum_k\delta\phi_k e_k/2\pi$. We see that the Poisson bracket is the inverse of the matrix $A_{ji}$ given by the $j$'th coordinate of the $i$'th basis vector,
$$A_{ji} = \frac{1}{2\pi}(e_i)_j, \qquad \delta t = A\cdot\delta\phi, \qquad [\phi_j, F_i] = \left(A^{-1}\right)_{ji}.$$
As the Hamiltonian $H = F_1$ corresponds to the generator with $t = (1, 0, \ldots, 0)$, an infinitesimal time translation generated by $\delta t\,H$ produces a change $\delta\phi_i = (A^{-1})_{i1}\delta t = \omega_i\,\delta t$, for some vector $\omega$ which is determined by the $e_i$. Note that the periodicities $e_i$ may depend on the values of the integrals of the motion, so $\omega$ does as well, and we have
$$\frac{d\phi}{dt} = \omega(f).$$
The angle variables $\phi$ are not conjugate to the integrals of the motion $F_i$, but rather to combinations of them,
$$I_i = \frac{1}{2\pi}e_i(f)\cdot F,$$
for then
$$[\phi_j, I_i] = \frac{1}{2\pi}\sum_k\left(e_i(f)\right)_k[\phi_j, F_k] = \sum_k A_{ki}\left(A^{-1}\right)_{jk} = \delta_{ij}.$$
These $I_i$ are the action variables, which are functions of the original set $F_j$ of integrals of the motion, and therefore are themselves integrals of the motion. In action-angle variables the motion is very simple, with $I$
constant and $\dot\phi = \omega = $ constant. This is called conditionally periodic motion, and the $\omega_i$ are called the frequencies. If all the ratios of the $\omega_i$'s are rational, the motion will be truly periodic, with a period the least common multiple of the individual periods $2\pi/\omega_i$. More generally, there may be some relations
$$\sum_i k_i\omega_i = 0$$
for integer values $k_i$. Each of these is called a relation among the frequencies. If there are no such relations the frequencies are said to be independent frequencies.

In the space of possible values of $\omega$, the subspace of values for which the frequencies are independent is surely dense. In fact, most such points have independent frequencies. We should be able to say then that most of the invariant tori $M_f$ have independent frequencies if the mapping $\omega(f)$ is one-to-one. This condition is
$$\det\left(\frac{\partial\omega}{\partial f}\right) \neq 0, \qquad\text{or equivalently}\qquad \det\left(\frac{\partial\omega}{\partial I}\right) \neq 0.$$
When this condition holds the system is called a nondegenerate system. As $\omega_i = \partial H/\partial I_i$, this condition can also be written as $\det\left(\partial^2H/\partial I_i\partial I_j\right) \neq 0$.

Consider a function $g$ on $M_f$. We define two averages of this function. One is the time average we get starting at a particular point $\phi_0$ and averaging over an infinitely long time,
$$\langle g\rangle_t(\phi_0) = \lim_{T\to\infty}\frac{1}{T}\int_0^T g(\phi_0 + \omega t)\,dt.$$
We may also define the average over phase space, that is, over all values of $\phi$ describing the submanifold $M_f$,
$$\langle g\rangle_{M_f} = (2\pi)^{-n}\int_0^{2\pi}\!\!\cdots\int_0^{2\pi}g(\phi)\,d\phi_1\ldots d\phi_n,$$
where we have used the simple measure $d\phi_1\ldots d\phi_n$ on the space $M_f$. Then an important theorem states that, if the frequencies are independent, and $g$ is a continuous function on $M_f$, the time and space
averages of $g$ are the same. Note any such function $g$ can be expanded in a Fourier series, $g(\phi) = \sum_{k\in\mathbb Z^n}g_k e^{ik\cdot\phi}$, with $\langle g\rangle_{M_f} = g_0$, while
$$\langle g\rangle_t = \lim_{T\to\infty}\frac{1}{T}\int_0^T\sum_k g_k e^{ik\cdot\phi_0 + ik\cdot\omega t}\,dt = g_0 + \sum_{k\neq0}g_k e^{ik\cdot\phi_0}\lim_{T\to\infty}\frac{1}{T}\int_0^T e^{ik\cdot\omega t}\,dt = g_0,$$
because
$$\lim_{T\to\infty}\frac{1}{T}\int_0^T e^{ik\cdot\omega t}\,dt = \lim_{T\to\infty}\frac{1}{T}\,\frac{e^{ik\cdot\omega T} - 1}{ik\cdot\omega} = 0,$$
as long as the denominator does not vanish. It is this requirement, that $k\cdot\omega \neq 0$ for all nonzero $k \in \mathbb Z^n$, which requires the frequencies to be independent.

As an important corollary of this theorem, when it holds the trajectory is dense in $M_f$, and uniformly distributed, in the sense that the time spent in each specified volume of $M_f$ is proportional to that volume, independent of the position or shape of that volume. If instead of independence we have relations among the frequencies, these relations, each given by a $k \in \mathbb Z^n$, form a subgroup of $\mathbb Z^n$ (an additive group of translations by integers along each of the axes). Each such $k$ gives a constant of the motion, $k\cdot\phi$. Each independent relation among the frequencies therefore restricts the dimensionality of the motion by an additional dimension, so if the subgroup is generated by $r$ such independent relations, the motion is restricted to a manifold of reduced dimension $n - r$, and the motion on this reduced torus $T^{n-r}$ is conditionally periodic with $n - r$ independent frequencies. The theorem and corollaries just discussed then apply to this reduced invariant torus, but not to the whole $n$-dimensional torus with which we started. In particular, $\langle g\rangle_t(\phi_0)$ can depend on $\phi_0$ as it varies from one submanifold $T^{n-r}$ to another, but not along paths on the same submanifold. If the system is nondegenerate, for typical $I$ the $\omega_i$'s will have no relations and the invariant torus will be densely filled by the motion of the system. Therefore the invariant tori are uniquely defined, although the choice of action and angle variables is not.
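The averaging theorem can be illustrated numerically (a Python sketch, not part of the text): on a two-torus with the independent frequencies $\omega = (1, \sqrt2)$, the time average of $g(\phi) = \cos\phi_1\cos\phi_2$ tends to its phase-space average $g_0 = 0$:

```python
import math

# Illustrative sketch: time average of g(phi) = cos(phi1)*cos(phi2) along
# phi(t) = omega*t with incommensurate omega = (1, sqrt(2)); the Fourier
# expansion of g has g0 = 0, so the time average should tend to 0 as well.
w1, w2 = 1.0, math.sqrt(2.0)
T, N = 2000.0, 200_000
dt = T / N
avg = 0.0
for i in range(N):
    t = (i + 0.5) * dt
    avg += math.cos(w1 * t) * math.cos(w2 * t) * dt
avg /= T
assert abs(avg) < 0.01
```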
In the degenerate case the motion of the system does not fill the n dimensional invariant torus,
so it need not be uniquely defined. This is what happens, for example, for the two dimensional harmonic oscillator or for the Kepler problem.

7.2 Canonical Perturbation Theory

We now consider a problem with a conserved Hamiltonian which is in some sense approximated by an integrable system with $n$ degrees of freedom. This integrable system is described with a Hamiltonian $H^{(0)}$, and we assume we have described it in terms of its action variables $I_i^{(0)}$ and angle variables $\phi_i^{(0)}$. This system is called the unperturbed system, and the Hamiltonian is, of course, independent of the angle variables,
$$H^{(0)}\left(I^{(0)}, \phi^{(0)}\right) = H^{(0)}\left(I^{(0)}\right).$$
The action-angle variables of the unperturbed system are a canonical set of variables for the phase space, which is still the same phase space for the full system. We write the Hamiltonian of the full system as
$$H\left(I^{(0)}, \phi^{(0)}\right) = H^{(0)}\left(I^{(0)}\right) + \epsilon H_1\left(I^{(0)}, \phi^{(0)}\right). \qquad (7.1)$$
We have included the parameter $\epsilon$ so that we may regard the terms in $H_1$ as fixed in strength relative to each other, and still consider a series expansion in $\epsilon$, which gives an overall scale to the smallness of the perturbation.

We might imagine that if the perturbation is small, there are some new action-angle variables $I_i$ and $\phi_i$ for the full system, which differ by order $\epsilon$ from the unperturbed coordinates. These are new canonical coordinates, and may be generated by a generating function (of type 2),
$$F\left(I, \phi^{(0)}\right) = \sum_i\phi_i^{(0)}I_i + \epsilon F_1\left(I, \phi^{(0)}\right) + \ldots.$$
This is a time-independent canonical transformation, so the full Hamiltonian is the same function on phase-space whether the unperturbed or full action-angle variables are used, but has a different functional form,
$$\tilde H(I, \phi) = H\left(I^{(0)}, \phi^{(0)}\right). \qquad (7.2)$$
Note that the phase space itself is described periodically by the coordinates $\phi^{(0)}$, so the Hamiltonian perturbation $H_1$ and the generating
function $F_1$ are periodic functions (with period $2\pi$) in these variables. Thus we can expand them in Fourier series:
$$H_1\left(I^{(0)}, \phi^{(0)}\right) = \sum_k H_{1k}\left(I^{(0)}\right)e^{ik\cdot\phi^{(0)}}, \qquad (7.3)$$
$$F_1\left(I, \phi^{(0)}\right) = \sum_k F_{1k}(I)\,e^{ik\cdot\phi^{(0)}}, \qquad (7.4)$$
where the sum is over all $n$-tuples of integers $k \in \mathbb Z^n$. The zeros of the new angles are arbitrary for each $I$, so we may choose $F_{10}(I) = 0$.

The unperturbed action variables, on which $H_0$ depends, are the old momenta given by $I_i^{(0)} = \partial F/\partial\phi_i^{(0)} = I_i + \epsilon\,\partial F_1/\partial\phi_i^{(0)} + \ldots$, so to first order
$$H_0\left(I^{(0)}\right) = H_0(I) + \epsilon\sum_j\frac{\partial H_0}{\partial I_j^{(0)}}\frac{\partial F_1}{\partial\phi_j^{(0)}} + \ldots = H_0(I) + \epsilon\sum_j\omega_j^{(0)}\sum_k ik_j F_{1k}(I)\,e^{ik\cdot\phi^{(0)}} + \ldots, \qquad (7.5)$$
where we have noted that $\partial H_0/\partial I_j^{(0)} = \omega_j^{(0)}$, the frequencies of the unperturbed problem. Thus
$$\tilde H(I, \phi) = H\left(I^{(0)}, \phi^{(0)}\right) = H^{(0)}\left(I^{(0)}\right) + \epsilon\sum_k H_{1k}\left(I^{(0)}\right)e^{ik\cdot\phi^{(0)}} = H_0(I) + \epsilon\sum_k\left(\sum_j ik_j\omega_j^{(0)}F_{1k}(I) + H_{1k}\left(I^{(0)}\right)\right)e^{ik\cdot\phi^{(0)}}.$$
The $I$ are the action variables of the full Hamiltonian, so $\tilde H(I, \phi)$ is in fact independent of $\phi$. In the sum over Fourier modes on the right hand side, the $\phi^{(0)}$ dependence of the terms in parentheses due to the difference of $I^{(0)}$ from $I$ is higher order in $\epsilon$, so the coefficients of $e^{ik\cdot\phi^{(0)}}$ may be considered constants in $\phi^{(0)}$ and therefore must vanish for $k \neq 0$. Thus the generating function is given in terms of the Hamiltonian perturbation by
$$F_{1k} = i\,\frac{H_{1k}}{k\cdot\omega^{(0)}(I)}, \qquad k \neq 0. \qquad (7.6)$$
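Equation (7.6) can be illustrated in one degree of freedom (a Python sketch, not part of the text). Taking the hypothetical perturbation $H_1(\phi) = \cos\phi$, whose only nonzero Fourier coefficients are $H_{1,\pm1} = \frac12$, (7.6) gives $F_1(\phi) = -\sin\phi/\omega$; the combination $\omega\,\partial F_1/\partial\phi + H_1(\phi)$ is then angle-independent, equal to the mean $H_{10} = 0$, which is exactly the cancellation of the $k \neq 0$ modes at first order:

```python
import math

# Illustrative sketch: for H1(phi) = cos(phi) with frequency omega, (7.6)
# gives F1(phi) = -sin(phi)/omega, and omega*dF1/dphi + H1(phi) equals the
# angle-average H10 = 0 at every phi.
omega = 1.7                               # arbitrary unperturbed frequency

def H1(phi):
    return math.cos(phi)

def F1(phi):
    return -math.sin(phi) / omega

for phi in (0.3, 1.1, 2.9):
    dF1 = (F1(phi + 1e-6) - F1(phi - 1e-6)) / 2e-6   # numerical derivative
    assert abs(omega * dF1 + H1(phi)) < 1e-6
```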
We see that there may well be a problem in finding new action variables if there is a relation among the frequencies. If the unperturbed system is not degenerate, "most" invariant tori will have no relation among the frequencies. For these values, the extension of the procedure we have described to a full power series expansion in $\epsilon$ may be able to generate new action-angle variables, showing that the system is still integrable. That this is true for sufficiently small perturbations and "sufficiently irrational" $\omega_j^{(0)}$ is the conclusion of the famous KAM theorem.

What happens if there is a relation among the frequencies? Consider a two degree of freedom system with $p\omega_1^{(0)} + q\omega_2^{(0)} = 0$, with $p$ and $q$ relatively prime. Then the Euclidean algorithm shows us there are integers $m$ and $n$ such that $pm + qn = 1$. Instead of our initial variables $\phi_i^{(0)} \in [0, 2\pi]$ to describe the torus, we can use the linear combinations
$$\begin{pmatrix}\psi_1\\ \psi_2\end{pmatrix} = \begin{pmatrix}p & q\\ n & -m\end{pmatrix}\begin{pmatrix}\phi_1^{(0)}\\ \phi_2^{(0)}\end{pmatrix} = B\cdot\begin{pmatrix}\phi_1^{(0)}\\ \phi_2^{(0)}\end{pmatrix}.$$
Then $\psi_1$ and $\psi_2$ are equally good choices for the angle variables of the unperturbed system, as $\psi_i \in [0, 2\pi]$ is a good coordinate system on the torus. The corresponding action variables are $I_i = (B^{-1})_{ji}I_j$, and the corresponding new frequencies are
$$\omega_i = \frac{\partial H}{\partial I_i} = \sum_j\frac{\partial H}{\partial I_j}\frac{\partial I_j}{\partial I_i} = \sum_j B_{ij}\omega_j^{(0)},$$
and so in particular $\omega_1 = p\omega_1^{(0)} + q\omega_2^{(0)} = 0$ on the chosen invariant torus. This conclusion is also obvious from the equations of motion $\dot\phi_i = \omega_i$. In the unperturbed problem, on our initial invariant torus, $\psi_1$ is a constant of the motion, so in the perturbed system we might expect it to vary slowly with respect to $\psi_2$. Then it is appropriate to use the adiabatic approximation.

7.2.1 Time Dependent Perturbation Theory

Consider a problem for which the Hamiltonian is approximately that of an exactly solvable problem. For example, let's take the pendulum,
$$L = \frac{1}{2}m\ell^2\dot\theta^2 - mg\ell(1 - \cos\theta), \qquad p_\theta = m\ell^2\dot\theta,$$
$$H = \frac{p_\theta^2}{2m\ell^2} + mg\ell(1 - \cos\theta) \approx \frac{p_\theta^2}{2m\ell^2} + \frac{1}{2}mg\ell\,\theta^2,$$
which is approximately given by a harmonic oscillator if the excursions are not too big. More generally
$$H(q, p, t) = H_0(q, p, t) + \epsilon H_I(q, p, t),$$
where $\epsilon H_I(q, p, t)$ is considered a small "interaction" Hamiltonian. We assume we know Hamilton's principal function $S_0(q, P, t)$ for the unperturbed problem, which gives a canonical transformation $(q, p) \to (Q, P)$, and in the limit $\epsilon \to 0$, $\dot Q = \dot P = 0$. For the full problem,
$$K(Q, P, t) = H_0 + \epsilon H_I + \frac{\partial S_0}{\partial t} = \epsilon H_I,$$
and is small. Expressing $H_I$ in terms of the new variables $(Q, P)$, we have that
$$\dot Q = \epsilon\frac{\partial H_I}{\partial P}, \qquad \dot P = -\epsilon\frac{\partial H_I}{\partial Q},$$
and these are slowly varying because $\epsilon$ is small. In symplectic form, with $\zeta^T = (Q, P)$, we have, of course,
$$\dot\zeta = \epsilon J\cdot\nabla H_I(\zeta). \qquad (7.7)$$
This differential equation can be solved perturbatively. If we assume an expansion
$$\zeta(t) = \zeta_0(t) + \epsilon\zeta_1(t) + \epsilon^2\zeta_2(t) + \ldots,$$
then $\dot\zeta_n$ on the left of (7.7) can be determined from only lower order terms in $\zeta$ on the right hand side, so we can recursively find higher and higher order terms in $\epsilon$. This is a good expansion for small $\epsilon$ for fixed $t$, but as we are making an error of some order, say $\epsilon^m$, in $\dot\zeta$, this is $O(\epsilon^m t)$ for $\zeta(t)$. Thus for calculating the long time behavior of the motion, this method is unlikely to work in the sense that any finite order calculation cannot be expected to be good for $t \to \infty$. Even though $H$ and $H_0$ differ only slightly, and so acting on any given $\eta$ they will produce only slightly different rates of change, as time goes on there is nothing to prevent these differences from building up. In a periodic motion, for example, the perturbation is likely to make a change $\Delta\tau$ of order $\epsilon$ in the period $\tau$ of the motion, so at a time $t \sim \tau^2/2\Delta\tau$ later, the systems will be at opposite sides of their orbits, not close together at all.
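This secular growth of the error is easy to see numerically (a Python sketch, not part of the text): integrate the pendulum and its harmonic approximation from the same initial condition, with $m$, $g$, and $\ell$ set to 1; the amplitude-dependent frequency shift makes the difference between the two solutions grow with time.

```python
import math

# Illustrative sketch: pendulum (force = -sin(theta)) versus its harmonic
# approximation (force = -theta). The frequency shift makes the two
# solutions drift apart, so the late-time deviation exceeds the early one.
def evolve(force, theta0, t_end, dt=0.001):
    """Leapfrog integration; returns the angle sampled at each step."""
    th, p, out = theta0, 0.0, []
    for _ in range(int(t_end / dt)):
        p += 0.5 * dt * force(th)
        th += dt * p
        p += 0.5 * dt * force(th)
        out.append(th)
    return out

theta0 = 0.5
pend = evolve(lambda th: -math.sin(th), theta0, 100.0)
harm = evolve(lambda th: -th, theta0, 100.0)
diff = [abs(a - b) for a, b in zip(pend, harm)]
early = max(diff[5000:10000])     # t in [5, 10)
late = max(diff[95000:100000])    # t in [95, 100)
assert early < late
```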
7.3 Adiabatic Invariants

7.3.1 Introduction

We are going to discuss the evolution of a system which is, at every instant, given by an integrable Hamiltonian, but for which the parameters of that Hamiltonian are slowly varying functions of time. We will find that this leads to an approximation in which the actions are time invariant. We begin with a qualitative discussion, and then we discuss a formal perturbative expansion.

First we will consider a system with one degree of freedom described by a Hamiltonian $H(q, p, t)$ which has a slow time dependence. Let us call $T_V$ the time scale over which the Hamiltonian has significant variation (for fixed $q, p$). For a short time interval $\ll T_V$, such a system could be approximated by the Hamiltonian $H_0(q, p) = H(q, p, t_0)$, where $t_0$ is a fixed time within that interval. Any perturbative solution based on this approximation may be good during this time interval, but if extended to times comparable to the time scale $T_V$ over which $H(q, p, t)$ varies, the perturbative solution will break down. We wish to show, however, that if the motion is bound and the period of the motion determined by $H_0$ is much less than the time scale of variations $T_V$, the action is very nearly conserved, even for evolution over a time interval comparable to $T_V$. We say that the action is an adiabatic invariant.

7.3.2 For a time-independent Hamiltonian

In the absence of any explicit time dependence, a Hamiltonian is conserved. The motion is restricted to lie on a particular contour $H(q, p) = \alpha$, for all times. For bound solutions to the equations of motion, the solutions are periodic closed orbits in phase space. We will call this contour $\Gamma$, and the period of the motion $\tau$. Let us parameterize the contour with the action-angle variable $\phi$. We take an arbitrary point on $\Gamma$ to be $\phi = 0$ and also $(q(0), p(0))$.
Every other point is determined by $\Gamma(\phi) = (q(\phi\tau/2\pi), p(\phi\tau/2\pi))$, so the complete orbit is given by $\Gamma(\phi)$, $\phi \in [0, 2\pi)$. The action is defined as
$$J = \frac{1}{2\pi}\oint p\,dq. \qquad (7.8)$$
This may be considered as an integral along one cycle in extended phase space, $2\pi J(t) = \int_t^{t+\tau}p(t')\,\dot q(t')\,dt'$. Because $p(t)$ and $q(t)$ are periodic with period $\tau$, $J$ is independent of time $t$. But $J$ can also be thought of as an integral in phase space itself, $2\pi J = \oint_\Gamma p\,dq$, of a one form $\omega_1 = p\,dq$ along the closed path $\Gamma(\phi)$, $\phi \in [0, 2\pi]$, which is the orbit in question. By Stokes' Theorem,
$$\int_S d\omega = \int_{\delta S}\omega,$$
true for any $n$-form $\omega$ and region $S$ of a manifold, we have $2\pi J = \int_A dp\wedge dq$, where $A$ is the area bounded by $\Gamma$.

[Fig. 1: The orbit of an autonomous system in phase space.]

In extended phase space $\{q, p, t\}$, if we start at time $t = 0$ with any point $(q, p)$ on $\Gamma$, the trajectory swept out by the equations of motion, $(q(t), p(t), t)$, will lie on the surface of a cylinder with base $A$ extended in the time direction. Let $\Gamma_t$ be the embedding of $\Gamma$ into the time slice at $t$, which is the intersection of the cylinder with that time slice. The surface of the cylinder can also be viewed as the set of all the dynamical trajectories which start on $\Gamma$ at $t = 0$. In other words, if $T_\phi(t)$ is the trajectory of the system which starts at $\Gamma(\phi)$ at $t = 0$, the set of $T_\phi(t)$ for $\phi \in [0, 2\pi]$, $t \in [0, T]$, sweeps out the same surface as $\Gamma_t$, $t \in [0, T]$. Because this is an autonomous system, the value of the action $J$ is the same, regardless of whether it is evaluated along $\Gamma_t$, for any $t$, or evaluated along one period for any of the trajectories starting on $\Gamma_0$. If we terminate the evolution at time $T$, the end of the cylinder, $\Gamma_T$, is the same orbit of the motion, in phase space, as was $\Gamma_0$.

[Fig. 2: The surface in extended phase space, generated by the ensemble of systems which start at time $t = 0$ on the orbit $\Gamma$ shown in Fig. 1. One such trajectory is shown, labelled $\mathcal I$, and also shown is one of the $\Gamma_t$.]
7.3.3 Slow time variation in H(q, p, t)

Now consider a time dependent Hamiltonian $H(q, p, t)$. For a short interval of time near $t_0$, if we assume the time variation of $H$ is slowly varying, the autonomous Hamiltonian $H(q, p, t_0)$ will provide an approximation, one that has conserved energy and bound orbits given by contours of that energy.

Consider extended phase space, and a closed path $\Gamma_0(\phi)$ in the $t = 0$ plane which is a contour of $H(q, p, 0)$, just as we had in the time-independent case. For each point $\phi$ on this path, construct the trajectory $T_\phi(t)$ evolving from $\Gamma(\phi)$ under the influence of the full Hamiltonian $H(q, p, t)$, up until some fixed final time $t = T$. This collection of trajectories will sweep out a curved surface $\Sigma_1$ with boundary $\Gamma_0$ at $t = 0$ and another we call $\Gamma_T$ at time $t = T$. Because the Hamiltonian does change with time, these $\Gamma_t$, the intersections of $\Sigma_1$ with the planes at various times $t$, are not congruent. Let $\Sigma_0$ and $\Sigma_T$ be the regions of the $t = 0$ and $t = T$ planes bounded by $\Gamma_0$ and $\Gamma_T$ respectively, oriented so that their normals go forward in time.

[Fig. 3: The motion of a harmonic oscillator with time-varying spring constant $k \propto (1 - \epsilon t)^4$, with $\epsilon = 0.01$. Note that the horn is not tipping downwards; the surface ends flat against the $t = 65$ plane.]

This constructs a region which is a deformation of the cylinder¹ that we had in the case where $H$ was independent of time. If the variation of $H$ is slow on a time scale of $T$, the path $\Gamma_T$ will not differ much from $\Gamma_0$, so it will be nearly an orbit and the action defined by $\oint p\,dq$ around $\Gamma_T$ will be nearly that around $\Gamma_0$. We shall show something much stronger: that if the time dependence of $H$ is a slow variation compared with the approximate period of the motion, then each $\Gamma_t$ is nearly an orbit and the action on that path, $\tilde J(t) = \oint_{\Gamma_t}p\,dq$, is constant, even if the Hamiltonian varies considerably over time $T$.
¹ Of course it is possible that after some time, which must be on a time scale of order $T_V$ rather than the much shorter cycle time $\tau$, the trajectories might intersect, which would require the system to reach a critical point in phase space. We assume that our final time $T$ is before the system reaches a critical point.
The $\Sigma$'s form a closed surface, which is $\Sigma_1 + \Sigma_T - \Sigma_0$, where we have taken the orientation of $\Sigma_1$ to point outwards, and made up for the inward-pointing direction of $\Sigma_0$ with a negative sign. Call the volume enclosed by this closed surface $V$.

We will first show that the actions $\tilde J(0)$ and $\tilde J(T)$ defined on the ends of the cylinder are the same. Again from Stokes' theorem, they are
$$\tilde J(0) = \oint_{\Gamma_0}p\,dq = \int_{\Sigma_0}dp\wedge dq \qquad\text{and}\qquad \tilde J(T) = \int_{\Sigma_T}dp\wedge dq$$
respectively. Each of these surfaces has no component in the $t$ direction, so we may also evaluate $\tilde J(t) = \int_{\Sigma_t}\omega_2$, where
$$\omega_2 = dp\wedge dq - dH\wedge dt. \qquad (7.9)$$
Clearly $\omega_2$ is closed, $d\omega_2 = 0$, as $\omega_2$ is a sum of wedge products of closed forms. As $H$ is a function on extended phase space, $dH = \frac{\partial H}{\partial p}dp + \frac{\partial H}{\partial q}dq + \frac{\partial H}{\partial t}dt$, and thus
$$\omega_2 = dp\wedge dq - \frac{\partial H}{\partial p}\,dp\wedge dt - \frac{\partial H}{\partial q}\,dq\wedge dt = \left(dp + \frac{\partial H}{\partial q}\,dt\right)\wedge\left(dq - \frac{\partial H}{\partial p}\,dt\right), \qquad (7.10)$$
where we have used the antisymmetry of the wedge product, $dq\wedge dt = -dt\wedge dq$, and $dt\wedge dt = 0$. Now the interesting thing about this rewriting of the action in terms of the new form (7.10) of $\omega_2$ is that $\omega_2$ is now a product of two 1-forms,
$$\omega_2 = \omega_a\wedge\omega_b, \qquad \omega_a = dp + \frac{\partial H}{\partial q}\,dt, \qquad \omega_b = dq - \frac{\partial H}{\partial p}\,dt,$$
and each of $\omega_a$ and $\omega_b$ vanishes along any trajectory of the motion, along which Hamilton's equations require
$$\frac{dp}{dt} = -\frac{\partial H}{\partial q}, \qquad \frac{dq}{dt} = \frac{\partial H}{\partial p}.$$
CHAPTER 7. PERTURBATION THEORY

As a consequence, $\omega_2$ vanishes at any point when evaluated on a surface which contains a physical trajectory, so in particular $\omega_2$ vanishes over the surface $\Sigma_1$ generated by the trajectories. Because $\omega_2$ is closed,
\[ \int_{\Sigma_1+\Sigma_T-\Sigma_0}\omega_2=\int_V d\omega_2=0, \]
where the first equality is due to Gauss' law, one form of the generalized Stokes' theorem. Then we have
\[ \tilde J(T)=\int_{\Sigma_T}\omega_2=\int_{\Sigma_0}\omega_2=\tilde J(0). \]

What we have shown here for the area in phase space enclosed by an orbit holds equally well for any area in phase space. If $A$ is a region in phase space, and if we define $B$ as that region in phase space in which systems will lie at time $t=T$ if the system was in $A$ at time $t=0$, then $\int_A dp\wedge dq=\int_B dp\wedge dq$. For systems with $n>1$ degrees of freedom, we may consider a set of $n$ forms $(dp\wedge dq)^j$, $j=1\ldots n$, which are all conserved under dynamical evolution. In particular, $(dp\wedge dq)^n$ tells us the hypervolume in phase space is preserved under evolution according to Hamilton's equations of motion. This is known as Liouville's theorem, and the $n$ invariants $(dp\wedge dq)^j$ are known as Poincaré invariants.

While we have shown that the integral $\oint p\,dq$ is conserved when evaluated over an initial contour in phase space at time $t=0$, and then compared to its integral over the path at time $t=T$ given by the time evolution of the ensembles which started on the first path, neither of these integrals is exactly an action. In fact, for a time-varying system the action is not really well defined, because actions are defined only for periodic motion. For the one dimensional harmonic oscillator (with varying spring constant) of Fig. 3, a reasonable substitute definition is to define $J$ for each "period" from one passing to the right through the symmetry point, $q=0$, to the next such crossing. [Fig. 4: the trajectory in phase space of the system in Fig. 3. The "actions" during two "orbits" are shown by shading; in the adiabatic approximation the areas are equal.]
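The area-preservation statement just proved can be checked numerically. The sketch below is our illustration, not the author's code; the pendulum Hamiltonian $H=p^2/2-\cos q$ and all parameters are arbitrary choices. A closed curve of initial conditions is evolved with a symplectic (leapfrog) integrator, each step of which is an exactly area-preserving map, and the enclosed phase-space area is measured with the shoelace formula.

```python
# Numerical check of Liouville's theorem: evolve a closed curve of initial
# conditions for the pendulum H = p^2/2 - cos q and verify the enclosed
# phase-space area is preserved.  (Illustrative sketch; assumed parameters.)
import numpy as np

def leapfrog(q, p, dt, steps):
    """Leapfrog integration of dq/dt = p, dp/dt = -sin(q); each step is a
    symplectic, hence area-preserving, map acting on every vertex at once."""
    p = p - 0.5 * dt * np.sin(q)              # initial half kick
    for _ in range(steps - 1):
        q = q + dt * p                        # drift
        p = p - dt * np.sin(q)                # full kick
    q = q + dt * p
    p = p - 0.5 * dt * np.sin(q)              # final half kick
    return q, p

def shoelace(q, p):
    """Area enclosed by the closed polygon with vertices (q_i, p_i)."""
    return 0.5 * abs(np.dot(q, np.roll(p, -1)) - np.dot(p, np.roll(q, -1)))

theta = np.linspace(0.0, 2 * np.pi, 2000, endpoint=False)
q0 = 1.0 + 0.5 * np.cos(theta)                # circle of radius 0.5 in phase space
p0 = 0.5 * np.sin(theta)
area0 = shoelace(q0, p0)                      # ~ pi * 0.25, up to polygon error

qT, pT = leapfrog(q0, p0, dt=0.001, steps=5000)   # evolve the curve to t = 5
areaT = shoelace(qT, pT)
```

Although the evolved curve is visibly sheared, its enclosed area matches the initial one to the accuracy of the polygon approximation.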
The trajectory of a single such system as it moves through phase space is shown in Fig. 4. The integrals $\oint p(t)\,dq(t)$ over time intervals between successive forward crossings of $q=0$ are shown for the first and last such intervals. While these appear to have roughly the same area, what we have shown is that the integrals over the curves $\Gamma_t$ are the same. In Fig. 5 we show $\Gamma_t$ for $t$ at the beginning of the first and fifth "periods", together with the actual motion through those periods. The deviations are of order $\epsilon\tau$ and not of order $\epsilon T$, and so are negligible as long as the approximate period $\tau$ is small compared to $T_V\sim1/\epsilon$. [Fig. 5: the differences between the actual trajectories (thick lines) during the first and fifth oscillations, and the ensembles $\Gamma_t$ at the moments of the beginnings of those periods. The areas enclosed by the latter two curves are strictly equal, as we have shown; the figure indicates the differences between each of those curves and the actual trajectories.]

Another way we can define an action in our time-varying problem is to write an expression for the action on extended phase space, $J(q,p,t_0)$, given by the action at that value of $(q,p)$ for a system with Hamiltonian fixed at the time in question, $H_{t_0}(q,p):=H(q,p,t_0)$. This is an ordinary harmonic oscillator with $\omega=\sqrt{k(t_0)/m}$. For an autonomous harmonic oscillator the area of the elliptical orbit is
\[ 2\pi J=\pi p_{\max}q_{\max}=\pi m\omega q_{\max}^2, \]
while the energy is
\[ \frac{p^2}{2m}+\frac{m\omega^2}{2}q^2=E=\frac{m\omega^2}{2}q_{\max}^2, \]
so we can write an expression for the action as a function on extended phase space,
\[ J=\frac12 m\omega q_{\max}^2=E/\omega=\frac{p^2}{2m\omega(t)}+\frac{m\omega(t)}{2}q^2. \]
With this definition, we can assign a value of the action to the system at each time, which in the autonomous case agrees with the standard action.
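This extended-phase-space action $J=E/\omega$ can be tracked numerically. The sketch below is our code, not the author's; it anticipates the example discussed next, an oscillator whose spring constant weakens slowly, $k(t)=k_0(1-\epsilon t)^4$ with $\epsilon=0.01$ in units of the initial $\omega$. The energy and frequency fall by nearly an order of magnitude, while $J$ stays within a few tens of percent of its initial value.

```python
# Sketch: harmonic oscillator with slowly weakening spring k(t) = k0(1-eps*t)^4.
# E and omega both fall roughly eightfold, but the action J = E/omega is
# nearly conserved.  (Our parameters, chosen to match the example in the text.)
import numpy as np

k0, eps = 1.0, 0.01
k = lambda t: k0 * (1.0 - eps * t) ** 4
omega = lambda t: np.sqrt(k(t))

def rk4_step(t, q, p, dt):
    def f(t, q, p):                        # dq/dt = p, dp/dt = -k(t) q  (m = 1)
        return p, -k(t) * q
    k1q, k1p = f(t, q, p)
    k2q, k2p = f(t + dt/2, q + dt/2*k1q, p + dt/2*k1p)
    k3q, k3p = f(t + dt/2, q + dt/2*k2q, p + dt/2*k2p)
    k4q, k4p = f(t + dt, q + dt*k3q, p + dt*k3p)
    return (q + dt/6*(k1q + 2*k2q + 2*k3q + k4q),
            p + dt/6*(k1p + 2*k2p + 2*k3p + k4p))

q, p, t, dt = 1.0, 0.0, 0.0, 0.001
E0 = 0.5 * p**2 + 0.5 * k(0.0) * q**2
J0 = E0 / omega(0.0)                       # J = E/omega for the oscillator
for _ in range(65000):                     # integrate to t = 65
    q, p = rk4_step(t, q, p, dt)
    t += dt

E_T = 0.5 * p**2 + 0.5 * k(t) * q**2
J_T = E_T / omega(t)
```

By $t=65$ the adiabatic approximation is badly strained (the spring constant changes by a large factor during the last oscillation), so $J$ is no longer conserved to order $\epsilon$, yet it remains far more nearly constant than $E$ or $\omega$.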
From this discussion, we see that if the Hamiltonian varies slowly on the time scale of an oscillation of the system, the action will remain fairly close to $\tilde J_t$, which is conserved. Thus the action is an adiabatic invariant, conserved in the limit that $\tau/T_V\to0$.

To see how this works in a particular example, consider the harmonic oscillator with a time-varying spring constant, which we have chosen to be $k(t)=k_0(1-\epsilon t)^4$. With $\epsilon=0.01$, in units given by the initial $\omega$, the evolution is shown from time 0 to time 65. During this time the spring constant becomes over 66 times weaker, and the natural frequency decreases by a factor of more than eight, as does the energy, but the action remains quite close to its original value, even though the adiabatic approximation is clearly badly violated by a spring constant which changes by a factor of more than six during the last oscillation. [Fig. 6: the change in angular frequency, energy, and action for the time-varying spring-constant harmonic oscillator, with $k(t)\propto(1-\epsilon t)^4$ and $\epsilon=\omega(0)/100$.]

We see that the failure of the action to be exactly conserved is due to the discrepancy between the action evaluated on the actual path of a single system and the action evaluated on the curve representing the evolution, after a given time, of an ensemble of systems all of which began at time $t=0$ on a path in phase space which would have been their paths had the system been autonomous.

This might tempt us to consider a different problem, in which the time dependence of the Hamiltonian varies only during a fixed time interval, $t\in[0,T]$, but is constant before $t=0$ and after $T$. If we look at the motion during an oscillation before $t=0$, the system's trajectory
projects exactly onto $\Gamma_0$, so the initial action $J=\tilde J(0)$. If we consider a full oscillation beginning after time $T$, the actual trajectory is again a contour of energy in phase space. Does this mean the action is exactly conserved?

There must be something wrong with this argument, because the constancy of $\tilde J(t)$ did not depend on assumptions of slow variation of the Hamiltonian. Thus it should apply to the pumped swing, and claim that it is impossible to increase the energy of the oscillation by periodic changes in the spring constant. But we know that is not correct. Examining this case will point out the flawed assumption in the argument. In Fig. 7, we show the surface generated by time evolution of an ensemble of systems initially on an energy contour for a harmonic oscillator. Starting at time 0, the spring constant is modulated by 10% at a frequency twice the natural frequency, for four natural periods. Thereafter the Hamiltonian is the same as it was before $t=0$, and each system's path in phase space continues as a circle (in the units shown), but the ensemble of systems forms a very elongated figure, rather than a circle. [Fig. 7: the surface $\Sigma_1$ for a harmonic oscillator with a spring constant which varies over the interval $t\in[0,8\pi]$ as $k(t)=k(0)(1+0.1\sin 2t)$.]

What has happened is that some of the systems in the ensemble have gained energy from the pumping of the spring constant, while others have lost energy. Thus there has been no conservation of the action for individual systems, but rather there is some (vaguely understood) average action which is unchanged. Thus we see what is physically the crucial point in the adiabatic expansion: if all the systems in the ensemble experience the perturbation in the same way, because the time variation of the Hamiltonian is slow compared to the time it takes for each system in the ensemble to occupy the initial position (in phase space) of every other system, then each system will have its action conserved.

7.3.4 Systems with Many Degrees of Freedom

In the discussion above we considered as our starting point an autonomous system with one degree of freedom. As the Hamiltonian is a conserved function on phase space, this is an integrable system. For systems with $n>1$ degrees of freedom, we wish to again start with an integrable system. Such systems have $n$ invariant "integrals of the motion in involution", and their phase space can be described in terms of $n$ action variables $J_i$ and corresponding coordinates $\phi_i$. Phase space is periodic in each of the $\phi_i$ with period $2\pi$, and the submanifold $M_f$ of phase space which has a given set $\{f_i\}$ of values for the $J_i$ is an $n$-dimensional torus. As the $J_i$ are conserved, the motion is confined to $M_f$, and indeed the equations of motion are very simple, $d\phi_i/dt=\nu_i$ (constant). $M_f$ is known as an invariant torus.

In the one variable case we related the action to the 1-form $p\,dq$. On the invariant torus, the actions are constants and so it is trivially true that $J_i=\oint J_i\,d\phi_i/2\pi$, where the integral is $\int_0^{2\pi}d\phi_i$ with the other $\phi$'s held fixed. This might lead one to think about $n$ 1-forms without a sum, but it is more profitable to recognize that the single 1-form $\omega_1=\sum_i J_i\,d\phi_i$ alone contains all of the information we need. [Fig. 8: for an integrable system with two degrees of freedom, the motion is confined to a 2-torus, and the trajectories are uniform motion in each of the angles, with independent frequencies. The two actions $J_1$ and $J_2$ may be considered as integrals of the single 1-form $\omega_1=\sum_i J_i\,d\phi_i$ over two independent cycles $\Gamma_1$ and $\Gamma_2$ as shown.] First note
that, restricted to Mf , dJi vanishes, so ω1 is closed, and its integral is a topological invariant, that is, un- changed under continuous deformations of the path. We can take a set
of paths, or cycles, $\Gamma_i$, each winding around the torus only in the $\phi_i$ direction, and we then have $J_i=\frac{1}{2\pi}\oint_{\Gamma_i}\omega_1$. The answer is completely independent of where the path $\Gamma_i$ is drawn on $M_f$, as long as its topology is unchanged. Thus the action can be thought of as a function on the simplicial homology $H_1$ of $M_f$. The actions can also be expressed as an integral over a surface $\Sigma_i$ bounded by the $\Gamma_i$, $J_i=\frac{1}{2\pi}\int_{\Sigma_i}\sum_j dJ_j\wedge d\phi_j$. Notice that this surface does not lie on the invariant torus but cuts across it. This formulation has two advantages. First, $\sum dp_i\wedge dq_i$ is invariant under arbitrary canonical transformations, so $\sum dJ_i\wedge d\phi_i$ is just one way to write it. Secondly, on a surface of constant $t$, such as $\Sigma_i$, it is identical to the fundamental form
\[ \omega_2=\sum_{i=1}^n dp_i\wedge dq_i-dH\wedge dt, \]
the generalization to several degrees of freedom of the form we used to show the invariance of the integral under time evolution in the single-degree-of-freedom case.

Now suppose that our system is subject to some time-dependent perturbation, but that at all times its Hamiltonian remains close to an integrable system, though that system might have parameters which vary with time. Let's also assume that after time $T$ the Hamiltonian again becomes an autonomous integrable system, though perhaps with parameters different from what it had at $t=0$.
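As a concrete illustration of actions on an invariant torus (our example; the text does not work this case here), take two uncoupled harmonic oscillators:

```latex
H=\sum_{i=1}^{2}\left(\frac{p_i^2}{2m}+\frac{m\omega_i^2q_i^2}{2}\right),
\qquad
J_i=\frac{1}{2\pi}\oint_{\Gamma_i}\omega_1
   =\frac{1}{2\pi}\oint_{\Gamma_i}p_i\,dq_i
   =\frac{E_i}{\omega_i}.
```

Here $\Gamma_i$ winds once around the torus in the $\phi_i$ direction, i.e. it is the ellipse traversed in the $(q_i,p_i)$ plane with the other oscillator held fixed; $M_f$ is the product of the two ellipses, and the angle variables advance uniformly, $\phi_i=\omega_i t+\phi_{i0}$, so the frequencies are $\nu_i=\omega_i$.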
Consider the evolution in time, under the full Hamiltonian, of each system which at $t=0$ was at some point $\vec\phi_0$ on the invariant torus $M_f$ of the original unperturbed system. Follow each such system until time $T$. We assume that none of these systems reaches a critical point during this evolution. The region in phase space thus varies continuously, and at the fixed later time $T$, it still will be topologically an $n$-torus, which we will call $B$. The image of each of the cycles $\Gamma_i$ will be a cycle $\tilde\Gamma_i$ on $B$, and together these images will be a basis of the homology $H_1$ of $B$. Let $\tilde\Sigma_i$ be surfaces within the $t=T$ hyperplane bounded by $\tilde\Gamma_i$. Define $\tilde J_i$ to be the integral on $\tilde\Sigma_i$ of $\omega_2$, so $\tilde J_i=\frac{1}{2\pi}\int_{\tilde\Sigma_i}\sum_j dp_j\wedge dq_j$, where we can drop the $dH\wedge dt$ term on a constant-$t$ surface, as $dt=0$. We can now repeat the argument from the one-degree-of-freedom case to show that the integrals $\tilde J_i=J_i$, again because $\omega_2$ is a closed 2-form which vanishes on the surface of evolution, so that its integrals on the end-caps are the same. [Fig. 9: time evolution of the invariant torus, and of two of the cycles on it.]

Now we have assumed that the system is again integrable at $t=T$, so there are new actions $J_i'$, and new invariant tori $M_g=\{(q,p)\mid J_i'(q,p)=g_i\}$. Each initial system which started at $\vec\phi_0$ winds up on some new invariant torus with $g(\vec\phi_0)$. If the variation of the Hamiltonian is sufficiently slow and smoothly varying on phase space, and if the unperturbed motion is sufficiently ergodic that each system samples the full invariant torus on a time scale short compared to the variation time of the Hamiltonian, then each initial system $\vec\phi_0$ may be expected to wind up with the same values of the perturbed actions, so $g$ is independent of $\vec\phi_0$. That means that the torus $B$ is, to some good approximation, one of the invariant tori $M_g$,
that the cycles of $B$ are cycles of $M_g$, and therefore that $J_i'=\tilde J_i=J_i$, and each of the actions is an adiabatic invariant.

7.3.5 Formal Perturbative Treatment

Consider a system based on a Hamiltonian $H(q,p,\lambda)$, where $\lambda$ is a set of parameters, which is integrable for each constant value of $\lambda$ within some domain of interest. Now suppose our "real" system is described by the same Hamiltonian, but with $\lambda(t)$ a given slowly varying function of time. Although the full Hamiltonian is not invariant, we will show that the action variables are approximately so.

For each fixed value of $\lambda$, there is a generating function of type 1 to the corresponding action-angle variables:
\[ F_1(q,\phi,\lambda):(q,p)\to(\phi,I). \]
This is a time-independent transformation, so the Hamiltonian may be written as $H(I(q,p),\lambda)$, independent of the angle variable. This constant-$\lambda$ Hamiltonian has equations of motion $\dot\phi_i=\partial H/\partial I_i=\omega_i(\lambda)$, $\dot I_i=0$. But in the case where $\lambda$ is a function of time, the transformation $F_1$ is not a time-independent one, so the correct Hamiltonian is not just the reexpressed Hamiltonian but has an additional term,
\[ K(\phi,I,\lambda)=H(I,\lambda)+\sum_n\frac{\partial F_1}{\partial\lambda_n}\frac{d\lambda_n}{dt}, \]
where the second term is the expansion of $\partial F_1/\partial t$ by the chain rule. The equations of motion involve differentiating $K$ with respect to one of the variables $(\phi_j,I_j)$ holding the others, and time, fixed. While these are not the usual variables $(q,\phi)$ for $F_1$, they are coordinates of phase space, so $F_1$ can be expressed in terms of $(\phi_j,I_j)$, and as shown in (??), it is periodic in the $\phi_j$. The equations of motion are then
\[ \dot\phi_i=\omega_i(\lambda)+\sum_n\dot\lambda_n\frac{\partial^2F_1}{\partial\lambda_n\,\partial I_i},\qquad
\dot I_i=-\sum_n\dot\lambda_n\frac{\partial^2F_1}{\partial\lambda_n\,\partial\phi_i}, \]
where all the partial derivatives are with respect to the variables $\phi$, $I$, $\lambda$. We first note that if the parameters $\lambda$ are slowly varying, the $\dot\lambda_n$'s in the equations of motion make the deviations from the unperturbed system small, of first order in $\epsilon/\tau=\dot\lambda/\lambda$, where $\tau$ is a typical time for oscillation of the system. But in fact the constancy of the action is better than that, because the expression for $\dot I_j$ is predominantly an oscillatory term with zero mean. This is most easily analyzed when the unperturbed system is truly periodic, with period $\tau$. Then during one period $t\in[0,\tau]$, $\lambda(t)\approx\lambda(0)+t\dot\lambda$. Assuming $\lambda(t)$ varies smoothly on a time scale $\tau/\epsilon$, $\ddot\lambda\sim\lambda\,O(\epsilon^2/\tau^2)$, so if we are willing to drop terms of order $\epsilon^2$, we may treat $\dot\lambda$ as a constant. We can then also evaluate $F_1$ on the orbit of the unperturbed system, as that differs from the true orbit by order $\epsilon$, and the resulting value is multiplied by $\dot\lambda$, which is already of order $\epsilon/\tau$, and the result is to be integrated over a period $\tau$. Then we may write the change of $I_j$ over one period as
\[ \Delta I_j\approx-\sum_n\int_0^\tau\dot\lambda_n\frac{\partial}{\partial\phi_j}\frac{\partial F_1}{\partial\lambda_n}\,dt. \]
But $F_1$ is a well defined single-valued function on the invariant manifold, and so are its derivatives with respect to $\lambda_n$, so we may replace the time integral by an integral over the orbit,
\[ \Delta I_j\approx-\sum_n\dot\lambda_n\frac{\tau}{L}\oint\frac{\partial}{\partial\phi_j}\frac{\partial F_1}{\partial\lambda_n}\,d\phi_j=0, \]
where $L$ is the length of the orbit, and we have used the fact that for the unperturbed system $d\phi_j/dt$ is constant.

Thus the action variables have oscillations of order $\epsilon$, but these variations do not grow with time. Over a time $t$, $\Delta I=O(\epsilon)+t\,O(\epsilon^2/\tau)$, and is therefore conserved up to order $\epsilon$ even for times as large as $\tau/\epsilon$, corresponding to many natural periods, and also corresponding to the time scale on which the Hamiltonian is varying significantly.
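This conclusion can be tested on a classic example, a pendulum whose length grows slowly. The sketch below is ours: we assume the linearized equation $l\ddot\theta+2\dot l\dot\theta+g\theta=0$ (from $\frac{d}{dt}(ml^2\dot\theta)=-mgl\theta$ for small angles) with $l(t)=l_0(1+\epsilon t)$, and track the adiabatic invariant $J=E/\omega$, where $\omega=\sqrt{g/l}$ and $E=\frac12ml^2\dot\theta^2+\frac12mgl\theta^2$.

```python
# Sketch (assumed parameters): pendulum of slowly growing length.
# The frequency omega = sqrt(g/l) drops by a factor sqrt(2) as l doubles,
# but J = E/omega stays nearly constant.
import numpy as np

g, eps, m = 9.8, 0.01, 1.0
l = lambda t: 1.0 + eps * t                 # slowly growing length, dl/dt = eps
omega = lambda t: np.sqrt(g / l(t))         # instantaneous frequency

def deriv(t, th, thd):
    """Linearized pendulum of varying length: l th'' + 2 l' th' + g th = 0."""
    return thd, -(g * th + 2.0 * eps * thd) / l(t)

def energy(t, th, thd):
    return 0.5 * m * l(t)**2 * thd**2 + 0.5 * m * g * l(t) * th**2

th, thd, t, dt = 0.1, 0.0, 0.0, 0.001
J0 = energy(t, th, thd) / omega(t)
for _ in range(100000):                     # integrate to t = 100 (RK4)
    k1a, k1b = deriv(t, th, thd)
    k2a, k2b = deriv(t + dt/2, th + dt/2*k1a, thd + dt/2*k1b)
    k3a, k3b = deriv(t + dt/2, th + dt/2*k2a, thd + dt/2*k2b)
    k4a, k4b = deriv(t + dt, th + dt*k3a, thd + dt*k3b)
    th += dt/6*(k1a + 2*k2a + 2*k3a + k4a)
    thd += dt/6*(k1b + 2*k2b + 2*k3b + k4b)
    t += dt
J_T = energy(t, th, thd) / omega(t)
```

Here $\dot l/l\omega\sim3\times10^{-3}$, so the invariant should hold to a fraction of a percent even though $\omega$ and $E$ change substantially.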
This form of perturbation, corresponding to variation of constants on a time scale slow compared to the natural frequencies of the unperturbed system, is known as an adiabatic variation, and a quantity conserved to order $\epsilon$ over times comparable to the variation itself is called an adiabatic invariant. Classic examples include ideal
gases in a slowly varying container, a pendulum of slowly varying length, and the motion of a rapidly moving charged particle in a strong but slowly varying magnetic field. It is interesting to note that in Bohr-Sommerfeld quantization in the old quantum mechanics, used before the Schrödinger equation clarified such issues, the quantization of bound states was related to quantization of the action. For example, in Bohr theory the electrons are in states with action $nh$, with $n$ a positive integer and $h$ Planck's constant. Because these values are preserved under adiabatic perturbation, it is possible that an adiabatic perturbation of a quantum mechanical system maintains the system in the initial quantum mechanical state, and indeed this can be shown, with the full quantum theory, to be the case in general. An important application is cooling by adiabatic demagnetization. Here atoms with a magnetic moment are placed in a strong magnetic field and reach equilibrium according to the Boltzmann distribution for their polarizations. If the magnetic field is adiabatically reduced, the separation energies of the various polarization states are reduced proportionally. As the distribution of polarization states remains the same for the adiabatic change, it now fits a Boltzmann distribution for a temperature reduced proportionally to the field, so the atoms have been cooled.

7.4 Rapidly Varying Perturbations

At the other extreme from adiabatic perturbations, we may ask what happens to a system if we add a perturbative potential which oscillates rapidly with respect to the natural frequencies of the unperturbed system. If the forces are of the same magnitude as those of the unperturbed system, we would expect that the coordinates and momenta would change little during the short time of one external oscillation, and that the effects of the force might be little more than adding jitter to the unperturbed motion.
Consider the case where the external force is a pure sinusoidal oscillation,
\[ H(q,p)=H_0(q,p)+U(q)\sin\omega t, \]
and let us write the resulting motion as
\[ q(t)=\bar q(t)+\xi(t), \]
\[ p(t)=\bar p(t)+\eta(t), \]
where we subtract out the average smoothly varying functions $\bar q$ and $\bar p$, leaving the rapidly oscillating pieces $\xi$ and $\eta$, which have natural time scales of $2\pi/\omega$. Thus $\ddot\xi$, $\omega\dot\xi$, $\omega^2\xi$, $\dot\eta$ and $\omega\eta$ should all remain finite as $\omega$ gets large with all the parameters of $H_0$ and $U(q)$ fixed. The equations of motion are
\[
\begin{aligned}
\dot{\bar q}_j+\dot\xi_j&=\frac{\partial H_0}{\partial p_j}(q,p)\\
&=\frac{\partial H_0}{\partial p_j}(\bar q,\bar p)
 +\sum_k\xi_k\frac{\partial^2H_0}{\partial p_j\partial q_k}(\bar q,\bar p)
 +\sum_k\eta_k\frac{\partial^2H_0}{\partial p_j\partial p_k}(\bar q,\bar p)\\
&\quad+\frac12\sum_{k\ell}\eta_k\eta_\ell\frac{\partial^3H_0}{\partial p_j\partial p_k\partial p_\ell}(\bar q,\bar p)+O(\omega^{-3}),\\
\dot{\bar p}_j+\dot\eta_j&=-\frac{\partial H_0}{\partial q_j}(q,p)-\frac{\partial U}{\partial q_j}\sin\omega t\\
&=-\frac{\partial H_0}{\partial q_j}(\bar q,\bar p)
 -\sum_k\xi_k\frac{\partial^2H_0}{\partial q_j\partial q_k}(\bar q,\bar p)
 -\sum_k\eta_k\frac{\partial^2H_0}{\partial q_j\partial p_k}(\bar q,\bar p)\\
&\quad-\frac12\sum_{k\ell}\eta_k\eta_\ell\frac{\partial^3H_0}{\partial q_j\partial p_k\partial p_\ell}(\bar q,\bar p)
 -\sin\omega t\,\frac{\partial U}{\partial q_j}(\bar q)
 -\sum_k\xi_k\sin\omega t\,\frac{\partial^2U}{\partial q_j\partial q_k}(\bar q)+O(\omega^{-3}).
\end{aligned}
\tag{7.11}
\]
[Footnote 2: The careful reader will note that the argument is not really valid, because we have variations in the coefficient of $\eta$ of order $\omega^{-1}$ and in the coefficient of $\sin\omega t$ of order $\omega^{-2}$. A valid argument concludes first that Eqs. (7.12) are correct through order $\omega^{-1}$, which is then enough to get Eqs. (7.13) to the order stated, and hence (7.14) and (7.15), with the assumption that any additional terms are rapidly oscillating. If we then average (7.11) over a period centered at $2\pi n/\omega$, the expressions which we claimed vanished do, except that the averages $\left\langle\dot\eta_j\right\rangle=-\left\langle\sin\omega t\,\partial U/\partial q_j\right\rangle$ do not, cancelling the inaccuracies of our argument.]
Averaging over one period, ignoring the changes in the slowly varying functions$^2$ of $\bar q$ and $\bar p$, making use of the assumption that the average
of $\xi$ and of $\eta$ vanish, and dropping terms of order $\omega^{-3}$, we have
\[
\begin{aligned}
\dot{\bar q}_j&=\frac{\partial H_0}{\partial p_j}(\bar q,\bar p)
 +\frac12\sum_{k\ell}\left\langle\eta_k\eta_\ell\right\rangle\frac{\partial^3H_0}{\partial p_j\partial p_k\partial p_\ell}(\bar q,\bar p),\\
\dot{\bar p}_j&=-\frac{\partial H_0}{\partial q_j}(\bar q,\bar p)
 -\sum_k\left\langle\xi_k\sin\omega t\right\rangle\frac{\partial^2U}{\partial q_j\partial q_k}(\bar q)
 -\frac12\sum_{k\ell}\left\langle\eta_k\eta_\ell\right\rangle\frac{\partial^3H_0}{\partial q_j\partial p_k\partial p_\ell}(\bar q,\bar p).
\end{aligned}
\tag{7.12}
\]
Plugging these equations back into (7.11) to evaluate $\dot\xi$ and $\dot\eta$ to lowest order gives
\[ \dot\xi_j=\sum_k\eta_k\frac{\partial^2H_0}{\partial\bar p_k\,\partial\bar p_j}+O(\omega^{-2}),\qquad
\dot\eta_j=-\sin\omega t\,\frac{\partial U}{\partial\bar q_j}+O(\omega^{-2}). \tag{7.13} \]
Integrating first,
\[ \eta_j(t)=\frac1\omega\cos\omega t\,\frac{\partial U(\bar q)}{\partial\bar q_j}
-\frac{1}{\omega^2}\sin\omega t\,\frac{\partial}{\partial t}\frac{\partial U}{\partial\bar q_j}+O(\omega^{-3}). \tag{7.14} \]
Then integrating for $\xi$ gives
\[ \xi_j(t)=\frac{1}{\omega^2}\sin\omega t\sum_k\frac{\partial U}{\partial\bar q_k}\frac{\partial^2H_0}{\partial\bar p_k\,\partial\bar p_j}+O(\omega^{-3}), \tag{7.15} \]
where the extra accuracy is from integrating only over times of order $\omega^{-1}$. Now the mean values can be evaluated:
\[ \left\langle\eta_k\eta_\ell\right\rangle=\frac{1}{2\omega^2}\frac{\partial U}{\partial\bar q_k}\frac{\partial U}{\partial\bar q_\ell},\qquad
\left\langle\xi_k\sin\omega t\right\rangle=\frac{1}{2\omega^2}\sum_\ell\frac{\partial U}{\partial\bar q_\ell}\frac{\partial^2H_0}{\partial\bar p_\ell\,\partial\bar p_k}. \]
Inserting these into the equations of motion (7.12) gives exactly the Hamilton equations which come from the mean motion Hamiltonian
\[ K(\bar q,\bar p)=H_0(\bar q,\bar p)
+\frac{1}{4\omega^2}\sum_{k\ell}\frac{\partial U}{\partial\bar q_k}\frac{\partial U}{\partial\bar q_\ell}\frac{\partial^2H_0}{\partial\bar p_k\,\partial\bar p_\ell}. \tag{7.16} \]
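Equation (7.16) can be tested numerically. In the following sketch (our example, not the text's: $H_0=p^2/2$, $U(q)=A\cos q$, parameters chosen by us) the mean-motion Hamiltonian predicts an effective potential $V_{\rm eff}(q)=(\partial U/\partial q)^2/4\omega^2=A^2\sin^2q/4\omega^2$, so a particle released near $q=0$ should oscillate slowly about it with small-amplitude frequency $\Omega=A/(\sqrt2\,\omega)$. We integrate the full, rapidly driven motion and check the quarter period of the mean motion against $\pi/2\Omega$.

```python
# Sketch: full motion under H = p^2/2 + A cos(q) sin(omega t) compared with
# the slow oscillation predicted by the mean-motion Hamiltonian (7.16).
# Assumed example and parameters; not taken from the text.
import numpy as np

A, omega = 1.0, 20.0
Omega = A / (np.sqrt(2.0) * omega)         # predicted slow frequency

q, t, dt = 0.3, 0.0, 0.002                 # released at rest (in the mean) at q=0.3
p = -A * np.sin(q) / omega                 # p(0) = eta(0) so the mean momentum ~ 0

def force(t, q):                           # the full rapidly oscillating force
    return A * np.sin(q) * np.sin(omega * t)

t_cross = None
while t < 60.0:
    # RK4 step for dq/dt = p, dp/dt = force(t, q)
    k1q, k1p = p, force(t, q)
    k2q, k2p = p + dt/2*k1p, force(t + dt/2, q + dt/2*k1q)
    k3q, k3p = p + dt/2*k2p, force(t + dt/2, q + dt/2*k2q)
    k4q, k4p = p + dt*k3p, force(t + dt, q + dt*k3q)
    q_new = q + dt/6*(k1q + 2*k2q + 2*k3q + k4q)
    p += dt/6*(k1p + 2*k2p + 2*k3p + k4p)
    t += dt
    if t_cross is None and q > 0.0 >= q_new:
        t_cross = t                        # first zero crossing ~ quarter slow period
    q = q_new

quarter_period = np.pi / (2.0 * Omega)     # ~ 44.4 for these parameters
```

The measured crossing time agrees with the prediction to within a few percent (the residual is mostly the anharmonicity of $\sin^2q$ at amplitude 0.3), even though the instantaneous force is never small.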
We see that the mean motion is perturbed only by terms of order $\omega^{-2}\tau^{-2}$, where $\tau$ is a typical time for the unperturbed Hamiltonian, so the perturbation is small, even though the original perturbing potential is not small at generic instants of time.

The careful reader will be bothered by the contributions of slowly varying terms multiplied by a single $\eta_k$ or by $\sin\omega t$, for which the average over a period will vanish to order $\omega^{-1}$ but not necessarily to order $\omega^{-2}$. Thus the corrections to the motion of $\bar q$ and $\bar p$ clearly vanish to order $\omega^{-1}$, which is enough to establish the equations for $\xi(t)$ and $\eta(t)$. But there are still ambiguities of order $\omega^{-2}$ in $\eta_k$ and contributions of that order from $\sin\omega t\,U$.

The problem arises from the ambiguity in defining the average motions by subtracting off the oscillations. Given the function $p(t)$, with the assurance that its derivative is order 1 as $\omega\to\infty$, we might try to make this subtraction by defining
\[ \bar p(t):=\frac{\omega}{2\pi}\int_{t-\pi/\omega}^{t+\pi/\omega}p(t')\,dt', \]
and the rapidly oscillating part $\eta(t)=p(t)-\bar p(t)$. But we have not completely eliminated $\bar\eta$, for over the cycle centered at $t$,
\[ \bar\eta:=\frac{\omega}{2\pi}\int_{t-\pi/\omega}^{t+\pi/\omega}\eta(t')\,dt'
=\bar p(t)-\left(\frac{\omega}{2\pi}\right)^2\int_{t-\pi/\omega}^{t+\pi/\omega}dt'\int_{t'-\pi/\omega}^{t'+\pi/\omega}p(t'')\,dt''. \]
In the last term we interchange orders of integration,
\[ -\left(\frac{2\pi}{\omega}\right)^2\left(\bar\eta-\bar p(t)\right)
=\int_{t-2\pi/\omega}^{t+2\pi/\omega}dt''\,p(t'')\int_0^{2\pi/\omega-|t''-t|}du
=\int_{t-2\pi/\omega}^{t+2\pi/\omega}dt''\,p(t'')\left(\frac{2\pi}{\omega}-|t''-t|\right). \]
If we could assume $p$ had a reasonable power series expansion we could evaluate this, but only the first derivative is known to stay bounded as $\omega\to\infty$. In fact, $\bar p$ is probably better defined with a smooth smearing function, say a Gaussian of width $\omega^{-1/2}$ or so.

Another approach would be to relax the zero condition, take the expressions for $\xi(t)$ and $\eta(t)$ to be exact (as they can be considered arbitrary subtractions), and then ask whether the $\bar q$ and $\bar p$ given by
  • 222. 7.4. RAPIDLY VARYING PERTURBATIONS 215 the equations given by K solve the original equations through second order. But the answer is no, because there is a term ∝ cos2 ωt from the ηk η term in 7.8. Perhaps we could add a higher order higher frequency term to ηk ? Let us simplify to one degree of freedom and one parameter, and write a1 a2 η(t) = eiωt + 2 e2iωt ω ω b1 b2 ξ(t) = 2 eiωt + 3 e2iωt ω ω −2 so to order ω inclusive, a1 iωt ˙ 2ia2 a2 ˙ η = ia1 + ˙ e + + 2 e2iωt ω ω ω ib1 ˙ b1 2ib2 ˙ ξ= + 2 eiωt + 2 e2iωt ω ω ω The equations of motion are ∂H0 ∂ 2 H0 ∂ 2 H0 1 2 ∂ 3 H0 ¯ ˙ ˙ q+ξ = +ξ +η + η + O(ω −3 ) ∂p ∂p∂q ∂p∂p 2 ∂p3 ∂H0 ∂ 2 H0 ∂ 2 H0 1 2 ∂ 3 H0 ∂U ∂2U p+η = − ˙ ¯ ˙ −ξ −η − η − sin ωt − ξ sin ωt 2 ∂q ∂q 2 ∂q∂p 2 ∂q∂ 2 p ∂q ∂q Assuming all rapidly oscillating terms are in the right place, ib1 ˙ b1 2ib2 b1 iωt b2 2iωt ∂ 2 H0 + 2 eiωt + 2 e2iωt = + e + 3e ω ω ω ω2 ω ∂p∂q 2 2 3 a1 iωt a2 2iωt ∂ H0 1 a1 ∂ H0 + e + 2e + e2iωt ω ω ∂p∂p 2 ω ∂p3 a1 iωt ˙ 2ia2 a2 ˙ b1 ∂ 2 H0 ia1 + e + + 2 e2iωt = − 2 eiωt ω ω ω ω ∂q 2 a1 iωt a2 2iωt ∂ 2 H 1 a1 2 2iωt ∂ 3 H ∂U − e + 2e − e 2p − sin ωt ω ω ∂q∂p 2 ω ∂q∂ ∂q 2 b1 ∂ U − 2 eiωt sin ωt 2 ω ∂q
  • 223. 216 CHAPTER 7. PERTURBATION THEORY This seems to say a2 is order ω −1, so neither η nore ξ do have corrections of order ω −2, although their derivatives do. Let us try another approach, 7.5 New approach Let 1 ∂U ∂ 2 H 1 ξ(t) = 2 ∂q ∂p2 sin ωt + 2 ξ ω ω 1 ∂U 1 η(t) = cos ωt + 2 η ω ∂q ω and assume q and p obey Hamiltons equations with K. ¯ ¯ Then 7.8 says   2 1 ∂  ∂U ∂ 2 H  1 ∂U ∂ 2 H 1 d ∂U ∂ 2 H 1 ˙ + cos ωt + 2 sin ωt + ξ 4ω 2 ∂p ∂q ∂p2 ω ∂q ∂p2 ω dt ∂q ∂p2 ω2 1 ∂U ∂ 2 H ∂ 2 H 1 ∂2H 1 ∂U ∂ 2 H = sin ωt + 2 ξ + cos ωt ω2 k ∂q ∂p2 ∂q∂p ω ∂q∂p ω ∂q ∂p2 2 1 ∂2H 1 ∂U ∂3H + 2η 2 + 2 cos2 ωt , ω ∂p 2ω ∂q ∂p3   2 1 ∂  ∂U ∂2H  1 d ∂U ∂U 1 ˙ − 2 + cos ωt − sin ωt + 2 η 4ω ∂q ∂q ∂p2 ω dt ∂q ∂q ω 1 ∂U ∂ 2 H ∂ 2 H 1 ∂2H 1 ∂U ∂2H =− sin ωt − 2 ξ − cos ωt ω 2 ∂q ∂p2 ∂q 2 ω ∂q 2 ω ∂q ∂q∂p 2 1 ∂2H 1 ∂U ∂3H ∂U − 2η − 2 cos2 ωt 2 − sin ωt ω ∂q∂p 2ω ∂q ∂q∂p ∂q 1 ∂U ∂ 2 H ∂2U 1 ∂2U − 2 sin2 ωt 2 − 2 ξ sin ωt 2 ω ∂q ∂p2 ∂q ω ∂q Cancel the obvious terms, use d(∂U/∂q)/dt = (∂ 2 U/∂q 2 )(∂H/∂p) + O(ω −2 ), to get 2 1 ∂U ∂3H 1 d ∂U ∂ 2 H 1 ˙ + 2 sin ωt + ξ 4ω 2 ∂q ∂p 3 ω dt ∂q ∂p2 ω2
  • 224. 7.5. NEW APPROACH 217 1 ∂U ∂ 2 H ∂ 2 H 1 ∂2H = sin ωt + 2 ξ ω2 k ∂q ∂p2 ∂q∂p ω ∂q∂p 2 1 ∂2H 1 ∂U ∂3H + 2 η 2 + 2 cos2 ωt , ω ∂p 2ω ∂q ∂p3 2 1 ∂U ∂ 2 U ∂ 2 H 1 ∂U ∂3H 1 ∂ 2 U ∂H 1 ˙ − 2 ∂q ∂q 2 ∂p2 − 2 2 + 2 ∂p cos ωt + 2 η 2ω 4ω ∂q ∂q∂p ω ∂q ω 2 2 2 2 1 ∂U ∂ H ∂ H 1 ∂ H 1 ∂U ∂ H =− 2 2 ∂q 2 sin ωt − 2 ξ 2 − cos ωt ω ∂q ∂p ω ∂q ω ∂q ∂q∂p 2 1 ∂2H 1 ∂U ∂3H − η − 2 cos2 ωt ω 2 ∂q∂p 2ω ∂q ∂q∂p2 1 ∂U ∂ 2 H 2 ∂2U 1 ∂2U − 2 sin ωt 2 − 2 ξ sin ωt 2 ω ∂q ∂p2 ∂q ω ∂q Now bring the first terms on the left to the other side and use cos 2ωt = 2 cos2 ωt − 1 = −(2 sin2 ωt − 1), to get 1 d ∂U ∂ 2 H 1 ˙ 2 sin ωt 2 + 2ξ ω dt ∂q ∂p ω 2 2 1 ∂U ∂ H ∂ H 1 ∂2H 1 ∂2H = 2 sin ωt + 2 ξ + η ω k ∂q ∂p2 ∂q∂p ω ∂q∂p ω 2 ∂p2 2 1 ∂U ∂3H + 2 cos 2ωt, 4ω ∂q ∂p3 1 ˙ 1 ∂U ∂ 2 H ∂ 2 H 1 ∂2H 1 ∂ ∂U ∂H 2 η =− 2 2 ∂q 2 sin ωt − 2 ξ 2 − cos ωt ω ω ∂q ∂p ω ∂q ω ∂q ∂q ∂p 2 1 ∂2H 1 ∂U ∂3H − η − 2 cos 2ωt ω 2 ∂q∂p 4ω ∂q ∂q∂p2 1 ∂U ∂ 2 H ∂ 2 U 1 ∂2U + 2 cos 2ωt − 2 ξ sin ωt 2 2ω ∂q ∂p2 ∂q 2 ω ∂q ˙ Note that there is a term of higher order in the η expression, so ∂ ∂U ∂H η = sin ωt + O(ω −3). ∂q ∂q ∂p All the other terms are consistent with an O(ω −3) rapidly oscillating contribution.
Exercises

7.1 Consider the harmonic oscillator $H=p^2/2m+\frac12m\omega^2q^2$ as a perturbation on a free particle $H_0=p^2/2m$. Find Hamilton's Principal Function $S(q,P)$ which generates the transformation of the unperturbed Hamiltonian to $Q,P$, the initial position and momentum. From this, find the Hamiltonian $K(Q,P,t)$ for the full harmonic oscillator, and thus equations of motion for $Q$ and $P$. Solve these iteratively, assuming $P(0)=0$, through fourth order in $\omega$. Express $q$ and $p$ to this order, and compare to the exact solution for a harmonic oscillator.

7.2 Consider the Kepler problem in two dimensions. That is, a particle of (reduced) mass $\mu$ moves in two dimensions under the influence of a potential
\[ U(x,y)=-\frac{K}{\sqrt{x^2+y^2}}. \]
This is an integrable system, with two integrals of the motion which are in involution. In answering this problem you are expected to make use of the explicit solutions we found for the Kepler problem.
a) What are the two integrals of the motion, $F_1$ and $F_2$, in more familiar terms and in terms of explicit functions on phase space?
b) Show that $F_1$ and $F_2$ are in involution.
c) Pick an appropriate $\eta_0\in M_f$, and explain how the coordinates $t$ are related to the phase space coordinates $\eta=g^t(\eta_0)$. This discussion may be somewhat qualitative, assuming we both know the explicit solutions of Chapter 3, but it should be clearly stated.
d) Find the vectors $\vec e_i$ which describe the unit cell, and give the relation between the angle variables $\phi_i$ and the usual coordinates $\eta$. One of these should be explicit, while the other may be described qualitatively.
e) Comment on whether there are relations among the frequencies and whether this is a degenerate system.
Chapter 8

Field Theory

In section 5.4 we considered the continuum limit of a chain of point masses on a stretched string. We had a situation in which the potential energy had interaction terms for particle A which depended only on the relative displacements of particles in the neighborhood of A. If we take our coordinates to be displacements from equilibrium, and consider only motions for which the displacement $\eta=\eta(x,y,z,t)$ becomes differentiable in the continuum limit, then the leading term in the potential energy is proportional to squares of first derivatives in the spatial coordinates. For our points on a string at tension $\tau$, with mass density $\rho$, we found
\[ T=\frac12\rho\int_0^L\dot y^2(x)\,dx,\qquad
U=\frac\tau2\int_0^L\left(\frac{\partial y}{\partial x}\right)^2dx, \]
so we can write the Lagrangian as an integral of a Lagrangian density $\mathcal L(y,\dot y,y',x,t)$. Actually for our string we had no $y$, $x$ or $t$ dependence, because we ignored gravity $U_g=\int\rho gy(x,t)\,dx$, and had a homogeneous string whose properties were time independent. In general, however, such dependence is quite possible. For a three dimensional object, such as the equations for the displacement of the atoms in a crystal, we might have fields $\vec\eta$, the three components of the displacement of a particle, as a function of the three coordinates $(x,y,z)$ determining the particle,
as well as time. Thus the generalized coordinates are the functions $\eta_i(x,y,z,t)$, and the Lagrangian density will depend on these, their gradients, and their time derivatives, as well as possibly on $x,y,z,t$. Thus
\[ \mathcal L=\mathcal L\left(\eta_i,\frac{\partial\eta_i}{\partial x},\frac{\partial\eta_i}{\partial y},\frac{\partial\eta_i}{\partial z},\frac{\partial\eta_i}{\partial t},x,y,z,t\right) \]
and
\[ L=\int dx\,dy\,dz\,\mathcal L,\qquad I=\int dx\,dy\,dz\,dt\,\mathcal L. \]
The actual motion of the system will be given by a particular set of functions $\eta_i(x,y,z,t)$, which are functions over the volume in question and of $t\in[t_I,t_f]$. The function will be determined by the laws of dynamics of the system, together with boundary conditions which depend on the initial configuration $\eta_I(x,y,z,t_I)$ and perhaps a final configuration. Generally there are some boundary conditions on the spatial boundaries as well. For example, our stretched string required $y=0$ at $x=0$ and $x=L$.

Before taking the continuum limit we say that the configuration of the system at a given $t$ was a point in a large $N$ dimensional configuration space, and the motion of the system is a path $\Gamma(t)$ in this space. In the continuum limit $N\to\infty$, so we might think of the path as a path in an infinite dimensional space. But we can also think of this path as a mapping $t\to\eta(\cdot,\cdot,\cdot,t)$ of time into the (infinite dimensional) space of functions on ordinary space.

Hamilton's principle states that the actual path is an extremum of the action. If we consider small variations $\delta\eta_i(x,y,z,t)$ which vanish on the boundaries, then
\[ \delta I=\int dx\,dy\,dz\,dt\,\delta\mathcal L=0. \]
Note that what is varied here are the functions $\eta_i$, not the coordinates $(x,y,z,t)$. $x$, $y$, $z$ do not represent the position of some atom; they represent a label which tells us which atom it is that we are talking about. They may well be the equilibrium position of that atom, but
they are independent of the motion. It is the $\eta_i$ which are the dynamical degrees of freedom, specifying the configuration of the system. The variation is
\[ \delta\mathcal L
=\frac{\partial\mathcal L}{\partial\eta}\delta\eta
+\frac{\partial\mathcal L}{\partial(\partial\eta/\partial x)}\delta\frac{\partial\eta}{\partial x}
+\frac{\partial\mathcal L}{\partial(\partial\eta/\partial y)}\delta\frac{\partial\eta}{\partial y}
+\frac{\partial\mathcal L}{\partial(\partial\eta/\partial z)}\delta\frac{\partial\eta}{\partial z}
+\frac{\partial\mathcal L}{\partial(\partial\eta/\partial t)}\delta\frac{\partial\eta}{\partial t}. \]
Notice there is no variation of $x$, $y$, $z$, and $t$, as we discussed.

The notation is getting awkward, so we need to reintroduce the notation $A_{,i}=\partial A/\partial r_i$. In fact, we see that $\partial/\partial t$ enters in the same way as $\partial/\partial x$, so we will set $x_0=t$ and write
\[ \partial_\mu:=\frac{\partial}{\partial x_\mu}=\left(\frac{\partial}{\partial t},\frac{\partial}{\partial x},\frac{\partial}{\partial y},\frac{\partial}{\partial z}\right),
\qquad\mu=0,1,2,3, \]
and write $\eta_{,\mu}:=\partial_\mu\eta$. If there are several fields $\eta_i$, then $\partial_\mu\eta_i=\eta_{i,\mu}$. The comma represents differentiation with respect to the index that follows it, so we must not use one simply to separate different ordinary indices. In this notation, we have
\[ \delta\mathcal L=\sum_i\frac{\partial\mathcal L}{\partial\eta_i}\delta\eta_i
+\sum_i\sum_{\mu=0}^3\frac{\partial\mathcal L}{\partial\eta_{i,\mu}}\delta\eta_{i,\mu}, \]
and
\[ \delta I=\int\left(\sum_i\frac{\partial\mathcal L}{\partial\eta_i}\delta\eta_i
+\sum_i\sum_{\mu=0}^3\frac{\partial\mathcal L}{\partial\eta_{i,\mu}}\delta\eta_{i,\mu}\right)d^4x, \]
where $d^4x=dx\,dy\,dz\,dt$. Except for the first term, we integrate by parts,
\[ \delta I=\int\sum_i\left(\frac{\partial\mathcal L}{\partial\eta_i}
-\sum_{\mu=0}^3\partial_\mu\frac{\partial\mathcal L}{\partial\eta_{i,\mu}}\right)\delta\eta_i\,d^4x, \]
where we have thrown away the boundary terms which involve $\delta\eta_i$ evaluated on the boundary, which we assumed to be zero. Inside the region of integration, the $\delta\eta_i$ are independent, so requiring $\delta I=0$ for all functions $\delta\eta_i(x_\mu)$ implies
\[ \sum_\mu\partial_\mu\frac{\partial\mathcal L}{\partial\eta_{i,\mu}}-\frac{\partial\mathcal L}{\partial\eta_i}=0. \tag{8.1} \]
We have written the equations of motion (which is now a partial differential equation rather than coupled ordinary differential equations) in a form which looks like we are dealing with a relativistic problem, because $t$ and the spatial coordinates are entering in the same way. We have not made any assumption of relativity, however, and our problem will not be relativistically invariant unless the Lagrangian density is invariant under Lorentz transformations (as well as translations).

Now consider how the Lagrangian changes from one point in spacetime to another, including the variation of the fields, assuming the fields obey the equations of motion. Then the total derivative for a variation of $x_\mu$ is
\[ \frac{d\mathcal L}{dx_\mu}
=\left.\frac{\partial\mathcal L}{\partial x_\mu}\right|_\eta
+\frac{\partial\mathcal L}{\partial\eta_i}\eta_{i,\mu}
+\frac{\partial\mathcal L}{\partial\eta_{i,\nu}}\eta_{i,\nu,\mu}. \]
Plugging the equations of motion into the second term,
\[ \frac{d\mathcal L}{dx_\mu}
=\frac{\partial\mathcal L}{\partial x_\mu}
+\left(\partial_\nu\frac{\partial\mathcal L}{\partial\eta_{i,\nu}}\right)\eta_{i,\mu}
+\frac{\partial\mathcal L}{\partial\eta_{i,\nu}}\eta_{i,\mu,\nu}
=\frac{\partial\mathcal L}{\partial x_\mu}
+\partial_\nu\left(\frac{\partial\mathcal L}{\partial\eta_{i,\nu}}\eta_{i,\mu}\right). \]
Thus
\[ \partial_\nu T_{\mu\nu}=-\frac{\partial\mathcal L}{\partial x_\mu}, \tag{8.2} \]
where the stress-energy tensor $T_{\mu\nu}$ is defined by
\[ T_{\mu\nu}(x)=\frac{\partial\mathcal L}{\partial\eta_{i,\nu}}\eta_{i,\mu}-\mathcal L\,\delta_{\mu\nu}. \tag{8.3} \]
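For the string, with $\mathcal L=\frac\rho2\dot y^2-\frac\tau2(y')^2$, Eq. (8.1) reduces to the wave equation $\rho\ddot y=\tau y''$. The sketch below (a standard finite-difference discretization; the parameters are our choices, not the text's) integrates the fundamental standing mode on a string with fixed ends and checks that the profile returns after one period $T=2L/c$, where $c=\sqrt{\tau/\rho}$.

```python
# Sketch: the wave equation rho*y_tt = tau*y_xx obtained from Eq. (8.1),
# discretized with centered differences (leapfrog in time).  We evolve the
# fundamental mode y(x,0) = sin(pi x) one full period and compare profiles.
import numpy as np

rho = tau = 1.0
N, Lx = 200, 1.0
dx = Lx / N
c = np.sqrt(tau / rho)
dt = 0.5 * dx / c                          # CFL number 0.5, stable
x = np.linspace(0.0, Lx, N + 1)

y0 = np.sin(np.pi * x / Lx)                # fundamental mode, fixed ends
y = y0.copy()
r2 = (c * dt / dx) ** 2
lap = lambda u: np.concatenate(([0.0], u[2:] - 2*u[1:-1] + u[:-2], [0.0]))
y_old = y + 0.5 * r2 * lap(y)              # first step uses ydot(x,0) = 0

steps = int(round(2.0 * Lx / c / dt))      # one full period T = 2L/c
for _ in range(steps):
    y_new = 2*y - y_old + r2 * lap(y)
    y_new[0] = y_new[-1] = 0.0             # boundary conditions y = 0
    y_old, y = y, y_new

err = np.max(np.abs(y - y0))               # profile mismatch after one period
```

The mismatch is set by numerical dispersion and is far below the mode amplitude for this resolution.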
Note that if the Lagrangian density has no explicit dependence on the coordinates $x_\mu$, the stress-energy tensor satisfies the equation $\partial_\nu T_{\mu\nu} = 0$, which is a continuity equation.

In the dynamics of discrete systems we defined the Hamiltonian as $H = \sum_i p_i\dot q_i - L(q,\dot q,t)$. Considering the continuum as a limit, $L = \int d^3x\,\mathcal{L}$ is the limit of $\sum_{ijk}\Delta x\,\Delta y\,\Delta z\,\mathcal{L}_{ijk}$, where $\mathcal{L}_{ijk}$ depends on $q_{ijk}$ and a few of its neighbors, and also on $\dot q_{ijk}$. The conjugate momentum $p_{ijk} = \partial L/\partial\dot q_{ijk} = \Delta x\,\Delta y\,\Delta z\,\partial\mathcal{L}_{ijk}/\partial\dot q_{ijk}$, which would vanish in the continuum limit, so instead we define

$$\pi(x,y,z) = \frac{p_{ijk}}{\Delta x\,\Delta y\,\Delta z} = \frac{\partial\mathcal{L}_{ijk}}{\partial\dot q_{ijk}} = \frac{\delta L}{\delta\dot q(x,y,z)}.$$

The Hamiltonian

$$H = \sum p_{ijk}\dot q_{ijk} - L = \sum\Delta x\,\Delta y\,\Delta z\,\pi(x,y,z)\,\dot q(x,y,z) - L
= \int d^3x\,\big(\pi(\vec r)\,\dot q(\vec r) - \mathcal{L}\big) = \int d^3x\,\mathcal{H},$$

where the Hamiltonian density is defined by $\mathcal{H}(\vec r) = \pi(\vec r)\,\dot q(\vec r) - \mathcal{L}(\vec r)$. Of course if there are several fields $q_i$ at each point,

$$\mathcal{H}(\vec r) = \sum_i \pi_i(\vec r)\,\dot q_i(\vec r) - \mathcal{L}(\vec r),
\qquad\text{where}\qquad
\pi_i(\vec r) = \frac{\delta L}{\delta\dot q_i(\vec r)}.$$

Notice that the Hamiltonian density is exactly $T_{00}$, one component of the stress-energy tensor.

Consider the case where $L$ does not depend explicitly on $(\vec x, t)$, so

$$\sum_{\nu=0}^{3}\partial_\nu T_{\mu\nu} = 0,
\qquad\text{or}\qquad
\frac{\partial}{\partial t}T_{\mu 0} + \sum_{i=1}^{3}\partial_i T_{\mu i} = 0.$$
This is a continuity equation, similar to the equation from fluid mechanics, $\partial\rho/\partial t + \vec\nabla\cdot(\rho\vec v) = 0$, which expresses the conservation of mass. That equation has the interpretation that the change in the mass contained in some volume is equal to the flux into the volume, because $\rho\vec v$ is the flow of mass past a unit surface area. In the current case, we have four conservation equations, indexed by $\mu$. Each of these can be integrated over space to tell us about the rate of change of the "charge"

$$Q_\mu(t) = \int d^3V\,T_{\mu 0}(\vec x,t),
\qquad
\frac{d}{dt}Q_\mu(t) = -\int d^3V\,\sum_i\frac{\partial}{\partial x_i}T_{\mu i}(\vec x,t).$$

We see that this is the integral of the divergence of a vector current $(J_\mu)_i = T_{\mu i}$, which by Gauss' law becomes a surface integral of the flux of $J_\mu$ out of the volume of our system. We have been sloppy about our boundary conditions, but in many cases it is reasonable to assume there is no flux out of the volume. In this case the right hand side vanishes, and we find four conserved quantities $Q_\mu(t) = $ constant. For $\mu = 0$ we saw that $T_{00}$ is the energy density, so $Q_0$ is the total energy.

Cyclic coordinates

In discrete mechanics, when $L$ was independent of a coordinate $q_i$, even though it depended on $\dot q_i$, we called the coordinate cyclic or ignorable, and found a conserved momentum conjugate to it. For fields in general, $L(\eta, \dot\eta, \vec\nabla\eta)$ depends on spatial derivatives of $\eta$ as well, and we may ask whether we need to require absence of dependence on $\vec\nabla\eta$ for a coordinate to be cyclic. Independence of both $\eta$ and $\vec\nabla\eta$ implies independence of an infinite number of discrete coordinates, the values of $\eta(\vec r)$ at every point $\vec r$, which is too restrictive a condition for our discussion. We will call a field coordinate $\eta_i$ cyclic if $L$ does not depend directly on $\eta_i$, although it may depend on its derivatives $\dot\eta_i$ and $\vec\nabla\eta_i$.

The Lagrange equation then states

$$\sum_\mu\partial_\mu\frac{\delta L}{\delta\eta_{i,\mu}} = 0,
\qquad\text{or}\qquad
\frac{d}{dt}\pi_i + \sum_j\partial_j\frac{\delta L}{\delta\eta_{i,j}} = 0.$$
If we integrate this equation over all space, and define $\Pi_i(t) = \int\pi_i(\vec r)\,d^3r$, then the derivative $d\Pi/dt$ involves the integral of a divergence, which by Gauss' law is a surface term,

$$\frac{d\Pi_i(t)}{dt} = -\oint\frac{\delta L}{\delta\eta_{i,j}}\,(dS)_j.$$

Assuming the spatial boundary conditions are such that we may ignore this boundary term, we see that $\Pi_i(t)$ is a constant of the motion.

8.1 Noether's Theorem

We want to discuss the relationship between symmetries and conserved quantities which is known as Noether's theorem. It concerns infinitesimal transformations of the degrees of freedom $\eta_i(x_\mu)$ which may relate these to degrees of freedom at a changed point. That is, the new field $\eta'(x')$ is related to $\eta(x)$ rather than $\eta(x')$, where $x_\mu \to x'_\mu = x_\mu + \delta x_\mu$ is some infinitesimal transformation of the coordinates rather than of the degrees of freedom. For a scalar field, like temperature, under a rotation, we would define the new field $\eta'(x') = \eta(x)$, but more generally the field may also change, in a way that may depend on other fields,

$$\eta_i'(x') = \eta_i(x) + \delta\eta_i(x;\eta_k(x)).$$

This is what you would expect for a vector field $\vec E$ under rotations, because the new $E_x$ gets a component from the old $E_y$. The Lagrangian is a given function of the old fields $L(\eta_i, \eta_{i,\mu}, x_\mu)$. If we substitute in the values of $\eta(x)$ in terms of $\eta'(x')$ we get a new function $L'$, defined by

$$L'(\eta_i', \eta_{i,\mu}', x_\mu') = L(\eta_i, \eta_{i,\mu}, x_\mu).$$
The symmetries we wish to discuss are transformations of this type under which the form of the Lagrangian density does not change, so that $L'$ has the same functional form as $L$, or

$$L'(\eta_i', \eta_{i,\mu}', x_\mu') = L(\eta_i', \eta_{i,\mu}', x_\mu').$$

In considering the action, we integrate the Lagrangian density over a region of space-time between two spatial slices corresponding to an initial time and a final time. We may, however, consider an arbitrary region of spacetime $\Omega \subset \mathbb{R}^4$. The corresponding four-dimensional volume in the transformed coordinates is the region $x' \in \Omega'$. The action for a given field configuration $\eta$,

$$S(\eta) = \int_\Omega L(\eta, \eta_{,\mu}, x)\,d^4x,$$

differs from $S'(\eta') = \int_{\Omega'} L'(\eta', \eta'_{,\mu}, x')\,d^4x'$ only by the Jacobian, as a change of variables gives

$$S'(\eta') = \int_\Omega \left|\frac{\partial x'}{\partial x}\right| L(\eta, \eta_{,\mu}, x)\,d^4x.$$

The Jacobian is

$$\det\big(\delta_{\mu\nu} + \partial_\nu\delta x_\mu\big) = 1 + \mathrm{Tr}\,\frac{\partial\delta x_\mu}{\partial x_\nu} = 1 + \partial_\mu\delta x_\mu.$$

It makes little sense to assume the Lagrangian density is invariant unless the volume element is as well, so we will require the Jacobian to be identically 1, or $\partial_\mu\delta x_\mu = 0$. So then $\delta S = 0$ for the symmetries we wish to discuss. We can also consider $S'(\eta')$ as an integral over $x$, as this is just a dummy variable,

$$S'(\eta') = \int_{\Omega'} L\big(\eta'(x), \eta'_{,\mu}(x), x\big)\,d^4x.$$

This differs from $S(\eta)$ by $S'(\eta') - S(\eta) = \delta_1 S + \delta_2 S$, because

1. the Lagrangian is evaluated with the field $\eta'$ rather than $\eta$, producing a change

$$\delta_1 S = \int\left[\frac{\delta L}{\delta\eta_i}\,\bar\delta\eta_i + \frac{\delta L}{\delta\eta_{i,\mu}}\,\bar\delta\eta_{i,\mu}\right]d^4x,$$
where

$$\bar\delta\eta_i(x) := \eta_i'(x) - \eta_i(x) = \eta_i'(x) - \eta_i'(x') + \delta\eta_i(x) = \delta\eta_i(x) - \eta_{i,\mu}\,\delta x_\mu;$$

2. the change in the region of integration, $\Omega'$ rather than $\Omega$,

$$\delta_2 S = \left(\int_{\Omega'} - \int_\Omega\right) L(\eta, \eta_{,\mu}, x)\,d^4x.$$

If we define $d\Sigma_\mu$ to be an element of the three-dimensional surface $\Sigma = \partial\Omega$ of $\Omega$, with outward-pointing normal in the direction of $d\Sigma_\mu$, the difference in the regions of integration may be written as an integral over the surface,

$$\left(\int_{\Omega'} - \int_\Omega\right) d^4x = \int_\Sigma \delta x_\mu\cdot d\Sigma_\mu.$$

Thus

$$\delta_2 S = \int_{\partial\Omega} L\,\delta x_\mu\cdot d\Sigma_\mu = \int_\Omega \partial_\mu(L\,\delta x_\mu) \tag{8.4}$$

by Gauss' law (in four dimensions).

As $\bar\delta$ is a difference of two functions at the same values of $x$, this operator commutes with partial differentiation, so $\bar\delta\eta_{i,\mu} = \partial_\mu\bar\delta\eta_i$. Using this in the second term of $\delta_1 S$ and the equations of motion in the first, we have

$$\delta_1 S = \int d^4x\left[\left(\partial_\mu\frac{\partial L}{\partial\eta_{i,\mu}}\right)\bar\delta\eta_i + \frac{\partial L}{\partial\eta_{i,\mu}}\,\partial_\mu\bar\delta\eta_i\right]
= \int_\Omega d^4x\,\partial_\mu\!\left(\frac{\partial L}{\partial\eta_{i,\mu}}\,\bar\delta\eta_i\right)
= \int_\Omega d^4x\,\partial_\mu\!\left(\frac{\partial L}{\partial\eta_{i,\mu}}\,\delta\eta_i - \frac{\partial L}{\partial\eta_{i,\mu}}\,\eta_{i,\nu}\,\delta x_\nu\right).$$

Then $\delta_1 S + \delta_2 S = 0$ is a condition in the form

$$\int_\Omega d^4x\,\partial_\mu J_\mu = 0, \tag{8.5}$$
which holds for arbitrary volumes $\Omega$. Thus we have a conservation equation

$$\partial_\mu J_\mu = 0.$$

The infinitesimal variations may be thought of as proportional to an infinitesimal parameter $\epsilon$, which is often in fact a component of a four-vector. The variations in $x_\mu$ and $\eta_i$ are then

$$\delta x_\mu = \epsilon\,\frac{dx_\mu}{d\epsilon},
\qquad
\delta\eta_i = \epsilon\,\frac{d\eta_i}{d\epsilon},$$

so $\delta_1 S + \delta_2 S$ is $-\epsilon$ times the integral in (8.5), with

$$J_\mu = -\sum_i\frac{\partial L}{\partial\eta_{i,\mu}}\frac{d\eta_i}{d\epsilon} + \left(\sum_i\frac{\partial L}{\partial\eta_{i,\mu}}\,\eta_{i,\nu} - L\,\delta_{\mu\nu}\right)\frac{dx_\nu}{d\epsilon}
= -\sum_i\frac{\partial L}{\partial\eta_{i,\mu}}\frac{d\eta_i}{d\epsilon} + T_{\nu\mu}\frac{dx_\nu}{d\epsilon}. \tag{8.6}$$

Exercises

8.1 The Lagrangian density for the electromagnetic field in vacuum may be written

$$L = \frac{1}{2}\big(E^2 - B^2\big),$$

where the dynamical degrees of freedom are not $\vec E$ and $\vec B$, but rather $\vec A$ and $\phi$, where

$$\vec B = \vec\nabla\times\vec A,
\qquad
\vec E = -\vec\nabla\phi - \frac{1}{c}\dot{\vec A}.$$

a) Find the canonical momenta, and comment on what seems unusual about one of the answers.

b) Find the Lagrange equations for the system. Relate them to known equations for the electromagnetic field.
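The use of (8.6) can be sketched with an assumed internal symmetry (an illustration, not one of the text's exercises): suppose $L$ depends on two fields $\eta_1, \eta_2$ only through rotation-invariant combinations such as $\eta_1^2 + \eta_2^2$, so the rotation $\delta\eta_1 = -\epsilon\,\eta_2$, $\delta\eta_2 = \epsilon\,\eta_1$ with $\delta x_\mu = 0$ leaves $L$ unchanged.

```latex
% Internal rotation: dx_\nu/d\epsilon = 0,
% d\eta_1/d\epsilon = -\eta_2, \quad d\eta_2/d\epsilon = \eta_1,
% so (8.6) reduces to
J_\mu = \frac{\partial L}{\partial\eta_{1,\mu}}\,\eta_2
      - \frac{\partial L}{\partial\eta_{2,\mu}}\,\eta_1,
\qquad \partial_\mu J_\mu = 0,
% with the conserved charge
Q = \int d^3x\,J_0 = \int d^3x\,\big(\pi_1\,\eta_2 - \pi_2\,\eta_1\big).
```

The spacetime-translation case works the same way: with $d\eta_i/d\epsilon = 0$ and $dx_\nu/d\epsilon$ a constant vector, (8.6) reduces to the conservation of $T_{\nu\mu}$ already found in (8.2).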
Appendix A

$\epsilon_{ijk}$ and cross products

A.1 Vector Operations

A.1.1 $\delta_{ij}$ and $\epsilon_{ijk}$

These are some notes on the use of the antisymmetric symbol $\epsilon_{ijk}$ for expressing cross products. This is an extremely powerful tool for manipulating cross products and their generalizations in higher dimensions, and although many low-level courses avoid the use of $\epsilon$, I think this is a mistake and I want you to become proficient with it.

In a Cartesian coordinate system a vector $\vec V$ has components $V_i$ along each of the three orthonormal basis vectors $\hat e_i$, or $\vec V = \sum_i V_i\hat e_i$. The dot product of two vectors, $\vec A\cdot\vec B$, is bilinear and can therefore be written as

$$\vec A\cdot\vec B = \Big(\sum_i A_i\hat e_i\Big)\cdot\Big(\sum_j B_j\hat e_j\Big) \tag{A.1}$$
$$= \sum_i\sum_j A_i B_j\,\hat e_i\cdot\hat e_j \tag{A.2}$$
$$= \sum_i\sum_j A_i B_j\,\delta_{ij}, \tag{A.3}$$

where the Kronecker delta $\delta_{ij}$ is defined to be 1 if $i = j$ and 0 otherwise. As the basis vectors $\hat e_k$ are orthonormal, i.e. orthogonal to each other and of unit length, we have $\hat e_i\cdot\hat e_j = \delta_{ij}$.
Doing a sum over an index $j$ of an expression involving a $\delta_{ij}$ is very simple, because the only term in the sum which contributes is the one with $j = i$. Thus $\sum_j F(i,j)\,\delta_{ij} = F(i,i)$, which is to say, one just replaces $j$ with $i$ in all the other factors, and drops the $\delta_{ij}$ and the summation over $j$. So we have $\vec A\cdot\vec B = \sum_i A_i B_i$, the standard expression for the dot product.¹

We now consider the cross product of two vectors, $\vec A\times\vec B$, which is also a bilinear expression, so we must have $\vec A\times\vec B = (\sum_i A_i\hat e_i)\times(\sum_j B_j\hat e_j) = \sum_i\sum_j A_i B_j\,(\hat e_i\times\hat e_j)$. The cross product $\hat e_i\times\hat e_j$ is a vector, which can therefore be written as $\vec V = \sum_k V_k\hat e_k$. But the vector result depends also on the two input vectors, so the coefficients $V_k$ really depend on $i$ and $j$ as well. Define them to be $\epsilon_{kij}$, so

$$\hat e_i\times\hat e_j = \sum_k \epsilon_{kij}\,\hat e_k.$$

It is easy to evaluate the 27 coefficients $\epsilon_{kij}$, because the cross product of two orthogonal unit vectors is a unit vector orthogonal to both of them. Thus $\hat e_1\times\hat e_2 = \hat e_3$, so $\epsilon_{312} = 1$ and $\epsilon_{k12} = 0$ if $k = 1$ or $2$. Applying the same argument to $\hat e_2\times\hat e_3$ and $\hat e_3\times\hat e_1$, and using the antisymmetry of the cross product, $\vec A\times\vec B = -\vec B\times\vec A$, we see that

$$\epsilon_{123} = \epsilon_{231} = \epsilon_{312} = 1;\qquad \epsilon_{132} = \epsilon_{213} = \epsilon_{321} = -1,$$

and $\epsilon_{ijk} = 0$ for all other values of the indices, i.e. $\epsilon_{ijk} = 0$ whenever any two of the indices are equal. Note that $\epsilon$ changes sign not only when the last two indices are interchanged (a consequence of the antisymmetry of the cross product), but whenever any two of its indices are interchanged. Thus $\epsilon_{ijk}$ is zero unless $(1,2,3)\to(i,j,k)$ is a permutation, and is equal to the sign of the permutation if it exists.

Now that we have an expression for $\hat e_i\times\hat e_j$, we can evaluate

$$\vec A\times\vec B = \sum_i\sum_j A_i B_j\,(\hat e_i\times\hat e_j) = \sum_i\sum_j\sum_k \epsilon_{kij}\,A_i B_j\,\hat e_k. \tag{A.4}$$

Much of the usefulness of expressing cross products in terms of $\epsilon$'s comes from the identity

$$\sum_k \epsilon_{kij}\,\epsilon_{k\ell m} = \delta_{i\ell}\,\delta_{jm} - \delta_{im}\,\delta_{j\ell}, \tag{A.5}$$

¹ Note that this only holds because we have expressed our vectors in terms of orthonormal basis vectors.
which can be shown as follows. To get a contribution to the sum, $k$ must be different from the unequal indices $i$ and $j$, and also different from $\ell$ and $m$. Thus we get 0 unless the pair $(i,j)$ and the pair $(\ell,m)$ are the same pair of different indices. There are only two ways that can happen, as given by the two terms, and we only need to verify the coefficients. If $i = \ell$ and $j = m$, the two $\epsilon$'s are equal and the square is 1, so the first term has the proper coefficient of 1. The second term differs by one transposition of two indices on one epsilon, so it must have the opposite sign.

We now turn to some applications. Let us first evaluate

$$\vec A\cdot(\vec B\times\vec C) = \sum_i A_i\sum_{jk}\epsilon_{ijk}B_j C_k = \sum_{ijk}\epsilon_{ijk}A_i B_j C_k. \tag{A.6}$$

Note that $\vec A\cdot(\vec B\times\vec C)$ is, up to sign, the volume of the parallelepiped formed by the vectors $\vec A$, $\vec B$, and $\vec C$. From the fact that the $\epsilon$ changes sign under transpositions of any two indices, we see that the same is true for transposing the vectors, so that

$$\vec A\cdot(\vec B\times\vec C) = -\vec A\cdot(\vec C\times\vec B) = \vec B\cdot(\vec C\times\vec A) = -\vec B\cdot(\vec A\times\vec C) = \vec C\cdot(\vec A\times\vec B) = -\vec C\cdot(\vec B\times\vec A).$$

Now consider $\vec V = \vec A\times(\vec B\times\vec C)$. Using our formulas,

$$\vec V = \sum_{ijk}\epsilon_{kij}\,\hat e_k\,A_i\,(\vec B\times\vec C)_j = \sum_{ijk}\epsilon_{kij}\,\hat e_k\,A_i\sum_{lm}\epsilon_{jlm}B_l C_m.$$

Notice that the sum on $j$ involves only the two epsilons, and we can use

$$\sum_j \epsilon_{kij}\,\epsilon_{jlm} = \sum_j \epsilon_{jki}\,\epsilon_{jlm} = \delta_{kl}\,\delta_{im} - \delta_{km}\,\delta_{il}.$$

Thus

$$V_k = \sum_{ilm}\Big(\sum_j\epsilon_{kij}\epsilon_{jlm}\Big)A_i B_l C_m = \sum_{ilm}\big(\delta_{kl}\delta_{im} - \delta_{km}\delta_{il}\big)A_i B_l C_m
= \sum_i A_i B_k C_i - \sum_i A_i B_i C_k = \vec A\cdot\vec C\,B_k - \vec A\cdot\vec B\,C_k,$$
so

$$\vec A\times(\vec B\times\vec C) = \vec B\,(\vec A\cdot\vec C) - \vec C\,(\vec A\cdot\vec B). \tag{A.7}$$

This is sometimes known as the bac-cab formula.

Exercise: Using (A.5) for the manipulation of cross products, show that

$$(\vec A\times\vec B)\cdot(\vec C\times\vec D) = (\vec A\cdot\vec C)(\vec B\cdot\vec D) - (\vec A\cdot\vec D)(\vec B\cdot\vec C).$$

The determinant of a matrix can be defined using the $\epsilon$ symbol. For a $3\times 3$ matrix $A$,

$$\det A = \sum_{ijk}\epsilon_{ijk}A_{1i}A_{2j}A_{3k} = \sum_{ijk}\epsilon_{ijk}A_{i1}A_{j2}A_{k3}.$$

From the second definition, we see that the determinant is the volume of the parallelepiped formed from the images under the linear map $A$ of the three unit vectors $\hat e_i$, as

$$(A\hat e_1)\cdot\big((A\hat e_2)\times(A\hat e_3)\big) = \det A.$$

In higher dimensions, the cross product is not a vector, but there is a generalization of $\epsilon$ which remains very useful. In an $n$-dimensional space, $\epsilon_{i_1 i_2\ldots i_n}$ has $n$ indices and is defined as the sign of the permutation $(1,2,\ldots,n)\to(i_1 i_2\ldots i_n)$, if the indices are all unequal, and zero otherwise. The analog of (A.5) has $(n-1)!$ terms from all the permutations of the unsummed indices on the second $\epsilon$. The determinant of an $n\times n$ matrix is defined as

$$\det A = \sum_{i_1,\ldots,i_n}\epsilon_{i_1 i_2\ldots i_n}\prod_{p=1}^{n}A_{p,i_p}.$$
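The identities of this appendix are easy to spot-check numerically. The sketch below (NumPy and the particular test vectors are our own illustration, not the text's notation) builds $\epsilon_{ijk}$ as the sign of a permutation and verifies (A.4), (A.5), (A.7), the exercise identity, and the triple-product form of the determinant:

```python
import itertools

import numpy as np

# epsilon_{ijk} = sign of the permutation (0,1,2) -> (i,j,k), zero otherwise
eps = np.zeros((3, 3, 3))
for perm in itertools.permutations(range(3)):
    # count inversions to get the sign of the permutation
    inversions = sum(perm[a] > perm[b] for a in range(3) for b in range(a + 1, 3))
    eps[perm] = (-1) ** inversions

A = np.array([1.0, 2.0, 3.0])
B = np.array([-2.0, 0.5, 4.0])
C = np.array([0.3, -1.0, 2.0])
D = np.array([1.5, 0.2, -0.7])

# (A.4): (A x B)_k = sum_{ij} eps_{kij} A_i B_j
assert np.allclose(np.einsum('kij,i,j->k', eps, A, B), np.cross(A, B))

# (A.5): sum_k eps_{kij} eps_{klm} = delta_il delta_jm - delta_im delta_jl
delta = np.eye(3)
lhs = np.einsum('kij,klm->ijlm', eps, eps)
rhs = (np.einsum('il,jm->ijlm', delta, delta)
       - np.einsum('im,jl->ijlm', delta, delta))
assert np.allclose(lhs, rhs)

# (A.7), bac-cab: A x (B x C) = B (A.C) - C (A.B)
assert np.allclose(np.cross(A, np.cross(B, C)),
                   B * np.dot(A, C) - C * np.dot(A, B))

# exercise: (A x B).(C x D) = (A.C)(B.D) - (A.D)(B.C)
assert np.isclose(np.dot(np.cross(A, B), np.cross(C, D)),
                  np.dot(A, C) * np.dot(B, D) - np.dot(A, D) * np.dot(B, C))

# det M = (M e1).((M e2) x (M e3)): the triple product of the columns of M
M = np.array([[2.0, 1.0, 0.0], [0.5, -1.0, 3.0], [1.0, 0.0, 1.0]])
assert np.isclose(np.dot(M[:, 0], np.cross(M[:, 1], M[:, 2])), np.linalg.det(M))
```

Such checks are no substitute for the index proofs above, but they catch sign and index-ordering mistakes immediately.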
Appendix B

The gradient operator

We can define the gradient operator

$$\vec\nabla = \sum_i \hat e_i\,\frac{\partial}{\partial x_i}. \tag{B.1}$$

While this looks like an ordinary vector, the coefficients are not numbers $V_i$ but are operators, which do not commute with functions of the coordinates $x_i$. We can still write out the components straightforwardly, but we must be careful to keep the order of the operators and the fields correct. The gradient of a scalar field $\Phi(\vec r)$ is simply evaluated by distributing the gradient operator,

$$\vec\nabla\Phi = \Big(\sum_i \hat e_i\,\frac{\partial}{\partial x_i}\Big)\Phi(\vec r) = \sum_i \hat e_i\,\frac{\partial\Phi}{\partial x_i}. \tag{B.2}$$

Because the individual components obey the Leibniz rule $\frac{\partial(AB)}{\partial x_i} = \frac{\partial A}{\partial x_i}B + A\frac{\partial B}{\partial x_i}$, so does the gradient, so if $A$ and $B$ are scalar fields,

$$\vec\nabla(AB) = (\vec\nabla A)B + A\,\vec\nabla B. \tag{B.3}$$

The general application of the gradient operator to a vector $\vec A$ gives an object with coefficients with two indices, a tensor. Some parts of this tensor, however, can be simplified. The first (which is the trace
of the tensor) is called the divergence of the vector, written and defined by

$$\vec\nabla\cdot\vec A = \Big(\sum_i \hat e_i\,\frac{\partial}{\partial x_i}\Big)\cdot\Big(\sum_j \hat e_j A_j\Big) = \sum_{ij}\hat e_i\cdot\hat e_j\,\frac{\partial A_j}{\partial x_i} = \sum_{ij}\delta_{ij}\,\frac{\partial A_j}{\partial x_i} = \sum_i\frac{\partial A_i}{\partial x_i}. \tag{B.4}$$

In asking about Leibniz' rule, we must remember to apply the divergence operator only to vectors. One possibility is to apply it to the vector $\vec V = \Phi\vec A$, with components $V_i = \Phi A_i$. Thus

$$\vec\nabla\cdot(\Phi\vec A) = \sum_i\frac{\partial(\Phi A_i)}{\partial x_i} = \sum_i\frac{\partial\Phi}{\partial x_i}A_i + \Phi\sum_i\frac{\partial A_i}{\partial x_i} = (\vec\nabla\Phi)\cdot\vec A + \Phi\,\vec\nabla\cdot\vec A. \tag{B.5}$$

We could also apply the divergence to the cross product of two vectors,

$$\vec\nabla\cdot(\vec A\times\vec B) = \sum_i\frac{\partial(\vec A\times\vec B)_i}{\partial x_i} = \sum_i\frac{\partial\big(\sum_{jk}\epsilon_{ijk}A_j B_k\big)}{\partial x_i} = \sum_{ijk}\epsilon_{ijk}\frac{\partial(A_j B_k)}{\partial x_i}
= \sum_{ijk}\epsilon_{ijk}\frac{\partial A_j}{\partial x_i}B_k + \sum_{ijk}\epsilon_{ijk}A_j\frac{\partial B_k}{\partial x_i}. \tag{B.6}$$

This is expressible in terms of the curls of $\vec A$ and $\vec B$. The curl is like a cross product with the first vector replaced by the differential operator, so we may write the $i$'th component as

$$(\vec\nabla\times\vec A)_i = \sum_{jk}\epsilon_{ijk}\frac{\partial}{\partial x_j}A_k. \tag{B.7}$$

We see that the last expression in (B.6) is

$$\sum_k\Big(\sum_{ij}\epsilon_{kij}\frac{\partial A_j}{\partial x_i}\Big)B_k - \sum_j A_j\Big(\sum_{ik}\epsilon_{jik}\frac{\partial B_k}{\partial x_i}\Big) = (\vec\nabla\times\vec A)\cdot\vec B - \vec A\cdot(\vec\nabla\times\vec B), \tag{B.8}$$

where the sign which changed did so due to the transpositions in the indices on the $\epsilon$, which we have done in order to put things in the form of the definition of the curl. Thus

$$\vec\nabla\cdot(\vec A\times\vec B) = (\vec\nabla\times\vec A)\cdot\vec B - \vec A\cdot(\vec\nabla\times\vec B). \tag{B.9}$$
Vector algebra identities apply to the curl as to any ordinary vector, except that one must be careful not to change, by reordering, what the differential operators act on. In particular, Eq. (A.7) becomes

$$\vec A\times(\vec\nabla\times\vec B) = \sum_i A_i\,\vec\nabla B_i - \sum_i A_i\,\frac{\partial\vec B}{\partial x_i}. \tag{B.10}$$
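Identities like (B.9) and (B.10) can be spot-checked at a point with finite differences. The sketch below uses arbitrary smooth test fields of our own choosing (not fields from the text):

```python
import numpy as np

def A(p):  # a smooth test vector field
    x, y, z = p
    return np.array([x * y, z * z, x + y])

def B(p):  # another smooth test vector field
    x, y, z = p
    return np.array([np.sin(y), x * z, x * y])

h = 1e-5

def partial(F, p, i):
    # central difference dF/dx_i at the point p
    dp = np.zeros(3)
    dp[i] = h
    return (F(p + dp) - F(p - dp)) / (2 * h)

def div(F, p):
    return sum(partial(F, p, i)[i] for i in range(3))

def curl(F, p):
    J = np.array([partial(F, p, i) for i in range(3)])  # J[i, k] = dF_k/dx_i
    return np.array([J[1, 2] - J[2, 1], J[2, 0] - J[0, 2], J[0, 1] - J[1, 0]])

p = np.array([0.3, -1.2, 0.8])

# (B.9): div(A x B) = (curl A).B - A.(curl B)
assert np.isclose(div(lambda q: np.cross(A(q), B(q)), p),
                  np.dot(curl(A, p), B(p)) - np.dot(A(p), curl(B, p)),
                  atol=1e-6)

# (B.10): [A x (curl B)]_k = sum_i A_i dB_i/dx_k - sum_i A_i dB_k/dx_i
J = np.array([partial(B, p, i) for i in range(3)])   # J[i, k] = dB_k/dx_i
term1 = sum(A(p)[i] * J[:, i] for i in range(3))     # sum_i A_i grad(B_i)
term2 = sum(A(p)[i] * J[i, :] for i in range(3))     # (A . grad) B
assert np.allclose(np.cross(A(p), curl(B, p)), term1 - term2, atol=1e-6)
```

The check makes the ordering issue concrete: the two terms of (B.10) differ only in which index of $\partial B_k/\partial x_i$ is contracted with $A$.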
Appendix C

Gradient in Spherical Coordinates

The transformation between Cartesian and spherical coordinates is given by

$$\begin{aligned}
r &= (x^2 + y^2 + z^2)^{1/2} & x &= r\sin\theta\cos\phi\\
\theta &= \cos^{-1}(z/r) & y &= r\sin\theta\sin\phi\\
\phi &= \tan^{-1}(y/x) & z &= r\cos\theta
\end{aligned}$$

The basis vectors $\{\hat e_r, \hat e_\theta, \hat e_\phi\}$ at the point $(r,\theta,\phi)$ are given in terms of the Cartesian basis vectors by

$$\begin{aligned}
\hat e_r &= \sin\theta\cos\phi\,\hat e_x + \sin\theta\sin\phi\,\hat e_y + \cos\theta\,\hat e_z\\
\hat e_\theta &= \cos\theta\cos\phi\,\hat e_x + \cos\theta\sin\phi\,\hat e_y - \sin\theta\,\hat e_z\\
\hat e_\phi &= -\sin\phi\,\hat e_x + \cos\phi\,\hat e_y.
\end{aligned}$$

By the chain rule, if we have two sets of coordinates, say $s_i$ and $c_i$, and we know the form of a function $f(s_i)$ and the dependence of $s_i$ on $c_j$, we can find

$$\left.\frac{\partial f}{\partial c_i}\right|_c = \sum_j \left.\frac{\partial f}{\partial s_j}\right|_s \left.\frac{\partial s_j}{\partial c_i}\right|_c,$$

where $|_s$ means hold the other $s$'s fixed while varying $s_j$. In our case, the $s_j$ are the spherical coordinates $r, \theta, \phi$, while the $c_i$ are $x, y, z$. Thus

$$\vec\nabla f = \left[\left.\frac{\partial f}{\partial r}\right|_{\theta\phi}\left.\frac{\partial r}{\partial x}\right|_{yz} + \left.\frac{\partial f}{\partial\theta}\right|_{r\phi}\left.\frac{\partial\theta}{\partial x}\right|_{yz} + \left.\frac{\partial f}{\partial\phi}\right|_{r\theta}\left.\frac{\partial\phi}{\partial x}\right|_{yz}\right]\hat e_x$$
$$\quad + \left[\left.\frac{\partial f}{\partial r}\right|_{\theta\phi}\left.\frac{\partial r}{\partial y}\right|_{xz} + \left.\frac{\partial f}{\partial\theta}\right|_{r\phi}\left.\frac{\partial\theta}{\partial y}\right|_{xz} + \left.\frac{\partial f}{\partial\phi}\right|_{r\theta}\left.\frac{\partial\phi}{\partial y}\right|_{xz}\right]\hat e_y \tag{C.1}$$
$$\quad + \left[\left.\frac{\partial f}{\partial r}\right|_{\theta\phi}\left.\frac{\partial r}{\partial z}\right|_{xy} + \left.\frac{\partial f}{\partial\theta}\right|_{r\phi}\left.\frac{\partial\theta}{\partial z}\right|_{xy} + \left.\frac{\partial f}{\partial\phi}\right|_{r\theta}\left.\frac{\partial\phi}{\partial z}\right|_{xy}\right]\hat e_z$$

We will need all the partial derivatives $\partial s_j/\partial c_i$. From $r^2 = x^2 + y^2 + z^2$ we see that

$$\left.\frac{\partial r}{\partial x}\right|_{yz} = \frac{x}{r},\qquad
\left.\frac{\partial r}{\partial y}\right|_{xz} = \frac{y}{r},\qquad
\left.\frac{\partial r}{\partial z}\right|_{xy} = \frac{z}{r}.$$

From $\cos\theta = z/r = z/\sqrt{x^2+y^2+z^2}$,

$$-\sin\theta\left.\frac{\partial\theta}{\partial x}\right|_{yz} = \frac{-zx}{(x^2+y^2+z^2)^{3/2}} = \frac{-r^2\cos\theta\sin\theta\cos\phi}{r^3},$$

so

$$\left.\frac{\partial\theta}{\partial x}\right|_{yz} = \frac{\cos\theta\cos\phi}{r}.$$

Similarly,

$$\left.\frac{\partial\theta}{\partial y}\right|_{xz} = \frac{\cos\theta\sin\phi}{r}.$$

There is an extra term when differentiating with respect to $z$, from the numerator, so

$$-\sin\theta\left.\frac{\partial\theta}{\partial z}\right|_{xy} = \frac{1}{r} - \frac{z^2}{r^3} = \frac{1-\cos^2\theta}{r} = r^{-1}\sin^2\theta,$$

so

$$\left.\frac{\partial\theta}{\partial z}\right|_{xy} = -r^{-1}\sin\theta.$$

Finally, the derivatives of $\phi$ can easily be found from differentiating $\tan\phi = y/x$. Using differentials,

$$\sec^2\phi\,d\phi = \frac{dy}{x} - \frac{y\,dx}{x^2} = \frac{dy}{r\sin\theta\cos\phi} - \frac{\sin\theta\sin\phi\,dx}{r\sin^2\theta\cos^2\phi}$$
so

$$\left.\frac{\partial\phi}{\partial x}\right|_{yz} = -\frac{1}{r}\frac{\sin\phi}{\sin\theta},\qquad
\left.\frac{\partial\phi}{\partial y}\right|_{xz} = \frac{1}{r}\frac{\cos\phi}{\sin\theta},\qquad
\left.\frac{\partial\phi}{\partial z}\right|_{xy} = 0.$$

Now we are ready to plug this all into (C.1). Grouping together the terms involving each of the three partial derivatives, we find

$$\begin{aligned}
\vec\nabla f &= \left.\frac{\partial f}{\partial r}\right|_{\theta\phi}\left(\frac{x}{r}\hat e_x + \frac{y}{r}\hat e_y + \frac{z}{r}\hat e_z\right)
+ \left.\frac{\partial f}{\partial\theta}\right|_{r\phi}\left(\frac{\cos\theta\cos\phi}{r}\hat e_x + \frac{\cos\theta\sin\phi}{r}\hat e_y - \frac{\sin\theta}{r}\hat e_z\right)\\
&\quad + \left.\frac{\partial f}{\partial\phi}\right|_{r\theta}\left(-\frac{1}{r}\frac{\sin\phi}{\sin\theta}\hat e_x + \frac{1}{r}\frac{\cos\phi}{\sin\theta}\hat e_y\right)\\
&= \left.\frac{\partial f}{\partial r}\right|_{\theta\phi}\hat e_r + \frac{1}{r}\left.\frac{\partial f}{\partial\theta}\right|_{r\phi}\hat e_\theta + \frac{1}{r\sin\theta}\left.\frac{\partial f}{\partial\phi}\right|_{r\theta}\hat e_\phi.
\end{aligned}$$

Thus we have derived the form of the gradient in spherical coordinates.
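The result can be checked numerically: compute the gradient of a test function once in Cartesian coordinates and once from the spherical formula just derived, and compare. The function and the finite-difference machinery below are our own test scaffolding, not part of the derivation:

```python
import numpy as np

def grad_spherical(f, r, th, ph, h=1e-6):
    # components along e_r, e_theta, e_phi from the derived formula
    df_dr = (f(r + h, th, ph) - f(r - h, th, ph)) / (2 * h)
    df_dth = (f(r, th + h, ph) - f(r, th - h, ph)) / (2 * h)
    df_dph = (f(r, th, ph + h) - f(r, th, ph - h)) / (2 * h)
    return np.array([df_dr, df_dth / r, df_dph / (r * np.sin(th))])

# the same test function f = r * x, expressed in both coordinate systems
f_sph = lambda r, th, ph: r**2 * np.sin(th) * np.cos(ph)
f_cart = lambda x, y, z: np.sqrt(x * x + y * y + z * z) * x

r, th, ph = 2.0, 0.7, 1.1
x = r * np.sin(th) * np.cos(ph)
y = r * np.sin(th) * np.sin(ph)
z = r * np.cos(th)

h = 1e-6
grad_cart = np.array([
    (f_cart(x + h, y, z) - f_cart(x - h, y, z)) / (2 * h),
    (f_cart(x, y + h, z) - f_cart(x, y - h, z)) / (2 * h),
    (f_cart(x, y, z + h) - f_cart(x, y, z - h)) / (2 * h),
])

# basis vectors at (r, theta, phi), as given at the start of this appendix
e_r = np.array([np.sin(th) * np.cos(ph), np.sin(th) * np.sin(ph), np.cos(th)])
e_th = np.array([np.cos(th) * np.cos(ph), np.cos(th) * np.sin(ph), -np.sin(th)])
e_ph = np.array([-np.sin(ph), np.cos(ph), 0.0])

gs = grad_spherical(f_sph, r, th, ph)
grad_from_sph = gs[0] * e_r + gs[1] * e_th + gs[2] * e_ph
assert np.allclose(grad_cart, grad_from_sph, atol=1e-5)
```

Any smooth test function away from the $\sin\theta = 0$ axis works equally well here.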
Bibliography

[1] Howard Anton. Elementary Linear Algebra. John Wiley, New York, 1973. QA251.A57, ISBN 0-471-03247-6.

[2] V. I. Arnol'd. Mathematical Methods of Classical Mechanics. Springer-Verlag, New York, 1984. QA805.A6813.

[3] R. Creighton Buck. Advanced Calculus. McGraw-Hill, 1956.

[4] Herbert Goldstein. Classical Mechanics. Addison-Wesley, Reading, Massachusetts, second edition, 1980. QA805.G6.

[5] I. S. Gradshtein and I. M. Ryzhik. Table of Integrals, Series, and Products. Academic Press, New York, 1965. QA55.R943.

[6] L. D. Landau and E. M. Lifshitz. Mechanics. Pergamon Press, Oxford, 2nd edition, 1969. QA805.L283/1976.

[7] Jerry B. Marion and Stephen T. Thornton. Classical Dynamics. Harcourt Brace Jovanovich, San Diego, 3rd edition, 1988. QA845.M38/1988.

[8] R. A. Matzner and L. C. Shepley. Classical Mechanics. Prentice Hall, Englewood Cliffs, NJ, 1991. QC125.2.M37, ISBN 0-13-137076-6.

[9] Morris Edgar Rose. Elementary Theory of Angular Momentum. Wiley, New York, 1957. QC174.1.R7.

[10] Walter Rudin. Principles of Mathematical Analysis. McGraw-Hill, New York, 1953.
[11] M. Spivak. Differential Geometry, volume 1. Publish or Perish, Inc., 1970.

[12] Keith R. Symon. Mechanics. Addison-Wesley, Reading, Massachusetts, 3rd edition, 1971. QC125.S98/1971, ISBN 0-201-07392-7.

[13] Eugene Wigner. Group Theory and Its Application to the Quantum Mechanics of Atomic Spectra. Academic Press, New York, 1959.
Index

O(N), 91
1-forms, 148
acoustic modes, 143
action, 47
action-angle, 186
active, 88
adiabatic invariant, 210
angular momentum, 9
antisymmetric, 95
apogee, 72
apsidal angle, 76
associative, 91
attractor, 29
autonomous, 24
bac-cab, 77, 98, 232
body cone, 109
body coordinates, 86
Born-Oppenheimer, 131
canonical transformation, 153
canonical variables, 153
center of mass, 10
centrifugal barrier, 68
Chandler wobble, 113
closed, 165
closed under, 91
complex structure on phase space, 151
composition, 89
conditionally periodic motion, 192
configuration space, 6, 46
conformal, 124
conservative force, 8
conserved, 6
conserved quantity, 6
continuum limit, 139
cotangent bundle, 21
D'Alembert's Principle, 42
diffeomorphism, 176
differential cross section, 82
differential k-form, 160
Dirac delta function, 144
dynamical balancing, 106
dynamical systems, 23
eccentricity, 72
electrostatic potential, 57
elliptic fixed point, 32
enthalpy, 150
Euler's equations, 108
Euler's Theorem, 92
exact, 149, 164
extended phase space, 7, 171
exterior derivative, 163
exterior product, 161
fixed points, 27
form invariant, 178
free energy, 150
functional, 47
gauge invariance, 58
gauge transformation, 58
generalized force, 18
generalized momentum, 50
generating function of the canonical transformation, 173
generator, 95
Gibbs free energy, 150
glory scattering, 83
group, 91
group multiplication, 91
Hamilton's characteristic function, 181
Hamilton's equations of motion, 54
Hamilton's principal function, 181
Hamilton-Jacobi, 181
Hamiltonian, 53, 149
Hamiltonian density, 223
hermitean conjugate, 119
herpolhode, 111
holonomic constraints, 14
hyperbolic fixed point, 29
ignorable coordinate, 50
impact parameter, 81
independent frequencies, 192
inertia ellipsoid, 110
inertia tensor, 99
inertial coordinates, 86
integrable system, 189
intrinsic spin, 180
invariant plane, 111
invariant sets of states, 27
inverse, 87
involution, 189
Jacobi identity, 156
kinetic energy, 7
Kronecker delta, 87
lab angle, 109
Lagrangian, 38
Lagrangian density, 143, 219
Laplace-Runge-Lenz vector, 77
Legendre transformation, 149
Levi-Civita, 96
line of nodes, 115
Liouville's theorem, 159
logistic equation, 23
magnetic vector potential, 57
major axis, 72
mass matrix, 55
mean motion Hamiltonian, 213
minor axis, 72
moment of inertia, 101
momentum, 6
natural symplectic structure, 169
Noether's theorem, 225
non-degenerate, 170
nondegenerate system, 192
normal modes, 129
nutation, 118
oblate, 113
optical modes, 143
orbit, 6
orbital angular momentum, 180
order of the dynamical system, 23
orthogonal, 87
parallel axis theorem, 101
passive, 88
perigee, 72
period, 25
periodic, 25
perpendicular axis theorem, 104
phase curve, 23, 27
phase point, 22, 27
phase space, 7, 21
phase trajectory, 173
Poincaré's Lemma, 165
point transformation, 40, 153
Poisson bracket, 156
Poisson's theorem, 159
polhode, 110
potential energy, 8
precessing, 118
precession of the perihelion, 73
principal axes, 105
rainbow scattering, 83
reduced mass, 66
relation among the frequencies, 192
rotation, 89
rotation about an axis, 89
scattering angle, 80
semi-major axis, 73
separatrix, 32
sign of the permutation, 161
similar, 120
similarity transformation, 120
stable, 28, 31
Stokes' Theorem, 168
stream derivative, 39
stress-energy, 222
strongly stable, 29
structurally stable, 28
subgroup, 92
summation convention, 157
symplectic, 154
symplectic structure, 24
terminating motion, 28
torque, 9
total external force, 10
total mass, 10
total momentum, 10
trajectory, 6
transpose, 87, 119
turning point, 70
turning points, 71
unimodular, 92
unperturbed system, 194
unstable, 29
velocity function, 22
vibrations, 131
virtual displacement, 42
wedge product, 161
work, 7