Methods Pages2011
Methods Pages2011
ABSTRACT
Books
1
5.9 Analytic Continuation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
5.10 The Gamma Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
5.11 The Riemann Zeta Function . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
5.12 Asymptotic Expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
5.13 Method of Steepest Descent . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
2
1 First and Second-order Differential Equations
It is a phenomenological fact that most of the fundamental equations that arise in physics
are of second order in derivatives. These may be spatial derivatives, or time derivatives in
various circumstances. We call the spatial coordinates and time, the independent variables
of the differential equation, while the fields whose behaviour is governed by the equation
are called the dependent variables. Examples of dependent variables are the electromag-
netic potentials in Maxwells equations, or the wave function in quantum mechanics. It is
frequently the case that the equations are linear in the dependent variables. Consider, for
example, the scalar potential in electrostatics, which satisfies
2 = 4 (1.1)
where is the charge density. The potential appears only linearly in this equation, which
is known as Poissons equation. In the case where there are no charges present, so that the
right-hand side vanishes, we have the special case of Laplaces equation.
Other linear equations are the Helmholtz equation 2 +k2 = 0, the diffusion equation
2 /t = 0, the wave equation 2 c2 2 /t2 = 0, and the Schrodinger equation
h2 /(2m)2 + V ih /t = 0.
The reason for the linearity of most of the fundamental equations in physics can be traced
back to the fact that the fields in the equations do not usually act as sources for themselves.
Thus, for example, in electromagnetism the electric and magnetic fields respond to the
sources that create them, but they do not themselves act as sources; the electromagnetic
fields themselves are uncharged; it is the electrons and other particles that carry charges
that act as the sources, while the photon itself is neutral. There are in fact generalisations
of Maxwells theory, known as Yang-Mills theories, which play a fundamental role in the
description of the strong and weak nuclear forces, which are non-linear. This is precisely
because the Yang-Mills fields themselves carry the generalised type of electric charge.
Another fundamental theory that has non-linear equations of motion is gravity, described
by Einsteins general theory of relativity. The reason here is very similar; all forms of energy
(mass) act as sources for the gravitational field. In particular, the energy in the gravitational
field itself acts as a source for gravity, hence the non-linearity. Of course in the Newtonian
limit the gravitational field is assumed to be very weak, and all the non-linearities disappear.
In fact there is every reason to believe that if one looks in sufficient detail then even
the linear Maxwell equations will receive higher-order non-linear modifications. Our best
3
candidate for a unified theory of all the fundamental interactions is string theory, and the
way in which Maxwells equations emerge there is as a sort of low-energy effective theory,
which will receive higher-order non-linear corrections. However, at low energy scales, these
terms will be insignificantly small, and so we wont usually go wrong by assuming that
Maxwells equations are good enough.
The story with the order of the fundamental differential equations of physics is rather
similar too. Maxwells equations, the Schrodinger equation, and Einsteins equations are all
of second order in derivatives with respect to (at least some of) the independent variables. If
you probe more closely in string theory, you find that Maxwells equations and the Einstein
equations will also receive higher-order corrections that involve larger numbers of time and
space derivatives, but again, these are insignificant at low energies. So in some sense one
should probably ultimately take the view that the fundamental equations of physics tend to
be of second order in derivatives because those are the only important terms at the energy
scales that we normally probe.
We should certainly expect that at least second derivatives will be observable, since
these are needed in order to describe wave-like motion. For Maxwells theory the existence
of wave-like solutions (radio waves, light, etc.) is a commonplace observation, and probably
in the not too distant future gravitational waves will be observed too.
Differential equations involving only one independent variable are called ordinary differen-
tials equations, or ODEs, by contrast with partial differential equations, or PDEs, which
have more than one independent variable. Even first-order ODEs can be complicated.
One situation that is easily solvable is the following. Suppose we have the single first-
order ODE
dy(x)
= F (x) . (1.2)
dx
Rx
The solution is, of course, simply given by y(x) = dx F (x ) (note that x here is just a
name for the dummy integration variable). This is known as reducing the problem to
quadratures, meaning that it now comes down to just performing an indefinite integral.
Of course it may or may not be be that the integral can be evaluated explicitly, but that is
a different issue; the equation can be regarded as having been solved.
The most general first-order ODE takes the form
dy(x)
= F (x, y) . (1.3)
dx
4
A special class of function F (x, y) for which we can again easily solve the equation explicitly
is if
P (x)
F (x, y) = , (1.4)
Q(y)
implying that (1.3) becomes P (x) dx + Q(y) dy = 0, since then we can reduce the solution
to quadratures, with Z Z
x y
dx P (x ) + dy Q(y ) = 0 . (1.5)
P (x, y)
F (x, y) = , (1.6)
Q(x, y)
and if the differential P (x, y) dx + Q(x, y) dy is exact, which means that we can find a
function (x, y) such that
Of course there is no guarantee that such a will exist. Clearly a necessary condition is
that
P (x, y) Q(x, y)
= , (1.8)
y x
since d = /x dx + /y dy, which implies we must have
= P (x, y) , = Q(x, y) , (1.9)
x y
2 2
= . (1.10)
xy yx
In fact, one can also see that (1.8) is sufficient for the existence of the function ; the
condition (1.8) is known as an integrability condition for to exist. If exists, then solving
the differential equation (1.3) reduces to solving d = 0, implying (x, y) = c =constant.
Once (x, y) is known, this implicitly gives y as a function of x.
If P (x, y) and Q(x, y) do not satisfy (1.8) then all is not lost, because we can recall that
solving the differential equation (1.3), where F (x, y) = P (x, y)/Q(x, y) means solving
P (x, y) dx + Q(x, y) dy = 0, which is equivalent to solving
5
where (x, y) is some generically non-vanishing but as yet otherwise arbitrary function. If
we want the left-hand side of this equation to be an exact differential,
Consider the case where the function F (x, y) appearing in (1.3) is linear in y, of the form
F (x, y) = p(x) y + q(x). Then the differential equation becomes
dy
+ p(x) y = q(x) , (1.14)
dx
which is in fact the most general possible form for a first-order linear equation. The equation
can straightforwardly be solved explicitly, since now it is rather easy to find the required
integrating factor that renders the left-hand side an exact differential. In particular, is
just a function of x here. Thus we multiply (1.14) by (x),
dy
(x) + (x) p(x) y = (x) q(x) , (1.15)
dx
and require (x) to be such that the left-hand side can be rewritten as
dy d((x) y)
(x) + (x) p(x) = , (1.16)
dx dx
so that (1.15) becomes
dy d(x)
(x) + y = (x) q(x) . (1.17)
dx dx
Differentiating the right-hand side of (1.16), we see that (x) must be chosen so that
d(x)
= (x) p(x) , (1.18)
dx
implying that we shall have
Z x
(x) = exp dx p(x ) . (1.19)
6
(The arbitrary integration constant just amounts to a constant additive shift of the integral,
and hence a constant rescaling of (x), which obviously is an arbitrariness in our freedom
to choose an integrating factor.)
With (x) in principle determined by the integral (1.19), it is now straightforward to
integrate the differential equation written in the form (1.16), giving
Z x
1
y(x) = dx (x ) q(x ) . (1.20)
(x)
Note that the arbitrariness in the choice of the lower limit of the integral implies that y(x)
has an additive part y0 (x) amounting to an arbitrary constant multiple of 1/(x),
Z x
y0 (x) = C exp dx p(x ) . (1.21)
This is the general solution of the homogeneous differential equation where the source
term q(x) is taken to be zero. The other part, y(x) y0 (x) in (1.20) is the particular
integral, which is a specific solution of the inhomogeneous equation with the source term
q(x) included.
If the equation of motion in a particular problem has sufficient symmetries of the appropriate
type, we can sometimes reduce the problem to one involving only ordinary differential
equations. A simple example of the type of symmetry that can allow this is the spatial
translation symmetry of the Laplace equation 2 = 0 or Helmholtz equation 2 +k2 =
0 written in Cartesian coordinates:
2 2 2
+ + + k2 = 0 . (2.1)
x2 y 2 z 2
Clearly, this equation retains the same form if we shift x, y and z by constants,
x x + c1 , y y + c2 , z z + c3 . (2.2)
This is not to say that any specific solution of the equation will be invariant under (2.2), but
it does mean that the solutions must transform in a rather particular way. To be precise, if
(x, y, z) is one solution of the differential equation, then (x + c1 , y + c2 , z + c3 ) must be
another.
7
As is well known, we can solve (2.1) by looking for solutions of the form (x, y, z) =
X(x) Y (y) Z(z). Substituting into (2.1), and dividing by , gives
1 d2 X 1 d2 Y 1 d2 Z
+ + + k2 = 0 . (2.3)
X dx2 Y dy 2 Z dz 2
The first three terms on the left-hand side could depend only on x, y and z respectively, and
so the equation can only be consistent for all (x, y, z) if each term is separately constant,
d2 X d2 Y d2 Z
+ a21 X = 0 , + a22 Y = 0 , + a23 Z = 0 , (2.4)
dx2 dy 2 dz 2
where the constants satisfy
a21 + a22 + a23 = k2 , (2.5)
The separation constants ai can be either real, giving oscillatory solutions in that coordinate
direction, or imaginary, giving exponentially growing and decaying solutions, provided that
the sum (2.5) is satisfied. It will be the boundary conditions in the specific problem being
solved that determine whether a given separation constant ai should be real or imaginary.
The general solution will be an infinite sum over all the basic exponential solutions,
X
(x, y, z) = c(a1 , a2 , a3 ) eia1 x eia2 y eia3 z . (2.7)
a1 ,a2 ,a3
where the separation constants (a1 , a2 , a3 ) can be arbitrary, save only that they must satisfy
the constraint (2.5). At this stage the sums in (2.7) are really integrals over the continuous
ranges of (a1 , a2 , a3 ) that satisfy (2.5). Typically, the boundary conditions will ensure that
there is only a discrete infinity of allowed triplets of separation constants, and so the integrals
becomes sums. In a well-posed problem, the boundary conditions will also fully determine
the values of the constant coefficients c(a1 , a2 , a3 ).
Consider, for example, a potential-theory problem in which a hollow cube of side 1 is
composed of conducting metal plates, where five of them are held at potential zero, while
the sixth is held at a specified potential V (x, y). The task is to calculate the electrostatic
potential (x, y, z) everywhere inside the cube. Thus we must solve Laplaces equation
2 = 0 , (2.8)
8
(we take the face at z = 1 to be at some specified potential V (x, y), with the other five
faces at zero potential.)
Since we are solving Laplaces equation, 2 = 0, the constant k appearing in the
Helmholtz example above is zero, and so the constraint (2.5) on the separation constants is
just
a21 + a22 + a23 = 0 (2.10)
X(x) = A ei a1 x + B ei a1 x (2.11)
for X(x), one must choose the constants so that B = A, and hence
Thus we have either the sine function, if a1 is real, or the hypebolic sinh function, if a1 is
imaginary. But we also have the boundary condtion that (1, y, z) = 0, which means that
X(1) = 0. This determines that a1 must be real, so that we get oscillatory functions for
X(x) that can vanish at x = 1 as well as at x = 0. Thus we must have
Note that we now indeed have a sum over a discrete infinity of separation constants.
Finally, the boundary condition (x, y, 1) = V (x, y) on the remaining face at z = 1 tells
us that
X X p
V (x, y) = bmn sin(m x) sin(n y) sinh( m2 + n 2 ) . (2.15)
m>0 n>0
This allows us to determine the constants bmn . We use the orthogonality of the sine func-
tions, which in this case is the statement that if m and p are integers we must have
Z 1
dx sin(m x) sin(p x) = 0 (2.16)
0
9
if p and m are unequal, and
Z 1
1
dx sin(m x) sin(p x) = 2 (2.17)
0
if p and m are equal.1 This allows us to pick out the term m = p, n = q in the double
summation (2.15), by multiplying by sin(p x) sin(q y) and integrating over x and y:
Z 1 Z 1 q
dx dy V (x, y) sin(p x) sin(q y) = 14 bpq sinh( p2 + q 2 ) , (2.18)
0 0
and so Z Z
4 1 1
bpq = p dx dy V (x, y) sin(p x) sin(q y) . (2.19)
sinh( p2 + q 2 ) 0 0
Once the precise specification of V (x, y) on the face at z = 1 is given, the integrals can be
evaluated and the constants bpq determined explicitly.
Suppose, as a specific example, V (x, y) is just a constant, V0 , on the face at z = 1. Since
R1
0 dx sin(p x) = [1 (1)p ]/(p ), we thene find that bpq is nonzero only when p and q
are odd, and
16V0
bp,q = p , p, q odd . (2.20)
p q 2 sinh( p2 + q 2 )
All the constants in the original general solution of Laplaces equation have now been
determined, and the problem is solved.
Another common example of separability arises when solving the Laplace or Helmholtz equa-
tion in spherical polar coordinates (r, , ). These are related to the Cartesian coorindates
(x, y, z) in the standard way:
1 2 1
r + 2 2(,) + k2 = 0 , (2.22)
r 2 r r r
where 2(,) is the two-dimensional Laplace operator on the surface of the unit-radius
sphere,
1 1 2
2(,) sin + 2 . (2.23)
sin sin 2
1
Just use the rules for multiplying products of sine functions to show this. What we are doing here is
constructing a Fourier series expansion for the function V , which happens to be taken to be a constant in
our example.
10
The Helmholtz equation in spherical polar coordinates can be separated by first writing
(r, , ) in the form
1
(r, , ) = R(r) Y (, ) . (2.24)
r
Substituting into the Helmholtz equation (2.22), and dividing out by in the usual way,
we get
r 2 d2 R 1
2
+ r 2 k2 + 2(,) Y = 0 . (2.25)
R dr Y
(It is useful to note that r 2 (r 2 /r)/r is the same thing as r 1 2 (r )/r 2 when doing
this calculation.)
The last term in (2.25) depends only on and , while the first two terms depend only
on r, and so consistency for all (r, , ) therefore means that the last term must be constant,
and so
d2 R
2(,) Y = Y , = k 2
R. (2.26)
dr 2 r2
The key point now is that one can show that the harmonics Y (, ) on the sphere are well-
behaved only if the separation constant takes a certain discrete infinity of non-negative
values. The most elegant way to show this is by making use of the symmetry properties of
the sphere, but since this takes us away from the main goals of the course, we will not follow
that approach here.2 Instead, we shall follow the more traditional, if more pedestrian,
approach of examining the conditions under which singular behaviour of the eigenfunction
solutions of the differential equation can be avoided.
To study the eigenvalue problem 2(,) Y = Y in detail, we make a further separation
of variables by taking Y (, ) to be of the form Y (, ) = () (). Substituting this in,
2
The essential point is that the surface of the unit sphere can be defined as x2 + y 2 + z 2 = 1, and this is
invariant under transformations of the form
x x
y M y ,
z z
where M is any constant 3 3 orthogonal matrix, satisfying M T M = 1l. This shows that the sphere is
invariant under the 3-parameter group O(3), and hence the eigenfunctions Y must fall into representations
under O(3). The calculation of the allowed values for , and the forms of the associated eigenfunctions Y ,
then follow from group-theoretic considerations. Anticipating the result that we shall see by other means,
the eigenvalues take the form = ( + 1), where is any non-negative integer. The eigenfunctions
are classified by and a second integer m, with m , and are the well-known spherical harmonics
Ym (, ). The fact that depends on but not m means that the eigenvalue = ( + 1) has a degeneracy
(2 + 1).
11
and multiplying by sin2 Y 1 , we get
1 d d 1 d2
sin sin + sin2 + = 0. (2.27)
d d d2
By now-familiar arguments the last term depends only on , while the first two depend only
on . Consistency for all and therefore implies that the last term must be a constant,
and so we have
d2
+ m2 = 0 , (2.28)
d2
d d
sin sin + ( sin2 m2 ) = 0 . (2.29)
d d
d dy m2
(1 x2 ) + y = 0. (2.30)
dx dx 1 x2
This equation is called the Associated Legendre Equation, and it will become necessary to
study its properties, and solutions, in some detail in order to be able to construct solutions
of the Laplace or Helmholtz equation in spherical polar coordinates. We shall do this in
section 3 below. In fact, as we shall see, it is convenient first to study the simpler equation
when m = 0, which corresponds to the case where the harmonics Y (, ) on the sphere are
independent of the azimuthal angle . The equation (2.30) in the case m = 0 is called the
Legendre Equation.
Another important second-order equation that can arise from the separation of variables is
Bessels equation, Suppose we are solving Laplaces equation in cylindrical polar coordinates
(, , z), so that we have
1 1 2 2
+ 2 + = 0. (2.31)
2 z 2
12
We can separate variables by writing (, , z) = R() () Z(z), which leads, after dividing
out by , to
1 d dR 1 d2 1 d2 Z
+ 2 + = 0. (2.32)
R d d d2 Z dz 2
The first two terms depend on and but not z, whilst the last term depends on z but
not and . Thus the last term must be a constant, which we shall call k2 , and then
1 d dR 1 d2
+ 2 + k2 = 0 . (2.33)
R d d d2
Multiplying by 2 , we obtain
d dR 1 d2
+ k2 2 + = 0. (2.34)
R d d d2
The first two terms depend on but not , whilst the last term depends on but not .
We deduce that the last term is a constant, which we shal call 2 . The separation process
is now complete, and we have
d2 Z 2 d2
k Z = 0 , + 2 = 0 , (2.35)
dz 2 d2
d2 R 1 dR 2 2
+ + k 2 R = 0, (2.36)
d2 d
where k2 and 2 are separation constants. Rescaling the radial coordinate by defining
x = k , and renaming R as y, the last equation takes the form
d2 y dy
x2 +x + (x2 2 ) y = 0 . (2.37)
dx2 dx
We shall now turn to a detailed study of the solutions of the associated Legendre equation,
which we obtained in our separation of variables in spherical polar coordinates in section
2.2.
We begin by considering the simpler case where the separation constant m is zero, implying
that the associated Legendre equation (2.30) reduces to the Legendre equation
[(1 x2 ) y ] + y = 0 . , (3.1)
13
which we can also write as
(1 x2 ) y 2x y + y = 0 . (3.2)
Note that here we are denoting a derivative with respect to x by a prime, so that dy/dx is
written as y , and so on. We shall use (3.2) to introduce the method of solution of linear
ODEs by series solution, known sometimes as the Frobenius Method.
The idea essentially is to develop a solution as a power series in the independent variable
x, with expansion coefficients determined by substituting the series into the differential
equation, and equating terms order by order in x. The method is of wide applicability; here
we shall take the Legendre equation as an example to illustrate the procedure.
We begin by writing the series expansion
X
y= an xn . (3.3)
n0
(In more general circumstances, which we shall study later, we shall need to consider series
P
expansions of the form y(x) = (n)0 an xn+ , where may not necessarily be an integer.
But in the present case, for reasons we shall see later, we do not need the x factor at all.)
Clearly we shall have
X X
y = n an xn1 , y = n (n 1) an xn2 . (3.4)
n0 n0
Since we want to equate terms order by order in x, it is useful to shift the summation
variable by 2 in the first term, by writing n = m + 2;
X X X
n (n 1) an xn2 = (m + 2)(m + 1) am+2 xm = (m + 2)(m + 1) am+2 xm . (3.6)
n0 m2 m0
(The last step, where we have dropped the m = 2 and m = 1 terms in the summation,
clearly follows from the fact that the (m + 2)(m + 1) factor gives zero for these two values
of m.) Finally, relabelling m as n again, we get from (3.5)
X
(n + 2)(n + 1) an+2 + ( n (n + 1)) an xn = 0 . (3.7)
n0
Since this must hold for all values of x, it follows that the coefficient of each power of x
must vanish separately, giving
14
for all n 0. Thus we have the recursion relation
n (n + 1)
an+2 = an . (3.9)
(n + 1)(n + 2)
We see from (3.9) that all the coefficients an with n 2 can be solved for, in terms of a0
and a1 . In fact all the an for even n can be solved for in terms of a0 , while all the an for odd
n can be solved for in terms of a1 . Since the equation is linear, we can take the even-n series
and the odd-n series as the two independent solutions of the Legendre equation, which we
can call y1 (x) and y2 (x):
y(1) (x) = a0 + a2 x2 + a4 x4 + ,
The first solution involves only the even an , and thus has only even powers of x, whilst
the second involves only the odd an , and has only odd powers of x. We can conveniently
consider the two solutions separately, by taking either a1 = 0, to discuss y(1) , or else taking
a0 = 0, to discuss y(2) .
Starting with y(1) , we therefore have from (3.9) that a2 = 12 a0 , a3 = 0, a4 = 1
12 (6
) a2 , a5 = 0, etc.. In the expression for a4 , we can substitute the expression already found
for a2 , and so on. Thus we will get
a2 = 12 a0 , 1
a4 = 24 (6 ) a0 , ...
a3 = a5 = a7 = = 0 . (3.11)
To discuss the solution y(2) instead, we can take a0 = 0 and a1 6= 0. The recursion
relation (3.9) now gives a2 = 0, a3 = 16 (2 ) a1 , a4 = 0, a5 = 1
20 (12 ) a3 , a5 = 0, etc.,
and so we find
1 1
a3 = 6 (2 ) a1 , a5 = 120 (2 ) (12 ) a1 , ...
a2 = a4 = a6 = = 0 . (3.13)
15
To summarise, we have produced two independent solutions to our differential equation
(3.2), which are given by (3.12) and (3.14). The fact that they are independent is obvious,
since the first is an even function of x whilst the second is an odd function. To make this
precise, we should say that y(1) (x) and y(2) (x) are linearly-independent, meaning that the
only possible solution for constants and in the equation
is = 0 and = 0. In other words, y(1) (x) and y(2) (x) are not related by any constant
factor of proportionality. We shall show later that any second-order ordinary differential
equation must have exactly two linearly-independent solutions, and so with our solutions
y(1) (x) and y(2) (x) established to be linearly-independent, this means that we have obtained
the most general possible solution to (3.2).
The next question is what can we do with our series solutions (3.12) and (3.14). They
are, in general, infinite series. Whenever one encounters an infinite series, one needs to
worry about whether the series converges to a finite result. For example, the series
X
S1 2n = 1 + 1
2 + 1
4 + 1
8 + 1
16 + (3.16)
n0
For our solutions (3.12) and (3.14), we must find out for what range of values of x do the
series converge.
One way to test the converge of a series is by applying the ratio test. This test says
P
that the series f = n0 wn converges if the ratio Rn wn+1 /wn is less than 1 in the
limit n . The series converges if R < 1, it diverges if R > 1, and one gains no
information in the marginal case where R = 1. We wont prove the ratio test here, but
it is clearly plausible. It essentially says that if each successive term in the series (at least
when we go a long way down the series) is smaller than its predecessor by some fraction
less than 1, then the sum of all the terms is finite. If on the other hand each successive
term is bigger than its predecessor by some factor greater than 1, then the sum of all the
16
terms will be infinite. We can try out the ratio test on the three examples in (3.16), (3.17)
and (3.18). Sure enough, for the convergent series (3.16) we find the ratio of the (n + 1)th
term to the nth term is
1/2n+1 1
Rn = 2 (3.19)
1/2n
1
and so this has the limit R = 2 which is less than 1. For the second example (3.17) we
have Rn = 2, and so R = 2. The ratio test therefore predicts that this series will diverge.
For the third example, we see from (3.18) that
n+1
Rn = , (3.20)
n+2
and so R = 1. The ratio test doesnt give us any result in this case therefore. However, a
more involved calculation will show that the series (3.18) diverges.
Going back to our series solutions (3.3), we have
n (n + 1) 2
Rn = x . (3.22)
(n + 1) (n + 2)
For sufficiently large n we can neglect the contribution from the fixed given value of , and
so the terms proportional to n2 in the numerator and denominator dominate at large n.
Thus we have
R = x2 . (3.23)
Recall that in our problem, x = cos , and so we are interested in x in the range 1 x 1.
If |x| < 1, the ratio test tells us that the series converges. However, we would also like to
know what happens at x = 1, since these points correspond to = 0 and = , the north
and south poles of the sphere. Here, the ratio test fails to give us any information, although
it does tell us that the series diverges for |x| > 1.
A more sophisticated analysis shows that the series will in fact always diverge at x = 1,
unless takes a value such that the series terminates. Obviously, if the series terminates
after a finite number of terms, then there can be no possibility of the sum diverging. For
the termination to occur, the numerator in (3.9) must vanish for some value of n. Clearly,
a necessary condition for this to occur is that must be a positive integer of the form
( + 1), where is an integer, which we may assume to be non-negative. In fact the even
series for y1 (x) terminates if = ( + 1), where is an even non-negative integer, whilst
17
the odd series for y(2) terminates if is an odd positive integer. Once an becomes zero for
some value of n, it is obvious from the recursion relation (3.9) that all the higher coefficients
an+2 , an+4 , . . . will vanish too.
As an example to illustrate the divergent behaviour if the series does not terminate,
consider the odd series y2 (x), with = 0. From (3.9) we then have an+2 = n an /(n + 2)
(with n odd), which has the solution an = a1 /n. Thus the series (3.3) becomes
y = a0 (x + 13 x3 + 51 x5 + 17 x7 + ) , (3.24)
= ( + 1) , (3.26)
It is actually not too hard to solve the recursion relation (3.9) explicitly, to obtain the
general expression for the Legendre polynomials.3 One finds
[/2]
X (1)k (2 2k)!
P (x) = x2k . (3.28)
k=0
2 k! ( k)! ( 2k)!
3
Solving recursion relations from scratch is not always easy. However, something that is easy is to verify
that a claimed solution does indeed solve the recursion relation. It is leaft as an exercise for the reader to
verify that (3.28) does indeed solve (3.9).
18
A similar analysis for the case where m is non-zero shows that the associated Legendre
equation (2.30) has solutions regular at x = 1 only if is a non-negative integer, and
m is an integer taking any of the values in the range m . The corresponding
eigenfunctions are the associated Legendre functions Pm (x). It can be shown that these
are related to the Legendre polynomials P (x) by the formula
dm P (x)
Pm (x) = (1)m (1 x2 )m/2 . (3.29)
dxm
We shall return to these later.
The Legendre polynomials P (x) are the basic set of regular solutions of the Legendre
equation,
d dP (x)
(1 x2 ) + ( + 1) P (x) = 0 , (3.30)
dx dx
and this is the equation that arose (in the azimuthally-symmetric case) when separating
variables in spherical polar coordinates. It follows that in order to construct solutions of
the Laplace equation by the method of separating the variables, we shall therefore need to
have a thorough understanding of the properties of the Legendre polynomials.
The basic technique that one uses for solving an equation such as the Laplace equation
in spherical polar coordinates is parallel to that which we used in section (2.1) when we
discussed the analogous problem in Cartesian coordinates. Namely, we write down the
most general possible solution (obtained by separation of variables), and then determine
the constant coefficients in the expansion by matching to given boundary data. As we shall
see below, this means in particular that we need to be able to determine the coefficients a
in the expansion of an arbitrary function f (x) in terms of Legendre polynomials;
X
f (x) = a P (x) . (3.31)
0
For now we shall just assume that such an expansion is possible; the proof is a little involved,
and we shall postpone this until a bit later in the course, where we shall prove it in a much
more general context.4
The essential requirement in order to be able to determine the constants a is to know
some appropriate orthogonality relation among the Legendre polynomials. Specifically, we
can show that Z 1
dx P (x) Pn (x) = 0 , 6= n , (3.32)
1
4
The series (3.31) is a generalisation of the familiar Fourier series.
19
and Z 1
dx P (x) Pn (x) = Cn , = n. (3.33)
1
The constants Cn are calculable (once one has defined a normalisation for the Legendre
polynomials), and we shall calculate them, and prove the orthogonality condition (3.32)
below. It is clear that with these results we can then calculate the coefficients a in the
series expansion (3.31). We just multiply (3.31) by Pn (x) and integrate over x, to get
Z 1 X Z 1
dx Pn (x) f (x) = a dx P (x) Pn (x) ,
1 0 1
= an Cn . (3.34)
(It is understood that P and Pn here are functions of x, and that a prime means d/dx.)
We now integrate over x, from x = 1 to x = +1, and note that using an integration by
parts we shall have
Z 1 Z 1 h i1
dx [(1 x2 ) P ] Pn = dx [(1 x2 ) P Pn + (1 x2 ) P (x) Pn (x) . (3.37)
1 1 1
The boundary terms here at x = 1 vanish, because of the (1 x2 ) factor. Thus after
integrating (3.36) and integrating by parts on the first two terms, we get simply
Z 1
[ ( + 1) n (n + 1)] dx P (x) Pn (x) = 0 . (3.38)
1
Since it is always understood that and n are non-negative integers, we see that ( + 1)
is equal to n (n + 1) only if = n. Thus if have proved the orthogonality of the Legendre
polynomials; if and n are not equal, then (3.39) is satisfied.
The next step takes a little more work. We need to calculate the constants Cn occur-
ring in the integral (3.33). Of course we can only do that once we have decided upon a
20
normalisation for the Legendre polynomials P (x). By convention, they are defined to be
such that
P (1) = 1 . (3.40)
In order to evaluate the integral in (3.33), we now need to have an explicit way of expressing
the Legendre polynomials. It turns out that a convenient way to do this is in terms of
a representation called Rodrigues Formula. This formula asserts that P (x), with the
normalisation (3.40), can be written as
1 d
P (x) = (x2 1) . (3.41)
2 ! dx
We can prove Rodrigues formula in two stages. First, we shall prove that it gives some
constant multiple of the previously-defined Legendre polynomial P (x). Then, we shall
prove that in fact from the definition (3.41), we shall have P (1) = 1. To prove the first
stage, we begin by defining
u = (x2 1) . (3.42)
d
= . (3.43)
dx
(1 x2 )u + 2xu = 0 . (3.44)
Now differentiate this ( + 1) times. Recall that Liebnitzs rule for the differentiation of a
product of functions f and g is (f g) = f g + (f )g. If we differentiate the product twice,
we get 2 (f g) = f 2 g + 2(f )(g) + ( 2 f )g. Differentiating n times gives
where we recognise the coefficients as the binomial coefficients that arise in the expansion
of (x + y)n .
Now back to our ( + 1)-fold differentiation of (3.44). For g we shall take either u or u,
and f will correspondingly be either (1 x2 ) or 2x. Note that we only have to go a short
way down the series in (3.45), because more than two derivatives kills (1 x2 ), and more
than one kills 2x. Thus we get, collecting up the terms,
(1 x2 ) +2 u 2x +1 u + ( + 1) u = 0. (3.46)
21
In other words, if we define y = u, which means y = (x2 1) , then we have shown
that y satisfies the Legendre equation
Thus we have completed stage 1 of the proof: we have shown that P (x) defined by (3.41)
indeed satisfies the the Legendre equation, and since (3.41) manifestly gives a polynomial
solution (i.e. a finite number of terms in powers of x, up to x , as opposed to an infinite
power series), it must be proportional to the Legendre polynomial solution of the equation
that we constructed previously.
To complete the demonstration that Rodrigues formula (3.41) produces exactly the
previously-defined Legendre polynomials, we need to show that they are normalised so that
P (1) is indeed equal to 1. To do this, we first write (3.41) as
1 d h
i
P (x) = (x 1) (x + 1) . (3.48)
2 ! dx
Now imagine differentiating out all the terms, using the Leibnitz formula (3.45), and then
setting x = 1. The only term(s) that survive will be those that do not have an undiffer-
entiated factor involving powers of (x 1). The only way to kill off the powers of (x 1)
completely is if all derivatives land on (x 1) . This clearly gives a factor of !. Since all
the derivatives needed to land on (x 1) , this means the (x + 1) factor is left undifferen-
tiated, and therefore gives 2 when we set x = 1. The 2 and the ! cancel the terms in the
prefactor denominator in (3.48), and so indeed we have confirmed that P (1) = 1.
Lest our original task has been forgotten during the course of this discussion, let us
remind ourselves that we wanted to determine the constants Cn in (3.33). That is, we want
to calculate Z 1
Cn = dx [Pn (x)]2 . (3.49)
1
where we write n instead of dn /dxn for brevity. Integrating by parts n times, and noting
that the powers of (x2 1) will kill off the resulting boundary terms, we therefore have
Z
(1)n 1
Cn = 2n dx (x2 1)n 2n (x2 1)n . (3.51)
2 (n!)2 1
Now (x2 1)(n) is a polynomial in x, which looks like x2n + , where the ellipses denote
terms of lower order in x. When we differentiate 2n times, only the first term gives a
22
contribution, and so from 2n x2n = (2n)! we find that
Z 1
(2n)!
Cn = 2n dx (1 x2 )n . (3.52)
2 (n!)2 1
Unfortunately our troubles are not yet quite over, because the integral is not going to
give up without a bit of a fight. The best way to evaluate it is by induction. We note that
we can write the following:
x d
(1 x2 )n = (1 x2 )(1 x2 )n1 = (1 x2 )n1 + (1 x2 )n . (3.53)
2n dx
Plugging this into (3.52), we see that it gives us
Z 1
2n 1 (2n 1)!
Cn = Cn1 + 2n x d[(1 x2 )n ] . (3.54)
2n 2 (n!)2 1
Thus, finally, we arrive at Cn = 2/(2n + 1), and so the normalisation of the integral of
[Pn (x)]2 is established: Z 1 2
dx[Pn (x)]2 = . (3.58)
1 2n + 1
Let us review what we have achieved. Starting from a proposed expansion of an arbitrary
function f (x) as a sum of Legendre polynomials as in (3.31);
X
f (x) = a P (x) , (3.59)
0
It is time to look at a few examples. First, we may note that it is often very helpful
to use Rodrigues formula in order to evaluate the integral (3.60). Substituting (3.41) into
(3.60), and integrating by parts, we obtain
Z
(2 + 1) h d1 i1 (2 + 1) 1 d1
a = +1 f (x) 1 (x2 1) +1 dx f (x) (x2 1) . (3.61)
2 ! dx 1 2 ! 1 dx1
23
The boundary term gives zero, since the ( 1)th derivative of (x2 1) leaves one overall
factor of (x2 1), and this vanishes at x = 1. Continuing this procedure, we can perform
( 1) further integrations by parts, ending up with
Z
(2 + 1) 1 d f (x)
a = dx (1 x2 ) . (3.62)
2+1 ! 1 dx
Notice in particular that if the given function f (x) is itself a polynomial of degree n,
then all its derivatives d f (x)/dx for > n vanish. This means that the all the expansion
coefficients a will vanish for > n. This should not be a surprise, since we know that P (x)
is itself a polynomial of degree . In fact the set of Legendre polynomials with 0 n
really form a basis for the set of all possible polynomials of degree n. For example, we
have
P0 (x) = 1 , P1 (x) = x , P2 (x) = 21 (3x2 1) , (3.63)
and we can see just by doing elementary algebra that we can re-express the general quadratic
polynomial a x2 + b x + c as
It is clear that we can do a similar expansion for any polynomial of finite degree n, and
(3.62) gives us the expressions for the coefficients a , which will be non-zero only for n.
More generally, if the function f (x) that we are expanding is non-polonomial in x, we
shall get an infinite series in (3.59).
Having constructed the Legendre polynomials, and determined their orthogonality and nor-
malisation properties, we can now use them in order to construct azimuthally-symmetric
solutions of the Laplace or Helmholtz equations. (We shall move on to case without az-
imuthal symmetry later.)
Recall, from section 2.2, that if we consider functions that are independent of the az-
imuthal angle , then the solution (r, , ) of the Laplace or Helmholtz equation was
written as
1
(r, ) = R(r) () , (3.65)
r
with and R satisfying
1 d d
sin + = 0 (3.66)
sin d d
24
and
d2 R 2
= k R. (3.67)
dr 2 r2
We determined that the functions () will only be regular at the north and south poles
of the sphere if = ( + 1) where is an integer (which can be assumed non-negative,
without loss of generality). The functions () are then the Legendre polynomials, with
() P (cos ).
Let us specialise to the case of the Laplace equation, which means that k = 0 in the
equation (3.67) for the radial function R(r). It is easy to see that with = ( + 1), the
two independent solutions of (3.67) are
R r +1 , and R r . (3.68)
It follows, therefore, that the most general azimuthal solution of the Laplace equation
2 = 0 in spherical polar coordinates can be written as
X
(r, ) = (A r + B r 1 ) P (cos ) . (3.69)
0
We established the orthogonality relations for the Legendre polynomials, given in (3.32)
and (3.33) with C eventually determined to be C = 2/(2 + 1). In terms of , related to
x by x = cos , we therefore have
Z 2
sin d P (cos ) Pn (cos ) = n , (3.70)
0 2 + 1
The symbol on the right-hand side here is the Kronecker delta function. By definition,
mn is zero if m 6= n, while it equals 1 if m = n. Thus (3.70) says that the integral on the
left-hand side vanishes unless = n, in which case it gives 2/(2 + 1).
We can use these results in order to solve problems in potential theory. Suppose, for
example, the electrostatic potential is specified everywhere on the surface of a sphere of
radius a, as (a, ) = V () for some given function V (), and it is also specified that the
potential is zero at infinity. Suppose we wish to calculate the potential (r, ) everywhere
outside the sphere. Since the potential is specified to vanish at infinity, it follows that the
coefficients A in (3.69) must all be zero, and so our solution is of the form
X
(r, ) = B r 1 P (cos ) . (3.71)
0
To determine the remaining coefficients B , we set r = a and use the given boundary data
(a, ) = V ():
X
(a, ) = V () = B a1 P (cos ) . (3.72)
0
25
R
Multiplying by Pn (cos ) and integrating over sin d, we get
Z 2
sin d V () Pn (cos ) = an1 Bn , (3.73)
0 2n + 1
whence Z
Bn = 12 (2n + 1) an+1 sin d V () Pn (cos ) . (3.74)
0
(2 + 1) ( 2)!! a+1
B = (2)(1)/2 (3.75)
1
2 2 ( + 1) !
when is odd. (Note that (2p + 1)!! means (2p + 1)(2p 1)(2p 3) 5 3 1.) The
first few terms in the solution give
h 3a2 7a4 11a6 i
(r, ) = V P1 (cos ) P3 (cos ) + P5 (cos ) + . (3.76)
2r 2 8r 4 16r 6
The series solution that we derived above is valid in the exterior region r a. With a
little work, one can prove that the series converges for all r a. However, the series diverges
if we try to apply it to the interior region r < a. This is perhaps not surprising, since it is a
series in inverse powers of r, and so the terms will all be blowing up as r approaches zero.
It is therefore necessary to look for a different power series solution in the interior region.
Suppose, then, we want to find the potential inside the sphere. Part of the specification
of the problem now requires that we be told what happens at the origin, r = 0. Let us
suppose we are told that the potential is finite there; there is, for example, no point charge
that could cause the potential to diverge as r approaches zero. It follows that the coefficients
B in the general series expansion (3.69) must all be zero, in the interior region, and so we
must look for an expansion of the form
X
(r, ) = A r P (cos ) . (3.77)
0
Again, the remaining A coefficients are determined by matching the series expansion (3.77)
to the specified boundary condition (a, ) = V () at r = a. Note, however, that there
is no need to solve this from scratch. We can just use the results of the calculations we
already did for the expansion in the exterior region r a. To avoid confusion, let us call
26
(ra)
the B coefficients we found previously B , and call the new coefficients A that we need
(ra)
for the interior region A . The point is that the two series solutions have a common
intersection of their regimes of validity, namely on the surface r = a. Therefore we have
that
X (ra) X (ra)
(a, ) = B a1 P (cos ) = A a P (cos ) . (3.78)
0 0
Since this holds for all , the linear independence of the Legendre polynomials tells us that
the two expressions must agree for each value of , and hence we have
(ra) (ra)
A = a21 B . (3.79)
To summarise, we have now obtained the complete solution to the elctrostatics problem
where the potential is specified on the spherical surface r = a. The solution comes in two
parts, valid in different regions. The series solution that converges in the exterior region
diverges in the interior region, and conversely the solution that converges in the interior
region diverges in the exterior region. By having both the solutions, we are able to cover
the entire space.
There is yet another way to define the Legendre polynomials, which is very useful in its
own right. This is via what is called a Generating Function. The claim is that
X
G(x, t) (1 2x t + t2 )1/2 = t P (x) , (3.80)
0
where, for convergence of the series, we must have |t| < 1. How do we use this to read off
the Legendre polynomials? We perform a power series expansion of the left-hand side, in
increasing powers of t. Doing this, we find that the left-hand side gives
Equating this with the right-hand side of (3.80), and comparing the coefficients of each
power of t, we read off
27
left-hand side in (3.80), and then to show that by equating the coefficients of each power of
t with those on the right-hand side, we can recognise the power-series expansion of P (x)
that we already know. This is done in detail below. A more elegant approach, however,
is as follows. We want to show that all the P (x) defined by (3.80) satisfy the Legendre
equation
(1 x2 ) P 2x P + ( + 1) P = 0 . (3.83)
is equal to zero. Now, looking at (3.80) we can see that H can be written as
But the left-hand side is just (1 t)1 , which has the binomial expansion
1 X
= 1 + t + t2 + t3 + t4 + = t , (3.87)
1t 0
and so by comparing with the right-hand side in (3.86) we immediately get P (1) = 1.
We can use the generating function representation of the Legendre polynomials in order
R1 2
to provide a different derivation of the result (3.58) for 1 dx[Pn (x)] that we needed earlier
in order to calculate the expansion coefficients in (3.59). If we take the square of the defining
relation (3.80), we obtain
1 X
2
= t+n P (x)Pn (x) , (3.88)
1 2xt + t ,n
R1
where each sum is over the non-negative integers. Now integrate this 1 dx, and recall that
R1
we have the orthogonality relation 1 dxP (x)Pn (x) = 0 when 6= n. Thus from (3.88) we
28
get Z Z
1 dx X 1
= t2n dx[Pn (x)]2 . (3.89)
1 1 2xt + t2 n0 1
The integral on the left-hand side is easily evaluated, giving the result
X Z 1
log(1 + t) log(1 t)
= t2n dx[Pn (x)]2 . (3.90)
t n0 1
In our analysis in section 3, we made the specialisation from the Associated Legendre
Equation (2.30) to the case of the Legendre Equation, where m = 0. Let us now return to
the full Associated Legendre Equation, which we shall need for finding general solutions of
Laplaces equation, in which the potential function is allowed to depend on the azimuthal
angle . For convenience, we present again the Associated Legendre Equation:
d dy m2
(1 x2 ) + y = 0. (3.93)
dx dx 1 x2
As mentioned previously, it turns out that we can construct the relevant solutions of this
equation rather simply, in terms of the Legendre polynomials that we have already studied.
To begin, we write y = (1 x2 )m/2 w, and substitute this into (3.93). After simple
algebra we find, after extracting an overall factor of (1 x2 )m/2 , that w must satisfy
(We are using a prime to denote differentiation d/dx here.) Now suppose that we have a
solution u of the ordinary Legendre equation:
(1 x)2 u 2x u + u = 0 . (3.95)
Next, we differentiate this m times. Let us use the notation m as a shorthand for dm /dxm .
Using the Leibnitz rule (3.45) for the n-fold derivative of a product, and noting that we
29
only need the first two or three terms in this expression if we apply it to the products in
(3.95), we easily find that
dm
Pm (x) (1)m (1 x2 )m/2 P (x) , (3.97)
dxm
where P (x) is a Legendre polynomial, then Pm (x) will be a solution of the Associated
Legendre Equation with = ( + 1):
d dP m m2 m
(1 x2 ) + ( + 1) P = 0 . (3.98)
dx dx 1 x2
Since P (x) is regular everywhere including x = 1, it is clear that Pm (x) will be too. It
is understood here that we are taking the integer m to be non-negative. It is clear that we
must have m too, since if m exceeds then the m-fold derivative of the th Legendre
polynomial (which itself is of degree ) will give zero.
Recall next that we have Rodrigues formula (3.41), which gives us an expression for
P (x). Substituting this into (3.97), we get
(1)m 2 m/2 d
+m
Pm (x) = (1 x ) (x2 1) . (3.99)
2 ! dx+m
A nice little miracle now occurs: this formula makes sense for negative values of m too,
provided that m . Thus we have a construction of Associated Legendre Functions for
all integers m in the interval m .
Looking at the Associated Legendre Equation (3.98), we note that the equation itself is
invariant under sending
m m , (3.100)
since m appears only as m2 in the equation. This means that if we take a solution with a
given m, then sending m to m gives us another solution. What is more, only one solution
at fixed and m2 can be regular at x = 1, since the second solution will have logarithmic
singularities there (just like we saw for the Legendre functions). Since Pm (x) and Pm (x)
given by 3.99 are both regular at x = 1, it follows therefore that they must be linearly
dependent; i.e. Pm (x) must be some constant multiple of Pm (x):
30
It is easy to determine what the constant k is, by using (3.99). From (3.101) we get
It is good enough just to look at the highest power of x, since all we need to do is to
calculate what k is.5 Thus we get
(2)! (2)!
x+m = k (1)m x2m xm (3.103)
( + m)! ( m)!
( m)! m
Pm (x) = (1)m P (x) . (3.104)
( + m)!
Using this result we can now very easily work out the normalisation integral for the
associated Legendre functions Pm (x). The relevant integral we shall need to evaluate is
Z 1
dx Pm (x) Pnm (x) . (3.105)
1
(It will become clear in section 3.6 why we have set the upper indices m equal here.) Using
the same method as we used for the Legendre polynomials, it is easy to show that (3.105)
vanishes unless = n. For = m, we can make use of (3.104) to write the required integral
as Z Z
1 ( + m)! 1
Cm dx [Pm (x)]2 = (1)m dx Pm (x) Pm (x) . (3.106)
1 ( m)! 1
Our task is to calculate the constants Cm . We can use the generalised Rodrigues formula
(3.99), thus giving
Z
(1)m ( + m)! 1
Cm = 2 dx +m (x2 1) m (x2 1) . (3.107)
2 (!)2 ( m)! 1
(Note that by making use of (3.104) we have managed to get the two powers of (1 x2 )m/2
that would otherwise have arisen from (Pm )2 to cancel instead of adding, which simplifies
life considerably.) Integrating by parts +m times in (3.107), and noting that the boundary
terms all give zero, we therefore have
1 Z
( + m)!
Cm = 2 2
dx (1 x2 ) 2 (x2 1) ,
2 (!) ( m)! 1
Z 1
(2)! ( + m)!
= dx (1 x2 ) . (3.108)
22 (!)2 ( m)! 1
5
One could, more adventurously, give another proof that Pm (x) and Pm (x) are linearly dependent by
checking all powers of x. We leave this as an exercise for the reader.
31
The integral here is the same one we had to evaluate in the case of the Legendre polynomials
in (3.52); the only difference now is the extra factorial prefactors. Thus from the previous
results in section 3.2, we see that
2 ( + m)!
Cm = . (3.109)
2 + 1 ( m)!
In other words, we have shown that
Z 1 2 ( + m)!
dx Pm (x) Pm
(x) = . (3.110)
1 2 + 1 ( m)!
It may be recalled that a while back, we were solving equations such as the Laplace equation
or the Helmholtz equation in spherical polar coordinates, in section 2.2. We had reduced the
problem, by means of separating variables, to solving for the radial functions R(r) and the
functions Y (, ) on the spherical constant-radius surfaces. Thus the Helmholtz equation
2 + k2 = 0 implied that if we write
1
(r, , ) = R(r) Y (, ) , (3.111)
r
the R(r) and Y , ) should satisfy
d2 R
2(,) Y = Y , 2
= 2 k2 R , (3.112)
dr r
where
1 1 2
2(,) sin + (3.113)
sin sin2 2
is the Laplace operator on the unit sphere. We then performed the further separation of
angular variables on the sphere, with Y (, ) = () (), showing that for regularity we
must have = ( + 1), and m is an integer with m .
Putting together what we found for the angular functions, we see that the Y (, ) are
characterised by the two integers and m, and we may define
s s
(2 + 1) ( m)! m
Ym (, ) P (cos ) ei m , 0, m . (3.114)
4 ( + m)!
The Spherical Harmonics Ym (, ) satisfy
2(,) Ym (, ) = ( + 1) Ym (, ) . (3.115)
These spherical harmonics form the complete set of regular solutions of 2(,) Y = Y
on the unit sphere. Note from (3.104) that we have
32
where the bar denotes complex conjugation.
From our results in the previous sections, we can easily see that the spherical harmonics
satsify the orthogonality properties
Z
d Y m ( ) Ym (, ) = mm , (3.117)
where
d sin d d (3.118)
R
is the area element on the unit sphere, and d X means
Z 2 Z
d sin d X . (3.119)
0 0
Thus (3.117) just says that the integral on the left-hand side is zero unless = and m = m.
Note that it is the integration over that is responsible for producing the Kronecker delta
mm , since the dependent factors in (3.117) are
Z 2
d ei (mm ) . (3.120)
0
This integrates to zero if the integers m and m are unequal, whilst giving 2 if m = m .
The remaining integration over in (3.117) then reduces, with m and m equal, to the
integral in (3.110), which then gives rise to the Kronecker delta function in (3.117).
It is instructive to look at the first few spherical harmonics explicitly. From (3.114), and
using (3.99) to give the expressions for the Pm , we find
1
Y0,0 (, ) = ,
4
r
3
Y1,1 (, ) = sin ei ,
r 8
3
Y1,0 (, ) = cos ,
r 4
3
Y1,1 (, ) = sin ei ,
8
r
15
Y2,2 (, ) = sin2 e2i ,
32
r
15
Y2,1 (, ) = sin cos ei ,
r 8
5
Y2,0 (, ) = (3 cos2 1) ,
r 16
15
Y2,1 (, ) = sin cos ei ,
r 8
15
Y2,2 (, ) = sin2 e2i . (3.121)
32
33
It is also instructive to rewrite the spherical harmonics in terms of Cartesian, rather
than spherical polar, coordinates. Recall that the two coordinate systems are related by
We can write the expressions for x and y more succinctly in a single complex equation,
x + i y = r sin ei , (3.123)
since we have the well-known result that ei = cos + i sin . Thus for the spherical
harmonics listed in (3.121) we have
1
Y0,0 = ,
4
r
3 x + iy
Y1,1 = ,
r 8 r
3 z
Y1,0 = ,
r 4 r
3 x iy
Y1,1 = ,
r 8 r
15 (x + i y)2
Y2,2 = ,
32
r r2
15 z (x + i y)
Y2,1 = 2
,
r 8 r
5 3z 2 r 5 2z 2 x2 y 2
Y2,0 = 2
1 = ,
r 16 r 16 r2
15 z (x i y)
Y2,1 = ,
r 8 r2
15 (x i y)2
Y2,2 = . (3.124)
32 r2
What we are seeing here is that for each value of , we are getting a set of functions, labelled
by m with m , that are all of the form of polynomials of degree in (x, y, z), divided
by r :
xi1 xi2 xi
Ym , (3.125)
r
where xi for i = 1, 2 and 3 denotes x, y and z respectively. The larger is, the larger
the number of possible such polynomials. Looking at = 1, we have in total three Y1,m
functions, which could be reorganised, by taking appropriate linear combinations, as
x y z
, , . (3.126)
r r r
Thus once we know one of them, the other two just correspond to rotating the coordinate
system through 90 degrees about one or another axis. The same is true of all the higher
34
harmonics too. The spherical harmonics thus have built into them the knowledge of the
rotational symmetries of the sphere. Our procedure for deriving the spherical harmonics was
completely non-transparent, in the sense that no explicit use of the rotational symmetry
of the sphere was made in the derivation. But at the end of the day, we see that the
harmonics we have obtained do indeed reflect the symmetries. In the language of group
theory, one says that the spherical harmonics Ym fall into representations of the rotation
group. One of the rather remarkable miracles that we encountered during our derivation,
namely that the solutions to the associated Legendre equation could be constructed from
solutions of the ordinary Legendre equation, ultimately has its explanation in the fact that
the harmonics Ym with m 6= 0 are simply related to the m = 0 harmonic Y0 by symmetry
rotations of the sphere.
Going back to our general form of the separated solution (3.111), and noting that if we
are solving Laplaces equation then the radial functions still satisfy (2.26) with k = 0, just
as they did in the azimuthally-symmetric case m = 0, we now have that the most general
solution of Laplaces equation in spherical polar coordinates6 is written as
X X
(r, , ) = (Am r + Bm r 1 ) Ym (, ) . (3.127)
0 m=
The constants Am and Bm , which depend on both and m, are as yet arbitrary. Their
values are determined by boundary conditions, as in the previous potential-theory examples
that we have looked at. Because we are now allowing the azimuthal separation constant m
to be non-zero, the class of solutions described by (3.127) includes those that are dependent
on the azimuthal angle .
Let us conclude this part of the discussion with a simple example. Suppose the electro-
static potential is given on the the spherical surface r = a, and that one is told that
(a, , ) = V (, ) (3.128)
on this surface, for some given function V (, ). Calculate the potential everywhere inside
the surface.
First, we note that since the potential must remain finite as r approaches zero, it must be
that all the coefficients Bm in (3.127) vanish in this problem. The Am can be calculated by
setting r = a in what remains in (3.127), and then multiplying by Y ,m (, ) and integrating
over the sphere; Z
d (a, , ) Y m (, ) = a A m . (3.129)
6
That is, the most general solution that is regular on the spherical surfaces at constant r.
35
Here, we have made use of the orthogonality relations (3.117). Thus we have
Z
Am = a d V (, ) Ym (, ) (3.130)
where V0 is a constant. Because this potential has a particularly simply form, we can spot
that it can be written in terms of the spherical harmonics as
r
1 2
V0 sin sin = V0 sin (ei ei ) = i V0 (Y1,1 (, ) + Y1,1 (, )) , (3.132)
2i 3
where we have used the = 0 expressions in (3.121). This, of course, is all one is really
doing in any case, when one uses the orthogonality relations to determine the expansion
coefficients; we just need to figure out what linear combination of the basis functions con-
structs for us the desired function. It just happens in this example that the answer is so
simple that we can spot it without doing all the work of evaluating the integrals in (3.132).
Thus, we see by comparing with the general solution (3.127) that we must have
r
2 r
(r, , ) = i V0 (Y1,1 (, ) + Y1,1 (, )) . (3.133)
3 a
This is actually real (as it must be) despite the presence of the i, since the Ym functions
themselves are complex. In fact in this example it is obviously much simpler to write the
answer explicitly, using the expressions in (3.121); we just get
r
(r, , ) = V0 sin sin . (3.134)
a
The example chosen here was so simple that it perhaps makes the use of the whole
edifice of spherical harmonic expansions look a trifle superfluous. The principles involved
in this example, though, are no different from the ones that would be involved in a more
complicated example.
Before moving on, let us return to the generating function for the Legendre polynomials,
defined in (3.80). There is a nice physical interpretation of this construction, which we shall
now describe. In the process, we shall illustrate another very useful technique that can
sometimes be used in order to determine the coeeficients in the expansion of the solution
of Laplaces equation in terms of Legendre polynomials or spherical harmonics.
36
Consider the problem of a point charge of unit strength, sitting on the z axis at a point
z = r . We know, since it is an axially-symmetric situation, that the potential must be
expressible as
X
(r, ) = (A r + B r 1 ) P (cos ) . (3.135)
0
To determine the coefficients, we must first make a choice between considering either the
region where r r , or the region where r r .
In the region r r , we will require that tend to zero as r ; it should not blow
up at large r, because there is no other charge in the problem that would make it do so.
(Of course we could allow it to tend to an arbitrary constant value at infinity, but that is
a trivial issue, since one only ever measures potential differences, and so without loss of
generality (wolog, for short!) we can assume the constant asymptotic value is zero.) Thus
boundary conditions in the problem determine that
A = 0 , 0. (3.136)
What about the remaining constants B ? We dont have a specific boundary anywhere that
will determine these, but we do have the following simple way to work them out. Consider
the potential on axis, i.e. on the z axis, which for z > 0 means = 0 . Since we are looking
at the region r > r , the on-axis potential is therefore given by
1
(r, 0) = . (3.137)
r r
(For simplicity we use units where the 4 0 that appears in the rather cumbersome SI
system of units has been absorbed.7 ) Now we can determine the constants B by matching
this to the general solution (3.135) (with A = 0, of course, since we have already established
that). Thus we shall have
1 X
= B r 1 P (1) . (3.138)
r r 0
We can pull out a factor of 1/r on the left-hand side, and do a binomial expansion of
(1 r /r)1 = 1 + r /r + (r /r)2 + (noting that r /r < 1, since r > r , and so the series
converges):
1 X r X
= B r 1 . (3.139)
r 0 r 0
We have also used that P (1) = 1. Equating coefficients of each power of r in (3.139), we
find B = r . Thus from (3.135), the general solution at a point (r, ) with r > r is given
7
It is perfectly OK for slow-moving creatures like engineers to be content with the SI units... B.F. Schutz
37
by
X r
(r, ) = P (cos ) , r > r . (3.140)
0
r +1
This gives the potential at a point (r, ) (for arbitrary azimuthal angle , of course, since
the configuration is axially symmetric), due to a unit charge at z = r on the z axis.
To make contact with the generating function, we now observe that we can in fact write
down the solution to this problem in closed form. The potential (r, ) will just be the
inverse of the distance from the point (r, ) to the point z = r on the z axis where the unit
charge is located. Using the cosine rule, this distance is (r 2 2r r cos + r 2 )1/2 . Thus
from (3.140) it must be that
2 X r
(r 2 2r r cos + r )1/2 = P (cos ) . (3.141)
0
r +1
Pulling out a factor of r 2 from the bracket on the left-hand side, and defining t r /r,
x cos , we see that (3.141) is nothing but
X
(1 2x t + t2 )1/2 = t P (x) . (3.142)
0
38
could rather easily solve the problem off-axis too, but in more complicated situations the
technique can be very powerful.
Finally, we note that there is a generalisation of (3.141) that can be given, in which the
unit charge is located at an arbitrary point. This is therefore an expression for
1
= , (3.144)
|~r ~r |
the potential at position vector ~r, due to a point charge at ~r . Without too much difficulty,
one can show that it is expressible as a sum over spherical harmonics. As usual, there are
two different expressions, depending upon whether r > r or r < r , where r |~r|, r |~r |.
We have:
1 X X 4 r
r > r : = Ym ( , ) Ym (, ) , (3.145)
|~r ~r | 0 m= 2 + 1 r +1
1 X X 4 r
r < r : = Ym ( , ) Ym (, ) , (3.146)
|~r ~r | 0 m= 2 + 1 r +1
where (r, , ) denotes the point ~r in spherical polar coordinates, and likewise (r , , )
denotes the point ~r . One proof of these formulae involves noting that since (3.144), viewed
as a function of ~r, is a solution of Laplaces equation, it must be expressible as a sum of the
form (3.127). Then, by performing some manipulations in which one rotates to a coordinate
system where ~r lies along the rotated z axis, and invoking the previous result (3.141), the
result follows after some simple algebra.
dy d2 y
y , y . (4.2)
dx dx2
In the following discussion, we shall assume that p(x) and q(x) are rational functions of
x, i.e. that they are expressible as ratios of polynomials in x.
39
4.1 Singular points of the equation
First, we introduce the notion of singular points of the equation. A point x = x0 is called
an ordinary point if p(x) and q(x) are finite there.8 To be more precise, the point x = x0 is
an ordinary point if p and q are analytic there, i.e. they admit power-series expansions of
P
the form p(x) = n0 pn (x x0 )n , etc. The point x = x0 is defined to be a singular point
if either p(x) or q(x) diverges at x = x0 . More precisely, x = x0 is a singular point if at
least one of p(x) and q(x) is non-analytic there. For reasons that will become clear later, it
is useful to refine this definition, and subdivide singular points into regular singular points,
and irregular singular points. They are defined as follows:
In other words, if the singularities are not too severe, meaning that a simple pole in p(x)
is allowed, and a double pole in q(x) is allowed, then the singularity is a regular one. As
we shall see, equations whose only singular points are regular ones admit better-behaved
series solutions than those with irregular singular points.
As stated above, these definitions apply only for finite values of x0 . To analyse the point
x = , we can first perform the change of independent variable from x to z = 1/x, and
study the behaviour of the transformed equation at z = 0. Using the chain rule to write
d d d d2 2 d
2 d d2 d
= z = z 2 , 2
= z 2
+ z = z 4 2 + 2z 3 , (4.3)
dx dz dz dx dz dz dz dz
where z dz/dx, we see that the equation (4.1) becomes, with y, p and q now viewed as
y(1/z), p(1/z) and q(1/z),
d2 y (2z p) dy q
+ + 4 y = 0. (4.4)
dz 2 z2 dz z
8
In this course we shall always use the word finite in its proper sense, of meaning not infinite. Some
physicists have the tiresome habit of misusing the term to mean (sometimes, but not always!) non-zero,
which can cause unnecessary confusion. (As in, for example, The heat bath had a finite temperature, or
There is a finite probability of winning the lottery.) Presumably, however, these same people would not
disagree with the mathematical fact that if x and y are finite numbers, then x + y is a finite number too.
Their inconsistency is then apparent if one considers the special case x = 1, y = 1. We shall have further
comments on linguistics later...
40
(2zp) q
The point x = is therefore an ordinary point if p z2
and q z4
are analytic at
z = 0; it is a regular singular point if at least one of p or q is non-analytic while z p and
z 2 q are analytic at z = 0; and it is an irregular singular point if z p or z 2 q (or both) are
non-analytic at z = 0.
It is worthwhile pausing here to check the singularity structure in a couple of examples.
Consider first the associated Legendre equation (2.30). Rewriting the equation in the form
(4.1), we have
2x m2
y 2
y + 2
y = 0. (4.5)
1x 1x (1 x2 )2
Thus we see that all finite values of x except x = 1 are ordinary points. There are regular
singular points at x = 1. Defining x = 1/z, one finds that (4.5) becomes
d2 y 2z dy m2
+ y = 0. (4.6)
dz 2 1 z 2 dz z 2 (1 z 2 ) (1 z 2 )2
This shows that z = 0 is a regular singular point too. Therefore the singularities of the
associated Legendre equation comprise three regular singular points, at x = (1, 1, ).
These are also the singularities in the special case of the Legendre equation, where m =
0. It is, by the way, no coincidence that the trouble spots that we encountered when
constructing the series expansion of the Legendre equation were at x = 1, precisely at the
locations of singular points of the equation.
We also encountered Bessels equation, given by (2.37). Dividing by x2 , this becomes
1 2
y + y + 1 2 y = 0, (4.7)
x x
showing that the only singular point within the range of finite x is a regular singular point
at x = 0. Replacing x by z = 1/x to analyse the point at infinity, we find that Bessels
equation becomes
d2 y 1 dy 1 2
+ + y = 0. (4.8)
dz 2 z dz z4 z2
The 1/z 4 pole in q at z = 0 shows that the Bessel equation (4.7) has an irregular singular
point at x = , together with its regular singular point at x = 0.
It is worth remarking, for future reference, that although Bessels equation has an irreg-
ular singular point, it is one of a rather specific kind, with a 1/z 4 pole in the coefficient of
y. This can actually be viewed as the superposition or confluence of two regular singular
points. Consider the situation of an ODE with two regular singular points, at x = a and
x = b; a simple example would be
1
y + p(x) y + y = 0. (4.9)
(x a)2 (x b)2
41
Let us, for further simplicity, suppose that here p(x) has no poles at x = a or x = b. Clearly,
if we now choose the parameters a and b to be equal then instead of having two regular
singular points at x = a and x = b, we will have one irregular singular point at x = a = b,
with a fourth-order pole. Thus we may consider Bessels equation to be a confluent limit of
an equation with three regular singular points. In fact most of the common second-order
ODEs that one encounters in physics either directly have three regular singular points, or
else they are confluent limits of equations with three regular singular points. So important
are such equations that the entire class of second-order ODEs with three regular singular
points has been classified, and its solutions studied in great detail. It turns out that by
making appropriate transformations of the independent and dependent variables, one can
put any such equation into a standard canonical form, which is known as the Hypergeometric
Equation. In this form, the three regular singular points are located at x = 0, x = 1 and
x = . The hypergeometric equation is the following
where a, b and c are constants. The regular singular points at x = 0 and x = 1 are evident
by inspection, and the regular singular point at x = can be seen easily after making the
standard x = 1/z transformation.
Here, we shall undertake a somewhat more systematic study of some of the properties of
second-order ODEs, and their solutions. We shall, as usual, take the equation to be
To begin, let us consider the question of how many independent solutions to this equation
there will be.
The Wronskian is a function defined as follows. Suppose that y1 and y2 are two solutions
of (4.11). Then we define the Wronskian (y1 , y2 ) of the two solutions by
(y1 , y2 ) y1 y2 y2 y1 . (4.12)
Sometimes, it will be useful to emphasise that the Wronskian is a function of x, since y1 and
y2 are functions of x. At other times, it may be convenient to suppress explicit indications
42
of what the Wronskian depends on, in order to abbreviate the writing. Thus depending on
the circumstance, we may choose to write it in any of the forms
Conversely, if y1 and y2 are linearly dependent, say y1 = c y2 , then it follows that the
Wronskian vanishes,
Thus combining this with the previous observation, we have the result that that the Wron-
skian (y1 , y2 ) vanishes if and only if the two solutions y1 and y2 are linearly dependent.
In fact, if one is given a particular solution y1 to the second-order linear ODE, the
Wronskian can be used in order to construct a second, linearly-independent solution y2 , as
follows.
Let us suppose we are given a solution y1 (x). We then pick a specific point x = x0 ,
which we will view as our starting point. The point x0 will be assumed to be generic, in
the sense that y1 (x0 ) and y1 (x0 ) are non-vanishing. (If this is not true at a particular point,
then we can always move to a nearby point where it will be true.) We may then consider
a second solution y2 (x), which we shall characterise by specifying the values of y2 (x) and
y2 (x) at x = x0 . These two constants can conveniently be written as
where and are constants. (This is nothing but a specification of the initial conditions
for y2 (x0 ) and y2 (x0 ). It happens to be convenient to express them as constant multiples
and of the non-vanishing constants y1 (x0 ) and y1 (x0 ).) Thus at x = x0 , we will have
43
It is clear therefore that in the immediate vicinity of x = x0 , the solution y2 is linearly
independent of y1 provided that 6= . We now look at what happens to (y1 , y2 ) as we
move away from x = x0 . To do this, differentiate the definition (4.12) of the Wronskian,
and then use the original differential equation (4.11) to simply the result:
d
= y1 y2 y2 y1 ,
dx
= y1 (p y2 + q y2 ) + y2 (p y1 + q y1 ) ,
= p (y1 y2 y2 y1 ) ,
= p . (4.19)
d d log f
= . (4.21)
dx dx
Note that (x0 ) is just a constant here. We see that (x), which was already determined
to be non-vanishing at x = x0 , will be non-vanishing for all x, at least within some neigh-
bourhood of the point x0 , and hence the solution y2 is independent of y1 for all x.
We have established that if we have two solutions y1 and y2 for which y2 (x0 )/y2 (x0 ) 6=
y1 (x0 )/y1 (x0 ), then these two solutions are linearly independent. In fact we can do better,
and actually construct such a second independent solution y2 (x), from a given solution
y1 (x). To do this, we observe that from the definition of the Wronskian we may deduce
d y2
(x) = y1 y2 y2 y1 = y12 , (4.23)
dx y1
44
whence d(y2 /y1 )/dx = /y12 and so9
Z x Z x
(t) dt
y2 (x) = y1 (x) dt = (x0 ) y1 (x) , (4.24)
x1 y12 (t) x1 f (t) y12 (t)
where x1 is an arbitrary constant, and for the second equality we have made use of the
expression (4.22). Different choices for x1 shift the value of the integral by a constant,
and therefore shift the expression for for y2 (x) by the addition of a constant multiple of
y1 (x). This arbitrariness is not of interest to us right now, since we can always take linear
superpositions of solutions of a linear equation, and thereby get another solution. Since
we already know that y1 (x) solves the equation, it is not interesting, for now, to add a
constant multiple of y1 (x) to our construction of a linearly-independent solution y2 . (If
y2 (x) is linearly independent of y1 (x), then so is y2 (x) + k y1 (x), for any constant k.)
We are also not interested, for now, in the freedom to rescale our second solution y2 (x)
by a constant factor; obviously, since the differential equation is linear, then if y2 (x) is a
solution then so is c y2 (x), for any constant c. We may therefore omit the constant prefactor
in (4.24), and work with a rescaled y2 . In summary, we may conclude that if y1 is a solution
of the differential equation (4.11), then a second, linearly independent, solution y2 (x) is
given by Z x dt
y2 (x) = , (4.25)
y12 (t) f (t)
where f (t) is given by (4.20) and the choice of lower limit of integration is not particularly
important. Although it is merely a consistency check that we made no mistakes, it is in
fact easy to verify by direct substitution that (4.25) satisfies the original equation (4.11),
given that y1 does.
The question now arises as to whether there could be a third solution y3 of (4.11),
independent both of y1 and y2 . Our results above would already suggest not, since we
followed a rather general route by means of which we were led to construct y2 in (4.24);
the only arbitrariness was in the choice of two constants of integration, and changing these
9
Note that if we have an equation dF (x)/dx = G(x), then when we write down the indefinite integral we
write
Z x
F (x) = G(t) dt ,
taking care to use a symbol for the integration variable that is not the same as the variable x. It doesnt
matter whether we call it t, or x , or x, or y or ; anything but x will do! In some textbooks immense
Rx
confusion is caused by writing F (x) = G(x) dx. The meaning of the variable x that is the argument of
F (x), and the variable t that is the (dummy) integration variable, are quite distinct, and different symbols
should be used for the two.
45
merely rescales our y2 by a constant factor, and adds a constant multiple of y1 to it. It is
instructive, however, to consider the following direct demonstration that there can be no
third independent solution:
Suppose we do postulate a third solution y3 . Our aim will be to show that it can in fact
be written as a linear combination of y1 and y2 . Begin by picking a generic point x = x0 ,
at which we shall specify the values of y3 (x0 ) and y3 (x0 ). Rather than saying
It is easy to see that this is an equally good parameterisation. Simple algebra shows that
the constants A and B are related to a and b by
a y2 (x0 ) b y2 (x0 ) b y1 (x0 ) a y1 (x0 )
A= , B= , (4.28)
(y1 , y2 )(x0 ) (y1 , y2 )(x0 )
where (y1 , y2 )(x0 ) means the Wronskian of y1 and y2 evaluated at x = x0 , namely
The crucial point is that by our intial assumption of the linear independence of y1 and y2 ,
we must have (y1 , y2 )(x0 ) 6= 0, and thus nothing prevents us solving (4.28) for A and B;
we have two independent equations determining the two constants A and B. Now, we can
use the original differential equation (4.11) to deduce that
= A y1 (x0 ) + B y2 (x0 ) .
We can then repeat these steps for all the higher derivatives of y3 at x = x0 , deducing that
where y (n) denotes the nth derivative. But we know from Taylors theorem that within
its radius of convergence, we can write any function h(x) in terms of all its derivatives at
x = x0 :
X 1
h(x) = (x x0 )n h(n) (x0 ) . (4.33)
n0
n!
46
Therefore it follows from (4.32) that
X 1 (n) X 1 (n) (n)
y3 (x) = (x x0 )n y3 (x0 ) = (x x0 )n [A y1 (x0 ) + B y2 (x0 )] ,
n0
n! n0
n!
= A y1 (x) + B y2 (x) , (4.34)
and hence the solution y3 is linearly dependent on y1 and y2 , at least within the radius of
convergence of the power series expansion around x0 .
To recapitulate, what we did was to consider a completely arbitrary solution y3 of the
second-order ODE (4.11). We showed that it can always be written as a linear combination
of the two independent solutions y1 and y2 , at least within the range of x for which they have
convergent power-series expansions. Therefore there are exactly two linearly independent
solutions. It is clear that very similar arguments could be used for an N th-order ODE, and
would show that it has N linearly-independent solutions.
We have so far considered the solutions of the homogeneous equation (4.11), or L(y) =
0, for which each term is of degree 1 in y or its derivatives. We may also consider the
inhomogeneous equation L(y) = r(x), i.e.
One can think of the function r(x) as being like a source term for the field y. Here,
we shall show that we can obtain a formal solution for this equation, in terms of the two
linearly-independent solutions y1 and y2 of the homogeneous equation, L(y1 ) = 0, L(y2 ) = 0
that we discussed previously. In other words, we suppose that we know how to solve the
homogeneous equation, and now we wish to use these known solutions y1 and y2 in order
to obtain the solution of the inhomogeneous equation.
To do this, first consider what happens if we write y = u v in (4.35). It follows that
L(u v) = (u v) + p (u v) + q u v ,
= v u + v p u + v q u + u v + p u v + 2u v ,
= v L(u) + u v + (u p + 2u ) v = r . (4.36)
Now choose u = y1 , where y1 is one of the solutions of the homogeneous equation, L(y1 ) = 0.
Thus we get
v + p + 2(y1 /y1 ) v = r/y1 , (4.37)
47
after dividing out by y1 . Our task is to solve for v. We saw previously from the definition
(4.12) of the Wronskian that (y2 /y1 ) = /y12 , and also = p(x) , and hence we will
have
y y y y
2
= = 2 1 3 = p 2 2 1 2 = (p + 2 1 ) 2 . (4.38)
y1 y12 2
y1 y1 y1 y1 y1 y1 y1
This can therefore be written as
Next, multiply this equation by v , multiply (4.37) by (y2 /y1 ) , and subtract the former
from the latter. This gives
48
4.4 Series solutions of the homogeneous equation
Let us now return to a more detailed study of the construction of series solutions of second-
order linear ODEs. To begin, consider the case where we expand the solution of (4.11)
around an ordinary point x = a, i.e. a point around which p(x) and q(x) are analytic. This
means that in the vicinity of x = a, we can expand them in Taylor series:
Let us try assuming that the solution y(x) is also analytic near x = a, so we can also expand
it in a Taylor series:
X
y(x) = an (x a)n = a0 + a1 (x a) + a2 (x a)2 + . (4.49)
n0
From the term in (x a)1 , we then solve for a3 in terms of a0 , a1 and a2 . Since we have
already solved for a2 in terms of a0 and a1 , this then gives us a3 in terms of a0 and a1 .
Continuing to higher orders, we thus obtain all the an for n 2 in terms of a0 and a1 .
Since the two initial coefficients a0 and a1 are arbitrary, these parameterise the two-
dimensional space of solutions of the second-order ODE. Thus we may think of the general
solution as being given by
y = a0 y1 + a1 y2 , (4.52)
where y1 and y2 are the two independent solutions determined by our series expansions.
(The solution y1 is the one obtained by taking a0 = 1 and a1 = 0, while the solution y2
49
is obtained by taking a1 = 1 and a1 = 1 and a0 = 0.) Solving for the various higher
coefficients an as described above, one finds that the two solutions are given by
Note that the two basic solutions y1 and y2 have the convenient properties that
y1 (a) = 1 , y1 (a) = 0 ,
Thus if one is looking for the solution that satisfies the boundary conditions y(a) = A,
y (a) = B, then the answer is y = A y1 + B y2 .
We were able to obtain analytic solutions (i.e. solutions as Taylor series) in the neigh-
bourhood of x = a because this was an ordinary point, where p(x) and q(x) were analytic,
and themselves had Taylor-series expansions. The series solution will be valid within a
radius of convergence determined by the closest singular point. Thus, for example, if the
closest singular point of the ODE is at x = b, where b > a, then the series solutions will
converge for all x such that
|x a| < b a . (4.55)
In general, the series solutions will become divergent when x approaches either of the
values that saturates this inequality. We saw an example of this when we studied the series
solution of the Legendre equation. We expanded around the ordinary point at x = 0,
and sure enough, we found that the series solutions became divergent at x = 1, which
correspond to regular singular points of the Legendre equation. (Of course we also observed
that in that case we could arrange, by a judicious choice of the parameters of the equation,
to get a power-series solution that actually terminated, thereby avoiding the divergence of
the generic solution.)
So far, we considered the case where we expand around an ordinary point x = a, for which
p(a) and q(a) are analytic at x = a. Suppose now that the function p(x) has a pole at
x = a, while q(x) is still analytic. Let us assume that p(x) has a pole of degree N , which
means that is has an expansion of the form
X
p(x) = pk (x a)k , (4.56)
k=N
50
for some constants pk . A convenient way to write this is as
F (x)
p(x) = , (4.57)
(x a)N
and hence
F (a) F (a) F (a)
p(x) = + + + . (4.59)
(x a)N (x a)N 1 2(x a)N 2
Note that F (a) can be assumed nonzero, since we are assuming that the coefficient of the
leading-order (x a)N pole is nonzero. As we shall illustrate below, we will now find that
if we continue to seek analytic solutions in the form of the series expansion (4.49), certain
of the coefficients ai in the series expansion (4.49) will be zero, namely
a1 = a2 = = aN = 0 . (4.60)
Plugging the series expansion (4.49) into the equation (4.11), with this assumed form for
p(x), we will get
Thus the coefficient of (x a)2 tells us that a1 = 0 (recall that F (a) 6= 0, which in turn
means that the coefficient of (x a)1 implies that a2 = 0. The coefficient of (x a)0 then
allows us to solve for a3 in terms of a0 . The higher powers of (x a) will then allow us to
solve for a4 , a5 , etc., in terms of a0 . It is not hard to see that this gives the series solution
51
Weve found one solution in this example, as a power series in (x a). But what of
the other solution? We know from our previous general analysis that there should be two
independent solutions. Evidently, the second solution must not be expressible as a power
series of the form (4.49); hence our failure to find it by this means. Recall, however, that we
were able earlier to give a general construction of the second, linearly-independent, solution
of any second-order ODE, if we were given one solution. The second solution was given by
(4.24), and thus is of the form
Z x dt
y2 (x) = y1 (x) , (4.64)
f (t) y12 (t)
where p(x) = d log f /dx. Now, we are assuming that p(x) is given by (4.61), where F (x) is
analytic at x = a (i.e. it admits a Taylor expansion around the point x = a). Therefore we
can expand F (x) in a power series, giving
F (a) F (a) 1
p(x) = + + F (a) + 61 F (a) (x a) + . (4.65)
(x a)2 xa 2
Thus we have
Z x F (a)
log f = p(t)dt = + F (a) log(x a) + 12 F (a) (x a) + , (4.66)
xa
whence
1 F (a) (a)
= exp exp[ 12 F (a) (x a) + ] ,
(x a)F
f (x) xa
F (a)
= exp (x a)F (a) G(x) , (4.67)
xa
where G(x) is an analytic function. Since y1 (x) is an analytic function (admiting a Taylor
expansion around the point x = a), it follows that 1/y12 (x) is analytic too, and so finally we
conclude from (4.64) that
Z x (a)
y2 (x) = y1 (x) eF (a)/(ta) (t a)F H(t) dt , (4.68)
52
Taylor series around the essential singularity. This explains why we were unable to find a
power-series expansion for the second solution in this case.
We ran into this problem with the construction of the second solution because we as-
sumed that p(x) had a double pole at x = a, as in (4.61). Suppose instead p(x) had only a
single pole, so that
F (x)
p(x) = , (4.69)
xa
where F (x) is analytic at x = a. Thus we will now have
F (a)
p(x) = + F (a) + 21 F (a) (x a) + . (4.70)
xa
R
Integrating to get log f = p, we will now have
where H(t) is some analytic function. This is a much better situation than the previous
one. Now, instead of an essential singularity, we instead merely face a power-law singular
behaviour. In fact if we expand H(t) in a Taylor series around t = a,
X
H(t) = hn (t a)n , (4.73)
n0
(We shall assume, for now, that F (a) is not an integer.) The series therefore involves
fractional powers of (x a). This is a rather mild kind of singularity, called a branch cut.
We will study such things in more detail later in the course.
Let us pause to summarise what we have discovered. If we look at an ordinary point
x = a, for which p(a) and q(a) are finite, then we can obtain the two independent solutions
of the second-order ODE (4.11) as power-series expansions of the form (4.49). If, on the
other hand, p(x) has a pole at x = a, while q(a) is still assumed to be analytic, then we
can only obtain one solution of the ODE as a power series of the form (4.49). The second
solution must instead now be obtained using the general construction (4.64). However, if
p(x) has a pole of degree N 2, the behaviour of this second solution will be very bad
53
around x = a, with an essential singularity. By contrast, if p(x) has only a simple pole, the
second solution will be much better behaved. It will still, in general, not be a simple power
series, but it will have nothing worse than a branch-cut singularity in its behaviour around
x = a. In fact, it is evident from (4.74) that the second solution, in the case where p(x) has
a pole only of degree N = 1, has a series expansion of the form
X
y2 (x) = bn (x a)n+s , (4.75)
n0
We analysed above what happens if q(x) is analytic at x = a, but p(x) is singular. Suppose
now that we consider the more general situation where both p(x) and q(x) are singular at
x = a. Specifically, let us consider the situation when
F (x) G(x)
p(x) = , q(x) = , (4.76)
(x a)N (x a)M
where F (x) and G(x) are themselves analytic at x = a, and N and M are positive integers.
To study the behaviour of the solutions, let us consider a solution y of L(y) = 0, where
we shall write y = u v. The idea is that we are going to factor off all the singular behaviour of
y in the function v, while u will be taken to be analytic. (Clearly we can always make some
such split; if all else failed, we could take u = 1, after all! The key point is that we want to
make a useful split of this sort). Now, it follows that our equation L(y) = y + p y + q y = 0
becomes
u + H u + J u = 0 , (4.77)
2v v p v
H =p+ , J =q+ + . (4.78)
v v v
Now, from what we saw in the previous section, we know that provided the function J
in (4.77) is analytic, there will be at least one analytic solution u, even if H has a pole.
Thus we will consider cases where H has poles, but where we can choose the function v in
such a way that J is analytic. We shall then choose u to be the analytic solution of (4.77).
54
Let us suppose that x = a is a regular singular point. This corresponds to the case where
p(x) has a pole of order 1, and q(x) has a pole of order 2. From the previous discussion, we
are expecting that the solution will have singular behaviour of the general form (x a)s .
In fact, to begin with let us try taking the function v, into which we are factoring off the
singular behaviour, to be given precisely by
v = (x a)s , (4.79)
for some constant index s. This implies that v /v = s/(x a) and v /v = s(s 1)/(x a)2 ,
and hence J defined in (4.78) is given by
s p(x) s(s 1)
J = q(x) + + . (4.80)
xa (x a)2
With p(x) having a first-order pole, and q(x) a second-order pole, we can write
F (x) G(x)
p(x) = , q(x) = , (4.81)
xa (x a)2
and so
G(a) + s F (a) + s (s 1) G (a) + s F (a)
J= + + regular terms . (4.83)
(x a)2 xa
Assume for a moment that G (a) + s F (a) = 0, so that there is no 1/(x a) term. We
see that we can then make J completely regular if we choose s such that the coefficient of
1/(x a)2 vanishes. This is achieved if s satisfies the so-called Indicial Equation
Its two roots, which we may call s1 and s2 , correspondingly give us two solutions of the
original ODE,
y1 = (x a)s1 u1 , y2 = (x a)s2 u2 , (4.85)
Now suppose that G (a) + s F (a) is non-zero, which means we also have a 1/(x a)
singular term in J. To handle this, we just have to modify slightly our choice of how to
55
factor off the singular behaviour of the solution when we write y(x) = u(x) v(x). We do
this by choosing
v(x) = (x a)s e x . (4.87)
The condition for removing the 1/(x a)2 singularity is unchanged from before; we should
take s to satisfy the indicial equation (4.84). We now use the additional freedom associated
with the introduction of the constant , which we choose such that the coefficient of 1/(xa)
vanishes also. Having thus arranged that J is analytic, we therefore again know that we
can find at least one analytic solution u(x) to the equation (4.77), and thus we will get a
solution to the original differential equation of the form y(x) = u(x) (x a)s e x . Since e x
is analytic, we again have the conclusion that y(x) is expressed as (x a)s times an analytic
function, where s is determined by the indicial equation (4.84).
In a generic case where the two roots s1 and s2 satisfy s1 s2 6= integer, we obtain
two independent solutions by this method. If, on the other hand, s1 = s2 , (and usually, if
s1 s2 =integer), one finds that u2 is related to u1 by u2 = (x a)s1 s2 u1 , and so from
(4.85) we see that we will get only one solution by this method. The second solution can,
however, still be obtained using (4.24),
Z x dt
y2 (x) = y1 (x) , (4.89)
f (t) y1 (t)2
where A is a constant, and p(x) is written as p = d log f /dx. Let us look at this in more
detail.
Since p(x) is given by (4.81) it follows that
Z x
1 F (a)
= exp dt + = (x a)F (a) g(x) , (4.90)
f (x) (t a)
where g(x) is the analytic function that comes from integrating the higher-order terms.
Now, the indicial equation (4.84) can be written as (s s1 )(s s2 ) = 0, where s1 and s2
are its roots, and so we see that s1 + s2 = 1 F (a), and hence 1/f (x) in (4.90) has the
form (x a)s1 +s2 1 times the analytic function g(x). Plugging the form of the first solution
given in (4.85), for y1 , into (4.89), we therefore find that the integrand is of the form
56
where h(t) = g(t)/u1 (t)2 is again analytic. If we expand h(t) as
X
h(t) = bn (t a)n , (4.92)
n0
then inserting this into (4.91), and then (4.89), and integrating term by term, we obtain
an expression for the second solution y2 (x). In general, i.e. when s1 s2 is not equal to an
integer, this will give
X bn
y2 (x) = u1 (x) (x a)n+s2 . (4.93)
n0
n s 1 + s 2
If s1 s2 is not equal to an integer, we saw previously that we had already found the
two linearly-independent solutions of the differential equation, given in (4.85). In these
circumstances, the solution (4.93) must be just equivalent to the second solution already
found in (4.85).10
If instead s1 s2 is an integer, it is clear from (4.93) that if the constant bn with n = s1 s2
is non-vanishing, then the expression (4.93) is invalid, because of the vanishing denominator
ns1 +s2 for this term in the sum. What has happened, of course, is that this term in (4.93)
Rx
came from integrating 1/(t a). In the usual way, dt (t a)k = (x a)k+1 /(k + 1) for all
Rx
values of the constant k except for k = 1, when instead we get dt (t a)1 = log(x a).
Thus, when s1 s2 is an integer we must omit the term with n = s1 s2 from the sum in
(4.93), and handle the integration separately. The net result is that we get
X bn
y2 (x) = bs1 s2 y1 (x) log(x a) + u1 (x) (x a)n+s2 , (4.94)
n0
n s1 + s2
P
where we use the notation n0 to indicate that the term n = s1 s2 is omitted in the
summation. Thus in general, to find the second independent solution in a series expansion
around a regular singular point x = a, we should include a log(x a) term in the postulated
form of the second solution. In fact, from (4.94), we see that we should try a series expansion
X
y2 (x) = A y1 (x) log(x a) + cn (x a)n+s , (4.95)
n
57
It is becoming clear by this stage that one could spend a lifetime exploring all the special
cases and abnormalities and perversities in the structure of the solutions of ODEs. Let us
therefore bring this discussion to a close, with a summary of what we have found, and what
one finds in a more exhaustive analysis of all the possibilities.
The coefficients an satisfy a recursion relation which determines all the an in terms of
a0 and a1 . Thus we have two linearly-independent analytic solutions.
The coefficients an will satisfy a recursion relation, and in addition the quantity s will
satisfy an indicial equation, quadratic in s:
(s s1 )(s s2 ) = 0 . (4.98)
If s1 s2 6= integer, one will obtain the two independent solutions by this method,
associated with the two values s = s1 and s = s2 for the index. If s1 = s2 , and usually,
if s1 s2 = integer, only one linearly independent solution, say y1 (x), will arise from
this construction. The second solution can be obtained by seeking a series expansion
of the form
X
y2 (x) = A y1 (x) log(x a) + cn (x a)n+s . (4.99)
n0
3. If p(x) has a pole of order higher than 1 at x = a, or q(x) has a pole of order higher
than 2 at x = a, then at least one, and possibly both, of the solutions will have an
essential singularity at x = a. Note, however, that if q(x) is analytic while p(x) has a
pole of arbitrary order n, then one of the solutions is analytic at x = a, as we saw in
section 4.4.2.
58
4. If p(x) or q(x) themselves have worse singularities than poles, the solutions will be
even more pathological.
Finally, here is an example of the application of the series solution technique, in the case
of the Bessel equation,
x2 y + x y + (x2 2 ) y = 0 , (4.100)
where is a constant, given, parameter in the equation. As we have already discussed, this
equation has a regular singular point at x = 0, and an irregular singular point at x = .
We shall perform a series expansion around x = 0. Since this is a regular singular point,
we therefore seek solutions of the form
X
y(x) = an xn+ . (4.101)
n0
Since this must hold for all x, we can now equate to zero the coefficient of each power of
x. To do this, in the first sum we make the replacement n n + 2, so that (4.102) is
re-expressed as11
Xn o
[(n + + 2)2 2 ] an+2 + an xn++2
n0
( 2 2 ) a0 = 0 , [( + 1)2 2 ] a1 = 0 . (4.105)
We begin with the first equation in (4.105). This is the Indicial Equation. Notice that
we can in general insist, without any loss of generality, that a0 6= 0. The reason for this is
as follows. Suppose a0 were equal to zero. The series (4.101) would then begin with the a1
11
Recall that sending n n + 2 in the first sum means first setting n = m + 2, so that the summation
over m runs from 2 up to +. Then, we write this as the sum from m = 0 to + together with the
extra two terms m = 2 and m = 1 added on in addition. Finally, we relabel the m summation variable
as n.
59
term, so it would be a series whose powers of x were (x+1 , x+2 , x+3 , . . .). But since at
the stage when we write (4.101) is a completely arbitrary constant, not yet determined,
we could as well relabel it by writing = 1. We would then have a series whose powers
of x are (x , x +1 , x +2 , . . .). But this is exactly what we would have had if the a0 term
were in fact non-zero, after relabelling as . So insisting that a0 be non-zero loses no
generality at all.
Proceeding, we then have the indical equation 2 2 = 0, i.e.
= . (4.106)
Now we look at the second equation in (4.105). Since we already know from the indicial
equation that 2 = 2 , we can rewrite the second equation as
(2 + 1) a1 = 0 . (4.107)
Thus either a1 = 0 or else = 12 . But since we already know from the indicial equation
that = , it follows that except in the very special case where = 12 , which has to be
analysed separately, we must have that a1 = 0. Let us assume that 6= 12 , for simplicity.
In fact, we shall assume for now that takes a generic value, which is not equal to any
integer or half integer.
Finally, in the recursion relation (4.104), we substitute the two possible values for , i.e.
= or = . In each of these two cases, the recursion relation then gives us expressions
for all the an with n 2, in terms of a0 (which is non-zero), and a1 (which is zero since we
are assuming 6= 12 ).
We can check the radius of convergence of the series solutions, by applying the ratio
test. The ratio of successive terms (bearing in mind that a1 = 0, which means all the odd
an are zero) is given by
an+2 x2 x2
= 2 , (4.108)
an (n + + 2)2
where = . In either case, at large n we see that the absolute value of the ratio tends
to x2 /n2 , and thus the ratio becomes zero for any fixed x, no matter how large. Thus the
radius of convergence is infinite. Notice that this accords perfectly with the fact that the
next nearest singular point of the Bessel equation is at x = . Thus we should expect the
series to converge for any finite x.
We can easily see that with taking a generic value (neither integer nor half-integer),
our two solutions corresponding to the two roots of the indicial equation 2 2 = 0, namely
60
= + and = , give linearly-independent solutions. This is obvious, since the two
solutions take the form12
X X
y1 = x an xn , y2 = x an xn . (4.109)
n0 n0
Clearly, the prefactors x around the front of the (analytic) Taylor expansions are different
fractional powers of x, and so obviously there is no linear relation between the solutions.
1
For example, if = 3 then y1 has terms x1/3 , s4/3 , etc, whilst y2 has terms x1/3 , x2/3 , etc.
The recursion relation (4.104) can be solved explicitly. The resulting solutions for y(x)
are called Bessel functions J (x), given by
X (1)n x 2n+
J (x) = . (4.110)
n0
n! (n + + 1) 2
(The two solutions are J (x) and J (x).) Note that the Gamma function is defined (for
(z) > 0, where means the real part) by
Z
(z) = tz1 et dt . (4.111)
0
(z + 1) = z (z) . (4.112)
As we shall see later in the course, this relation allows us to extend (z) by analytic
continuation to the entire complex plane. By making use of (4.112), it is straightforward
to show that the coefficients appearing in (4.110) do indeed satisfy the recursion relation
(4.104).
The argument demonstrating the linear independence of the two solutions could clearly
fail if were an integer or half-integer. Let us look at a specific example, where = 1. We
see from (4.104) that we shall have
an
an+2 = (4.113)
n(n + 2)
for the putative second solution with = = 1 in (4.109). The first thing to notice is
that we cannot use this equation to give us a2 in terms of a0 , since there is a zero in the
denominator on the right-hand side when n = 0. Thus we must conclude that a0 = 0. Since
12
We use an to distinguish the series expansion coefficients for the second solution from those for the first
solution. Like the coefficients an in the first solution, they satisfy the recursion relation (4.104) too, but
since = rather than = +, the actual expressions fr=or the an will differ from those for the an .
61
we also have a1 = 0 from (4.107), it means that the series for y2 in (4.109) begins with the
a2 term. The subsequent an with n even are non-vanishing, while the an with n odd vanish
(since a1 vanishes).
Now, we can perform the following relabelling; let an = cn2 . Then we shall have
X X
y2 = an xn1 = cn xn+1 , (4.114)
n2 n0
where in the second expression we have made the replacement n n + 2 in the summation
variable. From (4.113), it follows that cn = cn2 /(n(n + 2)), or in other words
cn
cn+2 = . (4.115)
(n + 2)(n + 4)
with a0 6= 0 and a1 = 0. Thus our supposed second solution y2 , given by (4.114) and
(4.115), is exactly the same series, with coefficients cn satisfying the identical recursion
relation, as the first solution (4.116). This proves, for the example = 1, that we fail to
get the second solution in the form (4.101). A similar discussion shows that this happens
whenever is an integer or half-integer. (That is, it happens whenever the difference of the
two roots of the indicial equation, namely
() , (4.117)
is an integer. This can also be seen explicitly by examining the series expression (4.110) for
the Bessel funtions.
From the earlier general discussion, we know that if the difference of the two roots of
the indicial equation is an integer, we should expect that the linearly-independent second
solution will involve a log term, as in (4.95). In our example of the Bessel equation, we
should therefore look for a second solution of the form
X
y(x) = y1 (x) log x + bn xn , (4.118)
n0
where y1 (x) is the previously-obtained first solution, given in (4.109). We now plug (4.118)
into the Bessel equation (4.100), and solve for the new coefficients bn . (We can take it, for
62
the purposes of this discussion, that we have already solved for the expansion coeffcients an
in the first solution y1 (x), as in (4.110).)
The first thing to notice is that when we plug (4.118) into the Bessel equation (4.100)
all the terms involving log x will automatically cancel out. This is because they will give
x2 y1 + x y1 + (x2 2 ) y1 log x , (4.119)
which is zero precisely because y1 is already known to satisfy the Bessel equation. The
remaining terms, after multiplying by x for convenience, give
X X X
2 (n + ) an xn+2 + n (n 2) bn xn + bn xn+2 = 0 . (4.120)
n0 n0 n0
As usual, the task is to collect the coefficients of each power of x, and equate each such
coefficient to zero. Notice that this only makes sense if 2 is an integer. If we had 2
not equal to an integer, then all the terms in the first summation would have to vanish
independently of the terms in the remaining summations. This would imply that the an all
vanished, which would contradict the fact that the coeffcients an in the first solution y1 in
(4.109) do not vanish. All that this means, of course, is that the second solution does not
involve a log x if 2 is not an integer; we already got a perfectly good second solution as in
(4.109) when 2 was not an integer, so we definitely should not expect a third solution
with a log x term in this case! On the other hand, when 2 is an integer, we can see from
(4.120) that the terms in the first summation will combine with those in the second and
third summations, giving us sensible equations that determine the coefficients bn .
To illustrate what is going on, let us again consider our example of = 1. Equation
(4.120) becomes
X X X
2 (n + 1) an xn+2 + n (n 2) bn xn + bn xn+2 = 0 . (4.121)
n0 n0 n0
(As usual, after the relabelling we have to add in by hand the terms that have dropped
off the bottom in the new summation over n 0. In this case there is only one such term
(the former n = 1 term), since at n = 0 the n (n + 2) prefactor gives zero.)
From (4.122), we obtain the recursion relation
bn + 2(n + 1) an
bn+2 = , n 0, (4.123)
n (n + 2)
63
and also
b1 = 0 . (4.124)
Looking at (4.123), we see that it will only make sense for n = 0 if we have
b0 = 2a0 , (4.125)
since otherwise we will get infinity from the n factor in the denominator. This means that
b2 is not determined by (4.123). (Look back to (4.122), before we divided out to get (4.123),
to see that with b0 = 2a0 we get a consistent equation at order x2 , but that it tells us
nothing about the value of b2 .)
The undetermined value for b2 should not be a surprise. Our trial series solution (4.118)
for the second solution is going to give us the most general possible expression for the
linearly-independent second solution. In particular, it will produce for us a second solution
that can include an arbitrary constant multiple of the first solution. That arbitrariness of
adding in any constant multiple of the first solution must manifest itself in an arbitrariness
in the solution for the bn coefficients. And that is what we have just encountered. Why did
it show up in an arbitrariness in the value of b2 ? The reason is because we are looking at the
example where = 1. The first solution, y1 in (4.109), has terms in (a0 x, a1 x2 , a2 x3 , . . .).
The terms in our trial second solution (4.118), coming from the sum over bn xn will have
terms in (b0 x1 , b1 , b2 x, b3 x2 , . . .). Thus we see that the admixture of the first solution that
we expect to see appearing when we construct the second solution in (4.118) will precisely
begin with the term b2 .
The bottom line from the above discussion is that we can take b2 to be anything we
like; different values correspond to different admixtures of the first solution y1 added in to
our new second solution. It doesnt matter how much of y1 one adds in; the second solution
will still be linearly independent of y1 . The simplest choice, therefore, is to take b2 = 0. We
now have a complete specification of the second solution. It is given by (4.118), where we
solve (4.123) for the coefficients bn with n 3, subject to
b0 = 2a0 , b1 = 0 , b2 = 0 . (4.126)
Clearly, a similar discussion could be given for any other choice of integer N such that
2 = N . (This would include the case where = 12 , whose discussion we deferred earlier.)
We shall not dwell further on this here; it is to be hoped that the general idea of how one
looks for series solutions of differential equations is now clear.
64
4.5 Sturm-Liouville Theory
In the previous sections, we discussed certain aspects of how to construct the solutions
of second-order linear ODEs in considerable detail. Here, we shall take a look at some
general properties of the solutions of the ODE. To begin, let us consider a general class of
second-order differential operators L, of the form
This is very much of the kind we discussed previously, as in (4.1), except that now we have
a function of x multiplying the u term too. We shall assume that we are interested in
studying this operator in some interval a x b, and that the functions p0 (x), p1 (x) and
p2 (x) are all real in this region. Furthermore, we assume that p0 (x) does not vanish within
the interval, i.e. for a < x < b, and p1 (x) and p2 (x) remain finite in the interval.13 In other
words, the equation L(u) = 0 has no singular points within the interval, a < x < b. The
points x = a and x = b themselves may be singular points, and indeed they commonly are.
Now, we may define the adjoint L of the operator L, as follows:
d2 d
L(u) 2
(p0 u) (p1 u) + p2 u
dx dx
= p0 u + (2p0 p1 ) u + (p0 p1 + p2 ) u . (4.128)
The reason for introducing this operator can be seen from the following. Suppose we consider
the integral Z Z
b b
dx v Lu = dx v (p0 u + p1 u + p2 u) , (4.129)
a a
and now integrate by parts to get the derivatives off u. Suppose that, for whatever reason,
the boundary terms in the integrations by parts vanish, so that we can simply use the rule
Z Z
dxf (x) g (x) dxf (x) g(x) . (4.130)
Then, after integrating by parts twice on the first term in (4.129), and once on the second
term, we shall have Z b
dx[(p0 v) (p1 v) + p2 v] u , (4.132)
a
13
To complete all the technical specification, we shall assume that the first 2 derivatives of p0 are contin-
uous, the first derivative of p1 is continuous, and that p2 is continuous, for a x b.
65
and so Z Z
b b
dx v Lu = dx(Lv) u , (4.133)
a a
where L is defined in equation (4.128). So the adjoint operator L is the one that arises when
we throw the derivatives over from the original operand function u and onto the function v
that multiplies it in (4.129). We shall discuss later why we dropped the boundary terms.
It is clear that if the functions pi (x) are related to each other in an appropriate way,
then the adjoint operator L will in fact be identical to the original operator L. From the
second line in (4.128), we see that this will be true if it happens to be the case that p0 and
p1 are related by
p0 (x) = p1 (x) . (4.134)
Lu = Lu = p0 u + p0 u + p2 u = (p0 u ) + p2 u . (4.135)
Now that we are down to just two function p0 and p2 , we may as well give them names
without indices, say P (x) and Q(x). Not surprisingly, an operator L that is equal to its
adjoint L is called a self-adjoint operator.
Note that any differential operator of the form (4.127), even if it is not itself self-adjoint,
is related to a self-adjoint operator that is obtained by multiplying it by some appropriate
function h(x). To see, this, we note that the analogue of (4.134) for the operator multiplied
by h will be (h p0 ) = h p1 , or in other words,
h p1 p0
= . (4.136)
h p0
This equation can then simply be integrated to determine the required multiplying function
h that will make the operator become self-adjoint. (Recall that we imposed the condition
p0 6= 0 at the outset, so there is no problem in principle with performing the integration.)
Thus we can proceed with our discussion by assuming that by this means, we have rendered
our operator self-adjoint.
Assuming now that we have a self-adjoint operator L, we may consider the following eigen-
value problem,
Lu(x) + w(x) u(x) = 0 , (4.137)
66
where w(x) is some given function, called a weight function or density function, and is a
constant. It is assumed that w(x) > 0 in the interval a x b, except possibly for isolated
points where w(x) = 0.
The idea is that we look for solutions u(x), subject to certain boundary conditions
imposed at x = a and x = b. By analogy with the eigenvalue problem for a matrix M with
eigenvectors V and eigenvalues
M V = V , (4.138)
the solutions u(x) to (4.137) are called eigenfunctions, and the corresponding constant is
called the eigenvalue. The typical situation is that is an as-yet undetermined constant,
and that one wants to find all the possible values for which the equation (4.137) admits
solutions u(x) that satisfy the specified boundary conditions. Commonly, it turns out that
only a discrete (usually infinite) set of values of can occur.
We have met an example of such a Sturm-Liouville eigenvalue problem already in this
course. Recall that we obtained the Legendre equation (3.1) by separating the Helmholtz
equation in spherical polar coordinates, and setting the azimuthal separation constant m
to zero. This equation is
m2
((1 x2 ) u ) u + u = 0, (4.139)
1 x2
which is clearly of the form (4.137), with the weight function being simply w(x) = 1.
It is clear by comparing the form of L here with the general form in (4.135) that it is self-
adjoint. When we solved for the solutions of the equation, we imposed the requirement that
the functions u(x) should be regular at x = 1. We were in fact solving the Sturm-Liouville
problem for the Legendre equation, seeking all the solutions in the interval 1 x 1 that
are regular at the endpoints. We found that such eigenfunctions exist only if the eigenvalue
takes the form = ( + 1), where is a non-negative integer. The corresponding
eigenfunctions P (x) are the Legendre polynomials.
The example of the Legendre equation illustrates the typical way in which a Sturm-
Liouville problem arises in physics. One separates an equation such as the Laplace equa-
tion, Helmholtz equation or wave equation, and obtains ODEs for functions in the various
independent variables. The required solutions to these ODEs must satisfy certain bound-
ary conditions, and one then looks for the allowed values of the separation constants for
which regular solutions arise. Thus the eigenvalue in a Sturm-Liouville problem is typically
a separation constant.
67
To proceed, let us return to the question of boundary conditions. Recall that we mo-
tivated the introduction of the adjoint operator L in (4.128) by considering the integral
R
v Lu, and throwing the derivatives over from u and onto v, by integration by parts. In
the process, we ignored the possible contributions from the boundary terms arising from
the integrations by parts, promising to return to discuss it later. This is what we shall now
do. First of all, we should actually be a little more general than in the previous discussion,
and allow for the possibility that the functions on which L acts might be complex. For any
pair of functions u and v, we then define the inner product (v, u), as
Z b
(v, u) dx v(x) u(x) , (4.140)
a
What we get is
Z b
(v, Lu) = dx v (P u ) + v Q u
a
Z b h ib
= dx v (P u ) + v Q u + P v u
a a
Z b h ib
= dx (P v ) u + v Q u + P v u P v u . (4.142)
a a
The integrand in the last line is just like the one in the first line, but with the roles of u
and v interchanged. Thus if the boundary terms in the last line were to vanish, we would
have established that
(v, Lu) = (Lv, u) . (4.143)
We make the boundary terms vanish by fiat; i.e. we declare that the space of functions we
shall consider will be such that the boundary terms vanish. One way to do this is to require
that
P (a) u1 (a) u2 (a) = 0 , P (b) u1 (b) u2 (b) = 0 , (4.144)
for any pair of eigenfunctions (possibly the same one) u1 and u2 . In practice, we might
achieve this by requiring, for example, that each eigenfunction satisfy
68
Another possibility is to require instead
for each eigenfunction. Yet a third possibility is to impose a weaker condition than (4.145),
and require that each eigenfunction satisfy
Any of these last three conditions will ensure that (4.144) is satisfied, and hence that the
boundary terms from the integrations by parts give no contribution. Our Legendre equation
analysis was an example where we were effectively imposing boundary conditions of the type
(4.147). In that case we had P (x) = (1 x2 ), and we required our eigenfunctions to be
regular at x = 1 and x = 1. Therefore the P (x) factor ensured that (4.144) was satisfied
in that example.
A slightly more general way to make the boundary terms in the last line of (4.142)
vanish is simply to require
for all possible pairs of eigenfunctions u1 and u2 , without demanding that this quantity
itself be zero. Such a condition might naturally arise if the independent variable x rep-
resented a periodic coordinate, or else was effectively describing a periodic direction, such
as a coordinate on an infinite lattice. Having imposed boundary conditions such that the
boundary terms in the last line of (4.142) vanish, one says that the self-adjoint operator L
is Hermitean with respect to the functions u and v that satisfy such boundary conditions.
One should therefore keep in mind this distinction between the meaning of self-adjoint and
Hermitean. Any operator L of the form (4.141) is self-adjoint. If in addition, one restricts
attention to functions that satisfy the boundary conditions (4.144) or (4.148), then the
operator L is Hermitean with respect to this class of eigenfunctions.
Note that we can actually extend the notion of Hermitean operators to include cases
where operator itself is not built purely from real quantities. This situation arises, for
example, in quantum mechanics. Consider, for instance, the momentum operator
d
px i (4.149)
dx
(we choose units where h = 1 here, since Plancks constant plays an inessential role here).
Let us assume that we impose boundary conditions on u and v (which would be called
69
wave-functions, in this quantum-mechanical context) such that we can drop the boundary
terms after integration by parts. Then we see that
Z b Z b
(v, px u) = i dx v u = i dx v u = (px v, u) . (4.150)
a a
Note that the sign worked out nicely in the end because (px v, u) means, by virtue of the
definition (4.140), Z b
dx (px v) u , (4.151)
a
and so the complex conjugation of the i factor in (4.149) produces +i. Of course this
example is a first-order operator, rather than being of the general class of second-order
operators that we were previously discussing. The key point, though, is that we can extend
the notion of hermiticity to any differential operator A, through the requirement (v, A u) =
(A v, u), where appropriate boundary conditions are imposed so that the boundary terms
from the integrations by parts can be dropped.
We already alluded to the fact that there is a close analogy between the Sturm-Liouville
eigenvalue problem for differential operators, and the eigenvalue problem in the theory of
matrices. Before proceeding with the Sturm-Liouville problem, let us first briefly recall
some of the features of the matrix eigenvalue problem.
Suppose that A is an Hermitean matrix, which we write as A = A . By definition, A is
the matrix obtained by transposing A, and complex conjugating its components. Suppose
we are looking for eigenvectors V of the matrix A, namely vectors that satisfy the equation
AV = V , (4.152)
where is some constant, called the eigenvalue. Let us suppose that A is an N N matrix
(it must, of course, be square, since we are requiring that it equal the complex conjugate of
its transpose).
Rewriting (4.152) as
(A 1l) V = 0 , (4.153)
where 1l means the unit N N matrix, we know from the theory of linear algebra that the
condition for solutions of this equation to exist is that
70
This gives an N th order polynomial equation for , called the characteristic equation, and
thus we will have N roots, which we may call (n) , for 1 n N , and associated with each
root will be the corresponding eigenvector V(n) . In general, for an arbitrary square matrix
A they could be complex. However here, since we are requiring that A be Hermitean, we
can show that the eigenvalues (n) are real. We do this by taking the eigenvector equation
(4.152) for a particular eigenvector V(n) and associated eigenvalue (n) , and multiplying from
the left by the Hermitean conjugate of V(m) :
V(m) A V(n) = (n) V(m) V(n) . (4.155)
Now, take the Hermitean conjugate of this expression, recalling that for matrices X and Y
we have (XY ) = Y X . Thus we get
V(n) A V(m) = (n) V(n) V(m) . (4.156)
V(n) A V(m) = (n) V(n) V(m) . (4.157)
Since this equation is true for all values of the labels m and n, we can equally well rewrite
it with the roles of m and n exchanged:
V(m) A V(n) = (m) V(m) V(n) . (4.158)
((n) (m) ) V(m) V(n) = 0 . (4.159)
First, consider the case when we set m = n. Bear in mind that V(n) V(n) equals the sum
of the modulus-squares of all the components of V(n) . In other words, for any vector V with
P
components V = (v1 , v2 , v3 , . . .), V V is equal to 2
i |vi | . (Dont confuse the notation vi
for the components of a given vector V with the notation V(n) , where the index n labels
a set of different eigenvectors.) We see that for any non-zero vector V(n) (which we have),
(4.159) implies that
(n) = (n) , (4.160)
((n) (m) ) V(m) V(n) = 0 . (4.161)
71
Thus if the two eigenvectors V(m) and V(n) have unequal eigenvalues, (n) 6= (m) , then
the eigenvectors are orthogonal to each other, meaning, by definition, that V(n) V(m) =
0. In the case where two different eigenvectors V(1) and V(2) happen to have the same
eigenvalue (i.e. they are degenerate), then it means that we have a two-dimensional space
of eigenvectors U a V(1) + b V(2) which satisfy A U = U for arbitrary constants a and b.
Clearly, we can always choose two members from this family, say U1 and U2 , by judicious
choice of the constants a and b, such that U1 U2 = 0. This can easily be generalised to a
scheme, known as Gram-Schmidt Orthogonalisation, for dealing with arbitrary numbers of
degenerate eigenvalues.
Thus either by necessity, in the case of non-degenerate eigenvalues, supplemented by
choice, in the case of degenerate eigenvalues, we can arrange always that the set of N
eigenvectors are orthogonal,
V(n) V(m) = 0 , m 6= n . (4.162)
Of course we can easily arrange also to make each eigenvector have unit length, V(n) V(n) = 1,
by rescaling it if necessary. Thus we can always choose the eigenvectors to be orthonormal:
V(n) V(m) = nm , (4.163)
where L is an Hermitean operator and w(x) is called the weight function. It will be assumed
that w(x) is non-negative in the interval a x b, and in fact that w(x) > 0 except possibly
for isolated points where w(x) = 0.
We can now rerun the previous derivations for Hermitean matrices in the case of our
Hermitean Sturm-Liouville operator L. To economise on the writing, recall that we are
defining the inner product (v, u) of any functions u and v by
Z b
(v, u) dx v(x) u(x) . (4.165)
a
Note that it follows immediately from this definition that we shall have
72
and that if f is any real function,
(v, f u) = (f v, u) . (4.167)
Of course it is also the case that any constant factor can be pulled outside the integral,
and so
(v, u) = (v, u) , ( v, u) = (v, u) . (4.168)
Note that we are allowing for the possibility that is complex; the complex conjugation of
in the second equation here is an immediate consequence of the definition (4.165) of the
inner product.
Further properties of the inner product are that for any function u, we shall have
(u, u) 0 , (4.169)
since we are integrating the quantity |u(x)|2 , which is pointwise non-negative, over the
interval a x b. In fact, the only way that (u, u) can equal zero is if u(x) = 0 for all x
in the interval a x b. More generally, if f is a positive function in the interval [a, b], we
shall have
(u, f u) 0 , (4.170)
(v, L u) = (L v, u) . (4.171)
Now, suppose that we have eigenfunctions un with eigenvalues n for the Sturm-Liouville
problem (4.164):
L un + n w un = 0 . (4.172)
Consequently, we have
(um , L un ) + n (um , w un ) = 0 . (4.173)
0 = (um , L un ) + n (um , w un ) =
= (L un , um ) + n (w un , um )
where we have made use of various of the properties of the inner product detailed above,
and the Hermiticity of L. By interchanging the indices m and n, this last line tells us that
73
Subtracting this from (4.173), we therefore find that
(n m ) (um , w un ) = 0 . (4.176)
Note that is precisely analogous to the discussion we gave above for the eigenvectors of an
Hermitean matrix.
Consider first the case where we take m = n, giving
(n n ) (un , w un ) = 0 . (4.177)
Now, our foresight in insisting that the weight function w(x) be non-negative in the interval
[a, b] becomes apparent, since it means that for a non-vanishing eigenfunction un we shall
have (un , w un ) > 0. Thus equation (4.177) implies that
n = n , (4.178)
(n m ) (um , w un ) = 0 . (4.179)
(um , w un ) = 0 . (4.180)
74
the question of how many there are. In fact, for the kind of situation we are considering,
with boundary conditions of the form (4.144) or (4.148), the set of eigenfunctions un will
indeed be discrete, so that we can sensibly label them by an integer n. The number of
eigenfunctions is infinite, so we can think of the label n as running from 1 to .
Lets see how this works in an example. Take the operator L and the weight function
w(x) to be
d2
L= , w(x) = 1 . (4.181)
dx2
It is clear that this operator L is indeed self-adjoint. The Sturm-Liouville problem in this
example is therefore to study the eigenvalues and eigenfunctions of the equation
u + u = 0 . (4.182)
1 1
u(x) = A cos 2 x + B sin 2 x . (4.183)
Now, we have to consider boundary conditions. Suppose for example, that we choose our
interval to be 0 x , so a = 0, b = . One choice for the boundary conditions would
be to require
u(0) = 0 , u() = 0 , (4.184)
in which case we would deduce that A = 0 and 1/2 = n, so the eigenvalues must take
the form
n = n 2 , (4.185)
Un = sin nx . (4.186)
u(0) = 0 , u () = 0 , (4.187)
for all functions. Then from (4.183) we would have A = 0 and 1/2 = (n + 21 ), implying
eigenvalues and eigenfunctions
n = (n + 21 )2 , Un = sin(n + 12 )x , (4.188)
75
for all non-negative integers n.
This emphasises the fact that the details of the eigenvalues, and the corresponding
eigenfunctions, depend on the details of the boundary conditions that are imposed.
Of course either of the above choices of boundary condition are a bit of an overkill,
since we really need only demand that the boundary terms from the integrations by parts
vanish, and their vanishing will be ensured if the periodic boundary conditions (4.148) are
satisfied, which amounts to
v(a) u (a) = v(b) u (b) (4.189)
for any pair of eigenfunctions u and v (including, possibly, the same eigenfunction for u
and v), since the function P (x) = 1. Now, we can see that the set of functions sin 2nx and
cos 2nx will all be satisfactory eigenfunctions. Let us give these names,
Thus for any choice of any two functions u and v taken from this total set, it will always
be the case that
v(0) u (0) = v() u () . (4.191)
(A non-trivial case to check here is when u = Un and v = Vm .) Note that now the two
eigenfunctions Un and Vn have the same eigenvalue n = 4n2 .
2. There is a non-zero gap between each eigenvalue and the next largest one. Thus we
may order them
1 < 2 < 3 < . (4.192)
76
The gap can never become infinitesimal, for any n+1 n , no matter how large n is.
(Assuming, as we are, that b a is finite.)
Let us deal straight away with the issue of what is meant by appropriate boundary
d2
conditions. In particular, notice that Property 2 here is not satisfied by the L = dx2
example with the periodic boundary conditions (4.191), although it is satisfied in the case
of the more stringent boundary condition (4.184) we considered previously. The point is
that the slightly less restrictive boundary conditions of the periodic type tend to allow both
independent solutions of the second-order ODE at a fixed value of , whereas the more
forceful conditions like (4.184) tend to allow only one of the two solutions. So there is
commonly a two-fold degeneracy of eigenvalues when the weaker kinds of boundary condi-
tion are imposed. It is perfectly straightforward to accommodate this in some appropriate
generalisations of the properties listed above, but it is once again one of those examples
where one can spend time endlessly dotting all the is and crossing all the ts, and at the
end of the day one has not really added hugely to the understanding of the key points.
Let us assume for now that we choose sufficiently powerful boundary conditions that the
degeneracies are avoided.
Now, to proceed, let us consider the following problem. It is familiar from the theory of
Fourier series that if we have an arbitrary function f (x) defined in the interval 0 x ,
such that f (0) = 0 and f () = 0, we can expand it in terms of the functions sin nx, as
X
f (x) = cn sin nx , (4.193)
n1
where Z
2
cn = dx f (x) sin nx . (4.194)
0
Since we have seen that the functions sin nx arise as the eigenfunctions of the Sturm-
d2
Liouville problem with L = dx2 , with the boundary conditions u(0) = u() = 0, it is
natural to suppose that we should be able to carry out analogous series expansions in terms
of the eigenfunctions for other Sturm-Liouville operators. This is the subject we shall now
pursue.
77
Let us begin by supposing that we can indeed expand an arbitrary function f (x), sat-
isfying our chosen Sturm-Liouville boundary conditions, in terms of the eigenfunctions un :
X
f (x) = cn un (x) . (4.195)
n1
Is this the end of the story? Well, not quite. We have tacitly assumed in the above
discussion that it is possible to make an expansion of the form (4.195). The question of
whether or not it is actually possible is the question of whether or not the eigenfunctions un
form a complete set. Think of the analogous question for finite-dimensional vectors. What
constitutes a complete set of basis vectors in an N -dimensional vector space? The answer is
you need N independent basis vectors, which can span the entire space. In terms of these,
you can expand any vector in the space. For example, in three-dimensional Cartesian space
we can use the three unit vectors lying along the x, y and z axes as basis vectors; they form
a complete set.
The problem in our present case is that we effectively have an infinite-dimensional
vector space; there are infinitely many independent eigenfunctions. Certainly, we know
that a complete set of basis functions must be infinite in number. We indeed have infinitely
many functions un , the eigenfunctions of the Sturm-Liouville problem. But is it a big
enough infinity? This is the question we need to look at in a little bit of detail. It is worth
doing because it lies at the heart of so many techniques that one uses in physics. Think of
quantum mechanics, for example, where one expands an arbitrary wave function in terms
of the eigenfunctions of the Schrodinger equation. To do this, one needs to be sure one has
a complete set of basis functions. It is the same basic question as the one we shall look at
here for the Sturm-Liouville problem. To do so, we first need to study a another aspect of
Sturm-Liouville theory:
78
To begin, we note that the Sturm-Liouville equation Lu + w u = 0, with Lu
(P (x) u ) + Q(x) u, can be derived rather elegantly from a variational principle. Define
the functional14 (f ) for any function f (x), by
Z b
2
(f ) (f , P f ) (f, Q f ) = dx (P f Q f 2 ) . (4.198)
a
(We shall, for simplicity, assume for now that we deal with real functions. There is no great
subtlety involved in treating complex functions; essentially we would just write |f |2 in place
of f 2 , etc.. It is just a bit simpler to let them be real, and no great point of principle
will be lost. Redo all the steps for complex functions if you wish.) Let us also define the
norm-squared of the function f :
Z b
N (f ) (f, w f ) = dx w(x) (f (x))2 , (4.199)
a
It is useful also to define more general bilinear functionals (f, g) and N (f, g), by
(f, g) (f , P g ) (f, Q g) ,
Comparing with (4.198) and (4.199), we see that (f ) = (f, f ), and N (f ) = N (f, f ).
Note that other properties of these functionals are
N (f, g) = N (g, f ) ,
N (f + g) = N (f ) + N (g) + 2N (f, g) ,
(f + g) = (f ) + (g) + 2(f, g) ,
79
start by looking for the function f , subject to some specified Sturm-Liuoville boundary
conditions, that minimises the ratio
Rb
(f ) a dx [P f 2 Q f 2 ]
R(f ) = Rb , (4.202)
N (f ) a dx w f 2
(Of course f can be determined only up to a constant scaling, since the ratio in is invariant
under f (x) k f (x), where k is any constant. Thus it will always be understood that
when we speak of the minimising function, we mean modulo this scaling arbitrariness.)
To get an overview of the idea, consider first the following simple argument. Let us
suppose that we make small variations of f , i.e. we replace f by f + f , where the variations
will be assumed also to be subject to the same Sturm-Liouville boundary conditions. (In
other words, we consider small variations that keep us within the same class of boundary
conditions.) We shall work only to first order in the small variation f . With R(f ) defined
in (4.202), we therefore have
1
R = 2 N = ( R N ) , (4.203)
N N N
where for any functional X(f ) we define its variation by
Note that the boundary terms drop out because, by supposition, f and f both satisfy
the given Sturm-Liouville boundary conditions. Likewise, we have, without any need for
integration by parts, Z b
N (f ) = 2 dx w f f . (4.206)
a
Now substitute into (4.203), and choose f = f0 , the function that supposedly minimises
R(f ). Defining R0 R(f0 ), we shall therefore have
Z b
2
R = dx [(P f0 ) + Q f0 + R0 w f0 ] f . (4.207)
N (f0 ) a
But if R(f ) is minimised by taking f = f0 , it must be that to first order in varition around
f = f0 , i.e. for f = f0 + f , we must have R = 0. This is obvious from the following
80
argument: Suppose a given variation f made R non-zero at first order. Since R(f ), and
hence R, is just a number, then we would either have R is positive or R is negative.
If it were negative, then this would be an immediate contradiction to the statement that
f = f0 is the function that minimises R(f ), and so this is impossible. Suppose R were
instead positive. But now, we could just take a new f that is the negative of the original
f ; now we would find that for a variation f , we again had R negative, and again we
would have a contradiction. Therefore, the only possible conclusion is that if R(f0 ) is the
minimum value of R(f ), then R = 0 to first order in f .
Recall that we placed no conditions on f , other than that it should satisfy specified
Sturm-Liouville boundary conditions at x = a and x = b. Since it is otherwise arbitrary in
the interval a < x < b, it follows that (4.207) can only be zero for all possible such f if the
integrand itself vanishes, i.e.
(P f0 ) + Q f0 + R0 w f0 = 0 . (4.208)
This shows that the function f0 that minimises R(f ) subject to the given Sturm-Liouville
boundary conditions is itself an eigenfunction of the Sturm-Liouville problem, with eigen-
value R0 = R(f0 ). Thus we see that by minimising R(f ) defined in (4.202), we obtain the
eigenfunction with the lowest eigenvalue, and we determine this eigenvalue.
Notice that this can provide a useful way of getting an estimate on the lowest eigenvalue
of the Sturm-Liouville problem. Even if the Sturm-Liouville operator is complicated and
we cannot explicitly solve for the eigenfunctions in the equation
(P u ) + Q u + w u = 0 , (4.209)
we can just make a guess at the lowest eigenfunction, say u = u. Of course we should make
sure that our guessed function u does at least satisfy the given Sturm-Liouville boundary
conditions; that is easily done. We then evaluate R(u), defined in (4.202). By our argument
above, we therefore know that
R(u) 1 , (4.210)
where 1 is the lowest eigenvalue of the Sturm-Liouville problem. If we were lucky enough
to guess the exact lowest eigenfunction, then we would have R(u) = 1 . In the more likely
event that our guessed function u is not the exact eigenfunction, we shall have IR(u) > 1 .
The nearer u is to being the true lowest eigenfunction, the nearer R(u) will be to the true
lowest eigenvalue. We can keep trying different choices for u, until we have made R(u) as
small as possible. This will give us an upper bound on the lowest eigenvalue.
81
Let us now look at the variational problem in a bit more detail. As we shall see,
we can actually do quite a bit more, and learn also about the higher eignfunctions and
eigenvalues also. We begin by repeating the previous derivation of the minimum value for
R(f ), expressed in the language we shall use for some subsequent manipulations.
Suppose now that the function that minimises R(f ) is called 1 , and that the minimum
value for R in (4.202) is R1 , so
(1 ) = R1 N (1 ) . (4.211)
(1 + ) R1 N (1 + ) . (4.212)
Here, is an arbitrary constant, and is any function that satisfies the Sturm-Liouville
boundary conditions. Thus from the various properties of N and given above, we see
that
(1 ) + 2 (1 , ) + 2 () R1 N () + 2 R1 N (1 , ) + 2 R1 N () . (4.213)
Now, by taking sufficiently small (so that the 2 terms become unimportant) and of the
proper sign, we could clearly violate this inequality unless the coefficient of the term
vanishes. Thus we deduce that
(1 , ) R1 N (1 , ) = 0 , (4.215)
where is an arbitrary function satisfying the boundary conditions. This equation is nothing
but Z b
dx (P 1 ) + Q 1 + R1 w 1 = 0 , (4.216)
a
and if this is to hold for all , it must be that the integrand vanishes, implying
(P 1 ) + Q 1 + R1 1 = 0 . (4.217)
In other words, we have learned that the function 1 that minimises the ratio R in (4.202)
is precisely an eigenfunction of the Sturm-Liouville equation, L 1 + R1 1 = 0. Since
1 is as small as possible, it follows that 1 is the lowest eigenfunction, and R1 = 1 , the
82
lowest eigenvalue. Let us emphasise also that we now know that for any function f that
satisfies the boundary conditions, we must have
(f ) 1 N (f ) , (4.218)
(2 ) = R2 N (2 ) , N (1 , 2 ) = 0 . (4.219)
Let us emphasise again that we are not yet assuming that 2 is the second eigenfunction,
nor that R2 is the corresponding eigenvalue. We only assume that 2 is the function that
minimises (f )/N (f ), subject to the constraint N (1 , f ) = 0.
Now by definition, if we look at (2 + ), where is a constant, and is an arbitrary
function satisfying the boundary conditions, and in addition the constraint
N (1 , ) = 0 , (4.220)
(2 + ) R2 N (2 + ) . (4.221)
where
N (1 , )
c= . (4.223)
N (1 )
(Of course , like every function we ever talk about, will still be assumed to satisfy our
Sturm-Liouville boundary conditions.) Thus from (4.221) we will have
(2 + c 1 ) R2 N (2 + c 1 ) . (4.224)
83
Expanding everything out, we have for (2 + c 1 ):
(2 + c 1 ) = R2 N (2 ) + 2 (2 , ) 2 c (2 , 1 )
+2 () + 2 c2 (1 ) 22 c (1 , ) , (4.225)
= R2 N (2 ) + 2 (2 , ) + 2 () 2 c2 1 N (1 ) .
For N (2 + c 1 ) we have
N (2 + c 1 ) = N (2 ) + 2 N (2 , ) 2 c N (2 , 1 )
+2 N () + 2 c2 N (1 ) 22 c N (1 , ) ,
= N (2 ) + 2 N (2 , ) + 2 N () 2 c2 N (1 ) . (4.226)
In each case, we have made use of previously-derived results in arriving at the second lines.
Plugging into (4.224), we thus find that the O(0 ) terms cancel out, and we are left with
By the same argument as we used in the original 1 minimisation, this equality can only
be true for arbitrary small if the coefficient of vanishes:
(2 , ) R2 N (2 , ) = 0 . (4.228)
Since this must hold for all that satisfy the boundary conditions, it follows that like in
the previous 1 discussion, here we shall have
L 2 + R2 w 2 = 0 . (4.229)
So the function that minimises (f )/N (f ) subject to the constraint that it be orthogonal
to 1 is an eigenfunction of the Sturm-Liouville equation. By definition, R2 is the smallest
value we can achieve for R in (4.202), for functions f orthogonal to 1 . Therefore R2 = 2 ,
the next-to-smallest eigenvalue.
It should now be evident that we can proceed iteratively in the same manner, to con-
struct all the eigenfunctions and eigenvalues in sequence. At the next step, we consider
the constrained minimisation problem where we require that the functions f in (f )/N (f )
must be orthogonal both to 1 and 2 . Following precisely analogous steps to those de-
scribed above, we then find that the function 3 that achieves the minimum value R3 = 3
for this ratio is again an eigenfunction of the Sturm-Liouville equation. This will therefore
be the third eigenfunction, in the sense 1 < 2 < 3 .
84
At the (N + 1)th stage in in the process, we look for the function N +1 that minimises
R = (f )/N (f ), subject to the requirements that
N (n , f ) = 0 , 1nN. (4.230)
The resulting minimum value for R will be the (N + 1)th eigenvalue N +1 , and N +1 will
be the (N + 1)th eigenfunction.
Let us conclude this part of the discussion by emphasising one important point, which
we shall need later. If f (x) is any admissible function that is orthogonal to the first N
eigenfunctions, as in (4.230), then it satisfies the inequality
(f ) N +1 N (f ) . (4.231)
One way to formulate the question of completeness is the following. Suppose we make
a partial expansion of the form (4.195), with constant coefficients cn chosen as in (4.197),
but where we run the summation not up to infinity, but instead up to some number N .
Obviously we will not in general hit the target and get a perfect expansion of the function
f (x) like this; at best, we will have some sort of approximation to f (x), which we hope will
get better and better as higher and higher modes are included in the sum. In fact we can
define
N
X
fN (x) f (x) cn un (x) , (4.232)
n=1
where the coefficients cn are defined in (4.197). What we would like to be able to show is
that as we send N to infinity, the functions fN (x), which measure the discrepancy between
the true function f (x) and our attempted series expansion, should in some sense tend to
zero. The best way to measure this is to define
Z b
A2N dx w(x) (fN (x))2 = (fN , w fN ) = N (fN ) . (4.233)
a
Now, if we can show that A2N goes to zero as N goes to infinity, we will be achieving a good
least-squares fit.
To show this, we now use the functional (f ) that was defined in (4.198), and the
properties that we derived. Before we begin, let us observe that we can, without loss of
generality, make the simplifying assumption that 1 = 0. We can do this for the following
reason. We know that the eigenvalue spectrum is bounded below, meaning that 1 , the
85
smallest eigenvalue, must satisfy 1 > . We can then shift the Sturm-Liouville operator
L, defined by Lu = (P u ) + Q u, to Le = L + 1 w, which is achieved by taking Lu
e
e u, where Q
(P u ) + Q e = Q+1 w. Thus we can just as well work with the redefined operator
e which will therefore have eigenvalues n = n 1 0. The set of eigenfunctions will
L,
be identical to before, and we have simply arranged to shift the eigenvalues. Let us assume
from now on that we have done this, so we drop the tildes, and simply assume that 1 = 0,
and in general n 0.
Now, we define
fN (x)
FN (x) . (4.234)
AN
From (4.233), it is clear that
N (FN ) = 1 . (4.235)
The delta function in the second term clicks only if n lies within the range of the sum-
mation index, and so we get:
1nN : (un , w FN ) = 0 ,
cn
nN +1: (un , w FN ) = . (4.237)
AN
This means that FN (x) is precisely one of those functions that we examined earlier, which
is orthogonal to all of the first N eigenfunctions, and thus satisfies (4.230). Since FN is
normalised, satisfying N (FN ) = 1, it then follows from (4.231) and (4.235) that
(FN ) N +1 . (4.238)
Now, let us calculate (FN ) directly. From (4.232) and (4.234), we will get
N
X N X
X N
A2N (FN ) = (f ) + 2 cm (f, L um ) cm cn (un , L um ) . (4.239)
m=1 m=1 n=0
In the last two terms, where we have already integrated by parts, we now use the fact
that the um are Sturm-Liouville eigenfunctions, and so L um can be replaced by m w um .
86
Now, from the definition (4.197) of the coefficients cn , we see that we eventually get
N
X
A2N (FN ) = (f ) c2n n . (4.240)
n=1
Since we arranged that the eigenvalues satisfy n 0, it follows from this equation that
(f )
A2N . (4.241)
(FN )
But we saw earlier in (4.238), that (FN ) N +1 , so we deduce that
(f )
A2N . (4.242)
N +1
Now, (f ) is just a functional of the original function f (x) that we are trying to expand in
an eigenfunction series, so it certainly doesnt depend on N . Furthermore, (f ) is definitely
positive, (f ) > 0 (except in the special case where f = c u1 , for which it vanishes). The
upshot of all this, then, is that (4.242) is telling us that as we send N to infinity, implying
that N +1 goes to infinity, we will have
AN 0 . (4.243)
The thing that has taken us so long to show is that an expansion of the assumed kind (4.245)
really does work. That is to say, we showed, after quite a long chain of arguments, that
the set of eigenfunctions un really is complete. This is the sort of exercise that one usually
tends not to go through, but since eigenfunction expansions play such an important role in
87
all kinds of branches of physics (for example, they are heavily used in quantum mechanics),
it is worthwhile just for once to see how the completeness is established.
Now that we have established the validity of the expansion (4.245), we can restate the
notion of completeness as follows. Take the expression (4.246), and substitute it into (4.245):
X
f (x) = N (un , f ) un (x) . (4.247)
n1
where, being physicists, we are allowed to sneak the summation through the integral without
too much concern. (It is one of those fine points that strictly speaking ought to be examined
carefully, but in the end it turns out to be justified.) What we are seeing in (4.248) is that
P
n un (x) un (y) is behaving exactly like the Dirac delta function (x y), which has the
defining property that Z b
f (x) = dy f (y) (x y) , (4.249)
a
The point about the completeness of the eigenfunctions is that the left-hand side of this
expression does indeed share with the Dirac delta function the property that it is able
to take any admissible function f and regenerate it as in (4.249); it doesnt miss any
functions.15 Thus it is often convenient to take (4.250) as the definition of completeness.
Note that it is sometimes more convenient to think of the weight function w(x) as part
of the integration measure, in which case we could define a slightly different delta-function,
let us call it (x, y), as
X
(x, y) = un (x) un (y) (4.251)
n1
88
The Dirac delta function is an example of what are called generalised functions. When
Dirac first introduced the delta function, the mathematicians were a bit sniffy about it,
since they hadnt thought of them first, complaining that they werent well-defined, that
derivatives of delta functions were even less well-defined, and so on.16 These were in fact
perfectly valid objections to raise, and sorting out the new mathematics involved in making
them respectable led to the whole subject of generalised functions. However, it is perhaps
worth noting that unlike Dirac, who simply went ahead with using them regardless, the
mathematicians who sorted out the details never won the Nobel Prize.
Let us pause here to say a bit more about the delta function. A useful physical picture
to keep in mind is that the delta function (x) looks like a spike at x = 0 that is infinitely
high, and infinitessimally narrow, such that its total area is unity. Thus we may think of
(x) as the limit of a rectangular function defined as follows:
Note that we dont actually need to take the integration range to be the entire real line; as
long as it is over an interval that encompasses x = y, we shall get the same result.
16
It is surprisingly common in research to encounter one or more of the following reactions: (1) Its
wrong; (2) Its trivial; (3) I did it first. Interestingly, it is not unknown to get all three reactions
simultaneously from the same person.
89
4.5.5 Eigenfunction expansions for Green functions
Lun + n w un = 0 . (4.260)
Now, let us look for a solution u(x) to the inhomogeneous problem (4.259), where
we shall assume that u(x) satisfies the same boundary conditions as the eigenfunctions
un (x). Since u(x) is thus assumed to be an admissible function, it follows from our previous
discussion of completeness that we can expand it as
X
u(x) = bn un (x) , (4.261)
n1
for constant coefficients bn that we shall determine. Plugging this into (4.259), and making
use of (4.260) to work out Lun , we therefore obtain
X
bn ( n ) w(x) un (x) = f (x) . (4.262)
n1
Now multiply this um (x) and integrate from a to b. Using the orthogonality of eigenfunctions
um , we therefore get Z b
bm ( m ) = dx um (x) f (x) . (4.263)
a
where as usual we exercise our physicists prerogative of taking summations freely through
integrations. Note that we have been careful to distinguish the integration variable y from
the free variable x in u(x).
Equation (4.264) is of the form
Z b
u(x) = dy G(x, y) f (y) , (4.265)
a
90
with G(x, y) given here by
X un (x) un (y)
G(x, y) = . (4.266)
n1
n
The quantity G(x, y) is known as the Green Function for the problem; it is precisely the
kernel which allows one to solve the inhomogeneous equation by integrating it times the
source term, as in (4.265).17
We may note the following properties of the Green function. First of all, from (4.266),
we see that it is symmetric in its two arguments,
Secondly, since by construction the function u(x) in (4.265) must satisfy (4.259), we may
substitute in to find what equation G(x, y) must satisfy. Doing this, we get
Z b
Lu + w u = dy (L + w) G(x, y) f (y) = f (x) , (4.268)
a
where it is understood that the functions P , Q and w depend on x, not y, and that the
derivatives in L are with respect to x. Since the second equality here must hold for any
f (x), it follows that the quantity multiplying f (y) in the integral must be precisely the
Dirac delta function, and so it must be that
91
There are interesting, and sometimes useful, consequences of the fact that we can express
the Green function in the form (4.266). Recall that the constant in (4.266) is just a
parameter that appeared in the original inhomogeneous equation (4.259) that we are solving.
It has nothing directly to do with the eigenvalues n arising in the Sturm-Liouville problem
(4.260). However, it is clear from the expression (4.266) that there will be a divergence, i.e.
pole, in the expression for G(x, y) whenever is chosen to be equal to any of the Sturm-
Liouville eigenvalues n . It is a bit like a resonance phenomenon, where the solution of a
forced harmonic oscillator equation goes berserk if the source term (the forcing function) is
chosen to be oscillatory with the natural period of oscillation of the homogeneous (unforced)
equation.
Here, what is happening is that if the constant is chosen to be equal to one of the
Sturm-Liouville eigenvalues, say = N , then we suddenly find that we are free to add in
a constant multiple of the corresponding eigenfunction uN (x) to our inhomogeneous solu-
tion, since uN (x) now happens to be precisely the solution of the homogeneous equation
Lu + w u = 0. (For generic , none of the eigenfunctions un (x) solves the homogeneous
equation.) The divergence in the Green function is arising because suddenly that particu-
lar eigenfunction uN (x) is playing a dominant role in the eigenfunction expansion for the
solution.
Recall now that some lectures ago we actually encountered another way of constructing
the Green function for this problem, although we didnt call it that at the time. In (4.47) we
obtained a solution to the inhomogeneous second-order ODE u + p(x) u + q(x) u = r(x),
in a form that translates, in our present case,18 to
Z x Z x
y1 (t) y2 (t)
u(x) = y2 (x) dt f (t) y1 (x) dt f (t) , (4.271)
x1 P (t) (y1 , y2 )(t) x2 P (t) (y1 , y2 )(t)
where y1 and y2 are the two solutions of the homogeneous equation, which for us will
be Ly + w y = 0, and (y1 , y2 )(t) = y1 (t) y2 (t) y2 (t) y1 (t) is the Wronskian of the
two solutions. Recall that taking different choices for the lower limits of integration x1
and x2 just corresponds to adding different constant multiples of the two solutions of the
homogeneous equation. Thus the choices of x1 and x2 parameterise the most general solution
of the inhomogeneous equation. This freedom is used in order to fit the boundary conditions
we wish to impose on u(x).
Suppose, as an example, that our boundary conditions are u(a) = 0 = u(b). We may,
18
Note that in order to fit our present equation P u + P u + Q u + w u = f into the previous mould,
we should divide by P , implying that the source term r(x) in the earlier equation becomes f (x)/P (x) here.
92
for convenience, choose to specify the homogeneous solutions y1 (x) and y2 (x) by requiring19
This choice is not obligatory; any choice of boundary conditions for y1 (x) and y2 (x) will
allow us to solve the inhomogeneous equation, as long as we make sure that our boundary
conditions lead to linearly-independent solutions y1 (x) and y2 (x). (We know this because
we have already proved that (4.271) gives the most general possible solution of the in-
homogeneous solution, provided that y1 and y2 are linearly-independent solutions of the
homogeneous equation.) The choice in (4.272) is very convenient, as we shall now see.
In our example, we want our inhomogeneous solution u(x) to satisfy u(a) = 0 and
u(b) = 0. To ensure these two conditions, we have at our disposal to choose the two
integration limits x1 and x2 . In view of (4.272), we can see that u(a) = 0 implies we should
take x1 = a. Similarly, u(b) = 0 implies we should take x2 = b. Thus from (4.271) we can
write the solution as
Z x Z b
y1 (t) y2 (x) y2 (t) y1 (x)
u(x) = dt f (t) + dt f (t) , (4.273)
a P (y1 , y2 ) x P (y1 , y2 )
(Note that the sign has changed on the second term because we have reversed the order of
the limits.)
Note that (4.273) can be interpreted as the equation
Z b
u(x) = dt G(x, t) f (t) , (4.274)
a
93
require a little work, since one is an infinite sum and the other is not. Essentially, one has
to show that the infinite series expansion (4.266) for the Green function is the generalised
Fourier expansion of the expression (4.275).
Let us consider a simple example, and just compare some of the key features. Take the
case that we looked at earlier, where
d2
L= , w(x) = 1 . (4.276)
dx2
d2 u(x)
+ u(x) = f (x) (4.277)
dx2
for which u(0) = u() = 0. We saw before that the eigenfunctions and eigenvalues for the
Sturm-Liouville problem
un + n un = 0 (4.278)
will be r
2
un (x) = sin nx , n = n2 , (4.279)
for the positive integers n. (We didnt give the normalisation before.) Thus from (4.266)
the Green function for the inhomogeneous problem is
2 X sin nx sin nt
G(x, t) = . (4.280)
n1 n2
On the other hand, for the closed-form expression (4.275), the required solutions of the
homogeneous equation y + y = 0, such that y1 (0) = 0 and y2 () = 0 are (choosing the
scale factors to be 1 for simplicity)
1 1
y1 (x) = sin ( 2 x) , y2 (x) = sin ( 2 (x )) . (4.281)
We should be able to see the same resonance phenomenon of which we spoke earlier,
in both of the (equivalent) expressions for the Green function. In (4.280), we clearly see
a resonance whenever is equal to the square of an integer, = N 2 . On the other hand,
in the closed-form expression (4.275), we can see in this case that the only divergences can
94
possibly come from the Wronskian in the denominator, since y1 and y2 themselves are just
1
sine functions. Sure enough, we see from (4.282) that the Wronskian vanishes if 2 = N ,
or, in other words, at = N 2 . So indeed the pole structure of the Green function is the
same in the two expressions.
95
5 Functions of a Complex Variable
The extension from the real number system to complex numbers is an important one both
within mathematics itself, and also in physics. The most obvious area of physics where
they are indispensable is quantum mechanics, where the wave function is an intrinsically
complex object. In mathematics their use is very widespread. One very important point is
that by generalising from the real to the complex numbers, it becomes possible to treat the
solution of polynomial equations in a uniform manner, since now not only equations like
x2 1 = 0 but also x2 + 1 = 0 can be solved.
The complex numbers can be defined in terms of ordered pairs of real numbers. Thus
we may define the complex number z to be the ordered pair z = (x, y), where x and y are
real. Of course this doesnt tell us much until we give some rules for how these quantities
behave. If z = (x, y), and z = (x , y ) are two complex numbers, and a is any real number,
then the rules can be stated as
z + z = (x + x , y + y ) , (5.1)
a z = (a x, a y) , (5.2)
z z = (x x y y , x y + x y) . (5.3)
z = (x, y) . (5.4)
Note that a complex number of the form z = (x, 0) is therefore real, according to the rule
(5.4) for complex conjugation; z = z. We can write such a real number in the time-honoured
way, simply as x. Thus we may define
(x, 0) = x . (5.5)
The modulus of z, denoted by |z|, is defined as the positive square root of |z|2 z z,
which, from (5.3), (5.4) and (5.5), is given by
96
1. Commutative and Associative Laws of Addition:
z1 + z2 = z2 + z1 ,
z1 z2 = z2 z1 ,
3. Distributive Law:
(z1 + z2 ) z3 = z1 z3 + z2 z3 . (5.9)
We can also define the operation of division. If z1 z2 = z3 , then we see from the previous
rules that, multiplying by z1 , we have
z3 z1 z3
z2 = = . (5.11)
|z1 |2 z1
(The expression z3 z1 /|z1 |2 here defines what we mean by z3 /z1 .) The fact that we can
solve z1 z2 = z3 for z2 when z1 is non-zero, effectively by dividing out by z1 , is a slightly
non-trivial property of the complex numbers, which we needed to check before we could
assume it to be true. The analogous feature for real numbers is, of course, totally familiar,
and we use it every day without a moments thought. This property of the real number
system and the complex number system is described by the statement that the real numbers
and the complex numbers both form Division Algebras.
We can, of course, recognise that from the previous rules that the square of the complex
number (0, 1) is (1, 0), which from (5.5) is simply 1. Thus we can view (0, 1) as being
the square root of 1:
(0, 1) = i = 1 . (5.12)
As can be seen from (5.4), it has the following property under complex conjugation:
In other words, i = i.
97
Note that in our description of the complex numbers as ordered pairs of real numbers,
this is the first time that the symbol i, meaning the square root of minus 1, has appeared.
We can, of course, re-express everything we did so far in the more familiar notation, in
which we write the complex number z = (x, y) as
z = x + iy . (5.14)
where if a and b are real, then A is complex; if a and b are complex, then A is a quaternion;
and if a andf b are quaternions, then A is an octonion. We need to specify two rules,
namely the rule for multiplication, and the rule for conjugation. These rules can be stated
completely generally, applying to all four of the division algebras: If A = (a, b) and B =
(c, d), then
AB = (a c db, d a + b c) , (5.16)
A = (a, b) . (5.17)
98
We also have the rules for multiplication by a real number , and for addition:
A = ( a, b) , A + B = (a + c, b + d) . (5.18)
The rules (5.16) and (5.17), together with (5.18), are all that one needs in order to define
the four division algebras. Note that they reduce to the rules (5.3) and (5.4), together with
(5.1) and (5.2), in the case of the complex number system, for which a, b, c and d are
just real numbers. Note in particular that in this case, the conjugations (the bars) on the
quantities appearing on the right-hand sides of (5.16) and (5.17) serve no purpose, since a,
b, c and d are real. But they do no harm, either, and the nice thing is that (5.16) and (5.17)
are completely universal rules, applying to all the division algebras.
It is straightforward to check from (5.16) that the multiplication of quaternions is non-
commutative; i.e. in general
AB 6= BA . (5.19)
This happens because of the conjugations on the right-hand side of (5.16), from which we
easily see that
da + bc) ,
AB = (ac db, while BA = (ca bd, bc + da) . (5.20)
(Of course the ordering of the lower-case quantities on the right-hand sides are irrelevant
here since they are just complex numbers.) It can also be checked that the multiplication
rule for quaternions is still associative:
Note, however, that one cannot just assume this; rather, one applies the multiplication rule
(5.16) and checks it. A further property of the quaternions, again derivable from the rules
given above, is that the conjugation of the product AB gives
AB = B A . (5.22)
Notice that all the features of the quaternions, including the non-commutativity of
multiplication, and the conjugation property (5.22), are familiar from the way in which
matrices work (with conjugation now understood to be Hermitean conjugation). In fact, as
we shall discuss in section 5.1.1 below, one can represent quaternions by 2 2 matrices.
If we now go to octonions, then it is straightforward to check from the universal rules
given above in (5.16) and (5.17), together with (5.1) and (5.2), that not only is their
multiplication non-commutative, but it is also non-associative; i.e. in general
99
This non-associativity is a direct consequence of the non-commutativity of the multiplication
rule for the quaternions that now appear on the right-hand side of (5.16) and (5.17). Notice
that the order in which the symbols appear in the expressions on the right-hand side of (5.16)
is absolutely crucial now. Of course it did not matter when we were defining quaternions,
because the multiplication rule for the complex numbers is commutative. But it does matter
now when we multiply octonions, because the constituent quaternions do not multiply
commutatively.
Notice that from the universal rules given above we can also derive that the conjugation
of a product of two octonions satisfies the same property (5.22) that we saw above for the
quaternions. However, it should be emphasised that unlike the quaternions, which can be
represented by 2 2 matrices, the octonions can not be represented by matrices. To see
this, we need look no further than the multiplicative non-associativity displayed in equation
(5.23): Matrix multiplication is associative, and therefore octonions cannot be represented
by matrices.
The crucial feature that all four of these number systems have is that they are Division
Algebras. First, notice that in all cases we have
AA = AA = aa + bb . (5.25)
We have the property that |A|2 0, with equality if and only if A = 0. At the risk of
sounding like the proverbial broken gramophone record (for those old enough to remember
what they are), we can emphasise again here that all the statements being made here can
be verified directly using just (5.16) and (5.17), together with (5.1) and (5.2).
For the quaternions, it is then obvious that they form a division algebra; we just multiply
AB = C on the left by A, and use
and hence, dividing out by the non-zero real number |A|2 , we get
AC
B= . (5.27)
|A|2
Note that we cannot simply write this as
C
B= , (5.28)
A
100
since the quotient operation is ambiguous. In this case, since the quaternions can in fact
be represented by matrices, we can in fact define the notion of the inverse of a quaternion,
A1 A|A|1 , and (5.27) can then be rewritten with a left-multiplication by the inverse,
as
B = A1 C . (5.29)
For the octonions, we can actually perform the identical calculation, but in order to do
this we must first check a slightly non-obvious fact. We noted that in general, octonionic
multiplication is non-associative. Looking at (5.26) we see that we assumed that A(AB) =
(AA)B. This is no problem for quaternions, since their multiplication rule is associative,
but it looks dangerous for octonions. However, one can check (from, as always, (5.16) and
(5.17)), that in the special case of multiplying the three octonions AAB (in other words,
the special case when two of the adjacent octonions are conjugates of each other), the
multiplication is associative, and so the steps in (5.26) are still valid. This then proves that
the octonions do indeed form a division algebra.
It is worth noting, however, that we only just got away with it for the octonions. If
we try going to the next natural generalisation of the octonions, which we might call the
hexadecions, defined as ordered pairs of octonions obeying the rules (5.16) and (5.17),
then we find that our luck has run out; we do not get a division algebra.
As we have already emphasised, the rules described above specify the four division algebras
(i.e. reals, complex, quaternions and octonions) completely, and one does not need any other
knowledge or definitions in order to manipulate them. However, it is sometimes useful to
have other ways to think about them, analogous to the way in which we introduce the
symbol i, as the square root of minus 1, for the complex numbers.
We give some more details about these other ways of thinking of quaternions and octo-
nions in this section. Some of the discussion is a little repetitive, since it is drawn from an
earlier version of these lecture notes. This entire section can be ignored, if desired.
The next extension beyond the complex numbers is, as has been said above, to the
quaternions. Another way of thinking about them is to say that one now has three inde-
pendent imaginary units, usually denoted by i, j and k, subject to the rules
i2 = j2 = k2 = 1 , i j = j i = k , j k = k j = i , k i = i k = j . (5.30)
101
A quaternion q is then a quantity of the form
q = q0 + q1 i + q2 j + q3 k , (5.31)
where q0 , q1 , q2 and q3 are all real numbers. There is again an operation of complex
conjugation, q, in which the signs of all three of i, j and k are reversed
q = q0 q1 i q2 j q3 k , (5.32)
The modulus |q| of a quaternion q is a real number, defined to be the positive square root
of
|q|2 q q = q q = q02 + q12 + q22 + q32 . (5.33)
|q q | = |q| |q | , (5.34)
1 = i , 2 = j , 3 = k , (5.35)
a b = ab + abc c , (5.36)
Note that the Einstein summation convention for the repeated c index is understood, so
(5.36) really means
3
X
a b = ab + abc c . (5.38)
c=1
102
In fact, one can recognise this as the multiplication algebra of 1 times the Pauli
matrices a of quantum mechanics, a = 1 a , which can be represented as
! ! !
0 1 0 1 1 0
1 = , 2 = , 3 = . (5.39)
1 0 1 0 0 1
(We use the rather clumsy notation 1 here to distinguish this ordinary square root of
minus one from the i quaternion.) In this representation, the quaternion defined in (5.31)
is therefore written as
!
q0 1 q3 1 q1 q2
q= . (5.40)
1 q1 + q2 q0 + 1 q3
Since the quaternions are now represented by matrices, it is immediately clear that we shall
have associativity, A(BC) = (AB)C, but not commutativity, under multiplication.
As a final remark about the quaternions, note that we can equally well view them as we
did previously, as an ordered pair of complex numbers. Thus we may define
q = (a, b) = a + b j = a0 + a1 i + b0 j + b1 k , (5.41)
(a + b j)(c + d j) = a c + b j d j + a d j + b j c
= a c + b dj2 + a d j + b c j
+ (a d + b c) j
= (a c b d)
a d + b c) .
= (a c b d, (5.43)
Note that the complex conjugations in this expression arise from taking the quaternion j
through the quaternion i, which generates a minus sign, thus
j c = j (c0 + c1 i) = c0 j + c1 j i
= c0 j c1 i j = (c0 c1 i) j = c j . (5.44)
Notice that the way quaternions are defined here as ordered pairs of complex numbers
is closely analogous to the definition of the complex numbers themselves as ordered pairs of
103
real numbers. The multiplication rule (5.42) is also very like the multiplication rule in the
last line in (5.2) for the complex numbers. Indeed, the only real difference is that for the
quaternions, the notion of complex conjugation of the constituent complex numbers arises.
It is because of this that commutativity of the quaternions is lost.
The next stage after the quaternions is the octonions, where one has 7 independent
imaginary units. The rules for how these combine is quite intricate, leading to the rather
splendidly-named Zorn Product between two octonions. It turns out that for the octonions,
not only does one not have multiplicative commutativity, but multiplicative associativity is
also lost, meaning that A (B C) 6= (A B) C in general.
For the octonions, let us denote the 7 imaginary units by a , where now 1 a 7.
Their multiplication rule is reminiscent of (5.36), but instead is
a b = ab + cabc c , (5.45)
where cabc are a set of totally-antisymmetric constant coefficients, and the Einstein sum-
mation convention is in operation, meaning that the index c in the last term is understood
to be summed over the range 1 to 7. Note that the total antisymmetry of cabc means that
the interchange of any pair of indices causes a sign change; for example, cabc = cbac . A
convenient choice for the cabc , which are known as the structure constants of the octonion
algebra, is
c147 = c257 = c367 = c156 = c264 = c345 = 1 , c123 = +1 . (5.46)
w = w0 + wa a . (5.47)
There is a notion of an octonionic conjugate, where the signs of the 7 imaginary units are
reversed:
w = w0 wa a , (5.48)
104
One can verify from (5.46) that
where an absolutely crucial point is that cbcde is also totally antisymmetric. In fact,
whilst
(3 1 ) 7 = c312 2 7 = 2 7 = c275 5 = +5 . (5.53)
So what does survive? An important thing that is still true for the octonions is that any
two of them, say w and w , will satisfy
|w w | = |w| |w | . (5.54)
Most importantly, all of the real, complex, quaternionic and octonionic algebras are
division algebras. This means that the concept of division makes sense, which is perhaps
quite surprising in the case of the octonions. Suppose that A, B and C are any three
numbers in any one of these four number systems. First note that we have
A (A B) = (A A) B . (5.55)
This is obvious from the associativity for the real, complex or quaternionic algebras. It is
not obvious for the octonions, since they are not associative (i.e. A (B C) 6= (A B) C), but
a straightforward calculation using the previously-given properties shows that it is true for
the special case A (A B) = (A A) B. Thus we can consider the following manipulation. If
A B = C, then we will have
A (A B) = |A|2 B = A C . (5.56)
Hence we have
A C
B= , (5.57)
|A|2
where we are allowed to divide by the real number |A|2 , provided that it doesnt vanish.
Thus as long as A 6= 0, we can give meaning to the division of C by A. This shows that all
four of the number systems are division algebras.
105
Finally, note that again we can define the octonions as an ordered pair of the previous
objects, i.e. quaternions, in this chain of real, complex, quaternionic and octonionic division
algebras. Thus we define the octonion w = (a, b) = a + b 7 , where a = a0 + a1 i + a2 j + a3 k
and b = b0 + b1 i + b2 j + b3 k are quaternions, and i = 1 , j = 2 and k = 3 . The conjugate
of w is given by w = (a, b). It is straightforward to show, from the previously-given
multiplication rules for the imaginary octonions, that the rule for multiplying octonions
w = (a, b) and w = (c, d) is
w w = (a c db, d a + b c) . (5.58)
This is very analogous to the previous multiplication rule (5.42) that we found for the
quaternions. Note, however, that the issue of ordering of the constituent quaternions in
these octonionic products is now important, and indeed a careful calculation from the
multiplication rules shows that the ordering must be as in (5.58). In fact (5.58) is rather
general, and encompasses all three of the multiplication rules that we have met. As a rule
for the quaternions, the ordering of the complex-number constituents in (5.58) would be
unimportant, and as a rule for the complex numbers, not only the ordering but also the
complex conjugation of the real-number constituents would be unimportant.
After discussing the generalities of division algebras, let us return now to the complex
numbers, which is the subject we wish to develop further here. Since a complex number
z is an ordered pair of real numbers, z = (x, y), it is natural to represent it as a point in
the two-dimensional plane, whose Cartesian axes are simply x and y. This is known as the
Complex Plane, or sometimes the Argand Diagram. Of course it is also often convenient to
employ polar coordinates r and in the plane, related to the Cartesian coordinates by
106
we can see that in particular, in the power series expansion of ei the real terms (even powers
of assemble into the power series for cos , whilst the imaginary terms (odd powers of )
assemble into the series for sin . In other words
Turning this around, which can be achieved by adding or subtracting the comlex conjugate,
we find
cos = 12 (ei + ei ) , sin = 1 i
2i (e ei ) . (5.63)
z = r ei . (5.64)
Note that we can also write this as z = |z| ei . The angle is known as the phase, or the
argument, of the complex number z. When complex numbers are multiplied together, the
phases are additive, and so if z1 = |z1 | ei 1 and z2 = |z2 | ei 2 , then
where we write z1 = |z1 | ei1 and z2 = |z2 | ei2 . (The inequality follows from the fact that
cos 1.) By induction, the inequality (5.66) extends to any finite number of terms:
Having introduced the notion of complex numbers, we can now consider situations where
we have a complex function depending on a complex argument. The most general kind of
107
possibility would be to consider a complex function f = u+i v, where u and v are themselves
real functions of the complex variable z = x + i y;
As it stands, this notion of a function of a complex variable is too broad, and con-
sequently of limited value. If functions are to be at all interesting, we must be able to
differentiate them. Suppose the function f (z) is defined in some region, or domain, D in
the complex plane (the two-dimensional plane with Cartesian axes x and y). We would
naturally define the derivative of f at a point z0 in D as the limit of
f (z) f (z0 ) f
= (5.70)
z z0 z
as z approaches z0 . The key point here, though, is that in order to be able to say the
limit, we must insist that the answer is independent of how we let z approach the point
z0 . The complex plane, being 2-dimensional, allows z to approach z0 on any of an infinity
of different trajectories at different angles. We would like the answer to be unique.
A classic example of a function of z whose derivative is not unique is f (z) = |z|2 = z z.
Thus from (5.70) we would attempt to calculate the limit
|z|2 |z0 |2 z z z0 z0 z z0
= = z + z0 (5.71)
z z0 z z0 z z0
as z approaches z0 . Now, if we write z z0 = |z z0 | ei , we see that this becomes
which shows that, except at z0 = 0, the answer depends on the angle at which z approaches
z0 in the complex plane. One say that the function |z|2 is not differentiable except at z = 0.
The interesting functions f (z) to consider are those which are differentiable in some
domain D in the complex plane. Placing the additional requirement that f (z) be single
valued in the domain, we have the definition of an analytic function, sometimes known as a
holomorphic function. Thus:
Let us look at the conditions under which a function is analytic in D. It is easy to derive
necessary conditions. Suppose first we take the limit in (5.70) in which z + z approaches
z along the direction of the real axis (the x axis), so that z = x;
f u + i v ux x + i vx x
= = = ux + i vx , (5.73)
z x + i y x
108
where ux denotes u/x, etc., and so in general u = ux x + uy y, etc. (Clearly for (5.73)
to be well-defined the partial derivatives ux u/x and vx v/x must exist.)
Now suppose instead we approach along the imaginary axis, z = i y so that now
f u + i v uy y + i vy y
= = = i uy + vy . (5.74)
z x + i y i y
(This time, we require that the partial derivatives uy and vy exist.) If this is to agree with
the previous result from approaching along x, we must have ux + i vx = vy i uy , which,
equating real and imaginary parts, gives
ux = vy , uy = vx . (5.75)
These conditions are known as the Cauchy-Riemann equations. It is easy to show that we
would derive the same conditions if we allowed z to lie along any ray that approaches z at
any angle.
The Cauchy-Riemann equations by themselves are necessary but not sufficient for the
analyticity of the function f . The full statement is the following:
There is a nice alternative way to view the Cauchy-Riemann equations. Since z = x+i y,
and hence z = x i y, we may solve to express x and y in terms of z and z:
x = 21 (z + z) , y = 2i (z z) . (5.76)
Formally, we can think of z and z as being independent variables. Then, using the chain
rule, we shall have
x y
z = + = 12 x 2i y ,
z z x z y
x y
z = + = 12 x + 2i y , (5.77)
z z x z y
where x /x and y /y. (Note that z means a partial derivative holding z fixed,
etc.) So if we have a complex function f = u + i v, then z f is given by
z f = 12 ux + 2i uy + 2i vx 12 vy , (5.78)
20
A function f (z) is continuous at z0 if, for any given > 0 (however small), we can find a number such
that |f (z) f (z0 )| < for all points z in D satisfying |z z0 | < .
109
which vanishes by the Cauchy-Riemann equations (5.75).21 So the Cauchy-Riemann equa-
tions are equivalent to the statement that the function f (z) depends on z but not on z. We
now see instantly why the function f = |z|2 = z z was not in general analytic, although it
was at the origin, z = 0.
We have seen that the real and imaginary parts u and v of an analytic function satisfy the
Cauchy-Riemann equations (5.75). From these, it follows that uxx = vyx = vxy = uyy , and
similarly for v. In other words, u and v each satisfy Laplaces equation in two dimensions:
2 2
2 u = 0 , 2 v = 0 , where 2 + . (5.79)
x2 y 2
This is a very useful property, since it provides us with ways of solving Laplaces equation
in two dimensions. It has applications in 2-dimensional electrostatics and gravity, and in
hydrodynamics.
Another very important consequence is that we can use the properties (5.79) in reverse,
in the following sense. We have shown that if u is the real part of an analytic function,
then it satisfies 2 u = 0. In fact the implication goes in the other direction too; if u(x, y)
satisfies the Laplace equation uxx + uyy = 0 then it follows that it can be taken to be the
real part of some analytic function. We can say that uxx + uyy = 0 is the integrability
condition for the pair of equations ux = vy , uy = vx to admit a solution for v(x, y).
To solve for v(x, y), one differentates u(x, y) with respect to x or y, and integrates with
respect to y or x respectively, to construct the function v(x, y) using (5.75):
Z
u(x, y )
y
v(x, y) = dy + (x) ,
y0 x
Z x
u(x , y)
v(x, y) = dx + (y) . (5.80)
x0 y
The first integral, which comes from integrating ux = vy , leaves an arbitrary function
of x unresolved, while the second, coming from integrating uy = vx , leaves an arbitrary
function of y unresolved. Consistency between the two resolves everything, up to an additive
constant in v(x, y). This constant never can be determined purely from the given data, since
clearly if f (z) is analytic then so is f (z) + i , where is a real constant. But the real parts
21
One might feel uneasy with treating z and z as independent variables, since one is actually the complex
conjugate of the other. The proper way to show that it is a valid procedure is temporarily to introduce a
genuinely independent complex variable z, and to write functions as depending on z and z, rather than z
and z. After performing the differentiations in this enlarged complex 2-plane, one then sets z = z, which
amounts to taking the standard section that defines the complex plane. It then becomes apparent that
one can equally well just treat z and z as independent, and cut out the intermediate step of enlarging the
dimension of the complex space.
110
of f (z) and f (z) + i are identical, and so clearly we cannot deduce the value of , merely
from the given u(x, y). Note that we do need both equations in (5.80), in order to determine
v(x, y) up to the additive constant . Of course the freedom to pick different constant lower
limits of integration y0 and x0 in (5.80) just amounts to changing the arbitrary functions
(x) and (y), so we can choose y0 and x0 in any way we wish.
Let us check this with an example. Suppose we are given u(x, y) = ex cos y, and asked
to find v(x, y). A quick check shows that uxx + uyy = 0, so we will not be wasting our time
by searching for v(x, y). We have
Sure enough, the two expressions are compatible, and we see that (x) = (y). By the
standard argument that is the same as one uses in the separation of variables, it must be
that (x) = (y) = , where is a (real) constant. Thus we have found that v(x, y) =
ex sin y + , and so
= ez + i . (5.83)
ux vx + uy vy = 0 , (5.84)
where
~ ( , )
(5.86)
x y
is the 2-dimensional gradient operator. Equation (5.85) says that families of curves in the
(x, y) plane corresponding to u = constant and v = constant intersect at right-angles at
~ is perpendicular to the lines of constant u,
all points of intersection. This is because u
~ is perpendicular to the lines of constant v. Since we are in two dimensions here, it
while v
follows that if the perpendicular to line u = constant is perpendicular to the perpendicular
to the line v = constant, then the lines u = constant must be perpendicular to the lines
v = constant wherever they intersect.
111
5.2.1 Power Series
A very important concept in complex variable theory is the idea of a power series, and its
P
radius of convergence. We could consider the infinite series n=0 an (z z0 )n , but since a
simple shift of the origin in the complex plane allows us to take z0 = 0, we may as well
make life a little bit simpler by assuming this has been done. Thus, let us consider
X
f (z) = an z n , (5.87)
n=0
which is clearly a sum of non-negative terms. If this converges, then |f (z)| is finite, and so
the series (5.87) is clearly convergent. If
1
|an | n 1/R (5.91)
as n , then it is evident that the power series (5.87) is absolutely convergent if |z| < R,
and divergent if |z| > R. (As always, the borderline case |z| = R is trickier, and depends
on finer details of the coefficients an .) The quantity R is called the radius of convergence
of the series. The circle of radius R (centred on the expansion point z = 0 in our case) is
called the circle of convergence. The series (5.87) is absolutely convergent for any z that
lies within the circle of convergence.
We can now establish the following theorem, which is of great importance.
If f (z) is defined by the power series (5.87), then f (z) is an analytic function at every
point within the circle of convergence.
112
This is all about establishing that the power series defining f (z) is differentiable within
the circle of convergence. Thus we define
X
(z) = n an z n1 , (5.92)
n=1
without yet prejudging that (z) is the derivative of f (z). Suppose the series (5.87) has
radius of convergence R. It follows that for any such that 0 < < R, |an n | must be
bounded, since we know that even the entire infinite sum is bounded. We may say, then,
that |an n | < K for any n, where K is some positive number. Then, defining r = |z|, and
= |h|, it follows that if r < and r + < , we have
X (z + h)n z n
f (z + h) f (z)
(z) = an n z n1 . (5.93)
h n=0
h
and the series obtained by differentiating this with respect to x.) Clearly the last line in
(5.96) tends to zero as goes to zero. This proves that (z) given in (5.92) is indeed the
derivative of f (z). Thus f (z), defined by the power series (5.87), is differentiable within its
circle of convergence. Since it is also manifestly single-valued, this means that it is analytic
with the circle of convergence.
113
It is also clear that the derivative f (z), given, as we now know, by (5.92), is has the
same radius of convergence as the original series for f (z). This is because the limit of
|n an |1/n as n tends to infinity is clearly the same as the limit of |an |1/n . The process of
differentiation can therefore be continued to higher and higher derivatives. In other words,
a power series can be differentiated term by term as many times as we wish, at any point
within its circle of convergence.
A very important result in the theory of complex functions is Cauchys Theorem, which
states:
where we have separated the original integral into its real and imaginary parts. Written in
this way, each of the contour integrals can be seen to be nothing but a closed line integral
of the kind familiar, for example, in electromagnetism. The only difference here is that
we are in two dimensions rather than three. However, we still have the concept of Stokes
Theorem, known as Greens Theorem in two dimensions, which asserts that
I Z
~ d~ =
E ~ E
~ dS
~, (5.100)
C S
where C is a closed curve bounding a domain S, and E ~ is any vector field defined in S
~ just
and on C, with well-defined derivatives in S. In two dimensions, the curl operator
means
~ = Ey Ex .
~ E (5.101)
x y
114
~ d~ means Ex dx +
(It is effectively just the z component of the three-dimensional curl.) E
Ey dy, and the area element dS ~ will just be dx dy.
Applying Greens theorem to the integrals in (5.99), we therefore obtain
I Z Z
v u u v
f (z) dz = + dx dy + i dx dy . (5.102)
C S x y S x y
But the integrands here are precisely the quantities that vanish by virtue of the Cauchy-
H
Riemann equations (5.75), and thus we see that f (z) dz = 0, verifying Cauchys theorem.
An alternative proof of Cauchys theorem can be given as follows. Define first the slightly
more general integral I
F () f (z) dz ; 0 1, (5.103)
(The prime symbol always means the derivative of a function with respect to its argument.)
Now integrate the second term by parts, giving
I I
1 1
F () = f (z) dz + [ z f (z)] f (z) dz
= [z f (z)] , (5.105)
where the square brackets indicate that we take the difference between the values of the
enclosed quantity at the beginning and end of the integration range. But since we are
integrating around a closed curve, and since z f (z) is a single-valued function, this must
vanish. Thus we have established that F () = 0, implying that F () = constant. We can
determine this constant by considering any value of we wish. Taking = 0, it is clear
from (5.103) that F (0) = 0, whence F (1) = 0, proving Cauchys theorem.
Why did we appear not to need the Cauchy-Riemann equations (5.75) in this proof?
The answer, of course, is that effectively we did, since we assumed that we could sensibly
talk about the derivative of f , called f . As we saw when we discussed the Cauchy-Riemann
equations, they are the consequence of requiring that f (z) have a well-defined meaning.
Cauchys theorem has very important implications in the theory of integration of com-
plex functions. One of these is that if f (z) is an analytic function in some domain D, then
if we integrate f (z) from points z1 to z2 within D the answer
Z z2
f (z) dz (5.106)
z1
115
is independent of the path of integration within D. This follows immediately by noting that
if we consider two integration paths P1 and P2 then the total path consisting of integration
from z1 to z2 along P1 , and then back to z1 in the negative direction along P2 constitutes
a closed curve C = P1 P2 within D. Thus Cauchys theorem tells us that
I Z Z
0= f (z) dz = f (z) dz f (z) dz . (5.107)
C P1 P2
where the contour of integration can be taken to be any path within the domain of analyt-
icity. Notice that the integrated function, g(z), has the same domain of analyticity as the
integrand f (z). To show this, we just have to show that the derivative of g(z) is unique.
This (almost self-evident) property can be made evident by considering
g(z) g()
f () , (5.109)
z
116
the two by a narrow causeway of infinitesimally-separated parallel paths. This creates a
total contour which, by construction, contains no singularities at all. The integrals in and
out along the causeway cancel each other in the limit when the separation of the two paths
becomes zero, and hence, taking into account the orientations of the two contributions C
and Ce in the total closed path, we find from Cauchys theorem that
I I
f (z)dz = f (z)dz . (5.112)
C e
C
Finally, on the subject of Cauchys theorem, let us note that we can turn it around,
and effectively use it as a definition of an analytic function. This is the content of Moreras
Theorem, which states:
H
If f (z) is continuous and single-valued within a closed contour C, and if f (z) dz = 0
for any closed contour within C, then f (z) is analytic within C.
This can provide a useful way of testing whether a function is analytic in some domain.
Suppose that the function f (z) is analytic in a domain D. Consider the integral
I
f (z)
G(a) = dz , (5.113)
C za
where the contour C is any closed curve within D. There are three cases to consider,
depending on whether the point a lies inside, on, or outside the contour of integration C.
Consider first the case when a lies within C. By an observation in the previous section,
we know that the value of the integral (5.113) will not alter if we deform the contour in
any way provided that the deformation does not cross over the point z = a. We can exploit
this in order to make life simple, by deforming the contour into a small circle C , of radius
, centred on the point a. Thus we may write
z a = ei , (5.114)
117
In the limit as tends to zero, the continuity of the function f (z) implies that the last
integral will vanish, since f (a + ei ) = f (a) + f (a) ei + , and so we have that if f (z)
is analytic within and on any closed contour C then
I
f (z)
dz = 2 i f (a) , (5.116)
C za
We can view the Cauchy integral formula as a way of evaluating an analytic function at
a point z in terms of a contour integral around any closed curve C that contains z:
I
1 f () d
f (z) = . (5.119)
2 i C z
A very useful consequence from this is that we can use it also to express the derivatives of
f (z) in terms of contour integrals. Essentially, one just differentiates (5.119) with respect to
z, meaning that on the right-hand side it is only the function ( z)1 that is differentiated.
We ought to be a little careful just once to verify that this differentiation under the integral
is justified, so that having established the validity, we can be cavalier about it in the future.
The demonstration is in any case pretty simple. We have
I
f (z + h) f (z) 1 f () 1 1
= d ,
h 2 i h zh z
I
1 f () d
= . (5.120)
2 i ( z)( z h)
Now in the limit when h 0 the left-hand side becomes f (z), and thus we get
I
1 f () d
f (z) = . (5.121)
2 i ( z)2
118
The question of the validity of this process, in which we have taken the limit h 0 under
the integration, comes down to whether it was valid to assume that
I
1 1
T f () d
( z)2 ( z h)( z)
I
f () d
= h (5.122)
( z)2 ( z h)
for some function g(z) integrated around the closed curve C. Using the inequality (5.68),
generalised to a sum over an infinity of infinitesimal complex numbers, we have
I I
|G| = g(z) dz |g(z)| |dz| . (5.124)
C C
Now, clearly this last integral, which sums up quantities that are all positive or zero, must
itself be less than or equal to the integral
I
|g|max |dz| , (5.125)
C
where we define |g|max to be the largest value that |g(z)| attains anywhere on the contour
C. Since this maximum value is of course just a constant, we have that
I
|G| |g|max |dz| . (5.126)
C
One further useful inequality may be written down in the case that g(z) is itself the ratio
of complex functions: g(z) = A(z)/B(z). Obviously, we can now say that the maximum
value of |g(z)| anywhere on the contour is bounded by
|A|max
|g|max , (5.129)
|B|min
119
where |A|max is the maximum value of |A(z)| anywhere on the contour C, and |B|min is the
minimum value of |B((z)| anywhere on the contour C. (In general, these two extrema will
occur at different points on the contour.) Thus we may say that
I A(z) |A|max
dz L, (5.130)
B(z) |B|min
|h| M L
|T | , (5.131)
b2 (b |h|)
where M is the maximum value of |f ()| on the contour, L is the length of the contour,
and b is the minimum value of of | z| on the contour. These are all fixed numbers,
independent of h, and so we see that indeed T must vanish as h is taken to zero.
More generally, by continuing the above procedure, we can show that the nth derivative
of f (z) is given by I
(n) 1 dn 1
f (z) = f () d , (5.132)
2 i dz n z
or, in other words, I
(n) n! f () d
f (z) = . (5.133)
2 i C ( z)n+1
Note that since all the derivatives of f (z) exist, for all point C within the contour C, it
follows that f (n) (z) is analytic within C for any n.
We can use Cauchys integral formula to derive Taylors theorem for the expansion of a
function f (z) around a point z = a at which f (z) is analytic. An important outcome from
this will be that we shall see that the radius of convergence of the Taylor series extends up
to the singularity of f (z) that is nearest to z = a.
From Cauchys integral formula we have that if f (z) is analytic inside and on a contour
C, and if z = a + h lies inside C, then
I
1 f () d
f (a + h) = . (5.134)
2 i ah
23
Note, by the way, that although we presented the argument above for the case of a closed contour, the
bounds (5.128) and (5.130) apply equally well to the case of an open contour that does not close on itself.
Of course, they will only be useful bounds if the length L is finite.
120
PN n
Now, bearing in mind that the geometric series n=0 x sums to give (1 xN +1 ) (1 x)1 ,
we have, by taking x = h/( a) that
hN+1
N
X hn 1 1 (a)N+1
=
n=0
( a)n+1 ( a) 1 h
a
1 hN +1
= . (5.135)
a h ( a h) ( a)N +1
We can use this identity as an expression for ( a h)1 in (5.134), implying that
N I I
X hn f () d hN +1 f () d
f (a + h) = + . (5.136)
n=0
2 i ( a)n+1 2 i ( a h) ( a)N +1
Now, if M denotes the maximum value of |f ()| on the contour C, then by taking C to
be a circle of radius r centred on = a, we shall have
|h|N +1 M r M r |h| N +1
|RN | N +1
= . (5.139)
(r |h|) r r |h| r
Thus if we choose h such that |h| < r, it follows that as N is sent to infinity, RN will go to
zero. This means that the Taylor series
X hn
f (a + h) = f (n) (a) , (5.140)
n=0
n!
or in other words,
X (z a)n
f (z) = f (n) (a) , (5.141)
n=0
n!
will be convergent for any z lying within the circle of radius r centred on z = a. But we can
choose this circle to be as large as we like, provided that it does not enclose any singularity
of f (z). Therefore, it follows that the radius of convergence of the Taylor series (5.140) is
precisely equal to the distance between z = a and the nearest singularity of f (z).
121
5.3.4 The Laurent Series
Suppose now that we want to expand f (z) around a point z = a where f (z) has a singularity.
Clearly the previous Taylor expansion will no longer work. However, depending upon the
nature of the singularity at z = a, we may be able to construct a more general kind of series
expansion, known as a Laurent series. To do this, consider two concentric circles C1 and
C2 , centred on the point z = a, where C1 has a larger radius that can be taken out as far
as possible before hitting the next singularity of f (z), while C2 is an arbitrarily small circle
enclosing a. Take the path C1 to be anticlockwise, while the path C2 is clockwise. We can
make C1 and C2 into a single closed contour C, by joining them along a narrow causeway,
as shown in Figure 1.
C1
z=a
11
00
00
11
C
2
11
00
00
11
z=a+h
The idea is that we will take a limit where the width of the causeway joining the inner
and outer circles shrinks to zero. In the region of the complex plane under discussion, the
function f (z) has, by assumption, only an isolated singularity at z = a.
Now consider Cauchys integral formula for this contour, which will give
I
1 f () d
f (a + h) = . (5.142)
2 i C ah
The reason for this is that the closed contour C encloses no singularities except for the pole
at = a + h. In particular, it avoids the singularity of f (z) at z = a. Since the integrand
122
is non-singular in the neighbourhood of the causeway, we see that when the width of the
causeway is taken to zero, we shall find that the integration along the lower road heading
in from C1 to C2 will be exactly cancelled by the integration in the opposite direction along
the upper road heading out from C2 to C1 . Furthermore, in the limit when the width
of the causeway goes to zero, the gaps in the contours C1 and C2 shrink to zero, and they
can be replaced by closed circular contours. In this sense, therefore, we can disregard the
contribution of the causeway, and just make the replacement that
I I I
+ . (5.143)
C C1 C2
For the first integral, around the large circle C1 , we can use the same expansion for ( a
h)1 as we used in the Taylor series previously, obtained by setting N = in (5.135), and
using the fact that the second term on the right-hand side then vanishes, since hN +1 /|
a|N +1 goes to zero on C1 when N goes to infinity, as a result of the radius of C1 being larger
than |h|. In other words, we expand ( a h)1 as
1 1
= ,
ah ( a)(1 h ( a)1 )
1 h h2
= 1+ + + , (5.145)
a a ( a)2
X hn
= .
n=0
( a)n+1
On the other hand, in the second integral in (5.144) we can expand ( a h)1 in a
series valid for | a| << |h|, namely
1 1
= ,
ah h(1 ( a) h1 )
1 a ( a)2
= 1+ + + , (5.146)
h h h2
X ( a)m1
= .
m=1
hm
Thus we find
I I
1 X f () d 1 X 1
f (a + h) = hn n+1
+ m
f () ( a)m1 d , (5.147)
2 i n=0 C1 ( a) 2 i m=1
h +
C2
123
where we define C2+ to mean the contour C2 but with the direction of the integration path
reversed, i.e. C2+ runs anti-clockwise around the point = a, which means it is now the
standard positive direction for a contour. Thus we have
X
f (a + h) = an hn , (5.148)
n=
Here, the integration contour is C1 when evaluating an for n 0, and C2+ when evaluating
an for n < 0. Notice that we can in fact just as well choose to use the contour C1 for the
n < 0 integrals too, since the deformation of the contour C2+ into C1 does not cross any
singularities of the integrand when n < 0.
Note that using the original variable z = a + h, (5.148) is written as
X
f (z) = an (z a)n . (5.150)
n=
The expansion in (5.150) is known as the Laurent Series. By similar arguments to those
we used for the Taylor series, one can see that the series converges in an annulus whose
larger radius is defined by the contour C1 . This contour could be chosen to be the largest
possible circle centred on the singularity of f (z) at z = a that does not enclose any other
singularity of f (z).
In the Laurent series, the function f (z) has been split as the sum of two parts:
The part f+ (z) (the terms with n 0 in (5.150)) is analytic everywhere inside the larger
circle C1 . The part f (z) (the terms with n 1 in (5.150)) is analytic everywhere outside
the small circle C2 enclosing the singularity as z = a.
In practice, one commonly wants to work out just the first few terms in the Laurent
expansion of a function around a singular point. For example, it is often of interest to know
the singular terms, corresponding to the inverse powers of (z a) in (5.150). If the function
in question has a pole of degree N at the expansion point, then there will just be N singular
terms, corresponding to the powers (z a)N down to (z a)1 . For reasons that we shall
see later, the coefficient of (z a)1 is often of particular interest.
124
Determining the first few terms in the expansion of a function with a pole at z = a is
usually pretty simple, and can just be done by elementary methods. Suppose, for example,
we have the function
g(z)
f (z) = , (5.153)
zN
where g(z) is analytic, and that we want to find the Laurent expansion around the point
z = 0. Since g(z) is analytic, it has a Taylor expansion, which we can write as
X
g(z) = bm z m . (5.154)
m0
1 X
f (z) = bm z m
z N m0
X
= bm z mN
m0
X
= bn+N z n . (5.155)
nN
1 1 2 1 3
f (z) = 1 + z + 2 z + 6 z +
z2
1 1 1 1
= + + + z + . (5.156)
z2 z 2 6
In more complicated examples, there might be an analytic function that goes to zero
in the denominator of the function f (z). We can still work out the first few terms in the
Laurent expansion by elementary methods, by writing out the Taylor expansion of the
function in the denominator. Consider, for example, the function f (z) = 1/ sin z, to be
expanding in a Laurent series around z = 0. We just write out the first few terms in the
Taylor series for sin z,
sin z = z 61 z 3 + 1 5
120 z +
= z 1 16 z 2 + 1 4
120 z + . (5.157)
Notice that on the second line, we have pulled out the overall factor of z, so that what
remains inside the parentheses is an analytic function that does not go to zero at z = 0.
Now, we write
1 1 1
f (z) = = 1 61 z 2 + 1 4
120 z + , (5.158)
sin z z
125
and the problem has reduced to the kind we discussed previously. Making the expansion of
the term in parentheses using (1 + x)1 = 1 x + x2 x3 + , we get
1
f (z) = 1 + 61 z 2 + 7 4
360 z + , (5.159)
z
1 1 7 3
= + 16 z + 360 z + . (5.160)
sin z z
Note that if we had only wanted to know the pole term, we would not have needed to push
the series expansion as far as we just did. So as a practical tip, time can be saved by working
just to the order needed, and no more, when performing the expansion. (One must take
care, though, to be sure to take the expansion far enough.)
We are now in a position to classify the types of singularity that a function of a complex
variable may possess.
Suppose that f (z) has a singularity at z = a, and that its Laurent expansion for f (a+h),
given in general in (5.148), actually terminates at some specific negative value of n, say
n = N . Thus we have
X
f (a + h) = an hn . (5.161)
n=N
We then say that f (z) has a pole of order N at z = a. In other words, as z approaches a
the function f (z) has the behaviour
aN
f (z) = + less singular terms . (5.162)
(z a)N
If, on the other hand, the sum over negative values of n in (5.148) does not terminate,
but goes on to n = , then the function f (z) has an essential singularity at z = a. A
classic example is the function
1
f (z) = e z . (5.163)
126
and setting w = 1/z. The Laurent series (5.164) has terms in arbitrarily negative powers
of z, and so z = 0 is an essential singularity.
Functions have quite a complicated behaviour near an essential singularity. For example,
if z approaches zero along the positive real axis, e1/z tends to infinity. On the other hand, if
the approach to zero is along the negative real axis, e1/z instead tends to zero. An approach
to z = 0 along the imaginary axis causes e1/z to have unit modulus, but with an ever-
increasing phase rotation. In fact a function f (z) with an essential singularity can take on
any value, for z near to the singular point.
Note that the Laurent expansion (5.148) that we have been discussing here is applicable
only if the singularity of f (z) is an isolated one.24 There can also exist singularities of a
different kind, which are neither poles nor essential singularities. Consider, for example,
the functions f (z) = z, or f (z) = log z. Neither of these can be expanded in a Laurent
series around z = 0. They are both discontinuous along an entire semi-infinite line starting
from the point z = 0. Thus the singularity at z = 0 is not an isolated one; it is called a
branch point. We shall discuss these in more detail later.
For now, just take note of the fact that a singularity in a function does not necessarily
mean that the function is infinite there. By definition, a function f (z) is singular at z = a
if it is not analytic at z = a. Thus, for example, f (z) = z 1/2 is singular at z = 0, even
though f (0) = 0. This can be seen from the fact that we cannot expand z 1/2 as a power
series around z = 0, and therefore z 1/2 cannot be analytic there. It is also the case that
although f (0) is finite, the derivatives of f (z) are infinite at z = 0.
For now, let us look in a bit more detail at functions with isolated singularities.
A very important, and initially perhaps rather surprising, result is the following, known as
Liouvilles Theorem:
A function f (z) that is analytic for all finite values of z and is bounded everywhere is
a constant.
Note that when we say f (z) is bounded everywhere (at finite z), we mean that there
exists some positive number S, which is independent of z, such that
|f (z)| S (5.166)
24
By definition, if a function f (z) has a singularity at z = a, then it is an isolated singularity if f (z) can
be expanded in a Laurent series around z = a.
127
for all finite z.
We can prove Liouvilles theorem using the result obtained earlier from Cauchys integral
formula, for f (a): I
1 f (z) dz
f (a) = . (5.167)
2 i (z a)2
Take the contour of integration to be a circle of radius R centred on z = a, which means
that the points z on the contour are defined by |z a| = R. Since we are assuming that
f (z) is bounded, we may take |f (z)| M for all points z on the contour, where M is some
finite positive number. Then, using (5.167), we must have
M M
|f (a)| (2 R) = . (5.168)
2 R2 R
Thus by taking R to infinity, and recalling our assumption that f (z) remains bounded for
all finite z (meaning that M is finite, in fact M S, no matter how large R is), we see that
f (a) must be zero. Thus f (a) is a constant, independent of a. Thus Liouvilles theorem is
established.
An illustration of Liouvilles theorem can be given with the following example. Suppose
we try to construct an analytic function that is well-behaved, and bounded, everywhere.
If we were considering real functions as opposed to complex analytic functions, we might
consider a function such as
1
f (x) = , (5.169)
1 + x2
which rather boringly falls off to zero as x tends to + or , having attained the exciting
peak of f = 1 at the origin. Thus as a real function of x, we have |f (x)| 1 everywhere.
However, viewed as a function of the variable z in the complex plane, it is unbounded:
1 1 i i
f (z) = 2
= = ,. (5.170)
1+z (z i)(z + i) 2(z + i) 2(z i)
Thus the function f (z) actually has poles at z = i, away from the z axis. Of course we
could consider instead the function
1
g(z) = , (5.171)
1 + |z|2
which certainly satisfies |g(z)| 1 everywhere. But g(z) is not analytic (since it depends
on z as well as z.)
Liouvilles theorem tells us that any bounded analytic function we try to construct is
inevitably going to have singularities somewhere, unless we are content with the humble
constant function.
A similar argument to the above allows us to extend Liouvilles theorem to the following:
128
If f (z) is analytic for all finite z, and if |f (z)| is bounded by S |z|k for some integer k
and some constant S, i.e. |f (z)/z k | S for all finite z, then f (z) is a polynomial of
degree k.
To show this, we follow the same strategy as before, by using the higher-derivative
consequences of Cauchys integral:
I
(n) n! f (z) dz
f (a) = . (5.172)
2 i (z a)n+1
Assume that |f (z)| M |z|k on the contour at radius R centred on z = a. Then we have
n! M Rk
|f (n) (a)| (2 R) = n! M Rkn . (5.173)
2 Rn+1
Thus we see that as R tends to infinity, all the terms with k < n will vanish (since we shall
always have M S, where S is some fixed number), and so
But this is just telling us that f (z) is a polynomial in z with z k as its highest power, which
proves the theorem. Liouvilles theorem itself is just the special case k = 0.
A function f (z) that is a polynomial in z of degree k,
k
X
f (z) = an z n , (5.175)
n=0
is clearly analytic for all finite values of z. However, if k > 0 it will inevitably have a pole
at infinity. To see this, we use the usual trick of making the coordinate transformation
1
= , (5.176)
z
and then looking at the behaviour of the function f (1/) at = 0. Clearly, for a polynomial
of degree k of the form (5.175), we shall get
k
X
f (1/) = an n , (5.177)
n=0
129
By the Cauchy test for the convergence of series, we see that (|z|n /n!)1/n = |z| (n!)1/n
tends to zero as n tends to infinity,25 for any finite |z|, and so the exponential is analytic for
all finite z. Of course the situation at infinity is another story; here, one has to look at e1/
as tends to zero, and as we saw previously this has an essential singularity, which is more
divergent than any finite-order pole. Other examples of entire functions are cos z, and the
Bessel function of integer order, Jn (z). The Bessel function has the power-series expansion
X (1) z n+2
Jn (z) = . (5.179)
=0
! (n + )! 2
Of course we know from Liouvilles theorem that any interesting entire function (i.e.
anything except the purely constant function) must have some sort of singularity at infinity.
Entire functions are analytic everywhere except at infinity. Next on the list are meromorphic
functions:
We insist, in the definition of a meromorphic function, that the only singularities that
are allowed are poles, and not, for example, essential singularities. Note that we also insist,
in this definition of a strictly meromorphic function, that it either be analytic also at infinity,
or at worst, have a pole at infinity.
The number of poles in a meromorphic function must be finite. This follows from the
fact that if there were an infinite number then there would exist some singular point, either
at finite z or at z = , which would not be isolated, thus contradicting the definition of
an everywhere-meromorphic function. For example; suppose we had a function with poles
at all the integers along the real axis. These would appear to be isolated, since each one is
unit distance from the next. However, these poles actually have an accumulation point at
infinity, as can be seen by writing z = 1/ and looking near = 0. Thus a function of this
25
To see this, we may, for convenience, take n to be even, in which case we may write
Each of the 1
2
n square brackets is n, and so we have n! nn/2 . It follows that (n!)1/n n1/2 , and hence
(n!)1/n n 1/2
, which tends to zero as n tends to infinity.
130
type will actually have a bad singularity at infinity, We shall in fact be studying such an
example later.
Any meromorphic function f (z) can be written as a ratio of two polynomials. Such a
ratio is known as a rational function. To see why we can always write f (z) in this way, we
have only to make use of the observation above that the number of poles must be finite. Let
the number of poles at finite z be N . Thus at a set of N points zn in the complex plane,
the function f (z) has poles of orders dn . It follows that the function
N
Y
g(z) f (z) (z zn )dn (5.180)
n=1
must be analytic everywhere (except possibly at infinity), since we have cleverly arranged
to cancel out every pole at finite z. Even if f (z) does have a pole at infinity, it follows
from (5.180) that g(z) will diverge no faster than |z|k for some finite integer k. But we
saw earlier, in the generalisation of Liouvilles theorem, that any such function must be a
polynomial of degree k. Thus we conclude that f (z) is a ratio of polynomials:
g(z)
f (z) = QN . (5.181)
dn
n=1 (z zn )
The fact that a meromorphic function can be expressed as a ratio of polynomials can be
extremely useful.
A ratio of two polynomials can be expanded out as a sum of partial fractions. For
example
1 + z2 1 1
2
= 1. (5.182)
1z z+1 z1
Therefore it follows that a function f (z) that is meromorphic can be expanded out as a sum
of partial fractions in that region. For a strictly meromorphic function, this sum will be
a finite one (since there are only finitely many poles, each of finite order). Later, we shall
consider the rather more general possibility of a function that is meromorphic only within
a restricted domain D.
Having introduced the notion of a strictly meromorphic function, it is also useful to
introduce a slightly less strict notion of meromorphicity. Thus, we can define the notion
of a function that is meromorphic within a restricted region. Thus a function is said to be
meromorphic in a domain D in the complex plane if it is analytic except for pole singularities
in the domain D. The previous definition of a meromorphic function thus corresponds to the
case where D is the entire complex plane, including infinity. A very common situation for a
more restricted meromorphic function is when we consider functions that are meromorphic
in the finite complex plane. Such functions are analytic except for isolated pole singularities
131
everywhere in the finite complex plane, but they are allowed to have worse singularities
(such as essential singularities) at infinity. Notice in particular that such a function is now
allowed to have an infinite number of isolated poles in the finite complex plane (since we
are now allowing there to be an accumulation point at infinity).
Let us consider an example of a function f (z) that is meromorphic in some region, and
furthermore where every pole is of order 1. This is in fact a very common circumstance.
As a piece of terminology, a pole of order 1 is also known as a simple pole. Let us assume
that the poles are located at the points zn , numbered in increasing order of distance from
the origin. Thus near z = zn , we shall have
bn
f (z) , (5.183)
z zn
where the constant bn characterises the strength of the pole. In fact bn is known as the
residue at the pole z = zn .
Consider a circle Cp centred on z = 0 and with radius Rp chosen so that it encloses p
of the poles. To avoid problems, we choose Rp so that it does not pass through any pole.
Then the function
p
X bn
Gp (z) f (z) (5.184)
n=1
z zn
will be analytic within the circle, since we have explicitly arranged to subtract out all
the poles (which we are assuming all to be of order 1). Using Cauchys integral, we shall
therefore have
I I p I
1 Gp () d 1 f () d 1 X d
Gp (z) = = bn . (5.185)
2 i Cp z 2 i Cp z 2 i n=1 Cp ( z)( zn )
Now, each term in the sum here integrates to zero. This is because the integrand is
1 1 h 1 1 i
= (5.186)
( z)( zn ) z zn z zn
The integral (over ) is taken around a contour that encloses both the simple pole at = z
and the simple pole at = zn . We saw earlier, in the proof of Cauchys integral formula, that
a contour integral running anti-clockwise around a simple pole c/( 0 ) gives the answer
2 c i, and so the result of integrating (5.186) around our contour is (2 i2 i)/(z zn ) = 0.
Thus we conclude that I
1 f () d
Gp (z) = . (5.187)
2 i Cp z
Now, consider a sequence of ever-larger circles Cp , enclosing larger and larger numbers of
poles. This defines a sequence of functions Gp (z) for increasing p, each of which is analytic
132
within Rp . We want to show that Gp (z) is bounded as p tends to infinity, which will allow us
to invoke Liouvilles theorem and deduce that G (z) = constant. By a now-familiar method
of argument, we suppose that Mp is the maximum value that |f ()| attains anywhere on
the circular contour of radius Rp . Then from (5.187), and using (5.128), we shall have
Mp Rp
|Gp (z)| . (5.188)
Rp |z|
Consider first the case of a function f for which Mp is bounded in value as Rp goes
to infinity. Then, we see from (5.188) that |Gp (z)| is bounded as p goes to infinity. By
Liouvilles theorem, it follows that G (z) must just be a constant, c. Thus in this case we
have
X bn
f (z) = c + . (5.189)
n=1
z zn
We are left with one undetermined constant, c. This can be fixed by looking at one special
value of z, and then equating the two sides in (5.189). Suppose, for example, that f (z) is
analytic at z = 0. We can then determine c by setting z = 0:
X bn
f (0) = c , (5.190)
n=1
zn
and then plugging the solution for c back into (5.189), giving
h
X bn bn i
f (z) = f (0) + + . (5.191)
n=1
z zn zn
(If f (z) happens to have a pole at z = 0, then we just choose some other special value of z
instead, when solving for c.)
We obtained this result by assuming that f (z) was bounded on the circle of radius Rp ,
as Rp was sent to infinity. Even if this is not the case, one can often construct a related
function, for example f (z)/z k for some suitable integer k, which is bounded on the circle.
With appropriate minor modifications, a formula like (5.191) can then be obtained.
An example is long overdue. Consider the function f (z) = tan z. which is, of course
(sin z)/ cos z. Now we have
where we have used the standard results that cos(i y) = cosh y and sin(i y) = i sinh y. Thus
we have
133
It is evident that | sin z| is finite for all finite z, and that therefore tan z can have poles only
when cos z vanishes. From the second expression for | cos z|2 in (5.193), we see that this can
happen only if y = 0 and cos x = 0, i.e. at
z = (n + 12 ) , (5.194)
where n is an integer.
Near z = (n + 21 ), say z = + (n + 12 ), where || is small, we shall have
sin(n + 12 ) = (1)n ,
sin2 x + sinh2 y
| tan z|2 = . (5.197)
cos2 x + sinh2 y
Bearing in mind that sin x and cos x are bounded by 1, that cos p = (1)p 6= 0, and that
sinh2 y and cosh2 y both diverge like 1 2|y|
4e as |y| tends to infinity, we see that | tan z| is
indeed bounded on the circles Rp of radius p , as p tends to infinity. Thus we can now
invoke our result (5.191), to deduce that
h
X i
1 1
tan z = + . (5.198)
n= z (n + 12 ) (n + 12 ))
We can split the summation range into the poles at positive and at negative values of x, by
using
X
X
X
un = un + un1 . , (5.199)
n= n=0 n=0
134
which, grouping the summands together, becomes
X 2z
tan z = 1 2 2 . (5.201)
n=0 (n + 2) z2
This gives our series for the function f (z) = tan z. Note that it displays the expected poles
at all the places where the cos z denominator vanishes, namely at z = (m + 12 ) , where m
is any integer.
Another application of the result (5.191) is to obtain an expansion of an entire function
as an infinite product. Suppose f (z) is entire, meaning that it is analytic everywhere except
at infinity. It follows that f (z) is an analytic function too, and so the function
f (z) d
g(z) = log f (z) (5.202)
f (z) dz
is meromorphic for all finite z. (Its only singularities are poles at the places where f (z)
vanishes, i.e. at the zeros of f (z).)
Let us suppose that f (z) has only simple zeros, i.e. it vanishes like cn (z zn ) near the
zero at z = zn , and furthermore, suppose that f (0) 6= 0. Thus we can apply the formula
(5.191) to g(z), implying that
h
d f (0) X 1 1 i
log f (z) = + + . (5.203)
dz f (0) n=1 z zn zn
This infinite-product expansion is valid for any entire function f (z) with simple zeros at
z = zn , none of which is located at z = 0, whose logarithmic derivative f /f is bounded on
a set of circles Rp . Obviously, without too much trouble, generalisations can be obtained
where some of these restrictions are removed.
Let us apply this result in an example. Consider the function sin z. From (5.193) we
see that it has zeros only at y = 0, x = n . The zero at z = 0 is unfortunate, since in the
derivation of (5.205) we required our entire function f (z) to be non-zero at z = 0. But this
is easily handled, by taking our entire function to be f (z) = (sin z)/z, which tends to 1 at
135
z = 0. We now have a function that satisfies all the requirements, and so from (5.205) we
shall have
sin z Y z z/(n )
= 1 e , (5.206)
z n=
n
where the term n = 0 in the product is to be omitted. Combining the positive-n and
negative-n terms pairwise, we therefore find that
h
Y z 2 i
sin z = z 1 . (5.207)
n=1
n
All the functions we have considered so far have been single-valued ones; given a point z,
the function f (z) has a unique value. Many functions do not enjoy this property. A classic
example is the function f (z) = z 1/2 . This can take two possible values for each non-zero
point z, for the usual reason that there is an ambiguity of sign in taking the square root.
This can be made more precise here, by considering the representation of the point z as
z = r ei . Thus we shall have
1 1 i
f (z) = (r ei ) 2 = r 2 e 2 . (5.208)
But we can also write z = r ei(+2) , since is periodic, with period 2, on the complex
plane. Now we obtain
1 1 i 1 i
f (z) = (r ei (+2) ) 2 = r 2 e 2 +i = r 2 e 2 . (5.209)
In fact, if we look at the value of f (z) = z 1/2 on the circle z = r ei , taking from = 0
to 0 = 2 , where is a small positive constant, we see that
f (r ei ) f (r) , (5.210)
as approaches 0 . But since we are back essentially to where we started in the complex
plane, it follows that f (z) must be discontinuous; it undergoes a jump in its value, on
completing a circuit around the origin.
Of course although in this description we seemed to attach a particular significance to
the positive real axis there is not really anything especially distinguished about this line.
We could just as well have re-oriented our discussion, and concluded that the jump in the
value of f (z) = z 1/2 occurred on some other axis emanating from the origin. The important
136
invariant statement is that if you trace around any closed path that encircles the origin, the
value of z 1/2 will have changed, by an overall factor of (1), on returning to the starting
point. The function f (z) = z 1/2 is double-valued on the complex plane.
If we continue on and take a second trip around the closed path, we will return again
with a factor of (1) relative to the previous visitation of the starting point. So after two
rotations, we are back where we started and the function f (z) = z 1/2 is back to its original
value too. This is expressed mathematically by the fact that
1 i 1 i
f (r ei (+4) ) = r 2 e 2 e2 i = r 2 e 2 = f (r ei ) . (5.211)
An elegant way to deal with a multi-valued function such as f (z) = z 1/2 is to consider
an enlarged two-dimensional surface on which the function is defined. In the case of the
double-valued function f (z) = z 1/2 , we can do it as follows. Imagine taking the complex
plane, and making a semi-infinite cut along the real axis, from x = 0 to x = +. Now,
stack a second copy of the complex plane above this one, again with a cut from x = 0 to
x = +. Now, identify (i.e. glue) the lower edge of the cut of the underneath complex
plane with the upper edge of the cut of the complex plane that sits on top. Finally (a little
trickier to imagine!), identify the lower cut edge of the complex plane on top with the upper
cut edge of the complex plane that sits underneath. We have created something a bit like
a multi-story car-park (with two levels, in this case). As you drive anti-clockwise around
the origin, starting on the lower floor, you find, after one circuit, that you have driven up
onto the upper floor. Carrying on for one more circuit, you are back on the lower floor
again.26 What has been achieved is the creation of a two-sheeted surface, called a Riemann
Surface, on which one has to take z around the origin through a total phase of 4 before
before it returns to its starting point. The function f (z) = z 1/2 is therefore single-valued
on this two-sheeted surface. Ordinary functions, i.e. ones that were single-valued on the
original complex plane, simply have the property of taking the same value on each of the
two sheets, at z = r ei and z = r ei (+2) .
We already noted that the choice of where to run the cut was arbitrary. The important
thing is that for the function f (z) = z 1/2 , it must run from z = 0 out to z = , along any
arbitrarily specifiable path. It is often convenient to take this to be the cut along the real
positive axis, but any other choice will do.
The reason why the origin is so important here is that it is at z = 0 that the actual
branch point of the function f (z) = z 1/2 lies. It is easy to see this, by following the value
26
Of course multi-story car-parks dont work quite like that in real life, owing to the need to be able to
embed them in three dimensions!
137
of f (z) = z 1/2 as z is taken around various closed paths (it is simplest to choose circles) in
the complex plane. One easily sees that the f (z) f (z) discontinuity is encountered
for any path that encloses the origin, but no discontinuity arises for any closed path that
does not enclose the origin.
If one encircles the origin, one also encircles the point at infinity, so f (z) = z 1/2 also has
a branch point at infinity. (Clearly f (1/) = 1/2 is also double valued on going around
= 0.) So in fact, the branch cut that we must introduce is running from one branch point
to the other. This is a general feature of multi-valued functions. In more complicated cases,
this can mean that there are various possible choices for how to select the branch cuts. In
the present case, choosing the branch cut along any arbitrary path from z = 0 to z =
will do. Then, as one follows around a closed path, there is a discontinuity in f (z) each time
the branch cut is crossed. If a closed path crosses it twice (in opposite directions), then the
two cancel out, and the function returns to its original value without any discontinuity.27
Consider another example, namely the function
1 1 1
f (z) = (z 2 1) 2 = (z 1) 2 (z + 1) 2 . (5.212)
It is easy to see that since z 1/2 has a branch point at z = 0, here we shall have branch
points at z = 1 and z = 1. Any closed path encircling either z = 1 or z = +1 (but not
1 1
both) will reveal a discontinuity associated with the two-valuedness of (z + 1) 2 or (z 1) 2
respectively. On the other hand, a circuit that encloses both of the points z = 1 and z = 1
will not encounter any discontinuity. The minus sign coming from encircling one branch
point is cancelled by that coming from encircling the other. The upshot is that we can
choose our branch cuts in either of two superficially-different ways. One of the choices is
to run the branch cut from z = 1 to z = +1. Another quite different-looking choice is to
run a branch cut from z = 1 to z = + along the real positive axis, and another cut from
z = 1 to z = along the real negative axis.
For either of these choices, one gets the right conclusion. Namely, as one follows along
any path, there is a discontinuity whenever a branch cut is crossed. Crossing twice in a
27
In the special case of z 1/2 , for which the function is exactly two-valued, then crossing over the cut twice
even both in the same direction will cause a cancellation of the discontinuity. But more generally, a double
crossing of the branch will cause the discontinuities to cancel only if the crossings are in opposite directions.
Of course multiple crossings of the cut in the same direction might lead to a cancellation, if the function is
only finitely-many valued. For example, f (z) = z 1/n is n-valued, so winding n times around in the same
direction gets back to the original value, if n is an integer. On the other hand f (z) = z 1/ will never return
to its original value, no matter how many complete circuits of the origin are made.
138
given path will cause the two discontinuities to cancel out. so even if we consider the second
choice of branch cuts, with two cuts running out to infinity from the points z = 1 and
z = +1, we get the correct conclusion that a closed path that encircles both z = 1 and
z = +1 will reveal no discontinuity after returning to its starting point.
Actually the two apparently-different choices for the branch cuts are not so very different,
topologically-speaking. Really, z = is like a single point, and one effectively should view
the complex plane as the surface of a sphere, with everywhere out at infinity corresponding
to the same point on the sphere. Think of making a stereographic projection from the north
pole of the sphere onto the infinite plane tangent to the south pole. We think of this plane
as the complex plane. A straight line joining the north pole to a given point in the complex
plane therefore passes through a single point on the sphere. This gives a mapping from each
point in the complex plane into a point on the sphere. Clearly, things get a bit degenerate
as we go further and further out in the complex plane. Eventually, we find that all points
at |z| = , regardless of their direction out from the origin, map onto a single point on
the sphere, namely the north pole. This sphere, known as the Riemann Sphere, is really
what the complex plane is like. Of course as we have seen, a lot of otherwise well-behaved
functions tend to have more severe singularities at z = , but that doesnt detract from
the usefulness of the picture. Figure 2 below show the mapping of a point Q in the complex
plane into a corresponding point P on the Riemann sphere.
1 1
f (1/) = ( 2 1) 2 = 1 (1 2 ) 2 ,
1 1
= 2 18 3 16
1 5
+ . (5.213)
Since it has no branch point there, we can actually take the second choice of branch cuts,
where the two cuts ran from z = 1 and z = +1 to infinity (in other words a single line
from z = 1 to the north pole and back to z = +1), and deform it continuously into the
first choice, where the branch cut simply runs from z = 1 to z = +1. If you think of the
branch cut as an elastic band joining z = 1 to z = +1 via the north pole, it only takes
someone like Superman wandering around at the north pole to give it a little tweak, and it
can contract smoothly and continuously from the second choice of branch cut to the first.
139
North Pole
Riemann Sphere
P
Complex Plane
Figure 2: The point Q in the complex plane projects onto P on the Riemann sphere.
Before proceeding with the mainstream of the development, let us pause for an interlude
on a rather elegant and curious topic. It is a rather little-known method for solving the
following problem. Suppose we are given the real part u(x, y) of an analytic function
f (z) = u(x, y) + i v(x, y). It is a classic exercise, to work out the imaginary part v(x, y),
and hence to learn what the full analytic function f (z) is, by making use of the Cauchy-
Riemann equations. We discussed this problem a while back, showing how one solves for v
by integrating the Cauchy-Riemann equations.
What is not so well known is that one can do the job of finding v(x, y) from u(x, y)
without ever needing to differentiate or integrate at all. This makes a nice party trick,
if you go to the right (or maybe wrong!) sort of parties. The way it works is absurdly
simple, and so, in the best traditions of a conjuring trick, here first is the show. Unlike
the conjurors trick, however, we shall see afterwards how the rabbit was slipped into the
hat. I have not been able to find very full references to it; the earliest I came across is to a
certain Prof. A. Oppenheim, so I shall refer to it as the Oppenheim Method.
140
The way to derive the analytic function f (z), given its real part u(x, y), is the following:
z z
f (z) = 2u( , ) + c , (5.214)
2 2i
where c is a constant. The real part of c can be fixed by using the known given expression
for the real part of f (z). The imaginary part of c is not determinable. Of course this is
always the case; f (z) and f (z) + i , where is a real constant, have the same real parts
and the same analyticity properties, so no method could tell us what is, in the absence of
further specification. (In the usual Cauchy-Riemann derivation of v(x, y), this arbitrariness
arose as a constant of integration.)
Just to show that it really does work, consider the same example that we treated above
using the traditional method. Suppose we are given that u(x, y) = ex cos y is the real part
of an analytic function f (z). What is f (z)? According to (5.214), the answer is
1 i 1 1
f (z) = 2e 2 z cos( z) + c = 2e 2 z cosh( z) + c ,
2 2
z
= e + 1 + c. (5.215)
Now, we fix c by noting, for example, that from the original u(x, y) we have u(0, 0) = 1,
and so we should choose c so that f (z) has real part 1 at z = 0. Thus we have c = 1, and
hence f (z) = ez . (There is no need to be tedious about always adding i , since this trivial
point about the arbitrariness over the imaginary constant is now well understood.) Finally,
we can easily verify that indeed f (z) = ez is the answer we were looking for, since
and so sure enough, this analytic function has real part ex cos y.
How did it work? Like all the best conjuring tricks, the explanation is ludicrously simple.
P
Since f (z) is analytic, we can expand it as a power series, f (z) = n0 an z n . Note that
we are assuming here that it is in particular analytic at z = 0; we shall show later how to
remove this assumption. If we write the expansion coefficients an as an = n + i n , where
n and n are real, then from the series expansion we shall have
Xh i
2u(x, y) = f (z) + f (z) = (n + i n ) (x + i y)n + (n i n ) (x i y)n . (5.217)
n0
Now plug in the values x = z/2, y = z/(2i), as required in the Oppenheim formula:
z z Xh z z n z z n i
2u( , ) = (n + i n ) + + (n i n ) ,
2 2i n0
2 2 2 2
X
= (n + i n ) z n + 0 i 0 , (5.218)
n0
= f (z) + 0 i 0 .
141
Thats all there is to it! The result is proven. Omne ignotum pro magnifico.
We assumed in the proof that f (z) was analytic at z = 0. If its not, then in its
present form the procedure can sometimes break down. For example, suppose we consider
1
the function u(x, y) = 2 log(x2 + y 2 ). (Secretly, we know that this is the real part of the
function f (z) = log z, which of course is analytic for all finite z except for the branch point
at z = 0.) Trying the Oppenheim formula (5.214), we get
1 1
f (z) = log( z 2 z 2 ) + c = log 0 + c . (5.219)
4 4
Oooppps!! Not to worry, we know why it has failed. We need to find a generalisation of the
Oppenheim formula, to allow for such cases where the function we are looking for happens
to be non-analytic at z = 0. The answer is the following:
z +a za
f (z) = 2u( , ) + c, (5.220)
2 2i
where a is an arbitrary constant, to be chosen to avoid any unpleasantness. Lets try this
1
in our function u(x, y) = 2 log(x2 + y 2 ):
h z + a 2 z a 2 i
f (z) = log + c,
2 2
= log(a z) + c = log z + log a + c . (5.221)
So for any value of a other than a = 0, everything is fine. As usual, an elementary exami-
nation of a special case fixes the real part of the constant c, to give c = log a.
It is easy to see why the generalisation (5.220) works. We just repeat the derivation
in (5.218), but now consider an expansion of the function f (z) around z = a rather than
P n
z = 0; f (z) = n0 an (z a) . Provided we dont choose a so that we are trying to expand
around a singular point of f (z), all must then be well:
z+a za Xh z + a za n z + a za n i
2u( , ) = (n + i n ) + a + (n i n ) a ,
2 2i n0
2 2 2 2
X
= (n + i n ) (z a)n + 0 i 0 , (5.222)
n0
= f (z) + 0 i 0 .
Just to show off the method in one further example, suppose we are given
x y
u(x, y) = e x2 +y2 cos . (5.223)
x2 + y 2
Obviously we shall have to use (5.220) with a 6= 0 here. Thus we get
z+a za z+a za
f (z) = 2e 2a z cos + c = 2e 2a z cosh ,
z+a
za2i a zaz 2a z
= e 2a z e 2a z + e 2a z + c , (5.224)
1 1
= e + e + c.
a z
142
Fixing the constant c from a special case, we get
1
f (z) = e z . (5.225)
The method has even worked for a function with an essential singularity, provided that we
take care not to try using a = 0. (Try doing the calculation by the traditional procedure
using (5.80) to see how much simpler it is to use the generalised Oppenheim formula.)
Having shown how effective the Oppenheim method is, it is perhaps now time to admit
to why in some sense a little bit of a cheat is being played here. This is not to say that
anything was incorrect; all the formulae derived are perfectly valid. It is a slightly unusual
kind of trick that has been played, in fact.
Normally, when a conjuror performs a trick, it is he who slips the rabbit into the
hat, and then pulls it out at the appropriate moment to astound his audience. Ironically
enough, in the case of the Oppenheim formula it is the audience itself that unwittingly slips
the rabbit into the hat, and yet nevertheless it is duly amazed when the rabbit reappears.
The key point is that if one were actually working with a realistic problem, in which
only the real part of an analytic function were known, one would typically be restricted
to knowing it only as an experimental result from a set of observations. Indeed, in a
common circumstance such information about the real part of an analytic function might
arise precisely from an experimental observation of, for example, the refractive index of a
medium as a function of frequency. The imaginary part, on the other hand, is related to the
decay of the wave as it moves through the medium. There are quite profound Dispersion
Relations that can be derived that relate the imaginary part to the real part. They are
derived precisely by making use of the Cauchy-Riemann relations, to derive v(x, y) from
u(x, y) by taking the appropriate derivatives and integrals of u(x, y), as in (5.80).
So why was the Oppenheim formula a cheat? The answer is that it assumes that one
knows what happens if one inserts complex values like x = (z + a)/2 and y = (z a)/(2i)
into the slots of u(x, y) that are designed to take the real numbers x and y. In a real-life
experiment one cannot do this; one cannot set the frequency of a laser to a complex value!
So the knowledge about the function u(x, y) that the Oppenheim formula requires one to
have is knowledge that is not available in practical situations. In those real-life cases, one
would instead have to use (5.80) to calculate v(x, y). And the process of integration is
non-local, in the sense that the value for the integral depends upon the values that the
integrand takes in an entire region in the (x, y) plane. This is why dispersion relations
actually contain quite subtle information.
143
The ironic thing is that although the Opennheim formula is therefore in some sense a
cheat, it nevertheless works, and works correctly, in any example that one is likely to
check it with. The point is that when we want to test a formula like that, we tend not to go
out and start measuring refractive indices; rather, we reach into our memories and drag out
some familiar function whose properties have already been established. So it is a formula
that is almost never usable, and yet it works almost always when it is tested with toy
examples. It is a bit like asking someone to pick a random number. Amongst the set of all
numbers, the chance that an arbitrarily chosen number will be rational is zero, and yet the
chance that the persons chosen number will be rational is pretty close to unity.
After some rather lengthy preliminaries, we have now established the groundwork for the
further development of the subject of complex integration. First, we shall derive a general
result about the integration of functions with poles.
If f (z) has an isolated pole of order n at z = a, then by definition, it can be expressed
as
an an+1 a1
f (z) = n
+ n1
+ + + (z) , (5.226)
(z a) (z a) za
where (z) is analytic at and near z = a. The coefficient of a1 in this expansion is called
the residue of f (z) at the pole at z = a.
Let us consider the integral of f (z) around a closed contour C which encloses the pole
at z = a, but within which (z) is analytic. (So C encloses no other singularities of f (z)
except the pole at z = a.) We have
I n
X I I
dz
f (z) dz = ak + (z) dz . (5.227)
C k=1 C (z a)k
By Cauchys theorem we know that the last integral vanishes, since (z) is analytic within
C. To evaluate the integrals under the summation, we may deform the contour C to a circle
of radius centred on z = a, respecting the previous condition that no other singularities
of f (z) shall be encompassed.
Letting z a = ei , the deformed contour C is then parameterised by allowing to
range from 0 to 2, while holding fixed. Thus we shall have dz = i ei d on the contour,
and so
I Z Z h e(1k) i i2
dz 2 i ei d 2
= = i 1k e(1k) i d = 1k . (5.228)
C (z a)k 0 k ei k 0 1k 0
144
When the integer k takes any value other than k = 1, this clearly gives zero. On the other
hand, when k = 1 we have I Z
dz 2
=i d = 2 i (5.229)
C za 0
as we saw when deriving Cauchys integral formula. Thus we arrive at the conclusion that
I
f (z) dz = 2 i a1 . (5.230)
C
The result (5.230) gives the value of the integral when the contour C encloses only the
pole in f (z) located at z = a. Clearly, if the contour were to enclose several poles, at
locations z = a, z = b, z = c, etc., we could smoothly deform C so that it described circles
around each of the poles, joined by narrow causeways of the kind that we encountered
previously, which would contribute nothing to the total integral.
Thus we arrive at the Theorem of Residues, which asserts that if f (z) be analytic
everywhere within a contour C, except at a number of isolated poles inside the contour,
then I X
F (z) dz = 2 i Rs , (5.231)
C s
The theorem of residues can be used in order to evaluate many kinds of integrals. Since this
is an important application, we shall look at a number of examples. First, a list of three
main types of real integral that we shall be able to evaluate:
where R is a rational function of cos and sin . (Recall that if f (z) is a rational
function, it means that it is the ratio of two polynomials.)
where f (z) is analytic in the upper half plane (y > 0) except for poles that do not lie
on the real axis. The function f (z) is also required to have the property that z f (z)
145
should tend to zero as |z| tends to infinity whenever 0 arg(z) . (arg(z) is the
phase of z. This condition means that z f (z) must go to zero for all points z that go
to infinity in the upper half plane.)
where f (z) is a rational function, analytic at z = 0, with no poles on the positive real
axis. Furthermore, z f (z) should tend to zero as z approaches 0 or infinity.
First, consider the type (1) integrals. We introduce z as the complex variable z = ei .
Thus we have
cos = 21 (z + z 1 ) , sin = 1
2i (z z 1 ) . (5.235)
Recalling that R is a rational function of cos and sin , it follows that the integral (5.232)
will become a contour integral of some rational function of z, integrated around a unit circle
centred on the origin. It is a straightforward procedure to evaluate the residues of the poles
in the rational function, and so, by using the theorem of residues, the result follows.
Let us consider an example. Suppose we wish to evaluate
Z 2 d
I(p) , (5.236)
0 1 2p cos + p2
This has simple poles, at z = p and z = 1/p. Since we are assuming that 0 < p < 1, it
follows from the fact that C is the unit circle that the pole at z = 1/p lies outside C, and
so the only pole enclosed is the simple pole at z = p. Thus the residue of the integrand at
z = p is given by taking the limit of
h 1 i
(z p) (5.238)
i (1 p z)(z p)
as z tends to p, i.e. i/(1 p2 ). Thus from the theorem of residues (5.231), we get
Z 2 d 2
2
= , 0 < p < 1. (5.239)
0 1 2p cos + p 1 p2
Note that if we consider the same integral (5.236), but now take the constant p to be
greater than 1, the contour C (the unit circle) now encloses only the simple pole at z = 1/p.
146
Multiplying the integrand by (z 1/p), and taking the limit where z tends to 1/p, we now
find that the residue is +i/(1 p2 ), whence
Z 2 d 2
2
= 2 , p > 1. (5.240)
0 1 2p cos + p p 1
In fact the results for all real p can be combined into the single formula
Z 2 d 2
= 2 . (5.241)
0 1 2p cos + p2 |p 1|
For a more complicated example, consider
Z 2 cos2 3 d
I(p) , (5.242)
0 1 2p cos 2 + p2
with 0 < p < 1. Now, we have
2
I 1 3
+ 21 z 3 I
2z dz (z 6 + 1)2 dz
I(p) = = . (5.243)
C i z (1 p z 2 )(1 p z 2 ) C 4i z 5 (1 p z 2 )(z 2 p)
1 1
The integrand has poles at z = 0, z = p 2 and z = p 2 . Since we are assuming 0 < p < 1,
1
it follows that only the poles at z = 0 and z = p 2 lie within the unit circle corresponding
1
to the contour C. The poles at z = p 2 are simple poles, and the only slight complication
in this example is that the pole at z = 0 is of order 5, so we have to work a little harder
to extract the residue there. Shortly, we shall derive a general formula that can sometimes
be useful in such circumstances. An alternative approach, often in practice preferrable
when one is working out the algebra by hand (as opposed to using an algebraic computing
program), is simply to factor out the singular behaviour and then make a Taylor expansion
of the remaining regular function, as we described earlier. In this case, it is quite simple.
We have
(z 6 + 1)2 1
= (1 + z 6 )2 (1 p z 2 )1 (1 z 2 p1 )1
z 5 (1 p z 2 )(z 2 p) p z5
1
= 5 (1 + p z 2 + p2 z 4 + )(1 + z 2 p1 + z 4 p2 + )
pz
1
= 5 (1 + p z 2 + p2 z 4 + z 2 p1 + z 4 + z 4 p2 + )
pz
1 (p2 + 1) (p4 + p2 + 1)
= 5 + , (5.244)
pz p2 z 3 p3 z
from which we read off the residue of this function at z = 0 as the coefficient of 1/z. Notice
that to make these expansions we just used (1 x)1 = 1 + x + x2 + , and that we only
needed to push these expansions far enough to collect the terms at order z 4 that are then
multiplied by 1/z 5 .
147
After a little further algebra for the two simple poles, we find that the residues of the
1 1
integrand in (5.243) at z = 0, z = p 2 and z = p 2 are given by
i (1 + p2 + p4 ) i (1 + p3 )2 i (1 + p3 )2
, , , (5.245)
4p3 8p3 (1 p2 ) 8p3 (1 p2 )
respectively. Plugging into the theorem of residues (5.231), we therefore obtain the result
Z 2 cos2 3 d (1 p + p2 )
= , (5.246)
0 1 2p cos 2 + p2 (1 p)
when 0 < p < 1.
It is sometimes useful to have a general result for the evaluation of the residue at an
nth-order pole. It really just amounts to formalising the procedure we used above, of
extracting the singular behaviour and then Taylor expanding the remaining analytic factor:
If f (z) has a pole of order n at z = a, it follows that it will have the form
g(z)
f (z) = , (5.247)
(z a)n
where g(z) is analytic in the neighbourhood of z = a. Thus we may expand g(z) in a Taylor
series around z = a, giving
1 1
n1 (n1)
f (z) = g(a) + (z a) g (a) + + (z a) g (a) + ,
(z a)n (n 1)!
g(a) g (a) g(n1) (a)
= + + + + . (5.248)
(z a)n (z a)n1 (n 1)! (z a)
We then read off the residue, namely the coefficient of the first-order pole term 1/(z a),
finding g(n1) (a)/(n 1)!. Re-expressing this in terms of the original function f (z), using
(5.247), we arrive at the general result that
1 h dn1 i
n
R= ((z a) f (z)) . (5.249)
(n 1)! dz n1 z=a
It is straightforward to check that when applied to our example in (5.243), the formula
(5.249) reproduces our previous result for the residue at the 5th-order pole at z = 0. In
practise, though, it is usually more convenient in a hand calculation to use the method
described previously, rather than slogging out the derivatives needed for (5.249).
As a final example of the type (1) class of integrals, consider
Z 2 I
d 4z dz
I(a, b) = , (5.250)
0 (a + b cos )2 C i (b + 2a z + b z 2 )2
148
where a > b > 0. The integrand has (double) poles at
a a2 b2
z= , (5.251)
b
and so just the pole at z = (a + a2 b2 )/b lies inside the unit circle. After a little
calculation, one finds the residue there, and hence, from (5.231), we get
Z 2 d 2 a
2
= 3 . (5.252)
0 (a + b cos ) (a2 b2 ) 2
Turning now to integrals of type 2 (5.233), the approach here is to consider a contour
integral of the form I
I f (z) dz , (5.253)
C
where the contour C is taken to consist of the line from x = R to x = +R along the x
axis, and then a semicircle of radius R in the upper half plane, thus altogether forming a
closed path.
The condition that z f (z) should go to zero as |z| goes to infinity with 0 arg(z)
ensures that the contribution from integrating along the semicircular arc will vanish when
we send R to infinity. (On the arc we have dz = i R ei d, and so we would like R f (R ei )
to tend to zero as R tends to infinity, for all in the range 0 , whence the condition
that we placed on f (z).) Thus we shall have that
Z X
f (x) dx = 2 i Rs , (5.254)
s
where the sum is taken over the residues Rs at all the poles of f (z) in the upper half plane.
The contour is depicted in Figure 3 below.
149
-R R
Figure 3: The contour encloses poles of f (z) in the upper half plane
However, in more complicated examples the contour integral approach is often much easier
to use. Consider, for instance, the integral
Z x4 dx
, (5.258)
(a + b x2 )4
where a > 0 and b > 0. The function f (z) = z 4 (a + b z 2 )4 has poles of order 4 at
1
z = i(a/b) 2 , and so there is just one pole in the upper half plane. Using either the formula
(5.249), or the direct approach of extracting the singular factor and Taylor-expanding by
hand to calculate the residue, and multiplying by 2 i, we get
Z x4 dx 3 5
= 1
16 a 2 b 2 . (5.259)
(a + b x2 )4
Just to illustrate the point, we may note that we could in principle have worked out
(5.258) by elementary means, but the procedure would be quite unpleasant to implement.
By means of an appropriate trigonometric substitution, one eventually concludes that the
integral (5.258) gives
Z h 3b2 x5 8a b x3 3a2 x
b x i
x4 dx 1
= + arctan , (5.260)
(a + b x2 )4 48a b2 (a + b x2 )3 16a3/2 b5/2 a
from which the result (5.260) follows. If you try it, you will find that the labour involved is
much more than in the contour integral method.
One reason for the great saving of labour when using the contour integral method is that
when using the old-fashioned approach of first evaluating the indefinite integral, and then
substituting the limits of integration, one is actually working out much more than is ever
150
needed. It is intrinsically a more complicated exercise to find the function whose derivative
is f (x) than it is to find the result of integrating f (x) over a fixed interval such as to
. If we look at the first term in (5.259) (the rational function of x), we see that they
disappear altogether when one sets x to its limiting values of and +. And yet by the
old-fashioned method it is necessary first to thrash out the entire integral, including these
terms, since we dont know in advance how to recognise that some of the terms in the final
result will end up getting thrown away when the limits are substituted. In our example
above, the indefinite integral is still doable, albeit with a struggle. In more complicated
examples there may be no closed-form expression for the indefinite integral, and yet the
definite integral may have a simple form, easily found by contour-integral methods.
Finally, consider integrals of type 3 (5.234). In general, is assumed to be a real
number, but not an integer. We consider the function (z)1 f (z), which therefore has a
branch-point singularity at z = 0. We shall choose to run the branch cut along the positive
real z-axis, and consider a contour C of exactly the form given in Figure 1, with a = 0.
Note that the branch cut therefore does not lie within the total closed contour. Eventually,
we allow the radius of the larger circle C1 to become infinite, while the radius of the smaller
circle C2 will go to zero. In view of the assumption that z f (z) goes to zero as z goes to
0 or infinity, it follows that the contributions from integrating around these two circles will
each give zero.
Unlike the situation when we used the contour of Figure 1 for deriving the Laurent series,
we are now faced with a function (z)1 f (z) with a branch point at z = 0, and we have
chosen to run the branch cut between the two paths forming the causeway. Consequently,
there is a discontinuity as one traces the value of (z)1 f (z) around a closed path that
encircles the origin. This means that the results of integrating along the two sides of the
causeway connecting the circles C1 and C2 will not cancel.
We shall define (z)1 to be real and positive when z lies on the negative real axis, such
as at the point where the small circle C2 intersects the negative real axis.28 Consequently,
28
Note that this is a definition it is not automatically true. To see this, observe that we can write
1 = e2 in , where n is any integer. If we were dealing with a single-value function g(z), then g(1) = g(e2 i n )
for any n and there is no ambiguity. But if we raise 1 to the fractional power b, then we would have all
the possible choices e2 i n b for what we mean by 1b . If b is rational, i.e. b = p/q for integers p and q, then
there will be a finite number of different bth roots of 1, while if b is irrational, such as b = 2 or b = ,
then there will be infinitely many different bth roots of 1. They are in general complex numbers of unit
magnitude. Our specific choice we are making here, which must be stated, and not merely left to chance, is
to say that we shall define (z)b to be real and positive, when z lies on the negative real axis.
151
on the lower part of the causeway (below the positive real axis), the phase will be ei (1) .
On the other hand, on the upper part of the causeway (above the positive real axis), the
phase will be ei (1) . Thus we find that
I Z Z
1 i (1) 1 i (1)
(z) f (z) dz = e x f (x) dx + e x1 f (x) dx ,
C 0 0
Z
= 2i sin( ) x1 f (x) dx , (5.261)
0
where the minus sign on the first term on the right in the top line comes from the fact
that the integral from x = 0 to x = is running in the direction opposite to the indicated
direction of the contour in Figure 1. The contour integral on the left-hand side picks up all
the contributions from the poles of f (z). Thus we have the result that
Z X
x1 f (x) dx = Rs , (5.262)
0 sin s
where Rs is the residue of (z)1 f (z) at pole number s of the function f (z).
As an example, consider the integral
Z x1 dx
. (5.263)
0 1+x
Here, we therefore have f (z) = 1/(z + 1), which just has a simple pole, at z = 1. The
residue of (z)1 f (z) is therefore just 1, and so from (5.262) we obtain that when 0 <
< 1, Z x1 dx
= . (5.264)
0 1+x sin
(The restriction 0 < < 1 is to ensure that the fall-off conditions for type 3 integrands at
z = 0 and z = are satisfied.)
A common circumstance is when there is in fact a pole in the integrand that lies exactly
on the path where we wish to run the contour. An example would be an integral of the type
(2) discussed above, but where the integrand now has poles on the real axis. If these are
simple poles, then the following method can be used. Consider a situation where we wish
R
to evaluate f (x) dx, and f (z) has a single simple pole on the real axis, at z = a. What
we do is to make a little detour in the contour, to skirt around the pole, so the contour C in
Figure 3 now aquires a little semicircular bypass , of radius , taking it into the upper
half plane around the point z = a. This is shown in Figure 4 below. Thus before we take
the limit where R , we shall have
Z a Z Z R X
f (x) dx + f (z) dz + f (x) dx = 2 i Rj , (5.265)
R a+ j
152
1
0
0
1
-R R
where as usual Rj is the residue of f (z) at its jth pole in the upper half plane.
Sending R to infinity, and to zero, the remaining two terms on the left-hand side of (5.265)
R
define what is called the Cauchy Principal Value Integral, denoted by P ,
Z Z a Z
P f (x) dx f (x) dx + f (x) dx , (5.267)
a+
where one takes the limit where the small positive quantity goes to zero. Such a definition
is necessary in order to give meaning to what would otherwise be an ill-defined integral.
In general, we therefore arrive at the result that if f (z) has several simple poles on the
e , as well as poles in the upper half plane with residues Rj , then
real axis, with residues Rk
Z X X
P f (x) dx = 2 i Rj + i ek .
R (5.268)
j k
Here, the principal-value prescription is used to give meaning to the integral, analogously
to (5.267), at each of the simple poles on the real axis.
R
Consider, as an example, (sin x)/x dx. Actually, of course, this integrand has no
pole on the real axis, since the pole in 1/x is cancelled by the zero of sin x. But one way to
153
do the calculation is to say that we shall calculate the imaginary part of
Z Z Z
ei x cos x sin x
dx = dx + i dx . (5.269)
x x x
We must now use the principal-value prescription to give meaning to the integral on the
left-hand side, since the real part of the integrand in (5.269), namely (cos x)/x, does have a
pole at x = 0. But since we are after the imaginary part, the fact that we have regulated
the real part of the integral will not upset what we want. Thus from (5.268) we find that
Z ei x
P dx = i , (5.270)
x
and so from the imaginary part (which is all there is; the principal-value integral has
regulated the ill-defined real part to be zero) we get
Z sin x
dx = . (5.271)
x
Notice that there is another way that we could have handled a pole on the real axis.
We could have bypassed aound it the other way, by taking a semicircular contour that
went into the lower half complex plane instead. Now, the integration (5.266) would be
replaced by one where ran from = to = 2 as one follows in the direction of the
e rather than +i R
arrow, giving, eventually, a contribution i R e in (5.268). But all is
actually well, because if we make a detour of this kind we should actually now also include
the contribution of this pole as an honest pole enclosed by the full contour C, so it will also
e in the first summation on the right-hand side of (5.268). So at
give a contribution 2 i R
the end of the day, we end up with the same conclusion no matter which way we detour
around the pole.
Another common kind of real integral that can be evaluated using the calculus of residues
involves the log function. Consider, for example, the following:
Z log x dx
I . (5.272)
0 (1 + x2 )2
One way to evaluate this is by taking the usual large semicircular contour in the upper half
plane, with a little semicircular detour (in the upper half plane) bypassing the branch
point at z = 0, as in Figure 4. We think of running the branch cut of log z from z = 0 to
z = , just fractionally below the positive real axis. Thus for z on the positive real axis,
we shall have simply log z = log x. If we look just below the branch cut, i.e. for z = x i ,
where is a very small positive constant, we shall have log z = log x + 2i in the limit when
goes to zero, since we have effectively swung once around the origin, sending z z e2i ,
to get there.
154
Then we shall have
Z Z Z
log x dx log z dz log x dx
+ + = 2 i R , (5.273)
(1 + x2 )2 (1 + z 2 )2 (1 + x2 )2
where R is the residue of (log z)/(1+z 2 )2 at the double pole at z = i in the upper half plane.
(As usual, we must check that the integrand indeed has the appropriate fall-off property
so that the contribution from the large semicircular arc goes to zero; it does.) There are a
couple of new features that this example illustrates.
First, consider the integral around the little semicircle . Letting z = ei there we
shall have Z Z
log z dz log( ei ) ei d
= i . (5.274)
(1 + z 2 )2 0 (1 + 2 e2i )2
This looks alarming at first, but closer inspection reveals that it will give zero, once we take
the limit 0. The point is that after writing log( ei ) = log + i , we see that the
integrations will not introduce any divergences, and so the overall factors of or log in
the two parts of the answer will both nicely kill off the contributions, as 0.
Next, consider the first integral on the left-hand side of (5.273). For this, we can change
variable from x, which takes negative values, to t, say, which is positive. But we need to
take care, because of the multi-valuedness of the log function. So we should define
x = ei t . (5.275)
In all places except the log, we can simply interpret this as x = t, but in the log we shall
have log z = log(ei t) = log t + i . Thus the first integral in (5.273) gives
Z 0 Z Z
log x dx log t dt dt
= + i . (5.276)
(1 + x2 )2 0 (1 + t2 )2 0 (1 + t2 )2
(Now that we know that there is no contribution from the little semicircle , we can just
take = 0 and forget about it.) The first term on the right-hand side here is of exactly the
same form as our original integral I defined in (5.272). The second term on the right is a
simple integral. It itself can be done by contour integral methods, as we have seen. Since
there is no new subtlety involved in evaluating it, lets just quote the answer, namely
Z dt
= 14 . (5.277)
0 (1 + t2 )2
1
2I + i 2 = 2 i R . (5.278)
4
155
It remains only to evaluate the residue of (log z)/(1 + z 2 )2 at the double pole at z = i in
the upper half plane. We do this with the standard formula (5.249). Thus we have
d h log z i
R= , (5.279)
dz (z + i)2
to be evaluated at z = i = ei /2 . (Note that we should write it explicitly as ei /2 in order
to know exactly what to do with the log z term.) Thus we get
i 1
R= + . (5.280)
4 8
Plugging into (5.278), we see that the imaginary term on the left-hand side is cancelled by
the imaginary term in (5.280), leaving just 2I = /2. Thus, eventually, we arrive at the
result that Z log x dx
= 14 . (5.281)
0 (1 + x2 )2
It is perhaps a useful pedagodgical exercise to calculate the residue of log z/(1 + z 2 )2 at
z = i instead by the elementary expansion method. Writing z = i + , and taking care to
write i as ei/2 in the logarithm since here its precise phase matters, we have
log z log[ei/2 (1 i)] 1 h i i i 2
= = + log(1 i) 1 ,
(1 + z 2 )2 4 2 (1 2i )2 4 2 2 2
1 h i i
= 2 + (i + ) (1 + i + ) ,
4 2
i i 1
= 2+ + + , (5.282)
8 4 8
whence the result (5.280) for the residue.
Aside from the specifics of this example, there are two main general lessons to be learned
from it. The first is that if an integrand has just a logarithmic divergence at some point
z = a, then the contour integral around a little semicircle or circle centred on z = a will give
zero in the limit when its radius goes to zero. This is because the logarithmic divergence
of log is outweighed by the linear factor of coming from writing dz = i ei d.
The second general lesson from this example is that one should pay careful attention to
how the a coordinate redefinition is performed, for example when re-expressing an integral
along the negative real axis as an integral over a positive variable (like t in our example).
In particular, one has to handle the redefinition with appropriate care in the multi-valued
log function.
Another application of the calculus of residues is for evaluating certain types of infinite
series. The idea is the following. We have seen that the functions cosec z and cot z have
156
the property of having simple poles at all the integers, whilst otherwise being analytic in
the whole finite complex plane. In fact, they are bounded everywhere as one takes |z| to
infinity, except along the real axis where the poles lie. Using these functions, we can write
down contour integrals that are related to infinite sums.
First, let us note that the residues of the two trigonometric functions are as follows:
where Ra denotes the residue of f (z) cot z at pole number a of the function f (z), and
the summation is over all such poles that lie within the contour Cp . In other words, we have
simply split the total sum over residues into the first term, which sums over the residues at
the known simple poles of cot z, and the second term, which sums over the poles associated
with the function f (z) itself. Of course, in the first summation, the residue of f (z) cot z
at z = n is simply f (n), since the pole in cot z is simple, and itself has residue 1. (We
are assuming here that f (z) doesnt itself have poles at the integers.)
Now, it is clear that if we send p to infinity, so that the corresponding contour Cp grows
to infinite size and encompasses the whole complex plane, we shall have
I
X X
f (z) cot z = 2 i f (n) + 2 i Ra , (5.285)
C n= a
where the second sum now ranges over the residues Ra of f (z) cot z at all the poles of
f (z). Furthermore, let us suppose that the function f (z) is such that
157
C0
C1
C2
Figure 5: The square contours enclose the poles of f (z) (square dots) and the poles of cot z
or cosec z (round dots)
It follows that the integral around the contour C out at infinity will be zero. Consequently,
we obtain the result that
X X
f (n) = Ra , (5.287)
n= a
where the right-hand sum is over the residues Ra of f (z) cot z at all the poles of f (z).
In a similar fashion, using cosec z in place of cot z, we have that
X X
(1)n f (n) = ea ,
R (5.288)
n= a
where the right-hand sum is over the residues of f (z) cosec z at all the poles of f (z).
Consider an example. Suppose we take
1
f (z) = . (5.289)
(z + a)2
This has a double pole at z = a. Using (5.249), we therefore find that the residue of
f (z) cot z at z = a is
R = 2 cosec 2 (a) , (5.290)
158
and hence from (5.287) we conclude that
X 1 2
= . (5.291)
n=
(n + a)2 sin2 a
We can also evaluate the analgous sum with alternating signs, by using (5.288) instead.
Now, we caluate the residue of (z + a)2 cosec z at the double pole at z = a, and
conclude that
X (1)n 2 cos a
= . (5.292)
n=
(n + a)2 sin2 a
Clearly there are large classes of infinite series that can be summed using these tech-
niques. We shall encounter another example later, in a discussion of the Riemann zeta
function.
It is easily seen, by applying the Cauchy test for convergence, that this series is absolutely
convergent for |z| < 1. It follows, therefore, that the function g(z) defined by (5.293) is
analytic inside the unit circle |z| < 1. It is also true, of course, that g(z) is singular outside
the unit circle; the power series diverges.
Of couse (5.293) is a very simple geometric series, and we can see by inspection that it
can be summed, when |z| < 1, to give
1
f (z) = . (5.294)
1z
This is analytic everywhere except for a pole at z = 1. So we have two functions, g(z) and
f (z), which are both analytic inside the unit circle, and indeed they are identical inside
the unit circle. However, whereas the function g(z) is singular outside the unit circle, the
function f (z) is well-defined and analytic in the entire complex plane, with the exception
of the point z = 1 where it has a simple pole.
It is evident, therefore, that we can view f (z) = 1/(1 z) as an extrapolation, or
continuation, of the function g(z) = 1 + z + z 2 + outside its circle of convergence. As
159
we shall prove below, there is an enormously powerful statement that can be made; the
function 1/(1 z) is the unique analyic continuation of the original function g(z) defined in
the unit circle by (5.293). This uniqueness is absolutely crucial, since it means that one can
sensibly talk about the analytic continuation of a function that is initially defined in some
restricted region of the complex plane. A priori, one might have imagined that there could
be any number of ways of defining functions that coincided with g(z) inside the unit circle,
but that extrapolated in all sorts of different ways as one went outside the unit circle. And
indeed, if we dont place the extra, and very powerful, restriction of analyticity, then that
would be exactly the case. We could indeed dream up all sorts of non-analytic functions
that agreed with g(z) inside the unit circle, and that extrapolated in arbitrary ways outside
the unit circle.29 The amazing thing is that if we insist that the extrapolating function be
analytic, then there is precisely one, and only one, analytic continuation.
In the present example, we have the luxury of knowing that the function g(z), defined by
the series expansion (5.293), actually sums to give 1/(1 z) for any z within the unit circle.
This immediately allows us to deduce, in this example, that the analytic continuation of
g(z) is precisely given by
1
,
g(z) = (5.295)
1z
which is defined everywhere in the complex plane except at z = 1. So in this toy example,
we know what the function really is.
Suppose, for a moment, that we didnt know that the series (5.293) could be summed
to give (5.295). We could, however, discover that g(z) defined by (5.293) gave perfectly
sensible results for any z within the unit circle. (For example, by applying the Cauchy test
for absolute convergence of the series.) Suppose that we use these results to evaluate g(z)
in the neighbourhood of the point z = 21 . This allows us, by using Taylors theorem, to
construct a series expansion for g(z) around the point z = 12 :
X g (n) ( 1 )
g(z) = 2
(z + 21 )n , (5.296)
n0
n!
where g(n) ( 21 ) is the nth derivative of g(z) evaluated at z = 21 . Since we are expanding
around a point where g(z) is already known to be equal to (1 z)1 , it is easy to use this
to evaluate g(n) ( 12 ), and thereby show that (5.296) gives
g(z) = 2
3 + 49 (z + 12 ) + 8
27 (z + 12 )2 + 16
81 (z + 12 )3 + , (5.297)
29
We could, for example, simply define a function F (z) such that F (z) g(z) for |z| < 1, and F (z) h(z)
for |z| 1, where h(z) is any function we wish. But the function will in general be horribly non-analytic on
the unit circle |z| = 1 where the changeover occurs.
160
with the coefficient of the general term following the obvious pattern indicated by the dis-
played terms. A simple application of the Cauchy test shows that the radius of convergence
is 32 . In other words, the series (5.297) converges within a circle of radius 3
2 centred on the
point z = 12 . This circle of convergence, and the original one, are depicted in Figure 6
below. We see that this process has taken us outside the original unit circle; we are now
able to evaluate the function g(z) in a region outside the unit circle, where its original
power-series expansion (5.293) does not converge.30
It should be clear that be repeated use of this technique, we can eventually cover the
entire complex plane, and hence construct the analytic continuation of g(z) from its original
definition (5.293) to a function defined everywhere except at z = 1.
The crucial point here is that the process of analytic continuation is a unique one. To
show this, we can establish the following theorem:
Let f (z) and g(z) be two functions that are analytic in a region D, and suppose
30 3
Secretly, we can understand very clearly why the radius of convergence for this new expansion is 2
,
since what we are really doing is expanding (1 z)1 around the point z = 12 and the nearest singularity
is as z = 1. For the present pedagogical purposes, however, we are pretending that we dont know that.
161
that they are equal on an infinite set of points having a limit point z0 in D. Then
f (z) g(z) for all points z in D.
In other words, if we know that the two analytic functions f (z) and g(z) agree on an
arc of points ending at point31 z0 in D, then they must agree everywhere in D. (Note that
we do not even need to know that they agree on a smooth arc; it is sufficient even to know
that they agree on a discrete set of points that get denser and denser until the end of the
arc at z = z0 is reached.)
To prove this theorem, we first define h(z) = f (z) g(z). Thus we know that h(z)
is analytic in D, and it vanishes on an infinite set of points with limit point z0 . We are
required to prove that h(z) must be zero everywhere in D. We do this by expanding h(z)
in a Taylor series around z = z0 :
X
h(z) = ak (z z0 )k = a0 + a1 (z z0 ) + , (5.298)
k=0
then p(z) is an analytic function, and its Taylor series is therefore also convergent, in the
neighbourhood of z = z0 . Now comes the punch-line. We know that h(z) is zero for all
the points z = zn in that infinite set that has z0 as a limit point. Thus in particular there
are points zn with n very large that are arbitrarily close to z = z0 , and at which h(z)
vanishes. It follows from its definition that p(z) must also vanish at these points. But since
the Taylor series for p(z) is convergent for points z near to z = z0 , it follows that for p(zn )
to vanish when n is very large we must have am = 0, since all the higher terms in the
Taylor series would be negligible. But this contradicts our assumption that am was the first
31
An example of such a set of points would be zn = z0 + 1/n, with n = 1, 2, 3 . . ..
162
non-vanishing coefficient in (5.298). Thus the premise that there exists a first non-vanishing
coefficient was false, and so it must be that all the coefficients ak vanish. This proves that
h(z) = 0, which is what we wanted to show.
The above proof shows that h(z) must vanish within the circle of convergence, centered
on z = z0 , of the Taylor series (5.298). By repeating the discussion as necessary, we can
extend this region gradually until the whole of the domain D has been covered. Thus we
have established that f (z) = g(z) everywhere in D, if they agree on an infinite set of points
with limit point z0 .
By this means, we may eventually seek to analytically extend the function to the whole
complex plane. There may well be singularities at certain places, but provided we dont
run into a solid wall of singularities, we can get around them and extend the definition
of the function as far as we wish. Of course if the function has branch points, then we will
encounter all the usual multi-valuedness issues as we seek to extend the function.
Let us go back for a moment to our example with the function g(z) that was originally
defined by the power series (5.293). We can now immediately invoke this theorem. It is
easily established that the series (5.293) sums to give 1/(1 z) within the unit circle. Thus
we have two analytic functions, namely g(z) defined by (5.293) and f (z) defined by (5.294)
that agree in the entire unit circle. (Much more than just an arc with a limit point, in
fact!) Therefore, it follows that there is a unique way to extend analytically outside the
unit circle. Since f (z) = 1/(1 z) is certainly analytic outside the unit circle, it follows
that the function 1/(1 z) is the unique analytic extension of g(z) defined by the power
series (5.293).
Let us now consider a less trivial example, to show the power of analytic continuation.
which converges if Re(z) > 0. It is easy to see that if Re(z) > 1 then we can perform an
integration by parts to obtain
Z h i
(z) = (z 1) et tz2 dt et tz1 = (z 1) (z 1) , (5.301)
0 0
since the boundary term then gives no contribution. Shifting by 1 for convenience, we have
(z + 1) = z (z) . (5.302)
163
One easily sees that if z is a positive integer k, the solution to this recursion relation is
(k) = (k 1)!, since it is easily established by elementary integration that (1) = 1. The
responsibility for the rather tiresome shift by 1 in the relation (k) = (k 1)! lies with
Leonhard Euler.
Of course the definition (5.300) is valid only when the integral converges. Its clear that
the et factor ensures that there is no trouble from the upper limit of integration, but from
t = 0 there will be a divergence unless Re(z) > 0. Furthermore, for Re(z) > 0 it is clear that
we can differentiate (5.300) with respect to z as many times as we wish, and the integrals
will still converge.32 Thus (z) defined by (5.300) is finite and analytic for all points with
Re(z) > 0.
We can now use (5.302) in order to give an analytic contiuation of (z) into the region
where Re(z) 0. Specifically, if we write (5.302) as
(z + 1)
(z) = , (5.304)
z
then this gives a way of evaluating (z) for points in the strip 1 + < Re(z) < ( a
small positive quantity) in terms of (z + 1) at points with Re(z + 1) > 0, where (z + 1)
is known to be analytic. The function so defined, and the original Gamma function, have
an overlapping region of convergence, and so we can make an analytic continuation into the
strip 1 + < Re(z) < . The process can then be applied iteratively, to cover more and
more strips over to the left-hand side of the complex plane, until the whole plane has been
covered by the analytic extension. Thus by sending z z + 1 in (5.304) we may write
(z + 2)
(z + 1) = , (5.305)
z+1
(z + 2)
(z) = . (5.306)
(z + 1) z
The right-hand side is analytic for Re(z) > 2, save for the two manifest poles at z = 0 and
z = 1, and so this has provided us with an analytic continuation of (z) into the region
Re(z) > 2. In the next iteration we use (5.304) with z z + 2 to express (z + 2) as
32
Write tz = ez log t
, and so, for example,
Z
(z) = dt tz1 log t et . (5.303)
0
Now matter how many powers of log t are brought down by repeated differentiation, the factor of tz1 will
ensure convergence at t = 0.
164
(z + 3)/(z + 2), hence giving
(z + 3)
(z) = , (5.307)
(z + 2)(z + 1) z
valid for Re(z) > 3, and so on.
Of course the analytically continued function (z) is not necessarily analytic at every
point in the complex plane, and indeed, we are already seeing, it has isolated poles. To
explore the behaviour of (z) in the region of some point z with Re(z) 0, we first iterate
(5.302) just as many times n as are necessary in order to express (z) in terms of (z+n+1):
(z + n + 1)
(z) = , (5.308)
(z + n)(z + n 1)(z + n 2) z
where we choose n so that Re(z + n + 1) > 0 but Re(z + n) < 0. Since we have already
established that (z + n + 1) will therefore be finite, it follows that the only singularities of
(z) can come from places where the denominator in (5.308) vanishes. This will therefore
happen when z = 0 or z is a negative integer.
To study the precise behaviour near the point z = n, we may set z = n + , where
|| << 1, and use (5.308) to give
(1)n (1 + ) (1)n
(n + ) = = + , (5.309)
(n )(n 1) (1 ) n (n 1) 2 1
where the terms represented by are analytic in . Thus there is a simple pole at = 0.
Its residue is calulated by multiplying (5.309) by and taking the limit 0. Thus we
conclude that (z) is meromorphic in the whole finite complex plane, with simple poles
at the points z = 0, 1, 2, 3, . . ., with the residue at z = n being (1)n /n!. (Since
(1) = 1.)
The regular spacing of the poles of (z) is reminiscent of the poles of the functions
cosec z or cot z. Of course in these cases, they have simple poles at all the integers; zero
negative and positive. We can in fact make a function with precisely this property out of
(z), by writing the product
(z) (1 z) . (5.310)
From what we saw above, it is clear that this function will have simple poles at precisely
all the integers. Might it be that this function is related to cosec z or cot z?
To answer this, consider again the original integral representation (5.300) for (z), and
now make the change of variables t t2 . This implies dt/t 2dt/t, and so we shall
have Z 2
(z) = 2 et t2z1 dt . (5.311)
0
165
Thus we may write
Z Z 2 +y 2 )
(a) (1 a) = 4 dx dy e(x x2a1 y 2a+1 . (5.312)
0 0
If we restrict a such that 0 < Re(a) < 1, this integral falls into the category of type 3 that
we discussed a couple of sections ago. Thus we have
2 X
(a) (1 a) = Rc , (5.316)
sin(2 a) c
where Rc are the residues at the poles of (z)2a1 /(1 + z 2 ). These poles lie at z = i, and
the residues are easily seen to be 12 ei a . Thus we get
2 2 cos( a)
(a) (1 a) = cos( a) = ,
sin(2 a) 2 sin( a) cos( a)
= . (5.317)
sin a
Although we derived this by restricting a such that 0 < Re(a) < 1 in order to ensure con-
vergence in the integration, we can use the now-familiar technique of analytic continuation
and conclude that
(z) (1 z) = , (5.318)
sin z
in the whole complex plane. This result, known as the reflection formula, is one that will
be useful in the next section, when we shall discuss the Riemann Zeta function.
Before moving on to the Riemann Zeta function, let us first use (5.318) to uncover a
couple more properties of the Gamma function. The first of these is a simple fact, namely
that
( 21 ) = . (5.319)
1
We see this by setting z = 2 in (5.318).
166
The second, more significant, property of (z) that we can deduce from (5.318) is that
(z)1 an entire function. That is to say, (z)1 is analytic everywhere in the finite complex
plane. Since we have already seen that the only singularities of (z) are poles, this means
that we need only show that (z) has no zeros in the finite complex plane. Looking at
(5.318) we see that if it were to be the case that (z) = 0 for some value of z, then it would
have to be that (1 z) were infinite there.33 But we know precisely where (1 z) is
infinite, namely the poles at z = 1, 2, 3 . . ., and (z) is certainly not zero there.34 Therefore
(z) is everywhere non-zero in the finite complex plane. Consequently, (z)1 is analytic
everywhere in the finite complex plane, thus proving the contention that (z)1 is an entire
function.
Before closing this section, we may observe that we can also give a contour integral
representation for the Gamma function. This will have the nice feature that it will provide
us directly with an expression for (z) that is valid in the whole complex plane. As we shall
see, the Hankel integral
Z
1
(z) = et (t)z1 dt , (5.320)
2 i sin z C
provides a representation for the Gamma function, where we integrate in the complex t-
plane around the so-called Hankel Contour depicted in Figure 7 below. This starts at +
just above the real axis, swings around the origin, and goes out to + again just below
the real axis. As usual, we shall run the branch cut for the multi-valued function (t)z1
along the positive real axis in the complex t plane, and (t)z1 will be taken to be real and
positive when t lies on the negative real axis.
We see can deform the contour in Figure 7 into the contour depicted in Figure 8, since
no singularities are crossed in the process. If Re(z) > 0, there will be no contribution from
integrating around the small circle surrounding the origin, in the limit where its radius
is sent to zero. Hence the contour integral is re-expressible simply in terms of the two
semi-infinite line integrals just above and below the real axis.
The integrals along the lower and upper causeways in Figure 8, we follow the same
procedure that we have used before. We define the phase of (t)z to be zero when t lies
on the negative real t axis, and run the branch cut along the positive real t axis. For the
33
Recall that sin z is an entire function, and it therefore has no singularity in the finite complex plane.
Consequently, 1/(sin z) must be non-vanishing for all finite z.
34
Instead, the poles of (1 z) at z = 1, 2, 3 . . . are balanced in (5.318) by the poles in 1/ sin( z).
167
Figure 7: The Hankel contour
integral below the real axis, we therefore have (t) = e i x, with x running from 0 to +.
For the integral above the real axis, we have (t) = ei x, with x running from + to 0.
Consequently, we get
Z Z
et (t)z1 dt = (ei (z1) ei (z1) ) et tz1 dt ,
C 0
Z
t z1
= 2 i sin(z) e t dt , (5.321)
0
and hence we see that (5.320) has reduced to the original real integral expression (5.300)
when Re(z) > 0. However, the integral in the expression (5.320) has a much wider appli-
cability; it is actually single-valued and analytic for all z. (Recall that we are integrating
around the Hankel contour, which does not pass through the point t = 0, and so there is no
reason for any singularity to arise, for any value of z.) The poles in (z) (which we know
from our earlier discussion to occur at z = 0, 1, 2, . . .) must therefore be due entirely
to the 1/ sin(z) prefactor in (5.321. Indeed, as we saw a while ago, 1/ sin(z) has simple
poles when z is an integer.35
Combining (5.320) with (5.318), we can give another contour integral expression for
35
The reason why (5.321) doesnt also imply that (z) has simple poles when z is a positive integer is that
168
Figure 8: The deformation of the Hankel contour
(z), namely Z
1 1
= et (t)z dt , (5.322)
(z) 2 i C
where we again integrate around the Hankel contour of Figure 7, in the complex t plane.
Again, this integral is valid for all z. Indeed with this expression we see again the result
that we previously deduced from (5.318), that (z)1 is an entire function, having no
singularities anywhere in the finite complex plane.
A pause for reflection is appropriate here. What we have shown is that (z) defined by
(5.320) or (5.322) gives the analytic continuation of our original Gamma function (5.300) to
the entire complex plane, where it is analytic except for simple poles at z = 0, 1, 2, . . ..
How is it that these contour integrals do better than the previous real integral (5.300),
the integral itself vanishes when z is a positive integer, and this cancels out the pole from 1/ sin(z). This
vanishing can be seen from the fact that when z is a positive integer, the integrand is analytic (there is no
longer a branch cut), the contour can be closed off at infinity to make a closed contour encircling the origin,
and hence Cauchys theorem implies the integral vanishes.
169
which only converged when Re(z) > 0? The crucial point is that in our derivation, when
we related the real integral in (5.300) to the contour integral (5.320), we noted that the
contribution from the little circle as the contour swung around the origin would go to zero
provided that the real part of z was greater than 0.
So what has happened is that we have re-expressed the real integral in (5.300) in terms
of a contour integral of the form (5.320), which gives the same answer when the real part
of z is greater than 0, but it disagrees when the real part of z is 0. In fact it disagrees
by the having the rather nice feature of being convergent and analytic when Re(z) 0,
unlike the real integral that diverges. So as we wander off westwards in the complex z plane
we wave a fond farewell to the real integral, with its divergent result, and adopt instead
the result from the contour integral, which happily provides us with analytic answers even
when Re(z) 0. We should not be worried by the fact that the integrals are disagreeing
there; quite the contrary, in fact. The whole point of the exercise was to find a better way
of representing the function, to cover a wider region in the complex plane. If we had merely
reproduced the bad behaviour of the original integral in (5.300), we would have achieved
nothing by introducing the contour integrals (5.320) and (5.322).
Now we turn to the Riemman Zeta function, as a slightly more intricate example of the
analytic continuation of a function of a complex variable.
This sum converges whenever the real part of s is greater than 1. (For example, (2) =
P 2 P
n1 n can be shown to equal 2 /6, whereas (1) = n1 n
1 is logarithmically diver-
gent. The sum is more and more divergent as Re(s) becomes less than 1.)
Since the series (5.323) defining (s) is convergent everywhere to the right of the line
Re(s) = 1 in the complex plane, it follows that (s) is analytic in that region. It is reasonable
to ask what is its analytic continuation over to the left of Re(s) = 1. As we have already
seen from the simple example of f (z) = 1/(1 z), the mere fact that our original power
series diverges in the region with Re(s) 0 does not in any way imply that the actual
function (s) will behave badly there. It may be just that our power series representation
is inadequate.
170