
PHY483F/1483F

Relativity Theory I
(2020-21)
Department of Physics
University of Toronto

Instructor: Prof. A.W. Peet

Sources:-

• M.P. Hobson, G.P. Efstathiou, and A.N. Lasenby, “General relativity: an introduction
for physicists” (Cambridge University Press, 2005) [recommended textbook];

• Sean Carroll, “Spacetime and geometry: an introduction to general relativity”
(Addison-Wesley, 2004);

• Ray d’Inverno, “Introducing Einstein’s relativity” (Oxford University Press, 1992);

• Jim Hartle, “Gravity: an introduction to Einstein’s general relativity” (Pearson, 2003);

• Bob Wald, “General relativity” (University of Chicago Press, 1984);

• Tomás Ortı́n, “Gravity and strings” (Cambridge University Press, 2004);

• Noel Doughty, “Lagrangian Interaction” (Westview Press, 1990);

• my personal notes over three decades.

Version: Monday 16th November, 2020 @ 10:55

Licence: Creative Commons Attribution-NonCommercial-NoDerivs Canada 2.5


Contents
1 R10Sep
1.1 Invitation to General Relativity
1.2 Course website

2 M14Sep
2.1 Galilean relativity, 3-vectors in Euclidean space, and index notation

3 R17Sep
3.1 Special relativity and 4-vectors in Minkowski spacetime
3.2 Partial derivative 4-vector

4 M21Sep
4.1 Relativistic particle: position, momentum, acceleration 4-vectors
4.2 Electromagnetism: 4-vector potential and field strength tensor

5 R24Sep
5.1 Constant relativistic acceleration and the twin paradox
5.2 The Equivalence Principle
5.3 Spacetime as a curved Riemannian manifold

6 M28Sep
6.1 Basis vectors in curved spacetime
6.2 Tensors in curved spacetime
6.3 Rules for tensor index gymnastics

7 R01Oct
7.1 Building a covariant derivative
7.2 How basis vectors change: the role of the affine connection
7.3 The covariant derivative and parallel transport

8 M05Oct
8.1 The geodesic equations for test particle motion in curved spacetime
8.2 Example computation for affine connection and geodesic equations

9 R08Oct
9.1 Spacetime curvature
9.2 The Riemann tensor
9.3 Example computations for Riemann

10 R15Oct
10.1 Geodesic deviation
10.2 Tidal forces and taking the Newtonian limit for Christoffels

11 M19Oct
11.1 Newtonian limit for Riemann
11.2 Riemann normal coordinates and the Bianchi identity
11.3 The information in Riemann

12 R22Oct
12.1 Lie derivatives
12.2 Killing vectors and tensors

13 M26Oct
13.1 Maximally symmetric spacetimes
13.2 Einstein’s equations

14 R29Oct
14.1 Birkhoff’s theorem and the Schwarzschild black hole

15 M02Nov
15.1 TOV equation for a star
15.2 Geodesics of Schwarzschild

16 R05Nov
16.1 Causal structure of Schwarzschild

17 M16Nov
17.1 Charged black holes
17.2 Rotating black holes

18 R19Nov
18.1 The Kerr solution
18.2 The Penrose process

19 M23Nov
19.1 Gravitational redshift
19.2 Planetary perihelion precession

20 R26Nov
20.1 Bending of light
20.2 Radar echoes

21 M30Nov
21.1 Geodesic precession of gyroscopes
21.2 Accretion disks

22 R03Dec
22.1 Finding the wave equation for metric perturbations
22.2 Solving the linearized Einstein equations

23 M07Dec
23.1 Gravitational plane waves
23.2 Energy loss from gravitational radiation

1 R10Sep
1.1 Invitation to General Relativity
From a particle physics perspective, the gravitational force is the weakest of the four known
forces. So why does gravity dominate the dynamics of the universe? A simple first answer is
that there is a lot of matter in the universe that gravitates. Even though the gravitational
attraction between any two subatomic particles is weak, if you get enough of them together
you can eventually make a black hole! A slightly more sophisticated answer focuses on the
range of the gravitational force and what sources it. The only two long-range forces we know
of in Nature are gravity and electromagnetism. By contrast, the strong nuclear force binding
atomic nuclei and the weak nuclear force responsible for the fusion reaction powering our
Sun are very short-range. Electromagnetic fields are sourced by charges and currents, but
the universe is electrically neutral on average, so electromagnetism does not dominate its
evolution. Gravity, on the other hand, is sourced by energy-momentum. Since everything
has energy-momentum, even the graviton, you can never get away from gravity.
Newton wowed the world a third of a millennium ago with his Law of Universal Gravita-
tion, which explained both celestial and terrestrial observations. Our focus in this course is
on explaining Einstein’s famous General Theory of Relativity (GR), which is a gener-
alization of both Newtonian gravity and Special Relativity proven useful for describing the
dynamics of the cosmos. By the end of term, you will be familiar with Albert Einstein’s
famous equation for the gravitational field $g_{\mu\nu}(x)$,
$$ R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R + \Lambda g_{\mu\nu} = -8\pi G_N T_{\mu\nu} , \tag{1.1} $$
where $R_{\mu\nu}$ and $R$ involve (first and) second derivatives of $g_{\mu\nu}$, and $T_{\mu\nu}$ is the
energy-momentum tensor of all non-gravitational fields, which are collectively known as matter fields.
You will also understand how GR gives back Newton’s theory of gravity in the limit where
speeds are small and spacetime is weakly curved. You can think of Einstein’s GR as Gravity
2.0, built on the foundation of Gravity 1.0 established by Newton – an upgrade.
The name for this course is “Relativity Theory 1”. Another name by which it is com-
monly known is “GR 1”, which stands for “General Relativity 1”. The main thing we learn
how to do in this course is how to write the dynamical equations of physics in the language of
tensor analysis. Tensor analysis always sounds scary when you start, but it is not much more
complicated conceptually than vector analysis, something you have been doing for years. We
will show how to take your vector analysis knowledge from flat space and generalize it to
spacetime. We begin with flat spacetime, which is pertinent to Special Relativity, and then
we build on that knowledge to figure out how to write dynamical equations of physics even
in curved spacetime.
Einstein taught us that the speed of light is constant and is the same in all inertial frames
of reference. We will therefore adopt the relativistic convention that c = 1 throughout the
course. This implies that time is measured in metres, and mass is measured in units of
energy, e.g. $m_e = 511$ keV. We will keep all other physical constants explicit, such as Planck’s
constant $\hbar$ characterizing the strength of quantum effects and the Newton constant $G_N$
characterizing the strength of gravity. If you feel queasy about missing factors of c in any
equation, they can always be easily restored by using dimensional analysis.
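As a quick illustration of restoring factors of $c$ by dimensional analysis, the sketch below converts the electron mass from kilograms into the energy units used above. It is not part of the original notes: the function name is made up for this example, and the constants are the standard SI and CODATA values, with the electron mass rounded.

```python
# A mass quoted in energy units really means the rest energy m * c**2.
C = 299_792_458.0              # speed of light in m/s (exact by SI definition)
JOULES_PER_EV = 1.602176634e-19  # joules per electronvolt (exact by SI definition)
M_ELECTRON_KG = 9.1093837e-31  # electron mass in kg (CODATA value, rounded)


def rest_energy_kev(mass_kg: float) -> float:
    """Return the rest energy m c^2 in keV for a mass given in kilograms."""
    energy_joules = mass_kg * C**2       # restore the factor of c^2
    return energy_joules / JOULES_PER_EV / 1.0e3


electron_kev = rest_energy_kev(M_ELECTRON_KG)
print(f"m_e c^2 is approximately {electron_kev:.1f} keV")
```

The only physics input is that a mass $m$ corresponds to an energy $m c^2$; everything else is unit conversion.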

1.2 Course website
Please have a careful read of the course website at https://ap.io/483f/. It contains lots
of vital and useful information for all students taking PHY483F/PHY1483F, including the
syllabus, online lecture notes, and how to contact me. Almost everything you need to know
about the course is contained in the pages listed, and in all the clickable links in those pages.
The remaining tiny fraction of information that needs to be hidden behind a UofT firewall
for class members only can be found on Quercus, in the Announcements (from the Prof.)
and Modules (from the TA).

Remarks in my notes intended for more advanced/interested students are indicated in blue.

2 M14Sep
2.1 Galilean relativity, 3-vectors in Euclidean space, and index notation
Before we review some aspects of Special Relativity and introduce some new ones, let us begin
by reminding ourselves of the non-relativistic version of relativity, also known as Galilean
relativity. When we want to transform from one inertial frame of reference to another moving
at relative velocity v, there are three things we must think about:

(a) how time intervals relate,

(b) how spatial position intervals relate,

(c) how velocities relate.

In Galilean relativity, all clocks are synchronized,

$$ dt' = dt , \tag{2.1} $$

displacements are related via

$$ dx' = dx - v\,dt , \tag{2.2} $$

and velocities $u = dx/dt$ compose by simple addition,

$$ u' = u - v , \tag{2.3} $$

where v is the relative velocity between the unprimed and primed frames of reference. Ein-
stein upgraded these formulæ when he invented Special Relativity, and you have seen the
results before: they are known as Lorentz Transformations. We will get to them soon enough
– and we will show you how simple they can look when written in terms of rapidity rather
than velocity. But for now, let us inspect how 3-vectors work more closely, in some detail.
This will serve as a pattern for the relativistic case.
Lots of things of interest in physics are vectors, which are in essence things that point. I
like to say that a vector has a ‘leg’ that sticks out, telling you where it points. Mathematically,
the vector components are what you get when you resolve the vector along an orthonormal
basis. In Special and General Relativity, we will need to be scrupulously careful to distinguish
where we put our indices (up/down and left/right). For an arbitrary vector $v$, we label
its components with an upstairs index: $v^1, v^2, \ldots, v^d$, i.e. $v^i$ with
$i = 1, \ldots, d$, where $d$ is the spatial dimension. Note that the upstairs index $i$ used here is not a
power; instead, it specifies which component $v^i$ is being discussed: the $i$th one. If you think
of a contravariant vector as a column vector, the upstairs index $i$ denotes which row of the
column vector you are talking about. If you need to take a power of a vector component,
the GR convention is to write parentheses around it, e.g. $(v^1)^2$. Note also that it is common
in GR literature to write the vector $v$ as $v^i$ – technically, $v^i$ is a component of $v$, but letting
the index show explicitly rather than suppressing it helps us remember its transformation
properties.

Vectors provide a useful notational shorthand, preventing us from having to write out
all the components explicitly every time we write a physics equation like $\vec{F} = m\vec{a}$. Tensor
analysis in GR is nothing scary – it is the natural generalization of vector analysis to curved
spacetime and multilegged objects. Its underlying idea is twofold:-

• In physics, the most useful dynamical variables transform in well-defined ways under
coordinate transformations, and are known as tensors. Example: the momentum
vector $p^i$.
• The laws of physics should be tensorial equations. A Newtonian example you will
recognize is
$$ F^i = m a^i . \tag{2.4} $$

When we change coordinates, the components of tensors on both sides of the equation change,
but the underlying physical relations between them do not.
The natural type of vector we defined above is called a contravariant vector. This is
like a column vector. It has a natural counterpart called a covariant vector, also known
as a dual vector. This is like a row vector. A covariant vector $\omega$ has components $\omega_i$; note
that this is a downstairs index rather than an upstairs index. The index $i$ tells you which
column of the row vector you are talking about. There is a natural inner product between
contravariant vectors $v$ and covariant vectors $\omega$:
$$ \omega \cdot v = \sum_i \omega_i v^i . \tag{2.5} $$

A very useful convention that we will use throughout the course is the Einstein summa-
tion convention. This is a notational shorthand in which a repeated index is automatically
summed over when it occurs precisely once upstairs and precisely once downstairs. This con-
vention suppresses the unwieldy Σ signs so that it becomes easier to see the wood for the
trees. The thing that signals that you are summing over an index is that it is repeated. Note
that a repeated (summed over) index can appear precisely twice in any given term: if it
occurs more times, the writer has made a mistake. Summing over a repeated index is also
called index contraction, because what you get for the result has none of the summed-over
indices remaining. In our $\omega \cdot v$ example above, the result is a scalar: a tensor with zero legs.
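To make the summation convention concrete, here is a minimal sketch that spells out the hidden sum in $\omega_i v^i$ as an explicit loop. The function name and the sample components are invented for this illustration, and Python lists index from 0 rather than 1.

```python
def contract(omega, v):
    """Inner product omega_i v^i: the repeated index i is summed over.

    omega holds downstairs components, v holds upstairs components.
    """
    if len(omega) != len(v):
        raise ValueError("both vectors must live in the same dimension")
    return sum(omega[i] * v[i] for i in range(len(v)))


omega = [1.0, 2.0, 3.0]   # covariant components omega_i
v = [4.0, -1.0, 0.5]      # contravariant components v^i
scalar = contract(omega, v)
print(scalar)
```

The result carries no index at all, which is what "a tensor with zero legs" means in practice.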
Why is it important to distinguish between contravariant and covariant vectors? In a
nutshell, because they transform differently under coordinate transformations. Let us see
how this works for a rotation. You may be used to writing a rotation of (say) a displacement
vector as a $d \times d$ matrix $R$. Rotation matrices are orthogonal,

$$ R^{-1} = R^T . \tag{2.6} $$

Alternatively, we can say that they preserve the Euclidean norm in 3-space:

$$ R^T \mathbf{1}_3 R = \mathbf{1}_3 , \tag{2.7} $$

where $\mathbf{1}_3$ is the identity matrix. While $R$ transforms contravariant vectors $v$ in Euclidean
space as
$$ v \to v' = R \cdot v , \tag{2.8} $$
where the prime indicates the transformed vector and the unprimed vector is the original,
the transpose $R^T$ transforms covariant vectors $v^T$ as
$$ v^T \to v'^T = v^T \cdot R^T . \tag{2.9} $$

However, we strongly discourage writing coordinate transformations in terms of matrices
in future, and instead encourage you to get the hang of index notation. Once time is
included, coordinates are curvilinear, and spacetime is physically curved, index notation
and the Einstein summation convention will help us keep track of indices in a much more
succinct way and therefore reduce the error rate when handling tensors.
A rotation is expressed in index notation for the contravariant vector components as
$$ v^{i'} = R^{i'}{}_j v^j . \tag{2.10} $$
The $R^{i'}{}_j$ is the component of the rotation matrix from the $i'$th row and $j$th column. Note
that the left-right placement of indices here is physically important, as well as the
upstairs-downstairs placement. The physics reason why is that rotation matrices are not symmetric,
so casually switching them makes no sense. Let us write out the above transformation law
more explicitly, so that you can see how it encodes matrix multiplication in a disciplined
way. For a rotation of the vector $v$ with components $v^i$ about the $z$-axis, it reads¹
$$ \begin{aligned}
v^{1'} &= R^{1'}{}_1 v^1 + R^{1'}{}_2 v^2 + R^{1'}{}_3 v^3 = +\cos\theta\, v^1 + \sin\theta\, v^2 , \\
v^{2'} &= R^{2'}{}_1 v^1 + R^{2'}{}_2 v^2 + R^{2'}{}_3 v^3 = -\sin\theta\, v^1 + \cos\theta\, v^2 , \\
v^{3'} &= R^{3'}{}_1 v^1 + R^{3'}{}_2 v^2 + R^{3'}{}_3 v^3 = v^3 .
\end{aligned} \tag{2.11} $$

For the covariant vector $\omega$ specified by its components $\omega_i$, we have

$$ \omega_{i'} = R_{i'}{}^j \omega_j . \tag{2.12} $$

Note that the left-right and upstairs-downstairs index placements are deliberate and
physically meaningful here, as for the contravariant case earlier. $R_{i'}{}^j$ is the $i'$th column of the
$j$th row of $R^T$.
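Rotations are interesting mathematically because they preserve the Euclidean norm. In
index notation, this condition reads
$$ R_{j'}{}^i \, \delta^{j'}{}_{k'} \, R^{k'}{}_\ell = \delta^i_\ell , \tag{2.13} $$
where
$$ \delta^i_\ell = \begin{cases} 1, & i = \ell \\ 0, & i \neq \ell \end{cases} . \tag{2.14} $$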
Mathematically, the tensor $\delta^i_j$ with one contravariant index and one covariant index is the
identity matrix. Since the identity is a symmetric matrix, we do not have to be picky about
left-right index placement on Kronecker deltas like we do for other tensors. More generically,
we must be very careful about left-right and up-down index placement on our tensors. A
physical implication of the above formula is that even if you rotate your velocity vector, you
still get the same kinetic energy for a non-relativistic particle, because the kinetic energy is
proportional to the squared norm of the velocity vector.

¹ Nitpickers: please note that, like Carroll, we take the point of view that the vector stays fixed while the
coordinate system changes under the relevant transformation.
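The statement that rotations leave the squared norm (and hence the kinetic energy) unchanged can be checked numerically. The sketch below uses the $z$-axis rotation components of eq. (2.11); the function names, the angle, and the sample vectors are assumptions made for this illustration only. It also checks that the pairing $\omega_i v^i$ is unchanged when the covariant components are transformed with the inverse rotation, as in eq. (2.12).

```python
import math


def rotation_z(theta):
    """Components R^{i'}_j of eq. (2.11): first index is the row i', second the column j."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0],
            [-s, c, 0.0],
            [0.0, 0.0, 1.0]]


def transform_contravariant(R, v):
    """v^{i'} = R^{i'}_j v^j, summing over the repeated index j."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def transform_covariant(R_inverse, omega):
    """omega_{i'} = R_{i'}^j omega_j, with R_{i'}^j taken as the (j, i') entry of the inverse."""
    return [sum(R_inverse[j][i] * omega[j] for j in range(3)) for i in range(3)]


def norm_squared(v):
    """|v|^2 = delta_ij v^i v^j in Cartesian coordinates."""
    return sum(x * x for x in v)


def pairing(omega, v):
    """omega_i v^i."""
    return sum(w * x for w, x in zip(omega, v))


theta = 0.7
v = [1.0, 2.0, 3.0]
omega = [0.5, -1.0, 2.0]
v_new = transform_contravariant(rotation_z(theta), v)
omega_new = transform_covariant(rotation_z(-theta), omega)  # inverse rotation is angle -theta

print(norm_squared(v), norm_squared(v_new))
print(pairing(omega, v), pairing(omega_new, v_new))
```

Each printed pair should agree up to floating-point rounding.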
Using the components of $R^{i'}{}_j$ given above, use eq. (2.13) to check that you have correctly
identified the $R_{\ell'}{}^k$. Then using the transformation laws for contravariant and covariant
vectors, eqs. (2.10) and (2.12), show that covariant vector components are transformed in the $-\theta$
direction while the contravariant vector components are transformed in the $+\theta$ direction.
This sign difference might seem rather trivial, but it is anything but! It is our first glimpse
of why we do need to be very careful to distinguish between upstairs and downstairs
indices for vectors – and more generally for tensors, which are multilegged generalizations of
contravariant and covariant vectors.
In physics, we often want to find the norm (length) of a vector, or the angle between
two distinct vectors via the dot product. In index notation, what we need to be able to
do is to convert contravariant vectors into covariant vectors or vice versa. To achieve this,
we need extra structure on the space (or later on, the spacetime) in which the vectors live,
called a metric tensor $g$, which must be invertible. In flat Euclidean 3-space in Cartesian
coordinates, the role of the metric tensor is played by the Kronecker delta tensor, which
is the identity matrix in both upstairs and downstairs components,
$$ \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} , \tag{2.15} $$
and
$$ \delta^{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} . \tag{2.16} $$
The one-up-one-down version $\delta^i_j$ also has the same numerical values.
The upstairs spacetime metric is the (left and right) inverse of the downstairs metric,
$$ g^{ij} g_{jk} = g^i{}_k = \delta^i_k , \qquad g_{ij} g^{jk} = g_i{}^k = \delta_i^k . \tag{2.17} $$

As you can see, this equation is easily satisfied for flat Euclidean 3-space in Cartesian
coordinates. Soon we will see that this equation must also hold when using more general
coordinate systems or when operating in curved spacetime – or both.
Converting between contravariant and covariant components of vectors is
achieved via
$$ v_i = g_{ij} v^j , \tag{2.18} $$
and
$$ \omega^i = g^{ij} \omega_j , \tag{2.19} $$
where again we used the Einstein summation convention. The fact that the metric is so trivial
in flat Euclidean 3-space in Cartesian coordinates is why people are often very careless with
index placement – if you write it out explicitly you will see that (for example) $v^2 = v_2$ because
$g_{ij} = \delta_{ij}$ and $g^{ij} = \delta^{ij}$. Reminder: if you need to take a power of a vector component, put
parentheses around it to make it unambiguous. For example, $(v^1)^2$ means the square of the
first contravariant component of the vector $v$.

Notice how if we have a physics equation for contravariant vectors, say $F^j = m a^j$, we
can multiply both sides by $\delta_{ij}$ (which is a number) and sum over the repeated index $j$ to
obtain a covariant vector equation $F_i = m a_i$. This chain of logic only works because we have
a metric available – otherwise, we would have no way of converting upstairs-index equations
to downstairs-index ones.
As you will recall from Newtonian physics, the kinetic energy is proportional to the
square of the velocity vector, i.e. its squared norm $|v|^2 = \delta_{ij} v^i v^j$. This is a scalar, i.e. invariant
under rotations. So is the inner product or dot product of any two contravariant vectors
$a^i$ and $b^j$, formed by using the metric tensor,
$$ a \cdot b = g_{ij} a^i b^j . \tag{2.20} $$
In 3 spatial dimensions only, we can build another one-index animal out of two contravariant
vectors $a^i$ and $b^j$ by taking an outer product or cross product. We will be able to write an
expression for this in our handy index notation by making use of another new object known
as the permutation pseudotensor $E_{ijk}$, which is antisymmetric under interchange of any
two of its indices,
$$ E_{ijk} = \begin{cases} +1 , & (ijk) = \text{even permutation of } (123) , \\ -1 , & (ijk) = \text{odd permutation of } (123) , \\ 0 , & \text{otherwise} . \end{cases} \tag{2.21} $$
There is also an upstairs version $E^{ijk}$ with the same numerical values, in flat Euclidean
3-space. What is this permutation pseudotensor used for? Well, one of the first things it can
do is to help us find the determinant of a matrix,
$$ \det(M) = E_{ijk} M^{1i} M^{2j} M^{3k} . \tag{2.22} $$
Applying this formula to an orthogonal transformation matrix allows us to discover that $E_{ijk}$
is a pseudotensor, rather than a proper tensor, because it does not flip sign under a parity
transformation $x^i \to -x^i$. It is also invariant under rotations and translations.
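The determinant formula (2.22) is easy to test by brute force. The following is a minimal sketch, not taken from the notes: indices run over 0, 1, 2 instead of 1, 2, 3, the closed-form expression used for the permutation symbol is one convenient choice among several, and the sample matrix is arbitrary.

```python
def levi_civita(i, j, k):
    """Permutation symbol E_ijk for indices in {0, 1, 2}.

    The product (i-j)(j-k)(k-i) is +2 for even permutations, -2 for odd
    permutations, and 0 whenever an index repeats.
    """
    return (i - j) * (j - k) * (k - i) // 2


def det3(M):
    """det(M) = E_ijk M[0][i] M[1][j] M[2][k], summing over i, j, k."""
    indices = range(3)
    return sum(levi_civita(i, j, k) * M[0][i] * M[1][j] * M[2][k]
               for i in indices for j in indices for k in indices)


M = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]
parity = [[-1, 0, 0],
          [0, -1, 0],
          [0, 0, -1]]

print(det3(M))
print(det3(parity))
```

The parity matrix has determinant $-1$, which is the sign that a proper three-index tensor would pick up but the permutation symbol, by definition, does not.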
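When handling expressions containing $E_{ijk}$ or its upstairs version $E^{ijk}$, we may need to
know what contractions of these beasts look like. The identities it obeys are very handy to
know,
$$ \begin{aligned}
E^{ijk} E_{ijk} &= 3! , \\
E^{ijk} E_{ij\ell} &= 2!\, \delta^k_\ell , \\
E^{ijk} E_{imn} &= 1!\, \delta^{jk}_{mn} , \\
E^{ijk} E_{\ell mn} &= 0!\, \delta^{ijk}_{\ell mn} ,
\end{aligned} \tag{2.23} $$
where the generalized Kronecker deltas are defined by
$$ \delta^{jk}_{mn} \equiv \begin{vmatrix} \delta^j_m & \delta^k_m \\ \delta^j_n & \delta^k_n \end{vmatrix} = \delta^j_m \delta^k_n - \delta^j_n \delta^k_m , \tag{2.24} $$
and
$$ \begin{aligned}
\delta^{ijk}_{\ell mn} \equiv \begin{vmatrix} \delta^i_\ell & \delta^j_\ell & \delta^k_\ell \\ \delta^i_m & \delta^j_m & \delta^k_m \\ \delta^i_n & \delta^j_n & \delta^k_n \end{vmatrix}
= {} & + \delta^i_\ell \delta^j_m \delta^k_n + \delta^i_m \delta^j_n \delta^k_\ell + \delta^i_n \delta^j_\ell \delta^k_m \\
& - \delta^i_m \delta^j_\ell \delta^k_n - \delta^i_\ell \delta^j_n \delta^k_m - \delta^i_n \delta^j_m \delta^k_\ell .
\end{aligned} \tag{2.25} $$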
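The first three contraction identities in eq. (2.23) can be confirmed by exhaustive enumeration over all index values. This is only a sanity check sketched for illustration, with invented helper names and indices running over 0, 1, 2; in flat Euclidean space the upstairs and downstairs symbols have the same numerical values, so one function serves for both.

```python
def levi_civita(i, j, k):
    """Permutation symbol for indices in {0, 1, 2}: +1, -1, or 0."""
    return (i - j) * (j - k) * (k - i) // 2


def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0


IDX = range(3)

# E^{ijk} E_{ijk} = 3!
full_contraction = sum(levi_civita(i, j, k) ** 2
                       for i in IDX for j in IDX for k in IDX)

# E^{ijk} E_{ijl} = 2! delta^k_l, checked for every free index pair (k, l)
double_contraction_ok = all(
    sum(levi_civita(i, j, k) * levi_civita(i, j, l) for i in IDX for j in IDX)
    == 2 * delta(k, l)
    for k in IDX for l in IDX)

# E^{ijk} E_{imn} = delta^j_m delta^k_n - delta^j_n delta^k_m
single_contraction_ok = all(
    sum(levi_civita(i, j, k) * levi_civita(i, m, n) for i in IDX)
    == delta(j, m) * delta(k, n) - delta(j, n) * delta(k, m)
    for j in IDX for k in IDX for m in IDX for n in IDX)

print(full_contraction, double_contraction_ok, single_contraction_ok)
```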

Using all of this, we can finally write out the components of the outer (cross) product in
index notation in 3D,
$$ (a \times b)_i = E_{ijk} a^j b^k . \tag{2.26} $$
Notice that in writing the outer product here, we have again used the Einstein summation
convention – twice – on both $j$ and $k$. This makes the expression more compact. Also,
since this is a bona fide 3-vector equation, we can raise the index using our (spatial, trivial)
metric. As you should convince yourself, the result is $(a \times b)^i = E^{ijk} a_j b_k$. We can use these
expressions to find, for example,

$$ \begin{aligned}
[a \times (b \times c)]_\ell &= E_{\ell mn} a^m (b \times c)^n \\
&= E_{\ell mn} a^m E^{npq} b_p c_q \\
&= \delta^{pq}_{\ell m} a^m b_p c_q \\
&= (\delta^p_\ell \delta^q_m - \delta^p_m \delta^q_\ell) a^m b_p c_q \\
&= a^q b_\ell c_q - a^p b_p c_\ell \\
&= [b (a \cdot c) - c (a \cdot b)]_\ell ,
\end{aligned} \tag{2.27} $$

which should look familiar from vector calculus classes earlier in your education.
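The result of eq. (2.27) can be spot-checked by building the cross product directly from the permutation symbol, as in eq. (2.26). The sketch below is an illustration with made-up sample vectors and helper names; integer components are used so that the comparison is exact.

```python
def levi_civita(i, j, k):
    """Permutation symbol for indices in {0, 1, 2}: +1, -1, or 0."""
    return (i - j) * (j - k) * (k - i) // 2


def cross(a, b):
    """(a x b)_i = E_ijk a^j b^k, summing over j and k."""
    idx = range(3)
    return [sum(levi_civita(i, j, k) * a[j] * b[k] for j in idx for k in idx)
            for i in idx]


def dot(a, b):
    """a . b = delta_ij a^i b^j in Cartesian coordinates."""
    return sum(x * y for x, y in zip(a, b))


a = [1, 2, 3]
b = [-1, 0, 2]
c = [4, 1, -2]

lhs = cross(a, cross(b, c))                       # a x (b x c)
rhs = [b[i] * dot(a, c) - c[i] * dot(a, b)        # b (a.c) - c (a.b)
       for i in range(3)]
print(lhs, rhs)
```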
In general dimension $d$, we can define a $d$-dimensional version of the antisymmetric
permutation pseudotensor with $d$ indices. Then, the outer product between two
contravariant vectors $a^i$ and $b^j$ is more properly thought of as a pseudotensor with $(d - 2)$
legs, because it is formed via the contraction $E_{i_1 i_2 \ldots i_d} a^{i_1} b^{i_2}$ of two vectors $a$ and $b$ with the
$d$-legged $E$ pseudotensor. Note that $E$ is defined in any dimension as long as the manifold is
orientable.

3 R17Sep
3.1 Special relativity and 4-vectors in Minkowski spacetime
Let us now turn to studying how to generalize spatial vectors in flat Euclidean space to
spacetime vectors in flat Minkowski spacetime, in Cartesian coordinates to begin with.
The bedrock principle of the constancy of the speed of light has some fairly dramatic
physics implications, chief among them being time dilation and length contraction.
Both of these ideas have been rigorously tested experimentally, e.g. in particle collider and
cosmic ray contexts, and found to hold true. Also, velocities no longer add simply, obeying a
composition law that looks pretty mysterious the first time you see it. Let me now demystify
this and Lorentz boosts by using a clever parametrization.
When you first saw Lorentz boosts, probably at the end of first year Newtonian me-
chanics or in a second year modern physics course, they probably looked like the following.
For a Lorentz boost in the $x$ direction acting on infinitesimal intervals, in units where $c = 1$,
$$ dt' = \gamma_v (dt - v\,dx) , \quad dx' = \gamma_v (dx - v\,dt) , \quad dy' = dy , \quad dz' = dz , \tag{3.1} $$
where $\gamma_v \equiv 1/\sqrt{1 - v^2}$. Using these expressions, you can easily figure out how velocities
transform for a Lorentz boost along the $x$ axis,
$$ u'_x = \frac{dx'}{dt'} = \frac{u_x - v}{1 - u_x v} , \quad u'_y = \frac{dy'}{dt'} = \frac{u_y}{\gamma_v (1 - u_x v)} , \quad u'_z = \frac{dz'}{dt'} = \frac{u_z}{\gamma_v (1 - u_x v)} . \tag{3.2} $$
You can also work out the 3-accelerations
$$ a'_x = \frac{a_x}{\gamma_v^3 (1 - u_x v)^3} , \quad a'_y = \frac{a_y}{\gamma_v^2 (1 - u_x v)^2} + \frac{(u_y v)\, a_x}{\gamma_v^2 (1 - u_x v)^3} , \quad a'_z = \frac{a_z}{\gamma_v^2 (1 - u_x v)^2} + \frac{(u_z v)\, a_x}{\gamma_v^2 (1 - u_x v)^3} . \tag{3.3} $$
Notice that, unlike for Galilean relativity, acceleration is not an invariant in Special Relativ-
ity. But whether or not someone is accelerating is an absolute concept: if the acceleration is
zero in one frame of reference, then it is also zero in a Lorentz boosted frame of reference.
Note: these formulæ are written in older notation that we will not continue using later in
this course.
We can write Lorentz boost formulæ in a much prettier way by using the rapidity ζ,
which is defined by
$$ v = \tanh\zeta . \tag{3.4} $$
Note that while the velocity ranges over $v \in (-1, +1)$, the rapidity ranges over $\zeta \in (-\infty, +\infty)$.
The really awesome thing about rapidity is that it is additive. To add the rapidities, you
literally just add them, like for rotation angles: $\zeta_{\rm tot} = \zeta_1 + \zeta_2$. It is a simple exercise to
recover the relativistic velocity addition law from the definition of rapidity and its additive
nature. Give it a go yourself to be sure you understand. Now we are in a position to show
you a Lorentz boost along the $x$ direction in rapidity variables – da-daah!
$$ \begin{aligned}
dt' &= +\cosh\zeta \, dt - \sinh\zeta \, dx , \\
dx' &= -\sinh\zeta \, dt + \cosh\zeta \, dx , \\
dy' &= dy , \\
dz' &= dz .
\end{aligned} \tag{3.5} $$

This looks a bit like a rotation, except for two physically important differences: (1) it mixes
temporal and spatial intervals, rather than different spatial intervals, and (2) it involves
hyperbolic trig functions, rather than normal trig functions. Another difference is that it is
not the 3D Euclidean norm that is preserved under Lorentz transformations, but rather the
4D Minkowski norm, also known as the invariant interval

$$ ds^2 = dt^2 - dx^2 - dy^2 - dz^2 . \tag{3.6} $$

The invariant interval so defined is positive if the points are timelike separated, negative
if they are spacelike separated, and zero if they are null separated. This classification
works regardless of which inertial reference frame you use, because it is invariant under sym-
metry transformations of Minkowski spacetime: rotations, [Lorentz] boosts, and translations.
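Two claims made above lend themselves to a quick numerical check: rapidities add when boosts along the same axis are composed, and the invariant interval is unchanged by a boost. The sketch below restricts eq. (3.5) to the $t$-$x$ plane with $c = 1$; the function names and the sample velocities and intervals are assumptions chosen for illustration.

```python
import math


def boost_x(zeta, dt, dx):
    """Apply the rapidity-zeta boost of eq. (3.5) to the interval (dt, dx)."""
    cosh_z, sinh_z = math.cosh(zeta), math.sinh(zeta)
    return (cosh_z * dt - sinh_z * dx, -sinh_z * dt + cosh_z * dx)


def interval(dt, dx):
    """Invariant interval ds^2 = dt^2 - dx^2 (mostly-minus signature, y = z = 0)."""
    return dt * dt - dx * dx


# Rapidities add, which reproduces the relativistic velocity composition law.
v1, v2 = 0.6, 0.7
zeta1, zeta2 = math.atanh(v1), math.atanh(v2)
v_from_rapidity = math.tanh(zeta1 + zeta2)
v_from_addition_law = (v1 + v2) / (1.0 + v1 * v2)

# Two successive boosts equal one boost with the summed rapidity.
dt, dx = 2.0, 0.5
two_steps = boost_x(zeta2, *boost_x(zeta1, dt, dx))
one_step = boost_x(zeta1 + zeta2, dt, dx)

print(v_from_rapidity, v_from_addition_law)
print(interval(dt, dx), interval(*one_step))
```

The composed velocity stays below 1 even though $v_1 + v_2 > 1$, and the interval agrees before and after the boost up to rounding.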
The invariant interval $ds^2 = dt^2 - |d\vec{x}|^2$ gives rise to the concept of a light cone. For
a point $p$, this is the cone defined by all light rays emanating from $p$ into the future or the
past. Points that are timelike separated from $p$ are inside its light cone (positive $ds^2$), those
that are spacelike separated from $p$ are outside it (negative $ds^2$), and those that are null
separated from $p$ (zero $ds^2$) lie on the light cone itself. Put more colloquially, if you had just
died at point p, then your past light cone and its interior would contain all possible suspects
for who had murdered you. If on the other hand you had set off a bomb at p, then your
future light cone and its interior would contain beings you could have killed (using any form
of explosive, TNT and photon torpedoes included!). Here is a pictorial representation of the
light cone (for the D = 2 + 1 case). Figure credit: Wikipedia.

Note that light rays are conventionally drawn at a 45 degree angle on spacetime diagrams,
in flat spacetime, to represent the fact that c = 1. In curved spacetime, the story gets more
complicated, because the spacetime metric varies with position, rather than being constant.
It might be worth reminding you of the definition of proper time. To set the context,
consider two events that are timelike separated. The proper time between two spacetime
events measures the time elapsed as seen by an observer for whom the two events occur at
the same spatial position. In our signature convention, the invariant interval is positive in
the timelike case, so $ds^2 = d\tau^2$.

Motivated by the form of the matrices representing Lorentz boost transformations, let
us define a relativistic 4-vector $x$ with components $x^\mu$ given by

$$ x^0 = (c)t , \qquad x^i = (\vec{x})^i . \tag{3.7} $$

Here, $\mu \in \{0, 1, \ldots, d\}$. Notice how time is totally different conceptually than it was in
Galilean relativity: it is the zeroth position coordinate, not an invariant. We can then define
the invariant interval as
$$ ds^2 = g_{\mu\nu} \, dx^\mu dx^\nu . \tag{3.8} $$
In flat Minkowski spacetime in Cartesian coordinates, the metric tensor has downstairs
components
$$ \eta_{\mu\nu} = \begin{cases} +1, & \mu = \nu = 0 \\ -1, & \mu = \nu \in \{1, 2, \ldots, d\} \\ 0, & \mu \neq \nu \end{cases} . \tag{3.9} $$
Its upstairs counterpart, the inverse, has components
$$ \eta^{\mu\nu} = \begin{cases} +1, & \mu = \nu = 0 \\ -1, & \mu = \nu \in \{1, 2, \ldots, d\} \\ 0, & \mu \neq \nu \end{cases} . \tag{3.10} $$
The equations expressing the fact that the upstairs and downstairs Minkowski metrics are
inverses of each other are

$$ \eta^{\alpha\beta} \eta_{\beta\gamma} = \eta^\alpha{}_\gamma = \delta^\alpha_\gamma , \qquad \eta_{\alpha\beta} \eta^{\beta\gamma} = \eta_\alpha{}^\gamma = \delta_\alpha^\gamma . \tag{3.11} $$

Again, we have used the Einstein summation convention where repeated indices are summed
over. Note that we have chosen the mostly minus signature convention here. Be aware
that formulæ that you may obtain from various GR textbooks may have been written in the
opposite sign convention. This can be quite annoying when you are trying to track down
minus sign errors in a calculation. HEL has a useful table on p.193 outlining key signature
convention differences with d’Inverno, Misner-Thorne-Wheeler and Weinberg.
The Minkowski metric tensor η is useful for raising and lowering indices. Specifically,
for a contravariant vector $V^\nu$ we can find its covariant components $V_\mu$ by contracting with
$\eta_{\mu\nu}$:
$$ V_\mu = \eta_{\mu\nu} V^\nu . \tag{3.12} $$
Contracting an index means repeating it (precisely once) and summing over it. For
example, in the above equation, the index $\nu$ is contracted, while the index $\mu$ is not. Let us
calculate one component, $V_0$.

$$ \begin{aligned}
V_0 &= \eta_{0\nu} V^\nu \\
&= \eta_{00} V^0 + \eta_{01} V^1 + \eta_{02} V^2 + \eta_{03} V^3 \\
&= (+1) V^0 + (0) V^1 + (0) V^2 + (0) V^3 \\
&= +V^0 .
\end{aligned} \tag{3.13} $$

To find the contravariant components $\omega^\mu$ of a covariant vector $\omega_\nu$, we need to contract with
the upstairs metric $\eta^{\mu\nu}$:-
$$ \omega^\mu = \eta^{\mu\nu} \omega_\nu . \tag{3.14} $$
Using the Minkowski metric, we can define a relativistic dot product between two
contravariant vectors $a^\mu$ and $b^\nu$,
$$ a \cdot b = \eta_{\mu\nu} a^\mu b^\nu . \tag{3.15} $$
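Index lowering, eq. (3.12), and the relativistic dot product, eq. (3.15), can be written out as explicit sums in $D = 3 + 1$ dimensions. This is a minimal sketch with the mostly-minus metric used in these notes; the function names and sample 4-vectors are invented for the example.

```python
# Minkowski metric eta_{mu nu} in the mostly-minus convention, Cartesian coordinates.
ETA = [[1, 0, 0, 0],
       [0, -1, 0, 0],
       [0, 0, -1, 0],
       [0, 0, 0, -1]]


def lower_index(V):
    """V_mu = eta_{mu nu} V^nu, summing over the repeated index nu."""
    return [sum(ETA[mu][nu] * V[nu] for nu in range(4)) for mu in range(4)]


def minkowski_dot(a, b):
    """a . b = eta_{mu nu} a^mu b^nu."""
    return sum(ETA[mu][nu] * a[mu] * b[nu]
               for mu in range(4) for nu in range(4))


V = [5, 1, 2, 3]            # contravariant components V^mu
light_ray = [1, 1, 0, 0]    # a null displacement along x

print(lower_index(V))           # time component unchanged, spatial components flip sign
print(minkowski_dot(V, V))      # positive, so V is timelike
print(minkowski_dot(light_ray, light_ray))
```

Because numerically $\eta^{\mu\nu} = \eta_{\mu\nu}$ here, raising an index as in eq. (3.14) would use the same array.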
Before we move on to defining tensors in a more general way, let us make a couple
of comments about the symmetry group of Minkowski spacetime for those who might be
interested. We talked about rotations earlier, and noted that they preserved the norm of 3-
vectors in flat Euclidean space. A rotation matrix is orthogonal and preserves the Euclidean
3-norm. The group of such matrices in 3D is known as SO(3). What is the analogue
condition for 4-vectors in flat Minkowski spacetime? If you work out the algebra, you will
find that both rotation and boost transformations written as $4 \times 4$ matrices $\Lambda$ preserve the
Minkowski norm, $\Lambda^T \eta \Lambda = \eta$, where $\eta$ is the Minkowski metric tensor we defined above. In
index notation,
$$ \Lambda_{k'}{}^i \, \eta^{k'}{}_{\ell'} \, \Lambda^{\ell'}{}_j = \eta^i{}_j . \tag{3.16} $$
Such matrices $\Lambda$ in $D = d + 1$ dimensions are said to belong to the group $SO(1, d)$. Rotation
and boost matrices together are known as Lorentz transformations and they form a Lie
(continuous) group known as the Lorentz group.
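The defining condition Λ^T η Λ = η can be checked numerically. The sketch below assumes the (+,−,−,−) signature, c = 1, a boost along x parametrized by rapidity, and a rotation about z; all function names are hypothetical helpers written for this illustration.

```python
import math

# Minkowski metric, signature (+,-,-,-), units with c = 1 (assumed).
ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    return [[a[j][i] for j in range(4)] for i in range(4)]

def boost_x(zeta):
    """Boost along x with rapidity zeta."""
    ch, sh = math.cosh(zeta), math.sinh(zeta)
    return [[ch, -sh, 0, 0], [-sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def rotation_z(theta):
    """Rotation by angle theta about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]

def preserves_metric(lam, tol=1e-12):
    """True if Lambda^T eta Lambda equals eta to within tol."""
    lhs = matmul(transpose(lam), matmul(ETA, lam))
    return all(abs(lhs[i][j] - ETA[i][j]) < tol
               for i in range(4) for j in range(4))

def matrices_close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(4) for j in range(4))
```

A boost, a rotation, and their product all pass `preserves_metric`, while a simple rescaling of the time axis does not. Composing two x-boosts also reproduces a single boost with the summed rapidity.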
If we include translations as well, the resulting group of transformations is known as
the Poincaré group ISO(1, d) (mathematically, it is a semidirect product). An interesting
fact about the Poincaré group is that, without even looking at an experiment, you can
prove theoretically that there are only two invariants (see footnote 2): the mass m and the intrinsic spin
s. They are always the same in different inertial frames of reference related by rotations,
boosts, or translations. This is why subatomic particles are differentiated by their mass and
spin. The third label we use to distinguish subatomic particles, that also respects Poincaré
invariance, is the set of conserved charges under whichever gauge symmetries are relevant,
e.g. SU (3) × SU (2) × U (1) of the Standard Model of Particle Physics.
So, how do we define vectors and tensors in flat spacetime? The signature property of
a vector and, more generally, of a tensor, is that it transforms in a specific and well-defined
way under changes of reference frame, using the spacetime coordinates as the quintessential
example. For a single-index tensor V with upstairs components V µ ,
$$V^{\mu'} = \frac{\partial x^{\mu'}}{\partial x^{\nu}}\,V^{\nu} = \Lambda^{\mu'}{}_{\nu}\,V^{\nu} , \tag{3.17}$$
which is known as a contravariant vector. There are also covariant vectors which obey
$$V_{\mu'} = \frac{\partial x^{\nu}}{\partial x^{\mu'}}\,V_{\nu} = \Lambda_{\mu'}{}^{\nu}\,V_{\nu} . \tag{3.18}$$
Footnote 2: If you want to know why, and are unafraid of a little Lie group theory, you can find out by reading
my PHY2404S notes at https://ap.io/archives/courses/2014-2020/2404s/qft.pdf. I also explain
there why helicity is the relevant thing for massless particles and why spin has the character of an angular
momentum for massive particles.

Look closely at the above two equations: they are materially different. In the equation for
the contravariant vectors, the transformed coordinates $x'$ appear in the numerator of the
Jacobian and the original coordinates x appear in the denominator in the transformation
law; for covariant vectors the opposite happens.
Mathematically speaking, contravariant vectors live in the tangent space, which is de-
fined at every point in spacetime. Covariant vectors live in the cotangent space. They
obey the usual axioms of vector spaces: associativity and commutativity of addition, ex-
istence of identity and inverse under addition, distributivity, and compatibility with scalar
multiplication.
A rank (m, n) tensor has m contravariant indices and n covariant indices. In math-
ematical language, a rank (m, n) tensor is a multilinear map from the direct product of
m copies of the cotangent space with n copies of the tangent space into the real numbers.
Alternatively, you can think of it as a machine with m slots for covariant vectors and n slots
for contravariant vectors to make a scalar. For instance, a rank (0, 1) tensor (a covariant
vector) is a machine with one slot for a contravariant vector (a rank (1, 0) tensor), which
when inserted will produce a scalar (a rank (0, 0) tensor). The spacetime metric is a (0, 2)
tensor; its inverse is a (2, 0) tensor.
To find out how the components of a tensor transform, you use the transformation
matrices on each index in turn,
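$$T^{\mu_1'\ldots\mu_m'}{}_{\nu_1'\ldots\nu_n'} = \frac{\partial x^{\mu_1'}}{\partial x^{\lambda_1}} \cdots \frac{\partial x^{\mu_m'}}{\partial x^{\lambda_m}}\,\frac{\partial x^{\sigma_1}}{\partial x^{\nu_1'}} \cdots \frac{\partial x^{\sigma_n}}{\partial x^{\nu_n'}}\; T^{\lambda_1\ldots\lambda_m}{}_{\sigma_1\ldots\sigma_n} . \tag{3.19}$$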
Note that each of the indices λ1 , . . . , λm and σ1 , . . . , σn in this equation is repeated and
summed over, keeping to the Einstein summation convention. So if you were to expand out
all the components one by one, this would be a pretty long equation. It’s just as well we
know how to represent it compactly using index notation!
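To see the transformation law at work on a small case, the sketch below applies it to a contravariant vector, a covariant vector, and the (0, 2) metric tensor under a boost along x. It assumes c = 1, the (+,−,−,−) signature, and that the inverse of a boost with rapidity ζ is the boost with rapidity −ζ; the function names and sample components are invented for this illustration.

```python
import math

ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def boost_x(zeta):
    """Boost along x with rapidity zeta; its inverse is boost_x(-zeta)."""
    ch, sh = math.cosh(zeta), math.sinh(zeta)
    return [[ch, -sh, 0, 0], [-sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def transform_up(lam, v_up):
    """V^{mu'} = Lambda^{mu'}_nu V^nu: one factor of Lambda per upstairs index."""
    return [sum(lam[m][n] * v_up[n] for n in range(4)) for m in range(4)]

def transform_down(lam_inv, w_down):
    """W_{mu'} = (Lambda^{-1})^nu_{mu'} W_nu: one inverse factor per downstairs index."""
    return [sum(lam_inv[n][m] * w_down[n] for n in range(4)) for m in range(4)]

def transform_02(lam_inv, t):
    """T_{mu' nu'} = (Lambda^{-1})^s_{mu'} (Lambda^{-1})^r_{nu'} T_{s r}."""
    return [[sum(lam_inv[s][m] * lam_inv[r][n] * t[s][r]
                 for s in range(4) for r in range(4))
             for n in range(4)] for m in range(4)]

zeta = 0.5
lam, lam_inv = boost_x(zeta), boost_x(-zeta)
v_up = [1.0, 0.2, -0.4, 0.7]       # hypothetical components V^mu
w_down = [0.3, -1.1, 0.5, 2.0]     # hypothetical components W_mu

scalar_before = sum(w * v for w, v in zip(w_down, v_up))
scalar_after = sum(w * v for w, v in zip(transform_down(lam_inv, w_down),
                                         transform_up(lam, v_up)))
eta_prime = transform_02(lam_inv, ETA)
```

The contraction of a covariant with a contravariant vector comes out the same in both frames, and the Minkowski metric, transformed as a (0, 2) tensor, keeps the same components.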
The general idea of tensor analysis is that all laws of physics should be expressible in
terms of tensor equations. In tensorial equations, indices can be raised and
lowered, as long as this is done consistently on both sides. In other words, you should not
raise an index on the left side of a tensor equation while failing to do the same on the right
hand side. Every equation should have the same number and type of indices on both sides.
Tensorial equations hold equally well in any frame of reference, even though the components
are different in different frames of reference.
Now let us turn to a few examples of the utility of tensors in Minkowski spacetime.

3.2 Partial derivative 4-vector


We can use Minkowski spacetime tensors to describe more objects than a massive point
particle. For starters, we can form a very important covariant vector out of derivatives,

$$\partial_\mu \equiv \frac{\partial}{\partial x^\mu} . \tag{3.20}$$
Its zeroth component describes the time derivative
$$\partial_0 = \frac{\partial}{\partial (c)t} = \frac{1}{(c)}\frac{\partial}{\partial t} , \tag{3.21}$$

while the spatial parts ∂i describe spatial derivatives. As you can see, ∂µ arises naturally
as a covariant vector. It is a straightforward and worthwhile exercise to show that in flat
Minkowski spacetime,
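$$\partial^\mu \partial_\mu = \frac{1}{(c)^2}\frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2} - \frac{\partial^2}{\partial z^2} . \tag{3.22}$$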
This differential operator appears in relativistic wave equations, e.g. the Maxwell equations,
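or the equation of motion for a Klein-Gordon (scalar) field Φ, $\partial^\mu \partial_\mu \Phi = -m^2 \Phi$ (in units with ħ = 1, and with the sign fixed by our (+,−,−,−) signature).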
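For fun, let us try applying $-i\hbar\partial_\mu$ to a plane wave of the form $f(x) = e^{ik\cdot x}$ and see what
happens.

$$\begin{aligned}
-i\hbar\,\partial_\mu f(x) &= -i\hbar\,\partial_\mu \exp(i k_\lambda x^\lambda) \\
&= -i\hbar\,[i k_\nu \delta^\nu_\mu]\,\exp(i k_\lambda x^\lambda) \\
&= \hbar k_\mu f(x) .
\end{aligned} \tag{3.23}$$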

In other words, $-i\hbar\partial_\mu$ is playing the role of the momentum when acting on plane waves of
the form $f(x) = e^{ik\cdot x}$, producing the eigenvalue $p_\mu = \hbar k_\mu$. In mathematical lingo, we say
that the plane wave carries a representation of the translation group. If we only had discrete
translation invariance up to a lattice vector, we would end up with Bloch waves
rather than continuous spectrum plane waves.
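The eigenvalue statement in (3.23) can be checked with a finite-difference derivative. This is only a sketch: it sets ħ = 1, uses a central difference with a hypothetical step size `h`, and the wave vector and spacetime point are arbitrary made-up numbers.

```python
import cmath

HBAR = 1.0  # units with hbar = 1 (assumed for this sketch)

def plane_wave(k_down, x_up):
    """f(x) = exp(i k_lambda x^lambda), with k given by its covariant components."""
    phase = sum(k * x for k, x in zip(k_down, x_up))
    return cmath.exp(1j * phase)

def momentum_operator(f, x_up, mu, h=1e-5):
    """Apply -i hbar d/dx^mu to f at x, using a central finite difference."""
    x_plus = list(x_up)
    x_minus = list(x_up)
    x_plus[mu] += h
    x_minus[mu] -= h
    derivative = (f(x_plus) - f(x_minus)) / (2 * h)
    return -1j * HBAR * derivative

k_down = [1.3, -0.4, 0.2, 0.9]   # hypothetical covariant wave vector k_mu
x_up = [0.5, 1.0, -2.0, 0.3]     # hypothetical spacetime point x^mu

def f(x):
    return plane_wave(k_down, x)

# Ratio (-i hbar d_mu f) / f should reproduce hbar * k_mu for each mu.
eigenvalues = [momentum_operator(f, x_up, mu) / f(x_up) for mu in range(4)]
```

Each entry of `eigenvalues` agrees with ħ k_µ up to the finite-difference error, independently of the point x chosen.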

4 M21Sep
4.1 Relativistic particle: position, momentum, acceleration 4-vectors
For any point particle, massive or massless, we can define its 4-momentum $p^\mu$ by

$$p^0 = E , \qquad p^i = (\vec p)^i , \tag{4.1}$$

where $\vec p$ is the relativistic 3-momentum and E is the relativistic energy. For a massive
particle, we have
$$p^0 = \frac{m}{\sqrt{1 - v^2}} = m\cosh\zeta , \qquad p^i = \frac{m}{\sqrt{1 - v^2}}\,v^i = m\sinh\zeta\,\hat v^i . \tag{4.2}$$

Check out for yourself what happens to components of the 4-momentum under Lorentz
transformations.
Notice that the relativistic norm of the momentum 4-vector is a constant,

$$p^\mu p_\mu = E^2 - |\vec p|^2 = m^2 . \tag{4.3}$$

This is known as the mass shell relation. It holds for any particle, massless or massive.
For massless particles like the photon, $E^2 = |\vec p|^2$.
The 4-velocity is defined for massive particles only, via
$$u^\mu = \frac{dx^\mu(\tau)}{d\tau} , \tag{4.4}$$

where τ is the proper time. It is related to the momentum 4-vector by

$$p^\mu = m u^\mu . \tag{4.5}$$

Note that the 4-velocity satisfies

$$u^\mu u_\mu = +1 , \tag{4.6}$$
by the mass shell constraint. Work out for yourself how the spatial components of $u^\mu$ relate
to the Newtonian velocity; you should find $u^0 = \gamma_v$, $u^i = \gamma_v (\vec v)^i$.
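The relations (4.2) to (4.6) are easy to spot-check numerically. The sketch below assumes c = 1 and the (+,−,−,−) signature; the function names and the sample mass and velocity are made up for this illustration.

```python
import math

def gamma(v3):
    """Lorentz factor for a 3-velocity given as a list, in units with c = 1."""
    v_sq = sum(c * c for c in v3)
    return 1.0 / math.sqrt(1.0 - v_sq)

def four_velocity(v3):
    """u^mu = (gamma, gamma * v) for a massive particle."""
    g = gamma(v3)
    return [g] + [g * c for c in v3]

def four_momentum(m, v3):
    """p^mu = m u^mu."""
    return [m * u for u in four_velocity(v3)]

def minkowski_norm_sq(a_up):
    """a^mu a_mu in the (+,-,-,-) signature."""
    return a_up[0] ** 2 - sum(c * c for c in a_up[1:])

m = 2.0                  # hypothetical mass
v3 = [0.6, 0.0, 0.0]     # hypothetical 3-velocity along x
u = four_velocity(v3)
p = four_momentum(m, v3)
zeta = math.atanh(0.6)   # rapidity, defined by v = tanh(zeta)
```

For this example γ = 1.25, so u^µ = (1.25, 0.75, 0, 0); its norm is +1 and the norm of p^µ is m², in line with the mass shell relation. The components of p^µ also match m cosh ζ and m sinh ζ.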
The 4-acceleration is defined for massive particles only, via
$$a^\mu = \frac{du^\mu(\tau)}{d\tau} = \frac{d^2 x^\mu(\tau)}{d\tau^2} , \tag{4.7}$$
where τ is proper time. Work out for yourself how this relativistic acceleration $a^\mu$ connects
with the Newtonian version of acceleration $\vec a$ that you used in first-year undergrad physics.
What if we wanted to get a bit more sophisticated and write down an action principle
for the point particle? First, let us do a lightning review of some salient points from classical
mechanics. In a general dynamical system, our variables are the coordinates

$$q^a(\lambda) , \tag{4.8}$$

where the index a labels which coordinate we are discussing and λ is a parameter that
measures where we are along a particle path. For a non-relativistic system we will pick λ = t,
Newtonian time. The velocities are $\dot q^a(\lambda)$, where the overdot denotes
$$\dot{q}^a \equiv \frac{dq^a}{d\lambda} . \tag{4.9}$$

We also have the expression for the canonical momenta in terms of the velocities,
$$p_a = \frac{\partial L}{\partial \dot q^a} , \tag{4.10}$$
which are found from the action, which is a functional of the coordinates,
$$S = S[q^a(\lambda)] = \int d\lambda\; L\big(q^a(\lambda), \dot q^b(\lambda)\big) . \tag{4.11}$$

Using the Lagrangian L and the expressions for the canonical momenta in terms of the
velocities, we can form the Hamiltonian H which depends on the coordinates on phase
space, the coordinates and their conjugate momenta,
$$H = H(q^a, p_b) = \sum_a p_a \dot q^a - L . \tag{4.12}$$

The principle of least action δS = 0, combined with your knowledge of the calculus of
variations, results in the Euler-Lagrange equations
$$\frac{\partial L}{\partial q^a} - \frac{d}{d\lambda}\left(\frac{\partial L}{\partial \dot q^a}\right) = 0 . \tag{4.13}$$
These equations of motion are equivalent to Hamilton’s equations,
$$\frac{dp_a}{d\lambda} = \{p_a, H\}_{\rm PB} , \qquad \frac{dq^a}{d\lambda} = \{q^a, H\}_{\rm PB} , \tag{4.14}$$
where the Poisson bracket is defined via
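$$\{f, g\}_{\rm PB} = \sum_a \left( \frac{\partial f}{\partial q^a}\frac{\partial g}{\partial p_a} - \frac{\partial g}{\partial q^a}\frac{\partial f}{\partial p_a} \right) . \tag{4.15}$$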

The basic dynamical variables for a non-relativistic point particle are xi (t), where t is
the non-relativistic time and i = 1, 2, 3. These are 3 bona fide independent functions. There
is no issue about how to parametrize t, because all observers agree on time, by Galilean
relativity. For a free nonrelativistic particle, the Lagrangian is just the kinetic energy,
$$S_{\rm nonrel} = \int dt\; \frac{1}{2} m |\vec v|^2 . \tag{4.16}$$
This action respects Galilean invariance in Euclidean 3-space. The canonical momenta are

$$p_i = m v_i , \tag{4.17}$$

and the Hamiltonian is
$$H_{\rm NR} = \frac{1}{2m}\, p^i p_i . \tag{4.18}$$
This is just the kinetic energy written in terms of momentum rather than velocity.
So, that was all well and good, but what about an action principle for the relativistic
point particle? This will be an integral over the worldline of the particle, which is the path
it traces out as it moves through spacetime. For relativistic point particles we cannot use
the Newtonian kinetic energy, because it is not invariant under Lorentz boosts. We will have
to use a generalization that respects Einsteinian relativity. The simplest guess for an action
generalizing the above that people typically write for a massive particle is proportional to
the arc length,
$$S^{(1)}_{\rm rel} = -m \int d\tau\, \sqrt{\eta_{\mu\nu}\,\frac{dx^\mu(\tau)}{d\tau}\,\frac{dx^\nu(\tau)}{d\tau}} , \tag{4.19}$$
where τ is the proper time (an invariant, unlike the time coordinate). This action has the
benefit that, at low speeds, it reduces to the familiar non-relativistic action – up to an
additive constant (try it yourself to see how, by doing a Taylor series). It assumes that the
particle position xµ (τ ) can be parametrized by the proper time τ .
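The low-speed limit mentioned above can be sketched in a couple of lines. This assumes units with c = 1 and uses the time dilation relation $d\tau = \sqrt{1 - |\vec v|^2}\,dt$ to trade proper time for coordinate time; it is an outline of the Taylor series, not a full derivation.

```latex
% Assumes c = 1 and d\tau = \sqrt{1 - |\vec v|^2}\, dt.
\begin{aligned}
S^{(1)}_{\rm rel} &= -m \int d\tau = -m \int dt\, \sqrt{1 - |\vec v|^2} \\
&= \int dt \left[ -m + \tfrac{1}{2} m |\vec v|^2 + \tfrac{1}{8} m |\vec v|^4 + \ldots \right] .
\end{aligned}
```

The first term is the additive constant (the rest energy, with a minus sign), the second is the non-relativistic kinetic term of (4.16), and the remaining terms are relativistic corrections that are negligible when $|\vec v| \ll 1$.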
The drawback of this first choice of relativistic particle action is twofold. First, the
particle is assumed to be massive, so that proper time can be used to parametrize the
worldline. If we want to write down equations of motion for massless particles like photons,
it will not suffice. Second, our 4 dynamical variables $x^\mu(\tau)$ are not actually independent
functions. At all points along the evolution, the 4 must obey the mass shell constraint,
$\dot x^\mu \dot x_\mu = +1$, where the overdot denotes $d/d\tau$. As a result, only 3 of the 4 $x^\mu(\tau)$ are independent functions.
It is a physics fib to pretend that all 4 can be independently varied in the action principle.
This is why some people substitute the mass shell constraint into the Lagrangian to sidestep
the problem.
Suppose further that we tried to use the above Lagrangian $L_\tau = -m\sqrt{\dot x^\mu \dot x_\mu}$ to find the
canonical momenta and Hamiltonian. What we would end up with is Hτ ≡ 0. A related fact
is that the geometric arc length Lagrangian is ‘singular’. What does this mean? Well, if we
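inspect the Euler-Lagrange equations for general $q^a(\lambda)$, we can rearrange them to see that
$$\frac{\partial^2 L}{\partial \dot q^b\, \partial \dot q^a}\, \ddot q^b = \frac{\partial L}{\partial q^a} - \frac{\partial^2 L}{\partial q^b\, \partial \dot q^a}\, \dot q^b - \frac{\partial^2 L}{\partial \lambda\, \partial \dot q^a} . \tag{4.20}$$
Everything on the RHS of this equation is a function of λ, $q^a$, and $\dot q^a$. So for our massive
relativistic point particle, finding all of the accelerations $\ddot x^\mu$ in terms of τ, $x^\mu(\tau)$, $\dot x^\mu(\tau)$ only
works if the Hessian tensor
$$\frac{\partial^2 L}{\partial \dot x^\nu\, \partial \dot x^\mu} \tag{4.21}$$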
has maximal rank. It actually has one zero eigenvalue, and this signals the presence of a
local gauge symmetry: reparametrization invariance (see footnote 3).
Footnote 3: For more details on this and a number of related topics, see the 1990 textbook “Lagrangian Interaction”
by Noel Doughty, intended for senior undergraduates. When I was doing my B.Sc.(Hons) degree in New
Zealand, I took a course from Doughty, and his notes and background material were published as this book
a year later. I am very grateful to Doughty for helping inspire me to be a theoretical physicist. If you take
a peek into the Acknowledgements section, you will see that he thanked me and four of my classmates. :D
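The zero eigenvalue of the Hessian (4.21) for the arc length Lagrangian can be seen numerically. The sketch below builds the Hessian by finite differences and contracts it with the velocity itself. It assumes c = 1, the (+,−,−,−) signature, and a timelike sample velocity; the step size `h`, the mass value, and all names are hypothetical choices for this illustration.

```python
import math

ETA_DIAG = [1.0, -1.0, -1.0, -1.0]   # diagonal of the Minkowski metric
M = 1.0                              # hypothetical particle mass

def lagrangian(xdot):
    """Arc length Lagrangian L = -m sqrt(eta_{mu nu} xdot^mu xdot^nu)."""
    s_sq = sum(ETA_DIAG[mu] * xdot[mu] ** 2 for mu in range(4))
    return -M * math.sqrt(s_sq)

def hessian(xdot, h=1e-4):
    """Finite-difference estimate of d^2 L / (d xdot^mu d xdot^nu)."""
    def shifted(mu, nu, sign_mu, sign_nu):
        y = list(xdot)
        y[mu] += sign_mu * h
        y[nu] += sign_nu * h
        return lagrangian(y)
    return [[(shifted(mu, nu, 1, 1) - shifted(mu, nu, 1, -1)
              - shifted(mu, nu, -1, 1) + shifted(mu, nu, -1, -1)) / (4 * h * h)
             for nu in range(4)] for mu in range(4)]

xdot = [2.0, 0.5, -0.3, 0.1]         # hypothetical timelike velocity
hess = hessian(xdot)

# Contract the Hessian with xdot: a vanishing result means xdot is a null vector.
null_check = [sum(hess[mu][nu] * xdot[nu] for nu in range(4)) for mu in range(4)]
largest_entry = max(abs(hess[mu][nu]) for mu in range(4) for nu in range(4))
```

The Hessian itself is far from zero, yet contracting it with the velocity gives zero to within the finite-difference error. That null direction is the numerical trace of the fact that rescaling the parameter along the worldline does not change the action.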

Let us now mention the correct way to handle a constraint. The key is to introduce a
Lagrange multiplier, which in this case we will call e(λ), the einbein. In general, a Lagrange
multiplier is something that appears in your action principle only via dependence on “co-
ordinates” but not on “velocities”: it is not a truly dynamical field. Its only function is to
implement the constraint that you need to impose, in a way that respects the symmetries
of your system. In our case, we want to preserve invariance under rotations, boosts, and
translations. Using this idea, a Lagrangian can be written down that achieves all the things
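we need. I will refer to it as the einbein Lagrangian (see footnote 4),
$$S^{(2)}_{\rm rel} = \int d\lambda \left[ \frac{1}{2}\, e^{-1}(\lambda)\, \eta_{\mu\nu}\, \frac{dx^\mu(\lambda)}{d\lambda}\, \frac{dx^\nu(\lambda)}{d\lambda} + \frac{1}{2}\, e(\lambda)\, m^2 \right] . \tag{4.22}$$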
This action is invariant under reparametrizations λ → λ0 , as long as the einbein transforms
as

$$e \to e' = \frac{d\lambda}{d\lambda'}\, e . \tag{4.23}$$

Varying this action $S^{(2)}_{\rm rel}$ w.r.t. $e(\lambda)$ gives its Euler-Lagrange equation, and this produces
the mass shell constraint,
$$\dot x^\mu(\lambda)\, \dot x_\mu(\lambda) = +m^2 [e(\lambda)]^2 . \tag{4.24}$$
It is a good idea to check this yourself, by working through the steps. Along the way, you
will need to use the fact that
$$\frac{\partial x^\alpha}{\partial x^\beta} = \partial_\beta x^\alpha = \delta^\alpha_\beta = \eta^\alpha{}_\beta , \tag{4.25}$$
and a similar equation for the $\dot x^\mu$s. For massive particles we can pick the proper time
parametrization in which $e(\lambda) = 1/m$; then $\dot x^\mu \dot x_\mu = 1$ and λ = τ. For massless particles, a
convenient parametrization is $e(\lambda) = 1$, and so $\dot x^\mu \dot x_\mu = 0$ and there is no concept of proper
time, just a parameter λ. Because we have obtained the mass shell constraint equation
directly from the action, we can be confident that we truly have only 3 independent functions
xµ (λ) in our dynamical system, not 4.
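For readers who want to see the einbein variation spelled out, here is a short sketch of the steps. It writes the integrand of the einbein action as $L_\lambda$ and uses only the fact that no derivative of $e$ appears in it, so the Euler-Lagrange equation for $e$ is purely algebraic.

```latex
% Sketch: Euler-Lagrange equation for the einbein e(\lambda).
\begin{aligned}
L_\lambda &= \tfrac{1}{2}\, e^{-1}\, \dot x^\mu \dot x_\mu + \tfrac{1}{2}\, e\, m^2 ,
\qquad \frac{\partial L_\lambda}{\partial \dot e} = 0 , \\
0 &= \frac{\partial L_\lambda}{\partial e}
   = -\tfrac{1}{2}\, e^{-2}\, \dot x^\mu \dot x_\mu + \tfrac{1}{2}\, m^2
\quad\Longrightarrow\quad
\dot x^\mu \dot x_\mu = m^2 e^2 .
\end{aligned}
```

Setting $e = 1/m$ then gives $\dot x^\mu \dot x_\mu = 1$ (proper time parametrization), while $m = 0$ gives $\dot x^\mu \dot x_\mu = 0$ for any choice of $e$.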
Varying $S^{(2)}_{\rm rel}$ w.r.t. $x^\mu(\lambda)$ gives the equations of motion for the relativistic particle position,
$$\dot p_\mu(\lambda) = 0 , \tag{4.26}$$
where the canonical momenta are
$$p_\mu(\lambda) = e^{-1}(\lambda)\, \dot x_\mu(\lambda) . \tag{4.27}$$
The above equation of motion $\dot p_\mu = 0$ is equally valid for a massive or massless particle, as
long as it is free, rather than acted on by an external force. If it feels an external force,
obviously we would generically not expect its momentum to be conserved.
The Hamiltonian is
$$H_\lambda = p_\mu(\lambda)\, \dot x^\mu(\lambda) - L_\lambda = \frac{1}{2}\, e(\lambda) \left[ p^\mu(\lambda)\, p_\mu(\lambda) - m^2 \right] . \tag{4.28}$$
Footnote 4: Sometimes the einbein Lagrangian is alternatively referred to as “1D General Relativity”.

This Hamiltonian is proportional to the constraint, and this is the correct answer because it
gives all the correct Poisson Brackets:

$$\{x^\mu, p_\nu\}_{\rm PB} = \delta^\mu_\nu , \qquad \{x^\mu, x^\nu\}_{\rm PB} = 0 , \qquad \{p_\mu, p_\nu\}_{\rm PB} = 0 . \tag{4.29}$$

If we wanted to canonically quantize a system (something we will not be doing in this course),
we would replace classical Poisson brackets with quantum mechanical commutators.

4.2 Electromagnetism: 4-vector potential and field strength tensor
A less trivial example of a special relativistic tensor is Maxwell’s electromagnetism. Having
played with the Maxwell equations, you know why EM waves travel (in vacuum) at the speed
of light. You may already know that, decades before Einstein invented special relativity,
Maxwell had baked it into the very fabric of his eponymous equations! What you may
not know is that the familiar electric and magnetic field strengths are actually not correctly
described by vectors, but instead by a two-index covariant antisymmetric tensor. Specifically,
in four spacetime dimensions, the gauge field strength components Fµν are built out of
$$F_{0i} = +\delta_{ij}\, \vec E^{\,j} , \qquad F_{ij} = -\mathcal{E}_{ijk}\, \vec B^{\,k} . \tag{4.30}$$

In this equation, we used the totally antisymmetric permutation symbol in 3 dimensions.


The electromagnetic 4-vector gauge potential Aµ is built out of the scalar potential
and the 3-vector potential, with components

$$A^0 = \Phi , \qquad A^i = \vec A^{\,i} . \tag{4.31}$$

It is related to the field strength via the covariant curl,

Fµν = ∂µ Aν − ∂ν Aµ . (4.32)

This splits up in 3+1 notation as


$$\vec B = \vec\nabla \times \vec A , \qquad \vec E = -\vec\nabla \Phi - \frac{\partial \vec A}{\partial t} . \tag{4.33}$$
Note that Aµ (xλ ) is the basic dynamical field of electromagnetism. The field strength Fµν is
a derived quantity.
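As a concrete illustration of how the electric and magnetic fields sit inside the field strength, the sketch below builds the covariant components from a pair of 3-vectors, following $F_{0i} = +E^i$ and $F_{ij} = -\mathcal{E}_{ijk} B^k$, and raises both indices with the (+,−,−,−) metric. The contraction of the two, which works out to $2(|\vec B|^2 - |\vec E|^2)$, is a standard Lorentz invariant that is not discussed in these notes and is included only as a consistency check. The function names and field values are made up for this example.

```python
ETA_DIAG = [1, -1, -1, -1]   # diagonal of the Minkowski metric, c = 1 assumed

def eps3(i, j, k):
    """Totally antisymmetric symbol in 3 dimensions for indices 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

def field_strength_lower(e_field, b_field):
    """Covariant F_{mu nu} with F_{0i} = +E^i and F_{ij} = -eps_{ijk} B^k."""
    f = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        f[0][i + 1] = e_field[i]
        f[i + 1][0] = -e_field[i]
        for j in range(3):
            f[i + 1][j + 1] = -sum(eps3(i, j, k) * b_field[k] for k in range(3))
    return f

def raise_both(f_lower):
    """F^{mu nu} = eta^{mu mu} eta^{nu nu} F_{mu nu} for a diagonal metric."""
    return [[ETA_DIAG[m] * ETA_DIAG[n] * f_lower[m][n] for n in range(4)]
            for m in range(4)]

e_field = [1.0, 2.0, 3.0]    # hypothetical electric field components
b_field = [4.0, 5.0, 6.0]    # hypothetical magnetic field components
f_lower = field_strength_lower(e_field, b_field)
f_upper = raise_both(f_lower)

# Full contraction F_{mu nu} F^{mu nu}, a Lorentz scalar.
invariant = sum(f_lower[m][n] * f_upper[m][n]
                for m in range(4) for n in range(4))
```

The resulting matrix is antisymmetric, with the electric field in the time-space entries and the magnetic field in the space-space entries.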
Using the above definitions, the four Maxwell equations
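$$\vec\nabla \times \vec B - \frac{\partial \vec E}{\partial t} = \vec J , \qquad \vec\nabla \cdot \vec E = \rho , \tag{4.34}$$
$$\vec\nabla \times \vec E + \frac{\partial \vec B}{\partial t} = \vec 0 , \qquad \vec\nabla \cdot \vec B = 0 \tag{4.35}$$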

neatly collapse into two manifestly relativistic Maxwell equations,

$$\partial_\mu F^{\mu\nu} = J^\nu , \qquad \mathcal{E}^{\mu\nu\lambda\rho}\, \partial_\nu F_{\lambda\rho} = 0 . \tag{4.36}$$

Later when we generalize to curved spacetime, the partial derivatives ∂µ will be replaced by
covariant derivatives ∇µ .
In the above relativistic Maxwell equations, the 4-vector current is built out of the charge
density and the 3-vector current, with components

$$J^0 = \rho , \qquad J^i = \vec j^{\,i} . \tag{4.37}$$

The 4-vector current obeys a conservation law,

∂µ J µ = 0 . (4.38)

Here we are working in four spacetime dimensions. If we write more generally the
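spacetime dimension D as D = d + 1 (see footnote 5), then the D-index pseudotensor $\mathcal{E}_{\mu_0 \ldots \mu_d}$ is defined via
$$\mathcal{E}_{\mu_0 \ldots \mu_d} = \begin{cases} +1 , & (\mu_0 \ldots \mu_d) = \text{even permutation of } (012 \ldots d) , \\ -1 , & (\mu_0 \ldots \mu_d) = \text{odd permutation of } (012 \ldots d) , \\ \phantom{+}0 , & \text{otherwise} . \end{cases} \tag{4.39}$$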

If you want the permutation tensor with upstairs indices, you can easily build it by using
η µν to raise the indices. Note that our 4-index permutation pseudotensor Eµνλσ obeys some
handy identities, in a Minkowski space generalization of what we saw before for Euclidean
3-space. Defining
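$$\delta^{\mu\nu}_{\alpha\beta} = \begin{vmatrix} \delta^\mu_\alpha & \delta^\nu_\alpha \\ \delta^\mu_\beta & \delta^\nu_\beta \end{vmatrix} \tag{4.40}$$
and
$$\delta^{\mu\nu\lambda}_{\alpha\beta\gamma} = \begin{vmatrix} \delta^\mu_\alpha & \delta^\nu_\alpha & \delta^\lambda_\alpha \\ \delta^\mu_\beta & \delta^\nu_\beta & \delta^\lambda_\beta \\ \delta^\mu_\gamma & \delta^\nu_\gamma & \delta^\lambda_\gamma \end{vmatrix} \tag{4.41}$$
and
$$\delta^{\mu\nu\lambda\sigma}_{\alpha\beta\gamma\delta} = \begin{vmatrix} \delta^\mu_\alpha & \delta^\nu_\alpha & \delta^\lambda_\alpha & \delta^\sigma_\alpha \\ \delta^\mu_\beta & \delta^\nu_\beta & \delta^\lambda_\beta & \delta^\sigma_\beta \\ \delta^\mu_\gamma & \delta^\nu_\gamma & \delta^\lambda_\gamma & \delta^\sigma_\gamma \\ \delta^\mu_\delta & \delta^\nu_\delta & \delta^\lambda_\delta & \delta^\sigma_\delta \end{vmatrix} \tag{4.42}$$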
Footnote 5: We only ever consider spacetimes with one timelike dimension. Currently, it is not generally known how
to make sense of quantum theory with two or more timelike dimensions. Which ∂/∂t should we use in the
Schrödinger equation?

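gives, after quite a bit of algebra,

$$\begin{aligned}
\mathcal{E}^{\mu\nu\lambda\sigma}\, \mathcal{E}_{\mu\nu\lambda\sigma} &= -4! , \\
\mathcal{E}^{\mu\nu\lambda\sigma}\, \mathcal{E}_{\mu\nu\lambda\delta} &= -3!\; \delta^\sigma_\delta , \\
\mathcal{E}^{\mu\nu\lambda\sigma}\, \mathcal{E}_{\mu\nu\gamma\delta} &= -2!\; \delta^{\lambda\sigma}_{\gamma\delta} , \\
\mathcal{E}^{\mu\nu\lambda\sigma}\, \mathcal{E}_{\mu\beta\gamma\delta} &= -1!\; \delta^{\nu\lambda\sigma}_{\beta\gamma\delta} , \\
\mathcal{E}^{\mu\nu\lambda\sigma}\, \mathcal{E}_{\alpha\beta\gamma\delta} &= -0!\; \delta^{\mu\nu\lambda\sigma}_{\alpha\beta\gamma\delta} .
\end{aligned} \tag{4.43}$$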
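The first two contraction identities can be verified by brute force. The sketch below builds the 4-index permutation symbol from permutation parities, raises indices with the diagonal (+,−,−,−) metric, and sums over repeated indices. The names `perm_sign`, `eps_lower`, `eps_upper` and `three_contracted` are helpers invented for this check.

```python
from itertools import permutations, product

ETA_DIAG = [1, -1, -1, -1]   # diagonal of the Minkowski metric

def perm_sign(p):
    """+1 for an even permutation, -1 for an odd one (by counting inversions)."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

EPS_TABLE = {p: perm_sign(p) for p in permutations(range(4))}

def eps_lower(*idx):
    """Permutation symbol with all indices downstairs; zero if any index repeats."""
    return EPS_TABLE.get(tuple(idx), 0)

def eps_upper(*idx):
    """Raise all four indices using the diagonal metric."""
    factor = 1
    for i in idx:
        factor *= ETA_DIAG[i]
    return factor * eps_lower(*idx)

# Full contraction over all four index pairs.
full_contraction = sum(eps_upper(*idx) * eps_lower(*idx)
                       for idx in product(range(4), repeat=4))

def three_contracted(sigma, delta):
    """Contract three index pairs, leaving sigma upstairs and delta downstairs."""
    return sum(eps_upper(m, n, l, sigma) * eps_lower(m, n, l, delta)
               for m, n, l in product(range(4), repeat=3))
```

The overall minus signs come from raising all four indices: each nonzero component picks up one factor of +1 from the time index and three factors of −1 from the spatial indices.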

The relativistic Lorentz force law can be written very nicely in relativistic tensor notation,

maµ = qF µν uν , (4.44)

where uµ is the relativistic 4-velocity and aµ is the relativistic 4-acceleration. You will work
out some aspects of this EM story in your HW1 assignment. In particular, you will be able
to compute the effect of a Lorentz boost on the electromagnetic fields E ~ and B,
~ which many
of you will not have seen before.
We now turn to the question of what happens for accelerated observers moving with
constant relativistic acceleration.

5 R24Sep
5.1 Constant relativistic acceleration and the twin paradox
When I was an undergraduate, a professor introduced the idea of the Twin Paradox to us.
Could the space traveller twin really live longer by travelling at relativistic speeds? The
maddening thing was that he never equipped us with the technology to answer the question!
Here is how we can solve that without having to resort to General Relativity: we will only
use what we know about Special Relativity to solve it, along with a tiny bit of calculus.
We all know that time dilation lengthens time intervals as compared to what is measured
in the rest frame. We also know that each observer sees the other person’s clock as running slow.
So why is there even a difference between what the space twin sees and what the homebody
twin sees? Acceleration. The space twin must accelerate in order to turn around and come
back to Earth, before they can compare clocks with the homebody twin. This is what makes
the space twin physically distinct from the homebody twin, who stays in a relatively boring
inertial reference frame while the space twin gallivants around the galaxy.
What do we mean by constant relativistic acceleration, exactly? Without loss of gener-
ality, we may take the astronaut acceleration to be pointing along the x1 direction. Since the
rapidity is additive under two successive Lorentz boosts, we may take a guess that constant
relativistic acceleration occurs when rapidity increases linearly with proper time. Let the
magnitude of the constant relativistic acceleration be g. For an infinitesimal addition to
rapidity in the x1 direction dζ, the proposal is

dζ = g dτ . (5.1)

Here we have suppressed the factors of c, which can be easily restored via dimensional
analysis. This formula is interesting because it actually holds for any kind of acceleration
g(τ ), not just the constant kind. Let us now see why.
Our key tool for analysis will be to define the instantaneous inertial rest frame
(IIRF) for the accelerating astronaut, which we will denote by primes. This is obviously
distinct from the ordinary inertial reference frame (IRF) of the homebody twin, and it
is different at each point along the space twin’s trajectory because they accelerate. The
key physical feature of the IIRF at any τ is that the astronaut is at rest in that frame at
the instant in question: $u'_x = 0$. And since we know the relationship between 3-velocities
and 3-accelerations in different inertial reference frames from our experience with Lorentz
transformations, we can figure out what happens for the astronaut’s trajectory measured in
the lab frame. We have
$$u'_x = \frac{u_x - v}{1 - u_x v} , \qquad a'_x = \frac{a_x}{\gamma_v^3\, (1 - u_x v)^3} , \tag{5.2}$$

where $\gamma_v \equiv 1/\sqrt{1 - v^2}$. Accordingly, at each instant along the astronaut trajectory,

ux = v . (5.3)

Therefore, for a general acceleration $a'_x = g(\tau)$ in the IIRF,

$$a_x = \gamma_v^{-3}\, g(\tau) . \tag{5.4}$$

We also know from elementary Lorentz transformations that dt = γv dτ (this is just like
muons from cosmic rays lasting longer in the lab frame than in the muon rest frame because
they are whizzing down to earth at relativistic speed). Remembering that ax = dux /dt,
rearranging the above equation as a function of v, and integrating gives for the 3-velocity in
the $x^1$ direction
$$\operatorname{arctanh}[u_x(\tau)] = \int_0^\tau d\sigma\; g(\sigma) . \tag{5.5}$$
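The intermediate steps leading to (5.5) can be sketched as follows, assuming c = 1, setting $v = u_x$ along the trajectory as in (5.3), and taking the astronaut to start at rest, $u_x(0) = 0$.

```latex
% Sketch of the integration, with \gamma_v = (1 - u_x^2)^{-1/2} and u_x(0) = 0.
\begin{aligned}
\frac{du_x}{dt} &= \gamma_v^{-3}\, g(\tau) = (1 - u_x^2)^{3/2}\, g(\tau) ,
\qquad dt = \gamma_v\, d\tau = (1 - u_x^2)^{-1/2}\, d\tau , \\
\frac{du_x}{d\tau} &= (1 - u_x^2)\, g(\tau)
\quad\Longrightarrow\quad
\frac{du_x}{1 - u_x^2} = g(\tau)\, d\tau
\quad\Longrightarrow\quad
\operatorname{arctanh}[u_x(\tau)] = \int_0^\tau d\sigma\; g(\sigma) .
\end{aligned}
```

Since the rapidity is defined by $u_x = \tanh\zeta$, the left-hand side is just $\zeta(\tau)$.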

In turn, this can be easily rearranged to give the rapidity along the x1 direction
$$\zeta(\tau) = \int_0^\tau d\sigma\; g(\sigma) , \tag{5.6}$$

where we assumed that ζ(τ = 0) = 0. Next, we would like to compute the distance in the
homebody frame moved by the astronaut during homebody time dt. This is simply obtained
from the speed
dx = ux dt = tanh ζ dt . (5.7)
To convert to astronaut time, we again use the standard time dilation formula,

dt = cosh ζ dτ . (5.8)

This implies that


dx = sinh ζ dτ . (5.9)
If we know ζ(τ ), we can integrate these equations. It is especially easy to do so for
constant acceleration g. The position of the space twin in homebody coordinates becomes
$$x(\tau) = \frac{1}{g}\, [\cosh(g\tau) - 1] + x_0 . \tag{5.10}$$
The time for the space twin in homebody coordinates integrates to
$$t(\tau) = \frac{1}{g}\, \sinh(g\tau) + t_0 . \tag{5.11}$$
Using these equations, you can figure out the physical effect of acceleration on the ageing
process. In your first homework assignment, you will find out that acceleration serves to
enhance the familiar constant-speed time dilation effect, rather than reduce it. This is
because the free particle trajectory actually maximizes proper time elapsed during motion;
any acceleration applied reduces it. We will be able to see why this is later on when we
study geodesics. Geodesics are, morally speaking, the closest thing to a straight line that is
available in curved spacetime. They describe the trajectories of test particles in freefall.
Getting back to our equations above, we can see that for the case of a constant relativistic
acceleration g, the trajectory of the accelerating space twin satisfies
$$\left[ x(\tau) - x_0 + \frac{1}{g} \right]^2 - [t(\tau) - t_0]^2 = \frac{1}{g^2} . \tag{5.12}$$
As you can see by inspection, this is a hyperbola. The asymptotes of the hyperbola are
known as acceleration horizons or Rindler horizons. These asymptotes are lines with

a 1:1 slope on a spacetime diagram. We can see why they are horizons by recalling that
light rays also move at 45 degrees. An observer on a timelike trajectory going at higher
acceleration hugs the hyperbola asymptote more tightly, but still cannot ‘see’ beyond the
Rindler horizon.
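The constant-acceleration worldline and its horizon can be explored numerically. The sketch below assumes c = 1 and takes x_0 = t_0 = 0 by default; the value g = 0.7 used in the checks is arbitrary, and the function names are invented for this example.

```python
import math

def worldline(tau, g=1.0, x0=0.0, t0=0.0):
    """Homebody-frame coordinates (t, x) of the space twin at proper time tau."""
    t = math.sinh(g * tau) / g + t0
    x = (math.cosh(g * tau) - 1.0) / g + x0
    return t, x

def hyperbola_lhs(tau, g=1.0, x0=0.0, t0=0.0):
    """Left-hand side of the hyperbola equation; should equal 1/g^2 for all tau."""
    t, x = worldline(tau, g, x0, t0)
    return (x - x0 + 1.0 / g) ** 2 - (t - t0) ** 2

def coordinate_velocity(tau, g=1.0, h=1e-6):
    """dx/dt along the worldline, estimated by a central difference in tau."""
    t_plus, x_plus = worldline(tau + h, g)
    t_minus, x_minus = worldline(tau - h, g)
    return (x_plus - x_minus) / (t_plus - t_minus)

def horizon_gap(tau, g=1.0):
    """Return 1/g - (t - x); positive means the twin is still ahead of the horizon."""
    t, x = worldline(tau, g)
    return 1.0 / g - (t - x)
```

With x_0 = t_0 = 0 the combination t − x equals (1 − e^{−gτ})/g, which stays below 1/g for every τ. A light ray leaving the homebody's position x = 0 at a time t ≥ 1/g therefore never catches up with the space twin, which is the horizon described above. The coordinate velocity also approaches but never reaches 1.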

In fact, the physics is even more interesting than this. The accelerated astronaut not
only finds that there are parts of spacetime that they cannot communicate with because of
their acceleration, but also that the physics of quantum fields for them is qualitatively and
quantitatively different from what the homebody sees. The Minkowski vacuum (the state
with no particles), seen in the reference frame of the astronaut with constant acceleration,
turns out to have plenty of particles in it, and they can be measured with a detector. Not
only that, the spectrum is thermal, at the Rindler temperature. Including factors of c,
the formula for this reads
$$T_{\rm Rindler} = \frac{\hbar g}{2\pi c\, k_B} . \tag{5.13}$$
The greater the acceleration, the higher the temperature that the detector will register. This
phenomenon of acceleration radiation is known as the Unruh effect. For those who are
interested, the physics of particle detectors in GR is explained nicely in the advanced GR
textbook by Birrell and Davies “Quantum Field Theory in Curved Space”.
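To get a feel for the size of this effect, the sketch below evaluates the Rindler temperature formula in SI units. The constants are standard SI values typed in by hand, and the choice of Earth's surface gravity as the sample acceleration is purely illustrative.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
C = 299792458.0          # speed of light in vacuum, m / s
K_B = 1.380649e-23       # Boltzmann constant, J / K

def rindler_temperature(g):
    """Temperature in kelvin registered by a detector with proper acceleration g (m/s^2)."""
    return HBAR * g / (2.0 * math.pi * C * K_B)

# Illustrative value: an acceleration equal to Earth's surface gravity.
t_earth_gravity = rindler_temperature(9.80665)
```

For an acceleration of about 9.8 m/s² the result is of order 10⁻²⁰ K, which gives a sense of why the effect is so hard to observe directly. The temperature scales linearly with g.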

5.2 The Equivalence Principle
Einstein became famous for several different accomplishments. One which is legendary among
theoretical physicists is the concept of the Gedankenexperiment (German for thought exper-
iment). It allows us to work out all sorts of imaginative ideas without having to actually
spend any money. So imagine, if you will, that you are an astronaut on the space station.
Imagine that you are blindfolded and kidnapped and then one of two things happens to you.
Either you feel the acceleration due to gravity or you take a ride in a rocket ship capable of
that same acceleration. How would you tell the difference?

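The gravitational force from a body of mass M on a test mass $m_g$ is

$$\vec F_{\rm grav} = -\frac{G_N M m_g}{r^2}\, \hat r , \tag{5.14}$$
where $m_g$ is the gravitational mass and $G_N$ is the Newton constant. Setting this force equal to $m_i\, \vec a_{\rm grav}$, where $m_i$ is the inertial mass, we have
$$\vec a_{\rm grav} = -\left( \frac{m_g}{m_i} \right) \frac{G_N M\, \hat r}{r^2} . \tag{5.15}$$

If $m_i = m_g$, then this acceleration $\vec a_{\rm grav}$ does not depend on the properties of the test mass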
feeling the gravitational force.
The universality of gravitation was first put forward by Newton, centuries before Ein-
stein. Others hypothesized that the acceleration due to gravity should be universal, not de-
pending on the composition of the falling object. This idea has since been tested exquisitely
well. It implies that an object’s inertial mass (what makes you hard to move in the morning)
is equal to its gravitational mass (what responds to gravity), and is known as the Weak
Equivalence Principle.
When Einstein formulated his theory of General Relativity (GR) he decided to bake the
equivalence principle into the very fabric of spacetime. In GR, there is no local experiment

you can do to tell the difference between acceleration due to rockets and acceleration due to
gravity. This is known as the Einstein Equivalence Principle.
The really cool thing about the equivalence principle? It implies that every reference
frame, including accelerating ones, can be instantaneously approximated by a Lorentz frame.
This might seem like mathematical nitpicking, but it is actually a key physics insight, as it
implies that locally in spacetime, everything is just Special Relativity. What makes Gen-
eral Relativity interesting and nontrivial is the story of how those individual infinitesimal
neighbourhoods are sewn together into the fabric of curved spacetime.
It is important to note that this equivalence between gravity and acceleration holds only
in an infinitesimal patch about a point. If we have access to a finite sized patch of spacetime,
we can distinguish gravity from acceleration by measuring tidal forces. We will develop that
story later on when we get to discussing geodesic deviation.
Consider a photon in Earth’s gravitational field. If it gets aimed upwards, then after
a time interval dt, what is the effect? Well, photons cannot change their speed, as they
always go at c. What can change for a photon is its energy (or equivalently the magnitude
of its momentum, because the photon mass shell relation is $E = |\vec p|$). It can also change
its heading. When a photon moves upwards in a gravitational field, it gains gravitational
potential energy, so it must lose kinetic energy (conservation of the total energy is valid near
Earth, because there is a time translation symmetry). The photon should therefore suffer a
redshift in going upwards. This phenomenon is known as gravitational redshift, and it
implies that clocks run slower when they are deeper in a gravitational field. Black holes take
this to an extreme, as we will see much later in the course.
Did you know that GPS devices rely on both Special and General Relativity to locate
you accurately? They need to take account of the fact that the GPS transmitter satellites
are (a) travelling at a measurable fraction of the speed of light, requiring Special Relativistic
Doppler corrections, and (b) higher up in Earth’s gravitational field than we are, necessitating
General Relativistic corrections. Without those corrections together, you would probably
be kilometres off your intended position after a day’s canoeing. So GR does actually touch
your life in a measurable way, if you ever use a GPS unit, say in your smartphone.

5.3 Spacetime as a curved Riemannian manifold


Newton conceptualized gravity via forces that act at a distance instantaneously. This in-
stantaneous propagation of gravitational effects is in direct contradiction to the relativistic
principle that the speed of light is the upper speed limit for everyone. In Einstein’s GR,
the speed of propagation of gravitational disturbances is tied to be exactly equal to the
speed of light in vacuum. The formalism of GR is designed to express all the effects of grav-
ity in a relativistic way, like gravitational redshift, via geometrical properties of the fabric of
spacetime. The mathematical name for the type of geometry used is (pseudo)Riemannian
geometry.
A (p + q)-dimensional manifold with signature (p, q) is a spacetime that locally looks
like a patch of $\mathbb{R}^{p,q}$. For example, for our D = 3 + 1 universe with three large spatial
dimensions, this would be $\mathbb{R}^{3,1}$. The manifold is the collection (union) of these patches,
known as coordinate charts, along with the transition functions that teach you how
to sew the patches together. The manifold needs to be continuous, and in order for us

to compute sensible physical quantities it should also be differentiable. The mathematical
concept of the coordinate chart is equivalent to the usual physics idea of a coordinate system
or reference frame.
As an example of how you might need more than one coordinate chart to cover a manifold,
consider a circle $S^1$. Each coordinate chart must be an open set of $\mathbb{R}$ (emphasis on open).
So the minimum number of coordinate charts required to cover the 1-sphere is two.
For the 2-sphere $S^2$, you need a projection to get $S^2$ onto $\mathbb{R}^2$, or a patch thereof like a
map. The most commonly used projection is the Mercator projection, which preserves angles
rather than area. It is possible to use a different projection that preserves area, such as the
Peters projection. However, the price of maintaining areas on the map is that angles are not
preserved: countries look funny shaped compared to their Mercator cousins. Because the
sphere is curved and the plane is not, you cannot create a map that preserves both angles
and areas. The reason why the Mercator projection has been so dominant is a technical one:
because it preserves angles, it is optimal for navigation of marine vessels and aeroplanes.
But it massively overstates the size of countries closer to the poles. In particular, Western
Europe looks more important on Mercator maps than it should, while Africa and Brazil look
much smaller. Colonialism also had a role in the dominance of the Mercator projection.
Examples of manifolds include Minkowski space, the sphere, the torus, and 2D Riemann
surfaces with arbitrary genus. What about spaces that are not manifolds? Any intersection
of lines with k-planes will do. A cone is an example of a non-differentiable manifold, because
of what happens at its apex. Some manifolds have a boundary, for instance a line segment.
Some manifolds have no boundary.
General Relativity treats the fabric of spacetime as a differentiable manifold. Note that
it is also possible to handle discontinuities in the spacetime metric in some situations in
GR, but only if the appropriate source of energy-momentum is available at the discontinuity
to enforce consistency with Einstein’s equations. The formalism for handling this non-
differentiable case is known as the Israel junction conditions, and its equations are derived
by integrating Einstein’s equations across discontinuities in suitably covariant ways. This
works a lot like deriving equations for shock waves in fluid mechanics.
Spacetime being a differentiable manifold is not enough structure to describe gravity as
we see it in experiments. The geometry should be suitably constrained by some physical
equations, which should – by the Correspondence Principle – reduce to Newtonian mechanics
in the limit of small speeds and weak gravity. Our spacetime manifolds will satisfy the
Einstein equations.

6 M28Sep
6.1 Basis vectors in curved spacetime
How do vectors and tensors work when spacetime is curved? We will have to be more
careful than before, and the key difference is that the matrices showing us how to transform
between different coordinate systems are no longer constant matrices. Suppose that we
have coordinates $x^\mu$ on our manifold and that we consider an arbitrary function f of these
coordinates. Then the directional derivative of f along a curve $x^\mu(\lambda)$ with parameter λ is
$$ \frac{df}{d\lambda} = \frac{\partial f}{\partial x^\mu} \frac{dx^\mu}{d\lambda} \tag{6.1} $$
so that we can write
$$ \frac{d}{d\lambda} = \frac{dx^\mu}{d\lambda}\, \partial_\mu \tag{6.2} $$
In other words, $\{e_\mu = \partial_\mu\}$ is a set of basis vectors.
This story goes deeper. Mathematically, the tangent space $T_p(M)$ at a point p of a
manifold M is isomorphic to the space of directional derivative operators on curves through
p. It is a vector space, and the Leibniz rule is obeyed. Vector fields can then be defined on M .
An example of a vector field would be the wind direction at the surface of the Earth. Take
a look at https://earth.nullschool.net/ for a very beautiful interactive visualization of
winds on Earth.
What about a basis for covariant vectors living in the cotangent space $T_p^*(M)$? There is
a very natural candidate: the differentials $\{e^\mu = dx^\mu\}$. Note that these $dx^\mu$ are not the same
as the contravariant basis vectors $\partial_\mu = \partial/\partial x^\mu$; you can tell the difference partly by where
the index is placed. The coordinate bases for contravariant and covariant vectors obey a
natural inner product,
$$ (\partial_\nu)(dx^\mu) = \frac{\partial x^\mu}{\partial x^\nu} = \delta^\mu_\nu . \tag{6.3} $$
We do not have to stick to only using partial derivatives and differentials as bases. More
generally, we can denote a basis for contravariant vectors living in the tangent space as
$$ \{e_\mu\} . \tag{6.4} $$
Any contravariant vector v can be expanded in terms of this basis,
$$ v = v^\mu e_\mu . \tag{6.5} $$
Also, we denote a basis for covariant vectors living in the cotangent space as
$$ \{e^\nu\} . \tag{6.6} $$
Any covariant vector ω can be expanded in terms of this basis,
$$ \omega = \omega_\nu e^\nu . \tag{6.7} $$
Generally, a basis for contravariant vectors $e_\mu$ and a basis for covariant vectors $e^\nu$ must be
reciprocals,
$$ e_\nu \cdot e^\mu = \delta^\mu_\nu . \tag{6.8} $$
What if we wanted to measure distances and angles on our spacetime manifold? Using
the basis $e_\mu$, we can write the vector displacement $d\mathbf{s}$ between a point at $x^\mu$ and another
point at $x^\mu + dx^\mu$ as

$$ d\mathbf{s} = e_\mu\, dx^\mu . \tag{6.9} $$

Accordingly, the line element $ds^2$ is

$$ ds^2 = d\mathbf{s} \cdot d\mathbf{s} = (e_\mu\, dx^\mu) \cdot (e_\nu\, dx^\nu) = (e_\mu \cdot e_\nu)\, dx^\mu dx^\nu . \tag{6.10} $$

From this equation we can identify the metric tensor, denoted by $g_{\mu\nu}$,

$$ g_{\mu\nu} = e_\mu \cdot e_\nu . \tag{6.11} $$

This is a generalization of the flat Minkowski metric that we encountered in our quick review
of Special Relativity, and it tells us how to measure distances and angles. The above line
element in curved spacetime obeys a very important principle: it is invariant under arbitrary
coordinate changes which are invertible and $C^\infty$, known as diffeomorphisms.
The inverse metric is denoted as $g^{\mu\nu}$ and it is built in an exactly similar fashion,

$$ g^{\mu\nu} = e^\mu \cdot e^\nu . \tag{6.12} $$

The downstairs metric $g_{\mu\nu}$ and its inverse the upstairs metric $g^{\mu\nu}$ obey

$$ g^{\mu\nu} g_{\nu\lambda} = \delta^\mu_\lambda . \tag{6.13} $$

The spacetime metric and its inverse are used all the time in GR, for raising and lowering
indices on tensors.
Sometimes it is physically useful to use a special basis called the orthonormal basis.
In this case, we denote the basis vectors with hats, and they obey

$$ \hat{e}_\mu \cdot \hat{e}_\nu = \eta_{\mu\nu} , \qquad \hat{e}^\mu \cdot \hat{e}^\nu = \eta^{\mu\nu} . \tag{6.14} $$

Flat, boring Minkowski spacetime $R^{1,3}$ written in a spherical polar coordinate basis is
not a curved spacetime, but it has a spacetime metric and tensor transformation laws that
depend on spacetime position. As an exercise to test your understanding, check explicitly
that the line element is

$$ ds^2 = dt^2 - dr^2 - r^2 d\theta^2 - r^2 \sin^2\theta\, d\phi^2 , \tag{6.15} $$

by starting from the expressions for the Cartesian spatial coordinates $\{x^1, x^2, x^3\}$ in terms
of the spherical polar spatial coordinates $\{r, \theta, \phi\}$,

$$ x^1 = r \cos\theta , \tag{6.16} $$
$$ x^2 = r \sin\theta \cos\phi , \tag{6.17} $$
$$ x^3 = r \sin\theta \sin\phi . \tag{6.18} $$
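If you would like to check your answer to this exercise numerically, the following is a minimal Python sketch using only the standard library. The function names, step size and sample point are illustrative assumptions, not part of the course material. It builds the spatial metric from finite-difference derivatives of eqs. (6.16)-(6.18) and compares it with the coefficients of $dr^2$, $d\theta^2$ and $d\phi^2$ in eq. (6.15), up to the overall minus sign carried by the spatial part.

```python
import math

def cartesian(r, theta, phi):
    # Eqs. (6.16)-(6.18); note that the polar axis is x^1 in these notes.
    return (r * math.cos(theta),
            r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi))

def spatial_metric(q, h=1e-6):
    """Euclidean metric h_ab = sum_i (dx^i/dq^a)(dx^i/dq^b), via central differences."""
    jac = []  # jac[a][i] approximates d x^i / d q^a
    for a in range(3):
        qp, qm = list(q), list(q)
        qp[a] += h
        qm[a] -= h
        xp, xm = cartesian(*qp), cartesian(*qm)
        jac.append([(xp[i] - xm[i]) / (2 * h) for i in range(3)])
    return [[sum(jac[a][i] * jac[b][i] for i in range(3)) for b in range(3)]
            for a in range(3)]

r, theta, phi = 2.0, 0.7, 1.3          # arbitrary sample point
h_ab = spatial_metric((r, theta, phi))
expected = [[1.0, 0.0, 0.0],
            [0.0, r ** 2, 0.0],
            [0.0, 0.0, (r * math.sin(theta)) ** 2]]
```

The off-diagonal entries of `h_ab` vanish to within finite-difference error, which is the statement that the spherical polar coordinate basis is orthogonal (though not orthonormal).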

6.2 Tensors in curved spacetime
Tensors work in curved spacetime a lot like they do in flat spacetime. The most important
physical difference is that under a change of reference frame represented by
$$ \Lambda^{\mu'}{}_{\nu} \equiv \frac{\partial x^{\mu'}}{\partial x^\nu} \tag{6.19} $$
the new coordinates are related to the old ones by coordinate-dependent factors, rather than
simple constants like cos θ or sinh ζ. Our central physics strategy will be to remain focused
on the transformation properties of our tensors of interest. That is the essence of what a
tensor does: it transforms in very specific, well-defined ways when the coordinate system
changes.
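Earlier, we introduced bases $e_\mu$ for rank (1,0) vectors and $e^\nu$ for rank (0,1) vectors.
Accordingly, a general rank (1,0) vector v in curved spacetime can be written in components
as
$$ v = v^\mu e_\mu , \tag{6.20} $$
and a covariant vector ω in curved spacetime can be written in components
$$ \omega = \omega_\mu e^\mu . \tag{6.21} $$
Under coordinate transformations, their components transform as
$$ v^{\mu'} = \Lambda^{\mu'}{}_{\nu}\, v^\nu , \tag{6.22} $$
and
$$ \omega_{\mu'} = \Lambda_{\mu'}{}^{\nu}\, \omega_\nu . \tag{6.23} $$
The transformation matrices
$$ \Lambda^{\mu'}{}_{\nu} = \frac{\partial x^{\mu'}}{\partial x^\nu} \quad \text{and} \quad \Lambda_{\mu'}{}^{\nu} = \frac{\partial x^\nu}{\partial x^{\mu'}} \tag{6.24} $$
now generically depend on spacetime position, and they satisfy
$$ \Lambda_{\mu'}{}^{\sigma} \Lambda^{\mu'}{}_{\nu} = \delta^\sigma_\nu , \qquad \Lambda_{\mu'}{}^{\sigma} \Lambda^{\nu'}{}_{\sigma} = \delta^{\nu'}_{\mu'} . \tag{6.25} $$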
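To make eqs. (6.22) and (6.25) concrete, here is a small Python sketch for the change between Cartesian and plane polar coordinates on the flat plane, $x = \rho\cos\phi$, $y = \rho\sin\phi$. The variable names, the sample point and the sample vector components are illustrative assumptions. The two Jacobian matrices are written out analytically, and the sketch checks that they are inverses of each other and that transforming a vector there and back returns the original components.

```python
import math

rho, phi = 2.0, 0.6                      # arbitrary sample point
c, s = math.cos(phi), math.sin(phi)

# d(x, y)/d(rho, phi): rows label the Cartesian coordinate, columns the polar one.
to_cartesian = [[c, -rho * s],
                [s,  rho * c]]
# d(rho, phi)/d(x, y), evaluated at the same point.
to_polar = [[c, s],
            [-s / rho, c / rho]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

identity_1 = matmul(to_polar, to_cartesian)   # should be the 2x2 identity, cf. (6.25)
identity_2 = matmul(to_cartesian, to_polar)

v_polar = [0.3, -0.1]                         # hypothetical components (v^rho, v^phi)
v_cart = [sum(to_cartesian[i][j] * v_polar[j] for j in range(2)) for i in range(2)]
v_back = [sum(to_polar[i][j] * v_cart[j] for j in range(2)) for i in range(2)]
```

Both matrices depend on the point (ρ, φ) at which they are evaluated, which is exactly the position dependence emphasized above.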
Contravariant vectors on a pseudoRiemannian manifold representing curved spacetime
live in the tangent space, which is a vector space. Covariant vectors live in the cotangent
space. The collection of all (co)tangent spaces over M is known mathematically as the
(co)tangent bundle. One of the key properties of a covariant vector ω is that we can
naturally take its inner product with a contravariant vector v without using the metric, and
it yields a scalar:
$$ \omega(v) = \omega_\nu e^\nu \cdot v^\mu e_\mu = \omega_\nu v^\mu \delta^\nu_\mu = \omega_\mu v^\mu . \tag{6.26} $$
This enables us to recognize that another way to think about a covariant vector is that it
is a machine that takes a contravariant vector and produces a scalar. Or in mathematical
words, it is a linear map from the tangent space into the real numbers R, obeying
$$ (a\omega_1 + b\omega_2)(v) = a\,\omega_1(v) + b\,\omega_2(v) , \tag{6.27} $$
$$ \omega(a v_1 + b v_2) = a\,\omega(v_1) + b\,\omega(v_2) . \tag{6.28} $$

Contravariant vectors obey entirely analogous rules: they act as linear maps from the cotangent space into R.
A rank (m, n) tensor in curved spacetime is defined by direct analogy as a multilinear
map from a collection of m covariant vectors and n contravariant vectors to R. Its components
in a coordinate basis can be extracted from T by slotting in the right number of
covariant and contravariant basis vectors,

$$ T^{\mu_1 \ldots \mu_m}{}_{\nu_1 \ldots \nu_n} = T(e^{\mu_1}, \ldots, e^{\mu_m}, e_{\nu_1}, \ldots, e_{\nu_n}) . \tag{6.29} $$

Alternatively, it can be written in terms of basis tensors as

$$ T = T^{\mu_1 \ldots \mu_m}{}_{\nu_1 \ldots \nu_n}\; e_{\mu_1} \otimes \ldots \otimes e_{\mu_m} \otimes e^{\nu_1} \otimes \ldots \otimes e^{\nu_n} , \tag{6.30} $$

where ⊗ denotes the outer product (not the inner product!). Note that in this picture
expanding a tensor in components in one basis versus a second basis results in different
components, as we would expect; the tensor stays the same.
The coordinate transformation law for the components of a rank (m, n) tensor in curved
spacetime is
$$ T^{\mu_1' \ldots \mu_m'}{}_{\nu_1' \ldots \nu_n'} = \frac{\partial x^{\mu_1'}}{\partial x^{\lambda_1}} \cdots \frac{\partial x^{\mu_m'}}{\partial x^{\lambda_m}}\, \frac{\partial x^{\sigma_1}}{\partial x^{\nu_1'}} \cdots \frac{\partial x^{\sigma_n}}{\partial x^{\nu_n'}}\; T^{\lambda_1 \ldots \lambda_m}{}_{\sigma_1 \ldots \sigma_n} . \tag{6.31} $$
Notice that now the Jacobians involved typically depend on spacetime position.

6.3 Rules for tensor index gymnastics


There are very specific rules for manipulating tensors. We already met one of them: the
Einstein summation convention. In curved spacetime it works exactly the same way as in
flat spacetime: repeated indices are summed over. But let us also make explicit some other
specific tensor manipulation rules.
First and foremost among them is the fact that when you write a tensor equation, indices
on the LHS and RHS must be exactly matched. For example, $p^\mu = m u^\mu$ is a sensible tensor
equation (it has one upstairs index on both sides) while the erroneous $p_\mu = m u^\mu$ is not (on
the LHS the µ index is downstairs while on the RHS it is upstairs).
Second, vertical moves of tensor indices – up or down – can only be made by lowering or
raising them with the rank (0,2) metric tensor or its rank (2,0) inverse. Said less pedantically,
we raise or lower indices using the metric. For example,

$$ T_\mu{}^{\nu}{}_{\lambda\sigma} = g_{\mu\rho}\, T^{\rho\nu}{}_{\lambda\sigma} , \tag{6.32} $$

and similarly for other raised/lowered components: you use as many factors of the
metric/inverse metric as needed (with appropriate contractions) to lower/raise all the requisite
indices.
Third, we must always preserve the horizontal ordering of the indices when
calculating, for both upstairs and downstairs indices. For example, for a general rank (2,2)
tensor,
$$ T^{\mu\nu}{}_{\lambda\sigma} \neq T^{\nu\mu}{}_{\lambda\sigma} . \tag{6.33} $$
(The RHS has the µ and ν indices switched compared to the LHS.) Other horizontal switches
of indices are equally verboten, unless you know that the tensor has appropriate symmetry
properties. The only standard exception to the rule that horizontal index ordering matters
is the Kronecker $\delta^\alpha_\beta$ tensor, which is symmetrical by definition.

Let us now make a few remarks about symmetries among tensors. Tensors can have
symmetries on their indices, which reduce the number of independent components, but this
is not generic. For example, under interchange of its indices a two-index tensor might be
symmetric
$$ S_{\mu\nu} = +S_{\nu\mu} , \tag{6.34} $$
or antisymmetric
$$ A_{\mu\nu} = -A_{\nu\mu} . \tag{6.35} $$
For rank two tensors only, an arbitrary tensor T can actually be written as the sum of a
symmetric tensor S and an antisymmetric tensor A. In components,

$$ T_{\mu\nu} = S_{\mu\nu} + A_{\mu\nu} , \tag{6.36} $$

where
$$ S_{\mu\nu} = \frac{1}{2!} (T_{\mu\nu} + T_{\nu\mu}) , \qquad A_{\mu\nu} = \frac{1}{2!} (T_{\mu\nu} - T_{\nu\mu}) . \tag{6.37} $$
This works because the total number of independent components of a 2-index tensor is
$D \times D = D^2$, while a symmetric 2-index tensor has D(D+1)/2 components and an antisymmetric
2-index tensor has D(D − 1)/2, so that $D(D+1)/2 + D(D-1)/2 = D^2$. For larger rank,
such a split cannot be done, because totally symmetric and totally antisymmetric tensors do
not have enough independent components between them to cover the total number.
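The counting argument is easy to check by brute force. The sketch below, with made-up integer components for a generic two-index tensor in D = 4, performs the split of eqs. (6.36)-(6.37) and counts independent components. The rank-3 counts use the standard binomial formulas for totally symmetric and totally antisymmetric tensors; all names here are illustrative assumptions.

```python
from math import comb

D = 4
# Hypothetical components T[mu][nu] of a generic (non-symmetric) two-index tensor.
T = [[(3 * mu + 1) * (nu + 2) + mu * mu for nu in range(D)] for mu in range(D)]

S = [[(T[m][n] + T[n][m]) / 2 for n in range(D)] for m in range(D)]   # eq. (6.37)
A = [[(T[m][n] - T[n][m]) / 2 for n in range(D)] for m in range(D)]

# Independent components: index pairs with m <= n (symmetric) or m < n (antisymmetric).
n_sym = sum(1 for m in range(D) for n in range(D) if m <= n)
n_anti = sum(1 for m in range(D) for n in range(D) if m < n)

# For three indices the same counting falls short of D**3.
rank3_sym = comb(D + 2, 3)    # totally symmetric
rank3_anti = comb(D, 3)       # totally antisymmetric
```

For D = 4 the rank-2 counts are 10 + 6 = 16, while at rank 3 the totally symmetric and totally antisymmetric parts supply only 20 + 4 = 24 of the 64 components.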
Any tensor can be symmetrized or antisymmetrized on any number k of upper or lower
indices. For symmetrization on k indices, denoted by round parentheses around those indices,
we have
$$ T_{(\mu_1 \ldots \mu_k)} \equiv \frac{1}{k!} \left( T_{\mu_1 \mu_2 \ldots \mu_k} + \text{sum over other permutations of } \{\mu_1 \ldots \mu_k\} \right) , \tag{6.38} $$
while for antisymmetrization on k indices, denoted by square brackets around those indices,
we have
$$ T_{[\mu_1 \ldots \mu_k]} \equiv \frac{1}{k!} \left( T_{\mu_1 \mu_2 \ldots \mu_k} + \text{alternating sum over other permutations of } \{\mu_1 \ldots \mu_k\} \right) , \tag{6.39} $$
where the alternating sum counts even/odd permutations with a +/− sign.
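Eqs. (6.38) and (6.39) translate almost word for word into code. The following Python sketch stores the components of a k-index tensor in a dictionary keyed by index tuples; the helper names and the sample tensor are hypothetical choices made for illustration only.

```python
from itertools import permutations, product
from math import factorial

def parity(perm):
    """Return +1 for an even permutation of (0, ..., k-1) and -1 for an odd one."""
    perm, sign = list(perm), 1
    for i in range(len(perm)):
        while perm[i] != i:            # sort by swaps; each swap flips the sign
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]
            sign = -sign
    return sign

def symmetrize(T, k, D, antisym=False):
    """Apply eq. (6.38), or eq. (6.39) if antisym=True, to all k indices of T."""
    out = {}
    for idx in product(range(D), repeat=k):
        total = 0.0
        for perm in permutations(range(k)):
            weight = parity(perm) if antisym else 1
            total += weight * T[tuple(idx[p] for p in perm)]
        out[idx] = total / factorial(k)
    return out

D, k = 3, 3
# Hypothetical components with no particular symmetry.
T = {idx: float((idx[0] + 1) * (idx[1] + 1) ** 2 * (idx[2] + 1) ** 3)
     for idx in product(range(D), repeat=k)}
T_sym = symmetrize(T, k, D)
T_anti = symmetrize(T, k, D, antisym=True)
```

Swapping any two index values leaves `T_sym` unchanged and flips the sign of `T_anti`, so any component of `T_anti` with a repeated index value vanishes.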
Suppose you know that a tensor is symmetric with all indices down. How do you work
out the symmetry of its counterpart with some or all of its indices up? By using known
symmetry properties of the downstairs components and using the metric tensor to raise
indices. Remember that the metric tensor itself is symmetric under interchange of its two
indices, and so is its inverse. Note also that the full contraction of the inverse metric with the
metric is
$$ g^{\mu\nu} g_{\mu\nu} = g^\mu{}_\mu = \delta^\mu_\mu = 1 + \ldots + 1 = D . \tag{6.40} $$
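As a concrete illustration of raising and lowering with the metric, here is a short Python sketch that uses the flat metric in spherical polar coordinates, eq. (6.15), at one sample point. The point and the vector components are made up for illustration. It lowers an index, raises it again, and evaluates the contraction in eq. (6.40).

```python
import math

r, theta = 2.0, 0.7       # arbitrary sample point
# Metric (6.15) in coordinates (t, r, theta, phi); it is diagonal at every point.
g_lower = [[1.0, 0.0, 0.0, 0.0],
           [0.0, -1.0, 0.0, 0.0],
           [0.0, 0.0, -r ** 2, 0.0],
           [0.0, 0.0, 0.0, -(r * math.sin(theta)) ** 2]]
# A diagonal metric can be inverted entry by entry.
g_upper = [[(1.0 / g_lower[m][m] if m == n else 0.0) for n in range(4)]
           for m in range(4)]

def lower(v_up):
    return [sum(g_lower[m][n] * v_up[n] for n in range(4)) for m in range(4)]

def raise_index(w_down):
    return [sum(g_upper[m][n] * w_down[n] for n in range(4)) for m in range(4)]

v_up = [1.0, 0.5, -0.2, 0.3]            # hypothetical components v^mu
v_down = lower(v_up)                    # v_mu = g_{mu nu} v^nu
round_trip = raise_index(v_down)        # g^{mu nu} v_nu recovers v^mu
trace = sum(g_upper[m][n] * g_lower[m][n] for m in range(4) for n in range(4))
```

Here `trace` comes out equal to D = 4 regardless of the point chosen, as eq. (6.40) requires.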

7 R01Oct
7.1 Building a covariant derivative
Because coordinate change matrices generically depend on spacetime position, simple partial
derivatives of tensors are typically not themselves tensors. For example, the partial derivative
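of a covariant vector W, $\partial_\mu W_\nu$, changes under coordinate changes as

$$ \begin{aligned}
\frac{\partial}{\partial x^\mu} W_\nu \;\longrightarrow\; \frac{\partial}{\partial x^{\mu'}} W_{\nu'} &= \frac{\partial x^\mu}{\partial x^{\mu'}} \frac{\partial}{\partial x^\mu} \left( \frac{\partial x^\nu}{\partial x^{\nu'}} W_\nu \right) \\
&= \frac{\partial x^\mu}{\partial x^{\mu'}} \frac{\partial x^\nu}{\partial x^{\nu'}} \frac{\partial}{\partial x^\mu} W_\nu + W_\nu\, \frac{\partial x^\mu}{\partial x^{\mu'}} \frac{\partial}{\partial x^\mu} \left( \frac{\partial x^\nu}{\partial x^{\nu'}} \right) .
\end{aligned} \tag{7.1} $$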

Although the first term looks good for tensoriality, we see that the second term ruins the
fun for generic changes of coordinates.
At any particular point p, we can choose a reference frame (denoted here by hats) in
which the first derivatives of the metric vanish at that point, $\partial_{\hat\sigma} g_{\hat\mu\hat\nu}|_p = 0$. The
way to see this mathematically is to use (a) Taylor expansions around a particular point
and (b) the tensorial transformation property of the two-index metric tensor $g_{\mu\nu}$. However,
this cannot be made to work beyond first order in derivatives, because the coordinate
transformation does not have enough free components to cancel all the second derivatives
of the metric. Physically, this means that we will need extra structure on our spacetime
manifold in order to be able to define covariant derivatives that transform like tensors.
The structure we need is known as an affine connection. It will enable us to make
covariant versions of partial derivatives $\partial_\mu$, denoted $\nabla_\mu$, designed to transform like tensors.
For taking covariant derivatives of tensorial indices relevant to bosonic fields, we will use the
Levi-Civita connection or Christoffel symbols $\Gamma^\mu_{\nu\lambda}$. For taking covariant derivatives of
spinorial indices relevant to fermionic fields, a researcher would use a different beast known
as the spin connection $\omega_\mu{}^{ab}$ (see GR2 for details). We will work on manifolds without torsion,
and for this case knowing the metric tensor is sufficient to determine both connections.

7.2 How basis vectors change: the role of the affine connection
In curved spacetime, the partial derivative of a basis vector generically does not appear to
lie in the tangent space. But as HEL explain in §3, this can be easily fixed up by defining the
derivative in the manifold of the coordinate basis vectors by projecting into the tangent space
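at the point in question. Then we can expand out the expression for the partial derivative
$\partial_\lambda$ of a contravariant basis vector $e_\mu$ in terms of the basis $e_\nu$, with coefficients $\Gamma^\nu_{\mu\lambda}$:

$$ \partial_\lambda e_\mu = \Gamma^\nu_{\mu\lambda}\, e_\nu . \tag{7.2} $$

We can figure out the analogous equation for the covariant basis vectors $e^\mu$ by taking the
partial derivative of both sides of the equation $e^\mu \cdot e_\nu = \delta^\mu_\nu$, yielding

$$ \partial_\lambda e^\mu = -\Gamma^\mu_{\nu\lambda}\, e^\nu . \tag{7.3} $$

We can find the expressions for the coefficients $\Gamma^\nu_{\mu\lambda}$ by taking the partial derivative of
our metric tensor,
$$ \begin{aligned}
\partial_\lambda g_{\mu\nu} &= (\partial_\lambda e_\mu) \cdot e_\nu + e_\mu \cdot \partial_\lambda e_\nu \\
&= \Gamma^\sigma_{\mu\lambda}\, e_\sigma \cdot e_\nu + e_\mu \cdot \Gamma^\sigma_{\nu\lambda}\, e_\sigma .
\end{aligned} \tag{7.4} $$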
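Using this, we can now form the combination
$$ \partial_\lambda g_{\mu\nu} + \partial_\nu g_{\lambda\mu} - \partial_\mu g_{\nu\lambda} = 2 \Gamma^\sigma_{\lambda\nu}\, g_{\mu\sigma} , \tag{7.5} $$
where we used $e_\sigma \cdot e_\nu = g_{\sigma\nu}$ and the fact that the Christoffels are symmetric under
interchange of their lower indices. This fact stems from the assumption that our spacetime has
zero torsion (see footnote 6). Rearranging the above gives the full expression for the coefficients $\Gamma^\mu_{\nu\lambda}$ in
terms of first derivatives of the metric tensor,
$$ \Gamma^\mu_{\nu\lambda} = \frac{1}{2} g^{\mu\sigma} \left( \partial_\nu g_{\sigma\lambda} + \partial_\lambda g_{\nu\sigma} - \partial_\sigma g_{\nu\lambda} \right) . \tag{7.6} $$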

How about an example? Consider the 2D plane R2 with Cartesian coordinates {x, y}.
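The basis vectors $e_x$ and $e_y$ are maximally boring: they do not change with position. However,
if we transform to plane polar coordinates {ρ, φ} given by
$$ x = \rho \cos\phi , \quad \rho = \sqrt{x^2 + y^2} , \qquad y = \rho \sin\phi , \quad \phi = \arctan(y/x) , \tag{7.7} $$
then in plane polar coordinates the basis vectors $e_\rho$ and $e_\phi$ definitely do change with position,
which is the generic situation in GR. To see this, recall that
$$ d\mathbf{s} = e_x\, dx + e_y\, dy = e_\rho\, d\rho + e_\phi\, d\phi , \tag{7.8} $$
and use the above coordinate transformations to identify
$$ e_\rho = +\cos\phi\; e_x + \sin\phi\; e_y , \qquad e_\phi = \rho\, (-\sin\phi\; e_x + \cos\phi\; e_y) . \tag{7.9} $$
It follows quickly from the above that
$$ ds^2 = d\rho^2 + \rho^2 d\phi^2 . \tag{7.10} $$
We can inspect how the basis vectors change with position, to obtain
$$ \begin{aligned}
\frac{\partial e_\rho}{\partial \rho} &= 0 , \\
\frac{\partial e_\rho}{\partial \phi} &= -\sin\phi\; e_x + \cos\phi\; e_y = \frac{1}{\rho}\, e_\phi , \\
\frac{\partial e_\phi}{\partial \rho} &= -\sin\phi\; e_x + \cos\phi\; e_y = \frac{1}{\rho}\, e_\phi , \\
\frac{\partial e_\phi}{\partial \phi} &= -\rho \cos\phi\; e_x - \rho \sin\phi\; e_y = -\rho\, e_\rho .
\end{aligned} \tag{7.11} $$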
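Footnote 6: Torsion is a rank (1,2) tensor, and it falls outside the scope of this course.

From eq. (7.2), we have that $\partial_\lambda e_\mu = \Gamma^\nu_{\mu\lambda}\, e_\nu$, so we can read off the Christoffels,

$$ \begin{aligned}
\Gamma^\rho_{\rho\rho} &= 0 , & \Gamma^\phi_{\rho\rho} &= 0 , \\
\Gamma^\rho_{\rho\phi} &= 0 , & \Gamma^\phi_{\rho\phi} &= +\frac{1}{\rho} , \\
\Gamma^\rho_{\phi\rho} &= 0 , & \Gamma^\phi_{\phi\rho} &= +\frac{1}{\rho} , \\
\Gamma^\rho_{\phi\phi} &= -\rho , & \Gamma^\phi_{\phi\phi} &= 0 .
\end{aligned} \tag{7.12} $$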

Alternatively, we could have obtained these expressions for the Christoffels by taking deriva-
tives of the metric tensor as in eq.(7.6). But I think also showing the explicit effect on the
basis vectors as in eq.(7.11) helps us understand the physics better. I recommend drawing
yourself some pictures to illustrate explicitly how the plane polar coordinate basis vectors
change with position according to the above equations.
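As a cross-check on eq. (7.12), the metric route of eq. (7.6) can be evaluated numerically. The Python sketch below uses central finite differences for the metric derivatives; the function names, step size and sample point are illustrative assumptions rather than anything prescribed by the course.

```python
RHO, PHI = 0, 1   # index labels for the plane polar coordinates

def metric(q):
    rho, _phi = q
    return [[1.0, 0.0], [0.0, rho ** 2]]          # ds^2 = d rho^2 + rho^2 d phi^2

def inverse_metric(q):
    rho, _phi = q
    return [[1.0, 0.0], [0.0, 1.0 / rho ** 2]]

def d_metric(q, s, h=1e-5):
    """Central-difference estimate of the partial derivative d g_{mn} / d x^s."""
    qp, qm = list(q), list(q)
    qp[s] += h
    qm[s] -= h
    gp, gm = metric(qp), metric(qm)
    return [[(gp[m][n] - gm[m][n]) / (2 * h) for n in range(2)] for m in range(2)]

def christoffel(q):
    """Gamma[mu][nu][lam] = Gamma^mu_{nu lam}, evaluated from eq. (7.6)."""
    dg = [d_metric(q, s) for s in range(2)]       # dg[s][m][n] = d_s g_{mn}
    ginv = inverse_metric(q)
    return [[[0.5 * sum(ginv[mu][s] * (dg[nu][s][lam] + dg[lam][nu][s] - dg[s][nu][lam])
                        for s in range(2))
              for lam in range(2)]
             for nu in range(2)]
            for mu in range(2)]

Gamma = christoffel((2.0, 0.3))   # sample point rho = 2, phi = 0.3
```

At ρ = 2 the nonzero entries are `Gamma[RHO][PHI][PHI]` ≈ −2 and `Gamma[PHI][RHO][PHI]` = `Gamma[PHI][PHI][RHO]` ≈ 1/2, in agreement with eq. (7.12).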

Previously, we noticed that taking the partial derivative of a tensor does not give another
tensor, generically. The problem was that the coordinate transformation generally depends
on spacetime position. Let us delineate the properties that a covariant derivative ∇ should
have. It should be linear,
∇(T + S) = ∇T + ∇S , (7.13)
where S, T are arbitrary tensors, and it should obey the Leibniz rule,

∇(T ⊗ S) = (∇T ) ⊗ S + T ⊗ (∇S) . (7.14)

It should also commute with contractions, including those made with the metric (raising and
lowering of indices), which is tantamount to assuming that

$$ \nabla_\sigma g_{\mu\nu} = 0 , \tag{7.15} $$

a very reasonable assumption. The covariant derivative should reduce to the partial derivative
when operating upon scalars, because those tensors have no legs. The combination
of the first two properties above implies that ∇ can be written as the sum of the partial
derivative ∂ and a linear transformation, which you can think of as a ‘correction’ to keep the
derivative tensorial. The coefficients of this correction term are known as the connection
coefficients or the Christoffel symbols $\Gamma^\mu_{\alpha\beta}$ (pronounced criss-toff-ill).
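To see how this works, let us consider the derivative of a contravariant vector v,

$$ \begin{aligned}
\partial_\nu v &= (\partial_\nu v^\mu)\, e_\mu + v^\mu (\partial_\nu e_\mu) \\
&= (\partial_\nu v^\mu)\, e_\mu + v^\mu (\Gamma^\lambda_{\mu\nu}\, e_\lambda) \\
&= (\partial_\nu v^\mu)\, e_\mu + v^\lambda (\Gamma^\mu_{\lambda\nu}\, e_\mu) \\
&= (\partial_\nu v^\mu + v^\lambda \Gamma^\mu_{\lambda\nu})\, e_\mu ,
\end{aligned} \tag{7.16} $$

where in the third line we relabelled dummy indices. The part in brackets is known as the
covariant derivative,
$$ \nabla_\nu v^\mu \equiv \partial_\nu v^\mu + \Gamma^\mu_{\lambda\nu}\, v^\lambda . \tag{7.17} $$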
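By exactly similar logic, we can find the covariant derivative of a covariant vector,

$$ \nabla_\nu \omega_\mu = \partial_\nu \omega_\mu - \Gamma^\lambda_{\mu\nu}\, \omega_\lambda . \tag{7.18} $$

If we want to take the covariant derivative of a rank (m, n) tensor, then we just act on each
of its legs in turn with the connection,

$$ \begin{aligned}
\nabla_\sigma T^{\mu_1 \ldots \mu_m}{}_{\nu_1 \ldots \nu_n} = {} & \partial_\sigma T^{\mu_1 \ldots \mu_m}{}_{\nu_1 \ldots \nu_n} \\
& + \Gamma^{\mu_1}_{\sigma\lambda}\, T^{\lambda \mu_2 \ldots \mu_m}{}_{\nu_1 \ldots \nu_n} + \Gamma^{\mu_2}_{\sigma\lambda}\, T^{\mu_1 \lambda \mu_3 \ldots \mu_m}{}_{\nu_1 \ldots \nu_n} + \ldots \\
& - \Gamma^{\lambda}_{\sigma\nu_1}\, T^{\mu_1 \ldots \mu_m}{}_{\lambda \nu_2 \ldots \nu_n} - \Gamma^{\lambda}_{\sigma\nu_2}\, T^{\mu_1 \ldots \mu_m}{}_{\nu_1 \lambda \nu_3 \ldots \nu_n} - \ldots
\end{aligned} \tag{7.19} $$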

How about a quick example? We can use the Christoffels to teach us how to take the
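covariant Laplacian in 2D plane polar coordinates. If we have a scalar field Ψ(ρ, φ), then

$$ \begin{aligned}
\nabla_\mu \nabla^\mu \Psi &= \nabla_\mu \partial^\mu \Psi \\
&= \partial_\mu \partial^\mu \Psi + \Gamma^\mu_{\mu\nu}\, \partial^\nu \Psi \\
&= (\partial_\rho \partial^\rho + \partial_\phi \partial^\phi) \Psi + \Gamma^\rho_{\rho\rho}\, \partial^\rho \Psi + \Gamma^\rho_{\rho\phi}\, \partial^\phi \Psi + \Gamma^\phi_{\phi\rho}\, \partial^\rho \Psi + \Gamma^\phi_{\phi\phi}\, \partial^\phi \Psi \\
&= \frac{\partial^2 \Psi}{\partial \rho^2} + \frac{1}{\rho} \frac{\partial \Psi}{\partial \rho} + \frac{1}{\rho^2} \frac{\partial^2 \Psi}{\partial \phi^2} .
\end{aligned} \tag{7.20} $$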
This should look familiar from vector calculus class: we just derived it from first principles.
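A quick numerical sanity check of eq. (7.20) is to compare it against the Cartesian Laplacian for a test function whose answer we know. The sketch below uses the made-up field Ψ = x²y, for which the flat Laplacian is 2y; the field, the sample point and the step size are illustrative assumptions.

```python
import math

def psi_cartesian(x, y):
    return x * x * y                  # hypothetical test field; Laplacian is 2*y

def psi_polar(rho, phi):
    return psi_cartesian(rho * math.cos(phi), rho * math.sin(phi))

def polar_laplacian(f, rho, phi, h=1e-4):
    """Right-hand side of eq. (7.20), with derivatives from central differences."""
    d_rho = (f(rho + h, phi) - f(rho - h, phi)) / (2 * h)
    d2_rho = (f(rho + h, phi) - 2 * f(rho, phi) + f(rho - h, phi)) / h ** 2
    d2_phi = (f(rho, phi + h) - 2 * f(rho, phi) + f(rho, phi - h)) / h ** 2
    return d2_rho + d_rho / rho + d2_phi / rho ** 2

rho, phi = 1.5, 0.4                   # arbitrary sample point
lhs = polar_laplacian(psi_polar, rho, phi)
rhs = 2 * rho * math.sin(phi)         # 2*y evaluated at the same point
```

The two numbers agree to within the finite-difference error, which is what the covariance of $\nabla_\mu \nabla^\mu \Psi$ guarantees.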

One of the most important things to remember is that the connection is not a tensor.
It has components labelled with Greek indices, but that does not make it a tensor in and of
itself. Indeed, the connection is designed specifically to correct the non-tensorial property
of the partial derivative in order to create a new tensor from an old one. Its transformation
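law under a coordinate transformation is
$$ \Gamma^{\nu'}_{\mu'\lambda'} = \frac{\partial x^\mu}{\partial x^{\mu'}} \frac{\partial x^\lambda}{\partial x^{\lambda'}} \frac{\partial x^{\nu'}}{\partial x^\nu}\, \Gamma^\nu_{\mu\lambda} - \frac{\partial x^\mu}{\partial x^{\mu'}} \frac{\partial x^\lambda}{\partial x^{\lambda'}} \frac{\partial^2 x^{\nu'}}{\partial x^\mu \partial x^\lambda} . \tag{7.21} $$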

From this, you can see that the difference between two connections is a tensor, because the
second term (which is independent of the Γs) drops out of their transformation law.
Our connection is metric compatible, meaning that the covariant derivative
constructed from it obeys
$$ \nabla_\sigma g_{\mu\nu} = 0 , \tag{7.22} $$
for all values of σ, µ, ν. There are two other useful equations that follow from this one,

$$ \nabla_\sigma g^{\mu\nu} = 0 , \tag{7.23} $$
$$ \nabla_\lambda \epsilon_{\mu_0 \mu_1 \ldots \mu_d} = 0 , \tag{7.24} $$

where the completely antisymmetric tensor density $\epsilon_{\mu_0 \mu_1 \ldots \mu_d}$ is used in integrating covariantly
over spacetime. We will not have occasion to use it here, but it will play an important role
in deriving Einstein’s equations from an action principle in the GR2 course. The fact that
our metric-compatible covariant derivative commutes with raising and lowering of indices is
very fortunate: if there were torsion, we would have to be scrupulously careful about our
index placements.
In discussing covariant derivatives of tensors, it is worth noting here that some people
use a different convention than ours. They abbreviate by defining commas after indices to
represent partial derivatives, while semicolons represent covariant derivatives. We will stick
with keeping ∂ and ∇ explicit, because in pages full of long GR equations it is all too easy
to lose track of punctuation marks.

7.3 The covariant derivative and parallel transport


Introducing a covariant derivative (as compared to a plain derivative) was a really great idea
for doing physics. It allows us to write tensor equations wherever we go. All we need to do
is to be sure to write ∇s rather than ∂s. But one question worth asking is this: what rate
of change does ∇ actually measure?
A way to answer that question and get a better handle on ∇ is to ask when ∇ of some
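tensor is zero. For this we actually have to specify the path along which we hope to compare
tensors, because comparing tensors at two different points is, a priori, meaningless in GR.
After all, the spacetime metric varies from point to point.
Consider a parametrized curve $x^\mu(\lambda)$, and a vector field $v(\lambda) = v^\mu(\lambda)\, e_\mu(\lambda)$. The derivative
of the vector field v with respect to the curve parameter λ is
$$ \begin{aligned}
\frac{dv}{d\lambda} &= \frac{dv^\mu}{d\lambda}\, e_\mu + v^\mu \frac{\partial e_\mu}{\partial x^\sigma} \frac{dx^\sigma}{d\lambda} \\
&= \left( \frac{dv^\mu}{d\lambda} + \Gamma^\mu_{\nu\sigma}\, v^\nu \frac{dx^\sigma}{d\lambda} \right) e_\mu \\
&\equiv \frac{Dv^\mu}{D\lambda}\, e_\mu .
\end{aligned} \tag{7.25} $$

The quantity
$$ \frac{D}{D\lambda} \equiv \frac{dx^\sigma}{d\lambda} \nabla_\sigma \tag{7.26} $$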
is known as the directional covariant derivative. This animal is only defined along the
path xµ (λ), and when acting on a tensor it produces another tensor.
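We say that a tensor T is parallel transported along the path if
$$ \frac{D}{D\lambda} T^{\mu_1 \ldots \mu_k}{}_{\nu_1 \ldots \nu_\ell} = \frac{dx^\sigma}{d\lambda} \nabla_\sigma T^{\mu_1 \ldots \mu_k}{}_{\nu_1 \ldots \nu_\ell} = 0 . \tag{7.27} $$
This is known as the equation of parallel transport, and it is a proper tensor equation. Now,
since we have a metric compatible connection, $\nabla_\sigma g_{\mu\nu} = 0$, parallel transport preserves
the inner product of two vectors. For example, for two vectors $V^\mu$ and $W^\mu$,
$$ \frac{D}{D\lambda} \left( g_{\mu\nu} V^\mu W^\nu \right) = \left( \frac{D}{D\lambda} g_{\mu\nu} \right) V^\mu W^\nu + g_{\mu\nu} \left( \frac{D}{D\lambda} V^\mu \right) W^\nu + g_{\mu\nu} V^\mu \left( \frac{D}{D\lambda} W^\nu \right) \tag{7.28} $$
$$ = 0 + 0 + 0 = 0 , \tag{7.29} $$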
if the vectors are both parallel transported. You can visualize what parallel transport does
by imagining that, along a geodesic path $x^\mu(\lambda)$, it keeps the same angle between the vector
and the tangent to the path.

To see what parallel transporting can imply, consider the two-sphere. Imagine that we
start at the North Pole with a vector at an angle. We keep the angle of our vector constant
as we move along a line of longitude, (say) the Greenwich meridian, down to the Equator.
Then imagine that we turn East and continue parallel transporting our vector some way
around the equator. Then we turn North and parallel transport our vector up a second line
of longitude, back to the North Pole. If you have visualized this correctly in your mind, you
will see that our vector, regardless of the direction it was initially pointing, has undergone a
finite rotation. This is because the sphere is (positively) curved.
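If you would rather let a computer do the visualizing, here is a minimal Python sketch of this thought experiment on the unit sphere embedded in $R^3$. It swaps the Christoffel-symbol form of eq. (7.27) for an embedding trick, which is an assumption of this sketch rather than the method used in these notes: at each small step along the path the vector is projected onto the new tangent plane and rescaled to its original length, which approximates Levi-Civita parallel transport on an embedded surface. The function names, step count and longitude difference are likewise illustrative.

```python
import math

def point(theta, phi):
    """Point on the unit sphere with colatitude theta and longitude phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def transport(v, path):
    """Approximate parallel transport of a tangent vector v along a list of points."""
    for n in path:
        norm_before = math.sqrt(sum(c * c for c in v))
        dot = sum(vc * nc for vc, nc in zip(v, n))
        v = [vc - dot * nc for vc, nc in zip(v, n)]       # project onto tangent plane at n
        norm_after = math.sqrt(sum(c * c for c in v))
        v = [c * norm_before / norm_after for c in v]     # keep the length fixed
    return v

N = 2000
delta = 1.0   # longitude difference (radians) between the two meridians
leg1 = [point(0.5 * math.pi * k / N, 0.0) for k in range(1, N + 1)]          # pole to equator
leg2 = [point(0.5 * math.pi, delta * k / N) for k in range(1, N + 1)]        # along the equator
leg3 = [point(0.5 * math.pi * (1 - k / N), delta) for k in range(1, N + 1)]  # back to the pole

v0 = [1.0, 0.0, 0.0]                      # tangent vector at the North Pole
v1 = transport(v0, leg1 + leg2 + leg3)
angle = math.atan2(v1[1], v1[0])          # rotation within the tangent plane at the pole
```

The vector returns rotated by an angle equal to `delta`, which for the unit sphere is also the area enclosed by the triangular path.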

8 M05Oct
8.1 The geodesic equations for test particle motion in curved spacetime
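A geodesic is a path $x^\mu(\lambda)$ that parallel transports its own tangent vector. It follows that
the equation satisfied by the geodesic is

$$ \frac{D}{D\lambda} \left( \frac{dx^\mu}{d\lambda} \right) = \frac{d^2 x^\mu}{d\lambda^2} + \Gamma^\mu_{\nu\sigma} \frac{dx^\nu}{d\lambda} \frac{dx^\sigma}{d\lambda} = 0 . \tag{8.1} $$

We can also think about parallel transport in the following way. When we take an
ordinary partial derivative, we do it by taking

$$ \lim_{\Delta x \to 0} \frac{f(x^\mu + \Delta x^\mu) - f(x^\mu)}{\Delta x^\mu} = \frac{\partial f}{\partial x^\mu} . \tag{8.2} $$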
In curved spacetime, the result of this is not a tensor. What we do is instead take the
covariant derivative, as follows.

1. We take $x^\mu(\lambda + d\lambda)$ as our “x plus an infinitesimal change” and find T there.
2. We parallel transport T back to the original point at $x^\mu(\lambda)$, along the path $x^\mu(\lambda)$.
3. We compare the parallel-transported-back T to the original T at $x^\mu(\lambda)$, and we ‘divide
by’ dλ.

The result is DT /Dλ, the covariant rate of change of the tensor with respect to λ at the
spacetime point xµ (λ).

Let us now see another way that the geodesic equation can be derived, using a variational
approach. Consider a massive point particle in proper time gauge. The relativistic einbein
action is, up to a constant that is physically irrelevant at the classical level,

$$ S = -m \int d\tau\; g_{\mu\nu}(x^\lambda)\, \frac{dx^\mu(\tau)}{d\tau} \frac{dx^\nu(\tau)}{d\tau} . \tag{8.3} $$
dτ dτ
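What happens when we vary
$$ x^\mu \to x^\mu + \delta x^\mu \; ? \tag{8.4} $$
Under such a variation,
$$ g_{\mu\nu} \to g_{\mu\nu} + (\partial_\sigma g_{\mu\nu})\, \delta x^\sigma . \tag{8.5} $$
Varying the action, we have

$$ -\frac{1}{m}\, \delta S = \int d\tau\; \delta\! \left( g_{\mu\nu} \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} \right) \tag{8.6} $$
$$ = \int d\tau \left[ (\partial_\sigma g_{\mu\nu})\, \delta x^\sigma \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} + \left( g_{\mu\nu} \frac{d\, \delta x^\mu}{d\tau} \frac{dx^\nu}{d\tau} + (\mu \leftrightarrow \nu) \right) \right] \tag{8.7} $$
$$ = \int d\tau \left[ (\partial_\sigma g_{\mu\nu}) \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau}\, \delta x^\sigma - \left( (\partial_\sigma g_{\mu\nu}) \frac{dx^\sigma}{d\tau} \frac{dx^\nu}{d\tau}\, \delta x^\mu + g_{\mu\nu} \frac{d^2 x^\mu}{d\tau^2}\, \delta x^\nu + (\mu \leftrightarrow \nu) \right) \right] , \tag{8.8} $$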

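where in the last step we integrated by parts (see footnote 7). We also used the fact that
$$ \frac{\delta\, dx^\mu}{d\tau} = \frac{d\, \delta x^\mu}{d\tau} . \tag{8.9} $$
Collecting all the terms, and using eq. (7.6) to recognize the Christoffel symbols, we have

$$ -\frac{1}{m}\, \delta S = -2 \int d\tau \left[ g_{\mu\sigma} \frac{d^2 x^\sigma}{d\tau^2} + g_{\mu\sigma} \Gamma^\sigma_{\nu\rho} \frac{dx^\nu}{d\tau} \frac{dx^\rho}{d\tau} \right] \delta x^\mu . \tag{8.10} $$

Demanding that this be zero for arbitrary variations $\delta x^\mu$, we obtain the geodesic equation,

$$ \frac{d^2 x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\rho} \frac{dx^\nu}{d\tau} \frac{dx^\rho}{d\tau} = 0 . \tag{8.11} $$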
An affine parameter λ is defined to be λ = aτ + b for constants a, b. In other words, a λ
is an affine parameter if it is linearly related to τ (for a massive particle). For a massless
particle, we can still define an affine parameter. In fact, our geodesic equation requires just
such an affine parametrization, regardless of the particle mass.
For either massive or massless particles, the geodesic equation can be written in very
compact form in terms of the momentum vector,

$$ p^\nu \nabla_\nu p^\mu = 0 . \tag{8.12} $$

For point particles, we relate the momentum $p^\mu$ to the four-velocity $u^\mu$ via

$$ p^\mu = m u^\mu = m \frac{dx^\mu}{d\tau} , \quad m^2 > 0 , \qquad\qquad p^\mu = u^\mu , \quad m^2 = 0 . \tag{8.13} $$

The second formula follows Carroll’s convention for defining the “four-velocity” for massless
particles. Since $p^\mu p_\mu = 0$ for them, we have a free choice for the proportionality factor.
Footnote 7: We assume that the manifold has sufficiently trivial topology for the integration by parts to work.
There is a central physics point to understand about this extremization. Is it a mini-
mization or a maximization? In fact, the geodesic maximizes proper time. Why? Well,
if we were to lower the proper time interval $(\Delta\tau)^2$ along a changed path, we would get closer
to $(\Delta\tau)^2 = 0$, which is a null path. To go lower, to $(\Delta\tau)^2 < 0$, we would have to use an
illegal spacelike path. So minimizing $(\Delta\tau)^2$ does not make sense, and in fact the proper
time is maximized via the variational principle. The extremum is a maximum
precisely because every timelike path is infinitesimally close to paths with lower proper time. Carroll
has a morally similar argument: he shows that for any timelike path, we can approximate it
by a (jaggedy looking) piecewise continuous bunch of null paths, all of the pieces of which
have zero invariant interval. Since the geodesic is infinitesimally nearby to null paths with
zero proper time, it must maximize proper time.
The physical consequence of this mathematical fact that geodesics maximize proper time
is that accelerated observers – those who are not in freefall – measure less proper
time than those who are in freefall. This is why the space twin in the Twin Paradox always
comes back younger, not older, than the homebody twin. The more you accelerate around
with your rockets, the younger you are compared to a homebody who stays on a geodesic.
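Here is a minimal Python sketch of the Twin Paradox bookkeeping in flat 1+1 dimensional Minkowski spacetime with c = 1. The total coordinate time T, the cruising speed v and the function names are made-up illustrative values. It integrates $d\tau = \sqrt{dt^2 - dx^2}$ along the straight (geodesic) worldline of the homebody and along the kinked worldline of the traveller.

```python
import math

def proper_time(x_of_t, t0, t1, n=20000):
    """Sum d(tau) = sqrt(dt^2 - dx^2) along a timelike path x(t), with c = 1."""
    dt = (t1 - t0) / n
    tau = 0.0
    for k in range(n):
        dx = x_of_t(t0 + (k + 1) * dt) - x_of_t(t0 + k * dt)
        tau += math.sqrt(dt * dt - dx * dx)
    return tau

T, v = 10.0, 0.8      # hypothetical trip duration (homebody's clock) and cruising speed

def homebody(t):
    return 0.0        # stays at x = 0: a geodesic of flat spacetime

def traveller(t):
    # Out at speed v for the first half of the trip, back at speed v for the second half.
    return v * t if t < T / 2 else v * (T - t)

tau_home = proper_time(homebody, 0.0, T)
tau_trip = proper_time(traveller, 0.0, T)
```

With these numbers the homebody ages by 10 units and the traveller by $10\sqrt{1 - 0.8^2} = 6$ units, so the accelerated twin is the younger one.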
If all geodesics on a spacetime manifold go as far as they please, then the manifold is
said to be geodesically complete. But if some geodesic(s) bang into a singularity, or end
prematurely, then the manifold is geodesically incomplete. For spacetimes with matter, this
is the generic case, actually. Roger Penrose just won part of the 2020 Nobel Prize in Physics
for (co-)explaining this.

8.2 Example computation for affine connection and geodesic equations
Let us now work a relatively simple example of calculating Christoffel components for a
spacetime with dependence on only one coordinate, $x^0 = t$. We will take the spatially flat (see footnote 8)
Friedmann-Robertson-Walker ansatz in D = d + 1 spacetime dimensions,

$$ ds^2 = dt^2 - a^2(t)\, |d\vec{x}|^2 , \tag{8.14} $$

where a(t) is the scale factor. Since

$$ ds^2 = g_{\mu\nu}\, dx^\mu dx^\nu , \tag{8.15} $$

we have

$$ g_{00} = +1 , \qquad g_{ij} = -[a(t)]^2\, \delta_{ij} . \tag{8.16} $$

Because the metric is diagonal, we can invert it by eye, to obtain

$$ g^{00} = +1 , \qquad g^{ij} = -[a(t)]^{-2}\, \delta^{ij} . \tag{8.17} $$
Footnote 8: For the more general case with nontrivial spatial metric, see Carroll §8.3.
Finding the Christoffels is relatively straightforward, as many of them are zero. Notice that
the only coordinate dependence in the metric is on the time coordinate.
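First, let us try for $\Gamma^0_{00}$,
$$ \begin{aligned}
\Gamma^0_{00} &= \frac{1}{2} g^{0\sigma} (\partial_0 g_{0\sigma} + \partial_0 g_{0\sigma} - \partial_\sigma g_{00}) \\
&= \frac{1}{2} g^{00}\, \partial_0 g_{00} = 0 ,
\end{aligned} \tag{8.18} $$
because the metric is diagonal and because $g_{00}$ is a constant.
Next up is
$$ \begin{aligned}
\Gamma^0_{0i} &= \frac{1}{2} g^{0\sigma} (\partial_0 g_{i\sigma} + \partial_i g_{0\sigma} - \partial_\sigma g_{0i}) \\
&= \frac{1}{2} g^{00} (\partial_0 g_{i0} + \partial_i g_{00} - \partial_0 g_{0i}) \\
&= \frac{1}{2} g^{00}\, \partial_i g_{00} = 0 ,
\end{aligned} \tag{8.19} $$
because the metric is diagonal and because $g_{00}$ is a constant.
A more interesting case is $\Gamma^0_{ij}$, which is nonzero.
$$ \begin{aligned}
\Gamma^0_{ij} &= \frac{1}{2} g^{0\sigma} (\partial_i g_{j\sigma} + \partial_j g_{i\sigma} - \partial_\sigma g_{ij}) \\
&= \frac{1}{2} g^{00} (\partial_i g_{j0} + \partial_j g_{i0} - \partial_0 g_{ij}) \\
&= -\frac{1}{2} g^{00}\, \partial_0 g_{ij} \\
&= a \dot{a}\, \delta_{ij} ,
\end{aligned} \tag{8.20} $$
where an overdot denotes d/dt. Along the way, we again used the fact that the metric is diagonal and $g_{00}$
is a constant.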
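Now consider $\Gamma^i_{00}$.
$$ \Gamma^i_{00} = \frac{1}{2} g^{i\sigma} (\partial_0 g_{0\sigma} + \partial_0 g_{0\sigma} - \partial_\sigma g_{00}) = 0 , \tag{8.21} $$
because the metric is diagonal and because $g_{00}$ is a constant.
Next, let us look at the only other nonzero Christoffel symbol $\Gamma^i_{0j}$. We have
$$ \begin{aligned}
\Gamma^i_{0j} &= \frac{1}{2} g^{i\sigma} (\partial_0 g_{j\sigma} + \partial_j g_{0\sigma} - \partial_\sigma g_{0j}) \\
&= \frac{1}{2} g^{ik} (\partial_0 g_{jk} + \partial_j g_{0k} - \partial_k g_{0j}) \\
&= \frac{1}{2} g^{ik}\, \partial_0 g_{jk} \\
&= \frac{1}{2} \{ -[a(t)]^{-2} \delta^{ik} \}\, \partial_0 \{ -[a(t)]^2 \delta_{jk} \} \\
&= \frac{\dot{a}}{a}\, \delta^i_j .
\end{aligned} \tag{8.22} $$
Finally, what about the all-spatial Christoffels $\Gamma^i_{jk}$? We have
$$ \Gamma^i_{jk} = \frac{1}{2} g^{i\ell} (\partial_j g_{k\ell} + \partial_k g_{j\ell} - \partial_\ell g_{jk}) = 0 , \tag{8.23} $$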

because none of the spatial components of the metric depends on spatial position.
In summary, we have:
$$ \Gamma^0_{ij} = a \dot{a}\, \delta_{ij} , \tag{8.24} $$
$$ \Gamma^i_{j0} = \Gamma^i_{0j} = \frac{\dot{a}}{a}\, \delta^i_j , \tag{8.25} $$
with all other components zero. Notice how it is the “velocity” of the scale factor $\dot{a}(t)$ that
appears here. The quantity
$$ \frac{\dot{a}}{a} = H(t) \tag{8.26} $$
is known as the Hubble parameter, and it is constant in time if the scale factor is exponential. (Whether or not the
scale factor can behave in this fashion is determined by the energy-momentum of matter in
the spacetime, as we will discover later on in the course.)

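Now let us look at the geodesic equations in this simple spacetime, doing a time-space
split like for the Christoffels above. In general, we have
$$ \frac{d^2 x^\mu}{d\lambda^2} + \Gamma^\mu_{\nu\sigma} \frac{dx^\nu}{d\lambda} \frac{dx^\sigma}{d\lambda} = 0 . \tag{8.27} $$
The 0th component of this equation reads
$$ \begin{aligned}
0 &= \frac{d^2 x^0}{d\lambda^2} + \Gamma^0_{\nu\sigma} \frac{dx^\nu}{d\lambda} \frac{dx^\sigma}{d\lambda} \\
&= \frac{d^2 x^0}{d\lambda^2} + \Gamma^0_{ij} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda} \\
&= \frac{d^2 x^0}{d\lambda^2} + a \dot{a}\, \delta_{ij} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda} ,
\end{aligned} \tag{8.28} $$
because all the other terms contributing to the sums over ν and σ involve Christoffel components
that are zero.
The ith component reads
$$ \begin{aligned}
0 &= \frac{d^2 x^i}{d\lambda^2} + \Gamma^i_{\nu\sigma} \frac{dx^\nu}{d\lambda} \frac{dx^\sigma}{d\lambda} \\
&= \frac{d^2 x^i}{d\lambda^2} + \Gamma^i_{0j} \frac{dx^0}{d\lambda} \frac{dx^j}{d\lambda} + \Gamma^i_{j0} \frac{dx^j}{d\lambda} \frac{dx^0}{d\lambda} \\
&= \frac{d^2 x^i}{d\lambda^2} + \frac{2 \dot{a}}{a} \frac{dx^0}{d\lambda} \frac{dx^i}{d\lambda} .
\end{aligned} \tag{8.29} $$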
The first thing to notice about these geodesic equations we have derived is that they are
coupled and nonlinear. The equation for $dx^0/d\lambda$ depends on what $dx^i/d\lambda$ are doing, and vice
versa. This is why solving for motions of massless particles (photons) or massive particles
(like electrons) in the background of a general curved spacetime is generically much more
complicated than doing Newton’s Laws for non-relativistic physics.
The second thing to notice about our super-simple spacetime is that the spatial geodesic
equations actually have a first integral (!). To see this, let us try taking the λ derivative of

$$p_i = a^2(t)\,\delta_{ij}\,\frac{dx^j}{d\lambda}. \tag{8.30}$$

We have, by the Leibniz rule and the chain rule,
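\begin{align}
\frac{d}{d\lambda}\,p_i &= \delta_{ij}\,\frac{d}{d\lambda}\left(a^2(x^0)\,\frac{dx^j}{d\lambda}\right) \notag\\
&= \delta_{ij}\left(2a\dot a\,\frac{dx^0}{d\lambda}\frac{dx^j}{d\lambda}\right) + \delta_{ij}\,a^2\,\frac{d^2 x^j}{d\lambda^2} \notag\\
&= \delta_{ij}\,a^2\left(\frac{d^2 x^j}{d\lambda^2} + \frac{2\dot a}{a}\,\frac{dx^0}{d\lambda}\frac{dx^j}{d\lambda}\right) \notag\\
&= 0. \tag{8.31}
\end{align}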

Therefore, pi is a conserved quantity along the geodesic. As we will see a bit later in the
course, this conservation law arises because our spacetime metric has a symmetry: none
of the components of the metric tensor depends on spatial coordinates. This is your first
example of how Noether’s Theorem works in General Relativity.
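As a numerical sanity check of this conservation law, the sketch below integrates the geodesic equations (8.28) and (8.29) for motion along a single spatial axis and monitors $p = a^2\,dx/d\lambda$. The scale factor $a(t) = t^{2/3}$, the initial data, the step size, and all function names are assumptions made purely for illustration; the integrator is a plain fourth-order Runge-Kutta scheme, not anything specific to these notes.

```python
# Sketch: integrate (8.28)-(8.29) along one spatial axis and watch p = a^2 dx/dlambda.

def a(t):
    """Assumed scale factor a(t) = t^(2/3); any smooth positive a(t) would do."""
    return t ** (2.0 / 3.0)

def adot(t):
    """Time derivative of the assumed scale factor."""
    return (2.0 / 3.0) * t ** (-1.0 / 3.0)

def rhs(state):
    """Geodesic equations written as a first-order system."""
    t, x, ut, ux = state                       # ut = dt/dlambda, ux = dx/dlambda
    dut = -a(t) * adot(t) * ux * ux            # from (8.28)
    dux = -2.0 * (adot(t) / a(t)) * ut * ux    # from (8.29)
    return (ut, ux, dut, dux)

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(k, c):
        return tuple(s + c * ki for s, ki in zip(state, k))
    k1 = rhs(state)
    k2 = rhs(shifted(k1, h / 2))
    k3 = rhs(shifted(k2, h / 2))
    k4 = rhs(shifted(k3, h))
    return tuple(s + h / 6 * (p + 2 * q + 2 * r + w)
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4))

def momentum(state):
    """p = a^2 dx/dlambda, the candidate conserved quantity (8.30)."""
    t, _, _, ux = state
    return a(t) ** 2 * ux

def norm(state):
    """g_{mu nu} u^mu u^nu = (dt/dlambda)^2 - a^2 (dx/dlambda)^2."""
    t, _, ut, ux = state
    return ut ** 2 - a(t) ** 2 * ux ** 2

state = (1.0, 0.0, 1.2, 0.5)     # t, x, dt/dlambda, dx/dlambda at lambda = 0
p0, n0 = momentum(state), norm(state)
max_drift = 0.0
for _ in range(2000):            # integrate up to lambda = 2
    state = rk4_step(state, 1e-3)
    max_drift = max(max_drift, abs(momentum(state) - p0))

print("max drift in p      :", max_drift)
print("drift in u.u (norm) :", abs(norm(state) - n0))
```

Both drifts should sit at the level of the integrator's truncation error. The norm $g_{\mu\nu}u^\mu u^\nu$ is included as a second check because it is also constant along any affinely parametrized geodesic.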

9 R08Oct
9.1 Spacetime curvature
Einstein’s General Theory of Relativity upgraded the way we think about gravitational
physics. Instead of imposing Newton’s three laws of motion and imposing his force law
for universal gravitation, we assume that the starting point is the fabric of spacetime. We
worked quite hard already to define tensors on arbitrary spacetimes, by focusing intensely on
their transformation properties under changes of reference frame, i.e., changes of coordinates.
We also figured out in our last lecture how to define a covariant derivative, with the help of
the Levi-Civita connection. We went to all that trouble of wrangling the Christoffel symbols
because this enabled us to do two exciting things: (a) to define a derivative ∇µ that is
a tensor, even in curved spacetime, and (b) to derive the geodesic equation, which is the
equation obeyed by any relativistic particle undergoing freefall in the spacetime in question.
Along the way, we learned that timelike geodesics extremize (in fact, locally maximize) the proper time.
As we alluded to earlier, the Riemann curvature tensor is the mathematical quantity that
Albert Einstein discovered was the key to gravitational physics expressed in the language
of curved spacetime. He realized that the Riemann tensor, which contains at most two
derivatives of the metric tensor, could even be used to build an action principle for general
relativity. We will derive the Einstein action and the Einstein equations of motion for
the gravitational field in the GR2 course. For now, all we need to keep in mind is that the
Riemann tensor encodes a wide variety of gravitational phenomena in its tensor components,
including the physics of tidal forces and the motion of particles in spacetime. In particular,
we will soon show how, in the Newtonian limit of weak gravity and slow speeds, we recover
familiar expressions from Newtonian physics – without ever having to use the concept of a
force! First, we need to develop a bit more formalism.

9.2 The Riemann tensor


Consider an infinitesimal parallelogram, with vectors Aµ and B ν forming the sides.

In hand-waving terms, the Riemann curvature is what tells us how much a vector V µ gets
rotated under parallel transport around the parallelogram. The infinitesimal change in V ,
δV , is a (1,0) tensor, and so are A, B, and V . Roughly speaking, we expect δV to be
proportional to V and to the size of the parallelogram. To connect δV to A, B, V we need
a (1,3) tensor with which to contract indices naturally, and the role of this is played by the
Riemann curvature. The resulting equation from our handwaving is therefore
$$\delta V^{\mu} \sim R^{\mu}{}_{\nu\alpha\beta}\, V^{\nu} A^{\alpha} B^{\beta}. \tag{9.1}$$
While this sketch of Riemann’s origin gives us the gist, we now need to be more precise and
make a proper definition.

Recall that earlier we found parallel transport to be the right way of thinking about how
to compare vectors at different places in spacetime. Combined with our little parallelogram
hand-wave just now, this can be used to motivate a mathematical definition of the Riemann
tensor as arising from taking commutators of covariant derivatives. On a (1,0) vector V ,
Riemann is defined via9
$$[\nabla_\mu, \nabla_\nu]\, V^{\rho} = +R^{\rho}{}_{\lambda\mu\nu}\, V^{\lambda}, \tag{9.2}$$
for a torsion-free connection. This formula teaches us how to find the components of the
Riemann tensor in terms of Christoffel connection coefficients. Let us write out the pieces
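individually to see how it works out. First, note that for any (1,1) tensor $T_\nu{}^\rho$,
$$\nabla_\mu T_\nu{}^{\rho} = \partial_\mu T_\nu{}^{\rho} + \Gamma^{\rho}{}_{\mu\sigma}\, T_\nu{}^{\sigma} - \Gamma^{\lambda}{}_{\mu\nu}\, T_\lambda{}^{\rho}. \tag{9.3}$$
So with $T_\nu{}^\rho = \nabla_\nu V^\rho$, we have
\begin{align}
\nabla_\mu(\nabla_\nu V^\rho) &= \partial_\mu(\nabla_\nu V^\rho) + \Gamma^{\rho}{}_{\mu\lambda}(\nabla_\nu V^\lambda) - \Gamma^{\lambda}{}_{\mu\nu}(\nabla_\lambda V^\rho) \tag{9.4}\\
&= \partial_\mu(\partial_\nu V^\rho + \Gamma^{\rho}{}_{\nu\lambda} V^\lambda) + \Gamma^{\rho}{}_{\mu\lambda}(\partial_\nu V^\lambda + \Gamma^{\lambda}{}_{\nu\sigma} V^\sigma) - \Gamma^{\lambda}{}_{\mu\nu}(\partial_\lambda V^\rho + \Gamma^{\rho}{}_{\lambda\sigma} V^\sigma) \tag{9.5}\\
&= \partial_\mu\partial_\nu V^\rho + \Gamma^{\rho}{}_{\nu\lambda}\,\partial_\mu V^\lambda + \Gamma^{\rho}{}_{\mu\lambda}\,\partial_\nu V^\lambda - \Gamma^{\lambda}{}_{\mu\nu}\,\partial_\lambda V^\rho \notag\\
&\quad + (\partial_\mu \Gamma^{\rho}{}_{\nu\sigma}) V^\sigma + \Gamma^{\rho}{}_{\mu\lambda}\Gamma^{\lambda}{}_{\nu\sigma} V^\sigma - \Gamma^{\lambda}{}_{\mu\nu}\Gamma^{\rho}{}_{\lambda\sigma} V^\sigma. \tag{9.6}
\end{align}
Then, since $\partial_\mu\partial_\nu V^\rho$ is symmetric in $\mu$ and $\nu$ and drops out of the commutator,
\begin{align}
\nabla_\mu(\nabla_\nu V^\rho) - (\mu \leftrightarrow \nu) &= \Big\{ \left(\partial_\mu \Gamma^{\rho}{}_{\nu\sigma} + \Gamma^{\rho}{}_{\mu\lambda}\Gamma^{\lambda}{}_{\nu\sigma}\right) V^\sigma - \Gamma^{\lambda}{}_{\mu\nu}\Gamma^{\rho}{}_{\lambda\sigma} V^\sigma \notag\\
&\qquad + \left[\Gamma^{\rho}{}_{\nu\lambda}\,\partial_\mu V^\lambda + \Gamma^{\rho}{}_{\mu\lambda}\,\partial_\nu V^\lambda - \Gamma^{\lambda}{}_{\mu\nu}\,\partial_\lambda V^\rho\right] \Big\} - (\mu \leftrightarrow \nu) \tag{9.7}\\
&= \left\{ \partial_\mu \Gamma^{\rho}{}_{\nu\sigma} + \Gamma^{\rho}{}_{\mu\lambda}\Gamma^{\lambda}{}_{\nu\sigma} - (\mu \leftrightarrow \nu) \right\} V^\sigma. \tag{9.8}
\end{align}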

Now we can put the pieces together to see the general formula for taking the commutator of
covariant derivatives acting on a vector. Using

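$$[\nabla_\mu, \nabla_\nu]\, V^{\rho} = +R^{\rho}{}_{\sigma\mu\nu}\, V^{\sigma} \tag{9.9}$$

gives us the formula for the Riemann tensor components,

$$R^{\rho}{}_{\sigma\mu\nu} = \partial_\mu \Gamma^{\rho}{}_{\sigma\nu} - \partial_\nu \Gamma^{\rho}{}_{\sigma\mu} + \Gamma^{\lambda}{}_{\sigma\nu}\Gamma^{\rho}{}_{\lambda\mu} - \Gamma^{\lambda}{}_{\sigma\mu}\Gamma^{\rho}{}_{\lambda\nu}. \tag{9.10}$$

For a covariant derivative acting on a (0,1) tensor, a covariant vector, one finds the same
Riemann tensor coefficients and

$$[\nabla_\mu, \nabla_\nu]\, \omega_{\rho} = -R^{\lambda}{}_{\rho\mu\nu}\, \omega_{\lambda}. \tag{9.11}$$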

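If you slog through the details, you can compute the commutator of covariant derivatives
on a rank $(k, \ell)$ tensor V as well. This is not much worse than the calculation we have just
done, and we suppress the details here. The result is
\begin{align}
[\nabla_\rho, \nabla_\sigma]\, V^{\mu_1 \ldots \mu_k}{}_{\nu_1 \ldots \nu_\ell} ={}& R^{\mu_1}{}_{\lambda\rho\sigma}\, V^{\lambda \mu_2 \ldots \mu_k}{}_{\nu_1 \ldots \nu_\ell} + R^{\mu_2}{}_{\lambda\rho\sigma}\, V^{\mu_1 \lambda \mu_3 \ldots \mu_k}{}_{\nu_1 \ldots \nu_\ell} + \ldots \notag\\
& - R^{\lambda}{}_{\nu_1\rho\sigma}\, V^{\mu_1 \ldots \mu_k}{}_{\lambda \nu_2 \ldots \nu_\ell} - R^{\lambda}{}_{\nu_2\rho\sigma}\, V^{\mu_1 \ldots \mu_k}{}_{\nu_1 \lambda \nu_3 \ldots \nu_\ell} - \ldots. \tag{9.12}
\end{align}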
9. We are using the sign conventions of HEL.

Riemann arises naturally as a rank (1,3) tensor. By doing a partial contraction of two
of its indices, we can define the Ricci tensor Rµν , which naturally arises as a rank (0,2)
tensor,
$$R_{\mu\nu} = R^{\alpha}{}_{\mu\nu\alpha}. \tag{9.13}$$
Notice that we are contracting the first and fourth indices here to make the Ricci tensor.
This is a choice of convention, and we have chosen to use the same convention as the HEL
textbook.
By contracting the Ricci tensor with the metric, we can form the Ricci scalar R, which
has rank (0,0),
$$R = g^{\mu\nu} R_{\mu\nu}. \tag{9.14}$$
Other kinds of contractions involving Riemann are also possible, such as “Riemann squared”
and “Ricci squared”. For our purposes in this course, we only need to know about the Ricci
tensor and the Ricci scalar – because both of them will appear on the left hand side of
Einstein’s equations.
Note that if you change the signature of our Lorentzian spacetime from mostly minus
to mostly plus, the Christoffels $\Gamma^{\lambda}{}_{\mu\nu}$ would stay the same, the Riemann tensor $R^{\rho}{}_{\lambda\mu\nu}$ would
also stay the same, and so would the Ricci tensor $R_{\mu\nu}$, but the Ricci scalar $R$ would develop
a relative minus sign.

9.3 Example computations for Riemann


Suppose that we study 2D Euclidean space in plane polar coordinates $\{x^1, x^2\} = \{\rho, \varphi\}$,

$$ds^2 = d\rho^2 + \rho^2\, d\varphi^2. \tag{9.15}$$

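We previously found nonzero Christoffels for this spacetime in these coordinates when we
introduced basis vectors,
$$\Gamma^{2}{}_{12} = \frac{1}{\rho}, \qquad \Gamma^{1}{}_{22} = -\rho. \tag{9.16}$$
From this, we can find the Riemann tensor using our formula from above,

$$R^{\rho}{}_{\sigma\mu\nu} = \partial_\mu \Gamma^{\rho}{}_{\nu\sigma} - \partial_\nu \Gamma^{\rho}{}_{\mu\sigma} + \Gamma^{\rho}{}_{\mu\lambda}\Gamma^{\lambda}{}_{\nu\sigma} - \Gamma^{\rho}{}_{\nu\lambda}\Gamma^{\lambda}{}_{\mu\sigma}. \tag{9.17}$$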

Substituting in gives

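\begin{align}
R^{1}{}_{212} &= \partial_1 \Gamma^{1}{}_{22} - \partial_2 \Gamma^{1}{}_{21} + \Gamma^{\lambda}{}_{22}\Gamma^{1}{}_{\lambda 1} - \Gamma^{\lambda}{}_{21}\Gamma^{1}{}_{\lambda 2} \notag\\
&= \partial_1 \Gamma^{1}{}_{22} - 0 + 0 - \Gamma^{2}{}_{21}\Gamma^{1}{}_{22} \notag\\
&= \partial_\rho(-\rho) - \frac{1}{\rho}(-\rho) \notag\\
&= -1 - (-1) = 0. \tag{9.18}
\end{align}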

All the other components of Riemann that might have been nonzero are actually zero, also.
This result reflects the fact that this 2D spacetime is flat.

Now suppose we try a spacetime which we already suspect is curved: the two-sphere
with coordinates $\{x^1, x^2\} = \{\theta, \phi\}$,
$$ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2. \tag{9.19}$$
Computing the Christoffels is straightforward, using either the formula in terms of derivatives
of the metric tensor or the formula for how basis vectors change. The only nonzero
components turn out to be
$$\Gamma^{1}{}_{22} = -\sin\theta\cos\theta, \qquad \Gamma^{2}{}_{12} = \frac{\cos\theta}{\sin\theta}. \tag{9.20}$$
As we will discover next week, there is only one independent component of Riemann in 2D,
and it is $R^{1}{}_{212}$. To compute it, we substitute in again,
\begin{align}
R^{1}{}_{212} &= \partial_1 \Gamma^{1}{}_{22} - \Gamma^{2}{}_{21}\Gamma^{1}{}_{22} \notag\\
&= \partial_\theta(-\sin\theta\cos\theta) - \frac{\cos\theta}{\sin\theta}(-\sin\theta\cos\theta) \notag\\
&= -(\cos^2\theta - \sin^2\theta) + \cos^2\theta \notag\\
&= +\sin^2\theta. \tag{9.21}
\end{align}
If we lift the second index using $g^{\mu\nu}$, we obtain
$$R^{12}{}_{12} = +1. \tag{9.22}$$
The answer is positive because the sphere is positively curved. All of the other nonzero
components of Riemann can be expressed in terms of this one, for instance $R^{21}{}_{21} = +R^{12}{}_{12}$.
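Both hand computations can be cross-checked numerically. The sketch below evaluates formula (9.17) with central finite differences for the derivative terms, taking the Christoffels (9.16) and (9.20) as input. The function names, the sample points, and the step size `h` are illustrative assumptions; indices 0 and 1 in the code stand for the coordinates $x^1$ and $x^2$.

```python
import math

def riemann(gamma, x, rho, sigma, mu, nu, h=1e-5):
    """R^rho_{sigma mu nu} at the point x, following formula (9.17).

    gamma(x)[a][b][c] must return the Christoffel symbol Gamma^a_{bc}.
    Derivatives of the Christoffels are taken by central differences.
    """
    n = len(x)

    def d(direction, a, b, c):
        xp, xm = list(x), list(x)
        xp[direction] += h
        xm[direction] -= h
        return (gamma(xp)[a][b][c] - gamma(xm)[a][b][c]) / (2 * h)

    g = gamma(x)
    value = d(mu, rho, nu, sigma) - d(nu, rho, mu, sigma)
    for lam in range(n):
        value += g[rho][mu][lam] * g[lam][nu][sigma]
        value -= g[rho][nu][lam] * g[lam][mu][sigma]
    return value

def gamma_polar(x):
    """Christoffels (9.16) for the flat plane in polar coordinates."""
    r, _ = x
    g = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    g[0][1][1] = -r                      # Gamma^1_{22}
    g[1][0][1] = g[1][1][0] = 1.0 / r    # Gamma^2_{12} = Gamma^2_{21}
    return g

def gamma_sphere(x):
    """Christoffels (9.20) for the unit two-sphere."""
    th, _ = x
    g = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    g[0][1][1] = -math.sin(th) * math.cos(th)                # Gamma^1_{22}
    g[1][0][1] = g[1][1][0] = math.cos(th) / math.sin(th)    # Gamma^2_{12}
    return g

theta = 0.9                                                  # arbitrary sample point
flat = riemann(gamma_polar, [1.7, 0.3], 0, 1, 0, 1)          # R^1_{212}, plane
curved = riemann(gamma_sphere, [theta, 0.3], 0, 1, 0, 1)     # R^1_{212}, sphere

print("plane :", flat)
print("sphere:", curved, "vs sin^2(theta) =", math.sin(theta) ** 2)
```

Up to finite-difference error, the plane should give zero and the sphere should reproduce $\sin^2\theta$, in line with (9.18) and (9.21).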
Finally, let us work a slightly more nontrivial example of calculating Riemann com-
ponents for a spacetime with dependence on only one coordinate. As with our geodesic
equation example at the end of the previous section, we take the spatially flat FRW ansatz
in D = d + 1 spacetime dimensions,
$$ds^2 = dt^2 - a^2(t)\,|d\vec{x}|^2, \tag{9.23}$$
where a(t) is the scale factor. Most of the components of Riemann for this simple spacetime
are actually zero. Let us sketch how to find the ones that are nonzero.
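We had for the Christoffels
$$\Gamma^{0}{}_{ij} = a\dot a\,\delta_{ij}, \qquad \Gamma^{i}{}_{j0} = \frac{\dot a}{a}\,\delta^{i}{}_{j}. \tag{9.24}$$
The first group of nonzero Riemann components have one time index up and one down, and
two spatial indices:
\begin{align}
R^{0}{}_{i0j} &= \partial_0 \Gamma^{0}{}_{ji} - \Gamma^{0}{}_{jk}\Gamma^{k}{}_{0i} \notag\\
&= \left(\dot a^2 + a\ddot a\right)\delta_{ij} - a\dot a\,\delta_{jk}\,\frac{\dot a}{a}\,\delta^{k}{}_{i} \notag\\
&= a\ddot a\,\delta_{ij}. \tag{9.25}
\end{align}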

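Then we have
\begin{align}
R^{i}{}_{00j} &= \partial_0 \Gamma^{i}{}_{j0} + \Gamma^{i}{}_{0k}\Gamma^{k}{}_{j0} \notag\\
&= \frac{a\ddot a - \dot a^2}{a^2}\,\delta^{i}{}_{j} + \frac{\dot a^2}{a^2}\,\delta^{i}{}_{j} \notag\\
&= \frac{\ddot a}{a}\,\delta^{i}{}_{j}. \tag{9.26}
\end{align}
The second group of nonzero Riemann components has all spatial indices,
\begin{align}
R^{i}{}_{jk\ell} &= \Gamma^{i}{}_{k0}\Gamma^{0}{}_{\ell j} - \Gamma^{i}{}_{\ell 0}\Gamma^{0}{}_{kj} \notag\\
&= \left(\frac{\dot a}{a}\,\delta^{i}{}_{k}\right)\left(a\dot a\,\delta_{\ell j}\right) - \left(\frac{\dot a}{a}\,\delta^{i}{}_{\ell}\right)\left(a\dot a\,\delta_{kj}\right) \notag\\
&= \dot a^2\left(\delta^{i}{}_{k}\,\delta_{j\ell} - \delta^{i}{}_{\ell}\,\delta_{jk}\right). \tag{9.27}
\end{align}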
Notice how we have discovered both “velocity squared” $\dot a^2$ terms, which arise via $\Gamma\Gamma$
parts in Riemann, and “acceleration” $\ddot a$ terms, which arise via $\partial^2 g$ parts in Riemann. It
is not until you compute the curvature that you see the appearance of the “acceleration”
pieces. Notice also how the “acceleration” of the scale factor showed up in the Riemann com-
ponents involving the time direction; the all-spatial Riemanns gave only “velocity squared”
contributions.
Since we now have the Riemann tensor, we can contract it to find Ricci. The nonzero
components are
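\begin{align}
R_{00} &= R^{i}{}_{00i} = +\delta^{i}{}_{i}\,\frac{\ddot a}{a} = +d\,\frac{\ddot a}{a}, \notag\\
R_{ij} &= R^{0}{}_{ij0} + R^{k}{}_{ijk} \notag\\
&= -a\ddot a\,\delta_{ij} - \dot a^2\left(\delta^{k}{}_{k}\,\delta_{ij} - \delta^{k}{}_{j}\,\delta_{ik}\right) \notag\\
&= -a\ddot a\,\delta_{ij} - \dot a^2 (d-1)\,\delta_{ij} \notag\\
&= -\delta_{ij}\left[a\ddot a + (d-1)\,\dot a^2\right], \tag{9.28}
\end{align}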
where d is the spatial dimension (d = 3 in our universe). Contracting the Ricci tensor with
the metric tensor gives the Ricci scalar,
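\begin{align}
R &= g^{00} R_{00} + g^{ij} R_{ij} \notag\\
&= +d\,\frac{\ddot a}{a} - d\left(-\frac{1}{a^2}\right)\left[a\ddot a + (d-1)\,\dot a^2\right] \notag\\
&= +2d\,\frac{\ddot a}{a} + d(d-1)\,\frac{\dot a^2}{a^2}. \tag{9.29}
\end{align}
Using D = d + 1, we can write this in terms of the spacetime dimension D,
$$R = +2(D-1)\,\frac{\ddot a}{a} + (D-1)(D-2)\,\frac{\dot a^2}{a^2}. \tag{9.30}$$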
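To get a feel for formula (9.29), the sketch below evaluates it in exact rational arithmetic for a few sample scale factors. The power laws $a \propto t^{1/2}$ and $a \propto t^{2/3}$ and the exponential $a \propto e^{Ht}$ are chosen here as illustrative assumptions (the notes have not yet derived which scale factors are allowed), and the function names are hypothetical.

```python
from fractions import Fraction

def ricci_scalar_frw(addot_over_a, adot_over_a, d=3):
    """Ricci scalar (9.29), given the ratios (d^2a/dt^2)/a and (da/dt)/a."""
    return 2 * d * addot_over_a + d * (d - 1) * adot_over_a ** 2

def ricci_scalar_frw_D(addot_over_a, adot_over_a, D=4):
    """The same quantity written with the spacetime dimension D, as in (9.30)."""
    return 2 * (D - 1) * addot_over_a + (D - 1) * (D - 2) * adot_over_a ** 2

def power_law_ratios(p, t):
    """For the assumed form a(t) = t^p: returns ((d^2a/dt^2)/a, (da/dt)/a)."""
    return p * (p - 1) / t ** 2, p / t

t = Fraction(1)   # evaluate at t = 1 in arbitrary units
H = Fraction(1)   # Hubble rate of the exponential example, arbitrary units

R_half = ricci_scalar_frw(*power_law_ratios(Fraction(1, 2), t))         # a ~ t^(1/2)
R_two_thirds = ricci_scalar_frw(*power_law_ratios(Fraction(2, 3), t))   # a ~ t^(2/3)
R_exponential = ricci_scalar_frw(H ** 2, H)                             # a ~ exp(H t)

print(R_half, R_two_thirds, R_exponential)   # prints: 0 4/3 12
```

In d = 3 the $\ddot a$ and $\dot a^2$ contributions cancel exactly for $a \propto t^{1/2}$, while the exponential case gives the constant value $12H^2$.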

The time evolution of this depends sensitively on the details of how the scale factor evolves.
We will need to develop the Einstein equation to see how scale factor evolution is tied to
the energy-momentum of the type of matter hanging out in the spacetime. Arbitrary scale
factors a(t) are not allowed; the Einstein equations will determine them in terms of the
energy density and the pressures.

10 R15Oct
10.1 Geodesic deviation
Geodesics are generally not straight lines in curved spacetime. Physically, they deviate from
one another, because of spacetime curvature. How can we make this intuition mathematically
precise? Consider a one-parameter family of geodesics γs (λ), where λ is the affine parameter
along the geodesic in question. The parameter s ∈ R tells you which geodesic you are
referring to. We can choose coordinates s and λ on the manifold as long as the geodesics do
not cross.

Then we have two naturally defined vector fields,


$$S^{\mu} = \frac{\partial x^{\mu}}{\partial s}, \qquad T^{\mu} = \frac{\partial x^{\mu}}{\partial \lambda}. \tag{10.1}$$
A useful mnemonic here is that S is for Separation while T is for Tangent.
We would now like to build the covariant analogue of the ‘relative velocity’ between
geodesics,
$$V^{\mu} = T^{\alpha}\nabla_{\alpha} S^{\mu}, \tag{10.2}$$
and the ‘relative acceleration’
$$A^{\mu} := T^{\alpha}\nabla_{\alpha} V^{\mu}. \tag{10.3}$$
Note that the acceleration of a path away from being a geodesic is different. That would be

$$T^{\alpha}\nabla_{\alpha} T^{\mu}. \tag{10.4}$$

Since our proposed definitions above are tensor equations, they are well-defined. Now,
$S^{\mu}$ and $T^{\mu}$ are basis vectors adapted to a coordinate system, with coordinates $s$ and $\lambda$. Therefore,

$$[S, T] = 0. \tag{10.5}$$

On our way towards building the relative acceleration vector, we will need an identity for
vector fields,

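\begin{align}
[X, Y]^{\mu} &= X^{\alpha}\partial_{\alpha} Y^{\mu} - Y^{\alpha}\partial_{\alpha} X^{\mu} \tag{10.6}\\
&= X^{\alpha}\nabla_{\alpha} Y^{\mu} - Y^{\alpha}\nabla_{\alpha} X^{\mu}. \tag{10.7}
\end{align}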

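This allows us to relate S-directional derivatives of T to T-directional derivatives of S,
$$S^{\alpha}\nabla_{\alpha} T^{\mu} = T^{\alpha}\nabla_{\alpha} S^{\mu}. \tag{10.8}$$
Now we can compute the relative acceleration vector.
\begin{align}
A^{\mu} &= T^{\alpha}\nabla_{\alpha}\left(T^{\sigma}\nabla_{\sigma} S^{\mu}\right) \tag{10.9}\\
&= T^{\alpha}\nabla_{\alpha}\left(S^{\sigma}\nabla_{\sigma} T^{\mu}\right) \notag\\
&= \left(T^{\alpha}\nabla_{\alpha} S^{\sigma}\right)\left(\nabla_{\sigma} T^{\mu}\right) + T^{\alpha} S^{\sigma}\left\{\left[\nabla_{\sigma}\nabla_{\alpha} T^{\mu}\right] + R^{\mu}{}_{\nu\alpha\sigma} T^{\nu}\right\} \notag\\
&= \left(S^{\alpha}\nabla_{\alpha} T^{\sigma}\right)\left(\nabla_{\sigma} T^{\mu}\right) + R^{\mu}{}_{\nu\alpha\sigma} T^{\nu} T^{\alpha} S^{\sigma} \notag\\
&\quad + \left[S^{\sigma}\nabla_{\sigma}\left(T^{\alpha}\nabla_{\alpha} T^{\mu}\right) - \left(S^{\sigma}\nabla_{\sigma} T^{\alpha}\right)\left(\nabla_{\alpha} T^{\mu}\right)\right] \notag\\
&= +R^{\mu}{}_{\nu\alpha\sigma} T^{\nu} T^{\alpha} S^{\sigma}, \tag{10.10}
\end{align}
where we used (i) [S, T] = 0, (ii) ∇ obeys the Leibniz rule and Riemann is defined in terms
of a commutator of covariant derivatives, (iii) the Leibniz rule and rearranging terms, (iv)
relabelling of dummy indices to cancel terms and T being the tangent vector of a geodesic.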
Summarizing, we have the geodesic deviation equation
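$$A^{\mu} = \frac{D^2 S^{\mu}}{D\lambda^2} = \left(\nabla_T \nabla_T S\right)^{\mu} = +R^{\mu}{}_{\nu\alpha\sigma}\, T^{\nu} T^{\alpha} S^{\sigma}. \tag{10.11}$$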

Here we see how the Riemann curvature tensor governs the deviation of geodesics in a very
precise way. The covariant acceleration deviation of this one-parameter family of geodesics
is given by the Riemann tensor contracted with the tangent vector T twice, on its second
and third indices, and contracted with the separation vector S once, on its fourth index.

10.2 Tidal forces and taking the Newtonian limit for Christoffels
Remember the tides? If you, like me, have spent any length of time near the ocean, then
you know that the water level rises and falls twice a day. But do you know why? Newton
first explained this in his Principia. Basically, oceanic water on the side nearest the Moon
bulges because it is closer to the Moon than the ocean on the far side and hence feels stronger
gravity; the bulge on the far side can be understood as arising from ‘centrifugal force’.
So we see two tides per day. (Note: distances in the figure are not to scale.)

How do tidal forces work in Newtonian and Einsteinian gravity? Well, you cannot detect
curvature using only one test particle, or only one geodesic. You need to use multiples to
see the physical effects of curvature of space or spacetime. So let us think about geodesic
deviation in the Newtonian limit, even before we recruit the heavy machinery of tensor
analysis in curved spacetime and the Riemann tensor. We will soon see how Riemann and
the Newtonian potential are connected by the Newtonian limit of weak gravity and
slow speeds.
In an inertial frame, the equation of motion of the first particle moving in a Newtonian
gravitational potential $\Phi(x^k)$ is
$$\frac{d^2 x^i}{dt^2} = -\delta^{ij}\,\partial_j \Phi(x^k). \tag{10.12}$$
Next, we define the vector $y^i$ to be the separation of the second particle from the first, which
is assumed to be small. We have that
$$\frac{d^2}{dt^2}\left(x^i + y^i\right) = -\delta^{ij}\,\partial_j \Phi(x^k + y^k). \tag{10.13}$$
Taylor expanding gives

$$\partial_j \Phi(x^k + y^k) = \partial_j \Phi(x^k) + \left(\partial_\ell \partial_j \Phi(x^k)\right) y^{\ell} + O(y^2), \tag{10.14}$$

so that the Newtonian trajectory deviation equation is

$$\frac{d^2 y^i}{dt^2} = -\delta^{ij}\left(\partial_j \partial_k \Phi\right) y^k. \tag{10.15}$$
The left hand side is known as the tidal acceleration, and it is described by the second mixed
partial derivatives of the Newtonian potential.
For simplicity, let us ignore the fact that the Earth is rotating on its own axis as well
as the rotation of the Earth around the Sun. Letting the Moon be at (x, y, z) = (0, 0, d), we
have for the Newtonian gravitational potential
$$\Phi_m(x, y, z) = -\frac{G_N M_m}{\left\{x^2 + y^2 + (z-d)^2\right\}^{1/2}}. \tag{10.16}$$
From this we can calculate the matrix of second derivatives that controls the tidal
acceleration, evaluated at the origin,
$$\left(\frac{\partial^2 \Phi}{\partial x^i\, \partial x^j}\right)_0 = +\frac{G_N M_m}{d^3}\,\mathrm{diag}(1, 1, -2). \tag{10.17}$$
Why the asymmetry between the z and x, y directions? Simple. The functional dependence
in the denominator is different.
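\begin{align}
\left.\frac{\partial^2 \Phi}{\partial x^2}\right|_0 &= -G_N M_m \left.\partial_x\left[2x \cdot \left(-\tfrac{1}{2}\right)\{\ldots\}^{-3/2}\right]\right|_0 \tag{10.18}\\
&= G_N M_m \left.\left[\{\ldots\}^{-3/2} + x \cdot 2x \cdot \left(-\tfrac{3}{2}\right)\{\ldots\}^{-5/2}\right]\right|_0 \tag{10.19}\\
&= \frac{G_N M_m}{d^3} + 0, \tag{10.20}
\end{align}
whereas
\begin{align}
\left.\frac{\partial^2 \Phi}{\partial z^2}\right|_0 &= -G_N M_m \left.\partial_z\left[2(z-d) \cdot \left(-\tfrac{1}{2}\right)\{\ldots\}^{-3/2}\right]\right|_0 \tag{10.21}\\
&= G_N M_m \left.\left[\{\ldots\}^{-3/2} + (z-d) \cdot 2(z-d) \cdot \left(-\tfrac{3}{2}\right)\{\ldots\}^{-5/2}\right]\right|_0 \tag{10.22}\\
&= +\frac{G_N M_m}{d^3} - 3\,\frac{G_N M_m\, d^2}{d^5} \tag{10.23}\\
&= -\frac{2 G_N M_m}{d^3}. \tag{10.24}
\end{align}
Here $\{\ldots\}$ stands for $x^2 + y^2 + (z-d)^2$, which equals $d^2$ at the origin.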
Another way to write the same set of equations is to use a unit vector $n^i = x^i/r$
pointing in the radial direction; then
$$a_{ij} = -\left(\frac{\partial^2 \Phi}{\partial x^i\, \partial x^j}\right)_0 = -\frac{G_N M_m}{r^3}\left(\delta_{ij} - 3 n_i n_j\right). \tag{10.25}$$
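The diag(1, 1, −2) pattern is easy to confirm numerically. The sketch below differentiates the point-mass potential (10.16) by central finite differences at the origin, in units where $G_N M_m = 1$ and $d = 1$. The unit choice, the step size, and the function names are assumptions made for illustration.

```python
def phi(x, y, z, gm=1.0, d=1.0):
    """Point-mass potential (10.16) with the source at (0, 0, d)."""
    return -gm / (x * x + y * y + (z - d) ** 2) ** 0.5

def hessian_at_origin(f, h=1e-3):
    """Matrix of second derivatives of f at the origin via central differences."""
    def at(i, si, j=None, sj=0.0):
        p = [0.0, 0.0, 0.0]
        p[i] += si
        if j is not None:
            p[j] += sj
        return f(*p)

    f0 = f(0.0, 0.0, 0.0)
    hess = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        # Diagonal entries: standard three-point second difference.
        hess[i][i] = (at(i, h) - 2.0 * f0 + at(i, -h)) / h ** 2
        for j in range(i + 1, 3):
            # Off-diagonal entries: four-point mixed difference.
            mixed = (at(i, h, j, h) - at(i, h, j, -h)
                     - at(i, -h, j, h) + at(i, -h, j, -h)) / (4.0 * h ** 2)
            hess[i][j] = hess[j][i] = mixed
    return hess

tidal = hessian_at_origin(phi)
for row in tidal:
    print(["%+.4f" % entry for entry in row])
```

Up to finite-difference error the output should be diag(+1, +1, −2) in units of $G_N M_m/d^3$, matching (10.17): the trace vanishes, and the sign flips along the line joining the two bodies.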

This (tensor) equation tells us that you get stretched in the radial direction and squeezed in the
transverse directions. Quite generally, you can think of gravity as a stretchy-squeezy force.
This originates in the fact that gravitational interactions in our universe are transmitted
by a spin-two boson known as the graviton. It has a polarization tensor rather than a
polarization vector. After symmetries under arbitrary changes of coordinates are taken into
account, there are two independent physical polarizations for the graviton in four spacetime
dimensions, like there are for the photon. But please do not mistake one for the other: the
photon only has spin one, and in dimensions other than D = 3+1 the numbers of independent
physical polarizations of photons and gravitons will not match. That they do in D = 3 + 1
is a numerical accident.
How big are tidal forces, in orders of magnitude? First, we need to figure out which
of the solar system bodies is relevant. If you do the calculation using the above formula
for tidal accelerations, you find that the Moon is actually the biggest contributor, because
although it is much lighter than the Sun (about 27,100,000 times) it is much closer (about
388 times), and it is the cube of the distance that counts. Plugging in the numbers, you will
find that the Sun’s tidal acceleration is only about 45% of the Moon’s. So we focus on the
Moon. We would like to compare the magnitude to the acceleration due to gravity. So, to
get the order of magnitude, we are computing the ratio of the tidal force on a piece of ocean
to the g-force,
$$\frac{G_N M_M\, r_E}{d^3} \cdot \frac{r_E^2}{G_N M_E} \sim \frac{M_M}{M_E}\left(\frac{r_E}{d}\right)^3 \sim 10^{-7}. \tag{10.26}$$
Tidal forces might seem like teeny weeny forces, but when you multiply by entire oceans,
you get physical effects that human beings can relate to.
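The two estimates above can be reproduced in a few lines. The masses and distances below are approximate reference values that I am assuming for illustration; they are not taken from these notes, and only their ratios matter.

```python
# Approximate reference values (assumed for illustration), SI units.
M_SUN = 1.989e30      # kg
M_MOON = 7.342e22     # kg
M_EARTH = 5.972e24    # kg
D_SUN = 1.496e11      # m, mean Earth-Sun distance
D_MOON = 3.844e8      # m, mean Earth-Moon distance
R_EARTH = 6.371e6     # m, mean Earth radius

# Tidal acceleration scales like M / d^3, so compare the Sun to the Moon.
sun_over_moon = (M_SUN / M_MOON) * (D_MOON / D_SUN) ** 3

# Ratio of the Moon's tidal acceleration across one Earth radius to
# surface gravity g = G M_E / r_E^2, as in (10.26); G cancels.
tidal_over_g = (M_MOON / M_EARTH) * (R_EARTH / D_MOON) ** 3

print("mass ratio Sun/Moon      : %.3e" % (M_SUN / M_MOON))
print("distance ratio Sun/Moon  : %.1f" % (D_SUN / D_MOON))
print("tidal ratio Sun/Moon     : %.2f" % sun_over_moon)
print("lunar tidal accel over g : %.1e" % tidal_over_g)
```

With these inputs the solar tide comes out at a bit under half the lunar one, and the lunar tidal acceleration at several times $10^{-8}$ of $g$, consistent with the order-of-magnitude estimate in (10.26).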
We can make a little table comparing what we have found in Newtonian gravity versus
Einsteinian General Relativity so far.

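| What | Newton | Einstein |
|---|---|---|
| gravity | $\Phi(x^i, t)$ | $g_{\alpha\beta}(x^\lambda)$ |
| test particle EOM | $\dfrac{d^2 x^i}{dt^2} = -\delta^{ij}\,\partial_j \Phi$ | $\dfrac{d^2 x^\mu}{d\lambda^2} = -\Gamma^{\mu}{}_{\nu\sigma}\,\dfrac{dx^\nu}{d\lambda}\dfrac{dx^\sigma}{d\lambda}$ |
| deviation | $\dfrac{d^2 y^i}{dt^2} = -\delta^{ij}\,\partial_j \partial_k \Phi\; y^k$ | $\dfrac{D^2 S^\mu}{D\lambda^2} = +R^{\mu}{}_{\nu\sigma\rho}\, T^\nu T^\sigma S^\rho$ |
| tidal forces | $\partial_i \partial_j \Phi$ | $R^{\rho}{}_{\sigma\mu\nu} = +\partial_\mu \Gamma^{\rho}{}_{\nu\sigma} - \partial_\nu \Gamma^{\rho}{}_{\mu\sigma} + \Gamma^{\rho}{}_{\mu\lambda}\Gamma^{\lambda}{}_{\nu\sigma} - \Gamma^{\rho}{}_{\nu\lambda}\Gamma^{\lambda}{}_{\mu\sigma}$ |
| gravity EOM | $\nabla^2 \Phi = 4\pi G_N \rho$ | ??? (Einstein equations, coming soon!) |
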
In the Newtonian equation of motion for Φ, ρ is the mass density of whatever is sourcing the
gravitational field, and GN is the Newton constant characterizing the strength of gravity.

In order to see how the covariant geodesic deviation equation reduces to the familiar
Newtonian equations, we need to take the Newtonian limit in which gravity is weak and
speeds are low. (Recall also that x0 = ct and we will need to put back the factors of c
here to make the approximation clear.) Either we can assume staticity, or we can note
that $\partial_0 = \partial_t/c$, which is a factor $1/c$ smaller than $\partial_i$. In the Newtonian approximation, we
treat $\Phi/c^2$ as a small perturbation compared to 1, and we will ignore terms of order $\Phi^2$
compared to terms of order $\Phi$.
In the weak-field limit, the line element is diagonal and quite simple,

$$ds^2 = \left(1 + 2\Phi/c^2\right) c^2 dt^2 - \left(1 - 2\Phi/c^2\right)\left(dx^2 + dy^2 + dz^2\right). \tag{10.27}$$

For the moment, you will need to take this equation on faith, as I have not yet developed
the machinery required to see how it emerges. What I will do for now is to assume it as
an ansatz, and show that it correctly gives back the familiar Newtonian physics in the limit
of weak gravity and slow speeds. Later on in the course, I will give a fuller explanation of
where this expression for the approximate line element comes from.
In the low-speed Newtonian limit, there is no difference between proper time and coordinate
time $t$. The dynamical variables of interest become $x^\mu(\lambda) \to x^i(t)$. What this means
is that we only need to consider the spatial components of the geodesic deviation equation,
as the temporal component takes care of itself automatically. In the limit of slow speeds
compared to the speed of light, then, we have

$$\frac{d^2 y^i}{dt^2} = +R^{i}{}_{ttj}\, y^j. \tag{10.28}$$
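To check that this does reduce to the Newtonian expression we need to compute $R^{i}{}_{ttj}$ for the
above line element. In the limit of weak gravity, we can find the components of the inverse
metric to first order in $\Phi$,

$$g^{tt} \simeq \left(1 - 2\Phi/c^2\right)/c^2, \qquad g^{ij} \simeq -\delta^{ij}\left(1 + 2\Phi/c^2\right). \tag{10.29}$$

For our general Christoffel symbol we have

$$\Gamma^{\mu}{}_{\nu\lambda} = \tfrac{1}{2}\, g^{\mu\sigma}\left(\partial_\nu g_{\sigma\lambda} + \partial_\lambda g_{\sigma\nu} - \partial_\sigma g_{\nu\lambda}\right), \tag{10.30}$$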

so we can pick off the 0 and i parts individually. Assuming that gravity is weak allows us to
keep only first order terms in Φ. Assuming that Φ does not depend on time (to first order
in small quantities) sets some Christoffels to zero. For example,
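$$\Gamma^{0}{}_{00} = \tfrac{1}{2}\, g^{00}\left(\partial_0 g_{00}\right) = 0, \tag{10.31}$$
and
$$\Gamma^{0}{}_{ij} = \tfrac{1}{2}\, g^{00}\left(\partial_i g_{0j} + \partial_j g_{0i} - \partial_0 g_{ij}\right) = 0, \tag{10.32}$$
and
$$\Gamma^{i}{}_{0j} = \tfrac{1}{2}\, g^{ik}\left(\partial_0 g_{kj} + \partial_j g_{0k} - \partial_k g_{0j}\right) = 0. \tag{10.33}$$
Then we have
$$\Gamma^{0}{}_{0i} = \tfrac{1}{2}\, g^{00}\,\partial_i g_{00} \simeq \tfrac{1}{2}\left(1 - 2\Phi/c^2\right)\partial_i\left(1 + 2\Phi/c^2\right) \simeq \partial_i \Phi/c^2 \quad\Rightarrow\quad \Gamma^{t}{}_{ti} = \partial_i \Phi/c^2. \tag{10.34}$$
Another nontrivial component is
$$\Gamma^{i}{}_{00} = \tfrac{1}{2}\, g^{ik}\left(\partial_0 g_{k0} + \partial_0 g_{0k} - \partial_k g_{00}\right) = -\tfrac{1}{2}\, g^{ik}\,\partial_k g_{00} \simeq \delta^{ik}\,\partial_k \Phi/c^2 \quad\Rightarrow\quad \Gamma^{i}{}_{tt} = \delta^{ik}\,\partial_k \Phi. \tag{10.35}$$
Finally, we have
\begin{align}
\Gamma^{i}{}_{jk} &= \tfrac{1}{2}\, g^{i\ell}\left(\partial_j g_{\ell k} + \partial_k g_{\ell j} - \partial_\ell g_{jk}\right) \tag{10.36}\\
&= \tfrac{1}{2}\,\delta^{i\ell}\left(1 + 2\Phi/c^2\right)\left(-2/c^2\right)\left(\delta_{\ell k}\,\partial_j \Phi + \delta_{\ell j}\,\partial_k \Phi - \delta_{jk}\,\partial_\ell \Phi\right) \tag{10.37}\\
\Rightarrow\quad \Gamma^{i}{}_{jk} &\simeq \frac{1}{c^2}\left(-\delta^{i}{}_{k}\,\partial_j \Phi - \delta^{i}{}_{j}\,\partial_k \Phi + \delta^{i\ell}\,\delta_{jk}\,\partial_\ell \Phi\right). \tag{10.38}
\end{align}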
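With these Christoffels in hand, it may help to see, as a sketch, how the geodesic equation itself collapses to Newton’s second law. We assume a static potential and speeds $|dx^i/dt| \ll c$, and we identify the affine parameter with $t$, as is done for the deviation equation in (10.28).

```latex
% Spatial components of the geodesic equation with \lambda \to t:
\frac{d^2 x^i}{dt^2}
  = -\Gamma^{i}{}_{tt}
    - 2\,\Gamma^{i}{}_{tj}\,\frac{dx^j}{dt}
    - \Gamma^{i}{}_{jk}\,\frac{dx^j}{dt}\frac{dx^k}{dt}
% \Gamma^i_{tj} vanishes for static \Phi by (10.33), and the last term
% is suppressed by v^2/c^2 relative to the first by (10.35) and (10.38):
  \simeq -\Gamma^{i}{}_{tt}
  = -\delta^{ik}\,\partial_k \Phi ,
% which is the Newtonian equation of motion (10.12).
```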

11 M19Oct
11.1 Newtonian limit for Riemann
From the Christoffels we computed last time, we can compute the Riemann components,

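\begin{align}
R^{t}{}_{xtx} &= -\frac{1}{c^2}\,\frac{\partial^2 \Phi}{\partial x^2}, \tag{11.1}\\
R^{t}{}_{xty} &= -\frac{1}{c^2}\,\frac{\partial^2 \Phi}{\partial x\,\partial y}, \tag{11.2}\\
R^{x}{}_{yxy} &= +\frac{1}{c^2}\left(\frac{\partial^2 \Phi}{\partial x^2} + \frac{\partial^2 \Phi}{\partial y^2}\right), \tag{11.3}\\
R^{x}{}_{yxz} &= +\frac{1}{c^2}\left(\frac{\partial^2 \Phi}{\partial y\,\partial z}\right), \tag{11.4}
\end{align}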

plus eight more equations from cyclic permutations of (x, y, z). Note that we do not obtain
any squares of partial derivatives here in our Riemanns because we are only working to first
order in the Newtonian potential Φ. Then, using our geodesic deviation equation in the
Newtonian limit, we have
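$$\frac{d^2 y^i}{dt^2} = +R^{i}{}_{ttj}\, y^j. \tag{11.5}$$
Since we also know that

$$R^{i}{}_{tjt} = \partial_j \Gamma^{i}{}_{tt} - 0 = +\partial_j\left(\delta^{ik}\,\partial_k \Phi\right) = +\delta^{ik}\,\partial_j \partial_k \Phi, \tag{11.6}$$

and that Riemann is antisymmetric in its last two indices, as is evident from (9.10), so that
$R^{i}{}_{ttj} = -R^{i}{}_{tjt}$, we can see that the General Relativistic geodesic deviation equation
involving Riemann gives back the Newtonian expression (10.15), which is exactly what we set
out to prove last time.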

To illustrate the abstract concept of geodesic deviation, let us work a very simple example.
Suppose that we have a two-sphere of unit radius with line element

$$d\Omega_2^2 = d\theta^2 + \sin^2\theta\, d\phi^2. \tag{11.7}$$

If you did the Homework 1 assignment, you will already know how to find the Christoffels
for this case. There are only two that are nonzero,

$$\Gamma^{\theta}{}_{\phi\phi} = -\sin\theta\cos\theta, \qquad \Gamma^{\phi}{}_{\phi\theta} = +\cot\theta. \tag{11.8}$$

Denoting $d/d\lambda$ by an overdot, we find for the geodesic equation

\begin{align}
\ddot\theta - \sin\theta\cos\theta\;\dot\phi^2 &= 0, \tag{11.9}\\
\ddot\phi + 2\cot\theta\;\dot\theta\,\dot\phi &= 0. \tag{11.10}
\end{align}

These are second-order nonlinear ODEs, and solving them can be a battle if you do not
choose your initial conditions cleverly.

If we wish, we can use spherical symmetry to pick a particular initial condition to make
integrating these equations simpler. We choose the initial conditions
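$$\theta(\lambda)\big|_{\lambda=0} = \frac{\pi}{2}, \qquad \dot\theta(\lambda)\big|_{\lambda=0} = -\Omega_0, \qquad \phi(\lambda)\big|_{\lambda=0} = 0, \qquad \dot\phi(\lambda)\big|_{\lambda=0} = 0. \tag{11.11}$$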
This corresponds to pointing your tangent vector down a line of longitude. The constant Ω0
is the angular speed with which the polar angle θ is changing with the affine parameter λ.
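For readers who would like to see these equations solved without any cleverness, here is a minimal numerical sketch. It integrates (11.9) and (11.10) with a fourth-order Runge-Kutta step, first for the initial conditions (11.11) and then for a generic starting point. The value of $\Omega_0$, the generic initial data, the step size, and the function names are all assumptions for illustration. As sanity checks it monitors $\dot\theta^2 + \sin^2\theta\,\dot\phi^2$ and $\sin^2\theta\,\dot\phi$, both of which should stay constant along a geodesic of the round sphere.

```python
import math

def rhs(state):
    """Geodesic equations (11.9)-(11.10) as a first-order system."""
    th, ph, dth, dph = state
    ddth = math.sin(th) * math.cos(th) * dph ** 2               # (11.9)
    ddph = -2.0 * (math.cos(th) / math.sin(th)) * dth * dph     # (11.10)
    return (dth, dph, ddth, ddph)

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(k, c):
        return tuple(s + c * ki for s, ki in zip(state, k))
    k1 = rhs(state)
    k2 = rhs(shifted(k1, h / 2))
    k3 = rhs(shifted(k2, h / 2))
    k4 = rhs(shifted(k3, h))
    return tuple(s + h / 6 * (p + 2 * q + 2 * r + w)
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4))

def integrate(state, h=1e-3, steps=1000):
    """Advance the state from lambda = 0 to lambda = h * steps."""
    for _ in range(steps):
        state = rk4_step(state, h)
    return state

def speed_squared(state):
    th, _, dth, dph = state
    return dth ** 2 + math.sin(th) ** 2 * dph ** 2

def angular_momentum(state):
    th, _, _, dph = state
    return math.sin(th) ** 2 * dph

OMEGA0 = 0.7                                         # assumed value, arbitrary units
meridian = integrate((math.pi / 2, 0.0, -OMEGA0, 0.0))   # initial data (11.11)

generic_start = (1.0, 0.0, 0.3, 0.8)                 # assumed generic initial data
generic = integrate(generic_start)

print("meridian: theta =", meridian[0], " phi =", meridian[1])
print("speed^2 drift :", abs(speed_squared(generic) - speed_squared(generic_start)))
print("sin^2(theta) dphi/dlambda drift:",
      abs(angular_momentum(generic) - angular_momentum(generic_start)))
```

For the initial data (11.11) the solution stays on the meridian $\phi = 0$ with $\theta = \pi/2 - \Omega_0\lambda$, which is why that choice makes the integration trivial.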
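What is Riemann? The only independent component on the 2-sphere $S^2$ is

$$R^{\theta}{}_{\phi\theta\phi} = +\sin^2\theta, \tag{11.12}$$

and the other nonzero components follow from its symmetries: $R^{\theta}{}_{\phi\phi\theta} = -\sin^2\theta$,
$R^{\phi}{}_{\theta\phi\theta} = +1$, and $R^{\phi}{}_{\theta\theta\phi} = -1$.
So the components of the geodesic deviation acceleration are

\begin{align}
A^{\theta} &= R^{\theta}{}_{\nu\alpha\sigma}\, T^{\nu} T^{\alpha} S^{\sigma} \notag\\
&= R^{\theta}{}_{\phi\theta\phi}\, T^{\phi} T^{\theta} S^{\phi} + R^{\theta}{}_{\phi\phi\theta}\, T^{\phi} T^{\phi} S^{\theta} \notag\\
&= +\sin^2\theta\left[T^{\phi} T^{\theta} S^{\phi} - (T^{\phi})^2 S^{\theta}\right], \tag{11.13}
\end{align}

and

\begin{align}
A^{\phi} &= R^{\phi}{}_{\theta\theta\phi}\, T^{\theta} T^{\theta} S^{\phi} + R^{\phi}{}_{\theta\phi\theta}\, T^{\theta} T^{\phi} S^{\theta} \notag\\
&= +\left[T^{\theta} T^{\phi} S^{\theta} - (T^{\theta})^2 S^{\phi}\right]. \tag{11.14}
\end{align}

Now we need to specify S and T. Since the tangent vector to a geodesic running down a
line of longitude points in the (negative of the) polar direction, and the separation vector
between two adjacent such geodesics points in the azimuthal direction, we have that

$$T^{\theta} = -\Omega_0, \qquad T^{\phi} = 0, \qquad S^{\theta} = 0, \qquad S^{\phi} = 1. \tag{11.15}$$

So
$$A^{\theta} = 0, \qquad A^{\phi} = -\Omega_0^2. \tag{11.16}$$
The magnitude is what you should expect for an angular acceleration of the type represented
here. The minus sign is physical: neighbouring lines of longitude converge as they approach the pole.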
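The contraction in (11.13) and (11.14) is mechanical enough to hand to a computer. The sketch below stores the Riemann components of the unit two-sphere quoted in (11.12), together with those fixed by its symmetries, and contracts them with T and S as in (10.11). The numerical value of $\Omega_0$ and the function names are assumptions for illustration; index 0 stands for $\theta$ and index 1 for $\phi$.

```python
import math

def sphere_riemann(theta):
    """Components R[a][b][c][d] = R^a_{bcd} on the unit two-sphere."""
    s2 = math.sin(theta) ** 2
    R = [[[[0.0] * 2 for _ in range(2)] for _ in range(2)] for _ in range(2)]
    R[0][1][0][1] = s2       # R^theta_{phi theta phi}, equation (11.12)
    R[0][1][1][0] = -s2      # antisymmetry in the last two indices
    R[1][0][1][0] = 1.0      # R^phi_{theta phi theta}
    R[1][0][0][1] = -1.0     # antisymmetry in the last two indices
    return R

def deviation(R, T, S):
    """A^mu = R^mu_{nu alpha sigma} T^nu T^alpha S^sigma, as in (10.11)."""
    n = len(T)
    return [sum(R[m][v][a][s] * T[v] * T[a] * S[s]
                for v in range(n) for a in range(n) for s in range(n))
            for m in range(n)]

OMEGA0 = 0.7                       # assumed value, arbitrary units
T = [-OMEGA0, 0.0]                 # tangent vector, equation (11.15)
S = [0.0, 1.0]                     # separation vector, equation (11.15)
A = deviation(sphere_riemann(math.pi / 2), T, S)

print("A^theta =", A[0], " A^phi =", A[1], " -Omega0^2 =", -OMEGA0 ** 2)
```

Because $T^\phi = 0$, only the $R^{\phi}{}_{\theta\theta\phi}$ component contributes, and the result does not depend on the value of $\theta$ at which Riemann is evaluated.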
It is possible to get considerably more sophisticated in discussing the physics of geodesic
deviation. In order to derive more precise equations, one studies a congruence of geodesics,
which is a set of curves in an open region of spacetime such that every point in the region
lies on precisely one curve. The story of how geodesics deviate can be expressed in more
sophisticated tensor language by studying the covariant derivative of the four-velocity vector
$\nabla_\mu U_\nu$ and decomposing it into three independent parts: (a) the trace part $\theta$, known as the
expansion of the congruence, (b) the symmetric traceless part σµν , known as the shear of the
congruence, and (c) the antisymmetric part ωµν , known as the rotation of the congruence.
Each of these affects the evolution of the others, and the equations obtained are different for
massive and massless particles. We will not show the details here because the algebra is too
long-winded.

11.2 Riemann normal coordinates and the Bianchi identity
Riemann normal coordinates are a handy coordinate system that you can always use
based about any point p. They are defined in a smallish patch in the neighbourhood of p,
and do not necessarily extend infinitely in all directions, as we will explain when we talk
about geodesic deviation soon. But they are a great little coordinate system that you can use
to evaluate tensor equations, and to help prove tensor equations. We will use the notational
convention that equations written in Riemann normal coordinates have bars over the tensors.
Strictly speaking we should also bar all the indices, but this is beyond my typing patience
at present, so please imagine barred indices everywhere in your head.
A Riemann normal coordinate system is one built using geodesics about a point p.
More concretely for our purposes, it is the coordinate system in which the metric is locally
Minkowskian, and the Christoffels are zero – at the point p,
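$$\bar\Gamma^{\mu}{}_{\alpha\beta} = 0. \tag{11.17}$$

Then, since $\nabla_\sigma g_{\alpha\beta} = 0$ everywhere, including at p,

\begin{align}
\bar\nabla_\sigma\, \bar g_{\mu\nu} &= \partial_\sigma \bar g_{\mu\nu} - \bar\Gamma^{\lambda}{}_{\sigma\mu}\, \bar g_{\lambda\nu} - \bar\Gamma^{\lambda}{}_{\sigma\nu}\, \bar g_{\lambda\mu} \tag{11.18}\\
&= \partial_\sigma \bar g_{\mu\nu} + 0 = 0. \tag{11.19}
\end{align}

Therefore, in a Riemann normal coordinate system, we have the special relations

\begin{align}
\partial_\sigma \bar g_{\mu\nu} &= 0, \tag{11.20}\\
\bar\Gamma^{\alpha}{}_{\lambda\sigma} &= 0, \tag{11.21}\\
\bar R^{\mu}{}_{\nu\sigma\rho} &= \partial_\sigma \bar\Gamma^{\mu}{}_{\nu\rho} - \partial_\rho \bar\Gamma^{\mu}{}_{\nu\sigma}. \tag{11.22}
\end{align}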

As you can imagine, using this coordinate system we can more quickly check tensor equations.
This is not a trick – tensor equations are valid in any coordinate system. Therefore, they
must hold in any frame, including the Riemann normal coordinate frame in which our tensor
components simplify. This conceptual tool can be super handy.

We are now going to make use of this special coordinate system to identify all the
symmetries of Riemann. This is an important quest, because it will enable us to compute
how many independent components Riemann has in arbitrary spacetime dimension D = d+1.
In turn, that helps us understand the physics of this four-legged tensor. Computing it can
be arduous for a general spacetime, and this is why I set computer algebra as part of HW1.
To help us find the symmetries, it helps to start by using the spacetime metric to build
the (0,4) version of Riemann from the natural (1,3) version,

$$R_{\alpha\beta\mu\nu} = g_{\alpha\lambda}\, R^{\lambda}{}_{\beta\mu\nu}. \tag{11.23}$$

The first symmetry we can notice by inspection of the formula for Riemann in terms of
Christoffels. We see immediately that Riemann is antisymmetric upon exchange of its final
two indices,
$$R_{\rho\sigma\mu\nu} = -R_{\rho\sigma\nu\mu}. \tag{11.24}$$

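In Riemann normal coordinates,
\begin{align}
\bar R_{\rho\sigma\mu\nu} &= \bar g_{\rho\lambda}\left(\partial_\mu \bar\Gamma^{\lambda}{}_{\nu\sigma} - \partial_\nu \bar\Gamma^{\lambda}{}_{\mu\sigma}\right) \tag{11.25}\\
&= \bar g_{\rho\lambda}\,\partial_\mu\left[\tfrac{1}{2}\,\bar g^{\lambda\alpha}\left(\partial_\sigma \bar g_{\nu\alpha} + \partial_\nu \bar g_{\sigma\alpha} - \partial_\alpha \bar g_{\nu\sigma}\right)\right] - (\mu \leftrightarrow \nu) \tag{11.26}\\
&= \tfrac{1}{2}\,\bar g_{\rho\lambda}\,\bar g^{\lambda\alpha}\left(\partial_\mu\partial_\sigma \bar g_{\nu\alpha} + \partial_\mu\partial_\nu \bar g_{\sigma\alpha} - \partial_\mu\partial_\alpha \bar g_{\nu\sigma}\right) - (\mu \leftrightarrow \nu) \tag{11.27}\\
&= \tfrac{1}{2}\left[\partial_\mu\partial_\sigma \bar g_{\nu\rho} + \partial_\mu\partial_\nu \bar g_{\sigma\rho} - \partial_\mu\partial_\rho \bar g_{\nu\sigma} - (\mu \leftrightarrow \nu)\right] \tag{11.28}\\
&= \tfrac{1}{2}\left[\partial_\mu\partial_\sigma \bar g_{\nu\rho} - \partial_\mu\partial_\rho \bar g_{\nu\sigma} - (\mu \leftrightarrow \nu)\right], \tag{11.29}
\end{align}
where in the third line above we used the fact that $\partial_\mu \bar g^{\lambda\alpha} = 0$ in Riemann normal coordinates,
in the fourth line we contracted $\bar g_{\rho\lambda}\,\bar g^{\lambda\alpha} = \delta_\rho{}^{\alpha}$, and in the last line we used the fact that
$\partial_\mu\partial_\nu \bar g_{\sigma\rho}$ is symmetric in $\mu$ and $\nu$, so it cancels against its $(\mu \leftrightarrow \nu)$ partner. Therefore,
we can see two additional identities satisfied by Riemann,
$$R_{\rho\sigma\mu\nu} = -R_{\sigma\rho\mu\nu}, \tag{11.30}$$
i.e., Riemann is antisymmetric upon exchange of its first two indices as well as its last two,
and
$$R_{\rho\sigma\mu\nu} = R_{\mu\nu\rho\sigma}, \tag{11.31}$$
i.e., Riemann is symmetric under interchange of the first two indices with the last two.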
We can also look at a version of Riemann with cyclic permutations on the last three
indices,
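$$Q_{\rho\sigma\mu\nu} := R_{\rho\sigma\mu\nu} + R_{\rho\mu\nu\sigma} + R_{\rho\nu\sigma\mu}. \tag{11.32}$$
Evaluating again in Riemann normal coordinates gives

\begin{align}
2 Q_{\rho\sigma\mu\nu} &= \left[\left(\partial_\mu\partial_\sigma \bar g_{\nu\rho} - \partial_\mu\partial_\rho \bar g_{\nu\sigma}\right) + \left(\partial_\nu\partial_\mu \bar g_{\rho\sigma} - \partial_\nu\partial_\rho \bar g_{\mu\sigma}\right) + \left(\partial_\sigma\partial_\nu \bar g_{\mu\rho} - \partial_\sigma\partial_\rho \bar g_{\mu\nu}\right)\right] \notag\\
&\quad - (\mu \leftrightarrow \nu) \tag{11.33}\\
&= \partial_\rho\left(-\partial_\mu \bar g_{\nu\sigma} - \partial_\nu \bar g_{\mu\sigma} - \partial_\sigma \bar g_{\mu\nu} + \partial_\nu \bar g_{\mu\sigma} + \partial_\mu \bar g_{\nu\sigma} + \partial_\sigma \bar g_{\nu\mu}\right) \notag\\
&\quad + \partial_\sigma\left(\partial_\mu \bar g_{\nu\rho} + \partial_\nu \bar g_{\mu\rho} - \partial_\nu \bar g_{\mu\rho} - \partial_\mu \bar g_{\nu\rho}\right) + \partial_\mu\partial_\nu \bar g_{\sigma\rho} - \partial_\nu\partial_\mu \bar g_{\sigma\rho} \tag{11.34}\\
&= (0) + (0) + 0 - 0 \tag{11.35}\\
&= 0, \tag{11.36}
\end{align}

where we have used the fact that mixed partial derivatives commute and the fact that the
metric is symmetric. Because of the antisymmetry properties, an equivalent way of writing
this is
$$R_{\rho[\sigma\mu\nu]} = 0, \tag{11.37}$$
and using other symmetries of Riemann it immediately follows from this that

$$R_{[\rho\sigma\mu\nu]} = 0, \tag{11.38}$$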

i.e., the totally antisymmetric part of Riemann vanishes too. With straightforward but
tedious algebra of very similar type, we can also derive the Bianchi identity which governs
covariant derivatives of Riemann. It can be written in (at least) two mathematically different

but physically identical ways, which are related by the symmetries of Riemann. The first
form is
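$$\nabla_\lambda R_{\rho\sigma\mu\nu} + \nabla_\rho R_{\sigma\lambda\mu\nu} + \nabla_\sigma R_{\lambda\rho\mu\nu} = 0, \tag{11.39}$$
and the second form is
$$\nabla_{[\lambda} R_{\mu\nu]\rho\sigma} = 0, \tag{11.40}$$
which constrains Riemann by relating components at different points. You can think of the
Bianchi identity for Riemann as like a Jacobi identity for covariant derivatives,

$$[[\nabla_\mu, \nabla_\nu], \nabla_\lambda] + [[\nabla_\nu, \nabla_\lambda], \nabla_\mu] + [[\nabla_\lambda, \nabla_\mu], \nabla_\nu] = 0. \tag{11.41}$$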

11.3 The information in Riemann


Now we have all the ingredients we need in order to compute the number of independent
Riemann coefficients. We know that as a (0,4) tensor Riemann satisfies

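\begin{align}
R_{\alpha\beta\gamma\delta} &= -R_{\alpha\beta\delta\gamma}, \tag{11.42}\\
R_{\alpha\beta\gamma\delta} &= -R_{\beta\alpha\gamma\delta}, \tag{11.43}\\
R_{\alpha\beta\gamma\delta} &= R_{\gamma\delta\alpha\beta}, \tag{11.44}\\
R_{[\alpha\beta\gamma\delta]} &= 0. \tag{11.45}
\end{align}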

Suppose that we bunch the indices of Riemann in twos. Then we can think of Riemann as
like a symmetric combination of two antisymmetric blocks. Recall that the number of independent
components of an antisymmetric D × D matrix is D(D − 1)/2 while that of a symmetric matrix is D(D + 1)/2.
Then the number of components of Riemann should be
$$n_R(D) = \frac{1}{2}\left[\frac{1}{2}D(D-1)\right]\left[\frac{1}{2}D(D-1) + 1\right] - \binom{D}{4}. \tag{11.46}$$

We obtained this by using the symmetries of the first three identities to compute the tentative
total and then subtracting off the number of completely antisymmetric components to satisfy
the fourth identity. This process works because the four constraints are independent. Then,
with very simple algebra, we obtain
$$n_R(D) = \frac{1}{12}\,D^2\left(D^2 - 1\right). \tag{11.47}$$
Notice a few things about this formula. In one spacetime dimension, $n_R(1) = 0$ and Riemann
has no components. This makes sense, as there is only one independent direction, so you
cannot build a nonzero commutator of covariant derivatives. There is not enough room in
spacetime to build a parallelogram. In two spacetime dimensions, we have $n_R(2) = 1$ and
Riemann has just one independent component. This makes gravitational physics in D = 1+1
quite easy compared to higher dimensions. In three spacetime dimensions, we get $n_R(3) = 6$,
and in four spacetime dimensions we have $n_R(4) = 20$. This number is, not accidentally,
equal to the number of degrees of freedom in the second partial derivatives of the metric
that we cannot set to zero by a clever choice of coordinate system when Taylor expanding
the metric.

As we keep going up in dimension, nR (D) proliferates like a quartic polynomial of D.
By the time we get to ten or eleven spacetime dimensions, we are dealing with nR (10) = 825
or nR (11) = 1210 independent components! This is why we often use computer algebra in
research, when calculating in spacetime dimensions relevant to string theory. Of course, it
is also possible with clever techniques to cut through the algebra and find quicker ways to
calculate analytically, when your metric is diagonal or sparse in other significant ways.
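The counting argument above is easy to check by machine. The following is a minimal sketch (the function names are ours, chosen for illustration) that evaluates the count in (11.46) and compares it with the closed form (11.47).

```python
from math import comb

def n_riemann_by_counting(D: int) -> int:
    """Count as in (11.46): symmetric matrix in the antisymmetric pair index,
    minus the totally antisymmetric components removed by R_[abcd] = 0."""
    n_pairs = D * (D - 1) // 2                 # independent antisymmetric index pairs
    tentative = n_pairs * (n_pairs + 1) // 2   # symmetric combinations of two pairs
    return tentative - comb(D, 4)              # comb(D, 4) is 0 for D < 4

def n_riemann_closed_form(D: int) -> int:
    """Closed form (11.47): D^2 (D^2 - 1) / 12."""
    return D * D * (D * D - 1) // 12

counts = {D: n_riemann_by_counting(D) for D in (1, 2, 3, 4, 10, 11)}
for D, n in counts.items():
    print(D, n, n_riemann_closed_form(D))
```

The two functions should agree for every D, which is a quick sanity check on the "very simple algebra" step.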

12 R22Oct
12.1 Lie derivatives
So far we have developed covariant derivatives and curvature, which required having a
Christoffel connection. An interesting fact is there are some structures that can be de-
fined on a curved spacetime manifold even without reference to a connection or curvature.
We will introduce the idea of the Lie[10] derivative today, because studying it acting on
the metric tensor of spacetime will lead us to the General Relativistic version of Noether’s
Theorem, which is one of the most important ideas of all time in theoretical physics. We will
find that a symmetry of the spacetime metric gives an integral of the motion, a conserved
quantity, which we can use to help solve for trajectories of test particles in some important
cases.
The key concept we will need for our discussion of Noether’s Theorem is how to take a
Lie derivative along the congruence defined by a vector field. So, first things first, what is
a congruence? On a spacetime manifold, a congruence is a set of curves that fill the manifold
(or more generally some part of it) without intersecting. Therefore, the congruence provides
a mapping of a manifold onto itself, in the following sense. If the parameter on the curves is
λ, then any tiny ∆λ defines a mapping, where each point is advanced by ∆λ along the same
curve in the congruence. This is a 1-1 mapping if the vector field is C^1, and if it is C^∞ it is
called a diffeomorphism. If there is such a map for any ∆λ, then we have a one-parameter
Lie group, and the mapping is called a Lie dragging along the congruence.
Suppose that we have a scalar function f defined on our spacetime manifold. Then our

above mapping defined by ∆λ lets us define a new function f∆λ in the obvious way: if a
point P on a certain curve in the congruence gets mapped to the point Q, then

f (P ) = f∆λ (Q) . (12.1)

If it happens that we have a function for which the new value f∆λ (Q) is equal to the old one
f (Q), for all Q,

f = f∆λ , (12.2)
then the function is invariant under the mapping. If it is invariant for all ∆λ, then the
function is said to be Lie dragged. In less fancy language,
\[ \frac{df}{d\lambda} = 0 . \tag{12.3} \]

Acting on any given tensor, the Lie derivative along some vector field V , written as LV ,
measures how fast the tensor changes along integral curves of V . Acting on a scalar function
f , that is just the directional derivative,

LV (f ) = V λ ∂λ f . (12.4)

Note that this is the partial derivative: we have not involved any affine connection here.
[10] Pronunciation note: “Lie” rhymes with “see”.

What about a vector field? Any vector field V is defined by the congruence of curves
for which it is the tangent field,
\[ V^\mu = \frac{dx^\mu}{d\lambda} . \tag{12.5} \]

A familiar example from undergraduate electromagnetism is that the magnetic flux lines are
the integral curves of the magnetic field 3-vector. Now suppose that we have two general
vector fields X and Y . Recall that for any vector V , it can be expanded in the coordinate
basis as V = V µ ∂µ . Then we can define the commutator [X, Y ] of two vector fields via

[X, Y ](f ) ≡ X(Y (f )) − Y (X(f )) , (12.6)

where f is an arbitrary function. The neat thing about [X, Y ] is that it is a bona fide vector
field: it is linear,
[X, Y ](af + bg) = a[X, Y ]f + b[X, Y ]g , (12.7)
and it obeys the Leibniz rule,

[X, Y ](f g) = f [X, Y ]g + g[X, Y ]f . (12.8)

In the coordinate basis, the new vector field [X, Y ] has components

[X, Y ]µ = X λ ∂λ Y µ − Y λ ∂λ X µ . (12.9)

This is a well-defined tensor, because the non-tensorial pieces from the partial derivatives
cancel by antisymmetry of the commutator. If you prefer, you can write the above formula
with covariant derivatives instead – that way, it looks more tensorial.
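To make the component formula (12.9) concrete, here is a minimal numerical sketch using central differences. The two vector fields on the plane (a rotation and a translation) are our own illustrative choices, not fields used elsewhere in these notes.

```python
def partial(f, x, i, h=1e-5):
    """Central-difference partial derivative of f at the point x along coordinate i."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2 * h)

def lie_bracket(X, Y, x):
    """Components [X, Y]^mu = X^l d_l Y^mu - Y^l d_l X^mu at the point x.
    X and Y are functions mapping a point to a list of components."""
    Xx, Yx = X(x), Y(x)
    out = []
    for mu in range(len(x)):
        term = 0.0
        for lam in range(len(x)):
            term += Xx[lam] * partial(lambda p: Y(p)[mu], x, lam)
            term -= Yx[lam] * partial(lambda p: X(p)[mu], x, lam)
        out.append(term)
    return out

# Hypothetical example fields on the (x, y) plane.
rotation = lambda p: [-p[1], p[0]]     # x d_y - y d_x
translation = lambda p: [1.0, 0.0]     # d_x

bracket = lie_bracket(rotation, translation, [0.3, -1.2])
print(bracket)   # should be close to the components of -d_y
```

By hand, [x∂_y − y∂_x, ∂_x] = −∂_y, so the printed components should be approximately (0, −1) at any point.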
Suppose that we adapt our coordinate system so that V points entirely along the coordinate
basis vector ∂/∂x^d. The utility of choosing this coordinate system is that a diffeomorphism
by λ amounts to a coordinate transformation from (x^0, x^1, ..., x^d) to (x^0, x^1, ..., x^d + λ).
Then the components of a different vector T^µ pulled back from the transformed point to
the original are simply T^µ(x^0, x^1, ..., x^d + λ). In this coordinate system, the Lie derivative
then becomes
\[ \mathcal{L}_V T^\mu = \frac{\partial T^\mu}{\partial x^d} . \tag{12.10} \]
This expression is clearly not covariant, but we know that for two vector fields V and T
the commutator [V , T ] is a well-defined tensor, and in this coordinate system it happens to
have components
\[ [V, T]^\mu = V^\nu \partial_\nu T^\mu - T^\nu \partial_\nu V^\mu = \frac{\partial T^\mu}{\partial x^d} . \tag{12.11} \]
Since both LV T and [V , T ] are vectors (rank (1, 0) tensors), their components must be
equal, and so we finally have the formula we want. The Lie derivative of a vector T along
the vector field V is
LV T = [V , T ] . (12.12)
This quantity on the RHS is called the Lie bracket. The equation says that how the vector
T changes along integral curves of another vector V is encoded in the commutator of the
two vector fields. The formula for the action of the Lie derivative on covariant vectors follows
directly from what we have just derived for contravariant vectors and the Leibniz rule.
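For completeness, here is a sketch of that last step, for a covariant vector ω_µ contracted with an arbitrary vector T^µ. The only inputs are (12.4) for the scalar ω_µ T^µ, (12.12) for T, and the Leibniz rule.

```latex
\begin{aligned}
\mathcal{L}_V(\omega_\mu T^\mu) &= V^\nu \partial_\nu (\omega_\mu T^\mu)
   = V^\nu (\partial_\nu \omega_\mu) T^\mu + \omega_\mu V^\nu \partial_\nu T^\mu , \\
\mathcal{L}_V(\omega_\mu T^\mu) &= (\mathcal{L}_V \omega)_\mu T^\mu + \omega_\mu (\mathcal{L}_V T)^\mu
   = (\mathcal{L}_V \omega)_\mu T^\mu + \omega_\mu \left( V^\nu \partial_\nu T^\mu - T^\nu \partial_\nu V^\mu \right) , \\
\Rightarrow \quad (\mathcal{L}_V \omega)_\mu &= V^\nu \partial_\nu \omega_\mu + (\partial_\mu V^\nu)\, \omega_\nu .
\end{aligned}
```

The last line follows by equating the two expressions and using the fact that T^µ is arbitrary. Note the plus sign on the second term, in contrast to the minus sign for an upstairs index.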

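For a general rank (k, ℓ) tensor, the Lie derivative is

\[
\begin{aligned}
(\mathcal{L}_V T)^{\mu_1 \ldots \mu_k}{}_{\nu_1 \ldots \nu_\ell} = {} & V^\sigma \partial_\sigma T^{\mu_1 \ldots \mu_k}{}_{\nu_1 \ldots \nu_\ell} \\
& - (\partial_\lambda V^{\mu_1})\, T^{\lambda \mu_2 \ldots \mu_k}{}_{\nu_1 \ldots \nu_\ell} - \ldots \\
& + (\partial_{\nu_1} V^\lambda)\, T^{\mu_1 \ldots \mu_k}{}_{\lambda \nu_2 \ldots \nu_\ell} + \ldots .
\end{aligned}
\tag{12.13}
\]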

This equation may make you a bit uncomfortable because it involves partial derivatives. In
fact, if you do the straightforward but tedious algebra, you will find that it is just as valid
with covariant derivatives replacing the partial ones,

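\[
\begin{aligned}
(\mathcal{L}_V T)^{\mu_1 \ldots \mu_k}{}_{\nu_1 \ldots \nu_\ell} = {} & V^\sigma \nabla_\sigma T^{\mu_1 \ldots \mu_k}{}_{\nu_1 \ldots \nu_\ell} \\
& - (\nabla_\lambda V^{\mu_1})\, T^{\lambda \mu_2 \ldots \mu_k}{}_{\nu_1 \ldots \nu_\ell} - \ldots \\
& + (\nabla_{\nu_1} V^\lambda)\, T^{\mu_1 \ldots \mu_k}{}_{\lambda \nu_2 \ldots \nu_\ell} + \ldots .
\end{aligned}
\tag{12.14}
\]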

This equation certainly looked less tensorial written the first way. But the first equation has
the advantage that it makes clear that no connection is necessary to define Lie derivatives
of tensors. It is an independent structure.

12.2 Killing vectors and tensors


In this section, we will be especially interested in the expression above for the Lie derivative
of the metric tensor, which characterizes everything about gravity in our spacetime. We
have
(LV g)µν = ∇µ Vν + ∇ν Vµ . (12.15)
So if
0 = ∇µ Kν + ∇ν Kµ , (12.16)
for some vector K, the metric is unchanged. K is known as a Killing vector, and the
metric is unchanged along its integral curves, i.e., it has a symmetry. This is Noether’s
Theorem in curved spacetime, and it plays an extremely important role in the physics of
GR. So, what is the corresponding conservation law?
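Consider the quantity K · p. Its covariant derivative is

\[ \nabla_\mu (K_\lambda p^\lambda) = (\nabla_\mu K_\lambda) p^\lambda + K_\lambda (\nabla_\mu p^\lambda) . \tag{12.17} \]

Contracting this with p^µ gives

\[ p^\mu \nabla_\mu (K_\lambda p^\lambda) = p^\mu p^\lambda \nabla_\mu K_\lambda + K_\lambda p^\mu (\nabla_\mu p^\lambda) , \tag{12.18} \]

and the second term vanishes by the geodesic equation, p^µ ∇_µ p^λ = 0. The first term also
vanishes: p^µ p^λ is symmetric in µ and λ, so it picks out only the symmetric part ∇_(µ K_λ),
which is zero by the Killing vector equation. So the Killing equation implies that K · p is
conserved along geodesics. More generally, if we have a Killing tensor obeying

\[ \nabla_{(\mu} K_{\nu_1 \ldots \nu_\ell)} = 0 \tag{12.19} \]

then
\[ p^\mu \nabla_\mu \left( K_{\nu_1 \ldots \nu_\ell}\, p^{\nu_1} \cdots p^{\nu_\ell} \right) = 0 . \tag{12.20} \]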

Finding conserved quantities is physically important because it can
help us solve for geodesics. Soon, when we introduce black holes, we will see just how
crucial conserved quantities can be in analyzing geodesic motion and physical consequences
of it. So let us derive an alternative form of the geodesic equation which will be handy for
future reference. What is the directional covariant derivative of the downstairs version of
the tangent vector to the curve xµ (λ)?
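\[ \frac{D}{D\lambda}\left(\frac{dx_\mu}{d\lambda}\right) = \frac{d^2 x_\mu}{d\lambda^2} - \Gamma^\alpha{}_{\sigma\mu} \frac{dx^\sigma}{d\lambda} \frac{dx_\alpha}{d\lambda} . \tag{12.21} \]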
This should be zero for geodesics, giving
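\[
\begin{aligned}
\frac{d^2 x_\mu}{d\lambda^2} &= \frac{1}{2} g^{\alpha\beta} \left( \partial_\sigma g_{\beta\mu} + \partial_\mu g_{\beta\sigma} - \partial_\beta g_{\sigma\mu} \right) \frac{dx^\sigma}{d\lambda} \frac{dx_\alpha}{d\lambda} , \\
&= \frac{1}{2} \left( \partial_\sigma g_{\beta\mu} + \partial_\mu g_{\beta\sigma} - \partial_\beta g_{\sigma\mu} \right) \frac{dx^\sigma}{d\lambda} \frac{dx^\beta}{d\lambda} , \\
&= \frac{1}{2} \left( \partial_\mu g_{\beta\sigma} \right) \frac{dx^\sigma}{d\lambda} \frac{dx^\beta}{d\lambda} ,
\end{aligned}
\tag{12.22}
\]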
which yields (upon relabelling dummy indices)

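\[ \frac{d}{d\lambda}\left(\frac{dx_\mu}{d\lambda}\right) = \frac{1}{2} \left( \partial_\mu g_{\alpha\beta} \right) \frac{dx^\alpha}{d\lambda} \frac{dx^\beta}{d\lambda} . \tag{12.23} \]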
So if the entire spacetime metric has zero dependence on a particular coordinate x^µ, the
corresponding lower-index tangent vector dx_µ/dλ is conserved! For a massive particle, this
quantity is none other than p_µ/m. For the massless particle, we can choose a convention in
which p^µ = dx^µ/dλ. Therefore,

\[ \text{if } \partial_\mu g_{\alpha\beta} = 0 \text{ for some fixed } \mu \text{ and all } \alpha, \beta, \text{ then } p_\mu = \text{constant} . \tag{12.24} \]

Let us do an ultra-simple example of a Killing vector. Consider Minkowski space in 4D,
namely R^{3,1} with the flat spacetime metric. In Cartesian coordinates, we obviously have
spacetime translation invariance. This implies that all components of p_µ are conserved.
As a less trivial example, take our spatially flat FRW universe for which we previously
worked out the Christoffels. Notice that the metric depended only on time. Obviously, this
means that energy is not conserved. Stop and think on that for a minute. You probably
thought that conservation of energy must be true in all circumstances, even for the whole
universe. You would be wrong. It requires a symmetry! Since none of the components of
the metric depend on spatial coordinates, the spatial momenta p_i are conserved.
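We can watch this happen numerically. The sketch below integrates a timelike geodesic in a spatially flat FRW metric ds^2 = dt^2 − a(t)^2 dx^2 with a fourth-order Runge-Kutta stepper. The scale factor a(t) = t^(2/3), the initial data, and all function names are our own illustrative assumptions.

```python
def a(t):
    """Hypothetical scale factor a(t) = t^(2/3) (an assumption for illustration)."""
    return t ** (2.0 / 3.0)

def a_dot(t):
    return (2.0 / 3.0) * t ** (-1.0 / 3.0)

def rhs(state):
    """Geodesic equations for motion along x only.
    Nonzero Christoffels used: Gamma^t_xx = a a_dot, Gamma^x_tx = a_dot / a."""
    t, x, ut, ux = state
    return (ut, ux,
            -a(t) * a_dot(t) * ux * ux,
            -2.0 * (a_dot(t) / a(t)) * ut * ux)

def rk4_step(state, h):
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h * (p + 2 * q + 2 * r + w) / 6.0
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4))

def lower_momenta(state):
    """Return (p_t / m, p_x / m) = (g_tt u^t, g_xx u^x)."""
    t, _x, ut, ux = state
    return ut, -a(t) ** 2 * ux

# Start at t = 1 (where a = 1) with a unit-normalised 4-velocity.
state = (1.0, 0.0, (1.0 + 0.25) ** 0.5, 0.5)
pt_start, px_start = lower_momenta(state)
for _ in range(5000):
    state = rk4_step(state, 1e-3)
pt_end, px_end = lower_momenta(state)
print("p_x/m:", px_start, px_end)   # conserved
print("p_t/m:", pt_start, pt_end)   # not conserved
```

The lower-index spatial momentum stays fixed to integrator accuracy, while the energy-like component p_t drifts as the universe expands.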
For our third example of Killing vectors, consider the two-sphere S 2 with round metric

\[ ds^2 = d\theta^2 + \sin^2\theta \, d\phi^2 . \tag{12.25} \]

How do we find the Killing vectors? We need to solve the D(D + 1)/2 Killing equations,

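\[
\begin{aligned}
0 &= \nabla_\mu K_\nu + \nabla_\nu K_\mu \\
  &= \partial_\mu K_\nu + \partial_\nu K_\mu - 2 \Gamma^\alpha{}_{\mu\nu} K_\alpha .
\end{aligned}
\tag{12.26}
\]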

First, we need the nonzero Christoffels,
\[ \Gamma^\phi{}_{\phi\theta} = \frac{\cos\theta}{\sin\theta} , \qquad \Gamma^\theta{}_{\phi\phi} = -\sin\theta \cos\theta . \tag{12.27} \]
Then the three independent Killing vector equations involve θθ, φφ, θφ:
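\[
\begin{aligned}
0 &= \partial_\theta K_\theta , \\
0 &= \partial_\phi K_\phi + \sin\theta \cos\theta \, K_\theta , \\
0 &= \partial_\phi K_\theta + \partial_\theta K_\phi - \frac{2\cos\theta}{\sin\theta} K_\phi .
\end{aligned}
\tag{12.28}
\]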
The first Killing equation teaches us that
Kθ = Kθ (φ) . (12.29)
Taking ∂_φ of the third Killing equation and using the second equation to eliminate ∂_φ K_φ
gives, after a little bit of massaging of trig functions,
\[ \partial_\phi^2 K_\theta + K_\theta = 0 . \tag{12.30} \]
We can readily solve this,
Kθ (φ) = A sin φ + B cos φ , (12.31)
where A, B are constants of integration. Using this in the second Killing equation and partially
integrating w.r.t. φ to find K_φ gives
\[ K_\phi = F(\theta) + A \sin\theta \cos\theta \cos\phi - B \sin\theta \cos\theta \sin\phi , \tag{12.32} \]
where F is an arbitrary function of integration. Substituting this back into the third Killing
equation gives, after more trigonometric algebraic massage,
\[ \partial_\theta F(\theta) - \frac{2\cos\theta}{\sin\theta} F(\theta) = 0 , \tag{12.33} \]
which is readily integrated to
\[ F(\theta) = C \sin^2\theta , \tag{12.34} \]
where C is a constant of integration. Therefore, the general form of our Killing vectors for
the two-sphere is, for the downstairs components,
\[
\begin{aligned}
K_\theta &= A \sin\phi + B \cos\phi , \\
K_\phi &= C \sin^2\theta + \sin\theta \cos\theta \, (A \cos\phi - B \sin\phi) .
\end{aligned}
\tag{12.35}
\]
If we take A = 0, B = 0, C = 1, we get a Killing vector R with upstairs components
\[ R^\theta = 0 , \qquad R^\phi = 1 . \tag{12.36} \]
If we take A = 0, B = 1, C = 0, we get a Killing vector S with upstairs components
\[ S^\theta = \cos\phi , \qquad S^\phi = -\cot\theta \sin\phi . \tag{12.37} \]
If we take A = −1, B = 0, C = 0, we get a Killing vector T with upstairs components
\[ T^\theta = -\sin\phi , \qquad T^\phi = -\cot\theta \cos\phi . \tag{12.38} \]
As you can check by transforming between spherical polar coordinates and Cartesian coordinates,
these three Killing vectors correspond to R = x∂_y − y∂_x, S = z∂_x − x∂_z, T = y∂_z − z∂_y.
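Both claims about R, S and T can be tested numerically. The sketch below checks (i) that the Lie derivative of the round metric along each vector vanishes, using the partial-derivative form (12.13) so that no connection is needed, and (ii) that K · u stays constant along a numerically integrated geodesic. The sample point, initial velocity, step size, and function names are illustrative assumptions.

```python
import math

def metric(q):
    theta, _phi = q
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

# Upstairs components from (12.36)-(12.38).
def R(q):
    return [0.0, 1.0]

def S(q):
    theta, phi = q
    return [math.cos(phi), -math.sin(phi) / math.tan(theta)]

def T(q):
    theta, phi = q
    return [-math.sin(phi), -math.cos(phi) / math.tan(theta)]

def partial(f, q, i, h=1e-5):
    qp, qm = list(q), list(q)
    qp[i] += h
    qm[i] -= h
    return (f(qp) - f(qm)) / (2.0 * h)

def lie_derivative_of_metric(K, q):
    """(L_K g)_{mn} = K^s d_s g_{mn} + (d_m K^s) g_{sn} + (d_n K^s) g_{ms}."""
    g, Kq = metric(q), K(q)
    out = [[0.0, 0.0], [0.0, 0.0]]
    for m in range(2):
        for n in range(2):
            for s in range(2):
                out[m][n] += Kq[s] * partial(lambda p: metric(p)[m][n], q, s)
                out[m][n] += partial(lambda p: K(p)[s], q, m) * g[s][n]
                out[m][n] += partial(lambda p: K(p)[s], q, n) * g[m][s]
    return out

def geodesic_rhs(state):
    """Geodesic equations on S^2, using the Christoffels in (12.27)."""
    th, _ph, vth, vph = state
    return (vth, vph,
            math.sin(th) * math.cos(th) * vph ** 2,
            -2.0 * vth * vph / math.tan(th))

def rk4_step(state, h):
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = geodesic_rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = geodesic_rhs(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h * (a + 2 * b + 2 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def k_dot_u(K, state):
    th, ph, vth, vph = state
    g, Kq = metric((th, ph)), K((th, ph))
    return g[0][0] * Kq[0] * vth + g[1][1] * Kq[1] * vph

point = (1.1, 0.7)
max_lie = max(abs(c) for K in (R, S, T)
              for row in lie_derivative_of_metric(K, point) for c in row)

state = (1.1, 0.7, 0.3, 0.8)   # (theta, phi, dtheta/dlambda, dphi/dlambda)
before = [k_dot_u(K, state) for K in (R, S, T)]
for _ in range(2000):
    state = rk4_step(state, 1e-3)
after = [k_dot_u(K, state) for K in (R, S, T)]
drift = max(abs(x - y) for x, y in zip(before, after))
print(max_lie, drift)
```

Both printed numbers should be zero up to finite-difference and integration error. The three conserved quantities are the components of angular momentum of the particle moving on the sphere.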

13 M26Oct
13.1 Maximally symmetric spacetimes
Spacetimes are distinguished by how many symmetries they possess. The more symmet-
ric, the more calculable. The less symmetric, the less calculable. Even though maximally
symmetric spacetimes possess an unrealistic amount of symmetry for experimental purposes,
they are still very useful to study because calculations are easier to complete and they help
build intuition.
What are the maximally symmetric spacetimes? We need to specify the spacetime
signature[11] in order to get started on this discussion. In Euclidean signature, Riemannian
manifolds with maximal symmetry are (up to local isometry) either: Euclidean space R^D,
the sphere S^D, or hyperbolic space H^D. In Lorentzian signature, there are also three options,
and they split up according to the value of the cosmological constant Λ (a.k.a. dark energy
density). When Λ = 0, we get Minkowski space R^{d,1}, where D = d + 1. For Λ < 0 we get
Anti de Sitter spacetime (AdS), and for Λ > 0 we get de Sitter spacetime (dS).
Recall that Minkowski spacetime is invariant under (d + 1) translations, d(d − 1)/2
rotations, and d boosts. Adding the numbers together gives a total of
\[ (d+1) + \frac{1}{2} d(d-1) + d = \frac{1}{2}(d+1)(d+2) = \frac{1}{2} D(D+1) \tag{13.1} \]
symmetries. We therefore say that a spacetime manifold of dimension D is maximally
symmetric if it possesses D(D + 1)/2 independent symmetries.
What equation should the Riemann tensor obey in maximally symmetric spacetimes? It
had better be invariant under local Lorentz transformations, because there is no preferred
direction in spacetime. There are only a very few tensors which we can use: g_{µν} and ε_{µ1...µD}.
The epsilon tensor turns out to have the wrong symmetry properties to build Riemann
components, and the metric ends up the winner. The sole combination of metric tensor
components that possesses the right symmetries to be Riemann is antisymmetric, and tracing
gives the constant of proportionality,
\[ R_{\rho\sigma\mu\nu} = \frac{R}{D(D-1)} \left( g_{\rho\nu} g_{\sigma\mu} - g_{\rho\mu} g_{\sigma\nu} \right) . \tag{13.2} \]

The Ricci scalar R is constant over the entire manifold for maximally symmetric spacetimes.
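As a quick consistency check on the constant of proportionality, we can trace (13.2) twice. The sketch below assumes that the Ricci tensor is obtained by contracting the first and last indices of Riemann, which is the contraction under which the sign shown in (13.2) reproduces R.

```latex
\begin{aligned}
g^{\rho\nu} R_{\rho\sigma\mu\nu}
  &= \frac{R}{D(D-1)} \left( g^{\rho\nu} g_{\rho\nu}\, g_{\sigma\mu} - g^{\rho\nu} g_{\rho\mu} g_{\sigma\nu} \right)
   = \frac{R}{D(D-1)} \left( D - 1 \right) g_{\sigma\mu}
   = \frac{R}{D}\, g_{\sigma\mu} , \\
g^{\sigma\mu} \left( \frac{R}{D}\, g_{\sigma\mu} \right) &= R .
\end{aligned}
```

So a maximally symmetric spacetime has Ricci tensor proportional to the metric, with the Ricci scalar as the only free parameter.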

Anti de Sitter spacetime AdS_{D=d+1} can be embedded in a flat spacetime of one higher
dimension with two timelike directions, R^{d,2}, via

\[ -(t^1)^2 - (t^2)^2 + (x^1)^2 + \ldots + (x^d)^2 = -L^2 \tag{13.3} \]

where L is the radius of curvature of the AdSD . There are several different coordinate
[11] If we were in a mathematically picky mood, we would also want to specify the spacetime topology.

systems in common usage for AdSD . One of the most useful is global coordinates, in which

\[ t^1 = L \cosh\rho \cos\tau , \tag{13.4} \]
\[ t^2 = L \cosh\rho \sin\tau , \tag{13.5} \]
\[ x^i = L \sinh\rho \, \hat{x}^i , \quad \text{where} \quad \sum_{i=1}^{d} (\hat{x}^i)^2 = 1 . \tag{13.6} \]

In general dimension, spherical coordinates are defined via

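\[
\begin{aligned}
\hat{x}^1 &= \cos\theta_1 , \\
\hat{x}^p &= \cos\theta_p \prod_{m=1}^{p-1} \sin\theta_m , \qquad p \in \{2, \ldots, d-1\} , \\
\hat{x}^d &= \prod_{m=1}^{d-1} \sin\theta_m .
\end{aligned}
\tag{13.7}
\]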

You can check yourself, either by hand or using SymPy, that the resulting line element of
AdSD in global coordinates is

\[ ds^2 = L^2 \left( \cosh^2\rho \, d\tau^2 - d\rho^2 - \sinh^2\rho \, d\Omega^2_{d-1} \right) , \tag{13.8} \]

where
\[ d\Omega^2_{d-1} = d\theta_1^2 + \sum_{\ell=2}^{d-1} \left( \prod_{m=1}^{\ell-1} \sin^2\theta_m \right) d\theta_\ell^2 . \tag{13.9} \]

With a further transformation in time and radius to static coordinates,

\[ t = L\tau , \qquad r = L \sinh\rho , \tag{13.10} \]

we obtain
\[ ds^2 = \left( 1 + \frac{r^2}{L^2} \right) dt^2 - \left( 1 + \frac{r^2}{L^2} \right)^{-1} dr^2 - r^2 d\Omega^2_{d-1} . \tag{13.11} \]
The scale L is the radius of curvature, and it sets the scale for all the physics in AdSD .
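If you would rather not do the check of (13.8) by hand, here is a minimal numerical sketch for AdS_4 (d = 3). It builds the embedding (13.4)-(13.6), confirms the constraint (13.3), and computes the induced metric from a finite-difference Jacobian. The value of L, the sample point, and the function names are illustrative assumptions.

```python
import math

L = 2.0                                 # hypothetical curvature radius
ETA = (1.0, 1.0, -1.0, -1.0, -1.0)      # flat metric on R^{3,2}: two times, three spaces

def embed(q):
    """Map global coordinates (tau, rho, theta, phi) to (t1, t2, x1, x2, x3)."""
    tau, rho, th, ph = q
    ch, sh = math.cosh(rho), math.sinh(rho)
    return [L * ch * math.cos(tau), L * ch * math.sin(tau),
            L * sh * math.cos(th),
            L * sh * math.sin(th) * math.cos(ph),
            L * sh * math.sin(th) * math.sin(ph)]

def constraint(q):
    """Left hand side of (13.3)."""
    X = embed(q)
    return -X[0] ** 2 - X[1] ** 2 + X[2] ** 2 + X[3] ** 2 + X[4] ** 2

def induced_metric(q, h=1e-6):
    """g_ab = eta_AB (dX^A/dq^a)(dX^B/dq^b), with a central-difference Jacobian."""
    jac = []
    for a in range(4):
        qp, qm = list(q), list(q)
        qp[a] += h
        qm[a] -= h
        jac.append([(p - m) / (2 * h) for p, m in zip(embed(qp), embed(qm))])
    return [[sum(e * jac[a][A] * jac[b][A] for A, e in enumerate(ETA))
             for b in range(4)] for a in range(4)]

def expected_diagonal(q):
    """Diagonal of (13.8) for d = 3, in the order (tau, rho, theta, phi)."""
    _tau, rho, th, _ph = q
    sh2 = math.sinh(rho) ** 2
    return [L * L * math.cosh(rho) ** 2, -L * L,
            -L * L * sh2, -L * L * sh2 * math.sin(th) ** 2]

q0 = (0.4, 0.9, 1.2, 2.3)
g = induced_metric(q0)
diag_error = max(abs(g[a][a] - expected_diagonal(q0)[a]) for a in range(4))
offdiag = max(abs(g[a][b]) for a in range(4) for b in range(4) if a != b)
print(constraint(q0), diag_error, offdiag)
```

The constraint should evaluate to −L^2, and the induced metric should match (13.8) with vanishing off-diagonal entries, up to finite-difference error.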
The physics of Anti de Sitter (or de Sitter) spacetime in D = d + 1 dimensions differs
markedly from the physics of Minkowski spacetime. One of the quickest ways to illustrate
this is to compare the falloff of partial waves in AdS versus flat spacetime. Solving a wave
equation for a simple type of field is a straightforward way to see this.
Consider a Klein-Gordon (scalar) field living in flat Minkowski spacetime. With our
mostly-minus signature, its equation of motion in spherical coordinates {t, r, Ω_{D−2}} is

\[ \nabla_\mu \nabla^\mu \Phi = -m^2 \Phi . \tag{13.12} \]

If we write
\[ \Phi(t, r, \Omega_{D-2}) = e^{-i\omega t} \chi(r)\, Y_{\ell,\{m\}}(\Omega_{D-2}) , \tag{13.13} \]

where the spherical harmonics obey

\[ \nabla^2_{S^{d-1}} Y_{\ell,\{m\}} = -\ell(\ell + d - 2)\, Y_{\ell,\{m\}} , \tag{13.14} \]

and separate variables, we find

\[ \left[ \frac{\partial^2}{\partial r^2} + \frac{(d-1)}{r} \frac{\partial}{\partial r} + \omega^2 - \frac{\ell(\ell + d - 2)}{r^2} - m^2 \right] \chi(r) = 0 . \tag{13.15} \]
The most physically important thing to understand from this radial differential equation is
that higher partial waves with ℓ > 0 are less important at large radius than the ℓ = 0 mode.
A related fact is that when we write out the multipole expansion for electric and magnetic
fields in Minkowski spacetime, higher multipole fields fall off with larger powers of radius.
This physics is inherent to Minkowski spacetime with Λ = 0. It may surprise you to learn
that it does not carry over to other values of the cosmological constant.
Suppose that we now consider instead AdS_{d+1} in global coordinates {τ, ρ, Ω_{d−1}}, with
the radial coordinate of (13.8) traded for a compact one via sinh ρ_{(13.8)} = tan ρ, so that

\[ ds^2 = L^2 \sec^2\rho \left( d\tau^2 - d\rho^2 - \sin^2\rho \, d\Omega^2_{d-1} \right) . \tag{13.16} \]

In this set of coordinates, ρ ranges from 0 (the interior of AdS) to π/2 (the boundary) and
the coordinate τ ranges from −∞ to +∞. What does the scalar wave equation look like in
this spacetime? Anticipating separation of variables again, let us write

\[ \Phi(\tau, \rho, \Omega_{d-1}) = e^{-i\omega\tau} \chi(\rho)\, Y_{\ell,\{m\}}(\Omega_{d-1}) . \tag{13.17} \]

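Then the equation of motion becomes

\[ \left[ \frac{1}{(\tan\rho)^{d-1}} \partial_\rho \left( (\tan\rho)^{d-1} \partial_\rho \right) + \omega^2 - \ell(\ell + d - 2) \csc^2\rho - m^2 \sec^2\rho \right] \chi(\rho) = 0 . \tag{13.18} \]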
Notice that as we approach the boundary, the higher angular momentum modes are not
suppressed compared to the ` = 0 mode. This is the germ of why the AdS/CFT correspon-
dence discovered in the context of string theory in 1997 can work: an observer living on
the boundary of the spacetime can see lots of information about what is happening in the
interior of the spacetime all the way from the boundary. If we want to know the character
of solutions to the above differential equation, we can substitute

\[ \chi(\rho) = (\cos\rho)^{2h} (\sin\rho)^{2b} f(\rho) , \tag{13.19} \]

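which, upon the substitution

\[ y \equiv \sin^2\rho , \tag{13.20} \]
gives
\[ y(1-y)\, \partial_y^2 f + \left[ 2b + \frac{d}{2} - (2h + 2b + 1)\, y \right] \partial_y f - \left[ (h+b)^2 - \frac{\omega^2}{4} \right] f = 0 . \tag{13.21} \]
The solutions to this equation are hypergeometric functions, with
\[ h_\pm = \frac{d \pm \sqrt{d^2 + 4m^2}}{4} , \qquad b = +\frac{\ell}{2} , \; -\frac{\ell}{2} + 1 - \frac{d}{2} . \tag{13.22} \]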

(For further details, see e.g. hep-th/9805171.)
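To see why hypergeometric functions appear, it helps to compare (13.21) with the standard form of the hypergeometric equation. In the sketch below, the capital letters A, B, C are our labels for the standard hypergeometric parameters, not to be confused with the exponent b in (13.19).

```latex
\begin{aligned}
& y(1-y)\, f'' + \left[ C - (A + B + 1)\, y \right] f' - A B \, f = 0 ,
  \qquad f(y) = {}_2F_1(A, B; C; y) \ \text{regular at } y = 0 , \\
& \text{matching (13.21):} \quad
  C = 2b + \frac{d}{2} , \quad
  A + B = 2(h + b) , \quad
  A B = (h+b)^2 - \frac{\omega^2}{4}
  \;\; \Rightarrow \;\;
  A, B = h + b \pm \frac{\omega}{2} .
\end{aligned}
```

So, under this identification, the solution regular at the origin is f = 2F1(h + b + ω/2, h + b − ω/2; 2b + d/2; sin^2 ρ).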
de Sitter spacetime dS_D can be embedded in R^{D,1} via

\[ t^1 = \sqrt{L^2 - r^2} \, \sinh(t/L) , \tag{13.23} \]
\[ x^i = r \hat{x}^i , \quad \text{where} \quad \sum_{i=1}^{d} (\hat{x}^i)^2 = 1 , \tag{13.24} \]
\[ x^D = \sqrt{L^2 - r^2} \, \cosh(t/L) . \tag{13.25} \]

This gives rise to static coordinates. (Like AdS, dS can alternatively be sliced with flat,
positively curved, or negatively curved spatial sections.) In static coordinates, the de Sitter
line element becomes
\[ ds^2 = \left( 1 - \frac{r^2}{L^2} \right) dt^2 - \left( 1 - \frac{r^2}{L^2} \right)^{-1} dr^2 - r^2 d\Omega^2_{d-1} . \tag{13.26} \]

This has a cosmological horizon at r = L. We will not have time to develop the similarities
and differences between cosmological horizons and black hole horizons in this course.
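The static-patch embedding can be checked in the same spirit as for AdS. The sketch below takes D = 4 (d = 3) and assumes the standard hyperboloid constraint −(t^1)^2 + Σ(x^i)^2 + (x^D)^2 = L^2, which is not written out above. The value of L, the sample point, and the function names are illustrative assumptions.

```python
import math

L = 3.0                                  # hypothetical curvature radius
ETA = (1.0, -1.0, -1.0, -1.0, -1.0)      # flat metric on R^{4,1}: one time, four spaces

def embed(q):
    """Map static coordinates (t, r, theta, phi) with r < L to (t1, x1, x2, x3, xD)."""
    t, r, th, ph = q
    root = math.sqrt(L * L - r * r)
    return [root * math.sinh(t / L),
            r * math.cos(th),
            r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            root * math.cosh(t / L)]

def hyperboloid(q):
    X = embed(q)
    return -X[0] ** 2 + X[1] ** 2 + X[2] ** 2 + X[3] ** 2 + X[4] ** 2

def induced_metric(q, h=1e-6):
    """g_ab = eta_AB (dX^A/dq^a)(dX^B/dq^b), with a central-difference Jacobian."""
    jac = []
    for a in range(4):
        qp, qm = list(q), list(q)
        qp[a] += h
        qm[a] -= h
        jac.append([(p - m) / (2 * h) for p, m in zip(embed(qp), embed(qm))])
    return [[sum(e * jac[a][A] * jac[b][A] for A, e in enumerate(ETA))
             for b in range(4)] for a in range(4)]

q0 = (0.5, 1.2, 0.8, 0.3)
f = 1.0 - q0[1] ** 2 / L ** 2
expected = [f, -1.0 / f, -q0[1] ** 2, -(q0[1] * math.sin(q0[2])) ** 2]
g = induced_metric(q0)
diag_error = max(abs(g[a][a] - expected[a]) for a in range(4))
offdiag = max(abs(g[a][b]) for a in range(4) for b in range(4) if a != b)
print(hyperboloid(q0), diag_error, offdiag)
```

The induced metric should reproduce (13.26), and both g_tt and 1/g_rr visibly go to zero as r approaches L, which is the cosmological horizon.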

13.2 Einstein’s equations


In plain language, Einstein’s equations express the fact that matter tells spacetime how
to curve and spacetime tells matter how to move. In PHY484, I will show how to derive
Einstein’s equations of General Relativity. For now, we will just write them down for you
and show you how to use them. They relate a geometrical quantity on the left hand side,
built out of the Riemann curvature tensor, to an energy-momentum tensor of any matter
fields in the physical system containing gravitation as well. In tensor notation, they read as
follows,
\[ R_{\alpha\beta} - \frac{1}{2} g_{\alpha\beta} R + \Lambda g_{\alpha\beta} = -8\pi G_N T_{\alpha\beta} . \tag{13.27} \]
The quantity Λ is known as the cosmological constant. (Note: you can put back the
powers of c very easily by recruiting dimensional analysis.)
A very important characteristic of Einstein’s equations is that they are nonlinear. You
can see this by eye by recalling the formula for Christoffels in terms of metric derivatives,
which is nonlinear, as well as the formula for the Riemanns in terms of derivatives of Christoffels
and contractions of Christoffels, which is also nonlinear. Nonlinearity makes GR
qualitatively very different from Newtonian gravity. It is only in the Newtonian limit of GR that
the linearity with which you are familiar emerges and shows itself as the superposition prin-
ciple for the Newtonian potential Φ(x). For generic situations in GR, nonlinearity is present
in the partial differential equations for the evolution of spacetime. The mathematics of non-
linear PDEs is hugely complicated compared to linear ones, and for generic spacetimes often
no general statements can be made. Symmetry helps enormously with the task of trying to
solve the differential equations, classify spacetimes, or find their geodesics.
The energy-momentum tensor on the RHS of Einstein’s equations is covariantly con-
served. The way to see this is to take covariant derivatives of both sides of the Einstein

equations. The Einstein tensor is defined as
\[ G_{\mu\nu} = R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R . \tag{13.28} \]
Notice that this is denoted with a big-Gµν , rather than the small-gµν metric or the GN
denoting the Newton gravitational constant. By itself, the rank (0,2) Einstein tensor Gµν
does not look like much. But it obeys an extremely useful identity by virtue of the Bianchi
identity for the Riemann tensor. To see this, let us take the first form of our Bianchi identity
and contract with two factors of the upstairs metric,

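\[ 0 = g^{\nu\sigma} g^{\mu\lambda} \left( \nabla_\lambda R_{\rho\sigma\mu\nu} + \nabla_\rho R_{\sigma\lambda\mu\nu} + \nabla_\sigma R_{\lambda\rho\mu\nu} \right) \tag{13.29} \]
\[ \phantom{0} = \nabla^\mu R_{\rho\mu} - \nabla_\rho R + \nabla^\nu R_{\rho\nu} . \tag{13.30} \]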

Rearranging this expression gives a relationship between the covariant derivative of the Ricci
tensor and the covariant derivative of the Ricci scalar,
\[ \nabla^\mu R_{\rho\mu} = \frac{1}{2} \nabla_\rho R . \tag{13.31} \]
This identity is handy because it enables us to prove that

\[ \nabla^\mu G_{\mu\nu} = 0 . \tag{13.32} \]

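In other words, the Einstein tensor is covariantly conserved. We also have the metric
compatibility condition on our affine connection,

\[ \nabla_\sigma g_{\mu\nu} = 0 , \tag{13.33} \]

so the cosmological constant term Λ g_{αβ} is covariantly conserved as well. Taking the
divergence of both sides of Einstein's equations, we then have
\[ \nabla^\mu T^{\text{matter}}_{\mu\nu} = 0 . \tag{13.34} \]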
Covariant conservation of the energy-momentum tensor in GR is mandatory, not voluntary.
How about some examples of energy-momentum tensors? Consider a perfect fluid, which
is a spherical cow approximation to real fluids, characterized only by three things: energy
density ρ, pressure p, and fluid velocity uµ . Its energy-momentum tensor is constructed from
those three quantities and the metric tensor,

\[ T^{\text{p.f.}}_{\mu\nu} = \left( \rho + \frac{p}{c^2} \right) u_\mu u_\nu - p \, g_{\mu\nu} . \tag{13.35} \]
More generally, if we have an action principle for some classical matter (non-gravitational)
field coupled to gravity, Smatter , then the energy-momentum tensor is determined by varying
the action w.r.t. g^{µν} according to the following recipe[12]:

\[ T_{\mu\nu}(x^\sigma) = \frac{2}{\sqrt{-g(x^\sigma)}} \frac{\delta S_{\text{matter}}}{\delta g^{\mu\nu}(x^\sigma)} , \tag{13.36} \]
[12] I will prove this near the beginning of the GR2 PHY[1]484S course.

where g is an abbreviation for the determinant of the downstairs metric, so that
\[ \sqrt{-g} \equiv \sqrt{-\det(g_{\alpha\beta})} . \tag{13.37} \]

This quantity arises in writing down a general relativistically invariant measure of integration,
d^D x √(−g). (For the case of spherical coordinates on flat Minkowski spacetime, it is
r^2 sin θ, which should be familiar to you from undergraduate multivariable calculus.) A
handy formula is
\[ \delta\sqrt{-g} = -\frac{1}{2} \sqrt{-g} \, g_{\alpha\beta} \, \delta g^{\alpha\beta} = +\frac{1}{2} \sqrt{-g} \, g^{\alpha\beta} \, \delta g_{\alpha\beta} . \tag{13.38} \]
For a relativistic massive point particle,
\[ T^{\text{particle}}_{\mu\nu}(x) = \frac{m}{\sqrt{-g(x)}} \int d\tau \, \dot{z}_\mu \dot{z}_\nu \, \delta^4(x - z(\tau)) . \tag{13.39} \]

We can see how this arises by starting from the Einbein action in curved spacetime in proper
time gauge for a massive particle,

dz µ (τ ) dz ν (τ ) 1
Z  
(2) 1
Srel = m dτ gµν + m . (13.40)
2 dτ dτ 2

The only part of this action that depends on the spacetime metric is the first term. Also,
we will only get a nonzero result when we are on the particle path.
How about for a scalar field Φ? For minimal coupling to gravity,

\[ S_{\text{scalar}}[\Phi] = \int d^D x \sqrt{-g} \left[ \frac{1}{2} \nabla^\mu \Phi \nabla_\mu \Phi - \frac{1}{2} m^2 \Phi^2 - V(\Phi) \right] . \tag{13.41} \]

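It follows that
\[ T^{\text{scalar}}_{\mu\nu} = \nabla_\mu \Phi \nabla_\nu \Phi - g_{\mu\nu} \left[ \frac{1}{2} (\nabla\Phi)^2 - \frac{1}{2} m^2 \Phi^2 - V(\Phi) \right] . \tag{13.42} \]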
For the electromagnetic field Aµ ,

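\[ S_{\text{EM}}[A_\alpha] = -\frac{1}{4} \int d^D x \sqrt{-g} \, F^{\mu\nu} F_{\mu\nu} . \tag{13.43} \]
It follows that
\[ T^{\text{EM}}_{\mu\nu} = -\left[ F_{\mu\lambda} F_\nu{}^\lambda - \frac{1}{4} g_{\mu\nu} F^{\lambda\sigma} F_{\lambda\sigma} \right] . \tag{13.44} \]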

14 R29Oct
14.1 Birkhoff’s theorem and the Schwarzschild black hole
Let us now attack the question of solving the vacuum Einstein equations when we have a
static, spherically symmetric spacetime. After a bit of work, we will be able to show that
the Schwarzschild black hole possessing mass M is the unique solution.
Our methodology follows that of Carroll §5.2, and will involve a few steps. We will first
use spherical symmetry to constrain the possible metric components that might be turned
on. Then we will use the vacuum Einstein equations to prove that the time dependence must
drop out. Then we will solve the remaining vacuum Einstein equations, and we will obtain
the Schwarzschild solution. The last piece of the puzzle will be provided by the Newtonian
limit, which will connect a mathematically arbitrary constant of integration to the physical
quantity GN M , where M is the mass of the Schwarzschild geometry and GN is the Newton
constant, which has dimensions of length^{D−2} and parametrizes the strength of gravity.
First, let us discuss the definition of a static spacetime in Lorentzian signature. Calling
the timelike coordinate x0 , we define a static spacetime as one for which (a) there is no
explicit time dependence in the metric and (b) the invariant interval possesses time reversal
invariance,
\[ \frac{\partial}{\partial x^0} g_{\mu\nu}(x^\lambda) = 0 , \tag{14.1} \]
\[ ds^2 \ \text{invariant under} \ x^0 \to -x^0 . \tag{14.2} \]

A spacetime that only obeys the first condition is called a stationary spacetime. In essence,
a static spacetime basically does nothing at all over time, while a stationary spacetime does
exactly the same thing at all times. Note that staticity requires that there be no time-space
cross terms in the invariant interval, only time-time and space-space components.
Isotropy is also a big requirement. Having this much symmetry eliminates a lot of possibly
independent components of the metric tensor. In particular, writing in terms of either Cartesian
coordinates \vec{x} or spherical polar coordinates r, θ, φ, we can only use three ingredients,

\[
\begin{aligned}
\vec{x} \cdot \vec{x} &= r^2 , \\
d\vec{x} \cdot d\vec{x} &= dr^2 + r^2 d\Omega_2^2 , \\
\vec{x} \cdot d\vec{x} &= r \, dr ,
\end{aligned}
\tag{14.3}
\]

where
\[ d\Omega_2^2 = d\theta^2 + \sin^2\theta \, d\phi^2 . \tag{14.4} \]
Any other thing we could build from the available ingredients would not respect spherical
symmetry.
Given the spherical symmetry of our ansatz, it is traditional to use spherical polar
coordinates, in which the metric on the S 2 is round – throughout the spacetime. For now,
we will allow the metric to have time dependence, but bear in mind that shortly we will find
it is disallowed by the Einstein equations. We write the metric as
\[ ds^2 = e^{2\alpha''(t',r')} (dt')^2 - e^{2\beta''(t',r')} (dr')^2 - 2 e^{2\gamma''(t',r')} dt' dr' - e^{2\delta''(t',r')} (r')^2 d\Omega_2^2 . \tag{14.5} \]

Next, we can change to a new radial coordinate r(t', r') by
\[ r^2 = (r')^2 e^{2\delta''(t',r')} . \tag{14.6} \]

This r is often referred to as the areal radius, because r^2 is the thing in front of dΩ_2^2,
the metric on the round two-sphere. Using this areal radius coordinate, we can then adjust
the definitions of all functions dependent on time and radius accordingly, to new functions,
single primed,
\[ ds^2 = e^{2\alpha'(t',r)} (dt')^2 - e^{2\beta'(t',r)} dr^2 - 2 e^{2\gamma'(t',r)} dt' dr - r^2 d\Omega_2^2 . \tag{14.7} \]

In order to be able to get rid of the 2 dt' dr term in this line element, we are going to
have to work harder. Let us start by trying the simplest proposal for a new time coordinate,
\[ dt \overset{??}{=} e^{2\alpha'(t',r)} dt' - e^{2\gamma'(t',r)} dr . \tag{14.8} \]

If we try to follow this path further, we will find that second mixed partial derivatives
of the new t coordinate w.r.t. the old coordinates fail to commute, so the equation (14.8)
above is inconsistent. (To see a simple example of how this process works when done right,
try transforming from Cartesian coordinates (x, y) on the plane to polar coordinates (r, θ),
and checking that mixed second partials commute.) Our simplest proposal for a coordinate
change failed. Can we craft a better proposal? As you may recall from the general theory
of ODEs/PDEs, the right strategy is to recruit an integrating factor, which here must be a
function of both t' and r: Φ(t', r). We define a new time coordinate t(t', r) by
\[ dt = e^{2\Phi(t',r)} \left[ e^{2\alpha'(t',r)} dt' - e^{2\gamma'(t',r)} dr \right] . \tag{14.9} \]

The very explicit factor of e^{2Φ(t',r)} in front of the [. . .] parts we wanted is designed precisely
such that the right hand side of the above expression is an exact differential. In this case, it
can be shown that we can always find such a Φ(t', r). Then, using the above equations, we
obtain
\[ e^{-4\Phi} dt^2 = e^{4\alpha'} (dt')^2 - 2 e^{2(\alpha' + \gamma')} dt' dr + e^{4\gamma'} dr^2 . \tag{14.10} \]
Rearranging this and forming our (dt')^2 and 2 dt' dr pieces gives
\[ e^{2\alpha'} (dt')^2 - 2 e^{2\gamma'} dt' dr = e^{-2\alpha' - 4\Phi} dt^2 - e^{-2\alpha' + 4\gamma'} dr^2 . \tag{14.11} \]

Woohoo – the cross terms in the metric are gone! Redefining our metric ansatz functions
according to
\[
\begin{aligned}
e^{2\alpha} &= e^{-2\alpha' - 4\Phi} , \\
e^{2\beta} &= e^{2\beta'} + e^{-2\alpha' + 4\gamma'}
\end{aligned}
\tag{14.12}
\]

gives
\[ ds^2 = e^{2\alpha(t,r)} dt^2 - e^{2\beta(t,r)} dr^2 - r^2 d\Omega_2^2 . \tag{14.13} \]
The point of all this wrestling with differentials was to show that we can always choose a
coordinate system in which off-diagonal metric components are absent, even if our spherically
symmetric system is time dependent.

Our next task is going to be to show that the time dependence in the metric functions
also has to drop out. For this part, we will need to use the equations of motion for the metric
tensor field on spacetime. For the Einstein equations, we need to compute Christoffels to
get Riemanns which we can then contract to get Ricci components, e.g. via SymPy code you
wrote for HW1+HW2. We get
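\[
\begin{aligned}
\Gamma^t{}_{tt} &= \{\partial_t \alpha\} , & \Gamma^t{}_{tr} &= \partial_r \alpha , \\
\Gamma^t{}_{rr} &= \{e^{2(\beta-\alpha)} \partial_t \beta\} , & \Gamma^r{}_{tt} &= e^{2(\alpha-\beta)} \partial_r \alpha , \\
\Gamma^r{}_{tr} &= \{\partial_t \beta\} , & \Gamma^r{}_{rr} &= \partial_r \beta , \\
\Gamma^r{}_{\theta\theta} &= -r e^{-2\beta} , & \Gamma^r{}_{\phi\phi} &= -r \sin^2\theta \, e^{-2\beta} , \\
\Gamma^\theta{}_{r\theta} &= \frac{1}{r} , & \Gamma^\theta{}_{\phi\phi} &= -\sin\theta \cos\theta , \\
\Gamma^\phi{}_{r\phi} &= \frac{1}{r} , & \Gamma^\phi{}_{\theta\phi} &= \frac{\cos\theta}{\sin\theta} .
\end{aligned}
\tag{14.14}
\]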
Note that the pieces involving ∂t have been highlighted with {. . .} in the above equation so
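you can clearly see the effect of allowing time dependence. For the Ricci tensor, we obtain
\[
\begin{aligned}
R_{tt} &= e^{2(\alpha-\beta)} \left[ -\partial_r^2 \alpha - (\partial_r \alpha)^2 + \partial_r \alpha \, \partial_r \beta - \frac{2}{r} \partial_r \alpha \right]
        + \left\{ -\partial_t^2 \beta + (\partial_t \alpha)(\partial_t \beta) - (\partial_t \beta)^2 \right\} , \\
R_{tr} &= -\left\{ \frac{2}{r} \partial_t \beta \right\} , \\
R_{rr} &= \left[ \partial_r^2 \alpha + (\partial_r \alpha)^2 - (\partial_r \alpha)(\partial_r \beta) - \frac{2}{r} \partial_r \beta \right]
        + e^{2(\beta-\alpha)} \left\{ -\partial_t^2 \beta - (\partial_t \beta)^2 + (\partial_t \alpha)(\partial_t \beta) \right\} , \\
R_{\theta\theta} &= -\left[ e^{-2\beta} \left( r \partial_r \beta - r \partial_r \alpha - 1 \right) + 1 \right] , \\
R_{\phi\phi} &= \sin^2\theta \, R_{\theta\theta} .
\end{aligned}
\tag{14.15}
\]
All these components must be zero for us to have a solution of the vacuum Einstein equations.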
Note how some of the Einstein equations have turned out to be second order dynamical
equations while others are first order constraints. This is a general feature in GR.
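If you do not have your homework code to hand, the static entries of (14.14) can be spot-checked numerically. The sketch below evaluates the Christoffels of the diagonal metric (14.13) by central differences and compares them with the closed forms. The profiles chosen for α(r) and β(r), the sample point, and the function names are arbitrary illustrative assumptions.

```python
import math

def alpha(r):
    return 0.1 * r               # hypothetical profile, for illustration only

def beta(r):
    return 0.2 * math.sin(r)     # hypothetical profile, for illustration only

def metric_diag(x):
    """Diagonal of (14.13) with static alpha(r), beta(r); x = (t, r, theta, phi)."""
    _t, r, th, _ph = x
    return [math.exp(2 * alpha(r)), -math.exp(2 * beta(r)),
            -r * r, -(r * math.sin(th)) ** 2]

def d_metric(x, i, j, k, h=1e-5):
    """Central-difference estimate of d_k g_{ij} for the diagonal metric."""
    if i != j:
        return 0.0
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    return (metric_diag(xp)[i] - metric_diag(xm)[i]) / (2 * h)

def christoffel(x, a, b, c):
    """Gamma^a_{bc} = (1/2) g^{aa} (d_b g_{ac} + d_c g_{ab} - d_a g_{bc})."""
    g_inv_aa = 1.0 / metric_diag(x)[a]
    return 0.5 * g_inv_aa * (d_metric(x, a, c, b) + d_metric(x, a, b, c)
                             - d_metric(x, b, c, a))

T_, R_, TH, PH = 0, 1, 2, 3
x0 = (0.0, 2.0, 1.0, 0.5)
r, th = x0[1], x0[2]
da, db = 0.1, 0.2 * math.cos(r)          # exact radial derivatives of alpha, beta
expected = {
    (T_, T_, R_): da,
    (R_, T_, T_): math.exp(2 * (alpha(r) - beta(r))) * da,
    (R_, R_, R_): db,
    (R_, TH, TH): -r * math.exp(-2 * beta(r)),
    (R_, PH, PH): -r * math.sin(th) ** 2 * math.exp(-2 * beta(r)),
    (TH, R_, TH): 1.0 / r,
    (TH, PH, PH): -math.sin(th) * math.cos(th),
    (PH, R_, PH): 1.0 / r,
    (PH, TH, PH): math.cos(th) / math.sin(th),
}
max_error = max(abs(christoffel(x0, *idx) - val) for idx, val in expected.items())
print(max_error)
```

The helper exploits the fact that the metric is diagonal, so it is not a general-purpose Christoffel routine.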
First, let us look at Rtr . This must be zero, which demands of β(t, r) that
∂t β(t, r) = 0 ⇒ β = β(r) . (14.16)
You can see by looking for the {. . .} parts in the Riccis that many terms now drop out
completely because β is a function of r only. Obviously, this simplifies our life quite a lot!
Second, let us notice that the Rθθ = 0 equation (a first order constraint equation) is
relatively simple. Let us take a time derivative of it,
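\[ \partial_t (R_{\theta\theta}) = 0 = -2 (\partial_t \beta) e^{-2\beta} \left[ r \partial_r \beta - r \partial_r \alpha - 1 \right] + e^{-2\beta} \left[ r \partial_t \partial_r (\beta - \alpha) \right] . \tag{14.17} \]
But since ∂_t β = 0 by our R_tr = 0 equation, we have
\[ e^{-2\beta} \, r \, \partial_t \partial_r (\beta - \alpha) = 0 . \tag{14.18} \]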

Then, using what we know about β(r), we can partially integrate to get

α(t, r) = f (r) + g(t) . (14.19)

Notice how the only remaining place where we have time dependence is in the tt component
of the metric. What a stroke of luck! This means that we can absorb it simply by doing a
coordinate transformation involving only time (not radius or angular coordinates),

\[ d\tilde{t} = e^{g(t)} dt . \tag{14.20} \]

Let us redefine our time coordinate to correspond to this t̃ (we drop the tilde, for notational
clarity). Then we have
β = β(r) , α = α(r) . (14.21)
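Third, let us look at the remaining (more complex) tt and rr Einstein equations,
\[ 0 = \partial_r^2 \alpha + (\partial_r \alpha)^2 - \partial_r \alpha \, \partial_r \beta + \frac{2}{r} (\partial_r \alpha) , \tag{14.22} \]
\[ 0 = -\partial_r^2 \alpha - (\partial_r \alpha)^2 + (\partial_r \alpha)(\partial_r \beta) + \frac{2}{r} (\partial_r \beta) . \tag{14.23} \]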
By simply adding these equations together, we obtain

∂r (α + β) = 0 . (14.24)

This means that


β(r) = const. − α(r) . (14.25)
This constant of integration can be absorbed into the time coordinate, so that β(r) = −α(r).
Fourth, we can plug this expression for β(r) in terms of α(r) back in to the Rθθ = 0
Einstein equation to obtain
$$\left[2r\partial_r\alpha + 1\right] e^{2\alpha} = 1 . \tag{14.26}$$
By quick inspection you can see that this becomes
$$\partial_r\left(r e^{2\alpha(r)}\right) = 1 , \tag{14.27}$$
so that
$$r e^{2\alpha(r)} = r + c_1 , \tag{14.28}$$
where $c_1$ is a mathematically arbitrary constant. Dividing by $r$ gives
$$e^{2\alpha(r)} = 1 + \frac{c_1}{r} . \tag{14.29}$$
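As a quick numerical sanity check, the sketch below verifies by finite differences that $e^{2\alpha} = 1 + c_1/r$ with $\beta = -\alpha$ satisfies both eq. (14.26) and eq. (14.22). The function names, the step sizes, and the sample value $c_1 = -2$ are assumptions made for illustration only.

```python
import math

def alpha(r, c1):
    """alpha(r) from eq. (14.29): e^{2 alpha} = 1 + c1 / r."""
    return 0.5 * math.log(1.0 + c1 / r)

def d_alpha(r, c1, h=1e-5):
    """Central finite-difference estimate of d(alpha)/dr."""
    return (alpha(r + h, c1) - alpha(r - h, c1)) / (2.0 * h)

def d2_alpha(r, c1, h=1e-4):
    """Central finite-difference estimate of the second derivative of alpha."""
    return (alpha(r + h, c1) - 2.0 * alpha(r, c1) + alpha(r - h, c1)) / h**2

def theta_theta_residual(r, c1):
    """Left side minus right side of eq. (14.26)."""
    return (2.0 * r * d_alpha(r, c1) + 1.0) * math.exp(2.0 * alpha(r, c1)) - 1.0

def tt_residual(r, c1):
    """Right side of eq. (14.22) after substituting beta = -alpha."""
    a1 = d_alpha(r, c1)
    return d2_alpha(r, c1) + 2.0 * a1 * a1 + 2.0 * a1 / r

c1 = -2.0  # illustrative value; the notes fix c1 physically afterwards
residuals = [(r, theta_theta_residual(r, c1), tt_residual(r, c1))
             for r in (3.0, 5.0, 10.0, 50.0)]
for r, res_thth, res_tt in residuals:
    print(f"r = {r:5.1f}  (14.26) residual = {res_thth:.2e}  (14.22) residual = {res_tt:.2e}")
```

Both residuals should vanish up to finite-difference error at every sampled radius outside $r = |c_1|$.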
We are nearly done, but we need one more physical ingredient. We need to know the
physical meaning of c1 , because it is what controls all the nontrivial radial dependence
in our new static spherically symmetric metric satisfying the vacuum Einstein equations.
This is where the Newtonian limit comes to our rescue. We know that in regions of weak
gravity, far away from the centre of our spacetime as r → ∞, $g_{tt}$ should take the form
$g_{tt} \simeq 1 + 2\Phi/c^2$, where $\Phi = -G_N M/r$. This fixes our arbitrary constant of integration to be $c_1 = -2G_N M/c^2$.

Therefore, we finally obtain the famous Schwarzschild metric in four[13] spacetime
dimensions:
$$ds^2 = \left(1 - \frac{2G_N M}{r}\right) dt^2 - \left(1 - \frac{2G_N M}{r}\right)^{-1} dr^2 - r^2 d\Omega_2^2 . \tag{14.30}$$
Birkhoff's Theorem says that this is the unique spherically symmetric solution of
the vacuum Einstein equations, and in particular that any such solution is static. We sketched
a proof of this en route, when we found that
the Einstein equations would not allow time dependence. Note that in the solution we see
GN , which is a theory parameter, and M , which is a solution parameter.
One of the two most physically intriguing things about this solution, in this coordinate
system, is that there is a place where grr blows up (and gtt goes to zero). This is known as
the event horizon. It is located at the Schwarzschild radius
$$r_S = \frac{2G_N M}{c^2} , \tag{14.31}$$
where I have temporarily shown the factors of c for physical clarity. Try calculating your
own Schwarzschild radius. You do not fit inside this radius, so you are not a black hole.
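Here is a minimal sketch of that calculation using eq. (14.31). The rounded SI values of the constants and the 70 kg body mass are assumptions chosen for illustration.

```python
# Rounded SI constants (assumed values for this sketch).
G_N = 6.674e-11   # Newton's constant, m^3 kg^-1 s^-2
C = 2.998e8       # speed of light, m/s
M_SUN = 1.989e30  # solar mass, kg

def schwarzschild_radius(mass_kg):
    """Schwarzschild radius r_S = 2 G_N M / c^2, in metres."""
    return 2.0 * G_N * mass_kg / C**2

r_sun = schwarzschild_radius(M_SUN)
r_person = schwarzschild_radius(70.0)
print(f"Sun:          r_S = {r_sun:.0f} m")
print(f"70 kg person: r_S = {r_person:.2e} m")
```

The Sun comes out at roughly 3 km, while a person's Schwarzschild radius is about ten orders of magnitude smaller than a proton.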
The second physically intriguing thing about this solution of Einstein’s equations is that
it has a curvature singularity at r = 0 that is not just a coordinate singularity. It is a
truly physical singularity, as you can see by computing a curvature invariant like Riemann
squared,
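$$R_{\mu\nu\sigma\rho}R^{\mu\nu\sigma\rho} = \frac{48 (G_N M)^2}{r^6} , \tag{14.32}$$
which diverges at r = 0. Notice that at $r = r_S = 2G_N M$, this curvature invariant is small
in Planck units for a big black hole,
$$\ell_P^4\, R_{\mu\nu\sigma\rho}R^{\mu\nu\sigma\rho}(r = r_S) = \frac{12\,\ell_P^4}{r_S^4} . \tag{14.33}$$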

Conversely, for a Planck mass black hole, the curvature would be Planckian.
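To put numbers on eq. (14.33), the sketch below evaluates the horizon curvature in Planck units for a solar-mass and for a Planck-mass black hole. The rounded constants and the function name are assumptions of this sketch.

```python
import math

# Rounded SI constants (assumed values for this sketch).
G_N = 6.674e-11    # m^3 kg^-1 s^-2
C = 2.998e8        # m/s
HBAR = 1.0546e-34  # J s
M_SUN = 1.989e30   # kg

PLANCK_LENGTH = math.sqrt(HBAR * G_N / C**3)
PLANCK_MASS = math.sqrt(HBAR * C / G_N)

def horizon_curvature_planck_units(mass_kg):
    """l_P^4 times Riemann squared at r = r_S, i.e. 12 (l_P / r_S)^4 from eq. (14.33)."""
    r_s = 2.0 * G_N * mass_kg / C**2
    return 12.0 * (PLANCK_LENGTH / r_s) ** 4

k_sun = horizon_curvature_planck_units(M_SUN)
k_planck = horizon_curvature_planck_units(PLANCK_MASS)
print(f"solar-mass black hole:  {k_sun:.2e}")
print(f"Planck-mass black hole: {k_planck:.3f}")
```

For a Planck-mass hole $r_S = 2\ell_P$, so the invariant is $12/16 = 0.75$, of order one, while for a solar-mass hole it is smaller by roughly 150 orders of magnitude.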
A third aspect of this solution should also grab your physical interest: the nonlinearity
of gravity manifest in it. Nonlinearity is what allows there to be a nontrivial solution of the
vacuum Einstein equations (ones with $T_{\mu\nu} = 0$) at all. Compare to Newtonian gravity, where
a zero mass on the RHS of the Poisson equation would result in a zero Newtonian potential!

Mathematically, the mass M of a classical Schwarzschild black hole might in principle
take any value from −∞ to +∞, because rS arose as a mere constant of integration of
Einstein’s equations. However, physically there are limits to what the mass can be. For
starters, M must be finite for a physically reasonable solution. More importantly, the mass
must be nonnegative, M ≥ 0, because the singularity is not covered by a horizon if the
Schwarzschild radius is negative! When M < 0, the gravitational redshift also walks off into
the complex plane, and we are in trouble interpreting what the heck our spacetime might
mean. So already at the classical level, we can imagine why taking our black holes to have
non-negative mass is a sensible physical precaution. (Of course, the case M = 0 is Minkowski
spacetime.)

[13] Cousins of Schwarzschild which are asymptotically flat are available in various dimensions for D > 3.
They have $1/r^{D-3}$ dependence rather than $1/r$ dependence in the metric.

There is a more sophisticated argument available for mass non-negativity that takes into
account quantum corrections to classical gravity, first made in 1995 by G.T. Horowitz and
R.C. Myers. They argued that if a negative-mass black hole solution were physical, in the
sense that quantum gravity corrections somehow ‘fixed up’ the negative-mass naked singularity
into some physical blob with large-but-finite curvature, then the vacuum of quantum
gravity would be unstable. Their logic was this: we could reduce the energy of our system
from the vacuum state by simply pair-producing more and more blob-antiblob pairs. This
works because each blob has negative energy and so does each anti-blob! The existence of
negative-mass ‘black holes’ would therefore thoroughly destabilize the vacuum of quantum
gravity, which is the foundation upon which we lay excitations of quantum fields describing
the fluctuating degrees of freedom of the system. The result would be a horrible, physically
inconsistent mess.
The moral of this mass positivity story is this: do not trust that every mathematical
solution of a physically interesting set of PDEs is physical. We must also check that physical
boundary conditions are obeyed, and ensure that basic physical principles like stability of
the vacuum are preserved. This is why we assume henceforth that MBH ≥ 0.

15 M02Nov
15.1 TOV equation for a star
Let us now see what changes when we allow an energy-momentum tensor in our static,
spherically symmetric spacetime. The simplest kind of thing to consider is called a perfect
fluid. What is a perfect fluid? Physically, it is a kind of spherical cow approximation,
in which we model a system like the ball of gas we call our Sun by a simple macroscopic
fluid, described only by its proper energy density ρ and pressure p in the instantaneous rest
frame. We ignore shear viscosity, bulk viscosity, and heat conduction. For a perfect fluid,
the energy-momentum tensor Tµν can be written in the form
$$T^{\rm p.f.}_{\mu\nu} = \left(\rho + \frac{p}{c^2}\right) U_\mu U_\nu - p\, g_{\mu\nu} . \tag{15.1}$$
This obeys the conservation equation
$$\nabla^\mu T^{\rm p.f.}_{\mu\nu} = 0 . \tag{15.2}$$
In flat Minkowski spacetime in Cartesian coordinates, in the Newtonian limit, conservation
of energy-momentum can be seen to reduce to (a) the continuity equation, and (b) the Euler
equation, the classical equation of motion for a perfect fluid. For details, see §8.3 of HEL.
Here, we work in curved spacetime so our story is more involved.
As you can check, in comoving coordinates, we have only the time component of the
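4-velocity and its magnitude is set by the timelike condition $U^\mu U_\mu = 1$. Then the Einstein
equations for our static, spherically symmetric star involve only radial dependence, and they
are (with c = 1)
$$\frac{1}{r^2}\, e^{-2\beta}\left[2r\partial_r\beta - 1 + e^{2\beta}\right] = 8\pi G_N\, \rho , \tag{15.3}$$
$$\frac{1}{r^2}\, e^{-2\beta}\left[2r\partial_r\alpha + 1 - e^{2\beta}\right] = 8\pi G_N\, p , \tag{15.4}$$
$$e^{-2\beta}\left[\partial_r^2\alpha + (\partial_r\alpha)^2 - \partial_r\alpha\,\partial_r\beta + \frac{1}{r}(\partial_r\alpha - \partial_r\beta)\right] = 8\pi G_N\, p . \tag{15.5}$$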
Note very carefully here the difference between ρ(r) and p(r). Make sure you write ρs in
your own handwritten notes in such a way that they are easily distinguishable from ps.
Now, we have a set of three coupled ODEs in α(r), β(r), ρ(r), p(r). Without some physical
input there are not enough equations to solve the system. But we can do it if we recruit
conservation of the energy-momentum tensor ∇µ Tµν = 0 and provide an equation of state.
The tt Einstein equation involves only β and ρ: it does not involve α. This allows us to
define a mass function m(r) such that
$$e^{2\beta} = \left[1 - \frac{2G_N m(r)}{r}\right]^{-1} . \tag{15.6}$$
Then in terms of m(r) rather than β(r), the tt Einstein equation becomes $dm/dr = 4\pi r^2 \rho(r)$,
which we can immediately integrate to
$$m(r) = 4\pi \int_0^r d\tilde{r}\, \tilde{r}^2 \rho(\tilde{r}) . \tag{15.7}$$

You might look at this formula and think “Oh! This is just the natural answer: you
take the mass density and multiply by the surface area, and integrate radially.” But that
would be too quick, because the spatial volume element in our curved spacetime metric is actually
$dr\, d\theta\, d\phi\; r^2 \sin\theta\, e^{\beta(r)}$. So if we wanted to define the true integrated energy density, we would instead
calculate
$$\bar{M} = 4\pi \int_0^R d\tilde{r}\, \frac{\tilde{r}^2 \rho(\tilde{r})}{\sqrt{1 - 2G_N m(\tilde{r})/\tilde{r}}} \tag{15.8}$$
and this is greater than M = m(R) because of the binding energy (a concept which does make sense
in GR for spherical stars).
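The difference between the two integrals can be made concrete with a small numerical sketch. It assumes a constant-density star, units with $G_N = c = 1$, and a simple midpoint rule; the function name and sample values are illustrative only.

```python
import math

def masses_constant_density(R, r_s, n=20000):
    """Return (M, M_bar) for a constant-density star, in units with G_N = c = 1.

    M is the integral in eq. (15.7) evaluated at r = R; M_bar is the integral
    in eq. (15.8), which uses the proper spatial volume element.
    """
    M = 0.5 * r_s
    rho = 3.0 * M / (4.0 * math.pi * R**3)
    dr = R / n
    m_total = 0.0
    m_bar = 0.0
    for i in range(n):
        r = (i + 0.5) * dr                 # midpoint of the i-th radial shell
        m_r = M * (r / R) ** 3             # m(r) at constant density
        shell = 4.0 * math.pi * r * r * rho * dr
        m_total += shell
        m_bar += shell / math.sqrt(1.0 - 2.0 * m_r / r)
    return m_total, m_bar

M, M_bar = masses_constant_density(R=1.0, r_s=0.5)
print(f"M = {M:.5f}, M_bar = {M_bar:.5f}, difference = {M_bar - M:.5f}")
```

For this fairly compact example ($r_S/R = 0.5$) the proper-volume integral exceeds M by about 20 percent.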
The radial Einstein equation becomes

$$\frac{d\alpha}{dr} = \frac{\left[G_N m(r) + 4\pi G_N r^3 p(r)\right]}{r\left[r - 2G_N m(r)\right]} . \tag{15.9}$$

To get any further, we need to recruit energy-momentum tensor conservation. With only
radial dependence, this gives

$$\left[\rho(r) + p(r)\right] \frac{d\alpha(r)}{dr} = -\frac{dp(r)}{dr} , \tag{15.10}$$
which lets us eliminate dα/dr in favour of dp/dr. We obtain

$$\frac{dp(r)}{dr} = -\frac{\left[\rho(r) + p(r)\right]\left[G_N m(r) + 4\pi G_N r^3 p(r)\right]}{r\left[r - 2G_N m(r)\right]} . \tag{15.11}$$

This is the Tolman-Oppenheimer-Volkov equation for hydrostatic equilibrium in a star,
for the static spherically symmetric case in 4D.
In order to actually solve the TOV equation, we need to know one more equation: the
equation of state, which is a relationship p = p(ρ). For astrophysical systems, a polytropic
equation of state is often employed, which takes the form $p = K\rho^\gamma$ for some constants K, γ.
As a toy model, we can consider an incompressible star with finite constant mass density ρ∗
out to some radius R. Then the mass function is easily integrated, and $M = 4\pi R^3 \rho_*/3$. This
in turn gives
$$p(r) = \rho_*\, \frac{\sqrt{R^3 - r_S R^2} - \sqrt{R^3 - r_S r^2}}{\sqrt{R^3 - r_S r^2} - 3\sqrt{R^3 - r_S R^2}} . \tag{15.12}$$
Integrating again to find $g_{tt}$ yields
$$e^{\alpha(r)} = \frac{3}{2}\sqrt{1 - \frac{r_S}{R}} - \frac{1}{2}\sqrt{1 - \frac{r_S r^2}{R^3}} , \qquad r < R . \tag{15.13}$$
The pressure increases near the core, even though we have assumed absolute incompressibility
of the fluid. In particular, as M approaches $M_{\rm max} = (4R)/(9G_N)$ from below, the pressure at the core goes to
infinity, and for larger M there is no static solution. Oops! With our simplistic ansatz, we have managed to evolve ourselves outside the
regime of validity of Einstein’s equations. Of course, real stars do not obey such a simplistic
model as an incompressible fluid. Still, it is interesting that we can get the right order of
magnitude estimate of when a star can be too big to be gravitationally stable. Sometimes,

the stellar object collapses gravitationally into a black hole. If the initial configuration had
no overall angular momentum, it will settle down eventually to a Schwarzschild solution. If
it is rotating, it will settle down instead to the Kerr black hole, whose metric we will discuss soon.
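The incompressible-star formulas can be cross-checked numerically. The sketch below integrates the TOV equation (15.11) outward with a fourth-order Runge-Kutta step, starting from the central pressure given by eq. (15.12), and confirms that the pressure vanishes at the surface. It also shows the central pressure growing without bound as $r_S/R$ approaches 8/9. Units with $G_N = c = 1$, the step count, and all function names are assumptions of this sketch.

```python
import math

def analytic_pressure(r, R, r_s, rho):
    """Eq. (15.12) for a constant-density star, in units with G_N = c = 1."""
    a = math.sqrt(R**3 - r_s * R**2)
    b = math.sqrt(R**3 - r_s * r**2)
    return rho * (a - b) / (b - 3.0 * a)

def tov_rhs(r, p, rho):
    """Right-hand side of eq. (15.11) with m(r) = 4 pi rho r^3 / 3."""
    m = 4.0 * math.pi * rho * r**3 / 3.0
    return -(rho + p) * (m + 4.0 * math.pi * r**3 * p) / (r * (r - 2.0 * m))

def surface_pressure(R, r_s, steps=2000):
    """Integrate eq. (15.11) with RK4 from near the centre to r = R; return p(R)/rho."""
    rho = 3.0 * (0.5 * r_s) / (4.0 * math.pi * R**3)  # from M = 4 pi R^3 rho / 3
    r = 1e-6 * R                                       # start just off r = 0
    p = analytic_pressure(r, R, r_s, rho)              # boundary condition near the centre
    h = (R - r) / steps
    for _ in range(steps):
        k1 = tov_rhs(r, p, rho)
        k2 = tov_rhs(r + 0.5 * h, p + 0.5 * h * k1, rho)
        k3 = tov_rhs(r + 0.5 * h, p + 0.5 * h * k2, rho)
        k4 = tov_rhs(r + h, p + h * k3, rho)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        r += h
    return p / rho

def central_pressure_over_density(compactness):
    """p(0)/rho from eq. (15.12) as a function of r_S/R, valid below 8/9."""
    return analytic_pressure(0.0, 1.0, compactness, 1.0)

p_surface = surface_pressure(R=1.0, r_s=0.5)
print(f"p(R)/rho from numerical integration: {p_surface:.2e}")
for x in (0.5, 0.8, 0.88):
    print(f"r_S/R = {x:.2f}: p(0)/rho = {central_pressure_over_density(x):.3f}")
```

Since $M_{\rm max} = 4R/(9G_N)$ corresponds to $r_S/R = 8/9 \approx 0.889$, the last sample sits just below the limit.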
Stellar evolution produces different endpoints depending on the initial mass of the star
in question. For small stars like ours, when they run out of gas for nuclear fusion, they
contract and become white dwarfs. If they are somewhat larger, above about 1.4 M⊙, known
as the Chandrasekhar limit, then electron degeneracy pressure is not sufficient to hold
them up, and they collapse further to become a neutron star (a class that includes pulsars).
Above about 3-4 M⊙, known as the Oppenheimer-Volkov limit, even neutron degeneracy
pressure is not enough. Bigger stars collapse to produce black holes.
People like to categorize black holes by size. We can distinguish three basic classes by
formation mechanism. Stellar mass black holes are produced by collapse of individual
stars, and have masses of a few to a few hundred solar masses. We also have supermassive
black holes at the centres of most galaxies, at millions to billions of solar masses. The third
class is known as primordial black holes because the only way these smaller-mass objects
could have been formed would have been in the Big Bang. The density of primordial black
holes is small, if there were any at all to begin with, because of the period of inflation which
grew the universe by gigantic amounts early in the history of its evolution, diluting them.

15.2 Geodesics of Schwarzschild


We now move to studying geodesics in the Schwarzschild spacetime explicitly. The nonzero
Christoffels for this geometry are
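$$\Gamma^t{}_{tr} = \frac{r_S}{2r(r - r_S)} ; \tag{15.14}$$
$$\Gamma^r{}_{rr} = -\Gamma^t{}_{tr} , \quad \Gamma^r{}_{tt} = \frac{r_S}{2r^3}(r - r_S) , \quad \Gamma^r{}_{\theta\theta} = -(r - r_S) , \quad \Gamma^r{}_{\phi\phi} = \sin^2\theta\; \Gamma^r{}_{\theta\theta} ; \tag{15.15}$$
$$\Gamma^\theta{}_{r\theta} = \frac{1}{r} , \quad \Gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta ; \tag{15.16}$$
$$\Gamma^\phi{}_{r\phi} = \frac{1}{r} , \quad \Gamma^\phi{}_{\theta\phi} = \frac{\cos\theta}{\sin\theta} , \tag{15.17}$$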
where rS is the Schwarzschild radius. Then our geodesic equations become
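$$\frac{d^2 t}{d\lambda^2} + \frac{r_S}{r(r - r_S)} \frac{dt}{d\lambda}\frac{dr}{d\lambda} = 0 , \tag{15.18}$$
$$\frac{d^2 r}{d\lambda^2} + \frac{r_S}{2r^3}(r - r_S)\left(\frac{dt}{d\lambda}\right)^2 - \frac{r_S}{2r(r - r_S)}\left(\frac{dr}{d\lambda}\right)^2 - (r - r_S)\left\{\left(\frac{d\theta}{d\lambda}\right)^2 + \sin^2\theta\left(\frac{d\phi}{d\lambda}\right)^2\right\} = 0 , \tag{15.19}$$
$$\frac{d^2\theta}{d\lambda^2} + \frac{2}{r}\frac{d\theta}{d\lambda}\frac{dr}{d\lambda} - \sin\theta\cos\theta\left(\frac{d\phi}{d\lambda}\right)^2 = 0 , \tag{15.20}$$
$$\frac{d^2\phi}{d\lambda^2} + \frac{2}{r}\frac{d\phi}{d\lambda}\frac{dr}{d\lambda} + 2\,\frac{\cos\theta}{\sin\theta}\frac{d\theta}{d\lambda}\frac{d\phi}{d\lambda} = 0 . \tag{15.21}$$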

These equations look rather formidable until you realize that finding the Killing vectors
allows you to find first integrals of two out of four of the geodesic equations. This follows
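because $\partial_t g_{\mu\nu} = 0$ and $\partial_\phi g_{\mu\nu} = 0$. We write the energy
$$E = p_t = \left(1 - \frac{r_S}{r}\right)\frac{dt}{d\lambda} , \tag{15.22}$$
and the angular momentum
$$L = -p_\phi = r^2 \sin^2\theta\, \frac{d\phi}{d\lambda} . \tag{15.23}$$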

The next equation we can recruit is
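$$U^\mu U_\mu = \epsilon , \tag{15.24}$$
where $\epsilon = 0$ for null geodesics and $\epsilon = +1$ for timelike geodesics. For either type of geodesic,
we have $g_{\mu\nu}U^\mu U^\nu = \epsilon$, or
$$\epsilon = \left(1 - \frac{r_S}{r}\right)\left(\frac{dt}{d\lambda}\right)^2 - \left(1 - \frac{r_S}{r}\right)^{-1}\left(\frac{dr}{d\lambda}\right)^2 - r^2\left[\left(\frac{d\theta}{d\lambda}\right)^2 + \sin^2\theta\left(\frac{d\phi}{d\lambda}\right)^2\right] . \tag{15.25}$$

Substituting in our conserved angular momentum L and energy E gives
$$\epsilon = \left(1 - \frac{r_S}{r}\right)^{-1} E^2 - \left(1 - \frac{r_S}{r}\right)^{-1}\left(\frac{dr}{d\lambda}\right)^2 - r^2\left(\frac{d\theta}{d\lambda}\right)^2 - \frac{L^2}{r^2\sin^2\theta} . \tag{15.26}$$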

Our next step is a piece of physics input. We can use rotational symmetry to pick
θ = π/2. It is consistent with the geodesic equations to leave dθ/dλ = 0 for all affine time.
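Then
$$\frac{1}{2}\left(\frac{dr}{d\lambda}\right)^2 = \frac{1}{2}E^2 - \frac{1}{2}\left(1 - \frac{r_S}{r}\right)\left(\epsilon + \frac{L^2}{r^2}\right) . \tag{15.27}$$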
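As a worked step, the passage from eq. (15.26) to eq. (15.27) can be spelled out as follows. The shorthand $f(r)$ is introduced only for this sketch and does not appear elsewhere in these notes; $\epsilon$ is the normalization constant of eq. (15.24).

```latex
% Shorthand assumed for this sketch: f(r) \equiv 1 - r_S/r.
\begin{align}
% Set \theta = \pi/2 and d\theta/d\lambda = 0 in eq. (15.26):
\epsilon &= \frac{E^2}{f} - \frac{1}{f}\left(\frac{dr}{d\lambda}\right)^2 - \frac{L^2}{r^2} , \\
% Multiply through by f and solve for the radial term:
\left(\frac{dr}{d\lambda}\right)^2 &= E^2 - f\left(\epsilon + \frac{L^2}{r^2}\right) , \\
% Divide by 2 to reach eq. (15.27):
\frac{1}{2}\left(\frac{dr}{d\lambda}\right)^2 &= \frac{1}{2}E^2
  - \frac{1}{2}\left(1 - \frac{r_S}{r}\right)\left(\epsilon + \frac{L^2}{r^2}\right) .
\end{align}
```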
Some textbooks like to help you visualize this setup by making a mapping onto a familiar
non-relativistic Newtonian system, as follows,

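$$m \to 1 , \tag{15.28}$$
$$|\vec{v}|^2 \to \left(\frac{dr}{d\lambda}\right)^2 , \tag{15.29}$$
$$E_{\rm tot} \to \frac{E^2}{2} , \tag{15.30}$$
$$V_{\rm eff}(r) \to \frac{1}{2}\left(1 - \frac{r_S}{r}\right)\left(\epsilon + \frac{L^2}{r^2}\right) = \frac{\epsilon}{2} - \frac{\epsilon\, r_S}{2r} + \frac{L^2}{2r^2} - \frac{r_S L^2}{2r^3} . \tag{15.31}$$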

You can learn everything you need to know about the availability of various types of orbits
(for either null or timelike geodesics) by plotting this “effective potential”. Carroll has two
great figures in §5.4, Figures 5.4 and 5.5.

For Newtonian gravity, there are no massless particle orbits. Massive particles can have
stable bound orbits, depending on the angular momentum per unit mass.
For Einsteinian gravity, photons can orbit, but they are unstable. Any small perturbation
and the path flings off back out to infinity (sometimes after buzzing around the black hole
horizon a few times) or falls inexorably into the black hole. Massive particles, on the other
hand, can have bound orbits, and the outer solution radius gives a stable orbit while the
inner one gives an unstable orbit.
Circular orbits can happen when $dV_{\rm eff}/dr = 0$ at $r = r_*$, solving the equation
$$\frac{\epsilon\, r_S}{2}\, r_*^2 - L^2 r_* + \frac{3 r_S}{2} L^2 \gamma = 0 , \tag{15.32}$$
where (following Carroll) we introduce γ = 1 for GR and γ = 0 for NG (Newtonian gravity).
Specifically, for massless geodesics, r∗ = 3rS γ/2, and as you can see by evaluating the second
derivative of Veff (r), it is an unstable maximum. Massive geodesics provide a richer context.

We find two solutions,
$$\frac{r_*}{r_S} = \frac{L^2}{r_S^2} \pm \sqrt{\frac{L^4}{r_S^4} - \frac{3L^2\gamma}{r_S^2}} . \tag{15.33}$$
From this you can quickly see that NG has only one solution, at r∗ = 2L2 /rS . But for GR
the story is a lot more interesting. There are two solutions and, as you can see by computing
the second derivative of the effective potential, the outer one is stable while the inner one
is unstable. As you can discover by inspecting the negative root of eq.(15.33) carefully, for
radii smaller than $r = r_{*,\rm ICO}$, where
$$r_{*,\rm ICO} = \frac{3 r_S}{2} , \tag{15.34}$$
there are no circular orbits at all, stable or unstable. Nothing can orbit that close without falling across
the horizon. Gravity is too strong. The angular momentum at which the stable and unstable
orbits coalesce for timelike geodesics is $L^4 = 3 r_S^2 L^2 \gamma$, i.e., where the discriminant in eq.(15.33)
vanishes. The corresponding orbit is called the ISCO, or the Innermost Stable Circular Orbit,
$$r_{*,\rm ISCO} = 3 r_S = 2 r_{*,\rm ICO} . \tag{15.35}$$
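The circular-orbit algebra can be checked with a short sketch. It evaluates the two roots of eq. (15.33) for timelike geodesics in GR (γ = 1), tests their stability using the second derivative of the effective potential in eq. (15.31), and recovers the ISCO when the discriminant vanishes. The function names and the choice of passing $L^2$ rather than $L$ are assumptions of this sketch.

```python
import math

def circular_orbit_radii(L_squared, r_s, gamma=1.0):
    """Roots of eq. (15.33) for timelike geodesics: (inner, outer), or None if there are none."""
    disc = L_squared**2 - 3.0 * r_s**2 * L_squared * gamma
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    return ((L_squared - root) / r_s, (L_squared + root) / r_s)

def d2_v_eff(r, L_squared, r_s, gamma=1.0):
    """Second r-derivative of V_eff in eq. (15.31) with epsilon = 1."""
    return -r_s / r**3 + 3.0 * L_squared / r**4 - 6.0 * gamma * r_s * L_squared / r**5

r_s = 1.0
inner, outer = circular_orbit_radii(L_squared=4.0, r_s=r_s)
print(f"L^2 = 4 r_S^2: inner = {inner} (V'' = {d2_v_eff(inner, 4.0, r_s):+.4f}), "
      f"outer = {outer} (V'' = {d2_v_eff(outer, 4.0, r_s):+.4f})")

isco = circular_orbit_radii(L_squared=3.0, r_s=r_s)   # discriminant vanishes here
print(f"L^2 = 3 r_S^2: both roots coincide at r = {isco[0]} r_S")
print(f"L^2 = 2 r_S^2: {circular_orbit_radii(2.0, r_s)}")  # no circular orbits
```

With $L^2 = 4r_S^2$ the roots are $2r_S$ (negative second derivative, unstable) and $6r_S$ (positive, stable), and as $L^2$ grows the inner root tends to $3r_S/2$.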

The following image is an artist’s rendition of what the black hole in the Large Magellanic
Cloud might look like (credit: Alain Riazuelo / CC BY-SA 2.5.). It looks weird because
you are not used to photon trajectories being bent. The strong and nonlinear gravitational
effects of the black hole are quite extreme!

16 R05Nov
16.1 Causal structure of Schwarzschild

How do light cones behave in the spacetime of the Schwarzschild black hole? In the original
Schwarzschild coordinates, we had the spacetime metric

$$ds^2 = \left(1 - \frac{r_S}{r}\right) dt^2 - \left(1 - \frac{r_S}{r}\right)^{-1} dr^2 - r^2 d\Omega_2^2 . \tag{16.1}$$
Obviously, we will have to suppress some of our four spacetime coordinates in order to fit
a diagram onto a two-dimensional page. It will assist our visualizations to suppress the
angular directions and focus attention on the time and radial directions. (Tip: be sure to
double-check spacetime diagrams in textbooks to eliminate avoidable confusion over which
coordinates are suppressed.)
For a null trajectory we have ds2 = 0. For purely radial motion, we can immediately
read off the slope of the light cone,
$$\frac{dt}{dr} = \pm\left(1 - \frac{r_S}{r}\right)^{-1} . \tag{16.2}$$
As we would expect, the magnitude of this tends to unity at r → ∞. Light rays go at 45°
on a (t, r) diagram. At slightly smaller radii, it increases a little. What happens at r → rS
you may not have expected: the magnitude of the slope of the light cone blows up! The
light cone appears to be squashed down to have zero opening angle. This is a coordinate
singularity.
Inside the Schwarzschild radius, gtt and grr both flip sign, seemingly switching roles.
This is a symptom of the fact that this coordinate system does not actually cover the region
of the black hole spacetime inside the horizon. Another symptom of the disease we see here
is that it appears a photon would take an infinite amount of time to fall into the black hole.
It does – in these coordinates.

Redshifting depends on the coordinate system. To do a better job of probing the causal
structure of the Schwarzschild black hole spacetime, we are actually better off aiming to
answer more invariant questions, like “How much affine parameter does it take before a
freely falling particle hits the singularity?” We can also hunt for better coordinate systems
which do cover the entire black hole spacetime, not just the region outside the horizon.

Let us start by inspecting what we have so far in our Schwarzschild coordinates. For
radial null paths,
$$\frac{dt}{dr} = \frac{\pm 1}{(1 - r_S/r)} . \tag{16.3}$$
Defining
$$\frac{dt}{dr_*} = \pm 1 \tag{16.4}$$
gives
$$\frac{r_*}{r_S} = \int \frac{d(r/r_S)}{(1 - r_S/r)} , \tag{16.5}$$
so that
$$r_* = r + r_S \ln\left(\frac{r}{r_S} - 1\right) , \tag{16.6}$$
which is known as the tortoise coordinate. This ranges over r∗ ∈ (−∞, +∞) as r ranges over (rS, ∞), while the
original radial coordinate ranged over r ∈ [0, ∞). So this tortoise coordinate also only covers
the region outside the horizon. The benefit of using these coordinates is that the radial null
paths are simple,
$$t = \pm r_* + c . \tag{16.7}$$
The light cones are all at 45° in tortoise coordinates.
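A small sketch of the tortoise coordinate in eq. (16.6) follows. It checks numerically that $dr_*/dr = (1 - r_S/r)^{-1}$, as eqs. (16.3) and (16.4) require, and shows how slowly $r_*$ runs off to −∞ near the horizon. The function names and sample radii are assumptions of this sketch.

```python
import math

def tortoise(r, r_s):
    """Tortoise coordinate r_* = r + r_S ln(r/r_S - 1), defined only for r > r_S."""
    if r <= r_s:
        raise ValueError("the tortoise coordinate only covers r > r_S")
    return r + r_s * math.log(r / r_s - 1.0)

def d_tortoise(r, r_s, h=1e-6):
    """Central finite-difference estimate of dr_*/dr."""
    return (tortoise(r + h, r_s) - tortoise(r - h, r_s)) / (2.0 * h)

r_s = 1.0
slope_numeric = d_tortoise(3.0, r_s)
slope_exact = 1.0 / (1.0 - r_s / 3.0)
print(f"dr_*/dr at r = 3 r_S: numeric {slope_numeric:.6f}, exact {slope_exact:.6f}")

near_horizon = tortoise(r_s * (1.0 + 1e-12), r_s)
print(f"r_* at r = r_S (1 + 1e-12): {near_horizon:.2f}")
```

The divergence at the horizon is only logarithmic: a fractional distance of $10^{-12}$ from $r_S$ gives $r_*$ of about $-27\, r_S$.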
Using the tortoise coordinate, our black hole metric becomes
$$ds^2 = \left(1 - \frac{r_S}{r(r_*)}\right)\left[dt^2 - dr_*^2\right] - r^2(r_*)\, d\Omega_2^2 . \tag{16.8}$$
Next, let us try adapting our coordinates to null motion. Define null coordinates in the
time-radius plane,

u ≡ t − r∗ , (16.9)
v ≡ t + r∗ . (16.10)

Then our black hole spacetime metric takes the form
$$ds^2 = \left(1 - \frac{r_S}{r(u,v)}\right) dv^2 - 2\, dr\, dv - r^2(u,v)\, d\Omega_2^2 . \tag{16.11}$$
These coordinates are called Eddington-Finkelstein coordinates. As you can check, this
metric remains invertible, including at the horizon.
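Then for radial null motion, we have
$$\left(1 - \frac{r_S}{r}\right)\left(\frac{dv}{dr}\right)^2 - 2\,\frac{dv}{dr} = 0 , \tag{16.12}$$
so that
$$\frac{dv}{dr} = \begin{cases} \dfrac{2}{(1 - r_S/r)} & \text{(outgoing)} , \\ 0 & \text{(ingoing)} . \end{cases} \tag{16.13}$$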

Because the first solution is positive, it is relevant for outgoing radial null paths. The second
solution is the one relevant for ingoing radial null paths.
Notice what this implies about our light cones in (v, r) coordinates. We have that for
the ingoing ‘side’ (on a 2D diagram) of the light cones, this always hugs v =const. For the
outgoing ‘side’ of the light cones, the slope depends on r/rS . At r → ∞, this slope is 2. If
we are at a finite r > rS , then the slope is positive and bigger than 2. At r = rS the slope
becomes infinite, pointing straight up the v-axis. For r < rS the slope becomes negative,
and points towards the inside of the black hole only. This represents the physics that we
want in a rather more elegant way than Schwarzschild coordinates did. Infalling photons do
not make it out of the black hole once they have crossed the horizon.
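The behaviour of the outgoing side of the light cone described above can be tabulated directly from eq. (16.13). The function name and the sample radii are assumptions of this sketch.

```python
def outgoing_slope(r, r_s):
    """dv/dr for the outgoing radial null ray in ingoing Eddington-Finkelstein coordinates."""
    if r == r_s:
        raise ValueError("the slope is infinite exactly at the horizon")
    return 2.0 / (1.0 - r_s / r)

r_s = 1.0
samples = {r: outgoing_slope(r, r_s) for r in (1.0e9, 10.0, 2.0, 1.01, 0.99, 0.5)}
for r, slope in samples.items():
    print(f"r = {r:>12.2f} r_S: dv/dr = {slope:+.3f}")
```

Far away the slope tends to 2, it grows without bound as r approaches $r_S$ from outside, and it is negative everywhere inside, so even the "outgoing" ray moves to smaller r there.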
The following picture is a summary of what we have found out about light cones in
Eddington-Finkelstein coordinates.

A nice feature of Eddington-Finkelstein coordinates is that our light-cones do not get
squished down to infinitely thin pencils. But note carefully that they do turn over at the
horizon. Note that with these new coordinates we have managed to cover the region t → +∞
of the black hole spacetime, because at constant v, decreasing r sends t → +∞. So we have
extended in one direction. How about the other direction?
Are there other coordinates that might restore the symmetry between u and v? Our
Eddington-Finkelstein coordinates so far privileged v. Because of that, they are known as
ingoing Eddington-Finkelstein coordinates. It turns out that we can alternatively find a
second set of Eddington-Finkelstein coordinates, adapted for outgoing rather than ingoing
null paths, in which we have
$$ds^2 = \left(1 - \frac{r_S}{r(u,v)}\right) du^2 + 2\, dr\, du - r^2(u,v)\, d\Omega_2^2 . \tag{16.14}$$

Working back from our definitions, we see that this corresponds to the region t → −∞, as
compared to the region t → +∞ which the first set of Eddington-Finkelstein coordinates
extended us to. In outgoing Eddington-Finkelstein coordinates (u, r), the slope of the light
cones is
$$\frac{du}{dr} = \begin{cases} -\dfrac{2}{(1 - r_S/r)} & \text{(ingoing)} , \\ 0 & \text{(outgoing)} . \end{cases} \tag{16.15}$$
The accompanying picture illustrates this.

Can we uncover yet more regions of the Schwarzschild black hole spacetime? It turns
out that the answer is yes, if we use another even smarter coordinate system known as
Kruskal-Szekeres coordinates.
Our first guess for how to get further is furnished by choosing both light-cone coordinates
(u, v), in place of (u, r), or (v, r), or (t, r∗ ), or (t, r). We find immediately that
$$ds^2 = \left(1 - \frac{r_S}{r(u,v)}\right) du\, dv - r^2(u,v)\, d\Omega_2^2 , \tag{16.16}$$
where
$$\frac{1}{2}(v - u) = r + r_S \ln\left(\frac{r}{r_S} - 1\right) , \tag{16.17}$$
which implicitly defines r as a function of (u, v).
This is looking more promising: our light-cones will stay at 45° in these (u, v) coordinates.
But there is still one big fly in the ointment: we still have the problem that the horizon is
located infinitely far away. To cure this symptom, we make an exponential mapping to bring
the horizon to a finite place,
$$U = -\exp\left(-\frac{u}{2r_S}\right) , \qquad V = +\exp\left(+\frac{v}{2r_S}\right) . \tag{16.18}$$
In these Kruskal-Szekeres coordinates we find
$$ds^2 = \frac{4 r_S^3}{r(U,V)}\, e^{-r(U,V)/r_S}\, dU\, dV - r^2(U,V)\, d\Omega_2^2 . \tag{16.19}$$
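Picking apart the null (U, V) coordinates into time T and radius R coordinates, via
$$T = \frac{1}{2}(U + V) = \sqrt{\frac{r}{r_S} - 1}\; \exp\left(\frac{r}{2r_S}\right) \sinh\left(\frac{t}{2r_S}\right) , \tag{16.20}$$
$$R = \frac{1}{2}(V - U) = \sqrt{\frac{r}{r_S} - 1}\; \exp\left(\frac{r}{2r_S}\right) \cosh\left(\frac{t}{2r_S}\right) , \tag{16.21}$$
gives the spacetime metric
$$ds^2 = \frac{4 r_S^3}{r(T,R)}\, e^{-r(T,R)/r_S} \left(dT^2 - dR^2\right) - r^2(T,R)\, d\Omega_2^2 , \tag{16.22}$$
where r(T, R) is implicitly defined by
$$T^2 - R^2 = \left(1 - \frac{r}{r_S}\right) e^{r/r_S} . \tag{16.23}$$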
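The map from Schwarzschild to Kruskal-Szekeres coordinates in the exterior region can be sketched in a few lines. The code below evaluates eqs. (16.20) and (16.21) and checks the implicit relation (16.23) together with $T/R = \tanh(t/2r_S)$. The function name and the sample point are assumptions of this sketch, which covers only $r > r_S$.

```python
import math

def kruskal_from_schwarzschild(t, r, r_s):
    """Return (T, R) from eqs. (16.20)-(16.21); valid in the exterior region r > r_S."""
    if r <= r_s:
        raise ValueError("this form of the map covers only r > r_S")
    amplitude = math.sqrt(r / r_s - 1.0) * math.exp(r / (2.0 * r_s))
    T = amplitude * math.sinh(t / (2.0 * r_s))
    R = amplitude * math.cosh(t / (2.0 * r_s))
    return T, R

r_s, t, r = 1.0, 1.3, 2.5
T, R = kruskal_from_schwarzschild(t, r, r_s)
lhs = T * T - R * R
rhs = (1.0 - r / r_s) * math.exp(r / r_s)
print(f"T = {T:.6f}, R = {R:.6f}")
print(f"T^2 - R^2 = {lhs:.6f}, (1 - r/r_S) e^(r/r_S) = {rhs:.6f}")
print(f"T/R = {T / R:.6f}, tanh(t / 2 r_S) = {math.tanh(t / (2.0 * r_s)):.6f}")
```

Holding r fixed and varying t traces out one branch of a hyperbola in the (T, R) plane, while holding t fixed and varying r moves along a straight line through the origin.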

This slick manipulation probably feels like it just happened at 100km/h. So let us slow down
a little, and unpack all of what these new amazing Kruskal coordinates allow us to see for
the physics of the Schwarzschild black hole.
In Kruskal-Szekeres coordinates,

• Radial null motion occurs along

T = ±R + c1 . (16.24)

• Surfaces of constant r are at
$$T^2 - R^2 = \left(1 - \frac{r}{r_S}\right) e^{r/r_S} , \tag{16.25}$$

which are hyperbolas in the (T, R) plane.

• Surfaces of constant t are at
$$\frac{T}{R} = \tanh\left(\frac{t}{2r_S}\right) , \tag{16.26}$$

which are simply straight lines in the (T, R) plane.

• The event horizon is at


T = ±R . (16.27)
This has two solutions, corresponding physically to having both a black hole horizon
and a white hole horizon.

• The singularity is at
$$T^2 - R^2 = 1 . \tag{16.28}$$
This has two solutions, one which corresponds to a black hole singularity and one
which corresponds to a white hole singularity.

• What ranges do our coordinates (U, V ) cover? We see that (U, V ) range over all possible
values aside from where the curvature singularity occurs:

$$-\infty < T < \infty , \qquad -\infty < R < \infty , \qquad T^2 - R^2 < 1 . \tag{16.29}$$

Note: it appears that the U, V may be ill-defined inside the horizon, but it is actually
the original t, r coordinates that are ill-defined there. The U, V Kruskal coordinates
are well-defined, except of course in the disallowed singular region. This is the really
key part of using Kruskal coordinates which allows us to obtain what is known as the
maximal analytic extension of the Schwarzschild spacetime. (In the figure below,
rg is our rS , t̃ is our T , and r̃ is our R.)

Notice how the Kruskal diagram actually has extra regions by comparison to the original
Schwarzschild coordinate patch. These new extra regions can be abbreviated as II, III, and
IV. From region I we can, via future-directed null rays, go into region II. So it makes sense
to interpret this part as the region behind the black hole event horizon. And you can see
from the picture above that the black hole singularity is in region II.
Suppose, from region I, we followed instead a past-directed null ray. Then what? According
to our Kruskal diagram, we would cross a horizon to go into another region – III –
with another singularity, the white hole singularity, which can be loosely called the ‘mirror
image’ of the singularity in region II under time reversal. The horizon in region III is the
white hole horizon that we identified in our list of bullet points above.
By following future-directed null rays from region III, or past-directed null rays from
region II, we can see a second asymptotically flat region. But we can never communicate
with it! It is a causally separated place unconnected by timelike or null geodesics to the
original asymptotic region. Some people like to speak of the Schwarzschild geometry as a
“wormhole” connecting two asymptotically flat regions, but it is not physical in any sense
to call it a wormhole because it is not traversable. It closes up too quickly for any physical
observer (even an electron) to cross from I to IV. For more details, see p.228 of Carroll.
A black hole formed in gravitational collapse would involve at most regions I and II.
Regions III and IV would not be present; there would be no white hole, only a black hole.
What is a white hole, physically? Mathematically, it is the time-reverse of a black hole.
Such a beast cannot actually be formed in gravitational collapse – that produces a black hole
with a future horizon, not a white hole with a past horizon. The other interesting fact about
white holes, shown by D.M. Eardley in 1974, is that a white hole is unstable to collapsing
into a black hole. For these and other reasons, you do not need to worry about the physics
of white holes if you are considering classical gravity. Only quantum gravity theorists need
to worry our heads about such things.

17 M16Nov
17.1 Charged black holes
The Reissner-Nordstrøm solution is obtained when we assume staticity and spherical
symmetry, and allow an energy-momentum tensor coming from the electromagnetic field.
Since there are no known magnetic monopoles that could source a magnetic field, we will
stick with an electric field[14]. Then the only nonzero component of $F^{\mu\nu}$ is $F^{tr}$. Let us assume
the same metric ansatz as we had for Schwarzschild,

$$ds^2 = e^{2\alpha(r)} dt^2 - e^{2\beta(r)} dr^2 - r^2 d\Omega_2^2 . \tag{17.1}$$

The covariant source-free Maxwell equation $\nabla_\mu F^{\mu\nu} = 0$ can be rewritten in the form
$$\frac{1}{\sqrt{-g}}\, \partial_\mu\left(\sqrt{-g}\, F^{\mu\nu}\right) = 0 , \tag{17.2}$$

where {−g} is shorthand for the negative of the determinant of the downstairs spacetime
metric. This Maxwell equation simplifies significantly by virtue of spherical symmetry. In
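our spacetime ansatz, we have
$$\sqrt{-g} = e^{\alpha+\beta}\, r^2 \sin\theta , \tag{17.3}$$
so that the Maxwell equation implies
$$\partial_r\left(r^2 \sin\theta\, e^{\alpha+\beta} F^{tr}\right) = 0 , \tag{17.4}$$
which we can immediately integrate by eye to
$$F^{tr} = \frac{c_1}{r^2}\, e^{-\alpha-\beta} . \tag{17.5}$$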
(Note: if we had been more rigorous and put in a delta function charge source on the RHS
of the Maxwell equation, c1 would have been proportional to the electric charge.)
The next step in solving the Einstein-Maxwell system is to substitute in the above
electric field into the energy-momentum tensor and apply Einstein’s equations. The details
are similar in spirit but longer in practice than what we did before in deriving Schwarzschild,
so we will not drag you through the algebra. The really nice thing is that, even with the
electric field turned on, it turns out that the Einstein equations still furnish the relationship

α = −β + const. , (17.6)

between the time-time component of the metric and the space-space component. Integrating
up the θθ Einstein equation like we did for Schwarzschild produces the solution, written in the same signature as eq. (17.1),
$$ds^2_{\rm RN} = \left(1 - \frac{2G_N M}{c^2 r} + \frac{G_N Q^2}{4\pi\epsilon_0 c^4 r^2}\right) dt^2 - \left(1 - \frac{2G_N M}{c^2 r} + \frac{G_N Q^2}{4\pi\epsilon_0 c^4 r^2}\right)^{-1} dr^2 - r^2 d\Omega_2^2 , \tag{17.7}$$
where we have temporarily restored physical constants that we would usually set to unity.

[14] If you wanted to do the magnetic case, you would find $F_{\theta\phi} = P \sin\theta$, where P ∝ magnetic charge.

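We can make this look slightly prettier by defining
$$\mu \equiv \frac{G_N M}{c^2} , \qquad\text{and}\qquad q^2 \equiv \frac{G_N Q^2}{4\pi\epsilon_0 c^4} ; \tag{17.8}$$
then the metric function $1 - 2\mu/r + q^2/r^2$ vanishes at
$$r_\pm = \mu \pm \sqrt{\mu^2 - q^2} . \tag{17.9}$$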
The geometry has two event horizons, an outer horizon and an inner horizon. As you
can check by computing the full contraction of the Riemann tensor with itself, the curvature
singularity is located at r = 0. The “singularities” in the metric at r = r± are just coordinate
singularities, like the one we encountered for Schwarzschild.
There are three cases for Reissner-Nordstrøm metrics depending on the sign of what is
under the square root in the above formula.

1. $\mu^2 < q^2$: This is unphysical. The event horizon walks off into the complex plane and
the singularity at the origin is then naked. Oops!
2. $\mu^2 > q^2$: This is physical. It includes the limit of zero charge, which gives back
Schwarzschild ($r_+ = 2\mu$, $r_- = 0$). Here there are two horizons, at $r = r_\pm$. The
singularity in this case is timelike, as compared to spacelike for Schwarzschild.
3. $\mu^2 = q^2$: This is also physical, and is known as the extremal Reissner-Nordstrøm
spacetime. You can think of it as having exquisitely balanced gravitational attraction
and electric repulsion.
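The three cases above can be sketched as a small classifier built on eq. (17.9). The function name and the convention of returning None for the naked-singularity case are assumptions of this sketch; μ and q are the length-dimension parameters of eq. (17.8).

```python
import math

def rn_horizons(mu, q):
    """Return (r_minus, r_plus) from eq. (17.9), or None when mu^2 < q^2 (no horizon)."""
    disc = mu * mu - q * q
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    return (mu - root, mu + root)

def metric_function(r, mu, q):
    """The combination 1 - 2 mu / r + q^2 / r^2 that vanishes at the horizons."""
    return 1.0 - 2.0 * mu / r + q * q / (r * r)

cases = {
    "Schwarzschild limit (q = 0)": rn_horizons(1.0, 0.0),
    "sub-extremal (mu = 5, q = 3)": rn_horizons(5.0, 3.0),
    "extremal (mu = q = 1)": rn_horizons(1.0, 1.0),
    "super-extremal (mu = 1, q = 2)": rn_horizons(1.0, 2.0),
}
for label, horizons in cases.items():
    print(f"{label}: {horizons}")
```

In the sub-extremal example the horizons sit at r = 1 and r = 9, and the metric function vanishes at both.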

The Penrose diagrams for Reissner-Nordstrøm black hole spacetimes are available in HEL
§12.6, if you wish to peruse them to obtain intuition. Note however one important caveat on
the maximal analytic extensions that display an infinite number (!) of asymptotic regions.
The inner horizon has the property that probe perturbations coming in from past null infinity $\mathcal{I}^-$ tend to
bunch up there: their magnitude grows out of control. But if the perturbation amplitude
were that big, then it would surely backreact on the geometry, from having so much
energy-momentum. This would entail changing the solution that we already wrote down. What
this teaches us is that the semiclassical perturbation analysis is breaking down. Most likely,
the singularity of a real physical charged black hole would become spacelike, covered by only
one horizon, not two.
At this point we can make one more advanced comment, concerning the physical realism
of charged black hole solutions. Quantum field theory shows you that charged black holes
in real astrophysical situations will actually discharge rather quickly, via the Schwinger
process, which nucleates charged particle-antiparticle pairs (e.g. electron-positron pairs) in
an electric field, about a Compton wavelength apart. So if you mention a charged black hole
to an astrophysicist, they tend to burst out laughing. But in some ways the joke is on them,
because focusing on the dynamics of charged black holes was what led string theorists to
perform the first-ever first-principles computation of the entropy of black holes in 1996, a
discovery whose development indirectly helped me get hired! To finesse the astrophysicist’s
objection, you can imagine that the “electric” charge we are discussing is not carried by light
quanta in the theory.

A note about two neat properties of our extremal Reissner-Nordstrøm black hole. First,
we will be able to see pretty quickly that there are multi black hole solutions in this case.
This spacetime has one double horizon at $r = r_+ = r_- = |q|$,
$$ds^2_{\rm ERN} = \left(1 - \frac{|q|}{r}\right)^2 dt^2 - \left(1 - \frac{|q|}{r}\right)^{-2} dr^2 - r^2 d\Omega_2^2 . \tag{17.10}$$
We can easily define a shifted radial coordinate

ρ := r − |q| . (17.11)

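Then $d\rho = dr$ and
$$ 1 - \frac{|q|}{r} = \frac{r - |q|}{r} = \frac{\rho}{\rho + |q|} = \left(1 + \frac{|q|}{\rho}\right)^{-1} . \tag{17.12} $$
Defining
$$ H(\rho) = 1 + \frac{|q|}{\rho} , \tag{17.13} $$
we have
$$ ds^2_{\rm ERN} = -H^{-2} dt^2 + H^{2} d\rho^2 + (\rho + |q|)^2 d\Omega_2^2 = -H^{-2} dt^2 + H^{2} \left( d\rho^2 + \rho^2 d\Omega_2^2 \right) . \tag{17.14} $$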

This coordinate system is known as isotropic coordinates because the metric in parentheses is the standard Euclidean metric in spherical polar coordinates. We also find that
$$ \sqrt{G_N}\, A_t = H^{-1} - 1 . \tag{17.15} $$

If we substituted this ansatz for the gauge potential and the metric into Maxwell’s equations
and the Einstein equations, we would find that they require only one equation between them,

$$ \vec{\nabla}^2 H = 0 . \tag{17.16} $$

In other words, $H$ is a harmonic function of the Cartesian coordinates $\vec{x}$ obtained from the isotropic spherical coordinates. It is actually possible to have multi black hole solutions of
this system, because of the exact cancellation between gravitational attraction and electric
repulsion between any two of the black hole centres!
The multi-centre harmonic function is
$$ H = 1 + \sum_{a=1}^{N} \frac{G_N M_a}{|\vec{x} - \vec{x}_a|} . \tag{17.17} $$

This ability to superpose is extremely niche: it very generally fails in GR, a nonlinear theory.
Another interesting feature of the Reissner-Nordstrøm spacetime is what happens when
you take the near-horizon limit $|\vec{x}| \to 0$. This in effect removes the 1 from the harmonic
function. If you look carefully at the single-centred black hole metric in this limit, you will
find that it produces AdS$_2 \times S^2$, two-dimensional Anti de Sitter spacetime times a two-sphere.
This fact is related to the famous AdS/CFT correspondence of string theory.

17.2 Rotating black holes
Now we move to discussing the Kerr black hole, which has not only mass but also angular
momentum. Our discussion here will be based largely on HEL §13. Demanding that the
spacetime be stationary and spheroidally symmetric requires an ansatz of the form

$$ ds^2 = e^{2\alpha(r,\theta)} dt^2 - e^{2\gamma(r,\theta)} \left[ d\phi - \omega(r,\theta)\, dt \right]^2 - e^{2\beta(r,\theta)} dr^2 - e^{2\delta(r,\theta)} d\theta^2 . \tag{17.18} $$

Note how many more functions we have turned on here, and the fact that there is now both
r and θ dependence in all our metric functions. Mathematically speaking, this complicates
the hell out of the process of solving the Einstein equations, because we now have PDEs in
two variables instead of ODEs in r only. We will not actually prove that the Kerr solution
solves the vacuum Einstein equations, because the algebra is awful.15 Instead, we will derive
some fascinating physical properties of spacetimes of the above form, and just present the
Kerr solution, gift wrapped with a bow on top.
From the above ansatz we see that

$$ g_{tt} = e^{2\alpha} - e^{2\gamma} \omega^2 . \tag{17.19} $$

Since the metric has an off-diagonal component, $g_{t\phi}$, inverting to find the upstairs metric is
slightly more complicated. We can easily read off two of the components,

$$ g^{rr} = -e^{-2\beta} , \tag{17.20} $$
$$ g^{\theta\theta} = -e^{-2\delta} , \tag{17.21} $$

but for the $(t, \phi)$ block we need to invert the $2 \times 2$ matrix. The result is

$$ g^{tt} = e^{-2\alpha} , \tag{17.22} $$
$$ g^{\phi\phi} = -e^{-2\gamma} + \omega^2 e^{-2\alpha} , \tag{17.23} $$
$$ g^{t\phi} = +\omega e^{-2\alpha} . \tag{17.24} $$
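If you would rather not invert the $(t, \phi)$ block by hand, a short SymPy check does the job. This is only a sketch: the metric functions are treated as plain symbols `alpha`, `gamma`, `omega` at a fixed point $(r, \theta)$, which is all a pointwise matrix inversion needs, and the names `block` and `claimed_inverse` are illustrative rather than taken from HEL.

```python
import sympy as sp

# Metric functions evaluated at one point (r, theta), treated as plain symbols.
alpha, gamma, omega = sp.symbols("alpha gamma omega", real=True)

# Covariant (t, phi) block read off from the ansatz (17.18).
g_tt = sp.exp(2 * alpha) - sp.exp(2 * gamma) * omega**2
g_tphi = sp.exp(2 * gamma) * omega
g_phiphi = -sp.exp(2 * gamma)
block = sp.Matrix([[g_tt, g_tphi], [g_tphi, g_phiphi]])

# Upstairs components as quoted in (17.22)-(17.24).
claimed_inverse = sp.Matrix([
    [sp.exp(-2 * alpha), omega * sp.exp(-2 * alpha)],
    [omega * sp.exp(-2 * alpha),
     -sp.exp(-2 * gamma) + omega**2 * sp.exp(-2 * alpha)],
])

# If the quoted components are right, this product is the 2x2 identity.
product = (block * claimed_inverse).applyfunc(
    lambda entry: sp.simplify(sp.expand(entry))
)
print(product)
```

The same pattern (build the block, multiply by the claimed inverse, simplify) works for any metric with a single off-diagonal pair.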

It is possible to see one of the most intriguing consequences of GR, known as the dragging
of inertial frames, without getting specific about the form of any of the functions in our metric
ansatz. Since the metric obeys $\partial_\phi g_{\mu\nu} = 0$, $p_\phi$ is conserved along a geodesic. Then

$$ p^\phi = g^{\phi\mu} p_\mu = g^{\phi\phi} p_\phi + g^{\phi t} p_t , \tag{17.25} $$

and similarly
$$ p^t = g^{tt} p_t + g^{t\phi} p_\phi . \tag{17.26} $$
Let us specialize to the case of $p_\phi = 0$: no initial angular momentum. This quantity remains
zero along the geodesic. Then, recalling our relationship between the momentum and the
tangent vector for either massive or massless geodesics,
$$ p^\mu \propto \frac{dx^\mu}{d\lambda} , \tag{17.27} $$
we have
$$ \frac{d\phi}{dt} = \frac{p^\phi}{p^t} = \frac{g^{t\phi}}{g^{tt}} = \omega(r, \theta) . \tag{17.28} $$

15 If you are a masochist and want to see it for yourself, please make SymPy do it.

In other words, ω is the coordinate angular velocity of a particle with zero angular
momentum. What we have obtained here might not look like much, but it is physically
remarkable. A particle dropped straight inwards from infinity will not end up continuing
straight inwards – instead, gravity drags the particle around so it acquires an angular velocity.
This effect is known as the dragging of inertial frames.
Our next task is to define a physically important surface known as the stationary limit
surface. To get basic intuition for this phenomenon, consider what happens if we assume
that a particle/observer could in principle remain at fixed (r, θ, φ). This would require a
4-velocity of the form $[u^\mu] = [u^t, \vec{0}]^T$. Is this compatible with our spacetime? The answer
is: not everywhere! In the region where $g_{tt}$ is negative, we see that our assumed 4-velocity
is incompatible with the condition that $u^2 = g_{tt} (u^t)^2 = 1$. Oops. The equation $g_{tt} = 0$ delineates the
surface inside which a particle/observer cannot stay stationary, and it is called the stationary
limit surface. Let us now dig a little deeper.
Imagine photons emitted from (r, θ, φ) purely in the ±φ direction at first, so that only
$dt$ and $d\phi$ are nonzero along the photon path. Using $ds^2 = 0$, we have

$$ g_{tt}\, dt^2 + 2 g_{t\phi}\, dt\, d\phi + g_{\phi\phi}\, d\phi^2 = 0 , \tag{17.29} $$

so that
$$ \frac{d\phi}{dt} = -\frac{g_{t\phi}}{g_{\phi\phi}} \pm \sqrt{ \frac{g_{t\phi}^2}{g_{\phi\phi}^2} - \frac{g_{tt}}{g_{\phi\phi}} } . \tag{17.30} $$
If at the emission point $g_{tt}/g_{\phi\phi} < 0$, then $d\phi/dt$ is positive (negative) for photons emitted
in the $\pm\phi$ direction, even though the magnitudes differ. But when $g_{tt} = 0$, we cross over to
a different behaviour. In particular, on the surface $g_{tt}(r, \theta) = 0$, known as the stationary
limit surface, there are two qualitatively different solutions:
$$ \frac{d\phi}{dt} = -2 \frac{g_{t\phi}}{g_{\phi\phi}} = 2\omega \quad \text{or} \quad \frac{d\phi}{dt} = 0 . \tag{17.31} $$

The first solution corresponds to a photon sent off in the same direction as the source rotation.
The second solution shows that frame dragging is so severe that initially the photon does
not move at all. This implies that a massive particle, which must always go slower than a
photon, also has to rotate with the source. This is true even if it has an arbitrarily large
angular momentum with opposite orientation!
As we will find next week when we start talking about experimental successes of GR, the
formula for gravitational redshift of an observer at a fixed spatial location in a stationary
spacetime is
$$ \left( \frac{\nu_R}{\nu_E} \right)_{\rm stationary,\ fixed} = \sqrt{ \frac{g_{tt}(E)}{g_{tt}(R)} } . \tag{17.32} $$
This is why the stationary limit surface is also known as the infinite redshift surface.

18 R19Nov
18.1 The Kerr solution
Today’s material is based on parts of HEL §13. All figures shown are theirs.
How would we find the horizon in our rotating spacetime? The defining property of
an event horizon is that it is a null surface. In stationary axisymmetric spacetimes, its
equation must be of the form $f(r, \theta) = 0$. Nullness then implies that $g^{\mu\nu} \partial_\mu f\, \partial_\nu f = 0$, or
$g^{rr} (\partial_r f)^2 + g^{\theta\theta} (\partial_\theta f)^2 = 0$. In fact, it turns out that it is actually possible to choose our
coordinates $r$ and $\theta$ such that the equation for the horizon can be put in the form $f(r) = 0$.
In this case, our condition reduces to $g^{rr} (\partial_r f)^2 = 0$, and therefore we see that the event
horizon occurs when
$$ g^{rr} = 0 . \tag{18.1} $$
In our previous case of Schwarzschild, this was equivalent to the condition gtt = 0, but that
only holds for static black holes, not stationary ones.
This is a good place to mention a definition of a horizon associated to Killing vectors.
Suppose that we have a Killing vector $\chi^\mu$. If that Killing vector is null along some null
hypersurface Σ, then Σ is a Killing horizon of $\chi^\mu$. Note that $\chi^\mu$ is normal to Σ because
a null surface cannot have two linearly independent null tangent vectors. Some important
facts are as follows.
• Every event horizon Σ in a stationary, asymptotically flat spacetime is a Killing horizon
for some Killing vector $\chi^\mu$.
• If the spacetime is static, then $\chi^\mu$ will be the Killing vector $K^\mu = (\partial_t)^\mu$ representing
time translations at infinity.
• If the spacetime is stationary but not static, then it will be axisymmetric with a
rotational Killing vector $R^\mu = (\partial_\phi)^\mu$, and $\chi^\mu$ will be a linear combination $K^\mu + \Omega_H R^\mu$
for some constant $\Omega_H$.
To prove that the empty space Einstein equations are satisfied, we need to show that
the Ricci tensor is zero for metrics of our form with α, β, γ, δ, ω. Take my word for it: this is
a very tedious computation. Here is the Kerr metric that emerges after all the calculational
dust has settled:-
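$$ ds^2 = \left( 1 - \frac{2\mu r}{\rho^2} \right) dt^2 + \frac{4\mu a r \sin^2\theta}{\rho^2}\, dt\, d\phi - \frac{\rho^2}{\Delta}\, dr^2 - \rho^2 d\theta^2 - \left( r^2 + a^2 + \frac{2\mu r a^2 \sin^2\theta}{\rho^2} \right) \sin^2\theta\, d\phi^2 $$
$$ = \frac{\rho^2 \Delta}{\Sigma^2}\, dt^2 - \frac{\Sigma^2 \sin^2\theta}{\rho^2}\, (d\phi - \omega\, dt)^2 - \frac{\rho^2}{\Delta}\, dr^2 - \rho^2 d\theta^2 , \tag{18.2} $$
where
$$ \rho^2 = r^2 + a^2 \cos^2\theta , \qquad \Sigma^2 = (r^2 + a^2)^2 - a^2 \Delta \sin^2\theta , \tag{18.3} $$
$$ \Delta = r^2 - 2\mu r + a^2 , \qquad \omega = \frac{2\mu a r}{\Sigma^2} . \tag{18.4} $$
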
The coordinate system in which we have presented the Kerr metric is known as the Boyer-
Lindquist coordinate system. Note: this was not the coordinate system originally used by
Kerr when he derived the solution; those coordinates are known as Kerr-Schild coordinates.
Where is the singularity of the Kerr spacetime describing a rotating black hole? Computing
the full contraction of Riemann with itself shows that only at $\rho^2 = 0$ do we see a
physical singularity. This happens at
$$ r^2 + a^2 \cos^2\theta = 0 , \tag{18.5} $$
yielding
$$ r = 0 , \qquad \theta = \frac{\pi}{2} . \tag{18.6} $$
Careful inspection reveals that this singularity is ring shaped. To see this, take the limit
$M \to 0$ while keeping $a$ nonzero; the result gives Minkowski spacetime in oblate spheroidal
coordinates, which are related to Cartesian coordinates by
$$ x = \sqrt{r^2 + a^2}\, \sin\theta \cos\phi , \qquad y = \sqrt{r^2 + a^2}\, \sin\theta \sin\phi , \qquad z = r \cos\theta . \tag{18.7} $$
Setting $r = 0$, $\theta = \pi/2$ gives $x^2 + y^2 = a^2$, $z = 0$: a ring of radius $a$ in the equatorial plane.
This is to be contrasted with Schwarzschild, where the singularity was pointlike.
Where are the horizons? These occur where $g^{rr} \to 0$. This requires $\Delta = 0$, or
$$ r = r_\pm = \mu \pm \sqrt{\mu^2 - a^2} . \tag{18.8} $$
Note that, with factors of c temporarily restored for physical clarity,
$$ \mu = \frac{G_N M}{c^2} , \quad \text{and} \quad J = M a c . \tag{18.9} $$
So then we require
$$ \mu \geq |a| \tag{18.10} $$
for cosmic censorship.
Where is the stationary limit surface, which bounds the ergoregion? This happens
when $g_{tt} \to 0$,
$$ r_{S\pm} = \mu \pm \sqrt{\mu^2 - a^2 \cos^2\theta} . \tag{18.11} $$
The following figure summarizes these aspects from a side-on perspective.
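To get a feel for how the horizons and the stationary limit surface sit relative to each other, here is a minimal numerical sketch of eqs. (18.8) and (18.11). The function name `kerr_radii` and the sample values µ = 1, a = 0.9 are illustrative choices, not anything from HEL.

```python
import math


def kerr_radii(mu, a, theta):
    """Horizon radii (18.8) and stationary-limit radii (18.11), geometrized units."""
    if abs(a) > mu:
        raise ValueError("need mu >= |a|, otherwise there is no horizon")
    horizon_root = math.sqrt(mu**2 - a**2)
    sls_root = math.sqrt(mu**2 - (a * math.cos(theta)) ** 2)
    return {
        "r_plus": mu + horizon_root,
        "r_minus": mu - horizon_root,
        "r_S_plus": mu + sls_root,
        "r_S_minus": mu - sls_root,
    }


# Illustrative values: mu = 1, a = 0.9, sampled at the pole and at the equator.
radii_pole = kerr_radii(1.0, 0.9, 0.0)
radii_equator = kerr_radii(1.0, 0.9, math.pi / 2)
print(radii_pole)
print(radii_equator)
```

At the poles the outer stationary limit surface touches the outer horizon, while at the equator it bulges out to r = 2µ regardless of a.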

18.2 The Penrose process
Previously we started the discussion of frame dragging in GR for the Kerr spacetime. Let
us now finish that line of reasoning, which will help lead us into the subject of black hole
thermodynamics.
Suppose that you had the ability to fire rockets and wanted to remain fixed at (r, θ) but
rotate around φ. Then the 4-velocity is

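$$ [u^\mu] = u^t\, [1, 0, 0, \Omega]^T , \tag{18.12} $$

where
$$ \Omega = \frac{d\phi}{dt} \tag{18.13} $$
is the angular velocity w.r.t. an observer at infinity. Demanding that the 4-velocity squares
to $\epsilon$ gives a quadratic equation for $u^t$:

$$ g_{tt} (u^t)^2 + 2 g_{t\phi} u^t u^\phi + g_{\phi\phi} (u^\phi)^2 = (u^t)^2 \left[ g_{tt} + 2 g_{t\phi} \Omega + g_{\phi\phi} \Omega^2 \right] = \epsilon . \tag{18.14} $$

For real solutions for $u^t$, we need

$$ g_{\phi\phi} \Omega^2 + 2 g_{t\phi} \Omega + g_{tt} \geq 0 . \tag{18.15} $$

Since $g_{\phi\phi} < 0$ everywhere, $\Omega$ must lie in the interval $\Omega \in (\Omega_-, \Omega_+)$, where
$$ \Omega_\pm = -\frac{g_{t\phi}}{g_{\phi\phi}} \pm \sqrt{ \frac{g_{t\phi}^2}{g_{\phi\phi}^2} - \frac{g_{tt}}{g_{\phi\phi}} } = \omega \pm \sqrt{ \omega^2 - \frac{g_{tt}}{g_{\phi\phi}} } . \tag{18.16} $$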

Notice how $\Omega_-$ can be negative if $g_{tt} > 0$. Where $g_{tt} = 0$, $\Omega_- = 0$ and $\Omega_+ = 2\omega$. This occurs
on the stationary limit surface $S^+$, which is outside (or at, for $\theta = 0, \pi$) the event horizon. A
special situation ensues when $\omega^2 = g_{tt}/g_{\phi\phi}$, so that $\Omega_\pm = \omega$. This holds at $\Delta = 0$, i.e. at the outer
horizon. At $r = r_+$, the angular velocity has to be one value only,
$$ \Omega_H = \omega(r_+, \theta) = \frac{a}{2\mu r_+} . \tag{18.17} $$
This is independent of θ, which is a highly nontrivial physics fact. It is also the maximum
allowed value of the angular velocity inside the ergoregion.
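The θ-independence of $\Omega_H$ is easy to check with SymPy, using only the definitions of ∆, Σ² and ω in (18.3)-(18.4). This is a sketch: the sample values µ = 1, a = 3/5 are chosen purely because they make $r_+$ rational, and the variable names are illustrative.

```python
import sympy as sp

mu, a, r, theta = sp.symbols("mu a r theta", positive=True)

# Definitions from (18.3)-(18.4) and the outer horizon from (18.8).
Delta = r**2 - 2 * mu * r + a**2
Sigma2 = (r**2 + a**2) ** 2 - a**2 * Delta * sp.sin(theta) ** 2
omega = 2 * mu * a * r / Sigma2
r_plus = mu + sp.sqrt(mu**2 - a**2)

# Delta vanishes on the outer horizon, which removes the theta dependence of Sigma^2.
delta_at_horizon = sp.expand(Delta.subs(r, r_plus))

# Exact spot check at two different polar angles (illustrative values mu = 1, a = 3/5).
values = {mu: 1, a: sp.Rational(3, 5)}
omega_at_horizon = omega.subs(r, r_plus)
samples = [
    omega_at_horizon.subs(values).subs(theta, angle)
    for angle in (sp.pi / 6, sp.pi / 3)
]
expected = (a / (2 * mu * r_plus)).subs(values)
print(delta_at_horizon, samples, expected)
```

Because ∆ = 0 kills the only θ-dependent term in Σ², the horizon rotates rigidly, which is what makes $\Omega_H$ a property of the black hole rather than of a particular latitude.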
Now we have all the ingredients at hand to discuss the Penrose process. Suppose that
we have an observer at infinity with fixed position who fires particle A into the Kerr black
hole ergoregion. Then the energy of A measured at the emission event E is
$$ E^{(A)} = p^{(A)}(E) \cdot u_{\rm obs} = p_t^{(A)}(E) , \tag{18.18} $$

where the observer 4-velocity is $[u^\mu_{\rm obs}] = [1, \vec{0}]^T$. Now suppose that inside the ergoregion,
particle A decays into two other particles: A → B + C. Then momentum conservation
implies that
$$ p^{(A)}(D) = p^{(B)}(D) + p^{(C)}(D) , \tag{18.19} $$
where D denotes the decay event.

Suppose that C eventually makes it out to infinity. The observer at infinity measures
the particle energy at the reception event R to be
$$ E^{(C)} = p_t^{(C)}(R) = p_t^{(C)}(D) \tag{18.20} $$

because $p_t$ is conserved along a geodesic by virtue of stationarity: $\partial_t g_{\mu\nu} = 0$. Similarly, for
the original particle,
$$ p_t^{(A)}(D) = p_t^{(A)}(E) . \tag{18.21} $$
Then the time component of the above momentum conservation equation can be rearranged
to
$$ E^{(C)} = E^{(A)} - p_t^{(B)}(D) , \tag{18.22} $$
because $p_t^{(B)}$ is conserved along a geodesic.
Now, if B were to escape the ergoregion, $p_t^{(B)}$ would be timelike, and hence proportional
to the particle energy as measured by an observer with purely timelike 4-velocity. Since
$p_t^{(B)} > 0$, this implies that $E^{(C)} < E^{(A)}$, i.e. you get less energy out than you put in. But if
B were to instead fall into the black hole, then it would forever remain in the region where
$g_{tt}$ has opposite sign. Then $p_t^{(B)}$ would be interpreted as a component of spatial momentum,
which could in principle be either positive or negative. (If it were the energy, it would have
to be positive for a physical particle.) If $p_t^{(B)}$ happened to be negative, then $E^{(C)} > E^{(A)}$.
This means that we can extract energy from a rotating black hole!
Once B has fallen inside the event horizon, it becomes part of the black hole, whose
mass and angular momentum are then
$$ M c^2 \to M c^2 + p_t^{(B)} , \qquad J \to J - p_\phi^{(B)} . \tag{18.23} $$

If we have an observer at fixed $r, \theta$, we already worked out the 4-velocity: $[u^\mu] = u^t [1, 0, 0, \Omega]^T$,
where $\Omega = d\phi/dt$ is the angular velocity w.r.t. infinity. This observer measures B’s energy
to be
$$ E^{(B)} = p_\mu^{(B)} u^\mu = u^t \left( p_t^{(B)} + p_\phi^{(B)} \Omega \right) . \tag{18.24} $$
This quantity must be positive for a physical particle, so
$$ -p_\phi^{(B)} < \frac{p_t^{(B)}}{\Omega} . \tag{18.25} $$

Consider the quantity $L = -p_\phi^{(B)}$. What is it? This is the component of B’s angular
momentum along the black hole rotation axis: there is a − sign because we are working in
mostly minus signature. Now, because $p_t^{(B)} < 0$ for the Penrose process and $\Omega > 0$, this
means that $L < 0$, resulting in a loss of angular momentum for the black hole. You can
keep extracting energy from a black hole like this until you have spun Kerr down all the way
to Schwarzschild. Earlier, we learned that the angular velocity is maximal at $r = r_+$, when
$\Omega = \Omega_H$. So in fact for any observer at fixed $r, \theta$, we have a general bound,
$$ \delta J < \frac{\delta M c^2}{\Omega_H} . \tag{18.26} $$

Let us sketch the calculation of the area of the outer horizon r+ in the Kerr spacetime.
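Writing
$$ \gamma_{ij}\, dx^i dx^j = -ds^2 (dt = 0, dr = 0, r = r_+) \tag{18.27} $$
$$ = (r_+^2 + a^2 \cos^2\theta)\, d\theta^2 + \frac{(r_+^2 + a^2)^2 \sin^2\theta}{(r_+^2 + a^2 \cos^2\theta)}\, d\phi^2 , \tag{18.28} $$
we define the area A as
$$ A(r) = \int \sqrt{|\gamma|}\, d\theta\, d\phi . \tag{18.29} $$

From the metric, we have
$$ \sqrt{|\gamma|} = (r_+^2 + a^2) \sin\theta , \tag{18.30} $$
so that
$$ A(r_+) = 4\pi (r_+^2 + a^2) . \tag{18.31} $$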
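The horizon area integral is short enough to hand to SymPy. The sketch below takes the induced metric components straight from (18.28); the symbol `r_plus` is treated as an independent positive quantity, and on 0 < θ < π we use sin θ ≥ 0 when taking the square root.

```python
import sympy as sp

theta, phi = sp.symbols("theta phi", real=True)
r_plus, a = sp.symbols("r_plus a", positive=True)

# Induced 2-metric on the outer horizon, from (18.28).
gamma_theta_theta = r_plus**2 + a**2 * sp.cos(theta) ** 2
gamma_phi_phi = (r_plus**2 + a**2) ** 2 * sp.sin(theta) ** 2 / gamma_theta_theta
det_gamma = sp.simplify(gamma_theta_theta * gamma_phi_phi)

# Square root of the determinant on 0 < theta < pi, where sin(theta) >= 0; cf. (18.30).
sqrt_det = (r_plus**2 + a**2) * sp.sin(theta)

# Area integral (18.29) over the full two-surface.
area = sp.integrate(sqrt_det, (theta, 0, sp.pi), (phi, 0, 2 * sp.pi))
print(det_gamma, area)
```

Notice that the θ dependence of the two metric components cancels in the determinant, which is why the integral is elementary.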
A very cute fact about the Penrose process is that the area of the black hole horizon does
not shrink when it occurs. What is the physics behind this? The angular momentum is
reduced more than the mass each time we do it, and this ensures that the area of the black
hole never decreases. To see a few more details, let us define the irreducible mass by
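$$ M_{\rm irr}^2 = \frac{A}{16\pi G_N^2} \tag{18.32} $$
$$ = \frac{1}{4 G_N^2} (r_+^2 + a^2) \tag{18.33} $$
$$ = \frac{1}{2} \left( M^2 + \sqrt{ M^4 - \frac{J^2}{G_N^2} } \right) . \tag{18.34} $$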
This might seem a tad unmotivated until we realize how it is affected by changes in M and
J. We find after some straightforward but boring algebra that
 
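$$ \delta M_{\rm irr} = \frac{a}{4 G_N M_{\rm irr} \sqrt{ G_N^2 M^2 - J^2/M^2 }} \left( \frac{\delta M}{\Omega_H} - \delta J \right) . \tag{18.35} $$
Look carefully at what this implies. We had earlier that for a Penrose process, $\delta J < \delta M/\Omega_H$
(where both $\delta M$ and $\delta J$ are negative), so
$$ \delta M_{\rm irr} > 0 . \tag{18.36} $$
Therefore, the maximum work you can extract via the Penrose process is
$$ M - M_{\rm irr} = M - \frac{1}{\sqrt{2}} \sqrt{ M^2 + \sqrt{ M^4 - \frac{J^2}{G_N^2} } } , \tag{18.37} $$

and this is maximized to $(1 - 1/\sqrt{2}) \simeq 29\%$ of the original energy for extreme Kerr.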
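The 29% figure can be reproduced in a few lines. This is a sketch in units with c = 1, using (18.34) for the irreducible mass; the function names and the default G = 1 are illustrative choices. Extreme Kerr corresponds to J = G M².

```python
import math


def irreducible_mass(mass, ang_mom, newton_g=1.0):
    """Irreducible mass from (18.34), in units with c = 1."""
    inner = mass**4 - (ang_mom / newton_g) ** 2
    if inner < 0:
        raise ValueError("J exceeds the extremal value G M^2")
    return math.sqrt(0.5 * (mass**2 + math.sqrt(inner)))


def max_extractable_fraction(mass, ang_mom, newton_g=1.0):
    """Fraction (M - M_irr) / M available to the Penrose process, cf. (18.37)."""
    return 1.0 - irreducible_mass(mass, ang_mom, newton_g) / mass


fraction_schwarzschild = max_extractable_fraction(1.0, 0.0)  # J = 0
fraction_extreme_kerr = max_extractable_fraction(1.0, 1.0)   # J = G M^2
print(fraction_schwarzschild, fraction_extreme_kerr)
```

For Schwarzschild nothing can be extracted, and the fraction rises monotonically with J up to the extremal value.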
The moral of the story here is that we are discovering relationships between macroscopic
variables of the black hole, and this opens the door to discussing black hole thermodynamics
(a topic on which I am an expert). I will have more to say about this in Winter/Spring in
the second GR course, PHY484S/PHY1484S.

19 M23Nov
The reason we teach GR is not based in theoretical aesthetics, although those are really
quite beautiful and many great intellects have fallen in love with it! We teach GR and use
it because it works as an experimental description of gravity. In the next few lectures, we
will discuss some of the signature experiments that established GR firmly in the minds of
humans worldwide.
Material on experimental tests, not including gravitational waves, is based pretty closely
on Appendix 9A and §10 of the HEL textbook. All the figures displayed for this material
are theirs.

19.1 Gravitational redshift


Suppose that we have a stationary spacetime of the form
$$ ds^2 = g_{00}(x^k) (dx^0)^2 + 2 g_{0i}(x^k)\, dx^0 dx^i + g_{ij}(x^k)\, dx^i dx^j . $$
This includes all of the types of black holes we have studied so far: Schwarzschild, Reissner-
Nordstrøm, and Kerr. Imagine two different physical observers who are massive and therefore
move slower than light. Call them E for emitter and R for receiver, with worldlines $x^\mu_E(\tau_E)$
and $x^\mu_R(\tau_R)$ respectively, where $\tau_E$, $\tau_R$ are the proper times for those two observers. Now let
E moving with 4-velocity $U_E(A)$ emit a photon at event A and R moving with 4-velocity
$U_R(B)$ receive it at event B.

We can find the energy of a photon in the reference frame of a massive observer by taking
the dot product of the photon’s 4-momentum with the observer’s 4-velocity,
$$ E = p_\mu U^\mu . \tag{19.1} $$
This works because we can choose the affine parameter of a null geodesic such that
$$ p^\mu = \frac{dx^\mu}{d\lambda} . \tag{19.2} $$

(Note that this is different from the convention for massive particles, for which the constant
of proportionality in the above equation is the rest mass, rather than unity.) Then we have
$$ E(A) = p_\mu(A)\, U_E^\mu(A) , \tag{19.3} $$
$$ E(B) = p_\mu(B)\, U_R^\mu(B) . \tag{19.4} $$

Since in both cases E = hν, we have

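$$ \frac{\nu_R}{\nu_E} = \frac{p_\mu(B)\, U_R^\mu(B)}{p_\mu(A)\, U_E^\mu(A)} . \tag{19.5} $$

Now, since the photon’s 4-momentum is tangent to its geodesic, it is parallel
transported along its path. Equivalently, the directional covariant derivative of $p_\mu$ is zero
along the geodesic,
$$ 0 = \frac{D p_\mu}{D\lambda} = \frac{dx^\sigma}{d\lambda} \nabla_\sigma p_\mu = \frac{dx^\sigma}{d\lambda} \left( \partial_\sigma p_\mu - \Gamma^\nu{}_{\sigma\mu}\, p_\nu \right) = \frac{d p_\mu}{d\lambda} - \Gamma^\nu{}_{\mu\sigma}\, p_\nu \frac{dx^\sigma}{d\lambda} . \tag{19.6} $$
We can use this to relate $p_\mu(B)$ to $p_\mu(A)$. Recruiting our convention $p^\mu = dx^\mu/d\lambda$, we have

$$ \frac{d p_\mu}{d\lambda} = \Gamma^\nu{}_{\mu\sigma}\, p_\nu p^\sigma . \tag{19.7} $$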

Recall that we also have the mass shell relation for the photon,

$$ p^\mu p_\mu = 0 . \tag{19.8} $$

Suppose the emitter E and receiver R are at fixed spatial coordinates. (This would not be
true for freely falling observers.) Then the spatial components of the observers’ 4-velocities
vanish,
$$ U_E^i = \frac{dx_E^i}{d\tau_E} = 0 , \quad \text{and} \quad U_R^i = \frac{dx_R^i}{d\tau_R} = 0 . \tag{19.9} $$

Using $U^\mu U_\mu = 1$ for massive observers gives $U^0 = 1/\sqrt{g_{00}}$, so that
$$ \left( \frac{\nu_R}{\nu_E} \right)_{\rm fixed} = \frac{p_0(B)}{p_0(A)} \sqrt{ \frac{g_{00}(A)}{g_{00}(B)} } . \tag{19.10} $$

If the metric is stationary, i.e. $\partial_0 g_{\mu\nu} = 0$, then $p_0$ is conserved by the geodesic equation.
Then since the momentum vector for a photon is equal to the tangent vector, as in eq.(19.2),
$p_0$ is constant along a photon geodesic, and so
$$ \left( \frac{\nu_R}{\nu_E} \right)_{\rm fixed,\ stationary} = \sqrt{ \frac{g_{00}(x^k_E)}{g_{00}(x^k_R)} } . \tag{19.11} $$

For Schwarzschild, we obtain

$$ \left( \frac{\nu_R}{\nu_E} \right)_{\rm fixed,\ stationary} = \sqrt{ \frac{1 - 2 G_N m/(c^2 r_E)}{1 - 2 G_N m/(c^2 r_R)} } . \tag{19.12} $$

For the Kerr spacetime, we previously found that the location where g00 = 0 marks the sta-
tionary limit surface (SLS), the surface inside which a stationary observer gets involuntarily

dragged around with the rotating black hole spacetime geometry. Here, we see that the SLS
is also the location where the gravitational redshift for an observer at a fixed spatial location
becomes infinite.
The quantity z for the redshift is defined by
$$ \frac{\nu_R}{\nu_E} = \frac{1}{1 + z} . \tag{19.13} $$
If we want to find out redshifts for freely falling observers, then we need to solve the geodesic
equations for the stationary spacetime in question. The analysis is even more complicated
for spacetimes that are not stationary.
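For fixed observers in Schwarzschild, eqs. (19.12) and (19.13) translate directly into a few lines of Python. This is a sketch in geometrized units where µ stands for $G_N m/c^2$; the function names are illustrative, and a receiver at infinity is represented by `math.inf`.

```python
import math


def frequency_ratio(mu, r_emitter, r_receiver):
    """nu_R / nu_E for fixed observers in Schwarzschild, eq. (19.12)."""
    if min(r_emitter, r_receiver) <= 2 * mu:
        raise ValueError("fixed observers only exist outside r = 2 mu")
    return math.sqrt((1 - 2 * mu / r_emitter) / (1 - 2 * mu / r_receiver))


def redshift(mu, r_emitter, r_receiver):
    """Redshift z defined through nu_R / nu_E = 1 / (1 + z), eq. (19.13)."""
    return 1.0 / frequency_ratio(mu, r_emitter, r_receiver) - 1.0


# Illustrative case: emitter at r = 4 mu, receiver at infinity.
ratio_example = frequency_ratio(1.0, 4.0, math.inf)
z_example = redshift(1.0, 4.0, math.inf)
print(ratio_example, z_example)
```

As the emitter radius approaches 2µ the ratio goes to zero and z grows without bound, which is the infinite-redshift behaviour described above.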

19.2 Planetary perihelion precession


How will we discover the perihelion advance we are after? We will start by using the geodesic
equations derived for the Schwarzschild geometry introduced previously. The analysis can
also be done for Kerr, but for our purposes here the non-rotating case will suffice to show
the essential physics. We had a conserved energy
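$$ E = \left( 1 - \frac{2\mu}{r} \right) \dot{t} , \tag{19.14} $$
and a conserved angular momentum
$$ L = r^2 \dot{\phi} , \tag{19.15} $$
where an overdot denotes $d/d\lambda$. Then the norm condition $g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu = \epsilon$ (with $\epsilon = 0$ for photons and $+1$ for
massive particles) gives
$$ -E^2 + \dot{r}^2 + \left( 1 - \frac{2\mu}{r} \right) \left( \frac{L^2}{r^2} + \epsilon \right) = 0 . \tag{19.16} $$
We saw previously that defining
$$ V_{\rm eff}(r) = \frac{1}{2} \left( \epsilon + \frac{L^2}{r^2} \right) \left( 1 - \frac{2\mu}{r} \right) , \tag{19.17} $$
in analogy with Newtonian experience allows the rewriting
$$ \frac{1}{2} \dot{r}^2 + V_{\rm eff}(r) = \frac{1}{2} E^2 \equiv \mathcal{E} . \tag{19.18} $$
We can combine the knowledge above to find the shape equation,
$$ \frac{d\phi}{dr} = \frac{d\phi}{d\lambda} \left( \frac{dr}{d\lambda} \right)^{-1} , \tag{19.19} $$
giving
$$ \frac{d\phi}{dr} = \pm \frac{L}{r^2} \left[ 2 \left( \mathcal{E} - V_{\rm eff}(r) \right) \right]^{-1/2} . \tag{19.20} $$
Defining the orbit parameter b, via
$$ b = \frac{L}{E} , \tag{19.21} $$
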
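gives
$$ \frac{d\phi}{dr} = \pm \frac{1}{r^2} \left[ \frac{1}{b^2} - \left( \frac{1}{r^2} + \frac{\epsilon}{L^2} \right) \left( 1 - \frac{2\mu}{r} \right) \right]^{-1/2} , \tag{19.22} $$
where $\epsilon = 1$ for massive particles. (For photons, we would set $\epsilon = 0$ in this equation.) Now
we make a change of variables, to
$$ u = \frac{L^2}{G_N M} \frac{1}{r} = \frac{L^2}{\mu} \frac{1}{r} . \tag{19.23} $$
The radial equation for massive particles (planets, etc.) then turns into
$$ \left( \frac{du}{d\phi} \right)^2 + \frac{L^2}{\mu^2} - 2u + u^2 - \frac{2\mu^2}{L^2} u^3 = \frac{2 \mathcal{E} L^2}{\mu^2} . \tag{19.24} $$
On the face of it, this does not look any simpler than before. The neat trick is to realize
that differentiating this again yields a simpler second order equation! Straightforward but
unilluminating algebra yields
$$ \frac{d^2 u}{d\phi^2} + u = 1 + \frac{3\mu^2}{L^2} u^2 . \tag{19.25} $$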
This equation is the full unadulterated GR result, and involves no approximations. The
second term on the RHS of this equation would be absent in the Newtonian computation.
For the Newtonian case, you can check that the solution to the shape equation is

$$ u_0 = 1 + e \cos\phi . \tag{19.26} $$

Treating this as the zeroth order approximation to the GR result, we can substitute back

$$ u(\phi) \simeq u_0(\phi) + u_1(\phi) + \ldots \tag{19.27} $$

into eq.(19.25) and obtain a perturbative equation for first order corrections $u_1$. This gives

$$ \frac{d^2 u_1}{d\phi^2} + u_1 \simeq \frac{3\mu^2}{L^2} u_0^2 . \tag{19.28} $$
As you should expect for an inherently nonlinear theory like GR, perturbation theory here
is nonlinear. Substituting in the specific form of $u_0$ gives
$$ \frac{d^2 u_1}{d\phi^2} + u_1 \simeq \frac{3\mu^2}{L^2} \left[ \left( 1 + \frac{e^2}{2} \right) + 2e \cos\phi + \frac{e^2}{2} \cos 2\phi \right] . \tag{19.29} $$
You can check by explicitly differentiating that the solution to this is
$$ u_1 \simeq \frac{3\mu^2}{L^2} \left[ \left( 1 + \frac{e^2}{2} \right) + e \phi \sin\phi - \frac{e^2}{6} \cos 2\phi \right] . \tag{19.30} $$
Notice that the first term here is a constant displacement and that the third term is oscillatory
about zero. The second term that gives rise to a cumulative effect per orbit is the most
physically important one.
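The "check by explicitly differentiating" step is a natural job for SymPy. In this sketch the combination $3\mu^2/L^2$ is abbreviated to a single positive symbol `k` (an illustrative shorthand, not notation from HEL), and the residual of (19.28) is formed for the quoted $u_1$.

```python
import sympy as sp

phi = sp.symbols("phi", real=True)
e, k = sp.symbols("e k", positive=True)  # k is shorthand for 3 mu^2 / L^2

# Zeroth-order Newtonian ellipse (19.26) and the quoted correction (19.30).
u0 = 1 + e * sp.cos(phi)
u1 = k * ((1 + e**2 / 2) + e * phi * sp.sin(phi) - e**2 / 6 * sp.cos(2 * phi))

# Residual of the first-order equation (19.28); it should vanish identically.
residual = sp.simplify(sp.diff(u1, phi, 2) + u1 - k * u0**2)
print(residual)
```

The secular term $e\phi\sin\phi$ is the one sourced by the resonant $2e\cos\phi$ driving term, which is why it grows with each orbit.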

Figure credit: Mpfiz - Own work, Public Domain.
From here on we just focus on that key second cumulative term on top of the zeroth
order Newtonian contribution. We have
$$ u_{\rm key} = 1 + e \cos\phi + \frac{3\mu^2}{L^2}\, e \phi \sin\phi . \tag{19.31} $$
This can be rewritten as
$$ u_{\rm key} = 1 + e \cos\left[ (1 - \alpha) \phi \right] , \tag{19.32} $$
where
$$ \alpha = \frac{3\mu^2}{L^2} , \tag{19.33} $$
as you can see by doing a Taylor expansion to first order in small quantities,
$$ \cos[(1 - \alpha)\phi] \simeq \cos\phi + \alpha \phi \sin\phi + O(\alpha^2) . \tag{19.34} $$
Then the precession per orbit is
$$ \Delta\phi \simeq 2\pi\alpha = \frac{6\pi G_N^2 M^2}{L^2} . \tag{19.35} $$
In order to massage this expression a little further, we need to relate $L^2$ to physical quantities
we know. For the Newtonian (uncorrected) ellipse, the EOM show that
$$ a = \frac{L^2}{\mu (1 - e^2)} , \tag{19.36} $$
so that
$$ \Delta\phi = \frac{6\pi G_N M}{c^2 a (1 - e^2)} . \tag{19.37} $$
The first experimental test of this was with Mercury. The gravitational
radius of the Sun, $\mu = G_N M/c^2$, is about 1.48 km; Mercury’s eccentricity is about e = 0.2056, and its semimajor
axis is about $a = 5.79 \times 10^{10}$ m. This results in a perihelion advance of about
$5 \times 10^{-7}$ radians per orbit, or about 43 seconds of arc per century. Note that the observed
value is actually considerably greater, but most of it comes from two prosaic places: (a)
precession of the equinoxes in our geocentric coordinate system, and (b) other planets per-
turbing Mercury’s orbit. The residual amount of 43 seconds of arc per century is perfectly
described by GR, to within experimental errors. This was not settled definitively in the
experimental realm until the 1960s. For Earth, our perihelion precession is less, only about
4 seconds of arc per century. Mercury is affected most because it is closest to the Sun.
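The Mercury numbers are easy to reproduce from (19.37). The sketch below uses the values of µ, a and e quoted in the text; the orbital period of 87.97 days and the 36525-day Julian century are assumptions brought in from outside these notes in order to convert "per orbit" into "per century".

```python
import math

MU_SUN_M = 1.48e3             # gravitational radius of the Sun, from the text
SEMI_MAJOR_AXIS_M = 5.79e10   # Mercury, from the text
ECCENTRICITY = 0.2056         # Mercury, from the text
ORBITAL_PERIOD_DAYS = 87.97   # assumption: Mercury's sidereal period
DAYS_PER_CENTURY = 36525.0    # assumption: Julian century


def precession_per_orbit(mu, semi_major_axis, eccentricity):
    """Perihelion advance per orbit in radians, eq. (19.37) with mu = G_N M / c^2."""
    return 6 * math.pi * mu / (semi_major_axis * (1 - eccentricity**2))


per_orbit_rad = precession_per_orbit(MU_SUN_M, SEMI_MAJOR_AXIS_M, ECCENTRICITY)
orbits_per_century = DAYS_PER_CENTURY / ORBITAL_PERIOD_DAYS
arcsec_per_century = math.degrees(per_orbit_rad * orbits_per_century) * 3600
print(per_orbit_rad, arcsec_per_century)
```

With these inputs the result lands close to the 43 seconds of arc per century quoted above.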

20 R26Nov
20.1 Bending of light
Now let us focus on the bending of light. To start with, let us remind ourselves first of the
Newtonian result. Most people think that because two photons with zero mass should feel
zero Newtonian force between them, that implies that photons do not feel gravity. This is
incorrect. Newton imagined light as corpuscular, and it feels gravity like any other corpuscle.
The gravitational acceleration of a test mass does not depend on the mass.
In Newtonian mechanics, particles in unbound orbits move on hyperbolae rather than
ellipses. The incoming path asymptotes in the infinite past to one of the separatrices, and
the outgoing path asymptotes in the infinite future to the other separatrix. In principle, it
could come as close as the radius of the stellar object as it slingshots around the star.

We can estimate the size of the effect just using dimensional analysis. The variables
in the problem are: GN and c (theory constants), M (a solution parameter), and b, the
radius of closest approach. Since the deflection angle we are looking for is dimensionless, we
estimate that
$$ \theta \sim \frac{G_N M}{c^2 b} . \tag{20.1} $$
In principle, θ could have been any function of the dimensionless RHS. We have chosen a
linear functional dependence on purpose, because we expect zero deflection angle when there
is no star and because we expect a small effect overall.
We can get more precise and confirm the linear dependence by asking about the gravitational
force felt by a corpuscle. Suppose that far from the star it starts out moving along the x-axis,
and that the star is located along the negative y-axis. To first order in small quantities,
$p_x$ is unaffected by the gravitational deflection, and the corpuscle develops a small $p_y$ by
gravitational attraction. The deflection angle $|\Delta\phi| = -(p_y/p_x)_{\rm final}$ is, to first order in small
quantities,
$$ |\Delta\phi| = -\frac{1}{p_x} \int_{-\infty}^{\infty} \frac{dp_y}{dx}\, dx $$
$$ = -\frac{1}{p_x c} \int_{-\infty}^{\infty} \frac{dp_y}{dt}\, dx $$
$$ = -\frac{1}{p_x c} \int_{-\infty}^{\infty} dx\, \frac{G_N M m}{(x^2 + y^2)} \frac{y}{\sqrt{x^2 + y^2}} $$
$$ = \frac{2 G_N M}{c^2 b} , \tag{20.2} $$
where b is the impact parameter. Note that the factor of m for the corpuscle cancelled out:
the m in the numerator arising from the gravitational force killed the m in the momentum
denominator px = mc. Overall, we see that the Newtonian angle for deflection of light is
small but nonzero.
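The only nontrivial step in (20.2) is the integral over x. Stripping off the constants leaves $\int_{-\infty}^{\infty} b\, dx/(x^2 + b^2)^{3/2}$, which the sketch below evaluates with SymPy by checking a candidate antiderivative and taking its limits. The antiderivative and the variable names are supplied here for illustration; they are not taken from the notes.

```python
import sympy as sp

x = sp.symbols("x", real=True)
b = sp.symbols("b", positive=True)

# Geometric part of the integrand in (20.2), with the offset |y| = b.
integrand = b / (x**2 + b**2) ** sp.Rational(3, 2)

# Candidate antiderivative, verified by differentiation.
antiderivative = x / (b * sp.sqrt(x**2 + b**2))
derivative_gap = sp.simplify(sp.diff(antiderivative, x) - integrand)

# Value of the integral over the whole line.
integral = sp.limit(antiderivative, x, sp.oo) - sp.limit(antiderivative, x, -sp.oo)
print(derivative_gap, integral)
```

Multiplying the result 2/b by $G_N M m/(p_x c)$ with $p_x = mc$ reproduces the Newtonian deflection $2 G_N M/(c^2 b)$.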
To analyze the answer in General Relativity, our starting point is again the geodesic
equations. For photons executing equatorial motion (θ = π/2), we had two Killing vectors
giving rise to two conserved quantities and also the tangent vector norm condition,

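$$ \left( 1 - \frac{2\mu}{r} \right) \dot{t} = E , \tag{20.3} $$
$$ r^2 \dot{\phi} = L , \tag{20.4} $$
$$ \left( 1 - \frac{2\mu}{r} \right) \dot{t}^2 - \left( 1 - \frac{2\mu}{r} \right)^{-1} \dot{r}^2 - r^2 \dot{\phi}^2 = 0 . \tag{20.5} $$

Substituting in the conserved quantities gives for the radial equation

$$ \dot{r}^2 + \frac{L^2}{r^2} \left( 1 - \frac{2\mu}{r} \right) = E^2 . \tag{20.6} $$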

From the above, we can find the GR shape equation for photons moving in a Schwarzschild
geometry,
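$$ \frac{d\phi}{dr} = \frac{1}{r^2} \left[ \frac{1}{b^2} - \frac{1}{r^2} \left( 1 - \frac{2\mu}{r} \right) \right]^{-1/2} . \tag{20.7} $$
Substituting this time
$$ \tilde{u} = \frac{1}{r} \tag{20.8} $$
into the shape equation, and massaging the algebra a bit further, gives

$$ \frac{d^2 \tilde{u}}{d\phi^2} + \tilde{u} = 3\mu \tilde{u}^2 . \tag{20.9} $$
When there is no matter, the RHS of the above shape equation is zero. In that case, the
solution is
$$ \tilde{u}(\phi) = \tilde{u}_0(\phi) = \frac{1}{b} \sin\phi , \tag{20.10} $$
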
where b is the impact parameter. Note how this is different from the timelike trajectories we
studied in the previous lecture, which executed elliptical trajectories in the Newtonian limit
rather than hyperbolic ones. Here, let us also work perturbatively, writing
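$$ \tilde{u}(\phi) \simeq \frac{1}{b} \sin\phi + \tilde{u}_1(\phi) . \tag{20.11} $$
Substituting to find the equation of motion for the perturbation gives
$$ \frac{d^2 \tilde{u}_1(\phi)}{d\phi^2} + \tilde{u}_1(\phi) \simeq \frac{3\mu}{b^2} \sin^2\phi . \tag{20.12} $$
As you can check explicitly, this is solved by
$$ \tilde{u}_1(\phi) \simeq \frac{3\mu}{2b^2} \left( 1 + \frac{1}{3} \cos 2\phi \right) , \tag{20.13} $$
so that
$$ \tilde{u}(\phi) \simeq \frac{1}{b} \sin\phi + \frac{3\mu}{2b^2} \left( 1 + \frac{1}{3} \cos 2\phi \right) . \tag{20.14} $$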
This is the equation describing the trajectory of the photon in GR, to first order in
perturbations about the Newtonian result. So let us ask the question: what does the angle
tend to as we go very far away from the gravitating body? This amounts to taking r → ∞,
which corresponds in our variables to ũ → 0. In other words, we need to look for solutions
of ũ(φ) = 0. For slight deflections, sin φ ' φ and cos 2φ ' 1. Solving for the angle gives a
slightly negative answer, corresponding to one of the separatrices of the hyperbola,
$$ \phi \simeq -\frac{2 G_N M}{c^2 b} . \tag{20.15} $$
We are not quite finished. As indicated in the figure, the GR deflection angle for photons is
twice the above result,
$$ |\Delta\phi_{\rm GR}| \simeq \frac{4 G_N M}{c^2 b} . \tag{20.16} $$
Notice that this is also twice the Newtonian result for the bending of light. For a grazing
deflection by our Sun, it is about 1.75 seconds of arc.
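The perturbative light-bending calculation can be checked end to end with SymPy: verify that the quoted $\tilde{u}_1$ solves (20.12), find the small asymptotic angle where $\tilde{u} \to 0$, and then plug in solar numbers. In this sketch the solar radius of $6.96 \times 10^8$ m is an assumption brought in from outside the notes (it plays the role of b for a grazing ray), and the variable names are illustrative.

```python
import math
import sympy as sp

phi, phi_far = sp.symbols("phi phi_far", real=True)
b, mu = sp.symbols("b mu", positive=True)

# Straight-line solution (20.10) and the quoted first-order correction (20.13).
u0 = sp.sin(phi) / b
u1 = 3 * mu / (2 * b**2) * (1 + sp.cos(2 * phi) / 3)

# Residual of the perturbation equation (20.12); it should vanish identically.
residual = sp.simplify(sp.diff(u1, phi, 2) + u1 - 3 * mu * u0**2)

# Far from the star u -> 0; for small angles sin(phi) ~ phi and cos(2 phi) ~ 1.
phi_asymptote = sp.solve(sp.Eq(phi_far / b + u1.subs(phi, 0), 0), phi_far)[0]
total_deflection = -2 * phi_asymptote  # both asymptotes contribute, cf. (20.16)

MU_SUN_M = 1.48e3  # gravitational radius of the Sun, as quoted earlier
R_SUN_M = 6.96e8   # assumption: solar radius, used as b for a grazing ray
deflection_rad = float(total_deflection.subs({mu: MU_SUN_M, b: R_SUN_M}))
deflection_arcsec = math.degrees(deflection_rad) * 3600
print(phi_asymptote, total_deflection, deflection_arcsec)
```

The symbolic result is $4\mu/b$, and the numerical value for a grazing ray comes out close to the 1.75 seconds of arc quoted above.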
What if we cannot apply a perturbation analysis because the deflection angle is large?
Then we would need to use the full GR geodesic equations for photons without any approx-
imations. In that case, by making use of previous results we have derived for the shape
equation for geodesics, we find
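$$ |\Delta\phi_{\rm GR}| = 2 \int_{r_0}^{\infty} \frac{dr}{r^2} \left[ \frac{1}{b^2} - \frac{1}{r^2} \left( 1 - \frac{2\mu}{r} \right) \right]^{-1/2} - \pi , \tag{20.17} $$
where $r_0$ is the point of closest approach. At $r_0$, the [. . .] in the integrand vanishes.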
Historical note: Eddington’s eclipse expedition to measure bending of light while the
Sun was blocked by the Moon was accepted in 1919 and made Einstein a rock star, despite
poorly understood systematic errors, because it appealed to Western Europeans in the post
WWI climate of wanting peace between nations that had been at war.

20.2 Radar echoes
One other important test of GR is measuring radar echoes in the solar system, which is
about the interplay between distance and time. To analyze this, we need two ingredients.
First, one of our geodesic equations for photons from earlier that took the form
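$$ \dot{r}^2 + \frac{L^2}{r^2} \left( 1 - \frac{2\mu}{r} \right) = E^2 . \tag{20.18} $$
We also had the energy equation
$$ \left( 1 - \frac{2\mu}{r} \right) \dot{t} = E , \tag{20.19} $$
which is the second ingredient. These can be combined to find the t−r shape equation.
Using
$$ \left( \frac{dr}{d\lambda} \right)^2 = \left( \frac{dr}{dt} \right)^2 \left( \frac{dt}{d\lambda} \right)^2 = \left( \frac{dr}{dt} \right)^2 E^2 \left( 1 - \frac{2\mu}{r} \right)^{-2} , \tag{20.20} $$
we have that
$$ \left( 1 - \frac{2\mu}{r} \right)^{-3} \left( \frac{dr}{dt} \right)^2 + \frac{(L/E)^2}{r^2} = \left( 1 - \frac{2\mu}{r} \right)^{-1} . \tag{20.21} $$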
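At the distance of closest approach, which we will call R, we have
$$ \left. \left( \frac{dr}{dt} \right)^2 \right|_{r=R} = 0 , \tag{20.22} $$

so that at that point
$$ \frac{(L/E)^2}{R^2} = \left( 1 - \frac{2\mu}{R} \right)^{-1} . \tag{20.23} $$
Then, after a bit of algebra, the expression
$$ \left( \frac{dr}{dt} \right)^2 = \left( 1 - \frac{2\mu}{r} \right)^2 - \frac{(L/E)^2}{r^2} \left( 1 - \frac{2\mu}{r} \right)^3 \tag{20.24} $$
can then be massaged into the form
$$ \frac{dr}{dt} = \left( 1 - \frac{2\mu}{r} \right) \left[ 1 - \frac{R^2 (1 - 2\mu/r)}{r^2 (1 - 2\mu/R)} \right]^{1/2} . \tag{20.25} $$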
We can integrate this to get the time taken to travel from radial position R to r. It helps to
begin by expanding the integrand to first order in µ/r. After some algebra, we get
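$$ t(r, R) \simeq \int_R^r dr\, \frac{r}{\sqrt{r^2 - R^2}} \left[ 1 + \frac{2\mu}{r} + \frac{\mu R}{r (r + R)} + \ldots \right] . \tag{20.26} $$
Then we integrate, to obtain
$$ t(r, R) \simeq \sqrt{r^2 - R^2} + 2\mu \ln\left[ \frac{r + \sqrt{r^2 - R^2}}{R} \right] + \mu \sqrt{ \frac{r - R}{r + R} } + \ldots . \tag{20.27} $$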
The first term on the RHS here is just what we would have got if we had drawn a straight
line. So the second and third terms are quantifying the bending of photon trajectories.

Now, suppose that we bounced a radar beam out to Venus and back, grazing the Sun.
Then we would have twice the sum of the second and third terms above (twice for there and
back), evaluated for both the Earth and Venus legs of the path. Using the approximation that
the closest approach distance is much less than the
distance of either Earth or Venus from the Sun ($r_E \gg R$, $r_V \gg R$), gives
$$ \Delta t \simeq \frac{4 G_N M}{c^3} \left[ \ln\left( \frac{4 r_E r_V}{R^2} \right) + 1 \right] . \tag{20.28} $$
Note that if we wanted to take into account the gravitational redshift of Earth, this is an
order µE /rE correction to what we have already calculated and therefore negligible.
Experimentally, when Venus is on the opposite side of the Sun to the Earth, the numerical
value of the time delay for a grazing pass of the Sun is about 220 µs, if you convert time
back from metres to seconds. HEL goes into more detail about the experimental nuances in
§10.3. One has to correct for the motion of Venus and Earth in their orbits, their individual
gravitational fields, the variance of reflecting surfaces on Venus, and refraction by the Solar
corona. After all the experimental dust settles, you get the data agreeing in a pretty way
with the GR prediction.
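To see where a delay of a couple of hundred microseconds comes from, one can evaluate the second and third terms of (20.27) directly for the Earth and Venus legs and double the sum for the round trip. This is a rough sketch: the orbital radii, the solar radius (used as the closest-approach distance R for a grazing ray) and the value of c are assumptions brought in from outside the notes, and none of the experimental corrections mentioned above are included.

```python
import math

C_M_PER_S = 2.998e8        # assumption: speed of light
MU_SUN_M = 1.48e3          # gravitational radius of the Sun, as quoted earlier
R_SUN_M = 6.96e8           # assumption: solar radius, used as closest approach R
R_EARTH_ORBIT_M = 1.496e11 # assumption: Earth-Sun distance
R_VENUS_ORBIT_M = 1.082e11 # assumption: Venus-Sun distance


def excess_path(r, closest, mu):
    """Second and third terms of eq. (20.27), in metres, for one leg of the path."""
    root = math.sqrt(r**2 - closest**2)
    log_term = 2 * mu * math.log((r + root) / closest)
    sqrt_term = mu * math.sqrt((r - closest) / (r + closest))
    return log_term + sqrt_term


# Two legs (Sun to Earth, Sun to Venus), doubled for there and back.
round_trip_excess_m = 2 * (
    excess_path(R_EARTH_ORBIT_M, R_SUN_M, MU_SUN_M)
    + excess_path(R_VENUS_ORBIT_M, R_SUN_M, MU_SUN_M)
)
delay_seconds = round_trip_excess_m / C_M_PER_S
print(delay_seconds)
```

With these inputs the excess delay comes out at a few times $10^{-4}$ seconds, the same order as the figure quoted in the text; the exact number depends on how close to the solar limb the beam actually passes.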

21 M30Nov
21.1 Geodesic precession of gyroscopes
Precession of gyroscopes is another experimental test of General Relativity. Gyros are
interesting because they spin on an axis, and this spin vector $s^\mu$ feels the effects of General
Relativity through the physics of parallel transport. Let us see how this works.
The geodesic is a physically special curve because it parallel transports its own tangent
vector,
\[ \frac{d u^\mu}{d\lambda} + \Gamma^\mu{}_{\nu\sigma}\, u^\nu u^\sigma = 0 . \tag{21.1} \]

Physically, the spin must be orthogonal to the tangent vector,

\[ g_{\mu\nu}\, s^\mu u^\nu = 0 . \tag{21.2} \]

In other words, the spin cannot have a timelike component in the instantaneous rest frame
of the test object. If we want this zero inner product to be conserved at all points along the
worldline of the gyro, we need to insist that the spin vector $s^\mu$ be parallel transported,
\[ \frac{d s^\mu}{d\lambda} + \Gamma^\mu{}_{\nu\sigma}\, s^\nu u^\sigma = 0 . \tag{21.3} \]

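To demonstrate the effect we are after, it is sufficient to use the approximation that
Earth’s gravitational field (in which Gravity Probe B, or GPB, flew) is described by the Schwarzschild metric. This
will simplify our computations because there are fewer Christoffel symbols for Schwarzschild
than for Kerr. Imagine that our test gyroscope is orbiting Earth in a circle, in the equatorial
plane of our spherical polar coordinate system. Circular motion occurs at fixed $(r, \theta)$, so that
$u^r(\lambda) = 0$ and $u^\theta(\lambda) = 0\ \forall\,\lambda$. Because $\theta = \pi/2$, $\Gamma^\theta{}_{\phi\phi}$ and $\Gamma^\phi{}_{\theta\phi}$ are zero and $\Gamma^r{}_{\phi\phi} = \Gamma^r{}_{\theta\theta}$. So
our spin parallel transport equations in $(t, r, \theta, \phi)$ coordinates become
\[ \frac{d s^t}{d\lambda} + \Gamma^t{}_{rt}\, s^r u^t = 0 , \tag{21.4} \]
\[ \frac{d s^r}{d\lambda} + \Gamma^r{}_{tt}\, s^t u^t + \Gamma^r{}_{\phi\phi}\, s^\phi u^\phi = 0 , \tag{21.5} \]
\[ \frac{d s^\theta}{d\lambda} = 0 , \tag{21.6} \]
\[ \frac{d s^\phi}{d\lambda} + \Gamma^\phi{}_{r\phi}\, s^r u^\phi = 0 , \tag{21.7} \]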

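where
\[ \Gamma^t{}_{rt} = \frac{\mu}{r^2}\left(1-\frac{2\mu}{r}\right)^{-1} , \quad \Gamma^r{}_{tt} = \frac{\mu}{r^2}\left(1-\frac{2\mu}{r}\right) , \quad \Gamma^r{}_{\phi\phi} = -r\left(1-\frac{2\mu}{r}\right) , \quad \Gamma^\phi{}_{r\phi} = \frac{1}{r} . \tag{21.8} \]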
To proceed further, we need to know something about the normalization of the velocity
vector. We can write it as $[u^\mu] = u^t\,[1, 0, 0, \Omega]^T$, where $\Omega$ is our angular velocity for circular
motion. What is the angular velocity for our case? We actually mentioned the key ingredients
already, in passing, when we discussed massless and massive particle geodesics in the
Schwarzschild spacetime. In particular, we derived the shape equation for (quasi-)elliptical
orbits. Circular orbits are a special case, and the shape equation can easily be rearranged
to find $L$. We obtain
\[ L^2 = \frac{\mu R^2}{R-3\mu} , \tag{21.9} \]
where $R$ is the radius of the circular orbit. Then using the norm condition on the velocity
vector gives
\[ E = \frac{1-2\mu/R}{\sqrt{1-3\mu/R}} . \tag{21.10} \]
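We can also find the angular velocity, by using the geodesic equations to find $\phi(t)$,
\[ \left(\frac{d\phi}{dt}\right)^{2} = \left(\frac{d\phi}{d\lambda}\left(\frac{dt}{d\lambda}\right)^{-1}\right)^{2} . \tag{21.11} \]

After the dust settles, this gives the very simple expression
\[ \Omega^2 = \frac{\mu}{r^3} . \tag{21.12} \]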
The norm of the 4-velocity must be unity, as appropriate to a massive particle (our gyroscope).
This gives the equation
\[ u^0 \equiv u^t = \left[\left(1-\frac{2\mu}{r}\right) - r^2\Omega^2\right]^{-1/2} = \left(1-\frac{3\mu}{r}\right)^{-1/2} . \tag{21.13} \]

In this system, we have $u^r = 0 = u^\theta$, and so the condition that the spin vector be
orthogonal to the velocity vector becomes
\[ \left(1-\frac{2\mu}{r}\right) s^t u^t - r^2 s^\phi u^\phi = 0 . \tag{21.14} \]

Since $u^\phi/u^t = d\phi/dt = \Omega$, we can express $s^t$ in terms of $s^\phi$,
\[ s^t = \frac{\Omega r^2}{(1-2\mu/r)}\, s^\phi . \tag{21.15} \]
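As you can check for yourself, this means that the first and fourth of the parallel transport
equations are equivalent. Then the remaining equations are
\[ \frac{d s^r}{d\lambda} - \frac{r\Omega}{u^t}\, s^\phi = 0 , \qquad \frac{d s^\theta}{d\lambda} = 0 , \qquad \frac{d s^\phi}{d\lambda} + \frac{u^t\,\Omega}{r}\, s^r = 0 . \tag{21.16} \]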
We can convert the experimentally relatively unfamiliar affine parameter $\lambda$ to the coordinate
time $t$ using $u^t = dt/d\lambda$. Using the third equation to eliminate $s^\phi$ from the first gives
for the set of three
\[ \frac{d^2 s^r}{dt^2} + \frac{\Omega^2}{(u^t)^2}\, s^r = 0 , \qquad \frac{d s^\theta}{dt} = 0 , \qquad \frac{d s^\phi}{dt} + \frac{\Omega}{r}\, s^r = 0 . \tag{21.17} \]
This has solution
\[ s^r(t) = s^r(0)\cos\Omega' t , \qquad s^\theta(t) = 0 , \qquad s^\phi(t) = -\frac{\Omega}{r\,\Omega'}\, s^r(0)\sin\Omega' t , \tag{21.18} \]
where
\[ \Omega' = \frac{\Omega}{u^t} = \Omega\sqrt{1-3\mu/r} . \tag{21.19} \]
Therefore, the spatial part of the spin vector is rotating relative to the radial direction $\hat r$
with a coordinate angular speed $-\Omega'$ in the direction $-\hat\phi$. But the radial direction itself is
rotating with coordinate angular speed $+\Omega$. So it is the difference in speeds which gives rise
to geodesic precession.

If you revolve once in a coordinate time $t = 2\pi/\Omega$, the final direction of the spatial spin
vector is $2\pi + \alpha$, where $\alpha = 2\pi(1 - \Omega'/\Omega)$. Per revolution, then, the angular precession is
\[ \alpha = 2\pi\left[1-\sqrt{1-\frac{3\mu}{r}}\,\right] . \tag{21.20} \]

This effect is not very big, but it is cumulative. That means if you can machine almost-
perfect gyros and leave them in orbit for a veeeeeeeery long time, then you have a chance of
these effects adding up and being measurable.
From the GPB website: “Gravity Probe B, launched 20 April 2004, is a space experiment
testing two fundamental predictions of Einstein’s theory of General Relativity (GR),
the geodetic and frame-dragging effects, by means of cryogenic gyroscopes in Earth orbit.
Data collection started 28 August 2004 and ended 14 August 2005. Analysis of the data
from all four gyroscopes results in a geodetic drift rate of −6,601.8 ± 18.3 mas/yr and
a frame-dragging drift rate of −37.2 ± 7.2 mas/yr, to be compared with the GR predictions
of −6,606.1 mas/yr and −39.2 mas/yr, respectively (‘mas’ is milliarc-second; 1 mas =
$4.848 \times 10^{-9}$ radians or $2.778 \times 10^{-7}$ degrees).”
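To see how (21.20) accumulates into a number of this size, here is a rough sketch for a circular orbit around a spherical, non-rotating Earth. The orbital altitude (about 642 km), Earth's mean radius and $G_N M_\oplus$ are assumed illustrative values rather than figures from the notes, and Earth's oblateness and rotation are ignored, so only approximate agreement with the quoted geodetic rate should be expected.

```python
import math

# Assumed illustrative inputs (SI units); not taken from the notes.
GM_EARTH = 3.986004418e14        # G_N * M for Earth, m^3 / s^2
C = 299_792_458.0                # speed of light, m / s
EARTH_RADIUS = 6.371e6           # mean radius, m
ALTITUDE = 6.42e5                # assumed GPB-like orbital altitude, m
SECONDS_PER_YEAR = 365.25 * 86400
MAS_IN_RADIANS = 4.848e-9        # conversion quoted above

def precession_per_orbit(r, mu):
    """Angle alpha of (21.20); rewritten as x / (1 + sqrt(1 - x)) to avoid cancellation."""
    x = 3*mu/r
    return 2*math.pi * x / (1 + math.sqrt(1 - x))

def yearly_drift_mas(r):
    """Accumulated geodetic precession per year, in milliarcseconds."""
    mu = GM_EARTH / C**2                          # geometric mass, metres
    omega = math.sqrt(GM_EARTH / r**3)            # (21.12) with units restored
    orbits_per_year = SECONDS_PER_YEAR * omega / (2*math.pi)
    return precession_per_orbit(r, mu) * orbits_per_year / MAS_IN_RADIANS

if __name__ == "__main__":
    print(round(yearly_drift_mas(EARTH_RADIUS + ALTITUDE)), "mas/yr")
```

The magnitude lands within about a percent of the GR prediction quoted by the GPB team, which is about as well as a spherical-Earth estimate can be expected to do.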

21.2 Accretion disks
Lastly, let us mention one more experimental test of GR: accretion disks around compact
objects. They have matter swirling around the central black hole at millions of Kelvins, and
tend to emit strongly in the X-ray part of the spectrum. Even at such extreme temperatures,
some atoms can retain electrons and then emit radiation as they jump between energy levels,
and one such element is iron. Looking at the shape of the broadened iron emission line from
the whole accretion disk actually gives a probe of the strong-field regime of GR, as we will
now motivate.
There are two types of redshift that operate in this system: gravitational redshift, and
Doppler shifting from relative velocity w.r.t. an observer here on Earth. Supposing that
we view an accretion disk and black hole system side-on, we would see a range of Doppler
shifting depending on which part of the disk we were looking at. This would even happen in
the Newtonian approximation! The really key part is the gravitational redshift. The essential
reason is that the smallest-possible frequency present in the observed spectrum must have
been emitted at the smallest possible value of r, so that it could experience maximum redshift
on the way out. Knowing the radius of the ISCO, we can then get a handle on the biggest
frequency ratio possible.

The ratio of the photon frequency at reception compared to that at emission is given by
\[ \frac{\nu_R}{\nu_E} = \frac{p_\mu(R)\, u^\mu_R}{p_\mu(E)\, u^\mu_E} . \tag{21.21} \]
Using what we derived in the previous experiment’s discussion concerning the angular velocity
and the tangent vector norm condition, you can show with straightforward algebra
that
\[ \frac{\nu_R}{\nu_E} = \frac{p_0(R)}{p_0(E)\, u^0_E + p_3(E)\, u^3_E} = \left(1-\frac{3\mu}{r}\right)^{1/2} \frac{p_0(R)}{p_0(E)} \left[1 \pm \Omega\,\frac{p_3(E)}{p_0(E)}\right]^{-1} , \tag{21.22} \]
where $+$ corresponds to emitting matter on the side of the disk moving towards the observer
and $-$ corresponds to matter on the other side. Now, because Schwarzschild is a stationary
metric, the covariant time component $p_0$ of the photon momentum is conserved along a
geodesic, so that $p_0(R) = p_0(E)$.
Our last ingredient is to find the ratio $p_3(E)/p_0(E)$, and this is done using the null
photon momentum norm condition. Working in the equatorial plane, we find
\[ \left(1-\frac{2\mu}{r}\right)^{-1}(p_0)^2 - \left(1-\frac{2\mu}{r}\right)(p_1)^2 - \frac{1}{r^2}(p_3)^2 = 0 . \tag{21.23} \]
To get any further for a general angle between the accretion disk and us, we would need to
recruit the full photon geodesic equations. But in two special cases we can actually do a slick
avoidance manoeuvre and finesse this issue! When the matter is transverse to the observer
(or in a face-on disk), $\phi = 0, \pi$. Then $p_3(E) = 0$, and so
\[ \frac{\nu_R}{\nu_E} = \sqrt{1-\frac{3\mu}{r}} . \tag{21.24} \]
When matter moves either directly towards or away from the observer, $\phi = \pm\pi/2$. Then
the radial component of the photon momentum is zero, and so
\[ \frac{\nu_R}{\nu_E} = \frac{\sqrt{1-3\mu/r}}{1 \pm 1/\sqrt{r/\mu-2}} . \tag{21.25} \]
Evaluating these at the ISCO, you find that the smallest frequency represented in the iron
emission line will be $\nu_R/\nu_E = \sqrt{2}/3 \simeq 0.47$ for edge-on disks, from (21.25), and
$1/\sqrt{2} \simeq 0.71$ for face-on disks, from (21.24).
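These limiting ratios are quick to evaluate. The sketch below plugs the Schwarzschild ISCO radius into (21.24) and (21.25); the value $r = 6\mu$ is the standard Schwarzschild result and is assumed here rather than rederived, and the function names are illustrative.

```python
import math

ISCO_RADIUS_OVER_MU = 6.0   # assumed standard Schwarzschild ISCO, r = 6 mu

def transverse_ratio(r_over_mu):
    """nu_R / nu_E from (21.24): emitter moving transverse to the line of sight."""
    return math.sqrt(1 - 3/r_over_mu)

def line_of_sight_ratios(r_over_mu):
    """The two branches of (21.25), returned as (smaller, larger)."""
    base = math.sqrt(1 - 3/r_over_mu)
    doppler = 1/math.sqrt(r_over_mu - 2)
    return base/(1 + doppler), base/(1 - doppler)

if __name__ == "__main__":
    print(transverse_ratio(ISCO_RADIUS_OVER_MU))
    print(line_of_sight_ratios(ISCO_RADIUS_OVER_MU))
```

At $r = 6\mu$ the transverse case gives $1/\sqrt{2}$, while the two line-of-sight branches give $\sqrt{2}/3$ and $\sqrt{2}$, so the lowest frequency in the line comes from the Doppler-redshifted branch of (21.25).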

22 R03Dec
The following gravitational waves material is based on §17 and §18 of HEL. All figures shown
are from HEL.

22.1 Finding the wave equation for metric perturbations


For this section we will assume that the cosmological constant is zero and that spacetime is
approximately flat. We will figure out the equations obeyed by small (perturbative) ripples
in the fabric of spacetime about the Minkowski metric, which are known as gravitational
waves. To begin, we assume that

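\[ g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu} , \tag{22.1} \]

where $|h_{\mu\nu}| \ll 1$. To first order in small quantities,

\[ g^{\mu\nu} = \eta^{\mu\nu} - h^{\mu\nu} , \tag{22.2} \]

where we raise and lower indices to this order by using the Minkowski metric,

\[ h^{\mu\nu} = \eta^{\mu\rho}\eta^{\nu\sigma} h_{\rho\sigma} . \tag{22.3} \]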

It is important to know how the perturbations are affected by changes of coordinates.


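Under global Lorentz transformations $x'^\mu = \Lambda^\mu{}_\nu x^\nu$, we know that
\[ \begin{aligned} g'_{\mu\nu} &= \frac{\partial x^\rho}{\partial x'^\mu}\frac{\partial x^\sigma}{\partial x'^\nu}\, g_{\rho\sigma} \\ &= \Lambda_\mu{}^\rho \Lambda_\nu{}^\sigma \left(\eta_{\rho\sigma} + h_{\rho\sigma}\right) \\ &= \eta_{\mu\nu} + \Lambda_\mu{}^\rho \Lambda_\nu{}^\sigma h_{\rho\sigma} \end{aligned} \tag{22.4} \]

because the Minkowski metric is invariant under global Lorentz transformations. Therefore,

\[ h'_{\mu\nu} = \Lambda_\mu{}^\rho \Lambda_\nu{}^\sigma h_{\rho\sigma} . \tag{22.5} \]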

In other words, hµν transforms like a tensor under global Lorentz transformations.
We can also ask about how perturbations in spacetime are affected by a general coordinate
transformation of the form
\[ x'^\mu = x^\mu + \xi^\mu(x) . \tag{22.6} \]
Therefore,
\[ \frac{\partial x'^\mu}{\partial x^\nu} = \delta^\mu_\nu + \partial_\nu \xi^\mu . \tag{22.7} \]
By eye, we can see using the above equation that to first order in small quantities the inverse
transformation obeys
\[ \frac{\partial x^\mu}{\partial x'^\nu} = \delta^\mu_\nu - \partial_\nu \xi^\mu . \tag{22.8} \]
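Accordingly, under these general coordinate transformations, we have
\[ \begin{aligned} g'_{\mu\nu} &= \left(\delta^\rho_\mu - \partial_\mu \xi^\rho\right)\left(\delta^\sigma_\nu - \partial_\nu \xi^\sigma\right)\left(\eta_{\rho\sigma} + h_{\rho\sigma}\right) \\ &= \eta_{\mu\nu} + \left(h_{\mu\nu} - \partial_\mu \xi_\nu - \partial_\nu \xi_\mu\right) \end{aligned} \tag{22.9} \]

where we defined $\xi_\mu = \eta_{\mu\nu}\xi^\nu$, and worked to first order in small quantities. Therefore, the
transformation law of the perturbations under general coordinate transformations (22.6) is

\[ h'_{\mu\nu} = h_{\mu\nu} - \partial_\mu \xi_\nu - \partial_\nu \xi_\mu . \tag{22.10} \]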

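What are the Christoffels to first order in perturbations? We did this type of approximation
earlier when we first linearized a GR expression to recover its Newtonian limit. Here,
we obtain
\[ \Gamma^\sigma{}_{\mu\nu} = \frac{1}{2}\left(\partial_\nu h^\sigma{}_\mu + \partial_\mu h^\sigma{}_\nu - \partial^\sigma h_{\mu\nu}\right) . \tag{22.11} \]
From this, it follows that, to first order in perturbations,
\[ R^\sigma{}_{\mu\nu\rho} = \frac{1}{2}\left(\partial_\nu \partial_\mu h^\sigma{}_\rho + \partial_\rho \partial^\sigma h_{\mu\nu} - \partial_\nu \partial^\sigma h_{\mu\rho} - \partial_\rho \partial_\mu h^\sigma{}_\nu\right) . \tag{22.12} \]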
2
The really neat thing about this expression for the Riemann tensor is that it is invariant
under general coordinate transformations (22.6). As you can check, this property also holds
for the Ricci tensor and the Ricci scalar. For convenience, let us define

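\[ h = h^\sigma{}_\sigma \tag{22.13} \]

and write
\[ \Box = \partial^\mu \partial_\mu . \tag{22.14} \]
Then we obtain
\[ R_{\mu\nu} = \frac{1}{2}\left(\partial_\mu \partial_\nu h + \Box h_{\mu\nu} - \partial_\mu \partial_\rho h^\rho{}_\nu - \partial_\nu \partial_\rho h^\rho{}_\mu\right) , \tag{22.15} \]
and
\[ R = \Box h - \partial_\mu \partial_\nu h^{\mu\nu} . \tag{22.16} \]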
Plugging the above expressions into the Einstein equations yields a second-order PDE
for the perturbations. In order to aid in wrangling all the pertinent algebra, it is convenient
to define the trace reverse of $h_{\mu\nu}$,
\[ \bar h_{\mu\nu} \equiv h_{\mu\nu} - \frac{1}{2}\eta_{\mu\nu}\, h . \tag{22.17} \]
This obeys the property that
\[ \bar{\bar h}_{\mu\nu} = h_{\mu\nu} . \tag{22.18} \]
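The involution property (22.18) is easy to confirm numerically in $D = 4$. The following sketch applies the trace reverse (22.17) twice to a randomly generated symmetric perturbation; the signature $(+,-,-,-)$ is assumed, and the random matrix is purely a hypothetical test input.

```python
import random

# Minkowski metric, signature (+,-,-,-) assumed; numerically equal to its own inverse.
ETA = [[1.0, 0, 0, 0], [0, -1.0, 0, 0], [0, 0, -1.0, 0], [0, 0, 0, -1.0]]

def trace(h):
    """h = eta^{mu nu} h_{mu nu}."""
    return sum(ETA[m][n] * h[m][n] for m in range(4) for n in range(4))

def trace_reverse(h):
    """Trace reverse of (22.17): h_{mu nu} - (1/2) eta_{mu nu} h."""
    t = trace(h)
    return [[h[m][n] - 0.5 * ETA[m][n] * t for n in range(4)] for m in range(4)]

def random_symmetric(seed=0):
    """Hypothetical symmetric 4x4 perturbation used only as test input."""
    rng = random.Random(seed)
    h = [[0.0] * 4 for _ in range(4)]
    for m in range(4):
        for n in range(m, 4):
            h[m][n] = h[n][m] = rng.uniform(-1, 1)
    return h

if __name__ == "__main__":
    h = random_symmetric()
    twice = trace_reverse(trace_reverse(h))
    print(max(abs(twice[m][n] - h[m][n]) for m in range(4) for n in range(4)))
```

The same helper also shows that the trace flips sign under trace reversal in four dimensions, since $\eta^{\mu\nu}\eta_{\mu\nu} = 4$.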

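We also have (again to first order in perturbations)

\[ \bar h = \eta^{\mu\nu} \bar h_{\mu\nu} , \tag{22.19} \]

which obeys $\bar h = (1 - D/2)\,h$. In $D = 4$, $\bar h = -h$. In terms of $\bar h_{\mu\nu}$, the Einstein equations
become
\[ \Box \bar h_{\mu\nu} + \eta_{\mu\nu}\, \partial_\rho \partial_\sigma \bar h^{\rho\sigma} - \partial_\nu \partial_\rho \bar h^\rho{}_\mu - \partial_\mu \partial_\rho \bar h^\rho{}_\nu = -16\pi G_N T_{\mu\nu} . \tag{22.20} \]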
On the face of it, this equation does not look very much like a familiar wave equation
involving the d’Alembertian!
In order to figure out what our Einstein equations for the perturbations imply physically,
it is crucial that we understand how $\bar h_{\mu\nu}$ transforms under general coordinate transformations
(22.6). First, recall our equation (22.10), $h'_{\mu\nu} = h_{\mu\nu} - \partial_\mu \xi_\nu - \partial_\nu \xi_\mu$. From this, it follows
directly that
\[ \begin{aligned} h' &= \eta^{\mu\nu}\left(h_{\mu\nu} - \partial_\mu \xi_\nu - \partial_\nu \xi_\mu\right) \\ &= h - 2\,\partial_\mu \xi^\mu . \end{aligned} \tag{22.21} \]

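Therefore,
\[ \begin{aligned} \bar h'^{\mu\rho} &= h'^{\mu\rho} - \frac{1}{2}\eta^{\mu\rho}\, h' \\ &= \left(h^{\mu\rho} - \partial^\mu \xi^\rho - \partial^\rho \xi^\mu\right) - \frac{1}{2}\eta^{\mu\rho}\left(h - 2\,\partial_\sigma \xi^\sigma\right) \\ &= \bar h^{\mu\rho} - \partial^\mu \xi^\rho - \partial^\rho \xi^\mu + \eta^{\mu\rho}\,\partial_\sigma \xi^\sigma . \end{aligned} \tag{22.22} \]

Taking the partial derivative of this expression gives

\[ \partial_\rho \bar h'^{\mu\rho} = \partial_\rho \bar h^{\mu\rho} - \Box\, \xi^\mu . \tag{22.23} \]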

So far, this description of algebra manipulations might seem a tad dry. But this is where the
real money is to be made in careful observation. Suppose that we are smart enough to choose
a coordinate system in which $\Box\,\xi^\mu = \partial_\rho \bar h^{\mu\rho}$. Then $\partial_\rho \bar h'^{\mu\rho} = 0$, which massively simplifies
the Einstein equation. In particular, all the terms on the LHS which did not involve the
d’Alembertian operator become equal to zero in this coordinate system. Wow!
To summarize, let us drop the primes for clarity, raise the indices with $\eta$, and write our
Einstein equation in this awesome new coordinate system,
\[ \Box \bar h^{\mu\nu} = -\frac{16\pi G_N}{c^4}\, T^{\mu\nu} . \tag{22.24} \]
In order for our metric perturbations to obey this simple wave equation,
our coordinate system must obey
\[ \partial_\mu \bar h^{\mu\nu} = 0 . \tag{22.25} \]
Any further coordinate change $x^\mu \to x^\mu + \xi^\mu$ within this gauge class would be OK, as long
as it satisfied $\Box\,\xi^\mu = 0$. This is very reminiscent of the Lorentz gauge in electromagnetism,
$\partial_\mu A^\mu = 0$, which still allows further gauge transformations of the form $A^\mu \to A^\mu + \partial^\mu \lambda$,
where $\Box\,\lambda = 0$. Accordingly, this gauge for metric perturbations is sometimes rather loosely
called the Lorentz gauge. More properly, it is called the de Donder gauge.

22.2 Solving the linearized Einstein equations


As always, if we are trying to solve a wave equation, it helps to start by finding the Green’s
function,
\[ \Box_x\, G(x^\sigma - y^\sigma) = \delta^{(4)}(x^\sigma - y^\sigma) . \tag{22.26} \]
As explained in detail in HEL §17.6, this is solved by the retarded Green’s function

\[ G(x^\sigma) = \frac{\delta(x^0 - |\vec x|)\,\theta(x^0)}{4\pi |\vec x|} , \tag{22.27} \]

as you can check by substituting it in. Note that the retarded Green’s function (as compared
to, say, the advanced Green’s function) is required by causality: we cannot expect a
gravitational wave to be influenced by sources in its future light cone, only those in its past
light cone. Using the retarded Green’s function, we can see immediately that the solution
to the Einstein equation for the metric perturbation is

\[ \bar h^{\mu\nu}(ct, \vec x) = -\frac{4 G_N}{c^4} \int d^3\vec y\ \frac{T^{\mu\nu}(ct_r, \vec y)}{|\vec x - \vec y|} . \tag{22.28} \]

HEL Fig.17.2 is very helpful for visualizing the meaning of the retarded time variable $t_r$ in
this equation, which is defined by

\[ ct_r = ct - |\vec x - \vec y| . \tag{22.29} \]

Plodding through the details of how to check whether this satisfies the de Donder gauge
condition requires careful attention to the retarded time story, using the chain rule for
derivatives, and integration by parts. The net result is
\[ \frac{\partial}{\partial x^\mu} \bar h^{\mu\nu} = -\frac{4 G_N}{c^4} \int d^3\vec y\ \frac{1}{|\vec x - \vec y|}\, \frac{\partial}{\partial y^\mu} T^{\mu\nu}(y^0, \vec y) . \tag{22.30} \]

But since the energy-momentum tensor is conserved in the linearized theory,

\[ \partial_\mu T^{\mu\nu} = 0 , \tag{22.31} \]

we have what we need:

\[ \partial_\mu \bar h^{\mu\nu} = 0 . \tag{22.32} \]
A very important idea from electromagnetism was the multipole expansion. Here,
it is the conserved energy-momentum that sources our gravitational wave, rather than the
conserved current sourcing the EM wave, but the principle is analogous. In an asymptotically
flat spacetime, higher partial waves fall off with higher powers of distance, so the lowest
pertinent multipole moment for a compact source dominates the physics of wave propagation
far from the source. As you learned in 3rd year EM class, in order to generate EM waves, a
time-dependent dipole moment is needed. In order to generate gravitational waves, it turns
out that we will need a time-dependent quadrupole moment.
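To start our way towards that result, let us Taylor expand the denominator in the integral
for $\bar h^{\mu\nu}$, with $|\vec x| = r$ and small $\vec y$,
\[ \begin{aligned} \frac{1}{|\vec x - \vec y|} &\simeq \frac{1}{r} + (-y^i)\,\partial_i\!\left(\frac{1}{r}\right) + \frac{1}{2!}(-y^i)(-y^j)\,\partial_i \partial_j\!\left(\frac{1}{r}\right) + \ldots \\ &= \frac{1}{r} + y^i\,\frac{x_i}{r^3} + y^i y^j\,\frac{3 x_i x_j - \delta_{ij} r^2}{2 r^5} + \ldots \end{aligned} \tag{22.33} \]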
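Motivated by this, we define the multipoles
\[ M^{\mu\nu\sigma_1\sigma_2\ldots\sigma_\ell}(ct_r) = \int d^3\vec y\ T^{\mu\nu}(ct_r, \vec y)\, y^{\sigma_1} y^{\sigma_2} \ldots y^{\sigma_\ell} , \tag{22.34} \]

and obtain

\[ \bar h^{\mu\nu}(ct, \vec x) = -\frac{4 G_N}{c^4} \sum_{\ell=0}^{\infty} \frac{(-1)^\ell}{\ell!}\, M^{\mu\nu\sigma_1\sigma_2\ldots\sigma_\ell}(ct_r)\, \partial_{\sigma_1}\partial_{\sigma_2}\ldots\partial_{\sigma_\ell}\!\left(\frac{1}{r}\right) . \tag{22.35} \]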

For the case of a compact source, we can use these general expressions to find approximations
for our linearized metric perturbations. First we need to consider what the
components of $T^{\mu\nu}$ tell us physically. $T^{00}$ is the energy density of the source particles, and if
this is integrated over all space then it gives $Mc^2$, the conserved energy. $T^{0i}$ is the momentum
density of source particles, and if this is integrated over all space it gives $P^i c$, which is
also conserved at this order in perturbations. The $T^{ij}$ are the internal stresses, and they are
not necessarily zero when integrated over all space. Without loss of generality, we may take
our spatial coordinates $x^i$ to be in the centre of momentum (CoM) frame of the particles, so
that $P^i = 0$. Then in CoM coordinates,
\[ \bar h^{00} = -\frac{4 G_N M}{c^2 r} , \qquad \bar h^{0i} = \bar h^{i0} = 0 . \tag{22.36} \]
The remaining parts are
\[ \bar h^{ij}(ct, \vec x) = -\frac{4 G_N}{c^4 r} \left[\int d^3\vec y\ T^{ij}(ct', \vec y)\right]_{ct' = ct - r} . \tag{22.37} \]
It is not especially easy to compute this integral directly. HEL explain carefully in §17.8
that a slightly indirect yet algebraically shorter route can be found by recruiting energy-momentum
conservation $\partial_\mu T^{\nu\mu} = 0$. In a 3+1 split, we have
\[ \begin{aligned} 0 &= \partial_0 T^{00} + \partial_k T^{0k} , \\ 0 &= \partial_0 T^{i0} + \partial_k T^{ik} . \end{aligned} \tag{22.38} \]
These two equations can be used to turn our integral over $T^{ij}$ into integrals over higher
moments of $T^{0i}$ and $T^{00}$. The first trick is to consider the integral of $\partial_k(T^{ik} y^j)$ over a volume
completely enclosing the source and to use Gauss’s theorem. The spatial conservation equation
(the second line of (22.38)) then yields
\[ \int d^3\vec y\ T^{ij} = \frac{1}{2c}\frac{d}{dt_r} \int d^3\vec y \left(T^{i0} y^j + T^{j0} y^i\right) . \tag{22.39} \]
The second trick is to consider the integral of $\partial_k(T^{0k} y^i y^j)$ over the same enclosing volume,
now using the first line of (22.38); it yields
\[ \int d^3\vec y\ T^{ij} = \frac{1}{2c^2}\frac{d^2}{dt_r^2} \int d^3\vec y\ T^{00}\, y^i y^j . \tag{22.40} \]
Defining the quadrupole moment $I^{ij}$ by
\[ I^{ij}(ct) = \int d^3\vec y\ T^{00}(ct, \vec y)\, y^i y^j \tag{22.41} \]

gives the solution

\[ \bar h^{ij}(ct, \vec x) = -\frac{2 G_N}{c^6 r} \left[\frac{d^2 I^{ij}(ct')}{dt'^2}\right]_{t' = t_r} . \tag{22.42} \]

This is known as the quadrupole formula.
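As an illustration of the quadrupole formula (22.42), consider two equal point masses on a circular orbit in the $(x, y)$ plane. The sketch below builds $I^{xx}$ from (22.41) with $T^{00} = \rho c^2$ for slowly moving point masses (an assumption), takes the second time derivative by finite differences, and compares with the analytic result $\bar h^{xx} = 8 G_N m a^2 \omega^2 \cos(2\omega t_r)/(c^4 r)$. All parameter values are hypothetical and are not tuned to be a self-consistent Kepler orbit.

```python
import math

G = 6.674e-11            # Newton's constant, SI
C = 299_792_458.0        # speed of light, m / s

# Hypothetical binary parameters, chosen only for illustration.
MASS = 2.8e30            # each body, kg
ORBIT_RADIUS = 1.0e9     # distance of each body from the centre of mass, m
OMEGA = 2.0e-4           # orbital angular frequency, rad / s
DISTANCE = 1.0e20        # distance r to the observer, m

def quadrupole_xx(t):
    """I^{xx} of (22.41) for point masses at +/- a (cos wt, sin wt, 0), with T^{00} = rho c^2."""
    return 2 * MASS * C**2 * (ORBIT_RADIUS * math.cos(OMEGA * t))**2

def hbar_xx_numeric(t_ret, dt=10.0):
    """(22.42) with the second derivative taken by central differences."""
    second = (quadrupole_xx(t_ret + dt) - 2*quadrupole_xx(t_ret) + quadrupole_xx(t_ret - dt)) / dt**2
    return -2 * G / (C**6 * DISTANCE) * second

def hbar_xx_analytic(t_ret):
    """Closed form obtained by differentiating cos^2(wt) twice."""
    return 8 * G * MASS * ORBIT_RADIUS**2 * OMEGA**2 * math.cos(2*OMEGA*t_ret) / (C**4 * DISTANCE)

if __name__ == "__main__":
    print(hbar_xx_numeric(0.0), hbar_xx_analytic(0.0))
```

Note that the wave oscillates at twice the orbital frequency, a direct consequence of the source being quadrupolar.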

23 M07Dec
As a warm-up example of solving for gravitational perturbations, we can consider a stationary
non-relativistic source which is a perfect fluid. In this case the energy-momentum
tensor is constant in time, and the distinction between time and retarded time is irrelevant.
Then we have directly that
\[ \bar h^{\mu\nu}(\vec x) = -\frac{4 G_N}{c^4} \int d^3\vec y\ \frac{T^{\mu\nu}(\vec y)}{|\vec x - \vec y|} . \tag{23.1} \]
When our perfect fluid is non-relativistic, all speeds are much smaller than $c$ and to lowest
order in perturbation theory we can neglect the pressure. This gives

\[ T^{00} = \rho c^2 , \qquad T^{0i} = \rho c\, u^i , \qquad T^{ij} = \rho\, u^i u^j , \tag{23.2} \]

where $\rho$ is the proper density distribution of the source. The solution can be written as
\[ \bar h^{00} = \frac{4\Phi}{c^2} , \qquad \bar h^{0i} = \frac{A^i}{c} , \qquad \bar h^{ij} = 0 , \tag{23.3} \]
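where
\[ \begin{aligned} \Phi(\vec x) &= -G_N \int d^3\vec y\ \frac{\rho(\vec y)}{|\vec x - \vec y|} , \\ A^i(\vec x) &= -\frac{4 G_N}{c^2} \int d^3\vec y\ \frac{\rho(\vec y)\, u^i(\vec y)}{|\vec x - \vec y|} . \end{aligned} \tag{23.4} \]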
We can easily obtain $h^{\mu\nu}$ as functions of $\bar h^{\mu\nu}$ using our earlier formula connecting them. The
result is
\[ h^{00} = h^{11} = h^{22} = h^{33} = \frac{2\Phi}{c^2} , \qquad h^{0i} = \frac{A^i}{c} . \tag{23.5} \]
This provides the derivation that we promised quite a long time ago of the lowest-order
Newtonian approximation to the spacetime metric,
\[ ds^2 = \left(1 + \frac{2\Phi}{c^2}\right)(c\,dt)^2 + \frac{2 A_i}{c}\,(c\,dt)\,dx^i - \left(1 - \frac{2\Phi}{c^2}\right)\delta_{ij}\, dx^i dx^j , \tag{23.6} \]
with the bonus that we now allow for slow rotation of the source. An example of a stationary
non-relativistic source would be a rigidly rotating sphere.

23.1 Gravitational plane waves


Another simple example of solving for gravitational perturbations is the case of gravitational
plane waves. These take the form
\[ \bar h^{\mu\nu} = \frac{1}{2} A^{\mu\nu} \exp(i k_\rho x^\rho) + \text{c.c.} \tag{23.7} \]
The de Donder gauge condition requires that the polarization tensor $A^{\mu\nu}$ obeys

\[ k_\mu A^{\mu\nu} = 0 , \tag{23.8} \]

i.e., it is transverse to the direction of propagation of the wave.
Let us count polarizations. We started off with ten components of our symmetric tensor
metric perturbations. Fixing de Donder gauge reduces that to six independent components.
We can further fix the gauge by doing a coordinate transformation $x^\mu \to x^\mu + \xi^\mu$, as long as we
stay within de Donder gauge, which further reduces the number of independent components
down to two. Let us see how this works, in more detail. Consider a $\xi^\mu$ of the form

\[ \xi^\mu = \epsilon^\mu \exp(i k_\nu x^\nu) . \tag{23.9} \]
This clearly obeys $\Box\,\xi^\mu = 0$ if $\epsilon^\mu = $ const. Under this transformation, we know how $\bar h^{\mu\nu}$
transforms, which tells us that the polarization tensor must also transform as

\[ A'^{\mu\nu} = A^{\mu\nu} - i\epsilon^\mu k^\nu - i\epsilon^\nu k^\mu + i\eta^{\mu\nu} \epsilon^\rho k_\rho . \tag{23.10} \]

Let our wavevector lie along the $z$-direction: $\vec k = k\hat z$. Then by our de Donder gauge condition,
$A^{\mu 3} = A^{\mu 0}\ \forall\mu$. Using this and the above two equations, we can straightforwardly show that
the components of $\epsilon^\mu$ can always be chosen to ensure that the only nonzero components of
the polarization tensor are
\[ [A^{\mu\nu}_{TT}] = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & a & b & 0 \\ 0 & b & -a & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} . \tag{23.11} \]
This cleverly chosen gauge is known as the transverse traceless (TT) gauge. If we wish,
we can write the polarization tensor as

\[ A^{\mu\nu}_{TT} = a\, e_+^{\mu\nu} + b\, e_\times^{\mu\nu} . \tag{23.12} \]
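The two defining properties of the TT polarization tensor can be verified directly. This minimal sketch builds the matrix (23.11) for arbitrary amplitudes $a$ and $b$ and checks transversality, $k_\mu A^{\mu\nu} = 0$, and tracelessness, $\eta_{\mu\nu} A^{\mu\nu} = 0$, assuming signature $(+,-,-,-)$ and a null wavevector along $z$; the function names are illustrative.

```python
# Minkowski metric with assumed signature (+,-,-,-).
ETA = [[1.0, 0, 0, 0], [0, -1.0, 0, 0], [0, 0, -1.0, 0], [0, 0, 0, -1.0]]

def polarization_tt(a, b):
    """Polarization tensor A^{mu nu} of (23.11) for a wave travelling along z."""
    return [[0.0, 0.0, 0.0, 0.0],
            [0.0, a, b, 0.0],
            [0.0, b, -a, 0.0],
            [0.0, 0.0, 0.0, 0.0]]

def contract_with_k(A, k_lower):
    """k_mu A^{mu nu}, returned as a list over nu."""
    return [sum(k_lower[m] * A[m][n] for m in range(4)) for n in range(4)]

def trace(A):
    """eta_{mu nu} A^{mu nu}."""
    return sum(ETA[m][n] * A[m][n] for m in range(4) for n in range(4))

if __name__ == "__main__":
    k = 1.0
    k_lower = (k, 0.0, 0.0, -k)     # k^mu = (k, 0, 0, k) with the index lowered
    A = polarization_tt(0.3, -0.7)
    print(contract_with_k(A, k_lower), trace(A))
```

Only the two numbers $a$ and $b$ survive, matching the count of two independent polarizations.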

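More generally, we define the TT gauge via

\[ \bar h^{0i}_{TT} \equiv 0 , \qquad \bar h_{TT} \equiv 0 . \tag{23.13} \]
Using this and the de Donder gauge condition $\partial_\mu \bar h^{\mu\nu}_{TT} = 0$, we have that
\[ \partial_0 \bar h^{00}_{TT} = 0 , \qquad \partial_i \bar h^{ij}_{TT} = 0 . \tag{23.14} \]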

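What effect does such a gravitational plane wave have on a bunch of free particles? We
can work this out by using the geodesic equation for their motion,
\[ \frac{d u^\sigma}{d\tau} + \Gamma^\sigma{}_{\mu\nu}\, u^\mu u^\nu = 0 . \tag{23.15} \]

Suppose a particle is initially at rest before the wave comes by. Then $[u^\mu] = c\,[1, \vec 0\,]^T$, and so
\[ \begin{aligned} \frac{d u^\sigma}{d\tau} &= -c^2\, \Gamma^\sigma{}_{00} \\ &= -\frac{c^2}{2}\, \eta^{\sigma\rho}\left(\partial_0 h_{\rho 0} + \partial_0 h_{0\rho} - \partial_\rho h_{00}\right) \\ &= 0 \end{aligned} \tag{23.16} \]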
because we are working in TT gauge. So hey: our coordinate system is adapted to individual
particles! But even though the coordinate separation of particles is constant, their physical
separation is not, because $\bar h^{\mu\nu} \neq 0$. Let us parametrize the coordinate spatial separation
between two nearby particles as $S^i$. Then the physical spatial separation is
\[ \ell^2 \equiv -g_{ij}\, S^i S^j = (\delta_{ij} - h_{ij})\, S^i S^j . \tag{23.17} \]
To first order in perturbations, we can define a new physical separation vector $\zeta^i$ by
\[ \ell^2 = \delta_{ij}\, \zeta^i \zeta^j , \tag{23.18} \]
or
\[ \zeta^i = S^i + \frac{1}{2} h^i{}_k\, S^k . \tag{23.19} \]
To see the effect of our gravitational plane wave in the $z$ direction, let us inspect two particles
in the $(x, y)$ plane. Then $S^3 = 0$. Also, because $h^{k3}_{TT} = 0\ \forall k$, there is no change in their
$z$-separation due to the plane wave. Their moving around is going to happen in the $(x, y)$
plane only. Another advantage of TT gauge is that $\bar h = 0$, which also implies that $h = 0$.
Picking the $e_+^{\mu\nu}$ polarization tensor for definiteness, we find easily that

\[ h^{\mu\nu}_{TT} = a\, e_+^{\mu\nu} \cos(k_\rho x^\rho) = a\, e_+^{\mu\nu} \cos[k(x^0 - x^3)] , \tag{23.20} \]
where $k = |\vec k| = \omega/c$. So
\[ (\zeta^i) = (S^1, S^2, 0)^T - \frac{a}{2}\cos[k(x^0 - x^3)]\,(S^1, -S^2, 0)^T . \tag{23.21} \]
This is illustrated nicely in Fig.18.1 of HEL.
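The displacement pattern in (23.21) can be sketched numerically for a ring of test particles in the $(x, y)$ plane. The helper names, the ring radius and the (unrealistically large) amplitude $a = 0.2$ are illustrative assumptions; `phase` stands for $k(x^0 - x^3)$.

```python
import math

def displaced(separation, amplitude, phase):
    """Physical separation (zeta^1, zeta^2) from (23.21) for the + polarization."""
    s1, s2 = separation
    factor = 0.5 * amplitude * math.cos(phase)
    return (s1 - factor * s1, s2 + factor * s2)

def ring(n=8, radius=1.0):
    """Coordinate separations of n particles on a circle around a central particle."""
    return [(radius * math.cos(2*math.pi*i/n), radius * math.sin(2*math.pi*i/n))
            for i in range(n)]

if __name__ == "__main__":
    for phase in (0.0, math.pi/2, math.pi):
        points = [displaced(s, 0.2, phase) for s in ring()]
        print(phase, [(round(x, 3), round(y, 3)) for x, y in points])
```

At phase $0$ the ring is squeezed along $x$ and stretched along $y$; half a period later the roles swap, while at a quarter period the ring is momentarily circular again.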

For the other case of the crossed polarization $e_\times^{\mu\nu}$, Fig.18.2 of HEL shows how to
visualize its effect.

Either way, you can think of gravitational waves as stretchy-squeezy waves.

23.2 Energy loss from gravitational radiation
There is no local notion of gravitational energy density in GR, because we could always
change it via a coordinate transformation. Also, in generic spacetimes in GR, neither energy
nor momentum is conserved. But we can still motivate an expression for the energy-momentum
tensor of the gravitational field itself in the perturbative approximation, in order
to allow us to derive the famous formula for the power radiated by gravitational waves.
We began our perturbative treatment of spacetime metric perturbations from
the full equations,
\[ G_{\mu\nu} = -\frac{8\pi G_N}{c^4}\, T_{\mu\nu} . \tag{23.22} \]
Now imagine that we go one step beyond linear order, keeping up to second-order terms in
small quantities. Then we have
\[ G^{(1)}_{\mu\nu} + G^{(2)}_{\mu\nu} + \ldots = -\frac{8\pi G_N}{c^4}\, T_{\mu\nu} . \tag{23.23} \]
We could try moving the second-order approximation to the Einstein tensor over to the RHS
and calling it $t^{\rm grav}_{\mu\nu}$. The problem with this idea is that unfortunately this expression is not
gauge invariant. HEL explain in detail how to fix this by averaging over a small region about
any given point and writing
\[ t^{\rm grav}_{\mu\nu} \equiv \frac{c^4}{8\pi G_N}\, \langle G^{(2)}_{\mu\nu} \rangle . \tag{23.24} \]
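After a good deal of fairly unilluminating algebra, the resulting expression becomes

\[ \begin{aligned} t^{\rm grav}_{\mu\nu} = {} & \frac{c^4}{32\pi G_N} \left\langle (\partial_\mu \bar h_{\rho\sigma})\,\partial_\nu \bar h^{\rho\sigma} - 2\,(\partial_\sigma \bar h^{\rho\sigma})\,\partial_{(\mu} \bar h_{\nu)\rho} - \frac{1}{2}(\partial_\mu \bar h)\,\partial_\nu \bar h \right\rangle \\ & - \left\langle \bar h_{\rho(\mu} T_{\nu)}{}^{\rho} + \frac{1}{4}\,\eta_{\mu\nu}\, h^{\rho\sigma} T_{\rho\sigma} \right\rangle . \end{aligned} \tag{23.25} \]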
The key property of this thing is that it is invariant under gauge transformations, as required.
We consider gravitational plane waves in vacuo so that only the top line will appear for us.
Now, in TT de Donder gauge, we have $\partial_\mu \bar h^{\mu\nu}_{TT} = 0$, $\bar h_{TT} = 0$, and $\bar h^{\mu\nu}_{TT} = h^{\mu\nu}_{TT}$. So then
in vacuo, we have only the first term in the complicated expression above turned on. In our
TT gauge, considering only the radiative part of the gravitational field shows that $\bar h^{0i}_{TT} = 0$,
so that in fact only the spatial components of the perturbations actually appear,

\[ t^{\rm grav}_{\mu\nu} = \frac{c^4}{32\pi G_N}\, \left\langle (\partial_\mu h^{TT}_{ij})(\partial_\nu h^{ij}_{TT}) \right\rangle . \tag{23.26} \]

Physically, the energy flux (energy/area/time) in the $n^i$ spatial direction is

\[ F(\vec n) = -c\, t^{0k} n_k = +c\, \delta_{kj}\, t^{0k} n^j , \tag{23.27} \]

in our signature convention, because in general an energy-momentum tensor $t^{\mu\nu}$ encodes the
flux of $\mu$-momentum in the $\nu$-direction.
Let us consider a compact source and aim for the far-field result, choosing $\vec n$ to be
pointing in the radial direction away from the source. Then we have
\[ F(\hat r) = -\frac{c^4}{32\pi G_N}\, \left\langle (\partial_t h^{TT}_{ij})\,\partial_r h^{ij}_{TT} \right\rangle . \tag{23.28} \]
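But from our quadrupole formula from earlier, we know that
\[ \bar h^{ij} = -\frac{2 G_N}{c^6 r} \left[\ddot I^{ij}\right]_r , \tag{23.29} \]

where an overdot denotes $d/dt$ and the subscript $r$ means evaluation at retarded time. We need an
expression for the TT part of the quadrupole moment, so we define
\[ J_{ij} \equiv I_{ij} - \frac{1}{3}\delta_{ij}\, I , \tag{23.30} \]
where $I = I_{ii}$. Then
\[ h^{ij}_{TT} = \bar h^{ij}_{TT} = -\frac{2 G_N}{c^6 r} \left[\ddot J^{ij}_{TT}\right]_r . \tag{23.31} \]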
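Now, in order to finish this line of reasoning, we need to slow down a little and be careful
about how to take $t$ and $r$ derivatives at retarded time. Our definition of retarded time was

\[ x^0_r \equiv c t_r = x^0 - |\vec x - \vec y| , \tag{23.32} \]

and so for any function $f(x^0_r, \vec y)$, we have

\[ \begin{aligned} \frac{\partial f(x^0_r, \vec y)}{\partial x^\mu} &= \left[\frac{\partial f(y^0, \vec y)}{\partial y^0}\right]_r \frac{\partial x^0_r}{\partial x^\mu} , \\ \frac{\partial f(x^0_r, \vec y)}{\partial y^i} &= \left[\frac{\partial f(y^0, \vec y)}{\partial y^i}\right]_r + \left[\frac{\partial f(y^0, \vec y)}{\partial y^0}\right]_r \frac{\partial x^0_r}{\partial y^i} , \end{aligned} \tag{23.33} \]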
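where the subscript $r$ means to evaluate at $y^0 = x^0_r$. We therefore have that

\[ \partial_t h^{TT}_{ij} = -\frac{2 G_N}{c^6 r} \left[\dddot J^{TT}_{ij}\right]_r . \tag{23.34} \]

We can also evaluate

\[ \partial_r h^{TT}_{ij} = \frac{2 G_N}{c^6 r^2} \left[\ddot J^{TT}_{ij}\right]_r + \frac{2 G_N}{c^7 r} \left[\dddot J^{TT}_{ij}\right]_r . \tag{23.35} \]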
The second term here dominates over the first, and so our expression for the radiation flux
from the gravitational wave source is
\[ F(\hat r) = \frac{G_N}{8\pi r^2 c^9}\, \left\langle \dddot J^{TT}_{ij}\, \dddot J^{ij}_{TT} \right\rangle . \tag{23.36} \]
Our last task is to express this in terms of the original quadrupole. For that we need a
handy projection tensor,
\[ P_{ij} \equiv \delta_{ij} - n_i n_j . \tag{23.37} \]
Applying this to an arbitrary spatial vector allows one to see that it obeys the properties we
expect of a projector. Then the transverse part of the polarization tensor for the gravitational
wave is $A^{ij}_T = P^i{}_k P^j{}_\ell A^{k\ell}$. To ensure that there is no trace part, we
need to form $A^{ij}_{TT} = \left(P^i{}_k P^j{}_\ell - \frac{1}{2} P^{ij} P_{k\ell}\right) A^{k\ell}$. By direct analogy, we find for the quadrupole
\[ J^{ij}_{TT} = \left(P^i{}_k P^j{}_\ell - \frac{1}{2} P^{ij} P_{k\ell}\right) J^{k\ell} . \tag{23.38} \]

Denoting the components of the unit radial vector by $\hat x_i$, this gives

\[ J^{TT}_{ij} J^{ij}_{TT} = J_{ij} J^{ij} - 2\, J_i{}^j J^{ik}\, \hat x_j \hat x_k + \frac{1}{2} J^{ij} J^{k\ell}\, \hat x_i \hat x_j \hat x_k \hat x_\ell . \tag{23.39} \]
To get the integrated luminosity, we integrate this over $4\pi$ of solid angle. After the boring
dust settles, we have (at last!) the famous formula we wanted,

\[ \frac{dE}{dt} = -L_{GW} = -\frac{G_N}{5 c^9}\, \left\langle \left[\dddot J_{ij}\, \dddot J^{ij}\right]_r \right\rangle . \tag{23.40} \]

This shows that you not only need a quadrupole (not a monopole or a dipole) to produce
gravitational radiation, you also need the third derivative of it turned on. Again, the reason
why we use retarded times in this expression is to ensure the correct boundary conditions
for our Green’s function reflecting causality.
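The solid-angle integration that turns (23.36) and (23.39) into (23.40) amounts to the statement that, for a symmetric traceless $J_{ij}$, the angular integral of (23.39) equals $\tfrac{8\pi}{5} J_{ij} J^{ij}$. The sketch below checks this with a simple midpoint quadrature over the sphere, treating spatial indices as Euclidean (an assumption); the sample matrix `EXAMPLE_J` is hypothetical.

```python
import math

# Hypothetical symmetric, traceless 3x3 matrix used only as test input.
EXAMPLE_J = [[1.0, 0.5, 0.0],
             [0.5, -2.0, 0.3],
             [0.0, 0.3, 1.0]]

def tt_contraction(J, n):
    """Right-hand side of (23.39) for unit vector n, with Euclidean index placement."""
    JJ = sum(J[i][j] * J[i][j] for i in range(3) for j in range(3))
    Jn = [sum(J[i][j] * n[j] for j in range(3)) for i in range(3)]
    nJn = sum(n[i] * Jn[i] for i in range(3))
    return JJ - 2 * sum(v * v for v in Jn) + 0.5 * nJn**2

def sphere_integral(J, n_theta=200, n_phi=64):
    """Midpoint-rule integral of tt_contraction over the unit sphere."""
    d_theta, d_phi = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0.0
    for a in range(n_theta):
        theta = (a + 0.5) * d_theta
        for b in range(n_phi):
            phi = (b + 0.5) * d_phi
            n = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            total += tt_contraction(J, n) * math.sin(theta) * d_theta * d_phi
    return total

if __name__ == "__main__":
    JJ = sum(v * v for row in EXAMPLE_J for v in row)
    print(sphere_integral(EXAMPLE_J), 8 * math.pi / 5 * JJ)
```

Multiplying $\tfrac{8\pi}{5}$ by the prefactor $G_N/(8\pi c^9)$ of (23.36), with the $r^2$ cancelling against the area element, reproduces the coefficient $G_N/(5c^9)$.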
Gravitational radiation was discovered indirectly via the famous observations
by Russell Hulse and Joseph Taylor of a binary pulsar, first detected in 1974, which won them the 1993 Nobel Prize in
Physics. The orbital period of the binary decreased over time, at a rate precisely
predicted by GR. A far more impressive technological feat was the building of
LIGO, the Laser Interferometer Gravitational-Wave Observatory. It won the 2017 Nobel
Prize in Physics for Rainer Weiss, Barry Barish, and Kip Thorne for the direct discovery of
gravitational waves – tiny ripples in the very fabric of spacetime long thought technologically
impossible to detect. Here are a few URLs for checking out their discoveries:-
• https://www.youtube.com/watch?v=FXlg3cr-q44
• https://www.ligo.caltech.edu/news/ligo20160211
• https://www.ligo.caltech.edu/page/four-new-detections-o1-o2-catalog
