notes_1
(part one)
Contents
0 Introduction
0.1 Organization
4 Stochastic processes
4.1 Random walks
4.2 Urn processes
4.3 A branching process
4.4 Other stochastic processes
4.5 Exercises on Chapter 4
5.6 Hedging
5.7 Exercises on Chapter 5
Chapter 0
Introduction
We live in a random world: we cannot be certain of tomorrow’s weather or what the price of
petrol will be next year – but randomness is never ‘completely’ random. Often we know, or
rather, believe that some events are likely and others are unlikely. We might think that two
events are both possible, but are unlikely to occur together, and so on.
How should we handle this situation? Naturally, we would like to understand the world around
us and, when possible, to anticipate what might happen in the future. This necessitates that we
study the variety of random processes that we find around us.
We will see many and varied examples of random processes throughout this course, although
we will tend to call them stochastic processes (with the same meaning). They reflect the wide
variety of unpredictable ways in which reality behaves. We will also introduce a key concept used
in the study of stochastic processes, known as a martingale.
It has become common, in both science and industry, to use highly complex models of the
world around us. Such models cannot be magicked out of thin air. In fact, in much the same way
as we might build a miniature space station out of individual pieces of Lego, what is required is
a set of useful pieces that can be fitted together into realistic models. The theory of stochastic
processes provides some of the most useful building blocks, and the models built from them are
generally called stochastic models.
One industry that makes extensive use of stochastic modelling is finance. In this course, we
will often use financial models to motivate and exemplify our discussion of stochastic processes.
The central question in a financial model is usually how much a particular object is worth.
For example, we might ask how much we need to pay today, to have a barrel of oil delivered in
six months' time. We might ask for something more complicated: how much would it cost to have
the opportunity, in six months' time, to buy a barrel of oil, for a price that is agreed on today?
We will study the Black-Scholes model and the concept of ‘arbitrage free pricing’, which provide
somewhat surprising answers to this type of question.
0.1 Organization
Syllabus
These notes are for two courses: MAS352 and MAS61023. This is part one of the lecture notes.
Part two will be made available when we reach Chapter 10.
Some sections of the course are included in MAS61023 but not in MAS352. These sections
are marked with a (∆) symbol. We will not cover these sections in lectures. Students taking
MAS61023 should study these sections independently.
Some parts of the notes are marked with a ( ) symbol, which means they are off-syllabus.
These are often cases where detailed connections can be made to and from other parts of
mathematics.
Problem sheets
The exercises are divided up according to the chapters of the course. Some exercises are marked
as ‘challenge questions’ – these are intended to offer a serious, time-consuming challenge to the
best students.
Aside from challenge questions, it is expected that students will attempt all exercises (for the
version of the course they are taking) and review their own solutions using the typed solutions
provided at the end of these notes, in Appendices A and C.
At three points during each semester, an assignment of additional exercises will be set. About
one week later, a mark scheme will be posted, and you should self-mark your solutions.
Examination
Both versions of the course will be examined in the summer sitting. Parts of the course marked
with a (∆) are examinable for MAS61023 but not for MAS352. Parts of the course marked with
a ( ) will not be examined (for everyone).
A formula sheet will be provided in the exam, see Appendices B (for semester 1) and E (for
semester 2). Some detailed advice on revision can be found in Appendix D, attached to the second
semester notes.
MAS61023 also contains a mid-year online test, which will take place towards the end of
January and comprise 15% of the final mark. MAS352 is assessed entirely by exam.
Website
Further information, including the timetable, can be found on
https://nicfreeman1209.github.io/Website/MASx52/.
Chapter 1
In this chapter we look at our first example of a financial market. We introduce the idea of
arbitrage free pricing, and discuss what tools we would need to build better models.
Let r > 0 be a deterministic constant, known as the interest rate. If we put an amount
x of cash into the bank at time t = 0 and leave it there until time t = 1, the bank will then
contain
x(1 + r)
in cash. The same formula applies if x is negative. This corresponds to borrowing an amount −x
from the bank (i.e. taking out a loan) and the bank then requires us to pay interest on the loan.
Our market contains cash, as its first commodity. As its second, we will have a stock. Let us
take a brief detour and explain what is meant by stock.
Firstly, we should realize that companies can be (and frequently are) owned by more than one
person at any given time. Secondly, the ‘right of ownership of a company’ can be broken down
into several different rights, such as:
• The right to a share of the profits.
• The right to vote on decisions concerning the company’s future – for example, on a possible
merger with another company.
A share is a proportion of the rights of ownership of a company; for example a company might
split its rights of ownership into 100 equal shares, which can then be bought and sold individually.
The value of a share will vary over time, often according to how successful the company is. A
collection of one or more shares in a company is known as stock.
We allow the amount of stock that we own to be any real number. This means we can own a
fractional amount of stock, or even a negative amount of stock. This is realistic: in the same way
as we could borrow cash from a bank, we can borrow stock from a stockbroker! We don’t pay
any interest on borrowed stock, we just have to eventually return it. (In reality the stockbroker
would charge us a fee but we’ll pretend they don’t, for simplicity.)
The value or worth of a stock (or, indeed any commodity) is the amount of cash required to
buy a single unit of stock. This changes, randomly:
Let u > d > 0 and s > 0 be deterministic constants. At time t = 0, it costs S0 = s cash
to buy one unit of stock. At time t = 1, one unit of stock becomes worth
S1 = su with probability pu,
S1 = sd with probability pd.
We can represent the changes in value of cash and stocks as a tree, where each edge is labelled
by the probability of occurrence.
To sum up, in the one-period market it is possible to trade stocks and cash. There are two
points in time, t = 0 and t = 1.
• If we have x units of cash at time t = 0, they will become worth x(1 + r) at time t = 1.
• If we have y units of stock, that are worth yS0 = sy at time t = 0, they will become worth
yS1 at time t = 1.
For example, suppose that at time 0 we hold a portfolio of 5 units of cash and 8 units of stock.
Our 5 units of cash become worth 5(1 + r) at time 1. Our 8 units of stock, which are worth 8S0
at time 0, become worth 8S1 at time 1. So, at time t = 1 our portfolio is worth V1 = 5(1+r)+8S1
and the expected value of our portfolio at time t = 1 is
E[V1] = 5(1 + r) + 8E[S1] = 5(1 + r) + 8(su pu + sd pd).
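Here is a small Python sketch putting numbers to this calculation. All values (r, s, u, d, pu, pd) and the holdings are illustrative assumptions, not taken from the notes.

    # One-period market sketch: cash earns interest r, stock moves s -> su or sd.
    r = 0.05                   # interest rate (illustrative)
    s, u, d = 10.0, 1.5, 0.5   # initial stock price and up/down factors, u > d > 0
    pu, pd = 0.5, 0.5          # probabilities of the two outcomes, pu + pd = 1

    x, y = 5, 8                # units of cash and stock held at time 0

    V1_up = x * (1 + r) + y * s * u      # portfolio value at time 1 if the stock goes up
    V1_down = x * (1 + r) + y * s * d    # portfolio value at time 1 if the stock goes down
    E_V1 = x * (1 + r) + y * (s * u * pu + s * d * pd)   # E[V1]

    print(V1_up, V1_down, E_V1)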
1.3 Arbitrage
We now introduce a key concept in mathematical finance, known as arbitrage. We say that
arbitrage occurs in a market if it is possible to make money, for free, without risk of losing money.
There is a subtle distinction to be made here. We might sometimes expect to make money, on
average. But an arbitrage possibility only occurs when it is possible to make money without any
chance of losing it.
Example 1.3.1 Suppose that, in the one-period market, someone offered to sell us a single unit
of stock for a special price s/2 at time 0. We could then:
1. Borrow s/2 in cash from the bank.
2. Buy the unit of stock at the special price s/2.
3. Sell the stock at its market price s.
4. Repay our loan of s/2 to the bank.
5. Profit!
Example 1.3.1 is obviously artificial. It does illustrate an important point: no one should
sell anything at a price that makes an arbitrage possible. However, if nothing is sold at a price
that would permit arbitrage then, equally, nothing can be bought for a price that would permit
arbitrage. With this in mind:
Let us step back and ask a natural question, about our market. Suppose we wish to have a
single unit of stock delivered to us at time T = 1, but we want to agree in advance, at time 0,
what price K we will pay for it. To do so, we would enter into a contract. A contract is an
agreement between two (or more) parties (i.e. people, companies, institutions, etc) that they will
do something together.
Consider a contract that refers to one party as the buyer and another party as the seller. The
contract specifies that:
At time 1, the seller will be paid K cash and will deliver one unit of stock to the buyer.
A contract of this form is known as a forward contract. Note that no money changes hands at
time 0. The price K that is paid at time 1 is known as the strike price. The question is: what
should be the value of K ?
In fact, there is only one possible value for K . This value is
K = s(1 + r). (1.1)
To see why, suppose the contrary:
• Suppose that a price K > s(1 + r) was agreed. Then we could, at time 0, borrow s in cash
from the bank, buy one unit of stock, and take the seller’s side of the contract. At time 1, we
deliver the stock, receive K , and repay s(1 + r) to the bank, leaving a certain profit of
K − s(1 + r). Arbitrage!
• Suppose, instead, that a price K < s(1 + r) was agreed. Then we could, at time 0, borrow one
unit of stock from a stockbroker, sell it for s, put the cash in the bank, and take the buyer’s
side of the contract. At time 1, our bank balance is s(1 + r); we pay K for the stock, return
the stock to the stockbroker, and keep the certain profit of s(1 + r) − K .
Once again, with this strategy we are certain to make a profit. Arbitrage!
Therefore, we reach the surprising conclusion that the only possible choice is K = s(1 + r).
We refer to s(1 + r) as the arbitrage free value for K . This is our first example of an important
principle:
The absence of arbitrage can force prices to take particular values. This is known as
arbitrage free pricing.
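To see the principle numerically, here is a short Python sketch (with illustrative values of r and s, which are assumptions) of the seller's replication strategy for the forward: borrow s, buy the stock at time 0, deliver it at time 1 for K and repay s(1 + r). The profit is riskless, and it vanishes exactly when K = s(1 + r).

    # Profit of the replication strategy, as the seller of the forward:
    # time 0: borrow s, buy one unit of stock; time 1: deliver stock, receive K, repay s(1+r).
    r, s = 0.05, 10.0

    def seller_profit(K):
        # The stock itself cancels out, so the profit does not depend on S1.
        return K - s * (1 + r)

    for K in (10.0, s * (1 + r), 11.0):
        print(K, seller_profit(K))   # zero exactly when K = s(1+r)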
Figure 1.1: The stock price in GBP of Lloyds Banking Group, from September 2011 to September 2016.
If, instead, we tried to price the forward contract by its expectation, we would want it to cost
nothing! This would mean that E[S1 − K] = 0, which means we’d choose
K = E[S1 ] = su pu + sd pd . (1.2)
This is not the same as the formula K = s(1 + r), which we deduced in the previous section.
It is possible that our two candidate formulas for K ‘accidentally’ agree, that is su pu + sd pd =
s(1 + r), but they are only equal for very specific values of u, pu , d, pd , s and r. Observations of
real markets show that this doesn’t happen.
It may feel very surprising that (1.2) is different to (1.1). The reality is that financial markets
are arbitrage free, and the correct strike price for our forward contract is K = s(1 + r). However
intuitively appealing it might seem to price by expected value, it is not what happens in reality.
Does this mean that, with the correct strike price K = s(1 + r), on average we either make
or lose money by entering into a forward contract? Yes, it does. But investors are often not
concerned with average payoffs – the world changes too quickly to make use of them. Investors
are concerned with what happens to them personally. Having realized this, we can give a short
explanation, in economic terms, of why markets are arbitrage free.
If it is possible to carry out arbitrage within a market, traders will1 discover how and imme-
diately do so. This creates high demand to buy undervalued commodities. It also creates high
demand to borrow overvalued commodities. In turn, this demand causes the price of commodities
to adjust, until it is no longer possible to carry out arbitrage. The result is that the market
constantly adjusts, and stays in an equilibrium in which no arbitrage is possible.
Of course, in many respects our market is an imperfect model. We will discuss its shortcomings,
as well as produce better models, as part of the course.
Remark 1.3.2 We will not mention ‘pricing by expectation’ again in the course. In a liquid
market, arbitrage free pricing is what matters.
1 Usually.
• We need to be able to express the idea that, as time passes, we gain information.
For example, in our market, at time t = 0 we don’t know how the stock price will change.
But at time t = 1, it has already changed and we do know. Of course, real markets have
more than one time step, and we only gain information gradually.
In fact, these two requirements are common to almost all stochastic modelling. For this reason,
we’ll develop our probabilistic tools based on a wide range of examples. We’ll return to study
(exclusively) financial markets in Chapter 5, and again in Chapters 15-19.
1.5 For each n ∈ N, let Xn be a random variable such that
P[Xn = n²] = 1/n   and   P[Xn = 0] = 1 − 1/n.
Show that P[|Xn | > 0] → 0 and E[Xn ] → ∞, as n → ∞.
1.6 Let X be a normal random variable with mean µ and variance σ² > 0. By calculating
P[Y ≤ y] (or otherwise) show that Y = (X − µ)/σ is a normal random variable with mean 0 and
variance 1.
1.7 For which values of p ∈ (0, ∞) is ∫_1^∞ x^{−p} dx finite?
1.8 Which of the following sequences converge as n → ∞? What do they converge to?
cos(nπ)/n,    (1/n) sin(nπ/2),    Σ_{i=1}^n e^{−n},    Σ_{i=1}^n 2^{−i}.
Chapter 2
In this chapter we review probability theory, and develop some key tools for use in later chapters.
We begin with a special focus on σ -fields. The role of a σ -field is to provide a way of controlling
which information is visible (or, currently of interest) to us. As such, σ -fields will allow us to
express the idea that, as time passes, we gain information.
Definition 2.1.1 Let F be a set of subsets of Ω. We say F is a σ -field if it satisfies the following
properties:
1. ∅ ∈ F and Ω ∈ F .
2. if A ∈ F then Ω \ A ∈ F .
3. if A1 , A2 , . . . ∈ F then ∪_{i=1}^∞ Ai ∈ F .
The role of a σ -field is to choose which subsets of outcomes we are actually interested in. The
power set F = P(Ω) is always a σ -field, and in this case every subset of Ω is an event. But P(Ω)
can be very big, and if our experiment is complicated, with many or even infinitely many possible
outcomes, we might want to consider a smaller choice of F instead.
Sometimes we will need to deal with more than one σ -field at a time. A σ -field G such that
G ⊆ F is known as a sub-σ -field of F .
We say that a subset A ⊆ Ω is measurable, or that it is an event (or measurable event), if
A ∈ F . To make it clear which σ -field we mean to use in this definition, we sometimes write
that an event is F -measurable.
Example 2.1.2 Some examples of experiments and the σ -fields we might choose for them are
the following:
• We toss a coin, which might result in heads H or tails T . We take Ω = {H, T } and
F = {∅, {H}, {T }, Ω}.
• We toss two coins, both of which might result in heads H or tails T . We take Ω =
{HH, T T, HT, T H}. However, we are only interested in the outcome that both coins are
heads. We take F = {∅, {HH}, Ω \ {HH}, Ω}.
There are natural ways to choose a σ -field, even if we think of Ω as just an arbitrary set. For
example, F = {Ω, ∅} is a σ -field. If A is a subset of Ω, then F = {Ω, A, Ω \ A, ∅} is a σ -field
(check it!).
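On a finite Ω these properties can be checked mechanically, since countable unions reduce to finite (indeed pairwise) unions. The following Python sketch is an illustration, not part of the notes:

    from itertools import combinations

    def is_sigma_field(F, Omega):
        """Check Definition 2.1.1 for a finite Omega."""
        F = {frozenset(A) for A in F}
        Omega = frozenset(Omega)
        if frozenset() not in F or Omega not in F:
            return False                                   # property 1
        if any(Omega - A not in F for A in F):
            return False                                   # property 2: complements
        if any(A | B not in F for A, B in combinations(F, 2)):
            return False                                   # property 3 (finite form): unions
        return True

    Omega = {'H', 'T'}
    print(is_sigma_field([set(), {'H'}, {'T'}, Omega], Omega))  # True
    print(is_sigma_field([set(), {'H'}, Omega], Omega))         # False: missing {'T'}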
Given Ω and F , the final ingredient of a probability space is a measure P, which tells us how
likely the events in F are to occur.
Definition 2.1.3 A probability measure is a function P : F → [0, 1] satisfying:
1. P[Ω] = 1.
2. If A1 , A2 , . . . ∈ F are pairwise disjoint, then P[∪_{i=1}^∞ Ai ] = Σ_{i=1}^∞ P[Ai ].
The second of these conditions is often called σ -additivity. Note that we needed Definition 2.1.1 to
make sense of Definition 2.1.3, because we needed something to tell us that P[∪_{i=1}^∞ Ai ] was defined!
Definition 2.1.4 A probability space is a triple (Ω, F, P), where F is a σ -field and P is a prob-
ability measure.
For example, to model a single fair coin toss we would take Ω = {H, T }, F = {Ω, {H}, {T }, ∅}
and define P[H] = P[T ] = 1/2.
We commented above that often we want to choose F to be smaller than P(Ω), but we have
not yet shown how to choose a suitably small F . Fortunately, there is a general way of doing so,
for which we need the following technical lemma.
Lemma 2.1.5 Let I be any set and for each i ∈ I let Fi be a σ -field. Then
F = ∩_{i∈I} Fi     (2.1)
is a σ -field.
Now, suppose that we have our Ω and we have a finite or countable collection of E1 , E2 , . . . ⊆ Ω,
which we want to be events. Let F be the set of all σ -fields that contain E1 , E2 , . . .. We enumerate
F as F = {Fi ; i ∈ I}, and apply Lemma 2.1.5. We thus obtain a σ -field F , which contains all
the events that we wanted.
The key point here is that F is the smallest σ -field that has E1 , E2 , . . . as events. To see why,
note that by (2.1), F is contained inside any σ -field F′ which has E1 , E2 , . . . as events.
Definition 2.1.7 Let E1 , E2 , . . . be subsets of Ω. We write σ(E1 , E2 , . . .) for the smallest σ -field
containing E1 , E2 , . . ..
With Ω as any set, and A ⊆ Ω, our example {∅, A, Ω\A, Ω} is clearly σ(A). In general, though,
the point of Definition 2.1.7 is that we know useful σ -fields exist without having to construct them
explicitly.
In the same style, if F1 , F2 . . . are σ -fields then we write σ(F1 , F2 , . . .) for the smallest σ -algebra
with respect to which all events in F1 , F2 , . . . are measurable.
From Definition 2.1.1 and 2.1.3 we can deduce all the ‘usual’ properties of probability. For
example: P[∅] = 0; P[Ω \ A] = 1 − P[A]; and if A ⊆ B then P[A] ≤ P[B].
And so on. In this course we are concerned with applying probability theory rather than with
relating its properties right back to the definition of a probability space; but you should realize
that it is always possible to do so.
Definitions 2.1.1 and 2.1.3 both involve countable unions. It is convenient to be able to use
countable intersections too, for which we need the following lemma.
Lemma Let A1 , A2 , . . . ∈ F . Then ∩_{i=1}^∞ Ai ∈ F .
Proof: Since F is a σ -field, Ω \ Ai ∈ F for all i. Hence also ∪_{i=1}^∞ (Ω \ Ai ) ∈ F , which in turn
means that Ω \ (∪_{i=1}^∞ (Ω \ Ai )) ∈ F . This last set is precisely ∩_{i=1}^∞ Ai .
In general, uncountable unions and intersections of measurable sets need not be measurable.
The reasons why we only allow countable unions/intersections in probability are complicated and
beyond the scope of this course. Loosely speaking, the bigger we make F , the harder it is to make
a probability measure P, because we need to define P[A] for all A ∈ F in a way that satisfies
Definition 2.1.3. Allowing uncountable set operations would (in natural situations) result in F
being so large that it would be impossible to find a suitable P.
From now on, the symbols Ω, F and P always denote the three elements of the
probability space (Ω, F, P).
For example, let Ω = {1, 2, 3, 4, 5, 6}, representing one roll of a die, and define X : Ω → R by
X(ω) = 0 if ω is odd, and X(ω) = 1 if ω is even.
We write
X −1 (A) = {ω ∈ Ω ; X(ω) ∈ A},
for A ⊆ R, which is called the pre-image of A under X . In words, X −1 (A) is the set of outcomes
ω for which the property X(ω) falls inside the set A. In our example above X −1 ({0}) = {1, 3, 5},
X −1 ({1}) = {2, 4, 6} and X −1 ({0, 1}) = {1, 2, 3, 4, 5, 6}.
It is common to write X −1 (a) in place of X −1 ({a}), because it makes easier reading. Similarly,
for an interval (a, b) ⊆ R we write X −1 (a, b) in place of X −1 ((a, b)).
Definition 2.2.1 Let G be a σ -field on Ω. We say that X : Ω → R is G -measurable if
X −1 (I) ∈ G for every subinterval I ⊆ R.
If it is clear which σ -field G we mean to use, we might simply say that X is measurable. We
will often shorten this to writing simply X ∈ mG .
For a probability space (Ω, F, P), we say that X : Ω → R is a random variable if X is
F -measurable. The relationship to the usual notation for probability is that P[X ∈ A] means
P[X −1 (A)], so that e.g. P[X = a] = P[X −1 ({a})] and P[a < X < b] = P[X −1 ((a, b))].
We usually prefer writing P[X = a] and P[a < X < b] because we find them more intuitive; we
like to think of X as an object that takes a random value.
For example, suppose we toss a coin twice, with Ω = {HH, HT, T H, T T } as in Example 2.1.2.
If we take our σ -field to be F = P(Ω), the set of all subsets of Ω, then any function X : Ω → R
is F -measurable. However, suppose we choose instead
G = {∅, {HH}, {HT, T H, T T }, Ω}
and let X be the number of tails shown. Then we have X −1 ([0, 1]) = {HH, HT, T H} ∉ G , so X
is not G -measurable. The intuition here is that the σ -field G ‘isn’t big enough’ for X , because G
only contains information about whether we threw two heads (or not).
Rigorously, if we want to check that X is G -measurable, we have to check that X −1 (I) ∈ G for
every subinterval I ⊆ R. This can be tedious, especially if X takes many different values.
Fortunately, we will shortly see that, in practice, there is rarely any need to do so. What is
important for us is to understand the role played by a σ -field.
Lemma 2.2.2 Let G be a σ -field on Ω. Let X : Ω → R, and suppose X takes a finite or countable
set of values {x1 , x2 , . . .}. Then: X ∈ mG if and only if {X = xj } ∈ G for all j .
Proof: Let us first prove the ⇒ implication. So, assume the left hand side: X ∈ mG . Since
{X = xj } = X −1 (xj ) and {xj } = [xj , xj ] is a subinterval of R, by Definition 2.2.1 we have that
{X = xj } ∈ G . Since this holds for any j = 1, . . . , n, we have shown the right hand side.
Next, we will prove the ⇐ implication. So, we now assume the right hand side: {X = xj } ∈ G
for all j . Let I be any subinterval of R, and define J = {j ; xj ∈ I}. Then,
X −1 (I) = {ω ∈ Ω ; X(ω) ∈ I} = ∪_{j∈J} {ω ; X(ω) = xj } = ∪_{j∈J} {X = xj }.
This is a countable union of events that belong to G , so it is itself in G . Hence X ∈ mG .
For example, let Ω = {1, 2, 3, 4, 5, 6}, representing one roll of a die, and consider
G1 = {∅, {1, 3, 5}, {2, 4, 6}, Ω}   and   G2 = {∅, {1, 2, 3}, {4, 5, 6}, Ω}.
Here, G1 contains the information of whether the roll is even or odd, and G2 contains the infor-
mation of whether the roll is ≤ 3 or > 3. It’s easy to check that G1 and G2 are both σ -fields.
Let us define
X1 (ω) = 0 if ω is odd, and X1 (ω) = 1 if ω is even;
X2 (ω) = 0 if ω ≤ 3, and X2 (ω) = 1 if ω > 3.
That is, X1 tests if the roll is even, and X2 tests if the roll is less than or equal to three. Based
on our intuition about information we should expect that X1 is measurable with respect to G1
but not G2 , and that X2 is measurable with respect to G2 but not G1 .
We can justify our intuition rigorously using Lemma 2.2.2. We have X1−1 (0) = {1, 3, 5} and
X1−1 (1) = {2, 4, 6}, so X1 ∈ mG1 but X1 ∉ mG2 . Similarly, we have X2−1 (0) = {1, 2, 3} and
X2−1 (1) = {4, 5, 6}, so X2 ∉ mG1 and X2 ∈ mG2 .
Let us extend this example by introducing a third σ -field,
G3 = σ({1, 3}, {2}, {4}, {5}, {6}).
The σ -field G3 is, by Definition 2.1.7, the smallest σ -field containing the events {1, 3}, {2}, {4}, {5}
and {6}. It contains the information of which ω ∈ Ω we threw, except that it can’t tell the difference
between a 1 and a 3. If we tried to write G3 out in full we would discover that it had 32 elements
(and probably make some mistakes!) so instead we just use Definition 2.1.7.
To check if X1 ∈ mG3 , we need to check if G3 contains {1, 3, 5} and {2, 4, 6}. We can write
{1, 3, 5} = {1, 3} ∪ {5}   and   {2, 4, 6} = {2} ∪ {4} ∪ {6},
which shows that {1, 3, 5}, {2, 4, 6} ∈ G3 because, in both cases, the right hand sides are made up
of sets that we already know are in G3 , plus countable set operations. Hence, X1 is G3 measurable.
You can check for yourself that X2 is also G3 measurable.
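Lemma 2.2.2 makes such checks a finite computation. The Python sketch below (an illustration, not part of the notes) runs the check for the die-roll examples above:

    Omega = {1, 2, 3, 4, 5, 6}

    def is_measurable(X, G):
        """Lemma 2.2.2: X is G-measurable iff every pre-image {X = x} lies in G."""
        G = {frozenset(A) for A in G}
        preimages = {}
        for w in Omega:
            preimages.setdefault(X(w), set()).add(w)
        return all(frozenset(P) in G for P in preimages.values())

    G1 = [set(), {1, 3, 5}, {2, 4, 6}, Omega]
    G2 = [set(), {1, 2, 3}, {4, 5, 6}, Omega]
    X1 = lambda w: 0 if w % 2 == 1 else 1   # tests if the roll is even
    X2 = lambda w: 0 if w <= 3 else 1       # tests if the roll is > 3

    print(is_measurable(X1, G1), is_measurable(X1, G2))  # True False
    print(is_measurable(X2, G1), is_measurable(X2, G2))  # False True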
Remark 2.2.3 ( ) For continuous random variables, there is no equivalent of Lemma 2.2.2.
More sophisticated tools from measure theory are needed – see MAS350/61022.
In words, σ(X) is the σ -field generated by the sets X −1 (I) for intervals I . The intuition is that
σ(X) is the smallest σ -field of events on which the random behaviour of X depends.
For example, consider throwing a fair die. Let Ω = {1, 2, 3, 4, 5, 6}, let F = P(Ω) and let
X(ω) = 1 if ω is odd, and X(ω) = 2 if ω is even.
Then X(ω) ∈ {1, 2}, with pre-images X −1 (1) = {1, 3, 5} and X −1 (2) = {2, 4, 6}. The smallest
σ -field that contains both of these subsets is
σ(X) = {∅, {1, 3, 5}, {2, 4, 6}, Ω}.
The information contained in this σ -field is whether ω is even or odd, which is precisely the same
information given by the value of X(ω).
In general, if X takes lots of different values, σ(X) could be very big and we would have no
hope of writing it out explicitly. Here’s another example: suppose that
Y (ω) = 1 if ω = 1,   2 if ω = 2,   3 if ω ≥ 3.
Then Y (ω) ∈ {1, 2, 3} with pre-images Y −1 (1) = {1}, Y −1 (2) = {2} and Y −1 (3) = {3, 4, 5, 6}.
The smallest σ -field containing these three subsets is
σ(Y ) = {∅, {1}, {2}, {3, 4, 5, 6}, {1, 2}, {1, 3, 4, 5, 6}, {2, 3, 4, 5, 6}, Ω}.
The information contained in this σ -field is whether ω is equal to 1, 2 or some number ≥ 3. Again,
this is precisely the same information as is contained in the value of Y (ω).
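On a finite Ω we can even construct σ(Y ) explicitly, by closing the pre-images of Y under complements and unions until nothing new appears. A Python sketch (illustrative only):

    Omega = frozenset({1, 2, 3, 4, 5, 6})

    def sigma_generated(blocks):
        """Close the generating sets under complement and pairwise union."""
        F = {frozenset(), Omega} | {frozenset(B) for B in blocks}
        changed = True
        while changed:
            changed = False
            for A in list(F):
                for B in list(F):
                    for C in (Omega - A, A | B):
                        if C not in F:
                            F.add(C)
                            changed = True
        return F

    # Pre-images of Y from the example above: {1}, {2} and {3,4,5,6}.
    sigma_Y = sigma_generated([{1}, {2}, {3, 4, 5, 6}])
    print(len(sigma_Y))                          # 8 events, as listed above
    print(sorted(sorted(A) for A in sigma_Y))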
It’s natural that X should be measurable with respect to the σ -field that contains precisely
the information on which X depends. Formally:
Lemma 2.2.5 Let X : Ω → R. Then X is σ(X)-measurable.
Proof: Let I be a subinterval of R. Then, by definition of σ(X), we have that X −1 (I) ∈ σ(X)
for all I .
If we have a finite or countable set of random variables X1 , X2 , . . . we define σ(X1 , X2 , . . .)
to be σ(X1−1 (I), X2−1 (I), . . . ; I is a subinterval of R). The intuition is the same: σ(X1 , X2 , . . .)
corresponds to the information jointly contained in X1 , X2 , . . ..
Proposition 2.2.6 Let G be a σ -field, let X, Y ∈ mG and let a ∈ R. Then
aX, X + Y, XY and 1/X ¹ are all G -measurable, (2.2)
as is the pointwise limit of any convergent sequence of G -measurable random variables.
Essentially, every natural way of combining random variables together leads to other random
variables. Proposition 2.2.6 can usually be used to show this. We won’t prove Proposition 2.2.6
in this course, see MAS31002/61022.
For example, if X is a random variable then so is (X² + X)/2. For a more difficult example,
suppose that X is a random variable and let Y = e^X , which means that Y (ω) = lim_{n→∞} Σ_{i=0}^n X(ω)^i / i! .
Recall that we know from analysis that this limit exists, since e^x = lim_{n→∞} Σ_{i=0}^n x^i / i! exists for
all x ∈ R. Each Yn = Σ_{i=0}^n X^i / i! is a random variable (we could use (2.2) repeatedly to show
this) and, since the limit exists, Y (ω) = lim_{n→∞} Yn (ω) is measurable.
In general, if X is a random variable and g : R → R is any ‘sensible’ function then g(X) is also
a random variable. This includes polynomials, powers, all trig functions, all monotone functions,
all continuous and piecewise continuous functions, integrals/derivatives, etc etc.
Independence
We can express the concept of independence, which you already know about for random variables,
in terms of σ -fields. Recall that two events E1 , E2 ∈ F are said to be independent if P[E1 ∩ E2 ] =
P[E1 ]P[E2 ]. Using σ -fields, we have a consistent way of defining independence, for both random
variables and events.
Definition 2.2.7 Sub-σ -fields G1 , G2 of F are said to be independent if P[G1 ∩ G2 ] = P[G1 ]P[G2 ]
for all G1 ∈ G1 and G2 ∈ G2 .
Events E1 and E2 are independent if σ(E1 ) and σ(E2 ) are independent.
Random variables X1 and X2 are independent if σ(X1 ) and σ(X2 ) are independent.
It can be checked that, for events and random variables, this definition is equivalent to the
definitions you will have seen in earlier courses. The same principle, of using the associated σ -
fields, applies to defining what it means for e.g. a random variable and an event to be independent.
We won’t check this claim as part of our course, see MAS31002/61022.
1 In the case of 1/X, we require that P[X = 0] = 0.
2.3 Infinite Ω
So far we have focused on finite sample spaces of the form Ω = {x1 , x2 , . . . , xn }. In such a case we
would normally take F = P(Ω), which is also a finite set. Since F contains every subset of Ω,
any σ -field on Ω is a sub-σ -field of F . We have seen how it is possible to construct other σ -fields
on Ω too.
In this case we can define a probability measure on Ω by choosing a finite sequence a1 , a2 , . . . , an
such that each ai ∈ [0, 1] and Σ_{i=1}^n ai = 1. We set P[xi ] = ai . This naturally extends to defining
P[A] = Σ_{i ; xi ∈A} P[xi ] = Σ_{i ; xi ∈A} ai .    (2.3)
It is hopefully obvious (and tedious to check) that, with this definition, P is a probability measure.
Consequently (Ω, F, P) is a probability space.
All our experiments with (finitely many tosses/throws of) dice and coins fit into this category
of examples. In fact, if our experiment has countably many outcomes, say, Ω = {x1 , x2 , . . .} we
can still work in much the same way, and the sum in (2.3) will become an infinite series that sums
to 1.
However, the theory of stochastic processes, as well as most sophisticated examples of stochastic
models, require an uncountable sample space. In such cases, we can’t use (2.3), because there is
no such thing as an uncountable sum.
F = σ(X1 , X2 , . . .)
i.e. F is the smallest σ -field with respect to which all the Xn are random variables. Then it is
possible to construct a probability measure P on (Ω, F) such that:
• P[Xn = H] = P[Xn = T ] = 1/2 for all n ∈ N.
• The random variables X1 , X2 , . . . are independent.
From this, you can work with P without having to know how P was constructed. You don’t even
need to know exactly which subsets of Ω are in F , because Proposition 2.2.6 gives you access to
plenty of random variables.
Remark 2.3.1 ( ) In this case it turns out that F is much smaller than P(Ω). In fact, if
we tried to take F = P(Ω), we would (after some significant effort) discover that there is no
probability measure P : P(Ω) → [0, 1] (i.e. satisfying Definition 2.1.3) in which the coin tosses are
independent. This is irritating, and surprising, and we just have to live with it.
Almost surely
In the example from Section 2.3 we used Ω = {H, T }N , which is the set of all sequences made
up of H s and T s. Our probability measure made the coin tosses independent and fair, and we
used the random variable Xn for the nth toss.
Let’s examine this example a bit. First let us note that, for any individual sequence ω1 , ω2 , . . .
of heads and tails, by independence
P[X1 = ω1 , X2 = ω2 , . . .] = (1/2) · (1/2) · (1/2) · . . . = 0.
So every element of Ω has probability zero. This is not a problem – if we take enough elements
of Ω together then we do get non-zero probabilities, for example
P[X1 = H] = P[ω ∈ Ω such that ω1 = H] = 1/2,
which is not surprising.
The probability that we never throw a head is
P[for all n, Xn = T ] = (1/2) · (1/2) · . . . = 0,
which means that the probability that we eventually throw a head is
P[for some n, Xn = H] = 1 − P[for all n, Xn = T ] = 1.
So, the event {for some n, Xn = H} has probability 1, but is not equal to the whole sample space
Ω. To handle this situation we have a piece of terminology.
Definition 2.3.2 If the event E has P[E] = 1, then we say E occurs almost surely.
So, we would say that ‘almost surely, our coin will eventually throw a head’. We might say
that ‘Y ≤ 1’ almost surely, to mean that P[Y ≤ 1] = 1. This piece of terminology will be used
very frequently from now on. We might sometimes say that an event ‘almost always’ happens,
with the same meaning.
For another example, suppose that we define qnH and qnT to be the proportion of heads and,
respectively, tails in the first n coin tosses X1 , X2 , . . . , Xn . Formally, this means that
qnH = (1/n) Σ_{i=1}^n 1{Xi = H}   and   qnT = (1/n) Σ_{i=1}^n 1{Xi = T }.
Here 1{Xi = H} is equal to 1 if Xi = H , and equal to zero otherwise; similarly for 1{Xi = T }.
We will think a bit more about random variables of this type in the next section. Of course
qnH + qnT = 1.
The random variables 1{Xi = H} are i.i.d. with E[1{Xi = H}] = 1/2, hence by Theorem 1.1.1
we have P[qnH → 1/2 as n → ∞] = 1, and by the same argument we have also P[qnT → 1/2 as
n → ∞] = 1. In words, this means that in the long run half our tosses will be tails and half will be
heads (which makes sense - our coin is fair). That is, the event
E = { lim_{n→∞} qnH = 1/2 and lim_{n→∞} qnT = 1/2 }
occurs almost surely.
There are many many examples of sequences (e.g. HHT HHT HHT . . .) that don’t have
qnH → 1/2 and qnT → 1/2. We might think of the event E as being only a ‘small’ subset of Ω, but it
has probability 1.
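A short simulation illustrates the event E: the proportion of heads in n fair tosses settles towards 1/2 as n grows. (Python sketch, illustrative only.)

    import random

    random.seed(1)
    n, heads = 100_000, 0
    for i in range(1, n + 1):
        heads += random.random() < 0.5        # one fair toss; True counts as a head
        if i in (10, 100, 1_000, 10_000, 100_000):
            print(i, heads / i)               # q_i^H, drifting towards 1/2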
2.4 Expectation
There is only one part of the ‘usual machinery’ for probability that we haven’t yet discussed,
namely expectation.
Recall that the expectation of a discrete random variable X that takes the values {xi : i ∈ N}
is given by
E[X] = Σ_{i} xi P[X = xi ].    (2.4)
For a continuous random variable, the expectation uses an integral against the probability density
function,
E[X] = ∫_{−∞}^{∞} x fX (x) dx.    (2.5)
Recall also that it is possible for limits (i.e. infinite sums) and integrals to be infinite, or not exist
at all.
We are now conscious of the general definition of a random variable X , as an F -measurable
function from Ω to R. There are many random variables that are neither discrete nor continuous,
and for such cases (2.4) and (2.5) are not valid; we need a more general approach.
With Lebesgue integration, the expectation E can be defined using a single definition that works
for both discrete and continuous (and other more exotic) random variables. This definition relies
heavily on analysis and is well beyond the scope of this course. Instead, Lebesgue integration is
covered in MAS31002/61022.
For purposes of this course, what you should know is: E[X] is defined for all X such that either
(1) X ≥ 0, in which case we may have E[X] = ∞, or (2) E[|X|] < ∞, in which case E[X] is finite.
The point here is that we are prepared to allow ourselves to write E[X] = ∞ (e.g. when the sum
or integral in (2.4) or (2.5) tends to ∞) provided that X ≥ 0. We are not prepared to allow
expectations to equal −∞, because we have to avoid nonsensical ‘∞ − ∞’ situations.
It’s worth knowing that if X ≥ 0 and P[X = ∞] > 0, then E[X] = ∞. In words, the slightest
chance of X being infinite will outweigh all of the finite possibilities and make E[X] infinite.
You may still use (2.4) and (2.5), in the discrete/continuous cases. You may also assume that
all the ‘standard’ properties of E hold: for example linearity, E[aX + bY ] = aE[X] + bE[Y ], and
monotonicity, if X ≤ Y then E[X] ≤ E[Y ].
You should become familiar with any of the properties that you are not already used to using.
The proofs of these properties are part of the formal construction of E and are not part of our
course.
Indicator functions
One important type of random variable is an indicator function. Let A ∈ F , then the indicator
function of A is the function
1A (ω) = 1 if ω ∈ A, and 1A (ω) = 0 if ω ∉ A.
The indicator function is used to tell if an event occurred (in which case it is 1) or did not occur
(in which case it is 0). It is useful to remember that
P[A] = E[1A ].
We will sometimes not put the A as a subscript and write e.g. 1{X < 0} for the indicator function
of the event that X < 0.
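The identity P[A] = E[1A ] is also the idea behind Monte Carlo estimation: we can estimate a probability by averaging an indicator function over many independent samples. A Python sketch, taking A = {X < 0} for a standard normal X (an assumed example, not from the notes):

    import random

    random.seed(0)
    N = 100_000
    hits = sum(1 for _ in range(N) if random.gauss(0.0, 1.0) < 0)
    print(hits / N)   # estimate of P[X < 0] = E[1_{X<0}], close to 0.5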
As usual, let G denote a sub σ -field of F .
Lemma 2.4.2 Let A ∈ G . Then 1A is G -measurable.
Proof: Let us write Y = 1A . Note that Y is a discrete random variable, which can take the
two values 0 and 1. We have Y −1 (1) = {Y = 1} = A and Y −1 (0) = {Y = 0} = Ω \ A. By
Lemma 2.2.2, Y is G measurable.
Indicator functions allow us to condition, meaning that we can break up a random variable
into two or more cases. For example, given any random variable X we can write
X = X 1{X≥1} + X 1{X<1} .    (2.6)
Precisely one of the two terms on the right hand side is non-zero. If X ≥ 1 then the first term
takes the value X and the second is zero; if X < 1 then the second term is equal to X and the
first term is zero.
We can use (2.6) to prove a useful inequality. Putting |X| in place of X , and then taking E,
we obtain
E[|X|] = E[ |X| 1{|X|≥1} ] + E[ |X| 1{|X|<1} ]
≤ E[ X² 1{|X|≥1} ] + 1
≤ E[X²] + 1.
Here, to deduce the second line, the key point is we can only use the inequality |x| ≤ x2 if x ≥ 1.
Hence, for the first term we can use that |X|1{|X|≥1} ≤ X 2 1{|X|≥1} . For the second term, we use
that |X|1{|X|<1} ≤ 1. In both cases, we also need the monotonicity of E.
Lp spaces
It will often be important for us to know whether a random variable X has finite mean and
variance. Some random variables do not, see exercise 2.8 (or MAS2010 or MAS31002) for example.
Random variables with finite mean and variance are easier to work with than those without,
and many of the results in this course require these conditions.
We use some notation: for p ∈ [1, ∞), we write Lp for the set of random variables X such that
E[|X|p ] < ∞.
In this course, we will only be interested in the cases p = 1 and p = 2. These cases have the
following set of useful properties:
1. L1 is the set of random variables for which E[X] is defined and finite.
2. L2 is the set of random variables with finite variance. This comes from the fact that var(X) =
E[X 2 ] − E[X]2 (more strictly, it needs some integration theory from MAS31002/61022).
Often, to check if X ∈ Lp we must calculate E[|X|p ]. A special case where it is automatic is the
following.
Definition 2.4.4 We say that a random variable X is bounded if there exists (deterministic)
c ∈ R such that |X| ≤ c.
If X is bounded, then using monotonicity we have E[|X|p ] ≤ E[cp ] = cp < ∞, which means that
X ∈ Lp , for all p.
2.3 Let Ω = {H, T }^N be the probability space from Section 2.3, corresponding to an infinite
sequence of independent fair coin tosses (Xn )_{n=1}^∞ .
(a) Fix m ∈ N. Show that the probability that the random sequence X1 , X2 , . . . contains
precisely m heads is zero.
(b) Deduce that, almost surely, the sequence X1 , X2 , . . . contains infinitely many heads and
infinitely many tails.
2.4 Let Ω = {1, 2, 3, 4, 5, 6}, corresponding to one roll of a die. In each of the following cases
describe, in words, the information contained within the given σ -field.
(c) F3 = {∅, {1, 2}, {3, 4}, {5, 6}, {1, 2, 3, 4}, {3, 4, 5, 6}, {1, 2, 5, 6}, Ω}.
On random variables
2.5 Let Ω = {1, 2, 3, 4, 5} and set
G1 = σ({1, 5}, {2, 4}, {3}),
G2 = {∅, {1, 2}, {3, 4, 5}, Ω}.
2.6 Let Ω = {HH, HT, T H, T T }, representing two coin tosses. Define X to be the total number
of heads shown. Write down all the events in σ(X).
2.8 Let X be a random variable with the probability density function f : R → R given by
2.10 Let a > 0 and let X be a random variable such that X ≥ 0. Show that P[X ≥ a] ≤ (1/a) E[X].
Challenge questions
2.11 Show that if P[X ≥ 0] = 1 and E[X] = 0 then P[X = 0] = 1.
Chapter 3
We will introduce conditional expectation, which provides us with a way to estimate random
quantities based on only partial information. We will also introduce martingales, which are the
mathematical way to capture the concept of a fair game.
You might also have seen a second definition, using probability density functions, for continuous
random variables. These definitions are problematic, for several reasons, chiefly (1) it’s not imme-
diately clear how the two definitions interact and (2) we don’t want to be restricted to handling
only discrete or only continuous random variables.
In this section, we define the conditional expectation of random variables using σ -fields. In this
setting we are able to give a unified definition which is valid for general random variables. The
definition is originally due to Kolmogorov (in 1933), and is sometimes referred to as Kolmogorov’s
conditional expectation. It is one of the most important concepts in modern probability theory.
Conditional expectation is a mathematical tool with the following function. We have a
probability space (Ω, F, P) and a random variable X : Ω → R. However, F is large and we want
to work with a sub-σ -field G , instead. As a result, we want to have a random variable Y such
that
1. Y is G -measurable,
2. Y is, in some sense, ‘as close as possible’ to X .
The second statement on this wish-list does not fully make sense; there are many different ways
in which we could compare X to a potential Y .
Theorem 3.1.1 Let G be a sub-σ -field of F and let X ∈ L1 . Then there exists Y ∈ L1 such that:
1. Y is G -measurable,
2. for all G ∈ G , E[Y 1G ] = E[X 1G ].
Moreover, any two random variables Y with these properties are almost surely equal.
The first and second statements here correspond respectively to the items on our wish-list.
Definition 3.1.2 We refer to Y as (a version of) the conditional expectation of X given G , and
we write
Y = E[X | G].
Since any two such Y are almost surely equal, we sometimes refer to Y simply as the
conditional expectation of X . This is a slight abuse of notation, but it is commonplace and
harmless.
Proof of Theorem 3.1.1 is beyond the scope of this course. Loosely speaking, there is an abstract
recipe which constructs E[X|G]. It begins with the random variable X , and then averages out over
all the information that is not accessible to G , leaving only as much randomness as G can support,
resulting in E[X|G]. In this sense the map X 7→ E[X|G] simplifies (i.e. reduces the amount of
randomness in) X in a very particular way, to make it G measurable.
It is important to remember that E[X|G] is (in general) a random variable. It is also important
to remember that the two objects
E[X | G]   and   E[X | Z = z]
are quite different. They are both useful. We will explore the connection between them in Section
3.1. Before doing so, let us look at a basic example.
Let X1 , X2 be independent random variables such that P[Xi = −1] = P[Xi = 1] = 1/2. Set
F = σ(X1 , X2 ). We will show that
E[X1 + X2 | σ(X1 )] = X1 .    (3.2)
To do so, we should check that X1 satisfies the two conditions in Theorem 3.1.1, with
X = X1 + X2
Y = X1
G = σ(X1 ).
The first condition is immediate, since by Lemma 2.2.5 X1 is σ(X1 )-measurable i.e. Y ∈ mG . To
see the second condition, let G ∈ σ(X1 ). Then 1G ∈ mσ(X1 ) by Lemma 2.4.2 and X2 ∈ mσ(X2 ),
and these σ -fields are independent, so 1G and X2 are independent. Hence
E[(X1 + X2 ) 1G ] = E[X1 1G ] + E[X2 1G ] = E[X1 1G ] + E[X2 ]E[1G ] = E[X1 1G ].
This equation says precisely that E[X 1G ] = E[Y 1G ]. We have now checked both conditions, so
by Theorem 3.1.1 we have E[X|G] = Y , meaning that E[X1 + X2 |σ(X1 )] = X1 , which proves our
claim in (3.2).
The intuition for this, which is plainly visible in our calculation, is that X2 is independent of
σ(X1 ) so, thinking of conditional expectation as an operation which averages out all randomness
in X = X1 + X2 that is not G = σ(X1 ) measurable, we would average out X2 completely
i.e. E[X2 ] = 0.
We could equally think of X1 as being our best guess for X1 + X2 , given only information in
σ(X1 ), since E[X2 ] = 0. In general, guessing E[X|G] is not so easy!
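We can illustrate (3.2) numerically. In the Python sketch below (illustrative, not from the notes), we group samples of X1 + X2 according to the value of X1 ; averaging within each group averages out X2 , recovering X1 :

    import random

    random.seed(2)
    N = 200_000
    sums = {-1: [], 1: []}
    for _ in range(N):
        x1 = random.choice([-1, 1])
        x2 = random.choice([-1, 1])
        sums[x1].append(x1 + x2)

    for x1, vals in sorted(sums.items()):
        # approximates the value of E[X1 + X2 | sigma(X1)] on the event {X1 = x1}
        print(x1, sum(vals) / len(vals))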
Remark 3.1.3 This subsection is marked with a ( ), meaning that it is non-examinable. This
is so that you can forget the old definition and remember the new one!
To see the connection, we focus on the case where X, Z are random variables with finite sets of
values {x1 , . . . , xn }, {z1 , . . . , zm }. Let Y be the naive version of conditional expectation defined
in (3.1). That is,
Y (ω) = Σ_j 1{Z(ω)=zj } E[X|Z = zj ].
We can use Theorem 3.1.1 to check that, in fact, Y is a version of E[X|σ(Z)]. We want to check
that Y satisfies the two properties listed in Theorem 3.1.1.
• Since Z only takes finitely many values {z1 , . . . , zm }, from the above equation we have that
Y only takes finitely many values. These values are {y1 , . . . , ym } where yj = E[X|Z = zj ].
We note that each pre-image Y −1 (yj ) is a finite union of events of the form {Z = zj′ }, all of
which are elements of σ(Z).
This is sufficient (although we will omit the details) to show that Y is σ(Z)-measurable.
• We can calculate
E[Y 1{Z=zj } ] = E[X|Z = zj ] P[Z = zj ]
= Σ_i xi P[X = xi | Z = zj ] P[Z = zj ]
= Σ_i xi P[X = xi and Z = zj ]
= E[X 1{Z=zj } ].
Properly, to check that Y satisfies the second property in Theorem 3.1.1, we need to check
E[Y 1G ] = E[X 1G ] for a general G ∈ σ(Z) and not just G = {Z = zj }. However, for reasons
beyond the scope of this course, in this case (thanks to the fact that Z is finite) it’s enough
to consider only G of the form {Z = zj }.
Therefore, we have Y = E[X|σ(Z)] almost surely. In this course we favour writing E[X|σ(Z)]
instead of E[X|Z], to make it clear that we are looking at conditional expectation with respect
to a σ -field.
Proof of these properties is beyond the scope of our course. Note that the first five properties
above are common properties of both E[·] and E[· | G].
We’ll use these properties extensively, for the whole of the remainder of the course. They are
not on the formula sheet – you should remember them and become familiar with applying them.
Remark 3.2.2 ( ) Although we have not proved the properties in Proposition 3.2.1, they are
intuitive properties for conditional expectation to have.
For example, in the taking out what is known property, we can think of Z as already being
simple enough to be G measurable, so we’d expect that taking conditional expectation with respect
to G doesn’t need to affect it.
In the independence property, we can think of G as giving us no information about the value
X is taking, so our best guess at the value of X has to be simply E[X].
In the tower property for E[E[X|G]|H], we start with X , simplify it to be G measurable and
simplify it to be H measurable. But since H ⊆ G , we might as well have just simplified X enough
to be H measurable in a single step, which would be E[X|H].
Etc. It is a useful exercise for you to try and think of ‘intuitive’ arguments for the other
properties too, so that you can easily remember them.
Lemma 3.2.3 Let G be a sub-σ -field of F . Let X be an F -measurable random variable and let
Y = E[X|G]. Suppose that Y′ is a G -measurable random variable. Then
E[(X − Y )2 ] ≤ E[(X − Y′ )2 ].
Proof: We have
E[(X − Y′ )2 ] = E[(X − Y + Y − Y′ )2 ]
= E[(X − Y )2 ] + 2E[(X − Y )(Y − Y′ )] + E[(Y − Y′ )2 ]. (3.3)
For the middle term, we can write
E[(X − Y )(Y − Y′ )] = E[ E[(X − Y )(Y − Y′ ) | G] ] = E[ (Y − Y′ ) E[X − Y | G] ] = E[ (Y − Y′ )(E[X|G] − Y ) ].
Here, in the first step we used the ‘taking E’ property, in the second step we used Proposition
2.2.6 to tell us that Y − Y′ is G -measurable, followed by the ‘taking out what is known’ rule. In
the final step we used the linearity and measurability properties. Since E[X|G] = Y almost surely,
we obtain that E[(X − Y )(Y − Y′ )] = 0. Hence, since E[(Y − Y′ )2 ] ≥ 0, from (3.3) we obtain
E[(X − Y′ )2 ] ≥ E[(X − Y )2 ].
3.3 Martingales
In this section we introduce martingales, which are the mathematical representation of a ‘fair
game’. As usual, let (Ω, F, P) be a probability space. We refer to a sequence of random variables
(Sn )_{n=0}^∞ as a stochastic process. In this section of the course we only deal with discrete time
stochastic processes.
We have previously discussed the idea of gradually learning more and more information about
the outcome of some experiment, through seeing the information visible from gradually larger
σ -fields. We formalize this concept as follows.
Definition 3.3.1 A filtration is a sequence of σ -fields (Fn ) such that Fn ⊆ Fn+1 ⊆ F for all n.
Definition 3.3.2 We say that a stochastic process X = (Xn ) is adapted to the filtration (Fn ) if,
for all n, Xn is Fn measurable.
We should think of the filtration Fn as telling us which information we have access to at time
n = 1, 2, . . .. Thus, an adapted process is a process whose (random) value we know at all times
n ∈ N.
We are now ready to give the definition of a martingale.
Definition 3.3.3 A stochastic process M = (Mn ) is a martingale if:
1. (Mn ) is adapted,
2. Mn ∈ L1 for all n,
3. E[Mn | Fn−1 ] = Mn−1 almost surely, for all n ≥ 1.
We say that M is a submartingale if, instead of 3, we have E[Mn | Fn−1 ] ≥ Mn−1 almost surely.
We say that M is a supermartingale if, instead of 3, we have E[Mn | Fn−1 ] ≤ Mn−1 almost surely.
Remark 3.3.4 The second condition in Definition 3.3.3 is needed for the third to make sense.
Example 3.3.5 Let (Xn ) be i.i.d. random variables with P[Xn = 1] = P[Xn = −1] = 1/2, let
Fn = σ(X1 , . . . , Xn ) and let Sn = X1 + . . . + Xn (and S0 = 0). We can think of Sn as a game
in the following way. At each time n = 1, 2, . . . we toss a coin. We win the nth round if the coin
is heads, and lose if it is tails. Each time we win we score 1, each time we lose we score −1. Thus,
Sn is our score after n rounds. The process Sn is often called a simple random walk.
We claim that Sn is a martingale. To see this, we check the three properties in the definition.
(1) Since X1 , X2 , . . . , Xn ∈ mσ(X1 , . . . , Xn ) we have that Sn ∈ mFn for all n ∈ N. (2) Since
|Sn | ≤ n for all n ∈ N, E[|Sn |] ≤ n for all n, so Sn ∈ L1 for all n. (3) We have
E[Sn+1 | Fn ] = E[Xn+1 + Sn | Fn ]
= E[Xn+1 | Fn ] + E[Sn | Fn ]
= E[Xn+1 ] + Sn
= Sn .
Here, in the first line we used the linearity of conditional expectation. To deduce the second
line we used the relationship between independence and conditional expectation (for the first
term) and the measurability rule (for the second term). To deduce the final line we used that
E[Xn+1 ] = (1) · 1/2 + (−1) · 1/2 = 0.
At time n we have seen the result of rounds 1, 2, . . . , n, so the information we currently have
access to is given by Fn . This means that at time n we know S1 , . . . , Sn . But we don’t know
Sn+1 , because Sn+1 is not Fn -measurable. However, using our current information we can make
our best guess at what Sn+1 will be, which naturally is E[Sn+1 |Fn ]. Since the game is fair, in the
future, on average we do not expect to win more than we lose, that is E[Sn+1 |Fn ] = Sn .
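Since the next step of the walk depends on the past only through the current value Sn , we can estimate E[Sn+1 | Fn ] by simulation, grouping runs according to Sn . A Python sketch (illustrative only):

    import random
    from collections import defaultdict

    random.seed(3)
    n, N = 10, 200_000
    buckets = defaultdict(list)
    for _ in range(N):
        s_n = sum(random.choice([-1, 1]) for _ in range(n))
        buckets[s_n].append(s_n + random.choice([-1, 1]))   # one more step

    for s, vals in sorted(buckets.items()):
        print(s, sum(vals) / len(vals))   # approximately s: the game is fair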
In this course we will see many examples of martingales, and we will gradually build up
an intuition for how to recognize a martingale. There is, however, one easy sufficient (but not
necessary) condition under which we can recognize that a stochastic process is not a martingale.
Lemma 3.3.6 Let (Fn ) be a filtration and suppose that (Mn ) is a martingale. Then
E[Mn ] = E[M0 ]
for all n ∈ N.
Proof: We have E[Mn+1 |Fn ] = Mn . Taking expectations and using the ‘taking E’ property
from Proposition 3.2.1, we have E[Mn+1 ] = E[Mn ]. The result follows by a trivial induction.
Suppose, now, that (Xn ) is an i.i.d. sequence of random variables such that P[Xi = 2] =
P[Xi = −1] = 1/2. Note that E[Xn ] > 0. Define Sn and Fn as before. Now, E[Sn ] = Σ_{i=1}^n E[Xi ] =
n E[X1 ] > 0 = E[S0 ] for n ≥ 1, so Lemma 3.3.6 shows that Sn is not a martingale. Essentially the
same calculation as before gives E[Sn+1 | Fn ] = Sn + E[Xn+1 ] ≥ Sn .
Hence Sn is a submartingale.
In general, if (Mn ) is a submartingale, then by definition E[Mn+1 | Fn ] ≥ Mn , so taking
expectations gives us E[Mn+1 ] ≥ E[Mn ]. For supermartingales we get E[Mn+1 ] ≤ E[Mn ]. In
words: submartingales, on average, increase, whereas supermartingales, on average, decrease.
The use of super- and sub- is counterintuitive in this respect.
Remark 3.3.7 Sometimes we will want to make it clear which filtration is being used in the
definition of a martingale. To do so we might say that (Mn ) is an ‘Fn -martingale’, or that (Mn )
is ‘a martingale with respect to Fn ’. We use the same notation for super/sub-martingales.
Our definition of a filtration and a martingale both make sense if we look at only a finite set of
times n = 1, . . . , N . We sometimes also use the terms filtration and martingale in this situation.
We end this section with two important general examples of martingales. You should check
the conditions yourself, as exercise 3.3.
Example 3.3.8 Let (Xn ) be a sequence of i.i.d. random variables such that E[Xn ] = 1 for all n,
and there exists c ∈ R such that |Xn | ≤ c for all n. Define Fn = σ(X1 , . . . , Xn ). Then
Mn = Π_{i=1}^n Xi
is a martingale.
Example 3.3.9 Let Z ∈ L1 be a random variable and let (Fn ) be a filtration. Then
Mn = E[Z|Fn ]
is a martingale.
Use the properties of conditional expectation to find E[S2 | σ(X1 )] and E[S22 | σ(X1 )] in terms
of X1 , X2 and their expected values.
3.2 Let (Xn ) be a sequence of independent random variables such that P[Xn = 2] = 1/3 and
P[Xn = −1] = 2/3. Set Fn = σ(Xi ; i ≤ n). Show that Sn = Σ_{i=1}^n Xi is an Fn martingale.
3.4 Let (Mt ) be a stochastic process that is both a submartingale and a supermartingale.
Show that (Mt ) is a martingale.
3.5 (a) Let (Mn ) be an Fn martingale. Show that, for all 0 ≤ n ≤ m, E[Mm | Fn ] = Mn .
(b) Guess and state (without proof) the analogous result to (a) for submartingales.
3.6 Let (Mn ) be an Fn martingale and suppose Mn ∈ L2 for all n. Show that
E[Mn+1² | Fn ] = Mn² + E[(Mn+1 − Mn )² | Fn ].    (3.4)
Challenge questions
3.8 In the setting of 3.1, show that E[X1 | σ(Sn )] = Sn / n.
Chapter 4
Stochastic processes
In this chapter we introduce stochastic processes, with a selection of examples that are commonly
used as building blocks in stochastic modelling. We show that these stochastic processes are
closely connected to martingales.
4.1 Random walks
Let (Xi ) be a sequence of i.i.d. random variables such that
P[Xi = 1] = P[Xi = −1] = 1/2.    (4.1)
The simple symmetric random walk is the stochastic process
Sn = Σ_{i=1}^n Xi .
By convention, this means that S0 = 0. The word ‘simple’ refers to the fact that the walk moves
by precisely 1 unit of space in each step of time i.e. |Xi | = 1. A sample path of Sn , which means
a sample of the sequence (S0 , S1 , S2 , . . .), might look like:
Note that when time is discrete t = 0, 1, 2, . . . it is standard to draw the location of the random
walk (and other stochastic processes) as constant in between integer time points.
Because of (4.1), the random walk is equally likely to move upwards or downwards. This case
is known as the ‘symmetric’ random walk because, if S0 = 0, the two stochastic processes Sn and
−Sn have the same distribution.
We have already seen (in Section 3.3) that Sn is a martingale, with respect to its generated
filtration
Fn = σ(X1 , . . . , Xn ) = σ(S1 , . . . , Sn ).
It should seem very natural that (Sn ) is a martingale – going upwards as much as downwards is
‘fair’.
The asymmetric random walk
Now suppose instead that (Xi ) is a sequence of i.i.d. random variables such that
P[Xi = 1] = p,   P[Xi = −1] = q,
where p + q = 1 and p, q > 0, and again set Sn = Σ_{i=1}^n Xi ; this is the asymmetric random walk.
The key difference to the symmetric random walk is that here we have p ≠ q (the symmetric
random walk has p = q = 1/2). The asymmetric random walk is more likely to step upwards than
downwards if p > q , and vice versa if q > p. The technical term for this behaviour is drift. A
sample path for the case p > q might look like:
This is ‘unfair’, because of the drift upwards, so we should suspect that the asymmetric random
walk is not a martingale. In fact,
E[Sn ] = Σ_{i=1}^n E[Xi ] = Σ_{i=1}^n (p − q) = n(p − q),    (4.2)
whereas E[S0 ] = 0. Thus, Lemma 3.3.6 confirms that Sn is not a martingale. However, the
process
Mn = Sn − n(p − q) (4.3)
is a martingale. The key is that the term n(p − q) compensates for the drift and ‘restores fairness’.
We’ll now prove that (Mn ) is a martingale. Since Xi ∈ mFn for all i ≤ n, by Proposition 2.2.6
we have Sn − n(p − q) ∈ mFn . Since |Xi | ≤ 1 we have
E[|Mn |] ≤ E[|Sn |] + n|p − q| ≤ n + n|p − q| < ∞,
so Mn ∈ L1 . Lastly,
E[Mn+1 | Fn ] = E[Sn + Xn+1 | Fn ] − (n + 1)(p − q)
= Sn + E[Xn+1 ] − (n + 1)(p − q)
= Sn + (p − q) − (n + 1)(p − q)
= Mn .
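A simulation shows the compensation at work: E[Sn ] drifts like n(p − q), while Mn = Sn − n(p − q) keeps mean 0. (Python sketch with an illustrative value of p.)

    import random

    random.seed(4)
    p = 0.7
    q = 1 - p
    N, n_max = 50_000, 20
    mean_S = [0.0] * (n_max + 1)
    for _ in range(N):
        s = 0
        for n in range(1, n_max + 1):
            s += 1 if random.random() < p else -1
            mean_S[n] += s / N               # accumulate the sample mean of S_n

    for n in (5, 10, 20):
        print(n, mean_S[n], mean_S[n] - n * (p - q))   # drift n(p-q); approx 0 after compensation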
4.2 Urn processes
Consider an urn that initially contains one red ball and one black ball. At each time step
n = 1, 2, . . . we do the following:
1. Draw a ball from the urn, look at its colour, and return this ball to the urn.
2. Add a new ball, of the same colour as the ball drawn, into the urn.
So, at time n (which means: after the nth iteration of the above steps is completed) there are n+2
balls in the urn. This process is known as the Pólya urn. Pólya was a Hungarian mathematician
who made contributions across a wide spectrum of mathematics, in the first half of the 20th century.
Let Bn be the number of red balls in the urn at time n, and note that B0 = 1. Set (Fn ) to be
the filtration generated by (Bn ).
Our first step is to note that Bn itself is not a martingale. The reason is that over time we will
put more and more red balls into the urn, so the number of red balls drifts upwards over time.
Formally, we can note that
E[Bn+1 | Fn ] = (Bn + 1) · Bn/(n + 2) + Bn · (1 − Bn/(n + 2)) = Bn + Bn/(n + 2),    (4.4)
which is strictly greater than Bn . However, the proportion of red balls in the urn,
Mn = Bn/(n + 2),
is a martingale. To see this, we calculate
E[Mn+1 | Fn ] = E[Bn+1/(n + 3) | Fn ]
= (Bn + 1)/(n + 3) · E[1{(n+1)th draw is red} | Fn ] + Bn/(n + 3) · E[1{(n+1)th draw is black} | Fn ]
= (Bn + 1)/(n + 3) · Bn/(n + 2) + Bn/(n + 3) · (1 − Bn/(n + 2))
= (Bn² + Bn)/((n + 2)(n + 3)) + ((n + 2)Bn − Bn²)/((n + 2)(n + 3))
= (n + 3)Bn/((n + 2)(n + 3))
= Bn/(n + 2)
= Mn .
We have Mn ∈ mFn and since Mn ∈ [0, 1] we have that Mn ∈ L1 . Hence (Mn ) is a martingale.
We can think of Mn = Bn/(n + 2) as a compensation mechanism; Bn tends to increase, and we
compensate for this increase by dividing by n + 2.
Remark 4.2.1 The calculation of E[Mn+1 | Fn ] is written out in full as a second example of
the method. In fact, we could simply have divided the equality in (4.4) by n + 3, and obtained
E[Mn+1 | Fn ] = Mn .
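A simulation of the urn makes the compensation visible: Bn grows over time, but the sample mean of Mn stays near M0 = 1/2. (Python sketch, illustrative only.)

    import random

    random.seed(5)
    N, n_max = 50_000, 50
    total_M = 0.0
    for _ in range(N):
        red, balls = 1, 2                     # B_0 = 1 red ball out of 2
        for _ in range(n_max):
            if random.random() < red / balls:
                red += 1                      # drew red: add another red ball
            balls += 1
        total_M += (red / balls) / N
    print(total_M)                            # approx E[M_50] = E[M_0] = 1/2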
On fairness
It is clear that the symmetric random walk is fair; at all times it is equally likely to move up as
down. The asymmetric random walk is not fair, due to its drift (4.2), but once we compensate
for drift in (4.3) we do still obtain a martingale.
The urn process requires more careful thought. For example, we might wonder:
Suppose that the first draw is red. Then, at time n = 1 we have two red balls and one
black ball. So, the chance of drawing a red ball is now 2/3. How is this fair?!
To answer this question, let us make a number of points. Firstly, let us remind ourselves that the
quantity which is a martingale is Mn , the proportion of red balls in the urn.
Secondly, suppose that the first draw is indeed red. So, at n = 1 we have 2 red and 1 black,
giving a proportion of 2/3 red and 1/3 black. The expected fraction of red balls after the next
(i.e. second) draw is
(2/3) · (2 + 1)/4 + (1/3) · (2/4) = (6 + 2)/12 = 2/3,
which is of course equal to the proportion of red balls that we had at n = 1.
Lastly, note that it is equally likely that, on the first go, you’d pick out a black. So, starting
from n = 0 and looking forwards, both colors have equally good chances of increasing their own
numbers. In fact, if we were to pretend, right from the start, that black was red, and red was
black, we would see the same urn process. This type of fairness is known as symmetry. We’ve
seen that Bn tends to increase (because we keep adding more balls), and we can think of Mn as
a way of discovering the fairness ‘hiding’ inside of Bn .
To sum up: in life there are different ways to think of ‘fairness’ – and what we need to do here
is get a sense for precisely what kind of fairness martingales characterize. The fact that (Mn )
is a martingale does not prevent us from (sometimes) ending up with many more red balls than
black, or vice versa. It just means that, when viewed in terms of (Mn ), there is no bias towards
red or black inherent in the rules of the game.
Each dot is a ‘parent’, which has a random number of child dots (indicated by arrows). Each
parent chooses how many children it will have independently of all else, by taking a random sample
from G. The BGW process is the process Zn, where Zn is the number of dots in generation n.
Formally, we define the BGW process as follows. Let X_i^n, where n, i ≥ 1, be i.i.d. nonnegative
integer-valued random variables with common distribution G. Define a sequence (Zn) by Z0 = 1
and
Zn+1 = X_1^{n+1} + . . . + X_{Zn}^{n+1}   if Zn > 0,
Zn+1 = 0                                  if Zn = 0.    (4.5)
Then Z is the BGW process. The random variable X_i^n represents the number of children of the
ith parent in the nth generation.
Note that if Zn = 0 for some n, then for all m > n we also have Zm = 0.
A calculation, which we will shortly repeat with conditional expectation in place of E, shows that
E[Zn+1] = µ E[Zn],    (4.6)
where µ = E[G] is the mean number of children of a single parent. Here, we use that the X_i^{n+1}
are independent of Fn, but Zn (and hence also 1{Zn = k}) is Fn measurable. We justify
exchanging the infinite sum Σ and E using the result of exercise 6.8.
Lemma 3.3.6 tells us that if (Mn) is a martingale then E[Mn] = E[Mn+1]. But, from (4.6),
if µ < 1 we see that E[Zn+1] < E[Zn] (downwards drift) and if µ > 1 then E[Zn+1] > E[Zn]
(upwards drift).
However, much like with the asymmetric random walk, we can compensate for the drift and
obtain a martingale. More precisely, we will show that
Mn = Zn/µ^n
is a martingale.
We have M0 = 1 ∈ mF0, and if Mn ∈ mFn then from (4.5) we have that Mn+1 ∈ mFn+1.
Hence, by induction, Mn ∈ mFn for all n ∈ N. From (4.6) we have E[Zn+1] = µE[Zn] and
E[Z0] = 1, so E[Zn] = µ^n for all n. Hence E[Mn] = 1 and Mn ∈ L1.
Lastly, we repeat the calculation that led to (4.6), but now with conditional expectation in
place of E. The first few steps are essentially the same, and we obtain
E[Zn+1 | Fn] = Σ_{k=1}^{∞} E[(X_1^{n+1} + . . . + X_k^{n+1}) 1{Zn = k} | Fn]
= Σ_{k=1}^{∞} 1{Zn = k} E[X_1^{n+1} + . . . + X_k^{n+1} | Fn]
= Σ_{k=1}^{∞} 1{Zn = k} E[X_1^{n+1} + . . . + X_k^{n+1}]
= Σ_{k=1}^{∞} kµ 1{Zn = k}
= µ Σ_{k=1}^{∞} k 1{Zn = k}
= µ Zn.
Here we use that Zn is Fn measurable to take out what is known, and then use that X_i^{n+1} is
independent of Fn. Dividing both sides by µ^{n+1} gives E[Mn+1 | Fn] = Zn/µ^n = Mn, as required.
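The recursion (4.5) and the drift E[Zn+1] = µE[Zn] are easy to explore by simulation. Here is a minimal sketch (ours, not part of the notes; the choice of a Poisson offspring distribution for G is an assumption made purely for illustration):

import numpy as np

def bgw_final(generations, mu, rng):
    # Run the BGW process (4.5) with Poisson(mu) offspring; return Z_generations.
    z = 1  # Z_0 = 1
    for _ in range(generations):
        # Z_{n+1} = X_1 + ... + X_{Z_n}; once Z_n = 0 it stays at 0.
        z = int(rng.poisson(mu, size=z).sum()) if z > 0 else 0
    return z

rng = np.random.default_rng(42)
for mu in (0.8, 1.0, 1.2):
    extinct = sum(bgw_final(50, mu, rng) == 0 for _ in range(2_000)) / 2_000
    print(mu, extinct)  # extinction is common when mu <= 1, much rarer when mu > 1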
Remark 4.4.1 All the processes we have studied in this section can be represented as Markov
chains with state space N. It is possible to use the general theory of Markov chains to study these
stochastic processes, but it wouldn’t provide as much detail as we will obtain (in Chapters 7 and
8) using martingales.
is a martingale.
4.2 Let Sn = Σ_{i=1}^{n} Xi be the asymmetric random walk from Section 4.1, where P[Xi = 1] = p,
P[Xi = −1] = q and with p > q and p + q = 1. Show that (Sn) is a submartingale and that
Mn = (q/p)^{Sn}
is a martingale.
4.3 Let (Xi) be a sequence of identically distributed random variables with common distribution
Xi = a    with probability pa,
Xi = −b   with probability pb = 1 − pa.
4.4 Let Sn = Σ_{i=1}^{n} Xi be the symmetric random walk from Section 4.1. Show that Sn^2 is a
submartingale.
4.5 Let (Xi) be an i.i.d. sequence of random variables such that P[Xi = 1] = P[Xi = −1] = 1/2.
Define a stochastic process Sn by setting S0 = 1 and
Sn+1 = Sn + Xn+1   if Sn > 0,
Sn+1 = 1           if Sn = 0.
That is, Sn behaves like a symmetric random walk but, whenever it becomes zero, on the
next time step it is ‘reflected’ back to 1. Let
Ln = Σ_{i=0}^{n−1} 1{Si = 0}
be the number of time steps, before time n, at which S is zero. Show that
E[Sn+1 | Fn] = Sn + 1{Sn = 0}
and hence that Mn = Sn − Ln is a martingale.
4.6 Consider an urn that may contain balls of three colours: red, blue and green. Initially the
urn contains one ball of each colour. Then, at each step of time n = 1, 2, . . . we draw a ball
from the urn. We place the drawn ball back into the urn and add an additional ball of the
same colour.
Let (Mn ) be the proportion of balls that are red. Show that (Mn ) is a martingale.
4.7 Let Sn = Σ_{i=1}^{n} Xi be the symmetric random walk from Section 4.1. State, with proof, which
Challenge questions
4.8 Let (Sn) be the symmetric random walk from Section 4.1. Prove that there is no deterministic
function f : N → R such that Sn^3 − f(n) is a martingale.
Chapter 5
We now return to financial mathematics. We will extend the one-period model from Chapter 1
and discover a surprising connection between arbitrage and martingales.
The value process of a portfolio h = (x, y), containing x units of cash and y units of stock, is
V0^h = x + ys,
V1^h = x(1 + r) + yS1.
We can also formalize the idea of arbitrage. A portfolio is an arbitrage if it makes money for
free:
Definition 5.1.2 A portfolio h = (x, y) is said to be an arbitrage possibility if:
V0^h = 0,
P[V1^h ≥ 0] = 1,
P[V1^h > 0] > 0.
We say that a market is arbitrage free if there do not exist any arbitrage possibilities.
It is possible to characterize exactly when the one-period market is arbitrage free. In fact, we
have already done most of the work in exercise 1.3.
Proposition 5.1.3 The one-period market is arbitrage free if and only if d < 1 + r < u.
Proof: (⇒) : Recall that we assume d < u. Hence, if d < 1+r < u fails then either 1+r ≤ d < u
or d < u ≤ 1 + r. In both cases, we will construct an arbitrage possibility.
In the case 1 + r ≤ d < u we use the portfolio h = (−s, 1), which has V0^h = 0 and
V1^h = −s(1 + r) + S1 ≥ s(−(1 + r) + d) ≥ 0,
hence P[V1^h ≥ 0] = 1. Further, with probability pu > 0 we have S1 = su, which means V1^h >
s(−(1 + r) + d) ≥ 0. Hence P[V1^h > 0] > 0. Thus, h is an arbitrage possibility.
If 0 < d < u ≤ 1 + r then we use the portfolio h′ = (s, −1), which has V0^{h′} = 0 and
V1^{h′} = s(1 + r) − S1 ≥ s(1 + r − u) ≥ 0,
hence P[V1^{h′} ≥ 0] = 1. Further, with probability pd > 0 we have S1 = sd, which means
V1^{h′} > s(1 + r − u) ≥ 0. Hence P[V1^{h′} > 0] > 0. Thus, h′ is also an arbitrage possibility.
Remark 5.1.4 In both cases, at time 0 we borrow whichever commodity (cash or stock) will
grow slowest in value, immediately sell it and use the proceeds to buy the other, which we know
will grow faster in value. Then we wait; at time 1 we own the commodity that has grown fastest in
value, so we sell it, repay our debt and have some profit left over.
(⇐): Now, assume that d < 1 + r < u. We need to show that no arbitrage is possible. To do
so, we will show that if a portfolio has V0^h = 0 and V1^h ≥ 0 then it also has V1^h = 0.
So, let h = (x, y) be a portfolio such that V0^h = 0 and V1^h ≥ 0. We have
V0^h = x + ys = 0.
The value of h at time 1 is
V1^h = x(1 + r) + ysZ.
Using that x = −ys, we have
V1^h = ys(u − (1 + r))   if Z = u,
V1^h = ys(d − (1 + r))   if Z = d.    (5.1)
Since P[V1^h ≥ 0] = 1 this means that both (a) ys(u − (1 + r)) ≥ 0 and (b) ys(d − (1 + r)) ≥ 0. If
y < 0 then we contradict (a) because 1 + r < u. If y > 0 then we contradict (b) because d < 1 + r.
So the only option left is that y = 0, in which case V0^h = V1^h = 0.
Expectation regained
In Proposition 5.1.3 we showed that our one-period model was free of arbitrage if and only if
d < 1 + r < u.
This condition is very natural: it means that sometimes the stock will outperform cash and
sometimes cash will outperform the stock. Without this condition it is intuitively clear that our
market would be a bad model. From that point of view, Proposition 5.1.3 is encouraging since it
confirms the importance of (no) arbitrage.
However, it turns out that there is more to the condition d < 1 + r < u, which we now explore.
It is equivalent to asking that there exist qu, qd ∈ (0, 1) such that both
qu + qd = 1   and   qu u + qd d = 1 + r.    (5.2)
In words, (5.2) says that 1 + r is a weighted average of d and u, where d has weight qd and u has
weight qu . Solving these two equations gives
qu = ((1 + r) − d)/(u − d),   qd = (u − (1 + r))/(u − d).    (5.3)
Now, here is the key: we can think of the weights qu and qd as probabilities. Let’s pretend that
we live in a different world, where a single unit of stock, worth S0 = s at time 0, changes value
to become worth
S1 = sd   with probability qd,
S1 = su   with probability qu.
We have altered – the technical term is tilted – the probabilities from their old values pd , pu to
new values qd , qu . Let’s call this new world Q, by which we mean that Q is our new probability
measure: Q[S1 = sd] = qd and Q[S1 = su] = qu . This is often called the risk-neutral world,
and qu , qd are known as the risk-neutral probabilities1 .
Since Q is a probability measure, we can use it to take expectations. We use EP and EQ to
make it clear if we are taking expectations using P or Q.
We have
(1/(1 + r)) EQ[S1] = (1/(1 + r)) (su Q[S1 = su] + sd Q[S1 = sd])
= (1/(1 + r)) s (u qu + d qd)
= s,
where the last equality holds because u qu + d qd = 1 + r, by (5.2).
The price of the stock at time 0 is S0 = s. To sum up, we have shown that the price S1 of a unit
of stock at time 1 satisfies
S0 = (1/(1 + r)) EQ[S1].    (5.4)
This is a formula that is very well known to economists. It gives the stock price today (t = 0) as
the expectation under Q of the stock price tomorrow (t = 1), discounted by the rate 1 + r at
which it would earn interest.
1 We will discuss the reason for the name ‘risk-neutral’ later.
Equation (5.4) is our first example of a ‘risk-neutral valuation’ formula. Recall that we pointed
out in Chapter 1 that we should not use EP to calculate ‘expected value’ prices. A possible cause of
confusion is that (5.4) does correctly calculate the value (i.e. price) of a single unit of stock by
taking an expectation. The point is that we (1) use EQ rather than EP and (2) then discount
according to the interest rate. We will see, in the next section, that these two steps are the correct
way to go about arbitrage-free pricing in general.
Moreover, in Section 5.4 we will extend our model to have multiple time steps. Then the
expectation in (5.4) will lead us to martingales.
Definition 5.2.1 A contingent claim is any random variable of the form X = Φ(S1 ), where Φ
is a deterministic function.
The function Φ is sometimes known as the contract function. One example of a contingent
claim is a forward contract, in which the holder promises to buy a unit of stock at time 1 for a
fixed price K , known as the strike price. In this case the contingent claim would be
Φ(S1 ) = S1 − K,
the value of a unit of stock at time 1 minus the price paid for it. We will see many other examples
in the course. Here is another.
Example 5.2.2 A European call option gives its holder the right (but not the obligation) to
buy, at time 1, a single unit of stock for a fixed price K that is agreed at time 0. As for forwards,
K is known as the strike price.
Suppose we hold a European call option at time 1. Then, if S1 > K , we could exercise our
right to buy a unit of stock at price K , immediately sell the stock for S1 and consequently earn
S1 − K > 0 in cash. Alternatively if S1 ≤ K then our option is worthless.
Since S1 is equal to either su or sd, the only interesting case is when sd < K < su.
In this case, the contingent claim for our European call option is
Φ(S1) = su − K   if S1 = su,
Φ(S1) = 0        if S1 = sd.    (5.5)
In the first case our right to buy is worth exercising; in the second case it is not. A simpler way
to write this contingent claim is
Φ(S1) = max(S1 − K, 0).    (5.6)
In general, given any contract, we can work out its contingent claim. We therefore plan to find
a general way of pricing contingent claims. In Section 1.3 we relied on finding specific trading
strategies to determine prices (one, from the point of view of the buyer, that gave an upper bound,
and one, from the point of view of the seller, that gave a lower bound). Our first step in this section
is to find a general way of constructing trading strategies.
Definition 5.2.3 We say that a portfolio h = (x, y) is a replicating portfolio for the contingent
claim Φ(S1) if V1^h = Φ(S1).
The process of finding a replicating portfolio is known simply as replicating or hedging. The above
definition means that, if we hold the portfolio h at time 0, then at time 1 it will have precisely
the same value as the contingent claim Φ(S1). Therefore, since we assume our model is free of
arbitrage:
If a contingent claim Φ(S1) has a replicating portfolio h, then the price of Φ(S1) at
time 0 must be equal to the value of h at time 0.
We say that a market is complete if every contingent claim can be replicated. Therefore, if
the market is complete, we can price any contingent claim.
Example 5.2.4 Suppose that s = 1, d = 1/2, u = 2 and r = 1/4, and that we are looking at the
contingent claim
Φ(S1) = 1   if S1 = su,
Φ(S1) = 0   if S1 = sd.
We can represent this situation as a tree, with a branch for each possible movement of the stock,
and the resulting value of our contingent claim written in a square box.
Suppose that we wish to replicate Φ(S1). That is, we need a portfolio h = (x, y) such that
V1^h = Φ(S1):
(1 + 1/4)x + 2y = 1
(1 + 1/4)x + (1/2)y = 0.
This is a pair of linear equations that we can solve. The solution (which is left for you to check)
is x = −4/15, y = 2/3. Hence the price of our contingent claim Φ(S1) at time 0 is
V0^h = −4/15 + 1 · (2/3) = 2/5.
Let us now take an arbitrary contingent claim Φ(S1) and see if we can replicate it. This would
mean finding a portfolio h such that the value V1^h of the portfolio at time 1 is Φ(S1):
V1^h = Φ(su)   if S1 = su,
V1^h = Φ(sd)   if S1 = sd.
By (5.1), if we write h = (x, y) then we need
(1 + r)x + suy = Φ(su)
(1 + r)x + sdy = Φ(sd),
which is just a pair of linear equations to solve for (x, y). In matrix form,
[ 1 + r   su ] [ x ]   =   [ Φ(su) ]
[ 1 + r   sd ] [ y ]       [ Φ(sd) ].    (5.7)
A unique solution exists when the determinant is non-zero, that is when (1 + r)su − (1 + r)sd ≠ 0,
or equivalently when u ≠ d. So, in this case, we can find a replicating portfolio for any contingent
claim.
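Since (5.7) is just a 2 × 2 linear system, a computer can find the replicating portfolio directly. A minimal sketch (ours, not part of the notes), checked against the numbers of Example 5.2.4:

import numpy as np

def replicate(r, s, u, d, phi_up, phi_down):
    # Solve (5.7) for the portfolio h = (x, y) with V_1^h = Phi(S_1).
    A = np.array([[1 + r, s * u],
                  [1 + r, s * d]])
    b = np.array([phi_up, phi_down])
    x, y = np.linalg.solve(A, b)  # a unique solution exists since u != d
    return x, y

x, y = replicate(r=0.25, s=1, u=2, d=0.5, phi_up=1, phi_down=0)
print(x, y, x + y * 1)  # x = -4/15, y = 2/3, and the price x + y*s = 2/5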
It is an assumption of the model that d < u, so we have that our one-period model is always
complete. Therefore:
This has the solution (again, left for you to check) x = sd(K − su)/((1 + r)(su − sd)),
y = (su − K)/(su − sd). By the second part of Proposition 5.2.6, the value of the European call
option at time 0 is
(1/(1 + r)) EQ[Φ(S1)] = (1/(1 + r)) (qu(su − K) + qd · 0)
= (1/(1 + r)) · (((1 + r) − d)/(u − d)) · (su − K).
• A forward or forward contract is the obligation to buy (or sell) a single unit of stock at
time 1 for a strike price K .
• A European call option is the right, but not the obligation, to buy a single unit of stock
at time 1 for a strike price K .
• A European put option is the right, but not the obligation, to sell a single unit of stock
at time 1 for a strike price K .
You are expected to remember these definitions! We will often use them in our examples.
There are many other types of financial derivative; we’ll mention some more examples later in
the course, in Section 17.2.
Note that the tree is recombining, in the sense that a move up (by u) followed by a move down
(by d) has the same outcome as a move down followed by a move up. It’s like a random walk,
except we multiply instead of add (recall exercise 4.1).
Remark 5.4.1 The one-period model is simply the T = 1 case of the binomial model. Both
models are summarized on the formula sheet, see Appendix B.
Definition 5.5.1 A portfolio strategy is a stochastic process
ht = (xt, yt),
for t = 1, 2, . . . , T, such that each ht is Ft−1 measurable.
The interpretation is that xt is the amount of cash, and yt the amount of stock, that we hold
during the time step t − 1 ↦ t. We make our choice of how much cash and stock to hold during
t − 1 ↦ t based on knowing the value of S0, S1, . . . , St−1, but without knowing St. This is realistic.
Definition 5.5.2 The value process of the portfolio strategy h = (ht), t = 1, . . . , T, is the
stochastic process (Vt^h), t = 0, . . . , T, given by
V0^h = x1 + y1 S0,
Vt^h = xt(1 + r) + yt St,
for t = 1, 2, . . . , T.
At t = 0, V0^h is the value of the portfolio h1. For t ≥ 1, Vt^h is the value of the portfolio ht at time
t, after the change in value of cash/stock that occurs during t − 1 ↦ t. The value process Vt^h is
Ft measurable but it is not Ft−1 measurable.
We will be especially interested in portfolio strategies that require an initial investment at time
0 but where, at later times t = 1, 2, . . . , T − 1, any changes in the amount of stock/cash held pay
for themselves. We capture such portfolio strategies in the following definition.
Definition 5.5.3 A portfolio strategy ht = (xt, yt) is said to be self-financing if
Vt^h = xt+1 + yt+1 St
for t = 0, 1, 2, . . . , T − 1.
This means that the value of the portfolio at time t is equal to the value (at time t) of the
stock/cash that is held in between times t ↦ t + 1. In other words, in a self-financing portfolio
at the times t = 1, 2, . . . we can swap our stocks for cash (and vice versa) according to whatever
the stock price turns out to be, but that is all we can do.
Lastly, our idea of arbitrage must also be upgraded to handle multiple time steps.
Definition 5.5.4 We say that a portfolio strategy (ht) is an arbitrage possibility if it is self-
financing and satisfies
V0^h = 0,
P[VT^h ≥ 0] = 1,
P[VT^h > 0] > 0.
In words, an arbitrage possibility requires that we invest nothing at times t = 0, 1, . . . , T − 1,
but gives us a positive probability of earning something at time T, with no risk at all of
actually losing money.
It’s natural to ask when the binomial model is arbitrage free. Happily, the condition turns out
to be the same as for the one-period model.
Proposition 5.5.5 The binomial model is arbitrage free if and only if d < 1 + r < u.
The proof is quite similar to the argument for the one-period model, but involves more technical
calculations and (for this reason) we don’t include it as part of the course.
Recall the risk-neutral probabilities from (5.3). We use them to define the risk-neutral world Q
of the binomial model, in which on each time step the stock price moves up (by u) with
probability qu, or down (by d) with probability qd. This provides a connection to martingales:
Proposition 5.5.6 If d < 1 + r < u, then under the probability measure Q, the process
Mt = St/(1 + r)^t
is a martingale, with respect to the filtration (Ft).
Remark 5.5.7 Using Lemma 3.3.6 we have EQ[M0] = EQ[M1], which states that
S0 = (1/(1 + r)) EQ[S1].
5.6 Hedging
We can adapt the derivatives from Section 5.3 to the binomial model, by simply replacing time 1
with time T . For example, in the binomial model a forward contract is the obligation to buy a
single unit of stock at time T for a strike price K that is agreed at time 0.
Definition 5.6.1 A contingent claim is a random variable of the form Φ(ST ), where Φ : R → R
is a deterministic function.
Definition 5.6.2 We say that a portfolio strategy h = (ht), t = 1, . . . , T, is a replicating portfolio
or hedging strategy for the contingent claim Φ(ST) if h is self-financing and VT^h = Φ(ST).
These match the definitions for the one-period model, except we now care about the value of
the asset at time T (instead of time 1). We will shortly look at how to find replicating portfolios.
As in the one-period model, the binomial model is said to be complete if every contingent
claim can be replicated. Further, as in the one-period model, if the binomial model is free of
arbitrage then it is also complete. With this in mind, for the rest of this section we assume that
d < 1 + r < u.
Lastly, as in the one-period model, our assumption that there is no arbitrage means that:
If a contingent claim Φ(ST) has a replicating portfolio h = (ht), t = 1, . . . , T, then the price of
Φ(ST) at time 0 must be equal to V0^h, the value of h at time 0.
Now, let us end this chapter by showing how to compute prices and replicating portfolios in
the binomial model. We already know how to do this in the one-period model, see Example 5.2.4.
We could do it in full generality (as we did in (5.7) for the one-period model) but this would
involve lots of indices and look rather messy. Instead, we’ll work through a practical example that
makes the general strategy clear.
Let us take T = 3 and set S0 = 80, u = 1.5 and d = 0.5. To make the calculations easier, we’ll
also take our interest rate to be r = 0. We’ll price a European call option with strike price
K = 80. The contingent claim for this option is
Φ(S3) = max(S3 − 80, 0).    (5.10)
STEP 1 is to work out the risk-neutral probabilities. From (5.3), these are
qu = (1 + 0 − 0.5)/(1.5 − 0.5) = 0.5 and qd = 1 − qu = 0.5.
STEP 2 is to write down the tree of possible values that the stock can take during time
t = 0, 1, 2, 3: starting from S0 = 80, the stock price moves to 120 or 40 at time 1; to 180, 60 or
20 at time 2; and to 270, 90, 30 or 10 at time 3.
We then work out, at each of the nodes corresponding to time T = 3, what the value of our
contingent claim (5.10) would be if this node were reached: the nodes 270, 90, 30 and 10 give the
values 190, 10, 0 and 0, respectively.
We now come to STEP 3, the key idea. Suppose we are sitting in one of the nodes at time
t = 2, which we think of as the ‘current’ node. For example suppose we are at the uppermost
node (labelled 180, the ‘current’ value of the stock). Looking forwards one step of time we can
see that, if the stock price goes up our option is worth 190, whereas if the stock price goes down
our option is worth 10. What we are seeing here is (an instance of) the one-period model! With
contingent claim
Φ̃(su) = 190,   Φ̃(sd) = 10.
So, using the one-period risk-neutral valuation formula from Proposition 5.2.6 the value of our
call option at our current node is
(1/(1 + 0)) (190 · 0.5 + 10 · 0.5) = 100.
We could apply the same logic to any of the nodes corresponding to time t = 2, and compute the
value of our call option at that node: this gives the values 100, 5 and 0 at the t = 2 nodes labelled
180, 60 and 20, respectively.
If we now imagine ourselves sitting in one of the nodes at time t = 1, and look forwards one step
in time, we again find ourselves faced with an instance of the one-period model. This allows us
to compute the value of our call option at the t = 1 nodes; take for example the node labelled
by 40 which, one step into the future, sees the contingent claim Φ̃(su) = 5, Φ̃(sd) = 0, and using
(5.9) gives the value of the call option at this node as (1/(1 + 0))(5 × 0.5 + 0 × 0.5) = 2.5.
Repeating the procedure on the other t = 1 node, and then also on the single t = 0 node, gives
the value 52.5 at the node labelled 120 and, finally, 27.5 at the t = 0 node.
Therefore, the value (i.e. the price) of our call option at time t = 0 is 27.5.
Although we have computed the price, we haven’t yet computed a replicating portfolio, which
is STEP 4. We could do it by solving lots of linear equations for our one-period models, as in
Example 5.2.4, but since we have several steps a quicker way is to apply Proposition 5.2.6 and
use the general formula we found in (5.8).
Starting at time t = 0, to replicate the contingent claim Φ̃(su) = 52.5 and Φ̃(sd) = 2.5 at time
t = 1, equation (5.8) tells us that we want the portfolio
x1 = (1/(1 + 0)) · (1.5 · 2.5 − 0.5 · 52.5)/(1.5 − 0.5) = −22.5,
y1 = (1/80) · (52.5 − 2.5)/(1.5 − 0.5) = 5/8.
Next, suppose the stock price falls between t = 1 and t = 2, so our next node is S2 = 60. Our
portfolio (x2, y2) now becomes worth
x2(1 + 0) + y2 · 60 = −42.5 + (95/120) · 60 = 5,
again equal to the value of our call option. For the final step, we must replicate the contingent
claim Φ̃(su) = 10, Φ̃(sd) = 0, which (5.8) tells us is done using x3 = −5 and y3 = 1/6. Again, you
can check that the current value of this portfolio is 5.
Lastly, the stock price rises again to S3 = 90. Our portfolio becomes worth
x3(1 + 0) + y3 · 90 = −5 + (1/6) · 90 = 10,
equal to the payoff from our call option.
To sum up, using (5.8) we can work out which portfolio we would want to hold, at each possible
outcome of the stock changing value. At all times we would be holding a portfolio with current
value equal to the current value of the call option. Therefore, this gives a self-financing portfolio
strategy that replicates Φ(ST ).
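The whole procedure (STEPS 1–3) is a backward-induction algorithm, and it translates into a few lines of code. The sketch below is ours — a minimal illustration, not an official solution to exercise 5.10 — and it reproduces the price 27.5 from this example:

def binomial_price(s0, u, d, r, T, payoff):
    # Price the contingent claim payoff(S_T) in the binomial model,
    # by backward induction with the risk-neutral probabilities (5.3).
    qu = (1 + r - d) / (u - d)
    qd = 1 - qu
    # Claim values at time T; the tree is recombining, so a node is
    # determined by its number k of up-moves.
    values = [payoff(s0 * u**k * d**(T - k)) for k in range(T + 1)]
    # Step back one time level at a time, applying one-period valuation.
    for _ in range(T):
        values = [(qu * values[k + 1] + qd * values[k]) / (1 + r)
                  for k in range(len(values) - 1)]
    return values[0]

print(binomial_price(80, 1.5, 0.5, 0.0, 3, lambda s: max(s - 80, 0)))  # 27.5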
(a) Φ(S1) = 1.
(b) Φ(S1) = 3   if S1 = su,
    Φ(S1) = 1   if S1 = sd.
5.3 Find the contingent claims Φ(S1 ) for the following derivatives.
(a) A contract in which the holder promises to buy two units of stock at time t = 1, each
for strike price K .
(b) A European put option with strike price K ∈ (sd, su) (see Section 5.3).
(c) A contract in which we promise that, if S1 = su, we will sell one unit of stock at time
t = 1 for strike price K ∈ (sd, su) (and otherwise, if S1 = sd we do nothing).
(d) Holding both the contracts in (b) and (c) at once.
5.6 Let T = 2 and let the initial value of a single unit of stock be S0 = 100. Suppose that
pu = 0.25 and pd = 0.75, that u = 2.0 and d = 0.5, and that r = 0.25. Draw out, in a
tree-like diagram, the possible values of the stock price at times t = 0, 1, 2. Find the price,
at time 0, of a European put option with strike price K = 100.
Suppose instead that pu = 0.1 and pd = 0.9. Does this change the value of our put option?
5.7 Let T = 2 and let the initial value of a single unit of stock be S0 = 120. Suppose that
pu = 0.5 and pd = 0.5, that u = 1.5 and d = 0.5, and that r = 0.0. Draw out, in a tree-like
diagram, the possible values of the stock price at times t = 0, 1, 2. Annotate your tree to
show a hedging strategy for a European call option with strike price K = 60. Hence, write
down the value of this option at time 0.
5.8 Let T = 2 and let the initial value of a single unit of stock be S0 = 480. Suppose that
pu = 0.5 and pd = 0.5, that u = 1.5 and d = 0.75, and that r = 0. Draw out, in a tree-like
diagram, the possible values of the stock price at times t = 0, 1, 2. Annotate your tree to
show a hedging strategy for a European call option with strike price K = 60. Hence, write
down the value of this option at time 0.
Comment on the values obtained for the hedging portfolios.
5.9 Recall that (St), t = 1, . . . , T, is the price of a single unit of stock.
Challenge questions
5.10 Write a computer program (in a language of your choice) that carries out the pricing algo-
rithm for the binomial model, for a general number n of time-steps.
Chapter 6
A real number is a simple object; it takes a single value. As such, if (an) is a sequence of real
numbers, limn→∞ an = a means that the value of an converges to the value of a.
Random variables are more complicated objects. They take many different values, with dif-
ferent probabilities. Consequently, if X1 , X2 , . . . and X are random variables, there are many
different ways in which we can try to make sense of the idea that Xn → X . They are called modes
of convergence, and are the focus of this chapter.
Convergence of random variables sits at the heart of all sophisticated stochastic modelling.
Crucially, it provides a way to approximate one random variable with another (since if Xn → X
then we may hope that Xn ≈ X for large n), which is particularly helpful if it is possible to
approximate a complex model Xn with a relatively simple random variable X . We will explore
this theme further in later chapters.
P[Xn → X as n → ∞] = 1.
• Xn → X in Lp, known as convergence in Lp, if
E[|Xn − X|^p] → 0 as n → ∞.
Here, p ≥ 1 is a real number. We will be interested in the cases p = 1 and p = 2. The case p = 2
is sometimes known as convergence in mean square. Note that these four definitions also appear
on the formula sheet, in Appendix B.
It is common for random variables to converge in some modes but not others, as the following
example shows.
Example 6.1.1 Let U be a uniform random variable on [0, 1] and set
Xn = n^2 1{U < 1/n},   that is, Xn = n^2 if U < 1/n and Xn = 0 otherwise.
Our candidate limit is X = 0, the random variable that takes the deterministic value 0. We’ll
check each of the types of convergence in turn.
• For convergence in distribution, we note that P[X ≤ x] = 0 if x < 0 and P[X ≤ x] = 1 if
x ≥ 0. We consider these two cases:
As we might hope, there are relationships between the different modes of convergence, which
are useful to remember.
Lemma 6.1.2 Let X and (Xn) be random variables.
1. If Xn → X in probability, then Xn → X in distribution.
2. If Xn → X almost surely, then Xn → X in probability.
3. If Xn → X in Lp, then Xn → X in probability.
4. Let 1 ≤ p < q. If Xn → X in Lq, then Xn → X in Lp.
In all other cases (i.e. that are not automatically implied by the above), convergence in one mode
does not imply convergence in another.
The proofs are not part of our course (they are part of MAS31002/61022). We can summarise
Lemma 6.1.2 with a diagram:
Remark 6.1.3 For convergence of real numbers, it was shown in MAS221 that if an → a and
an → b then a = b, which is known as uniqueness of limits. For random variables, the situation is
a little more complicated: if Xn → X and Xn → Y, both in probability, then X = Y almost surely.
By Lemma 6.1.2, this result also applies to convergence in Lp and to almost sure convergence.
However, if we have only Xn → X and Xn → Y in distribution, then we can only conclude that
X and Y have the same distribution function. Proving these facts is one of the challenge
exercises, 6.6.
Theorem 6.2.1 (Monotone Convergence Theorem) Let (Xn) be a sequence of random vari-
ables and suppose that:
1. Xn+1 ≥ Xn almost surely, for all n;
2. Xn ≥ 0 almost surely, for all n.
Then, there exists a random variable X such that Xn → X almost surely. Further, E[Xn] → E[X].
The first claim of the theorem, that the sequence Xn converges almost surely, is true because by
property 1 the sequence Xn is increasing, almost surely, and increasing sequences of real numbers
converge (possibly to +∞).
Since limits preserve weak inequalities and Xn ≥ 0 we have X ≥ 0, almost surely. However,
we should not forget that E[X] might be equal to +∞.
You can think of Theorem 6.2.1 as a stochastic equivalent of the fact that increasing real
valued sequences converge (in R if they are bounded, and to +∞ if they are not bounded). This
is usually helpful when applying the theorem, such as in the following example.
Let (Xn ) be a sequence of independent random variables, with distribution given by
Let Yn = Σ_{i=1}^{n} Xi. Then (Yn) is an increasing sequence, almost surely, and hence converges
almost surely.
Remark 6.2.2 (∆) Those of you taking the MAS61023 version of the course should now begin
your independent reading, starting in Chapter 8. It begins with a convergence theorem similar to
Theorem 6.2.1, but which applies to non-monotone sequences.
Show that Xn → 0 in L1 and almost surely. Deduce that also Xn → 0 in probability and in
distribution.
6.2 Let Xn, X be random variables.
(a) Suppose that Xn → X in L1 as n → ∞. Show that E[Xn] → E[X].
(b) Give an example where E[Xn] → E[X] but Xn does not converge to X in L1.
6.3 Let U be a random variable such that P[U = 0] = P[U = 1] = P[U = 2] = 1/3. Let Xn and
X be given by
Xn = 1 + 1/n   if U = 0,        X = 1   if U ∈ {0, 1},
Xn = 1 − 1/n   if U = 1,        X = 0   if U = 2.
Xn = 0         if U = 2,
Show that Xn → X both almost surely and in L1 . Deduce that also Xn → X in probability
and in distribution.
6.4 Let X1 be a random variable with distribution given by P[X1 = 1] = P[X1 = 0] = 1/2. Set
Xn = X1 for all n ≥ 2. Set Y = 1 − X1. Show that Xn → Y in distribution, but not in
probability.
6.5 Let (Xn) be the sequence of random variables from 6.1. Define Yn = X1 + X2 + . . . + Xn.
(a) Show that, for all ω ∈ Ω, the sequence Yn(ω) is increasing and bounded.
(b) Deduce that there exists a random variable Y such that Yn → Y almost surely.
6.7 Let X be a random variable such that X ≥ 0 and P[X < ∞] = 1. Define
Xn = X   if X ≤ n,
Xn = 0   otherwise.
Equivalently, Xn = X 1{X ≤ n}. Show that E[Xn] → E[X] as n → ∞.
6.8 Let (Xi) be a sequence of random variables in L1.
(a) Explain briefly why E[Σ_{i=1}^{n} Xi] = Σ_{i=1}^{n} E[Xi] follows from the linearity of E, for
n ∈ N. Explain briefly why linearity alone does not allow us to deduce the same
equation with n = ∞.
(b) Suppose that Xn ≥ 0 almost surely, for all n. Show that
E[Σ_{i=1}^{∞} Xi] = Σ_{i=1}^{∞} E[Xi].    (6.1)
(c) Suppose, instead, that the Xi are independent and that P[Xi = 1] = P[Xi = −1] = 1/2.
Explain briefly why (6.1) fails to hold, in this case.
Challenge questions
6.6 Let (Xn) be a sequence of random variables, and let X and Y be random variables.
(a) Show that if Xn → X and Xn → Y, both in distribution, then X and Y have the same
distribution.
6.7 Let (Xn) be a sequence of independent random variables such that P[Xn = 1] = P[Xn =
0] = 1/2. Show that (Xn) does not converge in probability and deduce that (Xn) also does
not converge in L1, or almost surely. Does Xn converge in distribution?
Chapter 7
In this section we introduce two important results from the theory of martingales, namely the
‘martingale transform’ and the ‘martingale convergence theorem’. We use these results to anal-
yse the behaviour of stochastic processes, including those from Chapter 4 (random walks, urns,
branching processes) and also the gambling game Roulette.
Despite having found a martingale connected to the binomial model, in Proposition 5.5.6, we
won’t use martingales to analyse our financial models – yet. That will come in Chapter 15, once
we have moved into continuous time and introduced the Black-Scholes model.
Mi − Mi−1 = Xi, so we win/lose each round with even chances: we bet 1 on each round; if we
win we get our money back doubled, if we lose we get nothing back.
We place the bet Ci−1 ∈ mFi−1 on play i, meaning that we must place our ith bet using only
the information we gained during the first i − 1 plays. In particular, we don’t yet know the result
of the ith play. So our filtration here is Fn = σ(Ci , Xi ; i ≤ n).
Hence Y is a martingale.
The argument is easily adapted to prove the second statement, e.g. for a supermartingale M ,
E[Mn − Mn−1 | Fn−1 ] ≤ 0. Note that in these cases it is important that C is non-negative.
7.2 Roulette
The martingale transform is a useful theoretical tool (see e.g. Sections 7.3, 8.2 and 12.1), but it
also provides a framework to model casino games. We illustrate this with Roulette.
In roulette, a metal ball lies inside of a spinning wheel. The wheel is divided into 37 segments,
of which 18 are black, 18 are red, and 1 is green. The wheel is spun, and the ball spins with it,
eventually coming to rest in one of the 37 segments. If the roulette wheel is manufactured properly,
the ball lands in each segment with probability 1/37, and the result of each spin is independent.
On each spin, a player can bet an amount of money C . The player chooses either red or
black. If the ball lands on the colour of their choice, they get their bet of C returned and win an
additional C . Otherwise, the casino takes the money and the player gets nothing.
The key point is that players can only bet on red or black. If the ball lands on green, the
casino takes everyone’s money.
Remark 7.2.1 Thinking more generally, most casino games fit into this mould – there is a very
small bias towards the casino earning money. This bias is known as the ‘house advantage’.
In each round of roulette, a player’s probability of winning is 18/37 (it does not matter which
colour they pick). Let (Xn) be a sequence of i.i.d. random variables such that
Xn = 1    with probability 18/37,
Xn = −1   with probability 19/37.
Naturally, the first case corresponds to the player winning game n and the second to losing. We
define
Mn = Σ_{i=1}^{n} Xi.
Then, the value of Mn − Mn−1 = Xn is 1 if the player wins game n and −1 if they lose. We take
our filtration to be generated by (Mn ), so Fn = σ(Mi ; i ≤ n).
A player cannot see into the future. So the bet they place on game n must be chosen before the
game is played, at time n − 1 – we write this bet as Cn−1 , and require that it is Fn−1 measurable.
Hence, C is adapted. The total profit/loss of the player over time is the martingale transform
(C ◦ M)n = Σ_{i=1}^{n} Ci−1 (Mi − Mi−1).
We’ll now show that (Mn) is a supermartingale. We have Mn ∈ mFn and since |Mn| ≤ n we also
have Mn ∈ L1. Lastly,
E[Mn+1 | Fn] = E[Mn + Xn+1 | Fn]
= Mn + E[Xn+1 | Fn]
= Mn + E[Xn+1]
≤ Mn.
Here, the second line follows by linearity and the taking out what is known rule. The third line
follows because Xn+1 is independent of Fn, and the last line follows because E[Xn+1] = −1/37 < 0.
So, (Mn ) is a supermartingale and (Cn ) is adapted. Theorem 7.1.1 applies and tells us that
(C ◦ M )n is a supermartingale. We’ll continue this story in Section 7.4.
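The supermartingale property predicts that the player’s profit drifts downwards, at rate 1/37 per unit staked. Here is a minimal simulation sketch (ours, not part of the notes), using the constant strategy Cn = 1; any other adapted strategy could be substituted for the bet function:

import random

def roulette_profit(n_spins, bet, rng):
    # Total profit (C o M)_n after n_spins plays, where bet(history)
    # is the stake C_{i-1}, allowed to depend only on past results.
    profit, history = 0, []
    for _ in range(n_spins):
        stake = bet(history)
        win = rng.random() < 18 / 37
        profit += stake if win else -stake
        history.append(win)
    return profit

rng = random.Random(7)
runs = [roulette_profit(1000, lambda h: 1, rng) for _ in range(2_000)]
print(sum(runs) / len(runs))  # close to 1000 * (-1/37), about -27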
Remark 7.2.2 There have been ingenious attempts to win money at Roulette, often through
hidden technology or by exploiting mechanical flaws (which can slightly bias the odds), mixed
with probability theory.
In 1961, Edward O. Thorp (a professor of mathematics) and Claude Shannon (a professor of
electrical engineering) created the world’s first wearable computer, which timed the movements of
the ball and wheel, and used this information to try and predict roughly where the ball would
land.
Information was input to the computer by its wearer, who silently tapped their foot as the Roulette
wheel spun. By combining this information with elements of probability theory, they believed they
could consistently ‘beat the casino’. They were very successful, and obtained a return of 144% on
their bets. Their method is now illegal; at the time it was not, because no-one had even thought
it might be possible.
Of course, most gamblers are not so fortunate.
Definition 7.3.1 Let p ∈ [1, ∞). We say that a stochastic process (Xn) is uniformly bounded in
Lp if there exists some M < ∞ such that, for all n,
E[|Xn|^p] ≤ M.
The number of upcrossings of [a, b] made by M up to time N, written UN[a, b], is the largest k
for which there exist times
0 ≤ s1 < t1 < s2 < t2 < . . . < sk < tk ≤ N such that Msi ≤ a, Mti > b for all i = 1, . . . , k.
Note that, for convenience, we draw Mn as a continuous (green) line although in fact it only
changes value at discrete times.
Studying upcrossings is key to establishing almost sure convergence of supermartingales. To
see why upcrossings are important, note that if (cn) ⊆ R is a (deterministic) sequence and cn → c,
for some c ∈ R, then there is no interval [a, b], with a < b, such that (cn) makes infinitely many
upcrossings of [a, b]; if there were then (cn) would oscillate and couldn’t converge.
Note that UN[a, b] is an increasing function of N, and define U∞[a, b] by
U∞[a, b] = lim_{N→∞} UN[a, b].    (7.1)
This is an almost sure limit, since it exists for each ω ∈ Ω. With this definition, U∞[a, b] could
potentially be infinite, but we can prove that it is not.
The behaviour of Cn is that, when M enters the region below a, Cn starts taking the value 1. It
will continue to take the value 1 until M enters the region above b, at which point Cn will start
taking the value 0. It will continue to take the value 0 until M enters the region below a, and so
on. Hence,
(C ◦ M)N = Σ_{k=1}^{N} Ck−1 (Mk − Mk−1) ≥ (b − a) UN[a, b] − |MN − a|.    (7.2)
That is, each upcrossing of [a, b] by M picks up at least (b − a); the final term corresponds to an
upcrossing that M might have started but not finished.
Note that Cn is adapted, bounded and non-negative. Hence, by Theorem 7.1.1 we have that
C ◦ M is a supermartingale. Thus E[(C ◦ M )N ] ≤ 0, which combined with (7.2) proves the given
result.
Lemma 7.3.3 Suppose M is a supermartingale and uniformly bounded in L1 . Then P [U∞ [a, b] =
∞] = 0.
Proof: We have that UN[a, b] is increasing, as N increases, and the definition of U∞[a, b] in (7.1)
gives that UN[a, b] → U∞[a, b] almost surely as N → ∞. Hence, by the monotone convergence
theorem, E[UN[a, b]] → E[U∞[a, b]], and so by letting N → ∞ in (7.3) we have
Proof: Define
Λa,b = {ω : for infinitely many n, Mn (ω) < a} ∩ {ω : for infinitely many n, Mn (ω) > b}.
We observe that Λa,b ⊆ {U∞[a, b] = ∞}, which has probability 0 by Lemma 7.3.3. But since
the event that (Mn) fails to converge (in [−∞, +∞]) is the union of the events Λa,b over all
rational a < b, and a countable union of events with probability zero also has probability zero,
we have that
P[Mn converges to some M∞ ∈ [−∞, +∞]] = 1.
This inequality can be proved using the monotone convergence theorem and some careful analysis,
see exercise 7.14. Since we assumed that (Mn ) is uniformly bounded in L1 , (7.4) gives us that
E[|M∞ |] < ∞. Hence, P[|M∞ | = ∞] = 0 (or else the expected value would be infinite).
One useful note is that if Mn is a non-negative supermartingale then we have E[|Mn |] =
E[Mn ] ≤ E[M0 ], so in this case M is uniformly bounded in L1 .
Theorem 7.3.4 has one big disadvantage: it cannot tell us anything about the limit M∞ , except
that it is finite. To gain more information about M∞ , we need an extra condition.
Corollary 7.3.5 (Martingale Convergence Theorem II) In the setting of Theorem 7.3.4,
suppose additionally that (Mn ) is uniformly bounded in L2 . Then Mn → M∞ in both L1 and L2 ,
and
lim_{n→∞} E[Mn] = E[M∞],   lim_{n→∞} var(Mn) = var(M∞).
Lemma 7.4.1 Let (an ) be an integer valued sequence, and suppose that an → a as n → ∞, where
a ∈ R. Then (an ) is eventually equal to a constant: there exists N ∈ N such that an = a for all
n ≥ N.
Proof: Put ε = 1/3 into the definition of a convergent real sequence, and we obtain that there
exists N ∈ N such that |an − a| ≤ 1/3 for all n ≥ N. Hence, for n ≥ N we have
|an − an+1| ≤ |an − a| + |a − an+1| ≤ 2/3.
Since 2/3 < 1, and an takes only integer values, this means that an = an+1. Since this holds for all
n ≥ N, we must have aN = aN+1 = aN+2 = . . ., which means that an → aN as n → ∞ (and
hence a = aN).
Recall from Section 7.2 that the total profit/loss of our roulette player is given by the martingale
transform (C ◦ M)n. Thus, if our gambler starts with an amount of money K ∈ N at time 0,
then at time n they have an amount of money given by
Wn = K + (C ◦ M )n . (7.5)
We’ll assume that our gambler keeps betting until they run out of money, so we have Cn ≥ 1 and
Wn ≥ 0 for all n before they run out of money (and, if and when they do run out, Cn = 0 for
all remaining n). Since money is a discrete quantity, we can assume that K and Cn are integers,
hence Wn is also always an integer.
From Section 7.2 we know that (C ◦M )n is a supermartingale, hence Wn is also a supermartin-
gale. Since Wn ≥ 0 we have E[|Wn |] = E[Wn ] ≤ E[W0 ] so (Wn ) is uniformly bounded in L1 .
Hence, by the martingale convergence theorem, Wn is almost surely convergent.
By Lemma 7.4.1, the only way that Wn can converge is if it becomes constant, eventually.
Since each play results in a win (an increase) or a loss (a decrease), the only way Wn can become
eventually constant is if our gambler has lost all their money, i.e. for some (random) N ∈ N, for all
n ≥ N, we have Wn = 0. Thus:
Lemma 7.4.2 Almost surely, a roulette player will eventually lose all their money.
We now aim to prove Proposition 7.4.3, which means we must find out the distribution of M∞.
In order to do so, we will use Lemma 6.1.2, which tells us that Mn → M∞ in distribution, that is
P[Mn ≤ x] → P[M∞ ≤ x] as n → ∞,    (7.7)
for all x, except possibly those for which P[M∞ = x] > 0. We will begin with a surprising lemma
that tells us the distribution of Bn. Then, using (7.6), we will be able to find the distribution of
Mn, from which (7.7) will then tell us the distribution of M∞.
Lemma 7.4.5 For all k = 1, . . . , n + 1, it holds that P[Bn = k] = 1/(n + 1).
Proof: Let A be the event that the first k balls drawn are red, and the next j balls drawn are
black. Then,
P[A] = (1/2)(2/3) · · · (k/(k + 1)) · (1/(k + 2))(2/(k + 3)) · · · (j/(j + k + 1)).    (7.8)
Here, each fraction corresponds (in order) to the probability of getting a red/black ball (as ap-
propriate) on the corresponding draw, given the results of previous draws. For example, 1/2 is the
probability that the first draw results in a red ball, after which the urn contains 2 red balls and
1 black ball, so 2/3 is the probability that the second draw results in a red ball, and so on. From
(7.8) we have P[A] = j!k!/(j + k + 1)!.
Here is the clever part: we note that drawing k red balls and j black balls, in the first j + k
draws but in a different order, would have the same probability. We would simply obtain the
numerators in (7.8) in a different order. There are (j+k choose k) possible different orders (i.e. we
must choose k time-points from j + k times at which to pick the red balls). Hence, the probability
that we draw k red and j black in the first j + k draws is
(j+k choose k) · j!k!/(j + k + 1)! = ((j + k)!/(k!(j + k − k)!)) · (j!k!/(j + k + 1)!) = 1/(j + k + 1).
We set j = n − k, to obtain the probability of drawing (and thus adding) k red balls within the
first j + k = n draws. Hence P[Bn = k + 1] = 1/(n + 1) for all k = 0, 1, . . . , n. Setting k′ = k + 1,
we recover the statement of the lemma.
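Lemma 7.4.5 is easy to sanity-check by simulation. A minimal sketch (ours), reusing the urn simulation idea from Section 4.2:

import random
from collections import Counter

def red_count(n, rng):
    # Number of red balls B_n in the Polya urn after n draws (B_0 = 1).
    red, total = 1, 2
    for _ in range(n):
        if rng.random() < red / total:
            red += 1
        total += 1
    return red

rng = random.Random(1)
counts = Counter(red_count(9, rng) for _ in range(100_000))
for k in sorted(counts):  # Lemma 7.4.5 predicts P[B_9 = k] = 1/10 for k = 1, ..., 10
    print(k, counts[k] / 100_000)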
Proof: [Of Proposition 7.4.3.] Now that we know the distribution of Bn, we can use (7.6) to
find out the distribution of Mn. Lemma 7.4.5 tells us that Bn is uniformly distributed on the set
{1, . . . , n + 1}. Hence Mn is uniform on {1/(n + 2), 2/(n + 2), . . . , (n + 1)/(n + 2)}. This gives
us that, for x ∈ (0, 1),
P[Mn ≤ x] = (1/(n + 1)) ⌊x(n + 2)⌋.
Here ⌊x(n + 2)⌋, which denotes the integer part of x(n + 2), is equal to the number of elements
of {1/(n + 2), 2/(n + 2), . . . , (n + 1)/(n + 2)} that are ≤ x. Since x(n + 2) − 1 ≤ ⌊x(n + 2)⌋ ≤
x(n + 2) we obtain that
(x(n + 2) − 1)/(n + 1) ≤ P[Mn ≤ x] ≤ x(n + 2)/(n + 1)
and the sandwich rule tells us that limn→∞ P[Mn ≤ x] = x. By (7.7), this means that
P[M∞ ≤ x] = x for all x ∈ (0, 1).
Therefore, M∞ has a uniform distribution on (0, 1). This proves Proposition 7.4.3.
We could change the rules of our urn process, to create a new kind of urn – for example whenever
we draw a red ball we could add two new red balls, rather than just one. Urn processes are
typically very ‘sensitive’, meaning that a small change in the process, or its initial conditions,
can have a big effect on the long term behaviour. You can investigate some examples of this in
exercises 7.7, 7.8 and 7.9.
Urn processes are often used in models in which the ‘next step’ of the process depends on
sampling from its current state. Recall that in random walks we have Sn+1 = Sn + Xn+1 , where
Xn+1 is independent of Sn ; so for random walks the next step of the process is independent of its
current state. By contrast, in Pólya’s urn, whether the next ball we add is red (or black) depends
on the current state of the urn; it depends on which colour ball we draw.
For example, we might look at people choosing which restaurant they have dinner at. Consider
two restaurants, say Café Rouge (which is red) and Le Chat Noir (which is black). We think of
customers in Café Rouge as red balls, and customers in Le Chat Noir as black balls. A newly
arriving customer (i.e. a new ball that we add to the urn) choosing which restaurant to visit is
more likely, but not certain, to choose the restaurant which currently has the most customers.
Other examples include the growth of social networks (people tend to befriend people who
already have many friends), machine learning (used in techniques for becoming successively more
confident about partially learned information), and maximal safe dosage estimation in clinical
trials (complicated reasons). Exercise 7.8 features an example of an urn process that is the basis
for modern models of evolution (successful organisms tend to have more descendants, who are
themselves more likely to inherit genes that will make them successful).
Proof: In this case Mn = Zn . So, we have Zn → M∞ almost surely. The offspring distribution
is not deterministic, so for as long as Zn 6= 0 the value of Zn will continue to change over time.
Each such change in value is of magnitude at least 1, hence by Lemma 7.4.1 the only way Zn can
converge is if Zn is eventually zero. Therefore, since Zn does converge, in this case we must have
P[Zn dies out] = 1 (and M∞ = 0).
Sketch of Proof: Recall our graphical representation of the Galton-Watson process in Section
4.3. Here’s the key idea: when µ < 1, on average we expect each individual to have fewer children
than when µ = 1. Therefore, if the Galton-Watson process dies out when µ = 1 (as is the case,
from Lemma 7.4.6), it should also die out when µ < 1.
Our idea is not quite a proof, because the size of the Galton-Watson process Zn is random,
and not necessarily equal to its expected size E[Zn] = µ^n. To make the idea into a proof we will
need to find two Galton-Watson processes, Zn and Z̃n, such that 0 ≤ Zn ≤ Z̃n, with µ < 1 in Zn
and µ = 1 in Z̃n. Then, when Z̃n becomes extinct, so does Zn. This type of argument is known
as a coupling, meaning that we use one stochastic process to control another. See exercise 7.15
for how to do it.
Lemma 7.4.8 Suppose that µ > 1 and σ 2 < ∞. Then P[Zn → ∞] > 0.
Proof: The first step of the argument is to show that, for all n ∈ N,
E[M_{n+1}^2] = E[M_n^2] + σ^2/µ^{n+2}.    (7.10)
You can find this calculation as exercise 7.13. It is similar to the calculation that we did to find
E[Mn ] in Section 4.3, but a bit harder because of the square.
By iterating (7.10) and noting that E[M_0^2] = 1, we obtain that
E[M_{n+1}^2] = 1 + Σ_{i=1}^{n} σ^2/µ^{i+2}.
Hence,
E[M_{n+1}^2] ≤ 1 + (σ^2/µ^2) Σ_{i=1}^{∞} 1/µ^i,
which is finite because µ > 1. Therefore, (Mn ) is uniformly bounded in L2 , and the second
martingale convergence theorem applies. In particular, this gives us that
E[Mn ] → E[M∞ ].
Noting that E[Mn] = 1 for all n, we obtain that E[M∞] = 1. Hence, P[M∞ > 0] > 0. On the
event that M∞ > 0, we have Mn = Zn/µ^n → M∞ > 0. Since µ^n → ∞, this means that when
M∞ > 0 we must also have Zn → ∞. Hence, P[Zn → ∞] > 0.
The behaviour Zn → ∞ is often called ‘explosion’, reflecting the fact that (when it happens)
each generation tends to contain more and more individuals than the previous one. The case of
µ > 1 and σ 2 = ∞ behaves in the same way as Lemma 7.4.8, but the proof is more difficult and
we don’t study it in this course.
The Galton-Watson process can be extended in many ways. For example, we might vary the
offspring distribution used in each generation, or add a mechanism that allows individuals to
produce their children gradually across multiple time steps. The general term for such processes
is ‘branching processes’. Most branching processes display the same type of long-term behaviour
as we have uncovered in this section: if the offspring production is too low (on average) then they
are sure to die out, but, if the offspring production is high enough that dying out is not a certainty
then (instead) there is a chance of explosion to ∞.
Branching processes are the basis of stochastic models in which existing objects reproduce, or
break up, into several new objects. Examples are the spread of contagious disease, crushing rocks,
nuclear fission, tumour growth, and so on. In Chapter 19 (which is part of MAS61023 only) we
will use branching processes to model contagion of unpaid debts in banking networks.
Branching processes are used by biologists to model population growth, and they are good
models for populations that are rapidly increasing in size. (For populations that have reached
a roughly constant size, models based on urn processes, such as that of exercise 7.8, are more
effective.)
Note that we draw the picture as a continuous line, for convenience, but in fact the random walk
is jumping between integer values.
Next, let Sn be the asymmetric random walk, with p > q, so the walk drifts upwards. The
asymmetric random walk also experiences oscillations growing larger and larger, but the drift
upwards is strong enough that, in fact, Sn → ∞ almost surely as n → ∞, so the long-term
picture shows the walk escaping to +∞.
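A short simulation (ours, not part of the notes) makes the contrast between the two walks visible:

import random

def walk_endpoint(n, p, rng):
    # S_n for a random walk with up-probability p.
    return sum(1 if rng.random() < p else -1 for _ in range(n))

rng = random.Random(2)
print([walk_endpoint(10_000, 0.5, rng) for _ in range(5)])  # symmetric: oscillates on the scale of sqrt(n)
print([walk_endpoint(10_000, 0.6, rng) for _ in range(5)])  # asymmetric: close to n(p - q) = 2000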
Random walks can be extended in many ways, such as by adding an environment that influences
the behaviour of the walk (e.g. reflection upwards from the origin as in exercise 4.5, slower
movements when above 0, etc), or by having random walkers that walk around a graph (Z, in
our case above). More complex extensions involve random walks that are influenced by their own
history, for example by having a preference to move into sites that they have not yet visited.
In addition, many physical systems can be modelled by systems of several random walks, that
interact with each other, for example a setup with multiple random walks happening at the same
time (but started from different points in space) in which, whenever random walkers meet, both
of them suddenly vanish.
Random walks are the basis of essentially all stochastic modelling that involves particles moving
around randomly in space. For example, atoms in gases and liquids, animal movement, heat
diffusion, conduction of electricity, transportation networks, internet traffic, and so on.
In Chapters 15 and 16 we will use stock price models that are based on ‘Brownian motion’,
which is itself introduced in Chapter 11 and is the continuous analogue of the symmetric random
walk (i.e. no discontinuities). We’ll come back to the idea of atoms in gases and liquids in Section
11.1, and we’ll also briefly look at heat diffusion in Section 11.3.
7.2 Let Mn be a stochastic process. Let α, β ∈ R and let Cn and Dn be adapted stochastic
processes. Let Xn = αCn + βDn . Show that
(X ◦ M )n = α(C ◦ M )n + β(D ◦ M )n
for all n.
Sn = Σ_{i=1}^{n} Xi,
where we take S0 = 0.
(a) Show that Sn is a martingale, and deduce that there exists a real-valued random variable
S∞ such that Sn → S∞ almost surely as n → ∞.
(b) Show that, almost surely, there exists some N ∈ N such that Xn = 0 for all n ≥ N .
7.4 Write simulations of the symmetric & asymmetric random walks (in a language of your
choice). Add functionality to draw the random walk as a graph, with time on the horizontal
axis and the value of the walk on the vertical axis.
Look at several samples from your simulations, with e.g. 1000 steps of time, and check that
they support the claims made in Section 7.4, about the long-term behaviour of random
walks.
Modify your simulation to simulate the random walks in Exercise 7.3 and Question 2 of
Assignment 3. Check that your graphs support the result that, in both cases, Sn → S∞
almost surely. From your graphs, do you notice a difference in behaviour between these two
cases?
7.5 Let Mn = Sn −Ln be the martingale defined in exercise 4.5. Show that (Mn ) is not uniformly
bounded in L1 .
7.6 Recall the Galton-Watson process (Zn ) from Section 7.4, and recall that it is parametrized
by its offspring distribution G.
(a) Give an example of an offspring distribution G for which P[Zn dies out] = 1.
(b) Give an example of an offspring distribution G for which P[Zn dies out] = 0.
7.7 Consider the following modification of the Pólya urn process. At time n = 0, the urn contains
one red ball and one black ball. Then, at each time n = 1, 2, . . ., we draw a ball from the
urn. We place this ball back into the urn, and add one ball of the opposite colour to the urn;
so if we drew a red ball, we would add a black ball, and vice versa.
Therefore, at time n (which means: after the nth draw is complete) the urn contains n + 2
balls. Let Bn denote the number of red balls in the urn at time n, and let Mn = Bn/(n + 2)
denote the fraction of red balls in the urn at time n.
(a) Calculate E[Mn+1 | Fn ] and hence show that Mn is not a martingale, with respect to
the filtration Fn = σ(B1 , . . . , Bn ).
(b) Write a simulation of the urn (in a language of your choice) and use your simulation to
make a conjecture about the value of the almost sure limit of Mn as n → ∞. Does this
limit depend on the initial state of the urn?
7.8 Consider an urn that, at time n = 0, contains K ≥ 1 balls, each of which is either black or
red. At each time n = 1, 2, . . ., we do the following, in order:
1. Draw a ball X1 from the urn, and record its colour. Place X1 back into the urn.
2. Draw a ball X2 from the urn, and discard it.
3. Place a new ball, with the same colour as X1 , into the urn.
Thus, for all time, the urn contains exactly K balls. We write Mn for the fraction of red
balls in the urn, after the nth iteration of the above steps is complete.
This process is known as the discrete time ‘Moran model’, and is a model for the evolution
of a population that contains a fixed number K of individual organisms – represented as K
balls. At each time n, an individual X2 (is chosen and) dies and an individual X1 (is chosen
and) reproduces.
Although this model is a highly simplified version of reality, with careful enough application it
turns out to be very useful. For example, it is the basis for current methods of reconstructing
genealogical trees from data obtained by genome sequencing.
7.9 Consider the Pólya urn process from Section 7.4. Suppose that we begin our urn, at time
n = 0, with two red balls and one black ball. Let Mn denote the resulting fraction of red
balls in the urn at time n.
(a) Show that Mn does not converge almost surely to 0.
(b) Write a simulation of the Pólya urn process (in a language of your choice) and compare
the effect of different initial conditions on M∞ .
7.10 Let (Sn ) denote the simple asymmetric random walk, with q > p. In Exercise 4.2 we showed
that
Mn = (q/p)^{Sn}
is a martingale.
(a) Show that there exists a real valued random variable M∞ such that Mn → M∞ almost surely.
(b) Deduce that P[M∞ = 0] = 1 and that (Mn) is not uniformly bounded in L2.
(c) Use (a) and (b) to show that Sn → −∞ almost surely. Explain briefly why this means that, for the walk with p > q, we have Sn → ∞ almost surely.
7.11 Let Sn denote the symmetric random walk, from Section 7.4. Recall that S0 = 0.
(a) Show that Sn is even when n is even, and odd when n is odd.
(b) Define pn = P[Sn = 0]. Show that p_{2n} = (2n choose n) 2^{−2n} and p_{2n+1} = 0.
(Hint: Count the number of ways to return to zero after precisely 2n steps.)
(c) Show that p_{2(n+1)} = (1 − 1/(2(n+1))) p_{2n} for all n, and hence show that p_{2n} → 0 as n → ∞.
(Hint: Use the inequality 1 − x ≤ e^{−x}, which holds for all x ≥ 0.)
7.12 Let Sn denote the symmetric random walk, from Section 7.4. Let f : N → N be a deterministic function.
Challenge questions
7.13 In this question we establish the formula (7.10), which we used in the proof of Lemma 7.4.8.
Let (Zn ) be the Galton-Watson process, with the offspring distribution G. Suppose that
E[G] = µ and var(G) = σ^2 < ∞. Set Mn = Zn/µ^n.
7.14 In this question we prove the inequality (7.4), which we used in the proof of the martingale
convergence theorem.
Let Mn be a sequence of random variables such that Mn → M∞ almost surely. Define
Xn = inf_{k≥n} |Mk|.
(a) Explain why (Xn) is an increasing sequence and, hence, why there is a random variable X∞ such that Xn → X∞ almost surely.
(b) Show that, for all ε > 0 and all n ∈ N there exists some n0 ≥ n such that
|Mn0| − ε ≤ Xn ≤ |Mn|.
Define Z̃n by setting Z̃0 = 1, and then using (7.9) with Z̃n in place of Zn and X̃_i^{n+1} in place of X_i^{n+1}. Define f : [0, 1] → R by f(α) = E[X̃_i^{n+1}].
(a) Convince yourself that Z̃n is a Galton-Watson process, with offspring distribution given by (7.11).
(b) Explain briefly why 0 ≤ Zn ≤ Z̃n for all n.
(c) Show that f (0) < 1 and f (1) ≥ 1. Deduce that there exists a value α ∈ [0, 1] such that
f (α) = 1.
(d) Show that P[Zn dies out] = 1.
Chapter 8
In this chapter we develop some further tools for analysing stochastic processes: the dominated
convergence theorem, the optional stopping theorem and the strong Markov property. In chapters
8 and 9 we will write
min(s, t) = s ∧ t, max(s, t) = s ∨ t.
This is common notation in the field of stochastic processes.
Note that this whole chapter is marked with a (∆) – it is for independent study in MAS61023
but it is not part of MAS352.
Theorem 8.1.1 (Dominated Convergence Theorem) Let X and (Xn) be random variables. Suppose that:
1. Xn → X in probability.
2. There exists a random variable Y ∈ L1 such that, for all n, |Xn| ≤ |Y| almost surely.
Then X ∈ L1 and E[Xn] → E[X] as n → ∞.
The random variable Y is often known as the dominating function or dominating random variable.
You can find a proof of the theorem in MAS31002/61022.
Example 8.1.2 Let Z ∼ N(µ, σ^2). Let X ∈ L1 be any random variable and let
Xn = X + Z/n.
We can think of Xn as a noisy measurement of the random variable X, where the noise term Z/n becomes smaller as n → ∞.
Let us check the first condition of the theorem. Note that |Xn − X| = |Z|/n, which tends to zero as n → ∞ because |Z| < ∞. Hence Xn → X almost surely as n → ∞, which by Lemma 6.1.2 implies Xn → X in probability.
Let us now check the second condition, with Y = |X| + |Z|. Then E[Y] = E[|X|] + E[|Z|], which is finite since X ∈ L1 and Z ∈ L1. Hence, Y ∈ L1. We have |Xn| ≤ |X| + (1/n)|Z| ≤ Y.
Therefore, we can apply the dominated convergence theorem and deduce that E[Xn ] → E[X]
as n → ∞. Of course, we can also calculate E[Xn] = E[X] + µ/n, and check that the result really is true.
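The convergence in Example 8.1.2 is also easy to check by simulation. Below is a minimal sketch in R (the language used for the simulation exercises in these notes); the distributions of X and Z here are illustrative assumptions, chosen so that E[X] = 1 and µ = 2.

# Monte Carlo check of Example 8.1.2: X_n = X + Z/n, so E[X_n] -> E[X].
# Illustrative assumptions: X ~ Exp(1), so E[X] = 1, and Z ~ N(2, 1), so mu = 2.
set.seed(1)
N <- 100000                              # number of samples per estimate
X <- rexp(N, rate = 1)
Z <- rnorm(N, mean = 2, sd = 1)
for (n in c(1, 10, 100, 1000)) {
  cat("n =", n, " estimated E[X_n] =", mean(X + Z / n), "\n")
}
# The estimates approach E[X] = 1 at rate mu/n, exactly as the example predicts.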
In the above example, we could calculate E[Xn] = E[X] + µ/n and notice that E[Xn] → E[X],
without using any theorems at all. The real power of the dominated convergence theorem is
in situations where we don’t specify, or can’t easily calculate, the means of Xn or X . See, for
example, exercises 8.1 and 8.2, or the applications in Section 8.2.
If our sequence of random variables (Xn) has |Xn| ≤ c for some deterministic constant c, then the dominating function can be taken to be (the deterministic random variable) c. This is a very common application; see e.g. exercise 8.1.
Remark 8.1.3 The dominated convergence theorem holds for conditional expectation too; that is, with E[·] replaced by E[· | G]. We won't need this result as part of our course.
Definition 8.2.1 A map T : Ω → {0, 1, 2, . . . , ∞} is called a (Fn ) stopping time if, for all n, the
event {T ≤ n} is Fn measurable.
A stopping time is a random time with the property that, using only the information in Fn that is available to us at time n, we can decide whether or not T has already happened.
It is straightforward to check that T is a stopping time if and only if {T = n} is Fn measurable
for all n. This is left for you as exercise 8.4.
Example 8.2.2 Let Sn = ∑_{i=1}^n Xi be the simple symmetric random walk, with P[Xi = 1] = P[Xi = −1] = 1/2, and let a ∈ Z. Then
T = inf{n ≥ 0 ; Sn = a},
which is the first time Sn takes the value a, is a stopping time. It is commonly called the 'hitting time' of a. To see that T is a stopping time we note that
{T ≤ n} = {for some i ≤ n we have Si = a} = ⋃_{i=0}^n {Si = a} = ⋃_{i=0}^n Si^{−1}(a),
and each Si^{−1}(a) = {Si = a} is an element of Fn (because Si is Fn measurable for i ≤ n), hence {T ≤ n} ∈ Fn.
You will have seen hitting times before – for most of you, in MAS2003 in the context of Markov
chains. You might also recall that the random walk Sn can be represented as a Markov chain on Z. You will probably have looked at a method for calculating expected hitting times by solving
systems of linear equations. This approach is feasible if the state space of a Markov chain is finite;
however Z is infinite. In this section we look at an alternative method, based on martingales.
Recall the notation ∧ and ∨ that we introduced at the start of this chapter. That is, min(s, t) =
s ∧ t and max(s, t) = s ∨ t.
Lemma 8.2.3 Let S and T be stopping times with respect to the filtration (Fn ). Then S ∧ T is
also a (Fn ) stopping time.
M^T_n = M_{n∧T}.
Here a ∧ b denotes the minimum of a and b. To be precise, this means that M^T_n(ω) = M_{n∧T(ω)}(ω) for all ω ∈ Ω. In Example 8.2.2, S^T would be the random walk S which is stopped (i.e. never moves again) when (if!) it reaches state a.
The last equality holds because the sum is telescoping (i.e. the middle terms all cancel each other
out). Hence, by Theorem 7.1.1, if M is a martingale (resp. submartingale, supermartingale),
C ◦ M is also a martingale (resp. submartingale, supermartingale).
Theorem 8.2.5 (Doob's Optional Stopping Theorem) Let M be a martingale (resp. submartingale, supermartingale) and let T be a stopping time. Then
E[MT] = E[M0] (resp. E[MT] ≥ E[M0], E[MT] ≤ E[M0]),
provided that at least one of the following conditions holds:
(a) T is bounded.
(b) P[T < ∞] = 1 and there exists c ∈ R such that |Mn | ≤ c for all n.
(c) E[T ] < ∞ and there exists c ∈ R such that |Mn − Mn−1 | ≤ c for all n.
Proof: We’ll prove this for the supermartingale case. The submartingale case then follows by
considering −M , and the martingale case follows since martingales are both supermartingales and
submartingales.
Note that
E[Mn∧T − M0 ] ≤ 0, (8.1)
because M T is a supermartingale, by Lemma 8.2.4. For (a), we take n = supω T (ω) and the
conclusion follows.
For (b), we use the dominated convergence theorem to let n → ∞ in (8.1). As n → ∞, almost
surely n ∧ T (ω) is eventually equal to T (ω) (because P[T < ∞] = 1), so Mn∧T → MT almost
surely. Since M is bounded, Mn∧T and MT are also bounded. So E[Mn∧T ] → E[MT ] and taking
limits in (8.1) obtains E[MT − M0 ] ≤ 0, which in turn implies that E[MT ] ≤ E[M0 ].
For (c), we will also use the dominated convergence theorem to let n → ∞ in (8.1), but now
we need a different way to check its conditions. We observe that
|M_{n∧T} − M0| = |∑_{k=1}^{n∧T} (Mk − M_{k−1})| ≤ T sup_{n∈N} |Mn − M_{n−1}|.
Since E[T (sup_n |Mn − M_{n−1}|)] ≤ cE[T] < ∞, we can use the dominated convergence theorem to let n → ∞, and the result follows as in (b).
These three sets of conditions (a)-(c) for the optional stopping theorem are listed on the formula
sheet, in Appendix B. Sometimes none of them apply! See Remark 9.3.5 and the lemma above it
for a warning example.
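To see why some condition is genuinely needed, consider the simple symmetric random walk (Sn) and T = inf{n ; Sn = 1}: then E[ST] = 1 ≠ 0 = E[S0]. The following minimal sketch in R (the sample sizes are illustrative assumptions) shows the stopped martingale keeping mean zero for each fixed n, even though ST = 1 almost surely.

# For the simple symmetric random walk and T = inf{n : S_n = 1}, we have
# E[S_{n ^ T}] = 0 for every fixed n, yet S_T = 1 almost surely. The failure
# of E[S_T] = E[S_0] is exactly what conditions (a)-(c) guard against.
set.seed(2)
stopped.value <- function(n) {
  s <- 0
  for (i in 1:n) {
    if (s == 1) return(1)        # T has already happened; the walk is stopped
    s <- s + sample(c(-1, 1), 1)
  }
  s                              # value of S_{n ^ T}
}
mean(replicate(10000, stopped.value(100)))  # close to 0 = E[S_0], far from 1 = E[S_T]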
Definition 8.3.1 Let (Fn ) be a filtration and let T be a stopping time. Define
FT = {A ∈ F ; A ∩ {T ≤ n} ∈ Fn for all n}.
FT is often known as the σ-field obtained by stopping (Fn) at time T. If (Fn) is the generated filtration of (Xn), then FT holds the information obtained from (X1, X2, . . . , XT). We
will often need to use our intuition when dealing with this concept, but let us formally prove that
FT is a σ -field.
Lemma 8.3.2 Let (Fn) be a filtration and let T be a stopping time. Then FT is a σ-field.
Proof: We must check the three conditions of Definition 2.1.1. Firstly, for all n ∈ N we have
∅∩{T ≤ n} = ∅ ∈ Fn , so ∅ ∈ FT . Secondly, if A ∈ FT then A∩{T ≤ n} ∈ Fn , and {T ≤ n} ∈ Fn
because T is a stopping time. Hence
(Ω \ A) ∩ {T ≤ n} = {T ≤ n} \ (A ∩ {T ≤ n}) ∈ Fn,
so Ω \ A ∈ FT. Thirdly, if Am ∈ FT for all m ∈ N then (⋃_{m=1}^∞ Am) ∩ {T ≤ n} = ⋃_{m=1}^∞ (Am ∩ {T ≤ n}) ∈ Fn, so ⋃_{m=1}^∞ Am ∈ FT.
The key point is this: suppose that we are interested in the distribution of (Xm , Xm+1 , Xm+2 , . . .).
This might depend on the value of Xm , but if we know the value of Xm then values of X1 , . . . , Xm−1
(alongside any other information in Fm ) provide us with no extra information. Heuristically, the
future of (Xm ) might depend on its current value, but if we know that current value then we can
ignore the past.
The Markov property is very natural. We might even argue that reality itself, with its gen-
erated σ -field, has the Markov property, but this is a philosophical question best discussed else-
where. Most stochastic processes that are used in modelling are Markov processes. It is also
natural to replace the time m by a stopping time T . The strong Markov property states that
(XT , XT +1 , XT +2 , . . .) has the same distribution given the information FT as it would do if all
we knew was σ(XT ).
We will need the strong Markov property for random walks within Chapter 9. However, for
random walks we can say something even stronger: after a stopping time T, a random walk
essentially restarts from its current location, and from then on behaves just like the same random
walk, with its future movements being independent of its past.
We need some notation to state this precisely. Let Sn = ∑_{i=1}^n Xi where the (Xi) are independent identically distributed random variables; this covers all the examples of random walks that we will study in Chapter 9.
Theorem 8.4.1 Let T be a stopping time. The following two processes have the same distribution:
Moreover, in either case, the distribution of (ST , ST +1 , ST +2 , . . .) is that of a random walk begun
at ST and incremented by an independent copy of Xi at each step of time.
More formally, Theorem 8.4.1 states that for all n ∈ N and any function f : R → R we have
where (S′n) is an independent copy of (Sn), with identical distribution. Strictly, this equation
only holds provided that f (·) ∈ L1 in each case, but we will always take f to be a nice enough
function that this is not a problem.
It is often more useful to apply the strong Markov property in words, and this is common
practice in probability theory. We’ll do it that way in Chapter 9. We won’t include a proof of
Theorem 8.4.1 in this course. It isn’t hard to prove, but it is a bit tedious, and hopefully you can
believe the result without needing a proof.
The next result may appear surprising at first. The key point is that the Fn are assumed to be
independent, which means that they have no information in common. Consequently (8.2) implies
that T contains no information.
Theorem 8.5.1 (Kolmogorov’s 0-1 law) Let (Fn ) be a sequence of independent σ -fields and
let T be the associated tail σ -field. If A ∈ T then P[A] = 0 or P[A] = 1.
Proof: Let A ∈ T . Then A ∈ σ(Fn+1 , Fn+2 , . . .) for all n ∈ N. This means that A is
independent of σ(F1 , . . . , Fn ), for all n. It follows that A is independent of σ(Fn ; n ∈ N).
However, from (8.2) we have that T ⊆ σ(Fn ; n ∈ N), which means that A is independent of
T . Hence A is independent of A (this is not a typo!), which means that P[A] = P[A ∩ A] =
P[A]P[A] = P[A]^2. The only solutions of the equation x^2 = x are 0 and 1, hence P[A] = 0 or P[A] = 1.
Remark 8.5.2 Let us assume the same independence assumption as Theorem 8.5.1, and suppose
that X ∈ mT . Then {X ≤ x} ∈ T for all x ∈ R, so Theorem 8.5.1 gives that P[X ≤ x] is either
0 or 1 for each x ∈ R. A bit of analysis shows that if we set c = inf{x ∈ R ; P[X ≤ x] = 1} then
in fact P[X = c] = 1. Therefore, any random variable that is T measurable is almost surely equal
to a constant.
Xn = X 1{|X| ≤ n}, that is, Xn = X if |X| ≤ n and Xn = 0 otherwise.
8.3 Let Z be a random variable taking values in [1, ∞) and for n ∈ N define
Xn = Z 1{Z ∈ [n, n+1)}, that is, Xn = Z if Z ∈ [n, n + 1) and Xn = 0 otherwise. (8.3)
(a) Suppose that Z ∈ L1 . Use the dominated convergence theorem to show that E[Xn ] → 0
as n → ∞.
(b) Suppose, instead, that Z is a continuous random variable with probability density function f(x) = x^{−2} for x ≥ 1 (and f(x) = 0 otherwise), and define Xn using (8.3). Show that Z is not in L1, but that E[Xn] → 0.
(c) Comment on what part (b) tells us about the dominated convergence theorem.
8.5 Let (Fn ) be a filtration and let (Sn ) be an adapted process, with values in Z and initial
value S0 = 0. Show that R = inf{n ≥ 1 ; Sn = 0} is a stopping time.
Note: R is known as the first return time of (Sn ) to 0.
8.6 Let S and T be stopping times with respect to the same filtration. Show that S + T is also
a stopping time.
8.7 Let (Fn ) be a filtration. Let S and T be stopping times with S ≤ T . Show that FS ⊆ FT .
8.8 Recall the Pólya urn process from Section 4.2. Recall that Bn is the number of black balls
in the urn at time n, and that Mn = Bn/(n + 2) is the fraction of black balls in the urn at time n.
Let T be the first time at which a black ball is drawn from the urn.
8.9 This question is a continuation of exercise 7.8 – recall the urn process from that exercise.
Let T be the first time at which the urn contains only balls of one colour, and suppose that
initially the urn contains r red balls and b black balls. Recall that Mn denotes the fraction
of red balls in the urn at time n.
8.10 Let m ∈ N and m ≥ 2. At time n = 0, an urn contains 2m balls, of which m are red and
m are blue. At each time n = 1, . . . , 2m we draw a single ball from the urn; we discard
this ball and do not replace it. We continue until the urn is empty. Therefore, at time
n ∈ {0, . . . , 2m} the urn contains 2m − n balls.
Let Nn denote the number of red balls remaining in the urn at time n. For n = 0, . . . , 2m−1
let
Pn = Nn/(2m − n)
be the fraction of red balls remaining after time n.
(a) Show that Pn is a martingale, with respect to a natural filtration that you should specify.
(b) [Challenge question] Let T be the first time at which we draw a red ball. Note that
a (T + 1)st ball will be drawn, because the urn initially contains at least two red balls.
Show that the probability that the (T + 1)st ball is red is 1/2.
Chapter 9
In Section 7.4 we studied the long-term behaviour of the Pólya urn and the Galton-Watson process,
by taking a limit as the time parameter tended to infinity of a suitably chosen martingale. The
following result shows that these techniques do not (or at least, not directly) explain how random
walks might behave. In this chapter we will focus on the case of simple random walks, which
means that they move precisely one unit of space, upwards or downwards, in each step of time.
Lemma 9.0.1 Let Sn = ∑_{i=1}^n Xi where P[Xi = 1] = p and P[Xi = −1] = 1 − p, and the (Xi)
are independent of each other. Then (Sn ) does not converge almost surely to a real valued random
variable and (Sn ) is not uniformly bounded in L1 .
Proof: The process (Sn ) is integer valued. From Lemma 7.4.1 we have that, if (Sn ) were
to converge almost surely, it would have to eventually become constant. By definition we have
|Sn+1 − Sn| = 1, so this cannot happen, hence (Sn) does not converge almost surely.
In the case p = 1/2 we have that (Sn) is a martingale. If (Sn) was also uniformly bounded in L1
then the martingale convergence theorem would imply almost sure convergence, so from what we
have already proved (Sn) is not uniformly bounded in L1. For the case p ≠ 1/2, we have shown in (4.3) that Mn = Sn − n(p − q), where q = 1 − p, is a martingale, hence E[Sn] = n(p − q) and therefore |E[Sn]| → ∞.
By the absolute values property of expectations |E[Sn ]| ≤ E[|Sn |], hence E[|Sn |] → ∞, so (Sn ) is
not uniformly bounded in L1 .
In fact, as we mentioned briefly (and without proof) in Section 7.4, random walks exhibit a
mixture of oscillatory behaviour and divergence to ±∞. We will study this behaviour here, as
well as considering some other closely related properties of random walks. The techniques we will
use are primarily those developed in Chapter 8.
This chapter will make use of some of the exercises from earlier chapters, including within the
proofs. Make sure that you study these too, when they appear, if you have not done so already.
As a reminder: the solutions to all of the end-of-chapter exercises can be found within the online
version of the lecture notes.
Set Fn = σ(X1 , . . . , Xn ) and note that (Fn ) is the natural filtration of Sn . Recall that we showed
in Section 4.1 that the stochastic process Sn − n(p − q) was a martingale. In Exercise 4.2 we
found another martingale associated to (Sn ), in particular we showed that
Mn = (q/p)^{Sn}
is a martingale.
Our plan is to use the optional stopping theorem, applied to the martingale (Mn ), to obtain
information about the hitting times of the asymmetric random walk. Let Ta = inf{n : Sn = a}
and T = Ta ∧ Tb for integer a < 0 < b. We aim to calculate P[T = Ta ] and P[T = Tb ]. We can
show that Ta is a stopping time by noting that
{Ta ≤ n} = ⋃_{i=0}^n {Si ≤ a}.
Similarly, Tb is a stopping time and it follows from Lemma 8.2.3 that T is also a stopping time.
We now look to apply the optional stopping theorem, using the (b) conditions. For this, we’ll
need the following lemma.
Lemma 9.1.1 In the setting above, we have E[T] < ∞; in particular P[T < ∞] = 1.
Proof: Let m = b − a. Divide up the random variables (Xn) into consecutive blocks A1, A2, . . . of length m:
A1 = (X1, . . . , Xm), A2 = (Xm+1, . . . , X2m), A3 = (X2m+1, . . . , X3m), and so on. (9.1)
For each k, let Ek be the event that every step in block Ak is upwards; since m = b − a, if the walk has not already left [a, b] then m consecutive upward steps force it to do so.
We can think of the sequence of events E1 , E2 , . . . as one chance after another that our random
walker has to exit [a, b].
By independence, P[Ek] = p^m. Hence, the random variable K = inf{k ∈ N ; Ek occurs} is a geometric random variable with success parameter p^m. This means that K < ∞ almost surely and that E[K] = p^{−m} < ∞. By definition of K, the event EK occurs, so T ≤ Km and by monotonicity of E we have E[T] ≤ mE[K] < ∞.
From above, we have that M is a martingale. Hence by Lemma 8.2.4, M T is also a martingale.
By definition of T, we have a ≤ S_{n∧T} ≤ b, hence (q/p)^a ≤ M^T_n ≤ (q/p)^b for all n, hence M^T is a bounded martingale. Lemma 9.1.1 implies that P[T < ∞] = 1, so we
have that condition (b) of the optional stopping theorem holds for the martingale M T and the
stopping time T . Therefore,
E[M^T_T] = E[M^T_0],
but M^T_T = M_{T∧T} = MT and M^T_0 = M_{0∧T} = M0 = 1. So we have
E[MT ] = 1.
Our next aim is to calculate the probabilities P[T = Ta ] and P[T = Tb ]. That is, we want to know
which of the two boundaries a and b we actually hit at time T (for example, {T = Ta } = {ST = a}
is the event that we hit a at time T ).
Since P[T < ∞] = 1, we must hit one or other boundary, so we have that
P[T = Ta] + P[T = Tb] = 1. (9.2)
Moreover,
1 = E[MT] (9.3)
= E[MT 1{T = Ta}] + E[MT 1{T = Tb}]
= E[(q/p)^a 1{T = Ta}] + E[(q/p)^b 1{T = Tb}]
= (q/p)^a P[T = Ta] + (q/p)^b P[T = Tb]. (9.4)
Solving the linear equations (9.2) and (9.4), recalling that p ≠ q, gives that
P[T = Ta] = ((q/p)^b − 1) / ((q/p)^b − (q/p)^a) (9.5)
and therefore also
P[T = Tb] = 1 − P[T = Ta] = (1 − (q/p)^a) / ((q/p)^b − (q/p)^a). (9.6)
Note that this formula does not make sense if p = q = 1/2. In that case our two linear equations above are in fact the same equation, so we cannot solve them to find P[T = Ta] and P[T = Tb].
However, all is not lost: the case of the symmetric random walk can be handled in similar style to
above, but we need a different martingale in place of (Mn ). This is left for you to do, with some
hints along the way, in Exercise 9.2.
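Formulae (9.5) and (9.6) are easy to test by simulation. Below is a minimal sketch in R; the values of p, a and b are illustrative assumptions.

# Estimate P[T = T_a] for the asymmetric walk and compare with (9.5).
set.seed(4)
p <- 0.4; q <- 1 - p; a <- -3; b <- 5    # note q > p, as in this section
hits.a.first <- function() {
  s <- 0
  repeat {
    s <- s + sample(c(1, -1), 1, prob = c(p, q))
    if (s == a) return(TRUE)
    if (s == b) return(FALSE)
  }
}
estimated <- mean(replicate(10000, hits.a.first()))
exact <- ((q / p)^b - 1) / ((q / p)^b - (q / p)^a)
c(estimated = estimated, exact = exact)  # the two values should agree closely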
is a random walk, can sometimes be expressed using formulae involving binomial coefficients. Stirling's approximation,
n! ∼ √(2πn) (n/e)^n, (9.7)
is helpful to calculate the limits of such quantities as n → ∞.
It is also helpful to introduce some notation from analysis that you may not have seen before:
if (an ) and (bn ) are real valued sequences then we write
an ∼ bn to mean that lim_{n→∞} an/bn = 1.
We will use this notation throughout the remainder of Chapter 9.
Let (Sn) denote the simple symmetric random walk. For k ∈ Z let
Tk = inf{n ≥ 0 ; Sn = k}
be the hitting time of k, and let
R = inf{n ≥ 1 ; Sn = 0}
be the first return time to zero. We set R = ∞ if the walk does not return to zero, in keeping
with the convention that inf ∅ = ∞. In Example 8.2.2 we showed that Tk is a stopping time, and
in Exercise 8.5 we showed that R is a stopping time.
Lemma 9.3.1 We have P[R < ∞] = 1.
Proof: Note that P[R < ∞] ∈ [0, 1]. We will argue by contradiction. Suppose that P[R <
∞] = p < 1. By the strong Markov property, the process (SR+n )n≥0 has the same distribution as
(Sn ). In words, each time the random walk returns to zero the process restarts, and from then on
acts like a simple symmetric random walk, independent of the past. Let G = ∑_{n=1}^∞ 1{Sn = 0}, which in words counts the number of times (Sn) returns to zero. By the strong Markov property, applied repeatedly at each return, G has a geometric distribution, P[G = j] = p^j (1 − p) for all j ∈ {0, 1, . . .}. In particular, E[G] < ∞.
We will now calculate E[G] using a different method. We have
E[G] = ∑_{n=1}^∞ E[1{Sn = 0}] = ∑_{n=1}^∞ P[Sn = 0]. (9.8)
In the first equality above we have used Exercise 6.8 to swap the infinite sum and expectation.
Calculating P[Sn = 0] is Exercise 7.11; it is zero when n is odd, which leaves us only to consider
P[S_{2n} = 0] = (2n choose n) 2^{−2n} = ((2n)! / (n!)^2) 2^{−2n} ∼ (√(4πn) (2n/e)^{2n} / (2πn (n/e)^{2n})) 2^{−2n} = 1/√(πn). (9.9)
In the above we have used Stirling's approximation, in particular equation (9.7), to simplify the binomial coefficients as n → ∞. From (9.9) we obtain that √(πn) P[S_{2n} = 0] → 1 as n → ∞. Hence there exists some N ∈ N such that for all n ≥ N we have P[S_{2n} = 0] ≥ 1/(2√(πn)). Putting this
back into (9.8) we have
E[G] ≥ ∑_{n=N}^∞ P[S_{2n} = 0] ≥ ∑_{n=N}^∞ 1/(2√(πn)) = ∞.
We had deduced above that E[G] was finite, so we have reached a contradiction. Hence in fact
P[R < ∞] = 1.
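The asymptotic (9.9) is also easy to check numerically. A minimal sketch in R, with n and the sample size as illustrative assumptions:

# Check (9.9): P[S_{2n} = 0] is approximately 1/sqrt(pi * n).
set.seed(5)
n <- 50
endpoints <- replicate(100000, sum(sample(c(-1, 1), 2 * n, replace = TRUE)))
c(estimated = mean(endpoints == 0),
  exact = choose(2 * n, n) * 2^(-2 * n),
  stirling = 1 / sqrt(pi * n))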
Lemma 9.3.2 For all k ∈ Z we have P[Tk < ∞] = 1.
Proof: Note that T0 = 0, because S0 = 0. We next consider T1. From Lemma 9.3.1 we have P[R < ∞] = 1, hence by the strong Markov property (applied at successive returns to zero) we have that (Sn) visits zero infinitely many times.
After each visit to zero, by the strong Markov property, the next move of the walk will be to 1 with probability 1/2, and to −1 with probability 1/2. Each time, the move will be independent of the past. Hence, we need to wait a Geometric(1/2) number of visits before we see a move to 1. Since this only requires finitely many attempts, we have P[T1 < ∞] = 1.
Let us now consider Tk for k ∈ N. To reach k we need to move from 0 to 1, then from 1 to
2, and so on until k . By the strong Markov property applied successively at T1 , then T2 , then T3
and so on, each such move takes the same number of steps (in distribution) as an independent copy of T1. Hence the distribution of Tk is that of the sum of k i.i.d. copies of T1. Since P[T1 < ∞] = 1 it
follows immediately that P[Tk < ∞] = 1.
Lastly, we consider k < 0. By symmetry of the walk about zero, Tk and T−k have the same
distribution, so P[T−k < ∞] = P[Tk < ∞] = 1.
Theorem 9.3.3 Almost surely, for all k ∈ Z there are infinitely many n ∈ N such that Sn = k .
Proof: Since there are only countably many k ∈ Z, it suffices to prove the result for some fixed
(but arbitrary) k ∈ Z. From Lemma 9.3.2 we have P[Tk < ∞] = 1, so almost surely the walk
reaches k , in fact STk = k . By the strong Markov property, after time Tk the walk continues to
behave like a simple symmetric random walk, independent of its past. Lemma 9.3.1 gives that
P[R < ∞] = 1, meaning that the walk will almost surely return (again) to k . Repeating this
argument, we obtain that almost surely the walk visits k infinitely many times.
How long does the simple symmetric random walk take to reach 1?
Let us focus for a moment on the time T1 . Even though we now know that P[T1 < ∞] = 1, we are
not able to apply the optional stopping theorem to (Sn ) and T1 , because (Sn ) is unbounded and
we do not know if E[T1 ] < ∞. In fact, in this case E[T1 ] = ∞ and the optional stopping theorem
does not apply. Instead, we can argue by contradiction and use the optional stopping theorem to
deduce that E[T1 ] = ∞, as follows.
Lemma 9.3.4 We have E[T1] = ∞.
Proof: Suppose that E[T1] < ∞. Then we could apply the optional stopping theorem to (Sn) and T1 using condition (c). This would give E[S0] = 0 = E[S_{T1}]. But, from Lemma 9.3.2 we
have P[T1 < ∞] = 1 which means, by definition of T1 , that ST1 = 1 almost surely. This is a
contradiction.
Remark 9.3.5 More generally, we have to be very careful about using the optional stopping
theorem. The conditions fail in many situations, and there are many examples of stopping times
T and martingales (Mn ) for which E[MT ] 6= E[M0 ].
From Lemmas 9.3.2 and 9.3.4 we have that P[T1 < ∞] = 1 but E[T1] = ∞. This tells us that
although the time T1 is always finite, it can be very large. You might find this counter-intuitive
at first glance since it is clear that P[T1 = 1] = 1/2 by just considering the first step, but the point
is that once the walk has (by chance) made a few steps downwards, it starts to take a very long
time to find its way back up again. In the next lemma, with a careful calculation we will find out
the exact distribution of T1 . Note that T1 is odd, as the value of (Sn ) is odd/even depending on
whether n is odd/even.
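Simulation makes the combination of P[T1 < ∞] = 1 with E[T1] = ∞ quite vivid. A minimal sketch in R; the cap on the walk length is an illustrative assumption, needed because excursions below zero can be extremely long.

# Sample T_1 for the simple symmetric random walk. The median is tiny, but the
# sample mean is dominated by rare, enormous values and never settles down.
set.seed(6)
sample.T1 <- function(cap = 100000) {
  s <- 0; n <- 0
  while (s < 1 && n < cap) {
    s <- s + sample(c(-1, 1), 1)
    n <- n + 1
  }
  n
}
T1 <- replicate(1000, sample.T1())
c(median = median(T1), mean = mean(T1))  # the mean keeps growing as cap and sample size grow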
From the above calculation, and using the symmetry of the random walk about zero, we have that
P[T1 ≤ 2n − 1] = 2P[S_{2n} > 0] = P[S_{2n} > 0] + P[S_{2n} < 0] = 1 − P[S_{2n} = 0]. (9.11)
The second line of the above follows from (9.11), and the last line follows from a short calculation
that is left for you. The last claim of the lemma follows from the same calculation as in (9.9).
The calculation that led to (9.10) is known as a ‘reflection principle’. It applies the strong
Markov property at Tk (with k = 1, in this case), and then divides the future into two cases based
on whether the first move is upwards or downwards. The distributions of the walk in these two cases are mirror images of one another, i.e. reflections about height k.
P[Xi = 1] = p > 1/2 and P[Xi = −1] = 1 − p < 1/2. The case p < 1/2 can be handled by considering
(−Sn ) in place of (Sn ). Let R = inf{n ≥ 1 ; Sn = 0} be the first return time of (Sn ) to zero. As
before, we set R = ∞ if the walk does not return to zero.
Like the symmetric random walk, the asymmetric case sees oscillations, but they are dominated by the drift upwards, so much so that (Sn) might not return to the origin even once.
Lemma 9.4.1 We have Sn → ∞ almost surely, and P[R < ∞] < 1.
Proof: We will first prove that Sn → ∞ almost surely. In fact, we have already proved this using a martingale convergence argument in Exercise 7.10. We'll give a different proof here, using the
strong law of large numbers. We have Sn = ∑_{i=1}^n Xi, where the Xi are i.i.d. random variables, so the strong law of large numbers implies that Sn/n → E[X1] almost surely as n → ∞. Note that E[X1] = p(1) + (1 − p)(−1) = 2p − 1. It follows that there exists (a random variable) N ∈ N such that for all n ≥ N we have Sn/n ≥ (2p − 1)/2, which implies Sn ≥ n(2p − 1)/2. Since 2p − 1 > 0 we therefore have Sn → ∞ almost surely. In particular this means that P[(Sn) visits zero infinitely many times] = 0.
If P[R < ∞] was equal to 1 then, by applying the strong Markov property at each successive
return time to zero, we would have that almost surely (Sn ) visited the origin infinitely many
times. We have shown above that this is not the case. Hence P[R < ∞] < 1.
Remark 9.4.2 By the strong Markov property and the same argument as in the proof of Lemma
9.3.1 (except that there it was part of a proof by contradiction) the number of times that (Sn )
returns to the origin is a Geometric(p̂) random variable, where p̂ = P[R < ∞].
If we take Tk to be the hitting time of k ∈ Z, then it follows immediately from Lemma 9.4.1
that P[Tk < ∞] = 1 for k ≥ 0 and P[Tk < ∞] < 1 for k < 0. In the symmetric case Lemma 9.3.4
showed that E[T1 ] = ∞, but here in the asymmetric case the opposite is true.
Lemma 9.4.3 We have E[T1] = 1/(2p − 1).
Proof: We calculate
E[T1] = E[T1 1{S1 = 1}] + E[T1 1{S1 = −1}]
= p + E[E[T1 1{S1 = −1} | F1]]
= p + E[1{S1 = −1} E[T1 | F1]]
= p + E[1{S1 = −1} E[1 + T1′ + T1′′ | F1]]
= p + E[1{S1 = −1}](1 + 2E[T1])
= p + (1 − p)(1 + 2E[T1]).
In the first line of the above we partition on the value of S1 . The first term of the second
line follows because if S1 = 1 then T1 = 1, and the second term uses the relationship between
expectation and conditional expectation. The third line follows because S1 ∈ F1 . The fourth line
uses that, on the event S1 = −1, T1 is equal to 1 (accounting for the first move 0 ↦ −1) plus two independent copies (T1′ and T1′′) of T1 (accounting for the time to move from −1 to 0 and then from 0 to 1). The strong Markov property gives that T1′ and T1′′ are independent of F1, leading to the
fifth line. The remainder of the calculation is straightforward. Rearranging to isolate E[T1 ], we
obtain the required result.
Remark 9.4.4 Using a similar argument to that used within the proof of Lemma 9.3.2, for k ∈ N we have that Tk is the sum of k i.i.d. copies of T1, hence E[Tk] = k/(2p − 1).
Note that when p ≈ 1 we have E[T1] ≈ 1, which makes sense since when p is close to 1 it is very likely that the first step of the walk will be upwards. As p ↓ 1/2 we have E[T1] → ∞, which also makes sense because as p gets close to 1/2 we would expect the asymmetric walk to look more and more like the symmetric case, which has E[T1] = ∞.
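A quick check of Lemma 9.4.3 in R; the value of p and the sample size are illustrative assumptions.

# For p > 1/2, estimate E[T_1] and compare with 1/(2p - 1).
set.seed(7)
p <- 0.7
sample.T1 <- function() {
  s <- 0; n <- 0
  while (s < 1) {
    s <- s + sample(c(1, -1), 1, prob = c(p, 1 - p))
    n <- n + 1
  }
  n
}
c(estimated = mean(replicate(10000, sample.T1())), exact = 1 / (2 * p - 1))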
where P[Xi = (0, 1)] = P[Xi = (0, −1)] = P[Xi = (1, 0)] = P[Xi = (−1, 0)] = 1/4. These four
possibilities correspond to movements by precisely one unit of space in the four possible directions:
north, south, east and west.
Let R = inf{n ≥ 1 ; Sn = 0} denote the time taken to return to the origin 0 = (0, 0). In two
dimensions we have:
Lemma 9.5.1 If d = 2 then P[R < ∞] = 1.
Proof: We may use much the same argument as in Lemma 9.3.1. The only difference is that
we must replace the estimate (9.9) with a two dimensional version, as follows. We have
P[S_{2n} = (0, 0)] = ∑_{i=0}^n (2n choose 2i)(2i choose i)(2n − 2i choose n − i) 4^{−2n}
= (2n choose n) 4^{−2n} ∑_{k=0}^n (n choose k)^2
= ((2n choose n) 2^{−2n})^2
∼ 1/(πn). (9.12)
In the above, the first line comes from counting the number of possible ways in which the two
dimensional walk can start at 0 and returns to 0 after exactly 2n steps (we did the one dimensional
version in Exercise 7.11). To do so the walk must have made the same number of steps north as
it did south, and the same number of steps east as it did west. We write 2i for the number of
east-west steps and 2j = 2n − 2i for the number of north-south steps. The first binomial factor
counts which steps are east-west and which are north-south. The second and third, respectively,
count which of the east-west steps are to the east, and which of the north-south steps are to the
north – and that fixes everything else about the path.
The second line of the above calculation follows easily from the first, and you should write
the factorials out in full to check this for yourself. The third line follows from the second by the
combinatorial identity ∑_{k=0}^n (n choose k)^2 = (2n choose n). The final line follows from the same calculation as in the right hand side of (9.9).
With (9.12) in place of (9.9), the rest of the argument for this lemma follows in the same way
as that of Lemma 9.3.1. We’ll omit writing out the details again.
We won’t prove it here, but the simple symmetric random walk will take much longer to return
to the origin in d = 2 than in d = 1. This is because in order to reach (0, 0) both the east-west component S_n^{(1)} and the north-south component S_n^{(2)} must return simultaneously to zero, and they are unlikely to do this at the same time. Perhaps surprisingly, in two dimensions the random walk
still manages to return. The picture changes if we move to three dimensions, with the obvious notation: now Sn ∈ Z^3 and there are 6 different directions to move in, each with probability 1/6.
In three dimensions there is now so much space to get lost in that the random walk might not
return to the origin.
Lemma 9.5.2 If d = 3 then P[R < ∞] < 1.
Sketch of Proof: We will only give a sketch of the argument. We need to make some
substantial modifications to the argument used in Lemmas 9.3.1 and 9.5.1, because we are now
trying to prove the opposite result, but there are still parts that overlap. In similar style to (9.12)
we have
P[S_{2n} = (0, 0, 0)] = ∑_{i+j+k=n} ((2n)! / ((i!)^2 (j!)^2 (k!)^2)) 6^{−2n}
= (2n choose n) 2^{−2n} ∑_{i+j+k=n} ((n! / (i!j!k!)) 3^{−n})^2
≤ (2n choose n) 2^{−2n} (max_{i+j+k=n} (n! / (i!j!k!)) 3^{−n}) ∑_{i+j+k=n} (n! / (i!j!k!)) 3^{−n}.
The first line again concerns counting possible ways to return to the origin in exactly 2n steps (we'll omit the details of this) and the next two lines follow by elementary calculations. We have that ∑_{i+j+k=n} (n! / (i!j!k!)) 3^{−n} = 1, because this sums up all the probabilities of outcomes (i, j, k) that sum to n according to the trinomial distribution (which you might have to look up, if you haven't seen it before). A bit of calculus with Lagrange multipliers shows that the term (n! / (i!j!k!)) 3^{−n}, subject to the condition i + j + k = n, is maximised when i, j, k are all approximately equal to n/3. Approximating n/3 by the nearest integer, we can find that max_{i+j+k=n} (n! / (i!j!k!)) 3^{−n} ≤ C/n, where C ∈ (0, ∞) is a constant. Combining this with the calculation from (9.9) we obtain that P[S_{2n} = 0] ≤ C′/n^{3/2}, where C′ ∈ (0, ∞) is also constant.
Taking G = ∑_{n=1}^∞ 1{S_{2n} = 0} as in the proof of Lemma 9.3.1, we have
E[G] = ∑_{n=1}^∞ P[S_{2n} = 0] ≤ ∑_{n=1}^∞ C′/n^{3/2} < ∞.
Hence the expected number of visits to the origin is finite. By the strong Markov property, if
P[R < ∞] = 1 then we would almost surely have infinitely many returns to the origin. Hence
P[R < ∞] < 1.
Lemma 9.5.2 also holds in all dimensions d ≥ 3. To see this, note that the first three coordinates
of a d-dimensional random walk give us a three dimensional random walk. If the d-dimensional
walk visits the origin, so do its first three coordinates, but Lemma 9.5.2 gives that this has
probability less than one.
It is natural to ask what the random walk does do, in three dimensions and higher, as n → ∞. The answer is easy to guess: it disappears off to infinity and does not come back, i.e. |Sn| → ∞ almost surely.
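The contrast with one and two dimensions can be seen in simulation. A minimal sketch in R; the finite horizon and sample size are illustrative assumptions (within a finite horizon we can only underestimate the true return probability, which Lemma 9.5.2 tells us is strictly less than one).

# Estimate the probability that the 3d walk returns to the origin within a
# fixed horizon; the estimate stays well below 1, unlike in d = 1 and d = 2.
set.seed(8)
steps <- rbind(c(1,0,0), c(-1,0,0), c(0,1,0), c(0,-1,0), c(0,0,1), c(0,0,-1))
returns <- function(horizon = 5000) {
  s <- c(0, 0, 0)
  for (n in 1:horizon) {
    s <- s + steps[sample(6, 1), ]
    if (all(s == 0)) return(TRUE)
  }
  FALSE
}
mean(replicate(200, returns()))  # roughly a third, bounded away from 1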
9.5 For the simple symmetric random walk, in Lemma 9.3.6 we showed that P[T1 = 2n − 1] ∼ 1/(2n√(πn)). Use this fact to give a second proof (alongside that of Lemma 9.3.4) that E[T1] = ∞.
9.6 Let (Sn ) denote the simple symmetric random walk and let Tm = inf{n ≥ 0 ; Sn = m} be
the first hitting time of m ∈ Z. Let
M_n^{(θ)} = e^{θSn} / (cosh θ)^n
where θ ∈ R.
(b) Check that none of the conditions (a)-(c) of the optional stopping theorem apply to the martingale (M_n^{(θ)}) at the stopping time Tm.
9.8 Let (Sn) denote the three dimensional simple symmetric random walk, as defined in Section 9.5. Let G = ∑_{n=0}^∞ 1{S_{2n} = 0} denote the total number of visits to the origin. Let R = inf{n ≥ 1 ; Sn = 0} denote the first return time to the origin.
9.9 Let (Sn) denote the three dimensional simple symmetric random walk, as defined in Section 9.5. Prove that |Sn| → ∞ almost surely as n → ∞.
Appendix A
Chapter 1
1.1 At time 1 we would have 10(1 + r) in cash and 5 units of stock. This gives the value of our assets at time
1 as 10(1 + r) + 5S1 .
1.2 (a) Assuming we don’t buy or sell anything at time 0, the value of our portfolio at time 1 will be
x(1 + r) + yS1 , where S1 is as in the solution to 1.1. Since we need to be certain of paying off
our debt, we should assume a worst case scenario for S1 , that is S1 = sd. So, we are certain to pay
off our debt if and only if
x(1 + r) + ysd ≥ K.
(b) Since certainty requires that we cover the worst case scenario of our stock performance, and d < 1 + r,
our best strategy is to exchange all our assets for cash at time t = 0. This gives us x + ys in cash
at time 0. At time 1 we would then have (1 + r)(x + ys) in cash and could pay our debt
providing that
(1 + r)(x + ys) ≥ K.
1.3 (a) We borrow s cash at time t = 0 and use it to buy one stock. Then wait until time t = 1, at which point we will own a stock worth S1, where S1 is as in 1.1. We sell our stock, which gives us S1 in cash. Since S1 ≥ sd > s(1 + r), we repay our debt (plus interest) which costs us s(1 + r). We then have
S1 − s(1 + r) > 0
s(1 + r) − S1 > 0
Hence, Y has the distribution function of N (0, 1), which means Y ∼ N (0, 1).
1.7 For p ≠ 1,
∫_1^∞ x^{−p} dx = [x^{−p+1}/(−p + 1)]_{x=1}^∞
and we obtain a finite answer only if −p + 1 < 0. When p = 1, we have
∫_1^∞ x^{−1} dx = [log x]_{x=1}^∞ = ∞,
1/2 + (1/3 + 1/4) + (1/5 + 1/6 + 1/7 + 1/8) + (1/9 + 1/10 + 1/11 + 1/12 + 1/13 + 1/14 + 1/15 + 1/16) + . . . ,
in which each bracketed group of terms sums to at least 1/2.
Chapter 2
2.1 The result of rolling a pair of dice can be written ω = (ω1 , ω2 ) where ω1 , ω2 ∈ {1, 2, 3, 4, 5, 6}. So a suitable
Ω would be n o
Ω = (ω1 , ω2 ) ; ω1 , ω2 ∈ {1, 2, 3, 4, 5, 6} .
Of course other choices are possible too. Since our choice of Ω is finite, a suitable σ-field is F = P(Ω).
2.2 (a) Consider the case of F. We need to check the three properties in the definition of a σ-field.
1. We have ∅ ∈ F so the first property holds automatically.
2. For the second we check complements: Ω \ ∅ = Ω, Ω \ {1} = {2, 3}, Ω \ {2, 3} = {1} and Ω \ Ω = ∅; in all cases we obtain an element of F.
3. For the third, we check unions. We have {1} ∪ {2, 3} = Ω. Including ∅ into a union doesn’t
change anything; including Ω into a union results in Ω. This covers all possible cases, in each
case resulting in an element of F.
So, F is a σ-field. Since F′ is just F with the roles of 1 and 2 swapped, by symmetry F′ is also a σ-field.
(b) We have {1} ∈ F ∪ F′ and {2} ∈ F ∪ F′, but {1} ∪ {2} = {1, 2} and {1, 2} ∉ F ∪ F′. Hence F ∪ F′ is not closed under unions; it fails property 3 of the definition of a σ-field.
However, F ∩ F′ is the intersection of two σ-fields, so is automatically itself a σ-field. (Alternatively, note that F ∩ F′ = {∅, Ω}, and check the definition.)
2.3 (a) Let Am be the event that the sequence (Xn ) contains precisely m heads. Let Am,k be the event that
we see precisely m heads during the first k tosses and, from then on, only tails. Then P[Am,k] = 0, because the probability of seeing only tails from toss k + 1 onwards is lim_{j→∞} 2^{−j} = 0. Since Am = ⋃_k Am,k, we have P[Am] = 0, and hence P[⋃_m Am] ≤ ∑_m P[Am] = 0.
That is, the probability of having only finitely many heads is zero. Hence, almost surely, the sequence
(Xn ) contains infinitely many heads.
By symmetry (exchange the roles of heads and tails) almost surely the sequence (Xn ) also contains
infinitely many tails.
2.4 (a) F1 contains the information of whether a 6 was rolled.
(b) F2 gives the information of which value was rolled if a 1, 2 or 3 was rolled, but if the roll was in {4, 5, 6} then F2 cannot tell the difference between these values.
(c) F3 contains the information of which of the sets {1, 2}, {3, 4} and {5, 6} contains the value rolled, but
gives no more information than that.
2.5 (a) The values taken by X1 are
X1(ω) = 0 if ω = 3, X1(ω) = 1 if ω = 2, 4, and X1(ω) = 4 if ω = 1, 5.
Hence, the pre-images are X1^{−1}(0) = {3}, X1^{−1}(1) = {2, 4} and X1^{−1}(4) = {1, 5}. These are all elements of G1, so X1 ∈ mG1. However, they are not all elements of G2 (in fact, none of them are!) so X1 ∉ mG2.
(b) Define X2(ω) = 0 if ω = 1, 2, and X2(ω) = 1 otherwise. Then the pre-images are X2^{−1}(0) = {1, 2} and X2^{−1}(1) = {3, 4, 5}. Hence X2 ∈ mG2.
It isn't possible to use unions/intersections/complements to make the set {3, 4, 5} out of {1, 5}, {2, 4}, {3} (one way to see this is to note that as soon as we tried to include 5 we would also have to include 1, meaning that we can't make {3, 4, 5}). Hence X2 ∉ mG1.
2.6 We have X(T T ) = 0, X(HT ) = X(T H) = 1 and X(HH) = 2. This gives
n o
σ(X) = ∅, {T T }, {T H, HT }, {HH}, {T T, T H, HT }, {T T, HH}, {T H, HT, HH}, Ω .
To work this out, you could note that X^{−1}(i) must be in σ(X) for i = 0, 1, 2, then also include ∅ and Ω, then keep adding unions and complements of the events until you find you have added them all. (Of course,
this only works because σ(X) is finite!)
2.7 Since sums, divisions and products of random variables are random variables, X^2 + 1 is a random variable. Since X^2 + 1 is non-zero, we have that 1/(X^2 + 1) is a random variable, and using products again, X/(X^2 + 1) is a random variable.
For sin(X), recall that for any x ∈ R we have
sin(x) = ∑_{n=0}^∞ ((−1)^n / (2n + 1)!) x^{2n+1}.
Since X is a random variable, so is ((−1)^n / (2n + 1)!) X^{2n+1}. Since limits of random variables are also random variables, so is
sin X = lim_{N→∞} ∑_{n=0}^N ((−1)^n / (2n + 1)!) X^{2n+1}.
2.8 Since X ≥ 0 we have E[X^p] = E[|X|^p] for all p. We have
E[X] = ∫_1^∞ x · 2x^{−3} dx = 2 ∫_1^∞ x^{−2} dx = 2 [−x^{−1}]_{x=1}^∞ = 2 < ∞
so X ∈ L1, but
E[X^2] = ∫_1^∞ x^2 · 2x^{−3} dx = 2 ∫_1^∞ x^{−1} dx = 2 [log x]_{x=1}^∞ = ∞
so X ∉ L2.
2.9 Let 1 ≤ p ≤ q < ∞ and X ∈ Lq . Our plan is to generalize the argument that we used to prove (2.7).
First, we condition to see that
|X|^p = |X|^p 1{|X| ≤ 1} + |X|^p 1{|X| > 1}.
Hence,
E[|X|^p] = E[|X|^p 1{|X| ≤ 1} + |X|^p 1{|X| > 1}]
≤ E[1 + |X|^q]
= 1 + E[|X|^q] < ∞.
Here we use that, if |X| ≤ 1 then |X|^p ≤ 1, and if |X| > 1 then since p ≤ q we have |X|^p < |X|^q. So X ∈ Lp.
2.10 First, we condition to see that
X = X 1{X<a} + X 1{X≥a} . (A.1)
From (A.1) since X ≥ 0 we have
X ≥ X 1{X≥a}
≥ a1{X ≥ a}.
This second inequality follows by looking at two cases: if X < a then both sides are zero, but if X ≥ a then
we can use that X ≥ a. Using monotonicity of E, we then have
E[X] ≥ E[a1{X≥a} ] = aE[1{X≥a} ] = aP[X ≥ a].
Dividing through by a finishes the proof.
This result is known as Markov’s inequality.
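A one-line numerical sanity check of Markov's inequality, in R; the choice X ∼ Exp(1) and a = 2 is an illustrative assumption.

# Markov's inequality: P[X >= a] <= E[X]/a, checked by Monte Carlo.
set.seed(9)
X <- rexp(100000, rate = 1)
c(lhs = mean(X >= 2), rhs = mean(X) / 2)  # lhs (about 0.135) is below rhs (about 0.5)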
Chapter 3
3.1 We have S2 = X1 + X2 so
E[S2 | σ(X1)] = E[X1 | σ(X1)] + E[X2 | σ(X1)] = X1 + E[X2] = X1.
Here, we use the linearity of conditional expectation, followed by the fact that X1 is σ(X1) measurable and
X2 is independent of σ(X1 ) with E[X2 ] = 0.
Secondly, S2^2 = X1^2 + 2X1X2 + X2^2 so
E[S2^2 | σ(X1)] = E[X1^2 | σ(X1)] + 2E[X1X2 | σ(X1)] + E[X2^2 | σ(X1)]
= X1^2 + 2X1 E[X2 | σ(X1)] + E[X2^2]
= X1^2 + 1.
Here, we again use linearity of conditional expectation to deduce the first line. To deduce the second line,
we use that X12 and X1 are σ(X1 ) measurable (using the taking out what is known rule for the middle
term), whereas X2 is independent of σ(X1 ). The final line comes from E[X2 | σ(X1 )] = 0 (which we already
knew from above) and that E[X22 ] = 1.
3.2 For i ≤ n, we have Xi ∈ mFn , so Sn = X1 + X2 + . . . + Xn is also Fn measurable. For each i we have
|Xi| ≤ 2 so |Sn| ≤ 2n, hence Sn ∈ L1. Lastly,
E[Sn+1 | Fn] = E[Sn | Fn] + E[Xn+1 | Fn]
= Sn + E[Xn+1]
= Sn.
Here, we use linearity of conditional expectation in the first line. To deduce the second, we use that Sn ∈ mFn (since Xi ∈ mFn for i ≤ n) and that Xn+1 is independent of Fn. For the final line we note that E[Xn+1] = (1/3)(2) + (2/3)(−1) = 0.
3.3 (a) Since Xi ∈ mFn for all i ≤ n, we have Mn = X1X2 . . . Xn ∈ mFn. Since |Xi| ≤ c for all i, we have |Mn| ≤ c^n < ∞, so Mn is bounded and hence Mn ∈ L1 for all n. Lastly,
E[Mn+1 | Fn] = E[Mn Xn+1 | Fn]
= Mn E[Xn+1 | Fn]
= Mn E[Xn+1]
= Mn.
Here, to deduce the second line we use that Xi ∈ mFn for i ≤ n. To deduce the third line we use that Xn+1 is independent of Fn and then to deduce the fourth line we use that E[Xn+1] = 1.
(b) By definition of conditional expectation (Theorem 3.1.1), we have Mn ∈ L1 and Mn ∈ mGn for all n.
It remains only to check that
E[Mn+1 | Gn] = E[E[Mn+1 | Fn] | Gn] = E[Mn | Gn] = Mn.
Here, to get from the first to the second line we use the tower property.
3.4 Note that the first and second conditions in Definition 3.3.3 are the same for super/sub-martingales as for
martingales. Hence, M satisfies these conditions. For the third condition, we have that E[Mn+1 | Fn ] ≤
Mn (because M is a supermartingale) and E[Mn+1 | Fn] ≥ Mn (because M is a submartingale). Hence
E[Mn+1 | Fn ] = Mn , so M is a martingale.
3.5 (a) If m = n then E[Mn | Fn] = Mn because Mn is Fn measurable. For m > n, we have
E[Mm | Fn] = E[E[Mm | Fm−1] | Fn] = E[Mm−1 | Fn].
Here we use the tower property to deduce the first equality (note that m − 1 ≥ n so Fm−1 ⊇ Fn) and the fact that (Mn) is a martingale to deduce the second equality (i.e. E[Mm | Fm−1] = Mm−1).
Iterating, from m, m − 1 . . . , n + 1 we obtain that
E[Mm | Fn ] = E[Mn+1 | Fn ]
and the martingale property then gives that this is equal to Mn , as required.
(b) If (Mn ) is a submartingale then E[Mm | Fn ] ≥ Mn , whereas if (Mn ) is a supermartingale then
E[Mm | Fn ] ≤ Mn .
3.6 We have (Mn+1 − Mn)^2 = Mn+1^2 − 2Mn+1Mn + Mn^2 so
E[(Mn+1 − Mn)^2 | Fn] = E[Mn+1^2 | Fn] − 2E[Mn+1Mn | Fn] + E[Mn^2 | Fn]
= E[Mn+1^2 | Fn] − 2Mn E[Mn+1 | Fn] + Mn^2
= E[Mn+1^2 | Fn] − 2Mn^2 + Mn^2
= E[Mn+1^2 | Fn] − Mn^2
as required. Here, we use the taking out what is known rule (since Mn ∈ mFn ) and the martingale property
of (Mn ).
It follows that E[Mn+1^2 | Fn] − Mn^2 = E[(Mn+1 − Mn)^2 | Fn] ≥ 0. We have Mn^2 ∈ mFn and since Mn ∈ L2 we have Mn^2 ∈ L1. Hence (Mn^2) is a submartingale.
3.7 Using that E[Xn+1 | Fn] = aXn + bXn−1, we look for α such that Mn = αXn + Xn−1 is a martingale. We can calculate
E[Mn+1 | Fn] = E[αXn+1 + Xn | Fn] = α(aXn + bXn−1) + Xn = (αa + 1)Xn + αbXn−1,
which equals Mn = αXn + Xn−1 provided that
αa + 1 = α and αb = 1.
Hence, α = 1/b is our choice. It is then easy to check that
Here we use the linearity of conditional expectation and the fact that Sn is σ(Sn) measurable. Hence,
α = E[X1 | σ(Sn)] = Sn/n.
Chapter 4
4.1 Since Zn = e^{Sn} and Sn ∈ mFn, also Zn ∈ mFn. Since |Sn| ≤ n, we have |e^{Sn}| ≤ e^n < ∞, so Zn is bounded and hence Zn ∈ L1. Hence also Mn ∈ mFn and Mn ∈ L1.
Lastly,
E[Mn+1 | Fn] = Mn and E[Zn+1 | Fn] ≥ Zn,
where to deduce the last line we use that (e + e^{−1})/2 > 1. Thus Mn is a martingale and Zn is a submartingale.
4.2 Since Xi ∈ mFn for all i ≤ n, we have Mn ∈ mFn. We have |Xi| ≤ 1 so |Sn| ≤ n and |Mn| ≤ max{(q/p)^n, (q/p)^{−n}} = (q/p)^{−n} (note p > q), which implies that Sn ∈ L1 and Mn ∈ L1 for all n.
We have
E[Sn+1 | Fn] = E[Sn + Xn+1 | Fn] = Sn + E[Xn+1] = Sn + (p − q).
Here we use the taking out what is known rule, followed by the fact that Xn+1 is independent of Fn and
the relationship between conditional expectation and independence. To deduce the final step we use that
E[Xn+1 ] = p(1) + q(−1) = p − q > 0. Therefore, we have E[Sn+1 | Fn ] ≥ Sn , which means that Sn is a
submartingale.
We now look at Mn. We have
E[Mn+1 | Fn] = E[(q/p)^{Sn+1} | Fn] = (q/p)^{Sn} E[(q/p)^{Xn+1} | Fn] = Mn E[(q/p)^{Xn+1}] = Mn.
Here we use the taking out what is known rule, followed by the fact that Xn+1 is independent of Fn and
the relationship between conditional expectation and independence. To deduce the final step we use that
E[(q/p)^{Xn+1}] = p(q/p)^{1} + q(q/p)^{−1} = q + p = 1.
4.3 Since Xi ∈ mFn for all i ≤ n we have Sn ∈ mFn where Fn = σ(X1, . . . , Xn). Further, if we set m =
max(|a|, |b|) then |Sn | ≤ nm so Sn ∈ L1 .
We have
4.4 Clearly Sn ∈ mFn (recall that, by default, we take (Fn) to be the generated filtration of (Sn)), hence Sn^2 ∈ mFn and Sn^2 − n ∈ mFn. We have |Sn| ≤ n so E[|Sn^2 − n|] ≤ E[Sn^2] + E[n] = 2n by the triangle law and monotonicity of expectation, hence Sn^2 − n ∈ L1. Similarly Sn ∈ L2.
We have
E[Sn+1^2 − (n + 1) | Fn] = E[(Xn+1 + Sn)^2 − (n + 1) | Fn]
= E[Xn+1^2 | Fn] + 2E[Sn Xn+1 | Fn] + E[Sn^2 | Fn] − (n + 1)
= E[Xn+1^2] + 2Sn E[Xn+1] + Sn^2 − (n + 1)
= 1 + 2Sn(0) + Sn^2 − (n + 1)
= Sn^2 − n.
Here, the second line follows by linearity. The third line follows by the taking out what is known rule and the independence rule, using that Xn+1 is independent of Fn, whilst Sn ∈ mFn. The fourth line follows because Xn+1^2 = 1 and E[Xn+1] = (1/2)(1) + (1/2)(−1) = 0. Hence Mn = Sn^2 − n is a martingale.
Rearranging, we obtain E[Sn+1^2 | Fn] = Sn^2 + 1 ≥ Sn^2. Hence (Sn^2) is a submartingale.
4.5 The natural filtration is Fn = σ(X1, . . . , Xn). We can write
Sn+1 = Sn + Xn+1 1{Sn > 0} + 1{Sn = 0}. (A.2)
Hence, if Sn ∈ mFn then Sn+1 ∈ mFn+1. Since S1 = X1 ∈ mF1, a trivial induction shows that Sn ∈ mFn for all n. Hence, Ln ∈ mFn and also Sn − Ln ∈ mFn.
We have −n ≤ Sn − Ln ≤ n + 1 (because the walk is reflected at zero and can increase by at most 1 in each time step) so Sn − Ln is bounded and hence Sn − Ln ∈ L1.
Lastly, using (A.2) we have
E[Sn+1 | Fn] = Sn + 1{Sn = 0}.
Here we use linearity and the taking out what is known (Sn ∈ mFn) rule, as well as that Xn+1 is independent of Fn. Hence,
E[Sn+1 − Ln+1 | Fn] = E[Sn+1 − ∑_{i=0}^n 1{Si = 0} | Fn]
= Sn + 1{Sn = 0} − ∑_{i=0}^n 1{Si = 0}
= Sn − ∑_{i=0}^{n−1} 1{Si = 0}
= Sn − Ln.
Here we use (A.2) to deduce the second line, as well as taking out what is known. Hence, Sn − Ln is a
martingale.
4.6 Let Bn be the number of red balls in the urn at time n (i.e. after the nth draw is complete). Then, the proportion of red balls in the urn just after the nth step is
Mn = Bn/(n + 3).
Essentially the same argument as for the two colour urn of Section 4.2 shows that Mn is adapted to the
natural filtration and also in L1 . We have
E[Mn+1 | Fn] = E[Mn+1 1{(n+1)th draw is red} + Mn+1 1{(n+1)th draw is not red} | Fn]
= E[((Bn + 1)/(n + 4)) 1{(n+1)th draw is red} + (Bn/(n + 4)) 1{(n+1)th draw is not red} | Fn]
= ((Bn + 1)/(n + 4)) E[1{(n+1)th draw is red} | Fn] + (Bn/(n + 4)) E[1{(n+1)th draw is not red} | Fn]
= ((Bn + 1)/(n + 4)) (Bn/(n + 3)) + (Bn/(n + 4)) (1 − Bn/(n + 3))
= Bn(n + 4) / ((n + 3)(n + 4))
= Mn.
Hence (Mn ) is a martingale.
4.7 By the result of 3.6, Sn2 is a submartingale.
(i) Mn = Sn^2 + n is Fn measurable (because Sn ∈ mFn) and since |Sn| ≤ n we have |Mn| ≤ n^2 + n < ∞, so Mn ∈ L1. Further, using the submartingale property of Sn^2,
E[Sn+1^2 + n + 1 | Fn] = E[Sn+1^2 | Fn] + n + 1 ≥ Sn^2 + n
so Mn is a submartingale. But E[M1] = 2 ≠ 0 = E[M0], so (by Lemma 3.3.6) we have that (Mn) is
not a martingale.
(ii) We have shown in Section 4.1 that (Sn) is a martingale, and in Exercise 4.4 that Mn = Sn^2 − n is a martingale. Hence Sn^2 + Sn − n ∈ mFn and E[|Sn^2 + Sn − n|] ≤ E[|Sn^2 − n|] + E[|Sn|] < ∞ by the triangle law, so also Sn^2 + Sn − n ∈ L1. Also,
E[Sn+1^2 + Sn+1 − (n + 1) | Fn] = E[Sn+1^2 − (n + 1) | Fn] + E[Sn+1 | Fn]
= Sn^2 − n + Sn
as required. To deduce the second line we use the martingale properties of (Sn ) and (Mn ). As all
martingales are submartingales, this case is also a submartingale.
More generally, the sum of two martingales is always a martingale. The same is true for submartingales
and also for supermartingales.
(iii) Mn = Sn/n is Fn measurable (because Sn ∈ mFn) and since |Sn| ≤ n we have |Mn| ≤ 1, so Mn ∈ L1. Since Sn is a martingale, we have
E[Sn+1/(n + 1) | Fn] = (1/(n + 1)) E[Sn+1 | Fn] = Sn/(n + 1) ≠ Sn/n.
Hence Mn = Sn/n is not a martingale. Also, since Sn may be both positive or negative, we do not have Sn/(n + 1) ≥ Sn/n, so (Mn) is also not a submartingale.
4.8 We showed that Sn^2 − n was a martingale in 4.7. We try to mimic that calculation and find out what goes wrong in the cubic case. So, we look at
E[(Sn+1 − Sn)^3 | Fn] = E[Sn+1^3 | Fn] − 3Sn E[Sn+1^2 | Fn] + 3Sn^2 E[Sn+1 | Fn] − E[Sn^3 | Fn]
= E[Sn+1^3 | Fn] − 3Sn(Sn^2 + 1) + 3Sn^2 Sn − Sn^3
= E[Sn+1^3 | Fn] − Sn^3 − 3Sn.
Here we use linearity, taking out what is known, and the fact that E[Sn+1^2 | Fn] = Sn^2 + 1 (from 4.7). However, also Sn+1 − Sn = Xn+1, so
E[(Sn+1 − Sn)^3 | Fn] = E[Xn+1^3 | Fn] = E[Xn+1^3] = 0,
because Xn+1 is independent of Fn and Xn+1^3 = Xn+1 (it takes values only 1 or −1), so E[Xn+1^3] = 0. Putting these two calculations together, we have
E[Sn+1^3 | Fn] = Sn^3 + 3Sn. (A.3)
Suppose (aiming for a contradiction) that there is a deterministic function f : N → R such that Sn^3 − f(n) is a martingale. Then
E[Sn+1^3 − f(n + 1) | Fn] = Sn^3 − f(n).
Combining the above equation with (A.3) gives
f (n + 1) − f (n) = 3Sn
but this is impossible because Sn is not deterministic. Thus we reach a contradiction; no such function f
exists.
Chapter 5
5.1 The value of the portfolio (1, 3) at time 1 will be 1(1 + r) + 3S1, where S1 is as in 1.1.
x(1 + r) + ysu = 1
x(1 + r) + ysd = 1.
x(1 + r) + ysu = 3
x(1 + r) + ysd = 1.
which has solution x = (1/(1 + r))(3 − 2su/(su − sd)) and y = 2/(s(u − d)).
Hence, the value of this contingent claim at time 0 is
x + sy = (1/(1 + r))(3 − 2su/(su − sd)) + 2/(u − d).
5.3 (a) If we buy two units of stock, at time 1, for a price K then our contingent claim is Φ(S1 ) = 2S1 − 2K.
(b) In a European put option, with strike price K ∈ (sd, su), we have the option to sell a single unit of stock for strike price K. It is advantageous to us to do so only if K > S1, so
Φ(S1) = 0 if S1 = su, and Φ(S1) = K − S1 if S1 = sd; that is, Φ(S1) = max(K − S1, 0). (A.4)
(c) When S1 = su we sell a unit of stock for strike price K and otherwise do nothing. So our contingent claim is
Φ(S1) = K − S1 if S1 = su, and Φ(S1) = 0 if S1 = sd.
(d) Holding both the contracts in (b) and (c) at once means that, in both S1 = su and S1 = sd, we end up selling a single unit of stock for a fixed price K. Our contingent claim for doing so is Φ(S1) = K − S1.
5.4 (a) The contingent claims for the call and put options respectively are max(S1 − K, 0) from (5.6) and
max(K − S1 , 0) from (A.4). Using the risk neutral valuation formula from Proposition 5.2.6 we have
Π_0^{call} = (1/(1 + r)) E^Q[max(S1 − K, 0)],   Π_0^{put} = (1/(1 + r)) E^Q[max(K − S1, 0)].
(b) Hence,
Π_0^{call} − Π_0^{put} = (1/(1 + r)) E^Q[max(S1 − K, 0) − max(K − S1, 0)]
= (1/(1 + r)) E^Q[max(S1 − K, 0) + min(S1 − K, 0)]
= (1/(1 + r)) E^Q[S1 − K]
= (1/(1 + r)) E^Q[S1] − K/(1 + r).
Since E^Q[S1] = s(1 + r) under the risk-neutral measure, this gives
Π_0^{call} − Π_0^{put} = s − K/(1 + r).
5.5 In the binomial model, the contingent claim of a European call option with strike price K is Φ(ST) = max(ST − K, 0).
5.6 A tree-like diagram of the possible changes in stock price, with the value of the contingent claim Φ(ST ) =
max(K − S1 , 0) written on for the final time step looks like
qu = ((1 + 0.25) − 0.5) / (2.0 − 0.5) = 0.5,   qd = 1 − qu = 0.5.
Using Proposition 5.2.6 recursively, as in Section 5.6, we can put in the value of the European put option
at all nodes back to time 0, giving
The hedging portfolios are found by solving linear equations of the form x + dSy = Φ̃(dS), x + uSy = Φ̃(uS), where Φ̃ is the contingent claim and S the initial stock price of the one period model associated to the given node. The value of the contingent claim at each node is then inferred from the hedging portfolio as V = x + Sy.
5.8 A tree-like diagram of the possible changes in stock price, with the value of the option at all nodes of the
tree shown in (green) square boxes, and hedging portfolios h = (x, y) written (in red) for each node of the
tree:
The hedging portfolios are found by solving linear equations of the form x + dSy = Φ̃(dS), x + uSy = Φ̃(uS), where Φ̃ is the contingent claim and S the initial stock price of the one period model associated to the given node. The value of the contingent claim at each node is then inferred from the hedging portfolio as V = x + Sy.
The hedging portfolio is the same at each node. This is because the contingent claim of our call option
always exceeds zero i.e. we are certain that at time T = 2 we would want to exercise our call option and
buy a single unit of stock for a price K = 60. Since there is no interest payable on cash, this means that
we can replicate our option (at all times) by holding a single unit of stock, and −60 in cash.
5.9 We have that (St) is adapted (see Section 5.4) and, since sd^t ≤ St ≤ su^t, we have also that St ∈ L1. Hence, St is a martingale under P if and only if E^P[St+1 | Ft] = St. We have
E^P[St+1 | Ft] = E^P[Zt+1 St | Ft]
= St E^P[Zt+1 | Ft]
= St E^P[Zt+1],
so (St) is a martingale under P if and only if E^P[Zt+1] = pu u + pd d = 1.
Hence,
E^P[Mt+1 | Ft] = log S0 + ∑_{i=1}^t log(Zi) + E^P[log(Zt+1) | Ft]
= Mt + E^P[log Zt+1]
= Mt + pu log u + pd log d.
Here we use taking out what is known, since S0 and Zi ∈ mFt for i ≤ t, and also that Zt+1 is independent
of Ft . Hence, Mt is a martingale under P if and only if pu log u + pd log d = 0.
Chapter 6
6.1 Since Xn ≥ 0 we have
E[|Xn − 0|] = E[Xn] = (1/2) 2^{−n} = 2^{−(n+1)},
which tends to 0 as n → ∞. Hence Xn → 0 in L1. Lemma 6.1.2 then implies that also Xn → 0 in probability and in distribution. Also, 0 ≤ Xn ≤ 2^{−n} and 2^{−n} → 0, so by the sandwich rule we have Xn → 0 as n → ∞, and hence P[Xn → 0] = 1, which means that Xn → 0 almost surely.
6.2 (a) We have Xn → X in L1, and hence
|E[Xn] − E[X]| = |E[Xn − X]| ≤ E[|Xn − X|] → 0.
Here we use the linearity and absolute value properties of expectation. Hence, E[Xn] → E[X].
(b) Let X1 be a random variable such that P[X1 = 1] = P[X1 = −1] = 1/2 and note that E[X1] = 0. Set Xn = X1 for all n ≥ 2, which implies that E[Xn] = 0 → 0 as n → ∞. But E[|Xn − 0|] = E[|Xn|] = 1, so Xn does not converge to 0 in L1.
6.3 We have Xn = X when U = 2, and |Xn − X| = 1/n when U = 0, 1. Hence, |Xn − X| ≤ 1/n and by monotonicity of E we have
\[
E[|X_n - X|] \le \frac{1}{n} \to 0 \quad\text{as } n \to \infty,
\]
which means that Xn → X in L1.
We can write
\[
\begin{aligned}
P[X_n \to X] &= P[X_n \to X,\, U = 0] + P[X_n \to X,\, U = 1] + P[X_n \to X,\, U = 2] \\
&= P[1 + \tfrac{1}{n} \to 1,\, U = 0] + P[1 - \tfrac{1}{n} \to 1,\, U = 1] + P[0 \to 0,\, U = 2] \\
&= P[U = 0] + P[U = 1] + P[U = 2] \\
&= 1.
\end{aligned}
\]
Hence, Xn → X almost surely.
6.4 We have
\[
P[Y \le x] = P[X_n \le x] = \begin{cases} 0 & \text{if } x < 0, \\ \tfrac{1}{2} & \text{if } x \in [0, 1), \\ 1 & \text{if } x \ge 1. \end{cases}
\]
6.5 Let (Xn) be the sequence of random variables from 6.1. Define Yn = X1 + X2 + . . . + Xn.
(a) For each ω ∈ Ω, we have Yn+1(ω) = Yn(ω) + Xn+1(ω) ≥ Yn(ω), since Xn+1 ≥ 0. Hence, (Yn(ω)) is an increasing sequence. Since Xn(ω) ≤ 2^{−n} for all n we have
\[
|Y_n(\omega)| \le 2^{-1} + 2^{-2} + \ldots + 2^{-n} \le \sum_{i=1}^{\infty} 2^{-i} = 1.
\]
6.6 (a) We have both P[Xn ≤ x] → P[X ≤ x] and P[Xn ≤ x] → P[Y ≤ x], so by uniqueness of limits for real
sequences, we have P[X ≤ x] = P[Y ≤ x] for all x ∈ R. Hence, X and Y have the same distribution
(i.e. they have the same distribution functions FX (x) = FY (x)).
(b) By definition of convergence in probability, for any a > 0, for any ε > 0 there exists N ∈ N such that, for all n ≥ N,
\[
P[|X_n - X| > a] < \epsilon \quad\text{and}\quad P[|X_n - Y| > a] < \epsilon.
\]
By the triangle inequality we have
\[
P[|X - Y| > 2a] = P[|X - X_n + X_n - Y| > 2a] \le P[|X - X_n| + |X_n - Y| > 2a]. \tag{A.6}
\]
If |X − Xn| + |Xn − Y| > 2a then |X − Xn| > a or |Xn − Y| > a (or possibly both). Hence, continuing (A.6),
\[
P[|X - Y| > 2a] \le P[|X_n - X| > a] + P[|X_n - Y| > a] \le 2\epsilon.
\]
Since this is true for any ε > 0 and any a > 0, we have P[X = Y] = 1.
6.7 Suppose (aiming for a contradiction) that there exists a random variable X such that Xn → X in probability. By the triangle inequality we have
\[
|X_n - X_{n+1}| \le |X_n - X| + |X - X_{n+1}|.
\]
Hence, if |Xn − Xn+1| ≥ 1 then |Xn − X| ≥ 1/2 or |Xn+1 − X| ≥ 1/2 (or both). Therefore,
\[
P[|X_n - X_{n+1}| \ge 1] \le P[|X_n - X| \ge \tfrac{1}{2}] + P[|X_{n+1} - X| \ge \tfrac{1}{2}].
\]
Since Xn → X in probability, the right hand side of the above tends to zero as n → ∞. This implies that
\[
P[|X_n - X_{n+1}| \ge 1] \to 0
\]
as n → ∞. But Xn and Xn+1 are independent and P[Xn = 1, Xn+1 = 0] = 1/4, so P[|Xn − Xn+1| ≥ 1] ≥ 1/4 for all n. Therefore we have a contradiction, so there is no X such that Xn → X in probability.
Hence, by Lemma 6.1.2, we can't have any X such that Xn → X almost surely or in L1 (since either would imply convergence in probability).
Chapter 7
7.1 (a) We have (C ◦ M)_n = \sum_{i=1}^{n} 0\,(S_i − S_{i−1}) = 0.
(b) We have (C ◦ M)_n = \sum_{i=1}^{n} 1\,(S_i − S_{i−1}) = S_n − S_0 = S_n.
(c) We have
\[
(C \circ M)_n = \sum_{i=1}^{n} S_{i-1}(S_i - S_{i-1}) = \sum_{i=1}^{n} (X_1 + \ldots + X_{i-1}) X_i = \frac{1}{2}\Big(\sum_{i=1}^{n} X_i\Big)^2 - \frac{1}{2}\sum_{i=1}^{n} X_i^2 = \frac{S_n^2}{2} - \frac{n}{2}.
\]
In the last equality we use that X_i^2 = 1.
7.2 We have
\[
(X \circ M)_n = \sum_{i=1}^{n} (\alpha C_{i-1} + \beta D_{i-1})(M_i - M_{i-1}) = \alpha \sum_{i=1}^{n} C_{i-1}(M_i - M_{i-1}) + \beta \sum_{i=1}^{n} D_{i-1}(M_i - M_{i-1}) = \alpha (C \circ M)_n + \beta (D \circ M)_n,
\]
as required.
7.3 (a) We note that
\[
|S_n| \le \sum_{i=1}^{n} |X_i| \le \sum_{i=1}^{n} 1 = n < \infty.
\]
(b) Note that (Sn ) takes values in Z and that, from part (b), almost surely we have that Sn converges
to a real value. It follows immediately by Lemma 7.4.1 that (almost surely) there exists N ∈ N such
that Sn = S∞ for all n ≥ N .
7.4 Here’s some R code to plot 1000 time-steps of the symmetric random walk. Changing the value given to
set.seed will generate a new sample.
> T=1000
> set.seed(1)
> x=c(0,2*rbinom(T-1,1,0.5)-1)
> y=cumsum(x)
> par(mar=rep(2,4))
> plot(y,type="l")
Adapting it to plot the asymmetric case and the random walk from exercise 7.3 is left for you.
In case you prefer Python, here's some Python code that does the same job. (I prefer Python.)

import numpy as np
import matplotlib.pyplot as plt

p = 0.5
rr = np.random.random(1000)
step = 2*(rr < p) - 1
start = 0
positions = [start]
for x in step:
    positions.append(positions[-1] + x)
plt.plot(positions)
plt.show()
Extending this code as suggested is left for you; a sketch of the asymmetric case is given below. For the last part, the difference in behaviour that you should notice from the graph is that the walk from Exercise 7.3 eventually stops moving (at some random time), whereas the random walk from Q2 of Assignment 3 never stops moving but makes smaller and smaller steps.
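For example, a minimal adaptation of the Python code to the asymmetric case only needs a different value of p (the choice p = 0.7 below is arbitrary, for illustration):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)  # seeded, so re-running gives the same sample

p = 0.7  # probability of an up-step; p != 0.5 makes the walk asymmetric
steps = np.where(rng.random(1000) < p, 1, -1)
positions = np.concatenate(([0], np.cumsum(steps)))  # the walk, started at 0

plt.plot(positions)
plt.show()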
7.5 We will argue by contradiction. Suppose that (Mn) was uniformly bounded in L1. Then the martingale convergence theorem would give that Mn → M∞ almost surely, for some real valued random variable M∞. However, Mn takes values in Z and we have Mn+1 ≠ Mn for all n, so Lemma 7.4.1 gives that (Mn) cannot converge almost surely. This is a contradiction.
7.6 (a) Take the offspring distribution to be P[G = 0] = 1. Then Z1 = 0, and hence Zn = 0 for all n ≥ 1, so
Zn dies out.
(b) Take the offspring distribution to be P[G = 2] = 1. Then Zn+1 = 2Zn , so Zn = 2n , and hence
Zn → ∞.
7.7 (a) We have
\[
\begin{aligned}
E[M_{n+1} \mid \mathcal{F}_n] &= E\Big[\tfrac{B_n}{n+3}\, 1_{\{n\text{th draw is red}\}} \,\Big|\, \mathcal{F}_n\Big] + E\Big[\tfrac{B_n + 1}{n+3}\, 1_{\{n\text{th draw is black}\}} \,\Big|\, \mathcal{F}_n\Big] \\
&= \frac{B_n}{n+3}\, E[1_{\{n\text{th draw is red}\}} \mid \mathcal{F}_n] + \frac{B_n + 1}{n+3}\, E[1_{\{n\text{th draw is black}\}} \mid \mathcal{F}_n] \\
&= \frac{B_n}{n+3}\cdot\frac{B_n}{n+2} + \frac{B_n + 1}{n+3}\Big(1 - \frac{B_n}{n+2}\Big) \\
&= \frac{B_n^2}{(n+2)(n+3)} + \frac{B_n(n+2) + (n+2) - B_n^2 - B_n}{(n+2)(n+3)} \\
&= \frac{(n+1)B_n + (n+2)}{(n+2)(n+3)}.
\end{aligned}
\]
(b) Since Mn ∈ [0, 1] we have E[|Mn|] ≤ 1, hence (Mn) is uniformly bounded in L1. Hence, by the martingale convergence theorem there exists a random variable M∞ such that Mn → M∞ almost surely.
Since Mn = Bn/K, this means that also Bn → KM∞ almost surely. However, Bn is integer valued so, by Lemma 7.4.1, if Bn is to converge then it must eventually become constant. This can only occur if, eventually, the urn contains either (i) only red balls or (ii) only black balls.
In case (i) we have Mn = 0 eventually, which means that M∞ = 0. In case (ii) we have Mn = 1 eventually, which means that M∞ = 1. Hence, M∞ ∈ {0, 1} almost surely.
7.9 (a) Consider the usual Pólya urn, from Section 4.2, started with just one red ball and one black ball.
After the first draw is complete we have two possibilities:
1. The urn contains one red ball and two black balls.
2. The urn contains two red balls and one black ball.
In the first case we reach precisely the initial state of the urn described in 7.9. In the second case we
also reach the initial state of the urn described in 7.9, but with the colours red and black swapped.
If Mn → 0 almost surely, then in the first case the fraction of red balls would tend to zero, and (by symmetry of colours) in the second case the limiting fraction of black balls would tend to zero; thus the limiting fraction of black balls would always be either 1 (the first case) or 0 (the second case). However, we know from Proposition 7.4.3 that the limiting fraction of black balls is actually uniform on (0, 1). Hence, P[Mn → 0] = 0.
(b) You should discover that having a higher proportion of initial red balls makes it more likely for M∞
(the limiting proportion of red balls) to be closer to 1. However, since M∞ is a random variable in
this case, you may need to take several samples of the process (for each initial condition that you try)
to see the effect.
Follow-up challenge question: In fact, the limiting value M∞ has a Beta distribution, Be(α, β).
You may need to look up what this means, if you haven’t seen the Beta distribution before. Consider
starting the urn with n1 red and n2 black balls, and see if you can use your simulations to guess a
formula for α and β in terms of n1 and n2 (the answer is a simple formula!).
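If you want to try this, here is a minimal simulation sketch in Python (the function name, the initial condition (1, 2), the number of steps and the number of samples are all arbitrary choices of mine):

import numpy as np

rng = np.random.default_rng(0)

def polya_limit(n1, n2, steps=5000):
    """Run a Pólya urn started from n1 red and n2 black balls (draw a ball,
    return it together with an extra ball of the same colour) and return
    the proportion of red balls after the given number of steps, as an
    approximation to the limit M_infinity."""
    red, black = n1, n2
    for _ in range(steps):
        if rng.random() < red / (red + black):
            red += 1    # drew red: add an extra red ball
        else:
            black += 1  # drew black: add an extra black ball
    return red / (red + black)

samples = [polya_limit(1, 2) for _ in range(10)]
print(samples)  # ten (approximate) samples of M_infinity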
7.10 (a) We have that (Mn) is a non-negative martingale, hence supn E[|Mn|] = supn E[Mn] = E[M0] = 1. Thus (Mn) is uniformly bounded in L1 and the (first) martingale convergence theorem applies, giving that Mn → M∞ almost surely, for a real valued M∞. Since Mn ≥ 0 almost surely we must have M∞ ≥ 0 almost surely.
(b) The possible values of Mn lie in W = {(q/p)^z ; z ∈ Z}. Note that Mn+1 ≠ Mn for all n ∈ N.
On the event that M∞ > 0 there exists ε > 0 such that Mn ≥ ε for all sufficiently large n. Choose m ∈ Z such that (q/p)^m ≤ ε; then for all sufficiently large n we have Mn ∈ W_m = {(q/p)^z ; z ≥ m, z ∈ Z}. For such n, since Mn ≠ Mn+1, we have |Mn − Mn+1| ≥ |(q/p)^m − (q/p)^{m−1}| > 0, which implies that (Mn) could not converge. Hence, the only possible almost sure limit is M∞ = 0.
If (Mn) was uniformly bounded in L2 then by the (second) martingale convergence theorem we would have E[Mn] → E[M∞]. However E[Mn] = E[M0] = 1 and E[M∞] = 0, so this is a contradiction. Hence (Mn) is not uniformly bounded in L2.
(c) We have (q/p)^{S_n} → 0 almost surely. Noting that q/p > 1 and hence log(q/p) > 0, taking logs gives that S_n log(q/p) → −∞ almost surely, which gives S_n → −∞ almost surely.
For the final part, if (S_n) is a simple asymmetric random walk with upwards drift, then (−S_n) is a simple asymmetric random walk with downwards drift. Applying what we've just discovered thus gives −S_n → −∞ almost surely, and multiplying by −1 gives the required result.
7.11 (a) This is a trivial induction: if Sn is even then Sn+1 is odd, and vice versa.
(b) By part (a), p2n+1 = 0 since 0 is an even number and S2n+1 is an odd number.
If Sn is to start at zero and reach zero at time 2n, then during times 1, 2, . . . , 2n it must have made precisely n steps upwards and precisely n steps downwards. The number of different ways in which the walk can do this is \binom{2n}{n} (we are choosing exactly n steps on which to go upwards, out of 2n total steps). Each one of these ways has probability (1/2)^{2n} of occurring, so we obtain
\[
p_{2n} = \binom{2n}{n} 2^{-2n}. \tag{A.7}
\]
(c) From (b) we note that
\[
p_{2(n+1)} = \frac{(2n+2)(2n+1)}{(n+1)(n+1)}\, 2^{-2}\, p_{2n} = \frac{2n+1}{n+1}\cdot 2 \cdot 2^{-2}\, p_{2n} = \frac{2n+1}{2n+2}\, p_{2n} = \left(1 - \frac{1}{2(n+1)}\right) p_{2n}.
\]
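A quick numerical check of both the formula and the recursion (a sketch; math.comb needs Python 3.8 or later):

from math import comb

def p(n):
    """p_{2n} = P[S_{2n} = 0] = C(2n, n) * 2^(-2n), indexed here by n."""
    return comb(2 * n, n) * 2.0 ** (-2 * n)

for n in range(1, 6):
    lhs = p(n + 1)                        # p_{2(n+1)}
    rhs = (1 - 1 / (2 * (n + 1))) * p(n)  # (1 - 1/(2(n+1))) * p_{2n}
    print(n, lhs, rhs)                    # the two columns agree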
7.12 Since S_n/f(n) is a martingale we must have E[S_{n+1}/f(n+1) | F_n] = S_n/f(n). Since E[S_{n+1} | F_n] = S_n, this means that we need
\[
\frac{S_n}{f(n+1)} = \frac{S_n}{f(n)} \tag{A.8}
\]
to hold true with probability one; in particular we need
\[
\frac{S_n}{f(n+1)} \le \frac{S_n}{f(n)}. \tag{A.9}
\]
To handle the inequality we now need to care about the sign of S_n. The key idea is that S_n can be both positive and negative, so the only way that (A.9) can hold is if \frac{1}{f(n+1)} = \frac{1}{f(n)}.
We will use some of the results from the solution of exercise 7.11. In particular, for odd n ∈ N, from 7.11(b) we have P[S_n = 0] = 0, and for even n ∈ N, from (A.7), we have P[S_n = 0] < 1. In both cases we have P[S_n ≠ 0] > 0. Since S_n and −S_n have the same distribution we have P[S_n < 0] = P[−S_n < 0] = P[S_n > 0], and hence
\[
P[S_n \ne 0] = P[S_n < 0] + P[S_n > 0] = 2\,P[S_n < 0] = 2\,P[S_n > 0].
\]
Hence, for all n ≥ 1 we have P[S_n > 0] > 0 and P[S_n < 0] > 0.
Now, consider n ≥ 1. There is positive probability that S_n > 0, so by (A.9) we must have \frac{1}{f(n+1)} \le \frac{1}{f(n)}, which means that f(n+1) ≥ f(n). But there is also positive probability that S_n < 0, hence by (A.9) we must have \frac{1}{f(n+1)} \ge \frac{1}{f(n)}, which means that f(n+1) ≤ f(n). Hence f(n+1) = f(n) for all n, and by a trivial induction we have f(n) = f(1) for all n.
7.13 (a) Since M_n = Z_n/μ^n we note that
\[
\frac{1}{\mu^{2(n+1)}} (Z_{n+1} - \mu Z_n)^2 = (M_{n+1} - M_n)^2. \tag{A.10}
\]
Further,
\[
Z_{n+1} - \mu Z_n = \sum_{i=1}^{Z_n} (X^{n+1}_i - \mu),
\]
and the random variables Y_i = X^{n+1}_i − μ are independent. Moreover, the Y_i are identically distributed and independent of F_n. Hence, by taking out what is known (Z_n ∈ mF_n) we have
\[
\begin{aligned}
E[(Z_{n+1} - \mu Z_n)^2 \mid \mathcal{F}_n] &= E\Big[\Big(\sum_{i=1}^{Z_n} Y_i\Big)^2 \,\Big|\, \mathcal{F}_n\Big] \\
&= \sum_{i=1}^{Z_n} E[Y_i^2 \mid \mathcal{F}_n] + \sum_{\substack{i,j=1 \\ i \ne j}}^{Z_n} E[Y_i Y_j \mid \mathcal{F}_n] \\
&= \sum_{i=1}^{Z_n} E[Y_i^2] + \sum_{\substack{i,j=1 \\ i \ne j}}^{Z_n} E[Y_i]\,E[Y_j] \\
&= Z_n E[Y_1^2] + 0 \\
&= Z_n \sigma^2.
\end{aligned}
\]
Combining this with (A.10), taking expectations, and using the orthogonality of martingale increments together with E[Z_n] = μ^n, we obtain
\[
E[M_{n+1}^2] = E[M_n^2] + \frac{\sigma^2}{\mu^{n+2}}.
\]
7.14 (a) Note that X_{n+1} ≥ X_n ≥ 0 for all n, because taking an infimum over k ≥ n + 1 is an infimum over a smaller set than over k ≥ n. Hence, (X_n) is monotone increasing and hence the limit X_∞(ω) = lim_{n→∞} X_n(ω) exists (almost surely).
(b) Since X_n = \inf_{k \ge n} |M_k|, we have X_n ≤ |M_n|. By definition of inf, for each ε > 0 and n ∈ N there exists some n_0 ≥ n such that X_n ≥ |M_{n_0}| − ε. Combining these two properties, we obtain
\[
|M_{n_0}| - \epsilon \le X_n \le |M_n|.
\]
(c) Letting n → ∞ in the above equation, which means that also n_0 → ∞ because n_0 ≥ n, we take an almost sure limit to obtain |M_∞| − ε ≤ X_∞ ≤ |M_∞|. Since we have this inequality for any ε > 0, in fact X_∞ = |M_∞| almost surely.
(d) We have shown that (X_n) is an increasing sequence and X_n → X_∞ almost surely, and since each |M_n| ≥ 0, we have also X_n ≥ 0. Hence, the monotone convergence theorem applies to X_n and X_∞, and therefore E[X_n] → E[X_∞].
(e) From the monotone convergence theorem, writing in terms of M_n rather than X_n, we obtain
\[
E\Big[\inf_{k \ge n} |M_k|\Big] \to E[|M_\infty|]. \tag{A.11}
\]
However, since \inf_{k \ge n} |M_k| ≤ |M_n|, the monotonicity property of expectation gives us that, for all n,
\[
E\Big[\inf_{k \ge n} |M_k|\Big] \le E[|M_n|] \le C = \sup_{j \in \mathbb{N}} E[|M_j|]. \tag{A.12}
\]
Combining (A.11) and (A.12), with the fact that limits preserve weak inequalities, we obtain that E[|M_∞|] ≤ C, as required.
7.15 (a) All the X̃^{n+1}_i have the same distribution, and are independent by independence of the X^{n+1}_i and the C_i. Thus, (Z̃_n) is a Galton-Watson process with offspring distribution G̃ given by the common distribution of the X̃^{n+1}_i.
(b) By definition of X̃^{n+1}_i we have X^{n+1}_i ≤ X̃^{n+1}_i. Hence, from (7.9), if Z_n ≤ Z̃_n then also Z_{n+1} ≤ Z̃_{n+1}. Since Z_0 = Z̃_0 = 1, an easy induction shows that Z_n ≤ Z̃_n for all n ∈ N.
(c) We have
\[
f(\alpha) = E\big[\tilde X^{n+1}_i\big] = P[G = 0]\,P[C = 1] + \sum_{i=1}^{\infty} i\,P[G = i] = \alpha\, P[G = 0] + \sum_{i=1}^{\infty} i\,P[G = i].
\]
It is clear that f is continuous (in fact, f is linear). Hence, by the intermediate value theorem, there is some α ∈ [0, 1] such that f(α) = 1.
(d) By (c), we can choose α such that f(α) = 1. Hence, E[X̃^{n+1}_i] = 1, so by (a) (Z̃_n) is a Galton-Watson process with an offspring distribution that has mean μ̃ = 1. By Lemma 7.4.6, (Z̃_n) dies out, almost surely. Since, from (b), 0 ≤ Z_n ≤ Z̃_n, this means (Z_n) must also die out, almost surely.
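A quick simulation of this extinction (a sketch; the Poisson distribution with mean 1 is an arbitrary choice of mine for an offspring distribution with μ = 1):

import numpy as np

rng = np.random.default_rng(0)

def dies_out(generations=500):
    """Run a Galton-Watson process with Poisson(1) offspring (mean 1)
    and report whether it has died out within the given horizon."""
    z = 1
    for _ in range(generations):
        if z == 0:
            return True
        z = rng.poisson(1.0, size=z).sum()  # total offspring of z parents
    return False

print(np.mean([dies_out() for _ in range(1000)]))  # close to 1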
Chapter 8
8.1 We have U^{−1} < 1 so, for all ω ∈ Ω, X_n(ω) = U^{−n}(ω) → 0 as n → ∞. Hence X_n → 0 almost surely as n → ∞. Further, |X_n| ≤ 1 for all n ∈ N and E[1] = 1 < ∞, so the dominated convergence theorem applies and shows that E[X_n] → E[0] = 0 as n → ∞.
8.2 We check the two conditions of the dominated convergence theorem. To check the first condition, note that
if n ≥ |X| then X 1{|X|≤n} = X. Hence, since |X| < ∞, as n → ∞ we have Xn = X 1{|X|≤n} → X almost
surely.
To check the second condition, set Y = |X| and then E[|Y |] = E[|X|] < ∞ so Y ∈ L1 . Also, |Xn | =
|X 1{|X|≤n} | ≤ |X| = Y , so Y is a dominated random variable for (Xn ). Hence, the dominated convergence
theorem applies and E[Xn ] → E[X].
8.3 (a) We look to use the dominated convergence theorem. For any ω ∈ Ω we have Z(ω) < ∞, hence for all
n ∈ N such that n > Z(ω) we have Xn (ω) = 0. Therefore, as n → ∞, Xn (ω) → 0, which means that
Xn → 0 almost surely.
We have |Xn| ≤ Z and Z ∈ L1, so we can use Z as the dominating random variable. Hence, by the dominated convergence theorem, E[Xn] → E[0] = 0.
(b) We have
\[
E[Z] = \int_1^{\infty} x f(x)\, dx = \int_1^{\infty} x^{-1}\, dx = [\log x]_1^{\infty} = \infty,
\]
which means Z ∉ L1, and also that
\[
E[X_n] = E[1_{\{Z \in [n, n+1)\}} Z] = \int_1^{\infty} 1_{\{x \in [n, n+1)\}}\, x f(x)\, dx = \int_n^{n+1} x^{-1}\, dx = [\log x]_n^{n+1} = \log(n+1) - \log n = \log\frac{n+1}{n}.
\]
As n → ∞, we have \frac{n+1}{n} = 1 + \frac{1}{n} → 1, hence (using that log is a continuous function) we have \log(\frac{n+1}{n}) → \log 1 = 0. Hence, E[X_n] → 0.
(c) Suppose that we wanted to use the DCT in (b). We still have Xn → 0 almost surely, but any
dominating random variable Y would have to satisfy Y ≥ |Xn | for all n, meaning that also Y ≥ Z,
which means that E[Y ] ≥ E[Z] = ∞; thus there is no dominating random variable Y ∈ L1 . Therefore,
we can’t use the DCT here, but we have shown in (b) that the conclusion of the DCT does hold: we
have that E[Xn ] does tend to zero.
We obtain that the conditions of the DCT are sufficient but not necessary for its conclusion to hold.
8.4 Note that {T = n} = {T ≤ n} \ {T ≤ n − 1}. If T is a stopping time then both parts of the right hand side
are elements of Fn , hence so is {T = n}.
Conversely, if {T = n} ∈ Fn for all n, then for i ≤ n we have {T = i} ∈ Fi ⊆ Fn. Hence the identity {T ≤ n} = ∪_{i=1}^{n} {T = i} shows that {T ≤ n} ∈ Fn, and therefore T is a stopping time.
Since S and T are both stopping times we have {T ≤ n}, {S ≤ n} ∈ Fn for all n ∈ N, which means that {T ≤ j}, {S ≤ k} ∈ Fn for all j, k ≤ n. Writing {S + T ≤ n} = ∪_{k=0}^{n} ({S = k} ∩ {T ≤ n − k}), each set in this union is an element of Fn, hence {S + T ≤ n} ∈ Fn and therefore S + T is also a stopping time.
8.7 If A ∈ F_S then A ∩ {S ≤ n} ∈ F_n. Since T is a stopping time we have {T ≤ n} ∈ F_n, hence A ∩ {S ≤ n} ∩ {T ≤ n} ∈ F_n. Noting that {S ≤ n} ∩ {T ≤ n} = {max(S, T) ≤ n}, this shows that A ∩ {max(S, T) ≤ n} ∈ F_n, and hence A ∈ F_{max(S,T)}.
8.8 (a) For T ≥ n, the first n − 1 draws must all be black, so
\[
P[T \ge n] = \prod_{k=1}^{n-1} \frac{k}{k+1} = \frac{1}{n}.
\]
Therefore, since P[T = ∞] ≤ P[T ≥ n] for all n, we have P[T = ∞] = 0 and P[T < ∞] = 1.
(b) We note that T is a stopping time, with respect to F_n = σ(B_i ; i ≤ n), since
\[
\{T \le n\} = \{\text{a red ball was drawn at some time } i \le n\} = \bigcup_{i=1}^{n} \{B_i = 2\}.
\]
We showed in Section 4.2 that (M_n) was a martingale. Since M_n ∈ [0, 1] the process (M_n) is bounded. By part (a), for any n ∈ N we have P[T = ∞] ≤ P[T ≥ n] = 1/n, hence P[T = ∞] = 0 and P[T < ∞] = 1. Hence, we have condition (b) of the optional stopping theorem, so
\[
E[M_T] = E[M_0] = \frac{1}{2}.
\]
By definition of T we have B_T = 2. Hence M_T = \frac{2}{T+2}, so we obtain E\big[\frac{1}{T+2}\big] = \frac{1}{4}.
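A quick Monte Carlo check of E[1/(T + 2)] = 1/4 (a sketch, assuming the urn starts with one red and one black ball; note that E[T] itself is infinite, so the rare very long runs are truncated):

import numpy as np

rng = np.random.default_rng(0)

def first_red_time(max_t=10**6):
    """Pólya urn started from 1 red, 1 black: draw a ball, return it plus
    one more of the same colour; return the time of the first red draw."""
    red, black = 1, 1
    for t in range(1, max_t + 1):
        if rng.random() < red / (red + black):
            return t    # red drawn at time t
        black += 1      # black drawn: add another black ball
    return max_t        # truncation; reached with probability 1/max_t

vals = [1 / (first_red_time() + 2) for _ in range(100_000)]
print(np.mean(vals))  # close to 0.25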
8.9 (a) We showed in exercise 7.8 that, almost surely, there was a point in time at which the urn contained
either all red balls or all black balls. Hence, P[T < ∞] = 1.
Moreover, almost surely, since the urn eventually contains either all red balls or all black balls, we
have either MT = 1 (all red balls) or MT = 0 (all black balls).
(b) We note that T is a stopping time, since
\[
\{T \le n\} = \{\text{for some } i \le n \text{ we have } M_i \in \{0, 1\}\} = \bigcup_{i=0}^{n} M_i^{-1}(\{0, 1\}).
\]
In exercise 7.8 we showed that (M_n) was a martingale. Since M_n ∈ [0, 1], in fact (M_n) is a bounded martingale, and we have P[T < ∞] = 1 from above, so condition (b) of the optional stopping theorem applies. Hence,
\[
E[M_T] = E[M_0] = \frac{r}{r+b}.
\]
Lastly, since M_T is either 0 or 1 (almost surely) we note that
\[
E[M_T] = 0 \cdot P[M_T = 0] + 1 \cdot P[M_T = 1] = P[M_T = 1].
\]
Hence, P[M_T = 1] = \frac{r}{r+b}.
8.10 (a) We use the filtration F_n = σ(N_i ; i ≤ n). We have P_n ∈ mF_n and since 0 ≤ P_n ≤ 1 we have also that P_n ∈ L1. Also,
\[
\begin{aligned}
E[P_{n+1} \mid \mathcal{F}_n] &= E[P_{n+1} 1_{\{\text{the } (n+1)\text{th ball is red}\}} \mid \mathcal{F}_n] + E[P_{n+1} 1_{\{\text{the } (n+1)\text{th ball is blue}\}} \mid \mathcal{F}_n] \\
&= E\Big[\frac{N_n - 1}{2m - n - 1}\, 1_{\{\text{the } (n+1)\text{th ball is red}\}} \,\Big|\, \mathcal{F}_n\Big] + E\Big[\frac{N_n}{2m - n - 1}\, 1_{\{\text{the } (n+1)\text{th ball is blue}\}} \,\Big|\, \mathcal{F}_n\Big] \\
&= \frac{N_n - 1}{2m - n - 1}\, E[1_{\{\text{the } (n+1)\text{th ball is red}\}} \mid \mathcal{F}_n] + \frac{N_n}{2m - n - 1}\, E[1_{\{\text{the } (n+1)\text{th ball is blue}\}} \mid \mathcal{F}_n] \\
&= \frac{N_n - 1}{2m - n - 1}\cdot\frac{N_n}{2m - n} + \frac{N_n}{2m - n - 1}\cdot\frac{2m - n - N_n}{2m - n} \\
&= \frac{N_n}{2m - n} \\
&= P_n.
\end{aligned}
\]
Here we use taking out what is known (since N_n ∈ mF_n), along with the definition of our urn process to calculate e.g. E[1_{\{\text{the } (n+1)\text{th ball is red}\}} \mid \mathcal{F}_n] as a function of N_n. Hence (P_n) is a martingale.
(b) Since 0 ≤ P_n ≤ 1, the process (P_n) is a bounded martingale. The time T is bounded above by 2m, hence condition (b) for the optional stopping theorem holds and E[P_T] = E[P_0]. Since P_0 = 1/2 this gives us E[P_T] = 1/2. Hence,
\[
P[N_{T+1} = N_T + 1] = \sum_{i=1}^{2m-1} P[N_{T+1} = N_T + 1 \mid T = i]\, P[T = i] = \sum_{i=1}^{2m-1} \frac{m-1}{2m-i}\, P[T = i] = E[P_T] = \frac{1}{2},
\]
as required.
Chapter 9
9.1 (a) We have that T is a stopping time and (from Section 4.1) M_n = S_n − (p − q)n is a martingale. We want to apply the optional stopping theorem to (M_n).
We have E[T] < ∞ from Lemma 9.1.1 and for all n we have
\[
|M_{n+1} - M_n| \le |S_{n+1} - S_n| + |p - q| \le 1 + |p - q| \le 2.
\]
Thus, condition (c) for the optional stopping theorem applies and E[M_T] = E[M_0] = 0. That is,
\[
E[S_T] - (p - q)\,E[T] = 0, \qquad\text{so}\qquad E[T] = \frac{1}{p-q}\big(a\,P[T = T_a] + b\,P[T = T_b]\big).
\]
Putting equations (9.5) and (9.6) into the above, and then using part (a), gives us
\[
E[T] = \frac{1}{p-q}\left( a\, \frac{(q/p)^b - 1}{(q/p)^b - (q/p)^a} + b\, \frac{1 - (q/p)^a}{(q/p)^b - (q/p)^a} \right).
\]
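A Monte Carlo check of this formula (a sketch; p = 0.6, a = −3, b = 5 are arbitrary choices of mine, and T is taken to be the first hitting time of {a, b} by the walk started at 0):

import numpy as np

rng = np.random.default_rng(0)
p, a, b = 0.6, -3, 5
q = 1 - p

def hitting_time():
    """Time for the asymmetric walk started at 0 to first hit a or b."""
    s, t = 0, 0
    while a < s < b:
        s += 1 if rng.random() < p else -1
        t += 1
    return t

r = q / p
formula = (a * (r**b - 1) + b * (1 - r**a)) / ((p - q) * (r**b - r**a))
estimate = np.mean([hitting_time() for _ in range(20_000)])
print(formula, estimate)  # the two values should be close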
9.2 (a) We have P[T_a < ∞] = P[T_b < ∞] = 1 from Lemma 9.3.2. Hence also P[T = T_a < ∞ or T = T_b < ∞] = 1. Since these two possibilities are disjoint, in fact P[T = T_a] + P[T = T_b] = 1.
For the second equation, we will use that (S_n) is a martingale. We want to apply the optional stopping theorem, but none of the conditions apply directly to (S_n). Instead, set M_n = S_{n∧T} and note that Lemma 8.2.4 implies that (M_n) is a martingale. By definition of T we have |M_n| ≤ max{|a|, |b|}, hence (M_n) is bounded. We have P[T < ∞] = 1 from above, hence condition (b) of the optional stopping theorem applies to (M_n). We thus deduce that
\[
E[M_T] = E[M_0] = 0.
\]
Hence
\[
0 = E[M_T] = E[M_T 1_{\{T = T_a\}}] + E[M_T 1_{\{T = T_b\}}] = a\,P[T = T_a] + b\,P[T = T_b],
\]
as required. In the above, the first equality partitions on the value of T (using what we already deduced). The final equality follows because M_T = S_{T∧T} = S_T, which on the event {T = T_a} equals S_{T_a} = a, and similarly for T_b.
(b) Solving the two equations (this is left for you) gives that P[T = T_a] = \frac{b}{b-a} and P[T = T_b] = \frac{-a}{b-a}.
(c) From Exercise 4.4 we have that S_n^2 − n is a martingale. Again, none of the conditions of optional stopping directly apply, but X_n = S_{n∧T}^2 − (n ∧ T) is a martingale by Theorem 8.2.4. Similarly to above, we have |X_n| ≤ max{a^2 + |a|, b^2 + |b|}, so (X_n) is bounded. Hence condition (b) of the optional stopping theorem applies.
9.3 (b) The argument that E[G] < ∞, under the assumption P[R < ∞] = 1, is exactly the same as before. We need to modify the next part, beginning with (9.9), which is replaced by the result in part (a). Note that S_{3n} now plays the role of what was previously S_{2n}, being the possible returns to zero. Having (a) introduces an extra factor of √(2/3), but using the same method as before we can still deduce that E[G] = ∞. We thus arrive at a contradiction in the same way.
9.4 (a) We have
\[
E[X_i] = (p)(1) + \frac{1-p}{2}(-1 - 2) = \frac{5p - 3}{2}.
\]
The condition p ∈ (5/6, 1] implies that E[X_i] > 0. Writing μ = E[X_i], it follows from the strong law of large numbers that (almost surely) there exists N ∈ N such that for all n ≥ N we have S_n ≥ n\frac{μ}{2}. Hence S_n → ∞ almost surely.
(b) We calculate
\[
\begin{aligned}
E[T_1] &= E[T_1 1_{\{S_1 = 1\}}] + E[T_1 1_{\{S_1 = -1\}}] + E[T_1 1_{\{S_1 = -2\}}] \\
&= E[1\cdot 1_{\{S_1 = 1\}}] + E\big[E[T_1 1_{\{S_1 = -1\}} \mid \mathcal{F}_1]\big] + E\big[E[T_1 1_{\{S_1 = -2\}} \mid \mathcal{F}_1]\big] \\
&= p + E\big[1_{\{S_1 = -1\}} E[T_1 \mid \mathcal{F}_1]\big] + E\big[1_{\{S_1 = -2\}} E[T_1 \mid \mathcal{F}_1]\big] \\
&= p + E\big[1_{\{S_1 = -1\}} E[1 + T_1' + T_1'' \mid \mathcal{F}_1]\big] + E\big[1_{\{S_1 = -2\}} E[1 + T_1' + T_1'' + T_1''' \mid \mathcal{F}_1]\big] \\
&= p + E\big[1_{\{S_1 = -1\}} E[1 + T_1' + T_1'']\big] + E\big[1_{\{S_1 = -2\}} E[1 + T_1' + T_1'' + T_1''']\big] \\
&= p + E\big[1_{\{S_1 = -1\}} (1 + 2E[T_1])\big] + E\big[1_{\{S_1 = -2\}} (1 + 3E[T_1])\big] \\
&= p + \frac{1-p}{2}(1 + 2E[T_1]) + \frac{1-p}{2}(1 + 3E[T_1]).
\end{aligned}
\]
In the first line of the above we partition on the value of S_1. The first term of the second line follows because if S_1 = 1 then T_1 = 1, and the other terms use the relationship between expectation and conditional expectation. The third line follows because S_1 ∈ mF_1.
The fourth line uses that, on the event S_1 = −1, T_1 is equal to 1 (accounting for the first move 0 ↦ −1) plus two independent copies (T_1' and T_1'') of T_1 (accounting for the time to move from −1 ↦ 0 and then 0 ↦ 1). Similarly, on the event S_1 = −2, T_1 is equal to 1 plus three independent copies of T_1. Note that it is crucial here that (S_n) can only jump upwards by 1 in each step of time, meaning that it cannot go above 1 without first being equal to 1.
The strong Markov property gives that T_1', T_1'' and T_1''' are independent of F_1, leading to the fifth line. The remainder of the calculation is straightforward. We thus obtain
\[
E[T_1] = p + \frac{1-p}{2}\,(2 + 5E[T_1]).
\]
Solving this equation gives E[T_1] = \frac{2}{5p-3}.
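A quick simulation check of E[T_1] = 2/(5p − 3) (a sketch; p = 0.9 is an arbitrary choice of mine inside (5/6, 1], and the step distribution is the one from (a)):

import numpy as np

rng = np.random.default_rng(0)
p = 0.9  # arbitrary choice in (5/6, 1]

def t1():
    """First hitting time of level 1 for the walk with steps
    +1 w.p. p, and -1, -2 each w.p. (1-p)/2."""
    s, t = 0, 0
    while s < 1:
        u = rng.random()
        s += 1 if u < p else (-1 if u < p + (1 - p) / 2 else -2)
        t += 1
    return t

print(2 / (5 * p - 3), np.mean([t1() for _ in range(20_000)]))  # close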
9.5 Recall that S_n is even when n is even, and odd when n is odd. Hence T_1 is odd. From the fact given in the question, there exists N ∈ N such that for all n ≥ N we have P[T_1 = 2n − 1] ≥ \frac{1}{4\sqrt{\pi}\, n^{3/2}}. Hence
\[
E[T_1] \ge \sum_{n=N}^{\infty} (2n-1)\,P[T_1 = 2n-1] \ge \frac{1}{4\sqrt{\pi}} \sum_{n=N}^{\infty} \frac{2n-1}{n^{3/2}} \ge \frac{1}{4\sqrt{\pi}} \sum_{n=N}^{\infty} \frac{1}{n^{1/2}} = \infty,
\]
using that 2n − 1 ≥ n, as required.
9.6 (a) Note that M_n^{(θ)} = \prod_{i=1}^{n} (e^{θX_i}/\cosh θ). Since X_i ∈ mF_n for all i ≤ n, we have S_n ∈ mF_n, and hence M_n^{(θ)} ∈ mF_n, for all n by Proposition 2.2.6. Since |S_n| ≤ n we have |M_n^{(θ)}| ≤ \frac{e^{|θ|n}}{(\cosh θ)^n} < ∞, hence M_n^{(θ)} ∈ L1. We have also that
\[
E[M^{(\theta)}_{n+1} \mid \mathcal{F}_n] = E\left[\left(\prod_{i=1}^{n} \frac{e^{\theta X_i}}{\cosh \theta}\right) \frac{e^{\theta X_{n+1}}}{\cosh \theta} \,\middle|\, \mathcal{F}_n\right] = M^{(\theta)}_n\, E\left[\frac{e^{\theta X_{n+1}}}{\cosh \theta}\right] = M^{(\theta)}_n.
\]
Here we use the taking out what is known rule, the fact that X_{n+1} is independent of F_n and the relationship between conditional expectation and independence. To deduce the final equality we note that E[e^{θX_i}/\cosh θ] = \frac{1}{2}(e^θ + e^{−θ})/\cosh θ = 1.
Note: This is a more general case of the martingale from Exercise 4.1.
(b) Condition (a) fails because T_m is not bounded – to see this, note that for any N ∈ N there is a positive probability that (S_n)_{n=0}^{N} remains below zero, in which case T_m ≥ N. Condition (b) fails because (M_n^{(θ)}) is unbounded. Condition (c) fails because we have E[T_1] = ∞ by Lemma 9.3.4, and T_m ≥ T_1 so also E[T_m] = ∞.
(c) By Lemma 8.2.4 the process L_n = M^{(θ)}_{n∧T} is a martingale, and by definition of T we have
\[
\frac{e^{-m\theta}}{(\cosh \theta)^n} \le L_n \le \frac{e^{m\theta}}{(\cosh \theta)^n}.
\]
Since cosh θ ≥ 1, we thus have |L_n| ≤ e^{mθ} + e^{−mθ}, so the martingale (L_n) is bounded. By Lemma 9.3.2, P[T < ∞] = 1, so condition (b) of the optional stopping theorem applies to (L_n) and T, hence
\[
E\left[\frac{e^{\theta S_T}}{(\cosh \theta)^T}\right] = E\left[\frac{e^{\theta S_0}}{(\cosh \theta)^0}\right] = 1.
\]
By Lemma 9.3.2 we have P[T < ∞] = 1, hence almost surely either T = T_m or T = T_{−m}. Hence,
\[
\begin{aligned}
1 &= E\left[\frac{e^{\theta S_T}}{(\cosh \theta)^T}\, 1_{\{T = T_m\}} + \frac{e^{\theta S_T}}{(\cosh \theta)^T}\, 1_{\{T = T_{-m}\}}\right] \\
&= E\left[\frac{e^{m\theta}}{(\cosh \theta)^{T_m}}\, 1_{\{T = T_m\}}\right] + E\left[\frac{e^{-m\theta}}{(\cosh \theta)^{T_{-m}}}\, 1_{\{T = T_{-m}\}}\right] \\
&= e^{m\theta}\, E\left[\frac{1_{\{T = T_m\}}}{(\cosh \theta)^{T_m}}\right] + e^{-m\theta}\, E\left[\frac{1_{\{T = T_{-m}\}}}{(\cosh \theta)^{T_{-m}}}\right] \\
&= \left(e^{m\theta} + e^{-m\theta}\right) E\left[\frac{1_{\{T = T_m\}}}{(\cosh \theta)^{T_m}}\right].
\end{aligned}
\]
In the above, the final line follows by symmetry about zero, which implies that the two expectations are in fact equal. Thus
\[
E\left[\frac{1_{\{T = T_m\}}}{(\cosh \theta)^{T_m}}\right] = \frac{1}{2\cosh(m\theta)}
\]
for all m ∈ N. With this in hand we can calculate
9.7 We have shown in Lemma 9.5.1 that (S_n) will almost surely return to the origin in finite time. By applying the strong Markov property at each return to the origin, we obtain that (S_n) will return to the origin infinitely often.
Fix some z ∈ Z². There is a path from the origin to z, and with positive probability the walk will take that path, hence P[S_n = z for some n ∈ N] is positive. By the strong Markov property, this means that the probability that (S_n) visits z in between any two successive visits to the origin is also positive. By the strong Markov property, whether this occurs is independent, for each successive pair of returns to the origin. Hence we have infinitely many independent trials in which to visit z, each with the same positive success probability. This will result in infinitely many visits to z, almost surely.
This holds for any z ∈ Z², which completes the argument.
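A simulation can illustrate (though of course not prove) this recurrence; the sketch below counts returns to the origin of a 2D symmetric random walk, and the count keeps growing, slowly, as the walk length increases:

import numpy as np

rng = np.random.default_rng(0)

# 200,000 steps of a symmetric random walk on Z^2
moves = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])
path = np.cumsum(moves[rng.integers(0, 4, size=200_000)], axis=0)

returns = int(((path == 0).all(axis=1)).sum())
print(returns)  # the number of returns to the origin so far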
9.8 (a) If L = ∞ then the set {n ; S_n = 0} must be infinite, meaning that (S_n) has returned infinitely many times to the origin. However, this has probability zero; by the strong Markov property, after each visit there is a probability P[R = ∞] > 0 of not returning again, independently of the past; eventually that will happen.
(b) L is not a stopping time. One way to see this is to note that {L = 0} = {R = ∞}, which depends on the whole path taken by the walk, so is not an element of F_n for any n ∈ N.
(c) The event {L = 2n} is the event that S_{2n} = 0, but the walk is never zero after that time. Hence, by the Markov property,
\[
P[L = 2n] = P[S_{2n} = 0]\, P[R = \infty].
\]
Recall that S_n is even if and only if n is even. Taking the above equation and summing over n, noting that \sum_{n=0}^{\infty} P[L = 2n] = 1, we obtain
\[
1 = P[R = \infty] \sum_{n=0}^{\infty} P[S_{2n} = 0] = P[R = \infty]\, E[G].
\]
The required result follows by noting that P[R < ∞] = 1 − P[R = ∞].
(d) During the proof of Lemma 9.3.4, whilst we were arguing by contradiction, we noted that if p = P[R <
∞] < 1 then G would have a Geometric(p) distribution. In part (c) we’ve discovered the mean of this
geometric distribution. We also came close, for the same reason, during the proofs of Lemmas 9.5.1
and 9.5.2, and in Remark 9.4.2. In all of those cases, all we needed was to note that E[G] would be
finite – we didn’t need to know its value.
9.9 Let B_r = {z ∈ Z³ ; |z| < r}. Note that B_r is a finite set.
Consider some z ∈ Z³. If (S_n) visits z then, the first time that it does, we may apply the strong Markov property. From then on the walk will behave independently of its past, again moving one unit of space in a random direction on each step of time. Applying part (c) of Exercise 9.8, the expected number of returns to z is finite, hence there will almost surely be a last return time to z.
We can do the above for any z ∈ B_r. Since B_r is finite, this means that almost surely there is a last return time to the set B_r. That is, there exists some random variable N ∈ N such that S_n ∉ B_r for all n ≥ N. Hence |S_n| ≥ r for all n ≥ N. Since this holds for any r ∈ N, we have |S_n| → ∞ almost surely.
Appendix B
The formula sheet displayed on the following page will be provided in the exam.