notes_1
(part one)
Contents
0 Introduction
0.1 Organization
4 Stochastic processes
4.1 Random walks
4.2 Urn processes
4.3 A branching process
4.4 Other stochastic processes
4.5 Exercises on Chapter 4
5.6 Hedging
5.7 Exercises on Chapter 5
Chapter 0
Introduction
We live in a random world: we cannot be certain of tomorrow’s weather or what the price of
petrol will be next year – but randomness is never ‘completely’ random. Often we know, or
rather, believe that some events are likely and others are unlikely. We might think that two
events are both possible, but are unlikely to occur together, and so on.
How should we handle this situation? Naturally, we would like to understand the world around
us and, when possible, to anticipate what might happen in the future. This necessitates that we
study the variety of random processes that we find around us.
We will see many and varied examples of random processes throughout this course, although
we will tend to call them stochastic processes (with the same meaning). They reflect the wide
variety of unpredictable ways in which reality behaves. We will also introduce a key concept used
in the study of stochastic processes, known as a martingale.
It has become common, in both science and industry, to use highly complex models of the
world around us. Such models cannot be magicked out of thin air. In fact, in much the same way
as we might build a miniature space station out of individual pieces of Lego, what is required is
a set of useful pieces that can be fitted together into realistic models. The theory of stochastic
processes provides some of the most useful building blocks, and the models built from them are
generally called stochastic models.
One industry that makes extensive use of stochastic modelling is finance. In this course, we
will often use financial models to motivate and exemplify our discussion of stochastic processes.
The central question in a financial model is usually how much a particular object is worth.
For example, we might ask how much we need to pay today, to have a barrel of oil delivered in
six months' time. We might ask for something more complicated: how much would it cost to have
the opportunity, in six months' time, to buy a barrel of oil, for a price that is agreed on today?
We will study the Black-Scholes model and the concept of ‘arbitrage free pricing’, which provide
somewhat surprising answers to this type of question.
0.1 Organization
Syllabus
These notes are for two courses: MAS352 and MAS61023. This is part one of the lecture notes.
Part two will be made available when we reach Chapter 10.
Some sections of the course are included in MAS61023 but not in MAS352. These sections
are marked with a (∆) symbol. We will not cover these sections in lectures. Students taking
MAS61023 should study these sections independently.
Some parts of the notes are marked with a ( ) symbol, which means they are off-syllabus.
These are often cases where detailed connections can be made to and from other parts of
mathematics.
Problem sheets
The exercises are divided up according to the chapters of the course. Some exercises are marked
as ‘challenge questions’ – these are intended to offer a serious, time-consuming challenge to the
best students.
Aside from challenge questions, it is expected that students will attempt all exercises (for the
version of the course they are taking) and review their own solutions using the typed solutions
provided at the end of these notes, in Appendices A and C.
At three points during each semester, an assignment of additional exercises will be set. About
one week later, a mark scheme will be posted, and you should self-mark your solutions.
Examination
Both versions of the course will be examined in the summer sitting. Parts of the course marked
with a (∆) are examinable for MAS61023 but not for MAS352. Parts of the course marked with
a ( ) will not be examined (for everyone).
A formula sheet will be provided in the exam, see Appendices B (for semester 1) and E (for
semester 2). Some detailed advice on revision can be found in Appendix D, attached to the second
semester notes.
MAS61023 also contains a mid-year online test, which will take place towards the end of
January and comprise 15% of the final mark. MAS352 is assessed entirely by exam.
Website
Further information, including the timetable, can be found on
https://nicfreeman1209.github.io/Website/MASx52/.
Chapter 1
In this chapter we look at our first example of a financial market. We introduce the idea of
arbitrage free pricing, and discuss what tools we would need to build better models.
Let r > 0 be a deterministic constant, known as the interest rate. If we put an amount
x of cash into the bank at time t = 0 and leave it there until time t = 1, the bank will then
contain
x(1 + r)
in cash. The same formula applies if x is negative. This corresponds to borrowing an amount −x
from the bank (i.e. taking out a loan) and the bank then requires us to pay interest on the loan.
Our market contains cash, as its first commodity. As its second, we will have a stock. Let us
take a brief detour and explain what is meant by stock.
Firstly, we should realize that companies can be (and frequently are) owned by more than one
person at any given time. Secondly, the ‘right of ownership of a company’ can be broken down
into several different rights, such as:
• The right to a share of the profits.
• The right to vote on decisions concerning the company’s future – for example, on a possible
merger with another company.
A share is a proportion of the rights of ownership of a company; for example a company might
split its rights of ownership into 100 equal shares, which can then be bought and sold individually.
The value of a share will vary over time, often according to how successful the company is. A
collection of one or more shares in a company is known as stock.
We allow the amount of stock that we own to be any real number. This means we can own a
fractional amount of stock, or even a negative amount of stock. This is realistic: in the same way
as we could borrow cash from a bank, we can borrow stock from a stockbroker! We don’t pay
any interest on borrowed stock, we just have to eventually return it. (In reality the stockbroker
would charge us a fee but we’ll pretend they don’t, for simplicity.)
The value or worth of a stock (or, indeed any commodity) is the amount of cash required to
buy a single unit of stock. This changes, randomly:
Let u > d > 0 and s > 0 be deterministic constants. At time t = 0, it costs S0 = s cash
to buy one unit of stock. At time t = 1, one unit of stock becomes worth
S1 = su with probability pu,
S1 = sd with probability pd.
We can represent the changes in value of cash and stocks as a tree, where each edge is labelled
by the probability of occurrence.
To sum up, in the one-period market it is possible to trade stocks and cash. There are two
points in time, t = 0 and t = 1.
• If we have x units of cash at time t = 0, they will become worth x(1 + r) at time t = 1.
• If we have y units of stock, that are worth yS0 = sy at time t = 0, they will become worth
yS1 at time t = 1.
For example, suppose that at time 0 we hold a portfolio of 5 units of cash and 8 units of stock.
Our 5 units of cash become worth 5(1 + r) at time 1. Our 8 units of stock, which are worth 8S0
at time 0, become worth 8S1 at time 1. So, at time t = 1 our portfolio is worth V1 = 5(1+r)+8S1
and the expected value of our portfolio at time t = 1 is
E[V1] = 5(1 + r) + 8E[S1] = 5(1 + r) + 8(su pu + sd pd).
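Here is a small Python sketch putting numbers to this calculation. All values (r, s, u, d, pu, pd) and the holdings are illustrative assumptions, not taken from the notes.

    # One-period market sketch: cash earns interest r, stock moves s -> su or sd.
    r = 0.05                   # interest rate (illustrative)
    s, u, d = 10.0, 1.5, 0.5   # initial stock price and up/down factors, u > d > 0
    pu, pd = 0.5, 0.5          # probabilities of the two outcomes, pu + pd = 1

    x, y = 5, 8                # units of cash and stock held at time 0

    V1_up = x * (1 + r) + y * s * u      # portfolio value at time 1 if the stock goes up
    V1_down = x * (1 + r) + y * s * d    # portfolio value at time 1 if the stock goes down
    E_V1 = x * (1 + r) + y * (s * u * pu + s * d * pd)   # E[V1]

    print(V1_up, V1_down, E_V1)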
1.3 Arbitrage
We now introduce a key concept in mathematical finance, known as arbitrage. We say that
arbitrage occurs in a market if it is possible to make money, for free, without risk of losing money.
There is a subtle distinction to be made here. We might sometimes expect to make money, on
average. But an arbitrage possibility only occurs when it is possible to make money without any
chance of losing it.
Example 1.3.1 Suppose that, in the one-period market, someone offered to sell us a single unit
of stock for a special price s/2 at time 0. We could then:
1. Borrow s/2 in cash from the bank.
2. Buy the unit of stock at the special price s/2.
3. Sell the stock at its market price s.
4. Repay our loan of s/2 to the bank.
5. Profit!
Example 1.3.1 is obviously artificial. It does illustrate an important point: no one should
sell anything at a price that makes an arbitrage possible. However, if nothing is sold at a price
that would permit arbitrage then, equally, nothing can be bought for a price that would permit
arbitrage. With this in mind:
Let us step back and ask a natural question, about our market. Suppose we wish to have a
single unit of stock delivered to us at time T = 1, but we want to agree in advance, at time 0,
what price K we will pay for it. To do so, we would enter into a contract. A contract is an
agreement between two (or more) parties (i.e. people, companies, institutions, etc) that they will
do something together.
Consider a contract that refers to one party as the buyer and another party as the seller. The
contract specifies that:
At time 1, the seller will be paid K cash and will deliver one unit of stock to the buyer.
A contract of this form is known as a forward contract. Note that no money changes hands at
time 0. The price K that is paid at time 1 is known as the strike price. The question is: what
should be the value of K ?
In fact, there is only one possible value for K . This value is
K = s(1 + r). (1.1)
To see why, suppose the contrary:
• Suppose that a price K > s(1 + r) was agreed. Then we could, at time 0, borrow s in cash
from the bank, buy one unit of stock, and take the seller’s side of the contract. At time 1, we
deliver the stock, receive K , and repay s(1 + r) to the bank, leaving a certain profit of
K − s(1 + r). Arbitrage!
• Suppose, instead, that a price K < s(1 + r) was agreed. Then we could, at time 0, borrow one
unit of stock from a stockbroker, sell it for s, put the cash in the bank, and take the buyer’s
side of the contract. At time 1, our bank balance is s(1 + r); we pay K for the stock, return
the stock to the stockbroker, and keep the certain profit of s(1 + r) − K .
Once again, with this strategy we are certain to make a profit. Arbitrage!
Therefore, we reach the surprising conclusion that the only possible choice is K = s(1 + r).
We refer to s(1 + r) as the arbitrage free value for K . This is our first example of an important
principle:
The absence of arbitrage can force prices to take particular values. This is known as
arbitrage free pricing.
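To see the principle numerically, here is a short Python sketch (with illustrative values of r and s, which are assumptions) of the seller's replication strategy for the forward: borrow s, buy the stock at time 0, deliver it at time 1 for K and repay s(1 + r). The profit is riskless, and it vanishes exactly when K = s(1 + r).

    # Profit of the replication strategy, as the seller of the forward:
    # time 0: borrow s, buy one unit of stock; time 1: deliver stock, receive K, repay s(1+r).
    r, s = 0.05, 10.0

    def seller_profit(K):
        # The stock itself cancels out, so the profit does not depend on S1.
        return K - s * (1 + r)

    for K in (10.0, s * (1 + r), 11.0):
        print(K, seller_profit(K))   # zero exactly when K = s(1+r)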
Figure 1.1: The stock price in GBP of Lloyds Banking Group, from September 2011 to September 2016.
If, instead, we tried to price the forward contract by its expectation, we would want it to cost
nothing! This would mean that E[S1 − K] = 0, which means we’d choose
K = E[S1 ] = su pu + sd pd . (1.2)
This is not the same as the formula K = s(1 + r), which we deduced in the previous section.
It is possible that our two candidate formulas for K ‘accidentally’ agree, that is su pu + sd pd =
s(1 + r), but they are only equal for very specific values of u, pu , d, pd , s and r. Observations of
real markets show that this doesn’t happen.
It may feel very surprising that (1.2) is different to (1.1). The reality is that financial markets
are arbitrage free, and the correct strike price for our forward contract is K = s(1 + r). However
intuitively appealing it might seem to price by expected value, it is not what happens in reality.
Does this mean that, with the correct strike price K = s(1 + r), on average we either make
or lose money by entering into a forward contract? Yes, it does. But investors are often not
concerned with average payoffs – the world changes too quickly to make use of them. Investors
are concerned with what happens to them personally. Having realized this, we can give a short
explanation, in economic terms, of why markets are arbitrage free.
If it is possible to carry out arbitrage within a market, traders will1 discover how and imme-
diately do so. This creates high demand to buy undervalued commodities. It also creates high
demand to borrow overvalued commodities. In turn, this demand causes the price of commodities
to adjust, until it is no longer possible to carry out arbitrage. The result is that the market
constantly adjusts, and stays in an equilibrium in which no arbitrage is possible.
Of course, in many respects our market is an imperfect model. We will discuss its shortcomings,
as well as produce better models, as part of the course.
Remark 1.3.2 We will not mention ‘pricing by expectation’ again in the course. In a liquid
market, arbitrage free pricing is what matters.
1 Usually.
• We need to be able to express the idea that, as time passes, we gain information.
For example, in our market, at time t = 0 we don’t know how the stock price will change.
But at time t = 1, it has already changed and we do know. Of course, real markets have
more than one time step, and we only gain information gradually.
In fact, these two requirements are common to almost all stochastic modelling. For this reason,
we’ll develop our probabilistic tools based on a wide range of examples. We’ll return to study
(exclusively) financial markets in Chapter 5, and again in Chapters 15-19.
1.5 For each n ∈ N, let Xn be a random variable such that
P[Xn = n²] = 1/n   and   P[Xn = 0] = 1 − 1/n.
Show that P[|Xn | > 0] → 0 and E[Xn ] → ∞, as n → ∞.
1.6 Let X be a normal random variable with mean µ and variance σ² > 0. By calculating
P[Y ≤ y] (or otherwise) show that Y = (X − µ)/σ is a normal random variable with mean 0 and
variance 1.
1.7 For which values of p ∈ (0, ∞) is ∫_1^∞ x^{−p} dx finite?
1.8 Which of the following sequences converge as n → ∞? What do they converge to?
cos(nπ)/n,    (1/n) sin(nπ/2),    Σ_{i=1}^n e^{−n},    Σ_{i=1}^n 2^{−i}.
Chapter 2
In this chapter we review probability theory, and develop some key tools for use in later chapters.
We begin with a special focus on σ -fields. The role of a σ -field is to provide a way of controlling
which information is visible (or, currently of interest) to us. As such, σ -fields will allow us to
express the idea that, as time passes, we gain information.
Definition 2.1.1 Let F be a set of subsets of Ω. We say F is a σ -field if it satisfies the following
properties:
1. ∅ ∈ F and Ω ∈ F .
2. if A ∈ F then Ω \ A ∈ F .
3. if A1 , A2 , . . . ∈ F then ∪_{i=1}^∞ Ai ∈ F .
The role of a σ -field is to choose which subsets of outcomes we are actually interested in. The
power set F = P(Ω) is always a σ -field, and in this case every subset of Ω is an event. But P(Ω)
can be very big, and if our experiment is complicated, with many or even infinitely many possible
outcomes, we might want to consider a smaller choice of F instead.
Sometimes we will need to deal with more than one σ -field at a time. A σ -field G such that
G ⊆ F is known as a sub-σ -field of F .
We say that a subset A ⊆ Ω is measurable, or that it is an event (or measurable event), if
A ∈ F . To make it clear which σ -field we mean to use in this definition, we sometimes write
that an event is F -measurable.
Example 2.1.2 Some examples of experiments and the σ -fields we might choose for them are
the following:
• We toss a coin, which might result in heads H or tails T . We take Ω = {H, T } and
F = {∅, {H}, {T }, Ω}.
• We toss two coins, both of which might result in heads H or tails T . We take Ω =
{HH, T T, HT, T H}. However, we are only interested in the outcome that both coins are
heads. We take F = {∅, {HH}, Ω \ {HH}, Ω}.
There are natural ways to choose a σ -field, even if we think of Ω as just an arbitrary set. For
example, F = {Ω, ∅} is a σ -field. If A is a subset of Ω, then F = {Ω, A, Ω \ A, ∅} is a σ -field
(check it!).
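On a finite Ω these properties can be checked mechanically, since countable unions reduce to finite (indeed pairwise) unions. The following Python sketch is an illustration, not part of the notes:

    from itertools import combinations

    def is_sigma_field(F, Omega):
        """Check Definition 2.1.1 for a finite Omega."""
        F = {frozenset(A) for A in F}
        Omega = frozenset(Omega)
        if frozenset() not in F or Omega not in F:
            return False                                   # property 1
        if any(Omega - A not in F for A in F):
            return False                                   # property 2: complements
        if any(A | B not in F for A, B in combinations(F, 2)):
            return False                                   # property 3 (finite form): unions
        return True

    Omega = {'H', 'T'}
    print(is_sigma_field([set(), {'H'}, {'T'}, Omega], Omega))  # True
    print(is_sigma_field([set(), {'H'}, Omega], Omega))         # False: missing {'T'}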
Given Ω and F , the final ingredient of a probability space is a measure P, which tells us how
likely the events in F are to occur.
Definition 2.1.3 A probability measure is a function P : F → [0, 1] satisfying:
1. P[Ω] = 1.
2. If A1 , A2 , . . . ∈ F are pairwise disjoint, then P[∪_{i=1}^∞ Ai ] = Σ_{i=1}^∞ P[Ai ].
The second of these conditions is often called σ -additivity. Note that we needed Definition 2.1.1 to
make sense of Definition 2.1.3, because we needed something to tell us that P[∪_{i=1}^∞ Ai ] was defined!
Definition 2.1.4 A probability space is a triple (Ω, F, P), where F is a σ -field and P is a prob-
ability measure.
For example, to model a single fair coin toss we would take Ω = {H, T }, F = {Ω, {H}, {T }, ∅}
and define P[H] = P[T ] = 1/2.
We commented above that often we want to choose F to be smaller than P(Ω), but we have
not yet shown how to choose a suitably small F . Fortunately, there is a general way of doing so,
for which we need the following technical lemma.
Lemma 2.1.5 Let I be any set and for each i ∈ I let Fi be a σ -field. Then
F = ∩_{i∈I} Fi     (2.1)
is a σ -field.
Now, suppose that we have our Ω and we have a finite or countable collection of E1 , E2 , . . . ⊆ Ω,
which we want to be events. Let F be the set of all σ -fields that contain E1 , E2 , . . .. We enumerate
F as F = {Fi ; i ∈ I}, and apply Lemma 2.1.5. We thus obtain a σ -field F , which contains all
the events that we wanted.
The key point here is that F is the smallest σ -field that has E1 , E2 , . . . as events. To see why,
note that by (2.1), F is contained inside any σ -field F′ which has E1 , E2 , . . . as events.
Definition 2.1.7 Let E1 , E2 , . . . be subsets of Ω. We write σ(E1 , E2 , . . .) for the smallest σ -field
containing E1 , E2 , . . ..
With Ω as any set, and A ⊆ Ω, our example {∅, A, Ω\A, Ω} is clearly σ(A). In general, though,
the point of Definition 2.1.7 is that we know useful σ -fields exist without having to construct them
explicitly.
In the same style, if F1 , F2 . . . are σ -fields then we write σ(F1 , F2 , . . .) for the smallest σ -algebra
with respect to which all events in F1 , F2 , . . . are measurable.
From Definition 2.1.1 and 2.1.3 we can deduce all the ‘usual’ properties of probability. For
example: P[∅] = 0; P[Ω \ A] = 1 − P[A]; and if A ⊆ B then P[A] ≤ P[B].
And so on. In this course we are concerned with applying probability theory rather than with
relating its properties right back to the definition of a probability space; but you should realize
that it is always possible to do so.
Definitions 2.1.1 and 2.1.3 both involve countable unions. It is convenient to be able to use
countable intersections too, for which we need the following lemma.
Lemma Let A1 , A2 , . . . ∈ F . Then ∩_{i=1}^∞ Ai ∈ F .
Proof: Since F is a σ -field, Ω \ Ai ∈ F for all i. Hence also ∪_{i=1}^∞ (Ω \ Ai ) ∈ F , which in turn
means that Ω \ (∪_{i=1}^∞ (Ω \ Ai )) ∈ F . This last set is precisely ∩_{i=1}^∞ Ai .
In general, uncountable unions and intersections of measurable sets need not be measurable.
The reasons why we only allow countable unions/intersections in probability are complicated and
beyond the scope of this course. Loosely speaking, the bigger we make F , the harder it is to make
a probability measure P, because we need to define P[A] for all A ∈ F in a way that satisfies
Definition 2.1.3. Allowing uncountable set operations would (in natural situations) result in F
being so large that it would be impossible to find a suitable P.
From now on, the symbols Ω, F and P always denote the three elements of the
probability space (Ω, F, P).
For example, let Ω = {1, 2, 3, 4, 5, 6}, representing one roll of a die, and define X : Ω → R by
X(ω) = 0 if ω is odd, and X(ω) = 1 if ω is even.
We write
X −1 (A) = {ω ∈ Ω ; X(ω) ∈ A},
for A ⊆ R, which is called the pre-image of A under X . In words, X −1 (A) is the set of outcomes
ω for which the property X(ω) falls inside the set A. In our example above X −1 ({0}) = {1, 3, 5},
X −1 ({1}) = {2, 4, 6} and X −1 ({0, 1}) = {1, 2, 3, 4, 5, 6}.
It is common to write X −1 (a) in place of X −1 ({a}), because it makes easier reading. Similarly,
for an interval (a, b) ⊆ R we write X −1 (a, b) in place of X −1 ((a, b)).
Definition 2.2.1 Let G be a σ -field on Ω. We say that X : Ω → R is G -measurable if
X −1 (I) ∈ G for every subinterval I ⊆ R.
If it is clear which σ -field G we mean to use, we might simply say that X is measurable. We
will often shorten this to writing simply X ∈ mG .
For a probability space (Ω, F, P), we say that X : Ω → R is a random variable if X is
F -measurable. The relationship to the usual notation for probability is that P[X ∈ A] means
P[X −1 (A)], so that e.g. P[X = a] = P[X −1 ({a})] and P[a < X < b] = P[X −1 ((a, b))].
We usually prefer writing P[X = a] and P[a < X < b] because we find them more intuitive; we
like to think of X as an object that takes a random value.
For example, suppose we toss a coin twice, with Ω = {HH, HT, T H, T T } as in Example 2.1.2.
If we take our σ -field to be F = P(Ω), the set of all subsets of Ω, then any function X : Ω → R
is F -measurable. However, suppose we choose instead
G = {∅, {HH}, {HT, T H, T T }, Ω}
and let X be the number of tails shown. Then we have X −1 ([0, 1]) = {HH, HT, T H} ∉ G , so X
is not G -measurable. The intuition here is that the σ -field G ‘isn’t big enough’ for X , because G
only contains information about whether we threw two heads (or not).
Rigorously, if we want to check that X is G -measurable, we have to check that X −1 (I) ∈ G for
every subinterval I ⊆ R. This can be tedious, especially if X takes many different values.
Fortunately, we will shortly see that, in practice, there is rarely any need to do so. What is
important for us is to understand the role played by a σ -field.
Lemma 2.2.2 Let G be a σ -field on Ω. Let X : Ω → R, and suppose X takes a finite or countable
set of values {x1 , x2 , . . .}. Then: X ∈ mG if and only if {X = xj } ∈ G for all j .
Proof: Let us first prove the ⇒ implication. So, assume the left hand side: X ∈ mG . Since
{X = xj } = X −1 (xj ) and {xj } = [xj , xj ] is a subinterval of R, by Definition 2.2.1 we have that
{X = xj } ∈ G . Since this holds for any j = 1, . . . , n, we have shown the right hand side.
Next, we will prove the ⇐ implication. So, we now assume the right hand side: {X = xj } ∈ G
for all j . Let I be any subinterval of R, and define J = {j ; xj ∈ I}. Then,
X −1 (I) = {ω ∈ Ω ; X(ω) ∈ I} = ∪_{j∈J} {ω ; X(ω) = xj } = ∪_{j∈J} {X = xj }.
This is a countable union of events that belong to G , so it is itself in G . Hence X ∈ mG .
For example, let Ω = {1, 2, 3, 4, 5, 6}, representing one roll of a die, and consider
G1 = {∅, {1, 3, 5}, {2, 4, 6}, Ω}   and   G2 = {∅, {1, 2, 3}, {4, 5, 6}, Ω}.
Here, G1 contains the information of whether the roll is even or odd, and G2 contains the infor-
mation of whether the roll is ≤ 3 or > 3. It’s easy to check that G1 and G2 are both σ -fields.
Let us define
X1 (ω) = 0 if ω is odd, and X1 (ω) = 1 if ω is even;
X2 (ω) = 0 if ω ≤ 3, and X2 (ω) = 1 if ω > 3.
That is, X1 tests if the roll is even, and X2 tests if the roll is less than or equal to three. Based
on our intuition about information we should expect that X1 is measurable with respect to G1
but not G2 , and that X2 is measurable with respect to G2 but not G1 .
We can justify our intuition rigorously using Lemma 2.2.2. We have X1−1 (0) = {1, 3, 5} and
X1−1 (1) = {2, 4, 6}, so X1 ∈ mG1 but X1 ∉ mG2 . Similarly, we have X2−1 (0) = {1, 2, 3} and
X2−1 (1) = {4, 5, 6}, so X2 ∉ mG1 and X2 ∈ mG2 .
Let us extend this example by introducing a third σ -field,
G3 = σ({1, 3}, {2}, {4}, {5}, {6}).
The σ -field G3 is, by Definition 2.1.7, the smallest σ -field containing the events {1, 3}, {2}, {4}, {5}
and {6}. It contains the information of which ω ∈ Ω we threw, except that it can’t tell the difference
between a 1 and a 3. If we tried to write G3 out in full we would discover that it had 32 elements
(and probably make some mistakes!) so instead we just use Definition 2.1.7.
To check if X1 ∈ mG3 , we need to check if G3 contains {1, 3, 5} and {2, 4, 6}. We can write
{1, 3, 5} = {1, 3} ∪ {5}   and   {2, 4, 6} = {2} ∪ {4} ∪ {6},
which shows that {1, 3, 5}, {2, 4, 6} ∈ G3 because, in both cases, the right hand sides are made up
of sets that we already know are in G3 , plus countable set operations. Hence, X1 is G3 measurable.
You can check for yourself that X2 is also G3 measurable.
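Lemma 2.2.2 makes such checks a finite computation. The Python sketch below (an illustration, not part of the notes) runs the check for the die-roll examples above:

    Omega = {1, 2, 3, 4, 5, 6}

    def is_measurable(X, G):
        """Lemma 2.2.2: X is G-measurable iff every pre-image {X = x} lies in G."""
        G = {frozenset(A) for A in G}
        preimages = {}
        for w in Omega:
            preimages.setdefault(X(w), set()).add(w)
        return all(frozenset(P) in G for P in preimages.values())

    G1 = [set(), {1, 3, 5}, {2, 4, 6}, Omega]
    G2 = [set(), {1, 2, 3}, {4, 5, 6}, Omega]
    X1 = lambda w: 0 if w % 2 == 1 else 1   # tests if the roll is even
    X2 = lambda w: 0 if w <= 3 else 1       # tests if the roll is > 3

    print(is_measurable(X1, G1), is_measurable(X1, G2))  # True False
    print(is_measurable(X2, G1), is_measurable(X2, G2))  # False True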
Remark 2.2.3 ( ) For continuous random variables, there is no equivalent of Lemma 2.2.2.
More sophisticated tools from measure theory are needed – see MAS350/61022.
In words, σ(X) is the σ -field generated by the sets X −1 (I) for intervals I . The intuition is that
σ(X) is the smallest σ -field of events on which the random behaviour of X depends.
For example, consider throwing a fair die. Let Ω = {1, 2, 3, 4, 5, 6}, let F = P(Ω) and let
X(ω) = 1 if ω is odd, and X(ω) = 2 if ω is even.
Then X(ω) ∈ {1, 2}, with pre-images X −1 (1) = {1, 3, 5} and X −1 (2) = {2, 4, 6}. The smallest
σ -field that contains both of these subsets is
σ(X) = {∅, {1, 3, 5}, {2, 4, 6}, Ω}.
The information contained in this σ -field is whether ω is even or odd, which is precisely the same
information given by the value of X(ω).
In general, if X takes lots of different values, σ(X) could be very big and we would have no
hope of writing it out explicitly. Here’s another example: suppose that
Y (ω) = 1 if ω = 1,   2 if ω = 2,   3 if ω ≥ 3.
Then Y (ω) ∈ {1, 2, 3} with pre-images Y −1 (1) = {1}, Y −1 (2) = {2} and Y −1 (3) = {3, 4, 5, 6}.
The smallest σ -field containing these three subsets is
σ(Y ) = {∅, {1}, {2}, {3, 4, 5, 6}, {1, 2}, {1, 3, 4, 5, 6}, {2, 3, 4, 5, 6}, Ω}.
The information contained in this σ -field is whether ω is equal to 1, 2 or some number ≥ 3. Again,
this is precisely the same information as is contained in the value of Y (ω).
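On a finite Ω we can even construct σ(Y ) explicitly, by closing the pre-images of Y under complements and unions until nothing new appears. A Python sketch (illustrative only):

    Omega = frozenset({1, 2, 3, 4, 5, 6})

    def sigma_generated(blocks):
        """Close the generating sets under complement and pairwise union."""
        F = {frozenset(), Omega} | {frozenset(B) for B in blocks}
        changed = True
        while changed:
            changed = False
            for A in list(F):
                for B in list(F):
                    for C in (Omega - A, A | B):
                        if C not in F:
                            F.add(C)
                            changed = True
        return F

    # Pre-images of Y from the example above: {1}, {2} and {3,4,5,6}.
    sigma_Y = sigma_generated([{1}, {2}, {3, 4, 5, 6}])
    print(len(sigma_Y))                          # 8 events, as listed above
    print(sorted(sorted(A) for A in sigma_Y))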
It’s natural that X should be measurable with respect to the σ -field that contains precisely
the information on which X depends. Formally:
Lemma 2.2.5 Let X : Ω → R. Then X is σ(X)-measurable.
Proof: Let I be a subinterval of R. Then, by definition of σ(X), we have that X −1 (I) ∈ σ(X)
for all I .
If we have a finite or countable set of random variables X1 , X2 , . . . we define σ(X1 , X2 , . . .)
to be σ(X1−1 (I), X2−1 (I), . . . ; I is a subinterval of R). The intuition is the same: σ(X1 , X2 , . . .)
corresponds to the information jointly contained in X1 , X2 , . . ..
Proposition 2.2.6 Let G be a σ -field, let X, Y ∈ mG and let a ∈ R. Then
aX, X + Y, XY and 1/X ¹ are all G -measurable, (2.2)
as is the pointwise limit of any convergent sequence of G -measurable random variables.
Essentially, every natural way of combining random variables together leads to other random
variables. Proposition 2.2.6 can usually be used to show this. We won’t prove Proposition 2.2.6
in this course, see MAS31002/61022.
For example, if X is a random variable then so is (X² + X)/2. For a more difficult example,
suppose that X is a random variable and let Y = e^X , which means that Y (ω) = lim_{n→∞} Σ_{i=0}^n X(ω)^i / i! .
Recall that we know from analysis that this limit exists, since e^x = lim_{n→∞} Σ_{i=0}^n x^i / i! exists for
all x ∈ R. Each Yn = Σ_{i=0}^n X^i / i! is a random variable (we could use (2.2) repeatedly to show
this) and, since the limit exists, Y (ω) = lim_{n→∞} Yn (ω) is measurable.
In general, if X is a random variable and g : R → R is any ‘sensible’ function then g(X) is also
a random variable. This includes polynomials, powers, all trig functions, all monotone functions,
all continuous and piecewise continuous functions, integrals/derivatives, etc etc.
Independence
We can express the concept of independence, which you already know about for random variables,
in terms of σ -fields. Recall that two events E1 , E2 ∈ F are said to be independent if P[E1 ∩ E2 ] =
P[E1 ]P[E2 ]. Using σ -fields, we have a consistent way of defining independence, for both random
variables and events.
Definition 2.2.7 Sub-σ -fields G1 , G2 of F are said to be independent if P[G1 ∩ G2 ] = P[G1 ]P[G2 ]
for all G1 ∈ G1 and G2 ∈ G2 .
Events E1 and E2 are independent if σ(E1 ) and σ(E2 ) are independent.
Random variables X1 and X2 are independent if σ(X1 ) and σ(X2 ) are independent.
It can be checked that, for events and random variables, this definition is equivalent to the
definitions you will have seen in earlier courses. The same principle, of using the associated σ -
fields, applies to defining what it means for e.g. a random variable and an event to be independent.
We won’t check this claim as part of our course, see MAS31002/61022.
1 In the case of 1/X, we require that P[X = 0] = 0.
2.3 Infinite Ω
So far we have focused on finite sample spaces of the form Ω = {x1 , x2 , . . . , xn }. In such a case we
would normally take F = P(Ω), which is also a finite set. Since F contains every subset of Ω,
any σ -field on Ω is a sub-σ -field of F . We have seen how it is possible to construct other σ -fields
on Ω too.
In this case we can define a probability measure on Ω by choosing a finite sequence a1 , a2 , . . . , an
such that each ai ∈ [0, 1] and Σ_{i=1}^n ai = 1. We set P[xi ] = ai . This naturally extends to defining
P[A] = Σ_{i ; xi ∈A} P[xi ] = Σ_{i ; xi ∈A} ai .    (2.3)
It is hopefully obvious (and tedious to check) that, with this definition, P is a probability measure.
Consequently (Ω, F, P) is a probability space.
All our experiments with (finitely many tosses/throws of) dice and coins fit into this category
of examples. In fact, if our experiment has countably many outcomes, say, Ω = {x1 , x2 , . . .} we
can still work in much the same way, and the sum in (2.3) will become an infinite series that sums
to 1.
However, the theory of stochastic processes, as well as most sophisticated examples of stochastic
models, require an uncountable sample space. In such cases, we can’t use (2.3), because there is
no such thing as an uncountable sum.
F = σ(X1 , X2 , . . .)
i.e. F is the smallest σ -field with respect to which all the Xn are random variables. Then it is
possible to construct a probability measure P on (Ω, F) such that:
• P[Xn = H] = P[Xn = T ] = 1/2 for all n ∈ N.
• The random variables X1 , X2 , . . . are independent.
From this, you can work with P without having to know how P was constructed. You don’t even
need to know exactly which subsets of Ω are in F , because Proposition 2.2.6 gives you access to
plenty of random variables.
Remark 2.3.1 ( ) In this case it turns out that F is much smaller than P(Ω). In fact, if
we tried to take F = P(Ω), we would (after some significant effort) discover that there is no
probability measure P : P(Ω) → [0, 1] (i.e. satisfying Definition 2.1.3) in which the coin tosses are
independent. This is irritating, and surprising, and we just have to live with it.
Almost surely
In the example from Section 2.3 we used Ω = {H, T }N , which is the set of all sequences made
up of H s and T s. Our probability measure made the coin tosses independent and fair, and we
used the random variable Xn for the nth toss.
Let’s examine this example a bit. First let us note that, for any individual sequence ω1 , ω2 , . . .
of heads and tails, by independence
P[X1 = ω1 , X2 = ω2 , . . .] = (1/2) · (1/2) · (1/2) · . . . = 0.
So every element of Ω has probability zero. This is not a problem – if we take enough elements
of Ω together then we do get non-zero probabilities, for example
P[X1 = H] = P[ω ∈ Ω such that ω1 = H] = 1/2,
which is not surprising.
The probability that we never throw a head is
P[for all n, Xn = T ] = (1/2) · (1/2) · . . . = 0,
which means that the probability that we eventually throw a head is
P[for some n, Xn = H] = 1 − P[for all n, Xn = T ] = 1.
So, the event {for some n, Xn = H} has probability 1, but is not equal to the whole sample space
Ω. To handle this situation we have a piece of terminology.
Definition 2.3.2 If the event E has P[E] = 1, then we say E occurs almost surely.
So, we would say that ‘almost surely, our coin will eventually throw a head’. We might say
that ‘Y ≤ 1’ almost surely, to mean that P[Y ≤ 1] = 1. This piece of terminology will be used
very frequently from now on. We might sometimes say that an event ‘almost always’ happens,
with the same meaning.
For another example, suppose that we define qnH and qnT to be the proportion of heads and,
respectively, tails in the first n coin tosses X1 , X2 , . . . , Xn . Formally, this means that
qnH = (1/n) Σ_{i=1}^n 1{Xi = H}   and   qnT = (1/n) Σ_{i=1}^n 1{Xi = T }.
Here 1{Xi = H} is equal to 1 if Xi = H , and equal to zero otherwise; similarly for 1{Xi = T }.
We will think a bit more about random variables of this type in the next section. Of course
qnH + qnT = 1.
The random variables 1{Xi = H} are i.i.d. with E[1{Xi = H}] = 1/2, hence by Theorem 1.1.1
we have P[qnH → 1/2 as n → ∞] = 1, and by the same argument we have also P[qnT → 1/2 as
n → ∞] = 1. In words, this means that in the long run half our tosses will be tails and half will be
heads (which makes sense - our coin is fair). That is, the event
E = { lim_{n→∞} qnH = 1/2 and lim_{n→∞} qnT = 1/2 }
occurs almost surely.
There are many many examples of sequences (e.g. HHT HHT HHT . . .) that don’t have
qnH → 1/2 and qnT → 1/2. We might think of the event E as being only a ‘small’ subset of Ω, but it
has probability 1.
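A short simulation illustrates the event E: the proportion of heads in n fair tosses settles towards 1/2 as n grows. (Python sketch, illustrative only.)

    import random

    random.seed(1)
    n, heads = 100_000, 0
    for i in range(1, n + 1):
        heads += random.random() < 0.5        # one fair toss; True counts as a head
        if i in (10, 100, 1_000, 10_000, 100_000):
            print(i, heads / i)               # q_i^H, drifting towards 1/2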
2.4 Expectation
There is only one part of the ‘usual machinery’ for probability that we haven’t yet discussed,
namely expectation.
Recall that the expectation of a discrete random variable X that takes the values {xi : i ∈ N}
is given by
E[X] = Σ_{i} xi P[X = xi ].    (2.4)
For a continuous random variable, the expectation uses an integral against the probability density
function,
E[X] = ∫_{−∞}^{∞} x fX (x) dx.    (2.5)
Recall also that it is possible for limits (i.e. infinite sums) and integrals to be infinite, or not exist
at all.
We are now conscious of the general definition of a random variable X , as an F -measurable
function from Ω to R. There are many random variables that are neither discrete nor continuous,
and for such cases (2.4) and (2.5) are not valid; we need a more general approach.
With Lebesgue integration, the expectation E can be defined using a single definition that works
for both discrete and continuous (and other more exotic) random variables. This definition relies
heavily on analysis and is well beyond the scope of this course. Instead, Lebesgue integration is
covered in MAS31002/61022.
For purposes of this course, what you should know is: E[X] is defined for all X such that either
(1) X ≥ 0, in which case we may have E[X] = ∞, or (2) E[|X|] < ∞, in which case E[X] is finite.
The point here is that we are prepared to allow ourselves to write E[X] = ∞ (e.g. when the sum
or integral in (2.4) or (2.5) tends to ∞) provided that X ≥ 0. We are not prepared to allow
expectations to equal −∞, because we have to avoid nonsensical ‘∞ − ∞’ situations.
It’s worth knowing that if X ≥ 0 and P[X = ∞] > 0, then E[X] = ∞. In words, the slightest
chance of X being infinite will outweigh all of the finite possibilities and make E[X] infinite.
You may still use (2.4) and (2.5), in the discrete/continuous cases. You may also assume that
all the ‘standard’ properties of E hold: for example linearity, E[aX + bY ] = aE[X] + bE[Y ], and
monotonicity, if X ≤ Y then E[X] ≤ E[Y ].
You should become familiar with any of the properties that you are not already used to using.
The proofs of these properties are part of the formal construction of E and are not part of our
course.
Indicator functions
One important type of random variable is an indicator function. Let A ∈ F , then the indicator
function of A is the function
1A (ω) = 1 if ω ∈ A, and 1A (ω) = 0 if ω ∉ A.
The indicator function is used to tell if an event occurred (in which case it is 1) or did not occur
(in which case it is 0). It is useful to remember that
P[A] = E[1A ].
We will sometimes not put the A as a subscript and write e.g. 1{X < 0} for the indicator function
of the event that X < 0.
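The identity P[A] = E[1A ] is also the idea behind Monte Carlo estimation: we can estimate a probability by averaging an indicator function over many independent samples. A Python sketch, taking A = {X < 0} for a standard normal X (an assumed example, not from the notes):

    import random

    random.seed(0)
    N = 100_000
    hits = sum(1 for _ in range(N) if random.gauss(0.0, 1.0) < 0)
    print(hits / N)   # estimate of P[X < 0] = E[1_{X<0}], close to 0.5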
As usual, let G denote a sub σ -field of F .
Lemma 2.4.2 Let A ∈ G . Then 1A is G -measurable.
Proof: Let us write Y = 1A . Note that Y is a discrete random variable, which can take the
two values 0 and 1. We have Y −1 (1) = {Y = 1} = A and Y −1 (0) = {Y = 0} = Ω \ A. By
Lemma 2.2.2, Y is G measurable.
Indicator functions allow us to condition, meaning that we can break up a random variable
into two or more cases. For example, given any random variable X we can write
X = X 1{X≥1} + X 1{X<1} .    (2.6)
Precisely one of the two terms on the right hand side is non-zero. If X ≥ 1 then the first term
takes the value X and the second is zero; if X < 1 then the second term is equal to X and the
first term is zero.
We can use (2.6) to prove a useful inequality. Putting |X| in place of X , and then taking E,
we obtain
E[|X|] = E[ |X| 1{|X|≥1} ] + E[ |X| 1{|X|<1} ]
≤ E[ X² 1{|X|≥1} ] + 1
≤ E[X²] + 1.
Here, to deduce the second line, the key point is we can only use the inequality |x| ≤ x2 if x ≥ 1.
Hence, for the first term we can use that |X|1{|X|≥1} ≤ X 2 1{|X|≥1} . For the second term, we use
that |X|1{|X|<1} ≤ 1. In both cases, we also need the monotonicity of E.
Lp spaces
It will often be important for us to know whether a random variable X has finite mean and
variance. Some random variables do not, see exercise 2.8 (or MAS2010 or MAS31002) for example.
Random variables with finite mean and variance are easier to work with than those without,
and many of the results in this course require these conditions.
We use some notation: for p ∈ [1, ∞), we write Lp for the set of random variables X such that
E[|X|p ] < ∞.
In this course, we will only be interested in the cases p = 1 and p = 2. These cases have the
following set of useful properties:
1. L1 is the set of random variables for which E[X] is defined and finite.
2. L2 is the set of random variables with finite variance. This comes from the fact that var(X) =
E[X 2 ] − E[X]2 (more strictly, it needs some integration theory from MAS31002/61022).
Often, to check if X ∈ Lp we must calculate E[|X|p ]. A special case where it is automatic is the
following.
Definition 2.4.4 We say that a random variable X is bounded if there exists (deterministic)
c ∈ R such that |X| ≤ c.
If X is bounded, then using monotonicity we have E[|X|p ] ≤ E[cp ] = cp < ∞, which means that
X ∈ Lp , for all p.
2.3 Let Ω = {H, T }^N be the probability space from Section 2.3, corresponding to an infinite
sequence of independent fair coin tosses (Xn )_{n=1}^∞ .
(a) Fix m ∈ N. Show that the probability that the random sequence X1 , X2 , . . . contains
precisely m heads is zero.
(b) Deduce that, almost surely, the sequence X1 , X2 , . . . contains infinitely many heads and
infinitely many tails.
2.4 Let Ω = {1, 2, 3, 4, 5, 6}, corresponding to one roll of a die. In each of the following cases
describe, in words, the information contained within the given σ -field.
(c) F3 = {∅, {1, 2}, {3, 4}, {5, 6}, {1, 2, 3, 4}, {3, 4, 5, 6}, {1, 2, 5, 6}, Ω}.
On random variables
2.5 Let Ω = {1, 2, 3, 4, 5} and set
G1 = σ({1, 5}, {2, 4}, {3}),
G2 = {∅, {1, 2}, {3, 4, 5}, Ω}.
2.6 Let Ω = {HH, HT, T H, T T }, representing two coin tosses. Define X to be the total number
of heads shown. Write down all the events in σ(X).
2.8 Let X be a random variable with the probability density function f : R → R given by
2.10 Let a > 0 and let X be a random variable such that X ≥ 0. Show that P[X ≥ a] ≤ (1/a) E[X].
Challenge questions
2.11 Show that if P[X ≥ 0] = 1 and E[X] = 0 then P[X = 0] = 1.
Chapter 3
We will introduce conditional expectation, which provides us with a way to estimate random
quantities based on only partial information. We will also introduce martingales, which are the
mathematical way to capture the concept of a fair game.
You might also have seen a second definition, using probability density functions, for continuous
random variables. These definitions are problematic, for several reasons, chiefly (1) it’s not imme-
diately clear how the two definitions interact and (2) we don’t want to be restricted to handling
only discrete or only continuous random variables.
In this section, we define the conditional expectation of random variables using σ -fields. In this
setting we are able to give a unified definition which is valid for general random variables. The
definition is originally due to Kolmogorov (in 1933), and is sometimes referred to as Kolmogorov’s
conditional expectation. It is one of the most important concepts in modern probability theory.
Conditional expectation is a mathematical tool with the following function. We have a
probability space (Ω, F, P) and a random variable X : Ω → R. However, F is large and we want
to work with a sub-σ -field G , instead. As a result, we want to have a random variable Y such
that
1. Y is G -measurable,
2. Y is, in some sense, ‘as close as possible’ to X .
The second statement on this wish-list does not fully make sense; there are many different ways
in which we could compare X to a potential Y .
Theorem 3.1.1 Let G be a sub-σ -field of F and let X ∈ L1 . Then there exists Y ∈ L1 such that:
1. Y is G -measurable,
2. for all G ∈ G , E[Y 1G ] = E[X 1G ].
Moreover, any two random variables Y with these properties are almost surely equal.
The first and second statements here correspond respectively to the items on our wish-list.
Definition 3.1.2 We refer to Y as (a version of) the conditional expectation of X given G , and
we write
Y = E[X | G].
Since any two such Y are almost surely equal, we sometimes refer to Y simply as the
conditional expectation of X . This is a slight abuse of notation, but it is commonplace and
harmless.
Proof of Theorem 3.1.1 is beyond the scope of this course. Loosely speaking, there is an abstract
recipe which constructs E[X|G]. It begins with the random variable X , and then averages out over
all the information that is not accessible to G , leaving only as much randomness as G can support,
resulting in E[X|G]. In this sense the map X 7→ E[X|G] simplifies (i.e. reduces the amount of
randomness in) X in a very particular way, to make it G measurable.
It is important to remember that E[X|G] is (in general) a random variable. It is also important
to remember that the two objects
E[X | G]   and   E[X | Z = z]
are quite different. They are both useful. We will explore the connection between them in Section
3.1. Before doing so, let us look at a basic example.
Let X1 , X2 be independent random variables such that P[Xi = −1] = P[Xi = 1] = 1/2. Set
F = σ(X1 , X2 ). We will show that
E[X1 + X2 | σ(X1 )] = X1 .    (3.2)
To do so, we should check that X1 satisfies the two conditions in Theorem 3.1.1, with
X = X1 + X2
Y = X1
G = σ(X1 ).
The first condition is immediate, since by Lemma 2.2.5 X1 is σ(X1 )-measurable i.e. Y ∈ mG . To
see the second condition, let G ∈ σ(X1 ). Then 1G ∈ mσ(X1 ) by Lemma 2.4.2 and X2 ∈ mσ(X2 ),
and these σ -fields are independent, so 1G and X2 are independent. Hence
E[(X1 + X2 ) 1G ] = E[X1 1G ] + E[X2 1G ] = E[X1 1G ] + E[X2 ]E[1G ] = E[X1 1G ].
This equation says precisely that E[X 1G ] = E[Y 1G ]. We have now checked both conditions, so
by Theorem 3.1.1 we have E[X|G] = Y , meaning that E[X1 + X2 |σ(X1 )] = X1 , which proves our
claim in (3.2).
The intuition for this, which is plainly visible in our calculation, is that X2 is independent of
σ(X1 ) so, thinking of conditional expectation as an operation which averages out all randomness
in X = X1 + X2 that is not G = σ(X1 ) measurable, we would average out X2 completely
i.e. E[X2 ] = 0.
We could equally think of X1 as being our best guess for X1 + X2 , given only information in
σ(X1 ), since E[X2 ] = 0. In general, guessing E[X|G] is not so easy!
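We can illustrate (3.2) numerically. In the Python sketch below (illustrative, not from the notes), we group samples of X1 + X2 according to the value of X1 ; averaging within each group averages out X2 , recovering X1 :

    import random

    random.seed(2)
    N = 200_000
    sums = {-1: [], 1: []}
    for _ in range(N):
        x1 = random.choice([-1, 1])
        x2 = random.choice([-1, 1])
        sums[x1].append(x1 + x2)

    for x1, vals in sorted(sums.items()):
        # approximates the value of E[X1 + X2 | sigma(X1)] on the event {X1 = x1}
        print(x1, sum(vals) / len(vals))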
Remark 3.1.3 This subsection is marked with a ( ), meaning that it is non-examinable. This
is so that you can forget the old definition and remember the new one!
To see the connection, we focus on the case where X, Z are random variables with finite sets of
values {x1 , . . . , xn }, {z1 , . . . , zm }. Let Y be the naive version of conditional expectation defined
in (3.1). That is,
Y (ω) = Σ_j 1{Z(ω)=zj } E[X|Z = zj ].
We can use Theorem 3.1.1 to check that, in fact, Y is a version of E[X|σ(Z)]. We want to check
that Y satisfies the two properties listed in Theorem 3.1.1.
• Since Z only takes finitely many values {z1 , . . . , zm }, from the above equation we have that
Y only takes finitely many values. These values are {y1 , . . . , ym } where yj = E[X|Z = zj ].
We note that each pre-image Y −1 (yj ) is a finite union of events of the form {Z = zj′ }, all of
which are elements of σ(Z).
This is sufficient (although we will omit the details) to show that Y is σ(Z)-measurable.
• We can calculate
E[Y 1{Z=zj } ] = E[X|Z = zj ] P[Z = zj ]
= Σ_i xi P[X = xi | Z = zj ] P[Z = zj ]
= Σ_i xi P[X = xi and Z = zj ]
= E[X 1{Z=zj } ].
Properly, to check that Y satisfies the second property in Theorem 3.1.1, we need to check
E[Y 1G ] = E[X 1G ] for a general G ∈ σ(Z) and not just G = {Z = zj }. However, for reasons
beyond the scope of this course, in this case (thanks to the fact that Z is finite) it’s enough
to consider only G of the form {Z = zj }.
Therefore, we have Y = E[X|σ(Z)] almost surely. In this course we favour writing E[X|σ(Z)]
instead of E[X|Z], to make it clear that we are looking at conditional expectation with respect
to a σ -field.
Proof of these properties is beyond the scope of our course. Note that the first five properties
above are common properties of both E[·] and E[· | G].
We’ll use these properties extensively, for the whole of the remainder of the course. They are
not on the formula sheet – you should remember them and become familiar with applying them.
Remark 3.2.2 ( ) Although we have not proved the properties in Proposition 3.2.1, they are
intuitive properties for conditional expectation to have.
For example, in the taking out what is known property, we can think of Z as already being
simple enough to be G measurable, so we’d expect that taking conditional expectation with respect
to G doesn’t need to affect it.
In the independence property, we can think of G as giving us no information about the value
X is taking, so our best guess at the value of X has to be simply E[X].
In the tower property for E[E[X|G]|H], we start with X , simplify it to be G measurable and
simplify it to be H measurable. But since H ⊆ G , we might as well have just simplified X enough
to be H measurable in a single step, which would be E[X|H].
Etc. It is a useful exercise for you to try and think of ‘intuitive’ arguments for the other
properties too, so that you can easily remember them.
Lemma 3.2.3 Let G be a sub-σ -field of F . Let X be an F -measurable random variable and let
Y = E[X|G]. Suppose that Y′ is a G -measurable random variable. Then
E[(X − Y )2 ] ≤ E[(X − Y′ )2 ].
Proof: We have
E[(X − Y′ )2 ] = E[(X − Y + Y − Y′ )2 ]
= E[(X − Y )2 ] + 2E[(X − Y )(Y − Y′ )] + E[(Y − Y′ )2 ]. (3.3)
For the middle term, we can write
E[(X − Y )(Y − Y′ )] = E[ E[(X − Y )(Y − Y′ ) | G] ] = E[ (Y − Y′ ) E[X − Y | G] ] = E[ (Y − Y′ )(E[X|G] − Y ) ].
Here, in the first step we used the ‘taking E’ property, in the second step we used Proposition
2.2.6 to tell us that Y − Y′ is G -measurable, followed by the ‘taking out what is known’ rule. In
the final step we used the linearity and measurability properties. Since E[X|G] = Y almost surely,
we obtain that E[(X − Y )(Y − Y′ )] = 0. Hence, since E[(Y − Y′ )2 ] ≥ 0, from (3.3) we obtain
E[(X − Y′ )2 ] ≥ E[(X − Y )2 ].
3.3 Martingales
In this section we introduce martingales, which are the mathematical representation of a ‘fair
game’. As usual, let (Ω, F, P) be a probability space. We refer to a sequence of random variables
(Sn )_{n=0}^∞ as a stochastic process. In this section of the course we only deal with discrete time
stochastic processes.
We have previously discussed the idea of gradually learning more and more information about
the outcome of some experiment, through seeing the information visible from gradually larger
σ -fields. We formalize this concept as follows.
Definition 3.3.1 A filtration is a sequence of σ -fields (Fn ) such that Fn ⊆ Fn+1 ⊆ F for all n.
Definition 3.3.2 We say that a stochastic process X = (Xn ) is adapted to the filtration (Fn ) if,
for all n, Xn is Fn measurable.
We should think of the filtration Fn as telling us which information we have access to at time
n = 1, 2, . . .. Thus, an adapted process is a process whose (random) value we know at all times
n ∈ N.
We are now ready to give the definition of a martingale.
Definition 3.3.3 A stochastic process M = (Mn ) is a martingale if:
1. (Mn ) is adapted,
2. Mn ∈ L1 for all n,
3. E[Mn | Fn−1 ] = Mn−1 almost surely, for all n ≥ 1.
We say that M is a submartingale if, instead of 3, we have E[Mn | Fn−1 ] ≥ Mn−1 almost surely.
We say that M is a supermartingale if, instead of 3, we have E[Mn | Fn−1 ] ≤ Mn−1 almost surely.
Remark 3.3.4 The second condition in Definition 3.3.3 is needed for the third to make sense.
Example 3.3.5 Let (Xn ) be i.i.d. random variables with P[Xn = 1] = P[Xn = −1] = 1/2, let
Fn = σ(X1 , . . . , Xn ) and let Sn = X1 + . . . + Xn (and S0 = 0). We can think of Sn as a game
in the following way. At each time n = 1, 2, . . . we toss a coin. We win the nth round if the coin
is heads, and lose if it is tails. Each time we win we score 1, each time we lose we score −1. Thus,
Sn is our score after n rounds. The process Sn is often called a simple random walk.
We claim that Sn is a martingale. To see this, we check the three properties in the definition.
(1) Since X1 , X2 , . . . , Xn ∈ mσ(X1 , . . . , Xn ) we have that Sn ∈ mFn for all n ∈ N. (2) Since
|Sn | ≤ n for all n ∈ N, E[|Sn |] ≤ n for all n, so Sn ∈ L1 for all n. (3) We have
E[Sn+1 | Fn ] = E[Xn+1 + Sn | Fn ]
= E[Xn+1 | Fn ] + E[Sn | Fn ]
= E[Xn+1 ] + Sn
= Sn .
Here, in the first line we used the linearity of conditional expectation. To deduce the second
line we used the relationship between independence and conditional expectation (for the first
term) and the measurability rule (for the second term). To deduce the final line we used that
E[Xn+1 ] = (1) · 1/2 + (−1) · 1/2 = 0.
At time n we have seen the result of rounds 1, 2, . . . , n, so the information we currently have
access to is given by Fn . This means that at time n we know S1 , . . . , Sn . But we don’t know
Sn+1 , because Sn+1 is not Fn -measurable. However, using our current information we can make
our best guess at what Sn+1 will be, which naturally is E[Sn+1 |Fn ]. Since the game is fair, in the
future, on average we do not expect to win more than we lose, that is E[Sn+1 |Fn ] = Sn .
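Since the next step of the walk depends on the past only through the current value Sn , we can estimate E[Sn+1 | Fn ] by simulation, grouping runs according to Sn . A Python sketch (illustrative only):

    import random
    from collections import defaultdict

    random.seed(3)
    n, N = 10, 200_000
    buckets = defaultdict(list)
    for _ in range(N):
        s_n = sum(random.choice([-1, 1]) for _ in range(n))
        buckets[s_n].append(s_n + random.choice([-1, 1]))   # one more step

    for s, vals in sorted(buckets.items()):
        print(s, sum(vals) / len(vals))   # approximately s: the game is fair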
In this course we will see many examples of martingales, and we will gradually build up
an intuition for how to recognize a martingale. There is, however, one easy sufficient (but not
necessary) condition under which we can recognize that a stochastic process is not a martingale.
Lemma 3.3.6 Let (Fn ) be a filtration and suppose that (Mn ) is a martingale. Then
E[Mn ] = E[M0 ]
for all n ∈ N.
Proof: We have E[Mn+1 |Fn ] = Mn . Taking expectations and using the ‘taking E’ property
from Proposition 3.2.1, we have E[Mn+1 ] = E[Mn ]. The result follows by a trivial induction.
Suppose, now, that (Xn ) is an i.i.d. sequence of random variables such that P[Xi = 2] =
P[Xi = −1] = 1/2. Note that E[Xn ] > 0. Define Sn and Fn as before. Now, E[Sn ] = Σ_{i=1}^n E[Xi ] =
n E[X1 ] > 0 = E[S0 ] for n ≥ 1, so Lemma 3.3.6 shows that Sn is not a martingale. Essentially the
same calculation as before gives E[Sn+1 | Fn ] = Sn + E[Xn+1 ] ≥ Sn .
Hence Sn is a submartingale.
In general, if (Mn ) is a submartingale, then by definition E[Mn+1 | Fn ] ≥ Mn , so taking
expectations gives us E[Mn+1 ] ≥ E[Mn ]. For supermartingales we get E[Mn+1 ] ≤ E[Mn ]. In
words: submartingales, on average, increase, whereas supermartingales, on average, decrease.
The use of super- and sub- is counterintuitive in this respect.
Remark 3.3.7 Sometimes we will want to make it clear which filtration is being used in the
definition of a martingale. To do so we might say that (Mn ) is an ‘Fn -martingale’, or that (Mn )
is ‘a martingale with respect to Fn ’. We use the same notation for super/sub-martingales.
Our definition of a filtration and a martingale both make sense if we look at only a finite set of
times n = 1, . . . , N . We sometimes also use the terms filtration and martingale in this situation.
We end this section with two important general examples of martingales. You should check
the conditions yourself, as exercise 3.3.
Example 3.3.8 Let (Xn ) be a sequence of i.i.d. random variables such that E[Xn ] = 1 for all n,
and there exists c ∈ R such that |Xn | ≤ c for all n. Define Fn = σ(X1 , . . . , Xn ). Then
Mn = Π_{i=1}^n Xi
is a martingale.
Example 3.3.9 Let Z ∈ L1 be a random variable and let (Fn ) be a filtration. Then
Mn = E[Z|Fn ]
is a martingale.
Use the properties of conditional expectation to find E[S2 | σ(X1 )] and E[S22 | σ(X1 )] in terms
of X1 , X2 and their expected values.
3.2 Let (Xn ) be a sequence of independent random variables such that P[Xn = 2] = 1/3 and
P[Xn = −1] = 2/3. Set Fn = σ(Xi ; i ≤ n). Show that Sn = Σ_{i=1}^n Xi is an Fn martingale.
3.4 Let (Mt ) be a stochastic process that is both a submartingale and a supermartingale.
Show that (Mt ) is a martingale.
3.5 (a) Let (Mn ) be an Fn martingale. Show that, for all 0 ≤ n ≤ m, E[Mm | Fn ] = Mn .
(b) Guess and state (without proof) the analogous result to (a) for submartingales.
3.6 Let (Mn ) be an Fn martingale and suppose Mn ∈ L2 for all n. Show that
E[Mn+1² | Fn ] = Mn² + E[(Mn+1 − Mn )² | Fn ].    (3.4)
Challenge questions
3.8 In the setting of 3.1, show that E[X1 | σ(Sn )] = Sn / n.
Chapter 4
Stochastic processes
In this chapter we introduce stochastic processes, with a selection of examples that are commonly
used as building blocks in stochastic modelling. We show that these stochastic processes are
closely connected to martingales.
4.1 Random walks
Let (Xi ) be a sequence of i.i.d. random variables such that
P[Xi = 1] = P[Xi = −1] = 1/2.    (4.1)
The simple symmetric random walk is the stochastic process
Sn = Σ_{i=1}^n Xi .
By convention, this means that S0 = 0. The word ‘simple’ refers to the fact that the walk moves
by precisely 1 unit of space in each step of time i.e. |Xi | = 1. A sample path of Sn , which means
a sample of the sequence (S0 , S1 , S2 , . . .), might look like:
Note that when time is discrete t = 0, 1, 2, . . . it is standard to draw the location of the random
walk (and other stochastic processes) as constant in between integer time points.
Because of (4.1), the random walk is equally likely to move upwards or downwards. This case
is known as the ‘symmetric’ random walk because, if S0 = 0, the two stochastic processes Sn and
−Sn have the same distribution.
We have already seen (in Section 3.3) that Sn is a martingale, with respect to its generated
filtration
Fn = σ(X1 , . . . , Xn ) = σ(S1 , . . . , Sn ).
It should seem very natural that (Sn ) is a martingale – going upwards as much as downwards is
‘fair’.
The asymmetric random walk
Now suppose instead that (Xi ) is a sequence of i.i.d. random variables such that
P[Xi = 1] = p,   P[Xi = −1] = q,
where p + q = 1 and p, q > 0, and again set Sn = Σ_{i=1}^n Xi ; this is the asymmetric random walk.
The key difference to the symmetric random walk is that here we have p ≠ q (the symmetric
random walk has p = q = 1/2). The asymmetric random walk is more likely to step upwards than
downwards if p > q , and vice versa if q > p. The technical term for this behaviour is drift. A
sample path for the case p > q might look like:
This is ‘unfair’, because of the drift upwards, so we should suspect that the asymmetric random
walk is not a martingale. In fact,
E[Sn ] = Σ_{i=1}^n E[Xi ] = Σ_{i=1}^n (p − q) = n(p − q),    (4.2)
whereas E[S0 ] = 0. Thus, Lemma 3.3.6 confirms that Sn is not a martingale. However, the
process
Mn = Sn − n(p − q) (4.3)
is a martingale. The key is that the term n(p − q) compensates for the drift and ‘restores fairness’.
We’ll now prove that (Mn ) is a martingale. Since Xi ∈ mFn for all i ≤ n, by Proposition 2.2.6
we have Sn − n(p − q) ∈ mFn . Since |Xi | ≤ 1 we have
E[|Mn |] ≤ E[|Sn |] + n|p − q| ≤ n + n|p − q| < ∞,
so Mn ∈ L1 . Lastly,
E[Mn+1 | Fn ] = E[Sn + Xn+1 | Fn ] − (n + 1)(p − q)
= Sn + E[Xn+1 ] − (n + 1)(p − q)
= Sn + (p − q) − (n + 1)(p − q)
= Mn .
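A simulation shows the compensation at work: E[Sn ] drifts like n(p − q), while Mn = Sn − n(p − q) keeps mean 0. (Python sketch with an illustrative value of p.)

    import random

    random.seed(4)
    p = 0.7
    q = 1 - p
    N, n_max = 50_000, 20
    mean_S = [0.0] * (n_max + 1)
    for _ in range(N):
        s = 0
        for n in range(1, n_max + 1):
            s += 1 if random.random() < p else -1
            mean_S[n] += s / N               # accumulate the sample mean of S_n

    for n in (5, 10, 20):
        print(n, mean_S[n], mean_S[n] - n * (p - q))   # drift n(p-q); approx 0 after compensation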
4.2 Urn processes
Consider an urn that initially contains one red ball and one black ball. At each time step
n = 1, 2, . . . we do the following:
1. Draw a ball from the urn, look at its colour, and return this ball to the urn.
2. Add a new ball, of the same colour as the ball drawn, into the urn.
So, at time n (which means: after the nth iteration of the above steps is completed) there are n+2
balls in the urn. This process is known as the Pólya urn. Pólya was a Hungarian mathematician
who made contributions across a wide spectrum of mathematics, in the first half of the 20th century.
Let Bn be the number of red balls in the urn at time n, and note that B0 = 1. Set (Fn ) to be
the filtration generated by (Bn ).
Our first step is to note that Bn itself is not a martingale. The reason is that over time we will
put more and more red balls into the urn, so the number of red balls drifts upwards over time.
Formally, we can note that
E[Bn+1 | Fn ] = (Bn + 1) · Bn/(n + 2) + Bn · (1 − Bn/(n + 2)) = Bn + Bn/(n + 2),    (4.4)
which is strictly greater than Bn . However, the proportion of red balls in the urn,
Mn = Bn/(n + 2),
is a martingale. To see this, we calculate
E[Mn+1 | Fn ] = E[Bn+1/(n + 3) | Fn ]
= (Bn + 1)/(n + 3) · E[1{(n+1)th draw is red} | Fn ] + Bn/(n + 3) · E[1{(n+1)th draw is black} | Fn ]
= (Bn + 1)/(n + 3) · Bn/(n + 2) + Bn/(n + 3) · (1 − Bn/(n + 2))
= (Bn² + Bn)/((n + 2)(n + 3)) + ((n + 2)Bn − Bn²)/((n + 2)(n + 3))
= (n + 3)Bn/((n + 2)(n + 3))
= Bn/(n + 2)
= Mn .
We have Mn ∈ mFn and since Mn ∈ [0, 1] we have that Mn ∈ L1 . Hence (Mn ) is a martingale.
We can think of Mn = Bn/(n + 2) as a compensation mechanism; Bn tends to increase, and we
compensate for this increase by dividing by n + 2.
Remark 4.2.1 The calculation of E[Mn+1 | Fn ] is written out in full as a second example of
the method. In fact, we could simply have divided the equality in (4.4) by n + 3, and obtained
E[Mn+1 | Fn ] = Mn .
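A simulation of the urn makes the compensation visible: Bn grows over time, but the sample mean of Mn stays near M0 = 1/2. (Python sketch, illustrative only.)

    import random

    random.seed(5)
    N, n_max = 50_000, 50
    total_M = 0.0
    for _ in range(N):
        red, balls = 1, 2                     # B_0 = 1 red ball out of 2
        for _ in range(n_max):
            if random.random() < red / balls:
                red += 1                      # drew red: add another red ball
            balls += 1
        total_M += (red / balls) / N
    print(total_M)                            # approx E[M_50] = E[M_0] = 1/2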
On fairness
It is clear that the symmetric random walk is fair; at all times it is equally likely to move up as
down. The asymmetric random walk is not fair, due to its drift (4.2), but once we compensate
for drift in (4.3) we do still obtain a martingale.
The urn process requires more careful thought. For example, we might wonder:
Suppose that the first draw is red. Then, at time n = 1 we have two red balls and one
black ball. So, the chance of drawing a red ball is now 2/3. How is this fair?!
To answer this question, let us make a number of points. Firstly, let us remind ourselves that the
quantity which is a martingale is Mn , the proportion of red balls in the urn.
Secondly, suppose that the first draw is indeed red. So, at n = 1 we have 2 red and 1 black,
giving a proportion of 2/3 red and 1/3 black. The expected fraction of red balls after the next
(i.e. second) draw is
(2/3) · (2 + 1)/4 + (1/3) · (2/4) = (6 + 2)/12 = 2/3,
which is of course equal to the proportion of red balls that we had at n = 1.
Lastly, note that it is equally likely that, on the first go, you’d pick out a black. So, starting
from n = 0 and looking forwards, both colors have equally good chances of increasing their own
numbers. In fact, if we were to pretend, right from the start, that black was red, and red was
black, we would see the same urn process. This type of fairness is known as symmetry. We’ve
seen that Bn tends to increase (because we keep adding more balls), and we can think of Mn as
a way of discovering the fairness ‘hiding’ inside of Bn .
To sum up: in life there are different ways to think of ‘fairness’ – and what we need to do here
is get a sense for precisely what kind of fairness martingales characterize. The fact that (Mn )
is a martingale does not prevent us from (sometimes) ending up with many more red balls than
black, or vice versa. It just means that, when viewed in terms of (Mn ), there is no bias towards
red or black inherent in the rules of the game.
Each dot is a ‘parent’, which has a random number of child dots (indicated by arrows). Each
parent chooses how many children it will have independently of all else, by taking a random sample
from G. The BGW process is the process Zn, where Zn is the number of dots in generation n.
Formally, we define the BGW process as follows. Let X_i^n, where n, i ≥ 1, be i.i.d. nonnegative
integer-valued random variables with common distribution G. Define a sequence (Zn) by Z0 = 1
and
Zn+1 = X_1^{n+1} + . . . + X_{Zn}^{n+1}   if Zn > 0,
Zn+1 = 0                                  if Zn = 0.    (4.5)
Then Z is the BGW process. The random variable X_i^n represents the number of children of the
ith parent in the nth generation.
Note that if Zn = 0 for some n, then for all m > n we also have Zm = 0.
A calculation, which we will shortly repeat with conditional expectation in place of E, shows that
E[Zn+1] = µ E[Zn],    (4.6)
where µ = E[G] is the mean number of children of a single parent. Here, we use that the X_i^{n+1}
are independent of Fn, but Zn (and hence also 1{Zn = k}) is Fn measurable. We justify
exchanging the infinite sum Σ and E using the result of exercise 6.8.
Lemma 3.3.6 tells us that if (Mn) is a martingale then E[Mn] = E[Mn+1]. But, from (4.6),
if µ < 1 we see that E[Zn+1] < E[Zn] (downwards drift) and if µ > 1 then E[Zn+1] > E[Zn]
(upwards drift).
However, much like with the asymmetric random walk, we can compensate for the drift and
obtain a martingale. More precisely, we will show that
Mn = Zn/µ^n
is a martingale.
We have M0 = 1 ∈ mF0, and if Mn ∈ mFn then from (4.5) we have that Mn+1 ∈ mFn+1.
Hence, by induction, Mn ∈ mFn for all n ∈ N. From (4.6) we have E[Zn+1] = µE[Zn] and
E[Z0] = 1, so E[Zn] = µ^n for all n. Hence E[Mn] = 1 and Mn ∈ L1.
Lastly, we repeat the calculation that led to (4.6), but now with conditional expectation in
place of E. The first few steps are essentially the same, and we obtain
E[Zn+1 | Fn] = Σ_{k=1}^{∞} E[(X_1^{n+1} + . . . + X_k^{n+1}) 1{Zn = k} | Fn]
= Σ_{k=1}^{∞} 1{Zn = k} E[X_1^{n+1} + . . . + X_k^{n+1} | Fn]
= Σ_{k=1}^{∞} 1{Zn = k} E[X_1^{n+1} + . . . + X_k^{n+1}]
= Σ_{k=1}^{∞} kµ 1{Zn = k}
= µ Σ_{k=1}^{∞} k 1{Zn = k}
= µ Zn.
Here we use that Zn is Fn measurable to take out what is known, and then use that X_i^{n+1} is
independent of Fn. Dividing both sides by µ^{n+1} gives E[Mn+1 | Fn] = Zn/µ^n = Mn, as required.
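The recursion (4.5) and the drift E[Zn+1] = µE[Zn] are easy to explore by simulation. Here is a minimal sketch (ours, not part of the notes; the choice of a Poisson offspring distribution for G is an assumption made purely for illustration):

import numpy as np

def bgw_final(generations, mu, rng):
    # Run the BGW process (4.5) with Poisson(mu) offspring; return Z_generations.
    z = 1  # Z_0 = 1
    for _ in range(generations):
        # Z_{n+1} = X_1 + ... + X_{Z_n}; once Z_n = 0 it stays at 0.
        z = int(rng.poisson(mu, size=z).sum()) if z > 0 else 0
    return z

rng = np.random.default_rng(42)
for mu in (0.8, 1.0, 1.2):
    extinct = sum(bgw_final(50, mu, rng) == 0 for _ in range(2_000)) / 2_000
    print(mu, extinct)  # extinction is common when mu <= 1, much rarer when mu > 1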
Remark 4.4.1 All the processes we have studied in this section can be represented as Markov
chains with state space N. It is possible to use the general theory of Markov chains to study these
stochastic processes, but it wouldn’t provide as much detail as we will obtain (in Chapters 7 and
8) using martingales.
is a martingale.
4.2 Let Sn = Σ_{i=1}^{n} Xi be the asymmetric random walk from Section 4.1, where P[Xi = 1] = p,
P[Xi = −1] = q and with p > q and p + q = 1. Show that (Sn) is a submartingale and that
Mn = (q/p)^{Sn}
is a martingale.
4.3 Let (Xi) be a sequence of identically distributed random variables with common distribution
Xi = a    with probability pa,
Xi = −b   with probability pb = 1 − pa.
4.4 Let Sn = Σ_{i=1}^{n} Xi be the symmetric random walk from Section 4.1. Show that Sn^2 is a
submartingale.
4.5 Let (Xi) be an i.i.d. sequence of random variables such that P[Xi = 1] = P[Xi = −1] = 1/2.
Define a stochastic process Sn by setting S0 = 1 and
Sn+1 = Sn + Xn+1   if Sn > 0,
Sn+1 = 1           if Sn = 0.
That is, Sn behaves like a symmetric random walk but, whenever it becomes zero, on the
next time step it is ‘reflected’ back to 1. Let
Ln = Σ_{i=0}^{n−1} 1{Si = 0}
be the number of time steps, before time n, at which S is zero. Show that
E[Sn+1 | Fn] = Sn + 1{Sn = 0}
and hence that Mn = Sn − Ln is a martingale.
4.6 Consider an urn that may contain balls of three colours: red, blue and green. Initially the
urn contains one ball of each colour. Then, at each step of time n = 1, 2, . . . we draw a ball
from the urn. We place the drawn ball back into the urn and add an additional ball of the
same colour.
Let (Mn ) be the proportion of balls that are red. Show that (Mn ) is a martingale.
4.7 Let Sn = Σ_{i=1}^{n} Xi be the symmetric random walk from Section 4.1. State, with proof, which
Challenge questions
4.8 Let (Sn) be the symmetric random walk from Section 4.1. Prove that there is no deterministic
function f : N → R such that Sn^3 − f(n) is a martingale.
Chapter 5
We now return to financial mathematics. We will extend the one-period model from Chapter 1
and discover a surprising connection between arbitrage and martingales.
The value process of a portfolio h = (x, y), containing x units of cash and y units of stock, is
V0^h = x + ys,
V1^h = x(1 + r) + yS1.
We can also formalize the idea of arbitrage. A portfolio is an arbitrage if it makes money for
free:
Definition 5.1.2 A portfolio h = (x, y) is said to be an arbitrage possibility if:
V0^h = 0,
P[V1^h ≥ 0] = 1,
P[V1^h > 0] > 0.
We say that a market is arbitrage free if there do not exist any arbitrage possibilities.
It is possible to characterize exactly when the one-period market is arbitrage free. In fact, we
have already done most of the work in exercise 1.3.
Proposition 5.1.3 The one-period market is arbitrage free if and only if d < 1 + r < u.
Proof: (⇒) : Recall that we assume d < u. Hence, if d < 1+r < u fails then either 1+r ≤ d < u
or d < u ≤ 1 + r. In both cases, we will construct an arbitrage possibility.
In the case 1 + r ≤ d < u we use the portfolio h = (−s, 1), which has V0^h = 0 and
V1^h = −s(1 + r) + S1 ≥ s(−(1 + r) + d) ≥ 0,
hence P[V1^h ≥ 0] = 1. Further, with probability pu > 0 we have S1 = su, which means V1^h >
s(−(1 + r) + d) ≥ 0. Hence P[V1^h > 0] > 0. Thus, h is an arbitrage possibility.
If 0 < d < u ≤ 1 + r then we use the portfolio h′ = (s, −1), which has V0^{h′} = 0 and
V1^{h′} = s(1 + r) − S1 ≥ s(1 + r − u) ≥ 0,
hence P[V1^{h′} ≥ 0] = 1. Further, with probability pd > 0 we have S1 = sd, which means
V1^{h′} > s(1 + r − u) ≥ 0. Hence P[V1^{h′} > 0] > 0. Thus, h′ is also an arbitrage possibility.
Remark 5.1.4 In both cases, at time 0 we borrow whichever commodity (cash or stock) will
grow slowest in value, immediately sell it and use the proceeds to buy the other, which we know
will grow faster in value. Then we wait; at time 1 we own the commodity that has grown fastest in
value, so we sell it, repay our debt and have some profit left over.
(⇐): Now, assume that d < 1 + r < u. We need to show that no arbitrage is possible. To do
so, we will show that if a portfolio has V0^h = 0 and V1^h ≥ 0 then it also has V1^h = 0.
So, let h = (x, y) be a portfolio such that V0^h = 0 and V1^h ≥ 0. We have
V0^h = x + ys = 0.
The value of h at time 1 is
V1^h = x(1 + r) + ysZ.
Using that x = −ys, we have
V1^h = ys(u − (1 + r))   if Z = u,
V1^h = ys(d − (1 + r))   if Z = d.    (5.1)
Since P[V1^h ≥ 0] = 1 this means that both (a) ys(u − (1 + r)) ≥ 0 and (b) ys(d − (1 + r)) ≥ 0. If
y < 0 then we contradict (a) because 1 + r < u. If y > 0 then we contradict (b) because d < 1 + r.
So the only option left is that y = 0, in which case V0^h = V1^h = 0.
Expectation regained
In Proposition 5.1.3 we showed that our one-period model was free of arbitrage if and only if
d < 1 + r < u.
This condition is very natural: it means that sometimes the stock will outperform cash and
sometimes cash will outperform the stock. Without this condition it is intuitively clear that our
market would be a bad model. From that point of view, Proposition 5.1.3 is encouraging since it
confirms the importance of (no) arbitrage.
However, it turns out that there is more to the condition d < 1 + r < u, which we now explore.
It is equivalent to asking that there exist qu, qd ∈ (0, 1) such that both
qu + qd = 1   and   qu u + qd d = 1 + r.    (5.2)
In words, (5.2) says that 1 + r is a weighted average of d and u, where d has weight qd and u has
weight qu . Solving these two equations gives
qu = ((1 + r) − d)/(u − d),   qd = (u − (1 + r))/(u − d).    (5.3)
Now, here is the key: we can think of the weights qu and qd as probabilities. Let’s pretend that
we live in a different world, where a single unit of stock, worth S0 = s at time 0, changes value
to become worth
S1 = sd   with probability qd,
S1 = su   with probability qu.
We have altered – the technical term is tilted – the probabilities from their old values pd , pu to
new values qd , qu . Let’s call this new world Q, by which we mean that Q is our new probability
measure: Q[S1 = sd] = qd and Q[S1 = su] = qu . This is often called the risk-neutral world,
and qu , qd are known as the risk-neutral probabilities1 .
Since Q is a probability measure, we can use it to take expectations. We use EP and EQ to
make it clear if we are taking expectations using P or Q.
We have
(1/(1 + r)) EQ[S1] = (1/(1 + r)) (su Q[S1 = su] + sd Q[S1 = sd])
= (1/(1 + r)) s (u qu + d qd)
= s,
where the last equality holds because u qu + d qd = 1 + r, by (5.2).
The price of the stock at time 0 is S0 = s. To sum up, we have shown that the price S1 of a unit
of stock at time 1 satisfies
S0 = (1/(1 + r)) EQ[S1].    (5.4)
This is a formula that is very well known to economists. It gives the stock price today (t = 0) as
the expectation under Q of the stock price tomorrow (t = 1), discounted by the rate 1 + r at
which it would earn interest.
1 We will discuss the reason for the name ‘risk-neutral’ later.
Equation (5.4) is our first example of a ‘risk-neutral valuation’ formula. Recall that we pointed
out in Chapter 1 that we should not use EP to calculate ‘expected value’ prices. A possible cause of
confusion is that (5.4) does correctly calculate the value (i.e. price) of a single unit of stock by
taking an expectation. The point is that we (1) use EQ rather than EP and (2) then discount
according to the interest rate. We will see, in the next section, that these two steps are the correct
way to go about arbitrage-free pricing in general.
Moreover, in Section 5.4 we will extend our model to have multiple time steps. Then the
expectation in (5.4) will lead us to martingales.
Definition 5.2.1 A contingent claim is any random variable of the form X = Φ(S1 ), where Φ
is a deterministic function.
The function Φ is sometimes known as the contract function. One example of a contingent
claim is a forward contract, in which the holder promises to buy a unit of stock at time 1 for a
fixed price K , known as the strike price. In this case the contingent claim would be
Φ(S1 ) = S1 − K,
the value of a unit of stock at time 1 minus the price paid for it. We will see many other examples
in the course. Here is another.
Example 5.2.2 A European call option gives its holder the right (but not the obligation) to
buy, at time 1, a single unit of stock for a fixed price K that is agreed at time 0. As for forwards,
K is known as the strike price.
Suppose we hold a European call option at time 1. Then, if S1 > K , we could exercise our
right to buy a unit of stock at price K , immediately sell the stock for S1 and consequently earn
S1 − K > 0 in cash. Alternatively if S1 ≤ K then our option is worthless.
Since S1 is equal to either su or sd, the only interesting case is when sd < K < su.
In this case, the contingent claim for our European call option is
Φ(S1) = su − K   if S1 = su,
Φ(S1) = 0        if S1 = sd.    (5.5)
In the first case our right to buy is worth exercising; in the second case it is not. A simpler way
to write this contingent claim is
Φ(S1) = max(S1 − K, 0).    (5.6)
In general, given any contract, we can work out its contingent claim. We therefore plan to find
a general way of pricing contingent claims. In Section 1.3 we relied on finding specific trading
strategies to determine prices (one, from the point of view of the buyer, that gave an upper bound,
and one, from the point of view of the seller, that gave a lower bound). Our first step in this section
is to find a general way of constructing trading strategies.
Definition 5.2.3 We say that a portfolio h = (x, y) is a replicating portfolio for the contingent
claim Φ(S1) if V1^h = Φ(S1).
The process of finding a replicating portfolio is known simply as replicating or hedging. The above
definition means that, if we hold the portfolio h at time 0, then at time 1 it will have precisely
the same value as the contingent claim Φ(S1). Therefore, since we assume our model is free of
arbitrage:
If a contingent claim Φ(S1) has a replicating portfolio h, then the price of Φ(S1) at
time 0 must be equal to the value of h at time 0.
We say that a market is complete if every contingent claim can be replicated. Therefore, if
the market is complete, we can price any contingent claim.
Example 5.2.4 Suppose that s = 1, d = 1/2, u = 2 and r = 1/4, and that we are looking at the
contingent claim
Φ(S1) = 1   if S1 = su,
Φ(S1) = 0   if S1 = sd.
We can represent this situation as a tree, with a branch for each possible movement of the stock,
and the resulting value of our contingent claim written in a square box.
Suppose that we wish to replicate Φ(S1). That is, we need a portfolio h = (x, y) such that
V1^h = Φ(S1):
(1 + 1/4)x + 2y = 1
(1 + 1/4)x + (1/2)y = 0.
This is a pair of linear equations that we can solve. The solution (which is left for you to check)
is x = −4/15, y = 2/3. Hence the price of our contingent claim Φ(S1) at time 0 is
V0^h = −4/15 + 1 · (2/3) = 2/5.
Let us now take an arbitrary contingent claim Φ(S1) and see if we can replicate it. This would
mean finding a portfolio h such that the value V1^h of the portfolio at time 1 is Φ(S1):
V1^h = Φ(su)   if S1 = su,
V1^h = Φ(sd)   if S1 = sd.
By (5.1), if we write h = (x, y) then we need
(1 + r)x + suy = Φ(su)
(1 + r)x + sdy = Φ(sd),
which is just a pair of linear equations to solve for (x, y). In matrix form,
[ 1 + r   su ] [ x ]   =   [ Φ(su) ]
[ 1 + r   sd ] [ y ]       [ Φ(sd) ].    (5.7)
A unique solution exists when the determinant is non-zero, that is when (1 + r)su − (1 + r)sd ≠ 0,
or equivalently when u ≠ d. So, in this case, we can find a replicating portfolio for any contingent
claim.
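Since (5.7) is just a 2 × 2 linear system, a computer can find the replicating portfolio directly. A minimal sketch (ours, not part of the notes), checked against the numbers of Example 5.2.4:

import numpy as np

def replicate(r, s, u, d, phi_up, phi_down):
    # Solve (5.7) for the portfolio h = (x, y) with V_1^h = Phi(S_1).
    A = np.array([[1 + r, s * u],
                  [1 + r, s * d]])
    b = np.array([phi_up, phi_down])
    x, y = np.linalg.solve(A, b)  # a unique solution exists since u != d
    return x, y

x, y = replicate(r=0.25, s=1, u=2, d=0.5, phi_up=1, phi_down=0)
print(x, y, x + y * 1)  # x = -4/15, y = 2/3, and the price x + y*s = 2/5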
It is an assumption of the model that d < u, so we have that our one-period model is always
complete. Therefore:
This has the solution (again, left for you to check) x = sd(K − su)/((1 + r)(su − sd)),
y = (su − K)/(su − sd). By the second part of Proposition 5.2.6, the value of the European call
option at time 0 is
(1/(1 + r)) EQ[Φ(S1)] = (1/(1 + r)) (qu(su − K) + qd · 0)
= (1/(1 + r)) · (((1 + r) − d)/(u − d)) · (su − K).
• A forward or forward contract is the obligation to buy (or sell) a single unit of stock at
time 1 for a strike price K .
• A European call option is the right, but not the obligation, to buy a single unit of stock
at time 1 for a strike price K .
• A European put option is the right, but not the obligation, to sell a single unit of stock
at time 1 for a strike price K .
You are expected to remember these definitions! We will often use them in our examples.
There are many other types of financial derivative; we’ll mention some more examples later in
the course, in Section 17.2.
Note that the tree is recombining, in the sense that a move up (by u) followed by a move down
(by d) has the same outcome as a move down followed by a move up. It’s like a random walk,
except we multiply instead of add (recall exercise 4.1).
Remark 5.4.1 The one-period model is simply the T = 1 case of the binomial model. Both
models are summarized on the formula sheet, see Appendix B.
Definition 5.5.1 A portfolio strategy is a stochastic process
ht = (xt, yt),
for t = 1, 2, . . . , T, such that each ht is Ft−1 measurable.
The interpretation is that xt is the amount of cash, and yt the amount of stock, that we hold
during the time step t − 1 ↦ t. We make our choice of how much cash and stock to hold during
t − 1 ↦ t based on knowing the value of S0, S1, . . . , St−1, but without knowing St. This is realistic.
Definition 5.5.2 The value process of the portfolio strategy h = (ht), t = 1, . . . , T, is the
stochastic process (Vt^h), t = 0, . . . , T, given by
V0^h = x1 + y1 S0,
Vt^h = xt(1 + r) + yt St,
for t = 1, 2, . . . , T.
At t = 0, V0^h is the value of the portfolio h1. For t ≥ 1, Vt^h is the value of the portfolio ht at time
t, after the change in value of cash/stock that occurs during t − 1 ↦ t. The value process Vt^h is
Ft measurable but it is not Ft−1 measurable.
We will be especially interested in portfolio strategies that require an initial investment at time
0 but where, at later times t = 1, 2, . . . , T − 1, any changes in the amount of stock/cash held pay
for themselves. We capture such portfolio strategies in the following definition.
Definition 5.5.3 A portfolio strategy ht = (xt, yt) is said to be self-financing if
Vt^h = xt+1 + yt+1 St
for t = 0, 1, 2, . . . , T − 1.
This means that the value of the portfolio at time t is equal to the value (at time t) of the
stock/cash that is held in between times t ↦ t + 1. In other words, in a self-financing portfolio
at the times t = 1, 2, . . . we can swap our stocks for cash (and vice versa) according to whatever
the stock price turns out to be, but that is all we can do.
Lastly, our idea of arbitrage must also be upgraded to handle multiple time steps.
Definition 5.5.4 We say that a portfolio strategy (ht) is an arbitrage possibility if it is self-
financing and satisfies
V0^h = 0,
P[VT^h ≥ 0] = 1,
P[VT^h > 0] > 0.
In words, an arbitrage possibility requires that we invest nothing at times t = 0, 1, . . . , T − 1,
but gives us a positive probability of earning something at time T, with no risk at all of
actually losing money.
It’s natural to ask when the binomial model is arbitrage free. Happily, the condition turns out
to be the same as for the one-period model.
Proposition 5.5.5 The binomial model is arbitrage free if and only if d < 1 + r < u.
The proof is quite similar to the argument for the one-period model, but involves more technical
calculations and (for this reason) we don’t include it as part of the course.
Recall the risk-neutral probabilities from (5.3). We use them to define the risk-neutral world Q
of the binomial model, in which on each time step the stock price moves up (by u) with
probability qu, or down (by d) with probability qd. This provides a connection to martingales:
Proposition 5.5.6 If d < 1 + r < u, then under the probability measure Q, the process
Mt = St/(1 + r)^t
is a martingale, with respect to the filtration (Ft).
Remark 5.5.7 Using Lemma 3.3.6 we have EQ[M0] = EQ[M1], which states that
S0 = (1/(1 + r)) EQ[S1].
5.6 Hedging
We can adapt the derivatives from Section 5.3 to the binomial model, by simply replacing time 1
with time T . For example, in the binomial model a forward contract is the obligation to buy a
single unit of stock at time T for a strike price K that is agreed at time 0.
Definition 5.6.1 A contingent claim is a random variable of the form Φ(ST ), where Φ : R → R
is a deterministic function.
Definition 5.6.2 We say that a portfolio strategy h = (ht), t = 1, . . . , T, is a replicating portfolio
or hedging strategy for the contingent claim Φ(ST) if h is self-financing and VT^h = Φ(ST).
These match the definitions for the one-period model, except we now care about the value of
the asset at time T (instead of time 1). We will shortly look at how to find replicating portfolios.
As in the one-period model, the binomial model is said to be complete if every contingent
claim can be replicated. Further, as in the one-period model, if the binomial model is free of
arbitrage then it is also complete. With this in mind, for the rest of this section we assume that
d < 1 + r < u.
Lastly, as in the one-period model, our assumption that there is no arbitrage means that:
If a contingent claim Φ(ST) has a replicating portfolio h = (ht), t = 1, . . . , T, then the price of
Φ(ST) at time 0 must be equal to V0^h, the value of h at time 0.
Now, let us end this chapter by showing how to compute prices and replicating portfolios in
the binomial model. We already know how to do this in the one-period model, see Example 5.2.4.
We could do it in full generality (as we did in (5.7) for the one-period model) but this would
involve lots of indices and look rather messy. Instead, we’ll work through a practical example that
makes the general strategy clear.
Let us take T = 3 and set S0 = 80, u = 1.5 and d = 0.5. To make the calculations easier, we’ll
also take our interest rate to be r = 0. We’ll price a European call option with strike price
K = 80. The contingent claim for this option is
Φ(S3) = max(S3 − 80, 0).    (5.10)
STEP 1 is to work out the risk-neutral probabilities. From (5.3), these are
qu = (1 + 0 − 0.5)/(1.5 − 0.5) = 0.5 and qd = 1 − qu = 0.5.
STEP 2 is to write down the tree of possible values that the stock can take during time
t = 0, 1, 2, 3: starting from S0 = 80, the stock price moves to 120 or 40 at time 1; to 180, 60 or
20 at time 2; and to 270, 90, 30 or 10 at time 3.
We then work out, at each of the nodes corresponding to time T = 3, what the value of our
contingent claim (5.10) would be if this node were reached: the nodes 270, 90, 30 and 10 give the
values 190, 10, 0 and 0, respectively.
We now come to STEP 3, the key idea. Suppose we are sitting in one of the nodes at time
t = 2, which we think of as the ‘current’ node. For example suppose we are at the uppermost
node (labelled 180, the ‘current’ value of the stock). Looking forwards one step of time we can
see that, if the stock price goes up our option is worth 190, whereas if the stock price goes down
our option is worth 10. What we are seeing here is (an instance of) the one-period model! With
contingent claim
Φ̃(su) = 190,   Φ̃(sd) = 10.
So, using the one-period risk-neutral valuation formula from Proposition 5.2.6 the value of our
call option at our current node is
(1/(1 + 0)) (190 · 0.5 + 10 · 0.5) = 100.
We could apply the same logic to any of the nodes corresponding to time t = 2, and compute the
value of our call option at that node: this gives the values 100, 5 and 0 at the t = 2 nodes labelled
180, 60 and 20, respectively.
If we now imagine ourselves sitting in one of the nodes at time t = 1, and look forwards one step
in time, we again find ourselves faced with an instance of the one-period model. This allows us
to compute the value of our call option at the t = 1 nodes; take for example the node labelled
by 40 which, one step into the future, sees the contingent claim Φ̃(su) = 5, Φ̃(sd) = 0, and using
(5.9) gives the value of the call option at this node as (1/(1 + 0))(5 × 0.5 + 0 × 0.5) = 2.5.
Repeating the procedure on the other t = 1 node, and then also on the single t = 0 node, gives
the value 52.5 at the node labelled 120 and, finally, 27.5 at the t = 0 node.
Therefore, the value (i.e. the price) of our call option at time t = 0 is 27.5.
Although we have computed the price, we haven’t yet computed a replicating portfolio, which
is STEP 4. We could do it by solving lots of linear equations for our one-period models, as in
Example 5.2.4, but since we have several steps a quicker way is to apply Proposition 5.2.6 and
use the general formula we found in (5.8).
Starting at time t = 0, to replicate the contingent claim Φ̃(su) = 52.5 and Φ̃(sd) = 2.5 at time
t = 1, equation (5.8) tells us that we want the portfolio
x1 = (1/(1 + 0)) · (1.5 · 2.5 − 0.5 · 52.5)/(1.5 − 0.5) = −22.5,
y1 = (1/80) · (52.5 − 2.5)/(1.5 − 0.5) = 5/8.
Next, suppose the stock price falls between t = 1 and t = 2, so our next node is S2 = 60. Our
portfolio (x2, y2) now becomes worth
x2(1 + 0) + y2 · 60 = −42.5 + (95/120) · 60 = 5,
again equal to the value of our call option. For the final step, we must replicate the contingent
claim Φ̃(su) = 10, Φ̃(sd) = 0, which (5.8) tells us is done using x3 = −5 and y3 = 1/6. Again, you
can check that the current value of this portfolio is 5.
Lastly, the stock price rises again to S3 = 90. Our portfolio becomes worth
x3(1 + 0) + y3 · 90 = −5 + (1/6) · 90 = 10,
equal to the payoff from our call option.
To sum up, using (5.8) we can work out which portfolio we would want to hold, at each possible
outcome of the stock changing value. At all times we would be holding a portfolio with current
value equal to the current value of the call option. Therefore, this gives a self-financing portfolio
strategy that replicates Φ(ST ).
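The whole procedure (STEPS 1–3) is a backward-induction algorithm, and it translates into a few lines of code. The sketch below is ours — a minimal illustration, not an official solution to exercise 5.10 — and it reproduces the price 27.5 from this example:

def binomial_price(s0, u, d, r, T, payoff):
    # Price the contingent claim payoff(S_T) in the binomial model,
    # by backward induction with the risk-neutral probabilities (5.3).
    qu = (1 + r - d) / (u - d)
    qd = 1 - qu
    # Claim values at time T; the tree is recombining, so a node is
    # determined by its number k of up-moves.
    values = [payoff(s0 * u**k * d**(T - k)) for k in range(T + 1)]
    # Step back one time level at a time, applying one-period valuation.
    for _ in range(T):
        values = [(qu * values[k + 1] + qd * values[k]) / (1 + r)
                  for k in range(len(values) - 1)]
    return values[0]

print(binomial_price(80, 1.5, 0.5, 0.0, 3, lambda s: max(s - 80, 0)))  # 27.5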
(a) Φ(S1) = 1.
(b) Φ(S1) = 3   if S1 = su,
    Φ(S1) = 1   if S1 = sd.
5.3 Find the contingent claims Φ(S1 ) for the following derivatives.
(a) A contract in which the holder promises to buy two units of stock at time t = 1, each
for strike price K .
(b) A European put option with strike price K ∈ (sd, su) (see Section 5.3).
(c) A contract in which we promise that, if S1 = su, we will sell one unit of stock at time
t = 1 for strike price K ∈ (sd, su) (and otherwise, if S1 = sd we do nothing).
(d) Holding both the contracts in (b) and (c) at once.
5.6 Let T = 2 and let the initial value of a single unit of stock be S0 = 100. Suppose that
pu = 0.25 and pd = 0.75, that u = 2.0 and d = 0.5, and that r = 0.25. Draw out, in a
tree-like diagram, the possible values of the stock price at times t = 0, 1, 2. Find the price,
at time 0, of a European put option with strike price K = 100.
Suppose instead that pu = 0.1 and pd = 0.9. Does this change the value of our put option?
5.7 Let T = 2 and let the initial value of a single unit of stock be S0 = 120. Suppose that
pu = 0.5 and pd = 0.5, that u = 1.5 and d = 0.5, and that r = 0.0. Draw out, in a tree-like
diagram, the possible values of the stock price at times t = 0, 1, 2. Annotate your tree to
show a hedging strategy for a European call option with strike price K = 60. Hence, write
down the value of this option at time 0.
5.8 Let T = 2 and let the initial value of a single unit of stock be S0 = 480. Suppose that
pu = 0.5 and pd = 0.5, that u = 1.5 and d = 0.75, and that r = 0. Draw out, in a tree-like
diagram, the possible values of the stock price at times t = 0, 1, 2. Annotate your tree to
show a hedging strategy for a European call option with strike price K = 60. Hence, write
down the value of this option at time 0.
Comment on the values obtained for the hedging portfolios.
5.9 Recall that (St), t = 1, . . . , T, is the price of a single unit of stock.
Challenge questions
5.10 Write a computer program (in a language of your choice) that carries out the pricing algo-
rithm for the binomial model, for a general number n of time-steps.
Chapter 6
A real number is a simple object; it takes a single value. As such, if (an) is a sequence of real
numbers, limn→∞ an = a means that the value of an converges to the value of a.
Random variables are more complicated objects. They take many different values, with dif-
ferent probabilities. Consequently, if X1 , X2 , . . . and X are random variables, there are many
different ways in which we can try to make sense of the idea that Xn → X . They are called modes
of convergence, and are the focus of this chapter.
Convergence of random variables sits at the heart of all sophisticated stochastic modelling.
Crucially, it provides a way to approximate one random variable with another (since if Xn → X
then we may hope that Xn ≈ X for large n), which is particularly helpful if it is possible to
approximate a complex model Xn with a relatively simple random variable X . We will explore
this theme further in later chapters.
P[Xn → X as n → ∞] = 1.
• Xn → X in Lp, known as convergence in Lp, if
E[|Xn − X|^p] → 0 as n → ∞.
Here, p ≥ 1 is a real number. We will be interested in the cases p = 1 and p = 2. The case p = 2
is sometimes known as convergence in mean square. Note that these four definitions also appear
on the formula sheet, in Appendix B.
It is common for random variables to converge in some modes but not others, as the following
example shows.
Example 6.1.1 Let U be a uniform random variable on [0, 1] and set
Xn = n^2 1{U < 1/n},   that is, Xn = n^2 if U < 1/n and Xn = 0 otherwise.
Our candidate limit is X = 0, the random variable that takes the deterministic value 0. We’ll
check each of the types of convergence in turn.
• For convergence in distribution, we note that P[X ≤ x] = 0 if x < 0 and P[X ≤ x] = 1 if
x ≥ 0. We consider these two cases:
As we might hope, there are relationships between the different modes of convergence, which
are useful to remember.
Lemma 6.1.2 Let X and (Xn) be random variables.
1. If Xn → X in probability, then Xn → X in distribution.
2. If Xn → X almost surely, then Xn → X in probability.
3. If Xn → X in Lp, then Xn → X in probability.
4. Let 1 ≤ p < q. If Xn → X in Lq, then Xn → X in Lp.
In all other cases (i.e. that are not automatically implied by the above), convergence in one mode
does not imply convergence in another.
The proofs are not part of our course (they are part of MAS31002/61022). We can summarise
Lemma 6.1.2 with a diagram:
Remark 6.1.3 For convergence of real numbers, it was shown in MAS221 that if an → a and
an → b then a = b, which is known as uniqueness of limits. For random variables, the situation is
a little more complicated: if Xn → X and Xn → Y, both in probability, then X = Y almost surely.
By Lemma 6.1.2, this result also applies to convergence in Lp and to almost sure convergence.
However, if we have only Xn → X and Xn → Y in distribution, then we can only conclude that
X and Y have the same distribution function. Proving these facts is one of the challenge
exercises, 6.6.
Theorem 6.2.1 (Monotone Convergence Theorem) Let (Xn) be a sequence of random vari-
ables and suppose that:
1. Xn+1 ≥ Xn almost surely, for all n;
2. Xn ≥ 0 almost surely, for all n.
Then, there exists a random variable X such that Xn → X almost surely. Further, E[Xn] → E[X].
The first claim of the theorem, that the sequence Xn converges almost surely, is true because by
property 1 the sequence Xn is increasing, almost surely, and increasing sequences of real numbers
converge (possibly to +∞).
Since limits preserve weak inequalities and Xn ≥ 0 we have X ≥ 0, almost surely. However,
we should not forget that E[X] might be equal to +∞.
You can think of Theorem 6.2.1 as a stochastic equivalent of the fact that increasing real
valued sequences converge (in R if they are bounded, and to +∞ if they are not bounded). This
is usually helpful when applying the theorem, such as in the following example.
Let (Xn ) be a sequence of independent random variables, with distribution given by
Let Yn = Σ_{i=1}^{n} Xi. Then (Yn) is an increasing sequence, almost surely, and hence converges
almost surely.
Remark 6.2.2 (∆) Those of you taking the MAS61023 version of the course should now begin
your independent reading, starting in Chapter 8. It begins with a convergence theorem similar to
Theorem 6.2.1, but which applies to non-monotone sequences.
Show that Xn → 0 in L1 and almost surely. Deduce that also Xn → 0 in probability and in
distribution.
6.2 Let Xn, X be random variables.
(a) Suppose that Xn → X in L1 as n → ∞. Show that E[Xn] → E[X].
(b) Give an example where E[Xn] → E[X] but Xn does not converge to X in L1.
6.3 Let U be a random variable such that P[U = 0] = P[U = 1] = P[U = 2] = 1/3. Let Xn and
X be given by
Xn = 1 + 1/n   if U = 0,        X = 1   if U ∈ {0, 1},
Xn = 1 − 1/n   if U = 1,        X = 0   if U = 2.
Xn = 0         if U = 2,
Show that Xn → X both almost surely and in L1 . Deduce that also Xn → X in probability
and in distribution.
6.4 Let X1 be a random variable with distribution given by P[X1 = 1] = P[X1 = 0] = 1/2. Set
Xn = X1 for all n ≥ 2. Set Y = 1 − X1. Show that Xn → Y in distribution, but not in
probability.
6.5 Let (Xn) be the sequence of random variables from 6.1. Define Yn = X1 + X2 + . . . + Xn.
(a) Show that, for all ω ∈ Ω, the sequence Yn(ω) is increasing and bounded.
(b) Deduce that there exists a random variable Y such that Yn → Y almost surely.
6.7 Let X be a random variable such that X ≥ 0 and P[X < ∞] = 1. Define
Xn = X   if X ≤ n,
Xn = 0   otherwise.
Equivalently, Xn = X 1{X ≤ n}. Show that E[Xn] → E[X] as n → ∞.
6.8 Let (Xi) be a sequence of random variables in L1.
(a) Explain briefly why E[Σ_{i=1}^{n} Xi] = Σ_{i=1}^{n} E[Xi] follows from the linearity of E, for
n ∈ N. Explain briefly why linearity alone does not allow us to deduce the same
equation with n = ∞.
(b) Suppose that Xn ≥ 0 almost surely, for all n. Show that
E[Σ_{i=1}^{∞} Xi] = Σ_{i=1}^{∞} E[Xi].    (6.1)
(c) Suppose, instead, that the Xi are independent and that P[Xi = 1] = P[Xi = −1] = 1/2.
Explain briefly why (6.1) fails to hold, in this case.
Challenge questions
6.6 Let (Xn) be a sequence of random variables, and let X and Y be random variables.
(a) Show that if Xn → X and Xn → Y, both in distribution, then X and Y have the same
distribution.
6.7 Let (Xn) be a sequence of independent random variables such that P[Xn = 1] = P[Xn =
0] = 1/2. Show that (Xn) does not converge in probability and deduce that (Xn) also does
not converge in L1, or almost surely. Does Xn converge in distribution?
Chapter 7
In this section we introduce two important results from the theory of martingales, namely the
‘martingale transform’ and the ‘martingale convergence theorem’. We use these results to anal-
yse the behaviour of stochastic processes, including those from Chapter 4 (random walks, urns,
branching processes) and also the gambling game Roulette.
Despite having found a martingale connected to the binomial model, in Proposition 5.5.6, we
won’t use martingales to analyse our financial models – yet. That will come in Chapter 15, once
we have moved into continuous time and introduced the Black-Scholes model.
Mi − Mi−1 = Xi, so we win/lose each round with even chances: we bet 1 on each round; if we
win we get our money back doubled, if we lose we get nothing back.
We place the bet Ci−1 ∈ mFi−1 on play i, meaning that we must place our ith bet using only
the information we gained during the first i − 1 plays. In particular, we don’t yet know the result
of the ith play. So our filtration here is Fn = σ(Ci , Xi ; i ≤ n).
Hence Y is a martingale.
The argument is easily adapted to prove the second statement, e.g. for a supermartingale M ,
E[Mn − Mn−1 | Fn−1 ] ≤ 0. Note that in these cases it is important that C is non-negative.
7.2 Roulette
The martingale transform is a useful theoretical tool (see e.g. Sections 7.3, 8.2 and 12.1), but it
also provides a framework to model casino games. We illustrate this with Roulette.
In roulette, a metal ball lies inside of a spinning wheel. The wheel is divided into 37 segments,
of which 18 are black, 18 are red, and 1 is green. The wheel is spun, and the ball spins with it,
eventually coming to rest in one of the 37 segments. If the roulette wheel is manufactured properly,
the ball lands in each segment with probability 1/37, and the result of each spin is independent.
On each spin, a player can bet an amount of money C . The player chooses either red or
black. If the ball lands on the colour of their choice, they get their bet of C returned and win an
additional C . Otherwise, the casino takes the money and the player gets nothing.
The key point is that players can only bet on red or black. If the ball lands on green, the
casino takes everyone’s money.
Remark 7.2.1 Thinking more generally, most casino games fit into this mould – there is a very
small bias towards the casino earning money. This bias is known as the ‘house advantage’.
In each round of roulette, a player’s probability of winning is 18/37 (it does not matter which
colour they pick). Let (Xn) be a sequence of i.i.d. random variables such that
Xn = 1    with probability 18/37,
Xn = −1   with probability 19/37.
Naturally, the first case corresponds to the player winning game n and the second to losing. We
define
Mn = Σ_{i=1}^{n} Xi.
Then, the value of Mn − Mn−1 = Xn is 1 if the player wins game n and −1 if they lose. We take
our filtration to be generated by (Mn ), so Fn = σ(Mi ; i ≤ n).
A player cannot see into the future. So the bet they place on game n must be chosen before the
game is played, at time n − 1 – we write this bet as Cn−1 , and require that it is Fn−1 measurable.
Hence, C is adapted. The total profit/loss of the player over time is the martingale transform
(C ◦ M)n = Σ_{i=1}^{n} Ci−1 (Mi − Mi−1).
We’ll now show that (Mn) is a supermartingale. We have Mn ∈ mFn and since |Mn| ≤ n we also
have Mn ∈ L1. Lastly,
E[Mn+1 | Fn] = E[Mn + Xn+1 | Fn]
= Mn + E[Xn+1 | Fn]
= Mn + E[Xn+1]
≤ Mn.
Here, the second line follows by linearity and the taking out what is known rule. The third line
follows because Xn+1 is independent of Fn, and the last line follows because E[Xn+1] = −1/37 < 0.
So, (Mn ) is a supermartingale and (Cn ) is adapted. Theorem 7.1.1 applies and tells us that
(C ◦ M )n is a supermartingale. We’ll continue this story in Section 7.4.
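The supermartingale property predicts that the player’s profit drifts downwards, at rate 1/37 per unit staked. Here is a minimal simulation sketch (ours, not part of the notes), using the constant strategy Cn = 1; any other adapted strategy could be substituted for the bet function:

import random

def roulette_profit(n_spins, bet, rng):
    # Total profit (C o M)_n after n_spins plays, where bet(history)
    # is the stake C_{i-1}, allowed to depend only on past results.
    profit, history = 0, []
    for _ in range(n_spins):
        stake = bet(history)
        win = rng.random() < 18 / 37
        profit += stake if win else -stake
        history.append(win)
    return profit

rng = random.Random(7)
runs = [roulette_profit(1000, lambda h: 1, rng) for _ in range(2_000)]
print(sum(runs) / len(runs))  # close to 1000 * (-1/37), about -27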
Remark 7.2.2 There have been ingenious attempts to win money at Roulette, often through
hidden technology or by exploiting mechanical flaws (which can slightly bias the odds), mixed
with probability theory.
In 1961, Edward O. Thorp (a professor of mathematics) and Claude Shannon (a professor of
electrical engineering) created the world’s first wearable computer, which timed the movements of
the ball and wheel, and used this information to try and predict roughly where the ball would
land.
Information was input to the computer by its wearer, who silently tapped their foot as the Roulette
wheel spun. By combining this information with elements of probability theory, they believed they
could consistently ‘beat the casino’. They were very successful, and obtained a return of 144% on
their bets. Their method is now illegal; at the time it was not, because no-one had even thought
it might be possible.
Of course, most gamblers are not so fortunate.
Definition 7.3.1 Let p ∈ [1, ∞). We say that a stochastic process (Xn) is uniformly bounded in
Lp if there exists some M < ∞ such that, for all n,
E[|Xn|^p] ≤ M.
The number of upcrossings of [a, b] made by M up to time N, written UN[a, b], is the largest k
for which there exist times
0 ≤ s1 < t1 < s2 < t2 < . . . < sk < tk ≤ N such that Msi ≤ a, Mti > b for all i = 1, . . . , k.
Note that, for convenience, we draw Mn as a continuous (green) line although in fact it only
changes value at discrete times.
Studying upcrossings is key to establishing almost sure convergence of supermartingales. To
see why upcrossings are important, note that if (cn) ⊆ R is a (deterministic) sequence and cn → c,
for some c ∈ R, then there is no interval [a, b], with a < b, such that (cn) makes infinitely many
upcrossings of [a, b]; if there were then (cn) would oscillate and couldn’t converge.
Note that UN[a, b] is an increasing function of N, and define U∞[a, b] by
U∞[a, b] = lim_{N→∞} UN[a, b].    (7.1)
This is an almost sure limit, since it exists for each ω ∈ Ω. With this definition, U∞[a, b] could
potentially be infinite, but we can prove that it is not.
The behaviour of Cn is that, when M enters the region below a, Cn starts taking the value 1. It
will continue to take the value 1 until M enters the region above b, at which point Cn will start
taking the value 0. It will continue to take the value 0 until M enters the region below a, and so
on. Hence,
(C ◦ M)N = Σ_{k=1}^{N} Ck−1 (Mk − Mk−1) ≥ (b − a) UN[a, b] − |MN − a|.    (7.2)
That is, each upcrossing of [a, b] by M picks up at least (b − a); the final term corresponds to an
upcrossing that M might have started but not finished.
Note that Cn is adapted, bounded and non-negative. Hence, by Theorem 7.1.1 we have that
C ◦ M is a supermartingale. Thus E[(C ◦ M )N ] ≤ 0, which combined with (7.2) proves the given
result.
Lemma 7.3.3 Suppose M is a supermartingale and uniformly bounded in L1 . Then P [U∞ [a, b] =
∞] = 0.
Proof: We have that UN[a, b] is increasing, as N increases, and the definition of U∞[a, b] in (7.1)
gives that UN[a, b] → U∞[a, b] almost surely as N → ∞. Hence, by the monotone convergence
theorem, E[UN[a, b]] → E[U∞[a, b]], and so by letting N → ∞ in (7.3) we have
Proof: Define
Λa,b = {ω : for infinitely many n, Mn (ω) < a} ∩ {ω : for infinitely many n, Mn (ω) > b}.
We observe that Λa,b ⊆ {U∞[a, b] = ∞}, which has probability 0 by Lemma 7.3.3. But since
the event that (Mn) fails to converge (in [−∞, +∞]) is the union of the events Λa,b over all
rational a < b, and a countable union of events with probability zero also has probability zero,
we have that
P[Mn converges to some M∞ ∈ [−∞, +∞]] = 1.
This inequality can be proved using the monotone convergence theorem and some careful analysis,
see exercise 7.14. Since we assumed that (Mn ) is uniformly bounded in L1 , (7.4) gives us that
E[|M∞ |] < ∞. Hence, P[|M∞ | = ∞] = 0 (or else the expected value would be infinite).
One useful note is that if Mn is a non-negative supermartingale then we have E[|Mn |] =
E[Mn ] ≤ E[M0 ], so in this case M is uniformly bounded in L1 .
Theorem 7.3.4 has one big disadvantage: it cannot tell us anything about the limit M∞ , except
that it is finite. To gain more information about M∞ , we need an extra condition.
Corollary 7.3.5 (Martingale Convergence Theorem II) In the setting of Theorem 7.3.4,
suppose additionally that (Mn ) is uniformly bounded in L2 . Then Mn → M∞ in both L1 and L2 ,
and
lim_{n→∞} E[Mn] = E[M∞],   lim_{n→∞} var(Mn) = var(M∞).
Lemma 7.4.1 Let (an ) be an integer valued sequence, and suppose that an → a as n → ∞, where
a ∈ R. Then (an ) is eventually equal to a constant: there exists N ∈ N such that an = a for all
n ≥ N.
Proof: Put ε = 1/3 into the definition of a convergent real sequence, and we obtain that there
exists N ∈ N such that |an − a| ≤ 1/3 for all n ≥ N. Hence, for n ≥ N we have
|an − an+1| ≤ |an − a| + |a − an+1| ≤ 2/3.
Since 2/3 < 1, and an takes only integer values, this means that an = an+1. Since this holds for all
n ≥ N, we must have aN = aN+1 = aN+2 = . . ., which means that an → aN as n → ∞ (and
hence a = aN).
Recall from Section 7.2 that the total profit/loss of our roulette player is given by the martingale
transform (C ◦ M)n. Thus, if our gambler starts with an amount of money K ∈ N at time 0,
then at time n they have an amount of money given by
Wn = K + (C ◦ M )n . (7.5)
We’ll assume that our gambler keeps betting until they run out of money, so we have Cn ≥ 1 and
Wn ≥ 0 for all n before they run out of money (and, if and when they do run out, Cn = 0 for
all remaining n). Since money is a discrete quantity, we can assume that K and Cn are integers,
hence Wn is also always an integer.
From Section 7.2 we know that (C ◦M )n is a supermartingale, hence Wn is also a supermartin-
gale. Since Wn ≥ 0 we have E[|Wn |] = E[Wn ] ≤ E[W0 ] so (Wn ) is uniformly bounded in L1 .
Hence, by the martingale convergence theorem, Wn is almost surely convergent.
By Lemma 7.4.1, the only way that Wn can converge is if it becomes constant, eventually.
Since each play results in a win (an increase) or a loss (a decrease), the only way Wn can become
eventually constant is if our gambler has lost all their money, i.e. for some (random) N ∈ N, for all
n ≥ N, we have Wn = 0. Thus:
Lemma 7.4.2 Almost surely, a roulette player will eventually lose all their money.
We now aim to prove Proposition 7.4.3, which means we must find out the distribution of M∞.
In order to do so, we will use Lemma 6.1.2, which tells us that Mn → M∞ in distribution, that is
P[Mn ≤ x] → P[M∞ ≤ x] as n → ∞,    (7.7)
for all x, except possibly those for which P[M∞ = x] > 0. We will begin with a surprising lemma
that tells us the distribution of Bn. Then, using (7.6), we will be able to find the distribution of
Mn, from which (7.7) will then tell us the distribution of M∞.
Lemma 7.4.5 For all k = 1, . . . , n + 1, it holds that P[Bn = k] = 1/(n + 1).
Proof: Let A be the event that the first k balls drawn are red, and the next j balls drawn are
black. Then,
P[A] = (1/2)(2/3) · · · (k/(k + 1)) · (1/(k + 2))(2/(k + 3)) · · · (j/(j + k + 1)).    (7.8)
Here, each fraction corresponds (in order) to the probability of getting a red/black ball (as ap-
propriate) on the corresponding draw, given the results of previous draws. For example, 1/2 is the
probability that the first draw results in a red ball, after which the urn contains 2 red balls and
1 black ball, so 2/3 is the probability that the second draw results in a red ball, and so on. From
(7.8) we have P[A] = j!k!/(j + k + 1)!.
Here is the clever part: we note that drawing k red balls and j black balls, in the first j + k
draws but in a different order, would have the same probability. We would simply obtain the
numerators in (7.8) in a different order. There are (j+k choose k) possible different orders (i.e. we
must choose k time-points from j + k times at which to pick the red balls). Hence, the probability
that we draw k red and j black in the first j + k draws is
(j+k choose k) · j!k!/(j + k + 1)! = ((j + k)!/(k!(j + k − k)!)) · (j!k!/(j + k + 1)!) = 1/(j + k + 1).
We set j = n − k, to obtain the probability of drawing (and thus adding) k red balls within the
first j + k = n draws. Hence P[Bn = k + 1] = 1/(n + 1) for all k = 0, 1, . . . , n. Setting k′ = k + 1,
we recover the statement of the lemma.
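Lemma 7.4.5 is easy to sanity-check by simulation. A minimal sketch (ours), reusing the urn simulation idea from Section 4.2:

import random
from collections import Counter

def red_count(n, rng):
    # Number of red balls B_n in the Polya urn after n draws (B_0 = 1).
    red, total = 1, 2
    for _ in range(n):
        if rng.random() < red / total:
            red += 1
        total += 1
    return red

rng = random.Random(1)
counts = Counter(red_count(9, rng) for _ in range(100_000))
for k in sorted(counts):  # Lemma 7.4.5 predicts P[B_9 = k] = 1/10 for k = 1, ..., 10
    print(k, counts[k] / 100_000)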
Proof: [Of Proposition 7.4.3.] Now that we know the distribution of Bn, we can use (7.6) to
find out the distribution of Mn. Lemma 7.4.5 tells us that Bn is uniformly distributed on the set
{1, . . . , n + 1}. Hence Mn is uniform on {1/(n + 2), 2/(n + 2), . . . , (n + 1)/(n + 2)}. This gives
us that, for x ∈ (0, 1),
P[Mn ≤ x] = (1/(n + 1)) ⌊x(n + 2)⌋.
Here ⌊x(n + 2)⌋, which denotes the integer part of x(n + 2), is equal to the number of elements
of {1/(n + 2), 2/(n + 2), . . . , (n + 1)/(n + 2)} that are ≤ x. Since x(n + 2) − 1 ≤ ⌊x(n + 2)⌋ ≤
x(n + 2) we obtain that
(x(n + 2) − 1)/(n + 1) ≤ P[Mn ≤ x] ≤ x(n + 2)/(n + 1)
and the sandwich rule tells us that limn→∞ P[Mn ≤ x] = x. By (7.7), this means that
P[M∞ ≤ x] = x for all x ∈ (0, 1).
Therefore, M∞ has a uniform distribution on (0, 1). This proves Proposition 7.4.3.
We could change the rules of our urn process, to create a new kind of urn – for example whenever
we draw a red ball we could add two new red balls, rather than just one. Urn processes are
typically very ‘sensitive’, meaning that a small change in the process, or its initial conditions,
can have a big effect on the long term behaviour. You can investigate some examples of this in
exercises 7.7, 7.8 and 7.9.
Urn processes are often used in models in which the ‘next step’ of the process depends on
sampling from its current state. Recall that in random walks we have Sn+1 = Sn + Xn+1 , where
Xn+1 is independent of Sn ; so for random walks the next step of the process is independent of its
current state. By contrast, in Pólya’s urn, whether the next ball we add is red (or black) depends
on the current state of the urn; it depends on which colour ball we draw.
For example, we might look at people choosing which restaurant they have dinner at. Consider
two restaurants, say Café Rouge (which is red) and Le Chat Noir (which is black). We think of
customers in Café Rouge as red balls, and customers in Le Chat Noir as black balls. A newly
arriving customer (i.e. a new ball that we add to the urn) choosing which restaurant to visit is
more likely, but not certain, to choose the restaurant which currently has the most customers.
Other examples include the growth of social networks (people tend to befriend people who
already have many friends), machine learning (used in techniques for becoming successively more
confident about partially learned information), and maximal safe dosage estimation in clinical
trials (complicated reasons). Exercise 7.8 features an example of an urn process that is the basis
for modern models of evolution (successful organisms tend to have more descendants, who are
themselves more likely to inherit genes that will make them successful).
Proof: In this case Mn = Zn . So, we have Zn → M∞ almost surely. The offspring distribution
is not deterministic, so for as long as Zn 6= 0 the value of Zn will continue to change over time.
Each such change in value is of magnitude at least 1, hence by Lemma 7.4.1 the only way Zn can
converge is if Zn is eventually zero. Therefore, since Zn does converge, in this case we must have
P[Zn dies out] = 1 (and M∞ = 0).
Sketch of Proof: Recall our graphical representation of the Galton-Watson process in Section
4.3. Here’s the key idea: when µ < 1, on average we expect each individual to have fewer children
than when µ = 1. Therefore, if the Galton-Watson process dies out when µ = 1 (as is the case,
from Lemma 7.4.6), it should also die out when µ < 1.
Our idea is not quite a proof, because the size of the Galton-Watson process Zn is random,
and not necessarily equal to its expected size E[Zn] = µ^n. To make the idea into a proof we will
need to find two Galton-Watson processes, Zn and Z̃n, such that 0 ≤ Zn ≤ Z̃n, with µ < 1 in Zn
and µ = 1 in Z̃n. Then, when Z̃n becomes extinct, so does Zn. This type of argument is known
as a coupling, meaning that we use one stochastic process to control another. See exercise 7.15
for how to do it.
Lemma 7.4.8 Suppose that µ > 1 and σ 2 < ∞. Then P[Zn → ∞] > 0.
Proof: The first step of the argument is to show that, for all n ∈ N,
E[M_{n+1}^2] = E[M_n^2] + σ^2/µ^{n+2}.    (7.10)
You can find this calculation as exercise 7.13. It is similar to the calculation that we did to find
E[Mn ] in Section 4.3, but a bit harder because of the square.
By iterating (7.10) and noting that E[M_0^2] = 1, we obtain that
E[M_{n+1}^2] = 1 + Σ_{i=1}^{n} σ^2/µ^{i+2}.
Hence,
E[M_{n+1}^2] ≤ 1 + (σ^2/µ^2) Σ_{i=1}^{∞} 1/µ^i,
which is finite because µ > 1. Therefore, (Mn ) is uniformly bounded in L2 , and the second
martingale convergence theorem applies. In particular, this gives us that
E[Mn ] → E[M∞ ].
Noting that E[Mn] = 1 for all n, we obtain that E[M∞] = 1. Hence, P[M∞ > 0] > 0. On the
event that M∞ > 0, we have Mn = Zn/µ^n → M∞ > 0. Since µ^n → ∞, this means that when
M∞ > 0 we must also have Zn → ∞. Hence, P[Zn → ∞] > 0.
The behaviour Zn → ∞ is often called ‘explosion’, reflecting the fact that (when it happens)
each generation tends to contain more and more individuals than the previous one. The case of
µ > 1 and σ 2 = ∞ behaves in the same way as Lemma 7.4.8, but the proof is more difficult and
we don’t study it in this course.
The Galton-Watson process can be extended in many ways. For example, we might vary the
offspring distribution used in each generation, or add a mechanism that allows individuals to
produce their children gradually across multiple time steps. The general term for such processes
is ‘branching processes’. Most branching processes display the same type of long-term behaviour
as we have uncovered in this section: if the offspring production is too low (on average) then they
are sure to die out, but, if the offspring production is high enough that dying out is not a certainty
then (instead) there is a chance of explosion to ∞.
Branching processes are the basis of stochastic models in which existing objects reproduce, or
break up, into several new objects. Examples are the spread of contagious disease, crushing rocks,
nuclear fission, tumour growth, and so on. In Chapter 19 (which is part of MAS61023 only) we
will use branching processes to model contagion of unpaid debts in banking networks.
Branching processes are used by biologists to model population growth, and they are good
models for populations that are rapidly increasing in size. (For populations that have reached
a roughly constant size, models based on urn processes, such as that of exercise 7.8, are more
effective.)
Note that we draw the picture as a continuous line, for convenience, but in fact the random walk
is jumping between integer values.
Next, let Sn be the asymmetric random walk, with p > q, so the walk drifts upwards. The
asymmetric random walk also experiences oscillations growing larger and larger, but the drift
upwards is strong enough that, in fact, Sn → ∞ almost surely as n → ∞, so the long-term
picture shows the walk escaping to +∞.
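A short simulation (ours, not part of the notes) makes the contrast between the two walks visible:

import random

def walk_endpoint(n, p, rng):
    # S_n for a random walk with up-probability p.
    return sum(1 if rng.random() < p else -1 for _ in range(n))

rng = random.Random(2)
print([walk_endpoint(10_000, 0.5, rng) for _ in range(5)])  # symmetric: oscillates on the scale of sqrt(n)
print([walk_endpoint(10_000, 0.6, rng) for _ in range(5)])  # asymmetric: close to n(p - q) = 2000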
Random walks can be extended in many ways, such as by adding an environment that influences
the behaviour of the walk (e.g. reflection upwards from the origin as in exercise 4.5, slower
movements when above 0, etc), or by having random walkers that walk around a graph (Z, in
our case above). More complex extensions involve random walks that are influenced by their own
history, for example by having a preference to move into sites that they have not yet visited.
In addition, many physical systems can be modelled by systems of several random walks, that
interact with each other, for example a setup with multiple random walks happening at the same
time (but started from different points in space) in which, whenever random walkers meet, both
of them suddenly vanish.
Random walks are the basis of essentially all stochastic modelling that involves particles moving
around randomly in space. For example, atoms in gases and liquids, animal movement, heat
diffusion, conduction of electricity, transportation networks, internet traffic, and so on.
In Chapters 15 and 16 we will use stock price models that are based on ‘Brownian motion’,
which is itself introduced in Chapter 11 and is the continuous analogue of the symmetric random
walk (i.e. no discontinuities). We’ll come back to the idea of atoms in gases and liquids in Section
11.1, and we’ll also briefly look at heat diffusion in Section 11.3.
7.2 Let Mn be a stochastic process. Let α, β ∈ R and let Cn and Dn be adapted stochastic
processes. Let Xn = αCn + βDn . Show that
(X ◦ M )n = α(C ◦ M )n + β(D ◦ M )n
for all n.
Sn = Σ_{i=1}^{n} Xi,
where we take S0 = 0.
(a) Show that Sn is a martingale, and deduce that there exists a real-valued random variable
S∞ such that Sn → S∞ almost surely as n → ∞.
(b) Show that, almost surely, there exists some N ∈ N such that Xn = 0 for all n ≥ N .
7.4 Write simulations of the symmetric & asymmetric random walks (in a language of your
choice). Add functionality to draw the random walk as a graph, with time on the horizontal
axis and the value of the walk on the vertical axis.
Look at several samples from your simulations, with e.g. 1000 steps of time, and check that
they support the claims made in Section 7.4, about the long-term behaviour of random
walks.
Modify your simulation to simulate the random walks in Exercise 7.3 and Question 2 of
Assignment 3. Check that your graphs support the result that, in both cases, Sn → S∞
almost surely. From your graphs, do you notice a difference in behaviour between these two
cases?
7.5 Let Mn = Sn −Ln be the martingale defined in exercise 4.5. Show that (Mn ) is not uniformly
bounded in L1 .
7.6 Recall the Galton-Watson process (Zn ) from Section 7.4, and recall that it is parametrized
by its offspring distribution G.
(a) Give an example of an offspring distribution G for which P[Zn dies out] = 1.
(b) Give an example of an offspring distribution G for which P[Zn dies out] = 0.
7.7 Consider the following modification of the Pólya urn process. At time n = 0, the urn contains
one red ball and one black ball. Then, at each time n = 1, 2, . . ., we draw a ball from the
urn. We place this ball back into the urn, and add one ball of the opposite colour to the urn;
so if we drew a red ball, we would add a black ball, and vice versa.
Therefore, at time n (which means: after the nth draw is complete) the urn contains n + 2
balls. Let Bn denote the number of red balls in the urn at time n, and let Mn = Bn/(n + 2)
denote the fraction of red balls in the urn at time n.
(a) Calculate E[Mn+1 | Fn ] and hence show that Mn is not a martingale, with respect to
the filtration Fn = σ(B1 , . . . , Bn ).
(b) Write a simulation of the urn (in a language of your choice) and use your simulation to
make a conjecture about the value of the almost sure limit of Mn as n → ∞. Does this
limit depend on the initial state of the urn?
7.8 Consider an urn that, at time n = 0, contains K ≥ 1 balls, each of which is either black or
red. At each time n = 1, 2, . . ., we do the following, in order:
1. Draw a ball X1 from the urn, and record its colour. Place X1 back into the urn.
2. Draw a ball X2 from the urn, and discard it.
3. Place a new ball, with the same colour as X1 , into the urn.
Thus, for all time, the urn contains exactly K balls. We write Mn for the fraction of red
balls in the urn, after the nth iteration of the above steps is complete.
This process is known as the discrete time ‘Moran model’, and is a model for the evolution
of a population that contains a fixed number K of individual organisms – represented as K
balls. At each time n, an individual X2 (is chosen and) dies and an individual X1 (is chosen
and) reproduces.
Although this model is a highly simplified version of reality, with careful enough application it
turns out to be very useful. For example, it is the basis for current methods of reconstructing
genealogical trees from data obtained by genome sequencing.
7.9 Consider the Pólya urn process from Section 7.4. Suppose that we begin our urn, at time
n = 0, with two red balls and one black ball. Let Mn denote the resulting fraction of red
balls in the urn at time n.
(a) Show that Mn does not converge almost surely to 0.
(b) Write a simulation of the Pólya urn process (in a language of your choice) and compare
the effect of different initial conditions on M∞ .
7.10 Let (Sn ) denote the simple asymmetric random walk, with q > p. In Exercise 4.2 we showed
that
Mn = (q/p)^{Sn}
is a martingale.
(a) Show that there exists a real valued random variable M∞ such that Mn → M∞ almost surely.
(b) Deduce that P[M∞ = 0] = 1 and that (Mn) is not uniformly bounded in L2.
(c) Use (a) and (b) to show that Sn → −∞ almost surely. Explain briefly why this means that, for the walk with p > q, we have Sn → ∞ almost surely.
7.11 Let Sn denote the symmetric random walk, from Section 7.4. Recall that S0 = 0.
(a) Show that Sn is even when n is even, and odd when n is odd.
(b) Define pn = P[Sn = 0]. Show that p_{2n} = (2n choose n) 2^{−2n} and p_{2n+1} = 0.
(Hint: Count the number of ways to return to zero after precisely 2n steps.)
(c) Show that p_{2(n+1)} = (1 − 1/(2(n+1))) p_{2n} for all n, and hence show that p_{2n} → 0 as n → ∞.
(Hint: Use the inequality 1 − x ≤ e^{−x}, which holds for all x ≥ 0.)
7.12 Let Sn denote the symmetric random walk, from Section 7.4. Let f : N → N be a deterministic function.
Challenge questions
7.13 In this question we establish the formula (7.10), which we used in the proof of Lemma 7.4.8.
Let (Zn ) be the Galton-Watson process, with the offspring distribution G. Suppose that
E[G] = µ and var(G) = σ^2 < ∞. Set Mn = Zn/µ^n.
7.14 In this question we prove the inequality (7.4), which we used in the proof of the martingale
convergence theorem.
Let Mn be a sequence of random variables such that Mn → M∞ almost surely. Define
Xn = inf_{k≥n} |Mk|.
(a) Explain why (Xn) is an increasing sequence and, hence, why there is a random variable X∞ such that Xn → X∞ almost surely.
(b) Show that, for all ε > 0 and all n ∈ N there exists some n0 ≥ n such that
|Mn0| − ε ≤ Xn ≤ |Mn|.
Define Z̃n by setting Z̃0 = 1, and then using (7.9) with Z̃n in place of Zn and X̃_i^{n+1} in place of X_i^{n+1}. Define f : [0, 1] → R by f(α) = E[X̃_i^{n+1}].
(a) Convince yourself that Z̃n is a Galton-Watson process, with offspring distribution given by (7.11).
(b) Explain briefly why 0 ≤ Zn ≤ Z̃n for all n.
(c) Show that f (0) < 1 and f (1) ≥ 1. Deduce that there exists a value α ∈ [0, 1] such that
f (α) = 1.
(d) Show that P[Zn dies out] = 1.
Chapter 8
In this chapter we develop some further tools for analysing stochastic processes: the dominated
convergence theorem, the optional stopping theorem and the strong Markov property. In chapters
8 and 9 we will write
min(s, t) = s ∧ t, max(s, t) = s ∨ t.
This is common notation in the field of stochastic processes.
Note that this whole chapter is marked with a (∆) – it is for independent study in MAS61023
but it is not part of MAS352.
Theorem 8.1.1 (Dominated Convergence Theorem) Let X and (Xn) be random variables. Suppose that:
1. Xn → X in probability.
2. There exists a random variable Y ∈ L1 such that, for all n, |Xn| ≤ |Y| almost surely.
Then X ∈ L1 and E[Xn] → E[X] as n → ∞.
The random variable Y is often known as the dominating function or dominating random variable.
You can find a proof of the theorem in MAS31002/61022.
Example 8.1.2 Let Z ∼ N(µ, σ^2). Let X ∈ L1 be any random variable and let
Xn = X + Z/n.
We can think of Xn as a noisy measurement of the random variable X, where the noise term Z/n becomes smaller as n → ∞.
Let us check the first condition of the theorem. Note that |Xn − X| = |Z|/n, which tends to zero as n → ∞ because |Z| < ∞. Hence Xn → X almost surely as n → ∞, which by Lemma 6.1.2 implies Xn → X in probability.
Let us now check the second condition, with Y = |X| + |Z|. Then E[Y] = E[|X|] + E[|Z|], which is finite since X ∈ L1 and Z ∈ L1. Hence, Y ∈ L1. We have |Xn| ≤ |X| + (1/n)|Z| ≤ Y.
Therefore, we can apply the dominated convergence theorem and deduce that E[Xn ] → E[X]
as n → ∞. Of course, we can also calculate E[Xn] = E[X] + µ/n, and check that the result really is true.
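The convergence in Example 8.1.2 is also easy to check by simulation. Below is a minimal sketch in R (the language used for the simulation exercises in these notes); the distributions of X and Z here are illustrative assumptions, chosen so that E[X] = 1 and µ = 2.

# Monte Carlo check of Example 8.1.2: X_n = X + Z/n, so E[X_n] -> E[X].
# Illustrative assumptions: X ~ Exp(1), so E[X] = 1, and Z ~ N(2, 1), so mu = 2.
set.seed(1)
N <- 100000                              # number of samples per estimate
X <- rexp(N, rate = 1)
Z <- rnorm(N, mean = 2, sd = 1)
for (n in c(1, 10, 100, 1000)) {
  cat("n =", n, " estimated E[X_n] =", mean(X + Z / n), "\n")
}
# The estimates approach E[X] = 1 at rate mu/n, exactly as the example predicts.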
In the above example, we could calculate E[Xn] = E[X] + µ/n and notice that E[Xn] → E[X],
without using any theorems at all. The real power of the dominated convergence theorem is
in situations where we don’t specify, or can’t easily calculate, the means of Xn or X . See, for
example, exercises 8.1 and 8.2, or the applications in Section 8.2.
If our sequence of random variables (Xn) has |Xn| ≤ c for some deterministic constant c, then the dominating function can be taken to be (the deterministic random variable) c. This is a very common application; see e.g. exercise 8.1.
Remark 8.1.3 The dominated convergence theorem holds for conditional expectation too; that is, with E[·] replaced by E[· | G]. We won't need this result as part of our course.
Definition 8.2.1 A map T : Ω → {0, 1, 2, . . . , ∞} is called a (Fn ) stopping time if, for all n, the
event {T ≤ n} is Fn measurable.
A stopping time is a random time with the property that, using only the information in Fn that is available to us at time n, we can decide whether or not T has already happened.
It is straightforward to check that T is a stopping time if and only if {T = n} is Fn measurable
for all n. This is left for you as exercise 8.4.
Example 8.2.2 Let Sn = ∑_{i=1}^n Xi be the simple symmetric random walk, with P[Xi = 1] = P[Xi = −1] = 1/2, and let a ∈ Z. Then
T = inf{n ≥ 0 ; Sn = a},
which is the first time Sn takes the value a, is a stopping time. It is commonly called the 'hitting time' of a. To see that T is a stopping time we note that
{T ≤ n} = {for some i ≤ n we have Si = a} = ⋃_{i=0}^n {Si = a} = ⋃_{i=0}^n Si^{−1}(a),
and each Si^{−1}(a) = {Si = a} is an element of Fn (because Si is Fn measurable for i ≤ n), hence {T ≤ n} ∈ Fn.
You will have seen hitting times before – for most of you, in MAS2003 in the context of Markov
chains. You might also recall that the random walk Sn can be represented as a Markov chain on Z. You will probably have looked at a method for calculating expected hitting times by solving
systems of linear equations. This approach is feasible if the state space of a Markov chain is finite;
however Z is infinite. In this section we look at an alternative method, based on martingales.
Recall the notation ∧ and ∨ that we introduced at the start of this chapter. That is, min(s, t) =
s ∧ t and max(s, t) = s ∨ t.
Lemma 8.2.3 Let S and T be stopping times with respect to the filtration (Fn ). Then S ∧ T is
also a (Fn ) stopping time.
M^T_n = M_{n∧T}.
Here a ∧ b denotes the minimum of a and b. To be precise, this means that M^T_n(ω) = M_{n∧T(ω)}(ω) for all ω ∈ Ω. In Example 8.2.2, S^T would be the random walk S which is stopped (i.e. never moves again) when (if!) it reaches state a.
The last equality holds because the sum is telescoping (i.e. the middle terms all cancel each other
out). Hence, by Theorem 7.1.1, if M is a martingale (resp. submartingale, supermartingale),
C ◦ M is also a martingale (resp. submartingale, supermartingale).
Theorem 8.2.5 (Doob's Optional Stopping Theorem) Let M be a martingale (resp. submartingale, supermartingale) and let T be a stopping time. Then
E[MT] = E[M0] (resp. E[MT] ≥ E[M0], E[MT] ≤ E[M0]),
provided that at least one of the following conditions holds:
(a) T is bounded.
(b) P[T < ∞] = 1 and there exists c ∈ R such that |Mn | ≤ c for all n.
(c) E[T ] < ∞ and there exists c ∈ R such that |Mn − Mn−1 | ≤ c for all n.
Proof: We’ll prove this for the supermartingale case. The submartingale case then follows by
considering −M , and the martingale case follows since martingales are both supermartingales and
submartingales.
Note that
E[Mn∧T − M0 ] ≤ 0, (8.1)
because M T is a supermartingale, by Lemma 8.2.4. For (a), we take n = supω T (ω) and the
conclusion follows.
For (b), we use the dominated convergence theorem to let n → ∞ in (8.1). As n → ∞, almost
surely n ∧ T (ω) is eventually equal to T (ω) (because P[T < ∞] = 1), so Mn∧T → MT almost
surely. Since M is bounded, Mn∧T and MT are also bounded. So E[Mn∧T ] → E[MT ] and taking
limits in (8.1) obtains E[MT − M0 ] ≤ 0, which in turn implies that E[MT ] ≤ E[M0 ].
For (c), we will also use the dominated convergence theorem to let n → ∞ in (8.1), but now
we need a different way to check its conditions. We observe that
|M_{n∧T} − M0| = |∑_{k=1}^{n∧T} (Mk − M_{k−1})| ≤ T sup_{n∈N} |Mn − M_{n−1}|.
Since E[T (sup_n |Mn − M_{n−1}|)] ≤ cE[T] < ∞, we can use the dominated convergence theorem to let n → ∞, and the result follows as in (b).
These three sets of conditions (a)-(c) for the optional stopping theorem are listed on the formula
sheet, in Appendix B. Sometimes none of them apply! See Remark 9.3.5 and the lemma above it
for a warning example.
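To see why some condition is genuinely needed, consider the simple symmetric random walk (Sn) and T = inf{n ; Sn = 1}: then E[ST] = 1 ≠ 0 = E[S0]. The following minimal sketch in R (the sample sizes are illustrative assumptions) shows the stopped martingale keeping mean zero for each fixed n, even though ST = 1 almost surely.

# For the simple symmetric random walk and T = inf{n : S_n = 1}, we have
# E[S_{n ^ T}] = 0 for every fixed n, yet S_T = 1 almost surely. The failure
# of E[S_T] = E[S_0] is exactly what conditions (a)-(c) guard against.
set.seed(2)
stopped.value <- function(n) {
  s <- 0
  for (i in 1:n) {
    if (s == 1) return(1)        # T has already happened; the walk is stopped
    s <- s + sample(c(-1, 1), 1)
  }
  s                              # value of S_{n ^ T}
}
mean(replicate(10000, stopped.value(100)))  # close to 0 = E[S_0], far from 1 = E[S_T]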
Definition 8.3.1 Let (Fn ) be a filtration and let T be a stopping time. Define
FT = {A ∈ F ; A ∩ {T ≤ n} ∈ Fn for all n}.
FT is often known as the σ-field obtained by stopping (Fn) at time T. If (Fn) is the generated filtration of (Xn), then FT holds the information obtained from (X1, X2, . . . , XT). We
will often need to use our intuition when dealing with this concept, but let us formally prove that
FT is a σ -field.
Lemma 8.3.2 Let (Fn) be a filtration and let T be a stopping time. Then FT is a σ-field.
Proof: We must check the three conditions of Definition 2.1.1. Firstly, for all n ∈ N we have
∅∩{T ≤ n} = ∅ ∈ Fn , so ∅ ∈ FT . Secondly, if A ∈ FT then A∩{T ≤ n} ∈ Fn , and {T ≤ n} ∈ Fn
because T is a stopping time. Hence
(Ω \ A) ∩ {T ≤ n} = {T ≤ n} \ (A ∩ {T ≤ n}) ∈ Fn,
so Ω \ A ∈ FT. Thirdly, if Am ∈ FT for all m ∈ N then (⋃_{m=1}^∞ Am) ∩ {T ≤ n} = ⋃_{m=1}^∞ (Am ∩ {T ≤ n}) ∈ Fn, so ⋃_{m=1}^∞ Am ∈ FT.
The key point is this: suppose that we are interested in the distribution of (Xm , Xm+1 , Xm+2 , . . .).
This might depend on the value of Xm , but if we know the value of Xm then values of X1 , . . . , Xm−1
(alongside any other information in Fm ) provide us with no extra information. Heuristically, the
future of (Xm ) might depend on its current value, but if we know that current value then we can
ignore the past.
The Markov property is very natural. We might even argue that reality itself, with its gen-
erated σ -field, has the Markov property, but this is a philosophical question best discussed else-
where. Most stochastic processes that are used in modelling are Markov processes. It is also
natural to replace the time m by a stopping time T . The strong Markov property states that
(XT , XT +1 , XT +2 , . . .) has the same distribution given the information FT as it would do if all
we knew was σ(XT ).
We will need the strong Markov property for random walks within Chapter 9. However, for
random walks we can say something even stronger: after a stopping time T, a random walk
essentially restarts from its current location, and from then on behaves just like the same random
walk, with its future movements being independent of its past.
We need some notation to state this precisely. Let Sn = ∑_{i=1}^n Xi where the (Xi) are independent identically distributed random variables; this covers all the examples of random walks that we will study in Chapter 9.
Theorem 8.4.1 Let T be a stopping time. The following two processes have the same distribution:
Moreover, in either case, the distribution of (ST , ST +1 , ST +2 , . . .) is that of a random walk begun
at ST and incremented by an independent copy of Xi at each step of time.
More formally, Theorem 8.4.1 states that for all n ∈ N and any function f : R → R we have
where (S′n) is an independent copy of (Sn), with identical distribution. Strictly, this equation
only holds provided that f (·) ∈ L1 in each case, but we will always take f to be a nice enough
function that this is not a problem.
It is often more useful to apply the strong Markov property in words, and this is common
practice in probability theory. We’ll do it that way in Chapter 9. We won’t include a proof of
Theorem 8.4.1 in this course. It isn’t hard to prove, but it is a bit tedious, and hopefully you can
believe the result without needing a proof.
The next result may appear surprising at first. The key point is that the Fn are assumed to be
independent, which means that they have no information in common. Consequently (8.2) implies
that T contains no information.
Theorem 8.5.1 (Kolmogorov’s 0-1 law) Let (Fn ) be a sequence of independent σ -fields and
let T be the associated tail σ -field. If A ∈ T then P[A] = 0 or P[A] = 1.
Proof: Let A ∈ T . Then A ∈ σ(Fn+1 , Fn+2 , . . .) for all n ∈ N. This means that A is
independent of σ(F1 , . . . , Fn ), for all n. It follows that A is independent of σ(Fn ; n ∈ N).
However, from (8.2) we have that T ⊆ σ(Fn ; n ∈ N), which means that A is independent of
T . Hence A is independent of A (this is not a typo!), which means that P[A] = P[A ∩ A] =
P[A]P[A] = P[A]^2. The only solutions of the equation x^2 = x are 0 and 1, hence P[A] = 0 or P[A] = 1.
Remark 8.5.2 Let us assume the same independence assumption as Theorem 8.5.1, and suppose
that X ∈ mT . Then {X ≤ x} ∈ T for all x ∈ R, so Theorem 8.5.1 gives that P[X ≤ x] is either
0 or 1 for each x ∈ R. A bit of analysis shows that if we set c = inf{x ∈ R ; P[X ≤ x] = 1} then
in fact P[X = c] = 1. Therefore, any random variable that is T measurable is almost surely equal
to a constant.
Xn = X 1{|X| ≤ n}, that is, Xn = X if |X| ≤ n and Xn = 0 otherwise.
8.3 Let Z be a random variable taking values in [1, ∞) and for n ∈ N define
Xn = Z 1{Z ∈ [n, n+1)}, that is, Xn = Z if Z ∈ [n, n + 1) and Xn = 0 otherwise. (8.3)
(a) Suppose that Z ∈ L1 . Use the dominated convergence theorem to show that E[Xn ] → 0
as n → ∞.
(b) Suppose, instead, that Z is a continuous random variable with probability density function f(x) = x^{−2} for x ≥ 1 (and f(x) = 0 otherwise), and define Xn using (8.3). Show that Z is not in L1, but that E[Xn] → 0.
(c) Comment on what part (b) tells us about the dominated convergence theorem.
8.5 Let (Fn ) be a filtration and let (Sn ) be an adapted process, with values in Z and initial
value S0 = 0. Show that R = inf{n ≥ 1 ; Sn = 0} is a stopping time.
Note: R is known as the first return time of (Sn ) to 0.
8.6 Let S and T be stopping times with respect to the same filtration. Show that S + T is also
a stopping time.
8.7 Let (Fn ) be a filtration. Let S and T be stopping times with S ≤ T . Show that FS ⊆ FT .
8.8 Recall the Pólya urn process from Section 4.2. Recall that Bn is the number of black balls
in the urn at time n, and that Mn = Bn/(n + 2) is the fraction of black balls in the urn at time n.
Let T be the first time at which a black ball is drawn from the urn.
8.9 This question is a continuation of exercise 7.8 – recall the urn process from that exercise.
Let T be the first time at which the urn contains only balls of one colour, and suppose that
initially the urn contains r red balls and b black balls. Recall that Mn denotes the fraction
of red balls in the urn at time n.
8.10 Let m ∈ N and m ≥ 2. At time n = 0, an urn contains 2m balls, of which m are red and
m are blue. At each time n = 1, . . . , 2m we draw a single ball from the urn; we discard
this ball and do not replace it. We continue until the urn is empty. Therefore, at time
n ∈ {0, . . . , 2m} the urn contains 2m − n balls.
Let Nn denote the number of red balls remaining in the urn at time n. For n = 0, . . . , 2m−1
let
Pn = Nn/(2m − n)
be the fraction of red balls remaining after time n.
(a) Show that Pn is a martingale, with respect to a natural filtration that you should specify.
(b) [Challenge question] Let T be the first time at which we draw a red ball. Note that
a (T + 1)st ball will be drawn, because the urn initially contains at least two red balls.
Show that the probability that the (T + 1)st ball is red is 1/2.
Chapter 9
In Section 7.4 we studied the long-term behaviour of the Pólya urn and the Galton-Watson process,
by taking a limit as the time parameter tended to infinity of a suitably chosen martingale. The
following result shows that these techniques do not (or at least, not directly) explain how random
walks might behave. In this chapter we will focus on the case of simple random walks, which
means that they move precisely one unit of space, upwards or downwards, in each step of time.
Lemma 9.0.1 Let Sn = ∑_{i=1}^n Xi where P[Xi = 1] = p and P[Xi = −1] = 1 − p, and the (Xi)
are independent of each other. Then (Sn ) does not converge almost surely to a real valued random
variable and (Sn ) is not uniformly bounded in L1 .
Proof: The process (Sn ) is integer valued. From Lemma 7.4.1 we have that, if (Sn ) were
to converge almost surely, it would have to eventually become constant. By definition we have
|Sn+1 − Sn| = 1, so this cannot happen, hence (Sn) does not converge almost surely.
In the case p = 1/2 we have that (Sn) is a martingale. If (Sn) was also uniformly bounded in L1
then the martingale convergence theorem would imply almost sure convergence, so from what we
have already proved (Sn) is not uniformly bounded in L1. For the case p ≠ 1/2, we have shown in (4.3) that Mn = Sn − n(p − q), where q = 1 − p, is a martingale, hence E[Sn] = n(p − q) and therefore |E[Sn]| → ∞.
By the absolute values property of expectations |E[Sn ]| ≤ E[|Sn |], hence E[|Sn |] → ∞, so (Sn ) is
not uniformly bounded in L1 .
In fact, as we mentioned briefly (and without proof) in Section 7.4, random walks exhibit a
mixture of oscillatory behaviour and divergence to ±∞. We will study this behaviour here, as
well as considering some other closely related properties of random walks. The techniques we will
use are primarily those developed in Chapter 8.
This chapter will make use of some of the exercises from earlier chapters, including within the
proofs. Make sure that you study these too, when they appear, if you have not done so already.
As a reminder: the solutions to all of the end-of-chapter exercises can be found within the online
version of the lecture notes.
Set Fn = σ(X1 , . . . , Xn ) and note that (Fn ) is the natural filtration of Sn . Recall that we showed
in Section 4.1 that the stochastic process Sn − n(p − q) was a martingale. In Exercise 4.2 we
found another martingale associated to (Sn ), in particular we showed that
Mn = (q/p)^{Sn}
is a martingale.
Our plan is to use the optional stopping theorem, applied to the martingale (Mn ), to obtain
information about the hitting times of the asymmetric random walk. Let Ta = inf{n : Sn = a}
and T = Ta ∧ Tb for integer a < 0 < b. We aim to calculate P[T = Ta ] and P[T = Tb ]. We can
show that Ta is a stopping time by noting that
{Ta ≤ n} = ⋃_{i=0}^n {Si ≤ a}.
Similarly, Tb is a stopping time and it follows from Lemma 8.2.3 that T is also a stopping time.
We now look to apply the optional stopping theorem, using the (b) conditions. For this, we’ll
need the following lemma.
Lemma 9.1.1 In the setting above, we have E[T] < ∞; in particular P[T < ∞] = 1.
Proof: Let m = b − a. Divide up the random variables (Xn) into consecutive blocks A1, A2, . . . of length m:
A1 = (X1, . . . , Xm), A2 = (Xm+1, . . . , X2m), A3 = (X2m+1, . . . , X3m), and so on. (9.1)
For each k, let Ek be the event that every step in block Ak is upwards; since m = b − a, if the walk has not already left [a, b] then m consecutive upward steps force it to do so.
We can think of the sequence of events E1 , E2 , . . . as one chance after another that our random
walker has to exit [a, b].
By independence, P[Ek] = p^m. Hence, the random variable K = inf{k ∈ N ; Ek occurs} is a geometric random variable with success parameter p^m. This means that K < ∞ almost surely and that E[K] = p^{−m} < ∞. By definition of K, the event EK occurs, so T ≤ Km and by monotonicity of E we have E[T] ≤ mE[K] < ∞.
From above, we have that M is a martingale. Hence by Lemma 8.2.4, M T is also a martingale.
By definition of T, we have a ≤ S_{n∧T} ≤ b, hence (q/p)^a ≤ M^T_n ≤ (q/p)^b for all n, hence M^T is a bounded martingale. Lemma 9.1.1 implies that P[T < ∞] = 1, so we
have that condition (b) of the optional stopping theorem holds for the martingale M T and the
stopping time T . Therefore,
E[M^T_T] = E[M^T_0],
but M^T_T = M_{T∧T} = MT and M^T_0 = M_{0∧T} = M0 = 1. So we have
E[MT ] = 1.
Our next aim is to calculate the probabilities P[T = Ta ] and P[T = Tb ]. That is, we want to know
which of the two boundaries a and b we actually hit at time T (for example, {T = Ta } = {ST = a}
is the event that we hit a at time T ).
Since P[T < ∞] = 1, we must hit one or other boundary, so we have that
P[T = Ta] + P[T = Tb] = 1. (9.2)
Moreover,
1 = E[MT] (9.3)
= E[MT 1{T = Ta}] + E[MT 1{T = Tb}]
= E[(q/p)^a 1{T = Ta}] + E[(q/p)^b 1{T = Tb}]
= (q/p)^a P[T = Ta] + (q/p)^b P[T = Tb]. (9.4)
Solving the linear equations (9.2) and (9.4), recalling that p ≠ q, gives that
P[T = Ta] = ((q/p)^b − 1) / ((q/p)^b − (q/p)^a) (9.5)
and therefore also
P[T = Tb] = 1 − P[T = Ta] = (1 − (q/p)^a) / ((q/p)^b − (q/p)^a). (9.6)
Note that this formula does not make sense if p = q = 1/2. In that case our two linear equations above are in fact the same equation, so we cannot solve them to find P[T = Ta] and P[T = Tb].
However, all is not lost: the case of the symmetric random walk can be handled in similar style to
above, but we need a different martingale in place of (Mn ). This is left for you to do, with some
hints along the way, in Exercise 9.2.
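Formulae (9.5) and (9.6) are easy to test by simulation. Below is a minimal sketch in R; the values of p, a and b are illustrative assumptions.

# Estimate P[T = T_a] for the asymmetric walk and compare with (9.5).
set.seed(4)
p <- 0.4; q <- 1 - p; a <- -3; b <- 5    # note q > p, as in this section
hits.a.first <- function() {
  s <- 0
  repeat {
    s <- s + sample(c(1, -1), 1, prob = c(p, q))
    if (s == a) return(TRUE)
    if (s == b) return(FALSE)
  }
}
estimated <- mean(replicate(10000, hits.a.first()))
exact <- ((q / p)^b - 1) / ((q / p)^b - (q / p)^a)
c(estimated = estimated, exact = exact)  # the two values should agree closely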
is a random walk, can sometimes be expressed using formulae involving binomial coefficients. Stirling's approximation,
n! ∼ √(2πn) (n/e)^n, (9.7)
is helpful to calculate the limits of such quantities as n → ∞.
It is also helpful to introduce some notation from analysis that you may not have seen before:
if (an ) and (bn ) are real valued sequences then we write
an ∼ bn to mean that lim_{n→∞} an/bn = 1.
We will use this notation throughout the remainder of Chapter 9.
Let (Sn) denote the simple symmetric random walk. For k ∈ Z let
Tk = inf{n ≥ 0 ; Sn = k}
be the hitting time of k, and let
R = inf{n ≥ 1 ; Sn = 0}
be the first return time to zero. We set R = ∞ if the walk does not return to zero, in keeping
with the convention that inf ∅ = ∞. In Example 8.2.2 we showed that Tk is a stopping time, and
in Exercise 8.5 we showed that R is a stopping time.
Lemma 9.3.1 We have P[R < ∞] = 1.
Proof: Note that P[R < ∞] ∈ [0, 1]. We will argue by contradiction. Suppose that P[R <
∞] = p < 1. By the strong Markov property, the process (SR+n )n≥0 has the same distribution as
(Sn ). In words, each time the random walk returns to zero the process restarts, and from then on
acts like a simple symmetric random walk, independent of the past. Let G = ∑_{n=1}^∞ 1{Sn = 0}, which in words counts the number of times (Sn) returns to zero. By the strong Markov property, applied repeatedly at each return, G has a geometric distribution, P[G = j] = p^j (1 − p) for all j ∈ {0, 1, . . .}. In particular, E[G] < ∞.
We will now calculate E[G] using a different method. We have
E[G] = ∑_{n=1}^∞ E[1{Sn = 0}] = ∑_{n=1}^∞ P[Sn = 0]. (9.8)
In the first equality above we have used Exercise 6.8 to swap the infinite sum and expectation.
Calculating P[Sn = 0] is Exercise 7.11; it is zero when n is odd, which leaves us only to consider
P[S_{2n} = 0] = (2n choose n) 2^{−2n} = ((2n)! / (n!)^2) 2^{−2n} ∼ (√(4πn) (2n/e)^{2n} / (2πn (n/e)^{2n})) 2^{−2n} = 1/√(πn). (9.9)
In the above we have used Stirling's approximation, in particular equation (9.7), to simplify the binomial coefficients as n → ∞. From (9.9) we obtain that √(πn) P[S_{2n} = 0] → 1 as n → ∞. Hence there exists some N ∈ N such that for all n ≥ N we have P[S_{2n} = 0] ≥ 1/(2√(πn)). Putting this
back into (9.8) we have
E[G] ≥ ∑_{n=N}^∞ P[S_{2n} = 0] ≥ ∑_{n=N}^∞ 1/(2√(πn)) = ∞.
We had deduced above that E[G] was finite, so we have reached a contradiction. Hence in fact
P[R < ∞] = 1.
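The asymptotic (9.9) is also easy to check numerically. A minimal sketch in R, with n and the sample size as illustrative assumptions:

# Check (9.9): P[S_{2n} = 0] is approximately 1/sqrt(pi * n).
set.seed(5)
n <- 50
endpoints <- replicate(100000, sum(sample(c(-1, 1), 2 * n, replace = TRUE)))
c(estimated = mean(endpoints == 0),
  exact = choose(2 * n, n) * 2^(-2 * n),
  stirling = 1 / sqrt(pi * n))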
Lemma 9.3.2 For all k ∈ Z we have P[Tk < ∞] = 1.
Proof: Note that T0 = 0, because S0 = 0. We next consider T1. From Lemma 9.3.1 we have P[R < ∞] = 1, hence by the strong Markov property (applied at successive returns to zero) we have that (Sn) visits zero infinitely many times.
After each visit to zero, by the strong Markov property, the next move of the walk will be to 1 with probability 1/2, and to −1 with probability 1/2. Each time, the move will be independent of the past. Hence, we need to wait a Geometric(1/2) number of visits before we see a move to 1. Since this only requires finitely many attempts, we have P[T1 < ∞] = 1.
Let us now consider Tk for k ∈ N. To reach k we need to move from 0 to 1, then from 1 to
2, and so on until k . By the strong Markov property applied successively at T1 , then T2 , then T3
and so on, each such move takes the same number of steps (in distribution) as an independent copy of T1. Hence the distribution of Tk is that of the sum of k i.i.d. copies of T1. Since P[T1 < ∞] = 1 it
follows immediately that P[Tk < ∞] = 1.
Lastly, we consider k < 0. By symmetry of the walk about zero, Tk and T−k have the same
distribution, so P[T−k < ∞] = P[Tk < ∞] = 1.
Theorem 9.3.3 Almost surely, for all k ∈ Z there are infinitely many n ∈ N such that Sn = k .
Proof: Since there are only countably many k ∈ Z, it suffices to prove the result for some fixed
(but arbitrary) k ∈ Z. From Lemma 9.3.2 we have P[Tk < ∞] = 1, so almost surely the walk
reaches k , in fact STk = k . By the strong Markov property, after time Tk the walk continues to
behave like a simple symmetric random walk, independent of its past. Lemma 9.3.1 gives that
P[R < ∞] = 1, meaning that the walk will almost surely return (again) to k . Repeating this
argument, we obtain that almost surely the walk visits k infinitely many times.
How long does the simple symmetric random walk take to reach 1?
Let us focus for a moment on the time T1 . Even though we now know that P[T1 < ∞] = 1, we are
not able to apply the optional stopping theorem to (Sn ) and T1 , because (Sn ) is unbounded and
we do not know if E[T1 ] < ∞. In fact, in this case E[T1 ] = ∞ and the optional stopping theorem
does not apply. Instead, we can argue by contradiction and use the optional stopping theorem to
deduce that E[T1 ] = ∞, as follows.
Lemma 9.3.4 We have E[T1] = ∞.
Proof: Suppose that E[T1] < ∞. Then we could apply the optional stopping theorem to (Sn) and T1 using condition (c). This would give E[S0] = 0 = E[S_{T1}]. But, from Lemma 9.3.2 we
have P[T1 < ∞] = 1 which means, by definition of T1 , that ST1 = 1 almost surely. This is a
contradiction.
Remark 9.3.5 More generally, we have to be very careful about using the optional stopping
theorem. The conditions fail in many situations, and there are many examples of stopping times
T and martingales (Mn ) for which E[MT ] 6= E[M0 ].
From Lemmas 9.3.2 and 9.3.4 we have that P[T1 < ∞] = 1 but E[T1] = ∞. This tells us that
although the time T1 is always finite, it can be very large. You might find this counter-intuitive
at first glance since it is clear that P[T1 = 1] = 1/2 by just considering the first step, but the point
is that once the walk has (by chance) made a few steps downwards, it starts to take a very long
time to find its way back up again. In the next lemma, with a careful calculation we will find out
the exact distribution of T1 . Note that T1 is odd, as the value of (Sn ) is odd/even depending on
whether n is odd/even.
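Simulation makes the combination of P[T1 < ∞] = 1 with E[T1] = ∞ quite vivid. A minimal sketch in R; the cap on the walk length is an illustrative assumption, needed because excursions below zero can be extremely long.

# Sample T_1 for the simple symmetric random walk. The median is tiny, but the
# sample mean is dominated by rare, enormous values and never settles down.
set.seed(6)
sample.T1 <- function(cap = 100000) {
  s <- 0; n <- 0
  while (s < 1 && n < cap) {
    s <- s + sample(c(-1, 1), 1)
    n <- n + 1
  }
  n
}
T1 <- replicate(1000, sample.T1())
c(median = median(T1), mean = mean(T1))  # the mean keeps growing as cap and sample size grow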
From the above calculation, and using the symmetry of the random walk about zero, we have that
P[T1 ≤ 2n − 1] = 2P[S_{2n} > 0] = P[S_{2n} > 0] + P[S_{2n} < 0] = 1 − P[S_{2n} = 0]. (9.11)
The second line of the above follows from (9.11), and the last line follows from a short calculation
that is left for you. The last claim of the lemma follows from the same calculation as in (9.9).
The calculation that led to (9.10) is known as a ‘reflection principle’. It applies the strong
Markov property at Tk (with k = 1, in this case), and then divides the future into two cases based
on whether the first move is upwards or downwards. The distributions of the walk in these two cases are mirror images of one another, i.e. reflections about height k.
P[Xi = 1] = p > 1/2 and P[Xi = −1] = 1 − p < 1/2. The case p < 1/2 can be handled by considering
(−Sn ) in place of (Sn ). Let R = inf{n ≥ 1 ; Sn = 0} be the first return time of (Sn ) to zero. As
before, we set R = ∞ if the walk does not return to zero.
Like the symmetric random walk, the asymmetric case sees oscillations, but they are dominated by the drift upwards, so much so that (Sn) might not return to the origin even once.
Lemma 9.4.1 We have Sn → ∞ almost surely, and P[R < ∞] < 1.
Proof: We will first prove that Sn → ∞ almost surely. In fact, we have already proved this using a martingale convergence argument in Exercise 7.10. We'll give a different proof here, using the
strong law of large numbers. We have Sn = ∑_{i=1}^n Xi, where the Xi are i.i.d. random variables, so the strong law of large numbers implies that Sn/n → E[X1] almost surely as n → ∞. Note that E[X1] = p(1) + (1 − p)(−1) = 2p − 1. It follows that there exists (a random variable) N ∈ N such that for all n ≥ N we have Sn/n ≥ (2p − 1)/2, which implies Sn ≥ n(2p − 1)/2. Since 2p − 1 > 0 we therefore have Sn → ∞ almost surely. In particular this means that P[(Sn) visits zero infinitely many times] = 0.
If P[R < ∞] was equal to 1 then, by applying the strong Markov property at each successive
return time to zero, we would have that almost surely (Sn ) visited the origin infinitely many
times. We have shown above that this is not the case. Hence P[R < ∞] < 1.
Remark 9.4.2 By the strong Markov property and the same argument as in the proof of Lemma
9.3.1 (except that there it was part of a proof by contradiction) the number of times that (Sn )
returns to the origin is a Geometric(p̂) random variable, where p̂ = P[R < ∞].
If we take Tk to be the hitting time of k ∈ Z, then it follows immediately from Lemma 9.4.1
that P[Tk < ∞] = 1 for k ≥ 0 and P[Tk < ∞] < 1 for k < 0. In the symmetric case Lemma 9.3.4
showed that E[T1 ] = ∞, but here in the asymmetric case the opposite is true.
Lemma 9.4.3 We have E[T1] = 1/(2p − 1).
Proof: We calculate
E[T1] = E[T1 1{S1 = 1}] + E[T1 1{S1 = −1}]
= p + E[E[T1 1{S1 = −1} | F1]]
= p + E[1{S1 = −1} E[T1 | F1]]
= p + E[1{S1 = −1} E[1 + T1′ + T1′′ | F1]]
= p + E[1{S1 = −1}](1 + 2E[T1])
= p + (1 − p)(1 + 2E[T1]).
In the first line of the above we partition on the value of S1 . The first term of the second
line follows because if S1 = 1 then T1 = 1, and the second term uses the relationship between
expectation and conditional expectation. The third line follows because S1 ∈ F1 . The fourth line
uses that, on the event S1 = −1, T1 is equal to 1 (accounting for the first move 0 ↦ −1) plus two independent copies (T1′ and T1′′) of T1 (accounting for the time to move from −1 to 0 and then from 0 to 1). The strong Markov property gives that T1′ and T1′′ are independent of F1, leading to the
fifth line. The remainder of the calculation is straightforward. Rearranging to isolate E[T1 ], we
obtain the required result.
Remark 9.4.4 Using a similar argument to that used within the proof of Lemma 9.3.2, for k ∈ N we have that Tk is the sum of k i.i.d. copies of T1, hence E[Tk] = k/(2p − 1).
Note that when p ≈ 1 we have E[T1] ≈ 1, which makes sense since when p is close to 1 it is very likely that the first step of the walk will be upwards. As p ↓ 1/2 we have E[T1] → ∞, which also makes sense because as p gets close to 1/2 we would expect the asymmetric walk to look more and more like the symmetric case, which has E[T1] = ∞.
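A quick check of Lemma 9.4.3 in R; the value of p and the sample size are illustrative assumptions.

# For p > 1/2, estimate E[T_1] and compare with 1/(2p - 1).
set.seed(7)
p <- 0.7
sample.T1 <- function() {
  s <- 0; n <- 0
  while (s < 1) {
    s <- s + sample(c(1, -1), 1, prob = c(p, 1 - p))
    n <- n + 1
  }
  n
}
c(estimated = mean(replicate(10000, sample.T1())), exact = 1 / (2 * p - 1))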
where P[Xi = (0, 1)] = P[Xi = (0, −1)] = P[Xi = (1, 0)] = P[Xi = (−1, 0)] = 1/4. These four
possibilities correspond to movements by precisely one unit of space in the four possible directions:
north, south, east and west.
Let R = inf{n ≥ 1 ; Sn = 0} denote the time taken to return to the origin 0 = (0, 0). In two
dimensions we have:
Lemma 9.5.1 If d = 2 then P[R < ∞] = 1.
Proof: We may use much the same argument as in Lemma 9.3.1. The only difference is that
we must replace the estimate (9.9) with a two dimensional version, as follows. We have
P[S_{2n} = (0, 0)] = ∑_{i=0}^n (2n choose 2i)(2i choose i)(2n − 2i choose n − i) 4^{−2n}
= (2n choose n) 4^{−2n} ∑_{k=0}^n (n choose k)^2
= ((2n choose n) 2^{−2n})^2
∼ 1/(πn). (9.12)
In the above, the first line comes from counting the number of possible ways in which the two
dimensional walk can start at 0 and returns to 0 after exactly 2n steps (we did the one dimensional
version in Exercise 7.11). To do so the walk must have made the same number of steps north as
it did south, and the same number of steps east as it did west. We write 2i for the number of
east-west steps and 2j = 2n − 2i for the number of north-south steps. The first binomial factor
counts which steps are east-west and which are north-south. The second and third, respectively,
count which of the east-west steps are to the east, and which of the north-south steps are to the
north – and that fixes everything else about the path.
The second line of the above calculation follows easily from the first, and you should write
the factorials out in full to check this for yourself. The third line follows from the second by the
combinatorial identity ∑_{k=0}^n (n choose k)^2 = (2n choose n). The final line follows from the same calculation as in the right hand side of (9.9).
With (9.12) in place of (9.9), the rest of the argument for this lemma follows in the same way
as that of Lemma 9.3.1. We’ll omit writing out the details again.
We won’t prove it here, but the simple symmetric random walk will take much longer to return
to the origin in d = 2 than in d = 1. This is because in order to reach (0, 0) both the east-west component S_n^{(1)} and the north-south component S_n^{(2)} must return simultaneously to zero, and they are unlikely to do this at the same time. Perhaps surprisingly, in two dimensions the random walk
still manages to return. The picture changes if we move to three dimensions, with the obvious notation: now Sn ∈ Z^3 and there are 6 different directions to move in, each with probability 1/6.
In three dimensions there is now so much space to get lost in that the random walk might not
return to the origin.
Lemma 9.5.2 If d = 3 then P[R < ∞] < 1.
Sketch of Proof: We will only give a sketch of the argument. We need to make some
substantial modifications to the argument used in Lemmas 9.3.1 and 9.5.1, because we are now
trying to prove the opposite result, but there are still parts that overlap. In similar style to (9.12)
we have
P[S_{2n} = (0, 0, 0)] = ∑_{i+j+k=n} ((2n)! / ((i!)^2 (j!)^2 (k!)^2)) 6^{−2n}
= (2n choose n) 2^{−2n} ∑_{i+j+k=n} ((n! / (i!j!k!)) 3^{−n})^2
≤ (2n choose n) 2^{−2n} (max_{i+j+k=n} (n! / (i!j!k!)) 3^{−n}) ∑_{i+j+k=n} (n! / (i!j!k!)) 3^{−n}.
The first line again concerns counting possible ways to return to the origin in exactly 2n steps (we'll omit the details of this) and the next two lines follow by elementary calculations. We have that ∑_{i+j+k=n} (n! / (i!j!k!)) 3^{−n} = 1, because this sums up all the probabilities of outcomes (i, j, k) that sum to n according to the trinomial distribution (which you might have to look up, if you haven't seen it before). A bit of calculus with Lagrange multipliers shows that the term (n! / (i!j!k!)) 3^{−n}, subject to the condition i + j + k = n, is maximised when i, j, k are all approximately equal to n/3. Approximating n/3 by the nearest integer, we can find that max_{i+j+k=n} (n! / (i!j!k!)) 3^{−n} ≤ C/n, where C ∈ (0, ∞) is a constant. Combining this with the calculation from (9.9) we obtain that P[S_{2n} = 0] ≤ C′/n^{3/2}, where C′ ∈ (0, ∞) is also constant.
Taking G = ∑_{n=1}^∞ 1{S_{2n} = 0} as in the proof of Lemma 9.3.1, we have
E[G] = ∑_{n=1}^∞ P[S_{2n} = 0] ≤ ∑_{n=1}^∞ C′/n^{3/2} < ∞.
Hence the expected number of visits to the origin is finite. By the strong Markov property, if
P[R < ∞] = 1 then we would almost surely have infinitely many returns to the origin. Hence
P[R < ∞] < 1.
Lemma 9.5.2 also holds in all dimensions d ≥ 3. To see this, note that the first three coordinates
of a d-dimensional random walk give us a three dimensional random walk. If the d-dimensional
walk visits the origin, so do its first three coordinates, but Lemma 9.5.2 gives that this has
probability less than one.
It is natural to ask what the random walk does do, in three dimensions and higher, as n → ∞. The answer is easy to guess: it disappears off to infinity and does not come back, i.e. |Sn| → ∞ almost surely.
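The contrast with one and two dimensions can be seen in simulation. A minimal sketch in R; the finite horizon and sample size are illustrative assumptions (within a finite horizon we can only underestimate the true return probability, which Lemma 9.5.2 tells us is strictly less than one).

# Estimate the probability that the 3d walk returns to the origin within a
# fixed horizon; the estimate stays well below 1, unlike in d = 1 and d = 2.
set.seed(8)
steps <- rbind(c(1,0,0), c(-1,0,0), c(0,1,0), c(0,-1,0), c(0,0,1), c(0,0,-1))
returns <- function(horizon = 5000) {
  s <- c(0, 0, 0)
  for (n in 1:horizon) {
    s <- s + steps[sample(6, 1), ]
    if (all(s == 0)) return(TRUE)
  }
  FALSE
}
mean(replicate(200, returns()))  # roughly a third, bounded away from 1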
9.5 For the simple symmetric random walk, in Lemma 9.3.6 we showed that P[T1 = 2n − 1] ∼ 1/(2n√(πn)). Use this fact to give a second proof (alongside that of Lemma 9.3.4) that E[T1] = ∞.
9.6 Let (Sn ) denote the simple symmetric random walk and let Tm = inf{n ≥ 0 ; Sn = m} be
the first hitting time of m ∈ Z. Let
M_n^{(θ)} = e^{θSn} / (cosh θ)^n
where θ ∈ R.
(b) Check that none of the conditions (a)-(c) of the optional stopping theorem apply to the martingale (M_n^{(θ)}) at the stopping time Tm.
9.8 Let (Sn) denote the three dimensional simple symmetric random walk, as defined in Section 9.5. Let G = ∑_{n=0}^∞ 1{S_{2n} = 0} denote the total number of visits to the origin. Let R = inf{n ≥ 1 ; Sn = 0} denote the first return time to the origin.
9.9 Let (Sn) denote the three dimensional simple symmetric random walk, as defined in Section 9.5. Prove that |Sn| → ∞ almost surely as n → ∞.
Appendix A
Chapter 1
1.1 At time 1 we would have 10(1 + r) in cash and 5 units of stock. This gives the value of our assets at time
1 as 10(1 + r) + 5S1 .
1.2 (a) Assuming we don’t buy or sell anything at time 0, the value of our portfolio at time 1 will be
x(1 + r) + yS1 , where S1 is as in the solution to 1.1. Since we need to be certain of paying off
our debt, we should assume a worst case scenario for S1 , that is S1 = sd. So, we are certain to pay
off our debt if and only if
x(1 + r) + ysd ≥ K.
(b) Since certainty requires that we cover the worst case scenario of our stock performance, and d < 1 + r,
our best strategy is to exchange all our assets for cash at time t = 0. This gives us x + ys in cash
at time 0. At time 1 we would then have (1 + r)(x + ys) in cash and could pay our debt
providing that
(1 + r)(x + ys) ≥ K.
1.3 (a) We borrow s cash at time t = 0 and use it to buy one stock. Then wait until time t = 1, at which point we will own a stock worth S1, where S1 is as in 1.1. We sell our stock, which gives us S1 in cash. Since S1 ≥ sd > s(1 + r), we repay our debt (plus interest) which costs us s(1 + r). We then have
S1 − s(1 + r) > 0
s(1 + r) − S1 > 0
Hence, Y has the distribution function of N (0, 1), which means Y ∼ N (0, 1).
1.7 For p ≠ 1,
∫_1^∞ x^{−p} dx = [x^{−p+1}/(−p + 1)]_{x=1}^∞
and we obtain a finite answer only if −p + 1 < 0. When p = 1, we have
∫_1^∞ x^{−1} dx = [log x]_{x=1}^∞ = ∞,
1/2 + (1/3 + 1/4) + (1/5 + 1/6 + 1/7 + 1/8) + (1/9 + 1/10 + 1/11 + 1/12 + 1/13 + 1/14 + 1/15 + 1/16) + . . . ,
in which each bracketed group of terms sums to at least 1/2.
Chapter 2
2.1 The result of rolling a pair of dice can be written ω = (ω1 , ω2 ) where ω1 , ω2 ∈ {1, 2, 3, 4, 5, 6}. So a suitable
Ω would be n o
Ω = (ω1 , ω2 ) ; ω1 , ω2 ∈ {1, 2, 3, 4, 5, 6} .
Of course other choices are possible too. Since our choice of Ω is finite, a suitable σ-field is F = P(Ω).
2.2 (a) Consider the case of F. We need to check the three properties in the definition of a σ-field.
1. We have ∅ ∈ F so the first property holds automatically.
2. For the second we check complements: Ω \ ∅ = Ω, Ω \ {1} = {2, 3}, Ω \ {2, 3} = {1} and Ω \ Ω = ∅; in all cases we obtain an element of F.
3. For the third, we check unions. We have {1} ∪ {2, 3} = Ω. Including ∅ into a union doesn’t
change anything; including Ω into a union results in Ω. This covers all possible cases, in each
case resulting in an element of F.
So, F is a σ-field. Since F′ is just F with the roles of 1 and 2 swapped, by symmetry F′ is also a σ-field.
(b) We have {1} ∈ F ∪ F′ and {2} ∈ F ∪ F′, but {1} ∪ {2} = {1, 2} and {1, 2} ∉ F ∪ F′. Hence F ∪ F′ is not closed under unions; it fails property 3 of the definition of a σ-field.
However, F ∩ F′ is the intersection of two σ-fields, so is automatically itself a σ-field. (Alternatively, note that F ∩ F′ = {∅, Ω}, and check the definition.)
2.3 (a) Let Am be the event that the sequence (Xn ) contains precisely m heads. Let Am,k be the event that
we see precisely m heads during the first k tosses and, from then on, only tails. Then P[Am,k] = 0, because the probability of seeing only tails from toss k + 1 onwards is lim_{j→∞} 2^{−j} = 0. Since Am = ⋃_k Am,k, we have P[Am] = 0, and hence P[⋃_m Am] ≤ ∑_m P[Am] = 0.
That is, the probability of having only finitely many heads is zero. Hence, almost surely, the sequence
(Xn ) contains infinitely many heads.
By symmetry (exchange the roles of heads and tails) almost surely the sequence (Xn ) also contains
infinitely many tails.
2.4 (a) F1 contains the information of whether a 6 was rolled.
(b) F2 gives the information of which value was rolled if a 1, 2 or 3 was rolled, but if the roll was in {4, 5, 6} then F2 cannot tell the difference between these values.
(c) F3 contains the information of which of the sets {1, 2}, {3, 4} and {5, 6} contains the value rolled, but
gives no more information than that.
2.5 (a) The values taken by X1 are
X1(ω) = 0 if ω = 3, X1(ω) = 1 if ω = 2, 4, and X1(ω) = 4 if ω = 1, 5.
Hence, the pre-images are X1^{−1}(0) = {3}, X1^{−1}(1) = {2, 4} and X1^{−1}(4) = {1, 5}. These are all elements of G1, so X1 ∈ mG1. However, they are not all elements of G2 (in fact, none of them are!) so X1 ∉ mG2.
(b) Define X2(ω) = 0 if ω = 1, 2, and X2(ω) = 1 otherwise. Then the pre-images are X2^{−1}(0) = {1, 2} and X2^{−1}(1) = {3, 4, 5}. Hence X2 ∈ mG2.
It isn't possible to use unions/intersections/complements to make the set {3, 4, 5} out of {1, 5}, {2, 4}, {3} (one way to see this is to note that as soon as we tried to include 5 we would also have to include 1, meaning that we can't make {3, 4, 5}). Hence X2 ∉ mG1.
2.6 We have X(T T ) = 0, X(HT ) = X(T H) = 1 and X(HH) = 2. This gives
n o
σ(X) = ∅, {T T }, {T H, HT }, {HH}, {T T, T H, HT }, {T T, HH}, {T H, HT, HH}, Ω .
To work this out, you could note that X^{−1}(i) must be in σ(X) for i = 0, 1, 2, then also include ∅ and Ω, then keep adding unions and complements of the events until you find you have added them all. (Of course,
this only works because σ(X) is finite!)
2.7 Since sums, divisions and products of random variables are random variables, X^2 + 1 is a random variable. Since X^2 + 1 is non-zero, we have that 1/(X^2 + 1) is a random variable, and using products again, X/(X^2 + 1) is a random variable.
For sin(X), recall that for any x ∈ R we have
sin(x) = ∑_{n=0}^∞ ((−1)^n / (2n + 1)!) x^{2n+1}.
Since X is a random variable, so is ((−1)^n / (2n + 1)!) X^{2n+1}. Since limits of random variables are also random variables, so is
sin X = lim_{N→∞} ∑_{n=0}^N ((−1)^n / (2n + 1)!) X^{2n+1}.
2.8 Since X ≥ 0 we have E[X^p] = E[|X|^p] for all p. We have
E[X] = ∫_1^∞ x · 2x^{−3} dx = 2 ∫_1^∞ x^{−2} dx = 2 [−x^{−1}]_{x=1}^∞ = 2 < ∞
so X ∈ L1, but
E[X^2] = ∫_1^∞ x^2 · 2x^{−3} dx = 2 ∫_1^∞ x^{−1} dx = 2 [log x]_{x=1}^∞ = ∞
so X ∉ L2.
2.9 Let 1 ≤ p ≤ q < ∞ and X ∈ Lq . Our plan is to generalize the argument that we used to prove (2.7).
First, we condition to see that
|X|^p = |X|^p 1{|X| ≤ 1} + |X|^p 1{|X| > 1}.
Hence,
E[|X|^p] = E[|X|^p 1{|X| ≤ 1} + |X|^p 1{|X| > 1}]
≤ E[1 + |X|^q]
= 1 + E[|X|^q] < ∞.
Here we use that, if |X| ≤ 1 then |X|^p ≤ 1, and if |X| > 1 then since p ≤ q we have |X|^p < |X|^q. So X ∈ Lp.
2.10 First, we condition to see that
X = X 1{X<a} + X 1{X≥a} . (A.1)
From (A.1) since X ≥ 0 we have
X ≥ X 1{X≥a}
≥ a1{X ≥ a}.
This second inequality follows by looking at two cases: if X < a then both sides are zero, but if X ≥ a then
we can use that X ≥ a. Using monotonicity of E, we then have
E[X] ≥ E[a1{X≥a} ] = aE[1{X≥a} ] = aP[X ≥ a].
Dividing through by a finishes the proof.
This result is known as Markov’s inequality.
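A one-line numerical sanity check of Markov's inequality, in R; the choice X ∼ Exp(1) and a = 2 is an illustrative assumption.

# Markov's inequality: P[X >= a] <= E[X]/a, checked by Monte Carlo.
set.seed(9)
X <- rexp(100000, rate = 1)
c(lhs = mean(X >= 2), rhs = mean(X) / 2)  # lhs (about 0.135) is below rhs (about 0.5)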
Chapter 3
3.1 We have S2 = X1 + X2 so
E[S2 | σ(X1)] = E[X1 | σ(X1)] + E[X2 | σ(X1)] = X1 + E[X2] = X1.
Here, we use the linearity of conditional expectation, followed by the fact that X1 is σ(X1) measurable and
X2 is independent of σ(X1 ) with E[X2 ] = 0.
Secondly, S2^2 = X1^2 + 2X1X2 + X2^2 so
E[S2^2 | σ(X1)] = E[X1^2 | σ(X1)] + 2E[X1X2 | σ(X1)] + E[X2^2 | σ(X1)]
= X1^2 + 2X1 E[X2 | σ(X1)] + E[X2^2]
= X1^2 + 1.
Here, we again use linearity of conditional expectation to deduce the first line. To deduce the second line,
we use that X12 and X1 are σ(X1 ) measurable (using the taking out what is known rule for the middle
term), whereas X2 is independent of σ(X1 ). The final line comes from E[X2 | σ(X1 )] = 0 (which we already
knew from above) and that E[X22 ] = 1.
3.2 For i ≤ n, we have Xi ∈ mFn , so Sn = X1 + X2 + . . . + Xn is also Fn measurable. For each i we have
|Xi| ≤ 2 so |Sn| ≤ 2n, hence Sn ∈ L1. Lastly,
E[Sn+1 | Fn] = E[Sn | Fn] + E[Xn+1 | Fn]
= Sn + E[Xn+1]
= Sn.
Here, we use linearity of conditional expectation in the first line. To deduce the second, we use that Sn ∈ mFn (since Xi ∈ mFn for i ≤ n) and that Xn+1 is independent of Fn. For the final line we note that E[Xn+1] = (1/3)(2) + (2/3)(−1) = 0.
3.3 (a) Since Xi ∈ mFn for all i ≤ n, we have Mn = X1X2 . . . Xn ∈ mFn. Since |Xi| ≤ c for all i, we have |Mn| ≤ c^n < ∞, so Mn is bounded and hence Mn ∈ L1 for all n. Lastly,
E[Mn+1 | Fn] = E[Mn Xn+1 | Fn]
= Mn E[Xn+1 | Fn]
= Mn E[Xn+1]
= Mn.
Here, to deduce the second line we use that Xi ∈ mFn for i ≤ n. To deduce the third line we use that Xn+1 is independent of Fn and then to deduce the fourth line we use that E[Xn+1] = 1.
(b) By definition of conditional expectation (Theorem 3.1.1), we have Mn ∈ L1 and Mn ∈ mGn for all n.
It remains only to check that
E[Mn+1 | Gn] = E[E[Mn+1 | Fn] | Gn] = E[Mn | Gn] = Mn.
Here, to get from the first to the second line we use the tower property.
3.4 Note that the first and second conditions in Definition 3.3.3 are the same for super/sub-martingales as for
martingales. Hence, M satisfies these conditions. For the third condition, we have that E[Mn+1 | Fn ] ≤
Mn (because M is a supermartingale) and E[Mn+1 | Fn] ≥ Mn (because M is a submartingale). Hence
E[Mn+1 | Fn ] = Mn , so M is a martingale.
3.5 (a) If m = n then E[Mn | Fn] = Mn because Mn is Fn measurable. For m > n, we have
E[Mm | Fn] = E[E[Mm | Fm−1] | Fn] = E[Mm−1 | Fn].
Here we use the tower property to deduce the first equality (note that m − 1 ≥ n so Fm−1 ⊇ Fn) and the fact that (Mn) is a martingale to deduce the second equality (i.e. E[Mm | Fm−1] = Mm−1).
Iterating, from m, m − 1 . . . , n + 1 we obtain that
E[Mm | Fn ] = E[Mn+1 | Fn ]
and the martingale property then gives that this is equal to Mn , as required.
(b) If (Mn ) is a submartingale then E[Mm | Fn ] ≥ Mn , whereas if (Mn ) is a supermartingale then
E[Mm | Fn ] ≤ Mn .
3.6 We have (Mn+1 − Mn)^2 = Mn+1^2 − 2Mn+1Mn + Mn^2 so
E[(Mn+1 − Mn)^2 | Fn] = E[Mn+1^2 | Fn] − 2E[Mn+1Mn | Fn] + E[Mn^2 | Fn]
= E[Mn+1^2 | Fn] − 2Mn E[Mn+1 | Fn] + Mn^2
= E[Mn+1^2 | Fn] − 2Mn^2 + Mn^2
= E[Mn+1^2 | Fn] − Mn^2
as required. Here, we use the taking out what is known rule (since Mn ∈ mFn ) and the martingale property
of (Mn ).
It follows that E[Mn+1^2 | Fn] − Mn^2 = E[(Mn+1 − Mn)^2 | Fn] ≥ 0. We have Mn^2 ∈ mFn and since Mn ∈ L2 we have Mn^2 ∈ L1. Hence (Mn^2) is a submartingale.
3.7 Using that E[Xn+1 | Fn] = aXn + bXn−1, we look for α such that Mn = αXn + Xn−1 is a martingale. We can calculate
E[Mn+1 | Fn] = E[αXn+1 + Xn | Fn] = α(aXn + bXn−1) + Xn = (αa + 1)Xn + αbXn−1,
which equals Mn = αXn + Xn−1 provided that
αa + 1 = α and αb = 1.
Hence, α = 1/b is our choice. It is then easy to check that
Here we use the linearity of conditional expectation and the fact that Sn is σ(Sn) measurable. Hence,
α = E[X1 | σ(Sn)] = Sn/n.
Chapter 4
4.1 Since Zn = e^{Sn} and Sn ∈ mFn, also Zn ∈ mFn. Since |Sn| ≤ n, we have |e^{Sn}| ≤ e^n < ∞, so Zn is bounded and hence Zn ∈ L1. Hence also Mn ∈ mFn and Mn ∈ L1.
Lastly,
E[Mn+1 | Fn] = Mn and E[Zn+1 | Fn] ≥ Zn,
where to deduce the last line we use that (e + e^{−1})/2 > 1. Thus Mn is a martingale and Zn is a submartingale.
4.2 Since Xi ∈ mFn for all i ≤ n, we have Mn ∈ mFn. We have |Xi| ≤ 1 so |Sn| ≤ n and |Mn| ≤ max{(q/p)^n, (q/p)^{−n}} = (q/p)^{−n} (note p > q), which implies that Sn ∈ L1 and Mn ∈ L1 for all n.
We have
E[Sn+1 | Fn] = E[Sn + Xn+1 | Fn] = Sn + E[Xn+1] = Sn + (p − q).
Here we use the taking out what is known rule, followed by the fact that Xn+1 is independent of Fn and
the relationship between conditional expectation and independence. To deduce the final step we use that
E[Xn+1 ] = p(1) + q(−1) = p − q > 0. Therefore, we have E[Sn+1 | Fn ] ≥ Sn , which means that Sn is a
submartingale.
We now look at Mn. We have
E[Mn+1 | Fn] = E[(q/p)^{Sn+1} | Fn] = (q/p)^{Sn} E[(q/p)^{Xn+1} | Fn] = Mn E[(q/p)^{Xn+1}] = Mn.
Here we use the taking out what is known rule, followed by the fact that Xn+1 is independent of Fn and
the relationship between conditional expectation and independence. To deduce the final step we use that
E[(q/p)^{Xn+1}] = p(q/p)^{1} + q(q/p)^{−1} = q + p = 1.
4.3 Since Xi ∈ mFn for all i ≤ n we have Sn ∈ mFn where Fn = σ(X1, . . . , Xn). Further, if we set m =
max(|a|, |b|) then |Sn | ≤ nm so Sn ∈ L1 .
We have
4.4 Clearly Sn ∈ mFn (recall that, by default, we take (Fn) to be the generated filtration of (Sn)), hence Sn^2 ∈ mFn and Sn^2 − n ∈ mFn. We have |Sn| ≤ n so E[|Sn^2 − n|] ≤ E[Sn^2] + E[n] = 2n by the triangle law and monotonicity of expectation, hence Sn^2 − n ∈ L1. Similarly Sn ∈ L2.
We have
E[Sn+1^2 − (n + 1) | Fn] = E[(Xn+1 + Sn)^2 − (n + 1) | Fn]
= E[Xn+1^2 | Fn] + 2E[Sn Xn+1 | Fn] + E[Sn^2 | Fn] − (n + 1)
= E[Xn+1^2] + 2Sn E[Xn+1] + Sn^2 − (n + 1)
= 1 + 2Sn(0) + Sn^2 − (n + 1)
= Sn^2 − n.
Here, the second line follows by linearity. The third line follows by the taking out what is known rule and the independence rule, using that Xn+1 is independent of Fn, whilst Sn ∈ mFn. The fourth line follows because Xn+1^2 = 1 and E[Xn+1] = (1/2)(1) + (1/2)(−1) = 0. Hence Mn = Sn^2 − n is a martingale.
Rearranging, we obtain E[Sn+1^2 | Fn] = Sn^2 + 1 ≥ Sn^2. Hence (Sn^2) is a submartingale.
4.5 The natural filtration is Fn = σ(X1, . . . , Xn). We can write
Sn+1 = Sn + Xn+1 1{Sn > 0} + 1{Sn = 0}. (A.2)
Hence, if Sn ∈ mFn then Sn+1 ∈ mFn+1. Since S1 = X1 ∈ mF1, a trivial induction shows that Sn ∈ mFn for all n. Hence, Ln ∈ mFn and also Sn − Ln ∈ mFn.
We have −n ≤ Sn − Ln ≤ n + 1 (because the walk is reflected at zero and can increase by at most 1 in each time step) so Sn − Ln is bounded and hence Sn − Ln ∈ L1.
Lastly, using (A.2) we have
E[Sn+1 | Fn] = Sn + 1{Sn = 0}.
Here we use linearity and the taking out what is known (Sn ∈ mFn) rule, as well as that Xn+1 is independent of Fn. Hence,
E[Sn+1 − Ln+1 | Fn] = E[Sn+1 − ∑_{i=0}^n 1{Si = 0} | Fn]
= Sn + 1{Sn = 0} − ∑_{i=0}^n 1{Si = 0}
= Sn − ∑_{i=0}^{n−1} 1{Si = 0}
= Sn − Ln.
Here we use (A.2) to deduce the second line, as well as taking out what is known. Hence, Sn − Ln is a
martingale.
4.6 Let Bn be the number of red balls in the urn at time n (i.e. after the nth draw is complete). Then, the proportion of red balls in the urn just after the nth step is
Mn = Bn/(n + 3).
Essentially the same argument as for the two colour urn of Section 4.2 shows that Mn is adapted to the
natural filtration and also in L1 . We have
E[Mn+1 | Fn] = E[Mn+1 1{(n+1)th draw is red} + Mn+1 1{(n+1)th draw is not red} | Fn]
= E[((Bn + 1)/(n + 4)) 1{(n+1)th draw is red} + (Bn/(n + 4)) 1{(n+1)th draw is not red} | Fn]
= ((Bn + 1)/(n + 4)) E[1{(n+1)th draw is red} | Fn] + (Bn/(n + 4)) E[1{(n+1)th draw is not red} | Fn]
= ((Bn + 1)/(n + 4)) (Bn/(n + 3)) + (Bn/(n + 4)) (1 − Bn/(n + 3))
= Bn(n + 4) / ((n + 3)(n + 4))
= Mn.
Hence (Mn ) is a martingale.
4.7 By the result of 3.6, Sn2 is a submartingale.
(i) Mn = Sn^2 + n is Fn measurable (because Sn ∈ mFn) and since |Sn| ≤ n we have |Mn| ≤ n^2 + n < ∞, so Mn ∈ L1. Further, using the submartingale property of Sn^2,
E[Sn+1^2 + n + 1 | Fn] = E[Sn+1^2 | Fn] + n + 1 ≥ Sn^2 + n
so Mn is a submartingale. But E[M1] = 2 ≠ 0 = E[M0], so (by Lemma 3.3.6) we have that (Mn) is
not a martingale.
(ii) We have shown in Section 4.1 that (Sn) is a martingale, and in Exercise 4.4 that Mn = Sn^2 − n is a martingale. Hence Sn^2 + Sn − n ∈ mFn and E[|Sn^2 + Sn − n|] ≤ E[|Sn^2 − n|] + E[|Sn|] < ∞ by the triangle law, so also Sn^2 + Sn − n ∈ L1. Also,
E[Sn+1^2 + Sn+1 − (n + 1) | Fn] = E[Sn+1^2 − (n + 1) | Fn] + E[Sn+1 | Fn]
= Sn^2 − n + Sn
as required. To deduce the second line we use the martingale properties of (Sn ) and (Mn ). As all
martingales are submartingales, this case is also a submartingale.
More generally, the sum of two martingales is always a martingale. The same is true for submartingales
and also for supermartingales.
(iii) Mn = Sn/n is Fn measurable (because Sn ∈ mFn) and since |Sn| ≤ n we have |Mn| ≤ 1, so Mn ∈ L1. Since Sn is a martingale, we have
E[Sn+1/(n + 1) | Fn] = (1/(n + 1)) E[Sn+1 | Fn] = Sn/(n + 1) ≠ Sn/n.
Hence Mn = Sn/n is not a martingale. Also, since Sn may be both positive or negative, we do not have Sn/(n + 1) ≥ Sn/n, so (Mn) is also not a submartingale.
4.8 We showed that Sn^2 − n was a martingale in 4.7. We try to mimic that calculation and find out what goes wrong in the cubic case. So, we look at
E[(Sn+1 − Sn)^3 | Fn] = E[Sn+1^3 | Fn] − 3Sn E[Sn+1^2 | Fn] + 3Sn^2 E[Sn+1 | Fn] − E[Sn^3 | Fn]
= E[Sn+1^3 | Fn] − 3Sn(Sn^2 + 1) + 3Sn^2 Sn − Sn^3
= E[Sn+1^3 | Fn] − Sn^3 − 3Sn.
Here we use linearity, taking out what is known, and the fact that E[Sn+1^2 | Fn] = Sn^2 + 1 (from 4.7). However, also Sn+1 − Sn = Xn+1, so
E[(Sn+1 − Sn)^3 | Fn] = E[Xn+1^3 | Fn] = E[Xn+1^3] = 0,
because Xn+1 is independent of Fn and Xn+1^3 = Xn+1 (it takes values only 1 or −1), so E[Xn+1^3] = 0. Putting these two calculations together, we have
E[Sn+1^3 | Fn] = Sn^3 + 3Sn. (A.3)
Suppose (aiming for a contradiction) that there is a deterministic function f : N → R such that Sn^3 − f(n) is a martingale. Then
E[Sn+1^3 − f(n + 1) | Fn] = Sn^3 − f(n).
Combining the above equation with (A.3) gives
f (n + 1) − f (n) = 3Sn
but this is impossible because Sn is not deterministic. Thus we reach a contradiction; no such function f
exists.
Chapter 5
5.1 The value of the portfolio (1, 3) at time 1 will be 1(1 + r) + 3S1, where S1 is as in 1.1.
x(1 + r) + ysu = 1
x(1 + r) + ysd = 1.
x(1 + r) + ysu = 3
x(1 + r) + ysd = 1.
which has solution x = (1/(1 + r))(3 − 2su/(su − sd)) and y = 2/(s(u − d)).
Hence, the value of this contingent claim at time 0 is
x + sy = (1/(1 + r))(3 − 2su/(su − sd)) + 2/(u − d).
5.3 (a) If we buy two units of stock, at time 1, for a price K then our contingent claim is Φ(S1 ) = 2S1 − 2K.
(b) In a European put option, with strike price K ∈ (sd, su), we have the option to sell a single unit of stock for strike price K. It is advantageous to us to do so only if K > S1, so
Φ(S1) = 0 if S1 = su, and Φ(S1) = K − S1 if S1 = sd; that is, Φ(S1) = max(K − S1, 0). (A.4)
(c) When S1 = su we sell a unit of stock for strike price K and otherwise do nothing. So our contingent claim is
Φ(S1) = K − S1 if S1 = su, and Φ(S1) = 0 if S1 = sd.
(d) Holding both the contracts in (b) and (c) at once means that, in both S1 = su and S1 = sd, we end up selling a single unit of stock for a fixed price K. Our contingent claim for doing so is Φ(S1) = K − S1.
5.4 (a) The contingent claims for the call and put options respectively are max(S1 − K, 0) from (5.6) and
max(K − S1 , 0) from (A.4). Using the risk neutral valuation formula from Proposition 5.2.6 we have
Π_0^{call} = (1/(1 + r)) E^Q[max(S1 − K, 0)],   Π_0^{put} = (1/(1 + r)) E^Q[max(K − S1, 0)].
(b) Hence,
Π_0^{call} − Π_0^{put} = (1/(1 + r)) E^Q[max(S1 − K, 0) − max(K − S1, 0)]
= (1/(1 + r)) E^Q[max(S1 − K, 0) + min(S1 − K, 0)]
= (1/(1 + r)) E^Q[S1 − K]
= (1/(1 + r)) E^Q[S1] − K/(1 + r).
Since E^Q[S1] = s(1 + r) under the risk-neutral measure, this gives
Π_0^{call} − Π_0^{put} = s − K/(1 + r).
5.5 In the binomial model, the contingent claim of a European call option with strike price K is Φ(ST) = max(ST − K, 0).
5.6 A tree-like diagram of the possible changes in stock price, with the value of the contingent claim Φ(ST ) =
max(K − S1 , 0) written on for the final time step looks like
qu = ((1 + 0.25) − 0.5) / (2.0 − 0.5) = 0.5,   qd = 1 − qu = 0.5.
Using Proposition 5.2.6 recursively, as in Section 5.6, we can put in the value of the European put option
at all nodes back to time 0, giving
The hedging portfolios are found by solving linear equations of the form x + dSy = Φ̃(dS), x + uSy = Φ̃(uS), where Φ̃ is the contingent claim and S the initial stock price of the one period model associated to the given node. The value of the contingent claim at each node is then inferred from the hedging portfolio as V = x + Sy.
5.8 A tree-like diagram of the possible changes in stock price, with the value of the option at all nodes of the
tree shown in (green) square boxes, and hedging portfolios h = (x, y) written (in red) for each node of the
tree:
The hedging portfolios are found by solving linear equations of the form x + dSy = Φ̃(dS), x + uSy = Φ̃(uS), where Φ̃ is the contingent claim and S the initial stock price of the one period model associated to the given node. The value of the contingent claim at each node is then inferred from the hedging portfolio as V = x + Sy.
The hedging portfolio is the same at each node. This is because the contingent claim of our call option
always exceeds zero i.e. we are certain that at time T = 2 we would want to exercise our call option and
buy a single unit of stock for a price K = 60. Since there is no interest payable on cash, this means that
we can replicate our option (at all times) by holding a single unit of stock, and −60 in cash.
5.9 We have that (St) is adapted (see Section 5.4) and, since sd^t ≤ St ≤ su^t, we have also that St ∈ L1. Hence, St is a martingale under P if and only if E^P[St+1 | Ft] = St. We have
E^P[St+1 | Ft] = E^P[Zt+1 St | Ft]
= St E^P[Zt+1 | Ft]
= St E^P[Zt+1],
so (St) is a martingale under P if and only if E^P[Zt+1] = pu u + pd d = 1.
Hence,
E^P[Mt+1 | Ft] = log S0 + ∑_{i=1}^t log(Zi) + E^P[log(Zt+1) | Ft]
= Mt + E^P[log Zt+1]
= Mt + pu log u + pd log d.
Here we use taking out what is known, since S0 and Zi ∈ mFt for i ≤ t, and also that Zt+1 is independent
of Ft . Hence, Mt is a martingale under P if and only if pu log u + pd log d = 0.
Chapter 6
6.1 Since Xn ≥ 0 we have
E[|Xn − 0|] = E[Xn] = (1/2) 2^{−n} = 2^{−(n+1)},
which tends to 0 as n → ∞. Hence Xn → 0 in L1. Lemma 6.1.2 then implies that also Xn → 0 in probability and in distribution. Also, 0 ≤ Xn ≤ 2^{−n} and 2^{−n} → 0, so by the sandwich rule we have Xn → 0 as n → ∞, and hence P[Xn → 0] = 1, which means that Xn → 0 almost surely.
6.2 (a) We have Xn → X in L1, and hence
|E[Xn] − E[X]| = |E[Xn − X]| ≤ E[|Xn − X|] → 0.
Here we use the linearity and absolute value properties of expectation. Hence, E[Xn] → E[X].
(b) Let X1 be a random variable such that P[X1 = 1] = P[X1 = −1] = 1/2 and note that E[X1] = 0. Set Xn = X1 for all n ≥ 2, which implies that E[Xn] = 0 → 0 as n → ∞. But E[|Xn − 0|] = E[|Xn|] = 1, so Xn does not converge to 0 in L1.
6.3 We have Xn = X when U = 2, and |Xn − X| = 1/n when U = 0, 1. Hence, |Xn − X| ≤ 1/n and by monotonicity of E we have
\[
E[|X_n - X|] \le \frac{1}{n} \to 0 \quad\text{as } n \to \infty,
\]
which means that Xn → X in L1.
We can write
\[
\begin{aligned}
P[X_n \to X] &= P[X_n \to X,\, U = 0] + P[X_n \to X,\, U = 1] + P[X_n \to X,\, U = 2] \\
&= P[1 + \tfrac{1}{n} \to 1,\, U = 0] + P[1 - \tfrac{1}{n} \to 1,\, U = 1] + P[0 \to 0,\, U = 2] \\
&= P[U = 0] + P[U = 1] + P[U = 2] \\
&= 1.
\end{aligned}
\]
Hence, Xn → X almost surely.
6.4 We have
\[
P[Y \le x] = P[X_n \le x] = \begin{cases} 0 & \text{if } x < 0, \\ \tfrac{1}{2} & \text{if } x \in [0, 1), \\ 1 & \text{if } x \ge 1. \end{cases}
\]
6.5 Let (Xn) be the sequence of random variables from 6.1. Define Yn = X1 + X2 + . . . + Xn.
(a) For each ω ∈ Ω, we have Yn+1(ω) = Yn(ω) + Xn+1(ω) ≥ Yn(ω), since Xn+1 ≥ 0. Hence, (Yn(ω)) is an increasing sequence. Since Xn(ω) ≤ 2^{−n} for all n we have
\[
|Y_n(\omega)| \le 2^{-1} + 2^{-2} + \ldots + 2^{-n} \le \sum_{i=1}^{\infty} 2^{-i} = 1.
\]
6.6 (a) We have both P[Xn ≤ x] → P[X ≤ x] and P[Xn ≤ x] → P[Y ≤ x], so by uniqueness of limits for real
sequences, we have P[X ≤ x] = P[Y ≤ x] for all x ∈ R. Hence, X and Y have the same distribution
(i.e. they have the same distribution functions FX (x) = FY (x)).
(b) By definition of convergence in probability, for any a > 0, for any ε > 0 there exists N ∈ N such that, for all n ≥ N,
\[
P[|X_n - X| > a] < \epsilon \quad\text{and}\quad P[|X_n - Y| > a] < \epsilon.
\]
By the triangle inequality we have
\[
P[|X - Y| > 2a] = P[|X - X_n + X_n - Y| > 2a] \le P[|X - X_n| + |X_n - Y| > 2a]. \tag{A.6}
\]
If |X − Xn| + |Xn − Y| > 2a then |X − Xn| > a or |Xn − Y| > a (or possibly both). Hence, continuing (A.6),
\[
P[|X - Y| > 2a] \le P[|X_n - X| > a] + P[|X_n - Y| > a] \le 2\epsilon.
\]
Since this is true for any ε > 0 and any a > 0, we have P[X = Y] = 1.
6.7 Suppose (aiming for a contradiction) that there exists a random variable X such that Xn → X in probability. By the triangle inequality we have
\[
|X_n - X_{n+1}| \le |X_n - X| + |X - X_{n+1}|.
\]
Hence, if |Xn − Xn+1| ≥ 1 then |Xn − X| ≥ 1/2 or |Xn+1 − X| ≥ 1/2 (or both). Therefore,
\[
P[|X_n - X_{n+1}| \ge 1] \le P[|X_n - X| \ge \tfrac{1}{2}] + P[|X_{n+1} - X| \ge \tfrac{1}{2}].
\]
Since Xn → X in probability, the right hand side of the above tends to zero as n → ∞. This implies that
\[
P[|X_n - X_{n+1}| \ge 1] \to 0
\]
as n → ∞. But Xn and Xn+1 are independent and P[Xn = 1, Xn+1 = 0] = 1/4, so P[|Xn − Xn+1| ≥ 1] ≥ 1/4 for all n. Therefore we have a contradiction, so there is no X such that Xn → X in probability.
Hence, by Lemma 6.1.2, we can't have any X such that Xn → X almost surely or in L1 (since either would imply convergence in probability).
Chapter 7
7.1 (a) We have (C ◦ M)_n = \sum_{i=1}^{n} 0\,(S_i − S_{i−1}) = 0.
(b) We have (C ◦ M)_n = \sum_{i=1}^{n} 1\,(S_i − S_{i−1}) = S_n − S_0 = S_n.
(c) We have
\[
(C \circ M)_n = \sum_{i=1}^{n} S_{i-1}(S_i - S_{i-1}) = \sum_{i=1}^{n} (X_1 + \ldots + X_{i-1}) X_i = \frac{1}{2}\Big(\sum_{i=1}^{n} X_i\Big)^2 - \frac{1}{2}\sum_{i=1}^{n} X_i^2 = \frac{S_n^2}{2} - \frac{n}{2}.
\]
In the last equality we use that X_i^2 = 1.
7.2 We have
\[
(X \circ M)_n = \sum_{i=1}^{n} (\alpha C_{i-1} + \beta D_{i-1})(M_i - M_{i-1}) = \alpha \sum_{i=1}^{n} C_{i-1}(M_i - M_{i-1}) + \beta \sum_{i=1}^{n} D_{i-1}(M_i - M_{i-1}) = \alpha (C \circ M)_n + \beta (D \circ M)_n,
\]
as required.
7.3 (a) We note that
\[
|S_n| \le \sum_{i=1}^{n} |X_i| \le \sum_{i=1}^{n} 1 = n < \infty.
\]
(b) Note that (Sn ) takes values in Z and that, from part (b), almost surely we have that Sn converges
to a real value. It follows immediately by Lemma 7.4.1 that (almost surely) there exists N ∈ N such
that Sn = S∞ for all n ≥ N .
7.4 Here’s some R code to plot 1000 time-steps of the symmetric random walk. Changing the value given to
set.seed will generate a new sample.
> T=1000
> set.seed(1)
> x=c(0,2*rbinom(T-1,1,0.5)-1)
> y=cumsum(x)
> par(mar=rep(2,4))
> plot(y,type="l")
Adapting it to plot the asymmetric case and the random walk from exercise 7.3 is left for you.
In case you prefer Python, here's some Python code that does the same job. (I prefer Python.)

import numpy as np
import matplotlib.pyplot as plt

p = 0.5
rr = np.random.random(1000)
step = 2*(rr < p) - 1
start = 0
positions = [start]
for x in step:
    positions.append(positions[-1] + x)
plt.plot(positions)
plt.show()
Extending this code as suggested is left for you; a sketch of the asymmetric case is given below. For the last part, the difference in behaviour that you should notice from the graph is that the walk from Exercise 7.3 eventually stops moving (at some random time), whereas the random walk from Q2 of Assignment 3 never stops moving but makes smaller and smaller steps.
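For example, a minimal adaptation of the Python code to the asymmetric case only needs a different value of p (the choice p = 0.7 below is arbitrary, for illustration):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)  # seeded, so re-running gives the same sample

p = 0.7  # probability of an up-step; p != 0.5 makes the walk asymmetric
steps = np.where(rng.random(1000) < p, 1, -1)
positions = np.concatenate(([0], np.cumsum(steps)))  # the walk, started at 0

plt.plot(positions)
plt.show()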
7.5 We will argue by contradiction. Suppose that (Mn) was uniformly bounded in L1. Then the martingale convergence theorem would give that Mn → M∞ almost surely, for some real valued random variable M∞. However, Mn takes values in Z and we have Mn+1 ≠ Mn for all n, so Lemma 7.4.1 gives that (Mn) cannot converge almost surely. This is a contradiction.
7.6 (a) Take the offspring distribution to be P[G = 0] = 1. Then Z1 = 0, and hence Zn = 0 for all n ≥ 1, so
Zn dies out.
(b) Take the offspring distribution to be P[G = 2] = 1. Then Zn+1 = 2Zn , so Zn = 2n , and hence
Zn → ∞.
7.7 (a) We have
\[
\begin{aligned}
E[M_{n+1} \mid \mathcal{F}_n] &= E\Big[\tfrac{B_n}{n+3}\, 1_{\{n\text{th draw is red}\}} \,\Big|\, \mathcal{F}_n\Big] + E\Big[\tfrac{B_n + 1}{n+3}\, 1_{\{n\text{th draw is black}\}} \,\Big|\, \mathcal{F}_n\Big] \\
&= \frac{B_n}{n+3}\, E[1_{\{n\text{th draw is red}\}} \mid \mathcal{F}_n] + \frac{B_n + 1}{n+3}\, E[1_{\{n\text{th draw is black}\}} \mid \mathcal{F}_n] \\
&= \frac{B_n}{n+3}\cdot\frac{B_n}{n+2} + \frac{B_n + 1}{n+3}\Big(1 - \frac{B_n}{n+2}\Big) \\
&= \frac{B_n^2}{(n+2)(n+3)} + \frac{B_n(n+2) + (n+2) - B_n^2 - B_n}{(n+2)(n+3)} \\
&= \frac{(n+1)B_n + (n+2)}{(n+2)(n+3)}.
\end{aligned}
\]
(b) Since Mn ∈ [0, 1] we have E[|Mn|] ≤ 1, hence (Mn) is uniformly bounded in L1. Hence, by the martingale convergence theorem there exists a random variable M∞ such that Mn → M∞ almost surely.
Since Mn = Bn/K, this means that also Bn → KM∞ almost surely. However, Bn is integer valued so, by Lemma 7.4.1, if Bn is to converge then it must eventually become constant. This can only occur if, eventually, the urn contains either (i) only red balls or (ii) only black balls.
In case (i) we have Mn = 0 eventually, which means that M∞ = 0. In case (ii) we have Mn = 1 eventually, which means that M∞ = 1. Hence, M∞ ∈ {0, 1} almost surely.
7.9 (a) Consider the usual Pólya urn, from Section 4.2, started with just one red ball and one black ball.
After the first draw is complete we have two possibilities:
1. The urn contains one red ball and two black balls.
2. The urn contains two red balls and one black ball.
In the first case we reach precisely the initial state of the urn described in 7.9. In the second case we
also reach the initial state of the urn described in 7.9, but with the colours red and black swapped.
If Mn → 0 almost surely, then in the first case the fraction of red balls would tend to zero, and (by symmetry of colours) in the second case the limiting fraction of black balls would tend to zero; thus the limiting fraction of black balls would always be either 1 (the first case) or 0 (the second case). However, we know from Proposition 7.4.3 that the limiting fraction of black balls is actually uniform on (0, 1). Hence, P[Mn → 0] = 0.
(b) You should discover that having a higher proportion of initial red balls makes it more likely for M∞
(the limiting proportion of red balls) to be closer to 1. However, since M∞ is a random variable in
this case, you may need to take several samples of the process (for each initial condition that you try)
to see the effect.
Follow-up challenge question: In fact, the limiting value M∞ has a Beta distribution, Be(α, β).
You may need to look up what this means, if you haven’t seen the Beta distribution before. Consider
starting the urn with n1 red and n2 black balls, and see if you can use your simulations to guess a
formula for α and β in terms of n1 and n2 (the answer is a simple formula!).
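If you want to try this, here is a minimal simulation sketch in Python (the function name, the initial condition (1, 2), the number of steps and the number of samples are all arbitrary choices of mine):

import numpy as np

rng = np.random.default_rng(0)

def polya_limit(n1, n2, steps=5000):
    """Run a Pólya urn started from n1 red and n2 black balls (draw a ball,
    return it together with an extra ball of the same colour) and return
    the proportion of red balls after the given number of steps, as an
    approximation to the limit M_infinity."""
    red, black = n1, n2
    for _ in range(steps):
        if rng.random() < red / (red + black):
            red += 1    # drew red: add an extra red ball
        else:
            black += 1  # drew black: add an extra black ball
    return red / (red + black)

samples = [polya_limit(1, 2) for _ in range(10)]
print(samples)  # ten (approximate) samples of M_infinity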
7.10 (a) We have that (Mn) is a non-negative martingale, hence supn E[|Mn|] = supn E[Mn] = E[M0] = 1. Thus (Mn) is uniformly bounded in L1 and the (first) martingale convergence theorem applies, giving that Mn → M∞ almost surely, for a real valued M∞. Since Mn ≥ 0 almost surely we must have M∞ ≥ 0 almost surely.
(b) The possible values of Mn lie in W = {(q/p)^z ; z ∈ Z}. Note that Mn+1 ≠ Mn for all n ∈ N.
On the event that M∞ > 0 there exists ε > 0 such that Mn ≥ ε for all sufficiently large n. Choose m ∈ Z such that (q/p)^m ≤ ε; then for all sufficiently large n we have Mn ∈ W_m = {(q/p)^z ; z ≥ m, z ∈ Z}. For such n, since Mn ≠ Mn+1, we have |Mn − Mn+1| ≥ |(q/p)^m − (q/p)^{m−1}| > 0, which implies that (Mn) could not converge. Hence, the only possible almost sure limit is M∞ = 0.
If (Mn) was uniformly bounded in L2 then by the (second) martingale convergence theorem we would have E[Mn] → E[M∞]. However E[Mn] = E[M0] = 1 and E[M∞] = 0, so this is a contradiction. Hence (Mn) is not uniformly bounded in L2.
(c) We have (q/p)^{S_n} → 0 almost surely. Noting that q/p > 1 and hence log(q/p) > 0, taking logs gives that S_n log(q/p) → −∞ almost surely, which gives S_n → −∞ almost surely.
For the final part, if (S_n) is a simple asymmetric random walk with upwards drift, then (−S_n) is a simple asymmetric random walk with downwards drift. Applying what we've just discovered thus gives −S_n → −∞ almost surely, and multiplying by −1 gives the required result.
7.11 (a) This is a trivial induction: if Sn is even then Sn+1 is odd, and vice versa.
(b) By part (a), p2n+1 = 0 since 0 is an even number and S2n+1 is an odd number.
If Sn is to start at zero and reach zero at time 2n, then during times 1, 2, . . . , 2n it must have made precisely n steps upwards and precisely n steps downwards. The number of different ways in which the walk can do this is \binom{2n}{n} (we are choosing exactly n steps on which to go upwards, out of 2n total steps). Each one of these ways has probability (1/2)^{2n} of occurring, so we obtain
\[
p_{2n} = \binom{2n}{n} 2^{-2n}. \tag{A.7}
\]
(c) From (b) we note that
\[
p_{2(n+1)} = \frac{(2n+2)(2n+1)}{(n+1)(n+1)}\, 2^{-2}\, p_{2n} = \frac{2n+1}{n+1}\cdot 2 \cdot 2^{-2}\, p_{2n} = \frac{2n+1}{2n+2}\, p_{2n} = \left(1 - \frac{1}{2(n+1)}\right) p_{2n}.
\]
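A quick numerical check of both the formula and the recursion (a sketch; math.comb needs Python 3.8 or later):

from math import comb

def p(n):
    """p_{2n} = P[S_{2n} = 0] = C(2n, n) * 2^(-2n), indexed here by n."""
    return comb(2 * n, n) * 2.0 ** (-2 * n)

for n in range(1, 6):
    lhs = p(n + 1)                        # p_{2(n+1)}
    rhs = (1 - 1 / (2 * (n + 1))) * p(n)  # (1 - 1/(2(n+1))) * p_{2n}
    print(n, lhs, rhs)                    # the two columns agree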
7.12 Since S_n/f(n) is a martingale we must have E[S_{n+1}/f(n+1) | F_n] = S_n/f(n). Since E[S_{n+1} | F_n] = S_n, this means that we need
\[
\frac{S_n}{f(n+1)} = \frac{S_n}{f(n)} \tag{A.8}
\]
to hold true with probability one; in particular we need
\[
\frac{S_n}{f(n+1)} \le \frac{S_n}{f(n)}. \tag{A.9}
\]
To handle the inequality we now need to care about the sign of S_n. The key idea is that S_n can be both positive and negative, so the only way that (A.9) can hold is if \frac{1}{f(n+1)} = \frac{1}{f(n)}.
We will use some of the results from the solution of exercise 7.11. In particular, for odd n ∈ N, from 7.11(b) we have P[S_n = 0] = 0, and for even n ∈ N, from (A.7), we have P[S_n = 0] < 1. In both cases we have P[S_n ≠ 0] > 0. Since S_n and −S_n have the same distribution we have P[S_n < 0] = P[−S_n < 0] = P[S_n > 0], and hence
\[
P[S_n \ne 0] = P[S_n < 0] + P[S_n > 0] = 2\,P[S_n < 0] = 2\,P[S_n > 0].
\]
Hence, for all n ≥ 1 we have P[S_n > 0] > 0 and P[S_n < 0] > 0.
Now, consider n ≥ 1. There is positive probability that S_n > 0, so by (A.9) we must have \frac{1}{f(n+1)} \le \frac{1}{f(n)}, which means that f(n+1) ≥ f(n). But there is also positive probability that S_n < 0, hence by (A.9) we must have \frac{1}{f(n+1)} \ge \frac{1}{f(n)}, which means that f(n+1) ≤ f(n). Hence f(n+1) = f(n) for all n, and by a trivial induction we have f(n) = f(1) for all n.
7.13 (a) Since M_n = Z_n/μ^n we note that
\[
\frac{1}{\mu^{2(n+1)}} (Z_{n+1} - \mu Z_n)^2 = (M_{n+1} - M_n)^2. \tag{A.10}
\]
Further,
\[
Z_{n+1} - \mu Z_n = \sum_{i=1}^{Z_n} (X^{n+1}_i - \mu),
\]
and the random variables Y_i = X^{n+1}_i − μ are independent. Moreover, the Y_i are identically distributed and independent of F_n. Hence, by taking out what is known (Z_n ∈ mF_n) we have
\[
\begin{aligned}
E[(Z_{n+1} - \mu Z_n)^2 \mid \mathcal{F}_n] &= E\Big[\Big(\sum_{i=1}^{Z_n} Y_i\Big)^2 \,\Big|\, \mathcal{F}_n\Big] \\
&= \sum_{i=1}^{Z_n} E[Y_i^2 \mid \mathcal{F}_n] + \sum_{\substack{i,j=1 \\ i \ne j}}^{Z_n} E[Y_i Y_j \mid \mathcal{F}_n] \\
&= \sum_{i=1}^{Z_n} E[Y_i^2] + \sum_{\substack{i,j=1 \\ i \ne j}}^{Z_n} E[Y_i]\,E[Y_j] \\
&= Z_n E[Y_1^2] + 0 \\
&= Z_n \sigma^2.
\end{aligned}
\]
Combining this with (A.10), taking expectations, and using the orthogonality of martingale increments together with E[Z_n] = μ^n, we obtain
\[
E[M_{n+1}^2] = E[M_n^2] + \frac{\sigma^2}{\mu^{n+2}}.
\]
7.14 (a) Note that X_{n+1} ≥ X_n ≥ 0 for all n, because taking an infimum over k ≥ n + 1 is an infimum over a smaller set than over k ≥ n. Hence, (X_n) is monotone increasing and hence the limit X_∞(ω) = lim_{n→∞} X_n(ω) exists (almost surely).
(b) Since X_n = \inf_{k \ge n} |M_k|, we have X_n ≤ |M_n|. By definition of inf, for each ε > 0 and n ∈ N there exists some n_0 ≥ n such that X_n ≥ |M_{n_0}| − ε. Combining these two properties, we obtain
\[
|M_{n_0}| - \epsilon \le X_n \le |M_n|.
\]
(c) Letting n → ∞ in the above equation, which means that also n_0 → ∞ because n_0 ≥ n, we take an almost sure limit to obtain |M_∞| − ε ≤ X_∞ ≤ |M_∞|. Since we have this inequality for any ε > 0, in fact X_∞ = |M_∞| almost surely.
(d) We have shown that (X_n) is an increasing sequence and X_n → X_∞ almost surely, and since each |M_n| ≥ 0, we have also X_n ≥ 0. Hence, the monotone convergence theorem applies to X_n and X_∞, and therefore E[X_n] → E[X_∞].
(e) From the monotone convergence theorem, writing in terms of M_n rather than X_n, we obtain
\[
E\Big[\inf_{k \ge n} |M_k|\Big] \to E[|M_\infty|]. \tag{A.11}
\]
However, since \inf_{k \ge n} |M_k| ≤ |M_n|, the monotonicity property of expectation gives us that, for all n,
\[
E\Big[\inf_{k \ge n} |M_k|\Big] \le E[|M_n|] \le C = \sup_{j \in \mathbb{N}} E[|M_j|]. \tag{A.12}
\]
Combining (A.11) and (A.12), with the fact that limits preserve weak inequalities, we obtain that E[|M_∞|] ≤ C, as required.
7.15 (a) All the X̃^{n+1}_i have the same distribution, and are independent by independence of the X^{n+1}_i and the C_i. Thus, (Z̃_n) is a Galton-Watson process with offspring distribution G̃ given by the common distribution of the X̃^{n+1}_i.
(b) By definition of X̃^{n+1}_i we have X^{n+1}_i ≤ X̃^{n+1}_i. Hence, from (7.9), if Z_n ≤ Z̃_n then also Z_{n+1} ≤ Z̃_{n+1}. Since Z_0 = Z̃_0 = 1, an easy induction shows that Z_n ≤ Z̃_n for all n ∈ N.
(c) We have
\[
f(\alpha) = E\big[\tilde X^{n+1}_i\big] = P[G = 0]\,P[C = 1] + \sum_{i=1}^{\infty} i\,P[G = i] = \alpha\, P[G = 0] + \sum_{i=1}^{\infty} i\,P[G = i].
\]
It is clear that f is continuous (in fact, f is linear). Hence, by the intermediate value theorem, there is some α ∈ [0, 1] such that f(α) = 1.
(d) By (c), we can choose α such that f(α) = 1. Hence, E[X̃^{n+1}_i] = 1, so by (a) (Z̃_n) is a Galton-Watson process with an offspring distribution that has mean μ̃ = 1. By Lemma 7.4.6, (Z̃_n) dies out, almost surely. Since, from (b), 0 ≤ Z_n ≤ Z̃_n, this means (Z_n) must also die out, almost surely.
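A quick simulation of this extinction (a sketch; the Poisson distribution with mean 1 is an arbitrary choice of mine for an offspring distribution with μ = 1):

import numpy as np

rng = np.random.default_rng(0)

def dies_out(generations=500):
    """Run a Galton-Watson process with Poisson(1) offspring (mean 1)
    and report whether it has died out within the given horizon."""
    z = 1
    for _ in range(generations):
        if z == 0:
            return True
        z = rng.poisson(1.0, size=z).sum()  # total offspring of z parents
    return False

print(np.mean([dies_out() for _ in range(1000)]))  # close to 1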
Chapter 8
8.1 We have U^{−1} < 1 so, for all ω ∈ Ω, X_n(ω) = U^{−n}(ω) → 0 as n → ∞. Hence X_n → 0 almost surely as n → ∞. Further, |X_n| ≤ 1 for all n ∈ N and E[1] = 1 < ∞, so the dominated convergence theorem applies and shows that E[X_n] → E[0] = 0 as n → ∞.
8.2 We check the two conditions of the dominated convergence theorem. To check the first condition, note that
if n ≥ |X| then X 1{|X|≤n} = X. Hence, since |X| < ∞, as n → ∞ we have Xn = X 1{|X|≤n} → X almost
surely.
To check the second condition, set Y = |X| and then E[|Y |] = E[|X|] < ∞ so Y ∈ L1 . Also, |Xn | =
|X 1{|X|≤n} | ≤ |X| = Y , so Y is a dominated random variable for (Xn ). Hence, the dominated convergence
theorem applies and E[Xn ] → E[X].
8.3 (a) We look to use the dominated convergence theorem. For any ω ∈ Ω we have Z(ω) < ∞, hence for all
n ∈ N such that n > Z(ω) we have Xn (ω) = 0. Therefore, as n → ∞, Xn (ω) → 0, which means that
Xn → 0 almost surely.
We have |Xn| ≤ Z and Z ∈ L1, so we can use Z as the dominating random variable. Hence, by the dominated convergence theorem, E[Xn] → E[0] = 0.
(b) We have
\[
E[Z] = \int_1^{\infty} x f(x)\, dx = \int_1^{\infty} x^{-1}\, dx = [\log x]_1^{\infty} = \infty,
\]
which means Z ∉ L1, and also that
\[
E[X_n] = E[1_{\{Z \in [n, n+1)\}} Z] = \int_1^{\infty} 1_{\{x \in [n, n+1)\}}\, x f(x)\, dx = \int_n^{n+1} x^{-1}\, dx = [\log x]_n^{n+1} = \log(n+1) - \log n = \log\frac{n+1}{n}.
\]
As n → ∞, we have \frac{n+1}{n} = 1 + \frac{1}{n} → 1, hence (using that log is a continuous function) we have \log(\frac{n+1}{n}) → \log 1 = 0. Hence, E[X_n] → 0.
(c) Suppose that we wanted to use the DCT in (b). We still have Xn → 0 almost surely, but any
dominating random variable Y would have to satisfy Y ≥ |Xn | for all n, meaning that also Y ≥ Z,
which means that E[Y ] ≥ E[Z] = ∞; thus there is no dominating random variable Y ∈ L1 . Therefore,
we can’t use the DCT here, but we have shown in (b) that the conclusion of the DCT does hold: we
have that E[Xn ] does tend to zero.
We obtain that the conditions of the DCT are sufficient but not necessary for its conclusion to hold.
8.4 Note that {T = n} = {T ≤ n} \ {T ≤ n − 1}. If T is a stopping time then both parts of the right hand side
are elements of Fn , hence so is {T = n}.
Conversely, if {T = n} ∈ Fn for all n, then for i ≤ n we have {T = i} ∈ Fi ⊆ Fn. Hence the identity {T ≤ n} = ∪_{i=1}^{n} {T = i} shows that {T ≤ n} ∈ Fn, and therefore T is a stopping time.
Since S and T are both stopping times we have {T ≤ n}, {S ≤ n} ∈ Fn for all n ∈ N, which means that {T ≤ j}, {S ≤ k} ∈ Fn for all j, k ≤ n. Writing {S + T ≤ n} = ∪_{k=0}^{n} ({S = k} ∩ {T ≤ n − k}), each set in this union is an element of Fn, hence {S + T ≤ n} ∈ Fn and therefore S + T is also a stopping time.
8.7 If A ∈ F_S then A ∩ {S ≤ n} ∈ F_n. Since T is a stopping time we have {T ≤ n} ∈ F_n, hence A ∩ {S ≤ n} ∩ {T ≤ n} ∈ F_n. Noting that {S ≤ n} ∩ {T ≤ n} = {max(S, T) ≤ n}, this shows that A ∩ {max(S, T) ≤ n} ∈ F_n, and hence A ∈ F_{max(S,T)}.
8.8 (a) For T ≥ n, the first n − 1 draws must all be black, so
\[
P[T \ge n] = \prod_{k=1}^{n-1} \frac{k}{k+1} = \frac{1}{n}.
\]
Therefore, since P[T = ∞] ≤ P[T ≥ n] for all n, we have P[T = ∞] = 0 and P[T < ∞] = 1.
(b) We note that T is a stopping time, with respect to F_n = σ(B_i ; i ≤ n), since
\[
\{T \le n\} = \{\text{a red ball was drawn at some time } i \le n\} = \bigcup_{i=1}^{n} \{B_i = 2\}.
\]
We showed in Section 4.2 that (M_n) was a martingale. Since M_n ∈ [0, 1] the process (M_n) is bounded. By part (a), for any n ∈ N we have P[T = ∞] ≤ P[T ≥ n] = 1/n, hence P[T = ∞] = 0 and P[T < ∞] = 1. Hence, we have condition (b) of the optional stopping theorem, so
\[
E[M_T] = E[M_0] = \frac{1}{2}.
\]
By definition of T we have B_T = 2. Hence M_T = \frac{2}{T+2}, so we obtain E\big[\frac{1}{T+2}\big] = \frac{1}{4}.
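A quick Monte Carlo check of E[1/(T + 2)] = 1/4 (a sketch, assuming the urn starts with one red and one black ball; note that E[T] itself is infinite, so the rare very long runs are truncated):

import numpy as np

rng = np.random.default_rng(0)

def first_red_time(max_t=10**6):
    """Pólya urn started from 1 red, 1 black: draw a ball, return it plus
    one more of the same colour; return the time of the first red draw."""
    red, black = 1, 1
    for t in range(1, max_t + 1):
        if rng.random() < red / (red + black):
            return t    # red drawn at time t
        black += 1      # black drawn: add another black ball
    return max_t        # truncation; reached with probability 1/max_t

vals = [1 / (first_red_time() + 2) for _ in range(100_000)]
print(np.mean(vals))  # close to 0.25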
8.9 (a) We showed in exercise 7.8 that, almost surely, there was a point in time at which the urn contained
either all red balls or all black balls. Hence, P[T < ∞] = 1.
Moreover, almost surely, since the urn eventually contains either all red balls or all black balls, we
have either MT = 1 (all red balls) or MT = 0 (all black balls).
(b) We note that T is a stopping time, since
\[
\{T \le n\} = \{\text{for some } i \le n \text{ we have } M_i \in \{0, 1\}\} = \bigcup_{i=0}^{n} M_i^{-1}(\{0, 1\}).
\]
In exercise 7.8 we showed that (M_n) was a martingale. Since M_n ∈ [0, 1], in fact (M_n) is a bounded martingale, and we have P[T < ∞] = 1 from above, so condition (b) of the optional stopping theorem applies. Hence,
\[
E[M_T] = E[M_0] = \frac{r}{r+b}.
\]
Lastly, since M_T is either 0 or 1 (almost surely) we note that
\[
E[M_T] = 0 \cdot P[M_T = 0] + 1 \cdot P[M_T = 1] = P[M_T = 1].
\]
Hence, P[M_T = 1] = \frac{r}{r+b}.
8.10 (a) We use the filtration F_n = σ(N_i ; i ≤ n). We have P_n ∈ mF_n and since 0 ≤ P_n ≤ 1 we have also that P_n ∈ L1. Also,
\[
\begin{aligned}
E[P_{n+1} \mid \mathcal{F}_n] &= E[P_{n+1} 1_{\{\text{the } (n+1)\text{th ball is red}\}} \mid \mathcal{F}_n] + E[P_{n+1} 1_{\{\text{the } (n+1)\text{th ball is blue}\}} \mid \mathcal{F}_n] \\
&= E\Big[\frac{N_n - 1}{2m - n - 1}\, 1_{\{\text{the } (n+1)\text{th ball is red}\}} \,\Big|\, \mathcal{F}_n\Big] + E\Big[\frac{N_n}{2m - n - 1}\, 1_{\{\text{the } (n+1)\text{th ball is blue}\}} \,\Big|\, \mathcal{F}_n\Big] \\
&= \frac{N_n - 1}{2m - n - 1}\, E[1_{\{\text{the } (n+1)\text{th ball is red}\}} \mid \mathcal{F}_n] + \frac{N_n}{2m - n - 1}\, E[1_{\{\text{the } (n+1)\text{th ball is blue}\}} \mid \mathcal{F}_n] \\
&= \frac{N_n - 1}{2m - n - 1}\cdot\frac{N_n}{2m - n} + \frac{N_n}{2m - n - 1}\cdot\frac{2m - n - N_n}{2m - n} \\
&= \frac{N_n}{2m - n} \\
&= P_n.
\end{aligned}
\]
Here we use taking out what is known (since N_n ∈ mF_n), along with the definition of our urn process to calculate e.g. E[1_{\{\text{the } (n+1)\text{th ball is red}\}} \mid \mathcal{F}_n] as a function of N_n. Hence (P_n) is a martingale.
(b) Since 0 ≤ P_n ≤ 1, the process (P_n) is a bounded martingale. The time T is bounded above by 2m, hence condition (b) for the optional stopping theorem holds and E[P_T] = E[P_0]. Since P_0 = 1/2 this gives us E[P_T] = 1/2. Hence,
\[
P[N_{T+1} = N_T + 1] = \sum_{i=1}^{2m-1} P[N_{T+1} = N_T + 1 \mid T = i]\, P[T = i] = \sum_{i=1}^{2m-1} \frac{m-1}{2m-i}\, P[T = i] = E[P_T] = \frac{1}{2},
\]
as required.
Chapter 9
9.1 (a) We have that T is a stopping time and (from Section 4.1) M_n = S_n − (p − q)n is a martingale. We want to apply the optional stopping theorem to (M_n).
We have E[T] < ∞ from Lemma 9.1.1 and for all n we have
\[
|M_{n+1} - M_n| \le |S_{n+1} - S_n| + |p - q| \le 1 + |p - q| \le 2.
\]
Thus, condition (c) for the optional stopping theorem applies and E[M_T] = E[M_0] = 0. That is,
\[
E[S_T] - (p - q)\,E[T] = 0, \qquad\text{so}\qquad E[T] = \frac{1}{p-q}\big(a\,P[T = T_a] + b\,P[T = T_b]\big).
\]
Putting equations (9.5) and (9.6) into the above, and then using part (a), gives us
\[
E[T] = \frac{1}{p-q}\left( a\, \frac{(q/p)^b - 1}{(q/p)^b - (q/p)^a} + b\, \frac{1 - (q/p)^a}{(q/p)^b - (q/p)^a} \right).
\]
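A Monte Carlo check of this formula (a sketch; p = 0.6, a = −3, b = 5 are arbitrary choices of mine, and T is taken to be the first hitting time of {a, b} by the walk started at 0):

import numpy as np

rng = np.random.default_rng(0)
p, a, b = 0.6, -3, 5
q = 1 - p

def hitting_time():
    """Time for the asymmetric walk started at 0 to first hit a or b."""
    s, t = 0, 0
    while a < s < b:
        s += 1 if rng.random() < p else -1
        t += 1
    return t

r = q / p
formula = (a * (r**b - 1) + b * (1 - r**a)) / ((p - q) * (r**b - r**a))
estimate = np.mean([hitting_time() for _ in range(20_000)])
print(formula, estimate)  # the two values should be close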
9.2 (a) We have P[T_a < ∞] = P[T_b < ∞] = 1 from Lemma 9.3.2. Hence also P[T = T_a < ∞ or T = T_b < ∞] = 1. Since these two possibilities are disjoint, in fact P[T = T_a] + P[T = T_b] = 1.
For the second equation, we will use that (S_n) is a martingale. We want to apply the optional stopping theorem, but none of the conditions apply directly to (S_n). Instead, set M_n = S_{n∧T} and note that Lemma 8.2.4 implies that (M_n) is a martingale. By definition of T we have |M_n| ≤ max{|a|, |b|}, hence (M_n) is bounded. We have P[T < ∞] = 1 from above, hence condition (b) of the optional stopping theorem applies to (M_n). We thus deduce that
\[
E[M_T] = E[M_0] = 0.
\]
Hence
\[
0 = E[M_T] = E[M_T 1_{\{T = T_a\}}] + E[M_T 1_{\{T = T_b\}}] = a\,P[T = T_a] + b\,P[T = T_b],
\]
as required. In the above, the first equality partitions on the value of T (using what we already deduced). The final equality follows because M_T = S_{T∧T} = S_T, which on the event {T = T_a} equals S_{T_a} = a, and similarly for T_b.
(b) Solving the two equations (this is left for you) gives that P[T = T_a] = \frac{b}{b-a} and P[T = T_b] = \frac{-a}{b-a}.
(c) From Exercise 4.4 we have that S_n^2 − n is a martingale. Again, none of the conditions of optional stopping directly apply, but X_n = S_{n∧T}^2 − (n ∧ T) is a martingale by Theorem 8.2.4. Similarly to above, we have |X_n| ≤ max{a^2 + |a|, b^2 + |b|}, so (X_n) is bounded. Hence condition (b) of the optional stopping theorem applies.
9.3 (b) The argument that E[G] < ∞, under the assumption P[R < ∞] = 1, is exactly the same as before. We need to modify the next part, beginning with (9.9), which is replaced by the result in part (a). Note that S_{3n} now plays the role of what was previously S_{2n}, being the possible returns to zero. Having (a) introduces an extra factor of √(2/3), but using the same method as before we can still deduce that E[G] = ∞. We thus arrive at a contradiction in the same way.
9.4 (a) We have
\[
E[X_i] = (p)(1) + \frac{1-p}{2}(-1 - 2) = \frac{5p - 3}{2}.
\]
The condition p ∈ (5/6, 1] implies that E[X_i] > 0. Writing μ = E[X_i], it follows from the strong law of large numbers that (almost surely) there exists N ∈ N such that for all n ≥ N we have S_n ≥ n\frac{μ}{2}. Hence S_n → ∞ almost surely.
(b) We calculate
\[
\begin{aligned}
E[T_1] &= E[T_1 1_{\{S_1 = 1\}}] + E[T_1 1_{\{S_1 = -1\}}] + E[T_1 1_{\{S_1 = -2\}}] \\
&= E[1\cdot 1_{\{S_1 = 1\}}] + E\big[E[T_1 1_{\{S_1 = -1\}} \mid \mathcal{F}_1]\big] + E\big[E[T_1 1_{\{S_1 = -2\}} \mid \mathcal{F}_1]\big] \\
&= p + E\big[1_{\{S_1 = -1\}} E[T_1 \mid \mathcal{F}_1]\big] + E\big[1_{\{S_1 = -2\}} E[T_1 \mid \mathcal{F}_1]\big] \\
&= p + E\big[1_{\{S_1 = -1\}} E[1 + T_1' + T_1'' \mid \mathcal{F}_1]\big] + E\big[1_{\{S_1 = -2\}} E[1 + T_1' + T_1'' + T_1''' \mid \mathcal{F}_1]\big] \\
&= p + E\big[1_{\{S_1 = -1\}} E[1 + T_1' + T_1'']\big] + E\big[1_{\{S_1 = -2\}} E[1 + T_1' + T_1'' + T_1''']\big] \\
&= p + E\big[1_{\{S_1 = -1\}} (1 + 2E[T_1])\big] + E\big[1_{\{S_1 = -2\}} (1 + 3E[T_1])\big] \\
&= p + \frac{1-p}{2}(1 + 2E[T_1]) + \frac{1-p}{2}(1 + 3E[T_1]).
\end{aligned}
\]
In the first line of the above we partition on the value of S_1. The first term of the second line follows because if S_1 = 1 then T_1 = 1, and the other terms use the relationship between expectation and conditional expectation. The third line follows because S_1 ∈ mF_1.
The fourth line uses that, on the event S_1 = −1, T_1 is equal to 1 (accounting for the first move 0 ↦ −1) plus two independent copies (T_1' and T_1'') of T_1 (accounting for the time to move from −1 ↦ 0 and then 0 ↦ 1). Similarly, on the event S_1 = −2, T_1 is equal to 1 plus three independent copies of T_1. Note that it is crucial here that (S_n) can only jump upwards by 1 in each step of time, meaning that it cannot go above 1 without first being equal to 1.
The strong Markov property gives that T_1', T_1'' and T_1''' are independent of F_1, leading to the fifth line. The remainder of the calculation is straightforward. We thus obtain
\[
E[T_1] = p + \frac{1-p}{2}\,(2 + 5E[T_1]).
\]
Solving this equation gives E[T_1] = \frac{2}{5p-3}.
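A quick simulation check of E[T_1] = 2/(5p − 3) (a sketch; p = 0.9 is an arbitrary choice of mine inside (5/6, 1], and the step distribution is the one from (a)):

import numpy as np

rng = np.random.default_rng(0)
p = 0.9  # arbitrary choice in (5/6, 1]

def t1():
    """First hitting time of level 1 for the walk with steps
    +1 w.p. p, and -1, -2 each w.p. (1-p)/2."""
    s, t = 0, 0
    while s < 1:
        u = rng.random()
        s += 1 if u < p else (-1 if u < p + (1 - p) / 2 else -2)
        t += 1
    return t

print(2 / (5 * p - 3), np.mean([t1() for _ in range(20_000)]))  # close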
9.5 Recall that S_n is even when n is even, and odd when n is odd. Hence T_1 is odd. From the fact given in the question, there exists N ∈ N such that for all n ≥ N we have P[T_1 = 2n − 1] ≥ \frac{1}{4\sqrt{\pi}\, n^{3/2}}. Hence
\[
E[T_1] \ge \sum_{n=N}^{\infty} (2n-1)\,P[T_1 = 2n-1] \ge \frac{1}{4\sqrt{\pi}} \sum_{n=N}^{\infty} \frac{2n-1}{n^{3/2}} \ge \frac{1}{4\sqrt{\pi}} \sum_{n=N}^{\infty} \frac{1}{n^{1/2}} = \infty,
\]
using that 2n − 1 ≥ n, as required.
9.6 (a) Note that M_n^{(θ)} = \prod_{i=1}^{n} (e^{θX_i}/\cosh θ). Since X_i ∈ mF_n for all i ≤ n, we have S_n ∈ mF_n, and hence M_n^{(θ)} ∈ mF_n, for all n by Proposition 2.2.6. Since |S_n| ≤ n we have |M_n^{(θ)}| ≤ \frac{e^{|θ|n}}{(\cosh θ)^n} < ∞, hence M_n^{(θ)} ∈ L1. We have also that
\[
E[M^{(\theta)}_{n+1} \mid \mathcal{F}_n] = E\left[\left(\prod_{i=1}^{n} \frac{e^{\theta X_i}}{\cosh \theta}\right) \frac{e^{\theta X_{n+1}}}{\cosh \theta} \,\middle|\, \mathcal{F}_n\right] = M^{(\theta)}_n\, E\left[\frac{e^{\theta X_{n+1}}}{\cosh \theta}\right] = M^{(\theta)}_n.
\]
Here we use the taking out what is known rule, the fact that X_{n+1} is independent of F_n and the relationship between conditional expectation and independence. To deduce the final equality we note that E[e^{θX_i}/\cosh θ] = \frac{1}{2}(e^θ + e^{−θ})/\cosh θ = 1.
Note: This is a more general case of the martingale from Exercise 4.1.
(b) Condition (a) fails because T_m is not bounded – to see this, note that for any N ∈ N there is a positive probability that (S_n)_{n=0}^{N} remains below zero, in which case T_m ≥ N. Condition (b) fails because (M_n^{(θ)}) is unbounded. Condition (c) fails because we have E[T_1] = ∞ by Lemma 9.3.4, and T_m ≥ T_1 so also E[T_m] = ∞.
(c) By Lemma 8.2.4 the process L_n = M^{(θ)}_{n∧T} is a martingale, and by definition of T we have
\[
\frac{e^{-m\theta}}{(\cosh \theta)^n} \le L_n \le \frac{e^{m\theta}}{(\cosh \theta)^n}.
\]
Since cosh θ ≥ 1, we thus have |L_n| ≤ e^{mθ} + e^{−mθ}, so the martingale (L_n) is bounded. By Lemma 9.3.2, P[T < ∞] = 1, so condition (b) of the optional stopping theorem applies to (L_n) and T, hence
\[
E\left[\frac{e^{\theta S_T}}{(\cosh \theta)^T}\right] = E\left[\frac{e^{\theta S_0}}{(\cosh \theta)^0}\right] = 1.
\]
By Lemma 9.3.2 we have P[T < ∞] = 1, hence almost surely either T = T_m or T = T_{−m}. Hence,
\[
\begin{aligned}
1 &= E\left[\frac{e^{\theta S_T}}{(\cosh \theta)^T}\, 1_{\{T = T_m\}} + \frac{e^{\theta S_T}}{(\cosh \theta)^T}\, 1_{\{T = T_{-m}\}}\right] \\
&= E\left[\frac{e^{m\theta}}{(\cosh \theta)^{T_m}}\, 1_{\{T = T_m\}}\right] + E\left[\frac{e^{-m\theta}}{(\cosh \theta)^{T_{-m}}}\, 1_{\{T = T_{-m}\}}\right] \\
&= e^{m\theta}\, E\left[\frac{1_{\{T = T_m\}}}{(\cosh \theta)^{T_m}}\right] + e^{-m\theta}\, E\left[\frac{1_{\{T = T_{-m}\}}}{(\cosh \theta)^{T_{-m}}}\right] \\
&= \left(e^{m\theta} + e^{-m\theta}\right) E\left[\frac{1_{\{T = T_m\}}}{(\cosh \theta)^{T_m}}\right].
\end{aligned}
\]
In the above, the final line follows by symmetry about zero, which implies that the two expectations are in fact equal. Thus
\[
E\left[\frac{1_{\{T = T_m\}}}{(\cosh \theta)^{T_m}}\right] = \frac{1}{2\cosh(m\theta)}
\]
for all m ∈ N. With this in hand we can calculate
9.7 We have shown in Lemma 9.5.1 that (S_n) will almost surely return to the origin in finite time. By applying the strong Markov property at each return to the origin, we obtain that (S_n) will return to the origin infinitely often.
Fix some z ∈ Z². There is a path from the origin to z, and with positive probability the walk will take that path, hence P[S_n = z for some n ∈ N] is positive. By the strong Markov property, this means that the probability that (S_n) visits z in between any two successive visits to the origin is also positive. By the strong Markov property, whether this occurs is independent, for each successive pair of returns to the origin. Hence we have infinitely many independent trials in which to visit z, each with the same positive success probability. This will result in infinitely many visits to z, almost surely.
This holds for any z ∈ Z², which completes the argument.
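A simulation can illustrate (though of course not prove) this recurrence; the sketch below counts returns to the origin of a 2D symmetric random walk, and the count keeps growing, slowly, as the walk length increases:

import numpy as np

rng = np.random.default_rng(0)

# 200,000 steps of a symmetric random walk on Z^2
moves = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])
path = np.cumsum(moves[rng.integers(0, 4, size=200_000)], axis=0)

returns = int(((path == 0).all(axis=1)).sum())
print(returns)  # the number of returns to the origin so far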
9.8 (a) If L = ∞ then the set {n ; S_n = 0} must be infinite, meaning that (S_n) has returned infinitely many times to the origin. However, this has probability zero; by the strong Markov property, after each visit there is a probability P[R = ∞] > 0 of not returning again, independently of the past; eventually that will happen.
(b) L is not a stopping time. One way to see this is to note that {L = 0} = {R = ∞}, which depends on the whole path taken by the walk, so is not an element of F_n for any n ∈ N.
(c) The event {L = 2n} is the event that S_{2n} = 0, but the walk is never zero after that time. Hence, by the Markov property,
\[
P[L = 2n] = P[S_{2n} = 0]\, P[R = \infty].
\]
Recall that S_n is even if and only if n is even. Taking the above equation and summing over n, noting that \sum_{n=0}^{\infty} P[L = 2n] = 1, we obtain
\[
1 = P[R = \infty] \sum_{n=0}^{\infty} P[S_{2n} = 0] = P[R = \infty]\, E[G].
\]
The required result follows by noting that P[R < ∞] = 1 − P[R = ∞].
(d) During the proof of Lemma 9.3.4, whilst we were arguing by contradiction, we noted that if p = P[R <
∞] < 1 then G would have a Geometric(p) distribution. In part (c) we’ve discovered the mean of this
geometric distribution. We also came close, for the same reason, during the proofs of Lemmas 9.5.1
and 9.5.2, and in Remark 9.4.2. In all of those cases, all we needed was to note that E[G] would be
finite – we didn’t need to know its value.
9.9 Let B_r = {z ∈ Z³ ; |z| < r}. Note that B_r is a finite set.
Consider some z ∈ Z³. If (S_n) visits z then, the first time that it does, we may apply the strong Markov property. From then on the walk will behave independently of its past, again moving one unit of space in a random direction on each step of time. Applying part (c) of Exercise 9.8, the expected number of returns to z is finite, hence there will almost surely be a last return time to z.
We can do the above for any z ∈ B_r. Since B_r is finite, this means that almost surely there is a last return time to the set B_r. That is, there exists some random variable N ∈ N such that S_n ∉ B_r for all n ≥ N. Hence |S_n| ≥ r for all n ≥ N. Since this holds for any r ∈ N, we have |S_n| → ∞ almost surely.
Appendix B
The formula sheet displayed on the following page will be provided in the exam.