
Stochastic Calculus

Financial Derivatives
and PDE’s

Simone Calogero

March 2, 2017
Contents

1 Probability spaces 3
1.1 σ-algebras and information . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Probability measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Filtered probability spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.A Appendix: The ∞-coin tosses probability space . . . . . . . . . . . . . . . . 11
1.B Appendix: Solutions to selected problems . . . . . . . . . . . . . . . . . . . . 13

2 Random variables and stochastic processes 15


2.1 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Distribution and probability density functions . . . . . . . . . . . . . . . . . 19
2.2.1 Random variables with boundary values . . . . . . . . . . . . . . . . 24
2.2.2 Joint distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3 Stochastic processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 Stochastic processes in financial mathematics . . . . . . . . . . . . . . . . . . 33
2.A Appendix: Solutions to selected problems . . . . . . . . . . . . . . . . . . . . 39

3 Expectation 43
3.1 Expectation and variance of random variables . . . . . . . . . . . . . . . . . 43
3.2 Computing the expectation of a random variable . . . . . . . . . . . . . . . . 50
3.3 Characteristic function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.4 Quadratic variation of stochastic processes . . . . . . . . . . . . . . . . . . . 55
3.5 Conditional expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.6 Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.7 Markov processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.A Appendix: Solutions to selected problems . . . . . . . . . . . . . . . . . . . . 69

4 Stochastic calculus 73
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.2 The Itô integral of step processes . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3 Itô’s integral of general stochastic processes . . . . . . . . . . . . . . . . . . 77
4.4 Diffusion processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.4.1 The product rule in stochastic calculus . . . . . . . . . . . . . . . . . 84
4.4.2 The chain rule in stochastic calculus . . . . . . . . . . . . . . . . . . 85

4.5 Girsanov’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.6 Diffusion processes in financial mathematics . . . . . . . . . . . . . . . . . . 89
4.A Appendix: Solutions to selected problems . . . . . . . . . . . . . . . . . . . . 92

5 Stochastic differential equations and partial differential equations 94


5.1 Stochastic differential equations . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.1.1 Linear SDE’s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.1.2 Markov property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.1.3 Systems of SDE’s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.2 Partial differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.3 The CIR process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.4 Finite different solutions of PDE’s . . . . . . . . . . . . . . . . . . . . . . . . 105
5.4.1 ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.4.2 PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.A Appendix: Solutions to selected problems . . . . . . . . . . . . . . . . . . . . 115

6 The risk-neutral price 116


6.1 Absence of arbitrage in 1+1 dimensional markets . . . . . . . . . . . . . . . 116
6.2 The risk-neutral pricing formula . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.3 Black-Scholes price of European derivatives . . . . . . . . . . . . . . . . . . . 122
6.3.1 Black-Scholes price of European vanilla options . . . . . . . . . . . . 125
6.3.2 The greeks. Implied volatility and volatility curve . . . . . . . . . . . 127
6.4 European derivatives on a dividend-paying stock . . . . . . . . . . . . . . . . 131
6.5 Local and Stochastic volatility models . . . . . . . . . . . . . . . . . . . . . . 133
6.5.1 Local volatility models . . . . . . . . . . . . . . . . . . . . . . . . . . 134
6.5.2 Stochastic volatility models . . . . . . . . . . . . . . . . . . . . . . . 136
6.5.3 Variance swaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.6 Interest rate models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
6.7 Forwards and Futures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.7.1 Forwards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.7.2 Futures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
6.8 Multi-dimensional markets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.9 Introduction to American derivatives . . . . . . . . . . . . . . . . . . . . . . 156
6.9.1 Perpetual American put options . . . . . . . . . . . . . . . . . . . . . 158
6.9.2 American calls on a dividend-paying stock . . . . . . . . . . . . . . . 162
6.A Appendix: Solutions to selected problems . . . . . . . . . . . . . . . . . . . . 163

Chapter 1

Probability spaces

1.1 σ-algebras and information


We begin with some notation and terminology. The symbol Ω denotes a generic non-empty
set; the power set of Ω, denoted by 2^Ω, is the set of all subsets of Ω. If the number of elements
in the set Ω is M ∈ N, we say that Ω is finite. If Ω contains an infinite number of elements
and there exists a bijection Ω ↔ N, we say that Ω is countably infinite. If Ω is neither finite
nor countably infinite, we say that it is uncountable. An example of uncountable set is the
set R of real numbers. When Ω is finite we write Ω = {ω1 , ω2 , . . . , ωM }, or Ω = {ωk }k=1,...,M .
If Ω is countably infinite we write Ω = {ωk }k∈N . Note that for a finite set Ω with M elements,
the power set contains 2^M elements. For instance, if Ω = {♥, 1, $}, then

2Ω = {∅, {♥}, {1}, {$}, {♥, 1}, {♥, $}, {1, $}, {♥, 1, $} = Ω},

which contains 2^3 = 8 elements. Here ∅ denotes the empty set, which by definition is a
subset of every set.
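As a quick sanity check, one can enumerate the power set of a small set directly; the following Python sketch (illustrative only, with string stand-ins for the symbols above) confirms the count of 2^M subsets:

```python
from itertools import chain, combinations

def power_set(omega):
    """All subsets of omega, returned as frozensets (the empty set included)."""
    items = list(omega)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

subsets = power_set({"♥", "1", "$"})
assert len(subsets) == 2 ** 3          # a set with M elements has 2**M subsets
assert frozenset() in subsets          # the empty set is always among them
```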
Within the applications in probability theory, the elements ω ∈ Ω are called sample
points and represent the possible outcomes of a given experiment (or trial), while the sub-
sets of Ω correspond to events which may occur in the experiment. For instance, if the
experiment consists in rolling a die, then Ω = {1, 2, 3, 4, 5, 6} and A = {2, 4, 6} identifies
the event that the result of the experiment is an even number. Now let Ω = ΩN ,

ΩN = {(γ1 , . . . , γN ), γk ∈ {H, T }} = {H, T }^N , (1.1)

where H stands for “head” and T stands for “tail”. Each element ω = (γ1 , . . . , γN ) ∈ ΩN
is called an N-toss and represents a possible outcome of the experiment “tossing a coin N
consecutive times”. Evidently, ΩN contains 2^N elements and so 2^ΩN contains 2^(2^N) elements.
We show in Appendix 1.A at the end of the present chapter that Ω∞ —the sample space for
the experiment “tossing a coin infinitely many times”—is uncountable.
A collection of events, e.g., {A1 , A2 , . . . } ⊂ 2Ω , is also called information. To understand
the meaning of this terminology, suppose that the experiment has been performed and we
observe that the events A1 , A2 , . . . have occurred. We may then use this information to

restrict the possible outcomes of the experiment. For instance, if we are told that in a 5-toss
the following two events have occurred:

1. there are more heads than tails

2. the first toss is a tail

then we may conclude that the result of the 5-toss is one of

(T, H, H, H, H), (T, T, H, H, H), (T, H, T, H, H), (T, H, H, T, H), (T, H, H, H, T ).

If in addition we are given the information that

3. the last toss is a tail,

then we conclude that the result of the 5-toss is (T, H, H, H, T ).
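This kind of reasoning by elimination is easy to check by brute force. A short Python sketch (encoding tosses as tuples of "H"/"T") recovers the five outcomes above, and the single outcome once the third piece of information is added:

```python
from itertools import product

tosses = list(product("HT", repeat=5))            # all 2**5 = 32 possible 5-tosses
assert len(tosses) == 32

# Events 1 and 2: more heads than tails, and the first toss is a tail.
survivors = [w for w in tosses
             if w.count("H") > w.count("T") and w[0] == "T"]
assert len(survivors) == 5                        # the five 5-tosses listed above

# Event 3 in addition: the last toss is also a tail.
final = [w for w in survivors if w[-1] == "T"]
assert final == [("T", "H", "H", "H", "T")]
```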


The power set of the sample space provides the total accessible information and
represents the collection of all the events that can be resolved (i.e., whose occurrence can
be inferred) by knowing the outcome of the experiment. For an uncountable sample space,
the total accessible information is huge and it is typically replaced by a subclass of events
F ⊂ 2Ω , which is imposed to form a σ-algebra.

Definition 1.1. A collection F ⊆ 2Ω of subsets of Ω is called a σ-algebra (or σ-field) on


Ω if

(i) ∅ ∈ F;

(ii) A ∈ F ⇒ Ac := {ω ∈ Ω : ω ∉ A} ∈ F;

(iii) ⋃_{k=1}^∞ Ak ∈ F, for all {Ak }k∈N ⊆ F.

If G is another σ-algebra on Ω and G ⊂ F, we say that G is a sub-σ-algebra of F.

Exercise 1.1. Let F be a σ-algebra. Show that Ω ∈ F and that ∩k∈N Ak ∈ F, for all
countable families {Ak }k∈N ⊂ F of events.

Exercise 1.2. Let Ω = {1, 2, 3, 4, 5, 6} be the sample space of a dice roll. Which of the
following sets of events are σ-algebras on Ω?

1. {∅, {1}, {2, 3, 4, 5, 6}, Ω},

2. {∅, {1}, {2}, {1, 2}, {1, 3, 4, 5, 6}, {2, 3, 4, 5, 6}, {3, 4, 5, 6}, Ω},

3. {∅, {2}, {1, 3, 4, 5, 6}, Ω}.

Exercise 1.3 (•). Prove that the intersection of any number of σ-algebras (including un-
countably many) is a σ-algebra. Show with a counterexample that the union of two σ-algebras
is not necessarily a σ-algebra.
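For finite sample spaces the axioms can be checked mechanically (on a finite Ω, closure under pairwise unions already implies closure under countable unions). The sketch below uses an ad hoc helper, not library code, applied to collections 1 and 3 of Exercise 1.2; it also illustrates the counterexample requested here, which is the one used in the solution appendix:

```python
def is_sigma_algebra(family, omega):
    """Check the sigma-algebra axioms for a family of subsets of a FINITE omega."""
    fam = {frozenset(A) for A in family}
    omega = frozenset(omega)
    if frozenset() not in fam:                          # axiom (i)
        return False
    if any(omega - A not in fam for A in fam):          # axiom (ii): complements
        return False
    return all(A | B in fam for A in fam for B in fam)  # axiom (iii), finite case

omega = {1, 2, 3, 4, 5, 6}
F1 = [set(), {1}, {2, 3, 4, 5, 6}, omega]
F3 = [set(), {2}, {1, 3, 4, 5, 6}, omega]
assert is_sigma_algebra(F1, omega) and is_sigma_algebra(F3, omega)
# The union of two sigma-algebras need not be one: {1} | {2} = {1, 2} is missing.
assert not is_sigma_algebra(F1 + F3, omega)
```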

Remark 1.1 (Notation). The letter A is used to denote a generic event in the σ-algebra.
If we need to consider two such events, we denote them by A, B, while N generic events are
denoted A1 , . . . , AN .
Let us comment on Definition 1.1. The empty set represents the “nothing happens”
event, while Ac represents the “A does not occur” event. Given a finite number A1 , . . . , AN
of events, their union is the event that at least one of the events A1 , . . . , AN occurs, while
their intersection is the event that all events A1 , . . . , AN occur. The reason to include the
countable union/intersection of events in our analysis is to make it possible to “take limits”
without crossing the boundaries of the theory. Of course, unions and intersections of infinitely
many sets only matter when Ω is not finite.
The smallest σ-algebra on Ω is F = {∅, Ω}, which is called the trivial σ-algebra. There
is no relevant information contained in the trivial σ-algebra. The largest possible σ-algebra
is F = 2Ω , which contains the full amount of accessible information. When Ω is countable,
it is common to pick 2Ω as σ-algebra of events. However, as already mentioned, when Ω
is uncountable this choice is unwise. A useful procedure to construct a σ-algebra of events
when Ω is uncountable is the following. First we select a collection of events (i.e., subsets of
Ω), which for some reason we regard as fundamental. Let O denote this collection of events.
Then we introduce the smallest σ-algebra containing O, which is formally defined as follows.
Definition 1.2. Let O ⊂ 2Ω . The σ-algebra generated by O is
FO = ⋂ { F : F ⊆ 2^Ω is a σ-algebra and O ⊆ F },

i.e., FO is the smallest σ-algebra on Ω containing O.


Recall that the intersection of any number of σ-algebras is still a σ-algebra, see Exer-
cise 1.3, hence FO is a well-defined σ-algebra. For example, let Ω = Rd and let O be the
collection of all open balls:

O = {Bx (R)}R>0,x∈Rd , where Bx (R) = {y ∈ Rd : |x − y| < R}.

The σ-algebra generated by O is called Borel σ-algebra and denoted B(Rd ). The elements
of B(Rd ) are called Borel sets.
Remark 1.2 (Notation). The Borel σ-algebra B(R) plays an important role in these notes,
so we shall use a specific notation for its elements. A generic event in the σ-algebra B(R)
will be denoted U ; if we need to consider two such events we denote them by U, V , while
N generic Borel sets of R will be denoted U1 , . . . , UN . Recall that for general σ-algebras, the
notation used is the one indicated in Remark 1.1.
The σ-algebra generated by O has a particularly simple form when O is a partition of Ω.
Definition 1.3. Let I ⊆ N. A collection O = {Ak }k∈I of non-empty subsets of Ω is called
a partition of Ω if
(i) the events {Ak }k∈I are disjoint, i.e., Aj ∩ Ak = ∅, for j ≠ k;

(ii) ⋃_{k∈I} Ak = Ω.

If I is a finite set we call O a finite partition of Ω.
Note that any countable sample space Ω = {ωk }k∈N is partitioned by the atomic events
Ak = {ωk }, where {ωk } identifies the event that the result of the experiment is exactly ωk .
Exercise 1.4. Show that when O is a partition, the σ-algebra generated by O is given by
the set of all subsets of Ω which can be written as the union of sets in the partition O (plus
the empty set, of course).
Exercise 1.5. Find the partition of Ω = {1, 2, 3, 4, 5, 6} that generates the σ-algebra 2 in
Exercise 1.2.

1.2 Probability measure


To any event A ∈ F we want to associate a probability that A occurred.
Definition 1.4. Let F be a σ-algebra on Ω. A probability measure is a function

P : F → [0, 1]

such that
(i) P(Ω) = 1;

(ii) for any countable collection of disjoint events {Ak }k∈N ⊆ F, we have

P( ⋃_{k=1}^∞ Ak ) = ∑_{k=1}^∞ P(Ak ).

A triple (Ω, F, P) is called a probability space.


The quantity P(A) is called probability of the event A; if P(A) = 1 we say that the
event A occurs almost surely, which is sometimes shortened by a.s.; if P(A) = 0 we say
that A is a null set. In general, the elements of F with probability zero or one will be called
trivial events (as trivial is the information that they provide). For instance, P(Ω) = 1, i.e.,
the probability that “something happens” is one, and P(∅) = P(Ω^c) = 1 − P(Ω) = 0, i.e., the
probability that “nothing happens” is zero.
Exercise 1.6 (•). Prove the following properties:
1. P(Ac ) = 1 − P(A);

2. P(A ∪ B) = P(A) + P(B) − P(A ∩ B);

3. If A ⊂ B, then P(A) ≤ P(B).

Exercise 1.7 (Continuity of probability measures (?)). Let {Ak }k∈N ⊆ F such that Ak ⊆
Ak+1 , for all k ∈ N. Let A = ∪k Ak . Show that

lim_{k→∞} P(Ak ) = P(A).

Similarly, if now {Ak }k∈N ⊆ F such that Ak+1 ⊆ Ak , for all k ∈ N and A = ∩k Ak , show
that
lim_{k→∞} P(Ak ) = P(A).

Let us see some examples of probability spaces.

• There is only one probability measure defined on the trivial σ-algebra, namely P(∅) = 0
and P(Ω) = 1.

• In this example we describe the general procedure to construct a probability space on


a countable sample space Ω = {ωk }k∈N . We pick F = 2^Ω and let 0 ≤ pk ≤ 1, k ∈ N,
be real numbers such that
∑_{k=1}^∞ pk = 1.

We introduce a probability measure on F by first defining the probability of the atomic


events {ω1 }, {ω2 }, . . . as
P({ωk }) = pk , k ∈ N.
Since every (non-empty) subset of Ω can be written as the disjoint union of atomic
events, then the probability of any event can be inferred using the property (ii) in the
definition of probability measure, e.g.,

P({ω1 , ω3 , ω5 }) = P({ω1 } ∪ {ω3 } ∪ {ω5 })


= P({ω1 }) + P({ω3 }) + P({ω5 }) = p1 + p3 + p5 .

In general we define

P(A) = ∑_{k : ωk ∈ A} pk , A ∈ 2^Ω,

while P(∅) = 0.
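As a concrete illustration of this construction (the choice pk = 2^−k is an arbitrary example: these masses are nonnegative and sum to 1 over k ∈ N), the probability of any event is the sum of the masses of its atoms:

```python
def p(k):
    """Mass of the atomic event {omega_k}: here p_k = 2**(-k), k = 1, 2, ..."""
    return 2.0 ** (-k)

def prob(indices):
    """P(A) = sum of p_k over the indices k with omega_k in A."""
    return sum(p(k) for k in indices)

# P({omega_1, omega_3, omega_5}) = p_1 + p_3 + p_5
assert abs(prob({1, 3, 5}) - (0.5 + 0.125 + 0.03125)) < 1e-12
# A long truncation of the total mass is numerically indistinguishable from 1.
assert abs(prob(range(1, 60)) - 1.0) < 1e-12
```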

• As a special case of the previous example we now introduce a probability measure on


the sample space ΩN of the N -coin tosses experiment. Given 0 < p < 1 and ω ∈ ΩN ,
we define the probability of the atomic event {ω} as

P({ω}) = p^{NH(ω)} (1 − p)^{NT(ω)} , (1.2)

where NH (ω) is the number of H in ω and NT (ω) is the number of T in ω (NH (ω) +
NT (ω) = N ). We say that the coin is fair if p = 1/2. The probability of a generic
event A ∈ F = 2ΩN is obtained by adding up the probabilities of the atomic events

whose disjoint union forms the event A. For instance, assume N = 3 and consider the
event
“The first and the second toss are equal”.
Denote by A ∈ F the set corresponding to this event. Then clearly A is the (disjoint)
union of the atomic events

{(H, H, H)}, {(H, H, T )}, {(T, T, T )}, {(T, T, H)}.

Hence,

P(A) = P({(H, H, H)}) + P({(H, H, T )}) + P({(T, T, T )}) + P({(T, T, H)})
= p^3 + p^2 (1 − p) + (1 − p)^3 + (1 − p)^2 p = 2p^2 − 2p + 1.
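Computations of this type can be verified by summing (1.2) over the atoms of the event. The following sketch checks the formula 2p^2 − 2p + 1 for a biased coin (the value p = 0.3 is just an example):

```python
from itertools import product

def prob_event(pred, N, p):
    """Sum the atomic probabilities (1.2) over the N-tosses satisfying pred."""
    total = 0.0
    for w in product("HT", repeat=N):
        if pred(w):
            nH = w.count("H")
            total += p ** nH * (1 - p) ** (N - nH)
    return total

p = 0.3
# Event: the first and the second toss are equal (N = 3).
pa = prob_event(lambda w: w[0] == w[1], 3, p)
assert abs(pa - (2 * p**2 - 2 * p + 1)) < 1e-12
# Sanity check: the whole sample space has probability one.
assert abs(prob_event(lambda w: True, 3, p) - 1.0) < 1e-12
```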

• Let f : R → [0, ∞) be a measurable function¹ such that

∫_R f (x) dx = 1.

Then
P(U ) = ∫_U f (x) dx, (1.3)

defines a probability measure on B(R).

Remark 1.3 (Riemann vs. Lebesgue integral). The integral in (1.3) must be understood
in the Lebesgue sense, since we are integrating a general measurable function over a general
Borel set. If f is a sufficiently regular (say, continuous) function, and U = (a, b) ⊂ R is
an interval, then the integral in (1.3) can be understood in the Riemann sense. Although
this last case is sufficient for most applications in finance, all integrals in these notes should
be understood in the Lebesgue sense, unless otherwise stated. The knowledge of Lebesgue
integration theory is however not required for our purposes.
Exercise 1.8 (•). Prove that ∑_{ω∈ΩN} P({ω}) = 1, where P({ω}) is given by (1.2).

Equivalent probability measures


A probability space is a triple (Ω, F, P) and if we change one element of this triple we get
a different probability space. The most interesting case is when a new probability measure
is introduced. Let us first show with an example (known as Bertrand’s paradox) that
there might not be just one “reasonable” definition of probability measure associated to a
given experiment. We perform an experiment whose result is a pair of points p, q on the
unit circle C (e.g., throwing two balls onto a roulette wheel). The sample space for this experiment is
Ω = {(p, q) : p, q ∈ C}. Let T be the length of the chord joining p and q. Now let L be
¹See Section 2.1 for the definition of measurable function.
[Figure 1.1: The Bertrand paradox. (a) P(A) = 1/3; (b) P(A) = 1/4. The length T of the chord pq is greater than L.]

the length of the side of a equilateral triangle inscribed in the circle C. Note that all such
triangles are obtained one from another by a rotation around the center of the circle and
all have the same sides length L. Consider the event A = {(p, q) ∈ Ω : T > L}. What
is a reasonable definition for P(A)? From one hand we can suppose that one vertex of the
triangle is p, and thus T will be greater than L if and only if the point q lies on the arch
of the circle between the two vertexes of the triangle different from p, see Figure 1.1(a).
Since the length of such arc is 1/3 the perimeter of the circle, then it is reasonable to define
P(A) = 1/3. On the other hand, it is simple to see that T > L whenever the midpoint m of
the chord lies within a circle of radius 1/2 concentric to C, see Figure 1.1(b). Since the area
of the interior circle is 1/4 the area of C, we are led to define P(A) = 1/4.
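Both answers can be reproduced by Monte Carlo simulation. The sketch below (plain Python with a fixed seed, unit circle) samples each of the two randomization schemes and recovers probabilities close to 1/3 and 1/4 respectively:

```python
import math
import random

random.seed(0)
n = 200_000
L = math.sqrt(3)   # side of an equilateral triangle inscribed in the unit circle

# Scheme (a): fix p, pick q uniformly on the circle; a chord subtending an
# angle theta at the center has length T = 2 sin(theta / 2).
hits_a = sum(2 * math.sin(random.uniform(0, 2 * math.pi) / 2) > L
             for _ in range(n))

# Scheme (b): pick the chord's midpoint m uniformly in the disk (accept-reject
# from the square); T > L exactly when |m| < 1/2.
accepted = hits_b = 0
while accepted < n:
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    if x * x + y * y <= 1.0:
        accepted += 1
        hits_b += x * x + y * y < 0.25

assert abs(hits_a / n - 1 / 3) < 0.01
assert abs(hits_b / n - 1 / 4) < 0.01
```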
Whenever two probabilities are defined for the same experiment, we shall require them
to be equivalent, in the following sense.
Definition 1.5. Given two probability spaces (Ω, F, P) and (Ω, F, P̃), the probability mea-
sures P and P̃ are said to be equivalent if P(A) = 0 ⇔ P̃(A) = 0.

A complete characterization of the probability measures P̃ equivalent to a given P will
be given in Theorem 3.3.

Conditional probability
It might be that the occurrence of an event B makes the occurrence of another event A more
or less likely. For instance, the probability of the event A = {the first two tosses of a fair
coin are both heads} is 1/4; however if we know that the first toss is a tail, then P(A) = 0, while
P(A) = 1/2 if we know that the first toss is a head. This leads to the important definition
of conditional probability.

Definition 1.6. Given two events A, B such that P(B) > 0, the conditional probability
of A given B is defined as
P(A|B) = P(A ∩ B) / P(B).
To justify this definition, let FB = {A ∩ B}A∈F , and set

PB (·) = P(·|B). (1.4)

Then (B, FB , PB ) is a probability space in which the events that cannot occur simultaneously
with B are null events. Therefore it is natural to regard (B, FB , PB ) as the restriction of the
probability space (Ω, F, P) when B has occurred.
If P(A|B) = P(A), the two events are said to be independent. The interpretation is the
following: if two events A, B are independent, then the occurrence of the event B does not
change the probability that A occurred. By Definition 1.6 we obtain the following equivalent
characterization of independent events.
Definition 1.7. Two events A, B are said to be independent if P(A ∩ B) = P(A)P(B). In
general, the events A1 , . . . , AN (N ≥ 2) are said to be independent if, for all 1 ≤ k1 < k2 <
· · · < km ≤ N , we have
P(Ak1 ∩ · · · ∩ Akm ) = ∏_{j=1}^{m} P(Akj ).

Two σ-algebras F, G are said to be independent if A and B are independent, for all A ∈ G
and B ∈ F. In general the σ-algebras F1 , . . . , FN (N ≥ 2) are said to be independent if
A1 , A2 , . . . , AN are independent events, for all A1 ∈ F1 , . . . , AN ∈ FN .
Note that if F, G are two independent σ-algebras and A ∈ F ∩ G, then A is trivial. In
fact, if A ∈ F ∩ G, then P(A) = P(A ∩ A) = P(A)^2 . Hence P(A) = 0 or 1. The interpretation
of this simple remark is that independent σ-algebras carry distinct information.
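For small coin-toss spaces, independence can be verified by direct enumeration. In the sketch below (the biased coin with p = 0.3 is an arbitrary example), the first two tosses are independent events, while "the first toss is a head" and "the first two tosses are equal" fail to be independent when p ≠ 1/2:

```python
from itertools import product

def prob(pred, N, p):
    """P of an event in Omega_N for a biased coin, summing the masses (1.2)."""
    return sum(p ** w.count("H") * (1 - p) ** (N - w.count("H"))
               for w in product("HT", repeat=N) if pred(w))

p, N = 0.3, 2
A = lambda w: w[0] == "H"      # first toss is a head
B = lambda w: w[1] == "H"      # second toss is a head
C = lambda w: w[0] == w[1]     # first two tosses are equal

# A and B are independent: P(A ∩ B) = P(A) P(B).
assert abs(prob(lambda w: A(w) and B(w), N, p)
           - prob(A, N, p) * prob(B, N, p)) < 1e-12
# A and C are not independent for p != 1/2: p**2 != p * (2p**2 - 2p + 1).
assert abs(prob(lambda w: A(w) and C(w), N, p)
           - prob(A, N, p) * prob(C, N, p)) > 0.05
```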
Exercise 1.9 (•). Given a fair coin and assuming N is odd, consider the following two
events A, B ⊂ ΩN :

A = “the number of heads is greater than the number of tails”,

B = “The first toss is a head”.


Use your intuition to guess whether the two events are independent; then verify your answer
numerically (e.g., using Mathematica).

1.3 Filtered probability spaces


Consider again the N -coin tosses probability space. Let AH be the event that the first toss is
a head and AT the event that it is a tail. Clearly AT = AcH and the σ-algebra F1 generated
by the partition {AH , AT } is F1 = {AH , AT , Ω, ∅}. Now let AHH be the event that the first 2

tosses are heads, and similarly define AHT , AT H , AT T . These four events form a partition of
ΩN and they generate a σ-algebra F2 as indicated in Exercise 1.4. Clearly, F1 ⊂ F2 . Going
on with three tosses, four tosses, and so on, until we complete the N -toss, we construct a
sequence
F1 ⊂ F2 ⊂ · · · ⊂ FN = 2^ΩN
of σ-algebras. The σ-algebra Fk contains all the events of the experiment that depend on
(i.e., which are resolved by) the first k tosses. The family {Fk }k=1,...,N of σ-algebras is an
example of filtration.

Definition 1.8. A filtration is a one parameter family {F(t)}t≥0 of σ-algebras such that
F(t) ⊆ F for all t ≥ 0 and F(s) ⊆ F(t) for all s ≤ t. A quadruple (Ω, F, {F(t)}t≥0 , P) is
called a filtered probability space.

In our applications t stands for the time variable and filtrations are associated to exper-
iments in which “information accumulates with time”. For instance, in the example given
above, the more times we toss the coin, the larger the number of events which are resolved
by the experiment, i.e., the more information becomes accessible.
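The nesting F1 ⊂ F2 ⊂ · · · can be seen concretely by grouping the N-tosses according to their first k entries: each grouping is a partition of ΩN, and the partition for k + 1 refines the one for k. A small Python sketch for N = 3:

```python
from itertools import product

N = 3
omega = list(product("HT", repeat=N))   # Omega_3, with 2**3 = 8 elements

def partition_by_first(k):
    """Atoms 'the first k tosses equal a given k-toss': a partition of Omega_N."""
    atoms = {}
    for w in omega:
        atoms.setdefault(w[:k], set()).add(w)
    return list(atoms.values())

# Each atom after k + 1 tosses sits inside an atom after k tosses (refinement),
# which is why the generated sigma-algebras are nested: F_k is contained in F_{k+1}.
for k in range(N):
    coarse = partition_by_first(k)
    fine = partition_by_first(k + 1)
    assert all(any(a <= b for b in coarse) for a in fine)

# The number of atoms doubles with each toss: 1, 2, 4, 8.
assert [len(partition_by_first(k)) for k in range(N + 1)] == [1, 2, 4, 8]
```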

1.A Appendix: The ∞-coin tosses probability space


In this appendix we outline the construction of the probability space for the ∞-coin tosses
experiment. The sample space is

Ω∞ = {ω = (γn )n∈N , γn ∈ {H, T }}.

Let us show first that Ω∞ is uncountable. We use the well-known Cantor diagonal argu-
ment. Suppose that Ω∞ is countable and write

Ω∞ = {ωk }k∈N . (1.5)


Each ωk ∈ Ω∞ is a sequence of infinite tosses, which we write as ωk = (γ_j^{(k)})_{j∈N} , where γ_j^{(k)}
is either H or T , for all j ∈ N and for each fixed k ∈ N. Note that (γ_j^{(k)})_{j,k∈N} is an “∞ × ∞”
matrix. Now consider the ∞-toss corresponding to the diagonal of this matrix, that is

ω̄ = (γ̄m )m∈N , γ̄m = γ_m^{(m)} , for all m ∈ N.

Finally consider the ∞-toss ω which is obtained by changing each single toss of ω̄, that is to
say

ω = (γm )m∈N , where γm = H if γ̄m = T , and γm = T if γ̄m = H, for all m ∈ N.

It is clear that the ∞-toss ω does not belong to the set (1.5). In fact, by construction, the
first toss of ω is different from the first toss of ω1 , the second toss of ω is different from the
second toss of ω2 , . . . , the nth toss of ω is different from the nth toss of ωn , and so on, so

that each ∞-toss in (1.5) is different from ω. We conclude that the elements of Ω∞ cannot
be listed as if they comprised a countable set.
Now, let N ∈ N and recall that the sample space ΩN for the N -tosses experiment is given
by (1.1). For each ω̄ = (γ̄1 , . . . , γ̄N ) ∈ ΩN we define the event Aω̄ ⊂ Ω∞ by
Aω̄ = {ω = (γn )n∈N : γj = γ̄j , j = 1, . . . , N },
i.e., the event that the first N tosses in an ∞-toss equal (γ̄1 , . . . , γ̄N ). Define the
probability of this event as the probability of the N -toss ω̄, that is

P0 (Aω̄ ) = p^{NH(ω̄)} (1 − p)^{NT(ω̄)} ,
where 0 < p < 1, NH (ω̄) is the number of heads in the N -toss ω̄ and NT (ω̄) = N − NH (ω̄)
is the number of tails in ω̄, see (1.2). Next consider the family of events
UN = {Aω̄ }ω̄∈ΩN ⊂ 2^Ω∞ .
It is clear that UN is, for each fixed N ∈ N, a partition of Ω∞ . Hence the σ-algebra
FN = FUN is generated according to Exercise 1.4. Note that FN contains all events of Ω∞
that are resolved by the first N tosses. Moreover FN ⊂ FN +1 , that is to say, {FN }N ∈N is
a filtration. Since P0 is defined for all Aω̄ ∈ UN , then it can be extended uniquely to the
entire FN , because each element A ∈ FN is the disjoint union of events of UN (see again
Exercise 1.4) and therefore the probability of A can be inferred by the property (ii) in the
definition of probability measure, see Definition 1.4. But then P0 extends uniquely to
F∞ = ⋃_{N∈N} FN .

Hence we have constructed a triple (Ω∞ , F∞ , P0 ). Is this triple a probability space? The
answer is no, because F∞ is not a σ-algebra. To see this, let Ak be the event that the
k-th toss in an infinite sequence of tosses is a head. Clearly Ak ∈ Fk for all k and therefore
{Ak }k∈N ⊂ F∞ . Now assume that F∞ is a σ-algebra. Then the event A = ∪k Ak would
belong to F∞ and therefore also Ac ∈ F∞ . The latter holds if and only if there exists N ∈ N
such that Ac ∈ FN . But Ac is the event that all tosses are tails, which of course cannot be
resolved by the information FN accumulated after just N tosses. We conclude that F∞ is
not a σ-algebra. In particular, we have shown that F∞ is not in general closed with respect
to the countable union of its elements. However it is easy to show that F∞ is closed with
respect to the finite union of its elements, and in addition satisfies the properties (i), (ii) in
Definition 1.1. This set of properties makes F∞ an algebra. To complete the construction
of the probability space for the ∞-coin tosses experiment, we need the following deep result.
Theorem 1.1 (Carathéodory’s theorem). Let U be an algebra of subsets of Ω and P0 :
U → [0, 1] a map satisfying P0 (Ω) = 1 and P0 (⋃_{i=1}^N Ai ) = ∑_{i=1}^N P0 (Ai ), for every finite
collection {A1 , . . . , AN } ⊂ U of disjoint sets² . Then there exists a unique probability measure
P on FU such that P(A) = P0 (A), for all A ∈ U.

²P0 is called a pre-measure.

Hence the map P0 : F∞ → [0, 1] defined above extends uniquely to a probability measure
P defined on F = FF∞ . The resulting triple (Ω∞ , F, P) defines the probability space for the
∞-tosses experiment.

1.B Appendix: Solutions to selected problems


Exercise 1.3. Since an event belongs to the intersection of σ-algebras if and only if it
belongs to each single σ-algebra, the proof of the first statement is trivial. As an example
of two σ-algebras whose union is not a σ-algebra, take 1 and 3 of Exercise 1.2.

Exercise 1.6. Since A and Ac are disjoint, we have


1 = P(Ω) = P(A ∪ Ac ) = P(A) + P(Ac ) ⇒ P(Ac ) = 1 − P(A).
To prove 2 we notice that A ∪ B is the disjoint union of the sets A \ B, B \ A and A ∩ B. It
follows that
P(A ∪ B) = P(A \ B) + P(B \ A) + P(A ∩ B).
Since A is the disjoint union of A ∩ B and A \ B, we also have
P(A) = P(A ∩ B) + P(A \ B)
and similarly
P(B) = P(B ∩ A) + P(B \ A). (1.6)
Combining the three identities above yields the result. Moreover, from (1.6) and assuming
A ⊂ B, we obtain P(B) = P(A) + P(B \ A) ≥ P(A), which is claim 3.

Exercise 1.8. Since for all k = 0, . . . , N the number of N -tosses ω ∈ ΩN having NH (ω) = k
is given by the binomial coefficient

C(N, k) = N ! / (k!(N − k)!),

then

∑_{ω∈ΩN} P({ω}) = ∑_{ω∈ΩN} p^{NH(ω)} (1 − p)^{NT(ω)} = (1 − p)^N ∑_{ω∈ΩN} ( p/(1 − p) )^{NH(ω)}
= (1 − p)^N ∑_{k=0}^N C(N, k) ( p/(1 − p) )^k .

By the binomial theorem, (1 + a)^N = ∑_{k=0}^N C(N, k) a^k , hence

P(Ω) = ∑_{ω∈ΩN} P({ω}) = (1 − p)^N ( 1 + p/(1 − p) )^N = 1.

[Figure 1.2: A numerical solution of Exercise 1.9 for a generic odd natural number N : a plot of K(N ) against N (N up to 100), decreasing toward 1/2.]

Exercise 1.9. We expect that P(A|B) > P(A), that is to say, the first toss being a head
increases the probability that the number of heads in the complete N -toss will be larger than
the number of tails. To verify this, we first observe that P(A) = 1/2, since N is odd and
thus there will be either more heads or more tails in any N -toss. Moreover, P(A|B) = P(C),
where C ⊂ ΩN−1 is the event that the number of heads in an (N − 1)-toss is larger than or equal
to the number of tails. Letting k be the number of heads, P(C) is the probability that
k ∈ {(N − 1)/2, . . . , N − 1}. Since there are C(N − 1, k) possible (N − 1)-tosses with k heads,
then

P(C) = ∑_{k=(N−1)/2}^{N−1} C(N − 1, k) (1/2)^k (1/2)^{N−1−k} = (1/2)^{N−1} ∑_{k=(N−1)/2}^{N−1} C(N − 1, k).

Thus proving the statement for a generic odd N is equivalent to proving the inequality

K(N ) = (1/2)^{N−1} ∑_{k=(N−1)/2}^{N−1} C(N − 1, k) > 1/2.

A “numerical proof” of this inequality is provided in Figure 1.2. Note that the function
K(N ) is decreasing and converges to 1/2 as N → ∞.
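The quantity K(N ) is straightforward to evaluate exactly with integer binomial coefficients; the following Python sketch reproduces the behavior shown in Figure 1.2 (K(N ) > 1/2 and decreasing in N ):

```python
from math import comb

def K(N):
    """K(N) = 2**-(N-1) * sum of C(N-1, k) over k = (N-1)/2, ..., N-1 (N odd)."""
    assert N % 2 == 1
    return sum(comb(N - 1, k) for k in range((N - 1) // 2, N)) / 2 ** (N - 1)

vals = [K(N) for N in range(3, 102, 2)]             # odd N up to 101
assert all(v > 0.5 for v in vals)                   # the claimed inequality
assert all(a > b for a, b in zip(vals, vals[1:]))   # K(N) is decreasing
assert abs(K(3) - 0.75) < 1e-12                     # e.g. K(3) = 3/4
```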

Chapter 2

Random variables and stochastic processes

Throughout this chapter we assume that (Ω, F, {F(t)}t≥0 , P) is a given filtered probability
space.

2.1 Random variables


In many applications of probability theory, and in financial mathematics in particular, one
is more interested in knowing the value attained by quantities that depend on the outcome
of the experiment, rather than knowing which specific events have occurred. Such quantities
are called random variables.

Definition 2.1. A map X : Ω → R is called a (real-valued) random variable if {X ∈


U } ∈ F, for all U ∈ B(R), where

{X ∈ U } = {ω ∈ Ω : X(ω) ∈ U }

is the pre-image of the Borel set U . If there exists c ∈ R such that X(ω) = c almost surely,
we say that X is a deterministic constant.

Occasionally we shall also need to consider complex-valued random variables. These are
defined as the maps Z : Ω → C of the form Z = X + iY , where X, Y are real-valued random
variables and i is the imaginary unit (i^2 = −1). Similarly a vector-valued random variable
X = (X1 , . . . , XN ) : Ω → RN can be defined by simply requiring that each component
Xj : Ω → R is a random variable in the sense of Definition 2.1.

Remark 2.1 (Notation). A generic real-valued random variable will be denoted by X. If we


need to consider two such random variables we will denote them by X, Y , while N real-valued
random variables will be denoted by X1 , . . . , XN . Note that (X1 , . . . , XN ) : Ω → RN is a
vector-valued random variable. The letter Z is used for complex-valued random variables.

Remark 2.2. Equality among random variables is always understood to hold up to a null
set. That is to say, X = Y always means X = Y a.s., for all random variables X, Y : Ω → R.

Random variables are also called measurable functions, but we prefer to use this
terminology only when Ω = R and F = B(R). Measurable functions will be denoted by small
Latin letters (e.g., f, g, . . . ). If X is a random variable and Y = f (X) for some measurable
function f , then Y is also a random variable. We denote by P(X ∈ U ) = P({X ∈ U })
the probability that X takes value in U ∈ B(R). Moreover, given two random variables
X, Y : Ω → R and the Borel sets U, V , we denote

P(X ∈ U, Y ∈ V ) = P({X ∈ U } ∩ {Y ∈ V }),

which is the probability that the random variable X takes value in U and Y takes value in
V . The generalization to an arbitrary number of random variables is straightforward.
As the value attained by X depends on the result of the experiment, random variables
carry information, i.e., upon knowing the value attained by X we know something about the
outcome ω of the experiment. For instance, if X(ω) = (−1)^ω , where ω is the result of rolling
a die, and if we are told that X takes value 1, then we infer immediately that the roll
is even. The information carried by a random variable X forms the σ-algebra generated by
X, whose precise definition is the following.

Definition 2.2. Let X : Ω → R be a random variable. The σ-algebra generated by X is


the collection σ(X) ⊆ F of events given by

σ(X) = {A ∈ F : A = {X ∈ U }, for some U ∈ B(R)}.

If G ⊆ F is another σ-algebra of subsets of Ω and σ(X) ⊆ G, we say that X is G-


measurable. If Y : Ω → R is another random variable and σ(Y ) ⊆ σ(X), we say that
Y is X-measurable.

Exercise 2.1 (•). Prove that σ(X) is a σ-algebra.

Thus σ(X) contains all the events that are resolved by knowing the value of X. The
interpretation of X being G-measurable is that the information contained in G suffices to
determine the value taken by X in the experiment. Note that the σ-algebra generated by a
deterministic constant consists of trivial events only.

Definition 2.3. The σ-algebra σ(X, Y ) generated by two random variables X, Y : Ω → R


is the smallest σ-algebra containing σ(X) ∪ σ(Y ), that is to say1 σ(X, Y ) = FO , where
O = σ(X) ∪ σ(Y ), and similarly for any number of random variables.

If Y is X-measurable then σ(X, Y ) = σ(X), i.e., the random variable Y does not add
any new information to the one already contained in X. Clearly, if Y = f (X) for some
measurable function f , then Y is X-measurable. It can be shown that the opposite is also
1
See Definition 1.2.

true: if σ(Y ) ⊆ σ(X), then there exists a measurable function f such that Y = f (X) (see
Prop. 3 in [18]). The other extreme is when X and Y carry distinct information, i.e., when
σ(X) ∩ σ(Y ) consists of trivial events only. This occurs in particular when the two random
variables are independent.
Definition 2.4. Let X : Ω → R be a random variable and G ⊂ F be a sub-σ-algebra. We
say that X is independent of G if σ(X) and G are independent in the sense of Definition 1.7.
Two random variables X, Y : Ω → R are said to be independent random variables
if the σ-algebras σ(X) and σ(Y ) are independent. More generally, the random variables
X1 , . . . , XN are independent if σ(X1 ), . . . , σ(XN ) are independent σ-algebras.
In the intermediate case, i.e., when Y is neither X-measurable nor independent of X, it
is expected that the knowledge on the value attained by X helps to derive information on
the values attainable by Y . We shall study this case in the next chapter.
Exercise 2.2 (•). Show that when X, Y are independent random variables, then σ(X)∩σ(Y )
consists of trivial events only. Show that two deterministic constants are always indepen-
dent. Finally assume Y = g(X) and show that in this case the two random variables are
independent if and only if Y is a deterministic constant.
Exercise 2.3. Which of the following pairs of random variables X, Y : ΩN → R are indepen-
dent? (Use only the intuitive interpretation of independence and not the formal definition.)
1. X(ω) = NT (ω); Y (ω) = 1 if the first toss is head, Y (ω) = 0 otherwise.
2. X(ω) = 1 if there exists at least one head in ω, X(ω) = 0 otherwise; Y (ω) = 1 if there
exists exactly one head in ω, Y (ω) = 0 otherwise.
3. X(ω) = number of times that a head is followed by a tail; Y (ω) = 1 if there exist two
consecutive tails in ω, Y (ω) = 0 otherwise.
Theorem 2.1. Let X1 , . . . , XN be independent random variables. Let us divide the set
{X1 , . . . , XN } into m separate groups of random variables, namely, let
{X1 , . . . , XN } = {Xk1 }k1 ∈I1 ∪ {Xk2 }k2 ∈I2 ∪ · · · ∪ {Xkm }km ∈Im ,
where {I1 , I2 , . . . Im } is a partition of {1, . . . , N }. Let ni be the number of elements in the
set Ii , so that n1 + n2 + · · · + nm = N . Let g1 , . . . , gm be measurable functions such that
gi : Rni → R. Then the random variables
Y1 = g1 ((Xk1 )k1 ∈I1 ), Y2 = g2 ((Xk2 )k2 ∈I2 ), . . . , Ym = gm ((Xkm )km ∈Im )
are independent.
For instance, in the case of N = 2 independent random variables X1 , X2 , Theorem 2.1
asserts that Y1 = g(X1 ) and Y2 = f (X2 ) are independent random variables, for all measurable
functions f, g : R → R.
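For illustration, this conclusion can be checked numerically. The Python sketch below is ours, not from the text: the particular choices of X1, X2 (independent uniforms), of the functions g, f and of the events are arbitrary; it estimates P(Y1 ∈ U, Y2 ∈ V) and compares it with P(Y1 ∈ U)P(Y2 ∈ V).

```python
import math
import random

random.seed(0)
N = 200_000

# Illustrative choices (not from the notes): X1, X2 independent uniforms on (0,1),
# Y1 = g(X1) = X1^2, Y2 = f(X2) = cos(X2). Theorem 2.1 predicts Y1, Y2 independent.
n1 = n2 = n12 = 0
for _ in range(N):
    y1 = random.random() ** 2
    y2 = math.cos(random.random())
    a = y1 < 0.25        # the event {Y1 in U}, U = (-oo, 0.25)
    b = y2 > 0.8         # the event {Y2 in V}, V = (0.8, +oo)
    n1 += a
    n2 += b
    n12 += a and b

p1, p2, p12 = n1 / N, n2 / N, n12 / N
print(p12, p1 * p2)   # nearly equal, as independence requires
```

Up to Monte Carlo error, the joint frequency factorizes into the product of the marginal frequencies.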
Exercise 2.4 (•). Prove Theorem 2.1 for the case N = 2.

Simple and discrete random variables
A special role is played by simple random variables. The simplest possible one is the indica-
tor function of an event: Given A ∈ F, the indicator function of A is the random variable
that takes value 1 if ω ∈ A and 0 otherwise, i.e.,

IA (ω) = 1, if ω ∈ A;    IA (ω) = 0, if ω ∈ A^c .

Obviously, σ(IA ) = {A, Ac , ∅, Ω}.


Definition 2.5. Let {Ak }k=1,...,N ⊂ F be a family of disjoint events and a1 , . . . , aN be distinct
real numbers. The random variable

X = Σ_{k=1}^{N} ak IAk

is called a simple random variable. If N = ∞ in this definition, we call X a discrete


random variable.
Thus a simple random variable X can attain only a finite number of values, while a
discrete random variable X attains countably infinitely many values2 . In both cases we have

P(X = x) = 0, if x ∉ Im(X);    P(X = x) = P(Ak ), if x = ak ,

where Im(X) = {x ∈ R : X(ω) = x, for some ω ∈ Ω} is the image of X. We remark that


most references do not assume, in the definition of simple random variable, that the sets
A1 , . . . , AN should be disjoint. We do so, however, because all simple random variables
considered in these notes satisfy this property and because the sets A1 , . . . , AN can always
be re-defined in such a way that they are disjoint, without modifying the image of the simple
random variable, see Exercise 2.5. Similarly the condition that a1 , . . . , aN should be distinct
can be removed from the definition of simple random variable.
Exercise 2.5 (•). Let a random variable X have the form

X = Σ_{k=1}^{M} bk IBk ,

for some non-zero b1 , . . . , bM ∈ R and B1 , . . . , BM ∈ F. Show that there exist a1 , . . . , aN ∈ R
distinct and disjoint sets A1 , . . . , AN ∈ F such that

X = Σ_{k=1}^{N} ak IAk .

2
Not all authors distinguish between simple and discrete random variables.

Let us see two examples of simple/discrete random variables that appear in financial
mathematics (and in many other applications). A simple random variable X is called a
binomial random variable if
• Range(X) = {0, 1, . . . , N };

• There exists p ∈ (0, 1) such that P(X = k) = (N choose k) p^k (1 − p)^{N−k} , k = 0, 1, . . . , N .

For instance, if we let X be the number of heads in an N -toss, then X is binomial. A


widely used model for the evolution of stock prices in financial mathematics assumes that
the price of the stock at any time is a binomial random variable (binomial asset pricing
model). A discrete random variable X is called a Poisson variable if
• Range(X) = N ∪ {0};
• There exists µ > 0 such that P(X = k) = µ^k e^{−µ} / k! , k = 0, 1, 2, . . .
We denote by P(µ) the set of all Poisson random variables with parameter µ > 0.
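Both probability mass functions are immediate to evaluate in code directly from the formulas above; the following Python sketch (standard library only) is offered as a numerical companion.

```python
from math import comb, exp, factorial

def binom_pmf(k, N, p):
    # P(X = k) for a binomial random variable with parameters N, p
    return comb(N, k) * p**k * (1 - p) ** (N - k)

def poisson_pmf(k, mu):
    # P(X = k) for a Poisson random variable with parameter mu
    return mu**k * exp(-mu) / factorial(k)

# probability of exactly 5 heads in 10 tosses of a fair coin:
print(binom_pmf(5, 10, 0.5))                          # 252/1024 ≈ 0.2461
# the binomial pmf sums to 1 over k = 0, ..., N:
print(sum(binom_pmf(k, 10, 0.5) for k in range(11)))  # 1.0
```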
The following important theorem shows that all non-negative random variables can be
approximated by a sequence of simple random variables.
Theorem 2.2. Let X : Ω → [0, ∞) be a random variable and let n ∈ N be given. For
k = 0, 1, . . . , n2^n − 1, consider the sets

Ak,n := {X ∈ [k/2^n , (k + 1)/2^n )}

and for k = n2^n let

An2^n,n = {X ≥ n}.

Note that {Ak,n }k=0,...,n2^n is a partition of Ω, for all fixed n ∈ N. Define the simple random
variables

sn^X (ω) = Σ_{k=0}^{n2^n} (k/2^n ) IAk,n (ω).

Then 0 ≤ s1^X (ω) ≤ s2^X (ω) ≤ · · · ≤ sn^X (ω) ≤ s(n+1)^X (ω) ≤ · · · ≤ X(ω), for all ω ∈ Ω (i.e., the
sequence {sn^X }n∈N is non-decreasing) and

lim_{n→∞} sn^X (ω) = X(ω), for all ω ∈ Ω.

Exercise 2.6. Prove Theorem 2.2.
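The dyadic construction of Theorem 2.2 is easily made concrete. The Python sketch below evaluates sn^X at a sample point where X takes the value x, and displays the monotone convergence asserted in the theorem.

```python
import math

def s_n(x, n):
    """Value of s_n^X on a sample point where X = x >= 0, per Theorem 2.2."""
    if x >= n:
        return n                   # the set A_{n2^n, n} = {X >= n}
    k = math.floor(x * 2**n)       # x lies in [k/2^n, (k+1)/2^n)
    return k / 2**n

x = math.pi                        # some nonnegative value X(omega)
print([s_n(x, n) for n in range(1, 8)])
# → [1, 2, 3, 3.125, 3.125, 3.140625, 3.140625], non-decreasing and → π
```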

2.2 Distribution and probability density functions


Definition 2.6. The (cumulative) distribution function of the random variable X :
Ω → R is the non-negative function FX : R → [0, 1] given by FX (x) = P(X ≤ x). Two
random variables X, Y are said to be identically distributed if FX = FY .

Exercise 2.7 (•). Show that

P(a < X ≤ b) = FX (b) − FX (a).

Show also that FX is (1) right-continuous, (2) increasing and (3) limx→+∞ FX (x) = 1.

Exercise 2.8. Let F : R → [0, 1] be a measurable function satisfying the properties (1)–(3)
in Exercise 2.7. Show that there exists a probability space and a random variable X such
that F = FX .

Definition 2.7. A random variable X : Ω → R is said to admit the probability density


function (pdf ) fX : R → [0, ∞) if fX is integrable on R and
FX (x) = ∫_{−∞}^{x} fX (y) dy.    (2.1)

Note that if fX is the pdf of a random variable, then necessarily


∫_R fX (x) dx = lim_{x→∞} FX (x) = 1.

All probability density functions considered in these notes are continuous, and therefore
the integral in (2.1) can be understood in the Riemann sense. Moreover in this case FX is
differentiable and we have
fX = dFX /dx .
If the integral in (2.1) is understood in the Lebesgue sense, then the density fX can be a
quite irregular function. In this case, the fundamental theorem of calculus for the Lebesgue
integral entails that the distribution FX (x) satisfying (2.1) is absolutely continuous, and so
in particular it is continuous. Conversely, if FX is absolutely continuous, then X admits a
density function. We remark that, regardless of the notion of integral being used, a simple
(or discrete) random variable X cannot admit a density in the sense of Definition 2.7, unless
it is a deterministic constant. Suppose in fact that X = Σ_{k=1}^{N} ak IAk is not a deterministic
constant. Assume that a1 = max(a1 , . . . , aN ). Then

lim_{x→a1−} FX (x) = P(A2 ) + · · · + P(AN ) < 1,

while

lim_{x→a1+} FX (x) = 1 = FX (a1 ).

It follows that FX (x) is not continuous, and so in particular it cannot be written in the
form (2.1). To define the pdf of simple random variables, let
X = Σ_{k=1}^{N} ak IAk ,

where without loss of generality we assume that the real numbers a1 , . . . , aN are distinct and
the sets A1 , . . . , AN are disjoint (see Exercise 2.5). The distribution function of X is
FX (x) = P(X ≤ x) = Σ_{ak ≤ x} P(X = ak ).    (2.2)

In this case the probability density function fX (x) is defined as



fX (x) = P(X = x), if x = ak for some k;    fX (x) = 0 otherwise    (2.3)

and thus, with a slight abuse of notation, we can rewrite (2.2) as


FX (x) = Σ_{y ≤ x} fX (y),    (2.4)

which extends (2.1) to simple random variables. We remark that it is possible to unify the
definition of pdf for continuous and discrete random variables by writing the sum (2.4) as
an integral with respect to the Dirac measure, but we shall not do so.
We shall see that when a random variable X admits a density fX , all the relevant sta-
tistical information on X can be deduced by fX . We also remark that often one can prove
the existence of the pdf fX without however being able to derive an explicit formula for it.
For instance, fX is often given as the solution of a partial differential equation, or through
its (inverse) Fourier transform, which is called the characteristic function of X, see Sec-
tion 3.3. Some examples of density functions, which have important applications in financial
mathematics, are the following.

Examples of probability density functions


• A random variable X : Ω → R is said to be a normal (or normally distributed)
random variable if it admits the density
fX (x) = (1/√(2πσ^2)) e^{−(x−m)^2/(2σ^2)} ,
for some m ∈ R and σ > 0, which are called respectively the expectation (or mean)
and the standard deviation of the normal random variable X, while σ^2 is called the variance
of X. A typical profile of a normal density function is shown in Figure 2.1(a). We
denote by N (m, σ 2 ) the set of all normal random variables with expectation m and
variance σ 2 . If m = 0 and σ 2 = 1, X ∈ N (0, 1) is said to be a standard normal
variable. The density function of standard normal random variables is denoted by φ,
while their distribution is denoted by Φ, i.e.,
φ(x) = (1/√(2π)) e^{−x^2/2} ,    Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−y^2/2} dy.

• A random variable X : Ω → R is said to be an exponential (or exponentially
distributed) random variable if it admits the density
fX (x) = λ e^{−λx} I{x≥0} ,
for some λ > 0, which is called the intensity of the exponential random variable X. A
typical profile is shown in Figure 2.1(b) . We denote by E(λ) the set of all exponential
random variables with intensity λ > 0. The distribution function of an exponential
random variable X with intensity λ is given by
FX (x) = ∫_{−∞}^{x} fX (y) dy = λ ∫_{0}^{x} e^{−λy} dy = 1 − e^{−λx} ,  for x ≥ 0.

• A random variable X : Ω → R is said to be chi-squared distributed if it admits the


density

fX (x) = x^{δ/2−1} e^{−x/2} / (2^{δ/2} Γ(δ/2)) I{x>0} ,

for some δ > 0, which is called the degree of the chi-squared distributed random
variable. Here Γ(t) = ∫_{0}^{∞} z^{t−1} e^{−z} dz, t > 0, is the Gamma-function. Recall the relation

Γ(n) = (n − 1)!
for n ∈ N. We denote by χ2 (δ) the set of all chi-squared distributed random variables
with degree δ. Three typical profiles of this density are shown in Figure 2.2(a).
• A random variable X : Ω → R is said to be non-central chi-squared distributed
with degree δ > 0 and non-centrality parameter β > 0 if it admits the density
fX (x) = (1/2) e^{−(x+β)/2} (x/β)^{δ/4 − 1/2} I_{δ/2−1}(√(βx)) I{x>0} ,    (2.5)
where Iν (y) denotes the modified Bessel function of the first kind. We denote by χ2 (δ, β)
the random variables with density (2.5). It can be shown that χ2 (δ, 0) = χ2 (δ). Three
typical profiles of the density (2.5) are shown in Figure 2.2(b).
• A random variable X : Ω → R is said to be Cauchy distributed if it admits the
density
fX (x) = γ / (π((x − x0 )^2 + γ^2 ))
for x0 ∈ R and γ > 0 , called the location and the scale of the Cauchy pdf.
• A random variable X : Ω → R is said to be Lévy distributed if it admits the density
fX (x) = √(c/(2π)) e^{−c/(2(x−x0 ))} / (x − x0 )^{3/2} I{x>x0} ,
for x0 ∈ R and c > 0, called the location and the scale of the Lévy pdf.
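When the distribution function is explicit, as for the exponential case above, sampling is straightforward by the inverse transform method (a standard technique, not discussed in the notes): solving FX (x) = u for a uniform sample u gives an E(λ) sample. A Python sketch:

```python
import math
import random

random.seed(1)
lam = 2.0   # the intensity λ

# Inverse transform: if U is uniform on (0, 1), then X = -log(1 - U)/λ
# satisfies P(X <= x) = P(U <= 1 - exp(-λx)) = 1 - exp(-λx), i.e. X ∈ E(λ).
samples = [-math.log(1.0 - random.random()) / lam for _ in range(100_000)]

x = 0.5
empirical = sum(s <= x for s in samples) / len(samples)
print(empirical, 1 - math.exp(-lam * x))   # both ≈ 0.63
```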

Figure 2.1: Densities of a normal random variable X ∈ N (1, 2) (a) and of an exponential
random variable Y ∈ E(2) (b).

Figure 2.2: Densities of (non-central) chi-squared random variables with different degree:
X ∈ χ2 (δ) for δ = 1/2, 1, 4 (a) and X ∈ χ2 (δ, 2) for δ = 1, 2, 5 (b).

If a random variable X admits a density fX , then for all (possibly unbounded) intervals
I ⊂ R the result of Exercise 2.7 entails
P(X ∈ I) = ∫_I fX (y) dy.    (2.6)

It can be shown that (2.6) extends to


P(g(X) ∈ I) = ∫_{x : g(x) ∈ I} fX (x) dx,    (2.7)

for all measurable functions g : R → R. For example, if X ∈ N (0, 1),


P(X^2 ≤ 1) = P(−1 ≤ X ≤ 1) = ∫_{−1}^{1} φ(x) dx ≈ 0.683,

which means that a standard normal random variable has about 68.3 % chances to take
value in the interval [−1, 1].
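In practice such probabilities are computed through the error function erf rather than through tables, via the identity Φ(x) = (1 + erf(x/√2))/2. A minimal Python sketch:

```python
import math

def Phi(x):
    # Standard normal distribution function, via the identity
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# P(-1 <= X <= 1) for X in N(0, 1):
print(Phi(1.0) - Phi(-1.0))   # ≈ 0.6827
```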
Exercise 2.9 (•). Let X ∈ N (0, 1) and Y = X 2 . Show that Y ∈ χ2 (1).
Exercise 2.10. Let X ∈ N (0, 1). Show that the random variable W defined by

W = 1/X^2 if X ≠ 0,    W = 0 otherwise
is Lévy distributed.
Exercise 2.11. Let X ∈ N (m, σ 2 ) and Y = X 2 . Show that

fY (x) = (cosh(m√x/σ^2 )/√(2πxσ^2 )) exp( −(x + m^2 )/(2σ^2 ) ) I{x>0} .

2.2.1 Random variables with boundary values


Random variables in mathematical finance do not always admit a density in the classical
sense described above (or in any other sense), and the purpose of this section is to present
an example when one has to consider a generalized notion of density function. Suppose
that X takes value on the semi-open interval [0, ∞). Then clearly FX (x) = 0 for x < 0,
FX (0) = P(X = 0), while for x > 0 we can write

FX (x) = P(X ≤ x) = P(0 ≤ X ≤ x) = P(X = 0) + P(0 < X ≤ x).

Now assume that FX is differentiable on the open set x ∈ (0, ∞). Then there exists a
function fX+ (x), x > 0, such that FX (x) − FX (0) = ∫_{0}^{x} fX+ (t) dt. Hence, for all x ∈ R we find

FX (x) = p0 H(x) + ∫_{−∞}^{x} fX+ (t) I{t>0} dt,

where p0 = P(X = 0) and H(x) is the Heaviside function, i.e., H(x) = 1 if x ≥ 0, H(x) = 0
if x < 0. By introducing the delta-distribution through the formal identity

H′(x) = δ(x)    (2.8)

then we obtain, again formally, the following expression for the density function
fX (x) = dFX (x)/dx = p0 δ(x) + fX+ (x).    (2.9)
The formal identities (2.8)-(2.9) become rigorous mathematical expressions when they are
understood in the sense of distributions. We shall refer to the term p0 δ(x) as the discrete
part of the density. The function fX+ is also called the defective density of the random
variable X. Note that

∫_{0}^{∞} fX+ (x) dx = 1 − p0 .

The defective density is the actual pdf of X if and only if p0 = 0.
The typical example of financial random variable whose pdf may have a discrete part
is the stock price S(t) at time t. For simple models (such as the geometric Brownian
motion (2.14) defined in Section 2.4 below), the stock price is strictly positive a.s. at all
finite times and the density has no discrete part. However for more sophisticated models the
stock price can reach zero with positive probability at any finite time and so the pdf of the
stock price admits a discrete part P(S(t) = 0)δ(x). Hence these models take into account
the risk of default of the stock. We shall see an example in Section 6.5.

2.2.2 Joint distribution


If two random variables X, Y : Ω → R are given, how can we verify whether or not they
are independent? This problem has a simple solution when X, Y admit a joint distribution
density.
Definition 2.8. The joint (cumulative) distribution FX,Y : R2 → [0, 1] of two random
variables X, Y : Ω → R is defined as

FX,Y (x, y) = P(X ≤ x, Y ≤ y).

The random variables X, Y are said to admit the joint (probability) density function
fX,Y : R2 → [0, ∞) if fX,Y is integrable in R2 and
FX,Y (x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y (η, ξ) dξ dη.    (2.10)

Note the formal identities


fX,Y = ∂^2 FX,Y /(∂x ∂y) ,    ∫_{R^2} fX,Y (x, y) dx dy = 1.

Moreover, if two random variables X, Y admit a joint density fX,Y , then each of them admits
a density (called marginal density in this context) which is given by
fX (x) = ∫_R fX,Y (x, y) dy,    fY (y) = ∫_R fX,Y (x, y) dx.

To see this we write


P(X ≤ x) = P(X ≤ x, Y ∈ R) = ∫_{−∞}^{x} ( ∫_R fX,Y (η, ξ) dξ ) dη = ∫_{−∞}^{x} fX (η) dη

and similarly for the random variable Y . If W = g(X, Y ), for some measurable function g,
and I ⊂ R is an interval, the analogue of (2.7) in 2 dimensions holds, namely:
P(g(X, Y ) ∈ I) = ∫_{(x,y) : g(x,y) ∈ I} fX,Y (x, y) dx dy.

As an example of joint pdf, let m = (m1 , m2 ) ∈ R2 and C = (Cij )i,j=1,2 be a 2×2 positive
definite, symmetric matrix. Two random variables X, Y : Ω → R are said to be jointly
normally distributed with mean m and covariance matrix C if they admit the joint
density

fX,Y (x, y) = (1/√((2π)^2 det C)) exp( −(1/2) (z − m) · C^{−1} · (z − m)^T ) ,    (2.11)
where z = (x, y), “ · ” denotes the row by column product, C −1 is the inverse matrix of C
and v T is the transpose of the vector v.

Exercise 2.12 (•). Show that two random variables X, Y are jointly normally distributed if
and only if
fX,Y (x, y) = (1/(2πσ1 σ2 √(1 − ρ^2 ))) ×
    × exp( −(1/(2(1 − ρ^2 ))) [ (x − m1 )^2/σ1^2 − 2ρ(x − m1 )(y − m2 )/(σ1 σ2 ) + (y − m2 )^2/σ2^2 ] ) ,    (2.12)

where

σ1^2 = C11 ,    σ2^2 = C22 ,    ρ = C12 /(σ1 σ2 ).
Exercise 2.13 (?). Let X, Y ∈ N (0, 1) be independent and jointly normally distributed.
Show that the random variable Z defined by

Z = Y /X if X ≠ 0,    Z = 0 otherwise

is Cauchy distributed.

In the next theorem we establish a simple condition for the independence of two random
variables which admit a joint density3 .

Theorem 2.3. The following holds.

(i) If two random variables X, Y admit the densities fX , fY and are independent, then
they admit the joint density

fX,Y (x, y) = fX (x)fY (y).

(ii) If two random variables X, Y admit a joint density fX,Y of the form

fX,Y (x, y) = u(x)v(y),


3
A similar result holds in terms of the joint cumulative distribution, irrespective of whether a joint pdf
exists.

for some functions u, v : R → [0, ∞), then X, Y are independent and admit the densi-
ties fX , fY given by
1
fX (x) = cu(x), fY (y) = v(y),
c
where Z Z −1
c= v(x) dx = u(y) dy .
R R

Proof. As to (i) we have

FX,Y (x, y) = P(X ≤ x, Y ≤ y) = P(X ≤ x)P(Y ≤ y)
    = ∫_{−∞}^{x} fX (η) dη ∫_{−∞}^{y} fY (ξ) dξ
    = ∫_{−∞}^{x} ∫_{−∞}^{y} fX (η)fY (ξ) dξ dη.

To prove (ii), we first write

{X ≤ x} = {X ≤ x} ∩ Ω = {X ≤ x} ∩ {Y ∈ R} = {X ≤ x, Y ∈ R}.

Hence,

P(X ≤ x) = ∫_{−∞}^{x} ∫_{−∞}^{∞} fX,Y (η, y) dy dη = ∫_{−∞}^{x} u(η) dη ∫_R v(y) dy = ∫_{−∞}^{x} c u(η) dη,

where c = ∫_R v(y) dy. Thus X admits the density fX (x) = c u(x). In the same fashion one
proves that Y admits the density fY (y) = c′ v(y), where c′ = ∫_R u(x) dx. Since

1 = ∫_{R^2} fX,Y (x, y) dx dy = ∫_R u(x) dx ∫_R v(y) dy = c′ c,

then c′ = 1/c. It remains to prove that X, Y are independent. This follows by


P(X ∈ U, Y ∈ V ) = ∫_U ∫_V fX,Y (x, y) dy dx = ∫_U u(x) dx ∫_V v(y) dy
    = ∫_U c u(x) dx (1/c) ∫_V v(y) dy = ∫_U fX (x) dx ∫_V fY (y) dy
    = P(X ∈ U )P(Y ∈ V ),  for all U, V ∈ B(R).

Remark 2.3. By Theorem 2.3 and the result of Exercise 2.12, we have that two jointly nor-
mally distributed random variables are independent if and only if ρ = 0 in the formula (2.12).

Exercise 2.14 (•). Let X ∈ N (0, 1) and Y ∈ E(1) be independent. Compute P(X ≤ Y ).

Exercise 2.15. Let X ∈ E(2), Y ∈ χ2 (3) be independent. Compute numerically (e.g., using
Mathematica) the following probability

P(log(1 + XY ) < 2).

Result:≈ 0.893.

In Exercise 3.19 we give another criterion to establish whether two random variables are
independent, which applies also when the random variables do not admit a density.

2.3 Stochastic processes


Definition 2.9. A stochastic process is a one-parameter family of random variables,
which we denote by {X(t)}t≥0 , or by {X(t)}t∈[0,T ] if the parameter t is restricted to the
interval [0, T ], T > 0. Hence, for each t ≥ 0, X(t) : Ω → R is a random variable. We
denote by X(t, ω) the value of X(t) on the sample point ω ∈ Ω, i.e., X(t, ω) = X(t)(ω).
For each ω ∈ Ω fixed, the curve γX^ω : R → R, γX^ω (t) = X(t, ω) is called the ω-path of the
stochastic process and is assumed to be a measurable function. If the paths of a stochastic
process are all almost surely equal (i.e., independent of ω), we say that the stochastic process
is a deterministic function of time.

The parameter t will be referred to as time parameter, since this is what it represents
in the applications in financial mathematics. Examples of stochastic processes in financial
mathematics are given in the next section.

Definition 2.10. Two stochastic processes {X(t)}t≥0 , {Y (t)}t≥0 are said to be independent
if for all m, n ∈ N and 0 ≤ t1 < t2 < · · · < tn , 0 ≤ s1 < s2 < · · · < sm , the σ-algebras
σ(X(t1 ), . . . , X(tn )), σ(Y (s1 ), . . . , Y (sm )) are independent.

Hence two stochastic processes {X(t)}t≥0 , {Y (t)}t≥0 are independent if the information
obtained by “looking” at the process {X(t)}t≥0 up to time T is independent of the informa-
tion obtained by “looking” at the process {Y (t)}t≥0 up to time S, for all S, T > 0. Similarly
one defines the notion of several independent stochastic processes.

Remark 2.4 (Notation). If t runs over a countable set, i.e., t ∈ {tk }k∈N , then a stochastic
process is equivalent to a sequence of random variables X1 , X2 , . . . , where Xk = X(tk ). In
this case we say that the stochastic process is discrete and we denote it by {Xk }k∈N . An
example of discrete stochastic process is the random walk defined below.

A special role is played by step processes: given 0 = t0 < t1 < t2 < . . . , a step process
is a stochastic process {∆(t)}t≥0 of the form

∆(t, ω) = Σ_{k=0}^{∞} Xk (ω) I[tk ,tk+1 ) (t).


Figure 2.3: The path ω = ω∗ of a step process.

A typical path of a step process is depicted in Figure 2.3. Note that the paths of a step
process are right-continuous, but not left-continuous. Moreover, since Xk (ω) = ∆(tk , ω), we
can rewrite ∆(t) as
∆(t) = Σ_{k=0}^{∞} ∆(tk ) I[tk ,tk+1 ) .

It will be shown in Theorem 4.2 that any sufficiently regular stochastic process can be
approximated, in a suitable sense, by a sequence of step processes.
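Evaluating one path of a step process amounts to locating t in the grid {tk}. A minimal Python sketch (the grid and the values Xk(ω) below are arbitrary illustrative data, not from the text):

```python
import bisect

def step_process(t, t_grid, values):
    """Evaluate Δ(t) = Σ_k values[k] · I_[t_k, t_{k+1})(t) along one path,
    given the jump times t_grid = [t_0 = 0, t_1, ...]."""
    k = bisect.bisect_right(t_grid, t) - 1   # largest k with t_k <= t
    return values[k]

t_grid = [0.0, 1.0, 2.5, 4.0]
values = [1.0, -2.0, 0.5, 3.0]               # X_0(ω), X_1(ω), X_2(ω), X_3(ω)
print([step_process(t, t_grid, values) for t in (0.0, 0.9, 1.0, 3.0, 5.0)])
# → [1.0, 1.0, -2.0, 0.5, 3.0]: right-continuous at the jump times, as in Figure 2.3
```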
In the same way as a random variable generates a σ-algebra, a stochastic process generates
a filtration. Informally, the filtration generated by a stochastic process {X(t)}t≥0 contains
the information accumulated by looking at the process for longer and longer periods of time.
Definition 2.11. The filtration generated by the stochastic process {X(t)}t≥0 is given by
{FX (t)}t≥0 , where
FX (t) = FO(t) , O(t) = ∪0≤s≤t σ(X(s)).
Hence FX (t) is the smallest σ-algebra containing σ(X(s)), for all 0 ≤ s ≤ t, see Def-
inition 1.2. Similarly one defines the filtration {FX,Y (t)}t≥0 generated by two stochastic
processes {X(t)}t≥0 , {Y (t)}t≥0 , as well as the filtration generated by any number of stochas-
tic processes.
Definition 2.12. If {F(t)}t≥0 is a filtration and FX (t) ⊆ F(t), for all t ≥ 0, we say that
the stochastic process {X(t)}t≥0 is adapted to the filtration {F(t)}t≥0 .
The property of {X(t)}t≥0 being adapted to {F(t)}t≥0 means that the information con-
tained in F(t) suffices to determine the value attained by the random variable X(s), for all
s ∈ [0, t]. Clearly, {X(t)}t≥0 is adapted to its own generated filtration {FX (t)}t≥0 . Moreover

if {X(t)}t≥0 is adapted to {F(t)}t≥0 and Y (t) = f (X(t)), for some measurable function f ,
then {Y (t)}t≥0 is also adapted to {F(t)}t≥0 .
Next we give an example of (discrete) stochastic process. Let {Xt }t∈N be a sequence of
independent random variables satisfying

Xt = 1 with probability 1/2, Xt = −1 with probability 1/2,

for all t ∈ N. For a concrete realization of these random variables, we may think of Xt as
being defined on the sample space Ω∞ of the ∞-coin tosses experiment (see Appendix 1.A).
In fact, letting ω = (γj )j∈N ∈ Ω∞ , we may set

Xt (ω) = 1, if γt = H;    Xt (ω) = −1, if γt = T .

Hence Xt : Ω → {−1, 1} is the simple random variable Xt (ω) = IAt − IAct , where At = {ω ∈
Ω∞ : γt = H}. Clearly, FX (t) is the collection of all the events that are resolved by the first
t-tosses, which is given as indicated at the beginning of Section 1.3.

Definition 2.13. The stochastic process {Mt }t∈N given by


M0 = 0,    Mt = Σ_{k=1}^{t} Xk ,

is called symmetric random walk.

To understand the meaning of the term “random walk”, consider a particle moving on
the real line in the following way: if Xt = 1 (i.e., if the toss number t is a head), at time t
the particle moves one unit of length to the right, while if Xt = −1 (i.e., if the toss number t is
a tail) it moves one unit of length to the left. Then Mt gives the total amount of units of
length that the particle has travelled to the right or to the left up to time t.
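One path of the symmetric random walk is obtained by cumulating ±1 tosses, as in this Python sketch:

```python
import random

random.seed(3)

def random_walk(t):
    """One sample path (M_1, ..., M_t) of the symmetric random walk (M_0 = 0)."""
    M, path = 0, []
    for _ in range(t):
        M += random.choice((-1, 1))   # X_k = ±1, each with probability 1/2
        path.append(M)
    return path

path = random_walk(10)
print(path)   # every step changes the position by exactly one unit
```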

Exercise 2.16. Which of the following holds?

FM (t) ⊂ FX (t), FM (t) = FX (t), FX (t) ⊂ FM (t).

Justify the answer.

The increments of the random walk are defined as follows. If (k1 , . . . , kN ) ∈ NN , such
that 1 ≤ k1 < k2 < · · · < kN , we set

∆1 = Mk1 − M0 = Mk1 , ∆2 = Mk2 − Mk1 , . . . , ∆N = MkN − MkN −1 .

Hence ∆j is the total displacement of the particle from time kj−1 to time kj .

Theorem 2.4. The increments ∆1 , . . . , ∆N of the random walk are independent random
variables.

Proof. Since

∆1 = X1 + · · · + Xk1 = g1 (X1 , . . . , Xk1 ),
∆2 = Xk1+1 + · · · + Xk2 = g2 (Xk1+1 , . . . , Xk2 ),
...
∆N = XkN−1+1 + · · · + XkN = gN (XkN−1+1 , . . . , XkN ),

the result follows by Theorem 2.1.


The interpretation of this result is that the particle has no memory of past movements:
the distance travelled by the particle in a given interval of time is not affected by the motion
of the particle at earlier times.
We may now define the most important of all stochastic processes.
Definition 2.14. A Brownian motion (or Wiener process) is a stochastic process
{W (t)}t≥0 such that
(i) The paths are continuous and start from 0 almost surely, i.e., the sample points ω ∈ Ω
such that γW^ω (0) = 0 and γW^ω is a continuous function comprise a set of probability 1;

(ii) The increments over disjoint time intervals are independent, i.e., for all 0 = t0 < t1 <
· · · < tm < ∞, the random variables

W (t1 ) − W (t0 ), W (t2 ) − W (t1 ), . . . , W (tm ) − W (tm−1 )

are independent;

(iii) For all s < t, the increment W (t) − W (s) belongs to N (0, t − s).
Remark 2.5. Note carefully that the properties defining a Brownian motion depend on
the probability measure P. Thus a stochastic process may be a Brownian motion relative
to a probability measure P and not a Brownian motion with respect to another (possibly
equivalent) probability measure P̃. If we want to emphasize the probability measure P
with respect to which a stochastic process is a Brownian motion we shall say that it is a
P-Brownian motion.
It can be shown that Brownian motions exist. In particular, it can be shown that the
sequence of stochastic processes {Wn (t)}t≥0 , n ∈ N, defined by
Wn (t) = (1/√n) M[nt] ,    (2.13)
where Mt is the symmetric random walk and [z] denotes the integer part of z, converges to
a Brownian motion4 . Therefore one may think of a Brownian motion as a time-continuum
4
The convergence holds in probability, i.e., limn→∞ P(|Wn (t) − W (t)| ≥ ε) = 0, for all ε > 0.

version of a symmetric random walk which runs for an infinite number of “infinitesimal
time steps”. In fact, provided the number of time steps is sufficiently large, the process
{Wn (t)}t≥0 gives a very good approximation of a Brownian motion, which is useful for
numerical computations. An example of path to the stochastic process {Wn (t)}t≥0 , for
n = 1000, is shown in Figure 2.4. Notice that there exist many Brownian motions and each
of them may have some specific properties besides those listed in Definition 2.14. However,
as long as we use only the properties (i)-(iii), we do not need to work with a specific example
of Brownian motion.
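Paths such as the one in Figure 2.4 are produced by simulating (2.13) directly; the Python sketch below samples Wn on the grid t = j/n (plotting omitted):

```python
import math
import random

random.seed(4)

def scaled_walk(n, T=1.0):
    """Sample W_n(t) = M_[nt]/sqrt(n) of (2.13) on the grid t = j/n, j = 0, ..., [nT]."""
    M, values = 0, [0.0]
    for _ in range(int(n * T)):
        M += random.choice((-1, 1))
        values.append(M / math.sqrt(n))
    return values

w = scaled_walk(1000)
print(w[-1])   # W_n(1) is approximately N(0, 1)-distributed for large n
```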
Once a Brownian motion is introduced it is natural to require that the filtration {F(t)}t≥0
should be somehow related to it. For our future financial applications, the following class of
filtrations will play a fundamental role.

Definition 2.15. Let {W (t)}t≥0 be a Brownian motion and denote by σ + (W (t)) the σ-
algebra generated by the increments {W (s) − W (t); s ≥ t}, that is

σ + (W (t)) = FO(t) , O(t) = ∪s≥t σ(W (s) − W (t)).

A filtration {F(t)}t≥0 is said to be a non-anticipating filtration for the Brownian motion


{W (t)}t≥0 if {W (t)}t≥0 is adapted to {F(t)}t≥0 and if the σ-algebras σ + (W (t)), F(t) are
independent for all t ≥ 0.

The meaning is the following: the increments of the Brownian motion after time t are
independent of the information available at time t in the σ-algebra F(t). It is clear by the
previous definition that {FW (t)}t≥0 is a non-anticipating filtration for {W (t)}t≥0 . We shall
see later that many properties of Brownian motions that depend on {FW (t)}t≥0 also hold
with respect to any non-anticipating filtration (e.g., the martingale property, see Section 3.6).

Another important example of stochastic process applied in financial mathematics is the


following.

Definition 2.16. A Poisson process with rate λ is a stochastic process {N (t)}t≥0 such
that

(i) N (0) = 0 a.s.;

(ii) The increments over disjoint time-intervals are independent;

(iii) For all s < t, the increment N (t) − N (s) belongs to P(λ(t − s)).

Note in particular that N (t) is a discrete random variable, for all t ≥ 0, and that, in
contrast to the Brownian motion, the paths of a Poisson process are not continuous. The
Poisson process is the building block to construct more general stochastic processes with
jumps, which are very popular nowadays as models for the price of certain financial assets,
see [4].
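A path of a Poisson process can be simulated from its jump times, using the standard fact (not proved in these notes) that the waiting times between consecutive jumps are independent E(λ) random variables. A Python sketch:

```python
import random

random.seed(5)

def poisson_jump_times(lam, T):
    """Jump times in [0, T] of a Poisson process with rate lam, built from
    independent exponential inter-arrival times."""
    times, t = [], 0.0
    while True:
        t += random.expovariate(lam)
        if t > T:
            return times
        times.append(t)

jumps = poisson_jump_times(2.0, 10.0)
print(len(jumps))   # N(10) ∈ P(20): around 20 jumps on average
```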

Figure 2.4: A path of the stochastic process (2.13) for n = 1000.

2.4 Stochastic processes in financial mathematics


Remark 2.6. For more information on the financial concepts introduced in this section, see
Chapter 1 in [5].
All variables in financial mathematics are represented by stochastic processes. The most
obvious example is the price (or value) of financial assets. The stochastic process repre-
senting the price per share of a generic asset at different times will be denoted by {Π(t)}t≥0 .
Depending on the type of asset considered, we use a different specific notation for the stochas-
tic process modeling its price.
Remark 2.7. We always assume that t = 0 is earlier or equal to the present time. In
particular, the value of all financial variables is known at time t = 0. Hence, if {X(t)}t≥0 is
a stochastic process modelling a financial variable, then X(0) is a deterministic constant.
Before presenting various examples of stochastic processes in financial mathematics, let
us introduce an important piece of terminology. An investor is said to have a short position
on an asset if the investor profits from a decrease of its price, and a long position if the
investor profits from an increase of the price of the asset. The specific trading strategy that
leads to a short or long position on an asset depends on the type of asset considered, as we
are now ready to describe in more detail.

Stock price
The price per share at time t of a stock will be denoted by S(t). Typically S(t) > 0, for
all t ≥ 0, however, as discussed in Section 2.2.1, some models allow for the possibility that
S(t) = 0 with positive probability at finite times t > 0 (risk of default). Clearly {S(t)}t≥0

is a stochastic process. If we have several stocks, we shall denote their price by {S1 (t)}t≥0 ,
{S2 (t)}t≥0 , etc. Investors who own shares of a stock are those having a long position on the
stock, while investors short-selling the stock hold a short position. We recall that short-
selling a stock is the practice to sell the stock without actually owning it. Concretely, an
investor is short-selling N shares of a stock if the investor borrows the shares from a third
party and then sells them immediately on the market. The reason for short-selling assets
is the expectation that the price of the asset will decrease. If this is the case, then upon
re-purchasing the N shares in the future, and returning them to the lender, the short-seller
will profit from the lower current price of the asset compared to the price at the time of
short-selling.
The most popular model for the price of a stock is the geometric Brownian motion
stochastic process, which is given by
S(t) = S(0) exp(αt + σW (t)). (2.14)
Here {W (t)}t≥0 is a Brownian motion, α ∈ R is the instantaneous mean of log-return,
σ > 0 is the instantaneous volatility, while σ 2 is the instantaneous variance of the
stock. Note that α and σ are constant in this model. Moreover, S(0) is the price at time
t = 0 of the stock, which, according to Remark 2.7, is a deterministic constant. In Chapter 4
we introduce a generalization of the geometric Brownian motion, in which the instantaneous
mean of log-return and the instantaneous volatility of the stock are stochastic processes
{α(t)}t≥0 , {σ(t)}t≥0 (generalized geometric Brownian motion).
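A path of (2.14) can be simulated by building W (t) from independent N(0, dt) increments on a time grid. The following sketch uses illustrative parameter values (not from the notes):

```python
import numpy as np

# Simulate the geometric Brownian motion (2.14),
# S(t) = S(0) exp(alpha*t + sigma*W(t)),
# with W(t) built from independent N(0, dt) increments.
rng = np.random.default_rng(1)

def gbm_path(S0, alpha, sigma, T, n):
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), size=n)   # Brownian increments
    W = np.concatenate(([0.0], np.cumsum(dW)))  # W(0) = 0
    t = np.linspace(0.0, T, n + 1)
    return t, S0 * np.exp(alpha * t + sigma * W)

t, S = gbm_path(S0=100.0, alpha=0.05, sigma=0.2, T=1.0, n=252)
```

Every simulated price is strictly positive, since the exponential never vanishes.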
Exercise 2.17 (•). Derive the density of the geometric Brownian motion (2.14) and use
the result to show that P(S(t) = 0) = 0, i.e., a stock whose price is described by a geometric
Brownian motion cannot default.

Risk-free assets
A money market is a market in which the object of trading is money. More precisely, a
money market is a type of financial market where investors can borrow and lend money at
a given interest rate and for a period of time T ≤ 1 year5 . Assets in the money market
(i.e., short term loans) are assumed to be risk-free, which means that their value is always
increasing in time. Examples of risk-free assets in the money market are repurchase agreements (repo), certificates of deposit, treasury bills, etc. The stochastic process corresponding
to the price per share of a generic risk-free asset will be denoted by {B(t)}t∈[0,T ] . The in-
stantaneous interest rate of a risk-free asset is a stochastic process {R(t)}t∈[0,T ] such that
R(t) > 0, for all t ∈ [0, T ], and such that the value of the asset at time t is given by
B(t) = B(0) exp( ∫_0^t R(s) ds ), t ∈ [0, T ]. (2.15)

This corresponds to the investor debit/credit with the money market at time t if the amount
B(0) is borrowed/lent by the investor at time t = 0. An investor lending (resp. borrowing)
⁵ Loans with maturity longer than 1 year are called bonds; they will be discussed in more detail in Chapter 6.

money has a long (resp. short) position on the risk-free asset (more precisely, on its interest
rate). We remark that the integral in the right hand side of (2.15) is to be evaluated path
by path, i.e.,

B(t, ω) = B(0) exp( ∫_0^t R(s, ω) ds ),

for all fixed ω ∈ Ω. Although in the real world different risk-free assets have different
interest rates, throughout these notes we make the simplifying assumption that all assets
in the money market have the same instantaneous interest rate {R(t)}t∈[0,T ] , which we call
the interest rate of the money market. For the applications in options pricing theory it is
common to assume that the interest rate of the money market is a deterministic constant
R(t) = r, for all t ∈ [0, T ]. This assumption can be justified by the relatively short time of
maturity of options, see below.
Remark 2.8. The (average) interest rate of the money market is sometimes referred to
as “the cost of money”, and the ratio B(t)/B(0) is said to express the “time-value of
money”. This terminology is meant to emphasize that one reason for the “time-devaluation”
of money—in the sense that the purchasing power of money decreases with time—is precisely
the fact that money can grow interests by purchasing risk-free assets.
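As an illustration of (2.15) with a hypothetical deterministic rate curve R(s) = r0 + a·s (a choice not taken from the notes), the integral can be evaluated numerically; the trapezoid rule is exact for this linear integrand, so the result matches the closed form B(0) exp(r0 t + a t²/2):

```python
import math

# Evaluate B(t) = B(0) exp(∫_0^t R(s) ds) for the illustrative deterministic
# rate R(s) = r0 + a*s, integrating with the composite trapezoid rule.
r0, a, B0, t, n = 0.02, 0.01, 1000.0, 2.0, 1000
ds = t / n
R = [r0 + a * (i * ds) for i in range(n + 1)]          # rate on the grid
integral = sum(0.5 * (R[i] + R[i + 1]) * ds for i in range(n))
B_t = B0 * math.exp(integral)
B_closed = B0 * math.exp(r0 * t + 0.5 * a * t**2)      # exact value
```

Since R(s) > 0, the account value B(t) is strictly larger than B(0), consistent with the risk-free property.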

The discounting process


The stochastic process {D(t)}t≥0 given by
D(t) = exp( −∫_0^t R(s) ds ) = B(0)/B(t)

is called the discounting process. In general, if an asset price is multiplied by D(t), the
new stochastic process is called the discounted price of the asset. We denote the discounted
price by adding a subscript ∗ to the asset price. For instance, the discounted price of a stock
with price S(t) at time t is given by

S ∗ (t) = D(t)S(t).

Its meaning is the following: S ∗ (t) is the amount that should be invested on the money
market at time t = 0 in order that the value of this investment at time t replicates the
value of the stock at time t. Notice that S ∗ (t) < S(t). The discounted price of the stock
measures, roughly speaking, the loss in the stock value due to the “time-devaluation” of
money discussed above, see Remark 2.8.
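For a constant rate R(t) = r, discounting boils down to D(t) = e^{−rt} = B(0)/B(t). A minimal sketch with illustrative numbers:

```python
import math

# Constant-rate discounting: D(t) = exp(-r t) = B(0)/B(t), and the discounted
# stock price is S*(t) = D(t) S(t) < S(t). Values are illustrative.
r, t = 0.05, 1.0
B0 = 1.0
B_t = B0 * math.exp(r * t)      # money-market account value at time t
D_t = math.exp(-r * t)          # discount process at time t
S_t = 110.0                     # illustrative stock price at time t
S_star = D_t * S_t              # discounted stock price
```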

Financial derivative
A financial derivative (or derivative security) is a contract whose value depends on the
performance of one (or more) other asset(s), which is called the underlying asset. There
exist various types of financial derivatives, the most common being options, futures, forwards

and swaps. Financial derivatives can be traded over the counter (OTC), or in a regularized
market. In the former case, the contract is stipulated between two individual investors, who
agree upon the conditions and the price of the contract. In particular, the same derivative
(on the same asset, with the same parameters) can have two different prices over the counter.
Derivatives traded in the market, on the contrary, are standardized contracts. Anyone, after
a proper authorization, can make offers to buy or sell derivatives in the market, in a way
much similar to how stocks are traded. Let us see some examples of financial derivative (we
shall introduce more in Chapter 6).
A call option is a contract between two parties, the buyer (or owner) of the call and
the seller (or writer) of the call. The contract gives to the buyer the right, but not the
obligation, to buy the underlying asset at some future time for a price agreed upon today,
which is called strike price of the call. If the buyer can exercise this option only at some
given time t = T > 0 (where t = 0 corresponds to the time at which the contract is
stipulated) then the call option is called European, while if the option can be exercised
at any time in the interval (0, T ], then the option is called American. The time T > 0 is
called maturity time, or expiration date of the call. The seller of the call is obliged to
sell the asset to the buyer if the latter decides to exercise the option. If the option to buy
in the definition of a call is replaced by the option to sell, then the option is called a put
option.
In exchange for the option, the buyer must pay a premium to the seller. Suppose that
the option is a European option with strike price K, maturity time T and premium Π0 on a
stock with price S(t) at time t. In which case is it then convenient for the buyer to exercise
the call? Let us define the payoff of a European call as

Y = (S(T ) − K)+ := max(0, S(T ) − K),

i.e., Y > 0 if the stock price at the expiration date is higher than the strike price of the call
and it is zero otherwise; similarly for a European put we set

Y = (K − S(T ))+ .

Note that Y is a random variable, because it depends on the random variable S(T ). Clearly, if
Y > 0 it is more convenient for the buyer to exercise the option rather than buying/selling the
asset on the market. Note however that the real profit for the buyer is given by N (Y − Π0 ),
where N is the number of option contracts owned by the buyer. Typically, options are sold
in lots of 100, that is to say, the minimum number of options that one can buy is
100, which cover 100 shares of the underlying asset.
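The two pay-offs can be written as plain functions (the strike and the prices below are illustrative):

```python
# European call and put pay-offs as defined above:
# call: Y = (S(T) - K)^+,  put: Y = (K - S(T))^+.
def call_payoff(S_T, K):
    return max(S_T - K, 0.0)

def put_payoff(S_T, K):
    return max(K - S_T, 0.0)

K = 100.0
in_the_money_call = call_payoff(110.0, K)   # exercising is convenient
out_of_money_call = call_payoff(90.0, K)    # the option expires worthless
```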
One reason why investors buy calls in the market is to protect a short position on the
underlying asset. In fact, suppose that an investor short-sells 100 shares of a stock at time
t = 0 with the agreement to return them to the original owner at time t0 > 0. The investor
believes that the price of the stock will go down in the future, but of course the price may
go up instead. To avoid possible large losses, at time t = 0 the investor buys 100 shares of
an American call option on the stock expiring at T ≥ t0 , and with strike price K = S(0). If
the price of the stock at time t0 is not lower than the price S(0) as the investor expected,

then the investor will exercise the call, i.e., will buy 100 shares of the stock at the price
K = S(0). In this way the investor can return the shares to the lender with minimal losses.
In the same fashion, investors buy put options to protect a long position on the underlying
asset. The reason why investors write options is mostly to get liquidity (cash) to invest in
other assets6 .
Let us introduce some further terminology. A European call (resp. put) is said to be in
the money at time t if S(t) > K (resp. S(t) < K). The call (resp. put) is said to be out
of the money if S(t) < K (resp. S(t) > K). If S(t) = K, the (call or put) option is said
to be at the money at time t. The meaning of this terminology is self-explanatory.
The premium that the buyer has to pay to the seller for the option is the price (or value)
of the option. It depends on time (in particular, on the time left to expiration). Clearly,
the deeper in the money is the option, the higher will be its price. Therefore the holder of
the long position on the option is the buyer, while the seller holds the short position on the
option.
European call and put options are examples of more general contracts called European
derivatives. Given a function g : (0, ∞) → R, a standard European derivative with
pay-off Y = g(S(T )) and maturity time T > 0 is a contract that pays to its owner the
amount Y at time T > 0. Here S(T ) is the price of the underlying asset (which we take to
be a stock) at time T . The function g is called pay-off function of the derivative. The
term “European” refers to the fact that the contract cannot be exercised before time T ,
while the term “standard” refers to the fact that the pay-off depends only on the price of
the underlying at time T . The pay-off of a non-standard European derivative depends on
the path of the asset price during the interval [0, T ]. For example, the pay-off of an Asian
call is given by Y = ( ∫_0^T S(t) dt − K)+ .
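A sketch of this path dependence (all parameter values are illustrative, with α = 0 for brevity): the integral is approximated by a Riemann sum along one simulated geometric Brownian motion path.

```python
import numpy as np

# Asian call pay-off Y = (∫_0^T S(t) dt - K)^+ along one simulated path;
# the integral is approximated by a left Riemann sum on the grid.
rng = np.random.default_rng(2)
S0, sigma, K, T, n = 100.0, 0.2, 95.0, 1.0, 1000
dt = T / n
W = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))))
S = S0 * np.exp(sigma * W)              # GBM path (alpha = 0 for brevity)
integral = float(np.sum(S[:-1]) * dt)   # left Riemann sum for the integral
asian_payoff = max(integral - K, 0.0)
```

Unlike the standard call, the pay-off cannot be computed from S(T) alone: the whole trajectory enters through the integral.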
The price at time t of a European derivative (standard or not) with pay-off Y and
expiration date T will be denoted by ΠY (t). Hence {ΠY (t)}t∈[0,T ] is a stochastic process.
In addition, we now show that ΠY (T ) = Y holds, i.e., there exist no offers to buy (sell) a
derivative for less (more) than Y at the time of maturity. In fact, suppose that a derivative
is sold for ΠY (T ) < Y “just before” it expires at time T . In this way the buyer would make
the sure profit Y − ΠY (T ) at time T , which means that the seller would lose the same
amount. On the contrary, upon buying a derivative “just before” maturity for more than
Y , the buyer would lose ΠY (T ) − Y . Thus in a rational market, ΠY (T ) = Y (or, more
precisely, ΠY (t) → Y , as t → T ).
A standard American derivative with pay-off function g is a contract which can be
exercised at any time t ∈ (0, T ] prior to or at its maturity and that, upon exercise, pays the
amount g(S(t)) to the holder of the derivative. In these notes we are mostly concerned with
(standard) European derivatives, but in Chapter 6.9 we also discuss briefly some properties
of American call/put options.
⁶ Of course, speculation is also an important motivation to buy/sell options. However the standard theory of options pricing is firmly based on the interpretation of options as derivative securities and does not take speculation into account.

Portfolio
The portfolio of an investor is the set of all assets in which the investor is trading. Mathe-
matically it is described by a collection of N stochastic processes

{h1 (t)}t≥0 , {h2 (t)}t≥0 , . . . , {hN (t)}t≥0 ,

where hk (t) represents the number of shares of the asset k at time t in the investor portfolio.
If hk (t) is positive, resp. negative, the investor has a long, resp. short, position on the asset
k at time t. If Πk (t) denotes the value of the asset k at time t, then {Πk (t)}t≥0 is a stochastic
process; the portfolio value is the stochastic process {V (t)}t≥0 given by
V (t) = ∑_{k=1}^N hk (t)Πk (t).

Remark 2.9. For modeling purposes, it is convenient to assume that an investor can trade
any fraction of shares of an asset, i.e., hk (t) : Ω → R, rather than hk (t) : Ω → Z.
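A minimal sketch of the portfolio value at a fixed time t, with illustrative holdings and prices (a negative holding is a short position, and fractional shares are allowed as in Remark 2.9):

```python
# Portfolio value V(t) = sum_k h_k(t) * Pi_k(t) at a fixed time t.
holdings = [10.0, -2.5, 100.0]   # h_1, h_2, h_3 (h_2 is a short position)
prices = [50.0, 120.0, 1.0]      # Pi_1, Pi_2, Pi_3
V = sum(h * p for h, p in zip(holdings, prices))
```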

The investor makes a profit in the time interval [t0 , t1 ] if V (t1 ) > V (t0 ); the investor
incurs a loss in the interval [t0 , t1 ] if V (t1 ) < V (t0 ). We now introduce the important
definition of arbitrage portfolio.

Definition 2.17. An arbitrage portfolio is a portfolio whose value {V (t)}t≥0 satisfies the
following properties, for some T > 0:

(i) V (0) = 0 almost surely;

(ii) V (T ) ≥ 0 almost surely;

(iii) P(V (T ) > 0) > 0.

Hence an arbitrage portfolio is a risk-free investment in the interval [0, T ] which requires
no initial wealth and with a positive probability to give profit. We remark that the arbitrage
property depends on the probability measure P. However, it is clear that if two measures P
and P̃ are equivalent, then the arbitrage property is satisfied with respect to P if and only if
it is satisfied with respect to P̃. The guiding principle to devise theoretical models for asset
prices in financial mathematics is to ensure that one cannot set-up an arbitrage portfolio by
investing on these assets (arbitrage-free principle).

Markets
A market in which the objects of trading are N risky assets (e.g., stocks) and M risk-free
assets in the money market is said to be “N+M dimensional”. Most of these notes focus
on the case of 1+1 dimensional markets in which we assume that the risky asset is a stock.
A portfolio invested in this market is a pair {hS (t), hB (t)}t≥0 of stochastic processes, where

hS (t) is the number of shares of the stock and hB (t) the number of shares of the risk-free
asset in the portfolio at time t. The value of such portfolio is given by
V (t) = hS (t)S(t) + hB (t)B(t),
where S(t) is the price of the stock (given for instance by (2.14)), while B(t) is the value at
time t of the risk-free asset, which is given by (2.15).

2.A Appendix: Solutions to selected problems


Exercise 2.1. Since Ω = {X ∈ R}, then Ω ∈ σ(X). {X ∈ U }c is the set of sample points
ω ∈ Ω such that X(ω) ∈ / U . The latter is equivalent to X(ω) ∈ U c , hence {X ∈ U }c = {X ∈
U c }. Since U c ∈ B(R), it follows that {X ∈ U } ∈ σ(X). Finally we have to prove that σ(X)
is closed with respect to the countable union of sets. Let {Ak }k∈N ⊂ σ(X). By definition of
σ(X), there exist {Uk }k∈N ⊂ B(R) such that Ak = {X ∈ Uk }. Thus we have the following
chain of equivalent statements
ω ∈ ∪k∈N Ak ⇔ ∃k̄ ∈ N : X(ω) ∈ Uk̄ ⇔ X(ω) ∈ ∪k∈N Uk ⇔ ω ∈ {X ∈ ∪k∈N Uk }.
Hence ∪k∈N Ak = {X ∈ ∪k∈N Uk }. Since ∪k∈N Uk ∈ B(R), then ∪k∈N Ak ∈ σ(X).

Exercise 2.2. Let A be an event that is resolved by both variables X, Y . This means that
there exist I, J ⊆ R such that A = {X ∈ I} = {Y ∈ J}. Hence, using the independence of
X, Y ,
P(A) = P(A ∩ A) = P(X ∈ I, Y ∈ J) = P(X ∈ I)P(Y ∈ J) = P(A)P(A) = P(A)2 .
Therefore P(A) = 0 or P(A) = 1.
Now let a, b two deterministic constants. Note that, for all I ⊂ R,

P(a ∈ I) = 1 if a ∈ I, 0 otherwise,
and similarly for b. Hence

P(a ∈ I, b ∈ J) = (1 if a ∈ I and b ∈ J, 0 otherwise) = P(a ∈ I)P(b ∈ J).
Finally we show that X and Y = g(X) are independent if and only if Y is a deterministic
constant. For the “if” part we use that

P(a ∈ I, X ∈ J) = (P(X ∈ J) if a ∈ I, 0 otherwise) = P(a ∈ I)P(X ∈ J).
For the “only if” part, let z ∈ R and I = {g(X) ≤ z} = {X ∈ g −1 (−∞, z]}. Then, using
the independence of X and Y = g(X),
P(g(X) ≤ z) = P(g(X) ≤ z, g(X) ≤ z) = P(X ∈ g −1 (−∞, z], g(X) ≤ z)
= P(X ∈ g −1 (−∞, z])P(g(X) ≤ z) = P(g(X) ≤ z)P(g(X) ≤ z).

Hence P(Y ≤ z) is either 0 or 1, which implies that Y is a deterministic constant.

Exercise 2.4. A ∈ σ(f (X)) if and only if A = {f (X) ∈ U }, for some U ∈ B(R). The latter is
equivalent to X(ω) ∈ {f ∈ U }, hence A = {X ∈ {f ∈ U }}. Similarly, B = {Y ∈ {g ∈ V }},
for some V ∈ B(R). Hence
P(A ∩ B) = P({X ∈ {f ∈ U }} ∩ {Y ∈ {g ∈ V }}).
As X and Y are independent, the right hand side is equal to P({X ∈ {f ∈ U }})P({Y ∈
{g ∈ V }}), hence P(A ∩ B) = P(A)P(B), as claimed.

Exercise 2.5. Let’s assume for simplicity that M = 2, i.e., X = b1 IB1 + b2 IB2 with B1 ∩ B2 ≠
∅ (the generalization to M > 2 is straightforward). Hence

X(ω) = b1       for ω ∈ B1 \ (B1 ∩ B2 )
X(ω) = b2       for ω ∈ B2 \ (B1 ∩ B2 )
X(ω) = b1 + b2  for ω ∈ B1 ∩ B2

Assume b1 ≠ b2 . Then upon defining the disjoint sets A1 = B1 \(B1 ∩B2 ), A2 = B2 \(B1 ∩B2 ),
A3 = B1 ∩ B2 , and the real numbers a1 = b1 , a2 = b2 , a3 = b1 + b2 , we can rewrite X as the
simple random variable
X = a1 IA1 + a2 IA2 + a3 IA3 .
If b1 = b2 ≠ 0 we define A1 = B1 \ (B1 ∩ B2 ) ∪ B2 \ (B1 ∩ B2 ), A2 = B1 ∩ B2 , a1 = b1 = b2 ,
a2 = b1 + b2 = 2b1 and write X in the form
X = a1 IA1 + a2 IA2 .
Exercise 2.7. Write (−∞, b] as the disjoint union of the sets (−∞, a] and (a, b]. Hence

also {X ∈ (−∞, a]}, {X ∈ (a, b]} are disjoint. It follows that


P(−∞ < X ≤ b) = P(−∞ < X ≤ a) + P(a < X ≤ b),
that is, F (b) = F (a) + P(a < X ≤ b), by which the claim follows. To establish that FX is
right-continuous we now show that
P(X ≤ x0 + 1/n) → P(X ≤ x0 ) as n → ∞, for all x0 ∈ R.
By the first part of the exercise it suffices to show that P(x0 < X ≤ x0 + 1/n) → 0 as n → ∞.
The intervals An = (x0 , x0 + n−1 ] satisfy An+1 ⊂ An and ∩n An = ∅. Hence by Exercise 1.7
we have limn→∞ P(An ) = P(∅) = 0. The facts that FX is increasing and converges to 1 as
x → ∞ are obvious.

Exercise 2.9. We first compute the distribution function FY of Y = X 2 . Clearly, FY (y) =


P(X 2 ≤ y) = 0, if y ≤ 0. For y > 0 we have
FY (y) = P(X^2 ≤ y) = P(−√y < X < √y) = (1/√(2π)) ∫_{−√y}^{√y} e^{−x^2/2} dx.

Hence, for y > 0,
fY (y) = (d/dy) FY (y) = (1/√(2π)) e^{−y/2} ( (d/dy)(√y) − (d/dy)(−√y) ) = e^{−y/2} / (√(2π) √y).

Since Γ(1/2) = √π, this is the claim.

Exercise 2.12. A 2 × 2 symmetric matrix


 
C = ( a  b ; b  c )

is positive definite if and only if Tr C = a + c > 0 and det C = ac − b^2 > 0. In particular,
a, c > 0. Let us denote

a = σ1^2 ,  c = σ2^2 ,  ρ = b/(σ1 σ2 ).

Note that ρ^2 = b^2/(ac) < 1. Thus

C = ( σ1^2  σ1 σ2 ρ ; σ1 σ2 ρ  σ2^2 ),

and so

C^{−1} = (1/(1 − ρ^2)) ( 1/σ1^2  −ρ/(σ1 σ2 ) ; −ρ/(σ1 σ2 )  1/σ2^2 ).

Substituting into (2.11) proves (2.12).

Exercise 2.14. X ∈ N (0, 1) means that X has density


fX (x) = (1/√(2π)) e^{−x^2/2} ,

while Y ∈ E(1) means that Y has density
fY (y) = e−y Iy≥0 .
Since X, Y are independent, they have the joint density fX,Y (x, y) = fX (x)fY (y). Hence
P(X ≤ Y ) = ∫∫_{x≤y} fX,Y (x, y) dx dy = ∫_R dx (1/√(2π)) e^{−x^2/2} ∫_x^∞ dy e^{−y} I_{y≥0} .

To compute this integral, we first divide the domain of integration on the variable x in x ≤ 0
and x ≥ 0. So doing we have
P(X ≤ Y ) = (1/√(2π)) ∫_{−∞}^0 dx e^{−x^2/2} ∫_0^∞ dy e^{−y} + (1/√(2π)) ∫_0^∞ dx e^{−x^2/2} ∫_x^∞ dy e^{−y} .

Computing the integrals we find


P(X ≤ Y ) = 1/2 + e^{1/2} (1 − Φ(1)),
where Φ(z) = (1/√(2π)) ∫_{−∞}^z e^{−y^2/2} dy is the standard normal distribution.
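This closed form can be sanity-checked numerically (the check is not part of the notes): conditioning on Y gives P(X ≤ Y ) = ∫_0^∞ Φ(y) e^{−y} dy, which a trapezoid rule reproduces:

```python
import math

# Numeric check of P(X <= Y) = 1/2 + e^{1/2}(1 - Phi(1)): conditioning on Y
# gives P(X <= Y) = ∫_0^∞ Phi(y) e^{-y} dy, computed by the trapezoid rule
# on [0, 40] (the tail beyond 40 is negligible).
def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

closed_form = 0.5 + math.exp(0.5) * (1.0 - Phi(1.0))

n, y_max = 200_000, 40.0
h = y_max / n
f = lambda y: Phi(y) * math.exp(-y)
numeric = h * (0.5 * f(0.0) + sum(f(i * h) for i in range(1, n)) + 0.5 * f(y_max))
```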

Exercise 2.17. The density of S(t) is given by

fS(t) (x) = (d/dx) FS(t) (x),
provided the distribution function FS(t) , i.e.,

FS(t) (x) = P(S(t) ≤ x)

is differentiable. Clearly, fS(t) (x) = FS(t) (x) = 0, for x < 0. For x > 0 we use that
 
S(t) ≤ x if and only if W (t) ≤ (1/σ)( log(x/S(0)) − αt ) := A(x).

Thus,
P(S(t) ≤ x) = P(−∞ < W (t) ≤ A(x)) = (1/√(2πt)) ∫_{−∞}^{A(x)} e^{−y^2/(2t)} dy,

where for the second equality we used that W (t) ∈ N (0, t). Hence
fS(t) (x) = (d/dx) ( (1/√(2πt)) ∫_{−∞}^{A(x)} e^{−y^2/(2t)} dy ) = (1/√(2πt)) e^{−A(x)^2/(2t)} (dA(x)/dx),

for x > 0, that is

fS(t) (x) = (1/√(2πσ^2 t)) (1/x) exp( −(log x − log S(0) − αt)^2 / (2σ^2 t) ) I_{x>0} .

Since ∫_0^∞ fS(t) (y) dy = 1,

then p0 = P(S(t) = 0) = 0.

Chapter 3

Expectation

Throughout this chapter we assume that (Ω, F, {F(t)}t≥0 , P) is a given filtered probability
space.

3.1 Expectation and variance of random variables


Suppose that we want to estimate the value of a random variable X before the experiment
has been performed. What is a reasonable definition for our “estimate” of X? Let us first
assume that X is a simple random variable of the form
X = ∑_{k=1}^N ak IAk ,

for some finite partition {Ak }k=1,...,N of Ω and real distinct numbers a1 , . . . , aN . In this case,
it is natural to define the expected value (or expectation) of X as
E[X] = ∑_{k=1}^N ak P(Ak ) = ∑_{k=1}^N ak P(X = ak ).

That is to say, E[X] is a weighted average of all the possible values attainable by X, in
which each value is weighted by its probability of occurrence. This definition applies also for
N = ∞ (i.e., for discrete random variables) provided of course the infinite series converges.
For instance, if X ∈ P(µ) we have
E[X] = ∑_{k=0}^∞ k P(X = k) = ∑_{k=0}^∞ k µ^k e^{−µ}/k!
= e^{−µ} ∑_{k=1}^∞ µ^k/(k − 1)! = e^{−µ} ∑_{r=0}^∞ µ^{r+1}/r! = e^{−µ} µ ∑_{r=0}^∞ µ^r/r! = µ.
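A quick numeric check of this computation (µ = 3 is an illustrative value; truncating the series at k = 100 leaves a negligible tail):

```python
import math

# Check E[X] = mu for X in P(mu) by summing k * mu^k e^{-mu} / k!
# over k = 0, ..., 100.
mu = 3.0
mean = sum(k * mu**k * math.exp(-mu) / math.factorial(k) for k in range(101))
```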

Exercise 3.1 (•). Compute the expectation of binomial variables.

Now let X be a non-negative random variable and consider the sequence {s_n^X}_{n∈N} of
simple functions defined in Theorem 2.2. Recall that s_n^X converges pointwise to X as n → ∞,
i.e., s_n^X(ω) → X(ω), for all ω ∈ Ω (see Exercise 2.6). Since

E[s_n^X] = ∑_{k=1}^{n2^n − 1} (k/2^n) P( k/2^n ≤ X < (k+1)/2^n ) + n P(X ≥ n), (3.1)

it is natural to introduce the following definition.


Definition 3.1. Let X : Ω → [0, ∞) be a non-negative random variable. We define the
expectation of X as
E[X] = lim_{n→∞} ∑_{k=1}^{n2^n − 1} (k/2^n) P( k/2^n ≤ X < (k+1)/2^n ) + n P(X ≥ n), (3.2)

i.e., E[X] = lim_{n→∞} E[s_n^X], where s_1^X, s_2^X, . . . is the sequence of simple functions converging
pointwise to X and defined in Theorem 2.2.
We remark that the limit in (3.2) exists, because (3.1) is an increasing sequence (see next
exercise), although this limit could be infinity. When the limit is finite we say that X has
finite expectation. This happens for instance when X is bounded, i.e., 0 ≤ X ≤ C a.s., for
some positive constant C.
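For a concrete illustration of (3.1)-(3.2), take X uniform on [0, 1) (a choice not from the notes): then P(k/2^n ≤ X < (k+1)/2^n) = 2^{−n} for k < 2^n, P(X ≥ n) = 0 for n ≥ 1, and the approximating sums can be evaluated exactly; they increase to E[X] = 1/2:

```python
# Exact evaluation of E[s_n^X] for X uniform on [0, 1):
# E[s_n^X] = sum_{k=1}^{2^n - 1} (k/2^n) * 2^{-n} = (2^n - 1)/2^{n+1}.
def dyadic_expectation(n):
    return sum((k / 2**n) * 2.0**-n for k in range(1, 2**n))

approx = [dyadic_expectation(n) for n in range(1, 11)]
```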
Exercise 3.2. Show that E[s_n^X] is increasing in n ∈ N. Show that the limit (3.2) is finite
when the non-negative random variable X is bounded.
Remark 3.1 (Monotone convergence theorem). It can be shown that the limit (3.2) is
the same along any non-decreasing sequence of non-negative random variables that converge
pointwise to X, hence we can use any such sequence to define the expectation of a non-
negative random variable. This follows by the monotone convergence theorem, whose
precise statement is the following: If X1 , X2 , . . . is a non-decreasing sequence of non-negative
random variables such that Xn → X pointwise a.s., then E[Xn ] → E[X].
Remark 3.2 (Dominated convergence theorem). The sequence of simple random variables
used to define the expectation of a non-negative random variable need not be non-decreasing
either. This follows by the dominated convergence theorem, whose precise statement
is the following: if X1 , X2 , . . . is a sequence of non-negative random variables such that
Xn → X, as n → ∞, pointwise a.s., and supn Xn ≤ Y for some non-negative random
variable Y with finite expectation, then limn→∞ E[Xn ] = E[X].
Next we extend the definition of expectation to general random variables. For this pur-
pose we use that every random variable X : Ω → R can be written as
X = X+ − X− ,
where
X+ = max(0, X), X− = − min(X, 0)

are respectively the positive and negative part of X. Since X± are non-negative random
variables, then their expectation is given as in Definition 3.1.

Definition 3.2. Let X : Ω → R be a random variable and assume that at least one of the
random variables X+ , X− has finite expectation. Then we define the expectation of X as

E[X] = E[X+ ] − E[X− ].

If X± have both finite expectation, we say that X has finite expectation or that it is an
integrable random variable. The set of all integrable random variables on Ω will be denoted
by L1 (Ω), or by L1 (Ω, P) if we want to specify the probability measure.

Remark 3.3 (Notation). Of course the expectation of a random variable depends on the
probability measure. If another probability measure P̃ is defined on the σ-algebra of events
(not necessarily equivalent to P), we denote the expectation of X in P̃ by Ẽ[X].

Remark 3.4 (Expectation=Lebesgue integral). The expectation of a random variable X


with respect to the probability measure P is also called the Lebesgue integral of X over Ω
in the measure P and it is also denoted by
E[X] = ∫_Ω X(ω) dP(ω).

We shall not use this notation.

The following theorem collects some useful properties of the expectation:

Theorem 3.1. Let X, Y : Ω → R be integrable random variables. Then the following holds:

(i) Linearity: For all α, β ∈ R, E[αX + βY ] = αE[X] + βE[Y ];

(ii) If X ≤ Y a.s. then E[X] ≤ E[Y ];

(iii) If X ≥ 0 a.s., then E[X] = 0 if and only if X = 0 a.s.;

(iv) If X, Y are independent, then E[XY ] = E[X]E[Y ].

Sketch of the proof. For all claims, the argument of the proof is divided in three steps: STEP
1: Show that it suffices to prove the claim for non-negative random variables. STEP 2: Prove
the claim for simple functions. STEP 3: Take the limit along the sequences {s_n^X}_{n∈N} , {s_n^Y}_{n∈N}
of simple functions converging to X, Y . Carrying out these three steps for (i), (ii) and (iii)
is simpler, so let us focus on (iv). Let X+ = f (X), X− = g(X), and similarly for Y , where
f (s) = max(0, s), g(s) = − min(0, s). By Exercise 2.4, each of (X+ , Y+ ), (X− , Y+ ), (X+ , Y− )
and (X− , Y− ) is a pair of independent (non-negative) random variables. Assume that the
claim is true for non-negative random variables. Then, using X = X+ − X− , Y = Y+ − Y−

and the linearity of the expectation, we find

E[XY ] = E[(X+ − X− )(Y+ − Y− )]


= E[X+ Y+ ] − E[X− Y+ ] − E[X+ Y− ] + E[X− Y− ]
= E[X+ ]E[Y+ ] − E[X− ]E[Y+ ] − E[X+ ]E[Y− ] + E[X− ]E[Y− ]
= (E[X+ ] − E[X− ])(E[Y+ ] − E[Y− ]) = E[X]E[Y ].

Hence it suffices to prove the claim for non-negative random variables. Next assume that
X, Y are independent simple functions and write
X = ∑_{j=1}^N aj IAj ,  Y = ∑_{k=1}^M bk IBk .

We have
XY = ∑_{j=1}^N ∑_{k=1}^M aj bk IAj IBk = ∑_{j=1}^N ∑_{k=1}^M aj bk IAj ∩Bk .

Thus by linearity of the expectation, and since the events Aj , Bk are independent, for all
j, k, we have
E[XY ] = ∑_{j=1}^N ∑_{k=1}^M aj bk E[IAj ∩Bk ] = ∑_{j=1}^N ∑_{k=1}^M aj bk P(Aj ∩ Bk )
= ∑_{j=1}^N ∑_{k=1}^M aj bk P(Aj )P(Bk ) = ( ∑_{j=1}^N aj P(Aj ) )( ∑_{k=1}^M bk P(Bk ) ) = E[X]E[Y ].

Hence the claim holds for simple functions. It follows that

E[s_n^X s_n^Y ] = E[s_n^X] E[s_n^Y ].

Letting n → ∞, the right hand side converges to E[X]E[Y ]. To complete the proof we have
to show that the left hand side converges to E[XY ]. This follows by applying the monotone
convergence theorem (see Remark 3.1) to the sequence Zn = s_n^X s_n^Y .

As |X| = X+ + X− , a random variable X is integrable if and only if E[|X|] < ∞. Hence


we have
X ∈ L1 (Ω) ⇔ E[X] < ∞ ⇔ E[|X|] < ∞.
The set of random variables X : Ω → R such that |X|2 is integrable, i.e., E[|X|2 ] < ∞, will
be denoted by L2 (Ω) or L2 (Ω, P).
Exercise 3.3 (•). Prove the Schwarz inequality,
E[XY ] ≤ √( E[X^2] E[Y^2] ), (3.3)

for all random variables X, Y ∈ L2 (Ω).

Letting Y = 1 in (3.3), we find

L2 (Ω) ⊂ L1 (Ω).

The covariance Cov(X, Y ) of two random variables X, Y ∈ L2 (Ω) is defined as

Cov(X, Y ) = E[XY ] − E[X]E[Y ].

Two random variables are said to be uncorrelated if Cov(X, Y ) = 0. By Theorem 3.1(iv),


if X, Y are independent then they are uncorrelated, but the opposite is not true in general.
Consider for example the simple random variables

X = −1 with probability 1/3,  0 with probability 1/3,  1 with probability 1/3,

and

Y = X^2 = 0 with probability 1/3,  1 with probability 2/3.
Then X and Y are clearly not independent, but

Cov(X, Y ) = E[XY ] − E[X]E[Y ] = E[X 3 ] − 0 = 0,

since E[X 3 ] = E[X] = 0.
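The covariance can be checked exactly over this three-point distribution:

```python
# Exact check that X and Y = X^2 above are uncorrelated:
# Cov(X, Y) = E[X^3] - E[X] E[X^2] = 0.
outcomes = [-1.0, 0.0, 1.0]
p = 1.0 / 3.0
EX = sum(x * p for x in outcomes)        # E[X]  = 0
EY = sum(x**2 * p for x in outcomes)     # E[Y]  = E[X^2] = 2/3
EXY = sum(x**3 * p for x in outcomes)    # E[XY] = E[X^3] = 0
cov = EXY - EX * EY
```

X and Y are of course not independent, since Y is a function of X.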


Definition 3.3. The variance of a random variable X ∈ L2 (Ω) is given by

Var[X] = E[(X − E[X])2 ].

Using the linearity of the expectation we can rewrite the definition of variance as

Var[X] = E[X 2 ] − 2E[E[X]X] + E[X]2 = E[X 2 ] − E[X]2 = Cov(X, X).

Note that a random variable has zero variance if and only if X = E[X] a.s., hence we may
view Var[X] as a measure of the “randomness of X”. As a way of example, let us compute
the variance of X ∈ P(µ). We have
E[X^2] = ∑_{k=0}^∞ k^2 P(X = k) = ∑_{k=0}^∞ k^2 µ^k e^{−µ}/k! = e^{−µ} ∑_{k=1}^∞ k µ^k/(k − 1)!
= e^{−µ} ∑_{r=0}^∞ (r + 1) µ^{r+1}/r! = e^{−µ} µ ∑_{r=0}^∞ µ^r/r! + µ ∑_{r=0}^∞ r P(X = r) = µ + µE[X] = µ + µ^2 .

Hence
Var[X] = E[X 2 ] − E[X]2 = µ + µ2 − µ2 = µ.
Exercise 3.4. Compute the variance of binomial random variables.

Exercise 3.5 (•). Prove the following:

1. Var[αX] = α2 Var[X], for all constants α ∈ R;

2. Var[X + Y ] = Var[X] + Var[Y ] + 2Cov(X, Y );


3. −√( Var[X]Var[Y ] ) ≤ Cov(X, Y ) ≤ √( Var[X]Var[Y ] ). The left (resp. right) inequality
becomes an equality if and only if there exists a negative (resp. positive) constant a0
and a real constant b0 such that Y = a0 X + b0 almost surely.

By the previous exercise, Var(X + Y ) = Var(X) + Var(Y ) holds if and only if X, Y are
uncorrelated. Moreover, if we define the correlation of X, Y as

Cor(X, Y ) = Cov(X, Y ) / √( Var(X)Var(Y ) ) ,

then Cor(X, Y ) ∈ [−1, 1] and |Cor(X, Y )| = 1 if and only if Y is a linear function of X.


The interpretation is the following: the closer Cor(X, Y ) is to 1 (resp. −1), the more the
variables X and Y tend to move in the same (resp. opposite) direction (for instance,
Cor(X, −2X) = −1, Cor(X, 2X) = 1). An important problem in quantitative finance is to
find correlations between the price of different assets.
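A numeric illustration (the sample itself is illustrative): the sample correlation computed by numpy reproduces Cor(X, 2X) = 1 and Cor(X, −2X) = −1 up to floating point:

```python
import numpy as np

# Sample correlation of linearly related data: for Y = 2X the correlation is 1,
# for Y = -2X it is -1.
rng = np.random.default_rng(3)
x = rng.normal(size=10_000)
cor_pos = np.corrcoef(x, 2.0 * x)[0, 1]
cor_neg = np.corrcoef(x, -2.0 * x)[0, 1]
```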

Exercise 3.6. Let {Mk }k∈N be a symmetric random walk. Show that E[Mk ] = 0, and that
Var[Mk ] = k, for all k ∈ N.

Exercise 3.7. Show that the function ‖ · ‖2 which maps a random variable Z to ‖Z‖2 = √(E[Z^2]) is a norm in L2 (Ω).

Remark 3.5 (L2 -norm). The norm defined in the previous exercise is called L2 norm. It
can be shown that it is a complete norm, i.e., if {Xn }n∈N ⊂ L2 (Ω) is a Cauchy sequence of
random variables in the norm L2 , then there exists a random variable X ∈ L2 (Ω) such that
‖Xn − X‖2 → 0 as n → ∞.

Exercise 3.8 (•). Let {Wn (t)}t≥0 , n ∈ N, be the sequence of stochastic processes defined
in (2.13). Compute E[Wn (t)], Var[Wn (t)], Cov[Wn (t), Wn (s)]. Show that

Var(Wn (t)) → t, Cov(Wn (t), Wn (s)) → min(s, t), as n → +∞.

Next we want to present a first application in finance of the theory outlined above. In
particular we establish a sufficient condition which ensures that a portfolio is not an arbitrage.

Theorem 3.2. Let a portfolio be given with value {V(t)}_{t≥0} and let V*(t) = D(t)V(t) be the
discounted portfolio value. If there exists a measure P̃ equivalent to P such that Ẽ[V*(t)] is
constant (independent of t), then the portfolio is not an arbitrage.

Proof. Assume that the portfolio is an arbitrage. Then V(0) = 0 almost surely; as V*(0) =
V(0), the assumption of constant expectation in the probability measure P̃ gives

Ẽ[V*(t)] = 0,   for all t ≥ 0.   (3.4)

Let T > 0 be such that P(V(T) ≥ 0) = 1 and P(V(T) > 0) > 0. Since P and P̃ are
equivalent, we also have P̃(V(T) ≥ 0) = 1 and P̃(V(T) > 0) > 0. Since the discounting
process is positive, we also have P̃(V*(T) ≥ 0) = 1 and P̃(V*(T) > 0) > 0. However this
contradicts (3.4), due to Theorem 3.1(iii). Hence our original hypothesis that the portfolio
is an arbitrage portfolio is false.
Theorem 3.2 will be applied in Chapter 6. To this purpose we shall need the following
characterization of equivalent probability measures.

Theorem 3.3. The following statements are equivalent:

(i) P and P̃ are equivalent probability measures;

(ii) There exists a unique (up to null sets) random variable Z : Ω → R such that Z > 0
almost surely, E[Z] = 1 and P̃(A) = E[Z I_A], for all A ∈ F.

Moreover, assuming any of these two equivalent conditions, for all random variables X such
that XZ ∈ L¹(Ω, P), we have X ∈ L¹(Ω, P̃) and

Ẽ[X] = E[ZX].   (3.5)
Proof. The implication (i) ⇒ (ii) follows by the Radon-Nikodým theorem, whose proof
can be found for instance in [6]. As to the implication (ii) ⇒ (i), we first observe that
P̃(Ω) = E[Z I_Ω] = E[Z] = 1. Hence, to prove that P̃ is a probability measure, it remains to
show that it satisfies the countable additivity property: for all families {A_k}_{k∈N} of disjoint
events, P̃(∪_k A_k) = Σ_k P̃(A_k). To prove this let

B_n = ∪_{k=1}^n A_k.

Clearly, Z I_{B_n} is an increasing sequence of random variables. Hence, by the monotone convergence theorem (see Remark 3.1) we have

lim_{n→∞} E[Z I_{B_n}] = E[Z I_{B_∞}],   B_∞ = ∪_{k=1}^∞ A_k,

i.e.,

lim_{n→∞} P̃(B_n) = P̃(B_∞).   (3.6)

On the other hand, by linearity of the expectation,

P̃(B_n) = E[Z I_{B_n}] = E[Z I_{∪_{k=1}^n A_k}] = E[Z(I_{A_1} + ⋯ + I_{A_n})] = Σ_{k=1}^n E[Z I_{A_k}] = Σ_{k=1}^n P̃(A_k).

Hence (3.6) becomes

Σ_{k=1}^∞ P̃(A_k) = P̃(∪_{k=1}^∞ A_k).
This proves that P̃ is a probability measure. To show that P and P̃ are equivalent, let A be
such that P(A) = 0. Since Z I_A ≥ 0 almost surely, P̃(A) = E[Z I_A] = 0 is equivalent, by
Theorem 3.1(iii), to Z I_A = 0 almost surely. Since Z > 0 almost surely, this is equivalent
to I_A = 0 a.s., i.e., P(A) = 0. Thus P̃(A) = 0 if and only if P(A) = 0, i.e., the probability
measures P and P̃ are equivalent. It remains to prove the identity (3.5). If X is the simple
random variable X = Σ_k a_k I_{A_k}, then the proof is straightforward:

Ẽ[X] = Σ_k a_k P̃(A_k) = Σ_k a_k E[Z I_{A_k}] = E[Z Σ_k a_k I_{A_k}] = E[ZX].

For a general non-negative random variable X the result follows by applying (3.5) to an
increasing sequence of simple random variables converging to X and then passing to the
limit (using the monotone convergence theorem). The result for a general random variable
X : Ω → R follows by applying (3.5) to the positive and negative parts of X and using the
linearity of the expectation.
Remark 3.6 (Radon-Nikodým derivative). Using the Lebesgue integral notation (see Remark 3.4) we can write (3.5) as

∫_Ω X(ω) dP̃(ω) = ∫_Ω X(ω)Z(ω) dP(ω).

This leads to the formal identity dP̃(ω) = Z(ω)dP(ω), or Z(ω) = dP̃(ω)/dP(ω), which explains why
Z is also called the Radon-Nikodým derivative of P̃ with respect to P.

An application of Theorem 3.3 is given in Exercise 3.11 below.
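On a finite sample space the content of Theorem 3.3 can be checked directly. The following sketch (numpy assumed; all numbers are arbitrary choices) builds P̃(A) = E[Z I_A] from a positive Z with E[Z] = 1 and verifies that P̃ has total mass 1 and that Ẽ[X] = E[ZX]:

```python
import numpy as np

# Finite sample space Omega = {0,...,4} with uniform P; Z > 0 with
# E[Z] = 1 defines Ptilde({omega}) = Z(omega) P({omega}).
p = np.full(5, 0.2)                       # P({omega}) = 1/5
z = np.array([0.5, 0.5, 1.0, 1.5, 1.5])   # Z > 0 with E[Z] = 1
p_tilde = p * z                           # the measure Ptilde

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # a random variable X on Omega
total_mass = p_tilde.sum()                # = 1, so Ptilde is a probability
e_tilde = (p_tilde * x).sum()             # Etilde[X]
e_zx = (p * z * x).sum()                  # E[Z X]; equals Etilde[X]
```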

3.2 Computing the expectation of a random variable


Next we discuss how to compute the expectation of a random variable X. Definition 3.1 is
clearly not very useful to this purpose, unless X is a simple random variable. There exist
several methods to compute the value for E[X], some of which will be presented later in these
notes. In this section we show that the expectation and the variance of a random variable
can be computed easily when the random variable admits a density.
Theorem 3.4. Let X : Ω → R be a random variable and g : R → R be a measurable function
such that g(X) ∈ L1 (Ω). Assume that X admits the density fX . Then
Z
E[g(X)] = g(x)fX (x) dx.
R

In particular, the expectation and the variance of X are given by

E[X] = ∫_R x f_X(x) dx,   Var[X] = ∫_R x² f_X(x) dx − (∫_R x f_X(x) dx)².
Proof. We prove the theorem under the assumption that g is a simple measurable function;
the proof for general functions g follows by a limit argument similar to the one used in the
proof of Theorem 3.1, see Theorem 1.5.2 in [21] for the details. Hence we assume

g(x) = Σ_{k=1}^N α_k I_{U_k}(x),

for some disjoint Borel sets U_1, . . . , U_N ⊂ R. Thus

E[g(X)] = E[Σ_k α_k I_{U_k}(X)] = Σ_k α_k E[I_{U_k}(X)].

Let Y_k = I_{U_k}(X) : Ω → R. Then Y_k is the simple random variable that takes the value 1 if
ω ∈ A_k and 0 if ω ∈ A_k^c, where A_k = {X ∈ U_k}. Thus the expectation of Y_k is given by
E[Y_k] = P(A_k) and so

E[g(X)] = Σ_k α_k P(X ∈ U_k) = Σ_k α_k ∫_{U_k} f_X(x) dx = ∫_R Σ_k α_k I_{U_k}(x) f_X(x) dx = ∫_R g(x) f_X(x) dx,

as claimed.
For instance, if X ∈ N(m, σ²), we have

E[X] = ∫_R x e^{−(x−m)²/2σ²} dx/√(2πσ²) = m,

Var[X] = ∫_R x² e^{−(x−m)²/2σ²} dx/√(2πσ²) − m² = σ²,
which explains why we called m the expectation and σ 2 the variance of the normal random
variable X. Note in particular that, for a Brownian motion {W (t)}t≥0 , there holds
E[W(t) − W(s)] = 0,   Var[W(t) − W(s)] = |t − s|,   for all s, t ≥ 0.   (3.7)

Let us show that

Cov(W(t), W(s)) = min(s, t)   (3.8)

(compare (3.8) with the result of Exercise 3.8). For s = t, the claim is equivalent to
Var[W(t)] = t, which holds by definition of Brownian motion (see (3.7)). For t > s we have

Cov(W(t), W(s)) = E[W(t)W(s)] − E[W(t)]E[W(s)] = E[W(t)W(s)] = E[(W(t) − W(s))W(s)] + E[W(s)²].

Since W(t) − W(s) and W(s) are independent random variables, E[(W(t) − W(s))W(s)] =
E[W(t) − W(s)]E[W(s)] = 0, and so

Cov(W(t), W(s)) = E[W(s)²] = Var[W(s)] = s = min(s, t),   for t > s.

A similar argument applies for t < s.
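The identity Cov(W(t), W(s)) = min(s, t) lends itself to a Monte Carlo check: sampling the pair (W(s), W(t)) through independent Gaussian increments, the sample covariance should be close to min(s, t). A sketch (numpy assumed):

```python
import numpy as np

# Monte Carlo check of Cov(W(t), W(s)) = min(s, t), using that W(s) and
# the increment W(t) - W(s) are independent Gaussians.
rng = np.random.default_rng(2)
s, t = 0.7, 1.5
n = 500_000
w_s = rng.normal(0.0, np.sqrt(s), size=n)              # W(s) ~ N(0, s)
w_t = w_s + rng.normal(0.0, np.sqrt(t - s), size=n)    # add the increment

cov_st = np.cov(w_s, w_t)[0, 1]  # close to min(s, t) = 0.7
```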

Exercise 3.9. The moment of order n of a random variable X is the quantity µ_n = E[X^n],
n = 1, 2, . . . . Let X ∈ N(0, σ²). Prove that µ_n = 0 if n is odd and µ_n = 1 · 3 · 5 ⋯ (n − 1) σ^n if n is even.
Exercise 3.10 (•). Compute the expectation and the variance of exponential random vari-
ables.
Exercise 3.11 (•). Let X ∈ E(λ) be an exponential random variable with intensity λ. Given
λ̃ > 0, let

Z = (λ̃/λ) e^{−(λ̃−λ)X}.

Define P̃(A) = E[Z I_A], A ∈ F. Show that P̃ is a probability measure equivalent to P. Prove
that X ∈ E(λ̃) in the probability measure P̃.

Exercise 3.12. Compute the expectation and the variance of Cauchy distributed random
variables. Compute the expectation and the variance of Lévy distributed random variables.
Exercise 3.13. Compute the expectation and the variance of the geometric Brownian mo-
tion (2.14).
Exercise 3.14. Show that the paths of the Brownian motion have unbounded linear variation.
Namely, given 0 = t_0 < t_1 < ⋯ < t_n = t with t_k − t_{k−1} = h, for all k = 1, . . . , n, show that

E[Σ_{k=1}^n |W(t_k) − W(t_{k−1})|] → ∞,   as n → ∞.

(However, Brownian motions have finite quadratic variation, see Section 3.4.)
A result similar to Theorem 3.4 can be used to compute the correlation between two
random variables that admit a joint density.
Theorem 3.5. Let X, Y : Ω → R be two random variables with joint density f_{X,Y} : R² →
[0, ∞) and let g : R² → R be a measurable function such that g(X, Y) ∈ L¹(Ω). Then

E[g(X, Y)] = ∫_{R²} g(x, y) f_{X,Y}(x, y) dx dy.

In particular, for X, Y ∈ L²(Ω),

Cov(X, Y) = ∫_{R²} xy f_{X,Y}(x, y) dx dy − ∫_R x f_X(x) dx ∫_R y f_Y(y) dy,

where

f_X(x) = ∫_R f_{X,Y}(x, y) dy,   f_Y(y) = ∫_R f_{X,Y}(x, y) dx

are the (marginal) densities of X and Y.

Exercise 3.15. Show that if X_1, X_2 : Ω → R are jointly normally distributed with covariance
matrix C = (C_{ij})_{i,j=1,2}, then C_{ij} = Cov(X_i, X_j).
Combining the results of Exercises 2.12 and 3.15, we see that the parameter ρ in Equa-
tion (2.12) is precisely the correlation of the two jointly normally distributed random vari-
ables X, Y . It follows by Remark 2.3 that two jointly normally distributed random variables
are independent if and only if they are uncorrelated. Recall that for general random variables,
independence implies uncorrelation, but the opposite is in general not true.

3.3 Characteristic function


In this section, and occasionally in the rest of the notes, we shall need to take the expectation
of a complex-valued random variable Z : Ω → C. Letting Z = Re(Z) + iIm(Z), the
expectation of Z is the complex number defined by

E[Z] = E[Re(Z)] + iE[Im(Z)].

Definition 3.4. Let X ∈ L¹(Ω). The function θ_X : R → C given by

θ_X(u) = E[e^{iuX}]

is called the characteristic function of X. The positive, real-valued function M_X(u) =
E[e^{uX}], when it exists in some neighborhood of u = 0, is called the moment-generating
function of X.
Note that if the random variable X admits the density f_X, then

θ_X(u) = ∫_R e^{iux} f_X(x) dx,

i.e., the characteristic function is the inverse Fourier transform of the density. Table 3.1
contains some examples of characteristic functions.
Note carefully that, while θX is defined for all u ∈ R, the moment-generating function of
a random variable may be defined only in a subset of the real line, or not defined at all (see
Exercise 3.16). For instance, when X ∈ E(λ) we have

M_X(u) = E[e^{uX}] = λ ∫_0^∞ e^{(u−λ)x} dx = +∞ if u ≥ λ, and = (1 − u/λ)^{−1} if u < λ.

Hence M_X(u) is defined (as a positive function) only for u < λ.
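A quick numerical check of the formula M_X(u) = (1 − u/λ)^{−1} for u < λ (numpy assumed; note that numpy parametrizes the exponential distribution by its mean 1/λ, not by the intensity λ):

```python
import numpy as np

# Monte Carlo check of M_X(u) = (1 - u/lambda)^(-1) for u < lambda.
rng = np.random.default_rng(3)
lam = 2.0
x = rng.exponential(1.0 / lam, size=1_000_000)  # scale = mean = 1/lambda

u = 0.5
mgf_mc = np.exp(u * x).mean()       # estimates E[e^{uX}]
mgf_exact = 1.0 / (1.0 - u / lam)   # closed form, = 4/3 here
```

For u ≥ λ the same Monte Carlo average diverges as the sample size grows, in agreement with the computation above.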
Exercise 3.16. Show that Cauchy random variables do not have a well-defined moment-
generating function.
The characteristic function of a random variable provides a lot of information. In partic-
ular, it determines completely the distribution function of the random variable, as shown in
the following theorem (for the proof, see [6, Sec. 9.5]).

Density        Characteristic function

N(m, σ²)       exp(ium − σ²u²/2)
E(λ)           (1 − iu/λ)^{−1}
χ²(δ)          (1 − 2iu)^{−δ/2}
χ²(δ, β)       (1 − 2iu)^{−δ/2} exp(−βu/(2u + i))

Table 3.1: Examples of characteristic functions

Theorem 3.6. Let X, Y ∈ L¹(Ω). Then θ_X = θ_Y if and only if F_X = F_Y. In particular, if
θ_X = θ_Y and one of the two variables admits a density, the other does too and the densities
are equal.
According to the previous theorem, if we want for instance to prove that a random
variable X is normally distributed, we may try to show that its characteristic function θ_X is
given by θ_X(u) = exp(ium − σ²u²/2), see Table 3.1. Another useful property of characteristic
functions is proved in the following theorem.
Theorem 3.7. Let X_1, . . . , X_N ∈ L¹(Ω) be independent random variables. Then

θ_{X_1+⋯+X_N} = θ_{X_1} ⋯ θ_{X_N}.

Proof. We have

θ_{X_1+⋯+X_N}(u) = E[e^{iu(X_1+⋯+X_N)}] = E[e^{iuX_1} e^{iuX_2} ⋯ e^{iuX_N}].

Using that the variables Y_1 = e^{iuX_1}, . . . , Y_N = e^{iuX_N} are independent (see Theorem 2.1) and
that the expectation of the product of independent random variables is equal to the product
of their expectations (see Theorem 3.1(iv)) we obtain

E[e^{iuX_1} e^{iuX_2} ⋯ e^{iuX_N}] = E[e^{iuX_1}] ⋯ E[e^{iuX_N}] = θ_{X_1}(u) ⋯ θ_{X_N}(u),

which concludes the proof.
As an example of application of the previous theorem, we now show that if X_1, . . . , X_N
are independent normally distributed random variables with expectations m_1, . . . , m_N and
variances σ_1², . . . , σ_N², then the random variable

Y = X_1 + ⋯ + X_N

is normally distributed with mean m and variance σ² given by

m = m_1 + ⋯ + m_N,   σ² = σ_1² + ⋯ + σ_N².   (3.9)

In fact,

θ_{X_1+⋯+X_N}(u) = θ_{X_1}(u) ⋯ θ_{X_N}(u) = e^{ium_1 − σ_1²u²/2} ⋯ e^{ium_N − σ_N²u²/2} = e^{ium − σ²u²/2}.

The right hand side of the previous equation is the characteristic function of a normal
variable with expectation m and variance σ² given by (3.9). Thus Theorem 3.6 implies that
X_1 + ⋯ + X_N ∈ N(m, σ²).
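The stability of the normal family under independent sums can be checked against the empirical characteristic function. The sketch below (numpy assumed, parameters arbitrary) compares E[e^{iuY}] estimated from samples with the closed form exp(ium − σ²u²/2):

```python
import numpy as np

# Empirical characteristic function check: for independent normals,
# X1 + X2 has characteristic function exp(ium - sigma^2 u^2 / 2) with
# m = m1 + m2 and sigma^2 = sigma1^2 + sigma2^2.
rng = np.random.default_rng(4)
n = 1_000_000
x1 = rng.normal(1.0, 0.5, size=n)    # N(1, 1/4)
x2 = rng.normal(-0.5, 1.0, size=n)   # N(-1/2, 1)
y = x1 + x2                          # should be N(1/2, 5/4)

u = 0.8
theta_emp = np.exp(1j * u * y).mean()                   # estimates E[e^{iuY}]
theta_exact = np.exp(1j * u * 0.5 - 0.5 * 1.25 * u**2)  # closed form
err = abs(theta_emp - theta_exact)
```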
Exercise 3.17. Let X_1 ∈ N(m_1, σ_1²), . . . , X_N ∈ N(m_N, σ_N²), N ≥ 2, be independent. Show
that Y = Σ_{k=1}^N (X_k/σ_k)² ∈ χ²(N, β), where β = (m_1/σ_1)² + ⋯ + (m_N/σ_N)² (compare with
Exercise 2.9).
Exercise 3.18. Let X, Y ∈ L¹(Ω) be independent random variables with densities f_X, f_Y.
Show that X + Y has the density

f_{X+Y}(x) = ∫_R f_X(x − y) f_Y(y) dy.

Remark: The right hand side of the previous identity defines the convolution product of
the functions f_X, f_Y.
The characteristic function is also very useful to establish whether two random variables
are independent, as shown in the following exercise.
Exercise 3.19. Let X, Y ∈ L¹(Ω) and define their joint characteristic function as

θ_{X,Y}(u, v) = E[e^{iuX + ivY}],   u, v ∈ R.

Show that X, Y are independent if and only if θ_{X,Y}(u, v) = θ_X(u)θ_Y(v).

3.4 Quadratic variation of stochastic processes


We continue this chapter by discussing the important concept of quadratic variation. We
introduce this concept to measure how “wild” a stochastic process oscillates in time, which
in financial mathematics is a measure of the volatility of an asset price.
Let {X(t)}t≥0 be a stochastic process. A partition of the interval [0, T ] is a set of points
Π = {t0 , t1 , . . . tm } such that
0 = t0 < t1 < t2 < · · · < tm = T.
The size of the partition is given by
kΠk = max (tj+1 − tj ).
j=0,...,m−1

To measure the amount of oscillations of {X(t)}_{t≥0} in the interval [0, T] along the partition
Π, we compute

Q_Π(ω) = Σ_{j=0}^{m−1} (X(t_{j+1}, ω) − X(t_j, ω))².

Note carefully that Q_Π is a random variable and that it depends on the partition. For
example, let {∆(s)}_{s≥0} be the step process

∆(s, ω) = Σ_{k=0}^∞ X_k(ω) I_{[s_k, s_{k+1})}(s).

Then if the partition Π = {0, t_1, . . . , t_m = T} is such that s_{k−1} < t_k < s_k, for all k =
1, 2, . . . , m, we have

Q_Π(ω) = (X_2(ω) − X_1(ω))² + (X_3(ω) − X_2(ω))² + ⋯ + (X_{m+1}(ω) − X_m(ω))².

However if two points in the partition belong to the same interval [s_k, s_{k+1}), the variation
within these two instants of time clearly gives no contribution to the total variation Q_Π.
To define the quadratic variation of the stochastic process {X(t)}_{t≥0}, we compute Q_{Π_n}
along a sequence {Π_n}_{n∈N} of partitions of the interval [0, T] such that ‖Π_n‖ → 0 as n → ∞
and then we take the limit of Q_{Π_n} as n → ∞. Since {Q_{Π_n}}_{n∈N} is a sequence of random
variables, there are several ways to define its limit as n → ∞. The precise definition that we
adopt is that of L²-quadratic variation, in which the limit is taken in the norm ‖·‖₂ defined
in Exercise 3.7.
Definition 3.5. The L²-quadratic variation of the stochastic process {X(t)}_{t≥0} in the
interval [0, T] along the sequence of partitions {Π_n}_{n∈N} is a random variable denoted by
[X, X](T) such that

lim_{n→∞} E[( Σ_{j=0}^{m(n)−1} (X(t_{j+1}^{(n)}) − X(t_j^{(n)}))² − [X, X](T) )²] = 0,

where m(n) + 1 is the number of points in the partition Π_n = {t_0^{(n)}, t_1^{(n)}, t_2^{(n)}, . . . , t_{m(n)−1}^{(n)}, T}.
If the limit in the previous definition does not exist, the quadratic variation cannot be
defined as we did (an alternative definition is possible, but we shall not need it).
Note that the quadratic variation depends in general on the sequence of partitions of
the interval [0, T ] along which it is computed, although this is not reflected in our notation
[X, X](T ). However for several important examples of stochastic processes—and in particular
for all applications considered in these notes—the quadratic variation is independent of the
sequence of partitions. To distinguish this important special case, we shall use the following
(standard) notation:
dX(t)dX(t) = dY (t),
to indicate that the quadratic variation of the stochastic process {X(t)}_{t≥0} in any interval
[0, t] is given by the random variable Y(t), independently of the sequence of partitions of
the interval [0, t] along which it is computed. Note that {Y(t)}_{t≥0} is a stochastic process.
Now we show that if the paths of the stochastic process {X(t)}t≥0 are sufficiently regular,
then its quadratic variation is zero along any sequence of partitions.

Theorem 3.8. Assume that the paths of the stochastic process {X(t)}_{t≥0} satisfy

P(|X(t) − X(s)| ≤ C|t − s|^γ) = 1,   (3.10)

for some positive constant C > 0 and γ > 1/2. Then

dX(t)dX(t) = 0.

Proof. We have

E[( Σ_{j=0}^{m(n)−1} (X(t_{j+1}^{(n)}) − X(t_j^{(n)}))² )²] ≤ C⁴ E[( Σ_{j=0}^{m(n)−1} (t_{j+1}^{(n)} − t_j^{(n)})^{2γ} )²]
  = C⁴ E[( Σ_{j=0}^{m(n)−1} (t_{j+1}^{(n)} − t_j^{(n)})^{2γ−1} (t_{j+1}^{(n)} − t_j^{(n)}) )²].

Now we use that t_{j+1}^{(n)} − t_j^{(n)} ≤ ‖Π_n‖ and Σ_j (t_{j+1}^{(n)} − t_j^{(n)}) = T, so that

E[( Σ_{j=0}^{m(n)−1} (X(t_{j+1}^{(n)}) − X(t_j^{(n)}))² )²] ≤ C⁴ ‖Π_n‖^{2(2γ−1)} T² → 0,   as ‖Π_n‖ → 0.

As a special important case we have that

dt dt = 0.   (3.11)

Next we compute the quadratic variation of Brownian motions.

Theorem 3.9. For a Brownian motion {W(t)}_{t≥0} there holds

dW(t)dW(t) = dt.   (3.12)

Proof. Let

Q_{Π_n}(ω) = Σ_{j=0}^{m(n)−1} (W(t_{j+1}^{(n)}, ω) − W(t_j^{(n)}, ω))²,

where we recall that m(n) + 1 is the number of points in the partition Π_n of [0, T]. We
compute

E[(Q_{Π_n} − T)²] = E[Q_{Π_n}²] + T² − 2T E[Q_{Π_n}].

But

E[Q_{Π_n}] = Σ_{j=0}^{m(n)−1} E[(W(t_{j+1}^{(n)}) − W(t_j^{(n)}))²] = Σ_{j=0}^{m(n)−1} Var[W(t_{j+1}^{(n)}) − W(t_j^{(n)})] = Σ_{j=0}^{m(n)−1} (t_{j+1}^{(n)} − t_j^{(n)}) = T.
Hence we have to prove that

lim_{‖Π_n‖→0} E[Q_{Π_n}²] − T² = 0,

or equivalently (as we have just proved that E[Q_{Π_n}] = T),

lim_{‖Π_n‖→0} Var(Q_{Π_n}) = 0.   (3.13)

Since the increments of a Brownian motion are independent, we have

Var(Q_{Π_n}) = Σ_{j=0}^{m(n)−1} Var[(W(t_{j+1}^{(n)}) − W(t_j^{(n)}))²] = Σ_{j=0}^{m(n)−1} ( E[(W(t_{j+1}^{(n)}) − W(t_j^{(n)}))⁴] − E[(W(t_{j+1}^{(n)}) − W(t_j^{(n)}))²]² ).

Now we use that

E[(W(t_{j+1}^{(n)}) − W(t_j^{(n)}))²] = Var[W(t_{j+1}^{(n)}) − W(t_j^{(n)})] = t_{j+1}^{(n)} − t_j^{(n)},

and, as it follows by Exercise 3.9,

E[(W(t_{j+1}^{(n)}) − W(t_j^{(n)}))⁴] = 3(t_{j+1}^{(n)} − t_j^{(n)})².

We conclude that

Var[Q_{Π_n}] = 2 Σ_{j=0}^{m(n)−1} (t_{j+1}^{(n)} − t_j^{(n)})² ≤ 2‖Π_n‖T → 0,   as ‖Π_n‖ → 0,

which proves (3.13) and thus the theorem.
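Theorem 3.9 is easy to visualize numerically: along a fine partition of [0, T], the sum of squared Brownian increments concentrates around T. A sketch (numpy assumed):

```python
import numpy as np

# Simulation of Theorem 3.9: along a uniform partition of [0, T] with
# mesh T/n, the sum of squared Brownian increments is close to T.
rng = np.random.default_rng(5)
T = 2.0
n_steps = 1_000_000
dt = T / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)  # independent increments

quad_var = np.sum(dW**2)  # approximately T
```

By the variance bound in the proof, the error of this sum is of size √(2‖Π_n‖T), so refining the partition tightens the concentration around T.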


Remark 3.7 (Nowhere differentiability of Brownian motions). Combining Theorem 3.9 and
Theorem 3.8, we conclude that the paths of a Brownian motion {W(t)}_{t≥0} cannot satisfy
the regularity condition (3.10). In fact, while the paths of a Brownian motion are a.s.
continuous by definition, they turn out to be nowhere differentiable, in the sense that
the event {ω ∈ Ω : γ_W^ω ∈ C¹} is a null set. A proof of this can be found for instance in [7].
Finally we need to consider a slight generalization of the concept of quadratic variation.
Definition 3.6. We say that two stochastic processes {X_1(t)}_{t≥0} and {X_2(t)}_{t≥0} have L²-
cross variation [X_1, X_2](T) ∈ R in the interval [0, T] along the sequence of partitions
{Π_n}_{n∈N}, if

lim_{n→∞} E[( Σ_{j=0}^{m(n)−1} (X_1(t_{j+1}^{(n)}) − X_1(t_j^{(n)}))(X_2(t_{j+1}^{(n)}) − X_2(t_j^{(n)})) − [X_1, X_2](T) )²] = 0,

where m(n) + 1 is the number of points in the partition Π_n.

As for the quadratic variation of a stochastic process, we use a special notation to ex-
press that the cross variation of two stochastic processes is independent of the sequence of
partitions along which it is computed. Namely, we write
dX1 (t)dX2 (t) = dY (t),
to indicate that the cross variation [X1 , X2 ](t) equals Y (t) along any sequence of partitions
{Πn }n∈N . The following generalization of Theorem 3.8 is easily established.
Theorem 3.10. Assume that the paths of the stochastic processes {X_1(t)}_{t≥0}, {X_2(t)}_{t≥0}
satisfy

P(|X_1(t) − X_1(s)| ≤ C|t − s|^γ) = 1,   P(|X_2(t) − X_2(s)| ≤ C|t − s|^λ) = 1,

for some positive constants C, γ, λ such that γ + λ > 1. Then dX_1(t)dX_2(t) = 0.
Exercise 3.20. Prove the theorem.
As a special case we find that
dW (t)dt = 0. (3.14)
It is important to memorize the identities (3.11), (3.12) and (3.14), as they will be used
several times in the following chapters.
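The identity (3.14) can also be checked numerically: along a uniform partition of [0, 1], the cross sums Σ_j ΔW Δt shrink as the partition is refined. A sketch (numpy assumed):

```python
import numpy as np

# Simulation of (3.14): the cross variation sums of Brownian increments
# against time increments shrink as the partition of [0, 1] is refined
# (for a uniform mesh the sum equals dt * W(1), so it is of size dt).
rng = np.random.default_rng(6)
T = 1.0
cross_sums = []
for n_steps in (100, 10_000, 1_000_000):
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    cross_sums.append(abs(np.sum(dW * dt)))
```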
Exercise 3.21 (?). Let {W1 (t)}t≥0 , {W2 (t)}t≥0 be two independent Brownian motions. Show
that dW1 (t)dW2 (t) = 0.

3.5 Conditional expectation


Recall that the expectation value E[X] is an estimate on the average value of the random
variable X. This estimate does not depend on the σ-algebra F, nor on any sub-σ-algebra
thereof. However, if some information is known in the form of a σ-algebra G, then one
expects to be able to improve the estimate on the value of X. To quantify this we introduce
the definition of “expected value of X given G”, or conditional expectation, which we denote
by E[X|G]. We want the conditional expectation to verify the following properties:
(i) If X is G-measurable, then it should hold that E[X|G] = X, because the information
provided by G is sufficient to determine X;
(ii) If X is independent of G, then E[X|G] = E[X], because the occurrence of the events
in G does not affect the probability distribution of X;
Note that (i) already indicates that E[X|G] is a random variable. To begin with we define
the conditional expectation of a random variable X with respect to an event A ∈ F. Let’s
assume first that X is the simple random variable
N
X
X= ak IAk ,
k=1

where a_1, . . . , a_N ∈ R and {A_k}_{k=1,...,N} is a family of disjoint subsets of Ω. Let B ∈ F
with P(B) > 0 and let P_B(A_k) = P(A_k ∩ B)/P(B) be the conditional probability of A_k given B, see
Definition 1.4. It is natural to define the conditional expectation of X given the event B as

E[X|B] = Σ_{k=1}^N a_k P_B(A_k).

Moreover, since

X I_B = Σ_{k=1}^N a_k I_{A_k} I_B = Σ_{k=1}^N a_k I_{A_k ∩ B},

we also have the identity E[X|B] = E[X I_B]/P(B). We use the latter identity to define the conditional
expectation given B of general random variables.
expectation given B of general random variables.
Definition 3.7. Let X ∈ L¹(Ω) and B ∈ F. When P(B) > 0 we define the conditional
expectation of X given the event B as

E[X|B] = E[X I_B]/P(B).

When P(B) = 0 we define E[X|B] = E[X].
Note that E[X|B] is a deterministic constant. Next we discuss the concept of conditional
expectation given a σ-algebra G. We first assume that G is generated by a (say, finite)
partition {Ak }k=1,...,M of Ω, see Exercise 1.4. Then it is natural to define
E[X|G] = Σ_{k=1}^M E[X|A_k] I_{A_k}.   (3.15)

Note that E[X|G] is a G-measurable simple function. It will now be shown that (3.15)
satisfies the identity
E[E[X|G]|B] = E[X|B], for all B ∈ G : P(B) > 0. (3.16)
In fact,

P(B) E[E[X|G]|B] = E[E[X|G] I_B] = E[ Σ_{k=1}^M E[X|A_k] I_{A_k} I_B ] = Σ_{k=1}^M E[ E[X|A_k] I_{A_k ∩ B} ]
  = Σ_{k=1}^M E[ (E[X I_{A_k}]/P(A_k)) I_{A_k ∩ B} ] = Σ_{k=1}^M (E[X I_{A_k}]/P(A_k)) E[I_{A_k ∩ B}].

Since {A_1, . . . , A_M} is a partition of Ω, there exists I ⊂ {1, . . . , M} such that B = ∪_{k∈I} A_k;
hence the above sum may be restricted to k ∈ I. Since E[I_{A_k ∩ B}] = E[I_{A_k}] = P(A_k), for k ∈ I,
we obtain

P(B) E[E[X|G]|B] = Σ_{k∈I} E[X I_{A_k}] = E[X I_{∪_{k∈I} A_k}] = E[X I_B],

by which (3.16) follows.
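Formula (3.15) can be made concrete on a finite sample space. The sketch below (numpy assumed, all numbers arbitrary) computes E[X|G] for a σ-algebra generated by a two-set partition and checks the tower property E[E[X|G]] = E[X]:

```python
import numpy as np

# E[X|G] on Omega = {0, 1, 2, 3}, for G generated by the partition
# A_1 = {0, 1}, A_2 = {2, 3}. Values are arbitrary.
p = np.array([0.1, 0.2, 0.3, 0.4])   # the measure P
x = np.array([1.0, 3.0, 2.0, 6.0])   # a random variable X
partition = [np.array([0, 1]), np.array([2, 3])]

cond_exp = np.empty_like(x)
for A in partition:
    # E[X|A] = E[X I_A] / P(A), assigned constantly on A, as in (3.15)
    cond_exp[A] = (p[A] * x[A]).sum() / p[A].sum()

tower_lhs = (p * cond_exp).sum()  # E[E[X|G]]
tower_rhs = (p * x).sum()         # E[X]; the two must agree
```

Note that `cond_exp` is a G-measurable random variable: it is constant on each set of the partition.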


Exercise 3.22. What is the interpretation of (3.16)?
The conditional expectation of a random variable with respect to a general σ-algebra can
be constructed explicitly only in some special cases (see Section 3.7). However an abstract
definition is still possible, which we give after the following theorem.
Theorem 3.11. Let G be a sub-σ-algebra of F and X ∈ L1 (Ω). If Y1 , Y2 ∈ L1 (Ω) are
G-measurable and satisfy

E[Yi |A] = E[X|A], for i = 1, 2 and all A ∈ G : P(A) > 0, (3.17)

then Y1 = Y2 a. s.
Proof. We want to prove that P(B) = 0, where

B = {ω ∈ Ω : Y1 (ω) 6= Y2 (ω)}.

Let B_+ = {Y_1 > Y_2} and assume P(B_+) > 0. Then, by (3.17) and Definition 3.7,

E[(Y_1 − Y_2) I_{B_+}] = E[Y_1 I_{B_+}] − E[Y_2 I_{B_+}] = P(B_+)(E[Y_1|B_+] − E[Y_2|B_+]) = 0.

By Theorem 3.1(iii), this is possible if and only if (Y_1 − Y_2) I_{B_+} = 0 a.s., which entails
P(B_+) = 0. In the same fashion one proves that P(B_−) = 0, where B_− = {Y_1 < Y_2}.
Hence P(B) = P(B_+) + P(B_−) = 0, as claimed.
Theorem 3.12 (and Definition). Let X ∈ L1 (Ω) and G be a sub-σ-algebra of F. There
exists a G-measurable random variable E[X|G] ∈ L1 (Ω) such that

E[E[X|G]|A] = E[X|A], for all A ∈ G : P(A) > 0. (3.18)

The random variable E[X|G], which by Theorem 3.11 is uniquely defined up to a null set, is
called the conditional expectation of X given the σ-algebra G. If G is the σ-algebra
generated by a random variable Y , i.e., G = σ(Y ), we write E[X|G] = E[X|Y ].
Proof. See [21, Appendix B].

Remark 3.8. Following Remark 3.3, we denote by Ẽ[X|G] the conditional expectation of
X in a new probability measure P̃, not necessarily equivalent to P.

We continue this section with a list of properties of the conditional expectation, which
we divide in three theorems.

Theorem 3.13. The conditional expectation of X ∈ L1 (Ω) satisfies the following identities
almost surely:
(i) E[E[X|G]] = E[X];

(ii) If X is G-measurable, then E[X|G] = X;

(iii) If H ⊂ G is a sub-σ-algebra, then

E[E[X|G]|H] = E[X|H];

(iv) Linearity: E[αX + βY |G] = αE[X|G] + βE[Y |G], for all α, β ∈ R and Y ∈ L1 (Ω).

(v) If G consists of trivial events only, then E[X|G] = E[X].


Proof. (i) Replace A = Ω into (3.18). (ii) Note that (3.17) is satisfied by Y1 = E[X|G] and
Y2 = X; hence when X is G-measurable, we have E[X|G] = X a.s., by uniqueness. (iii)
Using (3.18), and since H ⊂ G, the random variables E[E[X|G]|H], E[X|G] and E[X|H]
satisfy

E[E[E[X|G]|H]|A] = E[E[X|G]|A],
E[E[X|G]|A] = E[X|A],
E[E[X|H]|A] = E[X|A],

for all A ∈ H : P(A) > 0. It follows that

E[E[E[X|G]|H]|A] = E[E[X|H]|A]

and thus by uniqueness the claim follows. (iv) The variables Y1 = E[αX + βY |G] and
Y2 = αE[X|G] + βE[Y |G] satisfy (3.17), and so they are equal almost surely. (v) See next
exercise.
Exercise 3.23. Prove the property (v) in Theorem 3.13.
The following theorem collects some properties of the conditional expectation in the
presence of two variables and is given without proof.
Theorem 3.14. Let X, Y ∈ L1 (Ω). Then the following identities holds almost surely:
(i) If X is independent of G, then E[X|G] = E[X];

(ii) If X is G-measurable, then E[XY |G] = XE[Y |G];

(iii) If X ≤ Y then E[X|G] ≤ E[Y |G].


Exercise 3.24 (•). Prove the property (i) in Theorem 3.14 when G is generated by a partition
{A_k}_{k=1,...,M} of Ω, i.e., using (3.15). Prove the property (ii) when Y is a simple
random variable.

Exercise 3.25. Prove the property (iii) in Theorem 3.14.

Exercise 3.26 (•). The purpose of this exercise is to show that the conditional expectation
is the best estimator of a random variable when some information is given in the form of a
sub-σ-algebra. Let X ∈ L1 (Ω) and G ⊆ F be a sub-σ-algebra. Define Err = X − E[X|G].
Show that E[Err] = 0 and
Var[Err] = min Var[Y − X],
Y

where the minimum is taken with respect to all G-measurable random variables Y .

Exercise 3.27 (•). Let X, Y ∈ L1 (Ω) and consider the decomposition

X = E[X|Y ] + (X − E[X|Y ]) = X1 + X2 .

Show that X2 and Y are uncorrelated. Hence any random variable X can be written as the
sum of a Y-measurable random variable and a remainder which is uncorrelated with Y.

Theorem 3.15. Let X, Y : Ω → R be two random variables and G a sub-σ-algebra of F
such that X is G-measurable and Y is independent of G. Then for any measurable function
g : R² → [0, ∞), the function f : R → [0, ∞) defined by

f(x) = E[g(x, Y)]

is measurable and moreover

E[g(X, Y)|G] = f(X).

The previous theorem tells us that, under the stated assumptions, we can compute the
random variable E[g(X, Y)|G] as if X were a constant.

3.6 Martingales
A martingale is a stochastic process which has no tendency to rise or fall. The precise
definition is the following:

Definition 3.8. A stochastic process {M(t)}_{t≥0} is called a martingale relative to the filtration {F(t)}_{t≥0} if it is adapted to {F(t)}_{t≥0}, M(t) ∈ L¹(Ω) for all t ≥ 0, and

E[M(t)|F(s)] = M(s),   for all 0 ≤ s ≤ t.   (3.19)

Hence a stochastic process is a martingale if the information available up to time s does
not help to predict whether the stochastic process will rise or fall after time s.

Remark 3.9. If the condition (3.19) is replaced by E[M(t)|F(s)] ≥ M(s), for all 0 ≤
s ≤ t, the stochastic process {M(t)}_{t≥0} is called a sub-martingale. The interpretation
is that M(t) has no tendency to fall, but our expectation is that it will increase. If the
condition (3.19) is replaced by E[M(t)|F(s)] ≤ M(s), for all 0 ≤ s ≤ t, the stochastic
process {M(t)}_{t≥0} is called a super-martingale. The interpretation is that M(t) has no
tendency to rise, but our expectation is that it will decrease.
Remark 3.10. If we want to emphasize that the martingale property is satisfied with respect
to the probability measure P, we shall say that {M (t)}t≥0 is a P-martingale.
Since the conditional expectation of a random variable X is uniquely determined by (3.18),
then the property (3.19) is satisfied if and only if

E[M (s)IA ] = E[M (t)IA ], for all 0 ≤ s ≤ t and for all A ∈ F(s). (3.20)

In particular, letting A = Ω, we obtain that the expectation of a martingale is constant, i.e.,

E[M (t)] = E[M (0)], for all t ≥ 0. (3.21)

Combining the latter result with Theorem 3.2, we obtain the following sufficient condition
for no arbitrage.
Theorem 3.16. Let a portfolio be given with value {V(t)}_{t≥0}. If there exists a measure
P̃ equivalent to P and a filtration {F(t)}_{t≥0} such that the discounted value of the portfolio
{V*(t)}_{t≥0} is a martingale, then the portfolio is not an arbitrage.

Proof. The assumption is that

Ẽ[D(t)V(t)|F(s)] = D(s)V(s),   for all 0 ≤ s ≤ t.

Hence, by (3.21), Ẽ[D(t)V(t)] = Ẽ[D(0)V(0)] = Ẽ[V(0)]. The result follows by Theorem 3.2.

Theorem 3.17. Let {F(t)}t≥0 be a non-anticipating filtration for the Brownian motion
{W (t)}t≥0 . Then {W (t)}t≥0 is a martingale relative to {F(t)}t≥0 .
Proof. The martingale property for s = t, i.e., E[W (t)|F(t)] = W (t), follows by the fact
W (t) is F(t)-measurable, and thus Theorem 3.13(ii) applies. For 0 ≤ s < t we have

E[W (t)|F(s)] = E[(W (t) − W (s))|F(s)] + E[W (s)|F(s)],


= E[W (t) − W (s)] + W (s) = W (s),

where we used that W (t) − W (s) is independent of F(s) (and so E[(W (t) − W (s))|F(s)] =
E[(W (t) − W (s))] by Theorem 3.14(i)), and the fact that W (s) is F(s)-measurable (and so
E[W (s)|F(s)] = W (s)).
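A numerical consistency check of Theorem 3.17 (numpy assumed): since E[W(t)|F(s)] = W(s), the best predictor of W(t) given W(s) is W(s) itself, so a least-squares regression of sampled values of W(t) on W(s) should return slope ≈ 1 and intercept ≈ 0.

```python
import numpy as np

# Martingale property of Brownian motion: regressing W(t) on W(s)
# recovers the identity map, i.e., E[W(t)|W(s)] = W(s).
rng = np.random.default_rng(7)
s, t, n = 1.0, 2.0, 500_000
w_s = rng.normal(0.0, np.sqrt(s), size=n)              # W(s)
w_t = w_s + rng.normal(0.0, np.sqrt(t - s), size=n)    # W(t)

slope, intercept = np.polyfit(w_s, w_t, 1)  # slope ~ 1, intercept ~ 0
```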
Exercise 3.28. In the ∞-coin tosses experiment, let F_N be the σ-algebra of the events
resolved by the first N tosses. Show that the random walk is a martingale with respect to the
filtration {F_N}_{N∈N}.

Thus Brownian motions are martingales, they have a.s. continuous paths and have
quadratic variation t in the interval [0, t], see Theorem 3.9. The following theorem, which
is a special case of the so-called Lévy characterization of Brownian motion, shows that
these three properties characterize Brownian motions and is often used to prove that a given
stochastic process is a Brownian motion. The proof can be found in [14].
Theorem 3.18. Let {M (t)}t≥0 be a martingale relative to a filtration {F(t)}t≥0 . Assume
that (i) M (0) = 0 a.s., (ii) the paths t → M (t, ω) are a.s. continuous and (iii) dM (t)dM (t) =
dt. Then {M (t)}t≥0 is a Brownian motion and {F(t)}t≥0 a non-anticipating filtration thereof.
Exercise 3.29. Consider the stochastic process {Z(t)}_{t≥0} given by

Z(t) = exp(σW(t) − σ²t/2),

where {W(t)}_{t≥0} is a Brownian motion and σ ∈ R is a constant. Let {F(t)}_{t≥0} be a
non-anticipating filtration for {W(t)}_{t≥0}. Show that {Z(t)}_{t≥0} is a martingale relative to
{F(t)}_{t≥0}.
Exercise 3.30 (•). Let {N(t)}_{t≥0} be a Poisson process generating the filtration {F_N(t)}_{t≥0}.
Show that (i) {N(t)}_{t≥0} is a sub-martingale relative to {F_N(t)}_{t≥0} and (ii) the so-called
compensated Poisson process {N(t) − λt}_{t≥0} is a martingale relative to {F_N(t)}_{t≥0}, where
λ is the rate of the Poisson process (see Definition 2.16).
Exercise 3.31 (•). Let {F(t)}t∈[0,T ] be a filtration and {M (t)}t∈[0,T ] a stochastic process
adapted to {F(t)}t∈[0,T]. Show that {M(t)}t∈[0,T] is a martingale if and only if there exists an
F(T)-measurable random variable H ∈ L¹(Ω) such that
M (t) = E[H|F(t)].
Now assume that {Z(t)}t≥0 is a martingale such that Z(t) > 0 a.s. for all t ≥ 0 and Z(0) = 1. In particular, E[Z(t)] = E[Z(0)] = 1 and therefore, by Theorem 3.3, the map P̃ : F → [0, 1] given by

P̃(A) = E[Z(T)IA],  A ∈ F     (3.22)

is a probability measure equivalent to P, for all T > 0. Note that P̃ depends on T > 0 and P̃ = P for T = 0. The dependence on T is however not reflected in our notation. As usual, the (conditional) expectation in the probability measure P̃ will be denoted Ẽ. The relation between E and Ẽ is revealed in the following theorem.

Theorem 3.19. Let t ∈ [0, T] and let X be an FW(t)-measurable random variable such that Z(t)X ∈ L¹(Ω, P). Then X ∈ L¹(Ω, P̃) and

Ẽ[X] = E[Z(t)X].     (3.23)

Moreover, for all 0 ≤ s ≤ t and for all random variables Y such that Z(t)Y ∈ L¹(Ω, P), there holds

Ẽ[Y|FW(s)] = (1/Z(s)) E[Z(t)Y|FW(s)]  (almost surely).     (3.24)

Proof. As shown in Theorem 3.3, Ẽ[X] = E[Z(T)X]. By Theorem 3.13(i), Theorem 3.14(ii), and the martingale property of {Z(t)}t≥0, we have

E[Z(T)X] = E[E[Z(T)X|FW(t)]] = E[X E[Z(T)|FW(t)]] = E[Z(t)X].

To prove (3.24), recall that the conditional expectation is uniquely defined (up to null sets) by (3.18). Hence the identity (3.24) follows if we show that

Ẽ[Z(s)⁻¹ E[Z(t)Y|FW(s)] IA] = Ẽ[Y IA],

for all A ∈ FW(s). Since IA is FW(s)-measurable, and using (3.23) with X = Z(s)⁻¹ E[Z(t)Y IA|FW(s)] and t = s, we have

Ẽ[Z(s)⁻¹ E[Z(t)Y|FW(s)] IA] = Ẽ[Z(s)⁻¹ E[Z(t)Y IA|FW(s)]] = E[E[Z(t)Y IA|FW(s)]] = E[Z(t)Y IA] = Ẽ[Y IA],

where in the last step we used again (3.23). The proof is complete.

3.7 Markov processes


In this section we introduce another class of stochastic processes, which will play a funda-
mental role in the following chapters.

Definition 3.9. A stochastic process {X(t)}t≥0 is called a Markov process with respect to the filtration {F(t)}t≥0 if it is adapted to {F(t)}t≥0 and if for every measurable function g : R → R such that g(X(t)) ∈ L¹(Ω) for all t ≥ 0, there exists a measurable function fg : [0, ∞) × [0, ∞) × R → R such that

E[g(X(t))|F(s)] = fg(t, s, X(s)),  for all 0 ≤ s ≤ t.     (3.25)

If there exists a measurable function f̃g : [0, ∞) × R → R such that fg(t, s, x) = f̃g(t − s, x), the Markov process is said to be homogeneous. If there exists a measurable function p : [0, ∞) × [0, ∞) × R × R → R such that

fg(t, s, x) = ∫_R g(y) p(t, s, x, y) dy,  for 0 ≤ s < t,     (3.26)

then we call p the transition density of the Markov process.

Thus for a Markov process, the conditional expectation of g(X(t)) at the future time t depends only on the random variable X(s) at time s, and not on the behavior of the process before time s. Note that in the case of a homogeneous Markov process, the transition density, if it exists, has the form p(t, s, x, y) = p∗(t − s, x, y), for some measurable function p∗ : [0, ∞) × R × R → R.

Remark 3.11. We will say that a stochastic process is a P-Markov process if we want to
emphasize that the Markov property holds in the probability measure P.
Exercise 3.32 (•). Show that the function fg (t, s, x) in the right hand side of (3.25) is given
by
fg (t, s, x) = E[g(X(t))|X(s) = x] for all 0 ≤ s ≤ t. (3.27)
Theorem 3.20. Let {X(t)}t≥0 be a Markov process with transition density p(t, s, x, y) relative to the filtration {F(t)}t≥0. Assume X(s) = x ∈ R is a deterministic constant. Then for all t ≥ s, X(t) admits the density fX(t) given by

fX(t)(y) = p(t, s, x, y).

Proof. By the definition of density,

P(X(t) ≤ z) = ∫_{−∞}^{z} fX(t)(y) dy,

see Definition 2.7. Setting X(s) = x in (3.25)-(3.26) we obtain

E[g(X(t))] = ∫_R g(y) p(t, s, x, y) dy.

Choosing g = I(−∞,z], we obtain

P(X(t) ≤ z) = ∫_{−∞}^{z} p(t, s, x, y) dy,

hence fX(t)(y) = p(t, s, x, y).


Theorem 3.21. Let {F(t)}t≥0 be a non-anticipating filtration for the Brownian motion {W(t)}t≥0. Then {W(t)}t≥0 is a homogeneous Markov process relative to {F(t)}t≥0 with transition density p(t, s, x, y) = p∗(t − s, x, y), where

p∗(τ, x, y) = (1/√(2πτ)) e^{−(y−x)²/(2τ)}.     (3.28)

Proof. The statement holds for s = t, with fg(t, t, x) = g(x). For s < t we write

E[g(W(t))|F(s)] = E[g(W(t) − W(s) + W(s))|F(s)] = E[g̃(W(s), W(t) − W(s))|F(s)],

where g̃(x, y) = g(x + y). Since W(t) − W(s) is independent of F(s) and W(s) is F(s)-measurable, we can apply Theorem 3.15. Precisely, letting

fg(t, s, x) = E[g̃(x, W(t) − W(s))],

we have

E[g(W(t))|F(s)] = fg(t, s, W(s)),

which proves that the Brownian motion is a Markov process relative to {F(t)}t≥0. To derive the transition density we use that Y = W(t) − W(s) ∈ N(0, t − s), so that

E[g(x + Y)] = (1/√(2π(t−s))) ∫_R g(x + y) e^{−y²/(2(t−s))} dy = (1/√(2π(t−s))) ∫_R g(y) e^{−(y−x)²/(2(t−s))} dy,

hence

E[g(W(t))|F(s)] = ( ∫_R g(y) p∗(t − s, x, y) dy )|_{x=W(s)},

where p∗ is given by (3.28). This concludes the proof of the theorem.
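The transition density (3.28) can be probed numerically. For the particular choice g(y) = cos(y), the integral ∫ g(y)p∗(τ, x, y) dy equals E[cos(x + Y)] with Y ∈ N(0, τ), which is cos(x)e^{−τ/2} by a standard Gaussian characteristic-function computation. A short deterministic sketch, with the arbitrary values τ = 0.7, x = 0.3:

```python
import numpy as np

tau, x = 0.7, 0.3
# wide uniform grid: the Gaussian density is negligible outside it
y = np.linspace(x - 12.0, x + 12.0, 200_001)
dy = y[1] - y[0]

p_star = np.exp(-(y - x)**2 / (2 * tau)) / np.sqrt(2 * np.pi * tau)
lhs = np.sum(np.cos(y) * p_star) * dy   # ∫ g(y) p*(τ, x, y) dy, numerically
rhs = np.cos(x) * np.exp(-tau / 2)      # closed form of E[cos(x + Y)]
print(lhs, rhs)  # the two values agree
```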

Exercise 3.33. Show that, when p∗ is given by (3.28), the function

u(t, x) = ∫_R g(y) p∗(t − s, x, y) dy     (3.29)

solves the heat equation with initial datum g at time t = s, namely

∂t u = (1/2) ∂x² u,  u(s, x) = g(x),  t > s, x ∈ R.     (3.30)
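A numerical illustration related to Exercise 3.33 (a sketch, not a substitute for the requested computation): the kernel p∗ itself satisfies ∂τ p∗ = ½ ∂x² p∗, which can be checked with central finite differences at an arbitrary point, here (τ, x, y) = (1.0, 0.3, 1.1):

```python
import numpy as np

def p_star(tau, x, y):
    """Gaussian transition density (3.28)."""
    return np.exp(-(y - x)**2 / (2 * tau)) / np.sqrt(2 * np.pi * tau)

tau, x, y, h = 1.0, 0.3, 1.1, 1e-4

# central finite differences for ∂τ p* and ∂x² p*
d_tau = (p_star(tau + h, x, y) - p_star(tau - h, x, y)) / (2 * h)
d_xx = (p_star(tau, x + h, y) - 2 * p_star(tau, x, y) + p_star(tau, x - h, y)) / h**2
print(d_tau, 0.5 * d_xx)  # the two values agree: ∂τ p* = ½ ∂x² p*
```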
Exercise 3.34. Let {F(t)}t≥0 be a non-anticipating filtration for the Brownian motion {W(t)}t≥0. Show that the geometric Brownian motion

S(t) = S(0) e^{σW(t)+αt}

is a homogeneous Markov process in the filtration {F(t)}t≥0 with transition density p(t, s, x, y) = p∗(t − s, x, y), where

p∗(τ, x, y) = (1/(σy√(2πτ))) exp( −(log(y/x) − ατ)²/(2σ²τ) ) Iy>0.     (3.31)

Show also that, when p∗ is given by (3.31), the function v : (s, ∞) × (0, ∞) → R given by

v(t, x) = ∫_R g(y) p∗(t − s, x, y) dy     (3.32)

satisfies

∂t v − (α + σ²/2) x ∂x v − (1/2) σ² x² ∂x² v = 0,  for x > 0, t > s,     (3.33a)
v(s, x) = g(x),  for x > 0.     (3.33b)
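As a numerical cross-check of (3.31) (an illustration with arbitrary parameters, not the requested proof): for the lognormal density p∗ one has ∫ y p∗(τ, x, y) dy = x e^{(α+σ²/2)τ}, the well-known mean of a geometric Brownian motion started at x. A sketch:

```python
import numpy as np

alpha, sigma, tau, x = 0.05, 0.3, 1.0, 2.0

# fine grid on (0, 60]; the lognormal density is negligible beyond it
y = np.linspace(1e-6, 60.0, 2_000_001)
dy = y[1] - y[0]
p_star = np.exp(-(np.log(y / x) - alpha * tau)**2 / (2 * sigma**2 * tau)) \
         / (sigma * y * np.sqrt(2 * np.pi * tau))

mean_numeric = np.sum(y * p_star) * dy          # ∫ y p*(τ, x, y) dy
mean_exact = x * np.exp((alpha + sigma**2 / 2) * tau)
print(mean_numeric, mean_exact)  # both close to x·e^{(α+σ²/2)τ}
```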

The correspondence between Markov processes and PDEs alluded to in the last two
exercises is a general property which will be further discussed later in the notes.

3.A Appendix: Solutions to selected problems
Exercise 3.1. Let X be a binomial random variable. Then

E[X] = Σ_{k=1}^{N} k P(X = k) = Σ_{k=1}^{N} k C(N, k) p^k (1 − p)^{N−k} = (1 − p)^{N−1} p Σ_{k=0}^{N} C(N, k) k (p/(1 − p))^{k−1},

where C(N, k) denotes the binomial coefficient. Now, by the binomial theorem,

Σ_{k=0}^{N} C(N, k) x^k = (1 + x)^N,

for all x ∈ R. Differentiating with respect to x we get

Σ_{k=0}^{N} C(N, k) k x^{k−1} = N(1 + x)^{N−1}.

Letting x = p/(1 − p) in the last identity we find E[X] = Np.
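The identity E[X] = Np can be confirmed directly for small N by summing the definition of the expectation; the values N = 10, p = 0.3 below are arbitrary:

```python
from math import comb

N, p = 10, 0.3
# E[X] = Σ k·C(N,k)·p^k·(1-p)^(N-k) over k = 0, ..., N
mean = sum(k * comb(N, k) * p**k * (1 - p)**(N - k) for k in range(N + 1))
print(mean, N * p)  # both equal 3.0 (up to rounding)
```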

Exercise 3.3. If Y = 0 almost surely, the claim is obvious. Hence we may assume that E[Y²] > 0. Let

Z = X − (E[XY]/E[Y²]) Y.

Then

0 ≤ E[Z²] = E[X²] − E[XY]²/E[Y²],

by which (3.3) follows.

Exercise 3.5. The first and second properties follow by the linearity of the expectation. In fact

Var[αX] = E[α²X²] − E[αX]² = α²E[X²] − α²E[X]² = α²Var[X],

and

Var[X + Y] = E[(X + Y)²] − E[X + Y]² = E[X²] + E[Y²] + 2E[XY] − E[X]² − E[Y]² − 2E[X]E[Y] = Var[X] + Var[Y] + 2Cov(X, Y).

For the third property, let a ∈ R and compute, using 1 and 2,

Var[Y − aX] = a²Var[X] + Var[Y] − 2aCov(X, Y).

Since the variance of a random variable is always non-negative, the parabola y(a) = a²Var[X] + Var[Y] − 2aCov(X, Y) must always lie above the a-axis, or touch it at one single point a = a₀. Hence

Cov(X, Y)² − Var[X]Var[Y] ≤ 0,

which proves the first part of claim 3. Moreover Cov(X, Y)² = Var[X]Var[Y] if and only if there exists a₀ such that Var[−a₀X + Y] = 0, i.e., Y = a₀X + b₀ almost surely, for some constant b₀. Substituting in the definition of covariance, we see that Cov(X, a₀X + b₀) = a₀Var[X], by which the second claim of property 3 follows immediately.

Exercise 3.8. By linearity of the expectation,

E[Wn(t)] = (1/√n) E[M[nt]] = 0,

where we used the fact that E[Xk] = E[Mk] = 0. Since Var[Mk] = k, we obtain

Var[Wn(t)] = [nt]/n.

Since nt ∼ [nt], as n → ∞, then lim_{n→∞} Var[Wn(t)] = t. As to the covariance of Wn(t) and Wn(s) for s ≠ t, we compute

Cov[Wn(t), Wn(s)] = E[Wn(t)Wn(s)] − E[Wn(t)]E[Wn(s)] = E[Wn(t)Wn(s)] = E[(1/√n) M[nt] (1/√n) M[ns]] = (1/n) E[M[nt] M[ns]].     (3.34)

Assume t > s (a similar argument applies to the case t < s). If [nt] = [ns] we have E[M[nt] M[ns]] = Var[M[ns]] = [ns]. If [nt] ≥ 1 + [ns] we have

E[M[nt] M[ns]] = E[(M[nt] − M[ns]) M[ns]] + E[M[ns]²] = E[M[nt] − M[ns]] E[M[ns]] + Var[M[ns]] = [ns],

where we used that the increment M[nt] − M[ns] is independent of M[ns]. Replacing into (3.34) we obtain

Cov[Wn(t), Wn(s)] = [ns]/n.

It follows that lim_{n→∞} Cov[Wn(t), Wn(s)] = s.

Exercise 3.10. We have

E[X] = ∫_R λx e^{−λx} Ix≥0 dx = λ ∫_0^∞ x e^{−λx} dx = 1/λ,

and

Var[X] = E[X²] − E[X]² = ∫_R λx² e^{−λx} Ix≥0 dx − 1/λ² = λ ∫_0^∞ x² e^{−λx} dx − 1/λ² = 2/λ² − 1/λ² = 1/λ².

Exercise 3.11. According to Theorem 3.3, to prove that P̃ is equivalent to P we have to show that E[Z] = 1. Using the density of exponential random variables we have

E[Z] = E[(λ̃/λ) e^{−(λ̃−λ)X}] = (λ̃/λ) ∫_0^∞ e^{−(λ̃−λ)x} λ e^{−λx} dx = 1.

To show that X ∈ E(λ̃) in the probability measure P̃ we compute

P̃(X ≤ x) = E[Z IX≤x] = (λ̃/λ) E[e^{−(λ̃−λ)X} IX≤x] = (λ̃/λ) ∫_0^x e^{−(λ̃−λ)y} λ e^{−λy} dy = 1 − e^{−λ̃x}.

Exercise 3.24. Let G be generated by the partition {Ak}k=1,...,M of Ω. Since X is independent of G, then X and IAk are independent random variables, for all k = 1, . . . , M. It follows that

E[X|G] = Σ_{k=1}^{M} (E[X IAk]/P(Ak)) IAk = Σ_{k=1}^{M} (E[X]E[IAk]/P(Ak)) IAk = Σ_{k=1}^{M} (E[X]P(Ak)/P(Ak)) IAk = E[X] Σ_{k=1}^{M} IAk = E[X].

The proof of (ii) when Y is a simple random variable is a straightforward application of Theorem 3.13(ii).

Exercise 3.26. We have E[Err] = E[X] − E[E[X|G]] = 0, by Theorem 3.13(i). Let Y be G-measurable and set µ = E[Y − X]. Then

Var[Y − X] = E[(Y − X − µ)²] = E[(Y − X − µ + E[X|G] − E[X|G])²]
= E[(E[X|G] − X)² + (Y − µ − E[X|G])² + 2(E[X|G] − X)(Y − µ − E[X|G])]
= Var[Err] + E[α] + 2E[β],

where α = (Y − µ − E[X|G])² and β = (E[X|G] − X)(Y − µ − E[X|G]). As E[α] ≥ 0 we have Var[Y − X] ≥ Var[Err] + 2E[β]. Furthermore, as Y − µ − E[X|G] is G-measurable, then

E[β] = E[E[β|G]] = E[(Y − µ − E[X|G]) E[(E[X|G] − X)|G]] = 0.

Hence Var[Y − X] ≥ Var[Err], for all G-measurable random variables Y.

Exercise 3.27. We have, since E[X₂] = 0,

Cov(X₂, Y) = E[(X₂ − E[X₂])(Y − E[Y])] = E[(X − E[X|Y])(Y − E[Y])]
= E[XY − E[Y]X − Y E[X|Y] + E[Y]E[X|Y]]
= E[ E[XY − E[Y]X − Y E[X|Y] + E[Y]E[X|Y] | Y] ]
= E[Y E[X|Y] − E[Y]E[X|Y] − Y E[X|Y] + E[Y]E[X|Y]] = 0.

Exercise 3.30. First we observe that claim (i) follows from claim (ii). In fact, if the compensated Poisson process is a martingale, then

E[N(t) − λt|FN(s)] = N(s) − λs,  for all 0 ≤ s ≤ t,

by which it follows that

E[N(t)|FN(s)] = N(s) + λ(t − s) ≥ N(s),  for all 0 ≤ s ≤ t.

Hence it remains to prove (ii). We have

E[N (t) − λt|FN (s)] = E[N (t) − N (s) + N (s) − λt|FN (s)]
= E[N (t) − N (s)|FN (s)] + E[N (s)|FN (s)] − λt
= E[N (t) − N (s)] + N (s) − λt = λ(t − s) + N (s) − λt = N (s) − λs.

Exercise 3.31. If {M(t)}t∈[0,T] is a martingale, then E[M(T)|F(t)] = M(t), hence we can pick H = M(T). Vice versa, by the iterative property of the conditional expectation, the process E[H|F(t)] satisfies, for all 0 ≤ s ≤ t,

E[E[H|F(t)]|F(s)] = E[H|F(s)],

hence it is a martingale.

Exercise 3.32. Taking the conditional expectation of both sides of (3.25) with respect to
the event {X(s) = x} gives (3.27).

Chapter 4

Stochastic calculus

Throughout this chapter we assume that the probability space (Ω, F, P) and the Brownian
motion {W (t)}t≥0 are given. Moreover we denote by {F(t)}t≥0 a non-anticipating filtration
for the Brownian motion, e.g., F(t) = FW (t) (see Definition 2.15).

4.1 Introduction
So far we have studied in detail only one example of stochastic process, namely the Brownian motion {W(t)}t≥0. In this chapter we define several other processes which are naturally derived from {W(t)}t≥0 and which in particular are adapted to {F(t)}t≥0. To begin with, if f : [0, ∞) × R → R is a measurable function, then we can introduce the stochastic processes

{f(t, W(t))}t≥0,  {∫_0^t f(s, W(s)) ds}t≥0.

Note that the integral in the second stochastic process is the standard Lebesgue integral in the s-variable. It is well-defined for instance when f is a continuous function.

The next class of stochastic processes that we want to consider are those obtained by integrating along the paths of a Brownian motion, i.e., we want to give sense to the integral

I(t) = ∫_0^t X(s) dW(s),     (4.1)

where {X(t)}t≥0 is a stochastic process adapted to {F(t)}t≥0. For our purposes we need to give a meaning to I(t) when {X(t)}t≥0 has continuous paths a.s. (e.g., X(t) = W(t)). The problem now is that the integral ∫ X(t) dg(t) is well-defined for continuous functions X (in the Riemann-Stieltjes sense) only when g is of bounded variation. As shown in Exercise 3.14, the paths of the Brownian motion are not of bounded variation, hence we have to find another way to define (4.1). We begin in the next section by assuming that {X(t)}t≥0 is a step process. Then we shall extend the definition to stochastic processes {X(t)}t≥0 such that

{X(t)}t≥0 is {F(t)}t≥0-adapted and E[∫_0^T X(t)² dt] < ∞, for all T > 0.     (4.2)

We denote by L2 the family of stochastic processes satisfying (4.2). The integral (4.1) can
be defined for more general processes than those in the class L2 , as we briefly discuss in
Theorem 4.4.

4.2 The Itô integral of step processes


Given 0 = t0 < t1 < · · · < tk < tk+1 < . . . and a sequence X1, X2, . . . of random variables such that, for all j ∈ N, Xj ∈ L²(Ω) and Xj is F(tj)-measurable, consider the {F(t)}t≥0-adapted step process

∆(t) = Σ_{j=0}^{∞} ∆(tj) I[tj,tj+1)(t),  ∆(tj) = Xj.     (4.3)

Note that {∆(t)}t≥0 ∈ L², by the assumption Xj ∈ L²(Ω), for all j ∈ N. If we had to integrate ∆(t) along a stochastic process {Y(t)}t≥0 with differentiable paths, we would have, assuming t ∈ (tk, tk+1),

∫_0^t ∆(s) dY(s) = ∫_0^t Σ_{j=0}^{∞} ∆(tj) I[tj,tj+1)(s) dY(s) = Σ_{j=0}^{k−1} ∆(tj) ∫_{tj}^{tj+1} dY(s) + ∆(tk) ∫_{tk}^{t} dY(s)
= Σ_{j=0}^{k−1} ∆(tj)(Y(tj+1) − Y(tj)) + ∆(tk)(Y(t) − Y(tk)).

The second line makes sense also for stochastic processes {Y(t)}t≥0 whose paths are nowhere differentiable, and thus in particular for the Brownian motion. We then introduce the following definition.

Definition 4.1. The Itô integral over the interval [0, t] of a step process {∆(t)}t≥0 ∈ L² is given by

I(t) = ∫_0^t ∆(s) dW(s) = Σ_{j=0}^{k−1} ∆(tj)(W(tj+1) − W(tj)) + ∆(tk)(W(t) − W(tk)),

where tk is such that tk ≤ t < tk+1.


Note that {I(t)}t≥0 is {F (t)}t≥0 -adapted stochastic process adapted with a.s. continuous
paths (in fact, the only dependence on the t variable is through W (t)). The following theorem
collects some other important properties of the Itô integral of a step process.
Theorem 4.1. The Itô integral of a step process {∆(t)}t≥0 ∈ L² satisfies the following properties.

(i) Linearity: for every pair of {F(t)}t≥0-adapted step processes {∆1(t)}t≥0, {∆2(t)}t≥0 and real constants c1, c2 ∈ R there holds

∫_0^t (c1∆1(s) + c2∆2(s)) dW(s) = c1 ∫_0^t ∆1(s) dW(s) + c2 ∫_0^t ∆2(s) dW(s).

(ii) Martingale property: the stochastic process {I(t)}t≥0 is a martingale in the filtration {F(t)}t≥0. In particular, E[I(t)] = E[I(0)] = E[0] = 0.

(iii) Quadratic variation: the quadratic variation of the stochastic process {I(t)}t≥0 on the interval [0, T] is independent of the sequence of partitions along which it is computed and is given by

[I, I](T) = ∫_0^T ∆²(s) ds.

(iv) Itô's isometry: E[I²(t)] = E[∫_0^t ∆²(s) ds], for all t ≥ 0.
Proof. The proof of (i) is straightforward. For the remaining claims, see the following theorems in [21]: Theorem 4.2.1 (martingale property), Theorem 4.2.2 (Itô's isometry), Theorem 4.2.3 (quadratic variation). Here we present the proof of (ii). First we remark that the condition I(t) ∈ L¹(Ω), for all t ≥ 0, follows easily by the assumption that ∆(tj) = Xj ∈ L²(Ω), for all j ∈ N, and the Schwarz inequality. Hence we have to prove that

E[I(t)|F(s)] = I(s),  for all 0 ≤ s ≤ t.

There are two possibilities: (1) either s, t ∈ [tk, tk+1), for some k ∈ N, or (2) there exists l < k such that s ∈ [tl, tl+1) and t ∈ [tk, tk+1). We assume that (2) holds, the proof in the case (1) being simpler. We write

I(t) = Σ_{j=0}^{l−1} ∆(tj)(W(tj+1) − W(tj)) + ∆(tl)(W(tl+1) − W(tl)) + Σ_{j=l+1}^{k−1} ∆(tj)(W(tj+1) − W(tj)) + ∆(tk)(W(t) − W(tk))
= I(tl+1) + ∫_{tl+1}^{t} ∆(u) dW(u).

Taking the conditional expectation of I(tl+1) we obtain

E[I(tl+1)|F(s)] = Σ_{j=0}^{l−1} E[∆(tj)(W(tj+1) − W(tj))|F(s)] + E[∆(tl)(W(tl+1) − W(tl))|F(s)].

As tl ≤ s, all random variables in the sum in the right hand side of the latter identity are F(s)-measurable. Hence, by Theorem 3.13(ii),

Σ_{j=0}^{l−1} E[∆(tj)(W(tj+1) − W(tj))|F(s)] = Σ_{j=0}^{l−1} ∆(tj)(W(tj+1) − W(tj)).

Similarly,

E[∆(tl)(W(tl+1) − W(tl))|F(s)] = E[∆(tl)W(tl+1)|F(s)] − E[∆(tl)W(tl)|F(s)]
= ∆(tl)E[W(tl+1)|F(s)] − ∆(tl)W(tl)
= ∆(tl)W(s) − ∆(tl)W(tl),

where for the last equality we used that {W(t)}t≥0 is a martingale in the filtration {F(t)}t≥0. Hence

E[I(tl+1)|F(s)] = Σ_{j=0}^{l−1} ∆(tj)(W(tj+1) − W(tj)) + ∆(tl)(W(s) − W(tl)) = I(s).

To conclude the proof we have to show that

E[∫_{tl+1}^{t} ∆(u) dW(u)|F(s)] = Σ_{j=l+1}^{k−1} E[∆(tj)(W(tj+1) − W(tj))|F(s)] + E[∆(tk)(W(t) − W(tk))|F(s)] = 0.

To prove this, we first note that, as before,

E[∆(tj)(W(tj+1) − W(tj))|F(tj)] = ∆(tj)W(tj) − ∆(tj)W(tj) = 0.

Moreover, since F(s) ⊂ F(tj), for j = l + 1, . . . , k − 1, using Theorem 3.13(iii),

E[∆(tj)(W(tj+1) − W(tj))|F(s)] = E[E[∆(tj)(W(tj+1) − W(tj))|F(tj)]|F(s)] = E[0|F(s)] = 0.

In the same fashion, since F(s) ⊂ F(tk), we have

E[∆(tk)(W(t) − W(tk))|F(s)] = E[E[∆(tk)(W(t) − W(tk))|F(tk)]|F(s)] = 0.

Next we show that any stochastic process in L² can be approximated, in a suitable sense, by step processes.

Theorem 4.2. Let {X(t)}t≥0 ∈ L². Then for all T > 0 there exists a sequence of step processes {{∆Tn(t)}t≥0}n∈N such that ∆Tn(t) ∈ L² for all n ∈ N and

lim_{n→∞} E[∫_0^T |∆Tn(t) − X(t)|² dt] = 0.     (4.4)

Proof. For simplicity we argue under the stronger assumption that the stochastic process {X(t)}t≥0 is bounded and has continuous paths, namely

ω → X(t, ω) is bounded in Ω, for all t ≥ 0,
t → X(t, ω) is continuous for all ω ∈ Ω and t ≥ 0.

Now consider the partition of [0, T] given by

0 = t0(n) < t1(n) < · · · < tn(n) = T,  tj(n) = jT/n,

and define

∆Tn(t) = Σ_{k=0}^{n−1} X(tk(n)) I[tk(n),tk+1(n))(t),  t ≥ 0,

see Figure 4.1. Let us show that {∆Tn(t)}t≥0 is adapted to {F(t)}t≥0. This is obvious for t ≥ T

Figure 4.1: A step process approximating a general stochastic process

(since the step process is identically zero for t ≥ T). For t ∈ [0, T) we have ∆Tn(t) = X(tk(n)) for t ∈ [tk(n), tk+1(n)), hence

F∆Tn(t) = FX(tk(n)) ⊂ F(tk(n)) ⊂ F(t),

where the first inclusion holds because {X(t)}t≥0 is adapted to {F(t)}t≥0 and the second because tk(n) < t. Moreover,

lim_{n→∞} ∆Tn(t) = X(t),  for all ω ∈ Ω,

by the assumed continuity of the paths of {X(t)}t≥0. For the next step we use the dominated convergence theorem, see Remark 3.2. Since ∆Tn(t) and X(t) are bounded on [0, T] × Ω, there exists a constant CT such that |∆Tn(t) − X(t)|² ≤ CT. Hence we may move the limit on the left hand side of (4.4) across the expectation and integral operators and conclude that

lim_{n→∞} E[∫_0^T |∆Tn(t) − X(t)|² dt] = E[∫_0^T lim_{n→∞} |∆Tn(t) − X(t)|² dt] = 0,

as claimed.

4.3 Itô’s integral of general stochastic processes


The Itô integral of a general stochastic process is defined as the limit of the Itô integral along a sequence of approximating step processes (in the sense of Theorem 4.2). The precise definition is the following.

Theorem 4.3 (and Definition). Let {X(t)}t≥0 ∈ L², T > 0 and {{∆Tn(t)}t≥0}n∈N be a sequence of L²-step processes converging to {X(t)}t≥0 in the sense of (4.4). Let

In(T) = ∫_0^T ∆Tn(s) dW(s).

Then there exists a random variable I(T) such that

‖In(T) − I(T)‖₂ := √(E[|In(T) − I(T)|²]) → 0,  as n → ∞.

The random variable I(T) is independent of the sequence of L²-step processes converging to {X(t)}t≥0. I(T) is called Itô's integral of {X(t)}t≥0 on the interval [0, T] and denoted by

I(T) = ∫_0^T X(s) dW(s).

Proof. By Itô's isometry,

E[|In(T) − Im(T)|²] = E[∫_0^T |∆Tn(s) − ∆Tm(s)|² ds].

We have

E[∫_0^T |∆Tn(s) − ∆Tm(s)|² ds] ≤ 2E[∫_0^T |∆Tn(s) − X(s)|² ds] + 2E[∫_0^T |∆Tm(s) − X(s)|² ds] → 0  as n, m → ∞.

It follows that {In(T)}n∈N is a Cauchy sequence in the norm ‖·‖₂. As mentioned in Remark 3.5, the norm ‖·‖₂ is complete, i.e., Cauchy sequences converge. This proves the existence of I(T) such that ‖In(T) − I(T)‖₂ → 0. To prove that the limit is the same along any sequence of L²-step processes converging to {X(t)}t≥0, assume that {{∆n(t)}t≥0}n∈N, {{∆̃n(t)}t≥0}n∈N are two such sequences and denote

In(T) = ∫_0^T ∆n(s) dW(s),  Ĩn(T) = ∫_0^T ∆̃n(s) dW(s).

Then, using (i), (iv) in Theorem 4.1, we compute

E[(In(T) − Ĩn(T))²] = E[(∫_0^T (∆n(s) − ∆̃n(s)) dW(s))²] = E[∫_0^T |∆n(s) − ∆̃n(s)|² ds]
≤ 2E[∫_0^T |∆n(s) − X(s)|² ds] + 2E[∫_0^T |∆̃n(s) − X(s)|² ds] → 0,  as n → ∞,

which proves that In(T) and Ĩn(T) have the same limit. This completes the proof of the theorem.

As a way of example, we compute the Itô integral of the Brownian motion. We claim that, for all T > 0,

∫_0^T W(t) dW(t) = W²(T)/2 − T/2.     (4.5)

To prove the claim, we approximate the Brownian motion by the sequence of step processes introduced in the proof of Theorem 4.2. Hence we define

∆Tn(t) = Σ_{j=0}^{n−1} W(jT/n) I[jT/n,(j+1)T/n)(t).

By definition

In(T) = ∫_0^T ∆Tn(t) dW(t) = Σ_{j=0}^{n−1} W(jT/n)[W((j+1)T/n) − W(jT/n)].

To simplify the notation we let Wj = W(jT/n). Hence our goal is to prove

E[( Σ_{j=0}^{n−1} Wj(Wj+1 − Wj) − W²(T)/2 + T/2 )²] → 0,  as n → ∞.     (4.6)

We prove below that the sum within the expectation can be rewritten as

Σ_{j=0}^{n−1} Wj(Wj+1 − Wj) = (1/2) W(T)² − (1/2) Σ_{j=0}^{n−1} (Wj+1 − Wj)².     (4.7)

Hence (4.6) is equivalent to

(1/4) E[( Σ_{j=0}^{n−1} (Wj+1 − Wj)² − T )²] → 0,  as n → ∞,

which holds true by the already proven fact that [W, W](T) = T, see Theorem 3.9. It remains to establish (4.7). Since W(T) = Wn, we have

(1/2) W(T)² − (1/2) Σ_{j=0}^{n−1} (Wj+1 − Wj)² = (1/2) Wn² − (1/2) Σ_{j=0}^{n−1} Wj+1² − (1/2) Σ_{j=0}^{n−1} Wj² + Σ_{j=0}^{n−1} Wj Wj+1
= −(1/2) Σ_{j=0}^{n−2} Wj+1² − (1/2) Σ_{j=1}^{n−1} Wj² + Σ_{j=1}^{n−1} Wj Wj+1
= −Σ_{j=1}^{n−1} Wj² + Σ_{j=1}^{n−1} Wj Wj+1 = Σ_{j=1}^{n−1} Wj(Wj+1 − Wj)
= Σ_{j=0}^{n−1} Wj(Wj+1 − Wj).

The proof of (4.5) is complete.
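Formula (4.5) can be observed on a single simulated path: the left-endpoint sums Σ Wj(Wj+1 − Wj) approach W(T)²/2 − T/2 as the mesh shrinks, the discrepancy being exactly (T − Σ(ΔW)²)/2 by (4.7). A minimal sketch with one seeded path:

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 1.0, 100_000

dW = np.sqrt(T / n) * rng.standard_normal(n)   # Brownian increments
W = np.concatenate(([0.0], np.cumsum(dW)))     # W(0), W(T/n), ..., W(T)

ito_sum = np.sum(W[:-1] * dW)                  # Σ W_j (W_{j+1} - W_j)
exact = W[-1]**2 / 2 - T / 2                   # W(T)²/2 - T/2
print(ito_sum, exact)  # close: the gap is (T - Σ(ΔW)²)/2, which vanishes as n → ∞
```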

Exercise 4.1. Use the definition of Itô's integral to prove that

T W(T) = ∫_0^T W(t) dt + ∫_0^T t dW(t).     (4.8)

The Itô integral can be defined under weaker assumptions on the integrand stochastic process than those considered so far. As this fact will be important in the following sections, it is worth discussing it briefly. Let M² denote the set of {F(t)}t≥0-adapted stochastic processes {X(t)}t≥0 such that ∫_0^T X(t)² dt is bounded a.s. for all T > 0 (of course, L² ⊂ M²).

Theorem 4.4 (and Definition). For every process {X(t)}t≥0 ∈ M² and T > 0 there exists a sequence of step processes {{∆Tn(t)}t≥0}n∈N ⊂ M² such that

lim_{n→∞} ∫_0^T |X(s) − ∆Tn(s)|² ds = 0  a.s.

and

∫_0^T ∆Tn(t) dW(t)

converges in probability as n → ∞. The limit is independent of the sequence of step processes converging to {X(t)}t≥0 and is called the Itô integral of the process {X(t)}t≥0 in the interval [0, T]. If {X(t)}t≥0 ∈ L², the Itô integral just defined coincides (a.s.) with the one defined in Theorem 4.3.
For the proof of the previous theorem, see [1, Sec. 4.4]. We remark that Theorem 4.4 implies that all {F(t)}t≥0-adapted stochastic processes with a.s. continuous paths are Itô integrable. In fact, if {X(t)}t≥0 has a.s. continuous paths, then for all T > 0 there exists CT(ω) such that sup_{t∈[0,T]} |X(t, ω)| ≤ CT(ω) a.s. Hence

∫_0^T |X(s)|² ds ≤ T CT²(ω),  a.s.,

and thus Theorem 4.4 applies. The case of stochastic processes with a.s. continuous paths covers all the applications in the following chapters, hence we shall restrict to it from now on.

Definition 4.2. We define C⁰ to be the space of all {F(t)}t≥0-adapted stochastic processes {X(t)}t≥0 with a.s. continuous paths.

In particular, if {X(t)}t≥0, {Y(t)}t≥0 ∈ C⁰, then for all continuous functions f the process {f(t, X(t), Y(t))}t≥0 belongs to C⁰ and thus it is Itô integrable.
The properties listed in Theorem 4.1 carry over to the Itô integral of a general stochastic process. For easy reference, we rewrite these properties in the following theorem.

Theorem 4.5. Let {X(t)}t≥0 ∈ C⁰. Then the Itô integral

I(t) = ∫_0^t X(s) dW(s)     (4.9)

satisfies the following properties for all t ≥ 0.

(0) {I(t)}t≥0 ∈ C⁰. If {X(t)}t≥0 ∈ L², then {I(t)}t≥0 ∈ L².

(i) Linearity: For all stochastic processes {X1(t)}t≥0, {X2(t)}t≥0 ∈ C⁰ and real constants c1, c2 ∈ R there holds

∫_0^t (c1X1(s) + c2X2(s)) dW(s) = c1 ∫_0^t X1(s) dW(s) + c2 ∫_0^t X2(s) dW(s).

(ii) Martingale property: If {X(t)}t≥0 ∈ L², the stochastic process {I(t)}t≥0 is a martingale in the filtration {F(t)}t≥0. In particular, E[I(t)] = E[I(0)] = 0, for all t ≥ 0.

(iii) Quadratic variation: For all T > 0, the quadratic variation of the stochastic process {I(t)}t≥0 on the interval [0, T] is independent of the sequence of partitions along which it is computed and is given by

[I, I](T) = ∫_0^T X²(s) ds.     (4.10)

(iv) Itô's isometry: If {X(t)}t≥0 ∈ L², then Var[I(t)] = E[I²(t)] = E[∫_0^t X²(s) ds], for all t ≥ 0.
Proof of (ii). By (iv) and the Schwarz inequality, E[|I(t)|] ≤ √(E[I²(t)]) < ∞. According to (3.20), it now suffices to show that

E[I(t)IA] = E[I(s)IA],  for all A ∈ F(s).

Let {{In(t)}t≥0}n∈N be a sequence of Itô integrals of step processes which converges to {I(t)}t≥0 in L²(Ω), uniformly on compact intervals of time (see Theorem 4.3). Since {In(t)}t≥0 is a martingale for each n ∈ N, see Theorem 4.1, then

E[In(t)IA] = E[In(s)IA],  for all A ∈ F(s).

Hence the claim follows if we show that E[In(t)IA] → E[I(t)IA], for all t ≥ 0. Using the Schwarz inequality (3.3), we have

E[(In(t) − I(t))IA] ≤ √(E[(In(t) − I(t))²] E[IA²]) ≤ ‖In(t) − I(t)‖₂ √(P(A)) ≤ ‖In(t) − I(t)‖₂ → 0,  as n → ∞,

and the claim follows.

Remark 4.1. Note carefully that the martingale property (ii) requires {X(t)}t≥0 ∈ L². The Itô integral of a stochastic process in C⁰\L² is not a martingale in general (although it is a local martingale, see [1]).
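Itô's isometry (iv) can be illustrated for X(t) = W(t), where both sides are computable in closed form: E[(∫_0^T W dW)²] = E[∫_0^T W(t)² dt] = T²/2 (the left side via (4.5) and E[W(T)⁴] = 3T²). A hedged Monte Carlo sketch with arbitrary discretization parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_steps, n_paths = 1.0, 200, 20_000
dt = T / n_steps

dW = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
W = np.cumsum(dW, axis=1) - dW   # left endpoints W(t_j), starting at W(0) = 0

lhs = np.mean(np.sum(W * dW, axis=1)**2)   # E[(∫ W dW)²], discretized
rhs = np.mean(np.sum(W**2, axis=1) * dt)   # E[∫ W² dt], discretized
print(lhs, rhs, T**2 / 2)  # all three values are close
```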

Exercise 4.2. Let {X(t)}t≥0 ∈ L². Show that the double Itô integral

J(t) = ∫_0^t ( ∫_0^s X(τ) dW(τ) ) dW(s),  t ≥ 0,

is well defined. Write down the properties in Theorem 4.5 for J(t).

Exercise 4.3. Prove the following generalization of Itô's isometry. Let {X(t)}t≥0, {Y(t)}t≥0 ∈ C⁰ ∩ L² and denote by IX(t), IY(t) their Itô integrals over the interval [0, t]. Then

Cov(IX(t), IY(t)) = E[∫_0^t X(s)Y(s) ds].

We now state without proof the martingale representation theorem, which asserts that any martingale is an Itô integral (i.e., the converse of claim (ii) in Theorem 4.5 holds). For the proof see Theorem 4.3.4 in [17].

Theorem 4.6. Let {M(t)}t≥0, with M(t) ∈ L²(Ω) for all t ≥ 0, be a martingale relative to the filtration {FW(t)}t≥0. Then there exists a stochastic process {Γ(t)}t≥0 ∈ L² such that

M(t) = M(0) + ∫_0^t Γ(s) dW(s),  for all t ≥ 0.

Remark 4.2. Note carefully that the filtration used in the martingale representation theorem must be the one generated by the Brownian motion. Theorem 4.6 will be used in Chapter 6 to show the existence of hedging portfolios of European derivatives, see Theorem 6.2.
We conclude this section by introducing the "differential notation" for stochastic integrals. Instead of (4.9), we write

dI(t) = X(t)dW(t).

For instance, the identities (4.5), (4.8) are also expressed as

d(W²(t)) = dt + 2W(t)dW(t),  d(tW(t)) = W(t)dt + tdW(t).

The quadratic variation (4.10) is expressed also as

dI(t)dI(t) = X²(t)dt.

Note that this notation is in agreement with the one already introduced in Section 3.4, namely

dI(t)dI(t) = d( ∫_0^t X²(s) ds ) = X²(t)dt.

The differential notation is very useful to provide informal proofs in stochastic calculus. For instance, using dI(t) = X(t)dW(t) and dW(t)dW(t) = dt, see (3.12), we obtain the following simple "proof" of Theorem 4.5(iii):

dI(t)dI(t) = X(t)dW(t)X(t)dW(t) = X²(t)dW(t)dW(t) = X²(t)dt.

4.4 Diffusion processes

Now that we know how to integrate along the paths of a Brownian motion, we can define a new class of stochastic processes.

Definition 4.3. Given {α(t)}t≥0, {σ(t)}t≥0 ∈ C⁰, the stochastic process {X(t)}t≥0 ∈ C⁰ given by

X(t) = X(0) + ∫_0^t σ(s) dW(s) + ∫_0^t α(s) ds,  t ≥ 0     (4.11)

is called a diffusion process with rate of quadratic variation {σ²(t)}t≥0 and drift {α(t)}t≥0.

We denote diffusion processes also as

dX(t) = σ(t)dW(t) + α(t)dt.     (4.12)

Note that

dX(t)dX(t) = σ²(t)dW(t)dW(t) + α²(t)dtdt + 2σ(t)α(t)dW(t)dt

and thus, by (3.11), (3.12) and (3.14), we obtain

dX(t)dX(t) = σ²(t)dt,

which means that the quadratic variation of the diffusion process (4.12) is given by

[X, X](t) = ∫_0^t σ²(s) ds,  t ≥ 0.

Thus the stochastic process {σ²(t)}t≥0 measures the rate at which quadratic variation accumulates in time in the diffusion process {X(t)}t≥0. Furthermore, assuming {σ(t)}t≥0 ∈ L², we have

E[∫_0^t σ(s) dW(s)] = 0.

Hence the term ∫_0^t α(s) ds is the only one contributing to the evolution of the average of {X(t)}t≥0, which is the reason to call α(t) the drift of the diffusion process (if α = 0 and {σ(t)}t≥0 ∈ L², the diffusion process is a martingale, as follows by Theorem 4.5(ii)).

Finally, the integration along the paths of the diffusion process (4.12) is defined as

∫_0^t Y(s) dX(s) := ∫_0^t Y(s)σ(s) dW(s) + ∫_0^t Y(s)α(s) ds,     (4.13)

for all {Y(t)}t≥0 ∈ C⁰.
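A diffusion process can be simulated with the standard Euler(-Maruyama) discretization of (4.12), a technique not introduced in these notes but convenient for illustration. With X(0) = 0 and deterministic coefficients, X(T) is Gaussian with mean ∫_0^T α(s) ds and variance ∫_0^T σ²(s) ds, which a seeded Monte Carlo run can confirm. The choices α(t) = t and σ(t) = 1 + t below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
T, n_steps, n_paths = 1.0, 200, 20_000
dt = T / n_steps
t = np.arange(n_steps) * dt           # left endpoint of each step

alpha = lambda s: s                   # drift, arbitrary example
sigma = lambda s: 1.0 + s             # volatility, arbitrary example

dW = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
# Euler step X(t+dt) = X(t) + σ(t)ΔW + α(t)Δt, accumulated over all steps
X_T = np.sum(sigma(t) * dW + alpha(t) * dt, axis=1)

print(X_T.mean(), X_T.var())  # ≈ ∫α ds = 1/2 and ∫σ² ds = 7/3
```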

4.4.1 The product rule in stochastic calculus
Recall that if f, g : R → R are two differentiable functions, the product (or Leibnitz) rule of
ordinary calculus states that
(f g)0 = f 0 g + f g 0 ,
and thus Z t
f g(t) = f g(0) + (g(s)df (s) + f (s)dg(s)).
0
Can this rule be true in stochastic calculus, i.e., when f and g are general diffusion processes?
The answer is clearly no. In fact, letting for instance f (t) = g(t) = W (t), Leibnitz’s rule give
us the relation d(W 2 (t)) = 2W (t)dW (t), while we have seen before that the correct formula
in Itô’s calculus is d(W 2 (t)) = 2W (t)dW (t) + t. The correct product rule in Itô’s calculus is
the following.
Theorem 4.7. Let {X1 (t)}t≥0 and {X2 (t)}t≥0 be the diffusion processes
dXi (t) = σi (t)dW (t) + θi (t)dt.
Then {X1 (t)X2 (t)}t≥0 is the diffusion process given by
d(X1 (t)X2 (t)) = X2 (t)dX1 (t) + X1 (t)dX2 (t) + σ1 (t)σ2 (t)dt. (4.14)
Exercise 4.4 (•). Prove the theorem in the case that αi and σi are deterministic constants
and Xi (0) = 0, for i = 1, 2.
Recall that the correct way to interpret (4.14) is
Z t Z t Z t
X1 (t)X2 (t) = X1 (0)X2 (0) + X2 (s)dX1 (s) + X1 (s)dX2 (s) + σ1 (s)σ2 (s)ds, (4.15)
0 0 0

where the integrals along the paths of the processes {Xi (t)}t≥0 are defined as in (4.13). Note
that all integrals in (4.15) are well-defined, since the integrand stochastic processes have a.s.
continuous paths. We also remark that, since
dX1 (t)dX2 (t) = (σ1 (t)dW (t) + α1 (t)dt)(σ2 (t)dW (t) + α2 (t)dt)
= σ1 (t)σ2 (t)dW (t)dW (t) + (α1 (t)σ2 (t) + α2 (t)σ1 (t))dW (t)dt + α1 (t)α2 (t)dtdt
= σ1 (t)σ2 (t)dt,
then we may rewrite (4.14) as
d(X1 (t)X2 (t)) = X2 (t)dX1 (t) + X1 (t)dX2 (t) + dX1 (t)dX2 (t), (4.16)
which is somehow easier to remember. Going back to the examples considered in Section 4.3,
the Itô product rule gives
d(W 2 (t)) = W (t)dW (t) + W (t)dW (t) + dW (t)dW (t) = 2W (t)dW (t) + dt,
d(tW (t)) = tdW (t) + W (t)dt + dW (t)dt = tdW (t) + W (t)dt,
in agreement with our previous calculations, see (4.5) and (4.8).
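The two identities above lend themselves to a quick numerical sanity check. The sketch below (not part of the original notes; step count and horizon are arbitrary) approximates the Itô integrals by left-point Riemann sums along one simulated Brownian path and verifies the integrated forms of (4.5) and (4.8); both discrepancies shrink as the grid is refined.

```python
import math
import random

# Pathwise check of d(W^2) = 2W dW + dt and d(tW) = t dW + W dt:
# approximate the Ito integrals with left-point sums on a fine grid.
random.seed(1)
N, T = 200_000, 1.0
dt = T / N
sqdt = math.sqrt(dt)

W = 0.0
sum_w2 = 0.0  # accumulates 2W dW + dt
sum_tw = 0.0  # accumulates t dW + W dt
for i in range(N):
    t = i * dt
    dW = random.gauss(0.0, sqdt)
    sum_w2 += 2.0 * W * dW + dt
    sum_tw += t * dW + W * dt
    W += dW

err_w2 = abs(sum_w2 - W * W)  # should be small (order sqrt(dt))
err_tw = abs(sum_tw - T * W)  # should be even smaller
print(err_w2, err_tw)
```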
4.4.2 The chain rule in stochastic calculus
Next we consider the generalization to Itô’s calculus of the chain rule. Let us first recall how
the chain rule works in ordinary calculus. Assume that f : R × R → R and g : R → R are
differentiable functions. Then
    (d/dt) f(t, g(t)) = ∂t f(t, g(t)) + ∂x f(t, g(t)) g′(t),

by which we derive

    f(t, g(t)) = f(0, g(0)) + ∫₀ᵗ ∂s f(s, g(s)) ds + ∫₀ᵗ ∂x f(s, g(s)) dg(s).
Can this formula be true in stochastic calculus, i.e., when g is a diffusion process? The
answer is clearly no. In fact by setting f (t, x) = x2 , g(t) = W (t) and t = T in the previous
formula we obtain
    W²(T) = 2∫₀ᵀ W(t)dW(t),    i.e.,    ∫₀ᵀ W(t)dW(t) = W²(T)/2,
while the Itô integral of the Brownian motion is given by (4.5). The correct formula for the
chain rule in stochastic calculus is given in the following theorem.
Theorem 4.8. Let f : R × R → R, f = f (t, x), be a C 1 function such that ∂x2 f is continuous
and let {X(t)}t≥0 be the diffusion process dX(t) = σ(t)dW (t) + α(t)dt. Then Itô’s formula
holds:
    df(t, X(t)) = ∂t f(t, X(t)) dt + ∂x f(t, X(t)) dX(t) + ½ ∂x²f(t, X(t)) dX(t)dX(t),    (4.17)

i.e.,

    df(t, X(t)) = ∂t f(t, X(t)) dt + ∂x f(t, X(t)) (σ(t)dW(t) + α(t)dt) + ½ ∂x²f(t, X(t))σ²(t) dt.    (4.18)
For instance, letting X(t) = W(t) and f(t, x) = x², we obtain d(W²(t)) = 2W(t)dW(t) + dt,
i.e., (4.5), while for f(t, x) = tx we obtain d(tW(t)) = W(t)dt + tdW(t), which is (4.8).
In fact, the proof of Theorem 4.8 is similar to the proof of (4.5) and (4.8). We omit the details
(see [21, Theorem 4.4.1] for a sketch of the proof).
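As a numerical illustration of Itô's formula (again a sketch, not part of the notes; the grid size is arbitrary), take f(t, x) = eˣ, for which (4.18) gives d(e^{W(t)}) = e^{W(t)}dW(t) + ½e^{W(t)}dt; the integrated identity can be checked with left-point sums along one simulated path.

```python
import math
import random

# Check e^{W(T)} = 1 + int e^W dW + (1/2) int e^W dt along one path,
# approximating both integrals with left-point sums.
random.seed(2)
N, T = 200_000, 1.0
dt = T / N
sqdt = math.sqrt(dt)

W, rhs = 0.0, 1.0  # rhs starts at f(0, W(0)) = e^0 = 1
for _ in range(N):
    dW = random.gauss(0.0, sqdt)
    ew = math.exp(W)
    rhs += ew * dW + 0.5 * ew * dt  # (d_x f) dW + (1/2)(d_x^2 f) dt
    W += dW

err = abs(rhs - math.exp(W))  # small, shrinking as the grid is refined
print(err)
```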
Recall that (4.18) is a shorthand for
    f(t, X(t)) = f(0, X(0)) + ∫₀ᵗ (∂t f + α(s)∂x f + ½σ²(s)∂x²f)(s, X(s)) ds + ∫₀ᵗ σ(s)∂x f(s, X(s)) dW(s).
All integrals in the right hand side of the previous equation are well defined, as the integrand
stochastic processes have continuous paths. We conclude with the generalization of Itô’s
formula to functions of several random variables, which again we give without proof.
Theorem 4.9. Let f : R × RN → R be a C 1 function such that f = f(t, x) is twice continuously
differentiable in the variable x ∈ RN. Let {X1(t)}t≥0, . . . , {XN(t)}t≥0 be diffusion
processes and let X(t) = (X1(t), . . . , XN(t)). Then there holds:
    df(t, X(t)) = ∂t f(t, X(t)) dt + Σ_{i=1}^N ∂xi f(t, X(t)) dXi(t)
                  + ½ Σ_{i,j=1}^N ∂xi ∂xj f(t, X(t)) dXi(t)dXj(t).    (4.19)
For instance, for N = 2 and f(t, x1, x2) = x1x2, (4.19) yields the Itô
product rule (4.16).
Remark 4.3. Let {X(t)}t≥0 , {Y (t)}t≥0 be diffusion processes and define the complex-valued
stochastic process {Z(t)}t≥0 by Z(t) = X(t) + iY (t). Then any stochastic process of the
form g(t, Z(t)) can be written in the form f (t, X(t), Y (t)), where f (t, x, y) = g(t, x + iy).
Hence dg(t, Z(t)) can be computed using Theorem 4.9. An application of this formula is
given in Exercise 4.9 below.
The following exercises help to get familiar with the rules of stochastic calculus.
Exercise 4.5 (•). Let {W1 (t)}t≥0 , {W2 (t)}t≥0 be Brownian motions. Assume that there
exists a constant ρ ∈ [−1, 1] such that dW1 (t)dW2 (t) = ρdt. Show that ρ is the correlation of
the two Brownian motions at time t. Assuming that {W1 (t)}t≥0 , {W2 (t)}t≥0 are independent,
compute P(W1 (t) > W2 (s)), for all s, t > 0.
Exercise 4.6 (•). Consider the stochastic process {X(t)}t≥0 defined by X(t) = W (t)3 −
3tW (t). Show that {X(t)}t≥0 is a martingale and find a process {Γ(t)}t≥0 adapted to
{F(t)}t≥0 such that

    X(t) = X(0) + ∫₀ᵗ Γ(s)dW(s).
(The existence of the process {Γ(t)}t≥0 is ensured by Theorem 4.6.)
Exercise 4.7 (•). Let {θ(t)}t≥0 ∈ C 0 and define the stochastic process {Z(t)}t≥0 by

    Z(t) = exp(−∫₀ᵗ θ(s)dW(s) − ½ ∫₀ᵗ θ²(s)ds).

Show that

    Z(t) = 1 − ∫₀ᵗ θ(s)Z(s) dW(s).
Processes of the form considered in Exercise 4.7 are fundamental in mathematical finance.
In particular, it is important to know whether {Z(t)}t≥0 is a martingale. By Exercise 4.7
and Theorem 4.5(ii), {Z(t)}t≥0 is a martingale if {θ(t)Z(t)}t≥0 ∈ L2, which is in general
difficult to verify directly. The following condition, known as Novikov's condition, is more
useful in applications, as it involves only the time-integral of the process {θ(t)}t≥0. The
proof can be found in [13].

86
Theorem 4.10. Let {θ(t)}t≥0 ∈ C 0 satisfy

    E[exp(½ ∫₀ᵀ θ(t)² dt)] < ∞,  for all T > 0.    (4.20)

Then the stochastic process {Z(t)}t≥0 given by

    Z(t) = exp(−∫₀ᵗ θ(s)dW(s) − ½ ∫₀ᵗ θ²(s)ds)

is a martingale relative to the filtration {F(t)}t≥0.
In particular, the stochastic process {Z(t)}t≥0 is a martingale when θ(t) = const, hence
we recover the result of Exercise 3.29. The following exercise extends the result of Exercise 4.7
to the case of several independent Brownian motions.
Exercise 4.8 (?). Let {W1(t)}t≥0, . . . , {WN(t)}t≥0 be independent Brownian motions and let
{F(t)}t≥0 be a non-anticipating filtration for all of them. Let {θ1(t)}t≥0, . . . , {θN(t)}t≥0 ∈ C 0
be adapted to {F(t)}t≥0 and set θ(t) = (θ1(t), . . . , θN(t)), ‖θ(t)‖ = √(θ1(t)² + · · · + θN(t)²).
Compute dZ(t), where

    Z(t) = exp(−Σ_{j=1}^N ∫₀ᵗ θj(s)dWj(s) − ½ ∫₀ᵗ ‖θ(s)‖² ds).

Under which condition is {Z(t)}t≥0 a martingale?
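Since Z(0) = 1, the martingale property gives E[Z(T)] = 1; for constant θ this can be checked by direct Monte Carlo sampling of W(T). The following sketch uses illustrative parameter values and is not part of the original notes.

```python
import math
import random

# Estimate E[Z(T)] for Z(T) = exp(-theta*W(T) - theta^2*T/2), constant theta.
random.seed(3)
theta, T, M = 0.5, 1.0, 200_000
acc = 0.0
for _ in range(M):
    WT = random.gauss(0.0, math.sqrt(T))
    acc += math.exp(-theta * WT - 0.5 * theta * theta * T)
z_mean = acc / M
print(z_mean)  # should be close to 1
```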
Exercise 4.9 (•). Let A : R → R be a continuous deterministic function of time. Show that
the random variable

    I(t) = ∫₀ᵗ A(s) dW(s)

is normally distributed with zero expectation and variance ∫₀ᵗ A(s)² ds.
Exercise 4.10. Show that the process {W 2 (t) − t}t≥0 is a martingale relative to {F(t)}t≥0 ,
where {W (t)}t≥0 is a Brownian motion and {F(t)}t≥0 a non-anticipating filtration thereof.
Prove also the converse statement: assume that {X(t)}t≥0 and {X²(t) −
t}t≥0 are martingales relative to {F(t)}t≥0, that {X(t)}t≥0 has a.s. continuous paths and that
X(0) = 0 a.s. Then {X(t)}t≥0 is a Brownian motion with non-anticipating filtration {F(t)}t≥0.

4.5 Girsanov’s theorem


In this section we assume that the non-anticipating filtration of the Brownian motion coin-
cides with {FW (t)}t≥0 . Let {θ(t)}t≥0 ∈ C 0 satisfy the Novikov condition (4.20). It follows
by Theorem 4.10 that the positive stochastic process {Z(t)}t≥0 given by

    Z(t) = exp(−∫₀ᵗ θ(s)dW(s) − ½ ∫₀ᵗ θ²(s)ds)    (4.21)
is a martingale relative to {FW(t)}t≥0. As Z(0) = 1, then E[Z(t)] = 1 for all t ≥ 0. Thus
we can use the stochastic process {Z(t)}t≥0 to generate a measure P̃ equivalent to P as we
did at the end of Section 3.6, namely P̃ : F → [0, 1] is given by

    P̃(A) = E[Z(T)IA],  A ∈ F.    (4.22)

The relation between E and Ẽ has been determined in Theorem 3.19, where we showed that

    Ẽ[X] = E[Z(t)X],    (4.23)

for all t ≥ 0 and FW(t)-measurable random variables X, and

    Ẽ[Y |FW(s)] = (1/Z(s)) E[Z(t)Y |FW(s)]    (4.24)
for all 0 ≤ s ≤ t and random variables Y . We can now state and sketch the proof of
Girsanov’s theorem, which is a fundamental result with deep applications in mathematical
finance.
Theorem 4.11. Define the stochastic process {W̃(t)}t≥0 by

    W̃(t) = W(t) + ∫₀ᵗ θ(s)ds,    (4.25)

i.e., dW̃(t) = dW(t) + θ(t)dt. Then {W̃(t)}t≥0 is a P̃-Brownian motion. Moreover {FW(t)}t≥0
is a non-anticipating filtration for the P̃-Brownian motion {W̃(t)}t≥0.
Sketch of the proof. We prove the theorem using the Lévy characterization of Brownian
motion, see Theorem 3.18. Clearly, {W̃(t)}t≥0 starts from zero and has continuous paths a.s.
Moreover we (formally) have dW̃(t)dW̃(t) = dW(t)dW(t) = dt. Hence it remains to show
that {W̃(t)}t≥0 is a P̃-martingale relative to the filtration {FW(t)}t≥0.
By Itô's product rule we have

    d(W̃(t)Z(t)) = W̃(t)dZ(t) + Z(t)dW̃(t) + dW̃(t)dZ(t) = (1 − θ(t)W̃(t))Z(t)dW(t),

that is to say,

    W̃(t)Z(t) = ∫₀ᵗ (1 − W̃(u)θ(u))Z(u)dW(u).

It follows by Theorem 4.5(ii) that the stochastic process {Z(t)W̃(t)}t≥0 is a P-martingale
relative to {FW(t)}t≥0, i.e.,

    E[Z(t)W̃(t)|FW(s)] = Z(s)W̃(s).

But according to (4.24),

    E[Z(t)W̃(t)|FW(s)] = Z(s)Ẽ[W̃(t)|FW(s)].

Hence Ẽ[W̃(t)|FW(s)] = W̃(s), as claimed.
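The content of the theorem can be illustrated numerically for constant θ: reweighting samples of W(T) by the density Z(T) realizes the measure P̃, under which W̃(T) = W(T) + θT should have mean 0 and variance T. A Monte Carlo sketch (illustrative values, not part of the notes):

```python
import math
import random

# Reweight samples of W(T) with Z(T) to estimate moments of W~(T) under P~.
random.seed(4)
theta, T, M = 0.5, 1.0, 200_000
m1 = m2 = 0.0
for _ in range(M):
    WT = random.gauss(0.0, math.sqrt(T))
    Z = math.exp(-theta * WT - 0.5 * theta * theta * T)
    Wtil = WT + theta * T       # W~(T)
    m1 += Z * Wtil              # estimates E~[W~(T)]
    m2 += Z * Wtil * Wtil       # estimates E~[W~(T)^2]
m1 /= M
m2 /= M
print(m1, m2)  # approximately 0 and T
```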
Later we shall need also the multi-dimensional version of Girsanov’s theorem. Let
{W1 (t)}t≥0 , . . . , {WN (t)}t≥0 be independent Brownian motions and let {FW (t)}t≥0 be their
own generated filtration. Let {θ1 (t)}t≥0 , . . . , {θN (t)}t≥0 ∈ C 0 be adapted to {FW (t)}t≥0 and
set θ(t) = (θ1(t), . . . , θN(t)). We assume that the Novikov condition (4.20) is satisfied (with
θ(t)² = ‖θ(t)‖² = θ1(t)² + · · · + θN(t)²). Then, as shown in Exercise 4.8, the stochastic
process {Z(t)}t≥0 given by

    Z(t) = exp(−Σ_{j=1}^N ∫₀ᵗ θj(s)dWj(s) − ½ ∫₀ᵗ ‖θ(s)‖² ds)
is a martingale relative to {FW(t)}t≥0. It follows as before that the map P̃ : F → [0, 1] given by

    P̃(A) = E[Z(T)IA],  A ∈ F    (4.26)
is a new probability measure equivalent to P and the following N -dimensional generalization
of Girsanov’s theorem holds.
Theorem 4.12. Define the stochastic processes {W̃1(t)}t≥0, . . . , {W̃N(t)}t≥0 by

    W̃k(t) = Wk(t) + ∫₀ᵗ θk(s)ds,  k = 1, . . . , N.    (4.27)

Then {W̃1(t)}t≥0, . . . , {W̃N(t)}t≥0 are independent Brownian motions in the probability measure
P̃. Moreover the filtration {FW(t)}t≥0 generated by {W1(t)}t≥0, . . . , {WN(t)}t≥0 is a
non-anticipating filtration for {W̃1(t)}t≥0, . . . , {W̃N(t)}t≥0.

4.6 Diffusion processes in financial mathematics


The purpose of this final section is to introduce some important examples of diffusion pro-
cesses in financial mathematics. The analysis of the properties of such processes is the subject
of Chapter 6.

Generalized geometric Brownian motion


Given two stochastic processes {α(t)}t≥0 , {σ(t)}t≥0 ∈ C 0 , adapted to {FW (t)}t≥0 , the stochas-
tic process {S(t)}t≥0 given by
    S(t) = S(0) exp( ∫₀ᵗ α(s)ds + ∫₀ᵗ σ(s)dW(s) )
is called generalized geometric Brownian motion with instantaneous mean of log-return
{α(t)}t≥0 and instantaneous volatility {σ(t)}t≥0. When α(t) = α ∈ R and
σ(t) = σ > 0 are deterministic constants, the process above reduces to the geometric Brownian
motion, see (2.14). The generalized geometric Brownian motion provides a much more general
and realistic model for the dynamics of stock prices than the simple geometric Brownian
motion. In the rest of these notes we assume that stock prices are modeled by geometric
Brownian motions.
Since
    S(t) = S(0)e^{X(t)},   dX(t) = α(t)dt + σ(t)dW(t),

then Itô's formula gives

    dS(t) = S(0)e^{X(t)} dX(t) + ½ S(0)e^{X(t)} dX(t)dX(t)
          = S(t)α(t)dt + S(t)σ(t)dW(t) + ½ σ²(t)S(t)dt
          = µ(t)S(t)dt + σ(t)S(t)dW(t),   where µ(t) = α(t) + ½σ²(t),
hence a generalized geometric Brownian motion is a diffusion process in which the rate of
quadratic variation and the drift depend on the process itself.
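As a cross-check of the computation above (a sketch with illustrative parameters, not part of the notes), an Euler discretization of dS = µS dt + σS dW driven by a simulated Brownian path should reproduce the closed form S(0)exp((µ − σ²/2)T + σW(T)) up to a small discretization error.

```python
import math
import random

# One Euler path of dS = mu*S dt + sigma*S dW versus the exact solution
# driven by the same Brownian increments.
random.seed(5)
mu, sigma, S0 = 0.05, 0.2, 100.0
N, T = 100_000, 1.0
dt = T / N
sqdt = math.sqrt(dt)

S, W = S0, 0.0
for _ in range(N):
    dW = random.gauss(0.0, sqdt)
    S += mu * S * dt + sigma * S * dW
    W += dW

S_exact = S0 * math.exp((mu - 0.5 * sigma * sigma) * T + sigma * W)
rel_err = abs(S - S_exact) / S_exact
print(rel_err)  # small
```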
In the presence of several stocks, it is reasonable to assume that each of them introduces
a new source of randomness in the market. Thus, when dealing with N stocks, we assume the
existence of N Brownian motions {W1(t)}t≥0, . . . , {WN(t)}t≥0, not necessarily independent,
and model the evolution of the stock prices {S1(t)}t≥0, . . . , {SN(t)}t≥0 by the following
N-dimensional generalized geometric Brownian motion:

    dSk(t) = ( µk(t)dt + Σ_{j=1}^N σkj(t)dWj(t) ) Sk(t)    (4.28)

for some stochastic processes {µk (t)}t≥0 , {σkj (t)}t≥0 ∈ C 0 , j, k = 1, . . . , N , adapted to the
filtration generated by the Brownian motions.

Self-financing portfolios
Consider a portfolio {hS (t), hB (t)}t≥0 invested in a 1+1-dimensional market. We assume
that the price of the stock follows the generalized geometric Brownian motion

dS(t) = µ(t)S(t)dt + σ(t)S(t)dW (t), (4.29)

while the value of the risk-free asset is given by (2.15), i.e.,

dB(t) = B(t)R(t)dt. (4.30)

Moreover we assume that the market parameters {µ(t)}t≥0 , {σ(t)}t≥0 , {R(t)}t≥0 have
continuous paths a.s. and are adapted to the filtration {FW (t)}t≥0 . The value of the portfolio
is given by
V (t) = hS (t)S(t) + hB (t)B(t). (4.31)
We say that the portfolio is self-financing if purchasing more shares of one asset is possible
only by selling shares of the other asset for an equivalent value (and not by infusing new cash
into the portfolio), and, conversely, if any cash obtained by selling one asset is immediately
re-invested to buy shares of the other asset (and not withdrawn from the portfolio). To
translate this condition into a mathematical formula, assume that (hS , hB ) is the investor
position on the stock and the risk-free asset during the “infinitesimal” time interval [t, t+δt).
Let V⁻(t + δt) be the value of this portfolio immediately before the time t + δt at which the
position is changed, i.e.,

    V⁻(t + δt) = lim_{u→t+δt} (hS S(u) + hB B(u)) = hS S(t + δt) + hB B(t + δt),

where we used the continuity in time of the asset prices. At the time t + δt, the investor
sells/buys shares of the assets. Let (h′S, h′B) be the new position on the stock and the risk-free
asset. Then the value of the portfolio at time t + δt is given by

    V(t + δt) = h′S S(t + δt) + h′B B(t + δt).

The difference V(t + δt) − V⁻(t + δt), if not zero, corresponds to cash withdrawn from or added
to the portfolio as a result of the change in the position on the assets. In a self-financing
portfolio, however, this difference must be zero. We obtain

    V(t + δt) − V⁻(t + δt) = 0 ⇔ (hS − h′S)S(t + δt) + (hB − h′B)B(t + δt) = 0.

Hence, the change of the portfolio value in the interval [t, t + δt] is given by

    δV = V(t + δt) − V(t) = h′S S(t + δt) + h′B B(t + δt) − (hS S(t) + hB B(t)) = hS δS + hB δB,

where δS = S(t + δt) − S(t) and δB = B(t + δt) − B(t) are the changes of the asset values
in the interval [t, t + δt]. This discussion leads to the following definition.
Definition 4.4. A portfolio process {hS (t), hB (t)}t≥0 invested in the 1 + 1-dimensional mar-
ket (4.29)-(4.30) is said to be self-financing if it is adapted to {FW (t)}t≥0 and if its value
process {V (t)}t≥0 satisfies

dV (t) = hS (t)dS(t) + hB (t)dB(t). (4.32)

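The definition can be illustrated in discrete time: if every rebalancing is financed entirely through the other asset, the terminal value must equal the initial value plus the accumulated trading gains hS δS + hB δB. In the sketch below the market parameters and the rebalancing strategy are arbitrary illustrations, not taken from the notes.

```python
import math
import random

# Discrete self-financing portfolio: rebalance the stock position each step,
# funding every purchase/sale through the risk-free asset.
random.seed(6)
N, T = 10_000, 1.0
dt = T / N
mu, sigma, r = 0.08, 0.3, 0.02
S, B = 100.0, 1.0
hS, hB = 0.5, 50.0
V0 = hS * S + hB * B
gains = 0.0
for i in range(N):
    dS = S * (mu * dt + sigma * random.gauss(0.0, math.sqrt(dt)))
    dB = B * r * dt
    gains += hS * dS + hB * dB           # hS dS + hB dB over [t, t + dt)
    S += dS
    B += dB
    new_hS = 0.5 * math.cos(i * dt)      # some adapted strategy (arbitrary)
    hB -= (new_hS - hS) * S / B          # trade funded by the bank account
    hS = new_hS
V_T = hS * S + hB * B
print(abs(V_T - (V0 + gains)))  # zero up to floating-point rounding
```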
We conclude with the important definition of hedging portfolio. Suppose that at time
t a European derivative with pay-off Y at the time of maturity T > t is sold for the price
ΠY (t). An important problem in financial mathematics is to find a strategy for how the
seller should invest the premium ΠY (t) of the derivative in order to hedge the derivative,
i.e., to ensure that the portfolio value of the seller at time T is enough to pay-off the buyer
of the derivative. We shall assume throughout that the seller can invest the premium of the
derivative only on the 1+1 dimensional market consisting of the underlying stock and the
risk-free asset.
Definition 4.5. Consider the European derivative with pay-off Y and time of maturity T ,
where we assume that Y is FW (T )-measurable. A portfolio process {hS (t), hB (t)}t≥0 invested
in the underlying stock and the risk-free asset is said to be a hedging portfolio if
(i) {hS (t), hB (t)}t≥0 is adapted to {FW (t)}t≥0 ;

(ii) The value of the portfolio satisfies V (T ) = Y .


In the next chapter we shall answer the following questions:
1) What is a reasonable “fair” price for the European derivative at time t ∈ [0, T ] ?

2) What investment strategy (on the underlying stock and the risk-free asset) should the
seller undertake in order to hedge the derivative?

4.A Appendix: Solutions to selected problems


Exercise 4.4. Assume

    Xi(t) = αi t + σi W(t),  i = 1, 2,

for constants α1, α2, σ1, σ2. Then the right hand side of (4.15) is

    ∫₀ᵗ [(α2 s + σ2 W(s))σ1 + (α1 s + σ1 W(s))σ2] dW(s)
      + ∫₀ᵗ [(α2 s + σ2 W(s))α1 + (α1 s + σ1 W(s))α2 + σ1σ2] ds
    = σ1 ∫₀ᵗ (α2 s + σ2 W(s)) dW(s) + σ2 ∫₀ᵗ (α1 s + σ1 W(s)) dW(s)
      + α1α2 t²/2 + α1σ2 ∫₀ᵗ W(s) ds + α1α2 t²/2 + σ1α2 ∫₀ᵗ W(s) ds + σ1σ2 t
    = (σ1α2 + σ2α1) ∫₀ᵗ s dW(s) + 2σ1σ2 ∫₀ᵗ W(s) dW(s)
      + α1α2 t² + σ1σ2 t + (α1σ2 + σ1α2) ∫₀ᵗ W(s) ds
    = 2σ1σ2 (W(t)²/2 − t/2) + σ1σ2 t + α1α2 t² + (σ1α2 + α1σ2)( ∫₀ᵗ s dW(s) + ∫₀ᵗ W(s) ds )
    = σ1σ2 W(t)² + α1α2 t² + (σ1α2 + α1σ2) t W(t)
    = (α1 t + σ1 W(t))(α2 t + σ2 W(t)) = X1(t)X2(t).

Exercise 4.5. We have

    Cor(W1(t), W2(t)) = E[W1(t)W2(t)] / √(Var[W1(t)]Var[W2(t)]) = (1/t) E[W1(t)W2(t)].

Hence we have to show that E[W1(t)W2(t)] = ρt. By Itô's product rule

    d(W1(t)W2(t)) = W1(t)dW2(t) + W2(t)dW1(t) + dW1(t)dW2(t) = W1(t)dW2(t) + W2(t)dW1(t) + ρdt.
Taking the expectation we find E[W1 (t)W2 (t)] = ρt, which concludes the first part of the
exercise. As to the second part, the independent random variables W1 (t), W2 (s) have the
joint density
    f_{W1(t)W2(s)}(x, y) = f_{W1(t)}(x) f_{W2(s)}(y) = (1/(2π√(ts))) e^{−x²/2t − y²/2s}.

Hence

    P(W1(t) > W2(s)) = (1/(2π√(ts))) ∫_{x>y} e^{−x²/2t − y²/2s} dx dy = 1/2.

Exercise 4.6. To solve the exercise we must prove that dX(t) = Γ(t)dW (t), for some
process {Γ(t)}t≥0 adapted to {FW(t)}t≥0. In fact, by Itô's formula,

    dX(t) = −3W(t) dt + (3W(t)² − 3t) dW(t) + ½ · 6W(t) dW(t)dW(t) = 3(W(t)² − t)dW(t),

where in the last step we used that dW(t)dW(t) = dt.

Exercise 4.7. By Itô’s formula, the stochastic process {Z(t)}t≥0 satisfies dZ(t) = −θ(t)Z(t)dW (t),
which is the claim.

Exercise 4.9. Since {I(t)}t≥0 is a martingale, then E[I(t)] = E[I(0)] = 0. By Itô’s isometry,
    Var[I(t)] = E[I(t)²] = E[ ∫₀ᵗ A(s)² ds ] = ∫₀ᵗ A²(s) ds,

since A(t) is not random. To prove that I(t) is normally distributed, it suffices to show that
its characteristic function satisfies

    θ_{I(t)}(u) = e^{−(u²/2) ∫₀ᵗ A(s)² ds},
see Section 3.3. The latter is equivalent to

    E[e^{iuI(t)}] = e^{−(u²/2) ∫₀ᵗ A(s)² ds},   i.e.,   E[exp(iuI(t) + (u²/2) ∫₀ᵗ A(s)² ds)] = 1.
Let Z(t) = exp(iuI(t) + (u²/2) ∫₀ᵗ A(s)² ds). If we show that Z(t) is a martingale, we are done,
because then E[Z(t)] = E[Z(0)] = 1. We write

    Z(t) = e^{iX(t)+Y(t)} = f(X(t), Y(t)),   X(t) = u ∫₀ᵗ A(s) dW(s),   Y(t) = (u²/2) ∫₀ᵗ A²(s) ds.
Then, by Theorem 4.9, using dX(t)dY(t) = dY(t)dY(t) = 0,

    dZ(t) = i e^{iX(t)+Y(t)} dX(t) + e^{iX(t)+Y(t)} dY(t) − ½ e^{iX(t)+Y(t)} dX(t)dX(t) = iuZ(t)A(t)dW(t),
where we used that dX(t)dX(t) = u²A²(t)dt. Being an Itô integral, the process {Z(t)}t≥0
is a martingale, which completes the solution of the exercise.
Chapter 5

Stochastic differential equations and


partial differential equations

Throughout this chapter, the probability space (Ω, F, P) is given and {F(t)}t≥0 denotes
a non-anticipating filtration for the given Brownian motion {W (t)}t≥0 . Given T > 0, we
denote by DT the open region in the (t, x)-plane given by

DT = {t ∈ (0, T ), x ∈ R} = (0, T ) × R.

The closure and the boundary of DT are given respectively by

    D̄T = [0, T] × R,   ∂DT = {t = 0, x ∈ R} ∪ {t = T, x ∈ R}.

Similarly we denote DT+ the open region

DT+ = {t ∈ (0, T ), x > 0} = (0, T ) × (0, ∞),

whose closure and boundary are given by

    D̄T⁺ = [0, T] × [0, ∞),   ∂DT⁺ = {t = 0, x ≥ 0} ∪ {t = T, x ≥ 0} ∪ {t ∈ [0, T], x = 0}.

Moreover we shall employ the following notation for function spaces.


• C k (DT ) is the space of k-times continuously differentiable functions u : DT → R;

• C 1,2 (DT ) is the space of functions u ∈ C 1 (DT ) such that ∂x2 u ∈ C(DT );

• C k (D̄T) is the space of functions u ∈ C k (DT) whose partial derivatives up to order k
  extend continuously to D̄T. Similarly one defines C 1,2 (D̄T);

• Cck (Rn) is the space of k-times continuously differentiable functions u : Rn → R with
  compact support. We also let Cc∞ (Rn) = ∩k∈N Cck (Rn).
A continuous function u is uniformly bounded on DT if there exists CT > 0 such that
|u(t, x)| ≤ CT , for all (t, x) ∈ DT . Unless otherwise stated, all functions are real-valued.

5.1 Stochastic differential equations
Definition 5.1. Given s ≥ 0, α, β ∈ C 0 ([s, ∞) × R), and a deterministic constant x ∈ R,
we say that a stochastic process {X(t)}t≥s is a global solution to the stochastic differential
equation (SDE)
dX(t) = α(t, X(t)) dt + β(t, X(t)) dW (t) (5.1)
with initial value X(s, ω) = x at time t = s, if {X(t)}t≥s ∈ C 0 and
    X(t) = x + ∫ₛᵗ α(τ, X(τ)) dτ + ∫ₛᵗ β(τ, X(τ)) dW(τ),  t ≥ s.    (5.2)

The initial value of a SDE can be a random variable instead of a deterministic constant,
but we shall not need this more general case. Note also that the integrals in the right
hand side of (5.2) are well-defined, as the integrand functions have continuous paths a.s. Of
course one needs suitable assumptions on the functions α, β to ensure that there is a (unique)
process {X(t)}t≥s satisfying (5.2). The precise statement is contained in the following global
existence and uniqueness theorem for SDE’s, which is reminiscent of the analogous result for
ordinary differential equations.
Theorem 5.1. Assume that for each T > s there exist constants CT , DT > 0 such that α, β
satisfy
|α(t, x)| + |β(t, x)| ≤ CT (1 + |x|), (5.3)
|α(t, x) − α(t, y)| + |β(t, x) − β(t, y)| ≤ DT |x − y|, (5.4)
for all t ∈ [s, T ], x, y ∈ R. Then there exists a unique global solution {X(t)}t≥s of the
SDE (5.1) with initial value X(s) = x. Moreover {X(t)}t≥s ∈ L2 .
A proof of Theorem 5.1 can be found in [17, Theorem 5.2.1]. Note that the result proved
in [17] is a bit more general than the one stated above, as it covers the case of a random
initial value.
The solution of (5.1) with initial value x at time t = s will also be denoted by
{X(t; s, x)}t≥s. It can be shown that, under the assumptions of Theorem 5.1, the random
variable X(t; s, x) depends (a.s.) continuously on the initial conditions (s, x), see [1, Sec. 7.3].
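In practice, solutions of (5.1) are sampled approximately; the standard Euler-Maruyama scheme replaces dt and dW by finite increments. A minimal sketch (the helper name and the sanity check are illustrative, not from the notes):

```python
import math
import random

def euler_maruyama(alpha, beta, x0, s, t, n, rng):
    """One approximate sample of X(t; s, x0) for dX = alpha dt + beta dW."""
    dt = (t - s) / n
    sqdt = math.sqrt(dt)
    x, u = x0, s
    for _ in range(n):
        x += alpha(u, x) * dt + beta(u, x) * rng.gauss(0.0, sqdt)
        u += dt
    return x

# Sanity check with alpha = 0, beta = 1, so that X(1) ~ N(0, 1).
rng = random.Random(7)
xs = [euler_maruyama(lambda t, x: 0.0, lambda t, x: 1.0, 0.0, 0.0, 1.0, 50, rng)
      for _ in range(20_000)]
mean = sum(xs) / len(xs)
var = sum((z - mean) ** 2 for z in xs) / len(xs)
print(mean, var)  # approximately 0 and 1
```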
Remark 5.1. The uniqueness statement in Theorem 5.1 is to be understood “up to null
sets”. Precisely, if {Xi (t)}t≥s , i = 1, 2 are two solutions with the same initial value x, then
    P( sup_{t∈[s,T]} |X1(t) − X2(t)| > 0 ) = 0,  for all T > s.

Remark 5.2. If the assumptions of Theorem 5.1 are satisfied only up to a fixed time
T > 0, then the solution of (5.1) could explode at some finite time in the future of T . For
example, the stochastic process {X(t)}0≤t<T∗ given by X(t) = log(W (t)+ex ) solves (5.1) with
α = − exp(−2x)/2 and β = exp(−x), but only up to the time T∗ = inf{t : W (t) = −ex } > 0.
In these notes we are only interested in global solutions of SDE’s, hence we require (5.3)-(5.4)
to hold for all T > 0.
Remark 5.3. The growth condition (5.3) alone is sufficient to prove the existence of a global
solution to (5.1). The Lipschitz condition (5.4) is used to ensure uniqueness. By using a
more general notion of solution (weak solution) and uniqueness (pathwise uniqueness),
one can extend Theorem 5.1 to a larger class of SDE’s, which include in particular the CIR
process considered in Section 5.3; see [19] for details.
Exercise 5.1. In many applications in finance, the drift term α(t, x) is linear, and so it
can be written in the form

α(t, x) = a(b − x), a, b constant. (5.5)

A stochastic process {X(t)}t≥0 is called mean reverting if there exists a constant c such
that E[X(t)] → c as t → +∞. Most financial variables are required to satisfy the mean
reversion property. Prove that the solution {X(t; s, x)}t≥0 of (5.1) with linear drift (5.5)
satisfies
    E[X(t; s, x)] = xe^{−a(t−s)} + b(1 − e^{−a(t−s)}).    (5.6)
Hence the process {X(t; s, x)}t≥0 is mean reverting if and only if a > 0, and in this case the
long time mean is given by c = b.
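Formula (5.6) can be checked by simulation. The sketch below (all values illustrative; the additive noise coefficient γ is an assumption, since (5.6) only constrains the drift) Euler-discretizes dX = a(b − X)dt + γdW and compares the Monte Carlo mean with (5.6):

```python
import math
import random

# Monte Carlo mean of the mean-reverting SDE dX = a(b - X) dt + gamma dW,
# compared with the closed form (5.6).
random.seed(8)
a, b, gamma = 2.0, 1.0, 0.5
x0, s, T = 0.0, 0.0, 1.0
N, M = 100, 10_000
dt = (T - s) / N
sqdt = math.sqrt(dt)

total = 0.0
for _ in range(M):
    x = x0
    for _ in range(N):
        x += a * (b - x) * dt + gamma * random.gauss(0.0, sqdt)
    total += x
mc_mean = total / M
exact_mean = x0 * math.exp(-a * (T - s)) + b * (1 - math.exp(-a * (T - s)))
print(mc_mean, exact_mean)  # close to each other, tending to b as T grows
```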

5.1.1 Linear SDE’s


A SDE of the form

dX(t) = (a(t) + b(t)X(t)) dt + (γ(t) + σ(t)X(t)) dW (t), X(s) = x, (5.7)

where a, b, γ, σ are deterministic functions of time, is called a linear stochastic differential
equation. We assume that for all T > 0 there exists a constant CT such that

    sup_{t∈[s,T]} (|a(t)| + |b(t)| + |γ(t)| + |σ(t)|) < CT,

and so by Theorem 5.1 there exists a unique global solution of (5.7). For example, the
geometric Brownian motion (2.14) solves the linear SDE dS(t) = µS(t) dt + σS(t) dW (t),
where µ = α + σ 2 /2. Another example of linear SDE in finance is the Hull-White interest
rate model, see Section 6.6. Linear SDE’s can be solved explicitly, as shown in the following
theorem.
Theorem 5.2. The solution of (5.7) is given by X(t) = Y (t)Z(t), where
    Z(t) = exp( ∫ₛᵗ σ(τ)dW(τ) + ∫ₛᵗ (b(τ) − σ(τ)²/2) dτ ),

    Y(t) = x + ∫ₛᵗ (a(τ) − σ(τ)γ(τ))/Z(τ) dτ + ∫ₛᵗ γ(τ)/Z(τ) dW(τ).

Exercise 5.2 (?). Prove Theorem 5.2.
For example, in the special case in which the functions a, b, γ, σ are constant (independent
of time), the solution of (5.7) with initial value X(0) = x at time t = 0 is

    X(t) = e^{σW(t)+(b−σ²/2)t} ( x + (a − γσ) ∫₀ᵗ e^{−σW(τ)−(b−σ²/2)τ} dτ + γ ∫₀ᵗ e^{−σW(τ)−(b−σ²/2)τ} dW(τ) ).

Exercise 5.3 (•). Consider the linear SDE (5.7) with constant coefficients a, b, γ and σ = 0.
Find the solution and show that X(t; s, x) ∈ N (m(t − s, x), ∆(t − s)2 ), where

a γ2
m(τ, x) = xe−bτ + (a − e−bτ ), ∆(τ )2 = (1 − e−2bτ ). (5.8)
b 2b
Exercise 5.4 (•). Find the solution {X(t)}t≥0 of the linear SDE

dX(t) = tX(t) dt + dW (t), t≥0

with initial value X(0) = 1. Find Cov(X(s), X(t)).

Exercise 5.5. Compute Cov(W (t), X(t)) and Cov(W 2 (t), X(t)), where X(t) = X(t; s, x) is
the stochastic process in Exercise 5.3.

5.1.2 Markov property


It can be shown that, under the assumptions of Theorem 5.1, the solution {X(t; s, x)}t≥s
of (5.1) is a Markov process, see for instance [1, Th. 9.2.3]. Moreover when α, β in (5.1) are
time-independent, {X(t; s, x)}t≥s is a homogeneous Markov process. The fact that solutions
of SDE’s should satisfy the Markov property is quite intuitive, for, as shown in Theorem 5.1,
the solution at time t is uniquely characterized by the initial value at time s < t. Consider
for example the linear SDE

dX(t) = (a − bX(t)) dt + γdW (t), t ≥ s, X(s) = x. (5.9)

As shown in Exercise 5.3, the solution of (5.9) is given by

    X(t; s, x) = xe^{b(s−t)} + (a/b)(1 − e^{b(s−t)}) + ∫ₛᵗ γe^{b(u−t)} dW(u),  t ≥ s,

and therefore X(t; s, x) ∈ N (m(t−s, x), ∆(t−s)2 ), where m(τ, x) and ∆(τ ) are given by (5.8).
By Theorem 3.20, the transition density of the Markov process {X(t; s, x)}t≥0 exists and is
given by the pdf of the random variable X(t; s, x), that is p(t, s, x, y) = p∗ (t − s, x, y), where
    p∗(τ, x, y) = (1/√(2π∆(τ)²)) e^{−(y−m(τ,x))²/(2∆(τ)²)}.

The previous example raises the question of whether the Markov process solving an SDE
always admits a transition density. This problem is one of the subjects of Section 5.2.
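The Gaussian transition law can also be checked empirically: simulating many Euler paths of (5.9) and comparing the sample mean and variance with m(τ, x) and ∆(τ)² from (5.8). A sketch with illustrative parameter values (not part of the notes):

```python
import math
import random

# Sample X(tau; 0, x0) for dX = (a - b*X) dt + gamma dW via Euler steps and
# compare empirical moments with m(tau, x) and Delta(tau)^2 of (5.8).
random.seed(9)
a, b, gamma = 1.0, 1.0, 0.5
x0, tau = 0.0, 1.0
N, M = 200, 10_000
dt = tau / N
sqdt = math.sqrt(dt)

xs = []
for _ in range(M):
    x = x0
    for _ in range(N):
        x += (a - b * x) * dt + gamma * random.gauss(0.0, sqdt)
    xs.append(x)

mean = sum(xs) / M
var = sum((z - mean) ** 2 for z in xs) / M
m_exact = x0 * math.exp(-b * tau) + (a / b) * (1 - math.exp(-b * tau))
v_exact = gamma * gamma / (2 * b) * (1 - math.exp(-2 * b * tau))
print(mean, m_exact, var, v_exact)
```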
5.1.3 Systems of SDE’s
Occasionally in the next chapter we need to consider systems of several SDE’s. All the
results presented in this section extend mutatis mutandis to systems of SDE’s, the difference
being merely notational. For example, given two Brownian motions {W1 (t)}t≥0 , {W2 (t)}t≥0
and continuous functions α1 , α2 , β11 , β12 , β21 , β22 : [s, ∞) × R2 → R, the relations
    dXi(t) = αi(t, X1(t), X2(t)) dt + Σ_{j=1,2} βij(t, X1(t), X2(t)) dWj(t),    (5.10a)

    Xi(s) = xi,  i = 1, 2,    (5.10b)

define a system of two SDE's on the stochastic processes {X1(t)}t≥0, {X2(t)}t≥0 with initial
values X1(s) = x1, X2(s) = x2 at time s. As usual, the correct way to interpret the relations
above is in the integral form:
    Xi(t) = xi + ∫ₛᵗ αi(τ, X1(τ), X2(τ)) dτ + Σ_{j=1,2} ∫ₛᵗ βij(τ, X1(τ), X2(τ)) dWj(τ),  i = 1, 2.

Upon defining the vector and matrix valued functions α = (α1 , α2 )T , β = (βij )i,j=1,2 , and
letting X(t) = (X1 (t), X2 (t)), x = (x1 , x2 ), W (t) = (W1 (t), W2 (t)), we can rewrite (5.10) as

dX(t) = α(t, X(t)) dt + β(t, X(t)) · dW (t), X(s) = x, (5.11)

where · denotes the row by column matrix product. In fact, every system of any arbitrary
number of SDE’s can be written in the form (5.11). Theorem 5.1 continues to be valid for
systems of SDE’s, the only difference being that |α|, |β| in (5.3)-(5.4) stand now for the
vector norm of α and for the matrix norm of β.

5.2 Partial differential equations


All financial variables are represented by stochastic processes solving (systems of) SDE’s.
In this context, a problem which recurs often is to find a function f such that the process
{Y (t)}t≥0 , Y (t) = f (t, X(t)), is a martingale, where {X(t)}t≥0 is the global solution of (5.1)
with initial value X(0) = x. To this regard we have the following result.
Theorem 5.3. Let T > 0 and u ∈ C 1,2 (DT ). Assume that u satisfies
    ∂t u + α(t, x)∂x u + ½β(t, x)²∂x²u = 0,    (5.12)
in the region DT . Assume also that α, β satisfy the conditions in Theorem 5.1 (with s = 0)
and let {X(t)}t≥0 be the unique global solution of (5.1) with initial value X(0) = x. The
stochastic process {u(t, X(t))}t∈[0,T ] satisfies
    u(t, X(t)) = u(0, x) + ∫₀ᵗ β(s, X(s))∂x u(s, X(s)) dW(s),  t ∈ [0, T].    (5.13)
Moreover if ∂x u is uniformly bounded on DT , then the stochastic process {u(t, X(t))}t∈[0,T ]
is a martingale relative to the filtration {F(t)}t≥0 .

Proof. By Itô’s formula we find

    du(t, X(t)) = (∂t u + α∂x u + ½β²∂x²u)(t, X(t)) dt + (β∂x u)(t, X(t)) dW(t).
As u solves (5.12), then du(t, X(t)) = (β∂x u)(t, X(t)) dW (t), which is equivalent to (5.13)
(as u(0, X(0)) = u(0, x)). Under the additional assumption that ∂x u is uniformly bounded
on DT , there exists a constant CT > 0 such that |∂x u(t, x)| ≤ CT and so, due also to (5.3),
the Itô integral in the right hand side of (5.13) is a martingale. This concludes the proof of
the theorem.

Definition 5.2. The partial differential equation (PDE) (5.12) is called the (backward)
Kolmogorov equation associated to the SDE (5.1). We say that u : DT → R is a strong
solution of (5.12) in the region DT if u ∈ C 1,2 (DT ), u solves (5.12) for all (t, x) ∈ DT , and
∂x u(t, x) is uniformly bounded on DT. Similarly, replacing DT with DT⁺, one defines strong
solutions of (5.12) in the region DT⁺.

Note carefully that for a strong solution u of the Kolmogorov equation in the region DT+ ,
we require that u, ∂t u, ∂x u and ∂x2 u extend continuously on the axis x = 0. This assumption
can be weakened, but we shall not do so. The statement of Theorem 5.3 raises the question
of whether there exist strong solutions to the Kolmogorov PDE. This important problem is
solved in the following theorem.

Theorem 5.4. Assume that the hypotheses of Theorem 5.1 are satisfied and in addition
that α, β ∈ C 2 (DT) with ∂xi α, ∂xi β uniformly bounded on DT, for i = 1, 2
and for all T > 0. Let g ∈ C 2 (R) be such that g′ and g″ are uniformly bounded on R. Define
the function
uT (t, x) = E[g(X(T ; t, x))], 0 ≤ t < T. (5.14)
Then the following holds.

(i) uT is a strong solution of the Kolmogorov PDE


    ∂t u + α(t, x)∂x u + ½β(t, x)²∂x²u = 0,  (t, x) ∈ DT,    (5.15)
with the terminal condition

    lim_{t→T} u(t, x) = g(x),  for all x ∈ R.    (5.16)

(ii) The solution is unique in the following sense: if v is another strong solution and
limt→T v(t, x) = g(x), then v = uT in DT .
Proof. For (i) see [17, Theorem 8.1.1]. We prove only (ii). Let v be a solution as stated in
the theorem and set Y (τ ) = v(τ, X(τ ; t, x)), for t ≤ τ ≤ T . By Itô’s formula and using that
v solves (5.15) we find dY (τ ) = β∂x v(τ, X(τ ; t, x))dW (τ ). Hence
    v(T, X(T; t, x)) − v(t, X(t; t, x)) = ∫ₜᵀ β∂x v(τ, X(τ; t, x)) dW(τ).

Moreover v(T, X(T; t, x)) = g(X(T; t, x)), v(t, X(t; t, x)) = v(t, x) and in addition, by (5.3)
and the fact that ∂x v is uniformly bounded, the Itô integral in the right hand side is a
martingale. Hence taking the expectation we find v(t, x) = E[g(X(T; t, x))] = uT(t, x).
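Dynkin's formula (5.14) can be sampled directly. The sketch below takes Brownian motion (α = 0, β = 1) and g(x) = x²; strictly speaking g′ is not bounded as Theorem 5.4 requires, but in this simple case uT(t, x) = E[g(x + W(T − t))] = x² + (T − t) is explicit (and solves (5.15)), so the Monte Carlo estimate can be compared with it. All numerical values are illustrative.

```python
import math
import random

# Monte Carlo evaluation of u_T(t, x) = E[g(X(T; t, x))] for X = x + W and
# g(x) = x^2, compared with the explicit value x^2 + (T - t).
random.seed(10)
T, t, x, M = 1.0, 0.25, 0.7, 100_000
tau = T - t
sq = math.sqrt(tau)

acc = 0.0
for _ in range(M):
    y = x + random.gauss(0.0, sq)
    acc += y * y
u_mc = acc / M
u_exact = x * x + tau
print(u_mc, u_exact)  # close to each other
```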
Remark 5.4. It is often convenient to study the Kolmogorov PDE with an initial, rather
than terminal, condition. To this purpose it suffices to make the change of variable t → T − t
in (5.15). Letting ū(t, x) = u(T − t, x), we now see that ū satisfies the PDE
    −∂t ū + α(T − t, x)∂x ū + ½β(T − t, x)²∂x²ū = 0,  0 ≤ t ≤ T,    (5.17)
with initial condition ū(0, x) = g(x). Note that this is the equation considered in [17,
Theorem 8.1.1].
Remark 5.5. Observe that in Theorem 5.4 we have a different solution for each fixed T. As
∂x uT is uniformly bounded, Theorem 5.3 gives that the stochastic process {uT(t, X(t))}t∈[0,T]
is a martingale. Equation (5.14) is also called Dynkin's formula (which is a special case
of the Feynman-Kac formula).
Remark 5.6. It is possible to define concepts of solution to the Kolmogorov PDE other
than the strong one, e.g., weak solutions, entropy solutions, etc. In general these solutions are
not uniquely characterized by their terminal value. In these notes we only consider strong
solutions, which, as proved in Theorem 5.4, are uniquely determined by (5.16).
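The representation (5.14) lends itself to a direct Monte Carlo check: simulate X(T; t, x) with the
Euler-Maruyama scheme and average g over the paths. The Python sketch below does this for the
linear SDE dX = (a − bX)dt + γdW with g(x) = x, in which case the exact value is the mean
x e^{−b(T−t)} + (a/b)(1 − e^{−b(T−t)}) computed in Exercise 5.3; the values a = b = 1, γ = 0.3 and
the step/path counts are illustrative choices, not taken from the text.

```python
import numpy as np

def euler_maruyama_mean(x, t, T, a=1.0, b=1.0, gamma=0.3,
                        n_steps=400, n_paths=20000, seed=0):
    """Monte Carlo estimate of u_T(t, x) = E[X(T; t, x)] for the linear SDE
    dX = (a - b X) dt + gamma dW, simulated with the Euler-Maruyama scheme."""
    rng = np.random.default_rng(seed)
    dt = (T - t) / n_steps
    X = np.full(n_paths, float(x))
    for _ in range(n_steps):
        X = X + (a - b * X) * dt + gamma * rng.normal(0.0, np.sqrt(dt), n_paths)
    return X.mean()  # estimate of E[g(X(T; t, x))] with g(x) = x

x, t, T = 2.0, 0.0, 1.0
mc = euler_maruyama_mean(x, t, T)
exact = x * np.exp(-(T - t)) + (1.0 - np.exp(-(T - t)))  # mean from Exercise 5.3, a = b = 1
print(mc, exact)  # the two values agree to about 1e-2
```

The residual discrepancy combines the O(∆t) weak error of the Euler scheme with the Monte Carlo
sampling error; both shrink as the step and path counts grow.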
Exercise 5.6. Consider the Kolmogorov PDE associated to the linear SDE (5.7) with con-
stant coefficients and σ = 0:

        ∂t u + (a − bx)∂x u + (1/2)γ∂x²u = 0,   (t, x) ∈ DT⁺.        (5.18)

Find the strong solution of (5.18) that satisfies u(T, x) = 1. HINT: Use the ansatz f(t, x) =
e^{−xA(t,T)+B(t,T)}.
The study of the Kolmogorov equation is also important to establish whether the solution
of a SDE admits a transition density. In fact, it can be shown that when {X(t)}t≥s admits
a smooth transition density, then the latter coincides with the fundamental solution of
the Kolmogorov equation. To state the precise result, let us denote by δ(x − y) the δ-
distribution centered at y ∈ R, i.e., the distribution satisfying

        ∫_R ψ(x)δ(x − y) dx = ψ(y),   for all ψ ∈ Cc∞(R).

A sequence of measurable functions (gn)n∈N is said to converge to δ(x − y) in the sense of
distributions if

        lim_{n→∞} ∫_R gn(x)ψ(x) dx = ψ(y),   for all ψ ∈ Cc∞(R).

Theorem 5.5. Assume the conditions in Theorem 5.1 are satisfied. Let {X(t; s, x)}t≥s be
the global solution of (5.1) with initial value X(s) = x; recall that this solution is a Markov
stochastic process.

(i) If {X(t; s, x)}t≥s admits a transition density p(t, s, x, y) which is C¹ in the variable s
and C² in the variable x, then p(t, s, x, y) solves the Kolmogorov PDE

        ∂s p + α(s, x)∂x p + (1/2)β(s, x)²∂x²p = 0,   0 < s < t, x ∈ R,        (5.19)

with terminal value

        lim_{s→t} p(t, s, x, y) = δ(x − y).        (5.20)

(ii) If {X(t; s, x)}t≥s admits a transition density p(t, s, x, y) which is C¹ in the variable t
and C² in the variable y, then p(t, s, x, y) solves the Fokker-Planck PDE (also known
as the forward Kolmogorov PDE)

        ∂t p + ∂y(α(t, y)p) − (1/2)∂y²(β(t, y)²p) = 0,   t > s, y ∈ R,        (5.21)

with initial value

        lim_{t→s} p(t, s, x, y) = δ(x − y).        (5.22)

Exercise 5.7. Prove Theorem 5.5. HINT: See Exercises 6.8 and 6.9 in [21].

Remark 5.7. The solution p of the problem (5.19)-(5.20) is called the fundamental so-
lution for the Kolmogorov PDE, as any other solution can be reconstructed from it. For
example, for all functions g as in Theorem 5.4, the solution of (5.15) with the terminal
condition (5.16) is given by

        uT(t, x) = ∫_R p(T, t, x, y)g(y) dy.

This can be verified either by a direct calculation or by using the interpretation of the
fundamental solution as a transition density. Similarly, p is the fundamental solution of the
Fokker-Planck equation.

Let us see an example of application of Theorem 5.5. First notice that when the functions
α, β in (5.1) are time-independent, then the Markov stochastic process {X(t; s, x)}t≥s is
homogeneous and therefore the transition density, when it exists, has the form p(t, s, x, y) =
p∗(t − s, x, y). By the change of variable s → t − s = τ in (5.19), we find that p∗(τ, x, y)
satisfies

        −∂τ p∗ + α(x)∂x p∗ + (1/2)β(x)²∂x²p∗ = 0,        (5.23)

as well as

        ∂τ p∗ + ∂y(α(y)p∗) − (1/2)∂y²(β(y)²p∗) = 0,        (5.24)
with the initial condition p∗(0, x, y) = δ(x − y). For example, the Brownian motion is a
Markov process with transition density (3.28). In this case, (5.23) and (5.24) both reduce to
the heat equation −∂τ p∗ + (1/2)∂x²p∗ = 0. It is straightforward to verify that (3.28) satisfies
the heat equation for (τ, x) ∈ (0, ∞) × R. Now we show that, as claimed in Theorem 5.5, the
initial condition p∗(0, x, y) = δ(x − y) is also verified, that is

        lim_{τ→0} ∫_R p∗(τ, x, y)ψ(y) dy = ψ(x),   for all ψ ∈ Cc∞(R) and x ∈ R.

Indeed, with the change of variable y = x + √τ z, we have

        ∫_R p∗(τ, x, y)ψ(y) dy = (1/√(2π)) ∫_R e^{−z²/2} ψ(x + √τ z) dz → ψ(x) ∫_R e^{−z²/2} dz/√(2π) = ψ(x),

as τ → 0, as claimed. Moreover, as W(0) = 0 a.s., Theorem 5.5 entails that the density of the Brownian
motion is f_{W(t)}(y) = p∗(t, 0, y) = (1/√(2πt)) e^{−y²/(2t)}, which is of course correct.

Exercise 5.8. Show that the transition density derived in the example at the end of Sec-
tion 5.1 is the fundamental solution of the Kolmogorov equation for the linear SDE (5.9).

5.3 The CIR process


A CIR process is a stochastic process {X(t)}t≥s satisfying the SDE

        dX(t) = a(b − X(t)) dt + c√X(t) dW(t),   X(s) = x > 0,        (5.25)

where a, b, c are constants (c ≠ 0). CIR processes are used in finance to model the stock
volatility in the Heston model (see Section 6.5.2) and the spot interest rate of bonds in the
CIR model (see Section 6.6). Note that the SDE (5.25) is not of the form considered so far, as
the function β(t, x) = c√x is defined only for x ≥ 0 and, more importantly, it is not Lipschitz
continuous in a neighborhood of x = 0, as required in Theorem 5.1. Nevertheless, as already
mentioned in Remark 5.3, it can be shown that (5.25) admits a unique global solution for
all x > 0. Clearly the solution satisfies X(t) ≥ 0 a.s., for all t ≥ 0, otherwise the Itô integral
in the right hand side of (5.25) would not even be defined. For future applications, it is
important to know whether the solution can hit zero in finite time with positive probability.
This question is answered in the following theorem, whose proof can be found for instance
in [15, Exercise 37].

Theorem 5.6. Let {X(t)}t≥0 be the CIR process with initial value X(0) = x > 0 at time
t = 0. Define the (stopping2 ) time

τ0x = inf{t ≥ 0 : X(t) = 0}.

Then P(τ0x < ∞) = 0 if and only if ab ≥ c2 /2, which is called Feller’s condition.

Exercise 5.9. Prove Theorem 5.6 following the hints in [15, Exercise 37].

The following theorem shows how to build a CIR process from a family of linear SDE’s.

Theorem 5.7. Let {W1(t)}t≥0, . . . , {WN(t)}t≥0 be N ≥ 2 independent Brownian motions and
assume that {X1(t)}t≥0, . . . , {XN(t)}t≥0 solve

        dXj(t) = −(θ/2)Xj(t) dt + (σ/2) dWj(t),   j = 1, . . . , N,   Xj(0) = xj ∈ R,        (5.26)

where θ, σ are deterministic constants. There exists a Brownian motion {W(t)}t≥0 such that
the stochastic process {X(t)}t≥0 given by

        X(t) = Σ_{j=1}^N Xj(t)²

solves (5.25) with a = θ, c = σ and b = Nσ²/(4θ).

Proof. Let X(t) = Σ_{j=1}^N Xj(t)². Applying Itô's formula we find, after straightforward cal-
culations,

        dX(t) = (Nσ²/4 − θX(t)) dt + σ Σ_{j=1}^N Xj(t) dWj(t).

Letting a = θ, c = σ, b = Nσ²/(4θ) and

        dW(t) = Σ_{j=1}^N (Xj(t)/√X(t)) dWj(t),

we obtain that X(t) satisfies

        dX(t) = a(b − X(t)) dt + c√X(t) dW(t).

Thus {X(t)}t≥0 is a CIR process, provided we prove that {W(t)}t≥0 is a Brownian mo-
tion. Clearly, W(0) = 0 a.s. and the paths t → W(t, ω) are a.s. continuous. Hence to
conclude that {W(t)}t≥0 is a Brownian motion we must show that dW(t)dW(t) = dt, see
Theorem 3.18. We have

        dW(t)dW(t) = (1/X(t)) Σ_{i,j=1}^N Xi(t)Xj(t) dWi(t)dWj(t) = (1/X(t)) Σ_{i,j=1}^N Xi(t)Xj(t)δij dt
                   = (1/X(t)) Σ_{j=1}^N Xj(t)² dt = dt,

where we used that dWi(t)dWj(t) = δij dt, since the Brownian motions are independent.

²See Definition 6.9 for the general definition of stopping time.
Note that N ≥ 2 implies the Feller condition ab ≥ c²/2, hence the CIR process con-
structed in the previous theorem does not hit zero, see Theorem 5.6. Note also that the
solution of (5.26) is

        Xj(t) = e^{−θt/2} ( xj + (σ/2) ∫_0^t e^{θτ/2} dWj(τ) ).

It follows by Exercise 4.9 that the random variables X1(t), . . . , XN(t) are normally distributed
with

        E[Xj(t)] = e^{−θt/2} xj,   Var[Xj(t)] = (σ²/(4θ))(1 − e^{−θt}).

It follows by Exercise 3.17 that the CIR process constructed in Theorem 5.7 is non-central χ²
distributed. The following theorem shows that this is a general property of CIR processes.
Theorem 5.8. Assume ab > 0. The CIR process starting at x > 0 at time t = s satisfies

        X(t; s, x) = Y/(2k),   Y ∈ χ²(δ, β),

where

        k = 2a/((1 − e^{−a(t−s)})c²),   δ = 4ab/c²,   β = 2kxe^{−a(t−s)}.
Sketch of the proof. As the CIR process is a homogeneous Markov process, it is enough to
prove the claim for s = 0. Let X(t) = X(t; 0, x) for short and denote by p(t, 0, x, y) = p∗(t, x, y)
the density of X(t). By Theorem 5.5, p∗ solves the Fokker-Planck equation

        ∂t p∗ + ∂y(a(b − y)p∗) − (1/2)∂y²(c²y p∗) = 0,        (5.27)

with initial datum p∗(0, x, y) = δ(x − y). Moreover, the characteristic function θX(t)(u) :=
h(t, u) of X(t) is given by

        h(t, u) = E[e^{iuX(t)}] = ∫_R e^{iuy} p∗(t, x, y) dy,

and after straightforward calculations we derive the following equation on h:

        ∂t h − iabuh + (au − i(c²/2)u²)∂u h = 0.        (5.28a)

The initial condition for equation (5.28a) is

        h(0, u) = e^{iux},        (5.28b)

which is equivalent to p∗(0, x, y) = δ(x − y). The initial value problem (5.28) can be solved
with the method of characteristics (see [8] for an illustration of this method) and one
finds that the solution is given by

        h(t, u) = θX(t)(u) = exp(−βu/(2(u + ik))) / (1 − iu/k)^{δ/2}.        (5.29)

Hence θX(t)(u) = θY(u/(2k)), where θY(u) is the characteristic function of Y ∈ χ²(δ, β), see
Table 3.1. This completes the proof.
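A quick consistency check of Theorem 5.8: since E[Y] = δ + β for Y ∈ χ²(δ, β), the theorem
predicts E[X(t; s, x)] = (δ + β)/(2k), and expanding k, δ, β gives x e^{−a(t−s)} + b(1 − e^{−a(t−s)}),
exactly the mean obtained by taking expectations directly in (5.25). The short Python sketch below
verifies the identity numerically for one arbitrary, illustrative set of parameters satisfying the
Feller condition.

```python
import math

def cir_mean_from_chi2(x, a, b, c, tau):
    """E[X(s + tau; s, x)] computed via Theorem 5.8: X = Y/(2k) with
    Y ~ chi^2(delta, beta), whose mean is delta + beta."""
    k = 2 * a / ((1 - math.exp(-a * tau)) * c ** 2)
    delta = 4 * a * b / c ** 2
    beta = 2 * k * x * math.exp(-a * tau)
    return (delta + beta) / (2 * k)

def cir_mean_from_sde(x, a, b, tau):
    """Mean obtained by solving d E[X]/dt = a(b - E[X]) directly."""
    return x * math.exp(-a * tau) + b * (1 - math.exp(-a * tau))

# illustrative parameters; the Feller condition ab >= c^2/2 holds (0.075 >= 0.045)
x, a, b, c, tau = 0.04, 1.5, 0.05, 0.3, 2.0
m1 = cir_mean_from_chi2(x, a, b, c, tau)
m2 = cir_mean_from_sde(x, a, b, tau)
print(m1, m2)  # identical up to rounding
```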

Exercise 5.10. Derive (5.28a) and verify with Mathematica that (5.29) is the solution of
the initial value problem (5.28).

Finally, we discuss briefly the question of existence of strong solutions to the Kolmogorov
equation for the CIR process, which is

        ∂t u + a(b − x)∂x u + (c²/2)x∂x²u = 0,   (t, x) ∈ DT⁺,   u(T, x) = g(x).        (5.30)

Note carefully that the Kolmogorov PDE is now defined only for x > 0, as the initial value
x in (5.25) must be positive. Now, if a strong solution of (5.30) exists, then it must be given
by u(t, x) = E[g(X(T; t, x))] (this claim is proved exactly as in Theorem 5.4(ii)). Supposing
ab > 0, then

        u(t, x) = E[g(X(T; t, x))] = ∫_0^∞ p∗(T − t, x, y)g(y) dy,

where the density of X(T; t, x) is given as in Theorem 5.8. Using the asymptotic behavior
of p∗ as x → 0⁺, it can be shown that u(t, x) is bounded near the axis x = 0 only if the Feller
condition ab ≥ c²/2 is satisfied, and in this case ∂t u, ∂x u, ∂x²u are also bounded. Hence u is
the (unique) strong solution of (5.30) if and only if ab ≥ c²/2.

5.4 Finite difference solutions of PDE's


The finite difference methods are techniques to find (numerically) approximate solutions
to ordinary differential equations (ODEs) and partial differential equations (PDEs). They
are based on the idea of replacing the ordinary/partial derivatives with a finite difference
quotient, e.g., y′(x) ≈ (y(x + h) − y(x))/h. The various methods differ by the choice of
the finite difference used in the approximation. We shall present a number of methods by
examples. As this suffices for our future applications, we consider only linear equations.

5.4.1 ODEs
Consider the first order ODE

        dy/dt = ay + bt,   y(0) = y0,   t ∈ [0, T],        (5.31)

for some constants a, b ∈ R and T > 0. The solution is given by

        y(t) = y0 e^{at} + (b/a²)(e^{at} − at − 1).        (5.32)

We shall apply three different finite difference methods to approximate the solution of (5.31).
In all cases we divide the time interval [0, T] into a uniform partition,

        0 = t0 < t1 < · · · < tn = T,   tj = jT/n,   ∆t = tj+1 − tj = T/n,

and define

        y(tj) = yj,   j = 0, . . . , n.

Forward Euler method

In this method we introduce the following approximation of dy/dt at time t:

        dy/dt(t) = (y(t + ∆t) − y(t))/∆t + O(∆t),

i.e.,

        y(t + ∆t) = y(t) + dy/dt(t)∆t + O(∆t²).        (5.33)

For Equation (5.31) this becomes

        y(t + ∆t) = y(t) + (ay(t) + bt)∆t + O(∆t²).

Setting t = tj, ∆t = T/n, t + ∆t = tj + T/n = tj+1 and neglecting second order terms we
obtain

        yj+1 = yj + (ayj + btj)T/n,   j = 0, . . . , n − 1.        (5.34)

As y0 is known, the previous iterative equation can be solved at any step j. This method is
called explicit, because the solution at the step j + 1 is given explicitly in terms of the solution
at the step j. It is a simple matter to implement this method numerically, for instance using
the following Matlab function:³

³The Matlab codes presented in this text are not optimized. Moreover, the powerful vectorization tools of
Matlab are not employed, so as to make the codes easily adaptable to other computer software and languages.

function [time,sol]=exampleODEexp(T,y0,n)
dt=T/n;
sol=zeros(1,n+1);
time=zeros(1,n+1);
a=1; b=1;
sol(1)=y0;
for j=2:n+1
sol(j)=sol(j-1)+(a*sol(j-1)+b*time(j-1))*dt;
time(j)=time(j-1)+dt;
end
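The iteration (5.34) is equally easy to sketch in another language. The following Python version
(with the same hard-coded values a = b = 1 used in the Matlab function above) also measures the
error at time T against the exact solution (5.32) for a few values of n; the halving of the error
when n doubles reflects the first-order accuracy of the method.

```python
import math

def forward_euler(T, y0, n, a=1.0, b=1.0):
    """Explicit Euler iteration (5.34) for y' = a y + b t, y(0) = y0; returns y_n."""
    dt = T / n
    y, t = y0, 0.0
    for _ in range(n):
        y, t = y + (a * y + b * t) * dt, t + dt
    return y

def exact(T, y0, a=1.0, b=1.0):
    """Closed-form solution (5.32)."""
    return y0 * math.exp(a * T) + b / a ** 2 * (math.exp(a * T) - a * T - 1)

errs = [abs(forward_euler(1.0, 1.0, n) - exact(1.0, 1.0)) for n in (100, 200, 400)]
print(errs)  # each refinement roughly halves the error: first-order accuracy
```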

Exercise 5.11. Compare the approximate solution with the exact solution for increasing
values of n. Compile a table showing the difference between the approximate solution and
the exact solution at time T for increasing values of n.

Backward Euler method

This method consists in approximating dy/dt at time t as

        dy/dt(t) = (y(t) − y(t − ∆t))/∆t + O(∆t),

hence

        y(t + ∆t) = y(t) + dy/dt(t + ∆t)∆t + O(∆t²).        (5.35)

The iterative equation for (5.31) now is

        yj+1 = yj + (ayj+1 + btj+1)T/n,   j = 0, . . . , n − 1.        (5.36)

This method is called implicit, because the solution at the step j + 1 depends on the solution
at both the step j and the step j + 1 itself. Therefore implicit methods involve an extra
computation, which is to find yj+1 in terms of yj only. For the present example this is a
trivial step, as we have

        yj+1 = (1 − aT/n)⁻¹ (yj + btj+1 T/n),        (5.37)

provided n ≠ aT. Here is a Matlab function implementing the backward Euler method for
the ODE (5.31):
function [time,sol]=exampleODEimp(T,y0,n)
dt=T/n;
sol=zeros(1,n+1);
time=zeros(1,n+1);
a=1; b=1;
sol(1)=y0;
for j=2:n+1
time(j)=time(j-1)+dt;

sol(j)=1/(1-a*dt)*(sol(j-1)+b*time(j)*dt);
end

Exercise 5.12. Compare the approximate solution obtained with the backward Euler method
with the exact solution and the approximate one obtained via the forward Euler method.
Compile a table for increasing values of n as in Exercise 5.11.
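The payoff of the implicit scheme becomes visible on stiff problems. For the test equation y′ = ay
with a = −50 (an illustrative example, not the ODE (5.31) above), the forward Euler amplification
factor 1 + a∆t has modulus 4 when ∆t = 0.1, so the numerical solution explodes even though the
exact solution decays; the backward Euler factor (1 − a∆t)⁻¹ = 1/6 decays for every ∆t > 0, a
preview of the unconditional stability met again in Section 5.4.2. A minimal Python sketch:

```python
# Test equation y' = a y with a = -50: the exact solution decays like e^{-50 t}.
a, dt, n = -50.0, 0.1, 20
y_fwd = y_bwd = 1.0
for _ in range(n):
    y_fwd = y_fwd * (1 + a * dt)   # explicit step: amplification 1 + a*dt = -4
    y_bwd = y_bwd / (1 - a * dt)   # implicit step solved for y_{j+1}: factor 1/6
print(abs(y_fwd), abs(y_bwd))  # about 1.1e12 versus 2.7e-16
```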

Central difference method

By a Taylor expansion,

        y(t + ∆t) = y(t) + dy/dt(t)∆t + (1/2)d²y/dt²(t)∆t² + O(∆t³),        (5.38)

and replacing ∆t with −∆t,

        y(t − ∆t) = y(t) − dy/dt(t)∆t + (1/2)d²y/dt²(t)∆t² + O(∆t³).        (5.39)

Subtracting the two equations we obtain the following approximation for dy/dt at time t:

        dy/dt(t) = (y(t + ∆t) − y(t − ∆t))/(2∆t) + O(∆t²),

which is called the central difference approximation. Hence

        y(t + ∆t) = y(t − ∆t) + 2 dy/dt(t)∆t + O(∆t³).        (5.40)

Note that, compared to (5.33) and (5.35), we have gained one order in accuracy. The iterative
equation for (5.31) becomes

        yj+1 = yj−1 + 2(ayj + btj)T/n,   j = 0, . . . , n − 1.        (5.41)

Note that the first step j = 0 requires y−1. This is fixed by the backward method

        y−1 = y0 − ay0 T/n,        (5.42)

which is (5.36) for j = −1.

Exercise 5.13. Write a Matlab function that implements the central difference method
for (5.31). Compile a table comparing the exact solution with the approximate solutions
at time T obtained by the three methods presented above for increasing values of n.

A second order ODE

Consider the second order ODE for the harmonic oscillator:

        d²y/dt² = −ω²y,   y(0) = y0,   ẏ(0) = ỹ0.        (5.43)

The solution to this problem is given by

        y(t) = y0 cos(ωt) + (ỹ0/ω) sin(ωt).        (5.44)

One can define forward/backward/central difference approximations for second derivatives
in a way similar as for first derivatives. For instance, adding (5.38) and (5.39) we obtain the
following central difference approximation for d²y/dt² at time t:

        d²y/dt²(t) = (y(t + ∆t) − 2y(t) + y(t − ∆t))/∆t² + O(∆t²),

which leads to the following iterative equation for (5.43):

        yj+1 = 2yj − yj−1 − ω²yj (T/n)²,   j = 1, . . . , n − 1,        (5.45)
        y1 = y0 + ỹ0 T/n.        (5.46)

Note that the approximate solution at the first node is computed using the forward method
and the initial datum ẏ(0) = ỹ0. The Matlab function solving this iteration is the following.
function [time,sol]=harmonic(w,T,y0,N)
dt=T/N;
sol=zeros(1,N+1);
time=zeros(1,N+1);
sol(1)=y0(1);
sol(2)=sol(1)+y0(2)*dt;
for j=3:N+1
sol(j)=2*sol(j-1)-sol(j-2)-dt^2*w^2*sol(j-1);
time(j)=time(j-1)+dt;
end
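A Python transcription of the same scheme, checked against the exact solution (5.44); the values
ω = 2, y0 = 1, ỹ0 = 0 and the number of steps are illustrative choices.

```python
import math

def harmonic(w, T, y0, ydot0, N):
    """Central-difference iteration (5.45)-(5.46) for y'' = -w^2 y."""
    dt = T / N
    y_prev, y = y0, y0 + ydot0 * dt   # first node from the forward difference (5.46)
    for _ in range(N - 1):
        y_prev, y = y, 2 * y - y_prev - dt ** 2 * w ** 2 * y
    return y  # approximation of y(T)

w, T = 2.0, 1.0
approx = harmonic(w, T, 1.0, 0.0, 5000)
exact = math.cos(w * T)  # solution (5.44) with y0 = 1, ydot0 = 0
print(approx, exact)
```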

Exercise 5.14. Compare the exact and approximate solutions at time T for increasing values
of n.

5.4.2 PDEs
In this section we present three finite difference methods to find approximate solutions to
the one-dimensional heat equation

        ∂t u = ∂x²u,   u(0, x) = u0(x),        (5.47)
where u0 is continuous. We refer to t as the time variable and to x as the spatial variable,
since this is what they typically represent in the applications of the heat equation. As before,
we let t ∈ [0, T ]. As to the domain of the spatial variable x, we distinguish two cases

(i) x runs over the whole real line, i.e., x ∈ (−∞, ∞), and we are interested in finding an
approximation to the solution u ∈ Cb1,2 (DT ).

(ii) x runs over a finite interval, say x ∈ (xmin , xmax ), and we want to find an approximation
of the solution u ∈ Cb1,2 ((0, T ) × (xmin , xmax )) which satisfies the boundary conditions4

u(t, xmin ) = uL (t), u(t, xmax ) = uR (t), t ∈ [0, T ],

for some given continuous functions uL , uR . We also require uL (0) = u0 (xmin ), uR (0) =
u0 (xmax ), so that the solution is continuous on the boundary.

In fact, for numerical purposes, problem (i) is a special case of problem (ii), for the
domain (−∞, ∞) must be approximated by (−A, A) for A >> 1 when we solve problem (i)
in a computer. Note however that in the finite domain approximation of problem (i), the
boundary conditions at x = ±A cannot be prescribed freely! Rather they have to be given by
suitable approximations of the limit values at x = ±∞ of the solution to the heat equation
on the real line.
By what we have just said we can focus on problem (ii). To simplify the discussion we
assume that the domain of the x variable is given by x ∈ (0, X) and we assign zero boundary
conditions, i.e., uL = uR = 0. Hence we want to study the problem

∂t u = ∂x2 u, (t, x) ∈ (0, T ) × (0, X), (5.48a)


u(0, x) = u0 (x), u(t, 0) = u(t, X) = 0, x ∈ [0, X], t ∈ [0, T ]; u0 (0) = u0 (X) = 0.
(5.48b)

We introduce the partition of the interval (0, X) given by

        0 = x0 < x1 < · · · < xm = X,   xj = jX/m,   ∆x = xj+1 − xj = X/m,

and the partition of the time interval [0, T] given by

        0 = t0 < t1 < · · · < tn = T,   ti = iT/n,   ∆t = ti+1 − ti = T/n.
We let
ui,j = u(ti , xj ), i = 0, . . . , n, j = 0, . . . , m.
Hence ui,j is an (n + 1) × (m + 1) matrix. The ith row contains the values of the approximate
solution at each point of the spatial mesh at the fixed time ti. For instance, the zeroth row
is the initial datum: u0,j = u0(xj), j = 0, . . . , m. The columns of the matrix ui,j contain the
values of the approximate solution at one spatial point for different times. For instance, the
column ui,0 contains the values of the approximate solution at x0 = 0 for different times ti,
while ui,m contains the values at xm = X. By the given boundary conditions we then have

        ui,0 = ui,m = 0,   i = 0, . . . , n.

⁴These are called Dirichlet type boundary conditions. Other types of boundary conditions can be imposed,
but the Dirichlet type is sufficient for our applications to the Black-Scholes PDE.

We define

        d = ∆t/∆x² = Tm²/(X²n).        (5.49)

Method 1: Forward in time, centered in space

In this method we use a forward difference approximation for the time derivative and a
centered difference approximation for the second spatial derivative:

        ∂t u(t, x) = (u(t + ∆t, x) − u(t, x))/∆t + O(∆t),
        ∂x²u(t, x) = (u(t, x + ∆x) − 2u(t, x) + u(t, x − ∆x))/∆x² + O(∆x²).

We find

        u(t + ∆t, x) = u(t, x) + d(u(t, x + ∆x) − 2u(t, x) + u(t, x − ∆x)).

Hence we obtain the following iterative equation

        ui+1,j = ui,j + d(ui,j+1 − 2ui,j + ui,j−1),   i = 0, . . . , n − 1,   j = 1, . . . , m − 1,        (5.50)
where we recall that u0,j = u0(xj), ui,0 = ui,m = 0, i = 0, . . . , n, j = 0, . . . , m. This method
is completely explicit. A Matlab function solving the iteration (5.50) with the initial datum
u0(x) = exp(X²/4) − exp((x − X/2)²) is the following.
function [time,space,sol]=heatexp(T,X,n,m)
dt=T/n; dx=X/m;
d=dt/dx^2
sol=zeros(n+1,m+1);
time=zeros(1,n+1);
space=zeros(1,m+1);
for i=2:n+1
time(i)=time(i-1)+dt;
end
for j=2:m+1
space(j)=space(j-1)+dx;
end
for j=1:m+1
sol(1,j)=exp(X^2/4)-exp((space(j)-X/2)^2);

end
sol(:,1)=0; sol(:,m+1)=0;
for i=2:n+1
for j=3:m+1
sol(i,j-1)=sol(i-1,j-1)+d*(sol(i-1,j)-2*sol(i-1,j-1)+sol(i-1,j-2));
end
end

To visualize the result it is convenient to employ an animation which plots the approx-
imate solution at each point on the spatial mesh for some increasing sequence of times in
the partition {t0 , t1 , . . . , tn }. This visualization can be achieved with the following simple
Matlab function:
function anim(r,F,v)
N=length(F(:,1));
step=round(1+N*v/10);
figure
for i=1:step:N
plot(r,F(i,:));
axis([0 1 0 1/2]);
drawnow;
pause(0.3);
end

Upon running the command anim(space,sol,v), the previous function will plot the
approximate solutions at different increasing times with speed v (the speed v must be between
0 and 1).
Let us try the following: [time,space,sol]=heatexp(1,1,2500,50). Hence we solve
the problem on the unit square (t, x) ∈ (0, 1)2 on a mesh of (n, m) = 2500 × 50 points. The
value of the parameter (5.49) is
d = 1.
If we now try to visualize the solution by running anim(space,sol,0.1), we find that the
approximate solution behaves very strangely (it produces just random oscillations). However
by increasing the number of time steps with [time,space,sol]=heatexp(1,1,5000,50),
so that
d = 0.5,
and visualize the solution, we shall find that the approximate solution converges quickly
and smoothly to u ≡ 0, which is the equilibrium of our problem (i.e., the time independent
solution of (5.48)). In fact, this is not a coincidence, for we have the following

Theorem 5.9. The forward-centered method for the heat equation is unstable if d > 1/2 and
stable for d ≤ 1/2.

The term unstable here refers to the fact that numerical errors, due for instance to the
truncation and round-off of the initial datum on the spatial grid, will increase in time. On the
other hand, stability of a finite difference method means that the error will remain small at
all times. The stability condition d ≤ 1/2 for the forward-centered method applied to the heat
equation is very restrictive: it forces us to choose a very high number of points on the time
partition. To avoid such a restriction, which could be very costly in terms of computation
time, implicit methods are preferred, such as the one we present next.
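The stability threshold can be made plausible by a von Neumann argument: substituting the Fourier
mode ui,j = gⁱ e^{iθj} into (5.50) gives the amplification factor g(θ) = 1 − 4d sin²(θ/2), and
errors stay bounded precisely when |g(θ)| ≤ 1 for every mode, i.e. when d ≤ 1/2. The worst mode is
θ = π, as the small Python check below illustrates (an illustrative helper, not code from the text):

```python
def worst_mode_factor(d):
    """Amplification factor g(theta) = 1 - 4 d sin(theta/2)^2 of the
    forward-centered scheme, evaluated at the most oscillatory mode theta = pi."""
    return 1.0 - 4.0 * d

for d in (1.0, 0.5, 0.25):
    g = worst_mode_factor(d)
    print(d, g, "stable" if abs(g) <= 1 else "unstable")
# d = 1.0  -> g = -3: errors triple at every time step (unstable)
# d = 0.5  -> g = -1: errors oscillate but do not grow
# d = 0.25 -> g =  0: the roughest mode is damped immediately
```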

Method 2: Backward in time, centered in space


In this method we employ the backward finite difference approximation for the time derivative
and the central difference for the second spatial derivative (same as before). This results in
the following iterative equation:

ui+1,j = ui,j + d(ui+1,j+1 − 2ui+1,j + ui+1,j−1 ), i = 0, . . . , n − 1, j = 1, . . . , m − 1, (5.51)

where we recall that u0,j = u0(xj), ui,0 = ui,m = 0, i = 0, . . . , n, j = 0, . . . , m. This method
is implicit and we therefore need to solve for the solution at time i + 1 in terms of the solution
at time i. To this purpose we let

ui = (ui,0 , ui,1 , . . . , ui,m )T

be the column vector containing the approximate solution at time ti and rewrite (5.51) in
matrix form as follows:
Aui+1 = ui , (5.52)
where A is the (m + 1) × (m + 1) matrix with non-zero entries given by

        A0,0 = Am,m = 1,   Ak,k = 1 + 2d,   Ak,k−1 = Ak,k+1 = −d,   k = 1, . . . , m − 1.

The matrix A is invertible, hence we can invert (5.52) to express ui+1 in terms of ui as

        ui+1 = A⁻¹ui.        (5.53)

This method is unconditionally stable, i.e., it is stable for all values of the parameter d.
We can test this property by using the following Matlab function, which solves the iterative
equation (5.53):
function [time,space,sol]=heatimp(T,X,n,m)
dt=T/n; dx=X/m;
d=dt/dx^2
sol=zeros(n+1,m+1);
time=zeros(1,n+1);
space=zeros(1,m+1);
A=zeros(m+1,m+1);
A(1,1)=1; A(m+1,m+1)=1;

for i=2:n+1
time(i)=time(i-1)+dt;
end
for j=2:m+1
space(j)=space(j-1)+dx;
end
for j=1:m+1
sol(1,j)=exp(X^2/4)-exp((space(j)-X/2)^2);
end
sol(:,1)=0; sol(:,m+1)=0;
for k=2:m
A(k,k-1)=-d;
A(k,k)=1+2*d;
A(k,k+1)=-d;
end
for i=2:n+1
sol(i,:)=sol(i-1,:)*transpose(inv(A));
end

If we now run [time,space,sol]=heatimp(1,1,500,50), for which d = 5, and visualize
the solution, we shall see that the approximate solution behaves smoothly as expected,
indicating that the instability problem of the forward-centered method has been solved.
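The linear solves (5.53) are easy to reproduce with numpy. The sketch below is not a transcription
of the Matlab function above: it uses the illustrative initial datum u0(x) = sin(πx) on (0, 1), for
which the exact solution of (5.48) is e^{−π²t} sin(πx), and it runs the backward-centered scheme
with d = 4, far beyond the explicit stability limit, while still tracking the exact solution:

```python
import numpy as np

def heat_backward_euler(T=0.1, X=1.0, n=10, m=20):
    """Backward-centered scheme (5.52) for u_t = u_xx on (0, X) with zero
    Dirichlet data and initial datum u0(x) = sin(pi x)."""
    dt, dx = T / n, X / m
    d = dt / dx ** 2            # here d = 4: no stability restriction
    x = np.linspace(0.0, X, m + 1)
    u = np.sin(np.pi * x)
    A = np.eye(m + 1)           # boundary rows A[0,0] = A[m,m] = 1
    for k in range(1, m):
        A[k, k - 1] = A[k, k + 1] = -d
        A[k, k] = 1 + 2 * d
    for _ in range(n):
        u = np.linalg.solve(A, u)   # one implicit step: A u_{i+1} = u_i
    return x, u

x, u = heat_backward_euler()
exact = np.exp(-np.pi ** 2 * 0.1) * np.sin(np.pi * x)
print(np.max(np.abs(u - exact)))  # small even though d = 4
```

The remaining discrepancy is the O(∆t + ∆x²) discretization error of the scheme, not an instability.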

Method 3: Crank-Nicolson

This is an implicit method with higher order of accuracy than the backward-centered method.
It is obtained by simply averaging between methods 1 and 2 above, i.e.,

        ui+1,j = (1/2)u(1)i+1,j + (1/2)u(2)i+1,j,

where the first term in the right hand side is computed with method 1 and the second term
with method 2. Thus we obtain the following iterative equation

        ui+1,j = ui,j + (d/2)[(ui,j+1 − 2ui,j + ui,j−1) + (ui+1,j+1 − 2ui+1,j + ui+1,j−1)].        (5.54)

Like the backward-centered method, the Crank-Nicolson method is also unconditionally sta-
ble.
Exercise 5.15. Write (5.54) in matrix form and solve for the solution at the time step i + 1
in terms of the solution at the time step i.
Exercise 5.16. Write a Matlab function that implements the Crank-Nicholson method.
Exercise 5.17. Compare methods 2 and 3.

5.A Appendix: Solutions to selected problems
Exercise 5.3. The SDE in question is

        dX(t) = (a − bX(t)) dt + γ dW(t),   t ≥ s,   X(s) = x.

Letting Y(t) = e^{bt}X(t) and applying Itô's formula we find that Y(t) satisfies

        dY(t) = ae^{bt} dt + γe^{bt} dW(t),   Y(s) = xe^{bs}.

Hence

        Y(t) = xe^{bs} + a ∫_s^t e^{bu} du + γ ∫_s^t e^{bu} dW(u),

and so

        X(t; s, x) = xe^{b(s−t)} + (a/b)(1 − e^{b(s−t)}) + ∫_s^t γe^{b(u−t)} dW(u).

Taking the expectation we obtain immediately that E[X(t; s, x)] = m(t − s, x). Moreover, by
Exercise 4.9, the Itô integral in X(t; s, x) is a normal random variable with zero mean and
variance ∆(t − s)², hence the claim follows.

Exercise 5.4. Letting Y(t) = e^{−t²/2}X(t), we find that dY(t) = e^{−t²/2} dW(t) and Y(0) = 1.
Thus

        X(t) = e^{t²/2} + e^{t²/2} ∫_0^t e^{−u²/2} dW(u).

Note that X(t) is normally distributed with mean

        E[X(t)] = e^{t²/2}.

It follows that Cov(X(s), X(t)) = E[X(s)X(t)] − E[X(s)]E[X(t)] is

        Cov(X(s), X(t)) = e^{(s²+t²)/2} E[ ∫_0^s e^{−u²/2} dW(u) ∫_0^t e^{−u²/2} dW(u) ].

Assume for example that s ≤ t. Hence

        Cov(X(s), X(t)) = e^{(s²+t²)/2} E[ ∫_0^t I[0,s](u) e^{−u²/2} dW(u) ∫_0^t e^{−u²/2} dW(u) ].

Using the result of Exercise 4.3 we have

        Cov(X(s), X(t)) = e^{(s²+t²)/2} ∫_0^t I[0,s](u) e^{−u²/2} e^{−u²/2} du = e^{(s²+t²)/2} ∫_0^s e^{−u²} du
                        = √π e^{(s²+t²)/2} ∫_0^{√2 s} e^{−u²/2} du/√(2π) = √π e^{(s²+t²)/2} (Φ(√2 s) − 1/2).

For general s, t ≥ 0 we find

        Cov(X(s), X(t)) = √π e^{(s²+t²)/2} (Φ(√2 min(s, t)) − 1/2).

Chapter 6

The risk-neutral price

Throughout this chapter we assume that the probability space (Ω, F, P) and the Brownian
motion {W (t)}t≥0 are given. Furthermore, in order to avoid the need to repeatedly specify
technical assumptions, we make the following conventions:
• All stochastic processes in this chapter are assumed to have a.s. continuous paths and
so in particular they are integrable, both path by path and in the Itô sense. Of course
one may relax this assumption, but for our applications it is general enough.

• All Itô integrals in this chapter are assumed to be martingales, which holds for instance
when the integrand stochastic process is in the space L2 .

6.1 Absence of arbitrage in 1+1 dimensional markets


The ultimate purpose of this section is to prove that a self-financing portfolio invested in a
1+1 dimensional market is not an arbitrage. We shall prove the result by using Theorem 3.16,
i.e., by showing that there exists a measure, equivalent to P, with respect to which the
discounted value of the portfolio is a martingale. We first define such a measure. We
have seen in Theorem 4.10 that, given a stochastic process {θ(t)}t≥0 satisfying the Novikov
condition (4.20), the stochastic process {Z(t)}t≥0 defined by
        Z(t) = exp( −∫_0^t θ(s) dW(s) − (1/2)∫_0^t θ(s)² ds ),        (6.1)

is a P-martingale relative to the filtration {FW(t)}t≥0 and that the map P̃ : F → [0, 1] given
by

        P̃(A) = E[Z(T)IA]        (6.2)

is a probability measure equivalent to P, for all T > 0.
Definition 6.1. Consider the 1+1 dimensional market

dS(t) = µ(t)S(t)dt + σ(t)S(t)dW (t), dB(t) = B(t)R(t)dt,

where the market parameters {µ(t)}t≥0, {σ(t)}t≥0, {R(t)}t≥0 are all adapted to {FW(t)}t≥0.
Assume that σ(t) > 0 almost surely for all times. Let {θ(t)}t≥0 be the stochastic process
given by

        θ(t) = (µ(t) − R(t))/σ(t),        (6.3)

and define {Z(t)}t≥0 by (6.1). Assume that {Z(t)}t≥0 is a martingale (e.g., {θ(t)}t≥0 satisfies
the Novikov condition (4.20)). The probability measure P̃ equivalent to P given by (6.2) is
called the risk-neutral probability measure of the market at time T.
Note that, by the definition (6.3) of the stochastic process {θ(t)}t≥0, we can rewrite dS(t)
as

        dS(t) = R(t)S(t)dt + σ(t)S(t)dW̃(t),        (6.4)

where

        dW̃(t) = dW(t) + θ(t)dt.        (6.5)

By the Girsanov theorem, Theorem 4.11, the stochastic process {W̃(t)}t≥0 is a P̃-Brownian
motion. Moreover, {FW(t)}t≥0 is a non-anticipating filtration for {W̃(t)}t≥0. We also recall
that a portfolio {hS(t), hB(t)}t≥0 is self-financing if it is adapted to {FW(t)}t≥0 and if its
value {V(t)}t≥0 satisfies

        dV(t) = hS(t)dS(t) + hB(t)dB(t),        (6.6)

see Definition 4.4.
Theorem 6.1. Consider the 1+1 dimensional market

        dS(t) = µ(t)S(t)dt + σ(t)S(t)dW(t),   dB(t) = B(t)R(t)dt,        (6.7)

where the market parameters {µ(t)}t≥0, {σ(t)}t≥0, {R(t)}t≥0 are adapted to {FW(t)}t≥0.
Assume that σ(t) > 0 almost surely for all times. Then the following holds.

(i) The discounted stock price {S∗(t)}t≥0 is a P̃-martingale in the filtration {FW(t)}t≥0.

(ii) A portfolio process {hS(t), hB(t)}t≥0 adapted to {FW(t)}t≥0 is self-financing if and only
if its discounted value satisfies

        V∗(t) = V(0) + ∫_0^t D(s)hS(s)σ(s)S(s) dW̃(s).        (6.8)

(iii) If {hS(t), hB(t)}t≥0 is a self-financing portfolio, then {hS(t), hB(t)}t≥0 is not an arbi-
trage.
Proof. (i) By (6.4) and dD(t) = −D(t)R(t)dt we have

        dS∗(t) = S(t)dD(t) + D(t)dS(t) + dD(t)dS(t)
               = −S(t)R(t)D(t) dt + D(t)(R(t)S(t) dt + σ(t)S(t) dW̃(t))
               = D(t)σ(t)S(t)dW̃(t),

and so the discounted price {S∗(t)}t≥0 of the stock is a P̃-martingale relative to {FW(t)}t≥0.

(ii) By (6.7) and hB(t)B(t) = V(t) − hS(t)S(t), the value (4.32) of self-financing portfolios
can be written as

        dV(t) = hS(t)S(t)[(µ(t) − R(t))dt + σ(t)dW(t)] + V(t)R(t)dt.        (6.9)

Hence

        dV(t) = hS(t)S(t)σ(t)[dW(t) + θ(t)dt] + V(t)R(t)dt = hS(t)S(t)σ(t)dW̃(t) + V(t)R(t)dt.

Thus the discounted portfolio value V∗(t) = D(t)V(t) satisfies

        dV∗(t) = V(t)dD(t) + D(t)dV(t) + dD(t)dV(t)
               = −D(t)V(t)R(t) dt + D(t)hS(t)S(t)σ(t)dW̃(t) + D(t)V(t)R(t)dt
               = D(t)hS(t)S(t)σ(t)dW̃(t),

which proves (6.8).

(iii) By (6.8), the discounted value of self-financing portfolios is a P̃-martingale relative to
the filtration {FW(t)}t≥0. As P̃ and P are equivalent, (iii) follows by Theorem 3.16.

Remark 6.1 (Arbitrage-free principle). The absence of self-financing arbitrage portfolios
in the 1+1 dimensional market (6.7) is consistent with the observations. In fact, even if
arbitrage opportunities may exist in real markets, they are very rare and last for very short
times, as they are quickly exploited by investors. In general, when a stochastic model for
the price of an asset is introduced, we require that it satisfy the arbitrage-free
principle, namely that any self-financing portfolio invested in this asset and the risk-free
asset should not be an arbitrage. Theorem 6.1 shows that the generalized geometric Brownian
motion satisfies the arbitrage-free principle, provided σ(t) > 0 a.s. for all times.

6.2 The risk-neutral pricing formula


Consider the European derivative with pay-off Y and time of maturity T > 0. We assume
that Y is FW (T )-measurable. Suppose that the derivative is sold at time t < T for the price
ΠY(t). The first concern of the seller is to hedge the derivative, that is to say, to invest the
amount ΠY(t) in such a way that the value of the seller's portfolio at time T is enough to
pay off the buyer of the derivative. The purpose of this section is to define a theoretical price
for the derivative which makes it possible for the seller to set up such a hedging portfolio.
We argue under the following assumptions:

1. the seller is only allowed to invest the amount ΠY (t) in the 1+1 dimensional market
consisting of the underlying stock and the risk-free asset;

2. the investment strategy of the seller is self-financing.

It follows by Theorem 6.1 that the sought hedging portfolio is not an arbitrage. We may
interpret this fact as a “fairness” condition on the price of the derivative ΠY (t). In fact, if the
seller can hedge the derivative and still be able to make a risk-less profit on the underlying
stock, this may be considered unfair for the buyer.
We thus consider the 1+1 dimensional market

dS(t) = µ(t)S(t)dt + σ(t)S(t)dW (t), dB(t) = B(t)R(t)dt,

where we assume that the market parameters {µ(t)}t≥0 , {σ(t)}t≥0 , {R(t)}t≥0 are adapted to
{FW (t)}t≥0 and that σ(t) > 0 almost surely for all times. Let {hS (t), hB (t)}t≥0 be a self-
financing portfolio invested in this market and let {V (t)}t≥0 be its value. By Theorem 6.1,
the discounted value {V*(t)}t≥0 of the portfolio is a P̃-martingale relative to the filtration
{F_W(t)}t≥0, hence

D(t)V(t) = Ẽ[D(T)V(T)|F_W(t)].
Requiring the hedging condition V(T) = Y gives

V(t) = (1/D(t)) Ẽ[D(T)Y|F_W(t)].

Since D(t) is F_W(t)-measurable, we can move it inside the expectation and write the latter
equation as

V(t) = Ẽ[Y D(T)/D(t)|F_W(t)] = Ẽ[Y exp(−∫_t^T R(s) ds)|F_W(t)],

where we used the definition D(t) = exp(−∫_0^t R(s) ds) of the discounting process. If the
derivative is sold at time t for the price Π_Y(t), then the value of the seller's portfolio at
this time is precisely equal to the premium Π_Y(t), which leads to the following definition.
Definition 6.2. Let Y be an F_W(T)-measurable random variable with finite expectation. The
risk-neutral price (or fair price) at time t ∈ [0, T] of the European derivative with pay-off
Y and time of maturity T > 0 is given by

Π_Y(t) = Ẽ[Y exp(−∫_t^T R(s) ds)|F_W(t)],     (6.10)

i.e., it is equal to the value at time t of any self-financing hedging portfolio invested in the
underlying stock and the bond.
Remark 6.2. Being defined as a conditional expectation, the risk-neutral price can rarely be
computed explicitly. An exception to this is when the market parameters are deterministic,
see Section 6.3, and for some simple stochastic models, see Section 6.5.
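When the market parameters are deterministic constants (the setting of Section 6.3 below), the conditional expectation in (6.10) reduces to a plain expectation that can be estimated by Monte Carlo simulation. The following minimal sketch prices a call pay-off Y = (S(T) − K)_+ at t = 0; the function name `mc_price`, the sample size and the seed are illustrative choices, not part of the text.

```python
import math
import random

def mc_price(S0, K, r, sigma, T, n_paths=200000, seed=1):
    """Monte Carlo estimate of the risk-neutral price (6.10) at t = 0 for the
    call pay-off Y = (S(T) - K)_+, assuming constant market parameters r, sigma.
    Under the risk-neutral measure, S(T) = S0*exp((r - sigma^2/2)T + sigma*sqrt(T)*G),
    with G a standard normal random variable."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_paths):
        G = rng.gauss(0.0, 1.0)
        ST = S0 * math.exp((r - 0.5 * sigma ** 2) * T + sigma * math.sqrt(T) * G)
        acc += max(ST - K, 0.0)              # pay-off of the call
    return math.exp(-r * T) * acc / n_paths  # discounting, as in (6.10)
```

For S0 = K = 100, r = 0.05, σ = 0.2, T = 1 the estimate should be close to the Black-Scholes value, which is approximately 10.45.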
In the particular case of a standard European derivative, i.e., when Y = g(S(T)) for
some measurable function g, the risk-neutral price becomes

Π_Y(t) = Ẽ[g(S(T)) exp(−∫_t^T R(s) ds)|F_W(t)].

By (6.4) we have

S(T) = S(t) exp(∫_t^T (R(s) − σ²(s)/2) ds + ∫_t^T σ(s) dW̃(s)),

hence the risk-neutral price of a standard European derivative takes the form

Π_Y(t) = Ẽ[g(S(t) e^{∫_t^T (R(s)−σ²(s)/2) ds + ∫_t^T σ(s) dW̃(s)}) exp(−∫_t^T R(s) ds)|F_W(t)].     (6.11)

Since the risk-neutral price of the European derivative equals the value of self-financing
hedging portfolios invested in a 1+1 dimensional market, then, by Theorem 6.1, the discounted
risk-neutral price {Π*_Y(t)}t∈[0,T] is a P̃-martingale relative to the filtration {F_W(t)}t≥0. In
fact, this property also follows directly from the pricing formula (6.10), as shown in the first
part of the following theorem.
Theorem 6.2. Consider the 1+1 dimensional market

dS(t) = µ(t)S(t)dt + σ(t)S(t)dW (t), dB(t) = B(t)R(t)dt,

where we assume that {µ(t)}t≥0 , {σ(t)}t≥0 , {R(t)}t≥0 are adapted to {FW (t)}t≥0 and that
σ(t) > 0 almost surely for all times. Assume that the European derivative on the stock with
pay-off Y and time of maturity T > 0 is priced by (6.10) and let Π∗Y (t) = D(t)ΠY (t) be the
discounted price of the derivative. Then the following holds.
(i) The process {Π*_Y(t)}t∈[0,T] is a P̃-martingale relative to {F_W(t)}t≥0.
(ii) There exists a stochastic process {∆(t)}t∈[0,T], adapted to {F_W(t)}t≥0, such that

Π*_Y(t) = Π_Y(0) + ∫_0^t ∆(s) dW̃(s),   t ∈ [0, T].     (6.12)

(iii) The portfolio {hS(t), hB(t)}t∈[0,T] given by

hS(t) = ∆(t)/(D(t)σ(t)S(t)),   hB(t) = (Π_Y(t) − hS(t)S(t))/B(t)     (6.13)

is self-financing and replicates the derivative at any time, i.e., its value V(t) is equal
to Π_Y(t) for all t ∈ [0, T]. In particular, V(T) = Π_Y(T) = Y, i.e., the portfolio is
hedging the derivative.
Proof. (i) We have

Π*_Y(t) = D(t)Π_Y(t) = Ẽ[Π_Y(T)D(T)|F_W(t)] = Ẽ[Π*_Y(T)|F_W(t)],

where we used that Π_Y(T) = Y. Hence, for s ≤ t, and using Theorem 3.13(iii),

Ẽ[Π*_Y(t)|F_W(s)] = Ẽ[Ẽ[Π*_Y(T)|F_W(t)]|F_W(s)] = Ẽ[Π*_Y(T)|F_W(s)] = Π*_Y(s).

This shows that the discounted price of the derivative is a P̃-martingale relative to the
filtration {F_W(t)}t∈[0,T].
(ii) By (i) and (3.24) we have

Z(s)Π*_Y(s) = Z(s)Ẽ[Π*_Y(t)|F_W(s)] = E[Z(t)Π*_Y(t)|F_W(s)],     (6.14)

i.e., the stochastic process {Z(t)Π*_Y(t)}t∈[0,T] is a P-martingale relative to the filtration
{F_W(t)}t∈[0,T]. Hence, by the martingale representation theorem, Theorem 4.6, there ex-
ists a stochastic process {Γ(t)}t∈[0,T] adapted to {F_W(t)}t∈[0,T] such that

Z(t)Π*_Y(t) = Π_Y(0) + ∫_0^t Γ(s) dW(s),   t ∈ [0, T],

i.e.,

d(Z(t)Π*_Y(t)) = Γ(t)dW(t).     (6.15a)
On the other hand, by Itô’s product rule,

dΠ*_Y(t) = d(Z(t)Π*_Y(t)/Z(t)) = d(1/Z(t))Z(t)Π*_Y(t) + (1/Z(t))d(Z(t)Π*_Y(t))
         + d(1/Z(t))d(Z(t)Π*_Y(t)).     (6.15b)

By Itô’s formula and dZ(t) = −θ(t)Z(t)dW(t), we obtain

d(1/Z(t)) = −(1/Z(t)²)dZ(t) + (1/Z(t)³)dZ(t)dZ(t) = (θ(t)/Z(t))dW̃(t).     (6.15c)

Hence

d(1/Z(t))d(Z(t)Π*_Y(t)) = (θ(t)Γ(t)/Z(t)) dt.     (6.15d)
Combining Equations (6.15) we have

dΠ*_Y(t) = ∆(t)dW̃(t),   where ∆(t) = θ(t)Π*_Y(t) + Γ(t)/Z(t),

which proves (6.12).


(iii) It is clear that the portfolio {hS(t), hB(t)}t∈[0,T] given by (6.13) is adapted to {F_W(t)}t≥0
and replicates the derivative. Furthermore (6.12) entails that V*(t) = Π*_Y(t) satisfies (6.8),
hence, by Theorem 6.1(ii), {hS(t), hB(t)}t∈[0,T] is a self-financing portfolio, and the proof is
completed.
Consider now the 1+1 dimensional market consisting of the European derivative and the
risk-free asset. The value of a self-financing portfolio invested in this market satisfies

dV (t) = hY (t)dΠY (t) + hB (t)dB(t), V (t) = hY (t)ΠY (t) + hB (t)B(t),

where hY(t) is the number of shares of the derivative in the portfolio. It follows by (6.12)
that the discounted value of this portfolio satisfies

dV*(t) = −R(t)D(t)V(t)dt + D(t)hY(t)dΠ_Y(t) + D(t)hB(t)B(t)R(t)dt
       = −R(t)D(t)hY(t)Π_Y(t)dt + D(t)hY(t)dΠ_Y(t) = hY(t)d(D(t)Π_Y(t))
       = hY(t)∆(t)dW̃(t).

We infer that the discounted value process {V*(t)}t≥0 is a P̃-martingale relative to {F_W(t)}t≥0.
Hence, by Theorem 3.16, the portfolio is not an arbitrage and therefore the risk-neutral price
model for European derivatives satisfies the arbitrage-free principle, see Remark 6.1.

6.3 Black-Scholes price of European derivatives


In this section we apply the results of Section 6.2 to the case when the market parameters
{µ(t)}t≥0, {σ(t)}t≥0, {R(t)}t≥0 are deterministic constants. In particular, the price of the
stock now follows the geometric Brownian motion (2.14), where α = µ − σ²/2, i.e.,

S(t) = S(0)e^{αt+σW(t)} = S(0)e^{(µ−σ²/2)t+σW(t)} = S(0)e^{(r−σ²/2)t+σW̃(t)},     (6.16)

where W̃(t) = W(t) + ((µ−r)/σ)t is a Brownian motion in the risk-neutral probability measure.
We assume in the following that the European derivative is standard. Replacing σ(s) = σ > 0
and R(s) = r > 0 into (6.11), we obtain

Π_Y(t) = e^{−rτ} Ẽ[g(S(t)e^{(r−σ²/2)τ} e^{σ(W̃(T)−W̃(t))})|F_W(t)],

where

τ = T − t

is the time left to the expiration of the derivative. As F_W(t) = F_W̃(t), we obtain

Π_Y(t) = e^{−rτ} Ẽ[g(S(t)e^{(r−σ²/2)τ} e^{σ(W̃(T)−W̃(t))})|F_W̃(t)].

As the increment W̃(T) − W̃(t) is independent of F_W̃(t), the conditional expectation above
is a pure expectation, see Theorem 3.14(i), and so

Π_Y(t) = e^{−rτ} E[g(S(t)e^{(r−σ²/2)τ} e^{σ(W̃(T)−W̃(t))})].     (6.17)

(Note that we no longer need to be in the risk-neutral world.) Finally, since
W̃(T) − W̃(t) ∈ N(0, τ), we obtain
Π_Y(t) = (e^{−rτ}/√(2πτ)) ∫_R g(S(t)e^{(r−σ²/2)τ} e^{σy}) e^{−y²/(2τ)} dy,     (6.18)

that is,

Π_Y(t) = vg(t, S(t)),     (6.19a)

where the Black-Scholes price function vg : D_T^+ → R is given by

vg(t, x) = (e^{−rτ}/√(2π)) ∫_R g(xe^{(r−σ²/2)τ} e^{σ√τ y}) e^{−y²/2} dy.     (6.19b)
Definition 6.3. Let g : (0, ∞) → R be a C² function and assume that g′, g″ are uniformly
bounded. The stochastic process {Π_Y(t)}t∈[0,T] given by (6.19) is called the Black-Scholes
price of the standard European derivative with pay-off Y = g(S(T)) and time of maturity
T > 0.
Remark 6.3. Our assumptions on the pay-off function g can be considerably weakened,
but since they cover all real-world applications, we shall not pursue this here. Note that,
under our assumptions, vg ∈ C^{1,2}(D_T^+) and ∂_x vg is uniformly bounded.
Remark 6.4. The fact that the Black-Scholes price of the derivative at time t is a deter-
ministic function of S(t), that is, Π_Y(t) = vg(t, S(t)), is an important property for the
applications. In fact, thanks to this property, at time t we may look at the price S(t) of
the stock in the market and compute explicitly the theoretical price Π_Y(t) of the derivative.
This theoretical value is, in general, different from the real market price. We shall discuss
how to interpret this difference in Section 6.3.2. Moreover, as shown below, the formula (6.19)
is equivalent to the Markov property of the geometric Brownian motion (6.16) in the
risk-neutral probability measure P̃.
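As a concrete illustration, the integral (6.19b) can be approximated by standard numerical quadrature. The sketch below uses the trapezoidal rule on a truncated domain; the truncation interval [−8, 8], the number of nodes and the function name `vg_quad` are choices made here for illustration.

```python
import math

def vg_quad(t, x, g, r, sigma, T, n=4000):
    """Approximate the Black-Scholes price function (6.19b),
    vg(t,x) = e^{-r tau}/sqrt(2 pi) * int_R g(x e^{(r-sigma^2/2)tau + sigma sqrt(tau) y}) e^{-y^2/2} dy,
    by the trapezoidal rule on the truncated domain [-8, 8]."""
    tau = T - t
    a, b = -8.0, 8.0                     # the Gaussian factor is negligible outside
    h = (b - a) / n
    acc = 0.0
    for i in range(n + 1):
        y = a + i * h
        w = 0.5 if i in (0, n) else 1.0  # trapezoidal weights
        s = x * math.exp((r - 0.5 * sigma ** 2) * tau + sigma * math.sqrt(tau) * y)
        acc += w * g(s) * math.exp(-0.5 * y * y)
    return math.exp(-r * tau) * h * acc / math.sqrt(2.0 * math.pi)
```

For the call pay-off g(s) = (s − K)_+ with x = K = 100, r = 0.05, σ = 0.2, τ = 1, this reproduces the closed-form value ≈ 10.45 derived in Section 6.3.1 below.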

We can rewrite the Black-Scholes price function as vg(t, x) = h(T − t, x), where, by a
change of variable in the integral on the right hand side of (6.19b),

h(τ, x) = ∫_R g(y)q(τ, x, y) dy,

where

q(τ, x, y) = (e^{−rτ} I_{y>0})/(y√(2πσ²τ)) exp[−(1/(2σ²τ))(log(y/x) − (r − σ²/2)τ)²].
Comparing this expression with (3.31), we see that we can write the function q as

q(τ, x, y) = e^{−rτ} p*(τ, x, y),

where p* is the transition density of the geometric Brownian motion (6.16). In particular, the
risk-neutral pricing formula of a standard European derivative when the market parameters
are constant is equivalent to the identity

Ẽ[g(S(T))|F_W̃(t)] = ∫_R p*(T − t, S(t), y)g(y) dy,

and thus, since 0 ≤ t ≤ T are arbitrary, it is equivalent to the Markov property of the
geometric Brownian motion (6.16) in the risk-neutral probability measure P̃, see again Ex-
ercise 3.34. We shall generalize this discussion to markets with non-constant parameters in
Section 6.5. Note also that replacing s = 0, t = τ, α = r − σ²/2 into (3.33), and letting
u(τ, x) = e^{rτ}h(τ, x), we obtain that u satisfies

−∂_τ u + rx∂_x u + (1/2)σ²x²∂²_x u = 0,   u(0, x) = h(0, x) = vg(T, x) = g(x).

Hence the function h(τ, x) satisfies

−∂_τ h + rx∂_x h + (1/2)σ²x²∂²_x h = rh,   h(0, x) = g(x).
As vg (t, x) = h(T − t, x), we obtain the following result.
Theorem 6.3. The Black-Scholes price function vg is the unique strong solution of the
Black-Scholes PDE

∂_t vg + rx∂_x vg + (1/2)σ²x²∂²_x vg = rvg,   (t, x) ∈ D_T^+,     (6.20a)

with the terminal condition

vg(T, x) = g(x).     (6.20b)
Exercise 6.1. Write a Matlab code that computes the finite difference solution of the prob-
lem (6.20). Use the Crank-Nicolson method presented in Section 5.4.2.
Remark 6.5. For the previous exercise one needs to fix the boundary condition at x = 0
for (6.20a). It is easy to show that the boundary value at x = 0 of the Black-Scholes price
function is given by

vg(t, 0) = g(0)e^{r(t−T)},   for all t ∈ [0, T].     (6.21)

In fact, on the one hand, letting x = 0 in (6.20a) we obtain that v(t) = vg(t, 0) satisfies
dv/dt = rv, hence v(t) = v(T)e^{r(t−T)}. On the other hand, v(T) = vg(T, 0) = g(0).
Thus (6.21) follows. For instance, in the case of a call, i.e., when g(z) = (z − K)_+, we
obtain vg(t, 0) = 0 for all t ∈ [0, T]; hence the risk-neutral price of a call option is zero
when the price of the underlying stock tends to zero. That this should be the case is clear,
for the call will never expire in the money if the price of the stock is arbitrarily small. For a
put option, i.e., when g(z) = (K − z)_+, we have vg(t, 0) = Ke^{−rτ}; hence the risk-neutral
price of a put option is given by the discounted value of the strike price when the price of the
underlying stock tends to zero. This is also clear, since in this case the put option will
certainly expire in the money, i.e., its value at maturity is K with probability one, and so the
value at any earlier time is given by discounting its terminal value.
Next we compute the hedging portfolio of the derivative.
Theorem 6.4. Consider a standard European derivative priced according to Definition 6.3.
The portfolio {hS(t), hB(t)} given by

hS(t) = ∂_x vg(t, S(t)),   hB(t) = (Π_Y(t) − hS(t)S(t))/B(t)

is a self-financing hedging portfolio for the derivative.

Proof. According to Theorem 6.2, we have to show that the discounted value of the Black-
Scholes price satisfies

dΠ*_Y(t) = D(t)S(t)σ∂_x vg(t, S(t))dW̃(t).

A straightforward calculation, using Π_Y(t) = vg(t, S(t)), Itô’s formula and Itô’s product rule,
gives

d(D(t)Π_Y(t)) = D(t)[∂_t vg(t, x) + rx∂_x vg(t, x) + (1/2)σ²x²∂²_x vg(t, x) − rvg(t, x)]_{x=S(t)} dt
              + D(t)σS(t)∂_x vg(t, S(t))dW̃(t).     (6.22)

Since vg solves the Black-Scholes PDE (6.20a), the result follows.
Exercise 6.2. Work out the details of the computation leading to (6.22).
Exercise 6.3. Find the risk-neutral price at time t = 0 of standard European derivatives
assuming that the market parameters are deterministic functions of time.

6.3.1 Black-Scholes price of European vanilla options


In this section we focus the discussion on call/put options, which are also called vanilla
options. We thereby assume that the pay-off of the derivative is given by
Y = (S(T ) − K)+ , i.e., Y = g(S(T )), g(x) = (x − K)+ , for a call option,
Y = (K − S(T ))+ , i.e., Y = g(S(T )), g(x) = (K − x)+ , for a put option.
The function vg given by (6.19b) will be denoted by c, for a call option, and by p, for a put
option.
Remark 6.6. Strictly speaking, the pay-off functions for call/put options do not satisfy the
regularity assumptions in Definition 6.3. For instance, g(x) = (x − K)+ is not differentiable
at x = K and so the Black-Scholes price function c(t, x) does not extend smoothly on the
boundary t = T . We shall ignore this technicality and still refer to c(t, x) as a strong solution
of the Black-Scholes PDE for call options.
Theorem 6.5. The Black-Scholes price at time t of a European call option with strike price
K > 0 and maturity T > 0 is given by c(t, S(t)), where

c(t, x) = xΦ(d1) − Ke^{−rτ}Φ(d2),     (6.23a)

d2 = (log(x/K) + (r − σ²/2)τ)/(σ√τ),   d1 = d2 + σ√τ,     (6.23b)

and where Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−y²/2} dy is the standard normal distribution function.
The Black-Scholes price of the corresponding put option is given by p(t, S(t)), where

p(t, x) = Φ(−d2)Ke^{−rτ} − Φ(−d1)x.     (6.24)

Moreover the put-call parity identity holds:

c(t, S(t)) − p(t, S(t)) = S(t) − Ke^{−rτ}.     (6.25)

Proof. We derive the Black-Scholes price of call options only, the argument for put options
being similar (see Exercise 6.4). We substitute g(z) = (z − K)_+ into the right hand side
of (6.19b) and obtain

c(t, x) = (e^{−rτ}/√(2π)) ∫_R (xe^{(r−σ²/2)τ}e^{σ√τ y} − K)_+ e^{−y²/2} dy.

Now we use that xe^{(r−σ²/2)τ}e^{σ√τ y} > K if and only if y > −d2. Hence

c(t, x) = (e^{−rτ}/√(2π)) (xe^{(r−σ²/2)τ} ∫_{−d2}^∞ e^{σ√τ y}e^{−y²/2} dy − K ∫_{−d2}^∞ e^{−y²/2} dy).

Using −y²/2 + σ√τ y = −(y − σ√τ)²/2 + σ²τ/2 and changing variable in the integrals we obtain

c(t, x) = (e^{−rτ}/√(2π)) (xe^{rτ} ∫_{−d2}^∞ e^{−(y−σ√τ)²/2} dy − K ∫_{−d2}^∞ e^{−y²/2} dy)
        = (e^{−rτ}/√(2π)) (xe^{rτ} ∫_{−∞}^{d2+σ√τ} e^{−y²/2} dy − K ∫_{−∞}^{d2} e^{−y²/2} dy)
        = xΦ(d1) − Ke^{−rτ}Φ(d2).

As to the put-call parity, we have

c(t, x) − p(t, x) = xΦ(d1) − Ke^{−rτ}Φ(d2) − Φ(−d2)Ke^{−rτ} + xΦ(−d1)
                  = x(Φ(d1) + Φ(−d1)) − Ke^{−rτ}(Φ(d2) + Φ(−d2)).

As Φ(z) + Φ(−z) = 1, the claim follows.
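The closed formulas (6.23)-(6.24) are straightforward to implement, and an implementation lets one verify the put-call parity (6.25) numerically. The following sketch expresses Φ through the error function; the function names are ours.

```python
import math

def Phi(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call_put(t, x, K, r, sigma, T):
    """Black-Scholes call and put prices, formulas (6.23)-(6.24)."""
    tau = T - t
    d2 = (math.log(x / K) + (r - 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d1 = d2 + sigma * math.sqrt(tau)
    c = x * Phi(d1) - K * math.exp(-r * tau) * Phi(d2)
    p = Phi(-d2) * K * math.exp(-r * tau) - Phi(-d1) * x
    return c, p
```

For any choice of the parameters, c − p equals x − Ke^{−rτ}, as the put-call parity requires.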

Exercise 6.4. Derive the Black-Scholes price p(t, S(t)) of European put options claimed in
Theorem 6.5.

Remark 6.7. The formulas (6.23)-(6.24) appeared for the first time in the seminal paper [2],
where they were derived by a completely different argument than the one presented here.

As to the self-financing hedging portfolio for the call/put option, we have hS(t) =
∂_x c(t, S(t)) for call options and hS(t) = ∂_x p(t, S(t)) for put options, see Theorem 6.4, while
the number of shares of the bond in the hedging portfolio is given by

hB(t) = (c(t, S(t)) − S(t)∂_x c(t, S(t)))/B(t), for call options,

and

hB(t) = (p(t, S(t)) − S(t)∂_x p(t, S(t)))/B(t), for put options.

Let us compute ∂_x c:

∂_x c = Φ(d1) + xΦ′(d1)∂_x d1 − Ke^{−rτ}Φ′(d2)∂_x d2.

126
As ∂_x d1 = ∂_x d2 = 1/(σ√τ x) and Φ′(x) = e^{−x²/2}/√(2π), we obtain

∂_x c = Φ(d1) + (1/(σ√(2πτ))) (e^{−d1²/2} − (K/x)e^{−rτ}e^{−d2²/2}).

Replacing d1 = d2 + σ√τ we obtain

∂_x c = Φ(d1) + (e^{−d2²/2}/(σ√(2πτ))) (e^{−σ²τ/2−d2σ√τ} − (K/x)e^{−rτ}).

Using the definition of d2, the term within round brackets in the previous expression is easily
found to be zero, hence

∂_x c = Φ(d1).

By the put-call parity we find also

∂_x p = Φ(d1) − 1 = −Φ(−d1).
Note that ∂x c > 0, while ∂x p < 0. This agrees with the fact that call options are bought to
protect a short position on the underlying stock, while put options are bought to protect a
long position on the underlying stock.
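The identity ∂_x c = Φ(d1) just derived can be checked numerically by comparing the closed formula with a central finite-difference quotient of the call price. The helper names below are illustrative.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call(x, K, r, sigma, tau):
    """Black-Scholes call price (6.23a), as a function of the time to maturity tau."""
    d2 = (math.log(x / K) + (r - 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return x * Phi(d2 + sigma * math.sqrt(tau)) - K * math.exp(-r * tau) * Phi(d2)

def delta_exact(x, K, r, sigma, tau):
    """The closed-form delta, Phi(d1)."""
    d1 = (math.log(x / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return Phi(d1)

def delta_fd(x, K, r, sigma, tau, h=1e-4):
    """Central finite-difference approximation of the delta."""
    return (call(x + h, K, r, sigma, tau) - call(x - h, K, r, sigma, tau)) / (2.0 * h)
```

The two values agree up to the O(h²) accuracy of the central difference.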
Exercise 6.5 (•). Consider a European derivative with maturity T and pay-off Y given by
Y = k + S(T ) log S(T ),
where k > 0 is a constant. Find the Black-Scholes price of the derivative at time t < T and
the hedging self-financing portfolio. Find the probability that the derivative expires in the
money.

6.3.2 The greeks. Implied volatility and volatility curve


The Black-Scholes price of a call (or put) option derived in Theorem 6.5 depends on the price
of the underlying stock, the time to maturity, the strike price, as well as on the (constant)
market parameters r, σ (it does not depend on α). The partial derivatives of the price
function c with respect to these variables are called greeks. We collect the most important
ones (for call options) in the following theorem.
Theorem 6.6. The price function c of call options satisfies the following, where φ = Φ′
denotes the standard normal density:

∆ := ∂_x c = Φ(d1),     (6.26)

Γ := ∂²_x c = φ(d1)/(xσ√τ),     (6.27)

ρ := ∂_r c = Kτe^{−rτ}Φ(d2),     (6.28)

Θ := ∂_t c = −xφ(d1)σ/(2√τ) − rKe^{−rτ}Φ(d2),     (6.29)

ν := ∂_σ c = xφ(d1)√τ (called “vega”).     (6.30)
127
In particular:
• ∆ > 0, i.e., the price of a call is increasing on the price of the underlying stock;
• Γ > 0, i.e., the price of a call is convex on the price of the underlying stock;
• ρ > 0, i.e., the price of the call is increasing on the interest rate of the bond;
• Θ < 0, i.e., the price of the call is decreasing in time;
• ν > 0, i.e., the price of the call is increasing on the volatility of the stock.
Exercise 6.6. Use the put-call parity to derive the greeks of put options.
The greeks measure the sensitivity of option prices with respect to the market conditions.
This information can be used to draw some important conclusions. Let us comment for
instance on the fact that vega is positive. It implies that an investor with a long position
on a call option wishes the volatility of the underlying stock to increase. As usual, since
this might not happen, the investor's portfolio is exposed to possible losses due to a decrease
of the stock volatility (which makes the call option in the portfolio lose value). This
exposure can be secured by adding variance swaps to the portfolio, see Section 6.5.3.
Exercise 6.7. Prove that

lim_{σ→0+} c(t, x) = (x − Ke^{−rτ})_+,   lim_{σ→∞} c(t, x) = x.
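These limits can be checked numerically with the closed formula (6.23a) (which is of course no substitute for the proof). In the sketch below the parameter values mimic those of Figure 6.1, where S(t) = 10, K = 12, r = 0.01, τ = 1/12, so that (x − Ke^{−rτ})_+ = 0.

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call(x, K, r, sigma, tau):
    """Black-Scholes call price (6.23a)."""
    d2 = (math.log(x / K) + (r - 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return x * Phi(d2 + sigma * math.sqrt(tau)) - K * math.exp(-r * tau) * Phi(d2)

# sigma -> 0+: the price approaches (x - K e^{-r tau})_+ = 0 for these parameters
small_sigma = call(10.0, 12.0, 0.01, 1e-6, 1.0 / 12.0)
# sigma -> infinity: the price approaches x = 10
large_sigma = call(10.0, 12.0, 0.01, 100.0, 1.0 / 12.0)
```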

Implied volatility

Let us temporarily re-denote the Black-Scholes price of the call as

c(t, S(t), K, T, σ),

which reflects the dependence of the price on the parameters K, T, σ (we disregard the
dependence on r). As shown in Theorem 6.6,

∂_σ c(t, S(t), K, T, σ) = vega = (S(t)√τ/√(2π)) e^{−d1²/2} > 0.

Hence the Black-Scholes price of the option is an increasing function of the volatility. Fur-
thermore, by Exercise 6.7,

lim_{σ→0+} c(t, S(t), K, T, σ) = (S(t) − Ke^{−rτ})_+,   lim_{σ→+∞} c(t, S(t), K, T, σ) = S(t).

Therefore the function c(t, S(t), K, T, ·) is a one-to-one map from (0, ∞) onto the interval
I = ((S(t) − Ke^{−rτ})_+, S(t)), see Figure 6.1. Now suppose that at some given fixed time t
the real market price of the call is c̃(t). Clearly, the option is always cheaper than the stock
(otherwise we would buy directly the stock, and not the option) and typically we also have
c̃(t) > max(0, S(t) − Ke^{−rτ}). The latter is always true if S(t) < Ke^{−rτ} (the price of options
is positive), while for S(t) > Ke^{−rτ} this follows from the fact that S(t) − Ke^{−rτ} ≈ S(t) − K
and real calls are always more expensive than their intrinsic value. This being said, we can
safely assume that c̃(t) ∈ I.

Figure 6.1: We fix S(t) = 10, K = 12, r = 0.01, τ = 1/12 and depict the Black-Scholes price
of the call as a function of the volatility. Note that in practice only the very left part of this
picture is of interest, because typically 0 < σ < 1.

Thus given the value of c̃(t) there exists a unique value of σ, which depends on the fixed
parameters T, K and which we denote by σimp(T, K), such that

c(t, S(t), K, T, σimp(T, K)) = c̃(t).

σimp(T, K) is called the implied volatility of the option. The implied volatility must be
computed numerically (for instance using Newton's method), since there is no closed formula
for it.
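A minimal sketch of the Newton iteration for the implied volatility, using the closed formula (6.23a) and the vega (6.30) as derivative; the initial guess σ0 = 0.5, the tolerances and the function names are choices made here for illustration.

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call(x, K, r, sigma, tau):
    """Black-Scholes call price (6.23a)."""
    d2 = (math.log(x / K) + (r - 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return x * Phi(d2 + sigma * math.sqrt(tau)) - K * math.exp(-r * tau) * Phi(d2)

def vega(x, K, r, sigma, tau):
    """Vega (6.30): x phi(d1) sqrt(tau)."""
    d1 = (math.log(x / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return x * math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi) * math.sqrt(tau)

def implied_vol(c_market, x, K, r, tau, sigma0=0.5, tol=1e-10, max_iter=100):
    """Newton iteration for sigma_imp: solve call(x, K, r, sigma, tau) = c_market."""
    sigma = sigma0
    for _ in range(max_iter):
        diff = call(x, K, r, sigma, tau) - c_market
        if abs(diff) < tol:
            break
        sigma -= diff / vega(x, K, r, sigma, tau)
    return sigma
```

A simple sanity check is the round trip: pricing with a given σ and then inverting should recover σ.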
The implied volatility of an option (in this example of a call option) is a very important
parameter and it is often quoted together with the price of the option. If the market followed
exactly the assumptions in the Black-Scholes theory, then the implied volatility would be a
constant, independent of T, K and equal to the volatility of the underlying asset. In this
respect, σimp (T, K) may be viewed as a quantitative measure of how real markets deviate
from ideal Black-Scholes markets. Furthermore, the implied volatility may be viewed as the
market consensus on the future value of the volatility of the underlying stock. Recall in fact
that in order for the Black-Scholes price of the option to be c(t, S(t), K, T, σimp (T, K)), the
volatility of the stock should be equal to σimp (T, K) in the time interval [t, T ] in the future.
Hence by pricing the option at the price c̃(t) = c(t, S(t), K, T, σimp (T, K)), the market is
telling us that the buyers and sellers of the option believe that the volatility of the stock in
the future will be σimp (T, K).
By way of example, in Figure 6.2 the implied volatility is determined (graphically) for
various Apple call options on May 12, 2014, when the stock was quoted at 585.54 dollars
(closing price of the previous market day). All options expire on June 13, 2014 (τ = 1
month = 1/12). The value r = 0.01 has been used, but the results do not change significantly
[Figure 6.2: four panels showing c(σ) for strikes K = 565 (c̃(t) = 24.35), K = 575 (c̃(t) = 19.28), K = 585 (c̃(t) = 14.5) and K = 590 (c̃(t) = 11.47)]
Figure 6.2: Implied volatility of various call options on the Apple stock

even assuming r = 0.05. In the pictures, K denotes the strike price and c̃(t) the call price.
We observe that the implied volatility is 20% in three cases, while for the call with strike
K = 565 dollars the implied volatility is a little smaller (≈ 16%), which means that the
latter call is slightly underpriced compared to the others.

Volatility curve

As mentioned before, the implied volatility depends on the parameters T, K. Here we are
particularly interested in the dependence on the strike price, hence we re-denote the implied
volatility as σimp(K). If the market behaved exactly as in the Black-Scholes theory, then
σimp(K) = σ for all values of K, hence the graph of K → σimp(K) would be just a straight
horizontal line. Given that real markets do not satisfy exactly the assumptions in the Black-
Scholes theory, what can we say about the graph of the volatility curve K → σimp(K)?
Remarkably, it has been found that there exist recurrent convex shapes for the graph of
volatility curves, which are known as volatility smile and volatility skew, see Figure 6.3.

[Figure 6.3: two panels plotting the implied volatility against the strike, with the in-the-money and out-of-the-money regions marked]

(a) Smile (b) Skew

Figure 6.3: Volatility smile and skew of a call option (not from real data!)

In the case of a volatility smile, the minimum of the graph is reached at the strike price
K ≈ S(t), i.e., when the call is at the money. This behavior indicates that the further the
call is from being at the money, the more it is overpriced relative to the Black-Scholes price.
Volatility smiles have been recurrent in the market since the crash of 1987 (Black Monday),
indicating that this event led investors to be more cautious when trading options that are in
or out of the money. Volatility skews tell us whether investors prefer to trade call or put
options. Devising mathematical models of volatility and asset prices able to reproduce
volatility curves is an active research topic in financial mathematics. We discuss the most
popular volatility models in Section 6.5.

6.4 European derivatives on a dividend-paying stock


In this section we consider Black-Scholes markets with a dividend-paying stock. This means
that at some time t0 ∈ (0, T) the price of the stock decreases by a fraction a ∈ (0, 1) of its price
immediately before t0, the difference being deposited in the account of the shareholders¹.
Letting S(t0−) = lim_{t→t0−} S(t), we then have

S(t0) = S(t0−) − aS(t0−) = (1 − a)S(t0−).     (6.31)

We assume that on each of the intervals [0, t0), [t0, T], the stock price follows a geometric
Brownian motion, namely,

S(s) = S(t)e^{α(s−t)+σ(W(s)−W(t))},   t ∈ [0, t0), s ∈ [t, t0),     (6.32)

S(s) = S(u)e^{α(s−u)+σ(W(s)−W(u))},   u ∈ [t0, T], s ∈ [u, T].     (6.33)

¹ The dividend is expressed as a percentage of the price of the stock. For instance, a = 0.03 means that the
dividend paid is 3%.

Theorem 6.7. Consider the standard European derivative with pay-off Y = g(S(T)) and
maturity T. Let Π_Y^{(a,t0)}(t) be the Black-Scholes price of the derivative at time t ∈ [0, T]
assuming that the underlying stock pays the dividend aS(t0−) at time t0 ∈ (0, T). Then

Π_Y^{(a,t0)}(t) = vg(t, (1 − a)S(t)), for t < t0,
Π_Y^{(a,t0)}(t) = vg(t, S(t)),        for t ≥ t0,

where vg(t, x) is the Black-Scholes pricing function in the absence of dividends, which is
given by (6.19b).
Proof. Using S(T)/S(t) = e^{ατ+σ(W(T)−W(t))}, we can rewrite (6.17) in the form

Π_Y(t) = e^{−rτ} E[g(S(T)e^{(r−σ²/2−α)τ})].     (6.34)

Taking the limit s → t0− in (6.32) and using the continuity of the paths of the Brownian
motion we find

S(t0−) = S(t)e^{α(t0−t)+σ(W(t0)−W(t))},   t ∈ [0, t0).

Replacing in (6.31) we obtain

S(t0) = (1 − a)S(t)e^{α(t0−t)+σ(W(t0)−W(t))},   t ∈ [0, t0).

Hence, letting (s, u) = (T, t0) and (s, u) = (T, t) into (6.33), we find

S(T) = (1 − a)S(t)e^{ατ+σ(W(T)−W(t))}, for t ∈ [0, t0),
S(T) = S(t)e^{ατ+σ(W(T)−W(t))},        for t ∈ [t0, T].     (6.35)

By the definition of Black-Scholes price in the form (6.34) and denoting G = (W(T) − W(t))/√τ,
we obtain

Π_Y^{(a,t0)}(t) = e^{−rτ} E[g((1 − a)S(t)e^{(r−σ²/2)τ+σ√τ G})], for t ∈ [0, t0),
Π_Y^{(a,t0)}(t) = e^{−rτ} E[g(S(t)e^{(r−σ²/2)τ+σ√τ G})],        for t ∈ [t0, T].

As G ∈ N(0, 1), the result follows.
We conclude that for t ≥ t0 , i.e., after the dividend has been paid, the Black-Scholes
price function of the derivative is again given by (6.19b), while for t < t0 it is obtained
by replacing x with (1 − a)x in (6.19b). To see the effect of this change, suppose that the
derivative is a call option; let c(t, x) be the Black-Scholes price function in the absence of
dividends and ca (t, x) be the price function in the case that a dividend is paid at time t0 .
Then, according to Theorem 6.7,

ca(t, x) = c(t, (1 − a)x), for t < t0,
ca(t, x) = c(t, x),        for t ≥ t0.
Since ∂x c > 0 (see Theorem 6.6), it follows that ca (t, x) < c(t, x), for t < t0 , that is to say,
the payment of a dividend makes the call option on the stock less valuable (i.e., cheaper)
than in the absence of dividends until the dividend is paid.
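Theorem 6.7 translates directly into code: before the dividend date the no-dividend price function is evaluated at (1 − a)x. The sketch below does this for a call (the function names are ours) and exhibits the inequality ca(t, x) < c(t, x) for t < t0.

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call(x, K, r, sigma, tau):
    """Black-Scholes call price (6.23a), no dividends."""
    d2 = (math.log(x / K) + (r - 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return x * Phi(d2 + sigma * math.sqrt(tau)) - K * math.exp(-r * tau) * Phi(d2)

def call_with_dividend(t, x, K, r, sigma, T, a, t0):
    """Price c_a(t, x) from Theorem 6.7: before the dividend date t0 the
    no-dividend formula is evaluated at (1 - a)x, afterwards at x itself."""
    return call((1.0 - a) * x if t < t0 else x, K, r, sigma, T - t)
```

For t ≥ t0 the dividend-adjusted price coincides with the no-dividend price, while for t < t0 it is strictly smaller.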

Exercise 6.8 (?). Give an intuitive explanation for the property just proved for call options
on a dividend paying stock.

Exercise 6.9 (•). A standard European derivative pays the amount Y = (S(T) − S(0))_+
at time of maturity T. Find the Black-Scholes price Π_Y(0) of this derivative at time t = 0
assuming that the underlying stock pays the dividend (1 − e^{−rT})S(T/2−) at time t = T/2.
Compute the probability of positive return for a constant portfolio which is short 1 share of
the derivative and short S(0)e^{−rT} shares of the risk-free asset.

Exercise 6.10. Derive the Black-Scholes price of the derivative with pay-off Y = g(S(T )),
assuming that the underlying pays a dividend at each time t1 < t2 < · · · < tM ∈ [0, T ].
Denote by ai the dividend paid at time ti , i = 1, . . . , M .

6.5 Local and Stochastic volatility models


In this and the next section we present a method to compute the risk-neutral price of
European derivatives when the market parameters are not deterministic functions. We first
assume in this section that the interest rate of the money market is constant, i.e., R(t) = r,
which is quite reasonable for derivatives with short maturity such as options; stochastic
interest rate models are important for pricing derivatives with very long time of maturity,
such us zero-coupon bonds, which are briefly discussed below in Section 6.6 (see also [21,
Section 6.5]). Assuming that the derivative is a standard European derivative with pay-off
function g, the risk-neutral price formula (6.1) becomes

ΠY (t) = e−rτ E[g(S(T


e ))|FW (t)], τ = T − t. (6.36)

Motivated by our earlier results on the Black-Scholes price, and Remark 6.4, we attempt to
re-write the risk-neutral price formula in the form

Π_Y(t) = vg(t, S(t)) for all t ∈ [0, T], for all T > 0,     (6.37)

for some function vg : D_T^+ → (0, ∞), which we call the pricing function of the derivative.
By (6.36), this is equivalent to

Ẽ[g(S(T))|F_W(t)] = e^{rτ} vg(t, S(t)),     (6.38)

i.e., to the property that {S(t)}t≥0 is a Markov process in the risk-neutral probability mea-
sure P̃, relative to the filtration {F_W(t)}t≥0. At this point it remains to understand for
which stochastic processes {σ(t)}t≥0 the generalized geometric Brownian motion (6.4)
satisfies this Markov property. We have seen in Section 5.1 that this holds in particular when
{S(t)}t≥0 satisfies a (system of) stochastic differential equation(s). Next we discuss two
examples which encompass most of the volatility models used in the applications: local
volatility models and stochastic volatility models.

6.5.1 Local volatility models
A local volatility model is a special case of the generalized geometric Brownian motion
in which the instantaneous volatility of the stock {σ(t)}t≥0 is assumed to be a deterministic
function of the stock price S(t). Given a measurable function β : [0, ∞) × [0, ∞) → (0, ∞),
we then let
σ(t)S(t) = β(t, S(t)), (6.39)
into (6.4), so that the stock price process {S(t)}t≥0 satisfies the SDE

dS(t) = rS(t) dt + β(t, S(t))dW


f (t), S(0) = S0 > 0. (6.40)

We assume that this SDE admits a unique global solution, which is true in particular under
the assumptions of Theorem 5.1. To this regard we observe that the drift term α(t, x) = rx
in (6.40) satisfies both (5.3) and (5.4), hence these conditions restrict only the form of the
function β(t, x). In the following we shall also assume that the solution {S(t)}t≥0 of (6.40)
is non-negative a.s. for all t > 0. Note however that the stochastic process solution of (6.40)
will in general hit zero with positive probability at any finite time. For example, letting β(t, x) = √x, the stock price is given by a CIR process (5.25) with b = 0 and so, according to Theorem 5.6, S(t) = 0 with positive probability for all t > 0.

Theorem 6.8. Let g ∈ C²([0, ∞)) (except possibly at finitely many points) be such that g′, g″ are uniformly bounded, and assume that the Kolmogorov PDE

∂_t u + rx ∂_x u + (1/2) β(t, x)² ∂_x² u = 0,   (t, x) ∈ D_T⁺,   (6.41)
associated to (6.40) admits a (necessarily unique) strong solution in the region DT+ satisfying
u(T, x) = g(x). Let also
vg (t, x) = e−rτ u(t, x).
Then we have the following.

(i) v_g satisfies

∂_t v_g + rx ∂_x v_g + (1/2) β(t, x)² ∂_x² v_g = r v_g,   (t, x) ∈ D_T⁺,   (6.42)
and the terminal condition
vg (T, x) = g(x). (6.43)

(ii) The price of the European derivative with pay-off Y = g(S(T )) and maturity T > 0 is
given by (6.37).

(iii) The portfolio given by

hS (t) = ∂x vg (t, S(t)), hB (t) = (ΠY (t) − hS (t)S(t))/B(t)

is a self-financing hedging portfolio.

(iv) The put-call parity holds.
Proof. (i) It is straightforward to verify that vg satisfies (6.42).
(ii) Let X(t) = v_g(t, S(t)). By Itô's formula we find

dX(t) = (∂_t v_g(t, S(t)) + rS(t) ∂_x v_g(t, S(t)) + (1/2) β(t, S(t))² ∂_x² v_g(t, S(t))) dt
      + β(t, S(t)) ∂_x v_g(t, S(t)) dW̃(t).

Hence

d(e^{-rt} X(t)) = e^{-rt} (∂_t v_g + rx ∂_x v_g + (1/2) β(t, x)² ∂_x² v_g − r v_g)(t, S(t)) dt
      + e^{-rt} β(t, S(t)) ∂_x v_g(t, S(t)) dW̃(t).

As vg (t, x) satisfies (6.42), the drift term in the right hand side of the previous equation is
zero. Hence
e^{-rt} v_g(t, S(t)) = v_g(0, S_0) + ∫_0^t e^{-ru} β(u, S(u)) ∂_x v_g(u, S(u)) dW̃(u).   (6.44)

It follows that² the stochastic process {e^{-rt} v_g(t, S(t))}_{t≥0} is a P̃-martingale relative to {F_W(t)}_{t≥0}, i.e.,

Ẽ[e^{-rt_2} v_g(t_2, S(t_2)) | F_W(t_1)] = e^{-rt_1} v_g(t_1, S(t_1)),   for all 0 ≤ t_1 ≤ t_2 ≤ T.

Letting t_1 = t, t_2 = T and using the terminal condition (6.43), we find

v_g(t, S(t)) = e^{-rτ} Ẽ[g(S(T)) | F_W(t)],
which proves (6.37).
(iii) Replacing Π_Y(t) = v_g(t, S(t)) into (6.44), we find

e^{-rt} Π_Y(t) = Π_Y(0) + ∫_0^t e^{-ru} β(u, S(u)) ∂_x v_g(u, S(u)) dW̃(u).

Hence the claim on the hedging portfolio follows by Theorem 6.2.


(iv) Let c(t, x) be the solution of (6.42) with terminal condition g(x) = (x − K)+ and p(t, x)
be the solution with terminal condition g(x) = (K − x)+ , where K > 0 is the strike price
of the call/put option. As (6.42) is linear, c − p solves (6.42) with the terminal condition
(x − K)_+ − (K − x)_+ = x − K =: h(x). The (unique) strong solution v_h of (6.42) with this terminal condition is given by v_h(t, x) = x − Ke^{-r(T−t)}, hence

c(t, x) − p(t, x) = x − Ke^{-r(T−t)},
which is the put-call parity.
Clearly, a closed formula for the solution of (6.41) is rarely available, hence to compute
the price of the derivative one needs to rely on numerical methods, such as those discussed
in Section 5.4.
2
Recall that we assume that Itô’s integrals are martingales!
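As a concrete illustration (not part of the original notes), the following is a minimal sketch of how (6.42) might be solved numerically: an explicit finite-difference scheme marching backwards in time from the terminal pay-off. The function name, grid parameters and the linear-extrapolation boundary treatment are all illustrative choices; a production scheme would typically use an implicit method with model-specific boundary conditions.

```python
import numpy as np

def solve_pricing_pde(beta, g, r, T, x_max, nx=200, nt=4000):
    """Explicit finite-difference sketch for the local-volatility pricing PDE
        dv/dt + r*x*dv/dx + 0.5*beta(t,x)^2 * d^2v/dx^2 = r*v,   v(T,x) = g(x),
    marched backwards in time from t = T to t = 0 on a uniform grid.
    Boundary values are obtained by linear extrapolation (a crude choice)."""
    dx = x_max / nx
    dt = T / nt
    x = np.linspace(0.0, x_max, nx + 1)
    v = g(x).astype(float)                        # terminal condition v(T,x) = g(x)
    for k in range(nt, 0, -1):
        t = k * dt
        vx = (v[2:] - v[:-2]) / (2 * dx)          # central first derivative
        vxx = (v[2:] - 2 * v[1:-1] + v[:-2]) / dx ** 2
        b = beta(t, x[1:-1])
        v_new = v.copy()
        # backward step: v(t-dt) = v(t) + dt*(r*x*v_x + 0.5*b^2*v_xx - r*v)
        v_new[1:-1] = v[1:-1] + dt * (r * x[1:-1] * vx + 0.5 * b ** 2 * vxx - r * v[1:-1])
        v_new[0] = 2 * v_new[1] - v_new[2]        # linear extrapolation at x = 0
        v_new[-1] = 2 * v_new[-2] - v_new[-3]     # and at x = x_max
        v = v_new
    return x, v

# Black-Scholes sanity check: beta(t,x) = sigma*x should reproduce the usual call price
x, v = solve_pricing_pde(beta=lambda t, x: 0.2 * x,
                         g=lambda x: np.maximum(x - 100.0, 0.0),
                         r=0.02, T=1.0, x_max=200.0)
```

Note the stability constraint of the explicit scheme: the time step must satisfy roughly β(t,x)² dt ≤ dx² over the whole grid, which is why nt is taken large here.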

Example: The CEV model
For the constant elasticity variance (CEV) model, we have β(t, S(t)) = σS(t)δ , where
σ > 0, δ > 0 are constants. The SDE for the stock price becomes

dS(t) = rS(t) dt + σ S(t)^δ dW̃(t),   S(0) = S_0 > 0.   (6.45)

Note that for δ = 1 one recovers the Black-Scholes model. For δ ≠ 1, we can construct the
solution of (6.45) using a CIR process, as shown in the following exercise.
Exercise 6.11. Given σ, r and δ ≠ 1, define

a = 2r(δ − 1),   c = −2σ(δ − 1),   b = σ²(2δ − 1)/(2r),   θ = −1/(2(δ − 1)).

Let {X(t)}_{t≥0} be the CIR process

dX(t) = a(b − X(t)) dt + c √X(t) dW̃(t),   X(0) = x > 0.

Show that S(t) = X(t)^θ solves (6.45) with S_0 = x^θ.


It follows by Exercise 6.11, and by Feller’s condition ab ≥ c2 /2 for the positivity of the
CIR process, that the solution of (6.45) remains strictly positive a.s. if δ ≥ 1, while for
0 < δ < 1, the stock price hits zero in finite time with positive probability.
The Kolmogorov PDE (6.41) associated to the CEV model is

∂_t u + rx ∂_x u + (σ²/2) x^{2δ} ∂_x² u = 0,   (t, x) ∈ D_T⁺.
Given a terminal value g at time T as in Theorem 5.4, the previous equation admits a unique solution. However a fundamental solution, in the sense of Theorem 5.5, exists only for δ > 1, as otherwise the stochastic process {S(t)}_{t≥0} hits zero at any finite time with positive probability and therefore the density of the random variable S(t) has a discrete part. The precise form of the (generalized) density f_{S(t)}(x) in the CEV model is known for all values of δ and is given for instance in [16]. An exact formula for call options can be found in [20].
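In the absence of a closed formula, the risk-neutral price (6.36) under the CEV dynamics (6.45) can also be estimated by Monte Carlo simulation. The following sketch (not from the notes; the function name and parameters are illustrative) uses a plain Euler-Maruyama scheme with absorption at zero:

```python
import numpy as np

def cev_call_price_mc(s0, K, T, r, sigma, delta,
                      n_steps=250, n_paths=50_000, seed=0):
    """Monte Carlo sketch for a European call under the risk-neutral CEV
    dynamics dS = r*S dt + sigma*S^delta dW~, using Euler-Maruyama with
    absorption at zero (paths can hit 0 when delta < 1)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, float(s0))
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)
        S += r * S * dt + sigma * S ** delta * dW
        S = np.maximum(S, 0.0)           # absorb paths at zero
    return np.exp(-r * T) * np.mean(np.maximum(S - K, 0.0))

# for delta = 1 the result should be close to the Black-Scholes call price
price = cev_call_price_mc(s0=100.0, K=100.0, T=1.0, r=0.02, sigma=0.3, delta=1.0)
```

The δ = 1 case provides a cheap sanity check of the scheme, since the exact Black-Scholes price is then available for comparison.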

6.5.2 Stochastic volatility models


For local volatility models, the stock price and the instantaneous volatility are both stochastic
processes. However there is only one source of randomness which drives both these processes,
namely a single Brownian motion {W (t)}t≥0 . The next level of generalization consists in
assuming that the stock price and the volatility are driven by two different sources of ran-
domness.
Definition 6.4. Let {W1 (t)}t≥0 , {W2 (t)}t≥0 be two independent Brownian motions and
{FW (t)}t≥0 be their own generated filtration. Let ρ ∈ [−1, 1] be a deterministic constant

and µ, η, β : [0, ∞)3 → R be continuous functions. A stochastic volatility model is a pair
of (non-negative) stochastic diffusion processes {S(t)}t≥0 , {v(t)}t≥0 satisfying the following
system of SDE’s:
dS(t) = μ(t, S(t), v(t)) S(t) dt + √v(t) S(t) dW_1(t),   (6.46)
dv(t) = η(t, S(t), v(t)) dt + β(t, S(t), v(t)) √v(t) (ρ dW_1(t) + √(1 − ρ²) dW_2(t)).   (6.47)
We see from (6.46) that {v(t)}t≥0 is the instantaneous variance of the stock price {S(t)}t≥0 .
Moreover the process {W^{(ρ)}(t)}_{t≥0} given by

W^{(ρ)}(t) = ρ W_1(t) + √(1 − ρ²) W_2(t)

is a Brownian motion satisfying

dW_1(t) dW^{(ρ)}(t) = ρ dt;

in particular the two Brownian motions {W_1(t)}_{t≥0}, {W^{(ρ)}(t)}_{t≥0} are not independent, as their
cross variation is not zero (in fact, by Exercise 4.5, ρ is the correlation of the two Brownian
motions). Hence in a stochastic volatility model the stock price and the volatility are both
stochastic processes driven by two correlated Brownian motions. We assume that {S(t)}t≥0
is non-negative and {v(t)}t≥0 is positive a.s. for all times, although we refrain from discussing
under which general conditions this property holds (we will present an example below).
Our next purpose is to introduce a risk-neutral probability measure such that the dis-
counted price of the stock is a martingale. As we have two Brownian motions in this model,
we shall apply the two-dimensional Girsanov Theorem 4.12 to construct such a probabil-
ity measure. Precisely, let r > 0 be the constant interest rate of the money market and
γ : [0, ∞)³ → R be a continuous function. We define

θ_1(t) = (μ(t, S(t), v(t)) − r)/√v(t),   θ_2(t) = γ(t, S(t), v(t)),   θ(t) = (θ_1(t), θ_2(t)).

Given T > 0, we introduce the new probability measure P̃^{(γ)} equivalent to P by P̃^{(γ)}(A) = E[Z(T) I_A], for all A ∈ F, where

Z(t) = exp( −∫_0^t θ_1(s) dW_1(s) − ∫_0^t θ_2(s) dW_2(s) − (1/2) ∫_0^t ‖θ(s)‖² ds ).
Then by Theorem 4.12, the stochastic processes
W̃_1(t) = W_1(t) + ∫_0^t θ_1(s) ds,   W̃_2^{(γ)}(t) = W_2(t) + ∫_0^t γ(s) ds

are two P̃^{(γ)}-independent Brownian motions. Moreover (6.46)-(6.47) can be rewritten as

dS(t) = rS(t) dt + √v(t) S(t) dW̃_1(t),   (6.48a)
dv(t) = [η(t, S(t), v(t)) − √v(t) ψ(t, S(t), v(t)) β(t, S(t), v(t))] dt
      + β(t, S(t), v(t)) √v(t) dW̃^{(ρ,γ)}(t),   (6.48b)

where {ψ(t, S(t), v(t))}_{t≥0} is the {F_W(t)}_{t≥0}-adapted stochastic process given by

ψ(t, S(t), v(t)) = ρ (μ(t, S(t), v(t)) − r)/√v(t) + γ(t, S(t), v(t)) √(1 − ρ²)   (6.49)

and where

W̃^{(ρ,γ)}(t) = ρ W̃_1(t) + √(1 − ρ²) W̃_2^{(γ)}(t).

Note that the P̃^{(γ)}-Brownian motions {W̃_1(t)}_{t≥0}, {W̃^{(ρ,γ)}(t)}_{t≥0} satisfy

dW̃_1(t) dW̃^{(ρ,γ)}(t) = ρ dt,   for ρ ∈ [−1, 1].   (6.50)
It follows immediately that the discounted price {e^{-rt} S(t)}_{t≥0} is a P̃^{(γ)}-martingale relative to the filtration {F_W(t)}_{t≥0}. Hence all probability measures P̃^{(γ)} are equivalent risk-neutral probability measures.
Remark 6.8 (Incomplete markets). As the risk-neutral probability measure is not uniquely
defined, the market under discussion is said to be incomplete. Within incomplete markets
there is no unique value for the price of derivatives. The stochastic process {ψ(t)}t≥0 is
called the market price of volatility risk and reduces to (6.3) for γ ≡ 0 (or ρ = 1).
Consider now the standard European derivative with pay-off Y = g(S(T )) at time of
maturity T . For stochastic volatility models it is reasonable to assume that the risk-neutral
price ΠY (t) of the derivative is a local function of the stock price and of the instantaneous
variance, i.e., we make the following ansatz which generalizes (6.37):
Π_Y(t) = e^{-r(T−t)} Ẽ[g(S(T)) | F_W(t)] = v_g(t, S(t), v(t))   (6.51)

for all t ∈ [0, T], for all T > 0 and for some measurable pricing function v_g. Of course, as in the case
of local volatility model, (6.51) is motivated by the Markov property of solutions to systems
of SDE’s. In fact, it is useful to consider a more general European derivative with pay-off Y
given by
Y = h(S(T ), v(T )),
for some function h : [0, ∞)2 → R, i.e., the pay-off of the derivative depends on the stock
value and on the instantaneous variance of the stock at the time of maturity. We have the
following analogue of Theorem 6.8.
Theorem 6.9. Assume that the functions η(t, x, y), β(t, x, y), ψ(t, x, y) in (6.48) are such that the PDE

∂_t u + rx ∂_x u + A ∂_y u + (1/2) y x² ∂_x² u + (1/2) β² y ∂_y² u + ρ β x y ∂²_{xy} u = 0,   (6.52a)
A = η − √y β ψ,   (t, x, y) ∈ (0, T) × (0, ∞)²   (6.52b)

admits a unique strong solution u satisfying u(T, x, y) = h(x, y). Then the risk-neutral price of the derivative with pay-off Y = h(S(T), v(T)) and maturity T is given by

Π_Y(t) = v_h(t, S(t), v(t)),

where the pricing function v_h is given by v_h(t, x, y) = e^{-rτ} u(t, x, y), τ = T − t.

Exercise 6.12. Prove the theorem. Hint: use Itô’s formula in two dimensions, see Theo-
rem 4.9, and the argument in the proof of Theorem 6.8.
As for local volatility models, a closed formula for the solution of (6.52) is rarely available and the use of numerical methods to price the derivative becomes essential.

Heston model
The most popular stochastic volatility model is the Heston model, which is obtained by the
following substitutions in (6.46)-(6.47):
μ(t, S(t), v(t)) = μ_0,   β(t, x, y) = c,   η(t, x, y) = a(b − y),

where μ_0, a, b, c are constants. Hence the stock price and the volatility dynamics in the Heston model are given by the following stochastic differential equations:

dS(t) = μ_0 S(t) dt + √v(t) S(t) dW_1(t),   (6.53a)
dv(t) = a(b − v(t)) dt + c √v(t) dW^{(ρ)}(t).   (6.53b)

Note in particular that the variance in the Heston model is a CIR process, see (5.25). The condition 2ab > c² ensures that v(t) is strictly positive (almost surely). To pass to the risk-neutral world we need to fix a risk-neutral probability measure, that is, we need to fix the market price of volatility risk function ψ in (6.49). In the Heston model it is assumed that

ψ(t, x, y) = λ √y,

for some constant λ ∈ R, which leads to the following form of the pricing PDE (6.52):

∂_t u + rx ∂_x u + (k − my) ∂_y u + (1/2) y x² ∂_x² u + (c²/2) y ∂_y² u + ρ c x y ∂²_{xy} u = ru,   (6.54)

where the constants k, m are given by k = ab, m = a + cλ.
The general solution of (6.54) with terminal datum u(T, x, y) = h(x, y) is not known. However in the case of a call option (i.e., h(x, y) = g(x) = (x − K)_+) an explicit formula for the Fourier transform of the solution is available, see [12]. The existence of such a formula, which makes it possible to compute the price of call options by very efficient numerical methods, is one of the main reasons for the popularity of the Heston model.
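When the Fourier approach is not used, call prices under the Heston dynamics can also be approximated by direct Monte Carlo simulation of (6.48). The following sketch (not part of the original notes; function name, discretization scheme and parameters are illustrative choices) simulates the risk-neutral dynamics with λ = 0, so that the drift of the variance is simply a(b − v):

```python
import numpy as np

def heston_call_price_mc(s0, v0, K, T, r, a, b, c, rho,
                         n_steps=200, n_paths=50_000, seed=0):
    """Monte Carlo sketch for a call under risk-neutral Heston dynamics
        dS = r*S dt + sqrt(v)*S dW1~,   dv = a*(b - v) dt + c*sqrt(v) dW~(rho)
    (i.e. lambda = 0, so k = ab and m = a).  The variance is simulated with a
    full-truncation Euler scheme and the stock with a log-Euler step."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, float(s0))
    v = np.full(n_paths, float(v0))
    for _ in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        z2 = rng.standard_normal(n_paths)
        dW1 = np.sqrt(dt) * z1
        dWr = np.sqrt(dt) * (rho * z1 + np.sqrt(1.0 - rho ** 2) * z2)  # correlated BM
        vp = np.maximum(v, 0.0)                    # full truncation: floor v at 0
        S *= np.exp((r - 0.5 * vp) * dt + np.sqrt(vp) * dW1)
        v += a * (b - vp) * dt + c * np.sqrt(vp) * dWr
    return np.exp(-r * T) * np.mean(np.maximum(S - K, 0.0))

price = heston_call_price_mc(s0=100.0, v0=0.04, K=100.0, T=1.0,
                             r=0.02, a=2.0, b=0.04, c=0.3, rho=-0.7)
```

The parameters satisfy the Feller condition 2ab > c², consistent with the positivity assumption on {v(t)}_{t≥0}.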

6.5.3 Variance swaps


Variance swaps are financial derivatives3 on the realized annual variance of an asset (or
index). We first describe how the realized annual variance is computed from the historical
data of the asset price. Let T > 0 (measured in days) and consider the partition

0 = t_0 < t_1 < · · · < t_n = T,   t_j = jT/n,
3
More precisely, forward contracts, see Section 6.7.

of the interval [0, T ]. Assume for instance that the asset is a stock and let S(tj ) = Sj be the
stock price at time tj . Here S1 , . . . Sn are historical data for the stock price and not random
variables (i.e., the interval [0, T ] lies in the past of the present time). The realized annual
variance of the stock in the interval [0, T] along this partition is defined as

σ²_{1year}(n, T) = (κ/T) Σ_{j=0}^{n−1} (log(S_{j+1}/S_j))²,

where κ is the number of trading days in one year (typically, κ = 252). A variance swap
stipulated at time t = 0, with maturity T and strike variance K is a contract between two parties which, at the expiration date, entails the exchange of cash given by N(σ²_{1year} − K),
where N (called variance notional) is a conversion factor from units of variance to units
of currency. In particular, the holder of the long position on the swap is the party who
receives the cash in the case that the realized annual variance at the expiration date is larger
than the strike variance. Variance swaps are traded over the counter and they are used by
investors to protect their exposure to the volatility of the asset. For instance, suppose that
an investor has a position on an asset which is profitable if the volatility of the stock price
increases (e.g., the investor owns call options on the stock). Then it is clearly important
for the investor to secure such position against a possible decrease of the volatility. To this
purpose the investor opens a short position on a variance swap with another investor who is
exposed to the opposite risk.
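The realized annual variance above is a direct computation on historical price data. A minimal sketch (not from the notes; the function name is illustrative) for a daily price series, with one log-return per trading day so that T equals the number of increments:

```python
import numpy as np

def realized_annual_variance(prices, kappa=252):
    """sigma^2_{1year}(n, T) = (kappa/T) * sum_j log(S_{j+1}/S_j)^2, where T is
    the number of trading days spanned by the daily price series."""
    prices = np.asarray(prices, dtype=float)
    log_ret = np.diff(np.log(prices))   # log(S_{j+1}/S_j) for each trading day
    T = len(log_ret)                    # T measured in days, one increment per day
    return (kappa / T) * np.sum(log_ret ** 2)

# a series with constant 1% daily log-returns over one trading year:
prices = np.exp(0.01 * np.arange(253))
var = realized_annual_variance(prices)   # equals 252 * 0.01^2 = 0.0252
```

With a one-year window (T = κ) the prefactor κ/T is 1 and the estimator reduces to the plain sum of squared daily log-returns.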
Let us now discuss variance swaps from a mathematical modeling point of view. We
assume that the stock price follows the generalized geometric Brownian motion
S(t) = S(0) exp( ∫_0^t α(s) ds + ∫_0^t σ(s) dW(s) ).

Moreover, for modeling purposes, it is convenient to assume that n → ∞ and to introduce,


as an (unbiased) estimate for the realized annual variance in the future time interval [0, T ],
the random variable
Q_T = (κ/T) [log S, log S](T) = (κ/T) ∫_0^T σ²(t) dt.
In fact, by the definition of quadratic variation, it follows that
 !2 
n−1   2
κ X Sj+1
E log − QT  → 0, as n → ∞.
T j=0 Sj

A variance swap can thus be defined as the (non-standard) European derivative with pay-off
Y = QT − K. Assuming that the interest rate of the bond is constant, R(t) = r > 0, the
risk-neutral value of a variance swap is given by
Π_Y(t) = e^{-rτ} Ẽ[Q_T − K | F_W(t)].   (6.55)

In particular, at time t = 0, i.e., when the contract is stipulated, we have

Π_Y(0) = e^{-rT} Ẽ[Q_T − K],   (6.56)

where we used that FW (0) is a trivial σ-algebra, and therefore the conditional expectation
with respect to FW (0) is a pure expectation. As none of the two parties in a variance swap
has a privileged position on the contract, there is no premium associated to variance swaps,
that is to say, the fair value of a variance swap is zero4 . The value K∗ of the variance strike
which makes the risk-neutral price of a variance swap equal to zero at time t = 0, i.e.,
ΠY (0) = 0, is called the fair variance strike. By (6.56) we find
K_* = (κ/T) ∫_0^T Ẽ[σ²(t)] dt.   (6.57)

To compute K∗ explicitly, we need to fix a stochastic model for the variance process {σ 2 (t)}t≥0 .
Let us consider the Heston model

dσ²(t) = a(b − σ²(t)) dt + c σ(t) dW̃(t),   (6.58)

where a, b, c are positive constants satisfying 2ab > c² and where {W̃(t)}_{t≥0} is a Brownian motion in the risk-neutral measure. To compute the fair variance strike of a swap using the Heston model we use that
Ẽ[σ²(t)] = σ²(0) + abt − a ∫_0^t Ẽ[σ²(s)] ds,

which implies (d/dt) Ẽ[σ²(t)] = ab − a Ẽ[σ²(t)] and so

Ẽ[σ²(t)] = b + (σ_0² − b) e^{-at},   σ_0² = Ẽ[σ²(0)] = σ²(0).   (6.59)

Replacing into (6.57) we obtain

K_* = κ ( b + (σ_0² − b)(1 − e^{-aT})/(aT) ).
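The closed formula for the fair strike is easy to cross-check numerically against a direct integration of (6.59) in (6.57). A sketch (not from the notes; names and parameter values are illustrative):

```python
import numpy as np

def heston_fair_variance_strike(sigma0_sq, a, b, T, kappa=252):
    """K* = kappa * (b + (sigma0^2 - b)*(1 - exp(-a*T))/(a*T)),
    the fair variance strike under the Heston variance dynamics (6.58)."""
    return kappa * (b + (sigma0_sq - b) * (1.0 - np.exp(-a * T)) / (a * T))

# cross-check against direct numerical (trapezoidal) integration of (6.57)
a, b, sigma0_sq, T, kappa = 2.0, 0.04, 0.09, 1.0, 252
t = np.linspace(0.0, T, 100_001)
mean_var = b + (sigma0_sq - b) * np.exp(-a * t)        # formula (6.59)
dt = t[1] - t[0]
K_numeric = (kappa / T) * np.sum(0.5 * (mean_var[:-1] + mean_var[1:]) * dt)
K_closed = heston_fair_variance_strike(sigma0_sq, a, b, T)
```

Both computations agree to high accuracy, confirming the closed-form expression.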

Exercise 6.13. Assume R = r > 0, α = constant. Moreover, given σ_0 > 0, let σ(t) = σ_0 √S(t), which is an example of CEV model. Compute the fair strike of the variance swap.

Exercise 6.14 (•). Assume that the price S(t) of a stock follows a generalized geometric Brownian motion with instantaneous volatility {σ(t)}_{t≥0} given by the Heston model dσ²(t) = a(b − σ²(t)) dt + c σ(t) dW̃(t), where {W̃(t)}_{t≥0} is a Brownian motion in the risk-neutral probability measure and a, b, c are constants such that 2ab > c² > 0. A volatility call option with strike K and maturity T is a financial derivative with pay-off

Y = N ( √( (κ/T) ∫_0^T σ²(t) dt ) − K )_+,

4
This is a general property of forward contracts, see Section 6.7.

where κ is the number of trading days in one year and N is a dimensional constant that
converts units of volatility into units of currency. Assuming that the risk-neutral price ΠY (t)
of this derivative has the form
Π_Y(t) = f( t, σ²(t), ∫_0^t σ²(s) ds )

and that the interest rate of the risk-free asset is constant, find the partial differential equation
and the terminal value satisfied by the pricing function f .

6.6 Interest rate models


In this section we discuss an example of a stochastic model for the interest rate {R(t)}_{t≥0}. The general theory of interest rate models is among the most studied and important for the applications. For a thorough discussion of this topic, see [3]. Here we just briefly consider the Cox-Ingersoll-Ross (CIR) model, where {R(t)}_{t≥0} is given by the CIR process

dR(t) = a(b − R(t)) dt + c √R(t) dW̃(t),   R(0) = r_0,   (6.60)

where {W̃(t)}_{t∈[0,T]} is a Brownian motion in the risk-neutral probability measure and r_0, a, b, c are positive constants such that 2ab > c². In particular, the interest rate R(t) is always
strictly positive. Our purpose is to compute the value at time t of a zero-coupon bond when
the interest rate is given by the CIR model. A zero-coupon bond with face value K and
maturity time T is a financial derivative which promises to pay to its owner the amount K
at time T . The term “zero-coupon” refers to the fact that no intermediate payments are
made to the owner; in particular, a zero-coupon bond is a European derivative and therefore
its risk-neutral price is given by
Π_Y(t) = Ẽ[ K exp(−∫_t^T R(s) ds) | F_W(t) ] = K B(t, T),

where

B(t, T) = Ẽ[ exp(−∫_t^T R(s) ds) | F_W(t) ]   (6.61)

is the risk-neutral value of a zero-coupon bond with face value 1. To compute B(t, T ) under
a CIR interest rate model, we make the ansatz

B(t, T ) = f (t, R(t)), (6.62)

for some smooth function f : [0, T] × (0, ∞) → R, which we want to find. Note that the ansatz (6.62) does not correspond to a classical Markov property for the interest rate, as the random variable exp(−∫_t^T R(s) ds) need not be, in general, a function of R(T).

Theorem 6.10. When the interest rate {R(t)}t>0 follows the CIR model (6.60), the value

B(t, T) of the zero-coupon bond is given by (6.62) with

f(t, x) = e^{−x C(T−t) − A(T−t)},   (6.63a)

where

C(τ) = sinh(γτ) / ( γ cosh(γτ) + (1/2) a sinh(γτ) ),   (6.63b)

A(τ) = −(2ab/c²) log[ γ e^{aτ/2} / ( γ cosh(γτ) + (1/2) a sinh(γτ) ) ]   (6.63c)

and

γ = (1/2) √(a² + 2c²).   (6.63d)
Proof. Using Itô's formula and the product rule, together with (6.60), we obtain

d(D(t) f(t, R(t))) = D(t)[ ∂_t f(t, R(t)) + a(b − R(t)) ∂_x f(t, R(t))
      + (c²/2) R(t) ∂_x² f(t, R(t)) − R(t) f(t, R(t)) ] dt
      + D(t) ∂_x f(t, R(t)) c √R(t) dW̃(t).

Hence, imposing that f be a solution of the PDE

∂_t f + a(b − x) ∂_x f + (c²/2) x ∂_x² f = xf,   (t, x) ∈ D_T⁺,   (6.64a)

we obtain that the stochastic process {D(t) f(t, R(t))}_{t∈[0,T]} is a P̃-martingale relative to the filtration {F_W(t)}_{t∈[0,T]}. Imposing additionally the terminal condition
f (T, x) = 1, for all x > 0, (6.64b)
we obtain
D(t) f(t, R(t)) = Ẽ[ D(T) f(T, R(T)) | F_W(t) ] = Ẽ[ D(T) | F_W(t) ],

hence

f(t, R(t)) = Ẽ[ D(T)/D(t) | F_W(t) ],

and thus (6.62) is verified. It can be shown that (6.63) is the solution of the terminal value problem (6.64).
Exercise 6.15. Derive the solution of the problem (6.64). HINT: Use the ansatz (6.63a).
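The closed formula (6.63) is straightforward to evaluate numerically. A sketch (not part of the original notes; the function name is illustrative):

```python
import numpy as np

def cir_bond_price(t, T, x, a, b, c):
    """Zero-coupon bond price B(t,T) = f(t, R(t)) = exp(-x*C(tau) - A(tau))
    under the CIR short-rate model (6.60), with C, A, gamma as in (6.63)."""
    tau = T - t
    gamma = 0.5 * np.sqrt(a ** 2 + 2.0 * c ** 2)
    denom = gamma * np.cosh(gamma * tau) + 0.5 * a * np.sinh(gamma * tau)
    C = np.sinh(gamma * tau) / denom
    A = -(2.0 * a * b / c ** 2) * np.log(gamma * np.exp(0.5 * a * tau) / denom)
    return np.exp(-x * C - A)

# sanity checks: B(T,T) = 1, and B(0,T) lies between 0 and 1 for a positive rate
B0 = cir_bond_price(0.0, 1.0, x=0.03, a=2.0, b=0.04, c=0.2)
```

Note that at τ = 0 one has C(0) = 0 and A(0) = 0, so the terminal condition f(T, x) = 1 in (6.64b) is automatically satisfied.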

6.7 Forwards and Futures


6.7.1 Forwards
A forward contract with delivery price K and maturity (or delivery) time T on an
asset U is a type of financial derivative stipulated by two parties in which one of the parties

promises to sell and deliver to the other party the asset U at time T in exchange for the
cash K. As opposed to option contracts, both parties in a forward contract are obliged to
fulfill their part of the agreement. Forward contracts are traded over the counter and most
commonly on commodities or currencies. Let us give two examples.
Example of forward contract on a commodity. Consider a farmer who grows wheat and a miller who needs wheat to produce flour. Clearly, the farmer's interest is to sell the wheat for the highest possible price, while the miller's interest is to pay the least possible for the wheat. The price of the wheat depends on many economical and non-economical factors (such as weather conditions, which affect the quality and quantity of harvests) and it is therefore quite volatile. The farmer and the miller then stipulate a forward contract on the wheat in the winter (before the plantation, which occurs in the spring) with expiration date at the end of the summer (when the wheat is harvested), in order to lock in its future trading price beforehand.
Example of forward contract on a currency. Suppose that a car company in Sweden promises to deliver a stock of 100 cars to another company in the United States in exactly one month. Suppose that the price of each car is fixed in Swedish crowns, say 100.000 crowns. Clearly the American company will benefit from an increase of the exchange rate crown/dollar and will be damaged in the opposite case. To avoid possibly high losses, the American company buys a forward contract on 100×100.000 = ten million Swedish crowns expiring in one month, which gives the company the right and the obligation to buy ten million crowns for a price in dollars agreed upon today.
Remark 6.9. As it is clear from the examples above, one of the purposes of forward
contracts is to share risks.
The delivery price of a forward contract is agreed upon by the two parties after a careful analysis of several factors that may influence the future value of the asset (including logistic factors, such as the cost of delivery, storage, etc.). For this reason, the delivery price K in a forward contract may also be viewed as an informed estimate of the price of the asset at the time T in the future. In this respect, K is also called the forward price of U. More precisely, the T-forward price of the asset U at time t is the strike price of a forward contract on U with maturity T stipulated at time t; the current, actual price Π(t) of the asset is also called the spot price.
Remark 6.10. As the consensus on the forward price is limited to the participants of the
forward contract, it is unlikely to be accepted by all investors as a good estimation for the
price of the asset at time T . The delivery price of futures contracts on the asset, which we
define in Section 6.7.2, gives a better and more commonly accepted estimation for the future
value of an asset.
Let us apply the risk-neutral pricing theory introduced in Section 6.2 to derive a mathe-
matical model for the forward price of an asset. Let f (t, Π(t), K, T ) be the value at time t
of a forward contract on an asset with price {Π(t)}t∈[0,T ] , maturity T and delivery price K.
The pay-off for the party agreeing to buy the asset is given by
Y = (Π(T ) − K),

while the pay-off for the party selling the asset is (K − Π(T )).

Remark 6.11. Note that one of the two parties in a forward contract is always going to incur a loss. If this loss is very large, then this party could become insolvent, i.e., unable to fulfill the contract, and then both parties will end up losing. In a futures contract this is prevented by the mechanism of margin accounts, see Section 6.7.2.

As both parties in a forward contract have the same rights/obligations, none of them
pays a premium to stipulate the contract, and so f (t, Π(t), K, T ) = 0. Assuming that the
price {Π(t)}t≥0 of the underlying asset follows a generalized geometric Brownian motion with
strictly positive volatility, the risk-neutral value of the forward contract for the two parties
is

f(t, Π(t), K, T) = ±Ẽ[ (Π(T) − K) D(T)/D(t) | F_W(t) ]
   = ±( (1/D(t)) Ẽ[Π(T) D(T) | F_W(t)] − K Ẽ[ exp(−∫_t^T R(s) ds) | F_W(t) ] ).

As the discounted price {Π*(t)}_{t≥0} of the underlying asset is a P̃-martingale relative to the filtration {F_W(t)}_{t≥0}, we have Ẽ[Π(T) D(T) | F_W(t)] = D(t) Π(t). Letting

B(t, T) = Ẽ[ exp(−∫_t^T R(s) ds) | F_W(t) ],   (6.65)

the value of the forward contract becomes

f (t, Π(t), K, T ) = ±(Π(t) − KB(t, T )).

This leads to the following definition.

Definition 6.5. Assume that the price {Π(t)}t≥0 of an asset and the value of the bond satisfy

dΠ(t) = α(t)Π(t)dt + σ(t)Π(t)dW (t), dB(t) = B(t)R(t)dt,

where {α(t)}t≥0 , {σ(t)}t≥0 , and {R(t)}t≥0 are adapted to {FW (t)}t≥0 and σ(t) > 0 al-
most surely for all times. The risk-neutral T -forward price at time t of the asset is
the {FW (t)}t≥0 -adapted stochastic process {ForT (t)}t∈[0,T ] given by

For_T(t) = Π(t)/B(t, T),   t ∈ [0, T],

where B(t, T ) is given by (6.65).

Remark 6.12. The value B(t, T ) is the risk-neutral price at time t of a European derivative
with pay-off 1 at the time of maturity T . This type of derivative is called a zero-coupon
bond and is discussed in more details in Section 6.6.

Note that the forward price increases with respect to the time left to delivery, τ = T − t, i.e., the longer we delay the delivery of the asset, the more we have to pay for it. This is intuitive, as the seller of the asset is losing money by not selling the asset on the spot (due to its devaluation compared to the bond value). By way of example, suppose that the interest rate of the bond is a deterministic constant, R(t) = r > 0. Then the forward price becomes

For_T(t) = e^{rτ} Π(t),

in which case we find that the spot price of an asset is the discounted value of the forward price. When the asset is a commodity (e.g., corn), the forward price is also inflated by the cost of storage. Letting c > 0 be the cost to store one share of the asset for one year, the forward price of the asset, for delivery τ years in the future, is e^{cτ} e^{rτ} Π(t).

6.7.2 Futures
Futures contracts are standardized forward contracts, i.e., rather than being traded over
the counter, they are negotiated in regularized markets. Perhaps the most interesting role
of futures contracts is that they make trading on commodities possible for anyone. To this
regard we remark that commodities, e.g. crude oil, wheat, etc, are most often sold through
long term contracts, such as forward and futures contracts, and therefore they do not usually
have an “official spot price”, but only a future delivery price (commodities “spot markets”
exist, but their role is marginal for the discussion in this section).
Futures markets are markets in which the objects of trading are futures contracts.
Unlike forward contracts, all futures contracts in a futures market are subject to the same
regulation, and so in particular all contracts on the same asset with the same delivery time
T have the same delivery price, which is called the T-future price of the asset and which
we denote by FutT (t). Thus FutT (t) is the delivery price in a futures contract on the asset
with time of delivery T which is stipulated at time t < T. Futures markets have existed for more than 300 years and nowadays the most important ones are the Chicago Mercantile Exchange (CME), the New York Mercantile Exchange (NYMEX), the Chicago Board of Trade (CBOT) and the Intercontinental Exchange (ICE).
In a futures market, anyone (after a proper authorization) can stipulate a futures contract.
More precisely, holding a position in a futures contract in the futures market consists in the
agreement to receive as a cash flow the change in the future price of the underlying asset
during the time in which the position is held. Notice that the cash flow may be positive or negative. In a long position the cash flow is positive when the future price goes up and negative when the future price goes down, while a short position on the same contract receives the opposite cash flow. Moreover, in order to eliminate the risk of insolvency, the cash flow is distributed in time through the mechanism of the margin account. More precisely, assume that at t = 0 we open a long position in a futures contract expiring at time T. At the same time, we need to open a margin account which contains a certain amount of cash (usually, 10% of the current value of the T-future price for each contract opened). At t = 1 day, the amount Fut_T(1) − Fut_T(0) will be added to the account, if it is positive, or
Figure 6.4: Futures price of corn on May 12, 2014 (dashed line) and on May 13, 2014 (continuous line) for different delivery times

Figure 6.5: Futures price of natural gas on May 13, 2014 for different delivery times

withdrawn, if it is negative. The position can be closed at any time t < T (multiple of days),
in which case the total amount of cash flown in the margin account is

(Fut_T(t) − Fut_T(t−1)) + (Fut_T(t−1) − Fut_T(t−2)) + · · · + (Fut_T(1) − Fut_T(0)) = Fut_T(t) − Fut_T(0).

(In fact, if the margin account becomes too low, and the investor does not add new cash
to it, the position will be automatically closed by the exchange market). If a long position
is held up to the time of maturity, then the holder of the long position should buy the
underlying asset. However in the majority of cases futures contracts are cash settled and
not physically settled, i.e., the delivery of the underlying asset rarely occurs, and the
equivalent value in cash is paid instead.
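The telescoping of the daily margin cash flows can be sketched directly; the function name and the sample price path below are illustrative, not from the notes:

```python
import numpy as np

def margin_cash_flows(fut_prices):
    """Daily cash flows credited to the margin account of a long futures
    position: Fut_T(j) - Fut_T(j-1) for each day j.  Their sum telescopes
    to Fut_T(t) - Fut_T(0), the total cash flow up to the closing day t."""
    return np.diff(np.asarray(fut_prices, dtype=float))

# hypothetical daily future prices over three days
flows = margin_cash_flows([500.0, 503.5, 501.0, 507.2])
total = flows.sum()   # telescopes to 507.2 - 500.0 = 7.2
```

A short position receives the opposite flows, i.e. the negated array.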

Remark 6.13. Since a futures contract can be closed at any time prior to expiration, futures contracts are not European style derivatives.

Our next purpose is to derive a mathematical model for the future price of an asset. Our
guiding principle is that the 1+1 dimensional futures market consisting of a futures
contract and a risk-free asset should not admit self-financing arbitrage portfolios. Consider
a portfolio invested in h(t) shares of the futures contract and hB (t) shares of the risk-free
asset at time t. We assume that {h(t), hB (t)}t∈[0,T ] is adapted to {FW (t)}t≥0 and suppose
that {FutT (t)}t∈[0,T ] is an Itô’s process. Since futures contracts have zero-value, the value of
the portfolio at time t is V (t) = hB (t)B(t) + h(t)C(t), where C(t) is the cash-flow generated
by each futures contract up to time t. For a self-financing portfolio we require that any
positive cash-flow in the interval [t, t + dt] should be invested to buy shares of the bond and
that, conversely, any negative cash flow should be settled by issuing shares of the bond (i.e.,
by borrowing money). Since the cash-flow generated in the interval [t, t + dt] is given by
dC(t) = h(t)dFutT (t), the value of a self-financing portfolio invested in the 1+1 dimensional
futures market must satisfy

dV (t) = h(t)dFutT (t) + R(t)V (t)dt,

or equivalently
dV ∗ (t) = h(t)D(t)dFutT (t). (6.66)
Now, we have seen that a simple condition ensuring that a portfolio is not an arbitrage
is that its discounted value be a martingale in the risk-neutral measure relative to the
filtration generated by the Brownian motion. By (6.66), the latter condition is achieved
by requiring that dFut_T(t) = Δ(t) dW̃(t), for some stochastic process {Δ(t)}_{t∈[0,T]} adapted
to {FW (t)}t∈[0,T ] . In particular, it is reasonable to impose that

(i) {Fut_T(t)}_{t∈[0,T]} should be a P̃-martingale relative to {F_W(t)}_{t≥0}.

Furthermore, it is clear that the future price of an asset at the expiration date T should be
equal to its spot price at time T , and so we impose that

(ii) FutT (T ) = Π(T ).
It follows by Exercise 3.31 that the conditions (i)-(ii) determine a unique stochastic process
{FutT (t)}t∈[0,T ] , which is given in the following definition.
Definition 6.6. Assume that the price {Π(t)}t≥0 of an asset and the value of the bond satisfy
dΠ(t) = α(t)Π(t)dt + σ(t)Π(t)dW (t), dB(t) = B(t)R(t)dt,
where {α(t)}t≥0 , {σ(t)}t≥0 , and {R(t)}t≥0 are adapted to {FW (t)}t≥0 and σ(t) > 0 almost
surely for all times. The T -Future price at time t of the asset is the {FW (t)}t≥0 -adapted
stochastic process {FutT (t)}t∈[0,T ] given by

\[ Fut_T(t) = \widetilde{E}[\Pi(T)\,|\,F_W(t)], \qquad t \in [0,T]. \]
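As a quick numerical sanity check of this definition (with illustrative parameters, not taken from the text): for a constant interest rate r the discount factor is deterministic, and for a geometric Brownian motion under the risk-neutral measure one has Fut_T(0) = Ẽ[Π(T)] = Π(0)e^{rT}. The following Monte Carlo sketch verifies this:

```python
import numpy as np

# Illustrative parameters (not from the text): a stock following a geometric
# Brownian motion under the risk-neutral measure, with constant rate r.
S0, r, sigma, T = 100.0, 0.03, 0.2, 1.0

rng = np.random.default_rng(0)
G = rng.standard_normal(500_000)

# Risk-neutral samples of Pi(T) = S(T):
S_T = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * G)

fut_mc = S_T.mean()             # Monte Carlo estimate of Fut_T(0) = E~[Pi(T)]
fut_exact = S0 * np.exp(r * T)  # closed form when r is deterministic
print(fut_mc, fut_exact)
```

The agreement of the two numbers also previews Theorem 6.12 below: with deterministic rates the future price coincides with the forward price Π(0)/B(0, T).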
We now show that our goal to make the futures market arbitrage-free has been achieved.
Theorem 6.11. There exists a stochastic process {∆(t)}t∈[0,T ] adapted to {FW (t)}t≥0 such
that
\[ Fut_T(t) = Fut_T(0) + \int_0^t \Delta(s)\,d\widetilde{W}(s). \qquad (6.67) \]
Moreover, any {FW (t)}t≥0 -adapted self-financing portfolio {h(t), hB (t)}t∈[0,T ] invested in the
1+1 dimensional futures market is not an arbitrage.
Proof. The second statement follows immediately by the first one, since (6.66) and (6.67)
imply that the value of a self-financing portfolio invested in the 1+1 dimensional futures
market is a P̃-martingale relative to the filtration {F_W(t)}_{t∈[0,T]}. To prove (6.67), we first
notice that, by (3.24),
\[ Z(s)\,\widetilde{E}[Fut_T(t)|F_W(s)] = E[Z(t)Fut_T(t)|F_W(s)]. \]

By the martingale property of the future price, the left hand side is Z(s)FutT (s). Hence
Z(s)FutT (s) = E[Z(t)FutT (t)|FW (s)],
that is to say, the process {Z(t)FutT (t)}t∈[0,T ] is a P-martingale relative to the filtration
{FW (t)}t∈[0,T ] . By the martingale representation theorem, Theorem 4.6, there exists a
stochastic process {Γ(t)}t∈[0,T ] adapted to {FW (t)}t≥0 such that
\[ Z(t)Fut_T(t) = Fut_T(0) + \int_0^t \Gamma(s)\,dW(s). \]

We now proceed as in the proof of Theorem 6.2, namely we write


\[ dFut_T(t) = d\big( Z(t)Fut_T(t)\,Z(t)^{-1} \big) \]
and apply Itô’s product rule and Itô’s formula to derive that (6.67) holds with
\[ \Delta(t) = \theta(t)Fut_T(t) + \frac{\Gamma(t)}{Z(t)}. \]

Theorem 6.12. The Forward-Future spread of an asset, i.e., the difference between its
forward and future price, satisfies
\[ For_T(t) - Fut_T(t) = \frac{1}{\widetilde{E}[D(T)|F_W(t)]}\Big\{ \widetilde{E}[D(T)\Pi(T)|F_W(t)] - \widetilde{E}[D(T)|F_W(t)]\,\widetilde{E}[\Pi(T)|F_W(t)] \Big\}. \qquad (6.68) \]
Moreover, if the interest rate {R(t)}t∈[0,T ] is a deterministic function of time (e.g., a deter-
ministic constant), then ForT (t) = FutT (t), for all t ∈ [0, T ].
Proof. The last claim follows directly by (6.68). In fact, when the interest rate of the bond
is deterministic, the discounting process is also deterministic and thus in particular D(T ) is
F_W(t)-measurable. Hence, the term in curly brackets in (6.68) satisfies
\[ \{\cdots\} = D(T)\,\widetilde{E}[\Pi(T)|F_W(t)] - D(T)\,\widetilde{E}[\Pi(T)|F_W(t)] = 0. \]

As to (6.68), we compute
\[ For_T(t) - Fut_T(t) = \frac{\Pi(t)}{B(t,T)} - \widetilde{E}[\Pi(T)|F_W(t)] = \frac{D(t)\Pi(t)}{\widetilde{E}[D(T)|F_W(t)]} - \widetilde{E}[\Pi(T)|F_W(t)] = \frac{\widetilde{E}[D(T)\Pi(T)|F_W(t)]}{\widetilde{E}[D(T)|F_W(t)]} - \widetilde{E}[\Pi(T)|F_W(t)], \]
where for the last equality we used that {Π*(t)}_{t∈[0,T]} is a P̃-martingale relative to {F_W(t)}_{t≥0}.
The result follows.
Note that, as opposed to the forward price of the asset, the future price need not be
increasing with the time left to delivery.

6.8 Multi-dimensional markets


In this section we consider N + 1 dimensional stock markets. We denote the stock prices by

{S1 (t)}t≥0 , . . . , {SN (t)}t≥0

and assume the following dynamics


\[ dS_k(t) = \Big( \mu_k(t)\,dt + \sum_{j=1}^N \sigma_{kj}(t)\,dW_j(t) \Big) S_k(t), \qquad (6.69) \]

for some stochastic processes {µk (t)}t≥0 , {σkj (t)}t≥0 , j, k = 1, . . . , N , adapted to the filtration
{FW (t)}t≥0 generated by the Brownian motions {W1 (t)}t≥0 , . . . {WN (t)}t≥0 . Moreover we
assume that the Brownian motions are independent, in particular

dW_j(t)dW_k(t) = 0, for all j ≠ k,    (6.70)

see Exercise 3.21. Finally we denote by {R(t)}t≥0 the interest rate of the money market,
which we assume to be adapted to {FW (t)}t≥0 .
Now, given stochastic processes {θk (t)}t≥0 , k = 1, . . . , N , adapted to {FW (t)}t≥0 , and
satisfying the Novikov condition (4.20), the stochastic process {Z(t)}t≥0 given by
\[ Z(t) = \exp\left( -\sum_{k=1}^N \left( \frac{1}{2}\int_0^t \theta_k(s)^2\,ds + \int_0^t \theta_k(s)\,dW_k(s) \right) \right) \qquad (6.71) \]

is a martingale relative to the filtration {FW (t)}t≥0 (see Exercise 4.8). Since E[Z(t)] =
E[Z(0)] = 1, for all t ≥ 0, we can use the stochastic process {Z(t)}t≥0 to define a risk-
neutral probability measure associated to the N + 1 dimensional stock market, as we did in
the one dimensional case, see Definition 6.1.
Definition 6.7. Let T > 0 and assume that the market price of risk equations
\[ \mu_j(t) - R(t) = \sum_{k=1}^N \sigma_{jk}(t)\theta_k(t), \qquad j = 1,\dots,N, \qquad (6.72) \]

admit a solution (θ1 (t), . . . , θN (t)), for all t ∈ [0, T ]. Define the stochastic process {Z(t)}t≥0
as in (6.71). Then the measure P̃ equivalent to P given by
\[ \widetilde{P}(A) = E[Z(T)I_A] \]
is called the risk-neutral probability measure of the market at time T .
Note that, as opposed to the one dimensional case, the risk-neutral measure just defined
need not be unique, as the market price of risk equations may admit more than one solution.
For each risk-neutral probability measure P̃ we can apply the multidimensional Girsanov
theorem 4.12 and conclude that the stochastic processes {W̃_1(t)}_{t≥0}, ..., {W̃_N(t)}_{t≥0} given by
\[ \widetilde{W}_k(t) = W_k(t) + \int_0^t \theta_k(s)\,ds \]
are P̃-independent Brownian motions. Moreover these Brownian motions are P̃-martingales
relative to the filtration {FW (t)}t≥0 .
Now let {hS1 (t)}t≥0 , . . . , {hSN (t)}t≥0 be {FW (t)}t≥0 -adapted stochastic processes repre-
senting the number of shares of the stocks in a portfolio invested in the N + 1 dimensional
stock market. The portfolio is self-financing if its value satisfies
\[ dV(t) = \sum_{k=1}^N h_{S_k}(t)\,dS_k(t) + R(t)\Big( V(t) - \sum_{k=1}^N h_{S_k}(t)S_k(t) \Big) dt. \]

Theorem 6.13. Assume that a risk-neutral probability P̃ exists, i.e., the equations (6.72)
admit a solution. Then the discounted value of any self-financing portfolio invested in the
N+1 dimensional market is a P̃-martingale relative to the filtration {F_W(t)}_{t≥0}. In particular
(by Theorem 3.16) there exists no self-financing arbitrage portfolio invested in the N + 1
dimensional stock market.

Proof. The discounted value of the portfolio satisfies
\[ dV^*(t) = D(t)\Big( \sum_{j=1}^N h_{S_j}(t)S_j(t)(\mu_j(t) - R(t))\,dt + \sum_{j,k=1}^N h_{S_j}(t)S_j(t)\sigma_{jk}(t)\,dW_k(t) \Big) \]
\[ = D(t)\Big( \sum_{j=1}^N h_{S_j}(t)S_j(t) \sum_{k=1}^N \sigma_{jk}(t)\theta_k(t)\,dt + \sum_{j,k=1}^N h_{S_j}(t)S_j(t)\sigma_{jk}(t)\,dW_k(t) \Big) \]
\[ = D(t) \sum_{j=1}^N h_{S_j}(t)S_j(t) \sum_{k=1}^N \sigma_{jk}(t)\,d\widetilde{W}_k(t). \]
All Itô integrals in the last line are P̃-martingales relative to {F_W(t)}_{t≥0}. The result
follows.
Exercise 6.16. Work out the details of the computations omitted in the proof of the previous
theorem.
Now we show that the existence of a risk-neutral probability measure is necessary for the
absence of self-financing arbitrage portfolios in N + 1 dimensional stock markets.
Let N = 3 and assume that the market parameters are constant. Let R(t) = r > 0,
(µ1 , µ2 , µ3 ) = (2, 3, 2) and let the volatility matrix be given by
 
\[ (\sigma_{ij}) = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 4 & 0 \\ 1 & 2 & 0 \end{pmatrix}. \]
Thus the stock prices satisfy
dS1 (t) = (2dt + dW1 (t) + 2dW2 (t))S1 (t),
dS2 (t) = (3dt + 2dW1 (t) + 4dW2 (t))S2 (t),
dS3 (t) = (2dt + dW1 (t) + 2dW2 (t))S3 (t).
The market price of risk equations are
θ1 + 2θ2 = 2 − r
2θ1 + 4θ2 = 3 − r
θ1 + 2θ2 = 2 − r.
This system is solvable if and only if r = 1, in which case there exist infinitely many solutions
given by
\[ \theta_1 \in \mathbb{R}, \qquad \theta_2 = \frac{1}{2}(1 - \theta_1). \]
Hence for r = 1 there exists at least one (in fact, infinitely many) risk-neutral probability
measures, and thus the market is free of arbitrage. To construct an arbitrage portfolio when
0 < r < 1, let
\[ h_{S_1}(t) = \frac{1}{S_1(t)}, \qquad h_{S_2}(t) = -\frac{1}{S_2(t)}, \qquad h_{S_3}(t) = \frac{1}{S_3(t)}. \]

The value {V (t)}t≥0 of this portfolio satisfies

dV (t) = hS1 (t)dS1 (t) + hS2 (t)dS2 (t) + hS3 (t)dS3 (t)
+ r(V (t) − hS1 (t)S1 (t) − hS2 (t)S2 (t) − hS3 (t)S3 (t))dt
= rV (t)dt + (1 − r)dt.

Hence
\[ V(t) = V(0)e^{rt} + \frac{1-r}{r}\big( e^{rt} - 1 \big) \]
and this portfolio is an arbitrage, because for V (0) = 0 we have V (t) > 0, for all t > 0.
Similarly one can find an arbitrage portfolio for r > 1.
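The solvability condition in this example can be checked mechanically. The sketch below (the helper `solvable` is our own construction for this example only) solves the market price of risk equations in the least-squares sense and tests whether the residual vanishes, which happens exactly when the system is consistent:

```python
import numpy as np

# Market price of risk equations sigma @ theta = mu - r for the example above.
sigma = np.array([[1.0, 2.0, 0.0],
                  [2.0, 4.0, 0.0],
                  [1.0, 2.0, 0.0]])
mu = np.array([2.0, 3.0, 2.0])

def solvable(r):
    b = mu - r
    theta, *_ = np.linalg.lstsq(sigma, b, rcond=None)
    # consistent system <=> least-squares residual is (numerically) zero
    return bool(np.allclose(sigma @ theta, b, atol=1e-9))

print(solvable(1.0), solvable(0.5))  # True False
```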
Next we address the question of completeness of N + 1 dimensional stock markets, i.e.,
the question of whether any European derivative can be hedged in this market. Consider a
European derivative on the stocks with pay-off Y and time of maturity T . For instance, for
a standard European derivative, Y = g(S1 (T ), . . . SN (T )), for some measurable function g.
The risk-neutral price of the derivative is
\[ \Pi_Y(t) = \widetilde{E}\Big[ Y \exp\Big( -\int_t^T R(s)\,ds \Big) \,\Big|\, F(t) \Big], \]

and coincides with the value at time t of any self-financing portfolio invested in the N + 1
dimensional market. The question of existence of a hedging portfolio is answered by the
following theorem.

Theorem 6.14. Assume that the volatility matrix (σjk (t))j,k=1,...N is invertible, for all t ≥ 0.
There exist stochastic processes {∆1 (t)}t∈[0,T ] , . . . {∆N (t)}t∈[0,T ] , adapted to {FW (t)}t≥0 , such
that
\[ D(t)\Pi_Y(t) = \Pi_Y(0) + \sum_{k=1}^N \int_0^t \Delta_k(s)\,d\widetilde{W}_k(s), \qquad t \in [0,T]. \qquad (6.73) \]

Let (Y1 (t), . . . , YN (t)) be the solution of


\[ \sum_{k=1}^N \sigma_{jk}(t)Y_k(t) = \frac{\Delta_j(t)}{D(t)}. \qquad (6.74) \]

Then the portfolio {hS1 (t), . . . , hSN (t), hB (t)}t∈[0,T ] given by


\[ h_{S_j}(t) = \frac{Y_j(t)}{S_j(t)}, \qquad h_B(t) = \Big( \Pi_Y(t) - \sum_{j=1}^N h_{S_j}(t)S_j(t) \Big) \Big/ B(t) \qquad (6.75) \]

is self-financing and replicates the derivative at any time, i.e., its value V (t) is equal to
ΠY (t) for all t ∈ [0, T ]. In particular, V (T ) = ΠY (T ) = Y , i.e., the portfolio is hedging the
derivative.

The proof of this theorem is conceptually very similar to that of Theorem 6.2 and is
therefore omitted (it makes use of the multidimensional version of the martingale represen-
tation theorem). Notice that, having assumed that the volatility matrix is invertible, the
risk-neutral probability measure of the market is unique. We now show that the uniqueness
of the risk-neutral probability measure is necessary to guarantee completeness. In fact, let
r = 1 in the example considered before and pick the following solutions of the market price
of risk equations:
(θ1 , θ2 ) = (0, 1/2), and (θ1 , θ2 ) = (1, 0)
(any other pair of solutions would work). The two corresponding risk-neutral probability
measures, denoted respectively by P̃ and P̂, are given by

\[ \widetilde{P}(A) = E[\widetilde{Z} I_A], \qquad \widehat{P}(A) = E[\widehat{Z} I_A], \qquad \text{for all } A \in \mathcal{F}, \]

where
\[ \widetilde{Z} = e^{-\frac{1}{8}T - \frac{1}{2}W_2(T)}, \qquad \widehat{Z} = e^{-\frac{1}{2}T - W_1(T)}. \]
Let A = {ω : (1/2)W_2(T, ω) − W_1(T, ω) < (3/8)T}. Hence

\[ \widehat{Z}(\omega) < \widetilde{Z}(\omega), \qquad \text{for } \omega \in A, \]
and thus P̂(A) < P̃(A). Consider a financial derivative with pay-off Q = I_A/D(T). If there
existed a hedging, self-financing portfolio for such derivative, then, since the discounted
value of such portfolio is a martingale in both risk-neutral probability measures, we would
have
\[ V(0) = \widetilde{E}[QD(T)], \qquad \text{and} \qquad V(0) = \widehat{E}[QD(T)]. \qquad (6.76) \]
But
E(QD(T
b )) = E(I
b A ) = P(A)
b < P(A)
e = E(I
e A ) = E(QD(T
e ))
and thus (6.76) cannot be verified.
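The disagreement of the two risk-neutral measures can also be seen numerically. The following Monte Carlo sketch (T = 1 and sample sizes are arbitrary choices) estimates both probabilities of an event of the form {½W₂(T) − W₁(T) > c} as expectations of the corresponding densities under P:

```python
import numpy as np

# The densities Z~ = exp(-T/8 - W2(T)/2) and Z^ = exp(-T/2 - W1(T)) weight
# the event A = {W2(T)/2 - W1(T) > 3T/8} differently, so the two
# risk-neutral measures cannot coincide. T = 1 here (illustrative).
T = 1.0
rng = np.random.default_rng(2)
W1, W2 = np.sqrt(T) * rng.standard_normal((2, 1_000_000))

Z_tilde = np.exp(-T / 8 - W2 / 2)
Z_hat = np.exp(-T / 2 - W1)
A = (W2 / 2 - W1) > 3 * T / 8

P_tilde = (Z_tilde * A).mean()  # P~(A) = E[Z~ 1_A]
P_hat = (Z_hat * A).mean()      # P^(A) = E[Z^ 1_A]
print(P_tilde, P_hat)
```

Since the two estimates differ by far more than the Monte Carlo error, a pay-off supported on this event cannot be priced consistently by both measures, which is exactly the failure of (6.76).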
We conclude this section with an example of application of Theorem 6.14.

An example of option on two stocks


Let us consider a 2 + 1 dimensional stock market with constant parameters such that the
volatility matrix is invertible. Let K, T > 0 and consider a standard European derivative
with pay-off
\[ Y = \Big( \frac{S_1(T)}{S_2(T)} - K \Big)_+ \]

at time of maturity T . Let us find the risk-neutral price ΠY (t) of the derivative at time
t ∈ [0, T ). Letting r > 0 be the interest rate of the bond, (σjk )j,k=1,2 be the volatility matrix
of the stocks and σj = (σj1 , σj2 ), j = 1, 2, we have
\[ S_j(t) = S_j(0)\,e^{(r - \frac{|\sigma_j|^2}{2})t + \sigma_j \cdot \widetilde{W}(t)}, \]

where W̃(t) = (W̃_1(t), W̃_2(t)) and · denotes the standard scalar product of vectors. Hence,
with τ = T − t,
\[ \Pi_Y(t) = e^{-r\tau}\,\widetilde{E}\Big[ \Big( \frac{S_1(T)}{S_2(T)} - K \Big)_+ \Big| F(t) \Big] = e^{-r\tau}\,\widetilde{E}\Big[ \Big( \frac{S_1(t)}{S_2(t)}\,e^{(\frac{|\sigma_2|^2}{2} - \frac{|\sigma_1|^2}{2})\tau + (\sigma_1 - \sigma_2)\cdot(\widetilde{W}(T) - \widetilde{W}(t))} - K \Big)_+ \Big| F(t) \Big]. \]

Now we write
\[ (\sigma_1 - \sigma_2)\cdot(\widetilde{W}(T) - \widetilde{W}(t)) = \sqrt{\tau}\,\big[ (\sigma_{11} - \sigma_{21})G_1 + (\sigma_{12} - \sigma_{22})G_2 \big] = \sqrt{\tau}\,(X_1 + X_2), \]
where G_j = (W̃_j(T) − W̃_j(t))/√τ ∈ N(0,1), j = 1, 2, hence X_j ∈ N(0, (σ_{1j} − σ_{2j})²), j = 1, 2.
In addition, X1 , X2 are independent random variables, hence, as shown in Section 3.3, X1 +X2
is normally distributed with zero mean and variance (σ11 − σ21 )2 + (σ12 − σ22 )2 = |σ1 − σ2 |2 .
It follows that
  
\[ \Pi_Y(t) = e^{-r\tau}\,\widetilde{E}\Big[ \Big( \frac{S_1(t)}{S_2(t)}\,e^{(\frac{|\sigma_2|^2}{2} - \frac{|\sigma_1|^2}{2})\tau + \sqrt{\tau}\,|\sigma_1 - \sigma_2|\,G} - K \Big)_+ \Big], \]

where G ∈ N (0, 1). Hence, letting


\[ \hat{r} = \frac{|\sigma_1 - \sigma_2|^2}{2} - \Big( \frac{|\sigma_1|^2}{2} - \frac{|\sigma_2|^2}{2} \Big) \]

and a = e(r̂−r)τ , we have


  
\[ \Pi_Y(t) = a\,e^{-\hat{r}\tau}\,\widetilde{E}\Big[ \Big( \frac{S_1(t)}{S_2(t)}\,e^{(\hat{r} - \frac{|\sigma_1 - \sigma_2|^2}{2})\tau + \sqrt{\tau}\,|\sigma_1 - \sigma_2|\,G} - K \Big)_+ \Big]. \]

Up to the multiplicative parameter a, this is the Black-Scholes price of a call on a stock with
price S1 (t)/S2 (t), volatility |σ1 − σ2 | and for an interest rate of the bond given by r̂. Hence,
Theorem 6.5 gives
 
\[ \Pi_Y(t) = a\Big( \frac{S_1(t)}{S_2(t)}\,\Phi(d_+) - K e^{-\hat{r}\tau}\,\Phi(d_-) \Big) := v(t, S_1(t), S_2(t)), \]
where
\[ d_\pm = \frac{\log\frac{S_1(t)}{K S_2(t)} + \big( \hat{r} \pm \frac{|\sigma_1 - \sigma_2|^2}{2} \big)\tau}{|\sigma_1 - \sigma_2|\sqrt{\tau}}. \]
As to the self-financing hedging portfolio, it can be shown, by an argument similar to the
one used in the 1+1 dimensional case (see Theorem 6.4), that one such portfolio is given by
h_{S_j}(t) = ∂v/∂x_j(t, S_1(t), S_2(t)), j = 1, 2. Therefore, recalling the delta function of the standard
European call (see Theorem 6.6), we obtain
\[ h_{S_1}(t) = \frac{a}{S_2(t)}\,\Phi(d_+), \qquad h_{S_2}(t) = -\frac{a S_1(t)}{S_2(t)^2}\,\Phi(d_+). \qquad (6.77a) \]

As usual
\[ h_B(t) = \Big( \Pi_Y(t) - \sum_{j=1}^2 h_{S_j}(t)S_j(t) \Big) \Big/ B(t). \qquad (6.77b) \]

Exercise 6.17. Show that the portfolio (6.77) is self-financing and hedges the derivative.
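For readers who want to experiment, here is a minimal Python implementation of the pricing formula above, cross-checked against a direct Monte Carlo simulation of the ratio S₁(T)/S₂(T). All market data below are illustrative assumptions, and `Phi` is our own helper for the standard normal CDF:

```python
import math
import numpy as np

def Phi(x):
    # standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Illustrative data (not from the text): volatility rows sigma_1, sigma_2.
S1, S2, K, r, tau = 100.0, 90.0, 1.0, 0.05, 1.0
sigma1 = np.array([0.3, 0.1])
sigma2 = np.array([0.1, 0.2])

v = float(np.linalg.norm(sigma1 - sigma2))               # |sigma_1 - sigma_2|
r_hat = v**2 / 2 - (sigma1 @ sigma1 - sigma2 @ sigma2) / 2
a = math.exp((r_hat - r) * tau)
s = S1 / S2

d_plus = (math.log(s / K) + (r_hat + v**2 / 2) * tau) / (v * math.sqrt(tau))
d_minus = d_plus - v * math.sqrt(tau)
price = a * (s * Phi(d_plus) - K * math.exp(-r_hat * tau) * Phi(d_minus))

# Monte Carlo check of e^{-r tau} E~[(S1(T)/S2(T) - K)_+]:
rng = np.random.default_rng(3)
G = rng.standard_normal(500_000)
drift = (sigma2 @ sigma2 - sigma1 @ sigma1) / 2          # (|s2|^2 - |s1|^2)/2
ratio_T = s * np.exp(drift * tau + v * math.sqrt(tau) * G)
price_mc = math.exp(-r * tau) * np.maximum(ratio_T - K, 0.0).mean()
print(price, price_mc)
```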

6.9 Introduction to American derivatives


Before giving the precise definition of fair price for American derivatives, we shall present
some general properties of these contracts. American derivatives can be exercised at any
time prior to or at maturity T. Let Y(t) be the pay-off resulting from exercising the
derivative at time t ∈ (0, T ]. We call Y (t) the intrinsic value of the derivative. We consider
only standard American derivatives, for which we have Y (t) = g(S(t)), for some measurable
function g : R → R. For instance, g(x) = (x − K)+ for American calls and g(x) = (K − x)+
for American puts. We denote by Π̃_Y(t) the price of the American derivative with intrinsic
value Y (t) and by ΠY (t) the price of the European derivative with pay-off Y = Y (T ) at
maturity time T . Two obvious properties of American derivatives are the following:
(i) Π̃_Y(t) ≥ Π_Y(t), for all t ∈ [0, T]. In fact an American derivative gives to its owner
all the rights of the corresponding European derivative plus one: the option of early
exercise. Thus it is clear that the American derivative cannot be cheaper than the
European one.
(ii) Π̃_Y(t) ≥ Y(t), for all t ∈ [0, T]. If not, an arbitrage opportunity would arise by
purchasing the American derivative and exercising it immediately.

Any reasonable definition of fair price for American derivatives must satisfy (i)-(ii).

Definition 6.8. A time t ∈ (0, T ] is said to be an optimal exercise time for the American
derivative with intrinsic value Y(t) if Π̃_Y(t) = Y(t).

Hence by exercising the derivative at an optimal exercise time t, the buyer takes full
advantage of the derivative: the resulting pay-off equals the value of the derivative. On the
other hand, if Π̃_Y(t) > Y(t) and the buyer wants to close the (long) position on the American
derivative, then the optimal strategy is to sell the derivative, thereby cashing the amount
Π̃_Y(t).

Theorem 6.15. Assume (i) holds and let C(t) be the price of an American call at time
t ∈ [0, T ]. Assume further that the underlying stock price follows a generalized geometric
Brownian motion and that the interest rate R(t) of the money market is strictly positive for
all times. Then C(t) > Y (t), for all t ∈ [0, T ). In particular it is never optimal to exercise
the call prior to maturity.

Proof. We can assume S(t) ≥ K, as for S(t) < K the claim is obvious (since C(t) ≥ 0).

Denoting by D(t) = exp(−∫_0^t R(s)ds) the discounting process, the price of the European
call is Ẽ[(S(T) − K)_+ D(T)/D(t)|F_W(t)], hence by (i):
\[ C(t) \geq \widetilde{E}[(S(T) - K)_+ D(T)/D(t)|F_W(t)] \geq \widetilde{E}[(S(T) - K)D(T)/D(t)|F_W(t)] \]
\[ = \widetilde{E}[S(T)D(T)/D(t)|F_W(t)] - K\,\widetilde{E}[D(T)/D(t)|F_W(t)] > D(t)^{-1}\widetilde{E}[S^*(T)|F_W(t)] - K = S(t) - K, \]
where we used D(T )/D(t) < 1 (by the positivity of the interest rate R(t)) and the martingale
property of the discounted price {S ∗ (t)}t∈[0,T ] of the stock.
It follows that under the assumptions of the previous theorem, American and European
call options have the same value.
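The strict inequality in this proof can be seen concretely with the Black-Scholes formula. The sketch below (illustrative parameters; `bs_call` and `Phi` are our own helpers) checks that the European call dominates its intrinsic value across a range of stock prices when r > 0:

```python
import math

def Phi(x):
    # standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, K, r, sigma, tau):
    # Black-Scholes price of a European call
    d1 = (math.log(s / K) + (r + sigma**2 / 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * Phi(d1) - K * math.exp(-r * tau) * Phi(d2)

# With r > 0 the call price exceeds (s - K)_+, so early exercise of the
# American call is never optimal. Parameters are illustrative.
K, r, sigma, tau = 40.0, 0.06, 0.2, 0.5
ok = all(bs_call(s, K, r, sigma, tau) > max(s - K, 0.0)
         for s in [20.0, 35.0, 40.0, 45.0, 80.0, 200.0])
print(ok)
```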
Remark 6.14. The result is valid for general standard American derivatives with convex
pay-off function, see [21, Section 8.5].
Remark 6.15. A notable exception to the assumed conditions in Theorem 6.15 is when the
underlying stock pays a dividend. In this case it can be shown that it is optimal to exercise
the American call immediately before the dividend is paid, provided the price of the stock
is sufficiently high, see Section 6.9.2 below.
Definition 6.9. Let T ∈ (0, ∞). A random variable τ : Ω → [0, T ] is called a stopping
time for the filtration {FW (t)}t≥0 if {τ ≤ t} ∈ FW (t), for all t ∈ [0, T ]. We denote by QT
the set of all stopping times for the filtration {FW (t)}t≥0 .
Think of τ as the time at which some random event takes place. Then τ is a stopping
time if the occurrence of the event before or at time t can be inferred by the information
available up to time t (no future information is required). For the applications that we have
in mind, τ will be the optimal exercise time of an American derivative, which marks the
event that the price of the derivative equals its intrinsic value.
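As a concrete illustration (our own construction, with arbitrary parameters): the first passage of a simulated stock path below a level L is a stopping time, since deciding whether it has happened by time t requires only the path up to t, never future values:

```python
import numpy as np

# A canonical stopping time: the first time a simulated GBM path drops to
# a level L. Parameters are illustrative, not from the text.
S0, r, sigma, L, T, N = 10.0, 0.05, 0.3, 8.0, 5.0, 5000

rng = np.random.default_rng(5)
dt = T / N
t = np.linspace(0.0, T, N + 1)
dW = np.sqrt(dt) * rng.standard_normal(N)
W = np.concatenate(([0.0], np.cumsum(dW)))
S = S0 * np.exp((r - sigma**2 / 2) * t + sigma * W)

hit = np.nonzero(S <= L)[0]
tau = t[hit[0]] if hit.size else np.inf  # tau ~ min{t : S(t) <= L}, or "never"
print(tau)
```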
From now on we assume that the market has constant parameters and r > 0. Hence the
price of the stock is given by the geometric Brownian motion
\[ S(t) = S(0)\,e^{(r - \frac{\sigma^2}{2})t + \sigma \widetilde{W}(t)}. \]
We recall that in this case the price Π_Y(0, T) at time t = 0 of a European derivative with
pay-off Y = g(S(T)) at maturity time T > 0 is given by
\[ \Pi_Y(0,T) = \widetilde{E}\big[ e^{-rT} g\big( S(0)\,e^{(r - \frac{\sigma^2}{2})T + \sigma \widetilde{W}(T)} \big) \big]. \]
Now, if the writer of the American derivative were sure that the buyer would exercise at the
time u ∈ (0, T ], then the fair price of the American derivative at time t = 0 would be equal to
ΠY (0, u). As the writer cannot anticipate when the buyer will exercise, we would be tempted
to define the price of the American derivative at time zero as max{ΠY (0, u), 0 ≤ u ≤ T }.
However this definition would actually be unfair, as it does not take into account the fact
that the exercise time is a stopping time, i.e., it is random and it cannot be inferred using
future information. This leads us to the following definition.

Definition 6.10. In a market with constant parameters, the fair price at time t = 0 of the
standard American derivative with intrinsic value Y (t) = g(S(t)) and maturity T > 0 is
given by
\[ \widetilde{\Pi}(0) = \max_{\tau \in Q_T} \widetilde{E}\big[ e^{-r\tau} g\big( S(0)\,e^{(r - \frac{\sigma^2}{2})\tau + \sigma \widetilde{W}(\tau)} \big) \big]. \qquad (6.78) \]

It is not possible in general to find a closed formula for the price of an American
derivative. A notable exception is the price of perpetual American put options, which we
discuss next.
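A discrete-time analogue of (6.78) can be computed exactly on a binomial tree, where the maximum over stopping times reduces to backward induction: at each node one takes the larger of the intrinsic value and the discounted continuation value. The sketch below is only an illustration (the function and its parameters are our own, not from the text):

```python
import math

def american_put_binomial(S0, K, r, sigma, T, N):
    """Backward induction on an N-step binomial tree: at each node take the
    larger of the intrinsic value and the discounted continuation value."""
    dt = T / N
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    q = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up-probability
    disc = math.exp(-r * dt)
    # pay-off at maturity
    V = [max(K - S0 * u**j * d**(N - j), 0.0) for j in range(N + 1)]
    for n in range(N - 1, -1, -1):
        for j in range(n + 1):
            cont = disc * (q * V[j + 1] + (1 - q) * V[j])
            intrinsic = max(K - S0 * u**j * d**(n - j), 0.0)
            V[j] = max(intrinsic, cont)    # early exercise when optimal
    return V[0]

price = american_put_binomial(S0=36.0, K=40.0, r=0.06, sigma=0.2, T=1.0, N=500)
print(price)
```

With these parameters the American put is worth noticeably more than its European counterpart (≈ 3.84 by the Black-Scholes formula), reflecting the value of early exercise.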

6.9.1 Perpetual American put options


An American put option is called perpetual if it never expires, i.e., T = +∞. This is of
course an idealization, but perpetual American puts are very useful to visualize the structure
of general American put options. In this section we follow closely the discussion in [21,
Section 8.3]. Definition 6.10 becomes the following.
Definition 6.11. Let Q be the set of all stopping times for the filtration {FW (t)}t≥0 , i.e.,
τ ∈ Q iff τ : Ω → [0, ∞] is a random variable and {τ ≤ t} ∈ FW (t), for all t ≥ 0. The fair
price at time t = 0 of the perpetual American put with intrinsic value Y (t) = (K − S(t))+ is
\[ \widetilde{\Pi}(0) = \max_{\tau \in Q} \widetilde{E}\big[ e^{-r\tau}(K - S(\tau))_+ \big] = \max_{\tau \in Q} \widetilde{E}\big[ e^{-r\tau}\big( K - S(0)\,e^{(r - \frac{\sigma^2}{2})\tau + \sigma \widetilde{W}(\tau)} \big)_+ \big]. \qquad (6.79) \]

Theorem 6.16. There holds


\[ \widetilde{\Pi}(0) = v_{L_*}(S(0)), \qquad (6.80) \]
where
\[ v_L(x) = \begin{cases} K - x, & 0 \leq x \leq L, \\ (K - L)\left( \dfrac{x}{L} \right)^{-\frac{2r}{\sigma^2}}, & x > L, \end{cases} \]
and
\[ L_* = \frac{2r}{2r + \sigma^2}\,K. \]
Before we prove the theorem, some remarks are in order:
(i) L∗ < K;

(ii) For S(0) ≤ L_* we have Π̃(0) = v_{L_*}(S(0)) = K − S(0) = (K − S(0))_+. Hence when
S(0) ≤ L_* it is optimal to exercise the derivative.

(iii) We have Π̃(0) > (K − S(0))_+ for S(0) > L_*. In fact
\[ v'_{L_*}(x) = -\frac{2r}{\sigma^2}\left( \frac{x}{L_*} \right)^{-\frac{2r}{\sigma^2}-1}\frac{K - L_*}{L_*}, \]
hence v'_{L_*}(L_*) = −1. Moreover
\[ v''_{L_*}(x) = \frac{2r}{\sigma^2}\left( \frac{2r}{\sigma^2} + 1 \right)\left( \frac{x}{L_*} \right)^{-\frac{2r}{\sigma^2}-2}\frac{K - L_*}{L_*^2}, \]

which is always positive. Thus the graph of v_{L_*}(x) always lies above K − x for x > L_*.
It follows that it is not optimal to exercise the derivative if S(0) > L∗ .

(iv) In the perpetual case, any time is equivalent to t = 0, as the time left to maturity is
always infinite. Hence
\[ \widetilde{\Pi}(t) = v_{L_*}(S(t)). \]

In conclusion the theorem is telling us that the buyer of the derivative should exercise as
soon as the stock price falls below the threshold L∗ . In fact we can reformulate the theorem
in the following terms:
Theorem 6.17. The maximum of Ẽ[e^{−rτ}(K − S(τ))_+] over all possible τ ∈ Q is achieved
at τ = τ_*, where
\[ \tau_* = \min\{ t \geq 0 : S(t) = L_* \}. \]
Moreover Ẽ[e^{−rτ_*}(K − S(τ_*))_+] = v_{L_*}(S(0)).
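The smooth pasting property v'_{L_*}(L_*) = −1 and the inequality v_{L_*}(x) ≥ (K − x)_+ noted in the remarks above are easy to verify numerically; the parameters below are arbitrary illustrations:

```python
# Numerical sketch (illustrative parameters): the perpetual-put value
# function pastes smoothly onto K - x at L_* and dominates the intrinsic
# value above L_*.
r, sigma, K = 0.05, 0.3, 10.0
L_star = 2 * r * K / (2 * r + sigma**2)

def v(x):
    if x <= L_star:
        return K - x
    return (K - L_star) * (x / L_star) ** (-2 * r / sigma**2)

h = 1e-6
slope_right = (v(L_star + h) - v(L_star)) / h   # one-sided slope at L_*
dominates = all(v(L_star + 0.1 * k) >= max(K - L_star - 0.1 * k, 0.0)
                for k in range(1, 200))
print(L_star, slope_right, dominates)
```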
For the proof of Theorem 6.16 we need the optional sampling theorem:
Theorem 6.18. Let {X(t)}t≥0 be an adapted process and τ a stopping time. Let t ∧ τ =
min(t, τ). If {X(t)}_{t≥0} is a martingale/supermartingale/submartingale, then {X(t ∧ τ)}_{t≥0} is
also a martingale/supermartingale/submartingale.
We can now prove Theorem 6.16. We divide the proof in two steps, which correspond
respectively to Theorem 8.3.5 and Corollary 8.3.6 in [21].

Step 1: The stochastic process {e−r(t∧τ ) vL∗ (S(t ∧ τ ))}t≥0 is a super-martingale for all τ ∈ Q.
Moreover for S(0) > L∗ the stochastic process {e−r(t∧τ∗ ) vL∗ (S(t ∧ τ∗ ))}t≥0 is a martingale.
By Itô’s formula,
\[ d\big( e^{-rt} v_{L_*}(S(t)) \big) = e^{-rt}\Big[ -r v_{L_*}(S(t)) + r S(t) v'_{L_*}(S(t)) + \frac{1}{2}\sigma^2 S(t)^2 v''_{L_*}(S(t)) \Big] dt + e^{-rt}\sigma S(t) v'_{L_*}(S(t))\,d\widetilde{W}(t). \]

The drift term is zero for S(t) > L∗ and it is equal to −rK dt for S(t) ≤ L∗ . Hence
\[ e^{-rt} v_{L_*}(S(t)) = v_{L_*}(S(0)) - rK \int_0^t e^{-ru} I_{S(u) \leq L_*}\,du + \int_0^t e^{-ru}\sigma S(u) v'_{L_*}(S(u))\,d\widetilde{W}(u). \]

Since the drift term is non-positive, {e^{−rt}v_{L_*}(S(t))}_{t≥0} is a supermartingale and thus, by the
optional sampling theorem, the process {e^{−r(t∧τ)}v_{L_*}(S(t ∧ τ))}_{t≥0} is also a supermartingale,
for all τ ∈ Q. Now, if S(0) > L∗ , then, by continuity of the paths of the geometric Brownian
motion, S(u, ω) > L∗ as long as u < τ∗ (ω). Hence by stopping the process at τ∗ the stock
price will never fall below L∗ and therefore the drift term vanishes, that is
\[ e^{-r(t \wedge \tau_*)} v_{L_*}(S(t \wedge \tau_*)) = v_{L_*}(S(0)) + \int_0^{t \wedge \tau_*} e^{-ru}\sigma S(u) v'_{L_*}(S(u))\,d\widetilde{W}(u). \]

The Itô integral is a martingale and thus the Itô integral stopped at time τ∗ is also a
martingale by the optional sampling theorem. The claim follows.

Step 2: The identity (6.80) holds. The supermartingale property of the process
{e^{−r(t∧τ)}v_{L_*}(S(t ∧ τ))}_{t≥0} implies that its expectation is non-increasing, hence
\[ \widetilde{E}\big[ e^{-r(t \wedge \tau)} v_{L_*}(S(t \wedge \tau)) \big] \leq v_{L_*}(S(0)). \]
As v_{L_*}(x) is bounded and continuous, the limit t → +∞ gives
\[ \widetilde{E}\big[ e^{-r\tau} v_{L_*}(S(\tau)) \big] \leq v_{L_*}(S(0)). \]
As v_{L_*}(x) ≥ (K − x)_+ we also have
\[ \widetilde{E}\big[ e^{-r\tau}(K - S(\tau))_+ \big] \leq v_{L_*}(S(0)). \]
Taking the maximum over all τ ∈ Q we obtain
\[ \widetilde{\Pi}(0) = \max_{\tau \in Q} \widetilde{E}\big[ e^{-r\tau}(K - S(\tau))_+ \big] \leq v_{L_*}(S(0)). \]

Now we prove Π̃(0) ≥ v_{L_*}(S(0)). This is obvious for S(0) ≤ L_*. In fact, letting for instance
τ̃ = min{t ≥ 0 : S(t) ≤ L_*}, we have τ̃ ≡ 0 for S(0) ≤ L_*, and so
\[ \max_{\tau \in Q} \widetilde{E}\big[ e^{-r\tau}(K - S(\tau))_+ \big] \geq \widetilde{E}\big[ e^{-r\tilde{\tau}}(K - S(\tilde{\tau}))_+ \big] = (K - S(0))_+ = v_{L_*}(S(0)), \]
for S(0) ≤ L_*. For S(0) > L_* we use the martingale property of the stochastic process
{e^{−r(t∧τ_*)}v_{L_*}(S(t ∧ τ_*))}_{t≥0}, which implies
\[ \widetilde{E}\big[ e^{-r(t \wedge \tau_*)} v_{L_*}(S(t \wedge \tau_*)) \big] = v_{L_*}(S(0)). \]
Hence in the limit t → +∞ we obtain
\[ v_{L_*}(S(0)) = \widetilde{E}\big[ e^{-r\tau_*} v_{L_*}(S(\tau_*)) \big]. \]
Moreover e^{−rτ_*}v_{L_*}(S(τ_*)) = e^{−rτ_*}v_{L_*}(L_*) = e^{−rτ_*}(K − S(τ_*))_+, hence
\[ v_{L_*}(S(0)) = \widetilde{E}\big[ e^{-r\tau_*}(K - S(\tau_*))_+ \big]. \]
It follows that
\[ \widetilde{\Pi}(0) = \max_{\tau \in Q} \widetilde{E}\big[ e^{-r\tau}(K - S(\tau))_+ \big] \geq v_{L_*}(S(0)), \]
which completes the proof. □


Next we discuss the problem of hedging the perpetual American put with a portfolio invested
in the underlying stock and the risk-free asset.

Definition 6.12. A portfolio process {hS (t), hB (t)}t≥0 is said to be replicating the perpetual
American put if its value {V(t)}_{t≥0} equals Π̃(t) for all t ≥ 0.

Thus by setting-up a replicating portfolio, the writer of the perpetual American put is
sure to always be able to afford to pay-off the buyer. Note that in the European case a
self-financing hedging portfolio is trivially replicating, as the price of European derivatives
has been defined as the value of such portfolios. However in the American case a replicating
portfolio need not be self-financing: if the buyer does not exercise at an optimal exercise
time, the writer must withdraw cash from the portfolio in order to replicate the derivative.
This leads to the definition of portfolio generating a cash flow.

Definition 6.13. A portfolio {hS (t), hB (t)}t≥0 with value {V (t)}t≥0 is said to generate a
cash flow with rate c(t) if {c(t)}t≥0 is adapted to {FW (t)}t≥0 and

dV (t) = hS (t)dS(t) + hB (t)dB(t) − c(t)dt (6.81)

Remark 6.16. Note that the cash flow has been defined so that c(t) > 0 when the investor
withdraws cash from the portfolio (causing a decrease of its value).

The following theorem is Corollary 8.3.7 in [21].

Theorem 6.19. The portfolio given by

\[ h_S(t) = v'_{L_*}(S(t)), \qquad h_B(t) = \frac{v_{L_*}(S(t)) - h_S(t)S(t)}{B(0)e^{rt}} \]
is replicating the perpetual American put while generating the cash flow c(t) = rK I_{S(t)<L_*}
(i.e., cash is withdrawn at the rate rK whenever S(t) < L∗ , provided of course the buyer
does not exercise the derivative).

Proof. By definition, V(t) = h_S(t)S(t) + h_B(t)B(t) = v_{L_*}(S(t)) = Π̃(t), hence the portfolio
is replicating. Moreover
\[ dV(t) = d\big( v_{L_*}(S(t)) \big) = h_S(t)\,dS(t) + \frac{1}{2} v''_{L_*}(S(t))\,\sigma^2 S(t)^2\,dt. \qquad (6.82) \]
2
Now, a straightforward calculation shows that vL∗ (x) satisfies
1
−rvL∗ + rxvL0 ∗ + σx2 vL00 ∗ = −rKIS(t)<L∗ ,
2
a relation which was already used in step 1 in the proof of Theorem 6.16. It follows that
1 00
v (S(t))σ 2 S(t)2 dt = r(vL∗ (S(t)) − S(t)hS (t))dt − rKIS(t)<L∗ dt
2 L∗
= hB (t)dB(t) − rKIS(t)<L∗ dt.

Hence (6.82) reduces to (6.81) with c(t) = rKIS(t)<L∗ , and the proof is complete.

6.9.2 American calls on a dividend-paying stock
Let ĉ_a(t, S(t), K, T) denote the Black-Scholes price at time t of the American call with strike
K and maturity T assuming that the underlying stock pays the dividend aS(t_0^−) at time t_0 ∈
(0, T). We denote by c_a(t, S(t), K, T) the Black-Scholes price of the corresponding European
call. We omit the subscript a to denote prices in the absence of dividends. Moreover, replacing
the letter c with the letter P gives the price of the corresponding put option. We say that
it is optimal to exercise the American call at time t if its Black-Scholes price at this time
equals the intrinsic value of the call, i.e., ĉ_a(t, S(t), K, T) = (S(t) − K)_+.
Theorem 6.20. Consider the American call with strike K and expiration date T and assume
that the underlying stock pays the dividend aS(t_0^−) at the time t_0 ∈ (0, T). Then

\[ \hat{c}_a(t, S(t), K, T) > (S(t) - K)_+, \qquad \text{for } t \in [t_0, T), \]

i.e., it is not optimal to exercise the American call prior to maturity after the dividend is
paid. Moreover, there exists δ > 0 such that, if
\[ S(t_0^-) > \max\Big( \frac{\delta}{1-a},\, K \Big), \]
then the equality
\[ \hat{c}_a(t_0^-, S(t_0^-), K, T) = (S(t_0^-) - K)_+ \]

holds, and so it is optimal to exercise the American call “just before” the dividend is to be
paid.
Proof. For the first claim we can assume (S(t) − K)+ = S(t) − K, otherwise the American
call is out of the money and so it is clearly not optimal to exercise. By Theorem 6.7 we have

ca (t, S(t), K, T ) = c(t, S(t), K, T ), Pa (t, S(t), K, T ) = P (t, S(t), K, T ), for t ≥ t0 .

Hence, by Theorem 6.5, the put-call parity holds after the dividend is paid:

ca (t, S(t), K, T ) = Pa (t, S(t), K, T ) + S(t) − Ke−r(T −t) , t ≥ t0 .

Thus, for t ∈ [t0 , T ),

\[ \hat{c}_a(t, S(t), K, T) \geq c_a(t, S(t), K, T) > S(t) - K = (S(t) - K)_+, \]

where we used that P (t, S(t), K, T ) > 0 and r ≥ 0. This proves the first part of the theorem,
i.e., the fact that it is not optimal to exercise the American call prior to expiration after the
dividend has been paid. In particular

\[ \hat{c}_a(t, S(t), K, T) = c_a(t, S(t), K, T), \qquad \text{for } t \geq t_0. \qquad (6.83) \]

Next we show that it is optimal to exercise the American call “just before the dividend is
paid”, i.e., ĉ_a(t_0^−, S(t_0^−), K, T) = (S(t_0^−) − K)_+, provided the price of the stock is sufficiently

high. Of course it must be S(t_0^−) > K. Assume first that ĉ_a(t_0^−, S(t_0^−), K, T) > S(t_0^−) − K;
then, owing to (6.83), ĉ_a(t_0^−, S(t_0^−), K, T) = c_a(t_0^−, S(t_0^−), K, T) (buying the American call
just before the dividend is paid is not better than buying the European call, since it is
never optimal to exercise the derivative prior to expiration). By Theorem 6.7 we have
c_a(t_0^−, S(t_0^−), K, T) = c(t_0^−, (1 − a)S(t_0^−), K, T) = c(t_0, (1 − a)S(t_0^−), K, T), where for the
latter equality we used the continuity in time of the Black-Scholes price function in the
absence of dividends. Since (1 − a)S(t_0^−) = S(t_0), then
\[ \hat{c}_a(t_0^-, S(t_0^-), K, T) > S(t_0^-) - K \ \Rightarrow\ \hat{c}_a(t_0^-, S(t_0^-), K, T) = c(t_0, S(t_0), K, T). \]
Hence
\[ \hat{c}_a(t_0^-, S(t_0^-), K, T) > S(t_0^-) - K \ \Rightarrow\ c(t_0, S(t_0), K, T) > S(t_0^-) - K = S(t_0) + a S(t_0^-) - K. \]
Therefore, taking the contrapositive statement,
\[ c(t_0, S(t_0), K, T) \leq S(t_0) + a S(t_0^-) - K \ \Rightarrow\ \hat{c}_a(t_0^-, S(t_0^-), K, T) = S(t_0^-) - K. \qquad (6.84) \]
Next we remark that the function x → c(t, x, K, T) − x is decreasing (since ∆ = ∂_x c = Φ(d_1) < 1,
see Theorem 6.6), and
\[ \lim_{x \to 0^+} \big( c(t, x, K, T) - x \big) = 0, \qquad \lim_{x \to +\infty} \big( c(t, x, K, T) - x \big) = \lim_{x \to +\infty} \big( P(t, x, K, T) - K e^{-r(T-t)} \big) = -K e^{-r(T-t)}, \]
see Exercise 6.7. Thus if aS(t_0^−) − K > −Ke^{−r(T−t_0)}, i.e., S(t_0^−) > a^{−1}K(1 − e^{−r(T−t_0)}),
there exists ω such that if S(t_0) > ω, i.e., S(t_0^−) > ω/(1 − a), then the inequality
c(t_0, S(t_0), K, T) ≤ S(t_0) + aS(t_0^−) − K holds. It follows by (6.84) that for such values
of S(t_0^−) it is optimal to exercise the call “at time t_0^−”. Letting δ = max(ω, K(1 − e^{−r(T−t_0)}))
concludes the proof of the theorem.

Exercise 6.18. Prove that it is not optimal to exercise the American call at time t ∈ [0, t_0)
if S(t) < (K/a)(1 − e^{−r(T−t)}).

6.A Appendix: Solutions to selected problems

Exercise 6.5. The pay-off function is g(z) = k + z log z. Hence the Black-Scholes price of
the derivative is ΠY (t) = v(t, S(t)), where
Z   2 √ 
r− σ2 τ −σ τ x x2 dx
−rτ
v(t, s) = e g se e− 2 √

ZR 
σ2 √

2 √ x2 dx
−rτ (r− σ2 )τ −σ τ x
=e k + se (log s + (r − )τ − σ τ x) e− 2 √
R 2 2π
dx
Z √
1 2
= ke−rτ + s log s e− 2 (x+σ τ ) √

Z R
σ2 √ 2 dx √ √ 2 dx
Z
1 1
− 2 (x+σ τ )
+ s(r − )τ e √ − sσ τ xe− 2 (x+σ τ ) √
2 R 2π R 2π

163
Using that
dx dx √
Z √
Z √
− 12 (x+σ τ )2 1
τ )2
e √ = 1, xe− 2 (x+σ √ = −σ τ ,
R 2π R 2π
we obtain
σ2
v(t, s) = ke−rτ + s log s + s(r + )τ.
2
Hence
σ2
ΠY (t) = ke−rτ + S(t) log S(t) + S(t)(r +
)τ.
2
This completes the first part of the exercise. The number of shares of the stock in the hedging
portfolio is given by
hS (t) = ∆(t, S(t)),
∂v σ2
where ∆(t, s) = ∂s
= log s + 1 + (r + 2
)τ . Hence

σ2
hS (t) = 1 + (r + )τ + log S(t).
2
The number of shares of the bond is obtained by using that

ΠY (t) = hS (t)S(t) + B(t)hB (t),

hence
1
hB (t) = (ΠY (t) − hS (t)S(t))
B(t)
σ2 σ2
= e−rt (ke−rτ + S(t) log S(t) + S(t)(r + )τ − S(t) − S(t)(r + )τ − S(t) log S(t))
2 2
= ke−rT − S(t)e−rt .

This completes the second part of the exercise. To compute the probability that Y > 0,
we first observe that the pay-off function g(z) has a minimum at z = e−1 and we have
g(e−1 ) = k − e−1 . Hence if k ≥ e−1 , the derivative has probability 1 to expire in the money.
If k < e−1 , there exist a < b such that

g(z) > 0 if and only if 0 < z < a or z > b.

Hence for k < e−1 we have

P(Y > 0) = P(S(T ) < a) + P(S(T ) > b).



Since S(T ) = S(0)eαT −σ TG
, with G ∈ N (0, 1), then

log S(0)
a
+ αT log S(0)
b
+ αT
S(T ) < a ⇔ G > √ := A, S(t) > b ⇔ G < √ := B.
σ T σ T

164
Thus
+∞ B
dx x2 dx
Z 2
Z
− x2
P(Y > 0) = P(G > A) + P(G < B) = e √ + e− 2 √
A 2π −∞ 2π
= 1 − Φ(A) + Φ(B).

This completes the solution of the third part of the exercise.
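The formula \(\mathbb{P}(Y>0) = 1 - \Phi(A) + \Phi(B)\) is easy to evaluate numerically: the roots \(a < e^{-1} < b\) of \(g(z) = k + z\log z\) can be found by bisection, and the result cross-checked by Monte Carlo simulation of \(S(T)\). All parameter values below are arbitrary illustrative choices with \(k < e^{-1}\):

```python
import math
import random

# Illustrative parameters with k < 1/e (arbitrary choices)
k, S0, alpha, sigma, T = 0.2, 1.0, 0.1, 0.3, 1.0

def g(z):
    # Pay-off function g(z) = k + z log z
    return k + z * math.log(z)

def bisect(f, lo, hi, n=200):
    # Simple bisection for a sign change of f on [lo, hi]
    for _ in range(n):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# g attains its minimum at 1/e; its roots straddle 1/e when k < 1/e
a = bisect(g, 1e-12, math.exp(-1))
b = bisect(g, math.exp(-1), 1.0)

Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
A = (math.log(S0 / a) + alpha * T) / (sigma * math.sqrt(T))
B = (math.log(S0 / b) + alpha * T) / (sigma * math.sqrt(T))
p_exact = 1 - Phi(A) + Phi(B)

# Monte Carlo cross-check, sampling S(T) = S0 exp(alpha T - sigma sqrt(T) G)
random.seed(0)
N = 200_000
hits = sum(1 for _ in range(N)
           if g(S0 * math.exp(alpha * T - sigma * math.sqrt(T) * random.gauss(0, 1))) > 0)
print(p_exact, hits / N)
```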

Exercise 6.14. Let
\[
Q(t) = \int_0^t \sigma(s)^2\,ds.
\]
We have \(dQ(t) = \sigma(t)^2\,dt\). We compute
\begin{align*}
d\big(e^{-rt}f(t,\sigma^2(t),Q(t))\big) &= e^{-rt}\Big[-rf\,dt + \partial_t f\,dt + \partial_x f\,d\sigma^2(t) + \frac{1}{2}\partial_x^2 f\,d\sigma^2(t)\,d\sigma^2(t)\\
&\quad + \partial_y f\,dQ(t) + \frac{1}{2}\partial_y^2 f\,dQ(t)\,dQ(t) + \partial^2_{xy} f\,dQ(t)\,d\sigma^2(t)\Big]\\
&= e^{-rt}\Big[\partial_t f + a(b-\sigma^2(t))\partial_x f + \sigma^2(t)\partial_y f + \frac{c^2}{2}\sigma^2(t)\partial_x^2 f - rf\Big]dt\\
&\quad + e^{-rt}c\,\sigma(t)\,\partial_x f\,d\widetilde{W}(t),
\end{align*}

where the function \(f\) and its derivatives are evaluated at \((t,\sigma^2(t),Q(t))\). As the discounted risk-neutral price must be a martingale in the risk-neutral probability measure, the drift term in the above equation must vanish. This is achieved by imposing that \(f\) satisfies the PDE
\[
\partial_t f + a(b-x)\partial_x f + x\,\partial_y f + \frac{c^2}{2}\,x\,\partial_x^2 f = rf. \tag{6.85}
\]
Since \(\Pi_Y(T) = Y = f(T,\sigma^2(T),Q(T))\), the terminal condition is
\[
f(T,x,y) = N\Big(\sqrt{\frac{\kappa}{T}\,y} - K\Big)_+.
\]
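Assuming the variance follows the dynamics \(d\sigma^2(t) = a(b-\sigma^2(t))\,dt + c\,\sigma(t)\,d\widetilde{W}(t)\) under the risk-neutral measure (consistent with the drift and diffusion coefficients appearing in the computation above), the price \(f(0,\sigma^2(0),0)\) also admits the Monte Carlo representation \(e^{-rT}\,\widetilde{\mathbb{E}}\big[N(\sqrt{\kappa Q(T)/T}-K)_+\big]\). A minimal Euler-scheme sketch, with all parameter values chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (arbitrary assumptions, not from the text)
a, b, c = 2.0, 0.04, 0.3            # mean reversion, long-run variance, vol-of-vol
r, T, notional, K, kappa = 0.05, 1.0, 100.0, 0.15, 1.0
v0 = 0.04                            # initial variance sigma^2(0)

n_steps, n_paths = 200, 100_000
dt = T / n_steps

# Euler scheme for d sigma^2 = a(b - sigma^2) dt + c sigma dW,
# together with Q(t) = int_0^t sigma^2(s) ds
v = np.full(n_paths, v0)
Q = np.zeros(n_paths)
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)
    Q += v * dt
    v = np.maximum(v + a * (b - v) * dt + c * np.sqrt(v) * dW, 0.0)  # truncate at 0

# Discounted expectation of the terminal pay-off N(sqrt(kappa Q(T)/T) - K)_+
payoff = notional * np.maximum(np.sqrt(kappa * Q / T) - K, 0.0)
price = np.exp(-r * T) * payoff.mean()
print(price)
```

Truncating the variance at zero after each Euler step is one standard way to keep \(\sqrt{v}\) well defined; in practice more refined schemes for square-root diffusions are often preferred.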

Bibliography

[1] L. Arnold: Stochastic differential equations. Theory and applications. Wiley Interscience
(1974)

[2] F. Black, M. Scholes: The Pricing of Options and Corporate Liabilities. The Journal of
Political Economy 81, 637–654 (1973)

[3] D. Brigo, F. Mercurio: Interest Rate Models - Theory and Practice. Second Ed.. Springer
(2006)

[4] R. Cont, P. Tankov: Financial Modelling With Jump Processes. Taylor & Francis (2004)

[5] S. Calogero: Introduction to options pricing theory. Lecture notes for the course “Options and Mathematics”, Chalmers

[6] R. M. Dudley: Real Analysis and Probability. Cambridge studies in advanced mathe-
matics (2002)

[7] L. C. Evans: An introduction to stochastic differential equations. AMS (2013)

[8] L. C. Evans: Partial Differential Equations. AMS (1998)

[9] W. Feller: Two singular diffusion problems. Ann. Math. 54, 173–182 (1951)

[10] A. Friedman: Partial Differential Equations of Parabolic type. Dover Publications, Inc.
New York (2008)

[11] J. Gatheral: The volatility surface. Wiley Finance (2006)

[12] S. L. Heston: A Closed-Form Solution for Options with Stochastic Volatility with Ap-
plications to Bond and Currency Options. The Review of Financial Studies 6, 327–343
(1993)

[13] I. Karatzas, S. E. Shreve: Brownian motion and stochastic calculus. Springer-Verlag, New York (1988)

[14] I. Karatzas, S. E. Shreve: Brownian Motion and Stochastic Calculus. Second Edition, Springer-Verlag (1998)

[15] D. Lamberton, B. Lapeyre: Introduction to Stochastic Calculus applied to Finance (2nd
ed.). Chapman & Hall/CRC Financial Mathematics Series (2008)

[16] A. E. Lindsay, D. R. Brecher: Simulation of the CEV process and the local martingale property.

[17] B. Øksendal: Stochastic differential equations. An introduction with applications. 5th ed., Springer-Verlag (2000)

[18] M. M. Rao: Probability Theory with Applications. Academic Press (1984)

[19] D. Revuz, M. Yor: Continuous martingales and Brownian motion. Third edition,
Springer (2001)

[20] M. Schroder: Computing the Constant Elasticity of Variance Option Pricing Formula. The Journal of Finance 44, 211–219 (1989)

[21] S. E. Shreve: Stochastic Calculus for Finance II. Springer Finance (2008)

[22] A. Timmermann, C. W. J. Granger: Efficient market hypothesis and forecasting. International Journal of Forecasting 20, 15–27 (2004)

