Lecture Notes
Financial Derivatives
and PDE’s
Simone Calogero
March 2, 2017
Contents
1 Probability spaces
  1.1 σ-algebras and information
  1.2 Probability measure
  1.3 Filtered probability spaces
  1.A Appendix: The ∞-coin tosses probability space
  1.B Appendix: Solutions to selected problems

3 Expectation
  3.1 Expectation and variance of random variables
  3.2 Computing the expectation of a random variable
  3.3 Characteristic function
  3.4 Quadratic variation of stochastic processes
  3.5 Conditional expectation
  3.6 Martingales
  3.7 Markov processes
  3.A Appendix: Solutions to selected problems

4 Stochastic calculus
  4.1 Introduction
  4.2 The Itô integral of step processes
  4.3 Itô’s integral of general stochastic processes
  4.4 Diffusion processes
    4.4.1 The product rule in stochastic calculus
    4.4.2 The chain rule in stochastic calculus
  4.5 Girsanov’s theorem
  4.6 Diffusion processes in financial mathematics
  4.A Appendix: Solutions to selected problems
Chapter 1
Probability spaces
For instance, if Ω = {♥, 1, $}, then

2^Ω = {∅, {♥}, {1}, {$}, {♥, 1}, {♥, $}, {1, $}, {♥, 1, $} = Ω},

which contains 2^3 = 8 elements. Here ∅ denotes the empty set, which by definition is a
subset of every set.
In probability theory, the elements ω ∈ Ω are called sample points and represent the
possible outcomes of a given experiment (or trial), while the subsets of Ω correspond to
events which may occur in the experiment. For instance, if the experiment consists of
rolling a die, then Ω = {1, 2, 3, 4, 5, 6} and A = {2, 4, 6} identifies the event that the result
of the experiment is an even number. Now let

Ω = ΩN = {H, T}^N = {ω = (γ1 , . . . , γN ) : γi ∈ {H, T}, i = 1, . . . , N },   (1.1)

where H stands for “head” and T stands for “tail”. Each element ω = (γ1 , . . . , γN ) ∈ ΩN
is called an N-toss and represents a possible outcome of the experiment “tossing a coin N
consecutive times”. Evidently, ΩN contains 2^N elements and so 2^ΩN contains 2^(2^N) elements.
We show in Appendix 1.A at the end of the present chapter that Ω∞ (the sample space for
the experiment “tossing a coin infinitely many times”) is uncountable.
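The counting above is easy to check for small N; a minimal sketch in Python (the helper `n_tosses` is our own, not part of the notes):

```python
from itertools import product

def n_tosses(N):
    """Enumerate the sample space Omega_N of the N-tosses experiment."""
    return list(product("HT", repeat=N))

omega_3 = n_tosses(3)
print(len(omega_3))        # |Omega_3| = 2^3 = 8
print(2 ** len(omega_3))   # |2^(Omega_3)| = 2^(2^3) = 256 events
```

Already for N = 10 the power set has 2^1024 elements, which is why general constructions never enumerate it explicitly.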
A collection of events, e.g., {A1 , A2 , . . . } ⊂ 2^Ω, is also called information. To understand
the meaning of this terminology, suppose that the experiment has been performed and we
observe that the events A1 , A2 , . . . have occurred. We may then use this information to
restrict the possible outcomes of the experiment. For instance, if we are told that in a 5-toss
the following two events have occurred:
Definition 1.1. A collection F ⊆ 2^Ω of subsets of Ω is called a σ-algebra on Ω if

(i) ∅ ∈ F;

(ii) A ∈ F ⇒ A^c := {ω ∈ Ω : ω ∉ A} ∈ F;

(iii) ∪_{k∈N} Ak ∈ F, for all {Ak}k∈N ⊆ F.
Exercise 1.1. Let F be a σ-algebra. Show that Ω ∈ F and that ∩k∈N Ak ∈ F, for all
countable families {Ak }k∈N ⊂ F of events.
Exercise 1.2. Let Ω = {1, 2, 3, 4, 5, 6} be the sample space of a dice roll. Which of the
following sets of events are σ-algebras on Ω?
2. {∅, {1}, {2}, {1, 2}, {1, 3, 4, 5, 6}, {2, 3, 4, 5, 6}, {3, 4, 5, 6}, Ω},
Exercise 1.3 (•). Prove that the intersection of any number of σ-algebras (including un-
countably many) is a σ-algebra. Show with a counterexample that the union of two σ-algebras
is not necessarily a σ-algebra.
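For a finite Ω the σ-algebra axioms can be verified mechanically, which also illustrates the counterexample asked for in Exercise 1.3; a sketch in Python (the helper `is_sigma_algebra` is our own):

```python
def is_sigma_algebra(F, Omega):
    """Check the sigma-algebra axioms on a finite sample space:
    contains the empty set, closed under complement and under union
    (for a finite collection, closure under pairwise unions suffices)."""
    F = {frozenset(A) for A in F}
    Omega = frozenset(Omega)
    if frozenset() not in F:
        return False
    if any(Omega - A not in F for A in F):
        return False
    return all(A | B in F for A in F for B in F)

Omega = {1, 2, 3, 4, 5, 6}
print(is_sigma_algebra([set(), Omega], Omega))   # True: the trivial sigma-algebra

# The union of two sigma-algebras need not be a sigma-algebra:
F1 = [set(), {1}, {2, 3, 4, 5, 6}, Omega]
F2 = [set(), {2}, {1, 3, 4, 5, 6}, Omega]
print(is_sigma_algebra(F1 + F2, Omega))          # False: {1} ∪ {2} is missing
```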
Remark 1.1 (Notation). The letter A is used to denote a generic event in the σ-algebra.
If we need to consider two such events, we denote them by A, B, while N generic events are
denoted A1 , . . . , AN .
Let us comment on Definition 1.1. The empty set represents the “nothing happens”
event, while Ac represents the “A does not occur” event. Given a finite number A1 , . . . , AN
of events, their union is the event that at least one of the events A1 , . . . , AN occurs, while
their intersection is the event that all events A1 , . . . , AN occur. The reason to include the
countable union/intersection of events in our analysis is to make it possible to “take limits”
without crossing the boundaries of the theory. Of course, unions and intersections of infinitely
many sets only matter when Ω is not finite.
The smallest σ-algebra on Ω is F = {∅, Ω}, which is called the trivial σ-algebra. There
is no relevant information contained in the trivial σ-algebra. The largest possible σ-algebra
is F = 2^Ω, which contains the full amount of accessible information. When Ω is countable,
it is common to pick 2^Ω as the σ-algebra of events. However, as already mentioned, when Ω
is uncountable this choice is unwise. A useful procedure to construct a σ-algebra of events
when Ω is uncountable is the following. First we select a collection of events (i.e., subsets of
Ω), which for some reason we regard as fundamental. Let O denote this collection of events.
Then we introduce the smallest σ-algebra containing O, which is formally defined as follows.
Definition 1.2. Let O ⊂ 2^Ω. The σ-algebra generated by O is

F_O = ∩ { F : F ⊂ 2^Ω is a σ-algebra and O ⊆ F },

i.e., the intersection of all σ-algebras containing O. When Ω = R^d and O is the collection
of the open sets of R^d, the σ-algebra generated by O is called the Borel σ-algebra and
denoted B(R^d). The elements of B(R^d) are called Borel sets.
Remark 1.2 (Notation). The Borel σ-algebra B(R) plays an important role in these notes,
so we shall use a specific notation for its elements. A generic event in the σ-algebra B(R)
will be denoted U ; if we need to consider two such events we denote them by U, V , while
N generic Borel sets of R will be denoted U1 , . . . UN . Recall that for general σ-algebras, the
notation used is the one indicated in Remark 1.1.
The σ-algebra generated by O has a particularly simple form when O is a partition of Ω.
Definition 1.3. Let I ⊆ N. A collection O = {Ak}k∈I of non-empty subsets of Ω is called
a partition of Ω if

(i) the events {Ak}k∈I are disjoint, i.e., Aj ∩ Ak = ∅, for j ≠ k;

(ii) ∪_{k∈I} Ak = Ω.

If I is a finite set we call O a finite partition of Ω.
Note that any countable sample space Ω = {ωk }k∈N is partitioned by the atomic events
Ak = {ωk }, where {ωk } identifies the event that the result of the experiment is exactly ωk .
Exercise 1.4. Show that when O is a partition, the σ-algebra generated by O is given by
the set of all subsets of Ω which can be written as the union of sets in the partition O (plus
the empty set, of course).
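The claim of Exercise 1.4 can be checked by brute force for a finite partition, by generating all unions of its blocks; a sketch (helper names are ours):

```python
from itertools import combinations

def sigma_from_partition(partition):
    """The sigma-algebra generated by a finite partition: all unions of
    blocks of the partition (the empty union contributes the empty set)."""
    blocks = [frozenset(b) for b in partition]
    events = set()
    for r in range(len(blocks) + 1):
        for combo in combinations(blocks, r):
            events.add(frozenset().union(*combo))
    return events

O = [{1, 2}, {3, 4}, {5, 6}]          # a partition of a die roll
F = sigma_from_partition(O)
print(len(F))                          # 2^3 = 8 events, as Exercise 1.4 predicts
print(frozenset({1, 2, 5, 6}) in F)    # True: the union of two blocks
```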
Exercise 1.5. Find the partition of Ω = {1, 2, 3, 4, 5, 6} that generates the σ-algebra in
point 2 of Exercise 1.2.
Definition 1.4. A probability measure on (Ω, F) is a map

P : F → [0, 1]

such that

(i) P(Ω) = 1;

(ii) for any countable collection of disjoint events {Ak}k∈N ⊆ F, we have

P( ∪_{k=1}^∞ Ak ) = Σ_{k=1}^∞ P(Ak).

The triple (Ω, F, P) is called a probability space.
Exercise 1.7 (Continuity of probability measures (?)). Let {Ak}k∈N ⊆ F be such that
Ak ⊆ Ak+1, for all k ∈ N, and let A = ∪k Ak. Show that

lim_{k→∞} P(Ak) = P(A).

Similarly, if {Ak}k∈N ⊆ F is such that Ak+1 ⊆ Ak, for all k ∈ N, and A = ∩k Ak, show
that

lim_{k→∞} P(Ak) = P(A).
• There is only one probability measure defined on the trivial σ-algebra, namely P(∅) = 0
and P(Ω) = 1.
In general, if Ω = {ωk}k∈N is countable and {pk}k∈N are non-negative numbers such
that Σ_k pk = 1, we define

P(A) = Σ_{k: ωk ∈ A} pk,   A ∈ 2^Ω,

while P(∅) = 0.
• For the N-tosses experiment, the probability of an atomic event {ω} is defined as

P({ω}) = p^{NH(ω)} (1 − p)^{NT(ω)},   (1.2)

for some 0 < p < 1, where NH(ω) is the number of H in ω and NT(ω) is the number
of T in ω (NH(ω) + NT(ω) = N). We say that the coin is fair if p = 1/2. The
probability of a generic event A ∈ F = 2^ΩN is obtained by adding up the probabilities
of the atomic events
whose disjoint union forms the event A. For instance, assume N = 3 and consider the
event

“The first and the second toss are equal”.

Denote by A ∈ F the set corresponding to this event. Then clearly A is the (disjoint)
union of the atomic events

{(H, H, H)}, {(H, H, T)}, {(T, T, H)}, {(T, T, T)}.

Hence,

P(A) = p^3 + p^2(1 − p) + p(1 − p)^2 + (1 − p)^3 = p^2 + (1 − p)^2.
Now let f : R → [0, ∞) be an integrable function such that ∫_R f(x) dx = 1. Then

P(U) = ∫_U f(x) dx,   (1.3)

U ∈ B(R), defines a probability measure on (R, B(R)).
Remark 1.3 (Riemann vs. Lebesgue integral). The integral in (1.3) must be understood
in the Lebesgue sense, since we are integrating a general measurable function over a general
Borel set. If f is a sufficiently regular (say, continuous) function, and U = (a, b) ⊂ R is
an interval, then the integral in (1.3) can be understood in the Riemann sense. Although
this last case is sufficient for most applications in finance, all integrals in these notes should
be understood in the Lebesgue sense, unless otherwise stated. The knowledge of Lebesgue
integration theory is however not required for our purposes.
Exercise 1.8 (•). Prove that Σ_{ω∈ΩN} P({ω}) = 1, where P({ω}) is given by (1.2).
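The identity of Exercise 1.8 can also be verified numerically by summing over all of ΩN; a sketch assuming the atomic probabilities (1.2) (the helper `prob` is ours):

```python
from itertools import product

def prob(omega, p):
    """P({omega}) = p^{N_H(omega)} (1-p)^{N_T(omega)}, cf. (1.2)."""
    n_heads = omega.count("H")
    return p ** n_heads * (1 - p) ** (len(omega) - n_heads)

p, N = 0.3, 5
total = sum(prob(omega, p) for omega in product("HT", repeat=N))
print(round(total, 12))    # 1.0
```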
Figure 1.1: The Bertrand paradox. The length T of the chord pq is greater than L.
Consider the experiment of drawing a chord pq at random on a circle C of radius 1, so
that the sample space is Ω = {(p, q) : p, q ∈ C}. Let T denote the length of the chord pq
and let L denote the length of the side of an equilateral triangle inscribed in the circle C.
Note that all such triangles are obtained one from another by a rotation around the center
of the circle and all have the same side length L. Consider the event A = {(p, q) ∈ Ω : T > L}.
What is a reasonable definition for P(A)? On the one hand, we can suppose that one vertex
of the triangle is p, and thus T will be greater than L if and only if the point q lies on the
arc of the circle between the two vertices of the triangle different from p, see Figure 1.1(a).
Since the length of such an arc is 1/3 of the perimeter of the circle, it is reasonable to define
P(A) = 1/3. On the other hand, it is simple to see that T > L whenever the midpoint m of
the chord lies within a circle of radius 1/2 concentric to C, see Figure 1.1(b). Since the area
of the interior circle is 1/4 of the area of C, we are led to define P(A) = 1/4.
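Both answers can be reproduced by Monte Carlo, each corresponding to a different rule for sampling the chord; a sketch (the sampling rules and variable names are our own formulation of the two constructions above):

```python
import math
import random

random.seed(0)
n = 200_000
R = 1.0
L = math.sqrt(3) * R   # side of the equilateral triangle inscribed in C

# Rule 1: choose the endpoints p, q uniformly on the circle.
# The chord between angles a and b has length 2R|sin((a - b)/2)|.
hits1 = sum(
    2 * R * abs(math.sin((random.uniform(0, 2 * math.pi)
                          - random.uniform(0, 2 * math.pi)) / 2)) > L
    for _ in range(n)
)

# Rule 2: choose the midpoint m uniformly in the disk (accept-reject);
# T > L exactly when m lies in the concentric circle of radius R/2.
hits2 = trials = 0
while trials < n:
    x, y = random.uniform(-R, R), random.uniform(-R, R)
    if x * x + y * y <= R * R:
        trials += 1
        hits2 += x * x + y * y < (R / 2) ** 2

print(round(hits1 / n, 2), round(hits2 / n, 2))   # ≈ 0.33 and ≈ 0.25
```

The paradox is thus not a bug in either simulation: the two rules define genuinely different probability measures on the same set of chords.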
Whenever two probabilities are defined for the same experiment, we shall require them
to be equivalent, in the following sense.
Definition 1.5. Given two probability spaces (Ω, F, P) and (Ω, F, P̃), the probability mea-
sures P and P̃ are said to be equivalent if P(A) = 0 ⇔ P̃(A) = 0.
Conditional probability
It might be that the occurrence of an event B makes the occurrence of another event A more
or less likely. For instance, the probability of the event A = {the first two tosses of a fair
coin are both heads} is 1/4; however, if we know that the first toss is a tail, then P(A) = 0,
while P(A) = 1/2 if we know that the first toss is a head. This leads to the important definition
of conditional probability.
Definition 1.6. Given two events A, B such that P(B) > 0, the conditional probability
of A given B is defined as

P(A|B) = P(A ∩ B)/P(B).
To justify this definition, let FB = {A ∩ B}A∈F and set PB(A ∩ B) = P(A ∩ B)/P(B).
Then (B, FB , PB ) is a probability space in which the events that cannot occur simultaneously
with B are null events. Therefore it is natural to regard (B, FB , PB ) as the restriction of the
probability space (Ω, F, P) when B has occurred.
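The coin example above can be checked by direct enumeration of the restricted sample space; a sketch for two tosses of a fair coin (helper names are ours):

```python
from itertools import product

Omega = list(product("HT", repeat=2))    # the 4 outcomes of two fair tosses

def P(E):                                # uniform probability on Omega
    return len(E) / len(Omega)

def P_cond(A, B):                        # P(A|B) = P(A ∩ B) / P(B)
    return P(A & B) / P(B)

A = {om for om in Omega if om == ("H", "H")}   # both tosses heads
B_tail = {om for om in Omega if om[0] == "T"}  # first toss a tail
B_head = {om for om in Omega if om[0] == "H"}  # first toss a head

print(P(A), P_cond(A, B_tail), P_cond(A, B_head))   # 0.25 0.0 0.5
```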
If P(A|B) = P(A), the two events are said to be independent. The interpretation is the
following: if two events A, B are independent, then the occurrence of the event B does not
change the probability that A occurs. By Definition 1.6 we obtain the following equivalent
characterization of independent events.
Definition 1.7. Two events A, B are said to be independent if P(A ∩ B) = P(A)P(B). In
general, the events A1 , . . . , AN (N ≥ 2) are said to be independent if, for all 1 ≤ k1 < k2 <
· · · < km ≤ N , we have

P(Ak1 ∩ · · · ∩ Akm) = Π_{j=1}^m P(Akj).
Two σ-algebras F, G are said to be independent if A and B are independent, for all A ∈ G
and B ∈ F. In general the σ-algebras F1 , . . . , FN (N ≥ 2) are said to be independent if
A1 , A2 , . . . , AN are independent events, for all A1 ∈ F1 , . . . , AN ∈ FN .
Note that if F, G are two independent σ-algebras and A ∈ F ∩ G, then A is trivial. In
fact, if A ∈ F ∩ G, then P(A) = P(A ∩ A) = P(A)2 . Hence P(A) = 0 or 1. The interpretation
of this simple remark is that independent σ-algebras carry distinct information.
Exercise 1.9 (•). Given a fair coin and assuming N is odd, consider the following two
events A, B ⊆ ΩN :

A = “there are more heads than tails in the N-toss”,   B = “the first toss is a head”.

Show that P(A|B) > P(A).
Let AHH ⊆ ΩN be the event that the first two tosses are heads, and similarly define AHT , ATH , ATT . These four events form a partition of
ΩN and they generate a σ-algebra F2 as indicated in Exercise 1.4. Clearly, F1 ⊂ F2 . Going
on with three tosses, four tosses, and so on, until we complete the N -toss, we construct a
sequence
F1 ⊂ F2 ⊂ · · · ⊂ FN = 2^ΩN
of σ-algebras. The σ-algebra Fk contains all the events of the experiment that depend on
(i.e., which are resolved by) the first k tosses. The family {Fk }k=1,...,N of σ-algebras is an
example of filtration.
Definition 1.8. A filtration is a one parameter family {F(t)}t≥0 of σ-algebras such that
F(t) ⊆ F for all t ≥ 0 and F(s) ⊆ F(t) for all s ≤ t. A quadruple (Ω, F, {F(t)}t≥0 , P) is
called a filtered probability space.
In our applications t stands for the time variable and filtrations are associated to exper-
iments in which “information accumulates with time”. For instance, in the example given
above, the more times we toss the coin, the higher is the number of events which are resolved
by the experiment, i.e., the more information becomes accessible.
Let us show first that Ω∞ is uncountable. We use the well-known Cantor diagonal argument. Suppose that Ω∞ is countable and write

Ω∞ = {ω1 , ω2 , ω3 , . . . }.   (1.5)
Now consider the ∞-toss ω which is obtained by changing each toss along the diagonal, that
is to say, for each n ∈ N the nth toss of ω is H if the nth toss of ωn is T, and T otherwise.
It is clear that the ∞-toss ω does not belong to the set (1.5). In fact, by construction, the
first toss of ω is different from the first toss of ω1 , the second toss of ω is different from the
second toss of ω2 , . . . , the nth toss of ω is different from the nth toss of ωn , and so on, so
that each ∞-toss in (1.5) is different from ω. We conclude that the elements of Ω∞ cannot
be listed as if they comprised a countable set.
Now, let N ∈ N and recall that the sample space ΩN for the N -tosses experiment is given
by (1.1). For each ω̄ = (γ̄1 , . . . , γ̄N ) ∈ ΩN we define the event Aω̄ ⊂ Ω∞ by
Aω̄ = {ω = (γn )n∈N : γj = γ̄j , j = 1, . . . , N },
i.e., the event that the first N tosses in an ∞-toss are equal to (γ̄1 , . . . , γ̄N ). Define the
probability of this event as the probability of the N-toss ω̄, that is
P0(Aω̄) = p^{NH(ω̄)} (1 − p)^{NT(ω̄)},
where 0 < p < 1, NH (ω̄) is the number of heads in the N -toss ω̄ and NT (ω̄) = N − NH (ω̄)
is the number of tails in ω̄, see (1.2). Next consider the family of events
UN = {Aω̄}ω̄∈ΩN ⊂ 2^Ω∞.
It is clear that UN is, for each fixed N ∈ N, a partition of Ω∞ . Hence the σ-algebra
FN = FUN is generated according to Exercise 1.4. Note that FN contains all events of Ω∞
that are resolved by the first N tosses. Moreover FN ⊂ FN +1 , that is to say, {FN }N ∈N is
a filtration. Since P0 is defined for all Aω̄ ∈ UN , then it can be extended uniquely to the
entire FN , because each element A ∈ FN is the disjoint union of events of UN (see again
Exercise 1.4) and therefore the probability of A can be inferred by the property (ii) in the
definition of probability measure, see Definition 1.4. But then P0 extends uniquely to
F∞ = ∪_{N∈N} FN.
Hence we have constructed a triple (Ω∞ , F∞ , P0 ). Is this triple a probability space? The
answer is no, because F∞ is not a σ-algebra. To see this, let Ak be the event that the
kth toss in an infinite sequence of tosses is a head. Clearly Ak ∈ Fk for all k and therefore
{Ak }k∈N ⊂ F∞ . Now assume that F∞ is a σ-algebra. Then the event A = ∪k Ak would
belong to F∞ and therefore also Ac ∈ F∞ . The latter holds if and only if there exists N ∈ N
such that Ac ∈ FN . But Ac is the event that all tosses are tails, which of course cannot be
resolved by the information FN accumulated after just N tosses. We conclude that F∞ is
not a σ-algebra. In particular, we have shown that F∞ is not in general closed with respect
to the countable union of its elements. However it is easy to show that F∞ is closed with
respect to the finite union of its elements, and in addition satisfies the properties (i), (ii) in
Definition 1.4. This set of properties makes F∞ an algebra. To complete the construction
of the probability space for the ∞-coin tosses experiment, we need the following deep result.
Theorem 1.1 (Carathéodory's theorem). Let U be an algebra of subsets of Ω and P0 :
U → [0, 1] a map satisfying P0(Ω) = 1 and P0(∪_{i=1}^N Ai) = Σ_{i=1}^N P0(Ai), for every finite
collection {A1 , . . . , AN} ⊂ U of disjoint sets². Then there exists a unique probability measure
P on FU such that P(A) = P0(A), for all A ∈ U.

² P0 is called a pre-measure.
Hence the map P0 : F∞ → [0, 1] defined above extends uniquely to a probability measure
P defined on F = FF∞ . The resulting triple (Ω∞ , F, P) defines the probability space for the
∞-tosses experiment.
Exercise 1.8. Since for all k = 0, . . . , N the number of N-tosses ω ∈ ΩN having NH(ω) = k
is given by the binomial coefficient

C(N, k) = N!/(k!(N − k)!),

then

Σ_{ω∈ΩN} P({ω}) = Σ_{ω∈ΩN} p^{NH(ω)} (1 − p)^{NT(ω)} = (1 − p)^N Σ_{ω∈ΩN} (p/(1 − p))^{NH(ω)}
= (1 − p)^N Σ_{k=0}^N C(N, k) (p/(1 − p))^k.

By the binomial theorem the last sum equals (1 + p/(1 − p))^N = (1 − p)^{−N}, whence the
total is 1.
Figure 1.2: A numerical solution of Exercise 1.9 for a generic odd natural number N (plot of K(N) for N up to 100).
Exercise 1.9. We expect that P(A|B) > P(A), that is to say, the first toss being a head
increases the probability that the number of heads in the complete N -toss will be larger than
the number of tails. To verify this, we first observe that P(A) = 1/2, since N is odd and
thus there will be either more heads or more tails in any N-toss. Moreover, P(A|B) = P(C),
where C ⊆ ΩN−1 is the event that the number of heads in an (N − 1)-toss is greater than
or equal to the number of tails. Letting k be the number of heads, P(C) is the probability
that k ∈ {(N − 1)/2, . . . , N − 1}. Since there are C(N − 1, k) possible (N − 1)-tosses with
k heads, then

P(C) = Σ_{k=(N−1)/2}^{N−1} C(N − 1, k) (1/2)^k (1/2)^{N−1−k} = (1/2^{N−1}) Σ_{k=(N−1)/2}^{N−1} C(N − 1, k).
Thus proving the statement for a generic odd N is equivalent to proving the inequality

K(N) = (1/2^{N−1}) Σ_{k=(N−1)/2}^{N−1} C(N − 1, k) > 1/2.
A “numerical proof” of this inequality is provided in Figure 1.2. Note that the function
K(N ) is decreasing and converges to 1/2 as N → ∞.
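The “numerical proof” of Figure 1.2 is easy to reproduce; a sketch of the computation of K(N) (the helper name is ours):

```python
from math import comb

def K(N):
    """K(N) = 2^{-(N-1)} * sum_{k=(N-1)/2}^{N-1} C(N-1, k), for odd N."""
    assert N % 2 == 1
    return sum(comb(N - 1, k) for k in range((N - 1) // 2, N)) / 2 ** (N - 1)

print(K(3))                          # 0.75
print(K(3) > K(11) > K(101) > 0.5)   # True: decreasing toward 1/2
```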
Chapter 2
Throughout this chapter we assume that (Ω, F, {F(t)}t≥0 , P) is a given filtered probability
space.
Definition 2.1. A map X : Ω → R is called a random variable if {X ∈ U} ∈ F for all
U ∈ B(R), where

{X ∈ U} = {ω ∈ Ω : X(ω) ∈ U}

is the pre-image of the Borel set U. If there exists c ∈ R such that X(ω) = c almost surely,
we say that X is a deterministic constant.
Occasionally we shall also need to consider complex-valued random variables. These are
defined as the maps Z : Ω → C of the form Z = X + iY , where X, Y are real-valued random
variables and i is the imaginary unit (i2 = −1). Similarly a vector valued random variable
X = (X1 , . . . , XN ) : Ω → RN can be defined by simply requiring that each component
Xj : Ω → R is a random variable in the sense of Definition 2.1.
Remark 2.2. Equality among random variables is always understood to hold up to a null
set. That is to say, X = Y always means X = Y a.s., for all random variables X, Y : Ω → R.
Random variables are also called measurable functions, but we prefer to use this
terminology only when Ω = R and F = B(R). Measurable functions will be denoted by small
Latin letters (e.g., f, g, . . . ). If X is a random variable and Y = f (X) for some measurable
function f , then Y is also a random variable. We denote P(X ∈ U ) = P({X ∈ U })
the probability that X takes value in U ∈ B(R). Moreover, given two random variables
X, Y : Ω → R and the Borel sets U, V , we denote P(X ∈ U, Y ∈ V ) = P({X ∈ U} ∩ {Y ∈ V }),
which is the probability that the random variable X takes value in U and Y takes value in
V . The generalization to an arbitrary number of random variables is straightforward.
As the value attained by X depends on the result of the experiment, random variables
carry information, i.e., upon knowing the value attained by X we know something about the
outcome ω of the experiment. For instance, if X(ω) = (−1)^ω, where ω is the result of rolling
a die, and if we are told that X takes value 1, then we infer immediately that the result of
the roll is even. The information carried by a random variable X forms the σ-algebra generated by
X, whose precise definition is the following.
Definition 2.2. The σ-algebra generated by X is σ(X) = { {X ∈ U} : U ∈ B(R) }. Given
a σ-algebra G ⊆ F, X is said to be G-measurable if σ(X) ⊆ G.

Thus σ(X) contains all the events that are resolved by knowing the value of X. The
interpretation of X being G-measurable is that the information contained in G suffices to
determine the value taken by X in the experiment. Note that the σ-algebra generated by a
deterministic constant consists of trivial events only.
If Y is X-measurable then σ(X, Y ) = σ(X), i.e., the random variable Y does not add
any new information to the one already contained in X. Clearly, if Y = f (X) for some
measurable function f , then Y is X-measurable. It can be shown that the converse is also
true: if σ(Y ) ⊆ σ(X), then there exists a measurable function f such that Y = f (X) (see
Prop. 3 in [18]). The other extreme is when X and Y carry distinct information, i.e., when
σ(X) ∩ σ(Y ) consists of trivial events only. This occurs in particular when the two random
variables are independent.

¹ See Definition 1.2.
Definition 2.4. Let X : Ω → R be a random variable and G ⊂ F be a sub-σ-algebra. We
say that X is independent of G if σ(X) and G are independent in the sense of Definition 1.7.
Two random variables X, Y : Ω → R are said to be independent random variables
if the σ-algebras σ(X) and σ(Y ) are independent. More generally, the random variables
X1 , . . . , XN are independent if σ(X1 ), . . . , σ(XN ) are independent σ-algebras.
In the intermediate case, i.e., when Y is neither X-measurable nor independent of X, it
is expected that the knowledge of the value attained by X helps to derive information about
the values attainable by Y . We shall study this case in the next chapter.
Exercise 2.2 (•). Show that when X, Y are independent random variables, then σ(X)∩σ(Y )
consists of trivial events only. Show that two deterministic constants are always indepen-
dent. Finally assume Y = g(X) and show that in this case the two random variables are
independent if and only if Y is a deterministic constant.
Exercise 2.3. Which of the following pairs of random variables X, Y : ΩN → R are indepen-
dent? (Use only the intuitive interpretation of independence and not the formal definition.)
1. X(ω) = NT (ω); Y (ω) = 1 if the first toss is head, Y (ω) = 0 otherwise.
2. X(ω) = 1 if there exists at least one head in ω, X(ω) = 0 otherwise; Y (ω) = 1 if there
exists exactly one head in ω, Y (ω) = 0 otherwise.

3. X(ω) = number of times that a head is followed by a tail; Y (ω) = 1 if there exist two
consecutive tails in ω, Y (ω) = 0 otherwise.
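For a fair coin, the (in)dependence of pair 1 can also be tested directly on ΩN by comparing joint and product probabilities, i.e., the condition of Definition 1.7 applied to the events {X = x}, {Y = y}; a sketch for N = 4 (the helper `independent` is ours):

```python
from itertools import product

N = 4
Omega = list(product("HT", repeat=N))   # fair coin: uniform on Omega_4

def independent(X, Y):
    """Compare P(X=x, Y=y) with P(X=x)P(Y=y) over all attained values."""
    n = len(Omega)
    for x in set(X.values()):
        for y in set(Y.values()):
            p_xy = sum(X[om] == x and Y[om] == y for om in Omega) / n
            p_x = sum(X[om] == x for om in Omega) / n
            p_y = sum(Y[om] == y for om in Omega) / n
            if abs(p_xy - p_x * p_y) > 1e-12:
                return False
    return True

NT = {om: om.count("T") for om in Omega}       # number of tails
Y1 = {om: int(om[0] == "H") for om in Omega}   # first toss a head?
print(independent(NT, Y1))    # False: pair 1 is not independent

first = {om: int(om[0] == "H") for om in Omega}
second = {om: int(om[1] == "H") for om in Omega}
print(independent(first, second))   # True: distinct tosses are independent
```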
Theorem 2.1. Let X1 , . . . , XN be independent random variables. Let us divide the set
{X1 , . . . , XN } into m separate groups of random variables, namely, let
{X1 , . . . , XN } = {Xk1 }k1 ∈I1 ∪ {Xk2 }k2 ∈I2 ∪ · · · ∪ {Xkm }km ∈Im ,
where {I1 , I2 , . . . Im } is a partition of {1, . . . , N }. Let ni be the number of elements in the
set Ii , so that n1 + n2 + · · · + nm = N . Let g1 , . . . , gm be measurable functions such that
gi : Rni → R. Then the random variables
Y1 = g1 ((Xk1 )k1 ∈I1 ), Y2 = g2 ((Xk2 )k2 ∈I2 ), . . . , Ym = gm ((Xkm )km ∈Im )
are independent.
For instance, in the case of N = 2 independent random variables X1 , X2 , Theorem 2.1
asserts that Y1 = g(X1 ) and Y2 = f (X2 ) are independent random variables, for all measurable
functions f, g : R → R.
Exercise 2.4 (•). Prove Theorem 2.1 for the case N = 2.
Simple and discrete random variables
A special role is played by simple random variables. The simplest possible one is the indica-
tor function of an event: Given A ∈ F, the indicator function of A is the random variable
that takes value 1 if ω ∈ A and 0 otherwise, i.e.,

IA(ω) = 1 if ω ∈ A,   IA(ω) = 0 if ω ∈ A^c.
² Not all authors distinguish between simple and discrete random variables.
Let us see two examples of simple/discrete random variables that appear in financial
mathematics (and in many other applications). A simple random variable X is called a
binomial random variable if
• Range(X) = {0, 1, . . . , N};

• There exists p ∈ (0, 1) such that P(X = k) = C(N, k) p^k (1 − p)^{N−k}, k = 0, 1, . . . , N.
lim_{n→∞} s_n^X(ω) = X(ω), for all ω ∈ Ω.
Exercise 2.7 (•). Show that P(a < X ≤ b) = FX(b) − FX(a), for all a < b.
Show also that FX is (1) right-continuous, (2) increasing and (3) lim_{x→+∞} FX(x) = 1.
Exercise 2.8. Let F : R → [0, 1] be a measurable function satisfying the properties (1)–(3)
in Exercise 2.7. Show that there exists a probability space and a random variable X such
that F = FX .
All probability density functions considered in these notes are continuous, and therefore
the integral in (2.1) can be understood in the Riemann sense. Moreover in this case FX is
differentiable and we have
fX = dFX/dx.
If the integral in (2.1) is understood in the Lebesgue sense, then the density fX can be a
quite irregular function. In this case, the fundamental theorem of calculus for the Lebesgue
integral entails that the distribution FX (x) satisfying (2.1) is absolutely continuous, and so
in particular it is continuous. Conversely, if FX is absolutely continuous, then X admits a
density function. We remark that, regardless of the notion of integral being used, a simple
(or discrete) random variable X cannot admit a density in the sense of Definition 2.7, unless
it is a deterministic constant. Suppose in fact that X = Σ_{k=1}^N ak IAk is not a deterministic
constant. Assume that a1 = max(a1 , . . . , aN ). Then

lim_{x→a1⁻} FX(x) = 1 − P(X = a1) < 1,

while

lim_{x→a1⁺} FX(x) = 1 = FX(a1).

It follows that FX(x) is not continuous, and so in particular it cannot be written in
the form (2.1). To define the pdf of simple random variables, let
X = Σ_{k=1}^N ak IAk,
where without loss of generality we assume that the real numbers a1 , . . . , aN are distinct and
the sets A1 , . . . , AN are disjoint (see Exercise 2.5). The distribution function of X is
FX(x) = P(X ≤ x) = Σ_{ak ≤ x} P(X = ak).   (2.2)
which extend (2.1) to simple random variables. We remark that it is possible to unify the
definition of pdf for continuum and discrete random variables by writing the sum (2.4) as
an integral with respect to the Dirac measure, but we shall not do so.
We shall see that when a random variable X admits a density fX , all the relevant sta-
tistical information on X can be deduced by fX . We also remark that often one can prove
the existence of the pdf fX without however being able to derive an explicit formula for it.
For instance, fX is often given as the solution of a partial differential equation, or through
its (inverse) Fourier transform, which is called the characteristic function of X, see Sec-
tion 3.3. Some examples of density functions, which have important applications in financial
mathematics, are the following.
• A random variable X : Ω → R is said to be an exponential (or exponentially
distributed) random variable if it admits the density
fX(x) = λ e^{−λx} I_{x≥0},
for some λ > 0, which is called the intensity of the exponential random variable X. A
typical profile is shown in Figure 2.1(b) . We denote by E(λ) the set of all exponential
random variables with intensity λ > 0. The distribution function of an exponential
random variable X with intensity λ is given by
FX(x) = ∫_{−∞}^x fX(y) dy = λ ∫_0^x e^{−λy} dy = 1 − e^{−λx},   x ≥ 0.
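The distribution function just computed can be checked by simulation: since FX is invertible on (0, ∞), inverse-transform sampling turns uniform samples into E(λ) samples. A sketch (the numerical values λ and x are illustrative):

```python
import math
import random

random.seed(1)
lam = 2.0
n = 100_000
# Inverse-transform sampling: if U is uniform on (0,1), then
# X = -log(1 - U)/lambda has distribution function 1 - e^{-lambda x}.
samples = [-math.log(1 - random.random()) / lam for _ in range(n)]

x = 0.7
empirical = sum(s <= x for s in samples) / n
exact = 1 - math.exp(-lam * x)
print(round(exact, 3))                 # 0.753
print(abs(empirical - exact) < 0.01)   # True
```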
Figure 2.1: Densities of a normal random variable X and of an exponential random variable Y.
Figure 2.2: Densities of (non-central) chi-squared random variables with different degrees (δ = 1/2, 1, 2, 4, 5).
If a random variable X admits a density fX , then for all (possibly unbounded) intervals
I ⊂ R the result of Exercise 2.7 entails
P(X ∈ I) = ∫_I fX(y) dy.   (2.6)
which means that a standard normal random variable has about a 68.3% chance of taking
value in the interval [−1, 1].
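The 68.3% figure is Φ(1) − Φ(−1) = erf(1/√2), which the standard library can evaluate directly; a sketch (the helper `Phi` is ours):

```python
import math

def Phi(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

p = Phi(1) - Phi(-1)
print(round(p, 4))   # 0.6827
```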
Exercise 2.9 (•). Let X ∈ N (0, 1) and Y = X 2 . Show that Y ∈ χ2 (1).
Exercise 2.10. Let X ∈ N (0, 1). Show that the random variable W defined by
W = 1/X² for X ≠ 0,   W = 0 otherwise,
is Lévy distributed.
Exercise 2.11. Let X ∈ N(m, σ²) and Y = X². Show that

fY(x) = [cosh(m√x/σ²)/√(2πxσ²)] exp( −(x + m²)/(2σ²) ) I_{x>0}.
Now assume that FX is differentiable on the open set (0, ∞). Then there exists a
function fX+(x), x > 0, such that FX(x) − FX(0) = ∫_0^x fX+(t) dt. Hence, for all x ∈ R we find

FX(x) = p0 H(x) + ∫_{−∞}^x fX+(t) I_{t>0} dt,
where p0 = P(X = 0) and H(x) is the Heaviside function, i.e., H(x) = 1 if x ≥ 0, H(x) = 0
if x < 0. By introducing the delta-distribution through the formal identity

dH(x)/dx = δ(x),   (2.8)

we obtain, again formally, the following expression for the density function
fX(x) = dFX(x)/dx = p0 δ(x) + fX+(x).   (2.9)
The formal identities (2.8)-(2.9) become rigorous mathematical expressions when they are
understood in the sense of distributions. We shall refer to the term p0 δ(x) as the discrete
part of the density. The function fX+ is also called the defective density of the random
variable X. Note that

∫_0^∞ fX+(x) dx = 1 − p0.
The defective density is the actual pdf of X if and only if p0 = 0.
The typical example of a financial random variable whose pdf may have a discrete part
is the stock price S(t) at time t. For simple models (such as the geometric Brownian
motion (2.14) defined in Section 2.4 below), the stock price is strictly positive a.s. at all
finite times and the density has no discrete part. However, for more sophisticated models the
stock price can reach zero with positive probability at any finite time and so the pdf of the
stock price admits a discrete part P(S(t) = 0)δ(x). Hence these models take into account
the risk of default of the stock. We shall see an example in Section 6.5.
The random variables X, Y are said to admit the joint (probability) density function
fX,Y : R2 → [0, ∞) if fX,Y is integrable in R2 and
FX,Y(x, y) = ∫_{−∞}^x ∫_{−∞}^y fX,Y(η, ξ) dη dξ.   (2.10)
Moreover, if two random variables X, Y admit a joint density fX,Y , then each of them admits
a density (called marginal density in this context) which is given by
fX(x) = ∫_R fX,Y(x, y) dy,   fY(y) = ∫_R fX,Y(x, y) dx,
and similarly for the random variable Y . If W = g(X, Y ), for some measurable function g,
and I ⊂ R is an interval, the analogue of (2.7) in 2 dimensions holds, namely:
P(g(X, Y) ∈ I) = ∫_{{(x,y): g(x,y)∈I}} fX,Y(x, y) dx dy.
As an example of joint pdf, let m = (m1 , m2 ) ∈ R2 and C = (Cij )i,j=1,2 be a 2×2 positive
definite, symmetric matrix. Two random variables X, Y : Ω → R are said to be jointly
normally distributed with mean m and covariance matrix C if they admit the joint
density
fX,Y(x, y) = (1/√((2π)² det C)) exp( −(1/2) (z − m) · C⁻¹ · (z − m)ᵀ ),   (2.11)
where z = (x, y), “ · ” denotes the row by column product, C −1 is the inverse matrix of C
and v T is the transpose of the vector v.
Exercise 2.12 (•). Show that two random variables X, Y are jointly normally distributed if
and only if
fX,Y(x, y) = (1/(2πσ1σ2√(1 − ρ²))) ×
× exp( −(1/(2(1 − ρ²))) [ (x − m1)²/σ1² − 2ρ(x − m1)(y − m2)/(σ1σ2) + (y − m2)²/σ2² ] ),   (2.12)

where

σ1² = C11,   σ2² = C22,   ρ = C12/(σ1σ2).
Exercise 2.13 (?). Let X, Y ∈ N (0, 1) be independent and jointly normally distributed.
Show that the random variable Z defined by
Z = Y/X for X ≠ 0,   Z = 0 otherwise
is Cauchy distributed.
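A quick Monte Carlo sanity check of Exercise 2.13: a standard Cauchy variable Z satisfies P(|Z| ≤ 1) = (2/π) arctan(1) = 1/2, and the simulated ratio of independent standard normals reproduces this. A sketch:

```python
import random

random.seed(2)
n = 200_000
inside = 0
for _ in range(n):
    x, y = random.gauss(0, 1), random.gauss(0, 1)
    z = y / x if x != 0 else 0.0   # the ratio Z of Exercise 2.13
    inside += abs(z) <= 1

# For a standard Cauchy variable, P(|Z| <= 1) = (2/pi) arctan(1) = 1/2.
print(round(inside / n, 2))   # ≈ 0.5
```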
In the next theorem we establish a simple condition for the independence of two random
variables which admit a joint density.

Theorem 2.3.

(i) If two random variables X, Y admit the densities fX, fY and are independent, then
they admit the joint density

fX,Y(x, y) = fX(x)fY(y).

(ii) If two random variables X, Y admit a joint density fX,Y of the form

fX,Y(x, y) = u(x)v(y),

for some functions u, v : R → [0, ∞), then X, Y are independent and admit the densities
fX, fY given by

fX(x) = c u(x),   fY(y) = (1/c) v(y),

where

c = ∫_R v(x) dx = ( ∫_R u(y) dy )⁻¹.
Proof. We prove (ii); part (i) follows directly from the definition of independence. We have

{X ≤ x} = {X ≤ x} ∩ Ω = {X ≤ x} ∩ {Y ∈ R} = {X ≤ x, Y ∈ R}.

Hence,

P(X ≤ x) = ∫_{−∞}^{x} ∫_{−∞}^{∞} fX,Y (η, y) dy dη = ∫_{−∞}^{x} u(η) dη ∫_R v(y) dy = ∫_{−∞}^{x} c u(η) dη,

where c = ∫_R v(y) dy. Thus X admits the density fX (x) = c u(x). In the same fashion one
proves that Y admits the density fY (y) = c′ v(y), where c′ = ∫_R u(x) dx. Since

1 = ∫_R ∫_R fX,Y (x, y) dx dy = ∫_R u(x) dx ∫_R v(y) dy = c′ c,

we obtain c′ = 1/c, which completes the proof.
Remark 2.3. By Theorem 2.3 and the result of Exercise 2.12, we have that two jointly nor-
mally distributed random variables are independent if and only if ρ = 0 in the formula (2.12).
Exercise 2.14 (•). Let X ∈ N (0, 1) and Y ∈ E(1) be independent. Compute P(X ≤ Y ).
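Exercise 2.14 can also be checked by simulation. The closed form below comes from an integration by parts of ∫₀^∞ Φ(y)e^{−y} dy (our computation, not stated in the text, with Φ the standard normal CDF); the Monte Carlo estimate should agree with it.

```python
import math
import random

random.seed(1)

# X ~ N(0,1) and Y ~ E(1) independent; estimate P(X <= Y).
# Integration by parts of  int_0^inf Phi(y) e^{-y} dy  gives the closed form
#   P(X <= Y) = 1/2 + sqrt(e) * (1 - Phi(1)).
def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

exact = 0.5 + math.sqrt(math.e) * (1 - Phi(1))

N = 200_000
hits = sum(1 for _ in range(N)
           if random.gauss(0, 1) <= random.expovariate(1.0))
assert abs(hits / N - exact) < 0.01
```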
Exercise 2.15. Let X ∈ E(2), Y ∈ χ2 (3) be independent. Compute numerically (e.g., using
Mathematica) the following probability
Result: ≈ 0.893.
In Exercise 3.19 we give another criterion to establish whether two random variables are
independent, which applies also when the random variables do not admit a density.
The parameter t will be referred to as time parameter, since this is what it represents
in the applications in financial mathematics. Examples of stochastic processes in financial
mathematics are given in the next section.
Definition 2.10. Two stochastic processes {X(t)}t≥0 , {Y (t)}t≥0 are said to be independent
if for all m, n ∈ N and 0 ≤ t1 < t2 < · · · < tn , 0 ≤ s1 < s2 < · · · < sm , the σ-algebras
σ(X(t1 ), . . . , X(tn )), σ(Y (s1 ), . . . , Y (sm )) are independent.
Hence two stochastic processes {X(t)}t≥0 , {Y (t)}t≥0 are independent if the information
obtained by “looking” at the process {X(t)}t≥0 up to time T is independent of the informa-
tion obtained by “looking” at the process {Y (t)}t≥0 up to time S, for all S, T > 0. Similarly
one defines the notion of several independent stochastic processes.
Remark 2.4 (Notation). If t runs over a countable set, i.e., t ∈ {tk }k∈N , then a stochastic
process is equivalent to a sequence of random variables X1 , X2 , . . . , where Xk = X(tk ). In
this case we say that the stochastic process is discrete and we denote it by {Xk }k∈N . An
example of discrete stochastic process is the random walk defined below.
A special role is played by step processes: given 0 = t0 < t1 < t2 < . . . , a step process
is a stochastic process {∆(t)}t≥0 of the form
∆(t, ω) = Σ_{k=0}^{∞} Xk (ω) I_{[tk ,tk+1 )}(t).
[Figure 2.3: a typical path of a step process ∆(t, ω∗ ), taking the values Xk (ω∗ ) on the successive intervals determined by t0 = 0 < t1 < t2 < t3 < t4 .]
A typical path of a step process is depicted in Figure 2.3. Note that the paths of a step
process are right-continuous, but not left-continuous. Moreover, since Xk (ω) = ∆(tk , ω), we
can rewrite ∆(t) as
∆(t) = Σ_{k=0}^{∞} ∆(tk ) I_{[tk ,tk+1 )}(t).
It will be shown in Theorem 4.2 that any sufficiently regular stochastic process can be
approximated, in a suitable sense, by a sequence of step processes.
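A step process path is easy to represent in code: store the jump times tk and the values Xk (ω) for one fixed ω, and evaluate ∆(t) by locating the interval [tk , tk+1 ) containing t. A Python sketch (the times and values are illustrative):

```python
import bisect

# One fixed path of a step process: jump times t_0 = 0 < t_1 < ... and the
# value X_k taken on [t_k, t_{k+1}); the last value extends to infinity.
times = [0.0, 1.0, 2.5, 4.0]       # t_0, t_1, t_2, t_3
values = [1.0, -2.0, 0.5, 3.0]     # X_0, X_1, X_2, X_3

def step_process(t):
    """Delta(t) = sum_k X_k * I_{[t_k, t_{k+1})}(t); right-continuous in t."""
    assert t >= 0
    k = bisect.bisect_right(times, t) - 1
    return values[k]

assert step_process(0.0) == 1.0      # value on [t_0, t_1)
assert step_process(1.0) == -2.0     # right-continuity: the jump at t_1
assert step_process(2.4999) == -2.0
assert step_process(10.0) == 3.0
```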
In the same way as a random variable generates a σ-algebra, a stochastic process generates
a filtration. Informally, the filtration generated by a stochastic process {X(t)}t≥0 contains
the information accumulated by looking at the process for longer and longer periods of time.
Definition 2.11. The filtration generated by the stochastic process {X(t)}t≥0 is given by
{FX (t)}t≥0 , where
FX (t) = FO(t) , O(t) = ∪0≤s≤t σ(X(s)).
Hence FX (t) is the smallest σ-algebra containing σ(X(s)), for all 0 ≤ s ≤ t, see Def-
inition 1.2. Similarly one defines the filtration {FX,Y (t)}t≥0 generated by two stochastic
processes {X(t)}t≥0 , {Y (t)}t≥0 , as well as the filtration generated by any number of stochas-
tic processes.
Definition 2.12. If {F(t)}t≥0 is a filtration and FX (t) ⊆ F(t), for all t ≥ 0, we say that
the stochastic process {X(t)}t≥0 is adapted to the filtration {F(t)}t≥0 .
The property of {X(t)}t≥0 being adapted to {F(t)}t≥0 means that the information con-
tained in F(t) suffices to determine the value attained by the random variable X(s), for all
s ∈ [0, t]. Clearly, {X(t)}t≥0 is adapted to its own generated filtration {FX (t)}t≥0 . Moreover
if {X(t)}t≥0 is adapted to {F(t)}t≥0 and Y (t) = f (X(t)), for some measurable function f ,
then {Y (t)}t≥0 is also adapted to {F(t)}t≥0 .
Next we give an example of a (discrete) stochastic process. Let {Xt }t∈N be a sequence of
independent random variables satisfying

P(Xt = 1) = P(Xt = −1) = 1/2,

for all t ∈ N. For a concrete realization of these random variables, we may think of Xt as
being defined on the sample space Ω∞ of the ∞-coin tosses experiment (see Appendix 1.A).
In fact, letting ω = (γj )j∈N ∈ Ω∞ , we may set

Xt (ω) = 1, if γt = H,    Xt (ω) = −1, if γt = T .

Hence Xt : Ω∞ → {−1, 1} is the simple random variable Xt = I_{At} − I_{At^c} , where At = {ω ∈
Ω∞ : γt = H}. Clearly, FX (t) is the collection of all the events that are resolved by the first
t tosses, which is given as indicated at the beginning of Section 1.3.
To understand the meaning of the term “random walk”, consider a particle moving on
the real line in the following way: if Xt = 1 (i.e., if the toss number t is a head), at time t
the particle moves one unit of length to the right; if Xt = −1 (i.e., if the toss number t is
a tail), it moves one unit of length to the left. Then Mt = X1 + · · · + Xt gives the total amount
of units of length that the particle has travelled to the right or to the left up to time t.
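The particle description translates directly into a simulation. The sketch below (in Python; the sample sizes are arbitrary) generates Mt = X1 + · · · + Xt from fair coin tosses and checks E[Mt ] = 0 and Var[Mt ] = t, anticipating Exercise 3.6.

```python
import random
import statistics

random.seed(2)

# Symmetric random walk M_t = X_1 + ... + X_t, with X_j = +1/-1
# with probability 1/2 each (one fair coin toss per step).
def random_walk(t):
    return sum(random.choice((-1, 1)) for _ in range(t))

t, N = 25, 20_000
samples = [random_walk(t) for _ in range(N)]

# E[M_t] = 0 and Var[M_t] = t (cf. Exercise 3.6)
assert abs(statistics.mean(samples)) < 0.2
assert abs(statistics.pvariance(samples) - t) < 1.5
```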
The increments of the random walk are defined as follows. If (k1 , . . . , kN ) ∈ N^N is such
that 1 ≤ k1 < k2 < · · · < kN , we set

∆j = M_{kj} − M_{kj−1} ,   j = 1, . . . , N, where we set k0 = 0 and M0 = 0.

Hence ∆j is the total displacement of the particle from time kj−1 to time kj .
Theorem 2.4. The increments ∆1 , . . . , ∆N of the random walk are independent random
variables.
Proof. Since ∆j is a function of (X_{kj−1 +1} , . . . , X_{kj }), and the families of random variables entering different increments are disjoint and consist of independent random variables, the claim follows by the same argument as in Exercise 2.4.
Definition 2.14. A Brownian motion is a stochastic process {W (t)}t≥0 such that

(i) W (0) = 0 almost surely and the paths t → W (t) are continuous;

(ii) The increments over disjoint time intervals are independent, i.e., for all 0 = t0 < t1 <
· · · < tm , the random variables

W (t1 ) − W (t0 ), W (t2 ) − W (t1 ), . . . , W (tm ) − W (tm−1 )

are independent;

(iii) For all s < t, the increment W (t) − W (s) belongs to N (0, t − s).
Remark 2.5. Note carefully that the properties defining a Brownian motion depend on
the probability measure P. Thus a stochastic process may be a Brownian motion relative
to a probability measure P and not a Brownian motion with respect to another (possibly
equivalent) probability measure P̃. If we want to emphasize the probability measure P
with respect to which a stochastic process is a Brownian motion, we shall say that it is a
P-Brownian motion.
It can be shown that Brownian motions exist. In particular, it can be shown that the
sequence of stochastic processes {Wn (t)}t≥0 , n ∈ N, defined by

Wn (t) = (1/√n) M_{[nt]} ,   (2.13)

where Mt is the symmetric random walk and [z] denotes the integer part of z, converges to
a Brownian motion⁴. Therefore one may think of a Brownian motion as a time-continuum

⁴ The convergence holds in probability, i.e., lim_{n→∞} P(|Wn (t) − W (t)| ≥ ε) = 0, for all ε > 0.
version of a symmetric random walk which runs for an infinite number of “infinitesimal
time steps”. In fact, provided the number of time steps is sufficiently large, the process
{Wn (t)}t≥0 gives a very good approximation of a Brownian motion, which is useful for
numerical computations. An example of a path of the stochastic process {Wn (t)}t≥0 , for
n = 1000, is shown in Figure 2.4. Notice that there exist many Brownian motions and each
of them may have some specific properties besides those listed in Definition 2.14. However,
as long as we use only the properties (i)-(iii), we do not need to work with a specific example
of Brownian motion.
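The approximation (2.13) can be tried out numerically: for large n, Wn (t) = M_{[nt]}/√n should be approximately N (0, t). A Python sketch (the parameters are arbitrary):

```python
import math
import random
import statistics

random.seed(3)

# Scaled random walk W_n(t) = M_[nt] / sqrt(n) from formula (2.13).
def W_n(t, n):
    steps = round(n * t)                    # [nt] (here n*t is an integer)
    M = sum(random.choice((-1, 1)) for _ in range(steps))
    return M / math.sqrt(n)

n, t, N = 1000, 0.7, 5000
samples = [W_n(t, n) for _ in range(N)]

# W_n(t) should be approximately N(0, t): mean ~ 0 and variance ~ t
assert abs(statistics.mean(samples)) < 0.05
assert abs(statistics.pvariance(samples) - t) < 0.05
```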
Once a Brownian motion is introduced it is natural to require that the filtration {F(t)}t≥0
should be somehow related to it. For our future financial applications, the following class of
filtrations will play a fundamental role.
Definition 2.15. Let {W (t)}t≥0 be a Brownian motion and denote by σ⁺(W (t)) the σ-
algebra generated by the increments {W (s) − W (t); s ≥ t}, that is

σ⁺(W (t)) = σ({W (s) − W (t), s ≥ t}).

A filtration {F(t)}t≥0 is said to be non-anticipating for the Brownian motion {W (t)}t≥0
if {W (t)}t≥0 is adapted to {F(t)}t≥0 and, for all t ≥ 0, the σ-algebras σ⁺(W (t)) and F(t)
are independent.

The meaning is the following: the increments of the Brownian motion after time t are
independent of the information available at time t in the σ-algebra F(t). It is clear from the
previous definition that {FW (t)}t≥0 is a non-anticipating filtration for {W (t)}t≥0 . We shall
see later that many properties of Brownian motions that depend on {FW (t)}t≥0 also hold
with respect to any non-anticipating filtration (e.g., the martingale property, see Section 3.6).
Definition 2.16. A Poisson process with rate λ is a stochastic process {N (t)}t≥0 such
that

(i) N (0) = 0 almost surely;

(ii) The increments over disjoint time intervals are independent;

(iii) For all s < t, the increment N (t) − N (s) belongs to P(λ(t − s)).
Note in particular that N (t) is a discrete random variable, for all t ≥ 0, and that, in
contrast to the Brownian motion, the paths of a Poisson process are not continuous. The
Poisson process is the building block to construct more general stochastic processes with
jumps, which are very popular nowadays as models for the price of certain financial assets,
see [4].
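A standard way to simulate a Poisson process (one common construction, not spelled out in the text) uses the fact that the waiting times between consecutive jumps are independent E(λ) random variables. A Python sketch checking E[N (t)] = Var[N (t)] = λt:

```python
import random
import statistics

random.seed(4)

# Simulate a Poisson process with rate lam: exponential waiting times
# between jumps; N(t) counts the jumps up to time t, so N(t) ~ P(lam * t).
def poisson_process(t, lam):
    n, clock = 0, random.expovariate(lam)
    while clock <= t:
        n += 1
        clock += random.expovariate(lam)
    return n

lam, t, N = 2.0, 3.0, 20_000
samples = [poisson_process(t, lam) for _ in range(N)]

# For X ~ P(mu): E[X] = Var[X] = mu, here mu = lam * t = 6
assert abs(statistics.mean(samples) - lam * t) < 0.1
assert abs(statistics.pvariance(samples) - lam * t) < 0.3
```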
[Figure 2.4: a sample path of the process {Wn (t)}t≥0 for n = 1000.]
Stock price
The price per share at time t of a stock will be denoted by S(t). Typically S(t) > 0, for
all t ≥ 0, however, as discussed in Section 2.2.1, some models allow for the possibility that
S(t) = 0 with positive probability at finite times t > 0 (risk of default). Clearly {S(t)}t≥0
is a stochastic process. If we have several stocks, we shall denote their price by {S1 (t)}t≥0 ,
{S2 (t)}t≥0 , etc. Investors who own shares of a stock are those having a long position on the
stock, while investors short-selling the stock hold a short position. We recall that short-
selling a stock is the practice of selling the stock without actually owning it. Concretely, an
investor is short-selling N shares of a stock if the investor borrows the shares from a third
party and then sells them immediately on the market. The reason for short-selling assets
is the expectation that the price of the asset will decrease. If this is the case, then upon
re-purchasing the N shares in the future, and returning them to the lender, the short-seller
will profit from the difference between the price at the time of short-selling and the lower
re-purchase price.
The most popular model for the price of a stock is the geometric Brownian motion
stochastic process, which is given by
S(t) = S(0) exp(αt + σW (t)). (2.14)
Here {W (t)}t≥0 is a Brownian motion, α ∈ R is the instantaneous mean of log-return,
σ > 0 is the instantaneous volatility, while σ 2 is the instantaneous variance of the
stock. Note that α and σ are constant in this model. Moreover, S(0) is the price at time
t = 0 of the stock, which, according to Remark 2.7, is a deterministic constant. In Chapter 4
we introduce a generalization of the geometric Brownian motion, in which the instantaneous
mean of log-return and the instantaneous volatility of the stock are stochastic processes
{α(t)}t≥0 , {σ(t)}t≥0 (generalized geometric Brownian motion).
Exercise 2.17 (•). Derive the density of the geometric Brownian motion (2.14) and use
the result to show that P(S(t) = 0) = 0, i.e., a stock whose price is described by a geometric
Brownian motion cannot default.
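In the spirit of Exercise 2.17, the law of the geometric Brownian motion (2.14) can be checked by simulation: log(S(t)/S(0)) = αt + σW (t) is N (αt, σ²t), and S(t) > 0 always. A Python sketch with illustrative parameters:

```python
import math
import random
import statistics

random.seed(5)

# Geometric Brownian motion (2.14): S(t) = S(0) exp(alpha*t + sigma*W(t)),
# with W(t) ~ N(0, t). Parameters below are illustrative.
S0, alpha, sigma, t = 100.0, 0.05, 0.2, 2.0

def sample_S(t):
    W_t = random.gauss(0, math.sqrt(t))
    return S0 * math.exp(alpha * t + sigma * W_t)

N = 50_000
log_returns = [math.log(sample_S(t) / S0) for _ in range(N)]

# log(S(t)/S(0)) should be N(alpha*t, sigma^2 * t)
assert abs(statistics.mean(log_returns) - alpha * t) < 0.01
assert abs(statistics.pvariance(log_returns) - sigma**2 * t) < 0.01
# The price stays strictly positive: no default
assert all(sample_S(t) > 0 for _ in range(100))
```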
Risk-free assets
A money market is a market in which the object of trading is money. More precisely, a
money market is a type of financial market where investors can borrow and lend money at
a given interest rate and for a period of time T ≤ 1 year5 . Assets in the money market
(i.e., short term loans) are assumed to be risk-free, which means that their value is always
increasing in time. Examples of risk-free assets in the money market are repurchase agree-
ments (repos), certificates of deposit, treasury bills, etc. The stochastic process corresponding
to the price per share of a generic risk-free asset will be denoted by {B(t)}t∈[0,T ] . The in-
stantaneous interest rate of a risk-free asset is a stochastic process {R(t)}t∈[0,T ] such that
R(t) > 0, for all t ∈ [0, T ], and such that the value of the asset at time t is given by
B(t) = B(0) exp( ∫₀ᵗ R(s) ds ),   t ∈ [0, T ].   (2.15)
This corresponds to the investor debit/credit with the money market at time t if the amount
B(0) is borrowed/lent by the investor at time t = 0. An investor lending (resp. borrowing)
⁵ Loans with maturity longer than 1 year are called bonds; they will be discussed in more detail in
Chapter 6.
money has a long (resp. short) position on the risk-free asset (more precisely, on its interest
rate). We remark that the integral in the right hand side of (2.15) is to be evaluated path
by path, i.e.,

B(t, ω) = B(0) exp( ∫₀ᵗ R(s, ω) ds ),
for all fixed ω ∈ Ω. Although in the real world different risk-free assets have different
interest rates, throughout these notes we make the simplifying assumption that all assets
in the money market have the same instantaneous interest rate {R(t)}t∈[0,T ] , which we call
the interest rate of the money market. For the applications in options pricing theory it is
common to assume that the interest rate of the money market is a deterministic constant
R(t) = r, for all t ∈ [0, T ]. This assumption can be justified by the relatively short time of
maturity of options, see below.
Remark 2.8. The (average) interest rate of the money market is sometimes referred to
as “the cost of money”, and the ratio B(t)/B(0) is said to express the “time-value of
money”. This terminology is meant to emphasize that one reason for the “time-devaluation”
of money (in the sense that the purchasing power of money decreases with time) is precisely
the fact that money can earn interest by purchasing risk-free assets.
The stochastic process {D(t)}t∈[0,T ] given by

D(t) = exp( −∫₀ᵗ R(s) ds )

is called the discounting process. In general, if an asset price is multiplied by D(t), the
new stochastic process is called the discounted price of the asset. We denote the discounted
price by adding a subscript ∗ to the asset price. For instance, the discounted price of a stock
with price S(t) at time t is given by
S ∗ (t) = D(t)S(t).
Its meaning is the following: S ∗ (t) is the amount that should be invested in the money
market at time t = 0 in order that the value of this investment at time t replicates the
value of the stock at time t. Notice that S ∗ (t) < S(t). The discounted price of the stock
measures, roughly speaking, the loss in the stock value due to the “time-devaluation” of
money discussed above, see Remark 2.8.
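With a constant interest rate R(t) = r, the discounting process reduces to D(t) = e^{−rt}, and the replication property of S ∗ (t) described above can be verified directly (a Python sketch; the numbers are illustrative):

```python
import math

# Constant money-market rate R(t) = r: B(t) = B(0) e^{rt}, D(t) = e^{-rt},
# and the discounted stock price is S*(t) = D(t) S(t).
r, t = 0.03, 2.0
B0 = 100.0
B_t = B0 * math.exp(r * t)        # value of the money-market investment
D_t = math.exp(-r * t)            # discounting process

S_t = 150.0                       # an illustrative stock price at time t
S_star = D_t * S_t                # discounted stock price

# Investing S*(t) at time 0 in the money market replicates S(t) at time t:
assert abs(S_star * math.exp(r * t) - S_t) < 1e-12
assert S_star < S_t               # discounted price is lower, since r > 0
```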
Financial derivative
A financial derivative (or derivative security) is a contract whose value depends on the
performance of one (or more) other asset(s), which is called the underlying asset. There
exist various types of financial derivatives, the most common being options, futures, forwards
and swaps. Financial derivatives can be traded over the counter (OTC), or in a regularized
market. In the former case, the contract is stipulated between two individual investors, who
agree upon the conditions and the price of the contract. In particular, the same derivative
(on the same asset, with the same parameters) can have two different prices over the counter.
Derivatives traded in the market, on the contrary, are standardized contracts. Anyone, after
a proper authorization, can make offers to buy or sell derivatives in the market, in a way
very similar to how stocks are traded. Let us look at some examples of financial derivatives
(we shall introduce more in Chapter 6).
A call option is a contract between two parties, the buyer (or owner) of the call and
the seller (or writer) of the call. The contract gives to the buyer the right, but not the
obligation, to buy the underlying asset at some future time for a price agreed upon today,
which is called the strike price of the call. If the buyer can exercise this option only at some
given time t = T > 0 (where t = 0 corresponds to the time at which the contract is
stipulated) then the call option is called European, while if the option can be exercised
at any time in the interval (0, T ], then the option is called American. The time T > 0 is
called maturity time, or expiration date of the call. The seller of the call is obliged to
sell the asset to the buyer if the latter decides to exercise the option. If the option to buy
in the definition of a call is replaced by the option to sell, then the option is called a put
option.
In exchange for the option, the buyer must pay a premium to the seller. Suppose that
the option is a European option with strike price K, maturity time T and premium Π0 on a
stock with price S(t) at time t. When is it convenient for the buyer to exercise
the call? Let us define the pay-off of a European call as

Y = (S(T ) − K)+ ,

i.e., Y > 0 if the stock price at the expiration date is higher than the strike price of the call
and it is zero otherwise; similarly for a European put we set
Y = (K − S(T ))+ .
Note that Y is a random variable, because it depends on the random variable S(T ). Clearly, if
Y > 0 it is more convenient for the buyer to exercise the option rather than buying/selling the
asset on the market. Note however that the real profit for the buyer is given by N (Y − Π0 ),
where N is the number of option contracts owned by the buyer. Typically, options are sold
in blocks of 100 shares, that is to say, the minimum number of options that one can buy is
100, covering 100 shares of the underlying asset.
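The pay-offs of European calls and puts, and the buyer's net profit N (Y − Π0 ), can be encoded in a few lines (a Python sketch; the strike, prices and premium are illustrative):

```python
# European option pay-offs with strike K and terminal stock price S(T):
# call pays (S(T) - K)^+, put pays (K - S(T))^+.
def call_payoff(S_T, K):
    return max(S_T - K, 0.0)

def put_payoff(S_T, K):
    return max(K - S_T, 0.0)

K = 100.0
assert call_payoff(120.0, K) == 20.0   # in the money: exercising pays
assert call_payoff(90.0, K) == 0.0     # out of the money: let it expire
assert put_payoff(90.0, K) == 10.0

# Buyer's net profit N * (Y - Pi0) for N options at premium Pi0 per share
Pi0 = 4.0
profit = 100 * (call_payoff(120.0, K) - Pi0)
assert profit == 1600.0
```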
One reason why investors buy calls in the market is to protect a short position on the
underlying asset. In fact, suppose that an investor short-sells 100 shares of a stock at time
t = 0 with the agreement to return them to the original owner at time t0 > 0. The investor
believes that the price of the stock will go down in the future, but of course the price may
go up instead. To avoid possible large losses, at time t = 0 the investor buys 100 shares of
an American call option on the stock expiring at T ≥ t0 , with strike price K = S(0). If
the price of the stock at time t0 has not fallen below S(0), contrary to the investor's expectation,
then the investor will exercise the call, i.e., will buy 100 shares of the stock at the price
K = S(0). In this way the investor can return the shares to the lender with minimal losses.
In the same fashion, investors buy put options to protect a long position on the underlying
asset. The reason why investors write options is mostly to get liquidity (cash) to invest in
other assets6 .
Let us introduce some further terminology. A European call (resp. put) is said to be in
the money at time t if S(t) > K (resp. S(t) < K). The call (resp. put) is said to be out
of the money if S(t) < K (resp. S(t) > K). If S(t) = K, the (call or put) option is said
to be at the money at time t. The meaning of this terminology is self-explanatory.
The premium that the buyer has to pay to the seller for the option is the price (or value)
of the option. It depends on time (in particular, on the time left to expiration). Clearly,
the deeper in the money the option is, the higher its price will be. The holder of
the long position on the option is the buyer, while the seller holds the short position on the
option.
European call and put options are examples of more general contracts called European
derivatives. Given a function g : (0, ∞) → R, a standard European derivative with
pay-off Y = g(S(T )) and maturity time T > 0 is a contract that pays to its owner the
amount Y at time T > 0. Here S(T ) is the price of the underlying asset (which we take to
be a stock) at time T . The function g is called pay-off function of the derivative. The
term “European” refers to the fact that the contract cannot be exercised before time T ,
while the term “standard” refers to the fact that the pay-off depends only on the price of
the underlying at time T . The pay-off of a non-standard European derivative depends on
the path of the asset price during the interval [0, T ]. For example, the pay-off of an Asian
call is given by Y = ( ∫₀ᵀ S(t) dt − K )+ .
The price at time t of a European derivative (standard or not) with pay-off Y and
expiration date T will be denoted by ΠY (t). Hence {ΠY (t)}t∈[0,T ] is a stochastic process.
In addition, we now show that ΠY (T ) = Y holds, i.e., there exist no offers to buy (sell) a
derivative for less (more) than Y at the time of maturity. In fact, suppose that a derivative
is sold for ΠY (T ) < Y “just before” it expires at time T . In this way the buyer would make
the sure profit Y − ΠY (T ) at time T , which means that the seller would lose the same
amount. Conversely, upon buying a derivative “just before” maturity for more than
Y , the buyer would lose ΠY (T ) − Y . Thus in a rational market, ΠY (T ) = Y (or, more
precisely, ΠY (t) → Y , as t → T ).
A standard American derivative with pay-off function g is a contract which can be
exercised at any time t ∈ (0, T ] up to and including its maturity and that, upon exercise, pays the
amount g(S(t)) to the holder of the derivative. In these notes we are mostly concerned with
(standard) European derivatives, but in Chapter 6.9 we also discuss briefly some properties
of American call/put options.
⁶ Of course, speculation is also an important motivation to buy/sell options. However, the standard theory
of options pricing is firmly based on the interpretation of options as derivative securities and does not take
speculation into account.
Portfolio
The portfolio of an investor is the set of all assets in which the investor is trading. Mathe-
matically it is described by a collection of N stochastic processes

{h1 (t)}t≥0 , . . . , {hN (t)}t≥0 ,

where hk (t) represents the number of shares of the asset k held at time t in the investor's portfolio.
If hk (t) is positive, resp. negative, the investor has a long, resp. short, position on the asset
k at time t. If Πk (t) denotes the value of the asset k at time t, then {Πk (t)}t≥0 is a stochastic
process; the portfolio value is the stochastic process {V (t)}t≥0 given by
V (t) = Σ_{k=1}^{N} hk (t)Πk (t).
Remark 2.9. For modeling purposes, it is convenient to assume that an investor can trade
any fraction of shares of an asset, i.e., hk (t) : Ω → R, rather than hk (t) : Ω → Z.
The investor makes a profit in the time interval [t0 , t1 ] if V (t1 ) > V (t0 ); the investor
incurs a loss in the interval [t0 , t1 ] if V (t1 ) < V (t0 ). We now introduce the important
definition of arbitrage portfolio.
Definition 2.17. An arbitrage portfolio is a portfolio whose value {V (t)}t≥0 satisfies the
following properties, for some T > 0:

(i) V (0) = 0 almost surely;

(ii) P(V (T ) ≥ 0) = 1;

(iii) P(V (T ) > 0) > 0.
Hence an arbitrage portfolio is a risk-free investment in the interval [0, T ] which requires
no initial wealth and has a positive probability of giving a profit. We remark that the arbitrage
property depends on the probability measure P. However, it is clear that if two measures P
and P̃ are equivalent, then the arbitrage property is satisfied with respect to P if and only if
it is satisfied with respect to P̃. The guiding principle to devise theoretical models for asset
prices in financial mathematics is to ensure that one cannot set up an arbitrage portfolio by
investing in these assets (arbitrage-free principle).
Markets
A market in which the objects of trading are N risky assets (e.g., stocks) and M risk-free
assets in the money market is said to be “N+M dimensional”. Most of these notes focus
on the case of 1+1 dimensional markets in which we assume that the risky asset is a stock.
A portfolio invested in this market is a pair {hS (t), hB (t)}t≥0 of stochastic processes, where
hS (t) is the number of shares of the stock and hB (t) the number of shares of the risk-free
asset in the portfolio at time t. The value of such a portfolio is given by
V (t) = hS (t)S(t) + hB (t)B(t),
where S(t) is the price of the stock (given for instance by (2.14)), while B(t) is the value at
time t of the risk-free asset, which is given by (2.15).
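The value of a portfolio in the 1+1 dimensional market is a one-line computation once S(t) and B(t) are known. A Python sketch with a constant interest rate and illustrative positions:

```python
import math

# 1+1 dimensional market: V(t) = h_S(t) S(t) + h_B(t) B(t),
# with B(t) = B(0) e^{rt} for a constant rate r. Values are illustrative.
r, B0 = 0.02, 1.0

def B(t):
    return B0 * math.exp(r * t)

def portfolio_value(h_S, h_B, S_t, t):
    return h_S * S_t + h_B * B(t)

# Long 10 shares of the stock, short 500 shares of the risk-free asset
# (i.e. money borrowed from the money market):
V = portfolio_value(10.0, -500.0, 52.0, 1.0)
assert abs(V - (10.0 * 52.0 - 500.0 * math.exp(0.02))) < 1e-12
```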
Exercise 2.2. Let A be an event that is resolved by both variables X, Y . This means that
there exist I, J ⊆ R such that A = {X ∈ I} = {Y ∈ J}. Hence, using the independence of
X, Y ,
P(A) = P(A ∩ A) = P(X ∈ I, Y ∈ J) = P(X ∈ I)P(Y ∈ J) = P(A)P(A) = P(A)2 .
Therefore P(A) = 0 or P(A) = 1.
Now let a, b be two deterministic constants. Note that, for all I ⊂ R, P(a ∈ I) = 1 if a ∈ I
and P(a ∈ I) = 0 otherwise, and similarly for b. Hence

P(a ∈ I, b ∈ J) = 1 if a ∈ I and b ∈ J, and 0 otherwise,

so that in all cases P(a ∈ I, b ∈ J) = P(a ∈ I)P(b ∈ J).
Finally we show that X and Y = g(X) are independent if and only if Y is a deterministic
constant. For the “if” part we use that
P(a ∈ I, X ∈ J) = P(X ∈ J) if a ∈ I, and 0 otherwise,

so that in all cases P(a ∈ I, X ∈ J) = P(a ∈ I)P(X ∈ J).
For the “only if” part, let z ∈ R and I = {g(X) ≤ z} = {X ∈ g −1 (−∞, z]}. Then, using
the independence of X and Y = g(X),
P(g(X) ≤ z) = P(g(X) ≤ z, g(X) ≤ z) = P(X ∈ g −1 (−∞, z], g(X) ≤ z)
= P(X ∈ g −1 (−∞, z])P(g(X) ≤ z) = P(g(X) ≤ z)P(g(X) ≤ z).
Hence P(Y ≤ z) is either 0 or 1, which implies that Y is a deterministic constant.
Exercise 2.4. A ∈ σ(f (X)) if and only if A = {f (X) ∈ U }, for some U ∈ B(R). The latter is
equivalent to X(ω) ∈ {f ∈ U }, hence A = {X ∈ {f ∈ U }}. Similarly, B = {Y ∈ {g ∈ V }},
for some V ∈ B(R). Hence
P(A ∩ B) = P({X ∈ {f ∈ U }} ∩ {Y ∈ {g ∈ V }}).
As X and Y are independent, the right hand side is equal to P({X ∈ {f ∈ U }})P({Y ∈
{g ∈ V }}), hence P(A ∩ B) = P(A)P(B), as claimed.
Exercise 2.5. Let us assume for simplicity that M = 2, i.e., X = b1 I_{B1} + b2 I_{B2} with B1 ∩ B2 ≠
∅ (the generalization to M > 2 is straightforward). Hence

X(ω) = b1 for ω ∈ B1 \ (B1 ∩ B2 ),   X(ω) = b2 for ω ∈ B2 \ (B1 ∩ B2 ),   X(ω) = b1 + b2 for ω ∈ B1 ∩ B2 .

Assume b1 ≠ b2 . Then upon defining the disjoint sets A1 = B1 \ (B1 ∩ B2 ), A2 = B2 \ (B1 ∩ B2 ),
A3 = B1 ∩ B2 , and the real numbers a1 = b1 , a2 = b2 , a3 = b1 + b2 , we can rewrite X as the
simple random variable

X = a1 I_{A1} + a2 I_{A2} + a3 I_{A3} .

If b1 = b2 ≠ 0 we define A1 = (B1 \ (B1 ∩ B2 )) ∪ (B2 \ (B1 ∩ B2 )), A2 = B1 ∩ B2 , a1 = b1 = b2 ,
a2 = b1 + b2 = 2b1 , and write X in the form

X = a1 I_{A1} + a2 I_{A2} .
Exercise 2.7. Write (−∞, b] as the disjoint union of the sets (−∞, a] and (a, b]. Hence

P(X ≤ b) = P(X ≤ a) + P(a < X ≤ b), i.e., P(a < X ≤ b) = FX (b) − FX (a).
Hence, for y > 0,

fY (y) = d/dy FY (y) = (1/√(2π)) e^{−y/2} d/dy(√y) − (1/√(2π)) e^{−y/2} d/dy(−√y) = e^{−y/2}/(√(2π) √y).

Since Γ(1/2) = √π, this is the claim.
To compute this integral, we first divide the domain of integration in the variable x into x ≤ 0
and x ≥ 0. So doing we have

P(X ≤ Y ) = (1/√(2π)) ∫_{−∞}^{0} dx e^{−x²/2} ∫_{0}^{∞} dy e^{−y} + (1/√(2π)) ∫_{0}^{∞} dx e^{−x²/2} ∫_{x}^{∞} dy e^{−y}.
fS(t) (x) = d/dx FS(t) (x),

provided the distribution function FS(t) (x) = P(S(t) ≤ x) is differentiable. Clearly,
fS(t) (x) = FS(t) (x) = 0, for x < 0. For x > 0 we use that

S(t) ≤ x if and only if W (t) ≤ (1/σ)( log(x/S(0)) − αt ) := A(x).
Thus,

P(S(t) ≤ x) = P(−∞ < W (t) ≤ A(x)) = (1/√(2πt)) ∫_{−∞}^{A(x)} e^{−y²/(2t)} dy,

where for the second equality we used that W (t) ∈ N (0, t). Hence

fS(t) (x) = d/dx ( (1/√(2πt)) ∫_{−∞}^{A(x)} e^{−y²/(2t)} dy ) = (1/√(2πt)) e^{−A(x)²/(2t)} dA(x)/dx.
Since

∫₀^∞ fS(t) (y) dy = 1,

then p0 = P(S(t) = 0) = 0.
Chapter 3
Expectation
Throughout this chapter we assume that (Ω, F, {F(t)}t≥0 , P) is a given filtered probability
space.
Assume first that X is a simple random variable, i.e.,

X = Σ_{k=1}^{N} ak I_{Ak} ,

for some finite partition {Ak }k=1,...,N of Ω and real distinct numbers a1 , . . . , aN . In this case,
it is natural to define the expected value (or expectation) of X as

E[X] = Σ_{k=1}^{N} ak P(Ak ) = Σ_{k=1}^{N} ak P(X = ak ).
That is to say, E[X] is a weighted average of all the possible values attainable by X, in
which each value is weighted by its probability of occurrence. This definition applies also for
N = ∞ (i.e., for discrete random variables) provided of course the infinite series converges.
For instance, if X ∈ P(µ) we have

E[X] = Σ_{k=0}^{∞} k P(X = k) = Σ_{k=0}^{∞} k µ^k e^{−µ}/k! = e^{−µ} Σ_{k=1}^{∞} µ^k/(k − 1)! = e^{−µ} Σ_{r=0}^{∞} µ^{r+1}/r! = e^{−µ} µ Σ_{r=0}^{∞} µ^r/r! = µ.
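The series computation above can be confirmed numerically by truncating the sums (a Python sketch; the value of µ and the truncation point are arbitrary):

```python
import math

# Truncated series sum_k k^j * mu^k e^{-mu} / k! for the moments of X ~ P(mu).
def poisson_moment(mu, j, kmax=100):
    return sum(k**j * mu**k * math.exp(-mu) / math.factorial(k)
               for k in range(kmax))

mu = 3.5
mean = poisson_moment(mu, 1)
second = poisson_moment(mu, 2)

assert abs(mean - mu) < 1e-9                 # E[X] = mu
assert abs(second - mean**2 - mu) < 1e-9     # E[X^2] - E[X]^2 = mu
```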
Now let X be a non-negative random variable and consider the sequence {s_n^X }n∈N of
simple functions defined in Theorem 2.2. Recall that s_n^X converges pointwise to X as n → ∞,
i.e., s_n^X (ω) → X(ω), for all ω ∈ Ω (see Exercise 2.6). Since

E[s_n^X ] = Σ_{k=1}^{n2ⁿ − 1} (k/2ⁿ) P( k/2ⁿ ≤ X < (k + 1)/2ⁿ ) + n P(X ≥ n),   (3.1)
are respectively the positive and negative part of X. Since X± are non-negative random
variables, then their expectation is given as in Definition 3.1.
Definition 3.2. Let X : Ω → R be a random variable and assume that at least one of the
random variables X₊, X₋ has finite expectation. Then we define the expectation of X as

E[X] = E[X₊] − E[X₋].

If X₊, X₋ both have finite expectation, we say that X has finite expectation or that it is an
integrable random variable. The set of all integrable random variables on Ω will be denoted
by L¹(Ω), or by L¹(Ω, P) if we want to specify the probability measure.
Remark 3.3 (Notation). Of course the expectation of a random variable depends on the
probability measure. If another probability measure P̃ is defined on the σ-algebra of events
(not necessarily equivalent to P), we denote the expectation of X in P̃ by Ẽ[X].
Theorem 3.1. Let X, Y : Ω → R be integrable random variables. Then the following holds:

(i) E[αX + βY ] = αE[X] + βE[Y ], for all α, β ∈ R (linearity);

(ii) If X ≤ Y almost surely, then E[X] ≤ E[Y ] (monotonicity);

(iii) If X ≥ 0 almost surely, then E[X] ≥ 0, with E[X] = 0 if and only if X = 0 almost surely;

(iv) If X, Y are independent, then XY is integrable and E[XY ] = E[X]E[Y ].
Sketch of the proof. For all claims, the argument of the proof is divided into three steps. STEP
1: Show that it suffices to prove the claim for non-negative random variables. STEP 2: Prove
the claim for simple functions. STEP 3: Take the limit along the sequences {s_n^X }n∈N , {s_n^Y }n∈N
of simple functions converging to X, Y . Carrying out these three steps for (i), (ii) and (iii)
is simpler, so let us focus on (iv). Let X₊ = f (X), X₋ = g(X), and similarly for Y , where
f (s) = max(0, s), g(s) = − min(0, s). By Exercise 2.4, each of (X₊, Y₊), (X₋, Y₊), (X₊, Y₋)
and (X₋, Y₋) is a pair of independent (non-negative) random variables. Assume that the
claim is true for non-negative random variables. Then, using X = X₊ − X₋, Y = Y₊ − Y₋
and the linearity of the expectation, we find
Hence it suffices to prove the claim for non-negative random variables. Next assume that
X, Y are independent simple functions and write

X = Σ_{j=1}^{N} aj I_{Aj} ,    Y = Σ_{k=1}^{M} bk I_{Bk} .
We have
XY = Σ_{j=1}^{N} Σ_{k=1}^{M} aj bk I_{Aj} I_{Bk} = Σ_{j=1}^{N} Σ_{k=1}^{M} aj bk I_{Aj ∩Bk} .
Thus by linearity of the expectation, and since the events Aj , Bk are independent, for all
j, k, we have
E[XY ] = Σ_{j=1}^{N} Σ_{k=1}^{M} aj bk E[I_{Aj ∩Bk}] = Σ_{j=1}^{N} Σ_{k=1}^{M} aj bk P(Aj ∩ Bk )
= Σ_{j=1}^{N} Σ_{k=1}^{M} aj bk P(Aj )P(Bk ) = Σ_{j=1}^{N} aj P(Aj ) Σ_{k=1}^{M} bk P(Bk ) = E[X]E[Y ].
Finally, if X, Y are independent non-negative random variables, then s_n^X , s_n^Y are independent
simple functions (by Exercise 2.4), hence by the previous step

E[s_n^X s_n^Y ] = E[s_n^X ]E[s_n^Y ].

Letting n → ∞, the right hand side converges to E[X]E[Y ]. To complete the proof we have
to show that the left hand side converges to E[XY ]. This follows by applying the monotone
convergence theorem (see Remark 3.1) to the sequence Zn = s_n^X s_n^Y .
Letting Y = 1 in (3.3), we find E[|X|] ≤ E[X²]^{1/2}, hence

L²(Ω) ⊂ L¹(Ω).

Note also that the identity E[XY ] = E[X]E[Y ] does not imply that X, Y are independent.
For instance, let X take the values −1, 0, 1, each with probability 1/3, and let

Y = X² = 0 with probability 1/3,   1 with probability 2/3.

Then X and Y are clearly not independent, but

E[XY ] = E[X³] = E[X] = 0 = E[X]E[Y ].
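This uncorrelated-but-dependent example can be verified exactly with rational arithmetic (a Python sketch; the law of X, taking the values −1, 0, 1 with probability 1/3 each so that Y = X² has the stated distribution, is the standard choice for this example):

```python
from fractions import Fraction

# X takes -1, 0, 1 with probability 1/3 each, and Y = X^2,
# so Y = 0 with probability 1/3 and Y = 1 with probability 2/3.
third = Fraction(1, 3)
dist = {-1: third, 0: third, 1: third}   # law of X

E_X = sum(x * p for x, p in dist.items())
E_Y = sum(x**2 * p for x, p in dist.items())
E_XY = sum(x * x**2 * p for x, p in dist.items())   # E[X * Y] = E[X^3]

assert E_X == 0
assert E_Y == Fraction(2, 3)
assert E_XY == E_X * E_Y == 0   # uncorrelated, although Y = X^2 depends on X
```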
Using the linearity of the expectation we can rewrite the definition of variance as

Var[X] = E[X²] − E[X]².
Note that a random variable has zero variance if and only if X = E[X] a.s., hence we may
view Var[X] as a measure of the “randomness of X”. As a way of example, let us compute
the variance of X ∈ P(µ). We have
E[X²] = Σ_{k=0}^{∞} k² P(X = k) = Σ_{k=0}^{∞} k² µ^k e^{−µ}/k! = e^{−µ} Σ_{k=1}^{∞} k µ^k/(k − 1)!
= e^{−µ} Σ_{r=0}^{∞} (r + 1) µ^{r+1}/r! = e^{−µ} Σ_{r=0}^{∞} µ^{r+1}/r! + µ Σ_{r=0}^{∞} r P(X = r) = µ + µE[X] = µ + µ².

Hence

Var[X] = E[X²] − E[X]² = µ + µ² − µ² = µ.
Exercise 3.4. Compute the variance of binomial random variables.
Exercise 3.5 (•). Prove the following:
By the previous exercise, Var(X + Y ) = Var(X) + Var(Y ) holds if and only if X, Y are
uncorrelated. Moreover, if we define the correlation of X, Y as

Cor(X, Y ) = Cov(X, Y ) / √(Var(X)Var(Y )),
Exercise 3.6. Let {Mk }k∈N be a symmetric random walk. Show that E[Mk ] = 0, and that
Var[Mk ] = k, for all k ∈ N.
Exercise 3.7. Show that the function ‖ · ‖₂ which maps a random variable Z to ‖Z‖₂ =
√(E[Z²]) is a norm in L²(Ω).
Remark 3.5 (L²-norm). The norm defined in the previous exercise is called the L² norm. It
can be shown that it is a complete norm, i.e., if {Xn }n∈N ⊂ L²(Ω) is a Cauchy sequence of
random variables in the L² norm, then there exists a random variable X ∈ L²(Ω) such that
‖Xn − X‖₂ → 0 as n → ∞.
Exercise 3.8 (•). Let {Wn (t)}t≥0 , n ∈ N, be the sequence of stochastic processes defined
in (2.13). Compute E[Wn (t)], Var[Wn (t)], Cov[Wn (t), Wn (s)]. Show that
Next we want to present a first application in finance of the theory outlined above. In
particular we establish a sufficient condition which ensures that a portfolio is not an arbitrage.
Theorem 3.2. Let a portfolio be given with value {V (t)}t≥0 . Let V ∗ (t) = D(t)V (t) be the
discounted portfolio value. If there exists a measure P̃ equivalent to P such that Ẽ[V ∗ (t)] is
constant (independent of t), then the portfolio is not an arbitrage.
Proof. Assume that the portfolio is an arbitrage. Then V (0) = 0 almost surely; as V ∗ (0) =
48
V (0), the assumption of constant expectation in the probability measure P
e gives
e ∗ (t)] = 0,
E[V for all t ≥ 0. (3.4)
Let T > 0 be such that P(V (T ) ≥ 0) = 1 and P(V (T ) > 0) > 0. Since P and P e are
e (T ) ≥ 0) = 1 and P(V
equivalent, we also have P(V e (T ) > 0) > 0. Since the discounting
e ∗ (T ) ≥ 0) = 1 and P(V
process is positive, we also have P(V e ∗ (T ) > 0) > 0. However this
contradicts (3.4), due to Theorem 3.1(iii). Hence our original hypothesis that the portfolio
is an arbitrage portfolio is false.
Theorem 3.2 will be applied in Chapter 6. To this purpose we shall need the following
characterization of equivalent probability measures.
Theorem 3.3. The following statements are equivalent:

(i) P and P̃ are equivalent probability measures;

(ii) There exists a unique (up to null sets) random variable Z : Ω → R such that Z > 0
almost surely, E[Z] = 1 and P̃(A) = E[ZI_A], for all A ∈ F.

Moreover, assuming any of these two equivalent conditions, for all random variables X such
that XZ ∈ L¹(Ω, P), we have X ∈ L¹(Ω, P̃) and

    Ẽ[X] = E[ZX].    (3.5)
Proof. The implication (i) ⇒ (ii) follows by the Radon–Nikodým theorem, whose proof
can be found for instance in [6]. As to the implication (ii) ⇒ (i), we first observe that
P̃(Ω) = E[ZI_Ω] = E[Z] = 1. Hence, to prove that P̃ is a probability measure, it remains to
show that it satisfies the countable additivity property: for all families {A_k}_{k∈N} of disjoint
events, P̃(∪_k A_k) = Σ_k P̃(A_k). To prove this let

    B_n = ∪_{k=1}^n A_k.

Clearly, ZI_{B_n} is an increasing sequence of random variables. Hence, by the monotone convergence theorem (see Remark 3.1) we have

    lim_{n→∞} E[ZI_{B_n}] = E[ZI_{B_∞}],  B_∞ = ∪_{k=1}^∞ A_k,

i.e.,

    lim_{n→∞} P̃(B_n) = P̃(B_∞).    (3.6)

On the other hand, by linearity of the expectation,

    P̃(B_n) = E[ZI_{B_n}] = E[ZI_{∪_{k=1}^n A_k}] = E[Z(I_{A_1} + ··· + I_{A_n})] = Σ_{k=1}^n E[ZI_{A_k}] = Σ_{k=1}^n P̃(A_k).
This proves that P̃ is a probability measure. To show that P and P̃ are equivalent, let A be
such that P(A) = 0. Since ZI_A ≥ 0 almost surely, the condition P̃(A) = E[ZI_A] = 0 is equivalent, by
Theorem 3.1(iii), to ZI_A = 0 almost surely. Since Z > 0 almost surely, this is equivalent
to I_A = 0 a.s., i.e., P(A) = 0. Thus P̃(A) = 0 if and only if P(A) = 0, i.e., the probability
measures P and P̃ are equivalent. It remains to prove the identity (3.5). If X is the simple
random variable X = Σ_k a_k I_{A_k}, then the proof is straightforward:

    Ẽ[X] = Σ_k a_k P̃(A_k) = Σ_k a_k E[ZI_{A_k}] = E[Z Σ_k a_k I_{A_k}] = E[ZX].

For a general non-negative random variable X the result follows by applying (3.5) to an
increasing sequence of simple random variables converging to X and then passing to the
limit (using the monotone convergence theorem). The result for a general random variable
X : Ω → R follows by applying (3.5) to the positive and negative parts of X and using the
linearity of the expectation.
Remark 3.6 (Radon–Nikodým derivative). Using the Lebesgue integral notation (see Remark 3.4) we can write (3.5) as

    ∫_Ω X(ω) dP̃(ω) = ∫_Ω X(ω) Z(ω) dP(ω).

This leads to the formal identity dP̃(ω) = Z(ω)dP(ω), or Z(ω) = dP̃(ω)/dP(ω), which explains why
Z is also called the Radon–Nikodým derivative of P̃ with respect to P.
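On a finite sample space the identity Ẽ[X] = E[ZX] can be verified exactly; the probabilities and the density Z below are hypothetical values chosen so that Z > 0 and E[Z] = 1.

```python
# Hypothetical finite sample space Omega = {0,1,2,3}.
P = [0.1, 0.2, 0.3, 0.4]          # original measure
Z = [2.0, 1.5, 1.0, 0.5]          # Radon-Nikodym density: Z > 0 and E[Z] = 1
assert abs(sum(p * z for p, z in zip(P, Z)) - 1.0) < 1e-12

# P~({w}) = E[Z I_{w}] = Z(w) P({w})
P_tilde = [p * z for p, z in zip(P, Z)]

X = [1.0, -2.0, 3.0, 0.5]         # an arbitrary random variable
E_tilde_X = sum(pt * x for pt, x in zip(P_tilde, X))   # E~[X]
E_ZX = sum(p * z * x for p, z, x in zip(P, Z, X))      # E[ZX]
```

Since every Z(ω) is strictly positive, P̃ assigns zero probability to exactly the same events as P, which is the equivalence in Theorem 3.3.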
Proof. We prove the theorem under the assumption that g is a simple measurable function;
the proof for general functions g follows by a limit argument similar to the one used in the
proof of Theorem 3.1, see Theorem 1.5.2 in [21] for the details. Hence we assume

    g(x) = Σ_{k=1}^N α_k I_{U_k}(x).

Let Y_k = I_{U_k}(X) : Ω → R. Then Y_k is the simple random variable that takes the value 1 if
ω ∈ A_k and 0 if ω ∈ A_k^c, where A_k = {X ∈ U_k}. Thus the expectation of Y_k is given by
E[Y_k] = P(A_k) and so

    E[g(X)] = Σ_k α_k P(X ∈ U_k) = Σ_k α_k ∫_{U_k} f(x) dx = ∫_R Σ_k α_k I_{U_k}(x) f(x) dx = ∫_R g(x) f(x) dx,
as claimed.
For instance, if X ∈ N(m, σ²), we have

    E[X] = ∫_R x e^{−(x−m)²/(2σ²)} dx/√(2πσ²) = m,

    Var[X] = ∫_R x² e^{−(x−m)²/(2σ²)} dx/√(2πσ²) − m² = σ²,

which explains why we called m the expectation and σ² the variance of the normal random
variable X. Note in particular that, for a Brownian motion {W(t)}_{t≥0}, there holds

    E[W(t) − W(s)] = 0,  Var[W(t) − W(s)] = |t − s|,  for all s, t ≥ 0.    (3.7)
Let us show that1
Cov(W (t), W (s)) = min(s, t). (3.8)
For s = t, the claim is equivalent to Var[W (t)] = t, which holds by definition of Brownian
motion (see (3.7)). For t > s we have
Cov(W (t), W (s)) = E[W (t)W (s)] − E[W (t)]E[W (s)]
= E[W (t)W (s)]
= E[(W (t) − W (s))W (s)] + E[W (s)2 ].
Since W(t) − W(s) and W(s) are independent random variables, then E[(W(t) − W(s))W(s)] =
E[W(t) − W(s)]E[W(s)] = 0, and so

    Cov(W(t), W(s)) = E[W(s)²] = Var[W(s)] = s = min(s, t),  for t > s.

A similar argument applies for t < s.

¹Compare (3.8) with the result of Exercise 3.8.
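The identity (3.8) can also be checked by Monte Carlo simulation, building the pair (W(s), W(t)) from independent Gaussian increments; the times s = 0.5, t = 1 are arbitrary.

```python
import math
import random

random.seed(1)
s, t, n = 0.5, 1.0, 200_000
acc = 0.0
for _ in range(n):
    # W(s) ~ N(0, s); W(t) = W(s) + independent increment ~ N(0, t - s)
    Ws = math.sqrt(s) * random.gauss(0.0, 1.0)
    Wt = Ws + math.sqrt(t - s) * random.gauss(0.0, 1.0)
    acc += Ws * Wt
cov_est = acc / n   # estimates E[W(s)W(t)] = Cov(W(t), W(s)), since the means are zero
```

With 200,000 samples the estimate should lie within a few thousandths of min(s, t) = 0.5.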
Exercise 3.9. The moment of order n of a random variable X is the quantity µ_n = E[X^n],
n = 1, 2, .... Let X ∈ N(0, σ²). Prove that

    µ_n = 0,  if n is odd,
    µ_n = 1 · 3 · 5 ··· (n − 1) σ^n,  if n is even.
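Before attempting the proof, the formula can be sanity-checked by numerical quadrature of the Gaussian integral; grid size and integration range below are ad hoc choices.

```python
import math

def normal_moment(n, sigma, grid=20001, L=12.0):
    """Trapezoidal approximation of E[X^n] for X ~ N(0, sigma^2) over [-L*sigma, L*sigma]."""
    a = -L * sigma
    h = 2.0 * L * sigma / (grid - 1)
    total = 0.0
    for i in range(grid):
        x = a + i * h
        w = 0.5 if i in (0, grid - 1) else 1.0
        total += w * x ** n * math.exp(-x * x / (2.0 * sigma * sigma))
    return total * h / math.sqrt(2.0 * math.pi * sigma * sigma)

def moment_formula(n, sigma):
    """0 for odd n; 1*3*5*...*(n-1) * sigma^n for even n."""
    if n % 2 == 1:
        return 0.0
    prod = 1
    for k in range(1, n, 2):   # 1, 3, ..., n-1
        prod *= k
    return prod * sigma ** n
```

For example, with σ = 1.5 the sixth moment 15σ⁶ is reproduced to high accuracy.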
Exercise 3.10 (•). Compute the expectation and the variance of exponential random vari-
ables.
Exercise 3.11 (•). Let X ∈ E(λ) be an exponential random variable with intensity λ. Given
λ̃ > 0, let

    Z = (λ̃/λ) e^{−(λ̃−λ)X}.

Define P̃(A) = E[ZI_A], A ∈ F. Show that P̃ is a probability measure equivalent to P. Prove
that X ∈ E(λ̃) in the probability measure P̃.
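The claim P̃(X ≤ x) = 1 − e^{−λ̃x} can be checked by numerically integrating E[ZI_{X≤x}] against the E(λ) density; the intensities λ = 1, λ̃ = 2.5 are hypothetical.

```python
import math

lam, lam_t = 1.0, 2.5   # hypothetical intensities: λ and λ~

def P_tilde_leq(x, steps=50_000):
    """Trapezoidal quadrature of E[Z I_{X<=x}] = ∫_0^x (λ~/λ) e^{-(λ~-λ)y} · λ e^{-λy} dy."""
    h = x / steps
    total = 0.0
    for i in range(steps + 1):
        y = i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * (lam_t / lam) * math.exp(-(lam_t - lam) * y) * lam * math.exp(-lam * y)
    return total * h

x0 = 0.7
approx = P_tilde_leq(x0)
exact = 1.0 - math.exp(-lam_t * x0)   # CDF of E(λ~) at x0
```

The integrand collapses to λ̃ e^{−λ̃y}, so the numerical value matches the E(λ̃) distribution function.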
Exercise 3.12. Compute the expectation and the variance of Cauchy distributed random
variables. Compute the expectation and the variance of Lévy distributed random variables.
Exercise 3.13. Compute the expectation and the variance of the geometric Brownian mo-
tion (2.14).
Exercise 3.14. Show that the paths of the Brownian motion have unbounded linear variation.
Namely, given 0 = t_0 < t_1 < ··· < t_n = t with t_k − t_{k−1} = h, for all k = 1, ..., n, show that

    E[ Σ_{k=1}^n |W(t_k) − W(t_{k−1})| ] → ∞,  as n → ∞.

(However, Brownian motions have finite quadratic variation, see Section 3.4.)
A result similar to Theorem 3.4 can be used to compute the correlation between two
random variables that admit a joint density.
Theorem 3.5. Let X, Y : Ω → R be two random variables with joint density f_{X,Y} : R² →
[0, ∞) and let g : R² → R be a measurable function such that g(X, Y) ∈ L¹(Ω). Then

    E[g(X, Y)] = ∫_{R²} g(x, y) f_{X,Y}(x, y) dx dy,

where

    f_X(x) = ∫_R f_{X,Y}(x, y) dy,  f_Y(y) = ∫_R f_{X,Y}(x, y) dx

are the (marginal) densities of X and Y.
Exercise 3.15. Show that if X_1, X_2 : Ω → R are jointly normally distributed with covariance
matrix C = (C_ij)_{i,j=1,2}, then C_ij = Cov(X_i, X_j).
Combining the results of Exercises 2.12 and 3.15, we see that the parameter ρ in Equa-
tion (2.12) is precisely the correlation of the two jointly normally distributed random vari-
ables X, Y . It follows by Remark 2.3 that two jointly normally distributed random variables
are independent if and only if they are uncorrelated. Recall that for general random variables,
independence implies uncorrelatedness, but the converse is in general not true.
θ_X(u) = E[e^{iuX}],
i.e., the characteristic function is the inverse Fourier transform of the density. Table 3.1
contains some examples of characteristic functions.
Note carefully that, while θX is defined for all u ∈ R, the moment-generating function of
a random variable may be defined only in a subset of the real line, or not defined at all (see
Exercise 3.16). For instance, when X ∈ E(λ) we have

    M_X(u) = E[e^{uX}] = λ ∫_0^∞ e^{(u−λ)x} dx = +∞ if u ≥ λ,  (1 − u/λ)^{−1} if u < λ.

Hence M_X(u) is defined (as a finite function) only for u < λ.
Exercise 3.16. Show that Cauchy random variables do not have a well-defined moment-
generating function.
The characteristic function of a random variable provides a lot of information. In partic-
ular, it determines completely the distribution function of the random variable, as shown in
the following theorem (for the proof, see [6, Sec. 9.5]).
Density        Characteristic function
N(m, σ²)       exp(ium − ½σ²u²)
E(λ)           (1 − iu/λ)^{−1}
χ²(δ)          (1 − 2iu)^{−δ/2}
χ²(δ, β)       (1 − 2iu)^{−δ/2} exp(−βu/(2u + i))

Table 3.1: Characteristic functions of some common distributions.
In fact,

    θ_{X_1+···+X_N}(u) = θ_{X_1}(u) ··· θ_{X_N}(u) = e^{ium_1 − ½σ_1²u²} ··· e^{ium_N − ½σ_N²u²} = e^{ium − ½σ²u²}.

The right hand side of the previous equation is the characteristic function of a normal
variable with expectation m and variance σ² given by (3.9). Thus Theorem 3.6 implies that
X_1 + ··· + X_N ∈ N(m, σ²).
Exercise 3.17. Let X_1 ∈ N(m_1, σ_1²), ..., X_N ∈ N(m_N, σ_N²), N ≥ 2, be independent. Show
that Y = Σ_{k=1}^N (X_k/σ_k)² ∈ χ²(N, β), where β = (m_1/σ_1)² + ··· + (m_N/σ_N)² (compare with
Exercise 2.9).
Exercise 3.18. Let X, Y ∈ L¹(Ω) be independent random variables with densities f_X, f_Y.
Show that X + Y has the density

    f_{X+Y}(x) = ∫_R f_X(x − y) f_Y(y) dy.

Remark: The right hand side of the previous identity defines the convolution product of
the functions f_X, f_Y.
The characteristic function is also very useful to establish whether two random variables
are independent, as shown in the following exercise.
Exercise 3.19. Let X, Y ∈ L1 (Ω) and define their joint characteristic function as
    θ_{X,Y}(u, v) = E[e^{iuX + ivY}],  u, v ∈ R.
Show that X, Y are independent if and only if θX,Y (u, v) = θX (u)θY (v).
To measure the amount of oscillations of {X(t)}_{t≥0} in the interval [0, T] along the partition
Π, we compute

    Q_Π(ω) = Σ_{j=0}^{m−1} (X(t_{j+1}, ω) − X(t_j, ω))².
Note carefully that Q_Π is a random variable and that it depends on the partition. For
example, let {∆(s)}_{s≥0} be the step process

    ∆(s, ω) = Σ_{k=0}^∞ X_k(ω) I_{[s_k, s_{k+1})}(s).

Then if the partition Π = {0, t_1, ..., t_m = T} is such that s_{k−1} < t_k < s_k, for all k =
1, 2, ..., m, we have

    Q_Π(ω) = (X_2(ω) − X_1(ω))² + (X_3(ω) − X_2(ω))² + ··· + (X_{m+1}(ω) − X_m(ω))².
However if two points in the partition belong to the same interval [sk , sk+1 ), the variation
within these two instants of time clearly gives no contribution to the total variation QΠ .
To define the quadratic variation of the stochastic process {X(t)}_{t≥0}, we compute Q_{Π_n}
along a sequence {Π_n}_{n∈N} of partitions of the interval [0, T] such that ‖Π_n‖ → 0 as n → ∞
and then we take the limit of Q_{Π_n} as n → ∞. Since {Q_{Π_n}}_{n∈N} is a sequence of random
variables, there are several ways to define its limit as n → ∞. The precise definition that we
adopt is that of L²-quadratic variation, in which the limit is taken in the norm ‖·‖₂ defined
in Exercise 3.7.
Definition 3.5. The L²-quadratic variation of the stochastic process {X(t)}_{t≥0} in the
interval [0, T] along the sequence of partitions {Π_n}_{n∈N} is a random variable denoted by
[X, X](T) such that

    lim_{n→∞} E[ ( Σ_{j=0}^{m(n)−1} (X(t^{(n)}_{j+1}) − X(t^{(n)}_j))² − [X, X](T) )² ] = 0,
Theorem 3.8. Assume that the paths of the stochastic process {X(t)}_{t≥0} satisfy

    P(|X(t) − X(s)| ≤ C|t − s|^γ) = 1,    (3.10)

for some positive constant C > 0 and γ > 1/2. Then

    dX(t)dX(t) = 0.

Proof. We have

    E[ ( Σ_{j=0}^{m(n)−1} (X(t^{(n)}_{j+1}) − X(t^{(n)}_j))² )² ] ≤ C⁴ E[ ( Σ_{j=0}^{m(n)−1} (t^{(n)}_{j+1} − t^{(n)}_j)^{2γ} )² ]
        = C⁴ E[ ( Σ_{j=0}^{m(n)−1} (t^{(n)}_{j+1} − t^{(n)}_j)^{2γ−1} (t^{(n)}_{j+1} − t^{(n)}_j) )² ] ≤ C⁴ ‖Π_n‖^{2(2γ−1)} T² → 0, as n → ∞,

since γ > 1/2, where we recall that m(n) + 1 is the number of points in the partition Π_n of [0, T]. We
compute

    E[(Q_{Π_n} − T)²] = E[Q²_{Π_n}] + T² − 2T E[Q_{Π_n}].

But

    E[Q_{Π_n}] = Σ_{j=0}^{m(n)−1} E[(W(t^{(n)}_{j+1}) − W(t^{(n)}_j))²] = Σ_{j=0}^{m(n)−1} Var[W(t^{(n)}_{j+1}) − W(t^{(n)}_j)]
               = Σ_{j=0}^{m(n)−1} (t^{(n)}_{j+1} − t^{(n)}_j) = T.
Hence we have to prove that

    lim_{‖Π_n‖→0} E[Q²_{Π_n}] − T² = 0.

We conclude that

    Var[Q_{Π_n}] = 2 Σ_{j=0}^{m(n)−1} (t^{(n)}_{j+1} − t^{(n)}_j)² ≤ 2‖Π_n‖T → 0,  as ‖Π_n‖ → 0,
As for the quadratic variation of a stochastic process, we use a special notation to ex-
press that the cross variation of two stochastic processes is independent of the sequence of
partitions along which it is computed. Namely, we write
dX1 (t)dX2 (t) = dY (t),
to indicate that the cross variation [X1 , X2 ](t) equals Y (t) along any sequence of partitions
{Πn }n∈N . The following generalization of Theorem 3.8 is easily established.
Theorem 3.10. Assume that the paths of the stochastic processes {X_1(t)}_{t≥0}, {X_2(t)}_{t≥0}
satisfy

    P(|X_1(t) − X_1(s)| ≤ C|t − s|^γ) = 1,  P(|X_2(t) − X_2(s)| ≤ C|t − s|^λ) = 1,

for some positive constants C, γ, λ such that γ + λ > 1. Then dX_1(t)dX_2(t) = 0.
Exercise 3.20. Prove the theorem.
As a special case we find that
dW (t)dt = 0. (3.14)
It is important to memorize the identities (3.11), (3.12) and (3.14), as they will be used
several times in the following chapters.
Exercise 3.21 (?). Let {W1 (t)}t≥0 , {W2 (t)}t≥0 be two independent Brownian motions. Show
that dW1 (t)dW2 (t) = 0.
where a_1, ..., a_N ∈ R and {A_k}_{k=1,...,N} is a family of disjoint subsets of Ω. Let B ∈ F
with P(B) > 0 and let

    P_B(A_k) = P(A_k ∩ B)/P(B)

be the conditional probability of A_k given B, see
Definition 1.4. It is natural to define the conditional expectation of X given the event B as

    E[X|B] = Σ_{k=1}^N a_k P_B(A_k).

Moreover, since

    XI_B = Σ_{k=1}^N a_k I_{A_k} I_B = Σ_{k=1}^N a_k I_{A_k ∩ B},

we also have the identity E[X|B] = E[XI_B]/P(B). We use the latter identity to define the conditional
expectation given B of general random variables.

Definition 3.7. Let X ∈ L¹(Ω) and B ∈ F. When P(B) > 0 we define the conditional
expectation of X given the event B as

    E[X|B] = E[XI_B]/P(B).
When P(B) = 0 we define E[X|B] = E[X].
Note that E[X|B] is a deterministic constant. Next we discuss the concept of conditional
expectation given a σ-algebra G. We first assume that G is generated by a (say, finite)
partition {A_k}_{k=1,...,M} of Ω, see Exercise 1.4. Then it is natural to define

    E[X|G] = Σ_{k=1}^M E[X|A_k] I_{A_k}.    (3.15)

Note that E[X|G] is a G-measurable simple function. It will now be shown that (3.15)
satisfies the identity

    E[E[X|G]|B] = E[X|B],  for all B ∈ G : P(B) > 0.    (3.16)
In fact,

    P(B) E[E[X|G]|B] = E[E[X|G] I_B]
        = E[ Σ_{k=1}^M E[X|A_k] I_{A_k} I_B ]
        = Σ_{k=1}^M E[ E[X|A_k] I_{A_k ∩ B} ]
        = Σ_{k=1}^M E[ (E[XI_{A_k}]/P(A_k)) I_{A_k ∩ B} ]
        = Σ_{k=1}^M (1/P(A_k)) E[XI_{A_k}] E[I_{A_k ∩ B}].
Since {A_1, ..., A_M} is a partition of Ω and B ∈ G, there exists I ⊂ {1, ..., M} such that B = ∪_{k∈I} A_k;
hence the above sum may be restricted to k ∈ I. Since E[I_{A_k ∩ B}] = E[I_{A_k}] = P(A_k), for k ∈ I,
we obtain

    P(B) E[E[X|G]|B] = Σ_{k∈I} E[XI_{A_k}] = E[XI_{∪_{k∈I} A_k}] = E[XI_B],
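The finite-partition formula (3.15) and the identity just derived can be checked exactly on a small discrete example; the sample space, probabilities and values of X below are hypothetical.

```python
# Hypothetical finite example: Omega = {0,...,5}, uniform P, and the
# sigma-algebra G generated by the partition A1 = {0,1,2}, A2 = {3,4,5}.
P = [1.0 / 6.0] * 6
X = [1.0, 2.0, 6.0, 0.0, 3.0, 9.0]
partition = [[0, 1, 2], [3, 4, 5]]

def expect_on(A):
    """E[X I_A] for a set A of sample points."""
    return sum(P[w] * X[w] for w in A)

# formula (3.15): E[X|G](w) = E[X I_{A_k}]/P(A_k) on the cell A_k containing w
condexp = [0.0] * 6
for A in partition:
    val = expect_on(A) / sum(P[w] for w in A)
    for w in A:
        condexp[w] = val

# identity (3.16): P(B) E[E[X|G] | B] = E[X I_B] for B = A_1 in G
B = partition[0]
lhs = sum(P[w] * condexp[w] for w in B)   # E[E[X|G] I_B]
rhs = expect_on(B)                        # E[X I_B]
```

Here E[X|G] takes the value 3 on A_1 and 4 on A_2, and the two sides of the identity agree exactly.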
then Y1 = Y2 a. s.
Proof. We want to prove that P(B) = 0, where
B = {ω ∈ Ω : Y1 (ω) 6= Y2 (ω)}.
Let B+ = {Y1 > Y2 } and assume P(B+ ) > 0. Then, by (3.17) and Definition 3.7,
E[(Y1 − Y2 )IB+ ] = E[Y1 IB+ ] − E[Y2 IB+ ] = P(B+ )(E[Y1 |B+ ] − E[Y2 |B+ ]) = 0.
By Theorem 3.1(iii), this is possible if and only if (Y1 − Y2 )IB+ = 0 a.s., which entails
P(B+) = 0. In the same fashion one proves that P(B−) = 0, where B− = {Y_1 < Y_2}.
Hence P(B) = P(B+ ) + P(B− ) = 0, as claimed.
Theorem 3.12 (and Definition). Let X ∈ L1 (Ω) and G be a sub-σ-algebra of F. There
exists a G-measurable random variable E[X|G] ∈ L1 (Ω) such that
The random variable E[X|G], which by Theorem 3.11 is uniquely defined up to a null set, is
called the conditional expectation of X given the σ-algebra G. If G is the σ-algebra
generated by a random variable Y , i.e., G = σ(Y ), we write E[X|G] = E[X|Y ].
Proof. See [21, Appendix B].
We continue this section with a list of properties of the conditional expectation, which
we divide in three theorems.
Theorem 3.13. The conditional expectation of X ∈ L1 (Ω) satisfies the following identities
almost surely:
(i) E[E[X|G]] = E[X];
(iii) Tower property: if H is a sub-σ-algebra of G, then E[E[X|G]|H] = E[X|H];
(iv) Linearity: E[αX + βY |G] = αE[X|G] + βE[Y |G], for all α, β ∈ R and Y ∈ L1 (Ω).
E[E[E[X|G]|H]|A] = E[E[X|G]|A],
E[E[X|G]|A] = E[X|A],
E[E[X|H]|A] = E[X|A],
E[E[E[X|G]|H]|A] = E[E[X|H]|A]
and thus by uniqueness the claim follows. (iv) The variables Y1 = E[αX + βY |G] and
Y2 = αE[X|G] + βE[Y |G] satisfy (3.17), and so they are equal almost surely. (v) See next
exercise.
Exercise 3.23. Prove the property (v) in Theorem 3.13.
The following theorem collects some properties of the conditional expectation in the
presence of two variables and is given without proof.
Theorem 3.14. Let X, Y ∈ L¹(Ω). Then the following identities hold almost surely:
(i) If X is independent of G, then E[X|G] = E[X];
Exercise 3.25. Prove the property (iii) in Theorem 3.14.
Exercise 3.26 (•). The purpose of this exercise is to show that the conditional expectation
is the best estimator of a random variable when some information is given in the form of a
sub-σ-algebra. Let X ∈ L1 (Ω) and G ⊆ F be a sub-σ-algebra. Define Err = X − E[X|G].
Show that E[Err] = 0 and

    Var[Err] = min_Y Var[Y − X],

where the minimum is taken with respect to all G-measurable random variables Y.
X = E[X|Y ] + (X − E[X|Y ]) = X1 + X2 .
Show that X_2 and Y are uncorrelated. Hence any random variable X can be written as the
sum of a Y-measurable random variable and a remainder which is uncorrelated with Y.
f (x) = E[g(x, Y )]
The previous theorem tells us that, under the stated assumptions, we can compute the
random variable E[g(X, Y )|G] as if X were a constant.
3.6 Martingales
A martingale is a stochastic process which has no tendency to rise or fall. The precise
definition is the following:
Definition 3.8. A stochastic process {M(t)}_{t≥0} is called a martingale relative to the filtration {F(t)}_{t≥0} if it is adapted to {F(t)}_{t≥0}, M(t) ∈ L¹(Ω) for all t ≥ 0, and

    E[M(t)|F(s)] = M(s),  for all 0 ≤ s ≤ t.    (3.19)
Remark 3.9. If the condition (3.19) is replaced by E[M(t)|F(s)] ≥ M(s), for all 0 ≤
s ≤ t, the stochastic process {M(t)}_{t≥0} is called a sub-martingale. The interpretation
is that M(t) has no tendency to fall, but our expectation is that it will increase. If the
condition (3.19) is replaced by E[M(t)|F(s)] ≤ M(s), for all 0 ≤ s ≤ t, the stochastic
process {M(t)}_{t≥0} is called a super-martingale. The interpretation is that M(t) has no
tendency to rise, but our expectation is that it will decrease.
Remark 3.10. If we want to emphasize that the martingale property is satisfied with respect
to the probability measure P, we shall say that {M (t)}t≥0 is a P-martingale.
Since the conditional expectation of a random variable X is uniquely determined by (3.18),
then the property (3.19) is satisfied if and only if
E[M (s)IA ] = E[M (t)IA ], for all 0 ≤ s ≤ t and for all A ∈ F(s). (3.20)
Combining the latter result with Theorem 3.2, we obtain the following sufficient condition
for no arbitrage.
Theorem 3.16. Let a portfolio be given with value {V(t)}_{t≥0}. If there exists a measure
P̃ equivalent to P and a filtration {F(t)}_{t≥0} such that the discounted value of the portfolio
{V*(t)}_{t≥0} is a P̃-martingale, then the portfolio is not an arbitrage.

Proof. The assumption is that

    Ẽ[D(t)V(t)|F(s)] = D(s)V(s),  for all 0 ≤ s ≤ t.
Theorem 3.17. Let {F(t)}t≥0 be a non-anticipating filtration for the Brownian motion
{W (t)}t≥0 . Then {W (t)}t≥0 is a martingale relative to {F(t)}t≥0 .
Proof. The martingale property for s = t, i.e., E[W(t)|F(t)] = W(t), follows from the fact
that W(t) is F(t)-measurable, so that Theorem 3.13(ii) applies. For 0 ≤ s < t we have
where we used that W(t) − W(s) is independent of F(s) (and so E[W(t) − W(s)|F(s)] =
E[W(t) − W(s)] by Theorem 3.14(i)), and the fact that W(s) is F(s)-measurable (and so
E[W(s)|F(s)] = W(s)).
Exercise 3.28. In the ∞-coin tosses experiment, let FN be the σ-algebra of the events
resolved by the first N tosses. Show that the random walk is a martingale with respect to the
filtration {F_N}_{N∈N}.
Thus Brownian motions are martingales; they have a.s. continuous paths and have
quadratic variation t in the interval [0, t], see Theorem 3.9. The following theorem, which
is a special case of the so-called Lévy characterization of Brownian motion, shows that
these three properties characterize Brownian motions; it is often used to prove that a given
stochastic process is a Brownian motion. The proof can be found in [14].
Theorem 3.18. Let {M (t)}t≥0 be a martingale relative to a filtration {F(t)}t≥0 . Assume
that (i) M (0) = 0 a.s., (ii) the paths t → M (t, ω) are a.s. continuous and (iii) dM (t)dM (t) =
dt. Then {M (t)}t≥0 is a Brownian motion and {F(t)}t≥0 a non-anticipating filtration thereof.
Exercise 3.29. Consider the stochastic process {Z(t)}_{t≥0} given by

    Z(t) = exp(σW(t) − ½σ²t),
where {W (t)}t≥0 is a Brownian motion and σ ∈ R is a constant. Let {F(t)}t≥0 be a
non-anticipating filtration for {W (t)}t≥0 . Show that {Z(t)}t≥0 is a martingale relative to
{F(t)}t≥0 .
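A consequence of the martingale property asked for in Exercise 3.29 is that E[Z(t)] = E[Z(0)] = 1 for every fixed t. Since W(t) = √t·Z with Z ∈ N(0, 1), this can be checked by quadrature; the parameter values below are arbitrary.

```python
import math

def E_Z(sigma, t, grid=40001, L=14.0):
    """Trapezoidal quadrature of E[exp(sigma*W(t) - sigma^2*t/2)], W(t) = sqrt(t)*Z, Z ~ N(0,1)."""
    h = 2.0 * L / (grid - 1)
    total = 0.0
    for i in range(grid):
        z = -L + i * h
        w = 0.5 if i in (0, grid - 1) else 1.0
        val = math.exp(sigma * math.sqrt(t) * z - 0.5 * sigma * sigma * t)
        total += w * val * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return total * h

ez = E_Z(0.8, 2.0)   # should equal 1 for any sigma and t
```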
Exercise 3.30 (•). Let {N (t)}t≥0 be a Poisson process generating the filtration {FN (t)}t≥0 .
Show that (i) {N(t)}_{t≥0} is a sub-martingale relative to {F_N(t)}_{t≥0} and (ii) the so-called
compensated Poisson process {N(t) − λt}_{t≥0} is a martingale relative to {F_N(t)}_{t≥0}, where
λ is the rate of the Poisson process (see Definition 2.16).
Exercise 3.31 (•). Let {F(t)}t∈[0,T ] be a filtration and {M (t)}t∈[0,T ] a stochastic process
adapted to {F(t)}_{t∈[0,T]}. Show that {M(t)}_{t∈[0,T]} is a martingale if and only if there exists an
F(T)-measurable random variable H ∈ L¹(Ω) such that
M (t) = E[H|F(t)].
Now assume that {Z(t)}_{t≥0} is a martingale such that Z(t) > 0 a.s. for all t ≥ 0 and
Z(0) = 1. In particular, E[Z(t)] = E[Z(0)] = 1 and therefore, by Theorem 3.3, the map
P̃ : F → [0, 1] given by

    P̃(A) = E[Z(T)I_A],  A ∈ F,    (3.22)

is a probability measure equivalent to P, for all T > 0. Note that P̃ depends on T > 0 and
P̃ = P for T = 0. The dependence on T is however not reflected in our notation. As usual,
the (conditional) expectation in the probability measure P̃ will be denoted by Ẽ. The relation
between E and Ẽ is revealed in the following theorem.
Theorem 3.19. Let t ∈ [0, T] and let X be an F_W(t)-measurable random variable such that
Z(t)X ∈ L¹(Ω, P). Then X ∈ L¹(Ω, P̃) and

    Ẽ[X] = E[Z(t)X].    (3.23)

Moreover, for all 0 ≤ s ≤ t and for all random variables Y such that Z(t)Y ∈ L¹(Ω, P),
there holds

    Ẽ[Y|F_W(s)] = (1/Z(s)) E[Z(t)Y|F_W(s)]  (almost surely).    (3.24)
Proof. As shown in Theorem 3.3, Ẽ[X] = E[Z(T)X]. By Theorem 3.13(i), Theorem 3.14(ii),
and the martingale property of {Z(t)}_{t≥0}, we have
To prove (3.24), recall that the conditional expectation is uniquely defined (up to null sets)
by (3.18). Hence the identity (3.24) follows if we show that

    Ẽ[ Z(s)^{−1} E[Z(t)Y|F_W(s)] I_A ] = Ẽ[Y I_A],

for all A ∈ F_W(s). Since I_A is F_W(s)-measurable, and using (3.23) with X = Z(s)^{−1}E[Z(t)Y I_A|F_W(s)]
and t = s, we have

    Ẽ[ Z(s)^{−1} E[Z(t)Y|F_W(s)] I_A ] = Ẽ[ Z(s)^{−1} E[Z(t)Y I_A|F_W(s)] ] = E[ E[Z(t)Y I_A|F_W(s)] ]
        = E[Z(t)Y I_A] = Ẽ[Y I_A],

where in the last step we used again (3.23). The proof is complete.
Definition 3.9. A stochastic process {X(t)}t≥0 is called a Markov process with respect
to the filtration {F(t)}t≥0 if it is adapted to {F(t)}t≥0 and if for every measurable function
g : R → R such that g(X(t)) ∈ L1 (Ω) for all t ≥ 0, there exists a measurable function
fg : [0, ∞) × [0, ∞) × R → R such that
If there exists a measurable function f̃_g : [0, ∞) × R → R such that f_g(t, s, x) = f̃_g(t − s, x),
the Markov process is said to be homogeneous. If there exists a measurable function p :
[0, ∞) × [0, ∞) × R × R → R such that

    f_g(t, s, x) = ∫_R g(y) p(t, s, x, y) dy,  for 0 ≤ s < t,    (3.26)

then p is called the transition density of the process.
Thus for a Markov process, the conditional expectation of g(X(t)) at the future time t
depends only on the random variable X(s) at time s, and not on the behavior of the process
before or after time s. Note that in the case of a homogeneous Markov process, the transition
density, if it exists, has the form p(t, s, x, y) = p∗ (t − s, x, y), for some measurable function
p∗ : [0, ∞) × R → R.
Remark 3.11. We will say that a stochastic process is a P-Markov process if we want to
emphasize that the Markov property holds in the probability measure P.
Exercise 3.32 (•). Show that the function f_g(t, s, x) in the right hand side of (3.25) is given
by

    f_g(t, s, x) = E[g(X(t))|X(s) = x],  for all 0 ≤ s ≤ t.    (3.27)
Theorem 3.20. Let {X(t)}t≥0 be a Markov process with transition density p(t, s, x, y) rela-
tive to the filtration {F(t)}t≥0 . Assume X(s) = x ∈ R is a deterministic constant. Then for
all t ≥ s, X(t) admits the density fX(t) given by
where g̃(x, y) = g(x + y). Since W(t) − W(s) is independent of F(s) and W(s) is F(s)-measurable, we can apply Theorem 3.15. Precisely, letting
we have

    E[g(W(t))|F(s)] = f_g(t, s, W(s)),

which proves that the Brownian motion is a Markov process relative to {F(t)}_{t≥0}. To derive
the transition density we use that Y = W(t) − W(s) ∈ N(0, t − s), so that

    E[g(x + Y)] = (1/√(2π(t − s))) ∫_R g(x + y) e^{−y²/(2(t−s))} dy = (1/√(2π(t − s))) ∫_R g(y) e^{−(y−x)²/(2(t−s))} dy,

hence

    E[g(W(t))|F(s)] = [ ∫_R g(y) p∗(t − s, x, y) dy ]_{x=W(s)},
is a homogeneous Markov process in the filtration {F(t)}_{t≥0} with transition density p(t, s, x, y) =
p∗(t − s, x, y), where

    p∗(τ, x, y) = (1/(σy√(2πτ))) exp( −(log(y/x) − ατ)²/(2σ²τ) ) I_{y>0}.    (3.31)

Show also that, when p is given by (3.31), the function v : (s, ∞) × (0, ∞) → R given by

    v(t, x) = ∫_R g(y) p∗(t − s, x, y) dy    (3.32)

satisfies

    ∂_t v − (α + σ²/2) x ∂_x v − ½ σ² x² ∂²_x v = 0,  for x > 0, t > s,    (3.33a)
    v(s, x) = g(x),  for x > 0.    (3.33b)
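Two basic properties of the density (3.31) can be checked numerically: it integrates to 1, and it gives the geometric Brownian motion mean x·exp((α + σ²/2)τ). The parameter values below are hypothetical.

```python
import math

def p_star(tau, x, y, alpha, sigma):
    """Transition density (3.31) of geometric Brownian motion (zero for y <= 0)."""
    if y <= 0.0:
        return 0.0
    z = (math.log(y / x) - alpha * tau) ** 2
    return math.exp(-z / (2.0 * sigma * sigma * tau)) / (sigma * y * math.sqrt(2.0 * math.pi * tau))

def integrate(g, tau, x, alpha, sigma, steps=100_000, ymax=50.0):
    """Trapezoidal quadrature of the integral of g(y) p*(tau, x, y) over (0, ymax]."""
    h = ymax / steps
    total = 0.0
    for i in range(1, steps + 1):    # p*(.,.,0) = 0, so start at y = h
        y = i * h
        w = 0.5 if i == steps else 1.0
        total += w * g(y) * p_star(tau, x, y, alpha, sigma)
    return total * h

tau, x, alpha, sigma = 0.5, 1.0, 0.1, 0.3   # hypothetical parameters
mass = integrate(lambda y: 1.0, tau, x, alpha, sigma)   # total mass, should be 1
mean = integrate(lambda y: y, tau, x, alpha, sigma)     # should be x*exp((alpha + sigma^2/2)*tau)
```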
The correspondence between Markov processes and PDEs alluded to in the last two
exercises is a general property which will be further discussed later in the notes.
3.A Appendix: Solutions to selected problems
Exercise 3.1. Let X be a binomial random variable. Then

    E[X] = Σ_{k=1}^N k P(X = k) = Σ_{k=1}^N k \binom{N}{k} p^k (1 − p)^{N−k} = (1 − p)^{N−1} p Σ_{k=0}^N k \binom{N}{k} (p/(1 − p))^{k−1}.

Since Σ_{k=0}^N k \binom{N}{k} x^{k−1} = N(1 + x)^{N−1}, taking x = p/(1 − p) gives E[X] = Np.
Exercise 3.3. If Y = 0 almost surely, the claim is obvious. Hence we may assume that
E[Y²] > 0. Let

    Z = X − (E[XY]/E[Y²]) Y.

Then

    0 ≤ E[Z²] = E[X²] − E[XY]²/E[Y²],

by which (3.3) follows.
Exercise 3.5. The first and second properties follow by the linearity of the expectation. In
fact
Var[αX] = E[α2 X 2 ] − E[αX]2 = α2 E[X 2 ] − α2 E[X]2 = α2 Var[X],
and
Since the variance of a random variable is always non-negative, the parabola y(a) = a²Var[X] +
Var[Y] − 2aCov(X, Y) must always lie above the a-axis, or touch it at a single point a = a₀.
Hence

    Cov(X, Y)² − Var[X]Var[Y] ≤ 0,

which proves the first part of claim 3. Moreover Cov(X, Y)² = Var[X]Var[Y] if and only
if there exists a₀ such that Var[−a₀X + Y] = 0, i.e., Y = a₀X + b₀ almost surely, for some
constant b₀. Substituting in the definition of covariance, we see that Cov(X, a₀X + b₀) =
a₀Var[X], by which the second claim of property 3 follows immediately.
To show that X ∈ E(λ̃) in the probability measure P̃ we compute

    P̃(X ≤ x) = E[ZI_{X≤x}] = E[(λ̃/λ) e^{−(λ̃−λ)X} I_{X≤x}] = (λ̃/λ) ∫_0^x e^{−(λ̃−λ)y} λ e^{−λy} dy = 1 − e^{−λ̃x}.
Exercise 3.24. Let G be generated by the partition {A_k}_{k=1,...,M} of Ω. Since X is independent of G, then X and I_{A_k} are independent random variables, for all k = 1, ..., M. It follows
that

    E[X|G] = Σ_{k=1}^M (E[XI_{A_k}]/P(A_k)) I_{A_k} = Σ_{k=1}^M (E[X]E[I_{A_k}]/P(A_k)) I_{A_k} = Σ_{k=1}^M (E[X]P(A_k)/P(A_k)) I_{A_k} = E[X] Σ_{k=1}^M I_{A_k} = E[X].
Exercise 3.30. First we observe that claim (i) follows from claim (ii). In fact, if the compensated
Poisson process is a martingale, then

    E[N(t) − λt|F_N(s)] = N(s) − λs,  for all 0 ≤ s ≤ t,

by which it follows that

    E[N(t)|F_N(s)] = N(s) + λ(t − s) ≥ N(s),  for all 0 ≤ s ≤ t.
Hence it remains to prove (ii). We have
E[N (t) − λt|FN (s)] = E[N (t) − N (s) + N (s) − λt|FN (s)]
= E[N (t) − N (s)|FN (s)] + E[N (s)|FN (s)] − λt
= E[N (t) − N (s)] + N (s) − λt = λ(t − s) + N (s) − λt = N (s) − λs.
Exercise 3.31. If {M(t)}_{t∈[0,T]} is a martingale, then E[M(T)|F(t)] = M(t), hence we can pick
H = M(T). Vice versa, by the iterative property of the conditional expectation, the process
E[H|F(t)] satisfies, for all 0 ≤ s ≤ t,

    E[E[H|F(t)]|F(s)] = E[H|F(s)],

hence it is a martingale.
Exercise 3.32. Taking the conditional expectation of both sides of (3.25) with respect to
the event {X(s) = x} gives (3.27).
Chapter 4
Stochastic calculus
Throughout this chapter we assume that the probability space (Ω, F, P) and the Brownian
motion {W (t)}t≥0 are given. Moreover we denote by {F(t)}t≥0 a non-anticipating filtration
for the Brownian motion, e.g., F(t) = FW (t) (see Definition 2.15).
4.1 Introduction
So far we have studied in detail only one example of stochastic process, namely the Brownian
motion {W (t)}t≥0 . In this chapter we define several other processes which are naturally
derived from {W (t)}t≥0 and which in particular are adapted to {F(t)}t≥0 . To begin with, if
f : [0, ∞) × R → R is a measurable function, then we can introduce the stochastic processes
    {f(t, W(t))}_{t≥0},  {∫_0^t f(s, W(s)) ds}_{t≥0}.
Note that the integral in the second stochastic process is the standard Lebesgue integral on
the s-variable. It is well-defined for instance when f is a continuous function.
The next class of stochastic processes that we want to consider are those obtained by
integrating along the paths of a Brownian motion, i.e., we want to give sense to the integral
    I(t) = ∫_0^t X(s) dW(s),    (4.1)
where {X(t)}_{t≥0} is a stochastic process adapted to {F(t)}_{t≥0}. For our purposes we need to
give a meaning to I(t) when {X(t)}_{t≥0} has continuous paths a.s. (e.g., X(t) = W(t)). The
problem now is that the integral ∫ X(t)dg(t) is well-defined for continuous functions X (in the
Riemann–Stieltjes sense) only when g is of bounded variation. As shown in Exercise 3.14, the
paths of the Brownian motion are not of bounded variation, hence we have to find another
way to define (4.1). We begin in the next section by assuming that {X(t)}_{t≥0} is a step
process. Then we shall extend the definition to stochastic processes {X(t)}t≥0 such that
    {X(t)}_{t≥0} is {F(t)}_{t≥0}-adapted and E[∫_0^T X(t)² dt] < ∞, for all T > 0.    (4.2)
We denote by L2 the family of stochastic processes satisfying (4.2). The integral (4.1) can
be defined for more general processes than those in the class L2 , as we briefly discuss in
Theorem 4.4.
integrate ∆(t) along a stochastic process {Y(t)}_{t≥0} with differentiable paths, we would have,
assuming t ∈ (t_k, t_{k+1}),

    ∫_0^t ∆(s) dY(s) = ∫_0^t Σ_{j=0}^∞ ∆(t_j) I_{[t_j, t_{j+1})}(s) dY(s) = Σ_{j=0}^{k−1} ∆(t_j) ∫_{t_j}^{t_{j+1}} dY(s) + ∆(t_k) ∫_{t_k}^t dY(s)
        = Σ_{j=0}^{k−1} ∆(t_j)(Y(t_{j+1}) − Y(t_j)) + ∆(t_k)(Y(t) − Y(t_k)).

The second line makes sense also for stochastic processes {Y(t)}_{t≥0} whose paths are nowhere
differentiable, and thus in particular for the Brownian motion. We then introduce the
following definition.
Definition 4.1. The Itô integral over the interval [0, t] of a step process {∆(t)}_{t≥0} ∈ L²
is given by

    I(t) = ∫_0^t ∆(s) dW(s) = Σ_{j=0}^{k−1} ∆(t_j)(W(t_{j+1}) − W(t_j)) + ∆(t_k)(W(t) − W(t_k)),
(ii) Martingale property: the stochastic process {I(t)}t≥0 is a martingale in the filtration
{F(t)}t≥0 . In particular, E[I(t)] = E[I(0)] = E[0] = 0.
(iii) Quadratic variation: the quadratic variation of the stochastic process {I(t)}_{t≥0} on
the interval [0, T] is independent of the sequence of partitions along which it is computed
and it is given by

    [I, I](T) = ∫_0^T ∆²(s) ds.

(iv) Itô's isometry: E[I²(t)] = E[∫_0^t ∆²(s) ds], for all t ≥ 0.
Proof. The proof of (i) is straightforward. For the remaining claims, see the following theorems in [21]: Theorem 4.2.1 (martingale property), Theorem 4.2.2 (Itô's isometry), Theorem
4.2.3 (quadratic variation). Here we present the proof of (ii). First we remark that the condition I(t) ∈ L¹(Ω), for all t ≥ 0, follows easily from the assumption that ∆(t_j) = X_j ∈ L²(Ω),
for all j ∈ N, and the Schwarz inequality. Hence we have to prove that

    E[I(t)|F(s)] = I(s),  for all 0 ≤ s ≤ t.

There are two possibilities: (1) either s, t ∈ [t_k, t_{k+1}), for some k ∈ N, or (2) there exists
l < k such that s ∈ [t_l, t_{l+1}) and t ∈ [t_k, t_{k+1}). We assume that (2) holds, the proof in the
case (1) being simpler. We write
    I(t) = Σ_{j=0}^{l−1} ∆(t_j)(W(t_{j+1}) − W(t_j)) + ∆(t_l)(W(t_{l+1}) − W(t_l))
           + Σ_{j=l+1}^{k−1} ∆(t_j)(W(t_{j+1}) − W(t_j)) + ∆(t_k)(W(t) − W(t_k))
         = I(t_{l+1}) + ∫_{t_{l+1}}^t ∆(u) dW(u).
As t_l ≤ s, all random variables in the first sum on the right hand side of the latter identity are
F(s)-measurable. Hence, by Theorem 3.13(ii),
\[
E\Big[\sum_{j=0}^{l-1}\Delta(t_j)\big(W(t_{j+1})-W(t_j)\big)\,\Big|\,\mathcal{F}(s)\Big] = \sum_{j=0}^{l-1}\Delta(t_j)\big(W(t_{j+1})-W(t_j)\big).
\]
Similarly,
E[∆(tl )(W (tl+1 ) − W (tl ))|F(s)] = E[∆(tl )W (tl+1 )|F(s)] − E[∆(tl )W (tl )|F(s)]
= ∆(tl )E[W (tl+1 )|F(s)] − ∆(tl )W (tl )
= ∆(tl )W (s) − ∆(tl )W (tl ),
where for the last equality we used that {W (t)}t≥0 is a martingale in the filtration {F(t)}t≥0 . Hence
\[
E[I(t_{l+1})\,|\,\mathcal{F}(s)] = \sum_{j=0}^{l-1}\Delta(t_j)\big(W(t_{j+1})-W(t_j)\big) + \Delta(t_l)\big(W(s)-W(t_l)\big) = I(s).
\]
A similar computation shows that $E\big[\int_{t_{l+1}}^{t}\Delta(u)\,dW(u)\,\big|\,\mathcal{F}(s)\big] = 0$, hence E[I(t)|F(s)] = E[I(t_{l+1})|F(s)] = I(s).
Next we show that any stochastic process can be approximated, in a suitable sense, by
step processes.
Theorem 4.2. Let {X(t)}t≥0 ∈ L2 . Then for all T > 0 there exists a sequence of step processes {{∆Tn (t)}t≥0 }n∈N such that {∆Tn (t)}t≥0 ∈ L2 for all n ∈ N and
\[
\lim_{n\to\infty} E\Big[\int_0^T |\Delta_n^T(t) - X(t)|^2\,dt\Big] = 0. \tag{4.4}
\]
Proof. For simplicity we argue under the stronger assumption that the stochastic process
{X(t)}t≥0 is bounded and with continuous paths, namely
ω → X(t, ω) is bounded in Ω, for all t ≥ 0,
t → X(t, ω) is continuous for all ω ∈ Ω and t ≥ 0.
Now consider the partition of [0, T ] given by
(n) (n) (n) jT
0 = t0 < t1 < · · · < t(n)
n = T, tj =
n
and define
n−1
X (n)
∆Tn (t) = X(tk )I[t(n) ,t(n) ) , t ≥ 0,
k k+1
j=0
see Figure 4.1. Let us show that {∆Tn (t)}t≥0 is adapted to {F(t)}t≥0 . This is obvious for t ≥ T (since the step process is identically zero for t ≥ T ). For t ∈ [0, T ) we have $\Delta_n^T(t) = X(t_k^{(n)})$, for $t \in [t_k^{(n)}, t_{k+1}^{(n)})$, hence
\[
\mathcal{F}_{\Delta_n^T(t)} = \mathcal{F}_{X(t_k^{(n)})} \underset{(*)}{\subset} \mathcal{F}\big(t_k^{(n)}\big) \underset{(**)}{\subset} \mathcal{F}(t),
\]
where in (*) we used that {X(t)}t≥0 is adapted to {F(t)}t≥0 and in (**) the fact that $t_k^{(n)} \le t$.

[Figure 4.1: a path of X(t) together with the approximating step process ∆n (t).]
Moreover,
\[
\lim_{n\to\infty} \Delta_n^T(t) = X(t), \quad \text{for all } \omega \in \Omega,
\]
by the assumed continuity of the paths of {X(t)}t≥0 . For the next step we use the dominated
convergence theorem, see Remark 3.2. Since $\Delta_n^T(t)$ and X(t) are bounded on [0, T ] × Ω, there exists a constant $C_T$ such that $|\Delta_n^T(t) - X(t)|^2 \le C_T$. Hence we may move the limit on the left hand side of (4.4) across the expectation and integral operators and conclude that
\[
\lim_{n\to\infty} E\Big[\int_0^T |\Delta_n^T(t)-X(t)|^2\,dt\Big] = E\Big[\int_0^T \lim_{n\to\infty}|\Delta_n^T(t)-X(t)|^2\,dt\Big] = 0,
\]
as claimed.
definition is the following.
Theorem 4.3 (and Definition). Let {X(t)}t≥0 ∈ L2 , T > 0 and {{∆Tn (t)}t≥0 }n∈N be a
sequence of L2 -step processes converging to {X(t)}t≥0 in the sense of (4.4). Let
\[
I_n(T) = \int_0^T \Delta_n^T(s)\,dW(s).
\]
We have
\[
E\Big[\int_0^T |\Delta_n^T(s)-\Delta_m^T(s)|^2\,ds\Big] \le 2E\Big[\int_0^T |\Delta_n^T(s)-X(s)|^2\,ds\Big] + 2E\Big[\int_0^T |\Delta_m^T(s)-X(s)|^2\,ds\Big] \to 0 \quad \text{as } n, m \to \infty.
\]
It follows that {In (T )}n∈N is a Cauchy sequence in the norm ‖·‖2 . As mentioned in Remark 3.5, the space L2 (Ω) is complete in the norm ‖·‖2 , i.e., Cauchy sequences converge. This proves the existence of I(T ) such that ‖In (T ) − I(T )‖2 → 0. To prove that the limit is the same along any sequence of L2 -step processes converging to {X(t)}t≥0 , assume that $\{\{\Delta_n(t)\}_{t\ge 0}\}_{n\in\mathbb{N}}$, $\{\{\widetilde{\Delta}_n(t)\}_{t\ge 0}\}_{n\in\mathbb{N}}$ are two such sequences and denote
\[
I_n(T) = \int_0^T \Delta_n(s)\,dW(s), \qquad \widetilde{I}_n(T) = \int_0^T \widetilde{\Delta}_n(s)\,dW(s),
\]
which proves that $I_n(T)$ and $\widetilde{I}_n(T)$ have the same limit. This completes the proof of the theorem.
By way of example, we compute the Itô integral of the Brownian motion. We claim that, for all T > 0,
\[
\int_0^T W(t)\,dW(t) = \frac{W^2(T)}{2} - \frac{T}{2}. \tag{4.5}
\]
To prove the claim, we approximate the Brownian motion by the sequence of step processes
introduced in the proof of Theorem 4.2. Hence we define
\[
\Delta_n^T(t) = \sum_{j=0}^{n-1} W\Big(\frac{jT}{n}\Big)\,\mathbb{I}_{[\frac{jT}{n},\,\frac{(j+1)T}{n})}(t).
\]
By definition,
\[
I_n(T) = \int_0^T \Delta_n^T(t)\,dW(t) = \sum_{j=0}^{n-1} W\Big(\frac{jT}{n}\Big)\Big[W\Big(\frac{(j+1)T}{n}\Big) - W\Big(\frac{jT}{n}\Big)\Big].
\]
To simplify the notation we let Wj = W (jT /n). Hence our goal is to prove
\[
E\Big[\Big(\sum_{j=0}^{n-1} W_j(W_{j+1}-W_j) - \frac{W^2(T)}{2} + \frac{T}{2}\Big)^2\Big] \to 0, \quad \text{as } n \to \infty. \tag{4.6}
\]
We prove below that the sum within the expectation can be rewritten as
\[
\sum_{j=0}^{n-1} W_j(W_{j+1}-W_j) = \frac{1}{2}W(T)^2 - \frac{1}{2}\sum_{j=0}^{n-1}(W_{j+1}-W_j)^2, \tag{4.7}
\]
Given (4.7), the convergence (4.6) follows from the already proven fact that [W, W ](T ) = T , see Theorem 3.9. It remains
to establish (4.7). Since W (T ) = Wn and W0 = W (0) = 0, we have
\[
\frac{W(T)^2}{2} - \frac{1}{2}\sum_{j=0}^{n-1}(W_{j+1}-W_j)^2 = \frac{1}{2}W_n^2 - \frac{1}{2}\sum_{j=0}^{n-1}W_{j+1}^2 - \frac{1}{2}\sum_{j=0}^{n-1}W_j^2 + \sum_{j=0}^{n-1}W_jW_{j+1}
\]
\[
= -\frac{1}{2}\sum_{j=0}^{n-2}W_{j+1}^2 - \frac{1}{2}\sum_{j=1}^{n-1}W_j^2 + \sum_{j=1}^{n-1}W_jW_{j+1}
= -\sum_{j=1}^{n-1}W_j^2 + \sum_{j=1}^{n-1}W_jW_{j+1} = \sum_{j=1}^{n-1}W_j(W_{j+1}-W_j)
\]
\[
= \sum_{j=0}^{n-1}W_j(W_{j+1}-W_j).
\]
Exercise 4.1. Use the definition of Itô’s integral to prove that
\[
T\,W(T) = \int_0^T W(t)\,dt + \int_0^T t\,dW(t). \tag{4.8}
\]
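Before proving (4.8), it can be sanity-checked along a single simulated path (an illustrative sketch, with arbitrary discretization parameters):

```python
import numpy as np

# Numerical check of (4.8): the Riemann sum for int_0^T W(t) dt plus the
# Ito sum for int_0^T t dW(t) should reproduce T * W(T) up to a small error.
rng = np.random.default_rng(1)
T, n = 1.0, 100_000
dt = T / n
t = np.linspace(0.0, T, n + 1)                 # partition points t_j = jT/n
dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))

riemann = np.sum(W[:-1] * dt)                  # approximates int_0^T W(t) dt
ito = np.sum(t[:-1] * dW)                      # approximates int_0^T t dW(t)
err = abs(T * W[-1] - (riemann + ito))
```

A short computation shows that the discretization error here is exactly Δt·|W(T)|, so it vanishes as the partition is refined.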
The Itô integral can be defined under weaker assumptions on the integrand stochastic process than those considered so far. As this fact will be important in the following sections, it is worth discussing it briefly. Let M2 denote the set of {F(t)}t≥0 -adapted stochastic processes {X(t)}t≥0 such that $\int_0^T X(t)^2\,dt < \infty$ a.s. for all T > 0 (of course, L2 ⊂ M2 ).
Theorem 4.4 (and Definition). For every process {X(t)}t≥0 ∈ M2 and T > 0 there exists
a sequence of step processes {{∆Tn (t)}t≥0 }n∈N ⊂ M2 such that
\[
\lim_{n\to\infty} \int_0^T |X(s) - \Delta_n^T(s)|^2\,ds = 0 \quad \text{a.s.}
\]
and
\[
\int_0^T \Delta_n^T(t)\,dW(t)
\]
converges in probability as n → ∞. The limit is independent of the sequence of step processes
converging to {X(t)}t≥0 and is called the Itô integral of the process {X(t)}t≥0 in the interval
[0, T ]. If {X(t)}t≥0 ∈ L2 , the Itô integral just defined coincides (a.s.) with the one defined
in Theorem 4.3.
For the proof of the previous theorem, see [1, Sec. 4.4]. We remark that Theorem 4.4
implies that all {F(t)}t≥0 -adapted stochastic processes with a.s. continuous paths are Itô
integrable. In fact, if {X(t)}t≥0 has a.s. continuous paths, then for all T > 0, there exists
CT (ω) such that supt∈[0,T ] |X(t, ω)| ≤ CT (ω) a.s. Hence
\[
\int_0^T |X(s)|^2\,ds \le T\,C_T^2(\omega), \quad \text{a.s.},
\]
and thus Theorem 4.4 applies. The case of stochastic processes with a.s. continuous paths
covers all the applications in the following chapters, hence we shall restrict to it from now
on.
Definition 4.2. We define C 0 to be the space of all {F(t)}t≥0 -adapted stochastic processes
{X(t)}t≥0 with a.s. continuous paths.
In particular, if {X(t)}t≥0 , {Y (t)}t≥0 ∈ C 0 , then for all continuous functions f the process
{f (t, X(t), Y (t))}t≥0 belongs to C 0 and thus it is Itô integrable.
The properties listed in Theorem 4.1 carry over to the Itô integral of a general stochastic
process. For easy reference, we rewrite these properties in the following theorem.
satisfies the following properties for all t ≥ 0.
(i) Linearity: For all stochastic processes {X1 (t)}t≥0 , {X2 (t)}t≥0 ∈ C 0 and real constants
c1 , c2 ∈ R there holds
\[
\int_0^t \big(c_1X_1(s)+c_2X_2(s)\big)\,dW(s) = c_1\int_0^t X_1(s)\,dW(s) + c_2\int_0^t X_2(s)\,dW(s).
\]
(iii) Quadratic variation: For all T > 0, the quadratic variation of the stochastic process
{I(t)}t≥0 on the interval [0, T ] is independent of the sequence of partitions along which
it is computed and it is given by
\[
[I, I](T) = \int_0^T X^2(s)\,ds. \tag{4.10}
\]
(iv) Itô’s isometry: If {X(t)}t≥0 ∈ L2 , then $\mathrm{Var}[I(t)] = E[I^2(t)] = E\big[\int_0^t X^2(s)\,ds\big]$, for all t ≥ 0.
Proof of (ii). By (iv) and the Schwarz inequality, $E[|I(t)|] \le \sqrt{E[I^2(t)]} < \infty$. According to (3.20), it now suffices to show that
\[
E[I(t)\,I_A] = E[I(s)\,I_A], \quad \text{for all } 0 \le s \le t \text{ and all } A \in \mathcal{F}(s).
\]
Let {{In (t)}t≥0 }n∈N be a sequence of Itô integrals of step processes which converges to {I(t)}t≥0 in L2 (Ω), uniformly on compact intervals of time (see Theorem 4.3). Since {In (t)}t≥0 is a martingale for each n ∈ N, see Theorem 4.1, then $E[I_n(t)\,I_A] = E[I_n(s)\,I_A]$. Hence the claim follows if we show that E[In (t)I_A ] → E[I(t)I_A ], for all t ≥ 0. Using the Schwarz inequality (3.3), we have
Schwarz inequality (3.3), we have
\[
\big|E[(I_n(t) - I(t))\,I_A]\big| \le \sqrt{E[(I_n(t)-I(t))^2]}\,\sqrt{E[I_A]} \le \|I_n(t) - I(t)\|_2\,\sqrt{\mathbb{P}(A)} \le \|I_n(t) - I(t)\|_2 \to 0, \quad \text{as } n \to \infty,
\]
Remark 4.1. Note carefully that the martingale property (ii) requires {X(t)}t≥0 ∈ L2 . A
stochastic process in C 0 \L2 is not a martingale in general (although it is a local martingale,
see [1]).
Exercise 4.2. Let {X(t)}t≥0 ∈ L2 . Show that the double Itô integral
\[
J(t) = \int_0^t \Big(\int_0^s X(\tau)\,dW(\tau)\Big)\,dW(s), \qquad t \ge 0,
\]
is well defined. Write down the properties in Theorem 4.5 for J(t).
Exercise 4.3. Prove the following generalization of Itô’s isometry. Let {X(t)}t≥0 , {Y (t)}t≥0 ∈
C 0 ∩ L2 and denote by IX (t), IY (t) their Itô integrals over the interval [0, t]. Then
\[
\mathrm{Cov}\big(I_X(t), I_Y(t)\big) = E\Big[\int_0^t X(s)Y(s)\,ds\Big].
\]
We now state without proof the martingale representation theorem, which asserts that any martingale (relative to the filtration generated by the Brownian motion) is an Itô integral, i.e., the converse of claim (ii) in Theorem 4.5 holds. For the proof see Theorem 4.3.4 in [17].
Theorem 4.6. Let {M (t)}t≥0 , with M (t) ∈ L2 (Ω) for all t ≥ 0, be a martingale stochastic process relative to the filtration {FW (t)}t≥0 . Then there exists a stochastic process {Γ(t)}t≥0 ∈ L2 such that
\[
M(t) = M(0) + \int_0^t \Gamma(s)\,dW(s), \quad \text{for all } t \ge 0.
\]
Remark 4.2. Note carefully that the filtration used in the martingale representation the-
orem must be the one generated by the Brownian motion. Theorem 4.6 will be used in
Chapter 6 to show the existence of hedging portfolios of European derivatives, see Theo-
rem 6.2.
We conclude this section by introducing the “differential notation” for stochastic integrals.
Instead of (4.9), we write
dI(t) = X(t)dW (t).
For instance, the identities (4.5), (4.8) are also expressed as
d(W 2 (t)) = dt + 2W (t)dW (t), d(tW (t)) = W (t)dt + tdW (t).
The quadratic variation (4.10) is expressed also as
dI(t)dI(t) = X 2 (t)dt.
Note that this notation is in agreement with the one already introduced in Section 3.4,
namely
\[
dI(t)\,dI(t) = d\Big(\int_0^t X^2(s)\,ds\Big) = X^2(t)\,dt.
\]
The differential notation is very useful to provide informal proofs in stochastic calculus.
For instance, using dI(t) = X(t)dW (t), and dW (t)dW (t) = dt, see (3.12), we obtain the
following simple “proof” of Theorem 4.5(iii):
dI(t)dI(t) = X(t)dW (t)X(t)dW (t) = X 2 (t)dW (t)dW (t) = X 2 (t)dt.
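The informal identity dI(t)dI(t) = X²(t)dt can also be observed numerically: the realized quadratic variation of the discretized integral I(t) = ∫₀ᵗ W dW approaches the Riemann sum for ∫₀ᵀ W²(t)dt. A sketch (illustrative parameters, not from the notes):

```python
import numpy as np

# Check of Theorem 4.5(iii) for I(t) = int_0^t W dW: the realized quadratic
# variation sum_j (I(t_{j+1}) - I(t_j))^2 approximates int_0^T W(t)^2 dt.
rng = np.random.default_rng(2)
T, n = 1.0, 200_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))

dI = W[:-1] * dW                               # increments of the Ito integral
qv = np.sum(dI ** 2)                           # realized quadratic variation of I
target = np.sum(W[:-1] ** 2 * dt)              # Riemann sum for int_0^T W^2 dt
```

The two quantities agree up to statistical noise that vanishes as the mesh is refined.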
4.4 Diffusion processes
Now that we know how to integrate along the paths of a Brownian motion, we can define a
new class of stochastic processes.
Definition 4.3. Given {α(t)}t≥0 , {σ(t)}t≥0 ∈ C 0 , the stochastic process {X(t)}t≥0 ∈ C 0 given by
\[
X(t) = X(0) + \int_0^t \sigma(s)\,dW(s) + \int_0^t \alpha(s)\,ds, \qquad t \ge 0, \tag{4.11}
\]
is called a diffusion process with rate of quadratic variation {σ 2 (t)}t≥0 and drift {α(t)}t≥0 .
In differential notation, (4.11) reads
\[
dX(t) = \sigma(t)\,dW(t) + \alpha(t)\,dt. \tag{4.12}
\]
Note that dX(t)dX(t) = σ 2 (t)dt, which means that the quadratic variation of the diffusion process (4.12) is given by
\[
[X, X](t) = \int_0^t \sigma^2(s)\,ds, \qquad t \ge 0.
\]
Thus the stochastic process {σ 2 (t)}t≥0 measures the rate at which quadratic variation accumulates in time in the diffusion process {X(t)}t≥0 . Furthermore, assuming {σ(t)}t≥0 ∈ L2 , we have
\[
E\Big[\int_0^t \sigma(s)\,dW(s)\Big] = 0.
\]
Hence the term $\int_0^t \alpha(s)\,ds$ is the only one contributing to the evolution of the average of {X(t)}t≥0 , which is the reason to call α(t) the drift of the diffusion process (if α = 0 and {σ(t)}t≥0 ∈ L2 , the diffusion process is a martingale, as follows from Theorem 4.5(ii)).
Finally, integration along the paths of the diffusion process (4.12) is defined as
\[
\int_0^t Y(s)\,dX(s) := \int_0^t Y(s)\sigma(s)\,dW(s) + \int_0^t Y(s)\alpha(s)\,ds. \tag{4.13}
\]
4.4.1 The product rule in stochastic calculus
Recall that if f, g : R → R are two differentiable functions, the product (or Leibniz) rule of ordinary calculus states that
\[
(fg)' = f'g + fg',
\]
and thus
\[
f g(t) = f g(0) + \int_0^t \big(g(s)\,df(s) + f(s)\,dg(s)\big).
\]
Can this rule be true in stochastic calculus, i.e., when f and g are general diffusion processes? The answer is clearly no. In fact, letting for instance f (t) = g(t) = W (t), Leibniz’s rule gives us the relation d(W 2 (t)) = 2W (t)dW (t), while we have seen before that the correct formula in Itô’s calculus is d(W 2 (t)) = 2W (t)dW (t) + dt. The correct product rule in Itô’s calculus is the following.
Theorem 4.7. Let {X1 (t)}t≥0 and {X2 (t)}t≥0 be the diffusion processes
\[
dX_i(t) = \sigma_i(t)\,dW(t) + \alpha_i(t)\,dt.
\]
Then {X1 (t)X2 (t)}t≥0 is the diffusion process given by
d(X1 (t)X2 (t)) = X2 (t)dX1 (t) + X1 (t)dX2 (t) + σ1 (t)σ2 (t)dt. (4.14)
Exercise 4.4 (•). Prove the theorem in the case that αi and σi are deterministic constants
and Xi (0) = 0, for i = 1, 2.
Recall that the correct way to interpret (4.14) is
\[
X_1(t)X_2(t) = X_1(0)X_2(0) + \int_0^t X_2(s)\,dX_1(s) + \int_0^t X_1(s)\,dX_2(s) + \int_0^t \sigma_1(s)\sigma_2(s)\,ds, \tag{4.15}
\]
where the integrals along the paths of the processes {Xi (t)}t≥0 are defined as in (4.13). Note
that all integrals in (4.15) are well-defined, since the integrand stochastic processes have a.s.
continuous paths. We also remark that, since
dX1 (t)dX2 (t) = (σ1 (t)dW (t) + α1 (t)dt)(σ2 (t)dW (t) + α2 (t)dt)
= σ1 (t)σ2 (t)dW (t)dW (t) + (α1 (t)σ2 (t) + α2 (t)σ1 (t))dW (t)dt + α1 (t)α2 (t)dtdt
= σ1 (t)σ2 (t)dt,
then we may rewrite (4.14) as
d(X1 (t)X2 (t)) = X2 (t)dX1 (t) + X1 (t)dX2 (t) + dX1 (t)dX2 (t), (4.16)
which is somewhat easier to remember. Going back to the examples considered in Section 4.3,
the Itô product rule gives
d(W 2 (t)) = W (t)dW (t) + W (t)dW (t) + dW (t)dW (t) = 2W (t)dW (t) + dt,
d(tW (t)) = tdW (t) + W (t)dt + dW (t)dt = tdW (t) + W (t)dt,
in agreement with our previous calculations, see (4.5) and (4.8).
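The correction term in (4.15) can be observed numerically in the setting of Exercise 4.4, i.e., Xᵢ(t) = σᵢW(t) + αᵢt with Xᵢ(0) = 0: the defect of the ordinary Leibniz rule equals ∫₀ᵗ σ₁σ₂ ds. A sketch with made-up constants:

```python
import numpy as np

# Illustration of the Ito product rule (4.15): for X_i = s_i*W + a_i*t the
# quantity X1*X2 - int X2 dX1 - int X1 dX2 should approach s1*s2*T.
rng = np.random.default_rng(4)
T, n = 1.0, 100_000
s1, a1, s2, a2 = 0.3, 0.1, 0.5, -0.2          # illustrative constants
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))
t = np.linspace(0.0, T, n + 1)
X1 = s1 * W + a1 * t
X2 = s2 * W + a2 * t

dX1, dX2 = np.diff(X1), np.diff(X2)
defect = X1[-1] * X2[-1] - np.sum(X2[:-1] * dX1) - np.sum(X1[:-1] * dX2)
```

Discretely, the defect equals Σⱼ ΔX₁ΔX₂, which converges to the cross-variation σ₁σ₂T, as in (4.16).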
4.4.2 The chain rule in stochastic calculus
Next we consider the generalization to Itô’s calculus of the chain rule. Let us first recall how
the chain rule works in ordinary calculus. Assume that f : R × R → R and g : R → R are
differentiable functions. Then
\[
\frac{d}{dt} f(t, g(t)) = \partial_t f(t, g(t)) + \partial_x f(t, g(t))\,\frac{d}{dt} g(t),
\]
by which we derive
\[
f(t, g(t)) = f(0, g(0)) + \int_0^t \partial_s f(s, g(s))\,ds + \int_0^t \partial_x f(s, g(s))\,dg(s).
\]
Can this formula be true in stochastic calculus, i.e., when g is a diffusion process? The
answer is clearly no. In fact by setting f (t, x) = x2 , g(t) = W (t) and t = T in the previous
formula we obtain
\[
W^2(T) = 2\int_0^T W(t)\,dW(t), \quad \text{i.e.,} \quad \int_0^T W(t)\,dW(t) = \frac{W^2(T)}{2},
\]
while the Itô integral of the Brownian motion is given by (4.5). The correct formula for the
chain rule in stochastic calculus is given in the following theorem.
Theorem 4.8. Let f : R × R → R, f = f (t, x), be a C 1 function such that ∂x2 f is continuous
and let {X(t)}t≥0 be the diffusion process dX(t) = σ(t)dW (t) + α(t)dt. Then Itô’s formula
holds:
\[
df(t, X(t)) = \partial_t f(t, X(t))\,dt + \partial_x f(t, X(t))\,dX(t) + \frac{1}{2}\partial_x^2 f(t, X(t))\,dX(t)dX(t), \tag{4.17}
\]
i.e.,
\[
df(t, X(t)) = \partial_t f(t, X(t))\,dt + \partial_x f(t, X(t))\big(\sigma(t)dW(t) + \alpha(t)dt\big) + \frac{1}{2}\partial_x^2 f(t, X(t))\,\sigma^2(t)\,dt. \tag{4.18}
\]
For instance, letting X(t) = W (t) and f (t, x) = x2 , we obtain d(W 2 (t)) = 2W (t)dW (t) + dt, i.e., (4.5), while for f (t, x) = tx we obtain d(tW (t)) = W (t)dt + tdW (t), which is (4.8).
In fact, the proof of Theorem 4.8 is similar to the proof of (4.5) and (4.8). We omit the details (see [21, Theorem 4.4.1] for a sketch of the proof).
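Itô's formula can be tested pathwise: for f(t, x) = x³ and X = W, (4.18) gives d(W³) = 3W²dW + 3W dt, so W(T)³ should match the sum of the two discretized integrals. A sketch (illustrative parameters):

```python
import numpy as np

# Pathwise check of Ito's formula for f(t, x) = x^3 and X = W:
# d(W^3) = 3 W^2 dW + 3 W dt (since (1/2) * 6W * dW dW = 3W dt).
rng = np.random.default_rng(5)
T, n = 1.0, 200_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))

rhs = np.sum(3 * W[:-1] ** 2 * dW) + np.sum(3 * W[:-1] * dt)
err = abs(W[-1] ** 3 - rhs)                    # vanishes as the mesh shrinks
```

Note that omitting the 3W dt term (as the ordinary chain rule would) leaves an error of order T instead of one vanishing with the mesh.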
Recall that (4.18) is a shorthand for
\[
f(t, X(t)) = f(0, X(0)) + \int_0^t \Big(\partial_t f + \alpha(s)\partial_x f + \frac{1}{2}\sigma^2(s)\partial_x^2 f\Big)(s, X(s))\,ds + \int_0^t \sigma(s)\,\partial_x f(s, X(s))\,dW(s).
\]
All integrals in the right hand side of the previous equation are well defined, as the integrand
stochastic processes have continuous paths. We conclude with the generalization of Itô’s
formula to functions of several random variables, which again we give without proof.
Theorem 4.9. Let f : R × RN → R be a C 1 function such that f = f (t, x) is twice continuously differentiable in the variable x ∈ RN . Let {X1 (t)}t≥0 , . . . , {XN (t)}t≥0 be diffusion processes and let X(t) = (X1 (t), . . . , XN (t)). Then there holds:
\[
df(t, X(t)) = \partial_t f(t, X(t))\,dt + \sum_{i=1}^N \partial_{x_i} f(t, X(t))\,dX_i(t) + \frac{1}{2}\sum_{i,j=1}^N \partial_{x_i}\partial_{x_j} f(t, X(t))\,dX_i(t)\,dX_j(t). \tag{4.19}
\]
For instance, for N = 2, substituting f (t, x1 , x2 ) = x1 x2 into (4.19), we obtain the Itô product rule (4.16).
Remark 4.3. Let {X(t)}t≥0 , {Y (t)}t≥0 be diffusion processes and define the complex-valued
stochastic process {Z(t)}t≥0 by Z(t) = X(t) + iY (t). Then any stochastic process of the
form g(t, Z(t)) can be written in the form f (t, X(t), Y (t)), where f (t, x, y) = g(t, x + iy).
Hence dg(t, Z(t)) can be computed using Theorem 4.9. An application of this formula is given in Exercise 4.9 below.
The following exercises help to get familiar with the rules of stochastic calculus.
Exercise 4.5 (•). Let {W1 (t)}t≥0 , {W2 (t)}t≥0 be Brownian motions. Assume that there
exists a constant ρ ∈ [−1, 1] such that dW1 (t)dW2 (t) = ρdt. Show that ρ is the correlation of
the two Brownian motions at time t. Assuming that {W1 (t)}t≥0 , {W2 (t)}t≥0 are independent,
compute P(W1 (t) > W2 (s)), for all s, t > 0.
Exercise 4.6 (•). Consider the stochastic process {X(t)}t≥0 defined by X(t) = W (t)3 −
3tW (t). Show that {X(t)}t≥0 is a martingale and find a process {Γ(t)}t≥0 adapted to
{F(t)}t≥0 such that
\[
X(t) = X(0) + \int_0^t \Gamma(s)\,dW(s).
\]
(The existence of the process {Γ(t)}t≥0 is ensured by Theorem 4.6.)
Exercise 4.7 (•). Let {θ(t)}t≥0 ∈ C 0 and define the stochastic process {Z(t)}t≥0 by
\[
Z(t) = \exp\Big(-\int_0^t \theta(s)\,dW(s) - \frac{1}{2}\int_0^t \theta^2(s)\,ds\Big).
\]
Show that
\[
Z(t) = 1 - \int_0^t \theta(s)Z(s)\,dW(s).
\]
Processes of the form considered in Exercise 4.7 are fundamental in mathematical finance. In particular, it is important to know whether {Z(t)}t≥0 is a martingale. By Exercise 4.7 and Theorem 4.5(ii), {Z(t)}t≥0 is a martingale if θ(t)Z(t) ∈ L2 , which is however difficult to verify directly in general. The following condition, known as Novikov’s condition, is more useful in applications, as it involves only the time-integral of the process {θ(t)}t≥0 . The proof can be found in [13].
Theorem 4.10. Let {θ(t)}t≥0 ∈ C 0 satisfy
\[
E\Big[\exp\Big(\frac{1}{2}\int_0^T \theta(t)^2\,dt\Big)\Big] < \infty, \quad \text{for all } T > 0. \tag{4.20}
\]
Then the stochastic process {Z(t)}t≥0 given by
\[
Z(t) = \exp\Big(-\int_0^t \theta(s)\,dW(s) - \frac{1}{2}\int_0^t \theta^2(s)\,ds\Big)
\]
is a martingale relative to the filtration {F(t)}t≥0 .
In particular, the stochastic process {Z(t)}t≥0 is a martingale when θ(t) = const, hence
we recover the result of Exercise 3.29. The following exercise extends the result of Exercise 4.7
to the case of several independent Brownian motions.
Exercise 4.8 (?). Let {W1 (t)}t≥0 , . . . , {WN (t)}t≥0 be independent Brownian motions and let {F(t)}t≥0 be a non-anticipating filtration for all of them. Let {θ1 (t)}t≥0 , . . . , {θN (t)}t≥0 ∈ C 0 be adapted to {F(t)}t≥0 and set θ(t) = (θ1 (t), . . . , θN (t)), $\|\theta(t)\| = \sqrt{\theta_1(t)^2 + \dots + \theta_N(t)^2}$. Compute dZ(t), where
\[
Z(t) = \exp\Big(-\sum_{j=1}^N \int_0^t \theta_j(s)\,dW_j(s) - \frac{1}{2}\int_0^t \|\theta(s)\|^2\,ds\Big).
\]
is a martingale relative to {FW (t)}t≥0 . As Z(0) = 1, then E[Z(t)] = 1 for all t ≥ 0. Thus we can use the stochastic process {Z(t)}t≥0 to generate a measure $\widetilde{\mathbb{P}}$ equivalent to P as we did at the end of Section 3.6, namely $\widetilde{\mathbb{P}} : \mathcal{F} \to [0, 1]$ is given by
\[
\widetilde{\mathbb{P}}(A) = E[Z(T)\,I_A], \qquad A \in \mathcal{F}. \tag{4.22}
\]
The relation between E and $\widetilde{E}$ has been determined in Theorem 3.19, where we showed that
\[
\widetilde{E}[X] = E[Z(t)X], \tag{4.23}
\]
for all t ≥ 0 and FW (t)-measurable random variables X, and
\[
\widetilde{E}[Y\,|\,\mathcal{F}_W(s)] = \frac{1}{Z(s)}\,E[Z(t)Y\,|\,\mathcal{F}_W(s)] \tag{4.24}
\]
for all 0 ≤ s ≤ t and random variables Y .
for all 0 ≤ s ≤ t and random variables Y . We can now state and sketch the proof of
Girsanov’s theorem, which is a fundamental result with deep applications in mathematical
finance.
Theorem 4.11. Define the stochastic process $\{\widetilde{W}(t)\}_{t\ge 0}$ by
\[
\widetilde{W}(t) = W(t) + \int_0^t \theta(s)\,ds, \tag{4.25}
\]
that is to say,
\[
\widetilde{W}(t)Z(t) = \int_0^t \big(1 - \widetilde{W}(u)\theta(u)\big)Z(u)\,dW(u).
\]
Hence $\widetilde{E}[\widetilde{W}(t)\,|\,\mathcal{F}_W(s)] = \widetilde{W}(s)$, as claimed.
Later we shall need also the multi-dimensional version of Girsanov’s theorem. Let
{W1 (t)}t≥0 , . . . , {WN (t)}t≥0 be independent Brownian motions and let {FW (t)}t≥0 be their
own generated filtration. Let {θ1 (t)}t≥0 , . . . , {θN (t)}t≥0 ∈ C 0 be adapted to {FW (t)}t≥0 and
set θ(t) = (θ1 (t), . . . , θN (t)). We assume that the Novikov condition (4.20) is satisfied (with
θ(t)2 = kθ(t)k2 = θ1 (t)2 + · · · + θN (t)2 ). Then, as shown in Exercise 4.8, the stochastic
process {Z(t)}t≥0 given by
\[
Z(t) = \exp\Big(-\sum_{j=1}^N \int_0^t \theta_j(s)\,dW_j(s) - \frac{1}{2}\int_0^t \|\theta(s)\|^2\,ds\Big)
\]
and realistic model for the dynamics of stock prices than the simple geometric Brownian
motion. In the rest of these notes we assume that stock prices are modeled by geometric
Brownian motions.
Since
\[
S(t) = S(0)e^{X(t)}, \qquad dX(t) = \alpha(t)\,dt + \sigma(t)\,dW(t),
\]
then Itô’s formula gives
\[
dS(t) = S(0)e^{X(t)}\,dX(t) + \frac{1}{2}S(0)e^{X(t)}\,dX(t)dX(t)
= S(t)\alpha(t)\,dt + S(t)\sigma(t)\,dW(t) + \frac{1}{2}\sigma^2(t)S(t)\,dt
\]
\[
= \mu(t)S(t)\,dt + \sigma(t)S(t)\,dW(t), \qquad \text{where } \mu(t) = \alpha(t) + \frac{1}{2}\sigma^2(t),
\]
hence a generalized geometric Brownian motion is a diffusion process in which the rate of quadratic variation and the drift depend on the process itself.
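For constant µ and σ, the equation dS = µS dt + σS dW has the explicit solution S(t) = S(0)exp((µ − σ²/2)t + σW(t)), which follows from the computation above with α = µ − σ²/2, and E[S(t)] = S(0)e^{µt}. A Monte Carlo sketch (all parameter values made up):

```python
import numpy as np

# Monte Carlo check of E[S(t)] = S0 * exp(mu * t) for geometric Brownian
# motion with constant coefficients, sampling S(t) from its explicit solution.
rng = np.random.default_rng(7)
S0, mu, sigma, t, m = 100.0, 0.05, 0.2, 1.0, 1_000_000
W_t = rng.normal(0.0, np.sqrt(t), size=m)      # W(t) ~ N(0, t)
S_t = S0 * np.exp((mu - 0.5 * sigma ** 2) * t + sigma * W_t)
mean_S = S_t.mean()                            # close to S0 * exp(mu * t)
```

The −σ²/2 correction in the exponent is exactly what makes the mean grow at rate µ rather than µ + σ²/2.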
In the presence of several stocks, it is reasonable to assume that each of them introduces a new source of randomness in the market. Thus, when dealing with N stocks, we assume the existence of N Brownian motions {W1 (t)}t≥0 , . . . , {WN (t)}t≥0 , not necessarily independent, and model the evolution of the stock prices {S1 (t)}t≥0 , . . . , {SN (t)}t≥0 by the following N -dimensional generalized geometric Brownian motion:
\[
dS_k(t) = \Big(\mu_k(t)\,dt + \sum_{j=1}^N \sigma_{kj}(t)\,dW_j(t)\Big) S_k(t), \tag{4.28}
\]
for some stochastic processes {µk (t)}t≥0 , {σkj (t)}t≥0 ∈ C 0 , j, k = 1, . . . , N , adapted to the
filtration generated by the Brownian motions.
Self-financing portfolios
Consider a portfolio {hS (t), hB (t)}t≥0 invested in a 1+1-dimensional market. We assume
that the price of the stock follows the generalized geometric Brownian motion
Moreover we assume that the market parameters {µ(t)}t≥0 , {σ(t)}t≥0 , {R(t)}t≥0 have
continuous paths a.s. and are adapted to the filtration {FW (t)}t≥0 . The value of the portfolio
is given by
V (t) = hS (t)S(t) + hB (t)B(t). (4.31)
We say that the portfolio is self-financing if purchasing more shares of one asset is possible
only by selling shares of the other asset for an equivalent value (and not by infusing new cash
90
into the portfolio), and, conversely, if any cash obtained by selling one asset is immediately
re-invested to buy shares of the other asset (and not withdrawn from the portfolio). To
translate this condition into a mathematical formula, assume that (hS , hB ) is the investor
position on the stock and the risk-free asset during the “infinitesimal” time interval [t, t+δt).
Let V − (t + δt) be the value of this portfolio immediately before the time t + δt at which the
position is changed, i.e.,
where we used the continuity in time of the assets price. At the time t + δt, the investor
sells/buys shares of the assets. Let (h0S , h0B ) be the new position on the stock and the risk-free
asset. Then the value of the portfolio at time t + δt is given by
The difference V (t + δt) − V − (t + δt), if not zero, corresponds to cash withdrawn or added
to the portfolio as a result of the change in the position on the assets. In a self-financing
portfolio, however, this difference must be zero. We obtain
V (t + δt) − V − (t + δt) = 0 ⇔ (hS − h0S )S(t + δt) + (hB − h0B )B(t + δt) = 0.
Hence, the change of the portfolio value in the interval [t, t + δt] is given by
δV = V (t + δt) − V (t) = h0S S(t + δt) + h0B B(t + δt) − (hS S(t) + hB B(t)) = hS δS + hB δB,
where δS = S(t + δt) − S(t), and δB = B(t + δt) − B(t) are the changes of the assets value
in the interval [t, t + δt]. This discussion leads to the following definition.
Definition 4.4. A portfolio process {hS (t), hB (t)}t≥0 invested in the 1 + 1-dimensional market (4.29)-(4.30) is said to be self-financing if it is adapted to {FW (t)}t≥0 and if its value process {V (t)}t≥0 satisfies
\[
dV(t) = h_S(t)\,dS(t) + h_B(t)\,dB(t).
\]
We conclude with the important definition of hedging portfolio. Suppose that at time t a European derivative with pay-off Y at the time of maturity T > t is sold for the price ΠY (t). An important problem in financial mathematics is to find a strategy for how the seller should invest the premium ΠY (t) of the derivative in order to hedge the derivative, i.e., to ensure that the portfolio value of the seller at time T is enough to pay off the buyer of the derivative. We shall assume throughout that the seller can invest the premium of the derivative only in the 1+1-dimensional market consisting of the underlying stock and the risk-free asset.
Definition 4.5. Consider the European derivative with pay-off Y and time of maturity T , where we assume that Y is FW (T )-measurable. A portfolio process {hS (t), hB (t)}t≥0 invested in the underlying stock and the risk-free asset is said to be a hedging portfolio if
(i) {hS (t), hB (t)}t≥0 is adapted to {FW (t)}t≥0 ;
2) What investment strategy (on the underlying stock and the risk-free asset) should the
seller undertake in order to hedge the derivative?
Hence we have to show that E[W1 (t)W2 (t)] = ρt. By Itô’s product rule
d(W1 (t)W2 (t)) = W1 (t)dW2 (t)+W2 (t)dW1 (t)+dW1 (t)dW2 (t) = W1 (t)dW2 (t)+W2 (t)dW1 (t)+ρdt.
Taking the expectation we find E[W1 (t)W2 (t)] = ρt, which concludes the first part of the exercise. As to the second part, the independent random variables W1 (t), W2 (s) have the joint density
\[
f_{W_1(t)W_2(s)}(x, y) = f_{W_1(t)}(x)\,f_{W_2(s)}(y) = \frac{1}{2\pi\sqrt{ts}}\,e^{-\frac{x^2}{2t}-\frac{y^2}{2s}}.
\]
Hence
\[
\mathbb{P}(W_1(t) > W_2(s)) = \frac{1}{2\pi\sqrt{ts}}\int_{x>y} e^{-\frac{x^2}{2t}-\frac{y^2}{2s}}\,dx\,dy = \frac{1}{2}.
\]
Exercise 4.6. To solve the exercise we must prove that dX(t) = Γ(t)dW (t), for some process {Γ(t)}t≥0 adapted to {FW (t)}t≥0 . In fact, by Itô’s formula,
\[
dX(t) = -3W(t)\,dt + \big(3W(t)^2 - 3t\big)\,dW(t) + \frac{1}{2}\,6W(t)\,dW(t)dW(t) = 3\big(W(t)^2 - t\big)\,dW(t),
\]
where in the last step we used that dW (t)dW (t) = dt. Hence Γ(t) = 3(W (t)2 − t).
Exercise 4.7. By Itô’s formula, the stochastic process {Z(t)}t≥0 satisfies dZ(t) = −θ(t)Z(t)dW (t),
which is the claim.
Exercise 4.9. Since {I(t)}t≥0 is a martingale, then E[I(t)] = E[I(0)] = 0. By Itô’s isometry,
\[
\mathrm{Var}[I(t)] = E[I(t)^2] = E\Big[\int_0^t A(s)^2\,ds\Big] = \int_0^t A^2(s)\,ds,
\]
since A(t) is not random. To prove that I(t) is normally distributed, it suffices to show that its characteristic function satisfies
\[
\theta_{I(t)}(u) = e^{-\frac{u^2}{2}\int_0^t A^2(s)\,ds},
\]
see Section 3.3. The latter is equivalent to
\[
E[e^{iuI(t)}] = e^{-\frac{u^2}{2}\int_0^t A^2(s)\,ds}, \quad \text{i.e.,} \quad E\Big[\exp\Big(iuI(t) + \frac{u^2}{2}\int_0^t A^2(s)\,ds\Big)\Big] = 1.
\]
Let $Z(t) = \exp\big(iuI(t) + \frac{u^2}{2}\int_0^t A^2(s)\,ds\big)$. If we show that Z(t) is a martingale, we are done, because then E[Z(t)] = E[Z(0)] = 1. We write
\[
Z(t) = e^{iX(t)+Y(t)} = f(X(t), Y(t)), \qquad X(t) = u\int_0^t A(s)\,dW(s), \quad Y(t) = \frac{u^2}{2}\int_0^t A^2(s)\,ds.
\]
Then, by Theorem 4.9, using dX(t)dY (t) = dY (t)dY (t) = 0,
\[
dZ(t) = i\,e^{iX(t)+Y(t)}\,dX(t) + e^{iX(t)+Y(t)}\,dY(t) - \frac{1}{2}e^{iX(t)+Y(t)}\,dX(t)dX(t) = iuZ(t)A(t)\,dW(t),
\]
where we used that dX(t)dX(t) = u2 A2 (t)dt. Being an Itô integral, the process {Z(t)}t≥0 is a martingale, which completes the solution of the exercise.
Chapter 5
Throughout this chapter, the probability space (Ω, F, P) is given and {F(t)}t≥0 denotes
a non-anticipating filtration for the given Brownian motion {W (t)}t≥0 . Given T > 0, we
denote by DT the open region in the (t, x)-plane given by
DT = {t ∈ (0, T ), x ∈ R} = (0, T ) × R.
• C 1,2 (DT ) is the space of functions u ∈ C 1 (DT ) such that ∂x2 u ∈ C(DT );
5.1 Stochastic differential equations
Definition 5.1. Given s ≥ 0, α, β ∈ C 0 ([s, ∞) × R), and a deterministic constant x ∈ R,
we say that a stochastic process {X(t)}t≥s is a global solution to the stochastic differential
equation (SDE)
dX(t) = α(t, X(t)) dt + β(t, X(t)) dW (t) (5.1)
with initial value X(s, ω) = x at time t = s, if {X(t)}t≥s ∈ C 0 and
\[
X(t) = x + \int_s^t \alpha(\tau, X(\tau))\,d\tau + \int_s^t \beta(\tau, X(\tau))\,dW(\tau), \qquad t \ge s. \tag{5.2}
\]
The initial value of an SDE can be a random variable instead of a deterministic constant,
but we shall not need this more general case. Note also that the integrals in the right
hand side of (5.2) are well-defined, as the integrand functions have continuous paths a.s. Of
course one needs suitable assumptions on the functions α, β to ensure that there is a (unique)
process {X(t)}t≥s satisfying (5.2). The precise statement is contained in the following global
existence and uniqueness theorem for SDE’s, which is reminiscent of the analogous result for
ordinary differential equations.
Theorem 5.1. Assume that for each T > s there exist constants CT , DT > 0 such that α, β
satisfy
|α(t, x)| + |β(t, x)| ≤ CT (1 + |x|), (5.3)
|α(t, x) − α(t, y)| + |β(t, x) − β(t, y)| ≤ DT |x − y|, (5.4)
for all t ∈ [s, T ], x, y ∈ R. Then there exists a unique global solution {X(t)}t≥s of the
SDE (5.1) with initial value X(s) = x. Moreover {X(t)}t≥s ∈ L2 .
A proof of Theorem 5.1 can be found in [17, Theorem 5.2.1]. Note that the result proved
in [17] is a bit more general than the one stated above, as it covers the case of a random
initial value.
The solution of (5.1) with initial value x at time t = s will also be denoted by
{X(t; s, x)}t≥s . It can be shown that, under the assumptions of Theorem 5.1, the random
variable X(t; s, x) depends (a.s.) continuously on the initial conditions (s, x), see [1, Sec. 7.3].
Remark 5.1. The uniqueness statement in Theorem 5.1 is to be understood “up to null
sets”. Precisely, if {Xi (t)}t≥s , i = 1, 2 are two solutions with the same initial value x, then
\[
\mathbb{P}\Big(\sup_{t\in[s,T]} |X_1(t) - X_2(t)| > 0\Big) = 0, \quad \text{for all } T > s.
\]
Remark 5.2. If the assumptions of Theorem 5.1 are satisfied only up to a fixed time
T > 0, then the solution of (5.1) could explode at some finite time in the future of T . For
example, the stochastic process {X(t)}0≤t<T∗ given by X(t) = log(W (t)+ex ) solves (5.1) with
α = − exp(−2x)/2 and β = exp(−x), but only up to the time T∗ = inf{t : W (t) = −ex } > 0.
In these notes we are only interested in global solutions of SDE’s, hence we require (5.3)-(5.4)
to hold for all T > 0.
Remark 5.3. The growth condition (5.3) alone is sufficient to prove the existence of a global
solution to (5.1). The Lipschitz condition (5.4) is used to ensure uniqueness. By using a
more general notion of solution (weak solution) and uniqueness (pathwise uniqueness),
one can extend Theorem 5.1 to a larger class of SDE’s, which include in particular the CIR
process considered in Section 5.3; see [19] for details.
Exercise 5.1. In many applications in finance, the drift term α(t, x) is linear, and so it can be written in the form
\[
\alpha(t, x) = a(b - x), \tag{5.5}
\]
for some constants a, b ∈ R.
A stochastic process {X(t)}t≥0 is called mean reverting if there exists a constant c such that E[X(t)] → c as t → +∞. Most financial variables are required to satisfy the mean reversion property. Prove that the solution {X(t; s, x)}t≥s of (5.1) with linear drift (5.5) satisfies
\[
E[X(t; s, x)] = x\,e^{-a(t-s)} + b\big(1 - e^{-a(t-s)}\big). \tag{5.6}
\]
Hence the process {X(t; s, x)}t≥s is mean reverting if and only if a > 0, and in this case the long time mean is given by c = b.
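The mean-reversion formula (5.6) can be checked by simulating the SDE with linear drift α(t, x) = a(b − x) (the form consistent with (5.6)) via the Euler scheme; the constant diffusion coefficient γ below is illustrative and does not affect the mean:

```python
import numpy as np

# Monte Carlo check of (5.6): simulate m paths of dX = a(b - X) dt + gamma dW
# with X(0) = x0 and compare the sample mean at time T with the formula.
rng = np.random.default_rng(8)
a, b, gamma, x0 = 2.0, 1.0, 0.5, 3.0           # illustrative constants
T, n, m = 1.0, 500, 20_000
dt = T / n
X = np.full(m, x0)
for _ in range(n):
    dW = rng.normal(0.0, np.sqrt(dt), size=m)
    X += a * (b - X) * dt + gamma * dW

exact = x0 * np.exp(-a * T) + b * (1 - np.exp(-a * T))   # (5.6) with s = 0
mc_mean = X.mean()
```

Since a > 0 here, running the simulation to larger T drives the sample mean towards the long time mean b, as the exercise asserts.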
and so by Theorem 5.1 there exists a unique global solution of (5.7). For example, the
geometric Brownian motion (2.14) solves the linear SDE dS(t) = µS(t) dt + σS(t) dW (t),
where µ = α + σ 2 /2. Another example of linear SDE in finance is the Hull-White interest
rate model, see Section 6.6. Linear SDE’s can be solved explicitly, as shown in the following
theorem.
Theorem 5.2. The solution of (5.7) is given by X(t) = Y (t)Z(t), where
\[
Z(t) = \exp\Big(\int_s^t \sigma(\tau)\,dW(\tau) + \int_s^t \Big(b(\tau) - \frac{\sigma(\tau)^2}{2}\Big)\,d\tau\Big),
\]
\[
Y(t) = x + \int_s^t \frac{a(\tau) - \sigma(\tau)\gamma(\tau)}{Z(\tau)}\,d\tau + \int_s^t \frac{\gamma(\tau)}{Z(\tau)}\,dW(\tau).
\]
For example, in the special case in which the functions a, b, γ, σ are constant (independent of time), the solution of (5.7) with initial value X(0) = x at time t = 0 is
\[
X(t) = e^{\sigma W(t) + (b - \frac{\sigma^2}{2})t}\Big(x + (a - \gamma\sigma)\int_0^t e^{-\sigma W(\tau) - (b - \frac{\sigma^2}{2})\tau}\,d\tau + \gamma\int_0^t e^{-\sigma W(\tau) - (b - \frac{\sigma^2}{2})\tau}\,dW(\tau)\Big).
\]
Exercise 5.3 (•). Consider the linear SDE (5.7) with constant coefficients a, b, γ and σ = 0. Find the solution and show that X(t; s, x) ∈ N (m(t − s, x), ∆(t − s)2 ), where
\[
m(\tau, x) = x\,e^{-b\tau} + \frac{a}{b}\big(1 - e^{-b\tau}\big), \qquad \Delta(\tau)^2 = \frac{\gamma^2}{2b}\big(1 - e^{-2b\tau}\big). \tag{5.8}
\]
Exercise 5.4 (•). Find the solution {X(t)}t≥0 of the linear SDE
Exercise 5.5. Compute Cov(W (t), X(t)) and Cov(W 2 (t), X(t)), where X(t) = X(t; s, x) is
the stochastic process in Exercise 5.3.
and therefore X(t; s, x) ∈ N (m(t−s, x), ∆(t−s)2 ), where m(τ, x) and ∆(τ ) are given by (5.8).
By Theorem 3.20, the transition density of the Markov process {X(t; s, x)}t≥0 exists and is
given by the pdf of the random variable X(t; s, x), that is p(t, s, x, y) = p∗ (t − s, x, y), where
\[
p^*(\tau, x, y) = \frac{1}{\sqrt{2\pi\Delta(\tau)^2}}\,e^{-\frac{(y-m(\tau,x))^2}{2\Delta(\tau)^2}}.
\]
The previous example raises the question of whether the Markov process solution of SDE’s
always admits a transition density. This problem is one of the subjects of Section 5.2.
5.1.3 Systems of SDE’s
Occasionally in the next chapter we need to consider systems of several SDE’s. All the
results presented in this section extend mutatis mutandis to systems of SDE’s, the difference
being merely notational. For example, given two Brownian motions {W1 (t)}t≥0 , {W2 (t)}t≥0
and continuous functions α1 , α2 , β11 , β12 , β21 , β22 : [s, ∞) × R2 → R, the relations
\[
dX_i(t) = \alpha_i(t, X_1(t), X_2(t))\, dt + \sum_{j=1,2} \beta_{ij}(t, X_1(t), X_2(t))\, dW_j(t), \tag{5.10a}
\]
Xi (s) = xi , i = 1, 2 (5.10b)
define a system of two SDE’s on the stochastic processes {X1 (t)}t≥0 , {X2 (t)}t≥0 with initial
values X1(s) = x1, X2(s) = x2 at time s. As usual, the correct way to interpret the relations
above is in the integral form:
\[
X_i(t) = x_i + \int_s^t \alpha_i(\tau, X_1(\tau), X_2(\tau))\, d\tau + \sum_{j=1,2} \int_s^t \beta_{ij}(\tau, X_1(\tau), X_2(\tau))\, dW_j(\tau), \qquad i = 1, 2.
\]
Upon defining the vector and matrix valued functions α = (α1, α2)^T, β = (βij)i,j=1,2, and letting X(t) = (X1(t), X2(t)), x = (x1, x2), W(t) = (W1(t), W2(t)), we can rewrite (5.10) as
\[
dX(t) = \alpha(t, X(t))\, dt + \beta(t, X(t)) \cdot dW(t), \qquad X(s) = x, \tag{5.11}
\]
where · denotes the row by column matrix product. In fact, every system of any arbitrary
number of SDE’s can be written in the form (5.11). Theorem 5.1 continues to be valid for
systems of SDE’s, the only difference being that |α|, |β| in (5.3)-(5.4) stand now for the
vector norm of α and for the matrix norm of β.
Moreover if ∂x u is uniformly bounded on DT , then the stochastic process {u(t, X(t))}t∈[0,T ]
is a martingale relative to the filtration {F(t)}t≥0 .
\[
du(t, X(t)) = \Big(\partial_t u + \alpha\,\partial_x u + \frac{\beta^2}{2}\,\partial_x^2 u\Big)(t, X(t))\, dt + (\beta\,\partial_x u)(t, X(t))\, dW(t).
\]
Since u solves (5.12), we have du(t, X(t)) = (β∂x u)(t, X(t)) dW(t), which is equivalent to (5.13)
(as u(0, X(0)) = u(0, x)). Under the additional assumption that ∂x u is uniformly bounded
on DT , there exists a constant CT > 0 such that |∂x u(t, x)| ≤ CT and so, due also to (5.3),
the Itô integral in the right hand side of (5.13) is a martingale. This concludes the proof of
the theorem.
Definition 5.2. The partial differential equation (PDE) (5.12) is called the (backward)
Kolmogorov equation associated to the SDE (5.1). We say that u : DT → R is a strong
solution of (5.12) in the region DT if u ∈ C 1,2 (DT ), u solves (5.12) for all (t, x) ∈ DT , and
∂x u(t, x) is uniformly bounded on DT. Similarly, replacing DT with DT+, one defines strong solutions of (5.12) in the region DT+.
Note carefully that for a strong solution u of the Kolmogorov equation in the region DT+, we require that u, ∂t u, ∂x u and ∂x²u extend continuously to the axis x = 0. This assumption can be weakened, but we shall not do so. The statement of Theorem 5.3 raises the question of whether there exist strong solutions to the Kolmogorov PDE. This important problem is solved in the following theorem.
Theorem 5.4. Assume that the hypotheses of Theorem 5.1 are satisfied and in addition assume that α, β ∈ C²(DT) are such that ∂_x^i α, ∂_x^i β are uniformly bounded on DT, for i = 1, 2 and for all T > 0. Let g ∈ C²(R) be such that g′ and g″ are uniformly bounded on R. Define the function
the function
uT (t, x) = E[g(X(T ; t, x))], 0 ≤ t < T. (5.14)
Then the following holds.
(ii) The solution is unique in the following sense: if v is another strong solution and
limt→T v(t, x) = g(x), then v = uT in DT .
Proof. For (i) see [17, Theorem 8.1.1]. We prove only (ii). Let v be a solution as stated in
the theorem and set Y (τ ) = v(τ, X(τ ; t, x)), for t ≤ τ ≤ T . By Itô’s formula and using that
v solves (5.15) we find dY (τ ) = β∂x v(τ, X(τ ; t, x))dW (τ ). Hence
\[
v(T, X(T; t, x)) - v(t, X(t; t, x)) = \int_t^T \beta\,\partial_x v(\tau, X(\tau; t, x))\, dW(\tau).
\]
Moreover v(T, X(T; t, x)) = g(X(T; t, x)), v(t, X(t; t, x)) = v(t, x) and in addition, by (5.3) and the fact that ∂x v is uniformly bounded, the Itô integral in the right hand side is a martingale. Hence taking the expectation we find v(t, x) = E[g(X(T; t, x))] = uT(t, x).
Remark 5.4. It is often convenient to study the Kolmogorov PDE with an initial, rather
than terminal, condition. To this purpose it suffices to make the change of variable t → T − t
in (5.15). Letting ū(t, x) = u(T − t, x), we now see that ū satisfies the PDE
\[
-\partial_t \bar{u} + \alpha(T - t, x)\,\partial_x \bar{u} + \frac{1}{2}\beta(T - t, x)^2\,\partial_x^2 \bar{u} = 0, \qquad 0 \leq t \leq T, \tag{5.17}
\]
with initial condition ū(0, x) = g(x). Note that this is the equation considered in [17, Theorem 8.1.1].
Remark 5.5. Observe that in Theorem 5.4 we have a different solution for each fixed T . As
∂x uT is uniformly bounded, Theorem 5.3 gives that the stochastic process {uT (t, X(t))}t∈[0,T ]
is a martingale. Equation (5.14) is also called Dynkin’s formula (which is a special case
of the Feynman-Kac formula.)
Remark 5.6. It is possible to define other concepts of solution to the Kolmogorov PDE other
than the strong one, e.g., weak solution, entropy solution, etc. In general these solutions are
not uniquely characterized by their terminal value. In these notes we only consider strong
solutions, which, as proved in Theorem 5.4, are uniquely determined by (5.16).
Exercise 5.6. Consider the Kolmogorov PDE associated to the linear SDE (5.7) with con-
stant coefficients and σ = 0:
\[
\partial_t u + (a - bx)\,\partial_x u + \frac{\gamma^2}{2}\,\partial_x^2 u = 0, \qquad (t, x) \in D_T^+. \tag{5.18}
\]
Find the strong solution of (5.18) that satisfies u(T, x) = 1. HINT: Use the ansatz f(t, x) = e^{−xA(t,T)+B(t,T)}.
The study of the Kolmogorov equation is also important to establish whether the solution
of a SDE admits a transition density. In fact, it can be shown that when {X(t)}t≥s admits
a smooth transition density, then the latter coincides with the fundamental solution of
the Kolmogorov equation. To state the precise result, let us denote by δ(x − y) the δ-
distribution centered in y ∈ R, i.e., the distribution satisfying
\[
\int_{\mathbb{R}} \psi(x)\,\delta(x - y)\, dx = \psi(y), \qquad \text{for all } \psi \in C_c^\infty(\mathbb{R}).
\]
A sequence of measurable functions (gn )n∈N is said to converge to δ(x − y) in the sense of
distributions if
\[
\lim_{n\to\infty} \int_{\mathbb{R}} g_n(x)\,\psi(x)\, dx = \psi(y), \qquad \text{for all } \psi \in C_c^\infty(\mathbb{R}).
\]
Theorem 5.5. Assume the conditions in Theorem 5.1 are satisfied. Let {X(t; s, x)}t≥s be
the global solution of (5.1) with initial value X(s) = x; recall that this solution is a Markov
stochastic process.
(i) If {X(t; s, x)}t≥s admits a transition density p(t, s, x, y) which is C 1 in the variable s
and C 2 in the variable x, then p(t, s, x, y) solves the Kolmogorov PDE
\[
\partial_s p + \alpha(s, x)\,\partial_x p + \frac{1}{2}\beta(s, x)^2\,\partial_x^2 p = 0, \qquad 0 < s < t,\ x \in \mathbb{R}, \tag{5.19}
\]
with terminal value
\[
\lim_{s\to t} p(t, s, x, y) = \delta(x - y). \tag{5.20}
\]
(ii) If {X(t; s, x)}t≥0 admits a transition density p(t, s, x, y) which is C 1 in the variable t
and C 2 in the variable y, then p(t, s, x, y) solves the Fokker-Planck PDE1
\[
\partial_t p + \partial_y\big(\alpha(t, y)\, p\big) - \frac{1}{2}\partial_y^2\big(\beta(t, y)^2\, p\big) = 0, \qquad t > s,\ y \in \mathbb{R}, \tag{5.21}
\]
with initial value
\[
\lim_{t\to s} p(t, s, x, y) = \delta(x - y). \tag{5.22}
\]
Exercise 5.7. Prove Theorem 5.5. HINT: See Exercises 6.8 and 6.9 in [21].
Remark 5.7. The solution p of the problem (5.19)-(5.20) is called the fundamental so-
lution for the Kolmogorov PDE, as any other solution can be reconstructed from it. For
example for all functions g as in Theorem 5.4, the solution of (5.15) with the terminal
condition (5.16) is given by
\[
u_T(t, x) = \int_{\mathbb{R}} p(T, t, x, y)\, g(y)\, dy.
\]
This can be verified either by a direct calculation or by using the interpretation of the
fundamental solution as transition density. Similarly, p is the fundamental solution of the
Fokker-Planck equation.
Let us see an example of application of Theorem 5.5. First notice that when the functions
α, β in (5.1) are time-independent, then the Markovian stochastic process {X(t; s, x)}t≥s is
homogeneous and therefore the transition density, when it exists, has the form p(t, s, x, y) =
1 Also known as forward Kolmogorov PDE.
p∗ (t − s, x, y). By the change of variable s → t − s = τ in (5.19), we find that p∗ (τ, x, y)
satisfies
\[
-\partial_\tau p_* + \alpha(x)\,\partial_x p_* + \frac{1}{2}\beta(x)^2\,\partial_x^2 p_* = 0, \tag{5.23}
\]
as well as
\[
\partial_\tau p_* + \partial_y\big(\alpha(y)\, p_*\big) - \frac{1}{2}\partial_y^2\big(\beta(y)^2\, p_*\big) = 0, \tag{5.24}
\]
with the initial condition p∗ (0, x, y) = δ(x − y). For example the Brownian motion is a
Markov process with transition density (3.28). In this case, (5.23) and (5.24) both reduce to
the heat equation −∂τ p∗ + 12 ∂x2 p∗ = 0. It is straightforward to verify that (3.28) satisfies the
heat equation for (τ, x) ∈ (0, ∞) × R. Now we show that, as claimed in Theorem 5.5, the
initial condition p∗ (0, x, y) = δ(x − y) is also verified, that is
\[
\lim_{\tau\to 0} \int_{\mathbb{R}} p_*(\tau, x, y)\,\psi(y)\, dy = \psi(x), \qquad \text{for all } \psi \in C_c^\infty(\mathbb{R}) \text{ and } x \in \mathbb{R}.
\]
Indeed with the change of variable y = x + √τ z, we have
\[
\int_{\mathbb{R}} p_*(\tau, x, y)\,\psi(y)\, dy = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} e^{-\frac{z^2}{2}}\,\psi(x + \sqrt{\tau}\, z)\, dz \to \psi(x) \int_{\mathbb{R}} e^{-\frac{z^2}{2}}\,\frac{dz}{\sqrt{2\pi}} = \psi(x), \qquad \text{as } \tau \to 0,
\]
as claimed. Moreover, as W(0) = 0 a.s., Theorem 5.5 entails that the density of the Brownian motion is
\[
f_{W(t)}(y) = p_*(t, 0, y) = \frac{1}{\sqrt{2\pi t}}\, e^{-\frac{y^2}{2t}},
\]
which is of course correct.
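The claim that (3.28) satisfies the heat equation can also be checked by direct differentiation (a one-line computation included here for completeness): with p∗(τ, x, y) = (2πτ)^{−1/2} e^{−(y−x)²/(2τ)},
\[
\partial_\tau p_* = \Big(\frac{(y - x)^2}{2\tau^2} - \frac{1}{2\tau}\Big)\, p_*, \qquad \partial_x^2 p_* = \Big(\frac{(y - x)^2}{\tau^2} - \frac{1}{\tau}\Big)\, p_*,
\]
so that −∂τ p∗ + ½ ∂x² p∗ = 0 for all τ > 0.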
Exercise 5.8. Show that the transition density derived in the example at the end of Sec-
tion 5.1 is the fundamental solution of the Kolmogorov equation for the linear SDE (5.9).
A Cox-Ingersoll-Ross (CIR) process is the solution of the SDE
\[
dX(t) = a(b - X(t))\, dt + c\sqrt{X(t)}\, dW(t), \qquad X(0) = x, \tag{5.25}
\]
where a, b, c are constant (c ≠ 0). CIR processes are used in finance to model the stock volatility in the Heston model (see Section 6.5.2) and the spot interest rate of bonds in the CIR model (see Section 6.6). Note that the SDE (5.25) is not of the form considered so far, as the function β(t, x) = c√x is defined only for x ≥ 0 and, more importantly, it is not Lipschitz
mentioned in Remark 5.3, it can be shown that (5.25) admits a unique global solution for
all x > 0. Clearly the solution satisfies X(t) ≥ 0 a.s., for all t ≥ 0, otherwise the Itô integral
in the right hand side of (5.25) would not even be defined. For future applications, it is
important to know whether the solution can hit zero in finite time with positive probability.
This question is answered in the following theorem, whose proof can be found for instance
in [15, Exercise 37].
Theorem 5.6. Let {X(t)}t≥0 be the CIR process with initial value X(0) = x > 0 at time
t = 0. Define the (stopping²) time
\[
\tau_0^x = \inf\{t > 0 : X(t) = 0\}.
\]
Then P(τ0x < ∞) = 0 if and only if ab ≥ c²/2, which is called Feller's condition.
Exercise 5.9. Prove Theorem 5.6 following the hints in [15, Exercise 37].
The following theorem shows how to build a CIR process from a family of linear SDE’s.
Theorem 5.7. Let {W1 (t)}t≥0 , . . . {WN (t)}t≥0 be N ≥ 2 independent Brownian motions and
assume that {X1 (t)}t≥0 , . . . , {XN (t)}t≥0 solve
\[
dX_j(t) = -\frac{\theta}{2}\, X_j(t)\, dt + \frac{\sigma}{2}\, dW_j(t), \qquad j = 1, \ldots, N, \quad X_j(0) = x_j \in \mathbb{R}, \tag{5.26}
\]
where θ, σ are deterministic constants. There exists a Brownian motion {W(t)}t≥0 such that
the stochastic process {X(t)}t≥0 given by
\[
X(t) = \sum_{j=1}^N X_j(t)^2
\]
solves (5.25) with a = θ, c = σ and b = Nσ²/(4θ).
Proof. Letting a = θ, c = σ, b = Nσ²/(4θ)
and
\[
dW(t) = \sum_{j=1}^N \frac{X_j(t)}{\sqrt{X(t)}}\, dW_j(t),
\]
an application of Itô's formula to X(t) = Σj Xj(t)² gives dX(t) = a(b − X(t)) dt + c√X(t) dW(t).
Thus {X(t)}t≥0 is a CIR process, provided we prove that {W (t)}t≥0 is a Brownian mo-
tion. Clearly, W (0) = 0 a.s. and the paths t → W (t, ω) are a.s. continuous. Hence to
conclude that {W (t)}t≥0 is a Brownian motion we must show that dW (t)dW (t) = dt, see
2 See Definition 6.9 for the general definition of stopping time.
Theorem 3.18. We have
\[
dW(t)\,dW(t) = \frac{1}{X(t)}\sum_{i,j=1}^N X_i(t)X_j(t)\, dW_i(t)\, dW_j(t) = \frac{1}{X(t)}\sum_{i,j=1}^N X_i(t)X_j(t)\,\delta_{ij}\, dt = \frac{1}{X(t)}\sum_{j=1}^N X_j(t)^2\, dt = dt,
\]
where we used that dWi (t)dWj (t) = δij dt, since the Brownian motions are independent.
Note that N ≥ 2 implies the Feller condition ab ≥ c2 /2, hence the CIR process con-
structed in the previous theorem does not hit zero, see Theorem 5.6. Note also that the
solution of (5.26) is
\[
X_j(t) = e^{-\frac{1}{2}\theta t}\Big(x_j + \frac{\sigma}{2}\int_0^t e^{\frac{1}{2}\theta\tau}\, dW_j(\tau)\Big).
\]
It follows by Exercise 4.9 that the random variables X1 (t), . . . , XN (t) are normally distributed
with
\[
\mathbb{E}[X_j(t)] = e^{-\frac{1}{2}\theta t}\, x_j, \qquad \mathrm{Var}[X_j(t)] = \frac{\sigma^2}{4\theta}\big(1 - e^{-\theta t}\big).
\]
It follows by Exercise 3.17 that the CIR process constructed in Theorem 5.7 is non-central χ²
distributed. The following theorem shows that this is a general property of CIR processes.
Theorem 5.8. Assume ab > 0. The CIR process starting at x > 0 at time t = s satisfies
\[
X(t; s, x) = \frac{1}{2k}\, Y, \qquad Y \in \chi^2(\delta, \beta),
\]
where
\[
k = \frac{2a}{(1 - e^{-a(t-s)})c^2}, \qquad \delta = \frac{4ab}{c^2}, \qquad \beta = 2kxe^{-a(t-s)}.
\]
Sketch of the proof. As the CIR process is a homogeneous Markov process, it is enough to
prove the claim for s = 0. Let X(t) = X(t; 0, x) for short and denote p(t, 0, x, y) = p∗ (t, x, y)
the density of X(t). By Theorem 5.5, p∗ solves the Fokker-Planck equation
\[
\partial_t p_* + \partial_y\big(a(b - y)\, p_*\big) - \frac{1}{2}\partial_y^2\big(c^2 y\, p_*\big) = 0, \tag{5.27}
\]
with initial datum p∗ (0, x, y) = δ(x − y). Moreover, the characteristic function θX(t) (u) :=
h(t, u) of X(t) is given by
\[
h(t, u) = \mathbb{E}[e^{iuX(t)}] = \int_{\mathbb{R}} e^{iuy}\, p_*(t, x, y)\, dy,
\]
which is equivalent to p∗ (0, x, y) = δ(x − y). The initial value problem (5.28) can be solved
with the method of characteristics (see [8] for an illustration of this method) and one
finds that the solution is given by
\[
h(t, u) = \theta_{X(t)}(u) = \frac{\exp\big(-\frac{\beta u}{2(u + ik)}\big)}{(1 - iu/k)^{\delta/2}}. \tag{5.29}
\]
Hence θX(t)(u) = θY(u/(2k)), where θY(u) is the characteristic function of Y ∈ χ²(δ, β), see Table 3.1. This completes the proof.
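To make the last step explicit (a verification added here, assuming as in Table 3.1 that θY(u) = exp(iβu/(1 − 2iu))/(1 − 2iu)^{δ/2} for Y ∈ χ²(δ, β)): substituting u → u/(2k),
\[
\theta_Y\Big(\frac{u}{2k}\Big) = \frac{\exp\Big(\frac{i\beta u/(2k)}{1 - iu/k}\Big)}{(1 - iu/k)^{\delta/2}} = \frac{\exp\Big(\frac{i\beta u}{2(k - iu)}\Big)}{(1 - iu/k)^{\delta/2}} = \frac{\exp\Big(-\frac{\beta u}{2(u + ik)}\Big)}{(1 - iu/k)^{\delta/2}},
\]
which is (5.29); in the last equality we used i/(k − iu) = −1/(u + ik).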
Exercise 5.10. Derive (5.28a) and verify with Mathematica that (5.29) is the solution of
the initial value problem (5.28).
Finally we discuss briefly the question of existence of strong solutions to the Kolmogorov
equation for the CIR process, which is
\[
\partial_t u + a(b - x)\,\partial_x u + \frac{c^2}{2}\, x\,\partial_x^2 u = 0, \quad (t, x) \in D_T^+, \qquad u(T, x) = g(x). \tag{5.30}
\]
Note carefully that the Kolmogorov PDE is now defined only for x > 0, as the initial value
x in (5.25) must be positive. Now, if a strong solution of (5.30) exists, then it must be given
by u(t, x) = E[g(X(T; t, x))] (this claim is proved exactly as in Theorem 5.4(iii)). Supposing ab > 0, then
\[
u(t, x) = \mathbb{E}[g(X(T; t, x))] = \int_0^\infty p_*(T - t, x, y)\, g(y)\, dy,
\]
where the density of X(T; t, x) is given as in Theorem 5.8. Using the asymptotic behavior of p∗ as x → 0+, it can be shown that u(t, x) is bounded near the axis x = 0 only if the Feller condition ab ≥ c²/2 is satisfied, and in this case ∂t u, ∂x u, ∂x²u are also bounded. Hence u is the (unique) strong solution of (5.30) if and only if ab ≥ c²/2.
5.4.1 ODEs
Consider the first order ODE
\[
\frac{dy}{dt} = ay + bt, \qquad y(0) = y_0, \quad t \in [0, T], \tag{5.31}
\]
for some constants a, b ∈ R and T > 0. The solution is given by
\[
y(t) = y_0 e^{at} + \frac{b}{a^2}\big(e^{at} - at - 1\big). \tag{5.32}
\]
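As a quick check (a computation added for completeness), differentiating (5.32) confirms that it solves (5.31):
\[
\frac{dy}{dt} = a y_0 e^{at} + \frac{b}{a}\big(e^{at} - 1\big) = a\Big(y_0 e^{at} + \frac{b}{a^2}\big(e^{at} - at - 1\big)\Big) + bt = ay(t) + bt,
\]
and y(0) = y0, since the second term in (5.32) vanishes at t = 0.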
We shall apply three different finite difference methods to approximate the solution of (5.31).
In all cases we divide the time interval [0, T ] into a uniform partition,
\[
0 = t_0 < t_1 < \cdots < t_n = T, \qquad t_j = j\frac{T}{n}, \qquad \Delta t = t_{j+1} - t_j = \frac{T}{n},
\]
and define
y(tj ) = yj , j = 0, . . . , n.
function [time,sol]=eulerf(T,y0,n)
% forward Euler method for (5.31) with a=b=1
% (function header restored for completeness; the name eulerf is chosen here)
dt=T/n;
sol=zeros(1,n+1);
time=zeros(1,n+1);
a=1; b=1;
sol(1)=y0;
for j=2:n+1
sol(j)=sol(j-1)+(a*sol(j-1)+b*time(j-1))*dt;
time(j)=time(j-1)+dt;
end
Exercise 5.11. Compare the approximate solution with the exact solution for increasing values of n. Compile a table showing the difference between the approximate solution and the exact solution at time T for increasing values of n.
for j=2:n+1
time(j)=time(j-1)+dt;
sol(j)=1/(1-a*dt)*(sol(j-1)+b*time(j)*dt);
end
Exercise 5.12. Compare the approximate solution obtained with the backward Euler method
with the exact solution and the approximate one obtained via the forward Euler method.
Compile a table for increasing values of n as in Exercise 5.11.
\[
y(t + \Delta t) = y(t) + \frac{dy}{dt}(t)\,\Delta t + \frac{1}{2}\frac{d^2y}{dt^2}(t)\,\Delta t^2 + O(\Delta t^3), \tag{5.38}
\]
and replacing ∆t with −∆t,
\[
y(t - \Delta t) = y(t) - \frac{dy}{dt}(t)\,\Delta t + \frac{1}{2}\frac{d^2y}{dt^2}(t)\,\Delta t^2 + O(\Delta t^3). \tag{5.39}
\]
Subtracting the two equations we obtain the following approximation for dy/dt at time t:
\[
\frac{dy}{dt}(t) = \frac{y(t + \Delta t) - y(t - \Delta t)}{2\Delta t} + O(\Delta t^2),
\]
which, applied to (5.31), gives the central difference iteration
\[
y_{j+1} = y_{j-1} + 2(ay_j + bt_j)\frac{T}{n}, \qquad j = 0, \ldots, n - 1. \tag{5.41}
\]
Note that the first step j = 0 requires y−1 . This is fixed by the backward method
\[
y_{-1} = y_0 - ay_0\frac{T}{n}, \tag{5.42}
\]
which is (5.36) for j = −1.
Exercise 5.13. Write a Matlab function that implements the central difference method
for (5.31). Compile a table comparing the exact solution with the approximate solutions
at time T obtained by the three methods presented above for increasing values of n.
A second order ODE
Consider the second order ODE for the harmonic oscillator:
\[
\frac{d^2y}{dt^2} = -\omega^2 y, \qquad y(0) = y_0, \quad \dot{y}(0) = \tilde{y}_0. \tag{5.43}
\]
The solution to this problem is given by
\[
y(t) = y_0\cos(\omega t) + \frac{\tilde{y}_0}{\omega}\sin(\omega t). \tag{5.44}
\]
One can define forward/backward/central difference approximations for second derivatives
in a way similar as for first derivatives. For instance, adding (5.38) and (5.39) we obtain the
following central difference approximation for d2 y/dt2 at time t:
\[
\frac{d^2y}{dt^2}(t) = \frac{y(t + \Delta t) - 2y(t) + y(t - \Delta t)}{\Delta t^2} + O(\Delta t),
\]
which leads to the following iterative equation for (5.43):
\[
y_{j+1} = 2y_j - y_{j-1} - \omega^2 y_j\Big(\frac{T}{n}\Big)^2, \qquad j = 1, \ldots, n - 1, \tag{5.45}
\]
\[
y_1 = y_0 + \tilde{y}_0\frac{T}{n}. \tag{5.46}
\]
Note the approximate solution at the first node is computed using the forward method and
the initial datum ẏ(0) = ye0 . The Matlab function solving this iteration is the following.
function [time,sol]=harmonic(w,T,y0,N)
dt=T/N;
sol=zeros(1,N+1);
time=zeros(1,N+1);
sol(1)=y0(1);
sol(2)=sol(1)+y0(2)*dt;
time(2)=dt;
for j=3:N+1
sol(j)=2*sol(j-1)-sol(j-2)-dt^2*w^2*sol(j-1);
time(j)=time(j-1)+dt;
end
Exercise 5.14. Compare the exact and approximate solutions at time T for increasing values of N.
5.4.2 PDEs
In this section we present three finite difference methods to find approximate solutions to
the one-dimensional heat equation
\[
\partial_t u = \partial_x^2 u, \qquad u(0, x) = u_0(x), \tag{5.47}
\]
where u0 is continuous. We refer to t as the time variable and to x as the spatial variable,
since this is what they typically represent in the applications of the heat equation. As before,
we let t ∈ [0, T ]. As to the domain of the spatial variable x, we distinguish two cases
(i) x runs over the whole real line, i.e., x ∈ (−∞, ∞), and we are interested in finding an
approximation to the solution u ∈ Cb1,2 (DT ).
(ii) x runs over a finite interval, say x ∈ (xmin , xmax ), and we want to find an approximation
of the solution u ∈ Cb^{1,2}((0, T) × (xmin, xmax)) which satisfies the boundary conditions⁴
\[
u(t, x_{\min}) = u_L(t), \qquad u(t, x_{\max}) = u_R(t),
\]
for some given continuous functions uL, uR. We also require uL(0) = u0(xmin), uR(0) =
u0 (xmax ), so that the solution is continuous on the boundary.
In fact, for numerical purposes, problem (i) is a special case of problem (ii), for the
domain (−∞, ∞) must be approximated by (−A, A) for A ≫ 1 when we solve problem (i)
in a computer. Note however that in the finite domain approximation of problem (i), the
boundary conditions at x = ±A cannot be prescribed freely! Rather they have to be given by
suitable approximations of the limit values at x = ±∞ of the solution to the heat equation
on the real line.
By what we have just said we can focus on problem (ii). To simplify the discussion we
assume that the domain of the x variable is given by x ∈ (0, X) and we assign zero boundary
conditions, i.e., uL = uR = 0. Hence we want to study the problem
We consider the uniform partition of the interval [0, X] given by
\[
0 = x_0 < x_1 < \cdots < x_m = X, \qquad x_j = j\frac{X}{m}, \qquad \Delta x = x_{j+1} - x_j = \frac{X}{m},
\]
and the partition of the time interval [0, T ] given by
\[
0 = t_0 < t_1 < \cdots < t_n = T, \qquad t_i = i\frac{T}{n}, \qquad \Delta t = t_{i+1} - t_i = \frac{T}{n}.
\]
We let
ui,j = u(ti , xj ), i = 0, . . . , n, j = 0, . . . , m.
Hence ui,j is an (n + 1) × (m + 1) matrix. The ith row contains the value of the approximate
solution at each point of the spatial mesh at the fixed time ti . For instance, the zeroth row
4 These are called Dirichlet type boundary conditions. Other types of boundary conditions can be imposed, but the Dirichlet type is sufficient for our applications to the Black-Scholes PDE.
is the initial datum: u0,j = u0(xj), j = 0, . . . , m. The columns of the matrix ui,j contain the
values of the approximate solution at one spatial point for different times. For instance, the
column ui,0 are the values of the approximate solution at x0 = 0 for different times ti , while
ui,m contains the values at xm = X. By the given boundary conditions we then have
ui,0 = ui,m = 0, i = 0, . . . , n.
We define
\[
d = \frac{\Delta t}{\Delta x^2} = \frac{T}{X^2}\frac{m^2}{n}. \tag{5.49}
\]
end
sol(:,1)=0; sol(:,m+1)=0;
for i=2:n+1
for j=3:m+1
sol(i,j-1)=sol(i-1,j-1)+d*(sol(i-1,j)-2*sol(i-1,j-1)+sol(i-1,j-2));
end
end
To visualize the result it is convenient to employ an animation which plots the approx-
imate solution at each point on the spatial mesh for some increasing sequence of times in
the partition {t0 , t1 , . . . , tn }. This visualization can be achieved with the following simple
Matlab function:
function anim(r,F,v)
N=length(F(:,1));
step=round(1+N*v/10);
figure
for i=1:step:N
plot(r,F(i,:));
axis([0 1 0 1/2]);
drawnow;
pause(0.3);
end
Upon running the command anim(space,sol,v), the previous function will plot the
approximate solutions at different increasing times with speed v (the speed v must be between
0 and 1).
Let us try the following: [time,space,sol]=heatexp(1,1,2500,50). Hence we solve
the problem on the unit square (t, x) ∈ (0, 1)2 on a mesh of (n, m) = 2500 × 50 points. The
value of the parameter (5.49) is
d = 1.
If we now try to visualize the solution by running anim(space,sol,0.1), we find that the
approximate solution behaves very strangely (it produces just random oscillations). However
by increasing the number of time steps with [time,space,sol]=heatexp(1,1,5000,50),
so that
d = 0.5,
and visualize the solution, we shall find that the approximate solution converges quickly
and smoothly to u ≡ 0, which is the equilibrium of our problem (i.e., the time independent
solution of (5.48)). In fact, this is not a coincidence, for we have the following
Theorem 5.9. The forward-centered method for the heat equation is unstable if d > 1/2 and stable for d ≤ 1/2.
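The stability of the forward-centered method can be examined with a standard von Neumann analysis (a sketch added here): substituting the Fourier mode u_{i,j} = ξ^i e^{\mathrm{i}kx_j} into the explicit update gives the amplification factor
\[
\xi = 1 + d\big(e^{\mathrm{i}k\Delta x} - 2 + e^{-\mathrm{i}k\Delta x}\big) = 1 - 4d\sin^2\Big(\frac{k\Delta x}{2}\Big),
\]
and |ξ| ≤ 1 for every wave number k precisely when d ≤ 1/2.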
The term unstable here refers to the fact that numerical errors, due for instance to the
truncation and round-off of the initial datum on the spatial grid, will increase in time. On the
other hand, stability of a finite difference method means that the error will remain small at
all times. The stability condition d < 1 for the forward-centered method applied to the heat
equation is very restrictive: it forces us to choose a very high number of points on the time
partition. To avoid such a restriction, which could be very costly in terms of computation
time, implicit methods are preferred, such as the one we present next.
be the column vector containing the approximate solution at time ti and rewrite (5.51) in
matrix form as follows:
Aui+1 = ui , (5.52)
where A is the (m + 1) × (m + 1) matrix with non-zero entries given by
\[
A_{1,1} = A_{m+1,m+1} = 1, \qquad A_{k,k} = 1 + 2d, \quad A_{k,k-1} = A_{k,k+1} = -d, \quad k = 2, \ldots, m.
\]
The matrix A is invertible, hence we can invert (5.52) to express u_{i+1} in terms of u_i as
\[
u_{i+1} = A^{-1} u_i. \tag{5.53}
\]
This method is unconditionally stable, i.e., it is stable for all values of the parameter d.
We can test this property by using the following Matlab function, which solves the iterative
equation (5.53):
function [time,space,sol]=heatimp(T,X,n,m)
dt=T/n; dx=X/m;
d=dt/dx^2
sol=zeros(n+1,m+1);
time=zeros(1,n+1);
space=zeros(1,m+1);
A=zeros(m+1,m+1);
A(1,1)=1; A(m+1,m+1)=1;
for i=2:n+1
time(i)=time(i-1)+dt;
end
for j=2:m+1
space(j)=space(j-1)+dx;
end
for j=1:m+1
sol(1,j)=exp(X^2/4)-exp((space(j)-X/2)^2);
end
sol(:,1)=0; sol(:,m+1)=0;
for k=2:m
A(k,k-1)=-d;
A(k,k)=1+2*d;
A(k,k+1)=-d;
end
for i=2:n+1
sol(i,:)=sol(i-1,:)*transpose(inv(A));
end
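A small implementation note (an alternative added here, not from the original text): since A does not change in time, it is cheaper and numerically safer to factorize it once with lu and use backslash, instead of calling inv(A) at every step. With the same A and sol as in heatimp, the final loop could be replaced by:

```
% factorize the tridiagonal matrix A once, outside the time loop
[L,U,P]=lu(A);
for i=2:n+1
    % solve A*u_i = u_{i-1} by forward/back substitution
    sol(i,:)=(U\(L\(P*sol(i-1,:)')))';
end
```

This computes the same iterates as the backward-centered method without forming the inverse of A explicitly.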
Method 3: Crank-Nicholson
This is an implicit method with higher order of accuracy than the backward-centered method.
It is obtained by simply averaging between methods 1 and 2 above, i.e.,
\[
u_{i+1,j} = \frac{1}{2}\, u_{i+1,j}^{(1)} + \frac{1}{2}\, u_{i+1,j}^{(2)},
\]
where the first term in the right hand side is computed with method 1 and the second term
with method 2. Thus we obtain the following iterative equation
\[
u_{i+1,j} = u_{i,j} + \frac{d}{2}\Big[\big(u_{i,j+1} - 2u_{i,j} + u_{i,j-1}\big) + \big(u_{i+1,j+1} - 2u_{i+1,j} + u_{i+1,j-1}\big)\Big]. \tag{5.54}
\]
Like the backward-centered method, the Crank-Nicholson method is also unconditionally stable.
Exercise 5.15. Write (5.54) in matrix form and solve for the solution at the time step i + 1
in terms of the solution at the time step i.
Exercise 5.16. Write a Matlab function that implements the Crank-Nicholson method.
Exercise 5.17. Compare methods 2 and 3.
5.A Appendix: Solutions to selected problems
Exercise 5.3. The SDE in question is
dX(t) = (a − bX(t)) dt + γdW (t), t ≥ s, X(s) = x.
Letting Y(t) = e^{bt}X(t) and applying Itô's formula we find that Y(t) satisfies
\[
dY(t) = ae^{bt}\, dt + \gamma e^{bt}\, dW(t), \qquad Y(s) = xe^{bs}.
\]
Hence
\[
Y(t) = xe^{bs} + a\int_s^t e^{bu}\, du + \gamma\int_s^t e^{bu}\, dW(u)
\]
and so
\[
X(t; s, x) = xe^{b(s-t)} + \frac{a}{b}\big(1 - e^{b(s-t)}\big) + \gamma\int_s^t e^{b(u-t)}\, dW(u).
\]
Taking the expectation we obtain immediately that E[X(t; s, x)] = m(t − s, x). Moreover by
Exercise 4.9, the Itô integral in X(t; s, x) is a normal random variable with zero mean and
variance ∆(t − s)2 , hence the claim follows.
Exercise 5.4. Letting Y(t) = e^{−t²/2}X(t), we find that dY(t) = e^{−t²/2}dW(t) and Y(0) = 1. Thus
\[
X(t) = e^{\frac{t^2}{2}} + e^{\frac{t^2}{2}}\int_0^t e^{-\frac{u^2}{2}}\, dW(u).
\]
Note that X(t) is normally distributed with mean
\[
\mathbb{E}[X(t)] = e^{\frac{t^2}{2}}.
\]
It follows that Cov(X(s), X(t)) = E[X(s)X(t)] − E[X(s)]E[X(t)] is
\[
\mathrm{Cov}(X(s), X(t)) = e^{\frac{s^2+t^2}{2}}\,\mathbb{E}\Big[\int_0^s e^{-\frac{u^2}{2}}\, dW(u)\int_0^t e^{-\frac{u^2}{2}}\, dW(u)\Big].
\]
Assume for example that s ≤ t. Hence
\[
\mathrm{Cov}(X(s), X(t)) = e^{\frac{s^2+t^2}{2}}\,\mathbb{E}\Big[\int_0^t \mathbb{I}_{[0,s]}(u)\, e^{-\frac{u^2}{2}}\, dW(u)\int_0^t e^{-\frac{u^2}{2}}\, dW(u)\Big].
\]
Using the result of Exercise 4.3 we have
\[
\mathrm{Cov}(X(s), X(t)) = e^{\frac{s^2+t^2}{2}}\int_0^t \mathbb{I}_{[0,s]}(u)\, e^{-\frac{u^2}{2}}\, e^{-\frac{u^2}{2}}\, du = e^{\frac{s^2+t^2}{2}}\int_0^s e^{-u^2}\, du
\]
\[
= \sqrt{\pi}\, e^{\frac{s^2+t^2}{2}}\int_0^{\sqrt{2}s} e^{-\frac{u^2}{2}}\,\frac{du}{\sqrt{2\pi}} = \sqrt{\pi}\, e^{\frac{s^2+t^2}{2}}\Big(\Phi(\sqrt{2}s) - \frac{1}{2}\Big).
\]
For general s, t ≥ 0 we find
\[
\mathrm{Cov}(X(s), X(t)) = \sqrt{\pi}\, e^{\frac{s^2+t^2}{2}}\Big(\Phi(\sqrt{2}\min(s,t)) - \frac{1}{2}\Big).
\]
Chapter 6
Throughout this chapter we assume that the probability space (Ω, F, P) and the Brownian
motion {W (t)}t≥0 are given. Furthermore, in order to avoid the need to repeatedly specify
technical assumptions, we make the following conventions:
• All stochastic processes in this chapter are assumed to have a.s. continuous paths and
so in particular they are integrable, both path by path and in the Itô sense. Of course
one may relax this assumption, but for our applications it is general enough.
• All Itô integrals in this chapter are assumed to be martingales, which holds for instance
when the integrand stochastic process is in the space L2 .
is a P-martingale relative to the filtration {FW(t)}t≥0 and that the map P̃ : F → [0, 1] given by
\[
\tilde{\mathbb{P}}(A) = \mathbb{E}[Z(T)\mathbb{I}_A] \tag{6.2}
\]
is a probability measure equivalent to P, for all T > 0.
Definition 6.1. Consider the 1+1 dimensional market
where the market parameters {µ(t)}t≥0 , {σ(t)}t≥0 , {R(t)}t≥0 are all adapted to {FW (t)}t≥0 .
Assume that σ(t) > 0 almost surely for all times. Let {θ(t)}t≥0 be the stochastic process
given by
\[
\theta(t) = \frac{\mu(t) - R(t)}{\sigma(t)}, \tag{6.3}
\]
and define {Z(t)}t≥0 by (6.1). Assume that {Z(t)}t≥0 is a martingale (e.g., {θ(t)}t≥0 satisfies the Novikov condition (4.20)). The probability measure P̃ equivalent to P given by (6.2) is called the risk-neutral probability measure of the market at time T.
Note that, by the definition (6.3) of the stochastic process {θ(t)}t≥0 , we can rewrite dS(t)
as
\[
dS(t) = R(t)S(t)\, dt + \sigma(t)S(t)\, d\widetilde{W}(t), \tag{6.4}
\]
where
\[
d\widetilde{W}(t) = dW(t) + \theta(t)\, dt. \tag{6.5}
\]
By Girsanov's theorem, Theorem 4.11, the stochastic process {W̃(t)}t≥0 is a P̃-Brownian motion. Moreover, {FW(t)}t≥0 is a non-anticipating filtration for {W̃(t)}t≥0. We also recall
that a portfolio {hS (t), hB (t)}t≥0 is self-financing if it is adapted to {FW (t)}t≥0 and if its
value {V (t)}t≥0 satisfies
dV (t) = hS (t)dS(t) + hB (t)dB(t), (6.6)
see Definition 4.4.
Theorem 6.1. Consider the 1+1 dimensional market
dS(t) = µ(t)S(t)dt + σ(t)S(t)dW (t), dB(t) = B(t)R(t)dt, (6.7)
where the market parameters {µ(t)}t≥0 , {σ(t)}t≥0 , {R(t)}t≥0 are adapted to {FW (t)}t≥0 .
Assume that σ(t) > 0 almost surely for all times. Then the following holds.
(i) The discounted stock price {S∗(t)}t≥0 is a P̃-martingale in the filtration {FW(t)}t≥0.
(ii) A portfolio process {hS (t), hB (t)}t≥0 adapted to {FW (t)}t≥0 is self-financing if and only
if its discounted value satisfies
\[
V^*(t) = V(0) + \int_0^t D(s)\, h_S(s)\,\sigma(s)\, S(s)\, d\widetilde{W}(s). \tag{6.8}
\]
(iii) If {hS (t), hB (t)}t≥0 is a self-financing portfolio, then {hS (t), hB (t)}t≥0 is not an arbi-
trage.
Proof. (i) By (6.4) and dD(t) = −D(t)R(t)dt we have
\[
\begin{aligned}
dS^*(t) &= S(t)\, dD(t) + D(t)\, dS(t) + dD(t)\, dS(t) \\
&= -S(t)R(t)D(t)\, dt + D(t)\big(R(t)S(t)\, dt + \sigma(t)S(t)\, d\widetilde{W}(t)\big) \\
&= D(t)\sigma(t)S(t)\, d\widetilde{W}(t),
\end{aligned}
\]
and so the discounted price {S∗(t)}t≥0 of the stock is a P̃-martingale relative to {FW(t)}t≥0.
(ii) By (6.7) and hB (t)B(t) = V (t) − hS (t)S(t), the value (4.32) of self-financing portfolios
can be written as
Hence
1. the seller is only allowed to invest the amount ΠY (t) in the 1+1 dimensional market
consisting of the underlying stock and the risk-free asset;
It follows by Theorem 6.1 that the sought hedging portfolio is not an arbitrage. We may
interpret this fact as a “fairness” condition on the price of the derivative ΠY (t). In fact, if the
seller can hedge the derivative and still be able to make a risk-less profit on the underlying
stock, this may be considered unfair for the buyer.
We thus consider the 1+1 dimensional market
where we assume that the market parameters {µ(t)}t≥0 , {σ(t)}t≥0 , {R(t)}t≥0 are adapted to
{FW (t)}t≥0 and that σ(t) > 0 almost surely for all times. Let {hS (t), hB (t)}t≥0 be a self-
financing portfolio invested in this market and let {V (t)}t≥0 be its value. By Theorem 6.1,
the discounted value {V∗(t)}t≥0 of the portfolio is a P̃-martingale relative to the filtration {FW(t)}t≥0, hence
\[
D(t)V(t) = \widetilde{\mathbb{E}}[D(T)V(T)\,|\,\mathcal{F}_W(t)].
\]
Requiring the hedging condition V(T) = Y gives
\[
V(t) = \frac{1}{D(t)}\,\widetilde{\mathbb{E}}[D(T)Y\,|\,\mathcal{F}_W(t)].
\]
Since D(t) is FW(t)-measurable, we can move it inside the expectation and write the latter equation as
\[
V(t) = \widetilde{\mathbb{E}}\Big[Y\,\frac{D(T)}{D(t)}\,\Big|\,\mathcal{F}_W(t)\Big] = \widetilde{\mathbb{E}}\Big[Y\exp\Big(-\int_t^T R(s)\, ds\Big)\,\Big|\,\mathcal{F}_W(t)\Big],
\]
where we used the definition $D(t) = \exp(-\int_0^t R(s)\, ds)$ of the discounting process. Assuming
that the derivative is sold at time t for the price ΠY (t), then the value of the seller portfolio
at this time is precisely equal to the premium ΠY (t), which leads to the following definition.
Definition 6.2. Let Y be a FW (T )-measurable random variable with finite expectation. The
risk-neutral price (or fair price) at time t ∈ [0, T ] of the European derivative with pay-off
Y and time of maturity T > 0 is given by
\[
\Pi_Y(t) = \widetilde{\mathbb{E}}\Big[Y\exp\Big(-\int_t^T R(s)\, ds\Big)\,\Big|\,\mathcal{F}_W(t)\Big], \tag{6.10}
\]
i.e., it is equal to the value at time t of any self-financing hedging portfolio invested in the
underlying stock and the bond.
Remark 6.2. Being defined as a conditional expectation, the risk-neutral price can rarely be
computed explicitly. An exception to this is when the market parameters are deterministic,
see Section 6.3, and for some simple stochastic models, see Section 6.5.
In the particular case of a standard European derivative, i.e., when Y = g(S(T )), for
some measurable function g, the risk-neutral price becomes
\[
\Pi_Y(t) = \widetilde{\mathbb{E}}\Big[g(S(T))\exp\Big(-\int_t^T R(s)\, ds\Big)\,\Big|\,\mathcal{F}_W(t)\Big].
\]
By (6.4) we have
\[
S(T) = S(t)\exp\Big(\int_t^T \big(R(s) - \frac{1}{2}\sigma^2(s)\big)\, ds + \int_t^T \sigma(s)\, d\widetilde{W}(s)\Big),
\]
hence the risk-neutral price of a standard European derivative takes the form
\[
\Pi_Y(t) = \widetilde{\mathbb{E}}\Big[g\Big(S(t)\, e^{\int_t^T (R(s)-\frac{1}{2}\sigma^2(s))\, ds + \int_t^T \sigma(s)\, d\widetilde{W}(s)}\Big)\exp\Big(-\int_t^T R(s)\, ds\Big)\,\Big|\,\mathcal{F}_W(t)\Big]. \tag{6.11}
\]
Since the risk-neutral price of the European derivative equals the value of self-financing hedging portfolios invested in a 1+1 dimensional market, then, by Theorem 6.1, the discounted risk-neutral price {Π∗Y(t)}t∈[0,T] is a P̃-martingale relative to the filtration {FW(t)}t≥0. In fact, this property also follows directly from (6.10), as shown in the first part of the following theorem.
Theorem 6.2. Consider the 1+1 dimensional market
where we assume that {µ(t)}t≥0 , {σ(t)}t≥0 , {R(t)}t≥0 are adapted to {FW (t)}t≥0 and that
σ(t) > 0 almost surely for all times. Assume that the European derivative on the stock with
pay-off Y and time of maturity T > 0 is priced by (6.10) and let Π∗Y (t) = D(t)ΠY (t) be the
discounted price of the derivative. Then the following holds.
(i) The process {Π∗Y(t)}t∈[0,T] is a P̃-martingale relative to {FW(t)}t≥0.
(ii) There exists a stochastic process {∆(t)}t∈[0,T ] , adapted to {FW (t)}t≥0 , such that
\[
\Pi_Y^*(t) = \Pi_Y(0) + \int_0^t \Delta(s)\, d\widetilde{W}(s), \qquad t \in [0, T]. \tag{6.12}
\]
(iii) The portfolio {hS(t), hB(t)}t∈[0,T] given by
\[
h_S(t) = \frac{\Delta(t)}{D(t)\sigma(t)S(t)}, \qquad h_B(t) = \big(\Pi_Y(t) - h_S(t)S(t)\big)/B(t) \tag{6.13}
\]
is self-financing and replicates the derivative at any time, i.e., its value V (t) is equal
to ΠY (t) for all t ∈ [0, T ]. In particular, V (T ) = ΠY (T ) = Y , i.e., the portfolio is
hedging the derivative.
Proof. (i) We have
This shows that the discounted price of the derivative is a P̃-martingale relative to the filtration {FW(t)}t∈[0,T].
(ii) By (i) and (3.24) we have that the stochastic process {Z(t)Π*_Y(t)}_{t∈[0,T]} is a P-martingale relative to the filtration {F_W(t)}_{t∈[0,T]}. Hence, by the martingale representation theorem, Theorem 4.6, there exists a stochastic process {Γ(t)}_{t∈[0,T]} adapted to {F_W(t)}_{t∈[0,T]} such that

Z(t)Π*_Y(t) = Π_Y(0) + ∫_0^t Γ(s) dW(s),  t ∈ [0, T],

i.e.,

d(Z(t)Π*_Y(t)) = Γ(t) dW(t).   (6.15a)

On the other hand, by Itô's product rule,

d(1/Z(t)) = −(1/Z(t)²) dZ(t) + (1/Z(t)³) dZ(t)dZ(t) = (θ(t)/Z(t)) dW̃(t).   (6.15c)

Hence

d(1/Z(t)) d(Z(t)Π*_Y(t)) = (θ(t)Γ(t)/Z(t)) dt.   (6.15d)

Combining Equations (6.15) we have

dΠ*_Y(t) = ∆(t) dW̃(t),  where ∆(t) = θ(t)Π*_Y(t) + Γ(t)/Z(t),
where hY (t) is the number of shares of the derivative in the portfolio. It follows by (6.12)
that the discounted value of this portfolio satisfies
where

τ = T − t

is the time left to the expiration of the derivative. As F_W(t) = F_W̃(t), we obtain

Π_Y(t) = e^{−rτ} Ẽ[ g( S(t) e^{(r − ½σ²)τ} e^{σ(W̃(T) − W̃(t))} ) | F_W̃(t) ].
As the increment W̃(T) − W̃(t) is independent of F_W̃(t), the conditional expectation above is a pure expectation, see Theorem 3.14(i), and so

Π_Y(t) = e^{−rτ} Ẽ[ g( S(t) e^{(r − ½σ²)τ} e^{σ(W̃(T) − W̃(t))} ) ].   (6.17)

(Note that we no longer need to be in the risk-neutral world.) Finally, since W̃(T) − W̃(t) ∈ N(0, τ), we obtain

Π_Y(t) = (e^{−rτ}/√(2πτ)) ∫_R g( S(t) e^{(r − ½σ²)τ} e^{σy} ) e^{−y²/(2τ)} dy,   (6.18)
that is,
ΠY (t) = vg (t, S(t)), (6.19a)
where the Black-Scholes price function v_g : D_T^+ → R is given by

v_g(t, x) = (e^{−rτ}/√(2π)) ∫_R g( x e^{(r − ½σ²)τ} e^{σ√τ y} ) e^{−y²/2} dy.   (6.19b)
Definition 6.3. Let g : (0, ∞) → R be a C² function and assume that g′, g″ are uniformly bounded. The stochastic process {Π_Y(t)}_{t∈[0,T]} given by (6.19) is called the Black-Scholes price of the standard European derivative with pay-off Y = g(S(T)) and time of maturity T > 0.
Remark 6.3. Our assumptions on the pay-off function g can be considerably weakened, but since they cover all real-world applications, we shall not pursue this generalization. Note that, under our assumptions, v_g ∈ C^{1,2}(D_T^+) and ∂_x v_g is uniformly bounded.
Remark 6.4. The fact that the Black-Scholes price of the derivative at time t is a deterministic function of S(t), that is, Π_Y(t) = v_g(t, S(t)), is an important property for the applications. In fact, thanks to this property, at time t we may look at the price S(t) of the stock in the market and compute explicitly the theoretical price Π_Y(t) of the derivative. This theoretical value is, in general, different from the real market price. We shall discuss how to interpret this difference in Section 6.3.2. Moreover, as shown below, the formula (6.19) is equivalent to the Markov property of the geometric Brownian motion (6.16) in the risk-neutral probability measure P̃.
We can rewrite the Black-Scholes price function as v_g(t, x) = h(T − t, x), where, by a change of variable in the integral on the right hand side of (6.19b),

h(τ, x) = ∫_R g(y) q(τ, x, y) dy,

where

q(τ, x, y) = (e^{−rτ} I_{y>0}) / ( y √(2πσ²τ) ) · exp[ −(1/(2σ²τ)) ( log(y/x) − (r − ½σ²)τ )² ].
Comparing this expression with (3.31), we see that we can write the function q as

q(τ, x, y) = e^{−rτ} p*(τ, x, y),

where p* is the transition density of the geometric Brownian motion (6.16). In particular, the risk-neutral pricing formula of a standard European derivative when the market parameters are constant is equivalent to the identity

Ẽ[ g(S(T)) | F_W̃(t) ] = ∫_R p*(T − t, S(t), y) g(y) dy,

and thus, since 0 ≤ t ≤ T are arbitrary, it is equivalent to the Markov property of the geometric Brownian motion (6.16) in the risk-neutral probability measure P̃, see again Exercise 3.34. We shall generalize this discussion to markets with non-constant parameters in
Section 6.5. Note also that replacing s = 0, t = τ, α = r − σ²/2 into (3.33), and letting u(τ, x) = e^{rτ} h(τ, x), we obtain that u satisfies

−∂_τ u + rx ∂_x u + ½σ²x² ∂_x² u = 0,  u(0, x) = h(0, x) = v_g(T, x) = g(x).

Hence the function h(τ, x) satisfies

−∂_τ h + rx ∂_x h + ½σ²x² ∂_x² h = rh,  h(0, x) = g(x).
As vg (t, x) = h(T − t, x), we obtain the following result.
Theorem 6.3. The Black-Scholes price function v_g is the unique strong solution of the Black-Scholes PDE

∂_t v_g + rx ∂_x v_g + ½σ²x² ∂_x² v_g = rv_g,  (t, x) ∈ D_T^+   (6.20a)

with the terminal condition

v_g(T, x) = g(x).   (6.20b)
Exercise 6.1. Write a Matlab code that computes the finite difference solution of the problem (6.20). Use the Crank-Nicolson method presented in Section 5.4.2.
Remark 6.5. For the previous exercise one needs to fix the boundary condition at x = 0 for (6.20a). It is easy to show that the boundary value at x = 0 of the Black-Scholes price function is given by

v_g(t, 0) = g(0) e^{r(t−T)},  for all t ∈ [0, T].   (6.21)

In fact, on the one hand, letting x = 0 in (6.20a) we obtain that v(t) = v_g(t, 0) satisfies dv/dt = rv, hence v(t) = v(T) e^{r(t−T)}. On the other hand, v(T) = v_g(T, 0) = g(0). Thus (6.21) follows. For instance, in the case of a call, i.e., when g(z) = (z − K)_+, we obtain v_g(t, 0) = 0 for all t ∈ [0, T], hence the risk-neutral price of a call option is zero when the price of the underlying stock tends to zero. That this should be the case is clear, for the call will never expire in the money if the price of the stock is arbitrarily small. For a put option, i.e., when g(z) = (K − z)_+, we have v_g(t, 0) = Ke^{−rτ}, hence the risk-neutral price of a put option is given by the discounted value of the strike price when the price of the underlying stock tends to zero. This is also clear, since in this case the put option will certainly expire in the money, i.e., its value at maturity is K with probability one, and so the value at any earlier time is given by discounting its terminal value.
Next we compute the hedging portfolio of the derivative.
Theorem 6.4. Consider a standard European derivative priced according to Definition 6.3. The portfolio {h_S(t), h_B(t)} given by

h_S(t) = ∂_x v_g(t, S(t)),  h_B(t) = ( v_g(t, S(t)) − S(t) ∂_x v_g(t, S(t)) ) / B(t)

is self-financing and hedges the derivative.
Proof. According to Theorem 6.2, we have to show that the discounted value of the Black-Scholes price satisfies

dΠ*_Y(t) = D(t)S(t)σ ∂_x v_g(t, S(t)) dW̃(t).

A straightforward calculation, using Π_Y(t) = v_g(t, S(t)), Itô's formula and Itô's product rule, gives

d(D(t)Π_Y(t)) = D(t)[ ∂_t v_g(t, x) + rx ∂_x v_g(t, x) + ½σ²x² ∂_x² v_g(t, x) − rv_g(t, x) ]_{x=S(t)} dt
              + D(t)σS(t) ∂_x v_g(t, S(t)) dW̃(t).   (6.22)

Since v_g solves the Black-Scholes PDE (6.20a), the result follows.
Exercise 6.2. Work out the details of the computation leading to (6.22).
Exercise 6.3. Find the risk-neutral price at time t = 0 of standard European derivatives
assuming that the market parameters are deterministic functions of time.
Theorem 6.5. The Black-Scholes price of the call option with strike K and maturity T is c(t, S(t)), where

c(t, x) = x Φ(d1) − Ke^{−rτ} Φ(d2),   (6.23)

while the Black-Scholes price of the put option with the same strike and maturity is p(t, S(t)), where

p(t, x) = Ke^{−rτ} Φ(−d2) − x Φ(−d1),   (6.24)

Φ being the standard normal distribution function and

d2 = ( log(x/K) + (r − ½σ²)τ ) / (σ√τ),  d1 = d2 + σ√τ.

Proof. We derive the Black-Scholes price of call options only, the argument for put options being similar (see Exercise 6.24). We substitute g(z) = (z − K)_+ into the right hand side of (6.19b) and obtain

c(t, x) = (e^{−rτ}/√(2π)) ∫_R ( x e^{(r − ½σ²)τ} e^{σ√τ y} − K )_+ e^{−y²/2} dy.
Now we use that x e^{(r − ½σ²)τ} e^{σ√τ y} > K if and only if y > −d2. Hence

c(t, x) = (e^{−rτ}/√(2π)) [ x e^{(r − ½σ²)τ} ∫_{−d2}^{∞} e^{σ√τ y} e^{−y²/2} dy − K ∫_{−d2}^{∞} e^{−y²/2} dy ].

Using −½y² + σ√τ y = −½(y − σ√τ)² + ½σ²τ and changing variable in the integrals we obtain

c(t, x) = (e^{−rτ}/√(2π)) [ x e^{rτ} ∫_{−d2}^{∞} e^{−½(y − σ√τ)²} dy − K ∫_{−d2}^{∞} e^{−y²/2} dy ]

        = (e^{−rτ}/√(2π)) [ x e^{rτ} ∫_{−∞}^{d2+σ√τ} e^{−½y²} dy − K ∫_{−∞}^{d2} e^{−y²/2} dy ]

        = x Φ(d1) − Ke^{−rτ} Φ(d2).
Exercise 6.4. Derive the Black-Scholes price p(t, S(t)) of European put options claimed in
Theorem 6.5.
Remark 6.7. The formulas (6.23)-(6.24) appeared for the first time in the seminal paper [2], where they were derived by a completely different argument than the one presented here.
As to the self-financing hedging portfolio for the call/put option, we have h_S(t) = ∂_x c(t, S(t)) for call options and h_S(t) = ∂_x p(t, S(t)) for put options, see Theorem 6.4, while the number of shares of the bond in the hedging portfolio is given by

h_B(t) = ( c(t, S(t)) − S(t) ∂_x c(t, S(t)) ) / B(t),  for call options,

and

h_B(t) = ( p(t, S(t)) − S(t) ∂_x p(t, S(t)) ) / B(t),  for put options.

Let us compute ∂_x c:
As ∂_x d1 = ∂_x d2 = 1/(σ√τ x) and Φ′(x) = e^{−x²/2}/√(2π), we obtain

∂_x c = Φ(d1) + (1/(σ√(2πτ))) ( e^{−½d1²} − (K/x) e^{−rτ} e^{−½d2²} ).
Replacing d1 = d2 + σ√τ we obtain

∂_x c = Φ(d1) + ( e^{−½d2²} / (σ√(2πτ)) ) ( e^{−½σ²τ − d2σ√τ} − (K/x) e^{−rτ} ).

Using the definition of d2, the term within round brackets in the previous expression is easily found to be zero, hence

∂_x c = Φ(d1).
By the put-call parity we find also

∂_x p = Φ(d1) − 1 = −Φ(−d1).
Note that ∂x c > 0, while ∂x p < 0. This agrees with the fact that call options are bought to
protect a short position on the underlying stock, while put options are bought to protect a
long position on the underlying stock.
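The closed formulas just derived can be checked numerically against the integral representation (6.19b). The sketch below (Python, a verification aid rather than part of the notes; parameter values are hypothetical) evaluates (6.19b) for g(z) = (z − K)_+ by the midpoint rule and compares it with xΦ(d1) − Ke^{−rτ}Φ(d2); a finite difference of the price also reproduces ∂_x c = Φ(d1).

```python
import math

def Phi(d):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))

def call_price(x, K, r, sigma, tau):
    """Black-Scholes call price c(t, x) = x Phi(d1) - K e^{-r tau} Phi(d2)."""
    d2 = (math.log(x / K) + (r - 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d1 = d2 + sigma * math.sqrt(tau)
    return x * Phi(d1) - K * math.exp(-r * tau) * Phi(d2)

def call_price_integral(x, K, r, sigma, tau, n=20000, y_max=10.0):
    """Midpoint-rule evaluation of (6.19b) with g(z) = (z - K)_+."""
    h = 2.0 * y_max / n
    total = 0.0
    for k in range(n):
        y = -y_max + (k + 0.5) * h
        z = x * math.exp((r - 0.5 * sigma**2) * tau + sigma * math.sqrt(tau) * y)
        total += max(z - K, 0.0) * math.exp(-0.5 * y * y)
    return math.exp(-r * tau) / math.sqrt(2.0 * math.pi) * total * h
```

The truncation of the integral at ±y_max is harmless here because the Gaussian factor dominates the growth of the pay-off.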
Exercise 6.5 (•). Consider a European derivative with maturity T and pay-off Y given by
Y = k + S(T ) log S(T ),
where k > 0 is a constant. Find the Black-Scholes price of the derivative at time t < T and
the hedging self-financing portfolio. Find the probability that the derivative expires in the
money.
In particular:

• ∆ > 0, i.e., the price of a call is increasing in the price of the underlying stock;

• Γ > 0, i.e., the price of a call is convex in the price of the underlying stock;

• ρ > 0, i.e., the price of the call is increasing in the interest rate of the bond;

• Θ < 0, i.e., the price of the call is decreasing in time;

• ν > 0, i.e., the price of the call is increasing in the volatility of the stock.
Exercise 6.6. Use the put-call parity to derive the greeks of put options.
The greeks measure the sensitivity of option prices with respect to the market conditions. This information can be used to draw some important conclusions. Let us comment for instance on the fact that vega is positive. It implies that an investor with a long position on a call option wishes the volatility of the underlying stock to increase. As usual, since this might not happen, the investor's portfolio is exposed to possible losses due to a decrease of the stock volatility (which makes the call option in the portfolio lose value). This exposure can be secured by adding variance swaps into the portfolio, see Section 6.5.3.
Exercise 6.7. Prove that

lim_{σ→0⁺} c(t, x) = (x − Ke^{−rτ})_+,  lim_{σ→∞} c(t, x) = x.
Implied volatility
Let us temporarily re-denote the Black-Scholes price of the call as
c(t, S(t), K, T, σ),
which reflects the dependence of the price on the parameters K, T, σ (we disregard the
dependence on r). As shown in Theorem 6.6,

∂_σ c(t, S(t), K, T, σ) = vega = ( S(t)√τ / √(2π) ) e^{−d1²/2} > 0.

Hence the Black-Scholes price of the option is an increasing function of the volatility. Furthermore, by Exercise 6.7,

lim_{σ→0⁺} c(t, S(t), K, T, σ) = (S(t) − Ke^{−rτ})_+,  lim_{σ→+∞} c(t, S(t), K, T, σ) = S(t).
Therefore the function c(t, S(t), K, T, ·) is a one-to-one map from (0, ∞) into the interval I = ((S(t) − Ke^{−rτ})_+, S(t)), see Figure 6.1. Now suppose that at some given fixed time t the real market price of the call is c̃(t). Clearly, the option is always cheaper than the stock (otherwise we would buy the stock directly, and not the option) and typically we also have c̃(t) > max(0, S(t) − Ke^{−rτ}). The latter is always true if S(t) < Ke^{−rτ} (the price of options is positive), while for S(t) > Ke^{−rτ} this follows by the fact that S(t) − Ke^{−rτ} ≈ S(t) − K and real calls are always more expensive than their intrinsic value. This being said, we can safely assume that c̃(t) ∈ I.
Figure 6.1: We fix S(t) = 10, K = 12, r = 0.01, τ = 1/12 and depict the Black-Scholes price
of the call as a function of the volatility. Note that in practice only the very left part of this
picture is of interest, because typically 0 < σ < 1.
Thus given the value of c̃(t) there exists a unique value of σ, which depends on the fixed parameters T, K and which we denote by σ_imp(T, K), such that

c(t, S(t), K, T, σ_imp(T, K)) = c̃(t).

σ_imp(T, K) is called the implied volatility of the option. The implied volatility must be computed numerically (for instance using Newton's method), since there is no closed formula for it.
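As a sketch (Python; the function names are mine and the market data below are taken from the Apple example in the text), the implied volatility can be computed by bisection, which is guaranteed to converge here because vega > 0 makes σ → c(t, S(t), K, T, σ) strictly increasing on I; Newton's method based on the vega formula of Theorem 6.6 would converge faster.

```python
import math

def Phi(d):
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))

def call_price(x, K, r, sigma, tau):
    d2 = (math.log(x / K) + (r - 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d1 = d2 + sigma * math.sqrt(tau)
    return x * Phi(d1) - K * math.exp(-r * tau) * Phi(d2)

def implied_vol(c_market, x, K, r, tau, lo=1e-8, hi=10.0, tol=1e-10):
    """Bisection on sigma -> c(sigma); requires c_market in the interval
    I = ((x - K e^{-r tau})_+, x) discussed in the text."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if call_price(x, K, r, mid, tau) < c_market:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Applied to the K = 565 quote of Figure 6.2 (S(t) = 585.54, c̃(t) = 24.35, r = 0.01, τ = 1/12), this routine reproduces an implied volatility of roughly 16%.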
The implied volatility of an option (in this example of a call option) is a very important
parameter and it is often quoted together with the price of the option. If the market followed
exactly the assumptions in the Black-Scholes theory, then the implied volatility would be a
constant, independent of T, K and equal to the volatility of the underlying asset. In this
respect, σimp (T, K) may be viewed as a quantitative measure of how real markets deviate
from ideal Black-Scholes markets. Furthermore, the implied volatility may be viewed as the
market consensus on the future value of the volatility of the underlying stock. Recall in fact
that in order for the Black-Scholes price of the option to be c(t, S(t), K, T, σimp (T, K)), the
volatility of the stock should be equal to σimp (T, K) in the time interval [t, T ] in the future.
Hence by pricing the option at the price c̃(t) = c(t, S(t), K, T, σimp (T, K)), the market is
telling us that the buyers and sellers of the option believe that the volatility of the stock in
the future will be σimp (T, K).
By way of example, in Figure 6.2 the implied volatility is determined (graphically) for various Apple call options on May 12, 2014, when the stock was quoted at 585.54 dollars (closing price of the previous market day). All options expire on June 13, 2014 (τ = 1 month = 1/12). The value r = 0.01 has been used, but the results do not change significantly
[Figure 6.2 consists of four panels, one per option, each depicting c(σ) and the market price: K = 565, c̃(t) = 24.35; K = 575, c̃(t) = 19.28; K = 585, c̃(t) = 14.5; K = 590, c̃(t) = 11.47.]
Figure 6.2: Implied volatility of various call options on the Apple stock
even assuming r = 0.05. In the pictures, K denotes the strike price and c̃(t) the call price. We observe that the implied volatility is 20% in three cases, while for the call with strike K = 565 dollars the implied volatility is a little smaller (≈ 16%), which means that the latter call is slightly underpriced compared to the others.
Volatility curve
As mentioned before, the implied volatility depends on the parameters T, K. Here we are
particularly interested in the dependence on the strike price, hence we re-denote the implied
volatility as σimp (K). If the market behaved exactly as in the Black-Scholes theory, then
σimp (K) = σ for all values of K, hence the graph of K → σimp (K) would be just a straight
horizontal line. Given that real markets do not satisfy exactly the assumptions in the Black-
Scholes theory, what can we say about the graph of the volatility curve K → σimp (K)?
Remarkably, it has been found that there exists recurrent convex shapes for the graph of
volatility curves, which are known as volatility smile and volatility skew, see Figure 6.3.
Figure 6.3: Volatility smile and skew of a call option (not from real data!)
In the case of a volatility smile, the minimum of the graph is reached at the strike price K ≈ S(t), i.e., when the call is at the money. This behavior indicates that the farther the call is from being at the money, the more it will be overpriced. Volatility smiles have been recurrent in the market since the crash of 1987 (Black Monday), indicating that this event led investors to be more cautious when trading options that are in or out of the money. Volatility skews tell us whether investors prefer to trade call or put options. Devising mathematical models of volatility and asset prices able to reproduce volatility curves is an active research topic in financial mathematics. We discuss the most popular volatility models in Section 6.5.
S(t0) = S(t0−) − aS(t0−) = (1 − a)S(t0−).   (6.31)

We assume that on each of the intervals [0, t0), [t0, T], the stock price follows a geometric Brownian motion, namely,

¹The dividend is expressed in percentage of the price of the stock. For instance, a = 0.03 means that the dividend paid is 3%.
Theorem 6.7. Consider the standard European derivative with pay-off Y = g(S(T)) and maturity T. Let Π_Y^{(a,t0)}(t) be the Black-Scholes price of the derivative at time t ∈ [0, T] assuming that the underlying stock pays the dividend aS(t0−) at time t0 ∈ (0, T). Then

Π_Y^{(a,t0)}(t) = v_g(t, (1 − a)S(t)),  for t < t0,
Π_Y^{(a,t0)}(t) = v_g(t, S(t)),  for t ≥ t0,

where v_g(t, x) is the Black-Scholes pricing function in the absence of dividends, which is given by (6.19b).
Proof. Using S(T)/S(t) = e^{ατ + σ(W(T) − W(t))}, we can rewrite (6.17) in the form

Π_Y(t) = e^{−rτ} E[ g( S(T) e^{(r − ½σ² − α)τ} ) ].   (6.34)

Taking the limit s → t0− in (6.32) and using the continuity of the paths of the Brownian motion we find

S(t0−) = S(t) e^{α(t0 − t) + σ(W(t0) − W(t))},  t ∈ [0, t0).

Replacing in (6.31) we obtain

S(t0) = (1 − a) S(t) e^{α(t0 − t) + σ(W(t0) − W(t))},  t ∈ [0, t0).

Hence, letting (s, u) = (T, t0) and (s, u) = (T, t) into (6.33), we find

S(T) = (1 − a) S(t) e^{ατ + σ(W(T) − W(t))},  for t ∈ [0, t0),
S(T) = S(t) e^{ατ + σ(W(T) − W(t))},  for t ∈ [t0, T].   (6.35)

By the definition of Black-Scholes price in the form (6.34) and denoting G = (W(T) − W(t))/√τ, we obtain

Π_Y^{(a,t0)}(t) = e^{−rτ} E[ g( (1 − a) S(t) e^{(r − ½σ²)τ + σ√τ G} ) ],  for t ∈ [0, t0),

Π_Y^{(a,t0)}(t) = e^{−rτ} E[ g( S(t) e^{(r − ½σ²)τ + σ√τ G} ) ],  for t ∈ [t0, T].

As G ∈ N(0, 1), the result follows.
We conclude that for t ≥ t0 , i.e., after the dividend has been paid, the Black-Scholes
price function of the derivative is again given by (6.19b), while for t < t0 it is obtained
by replacing x with (1 − a)x in (6.19b). To see the effect of this change, suppose that the
derivative is a call option; let c(t, x) be the Black-Scholes price function in the absence of
dividends and ca (t, x) be the price function in the case that a dividend is paid at time t0 .
Then, according to Theorem 6.7,

c_a(t, x) = c(t, (1 − a)x),  for t < t0,
c_a(t, x) = c(t, x),  for t ≥ t0.
Since ∂x c > 0 (see Theorem 6.6), it follows that ca (t, x) < c(t, x), for t < t0 , that is to say,
the payment of a dividend makes the call option on the stock less valuable (i.e., cheaper)
than in the absence of dividends until the dividend is paid.
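For a concrete illustration (a Python sketch with made-up numbers; the helper names are mine), the dividend adjustment of Theorem 6.7 amounts to evaluating the no-dividend call price at (1 − a)x before the dividend date:

```python
import math

def Phi(d):
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))

def call_price(x, K, r, sigma, tau):
    d2 = (math.log(x / K) + (r - 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return x * Phi(d2 + sigma * math.sqrt(tau)) - K * math.exp(-r * tau) * Phi(d2)

def call_price_dividend(x, K, r, sigma, tau, a, before_dividend):
    """Theorem 6.7: scale the stock price by (1 - a) if the dividend lies ahead."""
    return call_price((1.0 - a) * x if before_dividend else x, K, r, sigma, tau)
```

Since ∂_x c > 0, the scaled argument (1 − a)x always produces a cheaper call before the dividend date, in line with the property just proved.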
Exercise 6.8 (?). Give an intuitive explanation for the property just proved for call options
on a dividend paying stock.
Exercise 6.9 (•). A standard European derivative pays the amount Y = (S(T) − S(0))_+ at time of maturity T. Find the Black-Scholes price Π_Y(0) of this derivative at time t = 0 assuming that the underlying stock pays the dividend (1 − e^{−rT}) S(T/2−) at time t = T/2. Compute the probability of positive return for a constant portfolio which is short 1 share of the derivative and short S(0)e^{−rT} shares of the risk-free asset.
Exercise 6.10. Derive the Black-Scholes price of the derivative with pay-off Y = g(S(T )),
assuming that the underlying pays a dividend at each time t1 < t2 < · · · < tM ∈ [0, T ].
Denote by ai the dividend paid at time ti , i = 1, . . . , M .
Motivated by our earlier results on the Black-Scholes price, and Remark 6.4, we attempt to re-write the risk-neutral price formula in the form

Π_Y(t) = v_g(t, S(t))  for all t ∈ [0, T], for all T > 0,   (6.37)

for some function v_g : D_T^+ → (0, ∞), which we call the pricing function of the derivative. By (6.36), this is equivalent to

Ẽ[ g(S(T)) | F_W(t) ] = e^{rτ} v_g(t, S(t)),   (6.38)

i.e., to the property that {S(t)}_{t≥0} is a Markov process in the risk-neutral probability measure P̃, relative to the filtration {F_W(t)}_{t≥0}. At this point it remains to understand for which stochastic processes {σ(t)}_{t≥0} the generalized geometric Brownian motion (6.4) satisfies this Markov property. We have seen in Section 5.1 that this holds in particular when {S(t)}_{t≥0} satisfies a (system of) stochastic differential equation(s). Next we discuss two examples which encompass most of the volatility models used in the applications: local volatility models and stochastic volatility models.
6.5.1 Local volatility models

A local volatility model is a special case of the generalized geometric Brownian motion in which the instantaneous volatility of the stock {σ(t)}_{t≥0} is assumed to be a deterministic function of the stock price S(t). Given a measurable function β : [0, ∞) × [0, ∞) → (0, ∞), we then let

σ(t)S(t) = β(t, S(t)),   (6.39)

into (6.4), so that the stock price process {S(t)}_{t≥0} satisfies the SDE

dS(t) = rS(t) dt + β(t, S(t)) dW̃(t).   (6.40)

We assume that this SDE admits a unique global solution, which is true in particular under the assumptions of Theorem 5.1. In this regard we observe that the drift term α(t, x) = rx in (6.40) satisfies both (5.3) and (5.4), hence these conditions restrict only the form of the function β(t, x). In the following we shall also assume that the solution {S(t)}_{t≥0} of (6.40) is non-negative a.s. for all t > 0. Note however that the stochastic process solution of (6.40) will in general hit zero with positive probability at any finite time. For example, letting β(t, x) = √x, the stock price is given by a CIR process (5.25) with b = 0 and so, according to Theorem 5.6, S(t) = 0 with positive probability for all t > 0.
Theorem 6.8. Let g ∈ C²([0, ∞)) (except possibly on finitely many points) such that g′, g″ are uniformly bounded and assume that the Kolmogorov PDE

∂_t u + rx ∂_x u + ½β(t, x)² ∂_x² u = 0,  (t, x) ∈ D_T^+,   (6.41)

associated to (6.40) admits a (necessarily unique) strong solution in the region D_T^+ satisfying u(T, x) = g(x). Let also

v_g(t, x) = e^{−rτ} u(t, x).

Then we have the following.

(i) v_g satisfies

∂_t v_g + rx ∂_x v_g + ½β(t, x)² ∂_x² v_g = rv_g,  (t, x) ∈ D_T^+,   (6.42)

and the terminal condition

v_g(T, x) = g(x).   (6.43)

(ii) The price of the European derivative with pay-off Y = g(S(T)) and maturity T > 0 is given by (6.37).
(iv) The put-call parity holds.
Proof. (i) It is straightforward to verify that v_g satisfies (6.42).

(ii) Let X(t) = v_g(t, S(t)). By Itô's formula we find

dX(t) = ( ∂_t v_g(t, S(t)) + rS(t) ∂_x v_g(t, S(t)) + ½β(t, S(t))² ∂_x² v_g(t, S(t)) ) dt
       + β(t, S(t)) ∂_x v_g(t, S(t)) dW̃(t).

Hence

d(e^{−rt} X(t)) = e^{−rt} ( ∂_t v_g + rx ∂_x v_g + ½β(t, x)² ∂_x² v_g − rv_g )(t, S(t)) dt
               + e^{−rt} β(t, S(t)) ∂_x v_g(t, S(t)) dW̃(t).

As v_g(t, x) satisfies (6.42), the drift term in the right hand side of the previous equation is zero. Hence

e^{−rt} v_g(t, S(t)) = v_g(0, S0) + ∫_0^t e^{−ru} β(u, S(u)) ∂_x v_g(u, S(u)) dW̃(u).   (6.44)
Example: The CEV model

For the constant elasticity of variance (CEV) model, we have β(t, S(t)) = σS(t)^δ, where σ > 0, δ > 0 are constants. The SDE for the stock price becomes

dS(t) = rS(t) dt + σS(t)^δ dW̃(t).   (6.45)

Note that for δ = 1 one recovers the Black-Scholes model. For δ ≠ 1, we can construct the solution of (6.45) using a CIR process, as shown in the following exercise.
Exercise 6.11. Given σ, r and δ ≠ 1, define

a = 2r(δ − 1),  c = −2σ(δ − 1),  b = (σ²/(2r))(2δ − 1),  θ = −1/(2(δ − 1)).

Let {X(t)}_{t≥0} be the CIR process

dX(t) = a(b − X(t)) dt + c √(X(t)) dW̃(t),  X(0) = x > 0.

The Kolmogorov PDE (6.41) for the CEV model reads

∂_t u + rx ∂_x u + ½σ²x^{2δ} ∂_x² u = 0,  (t, x) ∈ D_T^+.
Given a terminal value g at time T as in Theorem 5.4, the previous equation admits a unique solution. However a fundamental solution, in the sense of Theorem 5.5, exists only for δ > 1, as otherwise the stochastic process {S(t)}_{t≥0} hits zero at any finite time with positive probability and therefore the density of the random variable S(t) has a discrete part. The precise form of the (generalized) density f_{S(t)}(x) in the CEV model is known for all values of δ and is given for instance in [16]. An exact formula for call options can be found in [20].
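Since closed formulas under the CEV model are involved, Monte Carlo simulation of the risk-neutral dynamics dS = rS dt + σS^δ dW̃ is a natural alternative. The sketch below (Python, with illustrative parameters; the function name is mine) prices a call by an Euler-Maruyama discretization, absorbing each path at zero, and for δ = 1 can be checked against the Black-Scholes formula.

```python
import math
import numpy as np

def cev_mc_call(x0, K, r, sigma, delta, T, n_steps=200, n_paths=100000, seed=1):
    """Euler-Maruyama scheme for dS = r S dt + sigma S^delta dW~ (risk-neutral),
    followed by the discounted Monte Carlo average of the call pay-off."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    s = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        dw = rng.normal(0.0, math.sqrt(dt), n_paths)
        s = s + r * s * dt + sigma * np.maximum(s, 0.0) ** delta * dw
        s = np.maximum(s, 0.0)     # the process is absorbed at zero (relevant for delta < 1)
    return math.exp(-r * T) * np.maximum(s - K, 0.0).mean()
```

The absorption step reflects the discussion above: for δ < 1 the stock price can hit zero with positive probability, and a path that reaches zero must stay there.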
and µ, η, β : [0, ∞)³ → R be continuous functions. A stochastic volatility model is a pair of (non-negative) stochastic diffusion processes {S(t)}_{t≥0}, {v(t)}_{t≥0} satisfying the following system of SDE's:

dS(t) = µ(t, S(t), v(t)) S(t) dt + √(v(t)) S(t) dW1(t),   (6.46)

dv(t) = η(t, S(t), v(t)) dt + β(t, S(t), v(t)) √(v(t)) ( ρ dW1(t) + √(1 − ρ²) dW2(t) ).   (6.47)

We see from (6.46) that {v(t)}_{t≥0} is the instantaneous variance of the stock price {S(t)}_{t≥0}. Moreover the process {W^{(ρ)}(t)}_{t≥0} given by

W^{(ρ)}(t) = ρ W1(t) + √(1 − ρ²) W2(t)

is a Brownian motion satisfying

dW1(t) dW^{(ρ)}(t) = ρ dt;
in particular the two Brownian motions {W1(t)}_{t≥0}, {W^{(ρ)}(t)}_{t≥0} are not independent, as their cross variation is not zero (in fact, by Exercise 4.5, ρ is the correlation of the two Brownian motions). Hence in a stochastic volatility model the stock price and the volatility are both stochastic processes driven by two correlated Brownian motions. We assume that {S(t)}_{t≥0} is non-negative and {v(t)}_{t≥0} is positive a.s. for all times, although we refrain from discussing under which general conditions this property holds (we will present an example below).
Our next purpose is to introduce a risk-neutral probability measure such that the discounted price of the stock is a martingale. As we have two Brownian motions in this model, we shall apply the two-dimensional Girsanov Theorem 4.12 to construct such a probability measure. Precisely, let r > 0 be the constant interest rate of the money market and γ : [0, ∞)³ → R be a continuous function. We define

θ1(t) = ( µ(t, S(t), v(t)) − r ) / √(v(t)),  θ2(t) = γ(t, S(t), v(t)),  θ(t) = (θ1(t), θ2(t)).
where {ψ(t, S(t), v(t))}_{t≥0} is the {F_W(t)}_{t≥0}-adapted stochastic process given by

ψ(t, S(t), v(t)) = ( ( µ(t, S(t), v(t)) − r ) / √(v(t)) ) ρ + γ(t, S(t), v(t)) √(1 − ρ²)   (6.49)

and where

W̃^{(ρ,γ)}(t) = ρ W̃1(t) + √(1 − ρ²) W̃2^{(γ)}(t).

Note that the P̃^{(γ)}-Brownian motions {W̃1(t)}_{t≥0}, {W̃^{(ρ,γ)}(t)}_{t≥0} satisfy
Exercise 6.12. Prove the theorem. Hint: use Itô’s formula in two dimensions, see Theo-
rem 4.9, and the argument in the proof of Theorem 6.8.
As for the local volatility models, a closed formula solution of (6.52) is rarely available and the use of numerical methods to price the derivative becomes essential.
Heston model
The most popular stochastic volatility model is the Heston model, which is obtained by the following substitutions in (6.46)-(6.47):

µ(t, x, y) = µ0,  β(t, x, y) = c,  η(t, x, y) = a(b − y),

where µ0, a, b, c are constants. Hence the stock price and the volatility dynamics in the Heston model are given by the following stochastic differential equations:

dS(t) = µ0 S(t) dt + √(v(t)) S(t) dW1(t),   (6.53a)

dv(t) = a(b − v(t)) dt + c √(v(t)) dW^{(ρ)}(t).   (6.53b)

Note in particular that the volatility in the Heston model is a CIR process, see (5.25). The condition 2ab > c² ensures that v(t) is strictly positive (almost surely). To pass to the risk-neutral world we need to fix a risk-neutral probability measure, that is, we need to fix the market price of volatility risk function ψ in (6.49). In the Heston model it is assumed that

ψ(t, x, y) = λ√y,

for some constant λ ∈ R, which leads to the following form of the pricing PDE (6.52):

∂_t u + rx ∂_x u + (k − my) ∂_y u + ½yx² ∂_x² u + ½c²y ∂_y² u + ρcxy ∂²_{xy} u = ru,   (6.54)

where the constants k, m are given by k = ab, m = a + cλ.
The general solution of (6.54) with terminal datum u(T, x, y) = h(x, y) is not known. However in the case of a call option (i.e., h(x, y) = g(x) = (x − K)_+) an explicit formula for the Fourier transform of the solution is available, see [12]. The existence of such a formula, which makes it possible to compute the price of call options by very efficient numerical methods, is one of the main reasons for the popularity of the Heston model.
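Lacking a general closed formula, the Heston dynamics (6.53) are often simulated. A minimal Euler sketch with full truncation (a common fix to keep the variance update well defined when v approaches zero; the parameters used below are illustrative, not from the notes) is:

```python
import math
import numpy as np

def heston_paths(s0, v0, mu0, a, b, c, rho, T, n_steps=200, n_paths=50000, seed=2):
    """Euler (full-truncation) discretization of the Heston system (6.53).
    Returns the terminal samples S(T), v(T)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    s = np.full(n_paths, float(s0))
    v = np.full(n_paths, float(v0))
    for _ in range(n_steps):
        z1 = rng.normal(size=n_paths)
        z2 = rng.normal(size=n_paths)
        dw1 = math.sqrt(dt) * z1
        dwr = math.sqrt(dt) * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)  # driver W^(rho)
        vp = np.maximum(v, 0.0)       # full truncation: use v^+ in all coefficients
        s = s + mu0 * s * dt + np.sqrt(vp) * s * dw1
        v = v + a * (b - vp) * dt + c * np.sqrt(vp) * dwr
    return s, np.maximum(v, 0.0)
```

As a sanity check, the sample mean of v(T) can be compared with the exact CIR mean b + (v(0) − b)e^{−aT}, which is the same computation as (6.59) below.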
of the interval [0, T]. Assume for instance that the asset is a stock and let S(t_j) = S_j be the stock price at time t_j. Here S_1, . . . , S_n are historical data for the stock price and not random variables (i.e., the interval [0, T] lies in the past of the present time). The realized annual variance of the stock in the interval [0, T] along this partition is defined as

σ²_{1year}(n, T) = (κ/T) Σ_{j=0}^{n−1} ( log(S_{j+1}/S_j) )²,
where κ is the number of trading days in one year (typically, κ = 252). A variance swap stipulated at time t = 0, with maturity T and strike variance K is a contract between two parties which, at the expiration date, entails the exchange of the cash N(σ²_{1year} − K), where N (called variance notional) is a conversion factor from units of variance to units of currency. In particular, the holder of the long position on the swap is the party who receives the cash in the case that the realized annual variance at the expiration date is larger than the strike variance.
than the strike variance. Variance swaps are traded over the counter and they are used by
investors to protect their exposure to the volatility of the asset. For instance, suppose that
an investor has a position on an asset which is profitable if the volatility of the stock price
increases (e.g., the investor owns call options on the stock). Then it is clearly important
for the investor to secure such position against a possible decrease of the volatility. To this
purpose the investor opens a short position on a variance swap with another investor who is
exposed to the opposite risk.
Let us now discuss variance swaps from a mathematical modeling point of view. We assume that the stock price follows the generalized geometric Brownian motion

S(t) = S(0) exp( ∫_0^t α(s) ds + ∫_0^t σ(s) dW(s) ).
A variance swap can thus be defined as the (non-standard) European derivative with pay-off Y = Q_T − K, where Q_T = (κ/T) ∫_0^T σ²(t) dt. Assuming that the interest rate of the bond is constant, R(t) = r > 0, the risk-neutral value of a variance swap is given by

Π_Y(t) = e^{−rτ} Ẽ[ Q_T − K | F_W(t) ].   (6.55)

In particular, at time t = 0, i.e., when the contract is stipulated, we have

Π_Y(0) = e^{−rT} Ẽ[ Q_T − K ],   (6.56)
where we used that F_W(0) is a trivial σ-algebra, and therefore the conditional expectation with respect to F_W(0) is a pure expectation. As none of the two parties in a variance swap has a privileged position on the contract, there is no premium associated to variance swaps, that is to say, the fair value of a variance swap is zero⁴. The value K∗ of the variance strike which makes the risk-neutral price of a variance swap equal to zero at time t = 0, i.e., Π_Y(0) = 0, is called the fair variance strike. By (6.56) we find

K∗ = (κ/T) ∫_0^T Ẽ[σ²(t)] dt.   (6.57)
To compute K∗ explicitly, we need to fix a stochastic model for the variance process {σ²(t)}_{t≥0}. Let us consider the Heston model

dσ²(t) = a(b − σ²(t)) dt + cσ(t) dW̃(t),

which implies (d/dt) Ẽ[σ²(t)] = ab − a Ẽ[σ²(t)] and so

Ẽ[σ²(t)] = b + (σ0² − b) e^{−at},  σ0² = Ẽ[σ²(0)] = σ²(0).   (6.59)

Replacing into (6.57) we obtain

K∗ = κ ( b + (σ0² − b)(1 − e^{−aT})/(aT) ).
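The integral in (6.57) can also be evaluated numerically; the following sketch (Python, with illustrative parameter values) checks the closed formula for K∗ against a direct quadrature of the expected variance (6.59):

```python
import math

def fair_variance_strike(kappa, a, b, sigma0_sq, T):
    """Closed formula K* = kappa * (b + (sigma0^2 - b)(1 - e^{-aT})/(aT))."""
    return kappa * (b + (sigma0_sq - b) * (1.0 - math.exp(-a * T)) / (a * T))

def fair_variance_strike_quad(kappa, a, b, sigma0_sq, T, n=100000):
    """Midpoint quadrature of (kappa/T) int_0^T E~[sigma^2(t)] dt, using (6.59)."""
    h = T / n
    total = sum(b + (sigma0_sq - b) * math.exp(-a * (k + 0.5) * h) for k in range(n))
    return kappa / T * total * h
```

When σ0² > b, the mean-reverting variance stays above its long-run level b on average, so the fair strike exceeds κb, as both routines confirm.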
Exercise 6.13. Assume R = r > 0, α = constant. Moreover, given σ0 > 0, let σ(t) = σ0 √(S(t)), which is an example of CEV model. Compute the fair strike of the variance swap.
Exercise 6.14 (•). Assume that the price S(t) of a stock follows a generalized geometric Brownian motion with instantaneous volatility {σ(t)}_{t≥0} given by the Heston model dσ²(t) = a(b − σ²(t)) dt + cσ(t) dW̃(t), where {W̃(t)}_{t≥0} is a Brownian motion in the risk-neutral probability measure and a, b, c are constants such that 2ab > c² > 0. A volatility call option with strike K and maturity T is a financial derivative with pay-off

Y = N ( √( (κ/T) ∫_0^T σ²(t) dt ) − K )_+,
⁴This is a general property of forward contracts, see Section 6.7.
where κ is the number of trading days in one year and N is a dimensional constant that converts units of volatility into units of currency. Assuming that the risk-neutral price Π_Y(t) of this derivative has the form

Π_Y(t) = f( t, σ²(t), ∫_0^t σ²(s) ds )

and that the interest rate of the risk-free asset is constant, find the partial differential equation and the terminal value satisfied by the pricing function f.
where

B(t, T) = Ẽ[ exp(−∫_t^T R(s) ds) | F_W(t) ]   (6.61)
is the risk-neutral value of a zero-coupon bond with face value 1. To compute B(t, T) under a CIR interest rate model, we make the ansatz

B(t, T) = f(t, R(t)),   (6.62)

for some smooth function f : [0, T] × (0, ∞) → R, which we want to find. Note that the ansatz (6.62) does not correspond to a classical Markov property for the interest rate, as the random variable exp(−∫_t^T R(s) ds) need not be, in general, a function of R(T).
Theorem 6.10. When the interest rate {R(t)}t>0 follows the CIR model (6.60), the value
B(t, T ) of the zero-coupon bond is given by (6.62) with

    f (t, x) = e^{−xC(T −t)−A(T −t)},    (6.63a)

where

    C(τ ) = sinh(γτ ) / ( γ cosh(γτ ) + (a/2) sinh(γτ ) ),    (6.63b)

    A(τ ) = −(2ab/c²) log[ γ e^{aτ/2} / ( γ cosh(γτ ) + (a/2) sinh(γτ ) ) ]    (6.63c)

and

    γ = (1/2) √(a² + 2c²).    (6.63d)
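The bond-pricing function (6.63) is straightforward to evaluate. A minimal Python sketch (the function name is illustrative; x plays the role of R(t)) is:

```python
import math

def cir_bond_price(t, x, T, a, b, c):
    """Zero-coupon bond price under the CIR model:
    B(t,T) = f(t,x) = exp(-x*C(tau) - A(tau)), tau = T - t,
    with C, A, gamma as in (6.63b)-(6.63d)."""
    tau = T - t
    gamma = 0.5 * math.sqrt(a * a + 2.0 * c * c)
    denom = gamma * math.cosh(gamma * tau) + 0.5 * a * math.sinh(gamma * tau)
    C = math.sinh(gamma * tau) / denom
    A = -(2.0 * a * b / c ** 2) * math.log(gamma * math.exp(0.5 * a * tau) / denom)
    return math.exp(-x * C - A)
```

Note that at τ = 0 one has C(0) = 0 and A(0) = 0, so f (T, x) = 1, consistent with the face value of the bond; moreover the price decreases in x, since C(τ) > 0 for τ > 0.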
Proof. Using Itô’s formula and the product rule, together with (6.60), we obtain

    d(D(t)f (t, R(t))) = D(t)[ ∂t f (t, R(t)) + a(b − R(t))∂x f (t, R(t))
                          + (c²/2) R(t)∂x² f (t, R(t)) − R(t)f (t, R(t)) ] dt
                          + D(t)∂x f (t, R(t)) c √R(t) dW̃(t).
promises to sell and deliver to the other party the asset U at time T in exchange for the
cash K. As opposed to option contracts, both parties in a forward contract are obliged to
fulfill their part of the agreement. Forward contracts are traded over the counter and most
commonly on commodities or currencies. Let us give two examples.
Example of a forward contract on a commodity. Consider a farmer who grows wheat and a
miller who needs wheat to produce flour. Clearly, the farmer’s interest is to sell the wheat for
the highest possible price, while the miller’s interest is to pay as little as possible for the wheat.
The price of wheat depends on many economic and non-economic factors (such as
weather conditions, which affect the quality and quantity of harvests) and is therefore
quite volatile. The farmer and the miller therefore stipulate a forward contract on the wheat in
the winter (before planting, which occurs in the spring) with expiration date at the
end of the summer (when the wheat is harvested), in order to lock in its future trading price
beforehand.
Example of a forward contract on a currency. Suppose that a car company in Sweden
promises to deliver a stock of 100 cars to another company in the United States in exactly
one month. Suppose that the price of each car is fixed in Swedish crowns, say 100,000
crowns. Clearly the American company will benefit from an increase of the exchange rate
crown/dollar and will be damaged in the opposite case. To avoid possibly high losses,
the American company buys a forward contract on 100 × 100,000 = ten million Swedish crowns
expiring in one month, which gives the company the right and the obligation to buy ten
million crowns for a price in dollars agreed upon today.
Remark 6.9. As it is clear from the examples above, one of the purposes of forward
contracts is to share risks.
The delivery price of a forward contract is agreed by the two parties after a careful
analysis of several factors that may influence the future value of the asset (including logistic
factors, such as the cost of delivery, storage, etc.). For this reason, the delivery price K in a
forward contract may also be viewed as a considered estimate of the price of the asset at
the future time T . In this respect, K is also called the forward price of U. More
precisely, the T -forward price of the asset U at time t is the strike price of a forward contract
on U with maturity T stipulated at time t; the current, actual price Π(t) of the asset is also
called the spot price.
Remark 6.10. As the consensus on the forward price is limited to the participants of the
forward contract, it is unlikely to be accepted by all investors as a good estimate of the
price of the asset at time T . The delivery price of futures contracts on the asset, which we
define in Section 6.7.2, gives a better and more commonly accepted estimate of the future
value of an asset.
Let us apply the risk-neutral pricing theory introduced in Section 6.2 to derive a mathe-
matical model for the forward price of an asset. Let f (t, Π(t), K, T ) be the value at time t
of a forward contract on an asset with price {Π(t)}t∈[0,T ] , maturity T and delivery price K.
The pay-off for the party agreeing to buy the asset is given by

    Y = Π(T ) − K,

while the pay-off for the party selling the asset is K − Π(T ).
Remark 6.11. Note that one of the two parties in a forward contract is always going to
incur a loss. If this loss is very large, then this party could become insolvent, i.e., unable
to fulfill the contract, and then both parties will end up losing. In a futures contract this
is prevented by the mechanism of margin accounts, see Section 6.7.2.
As both parties in a forward contract have the same rights/obligations, none of them
pays a premium to stipulate the contract, and so f (t, Π(t), K, T ) = 0. Assuming that the
price {Π(t)}t≥0 of the underlying asset follows a generalized geometric Brownian motion with
strictly positive volatility, the risk-neutral value of the forward contract for the two parties
is ±(Π(t) − K B(t, T )), and the delivery price which makes this value equal to zero is the
forward price in the following definition.

Definition 6.5. Assume that the price {Π(t)}t≥0 of an asset and the value of the bond satisfy

    dΠ(t) = α(t)Π(t)dt + σ(t)Π(t)dW (t),    dB(t) = B(t)R(t)dt,

where {α(t)}t≥0 , {σ(t)}t≥0 , and {R(t)}t≥0 are adapted to {FW (t)}t≥0 and σ(t) > 0 al-
most surely for all times. The risk-neutral T -forward price at time t of the asset is
the {FW (t)}t≥0 -adapted stochastic process {ForT (t)}t∈[0,T ] given by

    ForT (t) = Π(t)/B(t, T ),    t ∈ [0, T ].
Remark 6.12. The value B(t, T ) is the risk-neutral price at time t of a European derivative
with pay-off 1 at the time of maturity T . This type of derivative is called a zero-coupon
bond and is discussed in more details in Section 6.6.
Note that the forward price increases with the time left to delivery, τ = T − t,
i.e., the longer we delay the delivery of the asset, the more we have to pay for it. This is
intuitive, as the seller of the asset is losing money by not selling the asset on the spot (due
to its devaluation compared to the bond value). By way of example, suppose that the
interest rate of the bond is a deterministic constant, R(t) = r > 0. Then the forward price
becomes

    ForT (t) = e^{rτ} Π(t),

in which case we find that the spot price of an asset is the discounted value of the forward
price. When the asset is a commodity (e.g., corn), the forward price is also inflated by the
cost of storage. Letting c > 0 be the cost of storing one share of the asset for one year,
the forward price of the asset, for delivery in τ years in the future, is e^{cτ} e^{rτ} Π(t).
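The constant-rate forward price, with the optional storage cost for commodities, is a one-line computation. A minimal Python sketch (names are illustrative):

```python
import math

def forward_price(spot, r, tau, storage_cost=0.0):
    """T-forward price with constant interest rate r and, for commodities,
    a proportional annual storage cost c: For_T(t) = e^{(r + c) tau} * Pi(t),
    where tau = T - t is the time left to delivery."""
    return math.exp((r + storage_cost) * tau) * spot
```

In particular, discounting the forward price at rate r + c recovers the spot price, and the forward price is increasing in τ, as noted above.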
6.7.2 Futures
Futures contracts are standardized forward contracts, i.e., rather than being traded over
the counter, they are negotiated in regularized markets. Perhaps the most interesting role
of futures contracts is that they make trading on commodities possible for anyone. In this
regard we remark that commodities, e.g. crude oil, wheat, etc., are most often sold through
long term contracts, such as forward and futures contracts, and therefore they do not usually
have an “official spot price”, but only a future delivery price (commodities “spot markets”
exist, but their role is marginal for the discussion in this section).
Futures markets are markets in which the objects of trading are futures contracts.
Unlike forward contracts, all futures contracts in a futures market are subject to the same
regulation, and so in particular all contracts on the same asset with the same delivery time
T have the same delivery price, which is called the T-future price of the asset and which
we denote by FutT (t). Thus FutT (t) is the delivery price in a futures contract on the asset
with time of delivery T and which is stipulated at time t < T . Futures markets have
existed for more than 300 years and nowadays the most important ones are the Chicago
Mercantile Exchange (CME), the New York Mercantile Exchange (NYMEX), the Chicago
Board of Trade (CBOT) and the Intercontinental Exchange (ICE).
In a futures market, anyone (after a proper authorization) can stipulate a futures contract.
More precisely, holding a position in a futures contract in the futures market consists in the
agreement to receive as a cash flow the change in the future price of the underlying asset
during the time in which the position is held. Notice that the cash flow may be positive or
negative. In a long position the cash flow is positive when the future price goes up and it
is negative when the future price goes down, while a short position on the same contract
receives the opposite cash flow. Moreover, in order to eliminate the risk of insolvency, the
cash flow is distributed in time through the mechanism of the margin account. More
precisely, assume that at t = 0 we open a long position in a futures contract expiring at time
T . At the same time, we need to open a margin account which contains a certain amount
of cash (usually, 10 % of the current value of the T -future price for each contract opened).
At t = 1 day, the amount FutT (1) − FutT (0) will be added to the account, if it is positive, or
Figure 6.4: Futures price of corn on May 12, 2014 (dashed line) and on May 13, 2014
(continuous line) for different delivery times
Figure 6.5: Futures price of natural gas on May 13, 2014 for different delivery times
withdrawn, if it is negative. The position can be closed at any time t < T (a multiple of days),
in which case the total amount of cash flowed into the margin account is

    FutT (t) − FutT (0).

(In fact, if the margin account becomes too low, and the investor does not add new cash
to it, the position will be automatically closed by the exchange market). If a long position
is held up to the time of maturity, then the holder of the long position should buy the
underlying asset. However in the majority of cases futures contracts are cash settled and
not physically settled, i.e., the delivery of the underlying asset rarely occurs, and the
equivalent value in cash is paid instead.
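The daily resettlement mechanism and the telescoping of the cash flows can be sketched in a few lines of Python (the function name and sample prices are illustrative):

```python
def margin_cash_flows(future_prices):
    """Daily cash flows credited to the margin account of a long futures
    position: Fut_T(k) - Fut_T(k-1) on day k. The total flow telescopes
    to Fut_T(t) - Fut_T(0), the amount received on closing the position."""
    flows = [future_prices[k] - future_prices[k - 1]
             for k in range(1, len(future_prices))]
    return flows, sum(flows)
```

A short position receives the opposite flows, and the sum over any holding period depends only on the first and last future prices.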
Remark 6.13. Since a futures contract can be closed at any time prior to expiration, futures
contracts are not European style derivatives.
Our next purpose is to derive a mathematical model for the future price of an asset. Our
guiding principle is that the 1+1 dimensional futures market consisting of a futures
contract and a risk-free asset should not admit self-financing arbitrage portfolios. Consider
a portfolio invested in h(t) shares of the futures contract and hB (t) shares of the risk-free
asset at time t. We assume that {h(t), hB (t)}t∈[0,T ] is adapted to {FW (t)}t≥0 and suppose
that {FutT (t)}t∈[0,T ] is an Itô process. Since futures contracts have zero value, the value of
the portfolio at time t is V (t) = hB (t)B(t) + h(t)C(t), where C(t) is the cash-flow generated
by each futures contract up to time t. For a self-financing portfolio we require that any
positive cash-flow in the interval [t, t + dt] should be invested to buy shares of the bond and
that, conversely, any negative cash flow should be settled by issuing shares of the bond (i.e.,
by borrowing money). Since the cash-flow generated in the interval [t, t + dt] is given by
dC(t) = h(t)dFutT (t), the value of a self-financing portfolio invested in the 1+1 dimensional
futures market must satisfy

    dV (t) = h(t)dFutT (t) + R(t)V (t)dt,

or equivalently

    dV ∗ (t) = h(t)D(t)dFutT (t).    (6.66)
Now, we have seen that a simple condition ensuring that a portfolio is not an arbitrage
is that its discounted value be a martingale in the risk-neutral measure relative to the
filtration generated by the Brownian motion. By (6.66), the latter condition is achieved
by requiring that dFutT (t) = ∆(t)dW f (t), for some stochastic process {∆(t)}t∈[0,T ] adapted
to {FW (t)}t∈[0,T ] . In particular, it is reasonable to impose that

(i) {FutT (t)}t∈[0,T ] is a P̃-martingale relative to the filtration {FW (t)}t≥0 .

Furthermore, it is clear that the future price of an asset at the expiration date T should be
equal to its spot price at time T , and so we impose that

(ii) FutT (T ) = Π(T ).
It follows by Exercise 3.31 that the conditions (i)-(ii) determine a unique stochastic process
{FutT (t)}t∈[0,T ] , which is given in the following definition.
Definition 6.6. Assume that the price {Π(t)}t≥0 of an asset and the value of the bond satisfy

    dΠ(t) = α(t)Π(t)dt + σ(t)Π(t)dW (t),    dB(t) = B(t)R(t)dt,

where {α(t)}t≥0 , {σ(t)}t≥0 , and {R(t)}t≥0 are adapted to {FW (t)}t≥0 and σ(t) > 0 almost
surely for all times. The T -future price at time t of the asset is the {FW (t)}t≥0 -adapted
stochastic process {FutT (t)}t∈[0,T ] given by

    FutT (t) = Ẽ[Π(T ) | FW (t)],    t ∈ [0, T ].
By the martingale property of the future price, the left hand side is Z(s)FutT (s). Hence
Z(s)FutT (s) = E[Z(t)FutT (t)|FW (s)],
that is to say, the process {Z(t)FutT (t)}t∈[0,T ] is a P-martingale relative to the filtration
{FW (t)}t∈[0,T ] . By the martingale representation theorem, Theorem 4.6, there exists a
stochastic process {Γ(t)}t∈[0,T ] adapted to {FW (t)}t≥0 such that

    Z(t)FutT (t) = FutT (0) + ∫₀^t Γ(s)dW (s).
Theorem 6.12. The Forward-Future spread of an asset, i.e., the difference between its
forward and future price, satisfies

    ForT (t) − FutT (t) = (1/Ẽ[D(T )|FW (t)]) { Ẽ[D(T )Π(T )|FW (t)] − Ẽ[D(T )|FW (t)] Ẽ[Π(T )|FW (t)] }.    (6.68)

Moreover, if the interest rate {R(t)}t∈[0,T ] is a deterministic function of time (e.g., a deter-
ministic constant), then ForT (t) = FutT (t), for all t ∈ [0, T ].
Proof. The last claim follows directly from (6.68). In fact, when the interest rate of the bond
is deterministic, the discounting process is also deterministic and thus in particular D(T ) is
FW (t)-measurable. Hence, the term in curly brackets in (6.68) satisfies

    { . . . } = D(T )Ẽ[Π(T )|FW (t)] − D(T )Ẽ[Π(T )|FW (t)] = 0.

As to (6.68), we compute

    Π(t)/B(t, T ) − Ẽ[Π(T )|FW (t)] = D(t)Π(t)/Ẽ[D(T )|FW (t)] − Ẽ[Π(T )|FW (t)]
                                   = Ẽ[D(T )Π(T )|FW (t)]/Ẽ[D(T )|FW (t)] − Ẽ[Π(T )|FW (t)],

where for the last equality we used that {Π∗ (t)}t∈[0,T ] is a P̃-martingale relative to {FW (t)}t≥0 .
The result follows.
Note that, as opposed to the forward price of the asset, the future price need not be
increasing with the time left to delivery.
Consider now a market consisting of N stocks with prices {S1 (t)}t≥0 , . . . , {SN (t)}t≥0 satisfying

    dSk (t) = µk (t)Sk (t)dt + Sk (t) Σ_{j=1}^N σkj (t)dWj (t),    k = 1, . . . , N,

for some stochastic processes {µk (t)}t≥0 , {σkj (t)}t≥0 , j, k = 1, . . . , N , adapted to the filtration
{FW (t)}t≥0 generated by the Brownian motions {W1 (t)}t≥0 , . . . {WN (t)}t≥0 . Moreover we
assume that the Brownian motions are independent, in particular

    dWj (t)dWk (t) = 0,    for j ≠ k,

see Exercise 3.21. Finally we denote by {R(t)}t≥0 the interest rate of the money market,
which we assume to be adapted to {FW (t)}t≥0 .
Now, given stochastic processes {θk (t)}t≥0 , k = 1, . . . , N , adapted to {FW (t)}t≥0 , and
satisfying the Novikov condition (4.20), the stochastic process {Z(t)}t≥0 given by

    Z(t) = exp( − Σ_{k=1}^N [ (1/2) ∫₀^t θk (s)² ds + ∫₀^t θk (s) dWk (s) ] )    (6.71)

is a martingale relative to the filtration {FW (t)}t≥0 (see Exercise 4.8). Since E[Z(t)] =
E[Z(0)] = 1, for all t ≥ 0, we can use the stochastic process {Z(t)}t≥0 to define a risk-
neutral probability measure associated to the N + 1 dimensional stock market, as we did in
the one dimensional case, see Definition 6.1.
Definition 6.7. Let T > 0 and assume that the market price of risk equations

    µj (t) − R(t) = Σ_{k=1}^N σjk (t)θk (t),    j = 1, . . . , N,    (6.72)

admit a solution (θ1 (t), . . . , θN (t)), for all t ∈ [0, T ]. Define the stochastic process {Z(t)}t≥0
as in (6.71). Then the measure P̃ equivalent to P given by

    P̃(A) = E[Z(T )IA ]

is called the risk-neutral probability measure of the market at time T .
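At each time t, (6.72) is simply a linear system for (θ1 (t), . . . , θN (t)). For N = 2 it can be solved explicitly by Cramer's rule; the following Python sketch (names are illustrative) makes this concrete:

```python
def market_price_of_risk_2d(mu, R, sigma):
    """Solve the 2x2 market price of risk equations (6.72),
    mu_j - R = sum_k sigma[j][k] * theta_k, by Cramer's rule.
    If the volatility matrix is singular, theta is non-unique or
    non-existent (and the risk-neutral measure may not be unique)."""
    b1, b2 = mu[0] - R, mu[1] - R
    det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    if det == 0:
        raise ValueError("singular volatility matrix")
    theta1 = (b1 * sigma[1][1] - sigma[0][1] * b2) / det
    theta2 = (sigma[0][0] * b2 - sigma[1][0] * b1) / det
    return theta1, theta2
```

When the volatility matrix is invertible the solution, and hence the risk-neutral measure, is unique, in agreement with the discussion following the definition.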
Note that, as opposed to the one dimensional case, the risk-neutral measure just defined
need not be unique, as the market price of risk equations may admit more than one solution.
For each risk-neutral probability measure P̃ we can apply the multidimensional Girsanov
theorem 4.12 and conclude that the stochastic processes {W̃1 (t)}t≥0 , . . . {W̃N (t)}t≥0 given by

    W̃k (t) = Wk (t) + ∫₀^t θk (s) ds

are P̃-independent Brownian motions. Moreover these Brownian motions are P̃-martingales
relative to the filtration {FW (t)}t≥0 .
Now let {hS1 (t)}t≥0 , . . . , {hSN (t)}t≥0 be {FW (t)}t≥0 -adapted stochastic processes repre-
senting the number of shares of the stocks in a portfolio invested in the N + 1 dimensional
stock market. The portfolio is self-financing if its value satisfies

    dV (t) = Σ_{k=1}^N hSk (t)dSk (t) + R(t)( V (t) − Σ_{k=1}^N hSk (t)Sk (t) ) dt.
Theorem 6.13. Assume that a risk-neutral probability P̃ exists, i.e., the equations (6.72)
admit a solution. Then the discounted value of any self-financing portfolio invested in the
N +1 dimensional market is a P̃-martingale relative to the filtration {FW (t)}t≥0 . In particular
(by Theorem 3.16) there exists no self-financing arbitrage portfolio invested in the N + 1
dimensional stock market.
Proof. The discounted value of the portfolio satisfies

    dV ∗ (t) = D(t)( Σ_{j=1}^N hSj (t)Sj (t)(µj (t) − R(t)) dt + Σ_{j,k=1}^N hSj (t)Sj (t)σjk (t)dWk (t) )
             = D(t)( Σ_{j=1}^N hSj (t)Sj (t) Σ_{k=1}^N σjk (t)θk (t) dt + Σ_{j,k=1}^N hSj (t)Sj (t)σjk (t)dWk (t) )
             = D(t) Σ_{j=1}^N hSj (t)Sj (t) Σ_{k=1}^N σjk (t)dW̃k (t).
The value {V (t)}t≥0 of this portfolio satisfies
dV (t) = hS1 (t)dS1 (t) + hS2 (t)dS2 (t) + hS3 (t)dS3 (t)
+ r(V (t) − hS1 (t)S1 (t) − hS2 (t)S2 (t) − hS3 (t)S3 (t))dt
    = rV (t)dt + (1 − r)dt.

Hence

    V (t) = V (0)e^{rt} + (1/r)(1 − r)(e^{rt} − 1)

and this portfolio is an arbitrage, because for V (0) = 0 we have V (t) > 0, for all t > 0.
Similarly one can find an arbitrage portfolio for r > 1.
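The closed-form solution of the linear ODE dV = rV dt + (1 − r)dt can be verified against a simple Euler discretization; the Python sketch below (names are illustrative) also confirms that V (t) > 0 when V (0) = 0 and r < 1:

```python
import math

def portfolio_value(t, r, V0):
    """Closed-form solution of dV = rV dt + (1 - r) dt:
    V(t) = V0 e^{rt} + (1/r)(1 - r)(e^{rt} - 1)."""
    return V0 * math.exp(r * t) + (1.0 - r) * (math.exp(r * t) - 1.0) / r

def euler_value(t, r, V0, n=100000):
    """Euler discretization of the same ODE, used to confirm the formula."""
    dt, V = t / n, V0
    for _ in range(n):
        V += (r * V + (1.0 - r)) * dt
    return V
```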
Next we address the question of completeness of N + 1 dimensional stock markets, i.e.,
the question of whether any European derivative can be hedged in this market. Consider a
European derivative on the stocks with pay-off Y and time of maturity T . For instance, for
a standard European derivative, Y = g(S1 (T ), . . . SN (T )), for some measurable function g.
The risk-neutral price of the derivative is

    ΠY (t) = Ẽ[ Y exp(− ∫_t^T R(s) ds) | F(t) ],

and coincides with the value at time t of any self-financing portfolio invested in the N + 1
dimensional market. The question of existence of a hedging portfolio is answered by the
following theorem.
Theorem 6.14. Assume that the volatility matrix (σjk (t))j,k=1,...N is invertible, for all t ≥ 0.
There exist stochastic processes {∆1 (t)}t∈[0,T ] , . . . {∆N (t)}t∈[0,T ] , adapted to {FW (t)}t≥0 , such
that

    D(t)ΠY (t) = ΠY (0) + Σ_{k=1}^N ∫₀^t ∆k (s)dW̃k (s),    t ∈ [0, T ].    (6.73)

Moreover, the portfolio whose positions on the stocks solve the linear system

    Σ_{j=1}^N hSj (t)Sj (t)σjk (t) = ∆k (t)/D(t),    k = 1, . . . , N,

is self-financing and replicates the derivative at any time, i.e., its value V (t) is equal to
ΠY (t) for all t ∈ [0, T ]. In particular, V (T ) = ΠY (T ) = Y , i.e., the portfolio is hedging the
derivative.
The proof of this theorem is conceptually very similar to that of Theorem 6.2 and is
therefore omitted (it makes use of the multidimensional version of the martingale represen-
tation theorem). Notice that, having assumed that the volatility matrix is invertible, the
risk-neutral probability measure of the market is unique. We now show that the uniqueness
of the risk-neutral probability measure is necessary to guarantee completeness. In fact, let
r = 1 in the example considered before and pick the following solutions of the market price
of risk equations:
(θ1 , θ2 ) = (0, 1/2), and (θ1 , θ2 ) = (1, 0)
(any other pair of solutions would work). The two corresponding risk-neutral probability
measures, denoted respectively by P̃ and P̂, are given by

    P̃(A) = E[Z̃ IA ],    P̂(A) = E[Ẑ IA ],    for all A ∈ F,

where

    Z̃ = e^{−T/8 − W2 (T )/2},    Ẑ = e^{−T/2 − W1 (T )}.

Let A = {ω : (1/2)W2 (T, ω) − W1 (T, ω) > (3/8)T }. Hence

    Z̃(ω) < Ẑ(ω),    for ω ∈ A
at the time of maturity T . Let us find the risk-neutral price ΠY (t) of the derivative at time
t ∈ [0, T ). Letting r > 0 be the interest rate of the bond, (σjk )j,k=1,2 be the volatility matrix
of the stocks and σj = (σj1 , σj2 ), j = 1, 2, we have

    Sj (t) = Sj (0) e^{(r − |σj |²/2)t + σj ·W̃(t)},

where W̃(t) = (W̃1 (t), W̃2 (t)) and · denotes the standard scalar product of vectors. Hence,
with τ = T − t,

    ΠY (t) = e^{−rτ} Ẽ[ ( S1 (T )/S2 (T ) − K )+ | F(t) ]
           = e^{−rτ} Ẽ[ ( (S1 (t)/S2 (t)) e^{(|σ2 |²/2 − |σ1 |²/2)τ + (σ1 −σ2 )·(W̃(T )−W̃(t))} − K )+ | F(t) ].
Now we write

    (σ1 − σ2 ) · (W̃(T ) − W̃(t)) = √τ [(σ11 − σ21 )G1 + (σ12 − σ22 )G2 ] = √τ (X1 + X2 ),

where Gj = (W̃j (T ) − W̃j (t))/√τ ∈ N (0, 1), j = 1, 2, hence Xj ∈ N (0, (σ1j − σ2j )²), j = 1, 2.
In addition, X1 , X2 are independent random variables, hence, as shown in Section 3.3, X1 + X2
is normally distributed with zero mean and variance (σ11 − σ21 )² + (σ12 − σ22 )² = |σ1 − σ2 |².
It follows that

    ΠY (t) = e^{−rτ} Ẽ[ ( (S1 (t)/S2 (t)) e^{(|σ2 |²/2 − |σ1 |²/2)τ + √τ |σ1 −σ2 |G} − K )+ ],

where G ∈ N (0, 1). Set r̂ = (|σ2 |² − |σ1 |²)/2 + |σ1 − σ2 |²/2 and a = e^{(r̂−r)τ} .
Up to the multiplicative parameter a, this is the Black-Scholes price of a call on a stock with
price S1 (t)/S2 (t), volatility |σ1 − σ2 | and for an interest rate of the bond given by r̂. Hence,
Theorem 6.5 gives

    ΠY (t) = a [ (S1 (t)/S2 (t)) Φ(d+ ) − Ke^{−r̂τ} Φ(d− ) ] := v(t, S1 (t), S2 (t)),

where

    d± = [ log( S1 (t)/(KS2 (t)) ) + (r̂ ± |σ1 − σ2 |²/2)τ ] / ( |σ1 − σ2 | √τ ).
As to the self-financing hedging portfolio, it can be shown, by an argument similar to the
one used in the 1+1 dimensional case (see Theorem 6.4), that one such portfolio is given by
hSj (t) = ∂v/∂xj (t, S1 (t), S2 (t)), j = 1, 2. Therefore, recalling the delta function of the standard
European call (see Theorem 6.6), we obtain

    hS1 (t) = (a/S2 (t)) Φ(d+ ),    hS2 (t) = −(aS1 (t)/S2 (t)²) Φ(d+ ).    (6.77a)

As usual,

    hB (t) = ( ΠY (t) − Σ_{j=1}^2 hSj (t)Sj (t) ) / B(t).    (6.77b)
Exercise 6.17. Show that the portfolio (6.77) is self-financing and hedges the derivative.
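The pricing formula and the hedging positions (6.77a) can be evaluated numerically. In the following Python sketch the parameters r̂ (written r_hat) and a are taken as inputs, and vol stands for |σ1 − σ2 |; function names are illustrative:

```python
import math

def Phi(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def exchange_option(a, S1, S2, K, r_hat, vol, tau):
    """Price v(t, S1, S2) = a*(S1/S2 * Phi(d+) - K e^{-r_hat tau} Phi(d-))
    and the hedging positions (6.77a) for the derivative with pay-off
    a*(S1(T)/S2(T) - K)_+; vol = |sigma1 - sigma2|, tau = T - t."""
    d_plus = (math.log(S1 / (K * S2)) + (r_hat + 0.5 * vol ** 2) * tau) \
             / (vol * math.sqrt(tau))
    d_minus = d_plus - vol * math.sqrt(tau)
    price = a * (S1 / S2 * Phi(d_plus) - K * math.exp(-r_hat * tau) * Phi(d_minus))
    h1 = a / S2 * Phi(d_plus)                # shares of the first stock
    h2 = -a * S1 / S2 ** 2 * Phi(d_plus)     # shares of the second stock
    return price, h1, h2
```

Note that hS2 (t) = −(S1 (t)/S2 (t)) hS1 (t), so the hedge is long the first stock and short the second.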
Any reasonable definition of fair price for American derivatives must satisfy (i)-(ii).
Definition 6.8. A time t ∈ (0, T ] is said to be an optimal exercise time for the American
derivative with intrinsic value Y (t) if Π̃Y (t) = Y (t).

Hence by exercising the derivative at an optimal exercise time t, the buyer takes full
advantage of the derivative: the resulting pay-off equals the value of the derivative. On the
other hand, if Π̃Y (t) > Y (t) and the buyer wants to close the (long) position on the American
derivative, then the optimal strategy is to sell the derivative, thereby cashing the amount
Π̃Y (t).
Theorem 6.15. Assume (i) holds and let C(t) be the price of an American call at time
t ∈ [0, T ]. Assume further that the underlying stock price follows a generalized geometric
Brownian motion and that the interest rate R(t) of the money market is strictly positive for
all times. Then C(t) > Y (t), for all t ∈ [0, T ). In particular it is never optimal to exercise
the call prior to maturity.
Proof. We can assume S(t) ≥ K, as for S(t) < K the claim is obvious (since C(t) ≥ 0).
Denoting by D(t) = exp(− ∫₀^t R(s)ds) the discounting process, the price of the European
call is Ẽ[(S(T ) − K)+ D(T )/D(t)|FW (t)], hence by (i):

    C(t) ≥ Ẽ[(S(T ) − K)+ D(T )/D(t)|FW (t)] ≥ Ẽ[(S(T ) − K)D(T )/D(t)|FW (t)]
         = Ẽ[S(T )D(T )/D(t)|FW (t)] − K Ẽ[D(T )/D(t)|FW (t)] > D(t)⁻¹ Ẽ[S ∗ (T )|FW (t)] − K
         = S(t) − K,
where we used D(T )/D(t) < 1 (by the positivity of the interest rate R(t)) and the martingale
property of the discounted price {S ∗ (t)}t∈[0,T ] of the stock.
It follows that under the assumptions of the previous theorem, American and European
call options have the same value.
Remark 6.14. The result is valid for general standard American derivatives with convex
pay-off function, see [21, Section 8.5].
Remark 6.15. A notable exception to the assumed conditions in Theorem 6.15 is when the
underlying stock pays a dividend. In this case it can be shown that it is optimal to exercise
the American call immediately before the dividend is paid, provided the price of the stock
is sufficiently high, see Section 6.9.2 below.
Definition 6.9. Let T ∈ (0, ∞). A random variable τ : Ω → [0, T ] is called a stopping
time for the filtration {FW (t)}t≥0 if {τ ≤ t} ∈ FW (t), for all t ∈ [0, T ]. We denote by QT
the set of all stopping times for the filtration {FW (t)}t≥0 .
Think of τ as the time at which some random event takes place. Then τ is a stopping
time if the occurrence of the event before or at time t can be inferred by the information
available up to time t (no future information is required). For the applications that we have
in mind, τ will be the optimal exercise time of an American derivative, which marks the
event that the price of the derivative equals its intrinsic value.
From now on we assume that the market has constant parameters and r > 0. Hence the
price of the stock is given by the geometric Brownian motion

    S(t) = S(0)e^{(r − σ²/2)t + σW̃(t)}.

We recall that in this case the price ΠY (0, T ) at time t = 0 of a European derivative with
pay-off Y = g(S(T )) at maturity time T > 0 is given by

    ΠY (0, T ) = Ẽ[ e^{−rT} g( S(0)e^{(r − σ²/2)T + σW̃(T )} ) ].
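This expectation can be approximated by Monte Carlo simulation, sampling W̃(T ) ∈ N (0, T ). The following Python sketch (names and parameter values are illustrative) is a direct discretization of the formula above:

```python
import math
import random

def mc_price(g, S0, r, sigma, T, n=100000, seed=1):
    """Monte Carlo estimate of
    Pi_Y(0,T) = E~[ e^{-rT} g( S0 e^{(r - sigma^2/2)T + sigma W~(T)} ) ],
    sampling W~(T) ~ N(0, T)."""
    rng = random.Random(seed)
    disc = math.exp(-r * T)
    total = 0.0
    for _ in range(n):
        W = rng.gauss(0.0, math.sqrt(T))
        total += g(S0 * math.exp((r - 0.5 * sigma ** 2) * T + sigma * W))
    return disc * total / n
```

As a sanity check, taking g(s) = s should return approximately S(0), by the martingale property of the discounted stock price in the risk-neutral measure.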
Now, if the writer of the American derivative were sure that the buyer would exercise at the
time u ∈ (0, T ], then the fair price of the American derivative at time t = 0 would be equal to
ΠY (0, u). As the writer cannot anticipate when the buyer will exercise, we would be tempted
to define the price of the American derivative at time zero as max{ΠY (0, u), 0 ≤ u ≤ T }.
However this definition would actually be unfair, as it does not take into account the fact
that the exercise time is a stopping time, i.e., it is random and it cannot be inferred using
future information. This leads us to the following definition.
Definition 6.10. In a market with constant parameters, the fair price at time t = 0 of the
standard American derivative with intrinsic value Y (t) = g(S(t)) and maturity T > 0 is
given by

    Π̃(0) = max_{τ ∈ QT} Ẽ[ e^{−rτ} g( S(0)e^{(r − σ²/2)τ + σW̃(τ )} ) ].    (6.78)
It is not possible in general to find a closed formula for the price of an American
derivative. A notable exception is the price of perpetual American put options, which we
discuss next.
which is always positive. Thus the graph of vL∗ (x) always lies above K − x for x > L∗ .
It follows that it is not optimal to exercise the derivative if S(0) > L∗ .

(iv) In the perpetual case, any time is equivalent to t = 0, as the time left to maturity is
always infinite. Hence

    Π̃(t) = vL∗ (S(t)).

In conclusion the theorem tells us that the buyer of the derivative should exercise as
soon as the stock price falls below the threshold L∗ . In fact we can reformulate the theorem
in the following terms:
Theorem 6.17. The maximum of Ẽ[e^{−rτ} (K − S(τ ))+ ] over all possible τ ∈ Q is achieved
at τ = τ∗ , where

    τ∗ = min{t ≥ 0 : S(t) = L∗ }.

Moreover Ẽ[e^{−rτ∗} (K − S(τ∗ ))+ ] = vL∗ (S(0)).
For the proof of Theorem 6.16 we need the optional sampling theorem:
Theorem 6.18. Let {X(t)}t≥0 be an adapted process and τ a stopping time. Let t ∧ τ =
min(t, τ ). If {X(t)}t≥0 is a martingale/supermartingale/submartingale, then {X(t ∧ τ )}t≥0 is
also a martingale/supermartingale/submartingale.
We can now prove Theorem 6.16. We divide the proof in two steps, which correspond
respectively to Theorem 8.3.5 and Corollary 8.3.6 in [21].
Step 1: The stochastic process {e−r(t∧τ ) vL∗ (S(t ∧ τ ))}t≥0 is a super-martingale for all τ ∈ Q.
Moreover for S(0) > L∗ the stochastic process {e−r(t∧τ∗ ) vL∗ (S(t ∧ τ∗ ))}t≥0 is a martingale.
By Itô’s formula,

    d(e^{−rt} vL∗ (S(t))) = e^{−rt} [ −rvL∗ (S(t)) + rS(t)v′L∗ (S(t)) + (1/2)σ²S(t)² v′′L∗ (S(t)) ] dt
                          + e^{−rt} σS(t)v′L∗ (S(t))dW̃(t).

The drift term is zero for S(t) > L∗ and it is equal to −rK dt for S(t) ≤ L∗ . Hence

    e^{−rt} vL∗ (S(t)) = vL∗ (S(0)) − rK ∫₀^t e^{−ru} I_{S(u)≤L∗} du + ∫₀^t e^{−ru} σS(u)v′L∗ (S(u))dW̃(u).

Since the drift term is non-positive, {e^{−rt} vL∗ (S(t))}t≥0 is a supermartingale and thus by the
optional sampling theorem, the process {e^{−r(t∧τ )} vL∗ (S(t ∧ τ ))}t≥0 is also a supermartingale,
for all τ ∈ Q. Now, if S(0) > L∗ , then, by continuity of the paths of the geometric Brownian
motion, S(u, ω) > L∗ as long as u < τ∗ (ω). Hence by stopping the process at τ∗ the stock
price will never fall below L∗ and therefore the drift term vanishes, that is

    e^{−r(t∧τ∗ )} vL∗ (S(t ∧ τ∗ )) = vL∗ (S(0)) + ∫₀^{t∧τ∗} e^{−ru} σS(u)v′L∗ (S(u))dW̃(u).
The Itô integral is a martingale and thus the Itô integral stopped at time τ∗ is also a
martingale by the optional sampling theorem. The claim follows.
Step 2: The identity (6.80) holds. The supermartingale property of the process
{e^{−r(t∧τ )} vL∗ (S(t ∧ τ ))}t≥0 implies that its expectation is non-increasing, hence

    Π̃(0) = max_{τ ∈ Q} Ẽ[e^{−rτ} (K − S(τ ))+ ] ≤ vL∗ (S(0)).

Moreover e^{−rτ∗} vL∗ (S(τ∗ )) = e^{−rτ∗} vL∗ (L∗ ) = e^{−rτ∗} (K − S(τ∗ ))+ , hence, by the martingale
property established in Step 1,

    Ẽ[e^{−rτ∗} (K − S(τ∗ ))+ ] = Ẽ[e^{−rτ∗} vL∗ (S(τ∗ ))] = vL∗ (S(0)).

It follows that

    Π̃(0) = max_{τ ∈ Q} Ẽ[e^{−rτ} (K − S(τ ))+ ] ≥ vL∗ (S(0)),

which together with the previous inequality proves (6.80).
Definition 6.12. A portfolio process {hS (t), hB (t)}t≥0 is said to be replicating the perpetual
American put if its value {V (t)}t≥0 equals Π̃(t) for all t ≥ 0.
Thus by setting up a replicating portfolio, the writer of the perpetual American put is
sure to always be able to afford to pay off the buyer. Note that in the European case a
self-financing hedging portfolio is trivially replicating, as the price of European derivatives
has been defined as the value of such portfolios. However in the American case a replicating
portfolio need not be self-financing: if the buyer does not exercise at an optimal exercise
time, the writer must withdraw cash from the portfolio in order to replicate the derivative.
This leads to the definition of portfolio generating a cash flow.
Definition 6.13. A portfolio {hS (t), hB (t)}t≥0 with value {V (t)}t≥0 is said to generate a
cash flow with rate c(t) if {c(t)}t≥0 is adapted to {FW (t)}t≥0 and

    dV (t) = hS (t)dS(t) + hB (t)dB(t) − c(t)dt.    (6.81)
Remark 6.16. Note that the cash flow has been defined so that c(t) > 0 when the investor
withdraws cash from the portfolio (causing a decrease of its value).
is replicating the perpetual American put while generating the cash flow c(t) = rK I_{S(t)<L∗}
(i.e., cash is withdrawn at the rate rK whenever S(t) < L∗ , provided of course the buyer
does not exercise the derivative).
Hence (6.82) reduces to (6.81) with c(t) = rKIS(t)<L∗ , and the proof is complete.
6.9.2 American calls on a dividend-paying stock
Let ĉa (t, S(t), K, T ) denote the Black-Scholes price at time t of the American call with strike
K and maturity T assuming that the underlying stock pays the dividend aS(t₀⁻) at time t₀ ∈
(0, T ). We denote by ca (t, S(t), K, T ) the Black-Scholes price of the corresponding European
call. We omit the subscript a to denote prices in the absence of dividends. Moreover replacing
the letter c with the letter P gives the price of the corresponding put option. We say that
it is optimal to exercise the American call at time t if its Black-Scholes price at this time
equals the intrinsic value of the call, i.e., ĉa (t, S(t), K, T ) = (S(t) − K)+ .
Theorem 6.20. Consider the American call with strike K and expiration date T and assume
that the underlying stock pays the dividend aS(t₀⁻) at the time t₀ ∈ (0, T ). Then

    ĉa (t, S(t), K, T ) > (S(t) − K)+ ,    for t ∈ [t₀ , T ),

i.e., it is not optimal to exercise the American call prior to maturity after the dividend is
paid. Moreover, there exists δ > 0 such that, if

    S(t₀⁻) > max( δ/(1 − a), K ),

then the equality

    ĉa (t₀⁻, S(t₀⁻), K, T ) = (S(t₀⁻) − K)+

holds, and so it is optimal to exercise the American call “just before” the dividend is to be
paid.
Proof. For the first claim we can assume (S(t) − K)+ = S(t) − K, otherwise the American
call is out of the money and so it is clearly not optimal to exercise. By Theorem 6.7 we have
Hence, by Theorem 6.5, the put-call parity holds after the dividend is paid:
where we used that P (t, S(t), K, T ) > 0 and r ≥ 0. This proves the first part of the theorem,
i.e., the fact that it is not optimal to exercise the American call prior to expiration after the
dividend has been paid. In particular

    ĉa (t₀⁻, S(t₀⁻), K, T ) = max( S(t₀⁻) − K, ca (t₀⁻, S(t₀⁻), K, T ) ).    (6.83)

Next we show that it is optimal to exercise the American call “just before the dividend is
paid”, i.e., ĉa (t₀⁻, S(t₀⁻), K, T ) = (S(t₀⁻) − K)+ , provided the price of the stock is sufficiently
high. Of course it must be S(t− 0 ) > K. Assume first that b ca (t− − −
0 , S(t0 ), K, T ) > S(t0 ) − K;
then, owing to (6.83), b ca (t− − − −
0 , S(t0 ), K, T ) = ca (t0 , S(t0 ), K, T ) (buying the American call
just before the dividend is paid is not better than buying the European call, since it is
never optimal to exercise the derivative prior to expiration). By Theorem (6.7) we have
ca (t− − − − −
0 , S(t0 ), K, T ) = c(t0 , (1 − a)S(t0 ), K, T ) = c(t0 , (1 − a)S(t0 ), K, T ), where for the
latter equality we used the continuity in time of the Black-Scholes price function in the
absence of dividends. Since (1 − a)S(t− 0 ) = S(t0 ), then
ca (t−
b − −
ca (t−
0 , S(t0 ), K, T ) > S(t0 ) − K ⇒ b
−
0 , S(t0 ), K, T ) = c(t0 , S(t0 ), K, T ).
Hence
ca (t−
b − − − −
0 , S(t0 ), K, T ) > S(t0 ) − K ⇒ c(t0 , S(t0 ), K, T ) > S(t0 ) − K = S(t0 ) + (1 − a)S(t0 ) − K.
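The inequality used in the first part of the proof, namely that with $r \ge 0$ the European call price dominates the intrinsic value, is easy to verify numerically. The sketch below (function names `bs_call`, `bs_put` are my own, not from the notes) computes the standard Black-Scholes call price and checks the domination for a few spot prices:

```python
from math import log, sqrt, exp, erf

def Phi(x):
    """Standard normal CDF, built from the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(t, s, K, T, r, sigma):
    """Black-Scholes price of a European call (no dividends)."""
    tau = T - t
    d1 = (log(s / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return s * Phi(d1) - K * exp(-r * tau) * Phi(d2)

def bs_put(t, s, K, T, r, sigma):
    """European put via put-call parity: P = c - S + K e^{-r tau}."""
    return bs_call(t, s, K, T, r, sigma) - s + K * exp(-r * (T - t))

# c = P + s - K e^{-r tau} > s - K, since P > 0 and K e^{-r tau} <= K for r >= 0.
t, K, T, r, sigma = 0.5, 100.0, 1.0, 0.03, 0.25
for s in (80.0, 100.0, 120.0):
    c = bs_call(t, s, K, T, r, sigma)
    print(s, c)
    assert c > max(s - K, 0.0)
```

Since the call price strictly exceeds the intrinsic value at every spot level, exercising early can never be optimal in the absence of dividends, which is the content of the first claim.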
Exercise 6.5. The pay-off function is $g(z) = k + z \log z$. Hence the Black-Scholes price of
the derivative is $\Pi_Y(t) = v(t, S(t))$, where
$$v(t,s) = e^{-r\tau} \int_{\mathbb{R}} g\Big(s e^{(r - \frac{\sigma^2}{2})\tau - \sigma\sqrt{\tau}\,x}\Big) e^{-\frac{x^2}{2}} \frac{dx}{\sqrt{2\pi}}$$
$$= e^{-r\tau} \int_{\mathbb{R}} \Big[k + s e^{(r - \frac{\sigma^2}{2})\tau - \sigma\sqrt{\tau}\,x}\Big(\log s + \big(r - \frac{\sigma^2}{2}\big)\tau - \sigma\sqrt{\tau}\,x\Big)\Big] e^{-\frac{x^2}{2}} \frac{dx}{\sqrt{2\pi}}$$
$$= k e^{-r\tau} + s \log s \int_{\mathbb{R}} e^{-\frac{1}{2}(x + \sigma\sqrt{\tau})^2} \frac{dx}{\sqrt{2\pi}} + s\big(r - \frac{\sigma^2}{2}\big)\tau \int_{\mathbb{R}} e^{-\frac{1}{2}(x + \sigma\sqrt{\tau})^2} \frac{dx}{\sqrt{2\pi}} - s\sigma\sqrt{\tau} \int_{\mathbb{R}} x\, e^{-\frac{1}{2}(x + \sigma\sqrt{\tau})^2} \frac{dx}{\sqrt{2\pi}}.$$
Using that
$$\int_{\mathbb{R}} e^{-\frac{1}{2}(x + \sigma\sqrt{\tau})^2} \frac{dx}{\sqrt{2\pi}} = 1, \qquad \int_{\mathbb{R}} x\, e^{-\frac{1}{2}(x + \sigma\sqrt{\tau})^2} \frac{dx}{\sqrt{2\pi}} = -\sigma\sqrt{\tau},$$
we obtain
$$v(t,s) = k e^{-r\tau} + s \log s + s\Big(r + \frac{\sigma^2}{2}\Big)\tau.$$
Hence
$$\Pi_Y(t) = k e^{-r\tau} + S(t) \log S(t) + S(t)\Big(r + \frac{\sigma^2}{2}\Big)\tau.$$
This completes the first part of the exercise. The number of shares of the stock in the hedging
portfolio is given by $h_S(t) = \Delta(t, S(t))$, where
$$\Delta(t,s) = \frac{\partial v}{\partial s} = \log s + 1 + \Big(r + \frac{\sigma^2}{2}\Big)\tau.$$
Hence
$$h_S(t) = 1 + \Big(r + \frac{\sigma^2}{2}\Big)\tau + \log S(t).$$
The number of shares of the bond is obtained by using that
$$h_S(t)S(t) + h_B(t)B(t) = \Pi_Y(t), \qquad B(t) = e^{rt},$$
hence
$$h_B(t) = \frac{1}{B(t)}\big(\Pi_Y(t) - h_S(t)S(t)\big)
= e^{-rt}\Big(k e^{-r\tau} + S(t)\log S(t) + S(t)\Big(r + \frac{\sigma^2}{2}\Big)\tau - S(t) - S(t)\Big(r + \frac{\sigma^2}{2}\Big)\tau - S(t)\log S(t)\Big)
= k e^{-rT} - S(t)e^{-rt}.$$
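The closed-form price and hedge derived above can be sanity-checked numerically. The sketch below (function names are illustrative) compares the formula for $v$ with a risk-neutral Monte Carlo estimate of $e^{-r\tau}\,E[g(S(T))]$, and the formula for $\Delta$ can likewise be compared with a finite-difference quotient of $v$:

```python
import math
import random

def exact_price(s, k, r, sigma, tau):
    """Closed formula v(t,s) = k e^{-r tau} + s log s + s (r + sigma^2/2) tau."""
    return k * math.exp(-r * tau) + s * math.log(s) + s * (r + 0.5 * sigma**2) * tau

def exact_delta(s, k, r, sigma, tau):
    """Closed formula Delta(t,s) = log s + 1 + (r + sigma^2/2) tau."""
    return math.log(s) + 1.0 + (r + 0.5 * sigma**2) * tau

def mc_price(s, k, r, sigma, tau, n=200_000, seed=1):
    """Monte Carlo estimate of e^{-r tau} E[g(S(T))] with g(z) = k + z log z,
    where S(T) = s exp((r - sigma^2/2) tau + sigma sqrt(tau) Z), Z ~ N(0,1)."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * tau
    vol = sigma * math.sqrt(tau)
    total = 0.0
    for _ in range(n):
        sT = s * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += k + sT * math.log(sT)
    return math.exp(-r * tau) * total / n

s, k, r, sigma, tau = 100.0, 1.0, 0.05, 0.3, 0.75
print("closed form:", exact_price(s, k, r, sigma, tau))
print("Monte Carlo:", mc_price(s, k, r, sigma, tau))
```

The two numbers should agree up to the Monte Carlo sampling error (a fraction of a percent with the sample size above).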
This completes the second part of the exercise. To compute the probability that $Y > 0$,
we first observe that the pay-off function $g(z)$ has a minimum at $z = e^{-1}$ and we have
$g(e^{-1}) = k - e^{-1}$. Hence if $k \ge e^{-1}$, the derivative has probability 1 to expire in the money.
If $k < e^{-1}$, there exist $a < b$ such that $g(z) > 0$ if and only if $z < a$ or $z > b$, and
$$S(T) < a \Leftrightarrow G > \frac{\log\frac{S(0)}{a} + \alpha T}{\sigma\sqrt{T}} := A, \qquad S(T) > b \Leftrightarrow G < \frac{\log\frac{S(0)}{b} + \alpha T}{\sigma\sqrt{T}} := B.$$
Thus
$$\mathbb{P}(Y > 0) = \mathbb{P}(G > A) + \mathbb{P}(G < B) = \int_A^{+\infty} e^{-\frac{x^2}{2}} \frac{dx}{\sqrt{2\pi}} + \int_{-\infty}^{B} e^{-\frac{x^2}{2}} \frac{dx}{\sqrt{2\pi}} = 1 - \Phi(A) + \Phi(B).$$
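For $k < e^{-1}$ the roots $a < b$ of $g(z) = 0$ have no closed form, but the probability above is easy to evaluate numerically. A minimal sketch (assuming, as in the notes, that $\alpha$ is the drift of $\log S(t)$ and $G$ is a standard normal variable; function names are my own):

```python
from math import log, sqrt, exp, erf

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def g(z, k):
    """Pay-off function g(z) = k + z log z."""
    return k + z * log(z)

def bisect(f, lo, hi, n=200):
    """Simple bisection; assumes f(lo) and f(hi) have opposite signs."""
    for _ in range(n):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def prob_in_the_money(S0, k, alpha, sigma, T):
    """P(Y > 0) = 1 - Phi(A) + Phi(B) for k < 1/e; equals 1 for k >= 1/e."""
    e1 = exp(-1.0)
    if k >= e1:
        return 1.0
    # g > 0 near z = 0 (z log z -> 0) and at z = 1 (g(1) = k), g < 0 at z = 1/e
    a = bisect(lambda z: g(z, k), 1e-12, e1)  # root below the minimum
    b = bisect(lambda z: g(z, k), e1, 1.0)    # root above the minimum
    A = (log(S0 / a) + alpha * T) / (sigma * sqrt(T))
    B = (log(S0 / b) + alpha * T) / (sigma * sqrt(T))
    return 1.0 - Phi(A) + Phi(B)

print(prob_in_the_money(1.0, 0.1, 0.05, 0.2, 1.0))
```

Since $a < b$ implies $A > B$, the result $1 - \Phi(A) + \Phi(B)$ always lies in $(0, 1)$, as a probability should.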
where the function $f$ and its derivatives are evaluated at $(t, \sigma^2(t), Q(t))$. As the discounted
risk-neutral price must be a martingale in the risk-neutral probability measure, we need the
drift term in the above equation to be zero. This is achieved by imposing that $f$ satisfies
the PDE
$$\partial_t f + a(b - x)\partial_x f + x\,\partial_y f + \frac{c^2}{2}x\,\partial_x^2 f = rf. \qquad (6.85)$$
Since $\Pi_Y(T) = Y = f(T, \sigma^2(T), Q(T))$, the terminal condition is
$$f(T, x, y) = N\Big(\sqrt{\frac{\kappa y}{T}} - K\Big)_+.$$
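No closed formula for $f$ is derived here, but the terminal-value problem can be sanity-checked by Monte Carlo: simulating the variance process $d\sigma^2 = a(b - \sigma^2)dt + c\,\sigma\,d\widetilde{W}$ together with $Q(t) = \int_0^t \sigma^2(s)\,ds$ and discounting the pay-off $Y = N(\sqrt{\kappa Q(T)/T} - K)_+$ gives, by the martingale property used above, an estimate of $f(0, \sigma^2(0), 0)$. A rough sketch, with illustrative parameter values and an Euler scheme truncated at zero:

```python
import math
import random

def mc_vol_call(sigma2_0, a, b, c, kappa, N, K, r, T,
                n_paths=5000, n_steps=100, seed=7):
    """Monte Carlo sketch of e^{-rT} E[N (sqrt(kappa Q(T)/T) - K)_+], where
    d sigma^2 = a (b - sigma^2) dt + c sigma dW and Q(t) = int_0^t sigma^2 ds.
    Euler scheme, truncated at zero so sigma^2 stays nonnegative."""
    rng = random.Random(seed)
    dt = T / n_steps
    sq_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x = sigma2_0  # current variance sigma^2(t)
        q = 0.0       # running integral Q(t)
        for _ in range(n_steps):
            q += x * dt
            x += a * (b - x) * dt + c * math.sqrt(max(x, 0.0)) * sq_dt * rng.gauss(0.0, 1.0)
            x = max(x, 0.0)
        total += N * max(math.sqrt(kappa * q / T) - K, 0.0)
    return math.exp(-r * T) * total / n_paths

# illustrative parameters: sigma^2(0) = b = 0.04, strong mean reversion a = 2
print(mc_vol_call(0.04, 2.0, 0.04, 0.3, 1.0, 100.0, 0.15, 0.02, 1.0))
```

A numerical solution of (6.85) with the terminal condition above should reproduce this estimate at $(0, \sigma^2(0), 0)$ up to sampling and discretization error.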
Bibliography
[1] L. Arnold: Stochastic differential equations. Theory and applications. Wiley Interscience
(1974)
[2] F. Black, M. Scholes: The Pricing of Options and Corporate Liabilities. The Journal of
Political Economy 81, 637–654 (1973)
[3] D. Brigo, F. Mercurio: Interest Rate Models - Theory and Practice. Second Ed. Springer
(2006)
[4] R. Cont, P. Tankov: Financial Modelling With Jump Processes. Taylor & Francis (2004)
[5] S. Calogero: Introduction to options pricing theory. Lecture notes for the course "Options
and Mathematics", Chalmers
[6] R. M. Dudley: Real Analysis and Probability. Cambridge studies in advanced mathe-
matics (2002)
[9] W. Feller: Two singular diffusion problems. Ann. Math. 54, 173–182 (1951)
[10] A. Friedman: Partial Differential Equations of Parabolic type. Dover Publications, Inc.
New York (2008)
[12] S. L. Heston: A Closed-Form Solution for Options with Stochastic Volatility with Ap-
plications to Bond and Currency Options. The Review of Financial Studies 6, 327–343
(1993)
[14] I. Karatzas, S. E. Shreve: Brownian Motion and Stochastic Calculus. Second Edition,
Springer Verlag (1998)
[15] D. Lamberton, B. Lapeyre: Introduction to Stochastic Calculus applied to Finance (2nd
ed.). Chapman & Hall/CRC Financial Mathematics Series (2008)
[16] A. E. Lindsay, D. R. Brecher: Simulation of the CEV model and local martingale
property.
[19] D. Revuz, M. Yor: Continuous martingales and Brownian motion. Third edition,
Springer (2001)
[20] M. Schroder: Computing the Constant Elasticity of Variance Option Pricing Formula.
The Journal of Finance 44, 211-219 (1989)
[21] S. E. Shreve: Stochastic Calculus for Finance II. Springer Finance (2008)