An Introduction To Probability Theory - Geiss
Contents

1 Probability spaces
  1.1 Definition of σ-algebras
  1.2 Probability measures
  1.3 Examples of distributions
    1.3.1 Binomial distribution with parameter 0 < p < 1
    1.3.2 Poisson distribution with parameter λ > 0
    1.3.3 Geometric distribution with parameter 0 < p < 1
    1.3.4 Lebesgue measure and uniform distribution
    1.3.5 Gaussian distribution on ℝ with mean m and variance σ² > 0
    1.3.6 Exponential distribution on ℝ with parameter λ > 0
    1.3.7 Poisson's Theorem
  1.4 A set which is not a Borel set
2 Random variables
  2.1 Random variables
  2.2 Measurable maps
  2.3 Independence
3 Integration
  3.1 Definition of the expected value
  3.2 Basic properties of the expected value
  3.3 Connections to the Riemann-integral
  3.4 Change of variables in the expected value
  3.5 Fubini's Theorem
  3.6 Some inequalities
4 Modes of convergence
  4.1 Definitions
  4.2 Some applications
Introduction
The modern period of probability theory is connected with names like S.N.
Bernstein (1880-1968), E. Borel (1871-1956), and A.N. Kolmogorov (1903-1987). In particular, in 1933 A.N. Kolmogorov published his modern approach to Probability Theory, including the notion of a measurable space and a probability space. This lecture starts from this notion, continues with random variables and basic parts of integration theory, and finishes with some first limit theorems.
The lecture is based on a mathematical axiomatic approach and is intended for students of mathematics, but also for other students who need more mathematical background for their further studies. We assume that integration with respect to the Riemann-integral on the real line is known. The approach we follow may seem more difficult at the beginning, but once one has a solid basis, many things become easier and more transparent later. Let us start with an introductory example that leads us to a problem which should motivate our axiomatic approach.
Example. We would like to measure the temperature outside our home.
We can do this by an electronic thermometer which consists of a sensor
outside and a display, including some electronics, inside. The number we get from the system is not correct for several reasons: for instance, the calibration of the thermometer might not be correct, and the quality of the power supply and the inside temperature might have some impact on the electronics. It is impossible to describe all these sources of uncertainty explicitly. Hence
one is using probability. What is the idea?
Let us denote the exact temperature by T and the displayed temperature by S, so that the difference T − S is influenced by the above sources of uncertainty. If we measured simultaneously, using thermometers of the same type, we would get values S_1, S_2, ... with corresponding differences
D_1 := T − S_1,  D_2 := T − S_2,  D_3 := T − S_3, ...
Intuitively one expects that, for large n, the arithmetic mean (1/n) Σ_{i=1}^n D_i(ω) and the expected value E D_1 are close to each other. This leads us to the strong law of large numbers discussed in Section 4.2.
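A minimal simulation sketch of this idea (the Gaussian noise model, the numerical values, and the use of numpy are assumptions made only for illustration): the simulated errors D_i fluctuate, but their running average settles near E D_1 = 0.

    import numpy as np

    rng = np.random.default_rng(0)
    true_temperature = 20.0          # the exact temperature T (assumed value)
    noise_std = 0.5                  # assumed spread of the sensor errors

    # simulated displayed values S_1, ..., S_n and differences D_i = T - S_i
    n = 10_000
    S = true_temperature + rng.normal(0.0, noise_std, size=n)
    D = true_temperature - S

    running_mean = np.cumsum(D) / np.arange(1, n + 1)
    # the running average approaches E D_1 = 0 as n grows
    print(running_mean[9], running_mean[99], running_mean[999], running_mean[-1])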
Notation. Given a set Ω and subsets A, B ⊆ Ω, the following notation is used:

intersection:           A ∩ B = {ω ∈ Ω : ω ∈ A and ω ∈ B}
union:                  A ∪ B = {ω ∈ Ω : ω ∈ A or (or both) ω ∈ B}
set-theoretical minus:  A\B = {ω ∈ Ω : ω ∈ A and ω ∉ B}
complement:             A^c = {ω ∈ Ω : ω ∉ A}
empty set:              ∅ = set without any element
real numbers:           ℝ
natural numbers:        ℕ = {1, 2, 3, ...}
rational numbers:       ℚ
minimum of a and b:     a ∧ b := min{a, b}
Chapter 1
Probability spaces
In this chapter we introduce the probability space, the fundamental notion of probability theory. A probability space (Ω, F, P) consists of three components.

(1) The elementary events or states ω, which are collected in a non-empty set Ω.

Example 1.0.1 (a) If we roll a die, then all possible outcomes are the numbers between 1 and 6. That means
Ω = {1, 2, 3, 4, 5, 6}.
(b) If we flip a coin, then we have either heads or tails on top, that means
Ω = {H, T}.
If we have two coins, then we would get
Ω = {(H, H), (H, T), (T, H), (T, T)}.
(c) For the lifetime of a bulb in hours we can choose
Ω = [0, ∞).

(2) A σ-algebra F, which is the system of observable subsets of Ω. Given ω ∈ Ω and some A ∈ F, one cannot say which concrete ω occurs, but one can decide whether ω ∈ A or ω ∉ A. The sets A ∈ F are called events: an event A occurs if ω ∈ A and it does not occur if ω ∉ A.

Example 1.0.2 (a) The event "the die shows an even number" can be described by
A = {2, 4, 6}.
1.1 Definition of σ-algebras

The σ-algebra is a basic tool in probability theory. It is the system of sets on which probability measures are defined. Without this notion it would be impossible to consider the fundamental Lebesgue measure on the interval [0, 1] or to consider Gaussian measures, without which many parts of mathematics cannot live.

Definition 1.1.1 [σ-algebra, algebra, measurable space] Let Ω be a non-empty set. A system F of subsets A ⊆ Ω is called σ-algebra on Ω if
(1) ∅, Ω ∈ F,
(2) A ∈ F implies that A^c := Ω\A ∈ F,
(3) A_1, A_2, ... ∈ F implies that ⋃_{i=1}^∞ A_i ∈ F.
The pair (Ω, F), where F is a σ-algebra on Ω, is called measurable space.

Proposition 1.1.4 Let Ω be a non-empty set and let F_j, j ∈ J, be σ-algebras on Ω, where J is an arbitrary non-empty index set. Then
⋂_{j∈J} F_j
is a σ-algebra as well.
Proof. The proof is very easy, but typical and fundamental. First we notice that ∅, Ω ∈ F_j for all j ∈ J, so that ∅, Ω ∈ ⋂_{j∈J} F_j. Now let A, A_1, A_2, ... ∈ ⋂_{j∈J} F_j. Hence A, A_1, A_2, ... ∈ F_j for all j ∈ J, so that (the F_j are σ-algebras!)
A^c = Ω\A ∈ F_j  and  ⋃_{i=1}^∞ A_i ∈ F_j
for all j ∈ J. Consequently,
A^c ∈ ⋂_{j∈J} F_j  and  ⋃_{i=1}^∞ A_i ∈ ⋂_{j∈J} F_j.

Given a system G of subsets of Ω, let J be the set of all σ-algebras on Ω containing G. Then
σ(G) := ⋂_{C∈J} C
yields a σ-algebra according to Proposition 1.1.4 such that (by construction) G ⊆ σ(G). It remains to show that σ(G) is the smallest σ-algebra containing G. Assume another σ-algebra F with G ⊆ F. By definition of J we have that F ∈ J, so that
σ(G) = ⋂_{C∈J} C ⊆ F.

The construction is very elegant but has, as already mentioned, the slight disadvantage that one cannot explicitly construct all elements of σ(G). Let us now turn to one of the most important examples, the Borel σ-algebra on ℝ. To do this we need the notion of open and closed sets.
Proposition [Generation of the Borel σ-algebra B(ℝ)] We let
G_1 be the system of all open subsets of ℝ,
G_2 be the system of all closed subsets of ℝ,
G_3 be the system of all intervals (−∞, b], b ∈ ℝ,
G_4 be the system of all intervals (−∞, b), b ∈ ℝ,
G_5 be the system of all intervals (a, b], −∞ < a < b < ∞,
G_6 be the system of all intervals (a, b), −∞ < a < b < ∞.
Then σ(G_1) = σ(G_2) = σ(G_3) = σ(G_4) = σ(G_5) = σ(G_6) = B(ℝ).

In the proof one uses relations like
(a, b) = ⋃_{n=1}^∞ [(−∞, b) \ (−∞, a + 1/n)] ∈ σ(G_3)
and the fact that every open set A ⊆ ℝ is a countable union of open intervals: for all x ∈ A there is an ε_x > 0 such that (x − ε_x, x + ε_x) ⊆ A.
1.2 Probability measures

Definition [measure, probability measure] Let (Ω, F) be a measurable space.
(1) A map μ : F → [0, ∞] is called measure if μ(∅) = 0 and, for all A_1, A_2, ... ∈ F with A_i ∩ A_j = ∅ for i ≠ j, one has
μ(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ μ(A_i).   (1.1)
The triplet (Ω, F, μ) is called measure space.
(2) A measure space (Ω, F, μ) or a measure μ is called σ-finite provided that there are Ω_k ∈ F, k = 1, 2, ..., such that Ω = ⋃_{k=1}^∞ Ω_k and μ(Ω_k) < ∞ for all k.
The measure space (Ω, F, μ) or the measure μ are called finite if
μ(Ω) < ∞.
(3) A measure space (Ω, F, μ) is called probability space and μ probability measure provided that μ(Ω) = 1.

Example [Dirac measure] For x_0 ∈ Ω the Dirac measure δ_{x_0} : F → [0, 1] is defined by
δ_{x_0}(A) := 1 if x_0 ∈ A  and  δ_{x_0}(A) := 0 if x_0 ∉ A.
P(A_k) := [n!/(k!(n − k)!)] p^k (1 − p)^{n−k},  0 < p < 1.
Proposition [Properties of probability measures] Let (Ω, F, P) be a probability space. Then the following holds:
(1) P(∅) = 0.
(2) If A_1, ..., A_n ∈ F are pairwise disjoint, then P(⋃_{i=1}^n A_i) = Σ_{i=1}^n P(A_i).
(3) If A, B ∈ F, then P(A\B) = P(A) − P(A ∩ B).
(4) If B ∈ F, then P(B^c) = 1 − P(B).
(5) If A_1, A_2, ... ∈ F, then P(⋃_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ P(A_i).
(6) If A_1, A_2, ... ∈ F with A_1 ⊆ A_2 ⊆ A_3 ⊆ ..., then
lim_{n→∞} P(A_n) = P(⋃_{n=1}^∞ A_n).
(7) If A_1, A_2, ... ∈ F with A_1 ⊇ A_2 ⊇ A_3 ⊇ ..., then
lim_{n→∞} P(A_n) = P(⋂_{n=1}^∞ A_n).
Proof. (1) and (2): Apply σ-additivity (1.1) to the sequence A_1, ..., A_n, ∅, ∅, ...: taking all sets equal to ∅ gives
P(∅) = P(⋃_{n=1}^∞ ∅) = Σ_{n=1}^∞ P(∅),
hence P(∅) = 0, and then
P(⋃_{i=1}^n A_i) = P(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i) = Σ_{i=1}^n P(A_i),
because of P(∅) = 0.
(3) Since (A ∩ B) ∩ (A\B) = ∅ and (A ∩ B) ∪ (A\B) = A, we get from (2) that
P(A ∩ B) + P(A\B) = P(A).
(4) follows from (2) applied to B and B^c, since P(B) + P(B^c) = P(Ω) = 1.
(5) Put B_1 := A_1 and B_i := A_i \ (A_1 ∪ ... ∪ A_{i−1}), so that the B_i are pairwise disjoint, B_i ⊆ A_i, and ⋃_{i=1}^∞ B_i = ⋃_{i=1}^∞ A_i. Hence
P(⋃_{i=1}^∞ A_i) = P(⋃_{i=1}^∞ B_i) = Σ_{i=1}^∞ P(B_i) ≤ Σ_{i=1}^∞ P(A_i).
(6) Put B_1 := A_1 and B_n := A_n \ A_{n−1} for n ≥ 2, so that
⋃_{n=1}^∞ B_n = ⋃_{n=1}^∞ A_n  and  B_i ∩ B_j = ∅ for i ≠ j.
Consequently,
P(⋃_{n=1}^∞ A_n) = P(⋃_{n=1}^∞ B_n) = Σ_{n=1}^∞ P(B_n) = lim_{N→∞} Σ_{n=1}^N P(B_n) = lim_{N→∞} P(A_N),
since ⋃_{n=1}^N B_n = A_N. (7) is an exercise.
Definition [lim inf_n A_n and lim sup_n A_n] For A_1, A_2, ... ∈ F we let
lim inf_n A_n := ⋃_{n=1}^∞ ⋂_{k=n}^∞ A_k  and  lim sup_n A_n := ⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k.

The definition above says that ω ∈ lim inf_n A_n if and only if all events A_n, except a finite number of them, occur, and that ω ∈ lim sup_n A_n if and only if infinitely many of the events A_n occur.
Definition 1.2.6 [lim inf_n ξ_n and lim sup_n ξ_n] For ξ_1, ξ_2, ... ∈ ℝ we let
lim inf_n ξ_n := lim_n inf_{k≥n} ξ_k  and  lim sup_n ξ_n := lim_n sup_{k≥n} ξ_k.

Remark 1.2.7 (1) The value lim inf_n ξ_n is the infimum of all c such that there is a subsequence n_1 < n_2 < n_3 < ... such that lim_k ξ_{n_k} = c.
(2) The value lim sup_n ξ_n is the supremum of all c such that there is a subsequence n_1 < n_2 < n_3 < ... such that lim_k ξ_{n_k} = c.
(3) By definition one has that
lim inf_n ξ_n ≤ lim sup_n ξ_n.
(4) For events A_1, A_2, ... ∈ F one has ω ∈ lim sup_n A_n if and only if lim sup_n 1I_{A_n}(ω) = 1, and in particular
lim inf_n A_n ⊆ lim sup_n A_n.
For two events the independence of A and B means
P(A ∩ B) = P(A)P(B),
while for three events one also requires
P(A ∩ B ∩ C) = P(A)P(B)P(C);
note that choosing C = Ω recovers the condition for A and B. Moreover, for B ∈ F with 0 < P(B) < 1,
P(A) = P(A ∩ B) + P(A ∩ B^c) = P(A|B)P(B) + P(A|B^c)P(B^c).
This implies
P(B|A) = P(B ∩ A)/P(A) = P(A|B)P(B)/P(A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B^c)P(B^c)].
Let us consider an
Example 1.2.11 A laboratory blood test is 95% effective in detecting a
certain disease when it is, in fact, present. However, the test also yields a
false positive result for 1% of the healthy persons tested. If 0.5% of the
population actually has the disease, what is the probability a person has the
disease given his test result is positive? We set
B := person has the disease,
A := the test result is positive.
Hence we have
P(A|B) = 0.95,  P(A|B^c) = 0.01,  P(B) = 0.005.

More generally, if B_1, ..., B_n ∈ F form a partition of Ω with P(B_k) > 0 for all k, then for every A ∈ F with P(A) > 0 one has Bayes' formula
P(B_j|A) = P(A|B_j)P(B_j) / [Σ_{k=1}^n P(A|B_k)P(B_k)].
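Plugging these numbers into Bayes' formula answers Example 1.2.11; the arithmetic below is a worked computation added here for illustration:

P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B^c)P(B^c)]
       = (0.95 · 0.005) / (0.95 · 0.005 + 0.01 · 0.995)
       = 0.00475 / 0.0147 ≈ 0.323.

So even with a positive test the probability of actually having the disease is only about 32%, because the disease is rare in the population.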
Proposition [Lemma of Borel-Cantelli] Let (Ω, F, P) be a probability space and A_1, A_2, ... ∈ F. Then one has the following:
(1) If Σ_{n=1}^∞ P(A_n) < ∞, then P(lim sup_{n→∞} A_n) = 0.
(2) If A_1, A_2, ... are independent and Σ_{n=1}^∞ P(A_n) = ∞, then P(lim sup_{n→∞} A_n) = 1.

Proof. (1) Since lim sup_n A_n = ⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k and ⋃_{k=n}^∞ A_k ⊇ ⋃_{k=n+1}^∞ A_k, the continuity of P from above gives
P(lim sup_n A_n) = lim_{n→∞} P(⋃_{k=n}^∞ A_k) ≤ lim_{n→∞} Σ_{k=n}^∞ P(A_k) = 0,
because Σ_{n=1}^∞ P(A_n) < ∞.
(2) It holds
(lim sup_n A_n)^c = lim inf_n A_n^c = ⋃_{n=1}^∞ ⋂_{k=n}^∞ A_k^c,
so that it suffices to show P(⋂_{k=n}^∞ A_k^c) = 0 for all n. Letting B_n := ⋂_{k=n}^∞ A_k^c we get, again by continuity from above,
P(B_n) = lim_{N→∞} P(⋂_{k=n}^N A_k^c).
Since the independence of A_1, A_2, ... implies the independence of A_1^c, A_2^c, ..., we finally get (setting p_n := P(A_n)) that
P(⋂_{k=n}^N A_k^c) = Π_{k=n}^N P(A_k^c) = Π_{k=n}^N (1 − p_k) ≤ Π_{k=n}^N e^{−p_k} = e^{−Σ_{k=n}^N p_k} → 0 as N → ∞,
where we have used that 1 − x ≤ e^{−x} for x ≥ 0 and that Σ_{k=n}^∞ p_k = ∞.
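Two quick worked illustrations of the lemma (standard examples, added here for orientation):

If the A_n are independent with P(A_n) = 1/n, then Σ_{n=1}^∞ P(A_n) = ∞, so P(lim sup_n A_n) = 1: almost surely infinitely many of the A_n occur.
If instead P(A_n) = 1/n², then Σ_{n=1}^∞ P(A_n) = π²/6 < ∞, so P(lim sup_n A_n) = 0, and this conclusion needs no independence assumption at all.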
Although the definition of a measure is not difficult, proving existence and uniqueness of measures may sometimes be difficult. The problem lies in the fact that, in general, the σ-algebras are not constructed explicitly, one only knows of their existence. To overcome this difficulty, one usually exploits

Proposition 1.2.14 [Carathéodory's extension theorem] Let Ω be a non-empty set and G an algebra on Ω with F := σ(G). Assume that P_0 : G → [0, 1] satisfies:
(1) P_0(Ω) = 1.
(2) If A_1, A_2, ... ∈ G, A_i ∩ A_j = ∅ for i ≠ j, and ⋃_{i=1}^∞ A_i ∈ G, then
P_0(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P_0(A_i).
Then there exists a unique probability measure P on F such that
P(A) = P_0(A)  for all A ∈ G.
As an application one obtains the product of two probability spaces (Ω_1, F_1, P_1) and (Ω_2, F_2, P_2): on the algebra G of finite disjoint unions of rectangles A_1 × A_2 with A_1 ∈ F_1, A_2 ∈ F_2 one sets
P_0((A_1^1 × A_2^1) ∪ ... ∪ (A_1^n × A_2^n)) := Σ_{k=1}^n P_1(A_1^k) P_2(A_2^k)
and extends P_0 by Proposition 1.2.14 to the product measure P_1 × P_2 on σ(G).

For the uniqueness one uses π-systems: let (Ω, F) be a measurable space and G ⊆ F with σ(G) = F and A ∩ B ∈ G for all A, B ∈ G. If two probability measures P_1 and P_2 on F satisfy
P_1(A) = P_2(A)  for all A ∈ G,
then P_1(A) = P_2(A) for all A ∈ F.
1.3 Examples of distributions

1.3.1 Binomial distribution with parameter 0 < p < 1

P(B) = μ_{n,p}(B) := Σ_{k=0}^n [n!/(k!(n − k)!)] p^k (1 − p)^{n−k} δ_k(B),
where δ_k denotes the Dirac measure in k.

Interpretation: Coin-tossing with one coin, such that one has heads with probability p and tails with probability 1 − p. Then μ_{n,p}({k}) equals the probability that within n trials one has heads exactly k times.
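A small worked instance of the formula (the numerical choice n = 3, p = 1/2 is made only for illustration):

μ_{3,1/2}({2}) = [3!/(2!·1!)] (1/2)² (1/2)¹ = 3/8,

the probability of seeing heads exactly twice in three fair coin tosses.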
1.3.2 Poisson distribution with parameter λ > 0

P(B) = π_λ(B) := Σ_{k=0}^∞ e^{−λ} (λ^k/k!) δ_k(B).

The Poisson distribution is used, for example, to model jump-diffusion processes: the probability that one has k jumps between the time-points s and t with 0 ≤ s < t < ∞ is equal to π_{λ(t−s)}({k}).
1.3.3 Geometric distribution with parameter 0 < p < 1

P(B) = μ_p(B) := Σ_{k=0}^∞ (1 − p)^k p δ_k(B).
1.3.4 Lebesgue measure and uniform distribution

The Lebesgue measure λ on B(ℝ) is determined by its values on finite unions of disjoint intervals,
λ((a_1, b_1] ∪ ... ∪ (a_n, b_n]) := Σ_{i=1}^n (b_i − a_i),
and obtained from these values by Carathéodory's extension theorem. Letting
P(B) := (1/(b − a)) λ(B)  for B ∈ B((a, b])
one obtains the uniform distribution on (a, b].
1.3.5 Gaussian distribution on ℝ with mean m and variance σ² > 0

(1) Ω := ℝ.
(2) F := B(ℝ) (Borel σ-algebra).
(3) We take the algebra G considered in Example 1.1.3 and define
P_0(A) := Σ_{i=1}^n ∫_{a_i}^{b_i} (1/√(2πσ²)) e^{−(x−m)²/(2σ²)} dx
for A := (a_1, b_1] ∪ (a_2, b_2] ∪ ... ∪ (a_n, b_n], where we consider the Riemann-integral on the right-hand side. One can show (we do not do this here, but compare with Proposition 3.5.8 below) that P_0 satisfies the assumptions of Proposition 1.2.14, so that we can extend P_0 to a probability measure N_{m,σ²} on B(ℝ).

The measure N_{m,σ²} is called Gaussian distribution (normal distribution) with mean m and variance σ². Given A ∈ B(ℝ) we write
N_{m,σ²}(A) = ∫_A p_{m,σ²}(x) dx  with  p_{m,σ²}(x) := (1/√(2πσ²)) e^{−(x−m)²/(2σ²)}.
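A small numerical sketch (numpy and the parameter values are assumptions made for this check only): approximating the integrals by Riemann sums on a wide grid confirms that p_{m,σ²} has total mass one, mean m, and variance σ².

    import numpy as np

    m, sigma2 = 1.0, 4.0                      # example parameters (assumed)
    x = np.linspace(m - 50.0, m + 50.0, 200_001)
    dx = x[1] - x[0]
    p = np.exp(-(x - m) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

    print(np.sum(p) * dx)                 # ~ 1.0      (total mass)
    print(np.sum(x * p) * dx)             # ~ m        (mean)
    print(np.sum((x - m) ** 2 * p) * dx)  # ~ sigma2   (variance)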
1.3.6 Exponential distribution on ℝ with parameter λ > 0

(1) Ω := ℝ.
(2) F := B(ℝ) (Borel σ-algebra).
(3) As before,
P_0(A) := Σ_{i=1}^n ∫_{a_i}^{b_i} p_λ(x) dx  for A := (a_1, b_1] ∪ ... ∪ (a_n, b_n],
where p_λ(x) := 1I_{[0,∞)}(x) λ e^{−λx}, and P_0 extends to the exponential distribution μ_λ with parameter λ, so that
μ_λ(A) = ∫_A p_λ(x) dx.

The exponential distribution is memoryless: for a, b ≥ 0,
μ_λ([a + b, ∞) ∩ [a, ∞)) / μ_λ([a, ∞)) = (∫_{a+b}^∞ λ e^{−λx} dx) / (∫_a^∞ λ e^{−λx} dx) = e^{−λ(a+b)} / e^{−λa} = e^{−λb} = μ_λ([b, ∞)).
Example 1.3.2 Suppose that the amount of time one spends in a post office is exponentially distributed with λ = 1/10.
(a) What is the probability that a customer will spend more than 15 minutes in the post office?
(b) What is the probability that a customer will spend more than 15 minutes in the post office, given that she or he has already been there for at least 10 minutes?
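A short worked solution, added for illustration and using the memorylessness computed above (time spent is denoted by f):

(a) P(f > 15) = e^{−15/10} = e^{−1.5} ≈ 0.223.
(b) P(f > 15 | f > 10) = P(f > 5) = e^{−5/10} = e^{−0.5} ≈ 0.607.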
1.3.7 Poisson's Theorem

For large n and small p the Poisson distribution provides a good approximation for the binomial distribution.

Proposition 1.3.3 [Poisson's Theorem] Let λ > 0, p_n ∈ (0, 1), n = 1, 2, ..., and assume that np_n → λ as n → ∞. Then, for all k = 0, 1, ...,
μ_{n,p_n}({k}) → π_λ({k}),  n → ∞.
Proof. Fix an integer k ≥ 0. Then
μ_{n,p_n}({k}) = [n!/(k!(n − k)!)] p_n^k (1 − p_n)^{n−k}
             = [n(n − 1)···(n − k + 1)/k!] p_n^k (1 − p_n)^{n−k}
             = (1/k!) [n(n − 1)···(n − k + 1)/n^k] (np_n)^k (1 − p_n)^{n−k}.
Since np_n → λ, the factor (np_n)^k converges to λ^k and n(n − 1)···(n − k + 1)/n^k converges to 1. For the remaining factor write np_n = λ + ε_n with ε_n → 0, so that
(1 − p_n)^{n−k} = (1 − (λ + ε_n)/n)^{n−k}.
By the rule of l'Hospital,
lim_{n→∞} (n − k) ln(1 − (λ + ε_n)/n) = lim_{n→∞} [ln(1 − (λ + ε_n)/n)] / [1/(n − k)] = −λ,
and hence
lim_{n→∞} (1 − p_n)^{n−k} = lim_{n→∞} (1 − (λ + ε_n)/n)^{n−k} = e^{−λ}.
Altogether, μ_{n,p_n}({k}) → (λ^k/k!) e^{−λ} = π_λ({k}).
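A numerical sketch of Poisson's Theorem (the value λ = 2 and the use of the standard library are choices made for this illustration): the binomial probabilities μ_{n, λ/n}({k}) approach π_λ({k}) as n grows.

    from math import comb, exp, factorial

    lam = 2.0
    poisson = [exp(-lam) * lam**k / factorial(k) for k in range(5)]
    print("poisson ", [round(q, 4) for q in poisson])

    for n in (10, 100, 1000):
        p = lam / n
        binom = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(5)]
        print("n =", n, [round(b, 4) for b in binom])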
1.4 A set which is not a Borel set
In this section we shall construct a set which is a subset of (0, 1] but not an
element of
B((0, 1]) := {B = A ∩ (0, 1] : A ∈ B(ℝ)}.
Before we start we need
Definition 1.4.1 [λ-system] A class L is a λ-system if
(1) Ω ∈ L,
(2) A, B ∈ L and A ⊆ B imply B\A ∈ L,
(3) A_1, A_2, ... ∈ L and A_n ⊆ A_{n+1}, n = 1, 2, ..., imply ⋃_{n=1}^∞ A_n ∈ L.

For x, y ∈ (0, 1] we define the addition modulo one,
x ⊕ y := x + y if x + y ∈ (0, 1]  and  x ⊕ y := x + y − 1 otherwise,
and
A ⊕ x := {a ⊕ x : a ∈ A}.
Now define
L := {A ∈ B((0, 1]) such that A ⊕ x ∈ B((0, 1]) and λ(A ⊕ x) = λ(A) for all x ∈ (0, 1]}.
One checks that L is a λ-system: for example, if A, B ∈ L with A ⊆ B, then for all x ∈ (0, 1] one has (B ⊕ x) \ (A ⊕ x) = (B\A) ⊕ x ∈ B((0, 1]) and
λ((B\A) ⊕ x) = λ((B ⊕ x) \ (A ⊕ x)) = λ(B ⊕ x) − λ(A ⊕ x) = λ(B) − λ(A) = λ(B\A).

Now define an equivalence relation on (0, 1] by
x ∼ y  if and only if  x ⊕ r = y for some rational r ∈ (0, 1],
and let H ⊆ (0, 1] contain exactly one representative of every equivalence class (such a set exists by the axiom of choice). The sets H ⊕ r, r ∈ (0, 1] rational, are pairwise disjoint and
(0, 1] = ⋃_{r ∈ (0,1] rational} (H ⊕ r).
If H were a Borel set, then a := λ(H ⊕ r) = λ(H) would not depend on r and
1 = λ((0, 1]) = Σ_{r ∈ (0,1] rational} λ(H ⊕ r) = Σ_{r ∈ (0,1] rational} a.
So the right hand side can either be 0 (if a = 0) or ∞ (if a > 0). This leads to a contradiction, so H ∉ B((0, 1]).
Chapter 2
Random variables
Given a probability space (Ω, F, P), in many stochastic models one considers functions f : Ω → ℝ, which describe certain random phenomena, and one is interested in the computation of expressions like
P({ω ∈ Ω : f(ω) ∈ (a, b)}),  where a < b.
2.1 Random variables

Definition [measurable step-function] Let (Ω, F) be a measurable space. A function f : Ω → ℝ is called measurable step-function provided that there are α_1, ..., α_n ∈ ℝ and A_1, ..., A_n ∈ F such that
f(ω) = Σ_{i=1}^n α_i 1I_{A_i}(ω),
where
1I_{A_i}(ω) := 1 if ω ∈ A_i  and  1I_{A_i}(ω) := 0 if ω ∉ A_i.

Some useful rules for indicator functions are
1I_Ω = 1,  1I_∅ = 0,  1I_A + 1I_{A^c} = 1,  1I_{A∩B} = 1I_A 1I_B,  1I_{A∪B} = 1I_A + 1I_B − 1I_{A∩B}.

The definition above concerns only functions which take finitely many values, which will be too restrictive in the future. So we wish to extend this definition.

Definition 2.1.2 [random variables] Let (Ω, F) be a measurable space. A map f : Ω → ℝ is called random variable provided that there is a sequence (f_n)_{n=1}^∞ of measurable step-functions f_n : Ω → ℝ such that
f(ω) = lim_{n→∞} f_n(ω)  for all ω ∈ Ω.
Does our definition give what we would like to have? Yes, as we see from
Proposition 2.1.3 Let (Ω, F) be a measurable space and let f : Ω → ℝ be a function. Then the following conditions are equivalent:
(1) f is a random variable, that means the pointwise limit of measurable step-functions;
(2) f^{−1}((a, b)) ∈ F for all −∞ < a < b < ∞.

Proof. (1) ⟹ (2): Assume that
f(ω) = lim_{n→∞} f_n(ω),
where f_n : Ω → ℝ are measurable step-functions. For a measurable step-function one has that f_n^{−1}((a, b)) ∈ F, so that
f^{−1}((a, b)) = {ω : a < f(ω) < b} = ⋃_{m=1}^∞ ⋃_{N=1}^∞ ⋂_{n=N}^∞ {ω : a + 1/m < f_n(ω) < b − 1/m} ∈ F.
(2) ⟹ (1): One checks that the measurable step-functions
f_n(ω) := Σ_{k=−4^n}^{4^n − 1} (k/2^n) 1I_{{k/2^n ≤ f < (k+1)/2^n}}(ω)
converge to f(ω) for all ω ∈ Ω as n → ∞.
In particular, if f and g are random variables with g(ω) ≠ 0 for all ω ∈ Ω, then
(f/g)(ω) := f(ω)/g(ω)
is a random variable. For products one argues as follows: if f(ω) = lim_n f_n(ω) and g(ω) = lim_n g_n(ω) with measurable step-functions f_n, g_n, then
(fg)(ω) = lim_{n→∞} f_n(ω) g_n(ω),
and writing
f_n(ω) = Σ_{i=1}^k α_i 1I_{A_i}(ω)  and  g_n(ω) = Σ_{j=1}^l β_j 1I_{B_j}(ω)
yields
(f_n g_n)(ω) = Σ_{i=1}^k Σ_{j=1}^l α_i β_j 1I_{A_i}(ω) 1I_{B_j}(ω) = Σ_{i=1}^k Σ_{j=1}^l α_i β_j 1I_{A_i ∩ B_j}(ω),
which is again a measurable step-function; hence fg is a random variable.
2.2 Measurable maps

Definition [measurable map] Let (Ω, F) and (M, Σ) be measurable spaces. A map f : Ω → M is called (F, Σ)-measurable provided that
f^{−1}(B) = {ω ∈ Ω : f(ω) ∈ B} ∈ F  for all B ∈ Σ.

Lemma 2.2.3 Let (Ω, F) and (M, Σ) be measurable spaces and let Σ_0 ⊆ Σ be a system of sets with σ(Σ_0) = Σ. If
f^{−1}(B) ∈ F  for all B ∈ Σ_0,
then
f^{−1}(B) ∈ F  for all B ∈ Σ.

Proof. Define
A := {B ⊆ M : f^{−1}(B) ∈ F}.
Obviously, Σ_0 ⊆ A. We show that A is a σ-algebra.
(1) f^{−1}(M) = Ω ∈ F implies that M ∈ A.
(2) If B ∈ A, then
f^{−1}(B^c) = {ω : f(ω) ∈ B^c} = {ω : f(ω) ∉ B} = Ω \ {ω : f(ω) ∈ B} = f^{−1}(B)^c ∈ F.
(3) If B_1, B_2, ... ∈ A, then
f^{−1}(⋃_{i=1}^∞ B_i) = ⋃_{i=1}^∞ f^{−1}(B_i) ∈ F.
Hence Σ = σ(Σ_0) ⊆ A, and we are done.

Proposition Every continuous function f : ℝ → ℝ is (B(ℝ), B(ℝ))-measurable.

Proof. Since f is continuous we know that f^{−1}((a, b)) is open for all −∞ < a < b < ∞, so that f^{−1}((a, b)) ∈ B(ℝ). Since the open intervals generate B(ℝ) we can apply Lemma 2.2.3.
Now we state some general properties of measurable maps.
Proposition 2.2.5 Let (Ω_1, F_1), (Ω_2, F_2), (Ω_3, F_3) be measurable spaces. Assume that f : Ω_1 → Ω_2 is (F_1, F_2)-measurable and that g : Ω_2 → Ω_3 is (F_2, F_3)-measurable. Then the following is satisfied:
(1) g ∘ f : Ω_1 → Ω_3 defined by
(g ∘ f)(ω_1) := g(f(ω_1))
is (F_1, F_3)-measurable.
(2) Assume that
Assume the random number generator gives out the number x. If we wrote a program such that output = "heads" in case x ∈ [0, p) and output = "tails" in case x ∈ [p, 1], the output would simulate the flipping of an (unfair) coin, or in other words, output has binomial distribution μ_{1,p}.
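A tiny sketch of such a program (Python; the standard library's uniform generator stands in for the random number generator described above, and the value of p is an assumed example):

    import random

    p = 0.3                       # probability of heads (assumed example value)

    def flip() -> str:
        x = random.random()       # x is (approximately) uniform on [0, 1)
        return "heads" if x < p else "tails"

    flips = [flip() for _ in range(10_000)]
    # the relative frequency of "heads" is close to p
    print(flips[:10], flips.count("heads") / len(flips))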
Definition 2.2.7 [law of a random variable] Let (Ω, F, P) be a probability space and f : Ω → ℝ be a random variable. Then
P_f(B) := P({ω ∈ Ω : f(ω) ∈ B})
is called the law of the random variable f.
and lim_{x→∞} F(x) = 1, where F(x) := P({ω : f(ω) ≤ x}) denotes the distribution-function of f.
(ii) F is right-continuous: for x_n ↓ x one has
F(x) = P({ω : f(ω) ≤ x}) = P(⋂_{n=1}^∞ {ω : f(ω) ≤ x_n}) = lim_{n→∞} P({ω : f(ω) ≤ x_n}) = lim_{n→∞} F(x_n).
(iii) The properties lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1 are an exercise.
Proof. (1) ⟹ (2) is of course trivial. We consider (2) ⟹ (1): For sets of type
A := (a_1, b_1] × ... × (a_n, b_n],
where the intervals are disjoint, one can check the asserted equality directly; a generating-class argument then gives the general case.

The notions introduced so far can be summarized as follows: for a map f : Ω → ℝ,
f is measurable (Lemma 2.2.3): f^{−1}(A) ∈ F for all A ∈ B(ℝ),
is equivalent to
f is a random variable (Proposition 2.2.2): there exist measurable step-functions (f_n)_{n=1}^∞, i.e.
f_n = Σ_{k=1}^{N_n} a_k^n 1I_{A_k^n}
with a_k^n ∈ ℝ and A_k^n ∈ F, such that f_n(ω) → f(ω) for all ω ∈ Ω as n → ∞.
2.3 Independence

Let us first start with the notion of a family of independent random variables.

Definition 2.3.1 [independence of a family of random variables] Let (Ω, F, P) be a probability space and f_i : Ω → ℝ, i ∈ I, be random variables where I is a non-empty index-set. The family (f_i)_{i∈I} is called independent provided that for all distinct i_1, ..., i_n ∈ I, n = 1, 2, ..., and all B_1, ..., B_n ∈ B(ℝ) one has that
P(f_{i_1} ∈ B_1, ..., f_{i_n} ∈ B_n) = P(f_{i_1} ∈ B_1) ··· P(f_{i_n} ∈ B_n).

In case we have a finite index set I, that means for example I = {1, ..., n}, the definition above is equivalent to

Definition 2.3.2 [independence of a finite family of random variables] Let (Ω, F, P) be a probability space and f_i : Ω → ℝ, i = 1, ..., n, random variables. The random variables f_1, ..., f_n are called independent provided that for all B_1, ..., B_n ∈ B(ℝ) one has that
P(f_1 ∈ B_1, ..., f_n ∈ B_n) = P(f_1 ∈ B_1) ··· P(f_n ∈ B_n).
Assume that there are functions p_f, p_g : ℝ → [0, ∞) with
∫_ℝ p_f(x) dx = ∫_ℝ p_g(x) dx = 1,
F_f(x) = ∫_{−∞}^x p_f(y) dy  and  F_g(x) = ∫_{−∞}^x p_g(y) dy
for all x ∈ ℝ (one says that the distribution-functions F_f and F_g are absolutely continuous with densities p_f and p_g, respectively). Then the independence of f and g is also equivalent to
F_{(f,g)}(x, y) = ∫_{−∞}^x ∫_{−∞}^y p_f(u) p_g(v) dλ(v) dλ(u)  for all x, y ∈ ℝ.

Proposition 2.3.8 [Realization of independent random variables] Let (ℝ^ℕ, B(ℝ^ℕ), P) and the coordinate maps π_n : ℝ^ℕ → ℝ be defined as above. Then (π_n)_{n=1}^∞ is a sequence of independent random variables such that the law of π_n is P_n, that means
P(π_n ∈ B) = P_n(B)  for all B ∈ B(ℝ).
Chapter 3
Integration
Given a probability space (Ω, F, P) and a random variable f : Ω → ℝ, we define the expectation or integral
E f = ∫_Ω f dP = ∫_Ω f(ω) dP(ω)
and investigate its basic properties.

3.1 Definition of the expected value

The definition is done within three steps. First, given a measurable step-function
g = Σ_{i=1}^n α_i 1I_{A_i},
we let
E g = ∫_Ω g dP = ∫_Ω g(ω) dP(ω) := Σ_{i=1}^n α_i P(A_i).
We have to check that the definition is correct, since it might be that different representations give different expected values E g. However, this is not the case, as shown by

Lemma 3.1.2 Assume measurable step-functions
g = Σ_{i=1}^n α_i 1I_{A_i} = Σ_{j=1}^m β_j 1I_{B_j}.
Then
Σ_{i=1}^n α_i P(A_i) = Σ_{j=1}^m β_j P(B_j).
Proof. By subtracting in both equations the right-hand side from the left-hand one we only need to show that
Σ_{i=1}^n α_i 1I_{A_i} = 0
implies that
Σ_{i=1}^n α_i P(A_i) = 0.
By taking all possible intersections of the sets A_i and by adding appropriate complements we find a system of pairwise disjoint sets C_1, ..., C_N ∈ F with
⋃_{j=1}^N C_j = Ω
such that every A_i can be written as A_i = ⋃_{j∈I_i} C_j for some index set I_i ⊆ {1, ..., N}. Now we get that
0 = Σ_{i=1}^n α_i 1I_{A_i} = Σ_{i=1}^n Σ_{j∈I_i} α_i 1I_{C_j} = Σ_{j=1}^N (Σ_{i: j∈I_i} α_i) 1I_{C_j} = Σ_{j=1}^N γ_j 1I_{C_j},
with γ_j := Σ_{i: j∈I_i} α_i, so that γ_j = 0 whenever C_j ≠ ∅. From this we obtain
Σ_{i=1}^n α_i P(A_i) = Σ_{i=1}^n Σ_{j∈I_i} α_i P(C_j) = Σ_{j=1}^N (Σ_{i: j∈I_i} α_i) P(C_j) = Σ_{j=1}^N γ_j P(C_j) = 0.
Proposition [linearity on step-functions] Let (Ω, F, P) be a probability space and f, g : Ω → ℝ be measurable step-functions. Then
E(f + g) = E f + E g.

Proof. The proof follows immediately from Lemma 3.1.2 and the definition of the expected value of a step-function since, for
f = Σ_{i=1}^n α_i 1I_{A_i}  and  g = Σ_{j=1}^m β_j 1I_{B_j},
one has that
f + g = Σ_{i=1}^n α_i 1I_{A_i} + Σ_{j=1}^m β_j 1I_{B_j}
and
E(f + g) = Σ_{i=1}^n α_i P(A_i) + Σ_{j=1}^m β_j P(B_j) = E f + E g.
In the second step, for a non-negative random variable f approximated by measurable step-functions 0 ≤ f_n(ω) ↑ f(ω), we let
E f = ∫_Ω f dP = ∫_Ω f(ω) dP(ω) := lim_{n→∞} E f_n.
Note that in this definition the case E f = ∞ is allowed. In the last step we define the expectation for a general random variable.

Definition 3.1.5 [step three, f is general] Let (Ω, F, P) be a probability space and f : Ω → ℝ be a random variable. Let
f^+(ω) := max{f(ω), 0}  and  f^−(ω) := max{−f(ω), 0}.
(1) If E f^+ < ∞ or E f^− < ∞, then we say that the expected value of f exists and set
E f := E f^+ − E f^−,
which may take the values ±∞.
(2) The random variable f is called integrable provided that
E f^+ < ∞  and  E f^− < ∞.
(3) If the expected value of f exists and A ∈ F, then
∫_A f dP = ∫_A f(ω) dP(ω) := ∫_Ω f(ω) 1I_A(ω) dP(ω).
A simple example for the expectation is the expected value while rolling a die:

Example 3.1.8 Assume that Ω := {1, 2, ..., 6}, F := 2^Ω, and P({k}) := 1/6, which models rolling a die. If we define f(k) := k, i.e.
f(k) := Σ_{i=1}^6 i 1I_{{i}}(k),
then
E f = Σ_{i=1}^6 i P({i}) = (1 + 2 + ··· + 6)/6 = 3.5.
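Continuing the die example with a short additional computation (added here for illustration), the same recipe gives the second moment and the variance:

E f² = Σ_{i=1}^6 i² P({i}) = (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6,
E(f − E f)² = E f² − (E f)² = 91/6 − (3.5)² = 35/12 ≈ 2.92.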
3.2 Basic properties of the expected value

We say that a property P(ω) holds P-almost surely or almost surely (a.s.) if the set
{ω : P(ω) holds}
belongs to F and is of measure one. Let us start with some first properties of the expected value.

Proposition 3.2.1 Assume a probability space (Ω, F, P) and random variables f, g : Ω → ℝ.
(1) If 0 ≤ f(ω) ≤ g(ω) for all ω ∈ Ω, then 0 ≤ E f ≤ E g.
(2) The random variable f is integrable if and only if |f| is integrable. In this case one has |E f| ≤ E |f|.
(3) If f = 0 a.s., then E f = 0.
(4) If f ≥ 0 a.s. and E f = 0, then f = 0 a.s.
(5) If f = g a.s. and E f exists, then E g exists and E f = E g.

Proof. (1) follows directly from the definition. Property (2) can be seen as follows: by definition, the random variable f is integrable if and only if E f^+ < ∞ and E f^− < ∞. Since
{ω : f^+(ω) ≠ 0} ∩ {ω : f^−(ω) ≠ 0} = ∅
and since both sets are measurable, it follows that |f| = f^+ + f^− is integrable if and only if f^+ and f^− are integrable, and that
|E f| = |E f^+ − E f^−| ≤ E f^+ + E f^− = E |f|.
Proposition Let (Ω, F, P) be a probability space and f : Ω → ℝ be a random variable whose expected value exists. Then there are measurable step-functions (f_n)_{n=1}^∞ such that
f(ω) = lim_{n→∞} f_n(ω)  and  E f = lim_{n→∞} E f_n.

Proof. (1) It is easy to verify that the staircase-functions
f_n(ω) := Σ_{k=−4^n}^{4^n − 1} (k/2^n) 1I_{{k/2^n ≤ f < (k+1)/2^n}}(ω)
and, for f ≥ 0,
f_n^0(ω) := Σ_{k=0}^{4^n − 1} (k/2^n) 1I_{{k/2^n ≤ f < (k+1)/2^n}}(ω)
are measurable step-functions converging to f. Moreover, for measurable step-functions g_n and h_n with
g_n ≤ h_n ≤ f
and g_n ↑ f, h_n ↑ f one has
lim_{n→∞} E g_n = lim_{n→∞} E h_n = E f.
Consider
d_{k,n} := f_k ∧ h_n.
Clearly, d_{k,n} ↑ f_k as n → ∞ and d_{k,n} ↑ h_n as k → ∞. Let z_{k,n} := arctan d_{k,n}, which reduces the problem to bounded functions, so that the two limits may be interchanged. Hence
E f = lim_{n→∞} E h_n = lim_{n→∞} lim_{k→∞} E d_{k,n} = lim_{k→∞} lim_{n→∞} E d_{k,n} = lim_{k→∞} E f_k.
Finally, for a measurable step-function one obtains, for every ε ∈ (0, 1),
(1 − ε) P(A) ≤ lim_{n→∞} E φ_n,
and letting ε ↓ 0 gives P(A) ≤ lim_{n→∞} E φ_n, so we are done.
Now we continue with some basic properties of the expectation.

Proposition 3.2.3 [properties of the expectation] Let (Ω, F, P) be a probability space and f, g : Ω → ℝ be random variables such that E f and E g exist.
(1) If E f^+ + E g^+ < ∞ or E f^− + E g^− < ∞, then E(f + g) exists and E(f + g) = E f + E g.
(2) If c ∈ ℝ, then E(cf) exists and E(cf) = c E f.
(3) If f ≤ g, then E f ≤ E g.

Proof. (1) One uses the decomposition
(f + g)^+ + f^− + g^− = f^+ + g^+ + (f + g)^−   (3.1)
and, for non-negative random variables φ and ψ approximated by measurable step-functions with 0 ≤ φ_n(ω) ↑ φ(ω) and 0 ≤ ψ_n(ω) ↑ ψ(ω) for all ω ∈ Ω,
E φ + E ψ = lim_{n→∞} E φ_n + lim_{n→∞} E ψ_n = lim_{n→∞} E(φ_n + ψ_n) = E(φ + ψ).
(2) is an exercise.
For each f_n take a sequence of step-functions (f_{n,k})_{k≥1} such that 0 ≤ f_{n,k} ↑ f_n as k → ∞. Setting
h_N := max_{1≤k≤N, 1≤n≤N} f_{n,k}
we get step-functions with h_N ≤ f_N ≤ f and h_N ↑ f, so that
E f ≤ lim_{N→∞} E f_N ≤ E f.
Hence f_n ↑ f implies E f_n ↑ E f.
Proposition [Lemma of Fatou for random variables] Let (Ω, F, P) be a probability space and g, f_1, f_2, ... : Ω → ℝ be random variables with |f_n| ≤ g a.s. and E g < ∞. Then
E lim inf_{n→∞} f_n ≤ lim inf_{n→∞} E f_n ≤ lim sup_{n→∞} E f_n ≤ E lim sup_{n→∞} f_n.

Proof. We only prove the first inequality. The second one follows from the definition of lim sup and lim inf, the third one can be proved like the first one. So we let
Z_k := inf_{n≥k} f_n,
so that Z_k ↑ lim inf_n f_n and Z_k ≤ f_n for all n ≥ k. Applying monotone convergence gives
E lim inf_n f_n = lim_{k→∞} E Z_k = lim_{k→∞} E (inf_{n≥k} f_n) ≤ lim inf_{k→∞} (inf_{n≥k} E f_n) = lim inf_{n→∞} E f_n.

Proposition [Lebesgue's Theorem, dominated convergence] Let (Ω, F, P) be a probability space and g, f, f_1, f_2, ... : Ω → ℝ be random variables with |f_n| ≤ g a.s., E g < ∞, and f = lim_{n→∞} f_n a.s. Then f is integrable and
E f = lim_{n→∞} E f_n.

Proof. Applying Fatou's Lemma gives
E f = E lim inf_n f_n ≤ lim inf_n E f_n ≤ lim sup_n E f_n ≤ E lim sup_n f_n = E f.
3.3 Connections to the Riemann-integral

Proposition Let f : [0, 1] → ℝ be a continuous function. Then
∫_0^1 f(x) dx = E f
with the Riemann-integral on the left-hand side and the expectation of the random variable f with respect to the probability space ([0, 1], B([0, 1]), λ), where λ is the Lebesgue measure, on the right-hand side.

Now we consider a continuous function p : ℝ → [0, ∞) such that
∫_{−∞}^∞ p(x) dx = 1
and define a probability measure P on B(ℝ) by
P((a_1, b_1] ∪ ··· ∪ (a_n, b_n]) := Σ_{i=1}^n ∫_{a_i}^{b_i} p(x) dx
for disjoint intervals. Let f : ℝ → ℝ be a continuous function with
∫_{−∞}^∞ |f(x)| p(x) dx < ∞.
Then
∫_{−∞}^∞ f(x) p(x) dx = E f
with the Riemann-integral on the left-hand side and the expectation of the random variable f with respect to the probability space (ℝ, B(ℝ), P) on the right-hand side.

Let us consider two examples indicating the difference between the Riemann-integral and our expected value.
49
f (x) :=
f = 1 if
lim
sin x
dx =
x
2
sin x
x
dx = and
0
sin x
x
dx = .
Transporting this into a probabilistic setting we take the exponential distribution with parameter λ > 0 from Section 1.3.6. Let f : ℝ → ℝ be given by f(x) := 0 if x ≤ 0 and f(x) := ((sin x)/x) e^{λx} if x > 0, and recall that the exponential distribution with parameter λ > 0 is given by the density p_λ(x) = 1I_{[0,∞)}(x) λ e^{−λx}. The above yields that
lim_{t→∞} ∫_0^t f(x) p_λ(x) dx = λπ/2,
but
∫_ℝ f(x)^+ dμ_λ(x) = ∫_ℝ f(x)^− dμ_λ(x) = ∞.
Hence the expected value of f does not exist, but the Riemann-integral gives a way to define a value, which makes sense. The point of this example is that the Riemann-integral takes more information into account than the rather abstract expected value.
3.4 Change of variables in the expected value

Proposition [Change of variables] Let (Ω, F, P) be a probability space, (E, E) a measurable space, φ : Ω → E an (F, E)-measurable map, that means φ^{−1}(A) ∈ F for all A ∈ E, and g : E → ℝ a random variable. Then
∫_A g(η) dP_φ(η) = ∫_{φ^{−1}(A)} g(φ(ω)) dP(ω),
where P_φ(A) := P(φ^{−1}(A)) is the image measure, for all A ∈ E in the sense that if one integral exists, the other exists as well, and their values are equal.

Proof. (i) Letting g̃(η) := 1I_A(η) g(η) we have
g̃(φ(ω)) = 1I_{φ^{−1}(A)}(ω) g(φ(ω)),
so that it is sufficient to consider the case A = E. Hence we have to show that
∫_E g(η) dP_φ(η) = ∫_Ω g(φ(ω)) dP(ω).
(ii) For B ∈ E,
∫_E 1I_B(η) dP_φ(η) = P_φ(B) = P(φ^{−1}(B)) = ∫_Ω 1I_B(φ(ω)) dP(ω),
so the equality holds for indicator functions, hence for measurable step-functions,
∫_E g_n(η) dP_φ(η) = ∫_Ω g_n(φ(ω)) dP(ω),
and by approximation it carries over to general g.
Example Assume that the law P_f of a random variable f satisfies
P_f((a, b]) = ∫_a^b p(x) dx
for all −∞ < a < b < ∞, where p : ℝ → [0, ∞) is a continuous function with ∫_ℝ p(x) dx = 1. Then, for n = 1, 2, ...,
E f^n = ∫_Ω f(ω)^n dP(ω) = ∫_ℝ x^n dP_f(x) = ∫_{−∞}^∞ x^n p(x) dx,
provided the integrals exist.

Example Assume that
P_φ = Σ_{k=1}^∞ p_k δ_{η_k}
with p_k ≥ 0, Σ_{k=1}^∞ p_k = 1, and some η_k ∈ E (that means that the image measure of P with respect to φ is discrete). Then
∫_Ω g(φ(ω)) dP(ω) = ∫_E g(η) dP_φ(η) = Σ_{k=1}^∞ p_k g(η_k).
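As a worked instance of the first example (the computation is added here for illustration), take for P_f the exponential distribution with parameter λ > 0 from Section 1.3.6, so that p(x) = 1I_{[0,∞)}(x) λ e^{−λx}. Then

E f = ∫_0^∞ x λ e^{−λx} dx = [−x e^{−λx}]_0^∞ + ∫_0^∞ e^{−λx} dx = 1/λ,

using integration by parts.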
3.5 Fubini's Theorem

Proposition 3.5.2 [Monotone Class Theorem] Let H be a class of bounded functions from Ω into ℝ satisfying the following conditions:
(1) H is a vector space.
(2) 1I_Ω ∈ H.
(3) If f_n ∈ H, f_n ≥ 0, and f_n ↑ f, where f is bounded on Ω, then f ∈ H.
Then one has the following: if H contains the indicator function of every set from some π-system I of subsets of Ω, then H contains every bounded σ(I)-measurable function on Ω.

Proof. See for example [5] (Theorem 3.14).
For the following it is convenient to allow that the random variables may take infinite values.

Definition 3.5.3 [extended random variable] Let (Ω, F) be a measurable space. A function f : Ω → ℝ ∪ {−∞, ∞} is called extended random variable if
f^{−1}(B) := {ω : f(ω) ∈ B} ∈ F  for all B ∈ B(ℝ).

For a non-negative extended random variable f the expected value is defined by
∫_Ω f dP := lim_{N→∞} ∫_Ω [f ∧ N] dP.
If f : Ω_1 × Ω_2 → ℝ is a bounded F_1 ⊗ F_2-measurable function, then the maps
ω_1 → ∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2)  and  ω_2 → ∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1)
are random variables on (Ω_1, F_1) and (Ω_2, F_2), respectively, and
∫_{Ω_1×Ω_2} f(ω_1, ω_2) d(P_1 × P_2) = ∫_{Ω_1} [∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2)] dP_1(ω_1) = ∫_{Ω_2} [∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1)] dP_2(ω_2).

It should be noted that item (3) of Fubini's Theorem below, together with Formula (3.2), automatically implies that
P_2({ω_2 : ∫_{Ω_1} |f(ω_1, ω_2)| dP_1(ω_1) = ∞}) = 0  and  P_1({ω_1 : ∫_{Ω_2} |f(ω_1, ω_2)| dP_2(ω_2) = ∞}) = 0.
Again, using Propositions 2.1.4 and 3.2.4 we see that H satisfies the assumptions (1), (2), and (3) of Proposition 3.5.2. As π-system I we take the system of all F = A × B with A ∈ F_1 and B ∈ F_2. Letting f(ω_1, ω_2) = 1I_A(ω_1) 1I_B(ω_2) we easily can check that f ∈ H. For instance, property (c) follows from
∫_{Ω_1} [∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2)] dP_1(ω_1) = ∫_{Ω_1} 1I_A(ω_1) P_2(B) dP_1(ω_1) = P_1(A) P_2(B).
Applying the Monotone Class Theorem, Proposition 3.5.2, gives that H consists of all bounded functions f : Ω_1 × Ω_2 → ℝ measurable with respect to F_1 ⊗ F_2. Hence we are done.
Now we state Fubini's Theorem for general random variables f : Ω_1 × Ω_2 → ℝ.

Proposition 3.5.5 [Fubini's Theorem] Let f : Ω_1 × Ω_2 → ℝ be an F_1 ⊗ F_2-measurable function such that
∫_{Ω_1×Ω_2} |f(ω_1, ω_2)| d(P_1 × P_2)(ω_1, ω_2) < ∞.   (3.3)
Then the following holds:
(1) For fixed ω_2^0 ∈ Ω_2 and ω_1^0 ∈ Ω_1 the integrals
∫_{Ω_1} f(ω_1, ω_2^0) dP_1(ω_1)  and  ∫_{Ω_2} f(ω_1^0, ω_2) dP_2(ω_2)
exist for P_2-almost all ω_2^0 and P_1-almost all ω_1^0.
(2) There are sets M_1 ∈ F_1 and M_2 ∈ F_2 with P_1(M_1) = P_2(M_2) = 1 such that the maps
ω_1 → 1I_{M_1}(ω_1) ∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2)  and  ω_2 → 1I_{M_2}(ω_2) ∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1)
are random variables.
(3) One has that
∫_{Ω_1×Ω_2} f(ω_1, ω_2) d(P_1 × P_2) = ∫_{Ω_1} 1I_{M_1}(ω_1) [∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2)] dP_1(ω_1) = ∫_{Ω_2} 1I_{M_2}(ω_2) [∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1)] dP_2(ω_2).
Example As an application we compute
∫_{−∞}^∞ e^{−x²} dx
by Fubini's Theorem. For N > 0 apply Fubini's Theorem to the bounded function f(x, y) := e^{−(x²+y²)} = e^{−x²} e^{−y²} on [−N, N] × [−N, N] (normalizing the Lebesgue measure on each factor by 1/(2N) to obtain probability spaces and multiplying by (2N)² afterwards). This gives
∫_{−N}^N [∫_{−N}^N e^{−x²} e^{−y²} dλ(y)] dλ(x) = ∫_{[−N,N]×[−N,N]} e^{−(x²+y²)} d(λ × λ)(x, y).
Letting N → ∞, the left-hand side converges to
(∫_{−∞}^∞ e^{−x²} dλ(x))²,
while for the right-hand side, using polar coordinates,
lim_{N→∞} ∫_{[−N,N]×[−N,N]} e^{−(x²+y²)} d(λ × λ)(x, y) = lim_{R→∞} ∫_{x²+y²≤R²} e^{−(x²+y²)} d(λ × λ)(x, y)
  = lim_{R→∞} ∫_0^{2π} ∫_0^R e^{−r²} r dr dφ = lim_{R→∞} π (1 − e^{−R²}) = π.
Hence
∫_{−∞}^∞ e^{−x²} dλ(x) = √π.
Example Let f be a random variable with law N_{m,σ²}, that means with density
p_{m,σ²}(x) = (1/√(2πσ²)) e^{−(x−m)²/(2σ²)}.
Then
E f = ∫_{−∞}^∞ x p_{m,σ²}(x) dx = m   (3.4)
and
E(f − E f)² = ∫_{−∞}^∞ (x − m)² p_{m,σ²}(x) dx = σ².   (3.5)
Indeed, by the substitution z = (x − m)/σ it is enough to consider m = 0 and σ² = 1. The computation of ∫ e^{−x²} dx above gives, with x = z/√2,
(1/√π) ∫_{−∞}^∞ e^{−x²} dx = (1/√(2π)) ∫_{−∞}^∞ e^{−z²/2} dz = 1;
the relation
∫_{−∞}^∞ x p_{0,1}(x) dx = 0
follows from the symmetry of the density, p_{0,1}(x) = p_{0,1}(−x); and finally, by partial integration (use (x exp(−x²/2))' = exp(−x²/2) − x² exp(−x²/2)) one can also compute that
(1/√(2π)) ∫_{−∞}^∞ x² e^{−x²/2} dx = (1/√(2π)) ∫_{−∞}^∞ e^{−x²/2} dx = 1.
Example The function
f(x, y) := xy/(x² + y²)²
for (x, y) ≠ (0, 0) and f(0, 0) := 0 is not integrable on [−1, 1] × [−1, 1], even though the iterated integrals exist and are equal. In fact,
∫_{−1}^1 f(x, y) dλ(y) = 0,
so that both iterated integrals vanish, but, using polar coordinates,
∫_{[−1,1]×[−1,1]} |f(x, y)| d(λ × λ)(x, y) ≥ ∫_0^1 ∫_0^{2π} (|sin φ cos φ|/r) dφ dr = 2 ∫_0^1 (1/r) dr = ∞.
The inequality holds because on the right-hand side we integrate only over the area {(x, y) : x² + y² ≤ 1}, which is a subset of [−1, 1] × [−1, 1], and
∫_0^{2π} |sin φ cos φ| dφ = 4 ∫_0^{π/2} sin φ cos φ dφ = 2.
3.6 Some inequalities

Proposition [Jensen's inequality] Let (Ω, F, P) be a probability space, f : Ω → ℝ an integrable random variable, and g : ℝ → ℝ a convex function. Then
g(E f) ≤ E g(f),
where the expected value on the right-hand side might be infinity.

Example 3.6.4 (1) The function g(x) := |x| is convex so that, for any integrable f,
|E f| ≤ E |f|.
(2) For 1 ≤ p < ∞ the function g(x) := |x|^p is convex, so that Jensen's inequality applied to |f| gives that
(E |f|)^p ≤ E |f|^p.

For the second case in the example above there is another way we can go. It uses the famous Hölder-inequality.
Proposition 3.6.5 [Hölder's inequality] Assume a probability space (Ω, F, P) and random variables f, g : Ω → ℝ. If 1 < p, q < ∞ with 1/p + 1/q = 1, then
E |fg| ≤ (E |f|^p)^{1/p} (E |g|^q)^{1/q}.

Proof. We can assume that E |f|^p > 0 and E |g|^q > 0. For example, assuming E |f|^p = 0 would imply |f|^p = 0 a.s. according to Proposition 3.2.1, so that fg = 0 a.s. and E |fg| = 0. Hence we may set
h := f / (E |f|^p)^{1/p}  and  k := g / (E |g|^q)^{1/q}.
We notice that
x^a y^b ≤ ax + by
for x, y ≥ 0 and positive a, b with a + b = 1, which follows from the concavity of the logarithm (we can assume for a moment that x, y > 0):
ln(ax + by) ≥ a ln x + b ln y = ln x^a + ln y^b = ln(x^a y^b).
Setting x := |h|^p, y := |k|^q, a := 1/p, and b := 1/q, we get
|hk| = x^a y^b ≤ ax + by = (1/p)|h|^p + (1/q)|k|^q
and therefore
E |hk| ≤ (1/p) E |h|^p + (1/q) E |k|^q = 1/p + 1/q = 1.
Since
E |hk| = E |fg| / [(E |f|^p)^{1/p} (E |g|^q)^{1/q}],
we are done.
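The special case p = q = 2 is worth recording separately (a standard consequence, noted here because it is used again in Section 4.2):

E |fg| ≤ (E f²)^{1/2} (E g²)^{1/2}   (Cauchy-Schwarz inequality).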
Another consequence of Hölder's inequality is its version for sequences: for (a_n)_{n=1}^∞, (b_n)_{n=1}^∞ ⊆ ℝ one has
Σ_{n=1}^∞ |a_n b_n| ≤ (Σ_{n=1}^∞ |a_n|^p)^{1/p} (Σ_{n=1}^∞ |b_n|^q)^{1/q},
which follows from the finite version
Σ_{n=1}^N |a_n b_n| ≤ (Σ_{n=1}^N |a_n|^p)^{1/p} (Σ_{n=1}^N |b_n|^q)^{1/q}   (3.6)
by letting N → ∞.
Proposition [Minkowski's inequality] Assume a probability space (Ω, F, P), random variables f, g : Ω → ℝ, and 1 ≤ p < ∞. Then
(E |f + g|^p)^{1/p} ≤ (E |f|^p)^{1/p} + (E |g|^p)^{1/p}.

Proof. For p = 1 the assertion follows from |f + g| ≤ |f| + |g|. So let 1 < p < ∞ and q := p/(p − 1). The convexity of x → |x|^p gives
((a + b)/2)^p ≤ (|a|^p + |b|^p)/2
and hence (a + b)^p ≤ 2^{p−1}(a^p + b^p) for a, b ≥ 0. Consequently,
|f + g|^p ≤ (|f| + |g|)^p ≤ 2^{p−1}(|f|^p + |g|^p),
so that E |f + g|^p < ∞ whenever E |f|^p, E |g|^p < ∞. Moreover, by Hölder's inequality,
E |f + g|^p = E(|f + g| |f + g|^{p−1}) ≤ E(|f| |f + g|^{p−1}) + E(|g| |f + g|^{p−1})
            ≤ (E |f|^p)^{1/p} (E |f + g|^{(p−1)q})^{1/q} + (E |g|^p)^{1/p} (E |f + g|^{(p−1)q})^{1/q}.
Since (p − 1)q = p, dividing by (E |f + g|^p)^{1/q} (the case E |f + g|^p = 0 being trivial) gives the assertion.
Proposition [Chebyshev's inequality] Let f be a random variable such that E f² exists. Then, for all ε > 0,
P(|f − E f| ≥ ε) ≤ E(f − E f)²/ε² ≤ E f²/ε²,
where the second inequality follows from
E(f − E f)² = E f² − (E f)² ≤ E f².
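A one-line worked consequence (added for illustration): taking ε = 2σ, where σ² := E(f − E f)², gives

P(|f − E f| ≥ 2σ) ≤ σ²/(2σ)² = 1/4,

so any random variable with finite variance deviates from its mean by more than two standard deviations with probability at most 25%.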
Chapter 4
Modes of convergence

4.1 Definitions

Definition 4.1.1 Let (Ω, F, P) be a probability space and f, f_1, f_2, ... : Ω → ℝ be random variables.
(1) The sequence (f_n)_{n=1}^∞ converges almost surely (a.s.) to f if and only if
P({ω : f_n(ω) → f(ω) as n → ∞}) = 1.
(2) The sequence (f_n)_{n=1}^∞ converges in probability to f if and only if, for all ε > 0,
P({ω : |f_n(ω) − f(ω)| > ε}) → 0  as n → ∞.
(3) If 0 < p < ∞, then (f_n)_{n=1}^∞ converges to f in L^p if and only if
E |f_n − f|^p → 0  as n → ∞.
For the above types of convergence the random variables have to be defined on the same probability space. There is a variant without this assumption.

Definition 4.1.2 [Convergence in distribution] Let (Ω_n, F_n, P_n) and (Ω, F, P) be probability spaces and let f_n : Ω_n → ℝ and f : Ω → ℝ be random variables. Then the sequence (f_n)_{n=1}^∞ converges in distribution to f (f_n →^d f) if and only if
E ψ(f_n) → E ψ(f)  as n → ∞
for all bounded continuous functions ψ : ℝ → ℝ.
(3) If f_n → f in L_p, then f_n → f in probability.
(4) One has that f_n →^d f if and only if F_{f_n}(x) → F_f(x) at each point x of continuity of F_f, where F_{f_n} and F_f are the distribution-functions of f_n and f, respectively.
Example On ([0, 1], B([0, 1]), λ) consider indicator functions f_n := 1I_{A_n}, where the A_n run through intervals of length
ℓ_n = 1/2 if n = 1, 2,  ℓ_n = 1/4 if n = 3, 4, ..., 6,  ℓ_n = 1/8 if n = 7, ...,
and so on, covering [0, 1] again and again. Then P(|f_n| > ε) = ℓ_n → 0, so f_n → 0 in probability, but for every ω the sequence (f_n(ω)) takes the values 0 and 1 infinitely often, so there is no almost sure convergence.
4.2 Some applications

Proposition [Weak law of large numbers] Let (f_n)_{n=1}^∞ be a sequence of independent random variables with
E f_k = m  and  E(f_k − m)² = σ²  for all k = 1, 2, ...
Then
(f_1 + ··· + f_n)/n → m in probability as n → ∞,
that means, for each ε > 0,
lim_{n→∞} P({ω : |(f_1(ω) + ··· + f_n(ω))/n − m| > ε}) = 0.

Proof. By Chebyshev's inequality we have that
P({ω : |f_1 + ··· + f_n − nm| > nε}) ≤ E |f_1 + ··· + f_n − nm|² / (n²ε²) = E(Σ_{k=1}^n (f_k − m))² / (n²ε²) = nσ² / (n²ε²) → 0
as n → ∞.

Using a stronger condition, we get easily more: the almost sure convergence instead of the convergence in probability.
Proposition 4.2.2 [Strong law of large numbers] Let (f_n)_{n=1}^∞ be a sequence of independent random variables with E f_k = 0, k = 1, 2, ..., and c := sup_n E f_n^4 < ∞. Then
(f_1 + ··· + f_n)/n → 0 a.s.
Proof. Let S_n := Σ_{k=1}^n f_k. It holds
E S_n^4 = E (Σ_{k=1}^n f_k)^4 = Σ_{i,j,k,l=1}^n E f_i f_j f_k f_l = Σ_{k=1}^n E f_k^4 + 3 Σ_{k,l=1, k≠l}^n E f_k² f_l²,
because for independent mean-zero random variables all mixed terms containing a factor to the first power vanish. By the Cauchy-Schwarz inequality,
E f_k² f_l² ≤ (E f_k^4)^{1/2} (E f_l^4)^{1/2} ≤ c.
Hence
E (S_n/n)^4 = E S_n^4 / n^4 ≤ (nc + 3n(n − 1)c)/n^4 ≤ 3c/n²,
and
Σ_{n=1}^∞ E (S_n/n)^4 = E Σ_{n=1}^∞ (S_n/n)^4 ≤ Σ_{n=1}^∞ 3c/n² < ∞.
Consequently, Σ_{n=1}^∞ (S_n(ω)/n)^4 < ∞ for almost all ω, so that
S_n/n → 0 a.s.
There are several strong laws of large numbers with other, in particular weaker, conditions. Another set of results related to almost sure convergence comes from Kolmogorov's 0-1-law. For example, we know that Σ_{n=1}^∞ 1/n = ∞ but that Σ_{n=1}^∞ (−1)^n/n converges. What happens if we choose the signs +, − randomly, for example using independent random variables ε_n, n = 1, 2, ..., with
P(ε_n = 1) = P(ε_n = −1) = 1/2?
In other words, what is the probability of the event
A := {ω : Σ_{n=1}^∞ ε_n(ω)/n converges}?   (4.1)
Proposition [Kolmogorov's 0-1-law] Let (f_n)_{n=1}^∞ be a sequence of independent random variables and let
F_n := σ(f_n, f_{n+1}, ...).
Define the tail σ-algebra
T := ⋂_{n=1}^∞ F_n.
Then P(A) ∈ {0, 1} for all A ∈ T.

Proof. See [5].
Example 4.2.5 Let us come back to the set A considered in Formula (4.1). For all n ∈ {1, 2, ...} we have
A = {ω : Σ_{k=n}^∞ ε_k(ω)/k converges} ∈ F_n,
so that A ∈ T.
We close with a fundamental example concerning the convergence in distribution: the Central Limit Theorem (CLT). For this we need

Definition 4.2.6 Let (Ω, F, P) be a probability space. A sequence of Independent random variables f_n : Ω → ℝ is called Identically Distributed (i.i.d.) provided that the random variables f_n have the same law, that means
P(f_n ≤ λ) = P(f_k ≤ λ)
for all n, k = 1, 2, ... and all λ ∈ ℝ.
Let (Ω, F, P) be a probability space and (f_n)_{n=1}^∞ be a sequence of i.i.d. random variables with E f_1 = 0 and E f_1² = σ². By the law of large numbers we know
(f_1 + ··· + f_n)/n → 0 in probability.
Hence the law of the limit is the Dirac-measure δ_0. Is there a right scaling factor c(n) such that
(f_1 + ··· + f_n)/c(n) → g
for some non-degenerate random variable g? The Central Limit Theorem answers this with the scaling c(n) := σ√n:
P((f_1 + ··· + f_n)/(σ√n) ≤ x) → P(g ≤ x)  as n → ∞, for all x ∈ ℝ,
for any g with
P(g ≤ x) = (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du,
that means g has the standard normal distribution N_{0,1}.
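A simulation sketch of the Central Limit Theorem (numpy, the choice of random signs ε_k = ±1 with σ² = 1, and the sample sizes are assumptions made for this illustration): standardized sums have an empirical distribution function close to the standard normal one.

    import numpy as np
    from math import erf, sqrt

    rng = np.random.default_rng(1)
    n, repetitions = 1000, 50_000
    signs = rng.choice([-1.0, 1.0], size=(repetitions, n))   # E f_1 = 0, variance 1
    z = signs.sum(axis=1) / sqrt(n)                          # (f_1 + ... + f_n) / (sigma sqrt(n))

    def phi(x: float) -> float:
        # standard normal distribution function
        return 0.5 * (1.0 + erf(x / sqrt(2.0)))

    for x in (-1.0, 0.0, 1.0, 1.96):
        print(x, (z <= x).mean(), phi(x))    # empirical value vs. limiting value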
Index

λ-system, 25
lim inf_n A_n, 15
lim inf_n ξ_n, 15
lim sup_n A_n, 15
lim sup_n ξ_n, 15
π-system, 20
π-systems and uniqueness of measures, 20
π-λ-Theorem, 25
σ-algebra, 8
σ-finite, 12
algebra, 8
axiom of choice, 26
Bayes' formula, 17
binomial distribution, 20
Borel σ-algebra, 11
Borel σ-algebra on ℝ^n, 20
Carathéodory's extension theorem, 19
central limit theorem, 67
change of variables, 49
Chebyshev's inequality, 58
closed set, 11
conditional probability, 16
convergence almost surely, 63
convergence in L^p, 63
convergence in distribution, 63
convergence in probability, 63
convexity, 58
counting measure, 13
Dirac measure, 12
distribution-function, 34
dominated convergence, 47
equivalence relation, 25
Hölder's inequality, 59
i.i.d. sequence, 67
independence of a family of events, 36
independence of a family of random variables, 35
independence of a finite family of random variables, 36
independence of a sequence of events, 15
Jensen's inequality, 58
Kolmogorov's 0-1-law, 66
law of a random variable, 33
Lebesgue integrable, 41
Lebesgue measure, 21, 22
Lebesgue's Theorem, 47
lemma of Borel-Cantelli, 17
lemma of Fatou, 15
lemma of Fatou for random variables, 47
measurable map, 32
measurable space, 8
measurable step-function, 29
measure, 12
measure space, 12
Minkowski's inequality, 60
monotone class theorem, 52
monotone convergence, 45
open set, 11
Poisson distribution, 21
Poisson's Theorem, 24
probability measure, 12
probability space, 12
product of probability spaces, 19
random variable, 30
Realization of independent random variables, 38
step-function, 29
strong law of large numbers, 65
tail σ-algebra, 66
uniform distribution, 21
variance, 41
vector space, 51
weak law of large numbers, 64
Bibliography
[1] H. Bauer. Probability theory. Walter de Gruyter, 1996.
[2] H. Bauer. Measure and integration theory. Walter de Gruyter, 2001.
[3] P. Billingsley. Probability and Measure. Wiley, 1995.
[4] A.N. Shiryaev. Probability. Springer, 1996.
[5] D. Williams. Probability with martingales. Cambridge University Press,
1991.