

MA359 Lecture Notes

The document is a course outline on Measure Theory by Josephine Evans, detailing key concepts such as integration, σ-algebras, measures, and measurable functions. It emphasizes the importance of Lebesgue measure and integration in advanced analysis and probability theory. The course aims to provide a rigorous understanding of function spaces and convergence theorems essential for further studies in mathematics.


Measure Theory

Josephine Evans

January 13, 2022

Contents
1 Introduction
1.1 Integration
1.2 The most important things you will learn in this course
2 σ-algebras, definition of a measure
2.1 Collections of subsets
2.2 Set functions
3 Week 2 starts here - Outer measure, Lebesgue Measure
4 Week 3 starts here - Outer measure and Lebesgue measure cont.
4.1 Properties of Lebesgue measure
5 Measurable Functions
5.1 Random variables and the measure theoretic formulation of probability - in brief
6 Week 4 starts here - Measurable function cont.
6.1 Convergence of measurable functions
6.2 Egoroff's Theorem and Lusin's Theorem
7 Integration
8 Week 5 starts here - Integration cont.
8.1 Convergence theorems for integrals of functions
9 More integration: Week 7 starts here
9.1 Agreement with Riemann Integral
10 Norms and inequalities
10.1 Inequalities - Week 8 starts here
10.2 Back to Lp spaces
11 Product Measures - Week 9 starts here
11.1 Applications of product measure and Fubini's theorem
12 Radon-Nikodym Theorem - Week 10 starts here
12.1 Signed measures
12.2 Absolute Continuity
12.3 Duality in Lp spaces

1 Introduction
Welcome to measure theory. This course introduces the modern theory of functions and integration
which underpins most advanced analysis topics. In particular the theory of function spaces will be
important in PDEs and the notion of measurable functions allows us to rigorously understand random
variables.
The key example we will study is Lebesgue measure in R^d. The goal of defining Lebesgue measure
is to find a way of assigning length/area/volume (or whatever it is called if d ≥ 4) to a subset of R^d.
It turns out that it is not possible to do this for every possible subset of R^d, but it is possible to do
this for every subset you are likely to come across!

1.1 Integration
One of the most important results of measure theory is the ability to integrate 'against' the measures
that we define. We want this new definition of the integral to agree with the Riemann integral on subsets
of R^d and also allow us to integrate over sets that aren't subsets of R^d or with different weightings of
the different parts of R^d. This new theory of integration allows us to rigorously define expectation in
probability theory and provides numerous convergence theorems which are some of the results you will
use most from this course.

1.2 The most important things you will learn in this course
For your own knowledge of how measures and functions 'really' work:
• How Lebesgue measure is constructed.
• How the Lebesgue integral is constructed.
• How product measures/spaces are constructed.
• How Lp spaces are defined.

For use in later courses:
• The fact that Lebesgue measure exists and does what you expect it to do.
• Why you can work with measures just by looking at how they behave on a π-system (find out
what that is soon!).
• The different ways in which functions can converge.
• Equivalences between ways things converge.
• Convergence theorems: i.e. when convergence of functions implies convergence of their integrals.
• Important inequalities: Hölder/Cauchy-Schwarz, Minkowski, Jensen.
• When you can switch the order of integration.

2 σ-algebras, definition of a measure
2.1 Collections of subsets
We begin with some dry definitions of collections of sets and functions from collections of sets to R.
These will give us the key formal definition of measures.
We begin with the most basic definition. An algebra is a collection of sets closed under finite set
operations.
Definition 2.1 (Algebra). A collection A of subsets of a space E is called an algebra if
• ∅ ∈ A, E ∈ A.
• If A ∈ A then A^c ∈ A.
• If A, B ∈ A then A ∪ B ∈ A.
• If A, B ∈ A then A ∩ B ∈ A.

We next define a σ-algebra. This is the key definition of a collection of sets for measure theory. The
letter σ here denotes countability. A σ-algebra is a collection of subsets of a space E which is closed
under countable set operations.
Definition 2.2 (σ-algebra). A collection A of subsets of a space E is a σ-algebra if
• ∅ ∈ A, E ∈ A.
• If A ∈ A then A^c ∈ A.
• If $A_1, A_2, \dots$ is a countable collection of sets in A then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$.
• If $A_1, A_2, \dots$ is a countable collection of sets in A then $\bigcap_{n=1}^{\infty} A_n \in \mathcal{A}$.

Lemma 2.3. We can equivalently define a σ-algebra as a collection of sets which contains ∅ and is
closed under taking complements and countable unions.
Proof. Suppose that $\mathcal{E}$ is closed under complements and countable unions and contains ∅. Then
it is clear that $E = \emptyset^c \in \mathcal{E}$. We need to show that if $(A_n)_n$ is a sequence in $\mathcal{E}$ then
$\bigcap_n A_n \in \mathcal{E}$. We know that $\bigcap_n A_n = \big(\bigcup_n A_n^c\big)^c$, so this gives our result.

Example 2.4. In this course we really only deal with one 'concrete', non-trivial example of a σ-algebra.
This is complicated to introduce and we will discuss it below. However, in order to better understand
the definition we give a few examples of things which are, and are not, σ-algebras.
• The main example is the Borel σ-algebra, which we will meet in week 2.
• The power set of E is always a σ-algebra.
• If A ⊂ E then {∅, A, A^c, E} is a σ-algebra.
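
On a finite space the axioms can be checked mechanically, which may help in getting a feel for the definition. The following is only an illustrative sketch: the function name is_sigma_algebra is hypothetical, and it relies on the fact that on a finite space countable unions reduce to finite ones.

```python
from itertools import combinations


def is_sigma_algebra(space, family):
    """Check the sigma-algebra axioms for a family of subsets of a FINITE space.

    On a finite space every countable union is a finite union, so it is
    enough to check the empty set, the whole space, complements and
    pairwise unions.
    """
    space = frozenset(space)
    family = {frozenset(s) for s in family}
    if frozenset() not in family or space not in family:
        return False
    # Closed under complements.
    if any(space - a not in family for a in family):
        return False
    # Closed under pairwise (hence all finite) unions.
    return all(a | b in family for a, b in combinations(family, 2))


E = {1, 2, 3, 4}
A = {1, 2}
print(is_sigma_algebra(E, [set(), A, E - A, E]))  # True: {∅, A, A^c, E}
print(is_sigma_algebra(E, [set(), A, E]))         # False: A^c is missing
```

Intersections need not be checked separately, for the reason given in Lemma 2.3.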

We now collect some results and further definitions about σ-algebras.


Lemma 2.5. Suppose E is a space and C is a collection of σ-algebras on E, possibly uncountable. Then
$\bigcap_{\mathcal{A} \in \mathcal{C}} \mathcal{A}$ is also a σ-algebra.

Proof. It is straightforward to check that every part of the definition of a σ-algebra holds for the
intersection.

Corollary 2.6. For any collection F of subsets of a space E there is a smallest σ-algebra containing
F. We call this σ(F), or the σ-algebra generated by F.
Proof. There exists at least one σ-algebra containing F, since the set of all subsets of E is a σ-algebra.
Then we can consider the non-empty intersection $\bigcap_{\mathcal{A} \in \mathcal{C}} \mathcal{A}$, where C is the collection of all σ-algebras
which contain F. We call this resulting σ-algebra the σ-algebra generated by F.
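
For a finite space, σ(F) can also be computed 'from below' by repeatedly adding complements and unions until nothing new appears. This is a sketch under that finiteness assumption, not the intersection construction used in the proof; the two agree on a finite space because the closure is contained in every σ-algebra containing F. The function name generated_sigma_algebra is hypothetical.

```python
def generated_sigma_algebra(space, generators):
    """Smallest family containing the generators that is closed under
    complements and unions, for a FINITE space (illustration only)."""
    space = frozenset(space)
    family = {frozenset(), space} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for a in current:
            # Candidates: the complement of a, and unions of a with known sets.
            candidates = [space - a] + [a | b for b in current]
            for c in candidates:
                if c not in family:
                    family.add(c)
                    changed = True
    return family


sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
print(len(sigma))                          # number of sets in σ(F)
print(sorted(sorted(s) for s in sigma))    # the sets themselves
```

Here the generators {1} and {1, 2} split the space into the pieces {1}, {2} and {3, 4}, and σ(F) consists of all unions of these pieces.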
Example 2.7 (Key example: Borel σ-algebra). If E is a topological space and O the family of open sets
in E , then we write B(E) to be the σ -algebra generated by O. This is called the Borel σ -algebra.
We are most interested in B(R^d). We have the following result.
Lemma 2.8. B(R) is generated by each of the following collections of sets.
• The collection of closed sets in R.
• The collection of intervals of the form (−∞, b].
• The collection of intervals of the form (a, b].
Proof. Let us call $\mathcal{B}_1, \mathcal{B}_2, \mathcal{B}_3$ the σ-algebras generated by the collections above. We first want to show
that $\mathcal{B}(\mathbb{R}) \supseteq \mathcal{B}_1 \supseteq \mathcal{B}_2 \supseteq \mathcal{B}_3$.
As B(R) contains all the open sets, it also contains all the closed sets (whose complements are open).
Therefore, it also contains $\mathcal{B}_1$.
As $\mathcal{B}_1$ contains all the closed sets, and all the intervals (−∞, b] are closed, $\mathcal{B}_1$ contains the
σ-algebra generated by these sets, namely $\mathcal{B}_2$.
As $\mathcal{B}_2$ contains (−∞, b] and (−∞, a] and is closed under complements, it also contains (−∞, b] and
(a, ∞). As $\mathcal{B}_2$ is closed under intersection, this means it also contains (a, b]. This is true for all a < b,
so $\mathcal{B}_2$ contains all sets of this form. Consequently, it contains $\mathcal{B}_3$.
Now we want to show that $\mathcal{B}(\mathbb{R}) \subseteq \mathcal{B}_3$. This will conclude the proof. First we note that we can
make an open interval $(a, b) = \bigcup_n (a, b - 1/n]$, where the union is taken over all $n > (b - a)^{-1}$. Now we
need to show that any open set in R is a countable union of open intervals. Let U be such an open set,
then let
\[ O = \bigcup_{q \in \mathbb{Q} \cap U} \; \bigcup_{r \in \mathbb{Q} \text{ s.t. } (q-r,\, q+r) \subseteq U} (q - r, q + r). \]
Since O is a union of subsets of U, we have O ⊆ U. Suppose that x ∈ U; then there exists some ρ > 0
such that (x − ρ, x + ρ) ⊆ U. There are rationals q, r such that x ∈ (q − r, q + r) ⊆ (x − ρ, x + ρ),
therefore x ∈ O. Consequently U = O. As O is a countable union of open intervals, each of which lies
in $\mathcal{B}_3$, we get $U \in \mathcal{B}_3$ and hence $\mathcal{B}(\mathbb{R}) \subseteq \mathcal{B}_3$.
We have two further definitions of collections of sets which will be useful. These separate the two
parts of the definition of a σ-algebra.
Definition 2.9 (π-system). A collection A of subsets of E is a π-system if
• ∅ ∈ A.
• If A, B ∈ A then A ∩ B ∈ A.
Definition 2.10 (d-system). A collection A of subsets of E is a d-system if
• E ∈ A.
• If A, B ∈ A with A ⊂ B then B \ A ∈ A.
• If $A_1 \subset A_2 \subset A_3 \subset \dots$ are in A then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$.

Lemma 2.11 (Dynkin's π-system lemma). Let A be a π-system. Then any d-system containing A also
contains the σ-algebra generated by A.
Proof. This is an exercise.

2.2 Set functions
Definition 2.12 (Set function). A set function φ is a function from a family A of subsets of a space E
to R ∪ {∞}.
Definition 2.13 (Measure). A measure is a specific type of set function which satisfies certain axioms.
A set function µ defined on a σ-algebra A is a measure if
• µ(A) ≥ 0 for every A ∈ A.
• µ(∅) = 0.
• If $A_1, A_2, A_3, \dots$ are all pairwise disjoint and in A then
\[ \mu\Big(\bigcup_n A_n\Big) = \sum_n \mu(A_n). \]

We call this last property countable additivity.


Example 2.14 (Delta (function)). You've probably seen $\delta_{x_0}(x)$ used before; it is similar to the Kronecker
delta which appears in discrete spaces, $\delta_{x,y} = 1$ if and only if x = y. This is the 'function' defined by
$\int \delta_{x_0}(x) f(x)\,dx = f(x_0)$. We can define a measure on R^d which will have this property by
\[ \delta_{x_0}(A) = \begin{cases} 1 & x_0 \in A, \\ 0 & x_0 \notin A. \end{cases} \]
Example 2.15 (Countable space). If $E = \{x_1, x_2, \dots\}$ is a countable space and $F : E \to \mathbb{R}_{\ge 0}$ is a
non-negative function then we can define a measure by $\mu(A) = \sum_n F(x_n) 1_{x_n \in A}$. In fact any measure
on a countable set can be written this way by choosing $F(x_n) = \mu(\{x_n\})$.
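
As a small illustration of Example 2.15, the sketch below builds µ from a weight function F on a three-point space (a finite stand-in for a countable one) and checks additivity on a pair of disjoint sets. The names make_measure and F are chosen for this example only; exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def make_measure(weights):
    """Return mu with mu(A) = sum of weights[x] over x in A.

    `weights` plays the role of the non-negative function F in the example.
    """
    def mu(subset):
        return sum((weights[x] for x in subset), Fraction(0))
    return mu


F = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
mu = make_measure(F)

A, B = {"a"}, {"b", "c"}             # disjoint sets
print(mu(set()))                     # 0
print(mu(A | B) == mu(A) + mu(B))    # True: additivity on disjoint sets
print(mu({"a"}) == F["a"])           # True: F(x) is recovered as mu({x})
```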
Example 2.16 (Function (informally)). In the course we will define this rigorously later. However we
can define a measure on R^d by integrating a function over subsets of R^d. If f is a non-negative function
then we define $\mu_f(A) = \int_A f(x)\,dx$.
We also define two further possible properties of set functions.
Definition 2.17 (Monotonicity). A set function φ is monotone if whenever A ⊆ B we have φ(A) ≤
φ(B).
Definition 2.18 (Countable subadditivity). A set function φ is countably subadditive if for every
sequence of sets $A_1, A_2, A_3, \dots$ we have
\[ \varphi\Big(\bigcup_n A_n\Big) \le \sum_n \varphi(A_n). \]

Lemma 2.19. If µ is a measure then µ is both monotone and countably subadditive.

Proof. Suppose A ⊆ B; then B = A ∪ (B \ A) and this union is disjoint. Countable additivity then
implies that µ(B) = µ(A) + µ(B \ A), and since µ(B \ A) ≥ 0 we have µ(A) ≤ µ(B).
Now take a sequence $A_1, A_2, A_3, \dots$ and define $B_n = A_n \setminus \big(A_n \cap \bigcup_{k=1}^{n-1} A_k\big)$; then the $B_n$ form a
disjoint sequence with $\bigcup_n A_n = \bigcup_n B_n$. We also have, for every n, that $B_n \subseteq A_n$, so by monotonicity
$\mu(B_n) \le \mu(A_n)$. Then using countable additivity on the union of the $B_n$ we have
\[ \mu\Big(\bigcup_n A_n\Big) = \mu\Big(\bigcup_n B_n\Big) = \sum_n \mu(B_n) \le \sum_n \mu(A_n). \]
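
The 'disjointification' step in this proof can be tried out on a finite list of sets. The sketch below is an illustration only: the helper name disjointify is hypothetical, and counting measure (µ(A) = number of elements) stands in for a general measure.

```python
def disjointify(sets):
    """Return B_n = A_n minus (A_1 ∪ ... ∪ A_{n-1}) for a finite list of sets."""
    seen = set()          # union of the sets processed so far
    result = []
    for a in sets:
        result.append(set(a) - seen)
        seen |= set(a)
    return result


A = [{1, 2}, {2, 3}, {1, 4}]
B = disjointify(A)

union_A = set().union(*A)
union_B = set().union(*B)
print(union_A == union_B)                 # True: the unions agree
print(sum(len(b) for b in B))             # counting measure of the union
print(sum(len(a) for a in A))             # sum of the measures of the A_n
```

With counting measure, the second number can never exceed the third, which is exactly the subadditivity inequality.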

Definition 2.20 (Measurable space). We call a pair (E, A) of a space and a σ-algebra a measurable
space.
Definition 2.21 (Measure space). We call a triple (E, A, µ) of a space, a σ-algebra and a measure a
measure space.
Definition 2.22 (Finite measure space). We call a measure space (E, A, µ) finite if µ(E) < ∞.
Definition 2.23 (σ-finite measure space). We call a measure space (E, A, µ) σ-finite if there exists
a countable collection $E_1, E_2, \dots \in \mathcal{A}$ such that
\[ E = \bigcup_n E_n \]
and
\[ \mu(E_i) < \infty \quad \forall i. \]

Definition 2.24 (Borel measures and Radon measures). A measure µ on a subset of a topological
space E is called a Borel measure if it is a measure with respect to the Borel σ-algebra.
A Borel measure is called a Radon measure if for every compact set K ∈ B(E) we have that
µ(K) < ∞.

Lemma 2.25 (Continuity of measure). Let $(E, \mathcal{E}, \mu)$ be a measure space. Suppose that $(A_n)_n$ is a
sequence of measurable sets with $A_1 \subseteq A_2 \subseteq \dots$ and $(B_n)_n$ is a sequence of measurable sets with
$B_1 \supseteq B_2 \supseteq \dots$ and $\mu(B_1) < \infty$. Then we have
\[ \mu\Big(\bigcup_n A_n\Big) = \lim_n \mu(A_n) \]
and
\[ \mu\Big(\bigcap_n B_n\Big) = \lim_n \mu(B_n). \]

Proof. Let $\tilde{A}_n = A_n \setminus A_{n-1}$ (with $A_0 = \emptyset$). We have that $\bigcup_n A_n = \bigcup_n \tilde{A}_n$. Furthermore, the $\tilde{A}_n$ are
disjoint, so countable additivity gives us that
\[ \mu\Big(\bigcup_n \tilde{A}_n\Big) = \sum_n \mu(\tilde{A}_n). \]
Therefore, we have $\sum_{n=1}^{m} \mu(\tilde{A}_n) \to \mu\big(\bigcup_n A_n\big)$ as m → ∞. We also have
\[ \sum_{n=1}^{m} \mu(\tilde{A}_n) = \mu\Big(\bigcup_{n=1}^{m} \tilde{A}_n\Big) = \mu(A_m). \]
Now we move onto the $B_n$. Let $C_n = B_1 \setminus B_n$; then the $C_n$ are an increasing sequence of measurable
sets with $C_n \uparrow B_1 \setminus \bigcap_n B_n$. So by the first part we have $\mu\big(B_1 \setminus \bigcap_n B_n\big) = \lim_n \mu(C_n)$. Therefore
\[ \mu(B_1) - \mu\Big(\bigcap_n B_n\Big) = \mu(B_1) - \lim_n \mu(B_n). \]
This gives the result as long as $\mu(B_1) < \infty$. If there exists an m such that $\mu(B_m) < \infty$ then we can
renumber starting with m and repeat the argument above.
N.b. the fact that µ(B) < ∞ implies µ(A) < ∞ if A ⊆ B follows from finite additivity: µ(B) =
µ(A) + µ(B \ A) ≥ µ(A).
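
The finiteness assumption on the decreasing sequence cannot simply be dropped. A standard counterexample, sketched here with counting measure on the natural numbers (an illustration that is not taken from the notes themselves):

```latex
\[
  E = \mathbb{N}, \qquad \mu(A) = \#A, \qquad B_n = \{n, n+1, n+2, \dots\}.
\]
\[
  B_1 \supseteq B_2 \supseteq \dots, \qquad
  \mu(B_n) = \infty \ \text{for every } n, \qquad
  \bigcap_n B_n = \emptyset,
\]
\[
  \mu\Big(\bigcap_n B_n\Big) = 0 \;\neq\; \infty = \lim_n \mu(B_n).
\]
```

Here no $B_m$ has finite measure, so the renumbering trick from the proof is not available either.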

3 Week 2 starts here - Outer measure, Lebesgue Measure
Definition 3.1 (Outer measure). We write P(E) for the power set of E, that is to say the set of
all subsets of E. An outer measure is a function ν from P(E) to R+ ∪ {∞} such that
• ν(∅) = 0,
• If A ⊆ B then ν(A) ≤ ν(B) (this is called monotonicity),
• If $A_1, A_2, \dots$ is a sequence of subsets then $\nu\big(\bigcup_n A_n\big) \le \sum_n \nu(A_n)$ (this is called countable
subadditivity).
The key example of an outer measure is Lebesgue outer measure; defining this is our first step to
defining Lebesgue measure.
Definition 3.2 (Lebesgue measure on unions of intervals). Let us call I the set of countable unions of
half-open intervals (a, b]. That is to say I is the set of all sets of the form
\[ \bigcup_n (a_n, b_n]. \]
We also write J for the set of finite unions of half open intervals. Then we define a set function λ
from J to R by
\[ \lambda\big((a_1, b_1] \cup (a_2, b_2] \cup \dots \cup (a_n, b_n]\big) = \sum_{i=1}^{n} (b_i - a_i), \]
where the intervals $(a_i, b_i]$ are taken to be pairwise disjoint.
We are most interested in λ defined on a single half open interval. Using this we can define Lebesgue
outer measure.
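
To evaluate λ on a finite union of half open intervals that may overlap, one can first merge them into disjoint pieces and then add up the lengths. The following is a minimal sketch of that bookkeeping; the function name total_length and the representation of (a, b] as a pair (a, b) are assumptions made for this example.

```python
def total_length(intervals):
    """Length of a finite union of half-open intervals (a, b], given as (a, b) pairs.

    Overlapping or touching intervals are merged first, so that the sum of
    (b_i - a_i) is taken over pairwise disjoint pieces.
    """
    pieces = sorted((a, b) for a, b in intervals if a < b)  # drop empty intervals
    total = 0
    current_a = current_b = None
    for a, b in pieces:
        if current_b is None or a > current_b:
            # A gap: close the current merged piece and start a new one.
            if current_b is not None:
                total += current_b - current_a
            current_a, current_b = a, b
        else:
            # Overlap or touching endpoints: extend the current piece.
            current_b = max(current_b, b)
    if current_b is not None:
        total += current_b - current_a
    return total


print(total_length([(0, 1), (2, 4)]))  # 3: disjoint intervals, lengths 1 + 2
print(total_length([(0, 2), (1, 3)]))  # 3: the union is (0, 3]
```

Touching intervals such as (0, 1] and (1, 2] are merged as well, since their union is the single interval (0, 2].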
Definition 3.3 (Lebesgue outer measure). We define Lebesgue outer measure on P(R) by
\[ \lambda^*(A) = \inf\Big\{ \sum_n \lambda(I_n) : I_n \text{ are half open intervals},\ A \subset \bigcup_n I_n \Big\}. \]

Proposition 3.4. Lebesgue outer measure is an outer measure and agrees with λ on any half open
interval.
Proof. We need to check each part of the definition of outer measure. First, the fact that λ∗(∅) = 0
follows from the fact that ∅ ∈ I and λ(∅) = 0. Now suppose that $A_1 \subset A_2$; then any set B ∈ I with
$A_2 \subseteq B$ also has $A_1 \subseteq B$, so
\[ \inf\Big\{ \sum_n \lambda(I_n) : I_n \text{ are intervals},\ A_1 \subset B = \bigcup_n I_n \Big\}
   \le \inf\Big\{ \sum_n \lambda(I_n) : I_n \text{ are intervals},\ A_2 \subset B = \bigcup_n I_n \Big\}, \]
as the infimum over a larger set will always be smaller. Now let us turn to the countable subadditivity.
Let us take some sequence $A_1, A_2, \dots$; if $\sum_n \lambda^*(A_n) = \infty$ then we are done. Therefore we can assume
that $\sum_n \lambda^*(A_n) < \infty$. Now let us fix an arbitrary ε > 0. By the definition of λ∗, for each n there
exists some $I_n \in \mathcal{I}$ such that $A_n \subseteq I_n$ and $I_n = \bigcup_k I_{n,k}$, where the $I_{n,k}$ are half open intervals, and
$\sum_k \lambda(I_{n,k}) \le \lambda^*(A_n) + \varepsilon 2^{-n}$. Then the set $I = \bigcup_n I_n$ is in $\mathcal{I}$, contains $\bigcup_n A_n$, and
$\sum_{n,k} \lambda(I_{n,k}) = \sum_n \sum_k \lambda(I_{n,k}) \le \sum_n \lambda^*(A_n) + \varepsilon$. Therefore
$\lambda^*\big(\bigcup_n A_n\big) \le \sum_n \lambda^*(A_n) + \varepsilon$, and since ε is arbitrary this gives countable subadditivity.
Lastly, if A is the interval (a, b] then (a, b] ∈ I so λ∗(A) ≤ b − a. Suppose that
$(a, b] \subseteq (c_1, d_1] \cup (c_2, d_2] \cup \dots$. Then we have for any ε, δ > 0 that
\[ [a + \varepsilon, b - \varepsilon] \subseteq (c_1 - \delta/2, d_1 + \delta/2) \cup (c_2 - \delta/4, d_2 + \delta/4) \cup \dots \cup (c_k - 2^{-k}\delta, d_k + 2^{-k}\delta) \cup \dots. \]
Then using compactness there exists some n such that
\[ [a + \varepsilon, b - \varepsilon] \subseteq \bigcup_{k=1}^{n} (c_k - 2^{-k}\delta, d_k + 2^{-k}\delta). \]
Then we can compare the lengths of these two sets to get
\[ b - a - 2\varepsilon \le \sum_{k=1}^{n} (d_k - c_k + 2^{-k+1}\delta) \le \sum_{k=1}^{\infty} (d_k - c_k) + 2\delta. \]
Both ε and δ are arbitrary so we can let them go to 0 and get
\[ b - a \le \sum_k (d_k - c_k). \]
Now ranging over all possible covering sequences gives
\[ b - a \le \lambda^*((a, b]). \]

Note: When we are working in R^d, as in the assignment, the terms involving ε and δ will be multiplied
by something involving the side lengths of rectangles. In order to run the proof you can say that wlog
all the rectangles you are looking at are contained inside some fixed large rectangle. This will allow
you to send ε and δ to zero without having to worry.
We want to turn this outer measure into a true measure. In order to do this we need to restrict λ∗
to some subset of P(R).
Definition 3.5 (Lebesgue measurable sets). We call a set A ∈ P(R) Lebesgue measurable if for
any B ∈ P(R) we have
\[ \lambda^*(B) = \lambda^*(A \cap B) + \lambda^*(A^c \cap B). \]
Proposition 3.6. The collection of Lebesgue measurable sets, M, is a σ-algebra.
Proof. First let us notice that the definition of a Lebesgue measurable set is symmetric in A and $A^c$,
so A ∈ M implies that $A^c \in \mathcal{M}$.
Secondly we can see that ∅ ∈ M as $\lambda^*(A \cap \emptyset) + \lambda^*(A \cap \emptyset^c) = \lambda^*(\emptyset) + \lambda^*(A \cap E) = 0 + \lambda^*(A)$. This
also implies via the first point that E ∈ M.
We then show that if $A_1, A_2 \in \mathcal{M}$ then $A_1 \cup A_2 \in \mathcal{M}$. Using the fact that $A_1 \in \mathcal{M}$ we have
\[ \lambda^*(B \cap (A_1 \cup A_2)) = \lambda^*(B \cap (A_1 \cup A_2) \cap A_1) + \lambda^*(B \cap (A_1 \cup A_2) \cap A_1^c) = \lambda^*(B \cap A_1) + \lambda^*(B \cap A_2 \cap A_1^c). \]
We also have the identity $(A_1 \cup A_2)^c = A_1^c \cap A_2^c$, therefore
\[ \lambda^*(B \cap (A_1 \cup A_2)) + \lambda^*(B \cap (A_1 \cup A_2)^c) = \lambda^*(B \cap A_1) + \lambda^*(B \cap A_2 \cap A_1^c) + \lambda^*(B \cap A_1^c \cap A_2^c). \]
Then since $A_2 \in \mathcal{M}$ we have
\[ \lambda^*(B \cap A_2 \cap A_1^c) + \lambda^*(B \cap A_1^c \cap A_2^c) = \lambda^*(B \cap A_1^c). \]
Therefore,
\[ \lambda^*(B \cap (A_1 \cup A_2)) + \lambda^*(B \cap (A_1 \cup A_2)^c) = \lambda^*(B \cap A_1) + \lambda^*(B \cap A_1^c). \]
Then we use again the fact that $A_1 \in \mathcal{M}$ to get
\[ \lambda^*(B \cap (A_1 \cup A_2)) + \lambda^*(B \cap (A_1 \cup A_2)^c) = \lambda^*(B). \]

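This shows that $A_1 \cup A_2 \in \mathcal{M}$.
Now let us take an infinite sequence of disjoint sets $A_1, A_2, A_3, \dots$ in M; then we will show
\[ \lambda^*(B) = \sum_{i=1}^{n} \lambda^*(B \cap A_i) + \lambda^*\Big(B \cap \Big(\bigcap_{i=1}^{n} A_i^c\Big)\Big). \]
We can show this by induction. The base case n = 1 follows from the fact that $A_1 \in \mathcal{M}$.
Then by induction suppose we know that
\[ \lambda^*(B) = \sum_{i=1}^{n-1} \lambda^*(B \cap A_i) + \lambda^*\Big(B \cap \Big(\bigcap_{i=1}^{n-1} A_i^c\Big)\Big). \]
Now since $A_n \in \mathcal{M}$ we have
\[ \lambda^*\Big(B \cap \Big(\bigcap_{i=1}^{n-1} A_i^c\Big)\Big)
   = \lambda^*\Big(B \cap A_n \cap \Big(\bigcap_{i=1}^{n-1} A_i^c\Big)\Big)
   + \lambda^*\Big(B \cap A_n^c \cap \Big(\bigcap_{i=1}^{n-1} A_i^c\Big)\Big). \]
Now since $A_n$ is disjoint from $A_1, \dots, A_{n-1}$ we have that $A_n \cap \big(\bigcap_{i=1}^{n-1} A_i^c\big) = A_n$, so we have
\[ \lambda^*\Big(B \cap \Big(\bigcap_{i=1}^{n-1} A_i^c\Big)\Big)
   = \lambda^*(B \cap A_n) + \lambda^*\Big(B \cap \Big(\bigcap_{i=1}^{n} A_i^c\Big)\Big). \]
This gives our induction step.

By monotonicity of the outer measure this gives that for any n we have
\[ \lambda^*(B) \ge \sum_{i=1}^{n} \lambda^*(B \cap A_i) + \lambda^*\Big(B \cap \Big(\bigcap_{i=1}^{\infty} A_i^c\Big)\Big). \]
Consequently we can let n tend to infinity to get
\[ \lambda^*(B) \ge \sum_{i=1}^{\infty} \lambda^*(B \cap A_i) + \lambda^*\Big(B \cap \Big(\bigcap_{i=1}^{\infty} A_i^c\Big)\Big). \]
Now we can use the countable subadditivity of λ∗ to get
\[ \lambda^*(B) \ge \lambda^*\Big(B \cap \Big(\bigcup_{i=1}^{\infty} A_i\Big)\Big) + \lambda^*\Big(B \cap \Big(\bigcap_{i=1}^{\infty} A_i^c\Big)\Big)
   = \lambda^*\Big(B \cap \Big(\bigcup_{i=1}^{\infty} A_i\Big)\Big) + \lambda^*\Big(B \cap \Big(\bigcup_{i=1}^{\infty} A_i\Big)^c\Big). \]
Furthermore, the subadditivity of λ∗ gives
\[ \lambda^*(B) \le \lambda^*\Big(B \cap \Big(\bigcup_{i=1}^{\infty} A_i\Big)\Big) + \lambda^*\Big(B \cap \Big(\bigcup_{i=1}^{\infty} A_i\Big)^c\Big). \]
Therefore,
\[ \lambda^*(B) = \lambda^*\Big(B \cap \Big(\bigcup_{i=1}^{\infty} A_i\Big)\Big) + \lambda^*\Big(B \cap \Big(\bigcup_{i=1}^{\infty} A_i\Big)^c\Big). \]
A general countable union of sets in M can be rewritten as a countable union of disjoint sets in M
using finite unions and complements. We have now shown that M is closed under complements and
taking countable unions and contains ∅, which is sufficient to show that M is a σ-algebra.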

Proposition 3.7. The restriction of λ∗ to M is a measure.
Proof. We need to show that λ∗ is countably additive on M, so let $A_1, A_2, \dots$ be a sequence of disjoint
subsets in M. In the proof that M is a σ-algebra we showed that
\[ \lambda^*(B) \ge \sum_{i=1}^{\infty} \lambda^*(B \cap A_i) + \lambda^*\Big(B \cap \Big(\bigcap_{i=1}^{\infty} A_i^c\Big)\Big). \]
Now let us take the particular case where $B = \bigcup_{i=1}^{\infty} A_i$; this gives
\[ \lambda^*\Big(\bigcup_{i=1}^{\infty} A_i\Big) \ge \sum_{i=1}^{\infty} \lambda^*(A_i) + \lambda^*\Big(\Big(\bigcup_{i=1}^{\infty} A_i\Big) \cap \Big(\bigcup_{i=1}^{\infty} A_i\Big)^c\Big) = \sum_{i=1}^{\infty} \lambda^*(A_i). \]
Countable subadditivity gives
\[ \lambda^*\Big(\bigcup_{i=1}^{\infty} A_i\Big) \le \sum_{i=1}^{\infty} \lambda^*(A_i), \]
so consequently
\[ \lambda^*\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \lambda^*(A_i). \]

Remark 3.8. We now call the restriction of λ∗ to M, λ, and call it Lebesgue measure.
We now want to know that there are some Lebesgue measurable sets. In order to do this we first
show that all the intervals of the form (−∞, b] are Lebesgue measurable.
Lemma 3.9. The intervals of the form (−∞, b] are Lebesgue measurable.
Proof. Let B be a subset of R and let $I_1, I_2, \dots$ be a sequence of half open intervals such that
$B \subseteq I_1 \cup I_2 \cup \dots$. Now let us define the (often empty) intervals $I_i^l = I_i \cap (-\infty, b]$ and
$I_i^r = I_i \cap (b, \infty)$; these are also half open intervals. We have $B \cap (-\infty, b] \subseteq \bigcup_n I_n^l$ and
$B \cap (b, \infty) \subseteq \bigcup_n I_n^r$. Therefore we have
\[ \lambda^*(B \cap (-\infty, b]) \le \sum_n \lambda(I_n^l), \qquad \lambda^*(B \cap (b, \infty)) \le \sum_n \lambda(I_n^r). \]
Using this we have
\[ \lambda^*(B \cap (-\infty, b]) + \lambda^*(B \cap (b, \infty)) \le \sum_n \lambda(I_n^l) + \sum_n \lambda(I_n^r) = \sum_n \lambda(I_n). \]
We can then take the infimum over all possible sequences of intervals covering B to get
\[ \lambda^*(B \cap (-\infty, b]) + \lambda^*(B \cap (b, \infty)) \le \lambda^*(B). \]
Combining this with countable subadditivity gives
\[ \lambda^*(B) = \lambda^*(B \cap (-\infty, b]) + \lambda^*(B \cap (b, \infty)). \]
Therefore, (−∞, b] is Lebesgue measurable.


Corollary 3.10. Every set in B(R) is Lebesgue measurable.

Proof. The Borel σ-algebra is the σ-algebra generated by sets of the form (−∞, b], as shown last week.
Therefore, as M is a σ-algebra and contains all the intervals of the form (−∞, b], it contains the
Borel σ-algebra.
The construction of Lebesgue measure via the outer measure can be generalised via Carathéodory's
extension theorem. We briefly give the definition of a ring of subsets.
Definition 3.11 (Ring). A collection of subsets, A, of a space E is called a ring if for every A, B ∈ A
we have A \ B ∈ A and A ∪ B ∈ A.
Now we introduce Carathéodory's Extension theorem. We can see that the proof is in many ways
very similar to the construction of Lebesgue measure.
Theorem 3.12 (Carathéodory's Extension Theorem). Let A be a ring of subsets of E, and let µ : A →
[0, ∞] be a countably additive set function. Then µ extends to a measure on σ(A).
Proof. We define the outer measure µ∗ on P(E) by
\[ \mu^*(B) = \inf\Big\{ \sum_n \mu(A_n) : A_n \in \mathcal{A}\ \forall n,\ B \subset \bigcup_n A_n \Big\}, \]
with µ∗(B) = ∞ if there is no possible sequence of $A_n$ such that B is contained in their union. We can see
immediately that µ∗(∅) = 0 and µ∗ is increasing.
As before we define M to be the set of µ∗-measurable sets, that is, the sets A that satisfy, for every B ⊆ E,
\[ \mu^*(B) = \mu^*(B \cap A) + \mu^*(B \cap A^c). \]
We want to show that M is a σ-algebra and that µ∗ restricts to a measure on M.

First we show that µ∗ is countably subadditive. Suppose that we have a sequence $B_n$ and want to
show that
\[ \mu^*\Big(\bigcup_n B_n\Big) \le \sum_n \mu^*(B_n). \]
Let us fix some ε > 0; then for each n there is a sequence $A_{n,m} \in \mathcal{A}$ such that $B_n \subset \bigcup_m A_{n,m}$ and
$\sum_m \mu(A_{n,m}) \le \mu^*(B_n) + \varepsilon 2^{-n}$. Then $\bigcup_n B_n \subset \bigcup_{n,m} A_{n,m}$ and $\sum_{n,m} \mu(A_{n,m}) \le \sum_n \mu^*(B_n) + \varepsilon$.
Therefore $\mu^*\big(\bigcup_n B_n\big) \le \sum_n \mu^*(B_n) + \varepsilon$. Since ε is arbitrary this gives the countable subadditivity.
S P
Now we show that µ∗ agrees with µ on A. Let us take A ∈ A; clearly A ⊆ A so µ∗(A) ≤ µ(A). Now
suppose that there is a sequence $A_n \in \mathcal{A}$ such that $A \subseteq \bigcup_n A_n$. Then $A \cap A_n = A \setminus (A \setminus A_n) \in \mathcal{A}$.
Therefore we use the countable subadditivity of µ on A to get
\[ \mu(A) \le \sum_n \mu(A_n \cap A) \le \sum_n \mu(A_n). \]
Taking the infimum over such sequences gives µ(A) ≤ µ∗(A).
Therefore µ and µ∗ agree on A.
Now we show that M contains A. That is to say we want to show that if A ∈ A then for every B
\[ \mu^*(B) = \mu^*(B \cap A) + \mu^*(B \cap A^c). \]
Using subadditivity of µ∗ we have that $\mu^*(B) \le \mu^*(B \cap A) + \mu^*(B \cap A^c)$. Therefore we want to show
$\mu^*(B) \ge \mu^*(B \cap A) + \mu^*(B \cap A^c)$. Let $A_n$ be a sequence in A such that $\mu^*(B) \ge \sum_n \mu(A_n) - \varepsilon$ and $B \subseteq \bigcup_n A_n$;
then we already know that $A \cap A_n$ will be in A, and we also have that $A^c \cap A_n = A_n \setminus (A \cap A_n) \in \mathcal{A}$. Therefore
$\mu^*(B \cap A) \le \sum_n \mu(A_n \cap A)$ and $\mu^*(B \cap A^c) \le \sum_n \mu(A^c \cap A_n)$, and consequently
\[ \mu^*(B \cap A) + \mu^*(B \cap A^c) \le \sum_n \big(\mu(A_n \cap A) + \mu(A_n \cap A^c)\big) = \sum_n \mu(A_n) \le \mu^*(B) + \varepsilon. \]
As ε is arbitrary this gives the required result.
The next step is to show that M is a σ-algebra. We start with the algebra part. E and ∅ are in
M as
\[ \mu^*(B) = \mu^*(B \cap E) + \mu^*(B \cap \emptyset), \]
just because B ∩ E = B and B ∩ ∅ = ∅ and we know µ∗(∅) = 0. We also can see that
\[ \mu^*(B) = \mu^*(B \cap A) + \mu^*(B \cap A^c) \]
is symmetric in exchanging A and $A^c$, so if A ∈ M then so is $A^c$. Now suppose $A_1, A_2 \in \mathcal{M}$. We notice
that $(A_1 \cap A_2)^c \cap A_1 = (A_1^c \cup A_2^c) \cap A_1 = (A_1^c \cap A_1) \cup (A_2^c \cap A_1) = A_2^c \cap A_1$ and $A_1^c = A_1^c \cap (A_1 \cap A_2)^c$.
Using this and the fact that $A_1, A_2, A_1^c, A_2^c$ are in M we have
\begin{align*}
\mu^*(B) &= \mu^*(B \cap A_1) + \mu^*(B \cap A_1^c) && \text{(using that } A_1 \in \mathcal{M}\text{)} \\
 &= \mu^*(B \cap (A_1 \cap A_2)) + \mu^*(B \cap A_1 \cap A_2^c) + \mu^*(B \cap A_1^c) && \text{(using that } A_2 \in \mathcal{M}\text{)} \\
 &= \mu^*(B \cap (A_1 \cap A_2)) + \mu^*(B \cap (A_1 \cap A_2)^c \cap A_1) + \mu^*(B \cap A_1^c) && \text{(using our first identity)} \\
 &= \mu^*(B \cap (A_1 \cap A_2)) + \mu^*(B \cap (A_1 \cap A_2)^c \cap A_1) + \mu^*(B \cap (A_1 \cap A_2)^c \cap A_1^c) && \text{(using our second identity)} \\
 &= \mu^*(B \cap (A_1 \cap A_2)) + \mu^*(B \cap (A_1 \cap A_2)^c) && \text{(using the fact that } A_1 \in \mathcal{M}\text{)}.
\end{align*}
Now that we have shown that M is closed under finite intersections, and hence (taking complements) under
finite unions, we want to show it contains countable unions.
Let $A_n$ be a sequence of disjoint sets in M. Let us write $A = \bigcup_n A_n$. Then iterating our previous
result we have for any B, n that
\[ \mu^*(B) = \sum_{k=1}^{n} \mu^*(B \cap A_k) + \mu^*(B \cap A_1^c \cap \dots \cap A_n^c). \]
Now as $A^c \subseteq A_1^c \cap A_2^c \cap \dots \cap A_n^c$ for each n, we have $\mu^*(B \cap A^c) \le \mu^*(B \cap A_1^c \cap \dots \cap A_n^c)$. Therefore for
each n
\[ \mu^*(B) \ge \sum_{k=1}^{n} \mu^*(B \cap A_k) + \mu^*(B \cap A^c). \]
Letting n → ∞ we have
\[ \mu^*(B) \ge \sum_n \mu^*(B \cap A_n) + \mu^*(B \cap A^c). \]
Now we use the countable subadditivity of µ∗ and the fact that $B \cap A = \bigcup_n (B \cap A_n)$ to get
\[ \mu^*(B) \ge \mu^*(B \cap A) + \mu^*(B \cap A^c). \]
As the other inequality holds by subadditivity of µ∗ we have that $\mu^*(B) = \mu^*(B \cap A) + \mu^*(B \cap A^c)$ and
hence A ∈ M.
Lastly, we want to show that µ∗ is a measure on M. In order to do this we need to show that µ∗
is countably additive on M. In the last step we showed that for any B, and a sequence of disjoint sets
$A_n$ in M with $A = \bigcup_n A_n$, that
\[ \mu^*(B) \ge \sum_n \mu^*(A_n \cap B) + \mu^*(B \cap A^c). \]
If we apply this inequality with B = A and use the fact that $A_n \cap A = A_n$ we get
\[ \mu^*(A) \ge \sum_n \mu^*(A_n). \]
Since we already know that µ∗ is countably subadditive this is sufficient to show that µ∗ is countably
additive and hence a measure on M.

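Theorem 3.13 (Uniqueness of Extension). Let $\mu_1$ and $\mu_2$ be measures on $(E, \mathcal{E})$ with $\mu_1(E) = \mu_2(E) < \infty$.
Suppose that $\mu_1 = \mu_2$ on $\mathcal{A}$, where $\mathcal{A}$ is a π-system generating $\mathcal{E}$; then $\mu_1 = \mu_2$ on $\mathcal{E}$.
Proof. Let us consider $\mathcal{D} \subseteq \mathcal{E}$ defined as the collection of measurable sets A on which $\mu_1(A) = \mu_2(A)$. By hypothesis
$E \in \mathcal{D}$ and $\mathcal{A} \subseteq \mathcal{D}$. We want to show that $\mathcal{D}$ is a d-system, so that Dynkin's lemma gives $\mathcal{D} = \mathcal{E}$. Suppose that
$A, B \in \mathcal{E}$ with A ⊆ B; then we have $\mu_i(A) + \mu_i(B \setminus A) = \mu_i(B) < \infty$. This means that if A and B are
in $\mathcal{D}$ then so is B \ A. Suppose that $A_n$ is a sequence of elements in $\mathcal{D}$ with $A_1 \subseteq A_2 \subseteq A_3 \subseteq \dots$; then by
continuity of measure $\mu_1\big(\bigcup_n A_n\big) = \lim_n \mu_1(A_n) = \lim_n \mu_2(A_n) = \mu_2\big(\bigcup_n A_n\big)$. Therefore $\bigcup_n A_n \in \mathcal{D}$.
Therefore, $\mathcal{D}$ is a d-system containing the π-system $\mathcal{A}$, so by Dynkin's lemma it is equal to $\mathcal{E}$.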

4 Week 3 starts here - Outer measure and Lebesgue measure cont.


4.1 Properties of Lebesgue measure
This section is a collection of facts about Lebesgue measure and the set of Lebesgue measurable sets.
We start by looking at M, the σ-algebra of Lebesgue measurable sets.
Lemma 4.1 (Null sets are all Lebesgue measurable). If A is in P(R) and λ∗(A) = 0 then A ∈ M.
Proof. This is on the assignment.
We can actually characterise all Lebesgue measurable sets in terms of null sets and Borel sets, both
of which we have shown are measurable. We have the following proposition, which we won't prove in
the course. We will show a similar result in the optional exercises.
Proposition 4.2. A set S ⊆ R is Lebesgue measurable if and only if there exists a Borel set B and a
null set N such that S = B △ N.
The most important thing we want to prove about M is that there exists a non-Lebesgue measurable
set. Before we do this we need to explore a few more properties of Lebesgue measure itself.
Proposition 4.3. Lebesgue measure is regular, that is to say
• λ(A) = inf{λ(U) : U is open, A ⊆ U},
• λ(A) = sup{λ(K) : K is compact, K ⊆ A}.

Proof. By monotonicity we can see that λ(A) ≤ inf{λ(U) : U is open, A ⊆ U}. Furthermore, given ε > 0, we can
find a sequence of half open rectangles $R_n$ such that $A \subseteq \bigcup_n R_n$ and $\sum_n \lambda(R_n) \le \lambda(A) + \varepsilon$. By slightly
enlarging each of the half open rectangles we can produce another sequence of fully open rectangles
$\tilde{R}_n$ such that $A \subseteq \bigcup_n \tilde{R}_n$ and $\lambda(A) \ge \sum_n \lambda(\tilde{R}_n) - 2\varepsilon$. The set $\bigcup_n \tilde{R}_n$ is open and ε can be made
arbitrarily small, so this shows λ(A) ≥ inf{λ(U) : U is open, A ⊆ U}.
Monotonicity shows that λ(A) ≥ sup{λ(K) : K is compact, K ⊆ A}. First let us assume that
A is contained in some closed ball B around 0. Now use the first part to find some open set U such
that B \ A ⊆ U and λ(U) ≤ λ(B \ A) + ε. Now let K = B \ U; then we have K ⊆ A ⊆ B and
λ(K) ≥ λ(B) − λ(U) ≥ λ(B) − λ(B \ A) − ε = λ(A) − ε (here we use the fact that B, A, U, K will all
have finite measure as they are inside B). As ε is arbitrary this concludes the proof when A is contained
in a ball.
Now suppose that A is unbounded. Then let $A_n = A \cap B_n$, where $B_n$ is the closed ball of radius n.
We have that $\lambda(A_n) \to \lambda(A)$. If λ(A) = ∞ then we can find $K_n \subseteq A_n$ with $\lambda(K_n)$ arbitrarily close to
$\lambda(A_n)$, therefore we can find such a sequence with $\lambda(K_n) \to \infty$. If λ(A) ≠ ∞ then, given ε, there exists
N such that $\lambda(A_n) \ge \lambda(A) - \varepsilon$ for n ≥ N. Then we can find a compact $K_N \subseteq A_N$ such that $\lambda(K_N) \ge \lambda(A_N) - \varepsilon$,
therefore $\lambda(K_N) \ge \lambda(A) - 2\varepsilon$. This shows we can find a compact set which is contained in A, with measure
arbitrarily close to that of A.

We now want to show that Lebesgue measure is the only which assigns each interval the correct
measure.
Proposition 4.4. Lebesgue measure is the only measure on (R, B(R)) which assigns each half open
interval its length. This is equally true with half open hyper-rectangles in Rd .
Proof. The collection of half open intervals is a π-system which generates the Borel σ-algebra. There-
fore, if λ(R) had been nite we could use Dynkin's uniqueness of extension Lemma to get that any
other measure which agrees with Lebesgue measure on the half open intervals must agree with Lebesgue
measure on the whole of the Borel σ -algebra. Instead let En = [−n, n]d then, by Dynkin's uniqueness
of extension lemma we have that λ is the only measure on En , assigning each rectangle inside En its
measure. Since every rectangle is bounded, eventually it is inside some En so if µ is a measure such
that µ(R) = λ(R) for every rectangle then the restriction of µ to En must agree with the restriction
of λ to En . We also have that, for any A, µ(A) = limn µ(A ∩ En ) by continuity of measure. So
µ(A) = limn µ(A ∩ En ) = limn λ(A ∩ En ) = λ(A).

Corollary 4.5. Lebesgue measure is translation invariant. That is to say if we define the set x + A =
{x + y : y ∈ A} then λ(x + A) = λ(A).

Proof. Define a new measure λx by λx(A) = λ(x + A); then λx((a, b]) = λ((a + x, b + x]) = b + x − (a + x) =
b − a. Therefore λx agrees with λ on the half open intervals and therefore agrees with λ on the whole
of B(R). Again it is straightforward to extend this to R^d.
Lastly, in the construction of Lebesgue measure we show that M is not the whole of P(R), that is,
that there exist non-Lebesgue measurable sets.
Proposition 4.6. There exist sets in P(R) which are not in M.
Proof. This proof involves the use of the axiom of choice. In fact it is known that it is necessary to use
some form of the axiom of choice to prove the existence of a non-Lebesgue measurable set in R.
We argue by contradiction and begin by assuming every subset of R is Lebesgue measurable.
We define an equivalence relation on [0, 1) by saying x ∼ y exactly when x − y ∈ Q. Using the
axiom of choice we find a subset S of [0, 1) which contains exactly one representative of each equivalence
class. Next we define the set S + q = {s + q (mod 1) : s ∈ S} for each q ∈ Q ∩ [0, 1). Then by our
choice of S we have that
\[ [0, 1) = \bigcup_{q \in \mathbb{Q} \cap [0,1)} (S + q), \]
where this union is disjoint. We can also see by translation invariance of λ that if S were Lebesgue
measurable then we would have
λ(S) = λ(S + q)
for every q. Therefore, by countable additivity we would have
\[ \lambda([0, 1)) = \sum_{q \in \mathbb{Q} \cap [0,1)} \lambda(S + q) = \sum_{q \in \mathbb{Q} \cap [0,1)} \lambda(S). \]
The right hand side is 0 if λ(S) = 0 and ∞ if λ(S) > 0, and in either case this contradicts λ([0, 1)) = 1.

5 Measurable Functions
A big part of measure theory is the study of functions which are compatible with the measure spaces.
We begin with a basic definition which will be satisfied by all the functions we are interested in.
Definition 5.1 (Measurable functions). If (E, E) and (F, F) are two measurable spaces and f is a
function E → F, then we say f is measurable if for every A ∈ F we have f⁻¹(A) ∈ E.
Lemma 5.2. Suppose that 𝒜 ⊆ F is a collection of sets such that σ(𝒜) = F. If f is a function such that for every A ∈ 𝒜
we have f⁻¹(A) ∈ E then f is measurable.
Proof. First we note that
\[ f^{-1}\Big(\bigcup_i A_i\Big) = \bigcup_i f^{-1}(A_i), \]
and
\[ f^{-1}(B \setminus A) = f^{-1}(B) \setminus f^{-1}(A). \]
Now if we consider {A ∈ F : f⁻¹(A) ∈ E} then this is a σ-algebra, as E is a σ-algebra and f⁻¹
preserves set operations. Therefore, {A ∈ F : f⁻¹(A) ∈ E} is a σ-algebra containing 𝒜, and hence
F = σ(𝒜) ⊆ {A ∈ F : f⁻¹(A) ∈ E}, so f is measurable.

Remark 5.3. In particular note that the above lemma means that whenever we have f : E → R and
R is equipped with the Borel σ-algebra, we know that f is measurable if f⁻¹((−∞, b]) is a measurable
set for every b.
Lemma 5.4. If E, F are topological spaces, equipped with their Borel σ-algebras, and we have f : E → F
is a continuous map then f is measurable.
Proof. This is on the exercise sheet, with a solution on the solution sheet.
Lemma 5.5. If (E, E), (F, F) and (G, G) are all measurable spaces and f : E → F and g : F → G are
measurable then so is g ◦ f.
Proof. Take any set A ∈ G; then (g ◦ f)⁻¹(A) = {x ∈ E : g(f(x)) ∈ A}. Let us call B = {y ∈
F : g(y) ∈ A} = g⁻¹(A); then (g ◦ f)⁻¹(A) = {x ∈ E : f(x) ∈ B} = f⁻¹(B). As g is
measurable and A ∈ G, we have B ∈ F. In the same way, as f is measurable and B ∈ F, we have f⁻¹(B) ∈ E.
As f⁻¹(B) = (g ◦ f)⁻¹(A) this shows that (g ◦ f)⁻¹(A) ∈ E for every A ∈ G and hence g ◦ f is
measurable.
Lemma 5.6. Suppose that f : R → R is a monotone function. Then f is measurable with respect to
the Borel σ-algebra.
Proof. Suppose without loss of generality that f is increasing; then f⁻¹((−∞, b]) is ∅, (−∞, ∞),
(−∞, a) or (−∞, a] for some a. All these possibilities are Borel measurable sets.

Lemma 5.7. If fn is a sequence of measurable functions taking values in (R, B(R)) then the following
functions are all measurable:
• −f1,
• λf1 for λ > 0 a fixed constant,
• f1 ∧ f2,
• f1 ∨ f2,
• f1 + f2,
• f1 f2,
• sup_n fn,
• inf_n fn,
• lim sup_n fn,
• lim inf_n fn.
Proof. We only show two results. The rest are on the assignment.
In order to show that any of these functions is measurable we want to look at f⁻¹((−∞, b]) or
a similar set. We have (f1 ∨ f2)⁻¹((−∞, b]) = {x : max{f1(x), f2(x)} ≤ b} = {x : f1(x) ≤ b and f2(x) ≤
b} = {x : f1(x) ≤ b} ∩ {x : f2(x) ≤ b} = f1⁻¹((−∞, b]) ∩ f2⁻¹((−∞, b]). Now since f1 and f2 are
both measurable the sets f1⁻¹((−∞, b]) and f2⁻¹((−∞, b]) are both measurable. We also know that
the intersection of two measurable sets is measurable.
For the sum, (f1 + f2)⁻¹((b, ∞)) = {x : f1(x) + f2(x) > b}. Now if f1(x) > b − f2(x) then there exists a q ∈ Q such
that f1(x) > q > b − f2(x). Let us define the set A = ∪_{q∈Q} ({x : f1(x) > q} ∩ {x : f2(x) > b − q}). Since
f1, f2 are both measurable, A is a countable union of measurable sets and so is measurable. We can also see
that if x ∈ A then f1(x) + f2(x) > b, and our observation shows that in fact A = {x : f1(x) + f2(x) > b}.
Therefore, f1 + f2 is measurable.
Definition 5.8 (Image measure). We can use a measurable function f to define an image measure.
Suppose µ is a measure on (E, E) and f is a measurable function (E, E) → (F, F); then we can define
a new measure ν by saying that
ν(A) = µ(f⁻¹(A)),
for every A ∈ F. We write ν = µ ◦ f⁻¹.
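As a small illustration of the image measure, here is a sketch in Python on a finite set, where every subset is measurable and a measure is determined by its point masses. The helper names `pushforward` and `measure`, and the two-dice example, are illustrative assumptions and not part of the notes.

```python
from fractions import Fraction
from collections import defaultdict

def pushforward(mu, f):
    """Image measure nu = mu o f^{-1} for a measure given by point masses.

    mu: dict mapping points x of E to their mass mu({x}).
    f:  function E -> F (every map is measurable when E is finite and
        carries the power-set sigma-algebra).
    Returns a dict mapping points y of F to nu({y}) = mu(f^{-1}({y})).
    """
    nu = defaultdict(Fraction)
    for x, mass in mu.items():
        nu[f(x)] += mass
    return dict(nu)

def measure(point_masses, A):
    """Measure of a set A under a measure given by point masses."""
    return sum((m for x, m in point_masses.items() if x in A), Fraction(0))

# Uniform measure on the 36 outcomes of two dice, pushed forward by the sum.
mu = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}
total = lambda w: w[0] + w[1]
nu = pushforward(mu, total)

# nu(A) agrees with mu(f^{-1}(A)) for A = {2, 3, 4}.
A = {2, 3, 4}
preimage = {w for w in mu if total(w) in A}
print(measure(nu, A), measure(mu, preimage))
```

Here ν is a measure on the possible sums, and its value on A is computed either directly from ν or by measuring the preimage under µ.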
We can use the notion of image measure to construct further measures from Lebesgue measure.
Lemma 5.9. Suppose g : R → R and that g is non-constant, right-continuous and non-decreasing.
Let us define g(−∞) = lim_{x→−∞} g(x) and g(∞) = lim_{x→∞} g(x) and let us call the interval I :=
(g(−∞), g(∞)) (this might be the whole of R). Define a partial inverse to g, f : I → R, by
\[ f(x) = \inf\{y \in \mathbb{R} : x \le g(y)\}. \]
Then f is left-continuous and non-decreasing and f(x) ≤ y if and only if x ≤ g(y).


Proof. Fix x ∈ I and consider the set Jx = {y ∈ R : x ≤ g(y)}. By definition of I we know that Jx
is non-empty and is not the whole of R (this shows that f is well defined). As g is non-decreasing, if
y ∈ Jx and y′ ≥ y, then y′ ∈ Jx. As g is right-continuous, if yn ∈ Jx and yn ↓ y then y ∈ Jx (noting
the ≤ sign in Jx). Hence Jx = [f(x), ∞), which gives f(x) ≤ y if and only if x ≤ g(y). Using this, if x ≤ x′ then Jx ⊇ Jx′ so f(x) ≤ f(x′). We also have
that if xn ↑ x then Jx = ∩_n Jxn, so f(xn) → f(x).
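To make the partial inverse concrete, the following Python sketch checks the equivalence f(x) ≤ y ⇔ x ≤ g(y) for one hand-picked g with a jump at 0. The particular g, and the closed form for f derived from it, are assumptions chosen for illustration only.

```python
def g(y):
    """A right-continuous, non-decreasing function with a jump at 0."""
    return y if y < 0 else y + 1.0

def f(x):
    """Partial inverse f(x) = inf{y : x <= g(y)}, worked out by hand for this g."""
    if x < 0:
        return x          # g is the identity on (-inf, 0)
    if x <= 1:
        return 0.0        # the jump of g at 0 becomes a flat piece of f
    return x - 1.0        # g(y) = y + 1 on [0, inf)

# Check the equivalence f(x) <= y  <=>  x <= g(y) on a grid of
# quarter-integers, which are exactly representable as floats.
grid = [k / 4 for k in range(-12, 13)]
mismatches = [(x, y) for x in grid for y in grid
              if (f(x) <= y) != (x <= g(y))]
print(len(mismatches))
```

Note how the jump of g turns into an interval on which f is constant, which is the typical behaviour of this kind of inverse.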

Theorem 5.10. Let g be a non-constant, right-continuous and non-decreasing function from R → R.
There exists a unique Radon measure dg on R such that for all a, b ∈ R with a < b
dg((a, b]) = g(b) − g(a).
We call this measure the Lebesgue-Stieltjes measure associated with g. Furthermore, every Radon
measure on R can be represented this way.

Proof. Define I and f as in the lemma above. Then we can construct dg as the image measure of
Lebesgue measure on I. That is to say we can let dg = λ ◦ f⁻¹. If this is the case then
dg((a, b]) = λ({x : f(x) > a, f(x) ≤ b}) = λ((g(a), g(b)]) = g(b) − g(a).

The standard argument for uniqueness of measures (as for that of Lebesgue measure) gives uniqueness
of this measure.
Finally, if ν is a Radon measure on R then we can define a function g by
g(y) = ν((0, y]) for y ≥ 0, g(y) = −ν((y, 0]) for y < 0.

Then ν = dg by uniqueness.

5.1 Random variables and the measure theoretic formulation of probability - in brief
The structure of measure theory allows us to put probability theory on a firm footing.
Definition 5.11. We call a measure space (Ω, F, P) a probability space (and use different letters for the
different bits) if P(Ω) = 1. In this setting we still have that Ω is a set, F is a σ-algebra and P is a measure.
We call P a probability measure.
The set Ω is not really described and we view it as the space of all possible outcomes. We call A ⊂ Ω
an event if A ∈ F, and P(A) the probability of the event happening.
Definition 5.12. A random variable X is a measurable function from a probability space (Ω, F, P) to
another measurable space (E, A).
Under this way of writing things, with B ∈ A, we have P(X ∈ B) = P({ω ∈ Ω : X(ω) ∈ B}) =
P(X⁻¹(B)), where X⁻¹(B) ∈ F as X is measurable. We call X⁻¹(B) the event that X ∈ B. When
working with probability people usually suppress the argument ω.
Definition 5.13. The law of a random variable X is the image measure of P under the measurable
function X. I.e. if X : (Ω, F, P) → (E, A) then we define a measure µX on (E, A) by µX(B) = P(X ∈
B).
Remark 5.14. The law of a random variable is an object which allows us to understand both probability
densities and discrete probability distributions in the same way.
If X takes values in R then the distribution function of X is FX(x) = µX((−∞, x]). If X has density
f(x) then µX is equal to the measure given by µX(A) = ∫_A f(x) dx, though we still don't know how to
integrate over complicated sets.

6 Week 4 starts here - Measurable function cont.


6.1 Convergence of measurable functions
Definition 6.1 (Almost everywhere / Almost surely). We use the shorthand almost everywhere (or
almost surely in a probability space) to discuss properties that are true everywhere except on a set of
measure zero.
Definition 6.2 (Convergence almost everywhere). Let (E, E, µ) be a measure space. A sequence
of measurable functions (fn)n≥1, fn : E → F, converges almost everywhere to f if
µ({x ∈ E : fn(x) ↛ f(x)}) = 0.

Definition 6.3 (Convergence in measure). Let (E, E, µ) be a measure space. A sequence of real
valued measurable functions (fn)n≥1, fn : E → R, converges in measure to f if for every ε > 0
µ({x : |f(x) − fn(x)| > ε}) → 0, as n → ∞.
Example 6.4. The sequence of functions fn(x) = x^n converges to 0 Lebesgue almost everywhere on
[0, 1], and in measure, but it doesn't converge pointwise to 0 as fn(1) = 1 for every n.
Example 6.5. The sequence of functions fn(x) = 1[n,n+1](x) converges to 0 Lebesgue almost everywhere
(in fact everywhere) but not in measure.
Example 6.6. Consider the sequence of functions f1 = 1[0,1/2), f2 = 1[1/2,1), f3 = 1[0,1/4), f4 = 1[1/4,1/2), f5 =
1[1/2,3/4), f6 = 1[3/4,1), f7 = 1[0,1/8), f8 = 1[1/8,1/4), . . . then fn converges to 0 in measure, but fn(x) does
not converge for any x ∈ [0, 1).
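The sliding-interval sequence of Example 6.6 can be explored with a short Python sketch. The indexing formula in `interval` is our own reading of the pattern f1, f2, f3, . . . listed in the example, and the function names are illustrative.

```python
from fractions import Fraction

def interval(n):
    """Endpoints [a, b) of the support of f_n in Example 6.6 (n >= 1)."""
    j = (n + 1).bit_length() - 1      # level: intervals of length 2^-j
    i = (n + 1) - 2**j                # position within the level
    return Fraction(i, 2**j), Fraction(i + 1, 2**j)

def f(n, x):
    """Value of the indicator function f_n at the point x."""
    a, b = interval(n)
    return 1 if a <= x < b else 0

# Convergence in measure: lambda({f_n > eps}) is the interval length, which tends to 0.
lengths = [interval(n)[1] - interval(n)[0] for n in (1, 3, 7, 15, 31)]
print(lengths)

# No pointwise convergence: at a fixed x the values 0 and 1 both keep recurring,
# because each level of intervals covers x exactly once.
x = Fraction(1, 3)
values = [f(n, x) for n in range(1, 63)]
print(values.count(1), values.count(0))
```

The range 1, . . . , 62 covers exactly the first five levels, so the point x is hit once per level while the interval lengths halve each time.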
We can prove a quasi-equivalence between these two notions of convergence. Before we do this we need
to introduce a very useful lemma, the Borel-Cantelli Lemma. We introduce it here as it is used to
prove the following theorem, but it is a useful tool to have whilst doing measure theory, particularly
probability theory. First let us also introduce some more notation.
Definition 6.7. Let (An)n be a sequence of measurable sets; then we have the following names:
\[ \limsup_n A_n = \bigcap_n \bigcup_{m \ge n} A_m = \{A_n \text{ infinitely often}\}, \]
and
\[ \liminf_n A_n = \bigcup_n \bigcap_{m \ge n} A_m = \{A_n \text{ eventually}\}. \]
The last names are more common when the An are events in a probability space.
Lemma 6.8 (First Borel-Cantelli Lemma). Let (E, E, µ) be a measure space. Then if
Σ_n µ(An) < ∞
it follows that µ(lim sup_n An) = 0.
Proof. For any n we have
\[ \mu(\limsup_n A_n) \le \mu\Big(\bigcup_{m \ge n} A_m\Big) \le \sum_{m \ge n} \mu(A_m). \]
The right hand side goes to zero as n → ∞, so µ(lim sup_n An) = 0.


Lemma 6.9 (Second Borel-Cantelli Lemma). Let (E, E, µ) be a probability space (µ(E) = 1). Then
suppose that µ(Ai ∩ Aj) = µ(Ai)µ(Aj) (the events are pairwise independent) for every i ≠ j and that
Σ_n µ(An) = ∞; then it will follow that µ(lim sup_n An) = 1.
Proof. First we note that µ(Ai^c ∩ Aj^c) = µ((Ai ∪ Aj)^c) = 1 − µ(Ai ∪ Aj) = 1 − µ(Ai) − µ(Aj) + µ(Ai)µ(Aj) =
(1 − µ(Ai))(1 − µ(Aj)). We use the inequality 1 − a ≤ e^{−a}. Let an = µ(An); then, using the analogous
product formula for finite intersections (which needs independence of the whole family rather than just of pairs),
\[ \mu\Big(\bigcap_{m=n}^{N} A_m^c\Big) = \prod_{m=n}^{N} (1 - a_m) \le \exp\Big(-\sum_{m=n}^{N} a_m\Big) \to 0, \quad \text{as } N \to \infty. \]
Therefore, µ(∩_{m≥n} Am^c) = 0 for every n. So µ(lim sup_n An) = 1 − µ(∪_n ∩_{m≥n} Am^c) = 1.

Theorem 6.10. Let (E, E, µ) be a measure space and (fn)n be a sequence of measurable functions.
Then we have the following:
• Suppose that µ(E) < ∞ and that fn → 0 almost everywhere; then fn → 0 in measure.
• If fn → 0 in measure then there exists some subsequence (nk)k such that fnk → 0 almost everywhere.
Proof. Suppose that µ(E) < ∞ and fn → 0 almost everywhere. Then for every ε > 0
\[ \mu(\{x : |f_n(x)| \le \varepsilon\}) \ge \mu\Big(\bigcap_{m \ge n} \{x : |f_m(x)| \le \varepsilon\}\Big) \uparrow \mu(|f_n| \le \varepsilon \text{ eventually}) \ge \mu(\{x : f_n(x) \to 0\}) = \mu(E), \]
therefore,
µ({x : |fn(x)| > ε}) = µ(E) − µ({x : |fn(x)| ≤ ε}) → 0.
Now suppose that fn → 0 in measure. We can find a subsequence nk such that
µ({x : |fnk(x)| > 1/k}) ≤ 2^{−k}.
Therefore
\[ \sum_k \mu(\{x : |f_{n_k}(x)| > 1/k\}) < \infty. \]
Therefore by the first Borel-Cantelli lemma we have that
µ({x : |fnk(x)| > 1/k infinitely often}) = 0.
Therefore fnk → 0 almost everywhere.

6.2 Egoroff's Theorem and Lusin's Theorem

Theorem 6.11 (Egoroff's Theorem). Let (E, E, µ) be a finite measure space and (fn)n be a sequence
of real valued measurable functions on E. If fn converges µ-almost everywhere to f
then for each positive ε there is a set A with µ(A^c) < ε, such that fn converges uniformly on A to f.
Proof. For each n let gn(x) = sup_{j≥n} |fj(x) − f(x)|. Then gn is a positive function which is finite almost
everywhere. The sequence (gn)n converges to 0 almost everywhere and so in measure. Therefore, for
each positive integer k we can find nk such that
µ({x : gnk(x) > 1/k}) < ε 2^{−k}.
Define sets Ak = {x : gnk(x) ≤ 1/k} and let A = ∩_k Ak. The set A has
\[ \mu(A^c) = \mu\Big(\bigcup_k A_k^c\Big) \le \sum_k \mu(A_k^c) < \sum_k \varepsilon 2^{-k} = \varepsilon. \]
We want to show that fn converges uniformly to f on A. For each δ > 0 there exists a k such that 1/k < δ;
then as A ⊆ Ak, we have that for every n ≥ nk,
|fn − f| ≤ gnk ≤ 1/k < δ,
uniformly on all x ∈ Ak and hence on A.
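A standard concrete instance of this statement is fn(x) = x^n on [0, 1] with Lebesgue measure. The Python sketch below assumes this example (it is not taken from the notes) and uses the set A = [0, 1 − ε/2], whose complement has measure ε/2 < ε.

```python
def sup_on_A(n, eps):
    """sup of |x**n - 0| over A = [0, 1 - eps/2]; attained at the right endpoint."""
    return (1 - eps / 2) ** n

eps = 0.02
for n in (10, 100, 1000):
    print(n, sup_on_A(n, eps))

# On [0, 1) itself the supremum of x**n is 1 for every n, so the convergence
# is not uniform there, even though x**n -> 0 at every point of [0, 1).
```

Removing an arbitrarily small piece near 1 is enough to turn pointwise convergence into uniform convergence here.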


This proof motivates the following definition of convergence of functions.

Definition 6.12 (Almost uniform convergence). We say a sequence of functions (fn)n≥1 converges
almost uniformly to f on a measure space (E, E, µ) if for every ε > 0 there exists a set A with µ(A^c) < ε
and fn → f uniformly on A.
We can use Egoroff's theorem to prove a result called Lusin's theorem. First let us recall the
definition of regularity.
Definition 6.13. Let E be a topological space and µ be a measure on (E, B(E)); then we say µ is regular
if for every A ∈ B(E) we have
• µ(A) = inf{µ(U) : A ⊆ U, U is open},
• µ(A) = sup{µ(K) : K ⊆ A, K is compact}.
Theorem 6.14 (Lusin's Theorem). Suppose that f is a measurable function and A ⊆ R^d is a Borel set
with λ(A) < ∞; then for any ε > 0 there is a compact subset K of A with λ(A \ K) < ε such that the
restriction of f to K is continuous.
Remark 6.15. This theorem can be generalised to locally compact Hausdorff spaces, see Cohn's book.
Proof. Suppose first that f only takes countably many values a1, a2, a3, . . . on A; then let Ak = {x ∈
A : f(x) = ak}. By measurability of f we can see that Ak = A ∩ f⁻¹({ak}) is measurable. We know that
A = ∪_n An so by continuity of measure λ(∪_{k=1}^n Ak) ↑ λ(A). Since λ(A) < ∞ we have that for any
ε > 0 there exists n such that λ(A \ ∪_{k=1}^n Ak) < ε/2. By the regularity of Lebesgue measure we can
find compact subsets K1, . . . , Kn with Kk ⊆ Ak such that λ(Ak \ Kk) ≤ ε/(2n). Then let K = ∪_{k=1}^n Kk. This is a
compact subset of A and
\[ \lambda(A \setminus K) \le \lambda\Big(A \setminus \bigcup_{k=1}^n A_k\Big) + \lambda\Big(\bigcup_{k=1}^n A_k \setminus \bigcup_{k=1}^n K_k\Big) < \varepsilon/2 + \varepsilon/2. \]
Now f restricted to K is continuous since the Ki are disjoint and f is constant on each Ki.
Now that we have proved the special case where f takes countably many values we can use this to prove
the theorem for general f. Let fn = 2^{−n} ⌊2^n f⌋; then 2^{−n} ≥ f(x) − fn(x) ≥ 0 so fn(x) → f(x), uniformly.
Now, fn can only take countably many values, so by our special case of Lusin's theorem there exists a
compact Kn ⊆ A such that λ(A \ Kn) < ε 2^{−n}, and fn is continuous on Kn. Now let K∞ = ∩_{n≥1} Kn;
then K∞ is compact and λ(A \ K∞) = λ(∪_n (A \ Kn)) ≤ Σ_{n≥1} λ(A \ Kn) < Σ_{n≥1} ε 2^{−n} = ε. Now we
have that fn converges uniformly to f on K∞ and fn is continuous on K∞ for each n. As the uniform
limit of continuous functions is continuous this shows that f is continuous on K = K∞.

7 Integration
We now get to the definition of the Lebesgue integral, which is the second important object that we
construct in this course. There are several different notations for the integral of a function f with
respect to a measure µ. We have
\[ \mu(f) = \int_E f \, d\mu = \int_E f(x) \, \mu(dx). \]
When you are integrating with respect to Lebesgue measure the most common notation is
\[ \int_E f(x) \, dx. \]
Before we start constructing the integral we'll briefly discuss the motivations for how to construct it.
Firstly, you've already seen the Riemann integral. We can describe the strategy of Riemann integration,
very loosely, as splitting the domain of the function into equal sized chunks, estimating the height of the
function on each chunk then adding them together. Broadly what happens with Lebesgue integration
is that we split the range of the function into equal sized chunks, estimate the size of the part of the
domain which will end up in that chunk of range then sum everything up. We need the theory of
measure in order to do this because the bits of the domain corresponding to chunks of the range can
be quite weird sets whose size it wouldn't otherwise be possible to measure. The first motivation for this is that
whilst Riemann integration only works for functions from subsets of R^d to R, Lebesgue integration
allows the domain of the function to be quite weird (as long as it is a measure space). As an example,
this is helpful for taking expectations rigorously because expectations are integrals of random variables
and the domain of a random variable is a probability space which may not be explicit.
The second big motivation for introducing a new theory of integration is the issue of convergence.
It is important in many practical applications of integration theory to know when lim_n ∫ fn(x) dx =
∫ lim_n fn(x) dx or when ∫_{Ex} ∫_{Ey} f(x, y) dx dy = ∫_{Ey} ∫_{Ex} f(x, y) dy dx. Lebesgue integration allows us to
rigorously find conditions on f under which these statements will be true. This is often not possible
in a satisfactory way with the Riemann theory of integration. We will see some of these convergence
theorems next week and then switching the order of integration towards the end of the course (currently
planned for week 9). The most important motivation for developing good convergence theorems was
the development of Fourier series. We want to know when it is possible to integrate a Fourier series
term by term.
The strategy for constructing the integral is to begin by defining µ(f) when f belongs to a special
class of measurable functions that we call simple functions. We then extend the definition of the integral to progressively
larger classes of functions.
Definition 7.1 (Simple functions). Let (E, E, µ) be a measure space. The simple functions on
this space taking values in R are the functions of the form
\[ f(x) = \sum_{k=1}^{n} a_k 1_{A_k}(x). \]
Here, the Ak are disjoint sets in E, 1A represents the indicator function of the set A, and the ak are
non-negative real numbers. We note that this representation of f is not unique.
Definition 7.2 (The integral of a simple function). Still working in the setting above, let f(x) =
Σ_{k=1}^n ak 1Ak(x); then we can define
\[ \mu(f) = \sum_{k=1}^{n} a_k \mu(A_k). \]

8 Week 5 starts here -Integration cont.


Definition 8.1 (Simple functions). Let (E, E, µ) be a measure space. The simple functions on
this space taking values in R are the functions of the form
\[ f(x) = \sum_{k=1}^{n} a_k 1_{A_k}(x). \]
Here, the Ak are disjoint sets in E, 1A represents the indicator function of the set A, and the ak are
non-negative real numbers. We note that this representation of f is not unique.

Definition 8.2 (The integral of a simple function). Still working in the setting above, let f(x) =
Σ_{k=1}^n ak 1Ak(x); then we can define
\[ \mu(f) = \sum_{k=1}^{n} a_k \mu(A_k). \]

Lemma 8.3. The integral of a simple function is well defined (it doesn't depend on the choice of
representation of the simple function) and satisfies the following properties.
• For α > 0 we have µ(αf) = αµ(f).
• µ(f + g) = µ(f) + µ(g).
• If f(x) ≤ g(x) for every x ∈ E then µ(f) ≤ µ(g).
• f = 0 µ-almost everywhere if and only if µ(f) = 0.
Proof. Let us first look at the well definedness. Without loss of generality let us assume the ak, bj are
strictly positive and the sets Ak are disjoint, and similarly for the Bj. Let us suppose that f =
Σ_{k=1}^n ak 1Ak = Σ_{j=1}^m bj 1Bj, which are both simple function representations. Then we can see that
∪_{k=1}^n Ak = ∪_{j=1}^m Bj and that ak = bj if Ak ∩ Bj ≠ ∅. Using this we can write
\begin{align*}
\mu(f) &= \sum_{k=1}^{n} a_k \mu(A_k) \\
&= \sum_{k=1}^{n} \sum_{j=1}^{m} a_k \mu(A_k \cap B_j) && \text{(as } A_k = \bigcup_{j=1}^{m} (A_k \cap B_j) \text{ since } \bigcup_k A_k = \bigcup_j B_j \text{)} \\
&= \sum_{k=1}^{n} \sum_{j=1}^{m} b_j \mu(A_k \cap B_j) && \text{(as } a_k = b_j \text{ if } A_k \cap B_j \ne \emptyset \text{, so } a_k = b_j \text{ or } \mu(A_k \cap B_j) = 0 \text{)} \\
&= \sum_{j=1}^{m} \sum_{k=1}^{n} b_j \mu(A_k \cap B_j) \\
&= \sum_{j=1}^{m} b_j \mu(B_j).
\end{align*}

Now we move on to the linearity properties. These come naturally from the definition:
\[ \mu(\alpha f) = \sum_{k=1}^{n} \alpha a_k \mu(A_k) = \alpha \sum_{k=1}^{n} a_k \mu(A_k) = \alpha \mu(f). \]

When we are dealing with two simple functions simultaneously it is useful to write them both in
a representation where the measurable sets appearing are the same for both functions. If we let
g = Σ_{j=1}^m cj 1Cj then let us write A0 = E \ ∪_{k=1}^n Ak, a0 = 0 and define C0, c0 similarly; then we can
write f + g as a simple function via
\[ f(x) + g(x) = \sum_{k=0}^{n} \sum_{j=0}^{m} (a_k + c_j) 1_{A_k \cap C_j}(x) \]
and we have
\[ \mu(f + g) = \sum_{k=0}^{n} \sum_{j=0}^{m} (a_k + c_j) \mu(A_k \cap C_j). \]
We note that ∪_{k=0}^n Ak = ∪_{j=0}^m Cj = E, the Ak are mutually disjoint, and the Cj are mutually
disjoint. Therefore
\begin{align*}
\mu(f + g) &= \sum_{k=0}^{n} \sum_{j=0}^{m} (a_k + c_j) \mu(A_k \cap C_j) \\
&= \sum_{k=0}^{n} a_k \sum_{j=0}^{m} \mu(A_k \cap C_j) + \sum_{j=0}^{m} c_j \sum_{k=0}^{n} \mu(A_k \cap C_j) \\
&= \sum_{k=0}^{n} a_k \mu(A_k) + \sum_{j=0}^{m} c_j \mu(C_j) && \text{(as the unions of the } A_k \text{ or } C_j \text{ fill the space)} \\
&= \sum_{k=1}^{n} a_k \mu(A_k) + \sum_{j=1}^{m} c_j \mu(C_j) && \text{(as } a_0 = c_0 = 0 \text{)} \\
&= \mu(f) + \mu(g).
\end{align*}

Now we move on to the monotonicity of the integral. Let us express f and g as before; again the
goal is to represent the two functions using the same measurable sets. We can rewrite f as
\[ f = \sum_{k=0}^{n} \sum_{j=0}^{m} a_k 1_{A_k \cap C_j} = \sum_{k=0}^{n} \sum_{j=0}^{m} a_{k,j} 1_{A_k \cap C_j}, \]
where a_{k,j} = ak if Ak ∩ Cj ≠ ∅ and a_{k,j} = 0 otherwise. Here again we are using the fact that the sets
Cj, j = 0, . . . , m, fill the space. In the same way we can write
\[ g = \sum_{k=0}^{n} \sum_{j=0}^{m} c_{k,j} 1_{A_k \cap C_j}, \]
where c_{k,j} = cj if Ak ∩ Cj ≠ ∅ and c_{k,j} = 0 otherwise. Then if f(x) ≤ g(x) for every x we know that this means a_{k,j} ≤ c_{k,j} for every
k, j. Then by the well definedness of the integral we have
\[ \mu(f) = \sum_{k=0}^{n} \sum_{j=0}^{m} a_{k,j} \mu(A_k \cap C_j) \le \sum_{k=0}^{n} \sum_{j=0}^{m} c_{k,j} \mu(A_k \cap C_j) = \mu(g). \]

Lastly, we look at when µ(f) = 0. First if f = 0 µ-almost everywhere then ak = 0 or µ(Ak) = 0
for each k, therefore µ(f) = 0. Conversely, if µ(f) = 0 then since all the terms ak µ(Ak) ≥ 0 we have that ak = 0
or µ(Ak) = 0 for each k, so f = 0 µ-almost everywhere.
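The well-definedness and additivity arguments above can be checked on a toy example. The following Python sketch works on a six-point set with an arbitrary measure given by point masses; the data structures (lists of `(a_k, A_k)` pairs, a dict of point masses) and the specific numbers are illustrative assumptions.

```python
from fractions import Fraction

def integral(rep, mu):
    """mu(f) = sum_k a_k * mu(A_k) for f = sum_k a_k 1_{A_k}.

    rep: list of (a_k, A_k) with the A_k disjoint frozensets.
    mu:  dict of point masses on a finite set E (illustrative setup).
    """
    return sum((a * sum(mu[x] for x in A) for a, A in rep), Fraction(0))

def evaluate(rep, x):
    """Value f(x) of the simple function with representation rep."""
    return sum(a for a, A in rep if x in A)

E = range(6)
mu = {x: Fraction(x + 1, 10) for x in E}      # an arbitrary finite measure

# Two different representations of the same function f.
rep1 = [(2, frozenset({0, 1, 2})), (5, frozenset({3, 4}))]
rep2 = [(2, frozenset({0})), (2, frozenset({1, 2})),
        (5, frozenset({3})), (5, frozenset({4}))]
same_function = all(evaluate(rep1, x) == evaluate(rep2, x) for x in E)

g = [(1, frozenset({0, 1})), (3, frozenset({2, 3, 4, 5}))]
# f + g on the common refinement A_k ∩ C_j, as in the proof (A_0 = {5}, a_0 = 0).
f_full = rep1 + [(0, frozenset({5}))]
f_plus_g = [(a + c, A & C) for a, A in f_full for c, C in g if A & C]

print(same_function, integral(rep1, mu), integral(rep2, mu))
print(integral(f_plus_g, mu), integral(rep1, mu) + integral(g, mu))
```

Using exact fractions avoids rounding, so the two representations give literally the same number.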
We are now going to extend the definition of the integral to a larger class of functions.
Definition 8.4 (Lebesgue integral for positive functions). Let f be a positive, measurable function;
we define
µ(f) = sup{µ(g) : g is a simple function, g ≤ f}.

Lemma 8.5. The above definition of the Lebesgue integral for positive functions is consistent with the
definition for simple functions. That is to say that if f = Σ_{k=1}^n ak 1Ak where ak ≥ 0 and the Ak are
disjoint measurable sets then
\[ \sum_{k=1}^{n} a_k \mu(A_k) = \sup\{\mu(g) : g \text{ is a simple function}, \ g \le f\}. \]
Proof. This is on the exercise sheet.

Definition 8.6 (Final definition of Lebesgue integral). Suppose that f is a measurable function which
is not necessarily positive. Then we call f µ-integrable or Lebesgue integrable if µ(|f|) < ∞. In this
case we can write f = f+ − f− where f+ = max{f, 0} and f− = max{−f, 0} are both positive and measurable.
We then define the integral of f by
µ(f) = µ(f+) − µ(f−).
Remark 8.7. Notice that we haven't yet proved that these definitions of the integral behave the way we
hope (e.g. are linear, monotone, etc). In order to do this we need to prove some convergence results.

8.1 Convergence theorems for integrals of functions


This is one of the most useful parts of the course. In follow on courses in PDE and probability you will
use these theorems again and again.
Lemma 8.8. If f and g are non-negative, measurable, real valued functions on (E, E, µ) and f ≤ g
then µ(f) ≤ µ(g).
Proof. We recall that by definition µ(f) = sup{µ(h) : h is simple, h ≤ f}. If h is a
simple function with h ≤ f then we also have that h ≤ g. Therefore, we can write
µ(g) = sup{µ(h) : h is simple, h ≤ g}
= max{sup{µ(h) : h simple, h ≤ f}, sup{µ(h) : h simple, h ≤ g, h is not ≤ f}}
= max{µ(f), sup{µ(h) : h simple, h ≤ g, h is not ≤ f}} ≥ µ(f).

Theorem 8.9 (Monotone Convergence Theorem). Let f be a non-negative, measurable real-valued
function and let fn be a sequence of such functions. Then if fn ↑ f we will have that µ(fn) ↑ µ(f).
Proof. We will break this proof down into progressively more complicated cases. First we note that by
monotonicity lim_n µ(fn) ≤ µ(f) and therefore it is sufficient to prove µ(f) ≤ lim_n µ(fn).
First let us look at the case where fn = 1An and f = 1A; then the assumptions imply that
A1 ⊆ A2 ⊆ A3 ⊆ . . . and A = ∪_n An. Then this result is the same as the continuity of a measure as
proved before.
Now let us keep f = 1A and let fn be a sequence of simple functions. Pick ε ∈ (0, 1) arbitrary.
Then let An = {x : fn(x) > 1 − ε}; then we have that An ↑ A, and (1 − ε)1An ≤ fn. Therefore by the
first case we have that lim_n µ(fn) ≥ lim_n µ((1 − ε)1An) = lim_n (1 − ε)µ(1An) = (1 − ε)µ(f). Since ε is
arbitrary this gives the result.
Now we look at the case where both f and fn are simple functions. We can write f = Σ_{k=1}^m ak 1Ak,
where wlog each ak is strictly positive; then we have that ak^{−1} fn 1Ak ↑ 1Ak so we can apply the previous
case to each part of f. Specifically
\[ \mu(f_n) = \mu\Big(\sum_{k=1}^{m} 1_{A_k} f_n\Big) = \sum_{k=1}^{m} a_k \mu(a_k^{-1} 1_{A_k} f_n) \uparrow \sum_{k=1}^{m} a_k \mu(A_k) = \mu(f). \]
Here the first equality follows from the fact that the support of fn must be included in the support of
f since 0 ≤ fn ≤ f.
Our next case is when f is positive and measurable and the fn are all simple. Let us pick g a simple
function with g ≤ f; then gn = fn ∧ g = min{fn, g} is a sequence of simple functions increasing to g.
Therefore, by our previous case µ(gn) ↑ µ(g). Furthermore gn ≤ fn so by monotonicity
µ(g) = lim_n µ(gn) ≤ lim_n µ(fn).

As g is arbitrary this means that µ(f) = sup{µ(g) : g is a simple function, g ≤ f} ≤ lim_n µ(fn).
The last case is the most general, where both fn and f are positive and measurable. In this case we
introduce our favorite kind of approximation (which is very similar to what we used in Lusin's theorem)
gn = (2^{−n} ⌊2^n fn⌋) ∧ n;
then gn is a sequence of simple functions with gn ↑ f and gn ≤ fn; proving this is an exercise on the
assignment. Therefore, by the previous case, we have
µ(f) = lim_n µ(gn) ≤ lim_n µ(fn) ≤ µ(f).
Hence we have the required result.


We have a corollary of this result which can be thought of as another definition of µ(f) for non-negative,
measurable f. This makes concrete our hand-waving description of how Lebesgue integration
works as splitting the range of f up into equal chunks and using this to approximate the area under
the curve. This is often a more practically useful definition of the integral of a positive function.
Corollary 8.10. Let f be a non-negative, measurable, real valued function on (E, E, µ) and define
fn = (2^{−n} ⌊2^n f⌋) ∧ n;
then µ(f) = lim_n µ(fn).
Proof. fn ↑ f so by monotone convergence lim_n µ(fn) = µ(f).
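The dyadic approximation can be computed explicitly in simple cases. The Python sketch below assumes the example f(x) = x^2 on [0, 1] with Lebesgue measure (our choice, not from the notes), for which the measure of each level set {f ≥ t} is known in closed form, and evaluates µ(fn) by summing over chunks of the range.

```python
import math

def mu_fn(n):
    """Integral of f_n = (2^-n floor(2^n f)) ∧ n for f(x) = x^2 on [0, 1].

    Since 0 <= f <= 1 <= n the cap at n is inactive, and
    f_n = sum_{k=1}^{2^n} 2^-n 1_{f >= k 2^-n}.  For Lebesgue measure on [0, 1],
    lambda({x : x^2 >= t}) = 1 - sqrt(t) for t in [0, 1].
    """
    step = 2.0 ** -n
    return sum(step * (1.0 - math.sqrt(k * step)) for k in range(1, 2**n + 1))

# The values increase with n and approach the integral of x^2 over [0, 1], namely 1/3.
approximations = [mu_fn(n) for n in range(1, 13)]
for n, value in enumerate(approximations, start=1):
    print(n, value)
```

Because 0 ≤ f − fn ≤ 2^{−n} on a space of total measure 1, the error in µ(fn) is at most 2^{−n}, which is the rate visible in the printed values.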
We can now use this to prove that the integral of positive measurable functions has the desired
properties.
Proposition 8.11. Suppose f and g are non-negative, real valued, measurable functions on a space
(E, E, µ); then
• For every α > 0, µ(αf) = αµ(f).
• µ(f + g) = µ(f) + µ(g).
• If f ≤ g then µ(f) ≤ µ(g).
• µ(f) = 0 if and only if f is 0 almost everywhere.
Proof. Suppose that fn is a sequence of simple functions with fn ↑ f; then αfn is a sequence of simple
functions with αfn ↑ αf. Monotone convergence tells us that µ(αf) = lim_n µ(αfn). We can use our
previous results for simple functions to get that µ(αfn) = αµ(fn). Therefore µ(αf) = lim_n µ(αfn) =
lim_n αµ(fn) = αµ(f).
For the sum let fn and gn be sequences of simple functions with fn ↑ f and gn ↑ g. Then we
have (fn + gn) ↑ f + g and by monotone convergence and the results for simple functions we have
µ(f + g) = lim_n µ(fn + gn) = lim_n µ(fn) + lim_n µ(gn) = µ(f) + µ(g).
The third point was proved before.
Now µ(f) = 0 if and only if sup{µ(h) : h simple, h ≤ f} = 0, which holds if and only if µ(h) = 0 for
every simple h ≤ f. Now we know that if h is simple, µ(h) = 0 if and only if h is zero almost
everywhere. This tells us that µ(f) = 0 if and only if every simple h ≤ f is zero almost
everywhere. If we look at the set on which f is positive we have {x : f(x) > 0} = ∪_n {x : f(x) > 1/n},
so by continuity of measure if µ(f > 0) > 0 then there exists some n such that µ(f > 1/n) > 0, and then
(1/n)1_{f > 1/n} is a simple function below f which is not zero almost everywhere. Therefore
f is zero almost everywhere if and only if we can't fit any simple function underneath f which is not
zero almost everywhere.

Using this we can give another version of the monotone convergence theorem.
Proposition 8.12 (Beppo-Levi). Suppose that (fn)n≥1 is a sequence of non-negative, real-valued measurable
functions. Then µ(Σ_n fn) = Σ_n µ(fn).
Proof. Write gn = Σ_{k=1}^n fk; then gn ↑ Σ_n fn. We have
\[ \sum_n \mu(f_n) = \lim_n \sum_{k=1}^{n} \mu(f_k), \]
then by linearity
\[ \lim_n \sum_{k=1}^{n} \mu(f_k) = \lim_n \mu\Big(\sum_{k=1}^{n} f_k\Big) = \lim_n \mu(g_n), \]
then using monotone convergence we have
\[ \lim_n \mu(g_n) = \mu\Big(\sum_n f_n\Big). \]

We can also prove that our notion of integral for integrable functions behaves in the way we
expect. First we prove a helpful lemma.
Lemma 8.13. Let f1, f2, g1, g2 all be non-negative, integrable, real valued functions such that f1 − f2 =
g1 − g2; then we have µ(f1) − µ(f2) = µ(g1) − µ(g2).
Proof. We have that f1 + g2 = g1 + f2 so µ(f1 + g2) = µ(g1 + f2). Using linearity we have
µ(f1) + µ(g2) = µ(g1) + µ(f2);
since all the integrals involved are finite we can rearrange this to give
µ(f1) − µ(f2) = µ(g1) − µ(g2).
Proposition 8.14. Suppose that f and g are integrable, real valued functions on (E, E, µ); then
• For every α > 0 we have µ(αf) = αµ(f); we also have µ(−f) = −µ(f).
• The function f + g is integrable and µ(f + g) = µ(f) + µ(g).
• If f ≤ g then µ(f) ≤ µ(g).
Proof. Let us write f = f+ − f− and g = g+ − g− where these are split into the positive and negative
parts of f and g. Then αf = αf+ − αf− so µ(αf) = µ(αf+) − µ(αf−) = α(µ(f+) − µ(f−)) = αµ(f). Similarly
−f = f− − f+ so µ(−f) = µ(f−) − µ(f+) = −µ(f).
For the second point, first we have that |f + g| ≤ |f| + |g| so f + g is integrable. We then need to
use the lemma above. We know that (f+ + g+) − (f− + g−) = (f + g)+ − (f + g)− and all of
(f+ + g+), (f− + g−), (f + g)+, (f + g)− are non-negative and integrable so using the lemma we have
µ(f + g) = µ((f + g)+) − µ((f + g)−) = µ(f+ + g+) − µ(f− + g−) = µ(f+) − µ(f−) + µ(g+) − µ(g−) = µ(f) + µ(g).
For the last point if f ≤ g then g − f is a non-negative measurable function so µ(g − f) ≥ 0
and µ(g) = µ(f + (g − f)). We then use the linearity from the last point to get µ(f + (g − f)) =
µ(f) + µ(g − f) ≥ µ(f).

Theorem 8.15 (Fatou's Lemma). Let fn be a sequence of non-negative measurable functions; then we
have the following result:
\[ \mu\Big(\liminf_n f_n\Big) \le \liminf_n \mu(f_n). \]
Remark 8.16. I always have trouble remembering which way around the inequality goes in this lemma.
A helpful example is if fn = 1[n,n+1) and µ is Lebesgue measure. Then λ(fn) = 1 for every n and
lim inf_n fn = 0. This is also an instructive example for why the limits can fail to be the same. Essentially
here the mass we are trying to integrate escapes to infinity.
Proof. This is essentially a consequence of monotone convergence. Let gn = inf_{k≥n} fk; then gn is a
non-decreasing sequence of measurable functions and gn ≤ fn for each n. By definition of the gn we
also know that lim inf_n fn = lim_n gn. Using monotone convergence we then have
\[ \mu\Big(\liminf_n f_n\Big) = \mu\Big(\lim_n g_n\Big) = \lim_n \mu(g_n). \]
Then using monotonicity we have
µ(gn) ≤ µ(fn)
for each n, so consequently
\[ \lim_n \mu(g_n) = \liminf_n \mu(g_n) \le \liminf_n \mu(f_n). \]
Putting these all together gives the result.
Fatou's lemma is key to proving our next important convergence theorem.
Theorem 8.17 (Dominated convergence theorem). Let fn be a sequence of measurable functions and f another
measurable function such that fn → f almost everywhere. Suppose further that there exists a positive function g
such that |f| ≤ g, |fn| ≤ g for every n and µ(g) < ∞; then lim_n µ(fn) = µ(f). The function g is called
the dominating function.
Proof. Let us first suppose that fn → f and the domination conditions hold everywhere. Then we have
that g + fn is a sequence of non-negative measurable functions whose limit is g + f. Applying Fatou's
lemma gives
\[ \mu(g + f) \le \liminf_n \mu(g + f_n) = \mu(g) + \liminf_n \mu(f_n), \]
subtracting µ(g) from each side (which we can do as it is finite) gives
\[ \mu(f) \le \liminf_n \mu(f_n). \]
Similarly g − fn is a sequence of non-negative measurable functions whose limit is g − f. Applying
Fatou's lemma again gives
\[ \mu(g) - \mu(f) \le \mu(g) + \liminf_n (-\mu(f_n)) = \mu(g) - \limsup_n \mu(f_n). \]
Rearranging this, since all the quantities are finite, gives
\[ \limsup_n \mu(f_n) \le \mu(f). \]
Putting both parts together gives
\[ \mu(f) \le \liminf_n \mu(f_n) \le \limsup_n \mu(f_n) \le \mu(f). \]
Therefore the limit of the sequence µ(fn) exists and is equal to µ(f).
The extension of this result to when the conditions only hold almost everywhere is due to the fact
that the integral of any function is unchanged by modifying that function on a measure zero set. This
type of result will be discussed in more detail when we introduce Lebesgue spaces. It isn't really the
point of this particular theorem, we just give the full version here so we are able to apply it.

9 More integration: Week 7 starts here
The following is a useful criterion for when we can differentiate under the integral sign, which also serves as a good example of how to use the dominated convergence theorem.
Theorem 9.1 (Differentiation under the integral sign). Let $(E, \mathcal{E}, \mu)$ be a measure space and $f : U \times E \to \mathbb{R}$ be a function such that $x \mapsto f(t, x)$ is integrable for every $t$, and $t \mapsto f(t, x)$ is differentiable for every $x$, and suppose further that there exists an integrable function $g(x)$ such that
\[ \left| \frac{\partial f}{\partial t}(t, x) \right| \le g(x), \quad \forall t \in U. \]
Then the function $x \mapsto \partial f(t, x)/\partial t$ is integrable and the function $F(t) = \int_E f(t, x)\,\mu(dx)$ is differentiable with
\[ \frac{dF}{dt} = \int_E \frac{\partial f}{\partial t}(t, x)\,\mu(dx). \]
Notice here we are using a different notation for the integral with respect to $\mu$. We do this because it is helpful to be able to emphasise that we integrate in $x$ but not $t$.
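Proof. Let $\epsilon_n$ be an arbitrary sequence of non-zero numbers which tends towards $0$. Let
\[ g_n(t, x) = \frac{f(t + \epsilon_n, x) - f(t, x)}{\epsilon_n} - \frac{\partial f}{\partial t}(t, x). \]
First we notice that $g_n \to 0$ and $g_n + \partial f/\partial t$ is measurable, so $\partial f/\partial t$ is a limit of measurable functions and hence measurable. By the mean value theorem we have $|g_n| \le 2g$ for each $n$. Therefore, by dominated convergence we have
\[ \int_E g_n(t, x)\,\mu(dx) \to 0. \]
Since the left hand side is equal to $(F(t + \epsilon_n) - F(t))/\epsilon_n - \int_E \partial f/\partial t\,(t, x)\,\mu(dx)$, this gives the required result.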
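As a sanity check, the theorem can be tested numerically. The sketch below is only an illustration under assumptions of my own choosing: $E = [0, 1]$ with Lebesgue measure, $f(t, x) = e^{-t x^2}$ (so that $|\partial f/\partial t| \le x^2 \le 1$ for $t \ge 0$), and a midpoint rule standing in for the integral.

```python
import math

def integrate(func, a, b, n=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def f(t, x):
    return math.exp(-t * x * x)

def df_dt(t, x):
    # Partial derivative of f in t; dominated by g(x) = x^2 for t >= 0.
    return -x * x * math.exp(-t * x * x)

def F(t):
    return integrate(lambda x: f(t, x), 0.0, 1.0)

t, eps = 1.0, 1e-5
# Central difference quotient of F at t ...
difference_quotient = (F(t + eps) - F(t - eps)) / (2 * eps)
# ... compared with the integral of the t-derivative.
integral_of_derivative = integrate(lambda x: df_dt(t, x), 0.0, 1.0)

print(difference_quotient, integral_of_derivative)
```

The two printed numbers should agree to many decimal places.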


We have a couple of useful facts about integration which don't fit into a big section.
Definition 9.2 (Restriction of a Measure). Suppose that $(E, \mathcal{E}, \mu)$ is a measure space and $A \in \mathcal{E}$. Then the set of measurable subsets of $A$ is a $\sigma$-algebra we call $\mathcal{E}_A$, and the restriction of $\mu$ to $\mathcal{E}_A$ is a measure we call $\mu_A$. Furthermore, if $f$ is a measurable function on $E$ then $\mu(f 1_A) = \mu_A(f)$.
Remark 9.3. The last part is actually a lemma whose proof is an exercise.
We can use this definition to make sense of Lebesgue integrals on intervals (for example). If $I = [a, b]$ then $\int_a^b f(x)\,dx = \lambda(f 1_I)$.
We also notice that we can define a measure using a positive function $f$.
Proposition 9.4. Let $(E, \mathcal{E}, \mu)$ be a measure space and let $f$ be a non-negative measurable function. Define $\nu(A) = \mu(f 1_A)$ for each $A \in \mathcal{E}$. Then $\nu$ is a measure on $\mathcal{E}$ and for all non-negative measurable $g$ we have
\[ \nu(g) = \mu(f g). \]

Proof. First we need to show that $\nu$ is indeed a measure. $f 1_\emptyset = 0$ so we have $\nu(\emptyset) = 0$ as required. We also have that $\nu(A) \ge 0$ since $f$ is non-negative. To show countable additivity, we note that if $A$ and $B$ are disjoint then $1_{A \cup B} = 1_A + 1_B$, and furthermore if $A_1, A_2, \dots$ is a sequence of disjoint sets then $1_{\bigcup_n A_n} = \sum_n 1_{A_n}$. With this reformulation $\nu(\bigcup_n A_n) = \mu(f 1_{\bigcup_n A_n}) = \mu(\sum_n f 1_{A_n})$. Using the Beppo-Levi reformulation of monotone convergence we have
\[ \mu\Big(\sum_n f 1_{A_n}\Big) = \sum_n \mu(f 1_{A_n}) = \sum_n \nu(A_n), \]
which is our desired result.
Now we want to show that if $g \ge 0$ then $\nu(g) = \mu(f g)$. Let us begin with the case where $g = 1_A$ for some measurable $A$; then $\nu(g) = \nu(A) = \mu(f 1_A) = \mu(f g)$, so the result follows by definition. Then using the linearity of $\mu$ we can see that if $g$ is a simple function then $\nu(g) = \mu(f g)$. Now suppose that $g$ is not necessarily simple. We can construct (in our standard way) a sequence of simple functions $g_n$ which increase to $g$; then by monotone convergence we have that $\nu(g) = \lim_n \nu(g_n) = \lim_n \mu(f g_n)$. Now $f g_n$ is a sequence of functions which increases to $f g$, so using monotone convergence we have that $\lim_n \mu(f g_n) = \mu(f g)$, and so $\nu(g) = \mu(f g)$.
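On a finite set the identity $\nu(g) = \mu(f g)$ reduces to a statement about weighted sums, which makes it easy to check by hand or by machine. The following is a toy sketch; the three-point space and all the numbers in it are invented for illustration.

```python
# A finite measure space: each point x carries the weight mu({x}).
mu = {"a": 0.5, "b": 1.5, "c": 2.0}
f = {"a": 2.0, "b": 0.0, "c": 3.0}   # non-negative density
g = {"a": 1.0, "b": 4.0, "c": 0.5}   # non-negative function to integrate

def integral(measure, func):
    """Integral of func against a measure given by point weights."""
    return sum(func[x] * measure[x] for x in measure)

# nu(A) = mu(f 1_A); on a singleton {x} this is f(x) * mu({x}).
nu = {x: f[x] * mu[x] for x in mu}

nu_of_g = integral(nu, g)                                  # nu(g)
mu_of_fg = integral(mu, {x: f[x] * g[x] for x in mu})      # mu(f g)

print(nu_of_g, mu_of_fg)
```

Both quantities come out as 4.0 for these weights.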

There are also a few facts about Riemann integration which work in pretty much exactly the same
way for Lebesgue integration. For example the fundamental theorem of calculus holds equally well
in this case. We will see in general that when something is Riemann integrable it is also Lebesgue
integrable which will prove all these in general.
Theorem 9.5 (Fundamental theorem of calculus). Suppose that $f : [a, b] \to \mathbb{R}$ is a continuous function and set $F(t) = \int_a^t f(x)\,dx = \lambda(1_{[a,t]} f)$; then $F$ is differentiable with $F'(t) = f(t)$. Furthermore, let $F : [a, b] \to \mathbb{R}$ have continuous derivative $f$; then $F(b) - F(a) = \int_a^b f(x)\,dx$.

Proof. Given $\epsilon > 0$ there exists $\delta > 0$ such that $|x - t| \le \delta$ implies that $|f(x) - f(t)| \le \epsilon$. Therefore if we take $0 < |h| \le \delta$ then
\[ \left| \frac{1}{h}(F(t + h) - F(t)) - f(t) \right| = \left| \frac{1}{h} \int_t^{t+h} (f(x) - f(t))\,dx \right| \le \frac{1}{|h|} \int_{t \wedge (t+h)}^{t \vee (t+h)} |f(x) - f(t)|\,dx. \]
Now we can use the fact that inside the integral $|x - t| \le \delta$, so we have
\[ \left| \frac{1}{h}(F(t + h) - F(t)) - f(t) \right| \le \frac{\epsilon}{|h|} \int_{t \wedge (t+h)}^{t \vee (t+h)} dx = \epsilon. \]
Therefore $\lim_{h \to 0} (F(t + h) - F(t))/h = f(t)$.
In the other direction, $\frac{d}{dt}\big(F(t) - \int_a^t f(x)\,dx\big) = 0$, so $F(t) - \int_a^t f(x)\,dx$ is constant in $t$ (by the mean value theorem), so $F(t) = F(a) + \int_a^t f(x)\,dx$. This gives us the result.
We can use the fundamental theorem of calculus to prove the standard result about change of variables. This time we can exploit our new machinery.
Proposition 9.6. Let $\phi : [a, b] \to [\phi(a), \phi(b)]$ be continuously differentiable and strictly increasing. Then for all non-negative measurable $g$ on $[\phi(a), \phi(b)]$ we have
\[ \int_{\phi(a)}^{\phi(b)} g(y)\,dy = \int_a^b g(\phi(x))\,\phi'(x)\,dx. \]

Proof. First suppose that $g$ is the indicator function of an interval $(c, d]$; then we want to prove that
\[ \int_{\phi(a)}^{\phi(b)} 1_{(c,d]}(y)\,dy = \int_a^b 1_{(c,d]}(\phi(x))\,\phi'(x)\,dx. \]
Here the left hand side is equal to $\lambda([\phi(a), \phi(b)] \cap (c, d])$ and the right hand side is
\[ \int_{a \vee \phi^{-1}(c)}^{b \wedge \phi^{-1}(d)} \phi'(x)\,dx. \]
Using the fundamental theorem of calculus this is
\[ \phi(b \wedge \phi^{-1}(d)) - \phi(a \vee \phi^{-1}(c)) = \phi(b) \wedge d - \phi(a) \vee c = \lambda([\phi(a), \phi(b)] \cap (c, d]), \]
where here we used the fact that $\phi$ is increasing to commute it with min and max.
Now we have shown our proposition holds when $g$ is the indicator function of a half open interval. By linearity of the integral it will hold when $g$ is the indicator function of a finite disjoint union of half open intervals. Now let $\mathcal{D}$ be the set of all Borel sets $A$ such that $1_A$ satisfies our proposition. As the name suggests we want to show that $\mathcal{D}$ is a d-system. If $A \subseteq B$ and $A, B \in \mathcal{D}$ then $1_{B \setminus A} = 1_B - 1_A$, so the proposition will hold for $B \setminus A$ by linearity of the integral. Suppose that $A_1 \subseteq A_2 \subseteq A_3 \subseteq \dots$ are in $\mathcal{D}$ with union $A$, and let $g_n = 1_{A_n}$. Then $g_n \uparrow 1_A = g$ and $g_n \circ \phi \uparrow g \circ \phi$, so if
\[ \int_{\phi(a)}^{\phi(b)} g_n(y)\,dy = \int_a^b g_n(\phi(x))\,\phi'(x)\,dx \]
for each $n$, applying monotone convergence to each side gives the result for $g = 1_A$. This shows that $\mathcal{D}$ is a d-system. Applying Dynkin's lemma then shows that for every $A \in \mathcal{B}(\mathbb{R})$ the proposition holds with $g = 1_A$.
Linearity of the integral allows us to extend this result to any simple function $g$. We can then use monotone convergence in exactly the same way as for the last part to extend it to any non-negative measurable $g$.
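A quick numerical check of the change of variables formula, as a sketch under assumptions chosen for this example only: $\phi(x) = x^2$ on $[1, 2]$, $g(y) = \sqrt{y}$, and a midpoint rule in place of the Lebesgue integral.

```python
def integrate(func, a, b, n=100000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def phi(x):
    return x * x          # strictly increasing on [1, 2]

def dphi(x):
    return 2.0 * x        # derivative of phi

def g(y):
    return y ** 0.5       # non-negative on [phi(1), phi(2)] = [1, 4]

a, b = 1.0, 2.0
lhs = integrate(g, phi(a), phi(b))                       # integral of g(y) dy
rhs = integrate(lambda x: g(phi(x)) * dphi(x), a, b)     # integral of g(phi(x)) phi'(x) dx

print(lhs, rhs)
```

Both sides equal $14/3$ exactly, and the two approximations agree with it up to the quadrature error.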

9.1 Agreement with Riemann Integral


We now turn our attention to the case where $\mu$ is Lebesgue measure, $\lambda$. We want to show that our new definition of the integral will agree with the Riemann integral when they are both defined. Let us first recall the definitions associated with the Riemann integral.
Definition 9.7. Let $[a, b]$ be an interval in $\mathbb{R}$. A finite sequence of real numbers $\{a_k\}_{k=0}^n$ is called a partition of the interval if $a = a_0 < a_1 < \dots < a_n = b$. I usually denote a partition with a lower case $p$ or $q$. You also often see the notation $P$.

Definition 9.8. If we have two partitions $p = \{a_k\}_{k=0}^n$ and $q = \{b_j\}_{j=0}^m$ then we say $q$ is a refinement of $p$ if every element of $p$ appears in $q$.
Definition 9.9. We call a sequence of partitions $(p_n)_{n \ge 1}$ nested if for every $n$ we have that $p_{n+1}$ is a refinement of $p_n$.
Definition 9.10. If we have a partition $p = \{a_k\}_{k=0}^n$ and a bounded function $f$ then we can define
\[ m_k = \inf\{f(x) : x \in [a_{k-1}, a_k]\} \quad \text{and} \quad M_k = \sup\{f(x) : x \in [a_{k-1}, a_k]\}. \]
Then we have the upper sum and lower sum associated to the partition, which are defined as
\[ l(f, p) = \sum_{k=1}^n m_k (a_k - a_{k-1}), \qquad u(f, p) = \sum_{k=1}^n M_k (a_k - a_{k-1}). \]

Remark 9.11. We can check that if $q$ is a refinement of $p$ then
\[ l(f, p) \le l(f, q) \le u(f, q) \le u(f, p). \]

Definition 9.12. We call a function $f$ Riemann integrable on $[a, b]$ if $\sup_p l(f, p) = \inf_q u(f, q)$.

Lemma 9.13. A function $f$ is Riemann integrable if and only if for every $\epsilon > 0$ there exists a partition $p$ such that
\[ u(f, p) - l(f, p) < \epsilon. \]

Proof. First suppose that for every $\epsilon > 0$ there exists a $p$ such that $u(f, p) - l(f, p) < \epsilon$; then
\[ \inf_q u(f, q) - \sup_q l(f, q) \le u(f, p) - l(f, p) < \epsilon. \]
Since $\epsilon$ is arbitrary this implies
\[ \inf_q u(f, q) = \sup_q l(f, q) \]
and therefore $f$ is Riemann integrable.
Conversely, suppose that $f$ is Riemann integrable; then we know that $\inf_q u(f, q) = \sup_q l(f, q) = \int f\,dx$. Therefore, given $\epsilon > 0$ there exist $p_1$ and $p_2$ so that $u(f, p_1) < \int f\,dx + \epsilon/2$ and $l(f, p_2) > \int f\,dx - \epsilon/2$. Now let $p = p_1 \cup p_2$, that is to say the partition made up of all the points in both $p_1$ and $p_2$. In this case $p$ is a refinement of both $p_1$ and $p_2$, so we have
\[ \int f\,dx - \epsilon/2 < l(f, p_2) \le l(f, p) \le u(f, p) \le u(f, p_1) < \int f\,dx + \epsilon/2, \]
so
\[ u(f, p) - l(f, p) < \epsilon. \]
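To see the criterion of the lemma in action, here is a small sketch that computes lower and upper sums. It assumes a non-decreasing integrand, so that the infimum and supremum on each subinterval sit at the endpoints; the function $x^2$ on $[0, 1]$ and the uniform partitions are choices made only for this illustration.

```python
def lower_upper_sums(f, partition):
    """Lower and upper sums l(f, p), u(f, p) for a NON-DECREASING f.

    For such f the infimum and supremum over [a_{k-1}, a_k] are attained
    at the left and right endpoints respectively.
    """
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(left) * (right - left) for left, right in pairs)
    upper = sum(f(right) * (right - left) for left, right in pairs)
    return lower, upper

def uniform_partition(a, b, n):
    return [a + (b - a) * k / n for k in range(n + 1)]

def square(x):
    return x * x

results = {}
for n in (10, 100, 1000):
    results[n] = lower_upper_sums(square, uniform_partition(0.0, 1.0, n))
    lower, upper = results[n]
    print(n, lower, upper, upper - lower)
```

For this example the gap $u(f, p) - l(f, p)$ is exactly $1/n$, so it can be made smaller than any $\epsilon$, and both sums squeeze the value $1/3$.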

We need a final lemma before we prove our theorem.

Lemma 9.14. Suppose that $f : \mathbb{R} \to \mathbb{R}$ is Lebesgue measurable and $g = f$ Lebesgue almost everywhere. Then $g$ is also Lebesgue measurable.
Proof. Take any Borel set $B$ and look at $g^{-1}(B)$. We have that
\[ g^{-1}(B) = \big(f^{-1}(B) \cup \{x : g(x) \in B, f(x) \notin B\}\big) \setminus \{x : g(x) \notin B, f(x) \in B\}. \]
Now write $N = \{x : f(x) \ne g(x)\}$; by definition $\lambda(N) = 0$. We also have that $\{x : g(x) \in B, f(x) \notin B\} \subseteq N$ and $\{x : g(x) \notin B, f(x) \in B\} \subseteq N$, so they are both null sets and therefore Lebesgue measurable. As $f$ is Lebesgue measurable, $f^{-1}(B)$ is Lebesgue measurable. Since the collection $\mathcal{M}$ of Lebesgue measurable sets is a $\sigma$-algebra, this implies that $g^{-1}(B)$ is Lebesgue measurable. Therefore $g$ is Lebesgue measurable.

Theorem 9.15. Let $[a, b]$ be an interval. Suppose that $f$ is a bounded function which is Riemann integrable on $[a, b]$. Then it is Lebesgue integrable and the Riemann integral agrees with the Lebesgue integral.
Proof. As $f$ is bounded we only need to show that it is Lebesgue measurable in order for it to be integrable. Using the lemma above there exists a nested sequence of partitions $p_n$ such that $u(f, p_n) - l(f, p_n) < 1/n$ for each $n$. Let us define two sequences of functions $g_n$ and $h_n$. We write $p_n = \{a^n_k\}_{k=0}^{N_n}$, and recall the definition of $m^n_k$ and $M^n_k$ associated to the partition. Then we define
\[ g_n := \sum_{k=1}^{N_n} m^n_k 1_{[a^n_{k-1}, a^n_k)}, \qquad h_n := \sum_{k=1}^{N_n} M^n_k 1_{[a^n_{k-1}, a^n_k)}. \]
Here we can see that $g_n$ and $h_n$ are both sequences of simple functions. As the partitions are nested, $g_n$ is a monotonically increasing sequence and $h_n$ is a monotonically decreasing sequence. As $f$ is bounded so are the sequences $g_n(x)$ and $h_n(x)$ for each $x$, so we define $g(x) = \lim_n g_n(x)$ and $h(x) = \lim_n h_n(x)$; these are both bounded, Borel measurable functions. We also have that $g_n(x) \le f(x) \le h_n(x) \le \sup_{[a,b]} f$ for $x \in [a, b)$, so consequently $g(x) \le f(x) \le h(x)$ there. We can see that $\lambda(g_n) = l(f, p_n)$ and $\lambda(h_n) = u(f, p_n)$. We can use the constant $\sup_{[a,b]} |f|$ as a dominating function on $[a, b]$, so we have by dominated convergence that
\[ \lambda(g) = \lim_n \lambda(g_n) = \lim_n l(f, p_n) = \lim_n u(f, p_n) = \lim_n \lambda(h_n) = \lambda(h). \]
We also have that $h - g \ge 0$ and $\lambda(h - g) = 0$, so we know that $h = g$ Lebesgue almost everywhere, and as $0 \le h - f \le h - g$ we know that $f = h = g$ almost everywhere. Therefore $f$ is almost everywhere equal to a measurable function, so by the lemma above it is Lebesgue measurable, and it is bounded so it is Lebesgue integrable. Finally $\lambda(f) = \lambda(g) = \lim_n l(f, p_n)$, which is the Riemann integral of $f$.
We finish this section with an example of a function which is Lebesgue integrable but not Riemann integrable. The most classic example of this is the following.
Example 9.16. Let $f = 1_{\mathbb{Q}}$; then $f$ is Lebesgue integrable but not Riemann integrable on $[0, 1]$ (or any other interval). To see that this function is not Riemann integrable, note that the rationals and the irrationals are both dense in $[0, 1]$, so for any partition $p$ we have $l(f, p) = 0$ and $u(f, p) = 1$. Therefore if we take a sequence of nested partitions $p_n$ then the limits of $l(f, p_n)$ and $u(f, p_n)$ will not meet.

10 Norms and inequalities


Let us remember what a normed space is, with an emphasis on how this fits in with function spaces.
Definition 10.1 (Normed space). A normed space is a vector space $V$ equipped with a norm, $\|\cdot\|$, which should satisfy
• $\|v\| \in \mathbb{R}_+$

• $\|\lambda v\| = |\lambda| \|v\|$

• $\|v + u\| \le \|v\| + \|u\|$

• $\|v\| = 0$ if and only if $v = 0$.

We are interested in normed spaces of functions, where the norms come from integrating quantities.
Definition 10.2 ($L^p(E)$). Suppose that $(E, \mathcal{E}, \mu)$ is a measure space, and $p \ge 1$. Then we have the associated $L^p$ space, which is the space of measurable functions equipped with the norm
\[ \|f\|_p = (\mu(|f|^p))^{1/p}. \]
When we are working on $\Omega \subseteq \mathbb{R}^d$ with Lebesgue measure we often write $L^p(\Omega)$ for the set of functions with $\int_\Omega |f|^p\,dx < \infty$. We then often write the norm as $\|\cdot\|_{L^p(\Omega)}$ if we have not previously specified which space we are working in. If we are working with the measure $\gamma(x)\,dx$ for some positive function $\gamma$ on $\mathbb{R}^d$ then we write $L^p(\gamma)$, $\|\cdot\|_{L^p(\gamma)}$. In each case $L^p(E)$, $L^p(\Omega)$ or $L^p(\gamma)$ denotes the space of measurable functions for which the associated norm is finite.
We also have the supremum norm.
Definition 10.3. Suppose $(E, \mathcal{E}, \mu)$ is a measure space. We have the following norm on measurable functions:
\[ \|f\|_\infty = \inf\{c : |f| \le c \text{ almost everywhere}\}. \]
We also call $L^\infty(E)$ the space of all measurable functions on $E$ with $\|f\|_\infty < \infty$.

Remark 10.4. Strictly speaking the norms defined above are seminorms. This is because all these norms will vanish for a function $f$ which is non-zero but is equal to zero almost everywhere. When working in $L^p$ spaces we consider two functions the same if they are equal almost everywhere.
Strictly speaking we no longer consider functions $f$ when we work in $L^p$ spaces; we instead consider equivalence classes of functions with the equivalence relation $f \sim g$ if $f = g$ almost everywhere. When working in this setting we write $\mathcal{L}^p$ for the space of measurable functions equipped with the $p$-seminorm and $L^p$ for the space of equivalence classes of functions equipped with the $p$-norm. Most of the time we won't really think about an element of $L^p$ as an equivalence class, and hopefully it quickly becomes natural to think about functions as defined up to alteration on a null set.
Theorem 10.5. For $p \in [1, \infty]$ the space $L^p(E)$ is indeed a vector space.
Proof. We need to show that $f \in L^p(E)$ implies $\alpha f \in L^p(E)$ for $\alpha \in \mathbb{R}$ (or $\mathbb{C}$), and that $f, g \in L^p(E)$ implies $f + g \in L^p(E)$. Specifically, we need to show that $\|\alpha f\|_p < \infty$ and $\|f + g\|_p < \infty$.
For $p < \infty$, for the first point we can use the linearity of the integral to get
\[ \mu(|\alpha|^p |f|^p) = |\alpha|^p \mu(|f|^p) < \infty. \]
Now, in a slightly more complex way, we have
\[ \int |f(x) + g(x)|^p\,\mu(dx) \le \int (2 \max\{|f(x)|, |g(x)|\})^p\,\mu(dx) \le 2^p \int (|f(x)|^p + |g(x)|^p)\,\mu(dx) = 2^p \big(\|f\|_p^p + \|g\|_p^p\big) < \infty. \]
For $p = \infty$, for the first point it follows immediately from the definition that $\|\alpha f\|_\infty = |\alpha| \|f\|_\infty < \infty$. For the second point, since the union of two null sets is null, we have that $f + g$ is equivalent to a function which is uniformly bounded. Therefore it is clear that $\|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty$.
In order to progress further with normed spaces of functions we need to be able to prove the triangle
inequality for the p-norms. This inequality is called Minkowski's inequality. In the next section we prove
it as well as several other inequalities which are very useful when working with function spaces.

10.1 Inequalities - Week 8 starts here


For our first couple of inequalities let us just look at some useful inequalities between real numbers.
Lemma 10.6 (Young's inequality (watch out, there are at least two things with this name)). Let $x$ and $y$ be two non-negative real numbers and $p \in (1, \infty)$ with $1/p + 1/q = 1$. Then we have
\[ xy \le \frac{x^p}{p} + \frac{y^q}{q}. \]
Proof. We can see the inequality holds when either $x$ or $y$ is zero, so we neglect this case and define $u = x^p$ and $v = y^q$. Therefore we want to show that
\[ u^{1/p} v^{1/q} \le \frac{u}{p} + \frac{v}{q}. \]
As everything is strictly positive we can divide both sides by $v$, and use the relationship between $p$ and $q$, to get
\[ u^{1/p} v^{-1/p} \le \frac{u/v}{p} + \frac{1}{q}. \]
Now let us define $t = u/v$, so our original inequality will be true if we can show
\[ t^{1/p} \le \frac{t}{p} + \frac{1}{q}, \]
or equivalently
\[ \frac{t}{p} + \frac{1}{q} - t^{1/p} \ge 0. \]
We can differentiate this function in $t$ and get
\[ \frac{d}{dt}\left( \frac{t}{p} + \frac{1}{q} - t^{1/p} \right) = \frac{1}{p}\big(1 - t^{-1/q}\big) \]
and differentiate a second time to get
\[ \frac{d^2}{dt^2}\left( \frac{t}{p} + \frac{1}{q} - t^{1/p} \right) = \frac{1}{pq}\, t^{-1-1/q} > 0. \]
So this function achieves a minimum when $t^{-1/q} = 1$, that is when $t = 1$, and it achieves the minimum value $0$. Therefore it is always non-negative and the inequality holds.
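Young's inequality is easy to spot-check on a grid of values. The sketch below is only a numerical illustration (the grid and the exponents are arbitrary choices); it also confirms the equality case $x^p = y^q$, which corresponds to $t = 1$ in the proof.

```python
def young_gap(x, y, p):
    """x^p/p + y^q/q - x*y, where q is the conjugate exponent of p."""
    q = p / (p - 1.0)
    return x ** p / p + y ** q / q - x * y

grid = [0.1 * k for k in range(1, 51)]       # values 0.1, 0.2, ..., 5.0
smallest_gap = min(
    young_gap(x, y, p) for p in (1.5, 2.0, 3.0) for x in grid for y in grid
)

# Equality holds when x^p = y^q, for instance x = y = 1.
gap_at_equality = young_gap(1.0, 1.0, 3.0)

print(smallest_gap, gap_at_equality)
```

The smallest gap on the grid is non-negative up to rounding, and the gap at $x = y = 1$ is zero.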
We also have the following very simple corollary, which is often useful (especially in Analysis of PDE).
Corollary 10.7. Suppose that $x, y$ are positive. Then for every $\eta > 0$ we have
\[ xy \le \frac{\eta^p x^p}{p} + \frac{y^q}{\eta^q q}. \]
Proof. Just write $xy = (\eta x)(y/\eta)$.
Using Young's inequality we can prove an inequality about functions.
Proposition 10.8 (Hölder's Inequality). Suppose that $(E, \mathcal{E}, \mu)$ is a measure space and $f \in L^p(E)$, $g \in L^q(E)$ with $1/p + 1/q = 1$. Then $fg \in L^1(E)$ and we have the following inequality:
\[ \|fg\|_1 \le \|f\|_p \|g\|_q. \]

Proof. First let us look at the case where $f \in L^1(E)$ and $g \in L^\infty(E)$. Without loss of generality let $g$ be bounded everywhere by $\|g\|_\infty$; then we have
\[ |f(x) g(x)| \le |f(x)| \|g\|_\infty \]
and integrating this inequality (using monotonicity) gives the result.


The more complicated case is where $p \in (1, \infty)$. Then we have for each $\eta > 0$ that
\[ |f(x) g(x)| \le \frac{\eta^p |f(x)|^p}{p} + \frac{|g(x)|^q}{\eta^q q}. \]
Integrating this gives
\[ \|fg\|_1 \le \frac{\eta^p}{p} \|f\|_p^p + \frac{1}{\eta^q q} \|g\|_q^q. \]
We can then choose $\eta$ however we want, so we choose it to make the right hand side as small as possible. We can find out how best to choose $\eta$ by differentiating in $\eta$:
\[ \frac{d}{d\eta}\left( \frac{\eta^p}{p} \|f\|_p^p + \frac{1}{\eta^q q} \|g\|_q^q \right) = \eta^{p-1} \|f\|_p^p - \eta^{-q-1} \|g\|_q^q, \]
and
\[ \frac{d^2}{d\eta^2}\left( \frac{\eta^p}{p} \|f\|_p^p + \frac{1}{\eta^q q} \|g\|_q^q \right) = (p - 1)\eta^{p-2} \|f\|_p^p + (q + 1)\eta^{-q-2} \|g\|_q^q > 0. \]
So the right hand side of the inequality is smallest when
\[ \eta^{p-1} \|f\|_p^p = \eta^{-q-1} \|g\|_q^q, \]
which is when
\[ \eta = \|g\|_q^{q/(p+q)} \|f\|_p^{-p/(p+q)}. \]
Substituting this value of $\eta$ in gives
\[ \|fg\|_1 \le \frac{1}{p} \|f\|_p^{p - p^2/(p+q)} \|g\|_q^{pq/(p+q)} + \frac{1}{q} \|f\|_p^{pq/(p+q)} \|g\|_q^{q - q^2/(p+q)} = (\|f\|_p \|g\|_q)^{pq/(p+q)}, \]
and
\[ pq/(p + q) = ((p + q)/pq)^{-1} = (1/q + 1/p)^{-1} = 1. \]

Second proof of Hölder's inequality. This is the more standard proof. Suppose first that $\|f\|_p = 1$, $\|g\|_q = 1$; then using Young's inequality $|f(x) g(x)| \le |f(x)|^p/p + |g(x)|^q/q$. So integrating this gives $\|fg\|_1 \le \|f\|_p^p/p + \|g\|_q^q/q = 1/p + 1/q = 1$. Then we have for general non-zero $f, g$ that $\|f/\|f\|_p\|_p = 1$ and $\|g/\|g\|_q\|_q = 1$, so
\[ \|fg/(\|f\|_p \|g\|_q)\|_1 \le 1, \]
and multiplying through gives
\[ \|fg\|_1 \le \|f\|_p \|g\|_q. \]

Remark 10.9 (Cauchy-Schwarz Inequality). The important case of this inequality when $p = q = 2$ is generally known as the Cauchy-Schwarz inequality!
We also have Minkowski's inequality which, as we discussed, is necessary to make sure $L^p$ is a normed space.
Proposition 10.10 (Minkowski's Inequality). Let $(E, \mathcal{E}, \mu)$ be a measure space and suppose that $f, g$ are in $L^p$. Then
\[ \|f + g\|_p \le \|f\|_p + \|g\|_p. \]
Proof. We have already shown this when $p = \infty$. The case where $p = 1$ is also straightforward. We have
\[ |f(x) + g(x)| \le |f(x)| + |g(x)| \]
and integrating this gives the required inequality.
We now move on to $p \in (1, \infty)$. We choose $q$ so that $1/p + 1/q = 1$ and observe that $|f + g|^{p-1} \in L^q(E)$ as
\[ |f + g|^{q(p-1)} = |f + g|^p. \]
We also have that $|f|, |g| \in L^p(E)$. Therefore, using the triangle inequality and then Hölder's inequality, we have
\[ \|f + g\|_p^p = \mu(|f + g|^p) = \mu(|f + g| |f + g|^{p-1}) \le \mu(|f| |f + g|^{p-1}) + \mu(|g| |f + g|^{p-1}) \le (\|f\|_p + \|g\|_p) \, \||f + g|^{p-1}\|_q = (\|f\|_p + \|g\|_p) \|f + g\|_p^{p/q}. \]
Rearranging this gives
\[ \|f + g\|_p^{p(1 - 1/q)} \le \|f\|_p + \|g\|_p, \]
and we recall that $p(1 - 1/q) = 1$.
Now we move on to some more probabilistically focussed inequalities which do not directly relate to $L^p$ spaces.
Proposition 10.11 (Markov's Inequality / Chebyshev's inequality). Let $(E, \mathcal{E}, \mu)$ be a measure space, $f$ a non-negative measurable function and $\lambda > 0$. Then we have
\[ \mu(\{x : f(x) > \lambda\}) \le \frac{1}{\lambda} \mu(f). \]
Proof. We have the following inequality:
\[ \lambda 1_{\{f(x) > \lambda\}} \le f. \]
We then integrate this and use the monotonicity of the integral to get
\[ \lambda \mu(\{f(x) > \lambda\}) \le \mu(f). \]

Remark 10.12 (Tail estimates). One of the powerful consequences of Markov's inequality is that it allows us to estimate how the function will behave at large values. For example, suppose that $f \in L^p(\mathbb{R})$; then, writing $\lambda$ for Lebesgue measure, we know that for $t > 0$
\[ \lambda(\{x : |f(x)| > t\}) = \lambda(\{x : |f(x)|^p > t^p\}) \le t^{-p} \|f\|_p^p. \]
This is particularly relevant in probability where we are interested in estimating how often extreme events happen and we get inequalities of the form (for a non-negative random variable $X$ and $x > 0$)
\[ \mathbb{P}(X > x) \le x^{-p} \mathbb{E}(X^p). \]

Remark 10.13 (Chernoff bounds). Another common use of Markov's inequality is when we know how $\mu(\exp(\alpha f(x)))$ behaves as we vary $\alpha$. For example, in a probabilistic setting $\mathbb{E}(e^{\alpha X})$ is the moment generating function, which is often known for distributions. We can then use Markov's inequality, for $\alpha > 0$, via
\[ \mu(\{f(x) > t\}) = \mu(\{\exp(\alpha f(x)) > e^{\alpha t}\}) \le \mu(\exp(\alpha f)) e^{-\alpha t}. \]
Since the left hand side does not depend on $\alpha$ one can then optimise over $\alpha$, which will often give a superior bound. An example of this in the probabilistic setting: if $X$ is a normal random variable on $\mathbb{R}$ with mean $0$ and variance $\sigma^2$ then we have
\[ \mathbb{E}\big(e^{\alpha X}\big) = e^{\alpha^2 \sigma^2/2}. \]
This leads to
\[ \mathbb{P}(X > t) \le e^{\alpha^2 \sigma^2/2 - \alpha t}. \]
We can then see that
\[ \alpha^2 \sigma^2/2 - \alpha t = \frac{1}{2}\left( \alpha\sigma - \frac{t}{\sigma} \right)^2 - \frac{t^2}{2\sigma^2}, \]
so for $t > 0$ we can choose $\alpha = t/\sigma^2$ to get
\[ \mathbb{P}(X > t) \le e^{-t^2/2\sigma^2}. \]
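To get a feel for how good the optimised bound is, one can compare it with the exact Gaussian tail. The sketch below uses the standard identity $\mathbb{P}(X > t) = \tfrac12 \operatorname{erfc}(t/(\sigma\sqrt{2}))$ for a centred normal variable; the value of $\sigma$ and the sample points $t$ are arbitrary choices for illustration.

```python
import math

def gaussian_tail(t, sigma):
    """P(X > t) for X ~ N(0, sigma^2), via the complementary error function."""
    return 0.5 * math.erfc(t / (sigma * math.sqrt(2.0)))

def chernoff_bound(t, sigma):
    """The optimised bound exp(-t^2 / (2 sigma^2)), valid for t > 0."""
    return math.exp(-t * t / (2.0 * sigma * sigma))

sigma = 1.5
rows = [(t, gaussian_tail(t, sigma), chernoff_bound(t, sigma))
        for t in (0.5, 1.0, 3.0, 6.0)]

for t, tail, bound in rows:
    print(t, tail, bound)
```

In every row the exact tail sits below the bound, and both decay at the same Gaussian rate in $t$.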
Our last big inequality is Jensen's inequality, which involves convexity. We briefly recall the definition of convexity and prove a useful lemma before moving on to the inequality.
Definition 10.14 (Convexity). Let $I$ be an interval and let $\phi : I \to \mathbb{R}$. We call $\phi$ convex if for every $t \in [0, 1]$ and $x, y \in I$ we have
\[ \phi(tx + (1 - t)y) \le t\phi(x) + (1 - t)\phi(y). \]

Lemma 10.15. Let $\phi : I \to \mathbb{R}$ be convex and let $m$ be a point in the interior of $I$. Then there exist $a, b$ such that $ax + b \le \phi(x)$ for every $x \in I$ and $am + b = \phi(m)$.
Proof. Take $x < m < y$; then by convexity
\[ \phi(m) \le \frac{y - m}{y - x}\phi(x) + \frac{m - x}{y - x}\phi(y). \]
We can rearrange this to
\[ (y - m + m - x)\phi(m) \le (y - m)\phi(x) + (m - x)\phi(y), \]
then to
\[ (y - m)(\phi(m) - \phi(x)) \le (m - x)(\phi(y) - \phi(m)), \]
then to
\[ \frac{\phi(m) - \phi(x)}{m - x} \le \frac{\phi(y) - \phi(m)}{y - m}. \]
This is true for any $x, y$ surrounding $m$, so there exists $a$ such that
\[ \frac{\phi(m) - \phi(x)}{m - x} \le a \le \frac{\phi(y) - \phi(m)}{y - m} \]
for every such $x, y$. From this we get that $\phi(x) \ge a(x - m) + \phi(m)$ for every $x \in I$, so we can take $b = \phi(m) - am$.
Proposition 10.16 (Jensen's inequality). Suppose that $(E, \mathcal{E}, \mu)$ is a measure space with $\mu(E) = 1$, let $\phi$ be a convex function from $\mathbb{R}$ to $\mathbb{R}$ and let $f$ be an integrable function. Then $\mu(\phi(f))$ is well defined and
\[ \mu(\phi(f)) \ge \phi(\mu(f)). \]

Remark 10.17. This is another inequality where I sometimes have trouble remembering which way the inequality sign goes. My key example to check on is
\[ \frac{1}{4} = \left( \int_0^1 x\,dx \right)^2 \le \int_0^1 x^2\,dx = \frac{1}{3}. \]

Proof. As $\mu(E) = 1$ we can consider $\mu(f)$ as the average value that $f$ takes over $E$. Using our lemma with $m = \mu(f)$, there exist $a, b$ such that
\[ ax + b \le \phi(x) \quad \text{for every } x, \]
and
\[ a\mu(f) + b = \phi(\mu(f)). \]
By the monotonicity of the integral
\[ \mu(af + b) \le \mu(\phi(f)) \]
and the left hand side is $a\mu(f) + b\mu(E) = a\mu(f) + b$ by linearity, which by construction is equal to $\phi(\mu(f))$.
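Jensen's inequality can be checked directly on a finite probability space. This is a toy sketch with invented numbers: a three-point space, the convex function $\phi = \exp$, and a helper `integral` defined only for this example.

```python
import math

probs = [0.2, 0.5, 0.3]        # a probability measure on three points
values = [-1.0, 0.5, 2.0]      # the values of f at those points

def integral(func_values):
    """mu(h) for a function h given by its values on the three points."""
    return sum(p * v for p, v in zip(probs, func_values))

mean_f = integral(values)                                  # mu(f)
mean_of_phi = integral([math.exp(v) for v in values])      # mu(phi(f))
phi_of_mean = math.exp(mean_f)                             # phi(mu(f))

print(mean_of_phi, phi_of_mean)
```

The first printed number, $\mu(\phi(f))$, is the larger one, matching the direction of the inequality.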

10.2 Back to $L^p$ spaces
Now we are armed with our inequalities, we want to discuss some properties of $L^p$ spaces. First let us define convergence in $L^p$.
Definition 10.18. We say a sequence of functions $f_n$ converges to another function $f$ in $L^p$ if $\|f_n - f\|_p \to 0$ as $n \to \infty$.
Theorem 10.19 ($L^p(E)$ is complete). This is for the case $p < \infty$. Suppose that $f_n$ is a sequence in $L^p$ with $\|f_n - f_m\|_p \to 0$ as $n, m \to \infty$. Then there exists an $f$ in $L^p$ such that $\|f_n - f\|_p \to 0$ as $n \to \infty$.
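Proof. We can find an increasing sequence $n_k$ recursively such that $\|f_n - f_m\|_p \le 2^{-k}$ whenever $n, m \ge n_k$; in particular $\|f_{n_{k+1}} - f_{n_k}\|_p \le 2^{-k}$. Then we have that
\[ \sum_k \|f_{n_{k+1}} - f_{n_k}\|_p \le 1. \]
Choose $K$ arbitrary; then by Minkowski's inequality we have
\[ \Big\| \sum_{k=1}^K |f_{n_{k+1}} - f_{n_k}| \Big\|_p \le \sum_{k=1}^K \|f_{n_{k+1}} - f_{n_k}\|_p \le 1. \]
By monotone convergence we can let $K \to \infty$ to get
\[ \Big\| \sum_{k=1}^\infty |f_{n_{k+1}} - f_{n_k}| \Big\|_p \le 1. \]
Therefore,
\[ \sum_{k=1}^\infty |f_{n_{k+1}}(x) - f_{n_k}(x)| < \infty \]
almost everywhere. This implies that $f_{n_k}(x)$ is a Cauchy sequence for almost every $x$. Since we know that $\mathbb{R}$ is complete, there exists a set $E'$ with $\mu(E \setminus E') = 0$ such that for every $x \in E'$ the sequence $f_{n_k}(x)$ converges. Define
\[ f(x) = \begin{cases} \lim_k f_{n_k}(x) & x \in E' \\ 0 & x \notin E' \end{cases} \]
Now we have a candidate for our limit, we want to show $f_n \to f$ in $L^p(E)$. Given $\epsilon > 0$ there exists $N$ such that if $n, m \ge N$ then $\|f_n - f_m\|_p \le \epsilon$. Therefore, for $k$ sufficiently large and $n \ge N$ we have $\|f_n - f_{n_k}\|_p \le \epsilon$. Now, using Fatou's lemma,
\[ \|f_n - f\|_p = \|f_n - \lim_k f_{n_k}\|_p \le \liminf_k \|f_n - f_{n_k}\|_p \le \epsilon. \]
Therefore $\|f_n - f\|_p \to 0$, and in particular $f = f_n - (f_n - f)$ is in $L^p$.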
Proposition 10.20. Linear combinations of simple functions, step functions (functions of the form $\phi(x) = \sum_{k=1}^n a_k 1_{(c_k, d_k]}$), and continuous functions are all dense in the space $L^p(\mathbb{R})$, $p \in [1, \infty)$. That is to say, for any $\epsilon > 0$ and any $f$ in $L^p(\mathbb{R})$ there is a function $g$ which is a simple function/step function/continuous function such that $\|f - g\|_p \le \epsilon$.
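Proof. The proof for simple functions and step functions is in the fourth assignment. In order to show that it works for continuous functions we notice that the result is true for step functions, so for any $f \in L^p(\mathbb{R})$ and any $\epsilon > 0$ there exists a step function $\phi$ such that $\|f - \phi\|_p \le \epsilon/2$. If we can find a continuous function $g$ such that $\|\phi - g\|_p \le \epsilon/2$ then by Minkowski's inequality $\|f - g\|_p \le \|f - \phi\|_p + \|\phi - g\|_p \le \epsilon/2 + \epsilon/2$.
Now if we look at the indicator function $1_{(c,d]}(x)$ then let us take
\[ g_{\epsilon,c,d}(x) = \begin{cases} 0 & x \notin (c - \epsilon, d + \epsilon) \\ (x - c + \epsilon)/\epsilon & x \in [c - \epsilon, c) \\ 1 & x \in [c, d) \\ -(x - d - \epsilon)/\epsilon & x \in [d, d + \epsilon) \end{cases} \]
The functions $g_{\epsilon,c,d}$ and $1_{(c,d]}$ differ only on a set of measure at most $2\epsilon$, and there they differ by at most $1$, so $\|g_{\epsilon,c,d} - 1_{(c,d]}\|_p \le (2\epsilon)^{1/p}$. Now let $\phi(x) = \sum_{k=1}^n a_k 1_{(c_k,d_k]}$ with every $a_k \ne 0$, set $\epsilon_k = \frac{1}{2}(\epsilon/(|a_k| n))^p$ and let $g = \sum_{k=1}^n a_k g_{\epsilon_k, c_k, d_k}$. Then
\[ \|\phi - g\|_p \le \sum_{k=1}^n \|a_k (1_{(c_k,d_k]} - g_{\epsilon_k,c_k,d_k})\|_p \le \sum_{k=1}^n |a_k| (2\epsilon_k)^{1/p} = \sum_{k=1}^n |a_k| \frac{\epsilon}{|a_k| n} = \epsilon. \]
Applying this with $\epsilon/2$ in place of $\epsilon$ gives the continuous function we need.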
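The piecewise-linear approximation of an indicator can be examined numerically. The sketch below is an illustration under my own choices ($c = 0$, $d = 1$, $\epsilon = 0.1$, $p = 2$, midpoint quadrature); a direct computation gives the $L^p$ distance between the trapezoid and the indicator as $(2\epsilon/(p+1))^{1/p}$, which is what the code compares against.

```python
def trapezoid(eps, c, d, x):
    """Continuous piecewise-linear approximation of the indicator of (c, d]."""
    if x <= c - eps or x >= d + eps:
        return 0.0
    if x < c:
        return (x - c + eps) / eps      # ramp up on [c - eps, c)
    if x <= d:
        return 1.0
    return -(x - d - eps) / eps         # ramp down on (d, d + eps)

def indicator(c, d, x):
    return 1.0 if c < x <= d else 0.0

def lp_distance(eps, c, d, p, n=20000):
    """Midpoint-rule approximation of ||trapezoid - indicator||_p."""
    a, b = c - eps, d + eps             # the two functions agree outside [a, b]
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += abs(trapezoid(eps, c, d, x) - indicator(c, d, x)) ** p
    return (h * total) ** (1.0 / p)

eps, p = 0.1, 2.0
distance = lp_distance(eps, 0.0, 1.0, p)
exact = (2.0 * eps / (p + 1.0)) ** (1.0 / p)

print(distance, exact)
```

Shrinking `eps` makes the distance tend to zero, which is the content of the density statement for a single indicator.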

11 Product Measures - Week 9 Starts here


In this section we look at taking two measure spaces $(E, \mathcal{E}, \mu)$ and $(F, \mathcal{F}, \nu)$ and defining a $\sigma$-algebra and a measure on the product space $E \times F$. This will give us another way of defining Lebesgue measure on $\mathbb{R}^d$. First we remind ourselves of the definition of the cartesian product.
Definition 11.1 (Cartesian product). If $E$ and $F$ are spaces then the cartesian product $E \times F$ is the space of pairs $(x, y)$ where $x \in E$ and $y \in F$.
Example 11.2. $\mathbb{R} \times \mathbb{R} = \mathbb{R}^2$.
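Now we want to consider the product $\sigma$-algebra.
Definition 11.3. The product $\sigma$-algebra $\mathcal{E} \times \mathcal{F}$ is the $\sigma$-algebra on $E \times F$ which is generated by the collection
\[ \mathcal{A} = \{A \times B : A \in \mathcal{E}, B \in \mathcal{F}\}. \]
That is to say $\mathcal{E} \times \mathcal{F} = \sigma(\mathcal{A})$.
We now take some time to look at the projection maps $\pi_E$ and $\pi_F$.
Definition 11.4. We define two maps $\pi_E : E \times F \to E$ and $\pi_F : E \times F \to F$ by
\[ \pi_E(x, y) = x, \qquad \pi_F(x, y) = y. \]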
Lemma 11.5. The maps $\pi_E$ and $\pi_F$ are both measurable. Furthermore, if $C \in \mathcal{E} \times \mathcal{F}$ then the following sets are measurable:
\[ C_y = \{x \in E : (x, y) \in C\} = \pi_E\big(\pi_F^{-1}(\{y\}) \cap C\big) \in \mathcal{E}, \]
\[ C_x = \{y \in F : (x, y) \in C\} = \pi_F\big(\pi_E^{-1}(\{x\}) \cap C\big) \in \mathcal{F}. \]
Furthermore, if $f : E \times F \to G$ is a measurable function then $f_x : F \to G$ defined by $f_x(y) = f(x, y)$ and $f_y : E \to G$ defined by $f_y(x) = f(x, y)$ are both measurable functions.
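Proof. First let us show that the projection maps are measurable. Let $A$ be in $\mathcal{E}$; then $\pi_E^{-1}(A) = A \times F$, and as $F \in \mathcal{F}$ this is a product set so is in $\mathcal{E} \times \mathcal{F}$. The same argument works for $\pi_F$.
Now let us look at $C_x$. Let $\mathcal{C}$ be the collection of sets $C$ in $\mathcal{E} \times \mathcal{F}$ such that $C_x \in \mathcal{F}$. Then $\mathcal{C}$ contains all the product sets. We now want to show that $\mathcal{C}$ is a $\sigma$-algebra. We have $(C^c)_x = \{y \in F : (x, y) \in C^c\} = \{y \in F : (x, y) \notin C\} = F \setminus \{y \in F : (x, y) \in C\} = (C_x)^c$. Therefore $C \in \mathcal{C}$ implies that $C^c \in \mathcal{C}$. We also have that $(\bigcup_n C_n)_x = \bigcup_n (C_n)_x$. Therefore $\mathcal{C}$ is closed under complements and countable unions so is a $\sigma$-algebra. Therefore $\mathcal{C} \supseteq \mathcal{E} \times \mathcal{F}$. The argument for $C_y$ is the same.
Now we move on to $f_x$. If $A$ is a measurable subset of $G$ then $f_x^{-1}(A) = \{y \in F : f(x, y) \in A\} = (f^{-1}(A))_x$. Using the previous part we know that this is a measurable set. Therefore $f_x$ is measurable, and similarly for $f_y$.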

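Theorem 11.6 (Product Measure). Given two $\sigma$-finite measure spaces $(E, \mathcal{E}, \mu)$ and $(F, \mathcal{F}, \nu)$ there exists a unique measure, $\mu \times \nu$, on $\mathcal{E} \times \mathcal{F}$ such that $(\mu \times \nu)(A \times B) = \mu(A)\nu(B)$ when $A \in \mathcal{E}$ and $B \in \mathcal{F}$. Furthermore
\[ (\mu \times \nu)(C) = \int_E \nu(C_x)\,\mu(dx) = \int_F \mu(C_y)\,\nu(dy). \]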

Proof. Let us begin in the case where both measure spaces are finite. As $\mathcal{A} = \{A \times B : A \in \mathcal{E}, B \in \mathcal{F}\}$ is a $\pi$-system generating $\mathcal{E} \times \mathcal{F}$ we can use Carathéodory's extension theorem to prove the first part of this theorem. However we will work directly, as defining this measure is straightforward and useful for understanding it.
First we check that $x \mapsto \nu(C_x)$ and $y \mapsto \mu(C_y)$ are both measurable functions so the integrals are well defined. Let us begin in the case that $\nu$ is a finite measure. Let $\mathcal{C}$ be the collection of sets $C$ for which the function $x \mapsto \nu(C_x)$ is $\mathcal{E}$ measurable. If $C = A \times B$ then $\nu(C_x) = \nu(B) 1_{x \in A}$ which is measurable. Now we want to show that $\mathcal{C}$ is a d-system. If $C^1 \subset C^2$ are in $\mathcal{C}$ then $\nu((C^2 \setminus C^1)_x) = \nu(C^2_x) - \nu(C^1_x)$ so $C^2 \setminus C^1 \in \mathcal{C}$. Suppose that $C^n$ is an increasing sequence of sets in $\mathcal{C}$; then $\nu((\bigcup_n C^n)_x) = \lim_n \nu(C^n_x)$ so $\bigcup_n C^n$ is in $\mathcal{C}$. Therefore $\mathcal{C}$ is a d-system containing the $\pi$-system $\mathcal{A}$, and consequently (by Dynkin's lemma) contains $\mathcal{E} \times \mathcal{F}$. Now the only reason that we needed $\nu$ to be finite was to ensure that the function $x \mapsto \nu(C_x)$ does not take the value infinity. You can solve this problem by putting a $\sigma$-algebra on $[0, \infty]$, or we can work in the $\sigma$-finite setting and let $\{D_n\}$ be a sequence of disjoint subsets with $\nu(D_n) < \infty$ whose union is the whole of $F$. By the argument above $x \mapsto \nu((C \cap (E \times D_k))_x)$ is always measurable (restricting the space to $D_k$) and $\nu(C_x) = \lim_n \sum_{k=1}^n \nu((C \cap (E \times D_k))_x)$.
Now we move on to the main part of the proof. We can define two different candidates for $(\mu \times \nu)$, namely
\[ (\mu \times \nu)_1(C) = \int_E \nu(C_x)\,\mu(dx), \qquad (\mu \times \nu)_2(C) = \int_F \mu(C_y)\,\nu(dy). \]
Both are measures: countable additivity follows from monotone convergence. We can see that if $C$ is of the form $A \times B$ then
\[ (\mu \times \nu)_1(A \times B) = \int_E \nu(B) 1_{x \in A}\,\mu(dx) = \mu(A)\nu(B) = \int_F \mu(A) 1_{y \in B}\,\nu(dy) = (\mu \times \nu)_2(A \times B). \]
Now we know that $(\mu \times \nu)_1$ and $(\mu \times \nu)_2$ agree on a $\pi$-system generating $\mathcal{E} \times \mathcal{F}$, so Dynkin's uniqueness of extension lemma says that they agree on all of $\mathcal{E} \times \mathcal{F}$.
Now we need to extend to the $\sigma$-finite case. There are increasing sequences $E_n$ and $F_n$ of sets such that $\mu(E_n) < \infty$, $\nu(F_n) < \infty$ for every $n$ and $E = \bigcup_n E_n$, $F = \bigcup_n F_n$. Then we know that $x \mapsto \nu((C \cap (E_n \times F_n))_x)$ is a measurable function of $x$ for every $n$, so letting $n$ tend to infinity we have $\nu(C_x) = \lim_n \nu((C \cap (E_n \times F_n))_x)$, so $x \mapsto \nu(C_x)$ is a limit of measurable functions and so measurable. Therefore in the $\sigma$-finite case we can still define our two candidate measures $(\mu \times \nu)_1$ and $(\mu \times \nu)_2$ and we have that $(\mu \times \nu)_1(C) = \lim_n (\mu \times \nu)_1(C \cap (E_n \times F_n)) = \lim_n (\mu \times \nu)_2(C \cap (E_n \times F_n)) = (\mu \times \nu)_2(C)$. So the two measures are equal.
Now let $(\mu \times \nu)_3$ be any other candidate measure on $\mathcal{E} \times \mathcal{F}$ such that $(\mu \times \nu)_3(A \times B) = \mu(A)\nu(B)$. Dynkin's uniqueness of extension theorem tells us that it must be equal to $(\mu \times \nu)$ when restricted to $E_n \times F_n$ for any $n$. We can then repeat exactly the same argument as above to extend it to any set in $\mathcal{E} \times \mathcal{F}$.
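On finite sets the two formulas for the product measure become two ways of summing over sections, which can be checked directly. The sketch below uses made-up weights and a set $C$ that is deliberately not a product set; the helper names are hypothetical.

```python
mu = {0: 0.5, 1: 1.0, 2: 2.0}                   # weights on E = {0, 1, 2}
nu = {"u": 3.0, "v": 0.25}                      # weights on F = {"u", "v"}
C = {(0, "u"), (1, "u"), (1, "v"), (2, "v")}    # a subset of E x F

def section_at_x(C, x):
    """C_x = {y : (x, y) in C}."""
    return {y for (a, y) in C if a == x}

def section_at_y(C, y):
    """C_y = {x : (x, y) in C}."""
    return {x for (x, b) in C if b == y}

def measure(weights, A):
    return sum(weights[a] for a in A)

# Integrate nu(C_x) against mu, and mu(C_y) against nu.
via_x_sections = sum(measure(nu, section_at_x(C, x)) * mu[x] for x in mu)
via_y_sections = sum(measure(mu, section_at_y(C, y)) * nu[y] for y in nu)
# Direct sum of the product weights over the points of C.
direct = sum(mu[x] * nu[y] for (x, y) in C)

print(via_x_sections, via_y_sections, direct)
```

All three numbers equal 5.25 for these weights.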

One of the key tools we get when using product measure is Fubini's theorem. There are two theorems, one for positive functions and one for integrable functions. The naming gets a bit woolly, but often the theorem for positive functions is called Tonelli's theorem and that for integrable functions is called Fubini's theorem. Sometimes the latter is called the Fubini-Tonelli theorem and sometimes both are called Fubini-Tonelli or Fubini. To play it safe I'm going to call both the Fubini-Tonelli theorem.

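Theorem 11.7 (Fubini-Tonelli theorem for positive functions). Suppose that $(E, \mathcal{E}, \mu)$ and $(F, \mathcal{F}, \nu)$ are $\sigma$-finite measure spaces and $f$ is a non-negative $\mathcal{E} \times \mathcal{F}$ measurable function. Then the functions $x \mapsto \int_F f(x, y)\,\nu(dy)$ and $y \mapsto \int_E f(x, y)\,\mu(dx)$ are both measurable and
\[ (\mu \times \nu)(f) = \int_E \left( \int_F f(x, y)\,\nu(dy) \right) \mu(dx) = \int_F \left( \int_E f(x, y)\,\mu(dx) \right) \nu(dy). \]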

Proof. We build up the proof gradually, beginning with the case where f is the indicator function of a
set C ∈ E × F . In this case the measurability of the integrals in x or y and the form for (µ × ν)(f ) are
given by the construction of the product measure in the previous theorem.
The linearity of the integral then implies that the Fubini-Tonelli theorem holds whenever f is a
non-negative simple function. We can also see that ∫_F f(x, y)ν(dy) will be measurable, as the previous
lemma shows that ∫_F 1_{C_x}(y)ν(dy) is measurable and this is a sum of functions of that form. We then
note that any non-negative measurable function f can be approximated from below by non-negative simple
functions. Let f_n be an increasing sequence of simple functions approximating f. Then
\[
f_n = \sum_{k=1}^{N_n} c_k^n 1_{C_k^n},
\]
where C_k^n ∈ E × F. Then we know that
\[
(\mu \times \nu)(f_n) = \int_E \left( \int_F \sum_{k=1}^{N_n} c_k^n 1_{C_k^n}(x, y)\,\nu(dy) \right) \mu(dx) = \int_E \left( \int_F \sum_{k=1}^{N_n} c_k^n 1_{(C_k^n)_x}(y)\,\nu(dy) \right) \mu(dx).
\]

By monotone convergence, as n → ∞ the left hand side converges to (µ × ν)(f). We can also see that
by monotone convergence
\[
\int_F \sum_{k=1}^{N_n} c_k^n 1_{(C_k^n)_x}(y)\,\nu(dy) \uparrow \int_F f(x, y)\,\nu(dy).
\]

We note that this shows that x ↦ ∫_F f(x, y)ν(dy) is the limit of measurable functions. Consequently, we
use monotone convergence again to get that the right hand side converges to
\[
\int_E \left( \int_F f(x, y)\,\nu(dy) \right) \mu(dx).
\]

This gives the desired conclusion for positive f .
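As a sanity check of the statement, here is a small sketch on finite sets, where the measures are given by non-negative point weights and the integrals become finite sums. The weights `mu`, `nu` and the table `f` are illustrative values chosen for this example, not taken from the notes.

```python
from fractions import Fraction

def iterated_y_then_x(f, mu, nu):
    """Inner integral over y (weights nu), outer integral over x (weights mu)."""
    return sum(mu[x] * sum(f[x][y] * nu[y] for y in range(len(nu)))
               for x in range(len(mu)))

def iterated_x_then_y(f, mu, nu):
    """Inner integral over x (weights mu), outer integral over y (weights nu)."""
    return sum(nu[y] * sum(f[x][y] * mu[x] for x in range(len(mu)))
               for y in range(len(nu)))

def product_integral(f, mu, nu):
    """Integral of f against the product weights mu[x] * nu[y]."""
    return sum(f[x][y] * mu[x] * nu[y]
               for x in range(len(mu)) for y in range(len(nu)))

# Hypothetical example data: E has 2 points, F has 3 points.
mu = [Fraction(1, 2), Fraction(3)]
nu = [Fraction(1), Fraction(2, 3), Fraction(5)]
f = [[Fraction(1), Fraction(0), Fraction(2)],
     [Fraction(4), Fraction(3), Fraction(1, 5)]]

print(product_integral(f, mu, nu),
      iterated_y_then_x(f, mu, nu),
      iterated_x_then_y(f, mu, nu))
```

In the finite case the equality is just the rearrangement of a finite double sum; the theorem says the same interchange remains valid for non-negative measurable functions on σ-finite spaces.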


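Theorem 11.8 (Fubini-Tonelli theorem for integrable functions). Suppose that (E, E, µ) and (F, F, ν)
are σ-finite measure spaces and f is an E × F measurable function which is integrable with respect to
(µ × ν). Then the functions
\[
g(x) = \begin{cases} \int_F f(x, y)\,\nu(dy) & \text{if } \int_F |f(x, y)|\,\nu(dy) < \infty, \\ 0 & \text{if } \int_F |f(x, y)|\,\nu(dy) = \infty, \end{cases}
\]
and
\[
h(y) = \begin{cases} \int_E f(x, y)\,\mu(dx) & \text{if } \int_E |f(x, y)|\,\mu(dx) < \infty, \\ 0 & \text{if } \int_E |f(x, y)|\,\mu(dx) = \infty, \end{cases}
\]
are both measurable and integrable. Furthermore,
\[
(\mu \times \nu)(f) = \int_E \left( \int_F f(x, y)\,\nu(dy) \right) \mu(dx) = \int_F \left( \int_E f(x, y)\,\mu(dx) \right) \nu(dy).
\]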

Proof. Now we turn to the case where f is not necessarily non-negative but is (µ × ν) integrable. By
our result for non-negative functions we know that
\[
(\mu \times \nu)(|f|) = \int_E \left( \int_F |f(x, y)|\,\nu(dy) \right) \mu(dx),
\]

which proves that the function x ↦ ∫_F |f(x, y)|ν(dy) is µ-integrable, and is consequently finite almost
everywhere. Therefore restricting the functions g, h to where they would be finite is not a problem. Let
A be the set on which x ↦ ∫_F |f(x, y)|ν(dy) is finite. Now we write f = f+ − f− in our usual way.
Then by definition
\[
\left( \int_F f(x, y)\,\nu(dy) \right) 1_{x \in A} = \left( \int_F f_+(x, y)\,\nu(dy) - \int_F f_-(x, y)\,\nu(dy) \right) 1_{x \in A}.
\]

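Then using the fact that µ(A^c) = 0, and our result for non-negative functions, we have
\[
\begin{aligned}
(\mu \times \nu)(f) &= (\mu \times \nu)(f_+) - (\mu \times \nu)(f_-) = \int_E \int_F f_+(x, y)\,\nu(dy)\,\mu(dx) - \int_E \int_F f_-(x, y)\,\nu(dy)\,\mu(dx) \\
&= \int_E \left( \int_F f_+(x, y)\,\nu(dy) - \int_F f_-(x, y)\,\nu(dy) \right) 1_{x \in A}\,\mu(dx) \\
&= \int_E \left( \int_F f(x, y)\,\nu(dy) \right) 1_{x \in A}\,\mu(dx) \\
&= \int_E \int_F f(x, y)\,\nu(dy)\,\mu(dx).
\end{aligned}
\]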

11.1 Applications of product measure and Fubini's theorem


This section is a collection of examples and applications of product measure and Fubini's theorem.
Example 11.9. Suppose (E, E, µ) is a measure space. We look at its product with (R, B(R), λ) and suppose
that f : E → R is a non-negative measurable function. Then the set
\[
A = \{(x, y) : 0 \le y \le f(x)\}
\]
is measurable with respect to the product σ-algebra and its measure is the area under the graph of f.
We have that
\[
(\mu \times \lambda)(A) = \mu(\lambda(A_x)) = \mu(f),
\]
and
\[
(\mu \times \lambda)(A) = \lambda(\mu(A_y)) = \int_0^\infty \mu(\{x : f(x) \ge y\})\,dy.
\]
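The identity µ(f) = ∫_0^∞ µ({x : f(x) ≥ y}) dy can be checked by hand when µ is counting measure on a finite set, since y ↦ µ({f ≥ y}) is then a step function that only changes at the values of f. The following is a minimal sketch under that assumption; the function names and the sample values are illustrative only.

```python
from fractions import Fraction

def integral_of_f(values):
    """mu(f) for counting measure: the sum of the values of f."""
    return sum(values, Fraction(0))

def layer_cake(values):
    """Integral over y in (0, infinity) of #{x : f(x) >= y}.

    Between two consecutive distinct values of f the count is constant,
    so the integral is a finite sum of (count * interval length).
    """
    total = Fraction(0)
    prev = Fraction(0)
    for level in sorted(set(values)):
        # For y in (prev, level], {f >= y} equals {f >= level}.
        count = sum(1 for v in values if v >= level)
        total += count * (level - prev)
        prev = level
    return total

values = [Fraction(1, 2), Fraction(3), Fraction(0), Fraction(3), Fraction(7, 4)]
print(integral_of_f(values), layer_cake(values))
```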

Example 11.10 (Convolutions). Suppose that both f and g are in L^1(R). Then for almost every x the
function t ↦ f(x − t)g(t) is also in L^1(R). We have that the function f ∗ g defined by
\[
(f * g)(x) = \begin{cases} \int_{\mathbb{R}} f(x - t) g(t)\,dt & \text{if } t \mapsto f(x - t) g(t) \text{ is Lebesgue integrable}, \\ 0 & \text{otherwise}, \end{cases}
\]
is in L^1 and satisfies ‖f ∗ g‖_1 ≤ ‖f‖_1 ‖g‖_1.
We can prove this using Fubini-Tonelli. First we want to check that t ↦ f(x − t)g(t) is measurable.
Write h(t) = x − t; this is a continuous function (for fixed x) and t ↦ f(x − t) = f(h(t)), so it is the
composition of two measurable functions and hence measurable. We also know that the product of two
measurable functions is measurable, so f(x − t)g(t) is a measurable function of t. Now we want to check
that it is integrable:
\[
\int \left| \int f(x - t) g(t)\,dt \right| dx \le \int \int |f(x - t) g(t)|\,dt\,dx,
\]
as f(x − t)g(t) ≤ |f(x − t)g(t)| and −f(x − t)g(t) ≤ |f(x − t)g(t)|. Now we apply Fubini-Tonelli and
get
\[
\int \int |f(x - t) g(t)|\,dt\,dx = \int \left( \int |f(x - t)|\,dx \right) |g(t)|\,dt = \int \|f\|_1 |g(t)|\,dt = \|f\|_1 \|g\|_1.
\]
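The same inequality holds, with the same proof, for finitely supported sequences on Z with counting measure in place of Lebesgue measure. The sketch below uses that discrete analogue (an assumption made purely so that the example is computable exactly); `convolve` and `l1_norm` are names introduced here.

```python
def convolve(f, g):
    """Discrete convolution (f * g)[n] = sum over i + j = n of f[i] * g[j]."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

def l1_norm(seq):
    """l^1 norm with respect to counting measure."""
    return sum(abs(v) for v in seq)

f = [1, 2, -1]
g = [1, 1]
fg = convolve(f, g)
print(fg, l1_norm(fg), l1_norm(f) * l1_norm(g))
```

When f and g are both non-negative there is no cancellation in the convolution and the inequality becomes an equality, which matches the fact that the only inequality in the proof is the first one.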

We can also show that convolutions of functions are continuous functions using the tools from
measure theory. For this we need to show that shifts are continuous in L^p.
Lemma 11.11. Let 1 ≤ p < ∞ and define the map T_τ : L^p(R) → L^p(R) by (T_τ f)(x) = f(x + τ). Then
for every f ∈ L^p(R)
\[
\lim_{\tau \to 0} \|T_\tau f - f\|_p = 0.
\]

Proof. We want to show that for any ε > 0 there exists τ_* > 0 such that if |τ| ≤ τ_* then ‖T_τ f − f‖_p ≤ ε.
First let us show the result for step functions, that is to say functions of the form
\[
\varphi(x) = \sum_{k=1}^{n} a_k 1_{[c_k, d_k)}(x).
\]
By Minkowski's inequality we have
\[
\|T_\tau \varphi - \varphi\|_p \le \sum_{k=1}^{n} |a_k| \|T_\tau 1_{[c_k, d_k)} - 1_{[c_k, d_k)}\|_p = \sum_{k=1}^{n} |a_k| \lambda\big([c_k - \tau, d_k - \tau) \,\Delta\, [c_k, d_k)\big)^{1/p} \le \sum_{k=1}^{n} |a_k| (2|\tau|)^{1/p}.
\]
Therefore we can make τ small enough so that this is less than ε.


Now let us look at a general f. We know (from Assignment 4) that there is a step function φ such that
‖f − φ‖_p ≤ ε/3. We can also change variables x ↔ x + τ, so
\[
\|T_\tau f - T_\tau \varphi\|_p = \left( \int |f(x + \tau) - \varphi(x + \tau)|^p\,dx \right)^{1/p} = \|f - \varphi\|_p
\]
for any τ. For this φ we can find τ sufficiently small such that ‖T_τ φ − φ‖_p ≤ ε/3. Hence
\[
\|T_\tau f - f\|_p \le \|T_\tau f - T_\tau \varphi\|_p + \|T_\tau \varphi - \varphi\|_p + \|\varphi - f\|_p \le \varepsilon.
\]

Now we go back to convolutions. We can show that if f ∈ L^p(R) and g ∈ L^q(R) with 1/p + 1/q = 1 then
f ∗ g is continuous. We have
\[
|f * g(y) - f * g(x)| = \left| \int_{\mathbb{R}} (f(x - t) - f(y - t)) g(t)\,dt \right| \le \int_{\mathbb{R}} |f(x - t) - f(y - t)| |g(t)|\,dt,
\]
and we can bound this using Hölder's inequality by
\[
\|g\|_q \left( \int |f(x - t) - f(y - t)|^p\,dt \right)^{1/p} = \|g\|_q \left( \int |f(t - x + y) - f(t)|^p\,dt \right)^{1/p} = \|g\|_q \|T_{-x+y} f - f\|_p.
\]
So if |x − y| is small enough then |f ∗ g(x) − f ∗ g(y)| will also be small.

12 Radon-Nikodym Theorem - Week 10 starts here
12.1 Signed measures
We introduce the notion of signed measures which will be useful in the proof of the Radon-Nikodym
theorem.
Definition 12.1 (Finite signed measure). A function µ from a σ-algebra E to R is a finite signed
measure if
• µ(∅) = 0,
• if (A_n)_{n≥1} is a sequence of disjoint sets in E then µ(⋃_n A_n) = Σ_n µ(A_n).

Example 12.2. If (E, E, µ) is a measure space and f ∈ L^1(E) then ν defined by ν(A) = µ(f 1_A) is a
finite signed measure.
We want to show two decomposition theorems which basically allow us to reduce the situation back
to measures. First we need some more definitions and a useful lemma.
Definition 12.3. If (E, E) is a measurable space and ν is a finite signed measure then we call A ∈ E a
positive set if ν(B) ≥ 0 for every B ∈ E with B ⊆ A. The negative sets are defined analogously.
Lemma 12.4. Suppose that ν is a finite signed measure on (E, E) and suppose A ∈ E with ν(A) < 0.
Then there exists a negative set B with B ⊆ A and ν(B) ≤ ν(A).
Proof. We will produce this set B by an iterative process. Define
\[
\delta_1 = \sup\{\nu(C) : C \subseteq A\},
\]
then since ∅ ⊆ A we have that δ_1 ≥ 0. If δ_1 = 0 then A is itself a negative set, so we are done. If not
we can find a set C_1 ⊆ A with
\[
\nu(C_1) \ge \min\{\delta_1 / 2, 1\}.
\]
(We take the minimum here because we don't know that δ_1 is finite.) Now we will define a sequence of
δ_n and C_n by setting
\[
\delta_n = \sup\Big\{\nu(C) : C \subseteq A \setminus \bigcup_{i=1}^{n-1} C_i\Big\}
\]
and C_n a subset of A \ ⋃_{i=1}^{n−1} C_i so that
\[
\nu(C_n) \ge \min\{\delta_n / 2, 1\}.
\]
Now let C_∞ = ⋃_n C_n and B = A \ C_∞. We now need to check that B has the required properties. We have
\[
\nu(A) = \nu(C_\infty) + \nu(B) \ge \nu(B),
\]
as ν(C_∞) ≥ 0 by construction. As ν is a finite signed measure we must have ν(C_∞) < ∞, and as the C_n
are constructed to be disjoint this means we must have lim_n ν(C_n) = 0. Therefore lim_n δ_n = 0. If D ⊆ B
then we must have that ν(D) ≤ δ_n for every n, therefore ν(D) ≤ 0.
Now we are able to state and prove our two decomposition theorems.
Theorem 12.5 (Hahn Decomposition theorem). Let (E, E) be a measurable space and ν a finite signed
measure. Then there exist a positive set P and a negative set N for ν such that E = P ∪ N and
P ∩ N = ∅.

Proof. Let L = inf{ν(A) : A is a negative set for ν}; then L is finite as otherwise we could construct a
set with measure −∞. Then let A_n be a negative set with ν(A_n) ≤ L + 1/n and let N = ⋃_n A_n.
We can check that N is a negative set and that ν(N) = L. If A ⊆ N then let B_n = A_n \ ⋃_{k=1}^{n−1} A_k;
then A = ⋃_n (A ∩ B_n) and A ∩ B_n ⊆ A_n, so ν(A ∩ B_n) ≤ 0 and ν(A) = Σ_n ν(A ∩ B_n) ≤ 0. Now since
N is a negative set ν(N \ A_n) ≤ 0, therefore ν(N) = ν(N \ A_n) + ν(A_n) ≤ ν(A_n) ≤ L + 1/n. This is
true for any n so ν(N) ≤ L, and since L is defined to be the infimum over ν(A) for all negative sets
A, we will have ν(N) ≥ L, therefore ν(N) = L.
Let P = N^c. We want to check that P is a positive set. Suppose there exists a set A ⊆ P with
ν(A) < 0; then by our lemma there exists a negative set B ⊆ A with ν(B) ≤ ν(A) < 0. Then N ∪ B
is a negative set and N and B are disjoint, so ν(N ∪ B) = ν(N) + ν(B) < ν(N), which contradicts the
fact that ν(N) = L = inf{ν(A) : A is a negative set for ν}, so we are done.
Theorem 12.6 (Jordan decomposition theorem). Every finite signed measure is the difference of two
positive measures. Precisely, if (E, E) is a measurable space and ν is a finite signed measure then there
exist measures ν+ and ν− such that for every A ∈ E we have ν(A) = ν+(A) − ν−(A).
Proof. Take some Hahn decomposition (P, N) and let ν+(A) = ν(A ∩ P); as A ∩ P ⊆ P we have
ν(A ∩ P) ≥ 0. Similarly let ν−(A) = −ν(A ∩ N). By additivity of ν we have that ν(A) = ν+(A) − ν−(A).
Countable additivity of ν+ and ν− follows immediately from countable additivity of ν.
Now we notice that if B ⊆ A then
ν(B) = ν+ (B) − ν− (B) ≤ ν+ (B) ≤ ν+ (A)

and ν+ (A) = ν(A ∩ P ) therefore we have that


ν+ (A) = sup{ν(B) : B ⊆ A, B ∈ E}

in the same way


ν− (A) = sup{−ν(B) : B ⊆ A, B ∈ E}.
This shows that the values of ν+ , ν− do not depend on the particular choice of Hahn decomposition.
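On a finite set, where a signed measure is determined by real point weights, both decompositions can be written down directly: P collects the points of non-negative weight and N the rest. The following sketch (with hypothetical weights and helper names) checks the formula ν+(A) = sup{ν(B) : B ⊆ A} by brute force over all subsets.

```python
from fractions import Fraction
from itertools import combinations

def nu(weights, subset):
    """Signed measure of a subset: the sum of its point weights."""
    return sum((weights[x] for x in subset), Fraction(0))

def hahn_decomposition(weights):
    """One Hahn decomposition: points of weight >= 0 go in P, the rest in N."""
    P = frozenset(x for x, w in weights.items() if w >= 0)
    N = frozenset(x for x, w in weights.items() if w < 0)
    return P, N

def jordan_parts(weights, A):
    """Return (nu_plus(A), nu_minus(A)) computed from the Hahn decomposition."""
    P, N = hahn_decomposition(weights)
    return nu(weights, A & P), -nu(weights, A & N)

def subsets(A):
    """All subsets of the finite set A."""
    items = sorted(A)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

# Hypothetical point weights on a four-point space.
weights = {"a": Fraction(2), "b": Fraction(-3), "c": Fraction(1, 2), "d": Fraction(-1, 4)}
A = frozenset({"a", "b", "c"})
plus, minus = jordan_parts(weights, A)
print(plus, minus, max(nu(weights, B) for B in subsets(A)))
```

Points of weight exactly zero could be placed in either P or N, which illustrates why the Hahn decomposition is not unique while ν+ and ν− are.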

12.2 Absolute Continuity


We now move on to the main focus of this section, the Radon-Nikodym theorem. In order to understand
the theorem we need a definition.
Definition 12.7. Let (E, E) be a measurable space and µ and ν be two measures. Then we say that
ν is absolutely continuous with respect to µ, written ν ≪ µ, if for every A ∈ E with µ(A) = 0 we also
have ν(A) = 0.
We can characterise absolute continuity as follows.
Lemma 12.8. Suppose that (E, E) is a measurable space, µ a measure and ν a finite measure. Then
ν ≪ µ if and only if for each ε > 0 there exists a δ > 0 such that µ(A) < δ implies that ν(A) < ε.
Proof. First let us suppose there exists such a δ for each ε. Then if µ(A) = 0 we have that µ(A) < δ
for every δ, so we must have ν(A) < ε for every ε, so ν(A) = 0.
Now let us suppose that ν ≪ µ. We prove the result by contradiction. Suppose there exists an ε > 0
such that for every δ > 0 there exists a set A with µ(A) < δ but ν(A) ≥ ε. Then we can find a sequence
of sets A_k such that µ(A_k) < 2^{−k} but ν(A_k) ≥ ε. By the first Borel-Cantelli lemma we have that
\[
\mu\Big( \bigcap_n \bigcup_{m \ge n} A_m \Big) = 0.
\]

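We also have that ν(⋃_{m≥n} A_m) ≥ ν(A_n) ≥ ε and, as ν is finite,
\[
\nu\Big( \bigcap_n \bigcup_{m \ge n} A_m \Big) = \lim_n \nu\Big( \bigcup_{m \ge n} A_m \Big) \ge \varepsilon.
\]
This gives us a set B = ⋂_n ⋃_{m≥n} A_m with µ(B) = 0 but ν(B) > 0, which contradicts ν ≪ µ.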
Now we can prove the main theorem for this section.
Theorem 12.9 (Radon-Nikodym Theorem). Let (E, E) be a measurable space and let µ, ν be two finite
measures with ν ≪ µ. Then there exists a measurable function g : E → [0, ∞) such that ν(A) = µ(g 1_A)
for every A ∈ E. The function g is unique up to identifying almost everywhere equal functions. We write
g = dν/dµ and call it the Radon-Nikodym derivative of ν with respect to µ.
Proof. Let us define the set F which is the set of all non-negative measurable functions f with
µ(f 1_A) ≤ ν(A) for every A ∈ E. The idea is that F contains a function g which achieves
µ(g) = sup_{f∈F} µ(f).
As a first step we show that f_1 ∨ f_2 = max{f_1, f_2} ∈ F when f_1, f_2 ∈ F. Let us take any A ∈ E
and let A_1 = A ∩ {f_1 ≥ f_2} and A_2 = A ∩ {f_1 < f_2}. Then
\[
\mu((f_1 \vee f_2) 1_A) = \mu((f_1 \vee f_2) 1_{A_1}) + \mu((f_1 \vee f_2) 1_{A_2}) = \mu(f_1 1_{A_1}) + \mu(f_2 1_{A_2}) \le \nu(A_1) + \nu(A_2) = \nu(A).
\]
Therefore f_1 ∨ f_2 ∈ F.
Now take a sequence f_n in F such that µ(f_n) ≥ sup_{f∈F} µ(f) − 1/n; note that the supremum is finite as
µ(f) ≤ ν(E) for every f ∈ F. Then let g_n = f_1 ∨ f_2 ∨ · · · ∨ f_n, so that the sequence of functions g_n
is increasing and µ(g_n) ≥ sup_{f∈F} µ(f) − 1/n. Then as g_n is increasing it has a limit g and the monotone
convergence theorem shows that
\[
\mu(g 1_A) = \lim_n \mu(g_n 1_A) \le \nu(A).
\]
So g ∈ F.
Now we can define another positive measure ν_0(A) = ν(A) − µ(g 1_A). We want to show that ν_0 = 0
and will do this by contradiction. Suppose that there exists A ∈ E such that ν_0(A) > 0. Then by
monotonicity we will have ν_0(E) > 0, and since µ is a finite measure there exists a number ε > 0 such
that ν_0(E) > εµ(E). Now ν_0 − εµ is a finite signed measure. Let (P, N) be a Hahn decomposition for
this signed measure. Then (ν_0 − εµ)(A ∩ P) ≥ 0 so ν_0(A ∩ P) ≥ εµ(A ∩ P). Hence we have
\[
\begin{aligned}
\nu(A) = \mu(g 1_A) + \nu_0(A) &\ge \mu(g 1_A) + \nu_0(A \cap P) \\
&\ge \mu(g 1_A) + \varepsilon \mu(A \cap P) = \mu(1_A (g + \varepsilon 1_P)).
\end{aligned}
\]
We also have that µ(P) > 0, as if µ(P) = 0 then we would have ν_0(P) = 0 as ν_0 ≪ ν ≪ µ, and this
would mean
\[
(\nu_0 - \varepsilon \mu)(E) = (\nu_0 - \varepsilon \mu)(N) \le 0,
\]
which would contradict ν_0(E) > εµ(E). Therefore g + ε1_P belongs to F but µ(g + ε1_P) > µ(g), which
contradicts the fact that g achieves µ(g) = sup_{f∈F} µ(f). Hence ν(A) = µ(g 1_A).
Now we turn to uniqueness. Suppose that we have two positive functions g, h such that ν(A) =
µ(g 1_A) = µ(h 1_A) for every A. Then as ν is finite g and h are integrable, so g − h is integrable and
µ((g − h)1_A) = 0 for every A. As g − h is measurable, {x ∈ E : g − h ≥ 0} is a measurable set,
so µ((g − h)1_{{x∈E : g−h≥0}}) = 0. This shows that (g − h)1_{{x∈E : g−h≥0}} = 0 almost everywhere. In the
same way (g − h)1_{{x∈E : g−h≤0}} = 0 almost everywhere. Therefore g = h µ-almost everywhere.
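On a finite set the Radon-Nikodym derivative is simply a ratio of point masses, with an arbitrary value (here 0) on µ-null points, which is exactly the "unique up to almost everywhere equality" clause. A minimal sketch, assuming measures are given as dictionaries of non-negative point weights; the names `radon_nikodym_derivative` and `integrate` are introduced for this example.

```python
from fractions import Fraction

def radon_nikodym_derivative(mu, nu):
    """Return g with nu({x}) = g(x) * mu({x}) for every point x.

    Raises ValueError when nu charges a mu-null point, i.e. when nu is
    not absolutely continuous with respect to mu.
    """
    for x in mu:
        if mu[x] == 0 and nu[x] != 0:
            raise ValueError("nu is not absolutely continuous with respect to mu")
    # On mu-null points any value works; 0 is an arbitrary choice.
    return {x: (nu[x] / mu[x] if mu[x] != 0 else Fraction(0)) for x in mu}

def integrate(mu, g, A):
    """mu(g 1_A): the integral of g over the set A."""
    return sum((g[x] * mu[x] for x in A), Fraction(0))

# Hypothetical point masses.
mu = {"a": Fraction(1, 2), "b": Fraction(2), "c": Fraction(0)}
nu = {"a": Fraction(3), "b": Fraction(1), "c": Fraction(0)}
g = radon_nikodym_derivative(mu, nu)
print(g, integrate(mu, g, {"a", "b", "c"}))
```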

12.3 Duality in L^p spaces
The goal of this section is to prove that if 1/p + 1/q = 1 then the dual space of L^p(E) is isomorphic to
the space L^q(E). First let us define a dual space.
Definition 12.10. Let V be a Banach space (a complete, normed vector space). Then the dual space
of V is written V′ and is the space of all bounded linear operators from V to R. We recall that we
call an operator K on V bounded if there exists a constant C such that |K(v)| ≤ C‖v‖ for all v ∈ V.
We can define a norm on V′ by ‖K‖ = sup_{‖v‖=1} |K(v)|.
The first thing to note is that if g ∈ L^q(E) then we can define a bounded linear operator on L^p(E)
by K_g(f) = µ(f g). This is bounded by Hölder's inequality: |µ(f g)| ≤ µ(|f g|) = ‖f g‖_1 ≤ ‖f‖_p ‖g‖_q. It
is also linear thanks to the linearity of the integral. Therefore we can produce a map from L^q(E) to
(L^p(E))′ by g ↦ K_g.
Theorem 12.11. Let (E, E, µ) be a finite measure space and p ∈ (1, ∞). The dual space of L^p(E) is
L^q(E) where 1/p + 1/q = 1. Furthermore the map defined by g ↦ K_g is an isometry.
Remark 12.12. This result also holds for arbitrary measure spaces (without the finiteness assumption).
Extending to σ-finite measure spaces is relatively straightforward; extending further to arbitrary measure
spaces is more complicated.
Proof. Remark: This result is similar in spirit to the Riesz representation result that was a
non-examinable topic in week 6.
First we note that the map g ↦ K_g is linear and ‖K_g‖_{(L^p)′} ≤ ‖g‖_q. We want to show that
‖K_g‖ = ‖g‖_q, which in particular implies that the map is injective, and that it is surjective.
First, for the fact that ‖K_g‖ = ‖g‖_q, we look at the function f(x) = sgn(g(x))|g(x)|^{q−1}. Then
µ(|f|^p) = µ(|g|^q) < ∞, since p(q − 1) = q. Therefore we can look at the action of K_g on f and we have
K_g(f) = µ(|g|^q), so we know that ‖K_g‖ ≥ K_g(f)/‖f‖_p = µ(|g|^q)/µ(|g|^q)^{1/p} = µ(|g|^q)^{1−1/p} = ‖g‖_q.
Therefore g ↦ K_g preserves norms.
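The choice f = sgn(g)|g|^{q−1} is exactly the case of equality in Hölder's inequality. This can be checked numerically on a finite measure space with point weights; the sketch below uses floating point arithmetic and hypothetical data, with p = 3 and q = 3/2.

```python
import math

def lp_norm(values, weights, p):
    """L^p norm of a function on a finite set with point weights."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

def pairing(f, g, weights):
    """K_g(f) = mu(f g)."""
    return sum(w * a * b for a, b, w in zip(f, g, weights))

def extremiser(g, q):
    """The function sgn(g) |g|^(q-1) used in the proof."""
    return [math.copysign(abs(v) ** (q - 1), v) for v in g]

p, q = 3.0, 1.5                      # conjugate exponents: 1/p + 1/q = 1
weights = [0.5, 1.0, 2.0]            # hypothetical point masses
g = [1.0, -2.0, 0.5]
f = extremiser(g, q)

ratio = pairing(f, g, weights) / lp_norm(f, weights, p)
print(ratio, lp_norm(g, weights, q))
```

For any other f the ratio |K_g(f)|/‖f‖_p can only be smaller, by Hölder's inequality.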
Now we want to show that this map is surjective; let us begin with the case where µ(E) < ∞.
Let us take K an arbitrary element of (L^p(E))′. In this case 1_A ∈ L^p(E) for every A ∈ E so we can
define a function on E by k(A) = K(1_A). We want to show that k is a signed measure. We have k(∅) =
K(0) = 0. Let A_1, A_2, . . . be a sequence of disjoint measurable sets. Then
\[
1_{\bigcup_{j=1}^n A_j} = \sum_{j=1}^n 1_{A_j},
\]
so
\[
k\Big( \bigcup_{j=1}^n A_j \Big) = K\big( 1_{\bigcup_{j=1}^n A_j} \big) = K\Big( \sum_{j=1}^n 1_{A_j} \Big) = \sum_{j=1}^n K(1_{A_j}) = \sum_{j=1}^n k(A_j).
\]
We also have that ‖1_{⋃_j A_j} − 1_{⋃_{j=1}^n A_j}‖_p → 0 as n → ∞. Therefore, as K is a continuous map on L^p,
we have K(1_{⋃_n A_n}) = Σ_n K(1_{A_n}), so k(⋃_n A_n) = Σ_n k(A_n). Therefore k is indeed a signed measure.
By the Hahn decomposition and the Jordan decomposition we can write k = k+ − k− and there exists
(P, N) a Hahn decomposition with k being positive on P and negative on N.
Next we want to show that k+ ≪ µ and k− ≪ µ. If A ∈ E is such that µ(A) = 0 then µ(A ∩ P) = 0
and µ(A ∩ N) = 0, so K(1_{A∩P}) = K(0) = 0 and K(1_{A∩N}) = K(0) = 0. Therefore k+(A) = 0 and
k−(A) = 0.
Then by the Radon-Nikodym theorem there exist non-negative functions g+ and g− such that
k+(A) = µ(g+ 1_A) and k−(A) = µ(g− 1_A). Now let g = g+ − g−, so that k(A) = µ(g 1_A). We want to show
that g ∈ L^q and that K = K_g. This is complicated.
Let us define E_n by E_n = {x : |g(x)| ≤ n}; then g1_{E_n} is bounded and so in L^q as µ is finite. Then
define a linear functional on L^p by K_n(f) = µ(f g 1_{E_n}) and another by K̃_n(f) = K(f 1_{E_n}). Then if A is
a measurable set we have K_n(1_A) = K̃_n(1_A), and by linearity if h is a simple function then K_n(h) = K̃_n(h).
We showed in Assignment 4 that given a function f ∈ L^p and ε > 0 there exists a simple function h
with ‖f − h‖_p ≤ ε. Then we have that
\[
|K_n(f) - \tilde{K}_n(f)| \le |K_n(f) - K_n(h)| + |\tilde{K}_n(f) - \tilde{K}_n(h)| \le \|g 1_{E_n}\|_q \|f - h\|_p + \|K\| \|f - h\|_p \le \varepsilon (\|g 1_{E_n}\|_q + \|K\|).
\]
Since ε is arbitrary this shows that K_n(f) = K̃_n(f). As K_n = K_{g1_{E_n}}, by our isometry we have
‖K̃_n‖ = ‖g1_{E_n}‖_q. We also have that ‖K̃_n‖ ≤ ‖K‖, since ‖f 1_{E_n}‖_p ≤ ‖f‖_p gives
\[
\|\tilde{K}_n\| = \sup_{\|f\|_p = 1} |\tilde{K}_n(f)| = \sup_{\|f\|_p = 1} |K(f 1_{E_n})| \le \|K\|.
\]
Therefore ‖g1_{E_n}‖_q ≤ ‖K‖. Since ‖g1_{E_n}‖_q^q = ∫ |g|^q 1_{E_n} µ(dx), monotone convergence gives
‖g‖_q = lim_n ‖g1_{E_n}‖_q ≤ ‖K‖. Therefore g ∈ L^q. Then by exactly the same argument with which we
showed K_n = K̃_n we have that K = K_g. This concludes the proof in the finite case.
