MA359 Lecture Notes
Josephine Evans
Contents
1 Introduction
1.1 Integration
1.2 The most important things you will learn in this course
2 σ-algebras, definition of a measure
2.1 Collections of subsets
2.2 Set functions
3 Week 2 starts here - Outer measure, Lebesgue Measure
4 Week 3 starts here - Outer measure and Lebesgue measure cont.
4.1 Properties of Lebesgue measure
5 Measurable Functions
5.1 Random variables and the measure theoretic formulation of probability - in brief
6 Week 4 starts here - Measurable function cont.
6.1 Convergence of measurable functions
6.2 Egoroff's Theorem and Lusin's Theorem
7 Integration
8 Week 5 starts here - Integration cont.
8.1 Convergence theorems for integrals of functions
9 More integration: Week 7 starts here
9.1 Agreement with Riemann Integral
10 Norms and inequalities
10.1 Inequalities - Week 8 starts here
10.2 Back to Lp spaces
11 Product Measures - Week 9 starts here
11.1 Applications of product measure and Fubini's theorem
12 Radon-Nikodym Theorem - Week 10 starts here
12.1 Signed measures
12.2 Absolute Continuity
12.3 Duality in Lp spaces
1 Introduction
Welcome to measure theory. This course introduces the modern theory of functions and integration
which underpins most advanced analysis topics. In particular the theory of function spaces will be
important in PDEs and the notion of measurable functions allows us to rigorously understand random
variables.
The key example we will study is Lebesgue measure in $\mathbb{R}^d$. The goal of defining Lebesgue measure
is to find a way of assigning a length/area/volume (or whatever it is called when d ≥ 4) to a subset of $\mathbb{R}^d$. It turns
out that it is not possible to do this for every possible subset of $\mathbb{R}^d$, but it is possible to do this for
every subset you are likely to come across!
1.1 Integration
One of the most important results of measure theory is the ability to integrate `against' the measures
that we define. We want this new definition of the integral to agree with the Riemann integral on subsets
of $\mathbb{R}^d$ and also allow us to integrate over sets that aren't subsets of $\mathbb{R}^d$, or with different weightings of
the different parts of $\mathbb{R}^d$. This new theory of integration allows us to rigorously define expectation in
probability theory and provides numerous convergence theorems which are some of the results you will
use most from this course.
1.2 The most important things you will learn in this course
For your own knowledge of how measures and functions `really' work:
How Lebesgue measure is constructed.
Why you can work with measures just by looking at how they behave on a π-system (find out
what that is soon!).
The different ways in which functions can converge.
Convergence theorems: i.e. when convergence of functions implies convergence of their integrals.
2 σ-algebras, definition of a measure
2.1 Collections of subsets
We begin with some dry definitions of collections of sets and functions from collections of sets to R.
These will give us the key formal definition of measures.
We begin with the most basic definition. An algebra is a collection of sets closed under finite set
operations.
Definition 2.1 (Algebra). A collection of subsets of a space E, A, is called an algebra if
∅ ∈ A, E ∈ A.
If A ∈ A then Ac ∈ A.
If A, B ∈ A then A ∪ B ∈ A.
If A, B ∈ A then A ∩ B ∈ A.
We next define a σ-algebra. This is the key definition of a collection of sets for measure theory. The
letter σ here denotes countability. A σ-algebra is a collection of subsets of a space E which is closed
under countable set operations.
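Definition 2.2 (σ-algebra). A collection of subsets of a space E, A, is a σ-algebra if
∅ ∈ A, E ∈ A.
If A ∈ A then Ac ∈ A.
If (An)n is a sequence of sets in A then $\bigcup_n A_n \in \mathcal{A}$ and $\bigcap_n A_n \in \mathcal{A}$.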
Lemma 2.3. We can equivalently define a σ-algebra as a collection of sets which contains ∅ and is
closed under taking complements and countable unions.
Proof. Suppose that $\mathcal{E}$ is closed under complements and taking countable unions and contains ∅. Then
it is clear that $E \in \mathcal{E}$, as E is the complement of ∅. We need to show that if (An)n is a sequence in $\mathcal{E}$ then $\bigcap_n A_n \in \mathcal{E}$. We know that
$\bigcap_n A_n = \left(\bigcup_n A_n^c\right)^c$, so this gives our result.
Example 2.4. In this course we really only deal with one `concrete', non-trivial example of a σ-algebra.
This is complicated to introduce and we will discuss it below. However, in order to better understand
the definition we give a few examples of things which are, and are not, σ-algebras.
The main example is the Borel σ-algebra which we will meet in week 2.
Proof. It is straightforward to check that every part of the definition of a σ-algebra holds for the
intersection.
Corollary 2.6. For any collection F of subsets of a space E there is a smallest σ-algebra containing
F. We call this σ(F), or the σ-algebra generated by F.
Proof. There exists at least one σ-algebra containing F, since the set of all subsets of E is a σ-algebra.
Then we can consider the intersection $\bigcap_{\mathcal{A} \in \mathcal{C}} \mathcal{A}$, where $\mathcal{C}$ is the (non-empty) collection of all σ-algebras
which contain F. This intersection is a σ-algebra containing F, and it is contained in every σ-algebra that contains F. We call this resulting σ-algebra the σ-algebra generated by F.
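On a finite space the generated σ-algebra can be computed by brute force, which may help build intuition for Corollary 2.6. The following is only a sketch and not part of the course material: the function name generate_sigma_algebra and the example space are illustrative assumptions, and the closure loop relies on E being finite, so that countable unions reduce to finite ones.

```python
from itertools import combinations


def generate_sigma_algebra(space, generators):
    """Smallest family of subsets of a FINITE space that contains the
    generators and is closed under complements and (finite) unions."""
    space = frozenset(space)
    family = {frozenset(), space} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        # Close under complements.
        for a in current:
            complement = space - a
            if complement not in family:
                family.add(complement)
                changed = True
        # Close under pairwise unions; repeating the loop gives all finite unions.
        for a, b in combinations(current, 2):
            union = a | b
            if union not in family:
                family.add(union)
                changed = True
    return family


# Hypothetical example: E = {1, 2, 3, 4} and F = {{1}, {1, 2}}.
E = {1, 2, 3, 4}
sigma = generate_sigma_algebra(E, [{1}, {1, 2}])
```

Here the generators cannot separate 3 from 4, so the result is the family of all unions of the three blocks {1}, {2} and {3, 4}.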
Example 2.7 (Key example: Borel σ-algebra). If E is a topological space and O the family of open sets
in E , then we write B(E) to be the σ -algebra generated by O. This is called the Borel σ -algebra.
We are most interested in B(Rd ). We have the following result
Lemma 2.8. B(R) is generated by the following sets.
The collection of closed sets in R.
The collection of intervals of the form (−∞, b].
The collection of intervals of the form (a, b].
Proof. Let B1, B2, B3 be the σ-algebras generated by the three collections above. We then want to show
that B(R) ⊇ B1 ⊇ B2 ⊇ B3.
As B(R) contains all the open sets, it also contains all the closed sets (whose complements are open).
Therefore, it also contains B1 .
As B1 contains all the closed sets, and all the intervals (−∞, b] are closed then B1 contains the
σ -algebra generated by these sets, namely B2 .
As B2 contains (−∞, b] and (−∞, a] and is closed under complements, it also contains (−∞, b] and
(a, ∞). As B2 is closed under intersection, this means it also contains (a, b] = (−∞, b] ∩ (a, ∞). This is true for all a < b
so B2 contains all sets of this form. Consequently, it contains B3.
Now we want to show that B(R) ⊆ B3. This will conclude the proof. First we note that we can
make an open interval $(a, b) = \bigcup_n (a, b - 1/n]$, where the union is taken over all $n > (b - a)^{-1}$. Now we
need to show that any open set in R is a countable union of open intervals. Let U be such an open set
and let
$$O = \bigcup_{q \in \mathbb{Q} \cap U} \ \bigcup_{r \in \mathbb{Q},\ r > 0 \text{ s.t. } (q-r,\, q+r) \subseteq U} (q - r, q + r).$$
Then since O is a union of subsets of U, we have O ⊆ U. Suppose that x ∈ U; then there exists some ρ > 0
such that (x − ρ, x + ρ) ⊆ U. There are rationals q, r such that x ∈ (q − r, q + r) ⊆ (x − ρ, x + ρ),
therefore x ∈ O. Consequently U = O, which is a countable union of open intervals.
We have two further definitions of collections of sets which will be useful. These separate the two
parts of the definition of a σ-algebra.
Definition 2.9 (π-system). A collection of subsets of E, A, is a π-system if
∅ ∈ A.
If A, B ∈ A then A ∩ B ∈ A.
Definition 2.10 (d-system). A collection of subsets of E, A, is a d-system if
E ∈ A.
If A, B ∈ A with A ⊂ B then B \ A ∈ A.
If A1 ⊂ A2 ⊂ A3 ⊂ . . . are in A then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$.
Lemma 2.11 (Dynkin's π-system lemma). Let A be a π-system. Then any d-system containing A also
contains the σ-algebra generated by A.
Proof. This is an exercise.
2.2 Set functions
Definition 2.12 (Set function). A set function φ is a function from a family of subsets of a space E,
A, to R ∪ {∞}.
Definition 2.13 (Measure). A measure is a specific type of set function which satisfies certain axioms.
A set function µ defined on a σ-algebra A is a measure if
µ(A) ≥ 0 for every A ∈ A.
µ(∅) = 0.
If A1, A2, A3, . . . are all pairwise disjoint and in A then
$$\mu\left(\bigcup_n A_n\right) = \sum_n \mu(A_n).$$
Definition 2.20 (Measurable space). We call a pair (E, A) of a space and a σ-algebra a measurable
space.
Definition 2.21 (Measure space). We call a triple (E, A, µ) of a space, a σ-algebra and a measure a
measure space.
Definition 2.22 (Finite measure space). We call a measure space (E, A, µ) finite if µ(E) < ∞.
Definition 2.23 (σ-finite measure space). We call a measure space (E, A, µ) σ-finite if there exists
a countable collection E1, E2, · · · ∈ A such that
$$E = \bigcup_n E_n,$$
and
µ(Ei) < ∞, ∀i.
Definition 2.24 (Borel measures and Radon measures). A measure µ on a subset of a topological
space E is called a Borel measure if it is a measure with respect to the Borel σ-algebra.
A Borel measure is called a Radon measure if for every compact set K ∈ B(E) we have that
µ(K) < ∞.
Lemma 2.25 (Continuity of measure). Let (E, E, µ) be a measure space. Suppose that (An)n is a
sequence of measurable sets with A1 ⊆ A2 ⊆ . . . and (Bn)n is a sequence of measurable sets with
B1 ⊇ B2 ⊇ . . . , and µ(B1) < ∞. Then we have
$$\mu\left(\bigcup_n A_n\right) = \lim_n \mu(A_n)$$
and
$$\mu\left(\bigcap_n B_n\right) = \lim_n \mu(B_n).$$
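Proof. Let $\tilde{A}_n = A_n \setminus A_{n-1}$, with the convention $A_0 = \emptyset$. The $\tilde{A}_n$ are pairwise disjoint and we have that $\bigcup_n A_n = \bigcup_n \tilde{A}_n$. Furthermore, countable additivity gives
us that
$$\mu\left(\bigcup_n \tilde{A}_n\right) = \sum_n \mu(\tilde{A}_n).$$
Therefore, we have $\sum_{n=1}^m \mu(\tilde{A}_n) \to \mu\left(\bigcup_n A_n\right)$ as m → ∞. We also have
$$\sum_{n=1}^m \mu(\tilde{A}_n) = \mu\left(\bigcup_{n=1}^m \tilde{A}_n\right) = \mu(A_m).$$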
Now we move onto the Bn. Let Cn = B1 \ Bn; then the Cn are an increasing sequence of measurable
sets with $C_n \uparrow B_1 \setminus \bigcap_n B_n$. So by the first part we have $\mu\left(B_1 \setminus \bigcap_n B_n\right) = \lim_n \mu(C_n)$. Therefore
$$\mu(B_1) - \mu\left(\bigcap_n B_n\right) = \mu(B_1) - \lim_n \mu(B_n).$$
This gives the result as long as µ(B1) < ∞. If there exists an m such that µ(Bm) < ∞ then we can
renumber starting with m and repeat the argument above.
N.b. the fact that µ(B) < ∞ implies µ(A) < ∞ if A ⊂ B follows from finite additivity: µ(B) =
µ(A) + µ(B \ A) ≥ µ(A).
3 Week 2 starts here - Outer measure, Lebesgue Measure
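Definition 3.1 (Outer measure). We write P(E) for the power set of E, that is to say the set of
all subsets of E. An outer measure is a function ν from P(E) → R+ ∪ {∞} such that
ν(∅) = 0,
if A ⊆ B then ν(A) ≤ ν(B),
$\nu\left(\bigcup_n A_n\right) \le \sum_n \nu(A_n)$ for every sequence (An)n of subsets of E.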
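We also write J to be the set of finite unions of half open intervals. Then we define a set function λ
from J to R by
$$\lambda\big((a_1, b_1] \cup (a_2, b_2] \cup \cdots \cup (a_n, b_n]\big) = \sum_{i=1}^n (b_i - a_i),$$
where the intervals $(a_i, b_i]$ are taken to be pairwise disjoint.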
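As a small numerical companion to this formula, the sketch below computes the total length of a finite union of half open intervals. It is an illustration only: the function name length_of_union is hypothetical, and merging overlapping or touching intervals first is an added step so that no length is counted twice when the input intervals are not disjoint.

```python
def length_of_union(intervals):
    """Total length of a finite union of half open intervals (a, b].

    Each interval is a pair (a, b); pairs with a >= b are treated as empty.
    Overlapping or touching intervals are merged before summing.
    """
    total = 0.0
    current_a = current_b = None
    for a, b in sorted(i for i in intervals if i[0] < i[1]):
        if current_b is None or a > current_b:
            # Start a new disjoint block, closing the previous one if any.
            if current_b is not None:
                total += current_b - current_a
            current_a, current_b = a, b
        else:
            # (a, b] meets the current block, so extend the block.
            current_b = max(current_b, b)
    if current_b is not None:
        total += current_b - current_a
    return total


disjoint_case = length_of_union([(0, 1), (2, 3.5)])
overlapping_case = length_of_union([(0, 2), (1, 3)])
```

For disjoint input the result is exactly the sum of the individual lengths, as in the displayed formula.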
We are most interested in λ defined on a single half open interval. Using this we can define Lebesgue
outer measure.
Definition 3.3 (Lebesgue outer measure). We define Lebesgue outer measure on P(R) by
$$\lambda^*(A) = \inf\left\{ \sum_n \lambda(I_n) : I_n \text{ are half open intervals},\ A \subset \bigcup_n I_n \right\}.$$
Proposition 3.4. Lebesgue outer measure is an outer measure and agrees with λ on any half open
interval.
Proof. We need to check each part of the definition of outer measure. First the fact that λ∗(∅) = 0
follows from the fact that $\emptyset \in \mathcal{I}$ and λ(∅) = 0. Now suppose that A1 ⊂ A2; then any set $B \in \mathcal{I}$ with
A2 ⊆ B also has A1 ⊆ B, so
$$\lambda^*(A_1) \le \lambda^*(A_2),$$
as the infimum over a larger set will always be smaller. Now let us turn to the countable subadditivity.
Let us take some sequence A1, A2, . . . ; if $\sum_n \lambda^*(A_n) = \infty$ then we are done. Therefore we can assume
that $\sum_n \lambda^*(A_n) < \infty$. Now let us fix an arbitrary ε > 0. By the definition of λ∗, for each n there
exists some $I_n \in \mathcal{I}$ such that $A_n \subseteq I_n$ and $I_n = \bigcup_k I_{n,k}$, where the $I_{n,k}$ are half open intervals, and
$\sum_k \lambda(I_{n,k}) \le \lambda^*(A_n) + \varepsilon 2^{-n}$. Then the set $I = \bigcup_n I_n$ is in $\mathcal{I}$ and
$$\sum_{n,k} \lambda(I_{n,k}) = \sum_n \sum_k \lambda(I_{n,k}) \le \sum_n \lambda^*(A_n) + \varepsilon.$$
Therefore $\lambda^*\left(\bigcup_n A_n\right) \le \sum_n \lambda^*(A_n) + \varepsilon$, and as ε is arbitrary this gives countable subadditivity.
Lastly, if A is the interval (a, b] then $(a, b] \in \mathcal{I}$ so λ∗(A) ≤ b − a.
Suppose that (a, b] ⊆ (c1, d1] ∪ (c2, d2] ∪ . . . . Then we have for any ε, δ > 0 that
$$[a + \varepsilon, b - \varepsilon] \subseteq (c_1 - \delta/2, d_1 + \delta/2) \cup (c_2 - \delta/4, d_2 + \delta/4) \cup \cdots \cup (c_k - 2^{-k}\delta, d_k + 2^{-k}\delta) \cup \cdots .$$
Then using compactness there exists some n such that
$$[a + \varepsilon, b - \varepsilon] \subseteq \bigcup_{k=1}^n (c_k - 2^{-k}\delta, d_k + 2^{-k}\delta).$$
Note: When we are working in $\mathbb{R}^d$ as in the assignment the terms involving ε and δ will be multiplied
by something involving the side lengths of rectangles. In order to run the proof you can say that wlog
all the rectangles you are looking at are contained inside some fixed large rectangle. This will allow
you to send ε and δ to zero without having to worry.
We want to turn this outer measure into a true measure. In order to do this we need to restrict λ∗
to some subset of P(R).
Definition 3.5 (Lebesgue measurable sets). We call a set A ∈ P(R) Lebesgue measurable if for
any B ∈ P(R) we have
λ∗(B) = λ∗(A ∩ B) + λ∗(Ac ∩ B).
Proposition 3.6. The collection of Lebesgue measurable sets, M, is a σ-algebra.
Proof. First let us notice that the definition of a Lebesgue measurable set is symmetric in A and Ac,
so A ∈ M implies that Ac ∈ M.
Secondly we can see that ∅ ∈ M as λ∗(A ∩ ∅) + λ∗(A ∩ ∅c) = λ∗(∅) + λ∗(A ∩ E) = 0 + λ∗(A). This
also implies via the first point that E ∈ M.
We then show that if A1 , A2 ∈ M then A1 ∪ A2 ∈ M . Using the fact that A1 ∈ M we have
λ∗ (B ∩ (A1 ∪ A2 )) = λ∗ (B ∩ (A1 ∪ A2 ) ∩ A1 ) + λ∗ (B ∩ (A1 ∪ A2 ) ∩ Ac1 ) = λ∗ (B ∩ A1 ) + λ∗ (B ∩ A2 ∩ Ac1 ).
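Therefore, using the fact that A2 ∈ M with the test set $B \cap A_1^c$, and that $(A_1 \cup A_2)^c = A_1^c \cap A_2^c$,
$$\lambda^*(B \cap (A_1 \cup A_2)) + \lambda^*(B \cap (A_1 \cup A_2)^c) = \lambda^*(B \cap A_1) + \lambda^*(B \cap A_1^c).$$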
Then we use again the fact that A1 ∈ M to get
λ∗ (B ∩ (A1 ∪ A2 )) + λ∗ (B ∩ (A1 ∪ A2 )c ) = λ∗ (B).
This shows that A1 ∪ A2 ∈ M .
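Now let us take an infinite sequence of disjoint sets A1, A2, A3, . . . in M. We will show that for every n
$$\lambda^*(B) = \sum_{i=1}^n \lambda^*(B \cap A_i) + \lambda^*\left(B \cap \bigcap_{i=1}^n A_i^c\right).$$
We can show this by induction. The base case n = 1 just follows from the fact that A1 ∈ M.
Then by induction suppose we know that
$$\lambda^*(B) = \sum_{i=1}^{n-1} \lambda^*(B \cap A_i) + \lambda^*\left(B \cap \bigcap_{i=1}^{n-1} A_i^c\right).$$
As An ∈ M and An is disjoint from A1, . . . , An−1, we have
$$\lambda^*\left(B \cap \bigcap_{i=1}^{n-1} A_i^c\right) = \lambda^*(B \cap A_n) + \lambda^*\left(B \cap \bigcap_{i=1}^n A_i^c\right),$$
which completes the induction. Since $\bigcap_{i=1}^{\infty} A_i^c \subseteq \bigcap_{i=1}^n A_i^c$, monotonicity of λ∗ and letting n → ∞ give
$$\lambda^*(B) \ge \sum_{i=1}^{\infty} \lambda^*(B \cap A_i) + \lambda^*\left(B \cap \bigcap_{i=1}^{\infty} A_i^c\right).$$
By countable subadditivity of λ∗ the right hand side is at least $\lambda^*\left(B \cap \bigcup_{i=1}^{\infty} A_i\right) + \lambda^*\left(B \cap \left(\bigcup_{i=1}^{\infty} A_i\right)^c\right)$, which is in turn at least λ∗(B).
Therefore,
$$\lambda^*(B) = \lambda^*\left(B \cap \bigcup_{i=1}^{\infty} A_i\right) + \lambda^*\left(B \cap \left(\bigcup_{i=1}^{\infty} A_i\right)^c\right).$$
We have now shown that M is closed under complements and taking countable unions and contains
∅, which is sufficient to show that M is a σ-algebra.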
Proposition 3.7. The restriction of λ∗ to M is a measure.
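Proof. We need to show that λ∗ is countably additive on M, so let A1, A2, . . . be a sequence of disjoint
subsets in M. In the proof that M is a σ-algebra we showed that
$$\lambda^*(B) \ge \sum_{i=1}^{\infty} \lambda^*(B \cap A_i) + \lambda^*\left(B \cap \bigcap_{i=1}^{\infty} A_i^c\right).$$
Taking $B = \bigcup_{i=1}^{\infty} A_i$ gives
$$\lambda^*\left(\bigcup_{i=1}^{\infty} A_i\right) \ge \sum_{i=1}^{\infty} \lambda^*(A_i) + \lambda^*\left(\bigcup_{i=1}^{\infty} A_i \cap \left(\bigcup_{i=1}^{\infty} A_i\right)^c\right) = \sum_{i=1}^{\infty} \lambda^*(A_i).$$
The reverse inequality is countable subadditivity of λ∗, so consequently
$$\lambda^*\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \lambda^*(A_i).$$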
Remark 3.8. We now call the restriction of λ∗ to M, λ, and call it Lebesgue measure.
We now want to know that there are some Lebesgue measurable sets. In order to do this we first
show that all the intervals of the form (−∞, b] are Lebesgue measurable.
Lemma 3.9. The intervals of the form (−∞, b] are Lebesgue measurable.
Proof. Let B be a subset of R and let I1, I2, . . . be a sequence of half open intervals such that B ⊆
I1 ∪ I2 ∪ . . . . Now let us define the (often empty) intervals $I_i^l = I_i \cap (-\infty, b]$ and $I_i^r = I_i \cap (b, \infty)$; these
are also half open intervals, and $\lambda(I_i^l) + \lambda(I_i^r) = \lambda(I_i)$. We have $B \cap (-\infty, b] \subseteq \bigcup_n I_n^l$ and $B \cap (b, \infty) \subseteq \bigcup_n I_n^r$. Therefore we have
$$\lambda^*(B \cap (-\infty, b]) \le \sum_n \lambda(I_n^l), \qquad \lambda^*(B \cap (b, \infty)) \le \sum_n \lambda(I_n^r).$$
Adding these, we can then take the infimum over all possible sequences of intervals covering B to get
$$\lambda^*(B \cap (-\infty, b]) + \lambda^*(B \cap (b, \infty)) \le \lambda^*(B).$$
The reverse inequality holds by subadditivity of λ∗.
Proof. The Borel σ-algebra is the σ-algebra generated by sets of the form (−∞, b] as shown last week.
Therefore, as M is a σ-algebra and contains all the intervals of the form (−∞, b], it contains the
Borel σ-algebra.
The construction of Lebesgue measure via the outer measure can be generalised via Carathéodory's
extension theorem. We briefly give the definition of a ring of subsets.
Denition 3.11 (Ring). A collection of subsets, A, of a space E is called a ring if for every A, B ∈ A
we have A \ B ∈ A and A ∪ B ∈ A.
Now we introduce Carathéodory's Extension theorem. We can see that the proof is in many ways
very similar to the construction of Lebesgue measure.
Theorem 3.12 (Carathéodory's Extension Theorem). Let A be a ring of subsets of E , and let µ : A →
[0, ∞] be a countably additive set function. Then µ extends to a measure on σ(A).
Proof. We define the outer measure µ∗ on P(E) by
$$\mu^*(B) = \inf\left\{ \sum_n \mu(A_n) : A_n \in \mathcal{A}\ \forall n,\ B \subset \bigcup_n A_n \right\},$$
with µ∗(B) = ∞ if there is no sequence of An such that B is contained in their union. We can see
immediately that µ∗(∅) = 0 and µ∗ is increasing.
As before we define M to be the set of µ∗-measurable sets, that is, the sets A that satisfy, for every B ⊆ E,
µ∗ (B) = µ∗ (B ∩ A) + µ∗ (B ∩ Ac ).
As ε is arbitrary this gives the required result.
The next step is to show that M is a σ -algebra. We start with the algebra part. E and ∅ are in
M as
µ∗ (B) = µ∗ (B ∩ E) + µ∗ (B ∩ ∅),
just because B ∩ E = B and B ∩ ∅ = ∅ and we know µ∗ (∅) = 0. We also can see that
µ∗ (B) = µ∗ (B ∩ A) + µ∗ (B ∩ Ac )
is symmetric in exchanging A and Ac so if A ∈ M then so is Ac . Now suppose A1 , A2 ∈ M . We notice
that (A1 ∩ A2 )c ∩ A1 = (Ac1 ∪ Ac2 ) ∩ A1 = (Ac1 ∩ A1 ) ∪ (Ac2 ∩ A1 ) = Ac2 ∩ A1 and Ac1 = Ac1 ∩ (A1 ∩ A2 )c .
Using this and the fact that A1 , A2 , Ac1 , Ac2 are in M we have
Now as $A^c \subseteq A_1^c \cap A_2^c \cap \cdots \cap A_n^c$ for each n we have $\mu^*(B \cap A^c) \le \mu^*(B \cap A_1^c \cap \cdots \cap A_n^c)$. Therefore for
each n
$$\mu^*(B) \ge \sum_{k=1}^n \mu^*(B \cap A_k) + \mu^*(B \cap A^c).$$
Letting n → ∞ we have
$$\mu^*(B) \ge \sum_n \mu^*(B \cap A_n) + \mu^*(B \cap A^c).$$
Now we use the countable subadditivity of µ∗ and the fact that $B \cap A = \bigcup_n (B \cap A_n)$ to get
µ∗ (B) ≥ µ∗ (B ∩ A) + µ∗ (B ∩ Ac ).
As the other inequality holds by subadditivity of µ∗ we have that µ∗ (B) = µ∗ (B ∩ A) + µ∗ (B ∩ Ac ) and
hence A ∈ M .
Lastly, we want to show that µ∗ is a measure on M. In order to do this we need to show that µ∗
is countably additive on M. In the last step we showed that for any B, and a sequence of disjoint sets
An in M with $A = \bigcup_n A_n$, that
$$\mu^*(B) \ge \sum_n \mu^*(A_n \cap B) + \mu^*(B \cap A^c).$$
If we apply this inequality with B = A and use the fact that An ∩ A = An we get
$$\mu^*(A) \ge \sum_n \mu^*(A_n).$$
Since we already know that µ∗ is countably subadditive this is sufficient to show that µ∗ is countably
additive and hence a measure on M.
Theorem 3.13 (Uniqueness of Extension). Let µ1 and µ2 be measures on (E, E) with µ1 (E) = µ2 (E) <
∞. Suppose that µ1 = µ2 on A where A is a π-system generating E , then µ1 = µ2 on E .
Proof. Let us consider D ⊆ E defined as the collection of measurable sets A on which µ1(A) = µ2(A). By hypothesis
E ∈ D and A ⊆ D. We want to show that D is a d-system, so that D = E by Dynkin's lemma. Suppose that
A, B ∈ E with A ⊆ B; then we have µi(A) + µi(B \ A) = µi(B) < ∞. This means that if A and B are
in D then so is B \ A. Suppose that An is a sequence of elements in D with A1 ⊆ A2 ⊆ A3 ⊆ . . . ; then by
continuity of measure $\mu_1\left(\bigcup_n A_n\right) = \lim_n \mu_1(A_n) = \lim_n \mu_2(A_n) = \mu_2\left(\bigcup_n A_n\right)$. Therefore $\bigcup_n A_n \in \mathcal{D}$.
Therefore, D is a d-system containing the π-system A, so by Dynkin's lemma it is equal to E.
We now want to show that Lebesgue measure is the only measure which assigns each interval the correct
measure.
Proposition 4.4. Lebesgue measure is the only measure on (R, B(R)) which assigns each half open
interval its length. This is equally true with half open hyper-rectangles in Rd .
Proof. The collection of half open intervals is a π-system which generates the Borel σ-algebra. Therefore,
if λ(R) had been finite we could use Dynkin's uniqueness of extension lemma to get that any
other measure which agrees with Lebesgue measure on the half open intervals must agree with Lebesgue
measure on the whole of the Borel σ-algebra. Instead let $E_n = [-n, n]^d$; then, by Dynkin's uniqueness
of extension lemma, we have that λ is the only measure on En assigning each rectangle inside En its
measure. Since every rectangle is bounded, eventually it is inside some En, so if µ is a measure such
that µ(R) = λ(R) for every rectangle R then the restriction of µ to En must agree with the restriction
of λ to En. We also have that, for any A, µ(A) = limn µ(A ∩ En) by continuity of measure. So
µ(A) = limn µ(A ∩ En) = limn λ(A ∩ En) = λ(A).
Corollary 4.5. Lebesgue measure is translation invariant. That is to say, if we define the set x + A =
{x + y : y ∈ A} then λ(x + A) = λ(A).
Proof. Define a new measure λx by λx(A) = λ(x + A); then λx((a, b]) = λ((a + x, b + x]) = b + x − (a + x) =
b − a. Therefore λx agrees with λ on the half open intervals and therefore agrees with λ on the whole
of B(R). Again it is straightforward to extend this to $\mathbb{R}^d$.
Lastly, in the construction of Lebesgue measure we show that M is not the whole of P(R) and
that there exist non-Lebesgue measurable sets.
Proposition 4.6. There exist sets that are in P(R) which are not in M.
Proof. This proof involves the use of the axiom of choice. In fact it is known that it is necessary to use
some form of the axiom of choice to prove the existence of a non-Lebesgue measurable set in R.
We use an argument by contradiction; we begin by assuming every subset of R is Lebesgue measurable.
We define an equivalence relation on [0, 1) by saying x ∼ y exactly when x − y ∈ Q. Using the
axiom of choice we find a subset S of [0, 1) which contains exactly one representative of each equivalence
class. Next we define the set S + q = {s + q (mod 1) : s ∈ S} for each q ∈ Q ∩ [0, 1). Then by our
choice of S we have that
$$[0, 1) = \bigcup_{q \in \mathbb{Q} \cap [0,1)} (S + q),$$
where this union is disjoint. We can also see by translation invariance of λ that if S were Lebesgue
measurable then we would have
λ(S) = λ(S + q)
for every q. Therefore, by countable additivity we would have
$$\lambda([0, 1)) = \sum_{q \in \mathbb{Q} \cap [0,1)} \lambda(S + q) = \sum_{q \in \mathbb{Q} \cap [0,1)} \lambda(S),$$
and the right hand side is 0 if λ(S) = 0 and ∞ if λ(S) > 0. Neither value equals λ([0, 1)) = 1, which is a contradiction.
5 Measurable Functions
A big part of measure theory is the study of functions which are compatible with the measure spaces.
We begin with a basic definition which will be satisfied by all the functions we are interested in.
Definition 5.1 (Measurable functions). If (E, E) and (F, F) are two measurable spaces and f is a
function E → F , then we say f is measurable if for every A ∈ F we have f −1 (A) ∈ E .
Lemma 5.2. Suppose that A ⊂ F is such that σ(A) = F . If f is a function such that for every A ∈ A
we have f −1 (A) ∈ E then f is measurable.
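Proof. First we note that
$$f^{-1}\left(\bigcup_i A_i\right) = \bigcup_i f^{-1}(A_i),$$
and
$$f^{-1}(B \setminus A) = f^{-1}(B) \setminus f^{-1}(A).$$
Now if we consider $\{A \in \mathcal{F} : f^{-1}(A) \in \mathcal{E}\}$ then this is a σ-algebra, as $\mathcal{E}$ is a σ-algebra and $f^{-1}$
preserves set operations. Therefore, $\{A \in \mathcal{F} : f^{-1}(A) \in \mathcal{E}\}$ is a σ-algebra containing $\mathcal{A}$, and therefore
$\mathcal{F} \subseteq \{A \in \mathcal{F} : f^{-1}(A) \in \mathcal{E}\}$, so f is measurable.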
Remark 5.3. In particular note that the above lemma means that whenever we have f : E → R and
R is equipped with the Borel σ algebra, we know that f is measurable if f −1 ((−∞, b]) is a measurable
set for every b.
Lemma 5.4. If E, F are topological spaces, equipped with their Borel σ-algebras, and we have f : E → F
is a continuous map then f is measurable.
Proof. This is on the exercise sheet and a solution on the solution sheet.
Lemma 5.5. If (E, E), (F, F) and (G, G) are all measurable spaces and f :E→F and g : F → G are
measurable then so is g ◦ f .
Proof. Take any set A ∈ G then (g ◦ f )−1 (A) = {x ∈ E : g(f (x)) ∈ A}. Let us call B = {y ∈
F : g(y) ∈ A} = g −1 (A) then (g ◦ f )−1 (A) = {x ∈ E : f (x) ∈ B} = f −1 (B). Then as g is
measurable and A ∈ G then B ∈ F . In the same way as f is measurable and B ∈ F then f −1 (B) ∈ E .
As f −1 (B) = (g ◦ f )−1 (A) this shows that (g ◦ f )−1 (A) ∈ E for every A ∈ G and hence g ◦ f is
measurable.
Lemma 5.6. Suppose that f :R→R is a monotone function. Then f is measurable with respect to
the Borel σ-algebra.
Proof. Suppose without loss of generality that f is increasing then f −1 ((−∞, b]) is ∅, (−∞, ∞) or
(−∞, a) or (−∞, a] for some a. All these possibilities are Borel measurable sets.
Lemma 5.7. If fn is a sequence of measurable functions taking values in (R, B(R)) then the following
functions are all measurable:
−f1,
f1 ∨ f2,
f1 + f2 ,
f1 f2 ,
supn fn ,
inf n fn ,
lim supn fn ,
lim inf n fn .
Proof. We only show two results. The rest are on the assignment.
In order to show that any of these functions are measurable we want to look at $f^{-1}((-\infty, b])$ or
a similar set. We have $(f_1 \vee f_2)^{-1}((-\infty, b]) = \{x : \max\{f_1(x), f_2(x)\} \le b\} = \{x : f_1(x) \le b \text{ and } f_2(x) \le b\}
= \{x : f_1(x) \le b\} \cap \{x : f_2(x) \le b\} = f_1^{-1}((-\infty, b]) \cap f_2^{-1}((-\infty, b])$. Now since f1 and f2 are
both measurable the sets $f_1^{-1}((-\infty, b])$ and $f_2^{-1}((-\infty, b])$ are both measurable. We also know that
the intersection of two measurable sets is measurable.
Next, $(f_1 + f_2)^{-1}((b, \infty)) = \{x : f_1(x) + f_2(x) > b\}$. Now if $f_1(x) > b - f_2(x)$ then there exists a q ∈ Q such
that $f_1(x) > q > b - f_2(x)$. Let us define the set $A = \bigcup_{q \in \mathbb{Q}} \{x : f_1(x) > q\} \cap \{x : f_2(x) > b - q\}$. Since
f1, f2 are both measurable, A is a countable union of measurable sets and so measurable. We can also see
that if x ∈ A then f1(x) + f2(x) > b, and our observation shows that in fact A = {x : f1(x) + f2(x) > b}.
Therefore, f1 + f2 is measurable.
Definition 5.8 (Image measure). We can use a measurable function f to define an image measure.
Suppose µ is a measure on (E, E) and f is a measurable function (E, E) → (F, F); then we can define
a new measure ν by saying that
ν(A) = µ(f −1 (A)),
for every A ∈ F . We write ν = µ ◦ f −1 .
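For a measure concentrated on finitely many points, the image measure can be computed directly by moving each point mass along f. The sketch below is only an illustration under that finiteness assumption; the names image_measure and measure_of, and the die example, are hypothetical and not taken from the notes.

```python
from collections import defaultdict
from fractions import Fraction


def image_measure(mu, f):
    """Push a finitely supported measure forward along f.

    mu maps each point x to its mass mu({x}); the result maps each value
    y = f(x) to nu({y}) = mu(f^{-1}({y})).
    """
    nu = defaultdict(Fraction)
    for x, mass in mu.items():
        nu[f(x)] += mass
    return dict(nu)


def measure_of(measure, subset):
    """Measure of a set for a finitely supported measure given as a dict."""
    return sum((measure.get(point, Fraction(0)) for point in subset), Fraction(0))


# Hypothetical example: a fair die pushed forward by the parity map.
mu = {face: Fraction(1, 6) for face in range(1, 7)}
nu = image_measure(mu, lambda face: face % 2)

# nu(A) should equal mu(f^{-1}(A)); here A = {1} and f^{-1}(A) = {1, 3, 5}.
lhs = measure_of(nu, {1})
rhs = measure_of(mu, {1, 3, 5})
```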
We can use the notion of image measure to construct further measures from Lebesgue measure.
Lemma 5.9. Suppose g : R → R and that g is non-constant, right-continuous and non-decreasing.
Let us define g(−∞) = limx→−∞ g(x) and g(∞) = limx→∞ g(x) and let us call the interval I :=
(g(−∞), g(∞)) (this might be the whole of R). Define a partial inverse to g by f : I → R by
We call this measure the Lebesgue-Stieltjes measure associated with g. Furthermore, every Radon
measure on R can be represented this way.
Proof. Define I and f as in the Lemma above. Then we can construct dg as the image measure of
Lebesgue measure on I. That is to say we can let $dg = \lambda \circ f^{-1}$. If this is the case then
dg((a, b]) = λ({x : f(x) > a, f(x) ≤ b}) = λ((g(a), g(b)]) = g(b) − g(a).
The standard argument for uniqueness of measures (as for that of Lebesgue measure) gives uniqueness
of this measure.
Finally, if ν is a Radon measure on R then we can define a function g by
g(y) = ν((0, y]), y ≥ 0, g(y) = −ν((y, 0]), y < 0.
Then ν = dg by uniqueness.
Definition 6.3 (Convergence in measure). Let (E, E, µ) be a measure space. A sequence of real
valued measurable functions, (fn)n≥1 : E → R, converges in measure to f if for every ε > 0
µ({x : |f(x) − fn(x)| > ε}) → 0, as n → ∞.
Example 6.4. The sequence of functions $f_n(x) = x^n$ converges to 0 Lebesgue almost everywhere on
[0, 1], and in measure, but it doesn't converge to 0 pointwise, as fn(1) = 1 for every n.
Example 6.5. The sequence of functions $f_n(x) = 1_{[n,n+1]}(x)$ converges to 0 Lebesgue almost everywhere
(in fact everywhere) but not in measure.
Example 6.6. Consider the sequence of functions $f_1 = 1_{[0,1/2)}$, $f_2 = 1_{[1/2,1)}$, $f_3 = 1_{[0,1/4)}$, $f_4 = 1_{[1/4,1/2)}$,
$f_5 = 1_{[1/2,3/4)}$, $f_6 = 1_{[3/4,1)}$, $f_7 = 1_{[0,1/8)}$, $f_8 = 1_{[1/8,1/4)}$, . . . ; then fn converges to 0 in measure, but fn(x) does
not converge for any x ∈ [0, 1).
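The sequence in Example 6.6 can be explored numerically. The sketch below is an illustration, not part of the notes: the indexing formula in typewriter_interval is one assumed way of enumerating the intervals in the order listed in the example, and the point x = 1/3 is an arbitrary choice.

```python
from fractions import Fraction


def typewriter_interval(n):
    """Return (a, b) such that f_n is the indicator of [a, b), for n >= 1.

    Level k consists of the 2**k dyadic intervals of length 2**-k, and the
    levels k = 1, 2, 3, ... are listed one after another.
    """
    k = (n + 1).bit_length() - 1   # level containing index n
    j = n - (2**k - 1)             # position of the interval inside level k
    return Fraction(j, 2**k), Fraction(j + 1, 2**k)


def f(n, x):
    """Value of f_n at the point x."""
    a, b = typewriter_interval(n)
    return 1 if a <= x < b else 0


# For 0 < eps < 1 the set {f_n > eps} is the interval itself, whose length
# tends to 0: this is convergence in measure.
lengths = [typewriter_interval(n)[1] - typewriter_interval(n)[0]
           for n in (1, 3, 7, 15, 31)]

# A fixed point is hit exactly once per level, so f_n(x) = 1 infinitely often
# and f_n(x) = 0 infinitely often: no pointwise convergence.
x = Fraction(1, 3)
hits = [n for n in range(1, 63) if f(n, x) == 1]
```

The range 1 to 62 covers the first five levels, so the list of hits contains one index from each level.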
We can prove a quasi-equivalence between these two notions of convergence. Before we do this we need
to introduce a very useful lemma, the Borel-Cantelli Lemma. We introduce it here as it is used to
prove the following theorem but it is a useful tool to have whilst doing measure theory, particularly
probability theory. First let us also introduce some more notation.
Definition 6.7. Let (An)n be a sequence of measurable sets; then we have the following names
$$\limsup_n A_n = \bigcap_n \bigcup_{m \ge n} A_m = \{A_n \text{ infinitely often}\},$$
and
$$\liminf_n A_n = \bigcup_n \bigcap_{m \ge n} A_m = \{A_n \text{ eventually}\}.$$
The last names are more common when the An are events in a probability space.
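Lemma 6.8 (First Borel-Cantelli Lemma). Let (E, E, µ) be a measure space and (An)n a sequence of measurable sets. Then if
$\sum_n \mu(A_n) < \infty$
it follows that $\mu(\limsup_n A_n) = 0$.
Proof. For any n we have
$$\mu\left(\limsup_k A_k\right) \le \mu\left(\bigcup_{m \ge n} A_m\right) \le \sum_{m \ge n} \mu(A_m).$$
The right hand side is the tail of a convergent series, so it tends to 0 as n → ∞.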
Theorem 6.10. Let (E, E, µ) be a measure space and (fn )n be a sequence of measurable functions.
Then we have the following:
Suppose that µ(E) < ∞ and that fn → 0 almost everywhere, then fn → 0 in measure.
If fn → 0 in measure then there exists some subsequence (nk)k such that fnk → 0 almost everywhere.
Proof. Suppose that fn → 0 almost everywhere. Then
therefore,
µ({x : |fn(x)| > ε}) = µ(E) − µ({x : |fn(x)| ≤ ε}) → 0.
Now suppose that fn → 0 in measure. We can find a subsequence nk such that
$$\mu(\{x : |f_{n_k}(x)| > 1/k\}) \le 2^{-k}.$$
Therefore
$$\sum_k \mu(\{x : |f_{n_k}(x)| > 1/k\}) < \infty.$$
Define sets $A_k = \{x : g_{n_k}(x) \le 1/k\}$ and let $A = \bigcap_k A_k$. The set A has
$$\mu(A^c) = \mu\left(\bigcup_k A_k^c\right) \le \sum_k \mu(A_k^c) \le \sum_k \varepsilon 2^{-k} = \varepsilon.$$
We want to show that fn converges uniformly to f on A. For each δ > 0 there exists a k such that 1/k < δ;
then as A ⊆ Ak, we have that for every n ≥ nk, on A,
$|f_n - f| \le g_{n_k} \le 1/k < \delta$,
Definition 6.12 (Almost uniform convergence). We say a sequence of functions (fn)n≥1 converges
almost uniformly to f on a measure space (E, E, µ) if for every ε > 0 there exists a set A with µ(Ac) < ε
with fn → f uniformly on A.
We can use Egoroff's theorem to prove a result called Lusin's theorem. First let us recall the
definition of regularity.
Denition 6.13. Let E be a topological space and µ be a measure on (E, B(E)) then say µ is regular
if for every A ∈ B(E) we have
µ(A) = inf{µ(U ) : A ⊆ U, U is open},
Theorem 6.14 (Lusin's Theorem). Suppose that f is a measurable function and $A \subseteq \mathbb{R}^d$ is a Borel set
and λ(A) < ∞; then for any ε > 0 there is a compact subset K of A with λ(A \ K) < ε such that the
restriction of f to K is continuous.
Remark 6.15. This theorem can be generalised to locally compact Hausdorff spaces, see Cohn's book.
Proof. Suppose first that f only takes countably many values a1, a2, a3, . . . on A. Let $A_k = \{x \in A : f(x) = a_k\}$;
by measurability of f we can see that $A_k = A \cap f^{-1}(\{a_k\})$ is measurable. We know that
$A = \bigcup_n A_n$, so by continuity of measure $\lambda\left(\bigcup_{k=1}^n A_k\right) \uparrow \lambda(A)$. Since λ(A) < ∞ we have that for any
ε > 0 there exists n such that $\lambda\left(A \setminus \bigcup_{k=1}^n A_k\right) < \varepsilon/2$. By the regularity of Lebesgue measure we can
find compact subsets $K_k \subseteq A_k$, k = 1, . . . , n, such that $\lambda(A_k \setminus K_k) \le \varepsilon/(2n)$. Then let $K = \bigcup_{k=1}^n K_k$. This is a
compact subset of A and
$$\lambda(A \setminus K) \le \lambda\left(A \setminus \bigcup_{k=1}^n A_k\right) + \lambda\left(\bigcup_{k=1}^n A_k \setminus \bigcup_{k=1}^n K_k\right) < \varepsilon/2 + \varepsilon/2.$$
Now f restricted to K is continuous since the Ki are disjoint compact sets and f is constant on each Ki.
Now we have proved the special case where f takes countably many values we can use this to prove
the theorem for general f. Let $f_n = 2^{-n} \lfloor 2^n f \rfloor$; then $2^{-n} \ge f(x) - f_n(x) \ge 0$, so fn(x) → f(x) uniformly.
Now, fn can only take countably many values, so by our special case of Lusin's theorem there exists a
compact $K_n \subseteq A$ such that $\lambda(A \setminus K_n) \le \varepsilon 2^{-n-1}$ and fn is continuous on Kn. Now let $K_\infty = \bigcap_n K_n$;
then K∞ is compact and $\lambda(A \setminus K_\infty) = \lambda\left(\bigcup_n (A \setminus K_n)\right) \le \sum_{n \ge 1} \varepsilon 2^{-n-1} = \varepsilon/2 < \varepsilon$. Now we
have that fn converges uniformly to f on K∞ and fn is continuous on K∞ for each n. As the uniform
limit of continuous functions is continuous this shows that f is continuous on K∞.
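The approximation $f_n = 2^{-n} \lfloor 2^n f \rfloor$ used in this proof is easy to check numerically. The sketch below is an illustration only: the helper name dyadic_floor and the sample values are assumptions, and it works pointwise with floating point numbers rather than with functions.

```python
import math


def dyadic_floor(value, n):
    """Return 2**-n * floor(2**n * value), the value rounded down to the
    nearest multiple of 2**-n."""
    return math.floor(2**n * value) / 2**n


# Hypothetical sample of function values f(x).
samples = [math.pi, -1.3, math.sqrt(2), 0.0, 5.0]

# For each n, the largest gap f(x) - f_n(x) over the sample; the proof uses
# that this gap always lies between 0 and 2**-n.
max_gap = [max(v - dyadic_floor(v, n) for v in samples) for n in range(12)]
min_gap = [min(v - dyadic_floor(v, n) for v in samples) for n in range(12)]
```

Since the bound $2^{-n}$ does not depend on x, the convergence of fn to f is uniform, which is what the proof needs.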
7 Integration
We now get to the definition of the Lebesgue integral which is the second important object that we
construct in this course. There are several different notations for the integral of a function f with
respect to a measure µ. We have
$$\mu(f) = \int_E f \, d\mu = \int_E f(x) \, \mu(dx).$$
When you are integrating with respect to Lebesgue measure the most common notation is
$$\int_E f(x) \, dx.$$
Before we start constructing the integral we'll briefly discuss the motivations for how to construct it.
Firstly, you've already seen the Riemann integral. We can describe the strategy of Riemann integration,
very loosely, as splitting the domain of the function into equal sized chunks, estimating the height of
the function on each chunk, then adding them together. Broadly, what happens with Lebesgue integration
is that we split the range of the function into equal sized chunks, estimate the size of the part of the
domain which will end up in that chunk of range, then sum everything up. We need the theory of
measure in order to do this because the bits of the domain corresponding to chunks of the range can
be quite weird sets whose size it wouldn't otherwise be possible to measure. The first motivation for
this is that whilst Riemann integration only works for functions from subsets of $\mathbb{R}^d$ to
$\mathbb{R}$, Lebesgue integration allows the domain of the function to be quite weird (as long as it is
a measure space). As an example, this is helpful for taking expectations rigorously, because expectations
are integrals of random variables and the domain of a random variable is a probability space which may
not be explicit.
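The contrast between the two strategies can be sketched numerically. This is only an illustration under assumptions that are not part of the notes: the names `riemann_sum` and `lebesgue_sum` are hypothetical, the function is assumed non-negative, and the measure of each level set is estimated by counting points of a fine grid rather than computed exactly.

```python
def riemann_sum(f, a, b, n):
    """Riemann strategy: split the domain [a, b] into n equal chunks."""
    width = (b - a) / n
    # Height of f at the midpoint of each chunk, times the chunk width.
    return sum(f(a + (i + 0.5) * width) * width for i in range(n))


def lebesgue_sum(f, a, b, levels, grid=10_000):
    """Lebesgue strategy: split the range of a non-negative f into equal chunks."""
    cell = (b - a) / grid
    values = [f(a + (i + 0.5) * cell) for i in range(grid)]
    step = max(values) / levels
    total = 0.0
    for j in range(1, levels + 1):
        # Grid estimate of the measure of the level set {x : f(x) >= j * step}.
        measure = sum(1 for v in values if v >= j * step) * cell
        total += step * measure
    return total


square = lambda x: x * x
print(riemann_sum(square, 0.0, 1.0, 1000))   # close to 1/3
print(lebesgue_sum(square, 0.0, 1.0, 200))   # a lower approximation of 1/3
```

Both sums approximate the same number for this well-behaved function; the difference is only in which axis gets chopped up.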
The second big motivation for introducing a new theory of integration is the issue of convergence.
It is important in many practical applications of integration theory to know when
\[ \lim_n \int f_n(x) \, dx = \int \lim_n f_n(x) \, dx \]
or when
\[ \int_{E_x} \int_{E_y} f(x, y) \, dy \, dx = \int_{E_y} \int_{E_x} f(x, y) \, dx \, dy. \]
Lebesgue integration allows us to rigorously find conditions on $f$ under which these statements will be
true. This is often not possible in a satisfactory way with the Riemann theory of integration. We will
see some of these convergence theorems next week and then switching the order of integration towards
the end of the course (currently planned for week 9). The most important motivation for developing good
convergence theorems was the development of Fourier series. We want to know when it is possible to
integrate a Fourier series term by term.
The strategy for constructing the integral is to begin by defining $\mu(f)$ when $f$ belongs to a special
class of measurable functions that we call simple functions. We then extend the definition of the
integral to progressively larger classes of functions.
Definition 7.1 (Simple functions). Let $(E, \mathcal{E}, \mu)$ be a measure space. The set of simple
functions on this space taking values in $\mathbb{R}$ are functions of the form
\[ f(x) = \sum_{k=1}^n a_k 1_{A_k}(x). \]
Here, the $A_k$ are disjoint sets in $\mathcal{E}$, $1_A$ represents the indicator function of the set
$A$, and the $a_k$ are non-negative real numbers. We note that this representation of $f$ is not unique.
Definition 7.2 (The integral of a simple function). Still working in the setting above, let
$f(x) = \sum_{k=1}^n a_k 1_{A_k}(x)$, then we can define
\[ \mu(f) = \sum_{k=1}^n a_k \mu(A_k). \]
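As a small worked instance of this definition, the sketch below integrates a simple function against Lebesgue measure when every set $A_k$ is a half-open interval. The helper name `integrate_simple` and the two representations are hypothetical choices made for illustration; the point is that two different representations of the same function give the same value.

```python
def integrate_simple(terms):
    """Integral of sum_k a_k * 1_{(left_k, right_k]} against Lebesgue measure.

    `terms` is a list of (a_k, (left_k, right_k)) with pairwise disjoint intervals,
    so the measure of each set is just its length.
    """
    return sum(a * (right - left) for a, (left, right) in terms)


# Two representations of the same function f = 2 * 1_(0,1] + 5 * 1_(1,3].
rep_one = [(2.0, (0.0, 1.0)), (5.0, (1.0, 3.0))]
rep_two = [(2.0, (0.0, 0.5)), (2.0, (0.5, 1.0)), (5.0, (1.0, 2.0)), (5.0, (2.0, 3.0))]

print(integrate_simple(rep_one), integrate_simple(rep_two))
```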
Definition 8.2 (The integral of a simple function). Still working in the setting above, let
$f(x) = \sum_{k=1}^n a_k 1_{A_k}(x)$, then we can define
\[ \mu(f) = \sum_{k=1}^n a_k \mu(A_k). \]
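Lemma 8.3. The integral of a simple function is well defined (it doesn't depend on the choice of
representation of the simple function) and satisfies the following properties.
For $\alpha > 0$ we have $\mu(\alpha f) = \alpha\mu(f)$.
$\mu(f + g) = \mu(f) + \mu(g)$.
\[
\begin{aligned}
\mu(f) &= \sum_{k=1}^n a_k \mu(A_k) \\
&= \sum_{k=1}^n \sum_{j=1}^m a_k \mu(A_k \cap B_j)
  && \text{as } A_k = \bigcup_{j=1}^m (A_k \cap B_j) \text{ as } \bigcup_{k=1}^n A_k = \bigcup_{j=1}^m B_j \\
&= \sum_{k=1}^n \sum_{j=1}^m b_j \mu(A_k \cap B_j) = \sum_{j=1}^m \sum_{k=1}^n b_j \mu(A_k \cap B_j)
  && \text{as } a_k = b_j \text{ if } A_k \cap B_j \neq \emptyset, \text{ so } a_k = b_j \text{ or } \mu(A_k \cap B_j) = 0 \\
&= \sum_{j=1}^m b_j \mu(B_j).
\end{aligned}
\]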
Now we move on to the linearity properties. These come naturally from the definition,
\[ \mu(\alpha f) = \sum_{k=1}^n \alpha a_k \mu(A_k) = \alpha \sum_{k=1}^n a_k \mu(A_k) = \alpha\mu(f). \]
When we are dealing with two simple functions simultaneously it is useful to write them both in
a representation where the measurable sets appearing are the same for both functions. If we let
$g = \sum_{j=1}^m c_j 1_{C_j}$ then let us write $A_0 = E \setminus \bigcup_{k=1}^n A_k$, $a_0 = 0$ and
define $C_0, c_0$ similarly. Then we can write $f + g$ as a simple function via
\[ f(x) + g(x) = \sum_{k=0}^n \sum_{j=0}^m (a_k + c_j) 1_{A_k \cap C_j}(x) \]
and we have
\[ \mu(f + g) = \sum_{k=0}^n \sum_{j=0}^m (a_k + c_j) \mu(A_k \cap C_j). \]
We note that $\bigcup_{k=0}^n A_k = \bigcup_{j=0}^m C_j = E$, the $A_k$ are mutually disjoint, and the
$C_j$ are mutually disjoint. Therefore
\[
\begin{aligned}
\mu(f + g) &= \sum_{k=0}^n \sum_{j=0}^m (a_k + c_j) \mu(A_k \cap C_j) \\
&= \sum_{k=0}^n a_k \sum_{j=0}^m \mu(A_k \cap C_j) + \sum_{j=0}^m c_j \sum_{k=0}^n \mu(A_k \cap C_j) \\
&= \sum_{k=0}^n a_k \mu(A_k) + \sum_{j=0}^m c_j \mu(C_j)
  && \text{as the unions of the } A_k \text{ or } C_j \text{ fill the space} \\
&= \sum_{k=1}^n a_k \mu(A_k) + \sum_{j=1}^m c_j \mu(C_j)
  && \text{as } a_0 = c_0 = 0 \\
&= \mu(f) + \mu(g).
\end{aligned}
\]
Now we move on to the monotonicity of the integral. Let us express $f$ and $g$ as before; again the
goal is to represent the two functions using the same measurable sets. We can rewrite $f$ as
\[ f = \sum_{k=0}^n \sum_{j=0}^m a_k 1_{A_k \cap C_j} = \sum_{k=0}^n \sum_{j=0}^m a_{k,j} 1_{A_k \cap C_j}, \]
where $a_{k,j} = a_k 1_{A_k \cap C_j \neq \emptyset}$. Here again we are using the fact that
$\bigcup_{j=0}^m C_j$ fills the space. In the same way we can write
\[ g = \sum_{k=0}^n \sum_{j=0}^m c_{k,j} 1_{A_k \cap C_j}, \]
where $c_{k,j} = c_j 1_{A_k \cap C_j \neq \emptyset}$. Then if $f(x) \le g(x)$ for every $x$ we know that
this means $a_{k,j} \le c_{k,j}$ for every $k, j$. Then by the well definedness of the integral we have
\[ \mu(f) = \sum_{k=0}^n \sum_{j=0}^m a_{k,j} \mu(A_k \cap C_j) \le \sum_{k=0}^n \sum_{j=0}^m c_{k,j} \mu(A_k \cap C_j) = \mu(g). \]
Lemma 8.5. The above definition of Lebesgue integral for positive functions is consistent with the
definition for simple functions. That is to say that if $f = \sum_{k=1}^n a_k 1_{A_k}$ where
$a_k \ge 0$ and the $A_k$ are disjoint measurable sets then
\[ \sum_{k=1}^n a_k \mu(A_k) = \sup\{\mu(g) : g \text{ is a simple function}, g \le f\}. \]
Definition 8.6 (Final definition of Lebesgue integral). Suppose that $f$ is a measurable function which
is not necessarily positive. Then we call $f$ $\mu$-integrable or Lebesgue integrable if
$\mu(|f|) < \infty$. In this case we can write $f = f_+ - f_-$ where $f_+$ and $f_-$ are both positive
and measurable ($f_+ = \max\{f, 0\}$, $f_- = \max\{-f, 0\}$). We then define the integral of $f$ by
\[ \mu(f) = \mu(f_+) - \mu(f_-). \]
Remark 8.7. Notice that we haven't yet proved that these definitions of the integral behave the way we
hope (e.g. are linear, monotone, etc). In order to do this we need to prove some convergence results.
Here the first equality follows from the fact that the support of $f_n$ must be included in the support
of $f$ since $0 \le f_n \le f$.
Our next case is when $f$ is positive and measurable and the $f_n$ are all simple. Let us pick $g$ a
simple function with $g \le f$; then $g_n = f_n \wedge g = \min\{f_n, g\}$ is a sequence of simple
functions increasing to $g$. Therefore, by our previous case $\mu(g_n) \uparrow \mu(g)$. Furthermore
$g_n \le f_n$ so by monotonicity
\[ \mu(g) = \lim_n \mu(g_n) \le \lim_n \mu(f_n). \]
As $g$ is arbitrary this means that
$\sup\{\mu(g) : g \text{ is a simple function}, g \le f\} \le \lim_n \mu(f_n)$.
The last case is the most general where both $f_n$ and $f$ are positive and measurable. In this case we
introduce our favorite kind of approximation (which is very similar to what we used in Lusin's theorem)
\[ g_n = 2^{-n} \lfloor 2^n f_n \rfloor \wedge n, \]
then $g_n$ is a sequence of simple functions with $g_n \uparrow f$ and $g_n \le f_n$; proving this is an
exercise on the assignment. Therefore we have
\[ \mu(f) = \lim_n \mu(g_n) \le \lim_n \mu(f_n) \le \mu(f). \]
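The dyadic approximation can be explored at a single point. The sketch below is a simplification and not the construction in the proof itself: it applies the formula to one fixed non-negative value in place of $f_n(x)$, and the helper name `dyadic_approx` is hypothetical. It shows the truncated values increasing and, once $n$ exceeds the value, lying within $2^{-n}$ of it.

```python
import math


def dyadic_approx(value, n):
    """min(2^{-n} * floor(2^n * value), n) for a non-negative real value."""
    return min(math.floor(2**n * value) / 2**n, n)


value = math.pi
approximations = [dyadic_approx(value, n) for n in range(1, 12)]
for n, approx in enumerate(approximations, start=1):
    print(n, approx, value - approx)
```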
Using this we can give another formulation of the monotone convergence theorem.
Proposition 8.12 (Beppo-Levi). Suppose that $(f_n)_{n \ge 0}$ is a sequence of non-negative measurable
functions. Then $\mu(\sum_n f_n) = \sum_n \mu(f_n)$.
\[ \sum_n \mu(f_n) = \lim_n \sum_{k=1}^n \mu(f_k), \]
then by linearity, writing $g_n = \sum_{k=1}^n f_k$,
\[ \lim_n \sum_{k=1}^n \mu(f_k) = \lim_n \mu\Big(\sum_{k=1}^n f_k\Big) = \lim_n \mu(g_n), \]
then using monotone convergence we have
\[ \lim_n \mu(g_n) = \mu\Big(\sum_n f_n\Big). \]
We can also prove that our notion of convergence for integrable functions behaves in the way we
expect. First we prove a helpful lemma.
Lemma 8.13. Let $f_1, f_2, g_1, g_2$ all be non-negative, integrable, real valued functions such that
$f_1 - f_2 = g_1 - g_2$; then we have $\mu(f_1) - \mu(f_2) = \mu(g_1) - \mu(g_2)$.
Proof. We have that $f_1 + g_2 = g_1 + f_2$ so $\mu(f_1 + g_2) = \mu(g_1 + f_2)$. Using linearity we have
\[ \mu(f_1) + \mu(g_2) = \mu(g_1) + \mu(f_2). \]
Since all the integrals involved are finite we can rearrange this to give
\[ \mu(f_1) - \mu(f_2) = \mu(g_1) - \mu(g_2). \]
Proposition 8.14. Suppose that $f$ and $g$ are integrable, real valued functions on
$(E, \mathcal{E}, \mu)$. Then
For every $\alpha > 0$ we have $\mu(\alpha f) = \alpha\mu(f)$; we also have $\mu(-f) = -\mu(f)$.
The function $f + g$ is integrable and $\mu(f + g) = \mu(f) + \mu(g)$.
If $f \le g$ then $\mu(f) \le \mu(g)$.
Proof. Let us write $f = f_+ - f_-$ and $g = g_+ - g_-$ where these are split into the positive and
negative parts of $f$ and $g$. Then $\alpha f = \alpha f_+ - \alpha f_-$ so
$\mu(\alpha f) = \mu(\alpha f_+) - \mu(\alpha f_-) = \alpha(\mu(f_+) - \mu(f_-)) = \alpha\mu(f)$.
Similarly $-f = f_- - f_+$ so $\mu(-f) = \mu(f_-) - \mu(f_+) = -\mu(f)$.
For the second point, first we have that $|f + g| \le |f| + |g|$ so $f + g$ is integrable. To compute
its integral we need to use the lemma above. We know that
$(f_+ + g_+) - (f_- + g_-) = (f + g)_+ - (f + g)_-$ and all of
$(f_+ + g_+), (f_- + g_-), (f + g)_+, (f + g)_-$ are non-negative and integrable, so using the lemma we
have
\[
\begin{aligned}
\mu(f + g) &= \mu((f + g)_+) - \mu((f + g)_-) = \mu(f_+ + g_+) - \mu(f_- + g_-) \\
&= \mu(f_+) - \mu(f_-) + \mu(g_+) - \mu(g_-) = \mu(f) + \mu(g).
\end{aligned}
\]
Theorem 8.15 (Fatou's Lemma). Let $f_n$ be a sequence of non-negative measurable functions. Then we
have the following result
\[ \mu\big(\liminf_n f_n\big) \le \liminf_n \mu(f_n). \]
Remark 8.16. I always have trouble remembering which way around the inequality goes in this lemma.
A helpful example is if $f_n = 1_{[n,n+1)}$ and $\mu$ is Lebesgue measure. Then $\lambda(f_n) = 1$ for
every $n$ and $\liminf_n f_n = 0$. This is also an instructive example for why the limits can fail to be
the same. Essentially here the mass we are trying to integrate escapes to infinity.
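The escaping-mass example can be checked by hand or with a few lines of code. The sketch below is an illustration only: the helper names are hypothetical, the integral is a midpoint rule on a bounded window, and the infimum over all $k \ge n$ is replaced by an infimum over a finite horizon.

```python
def f(n, x):
    """The function f_n = indicator of [n, n + 1), evaluated at x."""
    return 1.0 if n <= x < n + 1 else 0.0


def integral_of_f(n, upper=50, cells_per_unit=10):
    """Midpoint-rule value of the integral of f_n over [0, upper)."""
    width = 1.0 / cells_per_unit
    cells = upper * cells_per_unit
    return sum(f(n, (i + 0.5) / cells_per_unit) * width for i in range(cells))


def tail_infimum(x, start, horizon=100):
    """inf of f_k(x) over start <= k < start + horizon (finite stand-in for k >= start)."""
    return min(f(k, x) for k in range(start, start + horizon))


integrals = [integral_of_f(n) for n in range(5)]
liminf_values = [tail_infimum(x, start=int(x) + 1) for x in (0.5, 3.2, 17.9)]
print(integrals)       # every integral is (numerically) 1
print(liminf_values)   # the pointwise lim inf is 0 at each sampled point
```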
Proof. This is essentially a consequence of monotone convergence. Let $g_n = \inf_{k \ge n} f_k$; then
$g_n$ is a non-decreasing sequence of measurable functions and $g_n \le f_n$ for each $n$. By definition
of the $g_n$ we also know that $\liminf_n f_n = \liminf_n g_n = \lim_n g_n$. Using monotone convergence
we then have
\[ \mu(\liminf_n f_n) = \mu(\lim_n g_n) = \lim_n \mu(g_n). \]
Then using monotonicity we have
\[ \mu(g_n) \le \mu(f_n) \]
for each $n$, so consequently
\[ \lim_n \mu(g_n) = \liminf_n \mu(g_n) \le \liminf_n \mu(f_n). \]
Putting these all together gives the result.
Fatou's lemma is key to proving our next important convergence theorem.
Theorem 8.17 (Dominated convergence theorem). Let $f_n$ be a sequence of measurable functions and $f$
another function such that $f_n \to f$ almost everywhere. Suppose further that there exists a positive
function $g$ such that $|f| \le g$, $|f_n| \le g$ for every $n$ and $\mu(g) < \infty$; then
$\lim_n \mu(f_n) = \mu(f)$. The function $g$ is called the dominating function.
Proof. Let us first suppose that $f_n \to f$ and the domination conditions hold everywhere. Then we have
that $g + f_n$ is a sequence of non-negative measurable functions whose limit is $g + f$. Applying
Fatou's lemma gives
\[ \mu(g + f) \le \liminf_n \mu(g + f_n) = \mu(g) + \liminf_n \mu(f_n), \]
subtracting $\mu(g)$ from each side (which we can do as it is finite) gives
\[ \mu(f) \le \liminf_n \mu(f_n). \]
Similarly $g - f_n$ is a sequence of non-negative measurable functions whose limit is $g - f$. Applying
Fatou's lemma again gives
\[ \mu(g) - \mu(f) \le \mu(g) + \liminf_n (-\mu(f_n)) = \mu(g) - \limsup_n \mu(f_n). \]
Rearranging this, since all the quantities are finite, gives
\[ \limsup_n \mu(f_n) \le \mu(f). \]
Putting both parts together gives
\[ \mu(f) \le \liminf_n \mu(f_n) \le \limsup_n \mu(f_n) \le \mu(f). \]
Therefore the limit of the sequence $\mu(f_n)$ exists and is equal to $\mu(f)$.
The extension of this result to when the conditions only hold almost everywhere is due to the fact
that the integral of any function is unchanged by modifying that function on a measure zero set. This
type of result will be discussed in more detail when we introduce Lebesgue spaces. It isn't really the
point of this particular theorem; we just give the full version here so we are able to apply it.
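A numerical sketch can make the role of the dominating function concrete. The two sequences below are examples chosen for illustration (they are not taken from the notes), and `midpoint_integral` is a hypothetical helper using a plain midpoint rule on $[0, 1]$. The first sequence, $x^n$, is dominated by the constant 1 and its integrals tend to the integral of the limit, which is 0. The second, $n 1_{(0,1/n)}$, also tends to 0 pointwise but has no integrable dominating function, and its integrals stay at 1.

```python
def midpoint_integral(func, a, b, cells=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / cells
    return sum(func(a + (i + 0.5) * width) for i in range(cells)) * width


exponents = (1, 10, 100)

# f_n(x) = x^n on [0, 1]: dominated by g = 1, pointwise limit 0 almost everywhere.
dominated = [midpoint_integral(lambda x, n=n: x**n, 0.0, 1.0) for n in exponents]

# h_n = n * 1_{(0, 1/n)}: pointwise limit 0, but the mass concentrates near 0.
concentrating = [
    midpoint_integral(lambda x, n=n: float(n) if x < 1.0 / n else 0.0, 0.0, 1.0)
    for n in exponents
]

print(dominated)       # approximately 1/2, 1/11, 1/101
print(concentrating)   # approximately 1, 1, 1
```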
9 More integration: Week 7 starts here
The following is a useful criterion for when we can differentiate under the integral sign which also
serves as a good example of how to use the dominated convergence theorem.
Theorem 9.1 (Differentiation under the integral sign). Let $(E, \mathcal{E}, \mu)$ be a measure space and
$f : U \times E \to \mathbb{R}$ be a function such that $x \mapsto f(t, x)$ is integrable for every $t$,
and $t \mapsto f(t, x)$ is differentiable for every $x$, and suppose further that there exists an
integrable function $g(x)$ such that
\[ \Big|\frac{\partial f}{\partial t}(t, x)\Big| \le g(x), \quad \forall t \in U; \]
then the function $x \mapsto \partial f(t, x)/\partial t$ is integrable and the function
$F(t) = \int_E f(t, x) \mu(dx)$ is differentiable with
\[ \frac{dF}{dt} = \int_E \frac{\partial f}{\partial t}(t, x) \mu(dx). \]
Notice here we are using a different notation for the integral with respect to $\mu$. We do this because
it is helpful to be able to emphasise that we integrate in $x$ but not $t$.
Proof. Let $\epsilon_n$ be an arbitrary sequence which tends towards 0. Let
\[ g_n(t, x) = \frac{f(t + \epsilon_n, x) - f(t, x)}{\epsilon_n} - \frac{\partial f}{\partial t}(t, x). \]
First we notice that $g_n \to 0$ and $g_n + \partial f/\partial t$ is measurable, so
$\partial f/\partial t$ is the limit of measurable functions and so measurable. By the mean value theorem
we have $|g_n| \le 2g$ for each $n$. Therefore, by dominated convergence we have
\[ \int_E g_n(t, x) \mu(dx) \to 0. \]
Proof. First we need to show that $\nu$ is indeed a measure. $f 1_\emptyset = 0$ so we have
$\nu(\emptyset) = 0$ as required. We will also have that $\nu(A) \ge 0$ since $f$ is non-negative. To
show countable additivity, we note that if $A$ and $B$ are disjoint then $1_{A \cup B} = 1_A + 1_B$, and
furthermore if $A_1, A_2, \dots$ is a sequence of disjoint sets then
$1_{\bigcup_n A_n} = \sum_n 1_{A_n}$. With this reformulation
$\nu(\bigcup_n A_n) = \mu(f 1_{\bigcup_n A_n}) = \mu(\sum_n f 1_{A_n})$. Using the Beppo-Levi
reformulation of monotone convergence we have
\[ \mu\Big(\sum_n f 1_{A_n}\Big) = \sum_n \mu(f 1_{A_n}) = \sum_n \nu(A_n), \]
which is our desired result.
Now we want to show that if $g \ge 0$ then $\nu(g) = \mu(fg)$. Let us begin with the case where
$g = 1_A$ for some measurable $A$; then $\nu(g) = \nu(A) = \mu(f 1_A) = \mu(fg)$ so the result follows by
definition. Then using the linearity of $\mu$ we can see that if $g$ is a simple function then
$\nu(g) = \mu(fg)$. Now suppose that $g$ is not necessarily simple. We can construct (in our standard
way) a sequence of simple functions $g_n$ which increase to $g$; then by monotone convergence we have
that $\nu(g) = \lim_n \nu(g_n) = \lim_n \mu(f g_n)$. Now $f g_n$ is a sequence of functions which
increases to $fg$, so using monotone convergence we have that $\lim_n \mu(f g_n) = \mu(fg)$ and hence
$\nu(g) = \mu(fg)$.
There are also a few facts about Riemann integration which work in pretty much exactly the same
way for Lebesgue integration. For example the fundamental theorem of calculus holds equally well
in this case. We will see in general that when something is Riemann integrable it is also Lebesgue
integrable which will prove all these in general.
Theorem 9.5 (Fundamental theorem of calculus). Suppose that $f : [a, b] \to \mathbb{R}$ is a continuous
function and set $F(t) = \int_a^t f(x) dx = \lambda(1_{[a,t]} f)$; then $F$ is differentiable with
$F'(t) = f(t)$. Furthermore, let $F : [a, b] \to \mathbb{R}$ have continuous derivative $f$; then
$F(b) - F(a) = \int_a^b f(x) dx$.
Proof. Given $\epsilon > 0$ there exists $\delta > 0$ such that $|x - t| \le \delta$ implies that
$|f(x) - f(t)| \le \epsilon$. Therefore if we take $|h| \le \delta$ then
\[ \Big|\frac{1}{h}(F(t + h) - F(t)) - f(t)\Big| = \frac{1}{|h|}\Big|\int_t^{t+h} (f(x) - f(t)) dx\Big| \le \frac{1}{|h|}\int_{t \wedge (t+h)}^{t \vee (t+h)} |f(x) - f(t)| dx. \]
Now we can use the fact that inside the integral $|x - t| \le \delta$ so we have
\[ \Big|\frac{1}{h}(F(t + h) - F(t)) - f(t)\Big| \le \frac{1}{|h|}\int_{t \wedge (t+h)}^{t \vee (t+h)} \epsilon \, dx = \epsilon. \]
Proof. First suppose that $g$ is the indicator function of an interval $(c, d]$; then we want to prove
that
\[ \int_{\phi(a)}^{\phi(b)} 1_{(c,d]}(y) dy = \int_a^b 1_{(c,d]}(\phi(x)) \phi'(x) dx. \]
Here the left hand side is equal to $\lambda([\phi(a), \phi(b)] \cap (c, d])$ and the right hand side is
\[ \int_{a \vee \phi^{-1}(c)}^{b \wedge \phi^{-1}(d)} \phi'(x) dx. \]
Using the fundamental theorem of calculus this is
\[ \phi(b \wedge \phi^{-1}(d)) - \phi(a \vee \phi^{-1}(c)) = \phi(b) \wedge d - \phi(a) \vee c = \lambda([\phi(a), \phi(b)] \cap (c, d]), \]
where here we used the fact that $\phi$ was increasing to commute it with min and max.
Now we have shown our proposition holds when $g$ is the indicator function of a half open interval.
By linearity of the integral it will hold when $g$ is the indicator function of a finite disjoint union
of half open intervals. Now let $\mathcal{D}$ be the set of all Borel sets $A$ such that $1_A$ satisfies
our proposition. As the name suggests we want to show that $\mathcal{D}$ is a d-system. If
$A \subseteq B$ and $A, B \in \mathcal{D}$ then $1_{B \setminus A} = 1_B - 1_A$ so the proposition will
hold for $B \setminus A$ by linearity of the integral. Suppose that
$A_1 \subseteq A_2 \subseteq A_3 \dots$ are in $\mathcal{D}$ with union $A$, and let $g_n = 1_{A_n}$;
then $g_n \uparrow 1_A = g$ and $g_n \circ \phi \uparrow g \circ \phi$, so if
\[ \int_{\phi(a)}^{\phi(b)} g_n(y) dy = \int_a^b g_n(\phi(x)) \phi'(x) dx \]
for each $n$, applying monotone convergence to each side gives the result for $g = 1_A$. This shows that
$\mathcal{D}$ is a d-system. Applying Dynkin's lemma then shows that for every
$A \in \mathcal{B}(\mathbb{R})$ we have that the proposition holds with $g = 1_A$.
Linearity of the integral allows us to extend this result to any simple function $g$. We can then use
monotone convergence in exactly the same way as for the last part to extend it to any non-negative
measurable $g$.
Then we have the upper sum and lower sum associated to the partition which are defined as
\[ l(f, p) = \sum_{k=1}^n m_k (a_k - a_{k-1}), \qquad u(f, p) = \sum_{k=1}^n M_k (a_k - a_{k-1}). \]
Definition 9.12. We call a function $f$ Riemann integrable on $[a, b]$ if
$\sup_p l(f, p) = \inf_q u(f, q)$.
Lemma 9.13. A function $f$ is Riemann integrable if and only if for every $\epsilon > 0$ there exists a
partition $p$ such that $u(f, p) - l(f, p) < \epsilon$.
Proof. First suppose that $f$ satisfies that for every $\epsilon$ there exists a $p$ such that
$u(f, p) - l(f, p) < \epsilon$; then
\[ \inf_q u(f, q) - \sup_q l(f, q) \le u(f, p) - l(f, p) < \epsilon. \]
so
\[ u(f, p) - l(f, p) < \epsilon. \]
Theorem 9.15. Let $[a, b]$ be an interval. Suppose that $f$ is a bounded function which is Riemann
integrable on $[a, b]$; then it is Lebesgue integrable and the Riemann integral agrees with the Lebesgue
integral.
Proof. As $f$ is bounded we only need to show that it is Lebesgue measurable in order for it to be
integrable. Using the lemma above there exists a nested sequence of partitions $p_n$ such that
$u(f, p_n) - l(f, p_n) < 1/n$ for each $n$. Let us define two sequences of functions $g_n$ and $h_n$. We
write $p_n = \{a_k^n\}_{k=0}^{N_n}$, and recall the definition of $m_k^n$ and $M_k^n$ associated to the
partition. Then we define
\[ g_n := \sum_{k=1}^{N_n} m_k^n 1_{[a_{k-1}^n, a_k^n)}, \qquad h_n := \sum_{k=1}^{N_n} M_k^n 1_{[a_{k-1}^n, a_k^n)}. \]
Here we can see that $g_n$ and $h_n$ are both sequences of simple functions. We also have that $g_n$ is a
monotonically increasing sequence and $h_n$ is a monotonically decreasing sequence. As $f$ is bounded so
are the sequences $g_n(x)$ and $h_n(x)$ for each $x$, so we define $g(x) = \lim_n g_n(x)$ and
$h(x) = \lim_n h_n(x)$; these are both bounded, Borel measurable functions. We also have that
$g_n(x) \le f(x) \le h_n(x) \le \sup_{[a,b]} f$ so consequently $g(x) \le f(x) \le h(x)$. We can see
that $\lambda(g_n) = l(f, p_n)$ and $\lambda(h_n) = u(f, p_n)$. We can use the constant
$\sup_{[a,b]} |f|$ as a dominating function, so we have by dominated convergence that
\[ \lambda(g) = \lim_n \lambda(g_n) = \lim_n l(f, p_n) = \lim_n u(f, p_n) = \lim_n \lambda(h_n) = \lambda(h). \]
We also have that $h - g \ge 0$ and $\lambda(h - g) = 0$ so we know that $h = g$ Lebesgue almost
everywhere, and as $0 \le h - f \le h - g$ we know that $f = h = g$ almost everywhere. Therefore, $f$ is
almost everywhere equal to a measurable function and it is bounded, so it is Lebesgue integrable.
We finish this section with some examples of functions which are Lebesgue integrable but are not
Riemann integrable. The most classic example of this is
Example 9.16. Let $f(x) = 1_{\mathbb{Q}}(x)$; then $f$ is Lebesgue integrable but not Riemann integrable
on $[0, 1]$ (or any other interval). In order to see that this function is not Riemann integrable, note
that both the rationals and the irrationals are dense in $[0, 1]$, so for any partition $p$ we have
$l(f, p) = 0$ and $u(f, p) = 1$. Therefore if we take a sequence of nested partitions $p_n$ then the
limits of $l(f, p_n)$ and $u(f, p_n)$ won't meet.
$\|\lambda v\| = |\lambda| \|v\|$
$\|v + u\| \le \|v\| + \|u\|$
We are interested in normed spaces of functions, where the norms come from integrating quantities.
Definition 10.2 ($L^p(E)$). Suppose that $(E, \mathcal{E}, \mu)$ is a measure space, and $p \ge 1$; then
we have the associated $L^p$ space, which is the space of measurable functions equipped with the norm
\[ \|f\|_p = (\mu(|f|^p))^{1/p}. \]
Remark 10.4. Strictly speaking the norms defined above are seminorms. This is because all these norms
will vanish for a function $f$ which is non-zero but is equal to zero almost everywhere. When working
in $L^p$ spaces we consider two functions the same if they are equal almost everywhere.
Strictly speaking we no longer consider functions $f$ when we work in $L^p$ spaces; we instead consider
equivalence classes of functions with the equivalence relation $f \sim g$ if $f = g$ almost everywhere.
When working in this setting we write $\mathcal{L}^p$ for the space of measurable functions equipped
with the p-seminorm and $L^p$ for the space of equivalence classes of functions equipped with the
p-norm. Most of the time we won't really think about an element of $L^p$ as an equivalence class and
hopefully it quickly becomes natural to think about functions as defined up to alteration on a null set.
Theorem 10.5. For $p \in [1, \infty]$ the space $L^p(E)$ is indeed a vector space.
Proof. We need to show $f \in L^p(E)$ implies that $\alpha f \in L^p(E)$ for $\alpha \in \mathbb{R}$ (or
$\mathbb{C}$) and $f, g \in L^p(E)$ implies that $f + g \in L^p(E)$; specifically we need to show that
$\|\alpha f\|_p < \infty$ and $\|f + g\|_p < \infty$.
For $p < \infty$, for the first point we can use the linearity of the integral to get
\[ \mu(|\alpha|^p |f|^p) = |\alpha|^p \mu(|f|^p) < \infty. \]
For $p = \infty$, for the first point it follows immediately from the definition that
$\|\alpha f\|_\infty = |\alpha| \|f\|_\infty < \infty$. For the second point, since the union of two null
sets is null we have that $f + g$ is equivalent to a function which is uniformly bounded. Therefore it
is clear that $\|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty$.
In order to progress further with normed spaces of functions we need to be able to prove the triangle
inequality for the p-norms. This inequality is called Minkowski's inequality. In the next section we
prove it as well as several other inequalities which are very useful when working with function spaces.
or equivalently
\[ \frac{t}{p} + \frac{1}{q} - t^{1/p} \ge 0. \]
We can differentiate this function in $t$ and get
\[ \frac{d}{dt}\Big(\frac{t}{p} + \frac{1}{q} - t^{1/p}\Big) = \frac{1}{p}(1 - t^{-1/q}). \]
So this function achieves a minimum when $t^{-1/q} = 1$, that is when $t = 1$, and it achieves the
minimum value 0. Therefore it is always non-negative and the inequality holds.
We also have the very simple corollary which is often useful (especially in Analysis of PDE)
Corollary 10.7. Suppose that $x, y$ are positive; then for every $\eta > 0$ we have
\[ xy \le \frac{x^p \eta^p}{p} + \frac{y^q}{\eta^q q}. \]
Proof. Just write $xy = (\eta x)(y/\eta)$.
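The inequality with the free parameter $\eta$ is easy to sanity-check numerically. The sketch below is an illustration only: the function name `young_gap` and the grid of test values are hypothetical choices, and a finite grid is of course not a proof.

```python
import itertools


def young_gap(x, y, p, eta=1.0):
    """(eta * x)^p / p + (y / eta)^q / q - x * y, with q the conjugate exponent of p."""
    q = p / (p - 1.0)   # so that 1/p + 1/q = 1
    return (eta * x) ** p / p + (y / eta) ** q / q - x * y


grid = [0.1, 0.5, 1.0, 2.0, 7.5]
gaps = [
    young_gap(x, y, p, eta)
    for x, y, p, eta in itertools.product(grid, grid, [1.5, 2.0, 4.0], grid)
]
print(min(gaps))   # non-negative up to rounding
```

The gap vanishes exactly when $(\eta x)^p = (y/\eta)^q$, which is why tuning $\eta$ is useful in applications.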
Using Young's inequality we can prove an inequality about functions.
Proposition 10.8 (Hölder's Inequality). Suppose that $(E, \mathcal{E}, \mu)$ is a measure space and
$f \in L^p(E)$, $g \in L^q(E)$ with $1/p + 1/q = 1$; then $fg \in L^1(E)$ and we have the following
inequality
\[ \|fg\|_1 \le \|f\|_p \|g\|_q. \]
Proof. First let us look at the case where $f \in L^1(E)$ and $g \in L^\infty(E)$. Without loss of
generality let $g$ be bounded everywhere by $\|g\|_\infty$; then we have
\[ |f(x) g(x)| \le |f(x)| \|g\|_\infty \]
and
\[ \frac{d^2}{d\eta^2}\Big(\frac{\eta^p}{p} \|f\|_p^p + \frac{1}{\eta^q q} \|g\|_q^q\Big) = (p - 1)\eta^{p-2} \|f\|_p^p + (q + 1)\eta^{-q-2} \|g\|_q^q > 0. \]
So the right hand side of the inequality is smallest when
\[ \eta^{p-1} \|f\|_p^p = \eta^{-q-1} \|g\|_q^q, \]
which is when
\[ \eta = \|g\|_q^{q/(p+q)} \|f\|_p^{-p/(p+q)}. \]
Substituting this value of $\eta$ in gives
\[ \|fg\|_1 \le \frac{1}{p} \|f\|_p^{p - p^2/(p+q)} \|g\|_q^{pq/(p+q)} + \frac{1}{q} \|f\|_p^{pq/(p+q)} \|g\|_q^{q - q^2/(p+q)} = (\|f\|_p \|g\|_q)^{pq/(p+q)}, \]
and
\[ pq/(p + q) = ((p + q)/pq)^{-1} = (1/q + 1/p)^{-1} = 1. \]
Second proof of Hölder's inequality. This is the more standard proof. Suppose first that
$\|f\|_p = 1$, $\|g\|_q = 1$; then using Young's inequality $|f(x) g(x)| \le |f(x)|^p/p + |g(x)|^q/q$.
So integrating this gives $\|fg\|_1 \le \|f\|_p^p/p + \|g\|_q^q/q = 1/p + 1/q = 1$. Then we have for
general $f, g$ that $\|f/\|f\|_p\|_p = 1$ and $\|g/\|g\|_q\|_q = 1$ so
\[ \|fg/(\|f\|_p \|g\|_q)\|_1 \le 1, \]
and multiplying through gives
\[ \|fg\|_1 \le \|f\|_p \|g\|_q. \]
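Hölder's inequality can be checked directly on a finite set with counting measure, where the integrals are finite sums. The vectors and the exponent below are arbitrary illustrative choices, and `p_norm` is a hypothetical helper name.

```python
def p_norm(values, p):
    """The p-norm of a function on a finite set with counting measure."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


f = [1.0, -2.0, 3.0, 0.5]
g = [0.3, 4.0, -1.0, 2.0]
p = 3.0
q = p / (p - 1.0)   # conjugate exponent, 1/p + 1/q = 1

lhs = sum(abs(a * b) for a, b in zip(f, g))   # ||fg||_1
rhs = p_norm(f, p) * p_norm(g, q)             # ||f||_p * ||g||_q
print(lhs, rhs)
```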
Remark 10.9 (Cauchy-Schwarz Inequality). The important case of this inequality when $p = q = 2$ is
generally known as the Cauchy-Schwarz inequality.
We also have Minkowski's inequality which as we discussed is necessary to make sure $L^p$ is a normed
space.
Proposition 10.10 (Minkowski's Inequality). Let $(E, \mathcal{E}, \mu)$ be a measure space and suppose
that $f, g$ are in $L^p$; then
\[ \|f + g\|_p \le \|f\|_p + \|g\|_p. \]
Proof. We have already shown this when $p = \infty$; the case where $p = 1$ is also straightforward. We
have
\[ |f(x) + g(x)| \le |f(x)| + |g(x)| \]
\[ |f + g|^{q(p-1)} = |f + g|^p. \]
We also have that $|f|, |g| \in L^p(E)$. Therefore we have
\[
\begin{aligned}
\|f + g\|_p^p &= \mu(|f + g|^p) = \mu(|f + g| |f + g|^{p-1}) \\
&\le \mu(|f| |f + g|^{p-1}) + \mu(|g| |f + g|^{p-1}) \\
&\le (\|f\|_p + \|g\|_p) \, \| |f + g|^{p-1} \|_q && \text{using Hölder's inequality} \\
&= (\|f\|_p + \|g\|_p) \, \|f + g\|_p^{p/q}.
\end{aligned}
\]
Now we move on to some more probabilistically focussed inequalities which do not directly relate to
$L^p$ spaces.
Proposition 10.11 (Markov's Inequality / Chebychev's inequality). Let $(E, \mathcal{E}, \mu)$ be a
measure space, $f$ a non-negative measurable function and $\lambda > 0$. Then we have
\[ \mu(\{x : f(x) > \lambda\}) \le \frac{1}{\lambda} \mu(f). \]
Proof. We have the following inequality
\[ \lambda 1_{\{f(x) > \lambda\}} \le f. \]
We then integrate this and use the monotonicity of the integral to get
\[ \lambda \mu(\{f(x) > \lambda\}) \le \mu(f). \]
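On a finite probability space Markov's inequality becomes a statement about averages, which makes it easy to check. The values and levels below are invented for illustration (each of the eight points is given mass 1/8), and `measure_above` is a hypothetical helper name.

```python
# Values of a non-negative function f on an 8-point space, each point with mass 1/8.
values = [0.2, 1.5, 3.0, 0.7, 9.0, 4.2, 0.1, 2.5]


def measure_above(level):
    """mu({x : f(x) > level}) for the uniform probability measure on the 8 points."""
    return sum(1 for v in values if v > level) / len(values)


mean = sum(values) / len(values)   # mu(f)
levels = [0.5, 1.0, 2.0, 4.0, 8.0]
for level in levels:
    print(level, measure_above(level), mean / level)
```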
Remark 10.12 (Tail estimates). One of the powerful consequences of Markov's inequality is that it allows
us to estimate how the function will behave at large values. For example suppose that
$f \in L^p(\mathbb{R})$; then we know that
\[ \lambda(\{x : |f(x)| > t\}) = \lambda(\{x : |f(x)|^p > t^p\}) \le t^{-p} \|f\|_p^p. \]
This is particularly relevant in probability where we are interested in estimating how often extreme
events happen and we get inequalities of the form
\[ \mathbb{P}(X > x) \le x^{-p} \mathbb{E}(X^p) \]
for a non-negative random variable $X$ and $x > 0$.
Remark 10.13 (Chernoff bounds). Another common use of Markov's inequality is when we know how
$\mu(\exp(\alpha f(x)))$ behaves as we vary $\alpha$. For example, in a probabilistic setting
$\mathbb{E}(e^{\alpha X})$ is the moment generating function, which is often known for distributions. We
can then use Markov's inequality, for $\alpha > 0$, via
\[ \mu(\{f(x) > t\}) = \mu(\{\exp(\alpha f(x)) > e^{\alpha t}\}) \le \mu(\exp(\alpha f)) e^{-\alpha t}. \]
Since the left hand side does not depend on $\alpha$ one can then optimise over $\alpha$, which will
often give a superior bound. An example of this is in the probabilistic setting: if $X$ is a normal
random variable on $\mathbb{R}$ with mean 0 and variance $\sigma^2$ then we have
\[ \mathbb{E}(e^{\alpha X}) = e^{\alpha^2 \sigma^2/2}. \]
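To see what optimising over $\alpha$ buys in the normal case, one can minimise $e^{\alpha^2\sigma^2/2 - \alpha t}$ in $\alpha$. The exponent is a quadratic in $\alpha$ with minimum at $\alpha = t/\sigma^2$, which gives the bound $e^{-t^2/(2\sigma^2)}$. The sketch below compares this with the exact tail; the function names and the particular values of $\sigma$ and $t$ are hypothetical choices for illustration.

```python
import math


def chernoff_bound(t, sigma, alpha):
    """Markov bound exp(alpha^2 sigma^2 / 2 - alpha t) for a N(0, sigma^2) variable."""
    return math.exp(alpha**2 * sigma**2 / 2 - alpha * t)


def normal_tail(t, sigma):
    """Exact tail P(X > t) for X ~ N(0, sigma^2), via the complementary error function."""
    return 0.5 * math.erfc(t / (sigma * math.sqrt(2)))


sigma, t = 1.5, 4.0
best_alpha = t / sigma**2                      # minimiser of the quadratic exponent
optimal_bound = chernoff_bound(t, sigma, best_alpha)
grid_best = min(chernoff_bound(t, sigma, a / 100) for a in range(1, 1000))

print(normal_tail(t, sigma), optimal_bound, grid_best)
```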
Lemma 10.15. Let $\phi : I \to \mathbb{R}$ be convex and let $m$ be a point in the interior of $I$; then
there exist $a, b$ such that $ax + b \le \phi(x)$ for every $x \in I$ and $am + b = \phi(m)$.
Proof. Take $x < m < y$; then by convexity
\[ \phi(m) \le \frac{y - m}{y - x} \phi(x) + \frac{m - x}{y - x} \phi(y). \]
We can rearrange this to
\[ (y - m + m - x)\phi(m) \le (y - m)\phi(x) + (m - x)\phi(y), \]
then to
\[ (y - m)(\phi(m) - \phi(x)) \le (m - x)(\phi(y) - \phi(m)), \]
then to
\[ \frac{\phi(m) - \phi(x)}{m - x} \le \frac{\phi(y) - \phi(m)}{y - m}. \]
This is true for any $x, y$ surrounding $m$ so there exists $a$ such that
\[ \frac{\phi(m) - \phi(x)}{m - x} \le a \le \frac{\phi(y) - \phi(m)}{y - m}, \]
for every such $x, y$. From this we get that $\phi(x) \ge a(x - m) + \phi(m)$.
Proposition 10.16 (Jensen's inequality). Suppose that $(E, \mathcal{E}, \mu)$ is a measure space with
$\mu(E) = 1$, let $\phi$ be a convex function from $\mathbb{R}$ to $\mathbb{R}$ and let $f$ be an
integrable function; then $\mu(\phi(f))$ is well defined and
\[ \mu(\phi(f)) \ge \phi(\mu(f)). \]
Remark 10.17. This is another inequality where I sometimes have trouble remembering which way the
inequality sign goes. My key example to check on is
\[ \frac{1}{4} = \Big(\int_0^1 x \, dx\Big)^2 \le \int_0^1 x^2 \, dx = \frac{1}{3}. \]
Proof. As $\mu(E) = 1$ we can consider $\mu(f)$ as the average value that $f$ takes over $E$. Using our
lemma we have that there exist $a, b$ such that
\[ ax + b \le \phi(x), \]
and
\[ a\mu(f) + b = \phi(\mu(f)). \]
By the monotonicity of the integral
\[ \mu(af + b) \le \mu(\phi(f)) \]
and the left hand side is $a\mu(f) + b\mu(E) = a\mu(f) + b$ by linearity, which by construction is equal
to $\phi(\mu(f))$.
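The key example $\phi(x) = x^2$ with $f(x) = x$ on $[0, 1]$ can be reproduced with a discrete probability measure. The sketch below is an illustration under an assumption not in the notes: Lebesgue measure on $[0, 1]$ is replaced by equal point masses at the midpoints of a uniform grid.

```python
cells = 10_000
# Midpoints of a uniform grid on [0, 1]; each point carries mass 1 / cells.
points = [(i + 0.5) / cells for i in range(cells)]

mean = sum(points) / cells                         # mu(f) for f(x) = x
mean_of_squares = sum(x * x for x in points) / cells   # mu(phi(f)) for phi(x) = x^2

print(mean**2, mean_of_squares)   # approximately 1/4 and 1/3
```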
10.2 Back to Lp spaces
Now we are armed with our inequalities, we want to discuss some properties of $L^p$ spaces. First let us
define convergence in $L^p$.
Definition 10.18. We say a sequence of functions $f_n$ converges to another function $f$ in $L^p$ if
$\|f_n - f\|_p \to 0$ as $n \to \infty$.
Theorem 10.19 ($L^p(E)$ is complete). This is for the case $p < \infty$. Suppose that $f_n$ is a
sequence in $L^p$ with $\|f_n - f_m\|_p \to 0$ as $n, m \to \infty$; then there exists an $f$ in $L^p$
such that $\|f_n - f\|_p \to 0$ as $n \to \infty$.
Proof. Let $n_1 = 1$ and then we can find $n_k$ recursively such that
$\|f_{n_k} - f_{n_{k-1}}\|_p \le 2^{-k}$. Then we have that
\[ \sum_k \|f_{n_k} - f_{n_{k-1}}\|_p \le 1. \]
Choose $K$ arbitrary; then by Minkowski's inequality we have
\[ \Big\| \sum_{k=1}^K |f_{n_k} - f_{n_{k-1}}| \Big\|_p \le \sum_{k=1}^K \|f_{n_k} - f_{n_{k-1}}\|_p \le 1. \]
Therefore,
\[ \sum_{k=1}^\infty |f_{n_k}(x) - f_{n_{k-1}}(x)| < \infty \]
almost everywhere. This implies that $f_{n_k}(x)$ is a Cauchy sequence for almost every $x$. Since we
know that $\mathbb{R}$ is complete, there exists a set $E'$ with $\mu(E \setminus E') = 0$ such that for
every $x \in E'$ the sequence $f_{n_k}(x)$ converges. Define
\[ f(x) = \begin{cases} \lim_k f_{n_k}(x) & x \in E' \\ 0 & x \notin E' \end{cases} \]
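Now we have a candidate for our limit, we want to show $f_n \to f$ in $L^p(E)$. Given $\epsilon > 0$
there exists $N$ such that if $n, m \ge N$ then $\|f_n - f_m\|_p \le \epsilon$. Therefore, for $k$
sufficiently large and $n \ge N$ we have $\|f_n - f_{n_k}\|_p \le \epsilon$. Now, using Fatou's lemma
\[ \mu(|f_n - f|^p) = \mu\big(\liminf_k |f_n - f_{n_k}|^p\big) \le \liminf_k \mu(|f_n - f_{n_k}|^p) \le \epsilon^p. \]
Therefore $\|f_n - f\|_p \to 0$.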
Proposition 10.20. Linear combinations of simple functions, step functions (functions of the form
$\phi(x) = \sum_{k=1}^n a_k 1_{(c_k, d_k]}$), and continuous functions are all dense in the space
$L^p(\mathbb{R})$, $p \in [1, \infty)$; that is to say, for any $\epsilon > 0$ and any $f$ in
$L^p(\mathbb{R})$ there is a function $g$ which is a simple function/step function/continuous function
such that $\|f - g\|_p \le \epsilon$.
Proof. The proof for simple functions and step functions is in the fourth assignment. In order to
show that it works for continuous functions we notice that the result is true for step functions, so for
any $f \in L^p(\mathbb{R})$ and any $\epsilon > 0$ there exists a step function $\phi$ such that
$\|f - \phi\|_p \le \epsilon/2$. If we can find a continuous function $g$ such that
$\|\phi - g\|_p \le \epsilon/2$ then by Minkowski's inequality
$\|f - g\|_p \le \|f - \phi\|_p + \|\phi - g\|_p \le \epsilon/2 + \epsilon/2$.
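Now if we look at the indicator function $1_{(c,d]}(x)$ then let us take
\[ g_{\epsilon,c,d}(x) = \begin{cases} 0 & x \notin (c - \epsilon, d + \epsilon) \\ (x - c + \epsilon)/\epsilon & x \in [c - \epsilon, c) \\ 1 & x \in [c, d) \\ -(x - d - \epsilon)/\epsilon & x \in [d, d + \epsilon) \end{cases} \]
The function $g_{\epsilon,c,d}$ differs from $1_{(c,d]}$ only on a set of measure at most $2\epsilon$,
where the difference is at most 1 in absolute value, so
$\|g_{\epsilon,c,d} - 1_{(c,d]}\|_p \le (2\epsilon)^{1/p}$. Now let
$\phi(x) = \sum_{k=1}^n a_k 1_{(c_k, d_k]}$ with every $a_k \neq 0$, and let
$g = \sum_{k=1}^n a_k g_{\epsilon_k, c_k, d_k}$ with $\epsilon_k = \frac{1}{2}(\epsilon/(|a_k| n))^p$;
then
\[ \|\phi - g\|_p \le \sum_{k=1}^n \|a_k (1_{(c_k, d_k]} - g_{\epsilon_k, c_k, d_k})\|_p \le \sum_{k=1}^n |a_k| \epsilon/(|a_k| n) = \epsilon. \]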
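The size of the error made by the piecewise linear approximation can be computed numerically. The sketch below is an illustration only: `ramp` and `lp_error` are hypothetical names, and the integral is a midpoint rule. A direct calculation on the two ramps gives $\|g_{\epsilon,c,d} - 1_{(c,d]}\|_p = (2\epsilon/(p+1))^{1/p}$, which the numerical value should match and which tends to 0 with $\epsilon$.

```python
def ramp(x, c, d, eps):
    """Continuous piecewise linear approximation of the indicator of (c, d]."""
    if x < c - eps or x >= d + eps:
        return 0.0
    if x < c:
        return (x - c + eps) / eps      # rising ramp on [c - eps, c)
    if x < d:
        return 1.0
    return -(x - d - eps) / eps         # falling ramp on [d, d + eps)


def lp_error(c, d, eps, p, cells=100_000):
    """Midpoint-rule value of the L^p norm of ramp - indicator of (c, d]."""
    a, b = c - eps, d + eps
    width = (b - a) / cells
    total = 0.0
    for i in range(cells):
        x = a + (i + 0.5) * width
        indicator = 1.0 if c < x <= d else 0.0
        total += abs(ramp(x, c, d, eps) - indicator) ** p * width
    return total ** (1.0 / p)


numeric = lp_error(0.0, 1.0, 0.1, 2.0)
closed_form = (2 * 0.1 / (2.0 + 1.0)) ** (1.0 / 2.0)
print(numeric, closed_form)
```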
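\[ \pi_E(x, y) = x, \qquad \pi_F(x, y) = y. \]
Lemma 11.5. The maps $\pi_E$ and $\pi_F$ are both measurable. Furthermore if
$C \in \mathcal{E} \times \mathcal{F}$ then the following sets are measurable
\[ C_y = \{x \in E : (x, y) \in C\} = \pi_E\big(\pi_F^{-1}(\{y\}) \cap C\big) \in \mathcal{E}, \]
\[ C_x = \{y \in F : (x, y) \in C\} = \pi_F\big(\pi_E^{-1}(\{x\}) \cap C\big) \in \mathcal{F}. \]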
Theorem 11.6 (Product Measure). Given two σ-finite measure spaces $(E, \mathcal{E}, \mu)$ and
$(F, \mathcal{F}, \nu)$ there exists a unique measure, $\mu \times \nu$, on
$\mathcal{E} \times \mathcal{F}$ such that $(\mu \times \nu)(A \times B) = \mu(A)\nu(B)$ when
$A \in \mathcal{E}$ and $B \in \mathcal{F}$. Furthermore
\[ (\mu \times \nu)(C) = \int_E \nu(C_x) \mu(dx) = \int_F \mu(C_y) \nu(dy). \]
Proof. Let us begin in the case where both measure spaces are finite. As
$\mathcal{A} = \{A \times B : A \in \mathcal{E}, B \in \mathcal{F}\}$ is a π-system generating
$\mathcal{E} \times \mathcal{F}$ we can use Carathéodory's extension theorem to prove the first part of
this theorem. However we will work directly, as defining this measure is straightforward and useful for
understanding it.
First we check that $x \mapsto \nu(C_x)$ and $y \mapsto \mu(C_y)$ are both measurable functions so the
integrals are well defined. Let us begin in the case that $\nu$ is a finite measure. Let $\mathcal{C}$ be
the collection of sets for which the function $x \mapsto \nu(C_x)$ is $\mathcal{E}$ measurable. If
$C = A \times B$ then $\nu(C_x) = \nu(B) 1_{x \in A}$ which is measurable. Now we want to show that
$\mathcal{C}$ is a σ-algebra. If $C^1 \subset C^2$ then
$\nu((C^2 \setminus C^1)_x) = \nu(C^2_x) - \nu(C^1_x)$ so
Now we know that $(\mu \times \nu)_1$ and $(\mu \times \nu)_2$ agree on a π-system generating
$\mathcal{E} \times \mathcal{F}$ so Dynkin's uniqueness of extension lemma says that they agree on all
of $\mathcal{E} \times \mathcal{F}$.
Now we need to extend to the σ-finite case. There are sequences $E_n$ and $F_n$ of sets such that
$\mu(E_n) < \infty$, $\nu(F_n) < \infty$ for every $n$ and $E = \bigcup_n E_n$, $F = \bigcup_n F_n$.
Then we know that $x \mapsto \nu((C \cap (E_n \times F_n))_x)$ is a measurable function of $x$ for
every $n$, so letting $n$ tend to infinity we have
$\nu(C_x) = \lim_n \nu((C \cap (E_n \times F_n))_x)$, so $x \mapsto \nu(C_x)$ is the limit of measurable
functions and so measurable. Therefore in the σ-finite case we can still define our two candidate
measures $(\mu \times \nu)_1$ and $(\mu \times \nu)_2$ and we have that
\[ (\mu \times \nu)_1(C) = \lim_n (\mu \times \nu)_1(C \cap (E_n \times F_n)) = \lim_n (\mu \times \nu)_2(C \cap (E_n \times F_n)) = (\mu \times \nu)_2(C). \]
So the two measures are equal.
Now let $(\mu \times \nu)_3$ be any other candidate measure on $\mathcal{E} \times \mathcal{F}$ such that
$(\mu \times \nu)_3(A \times B) = \mu(A)\nu(B)$. Dynkin's uniqueness of extension theorem tells us that it
must be equal to $(\mu \times \nu)$ when restricted to $E_n \times F_n$ for any $n$. We can then repeat
exactly the same argument as above to extend it to any set in $\mathcal{E} \times \mathcal{F}$.
One of the key tools we get when using product measure is Fubini's theorem. There are two
theorems, one for positive functions and one for integrable functions. The naming gets a bit woolly, but
often the theorem for positive functions is called Tonelli's theorem and that for integrable functions
is called Fubini's theorem. Sometimes the latter is called the Fubini-Tonelli theorem and sometimes both
are called Fubini-Tonelli or Fubini. To play it safe I'm going to call both the Fubini-Tonelli Theorem.
Theorem 11.7 (Fubini-Tonelli theorem for positive functions). Suppose that (E, E, µ) and (F, F, ν)
are σ-finite measure spaces and f is a non-negative E × F measurable function. Then the functions
x ↦ ∫_F f(x, y)ν(dy) and y ↦ ∫_E f(x, y)µ(dx) are both measurable and
\[
(\mu \times \nu)(f) = \int_E \left( \int_F f(x,y)\,\nu(dy) \right) \mu(dx) = \int_F \left( \int_E f(x,y)\,\mu(dx) \right) \nu(dy).
\]
Proof. We build up the proof gradually, beginning with the case where f is the indicator function of a
set C ∈ E × F . In this case the measurability of the integrals in x or y and the form for (µ × ν)(f ) are
given by the construction of the product measure in the previous theorem.
The linearity of the integral then implies that the Fubini-Tonelli theorem holds whenever f is a
non-negative simple function. We can also see that x ↦ ∫_F f(x, y)ν(dy) will be measurable, as the
previous lemma shows that x ↦ ∫_F 1_{C_x}(y)ν(dy) is measurable and this is a finite sum of functions of
that form. We then note that any non-negative measurable function f can be approximated from below by
non-negative simple functions. Let f_n be an increasing sequence of simple functions approximating f. Then
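\[
f_n = \sum_{k=1}^{N_n} c_k^n 1_{C_k^n},
\]
and the result for simple functions gives
\[
(\mu \times \nu)(f_n) = \int_E \left( \sum_{k=1}^{N_n} c_k^n \int_F 1_{(C_k^n)_x}(y)\,\nu(dy) \right) \mu(dx).
\]
By monotone convergence as n → ∞ the left hand side converges to (µ × ν)(f). We can also see that
by monotone convergence
\[
\sum_{k=1}^{N_n} c_k^n \int_F 1_{(C_k^n)_x}(y)\,\nu(dy) \uparrow \int_F f(x,y)\,\nu(dy).
\]
We note that this shows that x ↦ ∫_F f(x, y)ν(dy) is the limit of measurable functions. Consequently, we
use monotone convergence again to get that the right hand side converges to
\[
\int_E \left( \int_F f(x,y)\,\nu(dy) \right) \mu(dx).
\]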
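As a sanity check on the statement, here is a minimal Python sketch on a finite product space, where every integral reduces to a weighted sum. The point masses mu, nu and the function f are illustrative assumptions and do not come from the notes.

```python
from fractions import Fraction

# Hypothetical point masses mu({x}) and nu({y}) on two finite sets.
mu = {"a": Fraction(1, 2), "b": Fraction(3)}
nu = {0: Fraction(2), 1: Fraction(1, 3), 2: Fraction(5)}


def f(x, y):
    """An arbitrary non-negative function on the product space."""
    return Fraction(y * y + 1) if x == "a" else Fraction(2 * y, 7)


# Integral against the product measure, whose point masses are mu({x}) * nu({y}).
product_integral = sum(f(x, y) * mu[x] * nu[y] for x in mu for y in nu)

# Integrate over y first, then over x.
y_then_x = sum(sum(f(x, y) * nu[y] for y in nu) * mu[x] for x in mu)

# Integrate over x first, then over y.
x_then_y = sum(sum(f(x, y) * mu[x] for x in mu) * nu[y] for y in nu)

print(product_integral, y_then_x, x_then_y)
```

Exact rational arithmetic is used so that the three quantities can be compared with ==. In this finite setting the equality is just the distributive law; the content of the theorem is that it survives the passage to limits.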
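and
\[
h(y) = \begin{cases} \int_E f(x,y)\,\mu(dx) & \text{if } \int_E |f(x,y)|\,\mu(dx) < \infty, \\ 0 & \text{if } \int_E |f(x,y)|\,\mu(dx) = \infty \end{cases}
\]
are both measurable and integrable. Furthermore,
\[
(\mu \times \nu)(f) = \int_E \left( \int_F f(x,y)\,\nu(dy) \right) \mu(dx) = \int_F \left( \int_E f(x,y)\,\mu(dx) \right) \nu(dy).
\]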
Proof. Now we turn to the case where f is not necessarily non-negative but is (µ × ν)-integrable. By
our result for non-negative functions we know that
\[
(\mu \times \nu)(|f|) = \int_E \left( \int_F |f(x,y)|\,\nu(dy) \right) \mu(dx),
\]
which proves that the function x ↦ ∫_F |f(x, y)|ν(dy) is µ-integrable, and is consequently finite almost
everywhere; therefore restricting the functions g, h to where they would be finite is not a problem. Let
A be the set on which x ↦ ∫_F |f(x, y)|ν(dy) is finite. Now we write f = f_+ − f_− in our usual way.
Then by definition
\[
\int_F f(x,y)\,\nu(dy)\, 1_{x \in A} = \left( \int_F f_+(x,y)\,\nu(dy) - \int_F f_-(x,y)\,\nu(dy) \right) 1_{x \in A}.
\]
Then using the fact that µ(A^c) = 0, and our result for non-negative functions we have
\[
\begin{aligned}
(\mu \times \nu)(f) &= (\mu \times \nu)(f_+) - (\mu \times \nu)(f_-) = \int_E \int_F f_+(x,y)\,\nu(dy)\,\mu(dx) - \int_E \int_F f_-(x,y)\,\nu(dy)\,\mu(dx) \\
&= \int_E \left( \int_F f_+(x,y)\,\nu(dy) - \int_F f_-(x,y)\,\nu(dy) \right) 1_{x \in A}\,\mu(dx) \\
&= \int_E \int_F f(x,y)\,\nu(dy)\, 1_A\,\mu(dx) \\
&= \int_E \int_F f(x,y)\,\nu(dy)\,\mu(dx).
\end{aligned}
\]
is measurable with respect to the product σ-algebra and its measure is the area under the graph of f.
We have that
\[
(\mu \times \lambda)(A) = \mu(\lambda(A_x)) = \mu(f),
\]
and
\[
(\mu \times \lambda)(A) = \lambda(\mu(A_y)) = \lambda\big(\mu(\{x : f(x) \ge y\})\big) = \int_0^\infty \mu(\{x : f(x) \ge y\})\,dy.
\]
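The identity µ(f) = ∫_0^∞ µ({x : f(x) ≥ y}) dy can be checked by hand when µ is a finite sum of point masses, since y ↦ µ({f ≥ y}) is then a step function. The following Python sketch does this; the weights mu and the values of f are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical point masses and a non-negative function on a three-point space.
mu = {"a": Fraction(1, 2), "b": Fraction(2), "c": Fraction(1, 3)}
f = {"a": Fraction(3), "b": Fraction(1), "c": Fraction(3, 2)}


def integral(mu, f):
    """mu(f) computed directly as a weighted sum."""
    return sum(f[x] * mu[x] for x in mu)


def layer_cake(mu, f):
    """Integral over y in (0, infinity) of mu({x : f(x) >= y})."""
    total = Fraction(0)
    previous = Fraction(0)
    for level in sorted(set(f.values())):
        # For y in (previous, level] the set {f >= y} equals {f >= level}.
        mass = sum(mu[x] for x in mu if f[x] >= level)
        total += (level - previous) * mass
        previous = level
    return total


print(integral(mu, f), layer_cake(mu, f))
```

Both routes give the same number, which is the area under the graph of f measured by slicing vertically or horizontally.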
Example 11.10 (Convolutions). Suppose that both f and g are in L^1(R). Then for almost every x the
function t ↦ f(x − t)g(t) is also in L^1(R). We have that the function f ∗ g defined by
\[
(f * g)(x) = \begin{cases} \int_{\mathbb{R}} f(x-t)g(t)\,dt & \text{if } t \mapsto f(x-t)g(t) \text{ is Lebesgue integrable}, \\ 0 & \text{otherwise} \end{cases}
\]
is in L^1 and satisfies ‖f ∗ g‖_1 ≤ ‖f‖_1 ‖g‖_1.
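To get a feel for the inequality, here is a Python sketch of its discrete analogue, where R with Lebesgue measure is replaced by Z with counting measure (an assumption made purely for illustration). The sequences f and g are hypothetical finitely supported examples.

```python
def convolve(f, g):
    """Discrete convolution (f * g)[n] = sum over t of f[n - t] * g[t]."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def l1_norm(seq):
    """The l^1 norm with respect to counting measure."""
    return sum(abs(v) for v in seq)


# Signs differ, so cancellation makes the inequality strict here.
f = [1, -2, 3]
g = [2, 1, -1]
h = convolve(f, g)

# For non-negative sequences there is no cancellation and equality holds.
h_pos = convolve([1, 2], [3, 4])

print(l1_norm(h), l1_norm(f) * l1_norm(g))
```

The non-negative case is exactly the Tonelli computation: the double sum of |f(n − t) g(t)| factorises as the product of the two norms.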
We can prove this using Fubini-Tonelli. First we want to check that t ↦ f(x − t)g(t) is measurable.
Write h(t) = x − t; this is a continuous function (for fixed x) and t ↦ f(x − t) = f(h(t)), so it is the
composition of two measurable functions and hence measurable. We also know that the product of two
measurable functions is measurable, so f(x − t)g(t) is a measurable function of t. Now we want to check
that it is integrable:
\[
\int_{\mathbb{R}} \left| \int_{\mathbb{R}} f(x-t)g(t)\,dt \right| dx \le \int_{\mathbb{R}} \int_{\mathbb{R}} |f(x-t)g(t)|\,dt\,dx
\]
We can also show that convolutions of functions are continuous functions using the tools from
measure theory. For this we need to show that shifts are continuous in L1 .
Lemma 11.11. Let p ∈ [1, ∞) and define the map T_τ : L^p(R) → L^p(R) by (T_τ f)(x) = f(x + τ). Then
\[
\lim_{\tau \to 0} \|T_\tau f - f\|_p = 0.
\]
Proof. We want to show that for any ε > 0 there exists τ_* > 0 such that if |τ| ≤ τ_* then
‖T_τ f − f‖_p ≤ ε. First let us show the result for step functions, that is to say functions of the form
\[
\varphi(x) = \sum_{k=1}^{n} a_k 1_{[c_k, d_k)}(x).
\]
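For a single indicator f = 1_{[c,d)} the norm can be computed in closed form: |T_τ f − f| is the indicator of the symmetric difference of [c − τ, d − τ) and [c, d), which has Lebesgue measure 2 min(|τ|, d − c). The Python sketch below tabulates this; the interval [0, 1), the exponent p = 2 and the helper name shift_distance are illustrative choices.

```python
def shift_distance(c, d, tau, p):
    """Return ||T_tau f - f||_p for f = 1_[c, d), where (T_tau f)(x) = f(x + tau).

    |T_tau f - f| is the indicator of a set of Lebesgue measure
    2 * min(|tau|, d - c), so its p-th power integrates to that measure.
    """
    sym_diff = 2.0 * min(abs(tau), d - c)
    return sym_diff ** (1.0 / p)


# Shifts tau = 1/2, 1/4, ..., 2^-10 applied to the indicator of [0, 1).
taus = [2.0 ** -k for k in range(1, 11)]
distances = [shift_distance(0.0, 1.0, tau, p=2.0) for tau in taus]

for tau, dist in zip(taus, distances):
    print(tau, dist)
```

The distance behaves like (2|τ|)^{1/p}, so it tends to 0 for every finite p, but more slowly as p grows.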
Now we go back to convolutions. We can show that if f ∈ L^p(R) and g ∈ L^q(R) then f ∗ g is
continuous.
\[
|f * g(y) - f * g(x)| = \left| \int_{\mathbb{R}} (f(x-t) - f(y-t))g(t)\,dt \right| \le \int_{\mathbb{R}} |f(x-t) - f(y-t)||g(t)|\,dt
\]
12 Radon-Nikodym Theorem - Week 10 starts here
12.1 Signed measures
We introduce the notion of signed measures which will be useful in the proof of the Radon-Nikodym
theorem.
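Definition 12.1 (Finite signed measure). A function µ from a σ-algebra E to R is a finite signed
measure if µ(∅) = 0 and µ is countably additive, that is, µ(⋃_n A_n) = ∑_n µ(A_n) for every sequence
A_1, A_2, . . . of disjoint sets in E.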
Example 12.2. If (E, E, µ) is a measure space and f ∈ L^1(E) then ν defined by ν(A) = µ(f 1_A) is a
finite signed measure.
We want to show two decomposition theorems which basically allow us to reduce the situation back
to measures. First we need some more definitions and a useful lemma.
Definition 12.3. If (E, E) is a measurable space and ν is a finite signed measure then we call A ∈ E a
positive set if ν(B) ≥ 0 for every B ∈ E with B ⊆ A. The negative sets are defined analogously.
Lemma 12.4. Suppose that ν is a finite signed measure on (E, E) and suppose A ∈ E with ν(A) < 0.
Then there exists a negative set B with B ⊆ A and ν(B) ≤ ν(A).
Proof. We will produce this set B by an iterative process. Define
\[
\delta_1 = \sup\{\nu(C) : C \subseteq A\},
\]
then since ∅ ⊆ A we have that δ_1 ≥ 0. If δ_1 = 0 then A itself is a negative set so we are done. If not
we can find a set C_1 ⊆ A with
\[
\nu(C_1) \ge \min\{\delta_1/2, 1\}.
\]
(We take the minimum here because we don't know that δ_1 is finite.) Now we will define a sequence of
δ_n and C_n by setting
\[
\delta_n = \sup\Big\{\nu(C) : C \subseteq A \setminus \bigcup_{i=1}^{n-1} C_i\Big\}
\]
as ν(C_∞) ≥ 0 by construction. As ν is a finite signed measure we must have ν(C_∞) < ∞, and as the C_n
are constructed to be disjoint this means we must have lim_n ν(C_n) = 0. Therefore lim_n δ_n = 0. If D ⊆ B
then we must have that ν(D) ≤ δ_n for every n, therefore ν(D) ≤ 0.
Now we are able to state and prove our two decomposition theorems.
Theorem 12.5 (Hahn Decomposition theorem). Let (E, E) be a measurable space and ν a finite signed
measure. Then there exist a positive set P and a negative set N for ν such that E = P ∪ N and P ∩ N = ∅.
Proof. Let L = inf{ν(A) : A is a negative set for ν}; then L is finite as otherwise we could construct a
set with measure −∞. Let A_n be a negative set with ν(A_n) ≤ L + 1/n and let N = ⋃_n A_n.
We can check that N is a negative set and that ν(N) = L. If A ⊆ N then let B_n = A_n \ ⋃_{k=1}^{n−1} A_k;
then A = ⋃_n (A ∩ B_n) is a disjoint union and A ∩ B_n ⊆ A_n, so ν(A ∩ B_n) ≤ 0 and
ν(A) = ∑_n ν(A ∩ B_n) ≤ 0. Now since N is a negative set ν(N \ A_n) ≤ 0, therefore
ν(N) = ν(N \ A_n) + ν(A_n) ≤ ν(A_n) ≤ L + 1/n. This is true for any n so ν(N) ≤ L, and since L is defined
to be the infimum of ν(A) over all negative sets A, we have ν(N) ≥ L; therefore ν(N) = L.
Let P = N^c. We want to check that P is a positive set. Suppose there exists a set A ⊆ P with
ν(A) < 0; then by our lemma there exists a negative set B ⊆ A with ν(B) ≤ ν(A) < 0. Then N ∪ B
is a negative set and N and B are disjoint, so ν(N ∪ B) = ν(N) + ν(B) < ν(N), which contradicts the
fact that ν(N) = L = inf{ν(A) : A is a negative set for ν}, so we are done.
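On a finite set the construction in the proof can be carried out by brute force. The Python sketch below enumerates all negative sets of a signed measure given by point weights, picks one attaining the infimum L, and takes P to be its complement. The weights and the helper names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite signed measure on E = {0, ..., 4}, given by point weights.
weights = {0: Fraction(2), 1: Fraction(-3), 2: Fraction(1, 2),
           3: Fraction(-1, 4), 4: Fraction(0)}
E = set(weights)


def nu(A):
    """The signed measure of a subset A of E."""
    return sum((weights[x] for x in A), Fraction(0))


def subsets(S):
    """All subsets of the finite set S."""
    items = sorted(S)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield set(combo)


def is_negative_set(A):
    """A is negative if every subset of A has non-positive measure."""
    return all(nu(B) <= 0 for B in subsets(A))


negative_sets = [A for A in subsets(E) if is_negative_set(A)]
L = min(nu(A) for A in negative_sets)   # the infimum from the proof
N = min(negative_sets, key=nu)          # a negative set with nu(N) = L
P = E - N                               # its complement

print(L, sorted(N), sorted(P))
```

The point of weight zero can be placed on either side, which reflects the fact that a Hahn decomposition is only unique up to sets that are null for ν.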
Theorem 12.6 (Jordan decomposition theorem). Every finite signed measure is the difference of two
positive measures. Precisely, if (E, E) is a measurable space and ν is a finite signed measure then there
exist measures ν_+ and ν_− such that for every A ∈ E we have ν(A) = ν_+(A) − ν_−(A).
Proof. Take some Hahn decomposition (P, N) and let ν_+(A) = ν(A ∩ P); as A ∩ P ⊆ P we have
ν(A ∩ P) ≥ 0. Similarly let ν_−(A) = −ν(A ∩ N). By additivity of ν we have that ν(A) = ν_+(A) − ν_−(A).
Countable additivity of ν_+ and ν_− follows immediately from countable additivity of ν.
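The definitions of ν_+ and ν_− translate directly into code when ν is given by point weights on a finite set. In this Python sketch the weights are hypothetical, and P is taken to be the set of points of non-negative weight, which is one valid Hahn decomposition in this setting.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical point weights of a finite signed measure on E = {0, 1, 2, 3}.
weights = {0: Fraction(2), 1: Fraction(-3), 2: Fraction(1, 2), 3: Fraction(-1, 4)}
E = set(weights)
P = {x for x in E if weights[x] >= 0}   # a positive set
N = E - P                               # its complement, a negative set


def nu(A):
    return sum((weights[x] for x in A), Fraction(0))


def nu_plus(A):
    """nu_+(A) = nu(A intersect P)."""
    return nu(A & P)


def nu_minus(A):
    """nu_-(A) = -nu(A intersect N)."""
    return -nu(A & N)


all_subsets = [set(c) for r in range(len(E) + 1)
               for c in combinations(sorted(E), r)]

print(nu_plus(E), nu_minus(E), nu(E))
```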
Now we notice that if B ⊆ A then
ν(B) = ν+ (B) − ν− (B) ≤ ν+ (B) ≤ ν+ (A)
We also have that ν(⋃_{m≥n} A_m) ≥ ν(A_n) ≥ ε and
\[
\nu\Big( \bigcap_n \bigcup_{m \ge n} A_m \Big) = \lim_n \nu\Big( \bigcup_{m \ge n} A_m \Big) \ge \varepsilon.
\]
This gives us a set B with µ(B) = 0 but ν(B) > 0, which contradicts ν ≪ µ.
Now we can prove the main theorem for this section.
Theorem 12.9 (Radon-Nikodym Theorem). Let (E, E) be a measurable space and let µ, ν be two finite
measures with ν ≪ µ. Then there exists a measurable function g : E → [0, ∞) such that ν(A) = µ(g 1_A)
for every A ∈ E. The function g is unique up to identifying almost everywhere equal functions. We write
g = dν/dµ and call it the Radon-Nikodym derivative of ν with respect to µ.
Proof. Let us define F to be the set of all non-negative measurable functions f with µ(f 1_A) ≤ ν(A) for
every A ∈ E. The idea is that F contains a function g which achieves µ(g) = sup_{f∈F} µ(f).
As a first step we show that f_1 ∨ f_2 = max{f_1, f_2} ∈ F when f_1, f_2 ∈ F. Let us take any A ∈ E
and let A_1 = A ∩ {f_1 ≥ f_2} and A_2 = A ∩ {f_1 < f_2}. Then
\[
\mu((f_1 \vee f_2) 1_A) = \mu((f_1 \vee f_2) 1_{A_1}) + \mu((f_1 \vee f_2) 1_{A_2}) = \mu(f_1 1_{A_1}) + \mu(f_2 1_{A_2}) \le \nu(A_1) + \nu(A_2) = \nu(A).
\]
Therefore f_1 ∨ f_2 ∈ F.
Now take a sequence f_n in F such that µ(f_n) ≥ sup_{f∈F} µ(f) − 1/n. Then let g_n = f_1 ∨ f_2 ∨ · · · ∨ f_n,
so that the sequence of functions g_n is increasing, lies in F by the first step, and satisfies
µ(g_n) ≥ sup_{f∈F} µ(f) − 1/n. As g_n is increasing it has a limit g, and the monotone convergence theorem
shows that
\[
\mu(g 1_A) = \lim_n \mu(g_n 1_A) \le \nu(A).
\]
So g ∈ F.
Now we can define another positive measure ν_0(A) = ν(A) − µ(g 1_A). We want to show that ν_0 = 0
and will do this by contradiction. Suppose that there exists A ∈ E such that ν_0(A) > 0; then by
monotonicity we will have ν_0(E) > 0, and since µ is a finite measure there exists a number ε > 0 such
that ν_0(E) > εµ(E). Now ν_0 − εµ is a finite signed measure. Let (P, N) be a Hahn decomposition for
this signed measure. Then for any A ∈ E we have (ν_0 − εµ)(A ∩ P) ≥ 0, so ν_0(A ∩ P) ≥ εµ(A ∩ P).
Hence we have
\[
\begin{aligned}
\nu(A) &= \mu(g 1_A) + \nu_0(A) \ge \mu(g 1_A) + \nu_0(A \cap P) \\
&\ge \mu(g 1_A) + \varepsilon \mu(A \cap P) = \mu(1_A (g + \varepsilon 1_P)).
\end{aligned}
\]
We also have that µ(P) > 0, as if µ(P) = 0 then we would have ν_0(P) = 0 as ν_0 ≪ ν ≪ µ, and this
would mean
\[
(\nu_0 - \varepsilon \mu)(E) = (\nu_0 - \varepsilon \mu)(N) \le 0,
\]
which would contradict ν_0(E) > εµ(E). Therefore g + ε1_P belongs to F but µ(g + ε1_P) > µ(g), which
contradicts the fact that g achieves µ(g) = sup_{f∈F} µ(f). Hence ν(A) = µ(g 1_A).
Now we turn to uniqueness. Suppose that we have two positive functions g, h such that ν(A) =
µ(g 1_A) = µ(h 1_A) for every A. Then as ν is finite, g and h are integrable, so g − h is integrable and
µ((g − h)1_A) = 0 for every A. As g − h is measurable, {x ∈ E : g − h ≥ 0} is a measurable set,
so µ((g − h)1_{{x∈E : g−h≥0}}) = 0. This shows that (g − h)1_{{x∈E : g−h≥0}} = 0 almost everywhere. In the
same way (g − h)1_{{x∈E : g−h≤0}} = 0 almost everywhere. Therefore g = h µ-almost everywhere.
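When µ and ν are finite sums of point masses, the Radon-Nikodym derivative is simply the ratio of the point masses wherever µ is positive. The Python sketch below is an illustration under that assumption; the weights and the function name radon_nikodym are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical point masses; nu vanishes wherever mu does, so nu << mu.
mu = {"a": Fraction(1, 2), "b": Fraction(0), "c": Fraction(3)}
nu = {"a": Fraction(1, 4), "b": Fraction(0), "c": Fraction(2)}


def radon_nikodym(nu, mu):
    """Return g with nu(A) = mu(g 1_A) for all A, for point-mass measures.

    The value of g on mu-null points is arbitrary (0 is chosen here), which
    is the finite version of uniqueness up to almost everywhere equality.
    """
    if any(mu[x] == 0 and nu[x] != 0 for x in mu):
        raise ValueError("nu is not absolutely continuous with respect to mu")
    return {x: (nu[x] / mu[x] if mu[x] > 0 else Fraction(0)) for x in mu}


g = radon_nikodym(nu, mu)
points = sorted(mu)
all_subsets = [set(c) for r in range(len(points) + 1)
               for c in combinations(points, r)]

print(g)
```

If ν charged a point that µ does not, no choice of g could reproduce ν, which is why the sketch raises an error in that case.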
12.3 Duality in Lp spaces
The goal of this section is to prove that if 1/p + 1/q = 1 then the dual space of L^p(E) is isomorphic to
the space L^q(E). First let us define a dual space.
Definition 12.10. Let V be a Banach space (a complete, normed vector space). The dual space
of V is written V′ and is the space of all bounded linear operators from V to R. We recall that we
call an operator K on V bounded if there is a constant C such that |K(v)| ≤ C‖v‖ for all v ∈ V. We
can define a norm on V′ by ‖K‖ = sup_{‖v‖=1} |K(v)|.
The first thing to note is that if g ∈ L^q(E) then we can define a bounded linear operator on L^p(E)
by K_g(f) = µ(fg). This is bounded by Hölder's inequality: |µ(fg)| ≤ µ(|fg|) = ‖fg‖_1 ≤ ‖f‖_p ‖g‖_q. It
is also linear thanks to the linearity of the integral. Therefore we can produce a map from L^q(E) →
(L^p(E))′ by g ↦ K_g.
Theorem 12.11. Let (E, E, µ) be a finite measure space and p ∈ (1, ∞). The dual space of L^p(E) is
L^q(E) where 1/p + 1/q = 1. Furthermore the map defined by g ↦ K_g is an isometry.
Remark 12.12. This result also holds for arbitrary measure spaces (without the finiteness assumption).
Extending to σ-finite measure spaces is relatively straightforward; extending to an arbitrary measure
space is more complicated.
Proof. Remark: This result is similar in spirit to the Riesz representation result that was a non-
examinable topic in week 6.
First we note that the map g ↦ K_g is linear and ‖K_g‖_{(L^p)′} ≤ ‖g‖_q. We want to show that
‖K_g‖ = ‖g‖_q, which in particular makes the map injective, and that it is surjective.
First, for the fact that ‖K_g‖ = ‖g‖_q we look at the function f(x) = sgn(g(x))|g(x)|^{q−1}. Since
(q − 1)p = q, we have µ(|f|^p) = µ(|g|^q) < ∞. Therefore we can look at the action of K_g on f and we
have K_g(f) = µ(|g|^q), so we know that
‖K_g‖ ≥ K_g(f)/‖f‖_p = µ(|g|^q)/µ(|g|^q)^{1/p} = µ(|g|^q)^{1−1/p} = ‖g‖_q.
Therefore g ↦ K_g preserves norms.
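The computation above can be checked numerically on a finite measure space, where functions are vectors and µ is a vector of point masses. In this Python sketch the masses, the function g and the exponent p = 3 are illustrative assumptions; the comparison is up to floating point rounding.

```python
import math

# Hypothetical finite measure space with three points.
mu = [0.5, 1.0, 2.0]      # point masses
g = [1.0, -2.0, 0.5]      # a function in L^q
p = 3.0
q = p / (p - 1.0)         # conjugate exponent, so 1/p + 1/q = 1


def integral(values):
    """mu(values) as a weighted sum."""
    return sum(v * m for v, m in zip(values, mu))


def norm(values, r):
    """The L^r norm with respect to mu."""
    return integral([abs(v) ** r for v in values]) ** (1.0 / r)


def K_g(f):
    """The functional f -> mu(f g)."""
    return integral([fv * gv for fv, gv in zip(f, g)])


# The extremal function f = sgn(g) |g|^(q - 1) from the proof.
f_star = [math.copysign(abs(gv) ** (q - 1.0), gv) for gv in g]
ratio = K_g(f_star) / norm(f_star, p)

# Any other f only gives the Hoelder upper bound.
f_other = [0.3, 1.0, -4.0]
bound = norm(f_other, p) * norm(g, q)

print(ratio, norm(g, q), abs(K_g(f_other)), bound)
```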
Now we want to show that this map is surjective. Let us begin with the case where µ(E) < ∞.
Let us take K an arbitrary element of (L^p(E))′. In this case 1_A ∈ L^p(E) for every A ∈ E, so we can
define a function on E by k(A) = K(1_A). We want to show that k is a signed measure. We have
k(∅) = K(0) = 0. Let A_1, A_2, . . . be a sequence of disjoint measurable sets. Then
1_{⋃_{j=1}^n A_j} = ∑_{j=1}^n 1_{A_j}, so
\[
k\Big( \bigcup_{j=1}^n A_j \Big) = K\big( 1_{\bigcup_{j=1}^n A_j} \big) = K\Big( \sum_{j=1}^n 1_{A_j} \Big) = \sum_{j=1}^n K(1_{A_j}) = \sum_{j=1}^n k(A_j).
\]
We also have that ‖1_{⋃_j A_j} − 1_{⋃_{j=1}^n A_j}‖_p = µ(⋃_{j>n} A_j)^{1/p} → 0 as n → ∞ and K is bounded,
so k(⋃_n A_n) = ∑_n k(A_n). Therefore k is indeed a signed measure. By the Hahn decomposition and the
Jordan decomposition we can write k = k_+ − k_−, and there exists a Hahn decomposition (P, N) with k
being positive on P and negative on N.
Next we want to show that k_+ ≪ µ and k_− ≪ µ. If A ∈ E is such that µ(A) = 0 then µ(A ∩ P) = 0
and µ(A ∩ N) = 0, so K(1_{A∩P}) = K(0) = 0 and K(1_{A∩N}) = K(0) = 0. Therefore k_+(A) = 0 and
k_−(A) = 0.
Then by the Radon-Nikodym theorem there exist functions g_+ and g_− such that k_+(A) = µ(g_+ 1_A)
and k_−(A) = µ(g_− 1_A). Now let g = g_+ − g_−; we want to show that g ∈ L^q and that K = K_g. This is
complicated.
Let us define E_n by E_n = {x : |g(x)| ≤ n}; then g 1_{E_n} is bounded and so in L^q as µ is finite. Then
define a linear functional on L^p by K_n(f) = µ(f g 1_{E_n}) and another by K̃_n(f) = K(f 1_{E_n}). Then if A
is a measurable set we have K_n(1_A) = K̃_n(1_A), and by linearity if h is a simple function then
K_n(h) = K̃_n(h). We showed in Assignment 4 that given a function f ∈ L^p and ε > 0 there exists a simple
function h with ‖f − h‖_p ≤ ε. Then we have that
\[
|K_n(f) - \tilde{K}_n(f)| \le |K_n(f) - K_n(h)| + |\tilde{K}_n(h) - \tilde{K}_n(f)| \le \|g 1_{E_n}\|_q \|f - h\|_p + \|K\| \|f - h\|_p \le \varepsilon (\|K\| + \|g 1_{E_n}\|_q).
\]
Since ε is arbitrary this shows that K_n(f) = K̃_n(f). Moreover K_n = K_{g 1_{E_n}}, so by our isometry we
have ‖K̃_n‖ = ‖g 1_{E_n}‖_q. We also have that ‖K̃_n‖ ≤ ‖K‖, as
‖K̃_n‖ = sup_{‖f‖_p=1} |K̃_n(f)| = sup_{‖f‖_p=1} |K(f 1_{E_n})| ≤ ‖K‖ because ‖f 1_{E_n}‖_p ≤ ‖f‖_p.
Therefore ‖g 1_{E_n}‖_q ≤ ‖K‖. Since ‖g 1_{E_n}‖_q^q = ∫ |g|^q 1_{E_n} µ(dx), monotone convergence gives
‖g‖_q = lim_n ‖g 1_{E_n}‖_q ≤ ‖K‖. Therefore g ∈ L^q. Then by exactly the same argument with which we
showed K_n = K̃_n we have that K = K_g. This concludes the proof in the finite case.