Introduction to Probability Theory
K. Suresh Kumar
Department of Mathematics
Indian Institute of Technology Bombay
September 2, 2017
LECTURES 12-13
Example 0.1 Consider the random experiment of tossing an unbiased coin denumerably many times. Let X_n, n ≥ 1, denote the random variable which takes the value 1 if the nth toss results in a H and 0 if the nth toss results in a T. Further set

A = {X_n = 1 i.o.}.

One can calculate P(A) using the Borel-Cantelli lemma as follows. Define A_n = {X_n = 1}. Then A_1, A_2, ... are independent and P(A_n) = 1/2 for all n ≥ 1 (exercise). Hence

∑_{n=1}^∞ P(A_n) = ∞.

Also note that

lim sup_{n→∞} A_n = A.

Therefore, using the Borel-Cantelli lemma,

P(A) = 1.
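As a quick numerical illustration (a simulation sketch, not part of the argument), one can estimate P(A_n) for the first few n and watch the partial sums grow without bound; the trial and toss counts below are arbitrary choices:

```python
import random

random.seed(0)

# Empirical check that P(A_n) = P(X_n = 1) = 1/2 for every n, so the
# partial sums of P(A_n) diverge -- the hypothesis of the (second)
# Borel-Cantelli lemma.  A simulation sketch, not a proof.
TRIALS = 20_000
N = 50

partial_sum = 0.0
for n in range(1, N + 1):
    # tosses are independent, so each P(A_n) is estimated separately
    heads = sum(random.random() < 0.5 for _ in range(TRIALS))
    partial_sum += heads / TRIALS

print(round(partial_sum))  # close to N/2 = 25, growing linearly with N
```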
Example 0.2 Let X_n be as in the above example. Set

B = {X_{2^n+1} = X_{2^n+2} = ··· = X_{2^n+⌊2 log₂ n⌋} = 1 i.o.},

where ⌊x⌋ denotes the integer part of x ∈ R. Then P(B) = 0.
Set

B_n = {X_{2^n+1} = X_{2^n+2} = ··· = X_{2^n+⌊2 log₂ n⌋} = 1}.

The B_n's are independent, since they involve non-overlapping tosses. Also P(B_n) = 1/2^{⌊2 log₂ n⌋}, which lies between 1/n² and 2/n². Hence ∑_{n=1}^∞ P(B_n) converges. So by the Borel-Cantelli lemma, P(B) = 0.
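The bound on P(B_n) can be checked exactly in a few lines; the range of n below is an arbitrary choice:

```python
import math

# P(B_n) = (1/2)^{floor(2 log2 n)}: the chance that a specified block of
# floor(2 log2 n) tosses all come up heads.  Check the sandwich
# 1/n^2 <= P(B_n) <= 2/n^2 claimed in the text.
def p_B(n: int) -> float:
    return 0.5 ** math.floor(2 * math.log2(n))

for n in range(1, 10_000):
    assert 1 / n**2 <= p_B(n) <= 2 / n**2

# The partial sums are therefore bounded (by sums of 2/n^2), so the
# series sum P(B_n) converges, as Borel-Cantelli requires.
total = sum(p_B(n) for n in range(1, 10_000))
print(total < 4)  # True
```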
Example 0.3 Let X_n be as in the above example. Set C_n = {X_1 = 1} for all n ≥ 1. Then the C_n are not independent, but ∑_{n=1}^∞ P(C_n) diverges. Also P(lim sup_{n→∞} C_n) = 1/2, i.e., one cannot in general relax the 'independence' assumption in the Borel-Cantelli lemma.
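Since every C_n is the same event, lim sup C_n = {X_1 = 1}. A small Monte Carlo sketch (sample size arbitrary) confirms that its probability is about 1/2, not 1:

```python
import random

random.seed(1)

# C_n = {X_1 = 1} for every n, so "C_n occurs infinitely often" holds
# exactly when the first toss is a head.  Monte Carlo estimate:
TRIALS = 100_000
occurs_io = 0
for _ in range(TRIALS):
    first_toss_head = random.random() < 0.5   # X_1 for this sample sequence
    # C_n occurs i.o. iff C_1 occurs, since all C_n coincide
    if first_toss_head:
        occurs_io += 1

estimate = occurs_io / TRIALS
print(round(estimate, 1))  # close to 1/2, despite sum P(C_n) = infinity
```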
On Series - a digression: The series ∑_{n=1}^∞ 1/n^α, 0 < α < ∞, is an important family of series. These are used to test the convergence/divergence of many other series, as we have seen in Example 0.2. So we will quickly take a look at the convergence of the above series. Consider the series ∑_{n=1}^∞ 1/n². Grouping the terms in blocks of lengths 2, 4, 8, ...,

∑_{n=1}^∞ 1/n² ≤ 1 + (1/2² + 1/2²) + (1/4² + ··· + 1/4²) + (1/8² + ··· + 1/8²) + ···
= 1 + 1/2 + 1/2² + ··· (geometric series).

Hence, using the comparison test, the series ∑_{n=1}^∞ 1/n² converges.
Now let us consider the harmonic series. Grouping again in blocks of lengths 2, 4, 8, ...,

∑_{n=1}^∞ 1/n = 1 + 1/2 + (1/3 + 1/4) + (1/5 + ··· + 1/8) + (1/9 + ··· + 1/16) + ···
≥ 1 + 1/2 + (1/4 + 1/4) + (1/8 + ··· + 1/8) + (1/16 + ··· + 1/16) + ···
= 1 + 1/2 + 1/2 + 1/2 + ··· .

Hence, by the comparison test, the series ∑_{n=1}^∞ 1/n diverges.
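The same grouping gives the quantitative bound H_{2^k} ≥ 1 + k/2 for the partial sums H_N = ∑_{n=1}^N 1/n, which the following sketch verifies for small k:

```python
# The block-grouping bound says H_{2^k} >= 1 + k/2, so the harmonic
# partial sums exceed every fixed bound; verify for k = 0, ..., 14.
H = 0.0
n = 0
for k in range(15):
    while n < 2 ** k:
        n += 1
        H += 1 / n
    # at this point H equals H_{2^k}
    assert H >= 1 + k / 2

print(H > 8)  # True: H_{2^14} is already above 1 + 14/2 = 8
```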
Now we give a general convergence/divergence result for the above series.

Lemma 0.1 The series ∑_{n=1}^∞ 1/n^α converges if 1 < α < ∞ and diverges to infinity if 0 < α ≤ 1.
Proof: The proof I am going to give is based on comparison with the integral ∫_1^∞ dx/x^α. We know that this improper Riemann integral diverges to infinity if 0 < α ≤ 1 and equals 1/(α−1) for α > 1.
Now using

1/n^α ≤ ∫_{n−1}^n dx/x^α,  α > 1,   and   ∫_n^{n+1} dx/x^α ≤ 1/n^α,  0 < α ≤ 1,

we get

∫_1^∞ dx/x^α ≤ ∑_{n=1}^∞ 1/n^α,  0 < α ≤ 1,
and

∑_{n=2}^∞ 1/n^α ≤ ∫_1^∞ dx/x^α,  α > 1.

From the above it follows (exercise) that the series ∑_{n=1}^∞ 1/n^α converges if 1 < α < ∞ and diverges to infinity if 0 < α ≤ 1.
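The two integral comparisons can be verified numerically for a sample exponent; α = 1.5 and the cutoff N below are arbitrary choices:

```python
# Numerical check of the integral comparison used in the proof:
#   sum_{n=2}^{N} 1/n^a  <=  int_1^N dx/x^a  <=  sum_{n=1}^{N-1} 1/n^a
alpha = 1.5
N = 10_000
integral = (1 - N ** (1 - alpha)) / (alpha - 1)   # exact value of int_1^N x^(-alpha) dx
lower = sum(1 / n**alpha for n in range(2, N + 1))
upper = sum(1 / n**alpha for n in range(1, N))
print(lower <= integral <= upper)  # True
# Consequently the full series is at most 1 + 1/(alpha - 1) = 3 here.
```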
Chapter-5: Distribution Function

Key words: distribution function, law of a random variable, discrete random variable, continuous random variable, pmf and pdf of a random variable.

In this chapter, we explore the 'quantification/measurement' of a random variable by revealing the probabilities of special events determined by X. This gives us the notion of the distribution function. The distribution function tells us how probability is 'distributed' for a random variable.
Definition 5.1. (Distribution function)
Let X : Ω → R be a random variable on (Ω, F, P ). The function
F : R → R defined by
F (x) = P {X ≤ x}
is called the distribution function of X.
Observe that the distribution function F of X reveals the probabilities of the events X^{-1}((−∞, x]) for all x ∈ R, i.e., of a class of events from σ(X). Also, one can see that the σ-field generated by the family {X^{-1}((−∞, x]) | x ∈ R} is σ(X). Hence F 'describes' the probabilities of all events from σ(X). We will see more about this later. Now we take a closer look at the distribution function.
Theorem 0.1 The distribution function has the following properties:
(i) lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1.
(ii) F is non-decreasing.
(iii) F is right continuous.
Proof:
(i) In the proof, we use the following.
For a function g : R → R, lim_{x→−∞} g(x) = a iff whenever x_n ↓ −∞ as n → ∞, we have lim_{n→∞} g(x_n) = a.¹
Hence, for each sequence {x_n} with x_n ↓ −∞, we need to compute

lim_{n→∞} F(x_n) = lim_{n→∞} P({X ≤ x_n})

and show that it is 0. Set

A_n = {X ≤ x_n}.

Then (exercise)

A_1 ⊇ A_2 ⊇ ··· and ∩_n A_n = ∅.

Therefore P(A_n) → 0, i.e., P{X ≤ x_n} → 0. Hence

lim_{n→∞} F(x_n) = 0.

Using a similar argument, we can prove lim_{x→∞} F(x) = 1.
(ii) For x1 ≤ x2 , {X ≤ x1 } ⊆ {X ≤ x2 }. Hence F (x1 ) ≤ F (x2 ).
(iii) We have to show that, for each x ∈ R,

lim_{y↓x} F(y) = F(x).

As in the proof of (i), it is enough to show that whenever y_n ↓ x, lim_{n→∞} F(y_n) = F(x).
Let y_n ↓ x. Set

A_n = {X ≤ y_n}.
¹ Given a function g : R → R, we say that lim_{x→−∞} g(x) = a if for each ε > 0 there exists a Δ > 0 such that whenever x ≤ −Δ, we have |g(x) − a| ≤ ε. The definitions of lim_{x→∞} g(x) and lim_{x→x₀} g(x) are analogous. For example, lim_{x→x₀} g(x) = b if for each ε > 0 there exists a δ > 0 such that whenever |x − x₀| < δ, we have |g(x) − b| < ε.
Then (exercise)

A_1 ⊇ A_2 ⊇ ···

and (exercise)

{X ≤ x} = ∩_{n=1}^∞ A_n.

Therefore

F(x) = P{X ≤ x} = lim_{n→∞} P(A_n) = lim_{n→∞} F(y_n).
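The three properties are easy to check for a concrete step function; the CDF below (of a hypothetical random variable taking values 0 and 1 with probability 1/2 each) is an illustration, not from the text:

```python
# Properties (i)-(iii) for a concrete step distribution function:
# F(x) = 0 for x < 0, 1/2 for 0 <= x < 1, and 1 for x >= 1.
def F(x: float) -> float:
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5
    return 1.0

# (i) the limits at -infinity and +infinity
print(F(-1e12), F(1e12))  # 0.0 1.0
# (ii) non-decreasing on a grid of points
grid = [i / 4 - 3 for i in range(25)]
assert all(F(a) <= F(b) for a, b in zip(grid, grid[1:]))
# (iii) right continuity at the jump x = 0: F(0 + h) = F(0) for small h > 0
print(all(F(10.0 ** -k) == F(0.0) for k in range(1, 12)))  # True
```

Note that F is right continuous but not left continuous at its jumps, exactly as the theorem allows.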
Theorem 0.2 The set of discontinuity points of a distribution function F is countable.

Proof. Let F be the distribution function of a random variable X and let D denote the set of all points of discontinuity of F. Since F is a monotone function, all its discontinuities are of jump type (exercise; hint: use the result that any bounded monotone sequence of real numbers is convergent). Hence

D = {x ∈ R | F(x) − F(x−) > 0},

where

F(x−) := lim_{y↑x} F(y) = P{X < x}.
Set

D_n = {x ∈ R | F(x) − F(x−) ≥ 1/n}.

Then

D = ∪_{n=1}^∞ D_n.

Also note that for x ∈ R,

P{X = x} = F(x) − F(x−).

Let x_1, x_2, ..., x_m be distinct points in D_n. Then

1 ≥ ∑_{k=1}^m P{X = x_k} = ∑_{k=1}^m (F(x_k) − F(x_k−)) ≥ m/n.

Therefore m ≤ n, i.e.,

♯D_n ≤ n.
Hence D is countable.
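The counting argument can be illustrated on a small discrete law; the three-atom pmf below is a hypothetical example:

```python
# For a purely discrete law the discontinuity points of F are exactly the
# atoms, and #{x : jump >= 1/n} <= n because the jumps sum to at most 1.
# Illustration with a three-atom pmf (jumps 1/4, 1/2, 1/4).
jumps = {0.0: 0.25, 1.0: 0.5, 2.0: 0.25}

for n in range(1, 20):
    D_n = [x for x, j in jumps.items() if j >= 1 / n]
    assert len(D_n) <= n   # the bound #D_n <= n from the proof

counts = {n: len([x for x, j in jumps.items() if j >= 1 / n]) for n in (1, 2, 4)}
print(counts)  # {1: 0, 2: 1, 4: 3} -- each count is at most n
```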
Recall that F(x) = P({X ∈ (−∞, x]}), x ∈ R, and I = {(−∞, x] | x ∈ R} is a family of Borel sets. Also σ(I) = B_R. So what is P({X ∈ B}) when B 'runs through' all Borel sets? Observe that B ↦ P({X ∈ B}) defines a map from B_R to [0, 1]. Does this map define a probability measure? The answer is in the following theorem.
Theorem 0.3 Let X be a random variable on a probability space (Ω, F, P ).
For B ∈ BR , define
µX (B) = P {X ∈ B}.
Then µX is a probability measure on (R, BR ).
Proof.

µX(R) = P{X ∈ R} = P(Ω) = 1.

Also µX(B) = P{X ∈ B} ≥ 0 for all B ∈ B_R. Let B_1, B_2, · · · ∈ B_R be pairwise disjoint. Then

µX(∪_{i=1}^∞ B_i) = P{X ∈ ∪_{i=1}^∞ B_i} = P(∪_{i=1}^∞ {X ∈ B_i}) = ∑_{i=1}^∞ P{X ∈ B_i} = ∑_{i=1}^∞ µX(B_i)

[note that the B_i's being disjoint implies that the {X ∈ B_i}'s are disjoint].
This completes the proof.
Definition 5.2 The probability measure µX is called the distribution or law
of the random variable X.
Note that if F is the distribution function of X, then

F(x) = µX((−∞, x]).
Example 0.4 Consider the probability space (Ω, F, P) given by

Ω = {HH, HT, TH, TT},  F = P(Ω),  P{HH} = P{HT} = P{TH} = P{TT} = 1/4.

Define X : Ω → R as follows. For ω ∈ Ω,

X(ω) = number of heads in ω.

Then X takes values in {0, 1, 2}. The distribution function of the random variable is given by

F(x) = 0 if x < 0
     = 1/4 if 0 ≤ x < 1
     = 3/4 if 1 ≤ x < 2
     = 1 if x ≥ 2.

Also, the distribution of X is

µX(B) = P{X ∈ B} = 0 if {0, 1, 2} ∩ B = ∅
                 = 1/4 if {0, 1, 2} ∩ B = {0}
                 = 1/2 if {0, 1, 2} ∩ B = {1}
                 = 1/4 if {0, 1, 2} ∩ B = {2}
                 = 3/4 if {0, 1, 2} ∩ B = {0, 1}
                 = 1 if {0, 1, 2} ∩ B = {0, 1, 2},

and similarly µX(B) = 1/2 if {0, 1, 2} ∩ B = {0, 2} and µX(B) = 3/4 if {0, 1, 2} ∩ B = {1, 2}.
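Example 0.4 translates directly into code; exact rational arithmetic keeps the values 1/4, 1/2, 3/4 visible:

```python
from fractions import Fraction

# X = number of heads in two fair coin tosses, with pmf 1/4, 1/2, 1/4
# at the points 0, 1, 2.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def F(x: float) -> Fraction:
    # distribution function: F(x) = P(X <= x)
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

def law(B) -> Fraction:
    # the law mu_X(B) = P(X in B); only B ∩ {0, 1, 2} matters
    return sum((p for k, p in pmf.items() if k in B), Fraction(0))

print(F(-1), F(0.5), F(1.5), F(2))            # 0 1/4 3/4 1
print(law({1}), law({0, 1}), law({0, 1, 2}))  # 1/2 3/4 1
```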
Example 0.5 (Bernoulli distribution) Let (Ω, F, P) be a probability space and A ∈ F with p = P(A). Tossing a p-coin gives such a probability space with an event A. Then X = I_A is called a Bernoulli(p) random variable. Here note that X takes two values, 0 and 1.
The distribution function corresponding to the Bernoulli distribution is given by

F(x) = 0 if x < 0
     = 1 − p if 0 ≤ x < 1
     = 1 if x ≥ 1.

The distribution of X is given by

µ(B) = 0 if B ∩ {0, 1} = ∅
     = 1 − p if B ∩ {0, 1} = {0}
     = p if B ∩ {0, 1} = {1}
     = 1 if B ∩ {0, 1} = {0, 1}.

The function F and the probability measure µ on (R, B_R) are called, respectively, the Bernoulli distribution function and the Bernoulli distribution.
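A sketch of the Bernoulli distribution function and law, with the sample value p = 0.25 chosen arbitrarily:

```python
# Bernoulli(p) distribution function and law for the sample value
# p = 0.25 (any 0 <= p <= 1 works the same way).
p = 0.25

def F(x: float) -> float:
    # F(x) = P(I_A <= x) has jumps of size 1 - p at 0 and p at 1
    if x < 0:
        return 0.0
    if x < 1:
        return 1 - p
    return 1.0

def mu(B) -> float:
    # mu(B) depends on B only through B ∩ {0, 1}
    return (1 - p) * (0 in B) + p * (1 in B)

print(F(-0.5), F(0.5), F(1.0))                  # 0.0 0.75 1.0
print(mu(set()), mu({0}), mu({1}), mu({0, 1}))  # 0.0 0.75 0.25 1.0
```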