§6 Random variables and distributions
§6.1 Random variable
6.1.1 Consider an experiment with sample space Ω.
(Informal) Definition. A random variable is a function X : Ω → (−∞, ∞).
6.1.2 A random variable is useful for quantifying experimental outcomes or describing events numerically. Sometimes it may be convenient to simplify the sample space so that events of interest can be described by X.
6.1.3 Examples.
(i) Ω = {win, lose} can be transformed by defining X(win) = 1, X(lose) = 0.
(ii) Toss 2 coins. Ω = {HH, HT, TH, TT}. Interested in no. of heads only. Define
X(HH) = 2, X(HT) = X(TH) = 1, X(TT) = 0.
(iii) Toss 2 coins. Ω = {HH, HT, TH, TT}. Interested in how many more vertical strokes the
first letter has than the second letter. Define
X(HH) = X(TT) = 0, X(HT) = 1, X(TH) = −1.
(iv) n Bernoulli trials. Interested in no. of successes. Define X = no. of successes ∈ {0, 1, . . . , n}.
(v) Annual income Y has sample space [0, ∞). Income is taxable when it exceeds some threshold c, say. If only interested in the taxable part of the income, may define X = max{0, Y − c}.
6.1.4 Conventional notation:
Use capital letters X, Y, . . . to denote random variables and small letters x, y, . . . the possible
numerical values (or realisations) of these variables, so that e.g.
X(ω) = x, Y (ω) = y, for a particular outcome ω ∈ Ω.
§6.2 Distribution function
6.2.1 Definition. The distribution function of a random variable X is the function F : (−∞, ∞) →
[0, 1] given by
F (x) = P(X ≤ x).
Note: Alternative name → cumulative distribution function (cdf).
6.2.2 Example. Toss a coin twice, with Ω = {HH, HT, TH, TT}. Define
• X = no. of heads;
• Y = 1 if both tosses return the same side, and = −1 otherwise.
Distribution functions of X and Y are, respectively,
F_X(t) =
  0,    t < 0,
  1/4,  0 ≤ t < 1,
  3/4,  1 ≤ t < 2,
  1,    t ≥ 2,
and
F_Y(t) =
  0,    t < −1,
  1/2,  −1 ≤ t < 1,
  1,    t ≥ 1.
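A quick way to check piecewise cdfs like these is to enumerate the (equally likely) sample space directly. The following Python sketch does this for the above example; the helper name cdf is ours, purely illustrative.

    from itertools import product

    # Sample space of two coin tosses; each of the 4 outcomes is equally likely.
    omega = list(product("HT", repeat=2))

    X = {w: w.count("H") for w in omega}               # no. of heads
    Y = {w: 1 if w[0] == w[1] else -1 for w in omega}  # same side -> 1, else -1

    def cdf(rv, t):
        """F(t) = P(rv <= t) under the uniform probability on omega."""
        return sum(1 for w in omega if rv[w] <= t) / len(omega)

    for t in (-1.5, -1, 0, 0.5, 1, 1.5, 2):
        print(t, cdf(X, t), cdf(Y, t))
    # e.g. F_X(0.5) = 0.25 while F_X(1) = 0.75: the cdf jumps at t = 1.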
6.2.3 The following properties characterise a cdf:
(i) lim_{x→−∞} F(x) = 0;
(ii) lim_{x→∞} F(x) = 1;
(iii) F is increasing (or, equivalently, non-decreasing);
(iv) F is right-continuous, i.e. lim_{h↓0} F(x + h) = F(x).
Note: F is not necessarily left-continuous. For instance, in Example §6.2.2, lim_{h↓0} F_X(1 − h) = 1/4 ≠ 3/4 = F_X(1).
6.2.4 Random variables and distribution functions are useful for describing probability models. The
probabilities attributed to events concerning a random variable X can be calculated from the
distribution function of X — cdf of X completely specifies random behaviour of X.
Example. If X denotes an integer-valued random variable and its distribution function F is
given, then we can calculate
P(X = r) = F (r) − F (r − 1), r = 0, ±1, ±2, . . . ,
and deduce from these the probability of any event involving X.
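For instance, the following sketch (using scipy.stats, with Poisson(3.5) chosen arbitrarily) confirms that successive cdf differences recover the mass function:

    from scipy import stats

    X = stats.poisson(3.5)                # an integer-valued random variable
    for r in range(6):
        diff = X.cdf(r) - X.cdf(r - 1)    # F(r) - F(r-1)
        print(r, round(diff, 6), round(X.pmf(r), 6))   # the two columns agree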
§6.3 Discrete random variables
6.3.1 Definition. Let X be a random variable defined on the sample space Ω. Then X is a discrete
random variable if X(Ω) ≡ {X(ω) : ω ∈ Ω} is countable.
Note: A set A is countable if its elements can be enumerated (or listed), such that A = {a1 , a2 , . . .}.
6.3.2 Examples.
(i) Binomial (n, p): X(Ω) = {0, 1, 2, . . . , n}.
(ii) Bernoulli trial: X(Ω) = {0, 1}.
(iii) Poisson (λ): X(Ω) = {0, 1, 2, . . .}.
6.3.3 Definition. The mass function of a discrete random variable X is the function f : (−∞, ∞) →
[0, 1] such that
f(x) = P(X = x), −∞ < x < ∞.
Note: Alternative names → probability mass function or probability function.
Definition. The set {x ∈ X(Ω) : f (x) > 0} is known as the support of X.
Note: The support of X usually, but not necessarily, coincides with X(Ω).
6.3.4 Examples.
(i) Binomial (n, p):
f(x) =
  \binom{n}{x} p^x (1 − p)^{n−x},  x = 0, 1, 2, . . . , n,
  0,  otherwise.
(ii) Bernoulli trial:
f(x) =
  p^x (1 − p)^{1−x},  x = 0, 1,
  0,  otherwise.
(iii) Poisson (λ):
f(x) =
  e^{−λ} λ^x / x!,  x = 0, 1, 2, . . . ,
  0,  otherwise.
(iv) Let X be no. of failures before first success in a sequence of independent Bernoulli trials
with success probability p. Then X(Ω) = {0, 1, 2, . . .}, and X has mass function
f(x) =
  (1 − p)^x p,  x = 0, 1, 2, . . . ,
  0,  otherwise.
This is called a geometric distribution.
(v) Let X be no. of failures before kth success in a sequence of independent Bernoulli trials
with success probability p. Then X(Ω) = {0, 1, 2, . . .}, and X has mass function
f(x) =
  \binom{k − 1 + x}{x} (1 − p)^x p^k,  x = 0, 1, 2, . . . ,
  0,  otherwise.
This is called a negative binomial distribution.
(vi) Suppose a random sample of size m is drawn without replacement from a collection of
k objects of one kind and N − k of another kind. Let X be no. of objects of the first kind
found in the sample. Then X has mass function
f(x) =
  \binom{k}{x} \binom{N − k}{m − x} / \binom{N}{m},  x = max{0, m + k − N}, . . . , min{k, m},
  0,  otherwise.
This is called a hypergeometric distribution.
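All of these mass functions are implemented in scipy.stats, which is convenient for checking the formulas above. A sketch (parameter values are arbitrary examples; note that scipy's geom counts the trial on which the first success occurs, so x failures corresponds to the value x + 1, and scipy's hypergeom takes its arguments in the order (N, k, m)):

    from math import comb, exp, factorial
    from scipy import stats

    n, p, lam, x = 8, 0.2, 3.5, 2

    # Binomial(n, p)
    assert abs(stats.binom.pmf(x, n, p)
               - comb(n, x) * p**x * (1 - p)**(n - x)) < 1e-12

    # Poisson(lam)
    assert abs(stats.poisson.pmf(x, lam)
               - exp(-lam) * lam**x / factorial(x)) < 1e-12

    # Geometric: no. of failures before the first success (shift by 1 for scipy)
    assert abs(stats.geom.pmf(x + 1, p) - (1 - p)**x * p) < 1e-12

    # Negative binomial: no. of failures before the kth success
    k = 4
    assert abs(stats.nbinom.pmf(x, k, p)
               - comb(k - 1 + x, x) * (1 - p)**x * p**k) < 1e-12

    # Hypergeometric with N = 18, k = 6, m = 4
    N, k, m = 18, 6, 4
    assert abs(stats.hypergeom.pmf(x, N, k, m)
               - comb(k, x) * comb(N - k, m - x) / comb(N, m)) < 1e-12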
The following figures display the mass functions of examples of the above discrete random
variables.
[Figure: mass functions of Binomial(8, 0.2), Bernoulli(p = 0.2), Poisson(3.5), Geometric(p = 0.2), Negative binomial(p = 0.2, k = 4) and Hypergeometric(m = 4, N = 18, k = 6).]
6.3.5 The cdf of a discrete random variable X is a step function with jumps at values in the support
of X.
The following figures display the distribution functions of some discrete random variables.
[Figure: distribution functions of Binomial(8, 0.2), Bernoulli(p = 0.2), Poisson(3.5), Geometric(p = 0.2), Negative binomial(p = 0.2, k = 4) and Hypergeometric(m = 4, N = 18, k = 6).]
§6.4 Continuous random variables
6.4.1 Definition. A random variable X is continuous if its distribution function F(x) ≡ P(X ≤ x) has the form
F(x) = ∫_{−∞}^{x} f(y) dy,  −∞ < x < ∞,
for some function f : (−∞, ∞) → [0, ∞).
Definition. The function f is called the density function of X.
Note: Alternative names → probability density function (pdf) or probability function.
Definition. The set {x ∈ X(Ω) : f (x) > 0} is known as the support of X.
6.4.2 The cdf F of a continuous random variable X is continuous.
6.4.3 If the cdf F is differentiable, we can obtain the pdf f via f(x) = F′(x) (≥ 0 since F is increasing).
6.4.4 The pdf f plays a similar role to the mass function P(X = x) for discrete X. Results for discrete and continuous random variables can often be interchanged, with P(X = x) replaced by f(x) and the summation sign Σ replaced by the integral sign ∫.
Example. For any subset A of real numbers,
P(X ∈ A) = Σ_{x∈A} P(X = x)  (discrete),
P(X ∈ A) = ∫_{x∈A} f(x) dx  (continuous).
6.4.5 If X is continuous, P(X = x) = 0 for all x.
6.4.6 If X is continuous with pdf f, then
(i) ∫_{−∞}^{∞} f(x) dx = 1;
(ii) P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b) = ∫_{a}^{b} f(x) dx;
(iii) P(X ∈ A) = ∫_{x∈A} f(x) dx for any subset A of real numbers.
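These integrals are easy to check numerically. A sketch using scipy's quadrature, for the simple density f(x) = 2x on (0, 1) (an arbitrary choice of pdf):

    from scipy import integrate

    pdf = lambda x: 2 * x                       # density f(x) = 2x on (0, 1)

    total, _ = integrate.quad(pdf, 0, 1)        # property (i): integral over the support
    prob, _ = integrate.quad(pdf, 0.25, 0.5)    # P(0.25 <= X <= 0.5), property (ii)
    print(total, prob)                          # 1.0 and 0.1875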
6.4.7 Analogues in physics:
Continuous X Physics
pdf at x density at x (a point in space)
probability of set A mass of A (a region in space)
P(X = x) mass of a single point x (= 0 since x has no volume, hence no mass)
6.4.8 Examples. (f ↔ pdf, F ↔ cdf)
(i) Uniform distribution, U[a, b] (a < b):
f(x) =
  1/(b − a),  a ≤ x ≤ b,
  0,  otherwise,
F(x) =
  0,  x < a,
  (x − a)/(b − a),  a ≤ x ≤ b,
  1,  x > b.
e.g. A straight rod drops freely onto a horizontal plane. Let X be the angle between the rod and the North direction: 0 ≤ X < 2π. Then X ∼ U[0, 2π].
(ii) Exponential distribution, exp(λ) (λ > 0):
f(x) =
  λe^{−λx},  x > 0,
  0,  x ≤ 0,
F(x) =
  0,  x ≤ 0,
  1 − e^{−λx},  x > 0.
Remarks:
– An exponential random variable describes the interarrival time, i.e. the random time
elapsing between unpredictable events (e.g. telephone calls, earthquakes, arrivals of buses
or customers etc.)
– The exponential distribution is memoryless, i.e. if X ∼ exp(λ),
P(X > s + t | X > s) = P(X > t).
Knowing that the event hasn’t occurred in the past s units of time doesn’t alter the distribution of the arrival time in the future; i.e. we may assume the process starts afresh at any point of observation.
– The parameter λ is called the rate. The greater is λ, the shorter is the interarrival time (the more frequent are the arrivals).
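The memoryless property follows directly from the cdf: P(X > s + t | X > s) = e^{−λ(s+t)} / e^{−λs} = e^{−λt} = P(X > t). It can also be seen by simulation; a sketch (the values of λ, s, t are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    lam, s, t = 0.5, 2.0, 3.0
    x = rng.exponential(scale=1/lam, size=1_000_000)

    cond = (x > s + t).sum() / (x > s).sum()  # estimate of P(X > s+t | X > s)
    print(cond, np.exp(-lam * t))             # both ~ P(X > t) = e^{-lam t}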
(iii) Gamma distribution, Gamma (α, β) (α, β > 0):
f(x) =
  β^α x^{α−1} e^{−βx} / Γ(α),  x > 0,
  0,  x ≤ 0,
where Γ(·) denotes the gamma function Γ(α) = ∫_{0}^{∞} u^{α−1} e^{−u} du.
Remarks:
– α: shape parameter; β: rate parameter (its reciprocal 1/β is the scale).
– Gamma (1, β) ≡ exp(β).
(iv) Beta distribution, Beta (α, β) (α, β > 0):
f(x) =
  [Γ(α + β) / (Γ(α)Γ(β))] x^{α−1} (1 − x)^{β−1},  0 < x < 1,
  0,  otherwise.
Note: Beta (1, 1) ≡ U [0, 1].
(v) Cauchy distribution:
f(x) = 1 / {π[1 + (x − θ)²]},  −∞ < x < ∞,
for any fixed real parameter θ.
(vi) Normal (or Gaussian) distribution, N(µ, σ²):
f(x) = (1/√(2πσ²)) exp{−(x − µ)² / (2σ²)},  −∞ < x < ∞.
Remarks:
– µ is the mean, and σ² is the variance (to be discussed later).
– The pdf f has a bell shape, with centre µ. The bigger is σ², the more widely spread is f.
– The Central Limit Theorem (CLT) states that in many cases, the average or sum of a large number of independent random variables (independence of random variables is discussed in the next chapter) is approximately normally distributed.
– The Binomial(n, p) random variable is the sum of n independent Bernoulli random variables. Thus we should expect, by the CLT, that Binomial(n, p) is approximately normal for large n. In fact, Binomial(n, p) ∼ N(np, np(1 − p)) approximately, for large n.
– N(0, 1) is known as the standard normal distribution, i.e. the special case of N(µ, σ²) with µ = 0 and σ = 1.
The pdf and cdf of N (0, 1) are usually denoted by φ and Φ, respectively:
pdf φ(x) = (1/√(2π)) e^{−x²/2},  cdf Φ(x) = ∫_{−∞}^{x} φ(y) dy,  −∞ < x < ∞.
– If X ∼ N(µ, σ²) and a, b are fixed constants, then Y = aX + b ∼ N(aµ + b, a²σ²), i.e. any linear transformation of a normal random variable is also normal.
Special case: Taking a = 1/σ and b = −µ/σ amounts to standardisation of X, resulting in a standard normal random variable Y = (X − µ)/σ ∼ N(0, 1).
– Many real-life random phenomena obey a normal distribution approximately (due to the CLT). Examples include measurement errors, human heights, fluctuations from nominal quality in a production line, etc.
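As a quick numerical illustration of the binomial-normal approximation (a sketch; n, p and the evaluation point are arbitrary, and the 0.5 shift is the usual continuity correction):

    import numpy as np
    from scipy import stats

    n, p = 100, 0.3
    mu, sigma = n * p, np.sqrt(n * p * (1 - p))

    print(stats.binom.cdf(35, n, p))        # exact P(X <= 35)
    print(stats.norm.cdf(35.5, mu, sigma))  # normal approximation, very close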
(vii) Chi-squared distribution with m degrees of freedom, χ²_m:
if m is a positive integer, χ²_m is the distribution of Z_1² + · · · + Z_m², for independent standard normal Z_1, . . . , Z_m.
Note: χ²_m ≡ Gamma(m/2, 1/2), and Gamma(α, β) ≡ (1/(2β)) χ²_{2α}.
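These identities are easy to confirm numerically. A sketch using scipy (note scipy parametrises the gamma distribution by shape and scale = 1/β, so Gamma(m/2, 1/2) has scale 2):

    import numpy as np
    from scipy import stats

    m = 4
    xs = np.linspace(0.1, 20, 5)
    print(stats.chi2.cdf(xs, df=m))             # chi^2_m cdf
    print(stats.gamma.cdf(xs, a=m/2, scale=2))  # Gamma(m/2, 1/2) cdf: same values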
(viii) Student’s t-distribution with m degrees of freedom, t_m:
t_m is the distribution of Z / √(X/m), for independent Z ∼ N(0, 1) and X ∼ χ²_m.
Remarks:
– t_m is a heavy-tailed version of N(0, 1): t_m approaches N(0, 1) as m → ∞.
– t_1 ≡ the Cauchy distribution with centre θ = 0.
(ix) F distribution with parameters (m, n), F_{m,n}:
F_{m,n} is the distribution of (X/m) / (Y/n), for independent X ∼ χ²_m and Y ∼ χ²_n.
Note: F_{1,n} ≡ (t_n)², and (F_{m,n})^{−1} ≡ F_{n,m}.
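The relation F_{1,n} ≡ (t_n)² can be checked via P(t_n² ≤ x) = P(−√x ≤ t_n ≤ √x) = 2P(t_n ≤ √x) − 1; a sketch:

    import numpy as np
    from scipy import stats

    n = 10
    xs = np.linspace(0.5, 5.0, 4)
    print(stats.f.cdf(xs, dfn=1, dfd=n))           # P(F_{1,n} <= x)
    print(2 * stats.t.cdf(np.sqrt(xs), df=n) - 1)  # P(t_n^2 <= x): same values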
The following diagrams display the density and distribution functions of examples of the above
continuous random variables.
[Figure: density functions of U[0, 1], exp(1), Gamma(3, 1), Beta(0.8, 1.2), Cauchy (t_1), N(0, 1), χ² (4 d.f.) and F_{5,10}.]
[Figure: distribution functions of U[0, 1], exp(1), Gamma(3, 1), Beta(0.8, 1.2), Cauchy (t_1), N(0, 1), χ² (4 d.f.) and F_{5,10}.]
6.4.9 The normal distribution and the normal-related distributions — χ²_m, t_m, F_{m,n} — are useful
for statistical inference such as hypothesis testing, confidence interval construction, regression,
analysis of variance (ANOVA), etc.
§6.5 *** More challenges ***
6.5.1 Let X be a random variable with distribution function F . Let x be a fixed real number. For
n = 1, 2, . . . , define events
A_n = {X ≤ x − 1/n}.
(a) Show that
A1 ⊂ A2 ⊂ · · · and A1 ∪ A2 ∪ · · · = {X < x}.
(b) Show that
P(X < x) = lim_{n→∞} F(x − 1/n).
(c) Deduce from (b) that if F is continuous, then P(X = x) = 0 for all x ∈ R.
(d) Give an example of X for which P(X < x) ≠ F(x) for some x.
6.5.2 Define, for a constant λ > 0, a function
F(x) =
  1 − e^{−λx},  x > 0,
  0,  x ≤ 0.
(a) Verify that F is a distribution function.
(b) Find a density function for F .
(c) Let X be a random variable distributed under F . Define a new random variable Y as follows.
A fair coin is tossed once, independent of X. Put Y = X if a head turns up and Y = −X otherwise.
Write down expressions for the distribution and density functions of Y , respectively.
6.5.3 Let f be a function defined by
f(x) = max{0, c(4 − x²)} for x ∈ (−∞, ∞),
where c is some unknown real constant.
(a) Find the value(s) of c such that f is a proper density function.
(b) Suppose now c takes the positive value determined in (a), and X is a random variable dis-
tributed with density f . Let F denote the distribution function corresponding to f .
(i) Find F .
(ii) Show that
F (x) + F (−x) = 1 for all real values of x.
(iii) Determine a positive constant a that satisfies
P(|X| ≤ a) = 11a/16.
(iv) Does there exist a positive constant a that satisfies P(|X| ≤ a) = 3a/16? If so, what is it?