Plaksha University: Technology Leaders Program
Dr. Nandini Kannan
Email: nandini.kannan@plaksha.edu.in
Chapter 3
Sampling Distributions
3.1 Introduction
Statistics is the science that deals with
(a) the collection, organization, and summary of information about a particular topic of interest (Descriptive Statistics);
(b) drawing inferences about a population of interest using information obtained from a sample (Inferential Statistics).
Definition: A Parameter is a numerical measure associated with
a population.
Definition: A Statistic is a numerical measure associated with a sample.
Example: Paper Boat would like to determine the sugar content of Nagpur oranges. How would you help Paper Boat answer this question?
A consumer group wants to determine the fuel efficiency of the new Honda SUV. How would you proceed?
In both cases, identify the parameter and the statistic of interest.
Since the value of a statistic varies from sample to sample, statistics are random variables. We would like to know how a statistic changes over different samples.
Definition: The probability distribution of a statistic is called a
sampling distribution.
Example: For the fuel efficiency example, the sampling distribution of the mean could be approximated as follows:
• Draw a sample of 100 from the population of all vehicles manu-
factured in a given time period. Compute the sample mean.
• Repeat the process of drawing samples of size 100 several times.
• Each time a sample is selected, the value of the statistic (mean)
is calculated.
• Draw the relative frequency histogram of these computed statis-
tics.
If the process is repeated a large number of times, the histogram
will provide an approximation of the sampling distribution.
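As an illustration, here is a minimal Python sketch of this resampling procedure, assuming a hypothetical fleet whose fuel efficiency is N(15, 2²) km/l (both numbers invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: fuel efficiency in km/l, assumed N(15, 2^2).
# The parameter values are invented purely for illustration.
pop_mean, pop_sd = 15.0, 2.0

reps, n = 1000, 100   # 1000 repeated samples, each of size 100
sample_means = rng.normal(pop_mean, pop_sd, size=(reps, n)).mean(axis=1)

# A relative frequency histogram of `sample_means` approximates the
# sampling distribution of the mean.
print(sample_means.mean())        # close to 15
print(sample_means.std(ddof=1))   # close to 2 / sqrt(100) = 0.2
```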
Let X1, . . . , Xn be n mutually independent observations of a particular quantitative phenomenon, for example blood pressure. We assume that each observation Xi has the same probability distribution. We say that the n observations X1, . . . , Xn are independent and identically distributed (i.i.d.).
If X is described by a pmf, then
p(x1, . . . , xn) = P (X1 = x1, . . . , Xn = xn) = p(x1) . . . p(xn).
If X is described by a pdf, then
f (x1, . . . , xn) = f (x1) . . . f (xn).
X1, . . . , Xn are said to be a random sample of size n from
the distribution of X.
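For instance, for a random sample from a Bernoulli population with success probability p (the setting of Section 3.3), the joint pmf factors as
$$p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = p^{\sum_i x_i}(1-p)^{n - \sum_i x_i}, \qquad x_i \in \{0, 1\}.$$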
3.2 Sampling Distribution of the Mean
Suppose we are interested in estimating the mean µX of a random
variable X based on a random sample X1, . . . , Xn. We can estimate
µX by the sample average
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
Using properties of expectation, we have
$$\mu_{\bar{X}} = E(\bar{X}) = \mu_X$$
and
$$\sigma^2_{\bar{X}} = \mathrm{Var}(\bar{X}) = \frac{\sigma_X^2}{n}.$$
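As a quick check, both follow from linearity of expectation and, for the variance, independence of the Xi:
$$E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{n\mu_X}{n} = \mu_X, \qquad \mathrm{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{n\sigma_X^2}{n^2} = \frac{\sigma_X^2}{n}.$$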
Example: Consider a random sample of n = 12 from U(−1/2, 1/2). Let
$$T = \sum_{i=1}^{n} X_i.$$
Figure 3.1 shows a histogram of 1000 such sums with a superimposed normal pdf. Figure 3.2 shows a histogram of the distribution of X(n), the largest order statistic.
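A short simulation along these lines (a sketch in Python; the plotting step is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 replications of a sample of 12 iid U(-1/2, 1/2) observations.
samples = rng.uniform(-0.5, 0.5, size=(1000, 12))

sums = samples.sum(axis=1)     # T, the statistic of Figure 3.1
maxima = samples.max(axis=1)   # X(n), the largest order statistic (Figure 3.2)

# The sum is already nearly normal; the maximum is not.
print(sums.mean(), sums.std(ddof=1))   # approx 0 and sqrt(12 * 1/12) = 1
print(maxima.mean())                   # approx -0.5 + 12/13 = 0.42
```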
[Figure 3.1: Probability Histogram: Sum of 12 Uniform Random Variables]
[Figure 3.2: Probability Histogram: Max of 12 Uniform Random Variables]
3.3 Central Limit Theorem
Example: Suppose X is a discrete random variable with probability
distribution given by
x      0     1
p(x)   1/3   2/3

i.e., the population consists of 0's and 1's: one third of the population consists of 0's, and two thirds consists of 1's.
We can compute the mean and variance. We have
$$E(X) = \mu = \frac{2}{3}, \qquad \mathrm{Var}(X) = \sigma^2 = \frac{2}{9}.$$
Let X1, X2 be a random sample of size 2 drawn from this pop-
ulation. Both X1 and X2 can take values 0 and 1. There are four
possible samples of size 2:
(0, 0), (0, 1), (1, 0), (1, 1).
We compute the average X̄2 and the total T2 for all possible samples. The table is shown below.
X1   X2   X̄2    T2
0    0    0.0   0
0    1    0.5   1
1    0    0.5   1
1    1    1.0   2
Thus the average and total are both random variables, and we can
compute their probability distributions.
X̄2 can assume the values 0, 0.5, and 1.
$$\begin{aligned}
P(\bar{X}_2 = 0) &= P(X_1 = 0 \text{ and } X_2 = 0)\\
&= P(X_1 = 0)\,P(X_2 = 0) \quad \text{(by independence)}\\
&= \frac{1}{3}\cdot\frac{1}{3} = \frac{1}{9}
\end{aligned}$$

$$\begin{aligned}
P(\bar{X}_2 = 0.5) &= P(X_1 = 0 \text{ and } X_2 = 1) + P(X_1 = 1 \text{ and } X_2 = 0)\\
&= P(X_1 = 0)\,P(X_2 = 1) + P(X_1 = 1)\,P(X_2 = 0)\\
&= \frac{1}{3}\cdot\frac{2}{3} + \frac{2}{3}\cdot\frac{1}{3} = \frac{4}{9}
\end{aligned}$$

$$\begin{aligned}
P(\bar{X}_2 = 1) &= P(X_1 = 1 \text{ and } X_2 = 1)\\
&= P(X_1 = 1)\,P(X_2 = 1)\\
&= \frac{2}{3}\cdot\frac{2}{3} = \frac{4}{9}
\end{aligned}$$
Similarly the total T2 is a random variable taking values 0, 1, and
2. We can compute the probabilities in exactly the same way. The
probability distributions can be summarized as follows:
X̄2     0.0   0.5   1.0
prob   1/9   4/9   4/9

We compute the mean and variance to be
$$E(\bar{X}_2) = \frac{2}{3} = \mu, \qquad \mathrm{Var}(\bar{X}_2) = \frac{1}{9} = \frac{\sigma^2}{2}.$$
The probability distribution of T2 is given below.
T2     0.0   1.0   2.0
prob   1/9   4/9   4/9

We compute the mean and variance to be
$$E(T_2) = \frac{4}{3} = 2\mu, \qquad \mathrm{Var}(T_2) = \frac{4}{9} = 2\sigma^2.$$
We can repeat this process by drawing a sample of size 3 from the population and computing the probability distributions of X̄3 and T3, which are given below.
X̄3     0.0    1/3    2/3     1.0
prob   1/27   6/27   12/27   8/27
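These exact distributions can also be enumerated mechanically. A small Python sketch using exact fractions:

```python
from itertools import product
from fractions import Fraction
from collections import defaultdict

# Population pmf: P(X = 0) = 1/3, P(X = 1) = 2/3.
pmf = {0: Fraction(1, 3), 1: Fraction(2, 3)}

def total_distribution(n):
    """Exact sampling distribution of the total T_n for samples of size n."""
    dist = defaultdict(Fraction)
    for sample in product(pmf, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= pmf[x]        # independence: product of marginal pmfs
        dist[sum(sample)] += prob
    return dict(dist)

print(total_distribution(2))   # T2: probabilities 1/9, 4/9, 4/9
print(total_distribution(3))   # T3: probabilities 1/27, 6/27, 12/27, 8/27
```

Dividing each total by n gives the corresponding distribution of X̄n.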
[Figure 3.3: Probability Histogram for Average: n = 2]
[Figure 3.4: Probability Histogram for Sum: n = 2]
We compute the mean and variance to be
$$E(\bar{X}_3) = \frac{2}{3} = \mu, \qquad \mathrm{Var}(\bar{X}_3) = \frac{2}{27} = \frac{\sigma^2}{3}.$$
The probability distribution of T3 is given below.

T3     0.0    1.0    2.0     3.0
prob   1/27   6/27   12/27   8/27

We compute the mean and variance to be
$$E(T_3) = 2 = 3\mu, \qquad \mathrm{Var}(T_3) = \frac{2}{3} = 3\sigma^2.$$
Even with a sample of size 3, the histograms are beginning to
look symmetric and more like a normal distribution. If we continue
this process, we will observe that the probability histograms start
resembling the Gaussian Distribution.
[Figure 3.5: Probability Histogram for Average: n = 3]
[Figure 3.6: Probability Histogram for Average: n = 20]
Theorem 3.3.1. Central Limit Theorem: Let X1, . . . , Xn be a sequence of iid random variables drawn from a population with finite mean µ and variance σ². Then for large n, the sampling distribution of the sample mean is approximately normal with mean
$$E(\bar{X}) = \mu$$
and variance
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}.$$
A similar statement can be written for the total.
Remark. If the population is known to be normal, then the
distribution of the sample mean is exactly normal for any sample
size n.
Remark. By large n, we usually mean a sample of at least 25
measurements.
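A quick numerical check of this remark, using a strongly skewed exponential population with µ = 1 and σ² = 1 (a sketch):

```python
import numpy as np

rng = np.random.default_rng(1)

n, reps = 25, 10_000
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

print(means.mean())        # close to mu = 1
print(means.var(ddof=1))   # close to sigma^2 / n = 1/25 = 0.04
# A histogram of `means` is already close to N(1, 0.04).
```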
Example: A manufacturer of automobile batteries claims that the
distribution of the lifetimes of its best battery has an average of 54
months, and a standard deviation of 6 months. Suppose a consumer
group decides to check the claim by purchasing a sample of 50 of
these batteries and testing them.
(a) Describe the sampling distribution of the average lifetime of a
sample of 50 batteries.
(b) What is the probability that the sample has an average life of
52 months or fewer?
Solution: (a) Since the sample size is greater than 25, the sampling distribution of the average based on a sample of size 50 is approximately normal with mean
$$E(\bar{X}_{50}) = 54 \text{ months}$$
and variance
$$\mathrm{Var}(\bar{X}_{50}) = \frac{36}{50} = 0.72.$$
(b) We want to find P(X̄50 ≤ 52). The distribution is approximately normal, so we can standardize using the mean and standard deviation from (a); the standard deviation is √0.72 ≈ 0.85:
$$P(\bar{X}_{50} \le 52) = P\left(\frac{\bar{X}_{50} - 54}{0.85} \le \frac{52 - 54}{0.85}\right) = P(Z \le -2.35) = 0.0094.$$
The probability the consumer group will observe a sample average
of 52 or less is 0.0094 if the manufacturer’s claim is true. If the 50
tested batteries do result in an average of 52 or fewer months, the
consumer group will have strong evidence that the manufacturer’s
claim is untrue. Such an event is very unlikely to happen if the claim
is true.
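The probability can be verified directly; a sketch using scipy:

```python
import math
from scipy.stats import norm

mu, sigma, n = 54, 6, 50
se = sigma / math.sqrt(n)   # standard error, approx 0.8485

# P(Xbar_50 <= 52) under the manufacturer's claim.
print(norm.cdf(52, loc=mu, scale=se))   # approx 0.0092
print(norm.cdf(-2.35))                  # approx 0.0094, using the rounded z-score
```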
3.4 Normal Approximation to the Binomial
Let Y ∼ Bin(n, p). Y is the number of successes that are observed
in the n trials and p represents the probability of success in any trial.
Y = X1 + . . . + Xn, where the Xi's are iid Bernoulli(p) random variables.
An estimate of p, denoted by p̂, is the proportion of successes that
are observed in the n trials, i.e.,
$$\hat{p} = \frac{Y}{n}.$$
In order to use the normal approximation to the binomial, we
require
n p ≥ 5; n(1 − p) ≥ 5.
Theorem 3.4.1. The sampling distribution of p̂ is approximately normal with mean
$$E(\hat{p}) = p$$
and variance
$$\mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}.$$
Example: An airline has determined that the no-show rate for reservations is 10%. Suppose the next flight has 100 parties with advance reservations.
a. Find the probability that the number of no-shows is between
20 and 25.
b. Approximate the probability in (a). Justify the approximation.
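A sketch of both computations in Python, treating the number of no-shows as Bin(100, 0.1) and reading "between 20 and 25" as inclusive (an assumption):

```python
import math
from scipy.stats import binom, norm

n, p = 100, 0.1
mu = n * p                        # 10
sd = math.sqrt(n * p * (1 - p))   # 3

# (a) Exact binomial probability of 20 to 25 no-shows, inclusive.
exact = binom.cdf(25, n, p) - binom.cdf(19, n, p)

# (b) Normal approximation with continuity correction; justified because
#     np = 10 >= 5 and n(1 - p) = 90 >= 5.
approx = norm.cdf(25.5, mu, sd) - norm.cdf(19.5, mu, sd)

print(exact, approx)   # both on the order of 10^-3
```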
3.5 Distributions Derived From the Normal
Linear combinations of normally distributed random variables are
normal.
Theorem 3.5.1. Let X1, . . . , Xn be independent N(µi, σi²) random variables, and let
$$Y = \sum_{i=1}^{n} a_i X_i$$
be a linear combination of the Xi's, with a1, . . . , an constants. Then
$$Y \sim N\left(\sum_{i=1}^{n} a_i \mu_i,\; \sum_{i=1}^{n} a_i^2 \sigma_i^2\right).$$
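For example (numbers chosen purely for illustration), if X1 ∼ N(1, 4) and X2 ∼ N(2, 9) are independent, then taking a1 = 1 and a2 = −1 gives
$$X_1 - X_2 \sim N\big(1 - 2,\; 1^2 \cdot 4 + (-1)^2 \cdot 9\big) = N(-1, 13).$$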
3.5.1 The χ² distribution

Theorem 3.5.2. If Z ∼ N(0, 1), then Z² ∼ χ²(1).
Proof: The pdf of Z is given by
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty.$$
Let Y = Z². This defines a transformation between Z and Y that is not one-to-one. The inverse solutions of y = z² are z = ±√y. Let z1 = −√y and z2 = √y.
The Jacobians of the two branches are
$$J_1 = \frac{d}{dy}(-\sqrt{y}) = \frac{-1}{2\sqrt{y}}, \qquad J_2 = \frac{d}{dy}(\sqrt{y}) = \frac{1}{2\sqrt{y}}.$$
The pdf of Y is given by
$$g(y) = \frac{1}{\sqrt{2\pi}}\, e^{-y/2}\left|\frac{-1}{2\sqrt{y}}\right| + \frac{1}{\sqrt{2\pi}}\, e^{-y/2}\,\frac{1}{2\sqrt{y}} = \frac{1}{\sqrt{2\pi}}\, y^{1/2-1} e^{-y/2}, \qquad y > 0.$$
Since $\int_0^\infty g(y)\,dy = 1$, we have
$$1 = \frac{1}{\sqrt{2\pi}} \int_0^\infty y^{1/2-1} e^{-y/2}\, dy = \frac{\Gamma(1/2)}{\sqrt{\pi}},$$
since the function inside the integral sign resembles a Gamma pdf with α = 1/2 and β = 2. Recall that the pdf of a Gamma(α, β) random variable is
$$f(x) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}, \qquad x > 0.$$
Therefore Γ(1/2) = √π, and the pdf of Y is given by
$$g(y) = \begin{cases} \dfrac{1}{\sqrt{2}\,\Gamma(1/2)}\, y^{1/2-1} e^{-y/2}, & y > 0;\\[4pt] 0, & \text{otherwise.} \end{cases}$$
[Figure 3.7: Sampling Distribution of Z² with superimposed Chi-squared distribution]
This is the pdf of a chi-squared random variable with 1 degree of
freedom. ■
We also showed that if X ∼ N(µ, σ²), then (X − µ)/σ ∼ N(0, 1). Therefore [(X − µ)/σ]² ∼ χ²(1).
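A simulation in the spirit of Figure 3.7 (a sketch; plotting omitted):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)

z = rng.standard_normal(100_000)
y = z**2   # squared standard normal draws

# Empirical quantiles of Z^2 agree with chi-squared(1) quantiles.
for q in (0.5, 0.9, 0.99):
    print(q, np.quantile(y, q), chi2.ppf(q, df=1))
```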
Theorem 3.5.3. If U1, . . . , Un are iid chi-squared random variables with 1 degree of freedom, then V = U1 + . . . + Un ∼ χ²(n).
This is the reproductive property of the chi-squared distribution.
[Figure 3.8: Sampling Distribution of Sum of 10 chi-squared Random variables]

Sampling Distribution of S²
Let X1, . . . , Xn be a random sample drawn from a normal population with mean µ and variance σ². The sample variance
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
is a random variable. We have
$$\begin{aligned}
\sum_{i=1}^{n}(X_i - \mu)^2 &= \sum_{i=1}^{n}\left[(X_i - \bar{X}) + (\bar{X} - \mu)\right]^2\\
&= \sum_{i=1}^{n}(X_i - \bar{X})^2 + \sum_{i=1}^{n}(\bar{X} - \mu)^2 + 2(\bar{X} - \mu)\sum_{i=1}^{n}(X_i - \bar{X})\\
&= \sum_{i=1}^{n}(X_i - \bar{X})^2 + n(\bar{X} - \mu)^2,
\end{aligned}$$
since $\sum_{i=1}^{n}(X_i - \bar{X}) = 0$.
Dividing each term by σ² and substituting (n − 1)S² for $\sum_{i=1}^{n}(X_i - \bar{X})^2$, we have
$$\frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \mu)^2 = \frac{(n-1)S^2}{\sigma^2} + \frac{(\bar{X} - \mu)^2}{\sigma^2/n}.$$
We know that
$$\sum_{i=1}^{n}\frac{(X_i - \mu)^2}{\sigma^2}$$
is a chi-squared random variable with n degrees of freedom. We also know that
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1),$$
which implies that
$$\frac{(\bar{X} - \mu)^2}{\sigma^2/n}$$
is a chi-squared random variable with 1 degree of freedom. We then have the following result.
Theorem 3.5.4. Consider a random sample of size n from a normal population with mean µ and variance σ². Then
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2}$$
is a chi-squared random variable with n − 1 degrees of freedom.
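A quick simulation check of Theorem 3.5.4 (a sketch, with illustrative parameter values):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)

mu, sigma, n, reps = 10.0, 2.0, 8, 50_000
x = rng.normal(mu, sigma, size=(reps, n))

s2 = x.var(axis=1, ddof=1)        # sample variances S^2
stat = (n - 1) * s2 / sigma**2    # should behave like chi-squared(n - 1)

print(stat.mean(), chi2.mean(n - 1))                    # both approx 7
print(np.quantile(stat, 0.95), chi2.ppf(0.95, n - 1))   # close
```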
3.5.2 Student’s t-distribution

The Central Limit Theorem allows us to determine the sampling distribution of X̄ when σ is known. In many practical applications, the population variance is unknown and must be estimated from the data. Estimating σ introduces additional variability, and for small samples the resulting distribution deviates noticeably from the standard normal.
Theorem 3.5.5. Let Z ∼ N(0, 1) and V ∼ χ²(ν). If Z and V are independent, then
$$T = \frac{Z}{\sqrt{V/\nu}}$$
has a t-distribution with ν degrees of freedom.
Corollary 3.5.6. Let X1, . . . , Xn be a random sample from a normal population with mean µ and variance σ². Let
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad \text{and} \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2.$$
Then the random variable
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t-distribution with ν = n − 1 degrees of freedom.
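A simulation sketch of the corollary: for samples of size n = 5 from a normal population, the studentized mean matches t(4) quantiles rather than standard normal ones.

```python
import numpy as np
from scipy.stats import t, norm

rng = np.random.default_rng(4)

mu, n, reps = 0.0, 5, 100_000
x = rng.normal(mu, 1.0, size=(reps, n))

T = (x.mean(axis=1) - mu) / (x.std(axis=1, ddof=1) / np.sqrt(n))

q = 0.975
print(np.quantile(T, q))    # approx 2.78
print(t.ppf(q, df=n - 1))   # 2.776: the t(4) quantile matches
print(norm.ppf(q))          # 1.960: the normal quantile does not
```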
• The t-distribution is symmetric about 0.
• It is bell-shaped.
• The t-distribution is more variable than Z (the standard normal).
• The distribution is characterized by a single parameter ν, called the degrees of freedom (df).
• As the df increases, the t-distribution gets closer and closer to the normal curve.
• The result assumes the underlying population is normal; however, if the underlying population is not normal but is "nearly" bell-shaped, the distribution of T will still be approximately t.
• Tables of the percentage points of the t-distribution are available for different degrees of freedom.
[Figure 3.9: The t densities on 3 and 20 df, and the standard normal]
3.5.3 The F-distribution

Theorem 3.5.7. Let U ∼ χ²(ν1) and V ∼ χ²(ν2) be independent random variables. Then
$$F = \frac{U/\nu_1}{V/\nu_2}$$
has the F-distribution with ν1 and ν2 degrees of freedom.
The F-distribution is not symmetric.
Theorem 3.5.8. Let fα(ν1, ν2) be the value from the F-table that cuts off an area of α in the upper tail for ν1 and ν2 degrees of freedom. Then
$$f_{1-\alpha}(\nu_1, \nu_2) = \frac{1}{f_{\alpha}(\nu_2, \nu_1)}.$$
Theorem 3.5.9. Let S1² and S2² be the sample variances corresponding to two independent random samples of sizes n1 and n2 from normal populations with variances σ1² and σ2², respectively. Then
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
has an F-distribution with ν1 = n1 − 1 and ν2 = n2 − 1 degrees of freedom.
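Both results are easy to check numerically; a sketch using scipy:

```python
import numpy as np
from scipy.stats import f

# Theorem 3.5.8: f_alpha cuts off upper-tail area alpha, so in scipy terms
# f_alpha(v1, v2) = f.ppf(1 - alpha, v1, v2).
v1, v2, alpha = 5, 10, 0.05
print(f.ppf(alpha, v1, v2))           # f_{1-alpha}(v1, v2)
print(1 / f.ppf(1 - alpha, v2, v1))   # 1 / f_alpha(v2, v1): the same value

# Theorem 3.5.9: simulated variance ratio vs. the F(n1-1, n2-1) quantile
# (population standard deviations chosen for illustration).
rng = np.random.default_rng(5)
n1, n2, sigma1, sigma2 = 6, 11, 1.0, 2.0
x = rng.normal(0, sigma1, size=(50_000, n1))
y = rng.normal(0, sigma2, size=(50_000, n2))
ratio = (x.var(axis=1, ddof=1) / sigma1**2) / (y.var(axis=1, ddof=1) / sigma2**2)
print(np.quantile(ratio, 0.95), f.ppf(0.95, n1 - 1, n2 - 1))   # close
```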