MAS 206 Lecture Notes 2
What we call the central limit theorem actually comprises several theorems developed over
the years. The first such theorem was the discovery of the normal curve by Abraham De
Moivre in 1733, when he discovered the normal distribution as the limit of the binomial
distribution. The fact that the normal distribution appears as a limit of the binomial distribu-
tion as n increases is a form of the central limit theorem. Around the turn of the twentieth
century, Liapunov gave a more general form of the central limit theorem, and in 1922
Lindeberg gave the final form we use in applied statistics. In 1935, W. Feller showed that Lindeberg's condition is also necessary for the theorem to hold.
Let us now look at an example of the use of the central limit theorem.
Example -1
ABC Tool Company makes the Laser XR, a special engine used in speedboats. The company's
engineers believe that the engine delivers an average power of 220 horsepower and that the
standard deviation of power delivered is 15 horsepower. A potential buyer intends to test
100 engines (each engine to be run a single time). What is the probability that the sample
mean will be less than 217 horsepower?

Here our random variable $\bar{X}$ is normal (or at least approximately so, by the central limit
theorem, since the sample size is large), with mean 220 and variance $15^2/100$,

or $\bar{X} \sim N\left(220,\; \frac{15^2}{100}\right)$

So we can use the standard normal variable $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ to find the required probability,

$P(\bar{X} < 217) = P\left(Z < \frac{217 - 220}{15/\sqrt{100}}\right) = P(Z < -2) = 0.0228$
So there is a small probability that the potential buyer's tests will result in a sample mean less than 217 horsepower.
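As a quick check of this calculation, the sketch below (a minimal example, assuming Python with scipy.stats is available) evaluates the same normal probability for the sample mean.

```python
from scipy import stats
import math

mu, sigma, n = 220, 15, 100          # population mean, SD and sample size from Example 1
se = sigma / math.sqrt(n)            # standard error of the sample mean = 1.5
z = (217 - mu) / se                  # standardised value = -2.0
prob = stats.norm.cdf(z)             # P(Z < -2)
print(round(z, 2), round(prob, 4))   # -2.0 0.0228
```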
Let us now consider the sampling distribution of the sample proportion. Suppose a proportion p of the
population possesses a particular attribute that is of interest to us. This also implies that a proportion q (= 1 − p) of the
population does not possess the attribute of interest. If we pick a sample of size n with
replacement and find x successes in the sample, the sample proportion of successes ($\hat{p}$) is
given by

$\hat{p} = \frac{x}{n}$
Here x is a binomial random variable; the probability of observing x successes in n trials is

$P(x) = \binom{n}{x} p^x q^{\,n-x}$

Since $\hat{p} = x/n$ and n is fixed (determined before the sampling), the distribution of the number
of successes x also determines the distribution of values the random variable $\hat{p}$ may take when a sample of size n is drawn.
The expected value and the variance of x, i.e. the number of successes in a sample of size n, are
known to be:

$E(x) = np$

$Var(x) = npq$

Hence

$\mu_{\hat{p}} = E(\hat{p}) = E\left(\frac{x}{n}\right) = \frac{1}{n}E(x) = \frac{1}{n}\cdot np = p$

and

$\sigma^2_{\hat{p}} = Var(\hat{p}) = Var\left(\frac{x}{n}\right) = \frac{1}{n^2}Var(x) = \frac{1}{n^2}\cdot npq = \frac{pq}{n}$

$\sigma_{\hat{p}} = SD(\hat{p}) = \sqrt{\frac{pq}{n}}$
When sampling is without replacement, we use the finite population correction factor, so that

Mean: $\mu_{\hat{p}} = p$

Variance: $\sigma^2_{\hat{p}} = \frac{pq}{n}\cdot\frac{N-n}{N-1}$

Standard deviation: $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}\cdot\frac{N-n}{N-1}}$
As the sample size n increases, the central limit theorem applies here as well. The rate at
which the distribution approaches a normal distribution does depend, however, on the shape
of the underlying binomial distribution: the closer p is to 0 or to 1, the more skewed the distribution is and a
relatively large sample size is required to achieve a good normal approximation for
the distribution of $\hat{p}$.

In order to use the normal approximation for the sampling distribution of $\hat{p}$, the sample size
needs to be large. A commonly used rule of thumb says that the normal approximation is adequate when both np and nq are at least 5.

We now state the central limit theorem when sampling for the population proportion: the sampling distribution of $\hat{p}$ approaches a normal distribution with mean p and standard deviation $\sqrt{pq/n}$ as the sample size n
increases.

For "Large Enough" n: $\hat{p} \sim N\left(p,\; \frac{pq}{n}\right)$
The estimated standard deviation of $\hat{p}$ is also called its standard error. We demonstrate the use of this result in the following example.
Example -2
A manufacturer of screws has noticed that, on average, a proportion 0.02 of the screws produced
are defective. A random sample of 400 screws is examined for the proportion of defective
screws. Find the probability that the proportion of defective screws ($\hat{p}$) in the sample is
between 0.01 and 0.03.

Here p = 0.02, so q = 0.98 (= 1 − 0.02).

Since the population is infinite and the sample size is large, the central limit theorem
applies. So

$\hat{p} \sim N\left(p,\; \frac{pq}{n}\right)$, i.e. $\hat{p} \sim N\left(0.02,\; \frac{(0.02)(0.98)}{400}\right)$

We can find the required probability using the standard normal variable $Z = \frac{\hat{p} - p}{\sqrt{pq/n}}$:

$P(0.01 < \hat{p} < 0.03) = P\left(\frac{0.01 - 0.02}{\sqrt{(0.02)(0.98)/400}} < Z < \frac{0.03 - 0.02}{\sqrt{(0.02)(0.98)/400}}\right)$

$= P\left(\frac{-0.01}{0.007} < Z < \frac{0.01}{0.007}\right)$

$= P(-1.43 < Z < 1.43) = 2\,P(0 < Z < 1.43) = 0.8472$
So there is a very high probability that the sample will result in a proportion between 0.01
and 0.03.
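To double-check the normal approximation, here is a small sketch (again assuming Python with scipy.stats) that evaluates the same interval probability and, for comparison, the exact binomial probability of observing between 5 and 11 defectives out of 400 screws.

```python
from scipy import stats
import math

p, n = 0.02, 400
se = math.sqrt(p * (1 - p) / n)                      # standard error of p-hat ≈ 0.007

# Normal (CLT) approximation for P(0.01 < p-hat < 0.03)
approx = stats.norm.cdf(0.03, loc=p, scale=se) - stats.norm.cdf(0.01, loc=p, scale=se)

# Exact binomial check: 0.01 < x/400 < 0.03 means x = 5, 6, ..., 11 defectives
exact = stats.binom.cdf(11, n, p) - stats.binom.cdf(4, n, p)

print(round(approx, 4), round(exact, 4))             # approximation ≈ 0.847; exact value for comparison
```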
Let us now consider the sampling distribution of the difference of two sample means. Suppose we sample from two populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$. Let us consider independent random sampling from the populations, so that samples of sizes $n_1$ and $n_2$ are taken, and examine the sampling
distribution of $\bar{X}_1 - \bar{X}_2$.
$\mu_{\bar{X}_1 - \bar{X}_2} = E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2$

and

$\sigma^2_{\bar{X}_1 - \bar{X}_2} = Var(\bar{X}_1 - \bar{X}_2) = Var(\bar{X}_1) + Var(\bar{X}_2)$

$= \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$ ; when sampling is with replacement

$= \frac{\sigma_1^2}{n_1}\cdot\frac{N_1 - n_1}{N_1 - 1} + \frac{\sigma_2^2}{n_2}\cdot\frac{N_2 - n_2}{N_2 - 1}$ ; when sampling is without replacement
As the sample sizes $n_1$ and $n_2$ increase, the central limit theorem applies here as well. So we
state the central limit theorem when sampling for the difference of population means: when sampling is done from two populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, the sampling distribution of $\bar{X}_1 - \bar{X}_2$ approaches a normal distribution with mean $\mu_1 - \mu_2$ and standard deviation $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ as the sample sizes $n_1$ and $n_2$ increase.

For "Large Enough" $n_1$ and $n_2$: $\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\; \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$
Example -3
The makers of Duracell batteries claim that their size AA battery lasts, on average, 45
minutes longer than Duracell's main competitor, the Energizer. Two independent random
samples of 100 batteries of each kind are selected. Assuming $\sigma_1 = 84$ minutes and
$\sigma_2 = 67$ minutes, find the probability that the difference in the average lives of Duracell and
Energizer batteries in the two samples is less than 54 minutes.

Here

$\mu_1 - \mu_2 = 45$, $\sigma_1 = 84$, $\sigma_2 = 67$, and $n_1 = n_2 = 100$

Let $\bar{X}_1$ and $\bar{X}_2$ denote the sample average lives of the Duracell and Energizer batteries
respectively. Since the populations are infinite and the sample sizes are large, the central
limit theorem applies,

i.e. $\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\; \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$

$\bar{X}_1 - \bar{X}_2 \sim N\left(45,\; \frac{84^2}{100} + \frac{67^2}{100}\right)$
So we can find the required probability using the standard normal variable

$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

So $P(\bar{X}_1 - \bar{X}_2 < 54) = P\left(Z < \frac{54 - 45}{\sqrt{\frac{84^2}{100} + \frac{67^2}{100}}}\right) = P(Z < 0.84) = 1 - 0.20045 = 0.79955$
So there is a very high probability that the difference in the average lives of Duracell and Energizer batteries in the two samples will be less than 54 minutes.
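The same probability can be checked numerically; the sketch below (assuming Python with scipy.stats) standardises the difference of sample means and evaluates the normal probability.

```python
from scipy import stats
import math

mu_diff = 45                         # claimed difference in mean battery life (minutes)
sigma1, sigma2 = 84, 67              # population standard deviations
n1 = n2 = 100                        # sample sizes

se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # SD of X1bar - X2bar ≈ 10.74
z = (54 - mu_diff) / se                           # ≈ 0.84
prob = stats.norm.cdf(z)                          # ≈ 0.80 (the table value, with z rounded to 0.84, is 0.79955)
print(round(z, 2), round(prob, 4))
```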
Let us now consider the sampling distribution of the difference of two sample proportions. Suppose samples of sizes $n_1$ and $n_2$ are taken, by independent random sampling, from two specified binomial populations with proportions $p_1$ and $p_2$, and examine the sampling distribution of $\hat{p}_1 - \hat{p}_2$.

Mean and Variance of $\hat{p}_1 - \hat{p}_2$
$\mu_{\hat{p}_1 - \hat{p}_2} = E(\hat{p}_1 - \hat{p}_2) = E(\hat{p}_1) - E(\hat{p}_2) = p_1 - p_2$

and

$\sigma^2_{\hat{p}_1 - \hat{p}_2} = Var(\hat{p}_1 - \hat{p}_2) = Var(\hat{p}_1) + Var(\hat{p}_2)$

$= \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}$ ; when sampling is with replacement

$= \frac{p_1 q_1}{n_1}\cdot\frac{N_1 - n_1}{N_1 - 1} + \frac{p_2 q_2}{n_2}\cdot\frac{N_2 - n_2}{N_2 - 1}$ ; when sampling is without replacement
As the sample sizes $n_1$ and $n_2$ increase, the central limit theorem applies here as well. So we
state the central limit theorem when sampling for the difference of population proportions: the sampling distribution of $\hat{p}_1 - \hat{p}_2$ approaches a normal distribution with mean $p_1 - p_2$ and standard
deviation $\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$ as the sample sizes $n_1$ and $n_2$ increase.

For "Large Enough" $n_1$ and $n_2$: $\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2,\; \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}\right)$
The estimated standard deviation of $\hat{p}_1 - \hat{p}_2$ is also called its standard error. We demonstrate its use in the following example.
Example -4
Experience has shown that the proportions of defaulters (in tax payments) belonging to the business
class and the professional class are 0.20 and 0.15 respectively. The results of a sample survey
are:

Business class          Professional class
$p_1 = 0.20$            $p_2 = 0.15$
$n_1 = 400$             $n_2 = 420$
$\hat{p}_1 = 0.21$      $\hat{p}_2 = 0.14$

Find the probability of drawing two samples with a difference in the two sample proportions
larger than the one observed, i.e. larger than $\hat{p}_1 - \hat{p}_2 = 0.21 - 0.14 = 0.07$.

Since the populations are infinite and the sample sizes are large, the central limit theorem
applies, i.e.

$\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2,\; \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}\right)$

$\hat{p}_1 - \hat{p}_2 \sim N\left(0.05,\; \frac{(0.20)(0.80)}{400} + \frac{(0.15)(0.85)}{420}\right)$

Using the standard normal variable

$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$

$P(\hat{p}_1 - \hat{p}_2 > 0.07) = P\left(Z > \frac{0.07 - 0.05}{\sqrt{\frac{(0.20)(0.80)}{400} + \frac{(0.15)(0.85)}{420}}}\right) = P(Z > 0.75) = 0.2266$
So there is a fairly low probability of drawing two samples with a difference in the two sample proportions larger than 0.07.
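A quick numerical check of Example 4 (a sketch assuming Python with scipy.stats):

```python
from scipy import stats
import math

p1, n1 = 0.20, 400                  # business class: population proportion and sample size
p2, n2 = 0.15, 420                  # professional class: population proportion and sample size

se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # SD of p1hat - p2hat
z = (0.07 - (p1 - p2)) / se                               # ≈ 0.75
prob = 1 - stats.norm.cdf(z)                              # P(p1hat - p2hat > 0.07), close to the table value 0.2266
print(round(z, 2), round(prob, 4))
```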
So far we have seen that the various sampling distributions can be well approximated by a normal distribution for "Large Enough"
sample sizes. In other words, the Z-statistic is used in statistical inference when the sample size is
large. It may, however, be appreciated that the sample size may be prevented from being
large either due to physical limitations or due to practical difficulties, such as sampling costs being
too high. Consequently, for our statistical inferences, we may often have to content
ourselves with a small sample size and limited information. The consequences of the sample size being small are discussed below.

Thus, the basic difference which the sample size makes is that while the sampling
distributions based on large samples are approximately normal and the sample variance S2 can be
treated as a reliable estimate of σ2, the same does not hold when the sample is small.
It may be appreciated that the small sampling distributions are also known as exact sampling
distributions, as the statistical inferences based on them are not subject to approximation.
However, the assumption that the population is normal is the basic qualification underlying these exact distributions.

In the category of small sampling distributions, the binomial and Poisson distributions have
already been discussed. Now we will discuss three more important small sampling
distributions – the chi-square, the F and the Student's t-distribution. The purpose of discussing
these distributions at this stage is limited to understanding the variables which define
them and their essential properties. The applications of these distributions will be highlighted in the lessons that follow.
The concept of degrees of freedom (df) is important for many statistical calculations and
probability distributions. We may define the df associated with a sample statistic as the number
of observations contained in a set of sample data which can be freely chosen. It refers to the
number of independent observations which can vary freely without being constrained by the
restrictions imposed on the data.

Let $x_1, x_2, \ldots, x_n$ be n observations comprising a sample whose mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is a value
known to us. Obviously, we are free to assign any values to n − 1 observations out of the n
observations. Once values are freely assigned to n − 1 observations, the freedom to do the same
for the nth observation is lost and its value is automatically determined as

$x_n = n\bar{x} - \sum_{i=1}^{n-1} x_i$

since $\sum_{i=1}^{n} x_i = n\bar{x}$.
We say that one degree of freedom is lost, and the sum $n\bar{x}$ of the n observations has n − 1 df.

For example, if the sum of four observations is 10, we are free to assign values to three
observations only, say, $x_1 = 2$, $x_2 = 1$ and $x_3 = 4$. Given these values, the value of the fourth
observation is automatically determined:

$x_4 = \sum_{i=1}^{4} x_i - (x_1 + x_2 + x_3) = 10 - (2 + 1 + 4) = 3$
Sampling essentially consists of defining various sample statistics and making use of them in
estimating the corresponding population parameters. In this respect, degrees of freedom may
be defined as the number of observations n minus the number m of parameters estimated in the process, i.e. df = n − m.

For example, to estimate the population variance σ2 we use a particular value of its estimator S2, the sample variance. The number of observations in the
sample being n, df = n − m = n − 1 because σ2 is the only parameter (i.e. m = 1) to be
estimated. We first examine the estimation of σ2 and then present the chi-square distribution, which helps us in working out the sampling distribution of S2.
By now it is implicitly clear that we use the sample mean to estimate the population mean
and sample proportion to estimate the population proportion, when those parameters are
unknown. Similarly, we use a sample statistic called the sample variance to estimate the
population variance.
As we will see in the next lesson on Statistical Estimation, a sample statistic is an unbiased
estimator of a population parameter when the expected value of the sample statistic is equal to
the corresponding population parameter. However, it can be shown that if, while calculating S2, we divide the sum of squared
deviations from the mean (SSD), i.e. $\sum_{i=1}^{n}(x_i - \bar{x})^2$, by n, the result will not be an unbiased estimator of σ2, since

$E\left(\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}\right) = \frac{n-1}{n}\,\sigma^2 = \sigma^2 - \frac{\sigma^2}{n}$

That is, dividing the SSD by n underestimates σ2, on average, by σ2/n. To compensate for this downward bias we divide by n − 1, so that

$S^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$

is an unbiased estimator of the population variance σ2, and we have

$E\left(\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}\right) = \sigma^2$

In other words, to get an unbiased estimator of the population variance σ2, we divide the
sum $\sum_{i=1}^{n}(x_i - \bar{x})^2$ by the degrees of freedom n − 1.
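The downward bias of the n-divisor can be seen in a short simulation. The sketch below (assuming Python with numpy) repeatedly draws small samples from a normal population and compares the average of the two variance estimates with σ2.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 50.0, 10.0, 5, 200_000   # small samples make the bias visible

samples = rng.normal(mu, sigma, size=(reps, n))
ssd = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

biased = ssd / n            # divides SSD by n
unbiased = ssd / (n - 1)    # the sample variance S^2

# Averages over many repetitions approximate the expected values:
# biased.mean() should be close to ((n-1)/n)*sigma^2 = 80,
# unbiased.mean() should be close to sigma^2 = 100.
print(round(biased.mean(), 1), round(unbiased.mean(), 1))
```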
THE χ2 (CHI-SQUARE) DISTRIBUTION

Consider a normal population with mean μ and variance σ2 consisting of the N values

$X = \{X_1, X_2, \ldots, X_N\}$

We may draw a random sample of size n comprising the values $x_1, x_2, \ldots, x_n$ from this population.
As brought out in the previous section, each of the n sample values $x_1, x_2, \ldots, x_n$ can be treated as an
independent normal random variable with mean μ and variance σ2. In other words,

$Z_i = \frac{x_i - \mu}{\sigma} \sim N(0, 1)$  where i = 1, 2, ..., n

Now consider the sum of squares of these n standard normal variables,

$U = Z_1^2 + Z_2^2 + \cdots + Z_n^2 = \sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2$

which will take different values in repeated random sampling. Obviously, U is a random
variable. It is called a chi-square variable, denoted by χ2. Thus the chi-square random
variable is the sum of several independent, squared standard normal random variables.
The probability density function of the χ2 distribution is

$f(\chi^2)\, d\chi^2 = C\, e^{-\frac{1}{2}\chi^2}\, (\chi^2)^{\frac{n}{2}-1}\, d\chi^2$   for $\chi^2 \ge 0$

where e is the base of the natural logarithm, n denotes the sample size (or the number of
independent normal random variables), and C is a constant so determined that the total area
under the χ2 distribution is unity. χ2 values are determined in terms of the degrees of freedom, df = n.
Properties of χ2 Distribution

1. The χ2 distribution is completely defined by a single parameter, the degrees of freedom df, so there is a different χ2 distribution for each value of df.
2. Since the χ2 variable does not involve any population parameter, its distribution is
a non-parametric distribution.
3. As a sum of squares the χ2 random variable cannot be negative and is, therefore,
confined to the positive side of zero; the distribution is skewed to the right.
4. The mean of a χ2 distribution is equal to the degrees of freedom df. The variance of
a χ2 distribution is equal to twice the degrees of freedom, i.e. 2df.
5. As the degrees of freedom increase, the χ2 distribution looks more and more like a normal. Thus for large df

$\chi^2 \sim N(n, 2n)$

In general, for n ≥ 30, the probability of χ2 taking a value greater than or less than a particular value can be approximated by using the normal area tables.
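As an illustration of property 5, the sketch below (assuming Python with scipy.stats) compares an exact χ2 tail probability with its normal approximation N(n, 2n) for df = 100.

```python
from scipy import stats
import math

df = 100
x = 120                                                      # an arbitrary cut-off value
exact = stats.chi2.sf(x, df)                                 # exact P(chi-square > 120)
approx = stats.norm.sf(x, loc=df, scale=math.sqrt(2 * df))   # N(100, 200) approximation
print(round(exact, 4), round(approx, 4))                     # the two tail probabilities are reasonably close
```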
To derive the sampling distribution of the sample variance S2, we can write

$\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n}\left[(x_i - \bar{x}) + (\bar{x} - \mu)\right]^2$

$= \frac{1}{\sigma^2}\sum_{i=1}^{n}\left[(x_i - \bar{x})^2 + (\bar{x} - \mu)^2 + 2(x_i - \bar{x})(\bar{x} - \mu)\right]$

$= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x})^2 + \frac{1}{\sigma^2}\sum_{i=1}^{n}(\bar{x} - \mu)^2 + \frac{2}{\sigma^2}(\bar{x} - \mu)\sum_{i=1}^{n}(x_i - \bar{x})$

$= \frac{(n-1)S^2}{\sigma^2} + \left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right)^2$

$\left[\text{since } \sum_{i=1}^{n}(x_i - \bar{x})^2 = (n-1)S^2,\; \sum_{i=1}^{n}(\bar{x} - \mu)^2 = n(\bar{x} - \mu)^2 \text{ and } \sum_{i=1}^{n}(x_i - \bar{x}) = 0\right]$
Now, we know that the LHS of the above equation is a random variable which has a chi-square
distribution with df = n. We also know that

$\bar{x} \sim N\left(\mu,\; \frac{\sigma^2}{n}\right)$

Then $\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right)^2$ will have a chi-square distribution with df = 1.

It follows that the remaining term, $\frac{(n-1)S^2}{\sigma^2}$, has a chi-square
distribution with df = n − 1. One degree of freedom is lost because all the deviations are
measured from the sample mean $\bar{x}$, which is itself subject to the constraint $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$. This result allows us to obtain the mean and variance
of S2 directly.
Since $\frac{(n-1)S^2}{\sigma^2}$ has a chi-square distribution with df = n − 1,

$E\left[\frac{(n-1)S^2}{\sigma^2}\right] = n - 1$

$\frac{n-1}{\sigma^2}E(S^2) = n - 1$

$E(S^2) = \sigma^2$
Also,

$Var\left[\frac{(n-1)S^2}{\sigma^2}\right] = 2(n-1)$

$E\left[\frac{(n-1)S^2}{\sigma^2} - E\left(\frac{(n-1)S^2}{\sigma^2}\right)\right]^2 = 2(n-1)$

or $E\left[\frac{(n-1)S^2}{\sigma^2} - (n-1)\right]^2 = 2(n-1)$

or $E\left[\frac{(n-1)^2 S^4}{\sigma^4} + (n-1)^2 - \frac{2(n-1)^2 S^2}{\sigma^2}\right] = 2(n-1)$

or $\frac{(n-1)^2}{\sigma^4}\, E\left[S^4 + \sigma^4 - 2S^2\sigma^2\right] = 2(n-1)$

or $\frac{(n-1)^2}{\sigma^4}\, E\left(S^2 - \sigma^2\right)^2 = 2(n-1)$

or $E\left(S^2 - \sigma^2\right)^2 = \frac{2(n-1)}{(n-1)^2}\,\sigma^4$

So $Var(S^2) = \frac{2\sigma^4}{n-1}$
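These two results, E(S2) = σ2 and Var(S2) = 2σ4/(n − 1), can be checked by simulation; a minimal sketch (assuming Python with numpy) is given below.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 0.0, 3.0, 10, 200_000   # normal population, small sample size

samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)             # sample variance S^2 with divisor n-1

# Theory: E(S^2) = sigma^2 = 9 and Var(S^2) = 2*sigma^4/(n-1) = 162/9 = 18
print(round(s2.mean(), 2), round(s2.var(), 2))
```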
It may be noted that the conditions necessary for the central limit theorem to be operative in
the case of the sample variance S2 are quite restrictive. For the sampling distribution of S2 to be
approximately normal we require not only that the parent population is normal, but also that the sample size is large.
Example -5
In an automated process, a machine fills cans of coffee. The variance of the filling process is
known to be 30. In order to keep the process in control, from time to time regular checks of
the variance of the filling process are made. This is done by randomly sampling filled cans,
measuring their amounts and computing the sample variance. A random sample of 101 cans
is selected for the purpose. What is the probability that the sample variance is between 21.28
and 38.72?
Solution: We have

Population variance σ2 = 30 and n = 101

The statistic $\chi^2 = \frac{(n-1)S^2}{\sigma^2}$ follows a chi-square distribution with df = n − 1 = 100. Therefore

$P(21.28 < S^2 < 38.72) = P\left(\frac{(100)(21.28)}{30} < \chi^2 < \frac{(100)(38.72)}{30}\right) = P(70.93 < \chi^2 < 129.07)$

From the χ2 table for 100 df, $P(\chi^2 > 70.06) = 0.990$ and $P(\chi^2 > 129.56) = 0.025$, so the required probability is

≈ 0.990 − 0.025 = 0.965
Since our population is normal and the sample size is quite large, we can also approximate the
required probability using the normal distribution of S2.

We have $S^2 \sim N\left(\sigma^2,\; \frac{2\sigma^4}{n-1}\right)$

So $P(21.28 < S^2 < 38.72) = P\left(\frac{21.28 - \sigma^2}{\sqrt{2\sigma^4/(n-1)}} < Z < \frac{38.72 - \sigma^2}{\sqrt{2\sigma^4/(n-1)}}\right)$

$= P\left(\frac{21.28 - 30}{\sqrt{2(30)(30)/(101-1)}} < Z < \frac{38.72 - 30}{\sqrt{2(30)(30)/(101-1)}}\right)$

$= P\left(\frac{-8.72}{4.24} < Z < \frac{8.72}{4.24}\right)$

$= P(-2.06 < Z < 2.06) = 0.9606$

which is close to the exact value of 0.965 obtained above.
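The exact chi-square probability and its normal approximation in Example 5 can both be evaluated directly; a sketch (assuming Python with scipy.stats) follows.

```python
from scipy import stats
import math

sigma2, n = 30, 101
lo, hi = 21.28, 38.72

# Exact: (n-1)S^2/sigma^2 follows chi-square with n-1 = 100 df
exact = (stats.chi2.cdf((n - 1) * hi / sigma2, n - 1)
         - stats.chi2.cdf((n - 1) * lo / sigma2, n - 1))

# Normal approximation: S^2 ~ N(sigma^2, 2*sigma^4/(n-1))
sd = math.sqrt(2 * sigma2**2 / (n - 1))
approx = stats.norm.cdf(hi, loc=sigma2, scale=sd) - stats.norm.cdf(lo, loc=sigma2, scale=sd)

print(round(exact, 4), round(approx, 4))   # both should be close to the values found above
```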
THE F-DISTRIBUTION
Let us assume two normal populations with variances $\sigma_1^2$ and $\sigma_2^2$ respectively. For a random
sample of size $n_1$ drawn from the first population, we have the chi-square variable

$\chi_1^2 = \frac{(n_1 - 1)S_1^2}{\sigma_1^2}$

with $v_1 = n_1 - 1$ degrees of freedom. Similarly, for a random sample of size $n_2$ drawn from the second population, we have the chi-square variable

$\chi_2^2 = \frac{(n_2 - 1)S_2^2}{\sigma_2^2}$

with $v_2 = n_2 - 1$ degrees of freedom. Then the ratio

$F = \frac{\chi_1^2 / v_1}{\chi_2^2 / v_2}$

is a random variable known as the F statistic, named in honor of the English statistician Sir
Ronald A. Fisher.

Being a random variable, it has a probability distribution, which is known as the F distribution with $v_1$ and $v_2$ degrees of freedom. The F random variable is thus the ratio of two chi-square random
variables that are independent of each other, each of which is divided by its own degrees of freedom.
Properties of F-Distribution

1. The F distribution is defined by two kinds of degrees of freedom, $v_1$ of the numerator and $v_2$ of the denominator, with the degrees of
freedom of the numerator always listed as the first item in the parentheses and the
degrees of freedom of the denominator always listed as the second item in the
parentheses. So there are a large number of F distributions, one for each pair of $v_1$ and $v_2$.
2. As a ratio of two squared quantities, the F random variable cannot be negative and is,
therefore, confined to the positive side of zero; the distribution is skewed to the right.
3. The $F_{(v_1, v_2)}$ distribution has no mean for $v_2 \le 2$ and no variance for $v_2 \le 4$. However, for $v_2 > 2$ the mean and for $v_2 > 4$ the variance are given as

$E(F_{(v_1, v_2)}) = \frac{v_2}{v_2 - 2}$    $Var(F_{(v_1, v_2)}) = \frac{2v_2^2 (v_1 + v_2 - 2)}{v_1 (v_2 - 2)^2 (v_2 - 4)}$

4. As the degrees of freedom $v_1$ and $v_2$ increase, the F distribution looks more and more like a normal. In general, for $v_2 \ge 30$, the probability
of F taking a value greater than or less than a particular value can be approximated by
using the normal area tables.
5. The F distributions defined as $F_{(v_1, v_2)}$ and as $F_{(v_2, v_1)}$ are reciprocals of each other,

i.e. $F_{(v_1, v_2)} = \frac{1}{F_{(v_2, v_1)}}$
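The mean formula and the reciprocal property can be verified numerically; the sketch below (assuming Python with scipy.stats) does this for the arbitrary choice v1 = 8 and v2 = 10.

```python
from scipy import stats

v1, v2 = 8, 10

# Property 3: mean of F(v1, v2) is v2/(v2 - 2) when v2 > 2
print(stats.f.mean(v1, v2), v2 / (v2 - 2))          # both should equal 1.25

# Property 5: an upper percentile of F(v1, v2) equals 1 / the corresponding lower percentile of F(v2, v1)
upper = stats.f.ppf(0.95, v1, v2)
lower = stats.f.ppf(0.05, v2, v1)
print(round(upper, 4), round(1 / lower, 4))          # the two values should agree
```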
THE t-DISTRIBUTION
Let us assume a normal population with mean μ and variance $\sigma^2$. If $x_i$ represents the n values
of a random sample drawn from this population, then, as before,

$Z_i = \frac{x_i - \mu}{\sigma} \sim N(0, 1)$  where i = 1, 2, ..., n

and

$U = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2\ (n-1\ \text{df})$
Consider now the ratio

$T = \frac{\dfrac{x_i - \mu}{\sigma}}{\sqrt{\dfrac{1}{n-1}\cdot\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sigma^2}}}$

$T = \frac{x_i - \mu}{\sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}}$

$T = \frac{x_i - \mu}{S}$

This statistic - the ratio of a standard normal variable Z to the square root of a χ2
variable divided by its degrees of freedom - is known as the 't' statistic or Student's 't' statistic,
named after the pen name 'Student' of W. S. Gosset, who discovered the distribution of the quantity.

The random variable $\frac{x_i - \mu}{S}$ follows the t-distribution with n − 1 degrees of freedom,

$\frac{x_i - \mu}{S} \sim t\ (n-1\ \text{df})$  where i = 1, 2, ..., n
We know that $\bar{X} \sim N\left(\mu,\; \frac{\sigma^2}{n}\right)$

So $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$

Putting $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ in place of $\frac{x_i - \mu}{\sigma}$ in T, we get

$T = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{1}{n-1}\cdot\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sigma^2}}}$

or $T = \frac{\bar{X} - \mu}{\sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n(n-1)}}}$

or $T = \frac{\bar{X} - \mu}{\dfrac{1}{\sqrt{n}}\sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}}$

or $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$

When defined as above, T again follows the t-distribution with n − 1 degrees of freedom,

$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t\ (n-1\ \text{df})$
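A small simulation (a sketch assuming Python with numpy and scipy) can confirm that the statistic $(\bar{X} - \mu)/(S/\sqrt{n})$ behaves like a t variable with n − 1 df rather than a standard normal variable when n is small.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma, n, reps = 100.0, 15.0, 5, 100_000      # deliberately small samples

x = rng.normal(mu, sigma, size=(reps, n))
t_stat = (x.mean(axis=1) - mu) / (x.std(axis=1, ddof=1) / np.sqrt(n))

# Compare the simulated tail frequency with the t(n-1) and N(0,1) tail probabilities
cutoff = 2.0
print(round((t_stat > cutoff).mean(), 4))         # simulated P(T > 2)
print(round(stats.t.sf(cutoff, n - 1), 4))        # t with 4 df: ≈ 0.0581
print(round(stats.norm.sf(cutoff), 4))            # standard normal: ≈ 0.0228
```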
Properties of t-Distribution

1. The t-distribution, like the Z distribution, is unimodal, symmetric about its mean 0, and bell-shaped; the t variable can take values from −∞ to +∞.
2. The t-distribution is defined by the degrees of freedom v = n − 1; the df associated with
the distribution are the df associated with the sample standard deviation.
3. The t-distribution has no mean for n = 2, i.e. for v = 1, and no variance for n ≤ 3, i.e. for v
≤ 2. However, for v > 1 the mean and for v > 2 the variance are given as

$E(T) = 0$    $Var(T) = \frac{v}{v-2}$

4. The variance of the t-distribution, $\frac{v}{v-2}$, is always greater than 1, so the t variable is more
variable than the Z variable, whose variance is 1. This follows from the fact that
while Z values vary from sample to sample owing to changes in $\bar{X}$ alone, the t values vary owing to changes in both $\bar{X}$ and S.
5. As the degrees of freedom increase, the t-distribution looks more and more like the Z distribution. In
general, for n ≥ 30, the variance of the t-distribution is approximately the same as that of the Z distribution.