Lecture 6: SAMPLING DISTRIBUTION
Reading materials: Chap 9 (Keller)

Outline
• Distribution of sample means
• The central limit theorem

Distribution of Samples: example (1)
• Data were collected on the time taken for a pizza order to be completed, in minutes (from order taken to pizza handed over to customer). Below is a histogram of 50 observations and some summary statistics.
[Figure: histogram of 50 observations; x-axis: Pizza time, y-axis: Frequency]
Variable     N      Mean     Median   StDev
Pizza time   50     17.256   17.041   3.743

Another 50 observations; 1000 observations, on the time to complete a pizza order (2)
[Figure: histograms of another 50 observations and of 1000 observations; x-axis: Pizza time, y-axis: Frequency]
Variable     N      Mean     Median   StDev
Pizza time   50     17.585   17.374   3.872
Pizza time   1000   17.934   17.627   4.009

10,000 observations on the time to complete a pizza order (3)
[Figure: histogram of 10,000 observations; x-axis: Pizza time, y-axis: Frequency]
Variable     N       Mean     Median   StDev
Pizza time   10000   18.046   17.744   4.006

General notice
• When the sample size gets large (tends to infinity), the histogram of the sample settles into a smooth shape that reflects the underlying population distribution; here that shape is approximately normal.
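
The pattern in the three pizza-time tables can be reproduced with simulated data. The sketch below is only illustrative: the lecture does not state the model behind the pizza times, so a gamma distribution with mean 18 and standard deviation 4 is assumed here, and `summarize` is a hypothetical helper name.

```python
import random
import statistics

# Assumed population (not given in the lecture): gamma with mean 18, SD 4.
MEAN, SD = 18.0, 4.0
SHAPE = (MEAN / SD) ** 2      # gamma shape parameter
SCALE = SD ** 2 / MEAN        # gamma scale parameter


def summarize(n, rng):
    """Draw n simulated pizza times and return (mean, median, stdev)."""
    data = [rng.gammavariate(SHAPE, SCALE) for _ in range(n)]
    return statistics.mean(data), statistics.median(data), statistics.stdev(data)


rng = random.Random(6)  # fixed seed so the run is repeatable
print(f"{'N':>6} {'Mean':>8} {'Median':>8} {'StDev':>8}")
for n in (50, 1000, 10000):
    mean, median, sd = summarize(n, rng)
    print(f"{n:>6} {mean:8.3f} {median:8.3f} {sd:8.3f}")
```

With larger N the printed mean and standard deviation should wander less around the assumed 18 and 4, which is the behaviour the three tables suggest.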

Distribution of sample means
• One thousand datasets, each with 10 observations (that is, 1000 samples of size 10), are generated (simulated data) from this model. For each sample, the average (sample mean), the median (sample median) and the sample standard deviation are calculated and recorded.
[Figure: histograms of the 1000 sample averages and the 1000 sample medians; y-axis: Frequency]
Variable   N      Mean     Median   StDev
average    1000   18.007   18.020   1.231
median     1000   17.757   17.804   1.433

S.D for the 1000 random samples of size 10
[Figure: histogram of the 1000 sample standard deviations (stdev); y-axis: Frequency]
Variable   N      Mean     Median   StDev
stdev      1000   3.8183   3.7282   0.9505
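
The simulation described above can be sketched as follows. The population model is an assumption (a gamma distribution with mean 18 and standard deviation 4, chosen only to resemble the pizza times), and `sampling_distributions` is a hypothetical helper name; the printed numbers will not match the slide exactly.

```python
import random
import statistics

# Assumed population (not given in the lecture): gamma with mean 18, SD 4.
MEAN, SD = 18.0, 4.0
SHAPE = (MEAN / SD) ** 2
SCALE = SD ** 2 / MEAN


def sampling_distributions(n_samples, size, rng):
    """Return lists of sample means, medians and SDs over n_samples samples."""
    means, medians, sds = [], [], []
    for _ in range(n_samples):
        sample = [rng.gammavariate(SHAPE, SCALE) for _ in range(size)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
        sds.append(statistics.stdev(sample))
    return means, medians, sds


rng = random.Random(2024)  # fixed seed so the run is repeatable
means, medians, sds = sampling_distributions(1000, 10, rng)

# Same layout as the summary tables: one row per recorded statistic.
print(f"{'Variable':<8} {'N':>5} {'Mean':>8} {'Median':>8} {'StDev':>8}")
for name, values in (("average", means), ("median", medians), ("stdev", sds)):
    print(f"{name:<8} {len(values):>5} {statistics.mean(values):8.3f} "
          f"{statistics.median(values):8.3f} {statistics.stdev(values):8.3f}")
```

Changing the second argument from 10 to 25 gives the corresponding experiment with larger samples.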

More random numbers
• Another thousand datasets are generated from the same model, but this time each dataset has 25 observations.
[Figure: histograms of the 1000 sample averages and the 1000 sample medians; y-axis: Frequency]
Variable   N      Mean     Median   StDev
average    1000   17.991   17.982   0.814
median     1000   17.711   17.675   1.017

S.D for samples of size 25
[Figure: histogram of the 1000 sample standard deviations (stdev); y-axis: Frequency]
Variable   N      Mean     Median   StDev
stdev      1000   3.9637   3.9391   0.6048

Notice, as we take larger samples…
• The histograms for all three statistics (sample mean, sample median and sample standard deviation) are becoming more and more symmetric and bell-shaped, and less variable, particularly those for the sample mean.
• Also notice that the estimated standard deviation of the sample mean is not only decreasing as the sample size increases, but is also approximately the same for the same sample sizes.

A general result of great importance
No matter what model a random sample is taken from, as the sample size (number of random observations) increases, the distribution of the sample mean becomes closer and closer to the normal distribution.
And no matter what model a random sample is taken from, and for any sample size n, the standard deviation of the sample mean is the model standard deviation $\sigma$ (the theoretical standard deviation) divided by $\sqrt{n}$, that is, $\sigma/\sqrt{n}$. This is called the standard error of the mean (SE).
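
The $\sigma/\sqrt{n}$ rule can be checked numerically. The sketch below assumes an exponential population with mean 2 (so $\sigma = 2$), picked only because it is clearly non-normal; `sd_of_sample_means` is a hypothetical helper name.

```python
import math
import random
import statistics

# Assumed population: exponential with mean 2, so sigma is also 2.
SIGMA = 2.0


def sd_of_sample_means(size, n_samples, rng):
    """Empirical SD of the sample mean over n_samples simulated samples."""
    means = [
        statistics.mean([rng.expovariate(1 / SIGMA) for _ in range(size)])
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)


rng = random.Random(11)  # fixed seed so the run is repeatable
for n in (10, 25, 100):
    empirical = sd_of_sample_means(n, 2000, rng)
    theoretical = SIGMA / math.sqrt(n)
    print(f"n={n:>3}  empirical SE={empirical:.3f}  sigma/sqrt(n)={theoretical:.3f}")
```

The two columns should agree closely at every n. The same check against the slide tables, taking $\sigma \approx 4$ from the 10,000-observation summary, gives $4/\sqrt{10} \approx 1.26$ and $4/\sqrt{25} = 0.80$, close to the recorded 1.231 and 0.814.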

The Central Limit Theorem (1)
• Whatever the population distribution looks like (normal or not), when the sample size is large enough, the distribution of sample means will be approximately normal, and we can use the Z-statistic to calculate the probability of any mean value.
[Figure: The Central Limit Theorem (2)]
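
This is the Central Limit Theorem
• If X is a random variable with mean $\mu$ and variance $\sigma^2$, and $\bar{X}$ is the mean of a random sample of size n, then in general,
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ (approximately, for large n)
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}, \qquad Z \sim N(0, 1) \text{ as } n \to \infty.$
[Annotation on the slide: Sampling error]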
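
A quick numerical check of the standardization: the sketch below assumes a uniform population on [0, 1] (mean 0.5, variance 1/12) and a sample size of 30, both chosen only for illustration, and `z_statistics` is a hypothetical helper name.

```python
import math
import random
import statistics

# Assumed population: uniform on [0, 1], which is clearly not normal.
MU = 0.5
SIGMA = math.sqrt(1 / 12)


def z_statistics(size, n_samples, rng):
    """Standardize each sample mean: Z = (xbar - mu) / (sigma / sqrt(n))."""
    se = SIGMA / math.sqrt(size)
    zs = []
    for _ in range(n_samples):
        xbar = statistics.fmean([rng.random() for _ in range(size)])
        zs.append((xbar - MU) / se)
    return zs


rng = random.Random(15)  # fixed seed so the run is repeatable
zs = z_statistics(size=30, n_samples=5000, rng=rng)
inside = sum(abs(z) <= 1.96 for z in zs) / len(zs)

print(f"mean of Z      = {statistics.fmean(zs):.3f}  (N(0,1) has 0)")
print(f"SD of Z        = {statistics.stdev(zs):.3f}  (N(0,1) has 1)")
print(f"P(|Z| <= 1.96) = {inside:.3f}  (N(0,1) has about 0.950)")
```

If the standardized means behave like N(0, 1), the three printed values should sit near 0, 1 and 0.95.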

So, how large does n need to be?
• It depends on the original distribution of X.
– If X has a normal distribution, then the sample mean has a normal distribution for all sample sizes.
– If X has a distribution that is close to normal, the approximation is good for small sample sizes (e.g. n=20).
– If X has a distribution that is far from normal, the approximation requires larger sample sizes (e.g. n=50).
[Figure: In general]
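
How quickly the sample mean becomes bell-shaped for different populations can be probed by simulation. The sketch below uses skewness (0 for a symmetric distribution) as a rough measure of non-normality; the three populations, the sample sizes and the helper names `skewness` and `skew_of_sample_means` are illustrative assumptions, not taken from the lecture.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: average cubed z-score (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def skew_of_sample_means(draw, size, n_samples=4000):
    """Skewness of the simulated sampling distribution of the mean."""
    means = [
        statistics.fmean([draw() for _ in range(size)])
        for _ in range(n_samples)
    ]
    return skewness(means)


rng = random.Random(17)  # fixed seed so the run is repeatable
populations = {
    "normal": lambda: rng.gauss(0, 1),            # already normal
    "uniform": lambda: rng.random(),              # symmetric but not normal
    "exponential": lambda: rng.expovariate(1.0),  # strongly right-skewed
}
for name, draw in populations.items():
    row = "  ".join(
        f"n={n}: {skew_of_sample_means(draw, n):+.2f}" for n in (5, 20, 50)
    )
    print(f"{name:<12} {row}")
```

For the symmetric populations the skewness should hover near zero at every n, while for the exponential population it should shrink toward zero only as n grows. Skewness is just one symptom of non-normality, so this is a rough guide rather than a rule for choosing n.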
Activity 1
• The average height of Vietnamese women is 1.6m,
with a standard deviation of 0.2m. If I choose 25
women at random, what is the probability that their
average height is less than 1.53m?
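
One way to work Activity 1, assuming the mean height of the 25 women can be treated as approximately normal with standard error $\sigma/\sqrt{n}$ (the variable names are illustrative):

```python
import math
from statistics import NormalDist

mu = 1.6          # population mean height in metres (from the activity)
sigma = 0.2       # population standard deviation in metres
n = 25            # sample size
threshold = 1.53  # we want P(sample mean < threshold)

se = sigma / math.sqrt(n)      # standard error of the mean: 0.2 / 5 = 0.04
z = (threshold - mu) / se      # Z-statistic: -0.07 / 0.04 = -1.75
prob = NormalDist().cdf(z)     # P(Z < z) under the standard normal

print(f"SE = {se:.3f}, Z = {z:.2f}, P(mean < {threshold}) = {prob:.4f}")
```

Under these assumptions the probability comes out at roughly 0.04, that is, about a 4% chance that the average height of the 25 women is below 1.53m.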