Lecture 6
BUSINESS STATISTICS SAMPLING DISTRIBUTION
Advanced Educational Program
Reading materials:
Chapter 9 (Keller)
Outline
• Distribution of sample means
• The central limit theorem

Distribution of Samples: example (1)
• Data were collected on the time taken for a pizza order to be
completed in minutes (from order taken to pizza handed over
to customer). Below is a histogram of 50 observations and
some summary statistics.
[Histogram: Frequency against Pizza time (minutes), 50 observations]
Variable N Mean Median StDev
Pizza time 50 17.256 17.041 3.743
Another 50 observations on the time to complete a pizza order (2)
[Histogram: Frequency against Pizza time, 50 observations]
Variable N Mean Median StDev
Pizza time 50 17.585 17.374 3.872

1000 observations, 10,000 observations on the time to complete a pizza order (3)
[Histograms: Frequency against Pizza time, one for 1000 and one for 10,000 observations]
Variable N Mean Median StDev
Pizza time 1000 17.934 17.627 4.009
Pizza time 10000 18.046 17.744 4.006
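The effect of collecting more observations can be imitated with simulated data. The sketch below is only an illustration: the slides do not state the model behind the pizza times, so a normal model with mean 18 and standard deviation 4 minutes is assumed here because it is close to the 10,000-observation summary. The names `MODEL_MEAN`, `MODEL_SD` and `summarize` are hypothetical.

```python
import random
import statistics

# Assumption: the true model is not given in the slides; a normal model with
# mean 18 and SD 4 minutes is used purely for illustration.
MODEL_MEAN, MODEL_SD = 18.0, 4.0

def summarize(n, rng):
    """Draw n simulated pizza times and return (mean, median, stdev)."""
    times = [rng.gauss(MODEL_MEAN, MODEL_SD) for _ in range(n)]
    return statistics.mean(times), statistics.median(times), statistics.stdev(times)

rng = random.Random(6)  # fixed seed so the run is repeatable
summaries = {n: summarize(n, rng) for n in (50, 1000, 10_000)}
for n, (mean, median, sd) in summaries.items():
    print(f"N={n:>6}  mean={mean:.3f}  median={median:.3f}  stdev={sd:.3f}")
```

With larger N the summary statistics should settle closer to the assumed model values, in the same way as the three tables above.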
General notice
• When the sample size gets large (tends to infinity), the
distribution of the sample approaches the distribution of the
underlying model, which here is approximately normal.

Distribution of sample means
• One thousand datasets, each with 10 observations in it (that
is, 1 thousand samples of size 10) are generated (simulated
data) from this model and for each sample, the average
(sample mean), median (sample median) and sample
standard deviation are calculated and recorded.
Variable N Mean Median StDev
average 1000 18.007 18.020 1.231
median 1000 17.757 17.804 1.433
[Histograms: Frequency against average (sample mean) and Frequency against median, 1000 samples of size 10]
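The experiment described above can be sketched in a few lines. This is a minimal illustration, not the lecturer's actual simulation: the normal model with mean 18 and standard deviation 4 is an assumption, and `sampling_distributions` is a hypothetical helper name.

```python
import random
import statistics

MODEL_MEAN, MODEL_SD = 18.0, 4.0  # assumed model, not given in the slides

def sampling_distributions(n, reps, rng):
    """Return the sample means, medians and SDs of reps samples of size n."""
    means, medians, sds = [], [], []
    for _ in range(reps):
        sample = [rng.gauss(MODEL_MEAN, MODEL_SD) for _ in range(n)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
        sds.append(statistics.stdev(sample))
    return means, medians, sds

means, medians, sds = sampling_distributions(n=10, reps=1000, rng=random.Random(10))
for name, values in (("average", means), ("median", medians), ("stdev", sds)):
    print(f"{name:8s} N={len(values)}  mean={statistics.mean(values):.3f}  "
          f"stdev={statistics.stdev(values):.3f}")
```

Each printed row plays the role of one row of the summary table: the statistic is treated as a variable with 1000 recorded values.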
S.D for the 1000 random samples of size 10
[Histogram: Frequency against stdev (sample standard deviation), 1000 samples of size 10]
Variable N Mean Median StDev
stdev 1000 3.8183 3.7282 0.9505

More random numbers
• Another thousand datasets are generated from the same model,
but this time each dataset has 25 observations.
[Histograms: Frequency against average and Frequency against median, 1000 samples of size 25]
Variable N Mean Median StDev
average 1000 17.991 17.982 0.814
median 1000 17.711 17.675 1.017
S.D for samples of size 25
[Histogram: Frequency against stdev (sample standard deviation), 1000 samples of size 25]
Variable N Mean Median StDev
stdev 1000 3.9637 3.9391 0.6048

Notice, as we take larger samples…
• The histograms for all three statistics (sample mean,
sample median and sample standard deviation) are
becoming more and more symmetric and bell-shaped
and less variable, particularly those for the sample
mean.
• Also notice that the estimated standard deviation of
the sample mean not only decreases as the sample
size increases (1.231 for samples of size 10, 0.814 for
samples of size 25), but is also approximately the same
for samples of the same size.
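The shrinking spread of the sample mean can be checked against the ratio σ/√n. The sketch below again assumes a normal model with mean 18 and standard deviation 4 (an assumption, not stated in the slides); under it, σ/√n is about 1.26 for n = 10 and 0.80 for n = 25, close to the 1.231 and 0.814 in the tables. `sd_of_sample_means` is a hypothetical helper name.

```python
import math
import random
import statistics

MODEL_MEAN, MODEL_SD = 18.0, 4.0  # assumed model, not given in the slides

def sd_of_sample_means(n, reps, rng):
    """Standard deviation of the sample mean across reps samples of size n."""
    means = [
        statistics.fmean([rng.gauss(MODEL_MEAN, MODEL_SD) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.stdev(means)

rng = random.Random(25)
results = {}
for n in (10, 25):
    simulated = sd_of_sample_means(n, 1000, rng)
    theoretical = MODEL_SD / math.sqrt(n)  # sigma / sqrt(n)
    results[n] = (simulated, theoretical)
    print(f"n={n:2d}  simulated SD of mean={simulated:.3f}  sigma/sqrt(n)={theoretical:.3f}")
```

Changing the seed gives slightly different simulated values, but they should stay near σ/√n for each sample size.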
A general result of great importance
No matter what model a random sample is taken
from, as the sample size (number of random
observations) increases, the distribution of the
sample mean becomes closer and closer to the
normal distribution.
And no matter what model a random sample is taken
from, and for any sample size n, the standard
deviation of the sample mean is the model standard
deviation, σ (the theoretical standard deviation),
divided by √n, that is, σ/√n. This is called the standard
error of the mean (SE).

The Central Limit Theorem (1)
• Whatever the population
distribution looks like (normal
or not), when the sample
size is large enough, the
distribution of sample
means will be approximately normal and
we can use the Z-statistic to
calculate the probability of
any range of values of the mean.
The Central Limit Theorem (2)
• If X is a random variable with a mean µ and
variance σ², then in general,
$$\bar{X} \to N\left(\mu, \frac{\sigma^2}{n}\right)$$
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \to Z \sim N(0, 1) \quad \text{as } n \to \infty.$$
This is the Central Limit Theorem
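The standardization can be tried on a model that is clearly not normal. The sketch below assumes an exponential model with rate 1 (so µ = 1 and σ = 1); this model and the names `RATE` and `standardized_means` are illustrative choices, not taken from the lecture. It computes Z for many samples and checks that Z behaves roughly like a standard normal variable.

```python
import math
import random
import statistics

RATE = 1.0                      # assumed exponential model (strongly skewed)
MU, SIGMA = 1 / RATE, 1 / RATE  # its mean and standard deviation

def standardized_means(n, reps, rng):
    """Z = (sample mean - mu) / (sigma / sqrt(n)) for reps samples of size n."""
    se = SIGMA / math.sqrt(n)
    z_values = []
    for _ in range(reps):
        sample = [rng.expovariate(RATE) for _ in range(n)]
        z_values.append((statistics.fmean(sample) - MU) / se)
    return z_values

z_values = standardized_means(n=50, reps=2000, rng=random.Random(15))
share_within = sum(abs(z) < 1.96 for z in z_values) / len(z_values)
print(f"mean of Z={statistics.fmean(z_values):.3f}  "
      f"SD of Z={statistics.stdev(z_values):.3f}  "
      f"share with |Z|<1.96={share_within:.3f}")
```

If the normal approximation is working, the mean of Z should be near 0, its standard deviation near 1, and roughly 95% of the values should fall between -1.96 and 1.96.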
So, how large does n need to be?
• It depends on the original distribution of X.
– If X has a normal distribution, then the sample mean has a
normal distribution for all sample sizes.
– If X has a distribution that is close to normal, the
approximation is good for small sample sizes (e.g. n=20).
– If X has a distribution that is far from normal, the
approximation requires larger sample sizes (e.g. n=50).
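One rough way to see this dependence is to track the skewness of the sample mean, which is 0 for a normal distribution. The sketch below uses three stand-in models chosen only for illustration (normal, uniform as a symmetric non-normal case, exponential as a strongly skewed case); the models, their parameters and the helper names are all assumptions.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: third central moment divided by the SD cubed."""
    centre = statistics.fmean(values)
    m2 = statistics.fmean([(v - centre) ** 2 for v in values])
    m3 = statistics.fmean([(v - centre) ** 3 for v in values])
    return m3 / m2 ** 1.5

def skew_of_sample_means(draw, n, reps=2000):
    """Skewness of the sample mean over reps samples of size n."""
    means = [statistics.fmean([draw() for _ in range(n)]) for _ in range(reps)]
    return skewness(means)

rng = random.Random(17)
# Hypothetical stand-in models, not taken from the lecture.
models = {
    "normal": lambda: rng.gauss(18, 4),
    "uniform": lambda: rng.uniform(10, 26),
    "exponential": lambda: rng.expovariate(1 / 18),
}
skews = {
    name: {n: skew_of_sample_means(draw, n) for n in (5, 20, 50)}
    for name, draw in models.items()
}
for name, by_n in skews.items():
    print(name, {n: round(s, 2) for n, s in by_n.items()})
```

For the symmetric models the skewness should hover near 0 at every n, while for the exponential model it should start well above 0 and shrink as n grows. Skewness is only one symptom of non-normality, so this is a rough guide rather than a rule for choosing n.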
In general

Activity 1
• The average height of Vietnamese women is 1.6m,
with a standard deviation of 0.2m. If I choose 25
women at random, what is the probability that their
average height is less than 1.53m?
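One possible way to work through the activity is sketched below. It assumes the sample mean of 25 heights can be treated as normal with mean µ = 1.6 and standard error σ/√n = 0.2/√25 = 0.04, so z = (1.53 - 1.6)/0.04 = -1.75. The variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 1.6, 0.2, 25    # values given in Activity 1
threshold = 1.53               # average height of interest, in metres

se = sigma / math.sqrt(n)      # standard error of the mean
z = (threshold - mu) / se      # standardized value of the threshold
prob = NormalDist().cdf(z)     # P(Z < z) for a standard normal Z

print(f"SE={se:.3f}  z={z:.2f}  P(mean < {threshold})={prob:.4f}")
```

Under these assumptions the probability comes out at about 0.04, so an average below 1.53 m would be fairly unusual for a random sample of 25 women.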