Working With Z-Scores 2: Surveys & The Central Limit Theorem
By now, you should be well familiar with the normal distribution and the normal curve.
You should also be familiar with the z-score table and how it works. If you aren’t
comfortable working with these things, review those subjects before starting this
worksheet. This worksheet will help you to take what you’ve learned about z-scores and
apply it to surveys.
In statistical studies, there’s always a chance that a single item picked at random
from a population won’t be close to the average for the population. If we’re trying
to measure the height of the average American male, randomly picking Tom Cruise or
Shaquille O’Neal isn’t going to give us a true indication of what that value is.
We minimize the risk of picking a statistical weirdo by picking multiple members of the
population to study. A batch of members of the sample space, studied together, is
called a survey.
The mean of a sample should be the same as the mean of the population, or close to it,
but the standard deviation of the sample mean will be smaller than that of the
individuals. After all, even if
my survey of American men gives me Tom Cruise and Shaquille O’Neal, at least they’ll
help to average each other out.
For a population that has a mean of μ and a standard deviation of σ for some statistical
variable x, we can take a sample of size n and take the mean of the measurements in
the sample to get x̄ (which we read “x-bar”). We can define a second statistical variable
concerning all possible samples of that size from that population, and look at all those
possibilities as a new sample space. This new variable has a distribution called a
sampling distribution. When n is large enough (we’ll define “large enough” in a
moment), the sampling distribution is approximately normal, with a mean equal to μ, the
mean of the population, and a standard deviation of σ/√n, derived from the standard
deviation of the population and the sample size. This result is called the Central Limit
Theorem.
The good news for statisticians is that the Central Limit Theorem is still true even if the
distribution of x, the variable for the individuals, isn’t distributed normally! Taking a
sample evens out the distribution and makes it more normal the bigger n gets.
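To see the Central Limit Theorem in action, here is a small simulation sketch in Python
(the language, the exponential population, and the specific numbers are illustrative
choices, not part of the worksheet). It draws many samples of size n from a very skewed
population and checks that the sample means come out with a mean close to μ and a
standard deviation close to σ/√n:

import numpy as np

rng = np.random.default_rng(0)

# A deliberately skewed, non-normal population: exponential with mean 10.
# For an exponential distribution the standard deviation equals the mean,
# so mu = 10 and sigma = 10.
mu, sigma = 10.0, 10.0
n = 40            # sample size, comfortably above the "higher than 30" rule
trials = 100_000  # number of samples to draw

# Draw many samples of size n and record each sample mean (x-bar).
samples = rng.exponential(scale=mu, size=(trials, n))
sample_means = samples.mean(axis=1)

print("mean of the sample means:", sample_means.mean())  # close to mu = 10
print("std of the sample means: ", sample_means.std())   # close to sigma/sqrt(n)
print("sigma / sqrt(n):         ", sigma / np.sqrt(n))   # about 1.58

A histogram of sample_means would also look close to a normal curve, even though the
population itself is badly skewed.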
This brings us back to “large enough” and what it means. The minimum usable value for
n will vary depending on what the distribution of the individuals is. If that’s normal, then
any n will do. If the distribution is skewed or bimodal, or strange in some other way, n
must be much higher. An n higher than 30 is safe, according to your textbook.
If the survey is large enough to give a normal sampling distribution of x̄, we can use the
z-table to relate the mean of the survey to the mean of the population, just as we did
with individual measurements.
Example 1: For a study of bone brittleness, the ages of people at the onset of
osteoporosis follow a normal distribution with a mean age of 71 years and a standard
deviation of 2.8 years. What is the probability of:
a) selecting one person who had the onset of osteoporosis at age 68 or less?
b) having a sample of 5 people who had an average age of onset of osteoporosis at
age 68 or less?
c) having a sample of 50 people who had an average age of onset of osteoporosis
at age 68 or less?
Solution: a) First, note that if the question hadn’t specified that the age of onset of
osteoporosis was distributed normally, we couldn’t even answer this question! Since
the ages are normal, X ~ N(71, 2.8), so we use the z-score:
$$z = \frac{x - \mu}{\sigma} = \frac{68 - 71}{2.8} = -1.07\ldots$$
We look this up in the z-score table and get the answer 0.1423. Not likely, but it will
happen about 1 time in 7.
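If you’d rather check this with software than with the printed table, a quick Python
sketch (assuming scipy is available; it isn’t part of the worksheet) gives the same
answer:

from scipy.stats import norm

mu, sigma = 71, 2.8
z = (68 - mu) / sigma   # about -1.07
p = norm.cdf(z)         # area to the left of z under the standard normal curve
print(z, p)             # roughly 0.142, matching the table's 0.1423 once z is rounded to -1.07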
b) Now we have a sample. Since x is normal, x̄ is also, even at this low sample
size. The mean for the sample is still 71 and the standard deviation is now 2.8/√5 = 1.252…
The standard deviation has gone down. The z-score is:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{68 - 71}{2.8/\sqrt{5}} = -2.40\ldots$$
This time we get a probability of 0.0082. This means that a sample mean of 68 years
or less will occur less than 1% of the time.
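The same kind of software check works here (again a scipy-based sketch, not part of the
worksheet); the only change from part a) is dividing σ by √n:

from math import sqrt
from scipy.stats import norm

mu, sigma, n = 71, 2.8, 5
z = (68 - mu) / (sigma / sqrt(n))  # about -2.40
print(z, norm.cdf(z))              # probability is roughly 0.008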
c) The sample has gotten larger again, and our new standard deviation is 2.8/√50 =
0.396… The z-score is:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{68 - 71}{2.8/\sqrt{50}} = -7.58\ldots$$
The probability of getting a sample mean that misses the true population mean by
3 years is now so remote it isn’t even on the table. That’s how powerful even a small
sample can be.
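The printed table gives up here, but software can still evaluate the probability; here is
the same kind of sketch as above (again assuming scipy):

from math import sqrt
from scipy.stats import norm

mu, sigma, n = 71, 2.8, 50
z = (68 - mu) / (sigma / sqrt(n))  # about -7.58
print(z, norm.cdf(z))              # on the order of 1e-14, far smaller than anything in the table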