SAMPLING AND SAMPLING DISTRIBUTION
Objectives
The learner should be able to
1. illustrate random sampling,
2. distinguish between a parameter and a statistic,
3. find the mean and variance of the sampling distribution of the sample mean,
4. illustrate the Central Limit Theorem, and
5. solve problems involving sampling distributions of the sample mean.
Lesson 3.1 Random Sampling
Since it is often impractical or impossible to study an entire population (every individual in a country, all college students, every geographic area, etc.), researchers typically rely on sampling to select a portion of the population on which to perform an experiment or observational study. It is important that the group selected be representative of the population and not biased in a systematic manner. For example, a group composed of only the wealthiest individuals in a given area probably would not accurately reflect the opinions of the entire population in that area. For this reason, randomization is typically employed to achieve an unbiased sample. Two of the most common sampling designs are simple random sampling and stratified random sampling.
Simple Random Sampling
It is the basic sampling technique where we select a group of subjects (a sample) for
study from a larger group (a population). Each individual is chosen entirely by chance and each
member of the population has an equal chance of being included in the sample. Every possible
sample of a given size has the same chance of selection. Simple random sampling is most
appropriate when the entire population from which the sample is taken is homogeneous.
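A minimal Python sketch of simple random sampling, assuming the population is stored as a list. The standard-library method `random.Random.sample` draws without replacement, so every member has the same chance of selection and every subset of the requested size is equally likely. The population of student ID numbers 1 to 100 is hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(population, n)

# Hypothetical population: student ID numbers 1 to 100
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```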
Stratified Random Sampling
It is obtained by taking samples from each stratum or sub-group of a population. There may
often be factors which divide up the population into sub-populations (groups / strata) and we
may expect the measurement of interest to vary among the different sub-populations. This has
to be accounted for when we select a sample from the population in order that we obtain a
sample that is representative of the population. Stratified sampling techniques are generally
used when the population is heterogeneous, or dissimilar, where certain homogeneous, or
similar, sub-populations can be isolated (strata). Some reasons for using stratified sampling over
simple random sampling are:
a) the cost per observation in the survey may be reduced;
b) estimates of the population parameters may be wanted for each sub-population; and
c) accuracy may be increased at a given cost.
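The sketch below illustrates stratified random sampling under one assumption that the lesson does not specify: proportional allocation, where each stratum's share of the sample matches its share of the population. Within each stratum a simple random sample is drawn. The strata names and sizes (60 STEM, 30 ABM, 10 HUMSS students) are hypothetical.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Stratified random sample with proportional allocation.

    strata: dict mapping stratum name -> list of members of that stratum.
    n: desired total sample size.
    Returns a dict mapping stratum name -> simple random sample from that stratum.
    Rounding each stratum's share means the sizes may not always add up to n exactly.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / total)      # stratum's proportional share of n
        sample[name] = rng.sample(members, k)    # simple random sample within the stratum
    return sample

# Hypothetical strata: 60 STEM, 30 ABM and 10 HUMSS students
strata = {
    "STEM": [f"STEM-{i}" for i in range(1, 61)],
    "ABM": [f"ABM-{i}" for i in range(1, 31)],
    "HUMSS": [f"HUMSS-{i}" for i in range(1, 11)],
}
sample = stratified_sample(strata, 20, seed=2)
sizes = {name: len(members) for name, members in sample.items()}
print(sizes)  # {'STEM': 12, 'ABM': 6, 'HUMSS': 2}
```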
Lesson 3.2 Populations, Samples, Parameters, and Statistics
The field of inferential statistics enables you to make educated guesses about the
numerical characteristics of large groups. The logic of sampling gives you a way to test
conclusions about such groups using only a small portion of their members.
Ms. Zyrill Macha-Quisquino
A population is a group of phenomena that have something in common. The term often
refers to a group of people, as in the following examples:
• all registered voters in Muntinlupa City
• all regular employees of Lyceum of Alabang
• all students of Lyceum of Alabang under the strand of STEM
Often, researchers want to know things about populations but do not have data for every
person or thing in the population. If a company's customer service division wanted to learn
whether its customers were satisfied, it would not be practical (or perhaps even possible) to
contact every individual who purchased a product. Instead, the company might select a sample
of the population. A sample is a smaller group of members of a population selected to represent
the population. In order to use statistics to learn things about the population, the sample must
be random.
* parameter - is a characteristic of a population
* statistic - is a characteristic of a sample
For example, say you want to know the mean income of the subscribers to a particular
magazine—a parameter of a population. You draw a random sample of 100 subscribers and
determine that their mean income is $27,500 (a statistic). You conclude that the population mean
income μ is likely to be close to $27,500 as well. This is an example of statistical inference.
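The difference between a parameter and a statistic can be sketched in a few lines of Python. The population here (the whole numbers 1 to 1000) is hypothetical: its mean is a parameter because it uses every member, while the mean of a random sample of 100 members is a statistic used to estimate it.

```python
import random
import statistics

# Hypothetical population: the values 1, 2, ..., 1000
population = list(range(1, 1001))
parameter = statistics.fmean(population)   # population mean: a parameter

rng = random.Random(42)
sample = rng.sample(population, 100)       # random sample of 100 members
statistic = statistics.fmean(sample)       # sample mean: a statistic

print(parameter)            # 500.5
print(round(statistic, 2))  # varies with the sample; it estimates the parameter
```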
Lesson 3.3 Mean, Variance and Standard Deviation of a Probability Distribution
The mean of a discrete random variable X is a weighted average of the possible values that the random variable can take. Unlike the sample mean of a group of observations, which gives each observation equal weight, the mean of a random variable weights each outcome $x_i$ according to its probability, $p_i$. The common symbol for the mean (also known as the expected value of X) is μ, formally defined by:
$$\mu = \sum x_i p_i$$
where:
μ - mean (expected value) of X
$x_i$ - the i-th possible value (outcome) of X
$p_i$ - probability of the outcome $x_i$
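The formula μ = Σ xᵢpᵢ translates directly into Python. This sketch uses exact fractions so the probabilities sum to exactly 1; the fair six-sided die is a hypothetical illustration, not an example from the lesson.

```python
import math
from fractions import Fraction

def expected_value(values, probs):
    """Mean of a discrete random variable: mu = sum of x_i * p_i."""
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    if not math.isclose(sum(probs), 1):
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))

# Hypothetical example: a fair six-sided die, each face with probability 1/6
faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
print(expected_value(faces, probs))  # 7/2
```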
The law of large numbers states that as the number of observations of a random variable increases, the observed mean of those observations tends to get closer and closer to the true mean μ of the random variable. This does not imply, however, that short-term averages will reflect the mean; with only a few observations, the observed mean can differ noticeably from μ.
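A small simulation makes the law of large numbers concrete. This sketch assumes a fair six-sided die (true mean 3.5) as a hypothetical random variable and tracks the running mean as more rolls are added; early means can wander, while later ones settle near 3.5.

```python
import random

def running_means(n_rolls, seed=0):
    """Simulate fair die rolls and return the observed mean after each roll."""
    rng = random.Random(seed)
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)  # one roll of a fair die
        means.append(total / i)
    return means

means = running_means(100_000)
for n in (10, 100, 1_000, 100_000):
    print(n, round(means[n - 1], 4))
```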
Ms. Zyrill Macha-Quisquino
Variance
The variance measures the spread, or variability, of a set of values. For the n values in a sample, the sample variance is defined by:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
where:
$s^2$ - sample variance
$x$ - each value in the sample
$\bar{x}$ - mean of the sample values
$n$ - number of values in the sample
Standard Deviation ($s$) - the square root of the variance:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \quad \text{or} \quad s = \sqrt{s^2}$$
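The sample variance and standard deviation formulas can be written out in Python and compared with the standard-library functions `statistics.variance` and `statistics.stdev`, which also divide by n − 1. The helper names `sample_variance` and `sample_sd` are chosen for this sketch.

```python
import math
import statistics

def sample_variance(xs):
    """s^2 = sum((x - xbar)^2) / (n - 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    """s = sqrt(s^2)."""
    return math.sqrt(sample_variance(xs))

values = [5, 8]
print(sample_variance(values), round(sample_sd(values), 2))  # 4.5 2.12
print(statistics.variance(values))                           # 4.5
```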
Example 1:
Problem: The scores of 3 selected students in a particular quiz are 5, 8, and 9. Considering all possible samples of size r = 2, solve the following:
a. population mean
b. variance and standard deviation.
Solution
a. population mean
Step 1: Determine the number of possible samples of size r = 2 using the combination formula:
$${}_nC_r = \frac{n!}{(n - r)!\,r!}$$
where: ${}_nC_r$ - number of possible samples
$n$ - number of values in the population
$r$ - sample size
$${}_3C_2 = \frac{3!}{(3 - 2)!\,2!} = \frac{6}{1(2)} = 3$$
There are 3 possible samples, and the probability of each is 1/3, or about 0.33.
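Step 1 can be checked in Python with the standard-library functions `math.comb` (Python 3.8+), which evaluates the combination formula, and `itertools.combinations`, which lists the possible samples themselves.

```python
from itertools import combinations
from math import comb

scores = [5, 8, 9]
r = 2

print(comb(len(scores), r))             # 3
samples = list(combinations(scores, r))
print(samples)                          # [(5, 8), (5, 9), (8, 9)]
print(f"P(each sample) = 1/{len(samples)}")
```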
Ms. Zyrill Macha-Quisquino
Step 2: Calculate the mean, $\bar{x}$, of each sample, as tabulated below.
Sample Number | Sample Values (x) | Sample Mean ($\bar{x}$) | Probability, $P(\bar{x})$
1 | 5 & 8 | 6.5 | 1/3
2 | 5 & 9 | 7 | 1/3
3 | 8 & 9 | 8.5 | 1/3
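Step 3: Compute the population mean, μ, of the sampling distribution.
$$\mu = \sum x_i p_i = 6.5\left(\tfrac{1}{3}\right) + 7\left(\tfrac{1}{3}\right) + 8.5\left(\tfrac{1}{3}\right) = \tfrac{22}{3} \approx 7.333$$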
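The mean of the sampling distribution can be computed exactly with Python's `fractions.Fraction`, treating each of the three samples as equally likely. The sketch also compares the result with the plain mean of the three scores; the two agree, as expected, since the mean of the sampling distribution of the sample mean equals the population mean.

```python
from fractions import Fraction
from itertools import combinations

scores = [5, 8, 9]
samples = list(combinations(scores, 2))
p = Fraction(1, len(samples))            # each sample is equally likely

sample_means = [Fraction(sum(s), len(s)) for s in samples]
mu = sum(m * p for m in sample_means)    # mu = sum of x_i * p_i
print(mu, float(mu))                     # 22/3 7.333333333333333
print(Fraction(sum(scores), len(scores)))  # 22/3
```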
b. variance and standard deviation
Prepare a table showing the sample variance and standard deviation of each sample.
Sample Number | Sample Values (x) | Sample Mean ($\bar{x}$) | Sample Variance ($s^2$) | Sample Standard Deviation ($s$)
1 | 5 & 8 | 6.5 | |
2 | 5 & 9 | 7 | |
3 | 8 & 9 | 8.5 | |
Compute the sample variance ($s^2$) and standard deviation ($s$ or $\sqrt{s^2}$) of each sample.
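1. 5 & 8 ($\bar{x} = 6.5$)
$$s^2 = \frac{(5 - 6.5)^2 + (8 - 6.5)^2}{2 - 1} = 4.5 \qquad s = \sqrt{4.5} \approx 2.12$$
2. 5 & 9 ($\bar{x} = 7$)
$$s^2 = \frac{(5 - 7)^2 + (9 - 7)^2}{2 - 1} = 8 \qquad s = \sqrt{8} \approx 2.83$$
3. 8 & 9 ($\bar{x} = 8.5$)
$$s^2 = \frac{(8 - 8.5)^2 + (9 - 8.5)^2}{2 - 1} = 0.5 \qquad s = \sqrt{0.5} \approx 0.71$$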
Complete the table with the values obtained.
Sample Number | Sample Values (x) | Sample Mean ($\bar{x}$) | Sample Variance ($s^2$) | Sample Standard Deviation ($s$)
1 | 5 & 8 | 6.5 | 4.5 | 2.12
2 | 5 & 9 | 7 | 8 | 2.83
3 | 8 & 9 | 8.5 | 0.5 | 0.71
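The completed table can be reproduced in Python. This sketch uses the standard-library `statistics.mean` and `statistics.variance` (the latter divides by n − 1, matching the sample variance formula) and rounds the standard deviation to two decimal places as in the table.

```python
import math
import statistics
from itertools import combinations

scores = [5, 8, 9]
rows = []
for number, sample in enumerate(combinations(scores, 2), start=1):
    mean = statistics.mean(sample)
    var = statistics.variance(sample)   # sample variance, divides by n - 1
    sd = math.sqrt(var)                 # sample standard deviation
    rows.append((number, sample, mean, var, round(sd, 2)))

for row in rows:
    print(*row)
```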
Example 2:
Problem: A population consists of the 5 values 11, 13, 15, 17, and 19. Considering all possible samples of size r = 3, compute the following:
a. population mean
b. variance and standard deviation
Ms. Zyrill Macha-Quisquino
Solution:
a. population mean
Step 1: Determine the number of possible samples of size r = 3 using the combination formula.
$${}_nC_r = \frac{n!}{(n - r)!\,r!}$$
$${}_5C_3 = \frac{5!}{(5 - 3)!\,3!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{(2 \times 1)(3 \times 2 \times 1)} = \frac{120}{12} = 10$$
There are 10 possible samples, and the probability of each is 1/10, or 0.10.
Step 2: Calculate the mean, $\bar{x}$, of each sample, as tabulated below.
Sample Number | Sample Values (x) | Sample Mean ($\bar{x}$) | Probability, $P(\bar{x})$
1 | 11, 13, 15 | 13 | 0.10
2 | 11, 13, 17 | 13.67 | 0.10
3 | 11, 13, 19 | 14.33 | 0.10
4 | 11, 15, 17 | 14.33 | 0.10
5 | 11, 15, 19 | 15 | 0.10
6 | 11, 17, 19 | 15.67 | 0.10
7 | 13, 15, 17 | 15 | 0.10
8 | 13, 15, 19 | 15.67 | 0.10
9 | 13, 17, 19 | 16.33 | 0.10
10 | 15, 17, 19 | 17 | 0.10
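Steps 1 and 2 can be generated together in Python: `itertools.combinations` lists the 10 possible samples of size 3 (in the same order as the table), and each sample mean is rounded to two decimal places as in the lesson.

```python
from itertools import combinations
from math import comb

population = [11, 13, 15, 17, 19]
r = 3
samples = list(combinations(population, r))
print(len(samples), comb(len(population), r))   # 10 10

table = [(number, sample, round(sum(sample) / r, 2))
         for number, sample in enumerate(samples, start=1)]
for row in table:
    print(*row)
```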
Ms. Zyrill Macha-Quisquino
Since some sample means occur more than once, the frequency table below makes the computation of the population mean simpler.
Sample Mean ($\bar{x}$) | Frequency (f) | Probability, $P(\bar{x})$
13 | 1 | 0.10
13.67 | 1 | 0.10
14.33 | 2 | 0.20
15 | 2 | 0.20
15.67 | 2 | 0.20
16.33 | 1 | 0.10
17 | 1 | 0.10
Total | 10 | 1.00
Step 3: Compute the population mean, μ, of the sampling distribution.
$$\mu = \sum x_i p_i$$
$$\mu = 13(0.10) + 13.67(0.10) + 14.33(0.20) + 15(0.20) + 15.67(0.20) + 16.33(0.10) + 17(0.10)$$
$$\mu = 1.3 + 1.367 + 2.866 + 3.0 + 3.134 + 1.633 + 1.7 = 15$$
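The frequency-table method of Step 3 can be sketched with `collections.Counter` and exact fractions. Working with exact sample means (such as 41/3 instead of 13.67) avoids rounding, and the result equals the mean of the five population values, 75/5 = 15.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [11, 13, 15, 17, 19]
samples = list(combinations(population, 3))
means = [Fraction(sum(s), 3) for s in samples]

freq = Counter(means)                                   # frequency f of each sample mean
probs = {m: Fraction(f, len(samples)) for m, f in freq.items()}
for m in sorted(freq):
    print(round(float(m), 2), freq[m], float(probs[m]))

mu = sum(m * p for m, p in probs.items())               # mu = sum of x_i * p_i
print(mu)  # 15
```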
b. variance and standard deviation
Sample Number | Sample Values (x) | Sample Mean ($\bar{x}$) | Sample Variance ($s^2$) | Sample Standard Deviation ($s$)
1 | 11, 13, 15 | 13 | |
2 | 11, 13, 17 | 13.67 | |
3 | 11, 13, 19 | 14.33 | |
4 | 11, 15, 17 | 14.33 | |
5 | 11, 15, 19 | 15 | |
6 | 11, 17, 19 | 15.67 | |
7 | 13, 15, 17 | 15 | |
8 | 13, 15, 19 | 15.67 | |
9 | 13, 17, 19 | 16.33 | |
10 | 15, 17, 19 | 17 | |
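Compute the sample variance and standard deviation of each sample.
1. 11, 13, 15 ($\bar{x} = 13$)
$$s^2 = \frac{(11 - 13)^2 + (13 - 13)^2 + (15 - 13)^2}{3 - 1} = 4 \qquad s = \sqrt{4} = 2$$
2. 11, 13, 17 ($\bar{x} = \tfrac{41}{3} \approx 13.67$)
$$s^2 = \frac{\left(11 - \tfrac{41}{3}\right)^2 + \left(13 - \tfrac{41}{3}\right)^2 + \left(17 - \tfrac{41}{3}\right)^2}{3 - 1} = \frac{28}{3} \approx 9.3333 \qquad s = \sqrt{9.3333} \approx 3.0551$$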
Activity:
Continue the computation and complete the table below.
Sample Number | Sample Values (x) | Sample Mean ($\bar{x}$) | Sample Variance ($s^2$) | Sample Standard Deviation ($s$)
1 | 11, 13, 15 | 13 | 4 | 2
2 | 11, 13, 17 | 13.67 | 9.3333 | 3.0551
3 | 11, 13, 19 | 14.33 | |
4 | 11, 15, 17 | 14.33 | |
5 | 11, 15, 19 | 15 | |
6 | 11, 17, 19 | 15.67 | |
7 | 13, 15, 17 | 15 | |
8 | 13, 15, 19 | 15.67 | |
9 | 13, 17, 19 | 16.33 | |
10 | 15, 17, 19 | 17 | |
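Once the table is completed by hand, the answers can be checked with a short Python sketch. It computes every row with `statistics.mean` and `statistics.variance` (which divides by n − 1 = 2 here) and rounds the variance and standard deviation to four decimal places, as in the first two rows.

```python
import math
import statistics
from itertools import combinations

population = [11, 13, 15, 17, 19]
rows = []
for number, sample in enumerate(combinations(population, 3), start=1):
    mean = statistics.mean(sample)
    var = statistics.variance(sample)          # sample variance, divides by n - 1
    rows.append((number, sample, round(mean, 2), round(var, 4), round(math.sqrt(var), 4)))

for row in rows:
    print(*row)
```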
Lesson 3.4 Central Limit Theorem
The Central Limit Theorem is a statistical theorem stating that, for a sufficiently large sample size drawn from a population with finite variance, the sampling distribution of the sample mean is approximately normal, and the mean of all possible sample means equals the population mean. More precisely, if random samples of size n are taken from a population with mean μ and standard deviation s, then as n increases without limit, the sampling distribution of the sample mean $\bar{x}$ approaches a normal distribution with mean μ and standard deviation
$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
where: $s_{\bar{x}}$ - standard deviation of the sample mean
$s$ - standard deviation of the population
$n$ - sample size
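To see the theorem at work, the sketch below draws many random samples of size n = 36 from a skewed population and looks at the resulting sample means. The exponential population (mean 1, standard deviation 1) is a hypothetical choice made only for illustration; any population with finite variance would do. If the theorem holds, the mean of the sample means should be close to 1 and their standard deviation close to s/√n = 1/6 ≈ 0.167.

```python
import math
import random
import statistics

def simulate_sample_means(n, n_samples, seed=0):
    """Draw n_samples random samples of size n from a hypothetical exponential
    population (mean 1, standard deviation 1) and return the sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(n_samples)]

n = 36
means = simulate_sample_means(n, 20_000)
print(round(statistics.fmean(means), 3))   # close to the population mean, 1
print(round(statistics.stdev(means), 3))   # close to s / sqrt(n) = 1/6
```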
Ms. Zyrill Macha-Quisquino
To compute the value of z, we use:
$$z = \frac{\bar{x} - \mu}{s_{\bar{x}}}$$
where: $\bar{x}$ - sample mean
μ - population mean
$s_{\bar{x}}$ - standard deviation of the sample mean
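The standard deviation of the sample mean and the z-score can be written as two small Python helpers. The names `standard_error` and `z_score` are hypothetical, chosen for this sketch; the check reuses the numbers from the solution of Example 1 below (μ = 20, s = 3, n = 36, x̄ = 21).

```python
import math

def standard_error(s, n):
    """Standard deviation of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def z_score(xbar, mu, s, n):
    """z = (xbar - mu) / (s / sqrt(n))."""
    return (xbar - mu) / standard_error(s, n)

print(standard_error(3, 36))     # 0.5
print(z_score(21, 20, 3, 36))    # 2.0
```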
Note:
1. For any sample size n, the sampling distribution of the sample mean is a normal distribution if the original variable is normally distributed.
2. For a sample size of 30 or more, a normal distribution can be used to approximate the distribution of the sample mean even if the original variable is not normally distributed.
Example 1:
Problem: The mean raw score of Grade 11 students in a Statistics examination was 20 with a standard deviation of 3. If 36 students are randomly selected, find the probability that the mean score of the students is higher than 21.
Solution:
Step 1: Compute the standard deviation of the sample mean.
s = 3, n = 36
$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{3}{\sqrt{36}} = \frac{3}{6} = 0.5$$
Step 2: Identify the parts of the problem.
μ = 20, $\bar{x} > 21$, $s_{\bar{x}} = 0.5$
Step 3: Compute the z-score.
$$z = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{21 - 20}{0.5} = 2$$
Ms. Zyrill Macha-Quisquino
Step 4: Draw the graph.
Step 5: Find the area using a normal distribution table. The area between z = 0 and z = 2 is 0.4772. Since the whole area to the right of z = 0 is 0.5000, the area to the right of z = 2 is
$$P(z > 2) = 0.5000 - 0.4772 = 0.0228$$
Therefore, the probability that the mean score of the 36 randomly selected students is higher than 21 is 0.0228 or 2.28%.
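The table lookup can be cross-checked with Python's `statistics.NormalDist` (Python 3.8+), which computes areas under the normal curve directly. This sketch uses s = 3 and n = 36, the values used in the solution.

```python
import math
from statistics import NormalDist

mu, s, n = 20, 3, 36
se = s / math.sqrt(n)                   # standard deviation of the sample mean: 0.5
sampling_dist = NormalDist(mu, se)      # approximate distribution of the sample mean
p = 1 - sampling_dist.cdf(21)           # P(xbar > 21)
print(round(p, 4))                      # 0.0228
```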
Example 2:
Problem: The average amount of salt in a certain brand of cup instant noodles sold in the market is 200 mg per cup, with a standard deviation of 10 mg. Assume that the variable is normally distributed. If a single cup of noodles is selected, find the probability that the amount of salt in the noodles will be more than 210 mg.
Solution:
Step 1: Compute the standard deviation of the sample mean.
s = 10, n = 1
$$s_{\bar{x}} = \frac{10}{\sqrt{1}} = 10$$
Step 2: Identify the parts of the problem.
μ = 200, $\bar{x} > 210$, $s_{\bar{x}} = 10$
Ms. Zyrill Macha-Quisquino
Step 3: Compute the z-score.
$$z = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{210 - 200}{10} = 1$$
Step 4: Draw the graph
Step 5: Find the area using a normal distribution table. The area between z = 0 and z = 1 is 0.3413. Since the whole area to the right of z = 0 is 0.5000, the area to the right of z = 1 is
$$P(z > 1) = 0.5000 - 0.3413 = 0.1587$$
Therefore, the probability that a randomly selected cup of noodles contains more than 210 mg of salt is 0.1587 or 15.87%.
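Because only one cup is selected (n = 1), the standard deviation of the sample mean equals the population standard deviation, and the answer can be checked with `statistics.NormalDist` using its `mu` and `sigma` parameters.

```python
from statistics import NormalDist

salt = NormalDist(mu=200, sigma=10)   # n = 1, so s / sqrt(n) = s = 10
p = 1 - salt.cdf(210)                 # P(x > 210)
print(round(p, 4))                    # 0.1587
```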
Example 3:
Problem: The average rice consumption of a rural adult male in a year is 96 kilos. If the standard deviation is 20 kilos and the distribution is approximately normal, find the probability that the sample mean will be less than 102 kilos in a year if a sample of 49 adult males is chosen.
Solution:
Step 1: Compute the standard deviation of the sample mean.
s = 20, n = 49
$$s_{\bar{x}} = \frac{20}{\sqrt{49}} = \frac{20}{7} \approx 2.8571$$
Ms. Zyrill Macha-Quisquino
Step 2: Identify the parts of the problem.
μ = 96, $\bar{x} < 102$, $s_{\bar{x}} = 2.8571$
Step 3: Compute the z-score.
$$z = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{102 - 96}{2.8571} = 2.1$$
Step 4: Draw the graph
Step 5: Find the area using a normal distribution table. The area between z = 0 and z = 2.1 is 0.4821. Since the whole area to the left of z = 0 is 0.5000, the area to the left of z = 2.1 is
$$P(z < 2.1) = 0.5000 + 0.4821 = 0.9821$$
Therefore, the probability that the mean yearly rice consumption of a sample of 49 adult males is less than 102 kilos is 0.9821 or 98.21%.
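The same result can be checked with `statistics.NormalDist`: here the area to the left of the sample mean 102 is wanted, so the cumulative distribution function `cdf` gives the answer directly.

```python
import math
from statistics import NormalDist

mu, s, n = 96, 20, 49
se = s / math.sqrt(n)                   # 20 / 7, about 2.8571
p = NormalDist(mu, se).cdf(102)         # P(xbar < 102)
print(round(p, 4))                      # 0.9821
```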
Example 4:
Problem: The average life span of TV sets manufactured by company X is 10.5 years and the standard deviation is 1.8 years. If a random sample of 50 TV sets is chosen, find the probability that the mean life span of the sampled TV sets is between 10 and 11 years.
Ms. Zyrill Macha-Quisquino
Solution:
Step 1: Compute the standard deviation of the sample mean.
s = 1.8, n = 50
$$s_{\bar{x}} = \frac{1.8}{\sqrt{50}} \approx 0.2546$$
Step 2: Identify the parts of the problem.
Lower value: μ = 10.5, $\bar{x} = 10$, $s_{\bar{x}} = 0.2546$
Upper value: μ = 10.5, $\bar{x} = 11$, $s_{\bar{x}} = 0.2546$
Step 3: Compute the z-scores. Since there are two values of the sample mean (10 and 11), we compute two values of z.
$$z = \frac{10 - 10.5}{0.2546} \approx -1.96 \qquad z = \frac{11 - 10.5}{0.2546} \approx 1.96$$
Step 4: Draw the graph.
Ms. Zyrill Macha-Quisquino
Step 5: Find the area using a normal distribution table. The area between z = -1.96 and z = 0 is 0.4750, and the area between z = 0 and z = 1.96 is also 0.4750. The area between z = -1.96 and z = 1.96 is their sum:
$$P(-1.96 < z < 1.96) = 0.4750 + 0.4750 = 0.9500$$
Therefore, the probability that the mean life span of the 50 sampled TV sets ranges from 10 to 11 years is 0.9500 or 95%.
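For an interval, the probability is the difference of two cumulative areas. This sketch with `statistics.NormalDist` gives about 0.9505; the small difference from the table answer of 0.9500 comes from rounding z to 1.96 (the unrounded value is about 1.964).

```python
import math
from statistics import NormalDist

mu, s, n = 10.5, 1.8, 50
se = s / math.sqrt(n)                                   # about 0.2546
sampling_dist = NormalDist(mu, se)
p = sampling_dist.cdf(11) - sampling_dist.cdf(10)       # P(10 < xbar < 11)
print(round(p, 4))
```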
Ms. Zyrill Macha-Quisquino