
Sampling Unit 7

This document discusses cluster sampling methodology. It defines what a cluster is and gives examples of clusters, then discusses simple one-stage and multi-stage cluster sampling. It also covers the key reasons for using cluster sampling: feasibility and cost-effectiveness.

Uploaded by

yonasante2121

Chapter 7: Cluster Sampling

7.1 Definition

A cluster, when used in sample survey methodology, can be defined as any sampling
unit, containing a set of elements, treated as a single unit for the purpose of selecting a
sample. The unit can be geographical, temporal, or spatial in nature. Some practical
examples of clusters are as follows.
Cluster           Listing unit   Elementary unit   Application
City block        Household      Person            Estimation of total persons in a city
School            Classroom      Student           Estimation of mean academic achievement among students in a district
Week              Day            Day               Estimation of all days having maximum rainfall or temperature
District/Woreda   Hospital       Patient           Estimation of the proportion discharged dead in a particular state
Village           Farm           Farm              Estimation of production

For example, a list of the 52 calendar weeks can be compiled and a sample of weeks
selected from this list. For each week selected in the sample, a sample of days can be
selected, and on each sample day measurements of rainfall or temperature can be made.

Cluster sampling is any sampling plan that uses a frame consisting of clusters of listing
units. The plan is often characterized by the number of stages involved, working down
from larger clusters to smaller ones. A sample of clusters can be selected by simple
random sampling or by systematic sampling. The clusters can also be grouped into strata
and a stratified random sample of clusters taken.
• Simple one-stage cluster sampling: a sampling plan in which clusters are chosen by
simple random sampling in a single step, and every listing unit within each of the
selected clusters is included in the sample.
• Multi-stage cluster sampling: a sampling plan carried out in several stages, in which
a sample of units is selected at each stage from within the units selected at the
preceding stage. More than one sampling frame might be involved in the process.
After the first stage of sampling, the sampling frame is compiled from only those clusters
chosen in the sample: once the sample clusters are selected at the first stage, the listing of
second-stage sampling units is compiled only for the sample clusters. Likewise, if there
are more than two stages of sampling, sampling units at any later stage are listed only for
those sampling units selected at the previous stage.
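
The staged listing described above can be sketched in Python using the weeks-and-days illustration. This is a minimal sketch, not a prescribed procedure: the function name `two_stage_sample`, the sample sizes (4 weeks, 2 days per week), and the seed are assumptions chosen only for illustration.

```python
import random

def two_stage_sample(n_weeks=52, days_per_week=7, m_weeks=4, d_days=2, seed=0):
    """Select weeks first, then days within each selected week.

    All sizes here are hypothetical; the text does not specify them.
    """
    rng = random.Random(seed)
    # Stage 1: the frame is the list of all calendar weeks (the clusters).
    first_stage_frame = list(range(1, n_weeks + 1))
    sampled_weeks = sorted(rng.sample(first_stage_frame, m_weeks))
    # Stage 2: a frame of days is compiled only for the weeks chosen at stage 1.
    sample = {}
    for week in sampled_weeks:
        second_stage_frame = list(range(1, days_per_week + 1))
        sample[week] = sorted(rng.sample(second_stage_frame, d_days))
    return sample

sample = two_stage_sample()
for week, days in sample.items():
    print(f"week {week}: measure on days {days}")
```

Setting `d_days` equal to `days_per_week` makes every day of each selected week enter the sample, which corresponds to simple one-stage cluster sampling.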

Why is cluster sampling widely used? The two most important reasons why cluster
sampling is so widely used in practice are feasibility and economy.
Cluster sampling is often the only feasible method of sampling because the only sampling
frames readily available for the target population are lists of clusters. This is especially
true for the surveys of human populations for which the household serves as the listing
unit. It is almost never feasible in terms of time and resources to compile a list of
households for any sizable population for the sole purpose of conducting a survey.
However, lists of higher clusters (geographical units) can be compiled relatively easily,
and these can serve as the sampling frame.
Cluster sampling is often the most economical form of sampling. Not only are listing
costs almost always lowest for cluster sampling, but also traveling costs are often lowest.

One disadvantage of cluster sampling is that the standard errors of estimates obtained
from cluster sampling designs are often high compared with those obtained from samples
of the same number of listing units chosen by other sampling designs. Another is that
statistical analysis is more costly and more complex.
In what follows, we treat only simple one-stage cluster sampling with clusters of equal
size.

7.2 Simple One-Stage Cluster Sampling (Cluster of Equal Size)

Suppose that a population has M clusters, each containing L units, so that the total
number of population units is N = ML. When a cluster is selected, all L of its units are
included in the sample. The structure of the clusters and their observations is shown below.
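$$
\begin{array}{c|cccccc}
 & \multicolumn{6}{c}{\text{Clusters}} \\
\text{Units} & 1 & 2 & \cdots & i & \cdots & M \\
\hline
1 & Y_{11} & Y_{21} & \cdots & Y_{i1} & \cdots & Y_{M1} \\
2 & Y_{12} & Y_{22} & \cdots & Y_{i2} & \cdots & Y_{M2} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
j & Y_{1j} & Y_{2j} & \cdots & Y_{ij} & \cdots & Y_{Mj} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
L & Y_{1L} & Y_{2L} & \cdots & Y_{iL} & \cdots & Y_{ML} \\
\hline
\text{C.T.} & Y_1 & Y_2 & \cdots & Y_i & \cdots & Y_M \\
\text{C.M.} & \bar{Y}_1 & \bar{Y}_2 & \cdots & \bar{Y}_i & \cdots & \bar{Y}_M
\end{array}
$$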
where C.T. is the cluster total and C.M. the cluster mean. The following notation is used
for the population:

$Y_{ij}$ = value obtained for listing unit $j$ in population cluster $i$ ($i = 1, 2, \ldots, M$; $j = 1, 2, \ldots, L$).

Aggregate of characteristic $y$ for the $i$th population cluster:
$$Y_i = \sum_{j=1}^{L} Y_{ij}$$

Population total for characteristic $y$:
$$Y = \sum_{i=1}^{M} Y_i = \sum_{i=1}^{M} \sum_{j=1}^{L} Y_{ij}$$

Mean for cluster $i$:
$$\bar{Y}_i = \frac{Y_i}{L} = \frac{\sum_{j=1}^{L} Y_{ij}}{L}$$

Mean per population unit, which is also the mean of the $M$ cluster means:
$$\bar{Y} = \frac{Y}{N} = \frac{\sum_{i=1}^{M} Y_i}{ML} = \frac{1}{M} \sum_{i=1}^{M} \left( \frac{\sum_{j=1}^{L} Y_{ij}}{L} \right) = \frac{\sum_{i=1}^{M} \bar{Y}_i}{M}$$

Population Variance:

Population variance for SRS:
$$S^2 = \frac{\sum_{i=1}^{M} \sum_{j=1}^{L} (Y_{ij} - \bar{Y})^2}{N - 1} = \frac{\sum_{i=1}^{M} \sum_{j=1}^{L} (Y_{ij} - \bar{Y})^2}{ML - 1}$$

Population variance within cluster $i$:
$$S_i^2 = \frac{\sum_{j=1}^{L} (Y_{ij} - \bar{Y}_i)^2}{L - 1}$$

Variance of cluster means:
$$S_a^2 = \frac{\sum_{i=1}^{M} (\bar{Y}_i - \bar{Y})^2}{M - 1}$$
Since units are generally not assigned to clusters at random, the degree of homogeneity
between any two units within the same cluster can be measured by the intra-cluster
correlation $\rho_w$, and the variance of cluster means can be expressed in terms of it.
The correlation coefficient between elements in the same cluster is

$$\rho_w = \frac{E\left[(Y_{ij} - \bar{Y})(Y_{ik} - \bar{Y})\right]}{\sqrt{E(Y_{ij} - \bar{Y})^2 \, E(Y_{ik} - \bar{Y})^2}}, \quad i = 1, 2, \ldots, M; \; j, k = 1, 2, \ldots, L; \; j \neq k.$$

Expressing the variance of cluster means $S_a^2$ in terms of the intra-cluster correlation gives

$$S_a^2 = \frac{S^2 (ML - 1)\left[\rho_w (L - 1) + 1\right]}{L^2 (M - 1)} \;\Rightarrow\; S_a^2 \approx \frac{S^2 \left[\rho_w (L - 1) + 1\right]}{L}.$$

The approximation is valid when the number of clusters $M$ is large, since then
$(ML - 1)/(M - 1) \approx L$. (Verify)
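
One way to carry out the suggested verification numerically is sketched below. It assumes a particular reading of the expectations in $\rho_w$: the numerator is averaged over all distinct within-cluster pairs, and the denominator over all $N$ units. The function name and the randomly generated population are hypothetical.

```python
import random
from itertools import combinations

def cluster_mean_variance_check(pop):
    """pop is a list of M clusters, each a list of L values.

    Returns (S_a^2 computed directly, S_a^2 from the rho_w expression, rho_w).
    """
    M, L = len(pop), len(pop[0])
    N = M * L
    grand_mean = sum(sum(c) for c in pop) / N
    # Population variance S^2 with divisor N - 1.
    S2 = sum((y - grand_mean) ** 2 for c in pop for y in c) / (N - 1)
    # Variance of cluster means with divisor M - 1.
    cluster_means = [sum(c) / L for c in pop]
    Sa2_direct = sum((cm - grand_mean) ** 2 for cm in cluster_means) / (M - 1)
    # rho_w: mean cross-product over distinct within-cluster pairs,
    # divided by the mean squared deviation over all N units.
    cross = sum((a - grand_mean) * (b - grand_mean)
                for c in pop for a, b in combinations(c, 2))
    n_pairs = M * L * (L - 1) / 2
    rho_w = (cross / n_pairs) / ((N - 1) * S2 / N)
    # Exact expression for S_a^2 in terms of rho_w.
    Sa2_formula = S2 * (N - 1) * (rho_w * (L - 1) + 1) / (L ** 2 * (M - 1))
    return Sa2_direct, Sa2_formula, rho_w

rng = random.Random(1)
pop = [[rng.gauss(10, 2) for _ in range(5)] for _ in range(8)]  # hypothetical data
direct, formula, rho_w = cluster_mean_variance_check(pop)
print(direct, formula, rho_w)
```

Under this reading the two values of $S_a^2$ agree up to floating-point rounding, whatever the population.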

7.3 Estimation from Sample of Clusters:

Suppose a sample of m clusters, each containing L elements, is drawn from the M clusters
by simple random sampling. The sample cluster data structure is as follows.

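$$
\begin{array}{c|cccccc}
 & \multicolumn{6}{c}{\text{Clusters}} \\
\text{Units} & 1 & 2 & \cdots & i & \cdots & m \\
\hline
1 & y_{11} & y_{21} & \cdots & y_{i1} & \cdots & y_{m1} \\
2 & y_{12} & y_{22} & \cdots & y_{i2} & \cdots & y_{m2} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
j & y_{1j} & y_{2j} & \cdots & y_{ij} & \cdots & y_{mj} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
L & y_{1L} & y_{2L} & \cdots & y_{iL} & \cdots & y_{mL} \\
\hline
\text{Total} & y_1 & y_2 & \cdots & y_i & \cdots & y_m \\
\text{Mean} & \bar{y}_1 & \bar{y}_2 & \cdots & \bar{y}_i & \cdots & \bar{y}_m
\end{array}
$$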

Notation for sample:

$y_{ij}$ = value obtained for listing unit $j$ in sample cluster $i$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, L$).

Aggregate value for the $i$th sample cluster:
$$y_i = \sum_{j=1}^{L} y_{ij}$$

Mean for sample cluster $i$:
$$\bar{y}_i = \frac{y_i}{L} = \frac{\sum_{j=1}^{L} y_{ij}}{L}$$

Sample mean per element, which is also the mean of the $m$ cluster means:
$$\bar{y}_{ce} = \frac{\sum_{i=1}^{m} \sum_{j=1}^{L} y_{ij}}{n} = \frac{\sum_{i=1}^{m} y_i}{mL} = \frac{\sum_{i=1}^{m} \bar{y}_i}{m}$$

Sampling fraction:
$$f = \frac{n}{N} = \frac{mL}{ML} = \frac{m}{M}$$

Theorem 7.1: A simple random sample of m clusters, each containing L elements, is
drawn from M clusters in the population. Then the sample mean per element
$\bar{y}_{ce}$ is an unbiased estimate of $\bar{Y}$ with variance

$$\operatorname{Var}(\bar{y}_{ce}) = \frac{1 - f}{m} \cdot \frac{S^2 (ML - 1)\left[\rho_w (L - 1) + 1\right]}{L^2 (M - 1)} \approx \frac{1 - f}{m} \cdot \frac{S^2}{L} \left[\rho_w (L - 1) + 1\right]$$

Prove this theorem.

Corollary: For the population total, an unbiased estimate and its variance are,
respectively,

$$\hat{Y}_{ce} = N \bar{y}_{ce} = ML \, \bar{y}_{ce} \quad \text{and} \quad \operatorname{Var}(\hat{Y}_{ce}) = N^2 \, \frac{1 - f}{m} \, S_a^2 = M^2 L^2 \, \frac{1 - f}{m} \, S_a^2$$
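
Before attempting the proof, the theorem can be checked by brute force on a small population: enumerate every possible simple random sample of m clusters and compare the average and the variance of the resulting estimates with $\bar{Y}$ and $(1 - f) S_a^2 / m$. The sketch below does this; the function name and the five-cluster population are hypothetical and chosen only to keep the enumeration small.

```python
from itertools import combinations

def check_theorem(pop, m):
    """Enumerate all samples of m clusters from pop (M clusters of L units each).

    Returns (population mean, mean of estimates, variance of estimates,
    theoretical variance (1 - f) * S_a^2 / m).
    """
    M, L = len(pop), len(pop[0])
    cluster_means = [sum(c) / L for c in pop]
    Y_bar = sum(cluster_means) / M
    Sa2 = sum((cm - Y_bar) ** 2 for cm in cluster_means) / (M - 1)
    # Every subset of m clusters is equally likely under simple random sampling.
    estimates = [sum(s) / m for s in combinations(cluster_means, m)]
    expected = sum(estimates) / len(estimates)
    variance = sum((e - expected) ** 2 for e in estimates) / len(estimates)
    theory = (1 - m / M) * Sa2 / m
    return Y_bar, expected, variance, theory

pop = [[1, 3], [2, 6], [5, 7], [4, 4], [0, 2]]  # hypothetical: M = 5, L = 2
Y_bar, expected, variance, theory = check_theorem(pop, m=2)
print(Y_bar, expected)   # these two should coincide (unbiasedness)
print(variance, theory)  # these two should coincide (variance formula)
```

Multiplying the estimates by N = ML checks the corollary in the same way, since the variance then scales by $N^2$.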

Estimation of the variance from a sample:

If the sample variance between cluster means is given by

$$s_a^2 = \frac{\sum_{i=1}^{m} (\bar{y}_i - \bar{y}_{ce})^2}{m - 1},$$

then the estimated variance of the mean $\bar{y}_{ce}$ is

$$\operatorname{var}(\bar{y}_{ce}) = \frac{1 - f}{m} \, s_a^2$$

and its standard error is

$$\text{s.e.}(\bar{y}_{ce}) = \sqrt{\frac{1 - f}{m}} \; s_a$$

7.4 Comparison of Cluster of Equal Size with SRS

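Suppose a simple random sample of size n is taken from the N population units, where
n = mL and N = ML, and let $\bar{y}$ denote its mean. If
$\operatorname{Var}(\bar{y}_{ce}) < \operatorname{Var}(\bar{y})$, then cluster sampling is
more efficient than simple random sampling. Show that this is true when
$\rho_w < -\dfrac{1}{N - 1}$, or approximately $\rho_w < 0$ for large N.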
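
The comparison can be explored numerically by taking the ratio of the exact cluster-sampling variance from Theorem 7.1 to the simple random sampling variance $(1 - f) S^2 / (mL)$; the factors $(1 - f)$, $m$ and $S^2$ cancel. The sketch below is an illustration only: the function name `variance_ratio` and the trial values of $\rho_w$ are assumptions, while M = 400 and L = 10 are borrowed from the example that follows.

```python
def variance_ratio(rho_w, M, L):
    """Var(cluster-sample mean) / Var(SRS mean) for the same n = mL units.

    A ratio below 1 means cluster sampling is the more efficient design.
    """
    N = M * L
    return (N - 1) * (rho_w * (L - 1) + 1) / (L * (M - 1))

M, L = 400, 10
N = M * L
threshold = -1 / (N - 1)
for rho_w in (-0.05, threshold, 0.0, 0.1):  # hypothetical trial values
    print(f"rho_w = {rho_w:+.5f}  ratio = {variance_ratio(rho_w, M, L):.4f}")
```

The ratio equals 1 at the threshold $\rho_w = -1/(N - 1)$ and rises above 1 as soon as units within a cluster are positively correlated, which is why cluster sampling usually has larger standard errors than simple random sampling of the same size.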
Example

An agricultural extension agent wishes to estimate the average farm size (in hectares) per
household in a given community. In this community there are 4000 households living in
400 geographical clusters of 10 households each. Because of funding constraints, a
simple random sample of four clusters is selected. The household data from the 4 selected
clusters are given below.

Cluster   Farm size (in hectares)

1         1.0, 2.0, 1.5, 2.2, 3.0, 3.5, 1.0, 4.1, 1.6, 1.0
2         1.1, 3.1, 2.2, 2.8, 3.5, 1.0, 4.4, 1.1, 1.0, 2.3
3         2.3, 1.0, 1.2, 1.4, 1.0, 3.2, 2.1, 1.5, 3.3, 1.0
4         1.6, 1.2, 3.0, 2.0, 1.3, 5.0, 0.2, 2.2, 3.5, 1.0

i. Estimate the average farm size per household for the community.
ii. Find the standard error of the estimate.
iii. Estimate the total area under crops, assuming that each household's entire farm
area is cultivated.
Solution:

(i) First, find the cluster means $\bar{y}_i = \dfrac{\sum_{j=1}^{L} y_{ij}}{L}$, where L = 10 and
$y_{ij}$ represents the characteristic, farm size in hectares, of households.

$$\bar{y}_1 = \frac{1.0 + 2.0 + 1.5 + 2.2 + 3.0 + 3.5 + 1.0 + 4.1 + 1.6 + 1.0}{10} = \frac{20.9}{10} = 2.09$$

$$\bar{y}_2 = \frac{1.1 + 3.1 + \cdots + 2.3}{10} = \frac{22.5}{10} = 2.25$$

$$\bar{y}_3 = \frac{2.3 + 1.0 + \cdots + 1.0}{10} = \frac{18.0}{10} = 1.80$$

$$\bar{y}_4 = \frac{1.6 + 1.2 + \cdots + 1.0}{10} = \frac{22.0}{10} = 2.20$$

The overall estimate for the community is obtained by

$$\bar{y}_{ce} = \frac{\sum_{i=1}^{m} \bar{y}_i}{m}, \quad \text{where } m = 4, \; i = 1, 2, 3, 4.$$

$$\bar{y}_{ce} = \frac{2.09 + 2.25 + 1.80 + 2.20}{4} = \frac{8.34}{4} = 2.085 \approx 2.1 \text{ hectares.}$$
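(ii) To find the standard error of $\bar{y}_{ce}$, we must first calculate the variance of $\bar{y}_{ce}$:

$$\operatorname{var}(\bar{y}_{ce}) = \frac{1 - f}{m} \, s_a^2, \quad \text{where } f = \frac{m}{M}, \; M = 400 \text{ and } s_a^2 = \frac{\sum_{i=1}^{m} (\bar{y}_i - \bar{y}_{ce})^2}{m - 1}$$

$$\Rightarrow \operatorname{var}(\bar{y}_{ce}) = \frac{1 - \frac{4}{400}}{4} \left[ \frac{(2.09 - 2.085)^2 + (2.25 - 2.085)^2 + (1.8 - 2.085)^2 + (2.2 - 2.085)^2}{4 - 1} \right]$$

$$= \frac{1 - 0.01}{4} \left[ \frac{0.000025 + 0.027225 + 0.081225 + 0.013225}{3} \right]$$

$$= \frac{0.99}{4} \left[ \frac{0.1217}{3} \right] = 0.01004$$

$$\text{s.e.}(\bar{y}_{ce}) = \sqrt{0.01004} = 0.1002$$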
(iii) Estimate of the total area cultivated:

$$\hat{Y}_{ce} = N \bar{y}_{ce} = ML \, \bar{y}_{ce} = 400 \times 10 \times 2.085 = 8340 \text{ hectares}$$

If a 95% confidence interval for the population mean $\bar{Y}$ is required, it is
calculated as follows, with $Z_{\alpha/2} = 1.96$:

$$\bar{Y} = \bar{y}_{ce} \pm Z_{\alpha/2} \; \text{s.e.}(\bar{y}_{ce})$$

$$\Rightarrow \bar{Y} = 2.085 \pm 1.96 \times 0.1002$$

$$\Rightarrow \bar{Y} = 2.085 \pm 0.196$$

$$1.889 \le \bar{Y} \le 2.281$$
We are 95% confident that the actual average farm size per household in the community
lies between 1.889 and 2.281 hectares.
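
The arithmetic of the solution can be retraced with a short Python sketch. It starts from the four cluster means reported in the solution rather than from the raw household values, because the ten values listed for cluster 4 sum to 21.0 while the solution uses 22.0; the sketch follows the solution's figures. Variable names are illustrative.

```python
import math

# Cluster means as reported in the solution (hectares per household).
cluster_means = [2.09, 2.25, 1.80, 2.20]
M, L = 400, 10              # clusters in the population, households per cluster
m = len(cluster_means)      # clusters in the sample
f = m / M                   # sampling fraction

# (i) Mean per household: the mean of the cluster means.
y_ce = sum(cluster_means) / m

# (ii) Between-cluster sample variance, estimated variance, standard error.
s_a2 = sum((y - y_ce) ** 2 for y in cluster_means) / (m - 1)
var_y_ce = (1 - f) / m * s_a2
se = math.sqrt(var_y_ce)

# (iii) Estimated total area.
total = M * L * y_ce

# 95% confidence interval using the normal value 1.96, as in the text.
z = 1.96
ci = (y_ce - z * se, y_ce + z * se)

print(f"mean = {y_ce:.3f}, var = {var_y_ce:.5f}, s.e. = {se:.4f}")
print(f"total = {total:.0f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```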
