Statistical Sampling Guide
Introduction
Chapter 1: Sampling Concepts
Chapter 2: Statistical Sampling Techniques
1. Simple Random Sample
2. Systematic Random Sample
3. Sampling Proportional to the Size
4. Stratified Random Sample
5. Cluster Sampling
6. Multi-Stage Sampling
Chapter 3: Sample Size Estimation
1. Requirements for Sample Size Estimation
2. Pre-assessment of the population variance
3. Choosing the convenient variable for the estimation of the sample size
4. Sample size estimation to estimate the Ratio (p)
5. Sample Size Estimation for the Population Mean (μ)
6. Sample Size Estimation in stratified sampling design
7. The Design Effect in Cluster Sample Size Estimation
Chapter 4: Sampling Technique in Statistics Centre – Abu Dhabi
1. Sampling Frames in the Statistics Centre – Abu Dhabi
2. Designing Statistical Survey Samples
References
Introduction
The Statistical Sampling Guide of the Statistics Centre – Abu Dhabi falls within the scope of work
entrusted to the Statistical Research, Methodology and Quality Standards Department pertaining to the
documentation of statistical operations. This guide aims to provide the statistical technicians
within the centre, along with data users outside the centre, with details of the sample design and
selection procedures followed by the centre, regardless of the type of survey (economic survey,
household survey, etc.).
This guide consists of four main chapters. Chapter one covers the statistical terms and
concepts related to statistical sampling techniques. These concepts draw on
sampling theory as well as on the statistical evidence and dictionaries of statistical terms adopted by
international and regional organizations. Chapter two presents the various statistical
sampling techniques, together with the limitations and advantages of each. It also presents
the criteria for selecting the best technique to be adopted in statistical sampling.
Chapter three is concerned with estimating the sample size in accordance with the sample design
technique used, and presents the main requirements to be met to determine the sample size.
Finally, Chapter four describes the statistical sampling frames adopted in the centre,
whether for economic, household, or any other type of survey. This chapter
also covers the statistical sampling designs used in the Statistics Centre – Abu Dhabi, as well as the
mechanism used to determine the sample size and the random techniques used in sample
selection.
Chapter 1: Sampling Concepts
This chapter presents the main concepts and definitions related to the theoretical and applied aspects of
sampling and sample design. They are aligned with the international
concepts and definitions adopted in this field:
Statistical Population
The term “Statistical Population” covers all statistical units on which the statistical survey is to be
conducted. These units shall be clearly defined, and they may share one or more common
characteristics. Most statistical populations consist of units that
change over time (dynamic populations), while others are static populations on which the
time factor does not have any effect.
Statistical Survey
The Statistical Survey refers to an organized statistical process, based on scientific methods, for
collecting data from the statistical population. The units are most often chosen by a
probability sampling technique, which covers part of the population, or by including all units of the population.
Comprehensive Census
The Comprehensive Census refers to the organized statistical process based on the inclusion of all the
units of the Statistical Population during the data collection process. Usually, the comprehensive
census is carried out for social, agricultural and economic populations. A comprehensive census
is also carried out when the target population is small, which makes the
sampling technique inefficient. In addition, when the statistician lacks a clear background on the
nature of the population, a comprehensive census may be carried out instead of a
sample survey.
Sampling Techniques
These techniques shall be used to select a sample of units from the population to be subject to statistical
methods, whereby the results reached based on the sample data represent the targeted population
indicators.
Random Selection
This process is related to the selection of units from the statistical population in such a manner that avoids
any personal control which interferes with the selection or exclusion of any of the population units while
ensuring the provision of a chance for each unit to appear in the selected sample.
Sampling Frame
It is a list or record including all units of the statistical population. It usually includes names and addresses
of the statistical units and some other relevant information. It might also refer to a map enabling access
to the statistical units for data collection.
Sample Design
Sample Design refers to a specific plan aiming to select a sample from a specific population. It also refers
to the technique that shall be adopted by the statistician in the sample unit’s selection process.
Sample
The Sample is a subset of the Statistical Population. It shall be selected by one of the statistical sampling
techniques. The sample shall be representative of the survey population. For this purpose, the sample
must include the characteristics of the population in such a manner that enables its results to be
generalized to estimate population parameters.
Types of Samples
Statistical samples shall be divided into two main sections:
1- Probability Samples:
Probability samples are selected in accordance with the laws of probability, whereby their units
are selected successively and with a known probability. Probability Samples include Simple
Random Samples, Stratified Samples, Systematic Samples, Cluster Samples, etc.
Probability samples are mainly characterized by their capability to generalize their results to all
population units by calculating the sampling weights, whereby the amount of weight of the sample
unit depends on the probability of selecting the relevant unit from the population. Probability
samples also enable the analysis of the sample results as well as the calculation of standard
errors and coefficient of variation in addition to the Design Effect. The Probability Sample thus
makes it possible to estimate the margin of error and the confidence level of the resulting estimates.
Therefore, the official statistics indicators depend mainly on the Probability Samples designed in
such a manner that they represent their results at the level of the population as a whole and
estimate the sampling errors.
2- Non-Probability Samples:
Non-Probability Samples are selected by a method that does not rely on the laws of
probability. Non-Probability Samples include purposive sampling, quota sampling, convenience
sampling, snowball sampling, etc. This type of sampling is often applied in opinion polls and in
studies of limited phenomena within the population. Such samples give results
that describe the sample units only rather than the population as a whole.
Multi-Purpose Sample
The Multi-Purpose Sample refers to the sample by which data is collected for several topics included in
a single statistical survey, such as income, expenditure, health and nutrition.
Successive Sample
It refers to a sampling technique, whereby the Successive Sample covers the population for several
years, where the population is divided into several groups, each being covered for one year. This
technique is usually used in small populations as an alternative to censuses.
Pilot Survey
The Pilot Survey consists of selecting sampling units and collecting their data in order to
test the questionnaire and to address the challenges that the researcher might face when the
survey is conducted.
Matched Sample
Matched samples refer to sample units consisting of similar or matching pairs, whereby the studied
characteristic related to the sample is measured twice under different conditions.
Self-Weighting Sample
The Self-Weighting Sample refers to the sampling design which includes equal sampling weights for all
sample units. In other terms, the probabilities of selecting all sample units shall be equal.
Convenient Sample
It is a Non-Probability sampling technique by which a sample is taken from the population because its
units are available and suitable to take part in the survey. Such a sample is not considered
representative of the whole population.
Sampling Unit
The Sampling Unit refers to the unit selected in the sample, representing an element in the statistical
population. In other terms, the sampling unit constitutes the unit for which statistical data is collected.
Analyzing Unit
The Analyzing Unit is used during the analysis of the collected statistical data to achieve the statistical
survey objectives. The Analyzing Unit could be the sampling unit itself.
Non-Response Errors
These errors result from the refusal of respondents in the sample to respond to the survey. Unit
non-response is a full non-response to all the required variables in the questionnaire, whereas
item non-response is a partial non-response to some variables in the questionnaire.
Random Errors
Random Errors are the deviation of the data values from the population parameter, provided that such
deviation is made by chance. The random errors cannot be concealed in the comprehensive census.
More often, random errors are limited and can thus be measured and identified. The size of random
errors depends on two main factors: the extent of difference or contrast between the population units on
the one hand, and the sample size in relation to the population from which it was selected on the other.
The higher the variance among the population units, the larger the random errors are likely to be. As for
the sample size, a larger sample leads to smaller random errors.
Bias Error
(a) Bias error in relation to estimation:
This term refers to the deviation of the mean of all possible estimates of a population parameter
from its true value. Such an error is not easily detected, and it can be corrected only through radical
modifications to the study design or data collection technique, or by adjusting the results.
Standard Error
The Standard Error represents the square root of the estimated sample variance, whereby the sample
variance refers to the mean of the squares of the differences between the values of the sample units and
the arithmetic mean of the concerned units.
Random Start
It refers to a number selected at random when using the systematic sample, whereby the selected
number is between 1 and K, where K represents the sampling interval (the systematic period).
Optimum allocation
It constitutes one of the stratified sampling unit’s allocation techniques on the various strata, whereby the
share of each stratum is directly proportional to the size of the stratum and the variance within the stratum,
and inversely proportional to the cost related to the data collection of the sampling unit within the
concerned stratum.
Neyman Allocation
It constitutes one of the stratified sampling allocation techniques on the various strata, whereby the share
of each stratum is directly proportional to both, the size of the stratum and the variance within the same
stratum.
Proportional Allocation
It constitutes one of the stratified sampling unit’s allocation techniques for the various strata, whereby the
share of each stratum is directly proportional to the size of the concerned stratum in terms of the number
of population units in the stratum.
Design Effect
It is the ratio between the variance of an estimate under a given sampling design and the variance of
that estimate under a simple random sample design of the same size.
Weighting
Weighting constitutes the procedure for calculating the weights for the sample units (the weight of a
sampling unit is equal to the inverse of the probability of selecting the unit from the population) to be used
in sample estimation.
Chapter 2: Statistical Sampling Techniques
The sampling theory includes several statistical sampling techniques, whereby they all focus on obtaining
a statistical sample that produces sampling estimates with the lowest possible sampling error, while
considering the estimated sample size. This chapter is concerned with the statistical sampling techniques.
Example (1):
Consider a population of 60 establishments from which a simple random sample of 4 establishments is to be
selected. The establishments of the population are given serial numbers from
01, 02, 03, … to 60. A two-digit random number is then selected from the random numbers table or via
the computer. In this example, suppose that the number 45 has been selected. Accordingly, the
establishment with the serial number 45 is selected into the sample. If the next random number
is 73, it is discarded because it exceeds 60, and the selection is
repeated. The process continues until 4 establishments, which form the simple random sample, have been selected.
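The selection procedure in Example (1) can be sketched in Python as follows. This is only an illustrative sketch: the function name `simple_random_sample`, the seed, and the rule of discarding a serial number that has already been drawn (sampling without replacement) are assumptions that are not stated in the example.

```python
import random

def simple_random_sample(population_size, sample_size, rng):
    """Pick distinct serial numbers between 1 and population_size.

    Mimics reading two-digit numbers from a random numbers table:
    a number outside 1..population_size is discarded, and so is a
    number that was already drawn (assumed: no unit is taken twice).
    """
    selected = []
    while len(selected) < sample_size:
        number = rng.randint(0, 99)        # two-digit random number 00..99
        if number < 1 or number > population_size:
            continue                       # e.g. 73 is discarded when N = 60
        if number in selected:
            continue                       # already in the sample
        selected.append(number)
    return selected

rng = random.Random(2024)                  # arbitrary seed, for repeatability only
sample = simple_random_sample(60, 4, rng)
print(sample)
```

Any other source of uniform random numbers could replace `rng`; the acceptance rule is what defines the procedure.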
Example (2):
In order to estimate the number of employees in the establishments within a selected region consisting
of 400 establishments, a Systematic Sample of 16 establishments is selected. The sample
selection technique is as follows:
𝑁 = 400, 𝑛 = 16
𝑘 = 𝑁/𝑛 = 400/16 = 25
A random number between 1 and 25 is then selected. In this case, consider the number 14. The serial
numbers of the selected sampling units are obtained by repeatedly adding 𝑘 to the random start:
14, 39, 64, 89, …, 389.
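As a minimal sketch of Example (2), the serial numbers can be generated by adding the interval k to the random start. The function name `systematic_sample` is hypothetical, and the sketch assumes that N is an exact multiple of n, as it is in the example.

```python
def systematic_sample(population_size, sample_size, random_start):
    """Return the serial numbers of a systematic sample.

    Assumes population_size is an exact multiple of sample_size,
    so the interval k = N / n is a whole number.
    """
    interval = population_size // sample_size
    if not 1 <= random_start <= interval:
        raise ValueError("random start must lie between 1 and k")
    return [random_start + i * interval for i in range(sample_size)]

selected = systematic_sample(400, 16, 14)
print(selected)   # starts at 14 and ends at 389
```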
A limitation of this type of sample is the difficulty of establishing an unbiased estimate of the
variance. A further limitation arises when the variable is cyclical in the population, which might
bias the selected samples and the estimation process. For example, a sample of one member has
been selected from each household in a group of families. The start number shall be 1 and the Sample
Period shall be 2. The members holding the number 3 were selected in the sample. It shall be noticed
that all the members of the sample hold the third rank in the household. If the members
within each household were listed in the order father, mother, son, daughter,
all the units in the sample would be sons, which might lead to a bias in the sample estimates.
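The periodicity problem described above can be illustrated with a small sketch. The frame of five households, the start of 3 and the interval of 4 are assumptions chosen for illustration (they are not the figures quoted in the text): the point is only that an interval equal to the household size always lands on the same position in every household.

```python
# Hypothetical frame: 5 households, each listed as father, mother, son, daughter.
ROLES = ["father", "mother", "son", "daughter"]
frame = [(household, role) for household in range(1, 6) for role in ROLES]

def systematic_positions(frame_size, start, interval):
    """1-based positions start, start + interval, ... within the frame."""
    return list(range(start, frame_size + 1, interval))

# Assumed values: the interval equals the household size (4) and the start is 3.
positions = systematic_positions(len(frame), start=3, interval=4)
selected_roles = [frame[p - 1][1] for p in positions]
print(selected_roles)   # every selected member has the same role
```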
Example (3):
Within the frame of economic establishments, the establishment with 1,000 employees is considered to
have a weight of 1,000, which means that it contains 1,000 hypothetical units. Accordingly, the
establishment with 100 employees is considered to have a weight of 100, and so on.
The following table clarifies the aforementioned example:
Establishment   Number of    Cumulative Number   Accompanying
                Employees    of Employees        Numbers
1               1000         1000                1 – 1000
2               700          1700                1001 – 1700
3               1200         2900                1701 – 2900
4               500          3400                2901 – 3400
5               300          3700                3401 – 3700
6               800          4500                3701 – 4500
In order to select three establishments, three random numbers between 1 and 4500 shall be selected
using the random numbers table or via the computer. Each random number is then located
in the column entitled “Cumulative Number of Employees” in the above table, and the first
establishment whose cumulative number of employees is equal to or greater than the random number
is selected. In this example,
if the random numbers 75, 2000, and 4000 were selected, the establishments within the sample shall
bear the numbers 1, 3, and 6, respectively.
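The cumulative-total lookup of Example (3) can be sketched as follows. The helper name `select_pps` is hypothetical; the data are the six establishments of the table, and the lookup uses a binary search from the Python standard library.

```python
from bisect import bisect_left
from itertools import accumulate

employees = [1000, 700, 1200, 500, 300, 800]   # establishments 1..6
cumulative = list(accumulate(employees))        # 1000, 1700, 2900, 3400, 3700, 4500

def select_pps(random_number, cumulative_totals):
    """Return the 1-based number of the first establishment whose
    cumulative total is equal to or greater than random_number."""
    if not 1 <= random_number <= cumulative_totals[-1]:
        raise ValueError("random number outside 1..total size")
    return bisect_left(cumulative_totals, random_number) + 1

chosen = [select_pps(r, cumulative) for r in (75, 2000, 4000)]
print(chosen)   # [1, 3, 6]
```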
Whenever the population is relatively large, the aforementioned sampling technique might be very
time-consuming. Therefore, another technique, the Lahiri method, is used to select the sample
proportional to the size. This technique consists of selecting pairs of random numbers. The first
number of each pair identifies the sampling unit and lies between 1 and N, where N
represents the total number of sampling units within the population. The second number of the pair
lies between 1 and M, where M represents the size of
the largest sampling unit within the population in terms of the variable according to which the sampling
weights are calculated. The unit is selected if the second number does not exceed its size; otherwise,
the pair is rejected and a new pair is drawn.
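A minimal sketch of the Lahiri selection follows. It relies on the standard acceptance rule of the method (a unit is kept when the second random number does not exceed the unit's size). The function name `lahiri_select`, the seed, and the reuse of the six establishment sizes from Example (3) are assumptions made for illustration.

```python
import random

def lahiri_select(sizes, rng):
    """Select one unit with probability proportional to size.

    Draws pairs (unit, threshold) until the threshold does not
    exceed the size of the drawn unit, then returns that unit's
    1-based number. No cumulative totals are needed.
    """
    n_units = len(sizes)
    max_size = max(sizes)
    while True:
        unit = rng.randint(1, n_units)          # first number: 1..N
        threshold = rng.randint(1, max_size)    # second number: 1..M
        if threshold <= sizes[unit - 1]:
            return unit

sizes = [1000, 700, 1200, 500, 300, 800]        # employees per establishment
rng = random.Random(7)                          # arbitrary seed
draws = [lahiri_select(sizes, rng) for _ in range(3)]
print(draws)
```

Each call is an independent draw, so this sketch selects with replacement; a without-replacement design would need to redraw when a unit repeats.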
Example (4):
Consider the following eight establishments with their numbers of employees:
Establishment   Number of    Establishment   Number of
Number          Employees    Number          Employees
1               3000         5               6000
2               1500         6               1200
3               7000         7               4500
4               2500         8               5000
In order to estimate the mean number of employees in the establishments, the simple random sampling
technique might be adopted to select a sample of three establishments. If the selected sample consists
of the establishments numbered 1, 2, and 6, the sample mean is 1,900 employees. On
the other hand, the actual mean of the population is 3,838 employees, which is more than double
the mean estimated from the sample. This large error arises from using the
simple random sampling technique for a non-homogenous population.
When applying the stratified sampling technique, the population shall be divided into strata in accordance
with the categories of the number of Employees as follows:
Stratum I – Number of        Stratum II – Number of        Stratum III – Number of
Employees (<3,000)           Employees (3,000 – 5,000)     Employees (>5,000)
Establishment   Number of    Establishment   Number of     Establishment   Number of
No.             Employees    No.             Employees     No.             Employees
2               1500         1               3000          3               7000
4               2500         7               4500          5               6000
6               1200                                       8               5000
A sample consisting of one establishment has been randomly selected from each stratum, namely the
establishments with the serial numbers 2, 7 and 5. The estimated sample mean of employees is given by:
$$ \left(\tfrac{3}{8} \times 6000\right) + \left(\tfrac{2}{8} \times 4500\right) + \left(\tfrac{3}{8} \times 1500\right) = 3937.5 \approx 3938 $$
It shall be noticed that the aforementioned result calculated by stratified random sample is very close to
the actual mean of the population when compared with the result established by the simple random
sampling technique. Therefore, it shall be concluded that using the stratified sampling technique is a
necessity in such cases.
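The comparison in Example (4) can be reproduced with a short sketch. The stratum weights 3/8, 2/8 and 3/8 are taken from the formula above (three, two and three establishments in strata I, II and III); the variable names are hypothetical.

```python
# Employees in the eight establishments of Example (4), keyed by establishment number.
population = {1: 3000, 2: 1500, 3: 7000, 4: 2500, 5: 6000, 6: 1200, 7: 4500, 8: 5000}
population_mean = sum(population.values()) / len(population)

# Simple random sample of establishments 1, 2 and 6.
srs_mean = sum(population[i] for i in (1, 2, 6)) / 3

# Stratified sample: one establishment per stratum (numbers 2, 7 and 5).
# Stratum sizes follow the weights used in the formula above: 3/8, 2/8, 3/8.
stratum_sizes = {"I": 3, "II": 2, "III": 3}
sampled_value = {"I": population[2], "II": population[7], "III": population[5]}
total_units = sum(stratum_sizes.values())
stratified_mean = sum(
    stratum_sizes[s] / total_units * sampled_value[s] for s in stratum_sizes
)

print(population_mean, srs_mean, stratified_mean)
```

With these inputs the stratified estimate differs from the population mean by 100 employees, against a difference of more than 1,900 for the simple random sample.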
Example (5):
In the event the number of establishments in a determined sector is 1,850 establishments, the number of
categories into which the population may be divided shall be as follows:
3+1(Log 1850) = 11
In order to divide the population into strata, the Cochran technique may be used. It considers
both the number of units in the population and the weight of each unit, given that the number of
required strata is determined in advance. The following example illustrates this:
Example (6):
Suppose that the 1,850 establishments in the population are first divided into 12 initial categories
according to the number of employees, and that the population is then to be stratified into four
strata. The initial categories are as follows:
Category       Number of        Number of    Cumulative Number    √C
               Establishments   Employees    of Employees (C)
1-5            200              720          720                  26.8
6-10           250              2000         2720                 52.2
11-15          300              3900         6620                 81.4
16-20          300              5250         11870                108.9
21-25          250              5750         17620                132.7
26-30          200              5400         23020                151.7
31-35          100              3650         26670                163.3
36-40          80               3000         29670                172.2
41-45          80               3360         33030                181.7
46-50          50               2400         35430                188.2
51-55          30               1590         37020                192.4
56 and above   10               950          37970                194.9
The square root of the Cumulative Number of employees ( C ) in the last column shall be divided by the
4.2 Advantages of Stratified Sampling
1. In the stratified sample, the population shall be homogenous in each stratum. The population
shall be well represented, as the samples are selected from different strata, mainly those of
special importance.
2. The usage of the stratified sample shall be more efficient in comparison with the other types of
samples, mainly in the event of a non-homogenous population and in the event of extreme values
of some population units.
3. The stratified sampling leads to the reduction of the incurred costs as it reduces the sample
required to be covered at a certain accuracy level.
4. The stratified sample may be used to obtain results at specified geographical or administrative
levels (districts, sectors, regions, etc.).
5. Controlling, supervising, and organizing the field work, as well as determining the area of work of
each group is better achieved when dividing the population into strata according to the
geographical and administrative areas.
Equal Allocation:
This technique is usually used whenever the need to obtain results at the level of the administrative
regions arises (in the event the stratum represents an administrative region) or in the event of equal
allocation of works within all the strata (according to the availability of the fieldwork capabilities). In
addition, the equal allocation technique is used when the population size is nearly equal in all the strata.
In this case, the size of the sample in a single stratum shall be calculated according to the following
equation:
$$ n_h = \frac{n}{L} \qquad (1) $$
where n is the total sample size and L is the number of strata.
Example (7):
If the sample size is set at 2,000 sampling units and the number of strata is 8, the size
of the sample in each stratum h is equal to:
$$ n_h = \frac{2000}{8} = 250 $$
Proportional Allocation:
This is the most common allocation technique, given its easy application. In fact, when no information
other than the number of the units of the population in each stratum is available, the following equation
may be used to estimate the number of samples in the stratum (h):
$$ n_h = n \left( \frac{N_h}{N} \right) \qquad (2) $$
Where Nₕ represents the number of population units in the stratum (h).
It may be concluded from the above that the sampling fraction is the same in all the
strata, which leads to a self-weighting sample. In this case, there is no need to calculate the sampling
weights when establishing the sample estimates, which allows estimates to be produced quickly and
accurately.
Example (8):
When estimating the number of employees in a specified population, a sample of establishments has
been selected from each region through the stratified sampling technique, using
Proportional Allocation with a total sample size of 35 establishments. The sample size in each
stratum according to the above equation (2) is as follows:
Stratum (Region)   Total number of    Sample size in
                   establishments     the stratum
1                  200                20
2                  100                10
3                  50                 5
Total              350                35
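Equations (1) and (2) can be checked against Examples (7) and (8) with a short sketch; the function names are hypothetical.

```python
def equal_allocation(total_sample, strata_count):
    """Equation (1): the same sample size in every stratum."""
    return total_sample / strata_count

def proportional_allocation(total_sample, stratum_sizes):
    """Equation (2): n_h = n * N_h / N for each stratum."""
    population_size = sum(stratum_sizes)
    return [total_sample * size / population_size for size in stratum_sizes]

per_stratum = equal_allocation(2000, 8)                    # Example (7)
by_region = proportional_allocation(35, [200, 100, 50])    # Example (8)
print(per_stratum, by_region)   # 250.0 [20.0, 10.0, 5.0]
```

In practice the allocations are rarely whole numbers, so a rounding rule that preserves the total sample size has to be chosen.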
Neyman Allocation:
The most important advantage of the Neyman Allocation technique is that it
reduces the variance and increases the accuracy and efficiency of the estimates, because it takes into
consideration the variance of each stratum in addition to its size when allocating the total
sample to the strata. The size of the sample in the stratum is directly proportional to the standard deviation
of that stratum, which increases the efficiency of the sampling design in comparison with the proportional
allocation technique. This technique is usually used when the standard deviation varies between the
strata. Whenever the total sample size is fixed and the cost per sampling unit is the same in all strata,
the sample size of the stratum (h) shall be estimated according to the following equation:
$$ n_h = n \, \frac{N_h S_h}{\sum_{h=1}^{L} N_h S_h} \qquad (3) $$
On the other hand, the standard deviation at the level of each stratum shall be obtained according to a
previous census or may be estimated by reference to previous surveys or other similar surveys.
Example (9):
When estimating the household average income in a certain region, the population is divided into
three strata according to the income category. A sample is selected from each category according to
the Neyman Allocation technique, with a total sample size of 15 households. Consider the following
data:
Stratum (Income           Number of households      Standard deviation of     Nₕ Sₕ
Category)                 in the stratum (Nₕ)       each stratum (Sₕ)
Less than 1,000           30                        20                        600
Between 1,000 – 3,000     50                        30                        1500
Greater than 3,000        20                        50                        1000
Total                     100                       -                         3100
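The allocation for Example (9) is not shown in the table, so the sketch below applies equation (3) to its data. The function name is hypothetical. One assumption should be noted: the standard deviation of the third stratum is taken as 50, the value that reproduces the product Nₕ Sₕ = 1,000 and the total of 3,100 listed in the table.

```python
def neyman_allocation(total_sample, stratum_sizes, stratum_sds):
    """Equation (3): n_h proportional to N_h * S_h."""
    products = [n * s for n, s in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    return [total_sample * p / total for p in products]

sizes = [30, 50, 20]   # N_h, households per income stratum
sds = [20, 30, 50]     # S_h; 50 is assumed so that N_h * S_h = 1000 as in the table
allocation = neyman_allocation(15, sizes, sds)
rounded = [round(a) for a in allocation]
print(allocation, rounded)
```

Under this assumption the 15 households split into roughly 3, 7 and 5 across the three strata.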
Optimum Allocation:
The Optimum Allocation technique aims to minimize the variance for a determined
cost, or to minimize the cost for a certain accuracy level, where the cost factor is
included in the sample allocation to the strata. This technique is usually used when the data
collection cost varies between the strata. For instance, the data collection cost in certain areas is
much higher than the data collection cost in others. Several equations are used for this purpose, of which
we present the following:
$$ n_h = n \, \frac{N_h S_h / \sqrt{C_h}}{\sum_{h=1}^{L} N_h S_h / \sqrt{C_h}} \qquad (4) $$
Where (Ch) represents the cost of a single sampling unit in stratum (h).
Example (10):
To estimate the mean field production in several agricultural regions, a stratified sample has been
selected, where each region constituted a single stratum. The total sample size was 200
fields. According to the aforementioned equation (4), the sample size in each stratum is
shown in the last column of the table below:
Stratum     Number of     Standard deviation     Nₕ Sₕ    Unit Cost    Nₕ Sₕ / √Cₕ    Sample
(Region)    fields Nₕ     of the stratum Sₕ               (Cₕ)                        Size (nₕ)
1           200           10                     2000     4            1000           58
2           100           20                     2000     6            816            47
3           150           10                     1500     6            612            36
4           100           15                     1500     9            500            29
5           50            20                     1000     12           287            17
6           50            20                     1000     20           224            13
Total       650                                                        3439           200
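The last column of the table can be recomputed from equation (4) with the sketch below. The function name is hypothetical, and the code works with unrounded intermediate values, so the column Nₕ Sₕ / √Cₕ may differ from the table by a unit or two while the rounded sample sizes agree.

```python
from math import sqrt

def optimum_allocation(total_sample, sizes, sds, unit_costs):
    """Equation (4): n_h proportional to N_h * S_h / sqrt(C_h)."""
    terms = [n * s / sqrt(c) for n, s, c in zip(sizes, sds, unit_costs)]
    total = sum(terms)
    return [total_sample * t / total for t in terms]

fields = [200, 100, 150, 100, 50, 50]   # N_h
sds = [10, 20, 10, 15, 20, 20]          # S_h
costs = [4, 6, 6, 9, 12, 20]            # C_h
allocation = optimum_allocation(200, fields, sds, costs)
rounded = [round(a) for a in allocation]
print(rounded)   # [58, 47, 36, 29, 17, 13]
```

Regions 1 and 2 have the same product Nₕ Sₕ, yet region 2 receives fewer fields because its unit cost is higher.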
The size of Stratum 3, according to the number of establishments therein, represents 50% of the
population units, although it contains only 16.67% of the production quantity. Therefore, if the
number of establishments is used to weight the strata, misleading results may be produced, whereby
the share of Stratum 3 represents 50% of the size of the selected sample. However, the share of Stratum
2 represents 17% of the selected sample, although it contains 50% of the total production quantity related
to revenues, which reduces the accuracy and efficiency of the sample and increases the related costs.
For this purpose, the best technique to be adopted consists of allocating the sample to the strata in
accordance with the production quantity, being the variable to be studied. The sample shall be allocated
between the strata based on the production quantity rather than the number of establishments, using the
following equation to allocate the sample in accordance with the proportional technique with a specific
variable:
$$ n_h = n \, \frac{X_h}{\sum_{h=1}^{L} X_h} \qquad (5) $$
Where Xh refers to the value of the variable contained in stratum (h), being the production quantity
according to the above mentioned example.
If the required sample size consists of 100 establishments, the allocation of the sample to the strata in
accordance with the proportional technique and the production quantity variable shall be as follows:
Stratum   Sample Size
1         33
2         50
3         17
Total     100
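Equation (5) can be sketched as follows. The production quantities per stratum are not given in this section, so the values below are hypothetical figures chosen only to match the shares implied by the allocation (one third, one half and one sixth of total production).

```python
def allocation_by_variable(total_sample, stratum_totals):
    """Equation (5): n_h = n * X_h / sum of X_h over all strata."""
    grand_total = sum(stratum_totals)
    return [total_sample * x / grand_total for x in stratum_totals]

# Hypothetical production quantities X_h (not taken from the guide).
production = [2000, 3000, 1000]
allocation = allocation_by_variable(100, production)
rounded = [round(a) for a in allocation]
print(rounded)   # [33, 50, 17]
```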
5. Cluster Sampling
The Cluster Sampling Technique is based on the principle of conveniently dividing the population into
groups, whereby the latter are close in terms of size and homogenous in terms of the studied variables.
Each of the groups is called “cluster”, and the group of clusters represents the complete population
without omission or repetition.
The most important advantage of the cluster sample is its cost efficiency.
This technique is usually used for populations lacking sampling frames or where it is difficult to provide
an updated frame of the population units, whereas a frame of clusters can be provided, saving time and
effort. Another advantage of this technique is the saving in transportation costs
between the sampling units during the data collection stage. It shall be kept in mind that the disadvantage
of the cluster sample is that it is less efficient than the simple random sample, as its units are
less spread across the population.
The following shall be taken into consideration when using the cluster sample:
• Matching the number and size of clusters, whereby the size of the cluster is relatively small, and
their number is relatively large.
• When forming the clusters, units from the populations close spatially or within a certain
geographic area shall be selected, being similar in terms of the studied variable.
• Consistency in terms of the sizes of clusters, whereby they shall be as close as possible in size.
• Each cluster shall be clearly defined in order to be distinguished from another.
6. Multi-Stage Sampling
The main challenge faced by several surveys is represented by the lack of an updated frame for the main
sampling units, such as establishments, housing, etc., whereby it is difficult to develop an updated frame
for the same. At the same time, a list or frame with a variable is provided at a collective rather than a
detailed level, such as population communities or major regions. In the relevant cases, the multi-stage
sampling technique may be used.
• Saving time and money, whereby setting a frame with the primary sampling units shall be
sufficient.
• The multi-stage sample is characterized by flexibility, whereby the sampling technique may be
used at each of the various stages.
Within the multi-stages sampling technique, it is preferable to divide the population into equal primary
sampling units in the following cases:
• When the size of the primary sampling units is relatively large, it shall need more time in terms of
preparing the secondary sampling units’ frame.
• When the size of the primary sampling units is small, it shall need time in terms of navigating
between samples.
• The simple random unit may be used in the event the primary sampling units are homogenous.
• The primary sampling units may be divided into strata, and a sample shall be selected from each
stratum.
• In case of significant variance between primary sampling units, the proportional to the size
sample may be used.
• The systematic sample may be used. However, it will not produce an unbiased estimate for the
sampling error.
As for the secondary sampling units, they shall be selected through any of the following sampling
techniques: simple random sampling, systematic sampling, cluster sampling, or proportional to the size
sampling.
Chapter 3: Sample Size Estimation
This chapter is concerned with the sample size estimation when conducting a statistical survey in addition
to the main requirements to be made available in order to access the accurate estimation of the sample
size. Furthermore, it presents a sample size estimation mechanism in accordance with the various
statistical sampling techniques.
1. The confidence level of the estimates to be built from the sample, such as 90%,
95%, 99%, etc. Statistically, it represents the area under the
standard normal distribution curve for which the values of Z are 1.64, 1.96, 2.58, etc., respectively. The
confidence level of the estimate is positively related to the sample size,
meaning the larger the sample, the higher the confidence level.
2. The margin of error of the estimate, which means the difference between the actual and estimated
value of the parameter to be estimated using the sample data. The
sample size is inversely related to the margin of error, meaning that a lower margin of error in the
estimation requires a larger sample size.
3. The variance of the population. When the population variance is unknown, a suitable
variance estimate shall be adopted. If the survey aims to estimate several
indicators, a suitable indicator, known as the Key Indicator, shall be selected to
estimate the sample size, such that all the required indicators can be estimated with high
accuracy.
The confidence level and margin of error shall be determined as a requirement when calculating the
sample size. A larger sample size is needed whenever the required margin of error (d) is low and
whenever the required confidence level is high. Sample size determination is carried out by studying
several values of the confidence level (Z) and the margin of error (d). On this basis, different scenarios
are established for the sample size, which enables the survey management to strike a balance between
them in accordance with the costs and the available resources.
2.1. Two-stage sample segmentation
Following this technique, the sample is divided into two parts and executed in two stages. In the
first stage, a simple random sample of size $n_1$ is selected, from which the variance is estimated by $s_1^2$.
This estimate is then used to calculate the final sample size (n).
The remaining sampling units of the complete sample, $n - n_1$, are selected in the second stage, once
the final sample size has been calculated from the estimated variance.
Example (1):
To calculate the sample required for estimating the annual average household expenditure, 500
households may first be randomly selected from the population under study, and the variance
estimated from them. Applying the sample size estimation may then show that the sufficient sample
size (according to the aforementioned accuracy and confidence levels) is 2,300 households. The survey
is then completed with the remaining sample units, which is the difference between 2,300 and 500,
namely 1,800 households.
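The arithmetic of the two-stage approach can be sketched as follows. This is only an illustration: the function names are hypothetical, and the formula n = z² s₁² / d² used for the final size is an assumption borrowed from the mean-estimation equation presented later in this chapter.

```python
import math


def final_sample_size(pilot_variance: float, z: float, d: float) -> int:
    """Final size n = z^2 * s1^2 / d^2, rounded up (assumed formula)."""
    return math.ceil(z ** 2 * pilot_variance / d ** 2)


def second_stage_units(final_n: int, pilot_n: int) -> int:
    """Units still to be surveyed after the first-stage sample."""
    return max(final_n - pilot_n, 0)


# Figures from Example (1): 2,300 units required, 500 already surveyed.
remaining = second_stage_units(2300, 500)
print(remaining)
```

If the first-stage sample already meets or exceeds the final size, the sketch returns zero remaining units rather than a negative number.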
This technique is mainly characterized by providing reliable estimates of the parameter $S^2$.
However, it is limited by the major effort it requires over a long period. In addition, because the sample
selection is divided into two phases, the sample units are selected with different sampling fractions,
meaning that probability theory is not applied to the sample selection in its optimal form.
This technique relies on the data of a pilot survey executed shortly before the main survey. Such a
pilot survey also serves other objectives, such as testing the survey questionnaire and the control and
auditing rules, and estimating the number of survey staff needed to carry out the survey. Pilot survey
data may also be used to estimate the population variance and to calculate the standard deviation
needed to estimate the main survey sample size.
Practically, this is the most common technique: when estimating a suitable sample size, prior
surveys conducted on the same population or a similar one are reviewed in order to estimate
the standard error value. The variance tends to be more stable over time than the central tendency
measures of the phenomenon under study, although it may still change as the behaviour of the
phenomenon changes over time.
The ratio indicators (the number of cases characterizing the studied variable to the total number of cases)
are among the main indicators included in studies and surveys. The ratio (p) may be estimated. It shall
also be well known that the closer the ratio to the actual parameter, the more accurate the variance
estimation. For instance, if the parameter value in the population is p = 0.3, the variance value shall be
$$S^2 = p(1 - p) = 0.3 \times 0.7 = 0.21$$
It shall be noticed that the variance value reaches its maximum when the ratio is equal to 0.5, as follows:
$$S^2 = p(1 - p) = 0.5 \times 0.5 = 0.25$$
This assumption requires the selection of the largest sample size under the previously adopted accuracy
and confidence levels.
If the variance value is unknown regarding a certain phenomenon, and if the convenient sample size
estimation is to be carried out, the variance value shall be set at 0.25, providing the largest sample size
possible as a precautionary measure.
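The behaviour of the variance p(1 − p) can be checked numerically. The short sketch below evaluates it over a grid of ratios; the grid and the function name are chosen only for illustration.

```python
def proportion_variance(p: float) -> float:
    """Variance of a 0/1 indicator whose population ratio is p."""
    return p * (1.0 - p)


# Evaluate the variance for p = 0.1, 0.2, ..., 0.9.
grid = [i / 10 for i in range(1, 10)]
worst_p = max(grid, key=proportion_variance)

for p in grid:
    print(f"p = {p:.1f}  variance = {proportion_variance(p):.2f}")
print("largest variance at p =", worst_p)
```

The variance is symmetric around 0.5, which is why 0.25 serves as the conservative value when nothing is known about the ratio.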
3. Choosing the convenient variable for the estimation of the sample size
In practice, the objective of conducting a statistical survey is rarely limited to collecting data
on a single indicator. Several indicators are usually required in order to analyze the studied
phenomenon in its different aspects. Moreover, the majority of national and international statistical
institutions now focus on conducting multi-indicator surveys, which have become common and
widespread with the development of data collection techniques and their automatic processing. The
results obtained by such surveys provide various indicators. Among these surveys is the Multiple
Indicator Cluster Survey (MICS), which aims to collect detailed indicators regarding the status of
children and mothers as well as their surrounding conditions and factors.
The main challenge that may be faced when determining the sample size in this type of surveys is related
to the mechanism to be adopted in order to choose the convenient variable that may produce a sufficient
sample size to obtain an estimate of the value of the concerned indicator as well as estimations for other
indicators included in the survey, whereby they shall all be characterized with accuracy and efficiency.
This challenge leads to the fact that the optimum technique for the determination of the sample size shall
be according to the following:
• Determining all important indicators included in the survey, as well as selecting an important
indicator requiring the largest sample size.
• Ensuring keeping the error rate at a specific value in terms of the various indicators regarding
which data is collected.
• Determining the targeted sub-populations for each indicator and selecting the one with the lowest
rate in the overall survey population, while considering its importance to the overall
survey objectives.
Example (2):
Based on the Household Income and Expenditure Survey, if the ratio of households spending money on
durable goods is 82% and the ratio of households with children attending school is 60%, the sample
size derived from the first indicator shall be adopted, although the second indicator requires a
larger sample. This is evident from the following calculations, with Z = 1.96 and d = 0.05:
$$n_1 = \frac{(1.96)^2 (0.82)(0.18)}{(0.05)^2} \approx 227$$
$$n_2 = \frac{(1.96)^2 (0.6)(0.4)}{(0.05)^2} \approx 369$$
The first ratio is far more significant to the survey than the second one. Accordingly, increasing the
sample size from 227 to 369 is not justified for the sake of an indicator that is not considered essential
for conducting the Household Income and Expenditure Survey.
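The comparison between candidate indicators can be reproduced with a few lines of code. The sketch below recomputes the simple random sample size for each ratio in Example (2) using n = Z² p(1 − p) / d²; the function name and the rounding to the nearest unit are assumptions made for illustration.

```python
def srs_size_for_proportion(p: float, z: float = 1.96, d: float = 0.05) -> int:
    """Sample size for a ratio p under simple random sampling,
    rounded to the nearest unit (assumed rounding rule)."""
    return round(z ** 2 * p * (1 - p) / d ** 2)


# Ratios quoted in Example (2).
indicators = {"durable goods": 0.82, "children attending school": 0.60}
sizes = {name: srs_size_for_proportion(p) for name, p in indicators.items()}

for name, n in sizes.items():
    print(f"{name}: n = {n}")
```

Listing the size required by every candidate indicator in this way makes the trade-off between the key indicator and the secondary ones explicit before a final size is chosen.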
4.1 Sample size estimation for the population ratio with absolute margin error (d)
If $\hat{p}$ is the estimate of the population percentage $p$ based on a previous survey or on a pilot
survey, in accordance with the simple random sampling technique, the sampling distribution of the
estimate is approximately normal, where
$$E(\hat{p}) = p \quad \text{and} \quad var(\hat{p}) = \frac{p(1-p)}{n}$$
The value (d) represents the margin of error, being the difference between the actual population
percentage and the percentage estimated from the sample. It is expressed as follows:
$$d = Z_{1-\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$$ …… (1)
Where $\hat{q} = 1 - \hat{p}$
The value $Z_{1-\alpha/2}$ in relation (1) expresses the number of standard errors for the
difference between the estimated percentage and the population parameter (p). The value of (d)
refers to the accuracy level. Obtaining a small value of (d) usually requires selecting a larger
sample size. If the value Z = 1.96 is chosen, 95% of the possible sample percentages lie within
1.96 standard errors of the population percentage.
The aforementioned leads to the conclusion that, as the sample size increases, the margin of error
decreases, which means that there is a strong relation between the data accuracy and the size of the
related sample. Solving equation (1) above with respect to (n) gives the sample size:
$$n = \frac{(Z_{1-\alpha/2})^2 \, \hat{p}\hat{q}}{d^2}$$ …… (2)
The above equation shows that, for a given value of Z, the sample size increases with the numerator
of equation (2), so that the value of n changes according to the value of $\hat{p}$.
The largest possible sample size is obtained when the percentage value $\hat{p}$ is equal to 0.5. As such, the
4.2 Sample size Estimation for the population Ratio with the relative Standard Error ε
The sample size may be estimated from the error expressed as a percentage of the estimated value
p (relative margin of error) rather than from the absolute error value, meaning the case where the
error value is $d = \varepsilon p$. According to the aforementioned equation (1), the sample size is
then calculated as follows:
$$n = \frac{\hat{q} \, (Z_{1-\alpha/2})^2}{\hat{p} \, \varepsilon^2}$$ …… (3)
For instance, if $\hat{p} = 0.2$ and the required relative error is ε = 0.10, the absolute error value
is d = 0.2 × 0.10 = 0.02. In this example, the estimated sample size shall be:
$$n = \frac{0.8 \, (1.96)^2}{(0.2)(0.10)^2} \approx 1537$$
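A quick numerical check shows that the relative-error form and the absolute-error form agree once d = εp is substituted. The sketch below uses hypothetical function names and leaves the results unrounded so the two can be compared directly.

```python
def size_absolute(p: float, d: float, z: float = 1.96) -> float:
    """Sample size for a ratio with absolute margin of error d."""
    return z ** 2 * p * (1 - p) / d ** 2


def size_relative(p: float, eps: float, z: float = 1.96) -> float:
    """Sample size for a ratio with relative margin of error eps."""
    return z ** 2 * (1 - p) / (p * eps ** 2)


p_hat, eps = 0.2, 0.10
n_relative = size_relative(p_hat, eps)
n_absolute = size_absolute(p_hat, d=eps * p_hat)  # d = eps * p

print(n_relative, n_absolute)
```

Because the relative form divides by p, small ratios lead to very large samples for the same relative accuracy.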
5.1 Sample Size Estimation to the Population Mean with Absolute error in Points (d)
The required sample size to estimate the population mean (μ) for a certain variable shall be calculated
according to the following equation:
$$n = \frac{(Z_{1-\alpha/2})^2 \, \sigma^2}{d^2}$$ …… (4)
Where:
$\sigma^2$ represents the population variance and may be estimated using the $s^2$ value of a sample
selected from a previous similar survey or based on the results of a previous pilot survey.
The value of d (in points) represents the margin of error of the estimate, being the absolute value of
the difference between the actual population mean (μ) and the mean estimated from the sample
data ($\bar{x}$).
Example (3):
The available information shows that the variance of production across several agricultural holdings
is 140 kg². To estimate the mean production of a single agricultural holding such that the
difference between the mean estimated from the sample data and the actual mean does not exceed
1 kg, with a confidence level of 95%, the required sample size (number of holdings) is obtained as
follows:
The absolute margin of error is: d = 1. The value of Z, achieving a confidence level of 95%, is: Z = 1.96,
and the population variance is: $\sigma^2 = 140$.
According to the aforementioned equation (4), the sample size estimation shall be as follows:
$$n = \frac{(140)(1.96)^2}{1^2} \approx 538$$
The size of the sample, being (538), is relatively large, due to the selection of a small absolute error.
In the event a difference not exceeding 2 kg (d = 2) is accepted, the sample size decreases
significantly, as follows:
$$n = \frac{(140)(1.96)^2}{2^2} \approx 134$$
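The effect of the margin of error on the sample size in Example (3) can be sketched as below. The function name is hypothetical, and rounding to the nearest unit is assumed because it matches the figures quoted in the example.

```python
def size_for_mean(variance: float, d: float, z: float = 1.96) -> float:
    """Sample size for a population mean with absolute margin of error d."""
    return z ** 2 * variance / d ** 2


# Example (3): variance 140, 95% confidence, d = 1 kg and d = 2 kg.
n_d1 = round(size_for_mean(140, d=1))
n_d2 = round(size_for_mean(140, d=2))

print(n_d1, n_d2)
```

Doubling the accepted margin of error divides the required sample size by four, since d enters the formula squared.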
5.2 Sample Size Estimation to the Population Mean with relative Margin Error (ε)
When the relative value of the margin of error is adopted instead of the difference in points,
equation (4) shall be replaced with the following:
$$n = \frac{Z^2 \sigma^2}{\varepsilon^2 \mu^2}$$ …… (5)
Where:
ε represents the margin of error percentage; and
μ represents the population parameter (arithmetic mean).
The real values of $\sigma^2$ and μ may not be available. Therefore, they shall be replaced by the values
$s^2$ and $\bar{x}$ respectively, taken from previous surveys or pilot studies, and the size of the sample
shall be expressed as follows:
$$n = \frac{Z^2 s^2}{\varepsilon^2 \bar{x}^2}$$ …… (6)
Example (4):
In the previous example, if the margin of error is set at 5% of the actual value of the mean, which,
according to a previous study, is 80 kg (μ = 80), and the value of the variance is 250, the size of
the sample shall be:
$$n = \frac{(250)(1.96)^2}{(0.05)^2 (80)^2} \approx 60$$
If we consider equation (5), it shall be noticed that the value of $\sigma^2$ divided by $\mu^2$ represents
the square of the Coefficient of Variation, where:
$$C.V = \frac{\sigma}{\mu}$$
This coefficient is usually unknown and may be estimated from the sample data collected from a previous
survey.
In the event it is required to adopt the value of the Coefficient of Variation to estimate the sample size of
the current survey, it shall be expressed using the following equation:
$$n = \frac{(Z_{1-\alpha/2})^2 \, (C.V)^2}{\varepsilon^2}$$ …… (7)
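The equivalence between the relative-error form and the Coefficient of Variation form can be verified with the figures of Example (4). The sketch below is illustrative only and its function names are hypothetical.

```python
import math


def size_relative_mean(variance: float, mean: float, eps: float,
                       z: float = 1.96) -> float:
    """Sample size from variance and mean with relative error eps."""
    return z ** 2 * variance / (eps ** 2 * mean ** 2)


def size_from_cv(cv: float, eps: float, z: float = 1.96) -> float:
    """Same sample size expressed through the coefficient of variation."""
    return z ** 2 * cv ** 2 / eps ** 2


# Example (4): variance 250, mean 80 kg, relative error 5%.
cv = math.sqrt(250) / 80
n_direct = size_relative_mean(250, 80, 0.05)
n_via_cv = size_from_cv(cv, 0.05)

print(round(n_direct), round(n_via_cv))
```

Working with the Coefficient of Variation is convenient when a previous survey reports relative dispersion but not the variance and mean separately.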
The sample size equation is based on the relation between the margin of error (d) and the required
standard error, as follows:
$$d = Z \cdot S(\bar{x})$$
If d refers to the value of the margin of error and Z to the value corresponding to the confidence level,
then, adopting the following equation for the standard error of the stratified sample estimate,
$S^2(\bar{x}_{st}) = \frac{d^2}{Z^2} = B^2$, the overall stratified sample size shall be as follows:
• In equal allocation:
$$n_{eq} = \frac{L \sum_{h=1}^{L} N_h^2 S_h^2}{N^2 B^2 + \sum_{h=1}^{L} N_h S_h^2}$$ …… (8)
• In proportional allocation:
$$n_{prop} = \frac{N \sum_{h=1}^{L} N_h S_h^2}{N^2 B^2 + \sum_{h=1}^{L} N_h S_h^2}$$ …… (9)
• In Neyman allocation:
$$n_{Ney} = \frac{\left(\sum_{h=1}^{L} N_h S_h\right)^2}{N^2 B^2 + \sum_{h=1}^{L} N_h S_h^2}$$ …… (10)
• In optimum allocation:
$$n_{opt} = \frac{\left(\sum_{h=1}^{L} N_h S_h \sqrt{c_h}\right)\left(\sum_{h=1}^{L} \frac{N_h S_h}{\sqrt{c_h}}\right)}{N^2 B^2 + \sum_{h=1}^{L} N_h S_h^2}$$ …… (11)
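The four allocation formulas can be compared numerically. The sketch below is written under stated assumptions: it uses the general form n = Σ(N_h² S_h² / w_h) / (N² B² + Σ N_h S_h²), where w_h is the share of the sample allocated to stratum h, and substitutes the weights of each allocation. The strata sizes, standard deviations, unit costs, and the value of B are hypothetical figures chosen only for illustration.

```python
import math


def allocation_weights(N_h, S_h, c_h, method):
    """Share of the sample given to each stratum under one allocation rule."""
    if method == "equal":
        raw = [1.0] * len(N_h)
    elif method == "proportional":
        raw = [float(n) for n in N_h]
    elif method == "neyman":
        raw = [n * s for n, s in zip(N_h, S_h)]
    elif method == "optimum":
        raw = [n * s / math.sqrt(c) for n, s, c in zip(N_h, S_h, c_h)]
    else:
        raise ValueError(f"unknown allocation method: {method}")
    total = sum(raw)
    return [r / total for r in raw]


def stratified_size(N_h, S_h, w_h, B):
    """Overall sample size for target standard error B (assumed general form)."""
    N = sum(N_h)
    numerator = sum((n * s) ** 2 / w for n, s, w in zip(N_h, S_h, w_h))
    denominator = N ** 2 * B ** 2 + sum(n * s ** 2 for n, s in zip(N_h, S_h))
    return numerator / denominator


# Hypothetical strata: sizes, standard deviations, unit costs.
N_h = [4000, 2700, 1600]
S_h = [10.0, 15.0, 20.0]
c_h = [36.0, 81.0, 100.0]
B = 1.0 / 1.96  # B = d / Z with d = 1 and Z = 1.96

sizes = {
    m: stratified_size(N_h, S_h, allocation_weights(N_h, S_h, c_h, m), B)
    for m in ("equal", "proportional", "neyman", "optimum")
}
for method, n in sizes.items():
    print(f"{method:>12}: n = {math.ceil(n)}")
```

For a fixed target precision, Neyman allocation never requires more units than equal or proportional allocation, which the output illustrates for these hypothetical strata.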
6.1 Alignment of the stratified sample size in accordance with the survey costs
When the overall budget for the survey is the only determinant of the sample size, with the total cost
given by the amount C and an equal cost per sampling unit in all strata ($c_h$), the sample size shall be
calculated according to the following equation:
$$n = \frac{C}{c_h}$$ …… (12)
Example (5):
Suppose that the total available costs to conduct a survey is given by AED 160,000, whereas the cost of
data collection from each sampling unit is AED 100, the overall sample size shall be:
$$n = \frac{C}{c_h} = \frac{160000}{100} = 1600$$
$$n = C \cdot \frac{\sum_{h=1}^{L} N_h / \sqrt{c_h}}{\sum_{h=1}^{L} N_h \sqrt{c_h}}$$ …… (13)
Example (6):
Suppose that the total cost has been determined with a value of AED 160,000 with the following values
attributed to the size and cost:
Stratum No.   N_h    c_h    √c_h   N_h·√c_h   N_h/√c_h
1             4000   36     6      24000      667
2             2700   81     9      24300      300
3             1600   100    10     16000      160
Total         8300                 64300      1127
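The figures in the table above can be fed into equation (13) to obtain the overall sample size. The sketch below is only an illustration: the function name is hypothetical, and the per-stratum split proportional to N_h/√c_h is an assumption used to check that the resulting sample spends exactly the available budget.

```python
import math


def cost_constrained_size(total_cost, sizes, unit_costs):
    """Overall sample size from equation (13): C * sum(N/sqrt(c)) / sum(N*sqrt(c))."""
    numerator = sum(n / math.sqrt(c) for n, c in zip(sizes, unit_costs))
    denominator = sum(n * math.sqrt(c) for n, c in zip(sizes, unit_costs))
    return total_cost * numerator / denominator


# Strata sizes and unit costs from Example (6).
N_h = [4000, 2700, 1600]
c_h = [36, 81, 100]
n = cost_constrained_size(160_000, N_h, c_h)

# Assumed split: each stratum receives a share proportional to N_h / sqrt(c_h).
shares = [size / math.sqrt(cost) for size, cost in zip(N_h, c_h)]
n_h = [n * share / sum(shares) for share in shares]
spent = sum(cost * units for cost, units in zip(c_h, n_h))

print(round(n), [round(units) for units in n_h], round(spent))
```

In practice the stratum sizes would be rounded to whole units, so the amount actually spent would differ slightly from the budget.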
• When the total cost of the survey has been previously determined, and the strata differ in both
size and variance, the sample size estimation shall be carried out according to the following
equation:
$$n = C \cdot \frac{\sum_{h=1}^{L} N_h S_h / \sqrt{c_h}}{\sum_{h=1}^{L} N_h S_h \sqrt{c_h}}$$ …… (14)
• When the costs and variance are previously determined in each stratum with a given accuracy
level, the sample size shall be estimated as follows:
$$n = \frac{\left(\sum_{h=1}^{L} N_h S_h \sqrt{c_h}\right)\left(\sum_{h=1}^{L} \frac{N_h S_h}{\sqrt{c_h}}\right)}{N^2 B^2 + \sum_{h=1}^{L} N_h S_h^2}$$ …… (15)
Where B represents a certain level of accuracy expressed as $B = \frac{d}{Z}$, the value of (d)
represents the previously determined margin of error, and the value of Z is related to the confidence
level.
When the cluster sampling technique is to be applied to a survey and the required sample size is to be
determined for a known confidence level and margin of error, the sample size is first estimated
under the normal probability distribution assumption, as described above.
However, the sample size so determined is for a simple random sample rather than a cluster sample. It is
well known that the variance under cluster sampling is higher than that under simple random
sampling, which means that the size of the cluster sample shall be increased in order to bring its
variance as close as possible to the simple random sampling variance.
Based on the above, a relative coefficient shall be used to show the expected effect of using a
cluster sampling design rather than the simple random sampling technique. The
relevant coefficient is known as the “Design Effect”.
Design Effect: refers to the ratio of the variance under the cluster sample design to the variance under
the simple random sample design. It constitutes a measure of relative efficiency and shall be
mathematically expressed using the following equation:
$$deff(\hat{\theta}) = \frac{Var_{cluster}(\hat{\theta})}{Var_{SRS}(\hat{\theta})}$$ …… (16)
The above has helped conclude that the reason behind variance inflation in the cluster sampling
technique is related to the cluster design. Therefore, the Design Effect may be defined as the inflation
value of the estimated variance due to the adoption of the cluster design rather than the simple random
sample.
On the other hand, the number of sampling units to be selected from a single enumeration area (cluster
size) is directly related to the value of the design effect according to the following equation:
$$deff = 1 + \delta_x (\bar{n} - 1)$$
Where:
$deff$ refers to the design effect;
$\delta_x$ = the value of intraclass correlation between population units in the Primary Sampling Unit (PSU);
$\bar{n}$ = the mean number of secondary sampling units selected from each enumeration area.
It may be noticed from the aforementioned equation that the design effect value is directly proportional
to cluster size as well as to the intraclass correlation value between the population units in the primary
sampling unit.
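The relation between the design effect, the cluster size, and the intraclass correlation can be illustrated numerically. The intraclass correlation values and cluster sizes below are hypothetical, as is the function name.

```python
def design_effect(intraclass_corr: float, mean_cluster_take: float) -> float:
    """deff = 1 + delta * (n_bar - 1)."""
    return 1.0 + intraclass_corr * (mean_cluster_take - 1.0)


# Hypothetical intraclass correlations and units taken per enumeration area.
for delta in (0.02, 0.05, 0.10):
    for n_bar in (5, 10, 20):
        print(f"delta = {delta:.2f}, n_bar = {n_bar:2d}, "
              f"deff = {design_effect(delta, n_bar):.2f}")
```

With no intraclass correlation, or with a single unit per cluster, the design effect equals 1 and the cluster design is as efficient as a simple random sample.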
Example (7):
Suppose the average household income in a certain population is to be studied, and that similar
previous surveys conducted on the same population selected two random samples: a cluster sample
and a simple random sample. The variance ratio between the two designs (Design Effect) is given by
Deff = 1.4. The sample size required by the current survey to estimate the average household
income with a confidence level of 95% and a margin of error of d = 1.5, according to the simple
random sample, shall be as follows:
$$n_{SRS} = \frac{z^2 S_{SRS}^2}{d^2} = \frac{(1.96)^2 (370)}{(1.5)^2} \approx 632$$
This means that the required simple random sample size at a confidence level of 95% is 632
households. If a cluster sample design is used, the variance will be significantly higher than that of
the simple random sample. Accordingly, the sample size shall be modified to cover the additional
variance and keep the estimate within the bounds of error. In this case, the sample size shall be
modified using the design effect as follows:
$$n_{cluster} = deff \times n_{SRS}$$
Since the value of the design effect is equal to 1.4, the size of the cluster sample shall be 632 multiplied
by the design effect value, being 1.4, resulting in 885 secondary sampling units.
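The two steps of Example (7) can be sketched in code. The value 370 is read here as the simple random sample variance, which is an interpretation of the example's figures; the function names and the rounding to the nearest unit are likewise assumptions.

```python
def srs_size(variance: float, d: float, z: float = 1.96) -> int:
    """Simple random sample size for a mean, rounded to the nearest unit."""
    return round(z ** 2 * variance / d ** 2)


def cluster_size(n_srs: int, deff: float) -> int:
    """Inflate the simple random sample size by the design effect."""
    return round(deff * n_srs)


# Example (7): variance 370 (assumed reading), d = 1.5, Deff = 1.4.
n_srs = srs_size(370, d=1.5)
n_cluster = cluster_size(n_srs, deff=1.4)

print(n_srs, n_cluster)
```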
Chapter 4: Sampling Technique in Statistics Centre – Abu Dhabi
Within its mandate, Statistics Centre – Abu Dhabi (SCAD) conducts all kinds of statistical surveys,
including economic, agricultural, environmental, and household surveys, to provide the Emirate of Abu
Dhabi with all required official statistics. Furthermore, SCAD designs and selects samples
that may be required by the various institutions and departments of the Abu Dhabi government.
The Statistics Centre – Abu Dhabi has different types of sampling frames and follows up on a periodic
and continuous basis on the related updating operation. These frames are used to design and select
statistical samples able to represent the statistical population with utmost efficiency and accuracy. Based
on the concerning frames, the required sample size estimation procedures shall be carried out and
statistical samples of all types may be selected.
Both the establishments frame and the households and housing units frame are heavily relied on in the
design and selection of household and economic survey samples in the Emirate of
Abu Dhabi.
Content: the establishment frame includes a list of all operating establishments practicing one or several
economic activities in the Emirate of Abu Dhabi according to their legal entity. An establishment may be
the headquarter, a single establishment, or a branch that keeps the accounts of an establishment whose
headquarters is based in the Emirate of Abu Dhabi, or the branch of the establishment that does not keep
its accounts and its headquarters is located outside the Emirate of Abu Dhabi.
Stratification: establishments within this frame are divided into strata according to the economic activity
ISIC-4 and the size of the establishment according to the number of employees according to the
classification of economic establishments, as follows:
Other Sectors (trade, services, etc.):
Micro-establishments (1 to 5 employees), small establishments (6 to 50 employees), medium-sized
establishments (51 to 200 employees), and large establishments (200 employees and above).
On the other hand, the variables included in the sampling frame are categorized as follows:
Type of variables            Variables
Address Variables            Region, area, Plot No., location of the establishment in the building,
                             establishment address, etc.
Characteristics Variables    Name of the establishment, license No., name of the owner, name of the
                             Director General, telephone number, etc.
Analytical Variables         Establishment status characteristics, legal entity, economic activity,
                             establishment description, paid-up capital, revenues, etc.
Data Sources: The sampling frame of economic establishments was constructed based on a statistical
register for economic establishments in the Emirate of Abu Dhabi.
Updating the economic establishment's frame: To keep pace with the coverage and inclusion process
of all economic establishments being established or closing or modifying their economic activity, ongoing
updating processes shall be carried out based on the results of the annual economic surveys carried out
by the Centre. An annual update is carried out about the status of the establishments to record whether
the same have ceased their economic activity or modified the latter. It may be even carried out to record
the change in the number of employees therein in addition to updating the definitional and geographical
variables related thereto.
In addition to the previous updating processes, the Centre is implementing the updating project of the
economic establishments based on the statistical registers made available by the Abu Dhabi government
institutions to update the lists of the frame.
Content: This frame includes a list of all the occupied housing units existing in the Emirate of Abu Dhabi.
Data Sources: The frame is built based on the available records of buildings, housing, and households,
whereby all housing units comprise households.
Geographical Structure of the Frame: The geographical structure of the frame is in line with the
administrative divisions approved by the Abu Dhabi Government, in addition to detailed statistical
divisions used for sampling purposes.
As for the approved administrative division levels, they include the Region, the District, and the
community. Whereas the statistical division levels include Enumeration Areas.
Definition: An Enumeration Area refers to a geographical area with natural or man-made borders. It
includes buildings, housing units, and households, whereby the average number of households ranges
between 100 and 200 households, with some exceptions for the areas extending over large spaces with
a small population density.
Based on the above, the housing units and households frame is developed and organized in two
stages, ensuring consistency with the statistical sampling designs that may subsequently be
applied.
First Stage: Building Enumeration Areas based on geographical maps and data related to buildings,
housing units, and households. The Emirate of Abu Dhabi has been divided into independent and
non-overlapping Enumeration Areas.
Accordingly, a single enumeration area may form a single or part of a community. Sometimes, a group of
sectors may be combined to form a single enumeration area, depending on the size and allocation of
households and housing units in the area. It shall also be noted that these areas shall be adopted as
Primary Sampling Units after the completion of the frame development process.
After the construction of the enumeration areas, they were arranged according to the aforementioned
administrative divisions. Within each region, sectors were arranged spirally, from north to south, in
order to ensure the geographical spread of the sample.
[Figure: spiral arrangement of sectors within a region, showing Sectors 55, 70, 34, 63, 62, and 46]
The arrangement of enumeration areas within each community was also spiral, as this ensures the
greatest spread within the sector, mainly in sectors that include a large number of enumeration areas.
In addition to the list including all housing units and households within a single enumeration area,
geographical maps exist to refer to the detailed locations of the buildings and housing units within a single
enumeration area.
Second Stage: Preparing occupied housing units lists by households. The housing units and households
frame within a single enumeration area includes a list of detailed geographical names and addresses
through which the housing units may be accessed as well as the households residing therein. In addition,
the list includes the names of the heads of the families occupying the housing units as well as the type of
household according to the nationality of the head of household (citizen household, non-citizen
household, collective household). Accordingly, households are divided into two main types:
The Private Household: The private household consists of one or several members living in a housing
unit and sharing their food. Among the members of the private households, only one is known as the
head of the household, who shall be in charge of the living arrangements of the members. Usually, the
household members’ expenditure comes from the income of the head of the household. In the event the
head of the household is a citizen, the household shall be referred to as a private citizen household. In a
reverse condition where the head of the household is not a citizen, the household shall be referred to as
a private non-citizen household.
The Collective Household: The collective household consists of a group of more than one member living
together in a single housing unit, whereby no kinship relates any member to the other. The collective
household does not have a head and its members do not share food or cooperatively spend their money.
In this context, collective households shall be distinguished from labour camps, whereby the latter
consists of large housing units managed by a specific institution or establishment providing job
opportunities for the members residing in the camp. On the other hand, the members residing in the
housing unit of the collective household are in charge of living arrangements as well as managing the
housing they live in.
The First Technique: it depends on the use of administrative records data, providing detailed information
on the development of housing units and households residing therein. Water and electricity records, as
well as other service records, provide data that may be used in the update of the new emerging
enumeration areas after the establishment of the frame. New cities and housing units might have
emerged after the census or in the event some housing units became occupied after being previously
under construction. These new units may be added to the frame based on the administrative records data
in order to increase the coverage levels.
The Second Technique: The second technique is carried out through a comprehensive update
of the enumeration areas, in which the information on buildings, housing units, and the households
residing therein is updated. Such an update is usually not carried out for all enumeration areas, due
to the time and costs that it would incur. Some countries tend to conduct partial update operations
in the areas expected to have been subject to significant changes in terms of the establishment of
new buildings, housing units, and households.
On the other hand, another technique is used for the partial update of the enumeration areas located
within the Master Sample, which is a relatively large sample selected to be used in several surveys. The
Master Sample is divided into groups called “Replicates”, where each replicate is selected independently
to represent the population as a whole. When conducting any survey, one or more replicates shall be
chosen according to the selected survey sample size and the survey is carried out accordingly.
The Statistics Centre – Abu Dhabi updates the housing and buildings' current frame in accordance with
the second technique by conducting partial annual field update processes for a specific percentage of
enumeration areas in the Emirate. The update process also includes the addition of new emerging areas
that were not existent at the time of preparation of the frame.
Annual Economic Surveys aim to provide economic indicators at the level of economic activities as
well as by size of establishment within a single activity (large, medium, small, and micro
establishments). The optimum sampling technique to be adopted in the design of the sample is the
stratified random sampling technique, as the establishments are divided into independent strata:
• According to the economic activities, which may be at two or sometimes four ISIC digits.
• According to the size of the economic establishment which shall be comprised of 4 groups: large,
medium, small, and micro establishments.
According to the foregoing, each stratum shall be considered an independent statistical population.
Establishments belonging to a single stratum shall be determined according to the economic activity
they carry out as well as the category of the number of employees (micro, small, medium or large
establishments).
trade, services, and other sectors, whereby the margin of error does not exceed 15% for the main
variables.
The frame of the household surveys has been developed and organized in line with the purposes of the
concerned surveys as it ensures an efficient representation of the results. Being one of the basic
procedures adopted when developing the frame, the household population in the Emirate of Abu Dhabi
has been divided into four independent strata to ensure the minimum variance between the population
units, which ensures reducing the sample size as much as possible while maintaining a high level of
accuracy. The strata were divided as follows:
Stratum (1): it includes all the enumeration areas comprising citizen households representing less than
25% of the total households in each area.
Stratum (2): it includes all enumeration areas comprising citizen households representing 25% to 50% of
the total households in each area.
Stratum (3): it includes all enumeration areas comprising citizen households representing 50% to 75% of
the total households in each area.
Stratum (4): it includes all enumeration areas comprising citizen households representing 75% to 100%
of the total households in each area.
Based on the foregoing, the enumeration areas were arranged within the frame according to the following
sequence: Region, District, Community, Stratum. Within each region (Abu Dhabi, Al Ain, Al Dhafra),
the enumeration areas are classified into 4 strata according to the aforementioned classification.
Accordingly, 12 implied strata were formed.
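The classification of enumeration areas into the 12 implied strata can be sketched as follows. The treatment of the boundary values is an assumption: the text does not state whether an area with exactly 25%, 50%, or 75% citizen households belongs to the lower or the upper stratum, and this sketch places it in the upper one. The function names are hypothetical.

```python
REGIONS = ("Abu Dhabi", "Al Ain", "Al Dhafra")


def citizen_share_stratum(citizen_households: int, total_households: int) -> int:
    """Stratum 1 to 4 from the share of citizen households in an area.
    Boundary values go to the upper stratum (assumption)."""
    if total_households <= 0:
        raise ValueError("an enumeration area needs at least one household")
    share = 100.0 * citizen_households / total_households
    if share < 25:
        return 1
    if share < 50:
        return 2
    if share < 75:
        return 3
    return 4


def implied_stratum(region: str, citizen_households: int, total_households: int):
    """Implied stratum as a (region, stratum) pair."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region}")
    return region, citizen_share_stratum(citizen_households, total_households)


# All implied strata: 3 regions x 4 citizen-share strata.
implied_strata = [(region, s) for region in REGIONS for s in range(1, 5)]
print(len(implied_strata), implied_stratum("Al Ain", 40, 150))
```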
The sampling technique adopted in this case is the Stratified Two-Stage Cluster Sample Design. Within
the first stage, a sample shall be selected from each stratum from the enumeration area, whereby these
areas are known as the Primary Sampling Units. During the second stage, a sample of housing units
occupied by households shall be selected from each Primary Sampling Unit already selected in the first
stage.
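The two selection stages can be illustrated with a minimal sketch. It assumes simple random selection without replacement at both stages, which the text does not specify, and it runs on a small synthetic frame; the identifiers, the frame layout, and the sample counts are all hypothetical.

```python
import random


def two_stage_sample(frame, psus_per_stratum, units_per_psu, seed=0):
    """Select enumeration areas per stratum, then housing units per area.
    Both stages use simple random sampling without replacement (assumption)."""
    rng = random.Random(seed)
    selection = {}
    for stratum, areas in frame.items():
        # First stage: Primary Sampling Units (enumeration areas).
        k = min(psus_per_stratum, len(areas))
        for area_id in rng.sample(sorted(areas), k):
            # Second stage: occupied housing units within the selected area.
            units = areas[area_id]
            take = min(units_per_psu, len(units))
            selection[(stratum, area_id)] = rng.sample(units, take)
    return selection


# Synthetic frame: 4 strata, 10 enumeration areas each, 120 housing units per area.
frame = {
    s: {f"EA-{s}-{i}": [f"HU-{s}-{i}-{j}" for j in range(120)] for i in range(10)}
    for s in (1, 2, 3, 4)
}
sample = two_stage_sample(frame, psus_per_stratum=3, units_per_psu=15, seed=42)
print(len(sample), sum(len(units) for units in sample.values()))
```

Fixing the seed makes the selection reproducible, which helps when the same sample has to be documented or re-drawn.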
References: