Sampling method and sample size determination
Samrawit .F (MSC)
Objective
• At the end of this session you are expected to:
  - Identify probability sampling methods
  - Identify non-probability sampling methods
  - Distinguish errors in sampling
  - Understand sample size determination
Terminologies
• Sampling universe: the population from which we are sampling.
• Sampling frame: the list of all units in the reference population, from which the sample is to be picked.
• Sampling unit: the unit selected during the process of sampling.
• Sampling: the process of drawing a subset of people from a population so that results from that subset may be generalized to the population.
Sampling technique
• When you conduct research about a group of people, it’s
rarely possible to collect data from every person in that
group. Instead, you select a sample.
• The sample is the group of individuals who will actually
participate in the study.
• To draw valid conclusions from your results, you have to
carefully decide how you will select a sample that is
representative of the group as a whole. This is called
a sampling method.
Why sample?
• The population of interest is usually too large to
attempt to survey all of its members.
• A carefully chosen sample can be used to represent the
population.
• The sample reflects the characteristics of the
population from which it is drawn.
Basic conditions in sampling process
• Sample must be well chosen: representativeness
• Sample must be sufficiently large: minimizes sampling variation
• There must be adequate coverage of the sample: information should be obtained from almost all of the selected units
• Two keys
  1. Selecting the right people
     - They have to be selected scientifically so that they are representative of the population
  2. Selecting the right number of the right people
     - To minimize sampling error, i.e., choosing the wrong people by chance
Basic terms
• A population is a group of individuals, persons, objects, or items from which samples are taken for measurement.
• Reference population (or target population): the population of interest to which the researchers would like to generalize.
• Source population: the subset of the target population from which samples are drawn.
• Study population: the group that is studied, either in total or by selecting a sample of its members.
• Study unit: the unit on which information will be collected, e.g., persons, families, housing units, health facilities, schools.
Hierarchy of sampling
• Study subjects: the actual participants in the study
• Sample: subjects who are selected
• Sampling frame: the list of potential subjects from which the sample is drawn
• Source population: the population from whom the study subjects would be obtained
• Target population: the population to whom the results would be applied
Advantages of sampling
• REDUCED COST: lower demands on resources.
• GREATER ACCURACY: can lead to more accurate data collection.
• GREATER SPEED: data are collected and summarized quickly.
• FEASIBILITY: may be the only feasible method of collecting data.
Drawbacks of sampling
• There is always a sampling error.
• Sampling may create a feeling of discrimination within the population.
• Sampling may be inadvisable where every unit in the population is legally required to have a record.
Characteristics of Good Samples
1. Representation
• Sample surveys are almost never conducted for the purposes
of describing the particular sample under study. Rather they
are conducted for purposes of understanding the larger
population from which the sample was initially selected
• A great deal of work has been done over the years in
developing sampling methods that provide representative
samples for the general population.
• 3 factors that influence sample representativeness:
  - Sampling procedure
  - Sample size
  - Participation (response)
• When might you sample the entire population?
  - When your population is very small
  - When you have extensive resources
  - When you don't expect a very high response
2. Accessible
3. Low cost
Types of sampling method
• There are two primary types of sampling methods.
1. Probability sampling is a sampling method that involves
randomly selecting a sample, or a part of the population.
• It is also sometimes called random sampling.
• A sample is obtained in a way that ensures every member of the population has a known, non-zero probability of being included in the sample.
2. Non-probability sampling is a sampling method where
every item has an unknown chance of being selected
• It uses non-random criteria like the availability, geographical
proximity, or expert knowledge of the individuals you want
to research in order to answer a research question.
• Used when a sampling frame does not exist
• No random selection, so the sample may be unrepresentative
• Assumes that there is an even distribution of the characteristic of interest within the population
Types of sampling
• Probability (random) samples
  - Simple random sample
  - Systematic random sample
  - Stratified random sample
  - Multistage sample
  - Multiphase sample
  - Cluster sample
• Non-probability samples
  - Convenience sample
  - Purposive sample
  - Quota sample
  - Snowball sample
1. Simple random sampling
• Involves random selection
• The most basic form of probability sampling.
• To use the SRS method:
  - Make a numbered list of all the units in the population (sampling frame)
  - Number each unit from 1 to N (where N is the size of the population)
  - Decide on the size of the sample
  - Select the required number of units at random.
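The SRS steps above can be sketched in Python as follows. This is a minimal illustration and not part of the original slides: the function name `simple_random_sample`, the values N = 256 and n = 25, and the seed are assumptions chosen for the example.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw `sample_size` distinct unit numbers from a frame numbered 1..N."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    frame = range(1, population_size + 1)  # sampling frame: units numbered 1 to N
    # sample() draws without replacement, so no unit is selected twice
    return sorted(rng.sample(frame, sample_size))


# Illustrative values (assumed): N = 256, n = 25
selected = simple_random_sample(population_size=256, sample_size=25, seed=1)
print(selected)
```

Each unit in the frame has the same chance of selection, which is what distinguishes this from picking units by convenience.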
• Methods of drawing a simple random sample:
1. Fish bowl draw (if the total population is small)
2. A table of random numbers
3. Computer programs
Random number table
• It is a table of random numbers constructed by a process such that:
1. In any position in the table, each of the digits 0 through 9 has a probability of 1/10 of occurring.
2. The occurrence of any digit in one part of the table is independent of the occurrence of any digit in any other part of the table.
Figure: Simple random sampling. A sample of 25 is selected from a sampling population of 256 individuals. Source: Kumar (1996)
Limitations of SRS
• Minority subgroups of interest in the population may not be present in the sample in sufficient numbers for study
• It is costly to conduct
2. Systematic random sampling
• Used when the researcher wants to select a sample of fixed size. It is first necessary to know the size of the whole population from which the sample is being selected.
• The appropriate sampling interval, k, is then calculated by dividing the population size, N, by the required sample size, n:
  - k = population size / sample size
  - k = N/n
• It is important that the starting point is not automatically the first unit in the list, but is instead randomly chosen from among the first k units in the list. Every kth unit after the starting point is then selected.
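A minimal Python sketch of systematic sampling with a random start is shown here. It assumes the sampling interval is k = N/n rounded down to a whole number; the function name `systematic_sample` and the values N = 1000 and n = 50 are hypothetical.

```python
import random


def systematic_sample(population_size, sample_size, seed=None):
    """Select every k-th unit from a frame numbered 1..N after a random start."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    k = population_size // sample_size  # sampling interval k = N/n, rounded down (assumption)
    rng = random.Random(seed)
    start = rng.randint(1, k)  # random starting point among the first k units
    # Take the start, then every k-th unit, until n units are selected
    return [start + i * k for i in range(sample_size)]


# Illustrative values (assumed): N = 1000, n = 50, so k = 20
units = systematic_sample(population_size=1000, sample_size=50, seed=7)
print(units[:5])
```

Because the start lies between 1 and k, the last selected unit never exceeds N.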
3. Stratified sampling
• It is done when the population is known to be heterogeneous with regard to some factors, and those factors are used for stratification.
• Using stratified sampling, the population is divided into homogeneous, mutually exclusive groups called strata (e.g., by age, sex, province of residence, income, etc.).
• A separate sample is taken independently from each stratum.
• Divided into 2 types:
1) Proportionate STRS
2) Disproportionate STRS
• In the case of proportionate STRS:
  - Determine the proportion of each stratum in the study population: $p = \frac{\text{number of elements in the stratum}}{\text{total population size}}$
  - Determine the number of elements to be selected from each stratum = n × p, where n is the total sample size
  - Select the required number of elements from each stratum with the SRS technique.
• In the case of disproportionate STRS (here, equal allocation):
  - Allocate an equal sample size to each stratum
  - Determine the number of elements to be selected from each stratum = $\frac{\text{sample size } (n)}{\text{number of strata } (k)}$
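The two allocation rules above (n × p for proportionate STRS, n / k for equal allocation) can be sketched as follows. The stratum names and sizes are hypothetical, and the rounding choices are assumptions; the slides do not say how fractional allocations should be handled.

```python
def proportionate_allocation(stratum_sizes, sample_size):
    """Allocate the sample to strata in proportion to stratum size (n x p)."""
    total = sum(stratum_sizes.values())
    # Rounding to whole units can make the total differ slightly from sample_size
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}


def equal_allocation(stratum_sizes, sample_size):
    """Allocate the same number of units to every stratum (n / k)."""
    per_stratum = sample_size // len(stratum_sizes)  # rounded down (assumption)
    return {name: per_stratum for name in stratum_sizes}


# Hypothetical strata and sizes, for illustration only
strata = {"urban": 600, "rural": 300, "pastoral": 100}
print(proportionate_allocation(strata, 90))  # {'urban': 54, 'rural': 27, 'pastoral': 9}
print(equal_allocation(strata, 90))          # {'urban': 30, 'rural': 30, 'pastoral': 30}
```

With equal allocation the smallest stratum receives 30 units instead of 9, which is how stratification protects minority groups. Within each stratum the units themselves would still be drawn by simple random sampling.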
• The advantage of stratified random sampling is that it increases the likelihood of representation, especially if the sample size is small.
• It is used when we want to ensure that minority populations (small in number) are adequately represented in the sample.
4. Cluster sampling
• It is the selection of groups of study units (clusters) instead of the selection of study units individually.
• The sampling unit is a cluster, and the sampling frame is a list of these clusters.
• Ideally, each cluster is internally heterogeneous (a small-scale picture of the population) and the clusters are similar to one another, unlike stratified sampling, where each stratum is internally homogeneous and the strata differ from one another.
Cluster samples are generally used if:
• No list of the population exists.
• Well-defined clusters, which will often be geographic areas, exist.
• A reasonable estimate of the number of elements in each level of clustering can be made.
• Often the total sample size must be fairly large to enable cluster sampling to be used effectively.
Steps in cluster sampling
1. Divide the population into groups or clusters.
2. Select a number of clusters at random to represent the total population, and include all units within the selected clusters in the sample.
3. Do not include any units from non-selected clusters.
• This differs from stratified sampling, where some units are selected from each group.
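The steps above can be sketched in Python as follows. The village names, household numbers, and the function name `cluster_sample` are hypothetical and serve only to illustrate one-stage cluster sampling.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random and include every unit inside them.

    `clusters` maps a cluster name to the list of units it contains.
    """
    rng = random.Random(seed)
    # The sampling frame is the list of clusters, not a list of individuals
    chosen = rng.sample(sorted(clusters), n_clusters)
    # All units in the chosen clusters enter the sample; no others do
    sample = [unit for name in chosen for unit in clusters[name]]
    return chosen, sample


# Hypothetical villages (clusters) and household numbers (units)
villages = {"A": [1, 2, 3], "B": [4, 5], "C": [6, 7, 8, 9], "D": [10]}
chosen, sample = cluster_sample(villages, n_clusters=2, seed=3)
print(chosen, sample)
```

Note that the final sample size depends on which clusters are drawn, since clusters may differ in size.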
5. Multi-stage sampling
• Similar to the cluster sampling, except that it involves picking
a sample from within each chosen cluster, rather than
including all units in the cluster.
• This method is appropriate when the reference population is
large and widely scattered
• This type of sampling requires at least two stages.
• The sampling unit at the first stage is the primary sampling unit (PSU), the sampling unit at the second stage is the secondary sampling unit (SSU), and so on.
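A two-stage version can be sketched as below: clusters (PSUs) are drawn first, then a simple random sample of units (SSUs) is drawn inside each selected cluster. The district names, unit numbers, and the function name `two_stage_sample` are hypothetical; using SRS at both stages is an assumption, since other selection schemes are possible.

```python
import random


def two_stage_sample(clusters, n_psu, n_ssu, seed=None):
    """Stage 1: sample clusters (PSUs). Stage 2: sample units (SSUs) within each."""
    rng = random.Random(seed)
    # Stage 1: select primary sampling units at random
    psus = rng.sample(sorted(clusters), n_psu)
    # Stage 2: simple random sample within each selected PSU
    # (take the whole cluster if it has fewer than n_ssu units)
    return {name: rng.sample(clusters[name], min(n_ssu, len(clusters[name])))
            for name in psus}


# Hypothetical districts (PSUs) containing numbered households (SSUs)
districts = {
    "D1": list(range(1, 21)),
    "D2": list(range(21, 36)),
    "D3": list(range(36, 61)),
    "D4": list(range(61, 71)),
}
stage_sample = two_stage_sample(districts, n_psu=2, n_ssu=5, seed=11)
print(stage_sample)
```

Only the selected clusters need a list of their units, which is why no complete population list is required.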
Advantages
• No need to have a list of all units in the population.
• Saves a great amount of time and effort
Disadvantages
• More information is needed in this type of sampling, which
may not be available
• Sampling error is compounded across the stages
• Provides less precise estimates
2. Non-probability sampling
1. Convenience sampling
• Sometimes known as grab, opportunity, accidental, or haphazard sampling.
• For convenience, the study units that are available at the time of data collection are selected.
• Used in many clinic-based studies
2. Quota sampling
• Sampling is done until a specific number of units (quotas) for different categories of the population have been selected.
• Similar to stratified sampling, but does not involve random selection
• It is based on the researcher's judgment
3. Purposive sampling
• Often used in qualitative studies (such as those conducting focus group discussions and in-depth interviews)
• People who are assumed to provide rich information are selected; the investigator assumes that they are typical of the study population
• E.g., in-depth interviews with the Trauma Clinic coordinator to understand the issues of drug and supply shortages in the hospital
4. Snowball sampling
• Also called chain referral sampling
• People who are enrolled in the study are asked to name others who fulfill the selection criteria, using their networks and contacts; that is, one case identifies others of the same kind
• Useful for identifying hard-to-find individuals or marginalized populations
• E.g., the researcher wants to study HIV prevalence in People Who Inject Drugs (PWID), who are mostly a hidden population.
Errors in sampling
1. Sampling error (random error)
• The uncertainty associated with an estimate that is based on data gathered from a sample of the population rather than the full population is known as sampling error.
• It is an error arising from the sampling process itself.
• Sampling error can be reduced by increasing the size of the sample.
• It cannot be avoided or totally eliminated.
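To illustrate how sampling error shrinks as the sample grows, the sketch below uses the standard error of a sample proportion under simple random sampling, sqrt(p(1 − p)/n). This formula is not given in the slides and is brought in here as a standard result; p = 0.5 and the sample sizes are assumed values.

```python
import math


def standard_error_of_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


# Assumed values: p = 0.5, with the sample size quadrupled at each step
for n in (100, 400, 1600):
    print(n, round(standard_error_of_proportion(0.5, n), 4))
```

Quadrupling the sample size halves the standard error, so the error gets smaller but never reaches zero for any sample short of the full population.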
2. Non-sampling error (bias)
It is a type of systematic error in the design or
conduct of a sampling procedure which results in
distortion of the sample, so that it is no longer
representative of the reference population.
We can eliminate or reduce the non-sampling error
(bias) by careful design of the sampling procedure.
It can occur whether the total study population or a
sample is being used.
Sample size determination
• Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample.
• The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample.
Sample Size Determination
Determining the sample size is a crucial component of study design: the study must include sufficient numbers of subjects so that statistically significant results can be detected.
"How large a sample do I need?"
The answer will depend on the aims, nature, and scope of the study and on the expected result, all of which should be carefully considered at the planning stage.
Basic things in sample size determination
• The more heterogeneous a population is, the larger the sample needs to be.
• In general, the larger the sample size (selected with the use of probability techniques), the better.
• With non-probability samples, results are not generalizable regardless of sample size, but the stability of the results should still be considered.
Sample size (n)
• If the sample size ("n") is large:
  - Accuracy increases
  - The study is costly and complex
• If the sample size is small:
  - Accuracy decreases
  - The study is less costly
• Take the optimum sample size. How?
Factors to determine sample size
• Size of population
• Resources: subjects, finances, manpower
• Method of sampling: random, stratified
• Degree of difference to be detected
• Degree of accuracy (or errors)
  - Type I error (alpha): p < 0.05
  - Type II error (beta): less than 0.2 (20%)
  - Power of the test: more than 0.8 (80%)
• Statistical formulae
• Dropout rate, non-compliance to treatment
• Sample size determination depends on the outcome variable. There are three possible categories of outcome variables.
• The first is where the variable of interest has only two alternative responses: yes/no, dead/alive, vaccinated/not vaccinated, and so on.
• The second category covers outcome variables with multiple, mutually exclusive alternative responses, such as marital status, religion, blood group, and so on.
• For these two categories of outcome variables, the data are generally expressed as percentages or rates.
• So we can use percentages to compute the sample size.
• The third category covers continuous response variables such as birth weight, age at first marriage, blood pressure, and serum uric acid level, for which numerical measurements are usually made.
• In this case the data are summarized in the form of means and standard deviations or their derivatives.
Sample size
There are several approaches to determining the sample size.
Depending on the type of response variable, whether it is categorical or continuous, we will have two sets of formulas.
The sample size determination formulas are derived from the formulas for the maximum error of the estimate by solving for n.
Sample size for single population mean
This is the situation in which the research question is about a mean.
Standard deviation (σ) of the population: It is rare that a researcher knows the exact standard deviation of the population.
Typically, the standard deviation of the population is estimated:
• from the results of a previous survey,
• from a pilot study,
• from secondary data,
• from the judgment of the researcher.
Maximum acceptable difference (d or w): This is the maximum amount of error that you are willing to accept.
Desired confidence level ($Z_{\alpha/2}$): your level of certainty that the sample mean does not differ from the true population mean by more than the maximum acceptable difference. Commonly we use a 95% confidence level.
Then the sample size determination formula for a single population mean is:
$$n = \frac{z_{\alpha/2}^{2}\,\sigma^{2}}{w^{2}}$$
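As a minimal sketch, the single-mean formula, taken here in its standard form n = z²σ²/w², can be computed as below. The inputs σ = 15 and w = 3 are hypothetical, z = 1.96 corresponds to the 95% confidence level, and rounding up to the next whole subject is an assumed convention.

```python
import math


def sample_size_for_mean(sigma, w, z=1.96):
    """n = z^2 * sigma^2 / w^2, rounded up to the next whole subject."""
    n = (z ** 2) * (sigma ** 2) / (w ** 2)
    return math.ceil(n)


# Hypothetical inputs: sigma = 15, w = 3, 95% confidence (z = 1.96)
print(sample_size_for_mean(sigma=15, w=3))  # 97
```

Halving the acceptable difference w roughly quadruples the required sample size, since w enters the formula squared.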
Sample Size for Single Population Proportion
This is the situation in which the variable of interest is categorical.
Three questions must be answered to determine the sample size for a single population proportion: the best estimate of the population proportion, the maximum acceptable difference, and the desired confidence level.
Best estimate of the population proportion of the variable of interest: Make your best estimate of what the actual percentage of the survey characteristic is.
The possible sources of this proportion are:
• the results of a previous study,
• a pilot study,
• the judgment of the researcher,
• simply taking 50% if no previous research has been conducted.
Then the formula for the sample size for a single population proportion is:
$$n = \frac{z_{\alpha/2}^{2}\,p\,(1-p)}{w^{2}}$$
where:
• α = the level of significance, obtained as 1 − confidence level
• p = best estimate of the population proportion
• w = maximum acceptable difference
• $z_{\alpha/2}$ = the value from the standard normal table for the given confidence level
Example 1
An MPH student wants to conduct research on the prevalence of ANC utilization among mothers in DABAT district. Given that the prevalence found in a previous study was 45.7%, what sample size should he take to address his objective?
Solution:
• Margin of error w (also written d) = 5% (0.05)
• A confidence level of 95% gives Zα/2 = 1.96.
• Then, using the formula:
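$$n = \frac{z_{\alpha/2}^{2}\,p\,(1-p)}{w^{2}} = \frac{(1.96)^{2} \times 0.457 \times (1-0.457)}{(0.05)^{2}} = \frac{(1.96)^{2} \times 0.457 \times 0.543}{(0.05)^{2}} \approx 381.3$$
Rounding up gives n = 382.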
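The calculation in Example 1 can be checked with a short Python sketch of the single-proportion formula n = z²p(1 − p)/w². The function name is hypothetical, and rounding up to the next whole subject is the convention assumed here.

```python
import math


def sample_size_for_proportion(p, w, z=1.96):
    """n = z^2 * p * (1 - p) / w^2, rounded up to the next whole subject."""
    n = (z ** 2) * p * (1 - p) / (w ** 2)
    return math.ceil(n)


# Example 1: p = 0.457, w = 0.05, 95% confidence (z = 1.96)
print(sample_size_for_proportion(p=0.457, w=0.05))  # 382

# With no prior estimate, p = 0.5 gives the largest (most conservative) n
print(sample_size_for_proportion(p=0.5, w=0.05))    # 385
```

The second call shows why 50% is used when no previous study exists: p(1 − p) is largest at p = 0.5, so the resulting sample size is sufficient for any true proportion.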
Some Considerations
• The final sample size will be corrected for:
  - Nonresponse, loss to follow-up, lack of compliance, and so on
  - The total size of the population (N): if N < 10,000, the sample size is adjusted with the correction formula
$$n_f = \frac{n_0}{1 + \frac{n_0}{N}}$$
    where $n_f$ = final sample size, $n_0$ = sample size from the formula above, and N = total population size
  - The design effect, if needed
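A minimal sketch of these adjustments follows, using the correction n_f = n_0 / (1 + n_0/N). The population size N = 5000 and the 10% nonresponse rate are hypothetical. The slides do not give a nonresponse formula, so dividing by the expected response rate is an assumed convention (adding a fixed percentage is another common one).

```python
import math


def finite_population_correction(n0, population_size):
    """n_f = n0 / (1 + n0 / N), rounded up to the next whole subject."""
    return math.ceil(n0 / (1 + n0 / population_size))


def inflate_for_nonresponse(n, nonresponse_rate):
    """Assumed convention: divide by the expected response rate (1 - rate)."""
    return math.ceil(n / (1 - nonresponse_rate))


n0 = 382  # sample size from the single-proportion formula
nf = finite_population_correction(n0, population_size=5000)  # hypothetical N
print(nf)                                  # 355
print(inflate_for_nonresponse(nf, 0.10))   # 395
```

The correction matters only for small populations: as N grows, n_0/N approaches zero and n_f approaches n_0.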
Sample size for case control study
Sample size in cohort study
Incorrect sample size will lead to
• Wrong conclusions
• Poor quality research (errors)
• Waste of resources
• Loss of money
• Ethical problems
• Delay in completion
Any questions?
Thank you