Is 221 Lecture - Sampling
Objectives
Sampling
Why sample?
• Cost in terms of money, time and manpower
• Accessibility
• Utility: e.g., to run a diagnostic laboratory test you don’t draw all of a patient’s blood.
A census is a sample consisting of the entire population.
Even though a census is not foolproof, it gives detailed information about every small area of the population.
It has the following disadvantages:
• Expensive
• Takes a long time
• Cumbersome, and therefore often done inaccurately (a careful sample can produce more accurate data than a census).
Sampling…..
Sampling is the process of selecting a representative sample from a population.
It involves selecting cases (elements), or locating people (or other units of analysis), from a target population in order to study that population.
[Figure: Population → (sampling) → Sample → (inference) → Population]
Cont’d
The process of obtaining information from a subset (sample) of a larger group (population).
The results for the sample are then used to make estimates about the larger group.
Faster and cheaper than asking the entire population.
Two keys:
1. Selecting the right population.
2. Selecting the sample scientifically, so that it is representative of the population.
[Figure: Population of interest, described by parameters; sample drawn from it, described by statistics]
We measure the sample using statistics in order to draw
inferences about the population and its parameters.
Characteristics of Good Samples
o Representation
• Sample surveys are almost never conducted to describe the particular sample under study. Rather, they are conducted to understand the larger population from which the sample was selected.
• A great deal of work has been done over the years in developing
sampling methods that provide representative samples for the
general population.
E.g. In an ICT study on user satisfaction with a new software tool, the sample should include
users from different departments and varying levels of experience to accurately represent the entire
user base.
Characteristics of Good Samples
• Adequate Size: The sample size is sufficiently large to provide reliable
and valid results.
• Randomness: Each member of the population has an equal chance of
being included in the sample
• Relevance: The sample is relevant to the research questions and
objectives.
Basic Terms cont’d…
Sampling Frame: the list of units (e.g., people) from which the sample is taken, i.e., the list from which the potential respondents are drawn.
Basic term cont’d….
• The sampling unit is not necessarily the same as the study unit.
• If the objective is to determine the availability of latrines, the study unit would be the household; if it is the availability of lab equipment, the study unit would be the lab.
• If the objective is to determine the prevalence of trachoma, the study unit would be the individual.
Sampling fraction: the ratio of the number of units in the sample to the number of units in the reference population (n/N). Its reciprocal, N/n, is the sampling interval used in systematic sampling.
Hierarchy of sampling (from the study subjects up to the target population):
• Study subjects: the actual participants in the study
• Sample: subjects who are selected
• Sampling frame: the list of potential subjects from which the sample is drawn
• Source population: the population from whom the study subjects would be obtained
• Target population: the population to whom the results would be applied
Errors in a Statistical Study
A sample is expected to mirror the population from which it comes; however, there is no guarantee that any sample will be precisely representative of the population. No sample is an exact mirror image of the population.
Errors fall into two types:
• Sampling (random) errors
• Non-sampling (systematic) errors
1. Sampling error
Sampling error cont’d…
Chance: the main cause of sampling error; it is the error that occurs purely because of the luck of the draw.
It can occur whether the total study population or a sample is being used.
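A small simulation can make sampling error concrete. The sketch below assumes a hypothetical, randomly generated population (the lecture gives no data); it draws many simple random samples of each size and measures how much the sample mean varies around the population mean. The spread shrinks as n grows, but it never disappears.

```python
import random
import statistics

# Hypothetical population of 10,000 values (e.g., ages), generated only for illustration.
pop_rng = random.Random(1)
population = [pop_rng.gauss(30, 8) for _ in range(10_000)]
mu = statistics.fmean(population)  # the population parameter we want to estimate


def sample_means(pop, n, reps, rng):
    """Draw `reps` simple random samples of size n; return the mean of each one."""
    return [statistics.fmean(rng.sample(pop, n)) for _ in range(reps)]


rng = random.Random(42)
spread_by_n = {}
for n in (10, 100, 1000):
    means = sample_means(population, n, reps=200, rng=rng)
    # Standard deviation of the sample means = typical size of the sampling error
    spread_by_n[n] = statistics.stdev(means)
    print(f"n = {n:4d}: sample means vary by about {spread_by_n[n]:.2f} around mu = {mu:.2f}")
```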
2. Non-sampling Error
Non-Sampling Error cont’d …
Random error can distort the results in either direction but tends to balance out on average.
Thus, total survey error = sampling error + non-sampling error.
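The difference between random error, which averages out, and systematic error, which does not, shows up in a quick simulation. The true value, the noise level and the +4 mmHg calibration offset below are hypothetical numbers chosen only for illustration.

```python
import random
import statistics

rng = random.Random(7)
TRUE_VALUE = 120.0  # hypothetical true systolic blood pressure (mmHg)
READINGS = 5000

# Random error only: each reading is off by noise centred on zero.
random_only = [TRUE_VALUE + rng.gauss(0, 5) for _ in range(READINGS)]

# Random + systematic error: a hypothetical miscalibrated device adds +4 mmHg
# to every reading, on top of the same kind of random noise.
with_bias = [TRUE_VALUE + 4 + rng.gauss(0, 5) for _ in range(READINGS)]

random_error = statistics.fmean(random_only) - TRUE_VALUE
systematic_error = statistics.fmean(with_bias) - TRUE_VALUE

print(f"average error, random only:       {random_error:+.2f} mmHg")      # close to 0
print(f"average error, with a +4 bias:    {systematic_error:+.2f} mmHg")  # close to +4
```

Averaging more readings drives the first number towards zero, but no amount of averaging removes the +4 offset in the second.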
Advantage of sampling
Disadvantage of Sampling
If the population is very large and has many sections and subsections, the sampling procedure becomes very complicated.
If the researcher lacks the necessary skill and technical knowledge of sampling procedures, the results may be seriously flawed.
Characteristics Of A Good Sample Design
From what has been stated above, we can list the characteristics of a good sample design as follows:
Sample design must result in a truly representative sample.
Sample design must result in a small sampling error.
Sample design must be viable in the context of the funds available for the research study.
Sample design must allow systematic bias to be controlled effectively.
The sample should be such that its results can be applied, in general, to the universe (population) with a reasonable level of confidence.
Types of Sampling
Types of Sampling Methods
• Probability sampling: simple random, systematic, stratified random, cluster and multistage random sampling
• Non-probability sampling: quota, judgemental and convenience sampling
Probability Sampling Method …
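What does it mean to be independent? The researchers select each person for the study separately, so that selecting one person does not affect the chance of selecting another.
Let us say you were asked to participate in an experiment, enjoyed it, and told your friends to contact the researcher to volunteer for the study. Your friends’ inclusion would then depend on yours, so the selections would not be independent.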
In probability sampling:
• A sampling frame exists or can be compiled.
• Every unit in the population has an equal, or at least a known and nonzero, chance of being included in the sample.
• Generalization from the sample to the population is possible.
Probability sampling methods include:
• Simple Random Sampling,
• Systematic Sampling,
• Stratified Random Sampling,
• Cluster Sampling,
• Multistage Sampling.
1. Simple Random Sampling(SRS)
Simple Random Sampling cont’d …
o With replacement: an element may appear multiple times in one sample, and there are $N^n$ possible (ordered) samples.
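Both variants of simple random sampling map onto Python’s standard `random` module: `random.Random.sample` draws without replacement and `random.Random.choices` draws with replacement. The sampling frame of N = 10 numbered units is hypothetical.

```python
import math
import random

frame = list(range(1, 11))  # hypothetical sampling frame: units numbered 1..10
N, n = len(frame), 4
rng = random.Random(3)

# Without replacement: each unit can be picked at most once.
without_repl = rng.sample(frame, n)

# With replacement: a unit may be picked more than once in the same sample.
with_repl = rng.choices(frame, k=n)

# Number of possible samples
ordered_with_repl = N ** n                # N^n ordered samples with replacement
unordered_without_repl = math.comb(N, n)  # C(N, n) distinct subsets without replacement

print(without_repl, with_repl)
print(ordered_with_repl, unordered_without_repl)  # 10000 210
```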
Example
So what?
2. Systematic Random Sampling
Steps in systematic sampling:
E.g. systematic sampling
• N = 1200 and n = 60, so the sampling interval k = N/n = 1200/60 = 20
• List persons from 1 to 1200
• Randomly select a number between 1 and 20 (e.g. 8)
• 1st person selected = the 8th on the list
• 2nd person = 8 + 20 = the 28th on the list, and so on (48th, 68th, …)
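The steps of the example can be written as a short function. This is a sketch: the function name is hypothetical, and when N/n is not a whole number it simply rounds the interval k down, which is one of several possible conventions.

```python
import random


def systematic_sample(frame, n, rng=None, start=None):
    """Select n units: a random start in 1..k, then every k-th unit on the list."""
    N = len(frame)
    k = N // n  # sampling interval k = N/n (rounded down if not a whole number)
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, k)  # random start between 1 and k inclusive
    positions = [start + i * k for i in range(n)]  # 1-based positions on the list
    return [frame[p - 1] for p in positions]


persons = list(range(1, 1201))                   # persons listed 1..1200
fixed = systematic_sample(persons, 60, start=8)  # reproduce the worked example
print(fixed[:4], fixed[-1])                      # [8, 28, 48, 68] 1188

random_start = systematic_sample(persons, 60, rng=random.Random(0))
```

Because the largest position is at most start + (n − 1)k ≤ nk ≤ N, the selection never runs off the end of the list.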
Systematic sampling ….
o It relies on arranging the target population according to some ordering
scheme and then selecting elements at regular intervals through that ordered
list.
o Systematic sampling involves a random start and then proceeds with the
selection of every kth element from then onwards. In this case, k
=(population size/sample size).
o It is important that the starting point is not automatically the first in the list,
but is instead randomly chosen from within the first to the kth element in the
list.
• Even when a sampling frame is available, the population may not be homogeneous. So what?
3. Stratified Random Sampling
So, you divide your sample into male and female members and randomly
select the required sample size within each subgroup (or "stratum")
With this technique, you are guaranteed to have enough of each subgroup
for meaningful analysis.
3. Stratified Random Sampling
Classify all members of the population as a member of one of the identified subgroups
Randomly select (using simple random sampling or others) an appropriate number of
individuals from each subgroup.
Then the total sample size will be the sum of all samples from each subgroup.
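The three steps above can be sketched in Python. The population of 100 people, its split into 60 females and 40 males, and the per-stratum sample sizes are hypothetical; within each stratum the sketch uses simple random sampling without replacement.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, sizes, rng):
    # Step 1: classify every member of the population into one stratum
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    # Step 2: simple random sample (without replacement) inside each stratum
    sample = []
    for name, n_i in sizes.items():
        sample.extend(rng.sample(strata[name], n_i))

    # Step 3: the total sample is the combination of all stratum samples
    return sample


# Hypothetical population: (id, sex) pairs, 60 females and 40 males
people = [(i, "F" if i < 60 else "M") for i in range(100)]
chosen = stratified_sample(people, lambda p: p[1], {"F": 6, "M": 4}, random.Random(5))
print(len(chosen))  # 10
```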
3. Stratified Random Sampling
There are two methods to get the study subject from each subgroup,
proportional allocation or
equal allocation.
We use proportional allocation technique when our subgroups vary dramatically in size
in our population
• Let N be the total population size and $N_1, N_2, \dots, N_k$ be the population sizes of strata 1, 2, …, k respectively. Moreover, let n be the total sample size and $n_1, n_2, \dots, n_k$ be the subsample sizes for strata 1, 2, …, k respectively, where $N = N_1 + N_2 + \dots + N_k$ and $n = n_1 + n_2 + \dots + n_k$.
Then the subsample $n_i$ to be selected from stratum i (of size $N_i$) is computed by
$$ n_i = \frac{n \, N_i}{N}, \quad i = 1, 2, \dots, k $$
The higher the population in the subgroup, the higher the sample
size will be.
Advantage of stratified sampling over simple random sampling
DEMERIT
Sampling frame for the entire population has to be prepared separately
for each stratum.
Proportional Allocation
• If $P_i = N_i/N$ represents the proportion of the population included in stratum i, and n represents the total sample size, the number of elements selected from stratum i is $n_i = n \cdot P_i$.
Example
• Suppose that a sample of size n = 30 is to be drawn from a population of
size N = 8000 which is divided into three strata of size N1 = 4000, N2 = 2400
and N3 = 1600.
• Adopting proportional allocation, we shall get the sample sizes as under for
the different strata:
• For the stratum with N1 = 4000, P1 = 4000/8000, hence n1 = n · P1 = 30 × (4000/8000) = 15
• Similarly, for the stratum with N2 = 2400, n2 = n · P2 = 30 × (2400/8000) = 9
• For the stratum with N3 = 1600, n3 = n · P3 = 30 × (1600/8000) = 6
• Using proportional allocation, the sample sizes for the different strata are 15, 9 and 6 respectively, which is in proportion to the sizes of the strata, viz., 4000 : 2400 : 1600.
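The allocation arithmetic can be checked with a short function. Because n · N_i / N is not always a whole number, the sketch assumes a largest-remainder rounding rule (round down, then give the leftover units to the strata with the largest fractional parts) so the stratum sizes still add up to n; the lecture does not specify a rounding rule. Equal allocation is included for comparison.

```python
def proportional_allocation(n, stratum_sizes):
    """n_i = n * N_i / N, rounded so that the n_i add up to n (largest remainder)."""
    N = sum(stratum_sizes)
    exact = [n * N_i / N for N_i in stratum_sizes]
    alloc = [int(x) for x in exact]  # round each share down first
    leftover = n - sum(alloc)
    # Give the remaining units to the strata with the largest fractional parts
    by_remainder = sorted(range(len(exact)), key=lambda i: exact[i] - alloc[i], reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


def equal_allocation(n, k):
    """Same number of units from each of k strata (any remainder goes to the first strata)."""
    base, extra = divmod(n, k)
    return [base + (1 if i < extra else 0) for i in range(k)]


print(proportional_allocation(30, [4000, 2400, 1600]))  # [15, 9, 6]
print(equal_allocation(30, 3))                           # [10, 10, 10]
```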
4. Cluster Random Sampling
In this sampling scheme, the required sample is selected as groups of study units (clusters) instead of as individual study units.
The sampling unit is a cluster, and the sampling frame is a list of these clusters.
If the study covers a wide geographical area, using the other methods would be too costly.
The idea is to divide the total population into different clusters; the unit of selection is then the cluster.
Therefore, all units in the selected clusters are taken as the sample.
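One-stage cluster sampling, as described above, can be sketched as: draw a simple random sample of clusters from the list of clusters, then keep every unit in each chosen cluster. The villages and households below are hypothetical.

```python
import random


def cluster_sample(clusters, m, rng):
    """Randomly choose m clusters and include all of their units in the sample."""
    chosen = rng.sample(sorted(clusters), m)  # the sampling frame is the list of clusters
    return {name: clusters[name] for name in chosen}


# Hypothetical frame: 20 villages (clusters), each with 5 households (study units)
villages = {f"village_{i:02d}": [f"v{i:02d}_hh{j}" for j in range(5)] for i in range(20)}

selected = cluster_sample(villages, 3, random.Random(11))
households = [hh for units in selected.values() for hh in units]
print(len(selected), len(households))  # 3 15
```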
Steps in cluster sampling are:
Consider the following graphical display:
5. Multistage Random Sampling
Cont’d……….
Advantages
Cheaper and faster than probability sampling
Reasonably representative if collected in a thorough manner
1. Judgment Sampling/ Purposive sampling
2. Convenience Sampling
• A type of non-probability sampling in which the sample is drawn from the part of the population that is close to hand, that is, readily available and convenient.
• The researcher using such a sample cannot scientifically make generalizations about
the total population from this sample because it would not be representative enough.
3. Quota sampling
Review Questions
• Why might a researcher choose non-probability sampling methods?
• How does judgment sampling differ from other non-probability
sampling methods?
• Why is convenience sampling not ideal for making generalizations
about a population?
• What is the major weakness of quota sampling compared to stratified
sampling?
• What are the advantages and disadvantages of using snowball
sampling?
Cont’d
• While this technique can dramatically lower search costs, it comes at the
expense of introducing bias because the technique itself reduces the
likelihood that the sample will represent a good cross section from the
population.
Sample Size Determination
The answer will depend on the aims, nature and scope of the study and on
the expected result. All of which should be carefully considered at the
planning stage.
Sample size trade-off
o If the sample (n) is large: accuracy increases, but the study is costly and complex.
o If the sample is small: accuracy decreases, but the study is less costly.
o Take the optimum sample. How?
Factors to determine sample size
• Size of population
• Resources – subjects, financial, manpower
• Method of Sampling- random, stratified
• Degree of difference to be detected
• Variability (S.D.) – pilot study, historical
• Degree of Accuracy (or errors)
- Type I error (alpha): usually set at 0.05
- Type II error (beta): less than 0.2 (20%)
- Power of the test (1 − beta): more than 0.8 (80%)
• Statistical formulae
• Dropout rate, non-compliance with treatment (Rx)
o Sample size determination depending on outcome variables.
• The third category covers continuous response variables such as birth weight, age at first marriage, blood pressure and serum uric acid level, for which numerical measurements are usually made.
• In this case the data are summarized in the form of means and standard deviations or their derivatives.
Sample Size………...
The sample size determination formulas come from the formulas for
the maximum error of the estimates and is derived by solving for n.
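For the two single-population cases that follow, the derivation runs as below, with w the maximum acceptable error, σ the population standard deviation and p the population proportion.

```latex
% Mean: the maximum error of the estimate is w = z_{\alpha/2}\,\sigma/\sqrt{n}
w = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}
\;\Longrightarrow\; \sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{w}
\;\Longrightarrow\; n = \frac{z_{\alpha/2}^{2}\,\sigma^{2}}{w^{2}}

% Proportion: the standard error of \hat{p} is \sqrt{p(1-p)/n}
w = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}
\;\Longrightarrow\; n = \frac{z_{\alpha/2}^{2}\,p(1-p)}{w^{2}}
```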
Sample Size for a Single Population
Sample size for single population mean
Maximum acceptable difference (w): the maximum amount of error that you are willing to accept.
Desired confidence level ($z_{\alpha/2}$): your level of certainty that the sample mean does not differ from the true population mean by more than the maximum acceptable difference. A 95% confidence level is commonly used.
Then the sample size determination formula for a single population mean is:
$$ n = \frac{z_{\alpha/2}^2 \, \sigma^2}{w^2} $$
Sample size for single population mean cont’d…
Where
• α = the level of significance, obtained as 1 − confidence level
• σ = standard deviation of the population
• w = maximum acceptable difference
• $z_{\alpha/2}$ = the value from the standard normal table for the given confidence level
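The mean formula translates directly into code. This sketch uses `statistics.NormalDist` from the Python standard library (3.8+) to look up $z_{\alpha/2}$ instead of a printed table, and rounds n up, since a fractional subject is not possible. The values σ = 10 and w = 2 are hypothetical.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """z_{alpha/2} for a two-sided interval, e.g. about 1.96 for 95% confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def sample_size_mean(sigma, w, confidence=0.95):
    """n = z^2 * sigma^2 / w^2, rounded up to the next whole subject."""
    z = z_critical(confidence)
    return math.ceil((z * sigma / w) ** 2)


# Hypothetical inputs: sigma = 10 (e.g. from a pilot study), w = 2, 95% confidence
print(round(z_critical(0.95), 2))  # 1.96
print(sample_size_mean(10, 2))     # 97
```

Halving w quadruples n, which is why the maximum acceptable difference should be chosen with care.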
Sample Size for Single Population Proportion
$$ n = \frac{z_{\alpha/2}^2 \, p(1-p)}{w^2} $$
An undergraduate student wants to conduct research on the prevalence of lab utilization among students at CIVE. Given that a previous study found the prevalence to be 45.7%, what sample size should he take to address his objective?
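Solution:
Margin of error w = 5% = 0.05
A confidence level of 95% gives $z_{\alpha/2} = 1.96$.
Then, using the formula:
$$ n = \frac{z_{\alpha/2}^2 \, p(1-p)}{w^2} = \frac{1.96^2 \times 0.457 \times (1 - 0.457)}{0.05^2} = \frac{1.96^2 \times 0.457 \times 0.543}{0.05^2} \approx 381.3 $$
which is rounded up to n = 382.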
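The CIVE calculation can be reproduced in a few lines; the function name is hypothetical, and z = 1.96 is the 95% value used in the lecture.

```python
import math


def sample_size_proportion(p, w, z=1.96):
    """n = z^2 * p * (1 - p) / w^2, rounded up to a whole number of subjects."""
    return math.ceil(z ** 2 * p * (1 - p) / w ** 2)


print(sample_size_proportion(0.457, 0.05))  # 382 (CIVE example)
print(sample_size_proportion(0.5, 0.05))    # 385
```

Because p(1 − p) is largest at p = 0.5, using p = 0.5 when no earlier estimate exists gives the most conservative (largest) sample size.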
Some Considerations
• For a finite population, adjust the sample size to obtain $n_f$ (final sample size) from $n_0$ (sample size from the formula above) and N (total population).
• Take the design effect into account if needed (e.g., in cluster sampling).
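The adjustments above can be sketched as follows. The lecture does not spell out the finite population correction formula, so the sketch assumes the common form $n_f = n_0 / (1 + n_0/N)$; the design effect multiplies the sample size, and the nonresponse inflation (dividing by the expected response rate) relates to the nonresponse question in the review. N = 2000, a design effect of 1.5 and 10% nonresponse are hypothetical values.

```python
import math


def finite_population_correction(n0, N):
    """Assumed common form: n_f = n0 / (1 + n0 / N), rounded up."""
    return math.ceil(n0 / (1 + n0 / N))


def apply_design_effect(n, deff):
    """Multiply by the design effect (e.g. for cluster sampling), rounded up."""
    return math.ceil(n * deff)


def adjust_for_nonresponse(n, nonresponse_rate):
    """Inflate n so that about n subjects remain after the expected nonresponse."""
    return math.ceil(n / (1 - nonresponse_rate))


n0 = 382                                        # from the CIVE example
nf = finite_population_correction(n0, 2000)     # hypothetical N = 2000
n_deff = apply_design_effect(nf, 1.5)           # hypothetical design effect
n_final = adjust_for_nonresponse(n_deff, 0.10)  # hypothetical 10% nonresponse
print(nf, n_deff, n_final)                      # 321 482 536
```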
Incorrect sample size will lead to
o Wrong conclusions
o Waste of resources
o Loss of money
o Ethical problems
o Delay in completion
Review Questions
• Why is determining the correct sample size important in research?
• What factors should be considered when determining sample size?
• What information is needed to estimate the sample size for a single
population survey?
• How is sample size calculated for categorical and continuous variables?
• How would you calculate the sample size for a study with a known
prevalence rate?
• What adjustments might be needed for nonresponse or dropout rates?
Example 2