Sampling error &
Sample Size
Estimation (SSE)
Dr. Mariyamma Philip
Additional Professor,
Department of Biostatistics
NIMHANS, Bangalore
Sampling..
The main objective of drawing a sample is to make inferences
about the larger population from the smaller sample.
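Random sample: every member of the population has the same chance of being selected in the sample.
Parameters describe the population; statistics describe the sample.
Estimation: sample statistics are used to estimate population parameters.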
Sampling…
Requirements of a Sample
1. Representativeness
2. Adequate
3. Unbiased & Objective
So, if the nature of the population has to be interpreted from a
sample, it is necessary for the sample to be truly representative of
the population.
i.e., it should reflect the similarities and differences found in the
population.
Pros & cons of Sampling
Pros
1 Cost effective and takes less time
2 Results are obtained more quickly
3 Accurate results
Cons
1 Introduces sampling error
2 Excludes a great proportion of the population
3 Results may not be generalizable to the target population if the sample is not representative
Sampling Error
The error that emerges when the sample used in your study is
not representative of its entire population.
Can be reduced by
- Taking large sample size
- Employing Probability Sampling methods
- Aligning the study sample with the target population on as many characteristics as possible
For example, suppose a researcher is conducting a study of stress among medical–surgical
nurses. If 20% of the nursing target population is male, ideally, 20% of the
study sample would be male as well. Similarly, the study sample should mirror the
target population on other characteristics, such as age, education, and nursing
experience.
In selecting the study sample, the primary goal is to minimize sampling error (the
discrepancy between the study sample and the target population).
Non-Sampling Error
Non-sampling error covers a range of errors arising from
human mistakes, such as incorrect data entry and poorly
prepared questionnaires.
• Vague Questions
• Ambiguity in definition
• Procedure in data collection
• Investigator’s bias
• Respondent’s bias
• Analysis error
• Tabulation error….
Sampling Methods
Probability Sampling
1 Involves random sampling: equal chance of selection for all the units
2 No selection bias
3 Requires a sampling frame
4 Preferred sampling technique; used in quantitative research
Non-probability Sampling
1 Subjects do not have an equal chance of being selected; may not represent the target population
2 Selection bias is present
3 Does not require a sampling frame
4 Used in qualitative studies or in difficult-to-reach populations
Sampling Methods
Probability or Random Sampling
1 Simple Random Sampling
2 Stratified Random Sampling
3 Systematic Random Sampling
4 Cluster Sampling
Non-probability Sampling
1 Convenience Sampling
2 Quota Sampling
3 Judgement Sampling / Purposive Sampling
4 Snowball Sampling
Mixed or Multistage Sampling
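As an illustration of three of the probability sampling methods listed above, here is a minimal Python sketch. The sampling frame of 100 unit IDs, the two strata, the fixed seeds, and the function names are all hypothetical choices made for this example; they are not part of the original slides.

```python
import random

def simple_random_sample(frame, n, seed=0):
    """Simple random sampling: every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_sample(frame, n, seed=0):
    """Systematic sampling: random start, then every k-th unit (k = N // n)."""
    k = len(frame) // n
    start = random.Random(seed).randrange(k)
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=0):
    """Stratified sampling: draw the same fraction from each stratum."""
    rng = random.Random(seed)
    sample = []
    for units in strata.values():
        size = round(len(units) * fraction)
        sample.extend(rng.sample(units, size))
    return sample

# Hypothetical sampling frame: 100 unit IDs, 20% of them in the "male" stratum
frame = list(range(1, 101))
strata = {"male": frame[:20], "female": frame[20:]}

print(simple_random_sample(frame, 10))
print(systematic_sample(frame, 10))
print(stratified_sample(strata, 0.10))
```

With proportional allocation, the stratified sample keeps the 20% / 80% split of the frame, which is the kind of alignment with the target population described earlier for the nursing example.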
How big a sample is needed?
Fundamental question.
Arises in the planning of most research projects, but is most
often inadequately answered.
Determining sample size is a very important issue because:
Too-large samples waste time, resources and money.
Too-small samples do not support good decisions and can
lead to inaccurate results.
The objective of SSE is to obtain both the desired accuracy and
confidence level at minimum cost.
Why sample size estimation?
When data are not readily available:
How many members of the population should be selected to ensure
that the population is properly represented?
When data have already been collected:
How do you determine whether you have enough data?
What happens if “n” is too small or too large?
Too small:
- Inadequately addresses the research question
- May fail to detect a clinically important difference or effect
(lack of sufficient POWER)
Too large:
- Involves extra subjects
- Requires more resources (manpower, finance, time, etc.)
- May identify “statistically significant” differences
which have no clinical relevance
How big a sample is needed?
Should be addressed after finalizing the
• research question / primary objective,
• primary outcome,
• study design and
• sampling method.
[Figure labels: 3 lakhs; 30 subjects]
This formula or the magic number 30 will not
be suitable for all studies.
SSE differs according to
objectives / research question
Estimation
Mean, proportion, sensitivity, specificity..
Hypothesis testing
Comparison of means – two or more independent groups
Comparison of means – paired data (2 or more)
Comparison of two proportions
Comparison of proportions – paired data
Testing – correlation coefficient, OR, RR, HR
SSE also differs as per
the type of variable.
Estimation
1) What is the prevalence of anemia among pregnant women?
2) On an average what is the hemoglobin level among HIV patients?
Hypothesis testing
1) Does the proportion of side effects differ between two treatments?
2) Does the average cholesterol level differ between treatments A and B?
Before SSE, the Researcher should
have finalized these:
1) Primary objective(s) / research question
2) Primary outcome variable
3) Study design
4) Sampling technique
Before a statistician can do SSE, the
researcher needs to answer these:
1. What is the primary objective of the study?
2. What is the primary outcome / primary variable of interest?
3. Is it numerical or categorical?
4. How small a difference is clinically important to detect?
5. What is the effect size?
6. How much variability is in the population?
7. What are the desired α (type I error) and β (type II error; power = 1 − β)?
8. What is the sample size allocation ratio?
To calculate sample size, the researcher has to have some
idea of the results expected in the study!
Estimates for SSE from
previous / Pilot Studies
• Needed estimates for sample size calculation can be taken from
similar previous studies.
• If previous research is unavailable, a pilot study is needed for
adequate estimation of sample size with a given precision.
• A pilot (preliminary) sample must be drawn from the
population, and the statistic computed from it is used in sample
size determination.
SSE Situations in today’s class
Estimation of mean
Estimation of Proportion (prevalence)
Difference in two means
Difference in two Proportions
Required info to estimate sample size
Primary research question or primary objective
Type of outcome
− Qualitative, Quantitative
Type of study
- Descriptive study
❖ Clinically significant difference
❖ Probability of type I error
❖ Variability
- Comparative study (test of significance)
❖ Clinically significant difference / ES
❖ Probability of type I error
❖ Power of the study
❖ Variability
SSE depends on the primary objective and the
outcome variable.
Estimation
- Estimation of mean: On average, what is the hemoglobin level among HIV patients?
- Estimation of proportion: What is the prevalence of anemia among pregnant women?
Hypothesis testing
- Comparison of means: Does the average cholesterol level differ between treatments A and B?
- Comparison of proportions: Does the proportion of side effects differ between two treatments?
SSE examples …
mean, proportion, estimation, testing ….
Estimation
- Estimation of mean: What is the average Fasting Blood Sugar (FBS) level among diabetes patients?
- Estimation of proportion: What proportion of young adults are diabetic?
Hypothesis testing
- Comparison of means: Does the average Fasting Blood Sugar (FBS) level differ with respect to the gender of the patient?
- Comparison of proportions: Does the proportion of diabetic young adults differ between rural and urban areas?
SSE examples …
mean, proportion, estimation, testing ….
What is the average haemoglobin level among teenage girls in rural schools?
Estimation of mean
How common is death among trauma care patients?
Estimation of Proportion (prevalence)
Is treatment A better than treatment B for the reduction in blood sugar level?
Difference in two means
Is adherence to therapy higher among those who received usual care than among
those in the collaborative care model?
Difference in two Proportions
S S E – example 1
Estimation of mean
What is the sample size required to find out the average
hemoglobin level among HIV patients?
1) Research question
To estimate the average hemoglobin level among HIV patients
2) Study design
Descriptive Single group
3) Outcome measure
Average hemoglobin level - numerical
S S E – example 1
Estimation of mean
What is the sample size required to find out the average hemoglobin
level among HIV patients?
What is the average Hb among HIV patients?
What is the clinically significant difference?
How close should the estimated value be to the value of the parameter?
± 1, ± 1.5, ± 2, ± 2.5, ± 3, etc.
What is the anticipated variability in the study population?
What confidence level would you like?
S S E – example 1
Estimation of mean
What is the sample size required to find out the average hemoglobin
level among HIV patients?
What is the average Hb among HIV patients? Don’t know
How close should the estimated value be to the value of the parameter? ±2
(options: ± 1, ± 1.5, ± 2, ± 2.5, ± 3, etc.)
What is the anticipated variability in the study population? 6.95, from a previous study
What confidence level would you like? At least 95%.
$$ n = \frac{Z^2 \sigma^2}{(\bar{x} - \mu)^2} = \frac{z^2 \sigma^2}{L^2} $$
$\alpha = 0.05$, $\sigma = 6.95$
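The formula above can be evaluated with the answers given on this slide. The following is only a sketch: the function name `n_for_mean` is made up for illustration, z = 1.96 is assumed for 95% confidence, and rounding up to the next whole subject is an assumed convention.

```python
import math

def n_for_mean(sigma, margin, z=1.96):
    """Sample size to estimate a mean within +/- margin: n = z^2 * sigma^2 / L^2."""
    n = (z ** 2) * (sigma ** 2) / (margin ** 2)
    return math.ceil(n)  # round up to the next whole subject

# SD = 6.95 from a previous study, margin L = 2, 95% confidence
print(n_for_mean(sigma=6.95, margin=2))  # 47
```

Under these assumptions the unrounded value is about 46.4, so 47 patients would be needed.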
How is the formula arrived at?
Critical ratio = (difference between the means) / (standard error of the difference)
$$ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$
$$ Z^2 = \frac{(\bar{x} - \mu)^2 \, n}{\sigma^2} $$
$$ n = \frac{z^2 \sigma^2}{(\bar{x} - \mu)^2} $$
S S E – example 1.1
Estimation of mean
What is the sample size required to find out the average hemoglobin level
among HIV patients?
What is the average Hb among HIV patients?
How close would you expect the sample mean to be to this value? ±2
(options: ± 1, ± 1.5, ± 2, ± 2.5, ± 3, etc.)
What is the anticipated variability in the study population? 6.95, from a previous study
What confidence level would you like? At least 95%.
$$ n = \frac{Z^2 \sigma^2}{(\bar{x} - \mu)^2} = \frac{z^2 \sigma^2}{L^2} $$
$\sigma = 6.95$
S S E – example 1.2
Estimation of mean
Heights of students on a college campus are normally distributed with a
standard deviation of 5 inches. Find the minimum sample size required to
estimate the average height with 95% confidence and a maximum error of 0.5 inches.
$$ n = \frac{Z^2 \sigma^2}{(\bar{x} - \mu)^2} $$
$\sigma = 5$, $d = 0.5$
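A worked substitution for this example, assuming Z = 1.96 for 95% confidence and writing d for the maximum error (the rounding up at the end is an assumed convention):

```latex
n = \frac{Z^2 \sigma^2}{d^2}
  = \frac{1.96^2 \times 5^2}{0.5^2}
  = \frac{3.8416 \times 25}{0.25}
  = 384.16 \;\Rightarrow\; n = 385
```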
S S E - example 2
Estimation of Proportion (prevalence)
A researcher wants to estimate the prevalence of anemia
among kids aged less than 3 years, as part of her study.
How many kids should be selected for the study?
1) Research question
To estimate the prevalence of anemia among kids
2) Study design
Descriptive - Single group
3) Outcome measure
Prevalence – anemia yes/no
S S E - example 2
Estimation of Proportion (prevalence)
A researcher wants to estimate the prevalence of anaemia among kids
aged less than 3 years, as part of her study. How many kids should be
selected for the study?
What is the anticipated prevalence of anemia among kids < 3 years? 70%
What is the minimum degree of precision required?
The estimate should not differ by more than 5%.
What confidence in the estimate is expected?
95%.
$$ n = \frac{z_{\alpha/2}^2 \, p(1 - p)}{d^2} $$
$p = 0.70$, $d = 0.05$, $\alpha = 0.05$
$$ n = \frac{1.96^2 \times 0.7 \times 0.3}{(0.05)^2} \approx 322.7, \text{ rounded up to } 323 $$
Equivalently, in percentages:
$$ n = \frac{1.96^2 \times 70 \times 30}{5^2} \approx 323 $$
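The same calculation can be sketched in Python. The function name `n_for_proportion` is hypothetical, and z = 1.96 is assumed for 95% confidence; proportions are entered as fractions rather than percentages.

```python
import math

def n_for_proportion(p, d, z=1.96):
    """Sample size to estimate a proportion p within +/- d: n = z^2 * p * (1 - p) / d^2."""
    n = (z ** 2) * p * (1 - p) / (d ** 2)
    return math.ceil(n)  # round up to the next whole subject

# Anticipated prevalence 70%, precision 5%, 95% confidence
print(n_for_proportion(p=0.70, d=0.05))  # 323
```

Halving the margin d roughly quadruples the required sample, because d enters the formula squared.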
How is the formula arrived at?
Critical ratio = (difference between the proportions) / (standard error of the difference in proportions)
$$ Z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} $$
$$ Z^2 = \frac{(\hat{p} - p)^2 \, n}{p(1 - p)} $$
With $d = \hat{p} - p$:
$$ n = \frac{Z^2 \, p(1 - p)}{d^2} $$
S S E - example 2.1
Estimation of Proportion (prevalence)
A study found that 73% of pre-KG children aged 3 to 5 whose mothers had a
bachelor’s degree or higher were enrolled in early childhood care.
1. How large a sample is needed to estimate the true proportion within 3% with 95%
confidence?
$$ n = \frac{z_{\alpha/2}^2 \, p(1 - p)}{d^2} $$
$p = 0.73$ (73%), $d = 0.03$ (3%)
2. How large a sample is needed if you had no prior knowledge of the proportion?
$p = 0.5$ (50%)
If you have prior knowledge about the proportion, then you have
to study fewer subjects for the same margin of error.
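A worked comparison of the two cases, assuming z = 1.96 for 95% confidence, a margin of 3% entered as d = 0.03, and rounding up to the next whole subject:

```latex
\begin{aligned}
p = 0.73:\quad n &= \frac{1.96^2 \times 0.73 \times 0.27}{0.03^2}
                  = \frac{3.8416 \times 0.1971}{0.0009} \approx 841.3 \;\Rightarrow\; 842 \\
p = 0.50:\quad n &= \frac{1.96^2 \times 0.50 \times 0.50}{0.03^2}
                  = \frac{3.8416 \times 0.25}{0.0009} \approx 1067.1 \;\Rightarrow\; 1068
\end{aligned}
```

The product p(1 − p) is largest at p = 0.5, which is why the no-prior-knowledge case needs the most subjects.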
SSE Situations in today’s class
Estimation of mean
✓
Estimation of Proportion (prevalence)
Difference in two means
Difference in two Proportions
S S E – example 3
Comparison of means
Suppose a researcher wishes to compare the birth
weights of babies conceived by Assisted Reproductive
Technologies (ART) with those of normal pregnancies.
The standard deviation of birth weight is assumed to be
400 grams.
The researcher considers a difference in birth weight
of at least 100 grams to be important. If he
assumes a power of 80% of detecting a true
difference, how many children need to be selected in
each group?
$L = 100$, $\sigma = 400$, $1 - \beta = 0.80$, $z_\beta = 0.84$
ART study
1) Research question
To compare whether there is any difference in mean birth
weights of children (conceived by ART vs. normal pregnancies)
2) Study groups
Children conceived by ART; children from normal pregnancies
3) Outcome measure
Mean birth weight (in grams)
ART study..
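$$ n = \frac{(z_{\alpha/2} + z_\beta)^2 \times 2\sigma^2}{L^2} $$
$L = 100$, $\sigma = 400$, $1 - \beta = 0.80$, $z_\beta = 0.84$
$$ n = \frac{(1.96 + 0.84)^2 \times 2 \times 400^2}{100^2} = 250.88 \approx 251 $$
The minimum sample size in each group is 251.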
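A minimal Python sketch of the two-group formula used for the ART study. The function name is hypothetical; the defaults z = 1.96 (two-sided 5% significance) and z = 0.84 (80% power) are the values used on the slide.

```python
import math

def n_per_group_two_means(sigma, diff, z_alpha=1.96, z_beta=0.84):
    """Per-group sample size: n = (z_alpha + z_beta)^2 * 2 * sigma^2 / L^2."""
    n = (z_alpha + z_beta) ** 2 * 2 * sigma ** 2 / diff ** 2
    return math.ceil(n)  # round up to the next whole subject

# ART study: SD = 400 g, clinically important difference L = 100 g, 80% power
print(n_per_group_two_means(sigma=400, diff=100))  # 251
```

The factor 2 reflects the variance of a difference between two independent group means, each contributing σ²/n.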
S S E – example 3.1
Comparison of means
A researcher wants to know the sample size for detecting a mean
difference of 0.3 mg/dl in Hb level in kids between two different strategies, with
95% confidence. Mean (SD) values from a pilot study are 10.2 (1.9) and
11.0 (1.7), respectively.
$$ n = \frac{2 \, z_{\alpha/2}^2 \, (\sigma_1^2 + \sigma_2^2)}{L^2} $$
$L = 0.3$, $\sigma_1 = 1.9$, $\sigma_2 = 1.7$
$$ n = \frac{2 \times 1.96^2 \times (1.9^2 + 1.7^2)}{0.3^2} \approx 554.9 $$
The minimum sample size in each group is 555.
S S E – example 3.2
Comparison of means
Mean (SD) uric acid level in the diseased group is 5.4 (1.1) mg/100 ml. Using a new
drug, the researcher expects the clinical difference between the untreated and
treated groups to be at least 0.2 mg/100 ml. How many subjects should be studied?
He wishes to conduct the study with significance level = 0.05 and power = 0.90.
$$ n = \frac{(z_{\alpha/2} + z_\beta)^2 \times 2\sigma^2}{L^2} $$
$L = 0.2$, $\sigma = 1.1$, $1 - \beta = 0.90$, $z_\beta = 1.28$
$$ n = \frac{(1.96 + 1.28)^2 \times 2 \times 1.1^2}{0.2^2} \approx 635 $$
The minimum sample size in each group is 635.
S S E - example 4
Objective: To see whether there is any significant difference in the
percentage of brain tumor patients surviving 3 years after
diagnosis between a new treatment and the standard one.
Outcome measure: Percentage of people survived (alive/dead)
Study design: RCT
Study groups: brain tumour patients on the new treatment vs. the standard treatment
Brain tumor study
Details provided by researchers upon statistician’s queries
Survival for standard treatment: 70%
Survival for new treatment: 80%
We wish to have an 80% chance of finding this difference (power)
Level of significance: 0.05
p1 = 0.7, q1 = 0.3, p2 = 0.80, q2 = 0.20
Brain tumor study
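$p_1 = 0.7$, $q_1 = 0.3$, $p_2 = 0.80$, $q_2 = 0.20$, $\bar{p} = (p_1 + p_2)/2 = 0.75$, $\bar{q} = 1 - \bar{p} = 0.25$
$$ n = \frac{\left( z_{\alpha/2} \sqrt{2 \bar{p} \bar{q}} + z_\beta \sqrt{p_1 q_1 + p_2 q_2} \right)^2}{(p_1 - p_2)^2} $$
$$ n = \frac{\left( 1.96 \sqrt{2 \times 0.75 \times 0.25} + 0.84 \sqrt{0.7 \times 0.3 + 0.80 \times 0.20} \right)^2}{(0.7 - 0.8)^2} = 292.82 $$
The minimum size of the sample in each group is 293.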
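The two-proportion calculation for the brain tumor study can be sketched as follows. The function name is hypothetical, the pooled proportion is assumed to be the simple average of p1 and p2 (equal group sizes), and the default z values are those quoted on the slide.

```python
import math

def n_per_group_two_proportions(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Per-group n for comparing two proportions with equal group sizes."""
    p_bar = (p1 + p2) / 2          # pooled proportion, assuming equal allocation
    q_bar = 1 - p_bar
    numerator = (z_alpha * math.sqrt(2 * p_bar * q_bar)
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)  # round up to whole subjects

# Survival 70% (standard) vs 80% (new), 5% significance, 80% power
print(n_per_group_two_proportions(0.70, 0.80))  # 293
```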
S S E - example 4.1
Comparison of proportions
An investigator hypothesizes that the proportion of children developing
emotional problems at the beginning of school is about 5% with both parents at home
and about 15% with only one parent at home. He has set
significance level = 0.05 and power = 0.90.
$p_1 = 0.05$, $p_2 = 0.15$, $\bar{p} = 0.10$, $d = 0.10$, $1 - \beta = 0.90$, $z_\beta = 1.28$
$$ n = \frac{\left( z_{\alpha/2} \sqrt{2 \bar{p} \bar{q}} + z_\beta \sqrt{p_1 q_1 + p_2 q_2} \right)^2}{(p_1 - p_2)^2} $$
$$ n = \frac{\left( 1.96 \sqrt{2 \times 0.10 \times 0.90} + 1.28 \sqrt{0.05 \times 0.95 + 0.15 \times 0.85} \right)^2}{(0.05 - 0.15)^2} \approx 187 $$
The minimum size of the sample in each group is 187.
Factors that affect sample size
1. Clinically significant difference /
Effect size / Minimum expected difference
2. Variability
3. Probability of type I error
4. Power of the study
5. One tailed or two tailed tests
1. Minimum Expected Difference...
For example, suppose a study is designed to compare a standard
diagnostic procedure of 80% accuracy with a new procedure of
unknown but potentially higher accuracy.
It would probably be clinically unimportant if the new procedure
were only 81% accurate.
The investigator believes that it would be a clinically important
improvement if the new procedure were 90% accurate.
Therefore, the investigator would choose a minimum expected
difference of 10% (0.10).
Pilot study results, a literature review, or the clinician’s experience can also
guide the selection of a reasonable minimum difference.
1. Minimum Expected Difference
This is the smallest measured difference between comparison groups
that the investigator would like the study to detect.
As the minimum expected difference is made smaller, the sample size
needed increases.
The setting of this parameter is subjective and is based on clinical judgment
and experience with the problem being investigated.
How much is the percentage (mean) obtained from the sample likely to vary from
the population percentage (mean)?
How much is the difference in mean (percentage) between the groups?
It can be obtained from previous literature or a pilot study (mean, SD or effect size).
It can be assumed if there is no similar study or a pilot study is not feasible.
Factors that affect sample size…
1. Minimum Expected Difference
$$ n = \frac{(z_{\alpha/2} + z_\beta)^2 \sigma^2}{L^2} $$
Difference of 1: $$ n = \frac{(1.96 + 1.28)^2 \times 10.1^2}{1^2} = 1071 $$
Difference of 2: $$ n = \frac{(1.96 + 1.28)^2 \times 10.1^2}{2^2} = 268 $$
Difference of 4: $$ n = \frac{(1.96 + 1.28)^2 \times 10.1^2}{4^2} = 67 $$
Difference of 5: $$ n = \frac{(1.96 + 1.28)^2 \times 10.1^2}{5^2} = 43 $$
Smaller the clinically significant / expected difference → more subjects
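The pattern on this slide can be reproduced with a short loop. This is a sketch only: the function name is hypothetical, and SD = 10.1, z = 1.96 and z = 1.28 (90% power) are the values appearing in the worked equations.

```python
import math

def n_for_difference(sigma, diff, z_alpha=1.96, z_beta=1.28):
    """n = (z_alpha + z_beta)^2 * sigma^2 / L^2, rounded up."""
    return math.ceil((z_alpha + z_beta) ** 2 * sigma ** 2 / diff ** 2)

# Same SD, increasingly large minimum expected difference
for diff in (1, 2, 4, 5):
    print(f"difference of {diff}: n = {n_for_difference(sigma=10.1, diff=diff)}")
# prints n = 1071, 268, 67, 43
```

Because the difference is squared in the denominator, doubling it cuts the required sample to about a quarter.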
2. Estimated measurement variability
Precision with which parameters should be estimated.
Precise measurement of variables.
How to get an estimate of the variability ?
Standard deviation -from literature or from Pilot study
Variability & precision
Weighing Machine 1: readings 58 57 58 58 58 57 60 58; range 57 - 60; SD 0.93 (more precise, less variation)
Weighing Machine 2: readings 58 57 67 57 60 50 65 55; range 50 - 67; SD 5.4 (less precise, more variation)
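The range and SD of the two sets of readings can be checked with Python's standard library. The sketch assumes the sample standard deviation (n − 1 in the denominator); the variable names are chosen for this example.

```python
import statistics

# Readings from the two weighing machines on the slide
machine_1 = [58, 57, 58, 58, 58, 57, 60, 58]
machine_2 = [58, 57, 67, 57, 60, 50, 65, 55]

for name, readings in (("Machine 1", machine_1), ("Machine 2", machine_2)):
    sd = statistics.stdev(readings)  # sample SD, n - 1 in the denominator
    print(name, "range:", min(readings), "-", max(readings), "SD:", round(sd, 2))
```

Machine 1 comes out with an SD of about 0.93 and Machine 2 with about 5.42, so the second machine would need more repeated measurements to reach the same precision.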
Factors that affect sample size
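2. Measurement variability
$$ n = \frac{(z_{\alpha/2} + z_\beta)^2 \sigma^2}{L^2} $$
SD = 2.5: $$ n = \frac{(1.96 + 1.28)^2 \times 2.5^2}{0.2^2} = 1641 $$
SD = 1.1: $$ n = \frac{(1.96 + 1.28)^2 \times 1.1^2}{0.2^2} = 318 $$
SD = 0.9: $$ n = \frac{(1.96 + 1.28)^2 \times 0.9^2}{0.2^2} = 213 $$
SD = 0.8: $$ n = \frac{(1.96 + 1.28)^2 \times 0.8^2}{0.2^2} = 168 $$
Higher SD → more subjects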
3. Significance level
This parameter is the maximum p value for which a difference
is to be considered statistically significant.
As the significance criterion is decreased (made more strict),
the sample size needed to detect the minimum difference
increases.
The significance criterion is customarily set to 0.05.
Types of errors
Actual situation    Statistical decision
                    Ho is not rejected    Ho is rejected
Ho is true          Correct decision      Type I Error
Ho is false         Type II Error         Correct decision
The probability of rejecting a true Ho is the Type I error.
The probability of not rejecting (accepting) a false Ho is the Type II error.
The alpha level of a hypothesis test is the threshold (the maximum Type I error that can
be allowed) that is used to determine whether or not to reject the null hypothesis.
It is often set at 0.05, but it is sometimes set as low as 0.01 or as high as 0.10.
For a given sample size, it is not possible to reduce both Type I and Type II error
simultaneously: if one is reduced, the other automatically increases. Usually the Type I error is
fixed at a tolerable limit and the Type II error is minimized by increasing the sample
size.
Confidence, Error…
Type 1 error / Confidence
10% / 90%
5% / 95%
1% / 99%
Lesser the error (α) → more subjects
Confidence & Type 1 error
Maximum Type 1 error allowed in a test is called level of
significance / Alpha level.
Conventionally it has been fixed as 5 % (5/100).
5% error, 95% confidence
30% error, 70% confidence
90% error, 10% confidence
[Figure: error scale running from 100 to 0 against a confidence scale running from 0 to 100, marked at 5% error and 95% confidence]
So it is said that the p-value should be < 5% (0.05) for findings to be significant.
Factors that affect sample size
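3. Significance level (Type I error)
$$ n = \frac{(z_{\alpha/2} + z_\beta)^2 \sigma^2}{L^2} $$
(The worked examples take $\sigma^2 = p \times q = 10 \times 90$ in percentages, $L = 5 - 15$, and leave out the $z_\beta$ term.)
$Z_{0.10} = 1.65$: $$ n = \frac{(1.65)^2 \times 10 \times 90}{(5 - 15)^2} = 25 $$
$Z_{0.05} = 1.96$: $$ n = \frac{(1.96)^2 \times 10 \times 90}{(5 - 15)^2} = 35 $$
$Z_{0.01} = 2.58$: $$ n = \frac{(2.58)^2 \times 10 \times 90}{(5 - 15)^2} = 60 $$
Two-tailed critical values
α         Z
0.10      1.645
0.05      1.960
0.010     2.576
0.001     3.291
0.0001    3.891
Less error, more confidence → more subjects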
4. Statistical Power
This parameter is the power that is desired from the study.
Power is the ability of a study to detect a significant difference when
a true difference exists.
As power is increased, sample size increases.
While higher power is always desirable, there is an obvious trade-off
with the number of individuals that can feasibly be studied, given the
usually fixed amount of time and resources available to conduct a
study.
In randomized controlled trials, the statistical power is customarily set
to 0.80, but many clinical trial experts now advocate a power of 0.90.
Types of errors
Actual situation    Statistical decision
                    Ho is not rejected      Ho is rejected
Ho is true          Confidence (1 − α)      Type I Error (α)
Ho is false         Type II Error (β)       Power (1 − β)
Type II error (β): the probability of accepting a false null hypothesis.
Power (1 − β): the probability of correctly rejecting a false null hypothesis.
The complement of the Type II error is called the “Power” of the test.
Power is a numerical value indicating the sensitivity of the test.
A test with high power has a good chance of being able to detect the difference if it exists.
For a given sample size, it is not possible to reduce both Type I and Type II error simultaneously: if
one is reduced, the other automatically increases. Usually the Type I error is fixed at a tolerable limit and
the Type II error is minimized by increasing the sample size.
Factors that affect sample size
4. Power
$$ n = \frac{(z_{\alpha/2} + z_\beta)^2 \sigma^2}{L^2} $$
Power not specified: $$ n = \frac{(1.96)^2 \times 10 \times 90}{(5 - 15)^2} = 35 $$
80% power ($Z_\beta = 0.84$): $$ n = \frac{(1.96 + 0.84)^2 \times 10 \times 90}{(5 - 15)^2} = 71 $$
90% power ($Z_\beta = 1.28$): $$ n = \frac{(1.96 + 1.28)^2 \times 10 \times 90}{(5 - 15)^2} = 95 $$
Power (%)    Z_β
80           0.84
90           1.28
More power → more subjects
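The relationship can also be read in the other direction: for a given sample size, what power does the study have? The sketch below uses a normal approximation for a two-sided comparison of two means with equal group sizes. The function name and the choice to reuse the ART study inputs (SD = 400 g, difference = 100 g) are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def approx_power(n_per_group, sigma, diff, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with equal group sizes.

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    se = sigma * math.sqrt(2 / n_per_group)         # SE of the difference in means
    return NormalDist().cdf(diff / se - z_alpha)

# ART study inputs: SD = 400 g, clinically important difference = 100 g
for n in (100, 251, 400):
    print(n, round(approx_power(n, sigma=400, diff=100), 2))
```

With 251 children per group the approximate power comes out at about 0.80, in line with the ART calculation; smaller groups fall well short of that and larger groups exceed it.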
One-tailed, Two-tailed test
The alternative hypothesis tells us whether a test is one-tailed or two-tailed.
Ho: No difference between the groups (M1 = M2)
H1 could be one-sided, where either superiority or
inferiority is accepted.
H1: M1 > M2 OR H1: M1 < M2
H1 could be two-sided, where the difference is accepted
without any inference as to which is better or higher.
H1: M1 ≠ M2
5. One- or Two-tailed tests
In a few cases, it may be known before the study that
difference between comparison groups is possible in only one
direction.
In such cases, use of a one-tailed statistical analysis may be
considered, which would require a smaller sample size for
detection of the minimum difference than would a two-tailed
analysis.
Because truly appropriate one-tailed analyses are rare, a two-
tailed analysis is usually assumed.
Factors that affect sample size
5. One or two tailed
$$ n = \frac{(z_{\alpha/2} + z_\beta)^2 \sigma^2}{L^2} $$
Two tailed: $$ n = \frac{(1.96)^2 \times 10 \times 90}{(5 - 15)^2} = 35 $$
One tailed: $$ n = \frac{(1.645)^2 \times 10 \times 90}{(5 - 15)^2} \approx 24 $$
α         Two-Tailed Z    One-Tailed Z
0.10      1.645           1.282
0.05      1.960           1.645
0.010     2.576           2.326
0.001     3.291           3.090
0.0001    3.891           3.719
Two tailed tests → more subjects
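The critical values in the table can be regenerated from the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is made up for the example.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=True):
    """Standard normal critical value for a given significance level."""
    tail = alpha / 2 if two_tailed else alpha   # split alpha across both tails if two-tailed
    return NormalDist().inv_cdf(1 - tail)

for alpha in (0.10, 0.05, 0.01, 0.001, 0.0001):
    two = round(z_critical(alpha), 3)
    one = round(z_critical(alpha, two_tailed=False), 3)
    print(alpha, two, one)
```

For the same α, the one-tailed value is always smaller than the two-tailed value, which is why a one-tailed test needs fewer subjects.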
Factors that affect sample size
1. Clinically significant difference /
Effect size / Minimum expected difference
2. Variability
3. Probability of type I error
4. Power of the study
5. One tailed or two tailed tests
Before approaching a statistician for sample size...
✓ What is the primary outcome measure?
✓ What is the study design and statistical procedure?
✓ What difference in outcome measure is clinically significant?
✓ What would be the level of significance or level of confidence?
✓ What is the power?
✓ What are the other constraints?
Adjustments to Sample Size for
non-response
Non-Response (and attrition):
n2 = final size, n1 = effective sample size
NR = Non-response (attrition) rate
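The symbols above are defined without the formula that links them. One widely used adjustment, assumed here rather than taken from the slide, is n2 = n1 / (1 − NR): recruit enough subjects that n1 remain after non-response. The function name and the 10% rate in the example are illustrative.

```python
import math

def adjust_for_non_response(n_effective, non_response_rate):
    """Assumed adjustment: n2 = n1 / (1 - NR), rounded up to whole subjects."""
    if not 0 <= non_response_rate < 1:
        raise ValueError("non_response_rate must be in [0, 1)")
    return math.ceil(n_effective / (1 - non_response_rate))

# Example: 251 subjects needed per group, 10% expected non-response
print(adjust_for_non_response(251, 0.10))  # 279
```

Dividing by (1 − NR) rather than multiplying by (1 + NR) ensures that the expected number of responders is at least n1.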
Rejected Sample Size Statements
“Sample sizes are not provided because there is no prior information on
which to base them“ or “this study is exploratory”….
• Find previously published information
• Conduct small pre-study
• If the study is a very preliminary study, sample size calculations
are not usually necessary.
"A previous study in this area recruited 150 subjects and found
highly significant results (p=0.014), and therefore a similar
sample size should be sufficient here."
• Previous studies may have been 'lucky' to find significant results,
due to sampling variation.
• You might not be that lucky.
“The clinic sees around 50 patients a year, of whom 10% may refuse to take
part in the study. Therefore over the 2 years of the study, the sample
size will be 90 patients. "
• Although most studies need to balance feasibility with study power,
the sample size should not be decided on the number of available
patients alone.