Dr.
Muhammad Akram Uzzaman
Professor & Chairman
Department of Psychology
Jagannath University
October 2023
akrambro@gmail.com
01718645841
What will we learn?
Nature of population and sampling
Types of sampling
Calculating sample size using formulas
Application to research
1. Formulating the research problem
2. Extensive literature review
3. Developing the hypothesis
4. Preparing the research design
5. Measurement and scale construction
6. Determining sample design and processing
7. Collecting data, data processing, analyzing and hypothesis testing
8. Generalization and interpretation
9. Writing the research report
A population (study population) is the total collection of all
population elements, each of which is a potential case.
Finite Population: A finite population contains a
countable/measurable number of sampling units.
For example: all registered patients in BSMMU in a given
year.
Infinite Population: An infinite population consists of an
endless number of sampling units.
For example: all patients who may apply for registration at
BSMMU in a given year.
(Depends on whether the sampling units are finite or infinite)
Sample: Any representative part of the population
is called a sample.
For example, you want to study how much the average doctor
in Bangladesh earns. Time and finances stop you from asking
every doctor in Bangladesh, so you choose to ask 1,000
randomly selected doctors.
These one thousand doctors are your sample.
Sampling: The statistical procedure of drawing a
small number of elements from a population and
drawing conclusions about the population from them.
Error/Sampling error
It is the difference between a population parameter and a
sample statistic used to estimate it. For example, the
difference between a population mean and a sample mean
is sampling error.
Sampling error occurs because a portion, and not the
entire population, is surveyed
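The definition of sampling error can be illustrated with a small simulation. This is only a sketch: the population of 10,000 incomes is simulated (hypothetical values, not real data), and the function name `sampling_error` is an illustrative choice.

```python
import random
import statistics

def sampling_error(population, n, seed=0):
    """Draw one simple random sample of size n and return
    (population mean, sample mean, sampling error)."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    pop_mean = statistics.fmean(population)
    sample_mean = statistics.fmean(sample)
    # Sampling error = sample statistic minus population parameter.
    return pop_mean, sample_mean, sample_mean - pop_mean

# Hypothetical population: incomes (arbitrary units) of 10,000 doctors.
_rng = random.Random(42)
population = [_rng.gauss(100, 20) for _ in range(10_000)]

pop_mean, sample_mean, error = sampling_error(population, 1_000)
print(f"population mean = {pop_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
print(f"sampling error  = {error:.2f}")
```

Surveying the whole population (n equal to the population size) makes the error vanish, which is the point of the last sentence above.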
Bias
Bias refers only to error that is systematic in
nature.
Principle 1: Error can occur when selecting a
sample from a population even if the population is
homogeneous. A small sample size is usually the
main reason for this.
Principle 2: The larger the sample size, the
greater the precision of predictions about the population.
Principle 3: Heterogeneity in the population
increases the error of predictions about the population
made from a sample.
Probability Sampling/Random Sampling/Chance
Sampling
Probability sampling (PS) is based on the concept of random selection:
a controlled procedure that assures that each population element has a
known, nonzero chance of selection (an equal chance in simple random
sampling).
Non-probability Sampling/ Deliberate Sampling/Judgement
Sampling
NPS is a non-random and subjective method of sampling where the
selection of the population elements comprising the sample depends
on the personal judgement or the discretion of the sampler.
Purposive/Judgement Sampling
Accidental Sampling
Quota Sampling
Self-selected Sampling
Network Sampling
Purposive sampling, also known as judgmental,
selective, or subjective sampling, is a form of non-
probability sampling in which researchers rely on
their own judgment, knowledge, wisdom, and
prudence when choosing members of the
population to participate in their surveys.
For example: selecting a sample of universities in
Bangladesh that represents a cross-section of
Bangladeshi universities.
Accidental sampling involves taking a population
sample that is close at hand, rather than carefully
determined and obtained. It is not as experimentally
sound as using random sampling and random
assignment
For example: when health supervisors distribute
vaccines on a first-come, first-served basis.
Quota sampling is a type of non-probability sampling
method. This means that elements from the population are
chosen on a non-random basis and all members of the
population do not have an equal chance of being selected to
be a part of the sample group
For example: BMRC requires 20% dentists, 20% heart
specialists, 30% general medicine doctors and the remainder from
other disciplines for training.
A sample is self-selected when the inclusion or exclusion of
sampling units is determined by whether the units
themselves agree or decline to participate in the sample,
either explicitly or implicitly. When survey units volunteer
to be included in the sample, this introduces self-selection bias.
For example, people may choose to take part in research of their own
accord, or survey researchers may put a questionnaire online
and invite anyone within a particular
organization to take part.
Network sampling is widely used when rare
populations are of interest in survey research. It is
considered an alternative to earlier ways of
studying rare populations for which a sampling frame
is almost impossible to obtain. In this technique,
existing subjects provide referrals to recruit the further
subjects required for a research study.
For example: studies of fake doctors, prostitution,
drug addiction, snatchers, etc.
Simple Random Sampling
Systematic Sampling
Stratified Sampling
Cluster Sampling
A simple random sample is a subset of a statistical
population in which each member of the population has an equal
probability/opportunity of being chosen. A simple random
sample is meant to be an unbiased representation of a group.
For example: the names of 25 medical representatives are
chosen from 250 employees. In this case, the population is
all 250 employees, and the sample is random because each
employee has an equal chance of being chosen.
Methods of Randomization
1. Lottery Method (With and Without Replacement)
2. Calculator
3. Software
4. Random Number Table
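As one possible "software" method, the lottery example above (25 representatives out of 250 employees) can be sketched with Python's standard `random` module. The employee labels and the function name `simple_random_sample` are hypothetical; only `random.Random.sample` and `random.Random.choices` are real library calls.

```python
import random

def simple_random_sample(units, n, replacement=False, seed=None):
    """Lottery-style draw of n units.
    Without replacement each unit can appear at most once;
    with replacement the same unit may be drawn again."""
    rng = random.Random(seed)
    if replacement:
        return rng.choices(units, k=n)
    return rng.sample(units, n)

# Hypothetical sampling frame of 250 employees.
employees = [f"employee_{i:03d}" for i in range(1, 251)]

chosen = simple_random_sample(employees, 25, seed=1)
print(chosen[:5])
```

Fixing `seed` makes the draw reproducible, which is useful when the selection has to be documented in a research report.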
Systematic sampling is a type of probability sampling
method in which sample members from a larger
population are selected according to a random starting
point but with a fixed, periodic interval.
This interval, called the sampling interval, is calculated
by dividing the population size by the desired sample
size.
For example: in a population of 10,000 people, a
statistician who wants a sample of 100 selects every 100th
person after a random starting point. The
interval can also be time-based, such as
drawing a new sample every 12 hours.
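The procedure described above (random start, then a fixed interval equal to population size divided by sample size) can be sketched as follows. The function name `systematic_sample` and the numbered population are illustrative assumptions.

```python
import random

def systematic_sample(units, n, seed=None):
    """Select n units using a random start and a fixed sampling interval."""
    k = len(units) // n          # sampling interval = population size / sample size
    if n <= 0 or k == 0:
        raise ValueError("sample size must be between 1 and the population size")
    rng = random.Random(seed)
    start = rng.randrange(k)     # random starting point within the first interval
    return [units[start + i * k] for i in range(n)]

# Hypothetical population of 10,000 people, identified by number.
population = list(range(1, 10_001))
sample = systematic_sample(population, 100, seed=5)
print(sample[:3])
```

With 10,000 people and a desired sample of 100, the interval is 100, so every 100th person after the random start is selected.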
Stratified sampling is a type of sampling method in which
the total population is divided into smaller groups or strata
to complete the sampling process. After dividing the
population into strata, the researcher randomly selects a
sample from each stratum, usually in proportion to its size.
A stratified sample is one that ensures that subgroups
(strata) of a given population are each adequately
represented within the whole sample of a
research study.
For example, one might divide a sample of adults into
subgroups by age, like 18–29, 30–39, 40–49, 50–59, and
60 and above.
Stratified sampling can be classified into three types
1. Proportionate Stratified Sampling (100 students: 75 male
and 25 female, that is 3:1, from Dhaka Medical final-year
students)
2. Disproportionate Stratified Sampling
3. Cross Stratification: considering other characteristics
(age, sex, education, etc.)
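Proportionate stratified sampling can be sketched in two steps: allocate the sample across strata in proportion to their sizes, then draw a simple random sample inside each stratum. The frame below (300 male and 100 female students) is a hypothetical illustration of the 3:1 example, and the function names are assumptions.

```python
import random

def proportionate_allocation(strata_sizes, n):
    """Split a total sample of n across strata in proportion to stratum size.
    Rounding can make the parts sum to slightly more or less than n."""
    total = sum(strata_sizes.values())
    return {name: round(n * size / total) for name, size in strata_sizes.items()}

def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample within every stratum."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = proportionate_allocation(sizes, n)
    return {name: rng.sample(units, allocation[name]) for name, units in strata.items()}

# Hypothetical frame: 300 male and 100 female final-year students (3:1).
strata = {
    "male": [f"M{i}" for i in range(300)],
    "female": [f"F{i}" for i in range(100)],
}
sample = stratified_sample(strata, 100, seed=7)
print({name: len(units) for name, units in sample.items()})
# prints {'male': 75, 'female': 25}
```

A disproportionate design would replace `proportionate_allocation` with allocations chosen by the researcher.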
Cluster sampling is defined as a sampling method
in which the researcher divides the population into
multiple clusters of people that show similar
(homogeneous) characteristics, with each cluster having
an equal chance of being part of the sample.
Example: Consider a scenario where BMA surveys
smartphone addiction across Bangladesh. BMA can
divide the entire country’s population into cities (clusters),
then select towns with the highest population and
filter for those using mobile devices.
$$ N = \frac{z^{2} p q}{d^{2}} $$
Here,
N = Sample size
z = Standard normal value for the chosen level of significance (1.96 for the 5% level and 2.58 for the 1% level, both two-tailed)
d = Allowable margin of error or precision (.05 for 5%, .01 for 1%, .10 for 10%)
p = Expected proportion in the population based on previous studies or pilot
studies. If unknown, use .50.
q = 1 - p
Suppose an epidemiologist wants to know the proportion of children who are
hypertensive in a population. Previously published research reported that
the proportion of hypertensive children may not be more than 15%.
If p is treated as unknown: z = 1.96, p = .50, q = 1 - .50 = .50, d = .05
Total sample will be = 384
But for p = .15 (15%) it will be 196
Guilford and Fruchter (1978)
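The two results above can be checked with a few lines of code. This is a minimal sketch of the proportion formula N = z²pq/d²; the function name `sample_size_proportion` is an illustrative choice, and the raw value is returned so the reader can decide how to round.

```python
def sample_size_proportion(p, d, z=1.96):
    """Sample size for estimating a proportion: N = z^2 * p * q / d^2."""
    q = 1 - p
    return z ** 2 * p * q / d ** 2

# p unknown, so .50 is used; 5% margin of error
print(round(sample_size_proportion(0.50, 0.05), 2))  # about 384.16
# p = .15 from previous studies
print(round(sample_size_proportion(0.15, 0.05), 2))  # about 195.92
```

The value p = .50 gives the largest possible sample size, which is why it is the safe default when no earlier estimate exists.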
$$ N = \frac{z^{2}\, SD^{2}}{d^{2}} $$
N = Sample size
z = Standard normal value for the chosen level of significance (1.96 for the 5% level and 2.58 for the 1% level, both two-tailed)
d = Allowable error or precision in the estimate of the mean
SD = Population standard deviation. The value can be taken from a
previous study or a pilot study; if that is not possible, use SD = 1.0.
Suppose the researcher is interested in knowing the average systolic blood pressure of
children in Dhaka city at a 5% type I error and a precision of 5 mmHg on either
side (more or less than the mean systolic BP), and SD is 25 mmHg. The formula
above should be used because blood pressure is a quantitative variable.
Here, z = 1.96, SD = 25 mmHg (millimetres of mercury, the standard unit of
measurement of pressure), d = 5
$$ N = \frac{(1.96)^{2} \times 25^{2}}{5^{2}} = 96.04 $$
Total sample will be = 96
Guilford and Fruchter (1978)
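The blood pressure example can be reproduced with a short sketch of N = z²SD²/d². The function name `sample_size_mean` is an assumption made for illustration.

```python
def sample_size_mean(sd, d, z=1.96):
    """Sample size for estimating a mean: N = z^2 * SD^2 / d^2."""
    return z ** 2 * sd ** 2 / d ** 2

# SD = 25 mmHg, precision d = 5 mmHg, 5% type I error (z = 1.96)
print(round(sample_size_mean(25, 5), 2))  # about 96.04
```

Halving the allowable error d to 2.5 mmHg would multiply the required sample by four, because d enters the formula squared.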
In case-control studies, cases (the group with the disease/condition
under consideration) are compared with controls (the group without the
disease/condition under consideration) regarding exposure to the
risk factor in question.
Example: Suppose a researcher wants to see the link between
childhood sexual abuse and psychiatric disorder in adulthood. He
will take a sample of adults with psychiatric disorders and
another sample of adults with no psychiatric
disorders. He will then look retrospectively at the history of
childhood sexual abuse in both groups. Exposure in both groups
will be compared and the odds ratio will be calculated. Here exposure
to childhood sexual abuse is a qualitative variable,
hence the following formula is used for this type of design.
$$ N = \frac{r+1}{r} \cdot \frac{p^{*}(1-p^{*})\,(Z_{\beta} + Z_{\alpha/2})^{2}}{(p_1 - p_2)^{2}} $$
r = Ratio of controls to cases; 1 for equal numbers of cases and controls
p* = Average proportion exposed = (proportion of cases exposed +
proportion of controls exposed)/2 = (0.35 + 0.20)/2 = 0.275
Zβ = Standard normal variate for power: 0.84 for 80% power and
1.28 for 90% power. The researcher has to select the power for the study.
Zα/2 = Standard normal variate for level of significance as mentioned
in the previous section = 1.96 (for 5% level of significance)
p1 – p2 = Effect size, or difference in proportions expected based on
previous studies. p1 is the proportion in cases (0.35 here) and p2 is the
proportion in controls (0.20 here).
With r = 1 and 80% power, N = 138.9 ≈ 139
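The result of 138.9 can be reproduced with a short sketch of the case-control formula for a qualitative exposure. The function and parameter names are illustrative assumptions; the numeric inputs are the ones listed above.

```python
def case_control_size_proportions(p1, p2, r=1.0, z_alpha=1.96, z_beta=0.84):
    """N = ((r + 1) / r) * p*(1 - p*) * (Z_beta + Z_alpha/2)^2 / (p1 - p2)^2,
    where p* is the average of the two exposure proportions."""
    p_star = (p1 + p2) / 2
    return ((r + 1) / r) * p_star * (1 - p_star) * (z_beta + z_alpha) ** 2 / (p1 - p2) ** 2

# 35% of cases and 20% of controls exposed, equal groups, 80% power, 5% level
print(round(case_control_size_proportions(0.35, 0.20), 1))  # about 138.9
```

Choosing 90% power instead (`z_beta=1.28`) gives a larger N, since the power term is squared in the numerator.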
Suppose a researcher wants to see the association between birth
weight and diabetes in adulthood. Because birth weight is
quantitative data, the researcher will select one group (cases) of
diabetic adults and another group (controls) of non‑diabetic
adults. Both groups will be traced back for data on childhood
weight. The formula for sample size calculation is
$$ N = \frac{r+1}{r} \cdot \frac{SD^{2}\,(Z_{\beta} + Z_{\alpha/2})^{2}}{d^{2}} $$
SD = Standard deviation; the researcher can take the value from previously published
studies
d = Expected mean difference between cases and controls (may be based on
previously published studies)
r, Zβ, Zα/2 are already explained in previous sections
If the researcher thinks that the difference in mean weight between cases and controls may be
around 250 g (d = 0.25 kg) and SD is 1 kg, then considering equal numbers of cases and
controls, 80% power and the 5% level of significance (Zα/2 = 1.96), the sample size will be 250.88 ≈ 251
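The birth weight example can be checked with a sketch of the quantitative case-control formula. The function name is an assumption; SD and d must be in the same unit (kilograms here).

```python
def case_control_size_means(sd, d, r=1.0, z_alpha=1.96, z_beta=0.84):
    """N = ((r + 1) / r) * SD^2 * (Z_beta + Z_alpha/2)^2 / d^2."""
    return ((r + 1) / r) * sd ** 2 * (z_beta + z_alpha) ** 2 / d ** 2

# SD = 1 kg, expected difference d = 0.25 kg, equal groups, 80% power, 5% level
print(round(case_control_size_means(1.0, 0.25), 2))  # about 250.88
```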
In cohort studies healthy subjects with or without exposure to
some risk factor are observed over a time period to see the event
rate in both groups.
Problem
If a researcher wants to see the impact of weight training
exercise on cardiovascular mortality, he will select two
groups, one consisting of subjects who exercise and
another consisting of those who do not. These groups will
be followed up for a specific time period. At the end of the
study period both groups will be compared for
cardiovascular mortality. The formula for sample size is
FORMULA
Suppose the researcher wants to see the impact of weight training
exercise on cardiovascular mortality and, according to previous studies, the
proportion of cardiovascular deaths in the exercise group may be around 20% and in
the non-exercise group around 40%. The sample size for a 5%
level of significance and 80% power, with equal numbers in the two
groups, will be = 157
Here,
p1 = .20
p2 = .40
p = (p1 + p2)/2 = .30
α = .05
Zα = 1.96 (standard normal variate for the 5% level of significance)
Zβ = .84 (standard normal variate for 80% power)
n1 or N = Sample size
A clinical trial is a randomized controlled trial
only when participants are randomly allocated to
the group receiving the treatment and a control
group.
When participants are randomly allocated among groups
receiving different treatments, the clinical trial is
simply called a randomized trial.
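$$ N = \frac{2\sigma^{2}\,(Z_{\alpha} + Z_{\beta})^{2}}{(\mu_1 - \mu_2)^{2}} = \frac{2 \times 45^{2} \times (1.96 + 1.28)^{2}}{(200 - 175)^{2}} = 68.02 $$
Rounded up, N = 69 per group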
Problem
Mean blood glucose following standard insulin therapy and intensive
insulin therapy for 5 years is expected to be 200 mg% and 175 mg%
respectively. If pooled SD is 45 mg %; calculate the sample size to test the
efficacy of intensive insulin therapy at 5 % level and 90 % power.
µ1 = Expected Control Group Mean (From Previous Study) = 200
µ2 = Expected Experimental Group Mean (From Previous Study) = 175
σ = SD of control group or pooled SD of two groups = 45
Zα = z-value of SND at a given level of significance (see Table) = 1.96
Zβ = z-value of SND at a given power (see Table) = 1.28
SND = Standard Normal Distribution
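The insulin therapy problem can be worked through with a short sketch of N = 2σ²(Zα + Zβ)²/(µ1 − µ2)². The function name is an illustrative assumption, and rounding up with `math.ceil` is one common convention for sample sizes.

```python
import math

def trial_size_two_means(mu1, mu2, sigma, z_alpha=1.96, z_beta=1.28):
    """Per-group sample size for comparing two means:
    N = 2 * sigma^2 * (Z_alpha + Z_beta)^2 / (mu1 - mu2)^2."""
    return 2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / (mu1 - mu2) ** 2

# Control mean 200, experimental mean 175, pooled SD 45, 5% level, 90% power
n = trial_size_two_means(200, 175, 45)
print(round(n, 2))   # about 68.02
print(math.ceil(n))  # 69 after rounding up
```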
Any questions?
Many thanks
The power of a test is the probability of rejecting the null
hypothesis when it is false; in other words, it is the probability
of avoiding a type II error. For example, a study that has
80% power has an 80% chance of producing a significant
result when a true effect of the assumed size exists. High
statistical power means that the study is likely to detect a real
effect. As the power increases, the
probability of making a Type II error decreases.
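The z-values used in the sample size formulas (1.96 for the 5% level, 0.84 for 80% power, 1.28 for 90% power) come from the standard normal distribution. A minimal sketch using Python's standard `statistics.NormalDist` shows where they come from; the two helper function names are assumptions.

```python
from statistics import NormalDist

def z_for_two_sided_alpha(alpha):
    """Standard normal value that leaves alpha/2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_for_power(power):
    """Standard normal value for a given power (1 - type II error rate)."""
    return NormalDist().inv_cdf(power)

print(round(z_for_two_sided_alpha(0.05), 2))  # 1.96
print(round(z_for_power(0.80), 2))            # 0.84
print(round(z_for_power(0.90), 2))            # 1.28
```

Because higher power means a larger Zβ, raising power from 80% to 90% increases the required sample size in every formula that contains (Zα + Zβ)².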
In Cluster Sampling, the sampling is done on a population of clusters therefore, cluster/group is
considered a sampling unit. In Stratified Sampling, elements within each stratum are sampled. In
Cluster Sampling, only selected clusters are sampled. In Stratified Sampling, from each stratum, a
random sample is selected.
The key distinction between cluster sampling and stratified sampling is that in cluster sampling, only a
sample of subpopulations (clusters) is chosen, whereas in stratified sampling, all the subpopulations
(strata) are selected for further sampling.
Zα = Standard normal variate for level of significance
m = Number of control subjects per experimental subject
Zβ = Standard normal variate for power or type II error as explained in earlier section
p1 = Probability of events in control group
p2 = Probability of events in experimental group
RCTs are the gold standard of clinical testing applied to new
medical interventions.
RCTs are generally required in pharmaceutical testing programs
before regulators will allow new drugs to be sold.
Randomization reduces selection bias, and the control group helps to
identify the true effect of the variables.
Placebos may be used for the control group (a dummy treatment
that looks the same as the experimental treatment but contains
no active ingredient).
It is not always ethical to give a placebo, such as when this would
mean denying treatment to people who have a life-threatening or
serious illness.
A standard treatment that is already established against the disease
can be used in the control group for comparison.
Single-stage cluster sampling
An example of single-stage cluster sampling – An NGO wants to create
a sample of girls across five neighboring towns to provide education.
Using single-stage sampling, the NGO randomly selects towns
(clusters) to form a sample and extend help to the girls deprived of
education in those towns.
Two-stage cluster sampling
Here, instead of selecting all the elements of a cluster, only a handful of
members are chosen from each group by implementing systematic or
simple random sampling.
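The two stages described above can be sketched as follows: first select whole clusters at random, then draw a simple random sample inside each selected cluster. The five towns and their members are hypothetical, as is the function name.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly select clusters.
    Stage 2: simple random sample of members within each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical frame: five towns (clusters) with 50 girls each.
towns = {f"town_{t}": [f"town_{t}_girl_{i}" for i in range(50)] for t in "ABCDE"}

sample = two_stage_cluster_sample(towns, n_clusters=2, n_per_cluster=10, seed=3)
for town, members in sample.items():
    print(town, len(members))
```

Single-stage cluster sampling corresponds to keeping every member of the selected clusters instead of sampling within them.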
Multiple stage cluster sampling:
Multiple-stage cluster sampling takes a step or a few steps
further than two-stage sampling.
An example of multiple-stage sampling by clusters: an
organization intends to conduct a survey to analyze the performance of
smartphones across Germany. They can divide the entire
country’s population into cities (clusters), select the cities
with the highest population, and then filter for those using mobile
devices.