Population and Sampling
What is a population?
• The population refers to the entire group of people, events or
things of interest that the researcher wishes to investigate.
• The population is the target group about which the researcher
wants to draw conclusions.
• It is a group of individuals who share common characteristics.
• The number of units in the population is the population size (N).
What is a sample?
• A sample is a subset of a population.
• Samples are used in research to draw inferences about the
population.
• Sampling is the process of selecting a small number of elements
from a larger defined target group (Population) of elements such
that the information gathered from the small group will allow
judgments to be made about the larger groups.
Reasons for Sampling
• Timeliness
• Economy
• The large size of many populations
• Inaccessibility of some of the population
What is a good sample?
• Accurate: free from bias
• Precise: the estimate has a small sampling error
• Sampling error is any type of bias that is attributable to mistakes in
either drawing a sample or determining the sample size.
Sampling Process
• Step 1: Define the population
• Step 2: Determine the sampling frame
• Step 3: Select the sampling technique
• Step 4: Determine the sample size
• Step 5: Execute the sampling process
Types of Sampling
• A sample is drawn from the population by either probability sampling or
non-probability sampling.
• Probability sampling methods: simple random sampling, systematic sampling,
stratified sampling, cluster sampling.
• Non-probability / purposive sampling methods: judgement sampling,
convenience sampling, snowball sampling, quota sampling.
Probability Sampling Designs
• A probability sample is one that gives every member of the
population a known chance of being selected.
• All elements are selected by a random process.
• It tends to give a true representation of the population.
Simple Random Sampling
• Simple random sampling is a method of probability sampling in
which every unit has an equal nonzero chance of being selected.
• Each element in the population has a known and equal probability
of selection.
• This implies that every element is selected independently of every
other element.
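The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the sampling frame of 100 numbered units and the function name `simple_random_sample` are assumptions made for the example, and the draw is done without replacement.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)          # seed only makes the draw repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical sampling frame: units numbered 1..100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

Each unit here has the same selection probability, n / N = 10 / 100.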
Systematic Sampling
• Systematic Random Sampling is a method of probability sampling in which the defined
target population is ordered and the sample is selected according to position using a
skip interval.
• The sample is chosen by selecting a random starting point and then picking every i-th
element in succession from the sampling frame.
• The sampling interval, i, is determined by dividing the population size N by the sample
size n and rounding to the nearest integer.
• For example, there are 100,000 elements in the population and a sample of 1,000 is
desired. In this case the sampling interval, i, is 100. A random number between 1 and
100 is selected. If, for example, this number is 23, the sample consists of elements 23,
123, 223, 323, 423, 523, and so on.
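The worked example above can be reproduced with a short Python sketch. The function name `systematic_sample` and the optional `start` argument are assumptions introduced for illustration, and the sketch assumes n is no larger than N so that the interval is at least 1.

```python
import random

def systematic_sample(frame, n, start=None, seed=None):
    """Pick every i-th element of the frame after a random start."""
    N = len(frame)
    i = round(N / n)  # sampling interval: N / n rounded to the nearest integer
    if start is None:
        start = random.Random(seed).randint(1, i)  # random start in 1..i, inclusive
    # Positions are 1-based: start, start + i, start + 2i, ...
    return [frame[pos - 1] for pos in range(start, N + 1, i)]

# Frame of 100,000 numbered elements, sample of 1,000, starting point 23.
frame = list(range(1, 100_001))
sample = systematic_sample(frame, 1000, start=23)
print(sample[:6])  # [23, 123, 223, 323, 423, 523]
```

Leaving `start` unset draws the starting point at random, as the method requires; it is fixed here only to match the example.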
Stratified Sampling
• Stratified Random Sampling is a method of probability sampling in which the
population is divided into different subgroups and samples are selected from
each.
• It is a two-step process:
• Step 1: divide the target population into homogeneous subgroups, or strata.
• Step 2: draw a random sample from each stratum, then combine these samples
into a single sample of the target population.
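The two steps can be sketched as follows. This is a minimal sketch under stated assumptions: the farmer records, the region labels, and the choice of an equal number of units per stratum are all hypothetical; proportional allocation is an equally valid design.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Partition units into strata, then draw a random sample from each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:                      # step 1: partition into strata
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():         # step 2: random draw per stratum
        k = min(n_per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample                           # combined into a single sample

# Hypothetical population: (farmer_id, region) pairs.
units = ([(f"N{k}", "north") for k in range(50)]
         + [(f"S{k}", "south") for k in range(30)]
         + [(f"E{k}", "east") for k in range(20)])
sample = stratified_sample(units, stratum_of=lambda u: u[1], n_per_stratum=5, seed=1)
print(sample)
```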
Cluster Sampling
• The target population is first divided into mutually exclusive and collectively exhaustive
subpopulations, or clusters.
• Then a random sample of clusters is selected, based on a probability sampling technique.
• There are two types: one-stage and two-stage cluster sampling.
• In one-stage cluster sampling, all the elements of each selected cluster are included in the sample.
• In two-stage cluster sampling, a sample of elements is drawn probabilistically from each selected cluster.
• Elements within a cluster should be as heterogeneous as possible, but clusters themselves
should be as homogeneous as possible. Ideally, each cluster should be a small-scale
representation of the population.
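One-stage and two-stage cluster sampling differ only in what happens inside each selected cluster, which the sketch below makes explicit. The village names, household labels, and the function name `cluster_sample` are hypothetical, and simple random sampling is assumed at both stages.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Select clusters at random; keep all members (one-stage) or a
    random subset of members (two-stage) from each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # random sample of clusters
    sample = []
    for name in chosen:
        members = clusters[name]
        if n_per_cluster is None:
            sample.extend(members)                      # one-stage: take everyone
        else:
            k = min(n_per_cluster, len(members))
            sample.extend(rng.sample(members, k))       # two-stage: sample within
    return chosen, sample

# Hypothetical clusters: 6 villages with 10 households each.
clusters = {f"village_{v}": [f"{v}-hh{k}" for k in range(10)] for v in "ABCDEF"}
chosen_1, one_stage = cluster_sample(clusters, n_clusters=2, seed=3)
chosen_2, two_stage = cluster_sample(clusters, n_clusters=2, n_per_cluster=4, seed=3)
print(chosen_1, len(one_stage), len(two_stage))
```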
Nonprobability/Purposive Sampling
• Non-probability sampling is a sampling technique where some
members of the population have no chance of being selected or
where the selection is based on subjective criteria rather than
randomness.
• The sample is not selected randomly.
• The sample may not be a true representation of the population.
• Used widely in exploratory research.
• Results are prone to bias.
• Quick and easy process.
Convenience Sampling
• Convenience sampling attempts to
obtain a sample of convenient
elements.
• Often, respondents are selected
because they happen to be in the right
place at the right time.
• Samples are selected based on
availability.
Judgmental/Purposive Sampling
• A non-probability sampling method where researchers select participants based
on their judgment or knowledge.
• Judgmental sampling is used to identify people who are likely to have the most
relevant information for a research project.
• Researchers use their judgment to select people with relevant experience,
expertise, or traits.
• Allows researchers to approach their target market directly.
Quota Sampling
• Quota sampling may be viewed as two-stage restricted judgmental sampling.
• The first stage consists of developing control categories, or quotas, of
population elements.
• In the second stage, sample elements are selected based on convenience or
judgment.
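The two stages can be illustrated with a small sketch. The quota categories, the respondent stream, and the function name `quota_sample` are hypothetical; the second stage is modeled as convenience selection, meaning respondents are taken in the order they happen to arrive until each quota is full.

```python
def quota_sample(respondents, category_of, quotas):
    """Fill each quota with the first available respondents of that category."""
    remaining = dict(quotas)            # stage 1: quotas per control category
    sample = []
    for person in respondents:          # stage 2: take whoever is available
        category = category_of(person)
        if remaining.get(category, 0) > 0:
            sample.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):
            break                       # every quota has been filled
    return sample

# Hypothetical stream of (name, farm_size) respondents in order of arrival.
respondents = [("a", "small"), ("b", "small"), ("c", "large"), ("d", "small"),
               ("e", "large"), ("f", "large"), ("g", "small")]
sample = quota_sample(respondents, lambda r: r[1], {"small": 2, "large": 2})
print(sample)  # [('a', 'small'), ('b', 'small'), ('c', 'large'), ('e', 'large')]
```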
Snowball Sampling
• In Snowball Sampling, an initial group of respondents is selected,
usually at random.
• These initial participants then refer others within their networks, who
are recruited in turn, creating a snowball effect.
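The referral process can be sketched as a breadth-first walk over a referral network. The network below, the participant labels, and the function name `snowball_sample` are hypothetical and serve only to show how the sample grows wave by wave.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Recruit the initial respondents, then the people they refer, and so on."""
    sample, seen = [], set()
    queue = deque(seeds)                            # initial group of respondents
    while queue and len(sample) < max_size:
        person = queue.popleft()
        if person in seen:
            continue                                # already recruited
        seen.add(person)
        sample.append(person)
        queue.extend(referrals.get(person, []))     # their referrals join the queue
    return sample

# Hypothetical referral network: each participant names the people they refer.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["F"]}
sample = snowball_sample(["A"], referrals, max_size=5)
print(sample)  # ['A', 'B', 'C', 'D', 'E']
```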
Probability Sampling vs Non-probability Sampling
Probability Sampling | Non-Probability Sampling
Inferences about the entire population can be made. | Inferences about the entire population cannot be made.
Elements are selected randomly. | Elements are selected non-randomly.
Inferences can be generalized. | Inferences cannot be generalized.
Expensive and time-consuming. | Less expensive and more convenient.
Lower chance of bias and sampling errors. | Higher chance of bias and sampling errors.
Sampling Error
• A sampling error is the difference between a population parameter
and the corresponding sample statistic.
• Random sampling error decreases as the sample size increases;
selection bias does not.
• Sampling errors and biases are induced by the sample design.
• They include:
• Selection bias
• Random sampling error
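A small simulation can illustrate how random sampling error shrinks as the sample size grows. Everything here is an assumption made for illustration: the population of 10,000 plant heights is generated artificially, and the error is measured as the average absolute gap between the sample mean and the population mean over repeated simple random samples.

```python
import random
import statistics

def mean_abs_sampling_error(population, n, repeats=200, seed=0):
    """Average |sample mean - population mean| over repeated random samples."""
    rng = random.Random(seed)
    mu = statistics.mean(population)                 # population parameter
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, n)
        errors.append(abs(statistics.mean(sample) - mu))  # statistic vs parameter
    return statistics.mean(errors)

# Hypothetical population: 10,000 plant heights in cm, generated for illustration.
_rng = random.Random(1)
population = [_rng.gauss(180, 20) for _ in range(10_000)]

error_small_n = mean_abs_sampling_error(population, n=10)
error_large_n = mean_abs_sampling_error(population, n=1000)
print(error_small_n, error_large_n)
```

The error for n = 1000 comes out well below the error for n = 10. The same simulation would not remove a selection bias, because drawing more units from a biased frame repeats the same bias.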
Non-sampling Error
• Errors not related to the act of selecting a sample from the
population.
• Non-sampling errors are:
• Over-coverage
• Under-coverage
• Measurement error
• Processing error
• Non-response and participation bias
Size of Sample
• The number of individuals in the sample is called the sample
size (n).
• A sampling frame is a list of all the units in the population from
which a sample will be selected.
Some important terms
• Confidence Level (Confidence Score): it expresses how certain we
are that a range built from the sample contains the true population
value. It is expressed as a percentage (e.g., 95%, 99%) and is
associated with a Z-score in statistics.
• If a survey reports, with 95% confidence, that the average height of
maize plants lies in a range around 180 cm, it means that if we
repeated the survey many times, about 95% of the calculated ranges
would contain the true average height.
• A Z-score (or standard score) measures how many standard
deviations a data point is from the mean of a dataset. It helps in
comparing individual values to a normal distribution.
$$ Z = \frac{X - \mu}{\sigma} $$
• Z = Z-score
• X = Individual data value
• μ = Mean of the dataset
• σ = Standard deviation of the dataset
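The formula translates directly into code. In this minimal sketch the function name `z_score` and the example values (mean 180, standard deviation 10) are assumptions chosen for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical dataset with mean 180 and standard deviation 10.
print(z_score(200, 180, 10))  # 2.0  (two standard deviations above the mean)
print(z_score(170, 180, 10))  # -1.0 (one standard deviation below the mean)
print(z_score(180, 180, 10))  # 0.0  (exactly at the mean)
```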
Z-score
Z-Score | Meaning
0 | The data point is exactly at the mean.
+1 | The data point is 1 standard deviation above the mean.
-1 | The data point is 1 standard deviation below the mean.
+2 | The data point is 2 standard deviations above the mean.
-2 | The data point is 2 standard deviations below the mean.
+3 or more | The data point is very far above the mean (unusual value).
-3 or less | The data point is very far below the mean (unusual value).
Margin of Error
• The Margin of Error (E) is the maximum expected difference between
the sample estimate and the true population value.
• It helps define the range (confidence interval) in which the true value
likely falls.
$$ E = Z \times \frac{\sigma}{\sqrt{n}} $$
• Z = Z-score based on confidence level
• σ = Population standard deviation
• n = Sample size
• If a survey estimates that 50% of farmers use AI for crop monitoring,
with a margin of error of ±5%, the true percentage of AI adoption is
expected to be between 45% and 55%.
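For a sample mean, the margin of error and the resulting confidence interval can be computed as below, using the standard form E = Z × σ / √n. The numbers are hypothetical: a sample of n = 100 maize plants with mean height 180 cm, a population standard deviation of 20 cm, and Z = 1.96 for a 95% confidence level.

```python
import math

def margin_of_error(z, sigma, n):
    """Margin of error E = Z * sigma / sqrt(n) for a sample mean."""
    return z * sigma / math.sqrt(n)

def confidence_interval(estimate, z, sigma, n):
    """Range estimate - E .. estimate + E around the sample estimate."""
    e = margin_of_error(z, sigma, n)
    return estimate - e, estimate + e

# Hypothetical values: mean 180 cm, sigma 20 cm, n 100, 95% confidence.
e = margin_of_error(1.96, 20, 100)
low, high = confidence_interval(180, 1.96, 20, 100)
print(round(e, 2), round(low, 2), round(high, 2))  # 3.92 176.08 183.92
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.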
Calculation of Sample Size
• To calculate the sample size, the following parameters are needed:
• Z-score (Z)
• Standard deviation (p), entered in the formula as the expected population proportion
• Margin of error (E), e.g., 0.05 or 0.01
• Confidence level, e.g., 95% or 99%
$$ n = \frac{Z^2 \times p \times (1 - p)}{E^2} $$
• Common Z-scores for Confidence Levels:
• 90% confidence level → 𝑍=1.645
• 95% confidence level → 𝑍=1.96
• 99% confidence level → 𝑍=2.576
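• Standard Deviation (p): 0.5 (default if unknown). Here p is the expected population proportion, and p = 0.5 gives the largest, most conservative sample size.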
• Margin of error (E) (0.05, 0.01)
• Suppose Z=1.96, p=0.5, E =0.05
Sample size n = ?
$$ n = \frac{(1.96)^2 \times 0.5 \times (1 - 0.5)}{(0.05)^2} = \frac{3.8416 \times 0.25}{0.0025} = 384.16 \approx 384 $$
Some more examples
• 95% Confidence Level, Margin of Error = 0.01
• 99% Confidence Level, Margin of Error = 0.05
• 99% Confidence Level, Margin of Error = 0.01
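These cases can be worked through with a short Python sketch of the sample size formula n = Z² × p × (1 − p) / E². The function name `sample_size` and the lookup table `Z_SCORES` are assumptions introduced for illustration; p = 0.5 is used throughout, and the result is rounded to the nearest integer as in the worked example (many texts round up instead, which would give 385 for the first case).

```python
def sample_size(z, p, e):
    """Sample size n = Z^2 * p * (1 - p) / E^2, rounded to the nearest integer."""
    return round(z ** 2 * p * (1 - p) / e ** 2)

# Common Z-scores by confidence level (in percent).
Z_SCORES = {90: 1.645, 95: 1.96, 99: 2.576}

for level, e in [(95, 0.05), (95, 0.01), (99, 0.05), (99, 0.01)]:
    n = sample_size(Z_SCORES[level], 0.5, e)
    print(f"{level}% confidence, E = {e}: n = {n}")
```

Tightening the margin of error from 0.05 to 0.01 multiplies the required sample size by 25, because E enters the formula squared.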