Sampling
Steps in determining sampling
Identify unit of analysis
Decide at what level the data need to be gathered
May be single-level or multilevel
Specify population and sample
Representativeness
Population: group of individuals who have the same
characteristics
Target population: group of individuals with some
common defining characteristics
Sample: subgroup of the target population
Sampling strategies
Probability sampling: simple random sampling, stratified
sampling, multistage cluster sampling
Nonprobability sampling: convenience sampling, snowball
sampling
Simple random sampling
Each element in the population has an
equal probability of selection AND each
combination of elements has an equal
probability of selection
Names drawn out of a hat
Random numbers to select elements from
an ordered list
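A minimal sketch of drawing random numbers against an ordered list, using Python's standard library. The frame of 1,000 ID numbers, the sample size of 50, and the function name are illustrative assumptions, not part of the slides.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements without replacement.

    random.sample gives every element the same chance of selection and
    every combination of n elements the same chance as well.
    """
    rng = random.Random(seed)
    return rng.sample(list(frame), n)

# Hypothetical sampling frame: an ordered list of 1,000 ID numbers.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 50, seed=42)
```

Fixing the seed only makes the draw reproducible; leave it as None for an actual selection.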
Stratified sampling
Divide population into groups that differ in
important ways
Basis for grouping must be known before
sampling
Select random sample from within each
group
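The three steps above can be sketched as follows. This version assumes equal allocation (the same number drawn from every stratum), which is only one possible choice; proportional allocation is a common alternative. The frame, the age groups, and the function name are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Group the frame into strata, then draw a simple random sample in each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        # The grouping variable must already be known for every unit.
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name in sorted(strata):
        members = strata[name]
        sample[name] = rng.sample(members, min(n_per_stratum, len(members)))
    return sample

# Hypothetical frame of (person_id, age_group) records.
frame = [(i, "under 30" if i % 3 == 0 else "30 and over") for i in range(300)]
picked = stratified_sample(frame, stratum_of=lambda u: u[1], n_per_stratum=10, seed=1)
```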
Random Cluster Sampling
Done correctly, this is a form of random
sampling
Population is divided into groups, usually
geographic or organizational
Some of the groups are randomly chosen
In pure cluster sampling, whole cluster is
sampled.
In simple multistage cluster, there is random
sampling within each randomly chosen cluster
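A small sketch of the difference between pure cluster sampling and simple multistage cluster sampling. The schools, their sizes, and the parameter names are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, n_within=None, seed=None):
    """clusters maps a cluster name to the list of its members.

    n_within=None -> pure cluster sampling: every member of each chosen
                     cluster is taken.
    n_within=k    -> simple multistage sampling: a simple random sample of
                     k members is drawn inside each chosen cluster.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: pick clusters
    result = {}
    for name in chosen:
        members = clusters[name]
        if n_within is None:
            result[name] = list(members)
        else:
            result[name] = rng.sample(members, min(n_within, len(members)))
    return result

# Hypothetical population: 20 schools with 40 students each.
schools = {f"school_{s}": [f"school_{s}_student_{i}" for i in range(40)]
           for s in range(20)}
pure = cluster_sample(schools, n_clusters=4, seed=7)
two_stage = cluster_sample(schools, n_clusters=4, n_within=10, seed=7)
```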
Random Cluster Sampling
Population is divided into groups
Some of the groups are randomly selected
For given sample size, a cluster sample has
more error than a simple random sample
Cost savings of clustering may permit larger
sample
Error is smaller if the clusters are similar to
each other
Cluster sampling has very high error if the
clusters are different from each other
Cluster sampling is NOT desirable if the
clusters are different
It IS random sampling: you randomly
choose the clusters
But you will tend to omit some kinds of
subjects
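One common way to put numbers on the extra error is Kish's design effect, deff = 1 + (m - 1) * rho. This formula is not from the slides; it is a standard approximation that assumes equal-sized clusters of m sampled members and an intraclass correlation rho, which measures how alike members of the same cluster are. When clusters are similar to each other, rho is near zero; when clusters differ sharply, rho is large. The figures below are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Kish's approximation: variance inflation relative to simple random sampling."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    """Size of the simple random sample that would give the same precision."""
    return n / design_effect(cluster_size, icc)

# Hypothetical survey: 1,000 respondents, 20 sampled per cluster.
similar_clusters = design_effect(20, 0.0)       # exactly 1: no penalty
mild_clustering = design_effect(20, 0.05)       # about 1.95
n_eff = effective_sample_size(1000, 20, 0.05)   # about 513
```

Under these assumed values, the clustered sample of 1,000 carries roughly the information of a simple random sample of about 513, which is the trade-off the cost savings have to cover.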
Stratified Cluster Sampling
Reduce the error in cluster sampling by
creating strata of clusters
Sample one cluster from each stratum
The cost-savings of clustering with the
error reduction of stratification
Stratification vs. Clustering
Stratification
Divide population into
groups different from
each other: sexes, races,
ages
Sample randomly from
each group
Less error compared to
simple random
More expensive to obtain
stratification information
before sampling
Clustering
Divide population into
comparable groups:
schools, cities
Randomly sample some
of the groups
More error compared to
simple random
Reduces costs to sample
only some areas or
organizations
Stratified Cluster Sampling
Combines elements of stratification and
clustering
First you define the clusters
Then you group the clusters into strata of clusters,
putting similar clusters together in a stratum
Then you randomly pick one (or more) cluster from
each of the strata of clusters
Then you sample the subjects within the sampled
clusters (either all the subjects, or a simple random
sample of them)
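The four steps above might look like this in code. The schools, the use of school size as the stratifying variable, and all names are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

def stratified_cluster_sample(clusters, stratum_of, per_stratum=1,
                              n_within=None, seed=None):
    """Group clusters into strata, pick clusters per stratum, then sample members."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for name in sorted(clusters):                 # step 1: clusters are defined
        strata[stratum_of(name)].append(name)     # step 2: similar clusters together
    sample = {}
    for stratum in sorted(strata):
        names = strata[stratum]
        # step 3: randomly pick one (or more) cluster from each stratum
        for name in rng.sample(names, min(per_stratum, len(names))):
            members = clusters[name]
            # step 4: take everyone, or a simple random sample within the cluster
            if n_within is None:
                sample[name] = list(members)
            else:
                sample[name] = rng.sample(members, min(n_within, len(members)))
    return sample

# Hypothetical clusters: 12 schools with 20, 30, or 40 pupils.
schools = {f"school_{i}": [f"school_{i}_pupil_{j}" for j in range(20 + 10 * (i % 3))]
           for i in range(12)}

def size_class(name):
    """Stratify schools by enrollment (assumed to be known in advance)."""
    return len(schools[name])

picked = stratified_cluster_sample(schools, size_class, per_stratum=1,
                                   n_within=5, seed=3)
```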
Multi-stage Probability Samples
Large national probability samples involve
several stages of stratified cluster sampling
The whole country is divided into geographic
clusters, metropolitan and rural
Some large metropolitan areas are selected with
certainty (certainty is a non-zero probability!)
Other areas are formed into strata of areas (e.g.
middle-sized cities, rural counties); clusters are
selected randomly from these strata
Within each sampled area, the clusters are
defined, and the process is repeated, perhaps
several times, until blocks or telephone
exchanges are selected
At the last step, households and individuals
within household are randomly selected
Surveys based on random samples make multiple
call-backs to people who are not at home.
The Problem of Non-Response
You can randomly pick elements from sampling
frame and use them to randomly select people
But you cannot make people respond
Non-response destroys the generalizability of
the sample. You are generalizing to people who
are willing to respond to surveys
If response is 90% or so, not so bad. But if it is
50%, this is a serious problem
Multiple call-backs are essential for trying to reduce
non-response bias
Samples without call-backs have high bias: cannot really
be considered random samples
Response rates have been falling
It is very difficult to get above a 60% response rate
You do the best you can, and try to estimate the effect
of the error by getting as much information as possible
about the predictors of non-response.
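To see why a 50% response rate is so much worse than 90%, it helps to write the population mean as a weighted average of respondents and non-respondents. The bias of the respondent mean is then (1 - response rate) * (respondent mean - non-respondent mean). The sketch below applies that identity; the 60% and 40% figures are invented, and in practice the non-respondent mean is unknown and has to be estimated from predictors of non-response.

```python
def nonresponse_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Bias of the respondent mean as an estimate of the full-sample mean."""
    return (1 - response_rate) * (mean_respondents - mean_nonrespondents)

# Hypothetical: 60% of respondents agree with a statement, but only 40% of
# non-respondents would have agreed.
bias_at_90 = nonresponse_bias(0.90, 0.60, 0.40)  # about 0.02
bias_at_50 = nonresponse_bias(0.50, 0.60, 0.40)  # about 0.10
```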
Non-probability Samples
Convenience
Purposive
Quota
Convenience Sample
Subjects selected because it is easy to
access them.
No reason tied to purposes of research.
Students in your class, people on State Street,
friends
Purposive Samples
Subjects selected for a good reason tied to
purposes of research
Small samples (fewer than 30), not large enough to
benefit from the power of probability sampling.
Nature of research requires small sample
Choose subjects with appropriate variability in what
you are studying
Hard-to-get populations that cannot be found
through screening general population
Quota Sampling
Pre-plan number of subjects in specified
categories (e.g. 100 men, 100 women)
In uncontrolled quota sampling, the subjects
chosen for those categories are a convenience
sample, selected any way the interviewer
chooses
In controlled quota sampling, restrictions are
imposed to limit the interviewer's choice
No call-backs or other features to eliminate
convenience factors in sample selection
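A minimal sketch of uncontrolled quota sampling, assuming the interviewer simply takes people in the order they happen to be encountered. The arrival list, the categories, and the quotas are hypothetical.

```python
def quota_sample(arrivals, category_of, quotas):
    """Take the first available people who fit a category with quota remaining."""
    remaining = dict(quotas)
    sample = []
    for person in arrivals:
        category = category_of(person)
        if remaining.get(category, 0) > 0:
            sample.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # every quota is filled
    return sample

# Hypothetical order in which people pass the interviewer.
arrivals = [("p1", "man"), ("p2", "man"), ("p3", "woman"),
            ("p4", "man"), ("p5", "woman"), ("p6", "woman")]
picked = quota_sample(arrivals, lambda p: p[1], {"man": 2, "woman": 2})
# picked holds p1, p2, p3 and p5: nothing here is random
```

The result depends entirely on who turns up first, which is why the sample within each category is a convenience sample.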
Quota vs. Stratified Sampling
In Stratified Sampling,
selection of subject is
random.
Call-backs are used to get
that particular subject.
Stratified sampling
without call-backs may
not, in practice, be much
different from quota
sampling.
In Quota Sampling,
interviewer selects first
available subject who
meets criteria: is a
convenience sample.
Highly controlled quota
sampling uses probability
sampling down to the last
block or telephone
exchange
Sample Size
Heterogeneity: need larger sample to study
more diverse population
Desired precision: need larger sample to get
smaller error
Sampling design: smaller if stratified, larger if
cluster
Nature of analysis: complex multivariate
statistics need larger samples
Accuracy of sample depends upon sample size,
not ratio of sample to population (as long as the
population is much larger than the sample)
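The last point can be checked with the usual margin-of-error formula for a proportion under simple random sampling, with the finite population correction included. The 95% z-value of 1.96, the worst-case p = 0.5, and the population sizes are assumptions chosen for illustration.

```python
import math

def margin_of_error(n, population_size=None, p=0.5, z=1.96):
    """Margin of error for a proportion from a simple random sample of size n."""
    se = math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        # Finite population correction: matters only when n is a large
        # share of the population.
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return z * se

town = margin_of_error(1000, population_size=100_000)          # about 0.031
country = margin_of_error(1000, population_size=100_000_000)   # about 0.031
quadrupled = margin_of_error(4000)   # half the error of n = 1000
```

A sample of 1,000 gives nearly the same precision for a town of 100,000 as for a country of 100 million, whereas halving the error requires four times the sample.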
Sampling in Practice
Often a non-random selection of basic sampling
frame (city, organization etc.)
Fit between sampling frame and research goals
must be evaluated
Sampling frame as a concept is relevant to all
kinds of research (including nonprobability)
Nonprobability sampling means you cannot
generalize beyond the sample
Probability sampling means you can generalize
to the population defined by the sampling frame