CHAPTER 2
DATA COLLECTION AND SAMPLING
Points to highlight
Methods of collecting data
Observation
Experiment
Survey
Why sampling
Sampling methods
Simple Random Sampling
Stratified Random Sampling
Cluster Sampling
Systematic Sampling
Sampling and Non-sampling error
Primary (Data Collection): Observation, Survey, Experimentation
Secondary (Data Compilation): Print or Electronic
I. Methods of collecting data
1. Observation
The investigator observes characteristics of a subset of
members of one or more existing populations.
Goal: draw conclusions about the corresponding
population or about the difference between two or more
populations.
Advantage vs Disadvantage
o Advantage: easy to conduct, relatively inexpensive
o Disadvantage: provides little useful information;
impossible to draw cause-and-effect conclusions due to
confounding variables
Observation
Example
A researcher for a pharmaceutical company
wants to determine whether aspirin
reduces the incidence of heart attacks. He
selects a sample of men and women and asks
each whether he or she has taken aspirin
regularly over the past 2 years. Each person
would be asked whether he or she had
suffered a heart attack over the same period.
The proportions reporting heart attacks would
be compared and a conclusion can be drawn
whether aspirin is effective in reducing the
likelihood of heart attacks.
2. Experiment
The investigator observes how a response variable
behaves when the researcher manipulates one or more
explanatory variables (factors).
Goal: determine the effect of the manipulated factors
on the response variable
Advantage vs Disadvantage
o Advantage: provides useful data, particularly for
cause-and-effect relationships
o Disadvantage: relatively expensive and time-consuming
Experiment
Example
A researcher for a pharmaceutical company
wants to determine whether aspirin does
reduce the incidence of heart attacks. He
selects a sample of men and women. The
sample would be divided into two groups: one
group would take aspirin regularly and the
other would not. After 2 years, the researcher
would determine the proportion of people in
each group who had suffered a heart attack.
Then, it is possible to draw a conclusion about
whether aspirin is effective in reducing the
likelihood of heart attacks.
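The experiment above can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not say how the two groups are formed, so random assignment is assumed here, and the participant IDs and outcome counts are made up for the example (they are not data from any study). The names `randomly_assign` and `proportion` are hypothetical helpers.

```python
import random

def randomly_assign(participants, seed=0):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def proportion(outcomes):
    """Share of True values (here: 'suffered a heart attack') in a group."""
    return sum(outcomes) / len(outcomes)

# Hypothetical participant IDs (assumption, not real data).
participants = range(1, 201)
aspirin_group, control_group = randomly_assign(participants)

# Made-up outcomes recorded after the 2-year follow-up (assumption).
aspirin_outcomes = [True] * 4 + [False] * 96
control_outcomes = [True] * 9 + [False] * 91

# Compare the proportion of heart attacks in the two groups.
difference = proportion(control_outcomes) - proportion(aspirin_outcomes)
print(f"Difference in heart-attack proportions: {difference:.2f}")
```

Because the researcher decides who takes aspirin, a difference between the groups is less likely to be explained by a confounding variable than in the observational version of the study.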
3. Survey
One of the most familiar methods of collecting data
Goal: solicit information from people concerning such
things as income, family size, opinions on various issues…
The majority of surveys are conducted for private use
Examples:
o Market researchers conduct a survey to determine the
preferences and attitudes of consumers, which helps
target a new product;
o A company surveys customers’ satisfaction with its
products and services.
SURVEY
PERSONAL INTERVIEW
- High rate of response, fewer incorrect answers
- Costly: people, money, time…
TELEPHONE INTERVIEW
- Less expensive
- Less personal, lower response rate
MAIL SURVEY
- Inexpensive
- Low response rate, high number of incorrect answers
Survey Design Steps
Define the issue
- What are the purpose and objectives of the survey?
- Identify the questions to answer
Decide what to measure and how to measure it
- Decide what information is needed to answer the questions
- Think about how you intend to tabulate and analyze
the responses
Define the population of interest
Design questionnaire
The questionnaire should be kept as short as possible
The questions should be short, simple, clear, and
unambiguous
Begin with simple demographic questions
Use both close-ended questions (for example, dichotomous
questions) and open-ended questions
Avoid using leading questions
Pre-test the survey
- Pilot test with a small group of participants
- Assess clarity and length
Determine the sample size and sampling method
Select the sample and administer the survey
Types of Questions
Close-ended Questions
* Select from a short list of defined choices
Example: Major: __business __liberal arts
__science __other
Open-ended Questions
* Respondents are free to respond with any value, words, or
statement
Example: What did you like best about this course?
Demographic Questions
* Questions about the respondents’ personal characteristics
Example: Gender: __Female __ Male
II. SAMPLING METHODS
1/ Why Sampling
- Less time consuming than a census
- Less costly to administer than a census
- It is possible to obtain statistical results of a sufficiently
high precision based on samples.
- Sometimes, it’s impossible to identify the whole
population
POPULATION VS SAMPLE
- Population: All likely voters in the next election
  Sample: 1000 voters selected at random for interview
- Population: All parts produced today
  Sample: A few parts selected for destructive testing
- Population: All sales receipts of a year
  Sample: Every 100th receipt selected for audit
2/ Methods of Sampling
Probability Samples: Simple Random, Stratified Random, Systematic, Cluster
Simple Random Samples
Every individual or item from the population
has an equal chance of being selected
Selection may be with replacement or
without replacement
Samples can be obtained from a table of
random numbers or computer random number
generators
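A minimal sketch of simple random sampling with a computer random number generator, using Python's standard `random` module. The population of 100 numbered items and the sample size of 10 are assumptions chosen only for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical frame: 100 items numbered 1..100 (assumption).
population = list(range(1, 101))

# Without replacement: an item can be selected at most once.
without_replacement = rng.sample(population, k=10)

# With replacement: the same item may be selected more than once.
with_replacement = rng.choices(population, k=10)

print("Without replacement:", without_replacement)
print("With replacement:   ", with_replacement)
```

In both cases every item in the frame has the same chance of being picked on each draw; the only difference is whether a selected item is returned to the frame before the next draw.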
Stratified Random Samples
Population divided into subgroups (called
strata) according to some common characteristic
Simple random sample selected from each
subgroup
Samples from subgroups are combined into one
[Figure: population divided into 4 strata, with a sample drawn from each stratum]
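The stratified procedure can be sketched as follows. The helper `stratified_sample`, the 40-unit population, the four strata labels, and the choice of 3 units per stratum are all assumptions made for the example; the slides do not prescribe how many units to take from each stratum.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, per_stratum, seed=0):
    """Draw a simple random sample of fixed size from every stratum."""
    rng = random.Random(seed)
    # Step 1: divide the population into strata.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Step 2: simple random sample within each stratum,
    # Step 3: combine the per-stratum samples into one.
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample

# Hypothetical population: 40 units, each tagged with one of 4 strata.
population = [(i, "ABCD"[i % 4]) for i in range(40)]
sample = stratified_sample(population, stratum_of=lambda u: u[1], per_stratum=3)
print(sample)
```

Taking the same number from each stratum is one possible design; allocating in proportion to stratum size is another common choice.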
Systematic Samples
Decide on sample size: n
Divide the frame of N individuals into n groups of k
individuals each: k=N/n
Randomly select one individual from the 1st
group
Select every kth individual thereafter
[Figure: N = 64, n = 8, k = 8; one individual is selected at random from the first group]
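A short sketch of the systematic procedure, using the slide's numbers N = 64 and n = 8 (so k = 8). The function name `systematic_sample` and the frame of items numbered 1 to 64 are assumptions for illustration, and the sketch assumes N is a multiple of n.

```python
import random

def systematic_sample(frame, n, seed=0):
    """Pick a random start in the first group, then every k-th item."""
    rng = random.Random(seed)
    k = len(frame) // n          # group size, k = N / n
    start = rng.randrange(k)     # random position within the first group
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 65))       # hypothetical frame, N = 64
sample = systematic_sample(frame, n=8)
print(sample)
```

Only the starting point is random; once it is chosen, the rest of the sample is fully determined by the interval k.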
Cluster Samples
*Population is divided into several “clusters,”
each representative of the population
*A simple random sample of clusters is selected
* All items in the selected clusters can be used, or items can be
chosen from a cluster using another probability sampling
technique
[Figure: population divided into 16 clusters; randomly selected clusters form the sample]
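Cluster sampling can be sketched in the same spirit. The 16 clusters follow the slide's figure; the cluster contents, the number of clusters selected (4), and the helper name `cluster_sample` are assumptions. This version uses all items in the selected clusters.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Select whole clusters at random and keep every item inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    items = [item for c in chosen for item in clusters[c]]
    return chosen, items

# Hypothetical population: 16 clusters of 5 items each (assumption).
clusters = {c: [f"cluster{c}-item{i}" for i in range(5)] for c in range(1, 17)}
chosen, sample = cluster_sample(clusters, n_clusters=4)
print("Selected clusters:", chosen)
print("Sample size:", len(sample))
```

Note the contrast with stratified sampling: there, every stratum contributes to the sample; here, only the randomly selected clusters do.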
CONVENIENCE SAMPLING
WHAT IS IT?
- Use an easily available/convenient group to form a sample
- Examples: voluntary response sampling, self-selected
sampling…
III. SAMPLING AND NON-SAMPLING ERROR
1/ Sampling Error
- An error expected to occur when making a statement
about the population that is based on the observations
contained in a sample taken from the population.
- The sampling error is the difference/deviation between
the true (unknown) value of a population parameter (mean,
standard deviation…) and its estimate, the sample
statistic.
- Sampling error may be large when an unrepresentative
sample is selected.
- The only way to reduce sampling error is to take a larger
sample
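The effect of sample size on sampling error can be illustrated with a small simulation. Everything here is an assumption for illustration: the population is synthetic (10,000 values drawn from a normal distribution), the parameter of interest is the mean, and `average_abs_error` is a hypothetical helper that averages the absolute sampling error over repeated samples.

```python
import random
import statistics

rng = random.Random(1)

# Synthetic population (assumption): 10,000 values around 50.
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)   # the population parameter

def average_abs_error(n, trials=200):
    """Average |sample mean - population mean| over repeated samples of size n."""
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, n)
        errors.append(abs(statistics.mean(sample) - mu))
    return statistics.mean(errors)

small = average_abs_error(10)
large = average_abs_error(1000)
print(f"Average sampling error, n=10:   {small:.2f}")
print(f"Average sampling error, n=1000: {large:.2f}")
```

In a real study the population mean is unknown, so the sampling error cannot be computed this way; the simulation only shows the tendency for larger samples to land closer to the true value.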
2/ Non-Sampling Error
An error that occurs when there are mistakes in the
acquisition of the data or when the sample observations
are selected improperly.
Types of non-sampling error:
- Selection bias
- Measurement or response bias
- Nonresponse bias
SELECTION BIAS
- Occurs when the way the sample is selected
systematically excludes some part of the population
of interest.
- Example: A study on an issue related to the
population consisting of all residents of a city. The
methods of selecting individuals may exclude the
homeless or those without telephones.
- Selection bias also usually occurs when only
volunteers or self-selected individuals are used in a
study.
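A small simulation can make the telephone example concrete. All numbers are assumptions invented for the sketch: a city of 10,000 residents, 20% of whom have no telephone and tend to have lower incomes. Sampling only from the telephone list is compared with sampling from the full population.

```python
import random
import statistics

rng = random.Random(7)

# Synthetic city (assumption): income differs between the two groups.
residents = (
    [{"phone": True, "income": rng.gauss(60, 10)} for _ in range(8000)]
    + [{"phone": False, "income": rng.gauss(30, 10)} for _ in range(2000)]
)
true_mean = statistics.mean(r["income"] for r in residents)

# Biased frame: residents without a telephone can never be selected.
phone_frame = [r for r in residents if r["phone"]]
biased_mean = statistics.mean(r["income"] for r in rng.sample(phone_frame, 500))

# Frame covering everyone, same sample size.
fair_mean = statistics.mean(r["income"] for r in rng.sample(residents, 500))

print(f"True mean:            {true_mean:.1f}")
print(f"Telephone-only frame: {biased_mean:.1f}")
print(f"Full frame:           {fair_mean:.1f}")
```

Taking a larger sample from the telephone list would not help: the excluded group stays excluded, so the estimate stays off in the same direction.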
MEASUREMENT OR RESPONSE BIAS
- Occurs when the method of observation tends to
produce values that systematically differ from the true
value in some way.
- This problem might happen due to:
o An improperly calibrated scale used to weigh items
o Questions on a survey worded in a way that tends
to influence the response
o The appearance or behavior of the interviewer
o The group or organization conducting the survey
o The tendency for people not to be completely honest
when asked about sensitive issues (sexual, illegal
activities…)
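The improperly calibrated scale can be sketched as a constant offset added to every reading. The true weight, the offset of 2.5 grams, and the random noise level are all assumptions chosen for the illustration.

```python
import random
import statistics

rng = random.Random(5)

true_weight = 100.0   # hypothetical true weight in grams (assumption)
offset = 2.5          # assumed calibration error of the scale

# Each reading = true value + systematic offset + small random noise.
readings = [true_weight + offset + rng.gauss(0, 0.5) for _ in range(1000)]

bias_estimate = statistics.mean(readings) - true_weight
print(f"Average reading minus true weight: {bias_estimate:.2f} g")
```

Averaging many readings removes most of the random noise but leaves the systematic offset untouched, which is what distinguishes measurement bias from sampling error.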
NONRESPONSE BIAS
- Occurs when responses are not obtained from some
individuals in the sample.
- As with selection bias, nonresponse bias can distort
the results of the study.
- This problem might happen due to:
o An interviewer being unable to contact a person listed
in the sample
o A sampled person refusing to respond for some
reason
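Nonresponse distorts results when the people who respond differ from those who do not. The sketch below assumes a satisfaction survey in which dissatisfied customers are three times as likely to reply; the sample composition and both response probabilities are invented for the example.

```python
import random

rng = random.Random(3)

# Hypothetical selected sample: 1,000 satisfied (True), 1,000 not (False).
sample = [True] * 1000 + [False] * 1000

def responds(satisfied):
    """Assumed response behavior: dissatisfied people reply more often."""
    return rng.random() < (0.2 if satisfied else 0.6)

responses = [s for s in sample if responds(s)]

share_in_sample = sum(sample) / len(sample)
share_in_responses = sum(responses) / len(responses)
print(f"Satisfied share in selected sample:    {share_in_sample:.2f}")
print(f"Satisfied share among those who reply: {share_in_responses:.2f}")
```

The selected sample is balanced, yet the returned questionnaires are not, so a conclusion based only on the replies would understate satisfaction.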
Case study
In the summer of 1936, the Literary Digest magazine wanted to
predict the next US president, just as it had successfully
done five times before.
It sent out postcards to 10 million Americans and then
announced that Alfred M. Landon, then governor of Kansas,
would gain 57% of the popular vote and, thus,
demolish Franklin D. Roosevelt, the incumbent president.
In fact, Roosevelt won by a landslide never before seen in
U.S. history. He garnered not the predicted 43%, but 62.5%
of the popular vote and all but 8 of 531 electoral votes.
The Digest did not survive the debacle and folded shortly
thereafter.
What had gone wrong?
Case analysis
What had the Digest done with their poll?
1. Sample selection
10 million people were chosen from various sources:
mailing lists of subscribers, club membership rosters,
telephone directories, automobile registration rolls
2. Response percentage
Only 2.4 million of the 10 million questionnaires were
mailed back
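The figures quoted in the case study translate into a response rate and a forecast error with simple arithmetic. The variable names are illustrative; the numbers are the ones given in the case study above.

```python
# Figures from the case study.
questionnaires_mailed = 10_000_000
questionnaires_returned = 2_400_000
predicted_roosevelt_share = 43.0   # percent, Digest forecast
actual_roosevelt_share = 62.5      # percent, as stated in the case study

response_rate = questionnaires_returned / questionnaires_mailed
forecast_error = actual_roosevelt_share - predicted_roosevelt_share

print(f"Response rate: {response_rate:.0%}")
print(f"Forecast error: {forecast_error:.1f} percentage points")
```

Roughly three out of four people selected never replied, and the sources used to build the list left out people without subscriptions, club memberships, telephones, or cars, so both nonresponse bias and selection bias are plausible explanations for the miss despite the very large sample.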
End of chapter 2