Study Designs in Epidemiologic
Research
Jerry D.T. Purnomo, Ph.D
(B.Sc.-ITS; M.Sc.-ITS; Ph.D.-NCTU, Taiwan)
Departemen Statistika
Institut Teknologi Sepuluh Nopember, Indonesia
Email: jerrypurnomo@gmail.com; jerry@statistika.its.ac.id
What is Epidemiology?
• epi- among
• demos- people
• Goal: Identify (cause/risk/protective) factors of
diseases
1. Elucidate the etiology of disease
2. Evaluate the consistency of epidemiologic data
with etiologic hypotheses
3. Provide the basis for developing and
evaluating preventive or public health
practices
Fundamental Assumption in
Epidemiology
Disease doesn’t occur in a vacuum
Disease is not randomly distributed throughout a
population
Epidemiology uses systematic approach to
study the differences in disease distribution in
subgroups
Allows for study of causal and preventive
factors
Components of Epidemiology
Measure disease frequency
Quantify disease
Assess distribution of disease
Who is getting disease?
Where is disease occurring?
When is disease occurring?
Formulation of hypotheses concerning causal and preventive
factors
Identify determinants of disease
Hypotheses are tested using epidemiologic studies
Types of Research Studies
• Observational: studies that do not involve
any intervention or experiment.
• Experimental: studies that entail manipulation
of the study factor (exposure) and
randomization of subjects to treatment
(exposure) groups
Observation Methods
• Selected Units: individuals, groups
• Study Populations: cross-sectional,
longitudinal
• Data collection timing: prospectively,
retrospectively, combination
• Data collection types: primary,
secondary
Study Populations
• Cross-sectional: where only ONE set of observations
is collected for every unit in the study, at a certain
point in time, disregarding the length of time of the
study as a whole
• Longitudinal: where TWO or MORE sets of
observations are collected for every unit in the
study, i.e. follow-up is involved in order to allow
monitoring of a certain population (cohort) over a
specified period of time. Such populations are AT
RISK (disease-free) at the start of the study.
Observational Designs
(Classification I)
• Exploratory: used when the state of
knowledge about the phenomenon is poor:
small scale; of limited duration.
• Descriptive: used to formulate a certain
hypothesis: small / large scale. Examples:
case-studies; cross-sectional studies
• Analytical: used to test hypotheses: small /
large scale. Examples: case-control, cross-
sectional, cohort.
Observational Designs
(Classification II)
• Preliminary (case-reports, case-series)
• Basic (cross-sectional, case-control,
cohort [prospective, retrospective] )
• Hybrid (two or more of the above,
nested case-control within cohort, etc.)
• Incomplete (ecological, PMR, etc.)
• Others (repeated, case cross-over, migrant,
twin, etc.)
Basic Question in Analytic
Epidemiology
Are exposure and disease linked?
Exposure → Disease
Basic Questions in Analytic
Epidemiology
Look to link exposure and disease
What is the exposure?
Who are the exposed?
What are the potential health effects?
What approach will you take to study the relationship
between exposure and effect?
Big Picture
To prevent and control disease
In a coordinated plan, look to
identify hypotheses on what is related to disease
and may be causing it
formally test these hypotheses
Study designs direct how the
investigation is conducted
Timeframe of Studies
Prospective Study - looks forward, looks to the
future, examines future events, follows a
condition, concern or disease into the future
[Timeline: the study begins at the present and follows subjects forward in time]
Timeframe of Studies
Retrospective Study - “to look back”, looks
back in time to study events that have already
occurred
[Timeline: the study begins at the present and looks back in time]
Study Design Sequence
Hypothesis formation: case reports, case series, descriptive epidemiology
Hypothesis testing: analytic epidemiology (cohort, case-control, cross-sectional), animal studies, lab studies, clinical trials
Increasing Knowledge of Disease/Exposure
Descriptive studies: develop hypothesis
Case-control studies: investigate its relationship to outcomes
Cohort studies: define its meaning with exposures
Clinical trials: test link experimentally
Case Reports
Detailed presentation of a single case or
handful of cases
Generally report a new or unique finding
e.g. previously undescribed disease
e.g. unexpected link between diseases
e.g. unexpected new therapeutic effect
e.g. adverse events
Case Series
Experience of a group of patients with a similar
diagnosis
Assesses prevalent disease
Cases may be identified from a single or multiple
sources
Generally report on new/unique condition
May be only realistic design for rare disorders
Case Series
Advantages
Useful for hypothesis generation
Informative for very rare disease with few
established risk factors
Characterizes averages for disorder
Disadvantages
Cannot study cause and effect relationships
Cannot assess disease frequency
Case Report: one case of unusual findings
Case Series: multiple cases of findings
Descriptive Epidemiology Study: population-based cases with denominator
Study Designs -
Analytic Epidemiology
Experimental Studies
Randomized controlled clinical trials
Community trials
Observational Studies
Group data
Ecologic
Individual data
Cross-sectional
Cohort
Case-control
Case-crossover
Experimental Studies
treatment and exposures occur in a “controlled”
environment
planned research designs
clinical trials are the best-known experimental design; subjects are randomly assigned to treatment groups
community trials assign interventions to whole communities, often without random assignment
Observational Studies
non-experimental
observational because there is no individual
intervention
treatment and exposures occur in a “non-
controlled” environment
individuals can be observed prospectively,
retrospectively, or currently
Cross-sectional studies
An “observational” design that surveys exposures and
disease status at a single point in time (a cross-section of
the population)
[Timeline: the study exists only at a single point in time]
Cross-sectional Design
Study population, classified at a single point in time:
No disease: factor present / factor absent
Disease: factor present / factor absent
[Timeline: the study exists only at a single point in time]
Cross-sectional Studies
Often used to study conditions that are relatively
frequent with long duration of expression
(nonfatal, chronic conditions)
It measures prevalence, not incidence of disease
Example: community surveys
Not suitable for studying rare or highly fatal
diseases or a disease with short duration of
expression
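As a minimal sketch of what a cross-sectional survey yields, the Python snippet below computes point prevalence (existing cases divided by the number of people surveyed), overall and by exposure status. The records and field names are hypothetical and were invented only for illustration.

```python
# Hypothetical survey records: one observation per person at a single time point.
survey = [
    {"exposed": True, "diseased": True},
    {"exposed": True, "diseased": True},
    {"exposed": True, "diseased": False},
    {"exposed": True, "diseased": False},
    {"exposed": False, "diseased": True},
    {"exposed": False, "diseased": False},
    {"exposed": False, "diseased": False},
    {"exposed": False, "diseased": False},
]


def prevalence(records):
    """Point prevalence: existing cases divided by the number of people surveyed."""
    if not records:
        raise ValueError("no records to summarize")
    return sum(r["diseased"] for r in records) / len(records)


overall = prevalence(survey)
among_exposed = prevalence([r for r in survey if r["exposed"]])
among_unexposed = prevalence([r for r in survey if not r["exposed"]])
print(overall, among_exposed, among_unexposed)
```

Because each person is observed only once, the records carry no information on when disease began, which is why this design gives prevalence rather than incidence.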
Cross-sectional studies
Disadvantages
Weakest observational design (it measures prevalence, not incidence, of disease)
Prevalent cases are survivors
The temporal sequence of exposure and effect
may be difficult or impossible to determine
Usually don’t know when disease occurred
Rare events and quickly emerging diseases are a problem
Epidemiologic Study Designs
Case-Control Studies
an “observational” design comparing exposures
in disease cases vs. healthy controls from same
population
exposure data collected retrospectively
most feasible design where disease outcomes
are rare
Case-Control Studies
Cases: Disease
Controls: No disease
Study population:
Cases (disease): factor present / factor absent
Controls (no disease): factor present / factor absent
[Timeline: the study begins at the present and looks back to past exposure]
Example of Case-Control Studies
• Smoking and carcinoma of the lung– Doll and
Hill (BMJ 1950; ii:739-748)
• Maternal stilbestrol therapy and vaginal
cancer in daughters– Herbst et al. (NEJM
1971; 284:878-881)
• Thromboembolism and oral
contraceptives– Sartwell et al. (AJE 1969;
90:365-380)
History of Case-Control Studies
• Lane-Claypon (1926)
– Reproductive experience in the etiology of breast
cancer
• Cigarette smoking and lung cancer
– Levin, Goldstein and Gerhardt (1950)
– Wynder and Graham (1950)
• Doll and Hill (1952)
– Prototype
• Cornfield (1951)
– Odds ratio and relative risk
• Mantel and Haenszel (1959)
– Foundation for analysis of case-control studies
Case-Control Study
Strengths
Less expensive and time consuming
Efficient for studying rare diseases
Limitations
Inappropriate when disease outcome for a specific
exposure is not known at start of study
Exposure measurements taken after disease occurrence
Disease status can influence selection of subjects
Major Challenges of Case-
Control Studies
• Selection of a comparable control group
by avoiding
– selection bias
– recall bias
• Confounding and interaction
• Matching
Epidemiologic Study Designs
Cohort Studies
an “observational” design comparing individuals
with a known risk factor or exposure with
others without the risk factor or exposure
looking for a difference in the risk (incidence)
of a disease over time
best observational design
data usually collected prospectively (some
retrospective)
Study population free of disease:
Factor present: disease / no disease
Factor absent: disease / no disease
[Timeline: the study begins at the present and follows subjects into the future]
2000: Defined population (NON-RANDOMIZED)
2010: Exposed / Non-exposed
2020: Disease / No disease (in each group)
Prospective Cohort Study
• Assemble the cohort in the present, and follow the
individuals prospectively into the future (Prospective
cohort)
– Advantage: One may collect exactly the
information thought to be required
– Disadvantage: Many years may elapse before
sufficient cases of disease have developed
for analysis
1980: Defined population (NON-RANDOMIZED)
1990: Exposed / Non-exposed
2000: Disease / No disease (in each group)
Retrospective Cohort Study
• One may identify a group with certain exposure
characteristics, by means of historical records, at a
certain defined time in the past, and reconstruct
the disease experience of the group between the
defined time in the past and the present
(Retrospective cohort)
– Advantage: Results are potentially available
immediately
– Disadvantage: The information available on the
cohort may not be completely satisfactory
Examples of Cohort Study
• The British doctors study– Doll and Hill (BMJ 1954;
ii: 1451-1455)
• Hepatocellular carcinoma and hepatitis B
virus– Beasley et al. (Lancet 1981; ii:1129-
1133)
• Cancer in the South Wales nickel workers– Doll et al.
(BJC 1970; 24:623-632)
• The Montana smelter workers– Lee and Fraumeni
(JNCI 1969; 42:1045-1052)
Cohort Study
Strengths
Exposure status determined before disease
detection
Subjects selected before disease detection
Can study several outcomes for each exposure
Limitations
Expensive and time-consuming
Inefficient for rare diseases or diseases with long
latency
Loss to follow-up
Clinical Trials
Randomized, controlled studies
Randomized Clinical Trials
Reference population → Study population → RANDOMIZATION
New treatment: Improved / Not improved
Current treatment: Improved / Not improved
Exposure
• Environmental factors
– Water contamination
– Radiation
– Occupational
• Host factors
– Lifestyle: smoking, drinking, dietary, exercise
– Genetic: BRCA1 gene and breast cancer
• Protective factors
– Oral contraceptives and benign breast disease in
Kelsey et al. (1974)
• Health services
– Pap smear and cervical cancer in Clarke and
Anderson (1979)
Major Differences Between Case-
Control and Cohort Studies
• Selection of study subjects
– The cohort study selects subjects who are initially
free of disease and follows them over time to
determine the rates of disease in the presence or
absence of exposure
– The case-control study selects subjects on the basis
of the presence or absence of the disease under
study
• Measure of associations
– Relative risk in a cohort study
– Odds ratio in a case-control study
Contrast Between Case-Control
and Cohort Studies
• Case-control study (Doll and Hill, 1950, 1952)
– April 1948-December 1952
– 1,488 lung cancer cases out of 4,342 interviewed
– 1,465 lung cancer cases and 1,465 individually matched
controls
• Cohort study (Doll and Peto, 1976; Doll et al. 1980)
– October 1951
– 20 year follow-up for men in 1976
– 441 lung cancer deaths among 34,440 men
– 22 year follow-up for women in 1980
– 27 lung cancer deaths among 6,194 women
Relative Risk (RR)
Association: how much does one factor (e.g.,
disease) vary according to the value of
another factor (e.g., exposure).
The relative risk reflects the strength of the
association of developing the disease among
those exposed relative to those who are not
exposed.
Relative risk (RR) = (incidence in the exposed group) / (incidence in the non-exposed group)
Calculating RR in Cohort Studies
• In a cohort study, the relative risk can be calculated
directly.
FIRST: select exposed and unexposed groups. THEN: follow to see whether disease develops.
             Disease   No disease   Total   Incidence of disease
Exposed         a          b         a+b        a/(a+b)
Unexposed       c          d         c+d        c/(c+d)
• RR = [a/(a+b)] / [c/(c+d)]
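The calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `relative_risk` and the example counts are hypothetical, and the sketch does not handle the case where no unexposed subject develops disease (c = 0).

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Relative risk from a cohort 2x2 table.

    a: exposed, disease        b: exposed, no disease
    c: unexposed, disease      d: unexposed, no disease
    """
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    return incidence_exposed / incidence_unexposed


# Hypothetical counts, chosen only for illustration.
rr = relative_risk(a=30, b=70, c=10, d=90)
print(round(rr, 2))
```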
Interpreting RR
If RR = 1, risk in exposed = risk in
unexposed (no association)
If RR > 1, risk in exposed > risk in
unexposed (positive association; possibly
causal)
If RR < 1, risk in exposed < risk in
unexposed (negative association; possibly
protective)
RR Example
CHD risk among smokers and non-smokers
CHD+ CHD- Total
Smokers 84 2,916 3,000
Non-smokers 87 4,913 5,000
• Incidence among exposed = 84/3,000 = 28.0 per 1,000
• Incidence among unexposed = 87/5,000 = 17.4 per 1,000
• RR = 28.0/17.4 = 1.61: smokers are 1.61 times
as likely to develop CHD as non-smokers.
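The arithmetic in this example can be checked with a short Python sketch. It uses the counts from the CHD table; the variable names are illustrative only.

```python
# Counts from the CHD table: cases and group totals.
smokers_chd, smokers_total = 84, 3000
nonsmokers_chd, nonsmokers_total = 87, 5000

# Incidence expressed per 1,000 persons.
incidence_exposed = 1000 * smokers_chd / smokers_total
incidence_unexposed = 1000 * nonsmokers_chd / nonsmokers_total
rr_chd = incidence_exposed / incidence_unexposed

print(f"{incidence_exposed:.1f} {incidence_unexposed:.1f} {rr_chd:.2f}")
```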
Odds Ratio (Relative Odds)
• In a case-control study, we do not know the incidence
in the exposed and unexposed groups, so we cannot
calculate RR directly.
• Odds = (probability that the event occurs) / (probability that the event does not occur)
• Odds ratio (OR) = (odds of developing the disease in the exposed group) / (odds of developing the disease in the unexposed group)
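A minimal Python sketch of these two definitions follows. The function names `odds` and `odds_ratio_from_risks` are hypothetical, and the sketch assumes the unexposed probability is greater than zero.

```python
def odds(p: float) -> float:
    """Convert a probability p (0 <= p < 1) into odds p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def odds_ratio_from_risks(p_exposed: float, p_unexposed: float) -> float:
    """Ratio of the odds in the exposed group to the odds in the unexposed group."""
    return odds(p_exposed) / odds(p_unexposed)


# A probability of 0.5 corresponds to even odds.
print(odds(0.5))
# Hypothetical risks, for illustration only.
print(odds_ratio_from_risks(0.2, 0.1))
```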
Calculating “OR” in Cohort Studies
FIRST: select exposed and unexposed groups. THEN: follow them to see whether disease develops.
             Disease   No disease   Total
Exposed         a          b         a+b
Unexposed       c          d         c+d
Odds of developing the disease in the exposed group
= [a/(a+b)] / [b/(a+b)] = a/b
Odds of developing the disease in the unexposed group
= [c/(c+d)] / [d/(c+d)] = c/d
Odds ratio = (a/b) / (c/d) = ad/bc
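As a rough check of the cross-product formula, the Python sketch below applies it to the CHD counts from the RR example (smokers: 84 with CHD, 2,916 without; non-smokers: 87 with CHD, 4,913 without). The function name `odds_ratio` is illustrative only.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio ad / bc for a 2x2 table."""
    return (a * d) / (b * c)


# CHD counts from the RR example.
a, b, c, d = 84, 2916, 87, 4913

odds_exposed = a / b      # odds of CHD among smokers
odds_unexposed = c / d    # odds of CHD among non-smokers

# Both routes give the same value.
print(round(odds_exposed / odds_unexposed, 2), round(odds_ratio(a, b, c, d), 2))
```

With these counts the OR comes out slightly above the RR of 1.61, which is consistent with the OR approximating the RR when the disease is uncommon in both groups.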
Calculating “OR” in Case-Control Studies
FIRST: select cases and controls. THEN: measure past exposure.
               Cases   Controls
Exposed          a        b
Not exposed      c        d
Total           a+c      b+d
(Odds of developing the disease in the exposed group) / (odds of developing the disease in the non-exposed group)
= (odds of being exposed in cases) / (odds of being exposed in controls)
Odds of being exposed in cases = [a/(a+c)] / [c/(a+c)] = a/c
Odds of being exposed in controls = [b/(b+d)] / [d/(b+d)] = b/d
Odds ratio = (a/c) / (b/d) = ad/bc
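The following Python sketch illustrates, with hypothetical counts, that the exposure odds ratio computed from cases and controls equals the cross-product ad/bc. The function name and the numbers are assumptions made for illustration and do not come from any study cited here.

```python
def exposure_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds of exposure in cases (a/c) divided by odds of exposure in controls (b/d).

    a: exposed cases        b: exposed controls
    c: unexposed cases      d: unexposed controls
    """
    odds_cases = a / c
    odds_controls = b / d
    return odds_cases / odds_controls


# Hypothetical case-control counts.
a, b, c, d = 40, 20, 60, 80
or_exposure = exposure_odds_ratio(a, b, c, d)
cross_product = (a * d) / (b * c)
print(round(or_exposure, 3), round(cross_product, 3))
```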
Interpreting “OR”
• ad/bc represents the OR in both cohort
and case-control studies.
• If OR = 1, the exposure is not related to the
disease
• If OR > 1, the exposure is positively
related to the disease
• If OR < 1, the exposure is negatively related
to the disease
Strengths of Case-Control Studies
Informative
– Permit evaluation of interaction, which is the extent
and manner in which two or more causes of the
disease modify the strength of one another
– Permit evaluation and control of confounding, i.e. an association
that arises because the factor under study is associated
with a known or suspected cause of the disease
– Large number of cases can be assembled
Effective
– Can be done quickly with relatively low cost
Applicable to rare diseases
Choice of Design (I)
Depends on:
– Research Questions
– Research Goals
– Researcher Beliefs and Values
– Researcher Skills
– Time and Funds
Choice of Design (II)
It is also related to:
Status of existent knowledge
Occurrence of disease
Duration of latent period
Nature and availability of information
Available resources
Comparing Study Designs
Theme
Ease
Timing
Maintenance and continuity
Costs
Ethics
Data utilization
Main contribution
Observer bias
Selection bias
Analytic output
Overlap in The Conceptual Basis
Of Quantitative Study Designs
• The cross-sectional study can be repeated
• If the same sample is studied for a second time
i.e. it is followed up, the original cross-sectional
study now becomes a cohort study.
• If, during a cohort study, possibly in a subgroup, the investigator
imposes an intervention, a trial begins.
• A cohort study can also give rise to case-control studies, using
incident cases (nested case-control study).
• Cases in a case series, particularly a population-based one, may be
the starting point of a case-control study or a trial.
• Not every epidemiological study fits neatly into one of the basic
designs.