BIOSTATISTICS and
Research Design
COMH 607
Anteneh Tessema
Department of Statistics, AAU
March 2018
Introduction
• What is statistics?
• Statistics: A field of study concerned with:
– collection, organization, analysis,
summarization and interpretation of numerical
data, &
– the drawing of inferences about a body of data
when only a small part of the data is observed.
• Statistics helps us use numbers to
communicate ideas
Biostatistics: The application of statistical
methods to the fields of biological and
medical sciences.
Concerned with interpretation of biological
data & the communication of information
derived from these data
Has a central role in medical investigations
• The numbers must be presented in such a
way that valid interpretations are possible
• Statistics are everywhere – just look at any
newspaper or the current medical and
public health literature.
Uses of biostatistics
• Provide methods of organizing information
• Assessment of health status
• Health program evaluation
• Resource allocation
• Magnitude of association
– Strong vs weak association between
exposure and outcome
• Assessing risk factors
– Cause & effect relationship
• Evaluation of a new vaccine or drug
– What can be concluded if the proportion of
people free from the disease is greater among
the vaccinated than the unvaccinated?
– How effective is the vaccine (drug)?
– Is the effect due to chance or some bias?
• Drawing of inferences
– Information from sample to population
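As a rough illustration of the vaccine questions above, the sketch below compares the proportion free from disease in each group and computes vaccine efficacy as 1 minus the risk ratio, one commonly used summary. The counts and the function names `risk` and `vaccine_efficacy` are hypothetical and do not come from any real trial. The sketch does not address chance or bias, which require inferential methods and a sound design.

```python
def risk(cases, total):
    """Proportion of people in a group who developed the disease."""
    if total <= 0:
        raise ValueError("total must be positive")
    return cases / total


def vaccine_efficacy(cases_vacc, n_vacc, cases_unvacc, n_unvacc):
    """Efficacy = 1 - (risk in vaccinated / risk in unvaccinated)."""
    risk_unvacc = risk(cases_unvacc, n_unvacc)
    if risk_unvacc == 0:
        raise ValueError("no cases among the unvaccinated; efficacy undefined")
    return 1 - risk(cases_vacc, n_vacc) / risk_unvacc


# Hypothetical counts, for illustration only
ve = vaccine_efficacy(cases_vacc=10, n_vacc=1000, cases_unvacc=50, n_unvacc=1000)
print(f"Disease-free (vaccinated):   {1 - risk(10, 1000):.1%}")
print(f"Disease-free (unvaccinated): {1 - risk(50, 1000):.1%}")
print(f"Estimated vaccine efficacy:  {ve:.0%}")
```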
What does biostatistics cover?
• Biostatistical thinking contributes to every step in a research project:
– Research planning
– Design
– Execution (data collection)
– Data processing
– Data analysis
– Presentation
– Interpretation
– Publication
• The best way to learn about biostatistics is to follow the flow of a research project from inception to the final publication
Research Design
• We cannot study all subjects (all pregnant
women, or all people) living in a given
geographical area, so the design must specify:
– Sampling technique
– Inclusion/exclusion criteria
– Sample size calculation
– Study design
– Method of data collection
– Etc
Analysis
• Analysis part is the major part of learning
about biostatistics
– There are dozens of different methods of
analysis, which makes it difficult to choose
the correct method for a particular case
– It is necessary to consider the philosophy
that underlies all methods of analysis:
• Use data from a sample to draw inference about
a wider population
Interpretation
• Interpretation of results of statistical
analysis is not always straightforward,
but is simpler when the study has a
clearer aim
• If the study has been well designed and
correctly analyzed the interpretation of
results can be fairly simple
Types of Statistics
1. Descriptive statistics:
• Ways of organizing and summarizing data
• Helps to identify the general features and
trends in a set of data and extracting
useful information
• Also very important in conveying the final
results of a study
• Example: tables, graphs, numerical
summary measures
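A minimal sketch of numerical summary measures, using Python's standard `statistics` module. The ages below are made-up values for illustration, not data from any study.

```python
import statistics

# Hypothetical ages (years) of 10 study participants
ages = [23, 25, 25, 29, 31, 34, 35, 38, 41, 49]

summary = {
    "n": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "sd": statistics.stdev(ages),  # sample standard deviation (divides by n - 1)
    "min": min(ages),
    "max": max(ages),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.1f}")
```

The same few numbers (n, centre, spread, range) are what a descriptive table in a report would typically convey.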
2. Inferential statistics:
• Methods used for drawing conclusions
about a population based on the
information obtained from a sample of
observations drawn from that population
• Example: Principles of probability,
estimation, confidence interval,
comparison of two or more means or
proportions, hypothesis testing, etc.
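To make estimation and confidence intervals concrete, here is a sketch of a normal-approximation (Wald) 95% confidence interval for a proportion. The counts (120 positives out of 400 sampled) and the function name `proportion_ci` are hypothetical; the approximation is only reasonable for fairly large samples, and other interval methods exist.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n                                # sample proportion (statistic)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)              # standard error of p_hat
    return p_hat, p_hat - z * se, p_hat + z * se


# Hypothetical survey result, for illustration only
p_hat, lower, upper = proportion_ci(successes=120, n=400)
print(f"Estimate: {p_hat:.3f}, 95% CI: ({lower:.3f}, {upper:.3f})")
```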
Data
• Data are numbers which can be measurements
or can be obtained by counting
• The raw material for statistics
• Can be obtained from:
– Routinely kept records, literature
– Surveys
– Counting
– Experiments
– Reports
– Observation
– Etc
Types of Data
1. Primary data: collected from the items or
individual respondents directly by the
researcher for the purpose of a study.
2. Secondary data: data that were already collected,
and often statistically treated, by other people or
organizations, and are then used by someone else
for a different purpose.
Population and Sample
• Population:
– Refers to any collection of objects
• Target population:
– A collection of items that have something in
common for which we wish to draw conclusions at
a particular time.
• E.g., All hospitals in Ethiopia
– The whole group of interest
Study (Sampled) Population:
• The subset of the target population that has at
least some chance of being sampled
• The specific population group from which
samples are drawn and data are collected
Sample:
• A subset of a study population, about
which information is actually obtained.
• The individuals who are actually measured
and comprise the actual data.
• Role of statistics: using information from a sample to make
inferences about the population
[Figure: a sample is drawn from the population; information from the sample is used to make inferences about the population]
E.g.: In a study of the prevalence of HIV among adolescents in
Ethiopia, a random sample of adolescents in Lideta Kifle
Ketema of Addis Ababa (AA) was included.
• Target population: All adolescents in Ethiopia
• Study population: All adolescents in Addis Ababa
• Sample: Adolescents in Lideta Kifle Ketema who were included
in the study
[Figure: nested regions, with the Sample inside the Study Population inside the Target Population]
Generalizability
• Is a two-stage procedure:
• We need to be able to generalize from:
– the sample to the study population, &
– then from the study population to the target
population
• If the sample is not representative of the
population, the conclusions are restricted to
the sample & don’t have general
applicability
• Collect information from a relatively SMALL sample
• Draw conclusions about a rather LARGE population
Parameter and Statistic
• Parameter: A descriptive measure
computed from the data of a population.
– E.g., the mean (µ) age of the target population
• Statistic: A descriptive measure computed
from the data of a sample.
– E.g., sample mean age (x̄)
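The distinction can be sketched with a simulated population: the mean of the whole population is the parameter, and the mean of a simple random sample drawn from it is the statistic. The population below is randomly generated for illustration only; the seed, size and age range are arbitrary assumptions.

```python
import random
import statistics

random.seed(2018)  # fixed seed so the sketch is reproducible

# Hypothetical "population": ages of 10,000 people (simulated, not real data)
population = [random.randint(15, 80) for _ in range(10_000)]
mu = statistics.mean(population)            # parameter: population mean

sample = random.sample(population, k=100)   # simple random sample, without replacement
x_bar = statistics.mean(sample)             # statistic: sample mean

print(f"Population mean (parameter): {mu:.2f}")
print(f"Sample mean (statistic):     {x_bar:.2f}")
```

In practice the parameter is unknown, because the whole population cannot be measured; the statistic is what we actually compute and use to estimate it.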
• Before summarization and organization,
we need to know the types of variables
and measurement scales of our data.
• Before displaying or analyzing data,
classify the variables into their different
types.
Variable
• Variable: A characteristic which takes
different values in different persons, places,
or things.
• Any aspect of an individual or object that is
measured (e.g., BP) or recorded (e.g., age,
sex) and can take different values.
• There may be one variable in a study or
many.
• E.g., A study of treatment outcome of TB
• Variables can be broadly classified
into:
– Categorical (or qualitative) variables, or
– Quantitative (or numerical) variables.
• Categorical variable: A variable or
characteristic which can not be measured in
quantitative form but can only be sorted by
name or categories
• Not able to be measured as we measure
height or weight
• The notion of magnitude is absent or implicit.
• Quantitative variable: A variable that can
be measured (or counted) and expressed
numerically.
• Height, weight, # of children, etc.
• Has the notion of magnitude.
Quantitative variable is divided into two:
1. Discrete: It can only have a limited number of
discrete values (usually whole numbers).
– E.g., the number of episodes of diarrhoea a child has
had in a year. You can’t have 12.5 episodes of diarrhoea
• Characterized by gaps or interruptions in the
values (integers).
• Both the order and magnitude of the values matter.
• The values aren’t just labels, but are actual
measurable quantities.
2. Continuous variable: It can have an
infinite number of possible values in any
given interval.
• Both the magnitude and the order of the
values matter
• Does not possess the gaps or interruptions
• Weight is continuous since it can take on
any value within a range (e.g., 34.575 kg).
SUMMARY: Types of variables
Variable
– Qualitative (or categorical)
• Nominal (not ordered), e.g. ethnic group
• Ordinal (ordered), e.g. response to treatment
– Quantitative (measurement)
• Discrete (count data), e.g. # of admissions
• Continuous (real-valued), e.g. height
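The classification in the summary can be written as a small decision rule. This is only an illustrative sketch: the names `VariableType` and `classify` are hypothetical, and the three yes/no questions are a simplification of the judgement a researcher makes for each variable.

```python
from enum import Enum


class VariableType(Enum):
    NOMINAL = "categorical, not ordered"
    ORDINAL = "categorical, ordered"
    DISCRETE = "quantitative, count data"
    CONTINUOUS = "quantitative, real-valued"


def classify(is_quantitative, is_ordered=False, is_count=False):
    """Quantitative or categorical first, then the sub-type."""
    if is_quantitative:
        return VariableType.DISCRETE if is_count else VariableType.CONTINUOUS
    return VariableType.ORDINAL if is_ordered else VariableType.NOMINAL


# The four examples from the summary
examples = {
    "ethnic group": classify(is_quantitative=False),
    "response to treatment": classify(is_quantitative=False, is_ordered=True),
    "number of admissions": classify(is_quantitative=True, is_count=True),
    "height": classify(is_quantitative=True),
}

for variable, kind in examples.items():
    print(f"{variable}: {kind.name.lower()} ({kind.value})")
```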
Scales of measurement
• Measurement: the assignment of numbers to
objects or events according to a set of rules.
• The various measurement scales result from measurement
being carried out under different sets of rules.
• All measurements are not the same.
• Measuring weight: e.g., 40 kg
• Measuring the status of a patient on a scale:
“improved”, “stable”, “not improved”.
• There are four types of scales of measurement.
1. Nominal scale:
• The simplest type of data, in which the values
fall into unordered categories or classes
• Consists of “naming” observations or
classifying them into various mutually
exclusive and collectively exhaustive
categories
• Uses names, labels, or symbols to assign each
measurement.
– Examples: Blood type, sex, race, marital status, etc.
Example of nominal scale:
Race/Ethnicity:
1. Black
2. White
3. Latino
4. Other
• The numbers have NO meaning
• They are labels only
• If nominal data can take on only two
possible values, they are called
dichotomous or binary.
• So sex is not just nominal, it is
dichotomous (male or female).
• Yes/no questions
– E.g., cured from TB at 6 months of Rx
2. Ordinal scale:
• Assigns each measurement to one of a
limited number of categories that are
ranked in terms of order.
• Although non-numerical, can be considered
to have a natural ordering
– Examples: Patient status (unimproved,
improved), cancer stages, socioeconomic status (low,
medium, high), etc.
Example of ordinal scale:
• Pain level:
1. None
2. Mild
3. Moderate
4. Severe
• The numbers have LIMITED meaning: 4 > 3 > 2 > 1 is all we
know, apart from their utility as labels
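Because only the order of the pain levels is meaningful, order-based summaries such as sorting or the median category make sense, while arithmetic on the codes (for example an average of 2.4) does not have a clear interpretation. A small sketch with made-up responses:

```python
import statistics

# Pain levels in their natural order; the codes 1..4 only encode rank
PAIN_ORDER = ["None", "Mild", "Moderate", "Severe"]
rank = {level: code for code, level in enumerate(PAIN_ORDER, start=1)}

# Hypothetical responses from 7 patients
responses = ["Mild", "Severe", "None", "Moderate", "Mild", "Moderate", "Mild"]

ordered = sorted(responses, key=rank.get)                    # sort by rank, not alphabetically
median_code = statistics.median_low([rank[r] for r in responses])
median_level = PAIN_ORDER[median_code - 1]                   # map the code back to its label

print("Ordered responses:", ordered)
print("Median pain level:", median_level)
```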
3. Interval scale:
- Measured on a continuum and differences
between any two numbers on a scale are of
known size.
Example: Temp. in °F on 4 consecutive days
Days: A B C D
Temp. (°F): 50 55 60 65
For these data, day A at 50° is not only cooler
than day D at 65°, it is cooler by exactly 15°.
- It has no true zero point. “0” is arbitrarily chosen
and doesn’t reflect the absence of temperature.
4. Ratio scale:
- Measurement begins at a true zero point
and the scale has equal intervals.
- Examples: Height, age, weight, BP, etc.
• Note on meaningfulness of “ratio”:
– Someone who weighs 80 kg is two times as
heavy as someone else who weighs 40 kg.
This is true even if weight had been measured
in other units.
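The difference between the interval and ratio scales can be checked numerically: a ratio of two temperatures changes when °F is converted to °C, whereas a ratio of two weights is the same in kilograms and pounds. The sketch below uses the temperature and weight values from the examples above; the helper names are hypothetical and the kg-to-lb factor is approximate.

```python
import math


def fahrenheit_to_celsius(temp_f):
    return (temp_f - 32) * 5 / 9


def kg_to_lb(weight_kg):
    return weight_kg * 2.20462  # approximate conversion factor


# Interval scale: day A (50 °F) and day D (65 °F)
day_a_f, day_d_f = 50, 65
day_a_c = fahrenheit_to_celsius(day_a_f)
day_d_c = fahrenheit_to_celsius(day_d_f)
ratio_f = day_d_f / day_a_f   # ratio on the Fahrenheit scale
ratio_c = day_d_c / day_a_c   # a different ratio: the zero point is arbitrary

# Ratio scale: 80 kg versus 40 kg
ratio_kg = 80 / 40
ratio_lb = kg_to_lb(80) / kg_to_lb(40)  # same ratio: zero weight is a true zero

print(f"Temperature ratio: {ratio_f:.2f} in °F, {ratio_c:.2f} in °C")
print(f"Weight ratio:      {ratio_kg:.2f} in kg, {ratio_lb:.2f} in lb")
```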
Degree of precision in measuring (lowest to highest):
Nominal, Ordinal, Interval, Ratio
Statistical software and
Biostatistical Analysis
• The use of statistical software makes it
possible for the investigator to devote
more time to improving the quality of the
raw data and to interpreting the
results.
• We may use Epi Info, SPSS, Minitab,
SAS, Stata, …