The Nature of Epidemiologic Data
• Type of data
• Application of statistics in epidemiology

Type of Statistics: Descriptive Statistics
• Used to organize and produce quantitative summaries of numerical information by means of
– Tables
– Charts
– Graphs
– Diagrams
Dr. Ferng- 1
Type of Statistics: Inferential statistics (1)
• Used to make generalizations or inferences about a larger group or population on the basis of information derived from a representative subset or sample of the same group

Type of Statistics: Inferential statistics (2)
• Parametric inferential statistical methods (for data from a normally distributed population)
– z test
– t test
– F test
Type of Statistics: Inferential statistics (3)
• Nonparametric inferential statistical methods (for data from a population that is not normally distributed)
– Chi-square test
– Wilcoxon test
– Mann-Whitney test

Populations vs. Samples
• Population
– All objects targeted for epidemiologic investigation
• Samples
– Subset of a population
– Should be representative of the population
– Controlled by sampling methods
Type of Data
• Qualitative
– nominal: race, gender
– ordinal: stress level, SES
• Quantitative
– interval: the exact difference is known, no true zero (e.g., temperature)
– ratio: with true zero
• Discrete: number of children
• Continuous: age, rate, BW

Type of Data: Quantitative Data
• body weight
• height
• blood pressure
• blood cholesterol
• blood lead content
• worker working hours, etc.
• use Mean, Standard Deviation to describe its distribution
Qualitative Data
• satisfaction index
• stress level (index)
• sex
• Phone number
• response frequencies for the data analysis

Couple Quick Questions for You
Variable Name             Qualitative   Quantitative
Zip Code
Social Security Number
GPA
Key: Couple Quick Questions for You
Variable Name             Qualitative   Quantitative
Zip Code                  x
Social Security Number    x
GPA                                     x

Did You Get Them All Correct?
• Here are some more questions for you...
What type of variable are they?
Variables                 Qualitative?  Quantitative?
Years of Education
Annual Income
SES
Major in School

Key for "What type of variable are they?"
Variable Name             Qualitative   Quantitative
Years of Education                      x
Annual Income                           x
SES                       x
Major in School           x
Variables (1): Dependent Variable
• Outcomes
• Lung function test
– may be used as an index of occupational exposure
• Smoking behavior
– as a result of a smoking cessation education program
Variables (2): Independent Variable
• Exposure or risk factor
– Individual characteristics
  • Age
  • Sex
  • Smoking status
  • Genetic background
  • Other respiratory disease
– Special intervention
  • Participating in a health promotion program
  • Participating in a safety training program
  • Drank polluted water

Do You Know How To Identify the Dependent and Independent Variables in a Research Statement?
Statement                                    Dependent Variable   Independent Variable
Effectiveness of a Weight Watch Program      ?                    ?
Health Effect of Polluted Drinking Water     ?                    ?
Do You Know How To Identify the Dependent and Independent Variables?
Statement: Effectiveness of a Weight Watch Program
– Dependent variable: body weight change
– Independent variable: weight watch programs (various types of programs)
Statement: Health Effect of Polluted Drinking Water
– Dependent variable: health status, such as cancer
– Independent variable: drinking water quality (polluted and non-polluted)

Mathematical Presentation of Two Variables
y = a + bx
• y: dependent variable
• x: independent variable
• Examples
– 2 x 2 table, using foodborne disease as an example
  • Disease + or - is the dependent variable
  • Food is the independent variable
– Blood lead concentration (Y) as a function of lead concentration in the air (X)

Descriptive Statistics
• Central Tendency
– median
– mode
– mean
– making statistical inferences about the mean follows from the Central Limit Theorem
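The relation y = a + bx can be illustrated with a short sketch. The slides do not say how a and b are obtained; ordinary least squares is assumed here, the function name `fit_line` is made up for this example, and the air and blood lead numbers are hypothetical values chosen only to show the mechanics.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical illustration data, not measurements:
air_lead = [0.5, 1.0, 1.5, 2.0, 2.5]         # independent variable X
blood_lead = [11.0, 12.0, 13.0, 14.0, 15.0]  # dependent variable Y

a, b = fit_line(air_lead, blood_lead)
print(f"y = {a:.1f} + {b:.1f}x")
```

Here X (air lead) is the exposure and Y (blood lead) is the outcome, matching the roles of independent and dependent variable.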
Central Limit Theorem
• When n is moderately large,
• the sample mean $\bar{X}$ has approximately a normal distribution
• regardless of the distribution of the underlying variable X.

The Mean
• Mean of Population: $\mu = \frac{\sum X}{N}$
• Mean of Sample: $\bar{X} = \frac{\sum X}{n}$
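The Central Limit Theorem can be seen in a small simulation. This is only a sketch under assumptions that are not in the slides: the underlying variable is taken to be exponential with mean 1 (strongly skewed), the sample size is n = 50, and 2,000 samples are drawn.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from a right-skewed exponential distribution (mean 1)."""
    return statistics.mean([rng.expovariate(1.0) for _ in range(n)])

n = 50
means = [sample_mean(n) for _ in range(2000)]

center = statistics.mean(means)   # should be close to the population mean, 1
spread = statistics.stdev(means)  # should be close to 1 / sqrt(n)

# Share of sample means within 2 SD of their center; roughly 95% if near normal
share = sum(abs(m - center) <= 2 * spread for m in means) / len(means)
print(center, spread, share)
```

Even though single observations are far from normal, the sample means cluster symmetrically around the population mean.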
Geometric Mean
• Logarithmic mean
• Calculable only for positive values.
• Calculated by
– taking the logarithms of the values
– calculating their arithmetic mean
– converting back by taking the antilogarithm.

Geometric Mean
• Mean of 1, 2, 3, 3, 3, 4, 4, 5
• [log 1 + log 2 + 3 x log 3 + 2 x log 4 + log 5]/8 = 3.64/8 = 0.45
• Mean = antilog of 0.45
• Mean = $10^{0.45}$ = 2.8
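The three steps for the geometric mean (take logarithms, average them, take the antilogarithm) can be sketched as follows, using the slide's own data and base-10 logarithms. The variable names are illustrative.

```python
import math

values = [1, 2, 3, 3, 3, 4, 4, 5]

# Step 1: logarithms of the values (all values must be positive)
logs = [math.log10(v) for v in values]

# Step 2: arithmetic mean of the logarithms
mean_log = sum(logs) / len(logs)

# Step 3: antilogarithm converts back to the original scale
geometric_mean = 10 ** mean_log

print(round(mean_log, 2), round(geometric_mean, 1))
```

The geometric mean (about 2.8) is smaller than the arithmetic mean of the same values (about 3.1).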
Descriptive Statistics
• Variability (or Dispersion)
– variance (V)
– standard deviation (SD)
• A measure of dispersion or variation of a frequency distribution

Standard Deviation (SD): the square root of the average squared deviation from the mean
• Population variance: $\sigma^2 = \frac{SS}{N}$
• Sample variance: $s^2 = \frac{SS}{n-1}$
• N (population) or n (sample) = number of observations
• $SS = \sum (X - \mu)^2 = \sum X^2 - \frac{(\sum X)^2}{N}$ (for a sample, use $\bar{X}$ in place of $\mu$ and n in place of N)
• $SD = \sqrt{\text{Variance}}$
Standard deviation is one of several indices of variability used to characterize the dispersion among the measures in a given sample or population.
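A minimal sketch of the variance and SD formulas, applied to the data set 1, 2, 3, 3, 3, 4, 4, 5 used on other slides. The choice of data and the function name `sum_of_squares` are assumptions made for illustration.

```python
import math

def sum_of_squares(xs):
    """SS via the computational form: sum(X^2) - (sum X)^2 / n."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n

values = [1, 2, 3, 3, 3, 4, 4, 5]
ss = sum_of_squares(values)

# Treating the values as a whole population: divide SS by N
population_variance = ss / len(values)
population_sd = math.sqrt(population_variance)

# Treating the values as a sample: divide SS by n - 1
sample_variance = ss / (len(values) - 1)
sample_sd = math.sqrt(sample_variance)

print(ss, population_sd, sample_sd)
```

Dividing by n - 1 instead of N makes the sample SD slightly larger than the population SD computed from the same numbers.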
Relationship Between Variance and Standard Deviation
• $SD = \sqrt{\text{Variance}}$
• If the variance of a set of data is 16, then the SD (standard deviation) of this data set is $\sqrt{16} = 4$

Arithmetic Mean
• Arithmetic mean
– A measure of Central Tendency
– Sum of all the observations divided by the number of observations.
Arithmetic Mean
• Mean of 1, 2, 3, 3, 3, 4, 4, 5
• (1 + 2 + 3x3 + 2x4 + 5)/8 = 3.1

Median
• A Measure of Central Tendency
• The simplest division of a set of measurements is into two parts: the lower and the upper half (i.e., 50% each).
Median
• Median of 1, 2, 2, 3, 4, 5, 6, 7
• 50th percentile
• n = 8, n/2 = 4
• median = (3 + 4)/2 = 3.5

Mode
• One of the Measures of Central Tendency.
• The most frequently occurring value in a set of observations.
Mode
• Mode of 1, 2, 2, 2, 3, 3, 4, 4, 5
• 2

Types of Data Distributions
• Normal Distribution
• Skewed Distribution
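The worked examples for the arithmetic mean, median, and mode can be checked with Python's standard `statistics` module. This is a sketch for verification only; the slides themselves do the arithmetic by hand.

```python
import statistics

# Arithmetic mean: sum of observations divided by their number
mean_value = statistics.mean([1, 2, 3, 3, 3, 4, 4, 5])

# Median: with n = 8 (even), the average of the 4th and 5th ordered values
median_value = statistics.median([1, 2, 2, 3, 4, 5, 6, 7])

# Mode: the most frequently occurring value
mode_value = statistics.mode([1, 2, 2, 2, 3, 3, 4, 4, 5])

print(mean_value, median_value, mode_value)
```

The mean comes out as 3.125, which the slide rounds to 3.1.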
Normal Distribution
• Also called the Gaussian Distribution
• Symmetrical clustering of values
• Mean = Mode = Median

Normal Distribution Curve
[Figure: normal curve with Mean = Mode = Median at the center; 50% of the area lies on each side.]
Dispersion of a Normal Curve
[Figure: normal curve over -3 to +3 SD. Area from the mean to 1 SD is .3413 on each side (68% within ±1 SD); to 2 SD, .4772 on each side (95% within ±2 SD); to 3 SD, .4987 on each side (99% within ±3 SD).]

How To Use a Normal Curve to Establish Confidence Intervals and Make Statistical Decisions?
1. Establish the 95% CL by
– Lower limit = mean - 2 x SD
– Upper limit = mean + 2 x SD
2. If the number you need to decide on is WITHIN the interval → it is a normal situation → do not reject the null hypothesis
3. If the number you need to decide on is OUTSIDE the interval → it is an abnormal situation → reject the null hypothesis
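The areas under the normal curve quoted for 1, 2, and 3 SD can be reproduced with `statistics.NormalDist` from the Python standard library. This is a sketch; the helper names `area_mean_to` and `area_within` are made up for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

def area_mean_to(k):
    """Area under the curve between the mean and k SD above it."""
    return std_normal.cdf(k) - 0.5

def area_within(k):
    """Area under the curve within k SD on both sides of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_mean_to(k), 4), round(area_within(k), 4))
```

The one-sided areas match the .3413, .4772, and .4987 on the curve; doubling them gives the familiar 68%, 95%, and 99% bands (more precisely about 68.3%, 95.4%, and 99.7%).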
Example of the Dispersion of a Normal Curve
[Figure: normal curve over -3 to +3 SD, with 68% of the area within ±1 SD (.3413 per side), 95% within ±2 SD (.4772 per side), and 99% within ±3 SD (.4987 per side).]

Example: 95% Confidence Interval Establishment
• What is the 95% Confidence Interval when Mean = 20, SD = 2?
– Upper limit = 20 + 2x2 = 24
– Lower limit = 20 - 2x2 = 16
– 95% CL = (16, 24)
• 23 is greater than the mean (20) and is still within the 95% interval → a normal situation
• 26 is greater than the mean (20) but is outside the 95% interval → an abnormal situation
• See graph on next page
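The mean ± 2 x SD rule and the within/outside decision can be written as a short sketch. The function names `two_sd_limits` and `is_within` are hypothetical, and the factor 2 is the slides' rounding of the usual 1.96.

```python
def two_sd_limits(mean, sd, k=2):
    """Lower and upper limits at mean - k*SD and mean + k*SD."""
    return mean - k * sd, mean + k * sd

def is_within(value, limits):
    """True if the value falls inside the interval (a 'normal situation')."""
    lower, upper = limits
    return lower <= value <= upper

limits = two_sd_limits(20, 2)
print(limits)                 # (16, 24)
print(is_within(23, limits))  # True: normal situation
print(is_within(26, limits))  # False: abnormal situation
```

Passing `k=1` gives the one-SD limits that cover about 68% of the curve.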
Use Normal Curve to Establish Confidence Intervals
Mean = 20, SD = 2
[Figure: normal curve with the SD axis -3, -2, -1, 0, +1, +2, +3 mapped to 14, 16, 18, 20, 22, 24, 26. 95% CL lower limit = 16 (20 - 2x2); 95% CL upper limit = 24 (20 + 2x2); 22 = 20 + 2.]

What Other Information Can This Normal Curve Tell You?
• Within one standard deviation for Mean = 20, SD = 2 will be
– Upper limit = 20 + 1x2 = 22
– Lower limit = 20 - 1x2 = 18
– 68% CL = (18, 22)
• It means that the chance of the event occurring between 18 and 22 will be 68%.
Practice Question
• The teenage pregnancy rate for the last 5 years is 43.0 ± 0.8 per 1,000 female population per year
• What is the 95% confidence interval?
• What is the lower limit?
• What is the upper limit?
• Is 44.3 per 1,000 per year too high?
• What is the chance of the teenage pregnancy rate being between 42.2 and 43.8?

What are your answers?
Use Normal Curve to Establish Confidence Intervals: Mean = 43.0, SD = 0.8. Is 44.3 ok?
[Figure: normal curve with axis values 41.4 (43.0 - 2x0.8), 42.2, 43.0, 43.8 (43.0 + 0.8), 44.6 (43.0 + 2x0.8). 95% CL lower limit = 41.4; 95% CL upper limit = 44.6; 44.3 is within the 95% CL.]

95% Confidence Interval Establishment and Decision Making
• What is the 95% confidence interval when Mean = 43, SD = 0.8?
– Upper limit = 43 + 2x0.8 = 44.6
– Lower limit = 43 - 2x0.8 = 41.4
– 95% CL = (41.4, 44.6)
• 44.3 is greater than the mean (43) but is still within the 95% interval → a normal situation
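The practice question can be checked numerically. This sketch assumes, as the slides do, that the rate is normally distributed and that the 0.8 in "43.0 ± 0.8" is the SD; it uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

rate = NormalDist(mu=43.0, sigma=0.8)  # assumed normal, SD = 0.8

# 95% limits by the mean ± 2 x SD rule
lower = rate.mean - 2 * rate.stdev
upper = rate.mean + 2 * rate.stdev

# Is 44.3 per 1,000 per year inside the 95% interval?
within_95 = lower <= 44.3 <= upper

# Chance of a rate between 42.2 and 43.8 (mean ± 1 SD)
chance = rate.cdf(43.8) - rate.cdf(42.2)

print(round(lower, 1), round(upper, 1), within_95, round(chance, 2))
```

The computed chance is about 0.68, the one-SD area of the normal curve.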
What is the chance of the teenage pregnancy rate being between 42.2 and 43.8?
• (42.2, 43.8) is the 68% interval
• Within one standard deviation for Mean = 43, SD = 0.8 will be
– Upper limit = 43 + 1x0.8 = 43.8
– Lower limit = 43 - 1x0.8 = 42.2
– 68% interval = (42.2, 43.8)
• It means that the chance of the rate falling between 42.2 and 43.8 will be 68%.

Skew Distribution
• an asymmetrical frequency distribution
Positive Skewness
• a longer tail extending toward higher values of the variate
• Mean > Median
[Figure: positively skewed curve with the median and mean marked.]

Negative Skewness
• a longer tail extending toward lower values of the variate
• Mean < Median
[Figure: negatively skewed curve with the mean and median marked.]
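The relation between mean and median under skewness can be illustrated with two small data sets. The numbers below are hypothetical, chosen only so that one set has a long upper tail and the other a long lower tail.

```python
import statistics

# Hypothetical data with a long tail toward higher values (positive skew)
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]

# Mirror image: a long tail toward lower values (negative skew)
left_skewed = [11 - x for x in right_skewed]

right_mean = statistics.mean(right_skewed)
right_median = statistics.median(right_skewed)
left_mean = statistics.mean(left_skewed)
left_median = statistics.median(left_skewed)

print(right_mean, right_median)  # mean above median
print(left_mean, left_median)    # mean below median
```

The extreme value pulls the mean toward the tail, while the median stays near the bulk of the data.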
Shape of Dispersion
• The SD of curve A is smaller than the SD of curve B
[Figure: two curves labeled A and B; curve A is narrower than curve B.]

The End