INFERENTIAL STATISTICS
Dr. Dalia El-Shafei
Assist.Prof., Community Medicine Department, Zagazig
University
Definition of statistics:
Branch of mathematics concerned with:
Collection, Summarization, Presentation, Analysis,
and Interpretation of data.
TYPES OF STATISTICS
Descriptive
• Describe or summarize the data of a target population.
• Describe the data which is already known.
• Organize, analyze & present data in a meaningful manner.
• Final results are shown in the form of tables and graphs.
• Tools: measures of central tendency & dispersion.
Inferential
• Use data to make inferences or generalizations about the population.
• Make conclusions for a population that is beyond the available data.
• Compare, test and predict future outcomes.
• Final results are probability scores.
• Tools: hypothesis tests.
INFERENCE
Inference involves making a generalization about a larger group of
individuals on the basis of a subset or sample.
Inferential statistics
• Hypothesis testing:
  - Hypothesis formulation: null hypothesis "H0", alternative hypothesis "H1"
  - Set level of significance: α error, p-value
  - Choosing the test: quantitative data or qualitative data
  - Decision approach: critical value
  - Decision: accept H0 or reject H0
• Estimation:
  - Point estimate
  - Interval estimate "confidence interval"
CONFIDENCE LEVEL & INTERVAL "INTERVAL ESTIMATE"
Confidence interval "interval estimate": the range of values that is used to estimate the true value of the population parameter.
Confidence level: the probability that the confidence interval does, in fact, contain the true population parameter, assuming that the estimation process is repeated many times (1 − α).
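As an illustration, a 95% confidence interval for a population mean can be computed as mean ± z(α/2) × SE. A minimal sketch (the blood-pressure readings are hypothetical):

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of systolic blood pressures (mmHg)
sample = [118, 125, 130, 121, 127, 119, 133, 124, 129, 122]
m, sd, n = mean(sample), stdev(sample), len(sample)

# 95% CI for the population mean: 1 - alpha = 0.95, so z_{alpha/2} = 1.96
# (z approximation as in the lecture; strictly, a t value suits small samples)
se = sd / sqrt(n)
ci = (m - 1.96 * se, m + 1.96 * se)
```

If the sampling were repeated many times, about 95% of intervals built this way would contain the true population mean.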
HYPOTHESIS TESTING
To find out whether an observed difference between groups is explained by sampling variation (chance) or is really a difference between the groups.
The method of assessing hypotheses is known as a "significance test".
Significance testing is a method for assessing whether a result is likely to be due to chance or due to a real effect.
NULL & ALTERNATIVE HYPOTHESES:
In hypothesis testing, a specific hypothesis is formulated & data are collected to accept or to reject it.
The null hypothesis H0: x1 = x2 means that there is no difference between x1 & x2.
If we reject the null hypothesis, i.e. there is a difference between the 2 readings, the alternative is either H1: x1 < x2 or H1: x1 > x2 (one-sided), or H1: x1 ≠ x2 (two-sided).
The null hypothesis is rejected because x1 is different from x2.
Example: a trial compared the smoking cessation rates for smokers randomly assigned to use a nicotine patch versus a placebo patch.
Null hypothesis: Smoking cessation rate in nicotine patch group =
smoking cessation rate in placebo patch group.
Alternative hypothesis: Smoking cessation rate in nicotine patch
group ≠ smoking cessation rate in placebo patch group (2 tailed) OR
smoking cessation rate in nicotine patch group is higher than
smoking cessation rate in placebo patch group (1 tailed).
DECISION ERRORS
Type I error “α” = False +ve = Rejection of true H0
Type II error “β” = False –ve = Accepting false H0
In statistics, there are 2 ways to determine whether the evidence is likely or
unlikely given the initial assumption:
Critical value approach (favored in many of the older textbooks).
P-value approach (what is used most often in research, journal articles, and
statistical software).
If the data are not consistent with the null hypothesis, the difference is said to be "statistically significant".
If the data are consistent with the null hypothesis, we accept it, i.e. the difference is statistically insignificant.
In medicine, we usually consider that differences are significant if the probability is <0.05.
This means that if the null hypothesis is true, we shall make a wrong decision fewer than 5 times in 100.
CRITICAL VALUE
A point on the test distribution that is compared to the test statistic to
determine whether to reject the null hypothesis.
If the absolute value of your test statistic is greater than the
critical value, you can declare statistical significance and reject
the null hypothesis.
Critical values correspond to α, so their values become fixed when
you choose the test's α.
The critical value is the z-score that separates sample statistics likely to occur from those unlikely to occur. The number Zα/2 is the z-score that separates a region of α/2 from the rest of the standard normal curve.
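For example, the two-tailed critical value at α = 0.05 can be obtained from the inverse CDF of the standard normal distribution (here via Python's standard-library statistics.NormalDist):

```python
from statistics import NormalDist

# Critical value Z(alpha/2): the z-score cutting off alpha/2 = 0.025
# in the upper tail of the standard normal curve
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # ≈ 1.96
```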
TESTS OF SIGNIFICANCE
Quantitative variables:
• 1 Mean: one-sample Z-test or one-sample t-test
• 2 Means: large sample ">30": Z-test; small sample "<30": t-test; paired data: paired t-test
• >2 Means: ANOVA
Qualitative variables:
• Proportion Z-test
• X² test
ANALYSIS OF
QUANTITATIVE VARIABLES
Z TEST OR SND "STANDARD NORMAL DEVIATE"
Used for comparing 2 means of large samples (>60) using the normal distribution.
STUDENT'S T-TEST
Used for comparing two means of small samples (<60) by the t distribution instead of the normal distribution.
UNPAIRED T-TEST
X̄1 = mean of the 1st sample; X̄2 = mean of the 2nd sample
n1 = sample size of the 1st sample; n2 = sample size of the 2nd sample
SD1 = SD of the 1st sample; SD2 = SD of the 2nd sample
Degree of freedom (df) = (n1 + n2) − 2
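The slide's formula image did not survive extraction; a standard pooled-variance form of the unpaired t statistic, consistent with df = (n1 + n2) − 2, is:

```latex
t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right)}},
\qquad
S_p^2 = \frac{(n_1-1)\,SD_1^2 + (n_2-1)\,SD_2^2}{n_1+n_2-2}
```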
STUDENT'S T-TEST
The calculated value of t is compared with the values in the t-distribution table at the corresponding degrees of freedom.
If the calculated value of t is less than that in the table, then the difference
between samples is insignificant.
If the calculated t value is larger than that in the table so the difference is
significant i.e. the null hypothesis is rejected.
A big t-value gives a small p-value, i.e. statistical significance.
Suppose that you calculate t = 1.75 and df = 3.
Calculated t (1.75) < tabulated t (3.182), so the difference between samples is insignificant, i.e. the null hypothesis is accepted.
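The comparison can be scripted. A sketch of the pooled-variance unpaired t statistic (whose degrees of freedom match the lecture's (n1 + n2) − 2); the glucose readings are hypothetical:

```python
from math import sqrt
from statistics import mean, stdev

def unpaired_t(x, y):
    """Pooled-variance unpaired t statistic, df = n1 + n2 - 2."""
    n1, n2 = len(x), len(y)
    s1, s2 = stdev(x), stdev(y)
    # Pooled variance combines the two sample variances
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical fasting glucose readings in two small groups
t, df = unpaired_t([92, 88, 95, 90, 91], [97, 94, 99, 96, 93])
# |t| is then compared with the tabulated t at df = 8
```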
PAIRED T-TEST
Compares repeated observations in the same individual, or the difference between paired data.
The analysis is carried out using the mean & SD of the difference
between each pair.
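The paired analysis above can be sketched directly: compute each pair's difference, then base t on the mean & SD of those differences (the before/after readings are hypothetical):

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical before/after systolic BP in the same 6 patients
before = [150, 142, 138, 155, 160, 147]
after  = [144, 140, 135, 148, 152, 146]

# Paired t test works on the per-subject differences, df = n - 1
d = [b - a for b, a in zip(before, after)]
n = len(d)
t = mean(d) / (stdev(d) / sqrt(n))
df = n - 1
```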
ANALYSIS OF VARIANCE
(ANOVA)
Used for Comparing several means.
To compare >2 means, several t-tests could be used, but this consumes more time & leads to spurious significant results (an inflated Type I error). So, we must use analysis of variance (ANOVA).
ANALYSIS OF VARIANCE (ANOVA)
There are two main types:
One-way ANOVA
• When the subgroups to be compared are defined by just one factor
• Comparison between means of blood glucose levels among 3 groups of
diabetic patients (1st group was on insulin, 2nd group was on oral
hypoglycemic drugs, & 3rd group was on lifestyle modification)
Two-way ANOVA
• When the subdivision is based upon more than one factor.
• E.g. in the above-mentioned example, if each group were further divided into males & females.
The main idea in ANOVA is that we have to take into account the variability within the groups and between the groups; the value of F equals the ratio between the between-groups mean sum of squares and the within-groups mean sum of squares.
F = between-groups MS / within-groups MS.
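The F ratio can be computed directly from its definition. A minimal one-way ANOVA sketch (the blood glucose values for the 3 treatment groups are hypothetical):

```python
from statistics import mean

def one_way_anova_F(groups):
    """F = between-groups mean square / within-groups mean square."""
    k = len(groups)                         # number of groups
    N = sum(len(g) for g in groups)         # total observations
    grand = mean([x for g in groups for x in g])
    # Between-groups sum of squares: group means vs. grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-groups sum of squares: observations vs. their group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (N - k)
    return ms_between / ms_within

# Hypothetical blood glucose in 3 diabetic treatment groups
F = one_way_anova_F([[90, 95, 100], [110, 115, 120], [140, 150, 145]])
```

F is then compared with the tabulated F value at (k − 1, N − k) degrees of freedom.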
ANALYSIS OF
QUALITATIVE VARIABLES
CHI-SQUARE TEST
Tests the relationship between categorical variables.
Qualitative data are arranged in a table formed by rows & columns.
Variables Obese Non-Obese Total
Diabetic 62 63 125
Non-diabetic 51 44 105
Total 113 107 220
O = observed value in the table
E = expected value
Expected (E) = (Row total × Column total) / Grand total
Degree of freedom = (rows − 1) × (columns − 1)
EXAMPLE HYPOTHETICAL STUDY
Two groups of patients are treated using different spinal
manipulation techniques
Gonstead vs. Diversified
The presence or absence of pain after treatment is the outcome
measure.
Two categorical variables:
• Technique used
• Pain after treatment
GONSTEAD VS. DIVERSIFIED EXAMPLE - RESULTS
Pain after treatment
Technique Yes No Row Total
Gonstead 9 21 30
Diversified 11 29 40
Column Total 20 50
Grand Total 70
9 out of 30 (30%) still had pain after Gonstead treatment
and 11 out of 40 (27.5%) still had pain after Diversified,
but is this difference statistically significant?
FIRST FIND THE EXPECTED VALUES FOR EACH CELL
Expected (E) = (Row total × Column total) / Grand total
To find E for a cell (and similarly for the rest): multiply the cell's row total by its column total, then divide by the grand total.
Find E for all cells
Pain after treatment
Technique Yes No Row Total
Gonstead 9 (E = 30×20/70 = 8.6) 21 (E = 30×50/70 = 21.4) 30
Diversified 11 (E = 40×20/70 = 11.4) 29 (E = 40×50/70 = 28.6) 40
Column Total 20 50
Grand Total 70
Use the Χ² formula with each cell and then add them together:
(9 − 8.6)²/8.6 = 0.0186
(21 − 21.4)²/21.4 = 0.0075
(11 − 11.4)²/11.4 = 0.0140
(29 − 28.6)²/28.6 = 0.0056
Χ² = 0.0186 + 0.0075 + 0.0140 + 0.0056 ≈ 0.046
Calculated Χ² value (0.046) < tabulated value (3.841) at df = 1.
Therefore, Χ² is not statistically significant.
So, we will accept the null hypothesis.
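The whole procedure can be reproduced in a few lines. A sketch using the lecture's 2×2 table; note that keeping the expected values unrounded gives Χ² ≈ 0.0525, slightly different from a hand calculation with one-decimal E values:

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 table [[a, b], [c, d]],
    using E = row total * column total / grand total per cell."""
    observed = [[a, b], [c, d]]
    row = [a + b, c + d]
    col = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            e = row[i] * col[j] / n          # expected count
            chi2 += (observed[i][j] - e) ** 2 / e
    return chi2

# Gonstead vs. Diversified table from the lecture
chi2 = chi_square_2x2(9, 21, 11, 29)
```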
Z TEST FOR COMPARING 2 PERCENTAGES "PROPORTION Z-TEST"
Z = (p1 − p2) / √(p1q1/n1 + p2q2/n2)
p1 = % in the 1st group; p2 = % in the 2nd group
q1 = 100 − p1; q2 = 100 − p2
n1 = sample size of the 1st group; n2 = sample size of the 2nd group
The Z test is significant (at the 0.05 level) if the result is >2 (strictly, >1.96).
EXAMPLE
The number of anemic patients in group 1, which includes 50 patients, is 5, and the number of anemic patients in group 2, which contains 60 patients, is 20. To test whether groups 1 & 2 differ statistically in the prevalence of anemia, we calculate the Z test.
p1 = 5/50 = 10%; p2 = 20/60 = 33%; q1 = 100 − 10 = 90; q2 = 100 − 33 = 67
Z = |10 − 33| / √(10×90/50 + 33×67/60)
Z = 23 / √(18 + 36.85) = 23/7.4 = 3.1
So, there is statistically significant difference between percentages of
anemia in the studied groups (because Z>2).
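The slide's (unpooled, percentage-based) formula translates directly to code. A sketch using the anemia example's numbers:

```python
from math import sqrt

def z_two_proportions(x1, n1, x2, n2):
    """Two-proportion Z test in the lecture's form: percentages,
    unpooled variance, q = 100 - p."""
    p1, p2 = 100 * x1 / n1, 100 * x2 / n2
    q1, q2 = 100 - p1, 100 - p2
    return abs(p1 - p2) / sqrt(p1 * q1 / n1 + p2 * q2 / n2)

# 5/50 anemic in group 1 vs. 20/60 anemic in group 2
z = z_two_proportions(5, 50, 20, 60)   # significant, since z > 1.96
```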
CORRELATION & REGRESSION
Correlation measures the closeness of the association between 2
continuous variables, while Linear regression gives the equation of
the straight line that best describes & enables the prediction of one
variable from the other.
CORRELATION IS NOT CAUSATION!!!
LINEAR REGRESSION
Same as correlation:
• Determines the relation & prediction of the change in a variable due to changes in another variable.
• t-test is also used for the assessment of the level of significance.
Different from correlation:
• The independent variable has to be specified from the dependent variable.
• The dependent variable in linear regression must be a continuous one.
• Allows the prediction of the dependent variable for a particular independent variable ("but should not be used outside the range of the original data").
CORRELATION
Measured by the correlation coefficient, r. The value of r ranges between +1 and −1.
"1" means perfect correlation while "0" means no correlation.
An r value near zero means weak correlation, while a value near one means strong correlation. The − and + signs denote the direction of the correlation.
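Computing r from its definition is straightforward. A minimal sketch (the data are hypothetical; they lie exactly on a line, so r = 1):

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient r, always in [-1, +1]."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Perfectly (positively) correlated data gives r = +1
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```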
REGRESSION
LINEAR REGRESSION
Used to determine the relation & prediction of the change in a
variable due to changes in another variable.
For linear regression, the independent factor (x) must be specified
from the dependent variable (y).
Also allows the prediction of dependent variable for a particular
independent variable
SCATTERPLOTS
An X-Y graph with symbols that represent the values of 2 variables, with the regression line drawn through the points.
LINEAR REGRESSION
However, regression for prediction should not be used outside the range of the original data.
The t-test is also used for the assessment of the level of significance.
The dependent variable in linear regression must be a continuous one.
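The least-squares line can be fitted from the same sums used for correlation. A sketch with hypothetical data; note the prediction is made within the observed x range, as advised above:

```python
from statistics import mean

def linear_regression(x, y):
    """Least-squares line y = a + b*x (a = intercept, b = slope)."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Hypothetical data; predict y for a new x *within* the observed range
a, b = linear_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.9])
y_pred = a + b * 3.5
```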
MULTIPLE LINEAR REGRESSION
The dependency of a dependent variable on several independent
variables, not just one.
The test of significance used is ANOVA (the F test).
EXAMPLE
Suppose neonatal birth weight depends on these factors: gestational age, length of the baby, and head circumference. Each factor correlates significantly with baby birth weight (i.e. has a +ve linear correlation).
We can do multiple regression analysis to obtain a mathematical equation by
which we can predict the birth weight of any neonate if we know the values of
these factors.
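A sketch of that idea: the data below are synthetic (the coefficients, ranges, and noise level are made up for illustration), but the fitting step is ordinary least squares, the core of multiple linear regression:

```python
import numpy as np

# Synthetic "neonates": birth weight (g) follows a made-up linear rule
rng = np.random.default_rng(0)
n = 200
gest_age = rng.uniform(36, 42, n)     # gestational age, weeks
length = rng.uniform(45, 55, n)       # length of baby, cm
head_circ = rng.uniform(32, 37, n)    # head circumference, cm

# Assumed (hypothetical) relationship plus random noise
weight = 100 * gest_age + 40 * length + 30 * head_circ - 3000 \
         + rng.normal(0, 50, n)

# Design matrix with an intercept column; recover the coefficients
X = np.column_stack([np.ones(n), gest_age, length, head_circ])
coef, *_ = np.linalg.lstsq(X, weight, rcond=None)

# The fitted equation predicts birth weight for any new neonate
predicted = X @ coef
```

The recovered coefficients approximate the true values used to generate the data, which is exactly the "mathematical equation" the lecture describes for prediction.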