Unit 9: Inferential Statistics

INFERENTIAL STATISTICS

Dr. Dalia El-Shafei
Assistant Professor, Community Medicine Department, Zagazig University
DEFINITION OF STATISTICS

Branch of mathematics concerned with the Collection, Summarization, Presentation, Analysis, and Interpretation of data.

TYPES OF STATISTICS

Descriptive statistics:
• Describe or summarize the data of a target population.
• Describe the data which is already known.
• Organize, analyze & present data in a meaningful manner.
• Final results are shown in the form of tables and graphs.
• Tools: measures of central tendency & dispersion.

Inferential statistics:
• Use data to make inferences or generalizations about a population.
• Make conclusions for a population that is beyond the available data.
• Compare, test and predict future outcomes.
• Final results are probability scores.
• Tools: hypothesis tests.
INFERENCE

Inference involves making a generalization about a larger group of individuals on the basis of a subset or sample.

Inferential statistics comprises two branches:

1. Hypothesis testing:
• Hypothesis formulation: null hypothesis "H0" & alternative hypothesis "H1".
• Set level of significance: α error.
• Choosing the test: depends on whether the data are quantitative or qualitative.
• Decision approach: p-value or critical value.
• Decision: accept H0 or reject H0.

2. Estimation:
• Point estimate.
• Interval estimate "Confidence interval".
CONFIDENCE LEVEL & INTERVAL "INTERVAL ESTIMATE"

Confidence interval "Interval estimate": The range of values that is used to estimate the true value of the population parameter.

Confidence level: The probability that the confidence interval does, in fact, contain the true population parameter, assuming that the estimation process is repeated many times (1−α).
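As an illustrative sketch of an interval estimate for a mean (all numbers hypothetical, assuming a large sample so the normal approximation applies), the interval is mean ± Z(α/2)·SD/√n:

```python
import math

from scipy.stats import norm


def mean_confidence_interval(mean, sd, n, confidence=0.95):
    # z-score cutting off (1 - confidence)/2 in each tail, e.g. 1.96 for 95%
    z = norm.ppf(1 - (1 - confidence) / 2)
    margin = z * sd / math.sqrt(n)
    return (mean - margin, mean + margin)


# hypothetical sample: mean blood pressure 120 mmHg, SD 15, n = 100
low, high = mean_confidence_interval(120, 15, 100)
print(round(low, 2), round(high, 2))  # → 117.06 122.94
```

With 95% confidence, the true population mean lies between about 117 and 123 mmHg.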
HYPOTHESIS TESTING

Hypothesis testing determines whether the observed variation among samples is explained by sampling variation (chance) or reflects a real difference between groups.

The method of assessing the hypotheses is known as a "significance test".

Significance testing is a method for assessing whether a result is likely to be due to chance or due to a real effect.
NULL & ALTERNATIVE HYPOTHESES

• In hypothesis testing, a specific hypothesis is formulated & data are collected to accept or reject it.
• The null hypothesis H0: x1 = x2 means that there is no difference between x1 & x2.
• If we reject the null hypothesis, i.e. there is a difference between the 2 readings, the alternative is either H1: x1 < x2 or H1: x1 > x2 (one-tailed), or H1: x1 ≠ x2 (two-tailed).
• The null hypothesis is rejected because x1 is different from x2.
EXAMPLE

A trial compared the smoking cessation rates for smokers randomly assigned to use a nicotine patch versus a placebo patch.

• Null hypothesis: Smoking cessation rate in the nicotine patch group = smoking cessation rate in the placebo patch group.
• Alternative hypothesis: Smoking cessation rate in the nicotine patch group ≠ smoking cessation rate in the placebo patch group (2-tailed), OR smoking cessation rate in the nicotine patch group is higher than that in the placebo patch group (1-tailed).
DECISION ERRORS

Type I error "α" = False +ve = Rejection of a true H0.
Type II error "β" = False −ve = Accepting a false H0.

In statistics, there are 2 ways to determine whether the evidence is likely or unlikely given the initial assumption:
• Critical value approach (favored in many of the older textbooks).
• P-value approach (used most often in research, journal articles, and statistical software).

• If the data are not consistent with the null hypothesis, the difference is said to be "statistically significant".
• If the data are consistent with the null hypothesis, we accept it, i.e. the difference is statistically insignificant.
• In medicine, we usually consider differences significant if the probability is <0.05.
• This means that if the null hypothesis is true, we shall make a wrong decision fewer than 5 times in a 100.
CRITICAL VALUE

A point on the test distribution that is compared to the test statistic to determine whether to reject the null hypothesis.

If the absolute value of your test statistic is greater than the critical value, you can declare statistical significance and reject the null hypothesis.

Critical values correspond to α, so their values become fixed when you choose the test's α.

The critical value is the z-score that separates sample statistics likely to occur from those unlikely to occur. The number Zα/2 is the z-score that separates a region of α/2 from the rest of the standard normal curve.
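A minimal sketch of how Zα/2 can be obtained from the standard normal distribution (using SciPy's inverse CDF; α = 0.05 is an example choice):

```python
from scipy.stats import norm

alpha = 0.05
# two-tailed critical value Z(alpha/2): cuts off alpha/2 in each tail
z_crit = norm.ppf(1 - alpha / 2)
print(round(z_crit, 2))  # → 1.96
```

Any test statistic whose absolute value exceeds 1.96 is then declared significant at the 0.05 level.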
TESTS OF SIGNIFICANCE

Quantitative variables:
• 1 Mean: one-sample Z-test (large sample) or one-sample t-test (small sample).
• 2 Means: large sample ">30" → Z-test; small sample "<30" → t-test or paired t-test.
• >2 Means: ANOVA.

Qualitative variables:
• Proportion Z-test.
• Χ² test.
ANALYSIS OF QUANTITATIVE VARIABLES
Z TEST OR SND "STANDARD NORMAL DEVIATE"

• Used for comparing 2 means of large samples (>60) using the normal distribution.
STUDENT'S T-TEST

• Used for comparing two means of small samples (<60) using the t distribution instead of the normal distribution.
UNPAIRED T-TEST

• X1 = mean of the 1st sample; X2 = mean of the 2nd sample.
• n1 = sample size of the 1st sample; n2 = sample size of the 2nd sample.
• SD1 = SD of the 1st sample; SD2 = SD of the 2nd sample.
• Degrees of freedom (df) = (n1 + n2) − 2.
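Given only these summary statistics, the unpaired t-test can be sketched with SciPy (all numbers hypothetical; the default pooled-variance form is used):

```python
from scipy.stats import ttest_ind_from_stats

# hypothetical summary statistics for two independent samples
t, p = ttest_ind_from_stats(mean1=5.2, std1=1.1, nobs1=25,
                            mean2=4.5, std2=1.3, nobs2=22)
df = (25 + 22) - 2  # degrees of freedom = (n1 + n2) - 2, as above
print(round(t, 2), df)  # t ≈ 2.0 at df = 45; p just above 0.05
```

The calculated t is then compared with the tabulated t at df = 45, exactly as described below.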


STUDENT'S T-TEST

• The value of t is compared to the values in the specific table of the "t distribution test" at the corresponding degrees of freedom.
• If the calculated value of t is less than that in the table, the difference between samples is insignificant.
• If the calculated t value is larger than that in the table, the difference is significant, i.e. the null hypothesis is rejected.
• A big t-value corresponds to a small P-value, i.e. statistical significance.

Example: suppose the calculated t = 1.75 and df = 3. Calculated t (1.75) < tabulated t (3.182), so the difference between samples is insignificant, i.e. the null hypothesis is accepted.
PAIRED T-TEST

• Compares repeated observations in the same individual, or the difference between paired data.
• The analysis is carried out using the mean & SD of the difference between each pair.
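A short sketch of a paired t-test in SciPy (the before/after values are hypothetical; the test operates on the pairwise differences, as described above):

```python
from scipy.stats import ttest_rel

# hypothetical systolic BP for the same 8 patients before & after treatment
before = [140, 135, 150, 160, 145, 155, 138, 162]
after = [132, 130, 148, 152, 141, 149, 137, 155]
t, p = ttest_rel(before, after)  # uses the mean & SD of the paired differences
print(round(t, 2), p < 0.01)  # → 5.49 True
```

Because each patient serves as his or her own control, the paired test is more sensitive than an unpaired test on the same numbers.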
ANALYSIS OF VARIANCE (ANOVA)

• Used for comparing several means.
• Comparing >2 means with several t-tests consumes more time & leads to spurious significant results, so we must use analysis of variance (ANOVA).

There are two main types:

One-way ANOVA
• When the subgroups to be compared are defined by just one factor.
• Example: comparison between means of blood glucose levels among 3 groups of diabetic patients (1st group on insulin, 2nd group on oral hypoglycemic drugs, & 3rd group on lifestyle modification).

Two-way ANOVA
• When the subdivision is based upon more than one factor.
• Example: in the above-mentioned study, the groups are further divided into males & females.

The main idea in ANOVA is that we have to take into account the variability within the groups and between the groups; the value of F equals the ratio of the mean sum of squares between the groups to that within the groups:
F = between-groups MS / within-groups MS.
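A one-way ANOVA along the lines of the diabetes example can be sketched as follows (the glucose values are hypothetical):

```python
from scipy.stats import f_oneway

# hypothetical fasting blood glucose (mg/dL) for the 3 treatment groups
insulin = [110, 115, 108, 120, 112]
oral_drugs = [125, 130, 128, 122, 127]
lifestyle = [140, 138, 145, 150, 142]

# F = between-groups MS / within-groups MS
F, p = f_oneway(insulin, oral_drugs, lifestyle)
print(round(F, 1), p < 0.001)  # → 63.6 True
```

A large F means the variability between group means dwarfs the variability within groups, so at least one group mean differs significantly.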
ANALYSIS OF QUALITATIVE VARIABLES
CHI-SQUARE TEST

• Tests relationships between categorical variables.
• Qualitative data are arranged in a table formed by rows & columns.

Variables      Obese   Non-Obese   Total
Diabetic       62      63          125
Non-diabetic   51      44          105
Total          113     107         220

• O = Observed value in the table.
• E = Expected value.
• Expected (E) = (Row total × Column total) / Grand total.
• Degrees of freedom = (rows − 1) × (columns − 1).
EXAMPLE: HYPOTHETICAL STUDY

• Two groups of patients are treated using different spinal manipulation techniques: Gonstead vs. Diversified.
• The presence or absence of pain after treatment is the outcome measure.
• Two categories: technique used & pain after treatment.
GONSTEAD VS. DIVERSIFIED EXAMPLE - RESULTS

Pain after treatment
Technique      Yes   No   Row Total
Gonstead       9     21   30
Diversified    11    29   40
Column Total   20    50   70 (Grand Total)

9 out of 30 (30%) still had pain after Gonstead treatment and 11 out of 40 (27.5%) still had pain after Diversified, but is this difference statistically significant?
FIRST FIND THE EXPECTED VALUES FOR EACH CELL

Expected (E) = (Row total × Column total) / Grand total

For each cell: multiply its row total by its column total, then divide by the grand total.

Find E for all cells in the same way:

Pain after treatment
Technique      Yes                       No                        Row Total
Gonstead       9 (E = 30×20/70 = 8.6)    21 (E = 30×50/70 = 21.4)  30
Diversified    11 (E = 40×20/70 = 11.4)  29 (E = 40×50/70 = 28.6)  40
Column Total   20                        50                        70 (Grand Total)
 2
Use the Χ formula with each cell and then add them together

(9 - 8.6)2 (21 - 21.4)2 0.018


0.0168
8.6 21.4 6
=
(11 - 11.4) (29 - 28.6)
2 2
0.031
0.0056
11.4 28.6 6
Χ2 = 0.0186 + 0.0168 + 0.0316 + 0.0056 = 0.0726
Calculated Χ² value (≈0.05) < tabulated value (3.841) at df = 1.
Therefore, Χ² is not statistically significant, so we accept the null hypothesis.
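As a cross-check, SciPy can run the same test directly on the observed table; it works with unrounded expected values, so its statistic (≈0.05) differs slightly from a hand calculation with rounded E's, but the conclusion is identical:

```python
from scipy.stats import chi2_contingency

# observed Gonstead vs. Diversified table from the example above
table = [[9, 21],
         [11, 29]]
chi2, p, df, expected = chi2_contingency(table, correction=False)
print(df, p > 0.05)  # → 1 True : not significant, so accept H0
```

`correction=False` disables Yates' continuity correction so the result matches the plain (O − E)²/E formula used above.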
Z TEST FOR COMPARING 2 PERCENTAGES "PROPORTION Z-TEST"

Z = (p1 − p2) / √(p1q1/n1 + p2q2/n2)

• p1 = % in the 1st group; p2 = % in the 2nd group.
• q1 = 100 − p1; q2 = 100 − p2.
• n1 = sample size of the 1st group; n2 = sample size of the 2nd group.
• The Z test is significant (at the 0.05 level) if the result > 2 (approximately 1.96).
EXAMPLE

The number of anemic patients in group 1, which includes 50 patients, is 5, and the number of anemic patients in group 2, which contains 60 patients, is 20. To test whether groups 1 & 2 differ statistically in the prevalence of anemia, we calculate the Z test.

p1 = 5/50 = 10%    p2 = 20/60 = 33%    q1 = 100 − 10 = 90    q2 = 100 − 33 = 67
Z = (10 − 33) / √(10×90/50 + 33×67/60)
|Z| = 23 / √(18 + 36.85) = 23 / 7.4 = 3.1

So, there is a statistically significant difference between the percentages of anemia in the studied groups (because Z > 2).
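The same calculation can be scripted directly from the formula above; the figures reproduce the anemia example (5/50 vs. 20/60):

```python
import math


def proportion_z(x1, n1, x2, n2):
    # Z = (p1 - p2) / sqrt(p1*q1/n1 + p2*q2/n2), with p and q as percentages
    p1, p2 = 100 * x1 / n1, 100 * x2 / n2
    q1, q2 = 100 - p1, 100 - p2
    return (p1 - p2) / math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)


# 5 of 50 anemic in group 1, 20 of 60 in group 2 (example above)
z = proportion_z(5, 50, 20, 60)
print(round(abs(z), 1))  # → 3.1, which is > 1.96, so significant
```

Using the unrounded p2 (33.3%) gives |Z| ≈ 3.1, matching the hand calculation.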
CORRELATION & REGRESSION

Correlation measures the closeness of the association between 2 continuous variables, while linear regression gives the equation of the straight line that best describes & enables the prediction of one variable from the other.

CORRELATION IS NOT CAUSATION!!!
LINEAR REGRESSION

Same as correlation:
• Determines the relation between variables & the prediction of the change in one variable due to changes in the other.
• The t-test is also used for the assessment of the level of significance.

Different from correlation:
• The independent factor has to be specified from the dependent variable.
• The dependent variable in linear regression must be a continuous one.
• Allows the prediction of the dependent variable for a particular independent variable "But should not be used outside the range of the original data".
CORRELATION

• Measured by the correlation coefficient, r. The value of r ranges between +1 and −1.
• "1" means perfect correlation while "0" means no correlation.
• An r value near zero means weak correlation, while a value near one means strong correlation. The signs − and + denote the direction of the correlation.
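A minimal sketch of computing r with SciPy (the paired height/weight measurements are hypothetical):

```python
from scipy.stats import pearsonr

# hypothetical paired measurements of height (cm) & weight (kg)
height = [160, 165, 170, 175, 180]
weight = [55, 60, 66, 70, 78]
r, p = pearsonr(height, weight)
print(round(r, 2))  # → 0.99 : strong positive correlation
```

Here r is close to +1, indicating a strong positive (direct) correlation between the two continuous variables.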
LINEAR REGRESSION

• Used to determine the relation between variables & the prediction of the change in one variable due to changes in another.
• For linear regression, the independent factor (x) must be specified from the dependent variable (y).
• Also allows the prediction of the dependent variable for a particular independent variable.
SCATTERPLOTS

• An X-Y graph with symbols that represent the values of 2 variables, with the regression line drawn through them.
LINEAR REGRESSION

• However, regression should not be used for prediction outside the range of the original data.
• The t-test is also used for the assessment of the level of significance.
• The dependent variable in linear regression must be a continuous one.
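The points above can be sketched with SciPy's simple linear regression (the height/weight data are hypothetical; note the prediction stays within the observed x range):

```python
from scipy.stats import linregress

# hypothetical data: predict weight (y, kg) from height (x, cm)
height = [160, 165, 170, 175, 180]
weight = [55, 60, 66, 70, 78]
fit = linregress(height, weight)

# predict only within the range of the original data (160-180 cm)
predicted = fit.slope * 172 + fit.intercept
print(round(fit.slope, 2), round(predicted, 1))  # → 1.12 68.0
```

The fitted line is y = slope·x + intercept; `fit.pvalue` gives the significance of the slope, assessed with a t-test as noted above.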
MULTIPLE LINEAR REGRESSION

• Models the dependency of a dependent variable on several independent variables, not just one.
• The test of significance used is ANOVA (F test).
EXAMPLE

• Neonatal birth weight depends on these factors: gestational age, length of the baby, and head circumference. Each factor correlates significantly with baby birth weight (i.e. has a +ve linear correlation).
• We can do multiple regression analysis to obtain a mathematical equation by which we can predict the birth weight of any neonate if we know the values of these factors.
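A least-squares sketch of such a prediction equation (all data hypothetical, fitted with plain NumPy rather than a statistics package):

```python
import numpy as np

# hypothetical neonates: gestational age (wk), length (cm), head circumference (cm)
X = np.array([[38.0, 48.0, 33.0],
              [40.0, 51.0, 35.0],
              [36.0, 45.0, 32.0],
              [39.0, 49.0, 34.0],
              [41.0, 52.0, 35.0],
              [37.0, 46.0, 33.0]])
y = np.array([3.0, 3.5, 2.6, 3.2, 3.6, 2.9])  # birth weight (kg)

# add an intercept column and fit the equation by ordinary least squares
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# predicted birth weight for a neonate with known factor values
pred = float(np.array([1.0, 39.0, 49.0, 34.0]) @ coef)
print(round(pred, 2))
```

The fitted coefficients give the equation birth weight = b0 + b1·(gestational age) + b2·(length) + b3·(head circumference), and the overall fit would be assessed with the ANOVA F test mentioned above.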
