Inferential Statistics
Inferential statistics infer from the sample
to the population.
They are statistics that use sample data to make
decisions or inferences about a population.
They help assess the strength of the relationship
between your independent (causal)
variables and your dependent (effect)
variables.
Purposes
Estimating population parameters from
sample data
Testing hypotheses
Hypothesis Testing
Hypothesis: A premise or claim that we
want to test
Hypothesis testing: The process of deciding statistically
whether the findings of an investigation
reflect chance or real effects at a given
level of probability.
It is also called significance testing.
Elements of Hypothesis Testing
Null Hypothesis
Alternative hypothesis
Identify level of significance
Test statistic
Identify p-value
Conclusion
Hypothesis Testing
H0: There is no association between the
exposure and disease of interest
H1: There is an association between the
exposure and disease of interest
Hypothesis Testing
Hypothesis: Hygiene procedures are
effective in preventing colds.
State 2 hypotheses:
Null: H0 : Hand-washing has no effect on
bacteria counts.
Alternative: Ha : Hand-washing has an
effect on bacteria counts.
Hypothesis Testing
Two types of error can occur when testing
the association between exposure and
disease:
Type 1 error: observing a difference when
in truth there is none (a false positive)
Type 2 error: failing to observe a
difference when there is one (a false negative)
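The Type 1 error rate can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated (not from any study mentioned here), the population is normal with a known standard deviation of 1, and the function name and parameters are hypothetical. Because the null hypothesis is true by construction, every rejection is a Type 1 error, and the observed rate should be close to the chosen 0.05 level.

```python
import math
import random

Z_CRIT = 1.96  # two-sided critical value for a 0.05 significance level


def type_one_error_rate(trials=2000, n=50, seed=0):
    """Fraction of simulated studies that reject H0 although H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: the data come from N(0, 1), so the true mean is 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample_mean = sum(sample) / n
        # Z statistic with known sigma = 1 (an assumption of this sketch).
        z = sample_mean / (1.0 / math.sqrt(n))
        if abs(z) > Z_CRIT:
            rejections += 1  # a difference was "observed" where none exists
    return rejections / trials


rate = type_one_error_rate()
print(f"Observed Type 1 error rate: {rate:.3f}")
```

A Type 2 error could be simulated the same way by drawing the data from a population whose true mean is not 0 and counting the trials that fail to reject.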
Example - Efficacy Test for New
drug
Drug company has new drug, wishes to
compare it with current standard treatment
Federal regulators tell company that they
must demonstrate that new drug is better
than current treatment to receive approval
Firm runs clinical trial where some patients
receive new drug, and others receive
standard treatment
Numeric response of therapeutic effect is
obtained (higher scores are better).
Example - Efficacy Test for New
drug
Type I error - Concluding that the new
drug is better than the standard (HA)
when in fact it is no better (H0).
Ineffective drug is deemed better.
Type II error - Failing to conclude that
the new drug is better (HA) when in fact
it is. Effective drug is deemed to be no
better.
p-value
When you perform a hypothesis test in
statistics, a p-value helps you determine
the significance of your results.
A small p-value (typically ≤ 0.05) indicates
strong evidence against the null
hypothesis, so you reject the null
hypothesis.
A large p-value (> 0.05) indicates weak
evidence against the null hypothesis, so
you fail to reject the null hypothesis.
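The decision rule above can be sketched in a few lines. The example assumes a test statistic that follows a standard normal distribution (a Z statistic) and a two-sided test; the function names and the example z values are hypothetical.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic z."""
    # P(|Z| >= |z|) for Z ~ N(0, 1), written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 when the p-value is at or below alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


for z in (2.5, 1.0):  # hypothetical test statistics
    p = two_sided_p_value(z)
    print(f"z = {z}: p = {p:.4f} -> {decide(p)}")
```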
Confidence interval (CI)
A related, but more informative, measure
known as the confidence interval (CI) can
also be calculated.
CI = a range of values within which the
true population value falls, with a certain
degree of assurance (probability).
Confidence Interval - Definition
A range of values for a variable
constructed so that this range has a
specified probability of including the true
value of the variable
A measure of the study’s precision
Confidence interval
◦ 95% C.I. means that, over repeated samples, an interval
of about 2 standard errors (more precisely 1.96) on either
side of the sample estimate of effect (mean, risk, rate)
includes the true population value 95 times out
of 100
Interpreting Results
Confidence Interval: Range of values for a point
estimate that has a specified probability of including
the true value of the parameter.
Confidence level: refers to the percentage of all
possible samples that can be expected to include the
true population parameter. For example, suppose all
possible samples were selected from the same
population, and a confidence interval were computed
for each sample. A 95% confidence level implies that
95% of the confidence intervals would include the
true population parameter.
Confidence Limits: The upper and lower end points
of the confidence interval.
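As an illustration of these terms, the sketch below computes the confidence limits of a 95% confidence interval for a mean. It assumes a large sample, so the normal multiplier 1.96 is used; the data are simulated, and the variable names (including the blood pressure framing) are hypothetical.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Return (lower, upper) confidence limits for the population mean."""
    n = len(sample)
    point_estimate = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return point_estimate - z * standard_error, point_estimate + z * standard_error


# Simulated sample of 100 values (hypothetical systolic blood pressures).
rng = random.Random(1)
sample = [rng.gauss(120, 15) for _ in range(100)]

lower, upper = mean_confidence_interval(sample)
print(f"Point estimate: {statistics.mean(sample):.1f}")
print(f"95% confidence limits: {lower:.1f} to {upper:.1f}")
```

A narrower interval indicates a more precise estimate; increasing the sample size shrinks the standard error and therefore the interval.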
Real Life Example of a
Confidence Interval
The U.S. Census Bureau routinely uses
confidence levels of 90% in its surveys. One
survey of the number of people in poverty in
1995 stated a confidence level of 90% for the
statistic “The number of people in poverty in the
United States is 35,534,124 to 37,315,094.” That
means that if the Census Bureau repeated the survey
many times using the same techniques, about 90 percent
of the intervals computed this way would include the
true number of people in poverty. The stated figure
(35,534,124 to 37,315,094) is the confidence
interval.
Selection of Tests of Significance
Hypothesis Testing
Test statistic:
If n > 30, we use the Z test
If n < 30, we use the t test
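A minimal sketch of this rule for a one-sample test of a mean is shown below. The statistic has the same form in both cases (sample mean minus hypothesized mean, divided by the standard error); only the reference distribution changes. The rule above does not say what to do when n is exactly 30, so grouping that case with the t test is an assumption of this sketch, as are the function name and the data.

```python
import math
import statistics


def one_sample_test(sample, mu0):
    """Return the test chosen by sample size and the test statistic."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    statistic = (statistics.mean(sample) - mu0) / standard_error
    # Assumption: n == 30 is not covered by the rule, so it is treated as small here.
    test = "Z test" if n > 30 else "t test"
    return test, statistic


# Hypothetical small sample tested against a hypothesized mean of 70.
weights = [68, 72, 75, 71, 69, 74, 73, 70]
test, statistic = one_sample_test(weights, mu0=70)
print(f"{test}: statistic = {statistic:.3f}")
```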
Hypothesis testing for difference
between two independent means
The independent-samples t test is used
Example: in a study of the effect of age on
practicing breast self-examination (BSE)
H0: there is no age difference between
women who perform BSE and women
who do not
Ha: there is an age difference
Level of significance = 0.05
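The t statistic for two independent groups can be sketched as follows. This version assumes equal variances in the two groups (the pooled-variance form of the test); the ages are made-up numbers for illustration, not data from the BSE study, and the function name is hypothetical.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, n_a + n_b - 2


# Hypothetical ages of women who perform BSE and women who do not.
ages_bse = [34, 41, 38, 45, 39, 36, 42]
ages_no_bse = [29, 33, 31, 37, 30, 35, 28]

t, df = independent_t(ages_bse, ages_no_bse)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The p-value is then read from the t distribution with the returned degrees of freedom and compared with the 0.05 level of significance.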
Hypothesis testing for paired
samples
The paired-samples t test is used for analysis
Example: to study the level of security in
Libyan hospitals before and after the 17th
revolution, we asked 206 doctors who work
in Emergency departments to rate
security on a scale from 1 to 10,
where 1 = the worst security and
10 = the best
H0: there is no difference in security level
before and after the revolution
Ha: there is a difference before and after
Alpha = 0.05
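For paired data the test works on the difference within each pair. The sketch below shows the calculation with a handful of invented before/after ratings; these are not the responses of the 206 doctors, and the function name is hypothetical.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for matched before/after scores."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    # One difference per subject: after minus before.
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1


# Hypothetical security ratings (1 = worst, 10 = best) from the same six doctors.
before = [7, 8, 6, 9, 7, 8]
after = [4, 6, 5, 5, 3, 6]

t, df = paired_t(before, after)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

A negative t here means the ratings fell after the event. Note that the calculation is undefined when every difference is identical, because the standard deviation of the differences is then zero.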
Chi square test
Tests the significance of the difference between two
proportions
Can be used for more than two groups
Example: in the same doctors study we
want to know whether the gender of doctors is
associated with violence
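The chi square statistic for a 2 x 2 table can be sketched as below. The counts are invented for illustration and are not results from the doctors study; the function name is hypothetical. No continuity correction is applied, and the p-value formula used here is valid only for 1 degree of freedom, which is the case for a 2 x 2 table.

```python
import math


def chi_square_2x2(table):
    """Chi square statistic and p-value (1 degree of freedom) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, chi2 is the square of a standard normal variable.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = male / female doctors, columns = exposed / not exposed to violence.
counts = [[40, 60], [25, 81]]
chi2, p_value = chi_square_2x2(counts)
print(f"chi square = {chi2:.3f}, p = {p_value:.4f}")
```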
Correlation
Finding the relationship between two
quantitative variables without being able
to infer causal relationships
Correlation is a statistical technique used
to determine the degree to which two
variables are related
Simple Correlation coefficient (r)
Statistic showing the degree of relation
between two variables
It is also called Pearson's correlation or
product moment correlation coefficient.
It measures the nature and strength
of the relationship between two variables of
the quantitative type.
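The coefficient can be computed directly from its product moment definition: the sum of the products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The sketch below uses made-up values, and the function and variable names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation coefficient for paired values x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements (for example, age in years and a test score).
ages = [25, 32, 41, 47, 53, 60]
scores = [48, 52, 61, 60, 70, 74]
print(f"r = {pearson_r(ages, scores):.3f}")
```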
Scatter diagram
Rectangular coordinate
Two quantitative variables
One variable is called independent (X)
and the second is called dependent (Y)
Points are not joined
No frequency table
Scatter plots
The pattern of data is indicative of the
type of relationship between your two
variables:
positive relationship
negative relationship
no relationship
Simple Correlation coefficient (r)
The sign of r denotes the nature of
association
while the value of r denotes the strength
of association.
Simple Correlation coefficient (r)
If the sign is +ve, this means the relation is
direct (an increase in one variable is
associated with an increase in the other
variable and a decrease in one variable is
associated with a decrease in the other
variable).
While if the sign is -ve this means an
inverse or indirect relationship (which
means an increase in one variable is
associated with a decrease in the other).
If r = 0 this means no association or
correlation between the two variables.
If 0 < r < 0.25 = weak correlation.
If 0.25 ≤ r < 0.75 = intermediate
correlation.
If 0.75 ≤ r < 1 = strong correlation.
If r = 1 = perfect correlation.
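These cut-offs can be written as a small helper. The sketch assumes the thresholds apply to the absolute value of r, so that a negative coefficient is graded by the same scale and its sign only gives the direction; the function name and return labels are hypothetical.

```python
def describe_correlation(r):
    """Return (direction, strength) for a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    # The sign gives the nature of the association.
    if r > 0:
        direction = "direct"
    elif r < 0:
        direction = "inverse"
    else:
        direction = "none"
    # Assumption: the strength thresholds are applied to the absolute value of r.
    size = abs(r)
    if size == 0:
        strength = "no correlation"
    elif size < 0.25:
        strength = "weak"
    elif size < 0.75:
        strength = "intermediate"
    elif size < 1:
        strength = "strong"
    else:
        strength = "perfect"
    return direction, strength


for r in (0.1, -0.5, 0.8, 1):
    print(r, describe_correlation(r))
```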