Hypothesis Testing
ECE 3530 – Spring 2010
Antonio Paiva
What is hypothesis testing?
A statistical hypothesis is an assertion or
conjecture concerning one or more populations.
To prove that a hypothesis is true, or false, with absolute
certainty, we would need absolute knowledge. That is, we
would have to examine the entire population.
Instead, hypothesis testing is concerned with how to use a random sample to judge whether it provides evidence that supports the hypothesis or not.
Hypothesis testing is formulated in terms of two hypotheses:
H0: the null hypothesis;
H1: the alternate hypothesis.
The hypothesis we want to test is whether H1 is "likely" true.
So, there are two possible outcomes:
Reject H0 and accept H1 because of sufficient evidence in the sample in favor of H1;
Do not reject H0 because of insufficient evidence to support H1.
Very important!!
Note that failure to reject H0 does not mean the null hypothesis is true. There is no formal outcome that says "accept H0." It only means that we do not have sufficient evidence to support H1.
Example
In a jury trial the hypotheses are:
H0: defendant is innocent;
H1: defendant is guilty.
H0 (innocent) is rejected if H1 (guilty) is supported by evidence beyond "reasonable doubt." Failure to reject H0 (i.e., failure to prove guilt) does not imply innocence, only that the evidence is insufficient to reject it.
Case study
A company manufacturing RAM chips claims the
defective rate of the population is 5%. Let p denote the
true defective probability. We want to test if:
H0: p = 0.05
H1: p > 0.05
We are going to use a sample of 100 chips
from the production to test.
Let X denote the number of defective chips in the sample of 100.
Reject H0 if X ≥ 10 (chosen "arbitrarily" in this case).
X is called the test statistic.
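[Figure: number line of X from 0 to 100. Values X < 10 form the "do not reject H0" region (p = 0.05); values X ≥ 10 form the critical region (reject H0, p > 0.05); 10 is the critical value.]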
Why did we choose a critical value of 10 for this example?
Because this is a Bernoulli process, the expected number of defectives in a sample is np. So, if p = 0.05 we should expect 100 × 0.05 = 5 defectives in a sample of 100 chips. Therefore, 10 defectives would be strong evidence that p > 0.05.
The problem of how to find a critical value for a desired level of significance of the hypothesis test will be studied later.
Types of errors
Because we are making a decision based on a finite sample, there is a possibility that we will make mistakes. The possible outcomes are:
                    H0 is true          H1 is true
Do not reject H0    Correct decision    Type II error
Reject H0           Type I error        Correct decision
Definition
The acceptance of H1 when H0 is true is called a Type I error. The probability of committing a type I error is called the level of significance and is denoted by α.
Example
Convicting the defendant when he is innocent!
The lower the significance level α, the less likely we are to commit a type I error. Generally, we would like small values of α; typically, 0.05 or smaller.
Case study continued
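$$
\begin{aligned}
\alpha &= \Pr(\text{Type I error}) = \Pr(\text{reject } H_0 \text{ when } H_0 \text{ is true}) \\
&= \Pr(X \ge 10 \text{ when } p = 0.05) \\
&= \sum_{x=10}^{100} b(x; n = 100, p = 0.05) \quad \text{(binomial distribution)} \\
&= \sum_{x=10}^{100} \binom{100}{x} 0.05^{x}\, 0.95^{100-x} = 0.0282
\end{aligned}
$$
So, the level of significance is α = 0.0282.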
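The binomial sum above can be checked numerically. The following is a minimal Python sketch using only the standard library (Python 3.8+ for `math.comb`); the helper names `binom_pmf` and `binom_upper_tail` are our own, not part of the course material.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, p) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_upper_tail(c: int, n: int, p: float) -> float:
    """Pr(X >= c) for X ~ Binomial(n, p), summed term by term."""
    return sum(binom_pmf(x, n, p) for x in range(c, n + 1))


# Case study: n = 100 chips, H0: p = 0.05, reject H0 if X >= 10.
alpha = binom_upper_tail(10, 100, 0.05)
print(f"alpha = {alpha:.4f}")
```

Summing the pmf directly is fine at this sample size; for much larger n the normal approximation discussed later is the usual shortcut.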
Definition
Failure to reject H0 when H1 is true is called a Type II error.
The probability of committing a type II error is denoted by β.
Note: It is impossible to compute β unless we have a specific alternate hypothesis.
Case study continued
We cannot compute β for H1: p > 0.05 because the true p is unknown. However, we can compute it for testing H0: p = 0.05 against the alternative hypothesis H1: p = 0.1, for instance.
$$
\begin{aligned}
\beta &= \Pr(\text{Type II error}) = \Pr(\text{do not reject } H_0 \text{ when } H_1 \text{ is true}) \\
&= \Pr(X < 10 \text{ when } p = 0.1) \\
&= \sum_{x=0}^{9} b(x; n = 100, p = 0.1) = 0.4513
\end{aligned}
$$
Case study continued
What is the probability of a type II error if p = 0.15?
$$
\begin{aligned}
\beta &= \Pr(\text{Type II error}) = \Pr(X < 10 \text{ when } p = 0.15) \\
&= \sum_{x=0}^{9} b(x; n = 100, p = 0.15) = 0.0551
\end{aligned}
$$
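The same kind of direct summation gives β for any specific alternative. A sketch in standard-library Python for the rule "reject H0 if X ≥ 10" with n = 100; `binom_cdf` and `type_ii_error` are hypothetical helper names.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """Pr(X <= k) for X ~ Binomial(n, p); an empty sum (k < 0) gives 0."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(0, k + 1))


def type_ii_error(c: int, n: int, p1: float) -> float:
    """beta = Pr(X < c) = Pr(X <= c - 1) when the true defective rate is p1."""
    return binom_cdf(c - 1, n, p1)


for p1 in (0.10, 0.15):
    beta = type_ii_error(10, 100, p1)
    print(f"p = {p1:.2f}: beta = {beta:.4f}")
```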
Effect of the critical value
Moving the critical value provides a trade-off between α and β. A reduction in β is always possible by increasing the size of the critical region, but this increases α. Likewise, reducing α is possible by decreasing the size of the critical region, at the cost of a larger β.
Case study continued
Let's see what happens when we change the critical value from 10 to 8. That is, we reject H0 if X ≥ 8.
[Figure: number line of X from 0 to 100. The new critical region X ≥ 8 (reject H0) extends the old critical region X ≥ 10; 8 is the new critical value.]
Case study continued
The new significance level is
$$\alpha = \Pr(X \ge 8 \text{ when } p = 0.05) = \sum_{x=8}^{100} b(x; n = 100, p = 0.05) = 0.128.$$
As expected, this is a larger value than before (it was 0.0282).
Case study continued
Testing against the alternate hypothesis H1: p = 0.1,
$$\beta = \Pr(X < 8 \text{ when } p = 0.1) = \sum_{x=0}^{7} b(x; n = 100, p = 0.1) = 0.206,$$
which is lower than before (it was 0.4513).
Testing against the alternate hypothesis H1: p = 0.15,
$$\beta = \Pr(X < 8 \text{ when } p = 0.15) = \sum_{x=0}^{7} b(x; n = 100, p = 0.15) = 0.012,$$
again lower than before (it was 0.0551).
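To see the trade-off numerically, the sketch below (hypothetical helper `error_rates`, standard-library Python) evaluates α and β for a few critical values c under the rule "reject H0 if X ≥ c"; c = 12 is included only as an extra illustration.

```python
from math import comb


def binom_cdf(k, n, p):
    """Pr(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(0, k + 1))


def error_rates(c, n=100, p0=0.05, p1=0.10):
    """(alpha, beta) for the rule 'reject H0 if X >= c'."""
    alpha = 1.0 - binom_cdf(c - 1, n, p0)  # Pr(X >= c) when p = p0
    beta = binom_cdf(c - 1, n, p1)         # Pr(X < c) when p = p1
    return alpha, beta


# Shrinking the critical region (larger c) lowers alpha but raises beta.
for c in (8, 10, 12):
    a, b = error_rates(c)
    print(f"c = {c:2d}: alpha = {a:.4f}, beta(p = 0.10) = {b:.4f}")
```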
Effect of the sample size
Both α and β can be reduced simultaneously by increasing the sample size.
Case study continued
Consider that now the sample size is n = 150 and the critical value is 12. Then, reject H0 if X ≥ 12, where X is now the number of defectives in the sample of 150 chips.
Case study continued
The significance level is
$$\alpha = \Pr(X \ge 12 \text{ when } p = 0.05) = \sum_{x=12}^{150} b(x; n = 150, p = 0.05) = 0.074.$$
Note that this value is lower than 0.128 for n = 100 and a critical value of 8.
Case study continued
Testing against the alternate hypothesis H1: p = 0.1,
$$\beta = \Pr(X < 12 \text{ when } p = 0.1) = \sum_{x=0}^{11} b(x; n = 150, p = 0.1) = 0.171,$$
which is also lower than before (it was 0.206).
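Both error rates for the two designs can be compared side by side. A small standard-library sketch, assuming the same exact binomial sums as before (H0: p = 0.05, H1: p = 0.10):

```python
from math import comb


def binom_cdf(k, n, p):
    """Pr(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(0, k + 1))


results = {}
for n, c in ((100, 8), (150, 12)):
    alpha = 1.0 - binom_cdf(c - 1, n, 0.05)  # Pr(X >= c) under H0: p = 0.05
    beta = binom_cdf(c - 1, n, 0.10)         # Pr(X < c) under H1: p = 0.10
    results[(n, c)] = (alpha, beta)
    print(f"n = {n}, c = {c}: alpha = {alpha:.3f}, beta = {beta:.3f}")
```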
Approximating the binomial distribution
using the normal distribution
Factorials of very large numbers are problematic to
compute accurately, even with Matlab. Thankfully, the
binomial distribution can be approximated by the normal
distribution (see Section 6.5 of the book for details).
Theorem
If X is a binomial random variable with n trials and probability of success p on each trial, then the limiting form of the distribution of
$$Z = \frac{X - np}{\sqrt{np(1-p)}}, \quad \text{as } n \to \infty,$$
is the standard normal distribution.
This approximation is good when n is large and p is not extremely close to 0 or 1.
Case study continued
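Let's recompute α with the normal approximation.
$$
\begin{aligned}
\alpha &= \Pr(\text{Type I error}) = \Pr(X \ge 12 \text{ when } p = 0.05) = \sum_{x=12}^{150} b(x; n = 150, p = 0.05) \\
&\approx \Pr\left(Z \ge \frac{12 - 150 \times 0.05}{\sqrt{150 \times 0.05 \times 0.95}}\right) = \Pr(Z \ge 1.69) \\
&= 1 - \Pr(Z < 1.69) = 1 - 0.9545 = 0.0455.
\end{aligned}
$$
Not too bad... (It was 0.074.)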
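A sketch comparing the exact binomial tail with the normal approximation, using `statistics.NormalDist` from the Python standard library (3.8+) for Φ. As above, no continuity correction is applied; because the code does not round z to 1.69, its approximate α comes out slightly above 0.0455.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_upper_tail(c, n, p):
    """Exact Pr(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(c, n + 1))


def normal_upper_tail(c, n, p):
    """Pr(X >= c) ~= Pr(Z >= (c - np) / sqrt(np(1 - p))), no continuity correction."""
    z = (c - n * p) / sqrt(n * p * (1 - p))
    return 1.0 - NormalDist().cdf(z)


exact = exact_upper_tail(12, 150, 0.05)
approx = normal_upper_tail(12, 150, 0.05)
print(f"exact alpha = {exact:.4f}, normal approximation = {approx:.4f}")
```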
Case study continued
What if we increase the sample size to n = 500 and the critical value to 40?
The normal approximation should be better since n is larger.
$$\alpha \approx \Pr\left(Z \ge \frac{40 - 500 \times 0.05}{\sqrt{500 \times 0.05 \times 0.95}}\right) = \Pr(Z \ge 3.08) = 1 - \Pr(Z < 3.08) = 1 - 0.999 = 0.001.$$
We are very unlikely to commit a type I error.
Case study continued
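Testing against the alternate hypothesis H1: p = 0.1,
$$
\begin{aligned}
\beta &= \Pr(X < 40 \text{ when } p = 0.1) = \sum_{x=0}^{39} b(x; n = 500, p = 0.1) \\
&\approx \Pr\left(Z < \frac{40 - 500 \times 0.1}{\sqrt{500 \times 0.1 \times 0.9}}\right) = \Pr(Z < -1.49) = 0.0681.
\end{aligned}
$$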
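The n = 500 calculations follow the same pattern. In this standard-library sketch, both α and β are standardized at the critical value 40 without a continuity correction, matching the formulas above.

```python
from math import sqrt
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

n, c = 500, 40          # sample size and critical value: reject H0 if X >= 40
p0, p1 = 0.05, 0.10     # H0 and the specific alternative H1

# alpha = Pr(X >= 40 | p0), standardized with mean n*p0 and sd sqrt(n*p0*(1-p0))
alpha = 1.0 - Phi((c - n * p0) / sqrt(n * p0 * (1 - p0)))
# beta = Pr(X < 40 | p1), standardized with mean n*p1 and sd sqrt(n*p1*(1-p1))
beta = Phi((c - n * p1) / sqrt(n * p1 * (1 - p1)))
print(f"alpha ~= {alpha:.4f}, beta ~= {beta:.4f}")
```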
Visual interpretation with normal approximation
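[Figure: normal approximations to the distribution of X. Under H0 (centered at 25), the critical region starts at 33 and has area α = 0.05 (type I error rate). Under H1: p = 0.06 (centered at 30), the area where we fail to reject H0 is β = 0.6468 (type II error rate).]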
[Figure: same H0 distribution (centered at 25, critical region from 33, α = 0.05). Under H1: p = 0.08 (centered at 40), the area where we fail to reject H0 is β = 0.0936 (type II error rate).]
[Figure: same H0 distribution (centered at 25, critical region from 33, α = 0.05). Under H1: p = 0.10 (centered at 50), the area where we fail to reject H0 is β = 0.0036 (type II error rate).]
Power of a test
Definition
The power of a test is the probability of rejecting H0 given that a specific alternate hypothesis is true. That is,
Power = 1 − β.
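The power of the binomial test from the case study (n = 100, reject H0 if X ≥ 10) can be tabulated for several true defective rates. This is a standard-library sketch with a hypothetical `power` helper; at p = 0.05, where H0 is true, the value it returns is just α.

```python
from math import comb


def binom_cdf(k, n, p):
    """Pr(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(0, k + 1))


def power(p_true, n=100, c=10):
    """Power = 1 - beta = Pr(reject H0) = Pr(X >= c) when the true rate is p_true."""
    return 1.0 - binom_cdf(c - 1, n, p_true)


for p_true in (0.05, 0.06, 0.08, 0.10, 0.15, 0.20):
    print(f"p = {p_true:.2f}: power = {power(p_true):.4f}")
```

The printed values increase with p, which is property 4 of the summary below seen from the power side.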
Summary
Properties of hypothesis testing
1. α and β are related; decreasing one generally increases the other.
2. α can be set to a desired value by adjusting the critical value. Typically, α is set at 0.05 or 0.01.
3. Increasing n decreases both α and β.
4. β decreases as the distance between the true value (under H1) and the hypothesized value (under H0) increases.
One-tailed vs. two-tailed tests
In our examples so far we have considered:
H0: θ = θ0
H1: θ > θ0.
This is a one-tailed test with the critical region in the right tail of the test statistic X.
[Figure: the critical region (reject H0) lies in the right tail, above θ0.]
Another one-tailed test could have the form
H0: θ = θ0
H1: θ < θ0,
in which the critical region is in the left tail.
[Figure: the critical region (reject H0) lies in the left tail, below θ0.]
In a two-tailed test we check for differences in either direction:
H0: θ = θ0
H1: θ ≠ θ0.
[Figure: critical regions (reject H0) in both tails, on either side of θ0.]
Two-tailed test: example
Consider a production line of resistors that are supposed to be 100 Ohms. Assume σ = 8 Ohms. So, the hypotheses are:
H0: μ = 100
H1: μ ≠ 100.
Let X̄ be the sample mean for a sample of size n = 100.
[Figure: reject H0 if X̄ < 98 or X̄ > 102; do not reject H0 for 98 ≤ X̄ ≤ 102.]
In this case the test statistic is the sample mean X̄ because resistance is a continuous random variable.
[Figure: sampling distribution of X̄ centered at μ = 100, with area α/2 below 98 and area α/2 above 102.]
We know the sampling distribution of X̄ is a normal distribution with mean μ and standard deviation σ/√n = 8/√100 = 0.8 due to the central limit theorem.
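Therefore we can compute the probability of a type I error as
$$
\begin{aligned}
\alpha &= \Pr(\bar{X} < 98 \text{ when } \mu = 100) + \Pr(\bar{X} > 102 \text{ when } \mu = 100) \\
&= \Pr\left(Z < \frac{98 - 100}{8/\sqrt{100}}\right) + \Pr\left(Z > \frac{102 - 100}{8/\sqrt{100}}\right) \\
&= \Pr(Z < -2.5) + \Pr(Z > 2.5) \\
&= 2\Pr(Z < -2.5) = 2 \times 0.0062 = 0.0124.
\end{aligned}
$$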
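A quick numerical check of this two-tailed α, as a standard-library sketch that uses `statistics.NormalDist(mu, sigma)` to represent the sampling distribution of X̄ directly instead of standardizing by hand:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 100.0, 8.0, 100   # H0 mean, known sigma, sample size
a, b = 98.0, 102.0                # do not reject H0 for a <= X-bar <= b

# Sampling distribution of X-bar under H0: normal, mean mu0, sd sigma/sqrt(n) = 0.8
xbar_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))
alpha = xbar_dist.cdf(a) + (1.0 - xbar_dist.cdf(b))
print(f"alpha = {alpha:.4f}")
```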
Confidence interval
Testing H0: μ = μ0 against H1: μ ≠ μ0 at a significance level α is equivalent to computing a 100(1 − α)% confidence interval for μ and rejecting H0 if μ0 is outside this interval.
Example
For the previous example the confidence interval at a confidence level of 98.76% = 100 × (1 − 0.0124)% is [98, 102].
Tests concerning sample mean
(variance known)
As in the previous example, we are often interested in testing
H0: μ = μ0
H1: μ ≠ μ0,
based on the sample mean X̄ from a sample X1, X2, ..., Xn, with known population variance σ².
Under H0: μ = μ0, the probability of a type I error is computed using the sampling distribution of X̄, which, due to the central limit theorem, is normally distributed with mean μ0 and standard deviation σ/√n.
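From confidence intervals we know that
$$\Pr\left(-z_{\alpha/2} < \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha.$$
[Figure: sampling distribution of X̄ centered at μ0; the central area 1 − α lies between a and b, and the critical regions X̄ < a and X̄ > b (reject H0) each have area α/2.]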
Therefore, to design a test at the level of significance α we choose the critical values a and b as
$$a = \mu_0 - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \qquad b = \mu_0 + z_{\alpha/2}\frac{\sigma}{\sqrt{n}};$$
then we collect the sample, compute the sample mean X̄ and reject H0 if X̄ < a or X̄ > b.
Steps in hypothesis testing
1. State the null and alternate hypotheses.
2. Choose a significance level α.
3. Choose the test statistic and establish the critical region.
4. Collect the sample and compute the test statistic. If the test statistic is in the critical region, reject H0. Otherwise, do not reject H0.
Example
A batch of 100 resistors has an average of 102 Ohms. Assuming a population standard deviation of 8 Ohms, test whether the population mean is 100 Ohms at a significance level of α = 0.05.
Step 1:
H0: μ = 100
H1: μ ≠ 100.
Note: Unless stated otherwise, we use a two-tailed test.
Step 2: α = 0.05
Example continued
Step 3: In this case, the test statistic is specified by the problem to be the sample mean X̄.
Reject H0 if X̄ < a or X̄ > b, with
$$a = \mu_0 - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 100 - z_{0.025}\frac{8}{\sqrt{100}} = 100 - 1.96 \times \frac{8}{10} = 98.432,$$
$$b = \mu_0 + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 100 + 1.96 \times \frac{8}{10} = 101.568.$$
Step 4: We are told that the value of the test statistic for the sample is X̄ = 102 > b. Therefore, reject H0.
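The four steps for the known-variance, two-sided case can be wrapped in a small function. This standard-library sketch uses the hypothetical name `z_test_two_sided` and obtains z_{α/2} from `NormalDist().inv_cdf` instead of a table, so the critical values may differ from table-based ones in the last digit.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_sided(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided test of H0: mu = mu0 with known sigma.

    Returns the critical values (a, b) and whether H0 is rejected.
    """
    z_half = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for alpha = 0.05
    half_width = z_half * sigma / sqrt(n)
    a, b = mu0 - half_width, mu0 + half_width
    return a, b, (xbar < a or xbar > b)


a, b, reject = z_test_two_sided(xbar=102, mu0=100, sigma=8, n=100)
print(f"a = {a:.3f}, b = {b:.3f}, reject H0: {reject}")
```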
One-sided sample mean test
(variance known)
Case A:
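In this case, we are interested in testing
H0: μ = μ0
H1: μ > μ0.
[Figure: sampling distribution of X̄ centered at μ0; area 1 − α to the left of the critical value and area α in the right tail, where H0 is rejected.]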
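Under H0: μ = μ0, the probability of a type I error is α, since
$$\Pr\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} < z_{\alpha}\right) = 1 - \alpha.$$
Thus, our decision becomes: reject H0 at significance level α if
$$\bar{X} > \mu_0 + z_{\alpha}\frac{\sigma}{\sqrt{n}}.$$
Note that we use z_α instead of z_{α/2}, just as in one-tailed confidence intervals.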
Case B:
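In this case, we are interested in testing
H0: μ = μ0
H1: μ < μ0.
[Figure: sampling distribution of X̄ centered at μ0; area α in the left tail, where H0 is rejected, and area 1 − α to the right of the critical value.]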
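Under H0: μ = μ0, the probability of a type I error is
$$\Pr\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} < -z_{\alpha}\right) = \alpha.$$
The decision becomes: reject H0 at significance level α if
$$\bar{X} < \mu_0 - z_{\alpha}\frac{\sigma}{\sqrt{n}}.$$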
Example
A quality control engineer finds that a sample of 100 light bulbs had an average lifetime of 470 hours. Assuming a population standard deviation of σ = 25 hours, test whether the population mean is 480 hours vs. the alternative hypothesis μ < 480 at a significance level of α = 0.05.
Step 1:
H0: μ = 480
H1: μ < 480.
Step 2: α = 0.05
Example continued
Step 3: The test statistic is the sample mean X̄. Reject H0 if
$$\bar{X} < \mu_0 - z_{\alpha}\frac{\sigma}{\sqrt{n}} = 480 - 1.645 \times \frac{25}{10} = 475.9.$$
Step 4: Since X̄ = 470 < 475.9, we reject H0.
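A matching standard-library sketch for the lower-tailed case (Case B). `z_test_lower` is a hypothetical helper name, and z_α is computed with `NormalDist().inv_cdf` (about 1.6449) rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist


def z_test_lower(xbar, mu0, sigma, n, alpha=0.05):
    """One-sided test of H0: mu = mu0 vs H1: mu < mu0 with known sigma.

    Returns the critical value and whether H0 is rejected (X-bar below it).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # z_alpha, about 1.645 for alpha = 0.05
    critical = mu0 - z_alpha * sigma / sqrt(n)
    return critical, xbar < critical


critical, reject = z_test_lower(xbar=470, mu0=480, sigma=25, n=100)
print(f"critical value = {critical:.2f}, reject H0: {reject}")
```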
Tests concerning sample mean
(variance unknown)
As before, we are often interested in testing
H0: μ = μ0
H1: μ ≠ μ0,
based on a sample X1, X2, ..., Xn, but now with unknown variance σ². For our decision we use the sample mean X̄ and the sample variance s².
We know that in this case (for a normal population) the standardized sample mean (X̄ − μ)/(s/√n) follows the t-distribution with n − 1 degrees of freedom.
The critical region at significance level α is X̄ < a or X̄ > b, where
$$a = \mu_0 - t_{\alpha/2}\frac{s}{\sqrt{n}}, \qquad b = \mu_0 + t_{\alpha/2}\frac{s}{\sqrt{n}},$$
and t_{α/2} has v = n − 1 degrees of freedom.
Equivalently, let
$$T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}.$$
Reject H0 if T < −t_{α/2} or T > t_{α/2}, for v = n − 1 degrees of freedom.
For one-sided tests, t_{α/2} is replaced by t_α as usual.
Example 10.5 from the textbook
It is claimed that a vacuum cleaner expends 46 kWh per year. A random sample of 12 homes indicates that vacuum cleaners expend an average of 42 kWh per year with (sample) standard deviation 11.9 kWh. At a 0.05 level of significance, does this suggest that, on average, vacuum cleaners expend less than 46 kWh per year? Assume the population to be normally distributed.
Example solution:
Step 1:
H0: μ = 46 kWh
H1: μ < 46 kWh.
Step 2: α = 0.05
Example solution: continued
Step 3: The test statistic is
$$T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}.$$
Reject H0 if T < −t_{0.05} for v = n − 1 = 11 degrees of freedom; that is, reject H0 if T < −1.796.
Step 4: We have X̄ = 42, s = 11.9 and n = 12. So,
$$T = \frac{42 - 46}{11.9/\sqrt{12}} = -1.16 > -1.796.$$
Do not reject H0.
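A sketch of the same t-test in Python. The standard library has no t quantile function, so the critical value t_{0.05} = 1.796 for v = 11 is hard-coded from a t table (an assumption of this sketch); with SciPy installed, `scipy.stats.t.ppf(0.95, 11)` would give the same value.

```python
from math import sqrt


def t_statistic(xbar, mu0, s, n):
    """T = (X-bar - mu0) / (s / sqrt(n)), with v = n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / sqrt(n))


T = t_statistic(xbar=42, mu0=46, s=11.9, n=12)
T_CRIT = 1.796          # assumed: t_{0.05} for v = 11, read from a t table
reject = T < -T_CRIT    # lower-tailed test, H1: mu < 46
print(f"T = {T:.2f}, reject H0: {reject}")
```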
Hypothesis testing using the p-value
In the approach we have taken so far, the significance level is pre-selected up front, either by choosing a given value or by setting the critical region explicitly. In this case, the final outcome is the decision.
Now suppose a hypothesis test is performed at a significance level of 0.05, but someone else wants to test with a stricter significance level of 0.01. This requires recomputing the critical region.
The p-value aims to provide more information about the test statistic with regard to the hypothesis test.
De nition
The p-value is the lowest level of significance at which the observed value of a test statistic is significant (i.e., at which one rejects H0).
Alternative interpretation: the p-value is the minimum
probability of a type I error with which H0 can still be rejected.
[Figure: the p-value is the tail area beyond the observed value of the test statistic.]
Example
Suppose that, for a given hypothesis test, the p-value is
0.09. Can H0 be rejected?
It depends! At a significance level of 0.05, we cannot reject H0 because p = 0.09 > 0.05. However, for significance levels greater than or equal to 0.09, we can reject H0.
Example
A batch of 100 resistors has an average of 101.5 Ohms. Assuming a population standard deviation of 5 Ohms:
(a) Test whether the population mean is 100 Ohms at a level of significance 0.05.
(b) Compute the p-value.
Example continued
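(a) H0: μ = 100; H1: μ ≠ 100. The test statistic is X̄. Reject H0 if
$$\bar{X} < 100 - z_{0.025}\frac{\sigma}{\sqrt{n}} = 100 - 1.96 \times \frac{5}{10} = 99.02$$
or
$$\bar{X} > 100 + z_{0.025}\frac{\sigma}{\sqrt{n}} = 100 + 1.96 \times \frac{5}{10} = 100.98.$$
Since X̄ = 101.5 > 100.98, we reject H0.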
Example continued
(b) The observed z-value is
$$Z = \frac{\bar{X} - 100}{\sigma/\sqrt{n}} = \frac{101.5 - 100}{5/10} = 3.$$
Then, the p-value is
$$p = 2\Pr(Z > 3) = 2 \times 0.0013 = 0.0026.$$
This means that H0 could have been rejected at significance level α = 0.0026, which is much stronger evidence than rejecting it at 0.05.
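The p-value calculation can be sketched in standard-library Python as follows. Using the unrounded tail probability 1 − Φ(3) ≈ 0.00135 instead of the table value 0.0013, the two-sided p-value comes out at about 0.0027 rather than 0.0026.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n = 101.5, 100.0, 5.0, 100

z = (xbar - mu0) / (sigma / sqrt(n))             # observed z-value
p_value = 2 * (1.0 - NormalDist().cdf(abs(z)))   # two-sided p-value
print(f"z = {z:.2f}, p-value = {p_value:.4f}")

# H0 is rejected at any significance level alpha >= p-value.
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: reject H0 -> {p_value <= alpha}")
```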
Example continued
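[Figure: sampling distribution of X̄ centered at 100. The α = 0.05 critical values 99.02 and 100.98 each cut off area 0.025. The observed X̄ = 101.5 (Z = 3) cuts off area 0.0013 in the upper tail (and, symmetrically, 0.0013 in the lower tail), so the critical value could have been moved out to the observed value and H0 would still be rejected.]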