Lecture7 Hypothesistest PDF

This document discusses hypothesis testing, which involves determining whether sample evidence supports or rejects a belief about a population. It outlines the six steps of hypothesis testing: 1) setting up the null and alternative hypotheses, 2) determining the test statistic and its sampling distribution, 3) specifying the significance level, 4) defining the decision rule, 5) taking a sample and calculating the test statistic, and 6) making a statistical decision. It provides examples to illustrate these steps, such as testing whether the mean amount of cereal in boxes matches the specified amount, or whether the mean HSC test score of students matches the historical average.

Uploaded by

snoozerman

HYPOTHESIS TESTING

• There are two types of statistical inference: estimation and hypothesis
testing.
While estimation is used to calculate a single value or a range of values
from the sample in order to approximate a population parameter,
the purpose of hypothesis testing is to determine whether the sample
(or samples) provides reasonable evidence in favor of a belief about a
population (or about several populations).
E.g.:
– Male accountants in Australian accounting firms earn more than
$65,000 a year on average.
– Male accountants earn more than female accountants, on average, in
Australian accounting firms.
– The earnings of male accountants in Australian accounting firms are
normally distributed.
– The earnings of accountants in Australian accounting firms do not
depend on gender.

These statements are about population parameters, and this semester we shall
be interested in this type of statement only.
Ex 1:
In a cereal packaging process the amount of cereal in a box (X) is supposed to
be normally distributed with a mean of 750 grams (µ) and a standard deviation
of 20 grams (σ). The plant manager, in order to check whether the packaging
machine is working to specification, randomly selects a sample of 25 boxes (n).
If the mean weight of this sample is found to be 741 grams (x-bar), should the
packaging machine be closed down for adjustment?

The packaging machine should be closed down if the actual average net weight
of all boxes is different from 750 grams, which is the content’s weight printed on
each box.
The sample mean is certainly smaller than 750 grams. However, since X-bar
might change from sample to sample, this is not an overwhelming piece of
evidence.
Is the real µ also different from 750?

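Recall that if µ were really 750, the sampling distribution of X-bar (for n = 25)
would be normal with a mean of 750 and a standard error of

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{25}} = 4$$

i.e. X-bar : N(750, 4). Hence 95% of all possible sample means would fall within
750 ± 1.96 × 4:

$$P(742.2 \le \bar{X} \le 757.8) = 0.95$$

[Figure: sampling distribution of X-bar centred at 750, with the central 95% area between 742.2 and 757.8 and the observed sample mean of 741 marked below 742.2.]

However, the sample mean from the single sample at hand is only 741 grams.
How can this happen?

There are two possible reasons:

i. The sample is not typical. (Hopefully, it is representative.)
ii. µ is not 750 grams. (More likely, if the sample is random.)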
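
The interval in Ex 1 can be checked numerically. The following is a minimal sketch that uses only the Python standard library; the variable names are illustrative and not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 750.0     # hypothesised population mean (grams)
sigma = 20.0    # population standard deviation (grams)
n = 25          # sample size
x_bar = 741.0   # observed sample mean (grams)

std_error = sigma / sqrt(n)         # standard error of X-bar: 20 / 5 = 4
z = NormalDist().inv_cdf(0.975)     # standard normal quantile, about 1.96

# Central 95% range of X-bar if mu really were 750
lower = mu0 - z * std_error
upper = mu0 + z * std_error
inside = lower <= x_bar <= upper

print(f"95% range for X-bar: ({lower:.1f}, {upper:.1f})")
print(f"Observed mean {x_bar} falls inside: {inside}")
```

The observed mean of 741 lies outside this range, which is what motivates the two possible explanations listed above.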
TESTING THE POPULATION MEAN

In the previous example we developed a 95% interval for X-bar in order to
decide whether the assumption about the population mean was supported by
the sample evidence or not. But how can we evaluate similar assumptions by
formal hypothesis testing procedures?

• Hypothesis testing, in general, is a six-step procedure:


1) Set up the null and alternative hypotheses.
2) Determine the test statistic and its sampling distribution.
3) Specify the significance level.
4) Define the decision rule.
5) Take a sample and calculate the value of the test statistic.
6) Make a statistical decision and draw the conclusion.

Let us study these steps one-by-one and discuss the concepts involved in
hypothesis testing.

1) Set up the null and alternative hypotheses.

– In any hypothesis test there are two mutually exclusive statements:

Null hypothesis (H0): The test procedure is based on the assumption that this
statement is true (even if we do not believe it).

Alternative hypothesis (HA): This is a logical alternative of H0 that is
accepted when the sample provides ample evidence against H0.

– There are two guidelines for setting up H0 and HA:


i. H0 must involve an identity (e.g. H0 : µ = µ0)
ii. What we expect to conclude (or what the question suggests)
should be in HA.

If there is sufficient sample evidence against H0, we reject it in favour of
HA. However, when there is not enough evidence to infer that H0 is not
true, we maintain it, but do not accept it.
In order to understand the asymmetrical status of the null and alternative hypotheses,
consider the following example.
Suppose somebody claims that
H0 : whenever a fair coin is tossed the outcome is always heads.
Accordingly,
HA : … the outcome is not always heads.
Of course, we know that H0 is false and HA is true, but how can we prove this?
Say we decide to toss a coin ten times.
What if the first 9 outcomes are heads, and the last one is tails?
Since one counterexample is enough to show that H0 is incorrect, we can safely reject it
and conclude that HA is true.
What if all 10 outcomes are heads?
The experiment has failed to demonstrate that H0 is incorrect, so we cannot reject it.
But it has not proved that H0 is correct either, so it would be unreasonable to accept H0.

A decision based on a rejected H0 is more conclusive than a decision
based on a non-rejected H0.
HA is the more important statement, and whatever we intend to show
statistically should be represented by HA.
Ex 2:
At Sydney University the historical mean HSC aggregate score of entering
students has been 420 (the ‘historical’ µ) with a standard deviation of 84 (σ).
Each year a sample of applications is taken to see if the HSC scores are at the
same level as in previous years. A sample of 200 students (n) in this year’s
intake shows a sample mean score of 435 (x-bar). Can we conclude at the 5%
level of significance that the mean HSC aggregate score for this year’s intake
is different from 420?

The question suggests that HA : µ ≠ 420, so H0 : µ = 420.

2) Determine the test statistic and its sampling distribution.


– The test statistic is a sample statistic upon which we decide to reject or
not to reject H0.
Testing a population parameter, the test statistic is derived from the
point estimator of the parameter to be tested,
and our decision is based on the sampling distribution of this statistic,
assuming that H0 is correct.

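If X-bar is normally distributed, the test statistic for testing µ is

$$Z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} \ \text{ if } \sigma \text{ is known}, \qquad t = \frac{\bar{X} - \mu_0}{s_{\bar{X}}} \ \text{ if } \sigma \text{ is unknown}$$

where µ0 denotes the hypothetical value of µ (i.e. µ under H0), and
$\sigma_{\bar{X}} = \sigma / \sqrt{n}$ and $s_{\bar{X}} = s / \sqrt{n}$ are the true and the
estimated standard errors of X-bar.
Granted that H0 is true, Z is a standard normal random variable, while
t has a Student’s t distribution with n - 1 degrees of freedom (unless X
is extremely non-normal).

(Ex 2) Since n = 200 > 30 and σ is given, the test statistic is Z.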

3) Specify the significance level.

– H0 is either true or false, and at the end of the test procedure we either
reject it or maintain it. However, our decision based on the sample
evidence might be incorrect.
Altogether there are four possibilities:

               H0 is true                          H0 is false

Maintain H0    Correct decision                    Incorrect decision (Type II error)

Reject H0      Incorrect decision (Type I error)   Correct decision

Type I error: rejecting a true H0. P(Type I error) = α
Type II error: not rejecting a false H0. P(Type II error) = β
The probability of the type I error, α, is called the significance level,
and it is equal to (100 – confidence level) / 100.
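
The meaning of α as the probability of rejecting a true H0 can be illustrated by simulation. The sketch below assumes, purely for illustration, a normal population with the Ex 2 values (µ = 420, σ = 84, n = 200) so that H0 is true by construction, and applies a two-tail Z test at α = 0.05. The function name `rejects_h0` and the number of trials are hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so that the run is repeatable

mu0, sigma, n = 420.0, 84.0, 200   # H0: mu = 420 holds by construction
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tail critical value, about 1.96
std_error = sigma / sqrt(n)


def rejects_h0() -> bool:
    """Draw one sample from N(mu0, sigma) and run the two-tail Z test."""
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z_obs = (fmean(sample) - mu0) / std_error
    return abs(z_obs) > z_crit


trials = 2000
type_i_rate = sum(rejects_h0() for _ in range(trials)) / trials
print(f"Share of samples that reject a true H0: {type_i_rate:.3f}")
```

Over many repetitions the share of rejections should settle close to α, since every rejection in this setting is a type I error.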
In practice we would like α and β to be as small as possible. However,
at a given sample size there is a trade-off between them, i.e. the lower
α, the higher β, and vice versa.

Both types of error are harmful, but the type I error is usually more
costly and is thus considered worse than the type II error.
We try to keep α at a relatively low level, usually 1%, 5% or at
most 10%, by selecting its value before performing the test.

This is the maximum risk of rejecting a true H0 that we are willing to
tolerate.

(Ex 2) Let the significance level be 5%, i.e. α = 0.05.
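
The trade-off between α and β can be made concrete with a small calculation. The sketch below uses the Ex 2 setting (µ0 = 420, σ = 84, n = 200, two-tail Z test) and assumes, hypothetically, that the true mean is 435; β depends on this unknown true value, so the figures are only illustrative. The function name `beta_two_tail` is not from the lecture.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()


def beta_two_tail(alpha, mu0, mu_true, sigma, n):
    """P(Type II error) of a two-tail Z test when the true mean is mu_true."""
    std_error = sigma / sqrt(n)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = (mu_true - mu0) / std_error   # mean of Z when mu equals mu_true
    # H0 is maintained whenever -z_crit <= Z <= z_crit
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


for alpha in (0.10, 0.05, 0.01):
    beta = beta_two_tail(alpha, mu0=420, mu_true=435, sigma=84, n=200)
    print(f"alpha = {alpha:.2f}  ->  beta = {beta:.3f}")
```

With the sample size held fixed, lowering α widens the non-rejection region, so β rises.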

4) Define the decision rule.

– The decision rule specifies the range of values of the test statistic for
which H0 is rejected in favour of HA.

– The sampling distribution of the test statistic covers all the possible
values that the statistic can assume when H0 is true. This set is divided
into two mutually exclusive but exhaustive parts.

Rejection region: The subset of the possible values that are extremely
unlikely if H0 is true, and thus lead us to reject H0.

Non-rejection region: The remaining (and in fact bigger) part of the possible
values.

These two regions are separated from each other by the critical value(s).

Suppose, for example, that X-bar is normally distributed and σ is known.
If H0 : µ = µ0 is true, the test statistic is standard normal,

$$Z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} \sim N(0\,;\,1) \quad \text{and} \quad P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} \le z_{\alpha/2}\right) = 1 - \alpha$$

If α is small, it is quite likely that the test statistic is between
–zα/2 and zα/2.
If the value of the test statistic, calculated from the sample at
hand, happens to be below –zα/2 or above zα/2, we are willing to
reject H0.

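[Figure: standard normal density of Z. The non-rejection region, with area 1 – α, lies between the critical values –zα/2 and zα/2; the two rejection regions, each with area α/2, lie in the tails beyond the critical values.]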

(Ex 2) At the 5% significance level zα/2 = 1.96, so the decision rule is:
Reject H0 if the value of the test statistic calculated from the sample is
either smaller than –1.96 or bigger than 1.96.

5) Take a sample and calculate the value of the test statistic.

– If they are not given, calculate the point estimate and its standard error,
and substitute them into the formula of the test statistic.

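(Ex 2) n = 200, x-bar = 435 and σ = 84.

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{84}{\sqrt{200}} = 5.94 \qquad z_{obs} = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{435 - 420}{5.94} = 2.525$$

where z_obs denotes the observed value of the test statistic.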

6) Make a statistical decision and draw the conclusion.

– Using the decision rule, decide whether or not to reject H0, and then
answer the original question.

(Ex 2) Since zobs = 2.525 > 1.96 = zα/2, we reject H0 and conclude at the 5%
significance level that this year’s average HSC score is different from
420.
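
The six steps for Ex 2 can be traced in a few lines of code. This is a minimal sketch using the Python standard library, with the numbers taken from the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Steps 1 and 3: H0: mu = 420 against HA: mu != 420, with alpha = 0.05
mu0, alpha = 420.0, 0.05

# Step 5 inputs: sample size, sample mean and known population sd
n, x_bar, sigma = 200, 435.0, 84.0

# Step 4: two-tail decision rule, so alpha is split between the tails
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 5: standard error and observed test statistic
std_error = sigma / sqrt(n)
z_obs = (x_bar - mu0) / std_error

# Step 6: reject H0 if z_obs falls in either rejection region
reject_h0 = abs(z_obs) > z_crit

print(f"standard error = {std_error:.2f}")
print(f"z_obs = {z_obs:.3f}, critical values = +/-{z_crit:.2f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```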

– In the previous example we wanted to know whether ‘the mean HSC
aggregate score for this year’s intake is different from 420’.
In this question (and also in our final conclusion) one of the key words
is different. It suggests that we reject H0 : µ = 420 if the sample mean is
either well below or well above 420.
In other words, we are ready to reject H0 on both sides of µ0, i.e. under
both tails of the sampling distribution of the test statistic.
Tests of this type are called two-tail (or two-sided) tests, and in general
they have the following null and alternative hypotheses for testing the
population mean:
H0 : µ = µ0 and HA : µ ≠ µ0

If we are willing to reject H0 only under one particular tail of the
sampling distribution, the test is said to be a one-tail (or one-sided) test.

Left-tail test: H0 : µ = µ0 and HA : µ < µ0
Right-tail test: H0 : µ = µ0 and HA : µ > µ0

Assuming again that the test statistic is Z, the decision mechanisms of
left-tail and right-tail tests can be illustrated as follows:

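[Figure: two standard normal densities of Z. Left-tail test: the rejection region, with area α, lies below the critical value –zα and the non-rejection region, with area 1 – α, lies above it. Right-tail test: the rejection region, with area α, lies above the critical value zα and the non-rejection region, with area 1 – α, lies below it.]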

Ex 3:
The average weekly wage of all workers in a large factory is $626.40. In a
random sample of 100 male workers in the factory, it was found that x-bar =
$682.00. Assuming that the standard deviation is $82.09, can we conclude (with
α = 0.05) that the mean weekly wage of male workers is greater than the overall
mean weekly wage?
Follow again the six-step process.
i. The question suggests that HA : µ > 626.40, so H0 : µ = 626.40.
ii. Since n=100>30 and σ is given, the test statistic is Z.
iii. The significance level is given, α = 0.05.
iv. This is a right-tail test, so the entire rejection region is located under the
right tail of the sampling distribution.
The significance level is not halved, and there is only one critical
value: zα = z0.05 = 1.645.
Reject H0 if the value of the test statistic calculated from the sample
is greater than 1.645.

v. $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{82.09}{\sqrt{100}} = 8.209$ and $z_{obs} = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{682.00 - 626.40}{8.209} = 6.77$

vi. Since zobs = 6.77 > 1.645 = zα, we reject H0 and conclude at the 5% level of
significance that the mean weekly wage of male workers is greater than the
overall mean weekly wage.
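
The only difference between two-tail, left-tail and right-tail Z tests is where the rejection region sits. The helper below is a sketch of that logic applied to Ex 3; the function name `z_test_mean` and its `tail` argument are hypothetical, not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, mu0, sigma, n, alpha, tail):
    """Z test for a mean with known sigma; tail is 'left', 'right' or 'two'.

    Returns the observed statistic, the critical value and the decision.
    """
    z_obs = (x_bar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "two":
        z_crit = std_normal.inv_cdf(1 - alpha / 2)   # alpha is halved
        reject = abs(z_obs) > z_crit
    elif tail == "right":
        z_crit = std_normal.inv_cdf(1 - alpha)       # alpha is not halved
        reject = z_obs > z_crit
    elif tail == "left":
        z_crit = -std_normal.inv_cdf(1 - alpha)
        reject = z_obs < z_crit
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z_obs, z_crit, reject


# Ex 3: right-tail test of H0: mu = 626.40 against HA: mu > 626.40
z_obs, z_crit, reject = z_test_mean(682.00, 626.40, 82.09, 100, 0.05, "right")
print(f"z_obs = {z_obs:.2f}, z_crit = {z_crit:.3f}, reject H0: {reject}")
```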

– In both of these examples the population standard deviation was
known.
If σ is unknown, we can overcome the problem the same way as in
confidence interval estimation.
Namely,
1) Estimate σ with the sample standard deviation, s.
2) Replace the true standard error of X-bar with its estimate.
3) Look up the critical values in the t-table instead of the Z-table.
(Given that the population is not extremely non-normal).
Apart from the different statistical table, the six-step procedure of
hypothesis testing is still valid.

Ex 4:
One of the critical factors in choosing a location for a new men’s clothing store
is the mean clothing expenditure per household in the surrounding
neighbourhood. A survey of 20 households (n) reveals that the mean and the
standard deviation of annual expenditure on clothes are $387 (x-bar) and $60 (s),
respectively. Can we conclude at the 5% significance level that the population
mean annual expenditure is less than $400?
i. HA : µ < 400 and H0 : µ = 400.
ii. Since n is only 20, the central limit theorem does not apply. Assume that
clothing expenditure is normally distributed as both test statistics, Z and t,
require that X-bar is at least approximately normal.

The population standard deviation is unknown, thus the test statistic is t.


iii. Again, α = 0.05.
iv. This is a left-tail test, so the entire rejection region is located under the left
tail of the sampling distribution.
The critical value from the t-table with df = n -1= 19 is -tα = -t0.05 =
-1.729.
Reject H0 if the value of the test statistic calculated from the sample
is smaller than -1.729.

v. $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{60}{\sqrt{20}} = 13.42$ and $t_{obs} = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{387 - 400}{13.42} = -0.969$
vi. tobs = -0.969 > -1.729 = -tα, so we fail to reject H0. At α = 0.05 there is not
enough evidence to indicate that the mean expenditure is below $400.
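
The arithmetic of Ex 4 can be reproduced as follows. The Python standard library has no Student's t distribution, so this sketch simply takes the critical value -1.729 (df = 19, α = 0.05) from the t-table as quoted in the example; the variable names are illustrative.

```python
from math import sqrt

# Ex 4: H0: mu = 400 against HA: mu < 400 (left-tail t test)
n, x_bar, s = 20, 387.0, 60.0
mu0 = 400.0

# Critical value -t_{0.05} with n - 1 = 19 degrees of freedom,
# copied from the t-table value quoted in the example
t_crit = -1.729

std_error_est = s / sqrt(n)            # estimated standard error of X-bar
t_obs = (x_bar - mu0) / std_error_est  # observed t statistic

# Left-tail rule: reject H0 only if t_obs is below the critical value
reject_h0 = t_obs < t_crit

print(f"estimated standard error = {std_error_est:.2f}")
print(f"t_obs = {t_obs:.3f}, critical value = {t_crit}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```
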
TESTING THE POPULATION PROPORTION
• We have seen how to develop hypothesis tests about the population
mean from the sampling distribution of the sample mean.
In a similar way the sampling distribution of the sample proportion can
be used to determine whether evidence from a sample supports or
casts doubt on the validity of a hypothesis about the population
proportion.
• Recall that if the sample size is relatively large (np ≥ 5, nq ≥ 5), p-hat is
approximately normally distributed with

$$\mu_{\hat{p}} = p \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$

This suggests that the test statistic for testing p is

$$Z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}}$$

Under the null hypothesis, H0 : p = p0,

$$\sigma_{\hat{p}} = \sqrt{\frac{p_0 q_0}{n}}, \quad q_0 = 1 - p_0$$

and Z is a standard normal random variable.
Ex 5:
In a random sample of 100 units (n) from an assembly line, 15 (f) were defective.
Does this constitute sufficient evidence at the 10% significance level to
conclude that the defective rate among all units exceeds 10%?
Here p-hat = f / n = 15/100.

i. HA : p > 0.1 and H0 : p = 0.1.


ii. If H0 is correct, np = 100 × 0.1 = 10 and nq = 100 × 0.9 = 90 are both above 5.
The test statistic is Z.
iii. The significance level is 10%, so α = 0.10.
iv. This is a right-tail test and the critical value is zα = z0.10 = 1.282.
Reject H0 if the observed test statistic is greater than 1.282.

v. $\sigma_{\hat{p}} = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{0.1 \times 0.9}{100}} = 0.03$ and $z_{obs} = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{0.15 - 0.1}{0.03} = 1.667$

vi. zobs = 1.667 > 1.282 = zα, so we can reject H0. At α = 0.10 there is sufficient
evidence to conclude that the defective rate is above 10%.
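
Ex 5 can be checked numerically, including the large-sample condition from step ii. This is a minimal sketch with the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Ex 5: H0: p = 0.10 against HA: p > 0.10 (right-tail test), alpha = 0.10
n, defective = 100, 15
p0, alpha = 0.10, 0.10
q0 = 1 - p0

# Step ii: the normal approximation needs n*p0 and n*q0 to be at least 5
large_enough = n * p0 >= 5 and n * q0 >= 5

p_hat = defective / n
std_error = sqrt(p0 * q0 / n)             # standard error of p-hat under H0
z_obs = (p_hat - p0) / std_error
z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tail critical value

reject_h0 = z_obs > z_crit

print(f"normal approximation acceptable: {large_enough}")
print(f"z_obs = {z_obs:.3f}, z_crit = {z_crit:.3f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```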

Ex 6:
Suppose that a company selling electronic article surveillance devices claims
that the proportion of all consumers who would never shop in a store again if
the store subjected them to a false alarm is 12%. A store considering installing
such a device has interviewed a random sample of 100 consumers and found
that 17 of them would not return if they were subjected to a false alarm. Does
this sample support the company’s claim? Use α = 0.05.
Here p-hat = 17/100.
i. H0 : p = 0.12 and HA : p ≠ 0.12.
ii. If H0 is correct, np = 12 and nq = 88. The test statistic is Z.
iii. The significance level is 5%.
iv. This is a two-tail test, so the significance level is halved and the critical
values are ±zα/2 = ±z0.025 = ±1.96.
Reject H0 if zobs is less than -1.96 or greater than 1.96.

v. $\sigma_{\hat{p}} = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{0.12 \times 0.88}{100}} = 0.032$ and $z_{obs} = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{0.17 - 0.12}{0.032} = 1.563$

vi. zobs is between the two critical values, so we cannot reject H0. At α = 0.05
there is insufficient evidence to conclude that the proportion of consumers
discouraged by a false alarm is different from 12%.
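
Ex 6 can be verified in the same way. The sketch below takes the two-tail critical value to be zα/2, following the decision rule in step 4, where α is split between the two tails. It also carries the unrounded standard error, so z_obs comes out near 1.54 rather than the value obtained by rounding the standard error to 0.032 first; the conclusion does not change. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Ex 6: H0: p = 0.12 against HA: p != 0.12 (two-tail test), alpha = 0.05
n, would_not_return = 100, 17
p0, alpha = 0.12, 0.05

p_hat = would_not_return / n
std_error = sqrt(p0 * (1 - p0) / n)           # not rounded before use
z_obs = (p_hat - p0) / std_error
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # alpha/2 in each tail

# Two-tail rule: reject H0 if z_obs is beyond either critical value
reject_h0 = abs(z_obs) > z_crit

print(f"standard error = {std_error:.4f}")
print(f"z_obs = {z_obs:.3f}, critical values = +/-{z_crit:.2f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```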