Introduction to Hypothesis Testing
The concept of evidence collection
A well-known story goes something like this: Four students missed the midterm for their statistics
class. They went to the professor together and said, “Please let us make up the exam. We carpool
together, and on our way to the exam, we got a flat tire. That's why we missed the exam.” The
professor didn’t believe them, but instead of arguing he said, “Sure, you can make up the exam. Be
in my office tomorrow at 8.”
The next day, they met in his office. He sent each student to a separate room and gave them an
exam. The exam consisted of only one question: “Which tire?” We don’t know the outcome of this
story, but let's imagine that all four students answer, “left rear tire.” The professor is surprised. He
had assumed that the students were lying. “Maybe,” he thinks, “they just got lucky. After all, if
they just guessed, they could still all choose the same tire.” But then he does a quick calculation
and figures out that the probability that all four students will guess the same tire is only 1.6%.
Reluctantly, he concedes that the students were not lying, and now he must give all of them an A
on the exam.
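The professor's quick calculation can be sketched as follows, assuming each student guesses independently and picks each of the four tires with equal probability (the variable names are illustrative only):

```python
from fractions import Fraction

TIRES = 4      # left front, right front, left rear, right rear
STUDENTS = 4

# Probability that every student independently guesses one specific tire.
p_specific = Fraction(1, TIRES) ** STUDENTS

# Any of the four tires could be the one they all agree on.
p_all_same = TIRES * p_specific

print(p_all_same, float(p_all_same))  # 1/64 0.015625
```

The result, 1/64, rounds to the 1.6% quoted in the story.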
The statistics professor has just performed a hypothesis test. Hypothesis testing is a formal
procedure that enables us to choose between two hypotheses when we are uncertain about
our measurements. We call hypothesis testing a formal procedure because it is based on
particular terminology and a rather well-specified set of steps.
However, we hope to show you that this “formal” procedure has a generous helping of common
sense supporting it.
Start with: A Pair of Hypotheses
In a formal hypothesis test, hypotheses are always statements about population parameters.
Remember, you would never make a guess at something you already know:
hypotheses are always about parameters, not statistics!
Hypotheses come in mutually exclusive pairs: H0 or Ha, one or the other.
Can't have both!
The null hypothesis, which we write H0 (and pronounce “H-naught” or simply “the null
hypothesis”), is the conservative, status-quo, business-as-usual statement about a population
parameter. In the context of researching new ideas, the null hypothesis often represents “no
change,” “no effect,” or “no difference.”
The alternative hypothesis, Ha (pronounced “H-A”), is the research hypothesis. It is usually a
statement about the value of a parameter that we hope to demonstrate is true.
The most important step of a formal hypothesis test is choosing the hypotheses. In fact, there
are really only two steps of a formal hypothesis test that a computer cannot do, and this is one of
those steps. (The other step is checking to make sure that the conditions necessary for the
probability calculations to be valid are satisfied.)
Hypothesis tests are like criminal trials. In a criminal trial, two hypotheses are placed before the
jury: the defendant is not guilty, or he is guilty. These hypotheses are not given equal weight,
however. The jury is told to assume the defendant is not guilty until the evidence
suggests this is not so. (Defendants charged with a crime in Canada must be found guilty
“beyond all reasonable doubt.”)
The null hypothesis is, by default, assumed true. This means that when no substantial
evidence is provided, the null is “not dethroned.” This does not guarantee the
truthfulness of the null, only that at the moment we can’t discount it.
Should we then claim H0 is true? Or that we failed to say H0 is false?
Hypothesis tests follow the same principles. The statistician plays the role of the prosecuting
attorney, who hopes to show that the defendant is guilty. The hypothesis that the statistician or
researcher hopes to establish, called the ‘claim,’ plays the role of the alternative hypothesis. The
null hypothesis is chosen to be a neutral, noncontroversial statement. Just as in a jury trial, where
we ask the jury to believe that the defendant is not guilty unless the evidence against this belief is
overwhelming, we will believe that the null hypothesis is true in the beginning. But once we
examine the evidence, we may reject this belief if the evidence is overwhelmingly against it.
Using Mathematical Signs to Set Up Hypotheses
An extremely powerful tool for setting up the correct hypotheses is to consider the condition of the
statement given as it relates to the following mathematical notation:
(=, ≤, ≥) or (≠, >, <)
Any sign that includes the equality (=, ≤, ≥) must be located in the null
hypothesis!
Conversely, the corresponding non-equal sign (≠, >, <) must be located
in the alternative hypothesis! The pairing works in either direction: = pairs with ≠, ≤ with >,
and ≥ with <.
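The sign rule can be sketched as a small lookup. This is only an illustration: the function name `set_up_hypotheses` and the ASCII spellings `<=`, `>=`, `!=` are assumptions made for this example, not notation from the notes.

```python
# Signs containing equality go in H0; the complementary strict sign goes in Ha.
COMPLEMENT = {"=": "!=", "<=": ">", ">=": "<", "!=": "=", ">": "<=", "<": ">="}
NULL_SIGNS = {"=", "<=", ">="}

def set_up_hypotheses(claim_sign, parameter, value):
    """Return (H0, Ha) strings for a claim such as 'mu < 1200'."""
    other = COMPLEMENT[claim_sign]
    if claim_sign in NULL_SIGNS:
        null_sign, alt_sign = claim_sign, other   # the claim itself is the null
    else:
        null_sign, alt_sign = other, claim_sign   # the claim is the alternative
    return f"H0: {parameter} {null_sign} {value}", f"Ha: {parameter} {alt_sign} {value}"

h0, ha = set_up_hypotheses("<", "mu", 1200)
print(h0)  # H0: mu >= 1200
print(ha)  # Ha: mu < 1200
```

A claim of "less than" contains no equality, so it lands in the alternative and its complement lands in the null.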
Consider the following statements. In each, state the appropriate null and
alternative hypotheses.
(a) One flips a coin n times to test the claim that the coin is a “fair” coin.
We expect 50% heads and 50% tails.
H0: p = 0.5
Ha: p ≠ 0.5
The null hypothesis contains the equality. It is also acceptable to always write it with
only the equal sign.
(b) A doctor claims that the average cost of a Magnetic Resonance Imaging (MRI) test
is less than $1,200.
H0: μ ≥ 1200 (or H0: μ = 1200)
Ha: μ < 1200
(c) An economist wishes to test whether the March unemployment rate in Alberta (the percentage of
able-working Albertans) is higher than February’s Alberta unemployment rate.
With p0 denoting February’s rate, written as a proportion rather than a percentage:
H0: p ≤ p0 (or H0: p = p0)
Ha: p > p0
(d) Researchers believe that a new chemotherapy treatment will prolong the lifetime of
patients afflicted with liver cancer. The mean/average survival time of liver cancer patients
using the current chemotherapy regime is 43 months.
H0: μ ≤ 43 months (or H0: μ = 43 months)
Ha: μ > 43 months
(e) An epidemiologist wants to see if better detection methods and improved treatment has
If p-value ≥ α, we conclude that the null hypothesis is supported: “we fail to reject H0”
(FRH0). This implies that if H0 is true, we got a typical result.
If p-value < α, we conclude that the null hypothesis is not supported: “we reject H0” (RH0).
This implies that if H0 is true, something rare has occurred.
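The decision rule amounts to a single comparison. A minimal sketch, assuming the helper name `decide` and a default significance level of 0.05 (both chosen for this example):

```python
def decide(p_value, alpha=0.05):
    """Return the test decision: reject H0 only when p-value < alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.03))   # reject H0
print(decide(0.20))   # fail to reject H0
```

Note that the function never returns "accept H0": a large p-value only means the data are typical of what H0 predicts.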
It is improper (maybe even impolite/incorrect!) to say that you have “accepted” the null hypothesis
when your p-value is bigger than 0.05. Instead, we say “We have failed to reject H0” or “We
cannot reject H0.” The reason for this is that several factors might make it difficult to determine
whether the null hypothesis is false.
Coming to Incorrect Decisions
Mistakes are an inevitable part of the hypothesis-testing process. The trick is not to make them too
often. (We are not talking about mathematical mistakes here; we are referring to mistakes of
conclusion.)
One mistake we might make is to reject the null hypothesis when it is true. This is called a Type I
error. For example, recall the default or null hypothesis of a court trial is that the defendant is
innocent. If a truly innocent defendant is found guilty, this is considered a Type I error.
However, we might mistakenly fail to reject the null hypothesis when it is actually false. This is
called a Type II error. In our example, this would be described as finding a truly guilty defendant
innocent.
              H0 True                          H0 False
RH0           α = significance level           1 − β = power
              = P(RH0 | H0 true)               = P(RH0 | H0 false)
              = P(type I error)                no error
FRH0          1 − α = confidence level         β = P(FRH0 | H0 false)
              = P(FRH0 | H0 true)              = P(type II error)
              no error
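To make the four cells concrete, here is a minimal sketch that computes them exactly for a coin-flipping test. The rejection rule (reject H0: p = 0.5 when 100 flips give at most 40 or at least 60 heads) and the "true" value p = 0.6 are hypothetical choices for illustration, not values from the notes.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def rejection_probability(n, lower, upper, true_p):
    """P(X <= lower or X >= upper) when X ~ Binomial(n, true_p)."""
    return sum(binom_pmf(k, n, true_p)
               for k in range(n + 1) if k <= lower or k >= upper)

n = 100
lower, upper = 40, 60   # hypothetical rejection rule for H0: p = 0.5

alpha = rejection_probability(n, lower, upper, true_p=0.5)  # P(RH0 | H0 true)
power = rejection_probability(n, lower, upper, true_p=0.6)  # P(RH0 | H0 false)
beta = 1 - power                                            # P(FRH0 | H0 false)

print(f"alpha = {alpha:.4f}, power = {power:.4f}, beta = {beta:.4f}")
```

The power can only be computed because a specific true value of p was supplied; with a different true value the same rule has a different power.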
Calculating the power is tricky and somewhat complex, in part because it requires that we know
the true value of the population proportion. We leave this calculation to a future statistics course.
For now, be aware that if you do a hypothesis test and do not reject the null, then there is always
the chance that you have made a mistake because your power is too low. You simply don’t have
enough evidence to tell the difference between the null hypothesis and the truth.
The Tradeoff between Significance Level and Power
We are free to choose any value we wish for the significance level. Typically, we set this
probability at 0.05, but sometimes we go as low as 0.01. But why don't we make it arbitrarily
small? Say, 0.0000001? That way, we'd almost never make this mistake.
We can’t make the significance level as small as we would like because we have a price to pay.
The price is that if we make the significance level smaller, then the power gets smaller too!
To see this, think about our criminal justice example. We can make the significance level, the
probability of convicting an innocent man, α = 0 by following a simple rule: Free every
defendant. If everyone goes free, then it is impossible to convict an innocent person because you
are convicting no one. But now the power, the probability of correctly convicting a guilty person,
is 1 − β = 0%.
Know that although α and β are inversely related, 1 − α is not how we calculate β.
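The tradeoff can be seen numerically. The sketch below uses the Normal approximation for a right-tailed test of a proportion; the scenario (H0: p = 0.5, true p = 0.6, n = 100) and the function name `power_right_tailed` are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def power_right_tailed(alpha, p0, true_p, n):
    """Approximate power of the test H0: p = p0 vs Ha: p > p0."""
    # Reject when the sample proportion exceeds this cutoff (set assuming H0 is true).
    cutoff = p0 + Z.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    # Chance of exceeding the cutoff when the true proportion is true_p.
    se_true = sqrt(true_p * (1 - true_p) / n)
    return 1 - Z.cdf((cutoff - true_p) / se_true)

for alpha in (0.05, 0.01, 0.0000001):
    power = power_right_tailed(alpha, p0=0.5, true_p=0.6, n=100)
    print(f"alpha = {alpha:<10} power = {power:.4f}")
```

Each reduction in α pushes the rejection cutoff further out, so the same true effect is detected less often.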
Clearly we have a lot of p symbols; let's review their meanings:
p = population proportion
p̂ = sample proportion
p0 = hypothesized proportion
Hypothesis Testing for Categorical Variables, Proportions
A modification to the CLT.
Test Statistic
$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
where p0 represents the hypothesized proportion and p̂ the sample proportion.
Example: if the number of people sampled was 100 000 and the number of
‘successes’ was 25 214, could you conclude the population proportion (i) was
0.25? (ii) was less than 0.25? (iii) was more than 0.25?
(i) Ha: p ≠ 0.25 (ii) Ha: p < 0.25 (iii) Ha: p > 0.25, each against H0: p = 0.25
Sample proportion: p̂ = 25 214 / 100 000 = 0.25214
Test statistic: z ≈ 1.56
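As a worked check of the test statistic for this example, assuming the standard one-proportion z statistic is the intended formula, with n = 100 000 and p0 = 0.25:

```latex
\hat{p} = \frac{25\,214}{100\,000} = 0.25214, \qquad
z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}
  = \frac{0.25214 - 0.25}{\sqrt{0.25 \times 0.75 / 100\,000}}
  \approx \frac{0.00214}{0.001369}
  \approx 1.56
```

The same value of z is used for all three questions; only the tail area that counts as the p-value changes.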
Understanding how to interpret the p-value is crucial to understanding hypothesis testing. The
computer might compute the p-value for you, but you need to understand how the computer did
this calculation if you are to successfully perform a hypothesis test.
The p-value is all about extremes. The meaning of the phrase “as extreme as or more extreme
than” depends on the alternative hypothesis. There are three basic pairs of hypotheses:
Two-tailed      Left-tailed     Right-tailed
H0: p = p0      H0: p = p0      H0: p = p0
Ha: p ≠ p0      Ha: p < p0      Ha: p > p0
If the alternative hypothesis is
Ha: p ≠ p0 (the true value of p is either bigger or smaller than what the null hypothesis claims)
then “as extreme as or more extreme than” means “even farther away from 0 than the value you
observed.” This corresponds to finding the probability in both tails of the N(0, 1) distribution. This
is called a two-tailed hypothesis.
If the alternative hypothesis is
Ha: p < p0 (the true value is less than the value claimed by the null hypothesis) then “as extreme
as or more extreme than” means “less than or equal to the observed value.” This corresponds to
finding the probability in the left tail of N(0, 1). This is an example of a one-tailed hypothesis.
This alternative is also sometimes called a left-tailed or lower-tailed hypothesis, because the
p-value area is in the left tail.
Finally, if the alternative hypothesis is
Ha: p > p0 (the true value is greater than the value claimed by the null hypothesis) then “as
extreme as or more extreme than” means “greater than or equal to the observed value.” This
corresponds to finding the probability in the right tail of N(0, 1). This is another one-tailed
hypothesis. This alternative hypothesis is called a right-tailed or upper-tailed hypothesis.
The p-value is always the tail area(s) of the curve; which side will correspond to the sign in
the alternative hypothesis!
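The three tail-area rules can be sketched with Python's standard library. The function name `p_value_from_z`, the labels for the alternatives, and the observed value z = 1.56 are assumptions chosen for illustration.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative):
    """Tail area of N(0, 1) matching the sign in the alternative hypothesis."""
    cdf = NormalDist().cdf
    if alternative == "less":        # Ha: p < p0, left tail
        return cdf(z)
    if alternative == "greater":     # Ha: p > p0, right tail
        return 1 - cdf(z)
    if alternative == "two-sided":   # Ha: p != p0, both tails
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

z = 1.56  # hypothetical observed test statistic
for alt in ("two-sided", "less", "greater"):
    print(f"{alt:>9}: p-value = {p_value_from_z(z, alt):.4f}")
```

The same z gives three very different p-values, which is why the alternative must be chosen before looking at the data.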
Example: BRCA1 is a gene that has been linked to breast cancer. It is believed that not
all women who have a family history of breast cancer have the BRCA1 mutation.
Researchers used DNA analysis to search for BRCA1 mutations in 169 randomly chosen women.
Of the 169 women tested, 27 had the BRCA1 mutation. Does this data suggest that more than 12%
of women with a family history of breast cancer have the BRCA1 mutation?
(a) State the appropriate null and alternative hypothesis.
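H0: p ≤ 0.12 (or H0: p = 0.12), where p is the proportion of women with a family history
of breast cancer who have the BRCA1 mutation
Ha: p > 0.12
(b) Find the value of the test statistic.
p̂ = 27/169 ≈ 0.1598
$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \dfrac{0.1598 - 0.12}{\sqrt{0.12(0.88)/169}} \approx 1.59$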
(c) Find the P-value.
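Ha: p > 0.12, so the p-value is the right-tail area.
p-value = P(Z > 1.59) = 1 − 0.9441 = 0.0559 > 0.05
Based on this data, we fail to reject H0 at the 5% level: the data does not suggest that more
than 12% of women with a family history of breast cancer have the BRCA1 mutation.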
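The whole BRCA1 calculation can be reproduced from the counts. This is a sketch under two assumptions: the hypothesized proportion is p0 = 0.12 (as read from the handwritten solution), and the function name `one_proportion_z_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Right-tailed test of H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n                        # sample proportion
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error uses p0, not p_hat
    p_value = 1 - NormalDist().cdf(z)            # right-tail area of N(0, 1)
    return p_hat, z, p_value

p_hat, z, p_value = one_proportion_z_test(successes=27, n=169, p0=0.12)
print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

The software p-value differs slightly from a table lookup because it does not round z to two decimals first.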
Hypothesis Testing for Numerical Variables, Means
Test Statistic
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
where μ0 represents the hypothesized average or mean.
Use t when σ is unknown (pretty much all the time!),
or z when σ is known (pretty much never).
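You will always use one of the following three pairs of hypotheses for the one-sample t-test:
Two-tailed      Left-tailed     Right-tailed
H0: μ = μ0      H0: μ = μ0      H0: μ = μ0
Ha: μ ≠ μ0      Ha: μ < μ0      Ha: μ > μ0
[Figure: the p-values for each alternative hypothesis, all using the same t-statistic value of
t = 2.1 and the same sample size of n = 30.]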
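The three p-values for t = 2.1 and n = 30 can be approximated with the standard library alone. The sketch below integrates the Student's t density numerically with Simpson's rule; the helper names `t_pdf` and `t_cdf` are illustrative, and a statistics package would normally be used instead.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) by Simpson's rule on [0, |t|], using symmetry about 0."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area

t_stat, n = 2.1, 30
df = n - 1                             # degrees of freedom for a one-sample t-test
left = t_cdf(t_stat, df)               # Ha: mu < mu0
right = 1 - left                       # Ha: mu > mu0
two_sided = 2 * min(left, right)       # Ha: mu != mu0

print(f"two-sided = {two_sided:.3f}, left = {left:.3f}, right = {right:.3f}")
```

With a positive t statistic, the right-tailed p-value is small, the left-tailed p-value is close to 1, and the two-tailed p-value is double the smaller tail.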
Example: Hypothesis Testing of the Population Mean μ
In a study conducted by Patel et al., the immune system status was measured on 15 randomly
selected people who are infected with the HIV virus. One’s immune system status is measured by
a CD4 count. Higher CD4 values are associated with better immune system function.
16, 324, 256, 536, 321, 190, 818, 355, 465, 519, 87, 108, 190, 573, 1032
Using MINITAB, the sample mean, median, and standard deviation s were computed and are provided below.
Descriptive Statistics: CD4
Variable   N    Mean    StDev   Q1      Median  Q3
CD4        15   386.0   279.2   190.0   324.0   536.0
Can you infer from this data that the mean CD4 count of all those infected with HIV is less than
500?
(a) State the appropriate null and alternative hypothesis.
H0: μ ≥ 500 (or H0: μ = 500)
Ha: μ < 500
(b) Find the value of the test statistic. Assume the CD4 counts of people infected with HIV are
Normally distributed.
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{386.0 - 500}{279.2/\sqrt{15}} \approx -1.58$
(c) Find the P-value.
With df = n − 1 = 14 and the area in the left tail:
0.05 < p-value < 0.10
(d) Provide a conclusion, testing at a level of significance of 5%:
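Since the p-value is greater than 0.05, we fail to reject H0. Based on this data, we cannot
infer that the mean CD4 count of all those infected with HIV is less than 500.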
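The summary statistics and the test statistic for the CD4 example can be recomputed from the raw data. A minimal sketch using only the standard library (variable names are illustrative):

```python
from math import sqrt
from statistics import mean, stdev

cd4 = [16, 324, 256, 536, 321, 190, 818, 355, 465, 519, 87, 108, 190, 573, 1032]
mu0 = 500                      # hypothesized mean from H0

n = len(cd4)
x_bar = mean(cd4)              # sample mean
s = stdev(cd4)                 # sample standard deviation (n - 1 denominator)
t_stat = (x_bar - mu0) / (s / sqrt(n))

print(f"n = {n}, mean = {x_bar:.1f}, s = {s:.1f}, t = {t_stat:.2f}, df = {n - 1}")
# n = 15, mean = 386.0, s = 279.2, t = -1.58, df = 14
```

The mean and standard deviation agree with the MINITAB summary, and the negative t statistic reflects a sample mean below the hypothesized 500.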
More Examples:
Example: An office manager believes the mean amount of time spent by male office
workers playing fantasy football at work is a normally distributed random variable with a
mean exceeding 25 minutes per week. The office manager randomly selected 18
male office workers (known to participate in such gaming) and had their ‘fantasy football
website’ activity monitored for a week. The amount of time each male spent over the
course of the week is given below, to the nearest minute.
35, 23, 48, 13, 29, 9, 44, 11, 17, 30, 21, 42, 32, 37, 28, 43, 34, 48
The mean, median, and standard deviation of this sample are: x̄ = 30.22, median = 31, s = 12.47
Does this sample support the office manager's belief? Assuming that the amount of time a
male in this particular office spends per week playing fantasy football is normally
distributed, conduct the appropriate statistical test using a level of significance of 5%.
H0: μ ≤ 25
Ha: μ > 25
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{30.22 - 25}{12.47/\sqrt{18}} \approx 1.78$, with df = 17
From the t-table, 0.025 < p-value < 0.05, so we reject H0 at the 5% level.
Based on this data, the mean time spent per week playing fantasy football at work is greater
than 25 minutes.
Example: Premier Redford’s approval rating was at 32% in September 2013. That is, 32%
of Albertans approved of her performance as the Premier of Alberta. In a Leger survey,
one thousand Alberta residents were randomly chosen between February 24th and 28th,
and … ‘approved’ of Premier Redford’s performance as premier.
Does this recent poll indicate that Premier Redford’s approval rating has declined since
September 2013? Test at α = 0.05.
H0: p ≥ 0.32 (or H0: p = 0.32)
Ha: p < 0.32
Example: A cigarette manufacturer claims that the average amount of nicotine in a certain
brand of cigarette is 1.5 milligrams. A random sample of … cigarettes of this particular
brand was taken; the amount of nicotine in each cigarette was measured. The average and
standard deviation of this sample were 1.6 mg and … mg.
Does this data suggest that, on average, the amount of nicotine in this brand of cigarette is equal
to 1.5 milligrams? Use α = 0.05.
H0: μ = 1.5
Ha: μ ≠ 1.5
Example: An Ipsos-Reid poll conducted in early September 2013 suggested that the
percentage of Canadians who will use their cell phones while they are driving is 19%. A
random sample of … drivers in the City of Calgary is taken, of which … were observed
to have a cell phone in their hand while driving. Does this sample suggest that the
percentage of Calgary drivers who use a cell phone while operating a vehicle is …
than the national percentage? Test using α = 0.01.
H0: p = 0.19
Ha: p … 0.19