Statistics
CVEN2002/2702
Week 12
This lecture
11. Analysis of Variance (ANOVA)
11.1 Introduction
11.2 One-way Analysis of Variance
11.3 Multiple pairwise comparisons
11.4 Adequacy of the ANOVA model
11.5 Blocking factor
Additional reading: Sections 9.1, 9.2 and 9.4 in the textbook
11. Analysis of Variance (ANOVA)
11.1 Introduction
Introduction
In Chapter 10, we introduced testing procedures for comparing the means of two different populations, based on two random samples drawn from those populations (two-sample z- and t-tests). In applications, however, we often want to detect a difference among the means of more than two populations.

Imagine the following context: four groups of students were subjected to different teaching techniques and tested at the end of a specified period of time. Do the data shown in the table below present sufficient evidence to indicate a difference in mean achievement for the four teaching techniques?
Tech. 1: 65, 87, 73, 79, 81, 69
Tech. 2: 75, 69, 83, 81, 72, 79, 90
Tech. 3: 59, 78, 67, 62, 83, 76
Tech. 4: 94, 89, 80, 88
Introduction: randomisation
To answer this question, we should first note that the way the students are divided into the 4 groups is of vital importance.

For instance, basic visual inspection of the data suggests that the members of group 4 scored higher than those in the other groups. Can we conclude from this that teaching technique 4 is superior? Perhaps students in group 4 are just better learners.

⇒ it is essential that we divide the students into 4 groups in such a way that it is very unlikely that one of the groups is inherently superior to the others (regardless of the teaching technique it will be subjected to)

⇒ the only reliable method for doing this is to divide the students in a completely random fashion, to balance out the effect of any nuisance variable that may influence the variable of interest

This kind of consideration is part of a very important area of statistical modelling called experimental design, which is not addressed in this course (Chapter 10 in the textbook). In this course, we will always assume that the division of the individuals into the groups was indeed done at random.
Introduction

Now, numerical summaries and a graphical display of the data are always useful:

              Tech. 1   Tech. 2   Tech. 3   Tech. 4
mean x̄        75.67     78.43     70.83     87.75
st. dev. s     8.17      7.11      9.58      5.80

[Figure: side-by-side boxplots of the scores for the four teaching techniques, on a common scale from 60 to 95]

⇒ the boxplots show the variability of the observations within a group and the variability between the groups
⇒ comparing the between-group with the within-group variability is the key to detecting any significant difference between the groups
11.2 One-way Analysis of Variance
Between-group and within-group variability
Group 1: 5.90, 5.92, 5.91, 5.89, 5.88
Group 2: 5.51, 5.50, 5.50, 5.49, 5.50
Group 3: 5.01, 5.00, 4.99, 4.98, 5.02

Between-group variance = 1.017, within-group variance = 0.00018 (ratio = 5545)
Between-group and within-group variability
Group 1: 5.90, 4.42, 7.51, 7.89, 3.78
Group 2: 6.31, 3.54, 4.73, 7.20, 5.72
Group 3: 4.52, 6.93, 4.48, 5.55, 3.52

Between-group variance = 1.017, within-group variance = 2.332 (ratio = 0.436)
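These two toy examples can be reproduced numerically. Below is a minimal sketch (Python/NumPy; the script and the helper name variance_ratio are illustrative additions, not from the slides) that computes the between-group variance of the group means and the average within-group variance, assuming equal group sizes:

```python
import numpy as np

def variance_ratio(groups):
    groups = [np.asarray(g, dtype=float) for g in groups]
    n_i = len(groups[0])                        # common group size
    means = np.array([g.mean() for g in groups])
    between = n_i * np.var(means, ddof=1)       # n_i times the sample variance of the group means
    within = np.mean([np.var(g, ddof=1) for g in groups])  # average within-group variance
    return between, within, between / within

first = [[5.90, 5.92, 5.91, 5.89, 5.88],
         [5.51, 5.50, 5.50, 5.49, 5.50],
         [5.01, 5.00, 4.99, 4.98, 5.02]]
second = [[5.90, 4.42, 7.51, 7.89, 3.78],
          [6.31, 3.54, 4.73, 7.20, 5.72],
          [4.52, 6.93, 4.48, 5.55, 3.52]]

print(variance_ratio(first))   # ~ (1.017, 0.00018, 5545): group differences stand out
print(variance_ratio(second))  # ~ (1.017, 2.332, 0.436): same means, drowned in noise
```

Note that the two data sets have identical group means; only the within-group noise differs, which is exactly why the ratio is the quantity to look at.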
Analysis of Variance

Comparing (intelligently) the between-group variability and the within-group variability is the purpose of the Analysis of Variance ⇒ often shortened to the acronym ANOVA.

Suppose that we have k different groups (k populations, or k sub-populations of a population) that we wish to compare. Often, each group is called a treatment or treatment level (general terms that can be traced back to the early applications of this methodology in the agricultural sciences).

The response for each of the k treatments is the random variable of interest, say X. Denote by $X_{ij}$ the jth observation ($j = 1, \ldots, n_i$) taken under treatment i
⇒ we have k independent samples (one sample from each of the treatments)
ANOVA samples
The k random samples are often presented as:

Treatment 1: $X_{11}, X_{12}, \ldots, X_{1n_1}$ (mean $\bar{X}_1$, st. dev. $S_1$)
Treatment 2: $X_{21}, X_{22}, \ldots, X_{2n_2}$ (mean $\bar{X}_2$, st. dev. $S_2$)
...
Treatment k: $X_{k1}, X_{k2}, \ldots, X_{kn_k}$ (mean $\bar{X}_k$, st. dev. $S_k$)

where $\bar{X}_i$ and $S_i$ are the sample mean and standard deviation of the ith sample. The total number of observations is

$$n = n_1 + n_2 + \ldots + n_k$$

and the grand mean of all the observations, usually denoted $\bar{X}$, is

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} X_{ij} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \ldots + n_k\bar{X}_k}{n}$$
ANOVA model
The ANOVA model is the following:

$$X_{ij} = \mu_i + \varepsilon_{ij}$$

where
$\mu_i$ is the mean response for the ith treatment ($i = 1, 2, \ldots, k$)
$\varepsilon_{ij}$ is an individual random error component ($j = 1, 2, \ldots, n_i$)

As usual for errors, we will assume that the random variables $\varepsilon_{ij}$ are normally and independently distributed with mean 0 and variance $\sigma^2$:

$$\varepsilon_{ij} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2) \quad \text{for all } i, j$$

Therefore, each treatment can be thought of as a normal population with mean $\mu_i$ and variance $\sigma^2$:

$$X_{ij} \overset{\text{ind.}}{\sim} \mathcal{N}(\mu_i, \sigma^2)$$

Important: the variance $\sigma^2$ is common to all treatments
ANOVA hypotheses
We are interested in detecting differences between the treatment means $\mu_i$, which are population parameters ⇒ hypothesis test!

The null hypothesis to be tested is

$$H_0: \mu_1 = \mu_2 = \ldots = \mu_k$$

versus the general alternative

$$H_a: \text{not all the means are equal}$$

Careful! The alternative hypothesis is that at least two of the means differ, not that they are all different!

As pointed out previously, the primary tool when testing for equality of the means is a comparison of the variances within the groups and between the groups.
Variability decomposition
The ANOVA partitions the total variability in the sample data, described by the total sum of squares

$$SS_{Tot} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X})^2 \qquad (df = n - 1)$$

into the treatment sum of squares (= variability between groups)

$$SS_{Tr} = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X})^2 \qquad (df = k - 1)$$

and the error sum of squares (= variability within groups)

$$SS_{Er} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2 \qquad (df = n - k)$$

The sum of squares identity is

$$SS_{Tot} = SS_{Tr} + SS_{Er} \qquad \text{(Proof: exercise)}$$
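As a sanity check on these definitions, here is a short sketch (Python/NumPy, an illustrative addition to the slides) that computes the three sums of squares directly and verifies the identity on the teaching-technique data:

```python
import numpy as np

def sums_of_squares(groups):
    """Compute (SSTot, SSTr, SSEr) from a list of samples, one per treatment."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_obs = np.concatenate(groups)
    grand_mean = all_obs.mean()
    ss_tot = ((all_obs - grand_mean) ** 2).sum()
    ss_tr = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_er = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return ss_tot, ss_tr, ss_er

# teaching-technique data from the introduction
groups = [[65, 87, 73, 79, 81, 69],
          [75, 69, 83, 81, 72, 79, 90],
          [59, 78, 67, 62, 83, 76],
          [94, 89, 80, 88]]
ss_tot, ss_tr, ss_er = sums_of_squares(groups)
assert np.isclose(ss_tot, ss_tr + ss_er)   # the sum of squares identity
print(ss_tot, ss_tr, ss_er)                # ~ 1909.2, 712.6, 1196.6
```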
Variability decomposition
The total sum of squares $SS_{Tot}$ quantifies the total amount of variation contained in the global sample. The treatment sum of squares $SS_{Tr}$ quantifies the variation between the groups, that is, the variation between the group means (giving more weight to groups with more observations). The error sum of squares $SS_{Er}$ quantifies the variation within the groups.
[Figure: three panels showing the global sample, the "treatment sample" of group means, and the "error samples" within each group, illustrating how the total variability splits into between-group and within-group parts]
Mean Squared Error
In sample i, the sample variance is given by

$$S_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2,$$

which is an unbiased estimator for $\sigma^2$: $E(S_i^2) = \sigma^2$ (Slide 43 Week 10)

Since

$$SS_{Er} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2 = \sum_{i=1}^{k} (n_i - 1)\,S_i^2,$$

we have

$$E(SS_{Er}) = \sum_{i=1}^{k} (n_i - 1)\,E(S_i^2) = \sigma^2 \sum_{i=1}^{k} (n_i - 1) = (n - k)\,\sigma^2$$

⇒ another unbiased estimator for $\sigma^2$ is the Mean Squared Error

$$MS_{Er} = \frac{SS_{Er}}{n - k}$$

(generalisation of the pooled sample variance, Slide 21 Week 10)
⇒ the number of degrees of freedom for this error estimator of $\sigma^2$ is $n - k$
Treatment mean square
Now if $H_0$ is true, that is if $\mu_1 = \mu_2 = \ldots = \mu_k = \mu$, we have

$$\bar{X}_i \sim \mathcal{N}(\mu, \sigma^2/n_i), \quad\text{that is}\quad \sqrt{n_i}\,(\bar{X}_i - \mu) \sim \mathcal{N}(0, \sigma^2), \quad \text{for all } i = 1, \ldots, k$$

⇒ $\sqrt{n_1}\,\bar{X}_1, \sqrt{n_2}\,\bar{X}_2, \ldots, \sqrt{n_k}\,\bar{X}_k$ behave like a random sample whose sample variance

$$\frac{1}{k-1}\sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X})^2 = \frac{SS_{Tr}}{k-1}$$

is an unbiased estimator for $\sigma^2$

⇒ the Treatment Mean Square $MS_{Tr}$, defined by

$$MS_{Tr} = \frac{SS_{Tr}}{k-1},$$

is also an unbiased estimator for $\sigma^2$ (when $H_0$ is true)

⇒ the number of degrees of freedom for this treatment estimator of $\sigma^2$ is $k - 1$
ANOVA test
Thus we have two potential estimators of $\sigma^2$:
1. $MS_{Er}$, which always estimates $\sigma^2$
2. $MS_{Tr}$, which estimates $\sigma^2$ only when $H_0$ is true

Actually, if $H_0$ is not true, $MS_{Tr}$ tends to exceed $\sigma^2$, as we have

$$E(MS_{Tr}) = \sigma^2 + \text{true variance between the groups}$$

⇒ the idea of the ANOVA test now takes shape.

Suppose we have observed k samples $x_{i1}, x_{i2}, \ldots, x_{in_i}$, for $i = 1, 2, \ldots, k$, from which we can compute the observed values $ms_{Tr}$ and $ms_{Er}$. Then:
if $ms_{Tr} \simeq ms_{Er}$, then $H_0$ is probably reasonable
if $ms_{Tr} \gg ms_{Er}$, then $H_0$ should be rejected
⇒ this will thus be a one-sided hypothesis test

We need to determine what "$ms_{Tr} \gg ms_{Er}$" means so as to obtain a hypothesis test at a given significance level.
Sampling distribution

It can be shown that, if $H_0$ is true, the ratio

$$F = \frac{MS_{Tr}}{MS_{Er}} = \frac{SS_{Tr}/(k-1)}{SS_{Er}/(n-k)}$$

follows a particular distribution known as Fisher's F-distribution with $k-1$ and $n-k$ degrees of freedom, which is usually denoted by

$$F \sim F_{k-1,\,n-k}$$

Note: Ronald A. Fisher (1890-1962) was an English statistician and biologist. Some say that he almost single-handedly created the foundation for modern statistical science, and some regard him as the greatest biologist since Charles Darwin.
The Fisher F-distribution

A random variable, say X, is said to follow Fisher's F-distribution with $d_1$ and $d_2$ degrees of freedom, i.e.

$$X \sim F_{d_1,d_2},$$

for some positive integers $d_1$ and $d_2$, if its probability density function is given by

$$f(x) = \frac{\Gamma((d_1+d_2)/2)\,(d_1/d_2)^{d_1/2}\,x^{d_1/2 - 1}}{\Gamma(d_1/2)\,\Gamma(d_2/2)\,\big(1 + (d_1/d_2)\,x\big)^{(d_1+d_2)/2}} \quad \text{for } x > 0 \qquad \Rightarrow\ S_X = [0, +\infty)$$

Note: the Gamma function is given by

$$\Gamma(y) = \int_0^{+\infty} x^{y-1} e^{-x}\,dx, \quad \text{for } y > 0$$

It can be shown that $\Gamma(y) = (y-1)\,\Gamma(y-1)$, so that, if y is a positive integer n, $\Gamma(n) = (n-1)!$

There is usually no simple expression for the F-cdf.
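The density formula can be checked against a standard implementation. A minimal sketch (Python; math.gamma and scipy.stats.f are real library functions, while f_pdf is a hypothetical helper written from the formula above):

```python
import math
from scipy import stats

def f_pdf(x, d1, d2):
    """F-density evaluated from the formula above."""
    num = math.gamma((d1 + d2) / 2) * (d1 / d2) ** (d1 / 2) * x ** (d1 / 2 - 1)
    den = math.gamma(d1 / 2) * math.gamma(d2 / 2) * (1 + (d1 / d2) * x) ** ((d1 + d2) / 2)
    return num / den

for x in (0.5, 1.0, 2.5):
    # the two columns should agree to machine precision
    print(f_pdf(x, 4, 10), stats.f.pdf(x, 4, 10))
```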
The Fisher F-distribution

Some F-distributions:

[Figure: pdf $f(x) = F'(x)$ and cdf $F(x)$ of the F-distributions with $(d_1, d_2)$ = (3, 10), (4, 4), (100, 6) and (4, 100)]
The Fisher F-distribution

It can be shown that the mean and the variance of the F-distribution with $d_1$ and $d_2$ degrees of freedom are

$$E(X) = \frac{d_2}{d_2 - 2} \quad \text{for } d_2 > 2$$

and

$$\text{Var}(X) = \frac{2\,d_2^2\,(d_1 + d_2 - 2)}{d_1\,(d_2 - 2)^2\,(d_2 - 4)} \quad \text{for } d_2 > 4$$

Note that an F-distributed random variable is nonnegative, as expected (ratio of two positive random quantities), and the distribution is highly skewed to the right.
The Fisher F-distribution: quantiles

Similarly to what we did for other distributions, we can define the quantiles of any F-distribution. Let $f_{d_1,d_2;\alpha}$ be the value such that

$$P(X > f_{d_1,d_2;\alpha}) = 1 - \alpha \quad \text{for } X \sim F_{d_1,d_2}$$

[Figure: pdf of the $F_{d_1,d_2}$ distribution, with the quantile $f_{d_1,d_2;\alpha}$ cutting off an upper-tail area of $1 - \alpha$]

The F-distribution is not symmetric; however, it can be shown that

$$f_{d_1,d_2;\alpha} = \frac{1}{f_{d_2,d_1;1-\alpha}}$$

For any $d_1$ and $d_2$, the main quantiles of interest may be found in the F-distribution critical values tables.
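The reciprocal relation between quantiles is easy to verify numerically; a small illustrative sketch using scipy.stats.f.ppf (the F quantile function):

```python
from scipy import stats

d1, d2, alpha = 3, 19, 0.05
# check: f_{d1,d2;alpha} = 1 / f_{d2,d1;1-alpha}
lower = stats.f.ppf(alpha, d1, d2)        # alpha-quantile of F_{d1,d2}
upper = stats.f.ppf(1 - alpha, d2, d1)    # (1-alpha)-quantile of F_{d2,d1}
print(lower, 1 / upper)                   # identical up to floating-point rounding
```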
ANOVA test

The null hypothesis to test is $H_0: \mu_1 = \mu_2 = \ldots = \mu_k$ versus the general alternative $H_a$: not all the means are equal.

Evidence against $H_0$ is shown if $MS_{Tr} \gg MS_{Er}$, so we will reject $H_0$ whenever $MS_{Tr}/MS_{Er}$ is much larger than 1

⇒ for testing $H_0$ at significance level $\alpha$, we need a constant c such that

$$\alpha = P\!\left(\frac{MS_{Tr}}{MS_{Er}} > c\right) \quad \text{if } H_0 \text{ is true}$$

We know that, if $H_0$ is true, $F = MS_{Tr}/MS_{Er} \sim F_{k-1,n-k}$
⇒ we have directly that $c = f_{k-1,n-k;1-\alpha}$

From observed values $ms_{Tr}$ and $ms_{Er}$, the decision rule is:

reject $H_0$ if $\dfrac{ms_{Tr}}{ms_{Er}} > f_{k-1,n-k;1-\alpha}$
ANOVA test: p-value
The observed value of the test statistic is

$$f_0 = \frac{ms_{Tr}}{ms_{Er}}$$

and thus the p-value is given by

$$p = P(X > f_0), \quad \text{where } X \sim F_{k-1,n-k}$$

(the probability that the test statistic will take on a value that is at least as extreme as the observed value when $H_0$ is true; definition on Slide 21 Week 9)

[Figure: pdf of the $F_{k-1,n-k}$ distribution, with the p-value the area under the curve to the right of $f_0$]

⇒ from the F-distribution table, only bounds can be found for this p-value (use software to get an exact value)

This test is also often called the F-test or ANOVA F-test.
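In practice, the critical value and the exact p-value come from software. The slides quote MATLAB output; the sketch below shows the same computations in Python/SciPy (an illustrative equivalent), using the numbers of the teaching-technique example treated next (k = 4 treatments, n = 23 observations):

```python
from scipy import stats

k, n, alpha = 4, 23, 0.05
f0 = 3.77                                    # observed value of msTr / msEr

crit = stats.f.ppf(1 - alpha, k - 1, n - k)  # f_{k-1, n-k; 1-alpha}
p_value = stats.f.sf(f0, k - 1, n - k)       # P(X > f0) for X ~ F_{k-1, n-k}
print(crit, p_value)                         # ~ 3.127 and ~ 0.028: reject H0 at 5%
```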
ANOVA table
The computations for this test are usually summarised in tabular form:

Source      degrees of freedom   sum of squares   mean square           F-statistic
Treatment   dfTr = k - 1         ssTr             msTr = ssTr/(k-1)     f0 = msTr/msEr
Error       dfEr = n - k         ssEr             msEr = ssEr/(n-k)
Total       dfTot = n - 1        ssTot

Note 1: dfTot = dfTr + dfEr and ssTot = ssTr + ssEr
Note 2: this table is the usual computer output when an ANOVA procedure is run
ANOVA: example
Example
Consider the data shown on Slide 5. Test at significance level $\alpha = 0.05$ the null hypothesis that there is no difference in mean achievement for the four teaching techniques.

We have $k = 4$, $n_1 = 6$, $n_2 = 7$, $n_3 = 6$ and $n_4 = 4$, with $\bar{x}_1 = 75.67$, $\bar{x}_2 = 78.43$, $\bar{x}_3 = 70.83$, $\bar{x}_4 = 87.75$ and $s_1 = 8.17$, $s_2 = 7.11$, $s_3 = 9.58$, $s_4 = 5.80$. Besides,

$$n = 6 + 7 + 6 + 4 = 23 \quad\text{and}\quad \bar{x} = \frac{1}{n}\sum_{i=1}^{4} n_i\,\bar{x}_i = 77.35$$

Thus, from the expressions on Slides 15 and 16,

$$ss_{Er} = 5 \times 8.17^2 + 6 \times 7.11^2 + 5 \times 9.58^2 + 3 \times 5.80^2 = 1196.63$$

and

$$ss_{Tr} = 6\,(75.67 - 77.35)^2 + 7\,(78.43 - 77.35)^2 + 6\,(70.83 - 77.35)^2 + 4\,(87.75 - 77.35)^2 = 712.59$$
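This hand computation needs only the group sizes, means and standard deviations. A minimal sketch reproducing it (Python/NumPy, an illustrative addition):

```python
import numpy as np

n_i = np.array([6, 7, 6, 4])
xbar_i = np.array([75.67, 78.43, 70.83, 87.75])
s_i = np.array([8.17, 7.11, 9.58, 5.80])

n = n_i.sum()                                      # 23
grand_mean = (n_i * xbar_i).sum() / n              # ~ 77.35
ss_er = ((n_i - 1) * s_i ** 2).sum()               # ~ 1196.63
ss_tr = (n_i * (xbar_i - grand_mean) ** 2).sum()   # ~ 712.59
ms_tr, ms_er = ss_tr / (4 - 1), ss_er / (n - 4)
print(grand_mean, ss_er, ss_tr, ms_tr / ms_er)     # f0 ~ 3.77
```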
ANOVA: example
From there, the ANOVA table can be easily completed:

Source      degrees of freedom   sum of squares    mean square     F-statistic
Treatment   dfTr = 3             ssTr = 712.59     msTr = 237.53   f0 = 3.77
Error       dfEr = 19            ssEr = 1196.63    msEr = 62.98
Total       dfTot = 22           ssTot = 1909.22

Is f0 = 3.77 much larger than 1?
⇒ compare it to the appropriate F-distribution critical value
ANOVA: example
According to MATLAB, $f_{3,19;0.95} = 3.1274$ (in the table: $f_{3,20;0.95} = 3.10$)
⇒ the decision rule is: reject $H_0$ if $ms_{Tr}/ms_{Er} > 3.1274$

Here, we have observed

$$f_0 = \frac{ms_{Tr}}{ms_{Er}} = 3.77 \quad\Rightarrow\quad \text{reject } H_0$$

We can claim that the teaching technique does have an influence on the mean achievement of the students (with less than a 5% chance of being wrong).

The associated p-value is $p = P(X > 3.77) = 0.0281$ for $X \sim F_{3,19}$ (MATLAB again)
⇒ indeed, $p < \alpha = 0.05$ (reject $H_0$)
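The whole test can also be run in one call on the raw data. A hedged sketch using scipy.stats.f_oneway (the slides themselves used MATLAB; this Python equivalent is an assumption of this rewrite):

```python
from scipy import stats

tech1 = [65, 87, 73, 79, 81, 69]
tech2 = [75, 69, 83, 81, 72, 79, 90]
tech3 = [59, 78, 67, 62, 83, 76]
tech4 = [94, 89, 80, 88]

f0, p = stats.f_oneway(tech1, tech2, tech3, tech4)
print(f0, p)   # should reproduce f0 ~ 3.77 and p ~ 0.028 up to rounding
```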
11.3 Multiple pairwise comparisons
ANOVA: confidence intervals on treatment means
The ANOVA F-test will tell you whether the means are all equal or not, but nothing more. When the null hypothesis of equal means is rejected, we will usually want to know which of the $\mu_i$'s differ from one another.

A first step in that direction is to build confidence intervals for the different means $\mu_i$. From our assumptions (normal populations, random samples, equal variance $\sigma^2$ in each group), we have

$$\bar{X}_i \sim \mathcal{N}(\mu_i, \sigma^2/n_i)$$

The value of $\sigma^2$ is unknown; however, we have (numerous!) estimators for it.
ANOVA: confidence intervals on treatment means
For instance, $MS_{Er}$ is an unbiased estimator for $\sigma^2$ with $n - k$ degrees of freedom. This one is based on all n observations from the global sample
⇒ it has smaller variance (i.e. it is more accurate) than any other (like e.g. $S_i^2$), and should always be used in the ANOVA framework!

Acting as usual, we can conclude that

$$\sqrt{n_i}\,\frac{\bar{X}_i - \mu_i}{\sqrt{MS_{Er}}} \sim t_{n-k}$$

and directly write a $100(1-\alpha)\%$ two-sided confidence interval for $\mu_i$, from the observed values $\bar{x}_i$ and $ms_{Er}$:

$$\left[\bar{x}_i - t_{n-k;1-\alpha/2}\sqrt{\frac{ms_{Er}}{n_i}},\ \ \bar{x}_i + t_{n-k;1-\alpha/2}\sqrt{\frac{ms_{Er}}{n_i}}\right]$$

⇒ these confidence intervals will tell which $\mu_i$'s are much different from one another and which ones are close
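A sketch of these interval computations (Python/SciPy, illustrative; stats.t.ppf is the t quantile function):

```python
import numpy as np
from scipy import stats

n_i = np.array([6, 7, 6, 4])
xbar_i = np.array([75.67, 78.43, 70.83, 87.75])
k, n, ms_er = 4, 23, 62.98

t_crit = stats.t.ppf(0.975, n - k)        # t_{19; 0.975} ~ 2.093
half_width = t_crit * np.sqrt(ms_er / n_i)
for i, (lo, hi) in enumerate(zip(xbar_i - half_width, xbar_i + half_width), start=1):
    print(f"95% CI for mu_{i}: [{lo:.2f}, {hi:.2f}]")
```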
ANOVA: confidence intervals on treatment means
For instance, in the previous example (mean achievement for teaching techniques), we would find, with $t_{19;0.975} = 2.093$ (table) and $ms_{Er} = 62.98$:

95% CI for $\mu_1$: $[75.67 \pm 2.093\sqrt{62.98/6}]$ = [68.89, 82.45]
95% CI for $\mu_2$: $[78.43 \pm 2.093\sqrt{62.98/7}]$ = [72.15, 84.71]
95% CI for $\mu_3$: $[70.83 \pm 2.093\sqrt{62.98/6}]$ = [64.05, 77.61]
95% CI for $\mu_4$: $[87.75 \pm 2.093\sqrt{62.98/4}]$ = [79.45, 96.06]

[Figure: the four 95% confidence intervals plotted side by side, by teaching technique]

⇒ it seems clear that $\mu_3 \neq \mu_4$ is the main reason for rejecting $H_0$
ANOVA: pairwise comparisons
It is also possible to build confidence intervals for the differences between two means $\mu_i$ and $\mu_j$. From observed values $\bar{x}_i$, $\bar{x}_j$ and $ms_{Er}$, a $100(1-\alpha)\%$ confidence interval for $\mu_i - \mu_j$ is

$$\left[(\bar{x}_i - \bar{x}_j) - t_{n-k;1-\alpha/2}\sqrt{ms_{Er}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)},\ \ (\bar{x}_i - \bar{x}_j) + t_{n-k;1-\alpha/2}\sqrt{ms_{Er}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}\right]$$

for any pair of groups (i, j) (compare Slide 26 Week 10).

Finding the value 0 in such an interval is an indication that $\mu_i$ and $\mu_j$ are not significantly different. On the other hand, if the interval does not contain 0, that is evidence that $\mu_i \neq \mu_j$.

However, these confidence intervals are sometimes misleading and must be carefully analysed, in particular when related to the global null hypothesis $H_0: \mu_1 = \mu_2 = \ldots = \mu_k$.
ANOVA: pairwise comparisons
Suppose that for a pair (i, j), the $100(1-\alpha)\%$ confidence interval for $\mu_i - \mu_j$ does not contain 0
⇒ at significance level $\alpha$, you would reject $H_0^{(i,j)}: \mu_i = \mu_j$ (Slide 44 Week 9)

If $\mu_i \neq \mu_j$, then automatically $H_0: \mu_1 = \mu_2 = \ldots = \mu_k$ is contradicted
⇒ should you also reject $H_0$ at significance level $\alpha$? No!

When you reject $H_0^{(i,j)}: \mu_i = \mu_j$ at significance level $\alpha$, you essentially keep an $\alpha$ chance of being wrong. Successively testing $H_0^{(1,2)}: \mu_1 = \mu_2$, and then $H_0^{(1,3)}: \mu_1 = \mu_3$, and then ..., and then finally $H_0^{(k-1,k)}: \mu_{k-1} = \mu_k$, that is

$$K = \binom{k}{2} = \frac{k!}{2!\,(k-2)!} \ \text{ pairwise comparisons,}$$

greatly increases the chance of making a wrong decision (look back at Example Slide 32 Week 3).
ANOVA: pairwise comparisons
Suppose that $H_0: \mu_1 = \mu_2 = \ldots = \mu_k$ is true.

If the decisions made for each of the K pairwise tests $H_0^{(i,j)}: \mu_i = \mu_j$ were independent (which they are not! why?), we would wrongly reject at least one null hypothesis with probability $1 - (1-\alpha)^K$ (why?)

If the decisions were perfectly dependent (which they are not either!), we would wrongly reject at least one null hypothesis with probability $\alpha$ (why?)

⇒ if we based our decision about $H_0: \mu_1 = \mu_2 = \ldots = \mu_k$ on the pairwise comparison tests, we would wrongly reject $H_0$ with a probability strictly between $\alpha$ and $1 - (1-\alpha)^K$, larger than $\alpha$!

To fix ideas, suppose $k = 4$ groups, which gives $K = \binom{4}{2} = 6$ pairwise comparisons, and $\alpha = 0.05$
⇒ the test based on pairwise comparisons would have an effective significance level between 0.05 and $1 - (1 - 0.05)^6 = 0.265$
Pairwise comparisons: Bonferroni adjustment

It is usually not possible to determine exactly the significance level of such a test: it all depends on the exact level of dependence between the decisions about the different pairwise comparisons.

Several procedures have been proposed to overcome this difficulty, the simplest being the Bonferroni adjustment method. It is based on the Bonferroni inequality (see Exercise 1 Tut. Week 5):

$$P(A_1 \cup A_2 \cup \ldots \cup A_K) \leq P(A_1) + P(A_2) + \ldots + P(A_K)$$

Suppose that $A_q$ is the event that we wrongly reject $H_0$ for the qth pairwise comparison. Then $B = (A_1 \cup A_2 \cup \ldots \cup A_K)$ is the event that we wrongly reject $H_0: \mu_1 = \ldots = \mu_k$
⇒ if we want $P(B) \leq \alpha$, it is enough to take $P(A_q) = \alpha/K$ for all q.

Hence, to guarantee an overall significance level of at most $\alpha$, the pairwise comparison tests must be carried out at significance level $\alpha/K$ (instead of $\alpha$), where $K = \binom{k}{2}$.
Pairwise comparisons: example
In our running example, we have k = 4 groups, and we can run K = 6 pairwise two-sample t-tests. We find:

t-test for $H_0: \mu_1 = \mu_2$ ⇒ p-value = 0.5276
t-test for $H_0: \mu_1 = \mu_3$ ⇒ p-value = 0.3691
t-test for $H_0: \mu_1 = \mu_4$ ⇒ p-value = 0.0346
t-test for $H_0: \mu_2 = \mu_3$ ⇒ p-value = 0.1293
t-test for $H_0: \mu_2 = \mu_4$ ⇒ p-value = 0.0537
t-test for $H_0: \mu_3 = \mu_4$ ⇒ p-value = 0.0139

At level 5%, we reject $H_0: \mu_1 = \mu_4$ and $H_0: \mu_3 = \mu_4$. From this, can we reject $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ at level 5%? No!
⇒ we must compare the above p-values to $\alpha/K = 0.05/6 = 0.0083$

None are smaller than 0.0083 ⇒ do not reject $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$!

The ANOVA test did reject $H_0$. Is that a contradiction? (No: the Bonferroni procedure is conservative, so it can fail to reject even when the overall F-test does.)
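A sketch of the Bonferroni-adjusted comparison (Python/SciPy, illustrative; scipy.stats.ttest_ind with equal_var=True is assumed to match the pooled-variance two-sample t-tests quoted above):

```python
from itertools import combinations
from scipy import stats

samples = {1: [65, 87, 73, 79, 81, 69],
           2: [75, 69, 83, 81, 72, 79, 90],
           3: [59, 78, 67, 62, 83, 76],
           4: [94, 89, 80, 88]}

alpha, K = 0.05, 6
for i, j in combinations(samples, 2):
    t, p = stats.ttest_ind(samples[i], samples[j], equal_var=True)  # pooled t-test
    verdict = "reject" if p < alpha / K else "do not reject"
    print(f"H0: mu_{i} = mu_{j}: p = {p:.4f} -> {verdict} at adjusted level {alpha/K:.4f}")
```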
11.4 Adequacy of the ANOVA model
Adequacy of the ANOVA model
The ANOVA model is based on several assumptions that should be carefully checked. The central assumption here is that the random variables $\varepsilon_{ij} = X_{ij} - \mu_i$, $i = 1, \ldots, k$ and $j = 1, \ldots, n_i$, are (1) independent and (2) normally distributed,

$$\varepsilon_{ij} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2),$$

with (3) the same variance $\sigma^2$ in each group.

We do not have access to values for $\varepsilon_{ij}$ (the $\mu_i$'s are unknown!), however we can approximate these values by the observed residuals

$$\hat{e}_{ij} = x_{ij} - \bar{x}_i$$

Note that these residuals are the quantities arising in $ss_{Er}$
⇒ as for a regression model (see Slides 41-42 Week 11), the adequacy of the ANOVA model is established by examining the residuals ⇒ residual analysis
Residual analysis

The normality assumption can be checked by constructing a normal quantile plot of the residuals.

The assumption of equal variances in each group can be checked by plotting the residuals against the treatment level (that is, against $\bar{x}_i$)
⇒ the spread of the residuals should not depend in any way on $\bar{x}_i$

A rule of thumb is that, if the ratio of the largest sample standard deviation to the smallest one is smaller than 2, the assumption of equal population variances is reasonable.

The assumption of independence can be checked by plotting the residuals against time, if this information is available
⇒ no pattern, such as sequences of positive and negative residuals, should be observed

As for regression, the residuals are everything the model does not account for ⇒ no information should remain in the residuals; they should look like random noise
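A sketch of these basic residual diagnostics for the running example (Python; matplotlib and scipy.stats.probplot are real library calls, the script itself is an illustrative addition):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

groups = [[65, 87, 73, 79, 81, 69],
          [75, 69, 83, 81, 72, 79, 90],
          [59, 78, 67, 62, 83, 76],
          [94, 89, 80, 88]]

# residuals e_ij = x_ij - xbar_i, with the fitted value xbar_i for each observation
residuals = np.concatenate([np.asarray(g) - np.mean(g) for g in groups])
fitted = np.concatenate([np.full(len(g), np.mean(g)) for g in groups])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
stats.probplot(residuals, dist="norm", plot=ax1)   # normal quantile plot
ax2.scatter(fitted, residuals)                     # spread should not depend on xbar_i
ax2.axhline(0, linestyle="--")
plt.show()
```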
Residual analysis: example
For our running example, a normal quantile plot of the residuals and a plot of the residuals against the fitted values $\bar{x}_i$ are shown below:

[Figure: normal Q-Q plot of the residuals (left) and residuals plotted against the fitted values $\bar{x}_i$ (right)]

⇒ nothing (obvious) to report
⇒ the assumptions we made look valid
11.5 Blocking factor
Residual analysis: example
Example
To assess the reliability of timber structures, researchers have studied strength factors of structural lumber. Three species of Canadian softwood were analysed for bending strength (Douglas Fir, Hem-Fir and Spruce-Pine-Fir). Wood samples were selected from randomly selected sawmills. The results of the experiment are given below. Is there any significant difference in the mean bending parameters among the three types of wood?

Douglas (1): 370, 150, 372, 145, 374, 365
Hem (2): 381, 401, 175, 185, 374, 390
Spruce (3): 440, 210, 230, 400, 386, 410

⇒ an ANOVA was run to test the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$ against the alternative $H_a$: not all the means are equal
Residual analysis: example
We computed values for the ANOVA table:

Source      degrees of freedom   sum of squares   mean square   F-statistic
Treatment   2                    7544             3772          f0 = 0.33
Error       15                   172929           11529
Total       17                   180474

In the F-distribution table, we can find that $f_{2,15;0.95} = 3.68$
⇒ here we have observed $f_0 = 0.33$ ⇒ do not reject $H_0$!

Associated p-value: $p = P(X > 0.33) = 0.726$ for $X \sim F_{2,15}$
⇒ we confidently claim that there is no significant difference in the mean bending parameters for the different wood types
Residual analysis: example
Residual analysis:

[Figure: normal Q-Q plot of the residuals (left) and residuals plotted against the fitted values (right); both plots show clear structure]

⇒ the assumptions are clearly not fulfilled!
⇒ the above conclusion is certainly not reliable!
Blocking factor
If we had plotted the data first, we would have seen:

[Figure: bending parameter (150 to 450) plotted against wood type for each of the six mills; observations from the same mill sit at similar levels across the three wood types]

(Bottom line: always plot the data before analysing them!)
Blocking factor
It is clear that, over and above the wood type, the mill from which the lumber was selected is another source of variability, in this example even more important than the main treatment of interest (wood type).

This kind of extra source of variability is known as a blocking factor, as it essentially groups some observations in blocks across the initial groups ⇒ the samples are not independent! (assumption violation)
⇒ a potential blocking factor must be taken into account!

When a blocking factor is present, the initial Error Sum of Squares, say $SS'_{Er}$, that is, the whole amount of variability not due to the treatment, can in turn be partitioned into:
1. the variability due to the blocking factor, quantified by $SS_{Block}$
2. the true natural variability in the observations, $SS_{Er}$

We can write that $SS'_{Er} = SS_{Block} + SS_{Er}$, and thus

$$SS_{Tot} = SS_{Tr} + SS_{Block} + SS_{Er}$$
Blocking factor
The ANOVA table becomes:

Source      degrees of freedom   sum of squares   mean square                   F-statistic
Treatment   k - 1                ssTr             msTr = ssTr/(k-1)             f0 = msTr/msEr
Block       b - 1                ssBlock          msBlock = ssBlock/(b-1)
Error       n - k - b + 1        ssEr             msEr = ssEr/(n-k-b+1)
Total       n - 1                ssTot

where b is the number of blocks.

Note: the test statistic is again the ratio $ms_{Tr}/ms_{Er}$ (we have just removed the variability due to the blocking factor first), to be compared with the quantile of the $F_{k-1,\,n-k-b+1}$ distribution.
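For a balanced design with one observation per (block, treatment) cell, the extra sum of squares is straightforward to compute by hand. A hedged sketch (Python/NumPy, illustrative; note that the true mill-to-observation pairing is not recoverable from the data slide, so the matrix below is an assumed grouping):

```python
import numpy as np

def blocked_anova(x):
    """x has shape (b, k): one observation per (block, treatment) cell;
    rows are blocks (mills), columns are treatments (wood types)."""
    b, k = x.shape
    n = b * k
    grand = x.mean()
    ss_tot = ((x - grand) ** 2).sum()
    ss_tr = b * ((x.mean(axis=0) - grand) ** 2).sum()      # treatment (column) means
    ss_block = k * ((x.mean(axis=1) - grand) ** 2).sum()   # block (row) means
    ss_er = ss_tot - ss_tr - ss_block
    f0 = (ss_tr / (k - 1)) / (ss_er / (n - k - b + 1))
    return ss_tr, ss_block, ss_er, f0

# Lumber data; the mill pairing is an ASSUMPTION made for illustration only,
# so the printed numbers will only approximate the table on the next slide.
x = np.array([[145, 175, 210],
              [150, 185, 230],
              [365, 374, 386],
              [370, 381, 400],
              [372, 390, 410],
              [374, 401, 440]])
print(blocked_anova(x))
```

With the true mill pairing, the call would reproduce the table below: ssTr = 7544, ssBlock = 170552, ssEr = 2378 and f0 = 15.87.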
Blocking factor
In the previous example, we would have found:

Source      degrees of freedom   sum of squares   mean square   F-statistic
Treatment   2                    7544             3772          f0 = 15.87
Block       5                    170552           34110
Error       10                   2378             238
Total       17                   180474

In the F-distribution table, we can find that $f_{2,10;0.95} = 4.10$. Here, we have observed $f_0 = 15.87$ ⇒ clearly reject $H_0$!

Associated p-value: $p = P(X > 15.87) = 0.0008$ for $X \sim F_{2,10}$
Blocking factor: comments
The $SS_{Er}$ in the first ANOVA (without block) was 172,929, which contains an amount of variability of 170,552 due to mills
⇒ about 99% of the initial $SS_{Er}$ was due to mill-to-mill variability, not to natural variability!

The second ANOVA (with blocking factor) adjusts for this effect. The net effect is a substantial reduction in the genuine $MS_{Er}$, leading to a larger F-statistic (increased from 0.33 to 15.87!)
⇒ with very little risk of being wrong (p ≈ 0), we can now conclude that there is a significant difference in the mean bending parameters for the three different wood types.

An analysis of the residuals in this second ANOVA would not show anything peculiar ⇒ valid conclusion.

Generally speaking, ignoring a blocking factor can lead to a misleading conclusion, and it should always be carefully assessed whether a blocking factor may exist or not (plot the data!).
Objectives
Now you should be able to:
conduct engineering experiments involving a treatment with a
certain number of levels
understand how the ANOVA is used to analyse the data from
these experiments
assess the ANOVA model adequacy with residual plots
understand the blocking principle and how it is used to isolate the
effect of nuisance factors
Recommended exercises: Q3, Q6 p.406, Q9 p.407, Q10, Q11 p.412,
Q13, Q15, Q17 p.413, Q19 p.414, Q22, Q23 p.415, Q35 p.428