ANOVA
• One-Way Analysis of Variance
The one-way analysis of variance allows us to compare
several groups of observations, all of which are
independent but possibly with a different mean for each
group. A test of great importance is whether or not all the
means are equal.
Feeds for chickens
A 180 177 175 170 182 181 177 180 183 185
B 199 203 200 194 195 204 206 207 202 200
C 191 194 201 193 197 195 203 199 199 201 206 197
Comment?
Ma = 179 Mb = 201 Mc = 198
At the beginning, N could be > 12 for each treatment (total > 30 or 40).
How long is your experiment?
You need to prepare more samples than needed.
It is important to have more than 30 samples.
What is the best feed?
Could you use a t-test for each pair of feeds, then compare the means m?
Or use another method based on variance?
Controlled factor: 3 feeds
Uncontrolled factor
Together, the two factors produce the observed variations
• The same feed, but hens give different eggs
• But all environmental factors are the same: variety, age,
health status…
Effect of controlled factor | Effect of uncontrolled factor
What can you say about the effect of feeds?
Action of feeds
Effect of controlled factor | Effect of uncontrolled factor
What can you say about the effect of feeds?
No action of feeds
Effect of controlled factor | Effect of uncontrolled factor
What can you say about the effect of feeds?
Is it possible to quantify these variations, to compare them?
Of course, but it is necessary to make a hypothesis.
H0: All levels of treatment belong to the same statistical population
What does this mean?
The 3 feeds have the same effect on the number of eggs laid
We suppose that all observations belong to the same population.
The distribution of the observations (the number of
eggs) follows a normal distribution.
How to know the feed variation?
We must remove the genetic variation.
How?
Imagine that all the chickens receiving feed A lay the same
number of eggs,
the chickens receiving feed B lay the same number of eggs,
and the chickens receiving feed C lay the same number of eggs.
The new data table
A 179 179 179 179 179 179 179 179 179 179
B 201 201 201 201 201 201 201 201 201 201
C 198 198 198 198 198 198 198 198 198 198 198 198
Vt = variance of treatment
N = number of treatment levels = 3 (action of feeds)
How to know the genetic variation?
We must remove the treatment variation.
How?
1. Take each level of treatment.
2. In each level of treatment, the differences between data
express the variation due to genetics.
3. It is expressed as the “error variance”.
Expression of the “error”
The Anova table
A 180 177 175 170 182 181 177 180 183 185
B 199 203 200 194 195 204 206 207 202 200
C 191 194 201 193 197 195 203 199 199 201 206 197
          Sum of squares   Degrees of   Variance   F ratio   F limit   Conclusion
          Σ(x − X)²        freedom      SS/DF
VT
Vt                                                 Vt/Ve     table
Verror
The Anova table
          Sum of squares   Degrees of freedom   Variance   F ratio         F limit   Conclusion
VT        3448             31                   111.2
Vt        2900             2                    1450       Vt/Ve = 76.7              H0 refused for α = 0.05
Verror    548              29                   18.9
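The entries of the Anova table can be recomputed from the raw feed data. The following is a minimal sketch in plain Python, not the course's own procedure; the function name `one_way_anova` and the dictionary `feeds` are illustrative names introduced here.

```python
from statistics import mean

# Egg counts per hen for the three feeds (data from the slides).
feeds = {
    "A": [180, 177, 175, 170, 182, 181, 177, 180, 183, 185],
    "B": [199, 203, 200, 194, 195, 204, 206, 207, 202, 200],
    "C": [191, 194, 201, 193, 197, 195, 203, 199, 199, 201, 206, 197],
}


def one_way_anova(groups):
    """Return (SS treatment, SS error, DF treatment, DF error, F ratio)."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    # Treatment variation: every observation replaced by its group mean.
    ss_treat = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Error variation: differences inside each level of treatment.
    ss_error = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_treat = len(groups) - 1
    df_error = len(all_values) - len(groups)
    f_ratio = (ss_treat / df_treat) / (ss_error / df_error)
    return ss_treat, ss_error, df_treat, df_error, f_ratio


ss_treat, ss_error, df_treat, df_error, f_ratio = one_way_anova(feeds)
print(f"SS treatment = {ss_treat:.0f} (DF {df_treat})")
print(f"SS error     = {ss_error:.0f} (DF {df_error})")
print(f"F ratio      = {f_ratio:.1f}")
```

The F ratio is then compared with the F limit read from a table for (2, 29) degrees of freedom at the chosen α.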
• Which feed is the best one?
We use the Student test but with a different approximation
of sd.
Two-tailed decision (α = 5 %): in the central region, the hypothesis that the 2 levels
belong to the same statistical population is accepted; in either tail, the hypothesis is refused.
Ma = 179 Mb = 201 Mc = 198
            Mb = 201                   Mc = 198
Ma = 179    Mb – Ma = 22               Mc – Ma = 19
            Significantly different    Significantly different
Mb = 201                               Mb – Mc = 3
                                       Not significantly different
Between A and B
Sd = 1.9
2 sd = 3.8
Mb – Ma = 22 > 3.8
Conclusion: A and B are different
Between A and C
Sd = 1.9
2 sd = 3.8
Mc – Ma = 19 > 3.8
Conclusion: A and C are different
Between B and C
Sd = 1.9
2 sd = 3.8
Mb – Mc = 3 < 3.8
Conclusion: B and C are the same
Why do B and C belong to the same population even though Mb
and Mc differ slightly?
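The three comparisons can be reproduced with a short sketch. It assumes that the sd of a difference between two means is sqrt(Ve × (1/n1 + 1/n2)), with Ve the error variance from the Anova table; the slides only give the value Sd = 1.9, so this formula is an assumption that happens to reproduce it. The names `compare` and `results` are illustrative.

```python
from itertools import combinations
from math import sqrt

error_variance = 548 / 29          # Verror from the Anova table (about 18.9)
means = {"A": 179, "B": 201, "C": 198}
sizes = {"A": 10, "B": 10, "C": 12}


def compare(g1, g2):
    """Return (difference of means, sd of the difference, different?)."""
    # Assumed formula: pooled error variance shared by both groups.
    sd = sqrt(error_variance * (1 / sizes[g1] + 1 / sizes[g2]))
    diff = abs(means[g1] - means[g2])
    return diff, sd, diff > 2 * sd


results = {pair: compare(*pair) for pair in combinations(means, 2)}
for (g1, g2), (diff, sd, different) in results.items():
    verdict = "different" if different else "the same"
    print(f"{g1} vs {g2}: diff = {diff}, 2 sd = {2 * sd:.1f} -> {verdict}")
```

A difference smaller than 2 sd is within what the genetic (error) variation alone can produce, which is why B and C are declared the same although their means are not identical.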
Block design
Hypothesis: experiment with one controlled experimental factor but without
control of environmental heterogeneity
(uncontrolled factors such as temperature, light…).
If there is an action of block and also an action of level of treatment,
we cannot give an interpretation.
Block design
1 2 3 4 5 6 7 8 9 10 m
A 180 177 175 170 182 181 177 180 183 185 179
B 199 203 200 194 195 204 206 207 202 200 201
C 191 194 201 193 197 195 203 199 199 201 197
mb 190 191 192 186 191 193 195 195 195 195 192.4
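One way to separate the block variation from the feed variation in this table is the usual randomized-block decomposition of the sum of squares. The sketch below is an illustration under that assumption (the slides do not show the computation); all variable names are introduced here.

```python
from statistics import mean

# Rows: feeds; columns: blocks 1..10 (values from the block-design table).
table = {
    "A": [180, 177, 175, 170, 182, 181, 177, 180, 183, 185],
    "B": [199, 203, 200, 194, 195, 204, 206, 207, 202, 200],
    "C": [191, 194, 201, 193, 197, 195, 203, 199, 199, 201],
}
feed_names = list(table)
n_blocks = len(table["A"])

grand_mean = mean(x for row in table.values() for x in row)
feed_means = {f: mean(table[f]) for f in feed_names}
block_means = [mean(table[f][b] for f in feed_names) for b in range(n_blocks)]

# Total variation split into treatment, block and error parts.
ss_total = sum((x - grand_mean) ** 2 for row in table.values() for x in row)
ss_treat = n_blocks * sum((m - grand_mean) ** 2 for m in feed_means.values())
ss_block = len(feed_names) * sum((m - grand_mean) ** 2 for m in block_means)
ss_error = ss_total - ss_treat - ss_block

df_treat = len(feed_names) - 1
df_block = n_blocks - 1
df_error = df_treat * df_block
f_treat = (ss_treat / df_treat) / (ss_error / df_error)
f_block = (ss_block / df_block) / (ss_error / df_error)
print(f"SS treatment = {ss_treat:.1f}, SS block = {ss_block:.1f}, SS error = {ss_error:.1f}")
print(f"F treatment = {f_treat:.1f}, F block = {f_block:.2f}")
```

Each F ratio is compared with its own F limit, with (2, 18) degrees of freedom for the feeds and (9, 18) for the blocks.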
Two-Way Analysis of Variance
A two-way analysis of variance is a way of studying the effects of two
factors separately (their main effects) and (sometimes) together (their
interaction effect).
m1 m2 m3 m4
n1 n2 n3 n4
One measurement variable and two nominal variables
A1 A2
B1 180 191
177 193
175 201
182 193
177 205
180 204
184 199
185 198
B2 199 219
203 221
200 229
195 221
206 233
207 232
206 227
200 226
Assumptions
• Assumption #1: Your dependent variable should be measured at the continuous level:
revision time (hours), intelligence (IQ score), exam performance (0 to 100), weight (kg)
• Assumption #2: Your two independent variables should each consist of two or more
categorical, independent groups: gender (2 groups: male or female), ethnicity (3 groups:
Caucasian, African American and Hispanic)
• Assumption #3: You should have independence of observations, which means that there is
no relationship between the observations in each group or between the groups themselves
• Assumption #4: There should be no significant outliers
• Assumption #5: Your dependent variable should be approximately normally distributed
for each combination of the groups of the two independent variables.
• Assumption #6: There needs to be homogeneity of variances for each combination of the
groups of the two independent variables
Hypotheses
• H01: the means are equal for different values
of the first nominal variable;
• H02: the means are equal for different values of
the second nominal variable;
• H03: there is no interaction (the effects of one
nominal variable don't depend on the value of the other
nominal variable).
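The three hypotheses correspond to three sums of squares (factor A, factor B, interaction), each tested against the error variance. Below is a minimal sketch for the balanced A1/A2 × B1/B2 table (8 data per cell) shown earlier; it uses the standard balanced two-way decomposition, and the variable names are illustrative rather than taken from the slides.

```python
from statistics import mean

# Cell data from the A1/A2 x B1/B2 table (8 observations per cell).
cells = {
    ("A1", "B1"): [180, 177, 175, 182, 177, 180, 184, 185],
    ("A2", "B1"): [191, 193, 201, 193, 205, 204, 199, 198],
    ("A1", "B2"): [199, 203, 200, 195, 206, 207, 206, 200],
    ("A2", "B2"): [219, 221, 229, 221, 233, 232, 227, 226],
}
a_levels, b_levels = ["A1", "A2"], ["B1", "B2"]
r = 8  # replicates per cell

grand = mean(x for v in cells.values() for x in v)
a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
cell_mean = {key: mean(v) for key, v in cells.items()}

ss_a = r * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)      # H01
ss_b = r * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)      # H02
ss_ab = r * sum(                                                                # H03
    (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
    for a in a_levels for b in b_levels
)
ss_error = sum((x - cell_mean[key]) ** 2 for key, v in cells.items() for x in v)

df_error = len(a_levels) * len(b_levels) * (r - 1)
ms_error = ss_error / df_error
f_a = ss_a / (len(a_levels) - 1) / ms_error
f_b = ss_b / (len(b_levels) - 1) / ms_error
f_ab = ss_ab / ((len(a_levels) - 1) * (len(b_levels) - 1)) / ms_error
print(f"SS A = {ss_a:.0f}, SS B = {ss_b:.0f}, SS AxB = {ss_ab:.0f}, SS error = {ss_error:.0f}")
print(f"F A = {f_a:.1f}, F B = {f_b:.1f}, F AxB = {f_ab:.2f}")
```

Each F ratio is compared with the F limit for (1, 28) degrees of freedom; a hypothesis is refused when its F ratio exceeds the limit.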
Are whales heavier in early or late mating season, and does that
depend on the gender of the whale?
“Month in mating season” and “gender of whale” are nominal factors
(independent variables)
Dependent variable: weight
H01: The means of all month groups are equal
H1: The mean of at least one month group is different
H02: The means of the gender groups are equal
H1: The means of the gender groups are different
H03: There is no interaction between the month and gender
H1: There is interaction between the month and gender
Before doing experiments, you need to design them
and think about how to analyze the data
215.4
265.07
375.67
5.41
Hyp. Refused
320.5 45.79 3.44 2.5 Hyp. refused
279.5 13.3
enzyme activity of mannose-6-phosphate isomerase (MPI) and MPI
genotypes in the amphipod crustacean Platorchestia platensis
Mann-Whitney test
Non-parametric test for independent measures between two groups;
it is performed on ranked data (the non-parametric counterpart of the parametric t-test)
Used on non-normally distributed data
Do these 2 samples come from the same population, with
α = 5%?
H0: there is no difference between the ranks of the 2 samples
H1: there is a difference between the ranks of the 2 samples
Sample 1: 17 18 19 20 22 23
Sample 2: 18 19 19 20 21 22
Mann–Whitney U test (also called the Wilcoxon rank-sum test).
Non-parametric statistical hypothesis test for assessing whether two
independent samples of observations have equally large values (n < 30)
The data of the two samples are ranked against each other. First U1, for
sample 1:
(1) 17 18 19 20 22 23
(2) 18 19 19 20 21 22
For each value of (1), count the data of (2) that are larger; each value in common (tie) counts 0.5.
No data (2) > 23: u = 0
0 data (2) > 22 and one 22 in common: u = 0.5
2 data (2) > 20 and one 20 in common: u = 2 + 0.5 = 2.5
3 data (2) > 19 and two 19 in common: u = 3 + 2 × 0.5 = 4
5 data (2) > 18 and one 18 in common: u = 5 + 0.5 = 5.5
6 data (2) > 17: u = 6
U1 = 6 + 5.5 + 4 + 2.5 + 0.5 + 0 = 18.5
… and now U2, for the second sample:
(2) 18 19 19 20 21 22
(1) 17 18 19 20 22 23
1 data (1) > 22 and one 22 in common: u = 1.5
2 data (1) > 21: u = 2
2 data (1) > 20 and one 20 in common: u = 2.5
3 data (1) > 19 and one 19 in common: u = 3.5 (for each of the two 19 of sample 2)
4 data (1) > 18 and one 18 in common: u = 4.5
U2 = 4.5 + 3.5 + 3.5 + 2.5 + 2 + 1.5 = 17.5
U1 + U2 = 18.5 + 17.5 = 36 = n1 × n2
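The counting rule is easy to get wrong by hand when there are ties, so it can help to check it mechanically. The sketch below applies the rule literally (each larger value in the other sample counts 1, each tie counts 0.5); the function name `u_by_counting` is an illustrative name, not part of any library.

```python
def u_by_counting(sample, other):
    """Sum, over each value of `sample`, the number of values of `other`
    that are larger, counting every tie as 0.5."""
    u = 0.0
    for x in sample:
        for y in other:
            if y > x:
                u += 1.0
            elif y == x:
                u += 0.5
    return u


sample_1 = [17, 18, 19, 20, 22, 23]
sample_2 = [18, 19, 19, 20, 21, 22]

u1 = u_by_counting(sample_1, sample_2)
u2 = u_by_counting(sample_2, sample_1)
print("U1 =", u1, "U2 =", u2, "U1 + U2 =", u1 + u2)
```

Whatever the data, U1 + U2 must equal n1 × n2 (here 6 × 6 = 36), which is a quick consistency check on a manual computation.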
The U statistic shows the degree of overlap in rank between the 2 groups
(Sample 1 and Sample 2)
What are the limits of U1 and U2?
Example 2
Sample 1: 1 2 3
Sample 2: 1 2 3
U1 = 2.5 + 1.5 + 0.5 = 4.5
U2 = 2.5 + 1.5 + 0.5 = 4.5
U1 = U2 = (n1 × n2)/2 = (3 × 3)/2 = 4.5
Example 3
Sample 1: 1 2 3
Sample 2: 4 5 6
U2 = 0 and U1 = n1 × n2 = 9
No overlap between Sample 1 and Sample 2: U = 0
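The two limiting cases can be checked with the same counting rule (larger value counts 1, tie counts 0.5). This is only a sketch; `u_by_counting` is an illustrative helper defined here.

```python
def u_by_counting(sample, other):
    """Count, for each value of `sample`, the values of `other` that are
    larger (1 each) or equal (0.5 each), and add everything up."""
    return sum(
        1.0 if y > x else 0.5 if y == x else 0.0
        for x in sample
        for y in other
    )


# Example 2: identical samples, complete overlap.
u_identical = u_by_counting([1, 2, 3], [1, 2, 3])

# Example 3: completely separated samples, no overlap.
u1_separated = u_by_counting([1, 2, 3], [4, 5, 6])
u2_separated = u_by_counting([4, 5, 6], [1, 2, 3])

print(u_identical, u1_separated, u2_separated)
```

So the smaller of U1 and U2 always lies between 0 (no overlap) and n1 × n2 / 2 (complete overlap).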
Smaller U = bigger difference between groups
Bigger U = smaller difference between groups
Example 2: U1 = U2 = (n1 × n2)/2 = (3 × 3)/2 = 4.5
The 2 samples belong to the same population
Example 3: U1 = 3 + 3 + 3 = 9 = n1 × n2 and U2 = 0
The 2 samples are different
Are U1 and U2 far from (n1 × n2)/2?
Using the Mann-Whitney table:
U = 0 .......... Ucrit (α = %) .......... U = (n1 × n2)/2
Hypothesis refused | Hypothesis accepted
Hypothesis: the two samples belong
to the same statistical population
If n < 20
n1 … 5 6 7 …
n2
… … … … …
6 - 5 6 …
… limit value (α = 5 %) …
U = 0 .......... 5 .......... U = 18
Hypothesis refused | Hypothesis accepted
The hypothesis is accepted with an α risk of 5%
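U stat A = Sum of ranks A – n(n+1)/2
U stat B = Sum of ranks B – n(n+1)/2
Sum of ranks A = 38.5 and sum of ranks B = 39.5 (tied values receive the average rank)
U stat A = 38.5 – 6(6+1)/2 = 17.5
U stat B = 39.5 – 6(6+1)/2 = 18.5
U stat = smallest = 17.5
U crit = 5 by checking the Mann-Whitney table
U stat > U crit, so H0 is not rejected: there is no significant difference between the ranks of the 2 samples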
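The rank-sum formula needs the ranks of the pooled data, with tied values sharing their average rank. The sketch below shows one way to do this in plain Python; the helper names are illustrative, and the decision rule "refuse H0 when U stat ≤ U crit" is the usual table convention, stated here as an assumption.

```python
def average_ranks(values):
    """Map each distinct value to its average 1-based rank in `values`."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}


def u_from_rank_sum(sample, other):
    """Return (sum of ranks of `sample`, U = rank sum - n(n+1)/2)."""
    ranks = average_ranks(sample + other)
    rank_sum = sum(ranks[x] for x in sample)
    n = len(sample)
    return rank_sum, rank_sum - n * (n + 1) / 2


sample_a = [17, 18, 19, 20, 22, 23]
sample_b = [18, 19, 19, 20, 21, 22]

rank_sum_a, u_a = u_from_rank_sum(sample_a, sample_b)
rank_sum_b, u_b = u_from_rank_sum(sample_b, sample_a)
u_stat = min(u_a, u_b)
u_crit = 5  # limit value quoted in the slides for n1 = n2 = 6, alpha = 5 %
reject_h0 = u_stat <= u_crit  # assumed convention: small U means little overlap
print(rank_sum_a, rank_sum_b, u_a, u_b, u_stat, reject_h0)
```

Note that the formula takes the sum of ranks as input, not the U values obtained by counting; a U statistic can never be negative.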
• The Mann–Whitney U-test applies to one nominal
variable (qualitative data) with only two values, that is two samples,
and one measurement or ranked variable
• It is the non-parametric analogue of the two-sample t-test.
• Nominal variables: sex (male or female), genotype (AA,
Aa, or aa), or ankle condition (values are normal, sprained,
torn ligament, or broken).
Is there statistical evidence of a difference in APGAR scores in women receiving the
new and enhanced versus usual prenatal care?
U1 = 46.5
U2 = 17.5
U statA = 10.5
U statB = -18.5
Ustat = 10.5
Ucrit = 13
Reject H0, Accept H1: there is significant
different among 2 methods in APGAR score
with risk of 5%
U stat A = Sum of rank A – n(n+1)/2
U stat B = Sum of rank B – n(n+1)/2
Practical: Exercises
You analyse the data of an experiment testing the effect of different medium
compositions on rice plant development.
1. Which medium is best suited for rice plant development?
2. If you test the effect of the media in different time periods, can you reach the same
conclusion as before?
3. Compare the first 2 treatments
A B C D E
121 112 117 128 123
126 103 124 130 125
141 122 123 127 115
125 105 115 126 117
118 106 120 128 121
125 112 121 129 119
Products produced in chain A, B, and C
A B C
Defective 5 8 9
Non-defective 35 42 51
Are the proportions of defective items in the 3 chains different or not?
Kruskal–Wallis H
• Non-parametric method for testing the equality of
population medians among groups.
• It is identical to a one-way analysis of variance
with the data replaced by their ranks.
• It is an extension of the Mann–Whitney test to 3 or more
groups (used here for 3 to 7 groups with N < 30)
ANOVA one way: one nominal variable and one measurement variable
Kruskal–Wallis: the measurement variable does not meet the normality assumption
Data
Liver 1 Liver 2 Liver 3
18 15 15
20 16 20
22 17 21
25 21 25
H0: there is no difference between the ranks of the 3 samples
H1: there is a difference between the ranks of the 3 samples
You have to check the calcium levels (mg per liter)
in three livers (12 samples)
Data                          Rank         Average rank
Liver 1  Liver 2  Liver 3                  Liver 1  Liver 2  Liver 3
         15       15          1 and 2               1.5      1.5
         16                   3                     3
         17                   4                     4
18                            5            5
20                20          6 and 7      6.5               6.5
         21       21          8 and 9               8.5      8.5
22                            10           10
25                25          11 and 12    11.5              11.5
Sum                                        33       17       28
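Then calculate the statistic
$$h = \frac{12}{n(n+1)} \sum_i \frac{r_i^2}{n_i} - 3(n+1)$$
where r_i is the sum of ranks of sample i, n_i its size, and n the total number of data.
$$h = \frac{12}{12 \times 13} \left( \frac{33^2}{4} + \frac{17^2}{4} + \frac{28^2}{4} \right) - 3(13) = 2.58$$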
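The ranking and the formula can be checked with a short sketch. It implements the statistic exactly as written above, without any correction for ties (statistical packages usually apply one, so their value can differ slightly). The function names are illustrative.

```python
def average_ranks(values):
    """Map each distinct value to its average 1-based rank in `values`."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}


def kruskal_wallis(groups):
    """Return (rank sums per group, h) with h = 12/(n(n+1)) * sum(r_i^2/n_i) - 3(n+1)."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    n = len(pooled)
    rank_sums = [sum(ranks[x] for x in group) for group in groups]
    h = 12 / (n * (n + 1)) * sum(
        r ** 2 / len(group) for r, group in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return rank_sums, h


livers = [
    [18, 20, 22, 25],  # Liver 1
    [15, 16, 17, 21],  # Liver 2
    [15, 20, 21, 25],  # Liver 3
]
rank_sums, h = kruskal_wallis(livers)
print("rank sums:", rank_sums)
print(f"h = {h:.2f}")
```

The same function can be reused for any number of groups of unequal size, since each rank sum is divided by its own n_i.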
The limit value is given by the table of Kruskal–Wallis
Be careful: every combination of sample sizes gives its own limit value
Number of samples: 3
Size of each sample    α = 5%
…                      …
4 4 3                  5.598
4 4 4                  5.692
5 2 1                  5.00
H = 0 .......... 2.58 .......... limit value = 5.70
Hypothesis accepted | Hypothesis refused
Conclusion: there is no significant difference
between the 3 groups at a risk of 5%
The H value approximately follows the chi-square distribution with DF = number of groups – 1
α = 0.05, DF = 2
Chi-square = 5.99 > 2.58
Conclusion: accept H0
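For DF = 2 the chi-square limit does not need a table: the chi-square distribution with 2 degrees of freedom is an exponential distribution with mean 2, so the limit is −2 ln(α) and the P value is exp(−H/2). The sketch below uses only this special case; it does not generalise to other DF.

```python
import math

alpha = 0.05
h = 2.58  # Kruskal-Wallis statistic of the liver example

# Valid for DF = 2 only (chi-square with 2 DF is exponential with mean 2).
critical_value = -2 * math.log(alpha)
p_value = math.exp(-h / 2)
accept_h0 = h < critical_value

print(f"chi-square limit = {critical_value:.2f}")
print(f"P = {p_value:.2f}")
print("accept H0" if accept_h0 else "refuse H0")
```

Comparing H with the limit and comparing P with α are two equivalent ways of reaching the same decision.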
Is there any difference between groups A, B and C at a risk of 5%?
$$h = \frac{12}{n(n+1)} \sum_i \frac{r_i^2}{n_i} - 3(n+1)$$
Group A Group B Group C
27 20 28
23 8 27
14 14 3
18 28 23
7 21 28
9 22 6
Sums of ranks: RA = 49.5, RB = 57.5, RC = 64
H = 0.62 < Hcrit = 5.8
P ≈ 0.73 > 0.05, so H0 is accepted
Check the normality of the data
Simple check based on a histogram
Satisfactory if the data are roughly symmetric
No clear evidence of asymmetry
Difficult to determine whether the data are
normal or not
Information on the same measurements
from a previous, larger-scale study may help
Clear asymmetry even
in a small sample
Data not normal
If the non-normality is due to an outlier, the outlier should be removed
If normality is doubtful, a non-parametric test is safe.
Nominal scales are used for labeling variables, without any quantitative
value. “Nominal” scales could simply be called “labels.”
None of them have any numerical significance
Ordinal scales: the order of the values is important and significant, but the
differences between the values are not really known.
Typically measures of non-numeric concepts like satisfaction, happiness,
discomfort, etc.
Interval scales are numeric scales in which we know both the order and the exact
differences between the values
The classic example of an interval scale is Celsius temperature because
the difference between each value is the same.
For example, the difference between 60 and 50 degrees is a measurable
10 degrees, as is the difference between 80 and 70 degrees.
Other examples: pH, SAT score (200-800), credit score (300-850)
Problem: interval scales don’t have a “true zero.” For example, there is no such thing as
“no temperature,” at least not with Celsius. Zero doesn’t mean the absence
of value, but is another number used on the scale, like 0 degrees Celsius.
A ratio variable has all the properties of an interval variable, and also has a clear
definition of 0.0. When the variable equals 0.0, there is none of that variable.
Examples:
enzyme activity, dose amount, reaction rate, flow rate, concentration,
pulse, weight, length, temperature in Kelvin (0.0 Kelvin really does mean
“no heat”), survival time.
A shoe company wants to know if three groups of workers have different salaries:
Women Men Minorities
23 45 20
41 55 30
54 60 34
66 70 40
90 72 44
Sums of ranks: Women 44, Men 56, Minorities 20
H = 6.72
H
2. In a manufacturing unit, four teams of operators were randomly selected and sent to four
different facilities for machining techniques training. After the training, the supervisor
conducted the exam and recorded the test scores. At the 95% confidence level, are the scores
the same in all four facilities?
H
[Link] in setosa and virginica species
Are they the same or different? α = 0.05
5.1 5.8
4.7 6.3
5 7.6
4.6 7.3
4.4 7.2
5.4 6.4
4.8 5.7
5.8 6.4
5.4 7.7
5.7 6
5.4 5.6
4.6 6.3
4.8 7.2
5 6.1
5.2 7.2
4.8 7.9
5.2 6.3
4.9 7.7
5.5 6.4
4.4 6.9
5 6.9
4.4 6.8
T
The students are randomly assigned to use one of three studying techniques for the next three
weeks to prepare for an exam. At the end of the three weeks, all of the students take the same
test.
The test scores for the students are shown below:
F stat = 1.91
A
Is there any difference in the results of treatment A and B? α = 5%
U stat A = Sum of rank A – n(n+1)/2
U stat B = Sum of rank B – n(n+1)/2
U stat = the smaller of U stat A and U stat B
U
Researchers want to know if a fuel treatment leads to a change in the average miles
per gallon of a car. To test this, they conduct an experiment in which they measure the
miles per gallon of 12 cars with the fuel treatment and 12 cars without it.
The end
Thank you for your attention