ANALYSIS OF VARIANCE
ANOVA
▪ Student’s t-test is used to test whether there is any significant difference (or the variation) in the means of two
populations.
▪ This test fails when we consider the means of three or more populations.
▪ The equality of several population means can be tested by comparing the sample variances using the
F-distribution, i.e., by analysis of variance.
▪ The systematic procedure of this statistical technique was first developed by Prof. R.A. Fisher, a statistician
who made extensive use of this technique in his agricultural experiments. Now it is widely used in
business and the physical sciences.
▪ For example, the differences in various types of drugs manufactured by a company to cure a particular disease
may be studied through this statistical technique.
The technique of analysis of variance is referred to as ANOVA. The technique of “ANOVA” is to split the
variation into its various components. Usually the variance is divided into the following two parts
1. Variance between the samples
2. Variance within the samples
▪ Assumptions of ANOVA
▪ The analysis of variance is based on the following assumptions
1. The samples are independently drawn from the populations
2. Populations from which the samples are selected are normally distributed
3. Each of the populations has the same variance.
▪ In the investigation of ANOVA, if we consider the influence of any one factor, then it is called one-
way classification. For example, the yields of several plots of land may be classified according to
one or more types of fertilizers.
▪ The techniques for ANOVA one-way classification model are:
i) Direct Method
ii) Short-Cut Method
▪ Null Hypothesis H0: μ1= μ2= . . . = μk where μ1, μ2, . . . , μk are the arithmetic means of k populations from which k
samples are drawn at random.
▪ Alternative Hypothesis H1: not all μi (i = 1, 2, . . . , k) are equal, i.e., at least one mean differs from the others.
The steps in carrying out the analysis are given below.
a) Calculation of Variance Between the Samples
It is based on the sum of squares of the deviations of the means of various samples (or groups) from the
grand mean. To calculate variance between the samples, we proceed as follows:
i) Calculate the sample means $\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots, \bar{X}_k$ of all the k samples
ii) Calculate the mean of the sample means or the grand mean $\bar{\bar{X}}$ by using the formula
▪ $\bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \bar{X}_3 + \cdots + \bar{X}_k}{k}$ or $\bar{\bar{X}} = \frac{T}{N}$
▪ Where T = grand total of all the observations and N = total number of observations in all k samples
$= \sum n_i$. The two forms agree when all samples have the same size; for unequal sample sizes use $T/N$.
iii) Evaluate the deviations of the sample means from the grand mean, i.e., find
$\bar{X}_1 - \bar{\bar{X}}, \; \bar{X}_2 - \bar{\bar{X}}, \; \ldots, \; \bar{X}_k - \bar{\bar{X}}$
▪ SSB (or SSC) = sum of squares of the variations between the samples (or between the columns)
$= \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{\bar{X}})^2$
▪ MSB (or MSC) = mean square (variance) between the samples (or between the columns)
$= \frac{SSB}{v_1}$, where $v_1$ = degrees of freedom = k-1
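The between-samples steps can be sketched in Python as follows. This is a minimal illustration rather than part of the original notes: the function name `between_samples` and the tiny data set are assumptions, chosen so the arithmetic is easy to check by hand.

```python
def between_samples(samples):
    """Return (SSB, MSB) for a list of samples, each a list of numbers."""
    k = len(samples)                                   # number of samples
    n_total = sum(len(s) for s in samples)             # N = sum of n_i
    grand_mean = sum(sum(s) for s in samples) / n_total  # T / N
    # SSB = sum over samples of n_i * (sample mean - grand mean)^2
    ssb = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
    msb = ssb / (k - 1)                                # v1 = k - 1
    return ssb, msb

# Illustrative (made-up) data: sample means are 2 and 5, grand mean is 3.5
ssb, msb = between_samples([[1, 2, 3], [4, 5, 6]])
print(ssb, msb)
```

The grand mean is taken as T/N so the sketch also works when the samples have different sizes.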
b) Calculation of Variance Within the Samples
▪ Variance within the samples is calculated as below
i) Calculate the sample means $\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots, \bar{X}_k$ of all the k samples
ii) Calculate the deviations of the various items of the k samples from the mean values of the respective samples
iii) Square these deviations and obtain their total
▪ Thus SSW (or SSE) = sum of squares of the variations within the samples
$= \sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2 + \cdots + \sum (X_k - \bar{X}_k)^2$
▪ MSW (or MSE) = mean square (variance) within the samples
$= \frac{SSW}{v_2}$, where $v_2$ = degrees of freedom = N - k
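A matching sketch for the within-samples steps, again only an illustration: the name `within_samples` and the small data set are assumptions, not taken from the notes.

```python
def within_samples(samples):
    """Return (SSW, MSW) for a list of samples, each a list of numbers."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    ssw = 0.0
    for s in samples:
        mean = sum(s) / len(s)                    # mean of this sample
        ssw += sum((x - mean) ** 2 for x in s)    # squared deviations from it
    msw = ssw / (n_total - k)                     # v2 = N - k
    return ssw, msw

# Illustrative (made-up) data: each sample contributes 1 + 0 + 1 = 2
ssw, msw = within_samples([[1, 2, 3], [4, 5, 6]])
print(ssw, msw)
```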
c) Calculation of the Test Statistic F
Assuming that H0 is true, the test statistic
▪ $F = \frac{MSB}{MSW}$ (or $\frac{MSC}{MSE}$) $= \frac{\text{Variance between the samples}}{\text{Variance within the samples}}$
▪ follows the F-distribution with degrees of freedom v1 = k-1 and v2 = N-k
▪ In general MSB > MSW. But if MSB < MSW, we shall take the inverse ratio, i.e., $F = \frac{MSW}{MSB}$, so that the
variance ratio F is the ratio of the greater variance to the smaller variance. In that case the degrees
of freedom are interchanged to (N-k, k-1).
Decision:
▪ The computed value of F is compared with the critical (tabulated) value of F for (k-1, N-k) d.f. at the
α% level of significance. If the calculated value > the tabulated value, reject H0 and conclude that the
population means are not all equal. Otherwise, accept H0 and conclude that the population means are equal.
Source of variation | Sum of Squares | Degrees of Freedom | Mean Squares | Test statistic
Between the samples | SSB | k-1 | MSB = SSB/(k-1) | F = MSB/MSW
Within the samples | SSW | N-k | MSW = SSW/(N-k) |
Total | SST | N-1 | |
▪ SST = total sum of squares = SSB + SSW
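The Total row rests on an exact split of the total sum of squares into the between and within parts. A short derivation, writing $X_{ij}$ for the j-th observation of sample i (a notation assumed here for convenience):

```latex
\begin{aligned}
SST &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(X_{ij} - \bar{\bar{X}}\right)^2
     = \sum_{i}\sum_{j} \left[(X_{ij} - \bar{X}_i) + (\bar{X}_i - \bar{\bar{X}})\right]^2 \\
    &= \sum_{i}\sum_{j} (X_{ij} - \bar{X}_i)^2
     + \sum_{i} n_i (\bar{X}_i - \bar{\bar{X}})^2
     + 2\sum_{i} (\bar{X}_i - \bar{\bar{X}}) \sum_{j} (X_{ij} - \bar{X}_i) \\
    &= SSW + SSB + 0
\end{aligned}
```

The cross term vanishes because the deviations of a sample from its own mean sum to zero, $\sum_j (X_{ij} - \bar{X}_i) = 0$. The degrees of freedom add up in the same way: (k-1) + (N-k) = N-1.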
▪ Three different machines are used for a production. On the basis of the outputs, test whether the
machines are equally effective.
OUTPUTS
Machine 1 Machine 2 Machine 3
10 9 20
5 7 16
11 5 10
10 6 4
▪ Null Hypothesis H0: μ1 = μ2 = μ3, i.e., the machines are equally effective.
▪ Alternative Hypothesis H1: μ1, μ2, μ3 are not all equal, i.e., the
machines are not equally effective.
The calculations for sample means, the variance between and within the samples are shown below.
Calculations for sample means
Machine 1 | Machine 2 | Machine 3
10 | 9 | 20
5 | 7 | 16
11 | 5 | 10
10 | 6 | 4
ΣX1 = 36 | ΣX2 = 27 | ΣX3 = 50
▪ $\bar{X}_1 = \frac{\sum X_1}{n_1} = \frac{36}{4} = 9$, $\bar{X}_2 = \frac{\sum X_2}{n_2} = \frac{27}{4} = 6.75$, $\bar{X}_3 = \frac{\sum X_3}{n_3} = \frac{50}{4} = 12.5$
▪ Grand Mean, $\bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \bar{X}_3}{k} = \frac{9 + 6.75 + 12.5}{3} = 9.42$
Now SSB = Sum of squares between the samples
$= \sum n_i (\bar{X}_i - \bar{\bar{X}})^2$
$= 4(9 - 9.42)^2 + 4(6.75 - 9.42)^2 + 4(12.5 - 9.42)^2$
= 67.17
And v1 = degrees of freedom = k-1 = 3-1 = 2
MSB (or MSC) = mean square (variance) between the samples
$= \frac{SSB}{v_1} = \frac{67.17}{2} \approx 33.58$
Calculations for variance within the samples
X1 | $(X_1 - \bar{X}_1)^2$ | X2 | $(X_2 - \bar{X}_2)^2$ | X3 | $(X_3 - \bar{X}_3)^2$
10 | 1 | 9 | 5.0625 | 20 | 56.25
5 | 16 | 7 | 0.0625 | 16 | 12.25
11 | 4 | 5 | 3.0625 | 10 | 6.25
10 | 1 | 6 | 0.5625 | 4 | 72.25
Total | 22 | | 8.75 | | 147
SSW (or SSE) = sum of squares of the variations within the samples
$= \sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2 + \sum (X_3 - \bar{X}_3)^2$
= 22 + 8.75 + 147 = 177.75
v2 = degrees of freedom = N-k = 12-3 = 9
MSW = mean square within the samples
$= \frac{SSW}{v_2} = \frac{177.75}{9} = 19.75$
ANOVA table:
Source of variation | Sum of Squares | Degrees of Freedom | Mean Squares | Test statistic
Between the samples | 67.17 | 2 | 33.58 | F = 33.58/19.75 = 1.70
Within the samples | 177.75 | 9 | 19.75 |
Total | 244.92 | 11 | |
The tabulated value of F for v1=2 and v2=9 at 5% level is 4.26. We see that the calculated value
1.70 of F is less than the tabulated value 4.26. Hence, we accept the Null Hypothesis at 5% level and
conclude that the three machines are equally effective.
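The direct method for the machine data can be checked with a short Python sketch. The function name `one_way_anova` and the dictionary layout are assumptions made for illustration; the critical value 4.26 is the tabulated figure quoted in the text, not something the code computes. All quantities are carried at full precision from the raw outputs, without intermediate rounding.

```python
def one_way_anova(samples):
    """Direct-method one-way ANOVA; returns the main table entries."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    grand_mean = sum(sum(s) for s in samples) / n_total
    ssb = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ssw = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    msb = ssb / (k - 1)          # v1 = k - 1
    msw = ssw / (n_total - k)    # v2 = N - k
    return {"SSB": ssb, "SSW": ssw, "SST": ssb + ssw,
            "MSB": msb, "MSW": msw, "F": msb / msw}

machines = [[10, 5, 11, 10], [9, 7, 5, 6], [20, 16, 10, 4]]
result = one_way_anova(machines)

F_TABLE = 4.26                     # tabulated F(2, 9) at the 5% level, from the text
reject_h0 = result["F"] > F_TABLE  # decision rule: reject H0 if F exceeds the table value

for name, value in result.items():
    print(f"{name} = {value:.2f}")
print("Reject H0:", reject_h0)
```

The sketch always forms MSB/MSW and does not implement the inverse-ratio convention, which is not needed here since MSB > MSW.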
The various steps in calculating the F-ratio by the short-cut method are given below.
i) Calculate the total T of all the observations in all k samples.
Thus $T = \sum X_1 + \sum X_2 + \cdots + \sum X_k$
ii) Calculate the correction factor $\frac{T^2}{N}$, where N is the number of items in all samples.
iii) Compute SST = total sum of the squares of deviations
$= \sum X_1^2 + \sum X_2^2 + \cdots + \sum X_k^2 - \frac{T^2}{N}$
iv) Find out the sum of squares between the samples (SSB) as follows:
a) Find the square of the total of each sample and divide each squared value by the number of
values of the corresponding sample and then calculate the total of all the results thus obtained.
b) Subtract the correction factor from (a)
Thus, $SSB = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \cdots + \frac{(\sum X_k)^2}{n_k} - \frac{T^2}{N}$
v) Calculate SSW, i.e., the sum of the squares within the samples, by subtracting SSB from SST.
Thus, SSW = SST - SSB
Now proceed as in the direct method to obtain MSB, MSW and F to arrive at the final decision.
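The short-cut formulas give the same sums of squares as the direct method. A brief derivation, using the grand mean $\bar{\bar{X}} = T/N$ and writing $T_i = \sum X_i$ for the total of sample i (the symbol $T_i$ is introduced here only for this derivation):

```latex
\begin{aligned}
SST &= \sum (X - \bar{\bar{X}})^2
     = \sum X^2 - 2\bar{\bar{X}} \sum X + N \bar{\bar{X}}^2
     = \sum X^2 - \frac{2T^2}{N} + \frac{T^2}{N}
     = \sum X^2 - \frac{T^2}{N} \\
SSB &= \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{\bar{X}})^2
     = \sum_{i} n_i \bar{X}_i^2 - 2\bar{\bar{X}} \sum_{i} n_i \bar{X}_i + N \bar{\bar{X}}^2
     = \sum_{i} \frac{T_i^2}{n_i} - \frac{T^2}{N}
\end{aligned}
```

Here $\sum X^2$ runs over all N observations, and the last step for SSB uses $n_i \bar{X}_i = T_i$ and $\sum_i T_i = T$.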
EXAMPLE:
▪ Three samples, each of size 5, were drawn from the uncorrelated normal populations with
equal variances. Test the hypothesis that the population means are equal at 5% level.
Sample 1 10 12 9 16 13
Sample 2 9 7 12 11 11
Sample 3 14 11 15 14 16
▪ Let μ1, μ2, μ3 be the means of the three populations.
▪ Null Hypothesis be H0 : μ1= μ2= μ3
▪ Alternative Hypothesis H1: μ1, μ2, μ3 are not all equal
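Calculations for short-cut method
X1 | $X_1^2$ | X2 | $X_2^2$ | X3 | $X_3^2$
10 | 100 | 9 | 81 | 14 | 196
12 | 144 | 7 | 49 | 11 | 121
9 | 81 | 12 | 144 | 15 | 225
16 | 256 | 11 | 121 | 14 | 196
13 | 169 | 11 | 121 | 16 | 256
$\sum X_1 = 60$ | $\sum X_1^2 = 750$ | $\sum X_2 = 50$ | $\sum X_2^2 = 516$ | $\sum X_3 = 70$ | $\sum X_3^2 = 994$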
▪ T = sum of all the observations = 60 + 50 + 70 = 180
▪ Correction factor $= \frac{T^2}{N} = \frac{180^2}{15} = 2160$
▪ SST = the total sum of squares
$= \sum X_1^2 + \sum X_2^2 + \sum X_3^2 - \frac{T^2}{N} = 750 + 516 + 994 - 2160 = 100$
▪ SSB = sum of squares between the samples
$= \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} - \frac{T^2}{N} = \frac{60^2}{5} + \frac{50^2}{5} + \frac{70^2}{5} - 2160 = 40$
v1 = k-1 = 3-1 = 2
MSB = mean square between the samples
$= \frac{SSB}{v_1} = \frac{40}{2} = 20$
▪ SSW = sum of squares within the samples
= SST - SSB = 100 - 40 = 60
v2 = N-k = 15-3 = 12
▪ MSW = mean square within the samples
$= \frac{SSW}{v_2} = \frac{60}{12} = 5$
▪ ANOVA table
Source of variation | Sum of Squares | Degrees of Freedom | Mean Squares | Test statistic
Between the samples | 40 | 2 | 20 | F = 20/5 = 4
Within the samples | 60 | 12 | 5 |
Total | 100 | 14 | |
▪ The tabulated value of F for v1=2 and v2=12 at 5% level is 3.89. We see that the calculated value 4 of F
is greater than the tabulated value 3.89. Hence, we reject the Null Hypothesis at 5% level and
conclude that the population means are not equal.
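The short-cut steps can be sketched in Python and applied to the three samples of this example. The function name `anova_shortcut` and the returned dictionary are assumptions for illustration only.

```python
def anova_shortcut(samples):
    """One-way ANOVA by the short-cut method (correction-factor form)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    total = sum(sum(s) for s in samples)                   # T
    cf = total ** 2 / n_total                              # correction factor T^2 / N
    sst = sum(x * x for s in samples for x in s) - cf      # sum of X^2 minus CF
    ssb = sum(sum(s) ** 2 / len(s) for s in samples) - cf  # sum of (sample total)^2 / n_i minus CF
    ssw = sst - ssb
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return {"CF": cf, "SST": sst, "SSB": ssb, "SSW": ssw,
            "MSB": msb, "MSW": msw, "F": msb / msw}

samples = [[10, 12, 9, 16, 13],
           [9, 7, 12, 11, 11],
           [14, 11, 15, 14, 16]]
table = anova_shortcut(samples)
print(table)
```

Because SSW is obtained by subtraction, any slip in SST or SSB carries straight into it, so it can be worth cross-checking SSW with the direct method on small data sets.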
▪ A manager of a merchanting firm wishes to test whether its three
salesmen A, B, C tend to make sales of the same size or whether they
differ in their selling abilities. During a week there were 14 sales
calls; A made 5 calls, B made 4 calls, C made 5 calls. The weekly sales
records of the three salesmen are as follows:
A 500 400 700 800 600
B 300 700 400 600 -
C 500 300 500 400 300
Perform the analysis of variance and draw your conclusion.
▪ In two-way classification, observations are classified according to two different factors. For
example, fertilizers may be tried on different soil textures. The table dealing with data on such
grouping has a number of columns and rows.
▪ In this case the total variance or the sum of squares of variation consists of
three parts:
i) SSC i.e., the sum of squares (or variance) between the columns ( or due to
columns)
ii) SSR i.e., the sum of squares (or variance) between the rows ( or due to rows)
iii) SSE i.e., the sum of squares for the residuals or residual variance (other sum of
squares due to errors). Thus
▪ SST = SSC + SSR +SSE
▪ If c be the number of columns and r the number of rows, then
▪ The total number of degrees of freedom = cr-1
▪ Degrees of freedom between columns = c-1
▪ Degrees of freedom between rows = r-1
▪ Degrees of freedom for residuals = (cr-1) - (c-1) - (r-1)
= (c-1)(r-1)
Source of variation | Sum of Squares | Degrees of Freedom | Mean Squares | Test statistic
Between columns | SSC | c-1 | MSC = SSC/(c-1) | FC = MSC/MSE
Between rows | SSR | r-1 | MSR = SSR/(r-1) | FR = MSR/MSE
Residual | SSE | (c-1)(r-1) | MSE = SSE/((c-1)(r-1)) |
Total | SST | N-1 = cr-1 | |
Note that if MSE > MSC, then we shall take FC = MSE/MSC.
Similarly, if MSE > MSR, then we shall take FR = MSE/MSR.
▪ If the calculated value of FC > the table value of F at 5% level for c-1 and (c-1)(r-1)
d.f., we reject H0 and conclude that the difference between the column values is
significant.
▪ If the calculated value of FR > the table value of F at 5% level for r-1 and
(c-1)(r-1) d.f., we reject H0 and conclude that the difference between the row values
is significant.
▪ Otherwise we do not reject H0.
▪ A farmer applies three types of fertilizers on 4 separate plots. The figures
on yield per acre are tabulated below.
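Yield per acre by fertilizer (rows) and plot (columns)
Fertilizers | Plot A | Plot B | Plot C | Plot D | Total
Nitrogen | 6 | 4 | 8 | 6 | 24
Potash | 7 | 6 | 6 | 9 | 28
Phosphates | 8 | 5 | 10 | 9 | 32
Total | 21 | 15 | 24 | 24 | 84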
Find out if the plots are materially different in fertility, as also, if three
fertilizers make any material difference in yields.
▪ The Null Hypothesis be H0: there is no significant difference i.e., plots are equally fertile and
fertilizers are equally effective.
▪ Let us first determine the correction factor,
▪ $C = \frac{T^2}{rc} = \frac{84^2}{12} = 588$
▪ SSC = sum of squares between columns $= \sum \frac{(\text{column sum})^2}{r} - C$
$= \frac{(\sum X_1)^2}{r} + \frac{(\sum X_2)^2}{r} + \frac{(\sum X_3)^2}{r} + \frac{(\sum X_4)^2}{r} - C$
$= \frac{21^2}{3} + \frac{15^2}{3} + \frac{24^2}{3} + \frac{24^2}{3} - 588$
$= \frac{1818}{3} - 588 = 606 - 588 = 18$, and d.f. = c-1 = 4-1 = 3
▪ SSR = sum of squares between rows $= \sum \frac{(\text{row sum})^2}{c} - C$
$= \frac{24^2}{4} + \frac{28^2}{4} + \frac{32^2}{4} - 588$
$= \frac{2384}{4} - 588 = 596 - 588 = 8$
and d.f. = r-1 = 3-1 = 2
▪ SST = total sum of squares $= \sum \sum x_{ij}^2 - C$
$= [6^2 + 7^2 + 8^2 + 4^2 + 6^2 + 5^2 + 8^2 + 6^2 + 10^2 + 6^2 + 9^2 + 9^2] - 588$
= [36 + 49 + 64 + 16 + 36 + 25 + 64 + 36 + 100 + 36 + 81 + 81] - 588
= 624 - 588 = 36
and d.f. = N-1 = 12-1 = 11
▪ SSE = the error sum of squares
= SST - (SSC + SSR) = 36 - (18 + 8) = 10
and d.f. = (c-1)(r-1) = (4-1)(3-1) = 6
Two-way ANOVA table for the given data is given below
▪ ANOVA Table
Source of variation | Sum of Squares | Degrees of Freedom | Mean Squares | Test statistic
Between columns | 18 | 3 | MSC = 18/3 = 6 | FC = 6/1.667 = 3.6
Between rows | 8 | 2 | MSR = 8/2 = 4 | FR = 4/1.667 = 2.4
Residual | 10 | 6 | MSE = 10/6 = 1.667 |
Total | 36 | 11 | |
▪ The table value of F for (3,6) degrees of freedom at 5% level is 4.76. Since
the calculated value FC = 3.6 < the tabulated value at 5% level, we accept H0
and conclude that the plots do not show any significant difference in fertility.
▪ The table value of F for (2,6) degrees of freedom at 5% level is 5.14. Since
the calculated value FR = 2.4 < the tabulated value at 5% level, we accept H0
and conclude that the fertilizers do not show any significant difference and whatever
difference exists is due to sampling error.
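The two-way computation for the fertilizer data can be sketched in Python. The function name `two_way_anova` and the layout of the result are assumptions for illustration; rows of the input are fertilizers and columns are plots, with one observation per cell.

```python
def two_way_anova(table):
    """Two-way ANOVA (one observation per cell) via the correction-factor method."""
    r, c = len(table), len(table[0])                  # rows, columns
    total = sum(sum(row) for row in table)            # T
    cf = total ** 2 / (r * c)                         # correction factor C = T^2 / (rc)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(row[j] for row in table) for j in range(c)]
    ssc = sum(s ** 2 for s in col_sums) / r - cf      # each column holds r values
    ssr = sum(s ** 2 for s in row_sums) / c - cf      # each row holds c values
    sst = sum(x * x for row in table for x in row) - cf
    sse = sst - ssc - ssr
    msc, msr = ssc / (c - 1), ssr / (r - 1)
    mse = sse / ((c - 1) * (r - 1))
    return {"C": cf, "SSC": ssc, "SSR": ssr, "SST": sst, "SSE": sse,
            "FC": msc / mse, "FR": msr / mse}

yields = [[6, 4, 8, 6],     # Nitrogen
          [7, 6, 6, 9],     # Potash
          [8, 5, 10, 9]]    # Phosphates
res = two_way_anova(yields)
for name, value in res.items():
    print(f"{name} = {value:.3f}")
```

The sketch always divides by MSE and leaves the comparison with the tabulated F values to the reader, as in the worked solution.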
▪ To study the performance of three detergents at three different
water temperatures, the following whiteness readings were
obtained with specially designed equipment.
Water temperature | Detergent A | Detergent B | Detergent C
Cold water | 57 | 55 | 67
Warm water | 49 | 52 | 68
Hot water | 54 | 46 | 58
Perform a two-way ANOVA using the 5% level of significance.