Unit-I-Correlation and Regression
ΣX =    ΣY =    ΣX² =    ΣY² =    ΣXY =
Therefore the correlation between 𝑋 and 𝑌 is given by
$$r = r(X, Y) = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
where D is the difference between R1 and R2. Here R1 and R2 are the ranks given by two judges to the same item.
Properties of R:-
1) −1 ≤ R ≤ +1.
2) If R is positive, the two judges tend to rank the items in the same order, that is, their preferences agree.
3) If R is negative, the two judges tend to rank the items in opposite order, that is, their preferences disagree.
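The rank-correlation formula itself is not visible in this part of the notes. The sketch below assumes the standard Spearman form R = 1 − 6ΣD²/(n(n² − 1)), which holds when there are no tied ranks. The function name and the two sets of ranks are hypothetical and chosen only to illustrate properties 2) and 3).

```python
def spearman_rank_correlation(rank1, rank2):
    """Spearman's R = 1 - 6*sum(D^2) / (n*(n^2 - 1)); assumes no tied ranks."""
    if len(rank1) != len(rank2):
        raise ValueError("both judges must rank the same number of items")
    n = len(rank1)
    # D is the difference between the two ranks given to the same item.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Hypothetical ranks for five items (not taken from the notes).
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]

r_agree = spearman_rank_correlation(judge_1, judge_2)        # mostly the same order
r_oppose = spearman_rank_correlation(judge_1, judge_1[::-1])  # exactly reversed order
print(r_agree, r_oppose)
```

A positive result corresponds to judges who largely agree, and a fully reversed ranking gives the lower bound of property 1).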
9) If X and Y are two random variables and a, b, c, d are any numbers with a ≠ 0, c ≠ 0, then
$$r(aX + b,\; cY + d) = \frac{ac}{|ac|}\, r(X, Y).$$
-:Examples:-
1) Calculate the correlation coefficient for the following heights (in inches) of
fathers (X) and their sons (Y)
X 1 2 3 4 5
Y 1 2 4 5 7
$$r = \frac{360 - 285}{\sqrt{(275 - 225)(475 - 361)}} = \frac{75}{\sqrt{50 \times 114}} = \frac{75}{\sqrt{5700}} = \frac{75}{75.498} = 0.99$$
Thus r = 0.99.
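As a check on the arithmetic, the sums-based formula for r can be evaluated directly. This is a minimal sketch; the function name `pearson_r` is illustrative and not part of the notes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient from the raw sums, as in the formula for r."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


X = [1, 2, 3, 4, 5]
Y = [1, 2, 4, 5, 7]
r = pearson_r(X, Y)
print(round(r, 2))  # 0.99
```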
2)Calculate the correlation coefficient between 𝑋 and 𝑌.
X 10 20 30 40 50 60 70
Y 12 22 32 42 52 62 72
Solution: Define $U = \frac{X - 10}{10}$, $V = \frac{Y - 12}{10}$. Draw the following table.
X 10 20 30 40 50 60 70
Y 12 22 32 42 52 62 72
U 0 1 2 3 4 5 6 ΣU = 21
V 0 1 2 3 4 5 6 ΣV = 21
U² 0 1 4 9 16 25 36 ΣU² = 91
V² 0 1 4 9 16 25 36 ΣV² = 91
UV 0 1 4 9 16 25 36 ΣUV = 91
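Therefore the correlation between U and V is given by
$$r_{UV} = \frac{n\sum UV - \sum U \sum V}{\sqrt{\left[n\sum U^2 - \left(\sum U\right)^2\right]\left[n\sum V^2 - \left(\sum V\right)^2\right]}} = \frac{7 \times 91 - 21 \times 21}{\sqrt{\left(7 \times 91 - 21^2\right)\left(7 \times 91 - 21^2\right)}}$$
$$r_{UV} = \frac{637 - 441}{\sqrt{(637 - 441)(637 - 441)}} = \frac{196}{\sqrt{196 \times 196}} = \frac{196}{196} = 1$$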
Thus $r_{UV} = 1$. Here $U = \frac{X - 10}{10}$, $V = \frac{Y - 12}{10}$, with h = 10 > 0 and k = 10 > 0. Thus, by the change of origin and scale property, we have
$$r_{XY} = r_{UV} = 1$$
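The change of origin and scale property can be checked numerically on this example. The sketch below computes r for the original and the transformed variables. The sign-flipped series is an extra illustration (not in the notes) of what happens when one scale factor is negative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient from the raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


X = [10, 20, 30, 40, 50, 60, 70]
Y = [12, 22, 32, 42, 52, 62, 72]
U = [(x - 10) / 10 for x in X]  # change of origin 10, scale h = 10
V = [(y - 12) / 10 for y in Y]  # change of origin 12, scale k = 10

r_xy = pearson_r(X, Y)
r_uv = pearson_r(U, V)
# Illustration only: a negative scale factor on one variable flips the sign of r.
r_flipped = pearson_r([-u for u in U], V)
print(r_xy, r_uv, r_flipped)
```

Working with U and V keeps the hand arithmetic small, while the property guarantees that the result applies to X and Y as long as both scale factors have the same sign.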
Line of Regression:-
If we plot two data sets X and Y, each with n observations, on a scatter diagram, we observe that when the variables are related the points cluster around some curve. If this curve is a straight line, it is called a line of regression, and we say that there is linear regression between the two variables X and Y.
We can estimate the value of Y (dependent variable) for a given value of X (independent variable). Similarly, we can estimate the value of X (dependent variable) for a given value of Y (independent variable). There are therefore two regression lines: the regression line of Y on X, used to estimate Y for a given value of X, and the regression line of X on Y, used to estimate X for a given value of Y.
Regression Coefficients:-
Let $(x_i, y_i)$, i = 1, 2, 3, …, n be a bivariate distribution.
1) Let Y be the dependent variable and X the independent variable. Then the line of regression of Y on X is given by Y = aX + b, where a and b are constants and a ≠ 0. Here a is the slope of the line of regression of Y on X, and this slope is called the regression coefficient of Y on X. It represents the change in the value of the dependent variable Y corresponding to a unit change in the value of the independent variable X. It is denoted by $b_{YX}$ and is given by
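$$b_{YX} = \text{regression coefficient of } Y \text{ on } X = r\,\frac{\sigma_Y}{\sigma_X} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$
where
r = correlation coefficient between X and Y,
$\sigma_X$ = standard deviation of X; $\sigma_Y$ = standard deviation of Y.
Then the regression line of Y on X is given by
$$Y - \bar{Y} = b_{YX}\,(X - \bar{X}), \quad \text{where } \bar{X} = \frac{\sum X}{n},\; \bar{Y} = \frac{\sum Y}{n}.$$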
Method:
1)Draw the following table
X    Y    X²    XY
x1   y1   x1²   x1y1
x2   y2   x2²   x2y2
x3   y3   x3²   x3y3
x4   y4   x4²   x4y4
x5   y5   x5²   x5y5
x6   y6   x6²   x6y6
⋮    ⋮    ⋮     ⋮
xn   yn   xn²   xnyn
ΣX =    ΣY =    ΣX² =    ΣXY =
2) Find the regression coefficient $b_{YX}$, given by
$$b_{YX} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$
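Y 1 2 4 5 7 ΣY = 19
X² 1 4 9 16 25 ΣX² = 55
XY 1 4 12 20 35 ΣXY = 72
The regression coefficient $b_{YX}$ is given by
$$b_{YX} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{5 \times 72 - 15 \times 19}{5 \times 55 - 15^2} = \frac{360 - 285}{275 - 225} = \frac{75}{50} = \frac{3}{2} = 1.5$$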
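The same computation can be sketched in Python. The data are the fathers-and-sons values used earlier in these notes (X = 1, …, 5 and Y = 1, 2, 4, 5, 7). The function name is illustrative, and the intercept is obtained from the fact that the regression line passes through the point of means.

```python
def regression_y_on_x(xs, ys):
    """Return (b_yx, intercept) of the regression line of Y on X from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b_yx = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    # The line passes through (mean of X, mean of Y): Y - Ybar = b_yx * (X - Xbar).
    intercept = sy / n - b_yx * sx / n
    return b_yx, intercept


X = [1, 2, 3, 4, 5]
Y = [1, 2, 4, 5, 7]
b_yx, intercept = regression_y_on_x(X, Y)
print(b_yx, intercept)
```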
X 1 3 7 9 11
Y 0 4 5 9 10
X 1 3 7 9 11 ΣX = 31
Y 0 4 5 9 10 ΣY = 28
X² 1 9 49 81 121 ΣX² = 261
XY 0 12 35 81 110 ΣXY = 238
The regression coefficient $b_{YX}$ is given by
$$b_{YX} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{5 \times 238 - 31 \times 28}{5 \times 261 - 31^2} = \frac{1190 - 868}{1305 - 961} = \frac{322}{344} = 0.94$$
ΣX =    ΣY =    ΣY² =    ΣXY =
2) Find the regression coefficient $b_{XY}$, given by
$$b_{XY} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - \left(\sum Y\right)^2}$$
Y 1 2 4 5 7 ΣY = 19
Y² 1 4 16 25 49 ΣY² = 95
XY 1 4 12 20 35 ΣXY = 72
The regression coefficient $b_{XY}$ is given by
$$b_{XY} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - \left(\sum Y\right)^2} = \frac{5 \times 72 - 15 \times 19}{5 \times 95 - 19^2} = \frac{360 - 285}{475 - 361} = \frac{75}{114} = 0.66$$
X 1 3 7 9 11
Y 0 4 5 9 10
X 1 3 7 9 11 ΣX = 31
Y 0 4 5 9 10 ΣY = 28
Y² 0 16 25 81 100 ΣY² = 222
XY 0 12 35 81 110 ΣXY = 238
The regression coefficient $b_{XY}$ is given by
$$b_{XY} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - \left(\sum Y\right)^2} = \frac{5 \times 238 - 31 \times 28}{5 \times 222 - 28^2} = \frac{1190 - 868}{1110 - 784} = \frac{322}{326} = 0.99$$
∴ Y − 69 = 0.668 (X − 68)
∴ Y − 69 = 0.668X − 45.424
∴ Y = 69 + 0.668X − 45.424
∴ Y = 0.668X + 23.576 is the regression line of Y on X ….(1)
2) Then the regression line of X on Y is given by
$$X - \bar{X} = b_{XY}\,(Y - \bar{Y}), \quad \text{where } b_{XY} = r\,\frac{\sigma_X}{\sigma_Y} = 0.603 \times \frac{2.12}{2.35} = 0.544$$
∴ X − 68 = 0.544 (Y − 69)
∴ X − 68 = 0.544Y − 37.536
∴ X = 68 + 0.544Y − 37.536
∴ X = 0.544Y + 30.464 is the regression line of X on Y ….(2).
c) To estimate X for Y = 70, put Y = 70 in equation (2). We get
X = 0.544 × 70 + 30.464 = 68.544
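Both regression lines follow from the five summary values used in this example (means 68 and 69, standard deviations 2.12 and 2.35, r = 0.603). The sketch below builds them from those values. The function name is illustrative, and because the code keeps the unrounded slopes, its later decimals can differ slightly from a hand computation that rounds the slope first.

```python
def regression_lines_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Return ((b_yx, intercept), (b_xy, intercept)) for Y on X and X on Y."""
    b_yx = r * sd_y / sd_x
    b_xy = r * sd_x / sd_y
    # Each line passes through the point of means.
    y_on_x = (b_yx, mean_y - b_yx * mean_x)
    x_on_y = (b_xy, mean_x - b_xy * mean_y)
    return y_on_x, x_on_y


(b_yx, a_yx), (b_xy, a_xy) = regression_lines_from_summary(68, 69, 2.12, 2.35, 0.603)

# Estimate X for Y = 70 from the regression line of X on Y.
x_estimate = b_xy * 70 + a_xy
print(round(b_yx, 3), round(b_xy, 3), round(x_estimate, 2))
```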
[Figure: Line of regression of Y on X. Fitted line y = 0.668x + 23.576, R² = 0.3636. Horizontal axis X, vertical axis Y.]
2) In a partially destroyed laboratory record of an analysis of correlation data, the following results are available:
Variance of X = 9. Regression equations: 8X − 10Y + 66 = 0, 40X − 18Y = 214.
Find 1) the mean values of X and Y,
2) the correlation coefficient between X and Y,
3) the standard deviation of Y.
Solution:- Given $\sigma_X^2$ = variance of X = 9, so $\sigma_X = 3$.
Regression equations: 8X − 10Y + 66 = 0, 40X − 18Y = 214.
1) We know that both lines of regression pass through the point $(\bar{X}, \bar{Y})$. Thus
$$8\bar{X} - 10\bar{Y} + 66 = 0 \Rightarrow 8\bar{X} - 10\bar{Y} = -66 \quad \ldots (i)$$
$$\text{and } 40\bar{X} - 18\bar{Y} = 214 \quad \ldots (ii)$$
To solve equations (i) and (ii), compute 5 × eq (i) − eq (ii):
$$\left(40\bar{X} - 50\bar{Y}\right) - \left(40\bar{X} - 18\bar{Y}\right) = -330 - 214$$
$$-32\bar{Y} = -544 \Rightarrow \bar{Y} = \frac{-544}{-32} = 17$$
Putting $\bar{Y} = 17$ in equation (i), we get
$$8\bar{X} - 10 \times 17 = -66 \Rightarrow 8\bar{X} - 170 = -66 \Rightarrow 8\bar{X} = 170 - 66 = 104 \Rightarrow \bar{X} = \frac{104}{8} = 13$$
Thus, mean of X = $\bar{X}$ = 13 and mean of Y = $\bar{Y}$ = 17.
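2) The regression equations can be written as
$$8X - 10Y + 66 = 0 \Rightarrow Y = \frac{8}{10}X + \frac{66}{10}$$
$$\therefore b_{YX} = \text{regression coefficient of } Y \text{ on } X = \frac{8}{10} = \frac{4}{5}$$
$$\text{and } 40X - 18Y = 214 \Rightarrow X = \frac{18}{40}Y + \frac{214}{40}$$
$$\therefore b_{XY} = \text{regression coefficient of } X \text{ on } Y = \frac{18}{40} = \frac{9}{20}.$$
We know that $b_{XY} \times b_{YX} = r^2$.
$$\therefore \frac{4}{5} \times \frac{9}{20} = r^2 \Rightarrow r^2 = \frac{9}{25} \Rightarrow r = \pm\sqrt{\frac{9}{25}} = \pm\frac{3}{5}.$$
Since both regression coefficients are positive, we have $r = \frac{3}{5} = 0.6$.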
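3) We know that $b_{XY} = r\,\frac{\sigma_X}{\sigma_Y}$. Thus
$$\frac{9}{20} = \frac{3}{5} \times \frac{3}{\sigma_Y} \Rightarrow \frac{9}{20} = \frac{9}{5\sigma_Y} \Rightarrow \frac{1}{20} = \frac{1}{5\sigma_Y} \Rightarrow 20 = 5\sigma_Y \Rightarrow \sigma_Y = 4.$$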
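The three steps of this solution can be retraced in a short Python sketch. It assumes, as the solution does, that the first equation is the line of Y on X and the second is the line of X on Y. The variable names are illustrative. If the product of the two slopes came out greater than 1, that assignment would have to be swapped.

```python
from fractions import Fraction
from math import sqrt

# Both lines written as a*X + b*Y = c.
a1, b1, c1 = 8, -10, -66    # 8X - 10Y = -66, taken as the line of Y on X
a2, b2, c2 = 40, -18, 214   # 40X - 18Y = 214, taken as the line of X on Y

# 1) The lines intersect at the point of means (Cramer's rule).
det = a1 * b2 - a2 * b1
mean_x = Fraction(c1 * b2 - c2 * b1, det)
mean_y = Fraction(a1 * c2 - a2 * c1, det)

# 2) Slopes: Y = (-a1/b1) X + ... and X = (-b2/a2) Y + ...
b_yx = Fraction(-a1, b1)
b_xy = Fraction(-b2, a2)
r_squared = b_yx * b_xy
assert r_squared <= 1, "swap the roles of the two lines"
r = sqrt(r_squared)  # positive root, because both slopes are positive

# 3) b_xy = r * sd_x / sd_y, with sd_x = sqrt(9) = 3.
sd_x = 3
sd_y = r * sd_x / float(b_xy)
print(mean_x, mean_y, r, sd_y)
```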
Solution
X 6 2 10 4 8 Total=30
Y 9 11 5 8 7 40
𝑋2 36 4 100 16 64 220
𝑌2 81 121 25 64 49 340
XY 54 22 50 32 56 214
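$$\text{Here } \bar{X} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{30}{5} = 6, \quad \bar{Y} = \frac{\sum_{i=1}^{5} y_i}{5} = \frac{40}{5} = 8$$
$$\sigma_X^2 = \frac{\sum_{i=1}^{5} x_i^2}{5} - \bar{X}^2 = \frac{220}{5} - 6^2 = 44 - 36 = 8$$
$$\sigma_Y^2 = \frac{\sum_{i=1}^{5} y_i^2}{5} - \bar{Y}^2 = \frac{340}{5} - 8^2 = 68 - 64 = 4$$
$$Cov(X, Y) = \frac{\sum_{i=1}^{5} x_i y_i}{5} - \bar{X}\bar{Y} = \frac{214}{5} - 48 = 42.8 - 48 = -5.2$$
$$b_{XY} = \frac{Cov(X, Y)}{\sigma_Y^2} = \frac{-5.2}{4} = -1.3, \quad b_{YX} = \frac{Cov(X, Y)}{\sigma_X^2} = \frac{-5.2}{8} = -0.65$$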
1) Then the regression line of Y on X is given by
$$Y - \bar{Y} = b_{YX}\,(X - \bar{X})$$
∴ Y − 8 = −0.65 (X − 6)
∴ Y − 8 = −0.65X + 3.9
∴ Y = 8 − 0.65X + 3.9
∴ Y = −0.65X + 11.9 is the regression line of Y on X ….(1)
2) Then the regression line of X on Y is given by
$$X - \bar{X} = b_{XY}\,(Y - \bar{Y})$$
∴ X − 6 = −1.3 (Y − 8)
∴ X − 6 = −1.3Y + 10.4
∴ X = 6 − 1.3Y + 10.4
∴ X = −1.3Y + 16.4 is the regression line of X on Y ….(2).
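The covariance route used in this solution can be sketched as follows, dividing by n as the notes do. The function name is illustrative. The product of the two slopes gives r², which can be compared with the R² value reported for the fitted line.

```python
def regression_lines(xs, ys):
    """Return ((b_yx, intercept), (b_xy, intercept)) using variances and covariance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    b_yx = cov_xy / var_x
    b_xy = cov_xy / var_y
    return (b_yx, mean_y - b_yx * mean_x), (b_xy, mean_x - b_xy * mean_y)


X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
(b_yx, a_yx), (b_xy, a_xy) = regression_lines(X, Y)
r_squared = b_yx * b_xy  # product of the two regression coefficients
print(round(b_yx, 2), round(a_yx, 1), round(b_xy, 1), round(a_xy, 1), round(r_squared, 3))
```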
[Figure: Line of regression of Y on X. Fitted line y = −0.65x + 11.9, R² = 0.845. Horizontal axis X, vertical axis Y.]
Standard Error of Estimate or Residual Variance:-
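1) The regression line of Y on X is given by
$$Y - \bar{Y} = r\,\frac{\sigma_Y}{\sigma_X}\,(X - \bar{X}).$$
The standard error of estimate of Y, $S_Y$, is given by
$$S_Y = \sigma_Y\sqrt{1 - r^2}.$$
2) The regression line of X on Y is given by
$$X - \bar{X} = b_{XY}\,(Y - \bar{Y}), \quad \text{where } b_{XY} = r\,\frac{\sigma_X}{\sigma_Y}.$$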
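A minimal Python sketch of the standard error of estimate of Y follows. The inputs σ_Y = 2.35 and r = 0.603 are borrowed from the earlier regression example in these notes purely as an illustration, and the function name is hypothetical.

```python
from math import sqrt


def standard_error_of_estimate(sd, r):
    """S = sd * sqrt(1 - r^2): spread of the observations about the regression line."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    return sd * sqrt(1 - r * r)


s_y = standard_error_of_estimate(2.35, 0.603)
print(round(s_y, 3))

# Limiting cases: a perfect linear relation leaves no residual spread,
# and r = 0 leaves the full standard deviation.
print(standard_error_of_estimate(2.35, 1.0), standard_error_of_estimate(2.35, 0.0))
```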
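$$\therefore R_{1.23}^2 = \frac{\left(\sigma_1^2 - \sigma_{1.23}^2\right)^2}{\sigma_1^2\left(\sigma_1^2 - \sigma_{1.23}^2\right)} = \frac{\sigma_1^2 - \sigma_{1.23}^2}{\sigma_1^2} = 1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}$$
$$\therefore 1 - R_{1.23}^2 = \frac{\sigma_{1.23}^2}{\sigma_1^2}.$$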
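Denote
$$\omega = \begin{vmatrix} 1 & r_{12} & r_{13} \\ r_{21} & 1 & r_{23} \\ r_{31} & r_{32} & 1 \end{vmatrix} = 1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2r_{12}r_{13}r_{23}$$
$$\text{and } \omega_{11} = \begin{vmatrix} 1 & r_{23} \\ r_{23} & 1 \end{vmatrix} = 1 - r_{23}^2. \quad \text{Then } \sigma_{1.23}^2 = \sigma_1^2\,\frac{\omega}{\omega_{11}}.$$
$$\text{Thus } 1 - R_{1.23}^2 = \frac{\sigma_{1.23}^2}{\sigma_1^2} \Rightarrow 1 - R_{1.23}^2 = \frac{\omega}{\omega_{11}}$$
$$\Rightarrow R_{1.23}^2 = 1 - \frac{\omega}{\omega_{11}} = 1 - \frac{1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}$$
$$\Rightarrow R_{1.23}^2 = \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}$$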
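The last step of the derivation can be checked numerically: the determinant form 1 − ω/ω11 and the closed form should agree for any admissible correlations. The three input values below are hypothetical and chosen only for illustration.

```python
def multiple_correlation_squared(r12, r13, r23):
    """Return R^2_{1.23} computed two ways: via determinants and via the closed form."""
    omega = 1 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2 * r12 * r13 * r23
    omega11 = 1 - r23 ** 2
    via_determinants = 1 - omega / omega11
    closed_form = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / omega11
    return via_determinants, closed_form


# Hypothetical correlations (not taken from the notes).
via_det, closed = multiple_correlation_squared(0.6, 0.5, 0.4)
print(via_det, closed)
```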
𝑋2 3 6 10 12
𝑋3 1 3 6 10
Solution:- $\bar{X}_1 = \frac{25}{4} = 6.25$, $\bar{X}_2 = \frac{31}{4} = 7.75$, $\bar{X}_3 = \frac{20}{4} = 5$
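X1    X3    A = X1 − X̄1    B = X3 − X̄3    A²    B²    AB
2    1    −4.25    −4    18.0625    16    17
⋮
Total = 25    20    0    0    42.75    46    44
$$\sigma_1^2 = \frac{\sum_{i=1}^{4}\left(X_{1i} - \bar{X}_1\right)^2}{4} = \frac{42.75}{4} = 10.6875 \Rightarrow \sigma_1 = 3.2692$$
$$\sigma_3^2 = \frac{\sum_{i=1}^{4}\left(X_{3i} - \bar{X}_3\right)^2}{4} = \frac{46}{4} = 11.5 \Rightarrow \sigma_3 = 3.3912$$
$$Cov(X_1, X_3) = \frac{\sum_{i=1}^{4}\left(X_{1i} - \bar{X}_1\right)\left(X_{3i} - \bar{X}_3\right)}{4} = \frac{44}{4} = 11$$
$$r_{13} = r(X_1, X_3) = \frac{Cov(X_1, X_3)}{\sigma_1 \sigma_3} = \frac{11}{3.2692 \times 3.3912} = 0.9922$$
X2    X3    A = X2 − X̄2    B = X3 − X̄3    A²    B²    AB
3    1    −4.75    −4    22.5625    16    19
⋮
Total = 31    20    0    0    48.75    46    46
$$\sigma_2^2 = \frac{\sum_{i=1}^{4}\left(X_{2i} - \bar{X}_2\right)^2}{4} = \frac{48.75}{4} = 12.1875 \Rightarrow \sigma_2 = 3.4911$$
$$\sigma_3^2 = \frac{\sum_{i=1}^{4}\left(X_{3i} - \bar{X}_3\right)^2}{4} = \frac{46}{4} = 11.5 \Rightarrow \sigma_3 = 3.3912$$
$$Cov(X_2, X_3) = \frac{\sum_{i=1}^{4}\left(X_{2i} - \bar{X}_2\right)\left(X_{3i} - \bar{X}_3\right)}{4} = \frac{46}{4} = 11.5$$
$$r_{23} = r(X_2, X_3) = \frac{Cov(X_2, X_3)}{\sigma_2 \sigma_3} = \frac{11.5}{3.4911 \times 3.3912} = 0.9714$$
Take $r_{12} = 0.9693$ (0.969 to three places). Four decimal places are carried here because the differences below are small. With $\omega_{ij}$ denoting the signed cofactor of $r_{ij}$ in $\omega$, let
$$\omega_{11} = \begin{vmatrix} 1 & r_{23} \\ r_{23} & 1 \end{vmatrix} = 1 - r_{23}^2 = 1 - 0.94362 = 0.05638$$
$$\omega_{12} = -\begin{vmatrix} r_{12} & r_{23} \\ r_{13} & 1 \end{vmatrix} = -\left(r_{12} - r_{13}r_{23}\right) = -(0.9693 - 0.96382) = -0.00548$$
$$\omega_{13} = \begin{vmatrix} r_{12} & 1 \\ r_{13} & r_{23} \end{vmatrix} = r_{12}r_{23} - r_{13} = 0.94158 - 0.9922 = -0.05062$$
$$\text{Then } b_{12.3} = -\frac{\sigma_1\,\omega_{12}}{\sigma_2\,\omega_{11}} = \frac{3.2692 \times 0.00548}{3.4911 \times 0.05638} = 0.091$$
$$b_{13.2} = -\frac{\sigma_1\,\omega_{13}}{\sigma_3\,\omega_{11}} = \frac{3.2692 \times 0.05062}{3.3912 \times 0.05638} = 0.866$$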
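Because these coefficients depend on small differences between correlations that are all close to 1, hand rounding can distort them, so a numerical cross-check is useful. The sketch below computes b12.3 and b13.2 from signed cofactors and compares them with the equivalent expressions in variances and covariances. The X1 values are an assumption: they are inferred from the stated total ΣX1 = 25 and the rows that survive in these notes, and the variable names are illustrative.

```python
from math import sqrt

# Assumption: X1 values inferred from the surviving rows and the total 25.
x1 = [2, 5, 7, 11]
x2 = [3, 6, 10, 12]
x3 = [1, 3, 6, 10]


def cov(a, b):
    """Population covariance (divide by n, as in the notes)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)


s1, s2, s3 = sqrt(cov(x1, x1)), sqrt(cov(x2, x2)), sqrt(cov(x3, x3))
r12 = cov(x1, x2) / (s1 * s2)
r13 = cov(x1, x3) / (s1 * s3)
r23 = cov(x2, x3) / (s2 * s3)

# Signed cofactors of the first row of the correlation determinant.
w11 = 1 - r23 ** 2
w12 = -(r12 - r13 * r23)
w13 = r12 * r23 - r13

b12_3 = -(s1 / s2) * (w12 / w11)
b13_2 = -(s1 / s3) * (w13 / w11)

# Cross-check: the same coefficients written with variances and covariances.
den = cov(x2, x2) * cov(x3, x3) - cov(x2, x3) ** 2
b12_3_direct = (cov(x1, x2) * cov(x3, x3) - cov(x1, x3) * cov(x2, x3)) / den
b13_2_direct = (cov(x1, x3) * cov(x2, x2) - cov(x1, x2) * cov(x2, x3)) / den
print(b12_3, b12_3_direct)
print(b13_2, b13_2_direct)
```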
2 3 1 4 9 1 6 2 3
5 6 3 25 36 9 30 15 18
7 10 6 49 100 36 70 42 60