Chapter 17:
Correlation and regression
• Univariate distribution: the distribution of a single variable such as height, weight, marks, profit or wages.
• Bivariate distribution: when data are collected on two variables simultaneously, they are known as bivariate data, and the frequency distribution derived from them is known as a bivariate frequency distribution.
• Correlation analysis and regression analysis are the two analyses made from a multivariate distribution, i.e. a distribution of more than one variable.
• Correlation analysis: helps us find an association, or the lack of one, between two variables x and y.
• Regression analysis: is concerned with predicting the value of the dependent variable corresponding to a known value of the independent variable, on the assumption of a mathematical relationship between the two variables, and also an average relationship between them.
Correlation analysis
While studying two variables at the same time, if a change in one variable is accompanied by a corresponding change in the other variable, either directly or inversely, then the two variables are said to be associated or correlated. Otherwise, the two variables are said to be dissociated, uncorrelated or independent.
• Positive correlation: if two variables move in the same direction, then they are positively correlated.
• Negative correlation: if two variables move in opposite directions, then they are negatively correlated.
Measures of correlation
• Scatter diagram
[Figure: scatter diagrams showing positive correlation, negative correlation and no correlation]
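• Karl Pearson's product moment correlation coefficient
$$r = r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$
where
$$\operatorname{cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n} = \frac{\sum xy}{n} - \bar{x}\,\bar{y}$$
$$\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$$
$$\sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} = \sqrt{\frac{\sum y^2}{n} - \bar{y}^2}$$
Equivalent computational forms:
$$r = \frac{\sum xy - n\,\bar{x}\,\bar{y}}{\sqrt{\sum x^2 - n\,\bar{x}^2}\,\sqrt{\sum y^2 - n\,\bar{y}^2}} = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$
Properties.
• The coefficient of correlation is unit free.
• The coefficient of correlation lies between -1 and 1, inclusive.
• The coefficient of correlation remains invariant under a change of origin and/or scale of the variables under consideration, depending on the sign of the scale factors. If
$$u = \frac{x - a}{b}, \qquad v = \frac{y - c}{d}$$
then
$$r_{xy} = \frac{bd}{|b|\,|d|}\, r_{uv}$$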
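The product-moment formula and the change-of-origin-and-scale property can be checked numerically. The following is a minimal sketch in Python; the data and the constants a, b, c, d are made up for illustration, and the helper name `pearson_r` is not from the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation computed as cov(x, y) / (sd_x * sd_y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    sd_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_y = sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    return cov_xy / (sd_x * sd_y)

# Illustrative data (assumed, not from the text)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_xy = pearson_r(x, y)  # 6 / sqrt(60), about 0.7746

# Change of origin and scale: u = (x - a) / b, v = (y - c) / d
a, b, c, d = 3, -2, 4, 5  # assumed constants; b is negative on purpose
u = [(xi - a) / b for xi in x]
v = [(yi - c) / d for yi in y]
r_uv = pearson_r(u, v)

# Sign factor bd / (|b| |d|): here -1, because b and d have opposite signs
sign = (b * d) / (abs(b) * abs(d))
print(r_xy, sign * r_uv)  # the two values agree up to rounding
```

Because b and d have opposite signs here, r_uv has the opposite sign to r_xy, while its magnitude is unchanged.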
• Spearman's rank correlation coefficient
$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}, \qquad d = x_i - y_i$$
With tied ranks:
$$r = 1 - \frac{6\left[\sum d^2 + \sum \dfrac{t_j^3 - t_j}{12}\right]}{n(n^2 - 1)}$$
t_j represents the j-th tie length, and the summation extends over the lengths of all the ties for both series.
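The rank correlation formula, including the tie correction, can be sketched in Python as follows. The helper names (`average_ranks`, `tie_lengths`, `spearman_r`) and the data are illustrative assumptions. Tied values are given the average of the ranks they occupy, which is the usual convention and is assumed here.

```python
def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def tie_lengths(values):
    """Lengths t_j of every group of tied values in one series."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return [t for t in counts.values() if t > 1]

def spearman_r(x, y):
    """1 - 6 * (sum d^2 + sum (t^3 - t) / 12) / (n (n^2 - 1))."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rx, ry))
    correction = sum(t ** 3 - t for t in tie_lengths(x) + tie_lengths(y)) / 12
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

# No ties: sum d^2 = 4, so r = 1 - 24 / 120 = 0.8
r_no_ties = spearman_r([10, 20, 30, 40, 50], [1, 3, 2, 5, 4])

# One tie of length 2 in x: sum d^2 = 0.5, correction = 0.5, r = 1 - 6 / 60 = 0.9
r_with_tie = spearman_r([10, 20, 20, 40], [1, 2, 3, 4])
```

When there are no ties the correction term is zero, so the same function covers both formulas.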
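• Coefficient of concurrent deviation
$$r_c = \pm\sqrt{\pm\frac{2c - m}{m}}$$
Here c is the number of concurrent deviations (pairs of successive changes in x and y that have the same sign) and m is the number of pairs of deviations compared. If (2c - m) > 0, we take the positive sign both inside and outside the radical sign; if (2c - m) < 0, we take the negative sign both inside and outside the radical sign.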
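The sign rule for the coefficient of concurrent deviation is easier to see in code. The sketch below takes c as the number of pairs of successive changes that share a sign, and m as the number of pairs of changes compared (one fewer than the number of observations). Both readings, the function names and the data are assumptions for illustration. A pair in which either series does not change is counted as non-concurrent, which is also an assumption.

```python
from math import sqrt

def signs_of_change(values):
    """Direction of change between successive observations: +1, -1 or 0."""
    return [(b > a) - (b < a) for a, b in zip(values, values[1:])]

def concurrent_deviation(x, y):
    dx, dy = signs_of_change(x), signs_of_change(y)
    m = len(dx)  # number of pairs of deviations compared
    c = sum(1 for p, q in zip(dx, dy) if p * q > 0)  # changes in the same direction
    k = (2 * c - m) / m
    sign = 1 if k >= 0 else -1
    # The same sign is used inside and outside the radical,
    # so the quantity under the root is never negative.
    return sign * sqrt(sign * k)

# Illustrative data: 4 of the 5 successive changes move together
x = [10, 12, 11, 15, 18, 17]
y = [5, 7, 8, 9, 12, 10]
rc = concurrent_deviation(x, y)  # c = 4, m = 5, so rc = sqrt(0.6)

# Perfectly opposite movement: c = 0, m = 3, so rc = -1
rc_opposite = concurrent_deviation([1, 2, 3, 4], [4, 3, 2, 1])
```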
Coefficient of non-determination = $1 - r^2$
Regression analysis
In regression analysis, we are concerned with the estimation of one variable for a given value of another variable (or for a given set of values of a number of variables) on the basis of an average mathematical relationship between the two variables (or a number of variables).
With two variables x and y, there are two regression lines: if y is influenced by x, i.e. y is dependent on x, we use the regression of y on x; if x is influenced by y, i.e. x is dependent on y, we use the regression of x on y.
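Regression of y on x:
$$y = a + bx$$
$$b_{yx} = r \cdot \frac{\sigma_y}{\sigma_x} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{\operatorname{cov}(x, y)}{\sigma_x^2}$$
$$(y - \bar{y}) = b_{yx}\,(x - \bar{x})$$
Regression of x on y:
$$x = a + by$$
$$b_{xy} = r \cdot \frac{\sigma_x}{\sigma_y} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2} = \frac{\operatorname{cov}(x, y)}{\sigma_y^2}$$
$$(x - \bar{x}) = b_{xy}\,(y - \bar{y})$$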
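As a worked illustration of the two regression lines, the sketch below computes both regression coefficients from the raw sums and then recovers each intercept from the fact that the line passes through the means. The function name `regression_lines` and the data are assumptions for illustration.

```python
def regression_lines(x, y):
    """Return ((a, b_yx), (a', b_xy)) for y = a + b_yx * x and x = a' + b_xy * y."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)

    numerator = n * sxy - sx * sy
    b_yx = numerator / (n * sxx - sx ** 2)  # regression coefficient of y on x
    b_xy = numerator / (n * syy - sy ** 2)  # regression coefficient of x on y

    mean_x, mean_y = sx / n, sy / n
    a_yx = mean_y - b_yx * mean_x  # from (y - mean_y) = b_yx (x - mean_x)
    a_xy = mean_x - b_xy * mean_y  # from (x - mean_x) = b_xy (y - mean_y)
    return (a_yx, b_yx), (a_xy, b_xy)

# Illustrative data (assumed, not from the text)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
(a_yx, b_yx), (a_xy, b_xy) = regression_lines(x, y)
# y on x: y = 2.2 + 0.6 x (approximately, up to floating-point rounding)
# x on y: x = -1 + 1.0 y

# Estimate y for a known x using the y-on-x line
y_at_6 = a_yx + b_yx * 6  # about 5.8
```

The y-on-x line is used to estimate y from x, and the x-on-y line to estimate x from y. The two lines are generally different.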
Properties of regression lines.
• The regression coefficients remain unchanged under a shift of origin but change under a shift of scale. If
$$u = \frac{x - a}{b}, \qquad v = \frac{y - c}{d}$$
then
$$b_{yx} = \frac{d}{b} \cdot b_{vu}, \qquad b_{xy} = \frac{b}{d} \cdot b_{uv}$$
• The two lines of regression intersect at the point $(\bar{x}, \bar{y})$, i.e. at the means of x and y.
• The coefficient of correlation between two variables x and y is the simple geometric mean (GM) of the two regression coefficients. The sign of the correlation coefficient is the common sign of the two regression coefficients.
$$r = \pm\sqrt{b_{yx} \cdot b_{xy}}$$
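The properties of regression lines listed above can be verified numerically. The sketch below checks the scale rule for b_yx, the intersection at the means, and the geometric-mean relation. The helper `slope`, the data and the constants a, b, c, d are illustrative assumptions.

```python
from math import sqrt

def slope(dep, ind):
    """Regression coefficient of dep on ind: cov(ind, dep) / var(ind)."""
    n = len(ind)
    mean_ind, mean_dep = sum(ind) / n, sum(dep) / n
    cov = sum((p - mean_ind) * (q - mean_dep) for p, q in zip(ind, dep)) / n
    var = sum((p - mean_ind) ** 2 for p in ind) / n
    return cov / var

# Illustrative data (assumed, not from the text)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_yx = slope(y, x)  # 0.6
b_xy = slope(x, y)  # 1.0

# Change of origin and scale: u = (x - a) / b, v = (y - c) / d
a, b, c, d = 3, 2, 4, 5  # assumed constants
u = [(xi - a) / b for xi in x]
v = [(yi - c) / d for yi in y]
b_vu = slope(v, u)
b_yx_from_uv = (d / b) * b_vu  # equals b_yx; the origins a and c play no role

# Both lines pass through the means, so they intersect at (mean_x, mean_y)
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
a_yx = mean_y - b_yx * mean_x  # intercept of the y-on-x line
a_xy = mean_x - b_xy * mean_y  # intercept of the x-on-y line
y_on_line = a_yx + b_yx * mean_x  # equals mean_y
x_on_line = a_xy + b_xy * mean_y  # equals mean_x

# r is the geometric mean of the two coefficients, with their common sign
sign = 1 if b_yx >= 0 else -1
r = sign * sqrt(b_yx * b_xy)  # sqrt(0.6), about 0.7746
```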