
Correlation & Regression

This document discusses correlation and different aspects of correlation analysis:
- Correlation is a statistical tool used to measure the relationship between two variables. The degree of relationship is measured by the correlation coefficient.
- There can be positive, negative, simple, multiple, partial and total correlation.
- The relationship can be linear or non-linear. Correlation does not necessarily imply causation.
- Different degrees of correlation, from perfect to low, are explained using scatter diagrams. Karl Pearson's coefficient is presented as a quantitative measure of correlation, calculated using the formula given.


Birinder Singh, Assistant Professor, PCTE

CORRELATION & REGRESSION
CORRELATION
• Correlation is a statistical tool that helps to measure and analyze the degree of relationship between two variables.
• Correlation analysis deals with the association between two or more variables.
CORRELATION
• The degree of relationship between the variables under consideration is measured through correlation analysis.
• The measure of correlation is called the correlation coefficient (r).
• The degree of relationship is expressed by the correlation coefficient, which ranges from -1 to +1 (-1 ≤ r ≤ +1).
• The direction of change is indicated by the sign of r: (+) or (-).
• Correlation analysis gives us an idea of both the degree and the direction of the relationship between the two variables under study.
TYPES OF CORRELATION - TYPE I
[Diagram: Correlation is divided into Positive Correlation and Negative Correlation]

TYPES OF CORRELATION TYPE I
• Positive Correlation: The correlation is said to be positive when the values of the two variables change in the same direction.
Ex. Pub. Exp. & Sales, Height & Weight.
• Negative Correlation: The correlation is said to be negative when the values of the variables change in opposite directions.
Ex. Price & Quantity demanded.
DIRECTION OF THE CORRELATION
Direction is indicated by the sign: (+) or (-).
• Positive relationship – Variables change in the same direction.
  • As X is increasing, Y is increasing.
  • As X is decreasing, Y is decreasing.
  • E.g., As height increases, so does weight.
• Negative relationship – Variables change in opposite directions.
  • As X is increasing, Y is decreasing.
  • As X is decreasing, Y is increasing.
  • E.g., As TV time increases, grades decrease.
EXAMPLES
Positive Correlation:
• Water consumption and temperature.
• Study time and grades.
Negative Correlation:
• Alcohol consumption and driving ability.
• Price & quantity demanded.
TYPES OF CORRELATION TYPE II
[Diagram: types of correlation – Simple, Multiple, Partial and Total]
TYPES OF CORRELATION TYPE II
• Simple correlation: Only two variables are studied.
• Multiple correlation: Three or more variables are studied. Ex. Qd = f(P, PC, PS, t, y)
• Partial correlation: The analysis recognizes more than two variables but considers only two of them, keeping the others constant.
• Total correlation: The analysis is based on all the relevant variables, which is normally not feasible.
TYPES OF CORRELATION TYPE III
[Diagram: Correlation is divided into Linear and Non-linear correlation]


TYPES OF CORRELATION TYPE III
• Linear correlation: Correlation is said to be linear when the amount of change in one variable tends to bear a constant ratio to the amount of change in the other. The graph of variables having a linear relationship forms a straight line.
Ex. X = 1, 2, 3, 4, 5, 6, 7, 8
    Y = 5, 7, 9, 11, 13, 15, 17, 19
    Y = 3 + 2X
• Non-linear correlation: The correlation is non-linear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.
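The constant-ratio test in the linear example can be checked mechanically. This is a minimal Python sketch (the slides contain no code, so the language and variable names are illustrative assumptions) that computes ΔY/ΔX for each step of the example series and recovers the line Y = 3 + 2X.

```python
# Sketch: test whether the change in Y bears a constant ratio to the change in X
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [5, 7, 9, 11, 13, 15, 17, 19]

# Ratio of successive changes, delta Y / delta X, for every step
ratios = [(Y[i + 1] - Y[i]) / (X[i + 1] - X[i]) for i in range(len(X) - 1)]
is_linear = len(set(ratios)) == 1  # one distinct ratio => linear relationship

# For a linear relation Y = a + b*X, the slope b is that constant ratio
b = ratios[0]
a = Y[0] - b * X[0]
print(is_linear, a, b)  # True 3.0 2.0
```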
CORRELATION & CAUSATION
• Causation means a cause-and-effect relationship.
• Correlation denotes the interdependence among variables. For a correlation between two phenomena to be meaningful, there should be a cause-and-effect relationship between them; if no such relationship exists, the correlation between them is not meaningful.
• If two variables vary in such a way that movements in one are accompanied by movements in the other, the variables are said to be correlated; this co-movement by itself does not establish cause and effect.
• Causation always implies correlation, but correlation does not necessarily imply causation.
DEGREE OF CORRELATION
• Perfect Correlation
• High Degree of Correlation
• Moderate Degree of Correlation
• Low Degree of Correlation
• No Correlation
METHODS OF STUDYING CORRELATION
• Graphic Methods: Scatter Diagram, Correlation Graph
• Algebraic Methods: Karl Pearson's Coefficient, Rank Correlation, Concurrent Deviation
SCATTER DIAGRAM METHOD
• A scatter diagram is a graph of observed plotted points, where each point represents the values of X and Y as a coordinate pair.
• It portrays the relationship between these two variables graphically.
A PERFECT POSITIVE CORRELATION
[Figure: Weight plotted against Height; the points for persons A and B lie on a straight line (a linear relationship)]
HIGH DEGREE OF POSITIVE CORRELATION
• Positive relationship
[Figure: scatter of Weight against Height, r = +0.80]
DEGREE OF CORRELATION
• Moderate Positive Correlation
[Figure: scatter of Shoe Size against Weight, r = +0.4]
DEGREE OF CORRELATION
• Perfect Negative Correlation
[Figure: TV watching per week against Exam score, r = -1.0]
DEGREE OF CORRELATION
• Moderate Negative Correlation
[Figure: TV watching per week against Exam score, r = -0.80]
DEGREE OF CORRELATION
• Weak Negative Correlation
[Figure: scatter of Shoe Size against Weight, r = -0.2]
DEGREE OF CORRELATION
• No Correlation (horizontal line)
[Figure: IQ against Height, r = 0.0]
DEGREE OF CORRELATION (r)
[Figure: four scatter panels with r = +0.80, +0.60, +0.40 and +0.20]
ADVANTAGES OF SCATTER DIAGRAM
• Simple and non-mathematical method
• Not influenced by the size of extreme items
• First step in investigating the relationship between two variables
DISADVANTAGE OF SCATTER DIAGRAM
• Cannot measure the exact degree of correlation
CORRELATION GRAPH
[Figure: Consumption and Production plotted against year, 2012 to 2017, on a common vertical scale from 0 to 300]
KARL PEARSON'S COEFFICIENT OF CORRELATION
• It is a quantitative method of measuring correlation.
• This method was given by Karl Pearson.
• It is the most widely used method of measuring correlation.

CALCULATION OF COEFFICIENT OF CORRELATION – ACTUAL MEAN METHOD
• Formula used is:
  $r = \dfrac{\Sigma xy}{\sqrt{\Sigma x^2 \cdot \Sigma y^2}}$, where $x = X - \bar{X}$ and $y = Y - \bar{Y}$
Q1: Find Karl Pearson’s coefficient of correlation:
X 2 3 4 5 6 7 8
Y 4 7 8 9 10 14 18
Ans: 0.96
Q2: Find Karl Pearson's coefficient of correlation:
                                          X-series   Y-series
No. of items                                  15         15
Arithmetic mean                               25         18
Sum of squares of deviations from mean       136        138
Sum of products of deviations of X & Y series from their respective arithmetic means = 122
Ans: 0.89
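The actual mean method can be sketched in a few lines of Python. The function name `pearson_actual_mean` is a hypothetical helper (the slides define only the formula); it takes deviations from the actual means and applies r = Σxy / √(Σx²·Σy²). Q2 already supplies the three sums, so the same formula is applied to them directly.

```python
from math import sqrt

def pearson_actual_mean(X, Y):
    """Karl Pearson's r via deviations from the actual means:
    r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), x = X - mean(X), y = Y - mean(Y)."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    x = [v - mx for v in X]
    y = [v - my for v in Y]
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return sxy / sqrt(sxx * syy)

# Q1 raw data
r_q1 = pearson_actual_mean([2, 3, 4, 5, 6, 7, 8], [4, 7, 8, 9, 10, 14, 18])
# Q2 gives sum(xy) = 122, sum(x^2) = 136, sum(y^2) = 138 directly
r_q2 = 122 / sqrt(136 * 138)
print(round(r_q1, 2), round(r_q2, 2))  # 0.96 0.89
```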
PRACTICE PROBLEMS - CORRELATION
Q3: Find Karl Pearson's coefficient of correlation:
X 6 2 10 4 8
Y 9 11 ? 8 7
Arithmetic means of X & Y are 6 & 8 respectively. Ans: – 0.92

Q4: Find the number of items as per the given data:
r = 0.5, Σxy = 120, σy = 8, Σx² = 90
where x & y are deviations from arithmetic means
Ans: 10
Q5: Find r:
ΣX = 250, ΣY = 300, Σ(X – 25)² = 480, Σ(Y – 30)² = 600
Σ(X – 25)(Y – 30) = 150, N = 10 Ans: 0.28
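Q3 and Q4 each need one extra reasoning step before the formula applies. A Python sketch under these assumptions: for Q3, the missing Y value comes from ΣY = N·Ȳ; for Q4, Σy² = N·σy² is substituted into the actual mean formula and solved for N. Variable names are illustrative only.

```python
from math import sqrt

# Q3: the missing Y value follows from the known mean, since sum(Y) = N * mean(Y)
X = [6, 2, 10, 4, 8]
Y_known = [9, 11, 8, 7]                 # the '?' entry left out
missing = len(X) * 8 - sum(Y_known)     # 5 * 8 - 35
Y = [9, 11, missing, 8, 7]

x = [v - 6 for v in X]                  # deviations from mean of X (6)
y = [v - 8 for v in Y]                  # deviations from mean of Y (8)
r_q3 = sum(a * b for a, b in zip(x, y)) / sqrt(
    sum(a * a for a in x) * sum(b * b for b in y))

# Q4: with sum(y^2) = N * sigma_y^2, r = sum(xy) / sqrt(sum(x^2) * N * sigma_y^2),
# so N = sum(xy)^2 / (r^2 * sum(x^2) * sigma_y^2)
r, sxy, sigma_y, sxx = 0.5, 120, 8, 90
n_q4 = sxy ** 2 / (r ** 2 * sxx * sigma_y ** 2)

print(missing, round(r_q3, 2), n_q4)  # 5 -0.92 10.0
```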
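CALCULATION OF COEFFICIENT OF CORRELATION – ASSUMED MEAN METHOD
• Formula used is:
  $r = \dfrac{N\Sigma d_x d_y - \Sigma d_x \cdot \Sigma d_y}{\sqrt{N\Sigma d_x^2 - (\Sigma d_x)^2}\;\sqrt{N\Sigma d_y^2 - (\Sigma d_y)^2}}$
  where dx and dy are the deviations of X and Y from their assumed means.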
Q6: Find r:
X 10 12 18 16 15 19 18 17
Y 30 35 45 44 42 48 47 46
Ans: 0.98
Q7: Find r, when deviations of two series from assumed means are as follows: Ans: 0.895
dx +5 -4 -2 +20 -10 0 +3 0 -15 -5
dy +5 -12 -7 +25 -10 -3 0 +2 -9 -15
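A Python sketch of the assumed mean method; `pearson_assumed_mean` is a hypothetical helper name. For Q6 the assumed means 15 and 42 are an arbitrary choice (any values give the same r), while Q7 supplies the deviations directly.

```python
from math import sqrt

def pearson_assumed_mean(dx, dy):
    """r from deviations about any assumed means:
    (N*sum(dxdy) - sum(dx)*sum(dy)) /
    (sqrt(N*sum(dx^2) - sum(dx)^2) * sqrt(N*sum(dy^2) - sum(dy)^2))"""
    n = len(dx)
    sdx, sdy = sum(dx), sum(dy)
    sdxdy = sum(a * b for a, b in zip(dx, dy))
    sdx2 = sum(a * a for a in dx)
    sdy2 = sum(b * b for b in dy)
    return (n * sdxdy - sdx * sdy) / (
        sqrt(n * sdx2 - sdx ** 2) * sqrt(n * sdy2 - sdy ** 2))

# Q6: assumed means 15 (for X) and 42 (for Y)
X = [10, 12, 18, 16, 15, 19, 18, 17]
Y = [30, 35, 45, 44, 42, 48, 47, 46]
r_q6 = pearson_assumed_mean([v - 15 for v in X], [v - 42 for v in Y])

# Q7: deviations given directly
dx = [5, -4, -2, 20, -10, 0, 3, 0, -15, -5]
dy = [5, -12, -7, 25, -10, -3, 0, 2, -9, -15]
r_q7 = pearson_assumed_mean(dx, dy)
print(round(r_q6, 2), round(r_q7, 4))  # 0.98 0.8956
```

The slide's 0.895 for Q7 is this value truncated to three decimals.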
CALCULATION OF COEFFICIENT OF CORRELATION – ACTUAL DATA METHOD
• Formula used is:
  $r = \dfrac{N\Sigma XY - \Sigma X \cdot \Sigma Y}{\sqrt{N\Sigma X^2 - (\Sigma X)^2}\;\sqrt{N\Sigma Y^2 - (\Sigma Y)^2}}$
Q8: Find r:
X 10 12 18 16 15 19 18 17

Y 30 35 45 44 42 48 47 46

Ans: 0.98
Q9: Calculate product moment correlation coefficient from the
following data: Ans: 0.996
X -5 -10 -15 -20 -25 -30
Y 50 40 30 20 10 5
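The actual data method works on raw values and needs no deviations. A minimal Python sketch (the helper name `pearson_raw` is an assumption) applied to Q8 and Q9; Q8 reproduces the Q6 result, as it should, since both methods compute the same r.

```python
from math import sqrt

def pearson_raw(X, Y):
    """r directly from raw values:
    (N*sum(XY) - sum(X)*sum(Y)) /
    (sqrt(N*sum(X^2) - sum(X)^2) * sqrt(N*sum(Y^2) - sum(Y)^2))"""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxy = sum(a * b for a, b in zip(X, Y))
    sxx = sum(a * a for a in X)
    syy = sum(b * b for b in Y)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

r_q8 = pearson_raw([10, 12, 18, 16, 15, 19, 18, 17], [30, 35, 45, 44, 42, 48, 47, 46])
r_q9 = pearson_raw([-5, -10, -15, -20, -25, -30], [50, 40, 30, 20, 10, 5])
print(round(r_q8, 2), round(r_q9, 3))  # 0.98 0.996
```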
IMPORTANT TYPICAL PROBLEMS
Q10: Calculate the coefficient of correlation from the following data and interpret the result: Ans: 0.76
N = 10, ΣXY = 8425, X̄ = 28.5, Ȳ = 28.0, σx = 10.5, σy = 5.6
Q11: The following results were obtained from an analysis:
N = 12, ΣXY = 334, ΣX = 30, ΣY = 5, ΣX² = 670, ΣY² = 285
Later it was discovered that one pair of values (X = 11, Y = 4) was wrongly copied. The correct value of the pair was (X = 10, Y = 14).
Find the correct value of the correlation coefficient. Ans: 0.774
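The key step in Q11 is correcting each summary total before applying the actual data formula: subtract the wrong pair's contribution and add the correct pair's. A Python sketch of that bookkeeping; the helper `r_from_sums` is a hypothetical name.

```python
from math import sqrt

def r_from_sums(n, sx, sy, sxy, sxx, syy):
    """Actual data formula evaluated on summary totals."""
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

n, sx, sy, sxy, sxx, syy = 12, 30, 5, 334, 670, 285
wrong, right = (11, 4), (10, 14)

# Remove the wrongly copied pair's contribution and add the correct pair's
sx += right[0] - wrong[0]
sy += right[1] - wrong[1]
sxy += right[0] * right[1] - wrong[0] * wrong[1]
sxx += right[0] ** 2 - wrong[0] ** 2
syy += right[1] ** 2 - wrong[1] ** 2

r_corrected = r_from_sums(n, sx, sy, sxy, sxx, syy)
print(sx, sy, sxy, sxx, syy, round(r_corrected, 4))  # 29 15 430 649 465 0.7747
```

N stays 12 because one pair was replaced, not added. The slide's 0.774 is this value truncated to three decimals.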
VARIANCE – COVARIANCE METHOD
• This method of determining the correlation coefficient is based on covariance:
  $r = \dfrac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \dfrac{\mathrm{Cov}(X,Y)}{\sigma_x \sigma_y}$
  where $\mathrm{Cov}(X,Y) = \dfrac{\Sigma xy}{N} = \dfrac{\Sigma (X - \bar{X})(Y - \bar{Y})}{N} = \dfrac{\Sigma XY}{N} - \bar{X}\bar{Y}$
• Another way of calculating r: $r = \dfrac{\Sigma xy}{N\,\sigma_x \sigma_y}$
Q12: For two series X & Y, Cov(X,Y) = 15, Var(X) = 36, Var(Y) = 25. Find r. Ans: 0.5
Q13: Find r when N = 30, X̄ = 40, Ȳ = 50, σx = 6, σy = 7, Σxy = 360. Ans: 0.286
Q14: For two series X & Y, Cov(X,Y) = 25, Var(X) = 36, r = 0.6. Find σy. Ans: 6.94
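A short Python sketch of the variance–covariance method applied to Q12 to Q14; `r_from_cov` is an illustrative name. Q14 simply rearranges r = Cov/(σx·σy) to solve for σy.

```python
from math import sqrt

def r_from_cov(cov_xy, var_x, var_y):
    """r = Cov(X, Y) / sqrt(Var(X) * Var(Y)) = Cov(X, Y) / (sigma_x * sigma_y)"""
    return cov_xy / sqrt(var_x * var_y)

# Q12
r_q12 = r_from_cov(15, 36, 25)
# Q13: r = sum(xy) / (N * sigma_x * sigma_y)
r_q13 = 360 / (30 * 6 * 7)
# Q14: sigma_y = Cov / (r * sigma_x), with sigma_x = sqrt(Var(X))
sigma_y_q14 = 25 / (0.6 * sqrt(36))
print(r_q12, round(r_q13, 3), round(sigma_y_q14, 2))  # 0.5 0.286 6.94
```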
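CALCULATION OF CORRELATION COEFFICIENT – GROUPED DATA
• Formula used is:
  $r = \dfrac{N\Sigma f d_x d_y - \Sigma f d_x \cdot \Sigma f d_y}{\sqrt{N\Sigma f d_x^2 - (\Sigma f d_x)^2}\;\sqrt{N\Sigma f d_y^2 - (\Sigma f d_y)^2}}$
  where N = Σf and dx, dy are the deviations of the class mid-points from their assumed means.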

Q15: Calculate Karl Pearson's coefficient of correlation:

X/Y      10-25   25-40   40-55
0-20       10       4       6
20-40       5      40       9
40-60       3       8      15

Ans: 0.33
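A Python sketch of the grouped-data formula on Q15. It assumes the rows are the X classes and the columns the Y classes (r is symmetric, so the orientation does not change the result), and uses step deviations of the class mid-points: X mid-points 10, 30, 50 about 30 with step 20, and Y mid-points 17.5, 32.5, 47.5 about 32.5 with step 15.

```python
from math import sqrt

# Bivariate frequency table from Q15 (rows: X classes, columns: Y classes)
freq = [
    [10, 4, 6],
    [5, 40, 9],
    [3, 8, 15],
]
dx = [-1, 0, 1]  # step deviations of X mid-points
dy = [-1, 0, 1]  # step deviations of Y mid-points
cells = [(i, j) for i in range(3) for j in range(3)]

N = sum(freq[i][j] for i, j in cells)
f_dx = sum(freq[i][j] * dx[i] for i, j in cells)
f_dy = sum(freq[i][j] * dy[j] for i, j in cells)
f_dx2 = sum(freq[i][j] * dx[i] ** 2 for i, j in cells)
f_dy2 = sum(freq[i][j] * dy[j] ** 2 for i, j in cells)
f_dxdy = sum(freq[i][j] * dx[i] * dy[j] for i, j in cells)

r = (N * f_dxdy - f_dx * f_dy) / (
    sqrt(N * f_dx2 - f_dx ** 2) * sqrt(N * f_dy2 - f_dy ** 2))
print(N, f_dx, f_dy, f_dxdy, round(r, 2))  # 100 6 12 16 0.33
```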
PROPERTIES OF COEFFICIENT OF CORRELATION
• Karl Pearson's coefficient of correlation lies between -1 and +1, i.e. -1 ≤ r ≤ +1.
• If the scale of a series is changed or its origin is shifted, there is no effect on the value of r.
• r is the geometric mean of the regression coefficients bxy and byx, i.e. $r = \sqrt{b_{xy} \cdot b_{yx}}$, taking the common sign of the two regression coefficients.
• If X and Y are independent variables, the coefficient of correlation is zero, but the converse is not necessarily true.
• r is a pure number and is independent of the units of measurement.
• The coefficient of correlation between two variables x and y is symmetric, i.e. ryx = rxy.
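Three of these properties can be checked numerically. A Python sketch using the Q1 data (the helper `pearson` and the shift/scale constants are illustrative assumptions). The scale factors used are positive; a negative scale factor would reverse the sign of r while leaving its magnitude unchanged.

```python
from math import sqrt, isclose

def pearson(X, Y):
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    sxx = sum((a - mx) ** 2 for a in X)
    syy = sum((b - my) ** 2 for b in Y)
    return sxy / sqrt(sxx * syy)

X = [2, 3, 4, 5, 6, 7, 8]   # Q1 data
Y = [4, 7, 8, 9, 10, 14, 18]
r = pearson(X, Y)

# Change of origin and (positive) scale: U = (X - 5) / 2, V = (Y - 10) / 3
U = [(v - 5) / 2 for v in X]
V = [(v - 10) / 3 for v in Y]
same_after_shift = isclose(pearson(U, V), r)

# Symmetry: r_xy = r_yx
symmetric = isclose(pearson(Y, X), r)

# Regression coefficients from deviations: b_yx = sum(xy)/sum(x^2), b_xy = sum(xy)/sum(y^2)
mx, my = sum(X) / len(X), sum(Y) / len(Y)
sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
b_yx = sxy / sum((a - mx) ** 2 for a in X)
b_xy = sxy / sum((b - my) ** 2 for b in Y)
geo_mean = isclose(sqrt(b_xy * b_yx), abs(r))   # |r| is their geometric mean
print(same_after_shift, symmetric, geo_mean)  # True True True
```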
PROBABLE ERROR & STANDARD ERROR
• Probable Error is used to test the reliability of Karl Pearson's correlation coefficient:
  $\text{P.E.} = 0.6745 \times \dfrac{1 - r^2}{\sqrt{N}}$
  The standard error is $\text{S.E.} = \dfrac{1 - r^2}{\sqrt{N}}$, so P.E. = 0.6745 × S.E.
• Probable Error is used to interpret the value of the correlation coefficient as follows:
  • If |r| > 6 P.E., then r is significant.
  • If |r| < 6 P.E., then r is insignificant, i.e. there is no evidence of correlation between the two series.
• Probable Error also determines the upper and lower limits within which the correlation coefficient of a randomly selected sample from the same universe will fall.
  • Upper Limit = r + P.E.
  • Lower Limit = r – P.E.
PRACTICE PROBLEM – PROBABLE ERROR
Q16: Find Karl Pearson's coefficient of correlation from the following data:
X   9   28   45   60   70   50
Y 100   60   50   40   33   57
Also calculate the probable error and check whether r is significant or not. Ans: – 0.94, 0.032

Q17: A student calculates the value of r as 0.7 when N = 5. He concludes that r is highly significant. Comment. Ans: Insignificant
SPEARMAN'S RANK CORRELATION METHOD
• Given by Prof. Spearman in 1904.
• By this method, correlation between qualitative attributes like intelligence, honesty, beauty etc. can be calculated.
• These variables can be assigned ranks, but their quantitative measurement is not possible.
• It is denoted by R: $R = 1 - \dfrac{6\,\Sigma D^2}{N(N^2 - 1)}$
  • R = Rank correlation coefficient
  • D = Difference between the two ranks (R1 – R2)
  • N = Number of pairs of observations
• As in the case of r, -1 ≤ R ≤ 1.
• The sum of the rank differences is always zero, i.e. ΣD = 0.
THREE CASES
[Diagram: Spearman's Rank Correlation Method covers three cases: (1) when ranks are given, (2) when ranks are not given, (3) when equal or tied ranks exist]
PRACTICE PROBLEMS – RANK CORRELATION (WHEN RANKS ARE GIVEN)
Q18: In a fancy dress competition, two judges accorded the following ranks to eight participants:
Judge X 8 7 6 3 2 1 5 4
Judge Y 7 5 4 1 3 2 6 8
Calculate the coefficient of rank correlation. Ans: 0.62

Q19: Ten competitors in a beauty contest are ranked by three judges X, Y, Z:
X 1 6 5 10 3 2 4 9 7 8
Y 3 5 8 4 7 10 2 1 6 9
Z 6 4 9 8 1 2 3 10 5 7
Use the rank correlation coefficient to determine which pair of judges has the nearest approach to common tastes in beauty.
Ans: X & Z
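When ranks are given, Spearman's formula applies directly. A Python sketch (the helper `spearman_from_ranks` is a hypothetical name) for Q18, and for Q19 by computing R for every pair of judges and picking the largest.

```python
from itertools import combinations

def spearman_from_ranks(r1, r2):
    """R = 1 - 6*sum(D^2) / (N*(N^2 - 1)), D = R1 - R2 (no tied ranks)."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Q18
R18 = spearman_from_ranks([8, 7, 6, 3, 2, 1, 5, 4], [7, 5, 4, 1, 3, 2, 6, 8])

# Q19: compare every pair of judges and keep the highest R
judges = {
    "X": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "Y": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "Z": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}
scores = {a + " & " + b: spearman_from_ranks(judges[a], judges[b])
          for a, b in combinations(judges, 2)}
best_pair = max(scores, key=scores.get)
print(round(R18, 2), best_pair, round(scores[best_pair], 2))  # 0.62 X & Z 0.64
```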
PRACTICE PROBLEMS – RANK CORRELATION (WHEN RANKS ARE NOT GIVEN)
Q20: Find the coefficient of rank correlation between X & Y:
X 15 17 14 13 11 12 16 18 10 9
Y 18 12 4 6 7 9 3 10 2 5
Ans: 0.48
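When ranks are not given, the values are ranked first. A Python sketch assuming rank 1 goes to the largest value and that there are no ties (true for Q20); `rank_desc` is an illustrative name. Ranking both series in ascending order instead gives the same R.

```python
def rank_desc(values):
    """Rank 1 = largest value (assumes no tied values)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

X = [15, 17, 14, 13, 11, 12, 16, 18, 10, 9]
Y = [18, 12, 4, 6, 7, 9, 3, 10, 2, 5]
rx, ry = rank_desc(X), rank_desc(Y)

n = len(X)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))   # sum of squared rank differences
R = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(d2, round(R, 2))  # 86 0.48
```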
PRACTICE PROBLEMS – RANK CORRELATION (WHEN RANKS ARE EQUAL OR TIED)
• When two or more items in a series have equal values, they are assigned a common rank, i.e. the average of the ranks they would otherwise occupy.
• Here $R = 1 - \dfrac{6\left[\Sigma D^2 + \dfrac{m^3 - m}{12} + \dfrac{m^3 - m}{12} + \cdots\right]}{N(N^2 - 1)}$
• m = Number of items with equal ranks
• The correction factor $\dfrac{m^3 - m}{12}$ is added to ΣD² once for each group of equal ranks in the question.
PRACTICE PROBLEMS – RANK CORRELATION (WHEN RANKS ARE EQUAL OR TIED)
Q21: Calculate R:
X 15 10 20 28 12 10 16 18
Y 16 14 10 12 11 15 18 10

Ans: – 0.40

Q22: Calculate Rank Correlation:


X 40 50 60 60 80 50 70 60
Y 80 120 160 170 130 200 210 130

Ans: 0.43
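A Python sketch of the tie-corrected formula, shown on Q22. Assumptions: rank 1 goes to the largest value, tied values share the average of the ranks they occupy, and (m³ − m)/12 is added once for every group of m tied values in either series. The helper names are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest; tied values share the average of the ranks they occupy."""
    order = sorted(values, reverse=True)
    ranks = {}
    for v in set(values):
        first = order.index(v) + 1        # first rank position occupied by v
        m = order.count(v)                # size of the tie group
        ranks[v] = first + (m - 1) / 2    # average of ranks first .. first+m-1
    return [ranks[v] for v in values]

def spearman_tied(X, Y):
    n = len(X)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(X), average_ranks(Y)))
    # Correction factor (m^3 - m)/12 for every group of tied values
    cf = sum((m ** 3 - m) / 12
             for series in (X, Y)
             for m in Counter(series).values() if m > 1)
    return 1 - 6 * (d2 + cf) / (n * (n ** 2 - 1))

X22 = [40, 50, 60, 60, 80, 50, 70, 60]
Y22 = [80, 120, 160, 170, 130, 200, 210, 130]
print(round(spearman_tied(X22, Y22), 2))  # 0.43
```

For Q22, ΣD² = 45 and the three tie groups (three 60s, two 50s, two 130s) add 2 + 0.5 + 0.5 = 3.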
IMPORTANT TYPICAL PROBLEMS – RANK CORRELATION
Q23: Calculate rank correlation from the following data: Ans: 0.64
Serial No.        1   2   3   4   5   6   7   8   9   10
Rank Difference  -2   ?  -1  +3  +2   0  -4  +3  +3  -2

Q24: The coefficient of rank correlation of marks obtained by 10 students in English & Math was found to be 0.5. It was later discovered that the difference in ranks in the two subjects for one student was wrongly taken as 3 instead of 7. Find the correct rank correlation. Ans: 0.26
Q25: The rank correlation coefficient between marks obtained by some students in English & Math is found to be 0.8. If the total of squares of rank differences is 33, find the number of students. Ans: 10
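These three problems all work backwards from Spearman's formula. A Python sketch under these assumptions: Q23 uses ΣD = 0 to fill the missing difference, Q24 recovers ΣD² from the reported R before swapping 3² for 7², and Q25 searches for the integer N with N(N² − 1) = 6ΣD²/(1 − R). Names are illustrative.

```python
def spearman_from_d(d):
    n = len(d)
    return 1 - 6 * sum(v * v for v in d) / (n * (n ** 2 - 1))

# Q23: rank differences sum to zero, so the missing one is -(sum of the rest)
known = [-2, None, -1, 3, 2, 0, -4, 3, 3, -2]
missing = -sum(v for v in known if v is not None)
R23 = spearman_from_d([missing if v is None else v for v in known])

# Q24: recover sum(D^2) from the reported R, then replace 3^2 by 7^2
n, R_wrong = 10, 0.5
sum_d2 = (1 - R_wrong) * n * (n ** 2 - 1) / 6       # 82.5
sum_d2_correct = sum_d2 - 3 ** 2 + 7 ** 2             # 122.5
R24 = 1 - 6 * sum_d2_correct / (n * (n ** 2 - 1))

# Q25: find the integer N with N(N^2 - 1) = 6 * sum(D^2) / (1 - R)
target = 6 * 33 / (1 - 0.8)
N25 = next(k for k in range(2, 1000) if abs(k * (k ** 2 - 1) - target) < 1e-6)
print(missing, round(R23, 2), round(R24, 2), N25)  # -2 0.64 0.26 10
```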
CONCURRENT DEVIATION METHOD
• Correlation is determined on the basis of the direction of the deviations.
• Under this method, the directions of deviations are assigned (+), (-) or (0) signs:
  • If a value is more than its preceding value, its deviation is assigned a (+) sign.
  • If a value is less than its preceding value, its deviation is assigned a (-) sign.
  • If a value is equal to its preceding value, its deviation is assigned a (0) sign.
• The deviations dx and dy are multiplied to get dxdy. The product of similar signs is (+) and of opposite signs is (-).
• The positive dxdy signs are counted; their number is called the number of CONCURRENT DEVIATIONS and is denoted by C.
• Formula used: $r_c = \pm\sqrt{\pm\dfrac{2C - n}{n}}$, where rc = coefficient of concurrent deviation, C = number of concurrent deviations, n = N – 1 (the number of pairs of deviations).
• The same sign is used inside and outside the square root: if (2C – n)/n is negative, take the minus sign in both places.
PRACTICE PROBLEMS – COEFFICIENT OF CONCURRENT DEVIATIONS
Q26: Find the coefficient of concurrent deviation from the following data:
Year    2001 2002 2003 2004 2005 2006 2007
Demand   150  154  160  172  160  165  180
Price    200  180  170  160  190  180  172
Ans: – 1
Q27: Find the coefficient of concurrent deviation from the following data:
X 112 125 126 118 118 121 125 125 131 135
Y 106 102 102 104  98  96  97  97  95  90
Ans: – 0.75
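A Python sketch of the concurrent deviation method. One assumption needs flagging: a pair counts as concurrent when the two signs are alike, and two (0) deviations are treated as alike. That reading of "similar signs" is what reproduces the slide's answer for Q27, where one step shows no change in either series. The helper names are hypothetical.

```python
from math import sqrt

def sign(v):
    return (v > 0) - (v < 0)   # +1, -1 or 0

def concurrent_deviation(X, Y):
    """r_c = ±sqrt(±(2C - n)/n), n = N - 1, C = number of concurrent deviations.
    Assumption: a pair is concurrent when both signs are equal, including (0, 0)."""
    dx = [sign(b - a) for a, b in zip(X, X[1:])]   # direction vs. preceding value
    dy = [sign(b - a) for a, b in zip(Y, Y[1:])]
    n = len(dx)
    C = sum(1 for a, b in zip(dx, dy) if a == b)
    ratio = (2 * C - n) / n
    # Same sign inside and outside the root, so the root is never of a negative number
    return sqrt(ratio) if ratio >= 0 else -sqrt(-ratio)

demand = [150, 154, 160, 172, 160, 165, 180]
price = [200, 180, 170, 160, 190, 180, 172]
X27 = [112, 125, 126, 118, 118, 121, 125, 125, 131, 135]
Y27 = [106, 102, 102, 104, 98, 96, 97, 97, 95, 90]
print(concurrent_deviation(demand, price), round(concurrent_deviation(X27, Y27), 2))  # -1.0 -0.75
```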
COEFFICIENT OF DETERMINATION (CoD)
• CoD is used for interpreting the coefficient of correlation and for comparing two or more correlation coefficients.
• It is the square of the coefficient of correlation, i.e. r².
• It gives the percentage of the variation in the dependent variable Y that can be explained in terms of the independent variable X.
• If r = 0.8, then r² = 0.64: 64% of the total variation in Y is explained by X. The remaining 36% of the variation is due to other factors.
• So, $\text{CoD} = r^2 = \dfrac{\text{Explained Variance}}{\text{Total Variance}}$
• Coefficient of Non-Determination $= K^2 = 1 - r^2 = \dfrac{\text{Unexplained Variance}}{\text{Total Variance}}$
• Coefficient of Alienation $= K = \sqrt{1 - r^2}$
PRACTICE PROBLEMS – COD
Q28: The coefficient of correlation between consumption expenditure (C) and disposable income (Y) in a study was found to be +0.8. What percentage of the variation in C is explained by the variation in Y? Ans: 64%
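The Q28 arithmetic, as a tiny Python sketch: the coefficient of determination is r², and the coefficient of non-determination is 1 − r². The variable names are illustrative.

```python
r = 0.8
cod = r ** 2        # coefficient of determination: share of variation explained
k2 = 1 - cod        # coefficient of non-determination: share left unexplained
print(f"{cod:.0%} explained, {k2:.0%} unexplained")  # 64% explained, 36% unexplained
```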
CLASS TEST
Q1: In a fancy dress competition, two judges accorded the following ranks to eight participants:
Judge X 8 7 6 3 2 1 5 4
Judge Y 7 5 4 1 3 2 6 8
Calculate the coefficient of rank correlation.

Q2: The following results were obtained from an analysis:
N = 12, ΣXY = 334, ΣX = 30, ΣY = 5, ΣX² = 670, ΣY² = 285
Later it was discovered that one pair of values (X = 11, Y = 4) was wrongly copied. The correct value of the pair was (X = 10, Y = 14).
Find the correct value of the correlation coefficient.
