Bangladesh University of Business and Technology
Course Title: Introduction to Statistics
Chapter: Regression
Regression
Regression is a mathematical measure of the average relationship between two or more
variables, expressed in terms of the original units of the data. In a regression analysis
there are two types of variables. The variable which is influenced or is to be predicted is
called the dependent variable (also the regressand, predicted or explained variable), and
the variable which influences the values or is used for prediction is called the independent
variable (also the regressor, predictor or explanatory variable). Such relationships
between two variables can be considered, for example, between rainfall and agricultural
production, between the price of an output and the overall cost of the product, or between
consumer expenditure and disposable income.
Q. Discuss about regression equation and regression line.
Regression equation
Regression equations are algebraic expressions of the regression lines. Since there are two
regression lines, there are two regression equations: the regression equation of X on Y
describes the variation in the values of X for given changes in Y, and the regression
equation of Y on X describes the variation in the values of Y for given changes in X.
The regression equation of Y on X is expressed as follows
$$Y = a + bX$$
The regression equation of X on Y is expressed as follows
$$X = a + bY$$
Regression line
If the variables in a bivariate distribution are related, the points in the scatter diagram
will cluster around some curve called the "curve of regression". If the curve is a straight
line, it is called the line of regression and there is said to be linear regression between
the variables; otherwise the regression is said to be curvilinear. The line of regression is
the line which gives the best estimate of the value of one variable for any specific value
of the other variable.
Regression Analysis
Q. What are the regression coefficients?
In the line of regression of Y on X
$$Y = a + bX$$
The coefficient b, which is the slope of the line of regression of Y on X, is called the
coefficient of regression of Y on X. It represents the increment in the value of the
dependent variable Y for a unit change in the value of the independent variable X. For
notational convenience, the coefficient of regression of Y on X is denoted by $b_{yx}$.
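Regression coefficient of Y on X is
$$b_{yx} = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$$
and the intercept is
$$a = \bar{Y} - b\bar{X} = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$$
where n is the number of paired observations.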
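The two computational formulas above can be checked with a short Python sketch. The function name `regression_y_on_x` and the toy data are hypothetical, chosen only for illustration; the toy points lie exactly on the line Y = 2X, so the sketch should recover a = 0 and b = 2.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the line of regression Y = a + bX.

    Uses the computational formulas of this chapter:
    b_yx = (sum XY - sum X * sum Y / n) / (sum X^2 - (sum X)^2 / n)
    a    = sum Y / n - b * sum X / n
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b


# Hypothetical toy data lying exactly on Y = 2X
a, b = regression_y_on_x([1, 2, 3, 4], [2, 4, 6, 8])
print(a, b)  # prints 0.0 2.0
```

The same function applied to the sales and purchase figures of Example 1 gives the slope 0.613 found there by hand.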
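Similarly, in the regression equation of X on Y
$$X = a + bY$$
the regression coefficient of X on Y is
$$b_{xy} = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum Y^2 - \frac{(\sum Y)^2}{n}}$$
and the intercept is
$$a = \bar{X} - b\bar{Y} = \frac{\sum X}{n} - b\,\frac{\sum Y}{n}$$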
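Compared with the Y-on-X case, only the denominator changes (it uses ΣY² and (ΣY)²) and the roles of the two means are swapped. A minimal Python sketch, with the hypothetical function name `regression_x_on_y`; for toy points lying exactly on Y = 2X, the X-on-Y line should come out as X = 0.5Y.

```python
def regression_x_on_y(xs, ys):
    """Return (a, b) for the line of regression X = a + bY.

    b_xy = (sum XY - sum X * sum Y / n) / (sum Y^2 - (sum Y)^2 / n)
    a    = sum X / n - b * sum Y / n
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_y2 = sum(y * y for y in ys)
    b = (sum_xy - sum_x * sum_y / n) / (sum_y2 - sum_y ** 2 / n)
    a = sum_x / n - b * sum_y / n
    return a, b


# Same hypothetical toy data (Y = 2X); solved for X this is X = 0.5Y
a, b = regression_x_on_y([1, 2, 3, 4], [2, 4, 6, 8])
print(a, b)  # prints 0.0 0.5
```

With perfectly linear data the two regression lines coincide; with scattered data they are two different lines.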
Note: Interpretation of a and b in the regression equation $Y = a + bX$:
** The slope b represents the estimated average change in Y when X increases by
one unit.
** The intercept a represents the estimated average value of Y when X equals
zero.
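Both interpretations can be checked numerically. This sketch plugs in the coefficients obtained in Example 1 of this chapter (a = 14.81, b = 0.613); the helper name `predict` is hypothetical.

```python
A, B = 14.81, 0.613  # coefficients of Y = a + bX taken from Example 1

def predict(x):
    """Estimated average value of Y for a given value of X."""
    return A + B * x

# Intercept: the estimate at X = 0 is just a
print(predict(0))  # prints 14.81
# Slope: raising X by one unit raises the estimate by b (up to float rounding),
# wherever on the X axis the step is taken
print(predict(101) - predict(100))
print(predict(58) - predict(57))
```

Note that X = 0 may lie far outside the observed data (sales of 0 here), in which case a is only the height of the fitted line rather than a meaningful prediction.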
Example # 1
From the following data obtain the regression equation of Y on X:
Sales X 91 97 108 121 67 124 51 73 111 57
Purchase Y 71 75 69 97 70 91 39 61 80 47
Solution:
We know that,
The regression equation of Y on X is expressed as follows
$$Y = a + bX$$
Again,
Regression coefficient of Y on X is
$$b_{yx} = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$$
Sales X   Purchase Y   X²      Y²     XY
91        71           8281    5041   6461
97        75           9409    5625   7275
108       69           11664   4761   7452
121       97           14641   9409   11737
67        70           4489    4900   4690
124       91           15376   8281   11284
51        39           2601    1521   1989
73        61           5329    3721   4453
111       80           12321   6400   8880
57        47           3249    2209   2679
ΣX = 900   ΣY = 700   ΣX² = 87360   ΣY² = 51868   ΣXY = 66900
$$b_{yx} = \frac{66900 - \frac{900 \times 700}{10}}{87360 - \frac{(900)^2}{10}} = \frac{3900}{6360} = 0.613207547 \approx 0.613$$
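$$a = \bar{Y} - b\bar{X} = \frac{700}{10} - 0.613207547 \times \frac{900}{10} = 70 - 55.1887 \approx 14.81$$
(The unrounded value of b is used here; substituting the rounded value 0.613 gives about 14.83.)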
Regression equation of Y on X is
$$Y = 14.81 + 0.613X$$
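The working table and both coefficients can be reproduced in a few lines of Python as a check on the hand arithmetic. The variable names are illustrative only.

```python
sales = [91, 97, 108, 121, 67, 124, 51, 73, 111, 57]     # X
purchase = [71, 75, 69, 97, 70, 91, 39, 61, 80, 47]      # Y
n = len(sales)

# Column totals of the working table
sum_x = sum(sales)
sum_y = sum(purchase)
sum_x2 = sum(x * x for x in sales)
sum_y2 = sum(y * y for y in purchase)
sum_xy = sum(x * y for x, y in zip(sales, purchase))

b_yx = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b_yx * sum_x / n  # uses the unrounded slope

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # prints 900 700 87360 51868 66900
print(round(b_yx, 3), round(a, 2))           # prints 0.613 14.81
```

Keeping the slope unrounded when computing a is what gives 14.81; the rounded slope 0.613 would give about 14.83, a difference that is negligible for prediction.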
Example # 2
The following data give the ages and blood pressure of 10 women
Age X                56   42   36   47   49   42   60   72   63   55
Blood pressure Y    147  125  118  128  145  140  155  160  149  150
a) Find the correlation coefficient between X and Y
b) Determine the least squares regression equation of Y on X .
c) Estimate the blood pressure of a woman whose age is 45 years.
Solution:
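a) We know that the correlation coefficient between X and Y is given by
$$r = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$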
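The correlation formula translates line by line into code. This is a minimal sketch; the function name `pearson_r` is hypothetical, N is written `n`, and the toy data are chosen so that the expected answers (+1 and -1) are obvious.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r from raw sums, following the formula above."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# Hypothetical toy data: a perfect increasing and a perfect decreasing relation
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # prints 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # prints -1.0
```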
Age X   Blood pressure Y   X²     Y²      XY
56      147                3136   21609   8232
42      125                1764   15625   5250
36      118                1296   13924   4248
47      128                2209   16384   6016
49      145                2401   21025   7105
42      140                1764   19600   5880
60      155                3600   24025   9300
72      160                5184   25600   11520
63      149                3969   22201   9387
55      150                3025   22500   8250
ΣX = 522   ΣY = 1417   ΣX² = 28348   ΣY² = 202493   ΣXY = 75188
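$$r = \frac{75188 - \frac{522 \times 1417}{10}}{\sqrt{\left(28348 - \frac{(522)^2}{10}\right)\left(202493 - \frac{(1417)^2}{10}\right)}} = \frac{1220.6}{\sqrt{1099.6 \times 1704.1}} = 0.891678842$$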
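The value of r in part (a) can be checked by computing the column totals and the formula directly from the raw data. The variable names are illustrative only.

```python
import math

age = [56, 42, 36, 47, 49, 42, 60, 72, 63, 55]            # X
bp = [147, 125, 118, 128, 145, 140, 155, 160, 149, 150]   # Y
n = len(age)

sum_x, sum_y = sum(age), sum(bp)
sum_x2 = sum(x * x for x in age)
sum_y2 = sum(y * y for y in bp)
sum_xy = sum(x * y for x, y in zip(age, bp))

numerator = sum_xy - sum_x * sum_y / n    # about 1220.6
denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
r = numerator / denominator
print(round(r, 4))  # prints 0.8917
```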
b) We know that the regression equation of Y on X is expressed as follows
$$Y = a + bX$$
Again, the regression coefficient of Y on X is
$$b_{yx} = \frac{75188 - \frac{522 \times 1417}{10}}{28348 - \frac{(522)^2}{10}} = \frac{1220.6}{1099.6} = 1.110040015 \approx 1.11$$
$$a = \bar{Y} - b\bar{X} = \frac{1417}{10} - 1.110040015 \times \frac{522}{10} = 83.75591124 \approx 83.756$$
Regression equation of Y on X is
$$Y = 83.756 + 1.11X$$
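Part (b) can be checked the same way. As in the hand calculation, the sketch keeps the slope unrounded when computing the intercept; the variable names are illustrative only.

```python
age = [56, 42, 36, 47, 49, 42, 60, 72, 63, 55]            # X
bp = [147, 125, 118, 128, 145, 140, 155, 160, 149, 150]   # Y
n = len(age)

sum_x, sum_y = sum(age), sum(bp)
sum_x2 = sum(x * x for x in age)
sum_xy = sum(x * y for x, y in zip(age, bp))

b_yx = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b_yx * sum_x / n  # unrounded slope, as in the hand calculation

print(round(b_yx, 2), round(a, 3))  # prints 1.11 83.756
```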
c) When $X = 45$,
$$Y = 83.756 + 1.11 \times 45 = 133.706$$
Hence the estimated blood pressure of a woman aged 45 years is about 134.
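The estimate in part (c) can be reproduced directly. The sketch also evaluates the line with the unrounded coefficients, to show that rounding the coefficients does not change the answer to the nearest whole unit. The function name `predict_bp` is hypothetical.

```python
def predict_bp(age, a=83.756, b=1.11):
    """Estimated blood pressure from the fitted line Y = a + bX (Example 2)."""
    return a + b * age

y_rounded = predict_bp(45)                              # rounded coefficients
y_exact = predict_bp(45, a=83.75591124, b=1.110040015)  # unrounded coefficients
print(round(y_rounded, 3), round(y_exact, 3))
print(round(y_rounded), round(y_exact))  # prints 134 134
```

Age 45 lies inside the observed range of ages (36 to 72), so this is an interpolation along the fitted line rather than an extrapolation beyond the data.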
Q. Compare correlation analysis with regression analysis.
1) Correlation: The correlation coefficient is symmetric, i.e. $r_{xy} = r_{yx}$.
   Regression: The regression coefficients are not symmetric in X and Y, i.e. in general
   $b_{xy} \neq b_{yx}$.
2) Correlation: The correlation coefficient $r_{xy}$ is a relative measure of the linear
   relationship between X and Y and is independent of the units of measurement. It is a
   pure number lying between -1 and +1.
   Regression: The regression coefficients $b_{yx}$ and $b_{xy}$ are absolute measures
   representing the change in the value of the variable Y (X) for a unit change in the
   variable X (Y).
3) Correlation: Correlation analysis has limited applications as it is confined only to
   the study of linear relationships between the variables.
   Regression: Regression analysis studies linear as well as non-linear relationships
   between the variables and therefore has much wider applications.
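Points 1 and 2 of this comparison can be checked on the Example 2 data. The helper names (`s_terms`, `corr`, `reg_coef`) are hypothetical; the last lines re-express age in months to show that r does not depend on the unit of measurement, while the regression coefficient does.

```python
import math

def s_terms(xs, ys):
    """Return (Sxx, Syy, Sxy): the corrected sums of squares and products."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    syy = sum(y * y for y in ys) - sum_y ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return sxx, syy, sxy

def corr(xs, ys):
    """Correlation coefficient r."""
    sxx, syy, sxy = s_terms(xs, ys)
    return sxy / math.sqrt(sxx * syy)

def reg_coef(dep, indep):
    """Regression coefficient of `dep` on `indep`; reg_coef(ys, xs) is b_yx."""
    s_ii, _, s_id = s_terms(indep, dep)
    return s_id / s_ii

age = [56, 42, 36, 47, 49, 42, 60, 72, 63, 55]            # X, in years
bp = [147, 125, 118, 128, 145, 140, 155, 160, 149, 150]   # Y

# Point 1: r is symmetric, the regression coefficients are not
print(corr(age, bp), corr(bp, age))
print(reg_coef(bp, age), reg_coef(age, bp))   # b_yx, b_xy

# Point 2: change the unit of X from years to months
age_months = [12 * x for x in age]
print(corr(age_months, bp))        # same r as before
print(reg_coef(bp, age_months))    # b_yx shrinks by a factor of 12
```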
Q. What are the uses of regression analysis?
Uses:
The estimated relationship can be used for predictive purposes.
Regression analysis is widely used in the statistical estimation of demand curves,
supply curves, production functions, cost functions, consumption functions, etc.