1.1.2simple Linear Regression

The document provides an overview of Simple Linear Regression, explaining its definition, the relationship between dependent and independent variables, and the regression equation. It discusses the differences between correlation and regression, the uses of regression analysis, and the assumptions underlying regression models. Additionally, it covers the least squares method for fitting regression lines and includes examples for practical application.

Uploaded by

Pranita Poudyal
4/8/2020 Simple Linear Regression

Sudip khanal

Simple Linear Regression


The term “regression” literally means “stepping back towards the average”. It was first used by the British biometrician Sir Francis Galton (1822-1911) to describe a phenomenon he observed while analyzing the heights of children and their parents. He found that, although tall parents tend to have tall children and short parents tend to have short children, the average height of the children tends to step back, or regress, toward the average height of all men. Galton called this tendency toward the average height regression.

Definition: Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.

Today the word regression is used in a much wider sense. Regression analysis investigates the dependence of one variable, conventionally called the dependent variable, on one or more other variables, called independent variables, and provides an equation for estimating or predicting the average value of the dependent variable from known values of the independent variables. The dependent variable is assumed to be a random variable, whereas the independent variables are assumed to take fixed values, i.e. they are chosen non-randomly. The relation between the expected value of the dependent variable and the independent variables is called a regression relation. When we study the dependence of a variable on a single independent variable, it is called simple (or two-variable) regression. When the dependence of a variable on two or more independent variables is studied, it is called multiple regression. Furthermore, when the dependence is represented by a straight-line equation, the regression is said to be linear; otherwise it is said to be curvilinear.

In regression analysis there are two types of variables. The variable whose value is influenced or is to be predicted is called the dependent variable, and the variable that influences the value or is used for prediction is called the independent variable. The independent variable is also known as the regressor, predictor or explanatory variable, while the dependent variable is also known as the regressed or explained variable.
Correlation describes the strength of the relationship between two variables and is completely symmetrical, i.e. the correlation between A and B is the same as the correlation between B and A. However, if two variables are related, then when one changes by a certain amount the other changes, on average, by a certain amount.
For instance, as the height of children increases, their weight also increases on average. The nature of this relationship between variables is examined by regression analysis.
In the simple linear regression model, two variables X and Y are of interest. The variable X is usually referred to as the independent variable and is controlled by the investigator; the variable Y is referred to as the dependent variable. The researcher wishes to find the average change in Y per unit change in X.
Dependent Variable (Response)
The variable whose value is influenced or is to be predicted on the basis of the given information is called the dependent variable. It is also called the regressed or explained variable.
Independent Variable (Explanatory)
The variable that influences the value is called the independent variable. It is also called the regressor, predictor or explanatory variable.
For example, in a study of cancer and smoking, smoking is the independent variable and cancer is the dependent variable.
In a study of mortality and economic condition, mortality is the dependent variable and economic condition is the independent variable.
In the relationship between age and height in humans, age is the independent variable and height is the dependent variable. In a study of income and mortality, income is the independent variable and mortality is the dependent variable.

The simplest functional relationship of one variable to another in a population is the simple linear regression, expressed by the regression equation. In general it is the equation of a straight line, written as
Y = a + bX
This is the regression equation of Y on X. It represents how much Y changes with any change in X and can be used to construct a regression line on a scatter diagram.
The direction in which the line slopes depends on whether the correlation is positive or negative. When the two sets of observations increase or decrease together, the line slopes upward from left to right. When one set decreases as the other increases, the line slopes downward from left to right.
In the above equation, “a” represents the Y-intercept (constant) and “b” (byx) represents the slope of the line, called the regression coefficient of Y on X.
Sometimes the regression of X on Y is used. It is written as
X = a + bY
This equation represents how much X changes with any change in Y, and it too can be drawn as a regression line on a scatter diagram, with the direction of slope determined by the sign of the correlation as described above.
In this equation, “a” represents the X-intercept (constant) and “b” (bxy) represents the slope of the line, called the regression coefficient of X on Y. The constants a and b generally take different values in the two equations.

Differences between correlation and regression

• Correlation analysis is a statistical tool used to describe the degree to which variables are linearly related. Regression, on the other hand, is a measure expressing the average relationship between two variables, whether they are linearly or non-linearly related.
• In correlation, the linear relationship between two variables is symmetric: rxy = ryx. Regression analysis is concerned with establishing the functional relationship between the two variables under study and then using this relationship to predict or estimate the value of the dependent variable for any given value of the independent variable. The regression coefficients are not symmetric: in general byx ≠ bxy.
• The correlation coefficient is a pure number (it has no units), but the regression coefficients are not pure numbers.
• Correlation analysis has limited application in comparison with regression analysis, which has wide application because it can be used to estimate the value of the dependent variable.
Uses of regression analysis
The following are the uses of regression analysis:
• Regression analysis, through the regression line, makes it possible to predict the value of a dependent variable from a given value of an independent variable.
• Regression analysis, through the standard error, provides a measure of the error involved in using the regression line as a basis for estimation.
• Regression analysis, through the regression coefficients (byx and bxy), makes it possible to calculate the coefficient of determination (r²) and the coefficient of correlation (r).
• Regression analysis is a highly valuable tool in public health, pharmacy and many medical studies. Most medical and public health problems are based on cause-and-effect relationships.
Assumptions of regression analysis
Quantitative models always rest on assumptions about the way the world works, and regression models are no exception. There are four principal assumptions that justify the use of linear regression models for purposes of prediction:
• Linearity: the relationship between the dependent and independent variables is linear
• Independence of the errors
• Homoscedasticity (constant variance of the errors)
• Normality of the error distribution
Least Square method of fitting regression line
The least square method is one of the standard methods for finding the desired regression line. The line obtained is called the least square line. The regression equation of y on x, where x is the independent variable and y is the dependent variable, has the general form of a straight line:
y = a + bx
Applying the least square method gives the normal equations
Σy = na + bΣx … (1)
and
Σxy = aΣx + bΣx² … (2)
Solving these two equations gives the values of a and b. The estimated regression equation of y on x is then
ŷ = a + bx
Similarly, the regression equation of x on y is
x = a + by
Applying the least square method gives the normal equations
Σx = na + bΣy … (1)
and
Σxy = aΣy + bΣy² … (2)
Solving these two equations gives the values of a and b. The estimated regression equation of x on y is then
x̂ = a + by
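The two normal equations form a 2×2 linear system in a and b, so they can be solved mechanically. The following is a minimal Python sketch, assuming the data are plain lists of numbers; the function name `solve_normal_equations` and the sample data are illustrative and not part of the notes. It solves the system by Cramer's rule.

```python
def solve_normal_equations(xs, ys):
    """Solve  Σy = n·a + b·Σx  and  Σxy = a·Σx + b·Σx²  for (a, b)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # zero only if all x values are equal
    if det == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = (sy * sxx - sx * sxy) / det  # Cramer's rule for the intercept
    b = (n * sxy - sx * sy) / det    # Cramer's rule for the slope
    return a, b


# Illustrative data lying exactly on the line y = 1 + 2x
a, b = solve_normal_equations([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

For the regression of x on y the same function can be called with the arguments swapped, `solve_normal_equations(ys, xs)`, since the second pair of normal equations has the same form with the roles of x and y exchanged.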
Method of data modeling
Principle of ordinary least squares:
It is the standard method used for estimating regression lines. The line fitted by the least square method is called the line of best fit. After the line is fitted, the scatter of a point about the line is measured by ei = Yi − Ŷi, called the error or residual, where Yi is the observed value of Y and Ŷi is the estimated value of Yi. For a perfectly fitting line every ei = 0; in practice this is very rare, so the errors are made as small as possible instead. The sum of squares of the errors over all points is a minimum only for the line of best fit. Regression lines are therefore fitted using parameter estimates obtained from the principle of least squares, that is, by minimizing the error sum of squares.
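The minimizing property can be illustrated numerically. The sketch below, in Python with hypothetical data and illustrative function names, computes the error sum of squares for the least squares line and for a few nearby lines; the slope is computed from deviations about the means, which is algebraically equivalent to solving the normal equations.

```python
def sse(xs, ys, a, b):
    """Error sum of squares: sum of e_i² with e_i = y_i - (a + b·x_i)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Return (a, b) of the least squares line y = a + b·x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


xs = [1, 2, 3, 4, 5]                 # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares(xs, ys)
best = sse(xs, ys, a, b)

# Shifting the intercept or the slope away from (a, b) increases the SSE.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.2, -0.05)]:
    print(da, db, sse(xs, ys, a + da, b + db) > best)
```

The residuals of the fitted line also sum to zero, which is exactly what the first normal equation states.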
Simple linear regression:
It expresses a dependent variable as a linear function of an independent variable, so that values of the dependent variable can be predicted from the independent variable. It is a statistical tool for finding the linear relationship of a dependent variable with an independent variable. If the variables in a bivariate distribution are related in linear form, the regression is linear; otherwise it is curvilinear. It gives the best estimate of the value of one variable for any specific value of the other variable.
Let us consider a bivariate distribution (Xi, Yi); i = 1, 2, 3, …, n, where Y is the dependent variable and X is the independent variable. Then the regression equation of Y on X is given by Y = a + bX, where a and b are constants called the Y-intercept and the regression coefficient (or slope) respectively. b measures the amount of change in Y per unit change in X. The values of a and b are determined by the principle of least squares, by minimizing the error sum of squares.
Here,
error (e) = Y − Ŷ
so that Σe² = Σ(Y − Ŷ)²
Let S = Σe² = Σ(Y − Ŷ)² = Σ(Y − a − bX)²
Differentiating both sides with respect to a,
$$\frac{dS}{da} = \frac{d}{da}\sum (Y - a - bX)^2 = 2\sum (Y - a - bX)(-1) = -2\sum (Y - a - bX)$$
For S to be a minimum, dS/da = 0:
or, −2Σ(Y − a − bX) = 0
or, Σ(Y − a − bX) = 0
or, ΣY − na − bΣX = 0
or, ΣY = na + bΣX ….. (i)
Differentiating both sides with respect to b,
$$\frac{dS}{db} = \frac{d}{db}\sum (Y - a - bX)^2 = 2\sum (Y - a - bX)(-X) = -2\sum (YX - aX - bX^2)$$
For S to be a minimum, dS/db = 0:
or, −2Σ(YX − aX − bX²) = 0
or, Σ(YX − aX − bX²) = 0
or, ΣYX − aΣX − bΣX² = 0
or, ΣYX = aΣX + bΣX² ……... (ii)
Solving equations (i) and (ii) gives a and b; substituting these values gives the regression equation Y = a + bX, in which Y is the dependent variable and X is the independent variable.
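Solving (i) and (ii) simultaneously gives closed-form expressions for the two constants. The result below is a standard algebraic consequence of the two normal equations (eliminate a using (i), then solve (ii) for b), written with the means X̄ = ΣX/n and Ȳ = ΣY/n; it is the same slope formula that the worked examples use for byx.

```latex
b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^{2} - \left(\sum X\right)^{2}},
\qquad
a = \bar{Y} - b\,\bar{X}
```

The denominator is zero only when all X values are equal, in which case no unique least squares line exists.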
Properties of regression coefficients:
• The correlation coefficient is the geometric mean of the two regression coefficients:
r = ±√(byx × bxy), where r takes the common sign of byx and bxy.
• If one of the regression coefficients is greater than unity, the other must be less than unity.
• The arithmetic mean of the regression coefficients is at least as large in magnitude as the correlation coefficient.
• Regression coefficients are independent of change of origin but not of scale.
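These properties can be checked on any small data set. The sketch below uses Python with hypothetical data; the helper `slope` is an illustrative name and implements the usual formula (nΣuv − ΣuΣv) / (nΣu² − (Σu)²) for the regression coefficient of v on u.

```python
from math import copysign, sqrt


def slope(us, vs):
    """Regression coefficient of v on u."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(u * v for u, v in zip(us, vs))
    suu = sum(u * u for u in us)
    return (n * suv - su * sv) / (n * suu - su * su)


xs = [2, 4, 5, 7, 9]                 # hypothetical data
ys = [3, 4, 6, 7, 10]

byx = slope(xs, ys)                  # regression coefficient of y on x
bxy = slope(ys, xs)                  # regression coefficient of x on y
r = copysign(sqrt(byx * bxy), byx)   # geometric mean, carrying the common sign

print("r =", round(r, 4))
print("|mean of coefficients| >= |r|:", abs((byx + bxy) / 2) >= abs(r))

# Change of origin (subtract constants): byx is unchanged.
shifted = slope([x - 5 for x in xs], [y - 6 for y in ys])
# Change of scale (multiply x by 10): byx is divided by 10.
scaled = slope([10 * x for x in xs], ys)
print(byx, shifted, scaled)
```

The product byx × bxy equals r², so it can never exceed 1; this is why both coefficients cannot be greater than unity at the same time.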
Fitting of linear curve (fitting a straight line)
Let the linear curve be y = a + bx, in which y is the dependent variable and x is the independent variable.
The error or residual of the curve is e = y − a − bx.
Let S = Σe² = Σ(y − a − bx)²
To find a and b by the principle of least squares, we have to minimize the error sum of squares.
Differentiating S with respect to a,
$$\frac{dS}{da} = \frac{d}{da}\sum (y - a - bx)^2 = 2\sum (y - a - bx)(-1)$$
For S to be a minimum, dS/da = 0:
or, 2Σ(y − a − bx)(−1) = 0
or, −2Σ(y − a − bx) = 0
or, Σ(y − a − bx) = 0
or, Σy − na − bΣx = 0
or, Σy = na + bΣx …. (i)
Again, differentiating S with respect to b,
$$\frac{dS}{db} = \frac{d}{db}\sum (y - a - bx)^2 = 2\sum (y - a - bx)(-x)$$
For S to be a minimum, dS/db = 0:
or, 2Σ(y − a − bx)(−x) = 0
or, −2Σ(y − a − bx)x = 0
or, Σ(yx − ax − bx²) = 0
or, Σyx − aΣx − bΣx² = 0
or, Σyx = aΣx + bΣx² …. (ii)
Solving (i) and (ii) gives a and b; substituting these values in the equation y = a + bx gives the fitted line.
Here a gives the expected value of y at x = 0, and b gives the average change in y per unit increase in x; b is also called the regression coefficient of y on x.
Short-cut method
When the values of x are equally spaced and ordered, x can be replaced by the deviation u = x − A, where A is an assumed mean chosen from the x values.
Now y and u = x − A are the two variables, with y dependent and u independent.
First fit the model y = a + bu by solving Σy = na + bΣu and Σuy = aΣu + bΣu², and then substitute u = x − A to convert the fitted model into the form y = a + bx. The slope b is unchanged by this substitution; only the intercept changes, from a to a − bA.
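The short-cut method can be sketched in a few lines of Python. The data, the choice A = 2012 and the function name `fit` are hypothetical; the point is only that fitting on u = x − A and converting back gives the same line as fitting on x directly, with much smaller numbers in the sums.

```python
def fit(xs, ys):
    """Least squares intercept and slope for y = a + b·x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b


xs = [2010, 2011, 2012, 2013, 2014]   # hypothetical, equally spaced x values
ys = [12.0, 15.0, 19.0, 22.0, 27.0]

A = 2012                              # assumed mean (the middle x value)
us = [x - A for x in xs]              # -2, -1, 0, 1, 2, so Σu = 0

a_u, b = fit(us, ys)                  # fitted model y = a_u + b·u
a = a_u - b * A                       # substitute u = x - A: y = (a_u - b·A) + b·x

print("in u:", a_u, b)
print("in x:", a, b)
```

Choosing A as the middle value makes Σu = 0, so the normal equations decouple: a_u = Σy / n and b = Σuy / Σu².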

Numerical Examples
Example 1: Fit straight line to the following data (y=a+bx)

x 1 2 3 4 6 8
y 2.4 3 3.6 4 5 6
Solution:
To fit, 𝑦 = 𝑎 + 𝑏𝑥……….(1)
By using the principle of least square the two normal equations are,
𝛴𝑦 = 𝑛𝑎 + 𝑏𝛴𝑥………(2)
𝛴𝑥𝑦 = 𝑎𝛴𝑥 + 𝑏𝛴𝑥² …….(3)
x y x² xy
1 2.4 1 2.4
2 3 4 6
3 3.6 9 10.8
4 4 16 16
6 5 36 30
8 6 64 48
Σx =24 Σy =24 Σx² =130 Σxy=113.2
Putting all the values in equations (2) and (3):
24 = 6a + 24b ………. (4)
113.2 = 24a + 130b …… (5)
Multiplying eq. (4) by 24 and eq. (5) by 6 and subtracting one from the other:
576 = 144a + 576b
679.2 = 144a + 780b
−103.2 = −204b
or, b = 103.2/204 ≈ 0.506
Putting the value of b in equation (4):
24 = 6a + 24 × 0.506
or, 24 = 6a + 12.144
or, 6a = 24 − 12.144 = 11.856
or, a = 11.856/6 = 1.976
Hence the fitted straight line y = a + bx is
ŷ = 1.976 + 0.506x
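The calculation table and the coefficients of Example 1 can be reproduced with a short script. This is a sketch in Python with illustrative variable names; it builds the x² and xy columns from the given data, totals each column, and computes a and b without intermediate rounding.

```python
xs = [1, 2, 3, 4, 6, 8]               # data of Example 1
ys = [2.4, 3, 3.6, 4, 5, 6]

# One row per observation: (x, y, x², xy), as in the calculation table
rows = [(x, y, x * x, x * y) for x, y in zip(xs, ys)]
n = len(rows)

# Column totals: Σx, Σy, Σx², Σxy
sx, sy, sxx, sxy = (sum(col) for col in zip(*rows))

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = (sy - b * sx) / n

print(f"Σx={sx} Σy={sy:.1f} Σx²={sxx} Σxy={sxy:.1f}")
print(f"b={b:.4f} a={a:.4f}")
```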
Example 2: The age and weight of a newborn calf were recorded at weekly intervals. Find the line of best fit and estimate the weight of the calf at age 15 weeks.
Age (weeks) 1 2 3 4 5 6 7 8 9 10
Weight (kg) 50 51 53 55 57 58 62 70 75 108
Solution:
Let age = x and weight = y. The regression equation to fit is
Weight = a + b × Age, i.e.
y = a + bx ….. (1)
Using the principle of least squares, the normal equations are
Σy = na + bΣx …….. (2)
Σxy = aΣx + bΣx² ……. (3)
Week (x) Weight (y) x² xy
1 50 1 50
2 51 4 102
3 53 9 159
4 55 16 220
5 57 25 285
6 58 36 348
7 62 49 434
8 70 64 560
9 75 81 675
10 108 100 1080
Σx = 55 Σy = 639 Σx² = 385 Σxy = 3913
Putting these values in eqs. (2) and (3):
639 = 10a + 55b …….. (4)
3913 = 55a + 385b …… (5)
Multiplying eq. (4) by 55 and eq. (5) by 10 and subtracting one from the other:
35145 = 550a + 3025b
39130 = 550a + 3850b
−3985 = −825b
or, b = 3985/825 = 4.83
Putting the value of b in equation (4):
639 = 10a + 55 × 4.83
or, 639 = 10a + 265.65
or, 10a = 639 − 265.65 = 373.35
or, a = 373.35/10 = 37.335 ≈ 37.34
Hence the fitted straight line y = a + bx is
ŷ = 37.34 + 4.83x

When x = 15,
ŷ = 37.34 + 4.83 × 15 = 109.79
so the estimated weight of the calf at 15 weeks is 109.79 kg.


Coefficient of Determination (R²)

R² is interpreted as the proportion of the total variability of the outcome that is accounted for by the model (Vittinghoff et al., 2005). In other words, it is the proportion of the variation in the y variable that is “explained” by the variation in the x variable. R² is called the coefficient of determination. R² can vary from 0 to 1. An R² close to 1 indicates that the actual y values fall almost exactly on the regression line. An R² close to 0 indicates that there is little or no linear relationship between x and y.
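As a sketch of how R² can be computed, the Python function below (an illustrative name, with hypothetical data) fits the least squares line and returns one minus the ratio of the residual sum of squares to the total sum of squares about the mean of y. For simple linear regression this equals the square of the correlation coefficient r.

```python
def r_squared(xs, ys):
    """Coefficient of determination of the least squares line y = a + b·x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = sum((y - my) ** 2 for y in ys)                       # total variation
    return 1 - ss_res / ss_tot


# Hypothetical data: a loose upward trend
print(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
# Points lying exactly on a line give R² = 1
print(r_squared([1, 2, 3, 4], [3, 5, 7, 9]))
```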
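Alternative method of fitting the regression equation
The regression equations of Y on X and of X on Y, y = a + bx and x = a + by, can also be expressed as
$$Y - \bar{Y} = b_{yx}\,(X - \bar{X}) \qquad \text{(i)}$$
$$X - \bar{X} = b_{xy}\,(Y - \bar{Y}) \qquad \text{(ii)}$$
This shows that (x̄, ȳ) is the common point of the two regression lines, where
x̄ and ȳ = arithmetic means of X and Y respectively
byx and bxy = regression coefficients of Y on X and of X on Y respectively.
Direct method:
$$\bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n}$$
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2}$$
Putting these values in equation (i) or (ii) gives the estimated regression equations.
Short-cut method:
Let a and b be assumed means of x and y (here they do not denote the intercept and slope), and let u = x − a and v = y − b. Then
$$\bar{x} = a + \frac{\sum u}{n}, \qquad \bar{y} = b + \frac{\sum v}{n}$$
$$b_{yx} = \frac{n\sum uv - \sum u \sum v}{n\sum u^2 - \left(\sum u\right)^2}, \qquad b_{xy} = \frac{n\sum uv - \sum u \sum v}{n\sum v^2 - \left(\sum v\right)^2}$$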
Example 3: The following table shows the ages (x) and systolic blood pressure (y) of 8 women.

Age(x) 56 42 72 36 63 47 55 40
Bp(y) 147 125 160 118 149 128 150 145
Estimate the blood pressure of women aged 45 and 65 years.
Solution:
Let age = x and BP = y.
The regression equation of y on x is
y = a + bx …………… (i)
This equation can be written as
$$y - \bar{y} = b_{yx}\,(x - \bar{x}) \qquad \text{(ii)}$$
where
$$\bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n}, \qquad b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

Age (x)   BP (y)   xy      x²     y²
56        147      8232    3136   21609
42        125      5250    1764   15625
72        160      11520   5184   25600
36        118      4248    1296   13924
63        149      9387    3969   22201
47        128      6016    2209   16384
55        150      8250    3025   22500
40        145      5800    1600   21025
Σx = 411  Σy = 1122  Σxy = 58703  Σx² = 22183  Σy² = 158868
$$\bar{x} = \frac{\sum x}{n} = \frac{411}{8} = 51.375$$
$$\bar{y} = \frac{\sum y}{n} = \frac{1122}{8} = 140.25$$
$$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{8 \times 58703 - 411 \times 1122}{8 \times 22183 - 411^2} = \frac{469624 - 461142}{177464 - 168921} = \frac{8482}{8543} \approx 0.993$$
Putting these values in equation (ii):
y − 140.25 = 0.993(x − 51.375)
or, y − 140.25 = 0.993x − 0.993 × 51.375
or, y − 140.25 = 0.993x − 51.015
or, y = 0.993x − 51.015 + 140.25
or, y = 0.993x + 89.235
The required regression equation is
ŷ = 89.235 + 0.993x
When x = 45,
ŷ = 89.235 + 0.993 × 45 = 89.235 + 44.685 = 133.92
When x = 65,
ŷ = 89.235 + 0.993 × 65 = 89.235 + 64.545 = 153.78
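The calculation in Example 3 can be checked numerically. The sketch below, in Python with illustrative variable names, recomputes the sums directly from the eight (age, BP) pairs given in the problem statement, where the blood pressure at age 40 is 145, and predicts with the deviation form y − ȳ = byx(x − x̄).

```python
ages = [56, 42, 72, 36, 63, 47, 55, 40]            # x, from the problem statement
bp = [147, 125, 160, 118, 149, 128, 150, 145]      # y, from the problem statement

n = len(ages)
sx, sy = sum(ages), sum(bp)
sxy = sum(x * y for x, y in zip(ages, bp))
sxx = sum(x * x for x in ages)

x_bar, y_bar = sx / n, sy / n
b_yx = (n * sxy - sx * sy) / (n * sxx - sx ** 2)


def predict(age):
    """Estimated BP from the deviation form of the regression line."""
    return y_bar + b_yx * (age - x_bar)


print("sums:", sx, sy, sxy, sxx)
print("b_yx =", round(b_yx, 3))
for age in (45, 65):
    print(age, round(predict(age), 2))
```

A positive slope close to 1 means the estimated systolic pressure rises by about 1 unit per year of age over the observed range of 36 to 72 years.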

Example 4: The following table gives information on ages and cholesterol levels for random sample
of 10 men.
Age 58 69 43 35 63 52 47 31 74 36
Cholesterol level 189 235 193 177 154 191 213 165 148 181
Predict the cholesterol level of a 60-year-old man.
Solution:
Let age = x and cholesterol level = y.
The regression equation of y on x to be estimated is
y = a + bx ……… (i)
which can be written as
$$Y - \bar{Y} = b_{yx}\,(X - \bar{X}) \qquad \text{(ii)}$$
where
$$\bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n}, \qquad b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
Calculation table
Age (x) Cholesterol level (y) x² xy
58 189 3364 10962
69 235 4761 16215
43 193 1849 8299
35 177 1225 6195
63 154 3969 9702
52 191 2704 9932
47 213 2209 10011
31 165 961 5115
74 148 5476 10952
36 181 1296 6516
Σx = 508 Σy = 1846 Σx² = 27814 Σxy = 93899
$$\bar{x} = \frac{\sum x}{n} = \frac{508}{10} = 50.8, \qquad \bar{y} = \frac{\sum y}{n} = \frac{1846}{10} = 184.6$$
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10 \times 93899 - 508 \times 1846}{10 \times 27814 - 508^2} = \frac{1222}{20076} \approx 0.06$$
Putting these values in equation (ii):
y − 184.6 = 0.06(x − 50.8)
or, y = 0.06x − 50.8 × 0.06 + 184.6
or, y = 0.06x − 3.048 + 184.6
or, y = 0.06x + 181.552
The required estimated regression equation is
ŷ = 181.552 + 0.06x
When age x = 60,
ŷ = 181.552 + 0.06 × 60 = 181.552 + 3.6
ŷ = 185.152

Example 5 Compute the least square regression equation Y on X for the following data. What is the
regression coefficient and what does it mean?
X 5 6 8 10 12 13 15 16 17
Y 16 19 23 28 36 41 44 45 50
The estimated regression line of Y on X is
𝑌̂ = 𝑎 + 𝑏𝑋
And the two normal equations are
Σy = na+bΣx
Σxy = aΣx +bΣx²
To compute the necessary summations, we arrange the computations in the table below.

X Y XY X²
5 16 80 25
6 19 114 36
8 23 184 64
10 28 280 100
12 36 432 144
13 41 533 169
15 44 660 225
16 45 720 256
17 50 850 289
ΣX = 102 ΣY = 302 ΣXY = 3853 ΣX² = 1308
Now,
$$\bar{X} = \frac{\sum X}{n} = \frac{102}{9} = 11.333, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{302}{9} = 33.556$$
$$b = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{n\sum X^2 - \left(\sum X\right)^2} = \frac{9(3853) - (102)(302)}{9(1308) - (102)^2} = \frac{34677 - 30804}{11772 - 10404} = \frac{3873}{1368} = 2.831$$
and
a = Ȳ − bX̄ = 33.556 − (2.831)(11.333) ≈ 1.47
Hence the desired estimated regression line of Y on X is
Ŷ = a + bX = 1.47 + 2.83X
The estimated regression coefficient is b = 2.831, which indicates that Y increases by 2.831 units on average for a unit increase in X.
Example 6: The heights of a sample of 10 fathers and their eldest sons are given below (to the nearest cm).
Height of father (x) 170 167 162 163 167 166 169 171 164 165
Height of son (y) 168 167 166 166 168 165 168 170 165 168
a. Find the regression equation of y on x.
b. Find the regression equation of x on y.
Solution:
Here, height of father = x and height of son = y.
Calculation table:
Height of father (x)   Height of son (y)   xy      x²      y²
170                    168                 28560   28900   28224
167                    167                 27889   27889   27889
162                    166                 26892   26244   27556
163                    166                 27058   26569   27556
167                    168                 28056   27889   28224
166                    165                 27390   27556   27225
169                    168                 28392   28561   28224
171                    170                 29070   29241   28900
164                    165                 27060   26896   27225
165                    168                 27720   27225   28224
Σx = 1664  Σy = 1671  Σxy = 278087  Σx² = 276970  Σy² = 279247


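The regression equation of y on x to be estimated is
y = a + bx ……… (i)
which can be written as
$$Y - \bar{Y} = b_{yx}\,(X - \bar{X}) \qquad \text{(ii)}$$
where
$$\bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n}, \qquad b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
$$\bar{x} = \frac{1664}{10} = 166.4, \qquad \bar{y} = \frac{1671}{10} = 167.1$$
$$b_{yx} = \frac{10 \times 278087 - 1664 \times 1671}{10 \times 276970 - (1664)^2} = \frac{2780870 - 2780544}{2769700 - 2768896} = \frac{326}{804} \approx 0.405$$
Putting these values in equation (ii):
y − 167.1 = 0.405(x − 166.4)
or, y = 0.405x − 67.392 + 167.1
or, ŷ = 99.708 + 0.405x
For part (b), the regression equation of x on y is x − x̄ = bxy(y − ȳ), where
$$b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2} = \frac{326}{10 \times 279247 - (1671)^2} = \frac{326}{2792470 - 2792241} = \frac{326}{229} \approx 1.424$$
x − 166.4 = 1.424(y − 167.1)
or, x = 1.424y − 237.950 + 166.4
or, x̂ = −71.550 + 1.424y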
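Both regression lines of Example 6 can be computed from the raw heights. This is a sketch in Python with illustrative names; `coefficient(us, vs)` returns the regression coefficient of v on u, so the same helper yields byx and bxy by swapping its arguments. Note that the denominator for byx uses Σx², not Σxy.

```python
father = [170, 167, 162, 163, 167, 166, 169, 171, 164, 165]   # x, from Example 6
son = [168, 167, 166, 166, 168, 165, 168, 170, 165, 168]      # y, from Example 6


def coefficient(us, vs):
    """Regression coefficient of v on u: (nΣuv − ΣuΣv) / (nΣu² − (Σu)²)."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(u * v for u, v in zip(us, vs))
    suu = sum(u * u for u in us)
    return (n * suv - su * sv) / (n * suu - su * su)


x_bar = sum(father) / len(father)
y_bar = sum(son) / len(son)

b_yx = coefficient(father, son)      # slope of the line of y on x
b_xy = coefficient(son, father)      # slope of the line of x on y
a_yx = y_bar - b_yx * x_bar          # intercept of y = a + b·x
a_xy = x_bar - b_xy * y_bar          # intercept of x = a + b·y

print(f"y = {a_yx:.3f} + {b_yx:.3f} x")
print(f"x = {a_xy:.3f} + {b_xy:.3f} y")
```

The slopes come out near 0.405 and 1.424. Because they are not rounded before the intercepts are computed, the printed intercepts can differ slightly from a hand calculation that rounds the slope first.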

Example 7: Five tablets were weighed and then assayed with the following results:
Weight(mg) 205 200 202 198 197
Potency(mg) 103 100 101 98 98
Predict the potency for a 200-mg tablet.

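Solution:
Let weight = x and potency = y.
The regression equation of y on x to be estimated is
y = a + bx ……… (i)
which can be written as
$$Y - \bar{Y} = b_{yx}\,(X - \bar{X}) \qquad \text{(ii)}$$
where
$$\bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n}, \qquad b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$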
Weight (x)   Potency (y)   xy       x²
205          103           21115    42025
200          100           20000    40000
202          101           20402    40804
198          98            19404    39204
197          98            19306    38809
Σx = 1002  Σy = 500  Σxy = 100227  Σx² = 200842
$$\bar{x} = \frac{1002}{5} = 200.4, \qquad \bar{y} = \frac{500}{5} = 100$$
$$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{5 \times 100227 - 1002 \times 500}{5 \times 200842 - 1002^2} = \frac{135}{206} \approx 0.655$$
Putting these values in equation (ii):
y − 100 = 0.655(x − 200.4)
or, y − 100 = 0.655x − 200.4 × 0.655
or, y − 100 = 0.655x − 131.262
or, y = 0.655x − 131.262 + 100
or, y = 0.655x − 31.262
The required regression equation is
ŷ = −31.262 + 0.655x
When x = 200,
ŷ = −31.262 + 0.655 × 200 = −31.262 + 131 = 99.738 mg

1. What is regression analysis? How does it differ from correlation? Why are there, in general, two regression equations?
2. What is linear regression? Why are there, in general, two regression lines?
3. Explain the use of the regression equation in public health.
4. Explain the use of the regression equation in health care management.
5. Explain the uses of the regression equation in pharmacy.
6. What are the properties of regression coefficients?
7. Distinguish between regression and correlation.
8. Explain the principle of the method of least squares.
9. Explain the concept of regression analysis between two variables.

Numerical Questions
10. The following table shows the ages X and the systolic blood pressure Y of 8 women.
Age of the pregnant women(X) 30 20 25 36 35 16 40 38
Blood pressure(y) 147 125 160 118 149 128 150 145
a. Determine the regression equation of Y on X.
b. Estimate the blood pressure of women whose ages are 27, 32 and 15.
11. The objective of a study by S. Sharma and S. Tiwari was to determine whether age is correlated with blood pressure. For the study, 8 patients coming to a health post were selected using a random sampling technique. The resulting measurements are as follows.
Age(x) 56 42 36 47 49 42 60 72
Blood pressure(y) 147 125 118 128 145 140 155 160
a. Determine the regression equation of x on y.
b. Estimate the regression equation of y on x.
c. Estimate the blood pressure of a person whose age is 76.
12. The following data give the blood pressure and age of 12 adults.
Age (Years) 56 42 72 36 63 47 55 49 38 42 68 60
Bp (mm of hg) 147 125 160 118 149 128 150 145 115 140 152 155
Find the regression equation of blood pressure on age.
13. Calculate the correlation and find the two lines of regression from the following data:
X 57 58 59 60 61 62 64 65
Y 67 87 54 32 12 45 67 86
14. Obtain the lines of regression for the following data.
X 1 2 3 4 5 6 7 8 9
Y 8 9 11 12 14 13 15 16 4
Obtain the estimate of Y which should correspond on the average to x=5.5
15. The following table gives information on ages and cholesterol levels for random sample of
10 men
Age 58 69 43 35 63 52 47 31 74 36
Cholesterol level 189 235 193 177 154 191 213 165 148 181
Predict the cholesterol level of a 60-year-old man. (Ans: y = 185.152)
16. Find out the regression equation of Y on X from the following data:
X 1 2 3 4 5
Y 160 180 140 180 200
(Ans: y = 148 + 8x)
17. Samples of a drug product are stored in their original containers under normal conditions and sampled periodically to analyze the content of the medication.
Time (Months) 6 12 18 24 36 48
Assay (mg) 995 984 973 960 952 948
Estimate the content of the medication at 72 months.


18. During a laboratory experiment, the muscular contractions of a frog muscle were measured against different doses of a given drug. The height of the curves was taken as the response to the drug. The observations were as follows.
Serial number of the experiment 1 2 3 4 5
Dose of drug 0.3 0.4 0.6 0.8 0.9
Response to the drug 54.0 59.0 60.0 65.0 70.0
Calculate the response to the drug for a dose of 0.5.

Additional Numerical problems:


19. For a set of bivariate data, the mean value of X is 20 and the mean value of Y is 45. The regression coefficient of Y on X is 4 and that of X on Y is 1/9. Find (a) the coefficient of correlation, (b) the standard deviation of X if the standard deviation of Y is 12, (c) the two regression equations, and (d) the estimated value of X when Y = 25. (Ans: 0.67, 2.01, y = −35 + 4x, x = 15 + (1/9)y, 17.78)
20. The regression coefficients of Y on X and of X on Y are given as −2.002 and −0.461. Find the value of the correlation coefficient between X and Y. If the means of X and Y are 87.2 and 127.2, estimate X when Y = 133.
21. Given that the variance of X is 9 and the regression equations are 4x − 5y + 33 = 0 and 20x − 9y − 107 = 0, find
a. The mean values of X and Y
b. The coefficient of correlation between X and Y
c. The standard deviation of Y
22. A computer, while calculating the correlation coefficient between two variables X and Y, obtained the following results: n = 30, Σx = 120, Σy = 90, Σxy = 356, Σx² = 600, Σy² = 250.
It was, however, later discovered at the time of checking that it had wrongly copied down two pairs of observations as (8, 10) and (12, 7), while the correct values were (8, 12) and (10, 8). Find the correct value of the correlation coefficient between x and y, obtain the regression equation of y on x from the corrected values, and find the value of y when x is 27.
