Multiple Linear Regression
Steps for Building the Multiple Linear Regression Model
1. Check for multicollinearity (correlation among the
predictors) using the Variance Inflation Factor (VIF).
a. If VIF > 10, multicollinearity exists. Remove the variable
with the highest VIF value and check the multicollinearity of
the remaining variables.
b. Continue this process until the VIF of every remaining variable
is less than or equal to 10.
2. Fit the regression model with the remaining variables and
test the significance of the parameters. Remove the
insignificant parameter (p > 0.05) with the highest p-value and refit the model.
a. Continue this process until all the parameters are significant.
3. Test the overall significance of the fitted model
using the F-test, and measure the fit with the coefficient of determination (R²).
4. Carry out the residual analysis.
5. Model validation.
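Steps 1 and 2 above can be sketched in R as an automated backward elimination. This is a minimal sketch, not the slides' own code: the helper names `vif_values()` and `build_mlr()` are hypothetical, the VIF is computed directly as 1/(1 - R_j²) instead of with `faraway::vif()`, and the response column is assumed to be named `y`. Because removing a predictor cannot increase the VIFs of the remaining ones, the VIF check is done once, before the p-value loop. The data are those of Example 2.

```r
# Data from Example 2 (20 patients with hypertension)
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114, 114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48, 47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7, 94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07, 2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09),
  x4 = c(5.1, 3.8, 8.2, 5.8, 7.0, 9.3, 2.5, 6.2, 7.1, 5.6, 5.3, 5.6, 10.2, 5.6, 10.0, 7.4, 3.6, 4.3, 9.0, 7.0),
  x5 = c(63, 70, 72, 73, 72, 71, 69, 66, 69, 64, 74, 71, 68, 67, 76, 69, 62, 70, 71, 75),
  x6 = c(33, 14, 10, 99, 95, 10, 42, 8, 62, 35, 90, 21, 47, 80, 98, 95, 18, 12, 99, 99)
)

# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
# regressing predictor j on all the other predictors
vif_values <- function(dat, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = dat))$r.squared
    1 / (1 - r2)
  })
}

# Hypothetical helper: steps 1 and 2 with the slides' thresholds (VIF 10, p 0.05)
build_mlr <- function(dat, response = "y", vif_cut = 10, alpha = 0.05) {
  preds <- setdiff(names(dat), response)
  # Step 1: drop the predictor with the largest VIF until all VIFs <= vif_cut
  while (length(preds) > 1) {
    v <- vif_values(dat, preds)
    if (max(v) <= vif_cut) break
    preds <- setdiff(preds, names(which.max(v)))
  }
  # Step 2: drop the insignificant predictor with the largest p-value, then refit
  repeat {
    fit <- lm(reformulate(preds, response = response), data = dat)
    pv <- summary(fit)$coefficients[-1, "Pr(>|t|)"]  # skip the intercept row
    if (length(preds) == 1 || max(pv) <= alpha) return(fit)
    preds <- setdiff(preds, names(which.max(pv)))
  }
}

final_fit <- build_mlr(patients)
formula(final_fit)
```

If the slides' results hold, `final_fit` keeps x1, x2 and x3, the same predictors as mlr4 in the example.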
Example 2
The following data were collected on a simple random
sample of 20 patients with hypertension. The variables are:
y = mean arterial blood pressure (mm Hg)
x1 = age (years)
x2 = weight (kg)
x3 = body surface area (sq m)
x4 = duration of hypertension (years)
x5 = basal pulse (beats/min)
x6 = measure of stress
Patient y x1 x2 x3 x4 x5 x6
1 105 47 85.4 1.75 5.1 63 33
2 115 49 94.2 2.10 3.8 70 14
3 116 49 95.3 1.98 8.2 72 10
4 117 50 94.7 2.01 5.8 73 99
5 112 51 89.4 1.89 7.0 72 95
6 121 48 99.5 2.25 9.3 71 10
7 121 49 99.8 2.25 2.5 69 42
8 110 47 90.9 1.90 6.2 66 8
9 110 49 89.2 1.83 7.1 69 62
10 114 48 92.7 2.07 5.6 64 35
11 114 47 94.4 2.07 5.3 74 90
12 115 49 94.1 1.98 5.6 71 21
13 114 50 91.6 2.05 10.2 68 47
14 106 45 87.1 1.92 5.6 67 80
15 125 52 101.3 2.19 10.0 76 98
16 114 46 94.5 1.98 7.4 69 95
17 106 46 87.0 1.87 3.6 62 18
18 113 46 94.5 1.90 4.3 70 12
19 110 48 90.5 1.88 9.0 71 99
20 122 56 95.7 2.09 7.0 75 99
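The slides do not say how the "patients" file is stored, so as an alternative to importing it, the table above can be typed into R directly. This sketch assumes the Patient ID column is left out, so that `cor(patients)` and `pairs(patients)` involve only y and x1 to x6. If the data are saved as a CSV file instead, `read.csv()` with your own file name would do the same job.

```r
# Example 2 data entered directly (Patient ID column omitted by assumption)
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114, 114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48, 47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7, 94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07, 2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09),
  x4 = c(5.1, 3.8, 8.2, 5.8, 7.0, 9.3, 2.5, 6.2, 7.1, 5.6, 5.3, 5.6, 10.2, 5.6, 10.0, 7.4, 3.6, 4.3, 9.0, 7.0),
  x5 = c(63, 70, 72, 73, 72, 71, 69, 66, 69, 64, 74, 71, 68, 67, 76, 69, 62, 70, 71, 75),
  x6 = c(33, 14, 10, 99, 95, 10, 42, 8, 62, 35, 90, 21, 47, 80, 98, 95, 18, 12, 99, 99)
)

# Quick check of the structure: 20 rows, 7 numeric columns
str(patients)
```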
• Import the “patients” data set
• View the data set
View(patients)
• Attaching the data set
attach(patients)
• To obtain the correlation matrix
cor(patients)
• To obtain a scatter plot matrix
pairs(patients)
• Define the model to check for multicollinearity
mlr1<-lm(y~x1+x2+x3+x4+x5+x6,data = patients)
• Calculating VIF values
install.packages("faraway")
library(faraway)
vif(mlr1)
All VIF values are less than 10, so there is no multicollinearity.
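Each VIF is defined as VIF_j = 1/(1 - R_j²), where R_j² is the coefficient of determination from regressing predictor x_j on the other five predictors. As a cross-check that needs no extra package, this sketch computes the VIFs from that definition; the values should agree with `vif(mlr1)` from faraway. The name `vif_manual` is illustrative.

```r
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114, 114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48, 47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7, 94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07, 2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09),
  x4 = c(5.1, 3.8, 8.2, 5.8, 7.0, 9.3, 2.5, 6.2, 7.1, 5.6, 5.3, 5.6, 10.2, 5.6, 10.0, 7.4, 3.6, 4.3, 9.0, 7.0),
  x5 = c(63, 70, 72, 73, 72, 71, 69, 66, 69, 64, 74, 71, 68, 67, 76, 69, 62, 70, 71, 75),
  x6 = c(33, 14, 10, 99, 95, 10, 42, 8, 62, 35, 90, 21, 47, 80, 98, 95, 18, 12, 99, 99)
)

preds <- c("x1", "x2", "x3", "x4", "x5", "x6")

# For each predictor, regress it on the others and convert R^2 to a VIF
vif_manual <- sapply(preds, function(p) {
  r2 <- summary(lm(reformulate(setdiff(preds, p), response = p),
                   data = patients))$r.squared
  1 / (1 - r2)
})
round(vif_manual, 2)
```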
Fit the regression model
mlr1<-lm(y~x1+x2+x3+x4+x5+x6,data = patients)
• Obtaining the regression coefficients of the model
coef(mlr1)
or
mlr1
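The coefficients reported by `coef(mlr1)` are the least squares estimates $\hat{\beta} = (X^\top X)^{-1} X^\top y$, where $X$ is the design matrix with a leading column of ones. For illustration only, this sketch solves the normal equations directly and compares the result with `coef()`; `lm()` itself uses a QR decomposition, which is numerically more stable than forming $X^\top X$.

```r
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114, 114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48, 47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7, 94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07, 2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09),
  x4 = c(5.1, 3.8, 8.2, 5.8, 7.0, 9.3, 2.5, 6.2, 7.1, 5.6, 5.3, 5.6, 10.2, 5.6, 10.0, 7.4, 3.6, 4.3, 9.0, 7.0),
  x5 = c(63, 70, 72, 73, 72, 71, 69, 66, 69, 64, 74, 71, 68, 67, 76, 69, 62, 70, 71, 75),
  x6 = c(33, 14, 10, 99, 95, 10, 42, 8, 62, 35, 90, 21, 47, 80, 98, 95, 18, 12, 99, 99)
)
mlr1 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = patients)

# Design matrix: intercept column plus x1..x6
X <- model.matrix(mlr1)

# Solve (X'X) beta = X'y
beta_hat <- drop(solve(crossprod(X), crossprod(X, patients$y)))

# Side by side with the lm() estimates
cbind(normal_equations = beta_hat, lm = coef(mlr1))
```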
• Obtaining the summary of the regression model
summary(mlr1)
The insignificant parameter (p > 0.05) with the highest p-value is x4, so it is removed first.
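The p-values printed by `summary()` are stored in its coefficient table (column `"Pr(>|t|)"`), so the removal rule of step 2 can be applied without reading the output by eye. A minimal sketch; the names `pvals` and `to_drop` are illustrative.

```r
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114, 114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48, 47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7, 94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07, 2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09),
  x4 = c(5.1, 3.8, 8.2, 5.8, 7.0, 9.3, 2.5, 6.2, 7.1, 5.6, 5.3, 5.6, 10.2, 5.6, 10.0, 7.4, 3.6, 4.3, 9.0, 7.0),
  x5 = c(63, 70, 72, 73, 72, 71, 69, 66, 69, 64, 74, 71, 68, 67, 76, 69, 62, 70, 71, 75),
  x6 = c(33, 14, 10, 99, 95, 10, 42, 8, 62, 35, 90, 21, 47, 80, 98, 95, 18, 12, 99, 99)
)
mlr1 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = patients)

# p-values of the slopes (the intercept row is dropped)
pvals <- summary(mlr1)$coefficients[-1, "Pr(>|t|)"]
round(pvals, 4)

# Among the insignificant slopes (p > 0.05), pick the one with the largest p-value
insignificant <- pvals[pvals > 0.05]
to_drop <- names(which.max(insignificant))
to_drop
```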
Note
• If the intercept is not significant, it can be removed from
the model in R by adding -1 to the formula:
lm(y~ -1+x1+x2+x3+x4+x5+x6,data = patients)
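In R formula syntax, `- 1` and `+ 0` both remove the intercept, so the two fits in this sketch are the same model. Keep in mind that for a model without an intercept, `summary()` computes R² from the uncentered total sum of squares, so that R² is not comparable with the R² of a model that has an intercept.

```r
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114, 114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48, 47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7, 94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07, 2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09),
  x4 = c(5.1, 3.8, 8.2, 5.8, 7.0, 9.3, 2.5, 6.2, 7.1, 5.6, 5.3, 5.6, 10.2, 5.6, 10.0, 7.4, 3.6, 4.3, 9.0, 7.0),
  x5 = c(63, 70, 72, 73, 72, 71, 69, 66, 69, 64, 74, 71, 68, 67, 76, 69, 62, 70, 71, 75),
  x6 = c(33, 14, 10, 99, 95, 10, 42, 8, 62, 35, 90, 21, 47, 80, 98, 95, 18, 12, 99, 99)
)

# Two equivalent ways to fit the model through the origin
fit_minus1 <- lm(y ~ -1 + x1 + x2 + x3 + x4 + x5 + x6, data = patients)
fit_zero   <- lm(y ~ 0 + x1 + x2 + x3 + x4 + x5 + x6, data = patients)

# Same coefficients, and no "(Intercept)" term in either
cbind(minus1 = coef(fit_minus1), zero = coef(fit_zero))
```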
• Define the model by removing variable x4
mlr2<-lm(y~x1+x2+x3+x5+x6,data = patients)
• Calculating VIF values
vif(mlr2)
All VIF values are less than 10, so there is no multicollinearity.
• Fitting the model by removing variable x4 and
obtaining the summary
mlr2<-lm(y~x1+x2+x3+x5+x6,data = patients)
summary(mlr2)
The insignificant parameter (p > 0.05) with the highest p-value is x5, so it is removed next.
• Define the model by removing variable x5
mlr3<-lm(y~x1+x2+x3+x6,data = patients)
• Calculating VIF values
vif(mlr3)
All VIF values are less than 10, so there is no multicollinearity.
• Fitting the model by removing variable x5 and
obtaining the summary
mlr3<-lm(y~x1+x2+x3+x6,data = patients)
summary(mlr3)
The insignificant parameter (p > 0.05) with the highest p-value is x6, so it is removed next.
• Define the model by removing variable x6
mlr4<-lm(y~x1+x2+x3,data = patients)
• Calculating VIF values
vif(mlr4)
All VIF values are less than 10, so there is no multicollinearity.
• Fitting the model by removing variable x6 and
obtaining the summary
mlr4<-lm(y~x1+x2+x3,data = patients)
summary(mlr4)
All parameters are significant (p < 0.05).
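Instead of retyping each formula, `update()` can drop one term at a time from the previous fit (`. ~ . - x4` means "same response, same predictors, minus x4"). This sketch reproduces the sequence mlr1, mlr2, mlr3, mlr4 used above.

```r
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114, 114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48, 47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7, 94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07, 2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09),
  x4 = c(5.1, 3.8, 8.2, 5.8, 7.0, 9.3, 2.5, 6.2, 7.1, 5.6, 5.3, 5.6, 10.2, 5.6, 10.0, 7.4, 3.6, 4.3, 9.0, 7.0),
  x5 = c(63, 70, 72, 73, 72, 71, 69, 66, 69, 64, 74, 71, 68, 67, 76, 69, 62, 70, 71, 75),
  x6 = c(33, 14, 10, 99, 95, 10, 42, 8, 62, 35, 90, 21, 47, 80, 98, 95, 18, 12, 99, 99)
)

mlr1 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = patients)
mlr2 <- update(mlr1, . ~ . - x4)  # remove duration of hypertension
mlr3 <- update(mlr2, . ~ . - x5)  # remove basal pulse
mlr4 <- update(mlr3, . ~ . - x6)  # remove stress

formula(mlr4)
round(coef(mlr4), 4)
```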
Residual Analysis (assumptions)
• Durbin-Watson test. H0: there is no serial correlation
(autocorrelation) in the residuals.
• p-value = 0.4011 > 0.05, so do not reject H0: there is no evidence of autocorrelation.
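The Durbin-Watson statistic is $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$, computed from the residuals in data order; values near 2 suggest no first-order autocorrelation. The slides do not say which function produced p = 0.4011, so this sketch computes d by hand and, only if the optional lmtest package is installed, runs `lmtest::dwtest()` as one possible way to get a p-value (its default alternative is one-sided, so its p-value need not match the slide's).

```r
# Final model (only y, x1, x2, x3 are needed)
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114, 114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48, 47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7, 94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07, 2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09)
)
mlr4 <- lm(y ~ x1 + x2 + x3, data = patients)

# Durbin-Watson statistic from the residuals, in patient order 1 to 20
e <- residuals(mlr4)
dw <- sum(diff(e)^2) / sum(e^2)
dw

# Optional: a formal test with p-value, if lmtest is available
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::dwtest(mlr4))
}
```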
Normality Test
• Anderson-Darling test. H0: the residuals are normally distributed.
• p-value > 0.05, so do not reject H0: there is no evidence against normality.
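The Anderson-Darling test is not in base R; it is available as `ad.test()` in the nortest package. This sketch runs it only if nortest is installed and, as a swapped-in base R alternative, also applies the Shapiro-Wilk test (`shapiro.test()`, same H0) and draws a normal Q-Q plot of the residuals. The Shapiro-Wilk p-value is a different test and is not expected to equal the slide's Anderson-Darling result.

```r
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114, 114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48, 47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7, 94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07, 2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09)
)
mlr4 <- lm(y ~ x1 + x2 + x3, data = patients)
e <- residuals(mlr4)

# Anderson-Darling test (the test named in the slides), if nortest is installed
if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::ad.test(e))
}

# Base R alternative: Shapiro-Wilk test, H0: residuals are normally distributed
sw <- shapiro.test(e)
sw$p.value

# Visual check: points close to the line support normality
qqnorm(e)
qqline(e)
```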
• H0: the variance of the residuals is constant.
The residual plot shows a random pattern, so the
constant-variance assumption is satisfied.
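A residual plot like the one described above can be drawn by plotting the residuals against the fitted values; a random band around zero, with no funnel shape, is consistent with constant variance. This is a minimal base R sketch; the slides do not show which plot they used.

```r
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114, 114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48, 47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7, 94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07, 2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09)
)
mlr4 <- lm(y ~ x1 + x2 + x3, data = patients)

fitted_vals <- fitted(mlr4)
res <- residuals(mlr4)

# Residuals versus fitted values, with a reference line at zero
plot(fitted_vals, res,
     xlab = "Fitted blood pressure (mm Hg)", ylab = "Residuals",
     main = "Residuals vs fitted values")
abline(h = 0, lty = 2)
```

Because the model has an intercept, the residuals sum to zero, so the plot is always centred on the dashed line; what matters is whether the spread changes with the fitted values.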
Model Validation
• The p-value of the overall F-test is < 0.05. Therefore, the model is
significant.
• R² = 0.9935: 99.35% of the total variation in blood pressure is
explained by the fitted model.
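The overall F-test p-value and R² quoted above can be read from `summary(mlr4)`. This sketch recomputes the F-test p-value from the stored F-statistic with `pf()` (upper tail, with the stored numerator and denominator degrees of freedom) and prints the adjusted R² alongside R² for comparison.

```r
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114, 114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48, 47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7, 94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07, 2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09)
)
mlr4 <- lm(y ~ x1 + x2 + x3, data = patients)
s <- summary(mlr4)

# fstatistic is a named vector: value, numdf, dendf
f <- s$fstatistic
p_overall <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

r2 <- s$r.squared
adj_r2 <- s$adj.r.squared
c(F = unname(f["value"]), p_value = p_overall, R2 = r2, adj_R2 = adj_r2)
```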
Using the fitted model to predict the blood pressure
$\hat{y} = -13.6672 + 0.7016\,x_1 + 0.9058\,x_2 + 4.6273\,x_3$
• When weight and body surface area are held fixed, each one-year increase
in age increases the predicted blood pressure by about 0.702 mm Hg.
• Calculating the blood pressure when
age (x1) = 52 years,
weight (x2) = 83.7 kg,
body surface area (x3) = 1.4 sq m
x1=52
x2=83.7
x3=1.4
Blood_pressure<-(-13.6672)+(0.7016*x1)+(0.9058*x2)+(4.6273*x3)
Blood_pressure
[1] 105.1097
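The same prediction can be obtained with `predict()`, which uses the unrounded coefficients, so the result may differ from 105.1097 in the later decimals. The sketch also requests a 95% prediction interval for an individual patient. Note that the chosen weight (83.7 kg) and body surface area (1.4 sq m) are below the smallest values in the sample (85.4 kg and 1.75 sq m), so this is an extrapolation and should be interpreted with caution.

```r
patients <- data.frame(
  y  = c(105, 115, 116, 117, 112, 121, 121, 110, 110, 114, 114, 115, 114, 106, 125, 114, 106, 113, 110, 122),
  x1 = c(47, 49, 49, 50, 51, 48, 49, 47, 49, 48, 47, 49, 50, 45, 52, 46, 46, 46, 48, 56),
  x2 = c(85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9, 89.2, 92.7, 94.4, 94.1, 91.6, 87.1, 101.3, 94.5, 87.0, 94.5, 90.5, 95.7),
  x3 = c(1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90, 1.83, 2.07, 2.07, 1.98, 2.05, 1.92, 2.19, 1.98, 1.87, 1.90, 1.88, 2.09)
)
mlr4 <- lm(y ~ x1 + x2 + x3, data = patients)

# New patient: column names must match the predictors in the model
new_patient <- data.frame(x1 = 52, x2 = 83.7, x3 = 1.4)

# Point prediction
pred <- predict(mlr4, newdata = new_patient)
pred

# Point prediction with a 95% prediction interval (columns fit, lwr, upr)
pi95 <- predict(mlr4, newdata = new_patient, interval = "prediction", level = 0.95)
pi95
```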