Bansilal Ramnath Agarwal Charitable Trust’s
VISHWAKARMA INSTITUTE OF TECHNOLOGY – PUNE
Department of Multidisciplinary Engineering
MD2201: Data Science
Name of the student: Swaroop Deokar Roll No. 16
Div: CS-AIML-A Batch: 1
Date of performance: 17/08/2023
Experiment No. 4
Title: Regression
Aim: i. To construct a simple linear regression model.
ii. To construct a multiple linear regression model.
Software used: Programming language R.
Data Set: Toy Sales Dataset
Code Statement:
1. Simple Linear Regression
i. Consider the Toy sales data set.
ii. Fit a simple linear model with Unit sales as the response and Price as the explanatory
variable.
iii. Plot the scatter plot and draw the regression line.
iv. What are the values of R-square and residual standard error? (Write in conclusion)
v. Display all predicted values from the fitted model and the corresponding values of
error.
2. Multiple Linear regression:
i. Consider Toy sales data set.
ii. Consider all variables to fit the regression model.
iii. Compare the R-square of SLR with MLR. (Write in conclusion)
iv. Which variable is the most significant? Why? (Write in conclusion)
v. Can you reject the null hypothesis for the promotion expenditure variable? (Write in conclusion)
vi. Which of the following scenarios would you select to get the maximum
number of Unit sales? (Write in conclusion)
a. Price=9.1$, Adexp=52,000$, Promexp=61,000$
b. Price=8.1$, Adexp=50,000$, Promexp=60,000$
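As a rough illustration of what the two quantities asked for in step 1.iv measure, the sketch below recomputes R-square and the residual standard error by hand and compares them with the values reported by summary(). The data frame toy and its numbers are made up for this example; they are not the Toy Sales data set.

```r
# Hypothetical data (made-up values, NOT the Toy Sales data set)
toy <- data.frame(
  Price     = c(6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5),
  Unitsales = c(910, 880, 850, 790, 800, 720, 700, 650)
)

fit <- lm(Unitsales ~ Price, data = toy)

res <- residuals(fit)                                 # observed minus fitted
sse <- sum(res^2)                                     # residual sum of squares
sst <- sum((toy$Unitsales - mean(toy$Unitsales))^2)   # total sum of squares

r_squared <- 1 - sse / sst                  # share of variation explained
rse <- sqrt(sse / df.residual(fit))         # residual df = n - 2 for SLR

# Both hand-computed values should agree with summary()
s <- summary(fit)
print(c(by_hand = r_squared, from_summary = s$r.squared))
print(c(by_hand = rse, from_summary = s$sigma))
```

R-square is unitless, while the residual standard error is expressed in the units of the response (here, units sold).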
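Code:
#SLR----
# Load the Toy Sales data set
f1=read.csv("Toy_sales_csv.csv")
#print(f1)
# Fit the simple linear regression: Unit sales as a function of Price
l1=lm(Unitsales~Price,data=f1)
s1=summary(l1)   # reports R-squared and residual standard error
print(s1)
library(ggplot2)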
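# Scatter plot with the fitted regression line
p=ggplot(f1,aes(Price,Unitsales))+geom_point()+
  geom_smooth(method="lm",formula=y~x,col="red",se=FALSE)
print(p)
# Predicted (fitted) values and the corresponding errors (residuals)
pred1=predict(l1)
cat("\nPredicted value\n",pred1)
err<-f1$Unitsales-pred1
cat("\n\nErrors",err)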
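#MLR----
# Fit the multiple linear regression with all explanatory variables
l2=lm(Unitsales~Price+Adexp+Promexp,data=f1)
s2=summary(l2)
print(s2)
# Scenarios a and b; assumes Adexp and Promexp are recorded in thousands of dollars
df=data.frame(Price=c(9.1,8.1),Adexp=c(52,50),Promexp=c(61,60))
pred2=predict(l2,df)
cat("\nPredicted value\n",pred2)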
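The questions on variable significance and on the null hypothesis for promotion expenditure are answered from the coefficient table of the MLR summary. A minimal sketch of how that table could be read programmatically is given below. The data frame sim is simulated purely for illustration (its coefficients and noise level are arbitrary assumptions), so the printed numbers say nothing about the real Toy Sales data.

```r
# Simulated stand-in for the data set (arbitrary assumed values)
set.seed(1)
n <- 30
sim <- data.frame(
  Price   = runif(n, 6, 10),
  Adexp   = runif(n, 40, 60),
  Promexp = runif(n, 40, 70)
)
sim$Unitsales <- 5000 - 300 * sim$Price + 20 * sim$Adexp +
  30 * sim$Promexp + rnorm(n, sd = 50)

fit <- lm(Unitsales ~ Price + Adexp + Promexp, data = sim)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(fit))
pvals <- coefs[-1, "Pr(>|t|)"]   # drop the intercept row
print(pvals)

# Variable with the smallest p-value (strongest evidence against H0: beta = 0)
most_significant <- names(pvals)[which.min(pvals)]
print(most_significant)

# Reject H0 for promotion expenditure at the 5% level?
reject_promexp <- unname(pvals["Promexp"] < 0.05)
print(reject_promexp)
```

With the real data, the same lines can be applied to l2 in place of fit.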
Results:
Conclusion: Constructing a simple linear regression model starts with checking the scatter plot for a
linear pattern and confirming a statistically significant relationship between the independent and
dependent variables. The assumptions of linear regression (linearity, independence of observations,
normality of the residuals and homogeneity of variance) should also be checked. A multiple linear
regression model, in turn, relates a dependent variable to two or more independent variables, and the
same assumptions and limitations apply. When these checks are made and the results are interpreted
appropriately, linear regression is a useful tool for predicting trends and estimating values of
variables.
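The assumption checks mentioned above were not part of the experiment code. One possible way to carry them out in R is sketched below, using the standard diagnostic plots of an lm object and the Shapiro-Wilk test on the residuals. The data frame sim is simulated (an assumption made only so that the example runs on its own); with the real data the fitted model l1 or l2 would be used instead.

```r
# Simulated stand-in data (arbitrary assumed values)
set.seed(7)
sim <- data.frame(Price = runif(30, 6, 10))
sim$Unitsales <- 9000 - 600 * sim$Price + rnorm(30, sd = 200)

fit <- lm(Unitsales ~ Price, data = sim)

op <- par(mfrow = c(1, 2))
plot(fit, which = 1)   # residuals vs fitted: linearity, constant variance
plot(fit, which = 2)   # normal Q-Q plot: normality of residuals
par(op)

# Shapiro-Wilk test: a small p-value suggests non-normal residuals
sw <- shapiro.test(residuals(fit))
print(sw)
```

A residuals-vs-fitted plot without a visible curve or funnel shape, and Q-Q points close to the reference line, are consistent with the assumptions being met.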
1) Simple Linear Regression
The value of R-squared for SLR is 0.619.
The residual standard error is 1997.
2) Multiple Linear Regression
a) The value of R-squared for SLR is 0.619, while that for MLR is 0.8588, so the MLR model explains
more of the variation in Unit sales.
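b) The Multiple R-squared is higher for the MLR model, which implies that the explanatory variables
together account for more of the variation in the dependent variable. The most significant individual
variable is the one with the smallest p-value in the coefficient table of the MLR summary.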
c) Yes, the null hypothesis for the promotion expenditure variable is rejected, as its p-value is less than 0.05.
d) Scenario a (Price=9.1$) is selected, as the model predicts the higher number of Unit sales for it.
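The scenario comparison can also be made with prediction intervals, which show how much uncertainty surrounds each predicted value. The sketch below is only an illustration: the data frame sim is simulated with arbitrary assumed coefficients, and Adexp and Promexp are assumed to be recorded in thousands of dollars, so the scenario it picks carries no information about the real Toy Sales data.

```r
# Simulated stand-in data (arbitrary assumed values)
set.seed(42)
n <- 40
sim <- data.frame(
  Price   = runif(n, 6, 10),
  Adexp   = runif(n, 40, 60),
  Promexp = runif(n, 40, 70)
)
sim$Unitsales <- 20000 - 1500 * sim$Price + 60 * sim$Adexp +
  80 * sim$Promexp + rnorm(n, sd = 400)

fit <- lm(Unitsales ~ Price + Adexp + Promexp, data = sim)

# The two scenarios from the code statement, labelled a and b
scenarios <- data.frame(
  Price   = c(9.1, 8.1),
  Adexp   = c(52, 50),
  Promexp = c(61, 60),
  row.names = c("a", "b")
)

# Point prediction (fit) with 95% prediction interval (lwr, upr)
pred <- predict(fit, newdata = scenarios, interval = "prediction")
print(pred)

# Scenario with the larger predicted Unit sales
best <- rownames(pred)[which.max(pred[, "fit"])]
print(best)
```

If the two intervals overlap heavily, the difference between the scenarios should be interpreted with caution.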