Ds - Lab - 4.ipynb - Colab

Uploaded by

tarun.24msd7001


11/29/24, 10:17 PM ds_lab_4.ipynb - Colab

G KALYAN 24MSD7034

from google.colab import files

uploaded = files.upload()

Advertising_lab_4_Q.csv(text/csv) - 5166 bytes, last modified: 11/23/2024 - 100% done
Saving Advertising_lab_4_Q.csv to Advertising_lab_4_Q.csv

#necessary Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

data = pd.read_csv("Advertising_lab_4_Q.csv")

1) Create three well-labelled scatterplots of this data with TV, Radio and Newspaper on the x-axis and Sales on the y-axis, and describe the relationship you see. The scatterplot colours should be red, blue and green respectively. Add suitable labels and a title to each plot.

import matplotlib.pyplot as plt

# Scatterplot for TV, Radio, and Newspaper vs. Sales

plt.figure(figsize=(16, 5))

# TV vs Sales
plt.subplot(1, 3, 1)
plt.scatter(data['TV'], data['Sales'], color='red')
plt.title('TV Advertising Budget vs Sales')
plt.xlabel('TV Advertising Budget')
plt.ylabel('Sales')
plt.grid(True)

# Radio vs Sales
plt.subplot(1, 3, 2)
plt.scatter(data['Radio'], data['Sales'], color='blue')
plt.title('Radio Advertising Budget vs Sales')
plt.xlabel('Radio Advertising Budget')
plt.ylabel('Sales')
plt.grid(True)

# Newspaper vs Sales
plt.subplot(1, 3, 3)
plt.scatter(data['Newspaper'], data['Sales'], color='green')
plt.title('Newspaper Advertising Budget vs Sales')
plt.xlabel('Newspaper Advertising Budget')
plt.ylabel('Sales')
plt.grid(True)

plt.tight_layout()
plt.show()

2) In the scatterplot you made, what is the explanatory variable? What is the response variable? Why might you want to construct the problem
in this way?

Explanatory variable: The variable used to explain or predict the response variable; it is also known as the independent variable. In these scatterplots the explanatory variables are the TV, Radio and Newspaper advertising budgets (x-axis).

Response variable: The variable whose value is expected to change in response to the explanatory variable; it is also known as the dependent variable. Here the response variable is Sales (y-axis).

Predictive Modeling :

Framing the problem with ad budgets as explanatory variables allows us to build predictive models that estimate sales based on different
spending levels.


Business Decisions:

Understanding the relationship between advertising spend and sales helps businesses optimize their budgets for maximum return on
investment (ROI).

By constructing the problem this way, we focus on identifying actionable insights for sales prediction and budget optimization.

3) Compute Pearson’s correlation coefficient between sales and each of the independent variables. What is your observation?

# Pearson's correlation coefficients

cor_tv, _ = pearsonr(data['TV'], data['Sales'])
cor_radio, _ = pearsonr(data['Radio'], data['Sales'])
cor_newspaper, _ = pearsonr(data['Newspaper'], data['Sales'])
print(f"Pearson's correlation between TV and Sales: {cor_tv}")
print(f"Pearson's correlation between Radio and Sales: {cor_radio}")
print(f"Pearson's correlation between Newspaper and Sales: {cor_newspaper}")

Pearson's correlation between TV and Sales: 0.7822244248616065

Pearson's correlation between Radio and Sales: 0.576222574571055
Pearson's correlation between Newspaper and Sales: 0.2282990263761654
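To make the coefficient less of a black box, here is a minimal sketch of how Pearson's r can be computed by hand and checked against NumPy. The helper name `pearson_r` and the toy numbers are hypothetical; they are not taken from Advertising_lab_4_Q.csv.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson's r: sum of co-deviations divided by the root of the product of squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Hypothetical toy data (illustration only, not the lab dataset)
budget = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [12.0, 15.0, 21.0, 24.0, 30.0]

r_manual = pearson_r(budget, sales)
r_numpy = float(np.corrcoef(budget, sales)[0, 1])
print(f"manual r = {r_manual:.4f}, numpy r = {r_numpy:.4f}")
```

A value near +1 indicates a strong positive linear relationship, near 0 a weak one. Read this way, the printed coefficients rank TV (0.78) above Radio (0.58) and Newspaper (0.23) in strength of linear association with Sales.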

4) Split the data into train (80%) and test (20%) (without shuffling). Fit a simple linear regression model on the train data for the three
independent variables separately and assess the accuracy of the model in terms of MSE(train and test). Which independent variable
contributes to accurate prediction of Sales?

# Split the data into train (80%) and test (20%) without shuffling
train, test = train_test_split(data, test_size=0.2, shuffle=False)

# Function to evaluate a simple linear regression model

def evaluate_model(feature):
    model = LinearRegression()
    X_train, y_train = train[[feature]], train['Sales']
    X_test, y_test = test[[feature]], test['Sales']

    # Fit the model
    model.fit(X_train, y_train)

    # Predictions and MSE
    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)
    mse_train = mean_squared_error(y_train, y_train_pred)
    mse_test = mean_squared_error(y_test, y_test_pred)

    return mse_train, mse_test

# Evaluate models for TV, Radio, and Newspaper

results = {}
for feature in ['TV', 'Radio', 'Newspaper']:
    mse_train, mse_test = evaluate_model(feature)
    results[feature] = {'MSE Train': mse_train, 'MSE Test': mse_test}

print("Simple Linear Regression Results:")

for feature, mse_values in results.items():
    print(f"{feature}: Train MSE = {mse_values['MSE Train']}, Test MSE = {mse_values['MSE Test']}")

Simple Linear Regression Results:

TV: Train MSE = 9.699713411632143, Test MSE = 14.128761342728321
Radio: Train MSE = 19.063327668527208, Test MSE = 14.44042373035678
Newspaper: Train MSE = 26.026670592327, Test MSE = 24.35470998177176
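As a rough illustration of what the cell for question 4 does, the sketch below repeats the same steps with NumPy only: an ordered 80/20 split (first rows for training, last rows for testing, which is how an unshuffled split behaves), a least-squares line fitted on the training rows, and MSE computed from the residuals. The data are synthetic and the coefficients 7.0 and 0.05 are made up for the example, so the numbers will not match the lab output.

```python
import numpy as np

# Hypothetical synthetic data standing in for one budget column and Sales
rng = np.random.default_rng(0)
x = rng.uniform(0, 300, size=200)
y = 7.0 + 0.05 * x + rng.normal(0, 3, size=200)

# Ordered 80/20 split: the first 80% of rows train, the remaining 20% test
cut = int(len(x) * 0.8)
x_train, x_test = x[:cut], x[cut:]
y_train, y_test = y[:cut], y[cut:]

# Least-squares line fitted on the training rows only
# (np.polyfit returns the highest-degree coefficient first)
slope, intercept = np.polyfit(x_train, y_train, deg=1)


def mse(y_true, y_pred):
    """Mean of the squared residuals."""
    return float(np.mean((y_true - y_pred) ** 2))


mse_train = mse(y_train, intercept + slope * x_train)
mse_test = mse(y_test, intercept + slope * x_test)
print(f"Train MSE = {mse_train:.3f}, Test MSE = {mse_test:.3f}")
```

Because the split is ordered rather than random, the test rows can differ systematically from the training rows if the file is sorted in any way. That is one possible reason for a test MSE lower than the train MSE, as seen for Radio and Newspaper in the results.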

5) Fit a multiple linear regression model on the train data for the different possible combinations of the three independent variables and assess the accuracy of the model in terms of MSE (train and test). Which combination contributes to accurate prediction of Sales?

from itertools import combinations

# Function to evaluate multiple linear regression for different combinations

def evaluate_multiple_models(features):
    model = LinearRegression()
    X_train, y_train = train[features], train['Sales']
    X_test, y_test = test[features], test['Sales']

    # Fit the model
    model.fit(X_train, y_train)

    # Predictions and MSE
    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)
    mse_train = mean_squared_error(y_train, y_train_pred)
    mse_test = mean_squared_error(y_test, y_test_pred)

    return mse_train, mse_test

# Evaluate all combinations of features

all_features = ['TV', 'Radio', 'Newspaper']
combination_results = {}
for r in range(1, len(all_features) + 1):
    for combo in combinations(all_features, r):
        mse_train, mse_test = evaluate_multiple_models(list(combo))
        combination_results[combo] = {'MSE Train': mse_train, 'MSE Test': mse_test}

print("Multiple Linear Regression Results:")

for combo, mse_values in combination_results.items():
    print(f"{combo}: Train MSE = {mse_values['MSE Train']}, Test MSE = {mse_values['MSE Test']}")


Multiple Linear Regression Results:

('TV',): Train MSE = 9.699713411632143, Test MSE = 14.128761342728321
('Radio',): Train MSE = 19.063327668527208, Test MSE = 14.44042373035678
('Newspaper',): Train MSE = 26.026670592327, Test MSE = 24.35470998177176
('TV', 'Radio'): Train MSE = 2.8221633041460903, Test MSE = 2.7930889237731136
('TV', 'Newspaper'): Train MSE = 8.77967640891637, Test MSE = 13.103347790031586
('Radio', 'Newspaper'): Train MSE = 19.06326627813012, Test MSE = 14.43136345887287
('TV', 'Radio', 'Newspaper'): Train MSE = 2.82179249487708, Test MSE = 2.7911451862764003
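One way to read the seven results is to rank the combinations by test MSE. The sketch below does that using the test values printed in the results, rounded to four decimals; the dictionary `test_mse` is typed in by hand for illustration and is not produced by the notebook.

```python
# Test MSE values copied from the printed results, rounded to 4 decimals
test_mse = {
    ("TV",): 14.1288,
    ("Radio",): 14.4404,
    ("Newspaper",): 24.3547,
    ("TV", "Radio"): 2.7931,
    ("TV", "Newspaper"): 13.1033,
    ("Radio", "Newspaper"): 14.4314,
    ("TV", "Radio", "Newspaper"): 2.7911,
}

# Sort combinations from lowest to highest test MSE
ranked = sorted(test_mse.items(), key=lambda item: item[1])
for combo, value in ranked:
    print(f"{' + '.join(combo):<26} Test MSE = {value:.4f}")

best_combo, best_value = ranked[0]
runner_up, runner_value = ranked[1]
gap = runner_value - best_value
print(f"Gap between the two best models: {gap:.4f}")
```

The three-variable model ranks first, but TV + Radio is only about 0.002 behind it, while every combination without both TV and Radio has a test MSE above 13. This suggests that TV and Radio together account for nearly all of the predictive accuracy.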

6) What is the difference between R2 and Adjusted R2? Comment!

R-squared
Measures the proportion of the variance in the dependent variable that is explained by the independent variables.

Adjusted R-squared
Adjusts the R-squared value for the number of predictors in the model and the sample size.

Difference between R-squared and Adjusted R-squared

R-squared:

Measures the proportion of variance explained by the model.

Never decreases on the training data when more variables are added, so it can be misleading in models with many predictors.

Adjusted R-squared:

Penalizes the addition of unnecessary variables.

Provides a fairer basis for comparing models with different numbers of predictors.

Decreases when an added variable does not improve the fit enough to offset the penalty.

Summary:

R-squared tells you how well your model fits the data.
Adjusted R-squared tells you how well your model fits the data, taking into account the number of predictors.
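The difference can be seen in a small experiment. The sketch below fits ordinary least squares with NumPy on synthetic data, first with two informative predictors and then with an extra column of pure noise. Everything here is hypothetical (the helper `r2_and_adjusted`, the coefficients, the sample size of 160); it is meant only to show that R² cannot go down when a column is added, whereas adjusted R² carries a penalty for it.

```python
import numpy as np


def r2_and_adjusted(X, y):
    """Fit OLS with an intercept and return (R², adjusted R²) on the same data."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adjusted = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adjusted


# Hypothetical synthetic data: two useful predictors and one pure-noise column
rng = np.random.default_rng(42)
n = 160
useful = rng.normal(size=(n, 2))
noise_column = rng.normal(size=(n, 1))
y = 3.0 + 2.0 * useful[:, 0] - 1.0 * useful[:, 1] + rng.normal(size=n)

r2_base, adj_base = r2_and_adjusted(useful, y)
r2_more, adj_more = r2_and_adjusted(np.column_stack([useful, noise_column]), y)

print(f"2 predictors: R² = {r2_base:.5f}, adjusted R² = {adj_base:.5f}")
print(f"3 predictors: R² = {r2_more:.5f}, adjusted R² = {adj_more:.5f}")
```

Comparing the two printed lines shows how much of any R² gain from the noise column survives the adjustment; when the gain is smaller than the penalty, adjusted R² falls.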

# Function to compute R² and Adjusted R²

def calculate_r2_adj_r2(features):
    model = LinearRegression()
    X_train, y_train = train[features], train['Sales']
    model.fit(X_train, y_train)

    # R²
    r2 = model.score(X_train, y_train)

    # Adjusted R²
    n = len(y_train)
    p = len(features)
    adjusted_r2 = 1 - ((1 - r2) * (n - 1)) / (n - p - 1)

    return r2, adjusted_r2

# Example: Evaluate R² and Adjusted R² for all features

r2, adj_r2 = calculate_r2_adj_r2(['TV', 'Radio', 'Newspaper'])
print(f"R²: {r2}, Adjusted R²: {adj_r2}")

R²: 0.8961523241120161, Adjusted R²: 0.8941552534218625
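As a check on the adjusted value, the formula used in the function can be worked through by hand. The notebook never prints the number of training rows, so n = 160 is an assumption here; it is the value that reproduces the printed figure with p = 3 predictors.

```latex
\bar{R}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}
          = 1 - \frac{(1 - 0.89615)(160 - 1)}{160 - 3 - 1}
          = 1 - \frac{0.10385 \times 159}{156}
          \approx 0.8942
```

The correction factor (n - 1)/(n - p - 1) = 159/156 is only about 1.02, which is why the adjusted value sits so close to R².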

7) Give your final comments on which model, simple linear or multiple linear, is apt for accurate prediction of Sales based on the MSE values for train and test and the R2 and Adjusted R2 values.

R² (0.896):

The R² value of 0.896 means that the model explains 89.6% of the variance in Sales on the training data using the features TV, Radio, and Newspaper. This is a high value.

Adjusted R² (0.894):

The Adjusted R² value is 0.894, which is very close to the R² value. This means the R² is not being inflated merely by the number of predictors. It does not by itself show that every predictor is useful, because with only three predictors the penalty is small.

Multiple Linear Regression is appropriate here because:

The multiple models predict far better than any single-variable model. The best simple model (TV) has a train MSE of 9.70 and a test MSE of 14.13, while TV + Radio reaches 2.82 and 2.79. Adding Newspaper changes both values by less than 0.01 (2.8218 train, 2.7911 test), so TV and Radio carry almost all of the explanatory power.

For the three-variable model the test MSE (2.79) is comparable to the train MSE (2.82), which indicates that the model generalizes well and is a good choice for prediction.
