Linear Regression

The document provides an overview of linear regression, explaining its purpose in modeling relationships between dependent and independent variables. It distinguishes between simple and multiple linear regression, discusses model evaluation metrics, and addresses multicollinearity issues. Additionally, it includes practical examples using Python for data analysis and visualization with a dataset on advertising and sales.

Uploaded by

gauri10in


6/2/23, 8:33 PM linearRegression

Linear Regression

Linear regression is an important statistical technique used to model and analyze the
relationship between a dependent variable and one or more independent variables.

Question 1: What is linear regression?

Linear regression is a statistical method used to model the relationship between a
dependent variable and one or more independent variables by fitting a linear equation
to observed data.
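
To make "fitting a linear equation" concrete, here is a minimal sketch of the closed-form least-squares fit for one predictor. The helper name `fit_simple_ols` and the toy data are illustrative assumptions, not part of the notebook; the notebook itself relies on statsmodels further down.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    # slope = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Toy data lying exactly on y = 1 + 2x (illustrative, not the advertising data)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_ols(x, y)
print(intercept, slope)
```

Because the toy points lie exactly on a line, the fit should recover that line; with noisy data the same formulas give the line with the smallest squared error.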

Question 2: What is the difference between simple linear regression and multiple linear
regression?

Simple linear regression involves a single independent variable predicting the dependent
variable, while multiple linear regression involves two or more independent variables
predicting the dependent variable. Multiple linear regression allows for the analysis of
the combined effects of multiple predictors on the outcome variable.
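
As a rough illustration of the multiple-predictor case, the sketch below solves the normal equations (X'X) b = X'y for two predictors using only the standard library. The helpers `solve_linear_system` and `fit_multiple_ols` and the toy data are hypothetical names introduced here; in practice a library routine such as the `sm.OLS` call used later in this notebook would do this work.

```python
def solve_linear_system(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_multiple_ols(rows, y):
    """rows: one list of predictor values per observation.
    Returns [intercept, b1, b2, ...]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Toy data generated exactly from y = 1 + 2*x1 + 3*x2 (illustrative only)
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefs = fit_multiple_ols(rows, y)
print(coefs)
```

Each coefficient is the estimated effect of its predictor with the other predictor held fixed, which is what distinguishes this from running two separate simple regressions.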

Question 3: How is the quality of a linear regression model evaluated?

The quality of a linear regression model is assessed using metrics such as:

1. the coefficient of determination (R-squared)
2. root mean square error (RMSE)
3. mean absolute error (MAE)
4. adjusted R-squared
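
The four metrics listed above can be computed directly from their definitions. This is a minimal sketch; the function name `regression_metrics` and the toy values are assumptions made for illustration, and the notebook itself uses `sklearn.metrics` for the same purpose.

```python
from math import sqrt

def regression_metrics(y_true, y_pred, n_predictors):
    """Compute R-squared, adjusted R-squared, RMSE and MAE from their definitions."""
    n = len(y_true)
    y_bar = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)              # total variation
    r2 = 1 - ss_res / ss_tot
    return {
        "r2": r2,
        # Penalises R-squared for the number of predictors used
        "adjusted_r2": 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1),
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }

# Toy values (illustrative, not from the advertising dataset)
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y_true, y_pred, n_predictors=1)
print(metrics)
```

RMSE and MAE are in the units of the dependent variable, while R-squared and adjusted R-squared are unitless, so the two pairs answer different questions about fit quality.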

Question 4: How can you deal with multicollinearity in linear regression?

Multicollinearity occurs when independent variables in a regression model are highly
correlated with each other. It can lead to unstable coefficient estimates and reduce the
model's interpretability. Dealing with multicollinearity can involve removing one of the
correlated variables, combining variables, or using dimensionality reduction techniques
such as principal component analysis (PCA).
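
One common way to detect the problem before choosing a remedy is the variance inflation factor (VIF). With only two predictors it reduces to 1 / (1 - r^2), where r is their Pearson correlation. The sketch below assumes that two-predictor case; the helper names and toy data are illustrative, and the threshold at which a VIF counts as "too high" (values of 5 or 10 are often quoted) is a rule of thumb rather than a fixed standard.

```python
from math import sqrt

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    saa = sum((x - a_bar) ** 2 for x in a)
    sbb = sum((y - b_bar) ** 2 for y in b)
    return sab / sqrt(saa * sbb)

def vif_two_predictors(a, b):
    """VIF of either predictor when the model has exactly two predictors."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r ** 2)

# Toy predictors (illustrative): x2 is almost a multiple of x1, x3 is only loosely related
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]
x3 = [2.0, 1.0, 4.0, 3.0, 5.0]

vif_high = vif_two_predictors(x1, x2)
vif_low = vif_two_predictors(x1, x3)
print(vif_high, vif_low)
```

A very large VIF for the pair (x1, x2) signals that one of them carries almost no independent information, which is the situation where dropping or combining variables helps.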

In [1]: ## import library

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [3]: df = pd.read_csv('advertising.csv')
df.head()

Out[3]:
      TV  Radio  Newspaper  Sales
0  230.1   37.8       69.2   22.1
1   44.5   39.3       45.1   10.4
2   17.2   45.9       69.3   12.0
3  151.5   41.3       58.5   16.5
4  180.8   10.8       58.4   17.9

In [4]: df.shape

Out[4]: (200, 4)

In [5]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 TV 200 non-null float64
1 Radio 200 non-null float64
2 Newspaper 200 non-null float64
3 Sales 200 non-null float64
dtypes: float64(4)
memory usage: 6.4 KB

In [7]: df.describe()

Out[7]:
               TV       Radio   Newspaper       Sales
count  200.000000  200.000000  200.000000  200.000000
mean   147.042500   23.264000   30.554000   15.130500
std     85.854236   14.846809   21.778621    5.283892
min      0.700000    0.000000    0.300000    1.600000
25%     74.375000    9.975000   12.750000   11.000000
50%    149.750000   22.900000   25.750000   16.000000
75%    218.825000   36.525000   45.100000   19.050000
max    296.400000   49.600000  114.000000   27.000000

In [11]: df.isnull().sum()*100/df.shape[0]

Out[11]: TV 0.0
Radio 0.0
Newspaper 0.0
Sales 0.0
dtype: float64

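In [13]: # Outlier analysis: one horizontal boxplot per advertising channel

fig, axs = plt.subplots(3, figsize = (5,5))
# Pass the column as x= explicitly; recent seaborn versions treat a positional argument as data
plt1 = sns.boxplot(x = df['TV'], ax = axs[0])
plt2 = sns.boxplot(x = df['Newspaper'], ax = axs[1])
plt3 = sns.boxplot(x = df['Radio'], ax = axs[2])
plt.tight_layout()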
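
By default, seaborn's boxplot draws its whiskers at 1.5 times the interquartile range (IQR), and points beyond them are the ones displayed as outliers. The same rule can be applied numerically, as in this sketch. The function name `iqr_outliers` and the sample values are assumptions for illustration; quartile conventions differ slightly between libraries, so counts near the boundary may not match a plot exactly.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # method="inclusive" interpolates linearly between the sorted data points
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Illustrative sample (not taken from the advertising dataset)
sample = [10, 12, 13, 15, 16, 18, 20, 95]
outliers = iqr_outliers(sample)
print(outliers)
```

On the notebook's data the same function could be called on a column converted to a list, for example `iqr_outliers(df['Newspaper'].tolist())`.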


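In [14]: # Scatter plots of Sales against each advertising channel
# (any arguments after height=4 were cut off in the source export)
sns.pairplot(df, x_vars=['TV', 'Newspaper', 'Radio'], y_vars='Sales', height=4)
plt.show()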

In [15]: # Let's see the correlation between different variables.

sns.heatmap(df.corr(), cmap="YlGnBu", annot = True)
plt.show()


TV appears to be the predictor most strongly correlated with Sales.

In [16]: X = df['TV']
y = df['Sales']

In [17]: from sklearn.model_selection import train_test_split

# Hold out part of the data for testing (the source export truncates this call after "test")
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.6)

In [20]: # Add a constant to get an intercept

import statsmodels.api as sm
X_train_sm = sm.add_constant(X_train)

In [21]: lr = sm.OLS(y_train, X_train_sm).fit()

In [22]: lr.params

Out[22]: const    6.780417
TV       0.055639
dtype: float64
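
The two fitted parameters define the line Sales = const + TV_coefficient * TV. As a small worked example, the sketch below plugs the values printed in Out[22] into that equation. The function name `predict_sales` and the input of 100 are illustrative assumptions; the units are whatever the dataset uses for TV and Sales.

```python
# Coefficients copied from Out[22] above
CONST = 6.780417
TV_COEF = 0.055639

def predict_sales(tv_spend):
    """Predicted Sales for a given TV value under the fitted simple regression."""
    return CONST + TV_COEF * tv_spend

# Each extra unit of TV raises predicted Sales by TV_COEF
prediction = predict_sales(100)
print(round(prediction, 6))
```

The intercept is the predicted Sales when TV is zero, and the slope is the change in predicted Sales per one-unit increase in TV.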

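In [23]: plt.scatter(X_train, y_train)
# Draw the fitted line from the estimated coefficients rather than hard-coded values
plt.plot(X_train, lr.params['const'] + lr.params['TV'] * X_train, 'r')
plt.show()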


In [24]: y_train_pred = lr.predict(X_train_sm)

res = (y_train - y_train_pred)

In [25]: fig = plt.figure()
# histplot replaces the deprecated distplot; kde=True overlays a density curve
sns.histplot(res, bins = 15, kde = True, stat = "density")
fig.suptitle('Error Terms', fontsize = 15) # Plot heading
plt.xlabel('y_train - y_train_pred', fontsize = 15) # X-label
plt.show()


In [26]: plt.scatter(X_train,res)
plt.show()

In [29]: X_test_sm = sm.add_constant(X_test)

y_pred = lr.predict(X_test_sm)


In [30]: y_pred.head()

Out[30]: 126     7.214399
104    20.033555
99     14.302769
92     18.892962
111    20.228291
dtype: float64

In [31]: from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

# mean_squared_error returns the MSE; take its square root to get the RMSE
np.sqrt(mean_squared_error(y_test, y_pred))

Out[31]: 1.994739178382777

In [32]: r_squared = r2_score(y_test, y_pred)

r_squared

Out[32]: 0.7807592057194056

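In [33]: # Best-fit line plotted over the test data
plt.scatter(X_test, y_test)
# Use the estimated coefficients rather than hard-coded values
plt.plot(X_test, lr.params['const'] + lr.params['TV'] * X_test, 'r')
plt.show()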
