
Exp - 6-Model Development - SDK - Ok

The document outlines the development and evaluation of various regression models to predict car prices based on different variables. It includes examples of linear regression, multiple linear regression, polynomial regression, and visualizations for model evaluation. The conclusion indicates that the multiple linear regression model is the most effective for predicting car prices from the dataset.

Uploaded by

gmranuj
Copyright
© All Rights Reserved

Model Development

Setup

A model helps us understand the relationship between variables and how those variables are used to predict a result.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from IPython.display import display
# IPython.html was split out of IPython; the widgets now live in ipywidgets
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
df = pd.read_csv('../input/auto_clean.csv')
df.head()
1) Linear Regression and Multiple Linear Regression

from sklearn.linear_model import LinearRegression


lm = LinearRegression()
lm
OUTPUT :
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
X = df[['highway-mpg']]
Y = df['price']
lm.fit(X, Y)
OUTPUT :
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
Yhat = lm.predict(X)
Yhat[0:5]

OUTPUT:
array([16236.50464347, 16236.50464347, 17058.23802179, 13771.3045085 ,
20345.17153508])

lm.intercept_
OUTPUT :
38423.305858157386

lm.coef_
OUTPUT :
array([-821.73337832])
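
The intercept and coefficient printed above define the fitted line price = 38423.31 - 821.73 × highway-mpg. As a minimal sketch, the snippet below evaluates that line by hand. The highway-mpg inputs are assumed values (picked because they reproduce the predictions printed earlier, not read from the dataset), and `predict_price` is a hypothetical helper name.

```python
# Intercept and slope copied from the lm.intercept_ and lm.coef_ output above.
INTERCEPT = 38423.305858157386
SLOPE = -821.73337832


def predict_price(highway_mpg):
    """Evaluate the fitted line: price = intercept + slope * highway-mpg."""
    return INTERCEPT + SLOPE * highway_mpg


# Assumed highway-mpg inputs, not rows read from auto_clean.csv.
sample_mpg = [27, 26, 30, 22]
predictions = [predict_price(m) for m in sample_mpg]

for mpg, price in zip(sample_mpg, predictions):
    print(f"highway-mpg = {mpg}: predicted price = {price:.2f}")
```

The negative slope means each additional mile per gallon on the highway lowers the predicted price by roughly 822 dollars.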

Z = df[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']]

lm.fit(Z, df['price'])
OUTPUT :
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
normalize=False)
lm.intercept_
OUTPUT :
-15806.624626329198

lm.coef_
OUTPUT :
array([53.49574423, 4.70770099, 81

2) Model Evaluation using Visualization

EXAMPLE NO : 01
Regression Plot

import seaborn as sns


%matplotlib inline
width = 12
height = 10
plt.figure(figsize=(width, height))
sns.regplot(x="highway-mpg", y="price", data=df)
plt.ylim(0,)

OUTPUT :
(regression plot of price against highway-mpg)

plt.figure(figsize=(width, height))
sns.regplot(x="peak-rpm", y="price", data=df)
plt.ylim(0,)

OUTPUT :
(0, 47422.919330307624)
df[["peak-rpm","highway-mpg","price"]].corr()
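
By default, `DataFrame.corr()` returns Pearson correlation coefficients. The sketch below shows what that coefficient measures by computing it by hand. The `mpg` and `price` lists are hypothetical toy values, not rows from the dataset, and `pearson_r` is an illustrative helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)


# Hypothetical toy values, not rows from auto_clean.csv.
mpg = [22, 26, 27, 30, 35]
price = [20000, 17000, 16500, 13500, 9000]

r = pearson_r(mpg, price)
print(round(r, 3))
```

A value close to -1 or 1 indicates a strong linear relationship, while a value near 0 suggests the variable is a weak linear predictor of price.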

Residual Plot

width = 12
height = 10
plt.figure(figsize=(width, height))
# Pass x and y as keywords; recent seaborn versions no longer accept them positionally
sns.residplot(x=df['highway-mpg'], y=df['price'])
plt.show()
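
A residual plot shows, for each observation, the difference between the observed value and the value predicted by a fitted regression line. As a rough sketch of that calculation, the snippet below fits a least-squares line by hand and computes the residuals. The data are hypothetical toy values, not taken from the dataset.

```python
# Hypothetical toy data, not rows from auto_clean.csv.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least-squares slope and intercept for a single predictor.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x

# Residual = observed value minus fitted value.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(slope, intercept)
print(residuals)
```

If the residuals scatter randomly around zero with no visible curve, a linear model is a reasonable fit; a clear pattern suggests a non-linear model may be more appropriate.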
Multiple Linear Regression

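EXAMPLE NO : 03
Y_hat = lm.predict(Z)
plt.figure(figsize=(width, height))

# distplot is deprecated in recent seaborn; kdeplot draws the same density curves
ax1 = sns.kdeplot(df['price'], color="r", label="Actual Value")
# Plot the multiple-regression predictions (Y_hat), not the earlier single-variable Yhat
sns.kdeplot(Y_hat, color="b", label="Fitted Values", ax=ax1)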

plt.title('Actual vs Fitted Values for Price')


plt.xlabel('Price (in dollars)')
plt.ylabel('Proportion of Cars')

plt.show()
plt.close()

3) Polynomial Regression and Pipelines

EXAMPLE NO : 04
def PlotPolly(model, independent_variable, dependent_variable, Name):
    # Evaluate the fitted polynomial on an evenly spaced grid for a smooth curve
    x_new = np.linspace(15, 55, 100)
    y_new = model(x_new)

    # Data points as dots, fitted polynomial as a line
    plt.plot(independent_variable, dependent_variable, '.', x_new, y_new, '-')
    plt.title('Polynomial Fit with Matplotlib for Price ~ ' + Name)
    ax = plt.gca()
    ax.set_facecolor((0.898, 0.898, 0.898))
    fig = plt.gcf()
    plt.xlabel(Name)
    plt.ylabel('Price of Cars')

    plt.show()
    plt.close()
x = df['highway-mpg']
y = df['price']
f = np.polyfit(x, y, 3)
p = np.poly1d(f)
print(p)
PlotPolly(p, x, y, 'highway-mpg')
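
`np.polyfit` returns the polynomial coefficients ordered from the highest power down to the constant term, and `np.poly1d` wraps them in a callable. As a minimal sketch of what calling `p(x)` computes, the snippet below evaluates a cubic using Horner's rule. The coefficients are hypothetical, not the ones fitted to the dataset, and `evaluate_polynomial` is an illustrative name.

```python
def evaluate_polynomial(coefficients, x):
    """Evaluate a polynomial with Horner's rule; coefficients are highest power first."""
    result = 0.0
    for c in coefficients:
        result = result * x + c
    return result


# Hypothetical cubic: 1*x**3 - 2*x**2 + 0*x + 5 (not the fitted coefficients).
coeffs = [1.0, -2.0, 0.0, 5.0]
value = evaluate_polynomial(coeffs, 3.0)
print(value)
```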

OUTPUT:
4) Measures for In-Sample Evaluation
# highway_mpg_fit
lm.fit(X, Y)
# Find the R^2
print('The R-square is: ', lm.score(X, Y))
OUTPUT:
The R-square is: 0.4965911884339175

Yhat = lm.predict(X)
print('The output of the first four predicted value is: ', Yhat[0:4])
OUTPUT:
The output of the first four predicted value is: [16236.50464347 16236.50464347 17058.23802179 13771.3045085 ]
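
For a regressor, `lm.score(X, Y)` returns the coefficient of determination, R² = 1 - SS_res / SS_tot. An R² of about 0.497 therefore means the simple linear model explains roughly 49.7% of the variation in price. The sketch below computes R² by hand on hypothetical toy values; `r_squared` is an illustrative helper, not part of scikit-learn.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical toy values, not rows from auto_clean.csv.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.5, 9.5]

r2 = r_squared(actual, predicted)
print(r2)
```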

from sklearn.metrics import mean_squared_error


mse = mean_squared_error(df['price'], Yhat)
print('The mean square error of price and predicted value is: ', mse)
OUTPUT:
The mean square error of price and predicted value is: 31635042.944639895
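
The mean squared error is the average of the squared differences between actual and predicted values, so it is expressed in squared dollars. Taking the square root gives an error in dollars, which is easier to interpret. The sketch below computes the MSE by hand on hypothetical toy values and then takes the square root of the MSE reported above; `mean_squared_error_manual` is an illustrative name.

```python
import math


def mean_squared_error_manual(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy values, not rows from auto_clean.csv.
toy_actual = [3.0, 5.0, 7.0, 9.0]
toy_predicted = [2.5, 5.5, 6.5, 9.5]
toy_mse = mean_squared_error_manual(toy_actual, toy_predicted)
print(toy_mse)

# MSE copied from the output above; its square root is in dollars.
reported_mse = 31635042.944639895
rmse = math.sqrt(reported_mse)
print(f"{rmse:.2f}")
```

A root mean squared error of roughly 5,600 dollars indicates how far the single-variable predictions typically fall from the actual prices.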

5) Prediction and Decision Making


new_input = np.arange(1, 100, 1).reshape(-1, 1)
lm.fit(X, Y)
lm
OUTPUT:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
normalize=False)
yhat=lm.predict(new_input)
yhat[0:5]
OUTPUT:
array([37601.57247984, 36779.83910151, 35958.10572319, 35136.37234487,
34314.63896655])

plt.plot(new_input, yhat)
plt.show()
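
Because the fitted line slopes downward, predictions for large highway-mpg values fall below zero, which is not a meaningful price. As a rough sanity check, the sketch below uses the intercept and slope reported earlier for the highway-mpg model to find where the line crosses zero within the same 1 to 99 input range. The constant and variable names are illustrative.

```python
# Intercept and slope copied from the highway-mpg model output earlier.
INTERCEPT = 38423.305858157386
SLOPE = -821.73337832

# highway-mpg value at which the predicted price reaches zero.
zero_crossing = -INTERCEPT / SLOPE

# Same input range as np.arange(1, 100, 1): the integers 1 through 99.
new_input = range(1, 100)
negative_inputs = [x for x in new_input if INTERCEPT + SLOPE * x < 0]

print(round(zero_crossing, 2))
print(negative_inputs[0], len(negative_inputs))
```

Predictions are only trustworthy within the range of highway-mpg values the model was trained on, so inputs beyond the zero crossing should be treated as extrapolation.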
Conclusions on Model Development:
Comparing the three models (simple linear, multiple linear, and polynomial regression), we conclude that the MLR model is the best for predicting price from our dataset. This result makes sense: the dataset has 27 variables in total, and more than one of them is a potential predictor of the final car price.
