Lab 3 - Linear Regression
Nikhilesh Prabhakar
16BCE1158
Datasets used
Apart from the usual House Prices dataset that was used in previous
lab submissions, the dataset I worked with for this one is the Boston
Housing dataset presented in Sebastian Raschka's "Python Machine
Learning".
Link: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data.
There are 79 variables describing almost every aspect of residential
homes for sale in Iowa (at the time the data was collected).
Link for the second dataset:
https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data
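For completeness, a minimal loading sketch is given below. The file names (train.csv, housing.data), the dataframe names fdata and df, and the Boston Housing column list are assumptions about how the data was read in, not part of the original code:

import pandas as pd

# Kaggle House Prices training data, assumed saved locally as train.csv
fdata = pd.read_csv('train.csv')

# UCI (Boston) housing data: whitespace-separated, no header row
cols = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
        'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
df = pd.read_csv('housing.data', header=None, sep=r'\s+', names=cols)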
Methodology
import matplotlib.pyplot as plt
import seaborn as sb

# Heatmap of the features most strongly correlated with SalePrice
corrmat = fdata.corr()
top_corr_features = corrmat.index[abs(corrmat["SalePrice"]) > 0.50]
plt.figure(figsize=(10, 10))
g = sb.heatmap(fdata[top_corr_features].corr(), annot=True)
# SalePrice is most correlated with OverallQual, GrLivArea,
# GarageCars, GarageArea, TotalBsmtSF and 1stFlrSF
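The printed slope and intercept in Step 4 suggest a single-feature fit of SalePrice against above-ground living area; the sketch below shows one way X and y could have been built. The choice of GrLivArea as the feature is an assumption, not stated in the original:

# Build the design matrix and target used by the regression in Step 4.
# GrLivArea as the single feature is an assumption; any of the
# top-correlated columns above could be used instead.
X = fdata[['GrLivArea']].values
y = fdata['SalePrice'].values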
Step 4: The Linear Regression Model
There are three methods covered in this lecture for fitting a Linear
Regression model.
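In the gradient-descent method implemented below, the model minimizes the sum-of-squared-errors cost $J(\mathbf{w}) = \frac{1}{2}\sum_i \big(y^{(i)} - \hat{y}^{(i)}\big)^2$ with $\hat{y} = \mathbf{X}\mathbf{w} + w_0$, and each epoch steps against the gradient:

$\mathbf{w} := \mathbf{w} + \eta\,\mathbf{X}^{T}(\mathbf{y} - \hat{\mathbf{y}}), \qquad w_0 := w_0 + \eta \sum_i \big(y^{(i)} - \hat{y}^{(i)}\big)$

where $\eta$ is the learning rate eta in the class that follows.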
import numpy as np

class LinearRegressionGD(object):
    def __init__(self, eta=0.001, n_iter=20):
        self.eta = eta        # learning rate
        self.n_iter = n_iter  # number of epochs

    def fit(self, X, y):
        self.w_ = np.zeros(1 + X.shape[1])  # w_[0] is the bias term
        self.cost_ = []
        for i in range(self.n_iter):
            output = self.net_input(X)
            errors = y - output
            # batch update: step against the gradient of the SSE cost
            self.w_[1:] += self.eta * X.T.dot(errors)
            self.w_[0] += self.eta * errors.sum()
            cost = (errors ** 2).sum() / 2.0
            self.cost_.append(cost)
        return self

    def net_input(self, X):
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def predict(self, X):
        return self.net_input(X)
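The plot below assumes a fitted model lr and standardized arrays X_std and y_std; a minimal setup sketch, assuming the Boston Housing dataframe df from above and scikit-learn's StandardScaler for the standardization:

from sklearn.preprocessing import StandardScaler

X_rm = df[['RM']].values    # average number of rooms per dwelling
y_medv = df['MEDV'].values  # median home value in $1000's

sc_x = StandardScaler()
sc_y = StandardScaler()
X_std = sc_x.fit_transform(X_rm)
y_std = sc_y.fit_transform(y_medv.reshape(-1, 1)).flatten()

lr = LinearRegressionGD()
lr.fit(X_std, y_std)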
def lin_regplot(X, y, model):
    plt.scatter(X, y, c='blue')
    plt.plot(X, model.predict(X), color='red')
    return None
lin_regplot(X_std, y_std, lr)
plt.xlabel('Average number of rooms [RM] (standardized)')
plt.ylabel('Price in $1000\'s [MEDV] (standardized)')
plt.show()
from sklearn.linear_model import LinearRegression

slr = LinearRegression()
slr.fit(X, y)
print('Slope: %.3f' % slr.coef_[0])
print('Intercept: %.3f' % slr.intercept_)

Output:
Slope: 107.130
Intercept: 18569.026
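For reference, a third standard method (alongside the from-scratch gradient descent and scikit-learn's LinearRegression) is the closed-form normal equation, which Raschka's book also demonstrates. A minimal sketch, assuming the same X and y as above:

# normal equation: w = (X^T X)^{-1} X^T y, after prepending a bias column
Xb = np.hstack((np.ones((X.shape[0], 1)), X))
w = np.dot(np.linalg.inv(np.dot(Xb.T, Xb)), np.dot(Xb.T, y))
print('Slope: %.3f' % w[1])
print('Intercept: %.3f' % w[0])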