DSBDA-Assignment-4 - Jupyter Notebook http://localhost:8888/notebooks/DSBDA-Assignment-4...
In [21]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
In [22]: df = pd.read_csv("HousingData.csv")
In [23]: df
Out[23]:
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90
... ... ... ... ... ... ... ... ... ... ... ... ...
501 0.06263 0.0 11.93 0.0 0.573 6.593 69.1 2.4786 1 273 21.0 391.99
502 0.04527 0.0 11.93 0.0 0.573 6.120 76.7 2.2875 1 273 21.0 396.90
503 0.06076 0.0 11.93 0.0 0.573 6.976 91.0 2.1675 1 273 21.0 396.90
504 0.10959 0.0 11.93 0.0 0.573 6.794 89.3 2.3889 1 273 21.0 393.45
505 0.04741 0.0 11.93 0.0 0.573 6.030 NaN 2.5050 1 273 21.0 396.90
506 rows × 14 columns
In [24]: df.head()
Out[24]:
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90
1 of 5 27/02/25, 11:30
In [25]: df.tail()
Out[25]:
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT
501 0.06263 0.0 11.93 0.0 0.573 6.593 69.1 2.4786 1 273 21.0 391.99
502 0.04527 0.0 11.93 0.0 0.573 6.120 76.7 2.2875 1 273 21.0 396.90
503 0.06076 0.0 11.93 0.0 0.573 6.976 91.0 2.1675 1 273 21.0 396.90
504 0.10959 0.0 11.93 0.0 0.573 6.794 89.3 2.3889 1 273 21.0 393.45
505 0.04741 0.0 11.93 0.0 0.573 6.030 NaN 2.5050 1 273 21.0 396.90
In [26]: df.isnull().sum()
Out[26]: CRIM 20
ZN 20
INDUS 20
CHAS 20
NOX 0
RM 0
AGE 20
DIS 0
RAD 0
TAX 0
PTRATIO 0
B 0
LSTAT 20
MEDV 0
dtype: int64
In [27]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CRIM 486 non-null float64
1 ZN 486 non-null float64
2 INDUS 486 non-null float64
3 CHAS 486 non-null float64
4 NOX 506 non-null float64
5 RM 506 non-null float64
6 AGE 486 non-null float64
7 DIS 506 non-null float64
8 RAD 506 non-null int64
9 TAX 506 non-null int64
10 PTRATIO 506 non-null float64
11 B 506 non-null float64
12 LSTAT 486 non-null float64
13 MEDV 506 non-null float64
dtypes: float64(12), int64(2)
In [28]: df.describe()
Out[28]:
CRIM ZN INDUS CHAS NOX RM AGE
count 486.000000 486.000000 486.000000 486.000000 506.000000 506.000000 486.000000
mean 3.611874 11.211934 11.083992 0.069959 0.554695 6.284634 68.518519
std 8.720192 23.388876 6.835896 0.255340 0.115878 0.702617 27.999513
min 0.006320 0.000000 0.460000 0.000000 0.385000 3.561000 2.900000
25% 0.081900 0.000000 5.190000 0.000000 0.449000 5.885500 45.175000
50% 0.253715 0.000000 9.690000 0.000000 0.538000 6.208500 76.800000
75% 3.560263 12.500000 18.100000 0.000000 0.624000 6.623500 93.975000
max 88.976200 100.000000 27.740000 1.000000 0.871000 8.780000 100.000000
In [29]: df.fillna(df.median(numeric_only=True), inplace=True)
In [36]: df.isnull().sum()
Out[36]: CRIM 0
ZN 0
INDUS 0
CHAS 0
NOX 0
RM 0
AGE 0
DIS 0
RAD 0
TAX 0
PTRATIO 0
B 0
LSTAT 0
MEDV 0
dtype: int64
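The median fill above uses medians computed over every row, so rows that later land in the test set also influence the fill values. A minimal sketch of one alternative, assuming the split is done first and the medians come from the training rows only. The toy frame and the names `toy`, `X_tr`, `X_te` and `train_medians` are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame standing in for HousingData.csv (values are illustrative only).
toy = pd.DataFrame({
    "AGE": [65.2, 78.9, np.nan, 45.8, 54.2, 58.7, np.nan, 96.1],
    "RM": [6.575, 6.421, 7.185, 6.998, 7.147, 6.430, 6.012, 6.172],
    "MEDV": [24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1],
})

X_toy = toy.drop(columns=["MEDV"])
y_toy = toy["MEDV"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=0
)

# Medians are taken from the training rows only, then applied to both splits.
train_medians = X_tr.median(numeric_only=True)
X_tr_filled = X_tr.fillna(train_medians)
X_te_filled = X_te.fillna(train_medians)

# Count of remaining missing values in each split.
print(X_tr_filled.isnull().sum().sum(), X_te_filled.isnull().sum().sum())
```

On a dataset of this size the difference in the fitted model is likely small, but the pattern keeps the test rows untouched until evaluation.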
In [30]: X = df.drop(columns=['MEDV'])
y = df['MEDV']
In [31]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)  # any further arguments were cut off in the export
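The split call is cut off in this export, so it is not known whether a seed was passed. A small sketch, assuming a fixed `random_state` is wanted so that the reported metrics can be reproduced. The seed value 42 and the arrays `X_demo` and `y_demo` are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten toy samples with two features each.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# The same seed gives the same shuffle, so both calls return identical splits.
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same, split_a[0].shape, split_a[1].shape)
```

Without `random_state`, each run shuffles differently and the MSE and R2 values can change from run to run.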
In [32]: model = LinearRegression()
model.fit(X_train, y_train)
Out[32]: LinearRegression()
In [33]: y_pred = model.predict(X_test)
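After fitting, the learned weights can be read from `coef_` and `intercept_`. A minimal sketch on synthetic, noise-free data, so the recovered coefficients can be checked against the known ones. The frame `X_demo` and the weights 5.0, -0.5 and 10.0 are made up for illustration and say nothing about the housing model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "RM": rng.uniform(4, 8, 50),
    "LSTAT": rng.uniform(2, 30, 50),
})
# Noise-free target built from known weights.
y_demo = 5.0 * X_demo["RM"] - 0.5 * X_demo["LSTAT"] + 10.0

demo_model = LinearRegression().fit(X_demo, y_demo)

# Pair each coefficient with its column name for readability.
coef_table = pd.Series(demo_model.coef_, index=X_demo.columns)
print(coef_table)
print("intercept:", demo_model.intercept_)
```

The same pairing, `pd.Series(model.coef_, index=X.columns)`, should work on the fitted housing model, since `X` is a DataFrame with named columns.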
In [34]: mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared (R2): {r2:.2f}")
Mean Squared Error: 25.00
R-squared (R2): 0.66
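To make the two metrics concrete, here is a sketch that computes them by hand with NumPy: MSE is the mean of squared residuals, and R2 is one minus the ratio of the residual sum of squares to the total sum of squares. The function name `mse_and_r2` and the demo values are illustrative.

```python
import numpy as np

def mse_and_r2(y_true, y_hat):
    """Return (MSE, R2) for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    residuals = y_true - y_hat
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mse, r2

y_true_demo = [20.0, 25.0, 30.0, 35.0]
y_hat_demo = [22.0, 24.0, 29.0, 37.0]
mse_demo, r2_demo = mse_and_r2(y_true_demo, y_hat_demo)
rmse_demo = np.sqrt(mse_demo)  # same units as the target
print(mse_demo, r2_demo, rmse_demo)
```

For the reported MSE of 25.00, the square root is about 5.00 in the units of MEDV, which the notebook later treats as thousands of dollars.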
In [35]: plt.figure(figsize=(8, 6))
sns.scatterplot(x=y_test, y=y_pred, color='blue', alpha=0.6)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)])  # color and any style arguments were cut off in the export
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual vs Predicted Prices")
plt.show()
In [41]: # Take user input for all features
print("Enter the house details to predict the price:")
CRIM = float(input("Crime rate per capita: "))
ZN = float(input("Proportion of residential land zoned for large lots: "))
INDUS = float(input("Proportion of non-retail business acres per town: "))
CHAS = float(input("Charles River (1 if bounds river, else 0): "))
NOX = float(input("Nitrogen oxide concentration (pollution level): "))
RM = float(input("Average number of rooms per dwelling: "))
AGE = float(input("Proportion of owner-occupied units built before 1940: "))
DIS = float(input("Weighted distance to employment centers: "))
RAD = int(input("Index of accessibility to highways: "))
TAX = int(input("Property tax rate per $10,000: "))
PTRATIO = float(input("Pupil-teacher ratio by town: "))
B = float(input("Proportion of Black residents: "))  # the dataset's B column is 1000 * (Bk - 0.63)**2, where Bk is that proportion
LSTAT = float(input("Lower status population percentage: "))
# Store input values in a DataFrame
user_data = pd.DataFrame([[CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS,
RAD, TAX, PTRATIO, B, LSTAT]],
columns=X.columns)
# Predict the house price
predicted_price = model.predict(user_data)
# Display the result
print(f"\nPredicted House Price: ${predicted_price[0] * 1000:.2f}")
Enter the house details to predict the price:
Crime rate per capita: 12
Proportion of residential land zoned for large lots: 42
Proportion of non-retail business acres per town: 52
Charles River (1 if bounds river, else 0): 45
Nitrogen oxide concentration (pollution level): 23
Average number of rooms per dwelling: 5
Proportion of owner-occupied units built before 1940: 10
Weighted distance to employment centers: 45
Index of accessibility to highways: 15
Property tax rate per $10,000: 5
Pupil-teacher ratio by town: 5
Proportion of Black residents: 200
Lower status population percentage: 20
Predicted House Price: $-245333.47
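The negative price most likely comes from inputs far outside the data the model was trained on: a linear model extrapolates without limit. A minimal sketch of a range check, assuming the (min, max) pairs from the `df.describe()` output above are used as bounds. Only the seven columns visible in that output are covered, and the names `OBSERVED_RANGES` and `out_of_range` are hypothetical helpers.

```python
# Observed (min, max) per feature, copied from the df.describe() output above.
OBSERVED_RANGES = {
    "CRIM": (0.00632, 88.9762),
    "ZN": (0.0, 100.0),
    "INDUS": (0.46, 27.74),
    "CHAS": (0.0, 1.0),
    "NOX": (0.385, 0.871),
    "RM": (3.561, 8.78),
    "AGE": (2.9, 100.0),
}

def out_of_range(values, ranges=OBSERVED_RANGES):
    """Return the feature names whose value falls outside the observed range."""
    flagged = []
    for name, (low, high) in ranges.items():
        if name in values and not (low <= values[name] <= high):
            flagged.append(name)
    return flagged

# The values entered in the session above, for the seven covered features.
entered = {"CRIM": 12, "ZN": 42, "INDUS": 52, "CHAS": 45,
           "NOX": 23, "RM": 5, "AGE": 10}
print(out_of_range(entered))  # ['INDUS', 'CHAS', 'NOX']
```

A check like this could run before `model.predict` and ask for the flagged values again, rather than returning a prediction for a house that does not resemble anything in the dataset.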