DSBDA Assignment 4 Jupyter Notebook

The document is a Jupyter Notebook for an assignment involving housing data analysis using Python libraries such as pandas, numpy, and sklearn. It includes data loading, preprocessing, model training with linear regression, and prediction of house prices based on user input. The model evaluation shows a Mean Squared Error of 25.00 and an R-squared value of 0.66.

Uploaded by

sumeet


DSBDA-Assignment-4 - Jupyter Notebook http://localhost:8888/notebooks/DSBDA-Assignment-4...

In [21]: import numpy as np

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

In [22]: df = pd.read_csv("HousingData.csv")

In [23]: df

Out[23]:
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B

0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90

1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90

2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83

3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63

4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90

... ... ... ... ... ... ... ... ... ... ... ... ...

501 0.06263 0.0 11.93 0.0 0.573 6.593 69.1 2.4786 1 273 21.0 391.99

502 0.04527 0.0 11.93 0.0 0.573 6.120 76.7 2.2875 1 273 21.0 396.90

503 0.06076 0.0 11.93 0.0 0.573 6.976 91.0 2.1675 1 273 21.0 396.90

504 0.10959 0.0 11.93 0.0 0.573 6.794 89.3 2.3889 1 273 21.0 393.45

505 0.04741 0.0 11.93 0.0 0.573 6.030 NaN 2.5050 1 273 21.0 396.90

506 rows × 14 columns

In [24]: df.head()

Out[24]:
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT

0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90

1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90

2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83

3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63

4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90

In [25]: df.tail()

Out[25]:
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT

501 0.06263 0.0 11.93 0.0 0.573 6.593 69.1 2.4786 1 273 21.0 391.99

502 0.04527 0.0 11.93 0.0 0.573 6.120 76.7 2.2875 1 273 21.0 396.90

503 0.06076 0.0 11.93 0.0 0.573 6.976 91.0 2.1675 1 273 21.0 396.90

504 0.10959 0.0 11.93 0.0 0.573 6.794 89.3 2.3889 1 273 21.0 393.45

505 0.04741 0.0 11.93 0.0 0.573 6.030 NaN 2.5050 1 273 21.0 396.90

In [26]: df.isnull().sum()

Out[26]: CRIM 20
ZN 20
INDUS 20
CHAS 20
NOX 0
RM 0
AGE 20
DIS 0
RAD 0
TAX 0
PTRATIO 0
B 0
LSTAT 20
MEDV 0
dtype: int64

In [27]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CRIM 486 non-null float64
1 ZN 486 non-null float64
2 INDUS 486 non-null float64
3 CHAS 486 non-null float64
4 NOX 506 non-null float64
5 RM 506 non-null float64
6 AGE 486 non-null float64
7 DIS 506 non-null float64
8 RAD 506 non-null int64
9 TAX 506 non-null int64
10 PTRATIO 506 non-null float64
11 B 506 non-null float64
12 LSTAT 486 non-null float64
13 MEDV 506 non-null float64
dtypes: float64(12), int64(2)

In [28]: df.describe()

Out[28]:
CRIM ZN INDUS CHAS NOX RM AGE

count 486.000000 486.000000 486.000000 486.000000 506.000000 506.000000 486.000000

mean 3.611874 11.211934 11.083992 0.069959 0.554695 6.284634 68.518519

std 8.720192 23.388876 6.835896 0.255340 0.115878 0.702617 27.999513

min 0.006320 0.000000 0.460000 0.000000 0.385000 3.561000 2.900000

25% 0.081900 0.000000 5.190000 0.000000 0.449000 5.885500 45.175000

50% 0.253715 0.000000 9.690000 0.000000 0.538000 6.208500 76.800000

75% 3.560263 12.500000 18.100000 0.000000 0.624000 6.623500 93.975000

max 88.976200 100.000000 27.740000 1.000000 0.871000 8.780000 100.000000

In [29]: df.fillna(df.median(numeric_only=True), inplace=True)

In [36]: df.isnull().sum()

Out[36]: CRIM 0
ZN 0
INDUS 0
CHAS 0
NOX 0
RM 0
AGE 0
DIS 0
RAD 0
TAX 0
PTRATIO 0
B 0
LSTAT 0
MEDV 0
dtype: int64
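
The imputation step above replaces each missing value with the median of its column. A minimal sketch of the same call on a toy frame follows; the frame `toy` and its values are made up for illustration and are not rows the notebook relies on.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing AGE value (illustration only).
toy = pd.DataFrame({
    "AGE": [65.2, np.nan, 61.1, 45.8],
    "RM": [6.575, 6.421, 7.185, 6.998],
})

# Column-wise medians; NaN entries are skipped when the median is computed.
medians = toy.median(numeric_only=True)

# fillna with a Series fills each column using the entry with the matching label.
filled = toy.fillna(medians)

print(filled)
print(filled.isnull().sum())
```

Returning a new frame instead of passing `inplace=True` keeps the original data available for comparison, which can help when checking how much the imputation changed a column.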

In [30]: X = df.drop(columns=['MEDV'])
y = df['MEDV']

In [31]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2
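
The split call above is cut off in this copy after `test_size=0.2`, so any further arguments are unknown. The sketch below only illustrates what `test_size=0.2` does to a dataset of 506 rows; the synthetic arrays and the `random_state=0` seed are assumptions, not values taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins with the same number of rows as the housing data.
X_demo = np.arange(506 * 2).reshape(506, 2)
y_demo = np.arange(506)

# random_state is an assumed seed, used here only to make the split repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

print(len(X_tr), len(X_te))
```

With 506 rows, scikit-learn rounds the test share up, giving 102 test rows and 404 training rows. Without a fixed `random_state`, the rows assigned to each side change between runs, and so do the reported metrics.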

In [32]: model = LinearRegression()

model.fit(X_train, y_train)

Out[32]: LinearRegression()

In [33]: y_pred = model.predict(X_test)

In [34]: mse = mean_squared_error(y_test, y_pred)

r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")

print(f"R-squared (R2): {r2:.2f}")

Mean Squared Error: 25.00

R-squared (R2): 0.66
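
To make the two reported metrics concrete, the sketch below computes MSE and R-squared by hand with NumPy on a small made-up example. The helper `mse_and_r2` and the sample values are hypothetical. Because the notebook multiplies predictions by 1000 to print dollars, an MSE of 25.00 corresponds to a root mean squared error of 5.0, or roughly $5,000 per house.

```python
import numpy as np

def mse_and_r2(y_true, y_pred):
    """Return (MSE, R-squared) using the textbook definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)                 # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return mse, 1.0 - ss_res / ss_tot

# Made-up prices in thousands of dollars (illustration only).
actual = [20.0, 25.0, 30.0, 35.0]
predicted = [22.0, 24.0, 29.0, 37.0]

mse_demo, r2_demo = mse_and_r2(actual, predicted)
print(f"MSE: {mse_demo:.2f}, R2: {r2_demo:.2f}")

# RMSE implied by the notebook's reported MSE of 25.00.
rmse_reported = np.sqrt(25.00)
print(f"RMSE for MSE=25.00: {rmse_reported:.1f}")
```

RMSE is often easier to read than MSE because it is in the same units as the target.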

In [35]: plt.figure(figsize=(8, 6))

sns.scatterplot(x=y_test, y=y_pred, color='blue', alpha=0.6)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual vs Predicted Prices")
plt.show()

In [41]: # Take user input for all features

print("Enter the house details to predict the price:")

CRIM = float(input("Crime rate per capita: "))

ZN = float(input("Proportion of residential land zoned for large lots: "))
INDUS = float(input("Proportion of non-retail business acres per town: "))
CHAS = float(input("Charles River (1 if bounds river, else 0): "))
NOX = float(input("Nitrogen oxide concentration (pollution level): "))
RM = float(input("Average number of rooms per dwelling: "))
AGE = float(input("Proportion of owner-occupied units built before 1940: "))
DIS = float(input("Weighted distance to employment centers: "))
RAD = int(input("Index of accessibility to highways: "))
TAX = int(input("Property tax rate per $10,000: "))
PTRATIO = float(input("Pupil-teacher ratio by town: "))
B = float(input("Proportion of Black residents: "))
LSTAT = float(input("Lower status population percentage: "))

# Store input values in a DataFrame

user_data = pd.DataFrame([[CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS,
                           RAD, TAX, PTRATIO, B, LSTAT]],
                         columns=X.columns)

# Predict the house price

predicted_price = model.predict(user_data)

# Display the result

print(f"\nPredicted House Price: ${predicted_price[0] * 1000:.2f}")

Enter the house details to predict the price:

Crime rate per capita: 12
Proportion of residential land zoned for large lots: 42
Proportion of non-retail business acres per town: 52
Charles River (1 if bounds river, else 0): 45
Nitrogen oxide concentration (pollution level): 23
Average number of rooms per dwelling: 5
Proportion of owner-occupied units built before 1940: 10
Weighted distance to employment centers: 45
Index of accessibility to highways: 15
Property tax rate per $10,000: 5
Pupil-teacher ratio by town: 5
Proportion of Black residents: 200
Lower status population percentage: 20

Predicted House Price: $-245333.47
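
The negative price above is likely a result of extrapolation: several of the entered values fall far outside the ranges shown in the `df.describe()` output (for example, CHAS is entered as 45 although it only takes the values 0 and 1, and NOX as 23 although its maximum is 0.871). A minimal sketch of a range check that could run before prediction follows. The helper `out_of_range` is hypothetical, and the ranges cover only the seven columns whose min and max are visible in this copy.

```python
# Min/max per feature, copied from the describe() output in the notebook.
observed_range = {
    "CRIM": (0.00632, 88.9762),
    "ZN": (0.0, 100.0),
    "INDUS": (0.46, 27.74),
    "CHAS": (0.0, 1.0),
    "NOX": (0.385, 0.871),
    "RM": (3.561, 8.78),
    "AGE": (2.9, 100.0),
}

# The values typed in during the session shown above.
user_values = {
    "CRIM": 12, "ZN": 42, "INDUS": 52, "CHAS": 45,
    "NOX": 23, "RM": 5, "AGE": 10,
}

def out_of_range(values, ranges):
    """Return the names of features whose value lies outside the observed range."""
    return [
        name for name, value in values.items()
        if name in ranges and not (ranges[name][0] <= value <= ranges[name][1])
    ]

flagged = out_of_range(user_values, observed_range)
print("Outside training range:", flagged)
```

A linear model has no built-in bounds, so inputs this far from the training data can push the prediction below zero. Warning the user or rejecting such inputs is one option; clipping them to the observed range is another.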
