EMAN AMIR (11)
DATA MINING
Submitted to: Sir Khurshid
3-1-2025
Qno1: Predict house prices using regression in Python (scikit-learn).
Here's a step-by-step example of predicting house prices using regression in Python
with the scikit-learn library.
Steps:
1. Import necessary libraries.
2. Load or create a dataset.
3. Preprocess the data.
4. Split the dataset into training and test sets.
5. Train a regression model (e.g., Linear Regression).
6. Evaluate the model.
7. Make predictions.
Here's the code:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
# Example dataset (replace with a real dataset)
data = {
    'Size (sq ft)': [1500, 1600, 1700, 1800, 1900, 2000, 2100],
    'Bedrooms': [3, 3, 4, 4, 4, 5, 5],
    'Age (years)': [10, 15, 20, 10, 5, 5, 3],
    'Price (USD)': [300000, 320000, 340000, 360000, 400000, 420000, 450000]
}
df = pd.DataFrame(data)
# Features and target variable
X = df[['Size (sq ft)', 'Bedrooms', 'Age (years)']]
y = df['Price (USD)']
# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Predict on the test set
y_pred = model.predict(X_test)
# Evaluate the model
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
print("Model Evaluation:")
print(f"Mean Absolute Error (MAE): {mae}")
print(f"Mean Squared Error (MSE): {mse}")
print(f"Root Mean Squared Error (RMSE): {rmse}")
print(f"R^2 Score: {r2}")
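# Example prediction: 2000 sq ft, 4 bedrooms, 8 years old.
# A DataFrame with the training column names is used so that scikit-learn
# does not warn about missing feature names.
example_house = pd.DataFrame([[2000, 4, 8]], columns=X.columns)
predicted_price = model.predict(example_house)
print(f"Predicted price for the house: ${predicted_price[0]:,.2f}")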
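Step 3 in the list (preprocessing) has no counterpart in the code above, because the example dataset is already clean and numeric. The following is a minimal sketch of what that step could look like, not part of the original solution. It assumes one missing age value (introduced here only for illustration) and uses a scikit-learn pipeline to fill it with the median and standardize the features before fitting the same Linear Regression model.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same toy data as above, except one 'Age (years)' value is set to NaN
# (an assumption made here to show how a missing entry could be handled).
df = pd.DataFrame({
    'Size (sq ft)': [1500, 1600, 1700, 1800, 1900, 2000, 2100],
    'Bedrooms': [3, 3, 4, 4, 4, 5, 5],
    'Age (years)': [10, 15, np.nan, 10, 5, 5, 3],
    'Price (USD)': [300000, 320000, 340000, 360000, 400000, 420000, 450000],
})
X = df[['Size (sq ft)', 'Bedrooms', 'Age (years)']]
y = df['Price (USD)']

# Preprocessing and model chained together:
# 1. fill missing values with the column median,
# 2. scale each feature to zero mean and unit variance,
# 3. fit a linear regression on the transformed features.
pipeline = make_pipeline(
    SimpleImputer(strategy='median'),
    StandardScaler(),
    LinearRegression(),
)
pipeline.fit(X, y)

# New data passes through the same preprocessing automatically.
example_house = pd.DataFrame([[2000, 4, 8]], columns=X.columns)
predicted_price = pipeline.predict(example_house)
print(f"Predicted price (with preprocessing): ${predicted_price[0]:,.2f}")
```

Keeping the preprocessing inside the pipeline means the imputer and scaler are fitted only on the data passed to fit, which helps avoid leaking test-set information when the pipeline is combined with train_test_split.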
Explanation:
1. Dataset: The dataset is a small example; in practice, you would use a more
extensive dataset.
2. Features and Target: The features (X) include size, bedrooms, and age, and the
target variable (y) is the price.
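3. Splitting Data: The data is divided into training and testing sets with
test_size=0.2 (a nominal 80-20 split). With only 7 rows, scikit-learn rounds the
test set up to 2 rows, leaving 5 rows for training.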
4. Model: A Linear Regression model is used for simplicity.
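5. Evaluation: Common metrics (MAE, MSE, RMSE, R²) measure the model's error on
the test set. Because the test set here has only 2 rows, the values are not
reliable estimates of real-world performance.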
6. Prediction: The model predicts house prices based on new data.
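To make the evaluation metrics in point 5 concrete, the sketch below recomputes them by hand with NumPy. The actual and predicted prices are hypothetical values chosen for easy arithmetic; they are not output from the model above.

```python
import numpy as np

# Hypothetical actual and predicted prices (illustration only).
y_true = np.array([300000.0, 400000.0])
y_hat = np.array([310000.0, 380000.0])

errors = y_true - y_hat

# MAE: average size of the errors, in USD.
mae = np.mean(np.abs(errors))

# MSE: average squared error; penalizes large errors more heavily.
mse = np.mean(errors ** 2)

# RMSE: square root of MSE, back in USD.
rmse = np.sqrt(mse)

# R^2: 1 minus (residual sum of squares / total sum of squares).
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MAE:  {mae:,.2f}")
print(f"MSE:  {mse:,.2f}")
print(f"RMSE: {rmse:,.2f}")
print(f"R^2:  {r2:.3f}")
```

These are the same quantities that mean_absolute_error, mean_squared_error and r2_score return; MAE and RMSE are in the same unit as the price, which makes them the easiest to interpret.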