1. Importing Libraries
We will import NumPy, pandas, Matplotlib and scikit-learn for this example.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.datasets import fetch_california_housing
2. Load Dataset
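We fetch the California Housing dataset with fetch_california_housing from sklearn.datasets (it is downloaded and cached on first use). The features (such as median income and average number of rooms per household) are stored in X, and the target (median house value, in units of $100,000) is stored in y.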
california_housing = fetch_california_housing()
X = pd.DataFrame(california_housing.data, columns=california_housing.feature_names)
y = pd.Series(california_housing.target)
3. Select Features for Visualization
We keep only two features, MedInc (median income) and AveRooms (average number of rooms), so that the two inputs and the predicted price fit together in a single 3D plot.
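With only these two inputs, the model fitted in the next steps is a plane over the (MedInc, AveRooms) input space. This is the standard multiple linear regression form, written here with our own symbol names:

```latex
\hat{y} = \beta_0 + \beta_1 \, x_{\text{MedInc}} + \beta_2 \, x_{\text{AveRooms}}
```

In scikit-learn, $\beta_0$ is stored in `model.intercept_` and $(\beta_1, \beta_2)$ in `model.coef_` after fitting.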
X = X[['MedInc', 'AveRooms']]
4. Train-Test Split
We will use 80% of the data for training and 20% for testing; random_state=42 makes the shuffle, and therefore the split, reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
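To see what test_size=0.2 means in row counts, here is a small sketch that splits synthetic indices standing in for the 20,640 rows of the California Housing data (no download needed; X_demo and y_demo are illustrative names, not part of the tutorial). It also checks that the two parts are disjoint, that rows of X and y stay aligned, and that the same random_state reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same number of rows as California Housing;
# X and y hold the same values so we can check that rows stay paired.
n_samples = 20640
X_demo = np.arange(n_samples).reshape(-1, 1)
y_demo = np.arange(n_samples)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)
print(len(X_tr), len(X_te))  # 16512 4128

# No row appears in both parts, and each X row still matches its y value
print(len(np.intersect1d(X_tr, X_te)))  # 0
print(np.array_equal(X_tr[:, 0], y_tr))  # True

# The same random_state gives the same shuffle, hence the same split
_, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)
print(np.array_equal(X_te, X_te2))  # True
```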
5. Initialize and Train Model
model = LinearRegression()
model.fit(X_train, y_train)
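Under the hood, LinearRegression fits by ordinary least squares: it chooses the intercept and coefficients that minimize the sum of squared residuals. The sketch below checks this on synthetic two-feature data (the "true" coefficients 0.5, 0.4 and -0.1 are made up for illustration, not taken from the housing data) by comparing the fitted model against NumPy's np.linalg.lstsq with an explicit column of ones for the intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data from a known plane plus a little noise (illustrative values)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = (0.5 + 0.4 * X_demo[:, 0] - 0.1 * X_demo[:, 1]
          + rng.normal(scale=0.05, size=200))

model_demo = LinearRegression().fit(X_demo, y_demo)

# Ordinary least squares by hand: prepend a column of ones for the intercept
A = np.column_stack([np.ones(len(X_demo)), X_demo])
beta, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

print(model_demo.intercept_, model_demo.coef_)
print(beta)  # [intercept, coef_1, coef_2], agreeing with the line above
```

Because the noise is small, both estimates land close to the made-up coefficients used to generate the data.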
6. Make Predictions
y_pred = model.predict(X_test)
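Prediction simply evaluates the fitted plane at each input row, that is, intercept_ plus the dot product of the row with coef_. The sketch below confirms this on synthetic data (illustrative values only) and also shows one common way to score the predictions: model.score returns the coefficient of determination R², the same value sklearn.metrics.r2_score computes. In this tutorial's variables that would be model.score(X_test, y_test), which the original steps do not show.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic two-feature data (coefficients chosen for illustration)
rng = np.random.default_rng(1)
X_demo = rng.uniform(0, 10, size=(50, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(size=50)

model_demo = LinearRegression().fit(X_demo, y_demo)
y_hat = model_demo.predict(X_demo)

# predict() is the fitted plane evaluated row by row
manual = model_demo.intercept_ + X_demo @ model_demo.coef_
print(np.allclose(y_hat, manual))  # True

# model.score() reports R^2; both printed values are equal
print(model_demo.score(X_demo, y_demo), r2_score(y_demo, y_hat))
```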
7. Visualizing the Best-Fit Plane in 3D
fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111, projection='3d')
# Plot the actual test-set prices against the two features
ax.scatter(X_test['MedInc'], X_test['AveRooms'],
           y_test, color='blue', label='Actual Data')
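# Build a 100 x 100 grid spanning the test-set range of both features
x1_range = np.linspace(X_test['MedInc'].min(), X_test['MedInc'].max(), 100)
x2_range = np.linspace(X_test['AveRooms'].min(), X_test['AveRooms'].max(), 100)
x1, x2 = np.meshgrid(x1_range, x2_range)
# Predict on the flattened grid, keeping the training column names so
# scikit-learn does not warn about missing feature names
grid = pd.DataFrame({'MedInc': x1.ravel(), 'AveRooms': x2.ravel()})
z = model.predict(grid).reshape(x1.shape)
# Draw the fitted plane as a semi-transparent surface
ax.plot_surface(x1, x2, z, color='red', alpha=0.5, rstride=100, cstride=100)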
ax.set_xlabel('Median Income')
ax.set_ylabel('Average Rooms')
ax.set_zlabel('House Price ($100,000s)')
ax.set_title('Multiple Linear Regression Best-Fit Plane (3D)')
ax.legend()
plt.show()
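The surface is drawn by building a grid with np.meshgrid, flattening it into a table of (MedInc, AveRooms) rows, predicting, and reshaping the predictions back to the grid's shape. The sketch below walks through those shapes on a tiny synthetic model (the four training rows and the coefficients 0.2, 0.4, 0.1 are made up for illustration). Passing a DataFrame with the training column names, rather than a bare NumPy array, avoids scikit-learn's warning that X lacks the feature names the model was fitted with.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Tiny synthetic model on two named features (illustrative values only);
# y is an exact plane, so the fit recovers 0.2, 0.4 and 0.1
X_demo = pd.DataFrame({'MedInc': [1.0, 2.0, 3.0, 4.0],
                       'AveRooms': [4.0, 5.0, 7.0, 6.0]})
y_demo = 0.2 + 0.4 * X_demo['MedInc'] + 0.1 * X_demo['AveRooms']
model_demo = LinearRegression().fit(X_demo, y_demo)

# meshgrid(x, y) returns arrays of shape (len(y), len(x))
x1, x2 = np.meshgrid(np.linspace(1, 4, 4), np.linspace(4, 7, 3))
print(x1.shape)  # (3, 4)

# Flatten the grid into a 12-row table, predict, then fold back to (3, 4)
grid = pd.DataFrame({'MedInc': x1.ravel(), 'AveRooms': x2.ravel()})
z = model_demo.predict(grid).reshape(x1.shape)

# Each z[i, j] is the plane evaluated at (x1[i, j], x2[i, j])
expected = (model_demo.intercept_
            + model_demo.coef_[0] * x1 + model_demo.coef_[1] * x2)
print(np.allclose(z, expected))  # True
```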