Machine Learning Lab Manual BCSL606
TECHNOLOGY-NORTH CAMPUS
Off International Airport Road, Kundana, Bengaluru - 562110
PREPARED BY:
Prof. Prathima G
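Lab Experiment 1:
Develop a program to create histograms for all numerical features and analyze the distribution of
each feature. Generate box plots for all numerical features and identify any outliers. Use California Housing
dataset.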
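Before the full program, the outlier rule it relies on can be sketched on a small list. This is a minimal illustration using only the standard library, assuming the usual 1.5 x IQR fences; the function name iqr_outliers and the sample values are hypothetical and are not taken from the California Housing data.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" interpolates linearly between data points, which
    # should agree with the default behaviour of pandas' quantile()
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample: nine typical values and one extreme value
sample = [2.1, 2.4, 2.5, 2.7, 2.9, 3.0, 3.1, 3.3, 3.6, 15.0]
print("Outliers:", iqr_outliers(sample))
```

The same fences are what a box plot draws as whisker limits, so the points flagged here correspond to the points drawn individually beyond the whiskers.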
Program:
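import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing

# Step 1: Load the California Housing dataset
data = fetch_california_housing(as_frame=True)
housing_df = data.frame
numerical_features = housing_df.select_dtypes(include=[np.number]).columns

# Step 2: Plot a histogram for each numerical feature
plt.figure(figsize=(15, 10))
for i, feature in enumerate(numerical_features):
    plt.subplot(3, 3, i + 1)
    sns.histplot(housing_df[feature], kde=True, bins=30, color='blue')
    plt.title(f'Distribution of {feature}')
plt.tight_layout()
plt.show()

# Step 3: Draw a box plot for each numerical feature
plt.figure(figsize=(15, 10))
for i, feature in enumerate(numerical_features):
    plt.subplot(3, 3, i + 1)
    sns.boxplot(x=housing_df[feature], color='orange')
    plt.title(f'Box Plot of {feature}')
plt.tight_layout()
plt.show()

# Step 4: Count outliers per feature using the IQR rule
print("Outliers Detection:")
outliers_summary = {}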
Dept. Of CSE, CITNC Page|2
Machine Learning Laboratory (BCSL606)
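for feature in numerical_features:
    Q1 = housing_df[feature].quantile(0.25)
    Q3 = housing_df[feature].quantile(0.75)
    IQR = Q3 - Q1
    # Values more than 1.5 * IQR beyond the quartiles are treated as outliers
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    outliers = housing_df[(housing_df[feature] < lower_bound) | (housing_df[feature] > upper_bound)]
    outliers_summary[feature] = len(outliers)
    print(f"{feature}: {len(outliers)} outliers")

print("\nDataset Summary:")
print(housing_df.describe())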
OUTPUT:
Lab Experiment 2:
Develop a program to compute the correlation matrix to understand the relationships between
pairs of features. Visualize the correlation matrix using a heatmap to know which variables have
strong positive/negative correlations. Create a pair plot to visualize pairwise relationships between features.
Use California Housing dataset.
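To make the idea of a correlation matrix concrete before the full program, the sketch below computes Pearson correlations by hand for three small columns and picks the most strongly related pair. The column names and values are hypothetical and only loosely inspired by housing data; the helper pearson is an illustrative name, not part of any library.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical toy columns
columns = {
    "rooms": [3, 4, 5, 6, 7],
    "price": [1.0, 1.5, 2.1, 2.4, 3.0],
    "age":   [40, 35, 22, 18, 5],
}
names = list(columns)

# Full symmetric matrix: entry [a][b] is the correlation of columns a and b
matrix = {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
for a in names:
    print(a.ljust(6), "  ".join(f"{matrix[a][b]:+.2f}" for b in names))

# Strongest off-diagonal pair by absolute correlation
pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
strongest = max(pairs, key=lambda p: abs(matrix[p[0]][p[1]]))
print("Strongest pair:", strongest)
```

Values near +1 or -1 on the heatmap correspond to pairs like these, where one column rises or falls almost linearly with the other; the diagonal is always 1.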
Program:
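import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

california_data = fetch_california_housing(as_frame=True)
data = california_data.frame
correlation_matrix = data.corr()

plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Matrix of California Housing Features')
plt.show()

# Pair plot of the pairwise relationships between features
sns.pairplot(data, diag_kind='kde', plot_kws={'alpha': 0.5})
plt.show()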
OUTPUT:
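Lab Experiment 3:
Develop a program to implement Principal Component Analysis (PCA) for reducing the dimensionality of the
Iris dataset from 4 features to 2.
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
data = iris.data
labels = iris.target
label_names = iris.target_names

# Project the 4 original features onto the first 2 principal components
pca = PCA(n_components=2)
data_reduced = pca.fit_transform(data)
reduced_df = pd.DataFrame(data_reduced, columns=['Principal Component 1', 'Principal Component 2'])
reduced_df['Label'] = labels

plt.figure(figsize=(8, 6))
colors = ['r', 'g', 'b']
for i, label in enumerate(np.unique(labels)):
    plt.scatter(
        reduced_df[reduced_df['Label'] == label]['Principal Component 1'],
        reduced_df[reduced_df['Label'] == label]['Principal Component 2'],
        label=label_names[label],
        color=colors[i])
plt.title('PCA on Iris Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.grid()
plt.show()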
OUTPUT:
Lab Experiment 4:
For a given set of training data examples stored in a .CSV file, implement and demonstrate the Find-S
algorithm to output a description of the set of all hypotheses consistent with the training examples.
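The generalisation step of Find-S can be sketched without a CSV file. The example below is a minimal illustration on in-memory tuples, assuming the positive class is labelled "Yes"; the function name find_s and the EnjoySport-style rows are assumptions for illustration and are not the contents of one.csv.

```python
def find_s(examples, positive_label="Yes"):
    """Return the most specific hypothesis covering all positive examples.

    `examples` is a list of (attribute_tuple, label) pairs.
    """
    hypothesis = None  # None = no positive example seen yet
    for attributes, label in examples:
        if label != positive_label:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            # The first positive example is copied as-is
            hypothesis = list(attributes)
        else:
            # Keep matching values, generalise mismatches to '?'
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis

# Hypothetical EnjoySport-style training data
training = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(training))
```

Once an attribute has been generalised to '?', later positive examples leave it as '?'; a hypothesis is only ever made more general, never more specific.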
Program:
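import pandas as pd

def find_s_algorithm(file_path):
    data = pd.read_csv(file_path)
    print("Training data:")
    print(data)
    attributes = data.columns[:-1]
    class_label = data.columns[-1]
    # None marks an attribute for which no positive example has been seen yet
    hypothesis = [None] * len(attributes)
    for _, row in data.iterrows():
        # Find-S generalises only from positive examples
        if row[class_label] == 'Yes':
            for i, value in enumerate(row[attributes]):
                if hypothesis[i] is None:
                    hypothesis[i] = value
                elif hypothesis[i] != value:
                    hypothesis[i] = '?'
    return hypothesis

file_path = 'e:/react-2025/python/one.csv'
hypothesis = find_s_algorithm(file_path)
print("\nThe final hypothesis is:", hypothesis)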
Training data:[one.csv]
OUTPUT:
Lab Experiment 5:
Develop a program to implement k-Nearest Neighbour algorithm to classify the randomly generated 100
values of x in the range of [0,1]. Perform the following based on dataset generated.
a) Label the first 50 points {x1,……,x50} as follows: if (xi ≤ 0.5), then xi ∊ Class1, else xi ∊ Class2
b) Classify the remaining points, x51,……,x100 using KNN. Perform this for k=1,2,3,4,5,20,30
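The voting step in part (b) can be sketched on a handful of fixed points, which makes the effect of k easier to follow than with random data. This is a minimal illustration using only the standard library; the function name knn_predict, the training values and the query 0.52 are hypothetical.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training indices by 1-D distance to the query and keep the k closest
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training points, labelled with the rule from part (a)
train_x = [0.10, 0.20, 0.35, 0.45, 0.55, 0.70, 0.80, 0.95]
train_y = ["Class1" if x <= 0.5 else "Class2" for x in train_x]

# Classify one point close to the class boundary for several values of k
predictions = {k: knn_predict(train_x, train_y, 0.52, k) for k in (1, 2, 3, 5)}
for k, label in predictions.items():
    print(f"k={k}: x=0.52 -> {label}")
```

For a query near the 0.5 boundary the predicted class can change with k, which is why the experiment asks for several values. With an even k a tie is possible; in this sketch the tie goes to the label of the nearer neighbour, because Counter.most_common returns the label it encountered first.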
Program:
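import numpy as np
import matplotlib.pyplot as plt
from collections import Counter

# Generate 100 random values in [0, 1] and label the first 50
data = np.random.rand(100)
labels = ["Class1" if x <= 0.5 else "Class2" for x in data[:50]]

def euclidean_distance(x1, x2):
    return abs(x1 - x2)

def knn_classifier(train_data, train_labels, test_point, k):
    distances = [(euclidean_distance(test_point, train_data[i]), train_labels[i]) for i
                 in range(len(train_data))]
    distances.sort(key=lambda x: x[0])
    k_nearest_neighbors = distances[:k]
    k_nearest_labels = [label for _, label in k_nearest_neighbors]
    return Counter(k_nearest_labels).most_common(1)[0][0]

train_data = data[:50]
train_labels = labels
test_data = data[50:]
k_values = [1, 2, 3, 4, 5, 20, 30]

print("Training dataset: First 50 points labeled based on the rule (x <= 0.5 -> Class1, x > 0.5 -> Class2)")
print("Testing dataset: Remaining 50 points to be classified\n")

results = {}
for k in k_values:
    print(f"Results for k = {k}:")
    classified_labels = [knn_classifier(train_data, train_labels, test_point, k) for
                         test_point in test_data]
    results[k] = classified_labels
    for i, label in enumerate(classified_labels, start=51):
        print(f"Point x{i} (value: {test_data[i - 51]:.4f}) is classified as {label}")
    print("\n")
print("Classification complete.\n")

# Plot the training points (level 0) and the classified test points (level 1) for each k
for k in k_values:
    classified_labels = results[k]
    class1_points = [test_data[i] for i in range(len(test_data)) if classified_labels[i]
                     == "Class1"]
    class2_points = [test_data[i] for i in range(len(test_data)) if classified_labels[i]
                     == "Class2"]
    plt.figure(figsize=(10, 6))
    plt.scatter(train_data, [0] * len(train_data),
                c=["blue" if label == "Class1" else "red" for label in train_labels],
                label="Training Data", marker="o")
    plt.scatter(class1_points, [1] * len(class1_points), c="blue", label="Class1 (Test)", marker="x")
    plt.scatter(class2_points, [1] * len(class2_points), c="red", label="Class2 (Test)", marker="x")
    plt.title(f"k-NN Classification Results for k = {k}")
    plt.xlabel("Data Points")
    plt.ylabel("Classification Level")
    plt.legend()
    plt.grid(True)
    plt.show()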
OUTPUT:
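Lab Experiment 6:
Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an
appropriate data set for your experiment and draw graphs.
Program:
import numpy as np
import matplotlib.pyplot as plt

def gaussian_kernel(x, xi, tau):
    return np.exp(-np.sum((x - xi) ** 2) / (2 * tau ** 2))

def locally_weighted_regression(x, X, y, tau):
    m = X.shape[0]
    weights = np.array([gaussian_kernel(x, X[i], tau) for i in range(m)])
    W = np.diag(weights)
    X_transpose_W = X.T @ W
    # Solve the weighted normal equations for the local parameters
    theta = np.linalg.pinv(X_transpose_W @ X) @ X_transpose_W @ y
    return x @ theta

np.random.seed(42)
# Synthetic noisy sine data (assumed; the original data-generation lines are missing)
X = np.linspace(0, 2 * np.pi, 100)
y = np.sin(X) + 0.1 * np.random.randn(100)
X_bias = np.c_[np.ones(X.shape), X]
x_test = np.linspace(0, 2 * np.pi, 200)
x_test_bias = np.c_[np.ones(x_test.shape), x_test]
tau = 0.5
y_pred = np.array([locally_weighted_regression(xi, X_bias, y, tau) for xi in x_test_bias])

plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='red', label='Training Data', alpha=0.7)
plt.plot(x_test, y_pred, color='blue', label=f'LWR Fit (tau={tau})', linewidth=2)
plt.xlabel('X', fontsize=12)
plt.ylabel('y', fontsize=12)
plt.title('Locally Weighted Regression', fontsize=14)
plt.legend(fontsize=10)
plt.grid(alpha=0.3)
plt.show()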
OUTPUT:
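The role of the weights and of tau in the locally weighted regression program above can be sketched in plain Python for one input variable. This is a minimal illustration, assuming a Gaussian kernel (the usual choice; the kernel is not visible in the surviving listing) and a straight-line local model; the names gaussian_weights and lwr_predict and the toy data are hypothetical.

```python
from math import exp

def gaussian_weights(xs, query, tau):
    """Weight of each training point: close to 1 near `query`, decaying with distance."""
    return [exp(-((x - query) ** 2) / (2 * tau ** 2)) for x in xs]

def lwr_predict(xs, ys, query, tau):
    """Fit a weighted least-squares line around `query` and evaluate it there."""
    w = gaussian_weights(xs, query, tau)
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw  # weighted mean of x
    my = sum(wi * y for wi, y in zip(w, ys)) / sw  # weighted mean of y
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    return my + slope * (query - mx)

# Hypothetical data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

for tau in (0.3, 1.0, 5.0):
    weights = gaussian_weights(xs, 2.0, tau)
    print(f"tau={tau}: weights around x=2 ->", [round(v, 3) for v in weights])
print("Prediction at x=2.5:", lwr_predict(xs, ys, 2.5, 0.5))
```

A small tau concentrates almost all weight on the nearest points, giving a flexible but noisier fit, while a large tau approaches ordinary linear regression. Because the toy data are exactly linear, the prediction recovers 2x + 1 up to floating-point error for any tau.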
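Lab Experiment 7:
Develop a program to demonstrate the working of Linear Regression and Polynomial Regression. Use the
California Housing dataset for Linear Regression and the Auto MPG dataset for Polynomial Regression.
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error, r2_score

def linear_regression_california():
    housing = fetch_california_housing(as_frame=True)
    X = housing.data[["AveRooms"]]
    y = housing.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    plt.scatter(X_test, y_test, color="blue", label="Actual")
    plt.plot(X_test, y_pred, color="red", label="Predicted")
    plt.xlabel("Average number of rooms (AveRooms)")
    plt.title("Linear Regression - California Housing Dataset")
    plt.legend()
    plt.show()
    print("Linear Regression - California Housing Dataset")
    print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
    print("R^2 Score:", r2_score(y_test, y_pred))

def polynomial_regression_auto_mpg():
    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
    # The file has nine whitespace-separated fields; '?' marks missing horsepower values
    column_names = ["mpg", "cylinders", "displacement", "horsepower", "weight",
                    "acceleration", "model_year", "origin", "car_name"]
    data = pd.read_csv(url, sep=r"\s+", names=column_names, na_values="?")
    data = data.dropna()
    X = data["displacement"].values.reshape(-1, 1)
    y = data["mpg"].values
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    poly_model = make_pipeline(PolynomialFeatures(degree=2), StandardScaler(),
                               LinearRegression())
    poly_model.fit(X_train, y_train)
    y_pred = poly_model.predict(X_test)
    plt.scatter(X_test, y_test, color="blue", label="Actual")
    plt.scatter(X_test, y_pred, color="red", label="Predicted")
    plt.xlabel("Displacement")
    plt.ylabel("Miles per gallon (mpg)")
    plt.legend()
    plt.show()
    print("Mean Squared Error:", mean_squared_error(y_test, y_pred))

if __name__ == "__main__":
    linear_regression_california()
    polynomial_regression_auto_mpg()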
OUTPUT:
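Lab Experiment 8:
Develop a program to demonstrate the working of the decision tree algorithm. Use the Breast Cancer dataset
to build the decision tree and apply it to classify a new sample.
Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(f"Model Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")

# Treat the first test row as the "new sample" to classify
new_sample = np.array([X_test[0]])
prediction = clf.predict(new_sample)
print(f"Predicted class for the new sample: {data.target_names[prediction[0]]}")

plt.figure(figsize=(12,8))
plot_tree(clf, filled=True, feature_names=list(data.feature_names), class_names=list(data.target_names))
plt.title("Decision Tree - Breast Cancer Dataset")
plt.show()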
Lab Experiment 9:
Develop a program to implement the Naive Bayesian classifier considering Olivetti Face Data set for training.
Compute the accuracy of the classifier, considering a few test data sets.
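Before the full program, the computation behind a Gaussian Naive Bayes classifier and its accuracy can be sketched in plain Python. This is a simplified illustration, not scikit-learn's implementation: it adds a fixed constant to each variance for numerical stability, and the names fit_gnb and predict_gnb and the toy data are hypothetical.

```python
from collections import defaultdict
from math import log, pi

def fit_gnb(X, y, var_smoothing=1e-9):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / n + var_smoothing
                     for col, m in zip(columns, means)]
        model[label] = (log(n / len(X)), means, variances)
    return model

def predict_gnb(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        # Features are assumed independent, so their log-likelihoods add up
        return log_prior + sum(
            -0.5 * log(2 * pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)

# Hypothetical two-feature data with two well-separated classes
X_train = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [4.0, 5.0], [4.2, 4.8], [3.8, 5.2]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[1.1, 2.1], [3.9, 5.1], [0.9, 1.9], [4.1, 4.9]]
y_test = [0, 1, 0, 1]

model = fit_gnb(X_train, y_train)
y_pred = [predict_gnb(model, row) for row in X_test]
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
print(f"Accuracy: {accuracy * 100:.2f}%")
```

With face images each pixel plays the role of one feature, so the same per-class mean and variance are estimated for every pixel position; accuracy is the fraction of test samples whose predicted label matches the true one.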
Program:
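import matplotlib.pyplot as plt
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

data = fetch_olivetti_faces(shuffle=True, random_state=42)
X = data.data
y = data.target
# 30% of the 400 images (120 samples) are held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, zero_division=1))
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

# Show the first 15 test faces with their true and predicted labels
fig, axes = plt.subplots(3, 5, figsize=(12, 8))
for ax, image, label, prediction in zip(axes.ravel(), X_test, y_test, y_pred):
    ax.imshow(image.reshape(64, 64), cmap=plt.cm.gray)
    ax.set_title(f"True: {label}, Pred: {prediction}")
    ax.axis('off')
plt.show()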
OUTPUT:
Accuracy: 80.83%
Classification Report:
precision recall f1-score support
Confusion Matrix:
[[2 0 0 ... 0 0 0]
[0 2 0 ... 0 0 0]
[0 0 2 ... 0 0 1]
...
[0 0 0 ... 1 0 0]
[0 0 0 ... 0 3 0]
[0 0 0 ... 0 0 5]]
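Lab Experiment 10:
Develop a program to implement k-means clustering using the Wisconsin Breast Cancer dataset and visualize
the clustering result.
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, classification_report

data = load_breast_cancer()
X = data.data
y = data.target
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
y_kmeans = kmeans.fit_predict(X_scaled)

# Cluster ids are arbitrary, so cluster 0 need not correspond to true label 0
print("Confusion Matrix:")
print(confusion_matrix(y, y_kmeans))
print("\nClassification Report:")
print(classification_report(y, y_kmeans))

# Project to 2 principal components for plotting
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
df = pd.DataFrame(X_pca, columns=['PC1', 'PC2'])
df['Cluster'] = y_kmeans
df['True Label'] = y

plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='Cluster', palette='Set1', s=100,
                edgecolor='black', alpha=0.7)
plt.title('K-Means Clustering of Breast Cancer Dataset')
plt.legend(title="Cluster")
plt.show()

plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='True Label', palette='coolwarm', s=100,
                edgecolor='black', alpha=0.7)
plt.title('True Labels of Breast Cancer Dataset')
plt.legend(title="True Label")
plt.show()

plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='Cluster', palette='Set1', s=100,
                edgecolor='black', alpha=0.7)
centers = pca.transform(kmeans.cluster_centers_)
plt.scatter(centers[:, 0], centers[:, 1], s=200, c='red', marker='X', label='Centroids')
plt.title('K-Means Clustering with Centroids')
plt.legend(title="Cluster")
plt.show()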
OUTPUT:
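One caveat when reading the confusion matrix printed by the k-means program above: cluster ids are assigned arbitrarily, so confusion_matrix(y, y_kmeans) can look very poor simply because cluster 0 happens to contain mostly label 1. The sketch below shows one simple way to align cluster ids with true labels by majority vote before scoring. The function name align_clusters and the toy labels are hypothetical, and this is only one possible alignment strategy.

```python
from collections import Counter

def align_clusters(true_labels, cluster_ids):
    """Relabel each cluster with the most common true label among its members."""
    mapping = {}
    for cluster in set(cluster_ids):
        members = [t for t, c in zip(true_labels, cluster_ids) if c == cluster]
        mapping[cluster] = Counter(members).most_common(1)[0][0]
    return [mapping[c] for c in cluster_ids]

def accuracy(true_labels, predicted):
    return sum(t == p for t, p in zip(true_labels, predicted)) / len(true_labels)

# Hypothetical result: the clusters are good, but their ids are swapped
true_labels = [0, 0, 0, 0, 1, 1, 1, 1]
cluster_ids = [1, 1, 1, 0, 0, 0, 0, 0]

raw_accuracy = accuracy(true_labels, cluster_ids)
aligned = align_clusters(true_labels, cluster_ids)
aligned_accuracy = accuracy(true_labels, aligned)
print(f"Raw agreement:     {raw_accuracy:.3f}")
print(f"Aligned agreement: {aligned_accuracy:.3f}")
```

Majority voting can map two clusters to the same label when the clusters are poor, so the aligned score should be read as an optimistic summary rather than a supervised accuracy.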