
GURU NANAK DEV ENGINEERING COLLEGE

(Affiliated to VTU, Belagavi & Approved By AICTE, New Delhi)


Mailoor Road, Bidar - 585403

DEPARTMENT OF CSE (DATA SCIENCE) ENGINEERING


MACHINE LEARNING LAB MANUAL
SEMESTER - VI
(BCSL606)
Sl. No. Experiments
1 Develop a program to create histograms for all numerical features and analyze the distribution of each feature. Generate box plots for all numerical features and identify any outliers. Use the California Housing dataset.

2 Develop a program to compute the correlation matrix to understand the relationships between pairs of features. Visualize the correlation matrix using a heatmap to identify which variables have strong positive/negative correlations. Create a pair plot to visualize pairwise relationships between features. Use the California Housing dataset.

3 Develop a program to implement Principal Component Analysis (PCA) for reducing the dimensionality of the
Iris dataset from 4 features to 2.

4 For a given set of training data examples stored in a .CSV file, implement and demonstrate the Find-S
algorithm to output a description of the set of all hypotheses consistent with the training examples.

5 Develop a program to implement the k-Nearest Neighbour algorithm to classify 100 randomly generated values of x in the range [0,1]. Perform the following based on the dataset generated:
1. Label the first 50 points {x1, …, x50} as follows: if (xi ≤ 0.5), then xi ∊ Class1, else xi ∊ Class2
2. Classify the remaining points, x51, …, x100, using KNN. Perform this for k = 1, 2, 3, 4, 5, 20, 30

6 Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.

7 Develop a program to demonstrate the working of Linear Regression and Polynomial Regression. Use Boston
Housing Dataset for Linear Regression and Auto MPG Dataset (for vehicle fuel efficiency prediction) for
Polynomial Regression.

8 Develop a program to demonstrate the working of the decision tree algorithm. Use Breast Cancer Data set for
building the decision tree and apply this knowledge to classify a new sample.

9 Develop a program to implement the Naive Bayesian classifier considering Olivetti Face Data set for training.
Compute the accuracy of the classifier, considering a few test data sets.

10 Develop a program to implement k-means clustering using Wisconsin Breast Cancer data set and visualize the
clustering result.
PROGRAM 1

Develop a program to create histograms for all numerical features and analyze the distribution of each feature. Generate box plots for all numerical features and identify any outliers. Use the California Housing dataset.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing

# Load the California Housing dataset
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Create histograms for all numerical features
df.hist(bins=30, figsize=(12, 8), layout=(3, 3), edgecolor='black')
plt.suptitle("Histograms of Numerical Features", fontsize=16)
plt.tight_layout()
plt.show()

# Generate box plots for all numerical features to identify outliers
plt.figure(figsize=(12, 8))
for i, column in enumerate(df.columns):
    plt.subplot(3, 3, i + 1)
    sns.boxplot(y=df[column])
    plt.title(column)
plt.suptitle("Box Plots of Numerical Features", fontsize=16)
plt.tight_layout()
plt.show()
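As a numerical complement to the box plots, the outliers in each feature can be counted with the standard 1.5 × IQR rule. The following sketch is an addition to the listing above and assumes df is the DataFrame created there.

# Supplementary sketch: count outliers per feature using the 1.5 * IQR rule
for column in df.columns:
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    n_outliers = ((df[column] < lower) | (df[column] > upper)).sum()
    print(f"{column}: {n_outliers} outliers")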
OUTPUT
(Figures: histograms of all numerical features, and box plots showing the outliers in each feature)
PROGRAM 2
Develop a program to compute the correlation matrix to understand the relationships between pairs of features. Visualize the correlation matrix using a heatmap to identify which variables have strong positive/negative correlations. Create a pair plot to visualize pairwise relationships between features. Use the California Housing dataset.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

# Step 1: Load the California Housing dataset
california_data = fetch_california_housing(as_frame=True)
data = california_data.frame

# Step 2: Compute the correlation matrix
correlation_matrix = data.corr()

# Step 3: Visualize the correlation matrix using a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Matrix of California Housing Features')
plt.show()

# Step 4: Create a pair plot to visualize pairwise relationships
sns.pairplot(data, diag_kind='kde', plot_kws={'alpha': 0.5})
plt.suptitle('Pair Plot of California Housing Features', y=1.02)
plt.show()
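To read the heatmap programmatically, the strongest feature pairs can be pulled straight out of correlation_matrix. The following short sketch is an addition to the listing above.

# Supplementary sketch: list the most strongly correlated feature pairs
corr_pairs = correlation_matrix.unstack()
# Drop self-correlations and duplicate (A, B)/(B, A) pairs
corr_pairs = corr_pairs[corr_pairs.index.get_level_values(0) < corr_pairs.index.get_level_values(1)]
strongest = corr_pairs.reindex(corr_pairs.abs().sort_values(ascending=False).index)
print(strongest.head(5))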
OUTPUT
(Figures: correlation heatmap of the California Housing features, and the pair plot)
PROGRAM 3
Develop a program to implement Principal Component Analysis (PCA) for reducing
the dimensionality of the Iris dataset from 4 features to 2.

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = load_iris()
data = iris.data
labels = iris.target
label_names = iris.target_names

# Convert to a DataFrame for better visualization
iris_df = pd.DataFrame(data, columns=iris.feature_names)

# Perform PCA to reduce dimensionality to 2
pca = PCA(n_components=2)
data_reduced = pca.fit_transform(data)

# Create a DataFrame for the reduced data
reduced_df = pd.DataFrame(data_reduced, columns=['Principal Component 1', 'Principal Component 2'])
reduced_df['Label'] = labels

# Plot the reduced data
plt.figure(figsize=(8, 6))
colors = ['r', 'g', 'b']
for i, label in enumerate(np.unique(labels)):
    plt.scatter(
        reduced_df[reduced_df['Label'] == label]['Principal Component 1'],
        reduced_df[reduced_df['Label'] == label]['Principal Component 2'],
        label=label_names[label],
        color=colors[i]
    )
plt.title('PCA on Iris Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.grid()
plt.show()
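A useful extension (not part of the original listing) is to report how much variance the two components retain; for the unscaled Iris data the first two components together typically capture roughly 97-98% of the total variance.

# Supplementary sketch: report the variance retained by the two components
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Total variance retained:", pca.explained_variance_ratio_.sum())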
OUTPUT
(Figure: scatter plot of the Iris samples projected onto the first two principal components, coloured by species)
PROGRAM 4
For a given set of training data examples stored in a .CSV file, implement
and demonstrate the Find-S algorithm to output a description of the set of
all hypotheses consistent with the training examples.
import pandas as pd

def find_s_algorithm(file_path):
    data = pd.read_csv(file_path)
    print("Training data:")
    print(data)

    attributes = data.columns[:-1]
    class_label = data.columns[-1]

    # Start with the most specific hypothesis: the first positive example itself
    hypothesis = None
    for _, row in data.iterrows():
        if row[class_label] == 'Yes':
            if hypothesis is None:
                hypothesis = list(row[attributes])
            else:
                for i, value in enumerate(row[attributes]):
                    if hypothesis[i] != value:
                        hypothesis[i] = '?'  # Generalize attributes that disagree
    return hypothesis

file_path = 'training_data.csv'
hypothesis = find_s_algorithm(file_path)
print("\nThe final hypothesis is:", hypothesis)
OUTPUT
Training data:
Outlook Temperature Humidity Windy PlayTennis
0 Sunny Hot High False No
1 Sunny Hot High True No
2 Overcast Hot High False Yes
3 Rain Cold High False Yes
4 Rain Cold High True No
5 Overcast Hot High True Yes
6 Sunny Hot High False No
The final hypothesis is: ['?', '?', 'High', '?']
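For reference, a training_data.csv consistent with the run above (reconstructed from the printed training data) contains:

Outlook,Temperature,Humidity,Windy,PlayTennis
Sunny,Hot,High,False,No
Sunny,Hot,High,True,No
Overcast,Hot,High,False,Yes
Rain,Cold,High,False,Yes
Rain,Cold,High,True,No
Overcast,Hot,High,True,Yes
Sunny,Hot,High,False,No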

PROGRAM 5
Develop a program to implement the k-Nearest Neighbour algorithm to classify 100 randomly generated values of x in the range [0,1]. Perform the following based on the dataset generated:
1. Label the first 50 points {x1, …, x50} as follows: if (xi ≤ 0.5), then xi ∊ Class1, else xi ∊ Class2
2. Classify the remaining points, x51, …, x100, using KNN. Perform this for k = 1, 2, 3, 4, 5, 20, 30

import numpy as np
import matplotlib.pyplot as plt
from collections import Counter

def generate_data():
    np.random.seed(42)  # For reproducibility
    x = np.random.rand(100)  # Generate 100 random values in [0,1]
    # Label the first 50 points: Class1 if xi <= 0.5, else Class2
    labels = np.array(["Class1" if xi <= 0.5 else "Class2" for xi in x[:50]])
    return x, labels

def knn_classification(train_x, train_labels, test_x, k):
    predictions = []
    for x_test in test_x:
        distances = np.abs(train_x - x_test)  # Compute absolute distance
        nearest_indices = np.argsort(distances)[:k]  # Get k nearest neighbours
        nearest_labels = train_labels[nearest_indices]  # Get corresponding labels
        most_common = Counter(nearest_labels).most_common(1)[0][0]  # Majority vote
        predictions.append(most_common)
    return np.array(predictions)

def main():
    x, labels = generate_data()
    train_x, test_x = x[:50], x[50:]
    train_labels = labels
    k_values = [1, 2, 3, 4, 5, 20, 30]
    results = {}
    for k in k_values:
        predictions = knn_classification(train_x, train_labels, test_x, k)
        results[k] = predictions
    for k, preds in results.items():
        print(f"Results for k={k}: {preds}")
    plt.scatter(train_x, [1] * 50,
                c=["blue" if lbl == "Class1" else "red" for lbl in train_labels],
                label="Training Data")
    for k, preds in results.items():
        plt.scatter(test_x, [k] * 50,
                    c=["blue" if lbl == "Class1" else "red" for lbl in preds],
                    label=f"Test Data k={k}")
    plt.xlabel("x")
    plt.ylabel("k-values")
    plt.legend()
    plt.show()

if __name__ == "__main__":
    main()
OUTPUT
Results for k=1: ['Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1'
'Class1' 'Class1' 'Class1' 'Class1' 'Class2' 'Class1' 'Class1' 'Class2'
'Class1' 'Class2' 'Class1' 'Class2' 'Class2' 'Class1' 'Class1' 'Class2'
'Class2' 'Class2' 'Class2' 'Class1' 'Class1' 'Class1' 'Class2' 'Class2'
'Class1' 'Class1' 'Class1' 'Class1' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1' 'Class2' 'Class2' 'Class2' 'Class2' 'Class1' 'Class2' 'Class1'
'Class1' 'Class1']
Results for k=2: ['Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1'
'Class1' 'Class1' 'Class1' 'Class1' 'Class2' 'Class1' 'Class1' 'Class2'
'Class1' 'Class2' 'Class1' 'Class2' 'Class2' 'Class1' 'Class1' 'Class2'
'Class2' 'Class2' 'Class2' 'Class1' 'Class1' 'Class1' 'Class2' 'Class2'
'Class1' 'Class1' 'Class1' 'Class1' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1' 'Class2' 'Class2' 'Class2' 'Class2' 'Class1' 'Class2' 'Class1'
'Class1' 'Class1']
Results for k=3: ['Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1'
'Class1' 'Class1' 'Class1' 'Class1' 'Class2' 'Class1' 'Class1' 'Class2'
'Class1' 'Class2' 'Class1' 'Class2' 'Class2' 'Class1' 'Class1' 'Class2'
'Class2' 'Class2' 'Class2' 'Class1' 'Class1' 'Class1' 'Class2' 'Class2'
'Class1' 'Class1' 'Class1' 'Class1' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1' 'Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1' 'Class1']
Results for k=4: ['Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1'
'Class1' 'Class1' 'Class1' 'Class1' 'Class2' 'Class1' 'Class1' 'Class2'
'Class1' 'Class2' 'Class1' 'Class2' 'Class2' 'Class1' 'Class1' 'Class2'
'Class2' 'Class2' 'Class2' 'Class1' 'Class1' 'Class1' 'Class2' 'Class2'
'Class1' 'Class1' 'Class1' 'Class1' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1' 'Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1' 'Class1']
Results for k=5: ['Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1'
'Class1' 'Class1' 'Class1' 'Class1' 'Class2' 'Class1' 'Class1' 'Class2'
'Class1' 'Class2' 'Class1' 'Class2' 'Class2' 'Class1' 'Class1' 'Class2'
'Class2' 'Class2' 'Class2' 'Class1' 'Class1' 'Class1' 'Class2' 'Class2'
'Class1' 'Class1' 'Class1' 'Class1' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1' 'Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1' 'Class1']
Results for k=20: ['Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1'
'Class1' 'Class1' 'Class1' 'Class1' 'Class2' 'Class1' 'Class1' 'Class2'
'Class1' 'Class2' 'Class1' 'Class2' 'Class2' 'Class1' 'Class1' 'Class2'
'Class2' 'Class2' 'Class2' 'Class1' 'Class1' 'Class1' 'Class2' 'Class2'
'Class1' 'Class1' 'Class1' 'Class1' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1' 'Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1' 'Class1']
Results for k=30: ['Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1'
'Class1' 'Class1' 'Class1' 'Class1' 'Class2' 'Class1' 'Class1' 'Class2'
'Class1' 'Class2' 'Class1' 'Class2' 'Class2' 'Class1' 'Class1' 'Class2'
'Class2' 'Class2' 'Class2' 'Class1' 'Class1' 'Class1' 'Class2' 'Class2'
'Class1' 'Class1' 'Class1' 'Class1' 'Class2' 'Class2' 'Class2' 'Class1'
'Class1' 'Class2' 'Class2' 'Class2' 'Class2' 'Class1' 'Class2' 'Class1'
'Class1' 'Class1']
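Since the generating rule (xi ≤ 0.5 → Class1) is known for every point, each k can also be scored against the rule-based labels of the test points. The sketch below is an addition to the listing above and assumes it is placed inside main() after the results loop.

# Supplementary sketch: accuracy of each k against the rule-based labels
true_test_labels = np.array(["Class1" if xi <= 0.5 else "Class2" for xi in test_x])
for k, preds in results.items():
    print(f"k={k}: accuracy = {np.mean(preds == true_test_labels):.2f}")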
PROGRAM 6
Implement the non-parametric Locally Weighted Regression algorithm in
order to fit data points. Select appropriate data set for your experiment and
draw graphs
import numpy as np
import matplotlib.pyplot as plt

def gaussian_kernel(x, xi, tau):
    # Weight falls off with distance from the query point x
    return np.exp(-np.sum((x - xi) ** 2) / (2 * tau ** 2))

def locally_weighted_regression(x, X, y, tau):
    m = X.shape[0]
    weights = np.array([gaussian_kernel(x, X[i], tau) for i in range(m)])
    W = np.diag(weights)
    X_transpose_W = X.T @ W
    # Solve the weighted normal equation: theta = (X^T W X)^(-1) X^T W y
    theta = np.linalg.inv(X_transpose_W @ X) @ X_transpose_W @ y
    return x @ theta

np.random.seed(42)
X = np.linspace(0, 2 * np.pi, 100)
y = np.sin(X) + 0.1 * np.random.randn(100)  # Noisy sine wave as the dataset
X_bias = np.c_[np.ones(X.shape), X]  # Add a bias (intercept) column
x_test = np.linspace(0, 2 * np.pi, 200)
x_test_bias = np.c_[np.ones(x_test.shape), x_test]
tau = 0.5
y_pred = np.array([locally_weighted_regression(xi, X_bias, y, tau) for xi in x_test_bias])

plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='red', label='Training Data', alpha=0.7)
plt.plot(x_test, y_pred, color='blue', label=f'LWR Fit (tau={tau})', linewidth=2)
plt.xlabel('X', fontsize=12)
plt.ylabel('y', fontsize=12)
plt.title('Locally Weighted Regression', fontsize=14)
plt.legend(fontsize=10)
plt.grid(alpha=0.3)
plt.show()
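The bandwidth tau controls how local the fit is: a small tau chases the noise, while a large tau approaches an ordinary straight-line fit. As an optional extension of the listing above, fits for several bandwidths can be overlaid:

# Supplementary sketch: compare fits for several bandwidths
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='gray', alpha=0.5, label='Training Data')
for t in [0.1, 0.5, 2.0]:
    preds = np.array([locally_weighted_regression(xi, X_bias, y, t) for xi in x_test_bias])
    plt.plot(x_test, preds, linewidth=2, label=f'tau={t}')
plt.legend()
plt.show()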
OUTPUT
(Figure: training points with the LWR fit for tau=0.5)
PROGRAM 7
Develop a program to demonstrate the working of Linear Regression and Polynomial Regression. Use Boston Housing Dataset for Linear Regression and Auto MPG Dataset (for vehicle fuel efficiency prediction) for Polynomial Regression.
(Note: the Boston Housing dataset was removed from scikit-learn in version 1.2, so the listing below substitutes the California Housing dataset for the linear regression part.)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error, r2_score

def linear_regression_california():
    housing = fetch_california_housing(as_frame=True)
    X = housing.data[["AveRooms"]]
    y = housing.target

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    plt.scatter(X_test, y_test, color="blue", label="Actual")
    plt.plot(X_test, y_pred, color="red", label="Predicted")
    plt.xlabel("Average number of rooms (AveRooms)")
    plt.ylabel("Median value of homes ($100,000)")
    plt.title("Linear Regression - California Housing Dataset")
    plt.legend()
    plt.show()

    print("Linear Regression - California Housing Dataset")
    print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
    print("R^2 Score:", r2_score(y_test, y_pred))

def polynomial_regression_auto_mpg():
    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
    column_names = ["mpg", "cylinders", "displacement", "horsepower",
                    "weight", "acceleration", "model_year", "origin"]
    # The trailing car-name field in this file is tab-separated; treating '\t'
    # as a comment character drops it so the eight numeric columns line up
    data = pd.read_csv(url, sep=r'\s+', names=column_names, na_values="?", comment='\t')
    data = data.dropna()

    X = data["displacement"].values.reshape(-1, 1)
    y = data["mpg"].values

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    poly_model = make_pipeline(PolynomialFeatures(degree=2), StandardScaler(), LinearRegression())
    poly_model.fit(X_train, y_train)
    y_pred = poly_model.predict(X_test)

    plt.scatter(X_test, y_test, color="blue", label="Actual")
    plt.scatter(X_test, y_pred, color="red", label="Predicted")
    plt.xlabel("Displacement")
    plt.ylabel("Miles per gallon (mpg)")
    plt.title("Polynomial Regression - Auto MPG Dataset")
    plt.legend()
    plt.show()

    print("Polynomial Regression - Auto MPG Dataset")
    print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
    print("R^2 Score:", r2_score(y_test, y_pred))

if __name__ == "__main__":
    print("Demonstrating Linear Regression and Polynomial Regression\n")
    linear_regression_california()
    polynomial_regression_auto_mpg()
OUTPUT

Demonstrating Linear Regression and Polynomial Regression

Linear Regression - California Housing Dataset
Mean Squared Error: 1.2923314440807299
R^2 Score: 0.013795337532284901

Polynomial Regression - Auto MPG Dataset
Mean Squared Error: 0.743149055720586
R^2 Score: 0.7505650609469626
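The degree 2 used above is a choice, not a necessity. As an optional extension (placed inside polynomial_regression_auto_mpg() after the train/test split), test R^2 can be compared across degrees:

# Supplementary sketch: compare test R^2 for several polynomial degrees
for degree in [1, 2, 3, 4]:
    m = make_pipeline(PolynomialFeatures(degree=degree), StandardScaler(), LinearRegression())
    m.fit(X_train, y_train)
    print(f"degree={degree}: R^2 = {r2_score(y_test, m.predict(X_test)):.3f}")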
PROGRAM 8
Develop a program to demonstrate the working of the decision tree
algorithm. Use Breast Cancer Data set for building the decision tree and
apply this knowledge to classify a new sample.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree

data = load_breast_cancer()
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")

new_sample = np.array([X_test[0]])
prediction = clf.predict(new_sample)
prediction_class = "Benign" if prediction[0] == 1 else "Malignant"
print(f"Predicted Class for the new sample: {prediction_class}")

plt.figure(figsize=(12, 8))
tree.plot_tree(clf, filled=True, feature_names=data.feature_names, class_names=data.target_names)
plt.title("Decision Tree - Breast Cancer Dataset")
plt.show()

OUTPUT
(Console: model accuracy and the predicted class for the new sample; figure: the fitted decision tree)
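A fully grown tree often overfits the training data. As an optional extension of the listing above, a depth-limited tree can be compared against it (the accuracy difference will vary with the split):

# Supplementary sketch: compare a depth-limited tree against the full tree
clf_pruned = DecisionTreeClassifier(max_depth=3, random_state=42)
clf_pruned.fit(X_train, y_train)
print(f"Depth-3 tree accuracy: {accuracy_score(y_test, clf_pruned.predict(X_test)) * 100:.2f}%")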
PROGRAM 9
Develop a program to implement the Naive Bayesian classifier considering
Olivetti Face Data set for training. Compute the accuracy of the classifier,
considering a few test data sets.
import numpy as np
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt

data = fetch_olivetti_faces(shuffle=True, random_state=42)
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')

print("\nClassification Report:")
print(classification_report(y_test, y_pred, zero_division=1))
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

cross_val_accuracy = cross_val_score(gnb, X, y, cv=5, scoring='accuracy')
print(f'\nCross-validation accuracy: {cross_val_accuracy.mean() * 100:.2f}%')

fig, axes = plt.subplots(3, 5, figsize=(12, 8))
for ax, image, label, prediction in zip(axes.ravel(), X_test, y_test, y_pred):
    ax.imshow(image.reshape(64, 64), cmap=plt.cm.gray)
    ax.set_title(f"True: {label}, Pred: {prediction}")
    ax.axis('off')
plt.show()
OUTPUT
Accuracy: 80.83%
Classification Report:
precision recall f1-score support

0 0.67 1.00 0.80 2
1 1.00 1.00 1.00 2
2 0.33 0.67 0.44 3
3 1.00 0.00 0.00 5
4 1.00 0.50 0.67 4
5 1.00 1.00 1.00 2
7 1.00 0.75 0.86 4
8 1.00 0.67 0.80 3
9 1.00 0.75 0.86 4
10 1.00 1.00 1.00 3
11 1.00 1.00 1.00 1
12 0.40 1.00 0.57 4
13 1.00 0.80 0.89 5
14 1.00 0.40 0.57 5
15 0.67 1.00 0.80 2
16 1.00 0.67 0.80 3
17 1.00 1.00 1.00 3
18 1.00 1.00 1.00 3
19 0.67 1.00 0.80 2
20 1.00 1.00 1.00 3
21 1.00 0.67 0.80 3
22 1.00 0.60 0.75 5
23 1.00 0.75 0.86 4
24 1.00 1.00 1.00 3
25 1.00 0.75 0.86 4
26 1.00 1.00 1.00 2
27 1.00 1.00 1.00 5
28 0.50 1.00 0.67 2
29 1.00 1.00 1.00 2
30 1.00 1.00 1.00 2
31 1.00 0.75 0.86 4
32 1.00 1.00 1.00 2
34 0.25 1.00 0.40 1
35 1.00 1.00 1.00 5
36 1.00 1.00 1.00 3
37 1.00 1.00 1.00 1
38 1.00 0.75 0.86 4
39 0.50 1.00 0.67 5
accuracy 0.81 120
macro avg 0.89 0.85 0.83 120
weighted avg 0.91 0.81 0.81 120
Confusion Matrix:
[[2 0 0 ... 0 0 0]
[0 2 0 ... 0 0 0]
[0 0 2 ... 0 0 1]
...
[0 0 0 ... 1 0 0]
[0 0 0 ... 0 3 0]
[0 0 0 ... 0 0 5]]
Cross-validation accuracy: 87.25%
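Gaussian Naive Bayes treats every pixel as an independent feature, which is a strong assumption for face images. A common refinement, shown as a hedged sketch below (the exact accuracy gain is not guaranteed), is to decorrelate and compress the pixels with PCA before the classifier:

# Supplementary sketch: PCA-compressed features before Gaussian Naive Bayes
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

pca_gnb = make_pipeline(PCA(n_components=100, whiten=True, random_state=42), GaussianNB())
pca_gnb.fit(X_train, y_train)
print(f'PCA + GaussianNB accuracy: {accuracy_score(y_test, pca_gnb.predict(X_test)) * 100:.2f}%')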
PROGRAM 10
Develop a program to implement k-means clustering using Wisconsin
Breast Cancer data set and visualize the clustering result.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, classification_report

data = load_breast_cancer()
X = data.data
y = data.target

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

kmeans = KMeans(n_clusters=2, random_state=42)
y_kmeans = kmeans.fit_predict(X_scaled)

# Note: cluster ids are arbitrary; this comparison assumes cluster 0 aligns with label 0
print("Confusion Matrix:")
print(confusion_matrix(y, y_kmeans))
print("\nClassification Report:")
print(classification_report(y, y_kmeans))

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

df = pd.DataFrame(X_pca, columns=['PC1', 'PC2'])
df['Cluster'] = y_kmeans
df['True Label'] = y

plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='Cluster', palette='Set1', s=100, edgecolor='black', alpha=0.7)
plt.title('K-Means Clustering of Breast Cancer Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title="Cluster")
plt.show()

plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='True Label', palette='coolwarm', s=100, edgecolor='black', alpha=0.7)
plt.title('True Labels of Breast Cancer Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title="True Label")
plt.show()

plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='Cluster', palette='Set1', s=100, edgecolor='black', alpha=0.7)
centers = pca.transform(kmeans.cluster_centers_)
plt.scatter(centers[:, 0], centers[:, 1], s=200, c='red', marker='X', label='Centroids')
plt.title('K-Means Clustering with Centroids')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title="Cluster")
plt.show()

OUTPUT
Confusion Matrix:
[[175 37]
[ 13 344]]

Classification Report:
precision recall f1-score support

0 0.93 0.83 0.88 212
1 0.90 0.96 0.93 357

accuracy 0.91 569
macro avg 0.92 0.89 0.90 569
weighted avg 0.91 0.91 0.91 569
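Because k-means never sees the diagnosis labels, it is also worth reporting a label-free quality measure; the silhouette score (range -1 to 1, higher is better) is a standard choice. A short addition to the listing above:

# Supplementary sketch: label-free evaluation of the clustering
from sklearn.metrics import silhouette_score
print("Silhouette score:", silhouette_score(X_scaled, y_kmeans))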
