Contents:
1. Practice Programs
2. Develop a program to create histograms for all numerical features and analyze the distribution of each feature. Generate box plots for all numerical features and identify any outliers. Use the California Housing dataset.
3. Develop a program to compute the correlation matrix to understand the relationships between pairs of features. Visualize the correlation matrix using a heatmap to know which variables have strong positive/negative correlations. Create a pair plot to visualize pairwise relationships between features. Use the California Housing dataset.
4. Develop a program to implement Principal Component Analysis (PCA) for reducing the dimensionality of the Iris dataset from 4 features to 2.
5. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Find-S algorithm to output a description of the set of all hypotheses consistent with the training examples.
6. Develop a program to implement the k-Nearest Neighbour algorithm to classify 100 randomly generated values of x in the range [0,1]. Perform the following based on the generated dataset:
   a. Label the first 50 points {x1, …, x50} as follows: if (xi ≤ 0.5), then xi ∈ Class1, else xi ∈ Class2
   b. Classify the remaining points x51, …, x100 using KNN. Perform this for k = 1, 2, 3, 4, 5, 20, 30
7. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate dataset for your experiment and draw graphs.
8. Develop a program to demonstrate the working of Linear Regression and Polynomial Regression. Use the Boston Housing dataset for Linear Regression and the Auto MPG dataset (for vehicle fuel efficiency prediction) for Polynomial Regression.
9. Develop a program to demonstrate the working of the decision tree algorithm. Use the Breast Cancer dataset for building the decision tree and apply this knowledge to classify a new sample.
10. Develop a program to implement the Naive Bayesian classifier considering the Olivetti Face dataset for training. Compute the accuracy of the classifier, considering a few test data sets.
11. Develop a program to implement k-means clustering using the Wisconsin Breast Cancer dataset and visualize the clustering result.
12. Viva Questions
Practice Programs:
1. Write a Python Script to create a DataFrame.
import pandas as pd
# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
# Display the DataFrame
print(df)
2. Write a Python Script to Read and Write CSV Files
# Save the DataFrame from Script 1 to a CSV file
df.to_csv('data.csv', index=False)
# Read DataFrame from a CSV file
df_read = pd.read_csv('data.csv')
print(df_read)
3. Write a Python Script to perform Basic DataFrame Operations
# Show first 2 rows
print(df.head(2))
# Show last 2 rows
print(df.tail(2))
# Get summary statistics
print(df.describe())
# Get column names
print(df.columns)
# Get DataFrame shape (rows, columns)
print(df.shape)
# Get data types of each column
print(df.dtypes)
4. Write a Python Script for Selecting and Filtering Data
# Select a single column
print(df['Name'])
# Select multiple columns
print(df[['Name', 'Age']])
# Filter rows based on a condition
print(df[df['Age'] > 30])
5. Write a Python Script for Adding and Modifying Columns
# Add a new column
df['Salary'] = [50000, 60000, 70000, 80000]
# Modify an existing column
df['Age'] = df['Age'] + 1 # Increase age by 1
print(df)
6. Write a Python Script for Sorting and Grouping Data
# Sort DataFrame by Age in ascending order
print(df.sort_values(by='Age'))
# Group data by City and find the mean Age
print(df.groupby('City')['Age'].mean())
7. Write a Python Script for Handling Missing Values
import numpy as np
# Introduce missing values
df.loc[1, 'Age'] = np.nan
# Check for missing values
print(df.isnull().sum())
# Fill missing values with the column mean
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df)
8. Write a Python Script for Applying Functions to DataFrame
# Apply a function to a column
df['Age_Category'] = df['Age'].apply(lambda x: 'Young' if x < 30 else 'Old')
print(df)
9. Write a Python Script to Plot a Line Plot (Trends over Time)
import matplotlib.pyplot as plt
import numpy as np
# Sample Data
x = np.arange(1, 11)
y = np.sin(x)
# Line Plot
plt.plot(x, y, marker='o', linestyle='-', color='b', label='Sine Wave')
plt.xlabel("X Values")
plt.ylabel("Y Values")
plt.title("Simple Line Plot")
plt.legend()
plt.grid(True)
plt.show()
10. Write a Python Script to Plot a Bar Chart (Category Comparison)
import matplotlib.pyplot as plt
# Sample Data
categories = ['A', 'B', 'C', 'D', 'E']
values = [10, 25, 15, 30, 20]
# Bar Plot
plt.bar(categories, values, color=['red', 'blue', 'green', 'purple', 'orange'])
plt.xlabel("Categories")
plt.ylabel("Values")
plt.title("Bar Chart Example")
plt.show()
11. Write a Python Script to Plot a Histogram (Distribution of Data)
import numpy as np
import matplotlib.pyplot as plt
# Generate Random Data
data = np.random.randn(1000)
# Histogram
plt.hist(data, bins=30, color='skyblue', edgecolor='black')
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram of Random Data")
plt.show()
12. Write a Python Script to Plot a Scatter Plot (Relationship between Two Variables)
import numpy as np
import matplotlib.pyplot as plt
# Generate Data
x = np.random.rand(100)
y = np.random.rand(100)
# Scatter Plot
plt.scatter(x, y, c='red', alpha=0.6)
plt.xlabel("X Values")
plt.ylabel("Y Values")
plt.title("Scatter Plot Example")
plt.show()
13. Write a Python Script to Plot a Box Plot (Detecting Outliers)
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
# Generate Random Data
data = np.random.randn(100)
# Box Plot
sns.boxplot(data=data, color='lightblue')
plt.title("Box Plot Example")
plt.show()
14. Write a Python Script to Plot a Pair Plot (Multiple Feature Relationships - Iris
Dataset)
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
# Load Iris Dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target
# Pair Plot
sns.pairplot(df, hue='species', palette='coolwarm')
plt.show()
15. Write a Python Script to Plot a Heatmap (Correlation Matrix - Titanic Dataset)
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
# Load Sample Dataset
df = sns.load_dataset("titanic").dropna()
# Compute Correlation (numeric columns only)
corr_matrix = df.corr(numeric_only=True)
# Heatmap
plt.figure(figsize=(8,6))
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation Heatmap")
plt.show()
Program 1: Develop a program to create histograms for all numerical features and analyze
the distribution of each feature. Generate box plots for all numerical features and identify
any outliers. Use California Housing dataset.
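The outlier check at the end of the program uses the standard interquartile-range (IQR) rule: with $Q_1$ and $Q_3$ the 25th and 75th percentiles of a feature and $\mathrm{IQR} = Q_3 - Q_1$, a value $x$ is flagged as an outlier when

$$x < Q_1 - 1.5\,\mathrm{IQR} \quad \text{or} \quad x > Q_3 + 1.5\,\mathrm{IQR}.$$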
Source Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing
# Load California Housing dataset
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
# Display basic dataset information
print("Dataset Overview:")
print(df.info())
print("\nSummary Statistics:")
print(df.describe())
# Set plot style
sns.set_style("whitegrid")
# Create histograms for all numerical features
df.hist(bins=30, figsize=(12, 8), edgecolor='black')
plt.suptitle("Histograms of Numerical Features in California Housing Dataset", fontsize=14)
plt.show()
# Create box plots for all numerical features to identify outliers
plt.figure(figsize=(14, 8))
for i, col in enumerate(df.columns):
    plt.subplot(3, 3, i + 1)
    sns.boxplot(x=df[col], color="skyblue", width=0.6, fliersize=3)
    plt.title(col, fontsize=12)
plt.tight_layout()
plt.suptitle("Box Plots of Numerical Features", fontsize=14, y=1.02)
plt.show()
# Identify outliers using IQR method
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1
outliers = ((df < (Q1 - 1.5 * IQR)) | (df > (Q3 + 1.5 * IQR)))
print("\nOutlier Detection:")
print(outliers.sum())
Output :
Dataset Overview:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 MedInc 20640 non-null float64
1 HouseAge 20640 non-null float64
2 AveRooms 20640 non-null float64
3 AveBedrms 20640 non-null float64
4 Population 20640 non-null float64
5 AveOccup 20640 non-null float64
6 Latitude 20640 non-null float64
7 Longitude 20640 non-null float64
dtypes: float64(8)
Summary Statistics:
MedInc HouseAge AveRooms AveBedrms Population \
count 20640.000000 20640.000000 20640.000000 20640.000000 20640.000000
mean 3.870671 28.639486 5.429000 1.096675 1425.476744
std 1.899822 12.585558 2.474173 0.473911 1132.462122
min 0.499900 1.000000 0.846154 0.333333 3.000000
25% 2.563400 18.000000 4.440716 1.006079 787.000000
50% 3.534800 29.000000 5.229129 1.048780 1166.000000
75% 4.743250 37.000000 6.052381 1.099526 1725.000000
max 15.000100 52.000000 141.909091 34.066667 35682.000000
AveOccup Latitude Longitude
count 20640.000000 20640.000000 20640.000000
mean 3.070655 35.631861 -119.569704
std 10.386050 2.135952 2.003532
min 0.692308 32.540000 -124.350000
25% 2.429741 33.930000 -121.800000
50% 2.818116 34.260000 -118.490000
75% 3.282261 37.710000 -118.010000
max 1243.333333 41.950000 -114.310000
Outlier Detection:
MedInc 681
HouseAge 0
AveRooms 511
AveBedrms 1424
Population 1196
AveOccup 711
Latitude 0
Longitude 0
Program 2: Develop a program to compute the correlation matrix to understand the
relationships between pairs of features. Visualize the correlation matrix using a heatmap to
know which variables have strong positive/negative correlations. Create a pair plot to
visualize pairwise relationships between features. Use California Housing dataset.
Source Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing
# Load California Housing dataset
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
# Set plot style
sns.set_style("whitegrid")
# Compute and visualize the correlation matrix
plt.figure(figsize=(10, 6))
corr_matrix = df.corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title("Feature Correlation Heatmap", fontsize=14)
plt.show()
# Create pair plot to visualize pairwise relationships between features
sns.pairplot(df, diag_kind='kde', plot_kws={'alpha':0.5})
plt.suptitle("Pair Plot of Features", fontsize=14, y=1.02)
plt.show()
# Identify skewness of numerical features
skew_values = df.skew()
print("\nSkewness of Features:")
print(skew_values)
Output:
Skewness of Features:
MedInc 1.646657
HouseAge 0.060331
AveRooms 20.697869
AveBedrms 31.316956
Population 4.935858
AveOccup 97.639561
Latitude 0.465953
Longitude -0.297801
Program 3: Develop a program to implement Principal Component Analysis (PCA) for
reducing the dimensionality of the Iris dataset from 4 features to 2.
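PCA projects the (standardized) data onto the orthogonal directions of maximum variance, which are the eigenvectors of the covariance matrix. The explained variance ratio printed at the end of the program is, for the $i$-th component with eigenvalue $\lambda_i$,

$$\frac{\lambda_i}{\sum_j \lambda_j},$$

so the two printed values show how much of the total variance the 2-D projection retains.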
Source Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
# Load Iris dataset
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
# Standardize the data
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)
# Apply PCA to reduce dimensionality from 4 to 2
pca = PCA(n_components=2)
principal_components = pca.fit_transform(df_scaled)
# Create a new DataFrame with principal components
pca_df = pd.DataFrame(principal_components, columns=['PC1', 'PC2'])
pca_df['Target'] = data.target
# Visualize the PCA results
plt.figure(figsize=(8, 6))
for target, label in enumerate(data.target_names):
    subset = pca_df[pca_df['Target'] == target]
    plt.scatter(subset['PC1'], subset['PC2'], label=label, alpha=0.7)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA of Iris Dataset')
plt.legend()
plt.grid(True)
plt.show()
# Print explained variance ratio
print("Explained Variance Ratio:", pca.explained_variance_ratio_)
Output:
Explained Variance Ratio: [0.72962445 0.22850762]
Program 4: For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Find-S algorithm to output a description of the set of all hypotheses
consistent with the training examples.
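The program reads its training examples from a file named data.csv in the working directory. Judging from the output shown below, the file holds the classic EnjoySport examples, one instance per line with the class label in the last column:

sunny,warm,normal,strong,warm,same,yes
sunny,warm,high,strong,warm,same,yes
rainy,cold,high,strong,warm,change,no
sunny,warm,high,strong,cool,change,yes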
Source Code:
import csv

num_attributes = 6
a = []

print("\n The Given Training Data Set \n")
with open('data.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        a.append(row)
        print(row)

print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)

# Initialize the hypothesis with the attribute values of the first instance
for j in range(0, num_attributes):
    hypothesis[j] = a[0][j]

print("\n Find S: Finding a Maximally Specific Hypothesis\n")
for i in range(0, len(a)):
    # Generalize the hypothesis only on positive ('yes') examples
    if a[i][num_attributes] == 'yes':
        for j in range(0, num_attributes):
            if a[i][j] != hypothesis[j]:
                hypothesis[j] = '?'
            else:
                hypothesis[j] = a[i][j]
print(" For Training instance No:{0} the hypothesis is ".format(i), hypothesis)

print("\n The Maximally Specific Hypothesis for a given Training Examples :\n")
print(hypothesis)
Output:
The Given Training Data Set
['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes']
['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes']
['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no']
['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes']
The initial value of hypothesis:
['0', '0', '0', '0', '0', '0']
Find S: Finding a Maximally Specific Hypothesis
For Training instance No:3 the hypothesis is ['sunny', 'warm', '?', 'strong', '?', '?']
The Maximally Specific Hypothesis for a given Training Examples :
['sunny', 'warm', '?', 'strong', '?', '?']
Program 5: Develop a program to implement the k-Nearest Neighbour algorithm to classify 100 randomly generated values of x in the range [0,1]. Perform the following based on the generated dataset.
a. Label the first 50 points {x1, …, x50} as follows: if (xi ≤ 0.5), then xi ∈ Class1, else xi ∈ Class2
b. Classify the remaining points x51, …, x100 using KNN. Perform this for k = 1, 2, 3, 4, 5, 20, 30
Source Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
# Generate 100 random values in the range [0,1]
x = np.random.rand(100, 1)
# Label the first 50 points based on the given condition
labels = np.array([1 if xi <= 0.5 else 2 for xi in x[:50]])
# Prepare training and test sets
X_train, y_train = x[:50], labels # First 50 for training
X_test = x[50:] # Remaining 50 for classification
# Test for different values of k
k_values = [1, 2, 3, 4, 5, 20, 30]
plt.figure(figsize=(10, 6))
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    # Visualization of classification results
    plt.scatter(X_test, y_pred, label=f'k={k}', alpha=0.7)
# Mark training points for reference
plt.scatter(X_train, y_train, color='red', marker='x', label='Training Data')
plt.xlabel('X values')
plt.ylabel('Predicted Class')
plt.title('KNN Classification for Different k-values')
plt.legend()
plt.show()
# Print classification results for each k
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    print(f'Predictions for k={k}:', y_pred)
Output:
Predictions for k=1: [1 2 2 2 2 1 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 2 1 1 1 2 1 2 2 1 2 1 2 1
1 1 1 1 1 2 1 1 1 1 2 2 1]
Predictions for k=2: [1 2 2 2 2 1 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 2 1 1 1 2 1 2 2 1 2 1 2 1
1 1 1 1 1 2 1 1 1 1 2 2 1]
Predictions for k=3: [1 2 2 2 2 1 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 2 1 1 1 2 1 2 2 1 2 1 2 1
1 1 1 2 1 2 1 1 1 1 2 2 1]
Predictions for k=4: [1 2 2 2 2 1 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 2 1 1 1 2 1 2 2 1 2 1 2 1
1 1 1 1 1 2 1 1 1 1 2 2 1]
Predictions for k=5: [1 2 2 2 2 1 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 2 1 1 1 2 1 2 2 1 2 1 2 1
1 1 1 1 1 2 1 1 1 1 2 2 1]
Predictions for k=20: [1 2 2 2 2 1 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 2 1 1 1 2 1 1 2 1 2 1 2 1
1 1 1 1 1 1 1 1 1 1 2 2 1]
Predictions for k=30: [1 2 2 2 2 1 1 2 2 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 2 1 1 1 2 1 1 2 1 2 1 2 1
1 1 1 1 1 1 1 1 1 1 2 2 1]
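Note: because the labelling rule (xi ≤ 0.5 → Class1) applies to the test points as well, their true classes are known and the predictions can optionally be scored. A minimal sketch, reusing the variables from the program above:

# True labels for the test points, derived from the same rule used for training
y_true = np.array([1 if xi <= 0.5 else 2 for xi in X_test])
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    accuracy = np.mean(knn.predict(X_test) == y_true)  # fraction classified correctly
    print(f'Accuracy for k={k}: {accuracy:.2f}')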
Program 6: Implement the non-parametric Locally Weighted Regression algorithm in order
to fit data points. Select appropriate data set for your experiment and draw graphs.
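For a query point $x$, Locally Weighted Regression gives each training point $x^{(i)}$ the weight

$$w^{(i)} = \exp\!\left(-\frac{(x^{(i)} - x)^2}{2\tau^2}\right),$$

collects the weights in a diagonal matrix $W$, and solves the weighted least-squares problem

$$\theta = (X^\top W X)^{-1} X^\top W y,$$

exactly as computed in the function below. The bandwidth $\tau$ controls how quickly the weights decay with distance: a small $\tau$ gives a flexible, wiggly fit, while a large $\tau$ approaches ordinary linear regression.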
Source Code:
import numpy as np
import matplotlib.pyplot as plt
# Generate synthetic dataset
np.random.seed(42)
X = np.linspace(0, 10, 100)
y = np.sin(X) + np.random.normal(0, 0.1, 100) # Sinusoidal data with noise
# Define Locally Weighted Regression function
def locally_weighted_regression(x_query, X, y, tau):
    m = X.shape[0]
    W = np.diag(np.exp(-((X[:, 1] - x_query[1]) ** 2) / (2 * tau ** 2)))  # Diagonal weight matrix
    theta = np.linalg.pinv(X.T @ W @ X) @ X.T @ W @ y  # Compute theta
    return x_query @ theta
# Fit Locally Weighted Regression for different values of tau
tau_values = [0.1, 0.5, 1, 5]
X_ones = np.c_[np.ones(X.shape[0]), X] # Add bias term
plt.figure(figsize=(10, 6))
plt.scatter(X, y, label='Data', color='blue', alpha=0.5)
for tau in tau_values:
    y_pred = np.array([locally_weighted_regression(np.array([1, x_i]), X_ones, y, tau) for x_i in X])
    plt.plot(X, y_pred, label=f'tau={tau}')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.title('Locally Weighted Regression with Different Bandwidths')
plt.legend()
plt.show()
Output: a plot of the noisy sinusoidal data overlaid with one fitted curve per tau value.
Program 7: Develop a program to demonstrate the working of Linear Regression and
Polynomial Regression. Use Boston Housing Dataset for Linear Regression and Auto MPG
Dataset (for vehicle fuel efficiency prediction) for Polynomial Regression.
Source Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error
# The original Boston Housing dataset was removed from scikit-learn (v1.2),
# so the California Housing dataset is used as a stand-in for Linear Regression
housing = fetch_california_housing()
X_housing = housing.data[:, :2]  # Selecting first two features for simplicity
y_housing = housing.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(X_housing, y_housing, test_size=0.2, random_state=42)
# Train Linear Regression Model
linear_reg = LinearRegression()
linear_reg.fit(X_train, y_train)
y_pred = linear_reg.predict(X_test)
# Evaluate Model
mse = mean_squared_error(y_test, y_pred)
print(f'Linear Regression MSE: {mse}')
# Plot Predictions vs Actual
plt.scatter(y_test, y_pred, color='blue', alpha=0.5)
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.title('Linear Regression: Actual vs Predicted Prices')
plt.show()
# Load Auto MPG Dataset for Polynomial Regression
auto_mpg = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/mpg.csv").dropna()
X_auto = auto_mpg[['horsepower']].values
y_auto = auto_mpg['mpg'].values
# Split data
X_train, X_test, y_train, y_test = train_test_split(X_auto, y_auto, test_size=0.2, random_state=42)
# Train Polynomial Regression Model
degree = 3 # Choosing a cubic polynomial model
poly_model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
poly_model.fit(X_train, y_train)
y_pred_poly = poly_model.predict(X_test)
# Evaluate Model
mse_poly = mean_squared_error(y_test, y_pred_poly)
print(f'Polynomial Regression MSE: {mse_poly}')
# Plot Polynomial Regression Results
X_sorted = np.sort(X_test, axis=0)
y_sorted = poly_model.predict(X_sorted)
plt.scatter(X_test, y_test, color='blue', alpha=0.5, label='Actual')
plt.plot(X_sorted, y_sorted, color='red', label=f'Polynomial Degree {degree}')
plt.xlabel('Horsepower')
plt.ylabel('MPG')
plt.title('Polynomial Regression: Horsepower vs MPG')
plt.legend()
plt.show()
Output:
Linear Regression MSE: 0.6629874283048177
Polynomial Regression MSE: 18.460267222145088
Program 8: Develop a program to demonstrate the working of the decision tree algorithm.
Use Breast Cancer Data set for building the decision tree and apply this knowledge to
classify a new sample.
Source Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report
# Load Breast Cancer Dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train Decision Tree Model
decision_tree = DecisionTreeClassifier(random_state=42)
decision_tree.fit(X_train, y_train)
# Predict on test data
y_pred = decision_tree.predict(X_test)
# Evaluate Model
accuracy = accuracy_score(y_test, y_pred)
print(f'Decision Tree Accuracy: {accuracy}')
print(classification_report(y_test, y_pred))
# Classify a new sample
new_sample = np.array([X_test[0]]) # Using first test sample as an example
predicted_class = decision_tree.predict(new_sample)
print(f'Predicted class for new sample: {cancer.target_names[predicted_class[0]]}')
Output:
Decision Tree Accuracy: 0.9473684210526315
precision recall f1-score support
0 0.93 0.93 0.93 43
1 0.96 0.96 0.96 71
accuracy 0.95 114
macro avg 0.94 0.94 0.94 114
weighted avg 0.95 0.95 0.95 114
Predicted class for new sample: benign
Program 9: Develop a program to implement the Naive Bayesian classifier considering
Olivetti Face Data set for training. Compute the accuracy of the classifier, considering a few
test data sets.
Source Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import fetch_olivetti_faces
from sklearn.metrics import accuracy_score, classification_report
# Load Olivetti Faces Dataset
faces = fetch_olivetti_faces(shuffle=True, random_state=42)
X = faces.data
y = faces.target
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train Naive Bayes Classifier
naive_bayes = GaussianNB()
naive_bayes.fit(X_train, y_train)
# Predict on test data
y_pred = naive_bayes.predict(X_test)
# Evaluate Model
accuracy = accuracy_score(y_test, y_pred)
print(f'Naive Bayes Accuracy: {accuracy}')
print(classification_report(y_test, y_pred))
# Classify a new sample
new_sample = np.array([X_test[0]]) # Using first test sample as an example
predicted_class = naive_bayes.predict(new_sample)
print(f'Predicted class for new sample: {predicted_class[0]}')
Output:
Naive Bayes Accuracy: 0.775
precision recall f1-score support
0 1.00 1.00 1.00 2
1 1.00 1.00 1.00 1
2 0.33 1.00 0.50 1
3 0.00 0.00 0.00 3
4 1.00 0.50 0.67 4
5 1.00 1.00 1.00 2
7 1.00 1.00 1.00 3
8 1.00 0.67 0.80 3
9 0.50 1.00 0.67 2
10 1.00 1.00 1.00 1
11 1.00 1.00 1.00 1
12 0.50 0.67 0.57 3
13 1.00 0.50 0.67 2
14 0.00 0.00 0.00 4
15 1.00 1.00 1.00 1
16 0.67 1.00 0.80 2
17 1.00 1.00 1.00 2
18 1.00 1.00 1.00 3
19 0.40 1.00 0.57 2
20 1.00 1.00 1.00 3
21 1.00 0.50 0.67 2
22 1.00 0.40 0.57 5
23 1.00 0.50 0.67 2
24 1.00 1.00 1.00 1
25 0.67 1.00 0.80 2
26 1.00 1.00 1.00 1
27 1.00 1.00 1.00 4
28 0.00 0.00 0.00 0
29 1.00 1.00 1.00 2
30 1.00 1.00 1.00 1
31 1.00 0.67 0.80 3
32 1.00 1.00 1.00 1
34 0.00 0.00 0.00 0
35 1.00 1.00 1.00 2
36 1.00 1.00 1.00 2
38 1.00 1.00 1.00 3
39 0.57 1.00 0.73 4
accuracy 0.78 80
macro avg 0.80 0.79 0.77 80
weighted avg 0.82 0.78 0.76 80
Predicted class for new sample: 18
Program 10: Develop a program to implement k-means clustering using Wisconsin Breast
Cancer data set and visualize the clustering result.
Source Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
# Load Breast Cancer Dataset
cancer = load_breast_cancer()
X = cancer.data
# Apply K-Means Clustering
kmeans = KMeans(n_clusters=2, random_state=42)
kmeans.fit(X)
labels = kmeans.labels_
# Reduce dimensions for visualization using PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
# Scatter plot of the clusters
plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=labels, cmap='viridis', alpha=0.5)
plt.title('K-Means Clustering on Breast Cancer Data')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.colorbar(label='Cluster Label')
plt.show()
Output: a scatter plot of the two K-Means clusters projected onto the first two principal components.
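Since the dataset also ships with the true benign/malignant labels (cancer.target), the agreement between the discovered clusters and the actual classes can optionally be quantified. A minimal sketch, reusing the variables from the program above:

from sklearn.metrics import adjusted_rand_score
# The Adjusted Rand Index is invariant to how the cluster labels are numbered
print('Adjusted Rand Index:', adjusted_rand_score(cancer.target, labels))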
Viva Questions:
1. What is the difference between supervised and unsupervised learning?
2. What are the key assumptions of the Naive Bayes classifier?
3. How does the k-Nearest Neighbors (k-NN) algorithm work?
4. What is the curse of dimensionality, and how does PCA help mitigate it?
5. What is the significance of the correlation matrix in data analysis?
6. How does the Find-S algorithm work for hypothesis learning?
7. What is the difference between parametric and non-parametric regression?
8. Why is feature scaling important in machine learning?
9. How do you evaluate the performance of a clustering algorithm?
10. What is the difference between K-Means clustering and hierarchical clustering?
11. How does Locally Weighted Regression differ from traditional regression models?
12. How does k-NN classify a new data point?
13. What are the advantages and disadvantages of Decision Trees?
14. How does the Naive Bayes classifier handle continuous data?
15. What is the role of the Gaussian assumption in Naive Bayes?
16. What are the hyperparameters in K-Means clustering, and how do they affect results?
17. What is the role of eigenvalues and eigenvectors in PCA?
18. How does polynomial regression differ from linear regression?
19. Why do we use test-train splits in machine learning models?
20. What are some real-world applications of K-Means clustering?