Machine Learning Lab
(MR22-1CS0204)
Learning Manual
B.Tech: II Year II Semester (CSE-AI&ML)
(2023-24)
MALLA REDDY UNIVERSITY II YEAR II SEM, CSE-AIML
GENERAL LABORATORY INSTRUCTIONS
1. Students are advised to arrive at the laboratory at least 5 minutes before the scheduled
starting time; those who arrive more than 5 minutes late will not be allowed into the lab.
2. Plan your task well before the session begins and come to the lab prepared with the
synopsis / program / experiment details.
3. Students should enter the laboratory with the laboratory observation notes filled in for
the lab session, with all the details (Problem Statement, Aim, Algorithm, Procedure,
Program, Expected Output, etc.).
4. Bring the laboratory record, updated up to the previous session's experiments, along
with any other materials needed in the lab.
5. Follow the proper dress code and carry your identity card.
6. Sign the laboratory login register, write the TIME-IN, and occupy the computer
system allotted to you by the faculty.
7. Execute your task in the laboratory, record the results / output in the lab observation
notebook, and get it certified by the concerned faculty.
8. All students should be polite and cooperative with the laboratory staff and must
maintain discipline and decency in the laboratory.
9. Computer labs are equipped with sophisticated, high-end branded systems, which
should be used properly.
10. Students and faculty must keep their mobile phones SWITCHED OFF during lab
sessions. Misuse of the equipment or misbehaviour with the staff or systems will attract
severe punishment.
11. Students must take the permission of the faculty before going out in case of any
urgency; anybody found loitering outside the lab / class without permission during
working hours will be dealt with seriously and punished appropriately.
12. Students should LOG OFF / SHUT DOWN the computer system before leaving the
lab after completing the task (experiment) in all respects, and must ensure that the
system / seat is left in proper condition.
AI & ML DEPARTMENT (II YEAR II SEMESTER)
MACHINE LEARNING LABORATORY (MR22-1CS0204)
INDEX
S.No. Name of the Experiment
1  Implementation of Linear Algebra, Statistics & Data Preprocessing
2  Implementation of Linear Regression
3  Implementation of Logistic Regression
4  Implementation of Decision Trees
5  Implementation of Support Vector Machines
6  Implementation of Neural Networks
7  Implementation of K-Means Clustering
8  Implementation of Principal Component Analysis
9  Implementation of Hierarchical Clustering
10 Implementation of Ensemble Learning: Bagging Algorithms
11 Implementation of Random Forest Algorithms
12 Implementation of Model Evaluation
1. Linear algebra, Statistics & Data Preprocessing
Exercise 1.1: Implement a program to calculate the dot product of two vectors.
Python Code:
def dot_product(vector1, vector2):
    if len(vector1) != len(vector2):
        raise ValueError("Vectors must have the same length for dot product calculation.")
    result = sum(x * y for x, y in zip(vector1, vector2))
    return result
# Example data
vector_a = [2, 3, 4]
vector_b = [5, 6, 7]
# Calculate dot product
result_dot_product = dot_product(vector_a, vector_b)
# Display the result
print(f"The dot product of {vector_a} and {vector_b} is: {result_dot_product}")
Output:
The dot product of [2, 3, 4] and [5, 6, 7] is: 56
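Note: NumPy provides the same operation directly; a minimal cross-check (assuming NumPy is installed):
import numpy as np
# np.dot computes the same sum of element-wise products
print(np.dot([2, 3, 4], [5, 6, 7]))  # 56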
Exercise 1.2: Implement a program to generate a random variable from a given probability
distribution.
Python Code:
import numpy as np
import matplotlib.pyplot as plt
# Example data: mean and standard deviation
mean_value = 0
std_deviation = 1
# Number of random variables to generate
num_samples = 1000
# Generate random variables from a normal distribution
random_variable = np.random.normal(loc=mean_value, scale=std_deviation, size=num_samples)
# Plot a histogram of the generated random variables
plt.hist(random_variable, bins=30, density=True, alpha=0.7, color='blue')
# Plot the probability density function (PDF) of the normal distribution
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = np.exp(-0.5 * ((x - mean_value) / std_deviation) ** 2) / (std_deviation * np.sqrt(2 * np.pi))
plt.plot(x, p, 'k', linewidth=2)
plt.title('Random Variable from Normal Distribution')
plt.xlabel('Value')
plt.ylabel('Probability Density')
plt.show()
Output: (histogram of the generated samples with the normal probability density curve overlaid)
Exercise 1.3: Implement a program to calculate the derivative of a function.
Python code:
import sympy as sp
def calculate_derivative():
    # Define a symbolic variable and the function
    x = sp.symbols('x')
    function = x**2 + 3*x + 5
    # Calculate the derivative of the function
    derivative = sp.diff(function, x)
    return function, derivative
# Example data
original_function, derivative_function = calculate_derivative()
# Display the results
print(f"Original function: {original_function}")
print(f"Derivative function: {derivative_function}")
Output :
Original function: x**2 + 3*x + 5
Derivative function: 2*x + 3
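As a quick sanity check, the symbolic derivative can be evaluated at a point with subs(); a short sketch using the same function as above:
import sympy as sp
x = sp.symbols('x')
derivative_function = sp.diff(x**2 + 3*x + 5, x)  # 2*x + 3
print(derivative_function.subs(x, 2))  # 2*2 + 3 = 7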
Exercise 1.4: Implement a program to find the minimum of a function using gradient descent.
Python Code:
import numpy as np
import matplotlib.pyplot as plt
def quadratic_function(x):
    return x**2 + 4*x + 4  # Example quadratic function

def gradient_quadratic_function(x):
    return 2*x + 4  # Gradient of the quadratic function

def gradient_descent(initial_guess, learning_rate, num_iterations):
    x_values = []
    y_values = []
    x = initial_guess
    for _ in range(num_iterations):
        x_values.append(x)
        y_values.append(quadratic_function(x))
        # Update x using the gradient descent formula
        x = x - learning_rate * gradient_quadratic_function(x)
    return x_values, y_values
# Example data
initial_guess = -5
learning_rate = 0.1
num_iterations = 20
# Run gradient descent
x_values, y_values = gradient_descent(initial_guess, learning_rate, num_iterations)
# Plot the function and the gradient descent path
x_range = np.linspace(-8, 2, 100)
plt.plot(x_range, quadratic_function(x_range), label='Quadratic Function')
plt.scatter(x_values, y_values, color='red', label='Gradient Descent Path')
plt.title('Gradient Descent to Minimize a Quadratic Function')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
Output: (plot of the quadratic function with the gradient descent path marked as red points)
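For this function, x**2 + 4*x + 4 = (x + 2)**2, so the gradient 2*x + 4 vanishes at x = -2, which is the true minimum. A short addition, assuming the gradient_descent run above, shows how close the last iterate gets:
# Continuing from the run above: the last stored iterate should be close to -2
print(f"Last iterate: x = {x_values[-1]:.4f} (analytic minimum: x = -2.0)")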
Exercise 1.5: Implement a program to clean a dataset of missing values.
Python Code:
import pandas as pd
from sklearn.impute import SimpleImputer
def generate_example_dataset():
    # Create an example dataset with missing values
    data = {
        'PassengerId': [1, 2, 3, 4, 5],
        'Name': ['John', 'Jane', 'Bob', 'Alice', 'Charlie'],
        'Age': [22, None, 25, None, 30],
        'Fare': [7.25, 71.28, None, 8.05, 10.5],
        'Survived': [0, 1, 1, 0, 1]
    }
    return pd.DataFrame(data)

def clean_dataset(df):
    # Display the original dataset
    print("Original Dataset:")
    print(df)
    # Drop non-numeric columns
    numeric_df = df.select_dtypes(include='number')
    # Handling missing values using SimpleImputer (mean strategy)
    imputer = SimpleImputer(strategy='mean')
    df_cleaned = pd.DataFrame(imputer.fit_transform(numeric_df),
                              columns=numeric_df.columns)
    # Display the cleaned dataset
    print("\nCleaned Dataset:")
    print(df_cleaned)

if __name__ == "__main__":
    # Generate an example dataset with missing values
    example_dataset = generate_example_dataset()
    # Call the clean_dataset function
    clean_dataset(example_dataset)
Output:
Original Dataset:
PassengerId Name Age Fare Survived
0 1 John 22.0 7.25 0
1 2 Jane NaN 71.28 1
2 3 Bob 25.0 NaN 1
3 4 Alice NaN 8.05 0
4 5 Charlie 30.0 10.50 1
Cleaned Dataset:
PassengerId Age Fare Survived
0 1.0 22.000000 7.25 0.0
1 2.0 25.666667 71.28 1.0
2 3.0 25.000000 24.27 1.0
3 4.0 25.666667 8.05 0.0
4 5.0 30.000000 10.50 1.0
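SimpleImputer also supports other strategies; a small variant, shown here on a similar toy frame, fills missing values with the column median instead of the mean:
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({'Age': [22, None, 25, None, 30],
                   'Fare': [7.25, 71.28, None, 8.05, 10.5]})
# strategy='median' fills each missing value with that column's median
imputer = SimpleImputer(strategy='median')
print(pd.DataFrame(imputer.fit_transform(df), columns=df.columns))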
2. Linear Regression
Exercise 2.1: Implement a program to fit a linear regression model to a dataset.
Python code:
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
data_set= pd.read_csv('Salary_Data.csv')
x= data_set.iloc[:, :-1].values
y= data_set.iloc[:, 1].values
# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 1/3, random_state=0)
#Fitting the Simple Linear Regression model to the training dataset
from sklearn.linear_model import LinearRegression
regressor= LinearRegression()
regressor.fit(x_train, y_train) #Prediction of Test and Training set result
y_pred= regressor.predict(x_test)
x_pred= regressor.predict(x_train)
mtp.scatter(x_train, y_train, color="green")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Training Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()
#visualizing the Test set results
mtp.scatter(x_test, y_test, color="blue")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Test Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()
Output: (scatter plots of Salary vs Experience for the training and test sets, with the fitted regression line)
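The fitted line itself can be inspected after training; a short addition, assuming the regressor object from the code above:
# Continuing from the fitted model above
print("Slope (coefficient):", regressor.coef_)
print("Intercept:", regressor.intercept_)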
Exercise 2.2: Implement a program to calculate the coefficient of determination for a linear regression model.
Python code:
First, import the necessary packages:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
Next, load the diabetes dataset and create its object:
diabetes = datasets.load_diabetes()
As we are implementing simple linear regression, we use only one feature:
X = diabetes.data[:, np.newaxis, 2]
Next, split the feature data into training and testing sets:
X_train = X[:-30]
X_test = X[-30:]
Similarly, split the target into training and testing sets:
y_train = diabetes.target[:-30]
y_test = diabetes.target[-30:]
Now, to train the model, create a linear regression object:
regr = linear_model.LinearRegression()
Next, train the model using the training sets:
regr.fit(X_train, y_train)
Next, make predictions using the testing set:
y_pred = regr.predict(X_test)
Next, print the coefficient along with evaluation metrics such as the mean squared error (MSE) and the variance (R²) score:
print('Coefficients: \n', regr.coef_)
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
print('Variance score: %.2f' % r2_score(y_test, y_pred))
Finally, plot the outputs:
plt.scatter(X_test, y_test, color='blue')
plt.plot(X_test, y_pred, color='red', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()
Output:
Coefficients:
[941.43097333]
Mean squared error: 3035.06
Variance score: 0.41
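The variance score printed above is the coefficient of determination, R² = 1 - SS_res / SS_tot. A minimal sketch, assuming y_test and y_pred from the code above, computes it directly:
import numpy as np

# Assuming y_test and y_pred come from the regression above
ss_res = np.sum((y_test - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)  # total sum of squares
print("R^2 =", 1 - ss_res / ss_tot)               # should match r2_score(y_test, y_pred)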
3. Logistic Regression
Exercise 3.1: Implement a program to fit a logistic regression model to a dataset with the given x, y data:
x = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
Python code:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
x = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
model = LogisticRegression(solver='liblinear', random_state=0)
model.fit(x, y)
# The expressions below display their values in an interactive session (notebook/REPL);
# in a script, wrap them in print() to see the output.
model.classes_
model.intercept_
model.coef_
model.predict_proba(x)
model.predict(x)
model.score(x, y)
confusion_matrix(y, model.predict(x))
print(classification_report(y, model.predict(x)))
#Improve the Model
#You can improve your model by setting different parameters. For example, let’s work with
#the regularization strength C equal to 10.0, instead of the default value of 1.0:
model = LogisticRegression(solver='liblinear', C=10.0, random_state=0)
model.fit(x, y)
model.classes_
model.intercept_
model.coef_
model.predict_proba(x)
model.predict(x)
model.score(x, y)
confusion_matrix(y, model.predict(x))
print(classification_report(y, model.predict(x)))
Output:
precision recall f1-score support
0 1.00 1.00 1.00 4
1 1.00 1.00 1.00 6
accuracy 1.00 10
macro avg 1.00 1.00 1.00 10
weighted avg 1.00 1.00 1.00 10
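Internally, the model computes P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1*x))). A small sketch, assuming the fitted model from above, reproduces predict_proba for a single input:
import numpy as np

# Assuming `model` is the fitted LogisticRegression from above
b0 = model.intercept_[0]
b1 = model.coef_[0][0]
x_val = 5
p_manual = 1 / (1 + np.exp(-(b0 + b1 * x_val)))
print(p_manual)                    # manually computed P(y=1 | x=5)
print(model.predict_proba([[5]]))  # [P(y=0), P(y=1)] from scikit-learn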
Exercise 3.2: Implement a program to calculate the odds ratio for a logistic regression model.
Python Code:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load or create your dataset
# for simplicity, let's create a sample dataset
data = {
    'Age': [25, 30, 35, 40, 45, 50, 55, 60, 65, 70],
    'Smoker': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    'Outcome': [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)
# Split the dataset into features (X) and target variable (y)
X = df[['Age', 'Smoker']]
y = df['Outcome']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
# Calculate odds ratio for each feature
odds_ratio = np.exp(model.coef_)
print(f'Odds Ratio: {odds_ratio}')
Output:
Accuracy: 1.00
Odds Ratio: [[2.04559905 1.14215319]]
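To make the odds ratios easier to read, they can be paired with the feature names; a short addition assuming the model and X defined above:
# Continuing from the fitted model above
for name, ratio in zip(X.columns, np.exp(model.coef_[0])):
    print(f"{name}: odds ratio = {ratio:.3f}")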
4. Decision trees
Exercise 4.1: Implement a program to construct a decision tree from a dataset.
Python code:
# Python program to implement decision tree algorithm and plot the tree
# Importing the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import tree
# Loading the dataset
iris = load_iris()
#converting the data to a pandas dataframe
data = pd.DataFrame(data = iris.data, columns = iris.feature_names)
#creating a separate column for the target variable of iris dataset
data['Species'] = iris.target
#replacing the categories of target variable with the actual names of the species
target = np.unique(iris.target)
target_n = np.unique(iris.target_names)
target_dict = dict(zip(target, target_n))
data['Species'] = data['Species'].replace(target_dict)
# Separating the independent dependent variables of the dataset
x = data.drop(columns = "Species")
y = data["Species"]
names_features = x.columns
target_labels = y.unique()
# Splitting the dataset into training and testing datasets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 93)
# Importing the Decision Tree classifier class from sklearn
from sklearn.tree import DecisionTreeClassifier
# Creating an instance of the classifier class
dtc = DecisionTreeClassifier(max_depth = 3, random_state = 93)
# Fitting the training dataset to the model
dtc.fit(x_train, y_train)
# Plotting the Decision Tree
plt.figure(figsize = (30, 10), facecolor = 'b')
Tree = tree.plot_tree(dtc, feature_names = names_features, class_names = target_labels,
                      rounded = True, filled = True, fontsize = 14)
plt.show()
y_pred = dtc.predict(x_test)
# Finding the confusion matrix
confusion_matrix = metrics.confusion_matrix(y_test, y_pred)
matrix = pd.DataFrame(confusion_matrix)
sns.set(font_scale = 1.3)
# Plotting heatmap (create the figure first so the heatmap is drawn on it)
plt.figure(figsize = (10, 7))
axis = plt.axes()
sns.heatmap(matrix, annot = True, fmt = "g", ax = axis, cmap = "magma")
axis.set_title('Confusion Matrix')
axis.set_xlabel("Predicted Values", fontsize = 10)
axis.set_xticklabels(list(target_labels))
axis.set_ylabel("True Labels", fontsize = 10)
axis.set_yticklabels(list(target_labels), rotation = 0)
plt.show()
Output: (the plotted decision tree followed by the confusion-matrix heatmap)
Exercise 4.2: Implement a program to calculate the accuracy of a decision tree model.
Python code:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Load or create your dataset
# For simplicity, let's create a sample dataset
data = {
    'Feature1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Feature2': [2, 4, 1, 3, 6, 8, 5, 7, 10, 9],
    'Target': [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)
# Split the dataset into features (X) and target variable (y)
X = df[['Feature1', 'Feature2']]
y = df['Target']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a decision tree model
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
# Check if the accuracy meets the desired threshold (97%)
desired_accuracy = 0.97
if accuracy >= desired_accuracy:
    print(f'Model accuracy meets the desired threshold of {desired_accuracy:.2%}')
else:
    print(f'Model accuracy does not meet the desired threshold of {desired_accuracy:.2%}')
Output:
Accuracy: 1.00
Model accuracy meets the desired threshold of 97.00%
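With only ten rows, a single train/test split can be misleading. A small sketch using 3-fold cross-validation on the same data (3 folds because the smaller class has only three samples) gives a more stable accuracy estimate:
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assuming X and y are the features and target defined above
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=3)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())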
5. Support Vector Machine (SVM)
Exercise 5.1: Implement a program to fit a support vector machine model to a dataset.
Python code:
import pandas as pd
data = pd.read_csv("apples_and_oranges.csv")
#Splitting the dataset into training and test samples
from sklearn.model_selection import train_test_split
training_set, test_set = train_test_split(data, test_size = 0.2, random_state = 1)
#Classifying the predictors and target
X_train = training_set.iloc[:,0:2].values
Y_train = training_set.iloc[:,2].values
X_test = test_set.iloc[:,0:2].values
Y_test = test_set.iloc[:,2].values
#Initializing Support Vector Machine and fitting the training data
from sklearn.svm import SVC
classifier = SVC(kernel='rbf', random_state = 1)
classifier.fit(X_train,Y_train)
#Predicting the classes for test set
Y_pred = classifier.predict(X_test)
#Attaching the predictions to test set for comparing
test_set["Predictions"] = Y_pred
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(Y_test,Y_pred)
accuracy = float(cm.diagonal().sum())/len(Y_test)
print("\nAccuracy of SVM for the Given Dataset : ", accuracy)
Output:
Accuracy of SVM for the Given Dataset: 0.375
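SVMs with an RBF kernel are sensitive to feature scaling, so standardizing the features often improves this result. One possible variant, assuming X_train, X_test, Y_train, Y_test from the split above:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Scale the features, then fit the same RBF-kernel SVM
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel='rbf', random_state=1))
scaled_svm.fit(X_train, Y_train)
print("Accuracy with scaling:", scaled_svm.score(X_test, Y_test))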
Exercise 5.2: Implement an SVM and evaluate its performance on the given bill_authentication.csv dataset.
Python code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#matplotlib inline
bankdata = pd.read_csv("bill_authentication.csv")
bankdata.shape
bankdata.head()
X = bankdata.drop('Class', axis=1)
y = bankdata['Class']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)
from sklearn.svm import SVC
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)
y_pred = svclassifier.predict(X_test)
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
Output:
[[152 0]
[ 1 122]]
precision recall f1-score support
0 0.99 1.00 1.00 152
1 1.00 0.99 1.00 123
avg / total 1.00 1.00 1.00 275
6. Neural networks
Exercise 6.1: Implement a program to construct a neural network from a dataset.
Python code:
# Import necessary libraries
import numpy as np
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification
# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Define the neural network architecture
model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(X_train.shape[1],)),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f'Test accuracy: {test_acc * 100:.2f}%')
# Make predictions on new data
new_data = np.random.randn(5, 20) # Replace with your own new data
new_data_standardized = scaler.transform(new_data)
predictions = model.predict(new_data_standardized)
print("Predictions:")
print(predictions)
Output :
Epoch 1/10
20/20 [==============================] - 3s 37ms/step - loss: 0.7252 -
accuracy: 0.5641 - val_loss: 0.6267 - val_accuracy: 0.6500
Epoch 2/10
20/20 [==============================] - 0s 6ms/step - loss: 0.6146 -
accuracy: 0.6484 - val_loss: 0.5604 - val_accuracy: 0.7250
Epoch 3/10
20/20 [==============================] - 0s 8ms/step - loss: 0.5520 -
accuracy: 0.7344 - val_loss: 0.5102 - val_accuracy: 0.7688
Epoch 4/10
20/20 [==============================] - 0s 8ms/step - loss: 0.5009 -
accuracy: 0.7875 - val_loss: 0.4652 - val_accuracy: 0.8062
Epoch 5/10
20/20 [==============================] - 0s 10ms/step - loss: 0.4571 -
accuracy: 0.8219 - val_loss: 0.4248 - val_accuracy: 0.8313
Epoch 6/10
20/20 [==============================] - 0s 8ms/step - loss: 0.4175 -
accuracy: 0.8422 - val_loss: 0.3871 - val_accuracy: 0.8250
Epoch 7/10
20/20 [==============================] - 0s 12ms/step - loss: 0.3873 -
accuracy: 0.8531 - val_loss: 0.3550 - val_accuracy: 0.8438
Epoch 8/10
20/20 [==============================] - 0s 8ms/step - loss: 0.3607 -
accuracy: 0.8641 - val_loss: 0.3335 - val_accuracy: 0.8562
Epoch 9/10
20/20 [==============================] - 0s 11ms/step - loss: 0.3411 -
accuracy: 0.8734 - val_loss: 0.3190 - val_accuracy: 0.8562
Epoch 10/10
20/20 [==============================] - 0s 7ms/step - loss: 0.3254 -
accuracy: 0.8813 - val_loss: 0.3034 - val_accuracy: 0.8625
7/7 [==============================] - 0s 9ms/step - loss: 0.3521 -
accuracy: 0.8550
Test accuracy: 85.50%
1/1 [==============================] - 0s 283ms/step
Predictions:
[[0.813061 ]
[0.5343085 ]
[0.99716836]
[0.67003953]
[0.19199367]]
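The sigmoid output is the predicted probability of class 1; a small addition, assuming the predictions array above, converts the probabilities into hard class labels with a 0.5 threshold:
# Continuing from the predictions above
predicted_classes = (predictions > 0.5).astype(int)
print(predicted_classes.ravel())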
Exercise 6.2: Implement a program to train a neural network model.
Python code:
# Import necessary libraries
import numpy as np
import tensorflow as tf
from tensorflow import keras
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target.reshape(-1, 1)
# One-hot encode the target variable
encoder = OneHotEncoder(sparse=False)  # note: scikit-learn >= 1.2 uses sparse_output=False instead of sparse=False
y_onehot = encoder.fit_transform(y)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y_onehot, test_size=0.2, random_state=42)
# Define the neural network architecture
model = keras.Sequential([
    keras.layers.Dense(10, activation='relu', input_shape=(X_train.shape[1],)),
    keras.layers.Dense(3, activation='softmax')  # Output layer with 3 classes for the Iris dataset
])
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=10, validation_split=0.1)
# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f'Test accuracy: {test_acc * 100:.2f}%')
Output :
Epoch 1/50
11/11 [==============================] - 1s 40ms/step - loss: 3.0484 - accuracy: 0.3519 - val_loss:
3.5613 - val_accuracy: 0.1667
Epoch 2/50
11/11 [==============================] - 0s 11ms/step - loss: 2.7265 - accuracy: 0.3519 - val_loss:
3.1855 - val_accuracy: 0.1667
Epoch 3/50
11/11 [==============================] - 0s 22ms/step - loss: 2.4384 - accuracy: 0.3519 - val_loss:
2.8338 - val_accuracy: 0.1667
Epoch 4/50
11/11 [==============================] - 0s 13ms/step - loss: 2.1603 - accuracy: 0.3519 - val_loss:
2.5011 - val_accuracy: 0.1667
Epoch 5/50
11/11 [==============================] - 0s 13ms/step - loss: 1.8983 - accuracy: 0.3519 - val_loss:
2.1564 - val_accuracy: 0.1667
Epoch 6/50
11/11 [==============================] - 0s 13ms/step - loss: 1.6421 - accuracy: 0.3519 - val_loss:
1.8575 - val_accuracy: 0.1667
Epoch 7/50
11/11 [==============================] - 0s 12ms/step - loss: 1.4327 - accuracy: 0.3519 - val_loss:
1.6076 - val_accuracy: 0.1667
Epoch 8/50
11/11 [==============================] - 0s 9ms/step - loss: 1.2498 - accuracy: 0.3519 - val_loss: 1.4182
- val_accuracy: 0.1667
Epoch 9/50
11/11 [==============================] - 0s 18ms/step - loss: 1.1191 - accuracy: 0.3611 - val_loss:
1.2617 - val_accuracy: 0.2500
Epoch 10/50
11/11 [==============================] - 0s 8ms/step - loss: 1.0100 - accuracy: 0.4630 - val_loss: 1.1399
- val_accuracy: 0.4167
Epoch 11/50
11/11 [==============================] - 0s 9ms/step - loss: 0.9218 - accuracy: 0.6481 - val_loss: 1.0457
- val_accuracy: 0.5833
Epoch 12/50
11/11 [==============================] - 0s 19ms/step - loss: 0.8534 - accuracy: 0.6667 - val_loss:
0.9746 - val_accuracy: 0.5833
Epoch 13/50
11/11 [==============================] - 0s 5ms/step - loss: 0.8040 - accuracy: 0.6759 - val_loss: 0.9162
- val_accuracy: 0.5833
Epoch 14/50
11/11 [==============================] - 0s 5ms/step - loss: 0.7619 - accuracy: 0.6852 - val_loss: 0.8738
- val_accuracy: 0.5833
Epoch 15/50
11/11 [==============================] - 0s 6ms/step - loss: 0.7339 - accuracy: 0.6759 - val_loss: 0.8430
- val_accuracy: 0.5000
Epoch 16/50
11/11 [==============================] - 0s 5ms/step - loss: 0.7111 - accuracy: 0.6944 - val_loss: 0.8184
- val_accuracy: 0.5000
Epoch 17/50
11/11 [==============================] - 0s 5ms/step - loss: 0.6963 - accuracy: 0.6667 - val_loss: 0.7989
- val_accuracy: 0.3333
Epoch 18/50
11/11 [==============================] - 0s 5ms/step - loss: 0.6797 - accuracy: 0.6944 - val_loss: 0.7861
- val_accuracy: 0.3333
Epoch 19/50
11/11 [==============================] - 0s 6ms/step - loss: 0.6681 - accuracy: 0.7037 - val_loss: 0.7746
- val_accuracy: 0.3333
Epoch 20/50
11/11 [==============================] - 0s 5ms/step - loss: 0.6578 - accuracy: 0.6944 - val_loss: 0.7637
- val_accuracy: 0.5000
Epoch 21/50
11/11 [==============================] - 0s 5ms/step - loss: 0.6476 - accuracy: 0.7130 - val_loss: 0.7552
- val_accuracy: 0.5000
Epoch 22/50
11/11 [==============================] - 0s 6ms/step - loss: 0.6388 - accuracy: 0.7037 - val_loss: 0.7478
- val_accuracy: 0.5833
Epoch 23/50
11/11 [==============================] - 0s 5ms/step - loss: 0.6307 - accuracy: 0.7130 - val_loss: 0.7414
- val_accuracy: 0.5833
Epoch 24/50
11/11 [==============================] - 0s 6ms/step - loss: 0.6227 - accuracy: 0.7130 - val_loss: 0.7346
- val_accuracy: 0.5833
Epoch 25/50
11/11 [==============================] - 0s 5ms/step - loss: 0.6147 - accuracy: 0.7222 - val_loss: 0.7284
- val_accuracy: 0.5833
Epoch 26/50
11/11 [==============================] - 0s 6ms/step - loss: 0.6078 - accuracy: 0.7037 - val_loss: 0.7220
- val_accuracy: 0.5833
Epoch 27/50
11/11 [==============================] - 0s 7ms/step - loss: 0.6009 - accuracy: 0.6944 - val_loss: 0.7167
- val_accuracy: 0.5833
Epoch 28/50
11/11 [==============================] - 0s 6ms/step - loss: 0.5941 - accuracy: 0.6852 - val_loss: 0.7119
- val_accuracy: 0.5833
Epoch 29/50
11/11 [==============================] - 0s 5ms/step - loss: 0.5875 - accuracy: 0.6852 - val_loss: 0.7065
- val_accuracy: 0.5833
Epoch 30/50
11/11 [==============================] - 0s 7ms/step - loss: 0.5816 - accuracy: 0.6852 - val_loss: 0.7009
- val_accuracy: 0.5833
Epoch 31/50
11/11 [==============================] - 0s 7ms/step - loss: 0.5757 - accuracy: 0.6852 - val_loss: 0.6960
- val_accuracy: 0.5833
Epoch 32/50
11/11 [==============================] - 0s 5ms/step - loss: 0.5697 - accuracy: 0.6852 - val_loss: 0.6907
- val_accuracy: 0.5833
Epoch 33/50
11/11 [==============================] - 0s 7ms/step - loss: 0.5642 - accuracy: 0.6852 - val_loss: 0.6860
- val_accuracy: 0.5833
Epoch 34/50
11/11 [==============================] - 0s 5ms/step - loss: 0.5591 - accuracy: 0.6944 - val_loss: 0.6818
- val_accuracy: 0.5833
Epoch 35/50
11/11 [==============================] - 0s 7ms/step - loss: 0.5540 - accuracy: 0.7037 - val_loss: 0.6770
- val_accuracy: 0.5833
Epoch 36/50
11/11 [==============================] - 0s 7ms/step - loss: 0.5487 - accuracy: 0.7037 - val_loss: 0.6722
- val_accuracy: 0.5833
Epoch 37/50
11/11 [==============================] - 0s 5ms/step - loss: 0.5443 - accuracy: 0.6852 - val_loss: 0.6678
- val_accuracy: 0.5833
Epoch 38/50
11/11 [==============================] - 0s 7ms/step - loss: 0.5394 - accuracy: 0.7037 - val_loss: 0.6634
- val_accuracy: 0.5833
Epoch 39/50
11/11 [==============================] - 0s 5ms/step - loss: 0.5349 - accuracy: 0.7037 - val_loss: 0.6600
- val_accuracy: 0.5833
Epoch 40/50
11/11 [==============================] - 0s 5ms/step - loss: 0.5306 - accuracy: 0.7130 - val_loss: 0.6558
- val_accuracy: 0.5833
Epoch 41/50
11/11 [==============================] - 0s 5ms/step - loss: 0.5264 - accuracy: 0.7037 - val_loss: 0.6519
- val_accuracy: 0.5833
Epoch 42/50
11/11 [==============================] - 0s 7ms/step - loss: 0.5221 - accuracy: 0.7037 - val_loss: 0.6479
- val_accuracy: 0.5833
Epoch 43/50
11/11 [==============================] - 0s 6ms/step - loss: 0.5181 - accuracy: 0.7037 - val_loss: 0.6447
- val_accuracy: 0.5833
Epoch 44/50
11/11 [==============================] - 0s 6ms/step - loss: 0.5142 - accuracy: 0.7037 - val_loss: 0.6409
- val_accuracy: 0.5833
Epoch 45/50
11/11 [==============================] - 0s 7ms/step - loss: 0.5104 - accuracy: 0.7037 - val_loss: 0.6370
- val_accuracy: 0.5833
Epoch 46/50
11/11 [==============================] - 0s 7ms/step - loss: 0.5072 - accuracy: 0.7315 - val_loss: 0.6332
- val_accuracy: 0.5833
Epoch 47/50
11/11 [==============================] - 0s 6ms/step - loss: 0.5030 - accuracy: 0.7222 - val_loss: 0.6296
- val_accuracy: 0.5833
Epoch 48/50
11/11 [==============================] - 0s 5ms/step - loss: 0.4997 - accuracy: 0.7222 - val_loss: 0.6259
- val_accuracy: 0.5833
Epoch 49/50
11/11 [==============================] - 0s 5ms/step - loss: 0.4961 - accuracy: 0.7407 - val_loss: 0.6222
- val_accuracy: 0.5833
Epoch 50/50
11/11 [==============================] - 0s 5ms/step - loss: 0.4929 - accuracy: 0.7315 - val_loss: 0.6185
- val_accuracy: 0.5833
1/1 [==============================] - 0s 27ms/step - loss: 0.4907 - accuracy: 0.7667
Test accuracy: 76.67%
7. K-means clustering
Exercise 7.1: Implement a program to cluster a dataset using K-means clustering.
Python code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
# Load the Iris dataset
iris = load_iris()
data = iris.data # Features only, not using target labels
# Standardize the data (important for K-Means)
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
# Apply K-Means clustering
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(data_scaled)
# Get cluster labels and centroids
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
# Visualize the clustering result
plt.scatter(data_scaled[:, 0], data_scaled[:, 1], c=labels, cmap='viridis', edgecolors='k', s=50)
plt.scatter(centroids[:, 0], centroids[:, 1], marker='X', s=200, color='red', label='Centroids')
plt.title('K-Means Clustering on Iris Dataset')
plt.xlabel('Sepal Length (scaled)')
plt.ylabel('Sepal Width (scaled)')
plt.legend()
plt.show()
Output: (scatter plot of the scaled iris features coloured by cluster, with the cluster centroids marked in red)
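Because the target labels are not used, clustering quality can be checked with the silhouette score; a minimal sketch assuming data_scaled and labels from the code above:
from sklearn.metrics import silhouette_score

# Values closer to 1 indicate well-separated clusters
print("Silhouette score:", silhouette_score(data_scaled, labels))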
Exercise 7.2: Implement a program to calculate the elbow method for K-means clustering.
Python code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
# Generate example data
np.random.seed(42)
X, _ = make_blobs(n_samples=300, centers=4, random_state=42, cluster_std=1.0)
# Standardize the data (important for K-Means)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Implement the elbow method
distortions = []
max_k = 10
for k in range(1, max_k + 1):
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(X_scaled)
    distortions.append(kmeans.inertia_)  # Inertia: sum of squared distances to the closest centroid
# Plot the elbow method graph
plt.plot(range(1, max_k + 1), distortions, marker='o')
plt.title('Elbow Method for Optimal K')
plt.xlabel('Number of Clusters (K)')
plt.ylabel('Sum of Squared Distances (Inertia)')
plt.show()
Output: (elbow plot of inertia versus the number of clusters K)
8. Principal component analysis
Exercise 8.1: Implement a program to perform principal component analysis on a dataset.
Python code:
import numpy as np
# Step 1: Data standardization
def standardize(X):
    return (X - np.mean(X, axis=0)) / np.std(X, axis=0)

# Step 2: Covariance matrix calculation
def compute_covariance_matrix(X):
    return np.cov(X.T)

# Step 3: Eigenvalue and eigenvector calculation
def find_eigenvectors_and_eigenvalues(X):
    cov_matrix = compute_covariance_matrix(X)
    eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
    return eigenvalues, eigenvectors

# Step 4: Principal component calculation (project onto the top-k eigenvectors)
def project_data(X, eigenvalues, eigenvectors, k):
    # Sort eigenvectors by decreasing eigenvalue and keep the top k
    top_k = np.argsort(-eigenvalues)[:k]
    sorted_eigenvectors = eigenvectors[:, top_k]
    return np.dot(X, sorted_eigenvectors)

# Step 5: Dimensionality reduction (fraction of variance retained by the top-k components)
def get_variance_explained(eigenvalues, k):
    sorted_eigenvalues = np.sort(eigenvalues)[::-1]
    return sum(sorted_eigenvalues[:k]) / sum(eigenvalues)

# Example usage
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
X_std = standardize(X)
eigenvalues, eigenvectors = find_eigenvectors_and_eigenvalues(X_std)
projected_data = project_data(X_std, eigenvalues, eigenvectors, 2)
variance_explained = get_variance_explained(eigenvalues, 2)
print("Standardized data:")
print(X_std)
print("Covariance matrix:")
print(compute_covariance_matrix(X_std))
print("Eigenvalues:")
print(eigenvalues)
print("Eigenvectors:")
print(eigenvectors)
print("Projected data:")
print(projected_data)
print("Variance explained:")
print(variance_explained)
Output:
Exercise 8.2: Implement a program to calculate the covariance matrix for a dataset.
Python code:
import numpy as np
# Predefined dataset
dataset = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])
# Calculate the covariance matrix
covariance_matrix = np.cov(dataset, rowvar=False)
# Print the covariance matrix
print("Covariance Matrix:")
print(covariance_matrix)
Output:
Covariance Matrix:
[[15. 15. 15.]
[15. 15. 15.]
[15. 15. 15.]]
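The manual steps in Exercise 8.1 (standardize, covariance matrix, eigen-decomposition, projection) are what scikit-learn's PCA performs internally; a small cross-check on the same matrix used there:
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
projected = pca.fit_transform(X_std)
print("Projected data:\n", projected)
print("Explained variance ratio:", pca.explained_variance_ratio_)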
9. Hierarchical clustering
Exercise 9.1: Implement a program to perform hierarchical clustering on a dataset.
Python code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
# Generate example data
np.random.seed(42)
X = np.array([[2, 5], [3, 3], [5, 8], [8, 5], [10, 6]])
# Perform hierarchical clustering using linkage function
linkage_matrix = linkage(X, method='complete', metric='euclidean')
# Plot the dendrogram
dendrogram(linkage_matrix, labels=['A', 'B', 'C', 'D', 'E'])
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Data Points')
plt.ylabel('Euclidean Distance')
plt.show()
Output: (hierarchical clustering dendrogram of the five points A to E)
Exercise 9.2: Implement a program to perform agglomerative (hierarchical) clustering.
Python code:
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt
# Generate example data
np.random.seed(42)
X = np.array([[2, 5], [3, 3], [5, 8], [8, 5], [10, 6]])
# Agglomerative clustering with complete linkage
agg_cluster_complete = AgglomerativeClustering(n_clusters=None, linkage='complete',
distance_threshold=0)
agg_labels_complete = agg_cluster_complete.fit_predict(X)
# Plot dendrogram for complete linkage
linkage_matrix_complete = linkage(X, method='complete')
dendrogram(linkage_matrix_complete, labels=['A', 'B', 'C', 'D', 'E'])
plt.title('Agglomerative Clustering (Complete Linkage)')
plt.show()
Output: (dendrogram produced by agglomerative clustering with complete linkage)
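To convert the dendrogram into flat cluster labels, the linkage matrix can be cut at a chosen number of clusters; a small sketch assuming linkage_matrix_complete from the code above:
from scipy.cluster.hierarchy import fcluster

# Cut the hierarchy so that exactly 2 clusters remain
flat_labels = fcluster(linkage_matrix_complete, t=2, criterion='maxclust')
print(flat_labels)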
10. Bagging
Exercise 10.1: Implement a program to apply bagging to a decision tree model.
Python code:
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_classes=2,
random_state=42)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define the base decision tree model
base_model = DecisionTreeClassifier(random_state=42)
# Define the bagging classifier
bagging_model = BaggingClassifier(base_model, n_estimators=10, random_state=42)
# Train the bagging model
bagging_model.fit(X_train, y_train)
# Make predictions on the test set
predictions = bagging_model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy}")
Output:
Accuracy: 0.88
Exercise 10.2: Implement a program to calculate the out-of-bag error for a bagging model.
Python code:
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_classes=2,
random_state=42)
# Create a bagging model with decision tree as base estimator
base_model = DecisionTreeClassifier(random_state=42)
bagging_model = BaggingClassifier(base_model, n_estimators=10, oob_score=True,
                                  random_state=42)
# Train the bagging model
bagging_model.fit(X, y)
# Access the out-of-bag score
oob_score = bagging_model.oob_score_
print(f"Out-of-Bag Score: {oob_score}")
Output:
Out-of-Bag Score: 0.871
11. Random forest
Exercise 11.1: Implement a program to build a random forest (an ensemble of decision trees).
Python code:
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_classes=2,
random_state=42)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define the Random Forest classifier
random_forest_model = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the Random Forest model
random_forest_model.fit(X_train, y_train)
# Make predictions on the test set
predictions = random_forest_model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy}")
Output:
Accuracy: 0.92
Exercise 11.2: Implement a program to calculate the out-of-bag error for a random forest model.
Python code:
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_classes=2,
random_state=42)
# Create a Random Forest classifier
random_forest_model = RandomForestClassifier(n_estimators=100, oob_score=True,
random_state=42)
# Train the Random Forest model
random_forest_model.fit(X, y)
# Access the out-of-bag score
oob_score = random_forest_model.oob_score_
print(f"Out-of-Bag Score: {oob_score}")
Output:
Out-of-Bag Score: 0.912
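A trained random forest also reports how much each feature contributed to its splits; a short addition assuming the fitted random_forest_model above:
# Continuing from the fitted model above
for i, importance in enumerate(random_forest_model.feature_importances_):
    print(f"Feature {i}: importance = {importance:.3f}")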
12. Model Evaluation
Exercise 12.1: Implement a program to calculate the accuracy, precision, and recall of a model.
Python Code:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report, confusion_matrix

def load_example_dataset():
    # Load the Iris dataset as an example
    iris = load_iris()
    X, y = iris.data, iris.target
    return X, y

def evaluate_model(X, y):
    # Split the dataset into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    # Create a Logistic Regression model (replace with your own model)
    model = LogisticRegression()
    # Train the model
    model.fit(X_train, y_train)
    # Make predictions on the test set
    y_pred = model.predict(X_test)
    # Calculate and display accuracy, precision, and recall
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    print(f"Accuracy: {accuracy}")
    print(f"Precision: {precision}")
print(f"Recall: {recall}")
# Display classification report and confusion matrix
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
print("\nConfusion Matrix:")
cm = confusion_matrix(y_test, y_pred)
print(cm)
if __name__ == "__main__":
    # Load an example dataset
    X, y = load_example_dataset()
    # Evaluate the model and calculate metrics
    evaluate_model(X, y)
Output:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
Classification Report:
precision recall f1-score support
0 1.00 1.00 1.00 10
1 1.00 1.00 1.00 9
2 1.00 1.00 1.00 11
accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30
Confusion Matrix:
[[10 0 0]
[ 0 9 0]
[ 0 0 11]]
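For any single class, precision and recall can be read directly off the confusion matrix: precision = TP / (TP + FP) (down a column), recall = TP / (TP + FN) (along a row). A minimal worked sketch for class 0 using the matrix printed above:
import numpy as np

# Confusion matrix from the output above (rows = true labels, columns = predictions)
cm = np.array([[10, 0, 0],
               [0, 9, 0],
               [0, 0, 11]])
tp = cm[0, 0]
precision_0 = tp / cm[:, 0].sum()  # TP / (TP + FP) for class 0
recall_0 = tp / cm[0, :].sum()     # TP / (TP + FN) for class 0
print(precision_0, recall_0)       # both 1.0 here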
Exercise 12.2: Implement a program to calculate the ROC curve and AUC of a model.
Python Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc, roc_auc_score
def load_example_dataset():
    # Load the Breast Cancer dataset as an example
    breast_cancer = load_breast_cancer()
    X, y = breast_cancer.data, breast_cancer.target
    return X, y

def evaluate_model(X, y):
    # Split the dataset into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    # Create a Logistic Regression model (replace with your own model)
    model = LogisticRegression()
    # Train the model
    model.fit(X_train, y_train)
    # Make probability predictions on the test set
    y_prob = model.predict_proba(X_test)[:, 1]
    # Calculate ROC curve and AUC
    fpr, tpr, thresholds = roc_curve(y_test, y_prob)
    roc_auc = auc(fpr, tpr)
    # Display AUC score
    print(f"AUC Score: {roc_auc}")
    # Visualize ROC curve
    plt.figure(figsize=(8, 6))
    plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC Curve (AUC = {roc_auc:.2f})')
    plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic (ROC) Curve')
    plt.legend()
    plt.show()

if __name__ == "__main__":
    # Load an example dataset
    X, y = load_example_dataset()
    # Evaluate the model and calculate ROC curve and AUC
    evaluate_model(X, y)
Output:
AUC Score: 0.9970520799213888
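roc_auc_score, which is already imported above, returns the same area directly from the labels and predicted probabilities; a one-line check that could be added inside evaluate_model (assuming y_test and y_prob are in scope):
print("AUC via roc_auc_score:", roc_auc_score(y_test, y_prob))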