HINDUSTHAN
COLLEGE OF ENGINEERING
AND TECHNOLOGY
(AUTONOMOUS INSTITUTION)
Coimbatore - 641032
22CS5252/MACHINE LEARNING LABORATORY
REG.NO :
NAME :
COURSE :
YEAR/SEM:
HINDUSTHAN
COLLEGE OF ENGINEERING AND TECHNOLOGY
(AUTONOMOUS INSTITUTION)
Coimbatore-641032.
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Certified that this is the bonafide record of work done by
in the 22CS5252 / MACHINE LEARNING LABORATORY of this autonomous
institution, for the FIFTH Semester during the Academic Year 2024–2025.
Place : Coimbatore
Date:
Staff In-charge Head of the Department
Register Number:
Submitted for the 22CS5252 / MACHINE LEARNING LABORATORY practical examination
conducted on .
INTERNAL EXAMINER                                EXTERNAL EXAMINER
CONTENTS

S.NO    DATE    EXPERIMENT                                             PAGE NO    MARKS    SIGN

1 a)            Implementation of Basic Python Libraries
                (Math, Numpy, Scipy)
1 b)            Implementation of Python Libraries for Machine
                Learning Applications (Pandas, Matplotlib)
1 c)            Creation and Loading of Datasets
2               Find-S Algorithm for Hypothesis Selection
3               Support Vector Machine (SVM) Decision Boundary
4               Decision Tree Classification using ID3 Algorithm
5               Clustering Using EM (GMM) and k-Means Algorithms
6               k-Nearest Neighbor Classification

STAFF IN-CHARGE
Ex.No: 01 a)
Implementation of Basic Python Libraries (Math, Numpy, Scipy)
Date:
Aim:
To study the basic operations of the Python libraries Math, Numpy and Scipy.
Algorithm:
1. Import the math, numpy and scipy libraries.
2. Use math to compute the square root, factorial and sine of sample values.
3. Create a Numpy array and perform arithmetic, statistical and matrix operations on it.
4. Use Scipy to integrate x^2 numerically and to solve the linear system Ax = b.
5. Print the result of each operation.
Program:
# Importing libraries
import math
import numpy as np
from scipy import integrate
from scipy import linalg
# 1. Using the Math library
print("Math Library Operations:")
print("Square root of 16:", math.sqrt(16))
print("Factorial of 5:", math.factorial(5))
print("Sin(45 degrees):", math.sin(math.radians(45)))
# 2. Using the Numpy library
print("\nNumpy Library Operations:")
# Creating a numpy array
array = np.array([1, 2, 3, 4, 5])
print("Original array:", array)
# Array operations
print("Array after adding 10:", array + 10)
print("Mean of array:", np.mean(array))
print("Standard deviation of array:", np.std(array))
print("Dot product of array with itself:", np.dot(array, array))
# Matrix creation and operations
matrix = np.array([[1, 2], [3, 4]])
print("\nOriginal matrix:\n", matrix)
print("Matrix transpose:\n", np.transpose(matrix))
print("Matrix determinant:", np.linalg.det(matrix))
# 3. Using the Scipy library
print("\nScipy Library Operations:")
# Integration using Scipy
result, error = integrate.quad(lambda x: x**2, 0, 1)
print("Integration of x^2 from 0 to 1:", result)
# Solving a linear system Ax = b
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])
x = linalg.solve(A, b)
print("Solution of linear system Ax = b:", x)
Result:
Ex.No: 01 b) Implementation of Python Libraries for Machine Learning
Applications (Pandas, Matplotlib)
Date:
Aim:
To demonstrate the use of the Pandas and Matplotlib libraries for machine learning applications.
Algorithm:
1. Import the pandas and matplotlib libraries.
2. Create a DataFrame from a dictionary of sample records.
3. Inspect the data types and summary statistics, and filter the rows with Age > 25.
4. Plot a bar chart of Salary by Name and a scatter plot of Salary versus Age.
5. Display the plots.
Program:
# Importing libraries
import pandas as pd
import matplotlib.pyplot as plt
# 1. Using the Pandas Library
print("Pandas Library Operations:")
# Loading a dataset (using a simple dictionary for illustration)
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 29],
    'Salary': [50000, 54000, 45000, 62000, 58000]
}
# Creating a DataFrame
df = pd.DataFrame(data)
print("Dataframe:\n", df)
# Data inspection
print("\nBasic Data Information:")
print("Data types:\n", df.dtypes)
print("Summary statistics:\n", df.describe())
# Filtering data
filtered_df = df[df['Age'] > 25]
print("\nFiltered data (Age > 25):\n", filtered_df)
# 2. Using the Matplotlib Library
print("\nMatplotlib Library Operations:")
# Plotting a bar chart for Age vs. Salary
plt.figure(figsize=(8, 5))
plt.bar(df['Name'], df['Salary'], color='skyblue')
plt.xlabel('Name')
plt.ylabel('Salary')
plt.title('Salary by Person')
plt.show()
# Plotting Age vs Salary as a scatter plot
plt.figure(figsize=(8, 5))
plt.scatter(df['Age'], df['Salary'], color='green')
plt.xlabel('Age')
plt.ylabel('Salary')
plt.title('Salary vs Age')
plt.show()
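If the plots are to be attached to the record, a figure can also be written to disk instead of only shown on screen. A minimal sketch (the file name salary_by_person.png is illustrative, not part of the original program):
# Optional: save the bar chart to an image file for the record
plt.figure(figsize=(8, 5))
plt.bar(df['Name'], df['Salary'], color='skyblue')
plt.title('Salary by Person')
plt.savefig('salary_by_person.png', dpi=150, bbox_inches='tight')
plt.close()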
Result:
Ex.No:01 c)
Creation and Loading of Datasets
Date:
Aim:
To create synthetic datasets and to load datasets from CSV files and built-in library sources.
Algorithm:
1. Generate a random synthetic dataset with Numpy.
2. Generate a synthetic classification dataset with Scikit-Learn's make_classification.
3. Create a DataFrame with Pandas, save it to a CSV file and read it back.
4. Load the built-in Iris dataset with Scikit-Learn.
5. Load the built-in 'tips' dataset with Seaborn and print the first rows of each dataset.
Program:
# Importing libraries
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification, load_iris
import seaborn as sns
# 1. Creating a Synthetic Dataset Using Numpy
print("Synthetic Dataset with Numpy:")
# Generating random data with Numpy
np.random.seed(0) # For reproducibility
synthetic_data = np.random.rand(10, 3) # 10 rows, 3 columns
print(synthetic_data)
# 2. Creating a Synthetic Classification Dataset Using Scikit-Learn
print("\nSynthetic Classification Dataset with Scikit-Learn:")
# Generating a classification dataset with Scikit-Learn
X, y = make_classification(n_samples=100, n_features=4, n_classes=2, random_state=0)
print("Features:\n", X[:5])
print("Labels:\n", y[:5])
# 3. Loading a Dataset from a CSV File Using Pandas
print("\nLoading Dataset from CSV:")
# Sample data in dictionary form for example purposes
sample_data = {
    'A': np.random.rand(5),
    'B': np.random.rand(5),
    'C': np.random.rand(5)
}
# Creating a DataFrame and saving it to a CSV
df = pd.DataFrame(sample_data)
df.to_csv('sample_data.csv', index=False)
# Reading the CSV back into a DataFrame
df_loaded = pd.read_csv('sample_data.csv')
print(df_loaded)
# 4. Loading a Built-in Dataset Using Scikit-Learn
print("\nLoading Built-in Dataset with Scikit-Learn:")
# Load the Iris dataset
iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
print(iris_df.head())
# 5. Loading a Built-in Dataset Using Seaborn
print("\nLoading Built-in Dataset with Seaborn:")
# Load the 'tips' dataset from Seaborn
tips = sns.load_dataset('tips')
print(tips.head())
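A loaded dataset is usually split before any model is trained. A minimal sketch using the Iris data loaded above (the 80/20 split ratio and random_state are assumptions):
# Optional: split the Iris data into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)
print("\nTraining set size:", X_train.shape, "Testing set size:", X_test.shape)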
Result:
Ex.No:02
Find-S Algorithm for Hypothesis Selection
Date:
Aim:
To implement the Find-S algorithm and obtain the most specific hypothesis consistent with the positive training examples in a CSV file.
Algorithm:
1. Read the training examples from the CSV file.
2. Initialize the hypothesis with the attribute values of the first positive example.
3. For every further positive example, replace each attribute that disagrees with the hypothesis by the general value '?'.
4. Ignore the negative examples.
5. Print the resulting most specific hypothesis.
Program:
Let's start by creating a sample CSV file with training data. The CSV file should contain rows with attribute
values and the class label (e.g., "Yes" or "No").
Sample CSV File (training_data.csv)
Outlook   Temperature   Humidity   Wind     Water   Forecast   Play
Sunny     Warm          Normal     Strong   Warm    Same       Yes
Sunny     Warm          High       Strong   Warm    Same       Yes
Rainy     Cold          High       Strong   Warm    Change     No
Sunny     Warm          High       Strong   Cool    Change     Yes
Save this data as training_data.csv.
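If you prefer to generate the file from code instead of creating it by hand, here is a minimal one-time helper (a sketch; it simply writes the rows shown above with Python's csv module):
import csv

# One-time helper: write the sample training data to training_data.csv
rows = [
    ['Outlook', 'Temperature', 'Humidity', 'Wind', 'Water', 'Forecast', 'Play'],
    ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'Yes'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'Yes'],
    ['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'No'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'Yes'],
]
with open('training_data.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)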
import csv

# Step 1: Read the CSV file and return its rows
def read_csv(file_path):
    data = []
    with open(file_path, mode='r') as file:
        reader = csv.reader(file)
        for row in reader:
            data.append(row)
    return data

# Step 2: Find-S Algorithm Implementation
def find_s_algorithm(data):
    # Step 2.1: Initialize the hypothesis with the first positive example
    hypothesis = None
    for row in data:
        if row[-1] == 'Yes':
            hypothesis = row[:-1]  # all attributes except the last (target class)
            break
    if hypothesis is None:
        return None  # no positive examples in the data
    # Step 2.2: Iterate through the training data
    for row in data:
        if row[-1] == 'Yes':  # only consider positive instances (class 'Yes')
            for i in range(len(hypothesis)):
                # Generalize the hypothesis if needed
                if hypothesis[i] != row[i]:
                    hypothesis[i] = '?'  # '?' denotes any value (generalized)
    return hypothesis

# Step 3: Main function to run the process
def main():
    file_path = 'training_data.csv'  # Path to your CSV file
    data = read_csv(file_path)[1:]  # skip the header row
    # Step 3.1: Apply the Find-S algorithm
    hypothesis = find_s_algorithm(data)
    # Step 3.2: Print the most specific hypothesis
    print("Most Specific Hypothesis:", hypothesis)

if __name__ == "__main__":
    main()
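For the sample data above, the three positive examples agree on Outlook = Sunny, Temperature = Warm and Wind = Strong but differ in the remaining attributes, so Find-S should converge to the hypothesis ['Sunny', 'Warm', '?', 'Strong', '?', '?'].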
Output:
Result:
Ex.No:3
Support Vector Machine (SVM) Decision Boundary
Date:
Aim:
To train a Support Vector Machine on the Iris dataset and visualize its decision boundary and support vectors.
Algorithm:
1. Load the Iris dataset and keep the first two features for 2D visualization.
2. Split the data into training and testing sets and standardize the features.
3. Train an SVM classifier with a linear kernel on the training data.
4. Predict the class of every point on a mesh grid over the standardized feature space and plot the decision regions.
5. Overlay the training points and the support vectors, then label and display the plot.
Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Step 1: Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2] # Use only the first two features for 2D visualization (sepal length and sepal width)
y = iris.target
# Step 2: Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 3: Standardize the data (important for SVM to perform well)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Step 4: Train the SVM classifier (use a linear kernel for simplicity)
svm = SVC(kernel='linear', random_state=42)
svm.fit(X_train, y_train)
# Step 5: Plot the decision boundary
# Create a mesh grid over the standardized feature space
# (built from X_train, since the SVM was trained on standardized data)
x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 500), np.linspace(y_min, y_max, 500))
# Predict the class labels for each point in the grid
Z = svm.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
# Plot the decision boundary
plt.contourf(xx, yy, Z, alpha=0.75, cmap=plt.cm.coolwarm)
# Step 6: Plot the training points
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, edgecolors='k', marker='o', s=100, cmap=plt.cm.coolwarm)
# Plot support vectors
plt.scatter(svm.support_vectors_[:, 0], svm.support_vectors_[:, 1], s=200,
            facecolors='none', edgecolors='k', linewidth=2, label='Support Vectors')
# Step 7: Set labels and title
plt.title("SVM Decision Boundary and Support Vectors (Linear Kernel)")
plt.xlabel('Sepal Length (standardized)')
plt.ylabel('Sepal Width (standardized)')
plt.legend()
plt.show()
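Since the data was already split into training and testing sets, the held-out accuracy can also be reported. A minimal sketch (not part of the original program) using sklearn.metrics.accuracy_score:
# Optional: evaluate the trained SVM on the held-out test set
from sklearn.metrics import accuracy_score
y_pred = svm.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))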
Result:
Ex.No:4
Decision Tree Classification using ID3 Algorithm
Date:
Aim:
To build a decision tree classifier using the ID3 criterion (entropy) and classify a new sample.
Algorithm:
1. Create a DataFrame of Weather, Temperature and PlayTennis records.
2. One-hot encode the categorical attributes with pd.get_dummies.
3. Train a DecisionTreeClassifier with criterion='entropy' (information gain, as in ID3).
4. Visualize the learned tree.
5. Encode a new sample in the same way and print its predicted class.
Program:
# Importing necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
# Define the dataset
data = {
'Weather': ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Sunny', 'Overcast', 'Rainy', 'Sunny'],
'Temperature': ['Hot', 'Mild', 'Mild', 'Cool', 'Mild', 'Cool', 'Hot', 'Mild', 'Mild'],
'PlayTennis': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes']
}
# Convert to DataFrame
df = pd.DataFrame(data)
# Encode the categorical variables (Weather, Temperature, PlayTennis)
df_encoded = pd.get_dummies(df)
# Define features and target
X = df_encoded[['Weather_Sunny', 'Weather_Overcast', 'Weather_Rainy',
                'Temperature_Hot', 'Temperature_Mild', 'Temperature_Cool']]
y = df_encoded['PlayTennis_Yes']
# Initialize and train the decision tree classifier (ID3)
clf = DecisionTreeClassifier(criterion='entropy')
clf.fit(X, y)
# Visualize the learned tree
plt.figure(figsize=(10, 6))
tree.plot_tree(clf, filled=True, feature_names=X.columns.tolist(), class_names=['No', 'Yes'], rounded=True)
plt.show()
# Classify a new sample
new_sample = pd.DataFrame({
'Weather_Sunny': [1], 'Weather_Overcast': [0], 'Weather_Rainy': [0],
'Temperature_Hot': [0], 'Temperature_Mild': [1], 'Temperature_Cool': [0]
})
prediction = clf.predict(new_sample)
print("Predicted class for the new sample:" ,{prediction[0]})
Result:
Ex.No:05
Clustering Using EM (GMM) and k-Means Algorithms
Date:
Aim:
To cluster a synthetic 2D dataset with k-Means and with a Gaussian Mixture Model fitted by Expectation-Maximization (EM), and to compare the results.
Algorithm:
1. Generate three Gaussian clusters of 2D points and stack them into one dataset.
2. Fit a KMeans model with 3 clusters and record its labels.
3. Fit a GaussianMixture model with 3 components and record its labels.
4. Plot the two clusterings side by side, marking the k-Means centroids.
5. Print the k-Means cluster centers and the GMM component means.
Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
# Generate synthetic data with 2D features
np.random.seed(42)
# Create 3 clusters of data points with different means and covariances
data1 = np.random.randn(300, 2) + [5, 5]
data2 = np.random.randn(300, 2) + [-5, -5]
data3 = np.random.randn(300, 2) + [5, -5]
# Combine the data into a single dataset
X = np.vstack([data1, data2, data3])
# Create K-Means model with 3 clusters
kmeans = KMeans(n_clusters=3, random_state=0)
kmeans_labels = kmeans.fit_predict(X)
# Create Gaussian Mixture Model (GMM) with 3 components
gmm = GaussianMixture(n_components=3, random_state=0)
gmm_labels = gmm.fit_predict(X)
# Plotting the results
fig, axs = plt.subplots(1, 2, figsize=(12, 6))
# Plot K-Means clustering
axs[0].scatter(X[:, 0], X[:, 1], c=kmeans_labels, cmap='viridis', marker='.')
axs[0].scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=200, c='red', marker='X')
axs[0].set_title("K-Means Clustering")
# Plot GMM clustering
axs[1].scatter(X[:, 0], X[:, 1], c=gmm_labels, cmap='viridis', marker='.')
axs[1].set_title("GMM Clustering (EM)")
# Show the plots
plt.tight_layout()
plt.show()
# Optional: print cluster centers for K-Means and GMM component means
print("K-Means Cluster Centers:")
print(kmeans.cluster_centers_)
print("\nGMM Component Means:")
print(gmm.means_)
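To compare the two clusterings quantitatively, the silhouette score (higher is better) can be computed for each labeling. A minimal sketch; the choice of metric is an assumption, not part of the original experiment:
# Optional: compare the clusterings with the silhouette score
from sklearn.metrics import silhouette_score
print("\nSilhouette score (K-Means):", silhouette_score(X, kmeans_labels))
print("Silhouette score (GMM):", silhouette_score(X, gmm_labels))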
Result:
Ex.No:06
k-Nearest Neighbor Classification
Date:
Aim:
To classify Iris flowers with the k-Nearest Neighbor algorithm and report the correct predictions, the incorrect predictions and the overall accuracy.
Algorithm:
1. Load the Iris dataset and split it into training and testing sets.
2. Train a KNeighborsClassifier with k = 3 on the training set.
3. Predict the class of every test sample.
4. Separate correct and incorrect predictions and print each with its features, actual class and predicted class.
5. Compute and print the accuracy on the test set.
Program:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Step 1: Load the Iris dataset
iris = load_iris()
X = iris.data # Features (sepal length, sepal width, petal length, petal width)
y = iris.target # Labels (species of iris)
# Step 2: Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 3: Initialize and train the k-NN classifier (k=3 in this case)
k=3
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)
# Step 4: Make predictions on the test set
y_pred = knn.predict(X_test)
# Step 5: Compare predictions with actual labels and print results
correct_predictions = []
incorrect_predictions = []
for i in range(len(y_test)):
    if y_pred[i] == y_test[i]:
        correct_predictions.append((X_test[i], y_test[i], y_pred[i]))
    else:
        incorrect_predictions.append((X_test[i], y_test[i], y_pred[i]))
# Step 6: Print correct predictions
print("Correct Predictions:")
for x, actual, predicted in correct_predictions:
    print(f"Features: {x}, Actual: {iris.target_names[actual]}, Predicted: {iris.target_names[predicted]}")
# Step 7: Print incorrect predictions
print("\nIncorrect Predictions:")
for x, actual, predicted in incorrect_predictions:
    print(f"Features: {x}, Actual: {iris.target_names[actual]}, Predicted: {iris.target_names[predicted]}")
# Step 8: Calculate and print the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"\nAccuracy: {accuracy * 100:.2f}%")
Result: