Machine learning 3170724
LAB MANUAL
MACHINE LEARNING
Subject Code: 3170724
Prepared By:
Mahavir Swami College of Engineering and Technology, Surat
Practical List (Academic year : 2022-23)
Subject: Machine Learning (3170724)
Sem: 7th
Department: C.S.E
Faculty Name: Darsha Chauhan
Guide: Darsha Chauhan
MAHAVIR SWAMI COLLEGE OF
ENGINEERING & TECHNOLOGY, SURAT
CERTIFICATE
This is to certify that Mr. / Ms. ____________________________________
of class Computer Science & Engineering, 7th semester, Enrollment No.
__________________ has satisfactorily submitted his / her term work in the
subject Machine Learning (Sub. Code: 3170724) for the term ending in
___________.
Date:
Sign of teacher Sign of Head of department
Index
Sr. No.  Date        Practical                                                              Sign
1        17/06/2021  Write a program to implement mean, median and mode
2        01/07/2021  Write a program to implement a data distribution histogram
3        08/07/2021  Write a program to implement a scatter plot using the given dataset
4        15/07/2021  Write a program to implement linear regression from the given dataset
5        29/07/2021  Write a program to implement feature scaling
6        12/08/2021  Write a program to implement training and testing from the given dataset
7        02/09/2021  Write a program to implement a decision tree from the given dataset
8        16/09/2021  Write a program to implement the K-Nearest Neighbors algorithm from the given dataset
9        23/09/2021  Write a program to implement K-Means clustering from the given dataset
10       07/10/2021  Write a program to implement hierarchical clustering from the given dataset
Practical 1: Write a program to implement mean, median and mode
Code:
import numpy as np

# Mean of the integers 1..32
v1 = np.arange(1, 33)
print(v1)
print('----------------------------------')
v2 = np.mean(v1)
print(v2)
print('----------------------------------')

# Median of the integers 1..10
v4 = np.arange(1, 11)
v3 = np.median(v4)
print(v3)
print('----------------------------------')

# Further sample ranges (not used in the rest of this listing)
v5 = np.arange(1, 11)
v6 = np.arange(2, 22)
Gill Prabhdeep Singh(191110107013) Page 1
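# Mean of 1..10 computed by hand: accumulate the total, then divide by the count.
# The name "total" avoids shadowing the built-in sum().
total = 0
for i in range(1, 11):
    total = total + i
print(total)
mean = total / 10   # 10 values in range(1, 11)
print(mean)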
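The listings in this practical compute the mean and the median but never the mode. The following is a minimal sketch of all three statistics using Python's standard `statistics` module. The list `marks` is hypothetical sample data and is not part of the original practical.

```python
import statistics

# Hypothetical sample data (an assumption, not from the manual).
marks = [4, 7, 2, 7, 9, 7, 4]

mean_value = statistics.mean(marks)      # arithmetic average
median_value = statistics.median(marks)  # middle value of the sorted data
mode_value = statistics.mode(marks)      # most frequently occurring value

print("mean   =", mean_value)
print("median =", median_value)
print("mode   =", mode_value)
```

The mode is the only one of the three that depends on how often a value repeats, which is why a range such as 1..10 (all values unique) is not a useful test input for it.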
Output:
Practical 2: Write a program to implement a data distribution histogram from the given dataset
Code:
import numpy
import matplotlib.pyplot as plt
x = numpy.random.normal(5.0, 1.0, 2000000)
plt.hist(x, 100)
plt.show()
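To see the numbers behind the plotted bars, the bin counts can be computed directly. This is a hedged sketch using `numpy.histogram`; the seed and the smaller sample size of 10,000 are assumptions chosen only to keep the example quick and repeatable.

```python
import numpy as np

# Fixed seed and reduced sample size are assumptions for repeatability.
rng = np.random.default_rng(0)
x = rng.normal(5.0, 1.0, 10_000)   # mean 5.0, standard deviation 1.0

# counts[i] is the number of samples falling between bin_edges[i] and bin_edges[i + 1]
counts, bin_edges = np.histogram(x, bins=100)

print("number of bins :", len(counts))
print("number of edges:", len(bin_edges))
print("total samples  :", counts.sum())
```

With 100 bins there are 101 edges, and the counts add up to the sample size because the default bin range spans the minimum and maximum of the data.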
Output:
Practical 3: Write a program to implement a scatter plot using the given dataset
Code:
import numpy
import matplotlib.pyplot as plt
x = numpy.random.normal(6.0, 1.0, 200)
y = numpy.random.normal(10.0, 2.0, 200)
plt.scatter(x, y)
plt.show()
Output:
Practical 4: Write a program to implement linear regression from the given dataset
Code:
from tkinter import *

def select():
    # Show the slider's current value in the label
    sel = "Value = " + str(v.get())
    label.config(text = sel)

top = Tk()
top.geometry("200x100")
v = DoubleVar()
# Horizontal slider ranging from 1 to 100, bound to the variable v
scale = Scale( top, variable = v, from_ = 1, to = 100, orient = HORIZONTAL)
scale.pack(anchor=CENTER)
btn = Button(top, text="Value", command=select)
btn.pack(anchor=CENTER)
label = Label(top)
label.pack()
top.mainloop()
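The listing above builds a Tkinter slider (a `Scale` widget) and does not fit a regression model. As a hedged sketch of what a linear regression program for this practical could look like, the example below fits a straight line by ordinary least squares using the closed-form slope and intercept. The data points and the helper name `fit_line` are hypothetical and are not taken from the manual.

```python
# Hypothetical data (an assumption): the points lie exactly on y = 2x + 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common factor 1/n cancels)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(x, y)
prediction = slope * 6.0 + intercept   # predicted y for a new x = 6

print("slope     =", slope)
print("intercept =", intercept)
print("y(6)      =", prediction)
```

On real data the points will not lie exactly on a line, and the same formulas then give the line that minimises the sum of squared vertical errors.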
Output:
Practical 5: Write a program to implement feature scaling on the given dataset
Code:
import pandas
from sklearn import linear_model
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
df = pandas.read_csv("cars2.csv")
X = df[['Weight', 'Volume']]
scaledX = scale.fit_transform(X)
print(scaledX)
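`StandardScaler` standardises each column to z = (x - mean) / standard deviation, using the population standard deviation. The sketch below reproduces that calculation by hand with NumPy. The small array `X` is hypothetical and stands in for the Weight and Volume columns, since `cars2.csv` is not included here.

```python
import numpy as np

# Hypothetical weight (kg) and volume (litres) values; not read from cars2.csv.
X = np.array([[790.0, 1.0],
              [1160.0, 1.2],
              [929.0, 1.0],
              [865.0, 0.9]])

mean = X.mean(axis=0)        # per-column mean
std = X.std(axis=0)          # per-column population standard deviation (ddof=0)
scaled = (X - mean) / std    # standardised values

print(scaled)
print("column means:", scaled.mean(axis=0))
print("column stds :", scaled.std(axis=0))
```

After scaling, every column has mean 0 and standard deviation 1 (up to floating-point rounding), so features measured in different units become comparable.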
Output:
Practical 6: Write a program to implement training and testing from the given dataset
Code:
1. Training set: 80% from original dataset by random selection:
Testing set: 20% from original dataset by random selection:
import numpy
import matplotlib.pyplot as plt
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
myline = numpy.linspace(0, 6, 100)
plt.scatter(train_x, train_y)
plt.plot(myline, mymodel(myline))
plt.show()
Output:
2. Training set: 20% from original dataset by random selection:
Testing set: 80% from original dataset by random selection:
import numpy
import matplotlib.pyplot as plt
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
train_x = x[:20]
train_y = y[:20]
test_x = x[20:]
test_y = y[20:]
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
myline = numpy.linspace(0, 6, 100)
plt.scatter(train_x, train_y)
plt.plot(myline, mymodel(myline))
plt.show()
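Both listings in this practical fit and plot the model on the training data only; the testing set is created but never used. A hedged sketch of how the held-out data could be used is shown below: it computes the R-squared score on the training and testing portions of an 80/20 split. The helper `r_squared` is written here for illustration and is not part of the original program.

```python
import numpy as np

np.random.seed(2)
x = np.random.normal(3, 1, 100)
y = np.random.normal(150, 40, 100) / x

# 80/20 split of the randomly generated data
train_x, train_y = x[:80], y[:80]
test_x, test_y = x[80:], y[80:]

# Degree-4 polynomial fitted on the training data only
model = np.poly1d(np.polyfit(train_x, train_y, 4))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

r2_train = r_squared(train_y, model(train_x))
r2_test = r_squared(test_y, model(test_x))

print("R^2 on training data:", r2_train)
print("R^2 on testing data :", r2_test)
```

A testing score far below the training score suggests the model has overfitted the training points; comparing the two is the purpose of holding data back.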
Output:
Practical 7: Write a program to implement a decision tree from the given dataset
Code:
# DecisionTree is a local helper module that is not reproduced in this manual
from DecisionTree import *
import pandas as pd
from sklearn import model_selection

df = pd.read_csv('data_set/Social_Network_Ads.csv')
header = list(df.columns)
lst = df.values.tolist()

# Hold back 20% of the rows for testing
trainDF, testDF = model_selection.train_test_split(lst, test_size=0.2)
t = build_tree(trainDF, header)

print("\nLeaf nodes ****************")
leaves = getLeafNodes(t)
for leaf in leaves:
    print("id = " + str(leaf.id) + " depth =" + str(leaf.depth))

print("\nNon-leaf nodes ****************")
innerNodes = getInnerNodes(t)
for inner in innerNodes:
    print("id = " + str(inner.id) + " depth =" + str(inner.depth))

maxAccuracy = computeAccuracy(testDF, t)
print("\nTree before pruning with accuracy: " + str(maxAccuracy*100) + "\n")
print_tree(t)

# Try pruning each inner node (except the root) and keep the best one
nodeIdToPrune = -1
for node in innerNodes:
    if node.id != 0:
        prune_tree(t, [node.id])
        currentAccuracy = computeAccuracy(testDF, t)
        print("Pruned node_id: " + str(node.id) + " to achieve accuracy: " +
              str(currentAccuracy*100) + "%")
        if currentAccuracy > maxAccuracy:
            maxAccuracy = currentAccuracy
            nodeIdToPrune = node.id
        # Rebuild the unpruned tree before trying the next node
        t = build_tree(trainDF, header)
        if maxAccuracy == 1:
            break

if nodeIdToPrune != -1:
    t = build_tree(trainDF, header)
    prune_tree(t, [nodeIdToPrune])
    print("\nFinal node Id to prune (for max accuracy): " + str(nodeIdToPrune))
else:
    t = build_tree(trainDF, header)
    print("\nPruning strategy didn't increase accuracy")

print("\n********************************************************************")
print("*********** Final Tree with accuracy: " + str(maxAccuracy*100) + "% ************")
print("********************************************************************\n")
print_tree(t)
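The listing relies on a separate `DecisionTree` module, so the way splits are chosen is not visible here. As a hedged illustration only, the sketch below shows Gini impurity and the impurity reduction of a split, a criterion commonly used when building decision trees. Whether the `DecisionTree` module uses this criterion is an assumption; the function names `gini` and `split_gain` and the label lists are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_gain(parent, left, right):
    """Impurity reduction obtained by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

# Hypothetical class labels (0 = not purchased, 1 = purchased)
parent = [0, 0, 1, 1]
left, right = [0, 0], [1, 1]

parent_impurity = gini(parent)
gain = split_gain(parent, left, right)

print("parent impurity:", parent_impurity)
print("gain of split  :", gain)
```

A split that separates the classes perfectly, as in this example, removes all of the parent's impurity; a tree builder would prefer the candidate split with the largest gain.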
Output:
Practical 8: K-Nearest Neighbors Algorithm
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
# Assign column names to the dataset
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
# Read dataset to pandas dataframe
dataset = pd.read_csv(url, names=names)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
error = []
# Calculate the error rate for K values from 1 to 39
for i in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    error.append(np.mean(pred_i != y_test))
plt.figure(figsize=(12, 6))
plt.plot(range(1, 40), error, color='red', linestyle='dashed', marker='o',
         markerfacecolor='blue', markersize=10)
plt.title('Error Rate vs. K Value')
plt.xlabel('K Value')
plt.ylabel('Mean Error')
plt.show()
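To make clear what the classifier does internally, here is a minimal from-scratch sketch of K-nearest-neighbour prediction: measure the Euclidean distance from the query to every training point, take the K closest, and return the majority label. The function name `knn_predict` and the sample points are hypothetical; this is an illustration and not the scikit-learn implementation.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Predict the label of query by majority vote among its k nearest points."""
    # Pair each training point's distance to the query with its label, nearest first
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-D training data forming two well-separated groups
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["A", "A", "A", "B", "B", "B"]

first = knn_predict(train_points, train_labels, (1.1, 1.0), k=3)
second = knn_predict(train_points, train_labels, (5.0, 5.1), k=3)

print(first, second)
```

Because the prediction depends on raw distances, features on larger scales dominate the vote, which is why the listing standardises the data before fitting.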
Output:
Practical 9: Write a program to implement K-Means clustering from the given dataset
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('Mall_Customers.csv')
X = dataset.iloc[:, [3, 4]].values
from sklearn.cluster import KMeans
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
plt.plot(range(1, 11), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
kmeans = KMeans(n_clusters = 5, init = 'k-means++', random_state = 42)
y_kmeans = kmeans.fit_predict(X)
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s = 100, c = 'red', label = 'Cluster 1')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s = 100, c = 'blue', label = 'Cluster 2')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s = 100, c = 'green', label = 'Cluster 3')
plt.scatter(X[y_kmeans == 3, 0], X[y_kmeans == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
plt.scatter(X[y_kmeans == 4, 0], X[y_kmeans == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 300, c = 'yellow',
label = 'Centroids')
plt.title('Clusters of customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()
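The WCSS value that the elbow plot uses (`inertia_`) is the sum of squared distances from each point to its closest cluster centre. The sketch below runs the basic assign-and-update loop of K-Means by hand with NumPy and then computes that sum. The six points and the fixed starting centroids are hypothetical, chosen so that no cluster becomes empty; `KMeans` itself uses k-means++ initialisation rather than fixed starts.

```python
import numpy as np

# Hypothetical 2-D points forming two obvious groups
X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
              [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]])
# Fixed starting centroids (an assumption made for repeatability)
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

for _ in range(10):
    # Assignment step: distance of every point to every centroid, shape (6, 2)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: move each centroid to the mean of its assigned points
    new_centroids = np.array([X[labels == j].mean(axis=0)
                              for j in range(len(centroids))])
    if np.allclose(new_centroids, centroids):
        break   # converged: centroids no longer move
    centroids = new_centroids

# Within-cluster sum of squares for the final assignment
wcss = float(((X - centroids[labels]) ** 2).sum())

print("labels   :", labels)
print("centroids:", centroids)
print("WCSS     :", wcss)
```

Adding more clusters can only lower this sum, which is why the elbow method looks for the point where the decrease levels off rather than for the minimum.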
Output:
Practical 10: Write a program to implement hierarchical clustering from the given dataset
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('Mall_Customers.csv')
X = dataset.iloc[:, [3, 4]].values
import scipy.cluster.hierarchy as sch
dendrogram = sch.dendrogram(sch.linkage(X, method = 'ward'))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean distances')
plt.show()
from sklearn.cluster import AgglomerativeClustering
# Ward linkage always uses Euclidean distance; the 'affinity' argument is not accepted by newer scikit-learn releases
hc = AgglomerativeClustering(n_clusters = 5, linkage = 'ward')
y_hc = hc.fit_predict(X)
plt.scatter(X[y_hc == 0, 0], X[y_hc == 0, 1], s = 100, c = 'red', label = 'Cluster 1')
plt.scatter(X[y_hc == 1, 0], X[y_hc == 1, 1], s = 100, c = 'blue', label = 'Cluster 2')
plt.scatter(X[y_hc == 2, 0], X[y_hc == 2, 1], s = 100, c = 'green', label = 'Cluster 3')
plt.scatter(X[y_hc == 3, 0], X[y_hc == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
plt.scatter(X[y_hc == 4, 0], X[y_hc == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
plt.title('Clusters of customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()
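Agglomerative clustering starts with every point in its own cluster and repeatedly merges the two closest clusters until the requested number remains. The sketch below shows that merge loop in plain Python. Note that it uses single linkage (distance between the two closest members) for simplicity, not the Ward linkage used in the listing above; the function name `single_linkage` and the points are hypothetical.

```python
import math

def single_linkage(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]   # each point starts alone
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: smallest distance between any pair of members
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]    # merge b into a
        del clusters[b]
    return clusters

# Hypothetical 2-D points: two close pairs and one isolated point
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.5, 5.0), (9.0, 1.0)]
result = single_linkage(points, n_clusters=3)

print(result)
```

The returned lists hold point indices. The order of the merges is what a dendrogram draws, with the merge distance on the vertical axis.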
Output: