ML Manual
DHIRAJLAL GANDHI COLLEGE OF TECHNOLOGY
Salem Airport (Opp.), Salem – 636 309. Ph. (04290) 233333,
www.dgct.ac.in
BONAFIDE CERTIFICATE
Name : …………………………………………………………
Degree : …………………………………………………………
Branch : …………………………………………………………
Certified that this is the bonafide record of the work done by the above student in
…………………………………………………………………………………………………
LAB MANNERS
Students must be present in proper dress code and wear the ID card.
Students should enter the log-in and log-out time in the log register without fail.
Students are not allowed to download pictures, music, videos or files without the permission
of respective lab in-charge.
Students should wear their own lab coats and bring observation note books to the laboratory
classes regularly.
Record of experiments done in a particular class should be submitted in the next lab class.
Students who do not submit the record note book in time will not be allowed to do the next
experiment and will not be given attendance for that laboratory class.
Students will not be allowed to leave the laboratory until they complete the
experiment.
Students are advised to switch-off the Monitors and CPU when they leave the lab.
Students are advised to arrange the chairs properly when they leave the lab.
College
Vision
To improve the quality of human life through multi-disciplinary programs in
Engineering, Architecture and Management that are internationally recognized and
would facilitate research work to incorporate social, economic and environmental
development.
Mission
To create a vibrant atmosphere that creates competent engineers, innovators, scientists,
entrepreneurs, academicians and thinkers of tomorrow.
To establish centers of excellence that provide sustainable solutions to industry and
society. To enhance capability through various value-added programs so as to meet
the challenges of dynamically changing global needs.
Department
Vision
To cultivate creative, globally competent, employable and disciplined computing
professionals with the spirit of benchmarking educational system that promotes
academic excellence, scientific pursuits, entrepreneurship and professionalism.
Mission
To develop the creators of tomorrow’s technology to meet the social needs of
our nation.
To promote and encourage the strength of research in Engineering, Science
and Technology.
To bridge the gap between Academia, Industry and Society.
Program Educational Objectives (PEOs)
PEO To provide students with an academic environment conducive to life-long
learning needed for a successful professional career.
Program Outcomes (POs)
PO1 To apply knowledge of mathematics, science, engineering fundamentals and
computer science theory to solve complex problems in Computer Science and
Engineering.
PO2 To analyze problems, identify and define the solutions using basic principles
of mathematics, science, technology and computer engineering.
PO3 To design, implement, and evaluate computer-based systems, processes,
components, or software to meet realistic constraints for public health and
safety, and cultural, societal and environmental considerations.
PO4 To design and conduct experiments, perform analysis and interpretation, and
provide valid conclusions with the use of research-based knowledge and research
methodologies related to Computer Science and Engineering.
PO5 To propose innovative original ideas and solutions, culminating in
modern engineering products for a large section of society with longevity.
PO6 To apply the understanding of legal, health, security, cultural and social issues,
and thereby one's responsibility in their application in professional
engineering practices.
PO7 To understand the impact of professional engineering solutions on societal and
environmental issues, and the need for sustainable development.
PO8 To demonstrate integrity, ethical behavior and commitment to the code of conduct
of professional practices and standards to adapt to the technological developments of
a revolutionary world.
PO9 To function effectively as an individual, and as a member or leader in diverse teams
and in multifaceted environments.
PO10 To communicate effectively with end users, with effective presentations, and
write comprehensible technical reports and publications representing
efficient engineering solutions.
PO11 To understand engineering and management principles and their
applications to manage projects to suit the current needs of
multidisciplinary industries.
PO12 To learn and invent new technologies, and use them effectively
towards continuous professional development throughout life.
Program Specific Outcomes (PSOs)
PSO1 Graduates with an interest in, and aptitude for, advanced studies in computing
will have completed, or be actively pursuing, graduate studies in computing.
PSO2 Graduates will be informed and involved members of their communities, and
responsible engineering and computing professionals.
Course Outcomes (COs)
CO1 Analyze the efficiency of algorithms using various frameworks
CO2 Apply graph algorithms to solve problems and analyze their efficiency.
CO3 Make use of algorithm design techniques like divide and conquer, dynamic
programming and greedy techniques to solve problems
CO4 Use the state space tree method for solving problems.
CS3401 ALGORITHMS    L T P C
                     3 0 2 4
1. Implement Linear Search. Determine the time required to search for an element. Repeat the experiment for
different values of n, the number of elements in the list to be searched and plot a graph of the time taken versus
n.
2. Implement recursive Binary Search. Determine the time required to search an element. Repeat the
experiment for different values of n, the number of elements in the list to be searched and plot a graph of the
time taken versus n.
3. Given a text txt [0...n-1] and a pattern pat [0...m-1], write a function search (char pat [ ], char txt [ ]) that prints
all occurrences of pat [ ] in txt [ ]. You may assume that n > m.
4. Sort a given set of elements using the Insertion sort and Heap sort methods and determine the time required
to sort the elements. Repeat the experiment for different values of n, the number of elements in the list to be
sorted and plot a graph of the time taken versus n.
Graph Algorithms
1. Develop a program to implement graph traversal using Breadth First Search
2. Develop a program to implement graph traversal using Depth First Search
3. From a given vertex in a weighted connected graph, develop a program to find the shortest paths to other
vertices using Dijkstra’s algorithm.
4. Find the minimum cost spanning tree of a given undirected graph using Prim’s algorithm.
5. Implement Floyd’s algorithm for the All-Pairs- Shortest-Paths problem.
6. Compute the transitive closure of a given directed graph using Warshall's algorithm.
to sort. Repeat the experiment for different values of n, the number of elements in the list to be sorted and plot
a graph of the time taken versus n.
COURSE OUTCOMES: At the end of this course, the students will be able to:
TOTAL 30 HRS
S. No.   Date   Name of the Experiment   Page No.   Date of Completion   Marks Awarded   Staff Signature   Remarks
1.   Perform a case study by installing and exploring various types of operating systems on a physical or logical (virtual) machine. (Linux Installation).
S. No.   Date   Name of the Experiment   Page No.   Date of Completion   Marks Awarded   Staff Signature   Remarks
a) First fit
b) Worst fit
c) Best fit
S. No.   Date   Name of the Experiment   Page No.   Date of Completion   Marks Awarded   Staff Signature   Remarks
RECORD COMPLETION DATE:          AVERAGE MARKS SCORED:
LAB-IN-CHARGE:
1. Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data
from a .CSV file.
Aim:
The aim of this program is to learn the most specific hypothesis from the given training data.
Procedure:
Step-by-step procedure to implement and demonstrate the FIND-S algorithm for finding
the most specific hypothesis based on a given set of training data samples from a CSV file:
Read the training data from the CSV file. Each row represents a training example, and the last
column represents the class label.
Initialize the most specific hypothesis with the first positive training example.
For each subsequent positive example, replace every attribute value in the hypothesis that
differs from the example with '?'. Negative examples are ignored.
After iterating through all training examples, output the final most specific hypothesis.
Program:
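import csv

def find_s_algorithm(training_data):
    num_attributes = len(training_data[0]) - 1
    hypothesis = ['0'] * num_attributes  # Initialize the hypothesis to the most specific hypothesis
    for instance in training_data:
        if instance[-1] != 'Yes':  # FIND-S ignores negative examples
            continue
        for i in range(num_attributes):
            if hypothesis[i] == '0':  # First positive example: copy its value
                hypothesis[i] = instance[i]
            elif hypothesis[i] != instance[i]:  # Mismatch: generalize to '?'
                hypothesis[i] = '?'
    return hypothesis

def read_training_data_from_csv(file_name):
    training_data = []
    with open(file_name, 'r', newline='') as file:
        csv_reader = csv.reader(file)
        next(csv_reader)  # Skip the header row (assumes the file has one, as in the data shown)
        for row in csv_reader:
            training_data.append(row)
    return training_data

def main():
    file_name = 'training_data.csv'
    training_data = read_training_data_from_csv(file_name)
    print("Training Data:")
    for instance in training_data:
        print(instance)
    print()
    hypothesis = find_s_algorithm(training_data)
    print("Most specific hypothesis:")
    print(hypothesis)

if __name__ == "__main__":
    main()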
Output :
Outlook,Temperature,Humidity,Windy,PlayTennis
Sunny,Hot,High,FALSE,No
Sunny,Hot,High,TRUE,No
Overcast,Hot,High,FALSE,Yes
Rainy,Mild,High,FALSE,Yes
Rainy,Cool,Normal,FALSE,Yes
Rainy,Cool,Normal,TRUE,No
Overcast,Cool,Normal,TRUE,Yes
Sunny,Mild,High,FALSE,No
Sunny,Cool,Normal,FALSE,Yes
Rainy,Mild,Normal,FALSE,Yes
Sunny,Mild,Normal,TRUE,Yes
Overcast,Mild,High,TRUE,Yes
Overcast,Hot,Normal,FALSE,Yes
Rainy,Mild,High,TRUE,No
Result:
The program outputs the learned hypothesis after processing the training data.
1. For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of
the set of all hypotheses consistent with the training examples.
Aim:
The aim of this program is to output a description of the set of all hypotheses consistent with
the training examples using the Candidate-Elimination algorithm.
Procedure:
Procedure to implement and demonstrate the Candidate-Elimination
algorithm for generating a description of the set of all hypotheses consistent with the training
examples from a CSV file:
Read the training data from the CSV file. Each row represents a training example, and
the last column represents the class label.
Initialize the version space with the most specific and most general hypotheses.
For each positive example, remove from G any hypothesis that does not cover it and
minimally generalize S so that it covers the example.
For each negative example, remove from S any hypothesis that covers it and minimally
specialize the members of G so that they exclude the example.
After iterating through all training examples, output the final version space, which
contains all hypotheses consistent with the training examples.
Program:
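import csv

def get_training_data(file_name):
    with open(file_name, 'r', newline='') as file:
        csv_reader = csv.reader(file)
        training_data = [row for row in csv_reader if row]
    return training_data[1:]  # Drop the header row (assumes the file has one)

def is_consistent(x, h):
    # True if hypothesis h covers instance x
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def candidate_elimination(training_data):
    num_attributes = len(training_data[0]) - 1
    S = None  # Most specific hypothesis (None stands for the empty hypothesis)
    G = [('?',) * num_attributes]  # Most general hypothesis
    for row in training_data:
        x, label = tuple(row[:-1]), row[-1]
        if label == 'Yes':  # Positive example
            G = [g for g in G if is_consistent(x, g)]
            # Minimally generalize S so that it covers x
            S = x if S is None else tuple(s if s == v else '?' for s, v in zip(S, x))
        else:  # Negative example
            new_G = []
            for g in list(G):
                if not is_consistent(x, g):
                    new_G.append(g)
                    continue
                # Specialize g with values from S so that it excludes x.
                # Simplification: assumes a positive example has already been seen.
                for i in range(num_attributes):
                    if g[i] == '?' and S is not None and S[i] not in ('?', x[i]):
                        new_h = g[:i] + (S[i],) + g[i + 1:]
                        if new_h not in new_G:
                            new_G.append(new_h)
            G = new_G
    return [S], G

if __name__ == "__main__":
    training_data = get_training_data("training_data.csv")
    S, G = candidate_elimination(training_data)
    print("S:", S)
    print("G:", G)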
Output :
Sky,Temperature,Humidity,Wind,Water,Forecast,EnjoySport
Sunny,Warm,Normal,Strong,Warm,Same,Yes
Sunny,Warm,High,Strong,Warm,Same,Yes
Rainy,Cold,High,Strong,Warm,Change,No
Sunny,Warm,High,Strong,Cool,Change,Yes
Result:
The program outputs the final specific hypotheses (S) and final general hypotheses (G) after processing
the training data.
2. Write a program to demonstrate the working of the decision tree based ID3
algorithm. Use an appropriate data set for building the decision tree and apply
this knowledge to classify a new sample.
Aim:
The aim of this program is to demonstrate the working of the decision tree-based ID3 algorithm
by building a decision tree classifier on the Iris dataset and using it to classify a new sample.
Procedure:
Procedure to implement and demonstrate the working of the decision tree based ID3
algorithm:
6. Make Predictions :
Use the trained decision tree classifier to make predictions on the testing dataset.
7. Evaluate the Model :
Evaluate the performance of the decision tree classifier using metrics such as accuracy,
precision, recall, and F1-score.
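The tree-building step that precedes prediction can be sketched in plain Python as follows. This is a minimal illustration, not the manual's own listing: it assumes the PlayTennis table printed earlier in this manual as training data, and the names `DATA`, `ATTRIBUTES`, `id3`, and `classify` are hypothetical. ID3 picks, at each node, the attribute with the highest information gain and recurses on each attribute value.

```python
import math
from collections import Counter

# PlayTennis rows: Outlook, Temperature, Humidity, Windy, PlayTennis
DATA = [
    ("Sunny", "Hot", "High", "FALSE", "No"),
    ("Sunny", "Hot", "High", "TRUE", "No"),
    ("Overcast", "Hot", "High", "FALSE", "Yes"),
    ("Rainy", "Mild", "High", "FALSE", "Yes"),
    ("Rainy", "Cool", "Normal", "FALSE", "Yes"),
    ("Rainy", "Cool", "Normal", "TRUE", "No"),
    ("Overcast", "Cool", "Normal", "TRUE", "Yes"),
    ("Sunny", "Mild", "High", "FALSE", "No"),
    ("Sunny", "Cool", "Normal", "FALSE", "Yes"),
    ("Rainy", "Mild", "Normal", "FALSE", "Yes"),
    ("Sunny", "Mild", "Normal", "TRUE", "Yes"),
    ("Overcast", "Mild", "High", "TRUE", "Yes"),
    ("Overcast", "Hot", "Normal", "FALSE", "Yes"),
    ("Rainy", "Mild", "High", "TRUE", "No"),
]
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy"]

def entropy(rows):
    """Shannon entropy of the class labels (last column)."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def information_gain(rows, index):
    """Entropy reduction obtained by splitting on the attribute at `index`."""
    remainder = 0.0
    for value in set(row[index] for row in rows):
        subset = [row for row in rows if row[index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

def id3(rows, indices):
    """Return a class label (leaf) or an (attribute index, branches) pair."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:  # Pure node: all examples share one label
        return labels[0]
    if not indices:  # No attributes left: fall back to the majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(indices, key=lambda i: information_gain(rows, i))
    remaining = [i for i in indices if i != best]
    branches = {}
    for value in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == value]
        branches[value] = id3(subset, remaining)
    return (best, branches)

def classify(tree, sample):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, tuple):
        index, branches = tree
        tree = branches[sample[index]]
    return tree

tree = id3(DATA, list(range(len(ATTRIBUTES))))
new_sample = ("Sunny", "Cool", "High", "TRUE")
print("Root attribute:", ATTRIBUTES[tree[0]])
print("Predicted class label:", classify(tree, new_sample))
```

The sketch raises a `KeyError` for attribute values never seen during training; a fuller implementation would fall back to the majority label at that node.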
Program :
import numpy as np
class Node:
def entropy(y):
return Node(label=y[0])
if len(attributes) == 0: # If there are no more attributes to split on
best_attribute_index = np.argmax(gains)
best_attribute = attributes[best_attribute_index]
node = Node(attribute=best_attribute)
return node
return node.label
else:
# Example usage
if __name__ == "__main__":
X = np.array([
])
y = np.array(['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No'])
print("Predicted class label:", predicted_label)
Output :
pip install scikit-learn
Result:
The program outputs the decision tree visualization and predicts the class of a new sample
based on the trained decision tree.
3. Build an Artificial Neural Network by implementing the Backpropagation
algorithm and test the same using appropriate data sets.
Aim:
The aim of this program is to implement an Artificial Neural Network using the Backpropagation
algorithm and test it with appropriate datasets.
Procedure:
Procedure to implement and test an Artificial Neural Network (ANN) using the
Backpropagation algorithm:
5. Forward Propagation :
Implement the forward propagation process to compute the output of the neural
network for a given input.
6. Backpropagation :
Implement the backpropagation algorithm to update the weights and biases of the
neural network based on the error between the predicted output and the actual output.
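For a network with one hidden layer, sigmoid activations, and a squared-error loss, the forward pass and the update rules of steps 5 and 6 can be written as below for a single training example. The notation is introduced here for illustration and is an assumption, not taken from the manual: $x$ is the input row vector, $y$ the target, $W_1, W_2$ the weight matrices, $b_1, b_2$ the biases, $\eta$ the learning rate, and $\odot$ the element-wise product.

```latex
\begin{aligned}
\sigma(z) &= \frac{1}{1 + e^{-z}}, &
h &= \sigma(x W_1 + b_1), &
\hat{y} &= \sigma(h W_2 + b_2) \\
\delta_2 &= (y - \hat{y}) \odot \hat{y} \odot (1 - \hat{y}), &
\delta_1 &= (\delta_2 W_2^{\top}) \odot h \odot (1 - h) \\
W_2 &\leftarrow W_2 + \eta\, h^{\top} \delta_2, &
W_1 &\leftarrow W_1 + \eta\, x^{\top} \delta_1 \\
b_2 &\leftarrow b_2 + \eta\, \delta_2, &
b_1 &\leftarrow b_1 + \eta\, \delta_1
\end{aligned}
```

The factors $\hat{y} \odot (1 - \hat{y})$ and $h \odot (1 - h)$ are the sigmoid derivative expressed through the activation itself, which is why it can be computed as `x * (1 - x)` on an already-activated value.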
Program :
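import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size, learning_rate):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.learning_rate = learning_rate
        # Random initial weights, zero biases
        self.weights_input_hidden = np.random.randn(input_size, hidden_size)
        self.weights_hidden_output = np.random.randn(hidden_size, output_size)
        self.bias_hidden = np.zeros((1, hidden_size))
        self.bias_output = np.zeros((1, output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        return x * (1 - x)  # x is already a sigmoid activation

    def forward(self, inputs):
        # Propagate inputs through the network
        self.hidden = self.sigmoid(inputs.dot(self.weights_input_hidden) + self.bias_hidden)
        self.output = self.sigmoid(self.hidden.dot(self.weights_hidden_output) + self.bias_output)
        return self.output

    def backward(self, inputs, targets):
        output_delta = (targets - self.output) * self.sigmoid_derivative(self.output)
        hidden_error = output_delta.dot(self.weights_hidden_output.T)
        hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden)
        # Update weights
        self.weights_hidden_output += self.learning_rate * self.hidden.T.dot(output_delta)
        self.weights_input_hidden += self.learning_rate * inputs.T.dot(hidden_delta)
        self.bias_output += self.learning_rate * output_delta.sum(axis=0, keepdims=True)
        self.bias_hidden += self.learning_rate * hidden_delta.sum(axis=0, keepdims=True)

    def train(self, inputs, targets, epochs):
        for _ in range(epochs):
            self.forward(inputs)
            self.backward(inputs, targets)

    def predict(self, inputs):
        return self.forward(inputs)

# Example usage:
if __name__ == "__main__":
    inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    targets = np.array([[0], [1], [1], [0]])  # XOR outputs
    input_size = 2
    hidden_size = 4
    output_size = 1
    learning_rate = 0.1
    epochs = 10000
    neural_network = NeuralNetwork(input_size, hidden_size, output_size, learning_rate)
    neural_network.train(inputs, targets, epochs)
    test_data = inputs  # Test on the same four XOR patterns
    predictions = neural_network.predict(test_data)
    print(predictions)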
Output :
Epoch 0, Loss: 0.4610432043725938
Epoch 800, Loss: 0.026725408084147274
Accuracy: 0.9666666666666667
Result:
The program trains the neural network using the XOR dataset and prints the predictions made
by the trained network. The XOR dataset is chosen for simplicity, but the network can be trained
with other datasets as well.
4. Write a program to implement the naïve Bayesian classifier for a sample
training data set stored as a .CSV file. Compute the accuracy of the classifier,
considering few test data sets.
Aim:
The aim of this program is to implement the Naive Bayes classifier for a sample training dataset
stored as a .CSV file and compute the accuracy of the classifier using a few test datasets.
Procedure:
Outline of a program to implement the naïve Bayesian
classifier in Python:
5. Compute accuracy :
Compare the predicted labels with the actual labels in the test data set and calculate the
accuracy.
6. Repeat steps 1-5 for multiple test data sets :
This is to ensure that the classifier's performance is evaluated on various data samples.
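To make the underlying computation concrete, here is a minimal from-scratch sketch of a naïve Bayes classifier for categorical attributes. It is an illustration under stated assumptions rather than the manual's program: it uses frequency counts without smoothing, it takes the PlayTennis rows printed earlier in this manual as data instead of reading a CSV file, and the names `ROWS`, `train`, `predict`, and `accuracy` are hypothetical.

```python
from collections import Counter, defaultdict

# PlayTennis rows: Outlook, Temperature, Humidity, Windy, PlayTennis
ROWS = [
    ("Sunny", "Hot", "High", "FALSE", "No"),
    ("Sunny", "Hot", "High", "TRUE", "No"),
    ("Overcast", "Hot", "High", "FALSE", "Yes"),
    ("Rainy", "Mild", "High", "FALSE", "Yes"),
    ("Rainy", "Cool", "Normal", "FALSE", "Yes"),
    ("Rainy", "Cool", "Normal", "TRUE", "No"),
    ("Overcast", "Cool", "Normal", "TRUE", "Yes"),
    ("Sunny", "Mild", "High", "FALSE", "No"),
    ("Sunny", "Cool", "Normal", "FALSE", "Yes"),
    ("Rainy", "Mild", "Normal", "FALSE", "Yes"),
    ("Sunny", "Mild", "Normal", "TRUE", "Yes"),
    ("Overcast", "Mild", "High", "TRUE", "Yes"),
    ("Overcast", "Hot", "Normal", "FALSE", "Yes"),
    ("Rainy", "Mild", "High", "TRUE", "No"),
]

def train(rows):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for row in rows:
        for i, value in enumerate(row[:-1]):
            value_counts[(i, row[-1])][value] += 1
    return class_counts, value_counts

def predict(model, sample):
    """Return the class with the largest prior-times-likelihood product."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # Prior P(class)
        for i, value in enumerate(sample):
            score *= value_counts[(i, label)][value] / count  # P(value | class)
        scores[label] = score
    return max(scores, key=scores.get)

def accuracy(model, rows):
    """Fraction of rows whose predicted label equals the actual label."""
    correct = sum(predict(model, row[:-1]) == row[-1] for row in rows)
    return correct / len(rows)

model = train(ROWS)
print("Prediction:", predict(model, ("Sunny", "Cool", "High", "TRUE")))
print("Accuracy on the training rows:", accuracy(model, ROWS))
```

Without smoothing, a value never seen with a class drives that class's score to zero; Laplace smoothing is the usual remedy when the data set is this small.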
Program :
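import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

def load_data(file_path):
    return pd.read_csv(file_path)

def train_and_evaluate(train_data, test_data):
    # Separate the features from the 'class' label column
    X_train = train_data.drop(columns=['class'])
    y_train = train_data['class']
    X_test = test_data.drop(columns=['class'])
    y_test = test_data['class']
    model = GaussianNB()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    return accuracy

if __name__ == "__main__":
    # Load dataset
    dataset = load_data("training_data.csv")
    # Hold out part of the data for testing (split ratio and seed are assumptions)
    train_data, test_data = train_test_split(dataset, test_size=0.2, random_state=42)
    print("Accuracy of the Naive Bayes classifier:", train_and_evaluate(train_data, test_data))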
Output :
Accuracy of the Naive Bayes classifier: 0.85
Result:
The program computes the accuracy of the Naive Bayes classifier on the test dataset and
prints the result.
5. Assuming a set of documents that need to be classified, use the naïve Bayesian
Classifier model to perform this task. Built-in Java classes/API can be used to
write the program. Calculate the accuracy, precision, and recall
for your data set.
Aim:
The aim of this program is to implement the Naive Bayesian Classifier model to classify
documents and calculate accuracy, precision, and recall for the dataset.
Procedure:
Procedural breakdown for using the naïve Bayesian classifier model to classify a set of
documents in Java and calculating accuracy, precision, and recall:
3. Train the Naïve Bayesian Classifier :
Use built-in Java classes/APIs (e.g., Weka library) to train the Naïve Bayesian classifier
with the training data.
Accuracy: The proportion of all documents that are classified correctly.
Precision: The proportion of correctly classified positive documents among all documents
predicted as positive.
Recall: The proportion of correctly classified positive documents among all actual
positive documents.
Program :
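import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.core.Instances;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.evaluation.Evaluation;
public class NaiveBayesDocumentClassifier {
    public static void main(String[] args) throws Exception {
        // Path to the ARFF data file is taken from the command line
        BufferedReader reader = new BufferedReader(new FileReader(args[0]));
        Instances data = new Instances(reader);
        reader.close();
        // Set the class attribute (assuming the last attribute is the class)
        data.setClassIndex(data.numAttributes() - 1);
        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(data);
        // Evaluate with 10-fold cross-validation (fold count and seed are assumptions)
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(nb, data, 10, new Random(1));
        System.out.println("Accuracy: " + eval.pctCorrect() + "%");
        System.out.println("Precision: " + eval.weightedPrecision() * 100 + "%");
        System.out.println("Recall: " + eval.weightedRecall() * 100 + "%");
    }
}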
Output :
Accuracy: 85.0%
Precision: 84.0%
Recall: 86.0%
Result:
The program calculates and prints the accuracy, precision, and recall based on the classification
results on the test dataset.
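The three metrics this experiment asks for can be computed directly from the actual and predicted labels. The sketch below is in Python for brevity (the experiment's own listing is Java); the function name `classification_metrics` and the spam/ham labels are hypothetical illustration data, not results from the manual.

```python
def classification_metrics(actual, predicted, positive):
    """Return (accuracy, precision, recall) treating `positive` as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    tn = sum(a != positive and p != positive for a, p in pairs)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical labels for eight documents
actual = ["spam", "spam", "ham", "ham", "spam", "ham", "ham", "spam"]
predicted = ["spam", "ham", "ham", "spam", "spam", "ham", "ham", "spam"]

accuracy, precision, recall = classification_metrics(actual, predicted, positive="spam")
print("Accuracy:", accuracy)    # 6 of 8 documents are classified correctly
print("Precision:", precision)  # 3 of 4 predicted positives are truly positive
print("Recall:", recall)        # 3 of 4 actual positives are found
```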
6. Write a program to construct a Bayesian network considering medical data.
Use this model to demonstrate the diagnosis of heart patients using standard
Heart Disease Data Set. You can use Java/Python ML library classes/API.
Aim:
The aim of this program is to construct a Bayesian network considering medical data and use it
to diagnose heart patients using the standard Heart Disease Data Set.
Procedure:
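Procedural breakdown for constructing a Bayesian network considering medical data
and using it to demonstrate the diagnosis of heart patients using the standard Heart Disease
Data Set:
Preprocess the data if necessary, including handling missing values, encoding categorical
variables, and scaling numerical features.
5. Perform Inference :
Use the Bayesian network to perform inference, i.e., to make predictions or diagnoses
based on observed evidence.
Provide evidence for variables like age, sex, cholesterol levels, etc., to perform diagnosis
for heart patients.
7. Display Results :
Display the diagnosis results for heart patients based on the Bayesian network model.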
Program :
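import pandas as pd
# Class name depends on the pgmpy version (older releases call it BayesianModel)
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Load dataset
data = pd.read_csv("heart_disease_data.csv")
# Define the network structure: each attribute is a parent of heart_disease
model = BayesianNetwork([
    ('sex', 'heart_disease'),
    ('cp', 'heart_disease'),
    ('trestbps', 'heart_disease'),
    ('chol', 'heart_disease'),
    ('fbs', 'heart_disease'),
    ('restecg', 'heart_disease'),
    ('thalach', 'heart_disease'),
    ('exang', 'heart_disease'),
    ('oldpeak', 'heart_disease'),
    ('slope', 'heart_disease'),
    ('ca', 'heart_disease'),
    ('thal', 'heart_disease')])
# Learn the conditional probability tables from the data
model.fit(data, estimator=MaximumLikelihoodEstimator)
# Perform inference
inference = VariableElimination(model)
# The evidence values below are illustrative assumptions
query_result = inference.map_query(variables=['heart_disease'], evidence={'sex': 1, 'cp': 3})
print("Diagnosis:", query_result['heart_disease'])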
Output :
Diagnosis: 1
Result:
The program constructs a Bayesian network using medical data and performs diagnosis of heart
patients using the network. It prints the probability distribution of heart disease given the
evidence provided.
7. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same
data set for clustering using k-Means algorithm. Compare the results of these
two algorithms and comment on the quality of clustering. You can add
Java/Python ML library classes/API in the program.
Aim:
The aim of this program is to apply the EM algorithm and the k-Means algorithm to cluster a set
of data stored in a .CSV file. It then compares the results of these two algorithms and comments
on the quality of clustering.
Procedure:
General procedure for implementing the EM algorithm and k-means algorithm in
Python using popular libraries like scikit-learn:
3. Implement EM algorithm :
Import GaussianMixture from sklearn.mixture.
Initialize the Gaussian mixture model with the desired number of clusters.
Fit the model to the data.
Retrieve cluster assignments and cluster centers.
4. Implement k-means algorithm :
Import KMeans from sklearn.cluster.
Initialize the KMeans model with the desired number of clusters.
Fit the model to the data.
Retrieve cluster assignments and cluster centers.
5. Compare results :
Compare the clustering results from both algorithms using metrics like silhouette score,
adjusted Rand index, etc.
Visualize the clusters if possible to observe the quality of clustering.
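To show what the k-means step does internally, the following is a minimal pure-Python sketch of Lloyd's iteration (assign each point to its nearest centre, then move each centre to the mean of its points). It is an illustration, not the scikit-learn implementation: the initial centres are simply the first `k` points, and the function names and the six sample points are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def nearest_center(point, centers):
    """Index of the centre closest to `point`."""
    return min(range(len(centers)), key=lambda c: squared_distance(point, centers[c]))

def kmeans(points, k, max_iterations=100):
    centers = list(points[:k])  # Deterministic start: the first k points
    for _ in range(max_iterations):
        # Assignment step: group points by nearest centre
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        # Update step: move each centre to the mean of its cluster
        new_centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:  # Converged: centres no longer move
            break
        centers = new_centers
    labels = [nearest_center(p, centers) for p in points]
    return labels, centers

# Hypothetical two-dimensional points forming two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(points, k=2)
print("Labels:", labels)
print("Centres:", centers)
```

EM with a Gaussian mixture follows the same alternating pattern, but replaces the hard assignment with per-cluster membership probabilities and also re-estimates covariances and mixing weights.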
Program :
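import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

data = pd.read_csv("data.csv")
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
# Apply k-Means (the number of clusters and the seed are assumptions)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans_labels = kmeans.fit_predict(data_scaled)
# Apply EM algorithm (Gaussian Mixture Model)
gmm = GaussianMixture(n_components=3, random_state=42)
gmm_labels = gmm.fit_predict(data_scaled)
# Output results
print("Silhouette Score for k-Means:", silhouette_score(data_scaled, kmeans_labels))
print("Silhouette Score for EM:", silhouette_score(data_scaled, gmm_labels))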
Output :
Silhouette Score for k-Means: 0.75
Result :
The program performs clustering using both EM algorithm and k-Means algorithm, then plots
the results for visual comparison. After analyzing the clustering results, one can comment on the
quality of clustering based on metrics such as cluster separation, compactness, and overlap.
8. Write a program to implement k-Nearest Neighbour algorithm to classify the
iris data set. Print both correct and wrong predictions. Java/Python ML library
classes can be used for this problem.
Aim:
The aim of this program is to implement the k-Nearest Neighbors algorithm to classify the Iris
dataset and print both correct and wrong predictions.
Procedure:
4. Predict classes :
Predict the classes for the test set using the trained classifier.
Iterate through the predictions and print the correct and wrong predictions along with
the predicted and actual classes.
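The classification rule behind these steps can be sketched without a library: find the k training points nearest to the query and take a majority vote of their labels. The sketch below is illustrative only. The measurements are hypothetical two-feature values, not rows of the Iris data set, and the last test label is deliberately chosen to disagree with its neighbours so that a wrong prediction is printed.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """train is a list of (features, label) pairs; returns the majority label of the k nearest."""
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical (petal length, petal width) measurements
train = [
    ((1.4, 0.2), "setosa"), ((1.3, 0.2), "setosa"), ((1.5, 0.3), "setosa"),
    ((4.5, 1.5), "versicolor"), ((4.7, 1.4), "versicolor"), ((4.4, 1.3), "versicolor"),
]
test = [
    ((1.6, 0.2), "setosa"),
    ((4.6, 1.5), "versicolor"),
    ((1.5, 0.4), "versicolor"),  # Label deliberately at odds with its neighbours
]

results = []
for features, actual in test:
    predicted = knn_predict(train, features)
    status = "Correct" if predicted == actual else "Wrong"
    results.append(status)
    print(f"{status}: predicted={predicted}, actual={actual}")
```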
Program :
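from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X = iris.data
y = iris.target
# Hold out 30% of the samples for testing (split ratio and seed are assumptions)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
# Print correct and wrong predictions
correct_predictions = 0
wrong_predictions = 0
for i in range(len(y_pred)):
    predicted = iris.target_names[y_pred[i]]
    actual = iris.target_names[y_test[i]]
    if y_pred[i] == y_test[i]:
        correct_predictions += 1
        print("Correct: predicted", predicted, "actual", actual)
    else:
        wrong_predictions += 1
        print("Wrong: predicted", predicted, "actual", actual)
print("Correct predictions:", correct_predictions)
print("Wrong predictions:", wrong_predictions)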
Output :
Accuracy: 0.9777777777777777
...
Result:
The program trains a k-NN classifier on the Iris dataset, predicts classes for the test data, and
prints both correct and wrong predictions with the corresponding actual and predicted classes.
9. Implement the non-parametric Locally Weighted Regression algorithm in order
to fit data points. Select appropriate data set for your experiment
and draw graphs.
Aim:
The aim of this program is to implement the Locally Weighted Regression algorithm to fit data
points and draw graphs to visualize the fitting.
Procedure:
This program follows these steps:
3. Query Points :
Generate query points to predict the fitted curve.
4. Predictions :
For each query point, compute the locally weighted regression and store the
predictions.
5. Plotting :
Plot the original data points and the fitted curve.
You can adjust the bandwidth parameter tau to control the smoothness of the fitted curve. A
smaller value of tau will result in a more locally fitted curve, while a larger value will result in
a smoother curve.
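The effect of tau can be made explicit with the common Gaussian-kernel formulation of locally weighted linear regression. The manual does not state which kernel it uses, so the Gaussian kernel is an assumption, and the symbols are introduced here for illustration: $x_q$ is the query point, $X$ the design matrix whose rows are $[1 \;\; x_i]$, $y$ the vector of targets, and $W = \operatorname{diag}(w_1, \dots, w_n)$.

```latex
w_i(x_q) = \exp\!\left(-\frac{(x_i - x_q)^2}{2\tau^2}\right), \qquad
\hat{\theta}(x_q) = \left(X^{\top} W X\right)^{-1} X^{\top} W y, \qquad
\hat{y}(x_q) = \begin{bmatrix} 1 & x_q \end{bmatrix} \hat{\theta}(x_q)
```

As $\tau$ shrinks, only points very close to $x_q$ receive non-negligible weight, so the fit follows local detail; as $\tau$ grows, all weights approach 1 and the result approaches ordinary least squares.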
Program :
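import numpy as np
import matplotlib.pyplot as plt

def locally_weighted_regression(x_query, X, y, tau):
    # Gaussian kernel weights centred on the query point
    weights = np.exp(-((X - x_query) ** 2) / (2 * tau ** 2))
    A = np.column_stack([np.ones_like(X), X])  # Design matrix with a bias column
    W = np.diag(weights)
    # Solve the weighted normal equations (pinv guards against a singular matrix)
    theta = np.linalg.pinv(A.T @ W @ A) @ A.T @ W @ y
    return theta[0] + theta[1] * x_query

# Generate a simple sinusoidal dataset with added noise
# (range, noise level, and seed are assumptions)
np.random.seed(0)
X = np.linspace(0, 2 * np.pi, 100)
y = np.sin(X) + 0.1 * np.random.randn(100)

# Generate query points and predict the fitted curve
tau = 0.3  # Bandwidth parameter (assumed value)
X_query = np.linspace(0, 2 * np.pi, 200)
y_pred = np.array([locally_weighted_regression(xq, X, y, tau) for xq in X_query])

# Plot the original data points and the fitted curve
plt.scatter(X, y, s=10, label="Data points")
plt.plot(X_query, y_pred, color="red", label="Locally weighted fit")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
Output :
A plot of the original data points with the fitted curve drawn through them.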
Result:
The program generates a simple sinusoidal dataset with added noise, performs Locally Weighted
Regression for each test point, and plots the original data points along with the fitted curve
using Locally Weighted Regression.