Final AI & ML Lab Manual

INDEX
S.NO   TITLE
1a.    Breadth First Search
1b.    Depth First Search
2a.    A* Algorithm
2b.    Memory Bounded A* Algorithm
3.     Naive Bayes Classifier
4.     Bayesian Network
5.     Regression Model
6.     Decision Tree and Random Forest
7.     Support Vector Machine
8.     Ensemble Model
9.     K-Means Clustering Algorithm
10.    EM for Bayesian Networks
11.    Simple NN Model
12.    Deep Learning Neural Network Model

EX.NO:1.a BREADTH FIRST SEARCH
AIM:
To write a python program to implement breadth first search.
BFS
Breadth First Search (BFS) explores a graph level by level. It starts at a designated node ("root" node), visits all of its neighbours first, and only then moves on to the neighbours of those neighbours. BFS uses a queue data structure to keep track of the nodes to be explored, so nodes are visited in order of their distance (in edges) from the starting node.
ALGORITHM:
Step 1: Initialize an empty list called ‘visited’ to keep track of the nodes visited during the
traversal.
Step 2: Initialize an empty queue called ‘queue’ to keep track of the nodes to be traversed
in the future.
Step 3: Add the starting node to the ‘visited’ list and the ‘queue’.
Step 4: While the ‘queue’ is not empty, do the following:
a. Dequeue the first node from the ‘queue’ and store it in a variable called
‘current’.
b. Print ‘current’.
c. For each of the neighbours of ‘current’ that have not been visited yet, mark the
neighbour as visited and add it to the ‘queue’.
Step 5: When all the nodes reachable from the starting node have been visited, the
algorithm terminates.
PROGRAM:
import networkx as nx
import matplotlib.pyplot as plt

# Define the graph as an adjacency list
graph = {
    '5': ['3', '7'],
    '3': ['2', '4'],
    '7': ['8'],
    '2': [],
    '4': ['8'],
    '8': []
}

# Create an undirected graph
G = nx.Graph()
# Add nodes and edges from the graph dictionary
for node, neighbors in graph.items():
    for neighbor in neighbors:
        G.add_edge(node, neighbor)

pos = nx.spring_layout(G)  # Positions for all nodes
nx.draw(G, pos, with_labels=True, node_size=700, node_color="skyblue", font_size=12,
        font_weight="bold")
plt.title("Undirected Graph")
plt.show()

visited = []  # List for visited nodes
queue = []    # Initialize a queue

def bfs(visited, graph, node):  # Function for BFS
    visited.append(node)
    queue.append(node)
    while queue:  # Loop to visit each node
        m = queue.pop(0)
        print(m, end=" ")
        for neighbor in graph[m]:
            if neighbor not in visited:
                visited.append(neighbor)
                queue.append(neighbor)

print("Following is the Breadth-First Search")
bfs(visited, graph, '5')  # Function calling
Input Graph
OUTPUT:
Following is the Breadth-First Search
5 3 7 2 4 8
RESULT:
Thus the Python program to implement breadth first search was executed and the
output was verified.
EX.NO:1.b DEPTH FIRST SEARCH
AIM:
To write a python program to implement depth first search.
DFS
Depth First Traversal (DFS) explores nodes deeply before moving on to the next
branch. It starts at a designated node ("root" node) and explores as far as possible along
each branch before backtracking. DFS typically uses a stack data structure to keep track
of nodes to be explored. This allows it to backtrack efficiently.
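The program below implements DFS recursively, so the call stack plays the role of the explicit stack. Purely for illustration (this variant is not part of the original manual), an equivalent version that uses an explicit stack could look like this sketch; for the sample graph it prints the same order as the recursive program.
def dfs_iterative(graph, start):
    visited = set()
    stack = [start]            # explicit stack instead of the call stack
    while stack:
        node = stack.pop()     # take the most recently added node
        if node not in visited:
            print(node, end=" ")
            visited.add(node)
            # Push neighbours; reversing keeps the same visiting order
            # as the recursive version for this adjacency list.
            stack.extend(reversed(graph[node]))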
ALGORITHM:
Step 1: Initialize an empty set called ‘visited’ to keep track of the nodes visited
during the Traversal.
Step 2: Define a DFS function that takes the current node, the graph, and the ‘visited’ set as input.
Step 3: If the current node is not in the ‘visited’ set, do the following:
a. Print the current node.
b. Add the current node to the ‘visited’ set.
c. For each of the neighbours of the current node, call the DFS function recursively with the neighbour as the current node.
Step 4: When all the nodes reachable from the starting node have been visited, the
algorithm terminates.
PROGRAM:
import networkx as nx
import matplotlib.pyplot as plt

# Define the graph
graph = {
    '5': ['3', '7'],
    '3': ['2', '4'],
    '7': ['8'],
    '2': [],
    '4': ['8'],
    '8': []
}

# Create an undirected graph
G = nx.Graph()
items = graph.items()
print("Graph items are:", items)

# Add nodes and edges from the graph
for node, neighbors in items:
    for neighbor in neighbors:
        G.add_edge(node, neighbor)  # add_edge() also adds the end nodes if they are not already present

pos = nx.spring_layout(G)  # Positions for all nodes
nx.draw(G, pos, with_labels=True, node_size=1000, node_color="skyblue")
plt.title("Undirected Graph")
plt.show()

visited = set()  # Set to keep track of visited nodes of the graph

def dfs(visited, graph, node):  # Function for DFS
    if node not in visited:
        print(node, end=" ")
        visited.add(node)
        for neighbor in graph[node]:
            dfs(visited, graph, neighbor)

print("Following is the Depth-First Search for the above graph:")
dfs(visited, graph, '5')
Input Graph
OUTPUT
Following is the Depth-First Search for the above graph:
5 3 2 4 8 7
RESULT:
Thus the python program to implement depth first search was executed and the output was
verified.
EX.NO:2.a A* ALGORITHM
AIM:
To write a python program to implement A* algorithm
A* ALGORITHM
A* Algorithm is used to find the shortest path between two nodes in a graph, given the
estimated cost of getting from the current node to the destination node. The main
advantage of the algorithm is its ability to provide an optimal path by exploring the graph
in a more informed way compared to traditional search algorithms such as Dijkstra's
algorithm.
ALGORITHM:
Step 1: Initialize the distances dictionary with float('inf') for all vertices in the
graph except for the start vertex which is set to 0.
Step 2: Initialize the parent dictionary with None for all vertices in the graph.
Step 3: Initialize an empty set for visited vertices.
Step 4: Initialize a priority queue (pq) with a tuple containing the sum of the heuristic
value and the distance from start to the current vertex, the distance from start to the
current vertex, and the current vertex.
Step 5: While pq is not empty, do the following:
a. Dequeue the vertex with the smallest f-distance (sum of the heuristic value and
the distance from start to the current vertex).
b. If the current vertex is the destination vertex, return distances and parent.
c. If the current vertex has not been visited, add it to the visited set.
d. For each neighbor of the current vertex, do the following:
i. Calculate the distance from start to the neighbor (g) as the sum of the
distance from start to the current vertex and the edge weight between the
current vertex and the neighbor.
ii. Calculate the f-distance (f = g + h) for the neighbor.
iii. If the f-distance for the neighbor is less than its current distance
in the distances dictionary, update the distances dictionary with the new
distance and the parent dictionary with the current vertex as the parent of the
neighbor.
iv. Enqueue the neighbor with its f-distance, distance from start to neighbor,
and the neighbor itself into the priority queue.
Step 6. Return distances and parent.
PROGRAM:
import heapq

def a_star(graph, start, dest, heuristic):
    # distances holds the best f-distance (g + h) found so far for each vertex
    distances = {vertex: float('inf') for vertex in graph}
    distances[start] = 0
    parent = {vertex: None for vertex in graph}
    visited = set()
    # Each queue entry is (f, g, vertex): estimated total distance and distance from start
    pq = [(0 + heuristic[start], 0, start)]
    while pq:
        curr_f, curr_dist, curr_vert = heapq.heappop(pq)
        if curr_vert == dest:
            break
        if curr_vert not in visited:
            visited.add(curr_vert)
            for nbor, weight in graph[curr_vert].items():
                distance = curr_dist + weight             # distance from start (g)
                f_distance = distance + heuristic[nbor]   # f = g + h
                if f_distance < distances[nbor]:
                    distances[nbor] = f_distance
                    parent[nbor] = curr_vert
                    heapq.heappush(pq, (f_distance, distance, nbor))  # log E time
    return distances, parent

def generate_path_from_parents(parent, start, dest):
    path = []
    curr = dest
    while curr:
        path.append(curr)
        curr = parent[curr]
    return '->'.join(path[::-1])

graph = {
    'S': {'A': 2, 'B': 1},
    'A': {'B': 3, 'C': 1},
    'B': {'D': 4},
    'C': {'D': 2},
    'D': {}
}
heuristic = {
    'S': 5,
    'A': 4,
    'B': 3,
    'C': 2,
    'D': 0
}
start = 'S'
dest = 'D'
distances, parent = a_star(graph, start, dest, heuristic)
print('Optimal path:', generate_path_from_parents(parent, start, dest))
Input:
graph = {
'S': {'A': 2, 'B': 1},
'A': {'B': 3, 'C': 1},
'B': {'D': 4},
'C': {'D': 2},
'D': {}
}
heuristic = {
'S': 5,
'A': 4,
'B': 3,
'C': 2,
'D': 0
}
OUTPUT:
Optimal path: S->B->D
RESULT:
Thus the python program to implement A* algorithm was executed and the output was
verified.
EX.NO:2.b MEMORY BOUNDED A* ALGORITHM
AIM:
To write a python program to implement the memory bounded A* algorithm.
MEMORY BOUNDED A* ALGORITHM
A memory-bounded algorithm is an algorithm that is limited by the amount of available
memory rather than by processing power or other resources. These algorithms are
designed to operate within a fixed amount of memory, making them suitable for systems
with memory constraints, such as embedded systems or devices with limited memory
capacity. Memory bounded A* works like A*, but it caps how many nodes can be held in
the priority queue at once and discards the least promising entries when that limit is reached.
ALGORITHM:
Step 1: Initialize the distances dictionary with float('inf') for all vertices in
the graph except for the start vertex, which is set to 0.
Step 2: Initialize the parent dictionary with None for all vertices in the graph.
Step 3: Initialize an empty set for visited vertices.
Step 4: Initialize a priority queue (pq) with a tuple containing the sum of the heuristic
value and the distance from start to the current vertex, the distance from start to the
current vertex, and the current vertex.
Step 5: While pq is not empty, do the following:
a. Dequeue the vertex with the smallest f-distance (sum of the heuristic value and
the distance from start to the current vertex).
b. If the current vertex is the destination vertex, return distances and parent.
c. If the current vertex has not been visited, add it to the visited set.
d. For each neighbor of the current vertex, do the following:
i. Calculate the distance from start to the neighbor (g) as the sum of the
distance from start to the current vertex and the edge weight between the
current vertex and the neighbor.
ii. Calculate the f-distance (f = g + h) for the neighbor.
iii. If the f-distance for the neighbor is less than its current distance in the
distances dictionary, update the distances dictionary with the new distance
and the parent dictionary with the current vertex as the parent of the
neighbor.
iv. Enqueue the neighbor with its f-distance, distance from start to neighbor,
and the neighbor itself into the priority queue.
Step 6. Return distances and parent.
PROGRAM:
import heapq

def ma_star(graph, start, dest, heuristic, memory_limit):
    distances = {vertex: float('inf') for vertex in graph}
    distances[start] = 0
    parent = {vertex: None for vertex in graph}
    visited = set()
    # Each queue entry is (f, g, vertex); the queue size is capped at memory_limit
    pq = [(0 + heuristic[start], 0, start)]
    while pq:
        curr_f, curr_dist, curr_vert = heapq.heappop(pq)
        if curr_vert == dest:
            break
        if curr_vert not in visited:
            visited.add(curr_vert)
            for nbor, weight in graph[curr_vert].items():
                distance = curr_dist + weight             # distance from start (g)
                f_distance = distance + heuristic[nbor]   # f = g + h
                if f_distance < distances[nbor]:
                    distances[nbor] = f_distance
                    parent[nbor] = curr_vert
                    if len(pq) < memory_limit:
                        heapq.heappush(pq, (f_distance, distance, nbor))
                    elif f_distance < max(pq)[0]:
                        # Queue is full: drop the entry with the largest f before inserting
                        pq.remove(max(pq))
                        heapq.heapify(pq)
                        heapq.heappush(pq, (f_distance, distance, nbor))
    return distances, parent

def generate_path_from_parents(parent, start, dest):
    path = []
    curr = dest
    while curr:
        path.append(curr)
        curr = parent[curr]
    return '->'.join(path[::-1])

graph = {
    'S': {'A': 2, 'B': 1},
    'A': {'B': 3, 'C': 1},
    'B': {'D': 4},
    'C': {'D': 2},
    'D': {}
}
heuristic = {
    'S': 5,
    'A': 4,
    'B': 3,
    'C': 2,
    'D': 0
}
start = 'S'
dest = 'D'
memory_limit = 2
distances, parent = ma_star(graph, start, dest, heuristic, memory_limit)
print('Optimal path:', generate_path_from_parents(parent, start, dest))
Input:
graph = {
'S': {'A': 2, 'B': 1},
'A': {'B': 3, 'C': 1},
'B': {'D': 4},
'C': {'D': 2},
'D': {}
}
heuristic = {
'S': 5,
'A': 4,
'B': 3,
'C': 2,
'D': 0
}
memory_limit = 2
Output:
Optimal path: S->B->D
RESULT:
Thus the python program to implement memory bounded A* algorithm was executed
successfully and the output was verified.
EX.NO: 3 NAIVE BAYES CLASSIFIER
AIM:
To write a python program to implement naive bayes model.
NAIVE BAYES
Naive Bayes classifiers are a collection of classification algorithms based on Bayes’
Theorem. The model predicts the probability that an instance belongs to a class given a
set of feature values, under the assumption that every pair of features being classified is
independent of each other. It is a probabilistic classifier and is widely used in text classification.
Bayes’ Theorem
Bayes’ Theorem finds the probability of an event occurring given the probability
of another event that has already occurred. Bayes’ theorem is stated mathematically
as the following equation:

P(A|B) = P(B|A) * P(A) / P(B)

Basically, we are trying to find the probability of event A, given that event B is true.
Event B is also termed as evidence.
P(A) is the prior probability of A (i.e. the probability of the event before the evidence is
seen). The evidence is an attribute value of an unknown instance (here, it is event B).
P(B) is the marginal probability: the probability of the evidence.
P(A|B) is the posterior probability of A, i.e. the probability of the event after the evidence is seen.
P(B|A) is the likelihood, i.e. the probability of observing the evidence given that the
hypothesis A is true.
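As a quick illustration of the formula (the numbers below are made up for the example and are not taken from the dataset used later), the posterior can be computed directly:
# Hypothetical values chosen only to illustrate Bayes' theorem
p_spam = 0.3             # P(A): prior probability that a message is spam
p_word_given_spam = 0.6  # P(B|A): probability the word appears in a spam message
p_word = 0.25            # P(B): overall probability the word appears

p_spam_given_word = p_word_given_spam * p_spam / p_word  # P(A|B)
print(p_spam_given_word)  # 0.72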
Advantages:
● Simple and easy to implement.
● Efficient for large datasets and high-dimensional feature spaces.
● Works well with categorical and numerical data.
ALGORITHM:
Step 1: The code imports necessary libraries, including pandas for data manipulation,
CountVectorizer for converting text data into numerical vectors, and MultinomialNB for
implementing the Multinomial Naive Bayes classifier.
Step 2: The code reads a CSV file containing text data and their corresponding labels
(spam or not spam) into a pandas DataFrame.
Step 3: Rows with missing values (NaN) in the 'text' and 'spam' columns are dropped to
ensure data quality.
Step 4: The 'text' column is extracted as training sentences, and the 'spam' column is
extracted as labels.
Step 5: The training sentences are vectorized using CountVectorizer, which converts text
data into numerical feature vectors. Each sentence is represented as a vector of word
frequencies or presence indicators.
Step 6: A Multinomial Naive Bayes classifier is trained using the vectorized training data.
The MNB classifier is a probabilistic classifier commonly used for text classification
tasks. It models the conditional probability of each word given the class (spam or not
spam) and uses Bayes' theorem to make predictions.
Step 7: The user is prompted to input a sentence. The input sentence is vectorized using
the same CountVectorizer used for training. The trained MNB classifier then predicts the
label (spam or not spam) for the input sentence based on its probability distribution over
the classes.
Step 8: The predicted label (spam or not spam) for the input sentence is printed to the
console.
PROGRAM:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
# Read the CSV file
data = pd.read_csv("note.csv") # Update "your_file.csv" with the actual file path
# Drop rows with NaN values
data.dropna(subset=['text', 'spam'], inplace=True)
# Extract the "spam" column as labels and the "text" column as training sentences
train_sentences = data['text'].tolist()
train_labels = data['spam'].tolist()
# Vectorize the training data
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_sentences)
# Train the classifier
nb_classifier = MultinomialNB()
nb_classifier.fit(X_train, train_labels)
# Get user input
print("Enter a Sentence to check whether the text is spam or not")
user_input = input("Enter a sentence: ")
# Vectorize the user input
X_user = vectorizer.transform([user_input])
# Predict label
prediction = nb_classifier.predict(X_user)[0]
# Print prediction
spam_or_not = "spam" if prediction == 1 else "not spam"
print(f"Predicted label for '{user_input}': {spam_or_not}")
Sample Dataset
note.csv
Text spam
Subject: re : parking pass for van ngo done . shirley crenshaw @ ect 01 / 19 / 2000 07 : 33 am to : louis
allen @ enron cc : vince j kaminski / hou / ect @ ect , kevin g moore / hou / ect @ ect , william smith /
corp / enron @ enron subject : parking pass for van ngo good morning louis : please cancel the " secom "
parking badge that was issued to van ngo for parking in the 777 clay garage while she was working part
time with the research group during the holidays . the number on the card is 4280 . i will return the badge
to you this morning . the co . # is 0011 and the rc # is 100038 . thanks louis and have a great day ! shirley
3 - 5290 0
Subject: 4 color printing special request additional information now ! click here click here for a printable
version of our order form ( pdf format ) phone : ( 626 ) 338 - 8090 fax : ( 626 ) 338 - 8102 e - mail :
ramsey @ goldengraphix . com request additional information now ! click here click here for a printable
version of our order form ( pdf format ) golden graphix & printing 5110 azusa canyon rd . irwindale , ca
91706 this e - mail message is an advertisement and / or solicitation . 1
Subject: do not have money , get software cds from here ! software compatibility . . . . ain ' t it great ?
grow old along with me the best is yet to be . all tradgedies are finish ' d by death . all comedies are ended
by marriage . 1
OUTPUT:
Enter a sentence to check whether the text is spam or not:
Enter a sentence: hi
Predicted label for 'hi': not spam
Enter a sentence to check whether the text is spam or not:
Enter a sentence: undeliverable
Predicted label for 'undeliverable ': spam
RESULT:
Thus the python program to implement Naive Bayes Model was executed and the
output was verified.
EX.NO:4 BAYESIAN NETWORK
AIM:
To write a python program to implement Bayesian Network
BAYESIAN NETWORK
Bayesian networks are probabilistic, because these networks are built from a
probability distribution, and also use probability theory for prediction and anomaly
detection. A Bayesian network is a probabilistic graphical model which represents a set of
variables and their conditional dependencies using a directed acyclic graph.
ALGORITHM:
Step 1: Read two Excel files into pandas DataFrames (drug.xlsx and age.xlsx).
Step 2: Create a new column Age_binned in both datasets by categorizing Age into bins:
'Young', 'Middle-aged', and 'Old'.
Step 3: Convert the categorical variables (Sex in data1 and gender in data2) to numerical codes (F = 0, M = 1).
Step 4: Rename the gender column in data2 to Sex to ensure consistent column names for
merging.
Step 5: Merge data1 and data2 on common columns Age_binned and Sex to create
combined_data.Verify and print the column names of combined_data to confirm the merge
was successful.
Step 6: Select columns needed for analysis: Age_binned, Sex, Drug, and Cholesterol_x
(from data1).
Step 7: Define the Bayesian Network structure: specify the structure of the Bayesian
Network (model) using BayesianModel from pgmpy.models.
Step 8: Fit the model to combined_data using the Maximum Likelihood Estimator.
Step 9: Create a VariableElimination object and query the conditional probabilities of
Drug and Cholesterol given the evidence (Age_binned='Young', Sex=0).
Step 10: Print the results of the inference query to show the conditional probabilities of
Drug and Cholesterol based on the given evidence.
PROGRAM:
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination
# Load the datasets
data1 = pd.read_excel('drug.xlsx')
data2 = pd.read_excel('age.xlsx')
# Binning the Age
data1['Age_binned'] = pd.cut(data1['Age'], bins=[0, 30, 60, 100], labels=['Young',
'Middle-aged', 'Old'])
data2['Age_binned'] = pd.cut(data2['Age'], bins=[0, 30, 60, 100], labels=['Young',
'Middle-aged', 'Old'])
# Converting categorical variables to numerical codes
data1['Sex'] = data1['Sex'].map({'F': 0, 'M': 1})
data2['gender'] = data2['gender'].map({'F': 0, 'M': 1})
# Renaming columns to have consistent naming
data2.rename(columns={'gender': 'Sex'}, inplace=True)
# Merging datasets on common columns
combined_data = pd.merge(data1, data2, on=['Age_binned', 'Sex'])
# Inspecting the merged data to ensure it contains the expected columns
print("Combined data columns:", combined_data.columns)
# Use one of the Cholesterol columns and rename it for consistency
combined_data['Cholesterol'] = combined_data['Cholesterol_x']
# Simplifying data
combined_data = combined_data[['Age_binned', 'Sex', 'Drug', 'Cholesterol']]
# Define the Bayesian Network structure
model = BayesianModel([('Age_binned', 'Drug'), ('Sex', 'Drug'), ('Age_binned',
'Cholesterol'), ('Sex', 'Cholesterol'), ('Drug', 'Cholesterol')])
# Fit the model with Maximum Likelihood Estimation
model.fit(combined_data, estimator=MaximumLikelihoodEstimator)
# Perform inference
infer = VariableElimination(model)
# Example inference
query = infer.query(variables=['Drug', 'Cholesterol'], evidence={'Age_binned': 'Young',
'Sex': 0})
print("P(Drug, Cholesterol | Age=Young, Sex=0):")
print(query)
Sample Dataset
drug.xlsx:
age.xlsx:
1 F 40 160 60 81 192
2 M 55 170 60 80 242
3 M 40 165 70 88 322
4 F 40 155 60 86 184
OUTPUT :
P(Drug, Cholesterol | Age=Young, Sex=0):
+-------------+---------------------+-------------------------+
| Drug | Cholesterol | phi(Drug,Cholesterol) |
+=============+=====================+=========================+
| Drug(DrugY) | Cholesterol(HIGH) | 0.1852 |
+-------------+---------------------+-------------------------+
| Drug(DrugY) | Cholesterol(NORMAL) | 0.3333 |
+-------------+---------------------+-------------------------+
| Drug(drugA) | Cholesterol(HIGH) | 0.0741 |
+-------------+---------------------+-------------------------+
| Drug(drugA) | Cholesterol(NORMAL) | 0.0370 |
+-------------+---------------------+-------------------------+
| Drug(drugB) | Cholesterol(HIGH) | 0.0000 |
+-------------+---------------------+-------------------------+
| Drug(drugB) | Cholesterol(NORMAL) | 0.0000 |
+-------------+---------------------+-------------------------+
| Drug(drugC) | Cholesterol(HIGH) | 0.0741 |
+-------------+---------------------+-------------------------+
| Drug(drugC) | Cholesterol(NORMAL) | 0.0000 |
+-------------+---------------------+-------------------------+
| Drug(drugX) | Cholesterol(HIGH) | 0.1852 |
+-------------+---------------------+-------------------------+
| Drug(drugX) | Cholesterol(NORMAL) | 0.1111 |
+-------------+---------------------+-------------------------+
RESULT:
Thus the python program to implement Bayesian Network was written, executed and
output was verified successfully.
EX NO :5 REGRESSION MODEL
AIM:
To write a python program to implement linear regression.
LINEAR REGRESSION
Linear regression is a type of supervised machine-learning algorithm that learns from
labeled datasets and maps the data points to an optimal linear function, which can then
be used for prediction on new datasets. The goal of the algorithm is to find the best-fit
line equation that can predict the target values from the independent variables.
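For reference (these are the standard definitions, not reproduced from the original manual), the fitted line and the two metrics printed by the program below are:
y_hat = b0 + b1 * x                       (the best-fit line)
MSE = (1/n) * sum((y_i - y_hat_i)^2)      (mean squared error)
R^2 = 1 - sum((y_i - y_hat_i)^2) / sum((y_i - y_mean)^2)   (coefficient of determination)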
ALGORITHM:
Step 1: Import pandas, matplotlib, and the LinearRegression and metrics classes from scikit-learn.
Step 2: Read the dataset (diabetes.csv) into a pandas DataFrame.
Step 3: Prepare the feature and target variables:
a. Select the BMI column as the feature (independent variable) and the Diabetes
column as the target variable (dependent variable).
b. Reshape both the feature and target variables into a 2-dimensional array
with one column using the reshape(-1, 1) method.
Step 4: Visualize the data:
a. Create a scatter plot with BMI on the x-axis and Diabetes on the y-axis using
matplotlib.
b. Set labels for the x-axis and y-axis.
Step 5: Create and fit the linear regression model:
a. Initialize a LinearRegression object.
b. Fit the model to the data using the fit() method, passing the feature (x) and
target variable (y) as arguments.
Step 6: Predict values with the fitted model, plot the regression line, and compute the
mean squared error and R-squared score.
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Read the dataset
d = pd.read_csv("diabetes.csv")
# Extracting BMI and Diabetes columns
x = d.iloc[:, 5].values.reshape(-1, 1)
y = d.iloc[:, 1].values.reshape(-1, 1)
# Scatter plot of BMI vs Diabetes
plt.scatter(x, y)
plt.xlabel("BMI")
plt.ylabel("Diabetes")
# Creating and fitting the linear regression model
lin = LinearRegression()
lin.fit(x, y)
# Predicting values
y_pred = lin.predict(x)
# Plotting the regression line
plt.plot(x, y_pred, color='red')
# Calculating mean squared error and R-squared
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)
# Displaying mean squared error and R-squared
print("Mean Squared Error:", mse)
print("R-squared:", r2)
plt.show()
Sample Dataset
diabetes.csv
1 85 66 29 0 26.6 0.351 31 0
1 89 66 23 94 28.1 0.167 21 0
OUTPUT:
Mean Squared Error: 971.0225668527718
R-squared: 0.04887241775173856
RESULT:
Thus the python program to implement the linear regression was executed and the
output was verified successfully.
EX.NO:6 DECISION TREE AND RANDOM FOREST
AIM:
To write a python program to implement Decision Tree and Random forest.
DECISION TREE
A decision tree is a flowchart-like structure used to make decisions or predictions. It
consists of nodes representing decisions or tests on attributes, branches representing the
outcome of these decisions, and leaf nodes representing final outcomes or predictions.
Two impurity measures are commonly used to decide where to split:
Entropy: Entropy = - sum(pi * log2(pi)), where pi is the probability of an instance being
classified into a particular class.
Gini Impurity: Gini = 1 - sum(pi^2), where pi is the probability of an instance being
classified into a particular class.
Information Gain: Measures the reduction in entropy or Gini impurity after a dataset is
split on an attribute.
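A short, self-contained illustration of these two measures (not part of the original listing), computed for a toy set of class labels:
from collections import Counter
from math import log2

def entropy_and_gini(labels):
    counts = Counter(labels)
    n = len(labels)
    probs = [c / n for c in counts.values()]
    entropy = -sum(p * log2(p) for p in probs)
    gini = 1 - sum(p * p for p in probs)
    return entropy, gini

# Example: 6 'YES' and 4 'NO' labels
print(entropy_and_gini(['YES'] * 6 + ['NO'] * 4))  # approximately (0.971, 0.48)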
RANDOM FOREST
Random Forest is a classifier that contains a number of decision trees on various
subsets of the given dataset and takes the average to improve the predictive accuracy of
that dataset. The greater number of trees in the forest leads to higher accuracy and
prevents the problem of overfitting.
It is based on the concept of ensemble learning, which is a process of combining
multiple classifiers to solve a complex problem and to improve the performance of the
model.
ALGORITHM:
Step 1:Read the dataset from the Excel file "nation.xlsx" using pandas
Step 2:Map categorical variables 'Nationality' and 'Go' to numerical values as specified.
Step 3:Define Features and Target Variable:
3.1:Define the features 'Age', 'Experience', 'Rank', and 'Nationality'.
3.2:Define the target variable 'Go'.
Step 4:Initialize a Decision Tree Classifier.
Step 5:Fit the Decision Tree Classifier using the features and target variable.
Step 6:Plot the decision tree using matplotlib.pyplot and sklearn.tree.plot_tree().Adjust
the plot size and display the tree with feature names and class names.
Step 7:Initialize a Random Forest Classifier with 100 decision trees.
Step 8:Fit the Random Forest Classifier using the features and target variable.
Step 9:Iterate through a specified number of additional trees from the Random Forest.Plot
each decision tree using matplotlib.pyplot and sklearn.tree.plot_tree().Adjust the plot size
and display each tree with feature names and class names.
Step 10:Calculate feature importances from the Random Forest Classifier.
10.1: Sort the feature importances in descending order.
10.2: Plot the feature importances using matplotlib.pyplot.
10.3: Display feature importance values with their corresponding feature
names.Adjust the plot size and labels for clarity.
PROGRAM:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
from sklearn import tree
# Load data from Excel file
df = pd.read_excel("nation.xlsx") # Assuming your Excel file is named "nation.xlsx"
# Mapping categorical variables to numerical values
d = {'UK': 0, 'USA': 1, 'N': 2}
df['Nationality'] = df['Nationality'].map(d)
d = {'YES': 1, 'NO': 0}
df['Go'] = df['Go'].map(d)
# Features and target variable
features = ['Age', 'Experience', 'Rank', 'Nationality']
X = df[features]
y = df['Go']
# Decision tree classifier
dtree = DecisionTreeClassifier()  # default splitting criterion is Gini impurity
dtree = dtree.fit(X, y)

# Plot decision tree
plt.figure(figsize=(12, 8))
tree.plot_tree(dtree, feature_names=features, class_names=['NO', 'YES'], filled=True)
plt.subplots_adjust(left=0.05, right=0.95, top=0.95, bottom=0.05)
plt.title("Decision Tree Classifier")
plt.show()

# Random Forest classifier
rf = RandomForestClassifier(n_estimators=100)  # 100 decision trees
rf.fit(X, y)

# Plotting five additional decision trees from the Random Forest
num_trees_to_plot = 5
for i in range(num_trees_to_plot):
    plt.figure(figsize=(12, 8))
    tree.plot_tree(rf.estimators_[i + 1], feature_names=features, class_names=['NO', 'YES'],
                   filled=True)
    plt.subplots_adjust(left=0.05, right=0.95, top=0.95, bottom=0.05)
    plt.title("Decision Tree {} in Random Forest".format(i + 2))  # Adjusted index
    plt.show()

importances = rf.feature_importances_
indices = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
# Plot feature importance (bar chart of importances sorted in descending order)
plt.figure(figsize=(10, 6))
plt.bar(range(len(importances)), [importances[i] for i in indices])
plt.xticks(range(len(importances)), [features[i] for i in indices])
plt.xlabel("Feature")
plt.ylabel("Importance")
plt.title("Feature Importances from Random Forest")
plt.tight_layout()
plt.show()
OUTPUT:
RESULT :
Thus the python program to implement the Decision Tree and Random forest was
executed and output was verified successfully.
EX.NO:7 SUPPORT VECTOR MACHINE
AIM:
To write a python program to build SVM models.
SVM
Support Vector Machine (SVM) is a powerful machine learning algorithm used for
linear or nonlinear classification, regression, and even outlier detection tasks. SVMs can
be used for a variety of tasks, such as text classification, image classification, spam
detection, handwriting identification, gene expression analysis, face detection, and
anomaly detection.
The main objective of the SVM algorithm is to find the optimal hyperplane in an
N-dimensional space that can separate the data points in different classes in the feature
space.
ALGORITHM:
Step 1: Load the Iris dataset using the `datasets.load_iris()` function from scikit-learn.
Step 2: Extract the features (petal length and sepal length) and target labels (species) from
the loaded dataset.
Step 3: Filter the dataset to include only samples corresponding to the Iris setosa and Iris
virginica species by selecting the rows where the target labels are 0 (Iris setosa) or 2 (Iris
virginica).
Step 4: Split the filtered dataset into training and testing sets using the `train_test_split`
function from scikit-learn, with a specified test size (e.g., 20%) and a random state for
reproducibility.
Step 5: Initialize an SVM classifier with a linear kernel using `svm.SVC(kernel='linear')`.
Step 6: Train the SVM classifier using the training data (features and labels)
using the `fit` method.
Step 7: Plot the training data points on a scatter plot where the x-axis represents petal
length and the y-axis represents sepal length. Different colors are used to distinguish
between different species.
Step 8: Retrieve the current axis limits of the scatter plot to define the range for the
meshgrid.
Step 9: Create a meshgrid covering the range of feature values (petal length and sepal
length) using `np.meshgrid`.
Step 10: Evaluate the decision function of the trained SVM classifier on the meshgrid
points to determine the decision boundary.
Step 11: Plot the decision boundary on the scatter plot using contours, where the decision
function equals -1, 0, and 1. Different linestyles are used to distinguish between the
margin and the decision boundary.
PROGRAM:
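The program listing and its output plot are not reproduced in this copy of the manual. The sketch below follows the steps described above; the feature columns, split ratio, and plotting choices are assumptions based on those steps rather than the original code.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Steps 1-3: load the Iris dataset and keep only Iris setosa (0) and Iris virginica (2)
iris = datasets.load_iris()
X = iris.data[:, [2, 0]]          # petal length and sepal length
y = iris.target
mask = (y == 0) | (y == 2)
X, y = X[mask], y[mask]

# Step 4: split into training and testing sets (20% test, assumed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Steps 5-6: train a linear-kernel SVM
clf = svm.SVC(kernel='linear')
clf.fit(X_train, y_train)

# Step 7: scatter plot of the training data
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='coolwarm')
plt.xlabel("Petal length")
plt.ylabel("Sepal length")

# Steps 8-10: evaluate the decision function on a meshgrid over the plot area
ax = plt.gca()
xlim, ylim = ax.get_xlim(), ax.get_ylim()
xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 50), np.linspace(ylim[0], ylim[1], 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Step 11: draw the margins (-1, 1) and the decision boundary (0)
ax.contour(xx, yy, Z, levels=[-1, 0, 1], colors='k', linestyles=['--', '-', '--'])
plt.title("Linear SVM decision boundary")
plt.show()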
OUTPUT:
RESULT:
Thus the python program to build SVM models was executed and the output was
verified.
EX.NO:8 ENSEMBLE MODEL
AIM:
To write a python program to implement ensembling techniques.
STUDY:
Ensemble learning is a machine learning technique that combines the predictions
from multiple individual models to obtain a better predictive performance than any single
model. The basic idea of ensemble learning is to aggregate the predictions of multiple
models, each of which may have its own strengths and weaknesses. This can lead to
improved performance and generalization.
ALGORITHM:
Step 1: Import the necessary libraries including scikit-learn modules for datasets,
ensemble techniques (Bagging, Boosting, Stacking), classifiers (Random Forest,
AdaBoost, Logistic Regression), and evaluation metrics.
Step 2: Load the Iris dataset using load_iris() function from scikit-learn datasets module.
Split the dataset into features (X) and target (y).
Step 3: Split the dataset into training and testing sets using the train_test_split() function
from scikit-learn.
Step 4:Bagging with Random Forest:
4.1:Initialize a BaggingClassifier with a base estimator as RandomForestClassifier. Set
the number of base estimators to 10 and the number of bags (ensemble members) to 5.
4.2:Fit the BaggingClassifier on the training data.
4.3:Predict using the trained BaggingClassifier on the entire dataset
4.4: Evaluate the accuracy of the BaggingClassifier's predictions using the
accuracy_score() function.
Step 5:Boosting with AdaBoost:
5.1:Initialize an AdaBoostClassifier with the number of estimators set to 50.
5.2:Fit the AdaBoostClassifier on the training data.
5.3:Predict using the trained AdaBoostClassifier on the entire dataset.
5.4:Evaluate the accuracy of the AdaBoostClassifier's predictions.
Step 6:Stacking with Random Forest, AdaBoost, and Logistic Regression:
6.1:Define base models as a list of tuples, where each tuple contains the name of the
base model and the base model itself.
6.2:Initialize the meta-learner as LogisticRegression.
6.3:Initialize the StackingClassifier with base models and the meta-learner.
6.4:Fit the StackingClassifier on the training data.
6.5:Predict using the trained StackingClassifier on the entire dataset.
6.6:Evaluate the accuracy of the StackingClassifier's predictions.
Step 7:Print the accuracy and predictions for each ensemble method.
PROGRAM:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              BaggingClassifier, StackingClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Bagging with Random Forest
bagging_clf = BaggingClassifier(RandomForestClassifier(n_estimators=10,
random_state=42), n_estimators=5, random_state=42)
bagging_clf.fit(X_train, y_train)
bagging_pred = bagging_clf.predict(X)
bagging_accuracy = accuracy_score(y, bagging_pred)
print("Bagging Accuracy:", bagging_accuracy)
print("Bagging Predictions:", bagging_pred)
# Boosting with AdaBoost
boosting_clf = AdaBoostClassifier(n_estimators=50, random_state=42)
boosting_clf.fit(X_train, y_train)
boosting_pred = boosting_clf.predict(X)
boosting_accuracy = accuracy_score(y, boosting_pred)
print("Boosting Accuracy:", boosting_accuracy)
print("Boosting Predictions:", boosting_pred)
# Stacking with Random Forest, AdaBoost, and Logistic Regression
base_models = [
    ('random_forest', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('adaboost', AdaBoostClassifier(n_estimators=10, random_state=42))]
meta_learner = LogisticRegression()
stacking_clf = StackingClassifier(estimators=base_models, final_estimator=meta_learner)
stacking_clf.fit(X_train, y_train)
stacking_pred = stacking_clf.predict(X)
stacking_accuracy = accuracy_score(y, stacking_pred)
print("Stacking Accuracy:", stacking_accuracy)
print("Stacking Predictions:", stacking_pred)
Output:
Bagging Accuracy: 0.98
Bagging Predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 2 2 2 2
2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
Boosting Accuracy: 0.9733333333333334
Boosting Predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1
1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
Stacking Accuracy: 0.9933333333333333
Stacking Predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
RESULT:
Thus the python program to implement ensembling techniques was executed and
output was verified successfully.
EX.NO:9 K-MEANS CLUSTERING ALGORITHM
AIM:
To write a python program to implement the K-Means clustering algorithm.
CLUSTERING
Clustering algorithms are unsupervised learning techniques used to group similar data
points together based on certain characteristics or features.
TYPES OF CLUSTERING ALGORITHM
1. Partitioning Methods
● K-Means Clustering
● K-Medoids (PAM)
2. Hierarchical Methods
● Agglomerative Clustering
● Divisive Clustering
3. Density-Based Methods
● DBSCAN
● OPTICS
K-MEANS CLUSTERING
K-Means clustering partitions the data into k clusters. Starting from k initial centroids, it
assigns every data point to its nearest centroid, recomputes each centroid as the mean of
the points assigned to it, and repeats these two steps until the assignments stop changing.
ALGORITHM:
Step 1: Import the necessary libraries including pandas, sklearn for KMeans clustering,
and matplotlib for visualization.
Step 2: Read the dataset from a file using pd.read_csv(file_path) into a pandas
DataFrame df.
Step 3: Plot the original data points with 'AGE' on the x-axis and 'BP' on the y-axis using
plt.scatter. Each point is represented as a blue circle.
Step 4: Initialize a KMeans object kmeans with the desired number of clusters
(n_clusters=2) and a random state (random_state=42).
4.1: Fit the KMeans model to the 'AGE' and 'BP' columns of the DataFrame using
kmeans.fit_predict(df[['AGE', 'BP']])
4.2: Assign the cluster labels to a new column in the DataFrame df['Cluster'].
Step 5: Plot the clustered data points with 'AGE' on the x-axis and 'BP' on the y-axis
using plt.scatter. Each point is colored according to its assigned cluster, with the colormap
set to 'viridis' and adjust the size of the points, transparency, and edge color for better
visualization.
Step 6: Plot the centroids of the clusters as red 'X' markers using plt.scatter.
Step 7: Add a title to the plot using plt.title and label the x-axis and y-axis using
plt.xlabel and plt.ylabel.
Step 8: Add a legend to the plot using plt.legend.
Step 9: Display the plot using plt.show().
PROGRAM:
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
# Load data from csv file
df = pd.read_csv("C:/Users/Desktop/AzureDiabDataset.csv")
# Visualize the original data
plt.scatter(df['AGE'], df['BP'], s=50,color='blue', label='Original Data')
#Apply k-means clustering
kmeans = KMeans(n_clusters=2, random_state=42)
df['Cluster'] = kmeans.fit_predict(df[['AGE', 'BP']])
#Visualize the clustered data
plt.scatter(df['AGE'], df['BP'], c=df['Cluster'], cmap='viridis', s=50, alpha=0.8,
edgecolors='w', label='Clustered Data')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], c='red',
marker='X', s=200, label='Centroids')
#Add labels and legends
plt.title('K-Means Clustering')
plt.xlabel('AGE')
plt.ylabel('BP')
plt.legend()
#show the plot
plt.show()
Sample Dataset
AzureDiabDataset.csv
OUTPUT:
RESULT:
Thus the python program to implement clustering algorithms was executed and the
output was verified successfully.
EX.NO:10 EM FOR BAYESIAN NETWORKS
AIM:
To write a python program to implement EM for Bayesian Networks.
EXPECTATION MAXIMIZATION
The Expectation-Maximization (EM) algorithm is an iterative method used to find
maximum likelihood estimates of parameters in statistical models, particularly when the
model involves latent variables or incomplete data.
Two-Step Process: The algorithm consists of the Expectation (E) step, which
calculates the expected value of the log-likelihood, and the Maximization (M) step,
which updates the parameters to maximize this expected log-likelihood.
Application: Commonly used in applications such as Gaussian Mixture Models
(GMMs), Hidden Markov Models (HMMs), and image segmentation to handle
incomplete or missing data.
Convergence: The algorithm is guaranteed to converge to a local maximum of the
likelihood function, though it may not always find the global maximum.
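For a Gaussian Mixture Model with K components (the model fitted in the program below), the two steps take the following standard form. This summary is added for reference and is not part of the original manual:
E-step (responsibility of component k for point x_i):
gamma_ik = pi_k * N(x_i | mu_k, Sigma_k) / sum_j( pi_j * N(x_i | mu_j, Sigma_j) )
M-step (parameter updates using the responsibilities):
N_k = sum_i(gamma_ik),  pi_k = N_k / N,  mu_k = (1/N_k) * sum_i(gamma_ik * x_i),
Sigma_k = (1/N_k) * sum_i( gamma_ik * (x_i - mu_k)(x_i - mu_k)^T )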
ALGORITHM:
Step 1: Import load_iris from scikit-learn datasets and GaussianMixture from sklearn.mixture.
Step 2: Load the Iris dataset using the load_iris() function from scikit-learn. Assign the
features to variable X.
Step 3: Create a GaussianMixture object gmm with n_components=3 (one component per Iris species).
Step 4: Fit the GMM to the data using the Expectation-Maximization (EM) algorithm by
calling the fit() method on the GMM object gmm and passing the dataset X.
Step 5: Predict the cluster assignments for each data point using the predict() method on
the fitted GMM object gmm. This assigns each data point to one of the three clusters.
Step 6: Print the cluster means and covariances using the means_ and covariances_
attributes of the fitted GMM object gmm.
PROGRAM:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

iris = load_iris()
X = iris.data
gmm = GaussianMixture(n_components=3)
gmm.fit(X)
labels = gmm.predict(X)
print("Cluster Means:")
print(gmm.means_)
print("\nCluster Covariances:")
print(gmm.covariances_)
OUTPUT:
Cluster Means:
Cluster Covariances:
[0.097232 0.140817 0.011464 0.009112 ]
RESULT:
Thus the python program to implement EM for Bayesian Networks was executed
and the output was verified successfully.
EX.NO:11 SIMPLE NN MODEL
AIM:
To write a python program to build a simple NN model.
NEURAL NETWORK
A neural network consists of layers of interconnected neurons. Each neuron computes a
weighted sum of its inputs and applies an activation function; stacking an input layer, a
hidden layer, and an output layer gives a simple feed-forward network that can be trained
for classification.
ALGORITHM:
Step 1: Import numpy for numerical operations, load_iris to load the Iris dataset, and
various functions from scikit-learn for data preprocessing and model evaluation. Import
Sequential and Dense from TensorFlow.keras for building the neural network model.
Step 2: Load the Iris dataset using the load_iris() function.
Step 3: Separate the features (X) and target labels (y).
Step 4: One-hot encode the target labels using OneHotEncoder from scikit-learn.
Step 5: Split the data into training and testing sets using train_test_split from scikit-learn.
Step 6: Create a Sequential model.
Step 7: Add a Dense layer with 10 units and ReLU activation function as the first hidden
layer. Specify input shape as the number of features.
Step 8: Add a Dense output layer with 3 units (one for each class) and softmax activation
function.
Step 9: Compile the model with the adam optimizer, categorical cross-entropy loss, and
accuracy as the metric.
Step 10: Train the model using fit() on the training data.
Step 11: Specify the number of epochs (50) and batch size (5).
Step 12: Evaluate the trained model on the testing set to get the loss and accuracy.
Step 13: Print the test loss and test accuracy.
Step 14: The program doesn't explicitly show the epoch loop, but it's implied that during
training, it's running for 50 epochs, printing the progress of each epoch.
PROGRAM:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

iris = load_iris()
X, y = iris.data, iris.target
enc = OneHotEncoder()
y = enc.fit_transform(y.reshape(-1, 1)).toarray()
# Split ratio and random_state are assumptions; they are not shown in the original listing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = Sequential([
    Dense(10, activation='relu', input_shape=(X.shape[1],)),
    Dense(3, activation='softmax')
])
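The original listing stops at the model definition. A continuation consistent with the algorithm steps above (epochs, batch size, and evaluation) might look like this; it is a reconstruction, not the original code:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=5)      # Steps 10-11
loss, accuracy = model.evaluate(X_test, y_test)           # Step 12
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")    # Step 13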
OUTPUT:
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Test Loss: 0.40232041478157043, Test Accuracy: 0.800000011920929
RESULT:
Thus the python program to build a simple NN model was written, executed and
output was verified successfully.
EX.NO:12 DEEP LEARNING NEURAL NETWORK MODEL
AIM:
To write a python program to build a deep learning neural network model.
ALGORITHM:
Step 1:Import numpy for numerical operations, load_iris to load the Iris dataset, and
various functions from scikit-learn for data preprocessing and model evaluation. Import
Sequential and Dense from TensorFlow.keras for building the neural network model
Step 2:Load the Iris dataset using load_iris() function.
Step 3:Separate the features (X) and target labels (y).
Step 4:One-hot encode the target labels using OneHotEncoder from scikit-learn.
Step 5:Split the data into training and testing sets using train_test_split from scikit-learn.
Step 6: Standardize the features using StandardScaler, fitting it on the training set and
applying the same transformation to the testing set.
Step 7: Create a Sequential model and add a Dense hidden layer with 16 units and ReLU activation.
Step 8: Add a Dense output layer with 3 units (one for each class) and softmax activation
function.
Step 9: Compile the model with the adam optimizer, sparse categorical cross-entropy loss
(so the integer class labels are used directly), and accuracy as the metric.
Step 10: Train the model using fit() on the scaled training data.
Step 11: Specify the number of epochs (50) and batch size (5).
Step 12: Evaluate the trained model on the testing set to get the loss and accuracy.
Step 13: Print the test loss and test accuracy.
Step 14: The program doesn't explicitly show the epoch loop, but it's implied that during
training, it's running for 50 epochs, printing the progress of each epoch.
PROGRAM:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

iris = load_iris()
X = iris.data
y = iris.target
# Split the data into training and testing sets
# (split ratio and random_state are assumptions; they are not shown in the original listing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
model = Sequential([
    Dense(16, activation='relu', input_shape=(X.shape[1],)),
    Dense(3, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
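As with the previous exercise, the training and evaluation calls are not shown in the original listing. A continuation consistent with the steps above and with the epoch log in the output might be:
model.fit(X_train_scaled, y_train, epochs=50, batch_size=5)   # Steps 10-11
loss, accuracy = model.evaluate(X_test_scaled, y_test)        # Step 12
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")        # Step 13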
Output:
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
RESULT:
Thus the python program to build a deep learning neural network model was executed
and the output was verified successfully.