Final AI & ML Lab Manual

The document outlines various algorithms and models in computer science, including Breadth First Search, Depth First Search, A* Algorithm, Memory Bounded A* Algorithm, and Naive Bayes Classifier. Each section provides an aim, a brief explanation of the algorithm or model, the algorithm steps, and a Python program implementation. The document concludes with results verifying the execution of these programs.
TABLE OF CONTENTS

S.NO    DATE    TITLE    PAGE NO.    MARKS    SIGNATURE

1a. Breadth First Search

1b. Depth First Search

2a. A* Algorithm

2b. Memory Bounded A* Algorithm

3. Naive Bayes Classifier

4. Bayesian Network

5. Regression Model

6. Decision Tree and Random Forest

7. Support Vector Machine

8. Ensemble Model

9. K-Means Clustering Algorithm

10. EM for Bayesian Networks

11. Simple NN Model

12. Deep Learning Neural Network Model


EX.NO:1.a BREADTH FIRST SEARCH
AIM:
To write a python program to implement breadth first search.
BFS
Breadth First Search (BFS) is a graph traversal algorithm that explores nodes level by
level, visiting all nodes at the same depth before moving on to the nodes at the next depth
level. BFS uses a queue data structure to keep track of nodes to be explored.

ALGORITHM:
Step 1: Initialize an empty list called ‘visited’ to keep track of the nodes visited during the
traversal.
Step 2: Initialize an empty queue called ‘queue’ to keep track of the nodes to be traversed
in the future.
Step 3: Add the starting node to the ‘visited’ list and the ‘queue’.
Step 4: While the ‘queue’ is not empty, do the following:
a. Dequeue the first node from the ‘queue’ and store it in a variable called
‘current’.
b. Print ‘current’.
c. For each of the neighbours of ‘current’ that have not been visited yet, mark the
neighbour as visited and add it to the ‘queue’.
Step 5: When all the nodes reachable from the starting node have been visited, the
algorithm terminates.

PROGRAM:
import networkx as nx
import matplotlib.pyplot as plt

# Define the graph
graph = {
    '5': ['3', '7'],
    '3': ['2', '4'],
    '7': ['8'],
    '2': [],
    '4': ['8'],
    '8': []
}

# Create an undirected graph
G = nx.Graph()

# Add nodes and edges from the graph dictionary
for node, neighbors in graph.items():
    for neighbor in neighbors:
        G.add_edge(node, neighbor)

pos = nx.spring_layout(G)  # Positions for all nodes
nx.draw(G, pos, with_labels=True, node_size=700, node_color="skyblue", font_size=12,
        font_weight="bold")
plt.title("Undirected Graph")
plt.show()

visited = []  # List for visited nodes.
queue = []    # Initialize a queue

def bfs(visited, graph, node):  # Function for BFS
    visited.append(node)
    queue.append(node)
    while queue:  # Creating loop to visit each node
        m = queue.pop(0)
        print(m, end=" ")
        for neighbor in graph[m]:
            if neighbor not in visited:
                visited.append(neighbor)
                queue.append(neighbor)

print("Following is the Breadth-First Search")
bfs(visited, graph, '5')  # Function calling
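Note: queue.pop(0) on a Python list is O(n) because the remaining items shift left on every dequeue. For larger graphs, collections.deque provides O(1) pops from the left. A minimal alternative sketch of the same traversal (self-contained, using the same adjacency list as above) is:

from collections import deque

graph = {'5': ['3', '7'], '3': ['2', '4'], '7': ['8'], '2': [], '4': ['8'], '8': []}  # same adjacency list as above

def bfs_deque(graph, start):
    visited = {start}            # a set gives O(1) membership tests
    queue = deque([start])
    while queue:
        m = queue.popleft()      # O(1) dequeue from the front
        print(m, end=" ")
        for neighbor in graph[m]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)

bfs_deque(graph, '5')  # visits the nodes in the same order: 5 3 7 2 4 8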

Input Graph

OUTPUT:
Following is the Breadth-First Search
5 3 7 2 4 8
RESULT:
Thus the Python program to implement breadth first search was executed and the
output was verified.

EX.NO:1.b DEPTH FIRST SEARCH
AIM:
To write a python program to implement depth first search.
DFS
Depth First Traversal (DFS) explores nodes deeply before moving on to the next
branch. It starts at a designated node ("root" node) and explores as far as possible along
each branch before backtracking. DFS typically uses a stack data structure to keep track
of nodes to be explored. This allows it to backtrack efficiently.

ALGORITHM:
Step 1: Initialize an empty set called ‘visited’ to keep track of the nodes visited
during the Traversal.
Step 2: Define a DFS function that takes the current node, the graph, and
the ‘visited’ set as input.
Step 3: If the current node is not in the ‘visited’ set, do the following:
a. Print the current node.
b. Add the current node to the ‘visited’ set.
c. For each of the neighbours of the current node, call the DFS
function recursively with the neighbour as the current node.
Step 4: When all the nodes reachable from the starting node have been visited, the
algorithm terminates.

PROGRAM:
import networkx as nx
import matplotlib.pyplot as plt

# Define the graph
graph = {
    '5': ['3', '7'],
    '3': ['2', '4'],
    '7': ['8'],
    '2': [],
    '4': ['8'],
    '8': []
}

# Create an undirected graph
G = nx.Graph()
items = graph.items()
print("Graph items are:", items)

# Add nodes and edges from the graph
for node, neighbors in items:
    for neighbor in neighbors:
        G.add_edge(node, neighbor)  # Built-in function to add an edge (and its nodes)

pos = nx.spring_layout(G)  # Positions for all nodes
nx.draw(G, pos, with_labels=True, node_size=1000, node_color="skyblue")
plt.title("Undirected Graph")
plt.show()

visited = set()  # Set to keep track of visited nodes of the graph.

def dfs(visited, graph, node):  # Function for DFS
    if node not in visited:
        print(node, end=" ")
        visited.add(node)
        for neighbor in graph[node]:
            dfs(visited, graph, neighbor)

print("Following is the Depth-First Search for the above graph:")
dfs(visited, graph, '5')

Input Graph

OUTPUT
Following is the Depth-First Search for the above graph:
5 3 2 4 8 7

RESULT:
Thus the python program to implement depth first search was executed and the output was
verified.
EX.NO:2.a A* ALGORITHM
AIM:
To write a python program to implement A* algorithm
A* ALGORITHM
A* Algorithm is used to find the shortest path between two nodes in a graph, given the
estimated cost of getting from the current node to the destination node. The main
advantage of the algorithm is its ability to provide an optimal path by exploring the graph
in a more informed way compared to traditional search algorithms such as Dijkstra's
algorithm.
ALGORITHM:
Step 1: Initialize the distances dictionary with float('inf') for all vertices in the
graph except for the start vertex which is set to 0.
Step 2: Initialize the parent dictionary with None for all vertices in the graph.
Step 3: Initialize an empty set for visited vertices.
Step 4: Initialize a priority queue (pq) with a tuple containing the sum of the heuristic
value and the distance from start to the current vertex, the distance from start to the
current vertex, and the current vertex.
Step 5:While pq is not empty, do the following:
a. Dequeue the vertex with the smallest f-distance (sum of the heuristic value and
the distance from start to the current vertex).
b. If the current vertex is the destination vertex, return distances and parent.
c. If the current vertex has not been visited, add it to the visited set.
d. For each neighbor of the current vertex, do the following:
i. Calculate the distance from start to the neighbor (g) as the sum of the
distance from start to the current vertex and the edge weight between the
current vertex and the neighbor.
ii. Calculate the f-distance (f = g + h) for the neighbor.
iii. If the f-distance for the neighbor is less than its current distance
in the distances dictionary, update the distances dictionary with the new
distance and the parent dictionary with the current vertex as the parent of the
neighbor.
iv. Enqueue the neighbor with its f-distance, distance from start to neighbor,
and the neighbor itself into the priority queue.
Step 6. Return distances and parent.

PROGRAM:
import heapq

def a_star(graph, start, dest, heuristic):
    distances = {vertex: float('inf') for vertex in graph}
    distances[start] = 0
    parent = {vertex: None for vertex in graph}
    visited = set()
    pq = [(0 + heuristic[start], 0, start)]  # (f, g, vertex); estimated total distance f = g + h
    while pq:
        curr_f, curr_dist, curr_vert = heapq.heappop(pq)
        if curr_vert == dest:
            break
        if curr_vert not in visited:
            visited.add(curr_vert)
            for nbor, weight in graph[curr_vert].items():
                distance = curr_dist + weight            # distance from start (g)
                f_distance = distance + heuristic[nbor]  # f = g + h
                if f_distance < distances[nbor]:
                    distances[nbor] = f_distance
                    parent[nbor] = curr_vert
                    heapq.heappush(pq, (f_distance, distance, nbor))  # O(log E) push
    return distances, parent

def generate_path_from_parents(parent, start, dest):
    path = []
    curr = dest
    while curr:
        path.append(curr)
        curr = parent[curr]
    return '->'.join(path[::-1])

graph = {
    'S': {'A': 2, 'B': 1},
    'A': {'B': 3, 'C': 1},
    'B': {'D': 4},
    'C': {'D': 2},
    'D': {}
}
heuristic = {
    'S': 5,
    'A': 4,
    'B': 3,
    'C': 2,
    'D': 0
}
start = 'S'
dest = 'D'
distances, parent = a_star(graph, start, dest, heuristic)
print('Optimal path:', generate_path_from_parents(parent, start, dest))
Input:
graph = {
'S': {'A': 2, 'B': 1},
'A': {'B': 3, 'C': 1},
'B': {'D': 4},
'C': {'D': 2},
'D': {}
}
heuristic = {
'S': 5,
'A': 4,
'B': 3,
'C': 2,
'D': 0
}
OUTPUT:
Optimal path: S->B->D

RESULT:
Thus the python program to implement A* algorithm was executed and the output was
verified.

EX.NO:2.b MEMORY BOUNDED A* ALGORITHM
AIM:
To write a python program to implement memory bounded A* algorithm.
MEMORY BOUNDED A* ALGORITHM
A memory-bound algorithm is an algorithm that is limited by the amount of available
memory rather than by processing power or other resources. These algorithms are
designed to operate within a fixed amount of memory, making them suitable for systems
with memory constraints, such as embedded systems or devices with limited memory
capacity.
ALGORITHM:
Step 1:Initialize the distances dictionary with float('inf') for all vertices in
the graph except for the start vertex which is set to 0.
Step 2: Initialize the parent dictionary with None for all vertices in the graph.
Step 3: Initialize an empty set for visited vertices.
Step 4: Initialize a priority queue (pq) with a tuple containing the sum of the heuristic
value and the distance from start to the current vertex, the distance from start to the
current vertex, and the current vertex.
Step 5:While pq is not empty, do the following:
a. Dequeue the vertex with the smallest f-distance (sum of the heuristic value and
the distance from start to the current vertex).
b. If the current vertex is the destination vertex, return distances and parent.
c. If the current vertex has not been visited, add it to the visited set.
d. For each neighbor of the current vertex, do the following:
i. Calculate the distance from start to the neighbor (g) as the sum of the
distance from start to the current vertex and the edge weight between the
current vertex and the neighbor.
ii. Calculate the f-distance (f = g + h) for the neighbor.

iii. If the f-distance for the neighbor is less than its current distance in the
distances dictionary, update the distances dictionary with the new distance
and the parent dictionary with the current vertex as the parent of the
neighbor.
iv. Enqueue the neighbor with its f-distance, distance from start to neighbor,
and the neighbor itself into the priority queue.
Step 6. Return distances and parent.

PROGRAM:
import heapq

def ma_star(graph, start, dest, heuristic, memory_limit):
    distances = {vertex: float('inf') for vertex in graph}
    distances[start] = 0
    parent = {vertex: None for vertex in graph}
    visited = set()
    pq = [(0 + heuristic[start], 0, start)]  # (f, g, vertex)
    num_nodes = 0
    while pq:
        curr_f, curr_dist, curr_vert = heapq.heappop(pq)
        num_nodes -= 1
        if curr_vert == dest:
            break
        if curr_vert not in visited:
            visited.add(curr_vert)
            for nbor, weight in graph[curr_vert].items():
                distance = curr_dist + weight
                f_distance = distance + heuristic[nbor]
                if f_distance < distances[nbor]:
                    distances[nbor] = f_distance
                    parent[nbor] = curr_vert
                    if num_nodes < memory_limit:
                        heapq.heappush(pq, (f_distance, distance, nbor))
                        num_nodes += 1
                    elif f_distance < max(pq)[0]:
                        # Memory limit reached: drop the worst (largest-f) entry before pushing
                        pq.remove(max(pq))
                        heapq.heapify(pq)  # restore the heap invariant after removal
                        heapq.heappush(pq, (f_distance, distance, nbor))
    return distances, parent

def generate_path_from_parents(parent, start, dest):
    path = []
    curr = dest
    while curr:
        path.append(curr)
        curr = parent[curr]
    return '->'.join(path[::-1])

graph = {
    'S': {'A': 2, 'B': 1},
    'A': {'B': 3, 'C': 1},
    'B': {'D': 4},
    'C': {'D': 2},
    'D': {}
}
heuristic = {
    'S': 5,
    'A': 4,
    'B': 3,
    'C': 2,
    'D': 0
}
start = 'S'
dest = 'D'
memory_limit = 2
distances, parent = ma_star(graph, start, dest, heuristic, memory_limit)
print('Optimal path:', generate_path_from_parents(parent, start, dest))

Input:
graph = {
'S': {'A': 2, 'B': 1},
'A': {'B': 3, 'C': 1},
'B': {'D': 4},
'C': {'D': 2},
'D': {}
}
heuristic = {
'S': 5,
'A': 4,
'B': 3,
'C': 2,

'D': 0
}
memory_limit = 2
Output:
Optimal path: S->B->D

RESULT:
Thus the python program to implement memory bounded A* algorithm was executed
successfully and the output was verified.
EX.NO: 3 NAIVE BAYES CLASSIFIER
AIM:
To write a python program to implement naive bayes model.

NAIVE BAYES
Naive Bayes classifiers are a collection of classification algorithms based on Bayes’
Theorem. This model predicts the probability that an instance belongs to a class given a
set of feature values. It assumes that every pair of features being classified is independent
of each other. It is widely used in text classification. It is a probabilistic classifier.

Bayes’ Theorem
Bayes’ Theorem finds the probability of an event occurring given the probability
of another event that has already occurred. Bayes’ theorem is stated mathematically
as the following equation:

P(A|B) = P(B|A) · P(A) / P(B)

where A and B are events and P(B) ≠ 0

Basically, we are trying to find the probability of event A, given that event B is true.
Event B is also termed the evidence.
P(A) is the prior probability of A, i.e. the probability of the event before the evidence is
seen. The evidence is an attribute value of an unknown instance (here, it is event B).
P(B) is the marginal probability: the probability of the evidence.
P(A|B) is the posterior probability of A, i.e. the probability of the event after the evidence
is seen.
P(B|A) is the likelihood, i.e. the probability of the evidence given that the hypothesis
is true.
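A small worked example of the formula above (toy numbers chosen purely for illustration, not taken from the dataset below): suppose 30% of messages are spam, the word "offer" appears in 60% of spam messages and in 5% of non-spam messages. The posterior probability that a message containing "offer" is spam then follows directly from Bayes' theorem, as in this minimal sketch:

# Hypothetical numbers for illustrating Bayes' theorem
p_spam = 0.30               # P(A): prior probability of spam
p_word_given_spam = 0.60    # P(B|A): likelihood of the word "offer" in spam
p_word_given_ham = 0.05     # likelihood of the word in non-spam messages

# P(B): marginal probability of the evidence (law of total probability)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# P(A|B): posterior probability of spam given the word
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.837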

Advantages:
● Simple and easy to implement.
● Efficient for large datasets and high-dimensional feature spaces.
● Works well with categorical and numerical data.

ALGORITHM:

Step 1: The code imports necessary libraries, including pandas for data manipulation,
CountVectorizer for converting text data into numerical vectors, and MultinomialNB for
implementing the Multinomial Naive Bayes classifier.
Step 2: The code reads a CSV file containing text data and their corresponding labels
(spam or not spam) into a pandas DataFrame.
Step 3: Rows with missing values (NaN) in the 'text' and 'spam' columns are dropped to
ensure data quality.
Step 4: The 'text' column is extracted as training sentences, and the 'spam' column is
extracted as labels.
Step 5: The training sentences are vectorized using CountVectorizer, which converts text
data into numerical feature vectors. Each sentence is represented as a vector of word
frequencies or presence indicators.
Step 6: A Multinomial Naive Bayes classifier is trained using the vectorized training data.
The MNB classifier is a probabilistic classifier commonly used for text classification
tasks. It models the conditional probability of each word given the class (spam or not
spam) and uses Bayes' theorem to make predictions.
Step 7: The user is prompted to input a sentence. The input sentence is vectorized using
the same CountVectorizer used for training. The trained MNB classifier then predicts the
label (spam or not spam) for the input sentence based on its probability distribution over
the classes.
Step 8: The predicted label (spam or not spam) for the input sentence is printed to the
console.
PROGRAM:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
# Read the CSV file
data = pd.read_csv("note.csv")  # Update "note.csv" with the actual file path
# Drop rows with NaN values
data.dropna(subset=['text', 'spam'], inplace=True)
# Extract the "spam" column as labels and the "text" column as training sentences
train_sentences = data['text'].tolist()
train_labels = data['spam'].tolist()
# Vectorize the training data
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_sentences)
# Train the classifier
nb_classifier = MultinomialNB()
nb_classifier.fit(X_train, train_labels)
# Get user input
print("Enter a Sentence to check whether the text is spam or not")
user_input = input("Enter a sentence: ")
# Vectorize the user input
X_user = vectorizer.transform([user_input])
# Predict label
prediction = nb_classifier.predict(X_user)[0]
# Print prediction
spam_or_not = "spam" if prediction == 1 else "not spam"
print(f"Predicted label for '{user_input}': {spam_or_not}")

Sample Dataset
note.csv

Text spam
Subject: re : parking pass for van ngo done . shirley crenshaw @ ect 01 / 19 / 2000 07 : 33 am to : louis
allen @ enron cc : vince j kaminski / hou / ect @ ect , kevin g moore / hou / ect @ ect , william smith /
corp / enron @ enron subject : parking pass for van ngo good morning louis : please cancel the " secom "
parking badge that was issued to van ngo for parking in the 777 clay garage while she was working part
time with the research group during the holidays . the number on the card is 4280 . i will return the badge
to you this morning . the co . # is 0011 and the rc # is 100038 . thanks louis and have a great day ! shirley
3 - 5290 0

Subject: 4 color printing special request additional information now ! click here click here for a printable
version of our order form ( pdf format ) phone : ( 626 ) 338 - 8090 fax : ( 626 ) 338 - 8102 e - mail :
ramsey @ goldengraphix . com request additional information now ! click here click here for a printable
version of our order form ( pdf format ) golden graphix & printing 5110 azusa canyon rd . irwindale , ca
91706 this e - mail message is an advertisement and / or solicitation . 1
Subject: do not have money , get software cds from here ! software compatibility . . . . ain ' t it great ?
grow old along with me the best is yet to be . all tradgedies are finish ' d by death . all comedies are ended
by marriage . 1

OUTPUT:
Enter a sentence to check whether the text is spam or not:
Enter a sentence: hi
Predicted label for 'hi': not spam
Enter a sentence to check whether the text is spam or not:
Enter a sentence: undeliverable
Predicted label for 'undeliverable ': spam

RESULT:
Thus the python program to implement Naive Bayes Model was executed and the
output was verified.
EX.NO:4 BAYESIAN NETWORK

AIM:
To write a python program to implement Bayesian Network

BAYESIAN NETWORK
Bayesian networks are probabilistic, because these networks are built from a
probability distribution, and also use probability theory for prediction and anomaly
detection. A Bayesian network is a probabilistic graphical model which represents a set of
variables and their conditional dependencies using a directed acyclic graph.
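As an illustration of how a Bayesian network encodes conditional dependencies, the sketch below defines a toy two-node network (Rain → WetGrass, independent of the dataset used in the program) with its conditional probability tables using pgmpy. It assumes the same pgmpy library imported in the program; newer pgmpy releases name the model class BayesianNetwork instead of BayesianModel.

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Toy DAG with a single edge: Rain -> WetGrass
toy = BayesianModel([('Rain', 'WetGrass')])
cpd_rain = TabularCPD('Rain', 2, [[0.8], [0.2]])              # P(Rain=0)=0.8, P(Rain=1)=0.2
cpd_wet = TabularCPD('WetGrass', 2,
                     [[0.9, 0.1],    # P(WetGrass=0 | Rain=0), P(WetGrass=0 | Rain=1)
                      [0.1, 0.9]],   # P(WetGrass=1 | Rain=0), P(WetGrass=1 | Rain=1)
                     evidence=['Rain'], evidence_card=[2])
toy.add_cpds(cpd_rain, cpd_wet)
print(VariableElimination(toy).query(variables=['Rain'], evidence={'WetGrass': 1}))

The program below builds the same kind of model, but learns the conditional probability tables from the merged dataset with Maximum Likelihood Estimation instead of specifying them by hand.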

ALGORITHM:

Step 1: Read two Excel files into pandas DataFrames (drug.xlsx and age.xlsx).
Step 2: Create a new column Age_binned in both datasets by categorizing Age into bins:
'Young', 'Middle-aged', and 'Old'.
Step 3: Convert categorical variables to numerical codes:

○ Map 'Sex' column in data1 from {'F': 0, 'M': 1}.


○ Map 'gender' column in data2 from {'F': 0, 'M': 1}.

Step 4: Rename the gender column in data2 to Sex to ensure consistent column names for
merging.
Step 5: Merge data1 and data2 on the common columns Age_binned and Sex to create
combined_data. Verify and print the column names of combined_data to confirm the merge
was successful.
Step 6: Select columns needed for analysis: Age_binned, Sex, Drug, and Cholesterol_x
(from data1).
Step 7: Define the Bayesian Network structure: specify the structure of the Bayesian
Network (model) using BayesianModel from pgmpy.models:
○ Nodes: Age_binned, Sex, Drug, and Cholesterol.
○ Edges:
■ Age_binned → Drug
■ Sex → Drug
■ Age_binned → Cholesterol
■ Sex → Cholesterol
■ Drug → Cholesterol

Step 8: Use Maximum Likelihood Estimation (MaximumLikelihoodEstimator) to estimate the
parameters of the Bayesian Network model (model) using the fit() method on combined_data.
Step 9: Instantiate VariableElimination from pgmpy.inference to perform probabilistic
queries (infer.query()):
○ Example query: calculate the joint probability distribution of Drug and
Cholesterol given Age_binned='Young' and Sex=0 (female).
Step 10: Print the results of the inference query to show the conditional probabilities of
Drug and Cholesterol based on the given evidence.

PROGRAM:

import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination
# Load the datasets
data1 = pd.read_excel('drug.xlsx')
data2 = pd.read_excel('age.xlsx')
# Binning the Age
data1['Age_binned'] = pd.cut(data1['Age'], bins=[0, 30, 60, 100], labels=['Young',
'Middle-aged', 'Old'])
data2['Age_binned'] = pd.cut(data2['Age'], bins=[0, 30, 60, 100], labels=['Young',
'Middle-aged', 'Old'])
# Converting categorical variables to numerical codes
data1['Sex'] = data1['Sex'].map({'F': 0, 'M': 1})
data2['gender'] = data2['gender'].map({'F': 0, 'M': 1})
# Renaming columns to have consistent naming
data2.rename(columns={'gender': 'Sex'}, inplace=True)
# Merging datasets on common columns
combined_data = pd.merge(data1, data2, on=['Age_binned', 'Sex'])
# Inspecting the merged data to ensure it contains the expected columns
print("Combined data columns:", combined_data.columns)
# Use one of the Cholesterol columns and rename it for consistency
combined_data['Cholesterol'] = combined_data['Cholesterol_x']
# Simplifying data
combined_data = combined_data[['Age_binned', 'Sex', 'Drug', 'Cholesterol']]
# Define the Bayesian Network structure
model = BayesianModel([('Age_binned', 'Drug'), ('Sex', 'Drug'), ('Age_binned',
'Cholesterol'), ('Sex', 'Cholesterol'), ('Drug', 'Cholesterol')])
# Fit the model with Maximum Likelihood Estimation
model.fit(combined_data, estimator=MaximumLikelihoodEstimator)

# Perform inference
infer = VariableElimination(model)
# Example inference
query = infer.query(variables=['Drug', 'Cholesterol'], evidence={'Age_binned': 'Young',
'Sex': 0})
print("P(Drug, Cholesterol | Age=Young, Sex=0):")
print(query)

Sample Dataset

drug.xlsx:

Age Sex BP Cholesterol Na_to_K Drug

23 F HIGH HIGH 25.355 DrugY

47 M LOW HIGH 13.093 drugC

47 M LOW HIGH 10.114 drugC

28 F NORMAL HIGH 7.798 drugX

61 F LOW HIGH 18.043 DrugY

22 F NORMAL HIGH 8.607 drugX

49 F NORMAL HIGH 16.275 DrugY

age.xlsx:

ID Gender Age height(cm) weight(kg) waist(cm) Cholesterol

0 F 40 155 60 81.3 215

1 F 40 160 60 81 192

2 M 55 170 60 80 242

3 M 40 165 70 88 322

4 F 40 155 60 86 184

OUTPUT :
P(Drug, Cholesterol | Age=Young, Sex=0):
+-------------+---------------------+-------------------------+
| Drug | Cholesterol | phi(Drug,Cholesterol) |
+=============+=====================+=========================+
| Drug(DrugY) | Cholesterol(HIGH) | 0.1852 |
+-------------+---------------------+-------------------------+
| Drug(DrugY) | Cholesterol(NORMAL) | 0.3333 |
+-------------+---------------------+-------------------------+
| Drug(drugA) | Cholesterol(HIGH) | 0.0741 |
+-------------+---------------------+-------------------------+
| Drug(drugA) | Cholesterol(NORMAL) | 0.0370 |
+-------------+---------------------+-------------------------+
| Drug(drugB) | Cholesterol(HIGH) | 0.0000 |
+-------------+---------------------+-------------------------+
| Drug(drugB) | Cholesterol(NORMAL) | 0.0000 |
+-------------+---------------------+-------------------------+
| Drug(drugC) | Cholesterol(HIGH) | 0.0741 |
+-------------+---------------------+-------------------------+
| Drug(drugC) | Cholesterol(NORMAL) | 0.0000 |
+-------------+---------------------+-------------------------+
| Drug(drugX) | Cholesterol(HIGH) | 0.1852 |
+-------------+---------------------+-------------------------+
| Drug(drugX) | Cholesterol(NORMAL) | 0.1111 |
+-------------+---------------------+-------------------------+

RESULT:
Thus the python program to implement Bayesian Network was written, executed and
output was verified successfully.

EX NO :5 REGRESSION MODEL

AIM:
To write a python program to implement linear regression.

LINEAR REGRESSION
Linear regression is a supervised machine-learning algorithm that learns from labeled
datasets and maps the data points to an optimized linear function, which can then be used
for prediction on new data. The goal of the algorithm is to find the best-fit line equation
that predicts the dependent variable from the independent variables.
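For a single feature, the best-fit line is ŷ = b0 + b1·x, with the least-squares estimates b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. A minimal NumPy sketch of that computation (toy values chosen for illustration, independent of the diabetes dataset used in the program below, which fits the same line with scikit-learn):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # toy feature values
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])   # toy target values

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # slope
b0 = y.mean() - b1 * x.mean()                                               # intercept
print(f"y = {b0:.3f} + {b1:.3f} * x")      # fitted best-fit line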

ALGORITHM:

Step 1:Import necessary libraries:


a.pandas: for data manipulation and analysis.
b.matplotlib.pyplot: for creating visualizations.
c.LinearRegression from sklearn.linear_model: for implementing linear regression.
d.mean_squared_error and r2_score from sklearn.metrics: for evaluating the
performance of the regression model.
Step 2: Read the dataset
a. Use pandas to read the CSV file "diabetes.csv" into a DataFrame.
Step 3:Extract the features and target variable:

a.Select the BMI column as the feature (independent variable) and the Diabetes
column as the target variable (dependent variable).
b.Reshape both feature and target variables into a 2-dimensional array
with one column using the reshape(-1, 1) method.
Step 4:Visualize the data:
a.Create a scatter plot with BMI on the x-axis and Diabetes on the y-axis using
matplotlib.
b.Set labels for the x-axis and y-axis
Step 5: Create and fit the linear regression model:
a. Initialize a LinearRegression object.
b. Fit the model to the data using the fit() method, passing the feature (x) and
target variable (y) as arguments.

PROGRAM:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Read the dataset
d = pd.read_csv("diabetes.csv")
# Extracting BMI and Diabetes columns
x = d.iloc[:, 5].values.reshape(-1, 1)
y = d.iloc[:, 1].values.reshape(-1, 1)
# Scatter plot of BMI vs Diabetes
plt.scatter(x, y)
plt.xlabel("BMI")
plt.ylabel("Diabetes")
# Creating and fitting the linear regression model
lin = LinearRegression()
lin.fit(x, y)
# Predicting values
y_pred = lin.predict(x)
# Plotting the regression line
plt.plot(x, y_pred, color='red')
# Calculating mean squared error and R-squared

mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)
# Displaying mean squared error and R-squared
print("Mean Squared Error:", mse)
print("R-squared:", r2)
plt.show()

Sample Dataset
diabetes.csv

Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin  BMI  DiabetesPedigreeFunction  Age  Outcome

6 148 72 35 0 33.6 0.627 50 1

1 85 66 29 0 26.6 0.351 31 0

8 183 64 0 0 23.3 0.672 32 1

1 89 66 23 94 28.1 0.167 21 0

OUTPUT:
Mean Squared Error: 971.0225668527718
R-squared: 0.04887241775173856

RESULT:
Thus the python program to implement the linear regression was executed and the
output was verified successfully.

EX.NO:6 DECISION TREE AND RANDOM FOREST

AIM:
To write a python program to implement Decision Tree and Random forest.

DECISION TREE
A decision tree is a flowchart-like structure used to make decisions or predictions. It
consists of nodes representing decisions or tests on attributes, branches representing the
outcome of these decisions, and leaf nodes representing final outcomes or predictions.

Metrics for Splitting:

Gini Impurity: Measures the likelihood of an incorrect classification of a new instance if
it was randomly classified according to the distribution of classes in the dataset.

Gini = 1 − Σ pᵢ²

where pᵢ is the probability of an instance being classified into a particular class.

Entropy: Measures the amount of uncertainty or impurity in the dataset.

Entropy = − Σ pᵢ log₂ pᵢ

where pᵢ is the probability of an instance being classified into a particular class.

Information Gain: Measures the reduction in entropy or Gini impurity after a dataset is
split on an attribute:

Information Gain = Impurity(parent) − Σ (nᵢ / n) · Impurity(childᵢ)
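A small sketch of these metrics (toy class counts chosen for illustration): for a node containing 6 samples of one class and 4 of another, the class probabilities are 0.6 and 0.4, giving Gini = 1 − (0.36 + 0.16) = 0.48 and entropy ≈ 0.971 bits.

import numpy as np

def gini(probs):
    # Gini impurity: 1 - sum of squared class probabilities
    return 1.0 - np.sum(np.square(probs))

def entropy(probs):
    # Entropy in bits: -sum(p * log2(p)), ignoring zero-probability classes
    probs = np.array([p for p in probs if p > 0])
    return -np.sum(probs * np.log2(probs))

probs = np.array([0.6, 0.4])   # toy class distribution at a node
print(gini(probs))             # 0.48
print(entropy(probs))          # ~0.971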

RANDOM FOREST
Random Forest is a classifier that contains a number of decision trees on various
subsets of the given dataset and takes the average to improve the predictive accuracy of
that dataset. The greater number of trees in the forest leads to higher accuracy and
prevents the problem of overfitting.
It is based on the concept of ensemble learning, which is a process of combining
multiple classifiers to solve a complex problem and to improve the performance of the
model.

ALGORITHM:
Step 1:Read the dataset from the Excel file "nation.xlsx" using pandas
Step 2:Map categorical variables 'Nationality' and 'Go' to numerical values as specified.
Step 3:Define Features and Target Variable:
3.1:Define the features 'Age', 'Experience', 'Rank', and 'Nationality'.
3.2:Define the target variable 'Go'.
Step 4:Initialize a Decision Tree Classifier.
Step 5:Fit the Decision Tree Classifier using the features and target variable.
Step 6:Plot the decision tree using matplotlib.pyplot and sklearn.tree.plot_tree().Adjust
the plot size and display the tree with feature names and class names.
Step 7:Initialize a Random Forest Classifier with 100 decision trees.
Step 8:Fit the Random Forest Classifier using the features and target variable.
Step 9:Iterate through a specified number of additional trees from the Random Forest.Plot
each decision tree using matplotlib.pyplot and sklearn.tree.plot_tree().Adjust the plot size
and display each tree with feature names and class names.
Step 10:Calculate feature importances from the Random Forest Classifier.
10.1: Sort the feature importances in descending order.
10.2: Plot the feature importances using matplotlib.pyplot.
10.3: Display feature importance values with their corresponding feature
names.Adjust the plot size and labels for clarity.
PROGRAM:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
from sklearn import tree

# Load data from Excel file
df = pd.read_excel("nation.xlsx")  # Assuming your Excel file is named "nation.xlsx"

# Mapping categorical variables to numerical values
d = {'UK': 0, 'USA': 1, 'N': 2}
df['Nationality'] = df['Nationality'].map(d)
d = {'YES': 1, 'NO': 0}
df['Go'] = df['Go'].map(d)

# Features and target variable
features = ['Age', 'Experience', 'Rank', 'Nationality']
X = df[features]
y = df['Go']

# Decision tree classifier
dtree = DecisionTreeClassifier()  # Default criterion (Gini); use criterion='entropy' for information gain
dtree = dtree.fit(X, y)

# Plot decision tree
plt.figure(figsize=(12, 8))
tree.plot_tree(dtree, feature_names=features, class_names=['NO', 'YES'], filled=True)
plt.subplots_adjust(left=0.05, right=0.95, top=0.95, bottom=0.05)
plt.title("Decision Tree Classifier")
plt.show()

# Random Forest classifier
rf = RandomForestClassifier(n_estimators=100)  # 100 decision trees
rf.fit(X, y)

# Plotting five additional decision trees from the Random Forest
num_trees_to_plot = 5
for i in range(num_trees_to_plot):
    plt.figure(figsize=(12, 8))
    tree.plot_tree(rf.estimators_[i + 1], feature_names=features, class_names=['NO', 'YES'],
                   filled=True)
    plt.subplots_adjust(left=0.05, right=0.95, top=0.95, bottom=0.05)
    plt.title("Decision Tree {} in Random Forest".format(i + 2))  # Adjusted index
    plt.show()

# Feature importances from the Random Forest, sorted in descending order
importances = rf.feature_importances_
indices = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)

# Plot feature importance
plt.figure(figsize=(10, 6))
plt.bar([features[i] for i in indices], [importances[i] for i in indices], color='skyblue')
plt.title("Feature Importances from Random Forest")
plt.xlabel("Feature")
plt.ylabel("Importance")
plt.tight_layout()
plt.show()
OUTPUT:

RESULT :
Thus the python program to implement the Decision Tree and Random forest was
executed and output was verified successfully.
EX.NO:7 SUPPORT VECTOR MACHINE
AIM:
To write a python program to build SVM models.

SVM

Support Vector Machine (SVM) is a powerful machine learning algorithm used for
linear or nonlinear classification, regression, and even outlier detection tasks. SVMs can
be used for a variety of tasks, such as text classification, image classification, spam
detection, handwriting identification, gene expression analysis, face detection, and
anomaly detection.
The main objective of the SVM algorithm is to find the optimal hyperplane in an
N-dimensional space that can separate the data points in different classes in the feature
space.
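As a small illustration of the "optimal hyperplane" idea, the sketch below (toy 2-D points chosen for illustration, independent of the Iris data used in the program) fits a linear SVM and reads off the learned weight vector w and bias b; the decision boundary is w·x + b = 0 and the margin width is 2/||w||.

import numpy as np
from sklearn import svm

# Two linearly separable toy classes in 2-D
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = svm.SVC(kernel='linear')
clf.fit(X, y)

w = clf.coef_[0]        # weight vector of the separating hyperplane
b = clf.intercept_[0]   # bias term
print("Decision boundary: %.3f*x1 + %.3f*x2 + %.3f = 0" % (w[0], w[1], b))
print("Margin width:", 2 / np.linalg.norm(w))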

ALGORITHM:
Step 1: Load the Iris dataset using the `datasets.load_iris()` function from scikit-learn.
Step 2: Extract the features (petal length and sepal length) and target labels (species) from
the loaded dataset.
Step 3: Filter the dataset to include only samples corresponding to the Iris setosa and Iris
virginica species by selecting the rows where the target labels are 0 (Iris setosa) or 2 (Iris
virginica).
Step 4: Split the filtered dataset into training and testing sets using the `train_test_split`
function from scikit-learn, with a specified test size (e.g., 20%) and a random state for
reproducibility.
Step 5: Initialize an SVM classifier with a linear kernel using `svm.SVC(kernel='linear')`.
Step 6: Train the SVM classifier using the training data (features and labels)
using the `fit` method.
Step 7: Plot the training data points on a scatter plot where the x-axis represents petal
length and the y-axis represents sepal length. Different colors are used to distinguish
between different species.
Step 8: Retrieve the current axis limits of the scatter plot to define the range for the
meshgrid.
Step 9: Create a meshgrid covering the range of feature values (petal length and sepal
length) using `np.meshgrid`.
Step 10: Evaluate the decision function of the trained SVM classifier on the meshgrid
points to determine the decision boundary.
Step 11: Plot the decision boundary on the scatter plot using contours, where the decision
function equals -1, 0, and 1. Different linestyles are used to distinguish between the
margin and the decision boundary.

PROGRAM:

import matplotlib.pyplot as plt


import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split
# Load Iris dataset
iris = datasets.load_iris()
X = iris.data[:, [0, 2]]  # Selecting sepal length (column 0) and petal length (column 2) as features
y = iris.target  # Target variable
# Select Iris setosa and Iris virginica species from the dataset (target classes 0 and 2)
X = X[(y == 0) | (y == 2)]
y = y[(y == 0) | (y == 2)]
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the SVM classifier
clf = svm.SVC(kernel='linear')
clf.fit(X_train, y_train)
# Plot the data
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.coolwarm)
ax = plt.gca()
ax.set_xlabel('Sepal Length')
ax.set_ylabel('Petal Length')
# Get current axis limits
xlim = ax.get_xlim()
ylim = ax.get_ylim()
# Create a meshgrid to evaluate the decision function
xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 100), np.linspace(ylim[0], ylim[1],
100))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
# Plot the decision boundary
plt.contour(xx, yy, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])
plt.title('SVM Decision Boundary for Iris Setosa and Iris Virginica')
plt.show()

OUTPUT:

RESULT:
Thus the python program to build SVM models was executed and the output was
verified successfully.

EX.NO:8 ENSEMBLE MODEL

AIM:
To write python program to implement ensembling techniques

STUDY:
Ensemble learning is a machine learning technique that combines the predictions
from multiple individual models to obtain better predictive performance than any single
model. The basic idea of ensemble learning is to aggregate the predictions of multiple
models, each of which may have its own strengths and weaknesses. This can lead to
improved performance and generalization.

Bagging (Bootstrap Aggregating)


● Bagging is a technique where multiple subsets of the dataset are created through
bootstrapping (sampling with replacement).
● A base model (often a decision tree) is trained on each subset, and the final
prediction is the average (for regression) or majority vote (for classification) of
the individual predictions.
● Bagging helps reduce variance and overfitting, especially for unstable models.
Boosting
● Boosting is an ensemble technique where base models are trained sequentially,
with each subsequent model focusing on the mistakes of the previous ones.
● The final prediction is a weighted sum of the individual models' predictions,
with higher weights given to more accurate models.
● Boosting algorithms like AdaBoost, Gradient Boosting, and XGBoost are
popular because they improve model performance.
Stacking
● Stacking, or stacked generalization, combines multiple base models with a
meta-model to make predictions.
● Instead of using simple methods like averaging or voting, stacking trains a
meta-model to learn how to combine the base models' predictions best.
● The base models can be diverse to capture different aspects of the data, and the
meta-model learns to weight its predictions based on its performance.

ALGORITHM:

Step 1: Import the necessary libraries including scikit-learn modules for datasets,
ensemble techniques (Bagging, Boosting, Stacking), classifiers (Random Forest,
AdaBoost, Logistic Regression), and evaluation metrics.
Step 2: Load the Iris dataset using load_iris() function from scikit-learn datasets module.
Split the dataset into features (X) and target (y).
Step3: Split the dataset into training and testing sets using train_test_split() function from
scikit-learn.
Step 4:Bagging with Random Forest:
4.1:Initialize a BaggingClassifier with a base estimator as RandomForestClassifier. Set
the number of base estimators to 10 and the number of bags (ensemble members) to 5.
4.2:Fit the BaggingClassifier on the training data.
4.3:Predict using the trained BaggingClassifier on the entire dataset
4.4: Evaluate the accuracy of the Bagging Classifier's predictions using the
accuracy_score() function.
Step 5:Boosting with AdaBoost:
5.1:Initialize an AdaBoostClassifier with the number of estimators set to 50.
5.2:Fit the AdaBoostClassifier on the training data.
5.3:Predict using the trained AdaBoostClassifier on the entire dataset.
5.4:Evaluate the accuracy of the AdaBoostClassifier's predictions.
Step 6:Stacking with Random Forest, AdaBoost, and Logistic Regression:
6.1:Define base models as a list of tuples, where each tuple contains the name of the
base model and the base model itself.
6.2:Initialize the meta-learner as LogisticRegression.
6.3:Initialize the StackingClassifier with base models and the meta-learner.
6.4:Fit the StackingClassifier on the training data.
6.5:Predict using the trained StackingClassifier on the entire dataset.
6.6:Evaluate the accuracy of the StackingClassifier's predictions.
Step 7:Print the accuracy and predictions for each ensemble method.

PROGRAM:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              BaggingClassifier, StackingClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Bagging with Random Forest
bagging_clf = BaggingClassifier(RandomForestClassifier(n_estimators=10, random_state=42),
                                n_estimators=5, random_state=42)
bagging_clf.fit(X_train, y_train)
bagging_pred = bagging_clf.predict(X)
bagging_accuracy = accuracy_score(y, bagging_pred)
print("Bagging Accuracy:", bagging_accuracy)
print("Bagging Predictions:", bagging_pred)

# Boosting with AdaBoost
boosting_clf = AdaBoostClassifier(n_estimators=50, random_state=42)
boosting_clf.fit(X_train, y_train)
boosting_pred = boosting_clf.predict(X)
boosting_accuracy = accuracy_score(y, boosting_pred)
print("Boosting Accuracy:", boosting_accuracy)
print("Boosting Predictions:", boosting_pred)

# Stacking with Random Forest, AdaBoost, and Logistic Regression
base_models = [
    ('random_forest', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('adaboost', AdaBoostClassifier(n_estimators=10, random_state=42))]
meta_learner = LogisticRegression()
stacking_clf = StackingClassifier(estimators=base_models, final_estimator=meta_learner)
stacking_clf.fit(X_train, y_train)
stacking_pred = stacking_clf.predict(X)
stacking_accuracy = accuracy_score(y, stacking_pred)
print("Stacking Accuracy:", stacking_accuracy)
print("Stacking Predictions:", stacking_pred)

Output:
Bagging Accuracy: 0.98
Bagging Predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0000000000000111111111111111111112111
1111111111111111111111111122222212222
2222222212222222222222222222222222222
2 2]
Boosting Accuracy: 0.9733333333333334
Boosting Predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0
0000000000000111111111111111111112111
1112111112111111111111111122222212222
2222222222222222222222222222222222222
2 2]
Stacking Accuracy: 0.9933333333333333
Stacking Predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0
0000000000000111111111111111111112111
1111111111111111111111111122222222222
2222222222222222222222222222222222222
2 2]

RESULT:
Thus the python program to implement ensembling techniques was executed and
output was verified successfully.

EX.NO:9 K-MEANS CLUSTERING ALGORITHM

AIM:

To write a python program to implement clustering algorithms.

CLUSTERING
Clustering algorithms are unsupervised learning techniques used to group similar data
points together based on certain characteristics or features.
TYPES OF CLUSTERING ALGORITHM
1. Partitioning Methods
● K-Means Clustering
● K-Medoids (PAM)
2. Hierarchical Methods
● Agglomerative Clustering
● Divisive Clustering
3. Density-Based Methods
● DBSCAN
● OPTICS

K-MEANS CLUSTERING

K-Means Clustering is an Unsupervised Learning algorithm which groups the unlabeled
dataset into different clusters. K-means aims to partition data points into K clusters, with
each cluster represented by its centroid. Here K defines the number of predefined clusters
that need to be created in the process: if K=2, there will be two clusters, for K=3 there
will be three clusters, and so on.
It is an iterative algorithm that divides the unlabeled dataset into k different clusters in
such a way that each data point belongs to only one group of points with similar properties.
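The iteration described above alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points. A minimal NumPy sketch of that loop (toy 1-D data chosen for illustration; the program below uses scikit-learn's KMeans instead):

import numpy as np

points = np.array([1.0, 2.0, 1.5, 8.0, 9.0, 8.5])   # toy 1-D data
centroids = np.array([1.0, 9.0])                     # initial guesses for K=2

for _ in range(10):  # a few iterations are enough for this toy data
    # Assignment step: index of the nearest centroid for every point
    labels = np.argmin(np.abs(points[:, None] - centroids[None, :]), axis=1)
    # Update step: each centroid becomes the mean of its assigned points
    centroids = np.array([points[labels == k].mean() for k in range(len(centroids))])

print(labels)     # [0 0 0 1 1 1]
print(centroids)  # [1.5 8.5]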

ALGORITHM:
Step 1: Import the necessary libraries including pandas, sklearn for KMeans clustering,
and matplotlib for visualization.
Step 2: Read the dataset from a file using pd.read_csv(file_path) into a pandas
DataFrame df.
Step 3: Plot the original data points with 'AGE' on the x-axis and 'BP' on the y-axis using
plt.scatter. Each point is represented as a blue circle.
Step 4: Initialize a KMeans object kmeans with the desired number of clusters
(n_clusters=2) and a random state (random_state=42).
4.1: Fit the KMeans model to the 'AGE' and 'BP' columns of the DataFrame using
kmeans.fit_predict(df[['AGE', 'BP']])
4.2: Assign the cluster labels to a new column in the DataFrame df['Cluster'].
Step 5: Plot the clustered data points with 'AGE' on the x-axis and 'BP' on the y-axis
using plt.scatter. Each point is colored according to its assigned cluster, with the colormap
set to 'viridis' and adjust the size of the points, transparency, and edge color for better
visualization.
Step 6: Plot the centroids of the clusters as red 'X' markers using plt.scatter.
Step 7: Add a title to the plot using plt.title and label the x-axis and y-axis using
plt.xlabel and plt.ylabel.
Step 8: Add a legend to the plot using plt.legend.

Step 9: Display the plot using plt.show().

PROGRAM:
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
# Load data from csv file
df = pd.read_csv("C:/Users/Desktop/AzureDiabDataset.csv")
# Visualize the original data
plt.scatter(df['AGE'], df['BP'], s=50,color='blue', label='Original Data')
#Apply k-means clustering
kmeans = KMeans(n_clusters=2, random_state=42)
df['Cluster'] = kmeans.fit_predict(df[['AGE', 'BP']])
#Visualize the clustered data
plt.scatter(df['AGE'], df['BP'], c=df['Cluster'], cmap='viridis', s=50, alpha=0.8,
edgecolors='w', label='Clustered Data')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], c='red',
marker='X', s=200, label='Centroids')
#Add labels and legends
plt.title('K-Means Clustering')
plt.xlabel('AGE')
plt.ylabel('BP')
plt.legend()
#show the plot
plt.show()

Sample Dataset

AzureDiabDataset.csv

AGE SEX BMI BP S1 S2 S3 S4 S5 S6 Y

59 2 32.1 101 157 93.2 38 4 4.8598 87 151

48 1 21.6 87 183 103.2 70 3 3.8918 69 75

72 2 30.5 93 156 93.6 41 4 4.6728 85 141

24 1 25.3 84 198 131.4 40 5 4.8903 89 206

OUTPUT:

RESULT:

Thus the python program to implement clustering algorithms was executed and the
output was verified successfully.
EX.NO:10 EM FOR BAYESIAN NETWORKS

AIM:
To write a python program to implement EM for Bayesian Networks.

EXPECTATION MAXIMIZATION
The Expectation-Maximization (EM) algorithm is an iterative method used to find
maximum likelihood estimates of parameters in statistical models, particularly when the
model involves latent variables or incomplete data.
Two-Step Process: The algorithm consists of the Expectation (E) step, which
calculates the expected value of the log-likelihood, and the Maximization (M) step,
which updates the parameters to maximize this expected log-likelihood.
Application: Commonly used in applications such as Gaussian Mixture Models
(GMMs), Hidden Markov Models (HMMs), and image segmentation to handle
incomplete or missing data.
Convergence: The algorithm is guaranteed to converge to a local maximum of the
likelihood function, though it may not always find the global maximum.
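To make the two steps concrete, the sketch below runs a few EM iterations for a toy two-component 1-D Gaussian mixture (illustrative data and starting values, not the Iris dataset used in the program): the E-step computes each point's responsibility under the current parameters, and the M-step re-estimates the means, variances, and mixing weights from those responsibilities.

import numpy as np
from scipy.stats import norm

x = np.array([1.0, 1.2, 0.8, 5.0, 5.2, 4.8])   # toy 1-D observations
mu = np.array([0.0, 6.0])                       # initial component means
var = np.array([1.0, 1.0])                      # initial component variances
pi = np.array([0.5, 0.5])                       # initial mixing weights

for _ in range(20):
    # E-step: responsibility of each component for each point
    dens = np.stack([p * norm.pdf(x, m, np.sqrt(v)) for p, m, v in zip(pi, mu, var)], axis=1)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update the parameters from the responsibilities
    nk = resp.sum(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    pi = nk / len(x)

print(mu)   # approximately [1.0, 5.0]
print(pi)   # approximately [0.5, 0.5]

The program below relies on scikit-learn's GaussianMixture, which performs the same E and M steps internally.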

ALGORITHM:

Step 1: Import necessary libraries, including NumPy for numerical computations,


scikit-learn for machine learning functionalities, and specifically GaussianMixture for
GMM clustering.

Step 2: Load the Iris dataset using load_iris() function from scikit-learn. Assign the
features to variable X.

Step 3: Create an instance of GaussianMixture with n_components=3, indicating that we


want to fit the model with three Gaussian distributions.

Step 4: Fit the GMM to the data using the Expectation-Maximization (EM) algorithm by
calling the fit() method on the GMM object gmm and passing the dataset X.

Step 5: Predict the cluster assignments for each data point using the predict() method on
the fitted GMM object gmm. This assigns each data point to one of the three clusters.

Step 6: Print the cluster means and covariances using the ‘means_’ and ‘covariances_’
attributes of the fitted GMM object ‘gmm’.

PROGRAM:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

# Load the Iris dataset
iris = load_iris()
X = iris.data

# Initialize the Gaussian Mixture Model with 3 components
gmm = GaussianMixture(n_components=3)

# Fit the model using the EM algorithm
gmm.fit(X)

# Get the cluster assignments for each data point
labels = gmm.predict(X)

# Print the cluster means and covariances
print("Cluster Means:")
print(gmm.means_)

print("\nCluster Covariances:")
print(gmm.covariances_)

Sample Iris Dataset

sepal.length sepal.width petal.length petal.width variety

5.1 3.5 1.4 0.2 Setosa

4.9 3 1.4 0.2 Setosa

4.7 3.2 1.3 0.2 Setosa

4.6 3.1 1.5 0.2 Setosa

5 3.6 1.4 0.2 Setosa

OUTPUT:

Cluster Means:

[[5.006      3.428      1.462      0.246     ]
 [5.91697517 2.77803998 4.20523542 1.29841561]
 [6.54632887 2.94943079 5.4834877  1.98716063]]

Cluster Covariances:

[[[0.121765   0.097232   0.016028   0.010124  ]
  [0.097232   0.140817   0.011464   0.009112  ]
  [0.016028   0.011464   0.029557   0.005948  ]
  [0.010124   0.009112   0.005948   0.010885  ]]

 [[0.27550587 0.09663458 0.18542939 0.05476915]
  [0.09663458 0.09255531 0.09103836 0.04299877]
  [0.18542939 0.09103836 0.20227635 0.0616792 ]
  [0.05476915 0.04299877 0.0616792  0.03232217]]

 [[0.38741443 0.09223101 0.30244612 0.06089936]
  [0.09223101 0.11040631 0.08386768 0.0557538 ]
  [0.30244612 0.08386768 0.32595958 0.07283247]
  [0.06089936 0.0557538  0.07283247 0.08488025]]]

RESULT:
Thus the python program to implement EM for Bayesian Networks was executed
and the output was verified successfully.

EX.NO:11 SIMPLE NN MODEL

AIM:

To write a python program to implement a simple NN model.

NEURAL NETWORK

A Neural Network is a system designed to operate like a human brain. Human


information processing takes place through the interaction of many billions of neurons
connected to each other sending signals to other neurons. Similarly, a Neural Network is a
network of artificial neurons, as found in human brains, for solving artificial intelligence
problems such as image identification.
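Each artificial neuron computes a weighted sum of its inputs plus a bias and passes the result through an activation function. A minimal NumPy sketch of one dense layer with a ReLU activation (toy weights and inputs chosen for illustration; the program below builds the same kind of layer with Keras):

import numpy as np

def relu(z):
    return np.maximum(0, z)           # ReLU activation: max(0, z)

x = np.array([0.5, -1.0, 2.0])        # toy input features
W = np.array([[0.2, -0.5, 1.0],       # toy weights: 2 neurons x 3 inputs
              [0.7,  0.1, -0.3]])
b = np.array([0.1, -0.2])             # toy biases

output = relu(W @ x + b)              # forward pass through one dense layer
print(output)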

ALGORITHM:

Step 1:Import numpy for numerical operations, load_iris to load the Iris dataset, and
various functions from scikit-learn for data preprocessing and model evaluation. Import
Sequential and Dense from TensorFlow.keras for building the neural network model.

Step 2: Load the Iris dataset using load_iris() function.

Step 3:Separate the features (X) and target labels (y).

Step 4: One-hot encode the target labels using OneHotEncoder from scikit-learn.

Step 5:Split the data into training and testing sets using train_test_split from scikit-learn.

Step 6:Initialize a Sequential model.

Step 7:Add a Dense layer with 10 units and ReLU activation function as the first hidden
layer. Specify input shape as the number of features.

Step 8:Add a Dense output layer with 3 units (one for each class) and softmax activation
function.

Step 9: Specify the optimizer (adam), loss function (categorical_crossentropy), and


metrics (accuracy).

Step 10:Fit the model to the training data.

Step 11:Specify the number of epochs (50) and batch size (5).

Step 12:Evaluate the trained model on the testing set to get the loss and accuracy.

Step 13: Print the test loss and test accuracy

Step 14: The program doesn't explicitly show the epoch loop, but it's implied that during
training, it's running for 50 epochs, printing the progress of each epoch.

PROGRAM:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# One-hot encode the target labels
enc = OneHotEncoder()
y = enc.fit_transform(y.reshape(-1, 1)).toarray()

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build the neural network model
model = Sequential([
    Dense(10, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(3, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=5, validation_split=0.1)

# Evaluate the model on the testing set
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")

OUTPUT:

Epoch 1/50
22/22 [==============================] - 1s 22ms/step - loss: 1.7254 - accuracy: 0.1019 - val_loss: 1.7468 - val_accuracy: 0.0000e+00
Epoch 2/50
22/22 [==============================] - 0s 5ms/step - loss: 1.4427 - accuracy: 0.0093 - val_loss: 1.4198 - val_accuracy: 0.0000e+00
Epoch 3/50
22/22 [==============================] - 0s 5ms/step - loss: 1.2818 - accuracy: 0.0833 - val_loss: 1.2746 - val_accuracy: 0.0833
Epoch 4/50
22/22 [==============================] - 0s 6ms/step - loss: 1.2047 - accuracy: 0.1296 - val_loss: 1.2024 - val_accuracy: 0.0833
Epoch 5/50
22/22 [==============================] - 0s 6ms/step - loss: 1.1276 - accuracy: 0.1389 - val_loss: 1.1422 - val_accuracy: 0.1667
Epoch 6/50
22/22 [==============================] - 0s 6ms/step - loss: 1.0586 - accuracy: 0.4259 - val_loss: 1.0789 - val_accuracy: 0.2500
Epoch 7/50
22/22 [==============================] - 0s 6ms/step - loss: 0.9892 - accuracy: 0.5278 - val_loss: 1.0456 - val_accuracy: 0.2500
Epoch 8/50
22/22 [==============================] - 0s 6ms/step - loss: 0.9348 - accuracy: 0.5556 - val_loss: 0.9946 - val_accuracy: 0.3333
Epoch 9/50
22/22 [==============================] - 0s 6ms/step - loss: 0.8965 - accuracy: 0.5556 - val_loss: 0.9641 - val_accuracy: 0.3333
Epoch 10/50
22/22 [==============================] - 0s 6ms/step - loss: 0.8549 - accuracy: 0.5741 - val_loss: 0.9305 - val_accuracy: 0.3333
...
Test Loss: 0.40232041478157043, Test Accuracy: 0.800000011920929

RESULT:

Thus the python program to build a simple NN model was written, executed and
output was verified successfully.

EX.NO:12 DEEP LEARNING NEURAL NETWORK MODEL

AIM:
To write a python program to build deep learning neural network model.

DEEP LEARNING NEURAL NETWORK


Deep neural networks are an extension of conventional artificial neural networks.
Conventional neural networks are shallow, typically having only one or two hidden layers,
whereas deep neural networks have many hidden layers. There is a wide range of deep
neural network models, including DNNs, CNNs, RNNs, and LSTMs.

ALGORITHM
Step 1:Import numpy for numerical operations, load_iris to load the Iris dataset, and
various functions from scikit-learn for data preprocessing and model evaluation. Import
Sequential and Dense from TensorFlow.keras for building the neural network model
Step 2:Load the Iris dataset using load_iris() function.
Step 3:Separate the features (X) and target labels (y).

Step 4:One-hot encode the target labels using OneHotEncoder from scikit-learn.

Step 5:Split the data into training and testing sets using train_test_split from scikit-learn.

Step 6:Initialize a Sequential model.


Step 7:Add a Dense layer with 10 units and ReLU activation function as the first hidden
layer. Specify input shape as the number of features.

Step 8:Add a Dense output layer with 3 units (one for each class) and softmax activation
function.

Step 9: Specify the optimizer (adam), loss function (categorical_crossentropy), and


metrics (accuracy).

Step 10:Fit the model to the training data.

Step 11: Specify the number of epochs (50) and batch size (32).

Step 12:Evaluate the trained model on the testing set to get the loss and accuracy.

Step 13:Print the test loss and test accuracy

Step 14: The program doesn't explicitly show the epoch loop, but it's implied that during
training, it's running for 50 epochs, printing the progress of each epoch.

PROGRAM:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Build the deep learning neural network model
model = Sequential([
    Dense(32, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(16, activation='relu'),
    Dense(3, activation='softmax')  # Three output neurons for three iris species
])

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train_scaled, y_train, epochs=50, batch_size=32, validation_split=0.2)

# Evaluate the model
loss, accuracy = model.evaluate(X_test_scaled, y_test)
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")

Output:

Epoch 1/50
3/3 [==============================] - 1s 203ms/step - loss: 1.0726 - accuracy: 0.4375 - val_loss: 0.9696 - val_accuracy: 0.5000
Epoch 2/50
3/3 [==============================] - 0s 39ms/step - loss: 1.0362 - accuracy: 0.4479 - val_loss: 0.9370 - val_accuracy: 0.5417
Epoch 3/50
3/3 [==============================] - 0s 35ms/step - loss: 0.9983 - accuracy: 0.4583 - val_loss: 0.9064 - val_accuracy: 0.5833
Epoch 4/50
3/3 [==============================] - 0s 52ms/step - loss: 0.9653 - accuracy: 0.4583 - val_loss: 0.8772 - val_accuracy: 0.6250
Epoch 5/50
3/3 [==============================] - 0s 42ms/step - loss: 0.9313 - accuracy: 0.5938 - val_loss: 0.8495 - val_accuracy: 0.7500
Epoch 6/50
3/3 [==============================] - 0s 42ms/step - loss: 0.9004 - accuracy: 0.7292 - val_loss: 0.8232 - val_accuracy: 0.7917
Epoch 7/50
3/3 [==============================] - 0s 41ms/step - loss: 0.8709 - accuracy: 0.8333 - val_loss: 0.7970 - val_accuracy: 0.7917
Epoch 8/50
3/3 [==============================] - 0s 39ms/step - loss: 0.8417 - accuracy: 0.8333 - val_loss: 0.7708 - val_accuracy: 0.7917
Epoch 9/50
3/3 [==============================] - 0s 36ms/step - loss: 0.8129 - accuracy: 0.8333 - val_loss: 0.7460 - val_accuracy: 0.7917
Epoch 10/50
3/3 [==============================] - 0s 38ms/step - loss: 0.7855 - accuracy: 0.8333 - val_loss: 0.7213 - val_accuracy: 0.7917
...
Test Loss: 0.20980490744113922, Test Accuracy: 0.9666666388511658

RESULT:

Thus the python program to build a deep learning neural network model was executed
and the output was verified successfully.
