Experiment 11 – k-Nearest Neighbour (k-NN) on Iris Dataset
Aim
To implement the k-Nearest Neighbour (k-NN) algorithm to classify the Iris dataset and print both correct and wrong predictions.
Theory
- k-NN Algorithm: A supervised machine learning algorithm used for both classification and regression.
- Working: A new data point is assigned the majority class among its k nearest neighbours in the training set.
- Distance Metric: Usually Euclidean distance.
- Advantages: Simple and effective; there is no explicit training phase (the model just stores the training data).
- Disadvantages: Prediction is computation-heavy on large datasets, since distances to all training points must be computed.
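To make the working concrete, here is a minimal sketch of the distance-and-vote step in plain Python/NumPy. The helper name knn_predict is hypothetical; it mirrors, in brute-force form, what scikit-learn's KNeighborsClassifier does internally.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from the query point x to every training point
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # majority vote among their class labels
    return Counter(y_train[nearest]).most_common(1)[0][0]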
Dataset Used: The Iris dataset (150 samples, 4 features) with 3 classes of flowers: Setosa, Versicolor, Virginica.
Algorithm / Procedure
1. Import the required libraries (sklearn, pandas, numpy).
2. Load the Iris dataset.
3. Split the dataset into training and testing sets.
4. Train the k-NN classifier with a chosen value of k (e.g., 3; see the k-selection sketch after this list).
5. Predict on the test data.
6. Print correct and wrong predictions separately.
7. Display the accuracy.
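One way to choose k in step 4 (a hedged sketch, not part of the prescribed procedure) is to compare cross-validated accuracy across several candidate values:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 7, 9):  # odd values avoid ties in the majority vote
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k}: mean CV accuracy = {scores.mean():.3f}")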
Program (Python)

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into 70% training / 30% testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Train a k-NN classifier with k = 3 and predict on the test set
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Report each correctly classified test sample
print("Correct Predictions:")
for actual, predicted in zip(y_test, y_pred):
    if actual == predicted:
        print(f"Actual: {iris.target_names[actual]}, Predicted: {iris.target_names[predicted]}")

# Report each misclassified test sample
print("\nWrong Predictions:")
for actual, predicted in zip(y_test, y_pred):
    if actual != predicted:
        print(f"Actual: {iris.target_names[actual]}, Predicted: {iris.target_names[predicted]}")

print("\nAccuracy:", accuracy_score(y_test, y_pred))
Result / Conclusion
The k-NN algorithm successfully classified the Iris dataset. Correct and wrong predictions were displayed separately, and the model achieved high accuracy on the test set.
Experiment 12 – Decision Tree for Most Specific Hypothesis
Aim
To demonstrate the Decision Tree algorithm for finding the most specific hypothesis from a given set of training data samples.
Theory
- Decision Tree: A supervised learning algorithm for classification and regression.
- Structure: Internal nodes (attribute tests), branches (test outcomes), and leaves (class labels).
- Most Specific Hypothesis: Attributes are chosen step by step to best divide the data until the classification is unambiguous.
- Splitting Measures: Information Gain, Gini Index.
- Advantages: Easy to interpret; can handle both categorical and numerical data.
Algorithm / Procedure
1. Collect the training data samples.
2. Select as the root node the attribute with the highest information gain.
3. Split the dataset based on that attribute's values.
4. Repeat recursively until all samples are classified or no attributes remain.
5. Use the resulting tree to predict new samples.
Program (Python)

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Classic PlayTennis training data
data = {
    'Outlook': ['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast',
                'Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy'],
    'Temperature': ['Hot','Hot','Hot','Mild','Cool','Cool','Cool',
                    'Mild','Cool','Mild','Mild','Mild','Hot','Mild'],
    'Humidity': ['High','High','High','High','Normal','Normal','Normal',
                 'High','Normal','Normal','Normal','High','Normal','High'],
    'Wind': ['Weak','Strong','Weak','Weak','Weak','Strong','Strong',
             'Weak','Weak','Weak','Strong','Strong','Weak','Strong'],
    'PlayTennis': ['No','No','Yes','Yes','Yes','No','Yes',
                   'No','Yes','Yes','Yes','Yes','Yes','No']
}

df = pd.DataFrame(data)

# One-hot encode the categorical attributes
df_encoded = pd.get_dummies(df[['Outlook', 'Temperature', 'Humidity', 'Wind']])
y = df['PlayTennis']

# Build an entropy-based (ID3-style) decision tree and print its rules
clf = DecisionTreeClassifier(criterion="entropy")
clf.fit(df_encoded, y)
tree_rules = export_text(clf, feature_names=list(df_encoded.columns))
print(tree_rules)
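As a usage check (an assumed extension, not part of the original listing), the fitted tree can classify an unseen day; the new sample must be one-hot encoded with the same columns as df_encoded:

# A hypothetical new day: Sunny, Cool, High humidity, Strong wind
sample = pd.DataFrame([{'Outlook': 'Sunny', 'Temperature': 'Cool',
                        'Humidity': 'High', 'Wind': 'Strong'}])
sample_encoded = pd.get_dummies(sample).reindex(columns=df_encoded.columns, fill_value=0)
print(clf.predict(sample_encoded))  # for this sample the tree should reach a 'No' leaf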
Result / Conclusion
The Decision Tree algorithm successfully generated the most specific hypothesis in the form of decision rules. These rules can be used to classify new data samples.