ML Lab1 Theory

EXPERIMENT - 1

Introduction to the Iris Dataset

The Iris dataset is one of the most famous datasets in data science and machine learning.
It was introduced by Sir Ronald Fisher in 1936 and is often used for learning and testing
classification algorithms. The dataset contains 150 samples of iris flowers, 50 from each of
three species: Setosa, Versicolor, and Virginica.

Dataset Description

Each flower in the dataset is described by four features: sepal length, sepal width, petal
length, and petal width, all measured in centimeters. These features are used to classify each
flower into its correct species. The dataset is small, clean, and evenly balanced across the
three classes, which makes it well suited for beginners practicing classification.
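For reference, the dataset ships with scikit-learn and can be loaded in a few lines. The sketch below assumes scikit-learn (and pandas, for the DataFrame form) is installed; it simply confirms the figures described above.

    # Minimal sketch: load the built-in copy of the Iris dataset and inspect it.
    from sklearn.datasets import load_iris

    iris = load_iris(as_frame=True)
    df = iris.frame                                   # 150 rows: 4 features + target
    df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

    print(iris.feature_names)              # sepal/petal length and width, in cm
    print(df["species"].value_counts())    # 50 samples per species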

k-Nearest Neighbors (k-NN) Algorithm

k-NN is a simple and widely used machine learning algorithm for classification. It works by
finding the k closest data points to a new sample and predicting its class by majority vote
among those neighbors. It is a distance-based method, meaning it uses the closeness of data
points in feature space to make predictions.
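A minimal sketch of such a classifier using scikit-learn's KNeighborsClassifier is shown below; the choice of k = 5 and the 80/20 train/test split are illustrative assumptions, not fixed by the experiment.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)

    knn = KNeighborsClassifier(n_neighbors=5)   # k = 5, Euclidean distance by default
    knn.fit(X_train, y_train)
    print("Test accuracy:", knn.score(X_test, y_test))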

Data Visualization

Visualizing data helps in understanding patterns and relationships between features. Pair plots
show pairwise scatter plots of the features, colored by species, so class separation is easy to
see. Box plots display the spread of each feature's values, while heatmaps show the correlation
between numerical features. These visualizations make it easier to analyze the dataset before
building a model.
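The plots mentioned above can be produced roughly as follows; seaborn and matplotlib are assumed to be available, and the styling choices are placeholders.

    import seaborn as sns
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris

    iris = load_iris(as_frame=True)
    df = iris.frame
    df["species"] = df["target"].map(dict(enumerate(iris.target_names)))
    features = df.drop(columns=["target", "species"])

    sns.pairplot(df.drop(columns="target"), hue="species")   # pairwise feature relationships
    plt.show()

    features.plot(kind="box")                                 # spread of each feature
    plt.show()

    sns.heatmap(features.corr(), annot=True)                   # correlations between features
    plt.show()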

Cross-Validation

Cross-validation is a method to check how well a model works on unseen data. In 5-fold
cross-validation, the dataset is split into 5 parts. The model is trained on 4 parts and tested on
the remaining part, and the process is repeated 5 times so that each part serves as the test set
exactly once. The resulting scores are averaged to obtain a more reliable estimate of accuracy.
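A sketch of 5-fold cross-validation with scikit-learn's cross_val_score, again assuming k = 5 for the classifier:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)

    print("Fold accuracies:", scores)        # one accuracy score per fold
    print("Mean accuracy:", scores.mean())   # averaged estimate on unseen data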

Evaluation Metrics

The model's performance is assessed using multiple metrics: Accuracy (percentage of correct
predictions), Precision (proportion of correct positive predictions), Recall (proportion of
actual positives correctly identified), and F1-score (harmonic mean of precision and recall).
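These metrics can be computed on a held-out test split with scikit-learn's metrics module; the 80/20 split and k = 5 below are illustrative assumptions.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score, classification_report

    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target)

    y_pred = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).predict(X_test)

    print("Accuracy:", accuracy_score(y_test, y_pred))
    # Per-class precision, recall, and F1-score, plus averages.
    print(classification_report(y_test, y_pred, target_names=iris.target_names))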
