Detailed Explanation of Module 3 Lab 1: Understanding Distance Metrics and
Introduction to KNN
This lab introduces the concept of distance metrics—how to measure the "closeness" of data
points—and shows how these are used in the K-Nearest Neighbors (KNN) algorithm. Below, each
section and concept is explained step-by-step, with examples and answers to key questions.
Section 1: Distance Metrics
A. What is a Distance Metric?
A distance metric is a mathematical way to measure how far apart two points (data samples) are
in space. Different metrics are used depending on the data type and problem.
B. Common Distance Metrics (with Examples)
1. Euclidean Distance
Definition: The straight-line distance between two points.
Formula:
$ d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} $
Example:
import numpy as np

x_1 = np.array([1, 2])
x_2 = np.array([4, 6])
euclidean_dist = np.sqrt(np.sum((x_1 - x_2) ** 2))  # straight-line distance
print(euclidean_dist) # Output: 5.0
Visualization: The shortest path between two points on a plane.
2. Manhattan Distance
Definition: The sum of the absolute differences of their coordinates (like a taxi driving
on a city grid).
Formula:
$ d(x, y) = \sum_{i=1}^{n} |x_i - y_i| $
Example:
manhattan_dist = np.sum(np.abs(x_1 - x_2))
print(manhattan_dist) # Output: 7
3. Minkowski Distance
Generalizes Euclidean (p=2) and Manhattan (p=1) distances.
Formula:
$ d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p} $
Example:
For $ p = 3 $, the Minkowski distance between the same points is about 4.5.
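A quick sketch of that computation using scipy (the lab may compute it differently; scipy.spatial.distance.minkowski is one standard way):
from scipy.spatial import distance
import numpy as np

x_1 = np.array([1, 2])
x_2 = np.array([4, 6])
# p=1 reproduces Manhattan, p=2 reproduces Euclidean; p=3 gives ~4.498
print(distance.minkowski(x_1, x_2, p=3)) # Output: 4.4979...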
4. Hamming Distance
Definition: Number of positions at which the corresponding values are different (used
for categorical/binary data).
Example:
from scipy.spatial import distance

str_1 = 'euclidean'
str_2 = 'manhattan'
# scipy returns the fraction of mismatched positions; multiply by length for a count
hamming_dist = distance.hamming(list(str_1), list(str_2)) * len(str_1)
print(hamming_dist) # Output: 7.0
5. Cosine Similarity
Definition: Measures the cosine of the angle between two vectors (used for text and
high-dimensional data).
Formula:
$ \cos(\theta) = \frac{x \cdot y}{\|x\| \, \|y\|} $
Example:
from numpy.linalg import norm

cosine_similarity = np.dot(x_1, x_2) / (norm(x_1) * norm(x_2))
print(cosine_similarity) # Output: 0.992...
6. Chebyshev Distance
Definition: The maximum absolute difference along any single dimension.
Example:
chebyshev_distance = distance.chebyshev(x_1, x_2)
print(chebyshev_distance) # Output: 4
7. Jaccard Distance
Definition: Measures dissimilarity between sets.
Formula:
$ d_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|} $
Example:
print(distance.jaccard([1, 0, 0], [0, 1, 0])) # Output: 1.0
8. Haversine Distance
Definition: Used for geographic coordinates (latitude/longitude) on a sphere (e.g.,
Earth).
Example:
from haversine import haversine  # pip install haversine

london = (51.510357, -0.116773)       # (latitude, longitude)
washington = (38.889931, -77.009003)  # (latitude, longitude)
print(haversine(london, washington))  # Output: 5897.658 (km)
C. How to Choose the Right Distance Metric?
Euclidean: Most common for continuous, low-dimensional data.
Manhattan: Useful for high-dimensional or grid-like data.
Cosine Similarity: Good when only direction matters (e.g., text).
Hamming: For categorical/binary variables.
Jaccard: For set/binary data.
Haversine: For geographic data.
D. Visualizing Distance Metrics
The lab uses 3D plots to show how Euclidean and Manhattan distances look from the origin,
helping you understand how each metric "measures" space differently.
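As a 2D stand-in for the lab's 3D plots, the following sketch contours each metric's distance from the origin: Euclidean level sets come out as circles, Manhattan level sets as diamonds.
import numpy as np
import matplotlib.pyplot as plt

# Grid of points around the origin
xx, yy = np.meshgrid(np.linspace(-2, 2, 200), np.linspace(-2, 2, 200))

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, (name, dist) in zip(axes, [("Euclidean", np.sqrt(xx**2 + yy**2)),
                                   ("Manhattan", np.abs(xx) + np.abs(yy))]):
    ax.contour(xx, yy, dist, levels=10)  # circles vs. diamonds
    ax.set_title(f"{name} distance from origin")
    ax.set_aspect("equal")
plt.show()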
Section 2: K-Nearest Neighbors (KNN)
A. What is KNN?
KNN is a supervised, non-parametric, instance-based algorithm used for classification and
regression.
How it works: For a new data point, KNN finds the k closest points in the training set (using
a distance metric) and assigns the most common class among them (for classification) or
averages their values (for regression).
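To make the mechanics concrete, here is a minimal from-scratch sketch of KNN classification (knn_predict is a hypothetical helper, not the lab's code), using Euclidean distance and a majority vote:
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    dists = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))  # Euclidean distances
    nearest = np.argsort(dists)[:k]                          # indices of k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]    # majority class

X_train = np.array([[1, 2], [2, 3], [8, 8], [9, 10]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([2, 2])))  # Output: 0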
B. KNN on a Synthetic Dataset
The lab generates two clusters of 2D points (red and blue) and uses KNN to classify new
points.
Example:
from sklearn.neighbors import KNeighborsClassifier

# pts/tgts are the synthetic training points and labels; test_pts/test_tgts the held-out set
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(pts, tgts)
our_predictions = knn.predict(test_pts)
print("Prediction Accuracy: ", 100 * np.mean(our_predictions == test_tgts))
# Output: e.g., 80.0
Experiment:
Try different distance metrics ('euclidean', 'manhattan', 'chebyshev', 'minkowski',
'hamming') and observe how accuracy changes.
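A sketch of that experiment loop (assuming pts, tgts, test_pts, and test_tgts from the lab's synthetic dataset):
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

for metric in ['euclidean', 'manhattan', 'chebyshev', 'minkowski', 'hamming']:
    knn = KNeighborsClassifier(n_neighbors=3, metric=metric)
    knn.fit(pts, tgts)
    acc = 100 * np.mean(knn.predict(test_pts) == test_tgts)
    print(f"{metric:>10}: {acc:.1f}% accuracy")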
C. KNN on the Iris Dataset (Real Data Example)
Iris dataset: 150 samples, 3 species, 4 features each.
Data is split into training and testing sets.
KNN is run with different distance metrics (Euclidean, Cosine, Manhattan, Chebyshev).
Result:
For this dataset and split, all metrics gave 100% accuracy; this will not always hold for other datasets or splits.
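A minimal sketch of that comparison (the split proportions and random_state are assumptions; the notebook's values may differ):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the 150-sample, 4-feature Iris dataset and hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

for metric in ['euclidean', 'cosine', 'manhattan', 'chebyshev']:
    knn = KNeighborsClassifier(n_neighbors=3, metric=metric)
    knn.fit(X_train, y_train)
    print(metric, knn.score(X_test, y_test))  # accuracy on the test set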
Section 3: Questions to Think About and Answer
1. How are similarity and distance different?
Similarity measures how alike two data points are (higher = more alike, e.g., cosine
similarity).
Distance measures how far apart two data points are (lower = more similar, e.g.,
Euclidean).
In KNN, distance metrics like Euclidean and Manhattan are used to find the closest
neighbors; a similarity score can be converted into a distance when one is needed, as sketched below.
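For example, cosine distance is commonly defined as one minus cosine similarity:
import numpy as np
from numpy.linalg import norm

x_1, x_2 = np.array([1, 2]), np.array([4, 6])
cos_sim = np.dot(x_1, x_2) / (norm(x_1) * norm(x_2))  # higher = more alike
cos_dist = 1 - cos_sim                                # lower = more alike
print(round(cos_sim, 3), round(cos_dist, 3))  # Output: 0.992 0.008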
2. What makes a valid distance metric?
A valid distance metric must satisfy:
Non-negativity: $ d(x, y) \geq 0 $
Identity: $ d(x, y) = 0 $ if and only if $ x = y $
Symmetry: $ d(x, y) = d(y, x) $
Triangle Inequality: $ d(x, z) \leq d(x, y) + d(y, z) $
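These axioms can be spot-checked numerically; here is a minimal sketch for Euclidean distance (random sampling can only refute the properties, never prove them):
import numpy as np

rng = np.random.default_rng(0)
d = lambda a, b: np.sqrt(np.sum((a - b) ** 2))  # Euclidean distance

for _ in range(1000):
    x, y, z = rng.normal(size=(3, 4))        # three random 4-dimensional points
    assert d(x, y) >= 0                      # non-negativity
    assert np.isclose(d(x, y), d(y, x))      # symmetry
    assert d(x, z) <= d(x, y) + d(y, z) + 1e-12  # triangle inequality
assert d(x, x) == 0                          # identity
print("All spot checks passed.")
Note that cosine distance (one minus cosine similarity) fails the triangle inequality in general, which is one reason cosine is described as a similarity rather than a true metric.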
Section 4: Best Practices and Observations
Metric choice matters: For some data, the right distance metric can significantly improve
KNN performance [1][2][3].
Curse of dimensionality: In very high-dimensional data, distances between points become
less meaningful, and KNN may not work well [2].
Feature scaling: Always scale features before using KNN, especially with Euclidean or
Manhattan distance [2][4].
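A minimal sketch of this practice with a scikit-learn pipeline (X_train/y_train are placeholders for your own split):
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Scaling first prevents large-valued features from dominating the distances
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
# model.fit(X_train, y_train); print(model.score(X_test, y_test))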
Section 5: Summary Table
Metric    | Use Case                               | Formula/Example
Euclidean | Continuous, low-dimensional data       | $ \sqrt{\sum_i (x_i - y_i)^2} $
Manhattan | High-dimensional, grid data            | $ \sum_i |x_i - y_i| $
Chebyshev | Max difference in any dimension        | $ \max_i |x_i - y_i| $
Hamming   | Categorical/binary variables           | number of positions where $ x_i \neq y_i $
Jaccard   | Set/binary data                        | $ 1 - \frac{|A \cap B|}{|A \cup B|} $
Cosine    | Text; direction matters, not magnitude | $ \frac{x \cdot y}{\|x\| \, \|y\|} $
Haversine | Geographic coordinates                 | See code in notebook
Key Takeaways
Distance metrics are foundational for KNN and many other algorithms.
KNN is simple and effective, but its performance depends on the distance metric and value
of $ k $.
Experiment with different metrics and always scale your features.
References
1. https://www.ustcnewly.com/teaching/2020_2_3.pdf
2. https://www.kdnuggets.com/2020/11/most-popular-distance-metrics-knn.html
3. https://blog.devgenius.io/exploring-knn-with-different-distance-metrics-85aea1e8299
4. https://www.freecodecamp.org/news/k-nearest-neighbors-algorithm-classifiers-and-model-example/