Distance Metrics in Machine Learning

Created by Daniel Zaldaña


https://x.com/ZaldanaDaniel

Euclidean Distance

Formula

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

When to use:

• Continuous data in low-dimensional space
• When scale and magnitude matter
• Common default for clustering algorithms such as k-means

Properties:

• Symmetric: d(x, y) = d(y, x)
• Non-negative: d(x, y) ≥ 0
• Sensitive to outliers and feature scale (differences are squared, so large deviations dominate)

```python
from sklearn.metrics.pairwise import (
    euclidean_distances,
    manhattan_distances,
    cosine_distances,
)
import numpy as np
from numpy import ndarray  # ndarray lives in numpy, not in typing


def euclidean(
    x: ndarray,
    y: ndarray
) -> float:
    """Calculate Euclidean distance using sklearn."""
    # Reshape if needed for single vectors: sklearn expects (n_samples, n_features)
    X = x.reshape(1, -1) if x.ndim == 1 else x
    Y = y.reshape(1, -1) if y.ndim == 1 else y
    # sklearn returns an (n_X, n_Y) matrix; take the entry for the first pair
    return euclidean_distances(X, Y)[0, 0]
```
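As a cross-check of the formula, here is a minimal NumPy-only sketch that computes the distance term by term and then shows the scale sensitivity listed under Properties. `euclidean_from_formula` is a hypothetical helper name, not part of sklearn, and the sample vectors are illustrative.

```python
import numpy as np


def euclidean_from_formula(x: np.ndarray, y: np.ndarray) -> float:
    """Euclidean distance computed term by term from the formula (hypothetical helper)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Square each coordinate difference, sum them, then take the square root
    return float(np.sqrt(np.sum(diff ** 2)))


x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])
print(euclidean_from_formula(x, y))   # 5.0, since sqrt(9 + 16 + 0) = 5
print(float(np.linalg.norm(x - y)))   # 5.0, the same value via the L2 norm

# Scale sensitivity: express the first feature in units 1000x smaller
scale = np.array([1000.0, 1.0, 1.0])
print(euclidean_from_formula(x * scale, y * scale))  # about 3000: feature 1 now dominates
```

This is why features are usually standardized before Euclidean-based methods are applied.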
Manhattan Distance

Formula

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

When to use:

• Grid-like patterns (e.g., city blocks)
• When movement is restricted to the axes, so a diagonal move costs more than its straight-line length
• Less sensitive to outliers than Euclidean distance (differences are not squared)

```python
def manhattan(
    x: ndarray,
    y: ndarray
) -> float:
    """Calculate Manhattan distance using sklearn."""
    X = x.reshape(1, -1) if x.ndim == 1 else x
    Y = y.reshape(1, -1) if y.ndim == 1 else y
    return manhattan_distances(X, Y)[0, 0]
```
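The outlier claim can be made concrete with a small NumPy sketch: two vectors at the same L1 distance from the origin, one with small differences everywhere and one with a single large difference. `manhattan_from_formula` is a hypothetical helper, and the vectors are chosen only for illustration.

```python
import numpy as np


def manhattan_from_formula(x: np.ndarray, y: np.ndarray) -> float:
    """Manhattan (L1) distance computed directly from the formula (hypothetical helper)."""
    return float(np.sum(np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))))


origin = np.zeros(4)
spread = np.array([1.0, 1.0, 1.0, 1.0])  # small differences in every coordinate
spike = np.array([4.0, 0.0, 0.0, 0.0])   # one large difference (an "outlier" coordinate)

# L1 treats both the same
print(manhattan_from_formula(origin, spread), manhattan_from_formula(origin, spike))  # 4.0 4.0
# L2 rates the single large deviation twice as far
print(float(np.linalg.norm(spread)), float(np.linalg.norm(spike)))                    # 2.0 4.0
```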
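Cosine Similarity

Formula

$$\text{similarity}(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|}$$

When to use:

• Text analysis and document similarity
• High-dimensional sparse data
• When direction matters more than magnitude

Note: this is a similarity in [−1, 1], not a distance; the corresponding cosine distance is 1 − similarity(x, y).

```python
def cosine_similarity(
    x: ndarray,
    y: ndarray
) -> float:
    """Calculate cosine similarity using sklearn."""
    # Note: this name shadows sklearn.metrics.pairwise.cosine_similarity if both are imported
    X = x.reshape(1, -1) if x.ndim == 1 else x
    Y = y.reshape(1, -1) if y.ndim == 1 else y
    # cosine_distances returns 1 - similarity, so convert back
    return 1 - cosine_distances(X, Y)[0, 0]
```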
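To show why "direction matters more than magnitude", this minimal sketch evaluates the formula directly and rescales one vector. `cosine_from_formula` is a hypothetical helper, and it assumes neither input is the zero vector (the norm in the denominator would be 0).

```python
import numpy as np


def cosine_from_formula(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine similarity computed directly from the formula (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Dot product divided by the product of the L2 norms
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])
print(round(cosine_from_formula(a, b), 4))            # 0.7071, vectors 45 degrees apart
print(round(cosine_from_formula(a, 10 * b), 4))       # 0.7071, scaling b does not change it
print(cosine_from_formula(a, np.array([0.0, 3.0])))   # 0.0, orthogonal vectors
```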

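Mahalanobis Distance

Formula

$$d(x, y) = \sqrt{(x - y)^{T} \, \Sigma^{-1} \, (x - y)}$$

where Σ is the covariance matrix of the data distribution.

When to use:

• Correlated features
• Anomaly detection
• Scale-invariant clustering

```python
from typing import Optional

from sklearn.covariance import EmpiricalCovariance


def mahalanobis(
    x: ndarray,
    y: ndarray,
    cov: Optional[ndarray] = None,
    data: Optional[ndarray] = None
) -> float:
    """Calculate Mahalanobis distance, estimating the covariance with sklearn if needed.

    Pass either `cov` (the covariance matrix) or `data`, a reference sample of
    shape (n_samples, n_features) from which to estimate the covariance.
    """
    X = x.reshape(1, -1) if x.ndim == 1 else x
    Y = y.reshape(1, -1) if y.ndim == 1 else y

    if cov is None:
        if data is None:
            # Estimating the covariance from x and y alone gives a degenerate
            # (rank <= 1, generally singular) matrix, so a reference sample is required
            raise ValueError("Provide either `cov` or a reference sample `data`.")
        cov = EmpiricalCovariance().fit(data).covariance_

    diff = X - Y
    inv_covmat = np.linalg.inv(cov)
    return np.sqrt(
        diff.dot(inv_covmat).dot(diff.T)
    )[0, 0]
```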
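A small worked example helps show what Σ⁻¹ does. This sketch, using a hypothetical helper `mahalanobis_from_formula` and an illustrative covariance matrix, checks that the identity covariance recovers Euclidean distance and that, with correlated features, a step along the correlation counts as shorter than an equally long step against it. It solves a linear system instead of forming the inverse explicitly, a common numerical choice rather than anything required by the formula.

```python
import numpy as np


def mahalanobis_from_formula(x: np.ndarray, y: np.ndarray, cov: np.ndarray) -> float:
    """Mahalanobis distance from the formula (hypothetical helper)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ z = diff, so diff @ z equals diff^T cov^{-1} diff
    z = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ z))


origin = np.array([0.0, 0.0])

# With the identity covariance, Mahalanobis reduces to Euclidean distance
print(mahalanobis_from_formula(np.array([1.0, 1.0]), origin, np.eye(2)))  # sqrt(2), about 1.4142

# Strongly positively correlated features
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
along = mahalanobis_from_formula(np.array([1.0, 1.0]), origin, cov)    # along the correlation
across = mahalanobis_from_formula(np.array([1.0, -1.0]), origin, cov)  # against it
print(along < across)  # True, although both points are sqrt(2) away in Euclidean terms
```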
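Minkowski Distance

Formula

$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$

When to use:

• Generalizing distance metrics (p = 1 gives Manhattan, p = 2 gives Euclidean)
• When you need to tune the influence of large differences (larger p weights them more heavily)
• Experimenting with different p-norms

```python
from sklearn.metrics import pairwise_distances


def minkowski(
    x: ndarray,
    y: ndarray,
    p: float = 2
) -> float:
    """Calculate Minkowski distance using sklearn (p >= 1)."""
    X = x.reshape(1, -1) if x.ndim == 1 else x
    Y = y.reshape(1, -1) if y.ndim == 1 else y
    return pairwise_distances(
        X, Y, metric='minkowski', p=p
    )[0, 0]
```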
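The effect of p can be seen by evaluating the formula for a few values on the same pair of points. This is a minimal sketch with a hypothetical helper `minkowski_from_formula`, assuming p ≥ 1; as p grows the result approaches the largest coordinate difference (the Chebyshev distance).

```python
import numpy as np


def minkowski_from_formula(x: np.ndarray, y: np.ndarray, p: float) -> float:
    """Minkowski distance from the formula (hypothetical helper, assumes p >= 1)."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))


x = np.array([0.0, 0.0])
y = np.array([3.0, 4.0])
print(minkowski_from_formula(x, y, 1))   # 7.0, Manhattan
print(minkowski_from_formula(x, y, 2))   # 5.0, Euclidean
print(minkowski_from_formula(x, y, 10))  # about 4.02, close to max |x_i - y_i| = 4
```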
Jaccard Distance

Formula

$$d(x, y) = 1 - \frac{|x \cap y|}{|x \cup y|}$$

For binary vectors, x and y are treated as the sets of positions that are 1 (True).

When to use:

• Binary or set-based data
• Comparing discrete features
• Document similarity with word sets

```python
def jaccard(
    x: ndarray,
    y: ndarray
) -> float:
    """Calculate Jaccard distance using sklearn."""
    # The 'jaccard' metric expects boolean vectors (True = element present)
    X = (x.reshape(1, -1) if x.ndim == 1 else x).astype(bool)
    Y = (y.reshape(1, -1) if y.ndim == 1 else y).astype(bool)
    return pairwise_distances(
        X, Y, metric='jaccard'
    )[0, 0]
```
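For the "document similarity with word sets" use case, the set form of the formula can be applied to Python sets directly. This is a minimal sketch with a hypothetical helper `jaccard_distance_sets`; treating two empty sets as identical (distance 0) is an assumed convention, since the formula is undefined there.

```python
def jaccard_distance_sets(a: set, b: set) -> float:
    """Jaccard distance on Python sets, straight from the formula (hypothetical helper)."""
    union = a | b
    if not union:
        # Assumed convention: two empty sets are identical
        return 0.0
    return 1.0 - len(a & b) / len(union)


doc1 = set("the cat sat on the mat".split())  # {the, cat, sat, on, mat}
doc2 = set("the dog sat on the log".split())  # {the, dog, sat, on, log}
print(round(jaccard_distance_sets(doc1, doc2), 4))  # 0.5714: 3 shared words out of 7 distinct
```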
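Hamming Distance

Formula

$$d(x, y) = \sum_{i=1}^{n} \mathbb{1}[x_i \neq y_i]$$

When to use:

• Categorical data
• Error detection in communication
• Comparing equal-length strings

Note: sklearn's 'hamming' metric returns the fraction of positions that differ, i.e. the sum above divided by n.

```python
def hamming(
    x: ndarray,
    y: ndarray
) -> float:
    """Calculate normalized Hamming distance using sklearn."""
    X = x.reshape(1, -1) if x.ndim == 1 else x
    Y = y.reshape(1, -1) if y.ndim == 1 else y
    # Returns the fraction of differing positions (count / n_features)
    return pairwise_distances(
        X, Y, metric='hamming'
    )[0, 0]
```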
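The formula counts mismatches, while sklearn reports a normalized value. This minimal pure-Python sketch implements the count version for equal-length strings; `hamming_count` is a hypothetical helper, and dividing by the length recovers the normalized value.

```python
def hamming_count(s: str, t: str) -> int:
    """Hamming distance as a count of differing positions (hypothetical helper)."""
    if len(s) != len(t):
        raise ValueError("Hamming distance requires equal-length inputs")
    # Sum the indicator 1[s_i != t_i] over all positions
    return sum(a != b for a, b in zip(s, t))


print(hamming_count("karolin", "kathrin"))                    # 3
print(hamming_count("1011101", "1001001"))                    # 2
print(hamming_count("karolin", "kathrin") / len("karolin"))   # 3/7, the normalized form
```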
