A distance metric uses a distance function to quantify the relationship between
elements in a dataset.
A good distance metric can significantly improve the performance of classification,
clustering, and information retrieval. So, in this blog, we are going to understand
distance metrics used in machine learning models, such as Euclidean and Manhattan
distance, in depth, and how they help in machine learning modelling.
Euclidean Distance Metric:
Euclidean Distance represents the shortest distance between two points.
The “Euclidean Distance” between two objects is the distance you would expect in “flat” or
“Euclidean” space; it’s named after Euclid, who worked out the rules of geometry on a flat
surface.
Euclidean distance is often the "default" distance used in, e.g., K-nearest neighbours
(classification) or K-means (clustering) to find the "k closest points" to a particular
sample point. The "closeness" is defined by the difference ("distance") along the scale
of each variable, which can then be converted to a similarity measure.
It is only one of many options for measuring the distance between two vectors/data
objects. However, many classification algorithms, as mentioned above, use it either to
train the classifier or to decide the class membership of a test observation, and
clustering algorithms (e.g., K-means, K-medoids) use it to assign data objects to
clusters.
Mathematically, it is calculated using Pythagoras' theorem: the square of the total
distance between two objects is the sum of the squares of the distances along each
perpendicular coordinate.
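The Pythagorean calculation above can be sketched as a small helper function (a minimal illustration; the function name `euclidean_distance` is my own, not from the article):

```python
import math

def euclidean_distance(p, q):
    """Square root of the sum of squared differences along each coordinate
    (Pythagoras' theorem generalised to any number of dimensions)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Between (1, 2) and (4, 6): sqrt(3^2 + 4^2) = sqrt(25) = 5.0
print(euclidean_distance((1, 2), (4, 6)))  # 5.0
```

In practice you would typically use a library routine such as `scipy.spatial.distance.euclidean` or `numpy.linalg.norm`, but the arithmetic is exactly this.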
Manhattan Distance Metric:
Manhattan Distance is the sum of absolute differences between points across all the
dimensions.
Manhattan distance is a metric in which the distance between two points is the sum of
the absolute differences of their Cartesian coordinates. Put simply, in two dimensions
it is the sum of the absolute differences between the x-coordinates and the
y-coordinates.
This Manhattan distance metric is also known as Manhattan length, rectilinear distance,
L1 distance or L1 norm, city block distance, Minkowski's L1 distance, or the taxi-cab
metric.
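The definition above translates directly into code (again a minimal sketch; the function name is my own):

```python
def manhattan_distance(p, q):
    """Sum of the absolute differences of the coordinates (L1 / city-block distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Between (1, 2) and (4, 6): |1 - 4| + |2 - 6| = 3 + 4 = 7
print(manhattan_distance((1, 2), (4, 6)))  # 7
```

Note that for the same pair of points the Manhattan distance (7) is never smaller than the Euclidean distance (5), since it measures the path along the grid rather than the straight line.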
Applications of Manhattan distance metric include,
1. Regression analysis: it is used in linear regression (least absolute deviations, or
L1 regression) to find a straight line that fits a given set of points.
2. Compressed sensing: when solving an underdetermined system of linear equations, the
regularisation term for the parameter vector is expressed in terms of the Manhattan
distance. This approach appears in the signal-recovery framework called compressed
sensing.
3. Frequency distribution: It is used to assess the differences in discrete frequency
distributions.
Now, apart from these, there are other popular distance metrics, which are,
1. Hamming Distance: used to calculate the distance between binary vectors, as the
number of positions at which they differ.
2. Minkowski Distance: a generalisation of Euclidean and Manhattan distance.
3. Cosine distance: one minus the cosine similarity, which measures the similarity
between two vectors of an inner product space via the angle between them.
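These three metrics can be sketched in a few lines each (the function names and the order parameter `r` are my own labels for illustration):

```python
import math

def hamming_distance(u, v):
    """Number of positions at which two binary vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def minkowski_distance(p, q, r):
    """Generalised metric: r=1 gives Manhattan distance, r=2 gives Euclidean."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)

def cosine_distance(u, v):
    """One minus cosine similarity, i.e. 1 - cos(angle between the vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)

print(hamming_distance([1, 0, 1, 1], [1, 1, 0, 1]))   # 2 (differ in two positions)
print(minkowski_distance((1, 2), (4, 6), 2))          # 5.0 (reduces to Euclidean)
print(cosine_distance((1, 0), (0, 1)))                # 1.0 (orthogonal vectors)
```

Setting `r=1` in `minkowski_distance` reproduces the Manhattan result from earlier, which is exactly the sense in which Minkowski distance generalises both metrics.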