Euclidean distance, Manhattan distance, Minkowski distance, Mahalanobis distance, Hamming distance

Euclidean distance

Euclidean distance is the most commonly used notion of distance: the straight-line distance between two points in N-dimensional space. For points x = (x_1, …, x_n) and y = (y_1, …, y_n), it is d(x, y) = sqrt(Σ (x_i − y_i)²).


 
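As a minimal sketch, the formula above translates directly into Python (the function name here is illustrative, not from the original post):

```python
import math

def EuclideanDistance(x, y):
    # Square each coordinate difference, sum them, then take the square root
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```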

Manhattan distance

Manhattan distance (also called taxicab distance) is a metric on a standard coordinate system: the sum of the absolute differences of the coordinates of two points, d(x, y) = Σ |x_i − y_i|.

In the figure below, the red line represents the Manhattan distance, the green line represents the Euclidean (straight-line) distance, and the blue and yellow lines are alternative paths with the same Manhattan distance as the red one.

[Figure: Manhattan distance vs. Euclidean distance]

 
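A minimal Python sketch of the definition above (function name is mine, not from the post):

```python
def ManhattanDistance(x, y):
    # Sum of the absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(x, y))
```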

Minkowski distance

Minkowski distance is not a single distance but a family of distances: a general expression, d(x, y) = (Σ |x_i − y_i|^p)^(1/p), that includes Euclidean distance (p = 2) and Manhattan distance (p = 1) as special cases.

 Python implementation

import math
import numpy as np

def MinkowskiDistance(x, y, p):
    # Sum |x_i - y_i|^p over all coordinates, then take the p-th root
    zipped_coordinate = zip(x, y)
    return math.pow(np.sum([math.pow(np.abs(i[0] - i[1]), p) for i in zipped_coordinate]), 1 / p)
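A quick sanity check of the special cases. This NumPy-only rewrite computes the same quantity as the function above (names here are illustrative): p = 1 should reproduce the Manhattan distance and p = 2 the Euclidean distance.

```python
import numpy as np

def minkowski(x, y, p):
    # (sum of |x_i - y_i|^p) ** (1/p), written with NumPy only
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum(np.abs(x - y) ** p) ** (1 / p)

# p = 1 reduces to Manhattan distance, p = 2 to Euclidean distance
assert minkowski([0, 0], [3, 4], 1) == 7.0
assert minkowski([0, 0], [3, 4], 2) == 5.0
```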

Mahalanobis distance

Before introducing the Mahalanobis distance, let's look at the following concepts:

Variance: Variance is the square of the standard deviation, and the standard deviation measures, roughly, the average deviation of each data point from the mean. Both reflect how spread out the data is.

Covariance: Standard deviation and variance describe one-dimensional data. With multidimensional data, we usually want to know whether the variables of different dimensions are related. Covariance is a statistic that measures how two variables vary together; for example, the relationship between a person's height and weight can be measured by covariance. If the covariance between two variables is positive, the two variables are positively correlated; if it is negative, they are negatively correlated.

Covariance matrix: When there are more than two variables, the covariance matrix measures the pairwise relationships among all of them. Suppose X is a column vector composed of n random variables:

X = (X_1, X_2, …, X_n)^T

where μ_i = E(X_i) is the expected value of the i-th element. The (i, j) entry of the covariance matrix Σ (each entry is a covariance) is defined as:

Σ_ij = cov(X_i, X_j) = E[(X_i − μ_i)(X_j − μ_j)]

That is, the (i, j)-th element of the matrix is the covariance of X_i and X_j.

 
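The definitions above can be checked numerically with NumPy's `np.cov` (the data here are made up for illustration; rows are variables, columns are observations):

```python
import numpy as np

# Three variables, five observations each (rows = variables)
X = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [2.0, 4.0, 6.0, 8.0, 10.0],
              [5.0, 4.0, 3.0, 2.0, 1.0]])

S = np.cov(X)  # 3x3 covariance matrix; S[i, j] = cov(X_i, X_j)

# Row 1 is 2 * row 0, so they move together: positive covariance.
# Row 2 decreases as row 0 increases: negative covariance.
assert S[0, 1] > 0
assert S[0, 2] < 0
```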

Mahalanobis distance is a distance measure that can be regarded as a correction to Euclidean distance: it accounts for the facts that the dimensions may have inconsistent scales and may be correlated (i.e., not independent and identically distributed). For points x and y drawn from a distribution with covariance matrix Σ, it is defined as D_M(x, y) = sqrt((x − y)^T Σ⁻¹ (x − y)).

 
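A minimal sketch of the formula above, assuming the covariance matrix is estimated from a sample array `data` of shape (n_samples, n_features); the function name and parameters are illustrative, not from the original post:

```python
import numpy as np

def MahalanobisDistance(x, y, data):
    # Estimate the covariance matrix from the sample data
    # (rowvar=False: each column of data is one variable)
    cov = np.cov(data, rowvar=False)
    inv_cov = np.linalg.inv(cov)
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # sqrt((x - y)^T  Sigma^-1  (x - y))
    return float(np.sqrt(diff @ inv_cov @ diff))
```

Note that `np.linalg.inv` fails when the covariance matrix is singular; in practice a pseudo-inverse (`np.linalg.pinv`) is a common fallback.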

Hamming distance

Hamming distance is a distance measure used in error-control coding for data transmission: it is the number of positions at which two (equal-length) strings differ. For binary strings, XOR the two strings and count the 1s; that count is the Hamming distance. Equivalently, the Hamming distance is the minimum number of substitutions required to change one of two equal-length strings into the other.
 

Python implementation

def HammingDistance(x, y):
    # Count the positions where the two equal-length sequences differ
    return sum(x_ch != y_ch for x_ch, y_ch in zip(x, y))
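For integers, the XOR-and-count description above can be implemented directly (this variant is my addition, not from the original post):

```python
def HammingDistanceInt(a, b):
    # XOR the two integers, then count the 1 bits in the result
    return bin(a ^ b).count("1")
```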


Origin blog.csdn.net/m0_61897853/article/details/126839902