Intuitive understanding - Mahalanobis distance

First of all, we all have a good understanding of the Euclidean distance, which measures the distance between two points in Euclidean space (that is, our familiar coordinate system).
For example, the Euclidean distance between the points $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ is:

$$d(x,y) = \sqrt{(x_1-y_1)^2+(x_2-y_2)^2+\dots+(x_n-y_n)^2} = \sqrt{\sum_{i=1}^{n}(x_i-y_i)^2} = \sqrt{(x-y)^\mathsf{T}(x-y)}$$
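As a quick sanity check, here is a minimal NumPy sketch of this formula (the two points are arbitrary example values); the element-wise sum, the vector form $(x-y)^\mathsf{T}(x-y)$, and `np.linalg.norm` all agree:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

# element-wise form: sqrt of the sum of squared coordinate differences
d1 = np.sqrt(np.sum((x - y) ** 2))
# vector form: sqrt((x - y)^T (x - y))
d2 = np.sqrt((x - y) @ (x - y))
# library form
d3 = np.linalg.norm(x - y)

print(d1, d2, d3)  # all three give the same value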

The Mahalanobis distance is mostly used in anomaly detection. As shown in the figure below, our data is distributed in an ellipse whose center is the green point in the middle. Now we need to judge which of the green point and the blue point outside is an outlier, and which is a normal point.

It is obvious that the green point lies inside the ellipse, so the green point should be the normal point and the blue point should be the outlier. But measured by Euclidean distance, the blue point would be the normal one. So the Euclidean distance clearly has a problem here.

Let's take another one-dimensional example. Suppose there are two classes measured in the same units: the first class has mean 1 and variance 0.1, and the second class has mean 5 and variance 4. Which class is a point with value 2.5 more likely to belong to?

In terms of Euclidean distance it should be the first class, but intuitively it is obviously the second, because the values of the first class lie roughly in the range $1 \pm 0.1$, while those of the second class lie in the range $5 \pm 4$, so the point must belong to the second class.
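A quick numerical sketch of this reasoning (here I measure the spread of each class by its standard deviation, the square root of the given variance; the choice of yardstick is my assumption):

import numpy as np

mu1, var1 = 1.0, 0.1   # class 1: mean 1, variance 0.1
mu2, var2 = 5.0, 4.0   # class 2: mean 5, variance 4
point = 2.5

# plain (Euclidean) distance to each mean: favours class 1
print(abs(point - mu1), abs(point - mu2))            # 1.5 vs 2.5

# distance measured in standard deviations: favours class 2
print(abs(point - mu1) / np.sqrt(var1),              # ~4.7 sigma away from class 1
      abs(point - mu2) / np.sqrt(var2))              # ~1.25 sigma away from class 2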

See the clue yet? It's the mean and variance problem.

Before formally starting on the Mahalanobis distance, I hope you have some grounding in PCA: see "PCA principal component analysis - clear, detailed and easy to understand". That article is written very clearly and in detail; it is worth setting aside ten minutes to understand PCA.

From that article, we know that the data can first be centered by subtracting the mean (left of the figure below), and then a change of basis maps the original data to the form shown on the right of the figure below.
[figure: data after centering (left) and after the basis transformation (right)]

This only makes things intuitively more convenient; it still does not change the Euclidean distances from the blue and green points to the center point.

If you are careful, you will have noticed the clue again: the variance of the data in the horizontal direction is relatively large, while the variance in the vertical direction is relatively small, so we still need to standardize each dimension (transform it to a distribution with mean 0 and variance 1).
There are two dimensions here, so both the x and y directions need to be standardized.

$$x_{new} = \frac{x - \mu}{\sigma}$$

where μ is the sample mean and σ is the standard deviation of the sample data.

Here I randomly made some data

x = [0,-3,-3,-2,-2,-2,-1,-1,-1,-1,0,0,1,1,2,2,2,3]
y = [-1.5,-0.5,0.4,-0.5,-1,0.2,0.1,0,-1,0.5,1.2,-0.5,0.8,0.5,0,-0.5,0.5,0]

Visualize it as shown in the figure:
[figure: scatter plot of the raw data]
You can see the points $(0, -1.5)$ and $(3, 0)$; the intuitive feeling is that $(3, 0)$ is far from the origin and is the outlier. But look at the distribution of the data again: the variance along the x-axis is large and the variance along the y-axis is small. Now let's center and standardize:

[figure: scatter plot of the standardized data]
In the same coordinate range, after standardization and scaling, the original point $(3, 0)$ is now closer to the origin.

import numpy as np
import matplotlib.pyplot as plt

x = [0,-3,-3,-2,-2,-2,-1,-1,-1,-1,0,0,1,1,2,2,2,3]
y = [-1.5,-0.5,0.4,-0.5,-1,0.2,0.1,0,-1,0.5,1.2,-0.5,0.8,0.5,0,-0.5,0.5,0]

x = np.array(x)
y = np.array(y)

# raw data: mark the origin and the two points of interest, (0, -1.5) and (3, 0), in red
plt.plot(x, y, '+')
plt.plot(0, 0, 'ro')
plt.plot(x[0], y[0], 'ro')
plt.plot(x[-1], y[-1], 'ro')
plt.xlim(-4, 4)
plt.ylim(-3, 3)
plt.show()

# sample means and standard deviations of each dimension
a = np.mean(x)
b = np.mean(y)
print(np.mean(x - a), np.mean(y - b))   # centred data has mean 0
print(np.std(x), np.std(y))             # x has the larger spread

# standardize each dimension: subtract the mean, divide by the standard deviation
x1 = (x - a) / np.std(x)
y1 = (y - b) / np.std(y)
print(x1)
print(y1)

# standardized data in the same coordinate range: (3, 0) moves much closer to the origin
plt.plot(x1, y1, '+')
plt.plot(0, 0, 'ro')
plt.plot(x1[0], y1[0], 'ro')
plt.plot(x1[-1], y1[-1], 'ro')
plt.xlim(-4, 4)
plt.ylim(-3, 3)
plt.show()

From this PCA article we know: let $C$ be the covariance matrix of $X$, and let $Y$ be the data matrix obtained by applying a basis transformation $P$ to $X$, that is, $Y = PX$, where $P$ is an orthogonal matrix. The basis transformation performed above is exactly this $Y = PX$ process. After the transformation we have $Y$, and the Euclidean distance computed on $Y$ (after standardization) is the so-called Mahalanobis distance.

So how to combine the above process in a unified way?

Let $D$ be the covariance matrix of $Y$. Then:
$$D = \frac{1}{m}YY^\mathsf{T} = \frac{1}{m}(PX)(PX)^\mathsf{T} = \frac{1}{m}PXX^\mathsf{T}P^\mathsf{T} = P\Big(\frac{1}{m}XX^\mathsf{T}\Big)P^\mathsf{T} = PCP^\mathsf{T}$$

From PCA we know that the transformed data is uncorrelated (orthogonal) in the new vector space, so $D$ is diagonal. Since $X$ is known, its covariance matrix $C$ is also known; the question is how to find the $P$ that transforms $X$.

Fortunately, $C$ is a real symmetric matrix, and in linear algebra a real symmetric matrix has a series of very nice properties:

  1. The eigenvectors corresponding to different eigenvalues of a real symmetric matrix are necessarily orthogonal.
  2. If an eigenvalue $\lambda$ has multiplicity $r$, then there must be exactly $r$ linearly independent eigenvectors corresponding to $\lambda$, so these $r$ eigenvectors can be orthonormalized.

From these two properties, a real symmetric matrix with $n$ rows and $n$ columns must have $n$ orthonormal eigenvectors. Denote them $e_1, e_2, \cdots, e_n$ and arrange them by columns into a matrix:
$$E = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix}$$
Then the covariance matrix $C$ satisfies:
$$E^\mathsf{T}CE = \Lambda = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix}$$

where $\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues corresponding to the eigenvectors (possibly with repetitions).

Rigorous mathematical proofs of the above conclusions are not given here; readers interested in the proofs can refer to the section on "diagonalization of real symmetric matrices" in a linear algebra textbook.

At this point, we find that we have found the required matrix P:

$$P = E^\mathsf{T}$$

$P$ is the matrix obtained by normalizing the eigenvectors of the covariance matrix and arranging them as rows, i.e. each row of $P$ is an eigenvector of $C$. If the rows of $P$ are ordered from top to bottom according to the eigenvalues in $\Lambda$ from largest to smallest, then multiplying the original data matrix $X$ by the matrix formed from the first $K$ rows of $P$ yields the dimension-reduced data matrix $Y$ we need.
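As a small illustration of this construction, here is a sketch that builds $C$ from the toy data used earlier, takes its orthonormal eigenvectors with `np.linalg.eigh`, sets $P = E^\mathsf{T}$, and checks that $PCP^\mathsf{T}$ is diagonal:

import numpy as np

x = np.array([0,-3,-3,-2,-2,-2,-1,-1,-1,-1,0,0,1,1,2,2,2,3], dtype=float)
y = np.array([-1.5,-0.5,0.4,-0.5,-1,0.2,0.1,0,-1,0.5,1.2,-0.5,0.8,0.5,0,-0.5,0.5,0])

X = np.vstack([x - x.mean(), y - y.mean()])   # centred data, one row per dimension
C = X @ X.T / X.shape[1]                      # covariance matrix C = (1/m) X X^T

eigvals, E = np.linalg.eigh(C)                # columns of E are orthonormal eigenvectors
P = E.T                                       # P: eigenvectors arranged by rows

print(np.round(P @ C @ P.T, 6))               # diagonal matrix of eigenvalues, i.e. P C P^T = Λ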

I could not work out how to take the derivation further on my own, so I turned to the article below.

Original address: www.cnblogs.com

Mahalanobis Distance

The Mahalanobis distance was proposed by the Indian statistician P. C. Mahalanobis and represents the covariance distance of the data. It is an efficient way to compute the similarity between two unknown sample sets. It takes the correlations between data features into account and is scale-invariant, i.e. independent of the measurement scale.

Definition of Mahalanobis distance

Assume $x$ and $y$ are two samples drawn at random from a population $G$ with mean vector $\mu$ and covariance matrix $\Sigma$. The Mahalanobis distance between the two points $x$ and $y$ is defined as:

$$d_m^2(x, y) = (x-y)^\mathsf{T} \Sigma^{-1}(x-y)$$

The Mahalanobis distance from $x$ to the population $G$ is defined as:

$$d_m^2(x, \mu_G) = (x-\mu_G)^\mathsf{T} \Sigma^{-1}(x-\mu_G)$$

In particular, if the covariance matrix is the identity matrix, i.e. the dimensions are independent and identically distributed, the Mahalanobis distance reduces to the Euclidean distance.

Note: The above two expressions are actually the square of the Mahalanobis distance
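For concreteness, here is a hedged NumPy sketch of both definitions, using a synthetic population (the mean, covariance, and the points `x` and `y` are made up for illustration); `scipy.spatial.distance.mahalanobis` returns the unsquared distance, so it is squared before comparing:

import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)
G = rng.multivariate_normal(mean=[0, 0], cov=[[4, 1], [1, 1]], size=500)  # toy population

mu = G.mean(axis=0)
Sigma = np.cov(G, rowvar=False)
Sigma_inv = np.linalg.inv(Sigma)

x = np.array([2.0, 1.0])
y = np.array([-1.0, 0.5])

# squared Mahalanobis distance between two samples: (x-y)^T Sigma^{-1} (x-y)
d2_xy = (x - y) @ Sigma_inv @ (x - y)
# squared Mahalanobis distance from x to the population centre
d2_xG = (x - mu) @ Sigma_inv @ (x - mu)

print(d2_xy, d2_xG)
print(mahalanobis(x, y, Sigma_inv) ** 2)   # matches d2_xy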

Why define Mahalanobis distance

1. The effect of measurement units on the distance measure

In many machine learning problems, the relationship between samples can be described by a distance, the most common being the Euclidean distance. For any two points $P=(x_1, x_2, \dots, x_p)$ and $Q=(y_1, y_2, \dots, y_p)$ in space, the Euclidean distance between them is:

$$d(P,Q)=\sqrt{(x_1-y_1)^2 + (x_2-y_2)^2 + \dots + (x_p-y_p)^2}$$

Obviously, when the point $Q$ is fixed at the origin (all coordinates equal to $0$), this is the distance from point $P$ to the coordinate origin.

One drawback of the Euclidean distance is that when the components are quantities of different kinds (for example a person's height and weight, or a watermelon's weight and volume), the size of the "distance" depends on the units of the variables. For example, let the horizontal axis $x_1$ represent weight (in kg) and the vertical axis $x_2$ represent length (in cm). There are four points $A, B, C, D$, as shown in Figure 2.1.


Figure 2.1

At this time:

$$d(A, B) = \sqrt{5^2+10^2} = \sqrt{125}$$
$$d(C, D) = \sqrt{10^2+1^2} = \sqrt{101}$$

Obviously, $AB$ should be longer than $CD$.

Now suppose $x_2$ is measured in millimeters (mm) while $x_1$ stays unchanged. Then $A$'s coordinates become $(0, 50)$ and $C$'s coordinates become $(0, 100)$, and:

$$d(A, B) = \sqrt{50^2+10^2} = \sqrt{2600}$$
$$d(C, D) = \sqrt{100^2+1^2} = \sqrt{10001}$$

Now $CD$ comes out longer than $AB$! This is obviously unreasonable.
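The figure itself is not reproduced here, but the effect is easy to check numerically with coordinates chosen to be consistent with the distances quoted above (these exact coordinates are my assumption, not read from the figure):

import numpy as np

# hypothetical coordinates consistent with the distances above
A, B = np.array([0, 5.0]), np.array([10, 0.0])   # x1 in kg, x2 in cm
C, D = np.array([0, 10.0]), np.array([1, 0.0])

print(np.linalg.norm(A - B), np.linalg.norm(C - D))   # sqrt(125) > sqrt(101): AB is longer

to_mm = np.array([1, 10.0])                            # convert x2 from cm to mm
print(np.linalg.norm((A - B) * to_mm),
      np.linalg.norm((C - D) * to_mm))                 # sqrt(2600) < sqrt(10001): the order flips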

2. The influence of sample distribution on the distance measure

Although we can first normalize the data to remove the effect of different measurement units across dimensions, the sample distribution also affects the distance measure. Note that every coordinate carries equal weight in the Euclidean distance. But in real problems, the observed values along different axes often have random fluctuations of different sizes, i.e. different variances, so their weights obviously should not be equal.

Here are two one-dimensional examples to illustrate this problem:

Assume two normally distributed populations $G_1: N(\mu_1, \sigma_1^2)$ and $G_2: N(\mu_2, \sigma_2^2)$. If a sample takes its value at point A, which population is A closer to? See Figure 2.2:


Figure 2.2

In terms of absolute length, point A is closer to the population $G_1$ on the left, i.e. the distance from A to $\mu_1$ is smaller than the distance to $\mu_2$ (this is the Euclidean view: compare the absolute differences between A's coordinate and $\mu_1$ and $\mu_2$). But from a probabilistic point of view, A lies about $4\sigma_1$ to the right of $\mu_1$, while it lies only about $3\sigma_2$ to the left of $\mu_2$; measured in standard deviations, A is "closer" to $\mu_2$. The latter view is probabilistic: it divides the squared coordinate difference by the variance (equivalently, multiplies by the reciprocal of the variance), turning it into a dimensionless quantity. Extended to multiple dimensions, this becomes multiplying by the inverse of the covariance matrix, $\Sigma^{-1}$.
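A tiny sketch of this comparison with hypothetical numbers chosen to match the description (A lies $4\sigma_1$ to the right of $\mu_1$ and $3\sigma_2$ to the left of $\mu_2$; the concrete values are assumptions):

import numpy as np

mu1, sigma1 = 0.0, 1.0
mu2, sigma2 = 10.0, 2.0
A = 4.0

print(abs(A - mu1), abs(A - mu2))                      # 4 < 6: Euclidean says A is closer to G1
print(abs(A - mu1) / sigma1, abs(A - mu2) / sigma2)    # 4 > 3: in standard deviations A is closer to G2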

Now look at Figure 2.3 below: A and B are the same distance from the origin. But since the sample population is spread along the horizontal axis, point B is more likely to be a point of this sample, while point A is more likely to be an outlier. A small difference along a dimension with small variance can already make a point an outlier.


Figure 2.3

3. The influence of correlation between dimensions on distance measurement

The situations described so far had uncorrelated dimensions or features. What if there is correlation between the dimensions? See Figure 2.4 below:


Figure 2.4

The sample points are distributed approximately along the line $f(x) = x$, and A and B are still the same distance from the origin. Obviously A looks more like an outlier, i.e. it lies farther from the sample population.

Even if the data has been standardized, the relationship between the distances of A and B to the origin does not change. To solve this problem at its root, we need the principal-component standardization used in principal component analysis.

Why doesn't standardization change the relationship between the distances? Here is a brief explanation using the most common z-score standardization (also called standard-deviation standardization) as an example. This method standardizes the data using the mean and standard deviation of the original data. The processed data follow a standard normal distribution, i.e. mean 0 and standard deviation 1, and the transformation is $x^* = (x - \mu)/\sigma$, where $\mu$ is the mean of all sample data and $\sigma$ is the standard deviation of all sample data.

Clearly $\mu$ and $\sigma$ are the same for A and B here, so standardization is only equivalent to a proportional scaling of the distances from A and B to the data center; it does not change their relationship.
It follows that the standardized Euclidean distance alone still has serious problems: the correlation in the data still strongly affects the judgement.
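A short sketch of this point: generate data that lies roughly along $f(x)=x$, z-score each dimension separately, and observe that the correlation between the dimensions is essentially unchanged (the synthetic data here is only for illustration):

import numpy as np

rng = np.random.default_rng(1)
t = rng.normal(size=1000)
data = np.column_stack([t, t + 0.1 * rng.normal(size=1000)])   # strongly correlated, roughly along f(x) = x

z = (data - data.mean(axis=0)) / data.std(axis=0)              # z-score each dimension separately

print(np.corrcoef(data, rowvar=False)[0, 1])   # high correlation before...
print(np.corrcoef(z, rowvar=False)[0, 1])      # ...and still high after standardization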

Why does the z-score standardized variable have mean 0 and standard deviation 1?

Subtracting $\mu$ only shifts the mean (to 0) and leaves the standard deviation unchanged; dividing by $\sigma$ then scales the standard deviation down by a factor of $\sigma$, making it 1. That is:
$$E(x^*) = E\left[\frac{x-E(x)}{\sqrt{D(x)}}\right] = \frac{1}{\sqrt{D(x)}}E[x-E(x)] = 0$$

$$D(x^*) = D\left[\frac{x-E(x)}{\sqrt{D(x)}}\right] = \frac{1}{D(x)}D[x-E(x)] = \frac{D(x)}{D(x)} = 1$$

OK! To sum up, the problems are:

1) Consider the influence of correlation between dimensions
2) Consider the influence of variance
3) Consider the influence of measurement units or dimensions

Therefore we need to establish a "statistical distance" (just a term to distinguish it from the usual Euclidean distance). The distance should be independent of the units used for each variable; it turns out that the distance we want depends on the variances and covariances of the samples. The Mahalanobis distance is the most commonly used statistical distance.

Geometric meaning of Mahalanobis distance

So what should we do? If you have followed the discussion above, you already know: we only need to rotate the variables onto the principal components so that the dimensions become independent of each other, and then standardize them so that every dimension has the same distribution. After that, computing the Mahalanobis distance reduces to computing a Euclidean distance.

According to principal component analysis, the principal components are the directions of the eigenvectors, and the variance along each direction is the corresponding eigenvalue. So we only need to rotate onto the eigenvector directions and then rescale each axis according to its eigenvalue (dividing the coordinates by $\sqrt{\lambda_i}$). This gives the result in Figure 2.5:


Figure 2.5

The outliers are successfully separated, and the Mahalanobis distance at this time is the Euclidean distance.
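Here is a sketch of this "rotate, then scale" recipe on synthetic correlated data (the covariance and test points are made up for illustration): after whitening, the plain Euclidean distance to the centre equals the Mahalanobis distance, and the off-cloud point is clearly the farther one.

import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal(mean=[0, 0], cov=[[4, 3], [3, 4]], size=1000)   # correlated cloud along y = x

mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False)
eigvals, U = np.linalg.eigh(Sigma)          # principal directions (columns of U) and their variances

# rotate onto the principal axes, then scale each axis by 1/sqrt(eigenvalue)
W = np.diag(1.0 / np.sqrt(eigvals)) @ U.T   # whitening matrix
Z = (X - mu) @ W.T                          # whitened data
print(np.round(np.cov(Z, rowvar=False), 2)) # covariance is (approximately) the identity

p = np.array([3.0, 3.0])                    # a point along the cloud (like B in the figures)
q = np.array([3.0, -3.0])                   # a point off the cloud (like A)

for point in (p, q):
    d_euclid_white = np.linalg.norm(W @ (point - mu))                       # Euclidean distance after whitening
    d_maha = np.sqrt((point - mu) @ np.linalg.inv(Sigma) @ (point - mu))    # Mahalanobis distance
    print(d_euclid_white, d_maha)           # the two agree; the off-cloud point is much farther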

Derivation of the Mahalanobis distance

Suppose the original multidimensional sample data is $X_{n\times m}$ ($n$ rows, $m$ columns):

$$X_{n\times m} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix}$$

Each row is a sample, and $X_i$ denotes the $i$-th dimension of the samples: $X_i=(x_{1i}, x_{2i}, \dots, x_{ni})^\mathsf{T}$, $i=1, 2, \dots, m$. The multidimensional sample data can then be written as $X = (X_1, X_2, \dots, X_m)$.

The mean vector of the sample population is written as: $\mu_X=(\mu_{X1},\mu_{X2},\dots,\mu_{Xm})$

The covariance matrix is written as: $\Sigma_X = E[(X-\mu_X)^\mathsf{T}(X-\mu_X)] = \frac{1}{n}(X-\mu_X)^\mathsf{T}(X-\mu_X)$

As described earlier, the data must first be rotated onto the principal components so that the dimensions become linearly independent. Suppose the original data $X$ is transformed by the rotation matrix $U$ into new coordinates; call the resulting data set $F$ (it is really the same data set as $X$, just expressed in different coordinates, so we use a different symbol). Denote the mean vector of $F$ by $\mu_F=(\mu_{F1},\mu_{F2},\dots,\mu_{Fm})$. Then:

$$F^\mathsf{T} = (F_1, F_2, \dots, F_m) = UX^\mathsf{T}$$

$$(F-\mu_F)^\mathsf{T} = U(X-\mu_X)^\mathsf{T}$$

$$(F-\mu_F) = (X-\mu_X)U^\mathsf{T}$$

The transformed dimensions are linearly independent, and the variance of each dimension is an eigenvalue, i.e. the covariance matrix $\Sigma_F$ is diagonal, so:

$$\begin{aligned} \Sigma_F &= E[(F-\mu_F)^\mathsf{T}(F-\mu_F)] \\ &= \frac{1}{n}(F-\mu_F)^\mathsf{T}(F-\mu_F) \\ &= \frac{1}{n}U(X-\mu_X)^\mathsf{T}(X-\mu_X)U^\mathsf{T} \\ &= U\Sigma_X U^\mathsf{T} \\ &= \begin{pmatrix}\lambda_1 & & & \\ &\lambda_2 & & \\ & &\ddots & \\ & & &\lambda_m\end{pmatrix} \end{aligned}$$

where $\lambda_i$, $i=1, 2, \dots, m$, represents the variance of each dimension.

Having come this far, we can derive the Mahalanobis distance formula. Suppose we want the Mahalanobis distance from a sample point $x=(x_1, x_2, \dots, x_m)$ of $X$ to the centroid $\mu_X=(\mu_{X1},\mu_{X2},\dots,\mu_{Xm})$. This is equivalent to finding the Euclidean distance, after standardization, from the corresponding point $f = (f_1, f_2, \dots, f_m)$ of $F$ to the centroid $\mu_F=(\mu_{F1},\mu_{F2},\dots,\mu_{Fm})$.

$$\begin{aligned} d_m^2(f, \mu_F) &= \Big(\frac{f_1-\mu_{F1}}{\sqrt{\lambda_1}}\Big)^2 + \Big(\frac{f_2-\mu_{F2}}{\sqrt{\lambda_2}}\Big)^2 + \dots + \Big(\frac{f_m-\mu_{Fm}}{\sqrt{\lambda_m}}\Big)^2 \\ &= (f_1-\mu_{F1},f_2-\mu_{F2},\dots,f_m-\mu_{Fm})\begin{pmatrix}\frac{1}{\lambda_1} & & & \\ &\frac{1}{\lambda_2} & & \\ & &\ddots & \\ & & &\frac{1}{\lambda_m}\end{pmatrix}\begin{pmatrix}f_1-\mu_{F1} \\ f_2-\mu_{F2} \\ \vdots \\ f_m-\mu_{Fm}\end{pmatrix} \\ &= (f-\mu_F)(U\Sigma_X U^\mathsf{T})^{-1}(f-\mu_F)^\mathsf{T} \\ &= (x-\mu_X)U^\mathsf{T}(U\Sigma_X U^\mathsf{T})^{-1}U(x-\mu_X)^\mathsf{T} \\ &= (x-\mu_X)U^\mathsf{T}(U^\mathsf{T})^{-1}\Sigma_X^{-1}U^{-1}U(x-\mu_X)^\mathsf{T} \\ &= (x-\mu_X)\Sigma_X^{-1}(x-\mu_X)^\mathsf{T} \end{aligned}$$

This is the calculation formula of the Mahalanobis distance defined earlier

If $x$ is a column vector, then:

$$d_m^2(f, \mu_F) = (x-\mu_X)^\mathsf{T}\Sigma_X^{-1}(x-\mu_X)$$

If the centroid $\mu_X$ in the derivation is replaced by any other sample point $y$, we obtain the Mahalanobis distance between two sample points:

$$d_m^2(x, y) = (x-y)^\mathsf{T}\Sigma_X^{-1}(x-y)$$
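As a numerical check of the derivation (the covariance matrix and sample points here are random, purely for verification), the distance computed in the rotated frame, $(x-y)U^\mathsf{T}(U\Sigma U^\mathsf{T})^{-1}U(x-y)^\mathsf{T}$, agrees with the formula above:

import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 3 * np.eye(3)            # a random symmetric positive-definite covariance

eigvals, E = np.linalg.eigh(Sigma)
U = E.T                                    # orthogonal rotation onto the principal axes

x = rng.normal(size=3)
y = rng.normal(size=3)

lhs = (x - y) @ U.T @ np.linalg.inv(U @ Sigma @ U.T) @ U @ (x - y)   # distance computed in the rotated frame
rhs = (x - y) @ np.linalg.inv(Sigma) @ (x - y)                       # the Mahalanobis formula
print(lhs, rhs)                            # the two agree, confirming the derivation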

Finally, let $E$ denote a point set and $d$ a distance, i.e. a map from $E \times E$ to $[0, \infty)$; it can be proved that the Mahalanobis distance satisfies the basic properties of a distance measure.

that's all!


Reference sources:
1) https://www.jianshu.com/p/5706a108a0c6
2) https://blog.csdn.net/u010167269/article/details/51627338
3) https://zhuanlan.zhihu.com/p/46626607
4) Multivariate Statistical Analysis- He Xiaoqun

I really wanted to complain about CSDN: I searched for a long time and couldn't make sense of anything, and only understood after reading this article on cnblogs...
In the end, what you work through yourself is what really sticks!


Origin blog.csdn.net/weixin_45755332/article/details/128616305