Common distance metrics

In general, two measures are commonly used: Euclidean distance and cosine similarity.

  Euclidean distance is affected by the scale of the indicators, that is, by the units in which each one is measured. The data is therefore generally standardized before use. The greater the distance, the greater the difference between individuals.

 Cosine similarity is not affected by the overall scale (magnitude) of the vectors. Its value lies in the interval [-1, 1]; the greater the cosine value, the more similar the two individuals.

   1, Euclidean distance: also known as the Euclidean metric

   The distance between two points x and y in n-dimensional space is written as:

   d(x, y) = sqrt((x1-y1)^2 + (x2-y2)^2 + ... + (xn-yn)^2)
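
A minimal Python sketch of the formula above, using only the standard library. The function name `euclidean_distance` and the sample points are illustrative and do not come from the original post.

```python
import math

def euclidean_distance(x, y):
    """Return sqrt((x1-y1)^2 + ... + (xn-yn)^2) for two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of components")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Illustrative points (made-up values): a 3-4-5 right triangle.
p = [0.0, 0.0]
q = [3.0, 4.0]
print(euclidean_distance(p, q))  # 5.0
```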

     Improved Method 1:

          Standardized Euclidean distance: when the components are distributed differently, each component is first standardized so that all components have the same mean and variance.

          Standardized value = (value before standardization - component mean) / component standard deviation
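
The standardization step can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper names and the sample data are hypothetical, and the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say which variant to use. A column with zero standard deviation would cause a division by zero and is not handled here.

```python
import math
import statistics

def standardize_columns(rows):
    """Z-score each column: (value - column mean) / column standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Assumption: population standard deviation; the source does not specify.
    stds = [statistics.pstdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical data: height in meters, weight in grams.
data = [
    [1.60, 50000.0],
    [1.80, 50500.0],
    [1.61, 52000.0],
]
# The same individuals with weight expressed in kilograms.
data_kg = [[h, w / 1000.0] for h, w in data]

z = standardize_columns(data)
z_kg = standardize_columns(data_kg)

# The raw distance depends on the unit chosen for weight ...
print(euclidean_distance(data[0], data[1]), euclidean_distance(data_kg[0], data_kg[1]))

# ... while the standardized distance does not (up to floating-point rounding).
d_grams = euclidean_distance(z[0], z[1])
d_kg = euclidean_distance(z_kg[0], z_kg[1])
print(d_grams, d_kg)
```

After standardization every column has mean 0 and standard deviation 1, so no single indicator dominates the distance just because of its unit.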

     Improved Method 2:

         Mahalanobis distance: the distance between a point and a distribution. It takes the correlations between the features into account and is scale-independent.

         For a multivariate vector x with mean μ and covariance matrix Σ, the Mahalanobis distance is sqrt((x-μ)^T Σ^(-1) (x-μ))
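
A small sketch of the Mahalanobis formula, restricted to two features so that the covariance matrix can be inverted by hand with the standard library alone. The function name `mahalanobis_2d` and the sample values are illustrative assumptions; for more features, a linear algebra library would normally be used instead.

```python
import math

def mahalanobis_2d(x, mu, cov):
    """Mahalanobis distance sqrt((x-mu)^T cov^(-1) (x-mu)) for two features."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form dx^T * inv * dx.
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(q)

mu = [0.0, 0.0]

# With the identity covariance, the result equals the Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
print(mahalanobis_2d([3.0, 4.0], mu, identity))  # 5.0

# Variance 4 on the first feature means a standard deviation of 2,
# so a point 2 units away along that axis is 1 standard deviation away.
scaled = [[4.0, 0.0], [0.0, 1.0]]
print(mahalanobis_2d([2.0, 0.0], mu, scaled))  # 1.0
```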

   2, Cosine similarity

   Cosine similarity uses the cosine of the angle between two vectors in a vector space as a measure of how much two individuals differ.

   cos(theta) = (a^2 + b^2 - c^2) / (2ab), where a, b and c are the side lengths of a triangle and theta is the angle opposite side c

   or

   cos(theta) = (a · b) / (||a|| * ||b||), where a and b are vectors

   or

   for two-dimensional vectors (x1, y1) and (x2, y2):

   cos(theta) = (x1, y1) · (x2, y2) / (sqrt(x1^2 + y1^2) * sqrt(x2^2 + y2^2))

              = (x1*x2 + y1*y2) / (sqrt(x1^2 + y1^2) * sqrt(x2^2 + y2^2))
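
The cosine formulas above can be sketched in Python as follows. The function names and sample vectors are illustrative assumptions. `cosine_from_sides` applies the triangle form, taking the two vectors and their difference as the three sides, to show that it agrees with the dot-product form.

```python
import math

def norm(v):
    """Length of a vector: sqrt(v1^2 + ... + vn^2)."""
    return math.sqrt(sum(vi * vi for vi in v))

def cosine_similarity(a, b):
    """Dot-product form: (a . b) / (||a|| * ||b||)."""
    na, nb = norm(a), norm(b)
    if na == 0 or nb == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(ai * bi for ai, bi in zip(a, b)) / (na * nb)

def cosine_from_sides(a, b):
    """Triangle form: (|a|^2 + |b|^2 - |c|^2) / (2|a||b|), with c = a - b."""
    la, lb = norm(a), norm(b)
    lc = norm([ai - bi for ai, bi in zip(a, b)])
    return (la ** 2 + lb ** 2 - lc ** 2) / (2 * la * lb)

u = [1.0, 0.0]
print(cosine_similarity(u, [0.0, 1.0]))    # 0.0: perpendicular vectors
print(cosine_similarity(u, [-1.0, 0.0]))   # -1.0: opposite directions
# Multiplying a vector by 10 changes its length but not its direction.
print(cosine_similarity([3.0, 4.0], [30.0, 40.0]))  # 1.0

# Both forms give the same value, up to floating-point rounding.
print(cosine_similarity([1.0, 2.0], [3.0, 1.0]), cosine_from_sides([1.0, 2.0], [3.0, 1.0]))
```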

    

     


Origin www.cnblogs.com/jimchen1218/p/11504545.html