Computing the cosine distance between features with NumPy

Cosine distance is often used to measure the degree of similarity between features, for example in:

  • Text similarity search
  • Face recognition retrieval
  • Similar image search

Overview of the principle

Cosine similarity is calculated as follows (formula from Wikipedia):

cosine similarity = cos(θ) = (A · B) / (‖A‖ ‖B‖)

However, cosine similarity differs from the more common Euclidean distance or L1 distance.

  • Cosine similarity ranges from -1 to 1. A value of 1 means the two vectors point in exactly the same direction, -1 means they point in exactly opposite directions, and 0 means they are orthogonal (uncorrelated).
  • Euclidean distance is non-negative, and after normalization it generally lies between 0 and 1. The smaller the distance, the more similar the two items.
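
The three boundary cases of cosine similarity can be checked with a few lines of NumPy. This is only a minimal sketch: the helper name cosine_similarity and the sample vectors are chosen here for illustration and are not part of the original code.

```python
import numpy as np

def cosine_similarity(u, v):
    # Dot product divided by the product of the two L2 norms.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([1.0, 2.0])

same = cosine_similarity(a, 3.0 * a)                      # same direction, different length
opposite = cosine_similarity(a, -a)                       # exactly opposite direction
orthogonal = cosine_similarity(a, np.array([-2.0, 1.0]))  # perpendicular to a

print(same, opposite, orthogonal)
```

Note that scaling a vector does not change the result: only the direction matters, which is the main difference from the Euclidean distance.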

For similarity search, a distance such as the Euclidean distance is more intuitive. In practice, cosine similarity therefore needs to be converted into a cosine distance that behaves like the Euclidean distance: the smaller the value, the more similar the items.

Wikipedia gives the angular distance as follows:

angular distance = arccos(cosine similarity) / π

When computing the similarity of images or text, the extracted features are non-negative, so the cosine similarity ranges from 0 to 1. A simpler method can therefore be used, and the cosine distance is defined directly as:

cosine distance = 1 - cosine similarity
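
To compare the two conversions, the sketch below computes both on a pair of non-negative vectors. The vectors x and y are made-up sample features, and the angular distance is assumed to be arccos(similarity) / π, as in the Wikipedia definition mentioned above.

```python
import numpy as np

def cosine_similarity(u, v):
    # Dot product divided by the product of the two L2 norms.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical non-negative feature vectors.
x = np.array([0.2, 0.0, 0.9, 0.4])
y = np.array([0.1, 0.3, 0.8, 0.5])

sim = cosine_similarity(x, y)

# Simple conversion: 0 for identical directions, 1 for orthogonal ones.
simple_dist = 1.0 - sim

# Angular distance; clip guards against rounding pushing sim outside [-1, 1].
angular_dist = np.arccos(np.clip(sim, -1.0, 1.0)) / np.pi

print(sim, simple_dist, angular_dist)
```

Both values shrink as the vectors become more similar, but the simple form avoids the arccos call, which is probably why it is preferred here.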

 

Code Analysis

The code below supports two modes, depending on the input data.

  • Single mode: the inputs are one-dimensional vectors, and the similarity between a single pair of images or texts is computed.
  • Batch mode: the inputs are two-dimensional arrays (matrices), and the similarity between multiple images or texts is computed.

 1 import numpy as np
 2 def cosine_distance(a, b):
 3     if a.shape != b.shape:
 4         raise RuntimeError("array {} shape not match {}".format(a.shape, b.shape))
 5     if a.ndim == 1:
 6         a_norm = np.linalg.norm(a)
 7         b_norm = np.linalg.norm(b)
 8     elif a.ndim == 2:
 9         a_norm = np.linalg.norm(a, axis=1, keepdims=True)
10         b_norm = np.linalg.norm(b, axis=1, keepdims=True)
11     else:
12         raise RuntimeError("array dimensions {} not right".format(a.ndim))
13     similarity = np.dot(a, b.T) / (a_norm * b_norm.T)  # b_norm.T: entry [i, j] is divided by |a_i| * |b_j|
14     dist = 1. - similarity
15     return dist

 

Lines 6 to 7: np.linalg.norm computes the norm of a vector. The default is the L2 norm, which is the Euclidean length of the vector.

Lines 9 to 10: the parameter axis=1 is set. For a two-dimensional array, the norm is then computed row by row, so each row, which holds the feature of one image, gets its own norm. keepdims=True keeps the result as a column of shape (N, 1).
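
The behaviour of np.linalg.norm in both modes can be seen in a small sketch. The array feats is a made-up example with three 2-dimensional "features".

```python
import numpy as np

feats = np.array([[3.0, 4.0],
                  [6.0, 8.0],
                  [0.0, 5.0]])

# Default norm of a 1-D vector is the L2 norm: sqrt(3^2 + 4^2).
single = np.linalg.norm(feats[0])

# axis=1 computes one norm per row; keepdims=True keeps the shape (3, 1).
row_norms = np.linalg.norm(feats, axis=1, keepdims=True)

# Because of the (3, 1) shape, broadcasting divides each row by its own norm.
unit_rows = feats / row_norms

print(single, row_norms.shape)
```

Without keepdims=True the result would have shape (3,), and the division would no longer line up row by row.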

Line 13: np.dot supports both modes. From the official documentation:

 numpy.dot(a, b, out=None)

  Dot product of two arrays. Specifically,

  • If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).

  • If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
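
The two cases from the documentation can be illustrated as follows. The arrays are arbitrary sample values.

```python
import numpy as np

# 1-D inputs: np.dot returns the scalar inner product.
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
inner = np.dot(u, v)

# 2-D inputs: np.dot performs matrix multiplication.
A = np.arange(6.0).reshape(2, 3)
B = np.arange(6.0, 12.0).reshape(2, 3)
product = np.dot(A, B.T)  # (2, 3) times (3, 2) gives shape (2, 2)

print(inner, product.shape)
```

For 2-D inputs, A @ B.T gives the same result as np.dot(A, B.T).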

For consistency between the two modes, the code applies the transpose (b.T). By the definition of matrix multiplication in linear algebra, the rows of the first matrix are multiplied by the columns of the second, so the inner dimensions must match. For example, to compare 32 features of 128 dimensions each, the product should be (32x128) * (128x32), which gives a 32x32 result.
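
The 32 x 128 example can be sketched as below. The random features are hypothetical stand-ins for real image or text features. The denominator is written here as a_norm * b_norm.T, on the assumption that an all-pairs result is wanted, so that entry [i, j] is divided by the norms of row i of a and row j of b.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch: 32 non-negative features of 128 dimensions each.
a = rng.random((32, 128))
b = rng.random((32, 128))

a_norm = np.linalg.norm(a, axis=1, keepdims=True)  # shape (32, 1)
b_norm = np.linalg.norm(b, axis=1, keepdims=True)  # shape (32, 1)

# (32, 128) times (128, 32) gives (32, 32); a_norm * b_norm.T is also (32, 32).
dist = 1.0 - np.dot(a, b.T) / (a_norm * b_norm.T)

# Cross-check one entry against the single-vector formula.
i, j = 3, 17
expected = 1.0 - np.dot(a[i], b[j]) / (np.linalg.norm(a[i]) * np.linalg.norm(b[j]))

print(dist.shape)
```

Entry dist[i, j] is the cosine distance between the i-th feature in a and the j-th feature in b, so the diagonal holds the distances of the matched pairs.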

  


Origin www.cnblogs.com/hansoluo/p/12123518.html