K-means Based Indexing and Asymmetric Distance Computation for ANN Search (Binary Local Features)

Reposted from: http://www.cvchina.info/2012/01/13/kmeans-based-indexing-and-asymmetric-distance-computation-for-ann-search-binary-local-feature-part1/#more-3232


Inspired by Hervé Jégou's use of k-means clustering, inverted files, and asymmetric distance computation for local-feature retrieval in "Hamming Embedding and Weak Geometric Consistency for Large-Scale Image Search" and "Product Quantization for Nearest Neighbor Search", this post applies those ideas to binary features.

The main idea:

  • Use k-means to build a coarse index over the features.
  • Compress each feature based on per-bit statistics.
  • At query time, compute the distance between the query feature and the indexed features asymmetrically.

Algorithm:

Training:

  1. Run k-means on the features to be indexed to obtain K centers. Since the features are binary, the center-update step works per bit: for each bit, count how often 0 and 1 occur among the features assigned to the cluster, and take the more frequent value.
  2. For each cluster, compute the per-bit frequency of 1s and 0s over its features, and select the M bit positions whose frequency is closest to 50%. (The closer to 50%, the higher the entropy.)

After training, we have two sets of data:

  • K cluster centers.
  • For each center, a set of M bit-position identifiers. These identifiers form the basis for compressing the original features (referred to below as "projection vectors").
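The two training steps above can be sketched as follows. This is a minimal sketch, not the author's code (that is promised for part 2); it assumes features are stored as unpacked bit arrays (e.g. shape n×256 for 32-byte ORB), and the function names are illustrative:

```python
import numpy as np

def majority_center(cluster_feats):
    """Binary k-means center update: for each bit, keep whichever of
    0/1 occurs more often among the features assigned to the cluster."""
    return (cluster_feats.mean(axis=0) >= 0.5).astype(np.uint8)

def projection_bits(cluster_feats, M):
    """Select the M bit positions whose 1-frequency is closest to 50%,
    i.e. the highest-entropy bits within the cluster."""
    freq = cluster_feats.mean(axis=0)
    return np.argsort(np.abs(freq - 0.5), kind="stable")[:M]
```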

Indexing:

  1. Build an inverted table.
  2. For each feature to be indexed, compute its nearest cluster center and project the feature with that cluster's projection vector to obtain an M-bit signature, denoted sig_templ. Insert the signature into the inverted table under that cluster.
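The indexing stage might look like this (again an illustrative sketch: it assumes `centers` is a K×B bit matrix and `proj_bits` a K×M matrix of bit positions produced by training):

```python
import numpy as np
from collections import defaultdict

def nearest_center(feat, centers):
    """Index of the center with the smallest Hamming distance to feat."""
    return int(np.argmin((centers != feat).sum(axis=1)))

def build_inverted_table(feats, centers, proj_bits):
    """For each feature: find its cluster, project it onto that cluster's
    M high-entropy bit positions to get sig_templ, and file the
    (feature id, signature) pair under the cluster's inverted list."""
    table = defaultdict(list)
    for i, f in enumerate(feats):
        c = nearest_center(f, centers)
        sig_templ = f[proj_bits[c]]  # M-bit signature
        table[c].append((i, sig_templ))
    return table
```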

ANN Search:

  1. Compute the query feature's nearest cluster center, query_cluster, and project the query with that cluster's projection vector to obtain an M-bit signature, sig_query.
  2. Compute the Hamming distance between the query and its cluster center over the bit positions not covered by that cluster's projection vector; call it dist_base.
  3. Traverse the inverted list of query_cluster and, for each entry, compute the Hamming distance between sig_query and sig_templ; call it dist_sig. The distance between the query feature and an indexed feature is then dist = dist_base + dist_sig. If the minimum distance found this way is below a threshold, the corresponding feature is taken as an ANN.
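The three search steps can be sketched as below. The asymmetry comes from how an indexed feature is represented: by its cluster center on the non-projected bits (dist_base) and by its own M-bit signature on the projected ones (dist_sig). All names are illustrative, not from the post:

```python
import numpy as np

def ann_search(query, centers, proj_bits, table, threshold):
    # 1. Assign the query to its nearest cluster and project it.
    q = int(np.argmin((centers != query).sum(axis=1)))
    sig_query = query[proj_bits[q]]
    # 2. dist_base: query vs. center on all bits EXCEPT the projected ones.
    mask = np.ones(len(query), dtype=bool)
    mask[proj_bits[q]] = False
    dist_base = int((query[mask] != centers[q][mask]).sum())
    # 3. dist_sig per inverted entry; total dist = dist_base + dist_sig.
    best_id, best_dist = None, None
    for feat_id, sig_templ in table.get(q, []):
        dist = dist_base + int((sig_query != sig_templ).sum())
        if best_dist is None or dist < best_dist:
            best_id, best_dist = feat_id, dist
    if best_dist is not None and best_dist < threshold:
        return best_id, best_dist  # an ANN was found
    return None, best_dist
```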

A quick look at the time complexity:

Assume K = 40, M = 64, and 32-byte ORB features, with 1k features to be indexed.

Each ANN query then requires only:

40 Hamming distance computations on 32-byte vectors + about 25 Hamming distance computations on 8-byte signatures.

Exhaustive search, by comparison:

1000 Hamming distance computations on 32-byte vectors.

Speedup: roughly 20x.
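A back-of-the-envelope check of the figures above, assuming the 1k indexed features spread evenly over the K = 40 clusters and counting an 8-byte Hamming distance as a quarter of a 32-byte one:

```python
# Cost estimate for one ANN query vs. brute force (assumptions: even
# cluster occupancy; cost proportional to descriptor length).
K, N = 40, 1000
coarse = K        # 32-byte Hamming distances: query vs. every center
fine = N // K     # 8-byte Hamming distances: sig_query vs. each sig_templ
exhaustive = N    # 32-byte Hamming distances for brute-force search
speedup = exhaustive / (coarse + fine / 4)
print(coarse, fine, round(speedup, 1))  # → 40 25 21.6
```

With this weighting the estimate lands near 22x, in line with the post's rough 20x figure.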

 

Experiment:

Configuration:

  • Features: ORB
  • nn/ann match threshold: 50
  • K = 40

The figure below shows the image-matching result of exhaustive search (without RANSAC):

[image]

After RANSAC:

[image]

The figure below shows the image-matching result of ANN retrieval (without RANSAC):

[image]

After RANSAC:

[image]

As the figures show, the number of matched points drops significantly. The cause is an inherent flaw of coarse indexing: two features that are themselves close may be assigned to different clusters. Multi-assignment might improve this.

The code and remaining details will be collated in part 2.


Reproduced from: https://my.oschina.net/dake/blog/196639


Origin: blog.csdn.net/weixin_34306676/article/details/91508693