Annoy: A Library for Approximate Nearest Neighbor Search

Annoy is an open-source library from Spotify for approximate nearest neighbor search in high-dimensional spaces; Spotify uses it for music recommendation. Nearest neighbor search (NNS), also known as closest point search, is the optimization problem of finding the point in a given set that is closest to a query point in a metric space.
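
For reference, the exact version of this problem can be solved with a linear scan over all points. The sketch below is only a baseline for comparison and is not how Annoy works internally; the function name `nearest_neighbors` and the sample points are illustrative. Its cost grows with the size of the dataset, which is the cost that approximate methods give up a little accuracy to avoid.

```python
import math

def nearest_neighbors(points, query, k=1):
    """Return the indices of the k points closest to query (Euclidean distance)."""
    # Exact search: measure the distance from the query to every point, then sort.
    ranked = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return ranked[:k]

# Illustrative data, not taken from the article.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
result = nearest_neighbors(points, (1.0, 1.0), k=2)
print(result)  # [1, 3]
```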

Annoy can use static files as indexes, which means an index can be shared across processes. It creates large read-only, file-based data structures that are memory-mapped (mmap), so that many processes can share the same data. Another benefit of Annoy is that it tries to minimize its memory footprint, so the indexes are quite small.
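
The general mechanism behind a read-only, file-backed structure can be sketched with Python's standard `mmap` module. This is not Annoy's file format or implementation; the flat float32 layout, the file name `vectors.bin`, and the helper names are assumptions made for illustration. The point is that the data is written once, and any process can then map the same file read-only instead of loading its own copy.

```python
import mmap
import os
import struct
import tempfile

DIM = 3  # vector dimension (illustrative)
RECORD = struct.Struct(f"<{DIM}f")  # one vector = DIM little-endian float32 values

def write_vectors(path, vectors):
    """Write all vectors to a flat binary file, once."""
    with open(path, "wb") as f:
        for v in vectors:
            f.write(RECORD.pack(*v))

def read_vector(path, i):
    """Map the file read-only and decode the i-th vector."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Only the bytes for record i are decoded; nothing else is copied.
            return RECORD.unpack_from(m, i * RECORD.size)

path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
write_vectors(path, [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)])
second = read_vector(path, 1)
print(second)
```

Because the mapping is read-only and backed by a file, several processes calling `read_vector` on the same path can be served from the same pages cached by the operating system.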

Features:

  • Supports Euclidean distance, Manhattan distance, cosine distance, Hamming distance, and dot (inner) product distance
  • Cosine distance is equivalent to the Euclidean distance of normalized vectors: sqrt(2 - 2 * cos(u, v))
  • Works best when the number of dimensions is small (for example, < 100), but still performs very well with up to 1,000 dimensions
  • Small memory usage
  • Lets you share memory among multiple processes
  • Index creation is separate from lookup (in particular, you cannot add more items once the trees have been built)
  • Native Python support
  • The index can be built on disk, which makes it possible to index large datasets that do not fit in memory
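
The relationship between cosine and Euclidean distance mentioned in the list can be checked numerically. The following is a minimal sketch using only the standard library; the vectors `u` and `v` and the helper names are illustrative. It normalizes both vectors, measures their Euclidean distance, and compares the result with sqrt(2 - 2 * cos(u, v)).

```python
import math

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    """Cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

u, v = [3.0, 4.0], [4.0, 3.0]  # illustrative vectors
euclid_normalized = math.dist(normalize(u), normalize(v))
from_cosine = math.sqrt(2 - 2 * cosine(u, v))
print(euclid_normalized, from_cosine)  # the two values agree up to rounding
```

The identity follows from expanding |a - b|^2 = |a|^2 + |b|^2 - 2(a · b) for unit vectors a and b, where |a| = |b| = 1 and a · b = cos(u, v).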


Origin www.cnblogs.com/fewfwf/p/11832542.html