KD tree concept learning

kd-tree 

A kd-tree (short for k-dimensional tree) is a tree data structure that stores points in k-dimensional space for efficient retrieval.
It is mainly used for multi-dimensional key searches (such as range searches and nearest neighbor searches).
The kd-tree is a special case of the binary space partitioning tree.

Structure introduction
    The kd-tree is a binary tree in which every node is a k-dimensional point. Every non-leaf node can be viewed as a splitting hyperplane that divides the space into two half-spaces: points to the left of the hyperplane are represented by the left subtree, and points to the right by the right subtree. The hyperplane is chosen as follows: every node is associated with one of the k dimensions, and its hyperplane is perpendicular to that dimension's axis. For example, if the split is made along the x-axis, all points with an x value smaller than the node's appear in the left subtree, and all points with a larger x value appear in the right subtree. The hyperplane is thus determined by that x value, and its normal is the unit vector along the x-axis.
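As a minimal sketch (not from the original post), the node structure described above could be represented in Python like this; the field names are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    """One node of a kd-tree: a k-dimensional point plus its splitting axis."""
    point: Tuple[float, ...]        # the k-dimensional point stored at this node
    axis: int                       # dimension perpendicular to the splitting hyperplane
    left: Optional["Node"] = None   # subtree of points with point[axis] less than this node's
    right: Optional["Node"] = None  # subtree of points with point[axis] greater than this node's
```

Each node therefore carries enough information to reconstruct its splitting hyperplane: the coordinate `point[axis]` fixes the plane's position, and `axis` fixes its normal direction.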

Structural operations

    1. Construction
    Because the axis-aligned splitting planes can be chosen in many ways, there are many ways to build a kd-tree.
    The most typical method is as follows:

    - As the tree grows deeper, the splitting axes are chosen in rotation. (For example, in three-dimensional space, the root node splits on a plane perpendicular to the x-axis, its children split perpendicular to the y-axis, its grandchildren perpendicular to the z-axis, its great-grandchildren perpendicular to the x-axis again, and so on.)
    - Points are partitioned by the median of their coordinates along the splitting axis and placed into the corresponding subtrees.

    This method produces a balanced kd-tree, in which every leaf node is at roughly the same depth. However, a balanced tree is not necessarily optimal for every application.
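The construction method above can be sketched in Python as follows. This is an illustrative implementation, not code from the original post; it cycles the splitting axis with depth and splits at the median:

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

def build_kdtree(points: List[Point], depth: int = 0) -> Optional[dict]:
    """Build a balanced kd-tree by median split, cycling axes with depth."""
    if not points:
        return None
    k = len(points[0])
    axis = depth % k                            # rotate x, y, z, ... as the tree deepens
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                      # median point becomes the splitting node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }
```

For example, building a tree from the 2-D points (2,3), (5,4), (9,6), (4,7), (8,1), (7,2) puts (7,2) at the root (the median by x), with (5,4) and (9,6) as its children (medians by y on each side).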

    2. Nearest neighbor search
    Nearest neighbor search finds the point in the tree that is closest to a given input point.
    The kd-tree nearest neighbor search proceeds as follows:
    1) Starting from the root node, move down the tree recursively. The decision to go left or right is made exactly as when inserting an element (if the input point lies on the left side of the splitting plane, descend into the left child; otherwise descend into the right child).
    2) Once a leaf node is reached, record it as the "current best".
    3) Unwind the recursion, performing the following steps at each node passed through:
        a. If the current node is closer to the input point than the current best, make it the new current best.
        b. Check whether the subtree on the other side of the splitting plane could contain a closer point: if the distance from the input point to the splitting plane is less than the distance to the current best, search downward into that subtree as well.
    4) When the recursion returns past the root node, the search is complete.
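The search procedure above can be sketched in Python as follows. This is an illustrative implementation under the same dict-based tree layout assumed in the construction sketch; `build` is a hypothetical helper repeating the median-split construction so the example is self-contained:

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

def build(points: List[Point], depth: int = 0) -> Optional[dict]:
    """Median-split construction (same scheme as in the construction section)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build(points[:mid], depth + 1),
            "right": build(points[mid + 1:], depth + 1)}

def nearest(node: Optional[dict], target: Point,
            best: Optional[Point] = None) -> Optional[Point]:
    """Return the point in the kd-tree closest to target (Euclidean distance)."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    # Step a: update the current best if this node is closer.
    if best is None or math.dist(point, target) < math.dist(best, target):
        best = point
    # Step 1: descend first on the side an insertion of target would take.
    if target[axis] < point[axis]:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, target, best)
    # Step b: the far subtree can only contain a closer point if the splitting
    # plane is nearer to the target than the current best distance.
    if abs(target[axis] - point[axis]) < math.dist(best, target):
        best = nearest(far, target, best)
    return best
```

For instance, searching the tree built from (2,3), (5,4), (9,6), (4,7), (8,1), (7,2) for the point nearest to (9,2) descends to (8,1) and never needs to revisit the far subtrees, because the splitting planes all lie farther away than the current best.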

    3. Handling High-Dimensional Data
    The curse of dimensionality makes most search algorithms impractical in high dimensions, and kd-trees are no exception: in high-dimensional spaces they cannot perform nearest neighbor search very efficiently. A general rule of thumb is that in k dimensions the number of data points N should satisfy N ≫ 2^k for kd-tree nearest neighbor search to work well. Otherwise most of the points in the tree end up being examined, and the algorithm is not much faster than an exhaustive scan over all points. In addition, if a sufficiently fast result is needed and it does not have to be exact, approximate nearest neighbor methods can be considered.

Origin blog.csdn.net/bcbobo21cn/article/details/132844113