Algorithm (1): The principle and basic code of KD tree

Table of contents

Foreword

1. What is a KD tree

2. Why use a KD tree

3. The basic idea of the KD tree

4. Case analysis of the KD tree

4.1 No closer point in the other subspace

4.2 A closer point in the other subspace

4.3 Summary

5. KD tree code (two-dimensional points, Python version)

6. KD tree code (multidimensional version)

6.1 Python version

7. Applications of the KD tree

7.1 Find the closest point to the target point in the target plane or space

7.2 Find several closest points to the target point in the target plane or space

8. References


Foreword

While working on a project I ran into the problem of finding the nearest point in a plane, and needed a KD tree to cut down the computation, so this article records what I learned along the way.

1. What is a KD tree

A kd-tree (short for k-dimensional tree) is a tree data structure that stores instance points in k-dimensional space for fast retrieval; it is used, for example, in the k-nearest-neighbor method to speed up neighbor search. Constructing a kd-tree amounts to repeatedly partitioning k-dimensional space with hyperplanes perpendicular to the coordinate axes. A kd-tree is a binary tree in which every node is a k-dimensional point. Each non-leaf node can be thought of as splitting the space into two halves with a hyperplane: the left subtree holds the points on one side of the hyperplane, and the right subtree holds the points on the other side.

2. Why use a KD tree

Mainly for speed and to save computing resources. Take the example from the foreword: to find which of a set of points in the plane is closest to a target point, the naive approach computes the distance from every point to the target and then compares them all. When the plane contains very many points, this wastes computing resources and is slow. Introducing a KD tree simplifies the search.
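For contrast, the naive linear scan just described can be sketched in a few lines (a minimal illustration; the point set and target are taken from the example used later in this article):

```python
import math

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
target = (2.1, 3.1)

# brute force: one distance computation per point, O(n) per query
nearest = min(pts, key=lambda p: math.dist(p, target))
print(nearest)  # (2, 3)
```

A KD tree answers the same query while pruning entire subspaces, so on average far fewer distances need to be computed.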

3. The basic idea of the KD tree

The following classic example shows how a KD tree is built.

There are six points on a two-dimensional plane: (2,3), (5,4), (9,6), (4,7), (8,1), (7,2)

Building the KD tree means determining the splitting lines that divide the plane (the division lines in Figure 1). The steps are as follows:

(1) First, sort the six points by the first dimension (the x-axis): (2,3), (4,7), (5,4), (7,2), (8,1), (9,6)

(2) Second, take the point at the median of the sorted list. For an even number of points, the larger of the two middle candidates is usually taken, which here gives (7,2). Split the plane at this point by drawing a vertical line through (7,2) in Figure 1.

(3) Third, sort the remaining points in the left and right half-planes created by (7,2) according to the second dimension (the y-axis). The results are the left branch: (2,3), (5,4), (4,7), and the right branch: (8,1), (9,6).

(4) Fourth, take the median point of the left branch and of the right branch as new nodes, and keep splitting, cycling back to the first dimension, until each branch contains only a single point; such a point is called a leaf.

The above process is drawn as follows:

Note that the sorting dimension cycles with each split. In a two-dimensional plane the splits alternate between the x and y axes: x, y, x, y, x, y, ... In three-dimensional space they cycle x, y, z, x, y, z, ...

Once these operations are complete, the data has been partitioned, and a target point can then be searched against the KD tree.
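The build steps above can be sketched directly as a small recursive function (a minimal sketch with my own names; the article's full class-based version appears in Section 5):

```python
def build(points, dim=0):
    # recursion exit: nothing left to split
    if not points:
        return None
    points = sorted(points, key=lambda p: p[dim])
    mid = len(points) // 2       # median index (the larger middle for even counts)
    next_dim = (dim + 1) % 2     # alternate x, y, x, y, ...
    return (points[mid],
            build(points[:mid], next_dim),
            build(points[mid + 1:], next_dim))

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree[0])     # root: (7, 2)
print(tree[1][0])  # left child: (5, 4)
print(tree[2][0])  # right child: (9, 6)
```

The printed nodes match the worked example: (7,2) is the first split, with (5,4) and (9,6) as the medians of the left and right branches.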

4. Case analysis of the KD tree

4.1 No closer point in the other subspace

The case is as follows:

The initial point set is still (2,3), (5,4), (9,6), (4,7), (8,1), (7,2) six points

The target point is (2.1,3.1)

According to the search logic of the above KD tree, the operation is as follows:

(1) Sort the initial points by the first dimension (x-coordinate): (2,3) (4,7) (5,4) (7,2) (8,1) (9,6). The median is the 4th point, (7,2); draw a dividing line at x=7 in the figure above.

(2) The target point's x-coordinate 2.1 is less than 7, so descend into the left space, which contains the points (2,3) (5,4) (4,7).

(3) Sort this subset by the second dimension (y-coordinate): (2,3) (5,4) (4,7). The median is the 2nd point, (5,4); draw its dividing line y=4 in the plane.

(4) The target point's y-coordinate 3.1 is less than 4, so descend into the lower space.

(5) At this point only one point, (2,3), remains in the space, so the splitting ends.

(6) Compute the distance between this last point and the target point: sqrt(2)/10 ≈ 0.1414. Record it temporarily as tempDis.

(7) Now start backtracking:

Compute the y-axis gap between the target point (2.1,3.1) and the last split node (5,4): dist = |3.1 − 4| = 0.9. Since 0.9 is greater than tempDis = sqrt(2)/10 ≈ 0.1414, the space on the other (upper) side of node (5,4) cannot contain a closer point, so the point (2,3) found below node (5,4) remains the closest candidate. Continue backtracking to the previous node (7,2): the x-axis gap is |2.1 − 7| = 4.9 > 0.1414, so no closer point can exist on its other side either. Backtracking ends: the nearest point is (2,3) and the shortest distance is sqrt(2)/10.
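The arithmetic of this case can be checked in a few lines (a quick verification sketch, not part of the original article):

```python
import math

target = (2.1, 3.1)
tempDis = math.dist(target, (2, 3))      # distance to the leaf that was found
print(round(tempDis, 4))                 # 0.1414, i.e. sqrt(2)/10

gap_B = abs(target[1] - 4)  # y-gap to the splitting line of node (5,4)
gap_A = abs(target[0] - 7)  # x-gap to the splitting line of the root (7,2)
print(gap_B > tempDis, gap_A > tempDis)  # True True -> nothing to revisit
```

Both gaps exceed the best distance found, so both "other" subspaces are pruned and (2,3) stands.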

In practice, during backtracking the gap dist may turn out to be smaller than tempDis, in which case the other subspace of the parent node may contain a closer point. The following case shows that situation.

4.2 A closer point in the other subspace

The case is as follows:

The initial point set is (5,3), (2.5,5), (8,4.5), (2,2), (3.5,8), (8,2.5), (5.5,7.5)

The target point is (4.5,7.5)

According to the search logic of the KD tree, the operation is as follows:

(1) Sort the initial points by the first dimension (x-coordinate): (2,2) (2.5,5) (3.5,8) (5,3) (5.5,7.5) (8,4.5) (8,2.5). The median is the 4th point, (5,3); draw a dividing line at x=5 in the figure above.

(2) The target point's x-coordinate 4.5 is less than 5, so descend into the left space, which contains the points (2,2) (2.5,5) (3.5,8).

(3) Sort this subset by the second dimension (y-coordinate): (2,2) (2.5,5) (3.5,8). The median is the 2nd point, (2.5,5); draw its dividing line y=5 in the plane.

(4) The target point's y-coordinate 7.5 is greater than 5, so descend into the upper space.

(5) At this point only one point, (3.5,8) (point D), remains in the space, so the splitting ends.

(6) Compute the distance between this point and the target point: sqrt(1.25) ≈ 1.118. Record it temporarily as tempDis.

(7) Now start backtracking:

Compute the y-axis gap between the target point (4.5,7.5) and the last split node B (2.5,5): dist = |7.5 − 5| = 2.5. Since 2.5 > 1.118, no closer point can exist in the space below node B. Continue backtracking to the previous node A (5,3): the x-axis gap is |5 − 4.5| = 0.5 < 1.118, so the space on the right side of A may contain a closer point. Enter that space and repeat the search: it contains point E (5.5,7.5), whose distance to the target point S is 1. Since 1 < 1.118, update the shortest distance tempDis to 1 and the nearest point to (5.5,7.5). Backtracking then ends.
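Using the point set listed at the start of this case, the pruning decisions can be verified numerically (a quick sketch, not from the original article):

```python
import math

target = (4.5, 7.5)
tempDis = math.dist(target, (3.5, 8))    # leaf found in the upper-left region
print(round(tempDis, 3))                 # 1.118, i.e. sqrt(1.25)

gap_B = abs(target[1] - 5)  # y-gap to node B (2.5,5): 2.5 > tempDis, prune
gap_A = abs(target[0] - 5)  # x-gap to node A (5,3): 0.5 < tempDis, revisit
dis_E = math.dist(target, (5.5, 7.5))    # point E in A's right subspace
print(gap_B > tempDis, gap_A < tempDis, dis_E < tempDis)  # True True True
```

Because the x-gap at A is smaller than the best distance so far, the right subspace of A must be searched, and that search turns up the true nearest point E.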

4.3 Summary

To sum up, a few pieces are needed before writing the KD tree code:

(1) A node class holding the node's point coordinates, left branch, right branch, and splitting dimension (used to decide which coordinate to sort by)

(2) The recursion exit (stop when the current point set is empty, i.e. once each branch holds a single point)

(3) Computing the index of the median point

(4) Computing the current splitting dimension

(5) Computing the Euclidean distance between two points (Pythagorean theorem)

(6) Backtracking to obtain the coordinates of the nearest point

5. KD tree code (two-dimensional points, Python version)

import math

pts = [(5,3),(2.5,5),(8,4.5),(2,2),(3.5,8),(8,2.5),(5.5,7.5)]  # point set
targetPt = (4.5,7.5)  # target point

class Node():
    def __init__(self, pt, leftBranch, rightBranch, dimension):
        self.pt = pt
        self.leftBranch = leftBranch
        self.rightBranch = rightBranch
        self.dimension = dimension

class KDTree():
    def __init__(self):
        self.nearestPt = None
        self.nearestDis = math.inf

    def createKDTree(self, currPts, dimension):
        if len(currPts) == 0:
            return None
        mid = self.calMedium(currPts)
        sortedData = sorted(currPts, key=lambda x: x[dimension])
        leftBranch = self.createKDTree(sortedData[:mid], self.calDimension(dimension))
        rightBranch = self.createKDTree(sortedData[mid+1:], self.calDimension(dimension))
        return Node(sortedData[mid], leftBranch, rightBranch, dimension)

    def calMedium(self, currPts):
        return len(currPts) // 2

    def calDimension(self, dimension):
        return (dimension + 1) % 2

    def calDistance(self, p0, p1):
        return math.sqrt((p0[0]-p1[0])**2 + (p0[1]-p1[1])**2)

    def getNearestPt(self, root, targetPt):
        self.search(root, targetPt)
        return self.nearestPt, self.nearestDis

    def search(self, node, targetPt):
        if node is None:
            return
        # signed gap along this node's splitting dimension
        dist = node.pt[node.dimension] - targetPt[node.dimension]
        if dist > 0:  # target lies on the left/lower side of the node
            self.search(node.leftBranch, targetPt)
        else:
            self.search(node.rightBranch, targetPt)
        tempDis = self.calDistance(node.pt, targetPt)
        if tempDis < self.nearestDis:
            self.nearestDis = tempDis
            self.nearestPt = node.pt
        # backtrack: the other side can hold a closer point only if the
        # splitting line is nearer than the best distance found so far
        if self.nearestDis > abs(dist):
            if dist > 0:
                self.search(node.rightBranch, targetPt)
            else:
                self.search(node.leftBranch, targetPt)

if __name__ == "__main__":
    kdtree = KDTree()
    root = kdtree.createKDTree(pts, 0)
    pt, minDis = kdtree.getNearestPt(root, targetPt)
    print("Nearest point:", pt, "minimum distance:", minDis)

6. KD tree code (multidimensional version)

6.1 Python version

import math

# point set and target point may have any (consistent) number of dimensions;
# the three-dimensional values below are just example data
pts = [(5,3,1),(2.5,5,3),(8,4.5,2),(2,2,4),(3.5,8,5),(8,2.5,1),(5.5,7.5,2)]
targetPt = (4.5,7.5,2)

class Node():
    def __init__(self, pt, leftBranch, rightBranch, dimension):
        self.pt = pt
        self.leftBranch = leftBranch
        self.rightBranch = rightBranch
        self.dimension = dimension

class KDTree():
    def __init__(self):
        self.nearestPt = None
        self.nearestDis = math.inf

    def createKDTree(self, currPts, dimension):
        if len(currPts) == 0:
            return None
        mid = self.calMedium(currPts)
        sortedData = sorted(currPts, key=lambda x: x[dimension])
        leftBranch = self.createKDTree(sortedData[:mid], self.calDimension(dimension))
        rightBranch = self.createKDTree(sortedData[mid+1:], self.calDimension(dimension))
        return Node(sortedData[mid], leftBranch, rightBranch, dimension)

    def calMedium(self, currPts):
        return len(currPts) // 2

    def calDimension(self, dimension):
        # the difference from the 2-D version: cycle over k dimensions
        return (dimension + 1) % len(targetPt)

    def calDistance(self, p0, p1):
        # Euclidean distance over all dimensions, not just the first two
        return math.sqrt(sum((a - b)**2 for a, b in zip(p0, p1)))

    def getNearestPt(self, root, targetPt):
        self.search(root, targetPt)
        return self.nearestPt, self.nearestDis

    def search(self, node, targetPt):
        if node is None:
            return
        # signed gap along this node's splitting dimension
        dist = node.pt[node.dimension] - targetPt[node.dimension]
        if dist > 0:  # target lies on the left/lower side of the node
            self.search(node.leftBranch, targetPt)
        else:
            self.search(node.rightBranch, targetPt)
        tempDis = self.calDistance(node.pt, targetPt)
        if tempDis < self.nearestDis:
            self.nearestDis = tempDis
            self.nearestPt = node.pt
        # backtrack: revisit the other side only if the splitting plane
        # is nearer than the best distance found so far
        if self.nearestDis > abs(dist):
            if dist > 0:
                self.search(node.rightBranch, targetPt)
            else:
                self.search(node.leftBranch, targetPt)

if __name__ == "__main__":
    kdtree = KDTree()
    root = kdtree.createKDTree(pts, 0)
    pt, minDis = kdtree.getNearestPt(root, targetPt)
    print("Nearest point:", pt, "minimum distance:", minDis)

7. Applications of the KD tree

7.1 Find the closest point to the target point in the target plane or space

This is the search procedure described above, so it is not repeated here.

7.2 Find several closest points to the target point in the target plane or space
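One way to extend the nearest-point search to the k nearest points is to keep the best k candidates seen so far in a heap, and prune a subspace only when its splitting line is farther away than the current k-th best distance. A brute-force baseline for that idea (my own sketch, not the article's code) can be written with Python's `heapq`:

```python
import heapq
import math

def k_nearest(pts, target, k):
    # brute-force baseline: the k points with the smallest distances
    return heapq.nsmallest(k, pts, key=lambda p: math.dist(p, target))

pts = [(5, 3), (2.5, 5), (8, 4.5), (2, 2), (3.5, 8), (8, 2.5), (5.5, 7.5)]
print(k_nearest(pts, (4.5, 7.5), 2))  # [(5.5, 7.5), (3.5, 8)]
```

In a KD tree version, the same bounded collection of k candidates replaces the single nearestPt/nearestDis pair, and the backtracking test compares abs(dist) against the distance of the current k-th candidate instead of the single best.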

8. References

[1] https://www.cnblogs.com/bambipai/p/8435797.html

[2] kd-tree, Baidu Encyclopedia

Origin blog.csdn.net/qq_41904236/article/details/128778063