The KD-tree algorithm for k-nearest neighbors proceeds in three steps: build a KD-tree from the training set, search it for the nearest neighbors of a query point, and finally make the prediction.
The K in "KD-tree" (k-dimensional tree) refers to the dimension of the sample features.
First, building the KD-tree
Given m samples with n-dimensional features, compute the variance of each of the n dimensions and choose the dimension k with the largest variance as the splitting dimension at the root. Take the sample whose value in dimension k is the median as the cutting point: samples with a smaller value in dimension k go into the left subtree, and samples with a larger value go into the right subtree. The subtrees are then generated recursively in the same way.
For example, take six two-dimensional samples {(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)}:
1. Find the root: the variances of the six points in the x and y dimensions are 6.97 and 5.37 respectively, so the x dimension, which has the larger variance, is chosen as the splitting dimension.
2. Find the cutting point: the median of the points in the x dimension is (7,2), so the space is divided by this point's value in the x dimension.
3. The line x = 7 divides the space into left and right parts; applying the same method recursively to each part yields the final tree: (7,2) is the root, its left subtree splits on y at (5,4) with children (2,3) and (4,7), and its right subtree splits on y at (9,6) with child (8,1).
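The construction described above can be sketched in Python. This is a minimal illustration, not a production implementation; the dict-based node layout and the `build_kdtree` name are my own choices.

```python
import numpy as np

def build_kdtree(points):
    """Recursively build a KD-tree: split on the dimension with the
    largest sample variance, using the (upper) median point as the node."""
    if len(points) == 0:
        return None
    points = np.asarray(points, dtype=float)
    # Splitting dimension = largest sample variance (ddof=1, which gives
    # the 6.97 / 5.37 figures quoted above); a single point needs no split.
    axis = int(np.argmax(points.var(axis=0, ddof=1))) if len(points) > 1 else 0
    order = points[:, axis].argsort()
    mid = len(points) // 2  # upper median, so (7,2) is picked from x = 2,4,5,7,8,9
    return {
        "point": tuple(map(float, points[order[mid]])),
        "axis": axis,
        "left": build_kdtree(points[order[:mid]]),
        "right": build_kdtree(points[order[mid + 1:]]),
    }

samples = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(samples)
print(tree["point"])  # (7.0, 2.0) — the root found in step 2
```

Running this on the six sample points reproduces the tree from the example: the root is (7,2), the left subtree then splits on the y dimension (its variance 4.33 exceeds the x variance 2.33 among the remaining points).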
Second, searching for the nearest neighbor
Once the KD-tree has been generated, it can be used to predict the samples in the test set.
First, find the leaf node whose region contains the target point: starting from the root, descend the tree by comparing the target's coordinate in each node's splitting dimension, going left if it is smaller and right otherwise. The sample reached this way serves as the initial nearest-neighbor candidate; the search then backtracks toward the root, visiting a sibling subtree only when the splitting hyperplane is closer to the target than the current best distance.
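The search can be sketched as follows. This is a self-contained illustration with names of my own choosing; for brevity the builder here cycles through the coordinate axes by depth instead of picking the maximum-variance axis, which happens to produce the same tree on the six-point example above.

```python
import math

def build_kdtree(points, depth=0):
    """Build a KD-tree, cycling the splitting dimension with depth."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2  # upper median as the cutting point
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest_neighbor(tree, target):
    """Descend to the leaf region containing the target, then backtrack,
    crossing a splitting plane only if it could hide a closer point."""
    best = {"point": None, "dist": float("inf")}

    def visit(node):
        if node is None:
            return
        d = math.dist(node["point"], target)
        if d < best["dist"]:
            best["point"], best["dist"] = node["point"], d
        diff = target[node["axis"]] - node["point"][node["axis"]]
        near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
        visit(near)                   # side containing the target first
        if abs(diff) < best["dist"]:  # is the plane closer than the current best?
            visit(far)

    visit(tree)
    return best["point"]

samples = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(samples)
print(nearest_neighbor(tree, (3, 4.5)))  # (2, 3)
```

The backtracking test `abs(diff) < best["dist"]` is what makes the search sub-linear on average: a whole subtree is skipped whenever the ball around the target with the current best radius does not cross that node's splitting plane.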