Main reference:
Introduction to k-nearest neighbor method:
https://blog.csdn.net/Smile_mingm/article/details/108385312
-
k-Nearest Neighbor Method: a discriminative model
-
Model:
$y = \arg\max_{c_j} \sum_{x_i \in N_k(x)} I(y_i = c_j), \quad i = 1, 2, \dots, N; \ j = 1, 2, \dots, K$
$I$ is the indicator function: $I = 1$ when $y_i = c_j$, and $I = 0$ otherwise. -
The basic idea: draw a neighborhood around the point to be predicted (the K training points closest to it), then look at which category dominates among those K points and assign the predicted point to that category.
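The basic idea above can be sketched in a few lines of plain Python (the toy data is made up for illustration and is not from the referenced posts):

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k):
    """Brute-force k-NN: find the k closest training points and vote."""
    # Indices of training points, sorted by Euclidean distance to the query.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    # The neighborhood N_k(x): labels of the k nearest points.
    neighborhood = [train_labels[i] for i in order[:k]]
    # Majority vote over the k neighbors.
    return Counter(neighborhood).most_common(1)[0][0]

# Toy data: two clusters labeled 0 and 1.
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y = [0, 0, 0, 1, 1, 1]
print(knn_predict(X, y, (0.5, 0.5), k=3))  # → 0 (near the first cluster)
```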
-
Three elements: selection of k value, distance measurement, classification decision rule
- When k = 1, it is called the nearest neighbor method.
- Smaller value of k:
  - Advantage: only training instances close to the input instance have an effect on the prediction, so the approximation error decreases.
  - Disadvantage: the prediction becomes sensitive to the nearby instance points; if a close neighbor happens to be noise, the prediction is wrong, so the estimation error increases.
- Larger value of k:
  - Advantage: the estimation error decreases.
  - Disadvantage: points far from (and irrelevant to) the input instance also influence the prediction, so the approximation error increases.
-
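A hypothetical example of the trade-off above (data made up for illustration): with k = 1, a single noisy neighbor flips the prediction, while a larger k votes it down.

```python
from collections import Counter
import math

def knn_label(X, y, query, k):
    # Labels of the k points nearest to the query, majority vote.
    idx = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))[:k]
    return Counter(y[i] for i in idx).most_common(1)[0][0]

# A cluster of class 0, plus one mislabeled "noise" point that happens
# to sit right next to the query.
X = [(0, 0), (0.2, 0), (0, 0.2), (0.3, 0.3), (0.45, 0.45)]
y = [0, 0, 0, 0, 1]  # the last point is noise
query = (0.5, 0.5)

print(knn_label(X, y, query, k=1))  # → 1: the single nearest point is noise
print(knn_label(X, y, query, k=5))  # → 0: the larger neighborhood outvotes it
```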
Distance measurement: there are many ways to measure the distance between two points, such as the usual Euclidean distance, the Manhattan distance (sum of absolute coordinate differences, no squaring), etc.
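For instance, the two metrics computed on the same pair of points:

```python
def euclidean(p, q):
    # L2 distance: square root of the sum of squared coordinate differences.
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def manhattan(p, q):
    # L1 distance: sum of absolute coordinate differences (no squaring).
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((0, 0), (3, 4)))  # → 5.0
print(manhattan((0, 0), (3, 4)))  # → 7
```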
-
Classification decision rule: the majority voting rule, i.e. the predicted label is the category that appears most often among the K neighbors.
kd tree:
https://zhuanlan.zhihu.com/p/53826008
Build the kd tree -> at each node, choose a splitting dimension and a splitting point.
Search ->
1. Find an approximate point: descend to the leaf node nearest the target point and take it as the current approximate nearest neighbor.
2. Backtrack: move back up toward the root, and at each node use the distance between the target point and the current nearest neighbor to decide whether the other subtree could contain a closer point.
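A minimal kd-tree sketch of these two steps (median splits cycling through the axes; the sample points are made up for illustration, not taken from the referenced post):

```python
import math

class Node:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis, self.left, self.right = point, axis, left, right

def build(points, depth=0):
    """Build a kd tree: cycle the splitting dimension, split at the median."""
    if not points:
        return None
    axis = depth % len(points[0])          # splitting dimension
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # splitting point: the median
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def nearest(node, target, best=None):
    """Step 1: descend toward the leaf containing the target.
    Step 2: backtrack, pruning subtrees that cannot hold a closer point."""
    if node is None:
        return best
    if best is None or math.dist(node.point, target) < math.dist(best, target):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)     # descend on the target's side first
    # Backtrack: the far subtree can only contain a closer point if the
    # splitting hyperplane is nearer than the current best distance.
    if abs(diff) < math.dist(best, target):
        best = nearest(far, target, best)
    return best

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2)))  # → (8, 1)
```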
Code explanation references for the assignment:
Assignment: https://blog.csdn.net/sdu_hao/article/details/103055338
zip: https://www.runoob.com/python/python-func-zip.html
plt.subplots(): https://www.cnblogs.com/komean/p/10670619.html
flatten() function usage: https://www.cnblogs.com/yvonnes/p/10020926.html
numpy.meshgrid():
https://zhuanlan.zhihu.com/p/41968396
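These utilities are typically combined to build the grid of query points for a decision-boundary plot. A small sketch of how meshgrid, flatten/ravel, and zip fit together (assuming NumPy is available; not the assignment's actual code):

```python
import numpy as np

# Build a 3x4 coordinate grid: x takes values 0..3, y takes values 0..2.
xx, yy = np.meshgrid(np.arange(4), np.arange(3))
print(xx.shape, yy.shape)   # (3, 4) (3, 4)

# ravel()/flatten() turn the grids into 1-D arrays, and zip pairs the
# coordinates so each (x, y) can be fed to a classifier as one sample.
points = list(zip(xx.ravel().tolist(), yy.ravel().tolist()))
print(len(points))          # 12 grid points
print(points[:3])           # [(0, 0), (1, 0), (2, 0)]
```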
General code explanation:
https://blog.csdn.net/smallcases/article/details/78236412