Statistical Learning Methods (3): k-Nearest Neighbor Method

Main reference:
Introduction to k-nearest neighbor method:
https://blog.csdn.net/Smile_mingm/article/details/108385312

  • k-Nearest Neighbor Method: a discriminative model

  • Model:
    $$y = \arg\max_{c_j} \sum_{x_i \in N_k(x)} I(y_i = c_j), \quad i = 1, 2, \dots, N; \; j = 1, 2, \dots, K$$
    where $I$ is the indicator function: $I = 1$ when $y_i = c_j$, otherwise $I = 0$.

  • The basic idea: for the point to be predicted, draw a neighborhood around it (the k training points closest to it), then assign the point to whichever category the majority of those k points belong to.

  • Three elements: choice of the value of k, distance metric, classification decision rule
    - When k = 1, it is called the nearest neighbor method.
    - Smaller k:
      - Pro: only training instances close to the input instance affect the prediction, so the approximation error decreases.
      - Con: the prediction becomes sensitive to nearby noise points, so the estimation error increases.
    - Larger k:
      - Pro: the estimation error decreases.
      - Con: training instances far from (irrelevant to) the input instance also influence the prediction, so the approximation error increases.

  • Distance metric: there are many ways to measure the distance between two points, e.g. the usual Euclidean distance, the Manhattan distance (sum of absolute coordinate differences, without squaring), etc.

  • Classification decision rule: majority voting, i.e. the category that appears most often among the k neighbors is the predicted category.
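The three elements above can be sketched as a minimal brute-force classifier (Euclidean distance plus majority vote; `knn_predict` and the toy data below are illustrative, not code from the original post):

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k):
    """Predict the label of `query` by majority vote among its k
    nearest training points (Euclidean distance, brute force)."""
    # Distance metric: Euclidean distance to every training point
    dists = [math.dist(p, query) for p in train_points]
    # Choice of k: indices of the k smallest distances
    neighbors = sorted(range(len(dists)), key=lambda i: dists[i])[:k]
    # Classification decision rule: majority vote among the k neighbors
    votes = Counter(train_labels[i] for i in neighbors)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, (2, 2), k=3))  # -> A
print(knn_predict(X, y, (8, 7), k=3))  # -> B
```

With k = 1 this reduces to the nearest neighbor method; swapping `math.dist` for a sum of absolute differences would give the Manhattan-distance variant.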

kd tree:
https://zhuanlan.zhihu.com/p/53826008
kd-tree algorithm flow:
build the kd tree -> choose the splitting dimension and the splitting point

Search ->
1. Find an approximate point: descend to the nearest leaf node and take it as the approximate nearest neighbor of the target point.

2. Backtrack: move back up toward the root, checking at each node (based on the distance between the target point and the current best neighbor) whether the other subtree could contain a closer point.
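The build and two-step search can be sketched as follows (a minimal kd tree over 2-D points; the dict representation and function names are illustrative assumptions, not from the referenced article):

```python
import math

def build_kdtree(points, depth=0):
    """Build a kd tree: at each level, split on one coordinate axis
    (cycling through axes) at the median point."""
    if not points:
        return None
    axis = depth % len(points[0])           # splitting dimension
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                  # splitting point: the median
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, target, best=None):
    """Step 1: descend toward the leaf nearest the target to get an
    approximate nearest neighbor. Step 2: backtrack, visiting the far
    subtree only if the splitting plane is closer than the current best."""
    if node is None:
        return best
    if best is None or math.dist(node["point"], target) < math.dist(best, target):
        best = node["point"]
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)           # 1. descend to a leaf
    if abs(diff) < math.dist(best, target):      # 2. backtrack: other side may be closer
        best = nearest(far, target, best)
    return best

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2)))  # -> (8, 1)
```

The pruning test `abs(diff) < math.dist(best, target)` is what makes backtracking cheap: a subtree is explored only when the splitting hyperplane is closer to the target than the best neighbor found so far.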

Code explanation references for the assignment:
Assignment: https://blog.csdn.net/sdu_hao/article/details/103055338
zip: https://www.runoob.com/python/python-func-zip.html
plt.subplots(): https://www.cnblogs.com/komean/p/10670619.html

flatten() function usage:
https://www.cnblogs.com/yvonnes/p/10020926.html

numpy.meshgrid():
https://zhuanlan.zhihu.com/p/41968396
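How these pieces typically fit together when plotting a k-NN decision boundary (a minimal sketch under that assumption, not the assignment's actual code): `numpy.meshgrid` builds a grid of query points over the plane, and `flatten()` collapses the coordinate matrices so each grid point becomes one row to classify.

```python
import numpy as np

# Build a grid of query points covering the plane: the usual first
# step before coloring a decision boundary with plt.contourf.
xs = np.linspace(0, 1, 5)        # 5 ticks on the x axis
ys = np.linspace(0, 1, 4)        # 4 ticks on the y axis
xx, yy = np.meshgrid(xs, ys)     # two (4, 5) coordinate matrices
print(xx.shape, yy.shape)        # (4, 5) (4, 5)

# flatten() turns each matrix into a 1-D array, so the coordinates
# can be stacked into one (x, y) row per grid point.
grid = np.column_stack([xx.flatten(), yy.flatten()])
print(grid.shape)                # (20, 2)
```

Each row of `grid` would then be classified (e.g. with the k-NN predictor), and the labels reshaped back to `xx.shape` for plotting.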

General code explanation:
https://blog.csdn.net/smallcases/article/details/78236412

Origin: blog.csdn.net/weixin_48760912/article/details/114579079