Understanding KNN

As the saying goes, the greatest truths are the simplest, so I will try to explain the KNN algorithm in plain language.

In the real world we often need to classify things: we divide people into men and women, and fruit into apples, watermelons, bananas, and so on. When we classify, we draw on knowledge we already have, analyzing the new thing and comparing it against the categories we know. So how can we make a computer classify things in the same way?

The KNN algorithm embodies the idea that "like attracts like; birds of a feather flock together."

Whoever is nearest to you, you probably belong to the same class as them; after all, like-minded people gather together. To make this precise, we compute the distance between the object to be classified and the objects whose classes are already known. How do we compute distance? In a two-dimensional plane, the distance from point A to point B is:


D(A, B) = √((x_A − x_B)² + (y_A − y_B)²)

If we extend this to n dimensions, then:

D(A, B) = √(Σᵢ₌₁ⁿ (aᵢ − bᵢ)²)

From this formula we can see that the smaller the value of D, the closer A and B are in coordinate space, and it is precisely because they are close that A and B can be considered "similar."
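As a minimal sketch in Python (the function name is my own, for illustration), the distance formula above can be computed like this:

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two points given as equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance((0, 0), (3, 4)))  # → 5.0
```

The same function handles the two-dimensional case and the n-dimensional extension, since it simply sums the squared differences over however many coordinates the points have.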

(Figure: two classes of sample points, small blue squares and small red triangles, with a green circle marking the point to be classified.)

As shown above, there are two classes of sample data, one marked with small blue squares and the other with small red triangles; the green circle in the middle marks the data point to be classified. This is our goal: given a new data point, what class does it belong to? Following the KNN idea, we classify the green point by looking at its k nearest neighbors.

If K = 3, the three points nearest the green circle are two small red triangles and one small blue square. The minority is subordinate to the majority, so by this statistical rule the green point is assigned to the red-triangle class.
If K = 5, the five nearest neighbors of the green circle are two red triangles and three blue squares, so by the same majority rule the green point is assigned to the blue-square class.
As this example shows, the idea behind the k-nearest-neighbor algorithm is very simple and easy to understand. So can we stop here? We understand the principle of the algorithm, and we know how to classify a new point: just find its k nearest known instances and take the class that appears most often among them.
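The majority-vote procedure described above can be sketched in a few lines of Python (the sample coordinates and function name are made up to mirror the figure):

```python
import math
from collections import Counter

def knn_classify(query, samples, k):
    """samples: list of (point, label) pairs; returns the majority label among the k nearest."""
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Sort all known points by distance to the query and keep the k closest.
    nearest = sorted(samples, key=lambda s: dist(query, s[0]))[:k]
    # Majority vote: the most common label among the k neighbors wins.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

samples = [((1.0, 1.0), 'red'), ((1.2, 0.8), 'red'), ((3.0, 3.0), 'blue'),
           ((3.2, 2.8), 'blue'), ((2.9, 3.1), 'blue')]
print(knn_classify((1.1, 1.0), samples, 3))  # → 'red'
```

Note that with k = 5 here the vote would include all five samples, so the answer could flip to 'blue', just as in the figure above; this is exactly why the choice of k matters.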

Haha, it is not quite that simple. The core idea of the algorithm really is as described, but to make it work in practical applications there is a lot to pay attention to. For example, how do we choose k, and what value of k is best? And how exactly do we judge which neighbors count as "nearest"?

We generally choose a relatively small value for k, and typically use cross-validation to select the optimal one. (In other words, choosing k is essentially an exercise in experimental parameter tuning, similar to choosing the number of layers in a neural network: you adjust the hyperparameter until you get a better result.)
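One simple form of cross-validation is leave-one-out: classify each known point using all the others, and pick the k with the highest accuracy. A plain-Python sketch (the toy data and all function names here are my own, for illustration only):

```python
import math
from collections import Counter

def knn_classify(query, samples, k):
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(samples, key=lambda s: dist(query, s[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(samples, k):
    """Leave-one-out cross-validation: classify each point using all the rest."""
    hits = sum(
        knn_classify(p, samples[:i] + samples[i + 1:], k) == label
        for i, (p, label) in enumerate(samples)
    )
    return hits / len(samples)

samples = [((1.0, 1.0), 'A'), ((1.1, 0.9), 'A'), ((0.9, 1.2), 'A'),
           ((3.0, 3.0), 'B'), ((3.1, 2.9), 'B'), ((2.8, 3.2), 'B')]
# Try a few candidate values of k and keep the one with the best accuracy.
best_k = max([1, 3, 5], key=lambda k: loo_accuracy(samples, k))
print(best_k)
```

On this tiny dataset k = 5 scores badly: with one point held out, the five remaining neighbors are dominated by the opposite class, which illustrates why an overly large k can wash out the local structure the algorithm relies on.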

Reproduced from: https://www.jianshu.com/p/750746da83d7


Origin: blog.csdn.net/weixin_33752045/article/details/91243016