[Machine Learning Core Summary] What is KNN (K Nearest Neighbor Algorithm)

Despite the "NN" in its name, KNN is not a kind of neural network. Its full name is K-Nearest Neighbors, and it is a commonly used classification algorithm in machine learning.

Birds of a feather flock together. The basic idea of KNN is very simple: to judge the category of a new data point, look at who its neighbors are.

Suppose our task is to classify fruit. We don't know whether a new fruit is a pear or an apple, but we can plot it in a coordinate system by its size and color and see where the already-labeled apples and pears lie. If most of the nearby samples are apples, we classify it as an apple; otherwise, as a pear.

The K in KNN refers to the number of neighbors considered: with K=3, the category of the new data point is decided by its three closest samples.
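The "decide by the K closest samples" step is just a majority vote. A minimal sketch, using hypothetical labels for the three nearest neighbors of a new fruit:

```python
from collections import Counter

# Hypothetical labels of the K=3 samples closest to the new fruit.
neighbor_labels = ["apple", "apple", "pear"]

# Majority vote: the most common label among the K neighbors wins.
prediction = Counter(neighbor_labels).most_common(1)[0][0]
print(prediction)  # apple
```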

Size and color are the data's features, and apple and pear are its labels. To measure distance, you can use either the straight-line distance between two points, known as the Euclidean distance, or the sum of the absolute differences along each coordinate axis, known as the Manhattan distance.
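Both distances are one-liners. A sketch with hypothetical (size, color) feature vectors; the variable names and values are made up for illustration:

```python
import math

def euclidean(a, b):
    # Straight-line distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute differences along each coordinate axis.
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical feature vectors: (size, color score).
apple = (7.0, 0.9)
new_fruit = (4.0, 0.5)
print(euclidean(apple, new_fruit))  # ≈ 3.03
print(manhattan(apple, new_fruit))  # ≈ 3.4
```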

For KNN, the choice of K matters a great deal. If K is too small, the prediction is easily swayed by individual outliers; if K is too large, it is influenced by distant samples that are not truly similar. The right K depends on the problem and the size of the dataset, and is often found by trial and error.
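One common form of that trial and error is to score several K values on held-out data. A sketch using leave-one-out evaluation on a hypothetical toy dataset (the data and helper names are invented for illustration; leave-one-out is just one of several ways to validate K):

```python
from collections import Counter

def knn_predict(train, query, k):
    # train: list of (features, label); vote among the k nearest by Euclidean distance.
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda s: dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k):
    # Leave-one-out: predict each sample from all the others.
    hits = sum(knn_predict(data[:i] + data[i + 1:], x, k) == y
               for i, (x, y) in enumerate(data))
    return hits / len(data)

# Hypothetical toy dataset: (size, color) -> fruit, 3 samples per class.
data = [((7.0, 0.90), "apple"), ((6.5, 0.80), "apple"), ((7.2, 0.85), "apple"),
        ((4.0, 0.30), "pear"),  ((4.5, 0.35), "pear"),  ((3.8, 0.25), "pear")]

for k in (1, 3, 5):
    # With only 3 samples per class, K=5 is dominated by the other class.
    print(k, loo_accuracy(data, k))
```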

What can the KNN algorithm do?

  • Classifying plants by features such as petal length and width
  • Categorizing articles after processing the text into words and counting word frequencies
  • Helping e-commerce and video sites find users similar to you and recommend products or content you may like based on their choices

Despite being simple and easy to use, KNN has drawbacks. To classify a new sample, it must compute the distance to every training sample, sort them from nearest to farthest, and then vote among the K nearest. The more data there is, the heavier this computation becomes, so efficiency drops and KNN is hard to apply to large datasets.
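The brute-force pass described above can be sketched in a few lines of NumPy; note that every query touches all n training samples. The training set here is randomly generated for illustration:

```python
import numpy as np

# Hypothetical training set: 1000 samples with 2 features each,
# labeled by the sign of the first feature.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 2))
y_train = (X_train[:, 0] > 0).astype(int)

def knn_classify(x, k=3):
    # The full brute-force pass: distance to every training sample,
    # sort near-to-far, keep k, then majority vote - O(n) work per query.
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to all n samples
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    return np.bincount(y_train[nearest]).argmax()

print(knn_classify(np.array([2.0, 0.0])))   # 1: deep in the x[0] > 0 region
print(knn_classify(np.array([-2.0, 0.0])))  # 0: deep in the x[0] < 0 region
```

Faster variants (k-d trees, ball trees, approximate nearest-neighbor indexes) exist precisely because this per-query linear scan does not scale.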

Origin blog.csdn.net/RuanJian_GC/article/details/131544209