Nearest neighbor rule classification: KNN (K-Nearest Neighbor)

KNN is a commonly used classification algorithm in machine learning. Its idea is to measure the distance between the features of an unknown sample and the features of known samples, and assign the unknown sample to the category of the known samples it is closest to.

The K-Nearest Neighbor (KNN) classification algorithm is a theoretically mature method and one of the simplest machine learning algorithms. The idea is: if most of the k samples most similar to a given sample in the feature space (that is, its nearest neighbors in that space) belong to a certain category, then the sample also belongs to that category. In the KNN algorithm, the selected neighbors are all objects that have already been correctly classified. The classification decision is based only on the categories of the nearest one or several samples, and the number of neighbors consulted is the value of the parameter k.
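A minimal sketch of this decision rule, assuming Euclidean distance and a simple majority vote (the function and variable names below are illustrative, not from the text):

```python
import math
from collections import Counter

def knn_classify(query, samples, labels, k=3):
    """Return the majority label among the k training samples nearest to query."""
    # Euclidean distance from the query point to every training sample
    dists = [math.dist(query, s) for s in samples]
    # Indices of the k smallest distances
    nearest = sorted(range(len(samples)), key=lambda i: dists[i])[:k]
    # Majority vote among the labels of those k neighbors
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Example: two clusters in a 2-D feature space
X = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (5.0, 5.2), (5.1, 4.8), (4.9, 5.0)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify((1.0, 1.0), X, y, k=3))  # -> "A"
print(knn_classify((5.0, 5.0), X, y, k=3))  # -> "B"
```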

KNN can be used not only for classification but also for regression. By finding the k nearest neighbors of a sample and assigning it the average of those neighbors' attribute values, the attribute of the sample can be estimated. A more useful variant gives different weights to neighbors at different distances, for example weights inversely proportional to distance, so that closer neighbors contribute more to the result.
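A sketch of k-NN regression with inverse-distance weighting, as just described; the names and the small epsilon guard against division by zero are illustrative assumptions:

```python
import math

def knn_regress(query, samples, targets, k=3, eps=1e-9):
    """Predict a value for query as the distance-weighted mean of the
    target values of its k nearest training samples."""
    # Sort (distance, target) pairs by distance and keep the k closest
    nearest = sorted(
        ((math.dist(query, s), t) for s, t in zip(samples, targets)),
        key=lambda pair: pair[0],
    )[:k]
    # Weight each neighbor by 1/distance (eps avoids division by zero
    # when the query coincides with a training sample)
    weights = [1.0 / (d + eps) for d, _ in nearest]
    return sum(w * t for w, (_, t) in zip(weights, nearest)) / sum(weights)

X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0.0, 1.0, 4.0, 9.0]
print(knn_regress((1.5,), X, y, k=2))  # neighbors 1.0 and 4.0, equal weights -> 2.5
```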

K-NN is arguably one of the most direct methods for classifying unknown data. The figure and the description below capture what K-NN does.

[Figure: KNN]

Simply put, K-NN works like this: you have a set of data whose classes are already known. When a new data point arrives, compute its distance to every point in the training data, pick out the K closest training points, look at which classes those points belong to, and assign the new point the class held by the majority of them.

The algorithm steps are as follows (a sketch implementing them appears after the list):

  1. Initialize the K nearest distances to the maximum value
  2. Calculate the distance dist between the unknown sample and each training sample
  3. Get the maximum distance maxdist among the current K nearest samples
  4. If dist is less than maxdist, replace the sample at maxdist with this training sample as one of the K nearest neighbors
  5. Repeat steps 2, 3, and 4 until the distances to all training samples have been calculated
  6. Count the occurrences of each class label among the K nearest samples
  7. Select the most frequent class label as the class label of the unknown sample
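A sketch that follows these seven steps literally: a buffer of the K nearest samples seen so far is initialized to infinite distance, and the current farthest entry is replaced whenever a closer training sample is found. Variable names are illustrative assumptions.

```python
import math
from collections import Counter

def knn_stepwise(query, samples, labels, k=3):
    # Step 1: initialize the K distances to the maximum value (infinity)
    neighbors = [(math.inf, None)] * k  # (distance, label) pairs
    for s, label in zip(samples, labels):
        # Step 2: distance between the unknown sample and this training sample
        dist = math.dist(query, s)
        # Step 3: the largest distance currently in the K-nearest buffer
        max_i = max(range(k), key=lambda i: neighbors[i][0])
        maxdist = neighbors[max_i][0]
        # Step 4: if closer than the current farthest neighbor, replace it
        if dist < maxdist:
            neighbors[max_i] = (dist, label)
        # Step 5: the loop repeats steps 2-4 for every training sample
    # Steps 6-7: count the labels of the K nearest and take the most frequent
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```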

