Machine learning algorithm: a preliminary study of k-nearest neighbors

1 Introduction and application of KNN

1.1 Introduction to KNN

kNN (k-nearest neighbors) is one of the simplest machine learning algorithms. A common saying captures its idea: if you want to gauge a person's financial level, you only need to know the financial level of their five closest friends; the average of those five is a good estimate of the person's own level. This saying contains the core idea of the kNN algorithm.

[Figure: a green circle to be classified, surrounded by red triangles and blue squares]

Example: as shown in the figure above, which class should the green circle be assigned to, the red triangles or the blue squares? If K=3, red triangles make up 2/3 of the neighbors, so the green circle is assigned to the red-triangle class. If K=5, blue squares make up 3/5 of the neighbors, so the green circle is assigned to the blue-square class.
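The figure's example can be reproduced with a simple majority vote. This is a minimal sketch: the list of neighbor labels below is hypothetical, chosen to match the figure (the nearest three neighbors are two triangles and one square, and the next two are squares).

```python
from collections import Counter

# Hypothetical neighbors of the green circle, sorted from nearest to farthest,
# matching the figure: 2 triangles and 1 square within k=3, then 2 more squares.
neighbors_by_distance = ["triangle", "triangle", "square", "square", "square"]

def majority_vote(k):
    """Return the most common label among the k nearest neighbors."""
    votes = Counter(neighbors_by_distance[:k])
    return votes.most_common(1)[0][0]

print(majority_vote(3))  # 2 triangles vs 1 square  -> "triangle"
print(majority_vote(5))  # 2 triangles vs 3 squares -> "square"
```

Note how the prediction flips as K grows, which is exactly why the choice of K matters.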

1) The KNN classification procedure

1 Given a test sample, compute the distance between it and every sample in the training set.
2 Find the K training samples with the smallest distances; these are the neighbors of the test sample.
3 Determine the category of the test sample from the categories of its K neighbors.
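The three steps above can be sketched from scratch in a few lines of Python. This is a minimal illustration using Euclidean distance and majority voting; the toy training points and labels are made up for the example.

```python
import math
from collections import Counter

def knn_classify(test_point, train_points, train_labels, k=3):
    # Step 1: distance from the test sample to every training sample
    dists = [math.dist(test_point, p) for p in train_points]
    # Step 2: indices of the k nearest training samples (the neighbors)
    nearest = sorted(range(len(dists)), key=lambda i: dists[i])[:k]
    # Step 3: majority vote over the neighbors' labels
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters
train = [(1, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify((2, 1), train, labels, k=3))  # -> "A"
```

Note that KNN has no training phase at all: "building" the model is just storing the training set, and all the work happens at prediction time.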

 

2) Determining the category

① Majority voting: the minority follows the majority, and the most common category among the K neighbors becomes the test sample's category.

② Weighted voting: weight each neighbor's vote by its distance, with closer neighbors getting larger weights. A common choice is to set the weight to the reciprocal of the squared distance.
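The weighted-voting rule can be sketched as follows. This is a minimal illustration of the reciprocal-of-squared-distance weighting described above; the toy data is made up, and a small epsilon guards against division by zero when a test point coincides with a training point.

```python
import math
from collections import defaultdict

def weighted_vote(test_point, train_points, train_labels, k=3):
    # Rank training samples by distance to the test point
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pl: math.dist(test_point, pl[0]))
    scores = defaultdict(float)
    for point, label in ranked[:k]:
        d = math.dist(test_point, point)
        # Weight = reciprocal of the squared distance; the epsilon
        # avoids division by zero for an exact match.
        scores[label] += 1.0 / max(d ** 2, 1e-12)
    # The category with the highest total weighted score wins
    return max(scores, key=scores.get)

train = [(0, 0), (1, 0), (5, 5)]
labels = ["near", "near", "far"]
print(weighted_vote((0.1, 0.1), train, labels, k=3))  # -> "near"
```

With this scheme a single very close neighbor can outvote several distant ones, which often makes the classifier less sensitive to the exact choice of K.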

1.2 Application of KNN

Although KNN is very simple, as the saying goes, the simplest way is often the best. The proverb "birds of a feather flock together" lifts its veil. The seemingly simple KNN can do both classification and regression, and it can also be used to fill in missing values during data preprocessing. Since the KNN model has good interpretability, for simple machine learning problems we can generally use KNN as a baseline and explain each prediction result well. KNN also appears in recommender systems: in an article recommendation system, for example, we can recommend to a user A the articles viewed by the k users most similar to A.

In machine learning, data is often decisive. There is a saying: "The data determines the upper limit of the task, and the goal of the model is to approach this upper limit." Good data is clearly important, but for various reasons the data we obtain often has missing values. If we can fill these missing values well, we get better data, which lets us train a more robust model. Next, let's look at how KNN performs classification, how it performs regression, and how it fills in missing values.
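As a taste of the missing-value use case, here is a minimal sketch of KNN imputation: a missing entry is filled with the mean of that column over the k nearest rows, where distance is computed on the remaining (complete) columns. The function name, data layout, and toy values are all hypothetical, chosen only for illustration.

```python
import math

def knn_impute(rows, target_col, k=2):
    """Fill None entries in target_col with the mean of that column
    over the k nearest complete rows (distance on the other columns)."""
    complete = [r for r in rows if r[target_col] is not None]
    others = [i for i in range(len(rows[0])) if i != target_col]
    filled = []
    for r in rows:
        if r[target_col] is not None:
            filled.append(list(r))
            continue
        # Rank complete rows by distance on the non-missing columns
        neighbors = sorted(
            complete,
            key=lambda c: math.dist([r[i] for i in others],
                                    [c[i] for i in others]))[:k]
        new_row = list(r)
        # Impute with the neighbors' mean in the target column
        new_row[target_col] = sum(c[target_col] for c in neighbors) / k
        filled.append(new_row)
    return filled

data = [[1.0, 2.0], [1.1, 2.2], [9.0, 9.0], [1.05, None]]
result = knn_impute(data, target_col=1, k=2)
print(result[3])  # missing value filled with the mean of 2.0 and 2.2
```

KNN regression works the same way: instead of voting on a class label, average (or distance-weight) the neighbors' numeric target values.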


Origin blog.csdn.net/yichao_ding/article/details/111495049