Illustrated machine learning: how the "k-nearest neighbor algorithm" works

Background

Let me start with a story. Somewhere, at a longitude and latitude I don't know, there are two tribes: the Red Dot Tribe and the Green Dot Tribe. Residents of the Red Dot Tribe are red while inside their tribe, and residents of the Green Dot Tribe are green while inside theirs. Residents of both tribes occasionally go out to hunt, but once they leave their tribe they turn gray.

Because the residents like to stick together, they stay closer to their own tribe, where they feel safer. So to find out which color a gray guy really is, I only need to see which tribe he is closer to. But then one bold gray guy, Hui Changda, walked right into the middle between the two tribes.

To find out which tribe Hui Changda belongs to, the scientists relied on the clustering habit of the two tribes: they recorded the coordinates of Hui Changda and of every resident in both tribes, then calculated the distance between him and each resident.
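
The distance step can be sketched in a few lines of Python. This is only an illustration: the coordinates are invented, and the story does not say which distance measure the scientists used, so straight-line (Euclidean) distance is an assumption here.

```python
import math

# Hypothetical coordinates, invented purely for illustration.
residents = [
    ((1.0, 1.0), "red"),
    ((2.0, 1.5), "red"),
    ((6.0, 5.0), "green"),
    ((7.0, 6.0), "green"),
]
gray_guy = (4.0, 4.0)  # the gray resident standing between the tribes


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# One (distance, tribe) pair per known resident.
distances = [(euclidean(gray_guy, pos), tribe) for pos, tribe in residents]
```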


Then they found the k residents closest to Hui Changda and checked their passports. Most of them turned out to be from the Green Dot Tribe, so the conclusion was: Hui Changda belongs to the Green Dot Tribe!
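
Put together, the whole procedure (measure distances, keep the k closest, take a majority vote) might look like the sketch below. The function name knn_classify and the sample coordinates are hypothetical, and math.dist assumes Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(query, samples, k):
    """Return the majority label among the k samples nearest to query.

    samples is a list of (point, label) pairs.
    """
    # Sort all known samples by distance to the query and keep the k closest.
    nearest = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    # "Check their passports": count the labels of those k neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical coordinates, invented purely for illustration.
residents = [
    ((1.0, 1.0), "red"), ((2.0, 1.5), "red"), ((1.5, 2.5), "red"),
    ((6.0, 5.0), "green"), ((7.0, 6.0), "green"), ((5.5, 6.5), "green"),
]
tribe = knn_classify((4.5, 4.5), residents, k=3)
```

With these made-up points, the three nearest residents are all green, so the gray guy is assigned to the Green Dot Tribe.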

This is why this classification algorithm is called the "k-nearest neighbor algorithm".

Practical application

In practical applications, literal physical distance is rarely what we measure; instead, various indicators are converted into distance information in some way.
For example, we can use this method to judge whether a person is fat or thin, with height on the horizontal axis and weight on the vertical axis.
I don't know whether Hui Changda is fat or thin, and I don't know the standard for fat or thin. But I do know the heights and weights of many people already labeled fat or thin. I just need to apply a simple normalization to the heights and weights of this group, normalize Hui Changda's height and weight in the same way, and run the calculation.
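
The post does not say which normalization it means. One common choice is min-max scaling, sketched below as an assumption; the helper names and the numbers are made up. The point is that the new person is scaled with the same minimum and maximum as the reference group, so that height (in cm) and weight (in kg) contribute on a comparable scale.

```python
def fit_min_max(values):
    """Return the (min, max) of the reference group for one indicator."""
    return min(values), max(values)


def scale(value, lo, hi):
    """Map value linearly so that lo -> 0.0 and hi -> 1.0."""
    if hi == lo:
        return 0.0  # all reference values identical; nothing to scale
    return (value - lo) / (hi - lo)


# Hypothetical reference group, invented for illustration.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weights_kg = [45.0, 60.0, 75.0, 90.0, 105.0]

h_lo, h_hi = fit_min_max(heights_cm)
w_lo, w_hi = fit_min_max(weights_kg)

norm_heights = [scale(h, h_lo, h_hi) for h in heights_cm]
norm_weights = [scale(w, w_lo, w_hi) for w in weights_kg]

# The new person is scaled with the reference group's min and max.
query = (scale(165.0, h_lo, h_hi), scale(66.0, w_lo, w_hi))
```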

The calculation finds the k = 7 people whose height and weight are closest to Hui Changda's. 5 of them are thin, so Hui Changda is considered thin. This is also why k is usually chosen to be odd: with two classes, an odd k means the vote cannot end in a tie.
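
The vote itself is just counting labels. The sketch below replays the 5-to-2 result with k = 7 and shows how an even k can produce a tie; the helper name majority and the order of the labels are invented for illustration.

```python
from collections import Counter


def majority(labels):
    """Return (most common label, whether the top two labels are tied)."""
    ranked = Counter(labels).most_common()
    is_tie = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
    return ranked[0][0], is_tie


# k = 7 neighbors, 5 of them thin, as in the example above.
seven_neighbors = ["thin", "thin", "fat", "thin", "thin", "fat", "thin"]
label, tied = majority(seven_neighbors)

# With an even k and two classes, a 3-to-3 split gives no clear winner.
six_neighbors = ["thin", "fat", "thin", "fat", "thin", "fat"]
_, tied_even = majority(six_neighbors)
```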

Surprisingly, the accuracy of such a simple and crude method is quite high. The disadvantage is that if the sample is large (many people with known fat/thin labels) and there are many evaluation indicators (height, weight, skin color, age, etc.), the computation becomes very slow.
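
To get a feel for why it slows down, here is a rough back-of-the-envelope sketch. It assumes the plain brute-force approach from the story, where every new person is compared against every known sample on every indicator; the function name and the numbers are hypothetical.

```python
def brute_force_cost(n_samples, n_features, n_queries=1):
    """Rough count of per-indicator differences computed by brute-force k-NN.

    Each query is compared with every sample, one indicator at a time.
    """
    return n_samples * n_features * n_queries


# Hypothetical sizes: one million labeled people, four indicators each.
per_query = brute_force_cost(1_000_000, 4)
thousand_queries = brute_force_cost(1_000_000, 4, n_queries=1_000)
```

The work grows in proportion to both the number of samples and the number of indicators, and it is repeated for every new person to classify.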


Origin blog.csdn.net/ruredfive/article/details/124877215