The k-nearest neighbor (KNN) algorithm is an algorithm for solving classification problems: it handles not only binary classification but multi-class problems as well.
In fact, it can also be used for regression.
K-nearest neighbor principle:
The category of a sample is decided by a vote among its K nearest neighbors.
Example:
Suppose we have a set of samples in which every sample has already been labeled with a class, and a sample x of unknown category needs to be classified.
Among the K samples closest to x, we count the proportion of each category. Suppose k = 5: we find the five samples nearest to x,
then tally their categories. Of these five samples, two belong to category A and three belong to category B. Since category B has the higher proportion,
we conclude that sample x belongs to category B.
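The vote in the example above can be sketched in a few lines of Python. The `neighbor_labels` list is hypothetical, hard-coded to match the example (2 of class A, 3 of class B among the 5 nearest samples):

```python
from collections import Counter

# Hypothetical labels of the k = 5 samples nearest to x,
# matching the example: 2 samples of class A, 3 of class B.
neighbor_labels = ["A", "B", "A", "B", "B"]

# most_common(1) returns the (label, count) pair with the highest count;
# its label is the predicted class of x.
predicted = Counter(neighbor_labels).most_common(1)[0][0]
print(predicted)  # B
```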
Figure:
Red dots are class A, blue dots are class B, and the black dot is the sample x whose class we want to predict.
Running the KNN algorithm with k = 5:
Among the 5 (determined by K) samples nearest to the black dot (sample x), there are three blue dots and two red dots. We can therefore conclude that the black dot belongs to the same category as the blue dots: class B.
Since the K in KNN decides how many neighbors take part in the vote, what happens for other values of K?
(Another case) when k = 3:
This time, among the three nearest neighbors of the black dot there are 2 red dots and 1 blue dot. The red dots now have the higher proportion, so we conclude that the black dot belongs to the same category as the red dots: class A.
The comparison shows:
In K-nearest neighbors, the value of K affects the final prediction.
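The effect of K can be reproduced numerically. The `neighbors` list below is hypothetical, chosen so that the distance ordering mirrors the figure (A = red dots, B = blue dots, sorted from nearest to farthest):

```python
from collections import Counter

# Hypothetical (distance, label) pairs for the training samples nearest to
# the black dot x, already sorted by distance; A = red dots, B = blue dots.
neighbors = [(1.0, "A"), (1.1, "A"), (1.2, "B"), (1.3, "B"), (1.4, "B")]

for k in (3, 5):
    # Vote among the k nearest labels only.
    votes = Counter(label for _, label in neighbors[:k])
    print(f"k = {k}: predicted class {votes.most_common(1)[0][0]}")
# k = 3 -> A (2 red vs. 1 blue), k = 5 -> B (3 blue vs. 2 red)
```

The same point and the same data yield different predictions for k = 3 and k = 5, which is exactly the sensitivity to K described above.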
K-nearest neighbor pseudo-code:
1. Traverse all samples in the training set, compute the distance between each sample and the sample x, and save all the distances.
2. Sort these distances in ascending order and take the k nearest samples.
3. Count the categories of those k samples and find the class with the highest proportion.
4. Label the sample to be classified with that highest-proportion class.
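The four steps above can be sketched as a short Python function. This is a minimal illustration, not a production implementation; the training points are hypothetical toy data, and Euclidean distance is assumed as the distance measure:

```python
import math
from collections import Counter

def knn_classify(x, training, k):
    """Classify x by the majority class of its k nearest training samples."""
    # Step 1: compute the distance from x to every training sample.
    distances = [(math.dist(point, x), label) for point, label in training]
    # Step 2: sort the distances in ascending order and keep the k nearest.
    nearest = sorted(distances)[:k]
    # Steps 3-4: count the classes among them and return the most frequent.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: class A points cluster near (0, 0),
# class B points cluster near (5, 5).
training = [
    ((0.0, 0.0), "A"), ((1.0, 0.0), "A"), ((0.0, 1.0), "A"),
    ((5.0, 5.0), "B"), ((6.0, 5.0), "B"), ((5.0, 6.0), "B"),
]
print(knn_classify((0.5, 0.5), training, 3))  # A
print(knn_classify((5.5, 5.5), training, 3))  # B
```

Note that `sorted(distances)` orders tuples by their first element (the distance), which is exactly the ascending sort of step 2.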