Machine Learning | On the K- nearest neighbor algorithm

K- nearest neighbor (KNN) algorithm is an algorithm to solve classification problems. Not only can solve the two-class, multi-classification problems can be solved.

In fact, it can also solve the problem of return.

 

 

K- nearest neighbor principle:

  The category of a sample, is determined by the K with the closest neighbors polling stations.

  example:

  There is now a set of samples, in which all data has been labeled a good class, assuming that there is a category of unknown sample x to be classified.

  In this sample from the nearest K samples, statistical accounting for each category. Suppose k = 5, the five samples which is calculated from the unknown sample x Recently,

  Then their categories statistics, as these five samples, there are two categories belonging to A, 3 belong to category B. Due to the relatively high proportion of category B,

  So come sample x belongs to category B.

  Figure:

        

   Category red dot is class A, is a blue dot category class B, a black dot represents a sample class demand prediction x.

   By Knn algorithm, when k = 5 when:

   

    (Sample x) nearest 5 (K determined) samples from black spots, there are three points blue, two red dots. It can be determined that black and blue dots belong to the same category as a class B

    Since the value of K K neighbors decided to take vote. Then when the other values ​​of K, what is the situation?

    (Another case) when k = 3 when:

        

    At this time, the three nearest neighbors in black dots, 2 red, 1 blue dot, and therefore a high proportion of the red dot, can be determined as black spots and red spots belong to class A category

    By comparison shows:

      

     In the neighborhood K-, K values ​​affect the prediction of the final result.

 

K- neighbors pseudo-code:

    1. traversing all the training set samples, calculates the distance between each sample and the sample x, save all distances

    2. (ascending) of these distances, k nearest samples taken

    3. Category k samples statistics, find the highest proportion of classes

    4. samples to be labeled category is the highest proportion of classes 

 

      

 

    

 

Guess you like

Origin www.cnblogs.com/qiutenglong/p/10961222.html