Simulating the KNN algorithm in Python

I. About KNN

  1. The KNN algorithm, also known as the k-nearest-neighbor algorithm, is a classification method in data mining. "K nearest neighbors" means that each sample can be represented by its k closest neighbors.
  2. The core idea of KNN is that if the majority of a sample's k nearest neighbors in feature space belong to a certain category, then the sample also belongs to that category and shares the characteristics of that category. The method bases its classification decision only on the category of the nearest one or few samples. Because the decision depends only on a very small number of neighboring samples, rather than on discriminating class domains, KNN is better suited than other methods to sample sets whose class domains overlap heavily.
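The idea above can be condensed into a tiny stand-alone function (a minimal sketch using the same toy data as the code below; the name `knn_predict` is introduced here for illustration, not from the post):

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to query."""
    # train: list of (feature, label) pairs; distance is the 1-D absolute difference
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [(203, 1), (126, 1), (89, 1), (70, 1),
         (196, 2), (211, 2), (221, 2), (311, 3), (271, 3)]
print(knn_predict(train, 200, 3))  # -> 2 (two of the three nearest points are class 2)
```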

II. Code implementation

# -*- coding: utf-8 -*-
"""
Use Python to simulate the KNN algorithm
Created on Sat Jun 22 18:38:22 2019

@author: Zhen
"""
import numpy as np
import collections as cs

data = np.array([
        [203, 1], [126, 1], [89, 1], [70, 1], [196, 2], [211, 2], [221, 2], [311, 3], [271, 3]
        ])
feature = data[:, 0]  # features
print(feature)

label = data[:, -1]  # class labels
print(label)

predictPoint = 200  # data point to predict
print("Predicted input feature: " + str(predictPoint))

distance = list(map(lambda x: abs(predictPoint - x), feature))  # distance from each point to the prediction point
print(distance)

sortIndex = np.argsort(distance)  # sort, returning the index of each element in the original data
print(sortIndex)

sortLabel = label[sortIndex]  # reorder the labels by the sorted indices
print(sortLabel)

# k = 3  # set the value of k to 3

for k in range(1, label.size + 1):
    result = cs.Counter(sortLabel[0:k]).most_common(1)[0][0]  # the most frequent label among the first k values is the predicted class
    print("When k = " + str(k) + ", the predicted class is: " + str(result))

III. Results

[203 126  89  70 196 211 221 311 271]
[1 1 1 1 2 2 2 3 3]
Predicted input feature: 200
[3, 74, 111, 130, 4, 11, 21, 111, 71]
[0 4 5 6 8 1 2 7 3]
[1 2 2 2 3 1 1 3 1]
When k = 1, the predicted class is: 1
When k = 2, the predicted class is: 1
When k = 3, the predicted class is: 2
When k = 4, the predicted class is: 2
When k = 5, the predicted class is: 2
When k = 6, the predicted class is: 2
When k = 7, the predicted class is: 1
When k = 8, the predicted class is: 1
When k = 9, the predicted class is: 1
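A detail worth noting in the k = 2 line: the two nearest labels are [1, 2], a tie. In CPython 3.7+, `Counter.most_common` orders elements with equal counts in the order first encountered, so the label of the single nearest neighbor wins the tie, which is why the output shows 1:

```python
from collections import Counter

# The two nearest labels at k = 2 form a tie between class 1 and class 2.
# Elements with equal counts are returned in first-encountered order,
# so the nearest neighbor's label (1) comes out on top.
votes = Counter([1, 2])
print(votes.most_common(1)[0][0])  # -> 1
```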

IV. Summary

  1. When k is small (e.g. k = 1), prediction errors occur easily if the training data contain anomalous points, so in general the value of k should not be too small!

  2. When k is large, the classes with more training samples are more likely to win the vote, so the training data should first be rebalanced across classes!

  3. The choice of k is generally related to the number of classes: the more classes there are, the larger k generally should be; a typical range is between the number of classes and twice that number!
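Points 1 and 2 can be seen in a small vote (the labels below are hypothetical, chosen only to illustrate the effect, not taken from the post's data):

```python
from collections import Counter

# Hypothetical labels of the 5 nearest neighbors, sorted by distance:
# the closest point belongs to class 2, but class 1 dominates the set.
nearest_labels = [2, 1, 1, 1, 1]
print(Counter(nearest_labels[:1]).most_common(1)[0][0])  # k = 1 -> 2 (sensitive to the one nearest point)
print(Counter(nearest_labels[:5]).most_common(1)[0][0])  # k = 5 -> 1 (the majority class takes over)
```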

 


Origin www.cnblogs.com/yszd/p/11070192.html