Machine learning algorithms: KNN

Principle of the KNN algorithm

The KNN (K-Nearest Neighbor) classification algorithm is one of the simplest classification techniques in data mining. Its guiding idea is "birds of a feather flock together": your category is inferred from the categories of your neighbors.

The principle of KNN classification is as follows. To determine the category of an unknown sample, all samples whose categories are known serve as references. Compute the distance from the unknown sample to every known sample and select the K known samples nearest to it. Then, following the majority-voting rule (the minority yields to the majority), assign the unknown sample to the category that is most common among those K nearest neighbors.
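The three steps above (compute all distances, keep the K nearest, take a majority vote) can be sketched in plain Python without NumPy. This is a minimal illustration, not the author's code: it assumes Euclidean distance (the same metric the script in the next section uses), the function name `knn_predict` is hypothetical, and vote ties are broken by whichever label `Counter.most_common` returns first.

```python
import math
from collections import Counter

def knn_predict(query, samples, labels, k):
    """Hypothetical helper: classify `query` by a majority vote of its k nearest samples."""
    # Step 1: Euclidean distance from the query to every known sample (math.dist needs Python 3.8+)
    distances = [math.dist(query, s) for s in samples]
    # Step 2: indices of the known samples ordered nearest first, truncated to the first k
    nearest = sorted(range(len(samples)), key=lambda i: distances[i])[:k]
    # Step 3: majority vote among the labels of the k nearest samples
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

samples = [[5, 115], [7, 106], [56, 11], [66, 9]]
labels = ['action movie', 'action movie', 'romance movie', 'romance movie']
print(knn_predict([20, 101], samples, labels, 3))  # action movie
```

With k = 3 the nearest samples to [20, 101] are two action movies and one romance movie, so the vote returns 'action movie'.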

Python implementation of the KNN algorithm

   

import numpy as np
import operator

def createDataSet():
    # four samples, each described by two features
    group = np.array([[5, 115], [7, 106], [56, 11], [66, 9]])
    # the label of each of the four samples
    labels = ('action movie', 'action movie', 'romance movie', 'romance movie')
    return group, labels

def classify(inX, dataSet, labels, k):
    '''
    KNN algorithm: classify inX by a majority vote of its k nearest neighbors in dataSet
    '''
    # In numpy, shape[0] is the number of rows of an array and shape[1] the number of columns
    dataSetSize = dataSet.shape[0]
    # Repeat inX dataSetSize times vertically and once horizontally,
    # e.g. inX = [1, 2] ---> [[1, 2], [1, 2], [1, 2], [1, 2]], so it can be subtracted from every row
    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet
    # Square the difference of each feature
    sqDiffMat = diffMat ** 2
    # Sum the squared differences of each row, then take the square root: the Euclidean distance
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    print("distances:", distances)
    # Indices that would sort the distances in ascending order
    sortedDistIndices = distances.argsort()
    print("sortedDistIndices:", sortedDistIndices)
    classCount = {}
    for i in range(k):
        # Label of the i-th nearest sample
        voteLabel = labels[sortedDistIndices[i]]
        # dict.get(key, default) returns the value for key, or default if key is not in the dictionary
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    # Sort the votes in descending order:
    # classCount.items() gives (label, count) tuples, operator.itemgetter(1) sorts by the
    # second element of each tuple (the count), and reverse=True makes the order descending
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    # e.g. sortedClassCount = [('action movie', 2), ('romance movie', 1)]
    print("sortedClassCount:", sortedClassCount)
    print("classCount:", classCount)
    return sortedClassCount[0][0]

if __name__ == '__main__':
    group, labels = createDataSet()
    test = [20, 101]
    test_class = classify(test, group, labels, 3)
    print(test_class)
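The `np.tile` call in `classify` builds an explicit copy of the query for every row of the data set. NumPy broadcasting produces the same differences without that copy. The sketch below is an alternative formulation, not the author's code: `classify_broadcast` is a hypothetical name, and it swaps the dictionary-plus-`sorted` vote for `collections.Counter`.

```python
from collections import Counter
import numpy as np

def classify_broadcast(inX, dataSet, labels, k):
    # Broadcasting: a shape-(2,) query is subtracted from every row of a shape-(n, 2) array
    diffMat = np.asarray(inX) - dataSet
    # Euclidean distance of each known sample to the query
    distances = np.sqrt((diffMat ** 2).sum(axis=1))
    # Indices of the k nearest samples
    nearest = distances.argsort()[:k]
    # Majority vote among their labels
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

group = np.array([[5, 115], [7, 106], [56, 11], [66, 9]])
labels = ('action movie', 'action movie', 'romance movie', 'romance movie')
# The broadcast differences equal the np.tile differences
assert np.array_equal(np.asarray([20, 101]) - group, np.tile([20, 101], (4, 1)) - group)
print(classify_broadcast([20, 101], group, labels, 3))  # action movie
```

For the test point [20, 101] with k = 3, two of the three nearest neighbors are action movies, so this version agrees with `classify` and returns 'action movie'.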

 


Source: www.cnblogs.com/yuyang81577/p/11359799.html