KNN principles and implementing KNN with a movie classification example

KNN (k-nearest neighbor) algorithm theory

  KNN is short for k-nearest neighbor, an algorithm proposed by Cover and Hart in 1967.
  How the KNN algorithm works:
        1. Calculate the distance between every point in the labeled data set and the current point;
        2. Sort the distances in ascending order;
        3. Select the k points closest to the current point;
        4. Count how often each class occurs among those k points;
        5. Return the most frequent class among the k points as the predicted class of the current point.
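
  The five steps above can be sketched in a few lines of Python. This is only a minimal illustration, separate from the script later in this post: the function name knn_predict, the toy feature vectors, and the labels are all invented for the example, and Euclidean distance is assumed.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, point, k):
    """Hypothetical helper: classify `point` by majority vote of its k nearest neighbors."""
    train_x = np.asarray(train_x, dtype=float)
    point = np.asarray(point, dtype=float)
    # Step 1: Euclidean distance from every labeled point to the current point
    dists = np.sqrt(((train_x - point) ** 2).sum(axis=1))
    # Steps 2-3: sort ascending and keep the indices of the k closest points
    nearest = np.argsort(dists, kind="stable")[:k]
    # Step 4: count how often each class occurs among those k points
    votes = Counter(train_y[i] for i in nearest)
    # Step 5: return the most frequent class
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only
# (features: number of funny shots, fight shots, hugging shots)
toy_x = [[40, 5, 3], [35, 8, 2], [4, 50, 1], [6, 45, 3], [3, 2, 30], [5, 4, 28]]
toy_y = ["comedy", "comedy", "action", "action", "romance", "romance"]

print(knn_predict(toy_x, toy_y, [30, 10, 4], k=3))
```

  With this toy data, the three nearest neighbors of the new point are two comedies and one romance, so the vote returns comedy.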
 
  Suppose a data set contains 12 movies, numbered 1-12, that are already classified as comedy, action, or romance. Each movie has three feature values: the number of funny shots, fight shots, and hugging shots. A new movie, "Detective Chinatown", then arrives: which of the three genres does it belong to?
 
 
The code is as follows:
import pandas as pd
import numpy as np


def distance(v1, v2):
    """
    Distance calculation
    :param v1: point 1
    :param v2: point 2
    :return: Euclidean distance between the two points
    """
    dist = np.sqrt(np.sum(np.power((v1 - v2), 2)))
    return dist


# Load the data
data = pd.read_excel("./film classification data.xlsx")
print("data:\n", data)
print("*" * 80)
# Get the training set (copied so the 'dist' column can be added without a pandas warning)
train = data.iloc[:, :6].copy()
print("train:\n", train)
# Split the training set into feature values and target values
train_x = train.iloc[:, :-1]
train_y = train.iloc[:, -1]
# Get the test sample: this script takes it from the last four column labels of the sheet
print("*" * 80)
test = data.columns[-4:]
print("test:\n", test)

# The last three labels are used as the feature values of the new movie
test_x = np.asarray(test[1:], dtype=float)

# Distance calculation
# Loop over the training set and compute each sample's distance to the test sample
for i in range(train.shape[0]):
    # Calculate the distance
    dist = distance(train_x.iloc[i, 2:5].to_numpy(dtype=float), test_x)
    train.loc[i, 'dist'] = dist

print(train)
# Sort in ascending order of distance
train.sort_values(by='dist', inplace=True)
print("*" * 80)
print("train sorted:\n", train)

# k value: different values of k can give different results
k = 5
res = train.loc[:, 'Genre'][:k].mode()[0]
print("*" * 80)
print(res)
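
The row-by-row loop above can also be written without an explicit loop, and it is easy to check how the prediction reacts to k. The sketch below is an assumption-laden illustration, not a replacement for the script: it uses a small in-memory DataFrame in place of the Excel file, and the column names (funny, fight, hug, genre), the sample values, the new_movie vector, and the predict helper are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Excel data; all values are invented for illustration
train = pd.DataFrame({
    "funny": [10, 14, 15, 16, 3, 2],
    "fight": [10, 9, 8, 12, 30, 4],
    "hug": [12, 5, 4, 3, 2, 30],
    "genre": ["romance", "comedy", "comedy", "comedy", "action", "romance"],
})
new_movie = np.array([11, 10, 9], dtype=float)  # assumed feature vector of the new movie

# Vectorized Euclidean distance: one subtraction over the whole feature matrix
features = train[["funny", "fight", "hug"]].to_numpy(dtype=float)
train["dist"] = np.sqrt(((features - new_movie) ** 2).sum(axis=1))
ranked = train.sort_values(by="dist")


def predict(ranked_frame, k):
    """Hypothetical helper: majority vote among the k nearest rows."""
    return ranked_frame["genre"].head(k).mode()[0]


for k in (1, 3, 5):
    print(k, predict(ranked, k))
```

With this toy data, k = 1 picks romance because the single closest movie is a romance, while k = 3 and k = 5 pick comedy. When two genres tie, mode() returns both in sorted order, so taking [0] selects the alphabetically first one rather than the closer one; choosing an odd k or breaking ties by distance are possible ways around that.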

 

 
 
 

Origin www.cnblogs.com/wutanghua/p/11546214.html