KNN (K-nearest neighbor) algorithm theory
KNN stands for K-nearest neighbor. The algorithm was proposed by Cover and Hart in 1968.
The KNN algorithm works as follows:
1. Calculate the distance between the current point and every point in the data set whose class is known;
2. Sort the distances in ascending order;
3. Select the k points with the smallest distance to the current point;
4. Determine how often each class occurs among those k points;
5. Return the class that occurs most frequently among the k points as the predicted class of the current point.
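The five steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the code used later in this post: the function name knn_predict, the toy points, and the labels "A" and "B" are all made up for the example.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    train_x = np.asarray(train_x, dtype=float)
    query = np.asarray(query, dtype=float)
    # Step 1: Euclidean distance from every training point to the query.
    dists = np.sqrt(((train_x - query) ** 2).sum(axis=1))
    # Steps 2-3: sort ascending and keep the indices of the k closest points.
    nearest = np.argsort(dists)[:k]
    # Step 4: count how often each class occurs among those k points.
    votes = Counter(train_y[i] for i in nearest)
    # Step 5: return the most frequent class.
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
points = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, [2, 2], k=3))
print(knn_predict(points, labels, [7, 9], k=3))
```

With these toy points the first query lands in the "A" cluster and the second in the "B" cluster. Choosing an odd k is a common way to reduce ties when there are two classes.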
Suppose the data set contains 12 films, numbered 1-12, whose genre is already known: comedy, action, or romance. The features are the number of funny shots, fight shots, and hugging shots in each film. Now a new movie, "Chinatown Holmes", arrives. Which of the three genres does it belong to?
The code is as follows:
import numpy as np
import pandas as pd


def distance(v1, v2):
    """
    Euclidean distance between two points.
    :param v1: point 1
    :param v2: point 2
    :return: distance
    """
    # Convert to float arrays so label alignment and object dtypes cannot interfere
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    dist = np.sqrt(np.sum(np.power(v1 - v2, 2)))
    return dist


# Load the data (reading .xlsx files needs an Excel engine such as openpyxl)
data = pd.read_excel("./film classification data.xlsx")
print("data:\n", data)
print("*" * 80)

# Get the training set: the first six columns.
# .copy() avoids pandas' SettingWithCopyWarning when the 'dist' column is added below.
train = data.iloc[:, :6].copy()
print("train:\n", train)

# Get the feature values and the target values of the training set
train_x = train.iloc[:, :-1]
train_y = train.iloc[:, -1]

# Get the test sample: this code reads the new movie from the labels of the
# last four columns; the first of those labels is skipped below
# (presumably the movie title).
print("*" * 80)
test = data.columns[-4:]
print("test:\n", test)

# Loop over the training set and compute the distance from each sample to the test sample
for i in range(train.shape[0]):
    # Columns 2-4 of train_x are used as the three feature values
    dist = distance(train_x.iloc[i, 2:5], test[1:])
    train.loc[i, 'dist'] = dist
print(train)

# Sort in ascending order of distance
train.sort_values(by='dist', inplace=True)
print("*" * 80)
print("train sorted:\n", train)

# Choose k; different values of k can give different results
k = 5
# Most frequent genre among the k nearest films
res = train.loc[:, 'Genre'][:k].mode()[0]
print("*" * 80)
print(res)
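The remark that different values of k can give different results is easy to check on a small table. The sketch below does not use the spreadsheet from this post: the column names funny, fight, hug, every shot count, and the new_movie vector are hypothetical values invented for illustration. It also computes all distances in one vectorized step instead of a loop.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, NOT the spreadsheet used in this post.
train = pd.DataFrame({
    "funny": [40, 45, 38, 5, 6, 4, 8],
    "fight": [5, 3, 4, 50, 55, 45, 6],
    "hug": [4, 6, 3, 2, 1, 3, 40],
    "Genre": ["comedy", "comedy", "comedy",
              "action", "action", "action", "romance"],
})
new_movie = np.array([20, 10, 25], dtype=float)  # hypothetical shot counts

# Vectorized Euclidean distance from every film to the new movie.
features = train[["funny", "fight", "hug"]].to_numpy(dtype=float)
train["dist"] = np.sqrt(((features - new_movie) ** 2).sum(axis=1))

# Rank the films from nearest to farthest.
ranked = train.sort_values(by="dist")


def predict(k):
    """Most frequent genre among the k nearest films."""
    return ranked["Genre"].head(k).mode()[0]


for k in (1, 3, 5):
    print(k, predict(k))
```

With these made-up numbers the single nearest film is the romance, so k = 1 predicts romance, while k = 3 and k = 5 both predict comedy because the next three neighbors are comedies.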