K-nearest neighbor algorithm

This article practices implementing the K-nearest neighbor (KNN) algorithm in code.

Algorithm idea:
Compute the distance from the test sample to every training sample and take the labels of the k training samples closest to it. The test sample is assigned to whichever class occurs most often among those k labels.

Advantages: simple idea; no training process required; suitable for multi-class problems

Disadvantage: when there are many training samples, the computational cost is high, because every prediction computes the distance to all of them.
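
One way to soften this cost, sketched below under the assumption that only the k nearest samples are needed, is to select them with a heap instead of sorting every distance. The helper name k_nearest_labels is hypothetical, and every distance is still computed, so this only avoids the full sort.

```python
import heapq
import math

def k_nearest_labels(t, x, y, k):
    """Return the labels of the k training samples nearest to t (hypothetical helper)."""
    # One distance per training sample: this part stays linear in the training-set size.
    pairs = ((math.dist(t, sample), lab) for sample, lab in zip(x, y))
    # nsmallest keeps only k candidates instead of sorting all distances.
    nearest = heapq.nsmallest(k, pairs, key=lambda pair: pair[0])
    return [lab for _, lab in nearest]

train = [[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]]
label = [-1, -1, 1, 1]
print(k_nearest_labels([0.1, 0.2], train, label, 2))
```

The labels come back ordered from nearest to farthest, so they can be passed straight to a majority vote.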

Steps:
1. Calculate the Euclidean distance from the test sample to each training sample
2. Sort the distances to obtain the labels of the nearest k samples
3. The label that appears most often among those k labels is the label of the test sample
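
The three steps above can be sketched in a few lines of pure Python. This is a minimal illustration, not the implementation used later in the article; the function name knn_predict is hypothetical and the data is the same toy set used in the examples further down.

```python
import math
from collections import Counter

def knn_predict(t, x, y, k):
    """Classify test sample t by majority vote of its k nearest training samples."""
    # Step 1: Euclidean distance from t to each training sample.
    distances = [math.dist(t, sample) for sample in x]
    # Step 2: indices sorted by ascending distance; keep the k nearest labels.
    order = sorted(range(len(x)), key=lambda i: distances[i])
    k_labels = [y[i] for i in order[:k]]
    # Step 3: the most frequent label among the k wins.
    return Counter(k_labels).most_common(1)[0][0]

train = [[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]]
label = [-1, -1, 1, 1]
print(knn_predict([1.0, 0.9], train, label, 3))
```

When two labels are equally frequent, most_common returns the one encountered first, which here is the label of the nearer sample.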

Implementation with the scikit-learn machine learning library

from sklearn import neighbors

#Training set: a custom two-dimensional list, with one label per sample
train = [[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]]
label = [-1,-1,1,1]

#Create the classifier, using the 2 nearest neighbors
knn = neighbors.KNeighborsClassifier(n_neighbors=2)

knn.fit(train, label)

print( knn.predict([[1.0,0.9]]) )
#Return the predicted probability of each class for the test sample
print( knn.predict_proba([[1.0,0.9]]) )
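
To make the second printed line easier to read: with the default uniform weights, predict_proba reports, for each class, the fraction of the k nearest neighbors that carry that label, with columns in sorted label order (the order stored in knn.classes_). The pure-Python sketch below reproduces that calculation for the sample above. The helper name neighbor_fractions is hypothetical, and this is not scikit-learn's actual implementation.

```python
import math

def neighbor_fractions(t, x, y, k):
    """Fraction of the k nearest neighbors of t per class, in sorted class order."""
    classes = sorted(set(y))
    # Indices of the training samples, nearest first.
    order = sorted(range(len(x)), key=lambda i: math.dist(t, x[i]))
    k_labels = [y[i] for i in order[:k]]
    return [k_labels.count(c) / k for c in classes]

train = [[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]]
label = [-1, -1, 1, 1]
print(neighbor_fractions([1.0, 0.9], train, label, 2))  # [1.0, 0.0]
```

Both nearest neighbors of [1.0, 0.9] carry the label -1, so the whole probability mass falls on the first column.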

Python implementation with NumPy

import numpy as np

#Build a custom training set (features plus a label column) and save it to a txt file
Train = np.array([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]])
Label = np.array([[-1],[-1],[1],[1]])
TrainSet = np.concatenate((Train,Label),axis=1)
print( TrainSet.shape )
np.savetxt("KNNTrainSet.txt",TrainSet)

#Build a custom test set and save it to a txt file
TestSet = np.array([[1.0,0.9],[0.1,0.2]])
np.savetxt("KNNTestSet.txt",TestSet)

#Load the training data and the test data back from the txt files
#Columns 0 and 1 hold the features, column 2 holds the label
train = np.loadtxt('KNNTrainSet.txt',usecols=(0,1))
print( train.shape )
label = np.loadtxt('KNNTrainSet.txt',usecols=2)
test = np.loadtxt('KNNTestSet.txt',usecols=(0,1))

#Build the KNN classifier
#t: one test sample, x: training samples, y: training labels (-1 or 1), k: number of neighbors
def Classifier(t,x,y,k):
    #Euclidean distance from the test sample to every training sample (broadcasting)
    diff = x - t
    Distance = np.sqrt((diff ** 2).sum(axis=1))
    print('Euclidean distance:', Distance)
    #Indices of the training samples, sorted by ascending distance
    sortedIndex = Distance.argsort()
    print('Indices sorted by ascending distance:', sortedIndex)
    #Labels of the k nearest training samples
    KLabel = y[sortedIndex[:k]]
    print('Labels of the k nearest samples:', KLabel)
    #Count how often each class occurs among the k labels
    class1 = np.sum(KLabel == -1)
    class2 = np.sum(KLabel == 1)
    print("Number of samples per class: ",class1," ",class2)
    #Majority vote; a tie falls through to class 1
    if(class1>class2):
        result=-1
    else:
        result=1
    return result

#Call the classifier to classify the test data set
testnum = test.shape[0]
for i in range(testnum):
    value=Classifier(test[i],train,label,2)
    print("The classification result of sample {0} is {1}".format(i,value))
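
A note on the choice of k: the call above uses k=2, so the two nearest labels can disagree, and the else branch of Classifier then returns 1. The sketch below isolates that voting rule (the function name vote is hypothetical) to show the tie. Choosing an odd k for a two-class problem is a common way to rule it out.

```python
def vote(k_labels):
    """Same rule as the classifier: -1 only on a strict majority of -1 labels."""
    class1 = sum(1 for lab in k_labels if lab == -1)
    class2 = sum(1 for lab in k_labels if lab == 1)
    return -1 if class1 > class2 else 1

print(vote([-1, 1]))      # tie with k=2: falls through to 1
print(vote([-1, 1, -1]))  # k=3 cannot tie between two classes: -1
```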

References:

1. scikit-learn

2. "Machine Learning in Action"

