A kNN implementation, plus some useful numpy functions

1. Finding the k smallest elements of a numpy array:

np.argpartition(arr, k) runs a single fast partitioning step (as in quicksort). It does not modify the original array; instead it returns an index array such that, when arr is indexed with it, the element at position k is in its final sorted place and every element before position k is <= it. For example:

      > arr = np.array([1, 3, 5, 4, 6, 2, 8])

      > np.argpartition(arr, 3)

      > [0, 5, 1, 3, 2, 4, 6]   (one possible result; the order within each side of position 3 is unspecified)

If you want the indices of the k smallest elements, slice the result with [0:k]; for the k largest, call np.argpartition(arr, -k) and slice with [-k:].
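A minimal sketch of both slicing tricks (checking membership with sorted(), since the order inside each slice is not guaranteed):

```python
import numpy as np

arr = np.array([1, 3, 5, 4, 6, 2, 8])

# Indices of the 3 smallest values: partition around position 3, take the left side
smallest_idx = np.argpartition(arr, 3)[:3]

# Indices of the 3 largest values: partition around position -3, take the right side
largest_idx = np.argpartition(arr, -3)[-3:]

print(sorted(arr[smallest_idx]))  # [1, 2, 3]
print(sorted(arr[largest_idx]))   # [5, 6, 8]
```

This costs O(n) per call, versus O(n log n) for a full np.argsort, which is why it is the right tool inside a kNN loop.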

2. Finding the most frequent element of a numpy array

np.bincount(arr) counts, for every integer in the range [0, arr.max()], how many times it occurs in the array (the input must be non-negative integers). For example:

       > arr = np.array([1, 2, 1, 3, 4, 2])

       > np.bincount(arr)

       > [0, 2, 2, 1, 1]

So np.argmax(np.bincount(arr)) gives the value that occurs most often (on a tie, the smallest such value, since argmax returns the first maximum).
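Putting the two calls together, a quick majority-vote check:

```python
import numpy as np

arr = np.array([1, 2, 1, 3, 4, 2])

counts = np.bincount(arr)   # counts[v] = number of occurrences of value v
mode = np.argmax(counts)    # first value with the highest count

print(counts.tolist())  # [0, 2, 2, 1, 1]
print(mode)             # 1 (values 1 and 2 tie with 2 occurrences; argmax picks the first)
```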

3. A kNN implementation in Python:

import numpy as np

class kNN:
    def __init__(self):
        self.x_train = []
        self.y_train = []

    def train(self, X, Y):
        # kNN has no real training step: just memorise the data
        self.x_train = np.array(X)
        self.y_train = np.array(Y)
        return self

    def predict(self, x_test, k=3, regularization='L1'):
        # 'regularization' actually selects the distance metric (L1 or L2)
        num_test = x_test.shape[0]
        y_predict = np.zeros(num_test, dtype=self.y_train.dtype)
        for i in range(num_test):
            if regularization == 'L2':
                # squared Euclidean distance (skipping the root does not change the ranking)
                distance = np.sum(np.square(self.x_train - x_test[i, :]), axis=1)
            else:
                # L1 (Manhattan) distance, also the fallback for unknown values
                distance = np.sum(np.abs(self.x_train - x_test[i, :]), axis=1)
            # indices of the k nearest training points
            nearest_idx = np.argpartition(distance, k - 1)[0:k]
            votes = self.y_train[nearest_idx]
            # majority vote; labels must be non-negative integers for bincount
            y_predict[i] = np.argmax(np.bincount(votes))
        return y_predict

    def accuracy(self, y_predict, y_test):
        return np.sum(y_predict == y_test) / y_predict.shape[0]
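The core of predict() can be checked in isolation on a toy 2-D dataset (the points, labels, and query below are made up for illustration, not MNIST data):

```python
import numpy as np

# Toy training set: two clusters, labelled 0 and 1
x_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])

x = np.array([0.15, 0.1])   # query point near the first cluster
k = 3

# Squared L2 distance to every training point
distance = np.sum(np.square(x_train - x), axis=1)
# Indices of the k nearest neighbours via argpartition
nearest_idx = np.argpartition(distance, k - 1)[:k]
# Majority vote among their labels via bincount + argmax
votes = y_train[nearest_idx]
prediction = np.argmax(np.bincount(votes))

print(prediction)  # 0
```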

The data set used is MNIST handwritten digit recognition: http://yann.lecun.com/exdb/mnist/

Cross-validation on the training data was used to tune the hyperparameters; the best setting found was k = 3 with the L2 distance.
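The hyperparameter search can be sketched as a plain k-fold loop. The 5-fold split, the candidate k values, and the toy clusters below are assumptions for illustration, not the author's exact setup:

```python
import numpy as np

np.random.seed(0)

def cross_validate(x, y, candidate_ks, n_folds=5):
    """Return the candidate k with the best mean validation accuracy (L2 distance)."""
    folds = np.array_split(np.random.permutation(len(x)), n_folds)
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        fold_accs = []
        for val_idx in folds:
            # Everything outside the validation fold is training data
            train_mask = np.ones(len(x), dtype=bool)
            train_mask[val_idx] = False
            xt, yt = x[train_mask], y[train_mask]
            correct = 0
            for xv, yv in zip(x[val_idx], y[val_idx]):
                d = np.sum(np.square(xt - xv), axis=1)      # squared L2 distance
                votes = yt[np.argpartition(d, k - 1)[:k]]   # labels of k nearest
                correct += int(np.argmax(np.bincount(votes)) == yv)
            fold_accs.append(correct / len(val_idx))
        mean_acc = np.mean(fold_accs)
        if mean_acc > best_acc:
            best_k, best_acc = k, mean_acc
    return best_k

# Two well-separated toy clusters as a stand-in for real data
x = np.vstack([np.random.normal(0, 0.3, (20, 2)),
               np.random.normal(5, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
best_k = cross_validate(x, y, candidate_ks=[1, 3, 5])
```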

The final test accuracy was 97.05% (an error rate of 2.95%), 0.14 percentage points lower in error than the kNN 'L2' result reported on the MNIST homepage.


Origin www.cnblogs.com/rarecu/p/11546894.html