1. Finding the smallest k numbers in a NumPy array:
np.argpartition(arr, k) performs only the partitioning step of quickselect, which is fast. It does not modify the original array; it returns the index order of the array after partitioning, with the guarantee that the elements at the indices before position k are no larger than the element at position k. For example:
> arr = np.array([1, 3, 5, 4, 6, 2, 8])
> idx = np.argpartition(arr, 3)
> arr[idx[3]]   # 4, the element that belongs at sorted position 3
> arr[idx[:3]]  # the three smallest values {1, 2, 3}, in no guaranteed order
To get the smallest k values, take the slice [0:k] of the result; for the largest k, call np.argpartition(arr, -k) and take the slice [-k:].
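Both slices can be sketched in one runnable snippet (the sample array and variable names are illustrative):

```python
import numpy as np

arr = np.array([1, 3, 5, 4, 6, 2, 8])
k = 3

# k smallest values: indices before position k all point to values
# no larger than the k-th smallest (order within the slice is arbitrary)
smallest = arr[np.argpartition(arr, k)[:k]]   # {1, 2, 3} in some order

# k largest values: partition around position -k instead
largest = arr[np.argpartition(arr, -k)[-k:]]  # {5, 6, 8} in some order
```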
2. Finding the most frequent element in a NumPy array:
np.bincount(arr) counts the number of occurrences, within the array, of every integer in the range [0, max(arr)]. For example:
> arr = np.array([1, 2, 1, 3, 4, 2])
> np.bincount(arr)
> [0, 2, 2, 1, 1]
Combining the two, np.argmax(np.bincount(arr)) returns the element value with the most occurrences.
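A small sketch of the combined idiom (the sample array is illustrative); note the tie-breaking behaviour:

```python
import numpy as np

arr = np.array([1, 2, 1, 3, 4, 2, 1])
counts = np.bincount(arr)        # [0, 3, 2, 1, 1]: counts of 0, 1, 2, 3, 4
most_common = np.argmax(counts)  # 1, since the value 1 occurs three times
# Note: on a tie, np.argmax returns the first (i.e. smallest) such value.
```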
3. kNN implementation in Python:
import numpy as np

class kNN:
    def __init__(self):
        self.x_train = []
        self.y_train = []

    def train(self, X, Y):
        # kNN has no real training phase: it simply memorizes the data
        self.x_train = np.array(X)
        self.y_train = np.array(Y)
        return self

    def predict(self, x_test, k=3, regularization='L1'):
        # 'regularization' selects the distance metric:
        # 'L1' is the Manhattan distance, 'L2' the squared Euclidean distance
        num_test = x_test.shape[0]
        y_predict = np.zeros(num_test, dtype=self.y_train.dtype)
        for i in range(num_test):
            if regularization == 'L2':
                distance = np.sum(np.square(self.x_train - x_test[i, :]), axis=1)
            else:  # 'L1' and any unrecognized value fall back to L1
                distance = np.sum(np.abs(self.x_train - x_test[i, :]), axis=1)
            # indices of the k nearest training samples (unordered)
            nearest_idx = np.argpartition(distance, k - 1)[:k]
            votes = self.y_train[nearest_idx]
            # majority vote; labels must be non-negative integers for bincount
            y_predict[i] = np.argmax(np.bincount(votes))
        return y_predict

    def accuracy(self, y_predict, y_test):
        return np.sum(y_predict == y_test) / y_predict.shape[0]
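The same nearest-neighbour logic can be condensed into a self-contained functional sketch for a single test point (the toy data and the name knn_predict are illustrative, not part of the original code):

```python
import numpy as np

def knn_predict(x_train, y_train, x_test, k=3):
    # squared Euclidean (L2) distance to every training sample
    distance = np.sum(np.square(x_train - x_test), axis=1)
    nearest_idx = np.argpartition(distance, k - 1)[:k]   # k nearest, unordered
    return np.argmax(np.bincount(y_train[nearest_idx]))  # majority vote

# toy data: two well-separated clusters labeled 0 and 1
x_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

knn_predict(x_train, y_train, np.array([0.3, 0.2]))  # -> 0
knn_predict(x_train, y_train, np.array([5.4, 5.6]))  # -> 1
```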
The dataset used is the MNIST handwritten digit recognition set: http://yann.lecun.com/exdb/mnist/
Cross-validation on the training data was used to tune the hyperparameters; the best setting found was k = 3 with the L2 distance.
The final test accuracy was 97.05% (an error rate of 2.95%), 0.14 percentage points better than the kNN (L2) error rate reported on the MNIST homepage.