k-Nearest Neighbor (kNN) exercise
Making predictions
    # Now implement the function predict_labels and run the code below:
    # We use k = 1 (which is Nearest Neighbor).
    y_test_pred = classifier.predict_labels(dists, k=1)

    # Compute and print the fraction of correctly predicted examples
    num_correct = np.sum(y_test_pred == y_test)
    accuracy = float(num_correct) / num_test
    print('Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))
The prediction method being called is shown below:
    def predict_labels(self, dists, k=1):
        """
        Given a matrix of distances between test points and training points,
        predict a label for each test point.

        Inputs:
        - dists: A numpy array of shape (num_test, num_train) where dists[i, j]
          gives the distance between the ith test point and the jth training point.

        Returns:
        - y: A numpy array of shape (num_test,) containing predicted labels for the
          test data, where y[i] is the predicted label for the test point X[i].
        """
        num_test = dists.shape[0]
        y_pred = np.zeros(num_test)
        for i in range(num_test):
            # A list of length k storing the labels of the k nearest neighbors to
            # the ith test point.
            closest_y = []
            #########################################################################
            # TODO:                                                                 #
            # Use the distance matrix to find the k nearest neighbors of the ith    #
            # testing point, and use self.y_train to find the labels of these       #
            # neighbors. Store these labels in closest_y.                           #
            # Hint: Look up the function numpy.argsort.                             #
            #########################################################################
            # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
            # np.argsort returns the indices that would sort the array in ascending
            # order; slicing [0:k] keeps the indices of the k smallest distances.
            closest_y = self.y_train[np.argsort(dists[i])[0:k]]

            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
            #########################################################################
            # TODO:                                                                 #
            # Now that you have found the labels of the k nearest neighbors, you    #
            # need to find the most common label in the list closest_y of labels.   #
            # Store this label in y_pred[i]. Break ties by choosing the smaller     #
            # label.                                                                #
            #########################################################################
            # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
            # To find the most common element in a list, you can use np.bincount.
            # np.bincount(x) returns an array of length max(x) + 1 whose entry at
            # index v is the number of times v occurs in x.
            # a.argmax() returns the index of the largest element of a; on ties it
            # returns the first such index, which is the smaller label.
            y_pred[i] = np.bincount(closest_y).argmax()

            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        return y_pred
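To see how the argsort and bincount steps interact, here is a minimal sketch that runs the same loop on a tiny hand-made distance matrix. The function name predict_labels_sketch, the toy arrays, and their values are all made up for illustration and are not part of the assignment code. It assumes the labels are non-negative integers, which np.bincount requires.

```python
import numpy as np

def predict_labels_sketch(dists, y_train, k=1):
    """Standalone copy of the loop above (hypothetical helper, for illustration)."""
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test, dtype=y_train.dtype)
    for i in range(num_test):
        # Indices of the k smallest distances in row i.
        nearest = np.argsort(dists[i])[:k]
        closest_y = y_train[nearest]
        # Count the votes per label; argmax returns the first maximum,
        # so a tie goes to the smaller label.
        y_pred[i] = np.bincount(closest_y).argmax()
    return y_pred

# Toy data: 2 test points, 4 training points (values invented for this example).
toy_dists = np.array([[0.9, 0.1, 0.5, 0.3],
                      [0.2, 0.8, 0.4, 0.6]])
toy_y_train = np.array([2, 0, 1, 1])

pred_k1 = predict_labels_sketch(toy_dists, toy_y_train, k=1)  # labels 0 and 2
pred_k2 = predict_labels_sketch(toy_dists, toy_y_train, k=2)  # both rows tie; labels 0 and 1
pred_k3 = predict_labels_sketch(toy_dists, toy_y_train, k=3)  # labels 1 and 1
print(pred_k1, pred_k2, pred_k3)
```

With k=2, the first test point's neighbors carry labels 0 and 1 (one vote each), and the tie resolves to 0 because argmax picks the first maximum in the count array.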
The result is as follows:
Got 137 / 500 correct => accuracy: 0.274000
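The accuracy printed above is simply the fraction of test labels that match: 137 / 500 = 0.274. As a small sanity check of that arithmetic, the sketch below applies the same computation to a handful of made-up labels; the toy_ arrays are hypothetical and unrelated to the real test set.

```python
import numpy as np

# Hypothetical predictions and ground-truth labels, invented for this example.
toy_y_test_pred = np.array([0, 2, 1, 1, 3])
toy_y_test = np.array([0, 1, 1, 2, 3])

# Element-wise comparison gives a boolean array; summing it counts the matches.
num_correct = int(np.sum(toy_y_test_pred == toy_y_test))
accuracy = num_correct / len(toy_y_test)
print('Got %d / %d correct => accuracy: %f' % (num_correct, len(toy_y_test), accuracy))
# prints: Got 3 / 5 correct => accuracy: 0.600000
```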