kNN program
http://cs231n.github.io/classification/
L1 distance
$d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|$
def predict(self, X):
    """ X is N x D where each row is an example we wish to predict label for """
    num_test = X.shape[0]
    # let's make sure that the output type matches the input type
    Ypred = np.zeros(num_test, dtype=self.ytr.dtype)

    # loop over all test rows (range replaces the Python 2 xrange)
    for i in range(num_test):
        # find the nearest training image to the i'th test image (num_test is 10000 here)
        # using the L1 distance (sum of absolute value differences);
        # X[i,:] is one row, i.e. one of the 10000 test images, with 3072 values
        distances = np.sum(np.abs(self.Xtr - X[i,:]), axis=1)
        # self.Xtr is 50000 x 3072 and X[i,:] has 3072 entries, so broadcasting subtracts
        # X[i,:] from every row; distances has shape (50000,), and argmin locates its minimum
        min_index = np.argmin(distances) # get the index with smallest distance
        Ypred[i] = self.ytr[min_index] # predict the label of the nearest example

    return Ypred
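To try the predict method on its own, here is a minimal sketch that wraps it in a class. The train method (which only stores the data in self.Xtr and self.ytr) and the toy 2-D data are assumptions added for illustration; they are not taken from the notes above.

```python
import numpy as np

class NearestNeighbor:
    """Minimal 1-nearest-neighbor classifier using the L1 distance."""

    def train(self, X, y):
        # assumed helper: the classifier simply memorizes the training data
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        num_test = X.shape[0]
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # L1 distance from test row i to every training row
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            Ypred[i] = self.ytr[np.argmin(distances)]
        return Ypred

# toy data, made up for illustration: 3 training points in 2-D
Xtr = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
ytr = np.array([0, 1, 2])
Xte = np.array([[1.0, 1.0], [9.0, 12.0], [18.0, 1.0]])

nn = NearestNeighbor()
nn.train(Xtr, ytr)
pred = nn.predict(Xte)
print(pred)  # [0 1 2]
```

Each test point lies closest (in L1 distance) to a different training point, so the predicted labels are 0, 1, 2 in order.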
2018.5.11
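def compute_distances_no_loops(self, X):
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    # squared norm of every test row, shape (num_test,)
    test_sum = np.sum(np.square(X), axis=1)
    # squared norm of every training row, shape (num_train,)
    train_sum = np.sum(np.square(self.X_train), axis=1)
    # dot product of every test row with every training row, shape (num_test, num_train)
    inner_product = np.dot(X, self.X_train.T)
    # ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, assembled by broadcasting
    dists = np.sqrt(-2 * inner_product + test_sum.reshape(-1, 1) + train_sum)
    return dists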
For computing the distances without loops, see:
https://blog.csdn.net/zhyh1435589631/article/details/54236643
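This works because of broadcasting. The goal is an M*N result, while test_sum has M entries and train_sum has N entries. It is enough to turn test_sum into an M*1 column (test_sum.reshape(-1,1); a plain transpose does nothing to a 1-D array). Nothing else needs to change, and the sum broadcasts to an M*N matrix.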
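The no-loop formula relies on the identity ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2. The sketch below checks the broadcast version against an explicit double loop on small random data. The array sizes, the random seed, and the np.maximum clip (a guard against tiny negative values from rounding) are assumptions added here, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)
X_test = rng.standard_normal((4, 5))    # M = 4 test points, D = 5
X_train = rng.standard_normal((6, 5))   # N = 6 training points

test_sum = np.sum(np.square(X_test), axis=1)     # shape (4,)
train_sum = np.sum(np.square(X_train), axis=1)   # shape (6,)
inner_product = np.dot(X_test, X_train.T)        # shape (4, 6)

# (4, 1) column + (6,) row broadcasts to (4, 6)
sq = -2 * inner_product + test_sum.reshape(-1, 1) + train_sum
# clip tiny negative values caused by rounding before taking the square root
dists = np.sqrt(np.maximum(sq, 0.0))

# reference: explicit double loop over all (test, train) pairs
dists_loop = np.zeros((4, 6))
for i in range(4):
    for j in range(6):
        dists_loop[i, j] = np.sqrt(np.sum((X_test[i] - X_train[j]) ** 2))

print(dists.shape)                      # (4, 6)
print(np.allclose(dists, dists_loop))   # True
```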
predict_labels
def predict_labels(self, dists, k=1):
https://blog.csdn.net/guangtishai4957/article/details/79950117
How the second-to-last line of predict_labels, y_pred[i] = np.argmax(np.bincount(closest_y)), works:
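import numpy as np
# how bincount works
x = np.array([0, 1, 1, 3, 3, 3, 3, 5])
# bincount returns an array whose indices correspond to the values in x.
# Here the largest value in x is 5, so the returned array has indices 0 to 5,
# and the entry at each index is the number of times that index occurs in x.
y = np.bincount(x)
print(y)
# output: [1 2 0 4 0 1]
# np.argmax returns the index of the largest entry of its argument; the largest entry of y is 4, at index 3, so the result is 3.
# Together, the two functions pick out the class that occurs most often among several classes.
z = np.argmax(y)
print(z)
# output: 3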
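Putting the bincount/argmax vote into context, here is a sketch of how a predict_labels function could look. It is written as a standalone function that takes y_train as an argument (the original is a method taking only dists and k), and the np.argsort step for choosing the k nearest neighbors and the toy data are assumptions for illustration.

```python
import numpy as np

def predict_labels(dists, y_train, k=1):
    """dists is num_test x num_train; returns one predicted label per test row."""
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test, dtype=y_train.dtype)
    for i in range(num_test):
        # indices of the k smallest distances in row i
        nearest = np.argsort(dists[i])[:k]
        closest_y = y_train[nearest]
        # majority vote; on ties argmax returns the smallest label
        y_pred[i] = np.argmax(np.bincount(closest_y))
    return y_pred

# toy data, made up for illustration: 2 test points, 4 training points
dists = np.array([[0.1, 0.9, 0.2, 0.8],
                  [0.9, 0.1, 0.8, 0.2]])
y_train = np.array([2, 0, 2, 0])

pred = predict_labels(dists, y_train, k=3)
print(pred)  # [2 0]
```

For the first test row the three nearest labels are 2, 2, 0, so the vote gives 2; for the second they are 0, 0, 2, so the vote gives 0. Note that np.bincount only accepts non-negative integers, so the labels must be encoded that way.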