- Why discuss cosine similarity?
During our ImageNet experiments, we found that Euclidean distance led to poor performance when computing margins for the EVM. This is consistent with the previous finding that Euclidean distance does not work well when comparing deep features of individual samples.
Reference:
C. C. Aggarwal, A. Hinneburg, and D. A. Keim, “On the surprising behavior of distance metrics in high dimensional space,” in Proc. Int. Conf. Database Theory, 2001, pp. 420–434.
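A minimal sketch of the high-dimensional effect behind this finding (our own illustration, not code from the EVM work or from Aggarwal et al.). It assumes uniformly random data in [0, 1]^d and measures the ratio (D_max − D_min) / D_min, often called relative contrast, of Euclidean distances from one random query point. As the dimension grows, the nearest and farthest points become almost equally far away, so raw Euclidean distance discriminates less. The helper `relative_contrast` is hypothetical.

```python
import numpy as np

def relative_contrast(dim, n_points=1000, seed=0):
    """(D_max - D_min) / D_min of the Euclidean distances from one random
    query point to n_points uniform random points in [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    data = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(data - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# Relative contrast shrinks as the dimension grows (distance concentration).
for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

Real deep features are not uniform, so the exact numbers differ; the sketch only shows the trend.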
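```python
# Computing cosine similarity (as cosine distance = 1 - cosine similarity)
import numpy as np
from scipy.spatial.distance import pdist

# Construct two random 10-dimensional vectors x and y
x = np.random.random(10)
y = np.random.random(10)

# Method 1: cosine distance computed directly with NumPy
dist1 = 1 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Method 2: SciPy's pdist with the 'cosine' metric gives the same cosine distance
# (returned as a 1-element condensed distance array for two input rows)
dist2 = pdist(np.vstack([x, y]), 'cosine')

print('x', x)
print('y', y)
print('dist1', dist1)
print('dist2', dist2)
```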
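When many feature vectors must be compared at once (for example a batch of deep features), the same cosine distance can be computed as a full matrix by L2-normalising each row and taking inner products. A minimal sketch, assuming no row is all zeros (its norm would be 0 and the division undefined); the helper name `cosine_distance_matrix` is hypothetical, and the result is checked against SciPy's `pdist`/`squareform`.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def cosine_distance_matrix(X):
    """Pairwise cosine distances (1 - cosine similarity) between the rows of X."""
    X = np.asarray(X, dtype=float)
    # Scale every row to unit L2 norm; their dot products are then cosines.
    X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = X_unit @ X_unit.T
    # Clip floating-point overshoot so similarities stay within [-1, 1].
    return 1.0 - np.clip(sim, -1.0, 1.0)

rng = np.random.default_rng(0)
X = rng.random((5, 10))  # 5 random 10-dimensional vectors
D = cosine_distance_matrix(X)
print(np.allclose(D, squareform(pdist(X, 'cosine'))))  # True
```

Because every row is normalised first, multiplying a vector by a positive constant leaves its cosine distances unchanged; only direction matters, not length.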
- Cosine similarity, illustrated
The figure is taken from the following reference:
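J. Liu and S. Li, “基于余弦相似度的因子分析在食品成分检测中的应用 (Application of cosine-similarity-based factor analysis in food component detection),” Food Science (食品科学), vol. 26, no. 6, 2005.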
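The geometry in the illustration corresponds to the standard definitions below, written with the same x and y as in the code; d_cos is the quantity computed as dist1 and dist2 above. The second line is a textbook identity relating it to Euclidean distance between unit vectors.

```latex
\begin{aligned}
\cos\theta &= \frac{x^{\top} y}{\lVert x\rVert_2\,\lVert y\rVert_2},
\qquad d_{\cos}(x, y) = 1 - \cos\theta \in [0, 2],\\
\lVert \hat{x} - \hat{y}\rVert_2^2 &= \lVert \hat{x}\rVert_2^2 + \lVert \hat{y}\rVert_2^2 - 2\,\hat{x}^{\top}\hat{y}
= 2\,(1 - \cos\theta),
\qquad \hat{x} = \frac{x}{\lVert x\rVert_2},\ \hat{y} = \frac{y}{\lVert y\rVert_2}.
\end{aligned}
```

So on L2-normalised vectors, Euclidean distance and cosine distance rank pairs in the same order; what cosine similarity discards relative to raw Euclidean distance is the vector length.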