A First Look at Scientific Computing in Python: Cosine Similarity

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/xiaoyw/article/details/80048316

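  SciPy is a well-known open-source scientific computing library for Python, built on top of NumPy. It adds numerical integration, optimization, statistics, and a number of special functions, as well as modules for linear algebra, numerical solution of ordinary differential equations, signal processing, image processing, sparse matrices, and more.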

Installing the SciPy scientific computing package

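  Installing SciPy on Windows with pip install reportedly fails (according to online sources), so a third-party build is needed instead: download the ".whl" packages from Unofficial Windows Binaries for Python Extension Packages and install from those (make sure the wheel library is installed in addition to pip). Package address: https://www.lfd.uci.edu/~gohlke/pythonlibs/ . Note that SciPy depends on numpy+mkl, so numpy+mkl must be installed before scipy. Even if NumPy is already installed, find the numpy+mkl whl on that page, download it locally, and uninstall the previously installed NumPy.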

  The steps are recorded below.

  List the installed packages to check whether numpy is already installed.

D:\Python\Python36\Tools>pip list
cycler (0.10.0)
kiwisolver (1.0.1)
matplotlib (2.2.2)
numpy (1.14.2)
pip (9.0.1)
pyparsing (2.2.0)
python-dateutil (2.7.2)
pytz (2018.4)
setuptools (28.8.0)
six (1.11.0)

  Uninstall the existing numpy.

D:\Python\Python36\Tools>pip uninstall numpy
# Install numpy+mkl
D:\Python\Python36\Tools>pip install d:\Python\numpy-1.14.2+mkl-cp36-cp36m-win_amd64.whl

  Install numpy+mkl; on Windows this build is required. Then install the scipy package. Note: before installing, replace "cp36m" in the scipy file name with "none".
  

D:\Python\Python36\Tools>pip install d:\Python\numpy-1.14.2+mkl-cp36-cp36m-win_amd64.whl
Processing d:\python\numpy-1.14.2+mkl-cp36-cp36m-win_amd64.whl
Installing collected packages: numpy
Successfully installed numpy-1.14.2+mkl

D:\Python\Python36\Tools>pip install d:\Python\scipy-1.0.1-cp36-none-win_amd64.whl
Processing d:\python\scipy-1.0.1-cp36-none-win_amd64.whl
Requirement already satisfied: numpy>=1.8.2 in d:\python\python36\lib\site-packages (from scipy==1.0.1)
Installing collected packages: scipy
Successfully installed scipy-1.0.1
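
  After the installation, a quick import check can confirm that both packages are visible to the interpreter. The snippet below is only a minimal sketch; the helper name installed_version is made up for illustration and is not part of pip, NumPy, or SciPy.

```python
import importlib


def installed_version(name):
    """Return the package's __version__ string, "unknown" if it has none,
    or None if the package cannot be imported. (Hypothetical helper.)"""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


# Print what the current interpreter can see for the two packages.
for package in ("numpy", "scipy"):
    print(package, installed_version(package))
```

  If either line prints None, the corresponding wheel was probably installed into a different Python environment than the one running the script.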

Computing the cosine of the angle between two vectors

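  By definition, for any two points A(x1, y1) and B(x2, y2) in the plane, the vector AB = (x2 - x1, y2 - y1); that is, the coordinates of a vector equal the coordinates of the end point of its directed line segment minus the coordinates of the start point.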
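
  The definition can be sketched in a few lines of plain Python. The function names vector_between and angle_cosine are illustrative choices, not taken from any library; the cosine is the dot product divided by the product of the two lengths.

```python
import math


def vector_between(a, b):
    """Vector AB for 2-D points A and B: end point minus start point."""
    return (b[0] - a[0], b[1] - a[1])


def angle_cosine(u, v):
    """Cosine of the angle between two non-zero 2-D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    lengths = math.hypot(u[0], u[1]) * math.hypot(v[0], v[1])
    if lengths == 0:
        raise ValueError("the angle is undefined for a zero-length vector")
    return dot / lengths


ab = vector_between((1.0, 1.0), (4.0, 5.0))   # AB = (3.0, 4.0)
print(ab, angle_cosine(ab, (1.0, 0.0)))
```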

Computing vector cosine similarity

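  In geometry, the cosine of the angle between two vectors measures how much their directions differ; machine learning borrows this concept to measure the difference between sample vectors.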

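  The cosine ranges over [-1, 1]. Once the angle between two vectors is known, the cosine of that angle can be used to characterize how similar the vectors are. The smaller the angle (the closer to 0 degrees), the closer the cosine is to 1, the more closely the directions agree, and the more similar the vectors. When the two vectors point in exactly opposite directions, the cosine takes its minimum value, -1. When the cosine is 0, the vectors are orthogonal and the angle is 90 degrees. Cosine similarity therefore does not depend on the magnitudes of the vectors, only on their directions.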
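
  These properties are easy to check numerically. The following is a small sketch with NumPy; the function name cosine_similarity and the sample vector are chosen here only for illustration.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


a = np.array([1.0, 2.0])

print(cosine_similarity(a, 10 * a))       # same direction, larger magnitude: close to 1
print(cosine_similarity(a, -a))           # opposite direction: close to -1
print(cosine_similarity(a, [-2.0, 1.0]))  # orthogonal: close to 0
```

  Scaling a by 10 leaves the result unchanged up to rounding, which is the magnitude independence described above.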

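  Because the slope of the line joining consecutive discrete points can be infinite, angles and slopes are converted into the cosine of the angle between vectors, which makes similarity easier to compare.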
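
  The slope problem shows up in the very first points of the x2/y2 data used in this article's reference code: the first two points share the same x value, so the slope of that segment is undefined, while the cosine of the turning angle is still an ordinary number. The sketch below illustrates this; slope and turn_cosine are hypothetical helper names.

```python
import math


def slope(p, q):
    """Slope of segment PQ; raises ZeroDivisionError for a vertical segment."""
    return (q[1] - p[1]) / (q[0] - p[0])


def turn_cosine(p0, p1, p2):
    """Cosine of the angle between segment p0->p1 and segment p1->p2."""
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    return (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))


# First three points of the x2/y2 series from the reference code.
points = [(0.00, 35.01), (0.00, 35.30), (0.01, 35.32)]

try:
    slope(points[0], points[1])
except ZeroDivisionError:
    print("slope of the first segment is undefined (vertical segment)")

print(turn_cosine(*points))  # finite value in [-1, 1]
```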

  Reference code:

import matplotlib.pyplot as plt
import math
import numpy as np
from scipy.spatial.distance import pdist


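def VectorCosine(x,y):
    ''' Cosine of the angle between consecutive segments of the polyline (x, y) '''
    vc = []
    # Note: the loop ends at index len(x)-3, so the last interior point is not evaluated.
    for i in range(1,len(x)-2):
        # Segment arriving at point i and segment leaving point i
        xc1 = x[i] - x[i-1]
        xc2 = x[i+1] - x[i]
        yc1 = y[i] - y[i-1]
        yc2 = y[i+1] - y[i]
        # Dot product divided by the product of the two segment lengths
        vc.append((xc1*xc2+yc1*yc2)/(math.sqrt(xc1**2+yc1**2)*math.sqrt(xc2**2+yc2**2)))

    return vc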

def main():
    x2 = [0.00,0.00,0.01,0.01,0.02,0.04,0.05,0.07,0.10,0.12,0.15,0.18,0.21,0.24,0.28,0.32,0.37,0.42,0.46,0.52,0.57,0.62,0.68,0.74,0.80,0.86,0.92,0.99,1.06,1.12,1.19,1.26,1.33,1.40,1.48,1.68,1.75,1.82,1.88,1.95,2.01,2.08,2.15,2.21,2.28,2.35,2.41,2.48,2.55,2.61,2.68,2.75,2.81,2.88,2.95,3.01,3.08,3.15,3.21,3.27,3.34,3.39,3.46,3.51,3.58,3.64,3.69,3.75,3.81,3.86,3.92,3.97,4.02,4.08,4.13,4.17,4.22,4.27,4.31,4.36,4.41,4.44,4.49,4.52,4.56,4.60,4.64,4.67,4.71,4.74,4.77,4.80,4.82,4.85,4.87,4.89,4.91,4.93,4.94,4.96,4.97,4.98,4.99,4.99,4.99,4.99,4.99,4.99,4.98,4.97,4.96,4.94,4.93,4.91,4.88,4.86,4.83,4.80,4.77,4.73,4.70,4.66,4.62,4.57,4.52,4.46,4.42,4.36,4.29,4.24,4.18,4.11,4.06,3.99,3.92,3.85,3.78,3.70,3.63,3.55,3.48,3.41,3.33,3.26,3.18,3.09,3.02,2.94,2.85,2.78,2.69,2.61,2.54,2.45,2.37,2.30,2.21,2.13,2.06,1.98,1.89,1.82,1.74,1.67,1.59,1.52,1.45,1.37,1.30,1.23,1.16,1.09,1.03,0.96,0.90,0.84,0.78,0.72,0.67,0.61,0.55,0.51,0.45,0.41,0.36,0.32,0.28,0.24,0.21,0.18,0.14,0.12,0.09,0.07,0.05,0.04,0.02,0.01,0.01,0.00,0.00]
    y2 = [35.01,35.30,35.32,35.22,37.23,38.91,40.61,41.66,43.01,45.78,49.20,51.85,53.81,56.15,58.65,57.61,55.97,54.22,52.13,50.91,51.01,51.65,52.28,53.65,54.56,54.53,54.43,53.75,52.45,51.85,51.76,51.75,51.80,52.42,52.42,52.47,52.60,52.75,52.83,52.55,52.35,52.25,52.01,51.82,51.82,51.81,51.85,51.88,51.88,51.81,51.80,51.75,51.53,51.49,51.54,51.51,51.51,51.52,51.51,51.48,51.52,51.26,51.09,51.05,50.92,50.93,50.97,50.97,50.95,51.02,50.99,51.04,51.04,50.92,50.65,50.64,50.61,50.61,50.66,50.67,50.64,50.67,50.58,50.47,50.45,50.24,50.07,50.10,50.07,50.05,50.11,50.10,50.07,49.97,49.70,49.67,49.68,49.50,49.50,49.49,49.47,49.50,49.46,49.48,49.21,48.11,47.81,47.37,47.32,46.85,45.77,44.54,43.09,41.66,40.29,38.49,36.54,33.99,31.23,28.23,25.26,23.25,24.20,26.10,29.01,31.74,33.24,33.20,32.61,30.41,27.65,26.16,25.95,25.98,27.61,29.39,31.12,31.89,31.97,30.75,29.65,28.33,27.31,27.00,27.47,28.33,29.30,30.26,30.96,30.99,30.31,29.17,28.83,28.18,28.16,28.18,28.94,29.49,30.08,30.34,30.43,30.24,29.58,29.15,29.08,29.08,29.41,29.76,30.36,30.48,30.55,30.48,30.47,30.14,29.80,29.80,30.17,30.39,30.85,31.42,31.55,31.53,31.54,31.48,31.43,31.40,31.41,31.57,32.01,32.66,33.24,33.25,33.24,33.24,32.80,32.25,32.25,32.40,32.61,33.04,35.01]

    x1 = [0.00,0.00,0.01,0.01,0.02,0.03,0.05,0.07,0.09,0.11,0.13,0.16,0.19,0.22,0.25,0.28,0.32,0.35,0.39,0.43,0.48,0.51,0.56,0.60,0.66,0.71,0.76,0.82,0.87,0.93,0.99,1.03,1.09,1.15,1.21,1.27,1.33,1.39,1.45,1.51,1.58,1.62,1.69,1.75,1.81,1.87,1.93,1.99,2.05,2.11,2.16,2.21,2.27,2.32,2.38,2.44,2.49,2.54,2.60,2.65,2.74,2.78,2.83,2.88,2.93,2.98,3.02,3.07,3.12,3.16,3.21,3.24,3.29,3.33,3.37,3.41,3.45,3.49,3.53,3.56,3.60,3.63,3.66,3.70,3.73,3.76,3.79,3.82,3.85,3.88,3.91,3.93,3.95,3.98,4.00,4.02,4.04,4.06,4.07,4.09,4.10,4.11,4.12,4.13,4.14,4.14,4.15,4.15,4.15,4.14,4.14,4.13,4.12,4.11,4.09,4.08,4.05,4.03,4.00,3.98,3.94,3.92,3.88,3.84,3.80,3.76,3.72,3.67,3.62,3.57,3.52,3.48,3.43,3.37,3.31,3.25,3.19,3.12,3.06,2.99,2.92,2.87,2.80,2.74,2.67,2.61,2.54,2.47,2.40,2.33,2.26,2.21,2.14,2.07,2.00,1.93,1.86,1.79,1.73,1.66,1.60,1.54,1.48,1.42,1.35,1.29,1.22,1.16,1.10,1.04,0.98,0.94,0.88,0.83,0.77,0.72,0.67,0.62,0.57,0.52,0.48,0.44,0.40,0.36,0.32,0.28,0.25,0.21,0.18,0.15,0.13,0.11,0.09,0.07,0.05,0.04,0.02,0.01,0.01,0.00,0.00]
    y1 = [22.60,23.39,24.27,25.45,26.78,28.30,29.75,30.86,32.34,34.06,36.00,38.69,41.29,46.88,50.25,53.15,55.22,57.65,61.04,63.47,68.09,71.36,71.69,69.49,67.67,65.42,61.75,58.15,55.43,53.57,54.53,54.76,56.02,57.72,59.22,60.26,60.82,60.00,59.18,57.25,55.58,54.47,53.71,53.30,53.27,54.15,55.09,56.36,57.19,57.52,57.62,57.55,56.40,55.63,54.44,53.81,53.57,53.14,53.34,54.25,54.13,54.84,55.31,55.41,55.62,56.00,55.63,55.16,54.39,53.98,53.85,53.56,53.28,53.40,53.78,54.29,54.53,54.63,54.81,55.10,54.95,54.54,54.05,53.78,53.58,53.52,53.06,53.17,53.52,53.64,53.81,53.73,53.64,53.94,53.59,53.15,52.70,52.60,52.28,51.99,51.62,51.64,51.61,51.81,51.52,51.43,50.73,50.12,49.80,49.12,48.41,48.07,47.69,47.27,47.45,47.12,46.66,46.21,45.64,44.68,43.32,41.93,40.07,38.38,36.20,33.33,30.39,27.32,23.77,19.61,15.33,13.88,15.64,17.82,20.16,23.61,26.95,30.24,32.15,31.35,30.97,29.86,27.51,24.47,22.41,20.55,20.44,20.44,21.27,22.56,25.36,26.92,28.51,29.10,29.56,29.47,28.16,26.54,25.53,23.89,22.90,22.52,22.15,23.17,24.55,25.62,26.61,26.85,26.91,26.95,26.52,25.38,24.46,23.52,23.12,22.87,22.10,21.70,23.16,23.97,24.92,25.58,26.50,26.95,27.12,25.98,24.50,23.94,22.91,21.73,20.86,20.67,21.14,22.83,23.84,24.29,25.08,24.86,24.47,23.15,22.60]

    x = [0.00,0.00,0.01,0.01,0.02,0.03,0.05,0.07,0.09,0.11,0.13,0.16,0.19,0.22,0.25,0.28,0.32,0.35,0.39,0.43,0.48,0.51,0.56,0.60,0.66,0.71,0.76,0.82,0.87,0.93,0.99,1.03,1.09,1.15,1.21,1.27,1.33,1.39,1.45,1.51,1.58,1.62,1.69,1.75,1.81,1.87,1.93,1.99,2.05,2.11,2.16,2.21,2.27,2.32,2.38,2.44,2.49,2.54,2.60,2.65,2.74,2.78,2.83,2.88,2.93,2.98,3.02,3.07,3.12,3.16,3.21,3.24,3.29,3.33,3.37,3.41,3.45,3.49,3.53,3.56,3.60,3.63,3.66,3.70,3.73,3.76,3.79,3.82,3.85,3.88,3.91,3.93,3.95,3.98,4.00,4.02,4.04,4.06,4.07,4.09,4.10,4.11,4.12,4.13,4.14,4.14,4.15,4.15,4.15,4.14,4.14,4.13,4.12,4.11,4.09,4.08,4.05,4.03,4.00,3.98,3.94,3.92,3.88,3.84,3.80,3.76,3.72,3.67,3.62,3.57,3.52,3.48,3.43,3.37,3.31,3.25,3.19,3.12,3.06,2.99,2.92,2.87,2.80,2.74,2.67,2.61,2.54,2.47,2.40,2.33,2.26,2.21,2.14,2.07,2.00,1.93,1.86,1.79,1.73,1.66,1.60,1.54,1.48,1.42,1.35,1.29,1.22,1.16,1.10,1.04,0.98,0.94,0.88,0.83,0.77,0.72,0.67,0.62,0.57,0.52,0.48,0.44,0.40,0.36,0.32,0.28,0.25,0.21,0.18,0.15,0.13,0.11,0.09,0.07,0.05,0.04,0.02,0.01,0.01,0.00,0.00]
    y = [22.6,23.39,24.27,25.45,26.78,28.3,29.75,30.86,32.34,34.06,36.0,38.69,39.29,26.88,30.25,33.15,35.22,37.65,31.04,33.47,38.09,40.36,40.69,39.48,37.67,35.42,31.75,38.15,35.43,33.57,34.53,34.76,36.02,37.72,39.22,30.25,30.82,40.0,39.18,37.25,35.58,34.47,33.71,33.3,33.27,34.15,35.09,36.36,37.19,37.52,37.62,37.55,36.4,35.63,34.44,33.81,33.57,33.14,33.34,34.25,34.13,34.84,35.31,35.41,35.62,36.0,35.63,35.16,34.39,33.98,33.85,33.56,33.28,33.4,33.78,34.29,34.53,34.63,34.81,35.1,34.95,34.54,34.05,33.78,33.58,33.52,33.06,33.17,33.52,33.64,33.81,33.73,33.64,33.94,33.59,33.15,32.7,32.6,32.28,31.99,31.61,31.64,31.61,31.81,31.52,31.43,30.72,30.11,29.79,29.11,28.40,28.07,27.68,27.27,27.45,27.11,26.65,26.21,25.64,25.68,25.32,25.93,25.07,24.38,24.2,23.33,25.39,27.32,23.77,21.61,21.33,21.88,21.64,21.82,20.16,23.61,26.95,30.24,30.15,30.35,30.97,29.86,27.51,24.47,22.41,20.55,20.24,20.24,21.27,22.56,25.36,26.92,28.51,26.1,26.56,26.47,26.16,26.54,25.53,23.89,22.9,22.52,22.15,23.17,24.55,25.62,26.61,26.85,26.91,26.95,26.52,25.38,24.46,23.52,23.12,22.87,22.1,21.7,23.16,23.97,24.92,25.58,26.5,26.95,27.12,25.98,24.5,23.94,22.91,21.73,20.86,20.67,21.14,22.83,23.84,24.29,25.08,24.86,24.47,23.15,22.6]

    v = VectorCosine(x2,y2)    
    vv = VectorCosine(x1,y1)    
    vvv=VectorCosine(x,y)
    # Compute vector cosine similarity
    cos1 = np.vstack([v,vv])
    p1 = 1 - pdist(cos1,'cosine')
    print(p1)    

    cos2 = np.vstack([v,vvv])
    p2 = 1 - pdist(cos2,'cosine')
    print(p2)

    plt.figure(1)
    plt.plot(x,y)

    plt.figure(2)
    plt.plot(x2,y2)

    plt.figure(3)
    plt.plot(x1,y1)
    plt.show()

if __name__ == '__main__':
    main()

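  The similarity between the second figure and the first figure is [0.62020321], and the similarity between the third figure and the first figure is [0.3941908].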

[Figure: plots of the three curves]
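
  In the reference code, scipy's pdist with the 'cosine' metric returns the cosine distance between the rows of the stacked array, so 1 - pdist(...) is the cosine similarity. The sketch below checks this against a direct NumPy computation; the two short vectors are made-up sample values, not data from this article.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Two made-up sample vectors of equal length.
a = np.array([1.0, 0.5, -0.2, 0.9])
b = np.array([0.8, 0.7, 0.1, 0.4])

# pdist returns one distance per pair of rows; two rows give a length-1 array.
via_pdist = 1 - pdist(np.vstack([a, b]), 'cosine')

# Direct formula: dot product over the product of the norms.
direct = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(via_pdist, direct)
```

  This also explains why the printed results are one-element arrays such as [0.62020321] rather than plain numbers.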

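  Based on this method, as shown in the figure below, take a segment of specific data as the baseline for comparison and compare the test data against it; the larger the value, the higher the similarity.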
[Figure: a segment of the data used as the comparison baseline]
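
  One possible way to apply the baseline idea is to slide the baseline over a longer test sequence and keep the position with the highest cosine similarity. This sliding-window search is an assumption made here for illustration and is not described in the article; best_matching_window is a hypothetical function name, and in practice the inputs would be sequences such as the angle cosines returned by VectorCosine.

```python
import numpy as np


def best_matching_window(baseline, series):
    """Return (start index, similarity) of the window of `series`
    that has the highest cosine similarity with `baseline`."""
    baseline = np.asarray(baseline, dtype=float)
    series = np.asarray(series, dtype=float)
    n = len(baseline)
    scores = []
    for start in range(len(series) - n + 1):
        window = series[start:start + n]
        denom = np.linalg.norm(baseline) * np.linalg.norm(window)
        # An all-zero window has no direction; score it as 0.
        scores.append(float(np.dot(baseline, window) / denom) if denom else 0.0)
    best = int(np.argmax(scores))
    return best, scores[best]


# Made-up example: the baseline pattern appears at index 2 of the series.
start, score = best_matching_window([1, 2, 1], [5, 1, 1, 2, 1, 4, 0, 2])
print(start, score)
```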

  Reader feedback is welcome.
  
Download addresses for the Python scientific computing packages:
1. SciPy: third-party SciPy build for Python 3.6
2. NumPy+MKL: numpy+mkl build for Python 3.6

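References:

1. "[Python] Steps for Installing the scipy Library on Windows" (《【Python】Windows下安装scipy库步骤》), CSDN blog, 阿秀的工作室, January 2017
2. "Distance Metrics and Their Python Implementation (Part 2)" (《距离度量以及python实现(二)》), denny的学习专栏, 徐其华, June 2017
3. "Plotting with Python Matplotlib and Saving Images to Files in Practice" (《使用Python Matplotlib绘图并输出图像到文件中的实践》), CSDN blog, 肖永威, April 2018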
