Python简明教程（二）

- Python简明教程（二）

Python简明教程（二）

广播函数

广播函数(Broadcasting function)规则

广播允许通用函数（universal functions）以非常有意义的方式处理不具有相同形状(shape)的输入。使用广播需要知道以下两个规则：

维度传播

广播的第一个规则是如果所有输入数组不具有相同数量的维度，则将“1”重复地预先添加到较小数组的shape，直到所有数组具有相同数量的维度。
值传播

广播的第二个规则确保沿着特定维度大小为1的数组就好像它具有沿着该维度具有最大shape数组大小，同时假定数组元素的值沿着“广播”数组的那个维度是相同的。

最后，运用上面两个规则之后，所有数组的size必须匹配。

花哨的索引与索引技巧

NumPy提供比常规Python序列更多的索引功能。除了通过整数和切片进行索引之外，正如我们之前看到的，数组可以由整数数组和布尔数组索引。

使用数组下标索引

>>> a = np.arange(12)**2 # the fisrt 12 square numbers
>>> a
array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100, 121])
>>> i = np.array([1,1,3,8,5]) # an array of indices
>>> i
array([1, 1, 3, 8, 5])
>>> a[i] # the elements of a at the position i
array([ 1,  1,  9, 64, 25])
>>> j = np.array([[3, 4], [9, 7]]) # 二维数组索引,结果与二维数组shape相同
>>> a[j]
array([[ 9, 16],
       [81, 49]])

上面例子a是一维的，当a是多维时，单个索引数组指的是a的第一个维度。以下示例通过使用调色板将标签图像转换为彩色图像来显示此行为。

扫描二维码关注公众号，回复： 6407368 查看本文章

palette = np.array( [ [0,0,0],                # black
...                       [255,0,0],              # red
...                       [0,255,0],              # green
...                       [0,0,255],              # blue
...                       [255,255,255] ] )       # white
>>> palette
array([[  0,   0,   0],
       [255,   0,   0],
       [  0, 255,   0],
       [  0,   0, 255],
       [255, 255, 255]])
>>> image = np.array( [ [ 0, 1, 2, 0 ],           # each value corresponds to a color in the palette
...                     [ 0, 3, 4, 0 ]  ] )
>>> image
array([[0, 1, 2, 0],
       [0, 3, 4, 0]])
>>> palette[image]
array([[[  0,   0,   0],    # image[0,0] 索引palette[0]
        [255,   0,   0],    # image[0,1] 索引palette[1]
        [  0, 255,   0],    # image[0,2] 索引palette[2]
        [  0,   0,   0]],   # image[0,3] 索引palette[0]

       [[  0,   0,   0],    # image[1,0] 索引palette[0]
        [  0,   0, 255],    # image[1,1] 索引palette[3]
        [255, 255, 255],    # image[1,2] 索引palette[4]
        [  0,   0,   0]]])  # image[1,3] 索引palette[0]

我们还可以为多个维度提供索引。每个维度的索引数组必须具有相同的shape。

>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> i = np.array([[0,1],[1,2]]) #a的一维索引
>>> i
array([[0, 1],
       [1, 2]])
>>> j = np.array([[2, 1], [3,3]]) #a的二维索引
>>> j
array([[2, 1],
       [3, 3]])
>>> a[i,j] # i, j必须有相同的shape
array([[ 2,  5],
       [ 7, 11]])
>>>
>>> a[i,2]    
array([[ 2,  6],
       [ 6, 10]])
>>>
>>> a[:,j]
array([[[ 2,  1],    # 对应a[0, j]
        [ 3,  3]],

       [[ 6,  5],   # 对应a[1,j]
        [ 7,  7]],

       [[10,  9],   # 对应a[2, j]
        [11, 11]]])

很自然的，我们可以把i和j作为list元素，然后使用list去索引：

>>> l = [i, j]
>>> l
[array([[0, 1],
       [1, 2]]), array([[2, 1],
       [3, 3]])]
>>> a[l] # 等价于a[i,j]
array([[ 2,  5],
       [ 7, 11]])
>>> a[i, j]
array([[ 2,  5],
       [ 7, 11]])

但是，我们不能通过将i和j放入数组来实现这一点，因为这个数组将被解释为索引’a’的第一个维度。

>>> s = np.array([i, j])
>>> s
array([[[0, 1],
        [1, 2]],

       [[2, 1],
        [3, 3]]])
>>> a[s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index 3 is out of bounds for axis 0 with size 3
>>>
>>> a[tuple(s)] # 可以,  类似于a[i, j]
array([[ 2,  5],
       [ 7, 11]])
>>> tuple(s)
(array([[0, 1],
       [1, 2]]), array([[2, 1],
       [3, 3]]))

使用数组索引的另一个常见用途是搜索与时间相关的序列的最大值：

>>> time = np.linspace(20, 145, 5) # time scale
>>> time
array([  20.  ,   51.25,   82.5 ,  113.75,  145.  ])
>>> data = np.sin(np.arange(20).reshape(5,4))
>>> data
array([[ 0.        ,  0.84147098,  0.90929743,  0.14112001],
       [-0.7568025 , -0.95892427, -0.2794155 ,  0.6569866 ],
       [ 0.98935825,  0.41211849, -0.54402111, -0.99999021],
       [-0.53657292,  0.42016704,  0.99060736,  0.65028784],
       [-0.28790332, -0.96139749, -0.75098725,  0.14987721]])
>>>
>>> ind = data.argmax(axis=0) # 每一列最大值的索引
>>> ind
array([2, 0, 3, 1])
>>>
>>> time_max = time[ind]
>>> time_max
array([  82.5 ,   20.  ,  113.75,   51.25])
>>> time
array([  20.  ,   51.25,   82.5 ,  113.75,  145.  ])
>>> data_max = data[ind, range(data.shape[1])] # # => data[ind[0],0], data[ind[1],1]...
>>> data_max
array([ 0.98935825,  0.84147098,  0.99060736,  0.6569866 ])
>>> data.shape[1]
4
>>> data.shape[0]
5
>>> np.all(data_max == data.max(axis=0))
True
>>> data.max(axis=0)
array([ 0.98935825,  0.84147098,  0.99060736,  0.6569866 ])

你也可以给数组索引对象赋值：

>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[1,3,4]] = 0
>>> a
array([0, 0, 2, 0, 0])

然而，当列表中包含重复的索引时，索引位置被多次赋值，并使用最后一次赋值的值：

>>> a=np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[0,0,2]] = [1, 2,3]
>>> a
array([2, 1, 3, 3, 4])

这是合理的，但请注意是否要使用Python的+ =构造，因为它可能无法达到预期效果：

>>> a = np.arange(5)
>>> a[[0,0,2]] +=1
>>> a
array([1, 1, 3, 3, 4])

虽然在索引列表中0出现了两次，但第0个元素被加了一次。这是因为Python将a+=1解释为a=a+1

数组的`Boolean` 索引

当我们使用（整数）索引数组索引数组时，我们提供了要选择的索引列表。使用布尔索引，方法是不同的; 我们明确地选择了我们想要的数组中的哪些项以及我们不想要的项。

人们可以想到的最自然的布尔索引方法是使用与原始数组具有相同shape的布尔数组：

>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> b = a > 4
>>> b
array([[False, False, False, False],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]], dtype=bool)
>>> a[b] #被选择的元素作为一维数组返回
array([ 5,  6,  7,  8,  9, 10, 11])

这个属性对于赋值非常的有用：

>>> a[b] = 0 #将 > 4的元素全部赋值为0
>>> a
array([[0, 1, 2, 3],
       [4, 0, 0, 0],
       [0, 0, 0, 0]])

您可以查看以下示例，了解如何使用布尔索引生成Mandelbrot图像：

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> def mandelbrot( h,w, maxit=20 ):
...   """Returns an image of the Mandelbrot fractal of size (h,w)."""
...   y,x = np.ogrid[ -1.4:1.4:h*1j, -2:0.8:w*1j ]
...   c=x+y*1j
...   z=c
...   divtime = maxit + np.zeros(z.shape, dtype=int)
...   
...   for i in range(maxit):
...     z=z**2+c
...     diverge = z*np.conj(z) > 2**2            # who is diverging
...     div_now = diverge & (divtime==maxit)  # who is diverging now
...     divtime[div_now] = i                  # note when
...     z[diverge] = 2                        # avoid diverging too much
...   return divtime
...
>>> plt.imshow(mandelbrot(400, 400))
<matplotlib.image.AxesImage object at 0xb5cbeccc>
>>> plt.show()

![mandelbrot](./mandelbrot.png)

使用布尔值进行索引的第二种方法更类似于整数索引; 对于数组的每个维度，我们给出一个1D布尔数组，选择我们想要的切片：

>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> b1=np.array([False, True, True])
>>> b2=np.array([True, False, True, False])
>>> b1
array([False,  True,  True], dtype=bool)
>>> b2
array([ True, False,  True, False], dtype=bool)
>>> a[b1,:]
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> a[b1]
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>>
>>> a[:,b2]
array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])
>>> a[b1,b2]
array([ 4, 10])

请注意，1D布尔数组的长度必须与要切片的尺寸（或轴）的长度一致。在前面的例子中，b1的长度为3（a中的行数），b2（长度为4）适合于索引a的第2轴（列）。

`ix_()` 函数

ix_()函数可用于组合不同的向量，以便获得每个n-uplet的结果。例如，如果要计算从每个向量a，b和c中取得的所有三元组的所有a+b*c

>>> a = np.array([2, 3, 3, 5])
>>> b = np.array([8, 5, 4])
>>> c = np.array([5,4,6,8,3])
>>> ax, bx, cx = np.ix_(a, b, c)
>>> ax
array([[[2]],

       [[3]],

       [[3]],

       [[5]]])
>>> bx
array([[[8],
        [5],
        [4]]])
>>> cx
array([[[5, 4, 6, 8, 3]]])
>>> ax.shape, bx.shape, cx.shape
((4, 1, 1), (1, 3, 1), (1, 1, 5))
>>> result=ax+bx*cx
>>> result
array([[[42, 34, 50, 66, 26],
        [27, 22, 32, 42, 17],
        [22, 18, 26, 34, 14]],

       [[43, 35, 51, 67, 27],
        [28, 23, 33, 43, 18],
        [23, 19, 27, 35, 15]],

       [[43, 35, 51, 67, 27],
        [28, 23, 33, 43, 18],
        [23, 19, 27, 35, 15]],

       [[45, 37, 53, 69, 29],
        [30, 25, 35, 45, 20],
        [25, 21, 29, 37, 17]]])
>>> result[3, 2, 4]
17
>>> a[3]+b[2]*c[4]
17

您还可以按如下方式实现reduce：

>>> def ufunc_reduce(ufct, *vectors):
...    vs = np.ix_(*vectors)
...    r = ufct.identity
...    for v in vs:
...        r = ufct(r,v)
...    return r

然后以下面的方式使用它：

>>> ufunc_reduce(np.add,a,b,c)
array([[[15, 14, 16, 18, 13],
        [12, 11, 13, 15, 10],
        [11, 10, 12, 14,  9]],

       [[16, 15, 17, 19, 14],
        [13, 12, 14, 16, 11],
        [12, 11, 13, 15, 10]],

       [[16, 15, 17, 19, 14],
        [13, 12, 14, 16, 11],
        [12, 11, 13, 15, 10]],

       [[18, 17, 19, 21, 16],
        [15, 14, 16, 18, 13],
        [14, 13, 15, 17, 12]]])

与普通的ufunc.reduce相比，这个版本的reduce的优点是它利用了广播规则，以避免创建一个参数数组，输出的大小乘以向量的数量。

线性代数

简单的数组运算

>>> import numpy as np
>>> a = np.array([[1.0, 2.0],[3.0,4.0]])
>>> print(a)
[[1. 2.]
 [3. 4.]]
>>> a.transpose()
array([[1., 3.],
       [2., 4.]])
>>> np.linalg.inv(a)
array([[-2. ,  1. ],
       [ 1.5, -0.5]])
>>> np.linalg.inv(a) @ a
array([[1.00000000e+00, 0.00000000e+00],
       [2.22044605e-16, 1.00000000e+00]])
>>> u = np.eye(2)
>>> u
array([[1., 0.],
       [0., 1.]])
>>> j = np.array([[0.0, -1.0],[1.0,0.0]])
>>> j
array([[ 0., -1.],
       [ 1.,  0.]])
>>> j @ j
array([[-1.,  0.],
       [ 0., -1.]])
>>> np.trace(u)
2.0
>>> y = np.array([[5.], [7.]])
>>> y
array([[5.],
       [7.]])
>>> np.linalg.solve(a,y) # ax=y , solve x
array([[-3.],
       [ 4.]])
>>> np.linalg.eig(j)
(array([0.+1.j, 0.-1.j]), array([[0.70710678+0.j        , 0.70710678-0.j        ],
       [0.        -0.70710678j, 0.        +0.70710678j]]))

Tricks and Tips

“Automatic” Reshaping

要更改数组的尺寸(shape)，可以省略其中一个尺寸(使用-1填充省略的尺寸)，然后自动推导出尺寸:

>>> a = np.arange(30)
>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
>>> a.shape = 2, -1, 3
>>> a.shape
(2, 5, 3)
>>> a
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11],
        [12, 13, 14]],

       [[15, 16, 17],
        [18, 19, 20],
        [21, 22, 23],
        [24, 25, 26],
        [27, 28, 29]]])

直方图(Hitograms)

应用于数组的NumPy直方图函数返回一对向量：数组的直方图和区域向量。注意：matplotlib还具有构建直方图的功能（称为hist，如在Matlab中），与NumPy中的不同。主要区别在于pylab.hist自动绘制直方图，而numpy.histogram只生成数据。

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> # Build a vector of 10000 normal deviates with variance 0.5^2 and mean 2
... mu, sigma = 2, 0.5
>>> v = np.random.normal(mu,sigma,10000)
>>> # Plot a normalized histogram with 50 bins
...
>>> plt.hist(v, bins=50)
(array([   4.,    1.,    1.,    3.,    1.,    7.,    8.,   14.,   12.,
         27.,   41.,   41.,   57.,   94.,  116.,  150.,  209.,  253.,
        290.,  367.,  421.,  511.,  556.,  602.,  594.,  637.,  615.,
        630.,  625.,  538.,  487.,  444.,  355.,  298.,  240.,  219.,
        151.,  112.,   97.,   50.,   46.,   25.,   26.,    7.,    5.,
          7.,    1.,    2.,    1.,    2.]), array([-0.07307439,  0.00663803,  0.08635045,  0.16606286,  0.24577528,
        0.3254877 ,  0.40520012,  0.48491254,  0.56462496,  0.64433738,
        0.7240498 ,  0.80376222,  0.88347463,  0.96318705,  1.04289947,
        1.12261189,  1.20232431,  1.28203673,  1.36174915,  1.44146157,
        1.52117399,  1.60088641,  1.68059882,  1.76031124,  1.84002366,
        1.91973608,  1.9994485 ,  2.07916092,  2.15887334,  2.23858576,
        2.31829818,  2.39801059,  2.47772301,  2.55743543,  2.63714785,
        2.71686027,  2.79657269,  2.87628511,  2.95599753,  3.03570995,
        3.11542236,  3.19513478,  3.2748472 ,  3.35455962,  3.43427204,
        3.51398446,  3.59369688,  3.6734093 ,  3.75312172,  3.83283413,
        3.91254655]), <a list of 50 Patch objects>)
>>> plt.show()

>>> # Compute the histogram with numpy and then plot it
... (n, bins) = np.histogram(v, bins=50, density=True)  # NumPy version (no plot)
>>> plt.plot(.5*(bins[1:]+bins[:-1]), n)
[<matplotlib.lines.Line2D object at 0xb451782c>]
>>> plt.show()

numpy_hist

Refence