Python Concise Guide (Part 2)
Broadcasting
Broadcasting rules
Broadcasting allows universal functions (ufuncs) to deal in a meaningful way with inputs that do not have exactly the same shape. Broadcasting follows two rules:
Dimension expansion
The first rule: if the input arrays do not all have the same number of dimensions, a "1" is repeatedly prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions.
Size expansion
The second rule: arrays with a size of 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along that dimension for the "broadcast" array.
After applying both rules, the sizes of all arrays must match.
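The example below is a minimal sketch that applies both rules to two concrete shapes. The arrays `a` and `b` are chosen for illustration. A shape (3, 1) array combined with a shape (4,) array gives a (3, 4) result, and two 1-D arrays of different lengths fail because their sizes still differ after the rules are applied.

```python
import numpy as np

a = np.arange(3).reshape(3, 1)    # shape (3, 1)
b = np.array([10, 20, 30, 40])    # shape (4,)

# Rule 1: b has fewer dimensions, so its shape (4,) is treated as (1, 4).
# Rule 2: every axis of length 1 is stretched to match the other operand,
#         so both behave as if they had shape (3, 4).
c = a + b
print(c.shape)   # (3, 4)
print(c)
# [[10 20 30 40]
#  [11 21 31 41]
#  [12 22 32 42]]

# After applying both rules the sizes must match; (3,) and (4,) do not.
try:
    np.arange(3) + np.arange(4)
    broadcast_ok = True
except ValueError as exc:
    broadcast_ok = False
    print("not broadcastable:", exc)
```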
Fancy indexing and index tricks
NumPy offers more indexing facilities than regular Python sequences. As we saw before, arrays can be indexed by integers and slices. They can also be indexed by arrays of integers and by arrays of Booleans.
Indexing with arrays of indices
>>> a = np.arange(12)**2 # the first 12 square numbers
>>> a
array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121])
>>> i = np.array([1,1,3,8,5]) # an array of indices
>>> i
array([1, 1, 3, 8, 5])
>>> a[i] # the elements of a at the position i
array([ 1, 1, 9, 64, 25])
>>> j = np.array([[3, 4], [9, 7]]) # a 2D array of indices; the result has the same shape as j
>>> a[j]
array([[ 9, 16],
[81, 49]])
In the examples above, a is one-dimensional. When the indexed array a is multidimensional, a single array of indices refers to the first dimension of a. The following example shows this behavior by converting an image of labels into a color image using a palette.
>>> palette = np.array( [ [0,0,0], # black
... [255,0,0], # red
... [0,255,0], # green
... [0,0,255], # blue
... [255,255,255] ] ) # white
>>> palette
array([[ 0, 0, 0],
[255, 0, 0],
[ 0, 255, 0],
[ 0, 0, 255],
[255, 255, 255]])
>>> image = np.array( [ [ 0, 1, 2, 0 ], # each value corresponds to a color in the palette
... [ 0, 3, 4, 0 ] ] )
>>> image
array([[0, 1, 2, 0],
[0, 3, 4, 0]])
>>> palette[image]
array([[[ 0, 0, 0], # image[0,0] indexes palette[0]
[255, 0, 0], # image[0,1] indexes palette[1]
[ 0, 255, 0], # image[0,2] indexes palette[2]
[ 0, 0, 0]], # image[0,3] indexes palette[0]
[[ 0, 0, 0], # image[1,0] indexes palette[0]
[ 0, 0, 255], # image[1,1] indexes palette[3]
[255, 255, 255], # image[1,2] indexes palette[4]
[ 0, 0, 0]]]) # image[1,3] indexes palette[0]
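The sketch below checks two properties of the palette lookup. First, the result's shape is the index array's shape followed by the remaining axes of `palette`. Second, the lookup gives the same result as an explicit per-pixel loop. The loop and the variable names `rgb` and `loop` are added only for comparison.

```python
import numpy as np

palette = np.array([[0, 0, 0], [255, 0, 0], [0, 255, 0],
                    [0, 0, 255], [255, 255, 255]])
image = np.array([[0, 1, 2, 0],
                  [0, 3, 4, 0]])

rgb = palette[image]
# The index array supplies the leading axes; palette's remaining axis follows.
print(rgb.shape)                                      # (2, 4, 3)
print(rgb.shape == image.shape + palette.shape[1:])   # True

# The same lookup written as an explicit loop, for comparison only.
loop = np.empty_like(rgb)
for r in range(image.shape[0]):
    for col in range(image.shape[1]):
        loop[r, col] = palette[image[r, col]]
print(np.array_equal(rgb, loop))                      # True
```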
We can also give indices for more than one dimension. The arrays of indices for each dimension must have the same shape.
>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> i = np.array([[0,1],[1,2]]) # indices for the first dim of a
>>> i
array([[0, 1],
[1, 2]])
>>> j = np.array([[2, 1], [3,3]]) # indices for the second dim of a
>>> j
array([[2, 1],
[3, 3]])
>>> a[i,j] # i and j must have the same shape
array([[ 2, 5],
[ 7, 11]])
>>>
>>> a[i,2]
array([[ 2, 6],
[ 6, 10]])
>>>
>>> a[:,j]
array([[[ 2, 1], # corresponds to a[0, j]
[ 3, 3]],
[[ 6, 5], # corresponds to a[1, j]
[ 7, 7]],
[[10, 9], # corresponds to a[2, j]
[11, 11]]])
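Naturally, we can put i and j in a sequence and then index with that sequence. The sequence must be a tuple. Older NumPy versions also accepted a list, with a FutureWarning. Since NumPy 1.23, a list is treated as a single index array for the first dimension.
>>> l = (i, j)
>>> l
(array([[0, 1],
[1, 2]]), array([[2, 1],
[3, 3]]))
>>> a[l] # equivalent to a[i, j]
array([[ 2, 5],
[ 7, 11]])
>>> a[i, j]
array([[ 2, 5],
[ 7, 11]])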
However, we cannot do this by putting i and j into an array. That array would be interpreted as indexing the first dimension of a.
>>> s = np.array([i, j])
>>> s
array([[[0, 1],
[1, 2]],
[[2, 1],
[3, 3]]])
>>> a[s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 3 is out of bounds for axis 0 with size 3
>>>
>>> a[tuple(s)] # OK: same as a[i, j]
array([[ 2, 5],
[ 7, 11]])
>>> tuple(s)
(array([[0, 1],
[1, 2]]), array([[2, 1],
[3, 3]]))
Another common use of indexing with arrays is finding the maximum value of time-dependent series:
>>> time = np.linspace(20, 145, 5) # time scale
>>> time
array([ 20. , 51.25, 82.5 , 113.75, 145. ])
>>> data = np.sin(np.arange(20).reshape(5,4))
>>> data
array([[ 0. , 0.84147098, 0.90929743, 0.14112001],
[-0.7568025 , -0.95892427, -0.2794155 , 0.6569866 ],
[ 0.98935825, 0.41211849, -0.54402111, -0.99999021],
[-0.53657292, 0.42016704, 0.99060736, 0.65028784],
[-0.28790332, -0.96139749, -0.75098725, 0.14987721]])
>>>
>>> ind = data.argmax(axis=0) # index of the maximum in each column
>>> ind
array([2, 0, 3, 1])
>>>
>>> time_max = time[ind]
>>> time_max
array([ 82.5 , 20. , 113.75, 51.25])
>>> time
array([ 20. , 51.25, 82.5 , 113.75, 145. ])
>>> data_max = data[ind, range(data.shape[1])] # => data[ind[0],0], data[ind[1],1]...
>>> data_max
array([ 0.98935825, 0.84147098, 0.99060736, 0.6569866 ])
>>> data.shape[1]
4
>>> data.shape[0]
5
>>> np.all(data_max == data.max(axis=0))
True
>>> data.max(axis=0)
array([ 0.98935825, 0.84147098, 0.99060736, 0.6569866 ])
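The same per-column maxima can be picked in two ways. One pairs `ind` with an explicit range of column indices, as above. The other uses `np.take_along_axis`, which expects an index array with the same number of dimensions as `data`. The sketch below assumes NumPy 1.15 or later, where `take_along_axis` is available, and checks that both agree with `data.max(axis=0)`.

```python
import numpy as np

data = np.sin(np.arange(20).reshape(5, 4))
ind = data.argmax(axis=0)                 # row index of the maximum in each column

# Variant 1: pair row indices with column indices explicitly (as in the text).
via_fancy = data[ind, np.arange(data.shape[1])]

# Variant 2: take_along_axis needs ind reshaped from (4,) to (1, 4)
# so it has as many dimensions as data; [0] drops the extra axis again.
via_take = np.take_along_axis(data, ind[np.newaxis, :], axis=0)[0]

print(np.array_equal(via_fancy, via_take))           # True
print(np.array_equal(via_fancy, data.max(axis=0)))   # True
```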
You can also use indexing with arrays as a target to assign to:
>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[1,3,4]] = 0
>>> a
array([0, 0, 2, 0, 0])
When the list of indices contains repetitions, the assignment is done several times, and the last value wins:
>>> a=np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[0,0,2]] = [1, 2,3]
>>> a
array([2, 1, 3, 3, 4])
This is reasonable enough, but watch out if you want to use Python's += construct, as it may not do what you expect:
>>> a = np.arange(5)
>>> a[[0,0,2]] +=1
>>> a
array([1, 1, 3, 3, 4])
Even though 0 occurs twice in the list of indices, the 0th element is incremented only once. This is because Python requires a += 1 to be equivalent to a = a + 1.
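If you want repeated indices to accumulate, the ufunc method `np.add.at` performs the operation unbuffered, so every occurrence of an index counts. The sketch below contrasts it with `+=` on the same indices. Other ufuncs have an `.at` method too, for example `np.multiply.at`.

```python
import numpy as np

a = np.arange(5)
a[[0, 0, 2]] += 1           # buffered: index 0 is incremented only once
print(a)                    # [1 1 3 3 4]

b = np.arange(5)
np.add.at(b, [0, 0, 2], 1)  # unbuffered: each occurrence of an index adds 1
print(b)                    # [2 1 3 3 4]
```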
Indexing with Boolean arrays
When we index arrays with arrays of (integer) indices, we provide the list of indices to pick. Boolean indexing works differently: we explicitly choose which items in the array we want and which ones we don't.
The most natural way to think of Boolean indexing is to use a Boolean array with the same shape as the original array:
>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> b = a > 4
>>> b
array([[False, False, False, False],
[False, True, True, True],
[ True, True, True, True]], dtype=bool)
>>> a[b] # 1D array with the selected elements
array([ 5, 6, 7, 8, 9, 10, 11])
This property can be very useful in assignments:
>>> a[b] = 0 # all elements of a greater than 4 become 0
>>> a
array([[0, 1, 2, 3],
[4, 0, 0, 0],
[0, 0, 0, 0]])
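Masks can be combined before indexing. The sketch below is illustrative and adds three things not in the text above. It joins two conditions with `&`. It uses `np.nonzero` to convert the mask to the equivalent integer index arrays. It uses `np.where` to produce the same zeroing as the assignment above as a new array, leaving `a` unchanged.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Combine conditions with & (and), | (or), ~ (not); the parentheses are
# required because & and | bind more tightly than the comparisons.
mask = (a > 2) & (a % 2 == 0)
print(a[mask])                                  # [ 4  6  8 10]

# np.nonzero gives the (row, column) index arrays of the True entries,
# the integer-index equivalent of the Boolean mask.
rows, cols = np.nonzero(mask)
print(np.array_equal(a[rows, cols], a[mask]))   # True

# np.where builds a new array instead of modifying a in place.
zeroed = np.where(a > 4, 0, a)
print(zeroed)
```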
The following example shows how to use Boolean indexing to generate an image of the Mandelbrot set:
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> def mandelbrot( h,w, maxit=20 ):
... """Returns an image of the Mandelbrot fractal of size (h,w)."""
... y,x = np.ogrid[ -1.4:1.4:h*1j, -2:0.8:w*1j ]
... c=x+y*1j
... z=c
... divtime = maxit + np.zeros(z.shape, dtype=int)
...
... for i in range(maxit):
... z=z**2+c
... diverge = z*np.conj(z) > 2**2 # who is diverging
... div_now = diverge & (divtime==maxit) # who is diverging now
... divtime[div_now] = i # note when
... z[diverge] = 2 # avoid diverging too much
... return divtime
...
>>> plt.imshow(mandelbrot(400, 400))
<matplotlib.image.AxesImage object at 0xb5cbeccc>
>>> plt.show()
The second way of indexing with Booleans is more similar to integer indexing. For each dimension of the array, we give a 1D Boolean array selecting the slices we want:
>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> b1=np.array([False, True, True])
>>> b2=np.array([True, False, True, False])
>>> b1
array([False, True, True], dtype=bool)
>>> b2
array([ True, False, True, False], dtype=bool)
>>> a[b1,:]
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> a[b1]
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>>
>>> a[:,b2]
array([[ 0, 2],
[ 4, 6],
[ 8, 10]])
>>> a[b1,b2]
array([ 4, 10])
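Note that the length of each 1D Boolean array must equal the length of the dimension (or axis) it slices. In the previous example, b1 has length 3 (the number of rows in a), and b2 (of length 4) is suitable for indexing the second axis (columns) of a. Also note that a[b1, b2] does not select a submatrix. The True positions of b1 and b2 are paired up like integer indices, giving a[1,0] and a[2,2].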
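To get the full block of selected rows and columns instead of the paired elements, you can either chain the two selections or pass both masks through `np.ix_`, which accepts Boolean sequences. The sketch below compares the two with `a[b1, b2]`. The elementwise pairing in `a[b1, b2]` only works here because `b1` and `b2` each contain the same number of True values.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])         # rows 1 and 2
b2 = np.array([True, False, True, False])  # columns 0 and 2

# Paired selection: a[1, 0] and a[2, 2] (both masks have two True values).
print(a[b1, b2])                 # [ 4 10]

# Every combination of the selected rows and columns:
print(a[b1][:, b2])              # [[ 4  6] [ 8 10]]
print(a[np.ix_(b1, b2)])         # same 2x2 block
```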
The ix_() function
The ix_() function can be used to combine different vectors so as to obtain the result for each n-uplet. For example, to compute a+b*c for all the triplets taken from each of the vectors a, b and c:
>>> a = np.array([2, 3, 3, 5])
>>> b = np.array([8, 5, 4])
>>> c = np.array([5,4,6,8,3])
>>> ax, bx, cx = np.ix_(a, b, c)
>>> ax
array([[[2]],
[[3]],
[[3]],
[[5]]])
>>> bx
array([[[8],
[5],
[4]]])
>>> cx
array([[[5, 4, 6, 8, 3]]])
>>> ax.shape, bx.shape, cx.shape
((4, 1, 1), (1, 3, 1), (1, 1, 5))
>>> result=ax+bx*cx
>>> result
array([[[42, 34, 50, 66, 26],
[27, 22, 32, 42, 17],
[22, 18, 26, 34, 14]],
[[43, 35, 51, 67, 27],
[28, 23, 33, 43, 18],
[23, 19, 27, 35, 15]],
[[43, 35, 51, 67, 27],
[28, 23, 33, 43, 18],
[23, 19, 27, 35, 15]],
[[45, 37, 53, 69, 29],
[30, 25, 35, 45, 20],
[25, 21, 29, 37, 17]]])
>>> result[3, 2, 4]
17
>>> a[3]+b[2]*c[4]
17
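`np.ix_` only reshapes each vector so that it varies along its own axis: the shapes are (4, 1, 1), (1, 3, 1) and (1, 1, 5). The sketch below rebuilds the same open mesh by inserting the axes by hand with `None`. It then checks every triple against a brute-force loop, which is added here purely as a cross-check.

```python
import itertools

import numpy as np

a = np.array([2, 3, 3, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])

ax, bx, cx = np.ix_(a, b, c)
result = ax + bx * cx

# Same open mesh built manually: each vector gets its own axis.
manual = a[:, None, None] + b[None, :, None] * c[None, None, :]
print(np.array_equal(result, manual))   # True
print(result.shape)                     # (4, 3, 5)

# Brute-force check over every triple (i, j, k).
for (i, x), (j, y), (k, z) in itertools.product(enumerate(a), enumerate(b), enumerate(c)):
    assert result[i, j, k] == x + y * z
```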
You could also implement the reduce as follows:
>>> def ufunc_reduce(ufct, *vectors):
... vs = np.ix_(*vectors)
... r = ufct.identity
... for v in vs:
... r = ufct(r,v)
... return r
and then use it like this:
>>> ufunc_reduce(np.add,a,b,c)
array([[[15, 14, 16, 18, 13],
[12, 11, 13, 15, 10],
[11, 10, 12, 14, 9]],
[[16, 15, 17, 19, 14],
[13, 12, 14, 16, 11],
[12, 11, 13, 15, 10]],
[[16, 15, 17, 19, 14],
[13, 12, 14, 16, 11],
[12, 11, 13, 15, 10]],
[[18, 17, 19, 21, 16],
[15, 14, 16, 18, 13],
[14, 13, 15, 17, 12]]])
Compared with the normal ufunc.reduce, this version of reduce has an advantage: it uses the broadcasting rules to avoid creating an argument array whose size is the output size times the number of vectors.
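The sketch below makes the size claim concrete. It repeats the `ufunc_reduce` helper from the text and applies it with `np.multiply`, then checks the result against nested `np.multiply.outer` calls. It then shows what the plain `ufunc.reduce` route would have to build first: a stacked array holding one full-size copy of the output per input vector. The `np.broadcast_arrays`/`np.stack` construction is an illustrative assumption about how one would feed `reduce`, not the only way.

```python
import numpy as np

def ufunc_reduce(ufct, *vectors):
    """Helper from the text: combine the vectors pairwise via broadcasting."""
    vs = np.ix_(*vectors)
    r = ufct.identity
    for v in vs:
        r = ufct(r, v)
    return r

a = np.array([2, 3, 3, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])

prod = ufunc_reduce(np.multiply, a, b, c)
# The same table via nested ufunc.outer calls.
outer = np.multiply.outer(np.multiply.outer(a, b), c)
print(np.array_equal(prod, outer))     # True

# Plain ufunc.reduce needs all inputs expanded to the full output shape and
# stacked: 3 vectors x (4, 3, 5) elements.
A, B, C = np.broadcast_arrays(*np.ix_(a, b, c))
stacked = np.stack([A, B, C])
print(stacked.shape)                   # (3, 4, 3, 5)
print(np.array_equal(np.multiply.reduce(stacked), prod))   # True
```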
Linear Algebra
Simple array operations
>>> import numpy as np
>>> a = np.array([[1.0, 2.0],[3.0,4.0]])
>>> print(a)
[[1. 2.]
[3. 4.]]
>>> a.transpose()
array([[1., 3.],
[2., 4.]])
>>> np.linalg.inv(a)
array([[-2. , 1. ],
[ 1.5, -0.5]])
>>> np.linalg.inv(a) @ a
array([[1.00000000e+00, 0.00000000e+00],
[2.22044605e-16, 1.00000000e+00]])
>>> u = np.eye(2)
>>> u
array([[1., 0.],
[0., 1.]])
>>> j = np.array([[0.0, -1.0],[1.0,0.0]])
>>> j
array([[ 0., -1.],
[ 1., 0.]])
>>> j @ j
array([[-1., 0.],
[ 0., -1.]])
>>> np.trace(u)
2.0
>>> y = np.array([[5.], [7.]])
>>> y
array([[5.],
[7.]])
>>> np.linalg.solve(a,y) # solve a @ x = y for x
array([[-3.],
[ 4.]])
>>> np.linalg.eig(j)
(array([0.+1.j, 0.-1.j]), array([[0.70710678+0.j , 0.70710678-0.j ],
[0. -0.70710678j, 0. +0.70710678j]]))
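A quick way to sanity-check these results is to substitute them back into the equations they solve. The sketch below checks that `a @ x` reproduces `y`, and that each column of the eigenvector matrix satisfies `j @ v[:, k] == w[k] * v[:, k]`. The comparison with `inv(a) @ y` is added for illustration. `solve` is usually preferred because it does not form the inverse explicitly.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0], [7.0]])
j = np.array([[0.0, -1.0], [1.0, 0.0]])

x = np.linalg.solve(a, y)
print(np.allclose(a @ x, y))                    # True: x satisfies a @ x = y
print(np.allclose(x, np.linalg.inv(a) @ y))     # True, but solve avoids inv()

# eig returns eigenvalues w and eigenvectors as the columns of v;
# v * w scales column k by w[k] through broadcasting.
w, v = np.linalg.eig(j)
print(np.allclose(j @ v, v * w))                # True
```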
Tricks and Tips
“Automatic” Reshaping
To change the dimensions of an array, you can omit one of the sizes by passing -1, and NumPy deduces it automatically:
>>> a = np.arange(30)
>>> a
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
>>> a.shape = 2, -1, 3
>>> a.shape
(2, 5, 3)
>>> a
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]],
[[15, 16, 17],
[18, 19, 20],
[21, 22, 23],
[24, 25, 26],
[27, 28, 29]]])
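The same -1 placeholder works with `reshape`, which returns a new view or array instead of changing the shape in place. The sketch below shows a few valid uses. It also shows the two cases NumPy rejects with a `ValueError`: a total size that does not divide evenly, and more than one -1.

```python
import numpy as np

a = np.arange(30)
print(a.reshape(3, -1).shape)                  # (3, 10)
print(a.reshape(-1, 5).shape)                  # (6, 5)
print(a.reshape(2, 5, 3).reshape(-1).shape)    # (30,): -1 alone flattens

# Only one dimension may be -1, and the known sizes must divide the total.
errors = []
for bad in [(4, -1), (-1, -1, 3)]:
    try:
        a.reshape(bad)
    except ValueError as exc:
        errors.append(bad)
        print(bad, "->", exc)
```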
Histograms
The NumPy histogram function, applied to an array, returns a pair of vectors: the histogram of the array and a vector of the bin edges. Beware: matplotlib also has a function to build histograms (called hist, as in Matlab) that differs from the one in NumPy. The main difference is that pylab.hist plots the histogram automatically, while numpy.histogram only generates the data.
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> # Build a vector of 10000 normal deviates with variance 0.5^2 and mean 2
... mu, sigma = 2, 0.5
>>> v = np.random.normal(mu,sigma,10000)
>>> # Plot a histogram of counts with 50 bins
...
>>> plt.hist(v, bins=50)
(array([ 4., 1., 1., 3., 1., 7., 8., 14., 12.,
27., 41., 41., 57., 94., 116., 150., 209., 253.,
290., 367., 421., 511., 556., 602., 594., 637., 615.,
630., 625., 538., 487., 444., 355., 298., 240., 219.,
151., 112., 97., 50., 46., 25., 26., 7., 5.,
7., 1., 2., 1., 2.]), array([-0.07307439, 0.00663803, 0.08635045, 0.16606286, 0.24577528,
0.3254877 , 0.40520012, 0.48491254, 0.56462496, 0.64433738,
0.7240498 , 0.80376222, 0.88347463, 0.96318705, 1.04289947,
1.12261189, 1.20232431, 1.28203673, 1.36174915, 1.44146157,
1.52117399, 1.60088641, 1.68059882, 1.76031124, 1.84002366,
1.91973608, 1.9994485 , 2.07916092, 2.15887334, 2.23858576,
2.31829818, 2.39801059, 2.47772301, 2.55743543, 2.63714785,
2.71686027, 2.79657269, 2.87628511, 2.95599753, 3.03570995,
3.11542236, 3.19513478, 3.2748472 , 3.35455962, 3.43427204,
3.51398446, 3.59369688, 3.6734093 , 3.75312172, 3.83283413,
3.91254655]), <a list of 50 Patch objects>)
>>> plt.show()
>>> # Compute the histogram with numpy and then plot it
... (n, bins) = np.histogram(v, bins=50, density=True) # NumPy version (no plot)
>>> plt.plot(.5*(bins[1:]+bins[:-1]), n)
[<matplotlib.lines.Line2D object at 0xb451782c>]
>>> plt.show()
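Some properties of the data returned by `np.histogram` can be checked without plotting. With n bins there are n + 1 edges. With the default range every sample lands in some bin. With `density=True` the bar areas, not the heights, sum to 1. The sketch below uses a seeded `np.random.default_rng` generator, an assumption added so runs are reproducible. The bin centers are computed the same way as the x values passed to `plt.plot` above.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded so the run is reproducible
v = rng.normal(2, 0.5, 10000)           # mean 2, standard deviation 0.5

counts, edges = np.histogram(v, bins=50)
print(len(counts), len(edges))          # 50 51
print(counts.sum())                     # 10000: every sample is counted

density, edges = np.histogram(v, bins=50, density=True)
# With density=True, height * bin width summed over all bins is 1.
print(np.isclose(np.sum(density * np.diff(edges)), 1.0))   # True

centers = 0.5 * (edges[1:] + edges[:-1])  # x positions used for plt.plot
print(centers.shape)                      # (50,)
```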