Data mining: NumPy advanced indexing and index tricks

Broadcasting rules

Broadcasting allows universal functions to deal in a meaningful way with inputs that do not have exactly the same shape.

The first rule of broadcasting is that if the input arrays do not all have the same number of dimensions, a "1" is repeatedly prepended to the shapes of the smaller arrays until all arrays have the same number of dimensions.

The second rule of broadcasting ensures that an array with a size of 1 along a particular dimension acts as if it had the size of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along that dimension for the broadcast array.

After the broadcasting rules have been applied, the sizes of all arrays must match.
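
The two rules above can be illustrated with a small sketch. The arrays and variable names here (row, col, sum_row, sum_col) are made up for this illustration and do not come from the original examples.

```python
import numpy

a = numpy.arange(12).reshape(3, 4)           # shape (3, 4)
row = numpy.array([10, 20, 30, 40])          # shape (4,)
col = numpy.array([[100], [200], [300]])     # shape (3, 1)

# Rule 1: the shape (4,) of row is treated as (1, 4).
# Rule 2: its length-1 first axis is stretched over the 3 rows of a.
sum_row = a + row

# Rule 2 only: the length-1 second axis of col is stretched over the 4 columns of a.
sum_col = a + col

print(sum_row.shape, sum_col.shape)
print(numpy.broadcast(a, row, col).shape)    # common shape of all three arrays

# Shapes that still disagree after both rules cannot be combined.
try:
    a + numpy.ones(3)                        # (3,) becomes (1, 3), which does not match (3, 4)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```

The last part shows the third statement: once the rules have been applied, sizes that still differ raise a ValueError.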

Fancy indexing and index tricks

NumPy offers more indexing facilities than regular Python sequences. In addition to indexing by integers and slices, as we saw earlier, arrays can be indexed by arrays of integers and arrays of booleans.

Indexing with arrays of indices

The elements of the index array i are used as indices into a. The result holds the corresponding values of a and has the same shape as the index array i.

import numpy
a = numpy.arange(12)**2
print("a: ", a)
i = numpy.array([1, 1, 3, 8, 5])
print("i: ", i)
# use the elements of i as indices into a; the result has the same shape as i
print("a[i]: ", a[i])

j = numpy.array([[3, 4], [9, 7]])
print("j: ", j)
# the result has the same shape as the index array j
print("a[j]: ", a[j])

"E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
a:  [  0   1   4   9  16  25  36  49  64  81 100 121]
i:  [1 1 3 8 5]
a[i]:  [ 1  1  9 64 25]
j:  [[3 4]
 [9 7]]
a[j]:  [[ 9 16]
 [81 49]]

Process finished with exit code 0

When the indexed array a is multidimensional, a single array of indices refers to the first dimension of a. The following example shows this behavior by converting an image of labels into a color image using a palette.

import numpy
palette = numpy.array([
    [0, 0, 0],  # black
    [255, 0, 0],  # red
    [0, 255, 0],  # green
    [0, 0, 255],  # blue
    [255, 255, 255]  # white
])

image = numpy.array([
    [0, 1, 2, 0],
    [0, 3, 4, 0]
])
print(palette[image])
"E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
[[[  0   0   0]
  [255   0   0]
  [  0 255   0]
  [  0   0   0]]

 [[  0   0   0]
  [  0   0 255]
  [255 255 255]
  [  0   0   0]]]

Process finished with exit code 0

We can also give indices for more than one dimension. The index arrays for each dimension must have the same shape, or be broadcastable to a common shape (as the scalar 2 is in a[i, 2] below).

import numpy
a = numpy.arange(12).reshape(3, 4)
print("a: ", a)
i = numpy.array([
    [0, 1],
    [1, 2],
])
print("i: ", i)
j = numpy.array([
    [2, 1],
    [3, 3]
])
print("j: ", j)

print("a[i, j]: ", a[i, j])
print("a[i, 2]: ", a[i, 2])
print("a[:, j]: ", a[:, j])  # for every row of a, take the columns listed in j; the first axis matches a, the remaining axes match j



"E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
a:  [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
i:  [[0 1]
 [1 2]]
j:  [[2 1]
 [3 3]]
a[i, j]:  [[ 2  5]
 [ 7 11]]
a[i, 2]:  [[ 2  6]
 [ 6 10]]
a[:, j]:  [[[ 2  1]
  [ 3  3]]

 [[ 6  5]
  [ 7  7]]

 [[10  9]
  [11 11]]]

Process finished with exit code 0
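
As a small sketch of how index arrays of different shapes combine, the following uses two hypothetical 1-D index arrays, rows and cols (names chosen for this illustration). With equal shapes the indices are paired element by element; with broadcastable shapes every row index is combined with every column index.

```python
import numpy

a = numpy.arange(12).reshape(3, 4)
rows = numpy.array([0, 2])
cols = numpy.array([1, 3])

# Same shape: indices are paired, giving a[0, 1] and a[2, 3].
paired = a[rows, cols]

# Shapes (2, 1) and (2,) broadcast to (2, 2): each selected row with each selected column.
grid = a[rows[:, numpy.newaxis], cols]

print(paired)
print(grid)
```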

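We can also put i and j into a sequence and index with that sequence. A tuple always works. A list of index arrays was accepted by older NumPy releases, but recent releases no longer interpret it as one index per dimension.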

import numpy
a = numpy.arange(12).reshape(3, 4)
print("a: ", a)
i = numpy.array([
    [0, 1],
    [1, 2],
])
print("i: ", i)
j = numpy.array([
    [2, 1],
    [3, 3]
])
print("j: ", j)

# put i and j into a tuple and index with it
# (a list, l = [i, j], was accepted by older NumPy releases only)
l = (i, j)
print("a[l]: ", a[l])
"E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
a:  [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
i:  [[0 1]
 [1 2]]
j:  [[2 1]
 [3 3]]
a[l]:  [[ 2  5]
 [ 7 11]]

Process finished with exit code 0

However, we cannot do this by putting i and j into an array, because that array would be interpreted as indexing the first dimension of a.

import numpy
a = numpy.arange(12).reshape(3, 4)
print("a: ", a)
i = numpy.array([
    [0, 1],
    [1, 2],
])
print("i: ", i)
j = numpy.array([
    [2, 1],
    [3, 3]
])
print("j: ", j)

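# stack i and j into a single array of shape (2, 2, 2)
l = numpy.array([i, j])
print("L: ", l)
# print("a[l]: ", a[l])  # wrong: l indexes only the first axis of a, and index 3 is out of bounds there
print("a[l]: ", a[l[0], l[1]])  # pass one index array per axis instead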
"E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
a:  [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
i:  [[0 1]
 [1 2]]
j:  [[2 1]
 [3 3]]
L:  [[[0 1]
  [1 2]]

 [[2 1]
  [3 3]]]
a[l]:  [[ 2  5]
 [ 7 11]]

Process finished with exit code 0
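
A shorter way to write a[l[0], l[1]] is to convert the stacked array to a tuple. The sketch below assumes the same a, i and j as above; the names stacked, via_tuple and raised are introduced only for this illustration.

```python
import numpy

a = numpy.arange(12).reshape(3, 4)
i = numpy.array([[0, 1], [1, 2]])
j = numpy.array([[2, 1], [3, 3]])
stacked = numpy.array([i, j])          # shape (2, 2, 2)

# tuple(stacked) is (stacked[0], stacked[1]): one index array per axis of a.
via_tuple = a[tuple(stacked)]
print(via_tuple)

# Indexing with the stacked array itself addresses only the first axis of a,
# and the value 3 is out of range for an axis of length 3.
try:
    a[stacked]
    raised = False
except IndexError:
    raised = True
print(raised)
```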

Another common use of indexing with arrays is to search for the maximum value of time-dependent series.

import numpy
time = numpy.linspace(20, 145, 5)  # five evenly spaced points from 20 to 145
print("time: ", time)

data = numpy.sin(numpy.arange(20))
data.shape = 5, 4
print("data: ", data)

ind = data.argmax(axis=0)  # for each column, the row index of its maximum
print("index: ", ind)

time_max = time[ind]
print(time_max)

# data.shape[1] is the size of the second axis of data (the number of columns).
# In Python 3, xrange() was folded into range().
data_max = data[ind, range(data.shape[1])]
print("data_max: ", data_max)
print("data.max: ", data.max(axis=0))
# ndarray.max(axis=None) returns the maximum along the given axis;
# by default it returns the maximum over all elements.
# axis=0: maximum of each column
# axis=1: maximum of each row
print(all(data_max == data.max(axis=0)))
# The built-in all() returns True when every element of the iterable is truthy
# (or the iterable is empty), and False otherwise.
"E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
time:  [  20.     51.25   82.5   113.75  145.  ]
data:  [[ 0.          0.84147098  0.90929743  0.14112001]
 [-0.7568025  -0.95892427 -0.2794155   0.6569866 ]
 [ 0.98935825  0.41211849 -0.54402111 -0.99999021]
 [-0.53657292  0.42016704  0.99060736  0.65028784]
 [-0.28790332 -0.96139749 -0.75098725  0.14987721]]
index:  [2 0 3 1]
[  82.5    20.    113.75   51.25]
data_max:  [ 0.98935825  0.84147098  0.99060736  0.6569866 ]
data.max:  [ 0.98935825  0.84147098  0.99060736  0.6569866 ]
True

Process finished with exit code 0
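
The pairing of ind with the column numbers can also be written with numpy.take_along_axis. This is only an alternative sketch, assuming a NumPy release that provides that function (1.15 or later); the names by_index and by_take are introduced here.

```python
import numpy

data = numpy.sin(numpy.arange(20)).reshape(5, 4)
ind = data.argmax(axis=0)              # row index of the maximum in each column

# Fancy indexing: pair each row index with its column number.
by_index = data[ind, numpy.arange(data.shape[1])]

# take_along_axis expects an index array with as many dimensions as data,
# so ind gets a leading axis of length 1 and the result is unpacked with [0].
by_take = numpy.take_along_axis(data, ind[numpy.newaxis, :], axis=0)[0]

print(numpy.array_equal(by_index, by_take))
```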

Indexing with arrays can also be used as the target of an assignment. When the list of indices contains repetitions, the assignment is done several times, leaving behind the last value:

import numpy
a = numpy.arange(5)
print("a: ", a)
a[[1, 3, 4, 4]] = [0, 0, 0, 1]
print("a: ", a)
"E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
a:  [0 1 2 3 4]
a:  [0 0 2 0 1]

Process finished with exit code 0

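This is reasonable enough, but watch out when using Python's += construct, as it may not do what you expect. Even though 0 occurs twice in the list of indices, the element at index 0 is incremented only once, because a[idx] += 1 is evaluated as a[idx] = a[idx] + 1.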

import numpy
a = numpy.arange(5)
print("a: ", a)
a[[0, 0, 2]] += 1
print("a: ", a)
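
When every repeated index should count, one option is the unbuffered ufunc method numpy.add.at. The sketch below contrasts it with +=; the second array b is introduced only for the comparison.

```python
import numpy

a = numpy.arange(5)
a[[0, 0, 2]] += 1                 # index 0 is incremented only once
print(a)                          # [1 1 3 3 4]

b = numpy.arange(5)
numpy.add.at(b, [0, 0, 2], 1)     # unbuffered: index 0 is incremented twice
print(b)                          # [2 1 3 3 4]
```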

Indexing with boolean arrays

When we index an array with an array of integers, we provide the list of indices to pick. With boolean indices the approach is different: we explicitly choose which items in the array we want and which ones we don't.

The most natural way to use a boolean array index is to use a boolean array of the same shape as the original array.

Reading through a boolean index does not modify the original array; it returns a copy of the selected elements.

import numpy
# boolean indexing
a = numpy.arange(12).reshape(3, 4)
b = a > 4
print("b: ", b)
print("a[b]: ", a[b])  # 1-D array of the elements of a where b is True
"E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
b:  [[False False False False]
 [False  True  True  True]
 [ True  True  True  True]]
a[b]:  [ 5  6  7  8  9 10 11]

Process finished with exit code 0

This property is very useful in assignments:

import numpy
# boolean indexing
a = numpy.arange(12).reshape(3, 4)
print("a: ", a)
b = a > 4
a[b] = 0  # every element of a where b is True is set to 0
print("a: ", a)
print("a: ", a)
"E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
a:  [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
a:  [[0 1 2 3]
 [4 0 0 0]
 [0 0 0 0]]

Process finished with exit code 0
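
The difference between reading through a mask (a copy) and assigning through it (in place) can be checked directly. The sketch below also uses numpy.where as a non-mutating alternative; the names mask, picked and clipped are introduced for this illustration.

```python
import numpy

a = numpy.arange(12).reshape(3, 4)
mask = a > 4

picked = a[mask]                     # copy of the selected elements
picked[0] = -1                       # changes the copy only
print(a[1, 1])                       # still 5

clipped = numpy.where(mask, 0, a)    # new array; a itself is not modified
print(clipped)

a[mask] = 0                          # in-place assignment through the mask
print(numpy.array_equal(a, clipped))
```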

The Mandelbrot set example in the NumPy quickstart tutorial shows how boolean indexing can be used to generate an image of the Mandelbrot set.

The second way of indexing with booleans is more similar to integer indexing: for each dimension of the array we give a 1-D boolean array selecting the slices we want. The position of the boolean index determines the axis it acts on: a[b, :] selects rows, and a[:, b] selects columns.

import numpy
a = numpy.arange(12).reshape(3, 4)
print("a: ", a)
b1 = numpy.array([False, True, True])  # selection along the first axis (rows)
b2 = numpy.array([True, False, True, False])  # selection along the second axis (columns)
print("a[b1, :] ", a[b1, :])  # same as a[b1]: keeps rows 1 and 2, all columns
print("a[:, b2] ", a[:, b2])  # keeps columns 0 and 2, all rows
print("a[b1, b2] ", a[b1, b2])  # pairs the True positions, a[[1, 2], [0, 2]]; not the same as a[b1, :][:, b2]
"E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
a:  [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
a[b1, :]  [[ 4  5  6  7]
 [ 8  9 10 11]]
a[:, b2]  [[ 0  2]
 [ 4  6]
 [ 8 10]]
a[b1, b2]  [ 4 10]

Process finished with exit code 0

Note that the length of the 1-D boolean array must match the length of the dimension (or axis) you want to slice. In the previous example, b1 has length 3 (the number of rows in a), and b2 (of length 4) is suitable for indexing the second axis (the columns) of a.
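
To make the a[b1, b2] result less surprising, the following sketch converts each boolean array to the integer positions of its True entries, and uses numpy.ix_ to get the rectangular selection instead. The names rows, cols, paired and block are introduced for this illustration.

```python
import numpy

a = numpy.arange(12).reshape(3, 4)
b1 = numpy.array([False, True, True])
b2 = numpy.array([True, False, True, False])

# Each boolean array behaves like the positions of its True entries.
rows = numpy.nonzero(b1)[0]          # [1 2]
cols = numpy.nonzero(b2)[0]          # [0 2]

paired = a[b1, b2]                   # same as a[rows, cols]: a[1, 0] and a[2, 2]
block = a[numpy.ix_(b1, b2)]         # same as a[b1, :][:, b2]: rows 1, 2 by columns 0, 2

print(paired)
print(block)
```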

The ix_() function

The ix_() function can be used to combine different vectors so as to obtain the result for each n-tuple of their elements. For example, if you want to compute a+b*c for all the triplets taken from each of the vectors a, b and c:

import numpy
a = numpy.array([2, 3, 4, 5])
b = numpy.array([8, 5, 4])
c = numpy.array([5, 4, 6, 8, 3])
ax, bx, cx = numpy.ix_(a, b, c)
print("ax: ", ax)
print("bx: ", bx)
print("cx: ", cx)

print("ax.shape: ", ax.shape)
print("bx.shape: ", bx.shape)
print("cx.shape: ", cx.shape)

result = ax + bx * cx
print("ax + bx * cx: ", result)
print("result[3, 2, 4] ", result[3, 2, 4])

"E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
ax:  [[[2]]

 [[3]]

 [[4]]

 [[5]]]
bx:  [[[8]
  [5]
  [4]]]
cx:  [[[5 4 6 8 3]]]
ax.shape:  (4, 1, 1)
bx.shape:  (1, 3, 1)
cx.shape:  (1, 1, 5)
ax + bx * cx:  [[[42 34 50 66 26]
  [27 22 32 42 17]
  [22 18 26 34 14]]

 [[43 35 51 67 27]
  [28 23 33 43 18]
  [23 19 27 35 15]]

 [[44 36 52 68 28]
  [29 24 34 44 19]
  [24 20 28 36 16]]

 [[45 37 53 69 29]
  [30 25 35 45 20]
  [25 21 29 37 17]]]
result[3, 2, 4]  17

Process finished with exit code 0
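
What ix_() returns can be reproduced by hand with numpy.newaxis, which may make the shapes easier to follow. This is only an illustrative sketch; the names a3, b3, c3 and manual are not part of the original example.

```python
import numpy

a = numpy.array([2, 3, 4, 5])
b = numpy.array([8, 5, 4])
c = numpy.array([5, 4, 6, 8, 3])

ax, bx, cx = numpy.ix_(a, b, c)

# The same shapes built by hand: each vector gets its own axis.
a3 = a[:, numpy.newaxis, numpy.newaxis]    # shape (4, 1, 1)
b3 = b[numpy.newaxis, :, numpy.newaxis]    # shape (1, 3, 1)
c3 = c[numpy.newaxis, numpy.newaxis, :]    # shape (1, 1, 5)

result = ax + bx * cx                      # broadcasts to shape (4, 3, 5)
manual = a3 + b3 * c3

# Every entry is a[p] + b[q] * c[r] for one triplet (p, q, r).
print(numpy.array_equal(result, manual))
print(result[3, 2, 4] == a[3] + b[2] * c[4])
```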

You could also implement the reduce as follows:

import numpy
def ufunc_reduce(ufct, *vectors):
    vs = numpy.ix_(*vectors)
    r = ufct.identity
    for v in vs:
        r = ufct(r, v)
    return r
a = numpy.array([2, 3, 4, 5])
b = numpy.array([8, 5, 4])
c = numpy.array([5, 4, 6, 8, 3])
print(ufunc_reduce(numpy.add, a, b, c))
"E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
[[[15 14 16 18 13]
  [12 11 13 15 10]
  [11 10 12 14  9]]

 [[16 15 17 19 14]
  [13 12 14 16 11]
  [12 11 13 15 10]]

 [[17 16 18 20 15]
  [14 13 15 17 12]
  [13 12 14 16 11]]

 [[18 17 19 21 16]
  [15 14 16 18 13]
  [14 13 15 17 12]]]

Process finished with exit code 0

The advantage of this version of reduce over the normal ufunc.reduce (such as add.reduce) is that it makes use of the broadcasting rules, which avoids creating an argument array whose size is the size of the output times the number of vectors.

 
