1 Getting Started

1.1 What is NumPy?

NumPy is the fundamental package for scientific computing in Python.
At the core of the NumPy package is the ndarray object.
There are several important differences between NumPy arrays and the standard Python sequences:

NumPy arrays have a fixed size when created, unlike Python lists, which can grow dynamically. Changing the size of an ndarray will create a new array and delete the original.
The elements in a NumPy array all need to have the same data type and thus be the same size in memory. Exception: There can be arrays of (Python, including NumPy) objects, thus allowing arrays of elements of different sizes.
NumPy arrays facilitate advanced mathematical operations and other types of operations on large amounts of data. In general, such operations perform more efficiently and require less code than using Python's built-in sequences.
A growing number of scientific and mathematical Python-based packages are using NumPy arrays; although these often support Python sequence inputs, they convert these inputs to NumPy arrays before processing, and often output NumPy arrays. In other words, to effectively use today's Python-based scientific/mathematical software, it is not enough to know how to use Python's built-in sequence types, one also needs to know how to use NumPy arrays.

In NumPy, element-by-element operations are the "default mode" when an ndarray is involved. For example, if a and b are arrays of the same shape, the following multiplies their corresponding elements:

c = a * b

1.2 Why is NumPy Fast?

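Vectorization: explicit loops and indexing are absent from the Python code and run instead in optimized, pre-compiled C code.
Broadcasting: the implicit element-by-element behavior of operations, which lets arrays of different but compatible shapes be combined.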
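
The two ideas can be sketched as follows. This is a minimal illustration with made-up values, not a benchmark; the names c_loop, c_vec and d are chosen here for the example only.

```python
import numpy as np

# Two small 1-D arrays of equal length (illustrative values).
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# Explicit Python loop: multiply element by element.
c_loop = np.empty_like(a)
for i in range(len(a)):
    c_loop[i] = a[i] * b[i]

# Vectorized form: the same loop runs inside NumPy's compiled code.
c_vec = a * b

# Broadcasting: the scalar 2.0 is stretched to match the shape of a.
d = a * 2.0

print(np.array_equal(c_loop, c_vec))  # True
print(d.tolist())                     # [2.0, 4.0, 6.0, 8.0]
```

The vectorized line gives the same result as the loop with less code, and on large arrays it is usually much faster.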

2 NumPy quickstart

2.1 The Basics

NumPy's main object is the homogeneous multidimensional array.
It is a table of elements of the same type (usually numbers), indexed by a tuple of non-negative integers.
In NumPy, dimensions are called axes.

[1, 2, 1]	# one axis, with a length of 3
# The array below has 2 axes: the first axis (rows) has a length of 2, the second axis (columns) has a length of 3.
[[1., 0., 0.],
 [0., 1., 2.]]

NumPy's array class is called ndarray.
Note: numpy.array is not the same as the Standard Python Library class array.array, which only handles one-dimensional arrays and offers less functionality.
The more important attributes of an ndarray object are:

# 1 ndarray.ndim: the number of axes (dimensions) of the array.
# 2 ndarray.shape: the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension.
For a matrix with n rows and m columns, shape will be (n, m). The length of the shape tuple is therefore the number of axes, ndim.
# 3 ndarray.size: the total number of elements of the array.
This is equal to the product of the elements of shape.
# 4 ndarray.dtype: an object describing the type of the elements in the array.
One can create or specify dtypes using standard Python types.
Additionally, NumPy provides types of its own; numpy.int32, numpy.int16, and numpy.float64 are some examples.
# 5 ndarray.itemsize: the size in bytes of each element of the array.
For example, an array of elements of type float64 has itemsize 8 (=64/8),
while one of type int32 has itemsize 4 (=32/8).
It is equivalent to ndarray.dtype.itemsize.
# 6 ndarray.data: the buffer containing the actual elements of the array. Normally, we won't need to use this attribute because we will access the elements in an array using indexing facilities.

2.1.1 An Example

import numpy as np
a = np.arange(15).reshape(3, 5)
print(a)
print(a.shape)
print(a.ndim)
print(a.size)
print(a.dtype)


type(a)
# numpy.ndarray
b = np.array([3,4,5])
type(b)
# numpy.ndarray

2.1.2 Array Creation

There are several ways to create arrays:

Use the array() function to create an array from a regular Python list or tuple. The type of the resulting array is deduced from the type of the elements in the sequence.

import numpy as np
a = np.array([2, 3, 4])		# [2, 3, 4] is a Python list
a 
a.dtype

A frequent error consists in calling array with multiple arguments, rather than providing a single sequence as an argument (when passing a list, do not forget the square brackets []):

a = np.array(1, 2, 3, 4)    # WRONG
Traceback (most recent call last):
  ...
TypeError: array() takes from 1 to 2 positional arguments but 4 were given
a = np.array([1, 2, 3, 4])  # RIGHT

array() transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on.

b = np.array([(1,2,3),(4,5,6)])	# the outer container is a list [] holding two tuples
b
array([[1, 2, 3],
       [4, 5, 6]])

The type of the array can also be specified explicitly at creation time:

c = np.array([[1, 2], [3, 4]], dtype=complex)
c
array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])

Often, the elements of an array are originally unknown, but its size is known.
Hence, NumPy offers several functions to create arrays with initial placeholder content.

# zeros: creates an array full of zeros
# ones: creates an array full of ones
# empty: creates an array whose initial content is arbitrary and depends on the state of the memory
# By default, the dtype of the created array is float64, but it can be changed with the dtype keyword argument

a = np.zeros((2,5))
a
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])
b = np.ones((2,3,5), dtype=np.int16)
b
array([[[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]],

       [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]]], dtype=int16)
c = np.empty((2,4))
c
array([[0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
        0.00000000e+000],
       [0.00000000e+000, 8.61650486e-321, 1.78019082e-306,
        8.90103559e-307]])

To create sequences of numbers, NumPy provides the arange function, which is analogous to Python's built-in range but returns an array.

a = np.arange(10,30,5)	# 10 is start, 30 is stop (exclusive), 5 is the step
a
array([10, 15, 20, 25])
b = np.arange(0,2,0.6)	# stop is never reached; the step may be a float
b
array([0. , 0.6, 1.2, 1.8])

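When arange is used with floating point arguments, it is generally not possible to predict the number of elements obtained, because of the finite floating point precision.
For this reason, it is usually better to use linspace, which takes the number of elements we want as an argument instead of the step.

from numpy import pi
a = np.linspace(0,2,9)	# 9 evenly spaced numbers from 0 to 2, both endpoints included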
a
array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])
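
A typical use of linspace is to evaluate a function at many points. The sketch below assumes 100 sample points over one period of sin; the names x and f are illustrative.

```python
import numpy as np

# 100 evenly spaced points from 0 to 2*pi, both endpoints included.
x = np.linspace(0, 2 * np.pi, 100)

# sin is evaluated at every point at once, returning an array of the same shape.
f = np.sin(x)

print(x.shape)  # (100,)
print(f.shape)  # (100,)
```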

2.1.3 Printing Arrays

When you print an array, NumPy displays it in a similar way to nested lists, but with the following layout:

the last axis is printed from left to right,
the second-to-last is printed from top to bottom,
the rest are also printed from top to bottom, with each slice separated from the next by an empty line.

One-dimensional arrays are then printed as rows, two-dimensional arrays as matrices, and three-dimensional arrays as lists of matrices.

If an array is too large to be printed, NumPy automatically skips the central part of the array and only prints the corners. To disable this behaviour and force NumPy to print the entire array, change the printing options with np.set_printoptions:

import sys	# needed for sys.maxsize
# print every element
np.set_printoptions(threshold=sys.maxsize)
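
If the full output is only wanted temporarily, the np.printoptions context manager can be used instead of changing the global setting. A small sketch, assuming the default summarization threshold of 1000 elements is in effect:

```python
import sys
import numpy as np

big = np.arange(2000)

# With the default settings, arrays above the threshold are summarized.
short_text = str(big)

# Inside the block every element is printed; the old options are restored afterwards.
with np.printoptions(threshold=sys.maxsize):
    full_text = str(big)

print("..." in short_text)  # True
print("..." in full_text)   # False
```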

2.1.4 Basic Operations

Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.

a = np.array([20, 30, 40, 50])
b = np.arange(4)
b
c = a - b

Unlike in many matrix languages, the product operator * operates elementwise on NumPy arrays.
The matrix product can be performed using the @ operator (in Python >= 3.5) or the dot function or method:

a = np.array([[1,1],[0,1]])
b = np.array([[2,0],[3,4]])

print(a*b)	# elementwise product
print(a@b)	# matrix product
print(a.dot(b))	# another way to write the matrix product
# Output:
[[2 0]
 [0 4]]
[[5 4]
 [3 4]]
[[5 4]
 [3 4]]

Some operations, such as += and *=, act in place to modify an existing array rather than create a new one.

rg = np.random.default_rng(1)	# create a random number generator; the seed can be changed
a = np.ones((2, 3), dtype=int)
print(a)
[[1 1 1]
 [1 1 1]]
b = rg.random((2,3))
print(b)
[[0.51182162 0.9504637  0.14415961]
 [0.94864945 0.31183145 0.42332645]]
b += a	# modifies b in place; the integer values of a are converted to float
print(b)
[[1.51182162 1.9504637  1.14415961]
 [1.94864945 1.31183145 1.42332645]]
# a += b fails, because the float result cannot be automatically cast back to the integer dtype of a
a += b
# UFuncTypeError: Cannot cast ufunc 'add' output from dtype('float64') to dtype('int32') with casting rule 'same_kind'

When operating with arrays of different types, the type of the resulting array corresponds to a more general or precise one (a behavior called upcasting)

a = np.ones(3, dtype=np.int32)	# 'int32'
b = np.linspace(0, pi, 3)
b.dtype.name	# 'float64'
c = a + b
c.dtype.name	# 'float64'

Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the ndarray class.

a = rg.random((2,3))
print(a)
print(a.sum())
print(a.max())
print(a.min())
# Output:
[[0.32973172 0.7884287  0.30319483]
 [0.45349789 0.1340417  0.40311299]]
2.412007822394087
0.7884287034284043
0.13404169724716475

By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along the specified axis of an array:
axis=0: operate down the rows, giving one result per column
axis=1: operate across the columns, giving one result per row

# specifying the axis parameter
b = np.arange(12).reshape(3,4)
print(b)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
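print(b.sum(axis=0))	# sum of each column
[12 15 18 21]
print(b.min(axis=1))	# min of each row
[0 4 8]
print(b.cumsum(axis=1))	# cumulative sum along each row; the result has the same shape as b
[[ 0  1  3  6]
 [ 4  9 15 22]
 [ 8 17 27 38]]
print(b.cumsum(axis=0))	# cumulative sum along each column; the result has the same shape as b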
[[ 0  1  2  3]
 [ 4  6  8 10]
 [12 15 18 21]]

2.1.5 Universal Functions

NumPy provides familiar mathematical functions such as sin, cos, and exp.
In NumPy, these are called "universal functions" (ufunc).
Within NumPy, these functions operate elementwise on an array, producing an array as output.

b = np.arange(3)
print(np.exp(b))
[1.         2.71828183 7.3890561 ]
print(np.sqrt(b))
[0.         1.         1.41421356]

2.1.6 Indexing, Slicing and Iterating

One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.

a = np.arange(10)**3	# ** is the power operator
print(a)
[  0   1   8  27  64 125 216 343 512 729]
print(a[2])		# indices start at 0
8
print(a[2:5])	# slicing
[ 8 27 64]
a[:6:2] = 1000	# from the start up to (not including) index 6, set every 2nd element to 1000
print(a)
[1000    1 1000   27 1000  125  216  343  512  729]
print(a[::-1])	# a reversed; a itself is unchanged unless the result is assigned back
[ 729  512  343  216  125 1000   27 1000    1 1000]

Multidimensional arrays can have one index per axis. These indices are given as a comma-separated tuple:

def f(x,y):
    return 10*x+y

# fromfunction(): builds an array by calling f on the coordinates of each element
b = np.fromfunction(f, (5, 4), dtype=int)	# (5, 4) is the shape; x runs over 0-4 and y over 0-3
b
array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33],
       [40, 41, 42, 43]])
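print(b[2,3])
print(b[0:5, 1])	# rows 0 to 4, second column
print(b[:, 1])	# all rows, second column
print(b[1:3, :])	# rows 1 and 2, all columns

When fewer indices are provided than the number of axes, the missing indices are considered complete slices:
b[-1] is equivalent to b[-1, :] (the last row, all columns).

The expression within brackets in b[i] is treated as an i followed by as many instances of : as needed to represent the remaining axes. NumPy also allows you to write this using dots, as b[i, ...].

The dots (...) represent as many colons as needed to produce a complete indexing tuple.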

# if x is an array with 5 axes
x[1, 2, ...] is equivalent to x[1, 2, :, :, :],
x[..., 3] to x[:, :, :, :, 3] and
x[4, ..., 5, :] to x[4, :, :, 5, :]
# Example
c = np.array([[[  0,  1,  2],  # a 3D array (two stacked 2D arrays)
               [ 10, 12, 13]],
              [[100, 101, 102],
               [110, 112, 113]]])
print(c)
[[[  0   1   2]
  [ 10  12  13]]
 [[100 101 102]
  [110 112 113]]]
print(c.shape)	# (2, 2, 3)
print(c[1,...])	# same as c[1, :, :] or c[1] (the second block)
[[100 101 102]
 [110 112 113]]
print(c[...,2])	# same as c[:, :, 2] (the third column of each block)
[[  2  13]
 [102 113]]

Iterating over multidimensional arrays is done with respect to the first axis:

def f(x,y):
    return 10*x+y
b = np.fromfunction(f, (5, 4), dtype=int)	# (5, 4) is the shape; x runs over 0-4 and y over 0-3
b
array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33],
       [40, 41, 42, 43]])
for row in b:
    print(row)	# one row per iteration
# Output:
[0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]

However, if one wants to perform an operation on each element in the array, one can use the flat attribute, which is an iterator over all the elements of the array:

for element in b.flat:
    print(element)
# Output:
0
1
2
3
10
11
12
13
20
21
22
23
30
31
32
33
40
41
42
43

2.2 Shape Manipulation

2.2.1 Changing the shape of an array

The shape of the array is given by the number of elements along each axis:

rg = np.random.default_rng(1)
a = np.floor(10*rg.random((3,4)))	# floor rounds down
a
array([[5., 9., 1., 9.],
       [3., 4., 8., 4.],
       [5., 0., 7., 5.]])
a.shape
(3, 4)

The shape of the array can be changed by various commands. Note that the following three commands all return a modified array, but do not alter the original array:

print(a.ravel(), a.ravel().shape)	# returns the array, flattened

print(a.reshape(6, 2))	# 6 rows, 2 columns; the total number of elements must stay the same
[[5. 9.]
 [1. 9.]
 [3. 4.]
 [8. 4.]
 [5. 0.]
 [7. 5.]]
print(a.T, a.T.shape)	# transpose, with shape (4, 3)
[[5. 3. 5.]
 [9. 4. 0.]
 [1. 8. 7.]
 [9. 4. 5.]] 
 (4, 3)

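The reshape function returns its argument with a modified shape, whereas the ndarray.resize method modifies the array itself:

a.resize((2,6))
a

If a dimension is given as -1 in a reshaping operation, the size of that dimension is calculated automatically:

a.reshape(3, -1)
# 3 rows; the number of columns is computed automatically as 12/3 = 4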

2.2.2 Stacking together different arrays

Several arrays can be stacked together along different axes:

vstack(): stacks arrays vertically (row-wise), so the number of rows increases
hstack(): stacks arrays horizontally (column-wise), so the number of columns increases

a = np.floor(10 * rg.random((2, 2)))
b = np.floor(10 * rg.random((2, 2)))
print(a)
[[5. 9.]
 [1. 9.]]
print(b)
[[3. 4.]
 [8. 4.]]
c = np.vstack((a, b))	# vstack places the rows of b below those of a
print(c)
[[5. 9.]
 [1. 9.]
 [3. 4.]
 [8. 4.]]
d = np.hstack((a, b))	# hstack places the columns of b to the right of those of a
print(d)
[[5. 9. 3. 4.]
 [1. 9. 8. 4.]]

column_stack(): stacks 1D arrays as columns into a 2D array, which differs from hstack(); for 2D arrays it is equivalent to hstack()
row_stack(): stacks arrays as rows

The function column_stack stacks 1D arrays as columns into a 2D array:

# for 2D arrays, column_stack stacks the columns
e = np.column_stack((a, b))
[[5. 9. 3. 4.]
 [1. 9. 8. 4.]]

a = np.array([4., 2.])
b = np.array([3., 8.])
c = np.column_stack((a, b))	# for 1D arrays, each input becomes a column of the returned 2D array
[[4. 3.]
 [2. 8.]]
d = np.hstack((a, b))	# a and b are 1D, so hstack joins them end to end and d is also 1D
print(d)
[4. 2. 3. 8.]

Using newaxis:

from numpy import newaxis

a = np.array([4., 2.])
b = np.array([3., 8.])
a[:, newaxis]	# view a as a 2D column vector (a itself stays 1D)
array([[4.],
       [2.]])	# note the two levels of brackets
c = np.column_stack((a[:, newaxis], b[:, newaxis]))	# stack as columns
array([[4., 3.],
       [2., 8.]])
d = np.hstack((a[:, newaxis], b[:, newaxis]))	# stack horizontally
d
array([[4., 3.],
       [2., 8.]])
# for 2D column vectors, both approaches give the same result

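On the other hand, the function row_stack is equivalent to vstack for any input arrays.
In fact, row_stack is an alias for vstack (in NumPy versions before 2.0; the row_stack name is deprecated as of NumPy 2.0 in favor of vstack):

np.column_stack is np.hstack
False	# they are different functions
np.row_stack is np.vstack
True	# they are the same function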

In general, for arrays with more than two dimensions:
hstack stacks along their second axes (horizontally),
vstack stacks along their first axes (vertically),
concatenate allows an optional argument giving the number of the axis along which the concatenation should happen.
Note: In complex cases, r_ and c_ are useful for creating arrays by stacking numbers along one axis. They allow the use of range literals (:):

a = np.r_[1:4, 0, 4]
a
array([1, 2, 3, 0, 4])

# hstack, vstack, column_stack, concatenate, c_ and r_ are closely related

When used with arrays as arguments, r_ and c_ are similar to vstack and hstack in their default behavior, but allow for an optional argument giving the number of the axis along which to concatenate.
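
A brief sketch of that default behavior, using two small illustrative 2D arrays (a2 and b2 are names chosen for this example):

```python
import numpy as np

a2 = np.array([[1, 2], [3, 4]])
b2 = np.array([[5, 6], [7, 8]])

# For 2D inputs, r_ joins along the first axis and c_ along the last axis.
print(np.array_equal(np.r_[a2, b2], np.vstack((a2, b2))))  # True
print(np.array_equal(np.c_[a2, b2], np.hstack((a2, b2))))  # True

# With 1D inputs, c_ turns each input into a column.
cols = np.c_[np.array([1, 2, 3]), np.array([4, 5, 6])]
print(cols.shape)  # (3, 2)
```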

2.2.3 Splitting one array into several smaller ones

Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return (method 1), or by specifying the columns after which the division should occur (method 2):

a = np.floor(10 * rg.random((2, 12)))
print(a)
[[5. 9. 1. 9. 3. 4. 8. 4. 5. 0. 7. 5.]
 [3. 7. 3. 4. 1. 4. 2. 2. 7. 2. 4. 9.]]
b = np.hsplit(a,3)	# split column-wise into 3 equal parts
print(b)
[array([[5., 9., 1., 9.],[3., 7., 3., 4.]]), 
array([[3., 4., 8., 4.],[1., 4., 2., 2.]]), 
array([[5., 0., 7., 5.],[7., 2., 4., 9.]])]
c = np.hsplit(a, (3, 4))	# split after the third and the fourth column, giving 3 parts: columns 0-2, column 3, and columns 4-11
print(c)
[array([[5., 9., 1.],[3., 7., 3.]]), 
array([[9.],[4.]]), 
array([[3., 4., 8., 4., 5., 0., 7., 5.],[1., 4., 2., 2., 7., 2., 4., 9.]])]

vsplit splits along the vertical axis.
array_split allows one to specify along which axis to split.
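
A small sketch of both functions on an illustrative 4 x 6 array. Unlike hsplit and vsplit, array_split also accepts a number of parts that does not divide the axis evenly:

```python
import numpy as np

a = np.arange(24).reshape(4, 6)

# vsplit: two blocks of 2 rows each.
top, bottom = np.vsplit(a, 2)
print(top.shape, bottom.shape)   # (2, 6) (2, 6)

# array_split along axis=1: 6 columns into 4 parts of sizes 2, 2, 1, 1.
parts = np.array_split(a, 4, axis=1)
print([p.shape for p in parts])  # [(4, 2), (4, 2), (4, 1), (4, 1)]
```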

2.3 Copies and Views

When operating on and manipulating arrays, their data is sometimes copied into a new array and sometimes not. There are three cases:

2.3.1 No Copy at All

Simple assignments make no copy of objects or their data.

a = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])
b = a	# no new object is created
b is a	# a and b are two names for the same ndarray object
True

2.3.2 View or Shallow Copy

Different array objects can share the same data. The view method creates a new array object that looks at the same data.

c = a.view()
c is a	# c and a are different objects
False
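
That the data is shared can be checked directly. In this sketch the value 1234 and the slice s are illustrative; note that slicing an array also returns a view:

```python
import numpy as np

a = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])
c = a.view()

print(c.base is a)        # True: c is a view of the data owned by a
print(c.flags.owndata)    # False

c[0, 3] = 1234            # writing through the view changes a's data
print(a[0, 3])            # 1234

s = a[:, 1:3]             # slicing returns a view as well
s[:] = 10                 # s[:] writes into the shared data
print(a[:, 1:3].tolist()) # [[10, 10], [10, 10], [10, 10]]
```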

2.3.3 Deep Copy

The copy method makes a complete copy of the array and its data.

d = a.copy()	# a new array object with new data is created
d is a	# d and a do not share any data
False
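
A short sketch of the consequences. The values and the names big and small are illustrative; copying a small slice is a common way to let a large intermediate array be released:

```python
import numpy as np

a = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])
d = a.copy()

print(d.base is a)   # False: d owns its own data
d[0, 0] = 9999
print(a[0, 0])       # 0: a is unchanged

# Keep only a small part of a large array.
big = np.arange(int(1e6))
small = big[:100].copy()
del big              # the large buffer can now be freed; small is unaffected
print(small[-1])     # 99
```

Had small been taken as big[:100] without copy(), it would be a view and would keep the whole buffer of big alive even after del big.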

2.3.4 Functions and Methods Overview

Reference link: Functions and Methods Overview

2.4 Less Basic

2.4.1 Broadcasting rules

Broadcasting allows universal functions to deal in a meaningful way with inputs that do not have exactly the same shape.

The first rule: if all input arrays do not have the same number of dimensions, a "1" is repeatedly prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions.
The second rule: arrays with a size of 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along that dimension for the "broadcast" array.

After application of the broadcasting rules, the sizes of all arrays must match.
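
The rules can be traced on a few small arrays. The shapes and values below are chosen only for illustration:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)           # shape (3, 4)
row = np.array([10, 20, 30, 40])          # shape (4,), treated as (1, 4) by rule one
col = np.array([[100], [200], [300]])     # shape (3, 1)

print((a + row).shape)    # (3, 4): row is repeated along the first axis
print((a + col).shape)    # (3, 4): col is repeated along the second axis
print((row + col).shape)  # (3, 4): both operands are stretched

# Shapes (3, 4) and (3,) are incompatible: (3,) becomes (1, 3), and 3 != 4.
try:
    a + np.array([1, 2, 3])
except ValueError as err:
    print("broadcast failed:", err)
```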

2.5 Advanced indexing and index tricks

NumPy provides more indexing functionality than regular Python sequences.
In addition to indexing by integers and slices, as we saw before, arrays can also be indexed by arrays of integers and booleans.

2.5.1 Indexing with Arrays of Indices

a = np.arange(12)**2
i = np.array([1, 1, 3, 8, 5]) 	# an array of indices
a[i]	# the elements of a at the positions i
array([ 1,  1,  9, 64, 25], dtype=int32)
j = np.array([[3, 4], [9, 7]])	# a two-dimensional array of indices
a[j]	# the result has the same shape as j
array([[ 9, 16],
       [81, 49]], dtype=int32)

When the indexed array a is multidimensional, a single array of indices refers to the first dimension of a.
The following example shows this behavior by converting an image of labels into a color image using a palette.

# color palette
palette = np.array([[0, 0, 0],         # black
                    [255, 0, 0],       # red
                    [0, 255, 0],       # green
                    [0, 0, 255],       # blue
                    [255, 255, 255]]) 	# white
image = np.array([[0, 1, 2, 0], 
                  [0, 3, 4, 0]])	# a (2, 4) array of indices; each number refers to a color in the palette
palette[image]	# the result has the shape of image plus the 3 color channels of each palette entry: (2, 4, 3)
array([[[  0,   0,   0],
        [255,   0,   0],
        [  0, 255,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0, 255],
        [255, 255, 255],
        [  0,   0,   0]]])

We can also give indexes for more than one dimension. The arrays of indices for each dimension must have the same shape (or be broadcastable to a common shape, as in a[i, 2]).

a = np.arange(12).reshape(3, 4)
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
i = np.array([[0, 1], [1, 2]]) # indices for the first dim of `a`
j = np.array([[2, 1], [3, 3]]) # indices for the second dim of `a`
a[i,j]	# i and j must have the same shape
array([[ 2,  5],
       [ 7, 11]])
a[i, 2]
array([[ 2,  6],
       [ 6, 10]])
a[:, j]
array([[[ 2,  1],
        [ 3,  3]],

       [[ 6,  5],
        [ 7,  7]],

       [[10,  9],
        [11, 11]]])

In Python, arr[i, j] is exactly the same as arr[(i, j)], so we can put i and j in a tuple and then do the indexing with that. However, we cannot do this by putting i and j into an array, because this array would be interpreted as indexing the first dimension of a.

l = (i, j)	# build a tuple
a[l]	# equivalent to a[i, j]

Another common use of indexing with arrays is the search for the maximum value of time-dependent series.

argmax(a, axis=None, out=None)
Usage: returns the index of the maximum value.
Both one-dimensional and two-dimensional arrays are accepted.
For two-dimensional arrays: axis=0 searches for the maximum down each column | axis=1 searches for the maximum along each row

time = np.linspace(20, 145, 5) # time scale
data = np.sin(np.arange(20)).reshape(5, 4)	# the data: 5 rows, 4 columns (4 time-dependent series)
data
ind = data.argmax(axis=0)	# index of the maximum of each column (that is, a row number)
ind
time_max = time[ind]	# times corresponding to the maxima
data_max = data[ind, range(data.shape[1])] 	# the maxima themselves, same as data.max(axis=0)

You can also use an array as the index of the target to assign data:

a = np.arange(5)
a
array([0, 1, 2, 3, 4])
a[[1, 3, 4]] = 0
a
array([0, 0, 2, 0, 0])

However, when the index list contains duplicates, multiple assignments are performed and only the last value is kept:

a = np.arange(5)
a[[0, 0, 2]] = [1, 2, 3]
a
array([2, 1, 3, 3, 4])
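
The same effect can be surprising with in-place operators such as +=. The sketch below also shows np.add.at, an unbuffered alternative that does accumulate repeated indices; the array b is introduced here only for comparison:

```python
import numpy as np

a = np.arange(5)
a[[0, 0, 2]] += 1
print(a.tolist())   # [1, 1, 3, 3, 4]: index 0 is incremented once, not twice

b = np.arange(5)
np.add.at(b, [0, 0, 2], 1)   # each occurrence of an index is applied
print(b.tolist())   # [2, 1, 3, 3, 4]
```

The += form evaluates a[[0, 0, 2]] + 1 first and then assigns the result, so the repeated index simply receives the same value twice.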

2.5.2 Indexing with Boolean Arrays

When we index arrays with arrays of (integer) indices, we provide the list of indices to pick.
With boolean indices the approach is different; we explicitly choose which items in the array we want and which ones we don't.
The first method: use a boolean array that has the same shape as the original array.

a = np.arange(12).reshape(3, 4)
b = a > 4	# b is a boolean array with the shape of a
b
array([[False, False, False, False],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]])

a[b]	# 1D array with the elements where b is True
array([ 5,  6,  7,  8,  9, 10, 11])
a[b] = 0	# all elements of a greater than 4 become 0
a
array([[0, 1, 2, 3],
       [4, 0, 0, 0],
       [0, 0, 0, 0]])

The second method is more similar to integer indexing: for each dimension of the array we give a 1D boolean array selecting the slices we want.
Note that the length of the 1D boolean array must coincide with the length of the dimension (or axis) you want to slice.

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])	# b1 has length 3 (the number of rows in a)
b2 = np.array([True, False, True, False])	# b2 has length 4 (the number of columns in a)

a[b1, :]	# selecting rows
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
a[b1]	# same thing
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
a[:, b2]	# selecting columns
array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])

2.5.3 The ix_() function [not very useful]

The ix_ function can be used to combine different vectors so as to obtain the result for each n-uplet. For example, if you want to compute all the a+b*c for all the triplets taken from each of the vectors a, b and c
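
A sketch of that computation with three short vectors whose values are chosen for illustration. ix_ reshapes each vector so that broadcasting produces every combination:

```python
import numpy as np

a = np.array([2, 3, 4, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])

# Each output keeps its vector along a different axis.
ax, bx, cx = np.ix_(a, b, c)
print(ax.shape, bx.shape, cx.shape)  # (4, 1, 1) (1, 3, 1) (1, 1, 5)

# Broadcasting evaluates a + b*c for all 4 * 3 * 5 triplets at once.
result = ax + bx * cx
print(result.shape)                  # (4, 3, 5)
print(result[3, 2, 4])               # 17
print(a[3] + b[2] * c[4])            # 17
```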

2.5.4 Indexing with strings [Omitted]

2.6 Tricks and Tips

2.6.1 "Automatic" Reshaping

To change the dimensions of an array, you can omit one of the sizes, which will then be deduced automatically:

a = np.arange(30)
b = a.reshape((2, -1, 3)) 
b.shape
(2, 5, 3)
b
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11],
        [12, 13, 14]],

       [[15, 16, 17],
        [18, 19, 20],
        [21, 22, 23],
        [24, 25, 26],
        [27, 28, 29]]])

2.6.2 Vector Stacking

How do we construct a 2D array from a list of equally sized row vectors? In MATLAB this is quite easy: if x and y are two vectors of the same length you only need to do m=[x;y].
In NumPy this works via the functions column_stack, dstack, hstack and vstack, depending on the dimension in which the stacking is to be done.

x = np.arange(0, 10, 2)
y = np.arange(5)
m = np.vstack([x, y])	# stack as rows
m
array([[0, 2, 4, 6, 8],
       [0, 1, 2, 3, 4]])
xy = np.hstack([x, y])	# join end to end into one 1D array
xy
array([0, 2, 4, 6, 8, 0, 1, 2, 3, 4])

2.6.3 Histograms

The NumPy histogram function applied to an array returns a pair of vectors: the histogram of the array and a vector of the bin edges.
Note: matplotlib also has a function to build histograms (called hist, as in Matlab) that differs from the one in NumPy.
The main difference is that pylab.hist plots the histogram automatically, while numpy.histogram only generates the data.

import numpy as np
rg = np.random.default_rng(1)
import matplotlib.pyplot as plt
# Build a vector of 10000 normal deviates with variance 0.5^2 and mean 2
mu, sigma = 2, 0.5
v = rg.normal(mu, sigma, 10000)
# Plot a normalized histogram with 50 bins
plt.hist(v, bins=50, density=True)       # matplotlib version (plot)
(array...)
# Compute the histogram with numpy and then plot it
(n, bins) = np.histogram(v, bins=50, density=True)  # NumPy version (no plot)
plt.plot(.5 * (bins[1:] + bins[:-1]), n) # bins[1:] drops the first edge and bins[:-1] drops the last, so their average is the center of each bin