Python: introduction to NumPy and its usage

1. Overview of numpy

NumPy is the fundamental package for scientific computing in Python. It provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and functions and APIs for fast operations on arrays. These cover mathematics, logic, shape manipulation, sorting, selection, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation, and more.
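Here is a minimal sketch of a few of the operation families listed above: sorting, shape transformation, logical selection and basic statistics. The array values are arbitrary.

```python
import numpy as np

a = np.array([3, 1, 4, 1, 5, 9, 2, 6])
print(np.sort(a))          # sorting: [1 1 2 3 4 5 6 9]
print(a.reshape(2, 4))     # shape transformation: 2 rows, 4 columns
print(a[a > 3])            # logical selection: [4 5 9 6]
print(a.mean(), a.max())   # basic statistics: 3.875 9
```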

NumPy is a member of the SciPy family, an open-source Python ecosystem for mathematics, science and engineering. The core members of the SciPy family are Matplotlib, SciPy and NumPy, which can be remembered by the initials MSN.

The core of NumPy is the multidimensional array class numpy.ndarray. The matrix class numpy.matrix is a subclass of it. With the multidimensional array as its central data structure, NumPy provides numerous mathematical, scientific and engineering functions. The organizational structure of NumPy is shown in the figure below.

In the figure, the two submodules shown in red are:

numpy.core: the heart of NumPy, containing ndarray, ufuncs, dtypes, etc.

numpy.lib: provides many of NumPy's functions.

Both are internal submodules. Everything they export is available directly in the numpy namespace, so there is no need to use the submodule name. (In NumPy 2.0, numpy.core was renamed to the private numpy._core.)

In addition to core and lib, other commonly used submodules of NumPy are:

numpy.random: random sampling submodule

numpy.ma: Masked array submodule for handling arrays containing invalid or missing data

numpy.linalg: linear algebra submodule

numpy.fft: discrete Fourier transform submodule

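numpy.math: an alias of Python's standard math module (mathematical functions defined by the C standard). It was removed in NumPy 2.0; use the math module directly instead.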

numpy.emath: Math functions submodule with automatic domain

numpy.rec: record array submodule. Each array element combines several fields of different types, similar to a C struct.

numpy.matrixlib: matrix classes and function submodules

numpy.ctypeslib: ctypes external function interface submodule

numpy.polynomial: polynomial submodule

numpy.char: vectorized string manipulation submodule

numpy.testing: testing support submodule
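Here is a minimal sketch that calls one representative function from several of these submodules. The functions are standard NumPy APIs:

- np.random.default_rng, which assumes NumPy 1.17 or later
- np.ma.masked_array
- np.linalg.solve
- np.fft.fft
- np.emath.sqrt
- np.polynomial.Polynomial
- np.char.upper

The inputs are arbitrary.

The sketch also checks that numpy.matrix is a subclass of ndarray, as stated in the overview. The NumPy documentation no longer recommends numpy.matrix, even for linear algebra. It suggests plain arrays with the @ operator instead.

```python
import numpy as np

rng = np.random.default_rng(seed=0)                 # numpy.random: reproducible generator
samples = rng.random(3)                             # three floats in [0, 1)

data = np.ma.masked_array([1.0, 2.0, -999.0],       # numpy.ma: -999.0 marks a missing value
                          mask=[False, False, True])
masked_mean = data.mean()                           # the masked entry is ignored -> 1.5

A = np.array([[2.0, 0.0], [0.0, 4.0]])
x = np.linalg.solve(A, np.array([2.0, 8.0]))        # numpy.linalg: solves A @ x = b -> [1., 2.]

spectrum = np.fft.fft(np.array([1.0, 0.0, 0.0, 0.0]))  # numpy.fft: an impulse has a flat spectrum

root = np.emath.sqrt(-4)                            # numpy.emath: complex result 2j instead of nan

p = np.polynomial.Polynomial([1, 2, 3])             # numpy.polynomial: 1 + 2x + 3x^2
value = p(2)                                        # 1 + 4 + 12 = 17.0

upper = np.char.upper(np.array(['ab', 'cd']))       # numpy.char: element-wise upper-casing

is_subclass = issubclass(np.matrix, np.ndarray)     # numpy.matrixlib: matrix derives from ndarray

print(samples, masked_mean, x, spectrum, root, value, upper, is_subclass)
```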

2. Commonly used configurations of numpy

2.1 Configuring numpy's display format

NumPy keeps a set of print options that control how arrays are displayed. These include:

- the number of digits of precision
- how the sign of positive numbers is shown
- the strings used for NaN and infinity
- whether scientific notation is suppressed

Use get_printoptions() to view the current values as a dictionary:

import numpy as np

print(np.get_printoptions())

# Output (keys and default values may differ slightly between NumPy versions):
#{'edgeitems': 3, 'threshold': 1000, 'floatmode': 'maxprec', 'precision': 8, 'suppress': False, 'linewidth': 75, 'nanstr': 'nan', 'infstr': 'inf', 'sign': '-', 'formatter': None, 'legacy': False}

"edgeitems": Defines the number of elements displayed at both ends of the array when printing the array. The default value is 3, which means that only the first 3 elements and the last 3 elements of the array will be displayed when printing the array, and the middle elements will be omitted.

"threshold": Defines the number of elements to be truncated during array printing. The default value is 1000. When the number of elements in the array exceeds this threshold, the middle elements will be omitted when printing, and ellipses will be displayed to indicate truncation.

"floatmode": Defines the printing mode of floating point numbers. The value of this key is "maxprec". This means that floating point numbers will be printed in maximum precision mode. Maximum precision mode ( "maxprec") is a printing option in NumPy that determines the number of digits printed based on the actual precision of the floating point number. In other words, it prints all significant digits of a floating point number instead of truncating or rounding according to a fixed precision value.

"precision": Defines the precision (number of digits after the decimal point) of printed floating point numbers. The default value is 8, which means that 8 decimal places will be displayed when printing floating point numbers.

"suppress": Controls whether to suppress the scientific notation display of decimals. The default is False, which means scientific notation is not suppressed, i.e. larger or smaller numbers will be printed in scientific notation.

"linewidth": Defines the character limit for printing each line. The default value is 75, which means that when printing an array, the number of characters per line is limited to 75. When a line of characters exceeds this limit, it will be displayed in a new line.

"nanstr": defines a string representing NaN (Not a Number). The default is "nan", which means that when printing an array, the string "nan" will be used to represent NaN values.

"infstr": defines a string representing positive infinity. The default is "inf", which means that when printing an array, the string "inf" will be used to represent positive infinity values. "sign": Controls how positive numbers are displayed. The default is "-", which means that when printing an array, a minus sign will be displayed in front of positive numbers.

"formatter": Define a function or object to customize the printing format of the array. In the provided dictionary, the value of this key is None, indicating that there is no custom printing format.

"legacy": Controls whether legacy printing options are used. In the provided dictionary, this key has a value of False, indicating that the legacy printing option is not used.

2.2 Controlling warning messages

In plain Python, dividing by zero raises a ZeroDivisionError, and the program stops unless the exception is handled. NumPy instead follows IEEE 754 floating-point rules:

- dividing a nonzero number by 0 gives inf (infinity)
- 0/0 gives nan (an invalid value)

Neither case interrupts the program; NumPy only issues a RuntimeWarning. To suppress these warnings, use the seterr function.

a = np.array([2,4,6])
b = np.array([1,0,3])
print(a/b)

[ 2. inf  2.]
D:\PyCharm\pycharmProjects\XPath\Numpy_1.py:6: RuntimeWarning: divide by zero encountered in divide
  print(a/b)

To suppress the warnings:

np.seterr(divide='ignore', invalid='ignore')  # ignore divide-by-zero (x/0 -> inf) and invalid-value (0/0 -> nan) warnings

a = np.array([2,4,6])
b = np.array([1,0,3])
print(a/b)
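np.seterr changes the setting globally for the rest of the program. For a local change, NumPy also provides the np.errstate context manager. It accepts the same keywords and values ('ignore', 'warn', 'raise', and others) and restores the previous settings on exit.

The sketch below first ignores both warnings inside one block. It then turns the invalid-value case (0/0) into a FloatingPointError. The arrays are illustrative.

```python
import numpy as np

a = np.array([2.0, 0.0, 6.0])
b = np.array([1.0, 0.0, 3.0])   # 0.0 / 0.0 is an invalid operation -> nan

# Warnings are silenced only inside this block.
with np.errstate(divide='ignore', invalid='ignore'):
    result = a / b
print(result)  # values: 2.0, nan, 2.0

# 'raise' turns the floating-point condition into an exception.
raised = False
try:
    with np.errstate(invalid='raise'):
        a / b
except FloatingPointError as exc:
    raised = True
    print("caught:", exc)
```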

2.3 Data types supported by numpy

a = np.array([2,4,6])
print(a.dtype)

b = np.array([1,0,3],dtype='int16')
print(b.dtype)

c = np.array([1+1j,2-7j])
print(c.dtype)

d = np.array([1.1,1.2,1.3])
print(d.dtype)

e = np.array([1,0,1],dtype='bool')
print(e.dtype)

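# Output:
# int32   (the default integer type is platform dependent: int32 on Windows with NumPy 1.x, int64 on Linux/macOS and with NumPy 2.x on 64-bit Windows)
# int16
# complex128
# float64
# bool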
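Beyond reading dtype, a few related tools are often needed:

- astype() converts an array to another type and returns a new array.
- dtype.itemsize gives the number of bytes per element.
- np.iinfo and np.finfo report the limits of integer and floating-point types.

Here is a short sketch with arbitrary values:

```python
import numpy as np

b = np.array([1, 0, 3], dtype='int16')
print(b.dtype.itemsize)          # 2 bytes per element
f = b.astype(np.float32)         # astype returns a converted copy
print(f.dtype)                   # float32
print(np.iinfo(np.int16).max)    # 32767, the largest int16 value
print(np.finfo(np.float64).eps)  # machine epsilon for float64
c = np.array([1.7, -1.7]).astype(np.int32)
print(c)                         # [ 1 -1]: float-to-int conversion truncates toward zero
```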

2.4 numpy array attributes

dtype, used above to check an array's data type, is one of the attributes of a NumPy array object. Arrays have other attributes as well; the two most commonly used are dtype and shape. Any attribute value can be read by appending .attribute_name to the array (for example a.shape). The attributes are not explained in detail here.
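Here is a sketch printing the most common attributes of a small array. The shape (2, 3) and the float64 type are arbitrary choices.

```python
import numpy as np

a = np.zeros((2, 3), dtype=np.float64)   # 2 rows, 3 columns of 0.0
print(a.ndim)      # 2        number of dimensions
print(a.shape)     # (2, 3)   length of each dimension
print(a.size)      # 6        total number of elements
print(a.dtype)     # float64  element type
print(a.itemsize)  # 8        bytes per element
print(a.nbytes)    # 48       size * itemsize
print(a.T.shape)   # (3, 2)   shape of the transpose
```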

2.5 Dimension, rank, axis in numpy

Dimension describes how many directions an array extends in. For example, an array can be two-dimensional or three-dimensional.

Rank refers to the number of dimensions of an array (the ndim attribute). It is not the same as the rank of a matrix in linear algebra.

The axes of an array are analogous to the axes of a Cartesian coordinate system. When computing on a multidimensional array, the axis argument specifies the direction along which the computation is performed.

For example:

- A one-dimensional array, like a one-dimensional space, has only one axis: axis 0.
- A two-dimensional array, like a plane, has two axes. Described in rows and columns, axis 0 is the row direction (its index selects a row) and axis 1 is the column direction (its index selects a column).
- A three-dimensional array, like three-dimensional space, has three axes. Described in layers, rows and columns, axis 0 indexes the layers, axis 1 the rows and axis 2 the columns.
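In the two-dimensional case, summing along axis 0 collapses the rows and leaves one total per column. Summing along axis 1 collapses the columns and leaves one total per row. Here is a small sketch with an arbitrary array of 2 rows and 3 columns:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])   # 2 rows, 3 columns
print(np.sum(m, axis=0))   # [5 7 9]   one total per column (rows collapsed)
print(np.sum(m, axis=1))   # [ 6 15]   one total per row (columns collapsed)
```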

A three-dimensional array can therefore be summed in several ways: over all elements, across layers, across rows or across columns. The axis argument specifies which one. Summing along an axis removes that axis from the result.

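a = np.arange(1,65).reshape((4,4,4))  # 4 layers x 4 rows x 4 columns
print(np.sum(a))  # sum of all elements
print(np.sum(a,axis=0))  # sum across layers: adds elements at the same position in every layer; the layer axis disappears
print(np.sum(a,axis=1))  # sum across rows: adds elements at the same position in every row; the row axis disappears
print(np.sum(a,axis=2))  # sum across columns: adds elements at the same position in every column; the column axis disappears

# Sum of each layer (two equivalent ways)
print(np.sum(np.sum(a,axis=1),axis=1))
print(np.sum(np.sum(a,axis=2),axis=1))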
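The per-layer sums can also be computed in one call by passing a tuple of axes. With keepdims=True, the summed axis is kept with length 1. This is handy for broadcasting the result back against the original array. Here is a sketch using the same 4 x 4 x 4 array:

```python
import numpy as np

a = np.arange(1, 65).reshape((4, 4, 4))   # 4 layers x 4 rows x 4 columns

layer_sums = np.sum(a, axis=(1, 2))       # sum over rows and columns at once
print(layer_sums)                         # [136 392 648 904]

print(np.sum(a, axis=0).shape)            # (4, 4): the layer axis is removed
kept = np.sum(a, axis=0, keepdims=True)
print(kept.shape)                         # (1, 4, 4): the layer axis is kept with length 1

share = a / np.sum(a, axis=(1, 2), keepdims=True)  # each element's share of its layer's total
print(np.sum(share, axis=(1, 2)))         # each layer's shares add up to 1.0
```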

2.6 Broadcasting and vectorization

Broadcasting and vectorization are the defining features of NumPy, often called its soul.

- Vectorization means operating on whole arrays at once, with no explicit loops or indexing in the Python code. The element-by-element work runs in NumPy's compiled C code, so performance comes close to that of C.
- Broadcasting describes how NumPy applies an operation to operands of different shapes. The smaller operand (for example, a scalar) is conceptually stretched to the shape of the larger one, so the operation reaches every element.

Adding a scalar to an array is the simplest case of broadcasting. Here 1 is added to every element:

a = np.arange(10)
print(a)
print(a+1)

Result:
[0 1 2 3 4 5 6 7 8 9]
[ 1  2  3  4  5  6  7  8  9 10]
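Broadcasting also works between arrays of different shapes. NumPy compares shapes starting from the trailing dimension. Two dimensions are compatible when they are equal or one of them is 1, and a size-1 dimension is stretched to match the other.

Here is a sketch combining a (3, 1) column with a (4,) row. The values are arbitrary.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1): [[0], [1], [2]]
row = np.arange(4)                 # shape (4,):   [0 1 2 3]
grid = col * 10 + row              # both stretched to shape (3, 4)
print(grid.shape)                  # (3, 4)
print(grid)

# Trailing dimensions 3 and 4 are neither equal nor 1, so this fails.
try:
    np.arange(3) + np.arange(4)
    compatible = True
except ValueError as exc:
    compatible = False
    print("cannot broadcast:", exc)
```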

Vectorization shows up in element-wise addition, subtraction, multiplication and division between two arrays of the same shape, written without any loop:

a = np.arange(10)
a = a + 1
b = np.arange(10,20)
print(b + a)
print(b - a)
print(b / a)
print(b * a)

Result:
[11 13 15 17 19 21 23 25 27 29]
[9 9 9 9 9 9 9 9 9 9]
[10.          5.5         4.          3.25        2.8         2.5
  2.28571429  2.125       2.          1.9       ]
[ 10  22  36  52  70  90 112 136 162 190]
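To see what vectorization replaces, the sketch below computes b * a twice: once with an explicit Python loop and once as a single array expression. It then times both on a larger array with the standard-library timeit module. The array sizes are arbitrary and no timing figures are claimed. The vectorized form is typically much faster, but the ratio depends on the machine.

```python
import timeit
import numpy as np

a = np.arange(1, 11)
b = np.arange(10, 20)

# Explicit Python loop: one interpreted iteration per element.
looped = np.empty_like(b)
for i in range(len(b)):
    looped[i] = b[i] * a[i]

vectorized = b * a                 # the loop runs inside NumPy's compiled code
print(np.array_equal(looped, vectorized))  # True

# Rough timing on a larger array; absolute numbers depend on the machine.
big = np.arange(1_000_000)
t_loop = timeit.timeit(lambda: [v * 2 for v in big], number=1)
t_vec = timeit.timeit(lambda: big * 2, number=1)
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s")
```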