Array-Oriented Programming with NumPy

import numpy as np

Using NumPy arrays enables you to express many kinds of data processing tasks as concise array expressions that might otherwise require writing loops. This practice of replacing explicit loops with array expressions is commonly referred to as vectorization. In general, vectorized array operations will often be one or two (or more) orders of magnitude faster than their pure Python equivalents, with the biggest impact in any kind of numerical computations. Later, in Appendix A, I explain broadcasting, a powerful method for vectorizing computations. In short, array-oriented programming is efficient and much faster than pure Python.
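To make the loop-versus-array-expression contrast concrete, here is a minimal sketch that computes a sum of squares both ways. The function names are hypothetical, chosen for illustration, and the array size is arbitrary; the sketch only checks that both approaches agree, since actual timings depend on the machine.

```python
import numpy as np

def sum_of_squares_loop(values):
    # Explicit Python loop: one interpreted iteration per element
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_of_squares_vectorized(arr):
    # Array expression: the looping happens inside NumPy's compiled code
    return float(np.sum(arr ** 2))

data = np.arange(100_000, dtype=np.float64)
loop_result = sum_of_squares_loop(data.tolist())
vec_result = sum_of_squares_vectorized(data)

# Both approaches give the same answer (up to floating-point rounding)
print(np.isclose(loop_result, vec_result))  # True
```

Wrapping each call in timeit (or %timeit in IPython) is one way to see the speed gap on your own machine.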

As a simple example, suppose we wished to evaluate the function sqrt(x^2 + y^2) across a regular grid of values. The np.meshgrid function takes two 1D arrays and produces two 2D matrices corresponding to all pairs of (x, y) in the two arrays:

# 1000 equally spaced points
points = np.arange(-5, 5, 0.01)

xs, ys = np.meshgrid(points, points)
ys
array([[-5.  , -5.  , -5.  , ..., -5.  , -5.  , -5.  ],
       [-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
       [-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
       ...,
       [ 4.97,  4.97,  4.97, ...,  4.97,  4.97,  4.97],
       [ 4.98,  4.98,  4.98, ...,  4.98,  4.98,  4.98],
       [ 4.99,  4.99,  4.99, ...,  4.99,  4.99,  4.99]])
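Because the 1000 x 1000 output above is truncated, a tiny grid may make the layout easier to see. In this sketch the arrays x and y are made-up values for illustration: rows of the result follow y and columns follow x (the default indexing of np.meshgrid).

```python
import numpy as np

# A tiny grid so the whole output fits on screen (values chosen for illustration)
x = np.array([1, 2, 3])   # 3 points along x
y = np.array([10, 20])    # 2 points along y

xs, ys = np.meshgrid(x, y)
print(xs.shape, ys.shape)  # (2, 3) (2, 3): rows follow y, columns follow x
print(xs)
# [[1 2 3]
#  [1 2 3]]
print(ys)
# [[10 10 10]
#  [20 20 20]]
```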

Now, evaluating the function is a matter of writing the same expression you would write with two points:

z = np.sqrt(xs**2 + ys**2)
z
array([[7.07106781, 7.06400028, 7.05693985, ..., 7.04988652, 7.05693985,
        7.06400028],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       ...,
       [7.04988652, 7.04279774, 7.03571603, ..., 7.0286414 , 7.03571603,
        7.04279774],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568]])
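As a quick sanity check on the values above: the corner entry z[0, 0] corresponds to x = y = -5, so it should equal sqrt(50), about 7.0711, matching the first printed value. NumPy also provides np.hypot, which computes sqrt(x**2 + y**2) elementwise; this sketch checks that both agree.

```python
import numpy as np

points = np.arange(-5, 5, 0.01)
xs, ys = np.meshgrid(points, points)
z = np.sqrt(xs**2 + ys**2)

# Corner of the grid: x = y = -5, so z should be sqrt(25 + 25) = sqrt(50)
print(np.isclose(z[0, 0], np.sqrt(50)))   # True

# np.hypot computes the same elementwise quantity
print(np.allclose(z, np.hypot(xs, ys)))   # True
```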

As a preview of Chapter 9, I use matplotlib to create a visualization of this two-dimensional array:

import matplotlib.pyplot as plt

plt.imshow(z, cmap=plt.cm.gray)
plt.colorbar()

plt.title(r"Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")

plt.show()
<matplotlib.image.AxesImage at 0x23043ff1be0>
<matplotlib.colorbar.Colorbar at 0x230430a16d8>
Text(0.5, 1.0, 'Image plot of $\\sqrt{x^2 + y^2}$ for a grid of values')

[Figure: Image plot of sqrt(x^2 + y^2) for a grid of values]

Expressing Conditional Logic as Array Operations

The numpy.where function is a vectorized version of the ternary expression x if condition else y; np.where(cond, T, F) is its array form. Suppose we had a boolean array and two arrays of values:

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])

cond = np.array([True, False, True, True, False])

Suppose we wanted to take a value from xarr whenever the corresponding value in cond is True, and otherwise take the value from yarr. A list comprehension doing this might look like:

# choose x when c is True, otherwise y; zip(xarr, yarr, cond) pairs up the elements
result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]
result
[1.1, 2.2, 1.3, 1.4, 2.5]
# check: what zip yields on each iteration
for x, y, c in zip(xarr, yarr, cond):
    print(x, y, c)
1.1 2.1 True
1.2 2.2 False
1.3 2.3 True
1.4 2.4 True
1.5 2.5 False
# check: the same triples collected into a list
list(zip(xarr, yarr, cond))
[(1.1, 2.1, True),
 (1.2, 2.2, False),
 (1.3, 2.3, True),
 (1.4, 2.4, True),
 (1.5, 2.5, False)]

This has multiple problems. First, it will not be very fast for large arrays, because all the work is being done in interpreted Python code. Second, it will not work with multidimensional arrays. With np.where you can write this very concisely, and the call is both fast and works on arrays of any dimension:

result = np.where(cond, xarr, yarr)
result
array([1.1, 2.2, 1.3, 1.4, 2.5])
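The second problem (multidimensional input) is easy to demonstrate. In this sketch, cond2d, a2d and b2d are made-up 2x2 arrays: np.where handles them with the same call, while the list-comprehension version iterates over whole rows and fails.

```python
import numpy as np

# Made-up 2x2 inputs for illustration
a2d = np.array([[1, 2], [3, 4]])
b2d = np.array([[10, 20], [30, 40]])
cond2d = np.array([[True, False], [False, True]])

# np.where picks element by element, whatever the number of dimensions
result = np.where(cond2d, a2d, b2d)
print(result)
# [[ 1 20]
#  [30  4]]

# The list-comprehension version iterates over whole rows, and `if c` on a
# boolean row with several elements is ambiguous, so Python raises ValueError
failed = False
try:
    [(x if c else y) for x, y, c in zip(a2d, b2d, cond2d)]
except ValueError as exc:
    failed = True
    print("list comprehension failed:", exc)
```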

The second and third arguments to np.where don't need to be arrays; one or both of them can be scalars. A typical use of where in data analysis is to produce a new array of values based on another array. Suppose you had a matrix of randomly generated data and you wanted to replace all positive values with 2 and all negative values with -2. This is very easy to do with np.where:

arr = np.random.randn(4,4)
arr
array([[ 0.16344426, -0.24675782, -0.99098667,  2.30182665],
       [ 1.21964938,  1.6536566 , -0.06302591, -0.27577446],
       [ 0.9991692 ,  0.47264648,  0.51368592, -0.28743687],
       [-0.62238625,  1.24407926,  0.46229014, -0.09544536]])
# boolean test: which values are greater than 0
arr > 0
array([[ True, False, False,  True],
       [ True,  True, False, False],
       [ True,  True,  True, False],
       [False,  True,  True, False]])
# np.where: values greater than 0 become 2, all others become -2, producing a new array
np.where(arr > 0, 2, -2)
array([[ 2, -2, -2,  2],
       [ 2,  2, -2, -2],
       [ 2,  2,  2, -2],
       [-2,  2,  2, -2]])

You can combine scalars and arrays when using np.where. For example, I can replace all positive values in arr with the constant 2 like so:

# set only positive values to 2
np.where(arr > 0, 2, arr)
array([[ 2.        , -0.24675782, -0.99098667,  2.        ],
       [ 2.        ,  2.        , -0.06302591, -0.27577446],
       [ 2.        ,  2.        ,  2.        , -0.28743687],
       [-0.62238625,  2.        ,  2.        , -0.09544536]])

The arrays passed to np.where can be more than just equal-sized arrays or scalars; in short, np.where has many powerful uses.
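One example of going beyond equal-sized arrays or scalars: np.where broadcasts its three arguments against each other, like other NumPy operations. In this sketch (made-up values), a column vector and a row vector are combined under a 2x3 condition.

```python
import numpy as np

cond = np.array([[True, False, True],
                 [False, True, False]])   # shape (2, 3)
col = np.array([[1], [2]])                # shape (2, 1): one value per row
row = np.array([[10, 20, 30]])            # shape (1, 3): one value per column

# All three arguments are broadcast to the common shape (2, 3)
out = np.where(cond, col, row)
print(out)
# [[ 1 20  1]
#  [10  2 30]]
```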

Mathematical and Statistical Methods

A set of mathematical functions that compute statistics about an entire array or about the data along an axis are accessible as methods of the array class. You can use aggregations (often called reductions) like sum, mean, and std either by calling the array instance method or by using the top-level NumPy function.

Here I generate some normally distributed random data and compute some aggregate statistics:

arr = np.random.randn(5,4)
arr
array([[-1.37805831, -1.12482245, -0.16684412, -0.76586049],
       [ 0.53032371, -0.44266291, -2.34564781, -0.16721986],
       [-0.85135248, -1.11541433,  1.50280171, -0.32380149],
       [-0.40347092,  0.04702776, -0.97636849,  0.2564794 ],
       [-0.44465538,  0.68593465, -1.45780821,  0.46746144]])
arr.mean()
-0.4236979305849384
np.mean(arr)
-0.4236979305849384
arr.sum()
-8.473958611698768
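The mean is simply the sum divided by the number of elements, which the outputs above reflect (-8.4740 / 20 is about -0.4237). A minimal check on a small made-up array, also confirming that the method and the top-level function agree:

```python
import numpy as np

arr = np.array([[1.0, 2.0], [3.0, 6.0]])

print(arr.sum())                            # 12.0
print(arr.mean())                           # 3.0
print(arr.mean() == arr.sum() / arr.size)   # True
print(np.mean(arr) == arr.mean())           # True: function and method agree
```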

Functions like mean and sum take an optional axis argument that computes the statistic over the given axis, resulting in an array with one fewer dimension. In other words, the aggregation is carried out along the chosen axis:

# check: a small, predictable array
arr = np.arange(1, 7).reshape((2, 3))
arr
array([[1, 2, 3],
       [4, 5, 6]])
# axis=1: axis 1 runs across the columns (to the right), so one mean per row
arr.mean(axis=1)
array([2., 5.])
# axis=0: axis 0 runs down the rows, so one sum per column
arr.sum(axis=0)
array([5, 7, 9])

Here, arr.mean(1) means "compute mean across the columns" (moving to the right, one result per row), whereas arr.sum(0) means "compute sum down the rows" (moving downward, one result per column).

Other methods like cumsum and cumprod do not aggregate, instead producing an array of the intermediate results:

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7])

# cumsum(): cumulative sum
arr.cumsum()
array([ 0,  1,  3,  6, 10, 15, 21, 28], dtype=int32)
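Each entry of cumsum is the sum of all elements up to and including that position; cumprod does the same with multiplication. A small sketch on a made-up input that checks this against an explicit running total:

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

print(arr.cumsum())   # [ 1  3  6 10]
print(arr.cumprod())  # [ 1  2  6 24]

# Entry i of cumsum equals sum(arr[:i + 1])
running = [int(arr[:i + 1].sum()) for i in range(len(arr))]
print(running)        # [1, 3, 6, 10]
```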

In multidimensional arrays, accumulation functions like cumsum return an array of the same size, but with the partial aggregates computed along the indicated axis according to each lower-dimensional slice:

arr = np.array([[0,1,2], [3,4,5],[6,7,8]])
arr
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
# axis 0: accumulate down the rows, separately for each column
arr.cumsum(axis=0)
array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15]], dtype=int32)
# axis 1: accumulate to the right, separately for each row
arr.cumprod(axis=1)
array([[  0,   0,   0],
       [  3,  12,  60],
       [  6,  42, 336]], dtype=int32)

See Table 4-5 for a full listing. We'll see many examples of these methods in action in later chapters.

  • sum, mean (median is available only as the top-level function np.median)
  • std, var
  • max, min
  • argmin, argmax
  • cumsum, cumprod
  • ...
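A short sketch of a few entries from the list above, on a made-up array: std and var measure spread, while argmin and argmax return the index of the smallest and largest element rather than the value itself.

```python
import numpy as np

arr = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(arr.var())       # 4.0 (population variance, ddof=0 by default)
print(arr.std())       # 2.0 (square root of the variance)
print(arr.argmin())    # 0: index of the smallest value
print(arr.argmax())    # 7: index of the largest value
print(np.median(arr))  # 4.5: median is a top-level function, not an array method
```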

Methods for Boolean arrays

Boolean values are coerced to 1 (True) and 0 (False) in the preceding methods. Thus, sum is often used as a means of counting True values in a boolean array:

arr = np.random.randn(100)

# count how many elements are greater than 0
(arr > 0).sum()
56
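Because True counts as 1, taking the mean of a boolean array gives the fraction of True values. A deterministic sketch with made-up data instead of random numbers:

```python
import numpy as np

arr = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])

positive = arr > 0
print(positive.sum())   # 3: number of values greater than 0
print(positive.mean())  # 0.5: fraction of values greater than 0
```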

There are two additional methods, any and all, useful especially for boolean arrays. any tests whether one or more values in an array is True, while all checks if every value is True.

bools = np.array([False, False, True, False])

# any: True if at least one value is True
bools.any()
True
# all: True only if every value is True
bools.all()
False

These methods also work with non-boolean arrays, where non-zero elements evaluate to True.
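A minimal sketch of any and all on a non-boolean (integer) array, with made-up values, where zero counts as False and every non-zero value counts as True:

```python
import numpy as np

nums = np.array([0, 3, 0, 7])

print(nums.any())                 # True: at least one element is non-zero
print(nums.all())                 # False: some elements are zero
print(np.array([1, 2, 3]).all())  # True: every element is non-zero
```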

Sorting

Like Python's built-in list type, NumPy arrays can be sorted in-place with the sort method:

arr = np.random.randn(6)
arr
array([-0.07751873,  1.96812178,  1.62236213,  0.35971909,  0.63935982,
        0.75188034])
# arr.sort() sorts in place: it modifies arr directly and returns None
arr.sort()
arr
array([-0.07751873,  0.35971909,  0.63935982,  0.75188034,  1.62236213,
        1.96812178])

You can sort each one-dimensional section of values in a multidimensional array in place along an axis by passing the axis number to sort:

arr = np.random.randn(5,3)
arr
array([[ 1.19201961, -0.55352247,  0.59211779],
       [-0.72344831,  0.48316786, -0.11050496],
       [-0.77023054,  0.54681603,  0.49216649],
       [ 0.20738566, -0.60705897, -1.37389538],
       [ 0.46993764, -0.81503777, -1.31609675]])
# axis=1: sort along the columns (to the right), i.e., each row separately
arr.sort(1)
arr
array([[-0.55352247,  0.59211779,  1.19201961],
       [-0.72344831, -0.11050496,  0.48316786],
       [-0.77023054,  0.49216649,  0.54681603],
       [-1.37389538, -0.60705897,  0.20738566],
       [-1.31609675, -0.81503777,  0.46993764]])

The top-level method np.sort returns a sorted copy of an array instead of modifying the array in place. A quick-and-dirty way to compute the quantiles of an array is to sort it and select the value at a particular rank:

large_arr = np.random.randn(1000)

# sorts in ascending order by default
large_arr.sort()

# 5% quantile
large_arr[int(0.05 * len(large_arr))]
-1.7445979520348824
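To contrast np.sort with the in-place method, and to make the quick-and-dirty quantile reproducible, here is a sketch on a made-up descending array instead of random data:

```python
import numpy as np

data = np.arange(100)[::-1]      # 99, 98, ..., 0 (deliberately unsorted)

sorted_copy = np.sort(data)      # top-level np.sort returns a new array
print(data[0], sorted_copy[0])   # 99 0: the original is unchanged

# Quick-and-dirty 5% quantile: value at the 5% rank of the sorted data
print(sorted_copy[int(0.05 * len(sorted_copy))])  # 5
```

For anything beyond a rough estimate, NumPy's np.percentile and np.quantile functions compute quantiles directly and interpolate between ranks.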

For more details on using NumPy's sorting methods, and more advanced techniques like indirect sorts, see Appendix A. Several other kinds of data manipulations related to sorting of data can also be found in pandas.

Unique and Other Set Logic

NumPy has some basic set operations for one-dimensional ndarrays. A commonly used one is np.unique, which returns the sorted unique values in an array:

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

# remove duplicate names
np.unique(names)
array(['Bob', 'Joe', 'Will'], dtype='<U4')
ints = np.array([3,3,3,2,2,1,1,4,4])

# remove duplicate numbers
np.unique(ints)
array([1, 2, 3, 4])

Contrast np.unique with the pure Python alternative:

sorted(set(names))
['Bob', 'Joe', 'Will']
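np.unique can also report how often each unique value occurs through its return_counts parameter, something the set-based version does not give you directly. A sketch reusing the names array from above:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

values, counts = np.unique(names, return_counts=True)
print(values)  # ['Bob' 'Joe' 'Will']
print(counts)  # [2 3 2]
```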

Another function, np.in1d, tests membership of the values in one array in another, returning a boolean array:

values = np.array([6, 0, 0, 3, 2, 5, 6])

# test element by element whether each value appears in the other array
np.in1d(values, [2,3,6])
array([ True, False, False,  True,  True, False,  True])
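Recent NumPy releases (2.0 and later) deprecate np.in1d in favor of np.isin, which gives the same result for one-dimensional input. A sketch with the same values, also showing that the resulting boolean array can be used directly as an index:

```python
import numpy as np

values = np.array([6, 0, 0, 3, 2, 5, 6])

mask = np.isin(values, [2, 3, 6])
print(mask)          # [ True False False  True  True False  True]
print(values[mask])  # [6 3 2 6]: keep only the matching values
```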

See Table 4-6 for a listing of set functions in NumPy.

  • unique(x) Compute the sorted, unique elements in x
  • intersect1d(x, y) Compute the sorted, common elements in x and y
  • union1d(x, y) Compute the sorted union of elements
  • in1d(x, y) Compute a boolean array indicating whether each element of x is contained in y
  • ....
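A small sketch of two entries from this list, plus np.setdiff1d (another NumPy set function, not in the list above), on made-up arrays:

```python
import numpy as np

x = np.array([1, 3, 5, 7])
y = np.array([3, 4, 5, 6])

print(np.intersect1d(x, y))  # [3 5]: sorted common elements
print(np.union1d(x, y))      # [1 3 4 5 6 7]: sorted union
print(np.setdiff1d(x, y))    # [1 7]: elements of x that are not in y
```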


Origin www.cnblogs.com/chenjieyouge/p/11853150.html