Universal function
A universal function (ufunc) is a function that performs element-wise operations on the data in an ndarray.
import numpy as np
# Example arrays
a = np.array([-1,2.1,0.2,2.6,9.1]) # [-1. 2.1 0.2 2.6 9.1]
b = np.arange(1,len(a)+1) # [1 2 3 4 5]
- Unary functions
Function | Description | Example | Result |
---|---|---|---|
abs, fabs | Calculate the absolute value of integer, floating-point, or complex values; for non-complex data the faster fabs can be used | np.abs(a) | [1. 2.1 0.2 2.6 9.1] |
sqrt | Calculate the square root of each element, equivalent to arr**0.5 | np.sqrt(b) | |
square | Calculate the square of each element, equivalent to arr**2 | np.square(b) | [ 1 4 9 16 25] |
exp | Calculate e**x for each element | ||
log, log10, log2, log1p | Natural logarithm (base e), base-10 logarithm, base-2 logarithm, and log(1+x), respectively | ||
sign | Calculate the sign of each element: 1 (positive), 0 (zero), -1 (negative) | np.sign(a) | [-1. 1. 1. 1. 1.] |
ceil | Round up to the smallest integer greater than or equal to each element | np.ceil(a) | [-1. 3. 1. 3. 10.] |
floor | Round down to the largest integer less than or equal to each element | np.floor(a) | [-1. 2. 0. 2. 9.] |
rint | Round to the nearest integer, preserving the dtype | np.rint(a) | [-1. 2. 0. 3. 9.] |
modf | Return the fractional and integer parts of each element as two separate arrays | np.modf(a) | (array([-0. , 0.1, 0.2, 0.6, 0.1]), array([-1., 2., 0., 2., 9.])) |
nonzero | Return the indices of all non-zero elements as a tuple of arrays, one array per dimension | np.nonzero(a) | (array([0, 1, 2, 3, 4]),) |
clip | Limit the elements to a given range; values outside it are set to the nearest bound | np.clip(a, 0, 5) is equivalent to a.clip(0,5) | [0. 2.1 0.2 2.6 5. ] |
isnan | Return a boolean array that is True where the element is NaN | ||
isfinite, isinf | Return a boolean array that is True where the element is finite (neither inf nor NaN) or infinite, respectively | ||
cos, cosh, sin, sinh, tan, tanh | Ordinary and hyperbolic trigonometric functions | ||
arccos, arccosh, arcsin, arcsinh, arctan, arctanh | Inverse trigonometric and inverse hyperbolic functions | ||
logical_not | Calculate the truth value of not x for each element; equivalent to ~arr for boolean arrays | ||
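Several rows in the table above have no example. The following is a minimal sketch of a few of them; the array x and the variable names are made up for illustration and are not part of the original examples.

```python
import numpy as np

# Hypothetical input containing a NaN and an infinity
x = np.array([0.0, 1.0, np.nan, np.inf])

exp_x = np.exp(x)            # e**x for each element
is_nan = np.isnan(x)         # True only where the element is NaN
is_finite = np.isfinite(x)   # False for both NaN and inf
is_inf = np.isinf(x)         # True only for +inf / -inf

flags = np.array([True, False, True])
negated = np.logical_not(flags)   # same as ~flags for a boolean array

print(is_nan.tolist())     # [False, False, True, False]
print(is_finite.tolist())  # [True, True, False, False]
print(negated.tolist())    # [False, True, False]
```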
- Binary functions
Function | Description | Example | Result |
---|---|---|---|
add | Add corresponding elements | np.add(a,b) is equivalent to a+b | [ 0. 4.1 3.2 6.6 14.1] |
subtract | Subtract the second array from the first | np.subtract(a,b) is equivalent to a-b | [-2. 0.1 -2.8 -1.4 4.1] |
multiply | Multiply corresponding elements | np.multiply(a,b) is equivalent to a*b | [-1. 4.2 0.6 10.4 45.5] |
divide, floor_divide | Divide, or divide and round down | np.divide(a,b) is equivalent to a/b; np.floor_divide(a,b) is equivalent to a//b | |
power | Raise each element of the first array to the power given by the corresponding element of the second, equivalent to a**b | np.power(a,b) | |
maximum, fmax | Element-wise maximum; fmax ignores NaN | np.maximum(a,b), np.fmax(a,b) | |
minimum, fmin | Element-wise minimum; fmin ignores NaN | ||
mod | Element-wise remainder of division (modulo) | np.mod(a,b) | |
copysign | Copy the sign of each value in the second array onto the corresponding value in the first array | np.copysign(b,a) | [-1. 2. 3. 4. 5.] |
greater, greater_equal, less, less_equal, equal, not_equal | Element-wise comparison, equivalent to the infix operators >, >=, <, <=, ==, != | ||
logical_and, logical_or, logical_xor | Element-wise logical operations; for boolean arrays equivalent to the infix operators &, \|, ^ | ||
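Two points in the table above are easy to misread: how maximum and fmax differ when NaN is present, and how floor_divide and mod behave for negative values. The sketch below uses small made-up arrays (p, q, m, n are not from the original examples) to show both.

```python
import numpy as np

p = np.array([1.0, np.nan, 3.0])
q = np.array([2.0, 2.0, np.nan])

max_pq = np.maximum(p, q)   # NaN propagates: [2., nan, nan]
fmax_pq = np.fmax(p, q)     # NaN is ignored when the other value is a number: [2., 2., 3.]

m = np.array([7, -7])
n = np.array([2, 2])
quot = np.floor_divide(m, n)   # [ 3, -4]: rounds toward negative infinity
rem = np.mod(m, n)             # [1, 1]: the remainder takes the sign of the divisor

print(fmax_pq.tolist())  # [2.0, 2.0, 3.0]
print(quot.tolist())     # [3, -4]
print(rem.tolist())      # [1, 1]
```

For every element, quot * n + rem reproduces m, which is why the two functions are usually described together.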
Mathematical and Statistical Methods
Function | Description | Example | Result |
---|---|---|---|
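sum | Sum of all elements, or of the elements along an axis | np.sum(a) or a.sum() | 13.0 |
mean | Arithmetic mean; the mean of a zero-length array is NaN | np.mean(a) or a.mean() | 2.6 |
average | Weighted average; when all weights are equal it is the same as the arithmetic mean | np.average(a) | 2.6 |
median | Median: the middle value of the sorted data; with an even number of elements, the mean of the two middle values | np.median(a) | 2.1 |
std, var | Standard deviation and variance, respectively; the degrees of freedom are adjustable (the default divisor is n) | a.std(), a.var() | 3.4991427521608776, 12.244 |
min, max | Minimum and maximum values | ||
argmin, argmax | Indices of the smallest and largest elements, respectively | a.argmin(), a.argmax() | 0, 4 |
diff | diff(a, n=1, axis=-1): the difference between each element and the one before it; n is the number of times the differencing is applied; for multidimensional arrays, axis selects the direction | np.diff(a) | [ 3.1 -1.9 2.4 6.5] |
cumsum | Cumulative sum of the elements (returns an array) | a.cumsum() | [-1. 1.1 1.3 3.9 13. ] |
cumprod | Cumulative product of the elements (returns an array) | np.cumprod(a) | [-1. -2.1 -0.42 -1.092 -9.9372] |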
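The std and var row mentions adjustable degrees of freedom. In NumPy this is the ddof parameter. The sketch below reuses the example array a and compares both settings with a hand computation; the variable names are illustrative only.

```python
import numpy as np

a = np.array([-1, 2.1, 0.2, 2.6, 9.1])

pop_var = a.var()            # divides by n (ddof=0, the default)
sample_var = a.var(ddof=1)   # divides by n - 1

# The same quantities computed by hand
n = a.size
squared_dev = ((a - a.mean()) ** 2).sum()

print(np.isclose(pop_var, squared_dev / n))           # True
print(np.isclose(sample_var, squared_dev / (n - 1)))  # True
```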
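Note:
The examples above use a one-dimensional array. Two-dimensional arrays are handled the same way, except that the axis parameter selects the direction: axis=1 works across each row (horizontally) and axis=0 down each column (vertically).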
arr = np.arange(24).reshape(4,6)
'''
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]]
'''
# Sum
arr.sum() # 276, the total of all elements
arr.sum(axis=0) # [36 40 44 48 52 56], the sum of each column
# Arithmetic mean
arr.mean() # 11.5, the mean of all elements
arr.mean(axis=1) # [ 2.5 8.5 14.5 20.5], the mean of each row
Weighted average: the average function
arr = np.arange(10).reshape(2,5)
'''
[[0 1 2 3 4]
[5 6 7 8 9]]
'''
arr.mean() # 4.5, the arithmetic mean
np.average(arr) # 4.5, without weights it equals the arithmetic mean
np.average(arr, axis=1) # [2. 7.], averaged along the given axis
np.average(arr, weights=np.arange(arr.size).reshape(2,5)) # with weights: 6.333333333333333
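To make the weighted result less opaque, the following sketch recomputes it by hand as sum(value * weight) / sum(weight) and compares it with np.average. The names weights, by_hand, and by_numpy are chosen here for illustration.

```python
import numpy as np

arr = np.arange(10).reshape(2, 5)
weights = np.arange(arr.size).reshape(2, 5)   # same shape as arr

by_hand = (arr * weights).sum() / weights.sum()   # 285 / 45
by_numpy = np.average(arr, weights=weights)

print(np.isclose(by_hand, by_numpy))  # True
```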
Set operations
# Example arrays
s0 = np.array([1,2,3,2,1,4,5,2]) # [1 2 3 2 1 4 5 2]
s1 = np.arange(0,30,2) # [ 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28]
s2 = np.arange(0,30,3) # [ 0 3 6 9 12 15 18 21 24 27]
Function | Description | Example | Result |
---|---|---|---|
unique(x) | Find the unique elements of x and return them sorted | np.unique(s0) | [1 2 3 4 5] |
intersect1d(x,y) | Intersection, returned sorted | np.intersect1d(s1,s2) | [ 0 6 12 18 24] |
union1d(x,y) | Union, returned sorted | np.union1d(s1,s2) | [ 0 2 3 4 6 8 9 10 12 14 15 16 18 20 21 22 24 26 27 28] |
setdiff1d(x,y) | Set difference: the elements that are in x and not in y | np.setdiff1d(s1,s2) | [ 2 4 8 10 14 16 20 22 26 28] |
setxor1d(x,y) | Symmetric difference: the elements that are in exactly one of x and y | np.setxor1d(s1,s2) | [ 2 3 4 8 9 10 14 15 16 20 21 22 26 27 28] |
in1d(x,y) | Return a boolean array indicating whether each element of x is contained in y (newer NumPy versions recommend np.isin instead) | np.in1d(s2,s1) | [ True False True False True False True False True False] |
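Note: the two arrays may differ in both number of elements and shape.
In the examples above s1 and s2 are both one-dimensional but have different lengths. To verify that the set operations do not depend on shape, we change the shapes of s1 and s2: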
s1 = s1.reshape(3,5)
'''
[[ 0 2 4 6 8]
[10 12 14 16 18]
[20 22 24 26 28]]
'''
s2 = s2.reshape(2,5)
'''
[[ 0 3 6 9 12]
[15 18 21 24 27]]
'''
np.intersect1d(s1,s2) # [ 0 6 12 18 24]
np.union1d(s1,s2) # [ 0 2 3 4 6 8 9 10 12 14 15 16 18 20 21 22 24 26 27 28]
np.setdiff1d(s2,s1) # [ 3 9 15 21 27]
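One way to check the shape independence directly is to compare the result for the reshaped arrays with the result for their flattened versions. This is a small sketch; common_2d and common_1d are illustrative names.

```python
import numpy as np

s1 = np.arange(0, 30, 2).reshape(3, 5)
s2 = np.arange(0, 30, 3).reshape(2, 5)

common_2d = np.intersect1d(s1, s2)                   # 2-D inputs
common_1d = np.intersect1d(s1.ravel(), s2.ravel())   # the same data, flattened first

print(np.array_equal(common_2d, common_1d))  # True
print(common_2d.tolist())                    # [0, 6, 12, 18, 24]
```

The result is always one-dimensional, regardless of the input shapes.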
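Supplement (where, sort, any, all)
where
The where function is a vectorized ternary operator, where(condition, x, y),
which does something like the following for each element:
if condition:
    x
else:
    y
Example 1: given two arrays xarr and yarr, pick values from one or the other according to cond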
xarr = np.array(np.arange(1.1, 1.6, 0.1))
yarr = np.array(np.arange(2.1, 2.6, 0.1))
cond = np.array([True, False, True, True, False])
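In plain Python:
result = [x if c else y for x, y, c in zip(xarr, yarr, cond)]
Output:
[1.1, 2.2, 1.3000000000000003, 1.4000000000000004, 2.5000000000000004]
This is cumbersome. The trailing digits are ordinary floating-point representation error, not a data problem; NumPy arrays hold the same values and simply print them rounded.
Using the where function in NumPy: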
result = np.where(cond, xarr, yarr)
Output:
[1.1 2.2 1.3 1.4 2.5]
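The two outputs above look different only because a Python list and a NumPy array print floats differently. The sketch below checks that both approaches select the same values. It assumes literal input arrays instead of np.arange so that the values are exact and easy to compare.

```python
import numpy as np

# Literal values (an assumption for this sketch, instead of np.arange)
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

by_loop = [x if c else y for x, y, c in zip(xarr, yarr, cond)]
by_where = np.where(cond, xarr, yarr)

same = np.array_equal(np.array(by_loop), by_where)
print(same)  # True
```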
Example 2: reset the negative entries of arr to 0 and keep the rest
arr = np.random.randn(4,4)
Output:
[[ 0.40336609 -1.42094364 -1.1257582 0.2787659 ]
[-0.64618146 -0.56508989 0.20527747 1.8542685 ]
[-0.39792887 0.94738928 -0.68713023 0.60328758]
[-0.94495984 -1.47217366 0.03280616 -0.13120201]]
arr = np.where(arr>0, arr, 0)
Output:
[[0.40336609 0. 0. 0.2787659 ]
[0. 0. 0.20527747 1.8542685 ]
[0. 0.94738928 0. 0.60328758]
[0. 0. 0.03280616 0. ]]
Example 3: a more complex, nested case
cond1 = np.array([True, False, True, True, False])
cond2 = np.array([True, True, True, False, False])
result = []
In plain Python:
for i in range(len(cond1)):
    if cond1[i] and cond2[i]:
        result.append(0)
    elif cond1[i]:
        result.append(1)
    elif cond2[i]:
        result.append(2)
    else:
        result.append(3)
print(result) # [0, 2, 0, 1, 3]
Using the where function in NumPy:
result = np.where(cond1&cond2, 0 ,
np.where(cond1, 1,
np.where(cond2, 2, 3)))
list(result) # [0, 2, 0, 1, 3]
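Deeply nested where calls become hard to read. As an alternative that the original example does not use, np.select takes a list of conditions and a list of choices and picks the first condition that matches. A minimal sketch for the same inputs:

```python
import numpy as np

cond1 = np.array([True, False, True, True, False])
cond2 = np.array([True, True, True, False, False])

# Conditions are checked in order; the first one that is True wins
result = np.select([cond1 & cond2, cond1, cond2], [0, 1, 2], default=3)

print(result.tolist())  # [0, 2, 0, 1, 3]
```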
Note: where can also be called with only the condition, in which case it returns the indices at which the condition is true
arr = np.random.randn(10)
np.where(arr>0) # (array([1, 2, 3, 6, 9]),)
For a multidimensional array the result is a tuple of arrays, one index array per dimension
cond1 = np.array([True, False, True, True, False])
cond2 = np.array([True, True, True, False, False])
arr = np.array([cond1,cond2])
np.where(arr)
# (array([0, 0, 0, 1, 1, 1]), array([0, 2, 3, 0, 1, 2]))
# i.e. the positions [(0,0),(0,2),(0,3),(1,0),(1,1),(1,2)]
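If the coordinate pairs themselves are needed, the per-dimension index arrays can be zipped together, or np.argwhere can be used to get one row per True element. A small sketch with the same boolean array (pairs and coords are illustrative names):

```python
import numpy as np

arr = np.array([[True, False, True, True, False],
                [True, True, True, False, False]])

rows, cols = np.where(arr)
pairs = list(zip(rows.tolist(), cols.tolist()))   # list of (row, col) tuples
coords = np.argwhere(arr)                         # array of shape (6, 2)

print(pairs)  # [(0, 0), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2)]
```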
- sort: sorting
# Multidimensional array: the direction can be specified
arr = np.random.randn(20).reshape(4,5)
'''
[[-0.94603557 -0.18393318 0.11450866 0.40325255 0.45881851]
[ 1.17704035 -0.41401001 0.75339636 -0.43745415 2.7929479 ]
[-0.28784153 -1.48745643 -0.07142102 -0.5482369 -0.22610164]
[ 1.35561729 -1.08766432 0.83278514 -1.32299757 0.04410116]]
'''
np.sort(arr, axis=0) # sort down each column (by default each row is sorted, i.e. along the last axis)
'''
[[-0.94603557 -1.48745643 -0.07142102 -1.32299757 -0.22610164]
[-0.28784153 -1.08766432 0.11450866 -0.5482369 0.04410116]
[ 1.17704035 -0.41401001 0.75339636 -0.43745415 0.45881851]
[ 1.35561729 -0.18393318 0.83278514 0.40325255 2.7929479 ]]
'''
# One-dimensional array
arr = np.array([2,6,4,2,1,4])
arr.sort() # sorts the original array in place; np.sort() instead returns a new sorted array and leaves the original unchanged
print(arr) # [1 2 2 4 4 6]
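The difference between the two sorting styles can be seen side by side in the following sketch, which reuses the same one-dimensional data. The names sorted_copy, before, and ret are illustrative.

```python
import numpy as np

arr = np.array([2, 6, 4, 2, 1, 4])

sorted_copy = np.sort(arr)   # returns a new array; arr is untouched
before = arr.tolist()        # snapshot of arr before the in-place sort

ret = arr.sort()             # sorts arr itself and returns None

print(before)                # [2, 6, 4, 2, 1, 4]
print(sorted_copy.tolist())  # [1, 2, 2, 4, 4, 6]
print(arr.tolist())          # [1, 2, 2, 4, 4, 6]
print(ret)                   # None
```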
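Example: what is the 25% quantile of a set of data?
# Generate some data
arr = np.random.randn(20).reshape(4,5)
# 1. First flatten it into a one-dimensional array, then sort it
arr = arr.flatten()
# Sort in place
arr.sort()
# Take the element 25% of the way through the sorted data
value = arr[int(0.25*len(arr))] # the (approximate) 25% quantile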
print(arr)
'''
[-1.8819284 -1.84223613 -1.55037549 -1.19713841 -0.91661269 -0.69222229
-0.6796624 -0.65882803 -0.55325753 -0.34502426 -0.1197655 0.36925446
0.5343373 0.62780224 0.74335279 0.82012463 1.00546263 1.08559715
1.29212188 1.47629451]
'''
print(value) # -0.69222229
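The index-based approach above is an approximation. NumPy also provides np.percentile, which by default interpolates linearly between neighbouring elements, so the two results can differ slightly. The sketch below uses a deterministic, already sorted array (an assumption made here so the numbers are easy to follow) instead of random data.

```python
import numpy as np

data = np.arange(20, dtype=float)   # already sorted: 0.0, 1.0, ..., 19.0

by_index = data[int(0.25 * len(data))]    # element at index 5
by_percentile = np.percentile(data, 25)   # interpolates at position 0.25 * (20 - 1) = 4.75

print(by_index)        # 5.0
print(by_percentile)   # 4.75
```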
all, any
all: whether every element is True; returns True if so, otherwise False
any: whether at least one element is True; returns True if so, otherwise False
arr = np.array([True,False,True,True,False])
arr.all() # False
arr.any() # True
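Like sum and mean, all and any accept an axis argument for multidimensional arrays. A minimal sketch with a made-up 2-D boolean array:

```python
import numpy as np

flags = np.array([[True, False, True],
                  [True, True, True]])

all_rows = flags.all(axis=1)   # is every element in each row True?
any_cols = flags.any(axis=0)   # does each column contain at least one True?

print(all_rows.tolist())  # [False, True]
print(any_cols.tolist())  # [True, True, True]
```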
Statistics on boolean arrays
arr = np.random.randn(100)
(arr>0).sum() # count the positive values
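Summing a boolean array works because True counts as 1 and False as 0. The same idea gives the fraction of matching elements via mean, and np.count_nonzero is an equivalent way to count. The sketch below uses a small fixed array (an assumption, in place of random data) so the result is reproducible.

```python
import numpy as np

arr = np.array([-1.5, 0.3, 2.0, -0.2, 0.0, 4.1])
positive = arr > 0

count = positive.sum()                    # True counts as 1
count_alt = np.count_nonzero(positive)    # same count, different function
fraction = positive.mean()                # share of positive values

print(int(count), int(count_alt), float(fraction))  # 3 3 0.5
```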