Data analysis study notes (3) - NumPy: built-in functions (universal functions, mathematical and statistical methods, set operations)

Universal functions

A universal function (ufunc) performs element-wise operations on the data in an ndarray.

import numpy as np

# Example arrays
a = np.array([-1,2.1,0.2,2.6,9.1])  # [-1.   2.1  0.2  2.6  9.1]
b = np.arange(1,len(a)+1)           # [1 2 3 4 5]
  • Unary functions

| Function | Description | Example | Result |
|---|---|---|---|
| abs, fabs | Absolute value of integer, floating-point, or complex values; fabs is a faster alternative for non-complex data | np.abs(a) | [1. 2.1 0.2 2.6 9.1] |
| sqrt | Square root of each element, equivalent to arr**0.5 | np.sqrt(b) | |
| square | Square of each element, equivalent to arr**2 | np.square(b) | [ 1 4 9 16 25] |
| exp | Exponential e^x of each element | | |
| log, log10, log2, log1p | Natural logarithm (base e), base-10 logarithm, base-2 logarithm, and log(1+x) | | |
| sign | Sign of each element: 1 (positive), 0 (zero), -1 (negative) | np.sign(a) | [-1. 1. 1. 1. 1.] |
| ceil | Round up to the nearest integer | np.ceil(a) | [-1. 3. 1. 3. 10.] |
| floor | Round down to the nearest integer | np.floor(a) | [-1. 2. 0. 2. 9.] |
| rint | Round to the nearest integer, preserving the dtype | np.rint(a) | [-1. 2. 0. 3. 9.] |
| modf | Return the fractional and integral parts of each element as two separate arrays | np.modf(a) | (array([-0. , 0.1, 0.2, 0.6, 0.1]), array([-1., 2., 0., 2., 9.])) |
| nonzero | Return the indices of the non-zero elements as a tuple with one index array per dimension | np.nonzero(a) | (array([0, 1, 2, 3, 4]),) |
| clip | Limit each element to the interval [min, max] | np.clip(a, 0, 5), equivalent to a.clip(0,5) | [0. 2.1 0.2 2.6 5. ] |
| isnan | Boolean array that is True where the element is NaN | | |
| isfinite, isinf | Boolean arrays that are True where the element is finite or infinite, respectively | | |
| cos, cosh, sin, sinh, tan, tanh | Ordinary and hyperbolic trigonometric functions | | |
| arccos, arccosh, arcsin, arcsinh, arctan, arctanh | Inverse trigonometric and inverse hyperbolic functions | | |
| logical_not | Truth value of not x for each element; equivalent to ~arr for boolean arrays | | |
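
A few of the unary functions above are easier to understand from a quick check than from a description. The following is a minimal sketch that reuses the example array a; the names frac, whole, rounded, and mask are only illustrative.

```python
import numpy as np

a = np.array([-1, 2.1, 0.2, 2.6, 9.1])

# modf returns (fractional parts, integral parts), both as float arrays
frac, whole = np.modf(a)

# The two parts add back up to the original values
print(np.allclose(frac + whole, a))   # True

# rint rounds to the nearest integer but keeps the float dtype
rounded = np.rint(a)
print(rounded, rounded.dtype)         # [-1.  2.  0.  3.  9.] float64

# For boolean arrays, logical_not matches the ~ operator
mask = a > 1
print(np.array_equal(np.logical_not(mask), ~mask))  # True
```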
  • Binary functions

| Function | Description | Example | Result |
|---|---|---|---|
| add | Element-wise sum | np.add(a,b), equivalent to a+b | [ 0. 4.1 3.2 6.6 14.1] |
| subtract | Subtract the second array from the first | np.subtract(a,b), equivalent to a-b | [-2. 0.1 -2.8 -1.4 4.1] |
| multiply | Element-wise product | np.multiply(a,b), equivalent to a*b | [-1. 4.2 0.6 10.4 45.5] |
| divide, floor_divide | Division, or division rounded down | np.divide(a,b), equivalent to a/b; np.floor_divide(a,b), equivalent to a//b | |
| power | Raise each element of the first array to the power given by the second | np.power(a,b), equivalent to a**b | |
| maximum, fmax | Element-wise maximum; fmax ignores NaN | np.maximum(a,b), np.fmax(a,b) | |
| minimum, fmin | Element-wise minimum; fmin ignores NaN | | |
| mod | Element-wise remainder of division | np.mod(a,b) | |
| copysign | Copy the sign of each value in the second array onto the corresponding value in the first | np.copysign(b,a) | [-1. 2. 3. 4. 5.] |
| greater, greater_equal, less, less_equal, equal, not_equal | Element-wise comparison, equivalent to >, >=, <, <=, ==, != | | |
| logical_and, logical_or, logical_xor | Element-wise logical operations; for boolean arrays equivalent to the infix operators &, \|, ^ | | |
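
The difference between maximum and fmax only shows up when NaN is present. The sketch below uses two small hypothetical arrays x and y to illustrate it, and then checks on the example arrays a and b that floor_divide agrees with flooring the ordinary quotient (this holds for this data; it is not claimed for every floating-point input).

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])
y = np.array([2.0, 2.0, np.nan])

# maximum propagates NaN, fmax returns the non-NaN operand where one exists
print(np.maximum(x, y))  # [ 2. nan nan]
print(np.fmax(x, y))     # [2. 2. 3.]

a = np.array([-1, 2.1, 0.2, 2.6, 9.1])
b = np.arange(1, len(a) + 1)

# floor_divide rounds the quotient down
print(np.array_equal(np.floor_divide(a, b), np.floor(a / b)))  # True
```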

Mathematical and Statistical Methods

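| Function | Description | Example | Result |
|---|---|---|---|
| sum | Sum of all elements, or of the elements along one axis | np.sum(a) or a.sum() | 13.0 |
| mean | Arithmetic mean; the mean of a zero-length array is NaN | np.mean(a) or a.mean() | 2.6 |
| average | Weighted average; with equal weights it is the same as the arithmetic mean | np.average(a) | 2.6 |
| median | Median: the middle value of the sorted data, or the mean of the two middle values when the count is even | np.median(a) | 2.1 |
| std, var | Standard deviation and variance; the degrees of freedom can be adjusted with ddof (the default divisor is n) | a.std(), a.var() | 3.4991427521608776, 12.244 |
| min, max | Minimum and maximum | | |
| argmin, argmax | Index of the smallest and of the largest element | a.argmin(), a.argmax() | 0, 4 |
| diff | diff(a, n=1, axis=-1): difference between each element and the one before it; n is the number of times the differencing is repeated, and axis selects the direction for multidimensional arrays | np.diff(a) | [ 3.1 -1.9 2.4 6.5] |
| cumsum | Cumulative sum of the elements (returns an array) | a.cumsum() | [-1. 1.1 1.3 3.9 13. ] |
| cumprod | Cumulative product of the elements (returns an array) | np.cumprod(a) | [-1. -2.1 -0.42 -1.092 -9.9372] |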
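
The adjustable degrees of freedom for std and var are controlled by the ddof argument. As a rough sketch, the default (ddof=0) divides by n and ddof=1 divides by n - 1; the hand computation below uses the illustrative name dev2 for the squared deviations.

```python
import numpy as np

a = np.array([-1, 2.1, 0.2, 2.6, 9.1])

pop_var = a.var()            # divides by n (ddof=0, the default)
sample_var = a.var(ddof=1)   # divides by n - 1

# Same quantities computed by hand from the squared deviations
dev2 = (a - a.mean()) ** 2
print(np.isclose(pop_var, dev2.sum() / len(a)))           # True
print(np.isclose(sample_var, dev2.sum() / (len(a) - 1)))  # True
print(np.isclose(a.std(ddof=1), np.sqrt(sample_var)))     # True
```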

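Note:
The examples above use a one-dimensional array. Two-dimensional arrays are handled the same way, but the axis parameter selects the direction: axis=1 works horizontally (across each row) and axis=0 works vertically (down each column).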

arr = np.arange(24).reshape(4,6)
'''
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
 '''
# Sum
arr.sum()           # 276   sum of all elements
arr.sum(axis=0)     #  [36 40 44 48 52 56]  column sums (vertical)
# Arithmetic mean
arr.mean()          # 11.5      mean of all elements
arr.mean(axis=1)    # [ 2.5  8.5 14.5 20.5] row means (horizontal)
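
One way to remember the axis argument is that the named axis is the one that disappears from the result. The sketch below checks the resulting shapes and, as an assumption about a typical use, shows keepdims=True for subtracting each row's mean from that row; col_sums, row_means, and centered are illustrative names.

```python
import numpy as np

arr = np.arange(24).reshape(4, 6)

col_sums = arr.sum(axis=0)    # collapses the 4 rows    -> shape (6,)
row_means = arr.mean(axis=1)  # collapses the 6 columns -> shape (4,)
print(col_sums.shape, row_means.shape)   # (6,) (4,)

# keepdims=True keeps the reduced axis with length 1,
# so the (4, 1) result broadcasts against the (4, 6) array
centered = arr - arr.mean(axis=1, keepdims=True)
print(centered.mean(axis=1))             # [0. 0. 0. 0.]
```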

Weighted average: the average function

arr = np.arange(10).reshape(2,5)
'''
[[0 1 2 3 4]
 [5 6 7 8 9]]
 '''
arr.mean()          # 4.5 arithmetic mean
np.average(arr)     # 4.5 no weights given, so the same as the arithmetic mean
np.average(arr, axis=1) # [2. 7.], averaged along the given axis
np.average(arr, weights=np.arange(arr.size).reshape(2,5))  # with weights: 6.333333333333333
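
The weighted result can be reproduced by hand as sum(w * x) / sum(w). This is a small sketch of that formula using the same array and weights; manual and weights are illustrative names.

```python
import numpy as np

arr = np.arange(10).reshape(2, 5)
weights = np.arange(arr.size).reshape(2, 5)

# Weighted average written out: sum of weight * value over sum of weights
manual = (arr * weights).sum() / weights.sum()
print(manual)  # 6.333333333333333
print(np.isclose(manual, np.average(arr, weights=weights)))  # True
```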

Set operations

# Example arrays
s0 = np.array([1,2,3,2,1,4,5,2])    # [1 2 3 2 1 4 5 2]
s1 = np.arange(0,30,2)  # [ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28]
s2 = np.arange(0,30,3)  # [ 0  3  6  9 12 15 18 21 24 27]
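| Function | Description | Example | Result |
|---|---|---|---|
| unique(x) | Unique elements of x, returned sorted | np.unique(s0) | [1 2 3 4 5] |
| intersect1d(x,y) | Intersection, returned sorted | np.intersect1d(s1,s2) | [ 0 6 12 18 24] |
| union1d(x,y) | Union, returned sorted | np.union1d(s1,s2) | [ 0 2 3 4 6 8 9 10 12 14 15 16 18 20 21 22 24 26 27 28] |
| setdiff1d(x,y) | Set difference: elements that are in x but not in y | np.setdiff1d(s1,s2) | [ 2 4 8 10 14 16 20 22 26 28] |
| setxor1d(x,y) | Symmetric difference: elements that are in exactly one of x and y | np.setxor1d(s1,s2) | [ 2 3 4 8 9 10 14 15 16 20 21 22 26 27 28] |
| in1d(x,y) | Boolean array indicating whether each element of x is contained in y (np.isin is the recommended replacement in newer NumPy versions) | np.in1d(s2,s1) | [ True False True False True False True False True False] |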
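
Two related calls are worth a quick look: unique can also report how often each value occurs, and a membership mask can be used directly as a boolean index. The sketch below uses np.isin, which NumPy recommends over in1d for new code; values, counts, and mask are illustrative names.

```python
import numpy as np

s0 = np.array([1, 2, 3, 2, 1, 4, 5, 2])
s1 = np.arange(0, 30, 2)
s2 = np.arange(0, 30, 3)

# return_counts=True also returns the number of occurrences of each unique value
values, counts = np.unique(s0, return_counts=True)
print(values)  # [1 2 3 4 5]
print(counts)  # [2 3 1 1 1]

# A membership mask used as a boolean index gives the shared elements
mask = np.isin(s2, s1)
print(s2[mask])  # [ 0  6 12 18 24]
```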

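Note: the two arrays may differ in both number of elements and shape.
s1 and s2 in the examples above are both one-dimensional, but they already differ in length. To verify that the set operations do not depend on shape, we reshape s1 and s2: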

s1 = s1.reshape(3,5)
'''
[[ 0  2  4  6  8]
 [10 12 14 16 18]
 [20 22 24 26 28]]
 '''
s2 = s2.reshape(2,5)
'''
[[ 0  3  6  9 12]
 [15 18 21 24 27]]
 '''
np.intersect1d(s1,s2)   # [ 0  6 12 18 24]
np.union1d(s1,s2)       # [ 0  2  3  4  6  8  9 10 12 14 15 16 18 20 21 22 24 26 27 28]
np.setdiff1d(s2,s1)     # [ 3  9 15 21 27]
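
The reason shape does not matter is that these routines treat their inputs as flat collections of values and return a one-dimensional result. A minimal check, with common_2d and common_1d as illustrative names:

```python
import numpy as np

s1 = np.arange(0, 30, 2).reshape(3, 5)
s2 = np.arange(0, 30, 3).reshape(2, 5)

common_2d = np.intersect1d(s1, s2)                   # 2-D inputs
common_1d = np.intersect1d(s1.ravel(), s2.ravel())   # explicitly flattened inputs

print(common_2d)        # [ 0  6 12 18 24]
print(common_2d.ndim)   # 1
print(np.array_equal(common_2d, common_1d))  # True
```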

Supplement (where, sort, any, all)

  • where

    where(condition, x, y) is the vectorized counterpart of a ternary expression.
    For each element it does the equivalent of:

if condition:
  x
else:
  y

Example 1: given two arrays xarr and yarr, select values from them according to cond

xarr = np.array(np.arange(1.1, 1.6, 0.1))
yarr = np.array(np.arange(2.1, 2.6, 0.1))
cond = np.array([True, False, True, True, False])

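In plain Python:

result = [x if c else y for x, y, c in zip(xarr, yarr, cond)]
Output:
[1.1, 2.2, 1.3000000000000003, 1.4000000000000004, 2.5000000000000004]
This is inconvenient. The long decimals are not errors: they are ordinary floating-point rounding from arange, which a Python list prints at full precision while NumPy's array printing rounds to 8 digits.

With NumPy's where function:

result = np.where(cond, xarr, yarr)
Output:
[1.1 2.2 1.3 1.4 2.5]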
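
To confirm that where and the list comprehension select exactly the same values, the sketch below compares them directly. It assumes literal input arrays instead of arange so that the values are easy to read; from_list, from_where, and total are illustrative names. The last two lines show that a difference in displayed digits comes from printing, not from the data.

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

from_list = [x if c else y for x, y, c in zip(xarr, yarr, cond)]
from_where = np.where(cond, xarr, yarr)

print(np.array_equal(from_where, np.array(from_list)))  # True
print(from_where.tolist())  # [1.1, 2.2, 1.3, 1.4, 2.5]

# Array printing rounds to 8 digits by default; the stored value is unchanged
total = np.array([0.1 + 0.2])
print(total)           # [0.3]
print(total.tolist())  # [0.30000000000000004]
```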

Example 2: reset the negative values in arr to 0 and keep the rest

arr = np.random.randn(4,4)
Output:
[[ 0.40336609 -1.42094364 -1.1257582   0.2787659 ]
 [-0.64618146 -0.56508989  0.20527747  1.8542685 ]
 [-0.39792887  0.94738928 -0.68713023  0.60328758]
 [-0.94495984 -1.47217366  0.03280616 -0.13120201]]
arr = np.where(arr>0, arr, 0)
Output:
[[0.40336609 0.         0.         0.2787659 ]
 [0.         0.         0.20527747 1.8542685 ]
 [0.         0.94738928 0.         0.60328758]
 [0.         0.         0.03280616 0.        ]]
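
For this particular task (replacing negatives with 0), an element-wise maximum or a clip gives the same result as where. The sketch below is one way to check that; it uses a seeded np.random.default_rng generator instead of np.random.randn purely so the run is repeatable, and the names via_where, via_maximum, and via_clip are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded so the sketch is repeatable
arr = rng.standard_normal((4, 4))

via_where = np.where(arr > 0, arr, 0)
via_maximum = np.maximum(arr, 0)      # element-wise max against the scalar 0
via_clip = np.clip(arr, 0, None)      # lower bound 0, no upper bound

print(np.array_equal(via_where, via_maximum))  # True
print(np.array_equal(via_where, via_clip))     # True
```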

Example 3: nested conditions

cond1 = np.array([True, False, True, True, False])
cond2 = np.array([True, True, True, False, False])
result = []

In plain Python:

for i in range(len(cond1)):
    if cond1[i] and cond2[i]:
        result.append(0)
    elif cond1[i]:
        result.append(1)
    elif cond2[i]:
        result.append(2)
    else:
        result.append(3)
print(result)           # [0, 2, 0, 1, 3]

With NumPy's where function:

result = np.where(cond1&cond2, 0 ,
             np.where(cond1, 1,
                  np.where(cond2, 2, 3)))
list(result)     # [0, 2, 0, 1, 3]
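
When the nesting gets deeper, np.select can express the same if/elif/else chain as a flat list of conditions and choices, where the first matching condition wins. This is offered as an alternative sketch, not as the approach used in the example above.

```python
import numpy as np

cond1 = np.array([True, False, True, True, False])
cond2 = np.array([True, True, True, False, False])

# Conditions are tested in order; default is used where none of them holds
result = np.select([cond1 & cond2, cond1, cond2], [0, 1, 2], default=3)
print(result.tolist())  # [0, 2, 0, 1, 3]
```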

Note: where can also be called with only a condition, in which case it returns the indices at which the condition is true

arr = np.random.randn(10)
np.where(arr>0)      # (array([1, 2, 3, 6, 9]),)

For a multidimensional array the result is again a tuple of arrays, one index array per dimension

cond1 = np.array([True, False, True, True, False])
cond2 = np.array([True, True, True, False, False])
arr = np.array([cond1,cond2])
np.where(arr)
# (array([0, 0, 0, 1, 1, 1]), array([0, 2, 3, 0, 1, 2]))
# i.e. the positions [(0,0),(0,2),(0,3),(1,0),(1,1),(1,2)]
  • sort
# Multidimensional array: the direction can be specified
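
If coordinate pairs are what you need, the per-dimension index arrays can be zipped together, or np.argwhere can be used to get one row per True element. A short sketch, with rows, cols, and pairs as illustrative names:

```python
import numpy as np

arr = np.array([[True, False, True, True, False],
                [True, True, True, False, False]])

rows, cols = np.where(arr)
pairs = list(zip(rows.tolist(), cols.tolist()))
print(pairs)  # [(0, 0), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2)]

# argwhere returns the same coordinates as an (N, 2) array
print(np.array_equal(np.argwhere(arr), np.column_stack((rows, cols))))  # True
```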
arr = np.random.randn(20).reshape(4,5)
'''
[[-0.94603557 -0.18393318  0.11450866  0.40325255  0.45881851]
 [ 1.17704035 -0.41401001  0.75339636 -0.43745415  2.7929479 ]
 [-0.28784153 -1.48745643 -0.07142102 -0.5482369  -0.22610164]
 [ 1.35561729 -1.08766432  0.83278514 -1.32299757  0.04410116]]
 '''
np.sort(arr, axis=0)     # sort vertically, down each column (the default, axis=-1, sorts along each row)
'''
[[-0.94603557 -1.48745643 -0.07142102 -1.32299757 -0.22610164]
 [-0.28784153 -1.08766432  0.11450866 -0.5482369   0.04410116]
 [ 1.17704035 -0.41401001  0.75339636 -0.43745415  0.45881851]
 [ 1.35561729 -0.18393318  0.83278514  0.40325255  2.7929479 ]]
 '''
# One-dimensional array
arr = np.array([2,6,4,2,1,4])
arr.sort()      # sorts the original array in place; np.sort() instead returns a new sorted array and leaves the original unchanged
print(arr)      # [1 2 2 4 4 6]
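
The difference between the two sorting styles can be checked side by side. The sketch below also shows np.argsort, which returns the ordering as indices instead of sorted values; sorted_copy and order are illustrative names.

```python
import numpy as np

arr = np.array([2, 6, 4, 2, 1, 4])

sorted_copy = np.sort(arr)   # new array; arr is untouched
print(arr.tolist())          # [2, 6, 4, 2, 1, 4]
print(sorted_copy.tolist())  # [1, 2, 2, 4, 4, 6]

order = np.argsort(arr)      # indices that would sort arr
print(np.array_equal(arr[order], sorted_copy))  # True

arr.sort()                   # in place; the method itself returns None
print(arr.tolist())          # [1, 2, 2, 4, 4, 6]
```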

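Example: what is the 25% quantile of a set of data?

# Generate some data
arr = np.random.randn(20).reshape(4,5)
# 1. Flatten it to a one-dimensional array
arr = arr.flatten()
# 2. Sort it in place
arr.sort()
# 3. Take the element 25% of the way through the sorted data
value = arr[int(0.25*len(arr))] # approximate 25% quantile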
print(arr)
'''
[-1.8819284  -1.84223613 -1.55037549 -1.19713841 -0.91661269 -0.69222229
 -0.6796624  -0.65882803 -0.55325753 -0.34502426 -0.1197655   0.36925446
  0.5343373   0.62780224  0.74335279  0.82012463  1.00546263  1.08559715
  1.29212188  1.47629451]
  '''
print(value)    # -0.69222229
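
NumPy also provides np.percentile and np.quantile, which do the flattening and sorting internally and interpolate linearly between neighbouring values by default, so their answer can differ slightly from the index-based estimate above. A sketch with seeded random data (the seed and the names data, q25, ordered, and by_index are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)       # seeded so the sketch is repeatable
data = rng.standard_normal(20)

q25 = np.percentile(data, 25)         # percent scale, 0 to 100
same = np.quantile(data, 0.25)        # fraction scale, 0 to 1
print(np.isclose(q25, same))          # True

# Index-based estimate from the text, for comparison
ordered = np.sort(data)
by_index = ordered[int(0.25 * len(ordered))]

# With 20 values the interpolated quantile lies between elements 4 and 5
print(ordered[4] <= q25 <= by_index)  # True
```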
  • all, any

    all: returns True if every element is True, otherwise False
    any: returns True if at least one element is True, otherwise False

arr = np.array([True,False,True,True,False])
arr.all()    # False
arr.any()    # True
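
Like the statistical methods, all and any accept an axis argument, and they also work on non-boolean arrays, where any non-zero value counts as True. A small sketch with a hypothetical 2-D array named flags:

```python
import numpy as np

flags = np.array([[True, False, True],
                  [True, True, True]])

print(flags.all(axis=1))  # [False  True]   is every value in each row True?
print(flags.any(axis=0))  # [ True  True  True]   is any value in each column True?

# Non-boolean input: zero is treated as False, anything else as True
print(np.array([0, 0, 3]).any())  # True
print(np.array([1, 0, 3]).all())  # False
```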

Statistics on boolean arrays

arr = np.random.randn(100)
(arr>0).sum()      # number of positive values (True counts as 1)
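
Because True counts as 1 and False as 0, the same trick gives proportions as well as counts. The sketch below uses seeded data (an assumption for repeatability) and compares the sum with np.count_nonzero; positive and n_pos are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(1)        # seeded so the sketch is repeatable
arr = rng.standard_normal(100)

positive = arr > 0
n_pos = positive.sum()                # count of True values

print(n_pos == np.count_nonzero(positive))            # True
print(np.isclose(positive.mean(), n_pos / arr.size))  # True, the fraction of positive values
```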
