NumPy: Advanced Topics

1. Mathematical and statistical methods

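import numpy as np
a = np.array([-1, 2.1, 0.2, 2.6, 9.1])
b = np.arange(1, 6)  # [1 2 3 4 5]
sum	Sum of all elements, or of the elements along one axis (axis=0 adds down each column, axis=1 adds across each row)	np.sum(a) or a.sum()	13.0

mean	Arithmetic mean; the mean of a zero-length array is NaN	np.mean(a) or a.mean()	2.6

average	Weighted average (weights are passed via weights=); with equal weights it is the arithmetic mean	np.average(a)	2.6

median	Median: the middle value of the sorted data, or the mean of the two middle values when the length is even	np.median(a)	2.1

std, var	Standard deviation and variance. The divisor is n - ddof: the default ddof=0 divides by n, and ddof=1 gives the unbiased sample variance	a.std(), a.var()	3.4991427521608776, 12.244

cov	Covariance matrix (divisor n-1 by default)	np.cov(a, b) == np.cov(np.vstack((a, b)))	[[15.305  5.175], [ 5.175  2.5  ]]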
'''
The main diagonal holds the variance of each sample (unbiased, divisor n-1); the off-diagonal entries hold the covariance between pairs of samples.
    import random
    vect_1 = [random.randint(0, 5) for i in range(10)]
    vect_2 = [random.randint(0, 5) for i in range(10)]
    vect_3 = [random.randint(0, 5) for i in range(10)]
    cov = np.cov((vect_1, vect_2, vect_3))
    print(cov)
    [[ 3.12222222 -0.41111111 -0.77777778]
     [-0.41111111  3.43333333  0.33333333]
     [-0.77777778  0.33333333  3.11111111]]
    ----------------------------------------------
    for i in range(len(cov)):
        for n in range(len(cov[i])):
            if i == n:
                print('Variance of data set {}: {}'.format(i+1, cov[i][n]))
            elif n > i:
                print('Covariance of data sets {} and {}: {}'.format(i+1, n+1, cov[i][n]))
    Variance of data set 1: 3.1222222222222222
    Covariance of data sets 1 and 2: -0.4111111111111112
    Covariance of data sets 1 and 3: -0.7777777777777777
    Variance of data set 2: 3.433333333333333
    Covariance of data sets 2 and 3: 0.3333333333333333
    Variance of data set 3: 3.1111111111111107
'''
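
To make the structure of the covariance matrix concrete, the following minimal sketch checks the entries of np.cov against hand-computed values for the arrays a and b used in this section. The variable names c and manual are illustrative only.

```python
import numpy as np

a = np.array([-1, 2.1, 0.2, 2.6, 9.1])
b = np.arange(1, 6)  # [1 2 3 4 5]

c = np.cov(a, b)  # 2x2 covariance matrix, divisor n-1 by default

# Covariance of a and b computed by hand with the same n-1 divisor
manual = ((a - a.mean()) * (b - b.mean())).sum() / (len(a) - 1)

print(np.isclose(c[0, 0], a.var(ddof=1)))         # True: diagonal = sample variance
print(np.isclose(c[0, 1], manual))                # True: off-diagonal = covariance
print(np.allclose(c, c.T))                        # True: the matrix is symmetric
print(np.allclose(c, np.cov(np.vstack((a, b)))))  # True: both call forms agree
```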

corrcoef	Pearson correlation coefficients (the normalization cancels out, so the result does not depend on ddof)	np.corrcoef(a, b) == np.corrcoef(np.vstack((a, b)))
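'''
The off-diagonal entries are the correlation coefficients between pairs of samples (sample output from one random run).
relation = np.corrcoef((vect_1, vect_2, vect_3))
print(relation)
for i in range(len(relation)):
    for n in range(len(relation[i])):
        if n > i:
            print('Correlation coefficient of data sets {} and {}: {}'.format(i+1, n+1, relation[i][n]))
[[ 1.          0.62695314 -0.23393668]
 [ 0.62695314  1.         -0.16187993]
 [-0.23393668 -0.16187993  1.        ]]
Correlation coefficient of data sets 1 and 2: 0.6269531370115066
Correlation coefficient of data sets 1 and 3: -0.23393667553251235
Correlation coefficient of data sets 2 and 3: -0.16187992962181655
'''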
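
The correlation matrix is the covariance matrix rescaled by the standard deviations. The sketch below rebuilds np.corrcoef from np.cov for the arrays a and b of this section; the names std and r_manual are illustrative only.

```python
import numpy as np

a = np.array([-1, 2.1, 0.2, 2.6, 9.1])
b = np.arange(1, 6)  # [1 2 3 4 5]

c = np.cov(a, b)
std = np.sqrt(np.diag(c))          # standard deviations taken from the diagonal
r_manual = c / np.outer(std, std)  # r_ij = cov_ij / (std_i * std_j)

r = np.corrcoef(a, b)
print(np.allclose(r, r_manual))      # True
print(np.allclose(np.diag(r), 1.0))  # True: each sample correlates perfectly with itself
```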

min, max	Minimum and maximum values	a.min(), a.max()	-1.0, 9.1
	
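argmin, argmax	Indices of the smallest and the largest element	a.argmin(), a.argmax()	0, 4

diff	diff(a, n=1, axis=-1): difference between each element and the one before it; n is the number of times the differencing is repeated, and axis selects the direction for multidimensional arrays	np.diff(a)	[ 3.1 -1.9 2.4 6.5]

cumsum	Cumulative sum of the elements (returns an array)	a.cumsum()	[-1. 1.1 1.3 3.9 13. ]

cumprod	Cumulative product of the elements (returns an array)	np.cumprod(a)	[-1. -2.1 -0.42 -1.092 -9.9372]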
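
The axis and ddof parameters mentioned in the table are easy to mix up, so here is a small sketch of both. The 2-D array m is a made-up example, not part of the original notes.

```python
import numpy as np

a = np.array([-1, 2.1, 0.2, 2.6, 9.1])
m = np.arange(6).reshape(2, 3)  # [[0 1 2], [3 4 5]]

col_sums = m.sum(axis=0)  # collapses the rows: one sum per column
row_sums = m.sum(axis=1)  # collapses the columns: one sum per row
print(col_sums)           # [3 5 7]
print(row_sums)           # [ 3 12]

pop_var = a.var()            # divisor n (ddof=0, the default)
sample_var = a.var(ddof=1)   # divisor n - 1
print(np.isclose(sample_var, pop_var * len(a) / (len(a) - 1)))  # True

steps = np.diff(a)         # first differences
second = np.diff(a, n=2)   # differencing applied twice
print(np.allclose(second, np.diff(steps)))  # True
```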

--------------------------------------------------------------------------------
nanmedian(a[, axis, out, overwrite_input, …])	Median, ignoring NaN values
nanmean(a[, axis, dtype, out, keepdims])	Arithmetic mean, ignoring NaN values
nanstd(a[, axis, dtype, out, ddof, keepdims])	Standard deviation, ignoring NaN values
nanvar(a[, axis, dtype, out, ddof, keepdims])	Variance, ignoring NaN values
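
A short sketch of how the nan* variants differ from the plain functions. The array x is a made-up example.

```python
import numpy as np

x = np.array([1.0, 2.0, np.nan, 4.0])

print(np.mean(x))       # nan: a single NaN poisons the plain mean
print(np.nanmean(x))    # mean of [1, 2, 4], about 2.33
print(np.nanmedian(x))  # 2.0

# The nan* functions behave as if the NaN entries had been dropped first
clean = x[~np.isnan(x)]
print(np.isclose(np.nanvar(x), clean.var()))                # True
print(np.isclose(np.nanstd(x, ddof=1), clean.std(ddof=1)))  # True
```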

2. Unary functions (these also work on pandas DataFrames)

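a = np.array([-1, 2.1, 0.2, 2.6, 9.1])
b = np.arange(1, 6)  # [1 2 3 4 5]
abs, fabs	Absolute value of integer, floating-point, or complex elements; for non-complex data the faster fabs can be used	np.abs(a)	[1. 2.1 0.2 2.6 9.1]

sqrt	Square root of each element, equivalent to arr**0.5	np.sqrt(b)	[1. 1.41421356 1.73205081 2. 2.23606798]

square	Square of each element, equivalent to arr**2	np.square(b)	[ 1 4 9 16 25]

exp	Exponential e**x of each element

log, log10, log2, log1p	Natural logarithm (base e), base-10 logarithm, base-2 logarithm, and log(1+x)

sign	Sign of each element: 1 (positive), 0 (zero), -1 (negative)	np.sign(a)	[-1. 1. 1. 1. 1.]

ceil	Round up to the nearest integer	np.ceil(a)	[-1. 3. 1. 3. 10.]

floor	Round down to the nearest integer	np.floor(a)	[-1. 2. 0. 2. 9.]

rint	Round to the nearest integer (ties go to the even neighbour), keeping the dtype	np.rint(a)	[-1. 2. 0. 3. 9.]

modf	Return the fractional and integer parts of each element as two separate arrays	np.modf(a)	(array([-0. , 0.1, 0.2, 0.6, 0.1]), array([-1., 2., 0., 2., 9.]))

nonzero	Return the indices of all non-zero elements as a tuple of index arrays, one per dimension (row indices and column indices for a 2-D array)	np.nonzero(a)	(array([0, 1, 2, 3, 4]),)

clip	Limit the elements to the interval [min, max]: values below min become min, values above max become max	np.clip(a, 0, 5), equivalent to a.clip(0,5)	[0. 2.1 0.2 2.6 5. ]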
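
A few of the unary functions above have behaviour that is easy to misread from a one-line description. The sketch below illustrates it; the arrays halves and m are made-up examples.

```python
import numpy as np

a = np.array([-1, 2.1, 0.2, 2.6, 9.1])

# rint rounds exact halves to the nearest even integer, not always upward
halves = np.array([0.5, 1.5, 2.5])
print(np.rint(halves))  # values 0, 2, 2

# modf: the fractional and integer parts add back up to the input
frac, whole = np.modf(a)
print(np.allclose(frac + whole, a))  # True

# clip is the same as taking a maximum with the lower bound, then a minimum with the upper
clipped = np.clip(a, 0, 5)
print(np.array_equal(clipped, np.minimum(np.maximum(a, 0), 5)))  # True

# nonzero on a 2-D array returns one index array per dimension
m = np.array([[0, 3], [4, 0]])
rows, cols = np.nonzero(m)
print(rows, cols)     # [0 1] [1 0]
print(m[rows, cols])  # [3 4]
```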

3. Binary functions

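a = np.array([-1, 2.1, 0.2, 2.6, 9.1])
b = np.arange(1, 6)  # [1 2 3 4 5]
add	Element-wise addition	np.add(a,b), equivalent to a+b	[ 0. 4.1 3.2 6.6 14.1]

subtract	Subtract the second array from the first	np.subtract(a,b), equivalent to a-b	[-2. 0.1 -2.8 -1.4 4.1]

multiply	Element-wise multiplication	np.multiply(a,b), equivalent to a*b	[-1. 4.2 0.6 10.4 45.5]

divide, floor_divide	Division, or division followed by rounding down	np.divide(a,b), equivalent to a/b; np.floor_divide(a,b), equivalent to np.floor(a/b)

power	Element-wise a raised to the power b, like pow(a,b)	np.power(a,b)

maximum, fmax	Element-wise maximum; maximum propagates NaN, fmax ignores NaN	np.maximum(a,b), np.fmax(a,b)

minimum, fmin	Element-wise minimum; minimum propagates NaN, fmin ignores NaN	np.minimum(a,b), np.fmin(a,b)

mod	Element-wise modulo (remainder)	np.mod(a,b)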
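
The following sketch illustrates the rows above that have no sample output: the relationship between floor_divide and mod, and the NaN handling of maximum versus fmax. The arrays x and y are made-up examples.

```python
import numpy as np

a = np.array([-1, 2.1, 0.2, 2.6, 9.1])
b = np.arange(1, 6)  # [1 2 3 4 5]

q = np.floor_divide(a, b)  # same as a // b
r = np.mod(a, b)           # same as a % b
print(np.array_equal(q, np.floor(a / b)))  # True for these values
print(np.allclose(q * b + r, a))           # True: quotient and remainder rebuild a

# maximum propagates NaN, fmax returns the non-NaN operand
x = np.array([1.0, np.nan])
y = np.array([2.0, 0.0])
print(np.maximum(x, y))  # second element is nan
print(np.fmax(x, y))     # second element is 0.0
```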

4. Set operations

s0 = np.array([1,2,3,2,1,4,5,2])    # [1 2 3 2 1 4 5 2]
s1 = np.arange(0,30,2)  # [ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28]
s2 = np.arange(0,30,3)  # [ 0  3  6  9 12 15 18 21 24 27]

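unique(x)	Unique elements of x, returned sorted	np.unique(s0)	[1 2 3 4 5]

intersect1d(x,y)	Intersection, returned sorted	np.intersect1d(s1,s2)	[ 0 6 12 18 24]

union1d(x,y)	Union, returned sorted	np.union1d(s1,s2)	[ 0 2 3 4 6 8 9 10 12 14 15 16 18 20 21 22 24 26 27 28]

setdiff1d(x,y)	Set difference: elements that are in x but not in y	np.setdiff1d(s1,s2)	[ 2 4 8 10 14 16 20 22 26 28]

setxor1d(x,y)	Symmetric difference: elements that are in exactly one of x and y	np.setxor1d(s1,s2)	[ 2 3 4 8 9 10 14 15 16 20 21 22 26 27 28]

in1d(x,y)	Boolean array telling, for each element of x, whether it is contained in y (np.isin is the replacement in newer NumPy versions)	np.in1d(s2,s1)	[ True False True False True False True False True False]

np.hstack: stacks arrays horizontally (column-wise) into a new array

np.vstack: stacks arrays vertically (row-wise) into a new array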
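
The set functions relate to one another in the usual way, which the sketch below checks on s1 and s2. It uses np.isin for the membership test, and the slices passed to hstack and vstack are arbitrary illustrative inputs.

```python
import numpy as np

s0 = np.array([1, 2, 3, 2, 1, 4, 5, 2])
s1 = np.arange(0, 30, 2)
s2 = np.arange(0, 30, 3)

common = np.intersect1d(s1, s2)
only_s1 = np.setdiff1d(s1, s2)
only_s2 = np.setdiff1d(s2, s1)

# Symmetric difference = union of the two one-sided differences
print(np.array_equal(np.setxor1d(s1, s2), np.union1d(only_s1, only_s2)))  # True

# A membership mask can be used directly as a boolean index
mask = np.isin(s2, s1)
print(np.array_equal(s2[mask], common))  # True

# unique can also report how often each value occurs
values, counts = np.unique(s0, return_counts=True)
print(values)  # [1 2 3 4 5]
print(counts)  # [2 3 1 1 1]

h = np.hstack((s2[:3], s2[3:6]))  # 1-D inputs are joined end to end
v = np.vstack((s2[:3], s2[3:6]))  # 1-D inputs become the rows of a 2-D array
print(h.shape, v.shape)           # (6,) (2, 3)
```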

5. Expressing conditional logic as array operations

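import numpy as np

arr = np.random.randint(1, 100, size=16).reshape(4, 4)

def condition(arr_1):
    # Nested list of booleans: True where the element is even
    con = [[bool(i % 2 == 0) for i in n] for n in arr_1]
    print(con)
    return con

# np.where(cond, x, y) takes x where cond is True and y elsewhere
result = np.where(condition(arr), 0, 1)
print(result)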
'''
[[True, True, True, False], [False, True, True, False], [False, True, False, False], [True, True, False, True]]
'''
'''
[[0 0 0 1]
 [1 0 0 1]
 [1 0 1 1]
 [0 0 1 0]]
'''
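
The nested list comprehension above can be replaced by a boolean array expression, which is the more common way to feed np.where. A minimal sketch follows; the seeded generator rng and the threshold 50 are assumptions added for reproducibility and illustration.

```python
import numpy as np

rng = np.random.default_rng(0)            # seeded so the run is repeatable
arr = rng.integers(1, 100, size=(4, 4))   # integers in [1, 100)

mask = arr % 2 == 0            # boolean array, True where the element is even
result = np.where(mask, 0, 1)  # 0 for even entries, 1 for odd entries
print(np.array_equal(result, arr % 2))  # True for positive integers

# The two branches may mix scalars and arrays
capped = np.where(arr > 50, 50, arr)    # replace values above 50 with 50
print(np.array_equal(capped, np.minimum(arr, 50)))  # True
```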

6. Other methods

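arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
'''Diagonal: diagonal()'''
Parameter:
offset: a positive value selects a diagonal above (to the right of) the main diagonal, a negative value one below (to the left of) it
arr.diagonal()   >>>[1 5 9]

'''Sum of the diagonal elements: trace()'''
arr.trace()   >>>15

'''Missing-value check: np.isnan()'''
import numpy as np
array = np.array([1, 2, np.nan])
np.isnan(array)  >>>[False False  True]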
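
A short sketch of the offset parameter and of using the np.isnan mask to count or drop missing values. The names data and missing are illustrative only.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr.diagonal())           # [1 5 9]  main diagonal
print(arr.diagonal(offset=1))   # [2 6]    one step above the main diagonal
print(arr.diagonal(offset=-1))  # [4 8]    one step below the main diagonal
print(arr.trace())              # 15       sum of the main diagonal

data = np.array([1, 2, np.nan])
missing = np.isnan(data)
print(missing.sum())    # 1: number of NaN entries
print(data[~missing])   # [1. 2.]: the entries that are not NaN
```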
