【Python Data Analysis】NumPy Knowledge Summary (Comprehensive)


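NumPy (Numerical Python) is an extension library for Python. It supports large multi-dimensional arrays and matrix operations, and it also provides a large library of mathematical functions that operate on arrays.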

First, import the package at the top of the file: import numpy as np

Creating arrays

Creating from a list

import random
import numpy as np

a = [random.uniform(100.0, 200.0) for i in range(100)]  # 100 random floats between 100 and 200
arr = np.array(a)

Creating directly with np.array

arr = np.array([[[1, 2]], [[2, 3]], [[3, 4]], [[4, 5]]])

Creating a 2-D array with reshape

x = np.arange(10).reshape((2, 5))
print(x)
print(x[1, 4])
print(x.shape[0])
# [[0 1 2 3 4]
#  [5 6 7 8 9]]
# 9
# 2

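A 2-D array can be indexed in two ways: a[x][y] or a[x, y]. The second form is usually preferred, because it indexes in a single step and also allows slicing on every axis (for example a[:, 4]).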
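A short sketch comparing the two forms on the same reshaped array as above. Both read the same element, but the chained form indexes twice, so it cannot slice along the second axis.

```python
import numpy as np

x = np.arange(10).reshape((2, 5))

v1 = x[1][4]  # two steps: x[1] returns row 1 as a 1-D view, then [4] indexes it
v2 = x[1, 4]  # one step: a tuple with one index per axis
print(v1, v2)  # 9 9

col = x[:, 4]  # the tuple form can slice axis 0 and index axis 1
print(col)  # [4 9]

# With chaining, x[:] is still the whole 2-D array, so [1] picks row 1, not column 1
not_col = x[:][1]
print(not_col)  # [5 6 7 8 9]
```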

Array attributes

shape attribute

Gives the size of each dimension

arr = np.array([[[1, 2]], [[2, 3]], [[3, 4]], [[4, 5]]])
print(arr.shape)
# (4, 1, 2)

size attribute

Gives the total number of elements in the array

arr = np.array([[[1, 2]], [[2, 3]], [[3, 4]], [[4, 5]]])
print(arr.size)
# 8

ndim attribute

The number of dimensions, i.e. how many axes the array has

arr.ndim  # 3
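The three attributes are tied together. A minimal check on the same 3-D array: ndim is the length of shape, and size is the product of the entries in shape.

```python
import numpy as np

arr = np.array([[[1, 2]], [[2, 3]], [[3, 4]], [[4, 5]]])

# ndim is the number of entries in shape
assert arr.ndim == len(arr.shape)
# size is the product of the entries in shape: 4 * 1 * 2 = 8
assert arr.size == int(np.prod(arr.shape))
print(arr.shape, arr.ndim, arr.size)  # (4, 1, 2) 3 8
```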

T attribute

Transposes the array

arr.T  # reverses the order of all axes: shape (4, 1, 2) becomes (2, 1, 4)
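A sketch of what `.T` does to the axes, next to `np.transpose` with an explicit axis order (here assumed to be (1, 0, 2), which swaps only the first two axes). For a 2-D array `.T` is the ordinary matrix transpose.

```python
import numpy as np

arr = np.array([[[1, 2]], [[2, 3]], [[3, 4]], [[4, 5]]])

t = arr.T                          # reverses the axis order
p = np.transpose(arr, (1, 0, 2))   # custom order: swap axes 0 and 1 only
print(t.shape)  # (2, 1, 4)
print(p.shape)  # (1, 4, 2)

# Element (i, j, k) of arr ends up at (k, j, i) in arr.T
print(t[1, 0, 3] == arr[3, 0, 1])  # True

m = np.arange(6).reshape((2, 3))
print(m.T.shape)  # (3, 2)
```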

Special arrays

np.zeros()

Creates an array of all zeros; the dtype argument sets the element type

z = np.zeros(10)
print(z)  # float by default: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
z1 = np.zeros(10, dtype='int64')
print(z1)  # [0 0 0 0 0 0 0 0 0 0]

np.ones()

Creates an array of all ones

one = np.ones(10, dtype='int64')
print(one)  # [1 1 1 1 1 1 1 1 1 1]

np.empty()

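Creates an uninitialized array

e = np.empty(100)
print(e)  # the contents are whatever happened to be in memory, not zeros; fill the array before reading it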

np.arange()

Creates an array of values in a range (the stop value is excluded)

aa = np.arange(10)
# [0 1 2 3 4 5 6 7 8 9]

bb = np.arange(0, 10, 3)  # start, stop (excluded) and step
# [0 3 6 9]

np.eye()

Creates an identity matrix

ee = np.eye(10)
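A small sketch of the identity property: multiplying a matrix by `np.eye` leaves it unchanged. The `k` parameter of `np.eye` shifts the ones to an off-diagonal.

```python
import numpy as np

I = np.eye(3)                       # 3x3 identity, float64 by default
A = np.arange(9).reshape((3, 3))

print(np.array_equal(A @ I, A))     # True (the product is float, but the values match)
print(np.eye(3, k=1))               # ones on the diagonal just above the main one
```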

Slicing arrays

How slicing differs between `array` and `list`

a = list(range(10))
b = np.arange(10)
a1 = a[0:4]
b1 = b[0:4]
print("a_list:{}".format(a))
print("b_array:{}".format(b))
print("a1_list:{}".format(a1))
print("b1_array:{}".format(b1))
print("After modifying a1[0] and b1[0]:")
a1[0] = 11
b1[0] = 11
print("a_list after modification:{}".format(a))  # the list is unchanged: list slicing builds a new list (a shallow copy)
print("b_array after modification:{}".format(b))  # the array is changed
# For an ndarray, slicing returns a view that shares memory with the original array; no data is copied

a_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b_array:[0 1 2 3 4 5 6 7 8 9]
a1_list:[0, 1, 2, 3]
b1_array:[0 1 2 3]
After modifying a1[0] and b1[0]:
a_list after modification:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b_array after modification:[11  1  2  3  4  5  6  7  8  9]

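Points to note

a = b: nothing is copied; a and b are the same object, so changes made through either name affect the other.

a = b[:]: slicing creates a new array object a, but it is a view of b's data, so element changes in one show up in the other.

a = b.copy(): the data is copied, so the two are independent. This acts like a deep copy for numeric arrays; for arrays of dtype object, only the references to the Python objects are copied.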
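The three cases above can be checked directly. A minimal sketch using `is`, the `.base` attribute of a view, and `np.shares_memory`:

```python
import numpy as np

b = np.arange(5)

a_same = b         # no copy: another name for the same object
a_view = b[:]      # new array object that views b's data
a_copy = b.copy()  # independent data

b[0] = 100
print(a_same[0], a_view[0], a_copy[0])  # 100 100 0

print(a_same is b)                      # True
print(a_view is b, a_view.base is b)    # False True
print(np.shares_memory(a_copy, b))      # False
```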

Slicing 2-D arrays

a = np.arange(15).reshape((3, 5))
print(a)
# [[ 0  1  2  3  4]
#  [ 5  6  7  8  9]
#  [10 11 12 13 14]]
print(a[0:2, 0:2])
# [[0 1]
#  [5 6]]
print(a[0:2][0:2])
# [[0 1 2 3 4]
#  [5 6 7 8 9]]
# Only the first form works: a[0:2][0:2] slices the rows twice and never touches the columns
print(a[1:, 2:4])
# [[ 7  8]
#  [12 13]]
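A quick sketch of why the chained form fails: the second `[0:2]` is applied to the rows of the already-sliced array, so the shape stays (2, 5). Chaining only works if the second step uses the tuple form for the columns.

```python
import numpy as np

a = np.arange(15).reshape((3, 5))

chained = a[0:2][0:2]  # rows 0-1, then rows 0-1 of that result again
tupled = a[0:2, 0:2]   # rows 0-1 and columns 0-1 in a single step
print(chained.shape, tupled.shape)  # (2, 5) (2, 2)

# Chaining with an explicit column slice in the second step gives the same block
print(np.array_equal(a[0:2][:, 0:2], tupled))  # True
```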

Array indexing

Boolean indexing

Use a boolean expression as the index

a = np.arange(10)
b = a[a > 5]
print(b)
# [6 7 8 9]

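Under the hood, an expression like a > 5 produces a boolean array of the same shape, and indexing with it keeps the elements at the True positions: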

c = np.array([1, 2, 3, 4])
d = [True, True, False, True]
print(c[d])  # [1 2 4]

print(a > 5)  # [False False False False False False  True  True  True  True]
print(a[a > 5])  # [6 7 8 9]

Select the even numbers in a that are greater than 5

b = a[(a > 5) & (a % 2 == 0)]  # parentheses are required because & binds tighter than > and ==; use the element-wise operator &, not Python's and
print(b)
# [6 8]

Select the elements of a that are greater than 5 or even

c = a[(a > 5) | (a % 2 == 0)]
print(c)  # [0 2 4 6 7 8 9]
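A sketch of the operator rules above. `np.logical_and` is the function form of `&` for boolean arrays and sidesteps the precedence issue; Python's `and` needs a single truth value, so it raises `ValueError` on multi-element arrays.

```python
import numpy as np

a = np.arange(10)

even_big = a[(a > 5) & (a % 2 == 0)]
even_big2 = a[np.logical_and(a > 5, a % 2 == 0)]  # same mask, no precedence trap
print(even_big, even_big2)  # [6 8] [6 8]

try:
    (a > 5) and (a % 2 == 0)  # `and` calls bool() on a 10-element array
except ValueError as exc:
    print("ValueError:", exc)
```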

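Fancy indexing

Passing a list of integers as the index selects the values at those positions.

Fancy indexing can also be combined with slicing and boolean indexing.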

a = np.arange(20)
print(a[[1, 3, 5, 7, 9]])  # [1 3 5 7 9]

a = np.arange(20).reshape((4, 5))
print(a)
# [[ 0  1  2  3  4]
#  [ 5  6  7  8  9]
#  [10 11 12 13 14]
#  [15 16 17 18 19]]
print(a[0, 2:4])  # [2 3]
print(a[0, a[0] > 1])  # [2 3 4]

# Build a new 2-D array from selected rows and columns of a
b = a[[1, 3], :]  # first take rows 1 and 3
c = b[:, [1, 3]]  # then take columns 1 and 3
print(c)
# [[ 6  8]
#  [16 18]]
print(a[[1, 3], :][:, [1, 3]])  # same result as above

print(a[[0, 2], :][:, [0, 1, 3, 4]])
# [[ 0  1  3  4]
#  [10 11 13 14]]
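One pitfall worth a sketch: passing two index lists at once, as in `a[[1, 3], [1, 3]]`, pairs them element by element instead of selecting a block. `np.ix_` builds the row/column combination in a single indexing step, giving the same block as the two-step version above.

```python
import numpy as np

a = np.arange(20).reshape((4, 5))

pairs = a[[1, 3], [1, 3]]              # picks a[1, 1] and a[3, 3] only
print(pairs)  # [ 6 18]

block = a[np.ix_([1, 3], [1, 3])]      # every combination of rows {1, 3} and columns {1, 3}
print(block)
# [[ 6  8]
#  [16 18]]
```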

Important NumPy functions

`np.isnan`

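NaN (not a number) is a special floating-point value; its type is <class 'float'>.
Two NaNs never compare equal: np.nan == np.nan ==> False
np.nan != np.nan ==> True

# Use np.isnan(a) to test for NaN
a = np.arange(5)
a = a / a  # 0 / 0 produces nan (NumPy also emits a RuntimeWarning)
print(a)  # [nan  1.  1.  1.  1.]
print(np.isnan(a))  # [ True False False False False]
# Remove the NaNs from an array
b = a[~np.isnan(a)]  # [1. 1. 1. 1.]
# Count the NaNs: NaN is the only value that is not equal to itself
np.count_nonzero(a != a)  # 1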
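A sketch of the same NaN operations on an explicitly built array, which avoids the 0/0 warning. Summing the boolean mask from `np.isnan` is an equivalent way to count, since True counts as 1.

```python
import numpy as np

a = np.array([np.nan, 1.0, 1.0, np.nan, 2.0])

mask = np.isnan(a)
print(mask.sum())                # 2
print(np.count_nonzero(a != a))  # 2
print(a[~mask])                  # [1. 1. 2.]
print(np.nan == np.nan)          # False
```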

`np.isinf`

inf means infinity and is also a float; np.isinf is used the same way as np.isnan.

a = np.array([1, 1, 2, 3, 4])
b = np.array([0, 1, 2, 1, 0])
c = a / b
print(c)  # [inf  1.  1.  3. inf]
# Remove inf
print(c[~(np.isinf(c))])  # [1. 1. 3.]
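A sketch of two related tools: `np.errstate` silences the divide-by-zero warning for one block, and `np.isfinite` is False for inf, -inf and nan alike, so one mask removes all of them.

```python
import numpy as np

a = np.array([1, 1, 2, 3, 4])
b = np.array([0, 1, 2, 1, 0])

with np.errstate(divide="ignore", invalid="ignore"):  # suppress warnings here only
    c = a / b
print(c)                  # [inf  1.  1.  3. inf]

print(c[np.isfinite(c)])  # [1. 1. 3.]
```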

`np.random`

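Generating random numbers

t = np.random.rand(2,4)  # a single number, or an array of the given dimensions, uniformly distributed in [0, 1)
# [[0.90817816 0.75715337 0.64737834 0.79045973]
#  [0.80215137 0.04848201 0.66005689 0.32470002]]
t = np.random.randn(4)  # an array of the given dimensions drawn from the standard normal distribution (mean 0, standard deviation 1)
# [ 0.33196007 -1.59954454 -1.22863283  0.49101622]
a = np.random.random()  # a single float in [0, 1)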

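Generating integers or integer arrays in a range

np.random.randint(0, 10)  # a single int in [0, 10); the upper bound is excluded
a = np.random.randint(0, 10, 10)  # 10 random ints from 0 to 9, 1-D
# [9 0 2 3 2 0 8 9 2 6]
a = np.random.randint(0, 10, (2, 5))  # the last argument is the shape, here 2-D
# [[4 8 7 0 3]
#  [7 6 8 1 8]]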

Generating uniformly distributed arrays

a = np.random.uniform(2.0, 10.0, 10)
print(a)
# [3.10070825 2.54493484 8.07038208 6.74178579 2.9491971  9.9813392 3.58365099 8.4720269  4.73902394 6.50748841]
a = np.random.uniform(2.0, 10.0, (2, 5))
print(a)
# [[6.86870706 8.48767828 3.35503304 2.35793278 6.05281056]
#  [9.67359048 3.16650548 7.81726608 2.72933486 2.22826293]]

Random seed

# Computers generate pseudo-random numbers; once a seed is set, the same sequence of random numbers is produced on every run.
np.random.seed(4)
t = np.random.randint(0, 10, (2, 5))
print(t)
# [[7 5 1 8 7]
#  [8 2 9 7 7]]
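A sketch showing that resetting the seed replays the same draws. It also shows `np.random.default_rng`, the Generator-based API added in newer NumPy versions (1.17+), which keeps its seeded state in an object instead of a global; its draws are not the same numbers as the legacy `np.random.seed` sequence.

```python
import numpy as np

np.random.seed(4)
first = np.random.randint(0, 10, (2, 5))
np.random.seed(4)                      # reset to the same seed
second = np.random.randint(0, 10, (2, 5))
print(np.array_equal(first, second))   # True

rng = np.random.default_rng(4)         # Generator with its own seeded state
g = rng.integers(0, 10, size=(2, 5))   # ints in [0, 10), like randint
print(g)
```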

Randomly choosing elements to form a new array

a = np.random.choice([1, 2, 3, 4, 5], 10)  # the second argument is the size (shape) of the result
print(a)  # [2 2 2 2 1 1 3 5 3 3]
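By default `np.random.choice` samples with replacement, which is why values repeat above. A sketch of its `replace` and `p` parameters; the probability list here is an arbitrary example that never picks 5.

```python
import numpy as np

np.random.seed(0)
pool = [1, 2, 3, 4, 5]

with_rep = np.random.choice(pool, 10)                 # values may repeat
no_rep = np.random.choice(pool, 5, replace=False)     # each value at most once
weighted = np.random.choice(pool, 8, p=[0.25, 0.25, 0.25, 0.25, 0.0])  # 5 has probability 0
print(with_rep, no_rep, weighted)
```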

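Other functions

a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])  # an example second array to compare against
print(np.maximum(a, b))  # element-wise maximum: [5 4 3 4 5]
print(np.minimum(a, b))  # element-wise minimum: [1 2 3 2 1]
# For a 2-D array, pass axis to reduce along a particular axis
t1 = np.arange(0, 20).reshape((4, 5))
print(t1.sum(axis=0))  # axis=0 sums down the rows, giving one total per column
# [30 34 38 42 46]
print(t1.sum(axis=1))  # axis=1 sums across the columns, giving one total per row
# [10 35 60 85]
print(a.max())  # 5
print(a.min())  # 1
print(a.mean())  # 3.0
print(a.sum())  # 15
print(a.argmax())  # 4 (index of the maximum)
print(a.argmin())  # 0 (index of the minimum)
print(a.std())  # 1.4142135623730951 (population standard deviation)
print(a.var())  # 2.0 (population variance)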
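A sketch of how `axis` changes the result shape, plus the `keepdims=True` option, which keeps the reduced axis with length 1 so the result broadcasts back against the original array (here used to subtract each row's mean from that row).

```python
import numpy as np

t1 = np.arange(0, 20).reshape((4, 5))

col_sums = t1.sum(axis=0)                   # shape (5,): one total per column
row_sums = t1.sum(axis=1)                   # shape (4,): one total per row
row_means = t1.mean(axis=1, keepdims=True)  # shape (4, 1)
centered = t1 - row_means                   # each row minus its own mean

print(col_sums.shape, row_sums.shape, row_means.shape)  # (5,) (4,) (4, 1)
print(centered[0])  # [-2. -1.  0.  1.  2.]
```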

a = np.arange(-5, 5, 0.6)
print(a)
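print(np.floor(a))  # round down (toward negative infinity)
print(np.ceil(a))  # round up (toward positive infinity)
print(np.rint(a))  # round to the nearest integer; exact halves go to the even neighbor
print(np.round(a))  # with 0 decimals, same rule as rint: halves go to the even neighbor
print(np.trunc(a))  # drop the fractional part (round toward zero)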
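The differences are easiest to see on a few hand-picked values, including a negative number and exact halves. Note that `np.round` sends 0.5 to 0 and 2.5 to 2 (ties to even), unlike the schoolbook "round half up".

```python
import numpy as np

x = np.array([-1.5, 0.5, 1.5, 2.5, 3.7])

print(np.floor(x))  # [-2.  0.  1.  2.  3.]  toward negative infinity
print(np.ceil(x))   # [-1.  1.  2.  3.  4.]  toward positive infinity
print(np.trunc(x))  # [-1.  0.  1.  2.  3.]  toward zero
print(np.rint(x))   # [-2.  0.  2.  2.  4.]  nearest integer, halves to even
print(np.round(x))  # [-2.  0.  2.  2.  4.]  same rule as rint
```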

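Concatenating arrays with `vstack` and `hstack`

Vertical stacking requires the two arrays to have the same number of columns.
Horizontal stacking requires the two arrays to have the same number of rows.

# Vertical stacking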
t1 = np.arange(0, 10).reshape((2, 5))
t2 = np.arange(11, 21).reshape((2, 5))
tv = np.vstack((t1, t2))
print(tv)
# [[ 0  1  2  3  4]
#  [ 5  6  7  8  9]
#  [11 12 13 14 15]
#  [16 17 18 19 20]]

# Horizontal stacking
t1 = np.arange(0, 10).reshape((2, 5))
t2 = np.arange(11, 21).reshape((2, 5))
th = np.hstack((t1, t2))
print(th)
# [[ 0  1  2  3  4 11 12 13 14 15]
#  [ 5  6  7  8  9 16 17 18 19 20]]
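For 2-D arrays, both functions can be written as `np.concatenate` along an explicit axis, which makes the size rule above concrete: every axis except the joined one must match, otherwise NumPy raises `ValueError`.

```python
import numpy as np

t1 = np.arange(0, 10).reshape((2, 5))
t2 = np.arange(11, 21).reshape((2, 5))

tv = np.concatenate((t1, t2), axis=0)  # join along rows: shape (4, 5)
th = np.concatenate((t1, t2), axis=1)  # join along columns: shape (2, 10)
print(np.array_equal(tv, np.vstack((t1, t2))))  # True
print(np.array_equal(th, np.hstack((t1, t2))))  # True

try:
    np.vstack((t1, np.zeros((2, 4))))  # 5 columns vs 4 columns
except ValueError as exc:
    print("ValueError:", exc)
```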

Swapping rows and columns

t1 = np.arange(0, 20).reshape((4, 5))
print(t1)
# [[ 0  1  2  3  4]
#  [ 5  6  7  8  9]
#  [10 11 12 13 14]
#  [15 16 17 18 19]]
t1[[1, 2], :] = t1[[2, 1], :]  # after swapping rows 1 and 2
print(t1)
# [[ 0  1  2  3  4]
#  [10 11 12 13 14]
#  [ 5  6  7  8  9]
#  [15 16 17 18 19]]
t1[:, [0, 1]] = t1[:, [1, 0]]  # after swapping columns 0 and 1
print(t1)
# [[ 1  0  2  3  4]
#  [11 10 12 13 14]
#  [ 6  5  7  8  9]
#  [16 15 17 18 19]]
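The swap works because fancy indexing on the right-hand side returns a copy, so all values are read before any are written back. A sketch wrapping the two idioms in small helpers; `swap_rows` and `swap_cols` are hypothetical names, not NumPy functions.

```python
import numpy as np

def swap_rows(arr, i, j):
    """Swap rows i and j of a 2-D array in place (hypothetical helper)."""
    # arr[[j, i], :] is a copy, so the right-hand side is fully read first
    arr[[i, j], :] = arr[[j, i], :]

def swap_cols(arr, i, j):
    """Swap columns i and j of a 2-D array in place (hypothetical helper)."""
    arr[:, [i, j]] = arr[:, [j, i]]

t = np.arange(0, 20).reshape((4, 5))
swap_rows(t, 1, 2)
swap_cols(t, 0, 1)
print(t[:2])
# [[ 1  0  2  3  4]
#  [11 10 12 13 14]]
```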
