Data Analysis: NumPy (2): Array Shape and Array Computation

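1. Array shape

The shape of an array gives the length of each of its dimensions; for a 2-D array, that is the number of rows and the number of columns.

Tools: the shape attribute (view the shape) and the reshape method (get the same data with a different shape)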

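import numpy as np

# Array shape
a = np.array([[1,2,3],[4,5,6]])
print(a)
print('*'*100)
print(a.shape) # array shape (2, 3): 2 rows, 3 columns
print('*'*100)
print(a.reshape((3,2,))) # reshape to 3 rows, 2 columns; 3*2 must equal the number of elements
print('*'*100)
print(a) # a is unchanged: reshape returns a new array instead of modifying a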

Output:

[[1 2 3]
 [4 5 6]]
****************************************************************************************************
(2, 3)
****************************************************************************************************
[[1 2]
 [3 4]
 [5 6]]
****************************************************************************************************
[[1 2 3]
 [4 5 6]]
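
The requirement that the new shape must account for every element can be checked directly. The following is a minimal sketch, not part of the original demo; the names `ok` and `error_message` are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # 6 elements

# 3 * 2 == 6, so this reshape succeeds.
ok = a.reshape((3, 2))
print(ok.shape)

# 4 * 2 == 8, which is not 6, so NumPy raises ValueError.
error_message = None
try:
    a.reshape((4, 2))
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# The original array keeps its shape throughout.
print(a.shape)
```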

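Notes:
(1) reshape takes the new shape as a tuple; passing the lengths as separate integers, as in a.reshape(3, 2), also works.

(2) For a 1-D array, the single number in shape is the number of elements, not a number of rows.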

# Shape of a 1-D array
a = np.arange(10)
print(a)
print(a.shape) # for a 1-D array, shape holds the number of elements, not a row count

Output:

[0 1 2 3 4 5 6 7 8 9]
(10,)

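(3) To reshape a multi-dimensional array into a 1-D array, write reshape((number_of_elements,)). Do not write reshape((1, number_of_elements)): that produces a 2-D array with a single row. The flatten method also converts a multi-dimensional array into a 1-D array.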

# Multi-dimensional array to 1-D array
a = np.array([[1,3,4],[5,6,8]])
print(a.reshape((6,))) # 1-D array
print(a.reshape((1,6,))) # not 1-D: this is a 2-D array with shape (1, 6)
print(a.flatten()) # 1-D array

Output:

[1 3 4 5 6 8]
[[1 3 4 5 6 8]]
[1 3 4 5 6 8]
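
The printed brackets are the only visible difference between the (6,) and (1, 6) results, so it can help to inspect ndim and shape explicitly. This is a small sketch that reuses the array above; the names `one_d`, `two_d`, and `flat` are illustrative.

```python
import numpy as np

a = np.array([[1, 3, 4], [5, 6, 8]])

one_d = a.reshape((6,))
two_d = a.reshape((1, 6))
flat = a.flatten()

print(one_d.ndim, one_d.shape)  # 1 (6,)
print(two_d.ndim, two_d.shape)  # 2 (1, 6)
print(flat.ndim, flat.shape)    # 1 (6,)

# flatten returns a copy, so writing to it leaves a untouched.
flat[0] = 100
print(a[0, 0])  # 1
```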

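(4) Suppose a 1-D array is described by a row, a 2-D array by rows and columns, and a 3-D array by blocks, rows and columns. Then, when reshaping a 1-D array into a 3-D array, the first argument of reshape is the number of blocks: reshape((blocks, rows, columns)).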

# 1-D array to 3-D array
a = np.arange(24)
print(a.reshape((3,2,4,)))
print(a.reshape((3,2,4,)).shape)
print(a.shape)

Output:

[[[ 0  1  2  3]
  [ 4  5  6  7]]

 [[ 8  9 10 11]
  [12 13 14 15]]

 [[16 17 18 19]
  [20 21 22 23]]]
(3, 2, 4)
(24,)
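
Indexing is one way to see the (blocks, rows, columns) order in practice. The sketch below uses the same 24-element array; the name `r` is illustrative.

```python
import numpy as np

r = np.arange(24).reshape((3, 2, 4))  # 3 blocks, 2 rows, 4 columns

block = r[1]          # second block, shape (2, 4)
row = r[1, 0]         # first row of that block, shape (4,)
value = r[1, 0, 2]    # third column of that row

print(block.shape)  # (2, 4)
print(row)          # [ 8  9 10 11]
print(value)        # 10
```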

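2. Array computation

2.1. Array and scalar computation
When an array is added to, subtracted from, multiplied by, or divided by a single number, the operation is applied to every element of the array.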

# Array and scalar computation
a = np.arange(20)
b = a.reshape(2,10)
print(b)
print('*'*100)
print(b+2)
print('*'*100)
print(b-2)
print('*'*100)
print(b*2)
print('*'*100)
print(b/2)

Output:

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]]
****************************************************************************************************
[[ 2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21]]
****************************************************************************************************
[[-2 -1  0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15 16 17]]
****************************************************************************************************
[[ 0  2  4  6  8 10 12 14 16 18]
 [20 22 24 26 28 30 32 34 36 38]]
****************************************************************************************************
[[0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5]
 [5.  5.5 6.  6.5 7.  7.5 8.  8.5 9.  9.5]]

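Note:
Dividing an array by 0 does not raise an exception. NumPy emits a RuntimeWarning and fills the result with special values: 0/0 = nan (not a number), and a positive value divided by 0 = inf (infinity; a negative value gives -inf).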

# Dividing an array by 0
a = np.arange(20)
b = a.reshape(2,10)
print(b/0)

Output:

/home/pyvip/pro_analysis/pro_numpy/demo1.py:88: RuntimeWarning: divide by zero encountered in true_divide
  print(b/0)
/home/pyvip/pro_analysis/pro_numpy/demo1.py:88: RuntimeWarning: invalid value encountered in true_divide
  print(b/0)
[[nan inf inf inf inf inf inf inf inf inf]
 [inf inf inf inf inf inf inf inf inf inf]]
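
If the warnings are expected, one option is to silence them locally with np.errstate and then test for the special values with np.isnan and np.isinf. This is a sketch under that assumption, not something the original demo does; the name `q` is illustrative.

```python
import numpy as np

b = np.arange(20).reshape(2, 10)

# Temporarily ignore the "divide by zero" and "invalid value" warnings.
with np.errstate(divide="ignore", invalid="ignore"):
    q = b / 0

print(np.isnan(q[0, 0]))   # True: 0/0 is nan
print(np.isinf(q).sum())   # 19: every other element is inf
```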

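2.2. Array and array computation
When two arrays have the same shape, the operation is applied element by element between corresponding positions.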

a = np.arange(20)
b = a.reshape(2,10)
c = np.arange(40,60)
d = c.reshape(2,10)
print(b)
print('*'*100)
print(d)
print('*'*100)
print(b+d)

Output:

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]]
****************************************************************************************************
[[40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]]
****************************************************************************************************
[[40 42 44 46 48 50 52 54 56 58]
 [60 62 64 66 68 70 72 74 76 78]]

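Two arrays with different shapes can still be combined when one dimension matches and the other can be stretched, for example when a 1-D array has the same length as each row, or when a single-column array has the same number of rows.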

# Different shapes
a = np.arange(10)
b = np.arange(20).reshape((2,10,))
c = np.arange(2).reshape((2,1))
print(a)
print('*'*100)
print(b)
print('*'*100)
print(c)
print('*'*100)
print(a+b) # a has the same length as each row of b, so a is added to every row
print('*'*100)
print(b+c) # c has the same length as each column of b, so c is added to every column

Output:

[0 1 2 3 4 5 6 7 8 9]
****************************************************************************************************
[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]]
****************************************************************************************************
[[0]
 [1]]
****************************************************************************************************
[[ 0  2  4  6  8 10 12 14 16 18]
 [10 12 14 16 18 20 22 24 26 28]]
****************************************************************************************************
[[ 0  1  2  3  4  5  6  7  8  9]
 [11 12 13 14 15 16 17 18 19 20]]
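
One way to picture what happens here is that the smaller array is stretched to the larger array's shape before the element-wise operation. The sketch below makes that stretching explicit with np.broadcast_to and also shows a shape that does not fit; the names `a_stretched` and `c_stretched` are illustrative.

```python
import numpy as np

a = np.arange(10)                    # shape (10,)
b = np.arange(20).reshape((2, 10))   # shape (2, 10)
c = np.arange(2).reshape((2, 1))     # shape (2, 1)

# Stretch a and c to b's shape explicitly, then add.
a_stretched = np.broadcast_to(a, b.shape)
c_stretched = np.broadcast_to(c, b.shape)

print(np.array_equal(a + b, a_stretched + b))  # True
print(np.array_equal(b + c, b + c_stretched))  # True

# A 1-D array of length 2 does not match b's row length of 10.
failed = False
try:
    b + np.arange(2)
except ValueError as exc:
    failed = True
    print("cannot add:", exc)
```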

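2.3. Axes (axis)
Meaning:
In NumPy an axis can be thought of as a direction, numbered 0, 1, 2, and so on. A 1-D array has only axis 0; a 2-D array (for example, shape (2, 2)) has axes 0 and 1; a 3-D array (for example, shape (2, 2, 3)) has axes 0, 1 and 2.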

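Purpose:
With the concept of axes, computations are easier to describe. For example, to compute the mean of a 2-D array along one direction, you have to say which axis the mean is taken over.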

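So here is a question:
Where did axes appear in the material so far?
Take np.arange(0,10).reshape((2,5)): in reshape, 2 is the length of axis 0 (the number of entries along it) and 5 is the length of axis 1, so 2 x 5 gives 10 elements in total.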
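
To connect axis numbers with an actual computation, here is a small sketch that takes means along each axis of the same 2 x 5 array. The names `t`, `col_means`, and `row_means` are illustrative.

```python
import numpy as np

t = np.arange(0, 10).reshape((2, 5))
print(t.shape)  # (2, 5): axis 0 has length 2, axis 1 has length 5

col_means = t.mean(axis=0)  # collapses axis 0: one mean per column
row_means = t.mean(axis=1)  # collapses axis 1: one mean per row

print(col_means)  # [2.5 3.5 4.5 5.5 6.5]
print(row_means)  # [2. 7.]
print(t.mean())   # 4.5, the mean over all elements when no axis is given
```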

[Figure: axes of a 2-D array]
[Figure: axes of a 3-D array]

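2.4. Broadcasting rule
Two arrays are broadcast-compatible if, for every trailing dimension (counting dimensions from the end), the axis lengths are equal or one of them is 1. Broadcasting is then carried out over the missing dimensions and/or the dimensions of length 1. A "dimension" here is simply one of the numbers in shape.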

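So here are some questions:
1) Can an array of shape (3,3,3) be combined with an array of shape (3,2)?
2) Can an array of shape (3,3,2) be combined with an array of shape (3,2)?
3) Can an array of shape (3,3,2) be combined with an array of shape (3,1)?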

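Answers:
1) No. Counting from the end, the last dimensions are 3 and 2, which are neither equal nor 1.
2) Yes. (3,2) matches the last two dimensions of (3,3,2).
3) Yes. Counting from the end, the 1 in (3,1) can be stretched to 2, and the 3 matches the middle 3 of (3,3,2).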
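
The three answers can be checked by simply trying the additions. In the sketch below, `can_broadcast` is a hypothetical helper written for this check, not a NumPy function.

```python
import numpy as np

def can_broadcast(shape_a, shape_b):
    """Return the result shape if the two shapes broadcast, else None."""
    try:
        return (np.ones(shape_a) + np.ones(shape_b)).shape
    except ValueError:
        return None

print(can_broadcast((3, 3, 3), (3, 2)))  # None
print(can_broadcast((3, 3, 2), (3, 2)))  # (3, 3, 2)
print(can_broadcast((3, 3, 2), (3, 1)))  # (3, 3, 2)
```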

What is the benefit?
One example: subtracting each column's mean from the data in that column.
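
A minimal sketch of that example, assuming the same 2 x 10 array used earlier in this post; the names `col_mean`, `centered`, `row_mean`, and `row_centered` are illustrative.

```python
import numpy as np

b = np.arange(20).reshape((2, 10)).astype(float)

# One mean per column: shape (10,), which lines up with b's last dimension.
col_mean = b.mean(axis=0)
centered = b - col_mean          # (2, 10) - (10,) broadcasts over the rows

print(centered)
print(centered.mean(axis=0))     # every column now averages to 0

# For per-row means, keepdims=True keeps the shape (2, 1) so it broadcasts.
row_mean = b.mean(axis=1, keepdims=True)
row_centered = b - row_mean
print(row_centered.mean(axis=1)) # every row now averages to 0
```

Without keepdims=True the row means would have shape (2,), which does not match b's last dimension of 10, so the subtraction would fail.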