NumPy Data Processing

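The elements of a Python list are pointers to objects, so even a simple list such as [1,2,3] needs 3 pointers plus 3 separate integer objects, which wastes space. The array module in Python's standard library provides an array object that, unlike a list, stores the values themselves, much like a one-dimensional array in C. However, it supports neither multidimensional arrays nor element-wise arithmetic, so it is rarely used for numerical computing.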
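A small standard-library sketch of the difference. The 'q' typecode (signed 64-bit integer) is an arbitrary choice for illustration, and sys.getsizeof results vary by Python version and platform, so the code compares the sizes instead of relying on fixed numbers.

```python
import sys
from array import array

values = [1, 2, 3]

# A list stores pointers; each element is a separate int object on the heap.
list_cost = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array.array stores the raw machine values contiguously ('q' = signed 64-bit int).
packed = array("q", values)
buffer_cost = packed.itemsize * len(packed)

print("list + int objects:", list_cost, "bytes")
print("array buffer:      ", buffer_cost, "bytes")

# array.array has no element-wise arithmetic: + concatenates and * repeats.
print(packed + packed)  # array('q', [1, 2, 3, 1, 2, 3])
print(packed * 2)       # array('q', [1, 2, 3, 1, 2, 3])
```

The last two lines show why array.array is not a numerical tool: the operators behave like sequence operations, not vector arithmetic.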

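NumPy provides two fundamental object types: ndarray (n-dimensional array object) and ufunc (universal function object). An ndarray is a multidimensional array whose elements all share a single data type; a ufunc is a function that operates element-wise on arrays.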
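A minimal sketch of the two object types: the ndarray below holds float64 values only, and np.add and np.sqrt are ufuncs that process every element without an explicit Python loop.

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])   # ndarray: every element is float64
b = np.array([1.0, 2.0, 3.0])

print(type(a), a.dtype)              # <class 'numpy.ndarray'> float64
print(np.add(a, b))                  # [ 2.  6. 12.]
print(np.sqrt(a))                    # [1. 2. 3.]
print(isinstance(np.add, np.ufunc))  # True

# The + operator on ndarrays dispatches to the np.add ufunc.
print(a + b)
```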

1. The ndarray Object

1.1 The shape Attribute

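An array has a shape attribute: a tuple giving the length of each axis. Assigning a new value to shape changes the shape of the array in place. If the length of one axis is given as -1, NumPy computes it automatically from the total number of elements.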

import numpy as np
a=np.array([1,2,3,4,5,6])
print(a)
print(a.shape)

Output:

[1 2 3 4 5 6]

(6,)

Change the shape of array a:

a.shape = 2,3
print(a)
a.shape =(3,-1)
print(a)
a.shape = (6,)
print(a)

Output:

[[1 2 3]
 [4 5 6]]
 
[[1 2]
 [3 4]
 [5 6]]
 
[1 2 3 4 5 6]
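The -1 placeholder also works with reshape(). This short sketch additionally shows that NumPy raises ValueError when the requested shape cannot hold exactly the same number of elements.

```python
import numpy as np

a = np.arange(6)                 # [0 1 2 3 4 5]
print(a.reshape(2, -1).shape)    # (2, 3): 6 / 2 = 3 is inferred
print(a.reshape(-1, 1).shape)    # (6, 1): a column vector

try:
    a.reshape(4, -1)             # 6 elements cannot fill 4 equal rows
except ValueError as err:
    print("ValueError:", err)
```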

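reshape() creates a new array with the specified shape and leaves the original array's shape unchanged. Note: the two arrays differ only in shape; a and b share the same data buffer, so changing an element through either one also changes the corresponding element of the other. (reshape() returns such a view whenever the memory layout allows it; otherwise it returns a copy.)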

b = a.reshape((3,2))
print(a)
print(b)

Output:

[1 2 3 4 5 6]

[[1 2]
 [3 4]
 [5 6]]

Change an element of b and check whether a changes too:

b[0,1] = 99
print(b)
print(a)

Output:

[[ 1 99]
 [ 3  4]
 [ 5  6]]
 
[ 1 99  3  4  5  6]
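To check whether two arrays share storage, np.shares_memory can be used, and .copy() produces independent data when that is needed. The sketch also shows a case where reshape() cannot return a view: flattening a transposed (non C-contiguous) array in C order forces a copy.

```python
import numpy as np

a = np.arange(6)
b = a.reshape(3, 2)               # a view: same buffer, new shape
print(np.shares_memory(a, b))     # True

c = a.reshape(3, 2).copy()        # explicit copy: independent data
c[0, 1] = 99
print(a[1])                       # 1, so a is unchanged

t = np.arange(6).reshape(2, 3).T  # transpose: not C-contiguous
flat = t.reshape(6)               # C-order flattening needs a copy here
print(flat)                       # [0 3 1 4 2 5]
print(np.shares_memory(t, flat))  # False
```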

1.2 The dtype Attribute

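The data type of the array elements is available through the dtype attribute; it can also be set with the dtype parameter when an array is created.
Display the data type of b (the default integer type is platform-dependent: int32 on Windows with NumPy 1.x, int64 on most 64-bit Linux and macOS systems):

>>> b.dtype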
dtype('int32')

Create arrays with a specified data type:

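fd = np.array([1,2,3], dtype=np.float64)     # the np.float alias was removed in NumPy 1.24
print(fd)
cd = np.array([1,2,3], dtype=np.complex128)  # the np.complex alias was removed in NumPy 1.24
print(cd)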

Output:

[1. 2. 3.]
[1.+0.j 2.+0.j 3.+0.j]
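The dtype argument accepts several equivalent spellings: a NumPy scalar type, a Python built-in type, or a string name. The specific types below are just examples; itemsize is the number of bytes per element and nbytes the total for the array.

```python
import numpy as np

x = np.array([1, 2, 3], dtype=np.float32)   # NumPy scalar type
y = np.array([1, 2, 3], dtype=float)        # Python float maps to float64
z = np.array([1, 2, 3], dtype="complex64")  # string name

for arr in (x, y, z):
    # dtype name, bytes per element, total bytes
    print(arr.dtype, arr.itemsize, arr.nbytes)
```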

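Assigning to dtype reinterprets the data rather than converting it: changing float64 to uint64 just reads the same bytes in memory as uint64 values; the floating-point numbers are not converted to integers. Assigning np.double (an alias for np.float64) afterwards reads the bytes as float64 again and recovers the original values.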

print(fd.dtype)
fd.dtype = np.uint64
print(fd)
fd.dtype = np.double
print(fd)

Output:

float64

[4607182418800017408 4611686018427387904 4613937818241073152]

[1. 2. 3.]
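To convert values rather than reinterpret bytes, astype() is the usual tool; view() performs the same byte reinterpretation as assigning to dtype but returns a new array object that shares the buffer. A minimal comparison:

```python
import numpy as np

fd = np.array([1.5, 2.0, 3.0])           # float64

converted = fd.astype(np.int64)          # converts values (truncates toward zero)
print(converted)                         # [1 2 3]

reinterpreted = fd.view(np.uint64)       # same bytes, read as uint64
print(reinterpreted[1])                  # 4611686018427387904, the bit pattern of 2.0

print(np.shares_memory(fd, reinterpreted))  # True: view shares the buffer
print(np.shares_memory(fd, converted))      # False: astype makes a copy
```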

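NumPy's data types can be listed through np.sctypeDict (older NumPy versions also exposed it as np.typeDict, an alias that has since been deprecated and removed). The exact contents vary with NumPy version and platform:

>>> np.sctypeDict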
{'?': numpy.bool_,
 0: numpy.bool_,
 'byte': numpy.int8,
 'b': numpy.int8,
 1: numpy.int8,
 'ubyte': numpy.uint8,
 'B': numpy.uint8,
 2: numpy.uint8,
 'short': numpy.int16,
 'h': numpy.int16,
 3: numpy.int16,
 'ushort': numpy.uint16,
 'H': numpy.uint16,
 4: numpy.uint16,
 'i': numpy.int32,
 5: numpy.int32,
 'uint': numpy.uint32,
 'I': numpy.uint32,
 6: numpy.uint32,
 'intp': numpy.int64,
 'p': numpy.int64,
 .......
 .......
 'bytes': numpy.bytes_,
 'a': numpy.bytes_}

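Use set() to remove duplicate values. Names that still appear twice, such as numpy.int32, are distinct type objects (for example C int and C long) that have the same size on this platform and therefore print the same name:

>>> set(np.sctypeDict.values())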
{numpy.bool_,
 numpy.bytes_,
 numpy.complex128,
 numpy.complex128,
 numpy.complex64,
 numpy.datetime64,
 numpy.float16,
 numpy.float32,
 numpy.float64,
 numpy.float64,
 numpy.int16,
 numpy.int32,
 numpy.int32,
 numpy.int64,
 numpy.int8,
 numpy.object_,
 numpy.str_,
 numpy.timedelta64,
 numpy.uint16,
 numpy.uint32,
 numpy.uint32,
 numpy.uint64,
 numpy.uint8,
 numpy.void}
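Entries from the dictionary above can be turned into dtype objects with np.dtype; the single-character codes it lists (such as '?', 'b', 'B', 'h') work directly as dtype specifiers. A short sketch printing each code's name, kind letter, and size in bytes:

```python
import numpy as np

for code in ["?", "b", "B", "h", "H", "i", "f", "d"]:
    dt = np.dtype(code)
    # character code, canonical name, kind letter, bytes per element
    print(repr(code), dt.name, dt.kind, dt.itemsize)
```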

1.3 Creating Arrays

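Function	Purpose
arange(start, stop, step)	Creates a 1-D arithmetic sequence from a start value, a stop value, and a step. Note: the stop value is not included.
linspace(start, stop, num)	Creates a 1-D arithmetic sequence from a start value, a stop value, and the number of elements. Note: the endpoint parameter controls whether the stop value is included; it defaults to True.
logspace(start, stop, num)	Creates a 1-D geometric sequence of num elements; start and stop are exponents, so the values run from base**start to base**stop. Note: endpoint controls whether the last value is included (default True), and base sets the base (default 10.0).
zeros(shape, dtype)	Creates an array of the given shape and type, filled with 0. Note: the default type is np.float64.
ones(shape, dtype)	Creates an array of the given shape and type, filled with 1. Note: the default type is np.float64.
empty(shape, dtype)	Creates an array of the given shape and type; memory is allocated but not initialized, so the contents are arbitrary. Note: the default type is np.float64.
eye(N, M=None, k=0, dtype=float, order='C')	Creates a 2-D array with ones on one diagonal and zeros elsewhere; it need not be square. Parameters: N (int): number of rows; M (int, optional): number of columns, defaults to N; k (int, optional): index of the diagonal, where 0 (the default) is the main diagonal, a positive value an upper diagonal, and a negative value a lower diagonal; dtype (optional): data type of the result; order ('C' or 'F', optional): row-major (C-style) or column-major (Fortran-style) memory layout.
identity(n, dtype=None)	Creates a square n x n identity matrix. Parameters: n (int): number of rows and columns; dtype (optional): data type of the result, defaults to float.
mgrid, ogrid	mgrid generates a dense mesh grid: the output is effectively two matrices, one holding the y-axis (row) coordinates and one holding the x-axis (column) coordinates. ogrid generates an open grid: a column vector and a row vector that hold the same grid values and broadcast against each other.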
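A short sketch exercising the sequence and fill functions from the table, including the endpoint and base parameters; the particular ranges are arbitrary examples.

```python
import numpy as np

print(np.arange(0, 1, 0.25))                 # 0, 0.25, 0.5, 0.75 (stop excluded)
print(np.linspace(0, 1, 5))                  # 0, 0.25, 0.5, 0.75, 1 (stop included)
print(np.linspace(0, 1, 4, endpoint=False))  # 0, 0.25, 0.5, 0.75
print(np.logspace(0, 3, 4))                  # 10**0, 10**1, 10**2, 10**3
print(np.logspace(0, 3, 4, base=2))          # 2**0, 2**1, 2**2, 2**3

z = np.zeros((2, 3))            # float64 zeros
o = np.ones((2, 3), dtype=int)  # integer ones
e = np.empty((2, 3))            # uninitialized: contents are arbitrary
print(z.dtype, o.dtype, e.shape)
```

Because np.empty skips initialization, its contents should always be overwritten before they are read.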

Diagonal and identity matrices:

>>> np.eye(2, dtype=int)
array([[1, 0],
       [0, 1]])
>>> np.eye(3, k=1)
array([[0.,  1.,  0.],
       [0.,  0.,  1.],
       [0.,  0.,  0.]])
       
>>> np.identity(3)
array([[1.,  0.,  0.],
       [0.,  1.,  0.],
       [0.,  0.,  1.]])
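The M and negative k parameters of eye() are not covered above; a brief sketch of a non-square result and a lower diagonal, plus a check that identity(n) matches eye(n):

```python
import numpy as np

rect = np.eye(2, 3)        # 2 rows, 3 columns, ones on the main diagonal
lower = np.eye(3, k=-1)    # ones on the first diagonal below the main one
print(rect)
print(lower)

# identity(n) is the square special case of eye(n)
print(np.array_equal(np.identity(4), np.eye(4)))  # True
```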

Grid generation:

>>> np.mgrid[0:5,0:5]
array([[[0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
        [2, 2, 2, 2, 2],
        [3, 3, 3, 3, 3],
        [4, 4, 4, 4, 4]],
       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]]])

>>> np.ogrid[0:5,0:5]
[array([[0],
        [1],
        [2],
        [3],
        [4]]), array([[0, 1, 2, 3, 4]])]
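A typical use of these grids is evaluating a function of two variables at every grid point. In this sketch the function 10*y + x is an arbitrary example; the dense mgrid result and the broadcast ogrid result come out identical, and a complex step such as 5j means "number of points, stop included".

```python
import numpy as np

# Dense grid: y and x each have shape (3, 4)
y, x = np.mgrid[0:3, 0:4]
dense = 10 * y + x

# Open grid: yo has shape (3, 1), xo has shape (1, 4); broadcasting fills the rest
yo, xo = np.ogrid[0:3, 0:4]
open_ = 10 * yo + xo

print(dense)
print(np.array_equal(dense, open_))  # True

# A complex step gives the number of points, with the stop value included
print(np.mgrid[0:1:5j])              # same values as np.linspace(0, 1, 5)
```

The open grid uses far less memory for large grids, since only the two 1-D coordinate vectors are stored.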

Scientific Computing Reading Notes (1) - NumPy Data Processing
