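NumPy (short for Numerical Python) provides an efficient interface for storing and operating on dense data buffers. In some ways, NumPy arrays are much like Python's built-in list type, but as arrays grow larger, NumPy arrays provide much more efficient storage and data operations.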

Version check (by convention, NumPy is imported under the alias np):
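
The snippet this note refers to is not reproduced here. A minimal sketch of such a version check, assuming NumPy is installed, might look like this:

```python
import numpy as np

# np is the conventional alias used throughout these notes.
# The printed version string depends on the local installation.
print(np.__version__)
```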

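2.1 Understanding Data Types in Python

2.1.1 A Python Integer Is More Than Just an Integer

In Python 3.x (CPython), a single integer object actually consists of four parts.

ob_refcnt is a reference count that helps Python silently handle memory allocation and deallocation.
ob_type encodes the type of the variable.
ob_size specifies the size of the data members that follow.
ob_digit contains the actual integer value that the Python variable represents.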
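
One way to see this overhead, as a rough sketch: compare the size of a whole Python integer object with the 8 bytes NumPy needs for one int64 value. The exact object size depends on the Python build, so the code below only prints it rather than assuming a number.

```python
import sys
import numpy as np

py_int = 1

# Size in bytes of the whole Python object (header fields plus digits).
py_size = sys.getsizeof(py_int)

# Size in bytes of one raw 64-bit integer as stored inside a NumPy array.
np_size = np.dtype(np.int64).itemsize

print("Python int object:", py_size, "bytes")
print("NumPy int64 item: ", np_size, "bytes")
```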

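2.1.2 A Python List Is More Than Just a List

Because Python is dynamically typed, a list can be heterogeneous.
But if all the elements in a list are of the same type, much of this per-element information is redundant, and it is far more efficient to store the data in a fixed-type array.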
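
As a small illustration of a heterogeneous list (the variable names are only for this sketch), each element below is a full Python object carrying its own type information:

```python
# A list may mix booleans, strings, floats and integers.
mixed = [True, "2", 3.0, 4]

# Each element knows its own type, which is exactly the per-item overhead
# that a fixed-type array avoids.
type_names = [type(item).__name__ for item in mixed]
print(type_names)  # ['bool', 'str', 'float', 'int']
```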

2.1.3 Fixed-Type Arrays in Python

With the built-in array module, 'i' is a type code indicating that the contents are integers.
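
The example this note refers to is not shown here. A minimal sketch, assuming the standard-library array module is what was meant, could be:

```python
import array

values = list(range(10))

# 'i' is the type code for a signed C int; every element must fit that type.
fixed = array.array('i', values)
print(fixed)

# Because the type is fixed, a non-integer value is rejected.
rejected = False
try:
    fixed.append(1.5)
except TypeError:
    rejected = True
print("float rejected:", rejected)
```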

2.1.4 Creating Arrays from Python Lists

NumPy requires all elements of an array to have the same type. If the types do not match, NumPy will upcast where possible.
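
A short sketch of upcasting: mixing one float into a list of integers makes the whole array floating point. The default integer dtype is platform dependent, so only the float case is annotated.

```python
import numpy as np

# All integers: the array gets the platform's default integer dtype.
ints = np.array([1, 4, 2, 5, 3])

# One float among integers: every element is upcast to float64.
mixed = np.array([3.14, 4, 2, 3])

print(ints.dtype)
print(mixed.dtype)  # float64
print(mixed)
```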

To set the data type of the resulting array explicitly, use the dtype keyword.

Unlike Python lists, NumPy arrays can be explicitly multidimensional.
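
Both points can be sketched briefly. The first array forces a dtype with the dtype keyword; the second builds a two-dimensional array from nested sequences, where each inner sequence becomes one row.

```python
import numpy as np

# Explicit dtype: integers are stored as 32-bit floats.
as_float32 = np.array([1, 2, 3, 4], dtype='float32')

# Nested sequences: each inner range becomes a row of a 2-D array.
grid = np.array([range(i, i + 3) for i in [2, 4, 6]])

print(as_float32, as_float32.dtype)
print(grid)
```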

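2.1.5 Creating Arrays from Scratch

zeros: an array filled with zeros;
ones: an array filled with ones;
arange: an array of evenly stepped values (an arithmetic sequence).

linspace: an array of evenly spaced values between two endpoints;
random.random: uniformly distributed random values in [0, 1);
random.normal: normally distributed random values;
random.randint: an array of random integers (half-open interval, e.g. randint(0, 10, ...) draws from [0, 10));
eye: the identity matrix;
empty: an uninitialized array.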
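
The routines listed above can be exercised together in one sketch. The shapes and parameters are arbitrary choices for illustration; the random arrays will differ under another seed or NumPy version, so their values are not shown.

```python
import numpy as np

zeros = np.zeros(10, dtype=int)           # ten integer zeros
ones = np.ones((3, 5), dtype=float)       # 3x5 array of 1.0
stepped = np.arange(0, 20, 2)             # 0, 2, ..., 18 (stop is excluded)
spaced = np.linspace(0, 1, 5)             # 5 values from 0 to 1 inclusive

np.random.seed(0)                         # make the random draws repeatable
uniform = np.random.random((3, 3))        # uniform on [0, 1)
gaussian = np.random.normal(0, 1, (3, 3)) # mean 0, standard deviation 1
integers = np.random.randint(0, 10, (3, 3))  # integers in [0, 10)

identity = np.eye(3)                      # 3x3 identity matrix
scratch = np.empty(3)                     # uninitialized: contents are arbitrary

print(stepped)
print(spaced)
print(identity)
```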

2.1.6 NumPy Standard Data Types

bool_	Boolean (True or False) stored as a byte
int_	Default integer type (same as C long; normally either int64 or int32)
intc	Identical to C int (normally int32 or int64)
intp	Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8	Byte (-128 to 127)
int16	Integer (-32768 to 32767)
int32	Integer (-2147483648 to 2147483647)
int64	Integer (-9223372036854775808 to 9223372036854775807)
uint8	Unsigned integer (0 to 255)
uint16	Unsigned integer (0 to 65535)
uint32	Unsigned integer (0 to 4294967295)
uint64	Unsigned integer (0 to 18446744073709551615)
float_	Shorthand for float64 (this alias was removed in NumPy 2.0; use float64).
float16	Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32	Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64	Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_	Shorthand for complex128 (this alias was removed in NumPy 2.0; use complex128).
complex64	Complex number, represented by two 32-bit floats
complex128	Complex number, represented by two 64-bit floats
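
As a brief sketch of how these types are used, a dtype can be given either as a string or as the corresponding NumPy object; the two forms below should be equivalent.

```python
import numpy as np

# Specify the type by name ...
by_name = np.zeros(10, dtype='int16')

# ... or by the NumPy type object.
by_object = np.zeros(10, dtype=np.int16)

print(by_name.dtype, by_object.dtype)
print(by_name.itemsize)  # bytes per element for int16
```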

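2.2 The Basics of NumPy Arrays

2.2.1 NumPy Array Attributes

ndim: the number of dimensions;
shape: the size of each dimension;
size: the total number of elements;
dtype: the data type of the array;
itemsize: the size in bytes of each element;
nbytes: the total size in bytes of the array (= itemsize * size).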
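
The attributes can be checked on a small three-dimensional array. The dtype is fixed to int64 here so that itemsize and nbytes are the same on every platform; that choice is an assumption of this sketch.

```python
import numpy as np

x3 = np.arange(60, dtype=np.int64).reshape(3, 4, 5)

print("ndim:    ", x3.ndim)      # 3
print("shape:   ", x3.shape)     # (3, 4, 5)
print("size:    ", x3.size)      # 60
print("dtype:   ", x3.dtype)     # int64
print("itemsize:", x3.itemsize)  # 8
print("nbytes:  ", x3.nbytes)    # 480
```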

2.2.2 Array Indexing: Accessing Single Elements

Array indices start at 0. To index from the end of the array, use negative indices.
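
A short sketch of zero-based and negative indexing, on arrays whose values are chosen only for illustration:

```python
import numpy as np

x1 = np.array([5, 0, 3, 3, 7, 9])

first = x1[0]         # first element
last = x1[-1]         # last element
second_last = x1[-2]  # second from the end

# In a multidimensional array, give one index per axis, separated by commas.
x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])
corner = x2[2, -1]    # last element of the third row

print(first, last, second_last, corner)  # 5 9 7 7
```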

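2.2.3 Array Slicing: Accessing Subarrays

x[start:stop:step]: any of these three parameters left unspecified takes its default value: start=0, stop=size of dimension, step=1. When step is negative, the defaults for start and stop are swapped.

A single colon (:) denotes an empty slice, which selects an entire axis.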
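
The slicing rules above can be sketched on a simple range of integers, followed by an empty slice used to pull a whole column out of a two-dimensional array:

```python
import numpy as np

x = np.arange(10)

head = x[:5]           # [0 1 2 3 4]
tail = x[5:]           # [5 6 7 8 9]
middle = x[4:7]        # [4 5 6]
evens = x[::2]         # [0 2 4 6 8]
reversed_x = x[::-1]   # [9 8 ... 0]
back_by_two = x[5::-2] # [5 3 1]

x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])
first_column = x2[:, 0]  # the empty slice selects every row
first_row = x2[0, :]     # equivalent to x2[0]

print(first_column)  # [3 7 1]
```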

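One important and very useful property of array slices is that they return views of the array data rather than copies. This is where NumPy array slicing differs from Python list slicing: slicing a list produces a copy.

To get a copy instead, use the copy() method.

Modifying such a copied subarray leaves the original array unchanged.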
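
The difference between a view and a copy can be sketched as follows. np.shares_memory is used here only to make the relationship explicit.

```python
import numpy as np

x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

# A slice is a view: writing through it changes the original array.
view = x2[:2, :2]
view[0, 0] = 99
print(x2[0, 0])  # 99

# copy() makes an independent array: writing to it leaves x2 alone.
sub_copy = x2[:2, :2].copy()
sub_copy[0, 0] = 42
print(x2[0, 0])  # still 99

# For comparison, slicing a Python list always copies.
py_list = [3, 5, 2, 4]
py_slice = py_list[:2]
py_slice[0] = 99
print(py_list[0])  # 3
```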

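2.2.4 Reshaping Arrays

Where possible, the reshape() method returns a no-copy view of the original array; with a non-contiguous memory buffer it may have to return a copy instead.

x[np.newaxis, :] gives a row vector;
x[:, np.newaxis] gives a column vector.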
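
A minimal sketch of both techniques: reshaping a contiguous range into a grid, and using np.newaxis to turn a one-dimensional array into a row or a column.

```python
import numpy as np

base = np.arange(1, 10)
grid = base.reshape((3, 3))  # the sizes must match: 9 == 3 * 3

x = np.array([1, 2, 3])
row = x[np.newaxis, :]       # shape (1, 3)
col = x[:, np.newaxis]       # shape (3, 1)

print(grid)
print(row.shape, col.shape)
# For this contiguous array, the reshaped grid shares memory with base.
print(np.shares_memory(base, grid))
```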

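2.2.5 Array Concatenation and Splitting

np.concatenate(): joins arrays along an existing axis;

np.vstack(): vertical stack (the number of columns stays the same, rows accumulate);
np.hstack(): horizontal stack (the number of rows stays the same, columns accumulate);
np.dstack(): stacks arrays along the third axis.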
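
The joining routines above, sketched on small arrays whose values are arbitrary:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
joined = np.concatenate([x, y])             # [1 2 3 3 2 1]

grid = np.array([[9, 8, 7],
                 [6, 5, 4]])
stacked_rows = np.vstack([x, grid])         # shape (3, 3): one more row

extra_col = np.array([[99],
                      [99]])
stacked_cols = np.hstack([grid, extra_col]) # shape (2, 4): one more column

depth = np.dstack([grid, grid])             # shape (2, 3, 2): new third axis

print(joined)
print(stacked_rows.shape, stacked_cols.shape, depth.shape)
```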

np.split(): splits an array;

Note that N split points produce N + 1 subarrays.

np.hsplit(): splits an array horizontally;
np.vsplit(): splits an array vertically.
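
A sketch of the splitting routines. The list passed as the second argument holds the split points, so two split points yield three pieces.

```python
import numpy as np

x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])    # two split points, three subarrays
print(x1, x2, x3)                   # [1 2 3] [99 99] [3 2 1]

grid = np.arange(16).reshape((4, 4))
upper, lower = np.vsplit(grid, [2]) # split between rows 1 and 2
left, right = np.hsplit(grid, [2])  # split between columns 1 and 2

print(upper.shape, lower.shape)     # (2, 4) (2, 4)
print(left.shape, right.shape)      # (4, 2) (4, 2)
```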

2.3 Computation on NumPy Arrays: Universal Functions

2.3.1 The Slowness of Loops

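import numpy as np

np.random.seed(0)  # fix the seed so the random values are reproducible

def compute_reciprocals(values):
    """Return 1/value for each element, using an explicit Python loop."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.random.randint(1, 10, size=5)  # five integers in [1, 10)
print(compute_reciprocals(values))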

[0.16666667 1. 0.25 0.25 0.125 ]

big_array = np.random.randint(1, 100, size=1000000) 
%timeit compute_reciprocals(big_array)

1.95 s ± 18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

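The bottleneck here is not the arithmetic itself but the type checking and function dispatch that CPython must perform on every pass through the loop. Each time a reciprocal is computed, Python first examines the object's type and dynamically looks up the correct function to use for that type.

2.3.2 Introducing UFuncs

For many types of operations, NumPy provides a convenient interface to statically typed, compiled routines. These are known as vectorized operations.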

%timeit 1.0 / big_array

4.12 ms ± 29.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

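Vectorized operations in NumPy are implemented via universal functions (ufuncs). In the timings above, the vectorized version takes about 4 ms against roughly 2 s for the Python loop, a speedup of several hundred times.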
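
As a sanity check, the loop and the vectorized expression should produce the same values; only the speed differs. The sketch below redefines the loop version so that it is self-contained, and uses a smaller array than the timing runs to keep it quick.

```python
import numpy as np

def compute_reciprocals(values):
    """Loop-based reciprocal, as in section 2.3.1."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

np.random.seed(0)
values = np.random.randint(1, 100, size=1000)

looped = compute_reciprocals(values)
vectorized = 1.0 / values  # the division is applied to every element at once

same = np.allclose(looped, vectorized)
print(same)  # True
```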

np.arange(5) / np.arange(1, 6)

array([0. , 0.5 , 0.66666667, 0.75 , 0.8 ])

x = np.arange(9).reshape((3, 3))
2 ** x

array([[  1,   2,   4],
       [  8,  16,  32],
       [ 64, 128, 256]], dtype=int32)

2.3.3 Exploring NumPy's UFuncs

1. Array arithmetic

x = np.arange(4) 
print("x =", x) 
print("x + 5 =", x + 5) 
print("x - 5 =", x - 5) 
print("x * 2 =", x * 2) 
print("x / 2 =", x / 2) 
print("x // 2 =", x // 2)  # floor division
print("-x = ", -x) 
print("x ** 2 = ", x ** 2) 
print("x % 2 = ", x % 2)

x = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0. 0.5 1. 1.5]
x // 2 = [0 0 1 1]
-x = [ 0 -1 -2 -3]
x ** 2 = [0 1 4 9]
x % 2 = [0 1 0 1]
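
Each of these operators is a thin wrapper around a named NumPy ufunc, so the two spellings should give identical results. A small sketch comparing them:

```python
import numpy as np

x = np.arange(4)

# Pairs of (operator result, equivalent ufunc result).
pairs = {
    "add": (x + 5, np.add(x, 5)),
    "subtract": (x - 5, np.subtract(x, 5)),
    "negative": (-x, np.negative(x)),
    "multiply": (x * 2, np.multiply(x, 2)),
    "divide": (x / 2, np.divide(x, 2)),
    "floor_divide": (x // 2, np.floor_divide(x, 2)),
    "power": (x ** 2, np.power(x, 2)),
    "mod": (x % 2, np.mod(x, 2)),
}

all_match = all(np.array_equal(op, uf) for op, uf in pairs.values())
print(all_match)  # True
```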

2. Absolute value

x = np.array([-2, -1, 0, 1, 2]) 
print(abs(x))
x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j]) 
print(np.abs(x))

[2 1 0 1 2]
[5. 5. 2. 1.]

3. Trigonometric functions

theta = np.linspace(0, np.pi, 4)
print("theta = ", theta) 
print("sin(theta) = ", np.sin(theta)) 
print("cos(theta) = ", np.cos(theta)) 
print("tan(theta) = ", np.tan(theta))

x = [-1, 0, 1] 
print("x = ", x) 
print("arcsin(x) = ", np.arcsin(x)) 
print("arccos(x) = ", np.arccos(x)) 
print("arctan(x) = ", np.arctan(x))

theta = [0. 1.04719755 2.0943951 3.14159265]
sin(theta) = [0.00000000e+00 8.66025404e-01 8.66025404e-01 1.22464680e-16]
cos(theta) = [ 1. 0.5 -0.5 -1. ]
tan(theta) = [ 0.00000000e+00 1.73205081e+00 -1.73205081e+00 -1.22464680e-16]
x = [-1, 0, 1]
arcsin(x) = [-1.57079633 0. 1.57079633]
arccos(x) = [3.14159265 1.57079633 0. ]
arctan(x) = [-0.78539816 0. 0.78539816]

4. Exponents and logarithms

x = [1, 2, 3] 
print("x =", x) 
print("e^x =", np.exp(x)) 
print("2^x =", np.exp2(x)) 
print("3^x =", np.power(3, x))

x = [1, 2, 4, 10] 
print("x =", x) 
print("ln(x) =", np.log(x)) 
print("log2(x) =", np.log2(x)) 
print("log10(x) =", np.log10(x))

x = [0, 0.001, 0.01, 0.1] 
print("exp(x) - 1 =", np.expm1(x)) 
print("log(1 + x) =", np.log1p(x))

x = [1, 2, 3]
e^x = [ 2.71828183 7.3890561 20.08553692]
2^x = [2. 4. 8.]
3^x = [ 3 9 27]
x = [1, 2, 4, 10]
ln(x) = [0. 0.69314718 1.38629436 2.30258509]
log2(x) = [0. 1. 2. 3.32192809]
log10(x) = [0. 0.30103 0.60205999 1. ]
exp(x) - 1 = [0. 0.0010005 0.01005017 0.10517092]
log(1 + x) = [0. 0.0009995 0.00995033 0.09531018]
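
The specialized functions np.expm1 and np.log1p exist because they keep precision when x is very small, where computing exp(x) - 1 directly loses digits to rounding. A rough sketch of the effect, using the first two Taylor terms x + x**2/2 as the reference value (an assumption that is adequate at this magnitude):

```python
import numpy as np

x = 1e-10

naive = np.exp(x) - 1    # exp(x) is rounded near 1.0 before subtracting
accurate = np.expm1(x)   # computed without the intermediate rounding

# Reference value: higher-order Taylor terms are negligible for x = 1e-10.
reference = x + x**2 / 2

naive_error = abs(naive - reference) / reference
accurate_error = abs(accurate - reference) / reference

print("relative error, exp(x) - 1:", naive_error)
print("relative error, expm1(x):  ", accurate_error)
```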

Python Data Science Handbook (2): Introduction to NumPy