NumPy learning

Tags (separated by spaces): numpy python

Data types

NumPy supports five basic kinds of numeric type: Boolean (bool), integer (int), unsigned integer (uint), floating-point (float), and complex (complex).

The supported primitive types map closely to those in C:

NumPy type	C type	Description
np.bool_	bool	Boolean (True or False) stored as a byte
np.byte	signed char	Platform-defined
np.ubyte	unsigned char	Platform-defined
np.short	short	Platform-defined
np.ushort	unsigned short	Platform-defined
np.intc	int	Platform-defined
np.uintc	unsigned int	Platform-defined
np.int_	long	Platform-defined
np.uint	unsigned long	Platform-defined
np.longlong	long long	Platform-defined
np.ulonglong	unsigned long long	Platform-defined
np.half / np.float16	-	Half-precision float: sign bit, 5-bit exponent, 10-bit mantissa
np.single	float	Platform-defined single-precision float: typically sign bit, 8-bit exponent, 23-bit mantissa
np.double	double	Platform-defined double-precision float: typically sign bit, 11-bit exponent, 52-bit mantissa
np.longdouble	long double	Platform-defined extended-precision float
np.csingle	float complex	Complex number represented by two single-precision floats (real and imaginary parts)
np.cdouble	double complex	Complex number represented by two double-precision floats (real and imaginary parts)
np.clongdouble	long double complex	Complex number represented by two extended-precision floats (real and imaginary parts)

Since many of these types have platform-dependent definitions, NumPy also provides a set of fixed-size aliases:

Note: the type code is typically the first letter of the kind followed by the number of bits divided by 8 (the size in bytes); for example, c32 is the 256-bit complex type complex256.

| NumPy type | C type | Type code | Description |
| :--- | :---: | :---: | :--- |
| np.int8 | int8_t | i1 | Byte (-128 to 127) |
| np.int16 | int16_t | i2 | Integer (-32768 to 32767) |
| np.int32 | int32_t | i4 | Integer (-2147483648 to 2147483647) |
| np.int64 | int64_t | i8 | Integer (-9223372036854775808 to 9223372036854775807) |
| np.uint8 | uint8_t | u1 | Unsigned integer (0 to 255) |
| np.uint16 | uint16_t | u2 | Unsigned integer (0 to 65535) |
| np.uint32 | uint32_t | u4 | Unsigned integer (0 to 4294967295) |
| np.uint64 | uint64_t | u8 | Unsigned integer (0 to 18446744073709551615) |
| np.intp | intptr_t | none | Integer used for indexing, typically the same as ssize_t |
| np.uintp | uintptr_t | none | Integer large enough to hold a pointer |
| np.float32 | float | f or f4 | Single-precision float: sign bit, 8-bit exponent, 23-bit mantissa |
| np.float64 / np.float_ | double | f8 | Matches the precision of Python's built-in float |
| np.complex64 | float complex | c8 | Complex number represented by two 32-bit floats (real and imaginary components) |
| np.complex128 / np.complex_ | double complex | c16 | Matches the precision of Python's built-in complex |
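The relationship between type codes and fixed-size aliases in the table above can be checked directly. The following is a minimal sketch; the list `codes` and the variable names are chosen here for illustration only.

```python
import numpy as np

# Type-code strings and fixed-size aliases name the same dtypes.
codes = ["i1", "i2", "i4", "i8", "u1", "u2", "u4", "u8", "f4", "f8", "c8", "c16"]

for code in codes:
    dt = np.dtype(code)
    # The digits in the code equal the item size in bytes.
    print(code, dt.name, dt.itemsize)

# A type code and its alias compare equal as dtypes.
same_int8 = np.dtype("i1") == np.dtype(np.int8)
same_complex128 = np.dtype("c16") == np.dtype(np.complex128)
print(same_int8, same_complex128)
```

Either spelling can be passed wherever a dtype argument is accepted, for example `np.arange(3, dtype="u2")`.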

Creating and manipulating ndarrays

import numpy as np

# Newer style
x1 = np.int8([0,1,2,3,4,5,6,7,8,9]) 
x2 = np.arange(10, dtype=np.int8)
x1, x2

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int8),
 array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int8))

# Older style
y1 = np.array(np.arange(9), dtype=np.int8)
y2 = np.array(np.arange(9), dtype='i1')
y1, y2
# A common beginner mistake is to pass the array contents directly as separate
# arguments to array, e.g. np.array(1,2,3), instead of a single sequence.

# To convert an array's type, use the .astype() method (preferred) or call the type itself as a function
x1.astype('u1'), x2.astype(int),np.int16(y1), np.uint16(y2)

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8),
 array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype=int16),
 array([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype=uint16))

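# dtype: the array's element type; shape: the array's shape; ndim: the number of dimensions
# size: the number of elements in the array; itemsize: the size of one element in bytes
# data: the buffer holding the array's actual data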
z = np.arange(15).reshape((3,5))
z.shape, z.dtype, z.ndim, z.size, z.itemsize, z.data, np.issubdtype(z.dtype, np.integer), np.issubdtype(z.dtype, np.floating)

((3L, 5L),
 dtype('int32'),
 2,
 15,
 4,
 <read-write buffer for 0x0000000007485580, size 60, offset 0 at 0x000000000745BC38>,
 True,
 False)

Overflow errors

Because NumPy numeric types have a fixed size, an overflow occurs when a value needs more memory than the data type provides. For example, numpy.power computes 100 ** 8 correctly with 64-bit integers, but returns 1874919424 (incorrect) with 32-bit integers.

np.power(100, 8, dtype=np.int64), np.power(100, 8, dtype=np.int32)

(10000000000000000, 1874919424)

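# NumPy provides numpy.iinfo and numpy.finfo to check the minimum and maximum values of integer and floating-point types, respectively
# np.int was removed in NumPy 1.24; np.int_ is used here, and its width is platform-dependent
np.iinfo(np.int_), np.iinfo(np.int32), np.iinfo(np.int64)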

(iinfo(min=-2147483648, max=2147483647, dtype=int32),
 iinfo(min=-2147483648, max=2147483647, dtype=int32),
 iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64))
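The text mentions numpy.finfo but only demonstrates numpy.iinfo. The sketch below shows both and uses the integer limits to test whether a value would overflow before computing with it. The helper `fits` is a hypothetical name introduced here for illustration; it is not part of NumPy.

```python
import numpy as np

int32_info = np.iinfo(np.int32)
float32_info = np.finfo(np.float32)
float64_info = np.finfo(np.float64)

print(int32_info.min, int32_info.max)
print(float32_info.max, float32_info.eps)   # largest value and machine epsilon
print(float64_info.max, float64_info.eps)


def fits(value, dtype):
    """Hypothetical helper: True if the Python int `value` fits in integer `dtype`."""
    info = np.iinfo(dtype)
    return info.min <= value <= info.max


# 100 ** 8 is computed exactly as a Python int, then compared with the limits.
print(fits(100 ** 8, np.int32), fits(100 ** 8, np.int64))
```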

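There are five general mechanisms for creating arrays:

1. Conversion from other Python structures (e.g., lists, tuples)
2. Intrinsic NumPy array creation functions (e.g., arange, ones, zeros)
3. Reading arrays from disk, in either standard or custom formats
4. Creating arrays from raw bytes through strings or buffers
5. Using special library functions (e.g., random)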

Filling with ones and zeros

Note: parameters inside [, ] are optional. For example, in empty(shape[, dtype, order]) the shape parameter is required, while dtype and order are optional.

Method	Description
empty(shape[, dtype, order])	Return a new array of the given shape and type, without initializing entries.
empty_like(prototype[, dtype, order, subok, …])	Return a new uninitialized array with the same shape and type as a given array.
eye(N[, M, k, dtype, order])	Return a two-dimensional array with ones on the diagonal and zeros elsewhere.
identity(n[, dtype])	Return the identity array.
ones(shape[, dtype, order])	Return a new array of the given shape and type, filled with ones.
ones_like(a[, dtype, order, subok, shape])	Return an array of ones with the same shape and type as a given array.
zeros(shape[, dtype, order])	Return a new array of the given shape and type, filled with zeros.
zeros_like(a[, dtype, order, subok, shape])	Return an array of zeros with the same shape and type as a given array.
full(shape, fill_value[, dtype, order])	Return a new array of the given shape and type, filled with fill_value.
full_like(a, fill_value[, dtype, order, …])	Return a full array with the same shape and type as a given array.
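A few of the functions in the table above can be tried as follows. This is a small sketch with illustrative variable names, not an exhaustive tour.

```python
import numpy as np

zeros = np.zeros((2, 3))                # dtype defaults to float64
ones = np.ones((2, 3), dtype=np.int8)   # dtype given explicitly
filled = np.full((2, 2), 7)             # every element is 7
shifted = np.eye(3, k=1)                # ones on the first diagonal above the main one
like = np.zeros_like(ones)              # same shape and dtype as `ones`, filled with zeros

print(zeros, ones, filled, shifted, like, sep="\n")
```

The `*_like` variants are convenient when a result buffer must match an existing array without repeating its shape and dtype.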

Creating from existing data

Method	Description
array(object[, dtype, copy, order, subok, ndmin])	Create an array.
asarray(a[, dtype, order])	Convert the input to an array.
asanyarray(a[, dtype, order])	Convert the input to an ndarray, but pass ndarray subclasses through.
ascontiguousarray(a[, dtype])	Return a contiguous array (ndim >= 1) in memory (C order).
asmatrix(data[, dtype])	Interpret the input as a matrix.
copy(a[, order])	Return an array copy of the given object.
frombuffer(buffer[, dtype, count, offset])	Interpret a buffer as a one-dimensional array.
fromfile(file[, dtype, count, sep, offset])	Construct an array from data in a text or binary file.
fromfunction(function, shape, **kwargs)	Construct an array by executing a function over each coordinate.
fromiter(iterable, dtype[, count])	Create a new one-dimensional array from an iterable object.
fromstring(string[, dtype, count, sep])	Create a new one-dimensional array initialized from text data in a string.
loadtxt(fname[, dtype, comments, delimiter, …])	Load data from a text file.
Note: in IPython, appending ? to a name (for example np.array?) displays the function's documentation.
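Several of the conversion functions listed above differ mainly in whether they copy their input. The following sketch, with illustrative variable names, shows that difference and a few of the other constructors.

```python
import numpy as np

arr = np.array([1, 2, 3])        # from a Python list

same = np.asarray(arr)           # already an ndarray: returned as-is, no copy
copied = np.array(arr)           # np.array copies by default

# Build arrays from an iterator, a bytes buffer, and a function of the indices.
squares = np.fromiter((x * x for x in range(5)), dtype=np.int64)
from_bytes = np.frombuffer(b"\x01\x02\x03", dtype=np.uint8)
table = np.fromfunction(lambda i, j: i * 10 + j, (2, 3), dtype=int)

print(same is arr, copied is arr)
print(squares, from_bytes, table, sep="\n")
```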

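# zeros and ones in code
"""
zeros: creates an array whose elements are all 0
ones: creates an array whose elements are all 1
empty: creates an array whose elements are uninitialized (arbitrary values) *****
"""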
np.empty([2, 2], dtype=int), np.empty((3, 3)),np.empty_like(([1,2,3], [4,5,6])),np.eye(4)

(array([[43998544,        0],
        [62099504,        0]]), array([[0.22222222, 0.44444444, 0.66666667],
        [0.88888889, 1.11111111, 1.33333333],
        [1.55555556, 1.77777778, 2.        ]]), array([[         0, 1073741824,          0],
        [1074790400,          0, 1075314688]]), array([[1., 0., 0., 0.],
        [0., 1., 0., 0.],
        [0., 0., 1., 0.],
        [0., 0., 0., 1.]]))

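# Because of finite floating-point precision, the number of elements arange returns can be hard to predict; with floating-point values it is better to use linspace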
np.arange(10,100,10),np.linspace(0,2,10)

(array([10, 20, 30, 40, 50, 60, 70, 80, 90]),
 array([0.        , 0.22222222, 0.44444444, 0.66666667, 0.88888889,
        1.11111111, 1.33333333, 1.55555556, 1.77777778, 2.        ]))

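Printing arrays

Printed arrays look much like nested lists.
If an array is too long, the middle part is omitted automatically and only the beginning and end are printed.

print(np.arange(24).reshape(2,3,4), np.arange(1000000))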

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]] [     0      1      2 ... 999997 999998 999999]

Basic Operations

Arithmetic operations on arrays are applied elementwise.

NOTE: unlike in linear algebra, the * operator denotes elementwise multiplication, not matrix multiplication. For matrix multiplication use np.dot(a, b) or a.dot(b).

If a is an integer array and b is a float array, a += b raises an error because b has higher precision than a; b += a does not raise an error.

a = np.array([1,2,3,4])
b = np.arange(4)
a-b, a+b, a*b, a*2, a//3, a.dot(b) # the elementwise results are all new arrays (in Python 3, a/3 would return floats)

(array([1, 1, 1, 1]),
 array([1, 3, 5, 7]),
 array([ 0,  2,  6, 12]),
 array([2, 4, 6, 8]),
 array([0, 0, 1, 1]),
 20)
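The in-place casting rule described above (int array `+=` float array fails, the reverse succeeds) can be seen in this short sketch. It assumes a reasonably recent NumPy, where the casting error is raised as a TypeError subclass; the variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])            # integer array
b = np.array([0.5, 1.5, 2.5, 3.5])    # float array

try:
    a += b        # the float result cannot be stored back into an integer array
    raised = False
except TypeError as exc:
    raised = True
    print("a += b failed:", exc)

b += a            # fine: integers are safely upcast to float
print(b)
```

When the float result is wanted, `a = a + b` creates a new float array instead of writing into `a`.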

Built-in ndarray methods include sum, min, max, and cumsum; pass axis=0 or axis=1 to operate along columns or rows.
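The effect of the axis argument is easiest to see on a small two-dimensional array. This is a minimal sketch; the array `m` and the result names are chosen here for illustration.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)    # [[0 1 2]
                                  #  [3 4 5]]

total = m.sum()                   # no axis: reduce over all elements
col_sums = m.sum(axis=0)          # axis=0 collapses the rows, one result per column
row_sums = m.sum(axis=1)          # axis=1 collapses the columns, one result per row
row_max = m.max(axis=1)
running = m.cumsum(axis=1)        # running total along each row

print(total, col_sums, row_sums, row_max, running, sep="\n")
```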

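Universal functions (ufuncs)

Unary ufuncs

Function	Description
abs, fabs	Compute the absolute value of integers, floats, or complex numbers. fabs is faster for non-complex data
sqrt	Compute the square root of each element
square	Compute the square of each element
exp	Compute the exponential $e^x$ of each element
log, log10, log2, log1p	Logarithms: natural (base e), base 10, base 2, and log(1 + x), respectively
sign	Compute the sign of each element: 1 (positive), 0 (zero), -1 (negative)
floor	Compute the largest integer less than or equal to each element
ceil	Compute the smallest integer greater than or equal to each element
rint	Round each element to the nearest integer, preserving the dtype
modf	Return the fractional and integer parts of the array as two separate arrays
isnan	Return a Boolean array indicating which values are NaN
isfinite, isinf	Return Boolean arrays indicating which elements are finite or infinite
cos, cosh, sin, sinh, tan, tanh	Regular and hyperbolic trigonometric functions
arccos, arccosh, arcsin, arcsinh, arctan, arctanh	Inverse trigonometric functions
logical_not	Compute the truth value of not x for each element. Equivalent to ~arr for a Boolean array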
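A handful of the unary ufuncs from the table, applied to a small array. This is a sketch with illustrative names; note that rint rounds exact halves to the nearest even integer.

```python
import numpy as np

x = np.array([-2.5, -0.5, 0.0, 1.5, 4.0])

signs = np.sign(x)                   # -1, 0, or 1 per element
floors = np.floor(x)
ceils = np.ceil(x)
frac, whole = np.modf(x)             # fractional and integer parts, both signed like x
rounded = np.rint(np.array([0.5, 1.5, 2.5]))   # halves go to the nearest even integer
roots = np.sqrt(np.array([0.0, 4.0, 9.0]))
negated = np.logical_not(np.array([True, False]))

print(signs, floors, ceils, frac, whole, rounded, roots, negated, sep="\n")
```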

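Binary ufuncs

Function	Description
add	Add corresponding elements of the arrays
subtract	Subtract the elements of the second array from the first
multiply	Multiply array elements
divide, floor_divide	Division, or floor division (discarding the remainder)
power	Raise each element A of the first array to the power B given by the corresponding element of the second: $A^B$
maximum, fmax, minimum, fmin	Elementwise maximum and minimum. fmax and fmin ignore NaN
mod	Elementwise remainder of division
copysign	Copy the sign of the values in the second array to the values in the first
greater, greater_equal, less, less_equal, equal, not_equal	Elementwise comparison, yielding a Boolean array. Equivalent to >, >=, <, <=, ==, != respectively
logical_and, logical_or, logical_xor	Elementwise truth-value logic: and, or, xor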
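The difference between maximum and fmax, along with a few other binary ufuncs from the table, can be sketched as follows. The arrays `a` and `b` are illustrative.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([2.0, 2.0, 2.0])

max_ab = np.maximum(a, b)     # NaN propagates into the result
fmax_ab = np.fmax(a, b)       # NaN is ignored when the other value is a number

cubes = np.power(b, 3)
quotient = np.floor_divide(7, 2)
remainder = np.mod(7, 2)
signed = np.copysign(np.array([1.0, 2.0]), np.array([-1.0, 1.0]))
bigger = np.greater(b, 1.0)   # same as b > 1.0
xor = np.logical_xor(np.array([True, True]), np.array([True, False]))

print(max_ab, fmax_ab, cubes, quotient, remainder, signed, bigger, xor, sep="\n")
```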

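Indexing, slicing, and iterating

One-dimensional arrays behave much like Python lists and tuples: they can be indexed, sliced, and iterated over.
The copy method can copy arrays and scalar values.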

c = np.arange(10)
c,c[0],c[1:3],c[:5],c[5:],c[:-1],c[:]

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 0,
 array([1, 2]),
 array([0, 1, 2, 3, 4]),
 array([5, 6, 7, 8, 9]),
 array([0, 1, 2, 3, 4, 5, 6, 7, 8]),
 array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))

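The most important difference from lists is that slicing an array does not copy the data: any modification to the slice is reflected in the source array.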

c1 = c[4:6]
c1[:] = 111
c

array([  0,   1,   2,   3, 111, 111,   6,   7,   8,   9])

# Indexing and slicing a two-dimensional array
d = np.arange(16).reshape((4,4))
d[1][0],d[0,1]

(4, 1)

# Boolean indexing
# In multidimensional arrays, Boolean indexing is often combined with the integer indexing shown above
e = np.array(['a','b','c','e','f','g','h','i','j','k'])
e == 'e',c[e == 'f'],c[e != 'e'],c[(e == 'f')|(e != 'f')]

(array([False, False, False,  True, False, False, False, False, False,
        False]),
 array([111]),
 array([  0,   1,   2, 111, 111,   6,   7,   8,   9]),
 array([  0,   1,   2,   3, 111, 111,   6,   7,   8,   9]))

Array shape

The shape can be changed with reshape (returns a new array), resize (modifies the array in place), and ravel (flattens the array).
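The three shape-changing methods mentioned above differ in whether they touch the original array. A minimal sketch with illustrative names:

```python
import numpy as np

a = np.arange(6)

reshaped = a.reshape(2, 3)    # new array object; `a` keeps its shape
flat = reshaped.ravel()       # back to one dimension

b = np.arange(6)
b.resize((3, 2))              # changes `b` itself and returns None

print(a.shape, reshaped.shape, flat.shape, b.shape)
```

Because the total number of elements is unchanged here, the in-place resize only rearranges the shape of `b`.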

Stacking arrays together

Multiple arrays can be stacked along different axes, using vstack (vertically) and hstack (horizontally).
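A short sketch of vertical and horizontal stacking, using two illustrative 2x2 arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

v = np.vstack((a, b))   # rows of b are placed below rows of a -> shape (4, 2)
h = np.hstack((a, b))   # columns of b are placed to the right of a -> shape (2, 4)

print(v, h, sep="\n")
```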

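Splitting one array into several

Use hsplit or vsplit to split an array along a given direction. split(array, (x, y)) splits at indices x and y, and split(array, n) splits into n equal arrays.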
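Both forms of the second argument, a count or a tuple of split positions, are shown in this sketch. The array `a` and the result names are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(2, 6)

halves = np.hsplit(a, 2)         # two equal pieces of 3 columns each
parts = np.hsplit(a, (2, 5))     # columns [:2], [2:5], and [5:]
rows = np.vsplit(a, 2)           # two pieces of 1 row each

print([p.shape for p in halves])
print([p.shape for p in parts])
print([p.shape for p in rows])
```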

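Copies and views

1. Simple assignment makes no copy; passing an array to a function makes no copy either.
2. Different array objects can share the same data. The view method creates a new array object that still uses the same data.
3. Deep copy: copy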
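The three cases above (assignment, view, deep copy) can be told apart by checking object identity and shared memory. A minimal sketch with illustrative names:

```python
import numpy as np

a = np.arange(4)

alias = a            # 1. assignment: the same object under another name
v = a.view()         # 2. view: a new array object sharing a's data
c = a.copy()         # 3. deep copy: a new object with its own data

v[0] = 99            # visible through `a`, because the data is shared
c[1] = -1            # not visible through `a`

print(alias is a, v is a, v.base is a)
print(np.shares_memory(a, v), np.shares_memory(a, c))
print(a)
```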

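Broadcasting

NumPy's broadcasting mechanism is triggered when an operation involves arrays of different shapes. Broadcasting only works when certain conditions are met; otherwise an error is raised.

Same shape

Operations between arrays of the same shape are applied to corresponding elements.

Different shapes

When array shapes differ, broadcasting is triggered. The rules are:

1. All input arrays are aligned with the array that has the longest shape; shorter shapes are padded with 1s at the front.

2. The shape of the output array is the maximum along each axis of the input shapes.

3. An input array can be used in the calculation if, along each axis, its length equals that of the output array or is 1; otherwise an error is raised.

4. When an input array has length 1 along an axis, the first (and only) set of values along that axis is used for every position along it.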
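The four rules above can be traced on a small example. In this sketch the shapes (3, 1) and (4,) were picked for illustration: the second is padded to (1, 4), the output shape is (3, 4), and each length-1 axis is repeated.

```python
import numpy as np

a = np.arange(3).reshape(3, 1)   # shape (3, 1)
b = np.arange(4)                 # shape (4,), treated as (1, 4) by rule 1

out_shape = np.broadcast(a, b).shape   # rule 2: per-axis maximum
s = a + b                              # rule 4: s[i, j] == a[i, 0] + b[j]
print(out_shape)
print(s)

# Rule 3: (3, 2) against (3,) -> (1, 3); the last axis has 2 vs 3, so this fails.
try:
    np.ones((3, 2)) + np.ones(3)
    failed = False
except ValueError as exc:
    failed = True
    print("broadcast error:", exc)
```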
