NumPy Basics (2. ndarray Operations)

>>> import numpy as np
# Create an array by passing in a Python list
>>> a = np.array([2,3,4])
>>> a
array([2, 3, 4])
>>> type(a)
<class 'numpy.ndarray'>

# The default integer dtype depends on the platform and Python environment (int32 here, int64 on many systems)
>>> a.dtype
dtype('int32')
>>> b = np.array([1.2, 3.5, 5.1])
>>> b.dtype
dtype('float64')

2. Creating a two-dimensional array

>>> b = np.array([(1.5,2,3), (4,5,6)])
>>> b
array([[ 1.5, 2. , 3. ],
[ 4. , 5. , 6. ]])

3. Specifying the array type explicitly

>>> c = np.array( [ [1,2], [3,4] ], dtype=complex )
>>> c
array([[ 1.+0.j, 2.+0.j],
[ 3.+0.j, 4.+0.j]])

4. Creating an array of zeros

>>> np.zeros( (3,4) )
array([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])

5. Creating an array of ones

>>> np.ones( (2,3,4), dtype=np.int16 ) # dtype can also be specified
array([[[ 1, 1, 1, 1],
[ 1, 1, 1, 1],
[ 1, 1, 1, 1]],
[[ 1, 1, 1, 1],
[ 1, 1, 1, 1],
[ 1, 1, 1, 1]]], dtype=int16)


>>> np.empty( (2,3) ) # uninitialized, output may vary
array([[ 3.73603959e-262, 6.02658058e-154, 6.55490914e-260],
[ 5.30498948e-313, 3.14673309e-307, 1.00000000e+000]])

6. NumPy provides arange, an analogue of Python's range that returns an array

>>> np.arange( 10, 30, 5 )
array([10, 15, 20, 25])
>>> np.arange( 0, 2, 0.3 ) # it accepts float arguments
array([ 0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])

7. NumPy provides linspace, which returns a given number of evenly spaced points over a range (3 arguments: start, stop, count)

When arange is used with floating point arguments, it is generally not possible to predict the number of elements
obtained, due to the finite floating point precision. For this reason, it is usually better to use the function linspace
that receives as an argument the number of elements that we want, instead of the step:

>>> from numpy import pi
>>> np.linspace( 0, 2, 9 ) # 9 numbers from 0 to 2
array([ 0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. ])

# Sample 100 points from 0 to 2π, then take the sine of each
>>> x = np.linspace( 0, 2*pi, 100 ) # useful to evaluate function at lots of points
>>> f = np.sin(x)
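
The difference between a step-based and a count-based range can be checked directly. The following is a minimal sketch; the endpoints 0.1 and 0.4 and the names by_step and by_count are illustrative choices, not part of the original notes.

```python
import numpy as np

start, stop, step = 0.1, 0.4, 0.1

# With a float step the length comes from ceil((stop - start) / step),
# so rounding error decides whether an extra element appears.
by_step = np.arange(start, stop, step)

# linspace takes the element count directly, so the length is always exact
# and both endpoints are included by default.
by_count = np.linspace(start, stop, num=4)

print(len(by_step), by_step)
print(len(by_count), by_count)
```

Printing both lengths side by side makes it easy to see whether arange produced the number of elements you expected on your machine.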

II. Printing arrays

reshape raises a ValueError if the number of elements does not match the requested shape.

>>> a = np.arange(6) # 1d array
>>> print(a)
[0 1 2 3 4 5]
>>>
>>> b = np.arange(12).reshape(4,3) # 2d array
>>> print(b)
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
>>>
>>> c = np.arange(24).reshape(2,3,4) # 3d array
>>> print(c)
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
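
A short sketch of the reshape failure mode noted at the top of this section, assuming a 12-element array; the flag name reshape_failed is only for illustration.

```python
import numpy as np

a = np.arange(12)

try:
    a.reshape(5, 3)        # 5 * 3 = 15 elements requested, but a has 12
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)

ok = a.reshape(4, -1)      # -1 lets NumPy infer the missing dimension
print(ok.shape)            # (4, 3)
```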

If an array is too large to print, NumPy automatically skips the central part and prints only the corners:

>>> print(np.arange(10000))
[ 0 1 2 ..., 9997 9998 9999]
>>>
>>> print(np.arange(10000).reshape(100,100))
[[ 0 1 2 ..., 97 98 99]
[ 100 101 102 ..., 197 198 199]
[ 200 201 202 ..., 297 298 299]
...,
[9700 9701 9702 ..., 9797 9798 9799]
[9800 9801 9802 ..., 9897 9898 9899]
[9900 9901 9902 ..., 9997 9998 9999]]

III. Basic operations

1. Elementwise operations

Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.

>>> a = np.array( [20,30,40,50] )
>>> b = np.arange( 4 )
>>> b
array([0, 1, 2, 3])
# Array subtraction
>>> c = a-b
>>> c
array([20, 29, 38, 47])
# Elementwise square
>>> b**2
array([0, 1, 4, 9])
# Elementwise sine
>>> 10*np.sin(a)
array([ 9.12945251, -9.88031624, 7.4511316 , -2.62374854])
# Elementwise comparison
>>> a<35
array([ True, True, False, False])

2. Matrix product

Unlike in many matrix languages, the product operator * operates elementwise in NumPy arrays. The matrix product
can be performed using the @ operator (in python >=3.5) or the dot function or method:

>>> A = np.array( [[1,1],
...                [0,1]] )
>>> B = np.array( [[2,0],
...                [3,4]] )
# Elementwise product
>>> A * B # elementwise product
array([[2, 0],
       [0, 4]])
# Matrix product
>>> A @ B # matrix product
array([[5, 4],
       [3, 4]])
# Matrix product, second way
>>> A.dot(B) # another matrix product
array([[5, 4],
       [3, 4]])
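
As a quick check that the different spellings of the matrix product agree, here is a small sketch using the same A and B; np.matmul is included as one more equivalent for 2-D inputs, and the variable names elementwise and matrix are illustrative.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

elementwise = A * B      # multiplies matching entries
matrix = A @ B           # rows of A times columns of B

# np.dot and np.matmul agree with @ for 2-D arrays
assert np.array_equal(matrix, np.dot(A, B))
assert np.array_equal(matrix, np.matmul(A, B))

print(elementwise)
print(matrix)
```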

3. += and *= modify the existing array in place instead of creating a new one

Some operations, such as += and *=, act in place to modify an existing array rather than create a new one.

>>> a = np.ones((2,3), dtype=int)
>>> b = np.random.random((2,3))
>>> a *= 3
>>> a
array([[3, 3, 3],
[3, 3, 3]])
>>> b += a
>>> b
array([[ 3.417022 , 3.72032449, 3.00011437],
[ 3.30233257, 3.14675589, 3.09233859]])
>>> a += b # b is not automatically converted to integer type
Traceback (most recent call last):
...
TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
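
When the in-place update fails like this, there are two common ways around it. The sketch below is one possible approach, not the only one; the fill value 0.75 and the names raised and total are stand-ins chosen for the example.

```python
import numpy as np

a = np.ones((2, 3), dtype=int) * 3
b = np.full((2, 3), 0.75)      # stand-in float data (hypothetical values)

try:
    a += b                     # a float64 result cannot be stored in an int array
    raised = False
except TypeError:
    raised = True

# Option 1: build a new float array instead of updating in place
total = a + b

# Option 2: keep the int array and explicitly accept truncation
np.add(a, b, out=a, casting="unsafe")

print(raised, total.dtype, a.dtype)
```

Option 2 silently drops the fractional part, so it only makes sense when that loss is intended.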

4. Operations on arrays of different types produce the more precise type (consistent with type promotion in other strongly typed languages)

When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise
one (a behavior known as upcasting).

# int32 + float64 = float64
>>> a = np.ones(3, dtype=np.int32)
>>> b = np.linspace(0,pi,3)
>>> b.dtype.name
'float64'
>>> c = a+b
>>> c
array([ 1. , 2.57079633, 4.14159265])
>>> c.dtype.name
'float64'

# -> complex128

>>> d = np.exp(c*1j)
>>> d
array([ 0.54030231+0.84147098j, -0.84147098+0.54030231j,
-0.54030231-0.84147098j])
>>> d.dtype.name
'complex128'
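
The promoted type can also be queried without building any arrays. A minimal sketch using np.result_type, followed by an explicit downcast with astype; the variable names are illustrative.

```python
import numpy as np

int_plus_float = np.result_type(np.int32, np.float64)
float_plus_complex = np.result_type(np.float64, np.complex64)

print(int_plus_float)       # float64
print(float_plus_complex)   # complex128

# Going the other way must be requested explicitly;
# float -> int conversion truncates toward zero.
truncated = np.array([1.9, -1.9]).astype(np.int32)
print(truncated)            # [ 1 -1]
```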

5. The ndarray class provides some unary operations as methods:

Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of
the ndarray class.

>>> a = np.random.random((2,3))
>>> a
array([[ 0.18626021, 0.34556073, 0.39676747],
[ 0.53881673, 0.41919451, 0.6852195 ]])
# Sum of all elements
>>> a.sum()
2.5718191614547998
# Minimum of all elements
>>> a.min()
0.1862602113776709

>>> a.max()
0.6852195003967595

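6. Computing along a given axis with the axis parameter

axis=0 collapses the rows, giving one result per column
axis=1 collapses the columns, giving one result per row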

By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However,
by specifying the axis parameter you can apply an operation along the specified axis of an array:

# 3 rows, 4 columns

>>> b = np.arange(12).reshape(3,4)
>>> b
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>>
# Operate on each column
>>> b.sum(axis=0) # sum of each column
array([12, 15, 18, 21])
>>>
# Operate on each row
>>> b.min(axis=1) # min of each row
array([0, 4, 8])
>>>
>>> b.cumsum(axis=1) # cumulative sum along each row
array([[ 0, 1, 3, 6],
[ 4, 9, 15, 22],
[ 8, 17, 27, 38]])
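
One way to keep the axis convention straight is to look at the shape of the result: the axis you name is the one that disappears. The sketch below also uses keepdims=True, an option not covered above, to keep the reduced axis with length 1 so the result broadcasts against the original array.

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

col_sums = b.sum(axis=0)                     # collapses the 3 rows    -> shape (4,)
row_sums = b.sum(axis=1)                     # collapses the 4 columns -> shape (3,)
row_sums_2d = b.sum(axis=1, keepdims=True)   # shape (3, 1)

print(col_sums.shape, row_sums.shape, row_sums_2d.shape)

# Each element as a fraction of its row total
share_of_row = b / row_sums_2d
print(share_of_row.sum(axis=1))
```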

7. Universal functions: exp, sqrt, ...

NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called “universal
functions”(ufunc). Within NumPy, these functions operate elementwise on an array, producing an array as output.

# Create the array [0, 1, 2]
>>> B = np.arange(3)
>>> B
array([0, 1, 2])

# exp (e raised to the power of each element) is applied elementwise
>>> np.exp(B)
array([ 1. , 2.71828183, 7.3890561 ])

# Square root of each element of B
>>> np.sqrt(B)
array([ 0. , 1. , 1.41421356])
>>> C = np.array([2., -1., 4.])

# Elementwise sum of two arrays

>>> np.add(B, C)
array([ 2., 0., 6.])
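
Because np.add is a ufunc object rather than a plain function, it also carries methods such as reduce and accumulate. A brief sketch, reusing B and C from above; the names summed, running and total are illustrative.

```python
import numpy as np

B = np.arange(3)
C = np.array([2., -1., 4.])

print(isinstance(np.add, np.ufunc))   # True

summed = np.add(B, C)                 # same result as B + C
running = np.add.accumulate(C)        # running total of C
total = np.add.reduce(C)              # same value as C.sum()

print(summed, running, total)
```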

Other commonly used functions and methods:
all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj,
corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, inv, lexsort, max, maximum,
mean, median, min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, transpose,
var, vdot, vectorize, where

8. Indexing, Slicing and Iterating

One-dimensional arrays

One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.

# Build the cubes of 0 through 9 (** is applied elementwise)
>>> a = np.arange(10)**3
>>> a
array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729])

# Indexing
>>> a[2]
8

# Slicing: positions 2 up to 5 (start inclusive, end exclusive)
>>> a[2:5]
array([ 8, 27, 64])

# [start:stop:step] = value
>>> a[:6:2] = -1000 # equivalent to a[0:6:2] = -1000; from start to position 6, exclusive, set every 2nd element to -1000
>>> a
array([-1000, 1, -1000, 27, -1000, 125, 216, 343, 512, 729])

# Reverse
>>> a[ : :-1] # reversed a
array([ 729, 512, 343, 216, 125, -1000, 27, -1000, 1, -1000])

# Arrays are iterable; the fractional power of a negative element comes out as nan
>>> for i in a:
...     print(i**(1/3.))
...
nan
1.0
nan
3.0
nan
5.0
6.0
7.0
8.0
9.0
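
The nan entries appear because a negative number raised to a fractional power has no real result in floating point. If real cube roots are what you want, np.cbrt is one alternative; this sketch rebuilds the same array a as above.

```python
import numpy as np

a = np.arange(10) ** 3
a[:6:2] = -1000

roots = np.cbrt(a)     # real cube root, defined for negative inputs too
print(roots)
```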

Multidimensional arrays

Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas:

# Define f to return 10 * (row index) + (column index)
>>> def f(x,y):
...     return 10*x+y
...
# Use fromfunction to build an int array of shape (5,4) whose elements are given by f
>>> b = np.fromfunction(f,(5,4),dtype=int)
>>> b
array([[ 0, 1, 2, 3],
[10, 11, 12, 13],
[20, 21, 22, 23],
[30, 31, 32, 33],
[40, 41, 42, 43]])

# Look up a value by its (row, column) indices, counting from 0
>>> b[2,3]
23

# Rows 0 through 4, column index 1
>>> b[0:5, 1] # each row in the second column of b
array([ 1, 11, 21, 31, 41])

# All rows, column index 1; same result as the previous example
>>> b[ : ,1] # equivalent to the previous example
array([ 1, 11, 21, 31, 41])

# Rows 1 and 2 (index 1 up to, but not including, 3), all columns

>>> b[1:3, : ] # each column in the second and third row of b
array([[10, 11, 12, 13],
[20, 21, 22, 23]])

When fewer indices are provided than the number of axes, the missing indices are considered complete slices:

>>> b[-1] # the last row. Equivalent to b[-1,:]
array([40, 41, 42, 43])

Omitting indices in higher-dimensional arrays

The expression within brackets in b[i] is treated as an i followed by as many instances of : as needed to represent
the remaining axes. NumPy also allows you to write this using dots as b[i,...].
The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is an
array with 5 axes, then
• x[1,2,...] is equivalent to x[1,2,:,:,:],
• x[...,3] to x[:,:,:,:,3] and
• x[4,...,5,:] to x[4,:,:,5,:].

>>> c = np.array( [[[ 0, 1, 2], # a 3D array (two stacked 2D arrays)
... [ 10, 12, 13]],
... [[100,101,102],
... [110,112,113]]])
# A 3-dimensional array
>>> c.shape
(2, 2, 3)

# Index 1 along the first axis
>>> c[1,...] # same as c[1,:,:] or c[1]
array([[100, 101, 102],
[110, 112, 113]])

# Index 2 along the last axis
>>> c[...,2] # same as c[:,:,2]
array([[ 2, 13],
[102, 113]])
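
The equivalences for a 5-axis array can be verified mechanically. This is a sketch with a hypothetical array x of shape (2, 3, 4, 5, 6); the indices are smaller than those in the text so that they fit this shape.

```python
import numpy as np

x = np.arange(2 * 3 * 4 * 5 * 6).reshape(2, 3, 4, 5, 6)

same_1 = np.array_equal(x[1, 2, ...], x[1, 2, :, :, :])
same_2 = np.array_equal(x[..., 3], x[:, :, :, :, 3])
same_3 = np.array_equal(x[1, ..., 4, :], x[1, :, :, 4, :])

print(same_1, same_2, same_3)
print(x[1, ..., 4, :].shape)   # (3, 4, 6)
```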

Iterating by row

Iterating over multidimensional arrays is done with respect to the first axis:

>>> for row in b:
...     print(row)
...
[0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]

Flat iteration

However, if one wants to perform an operation on each element in the array, one can use the flat attribute which is
an iterator over all the elements of the array:


>>> for element in b.flat:
...     print(element)
...
0
1
2
3
10
11
12
13
20
21
22
23
30
31
32
33
40
41
42
43

IV. Shape manipulation

An array has a shape given by the number of elements along each axis:

>>> a = np.floor(10*np.random.random((3,4)))
>>> a
array([[ 2., 8., 0., 6.],
[ 4., 5., 1., 1.],
[ 8., 9., 3., 6.]])

# Inspect the shape attribute

>>> a.shape
(3, 4)

1. reshape returns a new array and does not change the original

The shape of an array can be changed with various commands. Note that the following three commands all return a
modified array, but do not change the original array:


>>> a.ravel() # returns the array, flattened
array([ 2., 8., 0., 6., 4., 5., 1., 1., 8., 9., 3., 6.])
>>> a.reshape(6,2) # returns the array with a modified shape
array([[ 2., 8.],
[ 0., 6.],
[ 4., 5.],
[ 1., 1.],
[ 8., 9.],
[ 3., 6.]])

2. Transpose: T

>>> a.T # returns the array, transposed
array([[ 2., 4., 8.],
[ 8., 5., 9.],
[ 0., 1., 3.],
[ 6., 1., 6.]])
>>> a.T.shape
(4, 3)
>>> a.shape
(3, 4)

Note that transposing a one-dimensional array has no effect:

>>> b = np.array([1,2,3])
>>> b
array([1, 2, 3])
>>> b.T
array([1, 2, 3])

# Add an axis first, then transpose:
>>> c = b[np.newaxis,]
>>> c
array([[1, 2, 3]])
>>> c.T
array([[1],
       [2],
       [3]])

The order of the elements in the array resulting from ravel() is normally “C-style”, that is, the rightmost index “changes
the fastest”, so the element after a[0,0] is a[0,1]. If the array is reshaped to some other shape, again the array is treated
as “C-style”. NumPy normally creates arrays stored in this order, so ravel() will usually not need to copy its argument,
but if the array was made by taking slices of another array or created with unusual options, it may need to be copied.
The functions ravel() and reshape() can also be instructed, using an optional argument, to use FORTRAN-style arrays,
in which the leftmost index changes the fastest.
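
The two orderings are easiest to compare on a tiny array. A minimal sketch, assuming a 2x3 array of 0 through 5; order="F" is the optional argument mentioned above, and the variable names are illustrative.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)     # [[0, 1, 2], [3, 4, 5]]

c_flat = a.ravel()                 # C order: rightmost index varies fastest
f_flat = a.ravel(order="F")        # Fortran order: leftmost index varies fastest
f_shaped = np.arange(6).reshape((2, 3), order="F")

print(c_flat)     # [0 1 2 3 4 5]
print(f_flat)     # [0 3 1 4 2 5]
print(f_shaped)   # rows are [0 2 4] and [1 3 5]
```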

3. resize modifies the array in place

>>> a
array([[ 2., 8., 0., 6.],
[ 4., 5., 1., 1.],
[ 8., 9., 3., 6.]])
>>> a.resize((2,6))
>>> a
array([[ 2., 8., 0., 6., 4., 5.],
[ 1., 1., 8., 9., 3., 6.]])

4. reshape with -1 infers that dimension automatically

# Given 3 rows, the number of columns is computed automatically
>>> a.reshape(3,-1)
array([[ 2., 8., 0., 6.],
[ 4., 5., 1., 1.],
[ 8., 9., 3., 6.]])

See also:
ndarray.shape, reshape, resize, ravel

5. Stacking

Arrays can be extended by stacking them together:

vstack stacks vertically

hstack stacks horizontally

>>> a = np.floor(10*np.random.random((2,2)))
>>> a
array([[ 8., 8.],
[ 0., 0.]])
>>> b = np.floor(10*np.random.random((2,2)))
>>> b
array([[ 1., 8.],
[ 0., 4.]])
>>> np.vstack((a,b))
array([[ 8., 8.],
[ 0., 0.],
[ 1., 8.],
[ 0., 4.]])
>>> np.hstack((a,b))
array([[ 8., 8., 1., 8.],
[ 0., 0., 0., 4.]])

Stacking with the column_stack function

The function column_stack stacks 1D arrays as columns into a 2D array. It is equivalent to hstack only for 2D
arrays:

>>> from numpy import newaxis
>>> np.column_stack((a,b)) # with 2D arrays
array([[ 8., 8., 1., 8.],
[ 0., 0., 0., 4.]])
>>> a = np.array([4.,2.])
>>> b = np.array([3.,8.])
>>> np.column_stack((a,b)) # returns a 2D array
array([[ 4., 3.],
[ 2., 8.]])
>>> np.hstack((a,b)) # the result is different
array([ 4., 2., 3., 8.])
>>> a[:,newaxis] # this allows to have a 2D columns vector
array([[ 4.],
[ 2.]])
>>> np.column_stack((a[:,newaxis],b[:,newaxis]))
array([[ 4., 3.],
[ 2., 8.]])
>>> np.hstack((a[:,newaxis],b[:,newaxis])) # the result is the same
array([[ 4., 3.],
[ 2., 8.]])

# concatenate also works for higher-dimensional arrays; axis selects the dimension along which to join
>>> np.concatenate((a,b,a), axis=0)
array([ 4., 2., 3., 8., 4., 2.])

On the other hand, the function row_stack is equivalent to vstack for any input arrays. In general, for arrays
with more than two dimensions, hstack stacks along their second axes, vstack stacks along their first axes, and
concatenate allows for an optional argument giving the number of the axis along which the concatenation should
happen.

In complex cases, r_ and c_ are useful for creating arrays by stacking numbers along one axis. They allow the use of
range literals (“:”):

>>> np.r_[1:4,0,4]
array([1, 2, 3, 0, 4])

When used with arrays as arguments, r_ and c_ are similar to vstack and hstack in their default behavior, but
allow for an optional argument giving the number of the axis along which to concatenate.
See also:
hstack, vstack, column_stack, concatenate, c_, r_
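
A small sketch of r_ and c_ with array arguments, assuming two 1-D inputs; the names joined and columns are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

joined = np.r_[a, b]      # 1-D inputs are concatenated end to end
columns = np.c_[a, b]     # 1-D inputs become the columns of a 2-D array

print(joined)             # [1 2 3 4 5 6]
print(columns.shape)      # (3, 2)
```

For 1-D inputs, np.c_ gives the same result as column_stack, which can be a convenient shorthand in interactive work.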

6. Splitting an array into several: vsplit & hsplit

Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped
arrays to return, or by specifying the columns after which the division should occur:

>>> a = np.floor(10*np.random.random((2,12)))
>>> a
array([[ 9., 5., 6., 3., 6., 8., 0., 7., 9., 7., 2., 7.],
[ 1., 4., 9., 2., 2., 1., 0., 6., 2., 2., 4., 0.]])

# Split a (2 rows, 12 columns) into 3 pieces of 4 columns each
>>> np.hsplit(a,3) # Split a into 3
[array([[ 9., 5., 6., 3.],
[ 1., 4., 9., 2.]]), array([[ 6., 8., 0., 7.],
[ 2., 1., 0., 6.]]), array([[ 9., 7., 2., 7.],
[ 2., 2., 4., 0.]])]

# Cut once after the third column and once after the fourth
>>> np.hsplit(a,(3,4)) # Split a after the third and the fourth column
[array([[ 9., 5., 6.],
[ 1., 4., 9.]]), array([[ 3.],
[ 2.]]), array([[ 6., 8., 0., 7., 9., 7., 2., 7.],
[ 2., 1., 0., 6., 2., 2., 4., 0.]])]

vsplit splits along the vertical axis, and array_split allows one to specify along which axis to split.
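
For completeness, a sketch of the vertical counterpart and of array_split with an explicit axis, assuming a 4x4 array; the names top, bottom and parts are illustrative.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

top, bottom = np.vsplit(a, 2)              # two blocks of 2 rows each
parts = np.array_split(a, 3, axis=1)       # uneven split of the 4 columns

print(top.shape, bottom.shape)             # (2, 4) (2, 4)
print([p.shape for p in parts])            # [(4, 2), (4, 1), (4, 1)]
```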

7. Splitting with split

>>> a = np.arange(3,15).reshape(3,4)
>>> a
array([[ 3,  4,  5,  6],
       [ 7,  8,  9, 10],
       [11, 12, 13, 14]])
       
# split requires all pieces to have the same shape; an uneven division raises an error
>>> np.split(a,2,axis=1)
[array([[ 3,  4],
       [ 7,  8],
       [11, 12]]), 
array([[ 5,  6],
       [ 9, 10],
       [13, 14]])]
       
# array_split allows pieces of different shapes
>>> np.array_split(a,2,axis=0)
[array([[ 3,  4,  5,  6],
       [ 7,  8,  9, 10]]), 
 array([[11, 12, 13, 14]])]

V. Copies and Views

When operating and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is
often a source of confusion for beginners. There are three cases:

1. No Copy at All

5.1.1 Assignment

>>> a = np.arange(12)
>>> b = a # no new object is created
>>> b is a # a and b are two names for the same ndarray object
True
>>> b.shape = 3,4 # changes the shape of a
>>> a.shape
(3, 4)

5.1.2 Python passes mutable objects by reference, so a function call makes no copy

>>> def f(x):
...     print(id(x))
...
>>> id(a) # id is a unique identifier of an object
148293216
>>> f(a)
148293216

2. View or Shallow Copy

The shape is independent; the data is shared.

Different array objects can share the same data. The view method creates a new array object that looks at the same data.

>>> c = a.view()
>>> c is a
False
>>> c.base is a # c is a view of the data owned by a
True
>>> c.flags.owndata
False
>>>

# Changing the shape of c leaves the shape of a unchanged
>>> c.shape = 2,6 # a's shape doesn't change
>>> a.shape
(3, 4)
>>> c.shape
(2, 6)

# Changing the data of c changes the data of a
>>> c[0,4] = 1234 # a's data changes
>>> a
array([[ 0, 1, 2, 3],
[1234, 5, 6, 7],
[ 8, 9, 10, 11]])

Slicing also returns a view

Slicing an array returns a view of it:

>>> s = a[ : , 1:3] # spaces added for clarity; could also be written "s = a[:,1:3]"
>>> s[:] = 10 # s[:] is a view of s. Note the difference between s=10 and s[:]=10
>>> a
array([[ 0, 10, 10, 3],
[1234, 10, 10, 7],
[ 8, 10, 10, 11]])
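
Whether two arrays share data can be tested with np.shares_memory. The sketch below contrasts a basic slice with indexing by a list of rows, which returns a copy; that second case is not covered in the text above and is included only as a point of comparison. The names view and picked are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

view = a[:, 1:3]          # basic slicing -> view
picked = a[[0, 2]]        # indexing with a list of rows -> copy

print(np.shares_memory(a, view))     # True
print(np.shares_memory(a, picked))   # False

picked[:] = -1            # does not touch a
view[:] = 10              # writes through to a
print(a)
```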

3. Deep Copy

A deep copy simply duplicates the data, and NumPy's copy method does exactly that.

The copy method makes a complete copy of the array and its data.

# Copy the object with the copy method
>>> d = a.copy() # a new array object with new data is created

# The result is a new, distinct object
>>> d is a
False

# It does not use a as its base
>>> d.base is a # d doesn't share anything with a
False

# Modifying the data of d leaves a unchanged
>>> d[0,0] = 9999
>>> a
array([[ 0, 10, 10, 3],
[1234, 10, 10, 7],
[ 8, 10, 10, 11]])

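copy is typically called after slicing when the original array is no longer needed: copying only the data still in use lets the large original be freed, which saves memory.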

Sometimes copy should be called after slicing if the original array is not required anymore. For example, suppose
a is a huge intermediate result and the final result b only contains a small fraction of a, a deep copy should be made
when constructing b with slicing:

>>> a = np.arange(int(1e8))
>>> b = a[:100].copy()
>>> del a # the memory of ``a`` can be released.
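
The base attribute shows why the copy matters here. In this sketch (with a much smaller array than 1e8 elements, purely to keep it quick), the view still refers to the original buffer, while the copy owns only its own data; the names view and small are illustrative.

```python
import numpy as np

a = np.arange(1000)

view = a[:10]
small = a[:10].copy()

print(view.base is a)      # True: the view keeps the whole buffer of a alive
print(small.base is None)  # True: the copy owns just its 10 elements
print(small.nbytes)
```

With only the view kept around, deleting the name a would not free the underlying buffer, since the view still references it.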

VI. Overview of functions and methods

Array Creation: arange, array, copy, empty, empty_like, eye, fromfile, fromfunction,
identity, linspace, logspace, mgrid, ogrid, ones, ones_like, r_, zeros, zeros_like

Conversions: ndarray.astype, atleast_1d, atleast_2d, atleast_3d, mat

Manipulations: array_split, column_stack, concatenate, diagonal, dsplit, dstack, hsplit,
hstack, ndarray.item, newaxis, ravel, repeat, reshape, resize, squeeze, swapaxes,
take, transpose, vsplit, vstack

Questions: all, any, nonzero, where

Ordering: argmax, argmin, argsort, max, min, ptp, searchsorted, sort

Operations: choose, compress, cumprod, cumsum, inner, ndarray.fill, imag, prod, put, putmask,
real, sum

Basic Statistics: cov, mean, std, var

Basic Linear Algebra: cross, dot, outer, linalg.svd, vdot

That covers most of the basics. There are some fancier tricks as well, which I will add in a later update.

不可描述的两脚兽
