Data Analysis with Python - NumPy Basics: Arrays and Vectorized Computation
- ndarray, a fast, space-efficient multidimensional array with vectorized arithmetic and sophisticated broadcasting capabilities
- Standard math functions that operate quickly on entire arrays of data without writing for-loops
- Tools for reading and writing array data on disk and for working with memory-mapped files
- Linear algebra, random number generation, and Fourier transform functions
- Tools for integrating code written in languages such as C/C++
1. ndarray: a multidimensional array object
1. Creating an ndarray
# One-dimensional
In [5]: data = [1,2,3]
In [6]: import numpy as np
In [7]: arr1 = np.array(data)
In [8]: arr1
Out[8]: array([1, 2, 3])
# Two-dimensional
In [11]: data2 = [[1,2,3],[4,5,6]]
In [12]: arr2 = np.array(data2)
In [13]: arr2
Out[13]:
array([[1, 2, 3],
[4, 5, 6]])
# Inspect the array's shape and dtype
In [15]: arr2.shape
Out[15]: (2, 3)
In [16]: arr2.dtype
Out[16]: dtype('int32')
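Array creation functions
- array() converts input data (a list, tuple, or other sequence) to an ndarray
- arange() is similar to the Python built-in function range(), but returns an ndarray instead of a range object (a list in Python 2)
- ones, zeros create an array of all 1s or all 0s; the shape is passed as a tuple, such as np.ones((2,3))
- ones_like, zeros_like create an all-1 or all-0 array with the same shape and dtype as the array passed in
- empty, empty_like allocate memory for a new array without initializing its values, so the contents are arbitrary
- eye, identity create a square N x N identity matrix (1s on the diagonal, 0s elsewhere)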
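The creation functions listed above can be tried side by side. This is a minimal sketch; the variable names are illustrative and do not come from the original notes.

```python
import numpy as np

# arange works like range() but returns an ndarray
seq = np.arange(5)                 # values 0, 1, 2, 3, 4

# ones/zeros take the shape as a tuple
ones = np.ones((2, 3))
zeros = np.zeros((2, 3))

# *_like functions reuse the shape and dtype of an existing array
ones_like = np.ones_like(seq)

# empty only allocates memory; fill it before reading from it
buf = np.empty((2, 2))
buf[:] = 0.0

# eye and identity both build an N x N identity matrix
ident = np.eye(3)

print(seq, ones.shape, ones_like.dtype, ident.trace())
```

Note that ones and zeros default to a floating-point dtype, while ones_like inherits the integer dtype of seq.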
2. Operations between arrays and scalars
In [36]: arr2
Out[36]:
array([[1, 2, 3],
[4, 5, 6]])
In [37]: arr3
Out[37]:
array([[11, 12, 13],
[14, 15, 16]])
# Addition
In [38]: arr2+arr3
Out[38]:
array([[12, 14, 16],
[18, 20, 22]])
# Multiplication (element-wise)
In [39]: arr2*arr3
Out[39]:
array([[11, 24, 39],
[56, 75, 96]])
# Subtraction
In [40]: arr3-arr2
Out[40]:
array([[10, 10, 10],
[10, 10, 10]])
# Division
In [41]: arr3/arr2
Out[41]:
array([[11. , 6. , 4.33333333],
[ 3.5 , 3. , 2.66666667]])
# Square
In [42]: arr2**2
Out[42]:
array([[ 1, 4, 9],
[16, 25, 36]], dtype=int32)
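The transcript above combines two arrays of the same shape. Operations between an array and a scalar work the same way: the scalar is applied to every element. A short sketch, with illustrative variable names:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

scaled = arr * 10      # every element multiplied by 10
shifted = arr + 0.5    # integer array plus a float gives a float array
inverse = 1 / arr      # the scalar is divided by each element

print(scaled)
print(shifted.dtype)
print(inverse)
```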
3. Indexing and Slicing
Indexing:
arr2d[0,0] or arr2d[0][0]
arr3d[0,0,0] or arr3d[0][0][0]
Slicing: use the : (colon) notation
arr2d[:2,:2]
arr3d[:2,:2]
Array slicing and list slicing behave differently.
An array slice is a view on the original array, so modifying the slice modifies the source data. A list slice is a new list, so assigning to it leaves the original list unchanged.
If you need a copy of a slice instead of a view on the source array, call .copy() explicitly, for example arr[5:8].copy()
# List slicing
>>> l1 = list(range(10))
>>> l1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> l2 = l1[5:8]
>>> l2
[5, 6, 7]
>>> l2[0]=15
>>> l2
[15, 6, 7]
>>> l1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# Array slicing
In [50]: arr = np.arange(10)
In [51]: arr
Out[51]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [52]: arr_slice = arr[5:8]
In [53]: arr_slice
Out[53]: array([5, 6, 7])
In [54]: arr_slice[0]=15
In [55]: arr_slice
Out[55]: array([15, 6, 7])
In [56]: arr
Out[56]: array([ 0, 1, 2, 3, 4, 15, 6, 7, 8, 9])
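The difference between a view and an explicit copy can be checked directly. This sketch uses np.shares_memory to confirm which result is tied to the source array; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]            # a view: shares memory with arr
copied = arr[5:8].copy()   # an independent copy

copied[0] = 99             # does not affect arr
view[1] = 42               # writes through to arr[6]

print(arr)
print(np.shares_memory(arr, view), np.shares_memory(arr, copied))
```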
# Slicing a two-dimensional array
In [95]: arr2d
Out[95]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [96]: arr2d[:2]
Out[96]:
array([[1, 2, 3],
[4, 5, 6]])
Multiple slices can be passed at once, one per axis
In [97]: arr2d[:2,:1]
Out[97]:
array([[1],
[4]])
In [98]: arr2d[:2,:2]
Out[98]:
array([[1, 2],
[4, 5]])
# Three-dimensional (arr3d here is a nested Python list, as the output shows)
In [83]: arr3d
Out[83]: [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]
In [84]: arr3d[1]
Out[84]: [[7, 8, 9], [10, 11, 12]]
In [85]: arr3d[1][1]
Out[85]: [10, 11, 12]
In [86]: arr3d[1][1][1]
Out[86]: 11
In [87]: arr3d[1][1][2]
Out[87]: 12
Boolean indexing
# [True, False, True] selects rows 0 and 2
In [121]: arr2d[[True,False,True]]
Out[121]:
array([[1, 2, 3],
[7, 8, 9]])
In [122]: arr2d[[True,False,True],2]
Out[122]: array([3, 9])
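In practice the Boolean mask usually comes from a comparison rather than being typed by hand. The following sketch assumes the same 3 x 3 arr2d as in the transcript; the other names are illustrative.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Build a row mask from a comparison on the first column
mask = arr2d[:, 0] != 4          # True, False, True
rows = arr2d[mask]               # rows 0 and 2

# A mask with the array's own shape selects individual elements (1-D result)
evens = arr2d[arr2d % 2 == 0]

# Masks can also be used on the left-hand side of an assignment
capped = arr2d.copy()
capped[capped > 6] = 6

print(rows)
print(evens)
print(capped)
```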
Fancy indexing
# As with the Boolean index above, this also selects rows 0 and 2
In [132]: arr2d[[0,2]]
Out[132]:
array([[1, 2, 3],
[7, 8, 9]])
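# Points to note about fancy indexing
Fancy indexing, unlike slicing, always copies data into a new array. In addition, passing two index lists selects the elements at the paired positions (0,0) and (2,2) rather than a rectangular block; to get the block, index the rows first and then the columns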
In [136]: arr2d[[0,2],[0,2]]
Out[136]: array([1, 9])
In [137]: arr2d[[0,2]][:,[0,2]]
Out[137]:
array([[1, 3],
[7, 9]])
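Both points can be verified in a few lines. The sketch below also uses np.ix_, which is an alternative to the chained indexing shown above and is not part of the original notes.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Fancy indexing returns a copy, so writing to it leaves arr2d untouched
picked = arr2d[[0, 2]]
picked[0, 0] = 100

# Two index lists are paired up: elements (0, 0) and (2, 2)
paired = arr2d[[0, 2], [0, 2]]

# np.ix_ builds an open mesh so the same lists select a rectangular block
block = arr2d[np.ix_([0, 2], [0, 2])]

print(arr2d[0, 0], paired, block)
```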
Array transpose and axis swap
Transpose is a special form of reshape that returns a view of the source data without copying.
In [142]: arr2d.T
Out[142]:
array([[1, 4, 7],
[2, 5, 8],
[3, 6, 9]])
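Because the transpose is a view, writing to it changes the source array. The sketch below demonstrates this and adds a swapaxes call for the axis swap mentioned in the heading; the arrays are illustrative and not taken from the transcript.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)     # [[0, 1, 2], [3, 4, 5]]
t = arr.T                            # shape (3, 2), no data copied
t[0, 1] = 50                         # same memory cell as arr[1, 0]

arr3d = np.arange(24).reshape(2, 3, 4)
swapped = arr3d.swapaxes(1, 2)       # exchange the last two axes

print(arr)
print(swapped.shape)
```

swapaxes also returns a view, so element swapped[i, j, k] is the same cell as arr3d[i, k, j].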
4. Functions that operate on elements of an array
Unary functions, applied element-wise to a single array
- abs calculates the absolute value of each element
- sqrt computes the square root of each element
- square calculates the square of each element
- exp computes e raised to the power of each element
- log/log10/log2/log1p compute the natural, base-10, and base-2 logarithms; log1p is log(1+x)
- sign calculates the sign of each element: 1 (positive), 0 (zero), or -1 (negative)
- ceil calculates the smallest integer greater than or equal to each element
- floor calculates the largest integer less than or equal to each element
- rint rounds each element to the nearest integer
- modf returns the fractional and integer parts of each element, as two separate arrays
- isnan determines whether each element is NaN (not a number)
- isfinite/isinf determine whether each element is finite or infinite, respectively
- cos/sin/tan trigonometric functions
- arccos/arccosh/arcsin inverse trigonometric and inverse hyperbolic functions
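A few of the unary functions above, applied to a small sample array. The input values are chosen only for illustration.

```python
import numpy as np

x = np.array([-1.5, 0.0, 2.25, 4.0])

abs_x = np.abs(x)
roots = np.sqrt(abs_x)        # square root of each absolute value
signs = np.sign(x)            # -1, 0, 1, 1
up = np.ceil(x)               # -1, 0, 3, 4
down = np.floor(x)            # -2, 0, 2, 4
frac, whole = np.modf(x)      # fractional parts and integer parts

# isnan, isfinite and isinf return Boolean arrays
special = np.array([1.0, np.nan, np.inf])
nan_mask = np.isnan(special)
finite_mask = np.isfinite(special)
inf_mask = np.isinf(special)

print(roots, signs, frac, whole)
print(nan_mask, finite_mask, inf_mask)
```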
Binary functions, applied element-wise to two arrays
- add adds the corresponding elements of the two arrays
- subtract subtracts the elements of the second array from those of the first
- multiply multiplies the corresponding elements of the two arrays
- divide/floor_divide division, and floor division (remainder discarded)
- power(a,b) raises each element of a to the power given by the corresponding element of b
- mod computes the element-wise remainder of a division
- copysign copies the sign of each element in the second array onto the corresponding value in the first array
- greater/greater_equal/less/less_equal/equal/not_equal (equivalent to >, >=, <, <=, ==, !=) compare the values of corresponding elements
- logical_and/logical_or/logical_xor compute the element-wise logical AND, OR, and XOR
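The binary functions can be exercised on two small integer arrays. The values below are illustrative and include negative numbers to show how floor_divide and mod treat signs.

```python
import numpy as np

a = np.array([7, -8, 9])
b = np.array([2, 3, -4])

total = np.add(a, b)              # 9, -5, 5
quotient = np.floor_divide(a, b)  # 3, -3, -3 (rounds toward negative infinity)
remainder = np.mod(a, b)          # 1, 1, -3 (takes the sign of the divisor)
squares = np.power(a, 2)          # 49, 64, 81
signed = np.copysign(a, b)        # 7.0, 8.0, -9.0 (float result)
bigger = np.greater(a, b)         # True, False, True
both_pos = np.logical_and(a > 0, b > 0)

print(total, quotient, remainder, squares, signed, bigger, both_pos)
```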
5. Operations that can be expressed with arrays
Vectorization replaces explicit loops with concise array expressions
Ternary (conditional) operation
In [6]: xarr = np.array([1.1,1.2,1.3,1.4,1.5])
In [7]: yarr = np.array([2.1,2.2,2.3,2.4,2.5])
In [8]: cond = np.array([True,False,True,True,False])
In [9]: result = [x if c else y for x, y, c in zip(xarr, yarr, cond)]
In [10]: result
Out[10]: [1.1, 2.2, 1.3, 1.4, 2.5]
# np.where is the usual way to do this
# It is typically used to generate a new array based on another array
In [11]: result2 = np.where(cond,xarr,yarr)
In [12]: result2
Out[12]: array([1.1, 2.2, 1.3, 1.4, 2.5])
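The second and third arguments of np.where do not have to be arrays; either can be a scalar. A brief sketch with illustrative data:

```python
import numpy as np

arr = np.array([[-1.0, 2.0], [3.0, -4.0]])

# Replace positive values with 1 and everything else with -1
signs = np.where(arr > 0, 1, -1)

# Mix a scalar and an array: zero out negatives, keep the rest
clipped = np.where(arr < 0, 0.0, arr)

print(signs)
print(clipped)
```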
Mathematical and Statistical Methods
These methods can be called either as instance methods, for example arr2d.sum(), or as top-level NumPy functions, for example np.sum(arr2d)
- sum calculates the sum of all elements
- mean computes the mean of all elements
- std/var calculate the standard deviation and variance
- min/max return the minimum and maximum
- argmin/argmax return the indices of the minimum and maximum elements
- cumsum returns the cumulative sum of the elements
- cumprod returns the cumulative product of the elements
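Most of these methods also accept an axis argument, which restricts the computation to one dimension instead of the whole array. A sketch assuming the same 3 x 3 arr2d used earlier:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

total = arr2d.sum()               # all elements: 45
col_sums = arr2d.sum(axis=0)      # down the rows: 12, 15, 18
row_means = arr2d.mean(axis=1)    # across the columns: 2.0, 5.0, 8.0
running = arr2d.cumsum(axis=1)    # cumulative sum within each row
flat_argmax = arr2d.argmax()      # index into the flattened array: 8

print(total, col_sums, row_means, flat_argmax)
print(running)
```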
Methods for Boolean Arrays
# True is counted as 1
In [24]: (arr2d<4).sum()
Out[24]: 3
In [25]: cond
Out[25]: array([ True, False, True, True, False])
In [26]: cond.any()
Out[26]: True
In [27]: cond.all()
Out[27]: False
Sorting
- np.sort() returns a sorted copy and leaves the source array unchanged
- arr2d.sort() sorts the source array in place
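The difference between the two forms shows up when the source array is inspected afterwards. A minimal sketch with illustrative names:

```python
import numpy as np

arr = np.array([3, 1, 2])
before = arr.copy()

sorted_copy = np.sort(arr)                  # new array; arr is untouched
unchanged = np.array_equal(arr, before)

arr.sort()                                  # sorts arr itself, returns None

# For multidimensional arrays, axis selects the direction of the sort
grid = np.array([[3, 1, 2], [9, 7, 8]])
grid.sort(axis=1)                           # sort each row independently

print(sorted_copy, unchanged, arr)
print(grid)
```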
5. File input and output with arrays
Save and load arrays on disk in binary form
- np.save()
- np.load()
Read and write text files
- np.loadtxt()
- np.savetxt()
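A round trip through both formats might look like the following. The file names and the use of a temporary directory are choices made for this sketch, not part of the original notes.

```python
import os
import tempfile

import numpy as np

arr = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    # Binary format: dtype and shape are preserved exactly
    npy_path = os.path.join(tmp, "arr.npy")
    np.save(npy_path, arr)
    loaded = np.load(npy_path)

    # Text format: values are written as delimited text
    txt_path = os.path.join(tmp, "arr.txt")
    np.savetxt(txt_path, arr, delimiter=",")
    from_text = np.loadtxt(txt_path, delimiter=",")

print(loaded.dtype, from_text.dtype)
```

np.loadtxt returns floating-point values by default, so the integer dtype survives only in the binary round trip.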
6. Linear algebra: functions not found at the top level are in numpy.linalg
- Note: arr.T returns the transpose
- np.dot(arr1,arr2) computes the matrix product of two arrays
- np.diag returns the diagonal elements of a square matrix as a one-dimensional array, or converts a one-dimensional array to a square matrix with those values on the diagonal
- trace() calculates the sum of the diagonal elements
- det calculates the determinant of a square matrix
- eig computes eigenvalues and eigenvectors
- inv computes the inverse matrix
- pinv computes the pseudo-inverse matrix
- qr computes the QR decomposition
- svd computes singular value decomposition
- solve solves the linear equation Ax=b
- lstsq computes the least squares solution of Ax=b
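Several of these functions can be checked against each other on a small system. The matrix A and vector b below are made up for illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)          # solves A x = b
A_inv = np.linalg.inv(A)
det_A = np.linalg.det(A)           # 2*3 - 1*1 = 5
tr_A = np.trace(A)                 # 2 + 3 = 5
q, r = np.linalg.qr(A)             # A = q r

# Multiplying back should reproduce the inputs, up to rounding error
print(np.allclose(np.dot(A, x), b))
print(np.allclose(np.dot(A, A_inv), np.eye(2)))
print(np.allclose(np.dot(q, r), A))
```

Comparisons use np.allclose rather than == because floating-point results are exact only up to rounding.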
7. Random number generation: numpy.random complements Python's built-in random module
- seed sets the seed of the random number generator
- permutation returns a random permutation of a sequence, or a randomly permuted range
- shuffle shuffles a sequence in-place
- rand produces sample values from a uniform distribution over [0, 1)
- randint draws random integers from a given low-to-high range
- randn produces sample values from the standard normal distribution (mean 0, standard deviation 1)
- binomial yields sample values from a binomial distribution
- normal produces sample values from a normal (Gaussian) distribution
- beta yields sample values from a beta distribution
- chisquare produces sample values from a chi-square distribution
- gamma produces sample values from a Gamma distribution
- uniform produces uniformly distributed sample values, over [0, 1) by default
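The sketch below calls a handful of these functions and shows that resetting the seed reproduces the same sequence. The seed value and array sizes are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(42)
u = np.random.rand(3)                      # uniform samples in [0, 1)
n = np.random.randn(2, 2)                  # standard normal samples
ints = np.random.randint(0, 10, size=5)    # integers in [0, 10)
perm = np.random.permutation(5)            # shuffled copy of 0..4

seq = np.arange(5)
np.random.shuffle(seq)                     # shuffles seq in place

# Re-seeding with the same value replays the same random sequence
np.random.seed(42)
u_again = np.random.rand(3)

print(np.array_equal(u, u_again))
print(ints, perm, seq)
```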