Python data processing notes 01 - numpy array operations

Note: the environment used here is Windows 10 with Jupyter Notebook. Please download and install Anaconda yourself.

1, numpy Library Overview and Installation

Introduction:

Python can store a group of values in a list and use it as an array. Because list elements can be any object, a list actually stores pointers to objects. To hold a simple list such as [1,2,3], Python needs three pointers and three integer objects. For numerical computation this layout wastes both memory and CPU time.

Python also offers the array module. Unlike a list, an array object stores the values directly, much like a one-dimensional array in C. However, it does not support multiple dimensions and has no numerical functions to go with it, so it is not well suited to numerical computation either.

NumPy was created to make up for these shortcomings. It provides the ndarray (N-dimensional array) object: a multidimensional array that stores elements of a single data type.
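The difference between the three containers can be sketched as follows. This is only an illustration: the exact byte counts reported for the list depend on the Python build, and the variable names are made up for this example.

```python
import sys
from array import array

import numpy as np

values = [1, 2, 3]

# A list stores pointers to separate int objects,
# so count the list itself plus every int it points to.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array.array stores the raw values directly, but only in one dimension.
typed = array("i", values)
typed_bytes = typed.itemsize * len(typed)

# ndarray stores raw values of a single dtype and supports several dimensions.
nd = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(list_bytes)            # depends on the Python build
print(typed_bytes)           # bytes used by the raw values
print(nd.nbytes, nd.shape)   # 24 (2, 3)
```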

NumPy (short for Numerical Python) is the foundational package for high-performance scientific computing and data analysis. It supports multidimensional arrays and matrix operations. It includes:

(1) ndarray, a powerful N-dimensional array object that offers fast, space-saving vectorized arithmetic and sophisticated broadcasting.

(2) Standard mathematical functions that operate quickly on entire arrays without writing loops.

(3) Tools for reading and writing array data to disk and for working with memory-mapped files.

(4) Linear algebra, random number generation, and Fourier transform functions.

(5) Tools for integrating code written in C, C++, Fortran, and other languages.
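A rough tour of points (1) to (4) might look like the sketch below. The file name a.npy and the variable names are arbitrary choices for this example, and the temporary directory is used only so the example cleans up after itself.

```python
import os
import tempfile

import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)

# (1) Broadcasting: the 1-D row is added to every row of a.
shifted = a + np.array([10.0, 20.0, 30.0])

# (2) Element-wise math on the whole array, no loop needed.
roots = np.sqrt(a)

# (3) Round trip through a .npy file on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.npy")
    np.save(path, a)
    restored = np.load(path)

# (4) Linear algebra, random numbers, and a Fourier transform.
m = np.eye(2) * 2.0
identity = np.linalg.inv(m) @ m
rng = np.random.default_rng(0)
samples = rng.random(4)
spectrum = np.fft.fft(np.ones(4))

print(shifted)
print(np.array_equal(restored, a))   # True
```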

NumPy provides many library functions and operations that make numerical computation easier for programmers. They are widely used for the following tasks:

(1) Machine learning models: writing machine learning algorithms involves many numerical computations on matrices, such as matrix addition and multiplication. NumPy makes these computations simple and fast. NumPy arrays are also used to store training data and the parameters of machine learning models.

(2) Image processing and computer graphics: a computer image is represented as a multidimensional array of numbers, and NumPy provides library functions for processing images quickly, for example mirroring an image or rotating it by a specific angle.

(3) Mathematical tasks: NumPy can be used for numerical integration, differentiation, interpolation, extrapolation, and similar operations. This makes NumPy a quick Python-based alternative to MATLAB.
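As a small illustration of the image case, the sketch below treats a tiny array as an image. The pixel values are invented, and np.rot90 only handles multiples of 90 degrees; rotation by an arbitrary angle would need an additional library.

```python
import numpy as np

# A tiny 2x3 "image" of pixel intensities (made-up data).
image = np.array([[1, 2, 3],
                  [4, 5, 6]])

mirrored = np.fliplr(image)   # mirror left to right
rotated = np.rot90(image)     # rotate 90 degrees counter-clockwise

print(mirrored)
print(rotated)
```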

Installation:

The easiest way to install numpy is with the pip tool. The command is shown below.

The --user option installs the package for the current user only, instead of writing to the system directory.

python -m pip install --user numpy
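One simple way to check that the installation worked is to import the package and print its version. The version string you see depends on what pip installed.

```python
import numpy as np

# If the import succeeds, numpy is installed for this interpreter.
print(np.__version__)
```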

2, numpy array operations

[ndarray overview]

> ndarray is an N-dimensional array object that stores a multidimensional array of elements of the same type.

> Every element of an ndarray occupies a storage area of the same size in memory.

> Every element of an ndarray is described by a data type object (called dtype).

> As with other container objects in Python, the contents of an array can be accessed by indexing or slicing.

> The contents of an ndarray can be accessed and modified through its methods and attributes.

Creating an ndarray:

The easiest way is to use the array function. It accepts any sequence-like object and produces a new numpy array containing the data passed in. Nested sequences are converted to a multidimensional array.

numpy.array(object,dtype=None,copy=True,order=None,subok=False,ndmin=0)

name	description
object	An array or nested sequence
dtype	Data type of the array elements, optional
copy	Whether the object should be copied, optional
order	Memory layout of the created array: C for row-major, F for column-major, A for any
subok	By default, returns an array of the base class type
ndmin	Minimum number of dimensions of the resulting array

>> import numpy as np
>> a=[1,2,3,4]    # create a simple list
>> b=np.array(a)  # convert the list to an array
>> b
array([1, 2, 3, 4])

>> c=np.array([[1,2],[3,4]])
>> print(c)
[[1 2]
 [3 4]]
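The dtype, ndmin, and copy parameters from the table can be tried out with a short sketch like this one. The variable names are chosen just for the example.

```python
import numpy as np

data = [1, 2, 3]

as_float = np.array(data, dtype=np.float64)  # force the element type
two_d = np.array(data, ndmin=2)              # at least two dimensions

source = np.array(data)
copied = np.array(source, copy=True)         # independent copy (the default)
copied[0] = 99                               # does not affect source

print(as_float.dtype)   # float64
print(two_d.shape)      # (1, 3)
print(source[0])        # 1
```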

Besides np.array, several other functions can create new arrays:
> ones and zeros create arrays of all 1s or all 0s with the given length or shape.

> empty creates an array without initializing its values to anything in particular.

>> import numpy as np
>> np.zeros(3)  # one-dimensional array of all 0s
array([0., 0., 0.])
>> np.ones(3)   # one-dimensional array of all 1s
array([1., 1., 1.])

>> np.zeros((3,3)) # two-dimensional array of all 0s, 3 rows by 3 columns
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
>> np.ones((3,3))  # two-dimensional array of all 1s, 3 rows by 3 columns
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

>> np.identity(3)   # identity matrix, 3 rows by 3 columns
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
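The empty function is not shown in the transcript above, so here is a minimal sketch of it. np.zeros_like and the dtype argument are additions that go slightly beyond the original notes. The contents of an array returned by np.empty are arbitrary, so the sketch fills it before reading from it.

```python
import numpy as np

buffer = np.empty((2, 3))      # uninitialized: the contents are arbitrary
buffer[:] = 7.0                # fill it before reading from it

template = np.array([[1, 2], [3, 4]])
zeros = np.zeros_like(template)     # same shape and dtype as template
ones = np.ones(3, dtype=np.int64)   # the element type can be set explicitly

print(buffer)
print(zeros)
print(ones)
```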

[Creating random arrays]

>> np.random.rand(5,5) # create an array of the given shape with values in [0, 1)
array([[0.18958258, 0.99081753, 0.94536359, 0.50506502, 0.30719297],
       [0.87369297, 0.45526996, 0.36816989, 0.95841558, 0.2649228 ],
       [0.49620817, 0.53016646, 0.36794172, 0.21930886, 0.20047452],
       [0.2816989 , 0.11543322, 0.52197946, 0.75586478, 0.21387594],
       [0.33406605, 0.09586188, 0.51795042, 0.73277065, 0.32744227]])
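>> np.random.uniform(0,100)  # draw one number from the given range
26.259620386892358
>> np.random.randint(0,100)  # draw one integer from the given range
62
>> np.random.normal(1.75,0.1,(2,3))  # normal distribution with the given mean, standard deviation, and shape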
array([[1.5979596 , 1.94714557, 1.699023  ],
       [1.83694804, 1.69616237, 1.76031946]])

>> np.random.randint(0,50,5)  # random array of 5 integers in [0, 50)
array([12,  5,  7,  4, 22])
>> np.random.standard_normal(5)    # draw 5 random samples from the standard normal distribution
array([-0.95519789, -1.06259055, -0.34076133,  0.65119027, -0.31220016])
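Because these functions return different numbers on every run, it can help to fix a seed while experimenting. The sketch below shows the legacy np.random.seed call and, as an alternative not used in the notes above, the Generator API from np.random.default_rng (available in NumPy 1.17 and later). The seed value 42 is arbitrary.

```python
import numpy as np

# Legacy global seed: the same seed reproduces the same numbers.
np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)
second = np.random.rand(3)
print(np.array_equal(first, second))   # True

# Generator API: a separate, explicitly seeded random source.
rng = np.random.default_rng(42)
ints = rng.integers(0, 50, size=5)            # 5 integers in [0, 50)
heights = rng.normal(1.75, 0.1, size=(2, 3))  # mean, standard deviation, shape
print(ints.shape, heights.shape)              # (5,) (2, 3)
```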

ndarray attributes:

usage	Explanation
b.size	Number of elements in the array
b.shape	Shape of the array
b.ndim	Number of dimensions of the array
b.dtype	Type of the array elements
b.itemsize	Size of each array element in bytes

# array attributes
import numpy as np
x = np.array([(1,2,3),(4,5,6)])
print(x)
print(x.size)
print(x.ndim)
print(x.shape)
print(x.itemsize)
print(x.dtype,'\n')

y = x.reshape(3,2)
print(y)
print(y.shape)

Output:

[[1 2 3]
 [4 5 6]]
6
2
(2, 3)
4
int32 

[[1 2]
 [3 4]
 [5 6]]
(3, 2)
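The int32 and itemsize of 4 shown above come from the default integer type on the Windows setup used here; other platforms and NumPy versions may report int64 and 8. A sketch that pins the dtype explicitly, so the numbers are the same everywhere, could look like this (nbytes and reshape with -1 are additions to the original example):

```python
import numpy as np

# Pin the element type so the result does not depend on the platform default.
x = np.array([(1, 2, 3), (4, 5, 6)], dtype=np.int64)

print(x.dtype, x.itemsize)               # int64 8
print(x.nbytes == x.size * x.itemsize)   # True

# -1 asks reshape to infer that dimension from the total size.
y = x.reshape(3, -1)
print(y.shape)                           # (3, 2)
```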

Operations between arrays and scalars:
> Arrays are important because they let us perform bulk operations on data without writing loops. This is usually called vectorization. Any arithmetic operation between arrays of equal size is applied element by element. Likewise, an arithmetic operation between an array and a scalar propagates the scalar to every element.

>> import numpy as np
>> arr = np.array([[1.,2.,3.],[4.,5.,6.]])
>> arr
array([[1., 2., 3.],
       [4., 5., 6.]])
>> 1/arr
array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])
>> arr - arr
array([[0., 0., 0.],
       [0., 0., 0.]])
>> arr*arr
array([[ 1.,  4.,  9.],
       [16., 25., 36.]])
>> arr**0.5
array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])
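To see what vectorization saves, the sketch below squares the same array twice: once with an explicit double loop and once with a single array expression. The names looped and vectorized are just for this example.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# Explicit loop: square every element one at a time.
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] * arr[i, j]

# Vectorized: the same result in one expression.
vectorized = arr * arr

print(np.array_equal(looped, vectorized))   # True
print(arr + 10)                             # the scalar is applied to every element
```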

[Basic indexing and slicing]

> There are many ways to select a subset of the data or individual elements.

> One-dimensional arrays are simple: on the surface, they behave much like Python lists.

> The most important difference from a list is that an array slice is a view of the original array. The data is not copied, and any change made to the view is reflected directly in the original array.

> When a scalar value is assigned to a slice, the value is automatically propagated to the entire selection.

>> import numpy as np
>> arr = np.arange(10)
>> print(arr)
[0 1 2 3 4 5 6 7 8 9]

>> arr[5]    # value at index 5
5
>> arr[5:8]  # values in the index range [5, 8)
array([5, 6, 7])
>> arr[5:8] = 12   # set the elements at indices 5 to 7 to 12
>> arr
array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

>> arr_slice = arr[5:8]  # a view of the elements at indices 5 to 7 of arr
>> arr_slice[1] =12345   # set the element at index 1 of the view to 12345
>> arr
array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,
           9])

>> arr_slice[:] = 64
>> arr
array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])
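If an independent copy is wanted instead of a view, the slice can be copied explicitly with the copy method. The sketch below contrasts a view, a copy, and an ordinary list slice; the variable names are chosen for the example.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]            # shares memory with arr
copied = arr[5:8].copy()   # independent data

copied[:] = -1             # does not touch arr
print(arr[5:8])            # [5 6 7]

view[:] = 64               # writes through to arr
print(arr[5:8])            # [64 64 64]

# A list slice, by contrast, is always a copy.
lst = list(range(10))
part = lst[5:8]
part[0] = 99
print(lst[5])              # 5
```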

> In a two-dimensional array, the element at each index is no longer a scalar but a one-dimensional array.

> You can access individual elements by indexing recursively, but that is a bit cumbersome.

> Another way is to pass a comma-separated list of indices to select a single element.

> In a multidimensional array, if the later indices are omitted, the returned object is an ndarray with fewer dimensions.

>> import numpy as np
>> arr3d = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])  # a three-dimensional array made of two two-dimensional arrays
>> arr3d
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

>> arr3d[0]   # element at index 0 of the 3-D array, i.e. the first 2-D array
array([[1, 2, 3],
       [4, 5, 6]])
>> arr3d[0][1]  # index into the first 2-D array to get its second 1-D array
array([4, 5, 6])
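The comma-separated form mentioned in the bullets is not shown in the transcript above, so here is a short sketch of it on the same arr3d array.

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(arr3d[0][1])      # chained indexing: [4 5 6]
print(arr3d[0, 1])      # comma-separated indices, same result
print(arr3d[1, 0, 2])   # a single element: 9
print(arr3d[0].ndim)    # 2, one dimension fewer than arr3d
```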

[Mathematical and statistical methods]

Basic array statistical methods: a set of mathematical functions can compute statistics over the entire array or along one axis.

method	Explanation
sum	Sum of all elements, or of the elements along an axis. The sum of a zero-length array is 0
mean	Arithmetic mean. The mean of a zero-length array is NaN
std, var	Standard deviation and variance, respectively, with adjustable degrees of freedom (default denominator n)
min, max	Minimum and maximum values
argmin, argmax	Indices of the minimum and maximum elements
cumsum	Cumulative sum of all elements
cumprod	Cumulative product of all elements

> Aggregations such as sum, mean, and the standard deviation std can be called either as methods of the array instance or as top-level numpy functions.

>> import numpy as np
>> arr = np.random.randn(5,4)   # normally distributed data, 5 rows by 4 columns
>> print(arr)
>> print(arr.mean(),'\n')    # called as an instance method
>> print(np.mean(arr),'\n')
>> print(arr.sum(),'\n')

Output:

[[ 1.00337894 -0.58464469  0.52354766 -0.9112032 ]
 [ 0.39131326  0.26497808  0.50512501  0.23437021]
 [-0.86304844  0.47144653  0.7895093  -0.37087672]
 [ 1.2245063  -0.10109734 -1.47588426 -1.03102073]
 [ 0.02636393 -0.96448101 -0.34462713 -0.24302158]]
-0.07276829372562879 

-0.07276829372562879 

-1.455365874512576

> Functions such as mean and sum accept an axis parameter, which computes the statistic along that axis.

>> print(arr.mean(axis=0),'\n')  # mean of each column
[ 0.3565028  -0.18275968 -0.00046589 -0.4643504 ]  

>> print(arr.mean(axis=1),'\n')  # mean of each row
[ 0.00776968  0.34894664  0.00675767 -0.34587401 -0.38144145]

Here is the complete example:

import numpy as np
arr = np.random.randn(5,4)   # normally distributed data, 5 rows by 4 columns
print(arr)
print(arr.mean(),'\n')    # called as an instance method
print(np.mean(arr),'\n')
print(arr.sum(),'\n')

print(arr.mean(axis=0),'\n')  # mean of each column
print(arr.mean(axis=1),'\n')  # mean of each row
print(arr.sum(0),'\n')
print(arr.sum(1),'\n')

b = np.array(arr[0])
print(arr[0])
print(b.mean())   # verify the mean of the first row

Output and verification:

[[ 1.00337894 -0.58464469  0.52354766 -0.9112032 ]
 [ 0.39131326  0.26497808  0.50512501  0.23437021]
 [-0.86304844  0.47144653  0.7895093  -0.37087672]
 [ 1.2245063  -0.10109734 -1.47588426 -1.03102073]
 [ 0.02636393 -0.96448101 -0.34462713 -0.24302158]]
-0.07276829372562879 

-0.07276829372562879 

-1.455365874512576 

[ 0.3565028  -0.18275968 -0.00046589 -0.4643504 ] 

[ 0.00776968  0.34894664  0.00675767 -0.34587401 -0.38144145] 

[ 1.78251399 -0.91379842 -0.00232943 -2.32175202] 

[ 0.03107872  1.39578656  0.02703067 -1.38349603 -1.52576578] 

[ 1.00337894 -0.58464469  0.52354766 -0.9112032 ]
0.007769678803297209
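The remaining entries in the methods table (adjustable degrees of freedom for std and var, argmin and argmax, and the zero-length sum) can be tried with a small fixed array. The data values below are made up so that the results are easy to check by hand; ddof is the NumPy parameter that adjusts the denominator.

```python
import numpy as np

data = np.array([[2.0, 4.0, 4.0], [4.0, 5.0, 9.0]])

print(data.std())            # population standard deviation (divides by n)
print(data.std(ddof=1))      # sample standard deviation (divides by n - 1)
print(data.argmax())         # 5, an index into the flattened array
print(data.argmin(axis=1))   # [0 0], index of the minimum in each row
print(np.array([]).sum())    # 0.0 for a zero-length array
```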

> cumsum: returns the running sum of the elements along the given axis. With axis=0 the sum accumulates down the rows; with axis=1 it accumulates across the columns.

> cumprod: returns the running product of the elements along the given axis. With axis=0 the product accumulates down the rows; with axis=1 it accumulates across the columns.

import numpy as np
arr = np.array([[0,1,2],[3,4,5],[6,7,8]])
print(arr)
print(arr.cumsum(0))   # cumulative sum down the rows

print(arr.cumprod(1))  # cumulative product across the columns

Output:

[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[ 0  1  2]
 [ 3  5  7]
 [ 9 12 15]]
[[  0   0   0]
 [  3  12  60]
 [  6  42 336]]

One final note: accumulating by rows means each row holds the total of itself and all the rows before it; accumulating by columns means each column holds the total of itself and all the columns before it.
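That description can be checked directly: the last row of the row-wise running sum should equal the plain sum of all rows, and each column of the column-wise running product should equal the product of the columns up to it. A minimal sketch, using the same arr as above:

```python
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

down = arr.cumsum(axis=0)
# Row 2 of the running sum equals the sum of rows 0 to 2 of the input.
print(np.array_equal(down[2], arr[:3].sum(axis=0)))            # True

across = arr.cumprod(axis=1)
# Column 1 of the running product equals the product of columns 0 and 1.
print(np.array_equal(across[:, 1], arr[:, :2].prod(axis=1)))   # True

# Without an axis argument the array is flattened first.
print(arr.cumsum())
```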

Only a beginner really understands a beginner: practice more, work through the code as you read, and keep going.

Dream small hacker.
