I. numpy module

1.1 About numpy

numpy is an open-source numerical computing extension library for Python. It is used to store and process large multidimensional arrays.

The numpy library serves two main purposes:

1. Unlike the built-in list, it provides array manipulation and array arithmetic, as well as statistical distributions and simple mathematical models.

2. It is fast, faster than Python's built-in arithmetic, which is why pandas and sklearn both depend on it. The array operations in advanced frameworks such as TensorFlow and PyTorch are also very similar to numpy's.
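A minimal sketch of the difference from a plain list: the same `*` and `+` operators that repeat or concatenate a list work element by element on an ndarray, and simple statistics are built in. The variable names are illustrative only.

```python
import numpy as np

nums_list = [1, 2, 3]
nums_arr = np.array([1, 2, 3])

# On a list, * repeats the list and + concatenates lists
list_times_two = nums_list * 2           # [1, 2, 3, 1, 2, 3]
list_plus_list = nums_list + nums_list   # [1, 2, 3, 1, 2, 3]

# On an ndarray, the same operators act on each element
arr_times_two = nums_arr * 2             # [2 4 6]
arr_plus_arr = nums_arr + nums_arr       # [2 4 6]

print(list_times_two, arr_times_two)
print(nums_arr.mean(), nums_arr.sum())   # 2.0 6
```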

1.2 Creating numpy arrays

A numpy array is an ndarray object. To create one from a list, pass the list to np.array().

import numpy as np

# Create a one-dimensional ndarray object
arr = np.array([1, 2, 3])
print(arr, type(arr))   # [1 2 3] <class 'numpy.ndarray'>

# Create a two-dimensional ndarray object
print(np.array([[1, 2, 3], [4, 5, 6]]))

--------------------------------------------------------------------------------
[[1 2 3]
 [4 5 6]]
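An ndarray holds a single element type, which np.array() infers from the list unless the optional dtype argument is given. A small sketch of that inference (the variable names are illustrative):

```python
import numpy as np

# Mixing ints and floats upcasts the whole array to float
mixed = np.array([1, 2.5, 3])
print(mixed, mixed.dtype)   # [1.  2.5 3. ] float64

# dtype can also be given explicitly
as_float = np.array([1, 2, 3], dtype=float)
print(as_float)             # [1. 2. 3.]

# A nested list of equal-length rows becomes a 2-D array
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)           # (2, 3)
```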

1.3 Common attributes of numpy arrays

Attribute	Description
T	Transpose of the array (for arrays with two or more dimensions)
dtype	Data type of the array elements
size	Number of array elements
ndim	Number of dimensions of the array
shape	Size of each dimension of the array (as a tuple)
astype()	Type conversion (a method that returns a converted copy, not an attribute)

arr = np.array([[1,2,3],[4,5,6]])
print(arr.T)  # swap rows and columns
--------------------------------------------------------------------------------
[[1 4]
 [2 5]
 [3 6]]
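The remaining entries in the table can be checked on the same 2x3 array. A short sketch; note that the default integer dtype is platform dependent, so only its kind is certain here:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.dtype)    # an integer dtype, e.g. int64 (int32 on Windows with numpy < 2)
print(arr.size)     # 6
print(arr.ndim)     # 2
print(arr.shape)    # (2, 3)
print(arr.T.shape)  # (3, 2)

# astype() returns a converted copy; arr itself keeps its integer dtype
arr_float = arr.astype(float)
print(arr_float.dtype)  # float64
```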

1.4 Slicing

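arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr[:])  # all elements of the array
print(arr[:, :])  # all elements of the array
print(arr[0, :])  # row 0 as a one-dimensional array: [1 2 3]
print(arr[0:1, :])  # rows 0 up to (not including) 1, still two-dimensional: [[1 2 3]]
print(arr[0:1, 0:1])  # row 0 and column 0 as a 2-D array (start included, end excluded): [[1]]
print(arr[0, 0], type(arr[0, 0]))  # the single element at row 0, column 0, and its type (a numpy integer scalar)
print(arr[0, [0, 2]])  # elements 0 and 2 of row 0: [1 3]
print(arr[0, 0] + 1)  # the element at row 0, column 0 plus 1: 2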
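Integer indexing and slicing differ in the shape they return: an integer index drops that dimension, while a slice keeps it even when it selects a single row. A sketch that checks the shapes directly (negative indices, which count from the end, are included for comparison):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

row_by_index = arr[0, :]     # integer index: 1-D result
row_by_slice = arr[0:1, :]   # slice: still 2-D, one row
corner = arr[0:1, 0:1]       # 2-D, one row and one column
last_col = arr[:, -1]        # -1 means the last column

print(row_by_index.shape)    # (3,)
print(row_by_slice.shape)    # (1, 3)
print(corner.shape)          # (1, 1)
print(last_col)              # [3 6]
```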

1.5 Modifying values

arr = np.array([[1, 2, 3], [4, 5, 6]])
arr[0, :] = 0    # set every element in row 0 to 0
print(arr)
--------------------------------------------
[[0 0 0]
 [4 5 6]]


arr[1, 1] = 1  # set the element at row 1, column 1 to 1
print(arr)
--------------------------------------------------------------------------------
[[0 0 0]
 [4 1 6]]

arr[arr < 3] = 3  # boolean indexing: replace every value less than 3 with 3
print(arr)
--------------------------------------------------------------------------------
[[3 3 3]
 [4 3 6]]
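Behind `arr[arr < 3] = 3`, the comparison produces a boolean mask of the same shape, and the assignment writes only where the mask is True. A sketch starting from the array as it was just before that step; np.where is shown as an alternative that builds a new array instead of modifying the original:

```python
import numpy as np

arr = np.array([[0, 0, 0], [4, 1, 6]])

mask = arr < 3            # boolean array with the same shape as arr
print(mask)
# [[ True  True  True]
#  [False  True False]]

print(arr[mask])          # only the selected elements, flattened: [0 0 0 1]

# np.where(condition, value_if_true, value_if_false) returns a new array
clipped = np.where(arr < 3, 3, arr)
print(clipped)
# [[3 3 3]
#  [4 3 6]]
```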

1.6 Merging arrays

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])
print(arr1)
print(arr2)
-------------------------------------------------------
[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]

print(np.hstack((arr1, arr2)))  # horizontal stacking: join side by side (along axis 1)
------------------------------------------------------------
[[ 1  2  3  7  8  9]
 [ 4  5  6 10 11 12]]

print(np.vstack((arr1, arr2)))  # vertical stacking: join one below the other (along axis 0)
------------------------------------------------
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

print(np.concatenate((arr1, arr2)))  # default axis=0: same as vstack
print(np.concatenate((arr1, arr2), axis=1))  # axis=1: same as hstack
-----------------------------------------------------------
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 1  2  3  7  8  9]
 [ 4  5  6 10 11 12]]
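A quick way to confirm which axis each function joins along is to compare shapes and results. A sketch using the same two arrays; the arrays must match in every dimension except the one being joined:

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])

stacked_rows = np.concatenate((arr1, arr2), axis=0)  # like np.vstack
stacked_cols = np.concatenate((arr1, arr2), axis=1)  # like np.hstack

print(stacked_rows.shape)  # (4, 3)
print(stacked_cols.shape)  # (2, 6)
print(np.array_equal(stacked_rows, np.vstack((arr1, arr2))))  # True
print(np.array_equal(stacked_cols, np.hstack((arr1, arr2))))  # True
```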

1.7 Creating numpy arrays with functions

Method	Description
array()	Convert a list to an array, optionally with an explicitly specified dtype
arange()	numpy's version of range(); also supports floating-point steps
linspace()	Similar to arange(), but the third argument is the number of elements (evenly spaced, endpoint included by default)
zeros()	Create an array filled with 0 with the given shape and dtype, e.g. np.zeros((5, 5))
ones()	Create an array filled with 1 with the given shape and dtype, e.g. np.ones((5, 5))
eye()	Create an identity matrix (1 on the diagonal, 0 elsewhere)
empty()	Create an array without initializing its elements (the contents are arbitrary, not true random numbers)
reshape()	Change the shape of an array without changing its data
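A sketch exercising each function in the table on small inputs (the argument values are illustrative):

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=float)   # [1. 2. 3.]
a = np.arange(0, 1, 0.25)                # stop is excluded: [0.   0.25 0.5  0.75]
b = np.linspace(0, 1, 5)                 # 5 evenly spaced points, stop included
z = np.zeros((2, 3))                     # 2x3 array of 0.0 (default dtype float64)
o = np.ones((2, 3), dtype=int)           # 2x3 array of integer 1
ident = np.eye(3)                        # 3x3 identity matrix
e = np.empty((2, 2))                     # shape (2, 2); contents are uninitialized
r = np.arange(6).reshape(2, 3)           # [[0 1 2]
                                         #  [3 4 5]]

for name, value in [("arange", a), ("linspace", b), ("eye", ident), ("reshape", r)]:
    print(name, value, sep="\n")
```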

1.8 numpy array operations

Operator	Description
+	Add corresponding elements of two numpy arrays
-	Subtract corresponding elements of two numpy arrays
*	Multiply corresponding elements of two numpy arrays (element-wise, not matrix multiplication)
/	Divide corresponding elements of two numpy arrays (true division: the result is float even for integer arrays; use // for floor division)
%	Remainder after dividing corresponding elements of two numpy arrays
**n	Raise each element to the n-th power, e.g. **2 squares every element
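A sketch of each operator on two small integer arrays, including the `//` floor-division operator for comparison with `/`, and a scalar operand, which is applied to every element:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([2, 2, 2, 2])

added = a + b        # [3 4 5 6]
subtracted = a - b   # [-1  0  1  2]
multiplied = a * b   # [2 4 6 8]
divided = a / b      # [0.5 1.  1.5 2. ]  true division, float result
floored = a // b     # [0 1 1 2]          floor division keeps integers
remainder = a % b    # [1 0 1 0]
squared = a ** 2     # [ 1  4  9 16]
scaled = a * 10      # [10 20 30 40]

for result in (added, subtracted, multiplied, divided, floored, remainder, squared, scaled):
    print(result)
```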

1.9 Further notes

Random numbers in numpy

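print(np.random.rand(3, 4))  # a 3x4 array of random floats in [0, 1)

print(np.random.randint(1, 10, (3, 4)))  # a 3x4 array of random integers from 1 (inclusive) to 10 (exclusive)

print(np.random.choice([1, 2, 3, 4, 5], 3))  # an array of 3 elements drawn at random from [1, 2, 3, 4, 5] (repeats allowed)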
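The ranges in the comments can be checked directly. A sketch; `replace=False` is a real parameter of np.random.choice that forbids repeated picks:

```python
import numpy as np

floats = np.random.rand(3, 4)                  # floats in [0, 1)
ints = np.random.randint(1, 10, (3, 4))        # integers 1..9; the high value 10 is excluded
picks = np.random.choice([1, 2, 3, 4, 5], 3)   # 3 picks; repeats allowed by default
unique_picks = np.random.choice([1, 2, 3, 4, 5], 3, replace=False)  # no repeats

print(floats.shape, floats.min() >= 0, floats.max() < 1)
print(ints.min() >= 1, ints.max() <= 9)
print(picks, unique_picks)
```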

Key point

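Random number seed: every random number is generated from a random number seed.

If the seed comes from the current time, it changes as time passes, so each run produces different numbers; if the seed is fixed, the numbers stay the same on every run.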

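import time
import numpy as np

np.random.seed(int(time.time()))  # seed from the current time: different numbers on each run
np.random.seed(1)   # fixed seed: the same numbers on every run
arr1 = np.random.rand(3, 4)
print(arr1)
rs = np.random.RandomState(1)  # an independent random state seeded with 1
print(rs.rand(3, 4))  # same seed, so this prints the same array as arr1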
---------------------------------------------------------
[[4.17022005e-01 7.20324493e-01 1.14374817e-04 3.02332573e-01]
 [1.46755891e-01 9.23385948e-02 1.86260211e-01 3.45560727e-01]
 [3.96767474e-01 5.38816734e-01 4.19194514e-01 6.85219500e-01]]
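A sketch that verifies the reproducibility claim instead of comparing printed arrays by eye. The last part uses np.random.default_rng, the Generator-based API that newer numpy code often prefers (available since numpy 1.17); it uses a different algorithm, so its numbers differ from the RandomState output above, but a fixed seed still makes it reproducible:

```python
import numpy as np

# Reseeding the global generator with the same value repeats the sequence
np.random.seed(1)
first = np.random.rand(3, 4)
np.random.seed(1)
second = np.random.rand(3, 4)
same_global = np.array_equal(first, second)

# A RandomState seeded with 1 reproduces the global seed(1) sequence
rs = np.random.RandomState(1)
same_state = np.array_equal(rs.rand(3, 4), first)

# Generator API: two generators with the same seed produce the same numbers
rng_a = np.random.default_rng(1)
rng_b = np.random.default_rng(1)
same_generator = np.array_equal(rng_a.random((3, 4)), rng_b.random((3, 4)))

print(same_global, same_state, same_generator)  # True True True
```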

II. pandas module

1. Importing

import pandas as pd

2. Purpose

pandas is used for processing data files, most often Excel files. It adds a layer of encapsulation on top of numpy and Excel readers such as xlrd.

3. pandas data types

3.1 Series()

One-dimensional; in practice it is constructed directly less often than DataFrame.

s = pd.Series(np.array([1, 2, 3, 4]))
print(s)
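A Series is a one-dimensional array with a label attached to each element. A sketch with an explicit index (the labels 'a' to 'd' are illustrative): elements can be selected by label or by position, and arithmetic is element-wise as in numpy.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

print(s['b'])       # 20: select by label
print(s.iloc[0])    # 10: select by position
doubled = s * 2     # element-wise, labels are kept
print(doubled)
print(s.values)     # the underlying numpy array: [10 20 30 40]
```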

3.2 DataFrame() (two-dimensional)

3.2.1 Date ranges

dates = pd.date_range('20190101', periods=6, freq='M')
print(dates)    # periods=6, freq='M': six consecutive month-end dates starting 2019-01-31 (pandas >= 2.2 prefers the alias 'ME')

Parameter	Description
start	Start date/time
end	End date/time
periods	Number of timestamps to generate
freq	Frequency; the default is 'D' (day). Other options include H (hour), W (week), B (business day), SM (semi-month), M (month), T or min (minute), S (second), A (year), ...
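A sketch of two common parameter combinations: start plus end (both ends included, default daily frequency) and start plus periods with an explicit freq. Daily frequencies are used here so the results are unambiguous:

```python
import pandas as pd

# start + end: every day in the range, both ends included (default freq='D')
days = pd.date_range(start='2019-01-01', end='2019-01-05')
print(len(days))            # 5

# start + periods + freq: 3 timestamps, two days apart
every_other = pd.date_range(start='2019-01-01', periods=3, freq='2D')
print(every_other[-1])      # 2019-01-05 00:00:00
```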

3.2.2 Attributes and common methods

Attribute/method	Description
dtypes	Data type of each column
index	Row labels (the index)
columns	Column labels
values	The data as a numpy array, without the row index or column header
describe()	Summary statistics for each numeric column: count, mean, std, min, quartiles (the 50% row is the median), max
transpose()	Transpose; the T attribute does the same
sort_index()	Sort by row or column labels
sort_values()	Sort by data values
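A sketch that applies each entry in the table to a small DataFrame. To keep the output predictable, it uses daily dates and the values 0 to 23 from np.arange instead of random numbers; the column labels match the ones used in the next section.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20190101', periods=6, freq='D')
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=['c4', 'c2', 'c3', 'c1'])

print(df.dtypes)                     # one dtype per column
print(df.index[0])                   # 2019-01-01 00:00:00
print(list(df.columns))              # ['c4', 'c2', 'c3', 'c1']
print(df.values.shape)               # (6, 4): data only, no index or header
stats = df.describe()
print(stats)                         # count, mean, std, min, 25%, 50%, 75%, max
print(df.T.shape)                    # (4, 6)
by_label = df.sort_index(axis=1)     # columns reordered c1, c2, c3, c4
by_value = df.sort_values(by='c2', ascending=False)
print(by_value.iloc[0, 1])           # 21: the largest value in column c2
```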

3.2.3 Selecting and setting values

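# Build a DataFrame from dates, random values, and column labels
dates = pd.date_range('20190101', periods=6, freq='M')
print(dates)

values = np.random.rand(6, 4) * 10
print(values)

columns = ['c4', 'c2', 'c3', 'c1']
df = pd.DataFrame(values, index=dates, columns=columns)

# The key ones to master
print(df.values[1, 1])   # read the element at row 1, column 1
df.iloc[1, 1] = 1        # replace the element at row 1, column 1 with 1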
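iloc addresses cells by position, while loc addresses the same cells by row and column label. A sketch on a zero-filled frame with the same column order (daily dates are used here instead of month ends); since 'c2' is the column at position 1, both forms reach the same cell:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20190101', periods=6, freq='D')
df = pd.DataFrame(np.zeros((6, 4)), index=dates, columns=['c4', 'c2', 'c3', 'c1'])

df.iloc[1, 1] = 1                       # position: row 1, column 1 ('c2')
print(df.loc[df.index[1], 'c2'])        # 1.0: the same cell by label

df.loc[df.index[2], 'c1'] = 5           # label-based assignment
print(df.iloc[2, 3])                    # 5.0: 'c1' is at position 3

# Label slices include both ends, unlike positional slices
subset = df.loc[df.index[1]:df.index[2], ['c2', 'c1']]
print(subset.shape)                     # (2, 2)
```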

3.2.4 Table operations

1. Missing values

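df = df.dropna(axis=0)    # drop every row that contains a missing value (axis=0: rows, axis=1: columns)
df
df = df.dropna(thresh=4)  # keep only rows with at least 4 non-missing values; thresh=5 would drop every row, since there are only 4 columns
df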
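The DataFrame built earlier has no missing values, so dropna() leaves it unchanged. A sketch on a small hypothetical frame with NaN values shows what axis and thresh actually remove:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, np.nan, 3], 'b': [4, 5, np.nan], 'c': [7, 8, 9]})

rows_kept = df.dropna()           # axis=0: drop any row with a NaN -> only row 0 remains
cols_kept = df.dropna(axis=1)     # axis=1: drop any column with a NaN -> only 'c' remains
at_least_3 = df.dropna(thresh=3)  # rows with at least 3 non-missing values -> row 0
at_least_2 = df.dropna(thresh=2)  # every row has at least 2 values -> all 3 rows kept

print(rows_kept.shape, cols_kept.shape, at_least_3.shape, at_least_2.shape)
# (1, 3) (3, 1) (1, 3) (3, 3)
```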

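2. Merging data

df1 = pd.DataFrame(np.zeros((2, 3)))  # a 2x3 DataFrame filled with 0
df2 = pd.DataFrame(np.ones((2, 3)))  # a 2x3 DataFrame filled with 1
pd.concat((df1, df2))  # default axis=0: stack vertically (4 rows x 3 columns)
pd.concat((df1, df2), axis=1)  # axis=1: join side by side (2 rows x 6 columns)
pd.concat((df1, df2), ignore_index=True)  # append df2 after df1 with a fresh index; the older df1.append(df2) was removed in pandas 2.0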
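pd.concat keeps the original labels unless told otherwise, which is easy to miss when the frames have default 0, 1, ... indexes. A sketch comparing the index and shape of each variant:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((2, 3)))
df2 = pd.DataFrame(np.ones((2, 3)))

vertical = pd.concat((df1, df2))                       # shape (4, 3); index 0, 1, 0, 1
renumbered = pd.concat((df1, df2), ignore_index=True)  # shape (4, 3); index 0, 1, 2, 3
side_by_side = pd.concat((df1, df2), axis=1)           # shape (2, 6); columns 0, 1, 2, 0, 1, 2

print(list(vertical.index), list(renumbered.index))
print(side_by_side.shape)
```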

Importing data (for example, reading JSON files) is only covered for a basic understanding here.
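A minimal sketch of reading JSON with pd.read_json. The JSON text is hypothetical sample data; it is wrapped in io.StringIO because read_json expects a path, URL, or file-like object (newer pandas versions deprecate passing a literal JSON string). to_json with orient='records' writes the same list-of-objects layout back out.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: a list of records, one object per row
json_text = '[{"name": "a", "score": 1}, {"name": "b", "score": 3}]'

df = pd.read_json(StringIO(json_text))
print(df)
print(df['score'].mean())          # 2.0

records = df.to_json(orient='records')
print(records)                     # [{"name":"a","score":1},{"name":"b","score":3}]
```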
