1. A missing timestamp is represented as NaT rather than NaN; it is detected the same way, with the isna() or notna() methods.
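A minimal sketch of the point above, using made-up sample values (the series names `ts` and `nums` are only for illustration): the same isna()/notna() calls flag both NaT and NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one missing timestamp and one missing number.
ts = pd.Series(pd.to_datetime(["2024-01-01", None, "2024-01-03"]))
nums = pd.Series([1.0, np.nan, 3.0])

# The missing timestamp is stored as NaT, not NaN.
missing_ts = ts.iloc[1]

# isna() / notna() work the same way on both kinds of missing value.
ts_is_missing = ts.isna()
nums_is_missing = nums.isna()
ts_is_present = ts.notna()

print(missing_ts)
print(ts_is_missing.tolist(), nums_is_missing.tolist())
```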
2. Missing values / deduplication
df.dropna()           # drop rows that contain missing values
df.drop_duplicates()  # drop duplicated rows
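As a rough illustration of the two calls, here is a small sketch on an invented DataFrame (the column names and values are assumptions, not from the original notes). Both methods return a new DataFrame and leave `df` unchanged by default.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 2 duplicates row 1, row 3 has a missing score.
df = pd.DataFrame({
    "name": ["a", "b", "b", "c"],
    "score": [1.0, 2.0, 2.0, np.nan],
})

no_missing = df.dropna()            # removes the row with NaN
no_dupes = df.drop_duplicates()     # removes the repeated ("b", 2.0) row
cleaned = df.dropna().drop_duplicates()  # both steps chained

print(cleaned)
```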
3. Filling missing values vertically (down each column)
df.fillna(method='pad')    # fill with the previous value; the limit parameter caps how many consecutive values are filled
df.fillna(method='bfill')  # fill with the next value; limit works the same way
df.fillna(df.mean()[['column name 1', 'column name 2']])  # fill with the column mean; the inner brackets list the columns to fill
df.interpolate()           # linear interpolation
Note: recent pandas releases deprecate the method argument of fillna; df.ffill() and df.bfill() are the equivalent calls.
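The fill strategies above can be sketched as follows. The data and the column names `a` and `b` are invented for illustration, and the sketch uses df.ffill() / df.bfill() in place of fillna(method=...), since the latter is deprecated in recent pandas releases.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in both columns.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [np.nan, 2.0, np.nan, 6.0],
})

forward = df.ffill()                 # copy the previous value downward
forward_limited = df.ffill(limit=1)  # fill at most one consecutive gap
backward = df.bfill()                # copy the next value upward

# Mean fill restricted to selected columns: only "a" is filled here.
mean_filled = df.fillna(df.mean()[["a"]])

print(forward_limited)
print(mean_filled)
```

A leading NaN (as in column `b`) has no previous value, so a forward fill leaves it missing; a backward fill handles that case.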
Interpolation uses a function fitted to the known data points to estimate the values at unknown positions. It is very common in data analysis because it tries to restore the shape of the original data. The interpolate() method uses linear interpolation by default. When filling with the mean or with the previous/next value gives poor results (for example, consecutive missing values in a continuous numerical variable), this method is worth trying.
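A small sketch of the difference, on an invented series with consecutive gaps: linear interpolation follows the trend between the known neighbours, whereas a mean fill puts the same constant into every gap.

```python
import numpy as np
import pandas as pd

# Hypothetical continuous variable with consecutive missing values.
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 8.0])

linear = s.interpolate()          # default method is linear
mean_filled = s.fillna(s.mean())  # every gap gets the same constant

print(linear.tolist())
print(mean_filled.tolist())
```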
Choosing an interpolation method (the methods below require SciPy):
- If the data grows at an increasing rate, `method = 'quadratic'` (quadratic interpolation) may be appropriate.
- If the data approximates a cumulative distribution, `method = 'pchip'` is recommended.
- If the goal of filling the missing values is a smooth plot, `method = 'akima'` is recommended.
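The three methods can be tried side by side as sketched below. The series is invented (squares with two gaps), and because these methods depend on SciPy, the sketch assumes it may be absent and skips a method if pandas raises ImportError.

```python
import numpy as np
import pandas as pd

# Hypothetical fast-growing data: squares of 0..7 with two values missing.
s = pd.Series([0.0, 1.0, 4.0, np.nan, 16.0, np.nan, 36.0, 49.0])

results = {}
for method in ("quadratic", "pchip", "akima"):
    try:
        results[method] = s.interpolate(method=method)
    except ImportError:
        # These methods are backed by SciPy; skip them if it is not installed.
        results[method] = None

for method, filled in results.items():
    if filled is not None:
        print(method, filled.iloc[[3, 5]].round(3).tolist())
```

Comparing the filled values at the two gaps is a quick way to see which method suits a given data set.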
4. Series
Arithmetic methods:
Addition: add()
Subtraction: sub()
Multiplication: mul()
Division: div()
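The four arithmetic methods listed above can be sketched like this, with two invented series `a` and `b`. Each method works element-wise, matching values by index label, and gives the same result as the corresponding operator.

```python
import pandas as pd

# Hypothetical series sharing the same default index.
a = pd.Series([10, 20, 30])
b = pd.Series([1, 2, 5])

added = a.add(b)        # same as a + b
subtracted = a.sub(b)   # same as a - b
multiplied = a.mul(b)   # same as a * b
divided = a.div(b)      # same as a / b (true division, float result)

print(added.tolist(), subtracted.tolist(), multiplied.tolist(), divided.tolist())
```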
5. DataFrame
Create a date range with date_range(), for example pd.date_range('today', periods=6)
1) Create from an array
2) Create from a dictionary
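Both creation routes can be sketched as follows. The column names and values are invented, and a fixed start date is used instead of 'today' so the output is reproducible.

```python
import numpy as np
import pandas as pd

# Six consecutive days; a fixed start date replaces 'today' for reproducibility.
dates = pd.date_range("2024-01-01", periods=6)

# 1) From an array: rows are indexed by the dates, columns are named explicitly.
df_from_array = pd.DataFrame(
    np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD")
)

# 2) From a dictionary: keys become column names, values become the columns.
df_from_dict = pd.DataFrame({"name": ["x", "y", "z"], "value": [1, 2, 3]})

print(df_from_array.head(2))
print(df_from_dict)
```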
6. Other:
1) Selecting columns: selecting several columns requires double brackets [[]], for example df[['column name 1', 'column name 2']]
2) Sorting: sort_values(by='')
3) Modifying values: assigning through df.iat[,] or df.loc['', ''] modifies df directly
4) Case conversion: the .str accessor belongs to a Series (a single column), for example df['column name'].str.lower(); the opposite is upper()
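The four operations above can be sketched on one small table. The DataFrame, its row labels and its column names are all invented for illustration.

```python
import pandas as pd

# Hypothetical table with explicit row labels.
df = pd.DataFrame(
    {"name": ["Bob", "alice", "Carol"], "age": [30, 25, 35], "city": ["NY", "LA", "SF"]},
    index=["r1", "r2", "r3"],
)

# 1) Select several columns with double brackets.
subset = df[["name", "age"]]

# 2) Sort by a column (returns a new DataFrame).
by_age = df.sort_values(by="age")

# 3) Assignments through iat (integer position) and loc (labels) change df itself.
df.iat[0, 1] = 31
df.loc["r2", "city"] = "Boston"

# 4) Case conversion goes through the .str accessor of a single column.
lower = df["name"].str.lower()
upper = df["name"].str.upper()

print(by_age)
print(df)
```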
...