Data processing with pandas

1. Missing values: a missing timestamp is represented by NaT rather than NaN; both are detected with the isna() or notna() methods.
2. Dropping missing values and deduplication
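
As a minimal sketch of the point above, the following builds a small DataFrame (the column names and values are made up for illustration) with one missing number and one missing timestamp, then checks both with isna() and notna().

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one numeric column and one datetime column.
df = pd.DataFrame({
    "value": [1.0, np.nan, 3.0],
    "time": pd.to_datetime(["2019-09-27", None, "2019-09-29"]),
})

# The missing number is stored as NaN, the missing timestamp as NaT.
missing = df.isna()     # True where a value is missing
present = df.notna()    # element-wise opposite of isna()

print(missing)
print(missing.sum())    # number of missing values per column
```

Both methods return a boolean DataFrame of the same shape, so they can be used directly as masks.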

df.dropna()
df.drop_duplicates()
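
A short sketch of the two calls above on made-up data (the column names are assumptions). Both methods return a new DataFrame and leave the original untouched unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 0 and 1 are identical, row 2 has a missing score.
df = pd.DataFrame({
    "name": ["a", "a", "b", "c"],
    "score": [1.0, 1.0, np.nan, 3.0],
})

no_missing = df.dropna()            # drops the row whose score is NaN
no_dupes = df.drop_duplicates()     # keeps the first of the identical rows
cleaned = df.dropna().drop_duplicates()  # both steps chained

print(cleaned)
```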

3. Filling missing values column-wise (interpolation)

df.fillna(method='pad')  # forward fill: use the value above; the limit parameter caps how many consecutive values are filled (newer pandas versions deprecate method= in favor of df.ffill())
df.fillna(method='bfill')  # backward fill: use the value below; limit works the same way (newer pandas versions: df.bfill())
df.fillna(df.mean()[['column name 1', 'column name 2']])  # fill with the column mean; the trailing [[...]] selects which columns to fill
df.interpolate()  # linear interpolation
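
The fill strategies above can be sketched on a small made-up frame. This example uses df.ffill() and df.bfill(), which are the method-style equivalents of fillna(method='pad') and fillna(method='bfill'); the data and column names are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in both columns.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [np.nan, 2.0, np.nan, 6.0],
})

# Forward fill, but fill at most one consecutive missing value per gap.
forward = df.ffill(limit=1)

# Backward fill with no limit.
backward = df.bfill()

# Fill only column "a" with its mean; column "b" keeps its NaNs.
mean_a = df.fillna(df.mean()[["a"]])

print(forward)
print(backward)
print(mean_a)
```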

Interpolation uses a function fitted to the known data to estimate the values at unknown positions. It is very common in data analysis, and its benefit is that it tries to restore the shape of the data itself. The interpolate() method uses linear interpolation by default. When filling with the mean or another simple value works poorly on the original data (for example, a continuous numeric variable with runs of consecutive missing values), this method is worth trying.
Choosing an interpolation method (the methods below require SciPy):

- If the data grows at an increasing rate, `method = 'quadratic'` (quadratic interpolation) may be appropriate.
- If the data set approximates a cumulative distribution, `method = 'pchip'` is recommended.
- If the goal of filling the missing values is a smooth plot, `method = 'akima'` is recommended.
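
To make the difference concrete, here is a small sketch (with made-up values) comparing default linear interpolation against filling with the mean on a series that has consecutive gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a run of two missing values and a single gap.
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 8.0])

linear = s.interpolate()        # default method is 'linear'
by_mean = s.fillna(s.mean())    # every gap gets the same value

print(pd.DataFrame({"original": s, "linear": linear, "mean": by_mean}))
```

The linear result follows the trend between neighbouring points, while the mean fill puts the same constant in every gap.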

4. Series

Addition: add()

Subtraction: sub()

Multiplication: mul()

Division: div()
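
A brief sketch of the four Series arithmetic methods on made-up data. The index labels and the use of fill_value are assumptions chosen to show how labels that exist in only one Series are handled.

```python
import pandas as pd

# Two hypothetical Series whose indexes only partly overlap.
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

added = s1.add(s2)                        # same as s1 + s2; unmatched labels give NaN
added_filled = s1.add(s2, fill_value=0)   # treat a label missing on one side as 0
subtracted = s1.sub(s2, fill_value=0)
multiplied = s1.mul(s2, fill_value=1)
divided = s1.div(s2, fill_value=1)

print(added)
print(added_filled)
```

The method forms behave like the operators +, -, * and /, with fill_value as the extra option for labels present in only one operand.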

5. DataFrame

Create a date range with date_range(), for example pd.date_range('today', periods=6)

1) Create a DataFrame from an array

2) Create a DataFrame from a dictionary
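
Both ways of creating a DataFrame can be sketched as follows. The column names and values are made up, and a fixed start date is used instead of 'today' so the output is reproducible.

```python
import numpy as np
import pandas as pd

# Six consecutive days; the notes use 'today' as the start instead.
dates = pd.date_range("2019-09-27", periods=6)

# 1) From a NumPy array, using the dates as the row index.
df_from_array = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=dates,
    columns=["A", "B", "C", "D"],
)

# 2) From a dictionary: keys become column names, values become columns.
df_from_dict = pd.DataFrame({
    "name": ["x", "y", "z"],
    "count": [3, 1, 2],
})

print(df_from_array)
print(df_from_dict)
```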

6. Other:

1) Selecting columns: selecting multiple columns requires double brackets [[]], e.g. df[['column name 1', 'column name 2']]

2) Sorting: sort_values(by='')

3) Modifying values: assigning through df.iat[,] or df.loc['', ''] modifies df directly

4) Case conversion: df['column name'].str.lower() (the .str accessor works on a string column, not on the whole DataFrame); the opposite is upper()
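
The four operations above can be sketched on a small made-up DataFrame; the column names, row labels and values are assumptions.

```python
import pandas as pd

# Hypothetical data with string row labels.
df = pd.DataFrame(
    {
        "name": ["Bob", "ALICE", "carol"],
        "age": [30, 25, 35],
        "city": ["X", "Y", "Z"],
    },
    index=["r1", "r2", "r3"],
)

# 1) Double brackets select several columns and return a DataFrame.
subset = df[["name", "age"]]

# 2) sort_values returns a sorted copy; df itself keeps its order.
by_age = df.sort_values(by="age")

# 3) Assignment through iat (by position) or loc (by label) changes df itself.
df.iat[0, 1] = 31
df.loc["r2", "city"] = "W"

# 4) Case conversion goes through the .str accessor of a string column.
df["name"] = df["name"].str.lower()

print(df)
```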

 


Origin www.cnblogs.com/hqczsh/p/11599743.html