Common pandas operations for data analysis

1. Merging data with concat

  • API: pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, copy=True)
  • Parameter description
    • objs: a list of the Series or DataFrame objects to merge, e.g. [df1, df2, ...]. (Panel objects and the join_axes parameter come from older pandas releases and are no longer available as of pandas 1.0.)
    • axis: {0, 1}, the axis to concatenate along. axis=0 stacks the objects vertically (adds rows); axis=1 places them side by side (adds columns).
    • join: {'inner', 'outer'}, how the labels on the other axis are combined. 'outer' keeps the union of the labels, 'inner' keeps only their intersection.
  • Usage
    • Horizontal merge (add columns): pd.concat([df1, df2, ...], axis=1)
    • Vertical merge (add rows): pd.concat([df1, df2, ...], axis=0)
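
The difference between the two axes and the two join modes is easiest to see on a small example. The following is a minimal sketch; the frames df1 and df2 and their values are made up for illustration.

```python
import pandas as pd

# Two small illustrative frames (hypothetical data) sharing only column 'b'.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# axis=0 stacks rows. The default join='outer' keeps the union of the
# columns and fills the gaps with NaN; ignore_index=True renumbers the rows.
rows_outer = pd.concat([df1, df2], axis=0, ignore_index=True)

# join='inner' keeps only the columns present in every frame (here: 'b').
rows_inner = pd.concat([df1, df2], axis=0, join="inner", ignore_index=True)

# axis=1 places the frames side by side, aligning them on the row index.
side_by_side = pd.concat([df1, df2], axis=1)

print(rows_outer)
print(rows_inner)
print(side_by_side)
```

Note that axis=1 does not merge columns with the same name: side_by_side ends up with two columns called 'b'.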

2. Slicing

  • API: df.iloc, selects by integer position
    • df.iloc[row start position:row end position, column start position:column end position] (the end position is excluded)
  • API: df.loc, selects by label (index and column names)
    • df.loc[start row label:end row label, start column label:end column label] (the end label is included)
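
A short sketch of both accessors on the same frame follows. The frame, its row labels r0..r3 and its column names are assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative frame with string row labels (hypothetical data).
df = pd.DataFrame(
    {"x": [10, 20, 30, 40], "y": [1, 2, 3, 4], "z": [5, 6, 7, 8]},
    index=["r0", "r1", "r2", "r3"],
)

# iloc works on integer positions; the end position is excluded,
# exactly like an ordinary Python slice.
by_position = df.iloc[0:2, 0:2]         # rows 0 and 1, columns 0 and 1

# loc works on labels; the end label is included.
by_label = df.loc["r0":"r2", "x":"y"]   # rows r0, r1, r2 and columns x, y

print(by_position)
print(by_label)
```

The same slice bounds therefore select a different number of rows depending on the accessor, which is a common source of off-by-one mistakes.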

3. Date-related features

  • Convert a numeric or string date column to datetime: data['data_parsed'] = pd.to_datetime(data['date'], format='%Y%m%d')
  • Convert datetime values back to formatted strings: dt.strftime('%Y-%m-%d')  # %Y is the 4-digit year, %y is the 2-digit year
  • Accessing datetime properties
    • Year: dt.year
    • Month: dt.month
    • Day: dt.day
    • Hour: dt.hour
    • Day-of-week name: data['daynameofweek'] = data['data_parsed'].dt.day_name()  # dt.weekday_name in pandas releases before 1.0
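
Put together, the date operations above might look like the sketch below. The sample dates and the extra column names (date_text, year, month, day, hour) are assumptions for illustration; the example uses dt.day_name(), which replaces dt.weekday_name in current pandas.

```python
import pandas as pd

# Dates stored as compact YYYYMMDD strings (hypothetical sample values).
data = pd.DataFrame({"date": ["20191215", "20200101"]})

# Parse the strings into datetime64 values.
data["data_parsed"] = pd.to_datetime(data["date"], format="%Y%m%d")

# Format the datetimes back into strings with a different layout.
data["date_text"] = data["data_parsed"].dt.strftime("%Y-%m-%d")

# Extract individual components through the .dt accessor.
data["year"] = data["data_parsed"].dt.year
data["month"] = data["data_parsed"].dt.month
data["day"] = data["data_parsed"].dt.day
data["hour"] = data["data_parsed"].dt.hour   # 0 here: the input has no time part
data["daynameofweek"] = data["data_parsed"].dt.day_name()

print(data)
```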

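4. One-hot encoding: get_dummies

  • API: pd.get_dummies(data, prefix, columns)
    • data: the DataFrame (or Series) to encode
    • prefix: the string prepended to the names of the generated columns
    • columns: the list of columns to encode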
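
As a minimal sketch of what the call produces, assume a frame with one categorical column named color (the frame and its values are invented for illustration):

```python
import pandas as pd

# One categorical column and one numeric column (hypothetical data).
data = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})

# Encode only 'color'. Each distinct value becomes its own indicator column
# named '<prefix>_<value>'; the original 'color' column is dropped.
encoded = pd.get_dummies(data, prefix="color", columns=["color"])

# The dtype of the indicator columns depends on the pandas version
# (boolean in recent releases, small integers in older ones).
print(encoded)
```

Columns that are not listed in columns, such as size here, pass through unchanged.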

5. Viewing the distinct values of a column

  • API: data.column.unique(), where column is the name of the column
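
A small sketch of the call follows. The frame is invented for illustration, and nunique() and value_counts() are related pandas methods that the original list does not mention; they are shown only because they often accompany unique().

```python
import pandas as pd

# A column with repeated values (hypothetical data).
data = pd.DataFrame({"column": ["a", "b", "a", "c", "b"]})

# unique() returns the distinct values in order of first appearance.
unique_values = data["column"].unique()

# Related helpers (additions, not part of the original list):
n_unique = data["column"].nunique()      # number of distinct values
counts = data["column"].value_counts()   # occurrences of each value

print(unique_values)
print(n_unique)
print(counts)
```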

6. Checking for missing and infinite values

  • View the number of missing values per column: all_dummy_df.isnull().sum().sort_values(ascending=False).head()
  • View the missing-value rate per column:

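    # Number of missing values per column, largest first
    total = df_train.isnull().sum().sort_values(ascending=False)
    # Fraction of missing values per column (missing count / row count)
    percent = (df_train.isnull().sum()/df_train.isnull().count()).sort_values(ascending=False)
    # Combine both into one table and show the 20 worst columns
    missing_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
    missing_data.head(20)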

 

  • Check for infinite values: np.isinf(data['column']).any()
  • Replace missing and infinite values
    • data.replace(np.inf, 0, inplace=True)  # covers positive infinity only; pass [np.inf, -np.inf] to cover both signs
    • data.replace(np.nan, 0, inplace=True)
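
The checks and replacements in this section can be sketched end to end as follows. The frame is invented for illustration, and replacing with 0 simply mirrors the calls above; whether 0 is a sensible fill value depends on the data.

```python
import numpy as np
import pandas as pd

# One column containing a normal value, +inf, NaN and -inf (hypothetical data).
data = pd.DataFrame({"column": [1.0, np.inf, np.nan, -np.inf]})

# Detect problems first. np.isinf is False for NaN, so the two checks
# are independent of each other.
has_inf = np.isinf(data["column"]).any()
has_nan = data["column"].isnull().any()

# isnull().mean() gives the fraction of missing values per column.
missing_rate = data.isnull().mean()

# Replace both signs of infinity, then fill the remaining NaN values.
# Returning a new frame (instead of inplace=True) keeps the raw data intact.
cleaned = data.replace([np.inf, -np.inf], 0).fillna(0)

print(has_inf, has_nan)
print(missing_rate)
print(cleaned)
```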

7. Displaying all rows and columns without truncation

  • Show every row: pd.set_option('display.max_rows', None)
  • Show every column: pd.set_option('display.max_columns', None)
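
Setting these options changes the display for the rest of the session. The sketch below shows the two calls, how to undo them, and pd.option_context as one possible way to lift the limits only temporarily; the variable names are illustrative.

```python
import pandas as pd

# Lift the display limits for the rest of the session.
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
current_rows = pd.get_option("display.max_rows")   # None means "no limit"

# Restore the library defaults.
pd.reset_option("display.max_rows")
pd.reset_option("display.max_columns")

# Alternative: lift the limits only inside a with-block.
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    inside = pd.get_option("display.max_rows")     # no limit inside the block

after = pd.get_option("display.max_rows")          # limit is back afterwards
print(current_rows, inside, after)
```

The context-manager form is convenient in notebooks, where printing a very large frame by accident can be slow.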

 
Origin www.cnblogs.com/scy645670291/p/12018119.html