1. concat: merging data
- API: pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, copy=True) (the older join_axes parameter was removed in pandas 1.0)
- Parameters:
- objs: a sequence (e.g. a list) of Series or DataFrame objects to combine (Panel was removed in pandas 0.25)
- axis: {0, 1}, the axis to concatenate along: axis=0 stacks rows (vertical), axis=1 places columns side by side (horizontal)
- join: {'inner', 'outer'}, how to handle labels on the other axis: 'outer' (the default) keeps the union, 'inner' keeps only the intersection
- Usage:
- Combine horizontally (add columns side by side): pd.concat([df1, df2, ...], axis=1)
- Combine vertically (append rows): pd.concat([df1, df2, ...], axis=0)
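A minimal sketch of both directions, using two small hypothetical frames (df1, df2) whose columns only partly overlap, so the difference between join='outer' and join='inner' is visible:

```python
import pandas as pd

# Hypothetical data: df2 has a column "c" that df1 lacks, and lacks df1's "b".
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "c": [7, 8]})

# Vertical (axis=0): rows are stacked. join="outer" keeps the union of the
# columns and fills missing cells with NaN; ignore_index=True renumbers rows.
rows_outer = pd.concat([df1, df2], axis=0, join="outer", ignore_index=True)

# join="inner" keeps only the columns present in every input (here just "a").
rows_inner = pd.concat([df1, df2], axis=0, join="inner", ignore_index=True)

# Horizontal (axis=1): columns are placed side by side, aligned on the row index.
cols = pd.concat([df1, df2], axis=1)

print(rows_outer)
print(rows_inner)
print(cols)
```

Without ignore_index=True, the vertical result would keep the original row labels (0, 1, 0, 1).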
2. Slicing
- API: df.iloc, selects by integer position
- df.iloc[row_start:row_end, col_start:col_end] (the end position is excluded, as in Python slicing)
- API: df.loc, selects by label (name)
- df.loc[row_start_label:row_end_label, col_start_label:col_end_label] (both end labels are included)
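A small sketch contrasting the two, on a hypothetical frame with string row labels; note that the iloc slice stops before the end position while the loc slice includes the end label:

```python
import pandas as pd

# Hypothetical frame with labelled rows r0..r3 and columns x, y, z.
df = pd.DataFrame(
    {"x": [10, 20, 30, 40], "y": [1, 2, 3, 4], "z": [5, 6, 7, 8]},
    index=["r0", "r1", "r2", "r3"],
)

# iloc: integer positions, end excluded -> rows 0-1 and columns 0-1.
by_pos = df.iloc[0:2, 0:2]

# loc: labels, end included -> rows "r0" through "r2", columns "x" through "y".
by_label = df.loc["r0":"r2", "x":"y"]

print(by_pos)
print(by_label)
```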
3. Date-related features
- Convert a date column to datetime: data['data_parsed'] = pd.to_datetime(data['date'], format='%Y%m%d')
- Convert a datetime back to a formatted string: data['data_parsed'].dt.strftime('%Y-%m-%d')  # %Y is a 4-digit year, %y a 2-digit year; %m is the month (%M would be minutes)
- Datetime attributes (via the .dt accessor):
- Year: dt.year
- Month: dt.month
- Day: dt.day
- Hour: dt.hour
- Name of the weekday: data['daynameofweek'] = data['data_parsed'].dt.day_name()  (the older dt.weekday_name was removed in pandas 1.0)
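A minimal sketch of the whole round trip, assuming the raw column holds dates as 'YYYYMMDD' strings (the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical raw dates stored as 'YYYYMMDD' strings.
data = pd.DataFrame({"date": ["20240115", "20240316"]})

# Parse the strings into datetime64 values.
data["data_parsed"] = pd.to_datetime(data["date"], format="%Y%m%d")

# Format them back into 'YYYY-MM-DD' strings.
data["formatted"] = data["data_parsed"].dt.strftime("%Y-%m-%d")

# Extract individual components through the .dt accessor.
data["year"] = data["data_parsed"].dt.year
data["month"] = data["data_parsed"].dt.month
data["day"] = data["data_parsed"].dt.day
data["hour"] = data["data_parsed"].dt.hour  # 0 here, since no time was given
data["daynameofweek"] = data["data_parsed"].dt.day_name()

print(data)
```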
4. One-hot encoding: get_dummies
- API: pd.get_dummies(data, prefix=None, columns=None, ...)
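A short sketch on a hypothetical frame: only the column listed in columns is encoded, and each category becomes its own indicator column named prefix_category. dtype=int is passed here because recent pandas versions return boolean indicators by default:

```python
import pandas as pd

# Hypothetical data: one categorical column and one numeric column.
data = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})

# Encode only "color"; "size" is passed through unchanged.
encoded = pd.get_dummies(data, prefix="color", columns=["color"], dtype=int)

print(encoded)
```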
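5. View the distinct (deduplicated) values of a column
- API: data['column'].unique() (or data.column.unique() when the column name is a valid Python identifier)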
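A minimal example on hypothetical data; unique() returns the distinct values in order of first appearance, not sorted:

```python
import pandas as pd

# Hypothetical column with repeated values.
data = pd.DataFrame({"city": ["Paris", "Lyon", "Paris", "Nice", "Lyon"]})

uniques = data["city"].unique()  # array-like, order of first appearance
print(uniques)
print(len(uniques))
```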
6. Check for missing values and infinite values
- View missing values per column: all_dummy_df.isnull().sum().sort_values(ascending=False).head()
- View the missing rate per column:
    total = df_train.isnull().sum().sort_values(ascending=False)
    percent = (df_train.isnull().sum() / df_train.isnull().count()).sort_values(ascending=False)
    missing_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
    missing_data.head(20)
- Check for infinite values: np.isinf(data['column']).any()
- Replace missing and infinite values (here with 0):
- data.replace([np.inf, -np.inf], 0, inplace=True)  # replacing np.inf alone would miss negative infinity
- data.replace(np.nan, 0, inplace=True)
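A self-contained sketch of the steps above on a hypothetical df_train containing both NaN and ±inf. Note that isnull() does not count infinities as missing, which is why they need the separate np.isinf check:

```python
import numpy as np
import pandas as pd

# Hypothetical training data with NaNs in "a" and infinities in "b".
df_train = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [1.0, 2.0, np.inf, -np.inf],
    "c": [1.0, 2.0, 3.0, 4.0],
})

# Count and fraction of missing values per column, largest first.
total = df_train.isnull().sum().sort_values(ascending=False)
percent = (df_train.isnull().sum() / df_train.isnull().count()).sort_values(ascending=False)
missing_data = pd.concat([total, percent], axis=1, keys=["Total", "Percent"])
print(missing_data.head(20))

# Infinite values are not reported by isnull(); check them explicitly.
has_inf = np.isinf(df_train["b"]).any()
print(has_inf)

# Replace +/-inf first, then NaN, with 0 (returns a new frame here).
cleaned = df_train.replace([np.inf, -np.inf], 0).replace(np.nan, 0)
print(cleaned)
```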
7. Display all rows and columns in pandas (no truncation)
- Show all rows without truncation: pd.set_option('display.max_rows', None)
- Show all columns without truncation: pd.set_option('display.max_columns', None)
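A short sketch of setting, checking and restoring these options. The scoped pd.option_context form is an addition not mentioned above, shown for cases where the no-truncation display is only wanted for a single print:

```python
import pandas as pd

# Global setting: affects every later display in this session.
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
rows_setting = pd.get_option("display.max_rows")
cols_setting = pd.get_option("display.max_columns")

# Restore the defaults when the full display is no longer needed.
pd.reset_option("display.max_rows")
pd.reset_option("display.max_columns")

# Scoped alternative: the options apply only inside the with-block.
df = pd.DataFrame({"v": range(100)})
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    print(df)
```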