Pandas usage tips
1 Splitting rows
A common requirement is to split one column into several columns on a specified delimiter. The requirement here is different: split on the specified delimiter into multiple rows.
Example:
df =
     A    B
0    a    f
1  b;c  h;g
2    d    k
3    e    l
It needs to be split into:
df =
   A  B
0  a  f
1  b  h
1  c  g
2  d  k
3  e  l
1.1 Processing column A
The implementation is as follows:
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b;c', 'd', 'e'], 'B': ['f', 'h;j', 'k', 'l']})
df
     A    B
0    a    f
1  b;c  h;j
2    d    k
3    e    l
Split column A on ";" and pass expand=True so that the result is expanded into a DataFrame. Rows with fewer pieces are padded with None:
df_a = df['A'].str.split(';', expand=True)
df_a
   0     1
0  a  None
1  b     c
2  d  None
3  e  None
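To make the effect of the expand argument concrete, here is a minimal sketch comparing the two modes on the same data. The variable names (s, as_lists, as_frame) are illustrative only and are not part of the original walkthrough.

```python
import pandas as pd

s = pd.Series(['a', 'b;c', 'd', 'e'])

# Without expand, each cell holds the list of pieces for that row.
as_lists = s.str.split(';')

# With expand=True, the pieces are spread over columns 0, 1, ...
# and rows with fewer pieces are padded with a missing value.
as_frame = s.str.split(';', expand=True)

print(as_frame.shape)               # (4, 2)
print(as_frame[1].isna().tolist())  # [True, False, True, True]
```

The expanded form is what makes the following stack step possible, because each piece sits in its own cell.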
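Stack df_a. This moves the column labels into an inner index level, and as the output shows, the None padding is dropped: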
df_a = df_a.stack()
df_a
0  0    a
1  0    b
   1    c
2  0    d
3  0    e
dtype: object
Drop the inner index level so that only the original row index remains:
df_a = df_a.reset_index(level=1, drop=True)
df_a
0    a
1    b
1    c
2    d
3    e
dtype: object
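The stack and reset_index steps can be sketched together to show what happens to the index. This is a rough illustration with made-up variable names (wide, stacked, flat). The explicit dropna() is an added assumption, in case the installed pandas version keeps the padding when stacking.

```python
import pandas as pd

wide = pd.Series(['a', 'b;c', 'd', 'e']).str.split(';', expand=True)

# stack() returns a Series with a two-level index: (row label, piece position).
# dropna() is added explicitly in case the padding survives the stack.
stacked = wide.stack().dropna()
print(stacked.index.nlevels)   # 2
print(stacked.index.tolist())  # [(0, 0), (1, 0), (1, 1), (2, 0), (3, 0)]

# Dropping level 1 leaves only the original row label, which now repeats
# for every row that was split into more than one piece.
flat = stacked.reset_index(level=1, drop=True)
print(flat.index.tolist())     # [0, 1, 1, 2, 3]
```

The repeated label 1 is what later lets the pieces be matched back to the original row.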
Give the Series a name. An unnamed Series cannot be joined to a DataFrame directly, and the name becomes the column label after merging:
df_a.rename('A_split', inplace=True)
df_a
0    a
1    b
1    c
2    d
3    e
Name: A_split, dtype: object
1.2 Processing column B
Column B is processed in the same way as column A. After the final rename:
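# Same steps as for column A: split, stack, drop the inner index level
df_b = df['B'].str.split(';', expand=True).stack().reset_index(level=1, drop=True)
df_b.rename('B_split', inplace=True)
df_b
0    f
1    h
1    j
2    k
3    l
Name: B_split, dtype: object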
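Since columns A and B go through identical steps, the steps can be wrapped in a small function. The sketch below is one possible way to do that. The helper name split_to_series and its parameters are hypothetical, and the dropna() call is an added safeguard rather than part of the original steps.

```python
import pandas as pd


def split_to_series(df, column, sep=';', name=None):
    """Hypothetical helper: one split piece per row, indexed by the original row label."""
    out = (
        df[column].str.split(sep, expand=True)
        .stack()
        .dropna()                         # safeguard if the padding survives stack()
        .reset_index(level=1, drop=True)  # keep only the original row label
    )
    # Default to the naming used in this walkthrough, e.g. 'A_split'.
    return out.rename(name or f'{column}_split')


df = pd.DataFrame({'A': ['a', 'b;c', 'd', 'e'],
                   'B': ['f', 'h;j', 'k', 'l']})
df_a = split_to_series(df, 'A')
df_b = split_to_series(df, 'B')
print(df_b.tolist())  # ['f', 'h', 'j', 'k', 'l']
```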
1.3 Merging A_split and B_split
With both columns processed, concatenate the two Series side by side:
concat_a_b = pd.concat([df_a, df_b], axis=1)
concat_a_b
  A_split B_split
0       a       f
1       b       h
1       c       j
2       d       k
3       e       l
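Because the row labels repeat after splitting, the side-by-side concatenation can only line the pieces up reliably when both Series end up with exactly the same index, which means every row must have the same number of pieces in A and in B. A check along the following lines could be run before splitting. The function name rows_with_unequal_pieces is hypothetical, and this check is an addition, not part of the original steps.

```python
import re

import pandas as pd


def rows_with_unequal_pieces(df, left, right, sep=';'):
    """Hypothetical check: row labels where the two columns contain a
    different number of delimiters, and so split into unequal piece counts."""
    pattern = re.escape(sep)  # str.count takes a regex, so escape the delimiter
    n_left = df[left].str.count(pattern)
    n_right = df[right].str.count(pattern)
    return df.index[n_left != n_right].tolist()


df = pd.DataFrame({'A': ['a', 'b;c', 'd', 'e'],
                   'B': ['f', 'h;j', 'k', 'l']})
print(rows_with_unequal_pieces(df, 'A', 'B'))   # []

bad = pd.DataFrame({'A': ['b;c'], 'B': ['h']})
print(rows_with_unequal_pieces(bad, 'A', 'B'))  # [0]
```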
1.4 Merging with the original data
Finally, join the processed data to the original data on the index:
df = df.join(concat_a_b, how='inner')
df
     A    B A_split B_split
0    a    f       a       f
1  b;c  h;j       b       h
1  b;c  h;j       c       j
2    d    k       d       k
3    e    l       e       l
This gives the desired result.
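For comparison, the same result can likely be reached with a different technique, DataFrame.explode, instead of the stack and join steps used above. This is a sketch under the assumption of pandas 1.3 or later, where explode accepts a list of columns. The variable names split and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b;c', 'd', 'e'],
                   'B': ['f', 'h;j', 'k', 'l']})

# Keep the original columns and add list-valued columns next to them.
split = df.assign(
    A_split=df['A'].str.split(';'),
    B_split=df['B'].str.split(';'),
)

# Exploding both list columns together yields one row per piece;
# each row must hold the same number of pieces in both columns.
result = split.explode(['A_split', 'B_split'])

print(result.index.tolist())       # [0, 1, 1, 2, 3]
print(result['B_split'].tolist())  # ['f', 'h', 'j', 'k', 'l']
```

Like the join above, this keeps the repeated original index, so the untouched A and B values are duplicated for each piece.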