Summary of key points about pandas concat.
- Create three DataFrames
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.random((3, 3)), columns=list("ABC"))
df2 = pd.DataFrame(np.random.random((1, 3)), columns=list("ABD"))
df3 = pd.DataFrame(np.random.random((2, 3)), columns=list("ABC"))
The results are as follows (the values are random, so they differ in every output shown below):
df1:
A B C
0 0.169679 0.250973 0.033926
1 0.517431 0.069529 0.117868
2 0.035198 0.693947 0.791094
df2:
A B D
0 0.285279 0.944059 0.433911
df3:
A B C
0 0.585748 0.851175 0.810830
1 0.901091 0.910537 0.475615
- All default parameters
The default is axis=0: the frames are stacked vertically (rows are appended), and the columns of the result are the union of the inputs' columns. By default the index is not renumbered, i.e., each frame keeps its original index.
result = pd.concat([df1, df2])
The results are as follows:
A B C D
0 0.156784 0.940447 0.522436 NaN
1 0.728026 0.669282 0.495065 NaN
2 0.947834 0.150463 0.804081 NaN
0 0.891067 0.020196 NaN 0.242114
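The default behavior described above can be checked with a minimal sketch. It assumes small deterministic frames (built with np.arange instead of the random values used in the text) so that the result is reproducible; the variable names mirror the ones above.

```python
import numpy as np
import pandas as pd

# Deterministic stand-ins for df1 and df2 (illustrative values, not from the text)
df1 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list("ABC"))
df2 = pd.DataFrame(np.arange(3).reshape(1, 3), columns=list("ABD"))

# All defaults: axis=0, join='outer', ignore_index=False
result = pd.concat([df1, df2])

print(list(result.columns))  # union of the column labels
print(list(result.index))    # original indexes are kept, so 0 appears twice
print(result.isna().sum())   # cells missing from an input are filled with NaN
```

Because the original index is kept, result.loc[0] selects two rows here, which is worth keeping in mind before indexing into a concatenated frame.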
- Concatenate along the rows (axis=0)
With axis=0 the frames are stacked vertically: the columns of the result are the union of the inputs' columns, and the row indexes are appended without renumbering.
result = pd.concat([df1, df2], axis=0)
The results are as follows:
A B C D
0 0.790409 0.950151 0.125780 NaN
1 0.074662 0.856551 0.558453 NaN
2 0.558790 0.418458 0.553458 NaN
0 0.244919 0.550575 NaN 0.778046
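As a quick sanity check of the row and column arithmetic, the following sketch (again assuming deterministic stand-in frames rather than the random ones above) compares the explicit axis=0 call with the default call.

```python
import numpy as np
import pandas as pd

# Deterministic stand-ins for df1 and df2 (illustrative values)
df1 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list("ABC"))
df2 = pd.DataFrame(np.arange(3).reshape(1, 3), columns=list("ABD"))

stacked = pd.concat([df1, df2], axis=0)

# Rows: 3 + 1 = 4. Columns: union of {A, B, C} and {A, B, D} gives 4 labels.
print(stacked.shape)

# Passing axis=0 explicitly yields the same frame as the default call.
same_as_default = stacked.equals(pd.concat([df1, df2]))
print(same_as_default)
```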
- Concatenate along the columns (axis=1)
With axis=1 the frames are placed side by side: columns with the same name are all kept (so labels can be duplicated), and the row index of the result is the union of the inputs' indexes.
result = pd.concat([df1, df2], axis=1)
The results are as follows:
A B C A B D
0 0.899548 0.893985 0.600403 0.665124 0.494773 0.18973
1 0.296687 0.954922 0.507403 NaN NaN NaN
2 0.280254 0.267325 0.375680 NaN NaN NaN
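A small sketch of the side-by-side case, assuming deterministic stand-in frames, shows the duplicated column labels and what selecting one of them returns.

```python
import numpy as np
import pandas as pd

# Deterministic stand-ins for df1 and df2 (illustrative values)
df1 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list("ABC"))
df2 = pd.DataFrame(np.arange(3).reshape(1, 3), columns=list("ABD"))

wide = pd.concat([df1, df2], axis=1)

print(list(wide.columns))  # 'A' and 'B' each appear twice
print(list(wide.index))    # union of the row indexes

# With duplicated labels, selecting "A" returns a DataFrame holding both columns.
both_a = wide["A"]
print(both_a.shape)
```

Rows 1 and 2 exist only in df1, so the three columns coming from df2 hold NaN there.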
- ignore_index parameter
The default is ignore_index=False.
With axis=0 and ignore_index=True, the row index of the result is renumbered consecutively from 0.
result = pd.concat([df1, df2], axis=0, ignore_index=True)
The results are as follows:
A B C D
0 0.142223 0.171115 0.345506 NaN
1 0.868534 0.969604 0.561111 NaN
2 0.769472 0.141292 0.846930 NaN
3 0.209132 0.726342 NaN 0.460136
With axis=1 and ignore_index=True, the column labels of the result are replaced by default integer labels starting at 0.
result = pd.concat([df1, df2], axis=1, ignore_index=True)
The results are as follows:
0 1 2 3 4 5
0 0.613608 0.699028 0.710746 0.158601 0.214546 0.70234
1 0.071271 0.058034 0.445593 NaN NaN NaN
2 0.433755 0.516567 0.791369 NaN NaN NaN
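Both ignore_index cases can be compared in one short sketch. The frames are deterministic stand-ins (an assumption for reproducibility), and the names rows and cols are only illustrative.

```python
import numpy as np
import pandas as pd

# Deterministic stand-ins for df1 and df2 (illustrative values)
df1 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list("ABC"))
df2 = pd.DataFrame(np.arange(3).reshape(1, 3), columns=list("ABD"))

# axis=0: the row index is renumbered, the column labels are kept
rows = pd.concat([df1, df2], axis=0, ignore_index=True)
print(list(rows.index), list(rows.columns))

# axis=1: the column labels are renumbered, the row index is kept
cols = pd.concat([df1, df2], axis=1, ignore_index=True)
print(list(cols.index), list(cols.columns))
```

In other words, ignore_index always discards the labels along the concatenation axis and leaves the other axis untouched.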
- Specify the columns (or index) of the result
By default concat takes the union of the two DataFrames' column names (or indexes). The join_axes parameter can be used to specify which column names or which index the result should use. Note that join_axes was deprecated in pandas 0.25 and removed in pandas 1.0, so the join_axes calls in this note only run on older versions.
The default is join_axes=None. With axis=0 (vertical stacking), join_axes=[df1.columns] means the result uses the columns of df1.
result = pd.concat([df1, df2], axis=0, join_axes=[df1.columns])
The results are as follows:
A B C
0 0.056351 0.774601 0.379272
1 0.946589 0.068344 0.200789
2 0.876588 0.506720 0.210272
0 0.512249 0.523099 NaN
With axis=1 (side-by-side concatenation), join_axes=[df1.index] means the result uses the index of df1.
result = pd.concat([df1, df2], axis=1, join_axes=[df1.index])
The results are as follows:
A B C A B D
0 0.536844 0.498911 0.374395 0.340025 0.640539 0.611227
1 0.321700 0.487316 0.829186 NaN NaN NaN
2 0.493442 0.368903 0.480279 NaN NaN NaN
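On pandas 1.0 and later, where join_axes is no longer accepted, a similar result can be obtained with reindex. The following is a sketch under that assumption, using deterministic stand-in frames; it reindexes the inputs before concatenating, which is one of several possible ways to do this.

```python
import numpy as np
import pandas as pd

# Deterministic stand-ins for df1 and df2 (illustrative values)
df1 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list("ABC"))
df2 = pd.DataFrame(np.arange(3).reshape(1, 3), columns=list("ABD"))

# Rough equivalent of join_axes=[df1.columns] with axis=0:
# align every input to df1's columns, then stack the rows.
by_cols = pd.concat([df.reindex(columns=df1.columns) for df in (df1, df2)], axis=0)
print(by_cols)

# Rough equivalent of join_axes=[df1.index] with axis=1:
# align df2 to df1's row index, then place the frames side by side.
by_index = pd.concat([df1, df2.reindex(df1.index)], axis=1)
print(by_index)
```

Column D is dropped from by_cols because it is not in df1.columns, and the row contributed by df2 has NaN in column C, matching the shape of the join_axes output above.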
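- join parameter
In concat, the join parameter accepts only 'outer' and 'inner'; 'left' and 'right' are not supported.
With join='inner', only the labels shared by all inputs on the other axis are kept, i.e., the intersection. Here, with axis=1, that means only the rows whose index appears in both frames.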
result = pd.concat([df1, df2], axis=1, join='inner')
The results are as follows:
A B C A B D
0 0.526198 0.231218 0.478691 0.682161 0.377862 0.722153
With join='outer' (the default), all labels on the other axis are kept, i.e., the union is taken.
result = pd.concat([df1, df2], axis=1, join='outer')
The results are as follows:
A B C A B D
0 0.655303 0.132546 0.967381 0.400043 0.160096 0.268971
1 0.079759 0.210028 0.904587 NaN NaN NaN
2 0.172952 0.604146 0.531020 NaN NaN NaN
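The difference between the two join modes is easiest to see from the shapes. This sketch assumes deterministic stand-in frames and also shows join='inner' with axis=0, a case the text above does not print.

```python
import numpy as np
import pandas as pd

# Deterministic stand-ins for df1 and df2 (illustrative values)
df1 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list("ABC"))
df2 = pd.DataFrame(np.arange(3).reshape(1, 3), columns=list("ABD"))

# axis=1: join acts on the row index
inner_wide = pd.concat([df1, df2], axis=1, join="inner")  # only index 0 is shared
outer_wide = pd.concat([df1, df2], axis=1, join="outer")  # indexes 0, 1, 2
print(inner_wide.shape, outer_wide.shape)

# axis=0: join acts on the column labels
inner_tall = pd.concat([df1, df2], axis=0, join="inner")  # only A and B are shared
print(list(inner_tall.columns), inner_tall.shape)
```

So join always applies to the axis that is not being concatenated along.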
- When both join and join_axes are set, join_axes takes precedence
result = pd.concat([df1, df2], axis=0, join='inner', join_axes=[df1.columns])
The results are as follows:
A B C
0 0.116892 0.321588 0.670490
1 0.670587 0.830011 0.467221
2 0.901773 0.857747 0.127813
0 0.354468 0.269192 NaN
- Concatenate more than two DataFrames
result = pd.concat([df1, df2, df3], axis=0, ignore_index=True, join_axes=[df1.columns])
The results are as follows:
A B C
0 0.422407 0.191703 0.951058
1 0.772422 0.868453 0.528624
2 0.752645 0.164527 0.400265
3 0.104067 0.747079 NaN
4 0.916764 0.083018 0.049442
5 0.943200 0.317038 0.404493
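The six-row output above, with a renumbered index and only the columns of df1, corresponds to concatenating all three frames. A sketch that produces the same layout on current pandas follows; it assumes deterministic stand-in frames and uses reindex in place of join_axes.

```python
import numpy as np
import pandas as pd

# Deterministic stand-ins for df1, df2 and df3 (illustrative values)
df1 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list("ABC"))
df2 = pd.DataFrame(np.arange(3).reshape(1, 3), columns=list("ABD"))
df3 = pd.DataFrame(np.arange(6).reshape(2, 3), columns=list("ABC"))

# concat accepts any number of frames in the list
result = pd.concat([df1, df2, df3], axis=0, ignore_index=True)

# Keep only df1's columns (stands in for join_axes=[df1.columns])
result = result.reindex(columns=df1.columns)
print(result)
```

Row 3 comes from df2, which has no column C, so it is the only row with NaN.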
reference:
https://www.cnblogs.com/guxh/p/9451532.html