pandas basics: combining data with concat

A summary of the key points of pandas concat.

  • Create three DataFrames
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.random((3, 3)), columns=list("ABC"))
df2 = pd.DataFrame(np.random.random((1, 3)), columns=list("ABD"))
df3 = pd.DataFrame(np.random.random((2, 3)), columns=list("ABC"))

Printing df1, df2 and df3 gives:

          A         B         C
0  0.169679  0.250973  0.033926
1  0.517431  0.069529  0.117868
2  0.035198  0.693947  0.791094
          A         B         D
0  0.285279  0.944059  0.433911
          A         B         C
0  0.585748  0.851175  0.810830
1  0.901091  0.910537  0.475615
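The values come from np.random.random, so every run prints different numbers, which is why the tables in this post do not match one another. A minimal sketch, assuming you want repeatable output, seeds NumPy's global generator before building the same three frames; the seed value 0 is arbitrary.

```python
import numpy as np
import pandas as pd

# Seed the legacy global generator so np.random.random returns the same values every run
np.random.seed(0)

df1 = pd.DataFrame(np.random.random((3, 3)), columns=list("ABC"))
df2 = pd.DataFrame(np.random.random((1, 3)), columns=list("ABD"))
df3 = pd.DataFrame(np.random.random((2, 3)), columns=list("ABC"))

print(df1.shape, df2.shape, df3.shape)  # (3, 3) (1, 3) (2, 3)
```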
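  • Default parameters

By default axis=0: the DataFrames are stacked vertically (one below the other), and on the other axis the union of the columns is taken, with NaN filling any cell a frame does not have. By default the index is not renumbered, i.e., the original index labels are kept, so label 0 appears twice below.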

result = pd.concat([df1, df2])

The results are as follows:

          A         B         C         D
0  0.156784  0.940447  0.522436       NaN
1  0.728026  0.669282  0.495065       NaN
2  0.947834  0.150463  0.804081       NaN
0  0.891067  0.020196       NaN  0.242114
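A small sketch that checks the three effects described above on freshly generated frames: the union of columns, the NaN fill, and the kept (duplicated) index labels.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.random((3, 3)), columns=list("ABC"))
df2 = pd.DataFrame(np.random.random((1, 3)), columns=list("ABD"))

result = pd.concat([df1, df2])  # defaults: axis=0, join="outer", ignore_index=False

# Columns are the union of both inputs
print(list(result.columns))      # ['A', 'B', 'C', 'D']
# Original labels are kept, so 0 shows up twice
print(list(result.index))        # [0, 1, 2, 0]
# df1's three rows have no D; df2's single row has no C
print(result["D"].isna().sum())  # 3
print(result["C"].isna().sum())  # 1
```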
  • Stack rows (axis=0)

axis=0: the frames are stacked vertically; horizontally, the union of the columns is taken; the row indexes are simply appended, not renumbered.

result = pd.concat([df1, df2], axis=0)

The results are as follows:

          A         B         C         D
0  0.790409  0.950151  0.125780       NaN
1  0.074662  0.856551  0.558453       NaN
2  0.558790  0.418458  0.553458       NaN
0  0.244919  0.550575       NaN  0.778046
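Passing axis=0 explicitly produces the same frame as the default call. One practical consequence of keeping the original labels is that a label such as 0 is no longer unique, so label-based lookup returns several rows. A short sketch:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.random((3, 3)), columns=list("ABC"))
df2 = pd.DataFrame(np.random.random((1, 3)), columns=list("ABD"))

explicit = pd.concat([df1, df2], axis=0)
default = pd.concat([df1, df2])

print(explicit.equals(default))   # True

# Label 0 appears twice, so .loc[0] returns a 2-row DataFrame, not a single row
print(explicit.loc[0].shape)      # (2, 4)
print(explicit.index.is_unique)   # False
```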
  • Side by side (axis=1)

axis=1: the frames are placed next to each other horizontally, and columns with the same name are all kept rather than merged; vertically, the union of the indexes is taken.

result = pd.concat([df1, df2], axis=1)

The results are as follows:

          A         B         C         A         B        D
0  0.899548  0.893985  0.600403  0.665124  0.494773  0.18973
1  0.296687  0.954922  0.507403       NaN       NaN      NaN
2  0.280254  0.267325  0.375680       NaN       NaN      NaN
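Because duplicate column names survive an axis=1 concat, selecting one of those names returns every matching column. A minimal sketch to confirm this and the index union:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.random((3, 3)), columns=list("ABC"))
df2 = pd.DataFrame(np.random.random((1, 3)), columns=list("ABD"))

result = pd.concat([df1, df2], axis=1)

print(list(result.columns))  # ['A', 'B', 'C', 'A', 'B', 'D'] (duplicates kept)
print(list(result.index))    # [0, 1, 2] (union of [0, 1, 2] and [0])

# Selecting "A" returns both A columns as a DataFrame
print(result["A"].shape)     # (3, 2)
```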
  • The ignore_index parameter

The default is ignore_index=False.

With axis=0 and ignore_index=True, the original row labels are discarded and the rows are renumbered consecutively from 0.

result = pd.concat([df1, df2], axis=0, ignore_index=True)

The results are as follows:

          A         B         C         D
0  0.142223  0.171115  0.345506       NaN
1  0.868534  0.969604  0.561111       NaN
2  0.769472  0.141292  0.846930       NaN
3  0.209132  0.726342       NaN  0.460136
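The same renumbering can also be done after a normal concat with reset_index(drop=True). This sketch checks that both routes give an identical frame:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.random((3, 3)), columns=list("ABC"))
df2 = pd.DataFrame(np.random.random((1, 3)), columns=list("ABD"))

renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)

# Alternative: keep the labels during concat, then drop them afterwards
reset = pd.concat([df1, df2], axis=0).reset_index(drop=True)

print(list(renumbered.index))    # [0, 1, 2, 3]
print(renumbered.equals(reset))  # True
```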

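With axis=1 and ignore_index=True, the labels along the concatenation axis (here the column names) are discarded and replaced with the default integer labels 0 to 5.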

result = pd.concat([df1, df2], axis=1, ignore_index=True)

The results are as follows:

          0         1         2         3         4        5
0  0.613608  0.699028  0.710746  0.158601  0.214546  0.70234
1  0.071271  0.058034  0.445593       NaN       NaN      NaN
2  0.433755  0.516567  0.791369       NaN       NaN      NaN
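  • Specify the axis labels of the result (join_axes)

By default, concat builds the result from the union of the DataFrames' column names and index labels; the join_axes parameter lets you specify which labels to use instead. Note that join_axes was deprecated in pandas 0.25 and removed in pandas 1.0; on newer versions, call .reindex on the result instead.

The default is join_axes=None. With axis=0 the frames are stacked vertically, and join_axes=[df1.columns] makes the result use df1's columns.

result = pd.concat([df1, df2], axis=0, join_axes=[df1.columns])  # pandas < 1.0 only
result = pd.concat([df1, df2], axis=0).reindex(columns=df1.columns)  # same result in pandas >= 1.0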

The results are as follows:

          A         B         C
0  0.056351  0.774601  0.379272
1  0.946589  0.068344  0.200789
2  0.876588  0.506720  0.210272
0  0.512249  0.523099       NaN
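Since join_axes no longer exists in pandas 1.0 and later, restricting the columns has to be done with reindex. This sketch shows two interchangeable ways to do it: reindex the concatenated result, or align each input to df1's columns before concatenating.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.random((3, 3)), columns=list("ABC"))
df2 = pd.DataFrame(np.random.random((1, 3)), columns=list("ABD"))

# Option 1: concatenate with the default outer join, then keep only df1's columns
after = pd.concat([df1, df2], axis=0).reindex(columns=df1.columns)

# Option 2: align every input to df1's columns first, then concatenate
before = pd.concat([df.reindex(columns=df1.columns) for df in (df1, df2)], axis=0)

print(list(after.columns))      # ['A', 'B', 'C'] (D dropped)
print(after["C"].isna().sum())  # 1 (df2 has no C)
print(after.equals(before))     # True
```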

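With axis=1 the frames are combined horizontally, and join_axes=[df1.index] makes the result use df1's index.

result = pd.concat([df1, df2], axis=1, join_axes=[df1.index])  # pandas < 1.0 only
result = pd.concat([df1, df2], axis=1).reindex(df1.index)  # same result in pandas >= 1.0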

The results are as follows:

          A         B         C         A         B         D
0  0.536844  0.498911  0.374395  0.340025  0.640539  0.611227
1  0.321700  0.487316  0.829186       NaN       NaN       NaN
2  0.493442  0.368903  0.480279       NaN       NaN       NaN
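In this example df1's index [0, 1, 2] already equals the union of both indexes, so restricting to it changes nothing. Reindexing to df2's index instead makes the row selection visible. A sketch using reindex, the pandas 1.0+ substitute for join_axes:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.random((3, 3)), columns=list("ABC"))
df2 = pd.DataFrame(np.random.random((1, 3)), columns=list("ABD"))

side_by_side = pd.concat([df1, df2], axis=1)

# Keep exactly df1's row labels (a no-op here, since df1's index is the union)
keep_df1 = side_by_side.reindex(df1.index)
print(keep_df1.shape)  # (3, 6)

# Keep exactly df2's row labels: only row 0 remains
keep_df2 = side_by_side.reindex(df2.index)
print(keep_df2.shape)  # (1, 6)
```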
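  • The join parameter

In concat, join accepts only 'outer' (the default) and 'inner'; 'left' and 'right' are not supported.

join='inner' keeps only the labels that all inputs share on the non-concatenation axis, i.e., the intersection. With axis=1 that means only the row labels present in both df1 and df2, which here is just row 0.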

result = pd.concat([df1, df2], axis=1, join='inner')

The results are as follows:

          A         B         C         A         B         D
0  0.526198  0.231218  0.478691  0.682161  0.377862  0.722153
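The intersection applies to whichever axis is not being concatenated: rows for axis=1, columns for axis=0. Passing an unsupported value such as "left" raises a ValueError. A short sketch covering all three cases:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.random((3, 3)), columns=list("ABC"))
df2 = pd.DataFrame(np.random.random((1, 3)), columns=list("ABD"))

# axis=1: intersect the row labels
inner_rows = pd.concat([df1, df2], axis=1, join="inner")
print(list(inner_rows.index))    # [0]

# axis=0: intersect the column labels
inner_cols = pd.concat([df1, df2], axis=0, join="inner")
print(list(inner_cols.columns))  # ['A', 'B']

# concat only understands "inner" and "outer"
rejected = False
try:
    pd.concat([df1, df2], join="left")
except ValueError as err:
    rejected = True
    print("ValueError:", err)
```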

join='outer' keeps all labels, i.e., the union; with axis=1, every row label from either frame is kept and missing cells are filled with NaN.

result = pd.concat([df1, df2], axis=1, join='outer')

The results are as follows:

          A         B         C         A         B         D
0  0.655303  0.132546  0.967381  0.400043  0.160096  0.268971
1  0.079759  0.210028  0.904587       NaN       NaN       NaN
2  0.172952  0.604146  0.531020       NaN       NaN       NaN
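Because 'outer' is the default, writing it out changes nothing. This sketch checks that and counts the NaN cells: rows 1 and 2 have no counterpart in df2's three columns.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.random((3, 3)), columns=list("ABC"))
df2 = pd.DataFrame(np.random.random((1, 3)), columns=list("ABD"))

outer = pd.concat([df1, df2], axis=1, join="outer")
default = pd.concat([df1, df2], axis=1)

print(outer.equals(default))              # True
print(int(outer.isna().sum().sum()))      # 6 (2 rows x 3 columns from df2)
```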
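  • When both join and join_axes are set, join_axes takes precedence
result = pd.concat([df1, df2], axis=0, join='inner', join_axes=[df1.columns])  # pandas < 1.0 only
result = pd.concat([df1, df2], axis=0).reindex(columns=df1.columns)  # same result in pandas >= 1.0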

The results are as follows:

          A         B         C
0  0.116892  0.321588  0.670490
1  0.670587  0.830011  0.467221
2  0.901773  0.857747  0.127813
0  0.354468  0.269192       NaN
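The output above still contains df1's C values, so join='inner' had no effect once join_axes was given. When porting such code to pandas 1.0+, the join should therefore be dropped rather than kept: applying the inner join first and reindexing afterwards would already have thrown C away. A sketch contrasting the two:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.random((3, 3)), columns=list("ABC"))
df2 = pd.DataFrame(np.random.random((1, 3)), columns=list("ABD"))

# Matches the old join_axes behavior: outer concat, then restrict to df1's columns
ported = pd.concat([df1, df2], axis=0).reindex(columns=df1.columns)
print(int(ported["C"].notna().sum()))  # 3 (df1's C values survive)

# Not equivalent: the inner join drops C before reindex can restore it
wrong = pd.concat([df1, df2], axis=0, join="inner").reindex(columns=df1.columns)
print(int(wrong["C"].notna().sum()))   # 0
```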
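  • Concatenating multiple DataFrames
result = pd.concat([df1, df2, df3], axis=0, ignore_index=True, join_axes=[df1.columns])  # pandas < 1.0 only
result = pd.concat([df1, df2, df3], axis=0, ignore_index=True).reindex(columns=df1.columns)  # pandas >= 1.0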

The results are as follows:

          A         B         C
0  0.422407  0.191703  0.951058
1  0.772422  0.868453  0.528624
2  0.752645  0.164527  0.400265
3  0.104067  0.747079       NaN
4  0.916764  0.083018  0.049442
5  0.943200  0.317038  0.404493
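concat takes a list of any length, so the same options apply to all three frames at once. This sketch checks the shape of the output above: 3 + 1 + 2 = 6 renumbered rows restricted to df1's columns, with only df2's row missing C.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.random((3, 3)), columns=list("ABC"))
df2 = pd.DataFrame(np.random.random((1, 3)), columns=list("ABD"))
df3 = pd.DataFrame(np.random.random((2, 3)), columns=list("ABC"))

frames = [df1, df2, df3]
result = pd.concat(frames, axis=0, ignore_index=True).reindex(columns=df1.columns)

print(result.shape)                   # (6, 3)
print(list(result.index))             # [0, 1, 2, 3, 4, 5]
print(int(result["C"].isna().sum()))  # 1
```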

 

Reference:

https://www.cnblogs.com/guxh/p/9451532.html

Origin blog.csdn.net/jp_666/article/details/104365398