pandas concat "InvalidIndexError: Reindexing only valid with uniquely valued Index objects"

The pandas concat function makes it quick and convenient to stack multiple DataFrames, but it sometimes fails with "pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects". In other words: when pandas performs a concat operation on dfs, the row and column indexes it has to align must be unique. But there is an exception. What is it? Let me walk through it step by step.

First, create a few dfs for demonstration:
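The DataFrame definitions did not survive at this point in the article, so the following is only a sketch of what the demonstration data might look like. df1 and df3 follow the definitions that appear later in the article; the shapes, values and column names of df2 and df4 are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# df1 and df3 follow the definitions used later in this article.
df1 = pd.DataFrame(np.random.randint(0, 5, (6, 5)), columns=list("abcde"))
df3 = pd.DataFrame(np.random.randint(-10, 0, (3, 5)), columns=list("abcab"))

# Assumed for illustration: df2 has unique row and column labels.
df2 = pd.DataFrame(np.random.randint(5, 10, (4, 5)), columns=list("fghij"))

# Assumed for illustration: df4 repeats df3's duplicated column labels exactly.
df4 = pd.DataFrame(np.random.randint(10, 20, (2, 5)), columns=list("abcab"))

# Report which frames have duplicated column labels.
for name, df in [("df1", df1), ("df2", df2), ("df3", df3), ("df4", df4)]:
    print(name, list(df.columns), "unique columns:", df.columns.is_unique)
```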

Next, df1 and df2 are concatenated vertically and horizontally. As long as neither the column index nor the row index contains duplicate names, concat runs normally.
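A minimal sketch of this step might look like the following. The article does not show df2, so its shape and column names here are assumptions; the only property that matters is that all row and column labels are unique.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randint(0, 5, (6, 5)), columns=list("abcde"))
# Assumed for illustration: the article does not show how df2 is built.
df2 = pd.DataFrame(np.random.randint(5, 10, (4, 5)), columns=list("fghij"))

# Vertical stacking (axis=0): columns are aligned by label, gaps become NaN.
vertical = pd.concat([df1, df2], keys=["df1", "df2"])

# Horizontal stacking (axis=1): rows are aligned by label, gaps become NaN.
horizontal = pd.concat([df1, df2], axis=1)

print(vertical.shape)
print(horizontal.shape)
```

Both calls succeed because pandas can align on indexes whose labels are all unique.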

Next, concat df3 with df1. The column index of df3 is abcab, so columns a and b are duplicated, and df3 cannot be stacked with df1 using concat.
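The failure can be reproduced with a sketch like the one below, using the df1 and df3 definitions from later in the article. The except clause is deliberately broad: on recent pandas versions the exception is pandas.errors.InvalidIndexError, but the exact class may differ on older releases.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randint(0, 5, (6, 5)), columns=list("abcde"))
df3 = pd.DataFrame(np.random.randint(-10, 0, (3, 5)), columns=list("abcab"))

error_name = None
try:
    # The column sets differ, so pandas must reindex df3's duplicated columns.
    pd.concat([df1, df3], keys=["df1", "df3"])
except Exception as exc:  # broad on purpose: the class depends on the pandas version
    error_name = type(exc).__name__
    print(f"{error_name}: {exc}")
```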

Next, concat df3 with df4, which have the same column index. Although columns a and b are duplicated in the column index abcab, the column indexes of df3 and df4 are identical, so concat still runs normally. This is the exception.
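The exception can be checked with a short sketch. df4 is not defined anywhere in the article, so the definition below is an assumption; what matters is that its column labels match those of df3 exactly.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(np.random.randint(-10, 0, (3, 5)), columns=list("abcab"))
# Assumed for illustration: df4 has the same duplicated column labels as df3.
df4 = pd.DataFrame(np.random.randint(10, 20, (2, 5)), columns=list("abcab"))

# The column indexes are identical, so the frames can be stacked as they are.
df_c34 = pd.concat([df3, df4], keys=["df3", "df4"])
print(df_c34)
```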

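So how do you concat dfs whose row or column index contains duplicates? Remove the duplicates first:

If the row index is duplicated, call df.reset_index() on both dfs.

If the column index is duplicated, rename the columns so that every name is unique. Note that df.rename(columns={"old": "new"}, inplace=True) gives every column with the same old name the same new name, so it cannot separate duplicates on its own; assigning a new list of names to df.columns can.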
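For the row-index case, a minimal sketch is shown below. The frames left and right are hypothetical and exist only to trigger the problem during a horizontal concat. The sketch uses reset_index(drop=True), a small variation on the plain reset_index() mentioned above, so that the old index is discarded instead of being added as a new column.

```python
import pandas as pd

# Hypothetical frames: row label 0 appears twice in `left`.
left = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 0, 1])
right = pd.DataFrame({"b": [4, 5, 6]}, index=[0, 1, 2])

failed = False
try:
    # Horizontal concat aligns on row labels, which are not unique in `left`.
    pd.concat([left, right], axis=1)
except Exception as exc:  # broad on purpose: the class depends on the pandas version
    failed = True
    print(f"{type(exc).__name__}: {exc}")

# Replace both row indexes with fresh 0..n-1 labels, then concat again.
fixed = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)], axis=1
)
print(fixed)
```

Keep in mind that resetting the index pairs rows by position, which is only appropriate when the original row labels carry no meaning.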

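import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randint(0, 5, (6, 5)), columns=list("abcde"))

df3 = pd.DataFrame(np.random.randint(-10, 0, (3, 5)), columns=list("abcab"))

# Only needed when the row index is duplicated:
# df1 = df1.reset_index()
# df3 = df3.reset_index()

# Column-index case: append the position to each repeated name,
# so the columns a, b, c, a, b become a, b, c, a_3, b_4.
dup = df3.columns.duplicated()
df3.columns = [f"{name}_{i}" if dup[i] else name for i, name in enumerate(df3.columns)]

df_c13 = pd.concat([df1, df3], keys=["df1", "df3"])
print(df_c13)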
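If the renaming is needed in more than one place, it could be wrapped in a small helper. The function name make_unique_columns and its running-count suffix scheme are hypothetical, not part of pandas, and the small fixed frames are used only to keep the example short.

```python
import pandas as pd


def make_unique_columns(columns):
    """Hypothetical helper: suffix each repeated name with a running count."""
    seen = {}
    unique = []
    for name in columns:
        count = seen.get(name, 0)
        seen[name] = count + 1
        # The first occurrence keeps its name; later ones get _1, _2, ...
        unique.append(name if count == 0 else f"{name}_{count}")
    return unique


df1 = pd.DataFrame([[1, 2, 3, 4, 5]], columns=list("abcde"))
df3 = pd.DataFrame([[-1, -2, -3, -4, -5]], columns=list("abcab"))

df3.columns = make_unique_columns(df3.columns)
result = pd.concat([df1, df3], keys=["df1", "df3"])
print(list(result.columns))
```

This sketch does not check whether a generated name such as a_1 already exists in the frame, so it is only safe when the original names do not follow that pattern.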

In the end we reach this conclusion: if the column names of two dfs are exactly the same, in the same order, then vertical stacking with concat works even when those column names contain duplicates.

 

Origin blog.csdn.net/xcntime/article/details/115180924