pandas concat() does not join on same columns

jorijnsmit :

I have two DataFrames that I am trying to concatenate. I made sure they have the same number of columns and that the data types match.

However, calling pd.concat([df1, df2], ignore_index=True) returns a DataFrame with 24 columns and lots of NaN values. I expected pd.concat() to simply place the second DataFrame 'underneath' the first one (the default, axis=0).

What am I doing wrong?

>>> df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 798810 entries, 0 to 798809
Data columns (total 12 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   0       798810 non-null  Int64  
 1   1       798810 non-null  float64
 2   2       798810 non-null  float64
 3   3       798810 non-null  float64
 4   4       798810 non-null  float64
 5   5       798810 non-null  float64
 6   6       798810 non-null  Int64  
 7   7       798810 non-null  float64
 8   8       798810 non-null  Int64  
 9   9       798810 non-null  float64
 10  10      798810 non-null  float64
 11  11      798810 non-null  float64
dtypes: Int64(3), float64(9)
memory usage: 75.4 MB
>>> df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 12 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       500 non-null    Int64  
 1   1       500 non-null    float64
 2   2       500 non-null    float64
 3   3       500 non-null    float64
 4   4       500 non-null    float64
 5   5       500 non-null    float64
 6   6       500 non-null    Int64  
 7   7       500 non-null    float64
 8   8       500 non-null    Int64  
 9   9       500 non-null    float64
 10  10      500 non-null    float64
 11  11      500 non-null    float64
dtypes: Int64(3), float64(9)
memory usage: 48.5 KB
>>> pd.concat([df1, df2], ignore_index=True).shape
(799310, 24)
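The symptom can be reproduced on a tiny scale. This is a minimal sketch that assumes one frame has integer column labels 0..11 and the other has the string labels "0".."11". The frames df_a and df_b are hypothetical stand-ins for df1 and df2.

```python
import pandas as pd

# Hypothetical stand-ins: same shape and dtypes, but df_a uses integer
# column labels 0..11 while df_b uses the string labels "0".."11".
df_a = pd.DataFrame([[1.0] * 12], columns=range(12))
df_b = pd.DataFrame([[2.0] * 12], columns=[str(i) for i in range(12)])

# concat aligns on the actual labels, so 0 and "0" are different columns.
combined = pd.concat([df_a, df_b], ignore_index=True)
print(combined.shape)  # (2, 24): each row is NaN in the other frame's 12 columns
```

Because info() prints both 0 and "0" as plain 0, the two listings above can look identical even when the labels differ in type.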
jezrael :

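I think the column names in one of the DataFrames are not integers but strings (for example '0' instead of 0). df.info() prints both the same way, but pd.concat aligns on the actual labels, so it treats them as 24 different columns and fills the gaps with NaN. Convert both sets of labels to integers, for example: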

df1.columns = df1.columns.astype(int)
df2.columns = df2.columns.astype(int)

df = pd.concat([df1, df2], ignore_index=True)

Or:

df = pd.concat([df1.rename(columns=int), df2.rename(columns=int)], ignore_index=True)
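To confirm the diagnosis before applying the fix, you can inspect the Python type of each label. This is a sketch on the same hypothetical df_a/df_b stand-ins (integer labels vs string labels), followed by the rename(columns=int) fix:

```python
import pandas as pd

# Hypothetical frames mirroring the problem: int labels vs str labels.
df_a = pd.DataFrame([[1.0] * 12], columns=range(12))
df_b = pd.DataFrame([[2.0] * 12], columns=[str(i) for i in range(12)])

# Show which Python types the column labels actually have.
print({type(c).__name__ for c in df_a.columns})  # {'int'}
print({type(c).__name__ for c in df_b.columns})  # {'str'}
print(df_a.columns.equals(df_b.columns))  # False

# Normalize both frames to integer labels, then stack them vertically.
fixed = pd.concat(
    [df_a.rename(columns=int), df_b.rename(columns=int)], ignore_index=True
)
print(fixed.shape)  # (2, 12)
```

rename(columns=int) calls int() on every label, so it only works when the string labels are valid integer literals; otherwise it raises a ValueError.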
