I am trying to concatenate rows of a data frame that contains strings. I want to check whether a row contains NaN and, if so, remove the NaN values from that row and concatenate the remaining values with the row above it. Finally, I want to drop the row that contained NaN.
Here is my sample data:
import numpy as np
import pandas as pd
df=[["d","t","u","y","e"],["d",np.nan,np.nan,np.nan,"o"],["y","p","p","w","r"]]
df=pd.DataFrame(df)
print(df)
   0    1    2    3  4
0  d    t    u    y  e
1  d  NaN  NaN  NaN  o
2  y    p    p    w  r
I want the output to look like the one below.
0 1 2 3 4
dd t u y eo
y p p w r
Here is my attempt, but it does not work:
for i in range(len(df)):
    for j in range(len(df.iloc[1,])):
        if(pd.isnull(df.iloc[i,j])==True):
            df.concat(df.iloc[i,j],df.iloc[i-1,j])
            df.dropna(df.iloc[:,i])
I am new to Python. Can anyone help me with this?
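The idea is to create a helper Series for grouping.
First, build a mask that is True for every row with at least one NaN, using DataFrame.isna with DataFrame.any. Then create a Series of row positions with the constructor, replace the values of the rows not matched by the mask with NaN using Series.where, and back fill those missing values with limit=1, so that only the one row directly above each NaN row receives the same group label. Rows that still have no label keep their own position through fillna(s).
Finally, replace all missing values in the DataFrame with empty strings, group by the helper Series, and aggregate with join: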
m = df.isna().any(axis=1)
s = pd.Series(np.arange(len(m)), index=df.index)
g = s.where(m).bfill(limit=1).fillna(s)
df = df.fillna('').groupby(g).agg(''.join).reset_index(drop=True)
print(df)
    0  1  2  3   4
0  dd  t  u  y  eo
1   y  p  p  w   r
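To make the grouping step easier to follow, here is a minimal sketch that rebuilds the sample data and keeps each intermediate object, so the mask and the group labels can be inspected. The names `result`, `g_alt` and `result_alt` are introduced here for illustration only. The last part is a possible variant, not part of the answer above: it assumes that several consecutive NaN rows should all be merged into the nearest complete row above, which `limit=1` does not do.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([["d", "t", "u", "y", "e"],
                   ["d", np.nan, np.nan, np.nan, "o"],
                   ["y", "p", "p", "w", "r"]])

# True for every row that has at least one NaN
m = df.isna().any(axis=1)
print(m.tolist())        # [False, True, False]

# Row positions 0..n-1, aligned with the index of df
s = pd.Series(np.arange(len(m)), index=df.index)

# Keep the position only for NaN rows, copy it to the single row above,
# then let every remaining row fall back to its own position
g = s.where(m).bfill(limit=1).fillna(s)
print(g.tolist())        # [1.0, 1.0, 2.0]

# Rows sharing a label are joined column by column
result = df.fillna("").groupby(g).agg("".join).reset_index(drop=True)
print(result)

# Hypothetical variant (an assumption about the desired behaviour):
# start a new group at every complete row, so any run of NaN rows
# is merged into the complete row that precedes it
g_alt = (~m).cumsum()
result_alt = df.fillna("").groupby(g_alt).agg("".join).reset_index(drop=True)
```

On the sample data both groupings give the same two rows. They differ only when two or more NaN rows follow each other: with `limit=1` only the first of them is merged into the row above, while the `cumsum` variant merges the whole run.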