How to combine rows of a DataFrame using Python?

user86907 :

I am trying to concatenate rows of a data frame that contains strings. I want to check whether a row contains NaN and, if so, drop the NaN values from that row and concatenate the remaining values with the row above it. Finally, the row that contained NaN should be removed.

Here is my sample data:

import numpy as np
import pandas as pd

df=[["d","t","u","y","e"],["d",np.nan,np.nan,np.nan,"o"],["y","p","p","w","r"]]
df=pd.DataFrame(df)
print(df)
   0    1    2    3  4
0  d    t    u    y  e
1  d  NaN  NaN  NaN  o
2  y    p    p    w  r

I want the output to look like the one below.

    0  1  2  3   4
   dd  t  u  y  eo
    y  p  p  w   r

Here is my attempt, but it does not work.

for i in range(len(df)):
    for j in range(len(df.iloc[1,])):
        if(pd.isnull(df.iloc[i,j])==True):
            df.concat(df.iloc[i,j],df.iloc[i-1,j])
            df.dropna(df.iloc[:,i])

I am new to Python. Can anyone help me with this?

jezrael :

The idea is to create a helper Series for grouping.

First, create a mask of all rows with at least one NaN using DataFrame.isna with DataFrame.any. Then create a Series of row positions with the constructor, replace the non-matching values with NaN using Series.where, and back fill the missing values with limit=1 so that only the one row directly above each NaN row joins its group. The final fillna(s) gives every remaining row a group of its own.

Finally, replace all missing values in the DataFrame with empty strings, group by the helper Series, and aggregate with join:

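# True for rows that contain at least one NaN
m = df.isna().any(axis=1)
# positional row numbers, aligned to df's index
s = pd.Series(np.arange(len(m)), index=df.index)
# NaN rows keep their number, the row above each one receives it, the rest keep their own
g = s.where(m).bfill(limit=1).fillna(s)

# empty strings instead of NaN, then join the strings within each group
df = df.fillna('').groupby(g).agg(''.join).reset_index(drop=True)
print(df)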
    0  1  2  3   4
0  dd  t  u  y  eo
1   y  p  p  w   r
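
To see what the helper Series holds at each step, here is a minimal, self-contained sketch that rebuilds the sample frame from the question and keeps the intermediate values. The names `marked` and `result` are not part of the answer; they are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([["d", "t", "u", "y", "e"],
                   ["d", np.nan, np.nan, np.nan, "o"],
                   ["y", "p", "p", "w", "r"]])

m = df.isna().any(axis=1)                         # False, True, False
s = pd.Series(np.arange(len(m)), index=df.index)  # 0, 1, 2
marked = s.where(m)                               # NaN, 1.0, NaN
g = marked.bfill(limit=1).fillna(s)               # 1.0, 1.0, 2.0

# Rows 0 and 1 share group 1.0, so their strings are joined column by column.
result = df.fillna('').groupby(g).agg(''.join).reset_index(drop=True)
print(result)
```

Rows 0 and 1 end up with the same group label, which is why they collapse into a single row, while row 2 keeps a label of its own and passes through unchanged.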

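One thing to be aware of: because the back fill uses limit=1, the approach in the answer merges a NaN row only with the single row directly above it. If two NaN rows follow each other, the second one stays in a group of its own. The sketch below is an alternative that is not from the answer: it assumes every NaN row should be folded into the nearest complete row above it, and it builds the groups with a cumulative sum instead. The function name `merge_nan_rows_up` is hypothetical.

```python
import numpy as np
import pandas as pd

def merge_nan_rows_up(frame):
    """Hypothetical helper: fold each row containing NaN into the
    nearest complete row above it (assumes all columns hold strings)."""
    has_nan = frame.isna().any(axis=1)
    # Every complete row starts a new group; NaN rows inherit the
    # group number of the last complete row before them.
    groups = (~has_nan).cumsum()
    return frame.fillna('').groupby(groups).agg(''.join).reset_index(drop=True)

df = pd.DataFrame([["d", "t", "u", "y", "e"],
                   ["d", np.nan, np.nan, np.nan, "o"],
                   ["y", "p", "p", "w", "r"]])
out = merge_nan_rows_up(df)
print(out)
```

On the sample data this gives the same two rows as the answer. If the very first row contains NaN there is no row above it, so in this sketch it simply forms a group of its own.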