Inserting NaN rows in a MultiIndex DataFrame

Vadim :

I have a pandas DataFrame with multiple features, and I would like to insert one row of NaNs for each value of the first feature only. In other words, I would like to transform something like this: [image: original DataFrame, not preserved]

into this: [image: desired DataFrame, not preserved]

Since I will be working with large datasets, speed is important.

jezrael :

For a general solution that also works with more columns, build a new DataFrame with DataFrame.drop_duplicates, keep only the feature columns, and overwrite feat2 with a placeholder. When this frame is combined with the original using concat, every other column is filled with missing values. Finally, restore the correct order with DataFrame.sort_values:

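import pandas as pd

# One placeholder row per feat1 value, with feat2 overwritten by '-'
df1 = df.drop_duplicates('feat1')[['feat1','feat2']].assign(feat2='-')
# Columns missing from df1 (here 'var') become NaN; a stable sort keeps placeholders first
df2 = (pd.concat([df1, df], sort=False, ignore_index=True)
         .sort_values('feat1', kind='mergesort'))

print(df2)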
   feat1 feat2  var
0      A     -  NaN
3      A     x  0.0
4      A     y  1.0
5      A     z  2.0
1      B     -  NaN
6      B     x  3.0
7      B     y  4.0
8      B     z  5.0
2      C     -  NaN
9      C     x  6.0
10     C     y  7.0
11     C     z  8.0
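For trying the approach end to end, here is a self-contained sketch. The input frame is an assumption reconstructed from the printed output above (the question's images are not preserved): groups A, B and C in feat1, each with feat2 values x, y, z and an integer var column. The helper name insert_placeholder_rows and its parameters are hypothetical, not a pandas API. A stable sort (kind='mergesort') is used because the default quicksort does not guarantee that the placeholder rows, which come first after concat, stay at the top of each group.

```python
import pandas as pd

# Hypothetical input, inferred from the printed output above
df = pd.DataFrame({
    'feat1': list('AAABBBCCC'),
    'feat2': list('xyzxyzxyz'),
    'var': range(9),
})


def insert_placeholder_rows(df, key='feat1', sub='feat2', fill='-'):
    """Insert one row per distinct `key` value ahead of that group's rows.

    `sub` is set to `fill`; every other column becomes NaN.
    (Illustrative helper, not part of pandas.)
    """
    # One row per distinct key, keeping only the two feature columns
    heads = df.drop_duplicates(key)[[key, sub]].assign(**{sub: fill})
    # concat aligns on column names; columns absent from `heads` get NaN
    out = pd.concat([heads, df], sort=False, ignore_index=True)
    # Stable sort keeps the placeholder rows (concatenated first) at the top of each group
    return out.sort_values(key, kind='mergesort')


df2 = insert_placeholder_rows(df)
print(df2)
```

Because the placeholders are built in one vectorized step and merged with a single concat and sort, the cost is dominated by the sort, which should scale reasonably for large frames.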

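The title mentions a MultiIndex, while the answer works on plain columns. If feat1 and feat2 are actually index levels, one option (an assumption, not part of the original answer) is to move the levels into columns with reset_index, apply the same concat and stable sort, and rebuild the index with set_index. The sample data below is hypothetical.

```python
import pandas as pd

# Hypothetical MultiIndex input: (feat1, feat2) as index levels, 'var' as the data column
idx = pd.MultiIndex.from_product([list('ABC'), list('xyz')], names=['feat1', 'feat2'])
df = pd.DataFrame({'var': range(9)}, index=idx)

# Move the index levels into columns so the column-based approach applies
flat = df.reset_index()
heads = flat.drop_duplicates('feat1')[['feat1', 'feat2']].assign(feat2='-')

# Same concat + stable sort as above, then restore the (feat1, feat2) MultiIndex
out = (pd.concat([heads, flat], sort=False, ignore_index=True)
         .sort_values('feat1', kind='mergesort')
         .set_index(['feat1', 'feat2']))
print(out)
```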

Origin http://43.154.161.224:23101/article/api/json?id=27833&siteId=1