Reshape pandas DataFrame with duplicate index and fill missing rows

Timothée HENRY :

I would like to add missing rows corresponding to a given index.

For example, if I have:

import pandas as pd

df = pd.DataFrame({"date": ["1", "2", "1", "3"],
                   "name": ["bob", "bob", "anne", "anne"],
                   "x": [1, 2, 2, 3],
                   "y": [2, 4, 5, 5]})

I would like to obtain the following:

    name    date    x   y
    anne    1       2   5
    anne    2       NA  NA   <- because date 2 is missing for Anne
    anne    3       3   5
    bob     1       1   2
    bob     2       2   4
    bob     3       NA  NA   <- because date 3 is missing for Bob

I have tried several approaches with pivot_table and pivot, but have not been able to figure it out so far. For example:

df.pivot_table(index = ["name", "date"], values = ['x','y'], fill_value=0).reset_index()

does not fill in the missing rows.

jezrael :

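Use DataFrame.set_index with DataFrame.unstack, DataFrame.stack and DataFrame.reset_index. Unstacking moves date into the columns, so every name gets a cell for every date; stacking back with dropna=False keeps the cells that are NaN as rows: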

df = df.set_index(["name", "date"]).unstack().stack(dropna=False).reset_index()
print(df)
   name date    x    y
0  anne    1  2.0  5.0
1  anne    2  NaN  NaN
2  anne    3  3.0  5.0
3   bob    1  1.0  2.0
4   bob    2  2.0  4.0
5   bob    3  NaN  NaN

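Your pivot_table attempt can be completed the same way by appending unstack and stack. Note that fill_value=0 only applies inside pivot_table, so the rows added afterwards still contain NaN: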

df = df.pivot_table(index = ["name", "date"], values = ['x','y'], fill_value=0).unstack().stack(dropna=False).reset_index()
print(df)
   name date    x    y
0  anne    1  2.0  5.0
1  anne    2  NaN  NaN
2  anne    3  3.0  5.0
3   bob    1  1.0  2.0
4   bob    2  2.0  4.0
5   bob    3  NaN  NaN
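
As an alternative that is not part of the answer above, the same result can be sketched with an explicit reindex: build every (name, date) pair with pd.MultiIndex.from_product and reindex onto it. The names full_index and result are only illustrative, and the sketch assumes each (name, date) pair occurs at most once in the data, as in the example. It also avoids stack(dropna=False), whose arguments, as far as I know, have been changing in recent pandas releases.

```python
import pandas as pd

df = pd.DataFrame({"date": ["1", "2", "1", "3"],
                   "name": ["bob", "bob", "anne", "anne"],
                   "x": [1, 2, 2, 3],
                   "y": [2, 4, 5, 5]})

# Every (name, date) combination built from the values present in the data.
full_index = pd.MultiIndex.from_product(
    [sorted(df["name"].unique()), sorted(df["date"].unique())],
    names=["name", "date"],
)

# Reindexing onto the full index adds the missing combinations as NaN rows.
result = df.set_index(["name", "date"]).reindex(full_index).reset_index()
print(result)
```

If zeros are wanted instead of NaN, reindex also accepts fill_value=0.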
