Timothée HENRY :
I would like to add missing rows corresponding to a given index.
For example, if I have:
import pandas as pd

df = pd.DataFrame({"date": ["1", "2", "1", "3"],
                   "name": ["bob", "bob", "anne", "anne"],
                   "x": [1, 2, 2, 3],
                   "y": [2, 4, 5, 5]})
I would like to obtain the following:
name date x y
anne 1 2 5
anne 2 NA NA <- because date 2 is missing for Anne
anne 3 3 5
bob 1 1 2
bob 2 2 4
bob 3 NA NA <- because date 3 is missing for Bob
I have tried numerous things with pivot_table and pivot, but have not been able to figure it out so far. For example,
df.pivot_table(index = ["name", "date"], values = ['x','y'], fill_value=0).reset_index()
does not fill in the missing rows.
jezrael :
Use DataFrame.set_index with DataFrame.unstack, DataFrame.stack and DataFrame.reset_index:
df = df.set_index(["name", "date"]).unstack().stack(dropna=False).reset_index()
print (df)
   name date    x    y
0  anne    1  2.0  5.0
1  anne    2  NaN  NaN
2  anne    3  3.0  5.0
3   bob    1  1.0  2.0
4   bob    2  2.0  4.0
5   bob    3  NaN  NaN
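To see why this fills the gaps, it may help to look at the intermediate wide table that unstack produces. The following is only a sketch on the question's data; the names `wide` and `missing` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"date": ["1", "2", "1", "3"],
                   "name": ["bob", "bob", "anne", "anne"],
                   "x": [1, 2, 2, 3],
                   "y": [2, 4, 5, 5]})

# Move "date" (the last index level) into the columns: one row per name,
# one column per (value column, date) pair. Combinations that are absent
# from the input show up as NaN cells.
wide = df.set_index(["name", "date"]).unstack()
print(wide)

# Each True here marks a (name, date) pair that was missing from the input.
missing = wide["x"].isna()
print(missing)
```

Stacking the date level back into the index while keeping those NaN cells is what turns each missing pair into an explicit row.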
Your pivot_table solution can be extended the same way:
df = df.pivot_table(index = ["name", "date"], values = ['x','y'], fill_value=0).unstack().stack(dropna=False).reset_index()
print (df)
   name date    x    y
0  anne    1  2.0  5.0
1  anne    2  NaN  NaN
2  anne    3  3.0  5.0
3   bob    1  1.0  2.0
4   bob    2  2.0  4.0
5   bob    3  NaN  NaN
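As far as I know, recent pandas releases are phasing out the dropna argument of stack in favor of a new implementation, so an approach that does not depend on it may be worth having. The sketch below is a different technique from the one above: it reindexes against the full product of names and dates. It assumes the wanted rows are exactly all combinations of the values present in the data, and the names `indexed`, `full_index` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"date": ["1", "2", "1", "3"],
                   "name": ["bob", "bob", "anne", "anne"],
                   "x": [1, 2, 2, 3],
                   "y": [2, 4, 5, 5]})

indexed = df.set_index(["name", "date"])

# Build every (name, date) combination from the values seen in the data.
full_index = pd.MultiIndex.from_product(
    [sorted(df["name"].unique()), sorted(df["date"].unique())],
    names=["name", "date"],
)

# Reindexing adds the absent combinations as rows filled with NaN.
result = indexed.reindex(full_index).reset_index()
print(result)
```

This requires the (name, date) pairs in the input to be unique, since reindex cannot align against a duplicated index; with duplicates, aggregating first (for example with pivot_table, as in the question) would be needed.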