drchops :
I have the following dataframe:
ID Days TreatmentGiven TreatmentNumber
--- ---- -------------- ---------------
1 0 False NaN
1 30 False NaN
1 40 True 1
1 56 False NaN
2 0 False NaN
2 14 True 1
2 28 True 2
I'd like to create a new column with a new baseline for Days based on when the first treatment was given (TreatmentNumber==1), grouped by ID so that the result is the following:
ID Days TreatmentGiven TreatmentNumber New_Baseline
--- ---- -------------- --------------- ------------
1 0 False NaN -40
1 30 False NaN -10
1 40 True 1 0
1 56 False NaN 16
2 0 False NaN -14
2 14 True 1 0
2 28 True 2 14
What is the best way to do this?
Thank you.
anky_91 :
Here is one approach with Series.where + groupby + transform:
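# Keep Days only on treatment rows (NaN elsewhere), then broadcast each ID's first non-null value
s = df['Days'].where(df['TreatmentGiven']).groupby(df['ID']).transform('first')
# Subtract the per-ID baseline day from every row
df['New_Baseline'] = df['Days'].sub(s)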
Output
ID Days TreatmentGiven TreatmentNumber New_Baseline
0 1 0 False NaN -40.0
1 1 30 False NaN -10.0
2 1 40 True 1.0 0.0
3 1 56 False NaN 16.0
4 2 0 False NaN -14.0
5 2 14 True 1.0 0.0
6 2 28 True 2.0 14.0