bAN :
I have a DataFrame like this:
id_a | date
12 | 2020-01-01
12 | 2020-01-02
13 | 2020-01-01
13 | 2020-01-03
14 | 2020-01-01
14 | 2020-01-02
14 | 2020-01-06
I would like to compute the difference between the max date and the min date within each id_a group, to get something like:
id_a | date | diff
12 | 2020-01-01 | 1
12 | 2020-01-02 | 1
13 | 2020-01-01 | 2
13 | 2020-01-03 | 2
14 | 2020-01-01 | 5
14 | 2020-01-02 | 5
14 | 2020-01-06 | 5
I'm trying to do this with something like:
df['diff'] = df.groupby('id_a').apply(lambda x: max(x['date']) - min(x['date']))
But I'm struggling a bit. Am I on the right path?
Quang Hoang :
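You want transform instead of apply: apply returns one value per group (indexed by id_a), while transform broadcasts the group result back to every original row. Also, np.ptp (peak to peak, i.e. max minus min) would do: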
import numpy as np
import pandas as pd
# convert to datetime; skip this if the column is already datetime
df['date'] = pd.to_datetime(df['date'])
df['date_diff'] = df.groupby('id_a')['date'].transform(np.ptp)
Output:
id_a date date_diff
0 12 2020-01-01 1 days
1 12 2020-01-02 1 days
2 13 2020-01-01 2 days
3 13 2020-01-03 2 days
4 14 2020-01-01 5 days
5 14 2020-01-02 5 days
6 14 2020-01-06 5 days
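The output above holds timedeltas ("1 days"), while the question asks for plain integers. A minimal sketch of one way to get there, assuming the sample data from the question and computing the range as max minus min with string-named aggregations rather than np.ptp (the column name diff is taken from the question's desired output):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id_a": [12, 12, 13, 13, 14, 14, 14],
    "date": ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03",
             "2020-01-01", "2020-01-02", "2020-01-06"],
})
df["date"] = pd.to_datetime(df["date"])

# transform returns a Series aligned with the original rows,
# so the group-wise max and min can be subtracted directly
dates = df.groupby("id_a")["date"]
span = dates.transform("max") - dates.transform("min")

# .dt.days turns the timedelta into a whole number of days
df["diff"] = span.dt.days
```

This keeps one row per original record, with the same per-group value repeated on each row of the group.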
Update: if you want to get max from date_a and min from date_b:
groups = df.groupby('id_a')
min_dates = groups['date_b'].transform('min')
max_dates = groups['date_a'].transform('max')
df['date_diff'] = max_dates - min_dates
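For a self-contained run of the two-column variant, here is a small sketch. The rows below are made up purely for illustration (the original post does not show data with date_a and date_b columns); only the column names come from the update above.

```python
import pandas as pd

# Hypothetical data: not from the original post
df = pd.DataFrame({
    "id_a": [12, 12, 13, 13],
    "date_a": pd.to_datetime(["2020-01-05", "2020-01-07",
                              "2020-01-04", "2020-01-02"]),
    "date_b": pd.to_datetime(["2020-01-01", "2020-01-03",
                              "2020-01-02", "2020-01-01"]),
})

groups = df.groupby("id_a")
# Earliest date_b and latest date_a per group, broadcast to every row
min_dates = groups["date_b"].transform("min")
max_dates = groups["date_a"].transform("max")
df["date_diff"] = max_dates - min_dates
```

With this data, group 12 spans 2020-01-01 to 2020-01-07 and group 13 spans 2020-01-01 to 2020-01-04.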