Pandas dataframe grouping function to compute date difference

bAN :

I have a DataFrame like this:

id_a | date

12   | 2020-01-01
12   | 2020-01-02
13   | 2020-01-01
13   | 2020-01-03
14   | 2020-01-01
14   | 2020-01-02
14   | 2020-01-06

I would like to compute the difference between the max date and the min date of each group (grouped by id_a), to get something like:

id_a | date       | diff

12   | 2020-01-01 | 1
12   | 2020-01-02 | 1
13   | 2020-01-01 | 2
13   | 2020-01-03 | 2
14   | 2020-01-01 | 5
14   | 2020-01-02 | 5
14   | 2020-01-06 | 5

I'm trying to do it with something like this:

df['diff'] = df.groupby('id_a').apply(lambda x: max(x['date']) - min(x['date']))

But I'm struggling a bit.

Am I on the right path?
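For context on the attempt above: groupby(...).apply with a scalar-returning lambda yields one value per group, indexed by id_a (12, 13, 14), so assigning it straight to df['diff'] aligns on the index and does not line up with the row labels 0 to 6. A minimal sketch, using the sample data from the question, that computes the per-group span and then maps it back onto each row via id_a:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id_a": [12, 12, 13, 13, 14, 14, 14],
    "date": pd.to_datetime([
        "2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03",
        "2020-01-01", "2020-01-02", "2020-01-06",
    ]),
})

g = df.groupby("id_a")["date"]

# One Timedelta per group, indexed by id_a rather than by row
per_group = g.max() - g.min()

# Look up each row's id_a in the per-group result to get a row-aligned column
df["diff"] = df["id_a"].map(per_group)
```

This works, but it takes two steps; transform (as in the answer) does the broadcast back onto the rows in one.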

Quang Hoang :

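You want transform instead of apply: transform returns a result aligned with the original rows, so it can be assigned directly as a new column. np.ptp ("peak to peak", i.e. max minus min) works as the function: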

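 import numpy as np
 import pandas as pd

 # convert to datetime (a no-op if the column already is)
 df['date'] = pd.to_datetime(df['date'])

 # max - min of each id_a group, broadcast back to every row of that group
 df['date_diff'] = df.groupby('id_a')['date'].transform(np.ptp)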

Output:

   id_a       date date_diff
0    12 2020-01-01    1 days
1    12 2020-01-02    1 days
2    13 2020-01-01    2 days
3    13 2020-01-03    2 days
4    14 2020-01-01    5 days
5    14 2020-01-02    5 days
6    14 2020-01-06    5 days
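The result is a timedelta column ("1 days"), whereas the question asked for plain integers. A minimal sketch, assuming whole days are what's wanted, that uses transform("max") and transform("min") as an alternative to np.ptp and then converts with the .dt.days accessor:

```python
import pandas as pd

df = pd.DataFrame({
    "id_a": [12, 12, 13, 13, 14, 14, 14],
    "date": pd.to_datetime([
        "2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03",
        "2020-01-01", "2020-01-02", "2020-01-06",
    ]),
})

g = df.groupby("id_a")["date"]

# Each transform returns one value per row, so the subtraction is row-aligned
span = g.transform("max") - g.transform("min")

# Timedelta -> integer number of days, matching the "diff" column in the question
df["diff"] = span.dt.days
```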

Update: if you want the max from a column date_a and the min from a column date_b:

groups = df.groupby('id_a')
min_dates = groups['date_b'].transform('min')
max_dates = groups['date_a'].transform('max')

df['date_diff'] = max_dates - min_dates
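A self-contained run of the update, as a sketch only: the date_a/date_b values below are hypothetical, invented just to show that both transforms stay aligned with the rows before subtracting.

```python
import pandas as pd

# Hypothetical data: date_a and date_b values are made up for illustration
df = pd.DataFrame({
    "id_a": [12, 12, 13, 13],
    "date_a": pd.to_datetime(["2020-01-02", "2020-01-05", "2020-01-03", "2020-01-04"]),
    "date_b": pd.to_datetime(["2020-01-01", "2020-01-03", "2020-01-02", "2020-01-01"]),
})

groups = df.groupby("id_a")
min_dates = groups["date_b"].transform("min")  # earliest date_b per id_a, per row
max_dates = groups["date_a"].transform("max")  # latest date_a per id_a, per row

df["date_diff"] = max_dates - min_dates
```

Here group 12 spans 2020-01-01 to 2020-01-05 (4 days) and group 13 spans 2020-01-01 to 2020-01-04 (3 days).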
