josepmaria :
I have the first 4 columns, and I want to create the *5th one:
user date visit_num total_visits_user *last_cust_visit*
1 1995-10-01 1 2 1995-10-02
1 1995-10-02 2 2 1995-10-02
2 1995-10-01 1 3 1995-10-03
2 1995-10-02 2 3 1995-10-03
2 1995-10-03 3 3 1995-10-03
3 1995-10-01 1 5 1995-10-05
3 1995-10-02 2 5 1995-10-05
3 1995-10-03 3 5 1995-10-05
3 1995-10-04 4 5 1995-10-05
3 1995-10-05 5 5 1995-10-05
4 1995-10-03 1 2 1995-10-04
4 1995-10-04 2 2 1995-10-04
*last_cust_visit is a new column showing the date of each customer's last visit.
I tried if, elif, else combined with groupby, but unfortunately I could not make it work.
Any help will be highly appreciated. Thanks
Serge Ballesta :
You could group by user to get the max of date, and merge this with the original dataframe:
df['last_cust_visit'] = df.merge(
    df.groupby('user')['date'].max().reset_index(),
    on='user', suffixes=('_', '')
)['date']
It gives the expected result:
user date visit_num total_visits_user last_cust_visit
0 1 1995-10-01 1 2 1995-10-02
1 1 1995-10-02 2 2 1995-10-02
2 2 1995-10-01 1 3 1995-10-03
3 2 1995-10-02 2 3 1995-10-03
4 2 1995-10-03 3 3 1995-10-03
5 3 1995-10-01 1 5 1995-10-05
6 3 1995-10-02 2 5 1995-10-05
7 3 1995-10-03 3 5 1995-10-05
8 3 1995-10-04 4 5 1995-10-05
9 3 1995-10-05 5 5 1995-10-05
10 4 1995-10-03 1 2 1995-10-04
11 4 1995-10-04 2 2 1995-10-04
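
For comparison, here is a minimal self-contained sketch that rebuilds the sample data from the question and computes the column two ways: with the merge shown above, and with groupby(...).transform('max'). The transform variant is not part of the answer above; it is an alternative I am assuming fits here because it broadcasts each user's max date back onto that user's rows and keeps the original index. The sketch also assumes the date column is parsed as datetimes and that df has a default 0..n-1 index (the merge variant relies on that when the result is assigned back).

```python
import pandas as pd

# Sample data rebuilt from the question (first four columns only).
df = pd.DataFrame({
    "user": [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4],
    "date": pd.to_datetime([
        "1995-10-01", "1995-10-02",
        "1995-10-01", "1995-10-02", "1995-10-03",
        "1995-10-01", "1995-10-02", "1995-10-03", "1995-10-04", "1995-10-05",
        "1995-10-03", "1995-10-04",
    ]),
    "visit_num": [1, 2, 1, 2, 3, 1, 2, 3, 4, 5, 1, 2],
    "total_visits_user": [2, 2, 3, 3, 3, 5, 5, 5, 5, 5, 2, 2],
})

# Merge approach from the answer: per-user max date joined back on 'user'.
# The original 'date' becomes 'date_'; the per-user max keeps the name 'date'.
merged = df.merge(
    df.groupby("user")["date"].max().reset_index(),
    on="user", suffixes=("_", ""),
)["date"]

# Alternative (assumption, not from the answer): transform returns a Series
# aligned to df's own index, so no merge or reset_index is needed.
df["last_cust_visit"] = df.groupby("user")["date"].transform("max")

print(df)
```

Both variants should produce the same values on this data; transform may be the safer choice when df does not have a default integer index, since the result aligns on the existing index.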