SomeBruh :
I have a pandas DataFrame with two date columns in datetime format. I want to generate the list of dates in each row's date range as a new column, so I can explode each entry into multiple rows later.
I tried the following list comprehension:
orders_df['list_of_dates'] = [orders_df['start_date'] + timedelta(days=n) for n in range(orders_df['date_difference'])]
But I received the following error:
TypeError: 'Series' object cannot be interpreted as an integer
Any thoughts on the solution would be much appreciated.
jezrael :
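range() needs a single integer, but orders_df['date_difference'] is a whole Series, hence the TypeError. Use a nested list comprehension that walks both columns row by row with zip and calls range on each row's value: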
from datetime import timedelta
import pandas as pd

rng = pd.date_range('2017-04-03', periods=5)
orders_df = pd.DataFrame({'start_date': rng, 'date_difference': 2})
# Pair each start date d with its own day count n, then build that row's list of dates
orders_df['list_of_dates'] = [[d + timedelta(days=x) for x in range(n)]
                              for d, n
                              in zip(orders_df['start_date'],
                                     orders_df['date_difference'])]
print(orders_df)
  start_date  date_difference                               list_of_dates
0 2017-04-03                2  [2017-04-03 00:00:00, 2017-04-04 00:00:00]
1 2017-04-04                2  [2017-04-04 00:00:00, 2017-04-05 00:00:00]
2 2017-04-05                2  [2017-04-05 00:00:00, 2017-04-06 00:00:00]
3 2017-04-06                2  [2017-04-06 00:00:00, 2017-04-07 00:00:00]
4 2017-04-07                2  [2017-04-07 00:00:00, 2017-04-08 00:00:00]
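Since the question mentions exploding the list column afterwards, here is a minimal sketch of that follow-up step. It is not part of the original answer: it swaps timedelta for pd.date_range to build each list, and it assumes a pandas version recent enough for DataFrame.explode with ignore_index (1.1 or later). The name exploded is only illustrative.

```python
import pandas as pd

rng = pd.date_range('2017-04-03', periods=5)
orders_df = pd.DataFrame({'start_date': rng, 'date_difference': 2})

# One list per row: n consecutive daily dates starting at d
orders_df['list_of_dates'] = [
    list(pd.date_range(d, periods=int(n)))
    for d, n in zip(orders_df['start_date'], orders_df['date_difference'])
]

# Turn each list element into its own row (assumes pandas >= 1.1 for ignore_index)
exploded = orders_df.explode('list_of_dates', ignore_index=True)

# explode leaves an object-dtype column, so convert it back to datetime64
exploded['list_of_dates'] = pd.to_datetime(exploded['list_of_dates'])
print(exploded)
```

Each original row contributes date_difference rows, so the five input rows become ten.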
If you need one row per date instead of a list column, repeat each row with Index.repeat, build a per-row counter with GroupBy.cumcount, and convert that counter to timedeltas with to_timedelta:
df = orders_df.loc[orders_df.index.repeat(orders_df['date_difference'])]
g = df.groupby(level=0).cumcount()
df['new'] = df['start_date'] + pd.to_timedelta(g, unit='d')
df = df.reset_index(drop=True)
print (df)
  start_date  date_difference        new
0 2017-04-03                2 2017-04-03
1 2017-04-03                2 2017-04-04
2 2017-04-04                2 2017-04-04
3 2017-04-04                2 2017-04-05
4 2017-04-05                2 2017-04-05
5 2017-04-05                2 2017-04-06
6 2017-04-06                2 2017-04-06
7 2017-04-06                2 2017-04-07
8 2017-04-07                2 2017-04-07
9 2017-04-07                2 2017-04-08
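One caveat worth checking on real data: Index.repeat with a count of 0 repeats a row zero times, so rows whose date_difference is 0 vanish from the result. The sketch below illustrates this with a made-up two-row frame (the values are hypothetical, not from the question) and uses assign rather than in-place column assignment.

```python
import pandas as pd

# Hypothetical data: the first row has a zero-length range
orders_df = pd.DataFrame({
    'start_date': pd.to_datetime(['2017-04-03', '2017-04-04']),
    'date_difference': [0, 2],
})

# Repeat each row date_difference times; a count of 0 drops the row entirely
df = orders_df.loc[orders_df.index.repeat(orders_df['date_difference'])]

# Per-original-row counter 0, 1, ... converted to a day offset
offset = pd.to_timedelta(df.groupby(level=0).cumcount(), unit='d')
df = df.assign(new=df['start_date'] + offset).reset_index(drop=True)
print(df)
```

If such rows must be kept, they would need separate handling, for example by filtering them out beforehand and appending them back afterwards.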