I have a start date ('2019-11-18') and an end date ('2021-02-19'). I am trying to create a list of all the weeks of each month that fall between the start and end dates. My expected result is something like this:
list = ['2019.Nov.3','2019.Nov.4', '2019.Nov.5' .... '2021.Feb.2','2021.Feb.3']
If the first or last day of a month lands on a Wednesday, I will assume that the week belongs to that month (as 3 out of the 5 working days will belong to it).
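To make that rule concrete, here is a minimal sketch using only the standard library. The helper name month_of_week is just an illustration, and it assumes weeks run Monday to Sunday: it finds the Wednesday of the week containing a given day and reports that Wednesday's year and month.

```python
from datetime import date, timedelta

def month_of_week(day):
    """Return (year, month) of the Wednesday in the Monday-to-Sunday week containing `day`."""
    # weekday() is 0 for Monday, so Wednesday is always at offset 2 within the week
    wednesday = day + timedelta(days=2 - day.weekday())
    return wednesday.year, wednesday.month

# Monday 2019-12-30: the Wednesday of that week is 2020-01-01
print(month_of_week(date(2019, 12, 30)))  # (2020, 1)
# Monday 2019-11-18: the Wednesday of that week is 2019-11-20
print(month_of_week(date(2019, 11, 18)))  # (2019, 11)
```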
I was actually successful in creating a dataframe with all the weeks of the year that exist between the start and end date using the following code:
import pandas as pd
from datetime import date, datetime

date_1 = '18-11-19'
first_date = datetime.strptime(date_1, '%d-%m-%y')
date_2 = '19-02-21'
last_date = datetime.strptime(date_2, '%d-%m-%y')
timeline = pd.DataFrame(columns=['Year', 'Week'])

def create_list(df):
    start_year, start_week = first_date.isocalendar()[:2]
    end_year, end_week = last_date.isocalendar()[:2]
    rows = []
    while start_year <= end_year:
        if start_year == end_year:
            last_week = end_week
        else:
            # December 28 always falls in the last ISO week of its year (52 or 53)
            last_week = date(start_year, 12, 28).isocalendar()[1]
        while start_week <= last_week:
            # zero-pad the week number, e.g. '2020.01'
            rows.append({'Year': start_year, 'Week': f'{start_year}.{start_week:02d}'})
            start_week += 1
        start_year += 1
        start_week = 1
    # DataFrame.append was removed in pandas 2.0, so collect the rows and concatenate once
    return pd.concat([df, pd.DataFrame(rows)], ignore_index=True)

timeline = create_list(timeline)
I was able to use this as the x-axis for my line graph. However, the axis is a bit hard to read, and it's very difficult to tell which week belongs to which month.
I would really appreciate it if someone could give me a hand with this!
Edit:
So here is the solution, with the guidance of @Serge Ballesta. I hope it helps anyone who might need something similar in the future!
import pandas as pd
from dateutil.relativedelta import relativedelta
from datetime import datetime

def year_week(date):
    # ISO year and zero-padded ISO week number, e.g. '2021.03'
    iso_year, iso_week = date.isocalendar()[:2]
    return f'{iso_year}.{iso_week:02d}'

date_1 = '18-11-19'
first_date = datetime.strptime(date_1, '%d-%m-%y')
date_2 = '19-02-21'
last_date = datetime.strptime(date_2, '%d-%m-%y')
# start one month early so the week count of the first month begins at its first Wednesday
set_first_date = str((first_date - relativedelta(months=1)).date())
set_last_date = str((last_date + relativedelta(months=1)).date())
s = pd.date_range(set_first_date, set_last_date, freq='W-WED'
                  ).to_series(name='wed').reset_index(drop=True)
df = s.to_frame()
df['week'] = df['wed'].apply(year_week)
df = df.assign(week_of_month=s.groupby(s.dt.strftime('%Y%m')
                                       ).cumcount() + 1)
# keep only the Wednesdays inside the requested range
df = df[(s >= pd.Timestamp(first_date))
        & (s <= pd.Timestamp(last_date))].copy()
df['month_week'] = df['wed'].dt.strftime('%Y.%b.') + df['week_of_month'].astype(str)
df = df.drop(['wed', 'week_of_month'], axis=1)
print(df)
Printed df:
       week  month_week
4   2019.47  2019.Nov.3
5   2019.48  2019.Nov.4
6   2019.49  2019.Dec.1
7   2019.50  2019.Dec.2
8   2019.51  2019.Dec.3
..      ...         ...
65  2021.03  2021.Jan.3
66  2021.04  2021.Jan.4
67  2021.05  2021.Feb.1
68  2021.06  2021.Feb.2
69  2021.07  2021.Feb.3
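One thing to keep in mind about the week column: it is built from the ISO calendar, so around New Year the ISO year can differ from the calendar year, and some years have 53 weeks. A small standalone sketch (standard library only; this year_week is a stand-in with the same intent as the function above) shows both cases:

```python
from datetime import date

def year_week(day):
    """Format a date as '<ISO year>.<zero-padded ISO week>'."""
    iso_year, iso_week = day.isocalendar()[:2]
    return f"{iso_year}.{iso_week:02d}"

# Wednesday 2020-12-30 falls in ISO week 53 of 2020
print(year_week(date(2020, 12, 30)))  # 2020.53
# Wednesday 2025-12-31 already belongs to ISO week 1 of 2026
print(year_week(date(2025, 12, 31)))  # 2026.01
```

In the second case the month_week label would still be based on the Wednesday's own month (December 2025), so the two columns can disagree on the year for that one row.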
I would build a Series of timestamps with a frequency of W-WED so that every entry is consistently a Wednesday. That way, we immediately get the correct month for the week.
To get the number of the week in the month, I would start one month before the required start and use cumcount() + 1 on the values grouped by year-month. Then it is enough to filter the expected range and format the values properly:
import pandas as pd

# produce a series of Wednesdays starting one month early (from 2019-10-01)
s = pd.date_range('2019-10-01', '2021-03-31', freq='W-WED'
                  ).to_series(name='wed').reset_index(drop=True)
# compute the week number in the month
df = s.to_frame().assign(week_of_month=s.groupby(s.dt.strftime('%Y%m')
                                                 ).cumcount() + 1)
# filter the required range
df = df[(s >= pd.Timestamp('2019-11-18'))
        & (s <= pd.Timestamp('2021-02-19'))]
# here is the expected list
lst = (df['wed'].dt.strftime('%Y.%b.') + df['week_of_month'].astype(str)).tolist()
lst is as expected:
['2019.Nov.3', '2019.Nov.4', '2019.Dec.1', '2019.Dec.2', '2019.Dec.3', '2019.Dec.4',
'2020.Jan.1', '2020.Jan.2', '2020.Jan.3', '2020.Jan.4', '2020.Jan.5', '2020.Feb.1',
'2020.Feb.2', '2020.Feb.3', '2020.Feb.4', '2020.Mar.1', '2020.Mar.2', '2020.Mar.3',
'2020.Mar.4', '2020.Apr.1', '2020.Apr.2', '2020.Apr.3', '2020.Apr.4', '2020.Apr.5',
'2020.May.1', '2020.May.2', '2020.May.3', '2020.May.4', '2020.Jun.1', '2020.Jun.2',
'2020.Jun.3', '2020.Jun.4', '2020.Jul.1', '2020.Jul.2', '2020.Jul.3', '2020.Jul.4',
'2020.Jul.5', '2020.Aug.1', '2020.Aug.2', '2020.Aug.3', '2020.Aug.4', '2020.Sep.1',
'2020.Sep.2', '2020.Sep.3', '2020.Sep.4', '2020.Sep.5', '2020.Oct.1', '2020.Oct.2',
'2020.Oct.3', '2020.Oct.4', '2020.Nov.1', '2020.Nov.2', '2020.Nov.3', '2020.Nov.4',
'2020.Dec.1', '2020.Dec.2', '2020.Dec.3', '2020.Dec.4', '2020.Dec.5', '2021.Jan.1',
'2021.Jan.2', '2021.Jan.3', '2021.Jan.4', '2021.Feb.1', '2021.Feb.2', '2021.Feb.3']
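For comparison, the same list can be sketched without pandas. This is not the approach above but an alternative under one observation: the n-th Wednesday of a month always has a day number between 7*(n-1)+1 and 7*n, so the week of the month is (day - 1) // 7 + 1 and no extra leading month is needed. The function name month_week_labels is hypothetical, and the month abbreviation from strftime('%b') assumes an English (or default C) locale.

```python
from datetime import date, timedelta

def month_week_labels(start, end):
    """List 'YYYY.Mon.N' labels for every Wednesday between start and end (inclusive)."""
    # move forward to the first Wednesday on or after start (Wednesday is weekday 2)
    wednesday = start + timedelta(days=(2 - start.weekday()) % 7)
    labels = []
    while wednesday <= end:
        # position of this Wednesday among the Wednesdays of its month
        week_of_month = (wednesday.day - 1) // 7 + 1
        labels.append(f"{wednesday.year}.{wednesday.strftime('%b')}.{week_of_month}")
        wednesday += timedelta(days=7)
    return labels

labels = month_week_labels(date(2019, 11, 18), date(2021, 2, 19))
print(labels[:3])   # ['2019.Nov.3', '2019.Nov.4', '2019.Dec.1']
print(labels[-1])   # 2021.Feb.3
```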