Python - Creating list of week numbers of months using dates

dodoelhos :

I have got a start date ('2019-11-18') and an end date ('2021-02-19'). I am trying to create a list of all the weeks of each month that exist between the start and end date. My expected result should be something like this:

list = ['2019.Nov.3','2019.Nov.4', '2019.Nov.5' .... '2021.Feb.2','2021.Feb.3']

If the first or last date of a month lands on a Wednesday, I will assume that the week belongs to this month (as 3 out of the 5 working days will belong to this month).
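To make the Wednesday rule concrete, here is a minimal sketch using only the standard library. The helper name `month_of_week` is made up for illustration, and it assumes weeks run Monday to Sunday; it finds the Wednesday of the week containing a date and reports that Wednesday's year and month.

```python
from datetime import date, timedelta


def month_of_week(day):
    """Return (year, month) of the Wednesday in the Monday-Sunday week containing `day`."""
    # weekday() is 0 for Monday, so Wednesday is 2
    wednesday = day + timedelta(days=2 - day.weekday())
    return wednesday.year, wednesday.month


# Thursday 2020-10-01 shares its week with Wednesday 2020-09-30
print(month_of_week(date(2020, 10, 1)))  # (2020, 9)
# Monday 2020-10-05 starts the week whose Wednesday is 2020-10-07
print(month_of_week(date(2020, 10, 5)))  # (2020, 10)
```

So a week that straddles two months is credited to whichever month holds its Wednesday.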

I was actually successful in creating a dataframe with all the weeks of the year that exist between the start and end date using the following code:

from datetime import datetime

import pandas as pd

date_1 = '18-11-19'
first_date = datetime.strptime(date_1, '%d-%m-%y')

date_2 = '19-02-21'
last_date = datetime.strptime(date_2, '%d-%m-%y')

timeline = pd.DataFrame(columns=['Year', 'Week'])


def create_list(df):
    start_year, start_week = first_date.isocalendar()[:2]
    end_year, end_week = last_date.isocalendar()[:2]
    rows = []

    while start_year <= end_year:
        if start_year == end_year:
            final_week = end_week
        else:
            # December 28 always falls in the last ISO week of its year (52 or 53)
            final_week = datetime(start_year, 12, 28).isocalendar()[1]

        while start_week <= final_week:
            # zero-pad the week number, e.g. '2020.05'
            rows.append({'Year': start_year, 'Week': f'{start_year}.{start_week:02d}'})
            start_week += 1

        start_year += 1
        start_week = 1

    # DataFrame.append was removed in pandas 2.0; build the frame from the collected rows
    return pd.DataFrame(rows, columns=df.columns)


timeline = create_list(timeline)

I was able to use this as the x axis for my line graph. However, the axis is a bit hard to read, and it's very difficult to tell which week belongs to which month.

I would really appreciate it if someone could give me a hand with this!

Edit:

So here is the solution, with the guidance of @Serge Ballesta. I hope it helps anyone who might need something similar in the future!

import pandas as pd
import dateutil.relativedelta
from datetime import datetime


def year_week(date):
    # ISO year and zero-padded ISO week number, e.g. '2019.47'
    iso_year, iso_week = date.isocalendar()[:2]
    return f'{iso_year}.{iso_week:02d}'


date_1 = '18-11-19'
first_date = datetime.strptime(date_1, '%d-%m-%y')

date_2 = '19-02-21'
last_date = datetime.strptime(date_2, '%d-%m-%y')

# pad the range by one month on each side so that every Wednesday of the first month is counted
set_first_date = str((first_date - dateutil.relativedelta.relativedelta(months=1)).date())
set_last_date = str((last_date + dateutil.relativedelta.relativedelta(months=1)).date())

s = pd.date_range(set_first_date, set_last_date, freq='W-WED'
                  ).to_series(name='wed').reset_index(drop=True)

df = s.to_frame()

df['week'] = df['wed'].apply(year_week)

df = df.assign(week_of_month=s.groupby(s.dt.strftime('%Y%m')
                                       ).cumcount() + 1)

# keep only the requested range; .copy() avoids a SettingWithCopyWarning on the next assignment
df = df[(s >= pd.Timestamp(first_date))
        & (s <= pd.Timestamp(last_date))].copy()

df['month_week'] = df['wed'].dt.strftime('%Y.%b.') + df['week_of_month'].astype(str)

df = df.drop(['wed', 'week_of_month'], axis=1)

print(df)

Printed df:

       week  month_week
4   2019.47  2019.Nov.3
5   2019.48  2019.Nov.4
6   2019.49  2019.Dec.1
7   2019.50  2019.Dec.2
8   2019.51  2019.Dec.3
..      ...         ...
65  2021.03  2021.Jan.3
66  2021.04  2021.Jan.4
67  2021.05  2021.Feb.1
68  2021.06  2021.Feb.2
69  2021.07  2021.Feb.3

Serge Ballesta :

I would build a Series of timestamps with a frequency of W-WED, so that every entry falls on a Wednesday. Since the Wednesday decides which month a week belongs to, each timestamp immediately gives the correct month for its week.

To get the number of the week within the month, I would start one month before the required start and number the Wednesdays within each year-month group using cumcount, adding 1 because cumcount starts at 0. Then it is enough to filter down to the expected range and format the values:

import pandas as pd

# produce a series of Wednesdays, starting one month early (from 2019-10-01)
s = pd.date_range('2019-10-01', '2021-03-31', freq='W-WED'
                  ).to_series(name='wed').reset_index(drop=True)

# compute the week number in the month
df = s.to_frame().assign(week_of_month=s.groupby(s.dt.strftime('%Y%m')
                                                 ).cumcount() + 1)

# filter the required range
df = df[(s >= pd.Timestamp('2019-11-18'))
      & (s <= pd.Timestamp('2021-02-19'))]

# here is the expected list
lst = (df['wed'].dt.strftime('%Y.%b.')+df['week_of_month'].astype(str)).tolist()

lst is as expected:

['2019.Nov.3', '2019.Nov.4', '2019.Dec.1', '2019.Dec.2', '2019.Dec.3', '2019.Dec.4', 
'2020.Jan.1', '2020.Jan.2', '2020.Jan.3', '2020.Jan.4', '2020.Jan.5', '2020.Feb.1',
'2020.Feb.2', '2020.Feb.3', '2020.Feb.4', '2020.Mar.1', '2020.Mar.2', '2020.Mar.3',
'2020.Mar.4', '2020.Apr.1', '2020.Apr.2', '2020.Apr.3', '2020.Apr.4', '2020.Apr.5',
'2020.May.1', '2020.May.2', '2020.May.3', '2020.May.4', '2020.Jun.1', '2020.Jun.2',
'2020.Jun.3', '2020.Jun.4', '2020.Jul.1', '2020.Jul.2', '2020.Jul.3', '2020.Jul.4',
'2020.Jul.5', '2020.Aug.1', '2020.Aug.2', '2020.Aug.3', '2020.Aug.4', '2020.Sep.1',
'2020.Sep.2', '2020.Sep.3', '2020.Sep.4', '2020.Sep.5', '2020.Oct.1', '2020.Oct.2',
'2020.Oct.3', '2020.Oct.4', '2020.Nov.1', '2020.Nov.2', '2020.Nov.3', '2020.Nov.4',
'2020.Dec.1', '2020.Dec.2', '2020.Dec.3', '2020.Dec.4', '2020.Dec.5', '2021.Jan.1',
'2021.Jan.2', '2021.Jan.3', '2021.Jan.4', '2021.Feb.1', '2021.Feb.2', '2021.Feb.3']
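
For anyone who wants the same labels without pandas, the idea can be sketched with only the standard library: step through the Wednesdays in the range and number each one by its position within its month. This is a variant of the approach above, not the code from the answer. The function name `month_week_labels` is hypothetical, and the month abbreviations come from `calendar.month_abbr`, which assumes an English (or default C) locale.

```python
import calendar
from datetime import date, timedelta


def month_week_labels(start, end):
    """List 'YYYY.Mon.N' labels for every Wednesday between start and end (inclusive)."""
    # move forward to the first Wednesday on or after start (weekday() == 2)
    current = start + timedelta(days=(2 - start.weekday()) % 7)
    labels = []
    while current <= end:
        # the n-th Wednesday of a month always falls on days 7*(n-1)+1 .. 7*n
        week_of_month = (current.day - 1) // 7 + 1
        month_name = calendar.month_abbr[current.month]
        labels.append(f'{current.year}.{month_name}.{week_of_month}')
        current += timedelta(days=7)
    return labels


lst = month_week_labels(date(2019, 11, 18), date(2021, 2, 19))
print(lst[:3])  # ['2019.Nov.3', '2019.Nov.4', '2019.Dec.1']
print(lst[-1])  # 2021.Feb.3
```

Because the week number is derived from the day of the month, this version does not need to start one month early the way the cumcount approach does.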
