Tool series: TimeGPT_(3) Handling holidays and special dates

Calendar variables and special dates are among the most common exogenous variables used in forecasting applications. They provide additional context about the current state of the time series, especially for window-based models such as TimeGPT-1. These variables typically encode the month, week, day, or hour of each observation. For example, in high-frequency hourly data, the current month of the year carries more information than the limited history available in the input window, and providing it can improve forecasting results.
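
To make the idea concrete, here is a minimal sketch of what calendar variables look like when derived by hand with pandas. It is only an illustration and not how TimeGPT builds them internally; the timestamps and the column name ds are made up for the example.

```python
import pandas as pd

# Hypothetical hourly-style timestamps, chosen only for illustration
ts = pd.to_datetime([
    "2023-09-25 08:00",
    "2023-09-25 20:00",
    "2023-09-30 08:00",
    "2023-10-01 08:00",
])

# Derive calendar variables directly from the timestamps
features = pd.DataFrame({
    "ds": ts,
    "month": ts.month,      # 1 = January ... 12 = December
    "weekday": ts.weekday,  # 0 = Monday ... 6 = Sunday
    "hour": ts.hour,        # 0 ... 23
})

print(features)
```

Each row now carries context (month, weekday, hour) that a short input window alone would not reveal.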

In this tutorial, we will show how to use the date_features parameter to automatically add calendar variables to a dataset.

from nixtlats.utils import colab_badge


colab_badge('docs/tutorials/2_holidays')
from fastcore.test import test_eq, test_fail, test_warns
# Import load_dotenv, which loads environment variables from a .env file
from dotenv import load_dotenv
load_dotenv()
True

import pandas as pd
from nixtlats import TimeGPT

# Create a TimeGPT object with an explicit token
timegpt = TimeGPT(token='my_token_provided_by_nixtla')
# Alternatively, omit the token; it then defaults to the TIMEGPT_TOKEN environment variable
timegpt = TimeGPT()

Given how widely calendar variables are used, the forecasting method includes the automatic creation of common calendar variables as a preprocessing step. To add calendar variables automatically, use the date_features parameter.

# Read the CSV file from the given URL into a DataFrame named pltr_df
pltr_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/openbb/pltr.csv')

# Forecast with timegpt.forecast and store the result in fcst_pltr_calendar_df
# Parameters:
# - df: input DataFrame; here the last 28 rows of pltr_df
# - h: forecast horizon; here 14 steps ahead
# - freq: frequency of the series; here 'B' (business day)
# - time_col: name of the time column; here 'date'
# - target_col: name of the target column; here 'Close'
# - date_features: date features to add; here 'month' and 'weekday'
fcst_pltr_calendar_df = timegpt.forecast(
    df=pltr_df.tail(2 * 14), h=14, freq='B',
    time_col='date', target_col='Close',
    date_features=['month','weekday']
)

# Show the first rows of the forecast
fcst_pltr_calendar_df.head()
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
WARNING:nixtlats.timegpt:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
date TimeGPT
0 2023-09-25 14.677374
1 2023-09-26 14.825757
2 2023-09-27 15.126798
3 2023-09-28 14.398899
4 2023-09-29 14.387407
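# Plot the history and the forecast with timegpt.plot, passing:
# - pltr_df: DataFrame with the historical data
# - fcst_pltr_calendar_df: DataFrame with the forecasts
# - id_col: name of the column that identifies each series
# - time_col: name of the time column
# - target_col: name of the target column
# - max_insample_length: maximum number of historical observations to show in the plot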
timegpt.plot(
    pltr_df, 
    fcst_pltr_calendar_df, 
    id_col='series_id',
    time_col='date',
    target_col='Close',
    max_insample_length=90,
)

We can also plot the importance of each date feature.

timegpt.weights_x.plot.barh(x='features', y='weights', figsize=(10, 10))
<Axes: ylabel='features'>

You can also add national holidays using the CountryHolidays class.

# Import the CountryHolidays class from nixtlats.date_features
from nixtlats.date_features import CountryHolidays

# Forecast with timegpt.forecast and store the result in fcst_pltr_calendar_df.
# df is the full pltr_df, h=14 steps, freq='B' (business day), time_col='date', target_col='Close';
# date_features uses CountryHolidays to add United States holidays.
fcst_pltr_calendar_df = timegpt.forecast(
    df=pltr_df, h=14, freq='B',
    time_col='date', target_col='Close',
    date_features=[CountryHolidays(['US'])]
)

# Plot timegpt.weights_x as a horizontal bar chart: one bar per entry in 'features', bar length from 'weights', figure size (10, 10)
timegpt.weights_x.plot.barh(x='features', y='weights', figsize=(10, 10))
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
WARNING:nixtlats.timegpt:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtlats.timegpt:Calling Forecast Endpoint...





<Axes: ylabel='features'>
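
For intuition about what a holiday feature contains, the sketch below builds a 0/1 holiday indicator using the US federal holiday calendar that ships with pandas. This is a stand-in chosen for the example, not the implementation behind CountryHolidays, and the date range and the is_us_holiday column name are assumptions.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Business days in the week of US Thanksgiving 2023 (illustrative range)
dates = pd.bdate_range("2023-11-20", "2023-11-24")

# US federal holidays that fall inside that range
us_holidays = USFederalHolidayCalendar().holidays(
    start=dates.min(), end=dates.max()
)

# 1 if the date is a federal holiday, 0 otherwise
holiday_df = pd.DataFrame({
    "date": dates,
    "is_us_holiday": dates.isin(us_holidays).astype(int),
})

print(holiday_df)
```

Only Thursday, 2023-11-23 (Thanksgiving) is flagged in this range. A column like this is the kind of exogenous signal a holiday feature supplies to the model.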

The following is a detailed description of the date_features parameters:

  • date_features (bool, list of str, or callable): This parameter specifies which date attributes to consider.

    • If set to True, the model automatically adds the most common date features associated with the frequency of the given data frame (df). For daily frequencies, this might include features such as day of the week, month, and year.
    • If a list of strings is provided, only those specific date attributes are considered. For example, with date_features=['weekday', 'month'], only the day of the week and the month are added as features.
    • If a callable is provided, it should be a function that takes dates as input and returns the desired date features. This allows flexibility in computing custom date features.
  • date_features_to_one_hot (bool or list of str): Once the date features have been determined, you may want to one-hot encode them, especially if they are categorical (such as day of the week). One-hot encoding converts these categorical features into a binary matrix, making them more suitable for many machine learning algorithms.

    • If date_features=True, then by default all calculated date features will be one-hot encoded.
    • If a list of strings is provided, only those specific date features will be one-hot encoded.
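
The two options above can be illustrated with plain pandas. The sketch assumes that a custom callable receives a pandas DatetimeIndex and returns one value per date, which should be checked against the library documentation; is_month_start is a hypothetical feature defined only for this example, and pd.get_dummies stands in for whatever encoding the service applies.

```python
import pandas as pd


def is_month_start(dates: pd.DatetimeIndex):
    """Hypothetical custom date feature: 1 on the first day of a month, else 0."""
    return dates.is_month_start.astype(int)


# Four consecutive days that cross a month boundary (illustrative)
dates = pd.date_range("2023-09-29", periods=4, freq="D")

features = pd.DataFrame({"date": dates, "weekday": dates.weekday})
features["is_month_start"] = is_month_start(dates)

# One-hot encode only the categorical 'weekday' column;
# the binary 'is_month_start' column is left as it is
one_hot = pd.get_dummies(
    features, columns=["weekday"], prefix="weekday", dtype=int
)

print(one_hot)
```

Encoding weekday as separate 0/1 columns avoids implying that Sunday (6) is "larger" than Monday (0), which is why categorical date features are often one-hot encoded.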

By using the date_features and date_features_to_one_hot parameters, the temporal effects of date attributes can be incorporated into the forecasting model, which can improve its accuracy and interpretability.
