Tool series: TimeGPT_(2) Time series forecasting using exogenous variables

TimeGPT uses exogenous variables for time series forecasting

Exogenous variables matter in time series forecasting because they supply additional information that may affect the forecast. They can include holiday markers, marketing spend, weather data, or any other external data relevant to the series you are forecasting.

For example, if you are forecasting ice cream sales, temperature data can serve as a useful exogenous variable. Ice cream sales may increase during hot weather.

To include exogenous variables in TimeGPT, you need to pair each point in the time series data with the corresponding external data.
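
As a minimal sketch of this pairing, the snippet below joins a made-up daily ice cream sales series with a made-up temperature series on their shared timestamp, using only pandas. The column names (`ds`, `y`, `temperature`) and all values are illustrative assumptions, not data from TimeGPT or Nixtla.

```python
import pandas as pd

# Hypothetical daily ice cream sales (the target series); values are made up.
sales = pd.DataFrame({
    "unique_id": ["shop_1"] * 4,
    "ds": pd.date_range("2024-07-01", periods=4, freq="D"),
    "y": [120.0, 135.0, 160.0, 150.0],
})

# Hypothetical exogenous variable: daily mean temperature in Celsius.
temperature = pd.DataFrame({
    "ds": pd.date_range("2024-07-01", periods=4, freq="D"),
    "temperature": [24.5, 26.0, 30.1, 28.7],
})

# Pair every observation with the temperature recorded at the same timestamp.
# validate="many_to_one" raises if a timestamp appears twice in `temperature`.
paired = sales.merge(temperature, on="ds", how="left", validate="many_to_one")

# Count target rows that did not find a matching exogenous value.
missing = int(paired["temperature"].isna().sum())
print(paired)
print("rows without exogenous data:", missing)
```

A left join keeps every target row, so a gap in the external data shows up as a missing value instead of silently dropping observations.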

Import the required packages


# Importing the colab_badge module from the nixtlats.utils package
from nixtlats.utils import colab_badge

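# Import load_dotenv, which loads environment variables from a .env file
from dotenv import load_dotenv
# Call load_dotenv to load the variables (for example TIMEGPT_TOKEN) into the environment
load_dotenv()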
True

import pandas as pd
from nixtlats import TimeGPT



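# Create a TimeGPT client, passing the token used for authentication.
# If no token is given, it defaults to os.environ.get("TIMEGPT_TOKEN").
timegpt = TimeGPT(
    token = 'my_token_provided_by_nixtla'
)

# Alternatively, create the client without arguments so the token is read
# from the TIMEGPT_TOKEN environment variable.
timegpt = TimeGPT()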

Case study: forecasting day-ahead electricity prices in European and US markets

Let's look at an example of predicting electricity prices for the next day. The following dataset contains hourly electricity prices (the y column) for five markets in Europe and the United States, identified by the unique_id column. The columns from Exogenous1 to day_6 are the exogenous variables that TimeGPT uses to predict prices.

# Read the CSV file from the given URL into the DataFrame df
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')

# Show the first rows of df
df.head()
   unique_id                   ds      y  Exogenous1  Exogenous2  day_0  day_1  day_2  day_3  day_4  day_5  day_6
0         BE  2016-12-01 00:00:00  72.00     61507.0     71066.0    0.0    0.0    0.0    1.0    0.0    0.0    0.0
1         BE  2016-12-01 01:00:00  65.80     59528.0     67311.0    0.0    0.0    0.0    1.0    0.0    0.0    0.0
2         BE  2016-12-01 02:00:00  59.99     58812.0     67470.0    0.0    0.0    0.0    1.0    0.0    0.0    0.0
3         BE  2016-12-01 03:00:00  50.69     57676.0     64529.0    0.0    0.0    0.0    1.0    0.0    0.0    0.0
4         BE  2016-12-01 04:00:00  52.58     56804.0     62773.0    0.0    0.0    0.0    1.0    0.0    0.0    0.0
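
The day_0 to day_6 columns look like one-hot day-of-week markers: the rows for 2016-12-01, a Thursday, have day_3 set to 1, which matches the pandas convention of Monday = 0. The dataset itself does not document how these columns were built, so the sketch below only shows how such markers could be derived from the timestamps under that assumption.

```python
import pandas as pd

# Four hourly timestamps that cross midnight from Thursday 2016-12-01 into Friday.
frame = pd.DataFrame({
    "ds": pd.date_range("2016-12-01 22:00", periods=4, freq=pd.Timedelta(hours=1)),
})

# dayofweek follows the pandas convention: Monday=0 ... Sunday=6.
dow = frame["ds"].dt.dayofweek

# One indicator column per weekday, so all seven columns exist
# even when the sample does not cover every weekday.
for d in range(7):
    frame[f"day_{d}"] = (dow == d).astype(float)

print(frame)
```

Exactly one of the seven columns is 1.0 in each row, and the active column switches from day_3 to day_4 at midnight.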

To generate a forecast, we also need the future values of the exogenous variables. Let's read that dataset. Here we want to predict 24 steps into the future, so it contains 24 observations for each unique_id.

# Read the future exogenous variables for the electricity dataset from GitHub
future_ex_vars_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-future-ex-vars.csv')

# Show the first five rows
future_ex_vars_df.head()
   unique_id                   ds  Exogenous1  Exogenous2  day_0  day_1  day_2  day_3  day_4  day_5  day_6
0         BE  2016-12-31 00:00:00     64108.0     70318.0    0.0    0.0    0.0    0.0    0.0    1.0    0.0
1         BE  2016-12-31 01:00:00     62492.0     67898.0    0.0    0.0    0.0    0.0    0.0    1.0    0.0
2         BE  2016-12-31 02:00:00     61571.0     68379.0    0.0    0.0    0.0    0.0    0.0    1.0    0.0
3         BE  2016-12-31 03:00:00     60381.0     64972.0    0.0    0.0    0.0    0.0    0.0    1.0    0.0
4         BE  2016-12-31 04:00:00     60298.0     62900.0    0.0    0.0    0.0    0.0    0.0    1.0    0.0
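
Before forecasting, it can help to confirm that the future frame really covers the horizon. The sketch below uses a hypothetical helper, `check_future_frame` (not part of nixtlats), and small synthetic frames to check that each series has exactly h future rows and that they start one step after the last historical timestamp. It assumes the `unique_id` and `ds` column names used in this tutorial, timestamps already parsed as datetimes, and a fixed hourly step.

```python
import pandas as pd

HOUR = pd.Timedelta(hours=1)

def check_future_frame(df, future_df, h, step=HOUR):
    """Return a list of problems; an empty list means the frames line up."""
    problems = []
    last_seen = df.groupby("unique_id")["ds"].max()
    for uid, last_ts in last_seen.items():
        fut = future_df.loc[future_df["unique_id"] == uid, "ds"].sort_values()
        if len(fut) != h:
            problems.append(f"{uid}: expected {h} future rows, found {len(fut)}")
            continue
        # The future rows should continue the history without gaps.
        expected = pd.date_range(last_ts + step, periods=h, freq=step)
        if not (fut.to_numpy() == expected.to_numpy()).all():
            problems.append(f"{uid}: future timestamps do not continue the history")
    return problems

# Synthetic stand-ins for the tutorial's frames: two series, 48 hourly rows each.
hist_ds = pd.date_range("2016-12-29 00:00", periods=48, freq=HOUR)
df_demo = pd.DataFrame({"unique_id": ["BE"] * 48 + ["FR"] * 48, "ds": list(hist_ds) * 2})

fut_ds = pd.date_range("2016-12-31 00:00", periods=24, freq=HOUR)
future_demo = pd.DataFrame({"unique_id": ["BE"] * 24 + ["FR"] * 24, "ds": list(fut_ds) * 2})

ok_report = check_future_frame(df_demo, future_demo, h=24)
# Dropping the last future row (it belongs to FR) should be reported.
bad_report = check_future_frame(df_demo, future_demo.iloc[:-1], h=24)
print(ok_report)
print(bad_report)
```

To apply the same check to the frames loaded with `pd.read_csv`, the `ds` columns would first need converting with `pd.to_datetime`, since they are read as strings.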

Let's call the forecast method and pass in this information:

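# Forecast with TimeGPT
# Arguments:
# - df: DataFrame with the historical data
# - X_df: DataFrame with the future values of the exogenous variables
# - h: forecast horizon (number of steps ahead)
# - level: prediction interval levels, in percent
timegpt_fcst_ex_vars_df = timegpt.forecast(df=df, X_df=future_ex_vars_df, h=24, level=[80, 90])

# Show the first rows of the forecast
timegpt_fcst_ex_vars_df.head()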
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: H
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
   unique_id                   ds    TimeGPT  TimeGPT-lo-90  TimeGPT-lo-80  TimeGPT-hi-80  TimeGPT-hi-90
0         BE  2016-12-31 00:00:00  38.861762      33.821073      34.368669      43.354854      43.902450
1         BE  2016-12-31 01:00:00  35.382102      30.014594      31.493322      39.270882      40.749610
2         BE  2016-12-31 02:00:00  33.811425      26.658821      28.543087      39.079764      40.964029
3         BE  2016-12-31 03:00:00  31.707475      24.896205      26.818795      36.596155      38.518745
4         BE  2016-12-31 04:00:00  30.316475      21.125143      24.432148      36.200801      39.507807
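
The level argument yields one lower and one upper column per requested level. As a small illustration of how to read them, the sketch below re-enters the five BE rows from the output above by hand and checks that the bounds nest (the 90% interval contains the 80% interval, which contains the point forecast), then computes the interval widths. It does not call the TimeGPT API; the helper names `nested`, `width_80` and `width_90` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# First five forecast rows for BE, copied from the output shown in this tutorial.
fcst = pd.DataFrame({
    "ds": pd.to_datetime([
        "2016-12-31 00:00:00", "2016-12-31 01:00:00", "2016-12-31 02:00:00",
        "2016-12-31 03:00:00", "2016-12-31 04:00:00",
    ]),
    "TimeGPT": [38.861762, 35.382102, 33.811425, 31.707475, 30.316475],
    "TimeGPT-lo-90": [33.821073, 30.014594, 26.658821, 24.896205, 21.125143],
    "TimeGPT-lo-80": [34.368669, 31.493322, 28.543087, 26.818795, 24.432148],
    "TimeGPT-hi-80": [43.354854, 39.270882, 39.079764, 36.596155, 36.200801],
    "TimeGPT-hi-90": [43.902450, 40.749610, 40.964029, 38.518745, 39.507807],
})

# Order the columns from the widest lower bound to the widest upper bound.
ordered = ["TimeGPT-lo-90", "TimeGPT-lo-80", "TimeGPT", "TimeGPT-hi-80", "TimeGPT-hi-90"]

# Nested intervals mean each row is non-decreasing across these columns.
nested = bool(np.all(np.diff(fcst[ordered].to_numpy(), axis=1) >= 0))

# Width of each interval per timestamp.
fcst["width_80"] = fcst["TimeGPT-hi-80"] - fcst["TimeGPT-lo-80"]
fcst["width_90"] = fcst["TimeGPT-hi-90"] - fcst["TimeGPT-lo-90"]

print("intervals nest:", nested)
print(fcst[["ds", "width_80", "width_90"]])
```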
# Plot the forecast with timegpt.plot
# Argument 1: df[['unique_id', 'ds', 'y']], the historical series (identifier, timestamp, target)
# Argument 2: timegpt_fcst_ex_vars_df, the forecast DataFrame returned by timegpt.forecast
# max_insample_length=365: plot at most the last 365 historical observations (hours in this dataset, not days)
# level=[80, 90]: draw the 80% and 90% prediction intervals
timegpt.plot(
    df[['unique_id', 'ds', 'y']],
    timegpt_fcst_ex_vars_df,
    max_insample_length=365,
    level=[80, 90],
)

We can also get the feature importance.

# Plot the feature weights as a horizontal bar chart
timegpt.weights_x.plot.barh(x='features', y='weights')

<Axes: ylabel='features'>

You can also add national holidays using the CountryHolidays class.

# Import the CountryHolidays class from nixtlats.date_features
from nixtlats.date_features import CountryHolidays

# Forecast again, this time adding holiday features
# Arguments:
# - df: DataFrame with the historical time series
# - X_df: DataFrame with the future values of the exogenous variables
# - h: forecast horizon, i.e. how many future time points to predict
# - level: list of prediction interval levels, in percent
# - date_features: list of date features to add, here US national holidays
# Returns:
# - timegpt_fcst_ex_vars_df: DataFrame with the point forecasts and prediction intervals
timegpt_fcst_ex_vars_df = timegpt.forecast(
    df=df, X_df=future_ex_vars_df, h=24, level=[80, 90],
    date_features=[CountryHolidays(['US'])]
)
# Plot the feature weights stored in timegpt.weights_x as a horizontal bar chart
# - x: column holding the feature names
# - y: column holding the feature weights
timegpt.weights_x.plot.barh(x='features', y='weights')
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: H
INFO:nixtlats.timegpt:Calling Forecast Endpoint...





<Axes: ylabel='features'>
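
Conceptually, a holiday feature is a 0/1 column that flags the timestamps falling on a given holiday. The sketch below illustrates that idea with pandas only; it is not how nixtlats implements CountryHolidays. The helper `holiday_markers` and the two-entry `US_HOLIDAYS` calendar are hypothetical and deliberately incomplete.

```python
import pandas as pd

# A tiny hand-written calendar; a real one would list every holiday.
US_HOLIDAYS = {
    "Christmas Day": pd.Timestamp("2016-12-25"),
    "New Year's Day": pd.Timestamp("2017-01-01"),
}

def holiday_markers(dates, calendar):
    """Return one 0/1 column per holiday, indexed by the given timestamps."""
    # Drop the time of day so hourly timestamps can be compared with dates.
    days = dates.normalize()
    return pd.DataFrame(
        {name: (days == day).astype(int) for name, day in calendar.items()},
        index=dates,
    )

# Four hourly timestamps that cross midnight into New Year's Day 2017.
hours = pd.date_range("2016-12-31 22:00", periods=4, freq=pd.Timedelta(hours=1))
markers = holiday_markers(hours, US_HOLIDAYS)
print(markers)
```

Columns like these could be appended to both the historical and the future frames in the same way as the other exogenous variables.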


Origin blog.csdn.net/wjjc1017/article/details/135233108