Tools Series: TimeGPT (4) Prediction Intervals

Contents

- Prediction intervals
- Historical forecasts

Prediction intervals

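Prediction intervals quantify the uncertainty of a forecast. In time series forecasting, a prediction interval gives an estimated range within which a future observation is expected to fall, based on the confidence level you set. This measure of uncertainty is essential for informed decision-making, risk assessment, and planning.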

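For example, a 95% prediction interval means that in 95 out of 100 cases the actual future value is expected to fall within the estimated range. At a given level, a wider interval therefore indicates greater uncertainty about the forecast, while a narrower interval indicates higher confidence.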
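To make the "95 out of 100" reading concrete, here is a small simulation. It is only a sketch under a stated assumption: the forecast distribution is a hypothetical normal distribution with mean 100 and standard deviation 10, and the future values really are drawn from it. None of these numbers come from TimeGPT.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical forecast distribution: N(100, 10^2). Illustrative values only.
mean, std = 100.0, 10.0
z_95 = 1.959964  # two-sided 95% quantile of the standard normal

lower = mean - z_95 * std
upper = mean + z_95 * std

# Simulate many possible futures and count how often they land in the interval.
future_values = rng.normal(mean, std, size=100_000)
coverage = np.mean((future_values >= lower) & (future_values <= upper))

print(f"95% interval: [{lower:.1f}, {upper:.1f}]")
print(f"empirical coverage: {coverage:.3f}")
```

The empirical coverage should come out close to 0.95. If the assumed distribution were wrong, the coverage would drift away from the nominal level, which is why intervals need to be calibrated.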

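When forecasting with TimeGPT, you can set the level of the prediction intervals to suit your needs. TimeGPT uses conformal prediction to calibrate these intervals.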
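The idea behind conformal prediction can be sketched in a few lines. The function below is a generic split-conformal illustration, not TimeGPT's internal implementation: the name `split_conformal_interval` and the toy calibration data are assumptions made for this example. It takes the absolute errors of past predictions and uses one of their upper quantiles as the half-width of the interval.

```python
import numpy as np

def split_conformal_interval(calib_actuals, calib_preds, new_pred, level):
    """Symmetric interval around new_pred built from calibration residuals.

    level is given in percent, e.g. 80 for an 80% interval.
    """
    scores = np.sort(
        np.abs(np.asarray(calib_actuals, float) - np.asarray(calib_preds, float))
    )
    n = scores.size
    # Rank of the conformal quantile with the usual (n + 1) correction.
    k = int(np.ceil((n + 1) * level / 100))
    # With too few calibration points the level cannot be guaranteed,
    # so the interval is unbounded.
    q = scores[k - 1] if k <= n else np.inf
    return new_pred - q, new_pred + q

# Toy calibration set: past actuals and the predictions made for them.
past_actuals = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]
past_preds = [110, 121, 128, 131, 125, 130, 142, 150, 140, 122]

lo_80, hi_80 = split_conformal_interval(past_actuals, past_preds, 125.0, 80)
lo_90, hi_90 = split_conformal_interval(past_actuals, past_preds, 125.0, 90)
print(f"80% interval: [{lo_80}, {hi_80}]")
print(f"90% interval: [{lo_90}, {hi_90}]")
```

Because the half-width is an empirical quantile of past errors, a higher level can never produce a narrower interval than a lower one.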


# Importing the necessary module
from nixtlats.utils import colab_badge
colab_badge('docs/tutorials/4_prediction_intervals')

#| hide
from itertools import product

from fastcore.test import test_eq, test_fail, test_warns
from dotenv import load_dotenv

# Load environment variables from a .env file
load_dotenv()

True


import pandas as pd
from nixtlats import TimeGPT



# Create the TimeGPT object and pass the token. The token defaults to
# os.environ.get("TIMEGPT_TOKEN"), but you can also provide one explicitly.
timegpt = TimeGPT(
    token = 'my_token_provided_by_nixtla'
)

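# Alternatively, create the TimeGPT object without arguments so that the
# token is read from the TIMEGPT_TOKEN environment variable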
timegpt = TimeGPT()

When forecasting with TimeGPT, you can set one or several prediction interval levels to suit your needs. Here is how to do it:

# Read the CSV file from the given URL into a DataFrame
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')

# Show the first rows of the DataFrame
df.head()

	timestamp	value
0	1949-01-01	112
1	1949-02-01	118
2	1949-03-01	132
3	1949-04-01	129
4	1949-05-01	121

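# Forecast with TimeGPT.
# Parameters:
# - df: input DataFrame containing the timestamps and the target values
# - h: forecast horizon (number of steps ahead), here 12
# - level: prediction interval levels in percent, here [80, 90, 99.7]
# - time_col: name of the timestamp column, here 'timestamp'
# - target_col: name of the target column, here 'value'
# Returns a DataFrame with the point forecasts and the interval bounds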
timegpt_fcst_pred_int_df = timegpt.forecast(
    df=df, h=12, level=[80, 90, 99.7], 
    time_col='timestamp', target_col='value',
)

# Show the first rows of the forecast
timegpt_fcst_pred_int_df.head()

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: MS
INFO:nixtlats.timegpt:Restricting input...
INFO:nixtlats.timegpt:Calling Forecast Endpoint...

	timestamp	TimeGPT	TimeGPT-lo-99.7	TimeGPT-lo-90	TimeGPT-lo-80	TimeGPT-hi-80	TimeGPT-hi-90	TimeGPT-hi-99.7
0	1961-01-01	437.837921	415.826453	423.783707	431.987061	443.688782	451.892136	459.849389
1	1961-02-01	426.062714	402.833523	407.694061	412.704926	439.420502	444.431366	449.291904
2	1961-03-01	463.116547	423.434062	430.316862	437.412534	488.820560	495.916231	502.799032
3	1961-04-01	478.244507	444.885193	446.776764	448.726837	507.762177	509.712250	511.603821
4	1961-05-01	505.646484	465.736694	471.976787	478.409872	532.883096	539.316182	545.556275
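A quick way to read this output is to compute the width of each interval and to check that the intervals are nested. The sketch below rebuilds the first two rows of the output above by hand so that it runs without calling the API; the variable names `fcst`, `widths`, and `nested` are introduced here for illustration only.

```python
import pandas as pd

# First two rows of the forecast output shown above, copied by hand.
fcst = pd.DataFrame({
    "timestamp": ["1961-01-01", "1961-02-01"],
    "TimeGPT": [437.837921, 426.062714],
    "TimeGPT-lo-99.7": [415.826453, 402.833523],
    "TimeGPT-lo-90": [423.783707, 407.694061],
    "TimeGPT-lo-80": [431.987061, 412.704926],
    "TimeGPT-hi-80": [443.688782, 439.420502],
    "TimeGPT-hi-90": [451.892136, 444.431366],
    "TimeGPT-hi-99.7": [459.849389, 449.291904],
})

levels = [80, 90, 99.7]

# Width of each interval: upper bound minus lower bound.
widths = pd.DataFrame({"timestamp": fcst["timestamp"]})
for lv in levels:
    widths[f"width-{lv}"] = fcst[f"TimeGPT-hi-{lv}"] - fcst[f"TimeGPT-lo-{lv}"]
print(widths)

# Each higher level should fully contain the next lower one.
nested = all(
    (fcst[f"TimeGPT-lo-{outer}"] <= fcst[f"TimeGPT-lo-{inner}"]).all()
    and (fcst[f"TimeGPT-hi-{outer}"] >= fcst[f"TimeGPT-hi-{inner}"]).all()
    for inner, outer in zip(levels, levels[1:])
)
print("intervals are nested:", nested)
```

The same loop works on the full `timegpt_fcst_pred_int_df` returned by `timegpt.forecast`, since the column names follow the `TimeGPT-lo-{level}` and `TimeGPT-hi-{level}` pattern.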

# Forecast with TimeGPT over a shorter horizon of 6 steps,
# with prediction interval levels of 80%, 90% and 99.7%.
# The time column is 'timestamp' and the target column is 'value'.
level_short_horizon_df = timegpt.forecast(
    df=df, h=6, level=[80, 90, 99.7], 
    time_col='timestamp', target_col='value',
)

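# Check that the result has shape (6, 8): 6 rows, and 8 columns made up of
# the timestamp, the point forecast, and a lower and upper bound per level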
test_eq(
    level_short_horizon_df.shape,
    (6, 8)
)

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: MS
INFO:nixtlats.timegpt:Restricting input...
INFO:nixtlats.timegpt:Calling Forecast Endpoint...

# Levels to test, including a non-integer level
test_level = [80, 90.5]

# Forecast 12 steps ahead with the test levels and keep only the column names.
# The time column is 'timestamp' and the target column is 'value'.
cols_fcst_df = timegpt.forecast(
    df=df, h=12, level=test_level,
    time_col='timestamp', target_col='value',
).columns

# Check that every expected column 'TimeGPT-{pos}-{lv}' is present,
# where pos is 'lo' or 'hi' and lv is one of the levels in test_level
assert all(
    f'TimeGPT-{pos}-{lv}' in cols_fcst_df
    for pos, lv in product(['lo', 'hi'], test_level)
)

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: MS
INFO:nixtlats.timegpt:Restricting input...
INFO:nixtlats.timegpt:Calling Forecast Endpoint...

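# Plot the history together with the forecast and its intervals.
# Arguments:
# - df: DataFrame with the timestamp and value columns
# - timegpt_fcst_pred_int_df: DataFrame with the timestamps, the point
#   forecasts and the prediction interval bounds
# - time_col: name of the timestamp column
# - target_col: name of the value column
# - level: interval levels to draw, as a list, e.g. [80, 90] for the
#   80% and 90% prediction intervals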
timegpt.plot(
    df, timegpt_fcst_pred_int_df, 
    time_col='timestamp', target_col='value',
    level=[80, 90],
)

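Note that the choice of prediction interval level depends on your use case. For high-stakes forecasts you may want a higher level, which gives wider intervals that account for more uncertainty. For less critical forecasts, a lower level with narrower intervals may be acceptable.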
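To get a feel for how quickly intervals widen as the level rises, the sketch below computes the textbook multiplier for a normally distributed forecast error. This is an assumption made purely for illustration: TimeGPT calibrates its intervals with conformal prediction, not with this formula, and `z_multiplier` is a helper name invented here.

```python
from statistics import NormalDist

def z_multiplier(level):
    """Two-sided standard-normal multiplier for a level given in percent."""
    tail = (1 - level / 100) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in [80, 90, 99.7]:
    z = z_multiplier(level)
    print(f"level {level}%: forecast +/- {z:.2f} standard deviations")
```

Under this assumption, moving from 80% to 99.7% more than doubles the half-width of the interval, which is the trade-off to weigh when picking a level.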

Historical forecasts

You can also compute prediction intervals for historical forecasts by adding the add_history=True argument.

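# Forecast with TimeGPT.
# - df: input DataFrame containing the timestamps and the target values
# - h: forecast horizon (number of steps ahead)
# - level: prediction interval levels in percent
# - time_col: name of the timestamp column
# - target_col: name of the target column
# - add_history: whether to also return forecasts for the historical period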
timegpt_fcst_pred_int_historical_df = timegpt.forecast(
    df=df, h=12, level=[80, 90], 
    time_col='timestamp', target_col='value',
    add_history=True,
)

# Show the first rows of the result
timegpt_fcst_pred_int_historical_df.head()

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: MS
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
INFO:nixtlats.timegpt:Calling Historical Forecast Endpoint...

	timestamp	TimeGPT	TimeGPT-lo-80	TimeGPT-lo-90	TimeGPT-hi-80	TimeGPT-hi-90
0	1951-01-01	135.483673	111.937767	105.262830	159.029579	165.704516
1	1951-02-01	144.442413	120.896508	114.221571	167.988319	174.663256
2	1951-03-01	157.191910	133.646004	126.971067	180.737815	187.412752
3	1951-04-01	148.769379	125.223473	118.548536	172.315284	178.990221
4	1951-05-01	140.472946	116.927041	110.252104	164.018852	170.693789

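# Plot the time series.
# Arguments:
# - df: the original dataset
# - timegpt_fcst_pred_int_historical_df: historical and future forecasts
#   with their prediction interval bounds
# - time_col: name of the time column
# - target_col: name of the target column
# - level: list of interval levels to draw, in percent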
timegpt.plot(
    df, timegpt_fcst_pred_int_historical_df, 
    time_col='timestamp', target_col='value',
    level=[80, 90],
)
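Historical intervals make it possible to check calibration: join the forecasts with the observed values and count how often the observation falls inside the interval. The sketch below uses small made-up frames so that it runs without calling the API. The helper `empirical_coverage` and the toy numbers are assumptions for illustration; only the `TimeGPT-lo-{level}` and `TimeGPT-hi-{level}` column naming is taken from the output above.

```python
import pandas as pd

def empirical_coverage(actuals, forecasts, level,
                       time_col="timestamp", target_col="value"):
    """Share of observations that fall inside the interval for one level."""
    # Keep only timestamps present in both frames.
    merged = actuals.merge(forecasts, on=time_col, how="inner")
    inside = merged[target_col].between(
        merged[f"TimeGPT-lo-{level}"], merged[f"TimeGPT-hi-{level}"]
    )
    return inside.mean()

# Toy data, not taken from the air passengers dataset.
timestamps = ["2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01", "2000-05-01"]
toy_actuals = pd.DataFrame({"timestamp": timestamps,
                            "value": [10, 12, 11, 17, 13]})
toy_forecasts = pd.DataFrame({
    "timestamp": timestamps,
    "TimeGPT": [10, 11, 12, 13, 14],
    "TimeGPT-lo-80": [8, 9, 10, 11, 12],
    "TimeGPT-hi-80": [12, 13, 14, 15, 16],
})

coverage_80 = empirical_coverage(toy_actuals, toy_forecasts, level=80)
print(f"empirical coverage of the 80% interval: {coverage_80:.0%}")
```

With real data you would pass `df` and `timegpt_fcst_pred_int_historical_df` instead of the toy frames, after making sure both timestamp columns have the same dtype so the merge can match them.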
