Tool series: TimeGPT_(4) prediction interval data

prediction interval

The prediction interval provides a measure of the uncertainty in the predicted value. In time series forecasting, a prediction interval gives an estimated range of values ​​within which future observations will fall, based on a confidence level or uncertainty that you set. This level of uncertainty is critical for informed decision-making, risk assessment and planning.

For example, a 95% prediction interval means that 95 out of 100 times, the actual future value will fall within the estimated range. Therefore, wider intervals indicate greater uncertainty about the forecast, while narrower intervals indicate higher confidence.

When using TimeGPT for time series forecasting, you can set the level of the forecast interval according to your needs. TimeGPT uses compliance predictions to calibrate these intervals.


# Importing the necessary module
from nixtlats.utils import colab_badge
colab_badge('docs/tutorials/4_prediction_intervals')

#| hide
from itertools import product

from fastcore.test import test_eq, test_fail, test_warns
from dotenv import load_dotenv
# 加载环境变量
load_dotenv()
True

import pandas as pd
from nixtlats import TimeGPT

/home/ubuntu/miniconda/envs/nixtlats/lib/python3.11/site-packages/statsforecast/core.py:25: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from tqdm.autonotebook import tqdm

# 定义TimeGPT对象,并传入token参数,该参数默认为os.environ.get("TIMEGPT_TOKEN"),也可以手动提供一个token
timegpt = TimeGPT(
    token = 'my_token_provided_by_nixtla'
)
# 创建一个TimeGPT对象,用于生成时间相关的文本
timegpt = TimeGPT()

When using TimeGPT for time series forecasting, you can set the level (or levels) of the forecast interval according to your needs. Here's how you can do this:

# 从指定的URL读取CSV文件,并将其存储在DataFrame中
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')

# 显示DataFrame的前几行数据
df.head()
timestamp value
0 1949-01-01 112
1 1949-02-01 118
2 1949-03-01 132
3 1949-04-01 129
4 1949-05-01 121
# 导入所需模块和函数

# 使用timegpt模型对数据进行预测
# 参数说明:
# - df: 输入的数据框,包含时间戳和目标值
# - h: 预测的时间步长,这里设置为12
# - level: 预测的置信水平,这里设置为[80, 90, 99.7]
# - time_col: 时间戳列的名称,这里设置为'timestamp'
# - target_col: 目标值列的名称,这里设置为'value'
# 返回值为预测结果的数据框
timegpt_fcst_pred_int_df = timegpt.forecast(
    df=df, h=12, level=[80, 90, 99.7], 
    time_col='timestamp', target_col='value',
)

# 打印预测结果的前几行
timegpt_fcst_pred_int_df.head()
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: MS
INFO:nixtlats.timegpt:Restricting input...
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
timestamp TimeGPT TimeGPT-lo-99.7 TimeGPT-lo-90 TimeGPT-lo-80 TimeGPT-hi-80 TimeGPT-hi-90 TimeGPT-hi-99.7
0 1961-01-01 437.837921 415.826453 423.783707 431.987061 443.688782 451.892136 459.849389
1 1961-02-01 426.062714 402.833523 407.694061 412.704926 439.420502 444.431366 449.291904
2 1961-03-01 463.116547 423.434062 430.316862 437.412534 488.820560 495.916231 502.799032
3 1961-04-01 478.244507 444.885193 446.776764 448.726837 507.762177 509.712250 511.603821
4 1961-05-01 505.646484 465.736694 471.976787 478.409872 532.883096 539.316182 545.556275
# 使用timegpt模型对数据进行预测
# 预测6个时间步长的数据
# 预测置信度分别为80%, 90%, 99.7%
# 时间列为'timestamp',目标列为'value'
level_short_horizon_df = timegpt.forecast(
    df=df, h=6, level=[80, 90, 99.7], 
    time_col='timestamp', target_col='value',
)

# 检查预测结果的形状是否为(6, 8)
test_eq(
    level_short_horizon_df.shape,
    (6, 8)
)
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: MS
INFO:nixtlats.timegpt:Restricting input...
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
# 定义一个列表test_level,包含两个元素80和90.5
test_level = [80, 90.5]

# 调用timegpt模块的forecast函数,对数据框df进行预测
# 预测的时间步长为12,置信水平为80和90.5
# 时间列为'timestamp',目标列为'value'
cols_fcst_df = timegpt.forecast(
    df=df, h=12, level=[80, 90.5], 
    time_col='timestamp', target_col='value',
).columns

# 使用assert语句进行断言,判断是否满足条件
# 条件为所有的字符串'TimeGPT-{pos}-{lv}'都在cols_fcst_df中
# pos取值为'lo'和'hi',lv取值为test_level中的元素
assert all(f'TimeGPT-{
      
      pos}-{
      
      lv}' for pos, lv in product(test_level, ['lo', 'hi']) )
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: MS
INFO:nixtlats.timegpt:Restricting input...
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
# 导入timegpt模块中的plot函数

# 调用plot函数,传入以下参数:
# - df: 数据框,包含时间戳和值的列
# - timegpt_fcst_pred_int_df: 数据框,包含时间戳、预测值和置信区间的列
# - time_col: 时间戳列的名称
# - target_col: 值列的名称
# - level: 置信区间的水平,以列表形式提供,例如[80, 90]表示80%和90%的置信区间
timegpt.plot(
    df, timegpt_fcst_pred_int_df, 
    time_col='timestamp', target_col='value',
    level=[80, 90],
)

Note that the choice of prediction interval levels depends on your specific use case. For high-risk forecasts, you may want to choose a wider range to account for more uncertainty. For less critical forecasts, a narrower interval may be acceptable.

historical forecast

You can also add_history=Truecalculate prediction intervals for historical forecasts by adding parameters.

# 使用TimeGPT进行预测
# df: 输入的数据框,包含时间戳和目标值
# h: 预测的时间步长
# level: 置信水平,用于计算预测区间
# time_col: 时间戳列的名称
# target_col: 目标值列的名称
# add_history: 是否在预测结果中添加历史数据
timegpt_fcst_pred_int_historical_df = timegpt.forecast(
    df=df, h=12, level=[80, 90], 
    time_col='timestamp', target_col='value',
    add_history=True,
)

# 显示预测结果的前几行
timegpt_fcst_pred_int_historical_df.head()
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: MS
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
INFO:nixtlats.timegpt:Calling Historical Forecast Endpoint...
timestamp TimeGPT TimeGPT-lo-80 TimeGPT-lo-90 TimeGPT-hi-80 TimeGPT-hi-90
0 1951-01-01 135.483673 111.937767 105.262830 159.029579 165.704516
1 1951-02-01 144.442413 120.896508 114.221571 167.988319 174.663256
2 1951-03-01 157.191910 133.646004 126.971067 180.737815 187.412752
3 1951-04-01 148.769379 125.223473 118.548536 172.315284 178.990221
4 1951-05-01 140.472946 116.927041 110.252104 164.018852 170.693789
# 绘制时间序列图
# 参数:
# df:原始数据集
# timegpt_fcst_pred_int_historical_df:时间序列预测结果的置信区间数据集
# time_col:时间列的列名
# target_col:目标列的列名
# level:置信区间的水平,可以是单个值或列表形式,表示置信区间的百分比
timegpt.plot(
    df, timegpt_fcst_pred_int_historical_df, 
    time_col='timestamp', target_col='value',
    level=[80, 90],
)

Guess you like

Origin blog.csdn.net/wjjc1017/article/details/135243848