Prediction intervals
The prediction interval provides a measure of the uncertainty in the predicted value. In time series forecasting, a prediction interval gives an estimated range of values within which future observations will fall, based on a confidence level or uncertainty that you set. This level of uncertainty is critical for informed decision-making, risk assessment and planning.
For example, a 95% prediction interval is expected to contain the actual future value about 95 times out of 100. At a given level, a wider interval indicates greater uncertainty about the forecast, while a narrower interval indicates higher confidence.
When using TimeGPT for time series forecasting, you can set the level of the prediction interval according to your needs. TimeGPT uses conformal prediction to calibrate these intervals.
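To give an intuition for how conformal calibration works in general, here is a minimal sketch of split conformal prediction. It is not TimeGPT's implementation, which this article does not describe; the function name split_conformal_interval and the synthetic calibration data are assumptions made for illustration only.

```python
import math
import numpy as np

def split_conformal_interval(calib_actuals, calib_preds, new_pred, level=90):
    """Return (lo, hi) around new_pred from absolute calibration residuals.

    Illustrative helper (hypothetical name), not part of nixtlats.
    """
    residuals = np.sort(
        np.abs(np.asarray(calib_actuals, float) - np.asarray(calib_preds, float))
    )
    n = residuals.size
    # Rank of the conformal quantile with the usual (n + 1) finite-sample
    # correction. Capping at n is a simplification: strictly, the interval
    # is unbounded when the rank exceeds n.
    k = min(n, math.ceil((n + 1) * level / 100))
    q = residuals[k - 1]
    return new_pred - q, new_pred + q

# Synthetic calibration set (assumption): a constant forecast of 100
# evaluated against noisy observations.
rng = np.random.default_rng(0)
actuals = rng.normal(100, 10, size=200)
preds = np.full(200, 100.0)

lo80, hi80 = split_conformal_interval(actuals, preds, 100.0, level=80)
lo95, hi95 = split_conformal_interval(actuals, preds, 100.0, level=95)
print(f"80% interval: [{lo80:.1f}, {hi80:.1f}]")
print(f"95% interval: [{lo95:.1f}, {hi95:.1f}]")
```

The interval is the point forecast plus or minus a quantile of past absolute errors, so a higher level picks a larger quantile and produces a wider interval.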
from itertools import product
from fastcore.test import test_eq, test_fail, test_warns
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
True
import pandas as pd
from nixtlats import TimeGPT
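# Create the TimeGPT client and pass the token explicitly.
# If token is omitted, it defaults to os.environ.get("TIMEGPT_TOKEN").
timegpt = TimeGPT(
    token = 'my_token_provided_by_nixtla'
)
# Alternatively, create the client without arguments so the token is read from the environment
timegpt = TimeGPT()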
To request prediction intervals, pass the level (or levels) you need through the level argument of the forecast method. Here's how you can do this:
# Read the CSV file from the given URL into a DataFrame
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
# Show the first rows of the DataFrame
df.head()
| | timestamp | value |
|---|---|---|
| 0 | 1949-01-01 | 112 |
| 1 | 1949-02-01 | 118 |
| 2 | 1949-03-01 | 132 |
| 3 | 1949-04-01 | 129 |
| 4 | 1949-05-01 | 121 |
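# Forecast with the TimeGPT model
# Parameters:
# - df: input DataFrame with the timestamps and target values
# - h: forecast horizon (number of steps ahead), here 12
# - level: prediction interval levels in percent, here [80, 90, 99.7]
# - time_col: name of the timestamp column, here 'timestamp'
# - target_col: name of the target column, here 'value'
# Returns a DataFrame with the point forecasts and interval bounds
timegpt_fcst_pred_int_df = timegpt.forecast(
    df=df, h=12, level=[80, 90, 99.7],
    time_col='timestamp', target_col='value',
)
# Show the first rows of the forecast
timegpt_fcst_pred_int_df.head()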
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: MS
INFO:nixtlats.timegpt:Restricting input...
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
| | timestamp | TimeGPT | TimeGPT-lo-99.7 | TimeGPT-lo-90 | TimeGPT-lo-80 | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-hi-99.7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1961-01-01 | 437.837921 | 415.826453 | 423.783707 | 431.987061 | 443.688782 | 451.892136 | 459.849389 |
| 1 | 1961-02-01 | 426.062714 | 402.833523 | 407.694061 | 412.704926 | 439.420502 | 444.431366 | 449.291904 |
| 2 | 1961-03-01 | 463.116547 | 423.434062 | 430.316862 | 437.412534 | 488.820560 | 495.916231 | 502.799032 |
| 3 | 1961-04-01 | 478.244507 | 444.885193 | 446.776764 | 448.726837 | 507.762177 | 509.712250 | 511.603821 |
| 4 | 1961-05-01 | 505.646484 | 465.736694 | 471.976787 | 478.409872 | 532.883096 | 539.316182 | 545.556275 |
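Each requested level adds a lo/hi pair of columns to the result, so interval widths can be computed directly from the column names. The following sketch rebuilds the first three rows of the output above by hand (so that it runs without calling the API) and checks that higher levels give wider intervals. The fcst and widths names are chosen for this illustration only.

```python
import pandas as pd

# First three rows of the forecast output shown above, copied by hand
fcst = pd.DataFrame({
    "timestamp": ["1961-01-01", "1961-02-01", "1961-03-01"],
    "TimeGPT": [437.837921, 426.062714, 463.116547],
    "TimeGPT-lo-99.7": [415.826453, 402.833523, 423.434062],
    "TimeGPT-lo-90": [423.783707, 407.694061, 430.316862],
    "TimeGPT-lo-80": [431.987061, 412.704926, 437.412534],
    "TimeGPT-hi-80": [443.688782, 439.420502, 488.820560],
    "TimeGPT-hi-90": [451.892136, 444.431366, 495.916231],
    "TimeGPT-hi-99.7": [459.849389, 449.291904, 502.799032],
})

levels = [80, 90, 99.7]
# Width of each interval: upper bound minus lower bound
widths = pd.DataFrame({
    f"width-{lv}": fcst[f"TimeGPT-hi-{lv}"] - fcst[f"TimeGPT-lo-{lv}"]
    for lv in levels
})
widths.insert(0, "timestamp", fcst["timestamp"])
print(widths)
```

In every row the 99.7% interval is wider than the 90% interval, which in turn is wider than the 80% interval.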
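# Forecast with the TimeGPT model
# Horizon of 6 steps
# Prediction interval levels of 80%, 90% and 99.7%
# Time column is 'timestamp', target column is 'value'
level_short_horizon_df = timegpt.forecast(
    df=df, h=6, level=[80, 90, 99.7],
    time_col='timestamp', target_col='value',
)
# Check that the result has shape (6, 8): 6 rows, and 8 columns
# (timestamp, point forecast, and a lo/hi pair for each of the 3 levels)
test_eq(
    level_short_horizon_df.shape,
    (6, 8)
)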
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: MS
INFO:nixtlats.timegpt:Restricting input...
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
# Levels to test, including a non-integer level
test_level = [80, 90.5]
# Forecast 12 steps ahead at these levels and keep only the column names
# Time column is 'timestamp', target column is 'value'
cols_fcst_df = timegpt.forecast(
    df=df, h=12, level=test_level,
    time_col='timestamp', target_col='value',
).columns
# Assert that every 'TimeGPT-{pos}-{lv}' column is present in cols_fcst_df,
# where pos is 'lo' or 'hi' and lv is one of the levels in test_level
assert all(
    f'TimeGPT-{pos}-{lv}' in cols_fcst_df
    for pos, lv in product(['lo', 'hi'], test_level)
)
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: MS
INFO:nixtlats.timegpt:Restricting input...
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
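# Plot the original series together with the forecast and its prediction intervals
# Arguments passed to plot:
# - df: DataFrame with the timestamp and value columns
# - timegpt_fcst_pred_int_df: DataFrame with the timestamps, point forecasts and interval bounds
# - time_col: name of the timestamp column
# - target_col: name of the value column
# - level: interval levels to draw, given as a list, e.g. [80, 90] for the 80% and 90% intervals
timegpt.plot(
    df, timegpt_fcst_pred_int_df,
    time_col='timestamp', target_col='value',
    level=[80, 90],
)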
Note that the choice of prediction interval levels depends on your specific use case. For high-risk forecasts, you may want to choose a higher level, which yields a wider interval that accounts for more uncertainty. For less critical forecasts, a lower level with a narrower interval may be acceptable.
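To get a feel for how much width a higher level costs, the sketch below computes the half-width of a central interval for normally distributed forecast errors. The Gaussian assumption and the helper name normal_half_width are for illustration only; TimeGPT's intervals are calibrated differently, so its actual widths will not match these multipliers.

```python
from statistics import NormalDist

def normal_half_width(level, sigma=1.0):
    """Half-width of a central `level` percent interval for N(0, sigma^2) errors.

    Illustrative helper (hypothetical name), assuming Gaussian errors.
    """
    # A central interval at `level` percent ends at this upper quantile
    upper_quantile = 0.5 + level / 200
    return NormalDist().inv_cdf(upper_quantile) * sigma

half_widths = {level: normal_half_width(level) for level in [80, 90, 95, 99.7]}
for level, hw in half_widths.items():
    print(f"{level}% interval: forecast +/- {hw:.2f} sigma")
```

Under this assumption, moving from an 80% to a 99.7% level more than doubles the interval width, which is the trade-off to weigh when picking a level.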
Historical forecast
You can also calculate prediction intervals for historical forecasts by adding the add_history=True parameter.
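# Forecast with TimeGPT
# df: input DataFrame with the timestamps and target values
# h: forecast horizon (number of steps ahead)
# level: levels used to compute the prediction intervals
# time_col: name of the timestamp column
# target_col: name of the target column
# add_history: whether to also include forecasts for the historical period in the result
timegpt_fcst_pred_int_historical_df = timegpt.forecast(
    df=df, h=12, level=[80, 90],
    time_col='timestamp', target_col='value',
    add_history=True,
)
# Show the first rows of the result
timegpt_fcst_pred_int_historical_df.head()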
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: MS
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
INFO:nixtlats.timegpt:Calling Historical Forecast Endpoint...
| | timestamp | TimeGPT | TimeGPT-lo-80 | TimeGPT-lo-90 | TimeGPT-hi-80 | TimeGPT-hi-90 |
|---|---|---|---|---|---|---|
| 0 | 1951-01-01 | 135.483673 | 111.937767 | 105.262830 | 159.029579 | 165.704516 |
| 1 | 1951-02-01 | 144.442413 | 120.896508 | 114.221571 | 167.988319 | 174.663256 |
| 2 | 1951-03-01 | 157.191910 | 133.646004 | 126.971067 | 180.737815 | 187.412752 |
| 3 | 1951-04-01 | 148.769379 | 125.223473 | 118.548536 | 172.315284 | 178.990221 |
| 4 | 1951-05-01 | 140.472946 | 116.927041 | 110.252104 | 164.018852 | 170.693789 |
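# Plot the time series
# Arguments:
# df: the original dataset
# timegpt_fcst_pred_int_historical_df: forecasts with prediction intervals, including the historical period
# time_col: name of the time column
# target_col: name of the target column
# level: interval levels to draw, in percent, given as a list
timegpt.plot(
    df, timegpt_fcst_pred_int_historical_df,
    time_col='timestamp', target_col='value',
    level=[80, 90],
)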
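Because historical forecasts overlap with observed data, their intervals can be compared against the actual values to see how often the observations fell inside. Here is a minimal sketch of such an empirical coverage check. The helper empirical_coverage and the toy numbers are made up for illustration and are not TimeGPT output; with real data you would first align the forecast and the original series on the timestamp column.

```python
import pandas as pd

def empirical_coverage(actuals, lower, upper):
    """Share of observations that satisfy lower <= actual <= upper."""
    inside = (actuals >= lower) & (actuals <= upper)
    return float(inside.mean())

# Toy data (assumption): observed values with hypothetical 80% bounds
toy = pd.DataFrame({
    "value": [100, 110, 125, 90, 105],
    "lo-80": [95, 100, 110, 92, 98],
    "hi-80": [108, 118, 124, 104, 112],
})

coverage = empirical_coverage(toy["value"], toy["lo-80"], toy["hi-80"])
print(f"Empirical coverage of the 80% interval: {coverage:.0%}")
```

On a sample this small the result says little, but over a long history an empirical coverage far from the nominal level suggests the intervals are too narrow or too wide for the series.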