TimeGPT: Time Series Forecasting Model Example

The field of time series forecasting is going through a very exciting period. Over the past few years, we have seen many important contributions, such as N-BEATS, N-HiTS, PatchTST, and TimesNet. Meanwhile, large language models (LLMs) such as ChatGPT have achieved enormous popularity, in part because they can be adapted to a wide variety of tasks without further training.

This raises the question: can there be foundation models for time series, analogous to those in natural language processing? Could a large model, pretrained on vast amounts of time series data, produce accurate forecasts on unseen data?

With TimeGPT-1, Azul Garza and Max Mergenthaler-Canseco adapted the techniques and architecture behind LLMs to the forecasting domain and built the first time series foundation model capable of zero-shot inference. In this article, we first explore the architecture behind TimeGPT and how the model is trained. We then apply it to a forecasting project and evaluate its performance against other state-of-the-art methods such as N-BEATS, N-HiTS, and PatchTST.

Exploring TimeGPT

As mentioned earlier, TimeGPT is the first attempt at building a foundation model for time series forecasting.

Example of how to train TimeGPT for inference on unseen data

From the above figure, we can see that the general idea behind TimeGPT is to train the model on a large amount of data from different domains and then perform zero-shot inference on unseen data. Of course, this approach relies on transfer learning, where the model is able to use the knowledge it acquired during training to solve new tasks. Now, this is only possible if the model is large enough and trained on large amounts of data.

Training TimeGPT

To this end, the authors trained TimeGPT on over 100 billion data points, all derived from open source time series data. The dataset covers a variety of domains, from finance, economics and weather to network traffic, energy and sales.

Note that the authors did not reveal the source of the public data used to curate the 100 billion data points.

This diversity is critical to the success of a foundation model, as it allows the model to learn many different temporal patterns and thus generalize better.

For example, we might expect weather data to exhibit daily seasonality (warmer during the day, cooler at night) and yearly seasonality, while traffic data might exhibit daily seasonality (more vehicles during the day) and weekly seasonality (more vehicles on weekdays).

To ensure robustness and generalization, preprocessing is kept to a minimum: only missing values are filled in, and the rest of the data is kept in its original form. Although the authors did not specify the imputation method, I suspect some kind of interpolation technique was used, such as linear, spline, or moving-average interpolation. The model was then trained over multiple days, during which hyperparameters and learning rates were tuned. Although the authors did not reveal how many days or how many GPUs the training required, we do know that the model was implemented in PyTorch, using the Adam optimizer and a learning-rate decay strategy.
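Since the paper does not describe the actual imputation step, the following is only a minimal sketch of how missing values in a daily series could be filled with pandas; the column names ds and y simply follow the convention used later in this article.

import numpy as np
import pandas as pd

# Hypothetical daily series with gaps (illustration only)
toy = pd.DataFrame({
    'ds': pd.date_range('2023-01-01', periods=8, freq='D'),
    'y': [10.0, 12.0, np.nan, 15.0, np.nan, np.nan, 20.0, 21.0],
})

# Linear interpolation between the surrounding observed points
toy['y_linear'] = toy['y'].interpolate(method='linear')

# Moving-average style fill: replace gaps with a centered rolling mean of observed values
toy['y_rolling'] = toy['y'].fillna(toy['y'].rolling(3, center=True, min_periods=1).mean())

print(toy)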

TimeGPT architecture

TimeGPT utilizes the Transformer architecture based on the self-attention mechanism pioneered by Google and the University of Toronto in 2017.


From the figure above, we can see that TimeGPT uses the full encoder-decoder Transformer architecture.

Inputs can include a window of historical data as well as exogenous data, such as special events or additional time series.

The input is fed to the encoder part of the model, where the attention mechanism learns different properties of the input sequence. This information is then passed to the decoder, which uses it to generate the forecasts. The forecast sequence ends once the forecast horizon set by the user is reached. It is worth noting that the authors implemented conformal prediction in TimeGPT, allowing the model to estimate prediction intervals based on historical errors.

Capabilities of TimeGPT

Considering that TimeGPT is the first attempt at building a foundation model for time series, it offers a wide range of capabilities. First, since TimeGPT is a pretrained model, we can generate forecasts without training it on our specific data. Of course, the model can still be fine-tuned to fit our data.

Second, the model supports exogenous variables for the forecast target and can handle multivariate forecasting tasks. Finally, through conformal prediction, TimeGPT can estimate prediction intervals, which in turn allows the model to perform anomaly detection: essentially, if a data point falls outside the 99% prediction interval, the model flags it as an anomaly.
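To make the idea concrete, here is a small, self-contained sketch of flagging anomalies with an interval built from historical errors. It uses a plain empirical quantile of past residuals, so it illustrates the principle rather than TimeGPT's actual conformal procedure, which the paper does not detail.

import numpy as np

# Toy setup: residuals from past forecasts, plus new forecasts and observations
rng = np.random.default_rng(42)
past_residuals = rng.normal(0, 5, size=500)        # historical forecast errors
new_forecast = np.array([100.0, 102.0, 98.0, 101.0])
new_actual = np.array([99.0, 103.0, 97.0, 130.0])  # the last point is an outlier

# 99% interval half-width from the empirical quantile of absolute past errors
half_width = np.quantile(np.abs(past_residuals), 0.99)
lower, upper = new_forecast - half_width, new_forecast + half_width

# Observations falling outside the interval are flagged as anomalies
anomalies = (new_actual < lower) | (new_actual > upper)
print(anomalies)  # expected: [False False False  True]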

Keep in mind that all of these tasks can be accomplished using zero-shot inference or with some fine-tuning, which is a complete paradigm change for the field of time series forecasting. Now that we have a more solid understanding of TimeGPT, how it works and how it is trained, let’s see how the model performs in action.

Applying TimeGPT to a forecasting task

Now, let us apply TimeGPT to a forecasting task and compare its performance with other models. Please note that at the time of writing, TimeGPT is only accessible via its API and is in closed beta. As mentioned earlier, the model was trained on 100 billion data points from publicly available data. Since the authors did not specify which datasets were used, I don't think it is reasonable to test the model on well-known benchmark datasets, such as ETT or the weather dataset, as they may have been seen during training.

Import the libraries and read the data

The natural first step is to import the libraries needed for this experiment.

import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt


from neuralforecast.core import NeuralForecast
from neuralforecast.models import NHITS, NBEATS, PatchTST


from neuralforecast.losses.numpy import mae, mse


from nixtlats import TimeGPT


%matplotlib inline

Then, to access the TimeGPT model, we read the API key from a file. Note that I did not assign the API key to an environment variable, since my access is limited to two weeks.

with open("data/timegpt_api_key.txt", 'r') as file:
        API_KEY = file.read()

Then we can read the data.

df = pd.read_csv('data/medium_views_published_holidays.csv')
df['ds'] = pd.to_datetime(df['ds'])


df.head()


The first five rows of our dataset

From the image above, we can see that the dataset uses the same format expected by Nixtla's open-source libraries, such as neuralforecast.

We have a unique_id column that labels the different time series; in our case, there is only one series. The column y represents the number of daily visits to my blog, and published is a simple flag marking days on which a new article was published (1) versus days without a new article (0). Intuitively, we know that traffic usually increases for some time after new content is published. Finally, the column is_holiday indicates whether the day is a holiday in the United States; intuitively, fewer people visit my blog during holidays. A short sketch of how such flag columns could be built is shown below.
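This is only an illustration of one way to construct the published and is_holiday flags; the publication dates and the third-party holidays package used here are assumptions, not part of the original data pipeline.

import pandas as pd
import holidays  # third-party package listing public holidays by country

# Hypothetical daily index and publication dates (illustration only)
dates = pd.date_range('2023-01-01', '2023-01-31', freq='D')
published_on = {pd.Timestamp('2023-01-05'), pd.Timestamp('2023-01-19')}

us_holidays = holidays.UnitedStates()

flags = pd.DataFrame({
    'ds': dates,
    'published': [int(d in published_on) for d in dates],
    'is_holiday': [int(d in us_holidays) for d in dates],
})
print(flags.head())

Now, let's visualize our data and look for obvious patterns.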

published_dates = df[df['published'] == 1]
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(df['ds'], df['y'])
ax.scatter(published_dates['ds'], published_dates['y'], marker='o', color='red', label='New article')
ax.set_xlabel('Day')
ax.set_ylabel('Total views')
ax.legend(loc='best')
fig.autofmt_xdate()
plt.tight_layout()


Blog daily visits

From the figure above, we can already see some interesting behavior. First, notice the red dots, which represent newly published articles; traffic spikes almost immediately after publication. We also notice a period of lower activity in 2021, reflected in fewer daily visits to my blog. Finally, in 2023, we notice some unusually large traffic spikes after articles were published. Zooming in on the data also reveals clear weekly seasonality.
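The zoomed-in view that follows can be reproduced by restricting the plot to the most recent part of the series; this is only a sketch, and the 90-day window is an arbitrary choice for illustration.

recent = df.tail(90)  # zoom in on the last 90 days (arbitrary window)

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(recent['ds'], recent['y'])
ax.set_xlabel('Day')
ax.set_ylabel('Total views')
fig.autofmt_xdate()
plt.tight_layout()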


The number of daily visits to the blog. Here we see clear weekly seasonality, with fewer people visiting on weekends

From the figure above, we can clearly see that the blog receives fewer visitors on weekends and more on weekdays. With all this in mind, let's see how to use TimeGPT for forecasting.

Forecasting with TimeGPT

First, let us split the dataset into training and test sets. Here, I will keep 168 time steps for the test set, which corresponds to 24 weeks of daily data.

train = df[:-168]
test = df[-168:]

We then set the forecast horizon to seven days, since I'm interested in forecasting daily visits for the entire week.

At the time of writing, the API does not provide a cross-validation method, so we write our own loop to generate predictions seven days at a time until we have predictions for the entire test set.

future_exog = test[['unique_id', 'ds', 'published', 'is_holiday']]


timegpt = TimeGPT(token=API_KEY)


timegpt_preds = []

# Rolling forecast: predict 7 days at a time over the 168-day test set (24 windows)
for i in range(0, 162, 7):
    timegpt_preds_df = timegpt.forecast(
        df=df.iloc[:1213+i],          # history available up to the current window
        X_df=future_exog[i:i+7],      # future values of the exogenous variables
        h=7,
        finetune_steps=10,            # light fine-tuning on our data
        id_col='unique_id',
        time_col='ds',
        target_col='y'
    )

    preds = timegpt_preds_df['TimeGPT']
    timegpt_preds.extend(preds)

In the code block above, note that we have to pass the future values of our exogenous variables. This is fine because those values are known in advance: we know the dates of upcoming holidays, and I obviously know when I plan to publish new articles. Also note that we lightly fine-tune TimeGPT using the finetune_steps parameter. Once the loop is complete, we can add the predictions to the test set. Again, TimeGPT generates seven predictions at a time until 168 predictions are obtained, so that we can evaluate its ability to forecast daily visits for the next week.

test['TimeGPT'] = timegpt_preds
test.head()


TimeGPT predictions

Forecasting using N-BEATS, N-HiTS and PatchTST

Now, let's apply the other methods and see whether training these models specifically on our dataset produces better predictions. For this experiment, as mentioned before, we use N-BEATS, N-HiTS, and PatchTST.

horizon = 7


models = [
    NHITS(h=horizon, input_size=5*horizon, max_steps=50),
    NBEATS(h=horizon, input_size=5*horizon, max_steps=50),
    PatchTST(h=horizon, input_size=5*horizon, max_steps=50),
]

Next, we initialize the NeuralForecast object and specify the frequency of our data, in this case daily.

nf = NeuralForecast(models=models, freq='D')

We then perform cross-validation with a step size of 7 over 24 windows, so that the predictions align with the test set used for TimeGPT.

preds_df = nf.cross_validation(
    df=df,
    static_df=future_exog,
    step_size=7,
    n_windows=24
)

We can then simply add the predictions from TimeGPT to this new `preds_df` DataFrame to obtain a single DataFrame containing all model predictions.

preds_df['TimeGPT'] = test['TimeGPT']

The first rows of preds_df containing the predictions from all models

Next, we are ready to evaluate the performance of each model.

Evaluation

Before computing performance metrics, let's visualize each model's predictions on our test set.

Visualize each model’s predictions
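The article does not show the code behind this figure; a sketch along the following lines would produce a similar comparison, using the preds_df DataFrame built above.

fig, ax = plt.subplots(figsize=(12, 8))

# Actual values followed by each model's predictions over the test period
ax.plot(preds_df['ds'], preds_df['y'], label='Actual')
for model_name in ['NHITS', 'NBEATS', 'PatchTST', 'TimeGPT']:
    ax.plot(preds_df['ds'], preds_df[model_name], label=model_name, alpha=0.7)

ax.set_xlabel('Day')
ax.set_ylabel('Total views')
ax.legend(loc='best')
fig.autofmt_xdate()
plt.tight_layout()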

First, we see that there is a lot of overlap between the models. However, we note that N-HiTS predicted two peaks that did not actually materialize, and PatchTST often underestimates the actual values. TimeGPT, on the other hand, seems to track the actual data fairly well overall.

Of course, the only way to objectively compare the models is to measure performance metrics. Here, we use the mean absolute error (MAE) and the mean squared error (MSE). Additionally, we round the predictions to whole numbers, because fractional values are meaningless in the context of daily blog visits.

preds_df = preds_df.round({
  'NHITS': 0,
  'NBEATS': 0,
  'PatchTST': 0,
  'TimeGPT': 0
})
data = {'N-HiTS': [mae(preds_df['NHITS'], preds_df['y']), mse(preds_df['NHITS'], preds_df['y'])],
     'N-BEATS': [mae(preds_df['NBEATS'], preds_df['y']), mse(preds_df['NBEATS'], preds_df['y'])],
     'PatchTST': [mae(preds_df['PatchTST'], preds_df['y']), mse(preds_df['PatchTST'], preds_df['y'])],
     'TimeGPT': [mae(preds_df['TimeGPT'], preds_df['y']), mse(preds_df['TimeGPT'], preds_df['y'])]}
metrics_df = pd.DataFrame(data=data)
metrics_df.index = ['mae', 'mse']
metrics_df.style.highlight_min(color='lightgreen', axis=1)

MAE and MSE of each model, with the best values highlighted

As can be seen from the table above, TimeGPT is the best-performing model, achieving the lowest MAE and MSE, followed by N-BEATS, PatchTST, and N-HiTS. This is an exciting result, given that TimeGPT had never seen this dataset and was only lightly fine-tuned. While this is not an exhaustive experiment, I believe it demonstrates the potential that foundation models may have in the field of forecasting.

My personal opinion on TimeGPT

While I'm excited about my brief experiment with TimeGPT, I must point out that the original paper remains vague in many important areas. Again, we do not know which datasets were used to train and test the model, so we cannot really verify the performance results reported by the authors, shown below.



TimeGPT performance results

From the table above, we can see that TimeGPT performs best at the monthly and weekly frequencies, with N-HiTS and the Temporal Fusion Transformer (TFT) usually ranking second or third. However, since we do not know which data were used, we cannot verify these metrics. There is also a lack of transparency about how the model was trained and adapted to handle time series data.

Conclusion

TimeGPT is the first foundation model for time series forecasting. It leverages the Transformer architecture and was pretrained on over 100 billion data points, enabling zero-shot inference on unseen data. Combined with conformal prediction, the model can generate prediction intervals and perform anomaly detection without being trained on a specific dataset.

