Transformers for Time Series Forecasting

Recently, there has been a proliferation of Transformer-based solutions for time series forecasting (TSF) tasks, especially for the challenging long-term TSF (LTSF) problem. The Transformer architecture relies on the self-attention mechanism to extract semantic correlations between pairs of elements in long sequences, a mechanism that is permutation-invariant and, to a certain extent, insensitive to order. In time series modeling, however, we want to extract the temporal relationships within an ordered set of consecutive points. Therefore, although these studies report performance improvements, whether Transformer-based techniques are the right solution for long-term time series forecasting is a question worth investigating. In this work, we question the effectiveness of Transformer-based TSF solutions. In their experiments, the compared (non-Transformer) baselines are mainly autoregressive forecasting solutions, which generally have poor long-term predictive ability due to the inevitable error accumulation effect. In contrast, we use an embarrassingly simple architecture called DLinear, which performs direct multi-step (DMS) prediction, for comparison. DLinear decomposes a time series into a trend series and a remainder series, and uses two single-layer linear networks to model these two series for forecasting. Surprisingly, it substantially outperforms existing complex Transformer-based models in most cases. We therefore conclude that the relatively high long-term prediction accuracy of Transformer-based TSF solutions in existing work has little to do with the temporal relation extraction capability of the Transformer architecture; instead, it is mainly due to the non-autoregressive DMS forecasting strategy they adopt. We hope this study also motivates a future re-examination of the effectiveness of Transformer-based solutions for other time series analysis tasks (e.g., anomaly detection).
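To make the DLinear idea concrete, here is a minimal PyTorch sketch of a DLinear-style model. This is an illustrative sketch under my own assumptions, not the authors' released code: the class name, the zero-padded moving average, and the default kernel size of 25 are choices made for this example.

```python
import torch
import torch.nn as nn

class DLinearSketch(nn.Module):
    """Minimal DLinear-style model: series decomposition + two linear layers."""

    def __init__(self, input_len: int, pred_len: int, kernel_size: int = 25):
        super().__init__()
        # Moving average along the time axis extracts the trend component.
        self.moving_avg = nn.AvgPool1d(kernel_size, stride=1,
                                       padding=kernel_size // 2,
                                       count_include_pad=False)
        # One single-layer linear network per component, mapping the whole
        # input window directly to the whole forecast horizon (DMS).
        self.linear_trend = nn.Linear(input_len, pred_len)
        self.linear_remainder = nn.Linear(input_len, pred_len)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, input_len, channels)
        x = x.permute(0, 2, 1)                     # (batch, channels, input_len)
        trend = self.moving_avg(x)                 # smoothed trend series
        remainder = x - trend                      # remainder series
        y = self.linear_trend(trend) + self.linear_remainder(remainder)
        return y.permute(0, 2, 1)                  # (batch, pred_len, channels)

# Example: forecast 96 future steps of a 7-channel series from a 336-step window.
model = DLinearSketch(input_len=336, pred_len=96)
forecast = model(torch.randn(8, 336, 7))           # shape: (8, 96, 7)
```

Because every forecast value is produced in one shot from the input window, errors do not accumulate across prediction steps as they do in autoregressive decoding.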

The premise of the Transformer model is semantic correlation between paired elements, whereas the self-attention mechanism itself is permutation-invariant. Considering raw numerical data in time series (e.g., stock prices or electricity prices), there is hardly any point-wise semantic correlation between values. In time series modeling, we are primarily concerned with the temporal relationship among a set of consecutive points, and it is the order of these elements, rather than their pairwise relationships, that plays the most critical role. While employing positional encoding and embedding sub-series as tokens helps preserve some ordering information, the order-invariant nature of the self-attention mechanism inevitably leads to a loss of temporal information. Based on the above observations, we are interested in revisiting the effectiveness of Transformer-based LTSF solutions.
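To make the permutation-invariance argument concrete, the following small PyTorch check (illustrative code, not from the paper) shows that a self-attention layer with no positional encoding produces the same outputs, merely reordered, when the input sequence is shuffled:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A plain self-attention layer with no positional encoding.
attn = nn.MultiheadAttention(embed_dim=16, num_heads=2, batch_first=True)
attn.eval()

x = torch.randn(1, 10, 16)            # (batch, seq_len, embed_dim)
perm = torch.randperm(10)
x_shuffled = x[:, perm, :]            # destroy the temporal order

with torch.no_grad():
    out, _ = attn(x, x, x)
    out_shuffled, _ = attn(x_shuffled, x_shuffled, x_shuffled)

# Shuffling the input only shuffles the output the same way: the attention
# outputs carry no information about which ordering was the original one.
print(torch.allclose(out[:, perm, :], out_shuffled, atol=1e-5))  # True
```

Any order sensitivity therefore has to come from positional encodings or sub-series embeddings, which is exactly the limitation the paragraph above points out.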
