Google released a new all-linear time series forecasting model this week, once again surpassing the Transformer

Following earlier research questioning whether the Transformer is really effective for time series forecasting, and follow-up work that beat complex models with plain MLPs, Google published a new time series forecasting paper yesterday proposing the TiDE model. The entire model contains no attention mechanism, RNN, or CNN; it is built entirely from fully connected layers. The main content of this work is introduced below.


  • Paper title: Long-term Forecasting with TiDE: Time-series Dense Encoder

  • Download address: https://arxiv.org/pdf/2304.08424v1.pdf

1. Background Introduction

With the development of the Transformer, many Transformer variants have entered the time series field. However, an AAAI 2023 paper showed that a simple linear model can achieve better forecasting results than previous Transformer models (Are Transformers Effective for Time Series Forecasting?, AAAI 2023), calling the effectiveness of Transformers for time series forecasting into question. It was followed by a series of works that beat Transformers with simple MLPs (such as Huawei's time-series Mixer work). Recently, a new Transformer variant, PatchTST, reclaimed ground for the Transformer: it upgraded the architecture with the patch-based modeling idea from the Vision Transformer and beat the MLP models.

In this Google paper, the MLP approach is upgraded again and once more beats the Transformer. The paper argues that purely linear models cannot capture the nonlinear dependencies between the target time series and its covariates, which limits them on complex forecasting problems. To address this, the authors upgrade the MLP-based forecasting model and propose TiDE, which achieves SOTA results on multiple datasets.

2. Model Structure

The model targets multivariate long-horizon forecasting. The entire TiDE architecture is composed of MLPs and focuses on two weaknesses of earlier linear models: they cannot model the nonlinear relationship between the prediction window and the historical window, and they cannot effectively use external variables (covariates).

The core building block of the model is the Residual Block, which consists of a Dense + ReLU layer, a Dense linear layer, and an Add & LayerNorm step. All other components of TiDE are built from this block. The model can be divided into four parts: Feature Projection, Dense Encoder, Dense Decoder, and Temporal Decoder.
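To make the block concrete, here is a minimal PyTorch sketch of such a Residual Block (the hidden size, dropout rate, and the optional-norm flag are my own assumptions, not taken from the paper's implementation):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """TiDE-style residual block: Dense + ReLU -> Dense, plus a linear
    skip connection, with LayerNorm applied to the sum (a sketch)."""
    def __init__(self, in_dim, hidden_dim, out_dim, dropout=0.1, norm=True):
        super().__init__()
        self.dense = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
            nn.Dropout(dropout),
        )
        self.skip = nn.Linear(in_dim, out_dim)  # linear residual path
        self.norm = nn.LayerNorm(out_dim) if norm else nn.Identity()

    def forward(self, x):
        return self.norm(self.dense(x) + self.skip(x))
```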

Feature Projection maps the external variables to low-dimensional vectors, implemented with a Residual Block; its main purpose is to reduce the dimensionality of the covariates (see the sketch after the next paragraph).

The Dense Encoder concatenates the historical sequence, the attribute information, and the low-dimensional projections of the external variables, then maps them through a stack of Residual Blocks to obtain an encoding e.
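Reusing the ResidualBlock above, a hedged sketch of Feature Projection plus the Dense Encoder could look like the following (all names and sizes — B, L, H, A, C, r, hidden, n_enc — are illustrative assumptions, not the paper's values):

```python
# Illustrative sizes: batch, lookback, horizon, #attributes,
# #covariates, projection dim, hidden width, encoder depth.
B, L, H, A, C, r, hidden, n_enc = 32, 96, 24, 4, 8, 4, 256, 2

y_hist = torch.randn(B, L)         # historical target window
attr   = torch.randn(B, A)         # static attributes
x_cov  = torch.randn(B, L + H, C)  # covariates over history + horizon

# Feature Projection: map each step's covariates from C dims down to r dims.
proj = ResidualBlock(C, hidden, r)
x_tilde = proj(x_cov)              # [B, L + H, r]

# Dense Encoder: flatten and concatenate everything, then stack blocks.
enc_in = torch.cat([y_hist, attr, x_tilde.flatten(1)], dim=-1)
blocks = [ResidualBlock(L + A + (L + H) * r, hidden, hidden)]
blocks += [ResidualBlock(hidden, hidden, hidden) for _ in range(n_enc - 1)]
encoder = nn.Sequential(*blocks)
e = encoder(enc_in)                # encoding e, shape [B, hidden]
```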

The Dense Decoder maps e to g using the same kind of stacked Residual Blocks and reshapes g into a [p, H] matrix, where H is the length of the prediction window and p is the decoder's output dimension. This is equivalent to obtaining one p-dimensional vector for each time step of the prediction window.

The Temporal Decoder concatenates, along the time dimension, the per-step vectors in g with the external variables x, uses a Residual Block to map each time step to an output, and then adds a direct linear mapping of the historical sequence as a residual connection to obtain the final prediction.
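Continuing the same sketch, the Dense Decoder and Temporal Decoder might be wired up as follows (p, the per-step decoder dimension, is again an assumed hyperparameter):

```python
p = 16  # assumed per-step decoder output dimension

# Dense Decoder: map e to g, one p-dim vector per horizon step.
decoder = nn.Sequential(
    ResidualBlock(hidden, hidden, hidden),
    ResidualBlock(hidden, hidden, p * H),
)
g = decoder(e).reshape(B, H, p)

# Temporal Decoder: concatenate g with the projected future covariates
# at each horizon step, then map each step to a scalar prediction.
# (LayerNorm over a single output would be degenerate, so skip it here.)
temporal = ResidualBlock(p + r, hidden, 1, norm=False)
dec_in = torch.cat([g, x_tilde[:, L:, :]], dim=-1)  # [B, H, p + r]
y_step = temporal(dec_in).squeeze(-1)               # [B, H]

# Global residual connection: a direct linear map of the lookback window.
lookback_skip = nn.Linear(L, H)
y_hat = y_step + lookback_skip(y_hist)              # final forecast, [B, H]
```

As the next paragraph notes, for a multivariate dataset these same weights would simply be applied to each series independently.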

[Figure: overall TiDE architecture]

For multivariate forecasting, each series is predicted separately in this paper, and the model parameters are shared across all series.

The whole model looks unpretentious: it essentially uses fully connected layers to map the historical sequence, attribute features, and external variables directly to the prediction for the future window. However, the paper also proves theoretically that this kind of linear model is best suited to data generated by a Linear Dynamical System, where the future sequence is a linear mapping of the historical sequence. Interested readers can study the theoretical proofs in Section 5 of the paper.
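For intuition, here is the textbook form of such a system (a generic definition, not necessarily the paper's exact assumptions): with i.i.d. zero-mean noise, the best predictor of any future state is a linear function of the current one, so a linear forecaster is a natural fit:

```latex
x_{t+1} = A\,x_t + \eta_t, \quad \mathbb{E}[\eta_t] = 0
\quad\Longrightarrow\quad
\mathbb{E}\!\left[x_{t+h} \mid x_t\right] = A^{h} x_t
```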

3. Experimental Results

The paper compares TiDE against various Transformer-based forecasting models, using datasets and prediction windows similar to previous work. TiDE and PatchTST outperform the other models, and TiDE matches or exceeds PatchTST, the currently strongest Transformer variant.

[Table: forecasting accuracy of TiDE versus baseline models]

On concrete forecasting cases, the linear model also shows a good fit, with prediction curves closer to the ground truth.

[Figure: example prediction curves versus ground truth]
