Over the past year, many linear models, such as DLinear and TSMixer, have appeared in time series forecasting. These works report that very simple linear models can outperform many complex Transformer-based models.
So, are simple linear models also applicable to spatio-temporal forecasting? Since the beginning of this year, two related papers have indeed appeared. Both combine a very simple linear model with embeddings for spatio-temporal prediction and report results that surpass various complex spatio-temporal models such as STGCN. This article summarizes both papers and shows how linear models can be used for spatio-temporal prediction.
1
Spatial-Temporal Identity
Paper title: Spatial-Temporal Identity: A Simple yet Effective Baseline for Multivariate Time Series Forecasting
Download link: https://arxiv.org/pdf/2208.05233v2.pdf
This is a short paper published at CIKM 2022. Its starting point is a simple observation: sequences with the same historical pattern may have different future curves, which makes it hard for a model to produce different predictions from the historical sequence alone. In the paper's example figure, the top panel selects three different windows W, where P denotes the history and F denotes the future. W1 shows the historical and future sequences of two sensors in the spatial network. Their histories are almost identical, but their futures differ greatly. A regression model fitted only on the historical sequence cannot capture this difference.
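The point can be checked with a small numerical sketch. The values below are made up for two hypothetical sensors and are not data from the paper; ordinary least squares stands in for "a regression model". With identical histories, the history-only fit is forced to give both sensors the same prediction, while appending a one-hot sensor identity lets the fit separate them.

```python
import numpy as np

# Two hypothetical sensors with identical histories but different futures.
history = np.array([[1.0, 2.0, 3.0],   # sensor 0
                    [1.0, 2.0, 3.0]])  # sensor 1
future = np.array([[4.0],
                   [0.0]])

# Regression on history only: identical inputs force identical predictions.
w_hist, *_ = np.linalg.lstsq(history, future, rcond=None)
pred_hist = history @ w_hist

# Append a one-hot sensor identity so the two rows become distinguishable.
identity = np.eye(2)
features = np.concatenate([history, identity], axis=1)
w_id, *_ = np.linalg.lstsq(features, future, rcond=None)
pred_id = features @ w_id

print(pred_hist.ravel())  # same value for both sensors
print(pred_id.ravel())    # matches the two different futures
```

In this toy case the history-only model can do no better than predict the average of the two futures for both sensors, which is the failure mode the paper describes.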
Previous spatio-temporal graph models, such as STGCN, use graph convolution to model the relationships between nodes in space and a sequence model to model the time dimension. The paper argues that these methods improve results mainly because the graph convolution helps distinguish nodes that share the same history but have different futures. It then claims that this problem can be solved directly by adding ID embeddings, and designs the following structure: the ID of each sequence, the IDs of time information, and so on are converted into embeddings and concatenated with the sequence representation produced by the time series model. With all of this information combined, a simple multi-layer MLP maps the input to the final prediction. The overall model structure is shown in the paper's architecture figure.
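The structure described above can be sketched roughly as follows. This is a simplified illustration in NumPy, not the authors' implementation: all sizes, the names `stid_forward`, `node_emb`, `tod_emb`, and `dow_emb`, the linear series encoder, and the plain two-layer MLP are assumptions, and the weights are random because training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
num_nodes, hist_len, horizon = 4, 12, 3
slots_per_day, emb_dim, hidden = 288, 8, 32

# Embedding tables and weights (random here; they would be learned in practice).
node_emb = rng.normal(size=(num_nodes, emb_dim))     # one vector per sequence ID
tod_emb = rng.normal(size=(slots_per_day, emb_dim))  # time-of-day ID
dow_emb = rng.normal(size=(7, emb_dim))              # day-of-week ID
w_series = rng.normal(size=(hist_len, emb_dim)) * 0.1
w1 = rng.normal(size=(4 * emb_dim, hidden)) * 0.1
w2 = rng.normal(size=(hidden, horizon)) * 0.1

def stid_forward(history, tod_ids, dow_ids):
    """history: (batch, num_nodes, hist_len); tod_ids, dow_ids: (batch,)."""
    batch = history.shape[0]
    shape = (batch, num_nodes, emb_dim)
    series_repr = history @ w_series                         # sequence representation
    spatial = np.broadcast_to(node_emb, shape)               # sequence ID embedding
    tod = np.broadcast_to(tod_emb[tod_ids][:, None, :], shape)
    dow = np.broadcast_to(dow_emb[dow_ids][:, None, :], shape)
    x = np.concatenate([series_repr, spatial, tod, dow], axis=-1)
    hidden_act = np.maximum(x @ w1, 0.0)                     # MLP with ReLU
    return hidden_act @ w2                                   # (batch, num_nodes, horizon)

# Every node gets the same history, yet predictions can differ per node
# because each node carries its own ID embedding.
history = np.ones((2, num_nodes, hist_len))
out = stid_forward(history, tod_ids=np.array([10, 200]), dow_ids=np.array([1, 5]))
print(out.shape)
```

The design choice worth noting is that nothing in this sketch exchanges information between nodes; the node identity alone is what lets identical histories map to different forecasts.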
According to the paper's experimental results, this simple method outperforms many earlier graph-based models.
The paper's ablation experiments show that adding the ID embedding of each sequence has a large impact on performance.
2
ST-MLP
Paper title: ST-MLP: A Cascaded Spatio-Temporal Linear Framework with Channel-Independence Strategy for Traffic Forecasting
Download link: https://arxiv.org/pdf/2308.07496v1.pdf
This paper builds a simple and efficient linear model for spatio-temporal forecasting and uses a channel-independence strategy for modeling.
The overall model, shown in the paper's architecture figure, is a cascaded structure. The input is divided into three parts: temporal embedding, spatial embedding, and data embedding. The temporal embedding represents date-related features, such as the hour of the day; the spatial embedding represents spatial features, such as a per-node embedding and the spatial topology; the data embedding represents the time series itself. Each of the three parts has its own MLP. After one part passes through its MLP, the output is concatenated with the next part and fed into the next MLP. Finally, a linear layer produces the prediction. The overall network structure is very simple.
It is worth noting that this modeling process is channel-independent: every MLP operates within a single sequence, and there is no cross-sequence MLP.
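A rough sketch of the cascade and of channel independence, under stated assumptions: the stage order (temporal, then spatial, then data) follows the order in which the text lists the three parts, each stage is reduced to a single linear layer plus ReLU, and all names and sizes (`make_mlp`, `st_mlp_forward`, `hidden`, and so on) are hypothetical. The inputs are random placeholders rather than real embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(in_dim, out_dim):
    """Single linear layer + ReLU with random, untrained weights."""
    w = rng.normal(size=(in_dim, out_dim)) * 0.1
    b = np.zeros(out_dim)
    return lambda x: np.maximum(x @ w + b, 0.0)

# Hypothetical sizes for illustration only.
num_nodes, d_time, d_space, d_data, hidden, horizon = 5, 8, 8, 12, 16, 3

mlp_time = make_mlp(d_time, hidden)
mlp_space = make_mlp(hidden + d_space, hidden)
mlp_data = make_mlp(hidden + d_data, hidden)
w_out = rng.normal(size=(hidden, horizon)) * 0.1  # final linear layer

def st_mlp_forward(temporal, spatial, data):
    """Each input has shape (num_nodes, dim); every row is processed on its own."""
    h = mlp_time(temporal)
    h = mlp_space(np.concatenate([h, spatial], axis=-1))  # concatenate next part
    h = mlp_data(np.concatenate([h, data], axis=-1))      # concatenate next part
    return h @ w_out

temporal = rng.normal(size=(num_nodes, d_time))
spatial = rng.normal(size=(num_nodes, d_space))
data = rng.normal(size=(num_nodes, d_data))
pred = st_mlp_forward(temporal, spatial, data)

# Channel independence: perturbing node 0's series leaves other nodes' outputs unchanged.
data_perturbed = data.copy()
data_perturbed[0] += 1.0
pred_perturbed = st_mlp_forward(temporal, spatial, data_perturbed)
print(np.allclose(pred[1:], pred_perturbed[1:]))
```

Because every weight matrix multiplies along the feature axis only, no operation mixes rows, which is what "no cross-sequence MLP" amounts to in this sketch.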
The MLP block, shown in the paper's figure, consists mainly of a linear layer, a normalization layer, a ReLU activation, and a dropout layer.
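As a minimal sketch of such a block: the text does not say which normalization is used, so layer normalization (without learnable scale and shift) is assumed here, and dropout is implemented as inverted dropout that is active only in training mode. The function name `mlp_block` and its parameters are illustrative.

```python
import numpy as np

def mlp_block(x, w, b, drop_prob=0.1, training=False, rng=None, eps=1e-5):
    """Linear -> normalization (layer norm assumed) -> ReLU -> dropout."""
    h = x @ w + b                                  # linear layer
    mean = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    h = (h - mean) / np.sqrt(var + eps)            # normalize over the feature axis
    h = np.maximum(h, 0.0)                         # ReLU
    if training and drop_prob > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(h.shape) >= drop_prob    # keep each unit with prob 1 - p
        h = h * keep / (1.0 - drop_prob)           # rescale so the expectation is unchanged
    return h

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
w = rng.normal(size=(4, 6))
b = np.zeros(6)
out_eval = mlp_block(x, w, b)                                        # deterministic
out_train = mlp_block(x, w, b, drop_prob=0.5, training=True, rng=rng)
print(out_eval.shape, out_train.shape)
```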
The temporal embedding consists of two features. The first divides a day into multiple slots, each of which corresponds to an ID embedding; the second corresponds to the day of the week. The two embeddings are concatenated to form the temporal embedding, which is fed into the MLP.
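A small sketch of this lookup, assuming 288 slots per day (that is, 5-minute slots, a value not stated in the text) and an arbitrary embedding size. The tables `tod_table` and `dow_table` are random stand-ins for learned embeddings, and `temporal_embedding` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

slots_per_day, emb_dim = 288, 4          # 288 assumes 5-minute slots
tod_table = rng.normal(size=(slots_per_day, emb_dim))  # one row per slot of the day
dow_table = rng.normal(size=(7, emb_dim))              # one row per day of the week

def temporal_embedding(minute_of_day, day_of_week):
    """Map timestamps to [slot embedding | day-of-week embedding]."""
    slot = (np.asarray(minute_of_day) * slots_per_day) // 1440  # 1440 minutes per day
    dow = np.asarray(day_of_week)
    return np.concatenate([tod_table[slot], dow_table[dow]], axis=-1)

# Three timestamps: 00:00 on day 0, 00:05 on day 3, 23:59 on day 6.
emb = temporal_embedding([0, 5, 1439], [0, 3, 6])
print(emb.shape)
```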
The spatial embedding also consists of two parts. The first is based on a predefined graph structure: each node has a learnable embedding, and the adjacency matrix of the predefined graph is used to aggregate these embeddings into a representation for each node. The second part covers spatial information that the predefined graph cannot reflect and is represented directly by a learnable embedding. The two parts are concatenated to form the overall spatial embedding.
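The two parts can be sketched as below. The chain graph, the added self-loops, and the row normalization of the adjacency matrix are assumptions made for illustration; the text only says that the adjacency matrix is used for aggregation. `graph_emb` and `free_emb` are random stand-ins for learnable embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, emb_dim = 4, 3

# Hypothetical predefined graph: a chain 0 - 1 - 2 - 3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
adj_hat = adj + np.eye(num_nodes)                         # add self-loops (assumption)
adj_norm = adj_hat / adj_hat.sum(axis=1, keepdims=True)   # row-normalize (assumption)

graph_emb = rng.normal(size=(num_nodes, emb_dim))  # aggregated over the graph
free_emb = rng.normal(size=(num_nodes, emb_dim))   # independent of the graph

def spatial_embedding():
    """Concatenate graph-aggregated embeddings with free per-node embeddings."""
    aggregated = adj_norm @ graph_emb   # each node averages itself and its neighbours
    return np.concatenate([aggregated, free_emb], axis=-1)

emb = spatial_embedding()
print(emb.shape)
```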
According to the paper's experimental results, the simple ST-MLP model achieves some improvement over earlier, more complex spatio-temporal forecasting models.
3
Summary
Judging from this year's work on linear spatio-temporal forecasting models, simple linear models can indeed achieve good results on spatio-temporal prediction. How to further strengthen linear models and find the performance ceiling of the spatio-temporal prediction problem is a follow-up question worth studying.