Summary of the latest spatiotemporal sequence prediction work!

1

Linear spatiotemporal prediction model

Paper title : ST-MLP: A Cascaded Spatio-Temporal Linear Framework with Channel-Independence Strategy for Traffic Forecasting

Download address : https://arxiv.org/pdf/2308.07496v1.pdf


This paper builds a simple and efficient linear model for spatiotemporal forecasting, using a channel-independence strategy for modeling, i.e. each node's series is processed independently with shared parameters rather than mixed across channels.

The overall structure of the model, shown in the figure below, is a cascade. The input is divided into three parts: temporal embedding, spatial embedding, and data embedding. The temporal embedding encodes date-related features such as the hour of the day; the spatial embedding encodes spatial features such as each node's embedding and the spatial topology; the data embedding encodes the time series itself. Each part first passes through its own MLP; the output of each stage is then concatenated with the next part's embedding and fed into the next MLP. Finally, a linear layer produces the prediction. The overall network structure is very simple.

[Figure: overall cascade structure of ST-MLP]
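As a rough illustration of this cascade, here is a minimal PyTorch-style sketch; the module names, dimensions, and concatenation order are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class CascadedSTMLP(nn.Module):
    """Sketch of a cascaded MLP: temporal -> spatial -> data embeddings,
    each stage concatenating the next embedding before the next MLP.
    Each node/channel is processed independently (channel-independence)."""
    def __init__(self, d_time, d_space, d_data, d_hidden, horizon):
        super().__init__()
        self.mlp_time = nn.Sequential(nn.Linear(d_time, d_hidden), nn.ReLU())
        self.mlp_space = nn.Sequential(nn.Linear(d_hidden + d_space, d_hidden), nn.ReLU())
        self.mlp_data = nn.Sequential(nn.Linear(d_hidden + d_data, d_hidden), nn.ReLU())
        self.head = nn.Linear(d_hidden, horizon)  # final linear prediction layer

    def forward(self, e_time, e_space, e_data):
        # e_*: (batch, num_nodes, d_*); linear layers act on the last dim only,
        # so no information is mixed across nodes/channels
        h = self.mlp_time(e_time)
        h = self.mlp_space(torch.cat([h, e_space], dim=-1))
        h = self.mlp_data(torch.cat([h, e_data], dim=-1))
        return self.head(h)  # (batch, num_nodes, horizon)
```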

2

Sparse graph spatiotemporal prediction

Paper title : Localized Adaptive Spatial-Temporal Graph Neural Network

Download address : https://arxiv.org/pdf/2306.06930.pdf


This article analyzes the graph structure automatically learned by ASTGNN-type (adaptive spatial-temporal graph neural network) models. The analysis results are shown in the figure below, which plots the distribution of edge weights learned by two such models. The vast majority of edge weights are close to 0, which shows that the relationships between nodes tend to be sparse: most node pairs have no correlation strong enough to be useful for spatiotemporal prediction.

[Figure: distribution of learned edge weights]

To introduce sparsity into ASTGNN, this article uses a masking method. Specifically, the adaptive graph learning module learns two matrices of the same size at the same time: the graph adjacency matrix A and a mask matrix M. M indicates whether the edge between two nodes should be pruned, i.e. set to 0; its elements take only the values 1 or 0. The overall form of the loss function is as follows:

[Figure: overall loss function]

The first term L is the spatiotemporal prediction loss, computed with the element-wise product of A and M as the graph structure, i.e. the adjacency matrix after pruning. The second term is a regularization term that pushes M to be as sparse as possible, enforcing the sparsity constraint. M itself is learned directly from the node embeddings through a fully connected layer.
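Based on this description, the objective plausibly takes a form like L_total = L_pred(A ⊙ M) + λ · ‖M‖₁, where the first term is the forecasting loss computed on the pruned graph and the second term penalizes the number of retained edges; the ℓ1 form and the weight λ are assumptions here, not taken from the paper.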

The overall algorithm proceeds as follows: before sparsification, the spatiotemporal network is pre-trained normally, and the edges are then pruned iteratively.

[Figure: overall algorithm procedure]
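A rough sketch of the mask learning described above (learning M from node embeddings through a fully connected layer, applying A ⊙ M, and adding a sparsity penalty); the variable names, the binarization trick, and the penalty weight are assumptions, not the paper's exact procedure:

```python
import torch
import torch.nn as nn

class EdgeMask(nn.Module):
    """Learns a per-edge mask M from node embeddings via a fully connected layer."""
    def __init__(self, d_emb, d_hidden):
        super().__init__()
        self.fc = nn.Linear(d_emb, d_hidden)

    def forward(self, node_emb):
        h = self.fc(node_emb)              # (N, d_hidden)
        scores = torch.sigmoid(h @ h.T)    # (N, N) edge keep-probabilities
        # straight-through binarization: forward pass is 0/1, gradient flows through the sigmoid
        hard = (scores > 0.5).float()
        return hard + scores - scores.detach()

# usage sketch:
# M = EdgeMask(d_emb=32, d_hidden=32)(node_embeddings)
# A_pruned = A * M                                    # element-wise product A ⊙ M
# loss = pred_loss(model(x, A_pruned), y) + lam * M.sum() / M.numel()
```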

3

Long-term traffic forecast model

Paper title : HUTFormer: Hierarchical U-Net Transformer for Long-Term Traffic Forecasting

Download address : https://arxiv.org/pdf/2307.14596v1.pdf


In the field of traffic prediction, previous work has mainly focused on short-term forecasting, such as predicting traffic volume for the next hour. This article targets long-term traffic forecasting, such as forecasting the next day. Long-term traffic forecasting poses many challenges, the core one being how to model global trends and local abrupt changes at the same time.

This paper proposes a hierarchical Transformer-based Encoder-Decoder structure to fuse global and local information. The overall structure of the model is shown in the figure; its core module is the Window Transformer layer. Ordinary Transformers use global attention, whereas the Window Transformer only performs attention within a window, letting the model attend to information at different ranges, from local to global. After each Window Transformer layer, the representations within each window are fused, progressively shrinking the sequence (time) dimension while increasing the hidden dimension. The Decoder uses a similar hierarchical structure, with its layers aligned one-to-one with those of the Encoder for attention.

[Figure: HUTFormer hierarchical Encoder-Decoder structure]
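A rough sketch of the windowed attention plus merging idea (the window size, merge factor, and layer names are assumptions; this is not the paper's implementation):

```python
import torch
import torch.nn as nn

class WindowTransformerLayer(nn.Module):
    """Sketch: attention restricted to non-overlapping windows, followed by a merge
    that halves the sequence length while doubling the hidden size."""
    def __init__(self, d_model, n_heads, window):
        super().__init__()
        self.window = window
        self.attn = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.merge = nn.Linear(2 * d_model, 2 * d_model)  # fuse pairs of adjacent steps

    def forward(self, x):                   # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        w = self.window                      # assumes t is divisible by w and t is even
        x = x.reshape(b * t // w, w, d)      # attention only within each window
        x = self.attn(x)
        x = x.reshape(b, t, d)
        x = x.reshape(b, t // 2, 2 * d)      # merge adjacent steps: shorter sequence, wider hidden
        return self.merge(x)
```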

4

A review of graph learning in time series forecasting

Paper title : A Survey on Graph Neural Networks for Time Series: Forecasting, Classification, Imputation, and Anomaly Detection

Download address : https://arxiv.org/pdf/2307.03759v1.pdf


This is a recently published survey on the application of graph neural networks to time series, covering graph learning for time series forecasting, imputation, anomaly detection, classification, and other time series analysis scenarios.

5

Introducing contextual POI prediction

Paper title : Learning Dynamic Graphs from All Contextual Information for Accurate Point-of-Interest Visit Forecasting

Download address : https://arxiv.org/pdf/2306.15927v1.pdf

POI (point-of-interest) visit forecasting is a typical spatio-temporal prediction problem. To characterize dynamic POI relationships, previous work generally uses dynamic graph modeling: the graph of POI relationships changes at each time step and is learned end-to-end from data. This article follows the same modeling idea, with its core contribution being improved learning of the dynamic graphs.

First, to capture the status of POIs in different types of scenes, the paper aggregates POI sequences by function to obtain coarse-grained sequences, and further builds a global overall sequence, describing the time series at multiple granularities. Then, for each sequence, a GRU+Attention structure generates a sequence representation that serves as part of the input for the subsequent dynamic graph construction.

On the other hand, the paper introduces various types of node information to enrich the features. Each POI's text description is fed into a pre-trained language model to produce a text representation; the language model's parameters are frozen and a fine-tunable fully connected layer is added to adapt it to the downstream task. Finally, the two types of information (sequence + node text features) are merged, and pairwise node distances are computed via inner products and similar measures to construct the dynamic graph.

Based on the above information, the article constructs a distance-based adjacency matrix (geographic distances between POIs) and a semantic adjacency matrix (from the POI text representations extracted by the pre-trained model). A Transformer then computes attention scores between the different time series within a time window, and the two adjacency matrices are used to build a gate that fuses them.
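One simple way to gate two adjacency matrices is sketched below; the paper's actual gate also involves the Transformer attention scores, which are omitted here, so this is an assumption-laden illustration rather than the paper's formulation:

```python
import torch
import torch.nn as nn

class GatedGraphFusion(nn.Module):
    """Sketch: fuse a distance-based and a semantic adjacency matrix with a learned gate."""
    def __init__(self, num_nodes):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(num_nodes, num_nodes))  # per-edge gate logits

    def forward(self, a_dist, a_sem):
        g = torch.sigmoid(self.gate)          # gate values in (0, 1)
        return g * a_dist + (1 - g) * a_sem   # convex combination of the two graphs
```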


6

Contrastive learning model for spatio-temporal fusion

Paper title : Correlated Time Series Self-Supervised Representation Learning via Spatiotemporal Bootstrapping

Download address : https://arxiv.org/abs/2306.06994

There has been much past work on unsupervised pre-training of time series, generally using contrastive learning to train time series encoders in a self-supervised way. However, this prior work has three drawbacks. First, most past methods learn a representation of the entire sequence, while forecasting tasks care more about the representation of each time step, so there is a mismatch between the pre-training and downstream tasks. Second, past work pre-trains on each individual series in isolation, ignoring the relationships between series. Third, previous contrastive pre-training methods often suffer from false negatives when constructing negative samples (samples treated as negatives that should actually be positives), which hurts the model.

To address these problems, this paper proposes a new unsupervised pre-training method for time series with spatio-temporal fusion. The overall structure of the model is shown in the figure below. The BYOL idea replaces the standard contrastive setup: BYOL is a self-supervised method based entirely on positive samples. Here, a masked sequence is used to predict the complete sequence, so there is no need to construct negative samples and hence no false negative problem. At the same time, the sequence reconstruction task is better matched to downstream time series forecasting.

[Figure: overall structure of the spatiotemporal bootstrapping pre-training framework]
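A minimal sketch of a masked, negative-free (BYOL-style) pre-training step; the function names, masking scheme, and MSE objective are assumptions for illustration, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def byol_step(online, target, predictor, x, mask_ratio=0.3, tau=0.99):
    """One BYOL-style step on a batch of series x: (batch, seq_len, dim).
    The online network sees a masked series and must match the target
    network's representation of the full series (positive pairs only)."""
    mask = (torch.rand(x.shape[:2], device=x.device) > mask_ratio).float().unsqueeze(-1)
    z_online = predictor(online(x * mask))    # per-step representations of the masked series
    with torch.no_grad():
        z_target = target(x)                  # representations of the complete series
    loss = F.mse_loss(F.normalize(z_online, dim=-1), F.normalize(z_target, dim=-1))
    # EMA update of the target (stop-gradient) network
    with torch.no_grad():
        for p_t, p_o in zip(target.parameters(), online.parameters()):
            p_t.mul_(tau).add_((1 - tau) * p_o)
    return loss
```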
