Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting: Paper Notes + Machine Translation

Background: Rapid urbanization has brought population growth and, with it, tremendous mobility demands and challenges. Intelligent transportation systems are an important response to these challenges, and traffic prediction is a core part of urban traffic management.

Problem description: The paper focuses on how to accurately predict future traffic conditions, such as traffic flow and speed, passenger demand, etc.

Methods: Traditional forecasting methods employ time series models, which cannot capture the nonlinear correlations and complex spatiotemporal patterns of large-scale traffic. The paper proposes a method called Adaptive Graph Convolutional Recurrent Network (AGCRN). It combines the ideas of graph convolutional neural network (GCN) and recurrent neural network (RNN) to capture the interdependent relationship between traffic flow data in different time periods.

Graph Convolutional Neural Network (GCN): GCN is used to capture the relationship between different traffic nodes. The nodes can represent different traffic intersections or areas in the city. GCN predicts future traffic conditions by learning information from neighboring nodes.

Recurrent Neural Network (RNN): RNN is used to capture temporal dependencies in time series. It helps the model understand how past traffic data affects future data.

Experiments and results: Experiments on real-world traffic datasets verify the performance of the AGCRN model. Experimental results show that AGCRN has high accuracy in traffic prediction.

1 Introduction

The Complexity of Traffic Forecasting

Traffic prediction involves complex internal dependencies (i.e., temporal correlations within a single traffic sequence) and external dependencies (i.e., spatial correlations among multiple related traffic sequences). These sequences come from different sources, for example different loop detectors or intersections for traffic flow and traffic speed prediction, and different stations or areas for passenger demand prediction.

Limitations of traditional methods

Traditional traffic forecasting methods simply use time series models, such as autoregressive integrated moving average models (ARIMA) and vector autoregressive models (VAR). As a result, they fail to capture nonlinear correlations and complex spatiotemporal patterns in large-scale traffic data, and they often ignore the interaction and spatial correlation between different traffic sequences. More advanced methods are therefore needed to improve the accuracy and reliability of traffic predictions.

Research trends

A recent research trend is to adopt deep learning methods and focus on designing new neural network structures to capture the salient spatiotemporal patterns shared by all traffic data sequences. Temporal dependence is modeled using recurrent neural network (RNN) and temporal convolution module (TCN), and spatial correlation is modeled using graph-based convolutional neural network (GCN).

Problem

Although deep learning methods achieve satisfactory results, they are biased toward capturing the patterns shared by all sequences and are therefore not accurate for the fine-grained patterns specific to each data sequence. In addition, existing GCN methods need a predefined similarity or distance metric to generate the connection graph, which requires a large amount of domain knowledge and makes the model very sensitive to the quality of the graph.

Proposed solutions and models

The author proposes two mechanisms to improve existing GCN building blocks to solve the above problems respectively.

1) A Node Adaptive Parameter Learning (NAPL) module is proposed to learn the specific pattern of each traffic sequence.

2) A data-adaptive graph generation (DAGG) module is proposed to infer node embedding attributes from data and generate graphs during the training process.

NAPL: This module allows the model to learn specific patterns or parameters for each node instead of sharing global parameters. It uses the idea of matrix factorization to decompose parameter learning into two smaller parameter matrices, so that specific parameters can be generated for each node.

DAGG: This module allows the model to automatically generate the graph structure from data instead of relying on a predefined graph. It uses learnable node embeddings to generate the graph dynamically, so that spatial relationships in traffic data are captured better.

The author combines these two modules with a recurrent neural network and proposes a unified traffic prediction model, AGCRN. AGCRN is able to capture fine-grained node-specific spatiotemporal correlations in traffic data, and it unifies the node embeddings used by the modified GCN modules.

2 Related Work

Correlated time series prediction 

When discussing developments and trends in correlated time series prediction, the paper notes the growing prominence of deep learning methods. Deep learning methods perform well on time series data because they can automatically capture complex patterns and correlations without manually designed models or features. However, some existing methods require a large amount of training data and many parameters to achieve high performance, which the authors point out as an important issue. In addition, although deep learning performs well in time series prediction, such models sometimes ignore the interaction between different time series, which is one of the directions researchers continue to explore.

GCN based Traffic forecasting

This paragraph reviews research progress in the traffic prediction field. When processing time series data, researchers are paying increasing attention to spatial correlation, and they use methods such as GCN to capture the spatiotemporal characteristics of traffic data and improve the accuracy and generality of traffic prediction. Despite some progress, challenges remain, such as the dependence on predefined spatial connectivity graphs.

Graph Convolutional Networks

When introducing GCN and its related methods, this text emphasizes the versatility and importance of GCN in processing graph data. New research methods try to get rid of the dependence on static predefined connection graphs and allow models to dynamically learn or infer connection relationships from data.

3 Methodology

3.1 Problem Definition

This section introduces the multi-step traffic prediction problem. The setting is that there are multiple correlated univariate time series, represented as X=[X_{1},X_{2},...,X_{n}], where X_{t} denotes the values observed at time step t. The goal is to predict future values of these traffic time series from the observed historical data.

The author uses a function F_{\Theta }, where \Theta represents all the learnable parameters in the model. The task of this function is to predict the data of the next \tau time steps from the data of the past T time steps. The equation is as follows:

\{X_{t+1},X_{t+2},...,X_{t+\tau }\}=F_{\Theta }(X_{t},X_{t-1},...,X_{t-T+1})

In order to more accurately handle the spatial correlation between different traffic time series, further modeling is performed on the graph, where graph G = (V, E, A),

  • V represents the node set, which represents the source of the traffic time series
  • E represents the set of edges
  • A is the adjacency matrix of the graph, indicating the spatial proximity between nodes or traffic time series.

Therefore, the problem is reformulated as:

\{X_{t+1},X_{t+2},...,X_{t+\tau }\}=F_{\Theta }(X_{t},X_{t-1},...,X_{t-T+1},G)

This means that the model predicts future traffic time series values from both the historical data and the spatial relationships in graph G. The goal of the model is to use historical data and the graph structure to make accurate traffic predictions.

3.2 Node Adaptive Parameter Learning

In recent traffic prediction research, GCN is often used to capture the spatial correlation between different traffic time series. The calculation of GCN is based on the Laplacian matrix L of the graph (L=D-A) and is carried out in the spectral domain. The paper "Semi-supervised classification with graph convolutional networks" shows that the graph convolution operation can be approximated by a first-order Chebyshev polynomial expansion. With I_{N} denoting the identity matrix, the formula is as follows:

Z=(I_{N}+D^{-\frac{1}{2}}AD^{-\frac{1}{2}})X\Theta +b

  • A is the adjacency matrix of the graph
  • D is the degree matrix
  • X is the input feature of the GCN layer
  • Z is the output feature
  • Θ is the learnable weight
  • b is the bias
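As a rough illustration of the shared-parameter layer described above, here is a minimal NumPy sketch of a first-order graph convolution, Z = (I + D^{-1/2} A D^{-1/2}) X Θ + b. The function name `gcn_layer`, the toy chain graph and the random weights are assumptions made for the example, not code from the paper.

```python
import numpy as np

def gcn_layer(A, X, Theta, b):
    """First-order graph convolution: Z = (I + D^{-1/2} A D^{-1/2}) X Theta + b."""
    N = A.shape[0]
    deg = A.sum(axis=1)                       # node degrees (diagonal of D)
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5          # isolated nodes keep a zero scale
    A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return (np.eye(N) + A_norm) @ X @ Theta + b

# Toy example: 3 nodes in a chain, 2 input features, 4 output features.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 2))
Theta = rng.normal(size=(2, 4))               # one weight matrix shared by all nodes
b = np.zeros(4)
Z = gcn_layer(A, X, Theta, b)
print(Z.shape)  # (3, 4)
```

Note that `Theta` and `b` are the same for every node here, which is exactly the sharing that NAPL relaxes.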

However, sharing parameters among all nodes is not the best choice for traffic prediction. Because of the dynamic nature of time series data and the various factors that affect traffic, the time series of different nodes may follow diverse patterns. Capturing only the patterns shared by all nodes is therefore not enough for accurate traffic prediction, and the specific patterns of each node need to be learned as well.

In order to solve this problem, the Node Adaptive Parameter Learning (NAPL) module is proposed to enhance GCN. It draws on the idea of matrix factorization.

NAPL learns two smaller parameter matrices: the node embedding matrix E_{G} and the weight pool W_{G}. Their product generates the parameters of the GCN layer, \Theta =E_{G}\cdot W_{G}, so that each node draws its own parameters from the shared weight pool W_{G} according to its row of E_{G}. This can be interpreted as learning node-specific patterns from a set of candidate patterns discovered from all traffic time series. The bias is generated in the same way from a bias pool b_{G}.

Finally, the calculation formula of NAPL-enhanced GCN (i.e. NAPL-GCN) is as follows:

Z=(I_{N}+D^{-\frac{1}{2}}AD^{-\frac{1}{2}})XE_{G}W_{G}+E_{G}b_{G}

By learning node-specific patterns in addition to spatial correlations, this approach aims to improve the prediction performance on traffic time series.
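The factorization idea can be sketched in NumPy as follows. This is only an illustration under assumed shapes (node embeddings of size d, a weight pool of shape d × C × F, a bias pool of shape d × F); the function name `napl_gcn` and the sizes used in the parameter count are hypothetical.

```python
import numpy as np

def napl_gcn(S, X, E_G, W_G, b_G):
    """Graph convolution with node-specific weights drawn from a shared pool.

    S   : (N, N) graph support, e.g. I + D^{-1/2} A D^{-1/2}
    X   : (N, C) input features
    E_G : (N, d) node embeddings
    W_G : (d, C, F) shared weight pool
    b_G : (d, F) shared bias pool
    """
    Theta = np.einsum("nd,dcf->ncf", E_G, W_G)   # (N, C, F): one weight matrix per node
    bias = E_G @ b_G                             # (N, F): one bias vector per node
    agg = S @ X                                  # (N, C): aggregated neighbor features
    return np.einsum("nc,ncf->nf", agg, Theta) + bias

# Parameter count for assumed sizes N=307, d=10, C=F=64.
N, d, C, F = 307, 10, 64, 64
full = N * C * F                 # a separate weight matrix stored for each node
factored = N * d + d * C * F     # node embeddings plus the shared pool
print(full, factored)  # 1257472 44030
```

The count shows why the factorization matters: giving every node its own weight matrix directly would need far more parameters than storing a small embedding per node plus one shared pool.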

3.3 Data Adaptive Graph Generation

A problem with existing GCN-based traffic prediction models is that these models need to predefine an adjacency matrix to perform graph convolution operations. Usually, the adjacency matrix A is calculated through distance or similarity measures and is used to describe the connection relationship between nodes.

Distance function: Define the graph according to the geographical location between nodes. The connectivity between nodes is affected by the geographical location between them.

Similarity function: Defines the connectivity between nodes based on the similarity of node attributes or the time series itself.

There are some problems with these methods. The predefined graph cannot obtain complete information containing spatial dependence and is not directly related to the prediction task, which may lead to considerable bias. Furthermore, without appropriate knowledge, these methods cannot be adapted to other domains, making existing GCN-based models ineffective.

In order to solve this problem, the author proposes the DAGG module, which automatically infers the hidden dependencies from the data. The DAGG module first initializes a learnable node embedding dictionary E_{A} that represents all nodes, and then infers the spatial dependencies between nodes from the similarity of their embeddings. Specifically, it multiplies E_{A} by its transpose, applies ReLU, and then normalizes the result with softmax. The result directly plays the role of the normalized adjacency matrix D^{-\frac{1}{2}}AD^{-\frac{1}{2}}, so there is no need to generate A first and then compute the Laplacian matrix:

D^{-\frac{1}{2}}AD^{-\frac{1}{2}}=softmax(ReLU(E_{A}\cdot E_{A}^{T}))

During training, E_{A} is updated automatically to learn the hidden dependencies between different traffic time series and to obtain an adaptive matrix for graph convolution. Finally, the DAGG-enhanced GCN model can be expressed in the following way:

Z=(I_{N}+softmax(ReLU(E_{A}\cdot E_{A}^{T})))X\Theta

The benefit of this approach is that it learns the spatial dependencies between nodes from the data without relying on a predefined graph structure. This improves the model's adaptability to different traffic data and lets it capture spatial relationships better, which in turn improves prediction performance. Finally, for extremely large graphs, methods such as graph partitioning and subgraph training can be used to reduce the high computational cost.
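A minimal NumPy sketch of the graph generation step described in this section: take the node embeddings, compute their pairwise dot products, apply ReLU, then a softmax over each row. The function name `adaptive_adjacency` and the random embeddings are assumptions; in the real model the embeddings would be trained by gradient descent, which is not shown here.

```python
import numpy as np

def adaptive_adjacency(E_A):
    """Normalized adjacency softmax(ReLU(E_A E_A^T)) built from node embeddings."""
    scores = np.maximum(E_A @ E_A.T, 0.0)                 # ReLU drops negative similarities
    scores = scores - scores.max(axis=1, keepdims=True)   # stabilize the exponentials
    exp = np.exp(scores)
    return exp / exp.sum(axis=1, keepdims=True)           # softmax over each row

rng = np.random.default_rng(0)
E_A = rng.normal(size=(4, 3))      # 4 nodes, embedding size 3 (arbitrary choice)
A_adapt = adaptive_adjacency(E_A)
print(A_adapt.shape)               # (4, 4)
print(A_adapt.sum(axis=1))         # every row sums to 1 up to rounding
```

Because the softmax already normalizes each row, the matrix can be used directly in place of the normalized adjacency in the graph convolution.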

3.4 Adaptive Graph Convolutional Recurrent Network

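AGCRN aims to capture both spatial and temporal correlations in traffic time series by integrating "NAPL-GCN", "DAGG" and "Gated Recurrent Units" (GRU).

Specifically, AGCRN replaces the MLP layers in GRU with "NAPL-GCN" to learn node-specific patterns. Furthermore, it uses the "DAGG" module to automatically discover spatial dependencies. The following is the formal representation of AGCRN:

\tilde{A}=softmax(ReLU(EE^{T}))

z_{t}=\sigma (\tilde{A}[X_{:,t},h_{t-1}]EW_{z}+Eb_{z})

r_{t}=\sigma (\tilde{A}[X_{:,t},h_{t-1}]EW_{r}+Eb_{r})

\hat{h}_{t}=tanh(\tilde{A}[X_{:,t},r\odot h_{t-1}]EW_{\hat{h}}+Eb_{\hat{h}})

h_{t}=z\odot h_{t-1}+(1-z)\odot \hat{h}_{t}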

  • X_{:,t} and h_{t} represent the input and output of time step t respectively
  • [·] indicates connection operation
  • z and r represent the update gate and the reset gate respectively
  • E, W_{z}, W_{r}, W_{\hat{h}}, b_{z}, b_{r} and b_{\hat{h}} are the parameters that can be learned in AGCRN

Similar to GRU, all parameters in AGCRN can be trained end-to-end with backpropagation through time.

The key points of this model are:

  1. Node-specific pattern learning: By using "NAPL-GCN" to replace the traditional MLP layer, AGCRN is able to learn the specific pattern of each node to better capture the spatial relationship between nodes.

  2. Automatic discovery of spatial dependencies: Using the "DAGG" module, AGCRN can automatically discover spatial dependencies between nodes from data without the need for a predefined graph structure.

  3. Parameter sharing: AGCRN unifies all embedding matrices into a single E instead of learning independent node embedding matrices in the different "NAPL-GCN" layers and in "DAGG". This helps keep node embeddings consistent among all GCN blocks and improves the interpretability of the model.
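To make the recurrence in this section more concrete, here is a small NumPy sketch of one AGCRN-style cell: a GRU whose gate transforms are graph convolutions with node-specific weights, with the graph inferred from the same embedding matrix E. It is a forward pass only, under assumed shapes; the helper names, the `params` dictionary layout and the toy sizes are all hypothetical and not taken from the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax_rows(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def node_adaptive_gconv(A_tilde, inp, E, W_pool, b_pool):
    """A_tilde @ inp, transformed by per-node weights E.W_pool plus bias E.b_pool."""
    W = np.einsum("nd,dio->nio", E, W_pool)            # (N, in, out)
    return np.einsum("ni,nio->no", A_tilde @ inp, W) + E @ b_pool

def agcrn_cell(x_t, h_prev, E, params):
    """One recurrent step. x_t: (N, C), h_prev: (N, H), E: (N, d)."""
    A_tilde = softmax_rows(np.maximum(E @ E.T, 0.0))   # graph inferred from E
    xh = np.concatenate([x_t, h_prev], axis=1)
    z = sigmoid(node_adaptive_gconv(A_tilde, xh, E, params["Wz"], params["bz"]))
    r = sigmoid(node_adaptive_gconv(A_tilde, xh, E, params["Wr"], params["br"]))
    xrh = np.concatenate([x_t, r * h_prev], axis=1)
    h_hat = np.tanh(node_adaptive_gconv(A_tilde, xrh, E, params["Wh"], params["bh"]))
    return z * h_prev + (1.0 - z) * h_hat

# Toy run: 5 nodes, 1 input feature, hidden size 8, embedding size 3, 12 steps.
rng = np.random.default_rng(0)
N, C, H, d, T = 5, 1, 8, 3, 12
E = rng.normal(size=(N, d))
params = {}
for g in ("z", "r", "h"):
    params["W" + g] = rng.normal(scale=0.1, size=(d, C + H, H))
    params["b" + g] = np.zeros((d, H))
h = np.zeros((N, H))
for x_t in rng.normal(size=(T, N, C)):
    h = agcrn_cell(x_t, h, E, params)
print(h.shape)  # (5, 8)
```

The single matrix `E` is used both to build the graph and to generate the per-node weights, which mirrors the parameter sharing described in point 3.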

3.5 Multi-step traffic prediction

Stacked "AGCRN" layers are used as an encoder to capture node-specific spatiotemporal patterns and to represent the input (i.e., historical data) as H \in R^{N \times d_{o}}. The traffic predictions of all nodes for the next \tau steps are then obtained directly by a linear transformation that projects the representation from R^{N \times d_{o}} to R^{N \times \tau }.

Unlike traditional stepwise sequential prediction, the method here does not require stepwise generation of outputs, which helps reduce time consumption.
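The direct multi-step output can be pictured as a single matrix product. The sketch below uses random stand-ins for the encoder output and the projection weights; the hidden size d_o = 64 and the names `W_out` and `b_out` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d_o, tau = 307, 64, 12                  # d_o = 64 is an assumed hidden size
H = rng.normal(size=(N, d_o))              # stand-in for the encoder representation
W_out = rng.normal(scale=0.1, size=(d_o, tau))   # hypothetical projection weights
b_out = np.zeros(tau)

# All tau future steps for all N nodes are produced in one shot,
# with no step-by-step decoding loop.
Y_hat = H @ W_out + b_out
print(Y_hat.shape)  # (307, 12)
```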

The training objective is to minimize the L1 loss over the multi-step prediction. The loss function of AGCRN for multi-step traffic prediction can therefore be expressed as:

L(W_{\theta })=\sum_{i=t+1}^{t+\tau }\left | X_{:,i}-{X}'_{:,i} \right |

  • W_{\theta } represents all learnable parameters in the network
  • X_{:,i} is the actual observed value
  • {X}'_{:,i} is the predicted value of the model at time step i

The model is trained with backpropagation and the Adam optimizer, with the goal of minimizing the loss function to improve the accuracy of predictions.
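For reference, the L1 objective is just the sum of absolute errors over nodes and predicted steps. The sketch below uses a hypothetical helper `l1_loss` and a tiny hand-made example; many implementations average instead of summing, which only rescales the loss.

```python
import numpy as np

def l1_loss(y_true, y_pred):
    """Sum of absolute errors over all nodes and all predicted steps."""
    return np.abs(y_true - y_pred).sum()

y_true = np.array([[1.0, 2.0],
                   [3.0, 4.0]])     # 2 nodes, 2 future steps
y_pred = np.array([[1.5, 2.0],
                   [2.0, 5.0]])
print(l1_loss(y_true, y_pred))  # 2.5
```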

4 Experiments

4.1 Datasets

To evaluate the performance, experiments are conducted on two publicly available real-world traffic datasets: PeMSD4 and PeMSD8.

PeMSD4: The PeMSD4 dataset contains traffic flow data from the San Francisco Bay Area, collected by 307 loop detectors from January 1, 2018, to February 28, 2018.

PeMSD8: The PeMSD8 dataset contains traffic flow information collected from 170 loop detectors in the San Bernardino area from July 1, 2016, to August 31, 2016.

Data preprocessing: Missing values in the datasets are filled by linear interpolation. Both datasets are then aggregated into 5-minute windows, resulting in 288 data points per day. In addition, we use standard normalization to normalize the datasets, which makes the training process more stable.
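The two preprocessing steps can be sketched with NumPy as below. This assumes that "standard normalization" means z-score scaling (subtract the mean, divide by the standard deviation); the helper names and the toy series are made up for the example.

```python
import numpy as np

def fill_missing_linear(series):
    """Fill NaNs in a 1-D series by linear interpolation between known points."""
    series = np.asarray(series, dtype=float)
    idx = np.arange(series.size)
    known = ~np.isnan(series)
    return np.interp(idx, idx[known], series[known])

def standardize(data):
    """Z-score normalization; also returns (mean, std) for undoing it later."""
    mean, std = data.mean(), data.std()
    return (data - mean) / std, (mean, std)

flow = np.array([10.0, np.nan, 14.0, 16.0, np.nan, np.nan, 22.0])
filled = fill_missing_linear(flow)
print(filled)  # [10. 12. 14. 16. 18. 20. 22.]
normed, (mean, std) = standardize(filled)
```

In practice the mean and standard deviation would usually be computed on the training portion only and reused for validation and test data, and predictions would be mapped back with `pred * std + mean` before computing error metrics.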

For multi-step traffic prediction, we use one hour of historical data to predict the next hour of data: 12 steps of historical data are organized as input, and the following 12 steps as output. We divide each dataset into training, validation and test sets in chronological order, with a split ratio of 6:2:2 for both datasets. Although our method does not require predefined graphs, predefined graphs are used for the baselines. Detailed dataset statistics are provided in the appendix.
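One way to build the 12-in / 12-out samples and the chronological 6:2:2 split is sketched below. The function names and the synthetic two-day array are assumptions, and a real pipeline might split the raw series before windowing rather than after.

```python
import numpy as np

def make_windows(data, T=12, tau=12):
    """Slice a (steps, N) array into inputs of length T and targets of length tau."""
    n = data.shape[0] - T - tau + 1
    X = np.stack([data[i:i + T] for i in range(n)])
    Y = np.stack([data[i + T:i + T + tau] for i in range(n)])
    return X, Y

def chrono_split(X, Y, ratios=(0.6, 0.2, 0.2)):
    """Split samples in time order into train / validation / test."""
    n = len(X)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return ((X[:n_train], Y[:n_train]),
            (X[n_train:n_train + n_val], Y[n_train:n_train + n_val]),
            (X[n_train + n_val:], Y[n_train + n_val:]))

# Synthetic data: 2 days at 288 steps per day, 3 nodes.
data = np.arange(2 * 288 * 3, dtype=float).reshape(2 * 288, 3)
X, Y = make_windows(data)
train, val, test = chrono_split(X, Y)
print(X.shape, Y.shape)  # (553, 12, 3) (553, 12, 3)
```

No shuffling is applied before the split, so the test samples always come later in time than the training samples.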

5 Discussion

Multivariate or correlated time series forecasting tasks are fundamental tasks in many application areas, including epidemic spread forecasting, meteorological (e.g., air quality, rainfall) forecasting, stock forecasting, and sales forecasting. Although the task of this paper is traffic prediction, the two proposed adaptive modules and our AGCRN model can also be adapted to various multivariate/correlated time series prediction tasks individually or jointly. This means that these methods are general and can be applied to multiple fields, not just traffic prediction.

The method proposed in this paper can automatically discover the interdependencies between different related time series from the data. This is very important for many related time series forecasting problems, because in some cases it is difficult to define the graph structure or connection relationship between these time series in advance.

The author mentioned future work directions, which will focus on two aspects of scalability:

  1. Data aspect - verify the performance of AGCRN on more time series prediction tasks;
  2. Model aspect - apply NAPL and DAGG to more GCN-based traffic prediction models.

This further emphasizes the generalizability of their approach and potential directions for future research.

6 Conclusion

In this paper, we propose to enhance traditional graph convolutional networks with node-adaptive parameter learning and data-adaptive graph generation modules, which are used to learn node-specific patterns and discover spatial correlations from data, respectively. Based on these two modules, we further propose an adaptive graph convolutional recurrent network, which can automatically capture node-specific spatial and temporal correlations in time series data without pre-defining the graph.

This work has broad potential for social and commercial applications, especially in the era of big data. The adaptive modules enhance the robustness of data analysis and related applications for dynamic, interdependent time series data, thereby helping to better model and analyze graph-structured multi-channel data with complex explicit and implicit correlations. This is important for solving economic and social problems around the world, such as predicting influenza outbreaks, economic growth and climate change, and has the potential to accelerate progress in related research.

However, there may be downsides to this effort, such as fairness issues that may arise on ride-sharing platforms. If taxi supply cannot meet demand, the platform may overemphasize predicted high-demand areas, which may increase wait times for travelers in low-demand areas. This highlights the need to consider issues of fairness and balance when using data-driven models to ensure that all groups in society benefit.


Origin blog.csdn.net/DW_css/article/details/132964810