[Paper Reading] A Topological Information Protected Federated Learning Approach for Traffic Speed Forecasting

Paper information

FASTGNN: A Topological Information Protected Federated Learning Approach for Traffic Speed Forecasting

Original address: https://ieeexplore.ieee.org/document/9340313

Summary

Federated learning has been applied to various tasks in intelligent transportation systems to protect data privacy through decentralized training schemes. The majority of the state-of-the-art models in intelligent transportation systems (ITS) are graph neural network (GNN)-based for spatial information learning. When applying federated learning to ITS tasks with GNN-based models, the existing frameworks can only protect the data privacy but ignore that of the topological information of transportation networks. In this article, we propose a novel federated learning framework to tackle this problem. Specifically, we introduce a differential privacy-based adjacency matrix preserving approach for protecting the topological information. We also propose an adjacency matrix aggregation approach to allow local GNN-based models to access the global network for a better training effect. Furthermore, we propose a GNN-based model named attention-based spatial-temporal graph neural networks (ASTGNN) for traffic speed forecasting. We integrate the proposed federated learning framework and ASTGNN as FASTGNN for traffic speed forecasting. Extensive case studies on a real-world dataset demonstrate that FASTGNN can develop accurate forecasting under the privacy preservation constraint.

Contributions

  1. We propose a topological information-preserved FL framework, FASTGNN, for traffic speed prediction. This framework integrates GNN-based predictors with advanced spatio-temporal techniques. Such a framework can provide strongly privacy-preserving traffic speed predictions by training models locally across organizations without exchanging raw data or topological information.
  2. In the proposed FL framework, we introduce a DP-based adjacency matrix preservation method to preserve topological information. We also develop an adjacency matrix aggregation mechanism to generate a preserved global network adjacency matrix. These two approaches ensure that our framework achieves a trade-off between privacy and performance.
  3. A series of comprehensive case studies are conducted on real-world traffic datasets to demonstrate the effectiveness of the proposed FASTGNN framework.

PRELIMINARY

Traffic Speed Forecasting on Transportation Networks

A traffic network can be represented by an undirected graph G = (V, E, A), where V is the node set (we define each node as a road segment), E is the edge set, and A is the adjacency matrix of G. For all v_i, v_j ∈ V, A_ij = 1 if v_i and v_j are connected, and A_ij = 0 otherwise. Denote the traffic speed observed on G as a graph-wide feature matrix X, and let X_t be the traffic speed observation at time t. The problem can then be formulated as learning a function f(·) that, given the historical traffic speeds (X_{t−n+1}, …, X_t), predicts the traffic speeds at the subsequent T′ timestamps (X_{t+1}, …, X_{t+T′}).
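As a minimal illustration of this graph representation (a hypothetical 4-segment network invented for this example, not from the paper), the symmetric adjacency matrix can be built as:

```python
import numpy as np

# Hypothetical network: 4 road segments with edges (0,1), (1,2), (2,3).
edges = [(0, 1), (1, 2), (2, 3)]
N = 4

A = np.zeros((N, N), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1  # undirected graph: adjacency is symmetric

print(A)
```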

Federated Learning on Transportation Networks

In this paper, we construct an FL framework for traffic speed prediction on transportation networks. We define the "global network" G as the entire transportation network of a region. This network is partitioned among several organizations (e.g., corporations, governments). Let O = {O_1, O_2, …, O_p} denote the set of organizations, where p is the number of organizations, and let G* = {G*_1, G*_2, …, G*_p} denote the set of local networks they operate. Each organization's database D_i collects traffic speed data from its local network. In particular, we have D_i = {X_i, G*_i}, where X_i and G*_i denote the historical traffic speed data and the topological information collected from the local network, respectively.

Furthermore, this paper assumes that there is no overlapping area or data between organizations, i.e., for any two organizations i and j, D_i ∩ D_j = ∅. Our goal is to train a robust model in the cloud that can predict global network-wide traffic speeds using the local traffic speed data in D_i. However, due to privacy concerns, these organizations are prohibited from sharing the raw traffic data and topology information of the local networks they operate (i.e., each organization only has access to its own local network).

To achieve this goal under the aforementioned privacy constraints, a Secure Parameter Aggregation Mechanism (SPAM) needs to be employed in the FL framework. Specifically, each organization O_i constructs a graph-based deep learning model M_i that uses the local training data from D_i and the topological information of the corresponding local network G*_i to compute an updated set of model parameters φ_i. After all organizations complete their parameter updates, the updated parameters are uploaded to the cloud. A global model is finally developed by aggregating these uploaded parameters.

METHODOLOGY

Attention-Based Spatial-Temporal Graph Neural Networks

For the network-wide traffic speed prediction problem, ASTGNN is proposed as the local prediction model. As shown in the figure, ASTGNN consists of four modules: a feature embedding module, a spatial dependency capture module, a temporal dependency capture module, and a prediction output module.
[Figure: overall architecture of ASTGNN]

Feature Embedding Module

The feature embedding module converts the input time-series data into feature vectors that the spatial dependency capture module can then process. Specifically, given a sequence of time-series values of length T, each feature vector can be expressed as:

h_t = [x_{t−F+1}, x_{t−F+2}, …, x_t]

where h_t is the network-wide feature vector at time t and F is the dimension of the vector, whose physical meaning is equivalent to the size of the past window. This means that we embed the sequence data into feature vectors whose length equals the past window size. In this way, we obtain the sequence of feature vectors h_1, h_2, …, h_T.
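The windowing above can be sketched as follows; the toy series and window size F = 3 are assumptions for illustration only:

```python
import numpy as np

x = np.arange(10.0)   # toy single-node speed series x_1 ... x_10
F = 3                 # past window size, i.e., the feature dimension

# h_t stacks the F most recent observations ending at time t.
H = np.stack([x[t - F + 1 : t + 1] for t in range(F - 1, len(x))])
print(H.shape)        # one F-dimensional feature vector per valid t
```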

Spatial Dependency Capture Module

The spatial dependency capture module is used to exploit the spatial dependencies (graphs) between different road segments (nodes) in the transportation network. We build this module by following a Graph Attention Network (GAT), which exploits an attention mechanism to obtain spatial correlation. The operation steps of this module can be described as the following steps.

  1. We start by computing the attention score. For any ordered pair of nodes (v_i, v_j) ∈ V, the attention score from v_j to v_i can be expressed as:
    e_ij^t = a^T · concat(W h_i^t, W h_j^t)
    where e_ij^t is the attention score, h_i^t and h_j^t are the feature vectors of nodes v_i and v_j at time t, respectively, W is the weight matrix that transforms the feature vectors into a higher dimension F_h, concat(·) denotes the concatenation operation, a is the weight vector, and T denotes the transpose operation.
  2. Subsequently, we use activation functions to normalize the attention scores and obtain the attention coefficients:
    α_ij^t = softmax_j(LeakyReLU(e_ij^t))
    where α_ij^t is the attention coefficient, LeakyReLU(·) is the LeakyReLU activation function, and softmax(·) is the softmax activation function.
  3. Next, we filter the resulting attention coefficients so that only those of connected node pairs survive, which can be formulated as α_ij^t ← A_ij · α_ij^t, where A_ij is the entry for nodes v_i and v_j in the adjacency matrix A. When A_ij = 1 the attention coefficient is kept; otherwise it is discarded (i.e., set to 0).
  4. Finally, the feature vector of node v_i is updated using the attention coefficients:
    h̃_i^t = σ(Σ_{j∈N(i)} α_ij^t W^o h_j^t)
    where h̃_i^t is the updated feature vector of node v_i at time t, regarded as the output of the module, N(i) is the set of node v_i's immediate neighbors, and W^o is the weight matrix.
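The steps above can be sketched with dense NumPy operations. All weights here are random toy values (a real GAT learns them), and this is only a sketch of the computation, not the authors' implementation:

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

rng = np.random.default_rng(0)
N, F, Fh = 4, 3, 5                     # nodes, input dim, hidden dim
H = rng.normal(size=(N, F))            # feature vectors h_i^t for each node
W = rng.normal(size=(Fh, F))           # shared weight matrix
a = rng.normal(size=(2 * Fh,))         # attention weight vector
A = np.array([[0, 1, 0, 0],            # toy chain adjacency matrix
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

Wh = H @ W.T                           # transformed features, shape (N, Fh)
# Step 1: e_ij = a^T concat(W h_i, W h_j)
e = np.array([[a @ np.concatenate([Wh[i], Wh[j]]) for j in range(N)]
              for i in range(N)])
# Steps 2-3: LeakyReLU, mask unconnected pairs, row-wise softmax
e = leaky_relu(e)
e = np.where(A > 0, e, -np.inf)        # only connected pairs survive
alpha = np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)
# Step 4: aggregate neighbor features with the attention coefficients
H_new = alpha @ Wh
print(H_new.shape)                     # (4, 5)
```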

Temporal Dependency Capture Module

The temporal dependency capture module aims to learn the underlying temporal dependencies of the data. We use a two-layer GRU neural network in this module. The GRU introduces a set of gating units and cell states to process the input information, which alleviates the vanishing gradient problem during learning. There are two types of gate units, the reset gate r and the update gate z. Given the input data x_t, the hidden-layer output h_t^g can be calculated as:

r_t = σ(W_r · [h_{t−1}^g, x_t] + b_r)
z_t = σ(W_z · [h_{t−1}^g, x_t] + b_z)
h̃_t = tanh(W_h · [r_t ⊙ h_{t−1}^g, x_t] + b_h)
h_t^g = (1 − z_t) ⊙ h_{t−1}^g + z_t ⊙ h̃_t
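A single GRU step following these gate equations can be sketched as below. The weights are random toy values, biases are omitted for brevity, and the concatenated weight layout is an assumption for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    """One GRU step: update gate z, reset gate r, candidate state, blend."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ hx)                       # update gate
    r = sigmoid(Wr @ hx)                       # reset gate
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))
    return (1 - z) * h_prev + z * h_cand       # new hidden state

rng = np.random.default_rng(1)
d_in, d_h = 3, 4
Wz, Wr, Wh = (rng.normal(size=(d_h, d_h + d_in)) for _ in range(3))
h = np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):         # run 5 time steps
    h = gru_step(x_t, h, Wz, Wr, Wh)
print(h.shape)
```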

Prediction Output Module

A fully connected layer is integrated in this module to generate the future timestamped traffic speeds. The linear transformation performed by the fully connected layer is formulated as:

ŷ = W^(fc) h^g + b

where W^(fc) is the weight matrix mapping the hidden output h^g of the GRU in the temporal module to the predicted output ŷ, and b is the bias.

Federated Learning Framework for ASTGNN

As shown in the figure, each organization runs an ASTGNN as its local model, whose inputs are the traffic speed data and topology information from its local traffic database. A DP-based adjacency matrix preserving algorithm is implemented on the organization side to protect the local topology information. The cloud server is responsible for aggregating the preserved local topology information and the ASTGNN model parameters, and for broadcasting the aggregated topology information. The relevant algorithms are elaborated below.

[Figure: overview of the FASTGNN federated learning framework]

  1. FASTGNN Communication Protocol
    As defined in Section III-B, each organization only has access to its own traffic data and local network topology information for local model training. One problem with training local models using only local network topology information is that the local network does not contain all the topological information ASTGNN needs to compute the attention coefficients. This problem may degrade the final learning performance (a relevant experimental comparison is presented in Section V-C). Therefore, the topological information of the global network must be fed to the local models for better results. To achieve this without compromising the privacy of the local network topology information, we propose the FL communication protocol shown in Algorithm 1.

The following subsections detail the topology information privacy protection algorithm, the local-network topological information aggregation mechanism, SPAM, and the overall FL process.

  2. DP-Based Adjacency Matrix Preserving
    In this paper, we use the adjacency matrix of the local network as the carrier of topology information. We introduce a DP-based method to provide privacy protection for the adjacency matrix while maintaining its utility in the learning process of ASTGNN. Given the adjacency matrix A to be protected, the algorithm proceeds as follows:
    (1) Generate two Gaussian random matrices R^(p) ∈ R^{N×M} and R^(q) ∈ R^{N×M}, where M is the number of random projections. Each entry of R^(p) and R^(q) is independently sampled from a Gaussian distribution.
    (2) Compute the projection matrix A^(p) = A R^(p). By doing so, each row of A is projected from the high-dimensional R^N to the low-dimensional R^M.
    (3) Perturb A^(p) with the Gaussian random matrix R^(q): Ã = A^(p) + R^(q).

The perturbed matrix Ã is treated as the preserved version of the original adjacency matrix A. GNN-based models mainly use the top eigenvectors of the adjacency matrix to compute spatial correlation.

Adopting the random projection described in step (2) preserves the top eigenvectors of A, which guarantees the validity of the preserved adjacency matrix in the subsequent ASTGNN predictor. Furthermore, the algorithm only involves a small amount of random perturbation, which further increases the utility of the perturbed matrix. In the case studies of this work, we empirically set M = 10 and σ = 0.5.
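Under the stated settings (M = 10, σ = 0.5), the three-step preserving procedure might be sketched as follows. The text does not specify the variance of the projection matrix R^(p), so the 1/√M scaling here is an assumption, and the toy adjacency matrix is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
N, M, sigma = 6, 10, 0.5          # M and sigma as set empirically in the text

# Toy symmetric 0/1 local-network adjacency matrix.
A = rng.integers(0, 2, size=(N, N))
A = np.triu(A, 1)
A = A + A.T

# Step (1): two Gaussian random matrices.
R_p = rng.normal(0, 1.0 / np.sqrt(M), size=(N, M))  # random projection (scale assumed)
R_q = rng.normal(0, sigma, size=(N, M))             # perturbation noise

# Step (2): project each row of A from R^N to R^M.
A_p = A @ R_p
# Step (3): perturb; A_tilde is the preserved adjacency matrix.
A_tilde = A_p + R_q
print(A_tilde.shape)              # (6, 10)
```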

  3. Local-Network Topological Information Aggregation Mechanism
    The FASTGNN communication protocol requires the cloud server to aggregate the uploaded preserved adjacency matrices. Therefore, we propose an adjacency matrix aggregation mechanism. Given a set of uploaded protected local-network adjacency matrices Ã_1, Ã_2, …, Ã_p, where p is the number of involved local networks, the matrices differ in size. We therefore first use a matrix alignment method to give them the same size while preserving their own topological information. Specifically, as shown in the figure, we use zero padding to align their dimensions to the size of the global network, resulting in a set of aligned matrices.
[Figure 3: zero-padding alignment of the local adjacency matrices]

Furthermore, considering the importance of the connectivity between different local networks (the shaded regions in Fig. 3) for learning attention, we construct random connections for them. Specifically, we use the method introduced in Section IV-B2 to generate a Gaussian random matrix with the same size as the shaded area and symmetrically replace the corresponding parts. Finally, the aggregated preserved adjacency matrix is obtained by adding the aligned matrices, which can be expressed as

Ã^(agg) = Σ_{i=1}^{p} Ã_i^(aligned)
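The zero-padding alignment and summation can be sketched as below. The two toy local networks and their contiguous block placement in the global index order are assumptions; the Gaussian random connections for the off-diagonal (cross-network) blocks described above are omitted for brevity:

```python
import numpy as np

def align_to_global(A_local, offset, N_global):
    """Zero-pad a local adjacency matrix into the global coordinate frame,
    placing it as a diagonal block starting at `offset`."""
    n = A_local.shape[0]
    A_aligned = np.zeros((N_global, N_global))
    A_aligned[offset:offset + n, offset:offset + n] = A_local
    return A_aligned

# Two toy local networks of sizes 2 and 3 inside a 5-node global network.
A1 = np.array([[0, 1],
               [1, 0]])
A2 = np.array([[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]])

# Aggregation: sum of the aligned matrices.
A_agg = align_to_global(A1, 0, 5) + align_to_global(A2, 2, 5)
print(A_agg)
```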
  4. Learning Process of FASTGNN
    In FASTGNN, we use the FedAvg algorithm as the SPAM to aggregate the uploaded parameters and obtain the global model.

Finally, as shown in Algorithm 2, the whole learning process of each round in FASTGNN consists of three steps:
[Algorithm 2: the learning process of FASTGNN]

(1) The cloud server broadcasts the global model with initial parameters and the protected global network adjacency matrix to each organization.
(2) Each organization trains with local data and updates the global model parameters.
(3) The server aggregates the model parameters trained by each organization through the federated averaging algorithm to obtain a new global model.
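Step (3), federated averaging, can be sketched as a per-parameter mean over the organizations' updates. The parameter dictionaries are toy values and equal client weights are assumed (FedAvg normally weights by local dataset size):

```python
import numpy as np

def fedavg(local_params, weights=None):
    """Average each parameter tensor across organizations (FedAvg).
    `local_params` is a list of parameter dicts, one per organization."""
    if weights is None:
        weights = [1.0 / len(local_params)] * len(local_params)
    return {k: sum(w * p[k] for w, p in zip(weights, local_params))
            for k in local_params[0]}

# Two organizations with toy parameter dictionaries.
p1 = {"W": np.array([1.0, 2.0]), "b": np.array([0.0])}
p2 = {"W": np.array([3.0, 4.0]), "b": np.array([2.0])}
global_params = fedavg([p1, p2])
print(global_params["W"])   # [2. 3.]
```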

  5. Theoretical Discussion of DP-Based Adjacency Matrix Preserving on Model Performance
    Many existing studies have shown that the noise added to data by DP algorithms may degrade learning and further affect model performance. In our proposed method, the noise is added to the adjacency matrix instead of the data. During the learning process of each local model, the DP-processed and aggregated global adjacency matrix Ã^(agg) is used only to filter the attention coefficients described in the spatial dependency capture module. Since Ã^(agg) approximates a binary matrix (i.e., a (0, 1)-matrix) after DP processing and aggregation, the values of the attention coefficients are not significantly affected. Therefore, promising final model performance can be guaranteed. Moreover, the remaining performance loss is due to the difference between the original global topology and the new global topology obtained after DP processing and aggregation of the adjacency matrices.

Origin blog.csdn.net/weixin_43598687/article/details/131253469