Article directory
Paper information
Cross-Node Federated Graph Neural Network for Spatio-Temporal Data Modeling
Original link: Cross-Node Federated Graph Neural Network for Spatio-Temporal Data Modeling: https://dl.acm.org/doi/10.1145/3447548.3467371
Summary
Vast amount of data generated from networks of sensors, wearables, and the Internet of Things (IoT) devices underscores the need for advanced modeling techniques that leverage the spatio-temporal structure of decentralized data due to the need for edge computation and licensing (data access) issues. While federated learning (FL) has emerged as a framework for model training without requiring direct data sharing and exchange, effectively modeling the complex spatio-temporal dependencies to improve forecasting capabilities still remains an open problem. On the other hand, state-of-the-art spatio-temporal forecasting models assume unfettered access to the data, neglecting constraints on data sharing. To bridge this gap, we propose a federated spatio-temporal model – Cross-Node Federated Graph Neural Network (CNFGNN) – which explicitly encodes the underlying graph structure using graph neural network (GNN)-based architecture under the constraint of cross-node federated learning, which requires that data in a network of nodes is generated locally on each node and remains decentralized. CNFGNN operates by disentangling the temporal dynamics modeling on devices and spatial dynamics on the server, utilizing alternating optimization to reduce the communication cost, facilitating computations on the edge devices. Experiments on the traffic flow forecasting task show that CNFGNN achieves the best forecasting performance in both transductive and inductive learning settings with no extra computation cost on edge devices, while incurring modest communication cost.
Main contributions
- We propose Cross-Node Federated Graph Neural Network (CNFGNN), a GNN-based federated learning architecture that captures complex spatio-temporal relationships among multiple nodes while ensuring that locally generated data remains decentralized, without incurring additional computational cost on the edge devices.
- Our modeling and training procedure enables GNN-based architectures to be used in federated learning settings. We achieve this by disentangling the modeling of local temporal dynamics on edge devices from the modeling of spatial dynamics on the central server, and by using an alternating optimization process that updates the spatial and temporal modules with split learning and federated averaging (FedAvg), enabling efficient GNN-based federated learning.
- We demonstrate that CNFGNN achieves state-of-the-art prediction performance (in both transductive and inductive settings) on the traffic flow prediction task, with moderate communication cost and no additional computational cost on edge devices compared to related techniques.
CROSS-NODE FEDERATED GRAPH NEURAL NETWORK
Problem definition
Given a graph G = (V, E), a feature tensor X, and a label tensor Y, a task is defined on this dataset with X as the input and Y as the prediction target. We consider learning models under the constraints of cross-node federated learning: the node features x_i, the node labels y_i, and the model output ŷ_i are visible only to node i.
A typical task that requires learning under these cross-node constraints is forecasting on spatio-temporal data generated by a network of sensors. In this case, V is the set of sensors and E describes the relationships between sensors. The feature tensor x_i ∈ R^(m×D) represents the records of the i-th sensor in D-dimensional space over the past m time steps, and the label y_i ∈ R^(n×D) represents its records over the future n time steps. Since records collected by different sensors owned by different users/organizations may not be allowed to be shared, due to the need for edge computation or permission issues in data access, it is necessary to design an algorithm that models spatio-temporal relationships without directly exchanging node-level data.
Cross-Node Federated Graph Neural Network (CNFGNN) model
The modeling of node-level temporal dynamics and server-level spatial dynamics is first disentangled as follows:
(i) At each node, an encoder-decoder model extracts temporal features from the data on the node and makes predictions;
(ii) On a central server, GraphNetwork (GN) [6] propagates the extracted node temporal features and outputs node embeddings, which contain informative relations between nodes.
(i) has access to the non-shareable node data and is executed locally on each node. (ii) only involves uploading and downloading the intermediate (smashed) features and gradients, not the raw data on the nodes. This decomposition allows node information to be exchanged and aggregated under the constraints of cross-node federated learning.
Modeling of Node-Level Temporal Dynamics
We modify a gated recurrent unit (GRU) based encoder-decoder architecture to model node-level temporal dynamics at each node. Given an input sequence x_i = (x_i^1, ..., x_i^m) at the i-th node, the encoder reads the entire sequence sequentially and outputs the hidden state h_i as a summary of the input sequence according to Equation 1:
h_i^t = GRU(h_i^(t-1), x_i^t), t = 1, ..., m, with h_i = h_i^m, (1)
where h_i^0 is a zero-valued initial hidden state vector.
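As a concrete illustration, here is a minimal NumPy sketch of the encoder of Equation 1. The GRU cell follows the standard formulation (update gate, reset gate, candidate state); the weight initialization and dimensions are illustrative assumptions, not the paper's actual hyperparameters.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class GRUCell:
    """Minimal GRU cell, used to illustrate h_t = GRU(h_{t-1}, x_t)."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_dim)
        # weights for the update (z), reset (r), and candidate (h) computations
        self.Wz = rng.uniform(-s, s, (hidden_dim, input_dim))
        self.Uz = rng.uniform(-s, s, (hidden_dim, hidden_dim))
        self.Wr = rng.uniform(-s, s, (hidden_dim, input_dim))
        self.Ur = rng.uniform(-s, s, (hidden_dim, hidden_dim))
        self.Wh = rng.uniform(-s, s, (hidden_dim, input_dim))
        self.Uh = rng.uniform(-s, s, (hidden_dim, hidden_dim))

    def step(self, h, x):
        z = sigmoid(self.Wz @ x + self.Uz @ h)               # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h)               # reset gate
        h_tilde = np.tanh(self.Wh @ x + self.Uh @ (r * h))   # candidate state
        return (1.0 - z) * h + z * h_tilde

def encode(cell, sequence, hidden_dim):
    """Read the whole input sequence; the final hidden state summarizes it."""
    h = np.zeros(hidden_dim)  # zero-valued initial hidden state h^0
    for x_t in sequence:
        h = cell.step(h, x_t)
    return h
```

The final `h` plays the role of h_i: the node-local temporal summary that is later sent to the server.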
To incorporate spatial dynamics into each node's predictive model, we concatenate the hidden state h_i with the node embedding received from the server, which contains the spatial information, to form the initial state vector of the decoder. The decoder then autoregressively generates predictions, starting from the last element of the input sequence x_i.
We choose the mean squared error (MSE) between predicted and true values as the loss function, which is evaluated locally at each node.
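The decoder-side computation can be sketched as follows. Here `gru_step` and `readout` are hypothetical callables standing in for the decoder GRU cell and its output layer, which the text does not specify at this level of detail; the point is where the server-sent embedding enters the forecast.

```python
import numpy as np

def decode(gru_step, h_enc, node_embedding, x_last, n_steps, readout):
    """Autoregressive decoding. The decoder's initial hidden state is the
    concatenation of the node's temporal summary h_enc with the embedding
    received from the server, so spatial information enters the forecast here."""
    h = np.concatenate([h_enc, node_embedding])
    x = x_last                      # start from the last observed input
    preds = []
    for _ in range(n_steps):
        h = gru_step(h, x)          # one decoder step (hypothetical callable)
        x = readout(h)              # map hidden state to the next prediction
        preds.append(x)
    return np.stack(preds)

def node_loss(pred, target):
    """Node-local training objective: mean squared error."""
    return float(np.mean((pred - target) ** 2))
```

Because `node_loss` only touches the node's own predictions and labels, it can be evaluated locally, as the text requires.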
Modeling of Spatial Dynamics
To capture complex spatial dynamics, we employ GNNs to generate node embeddings that encode the relational information between nodes. A central server collects the hidden states from all nodes as the input to the GNN. Each layer of the GNN updates the input features as follows:
where e_k, v_i, and u are the edge features, node features, and global features, respectively. Each update function φ is a neural network, and each aggregation function ρ is a permutation-invariant function such as summation.
We set the edge features to the entries of the weighted adjacency matrix W, the node features to the hidden states h_i collected from the nodes, and the global feature to an empty vector as the input to the first GNN layer. The server-side GNN outputs the embeddings of all nodes and sends each node its corresponding embedding.
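A minimal sketch of one such layer, with the global block omitted and summation as the aggregator. `phi_e` and `phi_v` are hypothetical update functions standing in for the neural networks in a GN block; this is an illustration of the message-passing pattern, not the paper's exact architecture.

```python
import numpy as np

def gn_layer(W, H, phi_e, phi_v):
    """One simplified GraphNetwork layer (global block omitted):
    update each edge with phi_e, aggregate incoming edges by summation,
    then update each node with phi_v.
    W: |V| x |V| weighted adjacency matrix (edge features are its entries),
    H: |V| x S matrix of node features (hidden states collected from nodes)."""
    n = W.shape[0]
    agg = np.zeros_like(H)
    for i in range(n):                  # receiver node
        for j in range(n):              # sender node
            if W[j, i] != 0:            # an edge j -> i exists
                agg[i] += phi_e(W[j, i], H[j], H[i])  # updated edge feature
    # node update from aggregated messages; rows are the node embeddings
    return np.stack([phi_v(agg[i], H[i]) for i in range(n)])
```

Stacking several such layers lets information propagate beyond immediate neighbors before the embeddings are sent back to the nodes.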
Alternating Training of Node-Level and Spatial Models
One challenge posed by the cross-node federated learning requirement and the server-side GNN model is the high communication cost during the training phase. Since we distribute different parts of the model across different devices, split learning is a potential solution for training, where hidden vectors and gradients are communicated across devices. However, when we simply train the model end-to-end with split learning, the central server needs to receive the hidden states from all nodes and send the node embeddings to all nodes in the forward pass, and then receive the gradients of the node embeddings from all nodes and send the gradients of the hidden states back to all nodes in the backward pass. Assuming that all hidden states and node embeddings have the same size S, the total amount of data transferred in each training round of the GNN model is 4|V|S.
To reduce the high communication cost in the training phase, we instead train the models on the nodes and the GNN model on the server alternately. More specifically, in each round of training, (1) we fix the GNN model, update the node embeddings, and optimize the encoder-decoder models on the nodes; then (2) we fix all models on the nodes and train the GNN model on the server for R rounds.
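The alternating schedule above can be sketched as follows. `nodes` and `server` are hypothetical objects standing in for the on-device encoder-decoders and the server-side GNN; their method names are illustrative, not from the paper.

```python
def train_round(nodes, server, R):
    """One outer round of the alternating optimization (schematic sketch)."""
    # (1) Fix the GNN: refresh node embeddings, then train node models locally.
    hidden = [node.hidden_state() for node in nodes]
    embeddings = server.compute_embeddings(hidden)
    for node, emb in zip(nodes, embeddings):
        node.train_local(emb)          # encoder-decoder update on-device
    # (2) Fix the node models: train the server GNN for R rounds.
    hidden = [node.hidden_state() for node in nodes]  # fetched once; nodes frozen
    for _ in range(R):
        server.train_step(hidden)      # only embeddings/gradients move now
```

The key point is that `hidden` is fetched once before the R server-side rounds, which is what cuts the communication cost analyzed next.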
Since the models on the nodes are fixed and remain unchanged during the training of the GNN model, the server only needs to fetch the hidden states h_i from the nodes once before GNN training starts, and afterwards only the node embeddings and their gradients need to be communicated. Therefore, the average amount of data transmitted in each of the R GNN training rounds is reduced to (2R + 1)|V|S / R. We provide more details of the training process in Algorithm 1 and Algorithm 2.
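The two communication costs can be compared with a small back-of-the-envelope calculation, in units of transmitted elements per GNN training round. The average-cost expression is derived from the description above: one hidden-state fetch of |V|S amortized over R rounds, plus 2|V|S per round for the embeddings and their gradients.

```python
def split_learning_cost(num_nodes, S):
    """Naive end-to-end split learning: per round, hidden states and
    embeddings go one way and both gradients come back: 4|V|S."""
    return 4 * num_nodes * S

def alternating_cost(num_nodes, S, R):
    """Alternating training: hidden states are fetched once per R GNN
    rounds; each round only embeddings (down) and their gradients (up)
    are exchanged, giving (2R + 1)|V|S / R on average."""
    return (num_nodes * S + R * 2 * num_nodes * S) / R
```

For large R the average cost approaches 2|V|S, half of the naive split-learning cost.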
To extract temporal features from each node more effectively, the encoder-decoder models on the nodes are trained with the FedAvg algorithm. This enables all nodes to share the same feature extractor and thus a common hidden space of temporal features, which avoids potential overfitting of the models on nodes and leads to faster convergence and better prediction performance.
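A minimal sketch of the FedAvg aggregation step, assuming (for illustration) that each client reports its model weights as a dict of parameter lists together with its local sample count:

```python
def fed_avg(client_weights, client_sizes):
    """FedAvg: average the clients' parameters, weighted by local data size.
    client_weights: list of {param_name: list-of-floats} dicts (one per client),
    client_sizes: list of local sample counts, aligned with client_weights."""
    total = sum(client_sizes)
    avg = {}
    for name in client_weights[0]:
        avg[name] = [
            sum(w[name][k] * s for w, s in zip(client_weights, client_sizes)) / total
            for k in range(len(client_weights[0][name]))
        ]
    return avg
```

The server broadcasts the averaged weights back to all nodes, so every node runs the same encoder-decoder at the start of the next round.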